Speed up Update statement T-SQL with Top 1

I'm creating a stored procedure for an ETL script that'll run once per hour to give results of specific operations to users.

I need to find the previous result to the current result. This is fine and I have a working query that I export into Excel. However now I wish to automate the process.

The stored procedure averages at 42 seconds per run. This isn't feasible when running once an hour on the server as I have other automated scripts also running.

My issue is one chunk of the stored procedure averages at 28 seconds, whilst everything else usually takes less than a second (shows up at 00:00:00 in SSMS).

I've managed to reduce the runtime of other chunks myself bringing it down to 42 seconds average, but I can't do this.

I was wondering if any of you know any specfic ways to speed this small chunk up?

UPDATE #tmp
SET prev_test_date = (  
    SELECT TOP 1 r.test_date 
    FROM [dbo].[results] r (NOLOCK)
    WHERE r.number = #tmp.number
    AND r.test_date < #tmp.test_date
    ORDER BY r.test_date DESC
)       

I was originally going to use joins for this to speed it up, although I can't do this due to the TOP 1 part of the query.

Any ideas?

Answers


For this query:

UPDATE #tmp
SET prev_test_date = (  
    SELECT TOP 1 r.test_date 
    FROM [dbo].[results] r
    WHERE r.number = #tmp.number AND
          r.test_date < #tmp.test_date
    ORDER BY r.test_date DESC
)   

You want an index on r(number, test_date).

If you are using SQL Server 2012+ and the test dates are not duplicated, you can also write this as:

with r as (
      select r.*,
             lag(r.test_date) over (partition by r.number order by r.test_date desc) as prev_test_date
      from [dbo].[results] r
     )
update t
    set t.prev_test_date = r.prev_test_date
    from #tmp t join
         r
         on t.number = r.number;

In fact, if this is the case, you might not need the temporary table. You might be able to modify the code just to use lag().


UPDATE #tmp
SET prev_test_date = (  
    SELECT max(r.test_date) 
    FROM [dbo].[results] r (NOLOCK)
    WHERE r.number = #tmp.number
    AND r.test_date < #tmp.test_date    
)  

Without more info it is hard to tell, but if there is simply too much processing You may need to make separate precalculated table and update it incrementally on data change.


UPDATE #tmp
   SET #tmp.prev_test_date = tt.maxdate 
from #tmp 
join 
(
select #tmp.number, max(r.test_date) maxdate 
  from #tmp
  join [dbo].[results] r (NOLOCK)
        on r.number = #tmp.number
       AND r.test_date < #tmp.test_date
 group by #tmp.number 
 ) tt
 on tt.number = #tmp.number

and have indexes on both #tmp and [results] on number, text_date


I'm having to make some assumptions about the structure and contents of your tables, but if my assumptions are correct, here's the approach I usually use in such situations:

with cteOrderedResults as (
  -- Ideally R will be clustered by number, test_date for this
  select      R.number
              ,R.test_date
              ,row_number() over  (   partition by R.number
                                      order by R.test_date desc 
                                      -- So the most recent R.test_date from 
                                      -- before T.test_date gets RowNo=1
                                  ) as RowNo
  from        dbo.results R
  inner join  #tmp T            on R.number=T.number
                                and R.test_date<T.test_date

  )
update      T
set         T.prev_test_date=R.test_date
from        #tmp T
inner join  cteOrderedResults R   on T.number=R.number
                                  and 1=R.RowNo

This approach works quickly for me on rowsets ranging up to about the million mark. As I've commented, I believe the partitioned row_number() is going to be taking advantage of a corresponding clustered index if it exists; you might find this doesn't work so fast if you don't have the table clustered appropriately.

I agree with comments made elsewhere here, that you should only add the nolock hint back in if you're really sure you need it. If you do, you should use the full correct syntax, with (nolock). From the official MSDN page:

Omitting the WITH keyword is a deprecated feature: This feature will be removed in a future version of Microsoft SQL Server.


Need Your Help

How to dynamically load pictures onto webpage from databse using javascript/node js

javascript image node.js express pug

I am working on a web app, and I am creating a user profile page in which I want to load in user information, and display a profile picture. I am using node/express/jade. I have a javascript file ...

Rails - MVC - Should I move an external API call to the model?

ruby-on-rails api rails-models

I'm very new to rails and MVC development in general and have been working on an app which makes an external API call to bitly.