Speed up Update statement T-SQL with Top 1
I'm creating a stored procedure for an ETL script that'll run once per hour to give results of specific operations to users.
I need to find the previous result to the current result. This is fine and I have a working query that I export into Excel. However now I wish to automate the process.
The stored procedure averages at 42 seconds per run. This isn't feasible when running once an hour on the server as I have other automated scripts also running.
My issue is one chunk of the stored procedure averages at 28 seconds, whilst everything else usually takes less than a second (shows up at 00:00:00 in SSMS).
I've managed to reduce the runtime of other chunks myself bringing it down to 42 seconds average, but I can't do this.
I was wondering if any of you know any specfic ways to speed this small chunk up?
UPDATE #tmp SET prev_test_date = ( SELECT TOP 1 r.test_date FROM [dbo].[results] r (NOLOCK) WHERE r.number = #tmp.number AND r.test_date < #tmp.test_date ORDER BY r.test_date DESC )
I was originally going to use joins for this to speed it up, although I can't do this due to the TOP 1 part of the query.
For this query:
UPDATE #tmp SET prev_test_date = ( SELECT TOP 1 r.test_date FROM [dbo].[results] r WHERE r.number = #tmp.number AND r.test_date < #tmp.test_date ORDER BY r.test_date DESC )
You want an index on r(number, test_date).
If you are using SQL Server 2012+ and the test dates are not duplicated, you can also write this as:
with r as ( select r.*, lag(r.test_date) over (partition by r.number order by r.test_date desc) as prev_test_date from [dbo].[results] r ) update t set t.prev_test_date = r.prev_test_date from #tmp t join r on t.number = r.number;
In fact, if this is the case, you might not need the temporary table. You might be able to modify the code just to use lag().
UPDATE #tmp SET prev_test_date = ( SELECT max(r.test_date) FROM [dbo].[results] r (NOLOCK) WHERE r.number = #tmp.number AND r.test_date < #tmp.test_date )
Without more info it is hard to tell, but if there is simply too much processing You may need to make separate precalculated table and update it incrementally on data change.
UPDATE #tmp SET #tmp.prev_test_date = tt.maxdate from #tmp join ( select #tmp.number, max(r.test_date) maxdate from #tmp join [dbo].[results] r (NOLOCK) on r.number = #tmp.number AND r.test_date < #tmp.test_date group by #tmp.number ) tt on tt.number = #tmp.number
and have indexes on both #tmp and [results] on number, text_date
I'm having to make some assumptions about the structure and contents of your tables, but if my assumptions are correct, here's the approach I usually use in such situations:
with cteOrderedResults as ( -- Ideally R will be clustered by number, test_date for this select R.number ,R.test_date ,row_number() over ( partition by R.number order by R.test_date desc -- So the most recent R.test_date from -- before T.test_date gets RowNo=1 ) as RowNo from dbo.results R inner join #tmp T on R.number=T.number and R.test_date<T.test_date ) update T set T.prev_test_date=R.test_date from #tmp T inner join cteOrderedResults R on T.number=R.number and 1=R.RowNo
This approach works quickly for me on rowsets ranging up to about the million mark. As I've commented, I believe the partitioned row_number() is going to be taking advantage of a corresponding clustered index if it exists; you might find this doesn't work so fast if you don't have the table clustered appropriately.
I agree with comments made elsewhere here, that you should only add the nolock hint back in if you're really sure you need it. If you do, you should use the full correct syntax, with (nolock). From the official MSDN page:
Omitting the WITH keyword is a deprecated feature: This feature will be removed in a future version of Microsoft SQL Server.