How do I decide if I should use a CTE or not?
Bear with me, I'm new to intermediate.
My question is - When should I use a CTE? How do I decide if I should use a CTE?
When should I use this:
;with cteTesting as ( select * from testing.first_table ) select * from testing.second_table s inner join cteTesting t on s.key = t.key
select * from testing.second_table s inner join ( select * from testing.first_table ) t on s.key = t.key
And why? Is this just for code flow, code readability - or is there something more technical? Will one yield better execution plans under some circumstances?
edit: Just realized my example code example is very poor. I was trying to highlight that there are many cases where I can use a select in the from statement instead of a CTE - how do I decide which one I should be using?
For simple examples, it doesn't make much difference. If you need to use the Recursive features to build up a hierarchy, then you haven't much choice - you need to use a CTE.
Another case where it probably doesn't make much performance difference, but does for readability, is when you need to join the same subquery multiple times. If you're using subqueries, you have to repeat the entire expression, whereas with CTEs, you just use the name twice:
;With NamedExpression as ( select t1.ID,t2.ID as ID2,SUM(t3.Value) as Val from Table1 t1 left join Table2 t2 on t1.id = t2.t1id inner join Table3 t3 on t3.col = t1.id or t3.col2 = t2.id group by t1.ID,t2.ID ) select * from NamedExpression ne inner join NamedExpression ne2 on ne.ID2 = ne2.ID
It should also be noted that if you do the above as subqueries, and the expressions are particularly complex, it can sometimes take time for the reader/maintainer to verify that the two subqueries are in fact identical, and there's not some subtle difference between the two
Also, if you have an indenting style that says that subqueries should appear further to the right than their enclosing query, then expressions that build on other expressions can cause all of the code to shift to the right - whereas with CTEs, you stop and move back to the left in building each subexpression (CTE):
;WITH CTE1 AS ( SELECT ... ), CTE2 as ( SELECT ... FROM CTE1 ), CTE3 as ( SELECT ... FROM CTE2 ) select * from CTE3
select * from ( select ... from ( select ... from ( select ...
I personally find the CTE version to be more readable especially if the select gets bigger.
When you use the derived table more than once in the main SELECT it might be better to use the CTE because it tells the database that you want to run this only once. Although I wouldn't be surprised if the optimizer was smart enough to detect two identical sub-selects in the from clause and only runs them once:
with foo as ( select .. from bar ) select f1.* from foo f1 join foo f2 on ...
select f1.* from (select ... from bar ) f1 join (select ... from bar) f2 on ...
I think the most important part is to be consistent (across what you write and inside your team).
I noticed that JOINs (esp. when combined with lot of WHERE clauses) can have disastrous performance when large data sets are involved.
CTEs can resolve this by selecting only relevant records and joining over these subsets.
Consider a CTE as a sort of pre-select to prepare data for the final SELECT.
One other reason I use CTEs isn't to replace derived tables, but to ensure that complex reporting SQL contains the correct records. So suppose you are doing some type of financial report and you want to make sure you return exactly the records you want. When you have 10 joins, it becomes hard to tell if the data is correct.
So I build a complex query in pieces using CTEs. So for instance I only want orders meeting certain criteria. The first CTE is the one that picks those out. I write it and then run a select on the CTE. This tells me my basline number of orders so that as I add complexity, I can immedaitely see where the number changed and determine if it should have changed or if I need to change the query. This lets me know fairly quickly if I need left joins or inner joins or if I might need conditions on a related table to limit it to one record.
Often when I do this I will have chained CTES before I get to the final select which is much simpler. And the further value of this is that I find it much easier to maintain these complex reporting queries when it is time to change them. So suppose I have CTEs in a chain like:
- Cost Summary
- Customer Demographics
Then when I need to change somthing about how I do the cost calculation, it is both easier to find where to make the change and easier to check the end results.