While busy with query performance tuning and trying to see whether the query would run better using a CTE’s (Common Table Expression) or Temp tables and comparing times, I realised I forgot to think about adding indexes on my temp tables which could increase the performance.
I then came across Brent Ozar’s post on Indexing Temp Tables and found it extremely useful and thought I would share it here.
Most people don’t think about indexing temp tables when they are trying to optimise their queries. Which is a shame, because I see people sticking fairly large amount of data into temp tables. On the rare occurrence that I do see them indexed, it’s a nonclustered index on a column or two. The optimser promptly ignores this index while you select 10 columns and join 10,000 rows to another temp table with another ignored nonclustered index on it.
Now, not every temp table needs a clustered index. I don’t even have a rule of thumb here, because it depends on so many things, like
- Are you joining it to other tables?
- What does the execution plan look like?
- Is this even part of the stored procedure that needs tuning?
Assuming that all three things lead us to index temp tables, let’s create some and see how they react to different indexing strategies.
SELECT TOP 100000 * INTO #TempUsers FROM dbo.Users AS u; SELECT TOP 100000 p.* INTO #TempPosts FROM dbo.Posts AS p JOIN #TempUsers AS tu ON p.OwnerUserId = tu.Id; SELECT TOP 100000 c.* INTO #TempComments FROM dbo.Comments AS c JOIN #TempUsers AS tu ON c.UserId = tu.Id JOIN #TempPosts AS tp ON c.PostId = tp.Id; SELECT * FROM #TempUsers AS tu JOIN #TempPosts AS tp ON tu.Id = tp.OwnerUserId JOIN #TempComments AS tc ON tc.UserId = tu.Id AND tc.PostId = tp.Id;
We have three temp tables with 100k rows in them. I joined them a bit to make sure we get good correlation in the data.
If we get the plan for select statement, this is our query plan. A bunch of table scans and hash joins. SQL is lazy. I don’t blame it one bit.
I’m so parallel…
After frequent user complaints and careful plan analysis, we decide to add some indexes. Let’s start with nonclustered indexes.
/*Nonclustered*/ CREATE NONCLUSTERED INDEX ix_tempusers ON #TempUsers (Id); CREATE NONCLUSTERED INDEX ix_tempposts ON #TempPosts (Id, OwnerUserId); CREATE NONCLUSTERED INDEX ix_tempposts2 ON #TempPosts (OwnerUserId, Id); CREATE NONCLUSTERED INDEX ix_tempcomments ON #TempComments (UserId, PostId);
I’ve gone ahead and added two indexes to #TempPosts so the optimizer has a choice if joining in another order is more efficient.
How’d we do?
Not so hot…
Not quite there. Two table scans and an index seek followed by a key lookup that SQL estimates will execute 62k times.
Let’s try that again with clustered indexes. We can only give #TempPosts one, so I picked at random. In real life, you might want to do your job!
/*Clustered*/ CREATE CLUSTERED INDEX cx_tempposts ON #TempPosts (Id, OwnerUserId); CREATE CLUSTERED INDEX cx_tempusers ON #TempUsers (Id); CREATE CLUSTERED INDEX cx_tempcomments ON #TempComments (UserId, PostId);
Our indexes get used and everyone is happy. Well, sort of. TempDB isn’t happy, but then it never is. It’s like your liver, or Oscar the Grouch.
Used and abused
When starting to index temp tables, I usually start with a clustered index, and potentially add nonclustered indexes later if performance isn’t where I want it to be.
You can do yourself a lot of favors by only bringing columns into the temp table that you need. If it’s still a lot, there’s a much higher chance SQL will judge a table scan cheaper than using a narrow nonclustered index along with a lookup.
Remember to test this carefully. You want to make sure that adding the indexes doesn’t introduce more overhead than they alleviate, and that they actually make your query better.