Resolving Key Lookup Deadlocks with Plan Explorer
One of the sleeper features of our free Plan Explorer is its ability to open and view SQL Server deadlocks. The view is designed to clearly present all of the details you need to troubleshoot a deadlock, without overwhelming you with less helpful elements like owner IDs and transaction descriptors.
The deadlock diagram is synchronized with a treeview showing query text, call stack, lock details, owners & waiters, and more. Clicking on any process node on the diagram instantly takes you to the associated query, and sequence indicators on the connectors let you quickly ascertain the order of events that led to the deadlock.
The view is mostly the same between the standalone Plan Explorer and the integrated Plan Explorer in the full SQL Sentry software. The primary difference is that SQL Sentry automatically captures the deadlocks and associated queries and query plans for you, and lets you jump directly into the query plan for further analysis, index updates, etc. With standalone Plan Explorer you must capture the deadlock xml via other means, and then you can open the .xdl file manually.
I'd estimate that at least 2/3 of the deadlocks I've run across in my career working with SQL Server involve key lookups. They seem to be everywhere on busy OLTP systems, and are most common when SELECTs with key lookups are regularly operating within the same range of rows as many UPDATEs and DELETEs. The good news is that they are often one of the easier deadlocks to identify and resolve.
If you open a deadlock with Plan Explorer, the telltale sign a key lookup is involved is a key lock against a clustered index, and a key lock against a non-clustered index. These are visible on both the diagram and treeview as shown below (click to enlarge):
With integrated Plan Explorer, you can quickly confirm by clicking the "View" button on the non-clustered index row (shown above) and it will open the query plan captured automatically when the deadlock occurred. If no plan was captured, it will auto-request an estimated plan.
With standalone Plan Explorer, simply copy the query text via right-click
Copy -> Cell, and paste it into a new session tab and request the estimated or actual query plan.
If it turns out that the SELECT doesn't use a key lookup but rather a clustered index scan, it's likely because statistics and/or the cached plan have changed, and the optimizer now thinks a scan would be more efficient. (For more details on how and why this happens, see Kimberly Tripp's tipping point series.)
To create a simulated deadlock, I'll use two queries against the WideWorldImporters database, a SELECT and an UPDATE, both looped to simulate heavy activity. If you run these in separate tabs in SSMS, a deadlock should result within a few seconds.
-- Run in SSMS session 1: USE WideWorldImporters; SET NOCOUNT ON; declare @counter int = 0; while (@counter < 100000) begin UPDATE [WideWorldImporters].[Sales].[OrderLines] SET [Quantity] = @counter WHERE OrderLineID = 231390; set @counter = @counter + 1; end --------------------------- -- Run in SSMS session 2: USE WideWorldImporters; SET NOCOUNT ON; declare @counter int = 0; while (@counter < 100000) begin SELECT [OrderLineID] ,[OrderID] ,[StockItemID] ,[Quantity] ,[UnitPrice] ,[TaxRate] FROM [WideWorldImporters].[Sales].[OrderLines] WHERE PickingCompletedWhen > '2016-05-31'; set @counter = @counter + 1; end
I'll walk through the sequence of events leading to the deadlock, using Plan Explorer's deadlock and plan diagrams to illustrate each step. The nodes outlined in green (or red) are those that are relevant for that step, and the number in parens on each deadlock connector line represents the order in which the lock was acquired.
- The SELECT retrieves data from the non-clustered index via a Seek. A shared lock is taken on one or more pages.
- The UPDATE takes an exclusive lock on the clustered index row. Because at least one of the columns in the non-clustered index is being updated, an exclusive lock is immediately attempted on the non-clustered index row, but it can't be acquired because of the shared lock owned by the SELECT.
- The Key Lookup tries to acquire a shared lock on the same same row in the clustered key currently locked by the UPDATE, but it can't be granted because of its exclusive lock.
At this point a stalemate exists until the lock monitor thread detects the deadlock, usually within 5 seconds. Inevitably the SELECT will be the loser because the UPDATE will have done more work. (The amount of work done by each process is shown in the Log Used column in the treeview.)
- Once the SELECT thread has been chosen as the deadlock victim and terminated, the UPDATE can successfully acquire the exclusive lock on the non-clustered row and update it.
In the plan diagram above, note that the number of non-clustered index updates associated with an UPDATE or DELETE is always highlighted in blue below the Clustered Index Update operator, with the list of affected index names shown in the tooltip.
The key lookup happens because the non-clustered index doesn't include, or cover, all of the columns used by the query. This is apparent on the Index Analysis tab – note the light red cells for Tax Rate and Unit Price (click to enlarge):
The easy solution is to add these as included columns to the existing index, which will eliminate the lookup. I can do this by selecting the dropdowns for both red cells and selecting the "Included" option, then clicking the Script Index (<S>) button (see image at right).
We also show a missing index recommendation found by the optimizer, but there's no point in creating an entirely new index when only these two columns are needed.
NOTE: Before adding included columns, you should first consider the number of rows in the index, the total size of the columns, and the level of DML activity on the table to determine whether the additional overhead of a larger index will be justified.
If adjusting the non-clustered index isn't a viable option, and the clustered index is small enough where it will easily fit entirely in buffer (maybe a few thousand pages max), a FORCESCAN hint can be used with the SELECT to force a clustered index scan.
If you don't control the SQL, such as when it is being sent by a 3rd party app, a plan guide can be used to apply the hint. When using QDS (Query Data Store) on SQL Server 2016+, the scan plan can be "forced" instead. You should of course adequately test to ensure that shifting those lookups over to scans isn't going to cause a significant performance hit.
Another option would be to enable RCSI (Read Committed Snapshot Isolation), which will effectively prevent the blocking that causes these deadlocks. I would not recommend RCSI for resolving key lookup deadlocks alone, but rather more as a possibility to consider for systems suffering from heavy reader-writer contention in general. This is not something to be done lightly, so if you're not familiar with RCSI, I'd encourage you to read this post by Paul White, and this one by Kendra Little, before making such a move.
Hopefully by now you are excited to squash some key lookup deadlocks. There are two ways get started with SentryOne software:
- Download the free Plan Explorer and open any deadlock .xdl file.
- Download the SentryOne trial and start monitoring your SQL Servers with SQL Sentry. You'll be alerted via email whenever a deadlock occurs, and you can simply click the link in the email to jump directly to the deadlock.