#BackToBasics : An Updated "Kitchen Sink" Example

#BackToBasics : An Updated "Kitchen Sink" Example

At the beginning of the year, I pledged to publish an entry level post on the first Wednesday of every month throughout 2016. Even when the first Wednesday sneaks up on me as the first day of the month. For June, I wanted to talk about a pattern I see quite a bit when developers try to make a single "satisfy ALL the searches" stored procedure.

sales.customersA few years ago, I posted a video on our SQLSentry.TV site, describing what I call "the kitchen sink" procedure. I call it the kitchen sink because the pattern is a single procedure that is meant to satisfy a wide variety of very different search criteria. ALL of the search criteria, in fact.

Let's take a look at a table in the new Wide World Importers sample database, Sales.Customers, at right (click to enlarge).

This table has a lot of columns, obviously. Now envision an interface where employees at Wide World Importers could perform searches for specific customers. They may want to perform different searches based on unique combinations of criteria, including (and this is not an exhaustive list by any means, but enough for my purposes today):

  • CustomerID (e.g. pick the customer from a list);
  • actual "bill to" CustomerID;
  • name;
  • category;
  • buying group;
  • when the account was opened;
  • city; or,
  • whether the account is on credit hold.

And any combination of the above. I know you've seen and probably built interfaces like this before, but just as a quick mockup, the employees would have a screen something like this:

search_ui

So, it makes sense: on the receiving end of this UI, the developer writes a single stored procedure that accepts all of these optional search criteria, and then formulates a query that will plug in any submitted values. After all, you can't write a stored procedure for each decision point independently; how would you return unified search results if the user entered a name pattern of "starts with W" (which would call one stored procedure) and all accounts opened in 2015 (which would call a different stored procedure)?

As a result, the typical pattern might look like this:

CREATE PROCEDURE dbo.CustomerSearch_KitchenSink
  @CustomerID            int            = NULL,
  @CustomerName          nvarchar(100)  = NULL,
  @BillToCustomerID      int            = NULL,
  @CustomerCategoryID    int            = NULL,
  @BuyingGroupID         int            = NULL,
  @MinAccountOpenedDate  date           = NULL,
  @MaxAccountOpenedDate  date           = NULL,
  @DeliveryCityID        int            = NULL,
  @IsOnCreditHold        bit            = NULL
AS
BEGIN
  SET NOCOUNT ON;
 
  SELECT CustomerID, CustomerName, IsOnCreditHold
    FROM Sales.Customers
    WHERE (@CustomerID IS NULL 
           OR CustomerID = @CustomerID)
      AND (@CustomerName IS NULL 
           OR CustomerName LIKE @CustomerName)
      AND (@BillToCustomerID IS NULL 
           OR BillToCustomerID = @BillToCustomerID)
      AND (@CustomerCategoryID IS NULL 
           OR CustomerCategoryID = @CustomerCategoryID)
      AND (@BuyingGroupID IS NULL 
           OR BuyingGroupID = @BuyingGroupID)
      AND AccountOpenedDate >= 
          COALESCE(@MinAccountOpenedDate, AccountOpenedDate)
      AND AccountOpenedDate <= 
          COALESCE(@MaxAccountOpenedDate, AccountOpenedDate)
      AND (@DeliveryCityID IS NULL 
           OR DeliveryCityID = @DeliveryCityID)
      AND (@IsOnCreditHold IS NULL 
           OR IsOnCreditHold = @IsOnCreditHold);
END

I could go down a lot of rabbit holes about how this logic needs to account for things like nullability of columns, cases where NULL might actually mean something specific users might be searching for, and so on. But I'd like to focus on the main reason the above pattern is problematic. It all has to do with how SQL Server caches and re-uses execution plans.

 

The Main Problems

The first time you call this stored procedure (either after creation, after a service restart, or after some other event that evicted the cached plan or all cached plans), SQL Server will generate an execution plan that is optimized for the set of parameter values that were passed to the stored procedure. The next time you call the procedure, it will attempt to re-use the same plan, even if you pass a completely different set of parameters, or the same parameters with completely different values. Consider the following situations:

  1. Different parameters
    • On first call, you pass @CustomerName = N'W%' and @DeliveryCityID = 242. SQL Server creates and caches a plan that tries to optimize for those parameters. If we look at the Parameters tab in SQL Sentry Plan Explorer, we see the compiled and runtime parameter values:

    • On the next call, you pass @CustomerID = 900. SQL Server re-uses the same plan as above, and note that it is still optimizing for the first set of parameters:

    • Now, issue DBCC FREEPROCCACHE;, and try those again in the reverse order. The plans won't change much (because of the table design more than anything), and the specific details aren't overly important, but they will be different (at the very least, examine the estimated rows, and how the compiled/runtime parameter values swap).
  2. Different parameter values
    • Similar to the above example, let's clear the procedure cache, and then call the procedure with the leading wildcard of N'Y%'. I know that there is only a single row where the customer name starts with Y, and this is the core portion of the plan we get (there is other irrelevant stuff off to the lower right, but the important part is the index scan + key lookup against Customers), which costs 11 logical reads:

      bb-plan-2a

      SQL Server chose a seek and a lookup, which makes sense with only one row estimated.

    • Next, change the parameter value to N'W%', which will match 201 rows. Now the index scan + lookup doesn't make as much sense, but this is the plan SQL Server is stuck with, so it goes ahead and does that, at a (relatively!) whopping cost of 411 logical reads:

      bb-plan-2ab

    • If you free the procedure cache and run the N'W%' version again, SQL Server chooses a more optimal plan, just scanning the clustered index and avoiding the key lookups, and a much more reasonable 40 logical reads:

      bb-plan-2b

 

How Can We Fix It?

You may see some recommendations out there to just throw OPTION (RECOMPILE) onto the end of the query, and everything will be fine. While this can work, you should test against your entire workload, including edge cases and atypical parameters, to make sure that the overhead of compiling a new plan every single time, even in cases where the same parameters and parameter values are passed every time, does not counteract the performance benefit.

If you are dealing with a scenario where you have very different parameters coming in, but the parameter values don't change much when those parameters are used (or there is no significant data skew like this, where cases like N'W%' vs. N'Y%' wouldn't optimize much differently), I would use the following replacement logic. What it does is it only attaches parameter-based filter criteria when the user passes a valid value for that parameter. This way, a search against CustomerID can't possibly use a plan that is optimized for a search against CustomerName.

CREATE PROCEDURE dbo.CustomerSearch_DynamicSQL
  @CustomerID            int            = NULL,
  @CustomerName          nvarchar(100)  = NULL,
  @BillToCustomerID      int            = NULL,
  @CustomerCategoryID    int            = NULL,
  @BuyingGroupID         int            = NULL,
  @MinAccountOpenedDate  date           = NULL,
  @MaxAccountOpenedDate  date           = NULL,
  @DeliveryCityID        int            = NULL,
  @IsOnCreditHold        bit            = NULL
AS
BEGIN
  SET NOCOUNT ON;
 
  DECLARE @sql nvarchar(max) = N'
    SELECT CustomerID, CustomerName, IsOnCreditHold
      FROM Sales.Customers WHERE 1 = 1'
      + CASE WHEN @CustomerID IS NOT NULL THEN
        N' AND CustomerID = @CustomerID' ELSE N'' END
      + CASE WHEN @CustomerName IS NOT NULL THEN
        N' AND CustomerName LIKE @CustomerName' ELSE N'' END
      + CASE WHEN @CustomerName IS NOT NULL THEN
        N' AND CustomerName LIKE @CustomerName' ELSE N'' END
      + CASE WHEN @BillToCustomerID IS NOT NULL THEN
        N' AND BillCustomerID = @BillCustomerID' ELSE N'' END
      + CASE WHEN @CustomerCategoryID IS NOT NULL THEN
        N' AND CustomerCategoryID = @CustomerCategoryID' ELSE N'' END
      + CASE WHEN @BillToCustomerID IS NOT NULL THEN
        N' AND BuyingGroupID = @BuyingGroupID' ELSE N'' END
      + CASE WHEN @MinAccountOpenedDate IS NOT NULL THEN
        N' AND AccountOpenedDate >= @MinAccountOpenedDate' ELSE N'' END
      + CASE WHEN @MaxAccountOpenedDate IS NOT NULL THEN
        N' AND AccountOpenedDate <= @MaxAccountOpenedDate' ELSE N'' END
      + CASE WHEN @DeliveryCityID IS NOT NULL THEN
        N' AND DeliveryCityID = @DeliveryCityID' ELSE N'' END
      + CASE WHEN @IsOnCreditHold IS NOT NULL THEN
        N' AND IsOnCreditHold = @IsOnCreditHold' ELSE N'' END;
 
    DECLARE @params nvarchar(max) = N'
      @CustomerID            int,
      @CustomerName          nvarchar(100),
      @BillToCustomerID      int,
      @CustomerCategoryID    int,
      @BuyingGroupID         int,
      @MinAccountOpenedDate  date,
      @MaxAccountOpenedDate  date,
      @DeliveryCityID        int,
      @IsOnCreditHold        bit';
 
    --PRINT @sql;
 
    EXEC sys.sp_executesql @sql, @params, 
      @CustomerID,
      @CustomerName,
      @BillToCustomerID,
      @CustomerCategoryID,
      @BuyingGroupID,
      @MinAccountOpenedDate,
      @MaxAccountOpenedDate,
      @DeliveryCityID,
      @IsOnCreditHold;
END
GO

Will this fill up your plan cache with a whole bunch of different plans? Yes! And that's the point: Different plans optimized for different parameter combinations. This is one case where Optimize for ad hoc workloads can pay huge dividends: You won't actually cache full plans for specific parameter variations until that variation has been used twice.

Now, if you have cases where parameter sniffing (data skew or parameter value variance) is also a problem, you can easily thwart that by adding OPTION (RECOMPILE) – dynamically – only when parameters that are sensitive to skew are part of the query. This way you won't pay recompilation costs for parameters where skew is not an issue. For example:

IF @CustomerName IS NOT NULL
  SET @sql += N' OPTION (RECOMPILE)';

Afraid of dynamic SQL? Don't be. While the above is a little harder to read and maintain, the benefits will quite often outweigh that downside. And no, this code is no more vulnerable to SQL injection than the original procedure was.

For a lot more detail on parameter sniffing and the pros and cons of the various recompilation options, see this great post by Paul White.

Comments ( 2 )

  • Robert L Davis says:

    Sort of related: I saw a forum post this week where someone thought they had created a "catch-all index that would satisfy any possible query". A kitchen-sink index, if you will.

    CREATE NONCLUSTERED INDEX [IX_A] ON [dbo].[Ad]
    (
    	[ClientID] ASC,
    	[TransactionType] ASC,
    	[TradeDate] ASC,
    	[Broker] ASC,
    	[AccountNumber] ASC,
    	[ImportedFile] ASC,
    	[Currency] ASC,
    	[PricePerShare] ASC,
    	[NetAmount] ASC,
    	[Active] ASC
    )
    INCLUDE 
    (
     	[AccountName],
    	[SettleDate],
    	[Ticker],
    	[CUSIP],
    	[SecurityDescription],
    	[ApprovedBy],
    	[Custodian],
    	[PreAllocated],
    	[Executed],
    	[Commission],
    	[ResultingShares],
    	[Notes],
    	[InterestRate],
    	[MaturityDate],
    	[AccruedInterest],
    	[Par],
    	[OtherFees],
    	[AdvisorBlotterID],
    	[CreateDate],
    	[CreateBy]
    )
  • Sheldon Hull says:

    Fantastic write-up, I've read a lot on this, but this still helped. Thanks aaron

Leave A Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.