Statistics – What, why and how (Part 4)
In my last post, we covered how essentially the same query can generate different execution plans based on cardinality estimates. We also looked further into how the Query Optimizer determines estimated rows.
In the final post for this series, we'll talk about statistics maintenance. Stats, like indexes, require maintenance. Unlike indexes, statistics don't become fragmented, but the histograms associated with those statistics become out of date as the data in the column changes. While statistics maintenance is important, you have to keep in mind that updating stats can also cause plan recompilation and that has costs of its own. So, like anything else, it should be done, but as DBAs, we should make some educated decisions about how and when. Stats can be updated via the Auto Update Statistics setting, manually or some combination of the two. We'll go through all of these.
Auto Update Statistics
Auto Update Statistics is a database level setting. It is enabled by default and can be viewed by looking at Database Properties->Options. What this setting does is automatically update statistics on a column when the number of changes passes a certain threshold. Here are those thresholds:
- Empty tables – Statistics will automatically update when any rows are entered.
- Tables with <= 500 rows – Statistics will automatically update when more than 500 changes (insert/update/delete/merge operations) have occurred
- Tables with > 500 rows – Statistics will automatically update when changes are greater than or equal to 500 + 20% of the table rowcount
Something you might notice is that if you have an extremely large table, it could be a while until statistics automatically update. Fortunately there is a trace flag that can be used in those situations. In SQL Server 2008 R2 SP1, trace flag 2371 was introduced. This trace flag causes that 20% threshold to change dynamically once the row count surpasses ~25,000 rows, meaning you will get more predictable updates based on the same churn.
You can find more on that trace flag in KB #2754171 and in the SAP on SQL Server blog. For more information about when automatic updates occur, Erin Stellato (b/t) wrote a great blog post, Understanding When Statistics Will Automatically Update. There are also some great posts about stats on SQLPerformance.com.
If you enable Auto Update Statistics, there is another setting that you should pay attention to: Auto Update Statistics Asynchronously. This setting determines when the statistics will be updated, and is disabled by default. This means that if the query optimizer sees that the statistics are out of date when compiling a query, the statistics will be updated at runtime and the execution plan recompiled if necessary. Typically, you want updated stats so that an optimal plan is chosen. There may be situations, though, when you don't want the stats updated and plan recompiled at runtime, which forces the triggering query to wait for the update to finish. There can be cases, especially with very large tables, that this could cause timeouts. In those cases, you would want to enable Auto Update Statistics Asynchronously, so that the query that caused the update can continue running (using the old stats) while the stats are updated in the background. You can also set that on a table, index, or stats object using sp_autostats.
Updating statistics manually
Like most things SQL Server related, while you should set up options for actions to happen automatically (like auto update stats and file autogrowth), it's usually best to intentionally manage your indexes. There are three ways outside of automatic updates that stats can be updated. Whenever an index is rebuilt, the associated statistics are also rebuilt with fullscan. Or you can manually update stats using sp_updatestats or UPDATE STATISTICS.
This is basically the hammer method of updating your stats. It runs UPDATE STATISTICS against (almost) all of your statistics objects where there has been one or more change since the last update. I specified almost all, because filtered statistics will not be updated. One note, though: for memory-optimized tables, all statistics will be updated regardless of whether there have been any changes since the last update. Also, if you don't specify a value for @resample, it will use the default sample size. Using @resample = resample will use the last sample rate. While it is probably better than not updating stats at all, it's not the optimal method. For more information about the issues with sp_updatestats, see Erin's posts, Understanding What sp_updatestats Really Updates and Another Reason to Avoid sp_updatestats.
Using UPDATE STATISTICS allows you finer control of what stats you are updating. Obviously you will want to update statistics on columns where there have been a significant number of changes, either in number of values, or distribution of values. As of SQL Server 2008R2 SP2 (or SQL Server 2012 SP1), you can use the DMF sys.dm_db_stats_properties to see how many modifications have occurred in any of the statistics for a particular table. The query can look something like this:
SELECT sp.stats_id, s.name, s.filter_definition, sp.last_updated, sp.[rows], sp.rows_sampled, sp.steps, sp.unfiltered_rows, sp.modification_counter FROM sys.stats AS s CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp WHERE s.[object_id] = OBJECT_ID(N'Sales.SalesOrderDetail');
The results provide information about the last time the stats were updated, the total number of rows vs. the rows sampled in the last update, but most importantly, it shows the number of modifications since that last update.
With this information, you can put together a more focused statistics maintenance strategy, e.g. updating statistics where modifications are greater than X% of the total row count, or even modifications greater than X% of the row count and the stats have not been updated within the last Y days. As with anything else: test, evaluate, re-test, re-evaluate.
I hope that this series on statistics has given you some insight into what stats are and how they affect execution plans.