Data Compression in the SQL Sentry Database, Part 1

Before I jump into this blog post, which is my first post at SQL Sentry, I would like to take a brief moment to introduce myself. Hello, reader! I am Melissa; you may know me from PASS events as the person running the Maine chapter (where you were surprised to learn about non-lobster-related careers existing in Maine), or you may have stumbled across me on Twitter over the years as @MelikaNoKaOi, which is clearly my stage name ("All the world's a stage", after all). I started working on Special Projects (or SpecialOps, as I say when I am trying to impress my friends) here at SQL Sentry in April. One of my first projects involved analyzing the use of SQL Server's data compression feature with the SQL Sentry database.

Many helpful and knowledgeable bloggers out there have posted about SQL Server’s data compression feature. While I will go over some of the very basics, this series of posts is primarily concerned with showing you how you can best implement data compression with your SQL Sentry repository database and what to expect for changes in performance and resource consumption. I hope that walking you through various stages of my testing and analysis will also give you some ideas on how to apply data compression to one of your other databases, but that will not be the focus here. If you are looking for more in-depth coverage on data compression, I recommend this SQL Server Technical Article from Microsoft titled, "Data Compression: Strategy, Capacity Planning, and Best Practices".

The data compression feature is limited to the Enterprise Edition of SQL Server. This is still true in SQL Server 2014 and appears to remain that way in SQL Server 2016. I wanted to get that reminder out of the way in the beginning so as not to disappoint anyone at the end of all of this. I suppose I am a "bad news first" kind of person. It's best to go out on a high note, right? That will be my approach to this series.

When considering data compression, two thoughts seem to come to mind almost instantly: "this will save disk space" and "this will be a major hit to CPU." Both of those statements can be true, but if done properly, you can minimize the impact to CPU; if done improperly, you can actually waste more disk space. I will show you an example of that in the SQL Sentry database as we continue. I will also highlight a couple of benefits beyond saving disk space.

When I first started exploring compression in a test SQL Sentry database, I applied it to the top 50 largest indexes by row count to get some numbers for overhead. That roughly equated to applying compression to 7% of the indexes in the database. I am not advocating the blind application of compression in such a manner; I was exploring in a test environment at this point to satisfy some curiosity. One of the many benefits of working in a test environment is that you can do something like that right away, look for negative side effects, and make note of things that might require more attention later on in the testing and fine-tuning phase.
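If you want to see which indexes those would be in your own environment, a query along these lines will list them. This is a sketch using the catalog views sys.tables, sys.indexes, and sys.partitions; it is not the exact query I used, and it considers only user tables:

```sql
-- Sketch: find the 50 largest rowstore indexes by row count.
SELECT TOP (50)
    s.name      AS schema_name,
    t.name      AS table_name,
    i.name      AS index_name,
    SUM(p.rows) AS row_count
FROM sys.tables t
JOIN sys.schemas s    ON s.schema_id  = t.schema_id
JOIN sys.indexes i    ON i.object_id  = t.object_id
JOIN sys.partitions p ON p.object_id  = i.object_id
                     AND p.index_id   = i.index_id
GROUP BY s.name, t.name, i.name
ORDER BY SUM(p.rows) DESC;
```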

For those unfamiliar with compression, you must choose between row compression and page compression when you apply it. Row compression saves space by storing data more efficiently at the row level; page compression adds prefix and dictionary compression on top of the efficiency attained with row compression, which can yield significantly greater space savings.
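Either option is applied as part of a rebuild. As an illustrative example (the object names here are placeholders, not actual SQL Sentry objects):

```sql
-- Row compression on a single index (placeholder names).
ALTER INDEX IX_Example ON dbo.ExampleTable
REBUILD WITH (DATA_COMPRESSION = ROW);

-- Page compression on an entire table (its heap or clustered index).
ALTER TABLE dbo.ExampleTable
REBUILD WITH (DATA_COMPRESSION = PAGE);
```

Since compression requires Enterprise Edition anyway, you also have the option of adding ONLINE = ON to the WITH clause to keep the object available during the rebuild.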

A look at sp_spaceused after applying compression to the top 50 indexes by row count showed about a 42% space savings in data using row compression and a 61% space savings in data when using page compression.

Figure 1-A showing sp_spaceused in MB for the SQL Sentry test database

The unallocated space in this test database increased from 77GB to 109GB between no compression and page compression. Now the question is what to do with all of that extra space. You might be tempted to shrink your database; you might even have requirements to do so. I am going to refer you to Paul Randal's blog post, "Why you should not shrink your data files", and strongly suggest that, if you choose to shrink your database, you perform a proper reorganize as Paul recommends.
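If you do decide you must reclaim the space, one lower-risk option is to release only the unused space at the end of the data file, which moves no pages and therefore does not fragment your indexes. The logical file name below is a placeholder for your own:

```sql
-- TRUNCATEONLY releases free space at the end of the file
-- without relocating pages (and without fragmenting indexes).
DBCC SHRINKFILE (N'YourDataFileLogicalName', TRUNCATEONLY);
```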

A quick point about the space savings achieved through data compression: it is not the same compression used by backup compression. If you normally compress your backups, the backups with the added data compression will compress a bit more. If you do not compress your backups, then the backups using data compression may be significantly smaller. The following charts are examples of backup sizes from my test database.
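For reference, backup compression is enabled per backup (or via the server-level default), entirely independently of data compression. The database name and path here are placeholders:

```sql
-- Backup compression is separate from data compression;
-- the two can be combined.
BACKUP DATABASE YourDatabase
TO DISK = N'X:\Backups\YourDatabase.bak'
WITH COMPRESSION, INIT;
```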

Figure 1-B showing database backup files (with backup compression) in MB for the SQL Sentry test database

Figure 1-C showing database backup files (without backup compression) in MB for the SQL Sentry test database

Others, including Aaron Bertrand and Denis Gobo, have pointed out the space and time savings you can achieve from using backup compression even if your data is already compressed.

At this point, we know from the overall space savings that at least some of those indexes in the top 50 compress well. It would be useful to see that size difference for individual tables and indexes. One of the first things you can do when considering data compression for a table is to execute the sp_estimate_data_compression_savings procedure to see how well the data will compress. We will start in SQL Sentry by using one of the larger tables in a production database, EventSourceHistoryDetail.

EXEC sys.sp_estimate_data_compression_savings 
  @schema_name      = N'dbo', 
  @object_name      = N'EventSourceHistoryDetail',
  @index_id         = NULL,
  @partition_number = NULL, 
  @data_compression = N'ROW';

T-SQL to estimate savings with row compression

EXEC sys.sp_estimate_data_compression_savings 
  @schema_name      = N'dbo', 
  @object_name      = N'EventSourceHistoryDetail',
  @index_id         = NULL,
  @partition_number = NULL, 
  @data_compression = N'PAGE';

T-SQL to estimate savings with page compression

I used the size_with_current_compression_setting(KB) and size_with_requested_compression_setting(KB) columns returned by the above statements to build a more suitable table for my testing and reporting:
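To make those columns easier to work with, you can capture the procedure's result set into a temp table and compute the percent savings yourself. This is a sketch; the temp table below matches the procedure's documented output columns, but this is not the exact table I built for my reporting:

```sql
-- Capture the estimate output so we can sort and compute "% savings".
CREATE TABLE #estimates
(
    object_name      sysname,
    schema_name      sysname,
    index_id         int,
    partition_number int,
    [size_with_current_compression_setting(KB)]          bigint,
    [size_with_requested_compression_setting(KB)]        bigint,
    [sample_size_with_current_compression_setting(KB)]   bigint,
    [sample_size_with_requested_compression_setting(KB)] bigint
);

INSERT #estimates
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'EventSourceHistoryDetail',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = N'PAGE';

SELECT object_name,
       index_id,
       [size_with_current_compression_setting(KB)]   AS current_kb,
       [size_with_requested_compression_setting(KB)] AS requested_kb,
       100.0 * (1 - [size_with_requested_compression_setting(KB)] * 1.0
                    / NULLIF([size_with_current_compression_setting(KB)], 0))
           AS pct_savings
FROM #estimates
ORDER BY current_kb DESC;

DROP TABLE #estimates;
```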

Figure 1-D showing size with row compression for each index in the EventSourceHistoryDetail table in the SQL Sentry test database

Figure 1-E showing size with page compression for each index in the EventSourceHistoryDetail table in the SQL Sentry test database

You can see that index-by-index, page compression provides a significant "% savings" over row compression, but this information alone does not tell us whether we should apply either method of compression to these indexes. I will go into the details of making those decisions in the third post of this series.

Next, we will survey the PerformanceAnalysisPlan table because I promised you an example of poor compression.

EXEC sys.sp_estimate_data_compression_savings 
  @schema_name      = N'dbo', 
  @object_name      = N'PerformanceAnalysisPlan',
  @index_id         = NULL,
  @partition_number = NULL, 
  @data_compression = N'ROW';

T-SQL to estimate savings with row compression

EXEC sys.sp_estimate_data_compression_savings 
  @schema_name      = N'dbo', 
  @object_name      = N'PerformanceAnalysisPlan',
  @index_id         = NULL,
  @partition_number = NULL, 
  @data_compression = N'PAGE';

T-SQL to estimate savings with page compression

Figure 1-F showing size with row compression for each index in the PerformanceAnalysisPlan table in the SQL Sentry test database

Figure 1-G showing size with page compression for each index in the PerformanceAnalysisPlan table in the SQL Sentry test database

Notice the overall lack of space savings, the increased size on IX_PerformanceAnalysisPlan_PlanH, and the fact that the highest "% savings" comes from the smallest indexes. These indexes would not be good candidates for data compression. Imagine the damage you could inflict by simply applying compression to your largest tables without looking at this information first. All of the overhead, with none of the savings!

In my second post, I will show you some examples of overhead created by data compression. The third post will go over some of the benefits achieved with compression and walk through the decision-making process for applying compression to specific indexes. Finally, the fourth post will provide you with the scripts necessary to apply that compression to your SQL Sentry database, as well as some recommendations on the best way to execute them.

Thank you for reading my first post and I hope you enjoy the series as we move forward.
