Delayed Durability while Purging Data - SentryOne Team Blog

Delayed Durability while Purging Data

Delayed Durability Background

It is January, which means a new year, a new you, time to purge everything from your closet to your pantry, and perhaps your database, too! This whole data purge related topic at the beginning of the year is nothing more than a happy little accident, but I thought I would roll with it just now.

Microsoft introduced the Delayed Durability feature in SQL Server 2014 as a potential way to reduce latency and increase performance, but with an increased risk of data loss (up to about 7MB in a perfect storm, worst-case scenario). If you are not familiar with this feature, I invite you to read Aaron Bertrand’s detailed blog post regarding Delayed Durability before continuing with this one.

Despite Delayed Durability having been available for a couple of years, I personally have not talked about it with anyone who is actually using it in production. I suspect this is because people are uncomfortable with an increased risk of data loss, which is understandable, since data loss has been associated with an increased risk of job loss. To be fair, my perception of it as a rarely-used feature could easily be because I do not talk to that many people. Have I mentioned that my social network consists mostly of squirrels?

Recently, while considering possible use cases for Delayed Durability, it occurred to me that data loss might be entirely acceptable in cases where the data would not truly be lost. I have worked with a number of applications that have processes that purge old information from the database. If a purge process failed in these applications, data would simply live a little bit longer, and be purged the next time the process was successful – they have a recovery mechanism built in as it is. I decided to test Delayed Durability in a database with a long-running purge to observe the potential performance impact. I chose a process that was clearly contributing to transaction log waits, because that is where the real performance impact comes from when delaying durability. If you do not have notable waits or some level of a bottleneck there, you are not likely to improve anything simply by turning on this feature.

Now, because I wanted to look at the impact of utilizing this on a specific purge process, I set Delayed Durability to Allowed within the database options, and updated the specific transaction to commit with Delayed Durability set to On. You can view the specifics of how to do this in Aaron's post as well as the Control Transaction Durability entry on MSDN.

Testing with SQL Sentry Performance Advisor

Testing this process was rather simple. I had a test server where I executed the stored procedure, and monitored that server with SQL Sentry Performance Advisor. I was able to watch my stored procedure execute within the Top SQL tab to get some statistics, including the basics like start/end time and duration.

On the Performance Advisor Dashboard, I used the start and end times from Top SQL to set the window for my baseline:

showing start and end times on Performance Advisor DashboardFigure 1 showing start and end times on Performance Advisor Dashboard

I highlighted that time period on the Dashboard, and right-clicked to create both a Quick Report and a Baseline:

create quick report and baselineFigure 2 showing how to create reports and baselines from the Performance Advisor Dashboard

I used the values calculated in the Baselines and Quick Reports for my comparisons:

baseline valuesFigure 3 showing values calculated for the Baseline

quick reportFigure 4 showing values calculated for the wait stats via Quick Report

You can see how quickly you can collect and compare your performance metrics when using Performance Advisor.

Delayed Durability Test Results

The purge process that I tested in this example had multiple transactions that required a change to commit with Delayed Durability set to on. As shown below, the process took about ninety minutes to complete when Delayed Durability was Disabled in the database and approximately sixty-six minutes to complete when Delayed Durability was Allowed and turned on for specific transactions.

26% savings in total duration with delayed durabilityFigure 5 showing Delayed Durability providing a 26% savings in total duration

As one would probably expect, there is a very similar percentage of savings between duration and disk-related waits when using Delayed Durability. If you have these types of waits during your purge process, then it is possible that this change will assist in reducing them.

25% savings in disk-related waits with delayed durabilityFigure 6 showing Delayed Durability providing a 25% savings in disk-related waits

The disk waits related specifically to the transaction log show a 48% improvement when delaying durability.

48% savings in transaction log related waits with delayed durabilityFigure 7 showing Delayed Durability providing a 48% savings in transaction log related waits

There is also a 48% decrease in the average number of transaction log flushes per second when delaying durability.

48% savings in average log flushes per second with delayed durabilityFigure 8 showing Delayed Durability providing a 48% savings in average log flushes per second

In this specific process, environment, and overall scenario that I have tested, I saw a noteworthy performance improvement when delaying durability during a significant purging of data. If you have a purge or even an archive-type process where you would not actually lose data when delaying durability, and could restart the process if it should fail, I would encourage you to try this in your test environment. What do you have to lose? ;-)

While one might expect to see a performance improvement by using this SQL Server feature, it still falls under the omnipresent umbrella of "it depends." Trust that it works in at least one scenario for me, but verify that it also works in yours because that is what really matters, for you anyway. If you are fortunate enough to have SQL Sentry Performance Advisor, then testing this out will be particularly easy for you with the baseline feature to collect those metrics and calculate the averages for you. It certainly makes my life easier when performing these types of tests.

SQL Sentry Example

For a real use case, we have background purge processes in the SQL Sentry database that simply delete old data. If these transactions were lost due to a power outage or some other catastrophe, it wouldn't be the end of the world; those same rows would end up being deleted the next time the purge process ran. If you have SQL Sentry Performance Advisor, SQL Server version 2014 or higher, and would like to enable Delayed Durability in your SQL Sentry database, please follow the instructions in this example.

--Step 1
--You need to allow Delayed Durability for the SQL Sentry database
ALTER DATABASE dbname SET DELAYED_DURABILITY = ALLOWED;

Altering the database options

This may also be changed or checked via Database Properties –> Options in SQL Server Management Studio.

SSMS Database Properties OptionsFigure 9 showing Delayed Durability setting in SSMS

The stored procedure that we'll be altering in this example is called [dbo].[PurgePerformanceAnalysisData]. This particular purge process was chosen based on its transaction log usage pattern and observed benefits when allowing Delayed Durability. You should make a backup copy of this stored procedure before proceeding. You could do this by right-clicking on the stored procedure in SSMS, and choosing to "Script Stored Procedure as" -> "Create To" -> "File" to create a script of the original.

SSMS Creating backup copy of stored procedureFigure 10 showing how to send the stored procedure to a create file

SSMS Save file dialog for original stored procedureFigure 11 showing the save file dialog for original stored procedure

Once you have verified that your backup has been created, you are ready to alter the stored procedure with the Delayed Durability changes. There are four transactions in this [dbo].[PurgePerformanceAnalysisData] stored procedure which will need to be wrapped in an explicit transaction to commit with delayed durability.

SSMS Modify the stored procedureFigure 12 showing how to modify the stored procedure via SSMS

For each transaction, you will be adding a line before the code block:

BEGIN TRANSACTION [DelayedDurabilityOnTransactionName]

Opening line for wrapping the transaction

As well as a line after the code block:

COMMIT TRANSACTION [DelayedDurabilityOnTransactionName] WITH (DELAYED_DURABILITY = On  )

Closing line for wrapping the transaction

Below are the four blocks of code you will need to find and wrap:

--First Transaction
--Located around line 260
BEGIN TRANSACTION [DelayedDurabilityOn]
	EXEC sp_executesql
		@DynamicSQL
		,@SQLParams
		,@Timestamp = @Timestamp
		,@CounterID = @CounterID
	SET @DeleteRowCount = @@rowcount
	SET @DeleteRowCountTotal = @DeleteRowCountTotal + @DeleteRowCount
 
	SET @CountersTableRowID = @CountersTableRowID + 1
COMMIT TRANSACTION [DelayedDurabilityOn] WITH (DELAYED_DURABILITY = On  )

Altering the first transaction for delayed durability

--Second Transaction
--Located around line 285, if counting these comments
BEGIN TRANSACTION [DelayedDurabilityOn2]
	EXEC (@DynamicSQL)
	SET @DeleteRowCount = @@rowcount
	SET @DeleteRowCountTotal = @DeleteRowCountTotal + @DeleteRowCount
COMMIT TRANSACTION [DelayedDurabilityOn2] WITH (DELAYED_DURABILITY = On  )

Altering the second transaction for delayed durability

--Third Transaction
--Located around line 366, if counting these comments
BEGIN TRANSACTION [DelayedDurabilityOn3]
	EXEC sp_executesql
		@DynamicSQL
		,@SQLParams
		,@Timestamp = @Timestamp
	SET @DeleteRowCount = @@rowcount
	SET @DeleteRowCountTotal = @DeleteRowCountTotal + @DeleteRowCount
 
	SET @TimestampsTableRowID = @TimestampsTableRowID + 1
COMMIT TRANSACTION [DelayedDurabilityOn3] WITH (DELAYED_DURABILITY = On  )

Altering the third transaction for delayed durability

--Fourth Transaction
--Located around line 404, if counting these comments
BEGIN TRANSACTION [DelayedDurabilityOn4]
	EXEC sp_executesql
		@DynamicSQL
		,@SQLParams
		,@PurgeBeforeTimestamp = @PurgeBeforeTimestamp
		,@DeleteRowCountTotal = @DeleteRowCountTotal
		,@CategoryID = @CategoryID
	SET @CategoryTableRowID = @CategoryTableRowID + 1
COMMIT TRANSACTION [DelayedDurabilityOn4] WITH (DELAYED_DURABILITY = On  )

Altering the fourth transaction for delayed durability

Once you execute these changes, you will be able to take advantage of Delayed Durability in your SQL Sentry database. Unfortunately, you may need to re-apply these changes after a future upgrade. Fortunately, we will be adding these changes to our product so eventually they will be built in and you won't need to make any modifications to the stored procedure to benefit from Delayed Durability.

Comments ( 11 )

          • Andrej Kuklin says:

            Yes, I think it was indeed the question of definition. My definition of the data loss was from the client point of view: "The transaction was successfully committed, but the data is not in my user tables". I think this is also what most people would expect :). But now I understand what Melissa and you meant by max. 7Mb irrecoverable data loss.

          • Andrej Kuklin says:

            It's only the statement "risk of data loss (up to about 7MB in a perfect storm, worst-case scenario)" in the article I'm arguing about. Of course, DD has its use cases and the example in the article is in my opinion one of the good ones.

          • Aaron Bertrand says:

            I still think we're talking about two different levels of "data loss."

            Yes, you can have a transaction > 7MB that gets rolled back during recovery in the event of a crash.

            Without delayed durability, because SQL Server uses write-ahead logging, you can get that information back (as long as the log file is still available, and hasn't been corrupted as a result of the crash or other attempts to retrieve it).

            With delayed durability, you can lose up to around 7MB of data completely, if that data is in the log buffers but not yet written to disk at the time of the crash. The log buffers can't contain more than about 7MB of data, so for a bigger transaction, anything more than 7MB should have already been written to the log, and therefore can be recovered just like any other rolled back transaction. And I think it would be very hard for you to even get to that 7MB threshold.

            If you have a transaction > 7MB and a crash happens before any of that data even gets written to the log, then the data loss you're talking about (which can be recovered from the log) is no different with or without delayed durability.

          • Andrej Kuklin says:

            Hello Aaron,

            It took me some substantial time to answer your question :)
            I've not seen it in production yet, but data loss of more than 7Mb is relatively easy to reproduce with Hyper-V.
            Shameless plug: I've written a blog post illustrating the steps https://www.sdx-ag.de/2017/02/delayed-durability-der-mythos-ber-die-grenze-des-maximalen-datenverlusts. The post is in German, but as with almost any technical text you can get pretty far using Google Translate.
            The steps in brief:
            1. Create a virtual machine with 2 virtual hard drives.
            2. In the VM create a database placing data file on the first hard drive and transaction log file on the second.
            3. Limit IOPS bandwidth for the second hard drive to some really small value in order to have enough time to simulate system crash.
            4. Start a delayed durability transaction producing more than 7Mb transaction log.
            5. Wait for transaction to be signaled as committed to the client (LOP_COMMIT_XACT record is not hardened yet due to slow IO).
            6. Shut down the virtual machine -> after restart your transaction is rolled back during recovery, the data is lost.

          • Aaron Bertrand says:

            Ok, but look at the trouble you had to go through to make that happen. I'm sure I could go to similar great lengths to intentionally cause more than 7MB of data loss *without* delayed durability, too. (I don't know that I would ever consider rollbacks due to crash/recovery "data loss," either.)

            Remember, the use case recommended here is for *deleting* data. If you lose those transactions, who cares?

          • Simon Harvey says:

            We actually considered using it recently for a data warehouse load where we clear down the data each evening and do almost a full rebuild. In this sense any data loss would be very easily mitigated by running the job again.

            We actually decided against it in the end largely because although it made a performance improvement of about 30%, it didn't make enough of a difference in our scenario to be worth removing the airbags. Had it reduced an 8 hour job down to 4 hours it might have been worth a punt, but at the moment it would be reducing a 1 hour job down to 45 minutes. I'd rather have my airbags switched on :-)

            I wrote about when and why you might want to use delayed durability here:
            https://rule30.wordpress.com/2014/03/16/delayed-durability-in-sql-server-2014-part-3-when-should-i-use-it/

          • James Lupolt says:

            Hi there, the first screenshot looks different to me. Is this from 9.1?

          • Aaron Bertrand says:

            Hi James, do you just mean the fonts? I think they're smoothed quite differently on retina / 4K / HiDPI displays.

          • Andrej Kuklin says:

            "Data loss up to about 7MB in a perfect storm, worst-case scenario" is a common misconception. Actually the data loss can be arbitrary large. Imagine that log block contained the LOP_COMMIT_XACT log record for a long-running transaction and it wasn't written to the disk when system crashed. In this case the whole transaction will be rolled back during crash recovery

          • Aaron Bertrand says:

            Andrej, is this theoretical, or are you sure this could happen? Have you reproduced a data loss of > 7 MB? In any case, Melissa certainly isn't advocating the feature for any use cases that can't tolerate *any* data loss at all.

          • Kenneth Fisher says:

            Another good use would be during a batch process where if it fails you are going to have to restore back to the beginning of the process regardless.

          Leave A Comment

          Your email address will not be published.

          This site uses Akismet to reduce spam. Learn how your comment data is processed.