SentryOne Team Blog (blogs.sentryone.com)

Dynamic Data Masking : My Thoughts

Recently I have been exploring Dynamic Data Masking. Looking at ways that it can be used and I thought I would share my thoughts with you. My initial thoughts are that there is a lot of potential with this feature, however, it needs to have some gaps filled. In both functionality and also from a security perspective with regards to preventing information leakage.

What is Dynamic Data Masking?

Before we dive in and discuss the good, the bad, and the ugly of Dynamic Data Masking, let's take a step back and look at what it actually is. Essentially, Microsoft has given us the ability to obscure the contents of a column for specific users at query time. Some of the key elements of this feature are:

  • Introduced first in Azure SQL Database, later released in SQL Server 2016 for on-premises use.
  • Available in Standard and Enterprise Editions of SQL Server 2016.
  • Data in the tables is stored in the clear.
  • Masking is defined at the column level, the policy is handled globally.
  • Definition of who can and cannot see the data is based on logins/users.
Available Masks

There are four main masking operations that can be applied to columns. These are:

default()

The default() masking function has no configuration options. It simply returns a static default value for the column being queried, based on the datatype that is in use. E.g.

  • Date : 01/01/1900
  • String : "XXXX"
  • Numeric : 0
email()

This function is used to return a mask that indicates the underlying string value is an email address. It does this by taking the first character of the string and then replacing the rest of it with XXX@XXXX.com. It is important to note that the ".com" suffix is constant and no other domains will be shown. Also, this mask can be applied to any (n)char/(n)varchar column, even if it is not storing email addresses. So you could use the mask on names and get a name of JXXX@XXXX.COM as a result.

random()

The random() function is used to return a random number to mask any numeric type; it cannot be used to mask other data types. When configuring this mask, a lower and upper limit for the random number needs to be specified. So when selecting to use this mask over default(), ensure you know the data in the column. To really make this effective you should look to specify a range that has numbers that are not present in the underlying data. Give yourself plenty of extra scope for the random numbers to help reduce potential information leakage.

partial()

partial() is the most user-configurable of the functions available. When configuring the masking function, the number of leading and trailing characters of the string with a custom padding value in the middle.

Dynamic Data Masking, Partial Function

Setting Up Dynamic Data Masking

The setup and configuration of Dynamic Data Masking can be relatively simple, lending itself to being retrofitted into existing systems. This is quite different from Always Encrypted or Row Level Security, which can require a large element of refactoring of the database and/or application.

Usage Scenario

For the rest of this blog post, I will be working with the following scenario:

  • I have an SSRS Server, hosting a number of reports that display information about my SQL Server estate. From performance metrics through to details of failed jobs and poorly performing queries. I want to add an additional layer of security, restricting who can see the names of servers, databases, and other internal infrastructure information. Permission to view these reports will be granted to both support teams and business users, with the business users not being permitted to see the sensitive data.
User Classification

As the Dynamic Data Masking feature is built around the existing SQL Server security model for logins and users. It is important to understand how these users will be connecting to the database or whether the execution context of the queries is being set at execution, via EXECUTE AS. The first hurdle in our scenario is going to be, how we handle the data source configuration. Here we need to think about the fact that we have two collections of users: some that can see all data, and others that will see masked data.

The options for setting SSRS Data Source security Credentials are as follows:

SSRS Security Configuration Options

In this situation, the only two options that are really open to us are "As the user viewing the report" and "By prompting the user viewing the report for credentials." The reason being that if we specify an account or use the Execution Account, then we cannot discern whether to display masked or unmasked data. These two potential configurations are seen below:

SSRS Configuration 1.

SSRS Dynamic Data Masking Config 01

SSRS Configuration 2.

DDM-SSRSConfig02

Note! — In theory, it would be possible to use the second configuration. Though this would then require that a user mapping configuration is created and maintained. This could then be used to identify who should see masked and unmasked data. Creating two database users, one for masked and the other for unmasked query execution, then using EXECUTE AS for the statement, which can then return the appropriate results.

Applying Masks

Applying the masks is a relatively simple task, performed with an ALTER COLUMN statement:

ALTER TABLE dbo.EventSourceConnection
  ALTER COLUMN ServerName
  ADD MASKED WITH (FUNCTION = 'default()');

Once we have done this, all we need to do is create the user in the database and grant them SELECT on the appropriate tables/views needed for the queries they will be running. It is important to ensure that they do not have additional permissions that will allow them to bypass the controls that have been put in place. Details on the permissions that need to be avoided can be found in the Microsoft documentation here.

With the mask in place, if I run the report as an admin user, then I can see the clear text data with the server names in:

Dynamic Data Masking - Unmasked report

Whereas executing the report as a user who only has SELECT permissions on the data means that I get the masked server names back:

Dynamic Data Masking - Masked report

The Niggles

While all of this seems really useful, it is worth noting that there are a few little things that need to be worked through. Once they are sorted then this will be a really powerful option for those of us in the Data Platform space.

Reliance on database principals

By focusing on granting the masking and unmasking of data to database-level principals, applications and other solutions that leverage a service/application account for accessing the database are not natively able to leverage Dynamic Data Masking. In these situations, the data access component would need to be modified to alter the execution context based on logic built into the application.

Unmask, not granular enough

Currently, it is an "All or Nothing" approach to masked/unmasked. With the database principal either seeing none of the data where masks are defined or all of it. There is no granularity to define whether a principal can see data based on tables or columns. The UNMASK permission has database scope:

GRANT UNMASK TO [database_user];

What would I like to see?

I would like to see the granularity enhanced so that database principals can UNMASK either a whole table or even individual columns within a table. How would this help? It would mean that where there are many different properties of the data, then masking can be configured globally. And then as user privilege increases, different levels of masking can be applied.

If you agree with the view that I have, then please feel free to review and upvote/downvote (and comment) on the Connect item that I have raised for this.

Information Leakage – Statistics

SQL Server 2012, altered the permission set required to use the DBCC SHOW_STATISTICS() command. In SQL Server 2008 R2 and prior, you required membership of the db_owner fixed database role. As of SQL Server 2012, this was expanded to include owners of the table or members of ddl_admin. Then with SQL Server 2012 SP1 this was expanded so that the principal executing the command only needed SELECT permission on all of the columns included in the statistics object.

Now, unlike Row Level Security, Dynamic Data Masking does not prevent masked users from running DBCC SHOW_STATISTICS() against statistics objects that have columns with masks configured. This means that there are potentially up to 200 values that can be seen via the statistics object, that would otherwise not be visible. As such it is not wise to grant SELECT permissions directly on tables if you want to use this feature. I would strongly suggest the use of views and procedures, meaning there is no way for the masked user to have permissions to the statistics object directly.

This behaviour can be disabled via Trace Flag 9485. However, when you are looking at Azure SQL Database, this is not an option.

What would I like to see?

I would like to see an implementation where checks are made to see if any columns in the statistics object have masks configured, then whether the principal executing the statement is allowed to see the unmasked data or not. If they cannot see the unmasked data then they should receive an error. This differs slightly from the Row Level Security implementation due to the scope of the mask being on the column rather than the row.

Again, I have opened a Connect Item and would appreciate any comments/votes.

Information Leakage – Predicates

This is an interesting one: information leakage via inference. Even though the data is masked in the query result, the user can supply a value to filter on for columns that they are not permitted to see the value for. As such, they would receive a set of data that conforms to the filter provided, as such then now know the value that sits behind the mask. So effectively they have circumvented the masking, needing simply to perform repeated queries supplying multiple filter values to build up a view of the data.

What would I like to see?

This is a difficult one. Ideally, I would like to have users who are configured not to see the data be unable to filter on that column. But that would mean that there would also be a need to establish a design pattern to allow for filtering. Because running queries without a WHERE clause is not a great idea.

There is also a facet to this in that if you have SQL Server configured to automatically create statistics. If there are none on the masked column and the user specified the column in a filter, SQL Server would then generate the statistics object. Which then brings us back to the previous point about limiting who can access the statistics objects. As this could represent a way of finding a sample of the data which can then help build filter values for a brute force attack.

Yup, you guessed it. I have raised another connect item asking for this to be looked at. If you have the time to review and comment/vote it would be much appreciated.

In Summary

So, while I have called out a few issues with this current implementation of Dynamic Data Masking, I think that with a few tweaks it can be a really useful and powerful addition to our toolset. Do I think that it is ready for widespread use? No. Do I think that it can be applied now in specific scenarios? Yes.

Where you have a system that has access controlled via Windows/SQL Authentication, and that you can abstract the tables behind views, procedures, or an application, then I think it can be very useful. I can see a lot of potential behind reporting systems such as PowerBI, SSRS, and others. Where you have the ability to mask the filters via an aliasing mechanism, so that the user cannot see the actual filter value bur rather the alias that you want them to see.

I'm going to keep working on this, so keep your eyes peeled for some future blog posts going into detail on possible implementation scenarios.

4 replies on “Dynamic Data Masking : My Thoughts”

4 Comments (Comments are now closed.)
  1. The work of dynamic data masking is to protect personally identifiable data. Dynamic data masking does not require any additional server resources.

  2. Native Dynamic Data Masking in MSSQL Server was introduced in version 2016. And like in any generic tool that database vendor tries to fill the check box it covers just basic features. Like in monitoring.
    You may want to check dynamic data masking from 3rd party vendors. For instance DataSunrise. http://www.datasunrise.com/dynamic-data-masking/

    Integrated with blocking and data auditing functionality, it also includes the required granularity plus ability to implement your own masking algorithm

  3. Good day,

    You are talking about "Information Leakage – Statistics", but you forget that any user with select permission can see all the data unmask with simple actions, as I show in my blog!

    SQL Server 2016 – Crack the masking:
    http://ariely.info/Blog/tabid/83/EntryId/182/SQL-Server-2016-Crack-the-masking.aspx

    * There is shorter version (but cleaner) as an article at the TechNet WIKI.
    ** Same idea of exposing data I presented at the Israeli User Group about Row Level Security (the presentation and code can be download from Microsoft gallery).
    *** There is a nice discussion about my blog post at linkedin.

    :-)

    1. Ronen

      Thanks for your comment, I had not forgotten, rather took a different approach. Details on how to circumvent via T-SQL queries are documented already. As you demonstrate well in your post, it is possible to derive the masked values via other means due to joins/filtering.

      I think we can safely say that for a V1 feature that it has a lot of potential if some of these issues are fixed. At the moment I would use it behind an application/reporting solution where there is no direct T-SQL access for the user.

      Thanks

      JQ