Dynamic Data Masking : My Thoughts
Recently I have been exploring Dynamic Data Masking. Looking at ways that it can be used and I thought I would share my thoughts with you. My initial thoughts are that there is a lot of potential with this feature, however, it needs to have some gaps filled. In both functionality and also from a security perspective with regards to preventing information leakage.
Before we dive in and discuss the good, the bad, and the ugly of Dynamic Data Masking, let's take a step back and look at what it actually is. Essentially, Microsoft has given us the ability to obscure the contents of a column for specific users at query time. Some of the key elements of this feature are:
- Introduced first in Azure SQL Database, later released in SQL Server 2016 for on-premises use.
- Available in Standard and Enterprise Editions of SQL Server 2016.
- Data in the tables is stored in the clear.
- Masking is defined at the column level, the policy is handled globally.
- Definition of who can and cannot see the data is based on logins/users.
There are four main masking operations that can be applied to columns. These are:
default() masking function has no configuration options. It simply returns a static default value for the column being queried, based on the datatype that is in use. E.g.
- Date : 01/01/1900
- String : "XXXX"
- Numeric : 0
This function is used to return a mask that indicates the underlying string value is an email address. It does this by taking the first character of the string and then replacing the rest of it with
XXX@XXXX.com. It is important to note that the ".com" suffix is constant and no other domains will be shown. Also, this mask can be applied to any (n)char/(n)varchar column, even if it is not storing email addresses. So you could use the mask on names and get a name of
JXXX@XXXX.COM as a result.
random() function is used to return a random number to mask any numeric type; it cannot be used to mask other data types. When configuring this mask, a lower and upper limit for the random number needs to be specified. So when selecting to use this mask over
default(), ensure you know the data in the column. To really make this effective you should look to specify a range that has numbers that are not present in the underlying data. Give yourself plenty of extra scope for the random numbers to help reduce potential information leakage.
partial() is the most user-configurable of the functions available. When configuring the masking function, the number of leading and trailing characters of the string with a custom padding value in the middle.
The setup and configuration of Dynamic Data Masking can be relatively simple, lending itself to being retrofitted into existing systems. This is quite different from Always Encrypted or Row Level Security, which can require a large element of refactoring of the database and/or application.
For the rest of this blog post, I will be working with the following scenario:
- I have an SSRS Server, hosting a number of reports that display information about my SQL Server estate. From performance metrics through to details of failed jobs and poorly performing queries. I want to add an additional layer of security, restricting who can see the names of servers, databases, and other internal infrastructure information. Permission to view these reports will be granted to both support teams and business users, with the business users not being permitted to see the sensitive data.
As the Dynamic Data Masking feature is built around the existing SQL Server security model for logins and users. It is important to understand how these users will be connecting to the database or whether the execution context of the queries is being set at execution, via EXECUTE AS. The first hurdle in our scenario is going to be, how we handle the data source configuration. Here we need to think about the fact that we have two collections of users: some that can see all data, and others that will see masked data.
The options for setting SSRS Data Source security Credentials are as follows:
In this situation, the only two options that are really open to us are "As the user viewing the report" and "By prompting the user viewing the report for credentials." The reason being that if we specify an account or use the Execution Account, then we cannot discern whether to display masked or unmasked data. These two potential configurations are seen below:
SSRS Configuration 1.
SSRS Configuration 2.
Note! — In theory, it would be possible to use the second configuration. Though this would then require that a user mapping configuration is created and maintained. This could then be used to identify who should see masked and unmasked data. Creating two database users, one for masked and the other for unmasked query execution, then using EXECUTE AS for the statement, which can then return the appropriate results.
Applying the masks is a relatively simple task, performed with an
ALTER COLUMN statement:
ALTER TABLE dbo.EventSourceConnection ALTER COLUMN ServerName ADD MASKED WITH (FUNCTION = 'default()');
Once we have done this, all we need to do is create the user in the database and grant them SELECT on the appropriate tables/views needed for the queries they will be running. It is important to ensure that they do not have additional permissions that will allow them to bypass the controls that have been put in place. Details on the permissions that need to be avoided can be found in the Microsoft documentation here.
With the mask in place, if I run the report as an admin user, then I can see the clear text data with the server names in:
Whereas executing the report as a user who only has SELECT permissions on the data means that I get the masked server names back:
While all of this seems really useful, it is worth noting that there are a few little things that need to be worked through. Once they are sorted then this will be a really powerful option for those of us in the Data Platform space.
By focusing on granting the masking and unmasking of data to database-level principals, applications and other solutions that leverage a service/application account for accessing the database are not natively able to leverage Dynamic Data Masking. In these situations, the data access component would need to be modified to alter the execution context based on logic built into the application.
Currently, it is an "All or Nothing" approach to masked/unmasked. With the database principal either seeing none of the data where masks are defined or all of it. There is no granularity to define whether a principal can see data based on tables or columns. The
UNMASK permission has database scope:
GRANT UNMASK TO [database_user];
I would like to see the granularity enhanced so that database principals can
UNMASK either a whole table or even individual columns within a table. How would this help? It would mean that where there are many different properties of the data, then masking can be configured globally. And then as user privilege increases, different levels of masking can be applied.
If you agree with the view that I have, then please feel free to review and upvote/downvote (and comment) on the Connect item that I have raised for this.
SQL Server 2012, altered the permission set required to use the
DBCC SHOW_STATISTICS() command. In SQL Server 2008 R2 and prior, you required membership of the db_owner fixed database role. As of SQL Server 2012, this was expanded to include owners of the table or members of
ddl_admin. Then with SQL Server 2012 SP1 this was expanded so that the principal executing the command only needed SELECT permission on all of the columns included in the statistics object.
Now, unlike Row Level Security, Dynamic Data Masking does not prevent masked users from running
DBCC SHOW_STATISTICS() against statistics objects that have columns with masks configured. This means that there are potentially up to 200 values that can be seen via the statistics object, that would otherwise not be visible. As such it is not wise to grant SELECT permissions directly on tables if you want to use this feature. I would strongly suggest the use of views and procedures, meaning there is no way for the masked user to have permissions to the statistics object directly.
This behaviour can be disabled via Trace Flag 9485. However, when you are looking at Azure SQL Database, this is not an option.
I would like to see an implementation where checks are made to see if any columns in the statistics object have masks configured, then whether the principal executing the statement is allowed to see the unmasked data or not. If they cannot see the unmasked data then they should receive an error. This differs slightly from the Row Level Security implementation due to the scope of the mask being on the column rather than the row.
Again, I have opened a Connect Item and would appreciate any comments/votes.
This is an interesting one: information leakage via inference. Even though the data is masked in the query result, the user can supply a value to filter on for columns that they are not permitted to see the value for. As such, they would receive a set of data that conforms to the filter provided, as such then now know the value that sits behind the mask. So effectively they have circumvented the masking, needing simply to perform repeated queries supplying multiple filter values to build up a view of the data.
This is a difficult one. Ideally, I would like to have users who are configured not to see the data be unable to filter on that column. But that would mean that there would also be a need to establish a design pattern to allow for filtering. Because running queries without a WHERE clause is not a great idea.
There is also a facet to this in that if you have SQL Server configured to automatically create statistics. If there are none on the masked column and the user specified the column in a filter, SQL Server would then generate the statistics object. Which then brings us back to the previous point about limiting who can access the statistics objects. As this could represent a way of finding a sample of the data which can then help build filter values for a brute force attack.
Yup, you guessed it. I have raised another connect item asking for this to be looked at. If you have the time to review and comment/vote it would be much appreciated.
So, while I have called out a few issues with this current implementation of Dynamic Data Masking, I think that with a few tweaks it can be a really useful and powerful addition to our toolset. Do I think that it is ready for widespread use? No. Do I think that it can be applied now in specific scenarios? Yes.
Where you have a system that has access controlled via Windows/SQL Authentication, and that you can abstract the tables behind views, procedures, or an application, then I think it can be very useful. I can see a lot of potential behind reporting systems such as PowerBI, SSRS, and others. Where you have the ability to mask the filters via an aliasing mechanism, so that the user cannot see the actual filter value bur rather the alias that you want them to see.
I'm going to keep working on this, so keep your eyes peeled for some future blog posts going into detail on possible implementation scenarios.