Dynamic Data Masking in SQL Server for Advanced Users

Total: 2 Average: 5

This article gives a high-level understanding of dynamic data masking in SQL Server along with its use cases, best practices and security implications for advanced SQL users (including developers and testers). Additionally, the readers of this article are going to get familiar with applying different types of dynamic data masking. The article also highlights the importance of advanced-level use of data masking in day-to-day database development and testing tasks.

Pre-requisites

Let’s go through the pre-requisites for this article first.

T-SQL Familiarity

This article assumes that the readers are well familiar with T-SQL scripts and can comfortably write and run SQL queries to view and manipulate the SQL databases.

Basics of Dynamic Data Masking

This article also assumes that the readers know the basic concepts of dynamic data masking in SQL Server. Please refer to the article Data Masking in SQL Server for beginners to get yourself familiar with the basics of Dynamic data masking if you have not done it yet.

Azure SQL database or SQL Server 2016 Compatibility

Dynamic data masking feature is available in SQL Server 2016 through SQL Server 2019 so it is highly recommended to have one of the following:
1. Azure SQL database
2. SQL Server 2016 installed locally or remotely.

Setting up Sample Database

Please remember to create a sample database using the following T-SQL script compatible with SQL Server 2016 or as Azure SQL Database if you would like to follow the walkthrough in this article:

-- Create sample database ITSalesV2
CREATE DATABASE ITSalesV2;
GO

USE [ITSalesV2]

-- (2) Create MonthlySale table
CREATE TABLE [dbo].[MonthlySale](
	[SaleId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
	[SellingDate] [datetime2](7) NULL,
	[Customer] [varchar](50) NULL,
	[Email] [varchar] (200) NULL,
	[Product] [varchar](150) NULL,
	[TotalPrice] [decimal](10, 2) NULL,
)


-- (2) Populate monthly sale table
SET IDENTITY_INSERT [dbo].[MonthlySale] ON
INSERT INTO [dbo].[MonthlySale] ([SaleId], [SellingDate], [Customer],[Email], [Product], [TotalPrice]) VALUES (1, N'2019-05-01 00:00:00', N'Asif', N'Asif@companytest-0001.com', N'Dell Laptop', CAST(300.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[MonthlySale] ([SaleId], [SellingDate], [Customer],[Email], [Product], [TotalPrice]) VALUES (2, N'2019-05-02 00:00:00', N'Mike',N'Mike@companytest-0002.com', N'Dell Laptop', CAST(300.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[MonthlySale] ([SaleId], [SellingDate], [Customer],[Email], [Product], [TotalPrice]) VALUES (3, N'2019-05-02 00:00:00', N'Adil',N'Adil@companytest-0003.com',N'Lenovo Laptop', CAST(350.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[MonthlySale] ([SaleId], [SellingDate], [Customer],[Email], [Product], [TotalPrice]) VALUES (4, N'2019-05-03 00:00:00', N'Sarah',N'Sarah@companytest-0004', N'HP Laptop', CAST(250.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[MonthlySale] ([SaleId], [SellingDate], [Customer],[Email], [Product], [TotalPrice]) VALUES (5, N'2019-05-05 00:00:00', N'Asif', N'Asif@companytest-0001.com', N'Dell Desktop', CAST(200.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[MonthlySale] ([SaleId], [SellingDate], [Customer],[Email], [Product], [TotalPrice]) VALUES (6, N'2019-05-10 00:00:00', N'Sam',N'Sam@companytest-0005', N'HP Desktop', CAST(300.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[MonthlySale] ([SaleId], [SellingDate], [Customer],[Email], [Product], [TotalPrice]) VALUES (7, N'2019-05-12 00:00:00', N'Mike',N'Mike@companytest-0002.comcom', N'iPad', CAST(250.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[MonthlySale] ([SaleId], [SellingDate], [Customer],[Email], [Product], [TotalPrice]) VALUES (8, N'2019-05-13 00:00:00', N'Mike',N'Mike@companytest-0002.comcom', N'iPad', CAST(250.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[MonthlySale] ([SaleId], [SellingDate], [Customer],[Email], [Product], [TotalPrice]) VALUES (9, N'2019-05-20 00:00:00', N'Peter',N'Peter@companytest-0006', N'Dell Laptop', CAST(350.00 AS Decimal(10, 2)))
INSERT INTO [dbo].[MonthlySale] ([SaleId], [SellingDate], [Customer],[Email], [Product], [TotalPrice]) VALUES (10, N'2019-05-25 00:00:00', N'Peter',N'Peter@companytest-0006', N'Asus Laptop', CAST(400.00 AS Decimal(10, 2)))
SET IDENTITY_INSERT [dbo].[MonthlySale] OFF

Checking Data

Check the newly created and populated sample database ITSalesV2 by running the following script:

-- View monthly sales data
SELECT
  s.SaleId
 ,s.SellingDate
 ,s.Customer
 ,s.Email
 ,s.Product
 ,s.TotalPrice
FROM dbo.MonthlySale s

The output is as follows:

Creating a non-privileged user

Please create a non-privileged user without login having only SELECT permission on the MonthlySale table who is going to see the masked data as a pre-requisite to this article. Use the following script to do that:

-- Create DataUser to have Select access to MonthlySale table
CREATE USER DataUser WITHOUT LOGIN;  
GRANT SELECT ON MonthlySale TO DataUser;

Creating a Procedure to check masking status

This article also assumes that there is a stored procedure in the sample database which shows us the dynamic data masking status of the columns in the database tables:

-- Stored procedure to check dynamic data masking status
CREATE PROC ShowMaskingStatus
AS
BEGIN
SET NOCOUNT ON 
SELECT c.name, tbl.name as table_name, c.is_masked, c.masking_function  
FROM sys.masked_columns AS c  
JOIN sys.tables AS tbl   
    ON c.[object_id] = tbl.[object_id]  
WHERE is_masked = 1;
END

Dynamic Data Masking Types

There are four common types of Dynamic data masking in SQL Server:
1. Default Data Mask(s)
2. Partial Data Mask(s)
3. Random Data Mask(s)
4. Custom String Data Mask(s)
We are now going to implement all the four common types of dynamic data masking.

Implementing Default Data Masking

Default data masking hides a column completely from an unauthorized user by covering all the column values with a special character making it very difficult to guess the column contents.

Business Requirement

Now suppose you receive a business requirement which states that e-mail addresses of the customers should be completely hidden (masked) due to the sensitivity of this information.
The best way to meet this business requirement is to mask the Email column using Dynamic data masking (DDM).

E-mail Address Default Data Masking

We are going to alter the table to mask the e-mail addresses as follows:

--Default dynamic data masking of Email column 
ALTER TABLE MonthlySale
ALTER COLUMN Email varchar(200) MASKED WITH (FUNCTION = 'default()');

Checking Masking Status

Check the Dynamic data masking status by using the following stored procedure based on a T-SQL script referenced in Microsoft documentation:

-- Checking dynamic data masking status
EXEC ShowMaskingStatus

The output shows us which columns have been masked successfully:

Viewing Email Column as a DataUser

Next, execute Select statement to view monthly sales (table) as a low-privileged user called DataUser having only select permission on the table as follows:

-- Execute SELECT as DataUser
EXECUTE AS USER = 'DataUser';  

-- View monthly sales 
SELECT s.SaleId,s.SellingDate,s.Customer,s.Email,s.Product,s.Product from dbo.MonthlySale s

-- Revert the User back to what user it was before
REVERT;

The output is as follows:

Implementing Partial Data Masking

Partial data masking as the name implies partially hides a column from viewing it by an unauthorized user covering some part of the column values with special characters making the column contents somewhat readable but still difficult to guess.

Business Requirement

Now think of a business requirement in which you have been asked to partially hide the name of the customers in such a way that only the first character of the name remains visible. The best way to meet this business requirement is to mask the Customer column using Partial dynamic data masking.

Partial Data Masking of Customer Names

We are going to alter the table to partially mask Customer column as follows:

-- Partial data masking of Customer names
ALTER TABLE MonthlySale
ALTER COLUMN [Customer] ADD MASKED WITH (FUNCTION = 'partial(1,"XXXXXXX",0)')

Checking Masking Status

Check the dynamic data masking status:

-- Checking dynamic data masking status
EXEC ShowMaskingStatus

The output shows us which columns have been masked successfully:

Viewing the Customer Column as a DataUser

View the table as a test user DataUser who must see the masked data:

-- Execute SELECT as DataUser
EXECUTE AS USER = 'DataUser';  

-- View monthly sales as DataUser
SELECT s.SaleId,s.SellingDate,s.Customer,s.Email,s.Product,s.Product from dbo.MonthlySale s

-- Revert the User back to what user it was before
REVERT;

The output is as follows:

Implementing Random Data Masking

Random data masking hides a column randomly from an unauthorized user by covering a column based on a range of values making it very difficult to guess the column contents. Please remember that random data masking type is only applicable to the columns which store only numbers and it can be specified by providing a range for randomization.

Business Requirement

You receive a business requirement which states that product price should be masked with a random range of numbers so that users with low privileges must not know the exact prices of the product for privacy reasons. The best way to meet this business specification is to mask the TotalPrice column using random dynamic data masking.

Random Data Masking of the TotalPrice Column

Alter the table MonthlySale to mask TotalPrice randomly as follows:

--Random dynamic data masking of TotalPrice column 
ALTER TABLE MonthlySale
ALTER COLUMN [TotalPrice] decimal(10,2) MASKED WITH (FUNCTION = 'random(1, 12)')

Checking Masking Status

Please check the dynamic data masking status by running the following stored procedure:

-- Checking dynamic data masking status
EXEC ShowMaskingStatus

The output shows us which columns have been masked successfully:

Viewing the TotalPrice Column as a DataUser

View the table as a DataUser now:

-- Execute SELECT as DataUser
EXECUTE AS USER = 'DataUser';  

-- View monthly sales 
SELECT s.SaleId,s.SellingDate,s.Customer,s.Email,s.Product,s.TotalPrice from dbo.MonthlySale s

-- Revert the User back to what user it was before
REVERT;

The output is as follows:

Please remember that the output may differ for the Random Data Masked column due to the random number generation.

Implementing Custom String Data Masking

Custom string data masking as its name implies adds custom character to hide a column by making it very difficult to guess its contents. Please remember that Custom string data masking is used in conjunction with Partial data masking by customizing the character to mask the actual column values. In other words, Custom string data masking is an enhanced form of Partial data masking.

Business Requirement

Consider a business requirement to only show the first and last character of the Product column while the rest of the characters should be hidden or masked with hyphens (-). The best way to meet this business specification is to mask the Product column using Partial dynamic data masking with the required custom string.

Customer String Data Masking of Selling Data

Alter the table MonthlySale to mask the Product column as follows:

--Custom string dynamic data masking of Product column 
ALTER TABLE MonthlySale
ALTER COLUMN [Product] ADD MASKED WITH (FUNCTION = 'partial(1,"---",1)')

Checking Masking Status

It is worth checking Dynamic data masking status at this point using the following script:

-- Checking dynamic data masking status
EXEC ShowMaskingStatus

The output shows all the columns on which Dynamic data masking has been successfully applied as shown below:

Viewing the Product Column as a DataUser

View the table as a DataUser now:

-- Execute SELECT as DataUser
EXECUTE AS USER = 'DataUser';  

-- View monthly sales 
SELECT s.SaleId,s.SellingDate,s.Customer,s.Email,s.Product,s.TotalPrice from dbo.MonthlySale s

-- Revert the User back to what user it was before
REVERT;

The output is as follows:

Congratulations! You have successfully implemented all four masking techniques.
Please refer to the article Data Masking in SQL Server for beginners to drop the applied data masking types.

Best Practices

Please remember the following things:
1. Dynamic data masking does not protect or encrypt the column data so it should not be used for that purpose.
2. The potential user who is supposed to see the masked data must have very limited access to view the data and should not at all be given Update permission to exploit the data.
3. The potential user even with SELECT only permission can run exhaustive queries to guess the correct value so please beware of that.
4. You can also use ALTER COLUMN Email ADD MASKED WITH (FUNCTION = ’email()’) to mask e-mail columns rather than using Default dynamic masking.
5. You can use Custom string data masking to hide a debit card number in a transaction report by showing only the last two or four digits as you might have seen in shopping receipts.

Things to do

Now that you can implement all the four types of masking please try the following things to improve your skills further:
1. Please create a sample database by following the walkthrough in the article SSRS Reports Development in Simple Terms followed by developing an SSRS report which only shows the first character of the Author names by hiding the rest using Partial data masking.
2. Please try to create a sample database referenced in the article Creating and Deploying Multiple Versions of Database through Schema Snapshots and then create a test user called Student and apply suitable dynamic data masking to hide the marks of all the students for this test user.
3. Please try to create and drop dynamic data masking to ensure that you can successfully add and remove Dynamic data masking on a SQL table.

Haroon Ashraf