Generating Test Data in SQL Server

Total: 13 Average: 4.2

When testing the functionality of your application or the performance of a specific stored procedure or an ad-hoc query in the development environment, you need to have data stored in your development databases typical or similar to the data stored in the production databases. This is because the performance of a query that is processing 50 records will be different from the performance of the same query that is processing 50M rows. Restoring a copy of the production database to the development database server for testing purposes is not always a valid option, due to the critical data that is stored in these databases and should not be open for all employees to see, unless you are developing a new application and there is no production database yet.

The best and most secure alternative is to fill the development database tables with testing data. Test data generation is useful for testing the performance of the application or a new functionality without changing the production data. There is no single straight-forward way to generate test data that will fit all scenarios, especially when you need to generate large amount of data to test the performance of complex queries and transactions in which you should cover all possible combinations of testing cases.

To fill a table with a large amount of data, the easiest way is to write a simple script that keeps inserting identical records into the database table with the number of duplicates you need. But the problem is that the SQL Server Query Optimizer will build a different plan on the development database from the one built on the production database due to the difference in the data distribution. For example, the script below will fill the Students table with 100K redundant testing records using the GO Number statement:

Another option is to generate random data depending on the data type of each column. The ID column with IDENTITY property will automatically generate sequence numbers without the need for any coding effort from your side. But if you plan to generate random grades for the students, you can take benefits from the RAND() T-SQL function and cast the result as the required numeric data type. For example, the below script will generate 100K random grades for the student between 1 and 100 with three different data types: INTEGER grades, REAL grades and DECIMAL grades, with the ability to control the ranges of these values depending on your mathematical and programming skills, as shown below:

Generating random names can also be achieved using the AdventureWorks and Northwind Microsoft testing databases. You need to download these databases from Microsoft website, attach these databases to your SQL Server instance and take benefits from the data stored in these databases to generate random names in your development database. For example, the DimCustomer table from the AdventureworksDW2016CTP3 database contains about 18K first names, middle names and last names that you can use. You can also use a CROSS JOIN statement to generate a large number of combinations of these names to exceed the 18K value. The following script can be used to generate 100K first names and last names:

Random email addresses and dates can also be generated from the Microsoft testing databases. For example, the BirthDate column and the EmailAddress column from the same DimCustomer table can provide us with random dates and email addresses. The below script can be used to generate 100K combination of Birthdates and Email addresses:

Random values of the Country column can also be generated using the Person.CountryRegion table from the AdventureWorks2016CTP3 testing database. It can provide you with more than 200 country names and codes that you can take benefits from in your development database. For example, you can take it as a lookup table to map between the country name and code, as in the script below:

To generate random Address values, you can take benefits from the data stored in the Person. Address table from the AdventureWorks2016CTP3 testing database. It contains more than 19K different addresses with its spatial location, that you can easily use in your development database and take random combination from these values, in the same way we did in the previous example. The below script can be easily used to generate random 100K addresses from the Person.Address table:

To generate random passwords for specific system users, we can take benefits from the CRYPT_GEN_RANDOM T-SQL function. This function returns a cryptographic, randomly-generated hexadecimal number with a length of a specified number of bytes, generated by the Crypto API (CAPI).  The value returned from that function can be converted to a VARCHAR data type in order to have more meaningful passwords, as in the script below, that generates 100K random password:

Generating test data to fill the development database tables can also be performed easily and without wasting time for writing scripts for each data type or using third party tools. You can find various tools in the market that can be used to generate testing data. One of these wonderful tools is the dbForge Data Generator for SQL Server . It is a powerful GUI tool for a fast generation of meaningful test data for the development databases. dbForge data generation tool includes 200+ predefined data generators with sensible configuration options that allow you to emulate column-intelligent random data. The tool also allows generating demo data for SQL Server databases already filled with data and creating your own custom test data generators. dbForge Data Generator for SQL Server can save your time and effort spent on demo data generation by populating SQL Server tables with millions of rows of sample data that look just like real data. dbForge Data Generator for SQL Server helps to populate tables with most frequently used data types such as Basic, Business, Health, IT, Location, Payment and Person data types. The figure below shows how easily this tool works:

After you install the dbForge Data Generator for SQL Server tool and run that tool, you need to specify the target server name and database name in the Connection window as shown below:

In the Options window, you can specify the number of rows to be inserted into your table and other different options that control the generated testing data criteria, as shown below:

After customizing the options to fit your testing data requirements, click on the  button, and a new window with a list of all tables and columns under the selected database will be displayed, asking you to choose which table to fill with testing data, as shown below:

Just choose the table that you need to fill with data, and the tool will automatically provide you with the suggested data in the Preview section at the bottom of the window and customizable options for each column in that table that you can easily customize, as shown below:

For example, you can choose from the built-in generators data types that can be used to generate the ID column values as described previously:

Or customize the characteristics of that ID column values, such as the Uniqueness, Min, Max and the Increment of the generated values, as below:

In addition, the First_Name column can be limited to be male or female or combination of these two types. Also, you can control the percentage of NULL or empty values within that column, as shown below:

The BirthDate column also can be controlled by specifying the category that these students will fall under, such as Students, Teenagers, Adults or Retired as shown below:

You can also specify the fully-described generator that can be used to generate the Country column values as shown below:

And customize the equation that will be used to generate the Email Address column values as follows:

In addition to the complexity of the customizable equation, we generate the Password column values, as shown below:

And finally, for my example and not for this magical tool, the generators and equations used to generate the Address column values below:

After this tour, you can imagine how this magical tool will help you in generating data and simulating real time scenarios to test the functionality of your application. Install it and enjoy taking benefits from all available features and options.


Useful tool:

dbForge Data Generator for SQL Server – powerful GUI tool for a fast generation of meaningful test data for SQL Server databases.

Ahmad Yaseen

Ahmad Yaseen is a SQL Server database administration leader at Aramex International Company with a bachelor’s degree in computer engineering as well as .NET development experience.
He is a Microsoft Certified Professional with a good experience in SQL server development, administration, performance tuning, monitoring and high availability and disaster recovery technologies.
Ahmad Yaseen