Introduction
Datafiles are physical objects that constitute the most important part of the database system since they contain actual data. You can think of a database as a collection of data files. An instance gives you the means of mounting and accessing such files.
Here, managing datafiles is understanding how to monitor and resize existing datafiles and how to add or remove the data files from a database.
T-SQL codes for these operations are present in Microsoft documentation. However, in this article, we’d like to discuss the tactics around managing these files for those of us who still run on-premises installations of SQL Server.
Datafiles Types and Possible Issues
For every new database created in SQL Server, we must have at least two files created – a primary datafile and a logfile.
- The primary data file has the .MDF extension.
- The logfile has the .LDF extension.
- When we add datafiles to an SQL Server database, we typically use the .NDF extension.
Note: It is possible to create the datafiles in SQL Server without any extension, but that is not the best practice. Using .mdf, .ndf and .ldf serves to distinguish these files when we view them on an operating system level.
Obviously, datafiles are created when you create a database. You can do it with the CREATE DATABASE command. While it seems so easy, you should be aware of possible issues.
Depending on how large the database and its associated datafiles eventually become, you might face fragmentation issues and other troubles with backup time and moving about your data. It happens of the datafiles are not sized correctly.
Have a look at the illustration below. It shows the result of executing CREATE DATABASE and providing the name of the database (MyDB).
Listing 1 shows the created database details:
-- Listing 1: Create Database Script
USE [master]
GO
/****** Object: Database [MyDB] Script Date: 29/11/2020 10:38:18 pm ******/
CREATE DATABASE [MyDB]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'MyDB', FILENAME = N'C:\MSSQL\Data\MyDB.mdf' , SIZE = 102400KB , MAXSIZE = UNLIMITED, FILEGROWTH = 131072KB )
LOG ON
( NAME = N'MyDB_log', FILENAME = N'E:\MSSQL\Log\MyDB_log.ldf' , SIZE = 102400KB , MAXSIZE = 2048GB , FILEGROWTH = 65536KB )
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [MyDB].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO
You might wonder where SQL Server got all those option from, since all we did was issuing CREATE DATABASE MyDB.
SQL Server uses model database settings as default values for any new database created on that instance. In this case we see the initial file size of 100MB. The autogrowth is 12MB and 64MB respectively for the data and log files.
The Ensuing Problems
The implication of the settings highlighted in Figure 1 are:
- The primary data file starts from the size of 100MB. It is a small size. Thus, depending on the activity level in the database, it will need to grow very soon.
- Whenever there is a need to autogrow the datafile, SQL server needs to acquire 128MB from available space on the operating system. Again, it is small, implying that the database will autogrow frequently. The database growth is an expensive operation that can affect performance if it happens too often. Besides, frequent database growths can cause a phenomenon called fragmentation that, in its turn, has a knack for cause severe performance degradation on databases. The other extreme of setting the increment to high value may result in growth operations taking long to complete, depending on the underlying storage system performance.
- The database files can grow indefinitely. It means, with a sufficient time allowed, these files can consume all the space on the volume where they are sitting. To move them, you require a volume of their size or more. Another option is adding storage to the volume when these files are sitting.
These are key problems associated with depending on default values for creating databases.
Pre-allocation
Given the performance impact of growth, it would make more sense to properly size the database at the project beginning. This way, we accommodate the solution requirements for the foreseeable future.
Assume that we know our database will eventually become 1GB in size. We might allocate 1GB of storage when the project starts. Then, the database never needs to grow. It eliminates the fragmentation problems caused by the database growth.
Listing 2 shows the script applicable for this pre-allocation:
-- Listing 2: Create Database Script with Pre-allocation
USE [master]
GO
/****** Object: Database [MyDB] Script Date: 29/11/2020 10:38:18 pm ******/
CREATE DATABASE [MyDB]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'MyDB', FILENAME = N'C:\MSSQL\Data\MyDB.mdf' , SIZE = 1024MB , MAXSIZE = 2048MB, FILEGROWTH = 512MB )
LOG ON
( NAME = N'MyDB_log', FILENAME = N'E:\MSSQL\Log\MyDB_log.ldf' , SIZE = 512MB , MAXSIZE = 2048GB , FILEGROWTH = 512MB )
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [MyDB].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO
We mention 1GB of space for demonstration purposes. Typically, a production database may require 1TB. The point is: allocate the necessary space at the beginning. Then you eliminate or significantly reduce the need for growth.
Now, we must ask ourselves whether we really want a single 1TB file sitting on our volume. It would be wise to break it down into smaller chunks. When parallel operations, such as backups, occur, each file will be addressed by a single CPU thread for a multiprocessor system. With a single file, it would not go smoothly.
Again, we modify our script to accommodate this requirement in Listing 3:
-- Listing 3: Create Database Script with Pre-allocation and
USE [master]
GO
/****** Object: Database [MyDB] Script Date: 29/11/2020 10:38:18 pm ******/
CREATE DATABASE [MyDB]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'MyDB01', FILENAME = N'C:\MSSQL\Data\MyDB01.mdf' , SIZE = 256MB , MAXSIZE = 512MB, FILEGROWTH = 512MB ) ,
( NAME = N'MyDB02', FILENAME = N'C:\MSSQL\Data\MyDB02.ndf' , SIZE = 256MB , MAXSIZE = 512MB, FILEGROWTH = 512MB )
( NAME = N'MyDB03', FILENAME = N'C:\MSSQL\Data\MyDB03.ndf' , SIZE = 256MB , MAXSIZE = 512MB, FILEGROWTH = 512MB ) ,
( NAME = N'MyDB04', FILENAME = N'C:\MSSQL\Data\MyDB04.ndf' , SIZE = 256MB , MAXSIZE = 512MB, FILEGROWTH = 512MB ) ,
( NAME = N'MyDB05', FILENAME = N'C:\MSSQL\Data\MyDB05.ndf' , SIZE = 256MB , MAXSIZE = 512MB, FILEGROWTH = 512MB )
LOG ON
( NAME = N'MyDB_log', FILENAME = N'E:\MSSQL\Log\MyDB_log.ldf' , SIZE = 512MB , MAXSIZE = 2048GB , FILEGROWTH = 512MB )
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [MyDB].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO
Additional Information
We should also mention that there is no value in using this approach for log files. The thing is, SQL Server always writes to log files sequentially. Also, we have used the .ndf extension for the new files we are adding.
The MAXSIZE clause ensures that our data files will not grow indefinitely. We have given each file a different logical and physical name – the NAME clause specifies the logical name of the file, and the FILENAME clause specifies the physical name.
Creating a database with larger datafiles will take more time than otherwise. It may me more reasonable to create a small database first, and then manipulate it with appropriate commands to resize and add files, until we establish an ideal database structure.
By creating the database with explicit options, we have addressed the three concerns raised earlier in this article. Figure 2 shows the result of this approach:
Now we have a database properly configured to accommodate the data growth for a protracted period without the need for datafile growth. It removes the fragmentation risks and helps to ensure a better data files management.
Managing Datafiles
Instead of creating four or five datafiles in the CREATE DATABASE statement, we can use the MODIFY and ADD clauses of the ALTER DATABASE T-SQL statement.
Once again, we begin with the statement shown in Listing 4 below. It creates a single database with the 100MB data file and an accompanying log file. Our goal is to ensure pre-allocation by extending this file and then adding more files.
-- Listing 4: Create Database Script
USE [master]
GO
IF EXISTS (SELECT * FROM sys.databases WHERE name='MyDB')
DROP DATABASE MyDB;
/****** Object: Database [MyDB] Script Date: 29/11/2020 10:38:18 pm ******/
CREATE DATABASE [MyDB]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'MyDB', FILENAME = N'C:\MSSQL\Data\MyDB.mdf' , SIZE = 102400KB , MAXSIZE = UNLIMITED, FILEGROWTH = 131072KB )
LOG ON
( NAME = N'MyDB_log', FILENAME = N'E:\MSSQL\Log\MyDB_log.ldf' , SIZE = 102400KB , MAXSIZE = 2048GB , FILEGROWTH = 65536KB )
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [MyDB].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO
Extend Data Files
The T-SQL Statement that extends a data file is shown in Listing 5. It specifies the database name, the file name, and the desired initial size and increment. In this case, we set the SQL Server to allocate 1GB on start and then allocate 512MB for all subsequent auto growths.
-- Listing 5: Extend the Primary Datafile
USE [master]
GO
ALTER DATABASE [MyDB] MODIFY FILE ( NAME = N'MyDB', SIZE = 1048576KB , FILEGROWTH = 524288KB )
GO
Listing 6 shows what the code would look like if we specify the MAXSIZE of 2GB:
-- Listing 6: Extend the Primary Datafile with Maximum Size
USE [master]
GO
ALTER DATABASE [MyDB] MODIFY FILE ( NAME = N'MyDB', SIZE = 1048576KB , MAXSIZE = 2097152KB , FILEGROWTH = 524288KB )
GO
If we set the FILEGROWTH clause to 0, we set our SQL Server NOT to autogrow the data file. In this case, we need to explicitly issue commands to grow the data file or add other files.
Adding Data Files
Listing 7 shows the code we use to add a new data file to the database. Note that we must again specify the logical file name and the physical file name which includes a full path.
Also, we can place the physical file on a different volume. For that, we only need to change the path.
-- Listing 7: Add Data Files to the Primary Filegroup
USE [master]
GO
ALTER DATABASE [MyDB] ADD FILE ( NAME = N'MyDB01', FILENAME = N'C:\MSSQL\Data\MyDB01.ndf' , SIZE = 1048576KB , FILEGROWTH = 524288KB ) TO FILEGROUP [PRIMARY]
GO
Extending and adding data files also apply to the scenario where we choose to disable autogrowth for our databases (See Figure 4).
Then we need to extend the database manually using the above codes from Listings 5 or 6, or add files as in Listing 7.
Using Filegroups
Filegroups let us manage the data files collections together. We can logically group some data files stored on different disks or different volumes into one file group. That filegroup creates an abstraction layer between the tables and indexes, and the actual physical files storing the data.
Therefore, if we create a table on a filegroup, the data in this table get spread across all the data files assigned to the filegroup.
Till this point, we have been dealing with the PRIMARY filegroup only. Listing 8 shows how we can add a new file MyDB02 to a filegroup, other than the primary filegroup.
The first statement after setting the database context to master creates the new filegroup FG01. The next statement then adds the file to this new filegroup with similar options to those used in Listing 7.
-- Listing 8: Add Data Files to the Primary Filegroup
USE [master]
GO
ALTER DATABASE [MyDB] ADD FILEGROUP [FG01]
GO
ALTER DATABASE [MyDB] ADD FILE ( NAME = N'MyDB02', FILENAME = N'C:\MSSQL\Data\MyDB02.ndf' , SIZE = 102400KB , MAXSIZE = 2097152KB , FILEGROWTH = 524288KB ) TO FILEGROUP [FG01]
GO
Dropping Data Files
Figure 5 shows the outcome of the operations we have performed so far. We have three data files. Two of them are in the PRIMARY filegroup, and the third one is in the FG01 filegroup.
Let’s assume we’ve done something wrong, for instance, set the wrong file size. Then, we can drop the filegroup using the following code in Listing 9:
-- Listing 9: Drop Data Files
USE [MyDB]
GO
ALTER DATABASE [MyDB] REMOVE FILE [MyDB02]
GO
Conclusion
This article explored the database files types, the possible complications caused by the datafiles growth, and the ways of resolving the problem. Also, we examined the T-SQL codes for extending data files and adding new data files to a database. We also touched on the use of filegroups.
Our goal is to ensure that when we deploy databases, we prepare the database to store all the data it will ever need for a particular application.