Chances are you already know that every SQL Server database contains one or more transaction log files, in addition to data files, that records all the transactions and database modifications made by each transaction. The transaction log is an essential part of any database, and the database administrator must keep it under observation. One of the most important tasks for the database administrator is to truncate the transaction log file to keep it from overflow. Why? Because the transaction log plays the most important role when it comes to the database recovery.
This article describes the internals of the transaction log and examines how SQL Server logs data modifications. It also explains how SQL Server performs database crash recovery.
Introduction to the SQL Server Transaction Log File
The transaction log ensures the transactional consistency of the data in the database. Remember the ACID properties of transactions:
All these properties guarantee that SQL Server database engine processes transactions reliably. Each transaction is “all or nothing”. If a part of the transaction fails, the entire transaction fails, and the database state stays unchanged.
SQL Server stores a record of all the transactions performed on the database into the associated log file. In the case of some disasters that may entail SQL Server shutdown, it uses transaction log to recover the database to a consistent state, with data integrity. After the restart, SQL Server starts the crash recovery process, it reads the transaction log file to ensure that all valid data is stored in the data files, and uncommitted transactions are rolled back.
During normal operation, SQL Server also uses the transaction log. The information contained in the file is needed to identify what SQL Server needs to do when a transaction rolls back, due to either an error or a user-specified ROLLBACK statement.
How SQL Server Uses the Transaction Log
There are at least three physical files associated with every SQL Server database. They are:
- primary data file (.mdf)
- secondary data file (.ndf)
- transaction log file (.ldf)
The transaction log is a physical file with the extension LDF. SQL Server creates the log file automatically for any new database, along with the primary data file, which stores the database objects and the data itself.
Whenever T-SQL code changes a database object or the data it contains, details of the change are recorded as a log record in the transaction log file. The log record contains the information of a specific change made to the database (e.g. insert a single row), so there will be a series of log records to describe fully the effects of a single transaction.
Transaction Log Architecture
Log Sequence Numbers
A log record has a unique, auto-incrementing Log Sequence Number (LSN), that allows the log record to be found in the transaction log. LSN describes the data change and contains the information about:
- the operation and affected row
- the old and new version of the data
- the transaction that performed the modification
LSN is comprised of three numbers: LSN = <VLF sequence # [4bytes]>:<log block # [4bytes]>:<log record #[2 butes]>. Each data-file page has an LSN in its page header that identifies the most recent log record whose change is reflected on the page. This is absolutely critical for crash recovery. When crash recovery runs, it compares the LSNs of log records for committed or uncommitted transactions with the LSNs in data-file pages to determine whether there is any redo or undo that has to be done on those particular log records.
When you create a database, it is a very good practice to specify the size of the transaction log. If you do not do this, SQL Server will automatically create the transaction log with the default size. The default size of the transaction log of a new database, if not specified, is the larger of 0.5MB or 25% of the total size of all data files created in the same CREATE DATABASE statement.
You have to be very careful because new portions of the transaction log are always zero-initialized. If you have the CREATE DATABASE, statement where you do not specify the size of the log file, and you create, for instance, a database of 1TB, SQL Server will create 250GB transaction log. Therefore, it is always good to figure out how big your transaction log should be.
Because the log has to be zero-initialized, it does not make use of instant file initialization. This feature was added in SQL Server 2005 to allow data files to be created or grown almost instantaneously. We can see what is going on when we do CREATE DATABASE, we can see that zero-initialization occurring of our log, using trace flag 3004 which actually prints messages about zero-initialization, and trace flag 3605 that allows those log messages be printed by trace flag 3004.
The following demo shows how you can see the log file zeroing happening.
1. Execute the following script to make sure that we do not have a database called DBTest2014
USE master; GO IF DATABASEPROPERTY(N'DBTest2014', N'Version')>0 BEGIN ALTER DATABASE DBTest2014 SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE DBTest2014; END GO
2. Enable trace flags to watch zero-initialization
DBCC TRACEON (3605, 3004, -1); GO
3. Flush the error log
EXEC sp_cycle_errorlog; GO
4. Create a database
CREATE DATABASE DBTest2014 ON PRIMARY ( NAME = N'DBTest2014', FILENAME = N'D:\DBTest2014_data.mdf') LOG ON ( NAME= N'DBTest2014_log', FILENAME= N'D:\DBTest2014_log.ldf', SIZE = 10MB, FILEGROWTH = 10 MB); GO
5. Read the error log file
EXEC sys.xp_readerrorlog; GO
Virtual Log Files
The transaction log internally is split up into a series of chunks called virtual log files (VLFs for short). The reason the transaction log is split up in this way is to allow the management of the transaction log to be easier.
Whenever a transaction log is created, we got a certain number of VLFs. Newly created VLFs are inactive and unused. An active VLF cannot be reused until it is made inactive by log clearing. However, there is an exception, in a new database, the first VLF is always active. The reason for that is that any transaction log has to have at least one active VLF.
Every log file also has a file header page which takes 8KB at the start of the transaction log file. The file header page stores metadata about the file such as size and auto-growth settings.
How Many VLFs Do You Get?
The number and size of VLFs in a new portion of transaction log are determined by SQL Server and can not be configured. If the newly added size is:
- < 1MB is irrelevant for discussion
- < 64MB there will be 4 new VLFs (each 1/4 of growth size)
- 64MB to 1GB there will be 8 new VLFs (each 1/8 of growth size)
- > 1GB there will be 16 new VLFs (each 1/16 of growth size)
This applies to the initially created transaction log, and for each manual or automatic growth that occurs. You might think about why do we care about this? Well, knowing what the formula is and how many VLFs you going to get, and how big they are, can help you in the management of the log. If you have too few or too many VLFs this can cause performance problems with transaction log operations.
VLF Sequence Number
Every VLF has a sequence number, which serves to uniquely identify the VLF within the transaction log. The sequence number increase by one every time the log management system activates the next VLF. And the chain of the sequence numbers gives the currently active set of VLFs. The start of the active portion of the transaction log begins with the VLF that has the lowest sequence number and is still active. Inactive VLF still has the sequence numbers, but they are not part of the active portion of the log. The active portion of the log is the portion of the log that has log records that are still required for some reason by SQL Server.
When you first create a new database the VLF sequence numbers of the VLF don’t start at 1. They start with whatever the highest VLF sequence number is in the model database transaction log, plus 1. You might think it is possible to run out of VLF sequence numbers, but it is not. In fact, SQL Server has code that will force the instance to shutdown if a VLF sequence number ever wraps around to zero (if the next VLF sequence number is ever less than the previous one).
VLF and Log Blocks
Within VLFs there are variably sized log blocks. The minimum size of the log block is 512 bytes and log blocks grow up to a maximum size of 60 KB. The log block size is set when one of the following things occurs:
- A transaction generates a log record to commit of finish aborting a transaction.
- The log block size reaches 60KB without a transaction committing or aborting.
There are log records inside a log block (colored on the diagram). Log records are also variably sized. The diagram also shows that log records from multiple concurrent transactions can exist within the same log block. The log records stored in the order written, in a similar manner to a data-page file.
Every VLF contains a VLF header, which includes the following information:
- Whether the VLF is active or not.
- The log sequence number when the VLF was created.
- The current parity bits for all 512-byte blocks in the VLF.
The parity bits start at 64, for the very first time that VLF is used and then if VLF becomes inactive, and then activated again, the parity bits will become 128. These are used during crash recovery.