Before going through the Forwarded Records performance issue and resolving it, we need to review the structure of the SQL Server tables.
Table Structure Overview
In SQL Server, the fundamental unit of the data storage is the 8-KB Pages. Each page starts with a 96-byte header that stores the system information about that page. Then, the table rows will be stored on the data pages serially after the header. At the end of the page, the row offset table, that contains one entry for each row, will be stored opposite to the sequence of the rows in the page. This row offset entry shows how far the first byte of that row is located from the start of the page.
SQL Server provides us with two types of tables, based on the structure of that table. The Clustered table stores and sorts the data in the data pages based on the predefined Clustered index key column or columns values. In addition, the data pages within the Clustered table are sorted and linked together in a linked list based on the Clustered index key values. The B-tree structure of the Clustered index provides a fast data access method based on the Clustered index key values. If a new row is inserted or an existing key value is updated in the Clustered table, SQL Server will store the new value in the correct logical position that fits the inserted row size without breaking the ordering criteria. If the inserted or updated value is larger than the available space in the data page, the page will be split into two pages to fit the new value.
The second type of tables is the Heap table, in which the data is not sorted within the data pages in any order and the pages are not linked together, as there is no Clustered index defined on that table, to enforce any sorting criteria. Tracking the pages that are not sorted in any ordering criteria or linked together in the heap table is not an easy mission. To simplify the tracking process of the page allocation within the heap table, SQL Server uses the Index Allocation Map (IAM), the only logical connection between the data pages in the heap table, by keeping an entry for each data page in the table or the index in the IAM table. To retrieve any data from the heap table, SQL Server Engine scans the IAM to locate the extent, which forms 8 pages that store the requested data.
Forwarded Records Issue
If a new row is inserted into the heap table, SQL Server Engine will scan the Page Free Space (PFS) pages to track the allocation status and the space usage on each data page in order to find the first available location in the data pages that fits the inserted row size. Then, the row will be added to the selected page. If the inserted value is larger than the available space in the data pages, a new page will be added to that table to be able to insert the new value.
On the other hand, if the existing data in the heap table is modified, for example, we updated a variable length string with larger data size, and the current space does not fit the new data, the data will be moved to a different physical location and the Forwarded Record will be inserted into the heap table in the original data location, to point to the new location of that data and to simplify the tracking data location. The new data location contains also a pointer that points at the forwarding pointer in order to keep it updated in the case of moving the data from the new location and to prevent the long forwarding pointer chain or delete it. This may lead to removing the forwarding record as well.
Although the Forwarded Records redirection method reduces the need for the resource-intensive table and non-clustered indexes rebuild operations to update the data addresses each time the location of the data is changed, it also doubles the number of reads required to retrieve the data. SQL Server will visit the old location first, where it will find the Forwarded Record that redirects it to the new data location. Then, it will read the requested data, performing the read operation twice. In addition, the Forwarded Records issue leads to changing the sequential data read into random data read affecting the data retrieval operation performance over time negatively.
Let us create the following ForwardRecordDemo heap table using the CREATE TABLE T-SQL statement below:
CREATE TABLE ForwardRecordDemo ( ID INT IDENTITY (1,1), Emp_Name NVARCHAR (50), Emp_BirthDate DATETIME, Emp_Salary INT )
Then, populate that table with 3K records for testing purposes, using the INSERT INTO T-SQL statement below:
INSERT INTO ForwardRecordDemo VALUES ('John','2000-05-05',500) GO 1000 INSERT INTO ForwardRecordDemo VALUES ('Zaid','1999-01-07',700) GO 1000 INSERT INTO ForwardRecordDemo VALUES ('Frank','1988-07-04',900) GO 1000
Identifying the Forwarded Records Issue
The information about the table type and the number of pages consumed while storing the table data, as well as the index fragmentation percentage and the number of Forwarded Records for a specific table can be viewed by querying the sys.dm_db_index_physical_stats system dynamic management function and by passing to the DETAILED mode to return the number of Forwarding Records. To do this, use the T-SQL script below:
SELECT OBJECT_NAME(PhysSta.object_id) as DBTableName, PhysSta.index_type_desc, PhysSta.avg_fragmentation_in_percent, PhysSta.forwarded_record_count, PhysSta.page_count FROM sys.dm_db_index_physical_stats (DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'DETAILED') AS PhysSta WHERE OBJECT_NAME(PhysSta.object_id) = 'ForwardRecordDemo' AND forwarded_record_count is NOT NULL
As you can see from the query result, the previous table is the heap table that has no Clustered index created on it to sort the data in the pages and link the pages between each other. The 3K rows inserted into the table are assigned to 15 data pages, without forwarded records and zero fragmentation percentage, as shown in the result below:
When you define the data type of a column as VARCHAR or NVARCHAR, the value specified in the data type definition is the maximum allowed size for that string, without fully reserving that amount while saving the values into the data pages. For example, the John employee name inserted into that table will reserve only 8 bytes out of the maximum 100 bytes for that column, taking into consideration that saving the NVARCHAR string will double the bytes required for the VARCHAR column, as shown in the DATALENGTH function result below:
If you want to update the value of the Emp_Name column in order to include the full name of the John employee, use the UPDATE statement below:
UPDATE ForwardRecordDemo SET Emp_Name='John David Micheal' WHERE Emp_Name='John'
Check the length of the updated column using the DATALENGTH function. You will see that the length of the Emp_Name column in the updated rows has been expanded by 28 bytes per each column, which is about 3.5 additional data pages to that table, as shown in the result below:
Then, check the number of Forwarded Records after the update operation by querying the sys.dm_db_index_physical_stats system dynamic management function. To do this, use the T-SQL script below:
SELECT OBJECT_NAME(PhysSta.object_id) as DBTableName, PhysSta.index_type_desc, PhysSta.avg_fragmentation_in_percent, PhysSta.forwarded_record_count, PhysSta.page_count FROM sys.dm_db_index_physical_stats (DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'DETAILED') AS PhysSta WHERE OBJECT_NAME(PhysSta.object_id) = 'ForwardRecordDemo' AND forwarded_record_count is NOT NULL
As you can see, updating the Emp_Name column on 1K records with larger string values, without adding any new record, will assign the extra 5 pages to that table, rather than 3.5 pages as expected previously. This will happen due to generating 484 forwarded records to point to the new locations of the moved data. This may cause the table to be 33% fragmented, as shown clearly below:
Again, if you manage to update the value of the Emp_Name column to include the full name of the Zaid employee, use the UPDATE statement below:
UPDATE ForwardRecordDemo SET Emp_Name='Zaid Fuad Zreeq' WHERE Emp_Name='Zaid'
Check the length of the updated column using the DATALENGTH function. You will see that the length of the Emp_Name column in the updated rows expanded by 22 bytes per each column, which is about 2.7 additional data pages added to that table, as shown in the result below:
Check the number of forwarded records after performing the update operation. You can do this by querying the sys.dm_db_index_physical_stats system dynamic management function using the same T-SQL script below:
SELECT OBJECT_NAME(PhysSta.object_id) as DBTableName, PhysSta.index_type_desc, PhysSta.avg_fragmentation_in_percent, PhysSta.forwarded_record_count, PhysSta.page_count FROM sys.dm_db_index_physical_stats (DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'DETAILED') AS PhysSta WHERE OBJECT_NAME(PhysSta.object_id) = 'ForwardRecordDemo' AND forwarded_record_count is NOT NULL
The result will show you that updating the Emp_Name column on the other 1K records with larger string values without inserting any new row will assign another 4 pages to that table, rather than 2.7 pages as expected. This will happen due to generating additional 417 forwarded records in order to point to the new locations of the moved data and keeping the same 33% fragmentation percentage, as shown below:
Fixing the Forwarded Records Issue
The simplest way to fix the Forwarded Records issue is to estimate the maximum length of the string that will be stored in the column and assign it using the fixed length data type for that column rather than using the variable length data type. The optimal permanent way to fix the Forwarded Records issue is adding the Clustered index to that table. In this way, the table will be completely converted into a Clustered table, that is sorted based on the Clustered index key values. It will control the order of the existing data, the newly inserted and updated data that does not fit the current available space in the data page, as described previously in the introduction of this article.
If adding the Clustered index to that table is not an option for specific requirements, such as the staging tables or the ETL tables, you can overcome the Forwarded Records issue temporarily by monitoring the Forwarded Records and rebuilding the heap table to remove it, that will also update all Non-clustered indexes on that heap table. The functionality of rebuilding the heap table is introduced in SQL Server 2008, by using the ALTER TABLE…REBUILD T-SQL command.
To see the performance impact of the Forwarded Records on the data retrieval queries, let us run the SELECT query that performs the search based on the Emp_Name column valuesю However, before executing the query, enable the TIME and IO statistics:
SET STATISTICS TIME ON SET STATISTICS IO ON SELECT * FROM ForwardRecordDemo WHERE Emp_Name like 'John%'
As a result, you will see that 925 logical read operations are performed to retrieve the requested data within 84ms as shown below:
To rebuild the heap table in order to remove all Forwarded Records, use the ALTER TABLE…REBUILD command:
ALTER TABLE ForwardRecordDemo REBUILD;
Run the same SELECT statement again:
SELECT * FROM ForwardRecordDemo WHERE Emp_Name like 'John%'
The TIME and IO statistics will show you that only 21 logical read operations compared to the 925 logical read operations with the Forwarded Records included are performed to retrieve the requested data within 79ms:
To check the number of forwarded records after rebuilding the heap table, run the sys.dm_db_index_physical_stats system dynamic management function, use the same T-SQL script below:
SELECT OBJECT_NAME(PhysSta.object_id) as DBTableName, PhysSta.index_type_desc, PhysSta.avg_fragmentation_in_percent, PhysSta.forwarded_record_count, PhysSta.page_count FROM sys.dm_db_index_physical_stats (DB_ID(), DEFAULT, DEFAULT, DEFAULT, 'DETAILED') AS PhysSta WHERE OBJECT_NAME(PhysSta.object_id) = 'ForwardRecordDemo' AND forwarded_record_count is NOT NULL
You will see that only 21 pages, with the previous 3 pages consumed for the Forwarded Records, are assigned to that table to store the data, which is similar to the estimated result we have got during the data insert and update operations (15+3.5+2.7). After rebuilding the heap table, all Forwarded Records are removed now. As a result, we have a table with no fragmentation:
The Forwarded Records issue is an important performance issue that the database administrators should consider when planning for the heap table maintenance. The previous results are retrieved from our testing table that contains only 3K records. You can imagine the number of pages that will be wasted by the Forwarded Records and the I/O performance degradation, due to reading a large number of Forwarded Records when reading from huge tables!
References:
- Pages and Extents Architecture Guide
- dm_db_index_physical_stats (Transact-SQL)
- ALTER TABLE (Transact-SQL)
- Knowing about ‘Forwarded Records’ can help diagnose hard to find performance issues