SQL Server Transactional Replication is one of the most commonly used replication techniques used to copy or distribute data across multiple destinations. In the previous article, we discussed SQL Server Replication, Types of Replication, and the Basic Internals on how the Transactional Replication works. Now, we are going to dive into Advanced Internals of how SQL Server Transactional Replication works.
Transactional Replication Architecture
Before we start, I recommend you to refresh your knowledge with my previous article here.
Let’s start by looking into SQL Server Transactional Replication Architecture shown below from Microsoft documentation.
In the Publisher Database, create a Publication comprising the list of Articles (Tables, Views, etc.,) that you need to replicate to the Subscriber database. Once the Articles are enabled for Replication, any changes happening on these articles will be marked for Replication in the Transactional Logs of Publisher database.
SQL Server Transactional Replication can be initialized from the Publisher to Distributor and then to the Subscriber database via Snapshot Agent or Full Backups. The Snapshot Agent can perform the Initialization via the Replication Configuration Wizard. The Full Backup is supported via T-SQL statements only.
The Log Reader Agent scans the Transactional Log of the Publisher database to detect the tracked changes marked for Replication. It ignores other changes captured in the Transactional Logs and copies data changes from the Transactional Log to the Distribution database.
The Distribution database can be either in Publisher or Subscriber, or it can be in another independent SQL Server instance. Note the following things:
- Log Reader Agent runs continuously from the Distributor Server to scan for new commands marked for replication. However, if you don’t want to run continuously and want to run on a schedule instead, we can change the Log Reader Agent SQL Job that will be created out.
- Log Reader Agent picks up all records that are marked for replication from the Transactional Log in batches and sends it to the Distribution database.
- Log Reader Agent picks up only Committed transactions from the Transactional Log of Publisher Database. So, any long-running queries on the Publisher database can directly impact Replication as it is waiting for the active transaction to complete.
The Distribution Agent picks up all undistributed new commands from the Distribution database and applies them to the Subscription database via either Push or Pull Mechanism.
- Push Subscription – the Distributor takes ownership to apply changes from the Distribution database to the Subscriber.
- Pull Subscription – the Subscriber database takes ownership to fetch changes from the Distribution database to the Subscriber.
Once the records are distributed successfully from Distribution to the Subscriber database, they will be marked as Distributed and also marked for deletion from the Distribution database.
One of the Key Replication Maintenance jobs is the Distribution Clean Up: The Distribution job runs every 10 minutes to delete distributed records from the Distribution database to maintain the size of the Distribution database under control.
Therefore, our goal for the current article is to explore the following topics:
- Distribution Database
- Replication Agents
- Snapshot Agent
- Log Reader Agent
- Distribution Agent
- Replication Agent Profiles
- Replication Maintenance Jobs
- Replication Latency and Tracer Tokens
- TableDiff Utility
- SQL Server Agent Alerts
SQL Server Distribution Database
A distribution database is a system database created while configuring Replication. It is the heart of Replication since most of the process runs out of a distribution database.
Due to the nature of the distribution database, we can perform only limited operations on it, such as Backup and Restore. We can’t drop it directly like User databases.
For a huge database with lots of data being replicated, we need to take few special measures to improve Replication throughput performance:
By default, the distribution database is created on the default installation path configured in SQL Server. If not configured, it will be created on the C: drive or in the SQL Server Installation folders. We’d recommend you move the distribution database to a faster storage/ disk to improve the performance.
The Initial File Size and Autogrowth of the distribution database will be set according to the model database’s Initial File Size and Autogrowth settings. Configure the Initial File size to a better value like 10GB in case of the transactionally busy database replication. The Autogrowth properties should be up to 512 MB or 1 GB for both Data and Log files. Then, there won’t be much fragmentation to the Data and Log Files.
Configure the Daily or Routine Backup jobs to include the distribution database for reference purposes or troubleshooting in case of any data corruption or loss.
Configure the Daily Index Reorganization or Maintenance jobs to include the distribution database. The database involves huge data insertions into the MSrepl_transactions and MSrepl_commands tables.
Note: Continuous polling on these 2 tables and DELETE from them after sending data successfully to the Subscriber database increases the risk of fragmentation. Rebuilding these tables on a scheduled basis can improve the distribution database performance.
To view and modify any of the Distribution database attributes related to Replication, right-click on Replication > Distributor Properties:
Click on the ellipsis button on the right to view more details about the individual options listed out.
Pay attention, changing any parameters can impact the distribution database performance. Hence, implement changes only after you evaluate carefully all parameters you wish to modify.
Transaction Retention specifies how much data should be retained in the distribution database. The minimum and maximum values can be specified in hours or days.
The Transaction Retention value set as 0 hours indicates that once records are successfully replicated to the Subscriber database, they can be deleted from the distribution database. If you increase this value, the distribution database size will increase. Thus, we need to plan it accordingly.
History Retention specifies the retention period to store the Transactional Replication Performance History data. By default, it is 48 Hours.
To drop a distribution database, we need to Disable Publication associated with that particular distribution database and then drop it using one of the two methods. One is a combination of Disable Publishing and the Distribution wizard. Another one is using sp_dropdistributor or sp_dropdistributiondb procedures. The Wizard internally uses these 2 procedures to disable Distribution and drop the distribution database.
SQL Server Replication Agents
Replication agents are standalone programs responsible for tracking data changes from Publisher and propagating those changes to Distributor and Subscriber databases. They are executed as SQL Server Agent jobs.
First, let’s see the location of these standalone programs.
To configure Replication, we need to have the Replication components installed via the SQL Server installer. When done, we can see the Replication agent-related standalone programs available in the installation path:
C:\Program Files\Microsoft SQL Server\130\COM
In my case, the version of SQL Server version is 2016. Therefore, it is under 130 in the path.
For every Replication Agent, we can see the respective standalone program available:
- DISTRIB.exe – Distribution Agent
- Logread.exe – Log Reader Agent
- Qrdrsvc.exe – Queue Reader Service Agent
- Replmerg.exe – Merge Replication Agent
- Snapshot.exe – Snapshot Agent
- Tablediff.exe – Utility to Compare Tables. We can get into more details later in this article.
Now that we know what these standalone programs are responsible for and where they are located, we can understand how they are executed via SQL Server Agent Jobs.
Since we are dealing with SQL Server Transactional Replication, we will go through the Snapshot Agent, Log Reader Agent, and Distribution Agent jobs (the same logic applies to all other agents).
The Snapshot Agent runs from the Server holding the distribution database. It prepares the Schema and initial data of all Articles included in a Publication on a Publisher, creates the Snapshot files in the snapshot folder, and records the Synchronization details in the Distribution database.
From the distribution MSSnapshot_agents table, we can identify the SQL Server Agent Job that does the Snapshot Agent activities. Every Publication involves a dedicated SQL Server Agent job that must take care of the Snapshot Agent Responsibilities.
Expand SQL Server Agent and open the job name mentioned above. It will display the details and the Category name – REPL-Snapshot
Click on the Step to see activities performed by the Snapshot agent.
Click on some individual step to view the info on the Snapshot agent job.
Step 1 logs an entry to the history table every time the Snapshot agent starts by using the sp_MSadd_snapshot_history procedure.
The table that holds the history of the details executed by the Snapshot agent is the MSsnapshot_history table in the distribution database.
It will match the View Snapshot Agent Status dialog window.
Move to Step 2 – Run Agent. It will start the Snapshot Agent job.
Under the Command, we couldn’t find any T-SQL statements or queries. There were only some parameters listed out. So, the answer lies in the highlighted section on the above illustration. It shows that the Job Step Type is Replication Snapshot which launches the snapshot.exe standalone program to perform the Snapshot Agent responsibilities.
For more details about snapshot.exe, refer to this MSDN article. We’ll also go into detail while troubleshooting the Replication-related issues later.
Finally, we are going to Step 3 – the last Job Step. It captures any unexpected shutdowns of Agent Jobs and logs them out.
Log Reader Agent
Whenever the Publication is configured on a database, any changes that happen to those Articles get marked for Replication in the Transaction Log. The Log Reader Agent reads those data changes via logread.exe and stores them to the Distribution Database via 2 separate processes:
- Read Transactional Logs – it uses the sp_replcmds extended stored procedure to scan for data changes from the Publisher. Since this stored procedure references DLL files, internals on how exactly Microsoft reads log files are not identified. However, we can try undocumented functions like fn_dblog() and fn_dump_dblog() to understand how the Transactional Log file works.
- Write to Distribution Database – it uses the sp_MSadd_replcmds stored procedure in the distribution database to write the binary data read from the Transactional Logs of the Publisher database. It writes the transaction details to the MSrepl_transactions table and individual commands to the MSrepl_commands table.
Only one Log Reader SQL Server Agent job is available for every Published database. You can identify its name as shown below:
Expand the SQL Server Agent and open up the above Log Reader Agent job to view the steps. It will display the job Сategory under Replication Log Reader.
Click on Steps to see individual steps performed by the Log Reader Agent. As with the Snapshot Agent job, we can see 3 equivalent steps for the Log Reader Agent job.
Step 1 calls the sp_MSadd_logreader_history procedure to log the Startup status history messages of the Log Reader Agent to the MSlogreader_history table.
Step 2 starts the Log Reader Agent process using the logread.exe standalone program.
You can find more details about logread.exe in the respective MSDN article. Later, we’ll also examine the critical configuration parameters of the Log Reader Agent.
Step 3 captures an abrupt shutdown of the Log Reader Agent job.
Distribution Agent (DISTRIB.exe) was used by Transactional and Snapshot Replication to apply the initial Snapshot files and incremental or apply available pending transactions from the Distribution database to the Subscriber database.
This Agent runs from the Distributor Server for the Push Subscriptions and the Subscriber Server for the Pull Subscriptions. To find the name of the SQL Server Agent Job that performs the Distribution agent responsibilities, we can execute the specific query as shown below:
Expand the SQL Server Agent job and open it to see more information and the category assigned to the Replication Distribution.
Click on Steps – you will see the steps similar to the previously exposed steps of the Snapshot and Log Reader Agent jobs.
Step 1 calls the sp_MSadd_distribution_history procedure to log the Startup status history messages of the Log Reader Agent to the MSdistribution_history table.
Step 2 starts the Distribution Agent process (DISTRIB.exe) with the default parameters.
For more details about DISTRIB.exe, turn to the MSDN article. Further, we’ll go through the critical configuration parameters of the Distribution agent in the next articles.
Step 3 captures details about the abrupt shutdown of the Distribution Agent job.
Replication Agent Profiles
From the Distributor Properties, we can get the option to view the Replication Agent Profiles. Leave the Agent Profiles to the default values and change only as required for troubleshooting purposes.
Click on Profile Defaults to view the default values configured for all Replication Agents available in the Server.
Select the Distribution Agents section and click the ellipsis button next to the Default agent profile to see the configured values. See the illustration below:
View the Default agent profile of the Snapshot Agents reader Agent:
Default agent profile for the Log Reader Agent:
Replication Maintenance Jobs
Besides Replication Agents, we have Replication Maintenance Jobs.
These are SQL Server Agent jobs created while configuring the SQL Server Transactional Replication. They are available to ensure that the Transactional Replication is working correctly.
Some Jobs on Replication Maintenance are essential for Transactional Replication. Let’s review them.
- Distribution Clean Up: Distribution – executes the sp_MSdistribution_cleanup procedure to delete replication commands from the MSrepl_transactions and MSrepl_commands tables. Clean-up happens on Distribution database after commands are successfully sent to the Subscriber database based upon on the Transaction Retention period value configured in the distribution database. By default, this job runs every 10 minutes on the distribution database. Change these values only after in-depth evaluation.
- Agent History Clean Up: Distribution – executes the sp_MShistory_cleanup procedure in the distribution database to clean up historical records older than the History Retention period configured in that database. By default, it is configured for 48 days and executed every 10 minutes. If you want to change these values, consider all aspects carefully.
- Expired Subscription Clean Up – executes the sp_expired_subscription_cleanup procedure in the master database to drop those subscriptions that expired or were inactive for a long time. By default, this procedure executes once a day.
Replication Latency and Tracer Tokens
Replication Latency is the time required by the Replication process to track any changes happening on published articles from the Publisher database till it is delivered successfully to the Subscriber via Distributor.
Replication Latency is measured in milliseconds. The target value of 0 (real-time replication) to a very low value (ideal cases). It is one of the key measures to monitor the Replication performance.
We can verify the Replication Latency using Replication Monitor or the dedicated sp_replcounters procedure.
Since the Replication Monitor has the refresh rate, there might be slight deviations from the values observed. To overcome the slight deviations while calculating the Replication latency, Tracer Tokens come to our rescue.
Click on the Tracer Tokens tab (see the image above) to send a new set of test commands from the Publisher. Then, measure it when it reaches the Distributor database, and when it was sent to the Subscriber database. Click on Insert Tracer to send tracer tokens from the Publisher database:
Once the records are successfully received in Subscriber, we can track the total Replication latency for our current setup. In our case, it is 9 seconds: 4 seconds from Publisher to Distributor, and 5 seconds from Distributor to Subscriber.
Tablediff utility(tablediff.exe) will be installed in the path C:\Program Files\Microsoft SQL Server\130\COM once we have the Replication Components installed out.
TableDiff utility compares 2 tables for non-convergence. It means that we can compare 2 tables and identify the differences between them. Then it synchronizes the Destination table compared to the Source table by generating dedicated INSERT/UPDATE/DELETE scripts. More details are available in the official documentation.
Since SQL Server Transactional Replication doesn’t care about manual changes on the Subscriber database, this utility can help synchronize these kinds of tables as and when required. However, it does not have a Wizard or UI – you can only access it via the Command Prompt or from batch files.
Other tools can make comparison and synchronization simpler. The dbForge Compare Bundle for SQL Server checks for the discrepancies in the databases and specific tables identifies and analyzes them. It also generates the necessary scripts to synchronize them. It offers a visual interface and a bunch of options to run the tasks fast and straightforward.
SQL Server Agent Alerts
All key components related to Replication Agents are residing as Jobs reside under the SQL Server Agent jobs. Hence, it is critical to monitor how SQL Server Agent jobs function continually to ensure that Replication works without any issues. The most common issues are below:
- Permissions issue executing any of the Replication Agent Jobs
- Permissions issue executing any of the Replication Maintenance Jobs.
- Permissions issue accessing Publisher or distribution or Subscriber database.
- SQL Server Agent not configured to start automatically on server restart.
- Several other Replication related Data issues like conflicts, missing data, and so on.
That’s why we should have a proper alerting mechanism in place to notify the DBA or other person about any issue immediately.
To alert DBAs or other people in case of any job failures or errors, we should configure the Database Mail to send email alerts. It allows the DBA to respond at once and fix the problem. We’ll discuss how to configure the Database Mail and Alerts in a separate article later.
While configuring Replication, SQL Server by default creates the below set of Alerts. You can configure them easily for the required criteria. It also ensures sending Notifications to required people for immediate action.
Thanks for going through another huge article about Replication. I hope it helped to clarify the internals of Transactional Replication and the details about Distribution Database, Replication Agents, and Standalone Programs responsible for those. We also identified the Replication Latency, Alerts, and Tracer Tokens.
Now we can dive deeper and learn how to treat and resolve Replication issues professionally. Stay tuned for the next article!