SQL Server Transactional Replication Internals

SQL Server Transactional Replication is one of the most common Replication techniques used to share, copy, or distribute data to multiple destinations. In this article, we will discuss Replication, various Replication Types, and pay special attention to the Transactional Replication work.

CodingSight - SQL Server Transactional Replication Internals

What is SQL Transactional Replication?

Replication is the SQL Server technology for Copying or Distributing data from one database to another while maintaining Data consistency.

Replication can be used to transfer data from one database to another

  • on the same instance or another instance in the same server;
  • or across Servers in a single location or multiple locations.

First, we should go through the Replication Architecture and understand the Replication terminologies.

SQL Server Replication Architecture & Terminologies

Replication Architecture & Terminologies
Image source
  • The Publisher is the Source Database instance that publishes the data changes that can be distributed to another database. Data from a Single Publisher can be sent to a Single Subscriber or multiple Subscribers.
  • Subscriber is the Destination Database Instance where we distribute the data changes captured from the Publisher Database. Subscriber can be either a Publisher Instance or another instance in Publisher Server/ another Server in the same location/distant location. Before the data changes are distributed to the Subscriber Database instance, these data are stored in the Distributor.
  • The Distributor is a Database that stores change logs captured from Publisher databases. When a Server is enabled as a Distributor, it will create a system Database named the distribution database.

By the location of distribution databases, they can be classified as either local or remote distributors.

Local Distributor is a distribution database residing on the Publisher database instance.
Remote Distributor is a distribution database residing either in the Subscriber database instance or in any other SQL Server instance apart from the Publisher database instance.

The deciding factor is where to place the Distribution database on the Publisher instance (another instance). it depends on the Server resources available to handle the Data Distribution load.

According to the way of how the data will be sent from the Distribution database to the Subscriber instance, it can be classified into either Push or Pull Subscriptions.

Push Subscription means that the Distribution database takes responsibility to push the data to the Subscriber database instance.
Pull Subscription means that the Subscriber database instance takes responsibility to pull the available data from the Distribution database and apply it to the Subscriber database.

  • Articles are the fundamental unit of Replication. It indicates any data changes on this database object or article that will be replicated from Publisher to Subscriber. The Article can be a Table, View, Indexed View, Stored Procedure, or the User Defined Function.
  • Publications are a collection of one or more Articles from the database in Publisher.
  • Subscription defines what Publication will be received. Also, it determines from which Publication and on what schedule the data is replicated. Subscription can be either Push or Pull (from Publication to Subscription).
  • Replication Agents are standalone programs responsible for tracking changes and distributing data across the Publisher to Distributor and Subscriber. All Replication Agents execute as Jobs under SQL Server Agent. Thus, it can be administered via SSMS under SQL Server Agent Jobs or Replication Monitor. The following types of Replication Agents are available:
  • Snapshot Agent – Used by almost all types of Replication. The Snapshot Agent runs from Server holding the distribution database. It prepares the Schema and initial data of all Articles included in a Publication on a Publisher. Also, it creates the Snapshot files in a snapshot folder and records Synchronization details in the Distribution database.
  • Log Reader Agent – Used by Transactional Replication. The goal is to read the data changes of articles enabled for Replication from the Publisher Database Transaction Logs and stored to the Distribution Database. the Log Reader Agent runs from the Distributor Server.
  • Distribution Agent – Used by Transactional and Snapshot Replication. It applies the initial Snapshot files and incremental or available pending transactions from the Distribution database to the Subscriber database. The Distribution Agent runs from the Distributor Server for Push Subscriptions and the Subscriber Server for Pull Subscriptions.
  • Merge Agent – Used by Merge Replication only. It applies the initial snapshot files and reconciliation of differential or incremental changes across the Publisher or Subscriber. The Merge Agent runs on the Distributor Server for Push Replication and from the Subscriber Server for Pull Subscriptions.
  • Queue Reader Agent – Queue Reader Agent is used by Transactional Replication with queued update option. It moves changes from Subscriber to Publisher. The Queue Reader Agent runs from the Distributor Server.
  • Replication Maintenance Jobs – As explained earlier, all Replication Agents are standalone programs set up while configuring Replication. They run as jobs under SQL Server Agent Jobs. Few significant jobs to be noted are Distribution Clean Up: Distribution, Agent History Clean Up: Distribution, and Expired Subscription Clean Up.

Types of Replication in SQL Server

Now when we know the terminology, let’s dive into the types of Replication.

  1. Transactional Replication. As the name suggests, every transaction or data change within the transactional scope on Publisher will be sent to Subscriber in near real-time with minor delays depending upon the network bandwidth and server resources. Transactional Replication uses the Log Reader Agent to read the data changes from the Transactional Logs of the Publisher database. It also uses the Distribution Agent to apply changes to Subscriber. Occasionally it may use Snapshot Agent to take initial Snapshot data of all Replicated articles. Transactional Replication Publication may fall under the below categories:
    • Standard Transactional Replication – Subscriber is a read-only database from the Transactional Replication perspective. Any changes performed by anyone on the Subscriber database will not be tracked and updated to the Publisher Database. The Standard Transactional Replication is often referred to as Transactional Replication.
    • Transactional Replication with Updatable Subscriptions is an Enhancement of Standard Transactional Replication that tracks the data changes taking place on Subscribes. Whenever data changes are initiated on an Updateable Subscription, they will be first propagated to Publisher and then to other Subscribers.
    • Peer-to-Peer Replication is an Enhancement of the Standard Transactional Replication. It propagates transactionally consistent changes in near real-time across multiple server instances.
    • Bidirectional Replication is an Enhancement of the Standard Transactional Replication that allows two servers (limit to only 2 Servers and hence named Bidirectional) to exchange the data changes across each other with any server acting as a Publisher (to send changes to another Server) or as a Subscriber (to receive changes from another server).
  2. Merge Replication – Supports capturing data changes that take place across both Publisher and Subscriber and distributes it to the other Server. The Merge Replication requires the ROWGUID column on the Table Articles involved in Merge Replication. It uses Triggers to capture the data changes across Publisher and Subscriber. Also, it delivers the changes across Servers when both Publisher and Subscriber are connected to the network. Merge Replication uses the Merge Agent to replicate the data changes across Publisher and Subscriber.
  3. Snapshot Replication – As the name indicates, Snapshot Replication doesn’t rely on either Transactional Logs or Triggers to capture the changes. It takes a snapshot of articles involved in Publication and applies it to the Subscriber with the records available at the time of the snapshot. The Snapshot Replication uses the Snapshot Agent to take a snapshot of the Publisher and uses the Distribution Agent to apply these records to Subscriber.

SQL Server Transactional Replication

Transactional Replication is typically preferred in scenarios where the OLTP Publisher database has heavy Data INSERT/UPDATE and/or DELETE activities.

Since the Publisher server instance has huge DISK IO happening, generating Reports may cause severe blockings. It also can impact the Server performance. Hence, another Server with near real-time data is good for offloading the Reporting requirements.

One of the fundamental requirements for Transactional Replication is that the tables replicated should have a Primary Key available.

We can summarize how Transactional Replication works. Have a look at the below Transactional Replication Architecture diagram taken from the official Microsoft documentation.

Transactional Replication Architecture diagram

The Publication is created on the Publisher database that comprises the list of Articles for replicating to the Subscriber database.

Transactional Replication will typically be initialized from Publisher to Distributor via the Snapshot Agent or Full Backups. The Snapshot Agent is supported via Replication Configuration Wizard. The Full Backup is supported via TSQL Statements to initialize the Transactional Replication.

The Log Reader Agent scans the Transactional Log of the Publisher database for tracked Articles. Then it copies the data changes from the Transactional Log to the Distribution database.

The Distribution database can be either in Publisher or Subscriber; it can also be or another independent SQL Server instance.

Note also the following things:

  • Log Reader Agent runs continuously from the Distributor Server to scan for new commands marked for Replication. However, if you don’t wish to run continuously and want to run on a schedule instead, we can change the Log Reader Agent SQL Job that will be created out.
  • Log Reader Agent picks up all records marked for Replication from the Transactional Log in batches and sends them to the Distribution database.
  • Log Reader Agent picks up only Committed transactions from the Transactional Log of Publisher Database. So, any long-running queries on the Publisher database can directly impact Replication as it is waiting for the active transaction to complete.

The Distribution Agent picks up all undistributed new commands from the Distribution database and applies them to the Subscription database via either Push or Pull Mechanism. As mentioned earlier, if Push Subscription Distributor takes ownership to apply the changes from Distribution database to Subscriber whereas in Pull Subscription Subscriber database takes ownership to fetch the changes from Distribution database to Subscriber.

Once the records are distributed successfully from the Distribution to Subscriber database, they will be marked as Distributed and marked for deletion from the Distribution database. One of the Key Replication Maintenance jobs named Distribution Clean Up: Distribution job runs once every 10 minutes to delete the distributed records from the Distribution database to maintain the Distribution database size under control.

With the detailed explanation of concepts of Replication and Transactional Replication, we can get our hands on it by configuring Replication for AdventureWorks database and verifying Replication for every component discussed theoretically.

Configuring Transactional Replication Step by Step (via SSMS GUI)

Transactional Replication configuration involves 3 major steps:

  1. Configuring Distribution database
  2. Publication Creation
  3. Subscription Creation

Before trying to configure Replication, make sure that Replication Components are installed as part of SQL Server installation or use SQL Server media to install Replication Components, as they are necessary for the task.

In SSMS, connect to the Publisher Database Instance and right-click on Replication:

Publisher Database Instance

Distribution is not configured right now. Hence, we have the Configure Distribution option. We can either Configure the Distribution database using the Configure Distribution wizard or via the Publication Creation wizard.

To configure the Distribution database and Publication, follow the below steps:

Expand Replication and right-click on New Publication.

Expand Replication and right-click on New Publication

New Publication Wizard will launch. Click Next to see the Distributor configuration options.

By default, it chooses the Publisher server to hold the distribution database. If you wish to use a remote distribution database, choose the second option. Click Next.

New Publication Wizard

The next option is for configuring the Snapshot Folder. Change it to the required folder. Otherwise, it will be created on the SQL Server installation folder path by default. Click Next.

configuring the Snapshot Folder

Select the Publication Database (here it is AdventureWorks) and click Next.

Publication Database

Choose the Publication TypeTransactional Replication. Click Next.

Publication Type – Transactional Replication.

 Choose Articles for this publication. For testing purposes, select all Tables and Views:

Choose Articles for this publication.

Before clicking on Next, expand tables once more to verify some issues.

Some tables are marked by red icons. When we click on those tables, we see the warning indicating that a table can’t be replicated because it doesn’t have a Primary Key, one of the crucial requirements for Transactional Replication. We will later go into more detail. Now, click Next.

warning indicating that a table can’t be replicated because it doesn’t have a Primary Key

A page with Article Issues related to dependencies will appear. Click Next.

A page with Article Issues related to dependencies will appear

The next option is to Filter Table Rows – since we are testing the basic replication, we can ignore it. Click Next.

Filter Table Rows

Configure Snapshot Agent – ignore and click Next.

Configure Snapshot Agent

Agent Settings – click Security Settings to configure account to execute the Snapshot agent and Log Reader Agent under it.

configure account to execute the Snapshot agent and Log Reader Agent under it

Then, change the Snapshot Agent Process to run under SQL Server Agent Service Account.

Set the Log Reader Agent to Connect to Publisher > By impersonating the Process Account. Click OK.

Log Reader Agent

The Agent Security will get updated.

Thus, we have configured the Distributor and all elements of the Publication like Articles, Snapshot Agent, Log Reader Agent, and Agent Securities. We have almost completed the Publication creation via Wizard.

If you need to study further on TSQL scripts used to create Publication, we can check the Generate a script file to create the Publication option. Otherwise, click Next.

Generate a script file to create the Publication

Since I chose to save the file, the wizard allows me to set the Script file path and name. Provide these details and click Next.

Script File Properties

The Wizard finally asks for the Publication Name, I’ve named it AdventureWorks_pub with the database name and keywords to indicate it as a publication for easier identification.

Verify all data provided on the Summary page and click Finish.

Verify all data provided on the Summary

The Wizard will display the progress in Creating Publication. When it is complete, we’ll see the confirmation. Click Close.

The Wizard will display the progress in Creating Publication.

To verify the successful creation of the Distributor (Distribution database), expand the system databases:

expand the system databases

To verify the successful creation of Publication, expand Local Publication:

expand Local Publication

We have configured the Distribution Database and created the Publication database on the AdventureWorks database successfully. Now we can proceed with the Subscription creation

Right-click on the new Publication we have just created and select New Subscriptions:

New Subscription has just created

The New Subscriptions Wizard will appear. To start the process, click Next.

The Publication page asks to ensure that both the Publication and Publisher databases are selected. Click Next.

Publication and Publisher databases are selected

Set the Distribution Agent to either Push or Pull Subscription. We are going to use the Publisher Server as the Subscriber, and that type won’t have any impact. Hence, we leave the default Push Subscription. Click Next.

Set the Distribution Agent to either Push or Pull Subscription

Select the Subscribers (database). I’m selecting AdventureWorks_REPL restored from the same AdventureWorks Database backup. Click Next.

Select the Subscribers (database)

Set the Agent Security:

Set the Agent Security

Since I’m going to do everything within a single Server, I’m using the Agent service account.

Agent service account

The next window presents the Distribution Agent Security values already configured. Click Next.

Distribution Agent Security

Synchronization Schedule – leave it to the default. Click Next.

Synchronization Schedule

Initialize Subscriptions – leave it with the default values. Click Next.

Initialize Subscriptions

After you provide all the necessary details, you will be able to complete the process of creating the Subscription. Check the Generate Script file… option to study the scripts later and click Next.

Generate Script file

Provide the path to save the files, click Next.

Provide the path to save the files

Have a look at the summary and check all the configured values. Once verified, click Finish.

check all the configured values

The Subscription Creation is completed. Click Close.

The Subscription Creation is completed

Now we can see the Subscription displayed under our Publication.

the Subscription displayed under our Publication.

Configure the Snapshot Agent

Our next step is to work on the Snapshot Agent to send the initial data from Publisher to Subscriber.

Before stepping into it, we need to notice the Replication Monitor. This critical tool is available in SSMS to view the Replication status at various levels, Server Level, Publisher Database level, Subscription level, and Replication Agents level.

Right-click on Replication/Local Publication/Local Subscription/Publication or the Subscription we created to launch the Replication Monitor as shown below:

Configure the Snapshot Agent

In the Replication Monitor, expand Publisher Server (RRJ) > Publication ([AdventureWorks]: AdventureWorks_pub) to display the Subscription details. Right-click on Subscription and select View Details.

Right-click on Subscription and select View Details.

As we can see, the information about the Initial Snapshot for our publication AdventureWorks_pub is not yet available. We’ll need to execute the Snapshot agent job to send initial data to the Subscriber database.

Keep this window open to see the progress of Snapshot after starting the Snapshot Agent job.

execute the Snapshot agent job to send initial data to the Subscriber database.

Right-click on Publication > View Snapshot Agent Status:

View Snapshot Agent Status

The agent has never been run message states that we have never executed the Snapshot Agent. Click Start.

The agent has never been run message

While the Snapshot Agent is executing, you can watch the progress:

Snapshot Agent is executing

When all snapshots are created, it will produce the confirmation message:

all snapshots are created

We can see the Snapshot files created newly in the Snapshot folder for which we provided the path earlier.

Snapshot files created newly in the Snapshot folder

After all snapshots are applied by the Distribution Agent to the Subscriber database, it will display the below status in the open Replication Monitor window:

Replication Monitor window

Congrats! We have successfully configured Transactional Replication using Snapshot Agent.

Note: If we have a huge Publisher Database, creating snapshot might take a lot of time. Thus, it is recommended to use the full backup of the Publisher database instead of executing the Snapshot Agent – we’ll cover this issue in subsequent articles.

Verifying Replication Components

Every Replication Components can be verified by both SSMS GUI and TSQL queries. We’ll discuss it in further articles, and here we’ll quickly explain how to view the properties of the below components.

Publisher

In SSMS, right-click Replication > Publisher Properties > Publication Databases:

Publication Databases in SSMS

To view details about the Publisher, execute the below queries against the distribution database.

USE distribution
GO
exec sp_helpdistpublisher
GO
select * from MSpublisher_databases
GO

Subscriber

Subscriber info can be obtained with the below query in SSMS.

USE distribution
GO
exec sp_helpsubscriberinfo
GO
select * from MSsubscriber_info

Distributor

In SSMS, right-click Replication > Distributor Properties:

Distributor Properties in SSMS

Click on Publishers to display the list of all Publishers using this Distribution database.

Publishers using this Distribution database

In SSMS, we can run the below query to obtain the same details.

USE distribution
GO
exec sp_helpdistributor
GO
exec sp_helpdistributiondb
GO	

Articles

Right-click on Publication > Publication Properties > Articles. You will see the list of all Articles available. The properties of Individual articles can be modified by clicking on Article Properties as well.

Article Properties
USE AdventureWorks
GO
-- To View all articles available under a Publication
exec sp_helparticle @publication = 'Adventureworks_pub'
GO
-- To View all article columns for a particular article available under a Publication
exec sp_helparticlecolumns @publication = 'Adventureworks_pub', @article = 'Address'
GO
USE distribution
GO
SELECT * from MSArticles

Publication

Right-click on Publication > Properties:

Publication > Properties

In SSMS, we can run the below query to view the Publication properties:

USE AdventureWorks
GO
exec sp_helppublication
GO
USE distribution
GO
SELECT * FROM MSPublications

Subscription

Right-click on Subscription > Subscription properties:

 Subscription > Subscription properties

In SSMS, we can execute the below script to get the Subscription info:

USE AdventureWorks
GO
exec sp_helpsubscription
GO
USE distribution
GO
SELECT * FROM MSsubscriptions
GO

Replication Agents

Under SQL Server Agent Jobs, we can find the specific Jobs created for all Replication Agents:

SQL Server Agent Jobs

In SSMS, we can execute the query to find out which job is the necessary Log Reader Agent Job, Snapshot Agent Job, and Distribution Agent Jobs. Besides, we can see the Distribution Agent Cleanup job and several other jobs related to Replication created while we were setting Publication and Subscriptions internally.

How Log Reader Agent Works

The Log Reader Agent reads all committed data from the Publisher database transactional logs and pushes it to the Distributor database. Even though Microsoft doesn’t provide an official way to read Transactional Logs, there are few undocumented functions like fn_dblog() and fn_dump_dblog() that can read the data from Log Files. However, these functions are undocumented and not covered by Microsoft support. Thus, we won’t explore them further.

How Distribution Agent Delivers the Data Changes to the Subscriber Database

Once the data is written to the Distribution database, we can read how that data is stored in distribution tables. For that, we apply the sp_browsereplcmds procedure – it fetches the records across the MSrepl_commands and MSrepl_transactions tables.

For learning purposes, let’s take a table with 3 columns named Person.ContactType:

table with 3 columns named Person.ContactType

The created Subscription will make 3 procedures for every article that is part of Publication in Subscriber database with the below naming conventions:

  • dbo.sp_MSins_<SchemaName><TableName>
  • dbo.sp_MSupd_<SchemaName><TableName>
  • dbo.sp_MSdel_<SchemaName><TableName>

For the Person.ContactType Table article, we can see the below procedures created in the Subscriber database:

  • dbo.sp_MSins_PersonContactTypeINSERT new records captured from the Transaction Logs of Publisher database and then propagated to the distribution database.
  • dbo.sp_MSupd_PersonContactTypeUPDATE changes captured from the Transaction Logs of Publisher database and then propagated to the distribution database.
  • dbo.sp_MSdel_PersonContactTypeDELETE records captured from Transaction Logs of Publisher database and then propagated to the distribution database.
procedures created in the Subscriber database

Script of the dbo.sp_MSins_PersonContactType Procedure

As you can see, it’s a straightforward INSERT statement that comes out of the distribution database:

ALTER procedure [dbo].[sp_MSins_PersonContactType]
    @c1 int,
    @c2 nvarchar(50),
    @c3 datetime
as
begin  
	insert into [Person].[ContactType] (
		[ContactTypeID],
		[Name],
		[ModifiedDate]
	) values (
		@c1,
		@c2,
		@c3	) 
end  
GO

Script of the dbo.sp_MSupd_PersonContactType Procedure

The script relies on the Primary Key values to identify the unique record for updating:

ALTER procedure [dbo].[sp_MSupd_PersonContactType]
		@c1 int = NULL,
		@c2 nvarchar(50) = NULL,
		@c3 datetime = NULL,
		@pkc1 int = NULL,
		@bitmap binary(1)
as
begin  
	declare @primarykey_text nvarchar(100) = ''
update [Person].[ContactType] set
		[Name] = case substring(@bitmap,1,1) & 2 when 2 then @c2 else [Name] end,
		[ModifiedDate] = case substring(@bitmap,1,1) & 4 when 4 then @c3 else [ModifiedDate] end
	where [ContactTypeID] = @pkc1
if @@rowcount = 0
    if @@microsoftversion>0x07320000
		Begin
			if exists (Select * from sys.all_parameters where object_id = OBJECT_ID('sp_MSreplraiserror') and [name] = '@param3')
			Begin
				
				set @primarykey_text = @primarykey_text + '[ContactTypeID] = ' + convert(nvarchar(100),@pkc1,1)
				exec sp_MSreplraiserror @errorid=20598, @param1=N'[Person].[ContactType]', @param2=@primarykey_text, @param3=13233 
			End
			Else
				exec sp_MSreplraiserror @errorid=20598
		End
end 
GO

Script of the dbo.sp_MSdel_PersonContactType Procedure

This script relies on the Primary Key values to identify a unique record for deleting records from the Subscriber:

ALTER procedure [dbo].[sp_MSdel_PersonContactType]
		@pkc1 int
as
begin  
	declare @primarykey_text nvarchar(100) = ''
	delete [Person].[ContactType] 
	where [ContactTypeID] = @pkc1
if @@rowcount = 0
    if @@microsoftversion>0x07320000
		Begin
			if exists (Select * from sys.all_parameters where object_id = OBJECT_ID('sp_MSreplraiserror') and [name] = '@param3')
			Begin
				
				set @primarykey_text = @primarykey_text + '[ContactTypeID] = ' + convert(nvarchar(100),@pkc1,1)
				exec sp_MSreplraiserror @errorid=20598, @param1=N'[Person].[ContactType]', @param2=@primarykey_text, @param3=13234 
			End
			Else
				exec sp_MSreplraiserror @errorid=20598
		End
end  
GO

This clearly explains why Transactional Replication enforces the Primary Key as a key requirement for tables to be added as part of Replication.

Now, let’s see the Transactional Replication in action. Let’s change some data in the Publisher database. For simplicity, I’ll take the same Person.ContactType table.

Executing the SELECT statement on the table yields 20 records:

Executing the SELECT statement on the table yields

Next, I INSERTed a sample record into the Person.ContactType table:

sample record was INSERTed into the Person.ContactType table

And, I UPDATE the newly inserted record:

newly inserted record was UPDATEd

DELETE the newly inserted record from the table:

newly inserted record was DELETEd from the table

We need to verify these transactions in Replication using sp_browsereplcmds

verifying the transactions in Replication

Changes from Person.ContactType were captured from the Transactional Logs of Publisher Database (AdventureWorks) and sent to the Distribution database in the same order. Later, it was propagated to the Subscriber Database (AdventureWorks_REPL).

Conclusion

Thanks for reading this long power-packed article! We have gone through a variety of topics, such as:  

  • Replication Architecture and Terminologies
  • SQL Server Replication Types
  • SQL Server Transactional Replication in Detail
  • SQL Server Transactional Replication Configuration (Default approach)
  • SQL Server Transactional Replication Verification
  • SQL Server Transactional Replication in action

I hope that you’ve found lots of helpful information in this article. In subsequent parts, we’ll explore troubleshooting various issues that are frequently encountered in Replication, and learn how to handle them more efficiently.

Raja Jegan

Raja Jegan

Raja Jegan Ramesh is an experienced Database Architect with more than 15+ years of experience in and around various RDBMS technologies. Primarily focused on SQL Server and related areas, he handles other RDBMS platforms like Oracle, MySQL, etc., in his day to day work. During his spare time, he contributes in Experts-Exchange platform to help resolve day to day issues for several people which helps him encounter lot of scenarios related to SQL Server and how to handle those efficiently by working with other SQL Server experts as well.