Managing your MS SQL Replication

Replication is one of the oldest technologies on MS SQL Server, loved by every database administrator. It is a great and reliable technology for bringing data closer to users, especially for distributed reporting. Thanks to it, the database availability increases across multiple SQL Servers and regions.

CodingSight - Managing your MS SQL Replication

Replication was introduced in SQL 2005. Can you believe it’s that old? Though newer managed SQL platforms on the cloud are introduced regularly, I believe that SQL replication will remain here. If it was a bug or insect, I would think of it as a cockroach. It’s hard to squash!

If you are one of those belonging to a small population of administrators who never managed a database, there is official Microsoft documentation on the topic. However, note that is it pretty long, comprehensive and will rob you of some time off from holiday or planned binge-watching TV series. Also, CodingSight offers the SQL Server Database Replication setup and configuration guide.

But before you get your hands dirty with the technical stuff, and I know you’re eager too, it’s important to plan for it.

The replication requirement may change with regards to location when you deploy to SQL Servers running on the cloud. But once the SQL replication is running like a well-oiled machine and replicating production data, you need to plan how you manage it.

In this post, I will share some tips and T-SQL scripts for you to use when you need to check many SQL Agent jobs are created after the replication configuration.

Monitor Replication Agents

When you set up and configure SQL replication, it also creates a set of standalone functions and SQL Agent jobs known as replication agents. Their goal is to carry out tasks associated with moving your tables, also called articles, in the replication configuration from publisher to subscriber/s. You can run replication agents from the command line and by applications that use Replication Management Objects (RMO).

SQL Server replication agents can be monitored and administered via Replication Monitor and SQL Server Management Studio.

Replication Monitor and SQL Server Management Studio.

The primary concern of a database administrator/replication administrator is making sure that all SQL Agents replication jobs are running. If the replication agent job fails, the subscriber may not receive data. Therefore, the distribution database may grow huge because of accumulated rows that won’t move to the subscriber database.

To set an alert for any replication agent job failure, you can create another agent job. It will check the job failures and send an email to your dba team if it identifies problems.

Check for Failed Replication Agent Jobs

Use the below script:

declare @time time 
set @time = dateadd(n,-30,getdate()) 
declare @date date 
set @date = convert(date,getdate()) 
declare @publisher varchar(100) 
set @publisher = @@SERVERNAME
SELECT LEFT(name,50) as [JobName], run_date AS [RunDate], run_time AS [RunTime], LEFT([message],50) AS [Message] 
(select distinct,a.run_date, run_time, message 
from msdb..sysjobhistory a inner join msdb..sysjobs b on a.job_id = b.job_id where like 'servername here%' and run_status <> 1 and message like '%error%' 
and convert(date,convert(varchar,a.run_date ))= convert(date,getutcdate()) replace(convert(varchar(8),dateadd(n,-30,getutcdate())),':','') ) a 

Create an Email Alert to Notify about the Job Failure

Apply the following script:

exec msdb.dbo.sp_send_dbmail 
@profile_name = 'DBA Alerts', 
@recipients = 'your dba team email here', 
@subject = '[Database name] Replication Jobs Failure', 
@query = 'SELECT LEFT(name,50) as [JobName], run_date AS [RunDate], run_time AS [RunTime], LEFT([message],50) AS [Message] 
(select distinct, a.run_date, a.run_time, message 
from msdb.dbo.sysjobhistory a inner join msdb.dbo.sysjobs b on a.job_id = b.job_id 
where like ''servername here %'' and 
convert(date,convert(varchar,a.run_date)) = convert(date,getutcdate()) ) a 
@attach_query_result_as_file = 0 ; 

Monitor the Table Containing Replicated Commands

To monitor the msrepl_commands table, you may use one more script provided below. Note that this table should grow too huge and too fast. If that is the case, the replication agent jobs might fail, or there could be a problem in the replication configuration.

The script is as follows:

use distribution 
SELECT Getdate() AS CaptureTime, LEFT(Object_name(t.object_id),20) AS TableName, st.row_count 
FROM sys.dm_db_partition_stats st WITH (nolock) 
INNER JOIN sys.tables t WITH (nolock) ON st.object_id = t.object_id INNER JOIN sys.schemas s WITH (nolock) ON t.schema_id = s.schema_id WHERE index_id < 2 AND Object_name(t.object_id) 
IN ('MSsubscriptions', 'MSdistribution_history', 'MSrepl_commands', 'MSrepl_transactions') 
ORDER BY st.row_count DESC

The msrepl_commands table growth trend also gives you a hint of how healthy your replication latency is. There are many factors of impact. If your environment is in the cloud, the region selection may contribute a big deal to replication latency.

Create a Simple Reporting on Replication and Send it by Email

You may use the following script:

Declare @Publisher sysname, @PublisherDB sysname 
-- Set Publisher server and database name 
Set @Publisher = 'publication server name'; 
Set @PublisherDB = 'publishing database name'; 
-- Refresh replication monitor data 
USE [distribution] 
Exec sys.sp_replmonitorrefreshjob @iterations = 1; 
With MaxXact (ServerName, PublisherDBID, XactSeqNo) 
As (Select, DA.publisher_database_id, max(H.xact_seqno) From dbo.MSdistribution_history H with(nolock) 
Inner Join dbo.MSdistribution_agents DA with(nolock) On = H.agent_id 
Inner Join master.sys.servers S with(nolock) On S.server_id = DA.subscriber_id 
Where DA.publisher_db = @PublisherDB 
Group By, DA.publisher_database_id), OldestXact (ServerName, OldestEntryTime) 
As (Select MX.ServerName, Min(entry_time) 
From dbo.msrepl_transactions T with(nolock) 
Inner Join MaxXact MX On MX.XactSeqNo < T.xact_seqno And 
MX.PublisherDBID = T.publisher_database_id 
Group By MX.ServerName) 
Select [Replication Status] = Case MD.status 
When 1 Then 'Started' 
When 2 Then 'Succeeded' 
When 3 Then 'In progress' 
When 4 Then 'Idle' 
When 5 Then 'Retrying' 
When 6 Then 'Failed' 
Subscriber = SubString(MD.agent_name, Len(MD.publisher) + 
Len(MD.publisher_db) + Len(MD.publication) + 4, 
Charindex('-', MD.agent_name, 
Len(MD.publisher) + Len(MD.publisher_db) + 
Len(MD.publication) + 5) - 
(Len(MD.publisher) + 
Len(MD.publisher_db) + Len(MD.publication) + 4)), 
[Subscriber DB] = A.subscriber_db, 
[Publisher DB] = MD.publisher_db, 
Publisher = MD.publisher, 
[Current Latency (sec)] = MD.cur_latency,
[Current Latency (hh:mm:ss)] = Right('00' + Cast(MD.cur_latency/3600 As varchar), 2) + 
':' + Right('00' + 
Cast((MD.cur_latency%3600)/60 As varchar), 2) + 
':' + Right('00' + 
Cast(MD.cur_latency%60 As varchar), 2), 
[Latency Threshold (min)] = Cast(T.value As Int), 
[Agent Last Stopped (sec)] = DateDiff(hour, agentstoptime, getdate()) - 1, 
[Agent Last Sync] = MD.last_distsync, 
[Last Entry TimeStamp] = OX.OldestEntryTime 
From dbo.MSreplication_monitordata MD with(nolock) 
Inner Join dbo.MSdistribution_agents A with(nolock) On = MD.agent_id Inner Join dbo.MSpublicationthresholds T with(nolock) On T.publication_id = MD.publication_id And T.metric_id = 2 -- Latency 
Inner Join OldestXact OX On OX.ServerName = SubString(MD.agent_name, Len(MD.publisher) + Len(MD.publisher_db) + 
Len(MD.publication) + 4, 
Charindex('-', MD.agent_name, 
Len(MD.publisher) + Len(MD.publisher_db) + 
Len(MD.publication) + 5) - 
(Len(MD.publisher) + 
Len(MD.publisher_db) + Len(MD.publication) + 4)) 
Where MD.publisher = @Publisher 
And MD.publisher_db = @PublisherDB 
And MD.publication_type = 0 -- 0 = Transactional publication And MD.agent_type = 3; -- 3 = distribution agent 
IF (@@ROWCOUNT > 500) 
-- send alerts here.. 500 rows of undistributed transactions , should be higher. run this on remote distributor 
EXEC msdb.dbo.sp_send_dbmail 
@profile_name = 'DBA Alert', 
@recipients = 'your dba team email here', 
@body = 'This is replication latency alert. Check undistributed transactions query.', 
@subject = 'Replication Latency Alert' ; 
PRINT 'Alert here!' --since email is not yet working 

Query the Articles List and Check for the Subscribers’ Health

If you are working on a transaction replication, these operations are extremely important. Here is a script:

SELECT DISTINCT LEFT(srv.srvname,50) AS publication_server 
, LEFT(a.publisher_db, 50) AS publisher_db 
, LEFT(p.publication,25) AS publication_name
, LEFT(a.article, 50) AS [article] 
, LEFT(a.destination_object,50) AS destination_object 
, LEFT(ss.srvname,25) AS subscription_server 
, LEFT(s.subscriber_db,25) AS subscriber_db 
, LEFT(,50) AS distribution_agent_job_name 
FROM distribution..MSArticles a 
JOIN distribution..MSpublications p ON a.publication_id = p.publication_id JOIN distribution..MSsubscriptions s ON p.publication_id = s.publication_id JOIN master..sysservers ss ON s.subscriber_id = ss.srvid 
JOIN master..sysservers srv ON srv.srvid = p.publisher_id 
JOIN distribution..MSdistribution_agents da ON da.publisher_id = p.publisher_id AND da.subscriber_id = s.subscriber_id 
ORDER BY 1,2,3 

Create a Reporting Summary for the DBA Team

To combine all replication statistics and delivered and undelivered commands, you can create a table in the distribution database to contain all the replication details.

From this table, you can create a reporting summary to distribute to the dba team. This table can be refreshed every day as part of the daily replication health check aside from the standard database administrator morning health check.

USE [distribution] 
IF OBJECT_ID('Tempdb.dbo.#ReplStats') IS NOT NULL 
DROP TABLE #ReplStats 
CREATE TABLE [dbo].[#ReplStats] ( 
[DistributionAgentName] [nvarchar](100) NOT NULL 
,[DistributionAgentStartTime] [datetime] NOT NULL 
,[DistributionAgentRunningDurationInSeconds] [int] NOT NULL ,[IsAgentRunning] [bit] NULL 
,[ReplicationStatus] [varchar](14) NULL 
,[LastSynchronized] [datetime] NOT NULL 
,[Comments] [nvarchar](max) NOT NULL 
,[Publisher] [sysname] NOT NULL 
,[PublicationName] [sysname] NOT NULL 
,[PublisherDB] [sysname] NOT NULL 
,[Subscriber] [nvarchar](128) NULL 
,[SubscriberDB] [sysname] NULL 
,[SubscriptionType] [varchar](64) NULL 
,[DistributionDB] [sysname] NULL 
,[Article] [sysname] NOT NULL 
,[UndelivCmdsInDistDB] [int] NULL 
,[DelivCmdsInDistDB] [int] NULL 
,[CurrentSessionDeliveryRate] [float] NOT NULL 
,[CurrentSessionDeliveryLatency] [int] NOT NULL 
,[TotalTransactionsDeliveredInCurrentSession] [int] NOT NULL
,[TotalCommandsDeliveredInCurrentSession] [int] NOT NULL ,[AverageCommandsDeliveredInCurrentSession] [int] NOT NULL ,[DeliveryRate] [float] NOT NULL 
,[DeliveryLatency] [int] NOT NULL 
,[TotalCommandsDeliveredSinceSubscriptionSetup] [int] NOT NULL ,[SequenceNumber] [varbinary](16) NULL 
,[LastDistributerSync] [datetime] NULL 
,[Retention] [int] NULL 
,[WorstLatency] [int] NULL 
,[BestLatency] [int] NULL 
,[AverageLatency] [int] NULL 
,[CurrentLatency] [int] NULL 
SELECT da.[name] AS [DistributionAgentName] 
,dh.[start_time] AS [DistributionAgentStartTime] 
,dh.[duration] AS [DistributionAgentRunningDurationInSeconds] ,md.[isagentrunningnow] AS [IsAgentRunning] 
,CASE md.[status] 
THEN '1 - Started' 
THEN '2 - Succeeded' 
THEN '3 - InProgress' 
THEN '4 - Idle' 
THEN '5 - Retrying' 
THEN '6 - Failed' 
END AS [ReplicationStatus] 
,dh.[time] AS [LastSynchronized] 
,dh.[comments] AS [Comments] 
,md.[publisher] AS [Publisher] 
,da.[publication] AS [PublicationName] 
,da.[publisher_db] AS [PublisherDB] 
WHEN da.[anonymous_subid] IS NOT NULL 
THEN UPPER(da.[subscriber_name]) 
ELSE UPPER(s.[name]) 
END AS [Subscriber] 
,da.[subscriber_db] AS [SubscriberDB] 
,CASE da.[subscription_type] 
WHEN '0' 
THEN 'Push' 
WHEN '1' 
THEN 'Pull'
WHEN '2' 
THEN 'Anonymous' 
ELSE CAST(da.[subscription_type] AS [varchar](64)) 
END AS [SubscriptionType] 
,md.[distdb] AS [DistributionDB] 
,ma.[article] AS [Article] 
,dh.[current_delivery_rate] AS [CurrentSessionDeliveryRate] ,dh.[current_delivery_latency] AS [CurrentSessionDeliveryLatency] ,dh.[delivered_transactions] AS 
,dh.[delivered_commands] AS [TotalCommandsDeliveredInCurrentSession] ,dh.[average_commands] AS [AverageCommandsDeliveredInCurrentSession] ,dh.[delivery_rate] AS [DeliveryRate] 
,dh.[delivery_latency] AS [DeliveryLatency] 
,dh.[total_delivered_commands] AS 
,dh.[xact_seqno] AS [SequenceNumber] 
,md.[last_distsync] AS [LastDistributerSync] 
,md.[retention] AS [Retention] 
,md.[worst_latency] AS [WorstLatency] 
,md.[best_latency] AS [BestLatency] 
,md.[avg_latency] AS [AverageLatency] 
,md.[cur_latency] AS [CurrentLatency] 
FROM [distribution]..[MSdistribution_status] ds 
INNER JOIN [distribution]..[MSdistribution_agents] da ON da.[id] = ds.[agent_id] 
INNER JOIN [distribution]..[MSArticles] ma ON ma.publisher_id = da.publisher_id 
AND ma.[article_id] = ds.[article_id] 
INNER JOIN [distribution]..[MSreplication_monitordata] md ON [md].[job_id] = da.[job_id] 
INNER JOIN [distribution]..[MSdistribution_history] dh ON [dh].[agent_id] = md.[agent_id] 
AND md.[agent_type] = 3 
INNER JOIN [master].[sys].[servers] s ON s.[server_id] = da.[subscriber_id] 
--Created WHEN your publication has the immediate_sync property set to true. This property dictates 
--whether snapshot is available all the time for new subscriptions to be initialized. 
--This affects the cleanup behavior of transactional replication. If this property is set to true, 
--the transactions will be retained for max retention period instead of it getting cleaned up 
--as soon as all the subscriptions got the change. 
WHERE da.[subscriber_db] <> 'virtual' 
AND da.[anonymous_subid] IS NULL
AND dh.[start_time] = ( 
SELECT TOP 1 start_time 
FROM [distribution]..[MSdistribution_history] a 
INNER JOIN [distribution]..[MSdistribution_agents] b ON a.[agent_id] = b.[id] 
AND b.[subscriber_db] <> 'virtual' 
WHERE [runstatus] <> 1 
ORDER BY [start_time] DESC 
AND dh.[runstatus] <> 1 
SELECT 'Transactional Replication Summary' AS [Comments]; 
SELECT [DistributionAgentName] 
,SUM([UndelivCmdsInDistDB]) AS [UndelivCmdsInDistDB] 
,SUM([DelivCmdsInDistDB]) AS [DelivCmdsInDistDB] 
FROM #ReplStats 
GROUP BY [DistributionAgentName] 
SELECT 'Transactional Replication Summary Details' AS [Comments]; 
SELECT [Publisher] 
,SUM([UndelivCmdsInDistDB]) AS [UndelivCmdsInDistDB] ,SUM([DelivCmdsInDistDB]) AS [DelivCmdsInDistDB] 
FROM #ReplStats 
GROUP BY [Publisher] 


I hope that these few T-SQL scripts provided above will help you in your replication agents’ monitoring. I highly recommend you monitor them closely. Otherwise, users at the subscriber’s end may complain endlessly about not having (close to) real-time data.

In the coming articles, I will dig deeper into the SQL technology of replicating data to any part of the globe. Happy monitoring!

Carla Abanes

Carla Abanes

Carla is a certified MS SQL database administrator that also doubles as cloud architect. She is experienced managing both AWS and Azure platforms. In her free time she loves to sit down with a good book and coffee.