
SQL Server Replication Monitor: Scripts for Checking Replication Status


Replication is one of the oldest technologies on MS SQL Server, loved by every database administrator. It is a great and reliable technology for bringing data closer to users, especially for distributed reporting. Thanks to it, the database availability increases across multiple SQL Servers and regions.

Replication predates SQL Server 2005 by several releases; it was already available in SQL Server 2000 and earlier. Can you believe it’s that old? Though newer managed SQL platforms on the cloud are introduced regularly, I believe that SQL replication will remain here. If it were a bug or insect, I would think of it as a cockroach. It’s hard to squash!

If you belong to the small population of administrators who have never managed a replicated database, there is official Microsoft documentation on the topic. However, note that it is pretty long and comprehensive, and it will rob you of some holiday time or a planned TV binge. Also, Codingsight offers the SQL Server Database Replication setup and configuration guide.

But before you get your hands dirty with the technical stuff, and I know you’re eager to, it’s important to plan for it.

The replication requirements may change with regard to location when you deploy to SQL Servers running on the cloud. But once SQL replication is running like a well-oiled machine and replicating production data, you need to plan how you will manage it.

In this post, I will share some tips and T-SQL scripts you can use to check the many SQL Agent jobs that are created once replication is configured.

What Are Replication Agents?

When you set up and configure SQL replication, it also creates a set of standalone programs and SQL Agent jobs known as replication agents. Their goal is to carry out the tasks associated with moving your tables, also called articles, in the replication configuration from the publisher to the subscribers. You can run replication agents from the command line or from applications that use Replication Management Objects (RMO).

SQL Server replication agents can be monitored and administered via Replication Monitor and SQL Server Management Studio.
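
As a quick way to see which agent jobs replication has created, the following minimal sketch lists the SQL Agent jobs that fall into the built-in replication job categories. It assumes those categories keep their default names, which start with REPL- (for example REPL-LogReader or REPL-Distribution); the column aliases are only illustrative.

```sql
-- List SQL Agent jobs that belong to the replication job categories.
-- Assumption: the default category names (REPL-...) have not been changed.
SELECT j.name    AS JobName,
       c.name    AS JobCategory,
       j.enabled AS IsEnabled
FROM msdb.dbo.sysjobs AS j
INNER JOIN msdb.dbo.syscategories AS c
        ON c.category_id = j.category_id
WHERE c.name LIKE 'REPL-%'
ORDER BY c.name, j.name;
```

Running it on the distributor usually gives the most complete picture, because that is where the log reader and push distribution agent jobs live.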


The primary concern of a database administrator or replication administrator is making sure that all SQL Agent replication jobs are running. If a replication agent job fails, the subscriber may not receive data. As a result, the distribution database may grow huge because of accumulated rows that cannot move to the subscriber database.

To set an alert for any replication agent job failure, you can create another agent job. It will check for job failures and send an email to your DBA team if it identifies problems.
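
A minimal sketch of such a checking job, built with the msdb job procedures, could look like the following. The job name, step name, and 30-minute schedule are hypothetical choices for illustration, and the step command is only a placeholder for the failure check and email scripts shown later in this post.

```sql
-- Hypothetical job name; pick one that fits your naming standard.
EXEC msdb.dbo.sp_add_job
     @job_name = N'DBA - Check replication agent failures';

-- One T-SQL step; the command below is a placeholder, not a real check.
EXEC msdb.dbo.sp_add_jobstep
     @job_name      = N'DBA - Check replication agent failures',
     @step_name     = N'Run failure check',
     @subsystem     = N'TSQL',
     @database_name = N'msdb',
     @command       = N'/* paste the failure check and sp_send_dbmail call here */';

-- Run every 30 minutes, every day.
EXEC msdb.dbo.sp_add_jobschedule
     @job_name             = N'DBA - Check replication agent failures',
     @name                 = N'Every 30 minutes',
     @freq_type            = 4,    -- daily
     @freq_interval        = 1,    -- every 1 day
     @freq_subday_type     = 4,    -- sub-day unit: minutes
     @freq_subday_interval = 30;

-- Target the local server so the job can actually run.
EXEC msdb.dbo.sp_add_jobserver
     @job_name = N'DBA - Check replication agent failures';
```

Matching the schedule interval to the look-back window of the check (30 minutes here) keeps the same failure from being missed or reported twice.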

Script 1: How to Check for Failed Replication Agent Jobs

The script provided below identifies replication agent jobs that have failed within the last 30 minutes. It does so by querying the job history in the msdb database, specifically targeting jobs with a run status other than success and an error in the message. The run_date and run_time columns are integers (yyyymmdd and hhmmss) in the server's local time, so the script converts them to a datetime before comparing. This script is particularly useful for database administrators who need to keep a close eye on the health of their SQL Server replication environment, because it lets you detect and address issues promptly.

Use the below script:

-- Replication agent jobs that failed within the last 30 minutes. 
declare @cutoff datetime 
set @cutoff = dateadd(n,-30,getdate()) 
-- Agent job names normally begin with the publisher name; set it explicitly on a remote distributor. 
declare @publisher varchar(100) 
set @publisher = @@SERVERNAME 
SELECT DISTINCT LEFT(b.name,50) as [JobName], a.run_date AS [RunDate], a.run_time AS [RunTime], LEFT(a.[message],50) AS [Message] 
FROM msdb..sysjobhistory a 
inner join msdb..sysjobs b on a.job_id = b.job_id 
where b.name like @publisher + '%' 
and a.run_status <> 1 -- 1 = succeeded 
and a.[message] like '%error%' 
-- Build a datetime from run_date (yyyymmdd) and run_time (hhmmss), both in server local time. 
and dateadd(second, (a.run_time / 10000) * 3600 + (a.run_time / 100 % 100) * 60 + a.run_time % 100, 
convert(datetime, convert(char(8), a.run_date), 112)) >= @cutoff 

Script 2: Create an Email Alert to Notify about the Job Failure

Next, we will set up an automated email for job failures in your SQL Server environment. The provided script uses the ‘sp_send_dbmail’ stored procedure, which is part of SQL Server’s Database Mail system. It sends an email to your DBA team with the essential details of the jobs that failed today: the job name, run date, run time, and a brief message describing the failure. The script sends one email each time it runs, so schedule it as a SQL Agent job step to be notified promptly of any issues.

Apply the following script:

-- Emails today's failed replication agent jobs (run_status <> 1); run_date is in server local time. 
exec msdb.dbo.sp_send_dbmail 
@profile_name = 'DBA Alerts', 
@recipients = 'your dba team email here', 
@subject = '[Database name] Replication Jobs Failure', 
@query = 'SELECT DISTINCT LEFT(b.name,50) as [JobName], a.run_date AS [RunDate], a.run_time AS [RunTime], LEFT(a.[message],50) AS [Message] 
from msdb.dbo.sysjobhistory a inner join msdb.dbo.sysjobs b on a.job_id = b.job_id 
where b.name like ''servername here%'' 
and a.run_status <> 1 
and convert(date,convert(varchar(8),a.run_date)) = convert(date,getdate()) 
', 
@attach_query_result_as_file = 0 ; 
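
To avoid sending an empty report when nothing has failed, the call can be wrapped in an existence check. The following is a minimal sketch under the same assumptions as the script above: the 'DBA Alerts' mail profile, the recipient, and the 'servername here%' job name prefix are placeholders to replace with your own values.

```sql
-- Send the alert only when at least one matching job has failed today.
IF EXISTS (
    SELECT 1
    FROM msdb.dbo.sysjobhistory AS a
    INNER JOIN msdb.dbo.sysjobs AS b ON a.job_id = b.job_id
    WHERE b.name LIKE 'servername here%'   -- placeholder prefix
      AND a.run_status <> 1                -- anything other than success
      AND a.run_date = CONVERT(int, CONVERT(char(8), GETDATE(), 112))
)
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = 'DBA Alerts',                  -- placeholder profile
         @recipients   = 'your dba team email here',    -- placeholder recipient
         @subject      = 'Replication Jobs Failure',
         @body         = 'At least one replication agent job failed today. Check the job history.';
END;
```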

Script 3: How to Monitor the Table Containing Replicated Commands

To monitor the MSrepl_commands table, you may use one more script provided below. Note that this table should not grow too large or too fast. If it does, the replication agent jobs might be failing, or there could be a problem in the replication configuration.

The script is as follows:

use distribution 
SELECT Getdate() AS CaptureTime, LEFT(Object_name(t.object_id),20) AS TableName, st.row_count 
FROM sys.dm_db_partition_stats st WITH (nolock) 
INNER JOIN sys.tables t WITH (nolock) ON st.object_id = t.object_id 
INNER JOIN sys.schemas s WITH (nolock) ON t.schema_id = s.schema_id 
WHERE st.index_id < 2 -- heap or clustered index only, so rows are counted once 
AND Object_name(t.object_id) IN ('MSsubscriptions', 'MSdistribution_history', 'MSrepl_commands', 'MSrepl_transactions') 
ORDER BY st.row_count DESC

The growth trend of the MSrepl_commands table also gives you a hint of how healthy your replication latency is. Many factors affect it. If your environment is in the cloud, the region selection may contribute a great deal to replication latency.
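
One way to turn single row counts into a trend is to log a sample on a schedule and compare each sample with the previous one. The sketch below assumes a hypothetical history table, dbo.DBA_ReplCommandsHistory, created in the distribution database (a separate DBA utility database would work equally well), and SQL Server 2012 or later for the LAG function.

```sql
USE distribution;

-- Hypothetical history table; it is created on the first run only.
IF OBJECT_ID('dbo.DBA_ReplCommandsHistory') IS NULL
    CREATE TABLE dbo.DBA_ReplCommandsHistory (
        CaptureTime datetime NOT NULL,
        TableRows   bigint   NOT NULL
    );

-- Take one sample (schedule this part, for example hourly).
INSERT INTO dbo.DBA_ReplCommandsHistory (CaptureTime, TableRows)
SELECT GETDATE(), SUM(st.row_count)
FROM sys.dm_db_partition_stats AS st
WHERE st.object_id = OBJECT_ID('dbo.MSrepl_commands')
  AND st.index_id < 2;   -- heap or clustered index only

-- Compare each sample with the previous one to see the growth trend.
SELECT CaptureTime,
       TableRows,
       TableRows - LAG(TableRows) OVER (ORDER BY CaptureTime) AS RowsAddedSincePrevious
FROM dbo.DBA_ReplCommandsHistory
ORDER BY CaptureTime;
```

A steadily positive RowsAddedSincePrevious over many samples suggests that commands are arriving faster than they are delivered or cleaned up.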

Script 4: Automated Email Reports for Replication Monitoring in SQL Server

In the upcoming section, we will discuss how to generate a simple yet comprehensive report on the status of replication in SQL Server and how to automate its delivery via email. The script provided here is designed to gather crucial information about your replication environment, such as the status of replication, subscriber details, publisher details, current latency, and the timestamp of the last entry.

This script also includes an alert block that emails your DBA team about potential replication latency. As written, the block fires when the status query returns more than 500 rows (@@ROWCOUNT), which is only a rough placeholder for "more than 500 undistributed transactions", so adjust the condition and the threshold to suit your environment. Early detection of this kind keeps your team informed about the state of replication and lets them resolve issues before users notice.

You may use the following script:

Declare @Publisher sysname, @PublisherDB sysname 
-- Set Publisher server and database name 
Set @Publisher = 'publication server name'; 
Set @PublisherDB = 'publishing database name'; 
-- Refresh replication monitor data 
USE [distribution] 
Exec sys.sp_replmonitorrefreshjob @iterations = 1; 
With MaxXact (ServerName, PublisherDBID, XactSeqNo) 
As (Select S.name, DA.publisher_database_id, max(H.xact_seqno) From dbo.MSdistribution_history H with(nolock) 
Inner Join dbo.MSdistribution_agents DA with(nolock) On DA.id = H.agent_id 
Inner Join master.sys.servers S with(nolock) On S.server_id = DA.subscriber_id 
Where DA.publisher_db = @PublisherDB 
Group By S.name, DA.publisher_database_id), OldestXact (ServerName, OldestEntryTime) 
As (Select MX.ServerName, Min(entry_time) 
From dbo.msrepl_transactions T with(nolock) 
Inner Join MaxXact MX On MX.XactSeqNo < T.xact_seqno And 
MX.PublisherDBID = T.publisher_database_id 
Group By MX.ServerName) 
Select [Replication Status] = Case MD.status 
When 1 Then 'Started' 
When 2 Then 'Succeeded' 
When 3 Then 'In progress' 
When 4 Then 'Idle' 
When 5 Then 'Retrying' 
When 6 Then 'Failed' 
End, 
Subscriber = SubString(MD.agent_name, Len(MD.publisher) + 
Len(MD.publisher_db) + Len(MD.publication) + 4, 
Charindex('-', MD.agent_name, 
Len(MD.publisher) + Len(MD.publisher_db) + 
Len(MD.publication) + 5) - 
(Len(MD.publisher) + 
Len(MD.publisher_db) + Len(MD.publication) + 4)), 
[Subscriber DB] = A.subscriber_db, 
[Publisher DB] = MD.publisher_db, 
Publisher = MD.publisher, 
[Current Latency (sec)] = MD.cur_latency,
[Current Latency (hh:mm:ss)] = Right('00' + Cast(MD.cur_latency/3600 As varchar), 2) + 
':' + Right('00' + 
Cast((MD.cur_latency%3600)/60 As varchar), 2) + 
':' + Right('00' + 
Cast(MD.cur_latency%60 As varchar), 2), 
[Latency Threshold (min)] = Cast(T.value As Int), 
[Agent Last Stopped (hours)] = DateDiff(hour, agentstoptime, getdate()) - 1, 
[Agent Last Sync] = MD.last_distsync, 
[Last Entry TimeStamp] = OX.OldestEntryTime 
From dbo.MSreplication_monitordata MD with(nolock) 
Inner Join dbo.MSdistribution_agents A with(nolock) On A.id = MD.agent_id 
Inner Join dbo.MSpublicationthresholds T with(nolock) On T.publication_id = MD.publication_id And T.metric_id = 2 -- Latency 
Inner Join OldestXact OX On OX.ServerName = SubString(MD.agent_name, Len(MD.publisher) + Len(MD.publisher_db) + 
Len(MD.publication) + 4, 
Charindex('-', MD.agent_name, 
Len(MD.publisher) + Len(MD.publisher_db) + 
Len(MD.publication) + 5) - 
(Len(MD.publisher) + 
Len(MD.publisher_db) + Len(MD.publication) + 4)) 
Where MD.publisher = @Publisher 
And MD.publisher_db = @PublisherDB 
And MD.publication_type = 0 -- 0 = Transactional publication 
And MD.agent_type = 3; -- 3 = distribution agent 
IF (@@ROWCOUNT > 500) 
BEGIN 
-- send alerts here.. 500 rows of undistributed transactions , should be higher. run this on remote distributor 
EXEC msdb.dbo.sp_send_dbmail 
@profile_name = 'DBA Alerts', 
@recipients = 'your dba team email here', 
@body = 'This is replication latency alert. Check undistributed transactions query.', 
@subject = 'Replication Latency Alert' ; 
PRINT 'Alert here!' --since email is not yet working 
END
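
If you prefer to alert on the number of undelivered commands itself, a minimal sketch could sum the UndelivCmdsInDistDB column of the MSdistribution_status view per distribution agent. The 500 threshold, the mail profile, and the recipient are placeholders, and the view can be slow to query on a large distribution database, so treat this as a starting point rather than a tuned check.

```sql
USE distribution;

DECLARE @Threshold int = 500;   -- placeholder threshold; tune it for your workload

-- Alert when any distribution agent has more undelivered commands than the threshold.
IF EXISTS (
    SELECT 1
    FROM dbo.MSdistribution_status AS ds
    INNER JOIN dbo.MSdistribution_agents AS da ON da.id = ds.agent_id
    WHERE da.subscriber_db <> 'virtual'   -- skip virtual subscriptions
    GROUP BY da.id
    HAVING SUM(ds.UndelivCmdsInDistDB) > @Threshold
)
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = 'DBA Alerts',                  -- placeholder profile
         @recipients   = 'your dba team email here',    -- placeholder recipient
         @subject      = 'Replication Latency Alert',
         @body         = 'A distribution agent has more undelivered commands than the threshold.';
END;
```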

Script 5: Query the Articles List and Check the Subscribers’ Health

If you are working with transactional replication, these checks are extremely important. Here is a script:

SELECT DISTINCT LEFT(srv.srvname,50) AS publication_server 
, LEFT(a.publisher_db, 50) AS publisher_db 
, LEFT(p.publication,25) AS publication_name
, LEFT(a.article, 50) AS [article] 
, LEFT(a.destination_object,50) AS destination_object 
, LEFT(ss.srvname,25) AS subscription_server 
, LEFT(s.subscriber_db,25) AS subscriber_db 
, LEFT(da.name,50) AS distribution_agent_job_name 
FROM distribution..MSArticles a 
JOIN distribution..MSpublications p ON a.publication_id = p.publication_id 
JOIN distribution..MSsubscriptions s ON s.publication_id = a.publication_id AND s.article_id = a.article_id -- match each article to its own subscriptions 
JOIN master..sysservers ss ON s.subscriber_id = ss.srvid 
JOIN master..sysservers srv ON srv.srvid = p.publisher_id 
JOIN distribution..MSdistribution_agents da ON da.id = s.agent_id -- the distribution agent that serves this subscription 
ORDER BY 1,2,3 
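
For a quick health check on the subscriptions themselves, the status column of MSsubscriptions can be decoded as well. The sketch below lists every subscription that is not active, using the documented status values (0 = inactive, 1 = subscribed, 2 = active); the column aliases are illustrative only.

```sql
-- Subscriptions that are not in the Active state.
SELECT DISTINCT
       LEFT(ss.srvname, 25)      AS subscription_server,
       LEFT(s.subscriber_db, 25) AS subscriber_db,
       LEFT(p.publication, 25)   AS publication_name,
       CASE s.status
            WHEN 0 THEN 'Inactive'
            WHEN 1 THEN 'Subscribed'
            WHEN 2 THEN 'Active'
       END                       AS subscription_status
FROM distribution..MSsubscriptions AS s
JOIN distribution..MSpublications AS p ON p.publication_id = s.publication_id
JOIN master..sysservers AS ss ON ss.srvid = s.subscriber_id
WHERE s.status <> 2
ORDER BY 1, 2, 3;
```

An empty result means every subscription known to this distributor is active.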

Script 6: Building a Replication Reporting Overview for the DBA Team

To combine all replication statistics with the delivered and undelivered command counts, you can collect the replication details in one table in the distribution database. The script below uses a temporary table, #ReplStats, for this; switch to a permanent table if you want to keep history.

From this table, you can create a reporting summary to distribute to the DBA team. The table can be refreshed every day as part of the daily replication health check, alongside the standard database administrator morning health check.

USE [distribution] 
IF OBJECT_ID('Tempdb.dbo.#ReplStats') IS NOT NULL 
DROP TABLE #ReplStats 
CREATE TABLE [dbo].[#ReplStats] ( 
[DistributionAgentName] [nvarchar](100) NOT NULL 
,[DistributionAgentStartTime] [datetime] NOT NULL 
,[DistributionAgentRunningDurationInSeconds] [int] NOT NULL 
,[IsAgentRunning] [bit] NULL 
,[ReplicationStatus] [varchar](14) NULL 
,[LastSynchronized] [datetime] NOT NULL 
,[Comments] [nvarchar](max) NOT NULL 
,[Publisher] [sysname] NOT NULL 
,[PublicationName] [sysname] NOT NULL 
,[PublisherDB] [sysname] NOT NULL 
,[Subscriber] [nvarchar](128) NULL 
,[SubscriberDB] [sysname] NULL 
,[SubscriptionType] [varchar](64) NULL 
,[DistributionDB] [sysname] NULL 
,[Article] [sysname] NOT NULL 
,[UndelivCmdsInDistDB] [int] NULL 
,[DelivCmdsInDistDB] [int] NULL 
,[CurrentSessionDeliveryRate] [float] NOT NULL 
,[CurrentSessionDeliveryLatency] [int] NOT NULL 
,[TotalTransactionsDeliveredInCurrentSession] [int] NOT NULL
,[TotalCommandsDeliveredInCurrentSession] [int] NOT NULL 
,[AverageCommandsDeliveredInCurrentSession] [int] NOT NULL 
,[DeliveryRate] [float] NOT NULL 
,[DeliveryLatency] [int] NOT NULL 
,[TotalCommandsDeliveredSinceSubscriptionSetup] [int] NOT NULL 
,[SequenceNumber] [varbinary](16) NULL 
,[LastDistributerSync] [datetime] NULL 
,[Retention] [int] NULL 
,[WorstLatency] [int] NULL 
,[BestLatency] [int] NULL 
,[AverageLatency] [int] NULL 
,[CurrentLatency] [int] NULL 
) ON [PRIMARY] 
INSERT INTO #ReplStats 
SELECT da.[name] AS [DistributionAgentName] 
,dh.[start_time] AS [DistributionAgentStartTime] 
,dh.[duration] AS [DistributionAgentRunningDurationInSeconds] 
,md.[isagentrunningnow] AS [IsAgentRunning] 
,CASE md.[status] 
WHEN 1 
THEN '1 - Started' 
WHEN 2 
THEN '2 - Succeeded' 
WHEN 3 
THEN '3 - InProgress' 
WHEN 4 
THEN '4 - Idle' 
WHEN 5 
THEN '5 - Retrying' 
WHEN 6 
THEN '6 - Failed' 
END AS [ReplicationStatus] 
,dh.[time] AS [LastSynchronized] 
,dh.[comments] AS [Comments] 
,md.[publisher] AS [Publisher] 
,da.[publication] AS [PublicationName] 
,da.[publisher_db] AS [PublisherDB] 
,CASE 
WHEN da.[anonymous_subid] IS NOT NULL 
THEN UPPER(da.[subscriber_name]) 
ELSE UPPER(s.[name]) 
END AS [Subscriber] 
,da.[subscriber_db] AS [SubscriberDB] 
,CASE da.[subscription_type] 
WHEN '0' 
THEN 'Push' 
WHEN '1' 
THEN 'Pull'
WHEN '2' 
THEN 'Anonymous' 
ELSE CAST(da.[subscription_type] AS [varchar](64)) 
END AS [SubscriptionType] 
,md.[distdb] AS [DistributionDB] 
,ma.[article] AS [Article] 
,ds.[UndelivCmdsInDistDB] 
,ds.[DelivCmdsInDistDB] 
,dh.[current_delivery_rate] AS [CurrentSessionDeliveryRate] 
,dh.[current_delivery_latency] AS [CurrentSessionDeliveryLatency] 
,dh.[delivered_transactions] AS [TotalTransactionsDeliveredInCurrentSession] 
,dh.[delivered_commands] AS [TotalCommandsDeliveredInCurrentSession] 
,dh.[average_commands] AS [AverageCommandsDeliveredInCurrentSession] 
,dh.[delivery_rate] AS [DeliveryRate] 
,dh.[delivery_latency] AS [DeliveryLatency] 
,dh.[total_delivered_commands] AS [TotalCommandsDeliveredSinceSubscriptionSetup] 
,dh.[xact_seqno] AS [SequenceNumber] 
,md.[last_distsync] AS [LastDistributerSync] 
,md.[retention] AS [Retention] 
,md.[worst_latency] AS [WorstLatency] 
,md.[best_latency] AS [BestLatency] 
,md.[avg_latency] AS [AverageLatency] 
,md.[cur_latency] AS [CurrentLatency] 
FROM [distribution]..[MSdistribution_status] ds 
INNER JOIN [distribution]..[MSdistribution_agents] da ON da.[id] = ds.[agent_id] 
INNER JOIN [distribution]..[MSArticles] ma ON ma.publisher_id = da.publisher_id 
AND ma.[publisher_db] = da.[publisher_db] -- article IDs repeat across published databases 
AND ma.[article_id] = ds.[article_id] 
INNER JOIN [distribution]..[MSreplication_monitordata] md ON [md].[job_id] = da.[job_id] 
INNER JOIN [distribution]..[MSdistribution_history] dh ON [dh].[agent_id] = md.[agent_id] 
AND md.[agent_type] = 3 
INNER JOIN [master].[sys].[servers] s ON s.[server_id] = da.[subscriber_id] 
--Created WHEN your publication has the immediate_sync property set to true. This property dictates 
--whether snapshot is available all the time for new subscriptions to be initialized. 
--This affects the cleanup behavior of transactional replication. If this property is set to true, 
--the transactions will be retained for max retention period instead of it getting cleaned up 
--as soon as all the subscriptions got the change. 
WHERE da.[subscriber_db] <> 'virtual' 
AND da.[anonymous_subid] IS NULL
AND dh.[start_time] = ( 
-- latest session of this agent, not the latest session across all agents 
SELECT TOP 1 a.[start_time] 
FROM [distribution]..[MSdistribution_history] a 
WHERE a.[agent_id] = dh.[agent_id] 
AND a.[runstatus] <> 1 
ORDER BY a.[start_time] DESC 
) 
AND dh.[runstatus] <> 1 
SELECT 'Transactional Replication Summary' AS [Comments]; 
SELECT [DistributionAgentName] 
,[DistributionAgentStartTime] 
,[DistributionAgentRunningDurationInSeconds] 
,[IsAgentRunning] 
,[ReplicationStatus] 
,[LastSynchronized] 
,[Comments] 
,[Publisher] 
,[PublicationName] 
,[PublisherDB] 
,[Subscriber] 
,[SubscriberDB] 
,[SubscriptionType] 
,[DistributionDB] 
,SUM([UndelivCmdsInDistDB]) AS [UndelivCmdsInDistDB] 
,SUM([DelivCmdsInDistDB]) AS [DelivCmdsInDistDB] 
,[CurrentSessionDeliveryRate] 
,[CurrentSessionDeliveryLatency] 
,[TotalTransactionsDeliveredInCurrentSession] 
,[TotalCommandsDeliveredInCurrentSession] 
,[AverageCommandsDeliveredInCurrentSession] 
,[DeliveryRate] 
,[DeliveryLatency] 
,[TotalCommandsDeliveredSinceSubscriptionSetup] 
,[SequenceNumber] 
,[LastDistributerSync] 
,[Retention] 
,[WorstLatency] 
,[BestLatency] 
,[AverageLatency] 
,[CurrentLatency] 
FROM #ReplStats 
GROUP BY [DistributionAgentName] 
,[DistributionAgentStartTime] 
,[DistributionAgentRunningDurationInSeconds] 
,[IsAgentRunning]
,[ReplicationStatus] 
,[LastSynchronized] 
,[Comments] 
,[Publisher] 
,[PublicationName] 
,[PublisherDB] 
,[Subscriber] 
,[SubscriberDB] 
,[SubscriptionType] 
,[DistributionDB] 
,[CurrentSessionDeliveryRate] 
,[CurrentSessionDeliveryLatency] 
,[TotalTransactionsDeliveredInCurrentSession] 
,[TotalCommandsDeliveredInCurrentSession] 
,[AverageCommandsDeliveredInCurrentSession] 
,[DeliveryRate] 
,[DeliveryLatency] 
,[TotalCommandsDeliveredSinceSubscriptionSetup] 
,[SequenceNumber] 
,[LastDistributerSync] 
,[Retention] 
,[WorstLatency] 
,[BestLatency] 
,[AverageLatency] 
,[CurrentLatency] 
SELECT 'Transactional Replication Summary Details' AS [Comments]; 
SELECT [Publisher] 
,[PublicationName] 
,[PublisherDB] 
,[Article] 
,[Subscriber] 
,[SubscriberDB] 
,[SubscriptionType] 
,[DistributionDB] 
,SUM([UndelivCmdsInDistDB]) AS [UndelivCmdsInDistDB] 
,SUM([DelivCmdsInDistDB]) AS [DelivCmdsInDistDB] 
FROM #ReplStats 
GROUP BY [Publisher] 
,[PublicationName] 
,[PublisherDB] 
,[Article] 
,[Subscriber] 
,[SubscriberDB] 
,[SubscriptionType] 
,[DistributionDB]
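
To keep a daily history for the report, the per-subscription totals can be copied into a permanent table. The sketch below assumes it runs in the same session as the script above, while #ReplStats still exists; the table name dbo.DBA_ReplStatsDaily and its reduced column list are hypothetical.

```sql
-- Hypothetical permanent table; it is created on the first run only.
IF OBJECT_ID('distribution.dbo.DBA_ReplStatsDaily') IS NULL
    CREATE TABLE distribution.dbo.DBA_ReplStatsDaily (
        CaptureDate         date          NOT NULL,
        Publisher           sysname       NOT NULL,
        PublicationName     sysname       NOT NULL,
        PublisherDB         sysname       NOT NULL,
        Subscriber          nvarchar(128) NULL,
        SubscriberDB        sysname       NULL,
        UndelivCmdsInDistDB int           NULL,
        DelivCmdsInDistDB   int           NULL
    );

-- #ReplStats can hold several history rows per article, so reduce it to
-- one row per article first and only then add up the command counts.
INSERT INTO distribution.dbo.DBA_ReplStatsDaily
SELECT CONVERT(date, GETDATE()),
       x.Publisher, x.PublicationName, x.PublisherDB,
       x.Subscriber, x.SubscriberDB,
       SUM(x.UndelivCmdsInDistDB), SUM(x.DelivCmdsInDistDB)
FROM (
    SELECT DISTINCT Publisher, PublicationName, PublisherDB, Subscriber,
           SubscriberDB, Article, UndelivCmdsInDistDB, DelivCmdsInDistDB
    FROM #ReplStats
) AS x
GROUP BY x.Publisher, x.PublicationName, x.PublisherDB,
         x.Subscriber, x.SubscriberDB;
```

Comparing UndelivCmdsInDistDB across capture dates then shows whether a subscription is catching up or falling behind.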

Summary

I hope that these few T-SQL scripts provided above will help you in your replication agents’ monitoring. I highly recommend you monitor them closely. Otherwise, users at the subscriber’s end may complain endlessly about not having (close to) real-time data.

In the coming articles, I will dig deeper into the SQL technology of replicating data to any part of the globe. Happy monitoring!

Last modified: June 06, 2023