Implementing Failover in MS SQL Server 2017 Standard

Total: 3 Average: 4.7

Introduction

Often, we need to ensure fault tolerance of MS SQL Server DBMS, especially, when there is no Enterprise edition, but only the Standard one.

We would like to note that we are not going to examine the Express edition because there are significant restrictions to this instance. Sure, we can bypass some of them. For example, to resolve the issue with the database size of 10 GB, we can split a large database into smaller ones. To do this, we can create a new database based on a certain property, and combine the selections from the same tables of different databases in the views in the principal database. However, fault tolerance in the Express edition will be performed either by a system administrator or by using your own or third-party software.

In this article, we are going to explore all existing standard fault-tolerance technologies for MS SQL Server 2017 and an example of implementing the most suitable unified standard of fault-tolerance in the Standard edition.

Short review

  1. AlwaysOn

    Load distribution among all parties, all parties should be similar in their characteristics to each other.The synchronous mode ensures the maximum reliability of data transmission; however, the performance will be equal to the speed of the slowest party. The asynchronous mode ensures the highest performance, but there may be mismatching data between the parties, which may cause a more complex maintenance and the probability of losing the latest changes in the case of the main party failure.The speed of switching into a synchronous mode is almost instantaneous and does not require a system administrator and DBA, while in the asynchronous mode, it depends on the current state of DB-duplicates and usually takes about 5 minutes (you can also automate the switching process by one DBA without a system administrator).Microsoft recommends using this technology for a database. It is available in the Enterprise edition starting from version 2012 and higher and with restrictions in the Standard edition (To learn more, please refer to Top 5 Questions about Basic Availability Groups).

  2. Clustering

    Despite the simplicity of configuration, this solution is unreliable due to the bottleneck in the form of a single data warehouse. In case of the data warehouse failure, it will take over 1 hour to restore it. This technology is available in the Standard edition of version 2008 and higher.

  3. Replication

    Any replication involves creating system triggers for each participating table while snapshot replication will heavily load the principal database. Therefore, snapshot replication can be done only during off-peak hours of the database load (for example, at night), which is unacceptable, because a hot standby is necessary. Merge replication is complicated to be maintained by some systems (for example, CRM, NAV).
    Transactional replication is possible in the Enterprise edition. Available in the Standard edition (merging and database snapshots) and the Enterprise edition (transactions) up to the version 2008 and higher.

  4. Mirroring

    It is possible in any mode. However, the synchronous mode ensures the maximum reliability and fast switching, while the asynchronous mode provides users with the maximum performance speed of the principal database. However, mismatching data is possible between parties and switching may be slow.

    Here, a witness server or DBA provides the automatic switch at the database level (for example, when the CPU load is over 50% on the principal server). A system administrator grants the connection to the other server. A backup database for any type of mirroring is in a continuous recovery mode, so it cannot be accessed.

    A recovery mode of the database is full.

    Microsoft considers it as an outdated database technology. It is available in the Standard edition (in the synchronous mode) and in the Enterprise edition (in the asynchronous mode) up to version 2008 and higher.

  5. Transaction log shipping

    There are two modes: continuous recovery on a standby server or recovery with delays. The first mode switches a backup database to a continuous recovery mode and in this case, we cannot access it.

    The second mode switches the backup database to a recovery mode regularly while deploying updates (the backup database is available between deployments, but this is possible provided that MS SQL Server instances are of the same version).

    How it works:

    1. Periodically, a backup copy of the database transaction log is stored to a public folder on the source and standby server (the directory and schedule are configured every 15 minutes by default).
    2. The standby server periodically copies the transaction log backup of the database to a local folder (the directory and schedule are configured every 15 minutes by default).
    3. The standby server restores the transaction log from the backup of the transaction log (the schedule is configured every 15 minutes by default).

    Database administrators can automate the switching process at the database level, while a system administrator can do this at the level of connecting to the server.

    Also, it should be noted that this method always works in the asynchronous mode. You can configure multiple backup databases.

    The database recovery mode is full or bulk-logged.

    It is available in the Standard edition up to the version 2008 and higher.

    There are two modes: continuous recovery on a standby server or recovery with delays.

Summary

The most preferable is transaction log shipping in the Standard edition because it is convenient to use it for a smooth transition from one server to another, for example, when updating the environment. In addition, the transaction log shipping is simple and easy to use, as well as always works in the asynchronous mode, which does not load the database much, unlike the synchronous mirroring mode. In any case, mirroring is acceptable, if it is possible to configure its own automatic switching; otherwise, false switching is possible (for example, when the CPU of the principal server is loaded by more than 50%).

For the Enterprise edition, use the AlwaysOn technology.

Configuring failover upon transaction log shipping

You can find a more detailed information on configuring the transaction log shipping here. In addition, it is possible to automate this process by developing your own utility for a repetitive multiple usage, as well as for returning to the principal server after it has been repaired in the case of failover.

Let us explore one of the possible options for debugging failover upon transaction log shipping at a DBMS level.

It should be noted that this method is suitable for a server reserved for only one instance of MS SQL Server instance, since, for several instances, there is a problem in determining which tasks to execute and which ones we do not.

Let’s describe the sequence of steps:

  1. Perform all the tasks to copy the latest files from the source (With a well-thought-out architecture, the directory must be accessible even if the principal server is down)
  2. Disable all the tasks to copy files from the source
  3. Perform all the tasks to restore a database using the latest files from the source
  4. Disable all the database restore tasks using the latest files from the source
  5. Make the database restored and principal for the log shipping, but without a recipient
  6. Create full backups of the database
  7. Create tasks to back up transaction logs

Below, there is an example of implementing the above-mentioned sequence as a stored procedure.

It should be noted that it is important to configure a login (preferably a domain login) under which the tasks will be performed to create backups of transaction logs.

An example of debugging the failover of the transaction log shipping

CREATE PROCEDURE [srv].[RunLogShippingFailover]
	@isfailover			bit=1,
	@login				nvarchar(255)=N'LOGIN', -- a domain login under which the tasks will be performed run to create backups of transaction logs.
	@backup_directory	nvarchar(255)=N'DIRECTORY'—public directory to send backups of transaction logs between MS SQL Server instances (for example, 'D:\Shared')
AS
	/*
	Moving the standby server to the main mode when the principal server is down if @ isfailover = 1 is fully automated
        when @isfailover equals 0, nothing happens - here we need to create anew the shipping log from the standby to the principal one,
        and then we need to switch to the principal server and then to configure the transaction log shipping again.
        this standby server is believed to receive backups of transaction logs from one server 
        */
BEGIN
	--if there is a shift switch to a standby server, you need to perform all the tasks to copy the latest files from the source
	if(@isfailover=1)
	begin
		select [job_id]
		into #jobs
		from [msdb].[dbo].[sysjobs]
		where [name] like 'LSCopy%';
	
		declare @job_id uniqueidentifier;
	
		while(exists(select top(1) 1 from #jobs))
		begin
			select top(1)
			@job_id=[job_id]
			from #jobs;
	
			begin try
				EXEC [msdb].dbo.sp_start_job @job_id=@job_id;
			end try
			begin catch
			end catch
	
			delete from #jobs
			where [job_id]=@job_id;
		end
		
		drop table #jobs;
	end
	
	--disable all the tasks for copying files from the source when switching to the backup server
	--enable all the tasks for copying files from the source when returning to the production server
	update [msdb].[dbo].[sysjobs]
	set [enabled]=case when (@isfailover=1) then 0 else 1 end
	where [name] like 'LSCopy%';
	
	--if we shift to a standby server, we need to perform all the tasks to restore databases by using the latest files from the source
	if(@isfailover=1)
	begin
		select [job_id]
		into #jobs2
		from [msdb].[dbo].[sysjobs]
		where [name] like 'LSRestore%';
	
		while(exists(select top(1) 1 from #jobs2))
		begin
			select top(1)
			@job_id=[job_id]
			from #jobs2;
	
			begin try
				EXEC [msdb].dbo.sp_start_job @job_id=@job_id;
				EXEC [msdb].dbo.sp_start_job @job_id=@job_id;
			end try
			begin catch
			end catch
	
			delete from #jobs2
			where [job_id]=@job_id;
		end
		drop table #jobs2;
	end
	
	--disable all the tasks to restore databases using the latest files from the source when switching to a standby server
	--enable all the tasks to restore databases using the latest files when returning to the production server 
	update [msdb].[dbo].[sysjobs]
	set [enabled]=case when (@isfailover=1) then 0 else 1 end
	where [name] like 'LSRestore%';
	
	--when switching to a standby server, we make the database restorable and principal for log shipping without a recipient
	if(@isfailover=1)
	begin
		select [secondary_database] as [name]
		into #dbs
		from msdb.dbo.log_shipping_monitor_secondary
		where [secondary_server]=@@SERVERNAME;
	
		declare @db nvarchar(255);
	
		while(exists(select top(1) 1 from #dbs))
		begin
			select top(1)
			@db=[name]
			from #dbs;
	
			begin try
				RESTORE DATABASE @db WITH RECOVERY;
			end try
			begin catch
			end catch
	
			delete from #dbs
			where [name]=@db;
		end
	
		drop table #dbs;
	
		select [secondary_database] as [name]
		into #dbs2
		from msdb.dbo.log_shipping_monitor_secondary
		where [secondary_server]=@@SERVERNAME;
	
		declare @jobId BINARY(16);
		declare @command nvarchar(max);
	
		declare @dt nvarchar(255)=cast(YEAR(GetDate()) as nvarchar(255))
							  +'_'+cast(MONTH(GetDate()) as nvarchar(255))
							  +'_'+cast(DAY(GetDate()) as nvarchar(255))
							  +'_'+cast(DatePart(hour,GetDate()) as nvarchar(255))
							  +'_'+cast(DatePart(minute,GetDate()) as nvarchar(255))
							  +'.trn';
	
		declare @backup_job_name		nvarchar(255);
		declare @schedule_name			nvarchar(255);
		declare @disk					nvarchar(255);
		declare @uid					uniqueidentifier;
	
		while(exists(select top(1) 1 from #dbs2))
		begin
			select top(1)
			@db=[name]
			from #dbs2;
	
			set @disk=@backup_directory+N'\'+@db+N'.bak';
			set @backup_job_name=N'LSBackup_'+@db;
			set @schedule_name=N'LSBackupSchedule_'+@@SERVERNAME+N'_'+@db;
			set @command=N'declare @disk nvarchar(max)='+N''''+@backup_directory+N'\'+@db+'_'+@dt+N''''
						+N'BACKUP LOG ['+@db+'] TO DISK = @disk
							WITH NOFORMAT, NOINIT,  NAME = '+N''''+@db+N''''+N', SKIP, NOREWIND, NOUNLOAD,  STATS = 10;';
			set @uid=newid();
			
			begin try
				BACKUP DATABASE @db TO  DISK = @disk 
				WITH NOFORMAT, NOINIT,  NAME = @db, SKIP, NOREWIND, NOUNLOAD,  STATS = 10;
				
				EXEC msdb.dbo.sp_add_job @job_name=@backup_job_name, 
				@enabled=1, 
				@notify_level_eventlog=0, 
				@notify_level_email=0, 
				@notify_level_netsend=0, 
				@notify_level_page=0, 
				@delete_level=0, 
				@description=N'No description available.', 
				@category_name=N'[Uncategorized (Local)]', 
				@owner_login_name=@login, @job_id = @jobId OUTPUT;
		
				EXEC msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=@backup_job_name, 
				@step_id=1, 
				@cmdexec_success_code=0, 
				@on_success_action=1, 
				@on_success_step_id=0, 
				@on_fail_action=2, 
				@on_fail_step_id=0, 
				@retry_attempts=0, 
				@retry_interval=0, 
				@os_run_priority=0, @subsystem=N'TSQL', 
				@command=@command, 
				@database_name=N'master', 
				@flags=0;
	
				EXEC msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1;
	
				EXEC msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=@backup_job_name, 
				@enabled=1, 
				@freq_type=4, 
				@freq_interval=1, 
				@freq_subday_type=4, 
				@freq_subday_interval=5, 
				@freq_relative_interval=0, 
				@freq_recurrence_factor=0, 
				@active_start_date=20171009, 
				@active_end_date=99991231, 
				@active_start_time=0, 
				@active_end_time=235959, 
				@schedule_uid=@uid;
	
				EXEC msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)';
			end try
			begin catch
			end catch
	
			delete from #dbs2
			where [name]=@db;
		end
	
		drop table #dbs2;
	end
END

To return to the principal server, it is necessary to configure the transaction log shipping from the standby server to the principal one, and then to perform the debugging of a failover. Then, the principal server will become the production server. After that, you need to configure the transaction log shipping from the production server to the standby one.

Configuring automatic adjustment for monitoring the transaction log shipping

To monitor the transaction log shipping, use the LSAlert_<INSTANCE_NAME> task and a report on the monitoring server. To do this, right-click the instance on the monitoring server and then select Reports/Standard report/ transaction log shipping status.

Quite often, over time, the monitoring server (in case if it is not a production one) incorrectly takes the recent time of creating a backup of the database transaction log on the production server. As a result, we face false warnings.

It is possible to solve the issue using the following script:

An example of configuring the adjustment for monitoring transaction log shipping

CREATE PROCEDURE [srv].[AutoCorrectMonitorLogShipping] 
AS
BEGIN
	/*
		Adjustment of monitoring the transaction log shipping
	*/
	SET NOCOUNT ON;

    update t2
	set
	    t2.[last_backup_date]=t1.[BackupFinishDate]
	    ,t2.[last_backup_date_utc]=DateAdd(hour,-DateDiff(hour,GetUTCDate(),GetDate()),t1.[BackupFinishDate])
		,t2.[last_backup_file]=
RIGHT(t1.[PhysicalDeviceName], CHARINDEX('\',REVERSE(t1.[PhysicalDeviceName]),1)-1)

	from [PRODUCTION_INSTANCE_NAME].[SRV].[inf].[vServerLastBackupDB] as t1
	inner join [msdb].[dbo].[log_shipping_monitor_primary] as t2 on t1.[DBName] collate SQL_Latin1_General_CP1_CI_AS=t2.[primary_database] collate SQL_Latin1_General_CP1_CI_AS
	where t1.[BackupType]=N'log';
END

We can automate a call for a stored procedure by time. For example, we can create an appropriate task in the Agent and schedule it for every 5 minutes. Of course, the production server must be linked to the backup server (Server objects\Linked servers).

Here we use the [inf].[vServerLastBackupDB] view in the SRV database that defines the latest database backups:

An example of implementing the vServerLastBackupDB view:

CREATE VIEW [inf].[vServerLastBackupDB] as
with backup_cte as
(
    select
        bs.[database_name],
        backup_type =
            case bs.[type]
                when 'D' then 'database'
                when 'L' then 'log'
                when 'I' then 'differential'
                else 'other'
            end,
        bs.[first_lsn],
		bs.[last_lsn],
		bs.[backup_start_date],
		bs.[backup_finish_date],
		cast(bs.[backup_size] as decimal(18,3))/1024/1024 as BackupSizeMb,
        rownum = 
            row_number() over
            (
                partition by bs.[database_name], type 
                order by bs.[backup_finish_date] desc
            ),
		LogicalDeviceName = bmf.[logical_device_name],
		PhysicalDeviceName = bmf.[physical_device_name],
		bs.[server_name],
		bs.[user_name]
    FROM msdb.dbo.backupset bs
    INNER JOIN msdb.dbo.backupmediafamily bmf 
        ON [bs].[media_set_id] = [bmf].[media_set_id]
)
select
    [server_name] as [ServerName],
	[database_name] as [DBName],
	[user_name] as [USerName],
    [backup_type] as [BackupType],
	[backup_start_date] as [BackupStartDate],
    [backup_finish_date] as [BackupFinishDate],
	[BackupSizeMb], -- uncompressed size
	[LogicalDeviceName],
	[PhysicalDeviceName],
	[first_lsn] as [FirstLSN],
	[last_lsn] as [LastLSN]
from backup_cte
where rownum = 1;

Result

In this article, we have reviewed briefly all possible fault tolerance and quick availability options in MS SQL Server 2017, as well as examples of implementing the debugging of failover and automatic adjustment of monitoring the transaction log shipping.

References:

Evgeniy Gribkov