Implementing Failover in MS SQL Server 2017 Standard

Implementing Failover in MS SQL Server 2017 Standard
Rate this post

Introduction

Often, we need to ensure fault tolerance of MS SQL Server DBMS, especially, when there is no Enterprise edition, but only the Standard one.

We would like to note that we are not going to examine the Express edition because there are significant restrictions to this instance. Sure, we can bypass some of them. For example, to resolve the issue with the database size of 10 GB, we can split a large database into smaller ones. To do this, we can create a new database based on a certain property, and combine the selections from the same tables of different databases in the views in the principal database. However, fault tolerance in the Express edition will be performed either by a system administrator or by using your own or third-party software.

In this article, we are going to explore all existing standard fault-tolerance technologies for MS SQL Server 2017 and an example of implementing the most suitable unified standard of fault-tolerance in the Standard edition.

Short review

  1. AlwaysOn

    Load distribution among all parties, all parties should be similar in their characteristics to each other.The synchronous mode ensures the maximum reliability of data transmission; however, the performance will be equal to the speed of the slowest party. The asynchronous mode ensures the highest performance, but there may be mismatching data between the parties, which may cause a more complex maintenance and the probability of losing the latest changes in the case of the main party failure.The speed of switching into a synchronous mode is almost instantaneous and does not require a system administrator and DBA, while in the asynchronous mode, it depends on the current state of DB-duplicates and usually takes about 5 minutes (you can also automate the switching process by one DBA without a system administrator).Microsoft recommends using this technology for a database. It is available in the Enterprise edition starting from version 2012 and higher and with restrictions in the Standard edition (To learn more, please refer to Top 5 Questions about Basic Availability Groups).

  2. Clustering

    Despite the simplicity of configuration, this solution is unreliable due to the bottleneck in the form of a single data warehouse. In case of the data warehouse failure, it will take over 1 hour to restore it. This technology is available in the Standard edition of version 2008 and higher.

  3. Replication

    Any replication involves creating system triggers for each participating table while snapshot replication will heavily load the principal database. Therefore, snapshot replication can be done only during off-peak hours of the database load (for example, at night), which is unacceptable, because a hot standby is necessary. Merge replication is complicated to be maintained by some systems (for example, CRM, NAV).
    Transactional replication is possible in the Enterprise edition. Available in the Standard edition (merging and database snapshots) and the Enterprise edition (transactions) up to the version 2008 and higher.

  4. Mirroring

    It is possible in any mode. However, the synchronous mode ensures the maximum reliability and fast switching, while the asynchronous mode provides users with the maximum performance speed of the principal database. However, mismatching data is possible between parties and switching may be slow.

    Here, a witness server or DBA provides the automatic switch at the database level (for example, when the CPU load is over 50% on the principal server). A system administrator grants the connection to the other server. A backup database for any type of mirroring is in a continuous recovery mode, so it cannot be accessed.

    A recovery mode of the database is full.

    Microsoft considers it as an outdated database technology. It is available in the Standard edition (in the synchronous mode) and in the Enterprise edition (in the asynchronous mode) up to version 2008 and higher.

  5. Transaction log shipping

    There are two modes: continuous recovery on a standby server or recovery with delays. The first mode switches a backup database to a continuous recovery mode and in this case, we cannot access it.

    The second mode switches the backup database to a recovery mode regularly while deploying updates (the backup database is available between deployments, but this is possible provided that MS SQL Server instances are of the same version).

    How it works:

    1. Periodically, a backup copy of the database transaction log is stored to a public folder on the source and standby server (the directory and schedule are configured every 15 minutes by default).
    2. The standby server periodically copies the transaction log backup of the database to a local folder (the directory and schedule are configured every 15 minutes by default).
    3. The standby server restores the transaction log from the backup of the transaction log (the schedule is configured every 15 minutes by default).

    Database administrators can automate the switching process at the database level, while a system administrator can do this at the level of connecting to the server.

    Also, it should be noted that this method always works in the asynchronous mode. You can configure multiple backup databases.

    The database recovery mode is full or bulk-logged.

    It is available in the Standard edition up to the version 2008 and higher.

    There are two modes: continuous recovery on a standby server or recovery with delays.

Summary

The most preferable is transaction log shipping in the Standard edition because it is convenient to use it for a smooth transition from one server to another, for example, when updating the environment. In addition, the transaction log shipping is simple and easy to use, as well as always works in the asynchronous mode, which does not load the database much, unlike the synchronous mirroring mode. In any case, mirroring is acceptable, if it is possible to configure its own automatic switching; otherwise, false switching is possible (for example, when the CPU of the principal server is loaded by more than 50%).

For the Enterprise edition, use the AlwaysOn technology.

Configuring failover upon transaction log shipping

You can find a more detailed information on configuring the transaction log shipping here. In addition, it is possible to automate this process by developing your own utility for a repetitive multiple usage, as well as for returning to the principal server after it has been repaired in the case of failover.

Let us explore one of the possible options for debugging failover upon transaction log shipping at a DBMS level.

It should be noted that this method is suitable for a server reserved for only one instance of MS SQL Server instance, since, for several instances, there is a problem in determining which tasks to execute and which ones we do not.

Let’s describe the sequence of steps:

  1. Perform all the tasks to copy the latest files from the source (With a well-thought-out architecture, the directory must be accessible even if the principal server is down)
  2. Disable all the tasks to copy files from the source
  3. Perform all the tasks to restore a database using the latest files from the source
  4. Disable all the database restore tasks using the latest files from the source
  5. Make the database restored and principal for the log shipping, but without a recipient
  6. Create full backups of the database
  7. Create tasks to back up transaction logs

Below, there is an example of implementing the above-mentioned sequence as a stored procedure.

It should be noted that it is important to configure a login (preferably a domain login) under which the tasks will be performed to create backups of transaction logs.

An example of debugging the failover of the transaction log shipping

To return to the principal server, it is necessary to configure the transaction log shipping from the standby server to the principal one, and then to perform the debugging of a failover. Then, the principal server will become the production server. After that, you need to configure the transaction log shipping from the production server to the standby one.

Configuring automatic adjustment for monitoring the transaction log shipping

To monitor the transaction log shipping, use the LSAlert_<INSTANCE_NAME> task and a report on the monitoring server. To do this, right-click the instance on the monitoring server and then select Reports/Standard report/ transaction log shipping status.

Quite often, over time, the monitoring server (in case if it is not a production one) incorrectly takes the recent time of creating a backup of the database transaction log on the production server. As a result, we face false warnings.

It is possible to solve the issue using the following script:

An example of configuring the adjustment for monitoring transaction log shipping

We can automate a call for a stored procedure by time. For example, we can create an appropriate task in the Agent and schedule it for every 5 minutes. Of course, the production server must be linked to the backup server (Server objects\Linked servers).

Here we use the [inf].[vServerLastBackupDB] view in the SRV database that defines the latest database backups:

An example of implementing the vServerLastBackupDB view:

Result

In this article, we have reviewed briefly all possible fault tolerance and quick availability options in MS SQL Server 2017, as well as examples of implementing the debugging of failover and automatic adjustment of monitoring the transaction log shipping.

References:

Evgeniy Gribkov

Evgeniy Gribkov

Evgeniy is a MS SQL Server database analyst, developer and administrator. He is involved in development and testing of tools for SQL Server database management. Evgeniy also writes SQL Server-related articles.
Evgeniy Gribkov