This article is a short review of the main scheduled maintenance with a database of the 24/7 information system that does not have downtime, as well as approaches to their execution in MS SQL Server.
Any comments and updates to the article are much appreciated.
There is the following scheduled maintenance I would like to point out:
- Scheduled backups with further verification without restoring
- Scheduled restoring of backups to verify their performance
- Analysis of data storage device that contains system and all the necessary databases
- Scheduled testing of required services
- Scheduled optimization of a system performance
- Scheduled maintenance of data integrity
- Scheduled maintenance of data validation
The first three points are the most important, as they provide a system restoring after various failures. However, I would recommend executing the least three points as well, so that users can work at ease (thus, all the queries should be executed fast) and so that data should be validated in all reporting systems.
To automate scheduled maintenance, it is possible to arrange its parts in the Agent or Windows Scheduler.
The sixth point is based on the CHECKDB command.
The seventh point is implemented towards the domain area used in the information system.
I am going to talk in details about the first five points.
Scheduled backups with further verification without restoring
Since there are a lot of articles on this topic, it should be noted that it is necessary to regularly execute this scheduled maintenance on a backup server, rather than on the main server. This backup server should contain up-to-date data (for example, the one that was got with replication). In addition, you need to back up all system databases (except tempdb) on each instance of the MS SQL Server.
When the backup fails, or a backup scan identifies an issue, it is required to report this information to administrators. For example, you may email them.
It is important to determine a strategy for backup, which will answer the following questions:
- How often and when should we back up data (full, differential, and transaction log)?
- How long and when should we delete backups?
Scheduled restoring of backups to verify their performance
I recommend executing this procedure on a backup server with third-party utilities or the RESTORE command.
When restoring backup fails, it is required to report this information to administrators. For example, you may email them.
In addition, it is necessary to restore backups of system databases. To do this, you need to restore them as a usual user database with a name that differs from names of system databases.
Analysis of data storage devices that contain system and all the necessary databases
You need to analyze how much space each database takes, how file sizes change, and how sizes of free space in the whole storage device change. For example, you can perform this task partially with automatic data collection about databases files and logical drives of the operating system in MS SQL Server.
You can do this check every day and then send results. As usual, you can send them to an email.
It is also necessary to monitor system databases so that you make sure that everything works correctly.
In addition, it is important to test storage devices to check if there are any depreciation or bad sectors.
Note that while testing a device should be out of operation and all the data should be copied to another device as testing loads the device drastically.
This task is strictly related to the system administrator duties so we will keep it aside. To take the full control over the case, you need to automate email report delivery.
I would recommend executing this test twice a year.
Scheduled testing of required services
Service downtime is a bad practice. Therefore, a backup server will come into action in case of any failures. Still, it is necessary to check logs from time to time. In addition, you can also think of an automatic data collection with further notification to an administrator by sending an email.
It is necessary to check tasks of the SQL Server Agent or Windows Scheduler with an automatic data collection about completed tasks in MS SQL Server.
Scheduled optimization of a system performance
It includes the following aspects:
- Automating index defragmentation in MS SQL Server databases
- Automating data collection about changes of database schemes in MS SQL Server. You can restore a backup and compare changes, for example, using dbForge
- Automating cleanup of stuck processes in MS SQL Server
- Cleaning up the procedure cache. Here you need to determine when and what should be cleaned up
- Implementing a performance indicator
- Developing and modifying clustered indexes
In addition, I recommend turning off the AUTO_CLOSE feature.
Sometimes, for different reasons, an optimizer parallelizes a query, which is not always optimal.
Thus, there are some recommendations you should keep in mind:
- If you get much data, then leave parallelism.
- If you get a few data, then do not use parallelism.
There are two parameters in the SQL Server instance settings responsible for parallelism:
- max degree of parallelism. To turn off parallelism, set “1” as a value, which means that only one processor will execute a code.
- cost threshold for parallelism. It should be set by default.
There are two main queues:
- a queue for the CPU time (QCPU queue). It takes place when a query has been enabled and is waiting for a processor to execute it.
- a queue for resources (QR queue). It takes place when a query is waiting for the resources to be unbound to execute the process.
The following formula describes the query execution (T):
- TP is compiling time for a plan
- TQR is queue time for resources (QR queue)
- TQCPU is queue time for resources to be unbound (QCPU queue)
- TCPU is time to execute a query
In the sys.dm_exec_query_stats system view:
- total_worket_time = TP+TCPU+TQCPU
- total_elapsed_time = TQR+TCPU
Built-in tools do not allow precisely assessing query execution time.
In most cases, total_elapsed_time provides you with the time that is close to query execution time.
You can determine the query execution time more accurately by using trace. Alternatively, you can log query start and end time. Be careful with traces as they significantly load the system. Thus, it is better to perform it on a backup server and gather data from the main server. In this case, only the network will be loaded.
When parallelizing, SQL Server allocates N processes to a query (in the edition Standart n<=4). Each process requires CPU time to execute a query (one process is not always to be executed on each core).
The more processes you have, the more chances that some will be replaced by others, which lead to increasing TQCPU.
It may take much more time to execute a query when parallelizing, in the following cases:
- Low disk sub-system throughput. In this case, query decomposition takes much more time.
- Data may be blocked for the process.
- There is no index for the predicate, which leads to a table scan.
You need to disable paralleling queries on servers where there is no need to perform huge selection (total_worket_time should be reduced due to a possible decrease of TCPU and TQCPU). To do this, you need to set the max degree of parallelism feature into ‘1’ for only one processor to work.
In addition, you can use other frameworks to build a system that determines the high-speed performance of databases. It is important to understand how these frameworks work and how to interpret retrieved numbers.
As for development and modification of indexes, namely clustered indexes, the main point is to understand how the logic of indexes is set and how it works.
Keep in mind that primary and clustered keys do not mean the same:
A primary key is a column or a set of columns, which make a record unique in the table. For the primary key, you can either create a unique clustered or non-clustered index. The primary key is used in other tables as a foreign key to provide data integrity.
A clustered index is a B-tree or its modification. The leaves contain data itself while the nodes hold index information. In addition, a clustered index can be non-unique as well. Still, I recommend it to be unique.
I would like to remind that a B-tree is a structure that stores data in the order filtered by a clustered index. Thus, it is important to group fields selected as a clustered index in a descending or ascending order. For a clustered index, you can use columns of integer (identity), as well as data and time. Still, columns such as unique identifier are not suitable as the latter will lead to regular restructuring of a B-tree, which will increase the amount of readings and records on a storage device where the database is located.
In addition, you need to make sure that the index is used with the sys.dm_db_index_usage_stats system view.
P.S. It is necessary to check whether data is up-to-date on a backup server, as well as check a system, which synchronizes this data (for example, replications).