Analysis of MS SQL Server for those who see it for the first time – Part 1

Ive recently encountered a problem – SVN went down on ubuntu server. I develop for Windows and I do not have much experience with Linux. I googled the errorwithout success. The error turned out to be the most typical one (the server unexpectedly closed the connection) and does not indicate anything. Therefore, it was necessary to go deeper and analyze logs/settings/rights/etc.

Finally, I figured out the mistake and found everything I needed, but I spent a lot of time. After solving this problem, I thought about how to reduce the uselessly spent hours and decided to write an article that will help people quickly get the understanding of the unfamiliar software.

I would like to emphasize that in this article, I will write about MS SQL Server, not Linux. I often had to answer standard questions of beginners, which I discuss in this article.

Targeted audience

If you are a database administrator, you can save this article for yourself. My article can oftentimes be useful to you in your work. If beginners appeal to you, you can save your time by simply sharing my article with them.

SQL Server is often used in small companies and it’s not always profitable for them to employ a DB administrator, a regular developer will have to deal with the stuff. This is those for whom this article is written.

What problems will we consider?

If the server notifies you “there is no more space on the E drive” – no deep analysis is needed. We will not consider errors, the solution of which is obvious from the text of the message and for which Google immediately throws a link to MSDN with the solution.

Let’s examine the problems which are not obvious for Google, such as, for example, a sudden drop in performance or the absence of connection. Consider the main tools for customization and analysis. Let’s see where the logs and other useful information are located. In fact, I’ll try to collect in one article all the necessary information for a quick start.

First of all

We are going to start with the most frequent questions and consider them separately. Besides, authors write about MS SQL performance in all the relevant articles – and I will not break from the tradition.

If your database suddenly, for no apparent reason, began to work slowly, but you did not change anything (as initially installed, nobody touched anything) – first of all, update the statistics and rebuild the indexes. Make sure that this is done, then it makes sense to dig deeper. I would like to emphasize that you must do this, the only question is how often.

On the Internet, there are lots of methods like this, examples of scripts are provided. I will assume that all those methods are for professionals and are odd for beginners. Well, I’ll describe the simplest way: you only need a mouse for implementing it.

Abbreviations

  • Profiler is an application of “SQL Server Profiler” installed with SSMS.

  • Performance Monitor is a snap-in of the control panel, allows you to monitor the performance counters, log and view the history of measurements.

Statistics update using a “service plan”:

  • run SSMS;

  • connect to a required server;

  • expand the tree in Object Inspector: Management\Maintenance Plans (Service Plans);

  • right-click the node and select “Maintenance Plan Wizard”;

  • in the wizard, mark the required tasks: rebuild index and update statistics

  • you can mark both tasks at once or make two maintenance plans with one task in each (see the “important notes” below);

  • further, we check a required DB (or several databases). We do this for each task (if two tasks are chosen, there will be two dialogs with the choice of a database);

  • Next, Next, Finish.

After these actions, a “maintenance plan” will be created (not executed). You can run it manually by rightclicking it and selecting “Execute”. Alternatively, you configure the launch via “SQL Agent”.

Important notes:

  • Updating statistics is a nonblocking operation. You can perform it in a working mode. Of course, this will create an additional load, but everything is running slowly anyway, you will not notice.

  • Index rebuilding is a blocking operation. You can Run it only outside working hours. There is an exception — the Enterprise edition of the server allows the execution of an “online rebuild”. This option can be enabled in the task settings. Please note that there is a checkmark in all the editions, but it works only in Enterprise.

  • Of course, these tasks must be performed regularly. I suggest an easy way to determine how often you do this:

– With the first problems, execute the maintenance plan;

– If it helped – wait until the problems occur again (usually until the next monthly closing/salary calculation/ etc. of bulk transactions);

– The resulting period of normal operation will be your reference point;

– For example, configure the execution of the maintenance plan twice as often.

The server is slow – what should you do?

The resources used by the server

Like any other program, the server needs processor time, data on the disk, the amount of RAM and network bandwidth.

Task Manager will help you assess the lack of a given resource in the first approximation, no matter how terrible it may sound.

CPU Load

Even a schoolboy can check the utilization in the Manager. We just need to make sure that if the processor is loaded, then it’s the sqlserver.exe process.

If this is your case, then you need to go to the analysis of user activity to understand what exactly caused the load (see below).

Disc Load

Many people only look at the CPU load but do not forget that the DBMS is a data store. The data volumes are growing, the processor performance is increasing while the HDD speed is pretty much the same. With SSDs the situation is better, but storing terabytes on them is expensive.

It turns out that I often encounter situations where the disk system becomes the bottleneck, rather than the CPU.

For disks, the following metrics are important:

  • average queue length (outstanding I/O operations, number);

  • read-write speed (in Mb/s).

The server version of the Task Manager, as a rule (depending on the system version), shows both. If not, run the Performance Monitor snap-in (system monitor). We are interested in the following counters:

  • Physical (logical) disk/Average read (write) time

  • Physical (logical) disk/Average disk queue length

  • Physical (logical) disk/Disk speed

For more details, you can read the manufacturer’s manuals, for example, here: social.technet.microsoft.com/wiki/contents/articles/3214.monitoring-disk-usage.aspx.

In short:

  • The queue should not exceed 1. Short bursts are allowed if they quickly subside. The bursts can be different depending on your system. For a simple RAID mirror of two HDDs – the queue of more than 10-20 is a problem. For a cool library with super caching, I saw bursts of up to 600-800 which were instantly resolved without causing delays.

  • The normal exchange rate also depends on the type of a disk system. The usual (desktop) HDD transmits at 50-100 MB/s. A good disk library – at 500 MB/s and more. For small random operations, the speed is less. This may be your reference point.

  • These parameters must be considered as a whole. If your library transmits 50MB/s and a queue of 50 operations lines up — obviously, something is wrong with the hardware. If the queue lines up when the transmission is close to a maximum – most likely, the disks are not to be blamed for – they just cannot do more – we need to look for a way to reduce the load.

  • The load should be checked separately on disks (if there are several of them) and compared with the location of server files. The Task Manager can show the most actively used files. This can be used to ensure that the load is caused by DBMS.

What can cause the disk system problems:

  • problems with hardware

  • cache burned out, performance dropped dramatically;

  • the disk system is used by something else;

  • RAM shortage. Swapping. Сaching decayed, performance dropped (see the section about RAM below).

  • User load increased. It is necessary to evaluate the work of users (problematic query/new functionality/increase in the number of users/increase in the amount of data/etc).

  • Database data fragmentation (see the index rebuild above), system files fragmentation.

  • The disk system has reached its maximum capabilities.

In case of the last option – do not throw out the hardware at once. Sometimes, you can get a little more out of the system if you approach the problem wisely. Check the location of the system files for compliance with the recommended requirements:

  • Do not mix OS files with database data files. Store them on different physical media so that the system does not compete with DBMS for I/O.

  • The database consists of two file types: data (*.mdf, *.ndf) and logs (*.ldf).
    Data files, as a rule, are mostly used for reading. Logs serve for writing (wherein the writing is consecutive). It is, therefore, recommended to store logs and data on different physical media so that the logging does not interrupt the data reading (as a rule, the write operation takes precedence over reading).

  • MS SQL can use “temporary tables” for query processing. They are stored in the tempdb system database. If you have a high load on files of this database, you can try to render it on physically separate media.

Summarizing the issue with file location, use the principle of “divide and conquer”. Evaluate which files are accessed and try to distribute them to different media. Also, use the features of RAID systems. For example, RAID-5 reads faster than writes – which is good for data files.

In the following Part 2, we will consider:

  • how to analyze the use of RAM and networks.

  • how to review in detail the work of users using SSMS, profiler and direct queries to system views.

  • query plan and statistics (consider a number of ways). live query statistics.

  • waits. current information and statistics.

  • problems with server connection. processes/ports/protocols

This article was translated by Codingsight team with the permission of the author.

Mikhail Radchenko

Mikhail Radchenko

Mikhail develops software for power engineering. Participated in projects as a developer, architect, and team lead. A fan of MS technologies: MS SQL Server, .NET, C#, WPF, etc.
Mikhail Radchenko

Mikhail Radchenko

Mikhail develops software for power engineering. Participated in projects as a developer, architect, and team lead. A fan of MS technologies: MS SQL Server, .NET, C#, WPF, etc.