I noticed that very few people understand how indexes work in SQL Server, especially Included Columns. Nevertheless, indexes are the great way to optimize queries. At first, I also did not get the idea of the included columns, but my experiments showed that they are very useful.
Suppose, we have the following table and query:
CREATE TABLE Person (
SELECT FirstName, LastName, Age
WHERE FirstName = 'John' and LastName = 'Smith'
It is clear that PersonID is a primary key. Suppose we have an index by the first and last names, let’s call it IX_Person_FirstNameLastName. The execution plan for such a query will look as follows:
- Locating all the lines with the specified first and last names with help of the IX_Person_FirstNameLastName index tree
- Detecting the actual location of the line on the disk on the index leaves, going to the actual location and reading the age.
Now, let’s consider that this query is executed quite frequently. We have to execute 2 steps each time. Can it be optimized? In case of MS SQL Server, it is not a problem – you can include values right into the index with the help of the INCLUDE option.
CREATE INDEX IX_PERSON ON Person
Now, this field is not used during indexing but is included in the index. What troubles can we face in this regard? When we index a table by a certain field, the database server must build an index tree by this field. This means that we need to change the index tree when changing the value. When values are modified intensively, it becomes a problematic and hard task for the server. When updating becomes too massive, sometimes it is easier to drop the index. Index greatly optimizes the search but negatively affects the insert, delete, and update operations.
If a field is simply included in an index, it is not used during the building of an index tree and does not affect it, but the value can be easily found on the leaf of this tree. When a search by the last and first names takes place, the server searches for all the first and last names from the tree, and when it reaches the leaf (finds the required index value), then in addition to the pointer to the physical location of the line values, it also contains field values included in the index. It means that there is no need to take the second step for switching to the physical location of the line and read it from there.
Since you don’t need to change the tree when modifying the age data, all this stuff does not affect much the data modification operations. We don’t need to change the index, we just need to change the values on the tree leaf. That is why even a massive change of the Age field won’t have a great impact on the performance. It will certainly affect, but not so much.
As far as I know, the values of the clustered index are automatically included in the leaf level, but this must be checked with the specification.
So, when the use of the included fields is beneficial? When they are frequently used in query results but are changed once in a while. An example is a table of bank transactions. Such table may consist of the following fields: account number, transaction type, date, sum. There is no point in indexing by the sum, but we can include it into the index and it will significantly speed up the query.
To retrieve the real effect from indexing, the queries should not select all fields, i.e. we should forget about SELECT * FROM table. Always recalculate only the fields you really need. And if their values come to be in the index, the execution speed may be quite high.