Written by 12:22 Computer Environment, Languages & Coding

CombGuid: Generation of SQL Server-friendly Guid Values in .NET Applications

Usage of UUID as a primary key for tables has a bunch of pros, including the option to retrieve IDs for objects created in a client application on its own without calls to the database server. However, usage of UUID as a primary key has a con: GUIDs generated by the client application may be not quite SQL Server-friendly that can lead to the overhead during the addition of a new record.

The possible cost increase of the INSERT operation is a result of that SQL Server usually uses structures, known as b-trees for storing tables for which the primary key is set. When you add a new record into a table, SQL Server, according to the sorting by the primary key, looks for a sheet which should contain the record being inserted. Taking into account the pseudorandom algorithm of the UUID generation, the sorting order of the new record is also random, and it may happen that the sheet that is supposed to contain the new record is packed to capacity. In such cases, SQL Server has to divide the sheet into two parts and rebuild branches of the b-tree that lead to this sheet.

In order to prevent SQL Server from the constant rebuild of the clustered index during new records addition, we can generate the primary key values incrementally. One of the variants of the GUID generation incrementally is to link the sorting order of the generated GUID to the current time. Generation of IDs in such a way is often called CombGuid, implying that they are built up of two halves – a pseudorandom part, as in a normal GUID, and a row linked to time.

How SQL Server Compares UUIDs

SQL Server sorts UUID values in a way that differs from the .NET one. The comparison is carried out by byte groups from right to left. Inside a byte group, the comparison is carried out from left to right. (A byte group is a sequence limited by the ‘-’ symbol.) If we compare two UUID values,

@u1 = ‘206AEBE7-ABF0-47A8-8AA5-6FDDF39B9E4F’

and

@u2 =’0F8257A1-B40C-4DA0-8A37-8BBC55183CAE’, we will get @u2> @u1, since, as I wrote above, SQL Server begins comparing from the rightmost byte groups, where 6FDDF39B9E4F < 8BBC55183CAE. Speaking technically, bytes from 9 to 15 in the descending order have the biggest impact on the sorting order.

Implementation of CombGuid in the Magnum Library

We use the Magnum library in our project. The library includes the static CombGuid class with a single Generate() method that creates GUIDs linked to time. Magnum is an open-source library, that is available at GitHub. I decided to go and see how the implementation of the Guid creation method looks like in this library.

public static class CombGuid
{
        static readonly DateTime _baseDate = new DateTime(1900, 1, 1);

        public static Guid Generate()
        {
                byte[] guidArray = Guid.NewGuid().ToByteArray();

                DateTime now = DateTime.Now;

                // Get the days and milliseconds which will be used to build the byte string
                var days = new TimeSpan(now.Ticks - _baseDate.Ticks);
                TimeSpan msecs = now.TimeOfDay;

                // Convert to a byte array
                // Note that SQL Server is accurate to 1/300th of a millisecond so we divide by 3.333333
                byte[] daysArray = BitConverter.GetBytes(days.Days);
                byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds/3.333333));

                // Reverse the bytes to match SQL Servers ordering
                Array.Reverse(daysArray);
                Array.Reverse(msecsArray);

                // Copy the bytes into the guid
                Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
                Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);

                return new Guid(guidArray);
        }
}

The algorithm is pretty simple. In Bytes 9-10, the number of days passed from January 1900 is encoded. Bytes 11-15 are used for encoding of milliseconds form the beginning of the day (for some reason, they are divided by 3.33333). The code comments indicate that this operation is related to the fact that the preciseness of the timestamp storage in SQL Server comprises 1/300 of a second. It is quite an odd solution, taking into account that when generating UUID, we really do not care how SQL Server stores timestamps. I googled this question for a while, and the only thing I understood was that Chris Patterson, the author of the Magnum library, copied the code of the CombGuid generation from Nhibernate. As you can learn from here, the GenerateComb method contains the same code. Fairness requires me to say that dividing milliseconds by 3.333333 does not have a notable impact on the algorithm work. It is just an unnecessary and optional step.

Guid vs CombGuid. Comparing the insert speed in DB

Finally, we came down to the main point of the article: let’s compare the speed of UUIDs generated with the Guid.NewGuid() method with their contenders created with CombGuid.Generate() in the context of the records insert into an SQL Server table.

For this test, I created two scripts that create tables on SQL Server and insert 100000 rows into these tables. The first script inserts the rows with IDs created with CombGuid.Generate(), the second one – with the Guid.NewGuid() method.

A fragment of the text script:

USE [CombIdTest]
GO
--resetting server cache
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
CREATE TABLE [dbo].[CombId](
        [ID] [uniqueidentifier] NOT NULL,
        [Value] [varchar](4000) NOT NULL,
 CONSTRAINT [PK_CombId] PRIMARY KEY CLUSTERED
(
        [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
--insert within a single transaction
begin transaction
 insert into CombId Values ('5cb31d3d-3793-428e-beb0-a2e4047e255c','somevalue');
 insert into CombId Values ('1e905fa1-e4d4-4a2c-a185-a2e4047e255d','somevalue');
-- another 99998 insert operations
commit transaction

Before the insert execution, buffer caches were reset and the insert was carried out within a single transaction in order to minimize the number of calls to the transactions log. Each script was run three times, the ‘Total execution time’ parameter from the client statistics was taken as the execution time. The measurements were made on MSSQL Server 2012.

Measurement Results (in milliseconds):

The advantage of the script that inserts the records containing CombGuid is more than 10 percent over the script with a ‘normal’ UUID. Also, CombGuid had a positive effect on the table size – it turned out to be smaller by half: 3.75 Mb against 5.25 Mb.

In the end, I’d like to ask you the following questions:

  • What do you use as primary keys for your databases?
  • If you use UUID or similar byte structures, how do you generate them?
Tags: , Last modified: September 23, 2021
Close