Do you still hold on to the parent/child design, or would like to try something new, like SQL Server hierarchyID? Well, it is really new because hierarchyID has been a part of SQL Server since 2008. Of course, the novelty itself is not a persuasive argument. But note that Microsoft added this feature to represent one-to-many relationships with multiple levels in a better way.
You may wonder what difference it makes and which benefits you get from using hierarchyID instead of the usual parent/child relationships. If you never explored this option, it might be surprising for you.
The truth is, I didn’t explore this option since it was released. However, when I finally did it, I found it a great innovation. It is a better-looking code, but it has much more in it. In this article, we are going to find out about all those excellent opportunities.
However, before we dive into the peculiarities of using SQL Server hierarchyID, let’s clarify its meaning and scope.
What is SQL Server HierarchyID?
SQL Server hierarchyID is a built-in data type designed to represent trees, which are the most common type of hierarchical data. Each item in a tree is called a node. In a table format, it is a row with a column of hierarchyID data type.
Usually, we demonstrate hierarchies using a table design. An ID column represents a node, and another column stands for the parent. With the SQL Server HierarchyID, we only need one column with a data type of hierarchyID.
When you query a table with a hierarchyID column, you see hexadecimal values. It is one of the visual images of a node. Another way is a string:
‘/’ stands for the root node;
‘/1/’, ‘/2/’, ‘/3/’ or ‘/n/’ stand for the children – direct descendants 1 to n;
‘/1/1/’ or ‘/1/2/’ are the “children of children – “grandchildren.” The string like ‘/1/2/’ means that the first child from the root has two children, which are, in their turn, two grandchildren of the root.
Here’s a sample of what it looks like:
Unlike other data types, hierarchyID columns can take advantage of built-in methods. For example, if you have a hierarchyID column named RankNode, you can have the following syntax:
SQL Server HierarchyID Methods
One of the available methods is IsDescendantOf. It returns 1 if the current node is a descendant of a hierarchyID value.
You can write code with this method similar to the one below:
SELECT r.RankNode ,r.Rank FROM dbo.Ranks r WHERE r.RankNode.IsDescendantOf(0x58) = 1
Other methods used with hierarchyID are the following:
- GetRoot – the static method that returns the root of the tree.
- GetDescendant – returns a child node of a parent.
- GetAncestor – returns a hierarchyID representing the nth ancestor of a given node.
- GetLevel – returns an integer that represents the depth of the node.
- ToString – returns the string with the logical representation of a node. ToString is called implicitly when the conversion from hierarchyID to the string type occurs.
- GetReparentedValue – moves a node from the old parent to the new parent.
- Parse – acts as the opposite of ToString. It converts the string view of a hierarchyID value to hexadecimal.
SQL Server HierarchyID Indexing Strategies
To ensure that queries for tables using hierarchyID run as fast as possible, you need to index the column. There are two indexing strategies:
In a depth-first index, the subtree rows are closer to each other. It suits queries like finding a department, its subunits, and employees. Another example is a manager and its employees stored closer together.
In a table, you can implement a depth-first index by creating a clustered index for the nodes. Further, we perform one of our examples, just like that.
In a breadth-first index, the same level’s rows are closer together. It suits queries like finding all the manager’s directly reporting employees. If most of the queries are similar to this, create a clustered index based on (1) level and (2) node.
It depends on your requirements if you need a depth-first index, a breadth-first, or both. You need to balance between the importance of the queries type and the DML statements you execute on the table.
SQL Server HierarchyID Limitations
Unfortunately, using hierarchyID can’t resolve all issues:
- SQL Server can’t guess what the child of a parent is. You have to define the tree in the table.
- If you don’t use a unique constraint, the generated hierarchyID value won’t be unique. Handling this problem is the developer’s responsibility.
- Relationships of a parent and child nodes are not enforced like a foreign key relationship. Hence, before deleting a node, query for any descendants existing.
Before we proceed, consider one more question. Looking at the result set with node strings, do you find the hierarchy visualizing hard for your eyes?
For me, it’s a big yes because I am not getting younger.
For this reason, we are going to use Power BI and Hierarchy Chart from Akvelon along with our database tables. They will help to display the hierarchy in an organizational chart. I hope it will make the job easier.
Now, let’s get down to business.
Uses of SQL Server HierarchyID
You can use HierarchyID with the following business scenarios:
- Organizational structure
- Folders, subfolders, and files
- Tasks and subtasks in a project
- Pages and subpages of a website
- Geographical data with countries, regions, and cities
Even if your business scenario is similar to the above, and you rarely query across the hierarchy sections, you don’t need hierarchyID.
For example, your organization processes payrolls for employees. Do you need to access the subtree to process someone’s payroll? Not at all. However, if you process commissions of people in a multi-level marketing system, it can be different.
In this post, we use the portion of the organizational structure and the chain of command on a cruise ship. The structure was adapted from the organizational chart from here. Take a look at it in Figure 4 below:
Now you can visualize the hierarchy in question. We use the below tables throughout this post:
- Vessels – is the table standing for the cruise ships’ list.
- Ranks – is the table of crew ranks. There we establish hierarchies using the hierarchyID.
- Crew – is the list of the crew of each vessel and their ranks.
The table structure of each case is as follows:
CREATE TABLE [dbo].[Vessel]( [VesselId] [int] IDENTITY(1,1) NOT NULL, [VesselName] [varchar](20) NOT NULL, CONSTRAINT [PK_Vessel] PRIMARY KEY CLUSTERED ( [VesselId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE TABLE [dbo].[Ranks]( [RankId] [int] IDENTITY(1,1) NOT NULL, [Rank] [varchar](50) NOT NULL, [RankNode] [hierarchyid] NOT NULL, [RankLevel] [smallint] NOT NULL, [ParentRankId] [int] -- this is redundant but we will use this to compare -- with parent/child ) ON [PRIMARY] GO CREATE UNIQUE NONCLUSTERED INDEX [IX_RankId] ON [dbo].[Ranks] ( [RankId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO CREATE UNIQUE CLUSTERED INDEX [IX_RankNode] ON [dbo].[Ranks] ( [RankNode] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO CREATE TABLE [dbo].[Crew]( [CrewId] [int] IDENTITY(1,1) NOT NULL, [CrewName] [varchar](50) NOT NULL, [DateHired] [date] NOT NULL, [RankId] [int] NOT NULL, [VesselId] [int] NOT NULL, CONSTRAINT [PK_Crew] PRIMARY KEY CLUSTERED ( [CrewId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO ALTER TABLE [dbo].[Crew] WITH CHECK ADD CONSTRAINT [FK_Crew_Ranks] FOREIGN KEY([RankId]) REFERENCES [dbo].[Ranks] ([RankId]) GO ALTER TABLE [dbo].[Crew] CHECK CONSTRAINT [FK_Crew_Ranks] GO ALTER TABLE [dbo].[Crew] WITH CHECK ADD CONSTRAINT [FK_Crew_Vessel] FOREIGN KEY([VesselId]) REFERENCES [dbo].[Vessel] ([VesselId]) GO ALTER TABLE [dbo].[Crew] CHECK CONSTRAINT [FK_Crew_Vessel] GO
Inserting Table Data with SQL Server HierarchyID
The first task in using hierarchyID thoroughly is to add records into the table with a hierarchyID column. There are two ways to do it.
The quickest way to insert data with hierarchyID is to use strings. To see this in action, let’s add some records to the Ranks table.
INSERT INTO dbo.Ranks ([Rank], RankNode, RankLevel) VALUES ('Captain', '/',0) ,('First Officer','/1/',1) ,('Chief Engineer','/2/',1) ,('Hotel Director','/3/',1) ,('Second Officer','/1/1/',2) ,('Second Engineer','/2/1/',2) ,('F&B Manager','/3/1/',2) ,('Chief Housekeeping','/3/2/',2) ,('Chief Purser','/3/3/',2) ,('Casino Manager','/3/4/',2) ,('Cruise Director','/3/5/',2) ,('Third Officer','/1/1/1/',3) ,('Third Engineer','/2/1/1/',3) ,('Asst. F&B Manager','/3/1/1/',3) ,('Asst. Chief Housekeeping','/3/2/1/',3) ,('First Purser','/3/3/1/',3) ,('Asst. Casino Manager','/3/4/1/',3) ,('Music Director','/3/5/1/',3) ,('Asst. Cruise Director','/3/5/2/',3) ,('Youth Staff Director','/3/5/3/',3)
The above code adds 20 records to the Ranks table.
As you can see, the tree structure has been defined in the INSERT statement above. It is discernible easily when we use strings. Besides, SQL Server converts it to the corresponding hexadecimal values.
Using Max(), GetAncestor(), and GetDescendant()
Using strings suits the task of populating the initial data. In the long run, you need the code to handle insertion without providing strings.
To do this task, get the last node used by a parent or ancestor. We accomplish it by using the functions MAX() and GetAncestor(). See the sample code below:
-- add a bartender rank reporting to the Asst. F&B Manager DECLARE @MaxNode HIERARCHYID DECLARE @ImmediateSuperior HIERARCHYID = 0x7AD6 SELECT @MaxNode = MAX(RankNode) FROM dbo.Ranks r WHERE r.RankNode.GetAncestor(1) = @ImmediateSuperior INSERT INTO dbo.Ranks ([Rank], RankNode, RankLevel) VALUES ('Bartender', @ImmediateSuperior.GetDescendant(@MaxNode,NULL), @ImmediateSuperior.GetDescendant(@MaxNode, NULL).GetLevel())
Below are the points taken from the above code:
- First, you need a variable for the last node and the immediate superior.
- The last node can be acquired using MAX() against RankNode for the specified parent or immediate superior. In our case, it’s the Assistant F&B Manager with a node value of 0x7AD6.
- Next, to ensure no duplicate child appears, use @ImmediateSuperior.GetDescendant(@MaxNode, NULL). The value in @MaxNode is the last child. If it’s not NULL, GetDescendant() returns the next possible node value.
- Last, GetLevel() returns the level of the new node created.
After adding records to our table, it’s time to query it. 2 ways to query data are available:
The query for Direct Descendants
When we look for the employees directly reporting to the manager, we need to know two things:
- The node value of the manager or parent
- The level of the employee under the manager
For this task, we can use the code below. The output is the list of the crew under the Hotel Director.
-- Get the list of crew directly reporting to the Hotel Director DECLARE @Node HIERARCHYID = 0x78 -- the Hotel Director's node/hierarchyid DECLARE @Level SMALLINT = @Node.GetLevel() SELECT a.CrewName ,a.DateHired ,b.Rank ,b.RankLevel ,c.VesselName ,(SELECT Rank FROM dbo.Ranks WHERE RankNode = b.RankNode.GetAncestor(1)) AS ReportsTo FROM dbo.Crew a INNER JOIN dbo.Ranks b ON a.RankId = b.RankId INNER JOIN dbo.Vessel c ON a.VesselId = c.VesselId WHERE b.RankNode.IsDescendantOf(@Node)=1 AND b.RankLevel = @Level + 1 -- add 1 for the level of the crew under the -- Hotel Director
The result of the above code is as follows in Figure 5:
Query for Subtrees
Sometimes, you also need to list the children and the children’s children down to the bottom. To do this, you need to have the hierarchyID of the parent.
The query will be similar to the previous code but without the need to get the level. See the code example:
-- Get the list of the crew under the Hotel Director down to the lowest level DECLARE @Node HIERARCHYID = 0x78 -- the Hotel Director's node/hierarchyid SELECT a.CrewName ,a.DateHired ,b.Rank ,b.RankLevel ,c.VesselName ,(SELECT Rank FROM dbo.Ranks WHERE RankNode = b.RankNode.GetAncestor(1)) AS ReportsTo FROM dbo.Crew a INNER JOIN dbo.Ranks b ON a.RankId = b.RankId INNER JOIN dbo.Vessel c ON a.VesselId = c.VesselId WHERE b.RankNode.IsDescendantOf(@Node)=1
The result of the above code:
Moving Nodes with SQL Server HierarchyID
Another standard operation with hierarchical data is moving a child or an entire subtree to another parent. However, before we proceed, please note one potential problem:
- First, moving nodes involve I/O. How frequent you move nodes can be the deciding factor if you use hierarchyID or the usual parent/child.
- Second, moving a node in a parent/child design updates one row. At the same time, when you move a node with hierarchyID, it updates one or more rows. The number of rows affected depends on the hierarchy level depth. It can turn into a significant performance problem.
You may handle this problem with your database design.
Let’s consider the design we used here.
Instead of defining the hierarchy on the Crew table, we defined it in the Ranks table. This approach differs from the Employee table in the AdventureWorks sample database, and it offers the following advantages:
- The crew members move more often than the ranks in a vessel. This design will reduce the movements of nodes in the hierarchy. As a result, it minimizes the problem defined above.
- Defining more than one hierarchy in the Crew table is more complicated, as two vessels need two captains. The result is two root nodes.
- If you need to display all ranks with the corresponding crew member, you can use a LEFT JOIN. If no one is on board for that rank, it shows an empty slot for the position.
Now, let’s move on to the objective of this section. Add child nodes under the wrong parents.
To visualize what we are about to do, imagine a hierarchy like the one below. Note the yellow nodes.
Move a Node with No Children
Moving a child node requires the following:
- Define the hierarchyID of the child node to move.
- Define the old parent’s hierarchyID.
- Define the new parent’s hierarchyID.
- Use UPDATE with GetReparentedValue() to move the node physically.
Start by moving a node with no children. In the example below, we move the Cruise Staff from under the Cruise Director to under the Asst. Cruise Director.
-- Moving a node with no child node DECLARE @NodeToMove HIERARCHYID DECLARE @OldParent HIERARCHYID DECLARE @NewParent HIERARCHYID SELECT @NodeToMove = r.RankNode FROM dbo.Ranks r WHERE r.RankId = 24 -- the cruise staff SELECT @OldParent = @NodeToMove.GetAncestor(1) SELECT @NewParent = r.RankNode FROM dbo.Ranks r WHERE r.RankId = 19 -- the assistant cruise director UPDATE dbo.Ranks SET RankNode = @NodeToMove.GetReparentedValue(@OldParent,@NewParent) WHERE RankNode = @NodeToMove
Once the node is updated, a new hex value will be used for the node. Refreshing my Power BI connection to SQL Server – it will change the hierarchy chart as shown below:
In Figure 8, the Cruise staff no longer reports to the Cruise Director – it is changed to report to the Assistant Cruise Director. Compare it with Figure 7 above.
Now, let’s proceed to the next stage and move the Head Waiter to the Assistant F&B Manager.
Move a Node with Children
There is a challenge in this part.
The thing is, the previous code won’t work with a node with even one child. We remember that moving a node requires to update one or more children nodes.
Further, it doesn’t end there. If the new parent has an existing child, we might bump into duplicate node values.
In this example, we have to face that problem: the Asst. F&B Manager has a Bartender child node.
Ready? Here’s the code:
-- Move a node with at least one child DECLARE @NodeToMove HIERARCHYID DECLARE @OldParent HIERARCHYID DECLARE @NewParent HIERARCHYID SELECT @NodeToMove = r.RankNode FROM dbo.Ranks r WHERE r.RankId = 22 -- the head waiter SELECT @OldParent = @NodeToMove.GetAncestor(1) -- head waiter's old parent --> asst chief housekeeping SELECT @NewParent = r.RankNode FROM dbo.Ranks r WHERE r.RankId = 14 -- the assistant f&b manager DECLARE children_cursor CURSOR FOR SELECT RankNode FROM dbo.Ranks r WHERE RankNode.GetAncestor(1) = @OldParent; DECLARE @ChildId hierarchyid; OPEN children_cursor FETCH NEXT FROM children_cursor INTO @ChildId; WHILE @@FETCH_STATUS = 0 BEGIN START: DECLARE @NewId hierarchyid; SELECT @NewId = @NewParent.GetDescendant(MAX(RankNode), NULL) FROM dbo.Ranks r WHERE RankNode.GetAncestor(1) = @NewParent; -- ensure --to get a new id in case there's a --sibling UPDATE dbo.Ranks SET RankNode = RankNode.GetReparentedValue(@ChildId, @NewId) WHERE RankNode.IsDescendantOf(@ChildId) = 1; IF @@error <> 0 GOTO START -- On error, retry FETCH NEXT FROM children_cursor INTO @ChildId; END CLOSE children_cursor; DEALLOCATE children_cursor;
In the above code example, the iteration starts as the need to transfer the node down to the child at the last level.
After you run it, the Ranks table will be updated. And again, if you want to see the changes visually, refresh the Power BI report. You will see the changes similar to the one below:
Benefits of Using SQL Server HierarchyID vs. Parent/Child
To convince anyone to use a feature, we need to know the benefits.
Thus, in this section, we will compare statements using the same tables like those from the beginning. One will use hierarchyID, and the other one will use the parent/child approach. The result set will be the same for both approaches. We expect it for this exercise as that one from Figure 6 above.
Now that the requirements are precise let’s examine the benefits thoroughly.
Simpler to Code
See the code below:
-- List down all the crew under the Hotel Director using hierarchyID SELECT a.CrewName ,a.DateHired ,b.Rank ,b.RankLevel ,c.VesselName ,d.RANK AS ReportsTo FROM dbo.Crew a INNER JOIN dbo.Vessel c ON a.VesselId = c.VesselId INNER JOIN dbo.Ranks b ON a.RankId = b.RankId INNER JOIN dbo.Ranks d ON d.RankNode = b.RankNode.GetAncestor(1) WHERE a.VesselId = 1 AND b.RankNode.IsDescendantOf(0x78)=1
This sample needs a hierarchyID value only. You can change the value at will without changing the query.
Now, compare the statement for the parent/child approach producing the same result set:
-- List down all the crew under the Hotel Director using parent/child SELECT a.CrewName ,a.DateHired ,b.Rank ,b.RankLevel ,c.VesselName ,d.Rank AS ReportsTo FROM dbo.Crew a INNER JOIN dbo.Vessel c ON a.VesselId = c.VesselId INNER JOIN dbo.Ranks b ON a.RankId = b.RankId INNER JOIN dbo.Ranks d ON b.RankParentId = d.RankId WHERE a.VesselId = 1 AND (b.RankID = 4) OR (b.RankParentID = 4 OR b.RankParentId >= 7)
What do you think? The code samples are almost the same except one point.
The WHERE clause in the second query will not be flexible to adapt if a different subtree is required.
Make the second query generic enough, and the code will be longer. Yikes!
According to Microsoft, “subtree queries are significantly faster with hierarchyID” compared to parent/child. Let’s see if it’s true.
We use the same queries as earlier. One significant metric to use for performance is the logical reads from the SET STATISTICS IO. It tells how many 8KB pages SQL Server will need to get the result set we want. The higher the value, the bigger the number of pages that SQL Server accesses and reads, and the slower the query runs. Execute SET STATISTICS IO ON and re-execute the two queries above. The lower value of the logical reads will be the winner.
As you can see in Figure 10, the I/O statistics for the query with hierarchyID have lower logical reads than their parent/child counterparts. Note the following points in this result:
- The Vessel table is the most notable of the three tables. Using hierarchyID requires only 2 * 8KB = 16KB of pages to be read by SQL Server from the cache (memory). Meanwhile, using parent/child requires 26 * 8KB = 208KB of pages – significantly higher than using hierarchyID.
- The Ranks table, which includes our definition of hierarchies, requires 92 * 8KB = 736KB. On the other hand, using parent/child requires 132 * 8KB = 1056KB.
- The Crew table needs 2 * 8KB = 16KB, which is the same for both approaches.
Kilobytes of pages may be a small value for now, but we only have a few records. However, it gives us an idea of how taxing our query will be on any server. To improve performance, you can do one or more of the following actions:
- Add appropriate index(es)
- Restructure the query
- Update statistics
If you did the above, and the logical reads decreased without adding more records, the performance would increase. As long as you make the logical reads lower than for the one using hierarchyID, that will be good news.
But why refer to logical reads instead of elapsed time?
Checking the elapsed time for both queries using SET STATISTICS TIME ON reveals a small number of millisecond differences for our small set of data. Also, your development server may have a different hardware configuration, SQL Server settings, and workload. An elapsed time of less than a millisecond may deceive you if your query is performing as fast as you expect or not.
SET STATISTICS IO ON does not reveal the things happening “behind the scenes.” In this section, we find out why SQL Server arrives with those numbers by looking at the execution plan.
Let’s start with the execution plan of the first query.
Now, look at the execution plan of the second query.
Comparing Figures 11 and 12, we see that SQL Server needs additional effort to produce the result set if you use the parent/child approach. The WHERE clause is responsible for this complication.
However, the fault may also be of the table design. We used the same table for both approaches: the Ranks table. So, I tried to duplicate the Ranks table but utilize different clustered indexes appropriate for each procedure.
In the result, using hierarchyID still had less logical reads compared to the parent/child counterpart. Finally, we proved that Microsoft was right claiming it.
Here the central aha moment for hierarchyID are:
- HierarchyID is a built-in data type designed for a more optimized representation of trees, which are the most common type of hierarchical data.
- Each item in the tree is a node, and hierarchyID values can be in hexadecimal or string format.
- HierarchyID is applicable for data of organizational structures, project tasks, geographical data, and the like.
- There are methods for traversing and manipulating hierarchical data, such as GetAncestor(), GetDescendant(). GetLevel(), GetReparentedValue(), and more.
- The conventional way to query hierarchical data is to get the direct descendants of a node or get the subtrees under a node.
- The usage of hierarchyID for querying subtrees is not only simpler to code. It also performs better than parent/child.
Parent/child design is not bad at all, and this post is not to diminish it. However, expanding the options and introducing new ideas is always a great benefit for a developer.
You can try the examples we offered here yourself. Receive the effects and see how you can apply it for your next project involving hierarchies.
If you like the post and its ideas, you may spread the word by clicking the sharing buttons for the preferred social media.
- 3 Nasty I/O Statistics That Lag SQL Query Performance - September 24, 2020
- How to Make Sense of SQL Server Geography Data Type - September 8, 2020
- How to Make Use of SQL Server Graph Database Features - September 4, 2020