How to Use Array Functionality in SQL Server?

Total: 1 Average: 5

Processing an array of values inside a procedure or function is a common requirement in most business cases. Since SQL Server does not support an array variable type, developers use a list of values (mostly in CSV) as input.

CodingSight - Working with arrays

Table Valued Parameters (TVP) Instead of Arrays

The SQL Server 2008 introduced a feature called table-valued parameters (TVP). It enabled users to combine values in a table and process them in table format. Stored procedures or functions could use this variable in join operations. It let them improve performance and avoid cursor-like operations item-by-item.

Although the table-based approach has been available for a long time, lists of values as parameters are widely used for the following two reasons:

  • Table-based variables are not supported by JDBC drivers. Java applications and services have to use comma-separated lists or structured formats like XML to transfer the list of values to the database server.
  • Legacy code is still in production and needs to be maintained, and migration projects are too costly for execution.

To solve this problem, specialists apply several approaches. Some of them include writing a .NET CLR function or using XML. However, as the execution of .NET CLR functions might not be available in all environments, and XML is usually not the fastest solution, I will focus on the other two common approaches. They are the numbers table and common table expression (CTE) methods.

 Split String into Array in SQL Server

The numbers table approach means that you must manually create a table containing enough rows so that the longest string you split will never exceed this number.

In this example, I am using 100.000 rows with a clustered index and compression on the generated column. This allows accelerating the data retrieval.

IF EXISTS (
	SELECT *
	FROM sys.objects
	WHERE object_id = OBJECT_ID(N'dbo.NumbersTest')
)
BEGIN
	DROP TABLE NumbersTest;
END;
 
CREATE TABLE NumbersTest (Number INT NOT NULL);
 
DECLARE @RunDate datetime = GETDATE()
DECLARE @i INT = 1;
 
WHILE @i <= 100000
BEGIN
	INSERT INTO dbo.NumbersTest(Number) VALUES (@i);
	SELECT @i = @i + 1;
END;
 
CREATE UNIQUE CLUSTERED INDEX n ON dbo.NumbersTest(Number) WITH (DATA_COMPRESSION = PAGE);
GO

Notice that index compression can only be used on Enterprise versions of SQL Server. Otherwise, you do not apply this option during the index creation.

With a newly created NumbersTest table, we can write a custom function performing an array splitting functionality:

CREATE FUNCTION dbo.Split_Numbers ( @List       NVARCHAR(MAX), @Delimiter  NVARCHAR(255) )
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
 SELECT Item = SUBSTRING(@List, Number,
                    	     CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)
 FROM dbo.NumbersTest
  WHERE Number <= CONVERT(INT, LEN(@List))
   AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter);

On the other hand, if we use the CTE method, it will not require a numbers table. Instead, we will use a recursive CTE to extract each part of the string from the “remainder” of the previous part.

CREATE FUNCTION dbo.Split_CTE (  @List NVARCHAR(MAX),
                                                                @Delimiter  NVARCHAR(255) )
RETURNS @Items TABLE (Item NVARCHAR(4000))
WITH SCHEMABINDING
AS
BEGIN
   DECLARE @ll INT = LEN(@List) + 1, @ld INT = LEN(@Delimiter);
   WITH a AS
   (
   	SELECT
       	[start] = 1,
       	[end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter,   @List, 1), 0), @ll),
       	[value] = SUBSTRING(@List, 1,
                        COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, 1), 0), @ll) - 1)
       UNION ALL
   	SELECT
       	[start] = CONVERT(INT, [end]) + @ld,
  [end]   = COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, [end] + @ld), 0), @ll),
       	[value] = SUBSTRING(@List, [end] + @ld, COALESCE(NULLIF(CHARINDEX(@Delimiter, @List, [end] + @ld), 0), @ll)-[end]-@ld)
   	FROM a
   	WHERE [end] < @ll
   )
   INSERT @Items SELECT [value]
   FROM a
   WHERE LEN([value]) > 0
   OPTION (MAXRECURSION 0);
 
   RETURN;
END

Now we need to test these two approaches. Let us start with a simple test case to check if they work properly:

DECLARE @Values NVARCHAR(MAX) =N'Value 1,Value 2,Value 3,Value 4,Value 5';
SELECT Item AS NumSplit FROM dbo.Split_Numbers (@Values, N',');
SELECT Item AS CTESplit FROM dbo.Split_CTE (@Values, N',');
The numbers table approach

The Execution plan is below. As we see, functions use different methods to achieve the same result.

The numbers table approach

Since this is a very small dataset, performance cannot be compared. Let us do a test on a bigger set of values.

First, we need to create a table with test values. I will use the TestData table and populate it with different records, marking them by different types, depending on the category they are assigned to:

CREATE TABLE dbo.TestData
(
  string_typ   INT,
  string_val   NVARCHAR(MAX)
);
CREATE CLUSTERED INDEX st ON dbo.TestData(string_len);
 
CREATE TABLE #Temp(st NVARCHAR(MAX));
INSERT #Temp
SELECT N'a,va,val,value,value1,valu,va,value12,,valu,value123,value1234';
GO
 
INSERT dbo.TestData SELECT 1, st FROM #Temp;
GO 10000
INSERT dbo.TestData SELECT 2, REPLICATE(st,10) FROM #Temp;
GO 1000
INSERT dbo.TestData SELECT 3, REPLICATE(st,100) FROM #Temp;
GO 100
INSERT dbo.TestData SELECT 4, REPLICATE(st,1000) FROM #Temp;
GO 10
INSERT dbo.TestData SELECT 5, REPLICATE(st,10000) FROM #Temp;
GO
 
DROP TABLE #Temp;
GO
 -- Cleanup up the trailing comma
UPDATE dbo.TestData
   	SET string_val = SUBSTRING(string_val, 1, LEN(string_val)-1) + 'x';

With the test data available and split functions ready, we can try to test the case on bigger datasets to see how different functions perform (for each type of data, in seconds).

SELECT func.Item
  FROM dbo.TestData AS tst
   	CROSS APPLY dbo.Split_CTE(tst.string_val, ',') AS func
  WHERE tst.string_typ = 1; -- Values of string_typ are from 1-5

SELECT func.Item
  FROM dbo.TestData AS tst
   	CROSS APPLY dbo.Split_Numbers(tst.string_val, ',') AS func
  WHERE tst.string_typ = 1; -- Values of string_typ are from 1-5
 STRING_TYP Split_CTE function (s) Split_Numbers function ( s)
 1 1 4
 2 1 4
 3 1 10
 4 4 57
 5 18 956

As the results show, with the strings getting larger, the advantage of the CTE method increases. It should be used over the table numbers approach.

Also, take into account that results depend on the hardware available on your server machine.

String Split Functions in MS SQL

The SQL Server 2016 provided a new built-in function STRING_SPLIT. This function changes the delimited string into a single-column table with two inputs, the string to be split into values and the separator character. It returns one column named value.

Have a look at the example below:

SELECT *
  FROM string_split('Value 1,Value 2,Value 3,Value 4,Value 5',',' );

The execution plan for this query is bellow:

Using the List of Values with STRING_SPLIT Function

The Estimated Number of Rows Per Execution value is always 50. It does not depend on the number of string elements.

If we have the user-defined table-valued functions, the Estimated Number of Rows is 100.

As a table-valued function, it can also be used in FROM and WHERE expressions, and wherever a table expression is supported.

For example, we will use the AdventureWorks2019 database to demonstrate the usage of string_split in JOIN statements:

USE AdventureWorks2019;
DECLARE @Persons NVARCHAR(4000) = 'Miller,Margheim,Galvin,Duffy,Khanna';
 
SELECT PersonType, FirstName, MiddleName, LastName
FROM PErson.Person
WHERE LastName IN ( SELECT value FROM string_split(@Persons,',') );
AdventureWorks2019 database to demonstrate the usage of string_split in JOIN statements

If you are working on a newer version of SQL Server, using the list of values with STRING_SPLIT function is optimal. It is optimized and easy to use, without the potential third-to-find bugs of some custom solutions.

However, there are some limitations:

  • Single-character separator is the only type accepted. If you need more characters, you have to use a custom function.
  • Single-output column – the output is always a single-column table without positioning the string element within the delimited string. It allows sorting only by the element name.
  • String data type – you use this function to delimit a string of numbers (although all values in the output column are numbers, their data type is a string). When you join them to numeric columns in other tables, the data type conversion is required. If you forget to do explicit casts, you might get unexpected results.

If these limitations are acceptable to you, feel free to apply the function.

Conclusion

The modern methods of processing arrays allow doing the required tasks appropriately. In my work, I would always suggest a built-in function rather than a user-defined one if they are similar from a performance point of view. It is always pre-deployed and available in all databases.

Besides, the software tools available for working with SQL servers help significantly. They provide the necessary values faster and can automate lots of tasks that often consume that time you might use better. For instance, dbForge SQL Complete offers a handy feature of getting the aggregates’ values for the selected data in the SSMS Results Grid (MIN, MAX, AVG, CONT, etc.).

Latest posts by Josip Saban (see all)

Josip Saban

An experienced database developer and software manager with more than 15 years in the field, both in corporate and start-up environments. Josip has strong problem-solving skills and a proven track-record of successfully implementing strategic solutions to meet changing business needs, combined with the ability to build key relationships and communicate well with stakeholders, most of it in finance industry. Focused on Microsoft tool stack, specifically SQL Server, especially in data warehouse modelling and development. from version 2000 to latest versions. He is interested in database architect and engineering management positions, with strong affinity to data-driven industries.