Having a hard time with SQL UNION? It happens if the results you combined put your SQL Server into a standstill. Or a report that’s been working before pops up a box with a red X icon. An “Operand type clash” error occurs pointing to a line with UNION. The “fire” starts. Sounds familiar?
Whether you’ve been using SQL UNION for a while or just start it out, a cheat sheet or a concise set of notes won’t hurt. This is what you are going to get today in this post. This list offers 10 useful tips for both newbies and veterans. Also, there will be examples and some advanced discussions.
[sendpulse-form id=”11900″]
But before we get into the first point, let’s clarify the terms.
UNION is one of the set operators in SQL that combines 2 or more result sets. It may come in handy when you need to combine names, monthly stats, and more from different sources. And whether you use SQL Server, MySQL, or Oracle, the purpose, behavior, and syntax will be very similar. But how does it work?
1. Use SQL UNION to Combine Unique Records
Using UNION to combine result sets removes duplicates.
Why Is This Important?
Most of the time, you don’t want results with duplicates. A report with duplicate lines wastes ink and paper in hardcopies. And this will anger your users.
How to Use It
You combine the results of the SELECT statements with UNION in between.
Before we begin with the example, let’s prepare our sample data.
USE AdventureWorks
GO
IF OBJECT_ID ('dbo.Customer1', 'U') IS NOT NULL
DROP TABLE dbo.Customer1;
GO
IF OBJECT_ID ('dbo.Customer2', 'U') IS NOT NULL
DROP TABLE dbo.Customer2;
GO
IF OBJECT_ID ('dbo.Customer3', 'U') IS NOT NULL
DROP TABLE dbo.Customer3;
GO
-- Get 3 customer names with Andersen lastname
SELECT TOP 3
p.LastName
, p.FirstName
, c.AccountNumber
INTO dbo.Customer1
FROM Person.Person AS p
INNER JOIN Sales.Customer c
ON c.PersonID = p.BusinessEntityID
WHERE p.LastName = 'Andersen';
-- Make sure we have a duplicate in another table
SELECT
c.LastName
,c.FirstName
,c.AccountNumber
INTO dbo.Customer2
FROM Customer1 c
-- Seems it's not enough. Let's have a 3rd copy
SELECT
c.LastName
,c.FirstName
,c.AccountNumber
INTO dbo.Customer3
FROM Customer1 c
We’ll use the data generated by the above code until the third tip. Now that we’re ready, below is the example:
SELECT
c.LastName
,c.FirstName
,c.AccountNumber
FROM dbo.Customer1 c
UNION
SELECT
c2.LastName
,c2.FirstName
,c2.AccountNumber
FROM dbo.Customer2 c2
UNION
SELECT
c3.LastName
,c3.FirstName
,c3.AccountNumber
FROM dbo.Customer3 c3
We have 3 copies of the same customers’ names and expect that unique records will disappear. See the results:
The dbForge Studio for SQL Server solution we use for our examples shows only 3 records. It could have been 9. By applying UNION, we removed the duplicates.
How Does it Work?
The Plan Diagram in dbForge Studio reveals how SQL Server produces the result shown in Figure 1. Have a look:
To interpret Figure 2, start from right to left:
- We retrieved 3 records from each Table Scan operator. Those are the 3 SELECT statements from the example above. Each line going out of it shows ‘3’ meaning 3 records each.
- The Concatenation operator does the combining of results. The line going out of it shows ‘9’ – an output of 9 records from the combining of results.
- The Distinct Sort operator ensures unique records to be the final output. The line going out of it shows ‘3,’ which is consistent with the number of records in Figure 1.
The above diagram shows how UNION is processed by SQL Server. The number and type of operators used may be different depending on the query and the underlying data source. But in summary, a UNION works as follows:
- Retrieve the results of each SELECT statement.
- Combine the results with a Concatenation operator.
- If the combined results are not unique, SQL Server will filter off the duplicates.
All successful examples with UNION follow these basic steps.
2. Use SQL UNION ALL to Combine Records with Duplicates
Using UNION ALL combines result sets with duplicates included.
Why Is This Important?
You may want to combine result sets and then get the records with duplicates for processing later. This task is useful for cleaning up your data.
How to Use It
You combine the results of the SELECT statements with UNION ALL in between. Have a look at the example:
SELECT
c.LastName
,c.FirstName
,c.AccountNumber
FROM dbo.Customer1 c
UNION ALL
SELECT
c2.LastName
,c2.FirstName
,c2.AccountNumber
FROM dbo.Customer2 c2
UNION ALL
SELECT
c3.LastName
,c3.FirstName
,c3.AccountNumber
FROM dbo.Customer3 c3
The above code outputs 9 records as shown in Figure 3:
How Does it Work?
Like before, we use the Plan diagram to know how this works:
Except for the Sort Distinct in Figure 2, the diagram above is the same. That’s fitting because we don’t want to filter out the duplicates.
The above diagram shows how UNION ALL works. In summary, these are the steps that SQL Server will follow:
- Retrieve the results of each SELECT statement.
- Then, combine the results with a Concatenation operator.
Successful examples with UNION ALL follow this pattern.
3. You Can Mix SQL UNION and UNION ALL but Group Them with Parentheses
You can mix the use of UNION and UNION ALL in at least three SELECT statements.
How to Use It?
You combine results of the SELECT statements with either UNION or UNION ALL in between. Parentheses group the results coming together. Let’s use the same data for the next example:
SELECT
c.LastName
,c.FirstName
,c.AccountNumber
FROM dbo.Customer1 c
UNION ALL
(
SELECT
c2.LastName
,c2.FirstName
,c2.AccountNumber
FROM dbo.Customer2 c2
UNION
SELECT
c3.LastName
,c3.FirstName
,c3.AccountNumber
FROM dbo.Customer3 c3
)
The above example combines the results of the last two SELECT statements without duplicates. Then, it combines that with the result of the first SELECT statement. The result is in Figure 5 below:
4. Columns of Each SELECT Statement Should Have Compatible Data Types
Columns in each SELECT statement that uses UNION can have different data types. It’s acceptable as long as they are compatible and allow implicit conversion over them. The final data type of the combined results will use the data type with the highest precedence. Also, the basis of the final data size is the data with the largest size. In the case of strings, it will use the data with the greatest number of characters.
Why Is This Important?
If you need to insert the result of UNIONs to a table, the final data type and size will determine if it fits on the target table column or not. If not, an error will occur. For example, one of the columns in the UNION has a final type of NVARCHAR(50). If the target table column is VARCHAR(50), then you can’t insert it into the table.
How Does it Work?
There’s no better way to explain it than an example:
SELECT N'김지수' AS FullName
UNION
SELECT N'김제니' AS KoreanName
UNION
SELECT N'박채영' AS KoreanName
UNION
SELECT N'ลลิษา มโนบาล' AS ThaiName
UNION
SELECT 'Kim Ji-soo' AS EnglishName
UNION
SELECT 'Jennie Kim' AS EnglishName
UNION
SELECT 'Roseanne Park' AS EnglishName
UNION
SELECT 'Lalisa Manoban' AS EnglishName
The sample above contains data with English, Korean, and Thai character names. Thai and Korean are Unicode characters. English characters are not. So, what do you think will the final data type and size will be? dbForge Studio shows it in the result set:
Did you notice the final data type in Figure 6? It can’t be VARCHAR because of the Unicode characters. So, it has to be NVARCHAR. Meanwhile, the size can’t be less than 14 because the data with the greatest number of characters has 14 characters. See the captions in red in Figure 6. It’s good to include the data type and size in the column header in dbForge Studio.
It is the case not just for string data types. It also applies to numbers and dates. Meanwhile, if you try to combine data with incompatible data types, an error will occur. See the example below:
SELECT CAST('12/25/2020' AS DATE) AS col1
UNION
SELECT CAST('10' AS INT) AS col1
We can’t combine dates and integers into one column. So, expect an error like the one below:
5. The Column Names of the Combined Results Will Use the Column Names of the First SELECT Statement
This issue relates to the previous tip. Notice the column names in the code in Tip #4. There are different column names in each SELECT statement. However, we saw the final column name in the combined result in Figure 6 earlier. Thus, the basis is the column name of the first SELECT statement.
Why Is This Important?
This can be handy when you need to dump the result of the UNION in a temporary table. If you need to refer to its column names in the succeeding statements, you need to be certain of the names. Unless you are using an advanced code editor with IntelliSense, you’re up for another error in your T-SQL code.
How Does it Work?
See Figure 8 for clearer results of using dbForge Studio:
6. Add ORDER BY in the Last SELECT Statement with SQL UNION to Sort the Results
You need to sort the combined results. In a series of SELECT statements with UNION in between, you can do it with the ORDER BY clause in the last SELECT statement.
Why Is This Important?
Users want to sort the data the way they prefer in apps, web pages, reports, spreadsheets, and more.
How to Use It
Use ORDER BY in the last SELECT statement. Here’s an example:
SELECT
e.BusinessEntityID
,p.FirstName
,p.MiddleName
,p.LastName
,'Employee' AS PersonType
FROM HumanResources.Employee e
INNER JOIN Person.Person p ON e.BusinessEntityID = p.BusinessEntityID
UNION
SELECT
c.PersonID
,p.FirstName
,p.MiddleName
,p.LastName
,'Customer' AS PersonType
FROM Sales.Customer c
INNER JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
ORDER BY p.LastName, p.FirstName
The sample above makes it look like the sorting only happens in the last SELECT statement. But it’s not. It will work for the combined result. You’ll be in trouble if you place it in each SELECT statement. See the result:
Without ORDER BY, the result set will have all Employee PersonType first followed by all Customer PersonType. However, Figure 9 demonstrates that names become the sort order of the combined result.
If you try to place ORDER BY in each SELECT statement to sort, here’s what will happen:
Did you see the squiggly line in Figure 10? It’s a warning. If you didn’t notice it and went on, an error will appear in the Error list window of dbForge Studio.
7. WHERE and GROUP BY Clauses Can be Used in Each SELECT Statement with SQL UNION
ORDER BY clause doesn’t work in each SELECT statement with UNION in between. However, WHERE and GROUP BY clauses work.
Why Is This Important?
You may want to combine the results of different queries that filter, count, or summarize data. For example, you can do this to get the total sales orders for January 2012 and compare it to January 2013, January 2014, and so on.
How to Use It
Place the WHERE and/or GROUP BY clauses in each SELECT statement. Check out the example below:
USE AdventureWorks
GO
-- Get the number of orders for January 2012, 2013, 2014 for comparison
SELECT
YEAR(soh.OrderDate) AS OrderYear
,COUNT(*) AS NumberOfJanuaryOrders
FROM Sales.SalesOrderHeader soh
WHERE soh.OrderDate BETWEEN '01/01/2012' AND '01/31/2012'
GROUP BY YEAR(soh.OrderDate)
UNION
SELECT
YEAR(soh.OrderDate) AS OrderYear
,COUNT(*) AS NumberOfJanuaryOrders
FROM Sales.SalesOrderHeader soh
WHERE soh.OrderDate BETWEEN '01/01/2013' AND '01/31/2013'
GROUP BY YEAR(soh.OrderDate)
UNION
SELECT
YEAR(soh.OrderDate) AS OrderYear
,COUNT(*) AS NumberOfJanuaryOrders
FROM Sales.SalesOrderHeader soh
WHERE soh.OrderDate BETWEEN '01/01/2014' AND '01/31/2014'
GROUP BY YEAR(soh.OrderDate)
The code above combines the number of January orders for three consecutive years. Now, check the output:
This example shows that it’s possible to use WHERE and GROUP BY in each of the three SELECT statements with UNION.
8. SELECT INTO Works with SQL UNION
When you need to insert the results of a query with SQL UNION into a table, you can do so by using SELECT INTO.
Why Is This Important?
There will be times when you need to put the results of a query with UNION into a table for further processing.
How to Use It
Place the INTO clause in the first SELECT statement. Here’s an example:
SELECT
YEAR(soh.OrderDate) AS OrderYear
,COUNT(*) AS NumberOfJanuaryOrders
INTO JanuaryOrders
FROM Sales.SalesOrderHeader soh
WHERE soh.OrderDate BETWEEN '01/01/2012' AND '01/31/2012'
GROUP BY YEAR(soh.OrderDate)
UNION
SELECT
YEAR(soh.OrderDate) AS OrderYear
,COUNT(*) AS NumberOfJanuaryOrders
FROM Sales.SalesOrderHeader soh
WHERE soh.OrderDate BETWEEN '01/01/2013' AND '01/31/2013'
GROUP BY YEAR(soh.OrderDate)
UNION
SELECT
YEAR(soh.OrderDate) AS OrderYear
,COUNT(*) AS NumberOfJanuaryOrders
FROM Sales.SalesOrderHeader soh
WHERE soh.OrderDate BETWEEN '01/01/2014' AND '01/31/2014'
GROUP BY YEAR(soh.OrderDate)
Remember to place only one INTO clause into the first SELECT statement.
How Does it Work
SQL Server follows the pattern of processing UNION. Then, it inserts the result into the table specified in the INTO clause.
9. Differentiate SQL UNION from SQL JOIN
Both SQL UNION and SQL JOIN combine the table data, but the difference in syntax and results is like night and day.
Why Is This Important?
If your report or any requirement needs a JOIN but you did a UNION, the output will be wrong.
How SQL UNION and SQL JOIN are Used
It’s SQL UNION vs. JOIN. This is one of the related search queries and questions a newbie does in Google when learning about SQL UNION. Here’s the table of differences:
SQL UNION | SQL JOIN | |
What is combined | Rows | Columns (using a key) |
Number of columns per table | The same for all tables | Variable (Zero to all columns/table) |
In all projects I’ve been with, SQL JOIN applies most of the time. I only had a few cases that used SQL UNION. But as you have seen so far, the SQL UNION is far from useless.
10. SQL UNION ALL is Faster Than UNION
The Plan Diagrams in Figure 2 and Figure 4 earlier suggests that UNION requires an extra operator to ensure unique results. That’s why UNION ALL is faster.
Why Is This Important?
You, your users, your customers, your boss, all want quick results. Knowing that UNION ALL is faster than UNION makes you wonder what to do if you need unique combined results. There’s one solution, as you will see later.
SQL UNION ALL vs. UNION Performance
Figure 2 and Figure 4 already gave you an idea of which is faster. But the code samples used are simple with a small result set. Let’s add some more comparisons using millions of records to make it compelling.
To begin with, let’s prepare the data:
SELECT TOP (2000000)
val = ROW_NUMBER() OVER (ORDER BY sod.SalesOrderDetailID)
INTO dbo.TestNumbers
FROM AdventureWorks.Sales.SalesOrderDetail sod
CROSS JOIN AdventureWorks.Sales.SalesOrderDetail sod2
That’s 2 million records. I hope that’s compelling enough. Now, let’s have the next two query samples below.
-- Using UNION ALL
SELECT
val
FROM TestNumbers tn
UNION ALL
SELECT
val
FROM TestNumbers tn
-- Using UNION
SELECT
val
FROM TestNumbers tn
UNION
SELECT
val
FROM TestNumbers tn
Let’s examine the processes involved in these queries starting with the faster one.
Plan Diagram Analysis
The diagram in Figure 12 looks typical of a UNION ALL process. However, the resulting is 4 million combined results. See the arrow going out of the Concatenation operator. Still, it is typically because it does not deal with the duplicates.
Now, let’s have the diagram of the UNION query in Figure 13:
This one is no longer typical. The plan becomes a parallel query plan to deal with the removal of duplicates in four million rows. The parallel query plan means SQL Server needs to divide the process by the number of processor cores available to it.
Let’s interpret it, starting from the right operators going to the left:
- Since we are combining a table to itself, SQL Server needs to retrieve it twice. See the two Table Scans with two million records each.
- Repartition Stream operators will control the distribution of each row to the next available thread.
- Concatenation doubles the result to four million. This is still considering the number of processor cores.
- A Hash Match applies to remove the duplicates. This is an expensive process with a 65.8% operator cost. As a result, two million records got discarded.
- Gather Streams recombine the results done in each processor core or thread into one.
That’s too much work even though the process is divided into multiple threads. Therefore, you will conclude that it will run slower. But what if there is a solution to get unique records with UNION ALL but faster than this?
Unique Results but Faster Fix with UNION ALL – How?
I won’t make you wait. Here’s the code:
SELECT DISTINCT
val
FROM
(
SELECT
val
FROM TestNumbers tn
UNION ALL
SELECT
val
FROM TestNumbers tn
) AS uniqtn
This can be a lame solution. But check out its Plan Diagram in Figure 14:
So, what made it better? If you compare it to Figure 13, you see the Repartition Stream operators gone. However, it still utilizes multiple threads to get the job done. On the other hand, it implies that the query optimizer deems this process simpler to do than the query using UNION.
Can we safely conclude that we should avoid using UNION and use this approach instead? Not at all! Always check the execution plan diagram! It always depends on what you want SQL Server to give you. This one only shows that if you bump into a performance wall, you need to change your query approach.
How About I/O Statistics?
We can’t dismiss how much resources SQL Server needs to process our query examples. That’s why we also need to examine their STATISTICS IO. Comparing the three queries above, we get the logical reads below:
From Figure 15, we can still conclude that UNION ALL is faster than UNION though the logical reads are the same. The presence of Worktable and Workfile shows using tempdb to get the job done. Meanwhile, when we use SELECT DISTINCT from a derived table with UNION ALL, the tempdb usage is less compared to UNION. This reconfirms further that our analysis from the Plan Diagrams earlier is correct.
How About Time Statistics?
Though elapsed time can change in every execution we do to the same queries, it can give us some idea and add more evidence to our analysis. dbForge Studio displays the time differences of the three queries above. This comparison is consistent with the previous analysis we did.
Conclusion
We covered a lot of background to provide what you need to use SQL UNION and UNION ALL. You may not remember everything after reading this post, so be sure to bookmark this page.
If you like the post, feel free to share it on social media.
Tags: sql operator, sql union, t-sql Last modified: September 17, 2021