In SQL Server, we can combine rows of the same shape from multiple queries using SET operators. A SET operator takes the results of two or more SELECT statements and returns a single result set. Following is the list of T-SQL SET operators:
- UNION
- UNION ALL
- INTERSECT
- EXCEPT
To use SET operators, we must follow a number of rules:
- The result set of both queries must have the same number of columns.
- The data types of the corresponding columns in the top and bottom queries must be the same or implicitly convertible to a common type.
- If we want to sort the final result set, a single ORDER BY clause must be placed at the end of the last query; it sorts the combined result.
- The positional ordering of the columns returned by the top and bottom queries must be the same.
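The column-count and ORDER BY rules above can be checked with a small sketch. It uses Python's built-in sqlite3 module instead of SQL Server, an assumption made only so the example runs anywhere, and the literal values are made up for illustration. SQLite does not enforce SQL Server's data type rule, so that rule is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Rule: both queries must return the same number of columns.
# The top query returns two columns and the bottom query one, so it is rejected.
try:
    conn.execute("SELECT 1, 2 UNION SELECT 1")
    column_mismatch_rejected = False
except sqlite3.Error:
    column_mismatch_rejected = True

# Rule: one ORDER BY at the very end sorts the combined result set.
rows = conn.execute(
    """
    SELECT 'B' AS grade, 60 AS percentage
    UNION
    SELECT 'A', 90
    ORDER BY percentage DESC
    """
).fetchall()

print(column_mismatch_rejected)  # True
print(rows)                      # [('A', 90), ('B', 60)]
```

The column names of the combined result come from the top query, which is why the ORDER BY can refer to percentage even though the bottom query does not name its columns.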
In this article, I am going to explain the following:
- UNION and UNION ALL operators.
- Difference between UNION and UNION ALL.
- Performance comparison between UNION and UNION ALL.
- Performance comparison of UNION and UNION ALL with SELECT DISTINCT.
What is UNION
UNION is one of the SET operators. The UNION operator combines results generated by multiple SQL queries or multiple tables and returns a single result set. The final result set contains all the rows returned by all the queries in the UNION, and duplicate rows are removed.
Following is the syntax of the UNION operator.
SELECT COLUMN1, COLUMN2, COLUMN3, COLUMN4, ...
FROM TABLE1
UNION
SELECT COLUMN1, COLUMN2, COLUMN3, COLUMN4, ...
FROM TABLE2
What is UNION ALL
UNION ALL is also one of the SET operators. Similar to UNION, it combines results generated by multiple SQL queries or tables and returns a single result set. The final result set contains all the rows returned by all the queries in the UNION ALL, including duplicate rows. The following image illustrates UNION ALL.
Following is the syntax of the UNION ALL operator.
SELECT COLUMN1, COLUMN2, COLUMN3, COLUMN4, ...
FROM TABLE1
UNION ALL
SELECT COLUMN1, COLUMN2, COLUMN3, COLUMN4, ...
FROM TABLE2
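As a quick illustration of both syntaxes, the sketch below chains three SELECT statements over literal values. It runs on SQLite through Python's built-in sqlite3 module rather than on SQL Server, which is an assumption for portability, and the values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three queries, two of which return the same row ('A', 90).
union_rows = conn.execute(
    "SELECT 'A', 90 UNION SELECT 'A', 90 UNION SELECT 'B', 60"
).fetchall()
union_all_rows = conn.execute(
    "SELECT 'A', 90 UNION ALL SELECT 'A', 90 UNION ALL SELECT 'B', 60"
).fetchall()

print(len(union_rows))      # 2: the repeated row is returned once
print(len(union_all_rows))  # 3: every row is returned
```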
Difference between UNION and UNION ALL
- UNION retrieves only distinct records from all queries or tables, whereas UNION ALL returns all the records retrieved by the queries.
- UNION ALL generally performs better than UNION because it does not have to remove duplicates.
In the following demonstration, I will show the difference between UNION and UNION ALL.
Prepare Demo Setup
To demonstrate the syntax of the UNION and UNION ALL operators, I have created the following setup.
Firstly, create two tables named STUDENT_GRADE_A and STUDENT_GRADE_B in DemoDatabase. To do that, execute the following query:
CREATE TABLE STUDENT_GRADE_A
(
    ID INT IDENTITY(1, 1),
    STUDENTNAME VARCHAR(50),
    GRADE CHAR(1),
    PERCENTAGE INT
)
GO
CREATE TABLE STUDENT_GRADE_B
(
    ID INT IDENTITY(1, 1),
    STUDENTNAME VARCHAR(50),
    GRADE CHAR(1),
    PERCENTAGE INT
)
GO
Add some dummy data by executing the following query:
INSERT INTO STUDENT_GRADE_A
VALUES ('KEN J SÁNCHEZ', 'A', 90),
       ('TERRI LEE DUFFY', 'A', 80),
       ('ROBERTO TAMBURELLO', 'B', 55),
       ('ROB WALTERS', 'B', 60)
GO
INSERT INTO STUDENT_GRADE_B
VALUES ('GAIL A ERICKSON', 'A', 90),
       ('JOSSEF H GOLDBERG', 'A', 50),
       ('DIANE L MARGHEIM', 'B', 60),
       ('GIGI N MATTHEW', 'C', 35)
GO
Execute the following queries to see the data in both tables:
SELECT * FROM STUDENT_GRADE_A
SELECT * FROM STUDENT_GRADE_B
Now, let us combine the result set of both queries using UNION. To do that, execute the following query:
USE DEMODATABASE
GO
SELECT GRADE, PERCENTAGE
FROM STUDENT_GRADE_A
UNION
SELECT GRADE, PERCENTAGE
FROM STUDENT_GRADE_B
Following is the output:
As you can see in the above image, UNION returned 6 rows instead of 8. It combined the output of both queries and removed the duplicate rows: (A, 90) and (B, 60) exist in both tables but appear only once in the result.
Now let’s take a look at the execution plan of the above query. Following is a screenshot of the execution plan.
As you can see, the UNION operator first combines the output generated by both queries using the concatenation operator (red box) and then performs the distinct operation (green box) on the result set.
Now, let’s combine the rows of both tables using UNION ALL. To do that, execute the following query.
USE DEMODATABASE
GO
SELECT GRADE, PERCENTAGE
FROM STUDENT_GRADE_A
UNION ALL
SELECT GRADE, PERCENTAGE
FROM STUDENT_GRADE_B
As I explained above, UNION ALL returns every row from both queries, including duplicates. Following is the output:
As you can see in the above screenshot, the query returned 8 rows, and the final result set contains duplicate records.
Now let’s take a look at the execution plan of the above query. Following is a screenshot of the execution plan.
As you can see, the UNION ALL operator only combines the output generated by both queries using the concatenation operator (red box); no distinct operation follows.
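The row counts for both operators can be reproduced with a short sketch. It loads the GRADE and PERCENTAGE values from the demo setup into an in-memory SQLite database through Python's sqlite3 module. SQLite and the reduced two-column tables are assumptions made to keep the example self-contained, so it confirms the result sets but says nothing about SQL Server's execution plans.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE STUDENT_GRADE_A (GRADE TEXT, PERCENTAGE INTEGER);
    CREATE TABLE STUDENT_GRADE_B (GRADE TEXT, PERCENTAGE INTEGER);
    INSERT INTO STUDENT_GRADE_A VALUES ('A', 90), ('A', 80), ('B', 55), ('B', 60);
    INSERT INTO STUDENT_GRADE_B VALUES ('A', 90), ('A', 50), ('B', 60), ('C', 35);
    """
)

query = (
    "SELECT GRADE, PERCENTAGE FROM STUDENT_GRADE_A "
    "{op} "
    "SELECT GRADE, PERCENTAGE FROM STUDENT_GRADE_B"
)
union_rows = conn.execute(query.format(op="UNION")).fetchall()
union_all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()

# Rows that occur more than once in the UNION ALL output.
duplicates = sorted(row for row, n in Counter(union_all_rows).items() if n > 1)

print(len(union_rows), len(union_all_rows))  # 6 8
print(duplicates)                            # [('A', 90), ('B', 60)]
```

The two rows listed as duplicates are exactly the ones UNION drops, which accounts for the difference between 8 and 6 rows.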
Performance comparison of UNION and UNION ALL
As I mentioned, the UNION operator combines the results and then performs a distinct sort when generating the final result set, whereas UNION ALL only combines the result sets of both queries or tables. So, when we use UNION ALL to combine the result sets, it returns the result faster.
To demonstrate that, execute the following queries:
USE DEMODATABASE
GO
SELECT GRADE, PERCENTAGE
FROM STUDENT_GRADE_A
UNION
SELECT GRADE, PERCENTAGE
FROM STUDENT_GRADE_B

SELECT GRADE, PERCENTAGE
FROM STUDENT_GRADE_A
UNION ALL
SELECT GRADE, PERCENTAGE
FROM STUDENT_GRADE_B
Following is the execution plan of the above queries:
As you can see in the above image:
- UNION performs an expensive distinct sort operation, which reduces performance. The query cost relative to the batch is 73%.
- UNION ALL does not perform a distinct sort. The query cost relative to the batch is 27%.
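The extra work that UNION does can be modelled in a few lines of Python. This is only a conceptual sketch under my own assumptions: the function names are hypothetical, and the set-based duplicate check is a stand-in for SQL Server's distinct sort, not a description of how the engine implements it.

```python
def union_all(top, bottom):
    """Concatenation only: every row from both inputs is kept."""
    return list(top) + list(bottom)


def union(top, bottom):
    """Concatenation followed by an extra pass that drops repeated rows."""
    seen = set()
    result = []
    for row in union_all(top, bottom):
        if row not in seen:  # the additional per-row work UNION ALL avoids
            seen.add(row)
            result.append(row)
    return result


# GRADE and PERCENTAGE values from the demo tables.
grade_a = [("A", 90), ("A", 80), ("B", 55), ("B", 60)]
grade_b = [("A", 90), ("A", 50), ("B", 60), ("C", 35)]

print(len(union_all(grade_a, grade_b)))  # 8
print(len(union(grade_a, grade_b)))      # 6
```

In this model, union calls union_all and then does more work on top of it, which mirrors the shape of the two plans: a concatenation in both, plus a distinct step for UNION.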
Now let’s try to perform UNION ALL on the result set generated by SELECT DISTINCT and compare its execution plan. To do that, execute the following query:
/*Both queries run in the same batch so that their relative costs can be compared*/
/*Query with UNION*/
SELECT GRADE, PERCENTAGE
FROM STUDENT_GRADE_A
UNION
SELECT GRADE, PERCENTAGE
FROM STUDENT_GRADE_B

/*Query with UNION ALL and SELECT DISTINCT*/
SELECT DISTINCT GRADE, PERCENTAGE
FROM STUDENT_GRADE_A
UNION ALL
SELECT DISTINCT GRADE, PERCENTAGE
FROM STUDENT_GRADE_B
Following is the execution plan:
As you can see in the above image:
- UNION: query cost relative to the batch is 38%.
- UNION ALL with Select distinct: query cost relative to the batch is 62%.
So, combining UNION ALL with SELECT DISTINCT performs two distinct sorts, one for each query. This does not give any performance benefit; in fact, it reduces performance.
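The above scenarios show that:
- UNION ALL is faster than UNION because it skips duplicate removal, but we can use it only when duplicate rows are acceptable or cannot occur.
- UNION ALL with SELECT DISTINCT is not equivalent to UNION. Each SELECT DISTINCT removes duplicates only within its own query, so rows that exist in both tables are still returned twice.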
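The second point can be verified on the demo data. The sketch below again uses SQLite through Python's sqlite3 module (an assumption for portability) with only the GRADE and PERCENTAGE columns. It also includes a third query of my own, not one from the demonstration above, that applies DISTINCT once over the whole UNION ALL result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE STUDENT_GRADE_A (GRADE TEXT, PERCENTAGE INTEGER);
    CREATE TABLE STUDENT_GRADE_B (GRADE TEXT, PERCENTAGE INTEGER);
    INSERT INTO STUDENT_GRADE_A VALUES ('A', 90), ('A', 80), ('B', 55), ('B', 60);
    INSERT INTO STUDENT_GRADE_B VALUES ('A', 90), ('A', 50), ('B', 60), ('C', 35);
    """
)

union_rows = conn.execute(
    "SELECT GRADE, PERCENTAGE FROM STUDENT_GRADE_A "
    "UNION "
    "SELECT GRADE, PERCENTAGE FROM STUDENT_GRADE_B"
).fetchall()

# DISTINCT applied to each query separately, then UNION ALL.
distinct_each = conn.execute(
    "SELECT DISTINCT GRADE, PERCENTAGE FROM STUDENT_GRADE_A "
    "UNION ALL "
    "SELECT DISTINCT GRADE, PERCENTAGE FROM STUDENT_GRADE_B"
).fetchall()

# DISTINCT applied once over the combined rows.
distinct_outer = conn.execute(
    "SELECT DISTINCT GRADE, PERCENTAGE FROM ("
    "  SELECT GRADE, PERCENTAGE FROM STUDENT_GRADE_A"
    "  UNION ALL"
    "  SELECT GRADE, PERCENTAGE FROM STUDENT_GRADE_B"
    ") AS combined"
).fetchall()

print(len(union_rows), len(distinct_each), len(distinct_outer))  # 6 8 6
```

Only the form that applies DISTINCT across the combined rows returns the same rows as UNION; the per-query form still returns (A, 90) and (B, 60) twice.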
Summary
In this article, I have covered:
- T-SQL SET operators.
- What UNION and UNION ALL are.
- Performance comparison of UNION and UNION ALL.