Written by 02:18 Languages & Coding, T-SQL • One Comment

T-SQL SET Operators Part 1: UNION and UNION ALL

In SQL Server, we can combine the same type of data from multiple tables using SET operators. After combining multiple SQL statements, it returns one result set. Following is the list of T-SQL SET operators:

  1. UNION
  2. UNION ALL
  3. INTERSECT
  4. EXCEPT

To use SET operators, we must follow a number of rules:

  1. The result set of both queries must have the same number of columns.
  2. The data type of columns retrieved by the top and bottom queries must be the same.
  3. If we want to sort the final result set, the ORDER BY clause must be at the end of the query.
  4. The positional ordering of the columns returned by the top and bottom queries must be same.

In this article, I am going to explain the following:

  1. UNION and UNION ALL operator.
  2. Difference between UNION and UNION ALL.
  3. Performance comparison between UNION and UNION ALL.
  4. Performance comparison of UNION and UNION ALL with SELECT Distinct.

What is UNION

UNION is one of the SET operators. The UNION operator combines results generated by multiple SQL queries or multiple tables and returns a single result set. The final result set contains all the rows returned by all the queries in the UNION, and duplicate rows are removed.

Following is the syntax of the UNION operator.

SELECT COLUMN1, 
       COLUMN2, 
       COLUMN3, 
       COLUMN4..FROM TABLE1 
UNION 
SELECT COLUMN1, 
       COLUMN2, 
       COLUMN3, 
       COLUMN4..FROM TABLE2 

What is UNION ALL

UNION All is also the SET operators. Similar to UNION, it combines results generated by multiple SQL queries or tables and returns a single result set. The final result set contains all the rows returned by all the queries in the UNION ALL, but it also contains duplicate records. The following image illustrates the UNION ALL.

Following is the syntax of the UNION ALL operator.

SELECT COLUMN1, 
       COLUMN2, 
       COLUMN3, 
       COLUMN4.FROM TABLE1 
UNION ALL
SELECT COLUMN1, 
       COLUMN2, 
       COLUMN3, 
       COLUMN4.FROM TABLE2 

Difference between UNION and UNION ALL

  1. UNION retrieves only distinct records from all queries or tables, whereas UNION ALL returns all the records retrieved by queries.
  2. Performance of UNION ALL is higher than UNION.

In following the demonstration, I will briefly explain the difference between UNION and UNION ALL.

Prepare Demo Setup

To demonstrate the syntax of the UNION and UNION ALL operators, I have created the following setup.

Firstly, create two tables named STUDENT_ GRADE_A and STUDENT _GRADE_B in DemoDatabase. To do that, execute the following query:

CREATE TABLE STUDENT_GRADE_A 
  ( 
     ID          INT IDENTITY(1, 1), 
     STUDENTNAME VARCHAR(50), 
     GRADE       CHAR(1), 
     PERCENTAGE  INT 
  ) 
GO 

CREATE TABLE STUDENT_GRADE_B 
  ( 
     ID          INT IDENTITY(1, 1), 
     STUDENTNAME VARCHAR(50), 
     GRADE       CHAR(1), 
     PERCENTAGE  INT 
  ) 

GO 

Add some dummy data by executing the following query:

INSERT INTO STUDENT_GRADE_A 
VALUES      ('KEN J SÁNCHEZ', 
             'A', 
             90), 
            ('TERRI LEE DUFFY', 
             'A', 
             80), 
            ('ROBERTO TAMBURELLO', 
             'B', 
             55), 
            ('ROB WALTERS', 
             'B', 
             60) 

GO 

INSERT INTO STUDENT_GRADE_B 
VALUES      ('GAIL A ERICKSON', 
             'A', 
             90), 
            ('JOSSEF H GOLDBERG', 
             'A', 
             50), 
            ('DIANE L MARGHEIM', 
             'B', 
             60), 
            ('GIGI N MATTHEW', 
             'C', 
             35) 

GO 

Execute the following query to see the data in both tables:

Data in both tables

Now, let us combine the result set of both queries using UNION. To do that, execute the following query:

USE DEMODATABASE 
GO 

SELECT GRADE, 
       PERCENTAGE 
FROM   STUDENT_GRADE_A 
UNION 
SELECT GRADE, 
       PERCENTAGE 
FROM   STUDENT_GRADE_B 

Following is the output:

output

As you can see in the above image, UNION returned 6 rows instead of 8, that means it combined the output of both queries, but it removed duplicate records.

Now let’s take a look at an execution plan of the above query. Following is a screenshot of the execution plan.

Execution plan

As you can see, the UNION operator first combines the output generated by both queries using the concatenation operator (Red Box) and then it performs the distinct operation (green box) on the result set.

Now, let’s join both the tables using UNION ALL. To do that, execute the following query.

USE DEMODATABASE 
GO 

SELECT GRADE, 
       PERCENTAGE 
FROM   STUDENT_GRADE_A 
UNION ALL 
SELECT GRADE, 
       PERCENTAGE 
FROM   STUDENT_GRADE_B 

As I explained above, UNION ALL returns matching records and duplicate records.  Following is the output:

output

As you can see on an above screenshot, the query returned 8 rows, and the final result set contains duplicate records.

Now let’s take a look at an execution plan of the above query. Following is a screenshot of the execution plan.

execution plan

As you can see, the UNION ALL operator combines the output generated by both queries using the concatenation operator (Red Box).

Performance comparison of UNION and UNION ALL

Now as I mentioned, the UNION operator combines the results and performs distinct sort when generating the final result set whereas UNION ALL combines the result set of both queries or tables. So, when we use UNION ALL to combine the result sets, it gives the faster result.

To demonstrate that, execute the following queries:

USE DEMODATABASE 
GO 

SELECT GRADE, 
       PERCENTAGE 
FROM   STUDENT_GRADE_A 
UNION 
SELECT GRADE, 
       PERCENTAGE 
FROM   STUDENT_GRADE_B 

SELECT GRADE, 
       PERCENTAGE 
FROM   STUDENT_GRADE_A 
UNION ALL 
SELECT GRADE, 
       PERCENTAGE 
FROM   STUDENT_GRADE_B 

Following is the execution plan of the above queries:

execution plan

As you can see in the above image:

  1. UNION performs expensive distinct SORT operation which reduces the performances. The query cost relative to the batch is 73%.
  2. UNION ALL does not perform a distinct sort. The query cost relative to the batch is 27%.

Now let’s try to perform UNION ALL on the result set generated by  SELECT DISTINCT and compare its execution plan. To do that, execute the following query:

/*Query with UNION*/
SELECT GRADE, 
       PERCENTAGE 
FROM   STUDENT_GRADE_A 
UNION 
SELECT GRADE, 
       PERCENTAGE 
FROM   STUDENT_GRADE_B 
GO 
/*Query with UNION All and Select Distinct*/
SELECT DISTINCT GRADE, 
                PERCENTAGE 
FROM   STUDENT_GRADE_A 
UNION ALL 
SELECT DISTINCT GRADE, 
                PERCENTAGE 
FROM   STUDENT_GRADE_B 

Following is the execution plan:

execution plan

As you can see in the above image:

  1. UNION: query cost relative to the batch is 38%.
  2. UNION ALL with Select distinct: query cost relative to the batch is 62%.

So, combining UNION ALL with SELECT DISTINCT performs two distinct sorts, but this does not give performance benefits, in fact, it reduces the performance.

The above scenario proves that:

  1. UNION ALL is faster and more optimized than UNION. But we cannot use it in all scenarios.
  2. UNION ALL with SELECT DISTINCT is not equivalent to UNION.

Summary

In this article, I have covered:

  1. T-SQL SET operators.
  2. What are UNION and UNION ALL
  3. Performance comparison of UNION and UNION ALL.
Tags: , , Last modified: September 22, 2021
Close