Written by 08:12 SQL Server

The Difference Between UNION and UNION ALL

Twice in the last six months, I have had to get involved in the resolution of an incident associated with active database tables growing to the extent that they caused issues. Such incidents are easy to avoid by designing a mechanism to manage the growth of the core table by moving data periodically from this core table to a history table.

It can be achieved using SQL Agent jobs that can keep the core table small. With scheduled index maintenance and statistics updates, performance will remain top-notch for live transactions hitting this core table.

One consequence of this approach is that business data is now distributed across two or more tables. If the application is not designed to cater to this ab initio, the data experts would need to come up with a reporting approach that combines data from both tables to give a complete picture of the transaction data that has been introduced to the database over time.

One way is to use JOINs based on a key column, another is to use a UNION assuming a concatenation of data from both tables is desirable.

Sample Data

To show how this might be useful in a database, we have created tables – core and history tables – to represent the scenario described in the previous paragraph See Listings 1 and 2). TranHistory is an exact copy of Tran with an extra column to capture the timestamp when rows are inserted into the history table.

-- Listing 1: Structure of the Tran Table
USE [AU]
GO
CREATE TABLE [dbo].[Tran](
	[responseId] [int] IDENTITY(1,1) NOT NULL,
	[senderId] [varchar](15) NULL,
	[msisdn] [varchar](15) NULL,
	[message] [varchar](160) NULL,
	[status] [smallint] NULL,
	[application] [varchar](1) NULL,
	[receivedTime] [datetime2](7) NULL,
	[processedTime] [datetime2](7) NULL,
	[flag] [smallint] NULL,
	[requestDelivery] [smallint] NULL,
	[delivered] [smallint] NULL,
	[account] [varchar](20) NULL,
	[srcTon] [smallint] NULL,
	[srcNpi] [smallint] NULL,
	[destTon] [smallint] NULL,
	[destNpi] [smallint] NULL,
	[errorCode] [smallint] NULL,
	[messageId] [varchar](1) NULL,
	[sequenceNo] [int] NULL,
	[retries] [smallint] NULL,
	[messagePriority] [int] NULL,
	[userId] [varchar](20) NULL,
	[bulkId] [varchar](20) NULL
) ON [PRIMARY]
GO


-- Listing 2: Structure of the Tran Table
USE [AU]
GO
CREATE TABLE [dbo].[TranHistory](
	[responseId] [int] NOT NULL,
	[senderId] [varchar](15) NULL,
	[msisdn] [varchar](15) NULL,
	[message] [varchar](160) NULL,
	[status] [smallint] NULL,
	[application] [varchar](1) NULL,
	[receivedTime] [datetime2](7) NULL,
	[processedTime] [datetime2](7) NULL,
	[flag] [smallint] NULL,
	[requestDelivery] [smallint] NULL,
	[delivered] [smallint] NULL,
	[account] [varchar](20) NULL,
	[srcTon] [smallint] NULL,
	[srcNpi] [smallint] NULL,
	[destTon] [smallint] NULL,
	[destNpi] [smallint] NULL,
	[errorCode] [smallint] NULL,
	[messageId] [varchar](1) NULL,
	[sequenceNo] [int] NULL,
	[retries] [smallint] NULL,
	[messagePriority] [int] NULL,
	[userId] [varchar](20) NULL,
	[bulkId] [varchar](20) NULL,
	[archivedTime] [datetime2](7) NOT NULL
) ON [PRIMARY]
GO

For the purpose f this article, both tables have been populated with data as shown in Figure 1. The Tran table has two rows and the TranHistory table has four rows.

Assuming the four rows in the TranHistory table are the result of an archival process moving data from the core Tran table to the history table, a query to discover how many transactions have hit the database since inception would require combining both tables. The total number of records across both tables is six.

Figure 1: Data in Tran and TranHistory Tables

The UNION Operator

A UNION in SQL is an operator that combines data from two or more result sets. Using the UNION operator requires that the result sets involved have the same number of columns. The corresponding columns must also be of the same data type and be retrieved in the same order. Listing 1 shows the structure of an SQL statement with a UNION operator.

-- Listing 3: A UNION Statement
USE [AU]
GO

SELECT [responseId]
      ,[senderId]
      ,[msisdn]
      ,[message]
      ,[status]
      ,[application]
      ,[receivedTime]
      ,[processedTime]
      ,[flag]
      ,[requestDelivery]
      ,[delivered]
      ,[srcTon]
      ,[srcNpi]
      ,[destTon]
      ,[destNpi]
      ,[errorCode]
      ,[messageId]
      ,[sequenceNo]
      ,[retries]
      ,[messagePriority]
      ,[bulkId]
  FROM [dbo].[Tran]
  UNION
  SELECT [responseId]
      ,[senderId]
      ,[msisdn]
      ,[message]
      ,[status]
      ,[application]
      ,[receivedTime]
      ,[processedTime]
      ,[flag]
      ,[requestDelivery]
      ,[delivered]
      ,[srcTon]
      ,[srcNpi]
      ,[destTon]
      ,[destNpi]
      ,[errorCode]
      ,[messageId]
      ,[sequenceNo]
      ,[retries]
      ,[messagePriority]
      ,[bulkId]
  FROM [dbo].[TranHistory]
GO

The UNION operator is designed to return a combined result set from two discrete data sets such as the two tables we described earlier – Tran and TranHistory. The result is a combination of Tran and TranHistory without duplicates. To achieve the “without” duplicates part, it is necessary for SQL Server to sort the intermediate results as shown by the Live Query Statistics captured in SQL Server Management Studio (See Fig. 2).

Figure 2: Live Query Statistics for UNION Command

The UNION statement returns four rows as sown in Figure 3.

Figure 3: Result of Listing 3 Showing Four Rows

The UNION ALL Operator

The UNION ALL statement differs from the UNION in that the command simply combines the data from two or more tables without removing duplicates. This implies there is no need for sorting which happens to be an expensive task in the SQL world. This would also mean the result set is likely to be returned faster because the overhead of sorting is excluded.

It is advisable to use UNION ALL when there is no requirement to sort the results of such a combination of two or more tables or remove duplicates. By design, there may also be a certainty that there are no duplicates across the two tables so in such a case it would be unnecessary to use a UNION.

Listing 4 shows the UNION ALL statement. Notice it is the same as the command “UNION ALL” and the exclusion of the archivedTime column in the TranHistory table. This ensures the sets being concatenated have the same set of columns.

-- Listing 4: A UNION Statement
USE [AU]
GO

SELECT [responseId]
      ,[senderId]
      ,[msisdn]
      ,[message]
      ,[status]
      ,[application]
      ,[receivedTime]
      ,[processedTime]
      ,[flag]
      ,[requestDelivery]
      ,[delivered]
      ,[srcTon]
      ,[srcNpi]
      ,[destTon]
      ,[destNpi]
      ,[errorCode]
      ,[messageId]
      ,[sequenceNo]
      ,[retries]
      ,[messagePriority]
      ,[bulkId]
  FROM [dbo].[Tran]
  UNION ALL
  SELECT [responseId]
      ,[senderId]
      ,[msisdn]
      ,[message]
      ,[status]
      ,[application]
      ,[receivedTime]
      ,[processedTime]
      ,[flag]
      ,[requestDelivery]
      ,[delivered]
      ,[srcTon]
      ,[srcNpi]
      ,[destTon]
      ,[destNpi]
      ,[errorCode]
      ,[messageId]
      ,[sequenceNo]
      ,[retries]
      ,[messagePriority]
      ,[bulkId]
  FROM [dbo].[TranHistory]
GO

Figure 4 shows the result of the UNION ALL statement. We see six rows returned instead of four earlier returned by the UNION command.

Figure 4: Output of Listing 4

Comparing the Live Query Statistics, we see that the sorting is excluded.

Figure 5: Live Query Statistics for a UNION ALL Statement

Key Differences between UNION and UNION ALL in SQL

SQL Server provides powerful tools for managing and querying databases. When working with multiple tables or queries that share similar structures, combining their result sets can be essential. Two primary methods for combining these results are UNION and UNION ALL. This article will explore the key differences between UNION and UNION ALL in SQL Server to help you make informed decisions when managing your data.

  1. Handling Duplicate Rows:
    UNION:
    • The UNION operator combines the result sets of two or more SELECT statements and eliminates duplicate rows in the final result set.It implicitly uses the DISTINCT keyword, ensuring that each row in the result set is unique.Useful when you need a combined result set without any duplicate records.
    UNION ALL:
    • The UNION ALL operator concatenates the result sets of two or more SELECT statements without eliminating duplicate rows.
    • It preserves all records from both result sets, including duplicates.
    • Ideal for scenarios where retaining duplicate records is important or when you want to optimize query performance.
  2. Performance Implications:
    UNION:
    • Since UNION removes duplicate rows, it requires additional processing.The performance may be slower, especially when dealing with large data sets, as SQL Server needs to sort and compare records to identify duplicates.
    UNION ALL:
    • UNION ALL generally offers better performance, as it does not require the removal of duplicate rows.
    • It simply combines the result sets without any additional processing, making it faster than UNION in most cases.
  3. Syntax and Compatibility:
    Both UNION and UNION ALL require that the SELECT statements being combined have the same number of columns, and that the corresponding columns share compatible data types. Although the column names don’t have to match, it’s a good practice to use aliases for clarity and consistency.
  4. Choosing the Appropriate Method:
    • Assess the requirements of your specific use case to determine whether duplicates need to be eliminated or retained.
    • Consider the performance impact of each method, especially when dealing with large data sets. If removing duplicates is not critical, UNION ALL may be a more efficient choice.

Best Practices and Tips for Using UNION and UNION ALL in SQL

To make the most of UNION and UNION ALL operators in SQL, it’s essential to follow best practices and be aware of common pitfalls. This article will provide examples and explanations for best practices and tips when using UNION and UNION ALL.

Ensuring Column Compatibility

Matching column numbers and data types:
Combining results from Employees and Contractors tables

  SELECT EmployeeID, FirstName, LastName, HireDate
  FROM Employees
  UNION
  SELECT ContractorID, FirstName, LastName, StartDate
  FROM Contractors;

Ensure that the SELECT statements being combined have the same number of columns, and that corresponding columns share compatible data types. In the example above, both queries have four columns, and the data types of the corresponding columns match.

Using aliases and data type conversions when necessary:
Combining results from Employees and Contractors tables with differing column names and data types

  SELECT EmployeeID AS ID, FirstName, LastName, CAST(HireDate AS DATE) AS StartDate
  FROM Employees
  UNION
  SELECT ContractorID AS ID, FName AS FirstName, LName AS LastName, StartDate
  FROM Contractors;

If the column names or data types don’t match, use aliases and data type conversions to ensure compatibility. In this example, we use aliases to standardize column names (FName and LName) and CAST function to convert HireDate to the same data type as StartDate.

Improving Performance

Opting for UNION ALL when duplicates are not a concern:
Combining results from two SELECT statements without eliminating duplicates

  SELECT ProductID, ProductName
  FROM Products_A
  UNION ALL
  SELECT ProductID, ProductName
  FROM Products_B;

When duplicate rows are not a concern, or you want to optimize query performance, use UNION ALL instead of UNION. In this example, we combine results from Products_A and Products_B tables without removing duplicates, providing faster query execution.

Limiting result sets using WHERE and LIMIT clauses:

Combining top 10 results from two SELECT statements filtered by a specific condition

 SELECT TOP 10 ProductID, ProductName
  FROM Products
  WHERE CategoryID = 1
  UNION
  SELECT TOP 10 ProductID, ProductName
  FROM Products
  WHERE CategoryID = 2;

Use WHERE and LIMIT (or TOP in SQL Server) clauses to limit the result sets being combined, reducing the amount of data processed by UNION or UNION ALL. In this example, we combine the top 10 results from two categories, reducing the size of the result set and improving performance.

Common Pitfalls and How to Avoid Them

Mismatched columns:

Pitfall:
Attempting to combine SELECT statements with different numbers of columns or incompatible data types.

How to Avoid:
Verify that the SELECT statements being combined have the same number of columns and compatible data types. Use aliases or data type conversion functions, such as CAST or CONVERT, when necessary.

Incorrect use of UNION or UNION ALL:

Pitfall:
Using UNION when duplicates should be retained or using UNION ALL when duplicates need to be eliminated.

How to Avoid:
Assess the requirements of your specific use case to determine if duplicates should be eliminated or retained. Choose UNION or UNION ALL accordingly.

Conclusion

The UNION and UNION ALL operators are used to combine multiple data sets to a single result set. There are a few conditions necessary for the results to be meaningful and these have been highlighted in the article. The UNION ALL command is preferred if sorting and uniqueness are not required since it performs better simply by excluding sorting.

SQL coding in SSMS is much faster with such a helpful add-in as SQL Complete.

It is worth mentioning that JOIN operators are another very useful and somewhat more sophisticated set of commands for combining data sets based on shared columns.

Last modified: March 23, 2023
Close