Written by 08:12 SQL Server, Uncategorized

The Difference Between UNION and UNION ALL

Twice in the last six months, I have had to get involved in the resolution of an incident associated with active database tables growing to the extent that they caused issues. Such incidents are easy to avoid by designing a mechanism to manage the growth of the core table by moving data periodically from this core table to a history table.

It can be achieved using SQL Agent jobs that can keep the core table small. With scheduled index maintenance and statistics updates, performance will remain top-notch for live transactions hitting this core table.

One consequence of this approach is that business data is now distributed across two or more tables. If the application is not designed to cater to this ab initio, the data experts would need to come up with a reporting approach that combines data from both tables to give a complete picture of the transaction data that has been introduced to the database over time.

One way is to use JOINs based on a key column, another is to use a UNION assuming a concatenation of data from both tables is desirable.

Sample Data

To show how this might be useful in a database, we have created tables – core and history tables – to represent the scenario described in the previous paragraph See Listings 1 and 2). TranHistory is an exact copy of Tran with an extra column to capture the timestamp when rows are inserted into the history table.

-- Listing 1: Structure of the Tran Table
USE [AU]
GO
CREATE TABLE [dbo].[Tran](
	[responseId] [int] IDENTITY(1,1) NOT NULL,
	[senderId] [varchar](15) NULL,
	[msisdn] [varchar](15) NULL,
	[message] [varchar](160) NULL,
	[status] [smallint] NULL,
	[application] [varchar](1) NULL,
	[receivedTime] [datetime2](7) NULL,
	[processedTime] [datetime2](7) NULL,
	[flag] [smallint] NULL,
	[requestDelivery] [smallint] NULL,
	[delivered] [smallint] NULL,
	[account] [varchar](20) NULL,
	[srcTon] [smallint] NULL,
	[srcNpi] [smallint] NULL,
	[destTon] [smallint] NULL,
	[destNpi] [smallint] NULL,
	[errorCode] [smallint] NULL,
	[messageId] [varchar](1) NULL,
	[sequenceNo] [int] NULL,
	[retries] [smallint] NULL,
	[messagePriority] [int] NULL,
	[userId] [varchar](20) NULL,
	[bulkId] [varchar](20) NULL
) ON [PRIMARY]
GO


-- Listing 2: Structure of the Tran Table
USE [AU]
GO
CREATE TABLE [dbo].[TranHistory](
	[responseId] [int] NOT NULL,
	[senderId] [varchar](15) NULL,
	[msisdn] [varchar](15) NULL,
	[message] [varchar](160) NULL,
	[status] [smallint] NULL,
	[application] [varchar](1) NULL,
	[receivedTime] [datetime2](7) NULL,
	[processedTime] [datetime2](7) NULL,
	[flag] [smallint] NULL,
	[requestDelivery] [smallint] NULL,
	[delivered] [smallint] NULL,
	[account] [varchar](20) NULL,
	[srcTon] [smallint] NULL,
	[srcNpi] [smallint] NULL,
	[destTon] [smallint] NULL,
	[destNpi] [smallint] NULL,
	[errorCode] [smallint] NULL,
	[messageId] [varchar](1) NULL,
	[sequenceNo] [int] NULL,
	[retries] [smallint] NULL,
	[messagePriority] [int] NULL,
	[userId] [varchar](20) NULL,
	[bulkId] [varchar](20) NULL,
	[archivedTime] [datetime2](7) NOT NULL
) ON [PRIMARY]
GO

For the purpose f this article, both tables have been populated with data as shown in Figure 1. The Tran table has two rows and the TranHistory table has four rows.

Assuming the four rows in the TranHistory table are the result of an archival process moving data from the core Tran table to the history table, a query to discover how many transactions have hit the database since inception would require combining both tables. The total number of records across both tables is six.

Figure 1: Data in Tran and TranHistory Tables

The UNION Operator

A UNION in SQL is an operator that combines data from two or more result sets. Using the UNION operator requires that the result sets involved have the same number of columns. The corresponding columns must also be of the same data type and be retrieved in the same order. Listing 1 shows the structure of an SQL statement with a UNION operator.

-- Listing 3: A UNION Statement
USE [AU]
GO

SELECT [responseId]
      ,[senderId]
      ,[msisdn]
      ,[message]
      ,[status]
      ,[application]
      ,[receivedTime]
      ,[processedTime]
      ,[flag]
      ,[requestDelivery]
      ,[delivered]
      ,[srcTon]
      ,[srcNpi]
      ,[destTon]
      ,[destNpi]
      ,[errorCode]
      ,[messageId]
      ,[sequenceNo]
      ,[retries]
      ,[messagePriority]
      ,[bulkId]
  FROM [dbo].[Tran]
  UNION
  SELECT [responseId]
      ,[senderId]
      ,[msisdn]
      ,[message]
      ,[status]
      ,[application]
      ,[receivedTime]
      ,[processedTime]
      ,[flag]
      ,[requestDelivery]
      ,[delivered]
      ,[srcTon]
      ,[srcNpi]
      ,[destTon]
      ,[destNpi]
      ,[errorCode]
      ,[messageId]
      ,[sequenceNo]
      ,[retries]
      ,[messagePriority]
      ,[bulkId]
  FROM [dbo].[TranHistory]
GO

The UNION operator is designed to return a combined result set from two discrete data sets such as the two tables we described earlier – Tran and TranHistory. The result is a combination of Tran and TranHistory without duplicates. To achieve the “without” duplicates part, it is necessary for SQL Server to sort the intermediate results as shown by the Live Query Statistics captured in SQL Server Management Studio (See Fig. 2).

Figure 2: Live Query Statistics for UNION Command

The UNION statement returns four rows as sown in Figure 3.

Figure 3: Result of Listing 3 Showing Four Rows

The UNION ALL Operator

The UNION ALL statement differs from the UNION in that the command simply combines the data from two or more tables without removing duplicates. This implies there is no need for sorting which happens to be an expensive task in the SQL world. This would also mean the result set is likely to be returned faster because the overhead of sorting is excluded.

It is advisable to use UNION ALL when there is no requirement to sort the results of such a combination of two or more tables or remove duplicates. By design, there may also be a certainty that there are no duplicates across the two tables so in such a case it would be unnecessary to use a UNION.

Listing 4 shows the UNION ALL statement. Notice it is the same as the command “UNION ALL” and the exclusion of the archivedTime column in the TranHistory table. This ensures the sets being concatenated have the same set of columns.

-- Listing 4: A UNION Statement
USE [AU]
GO

SELECT [responseId]
      ,[senderId]
      ,[msisdn]
      ,[message]
      ,[status]
      ,[application]
      ,[receivedTime]
      ,[processedTime]
      ,[flag]
      ,[requestDelivery]
      ,[delivered]
      ,[srcTon]
      ,[srcNpi]
      ,[destTon]
      ,[destNpi]
      ,[errorCode]
      ,[messageId]
      ,[sequenceNo]
      ,[retries]
      ,[messagePriority]
      ,[bulkId]
  FROM [dbo].[Tran]
  UNION ALL
  SELECT [responseId]
      ,[senderId]
      ,[msisdn]
      ,[message]
      ,[status]
      ,[application]
      ,[receivedTime]
      ,[processedTime]
      ,[flag]
      ,[requestDelivery]
      ,[delivered]
      ,[srcTon]
      ,[srcNpi]
      ,[destTon]
      ,[destNpi]
      ,[errorCode]
      ,[messageId]
      ,[sequenceNo]
      ,[retries]
      ,[messagePriority]
      ,[bulkId]
  FROM [dbo].[TranHistory]
GO

Figure 4 shows the result of the UNION ALL statement. We see six rows returned instead of four earlier returned by the UNION command.

Figure 4: Output of Listing 4

Comparing the Live Query Statistics, we see that the sorting is excluded.

Figure 5: Live Query Statistics for a UNION ALL Statement

Conclusion

The UNION and UNION ALL operators are used to combine multiple data sets to a single result set. There are a few conditions necessary for the results to be meaningful and these have been highlighted in the article. The UNION ALL command is preferred if sorting and uniqueness are not required since it performs better simply by excluding sorting.

SQL coding in SSMS is much faster with such a helpful add-in as SQL Complete.

It is worth mentioning that JOIN operators are another very useful and somewhat more sophisticated set of commands for combining data sets based on shared columns.

Last modified: October 31, 2022
Close