Written by 13:41 Database development, Tables, Work with data

Your Ultimate Guide to SQL Joins: OUTER JOIN – Part 2

Outer join is at the center stage today. And this is part 2 of your ultimate guide to SQL joins. If you missed part 1, here’s the link.

By the looks of it, outer is the opposite of inner. However, if you consider the outer join this way, you’ll be confused. To top that, you don’t have to include the word outer in your syntax explicitly. It’s optional!

But before we dive in, let’s discuss nulls concerning outer joins.

Nulls and OUTER JOIN

When you join 2 tables, one of the values from either table can be null. For INNER JOINs, records with nulls won’t match, and they will be discarded and won’t appear in the result set. If you want to get the records that don’t match, your only option is OUTER JOIN.

Going back to antonyms, isn’t that the opposite of INNER JOINs? Not entirely, as you will see in the next section.

CodingSight - Your Ultimate Guide to SQL Joins: OUTER JOIN – Part 2

All About SQL Server OUTER JOIN

Understanding outer joins starts with the output. Here’s a complete list of what you can expect:

  • All records that match the join condition or predicate. That’s the expression right after the ON keyword, much like the INNER JOIN output. We refer to this issue as the inner row.
  • Non-NULL values from the left table with the null counterparts from the right table. We refer to this issue as outer rows.
  • Non-NULL values from the right table with the null counterparts from the left table. This is another form of outer rows.
  • Finally, it could be a combination of all things described above.

With that list, we can say that OUTER JOIN returns inner and outer rows.

  • Inner – because the exact results of the INNER JOIN can be returned.
  • Outer – because the outer rows can also be returned.

It is the difference from INNER JOIN.

INNER JOINS RETURN INNER ROWS ONLY. OUTER JOINS CAN RETURN BOTH INNER AND OUTER ROWS

Notice that I used “can be” and “can also be.” It depends on your WHERE clause (or if you ever include a WHERE clause) if it returns both inner and/or outer rows.

But from a SELECT statement, how can you determine which is the left or right table? Good question!

How to Know Which is Left or Right Table in a Join?

We can answer this question with examples:

SELECT *
FROM Table1 a
LEFT OUTER JOIN Table2 b on a.column1 = b.column1

From the example above, Table1 is the left table, and Table2 is the right table. Now, let’s have another example. This time, it is a simple multi-join.

SELECT *
FROM Table1 a
LEFT OUTER JOIN Table2 b on a.column1 = b.column1
LEFT OUTER JOIN Table3 c on b.column2 = c.column1

In this case, to know what’s left or right, remember that a join works on 2 tables.

Table1 is still the left table, and Table2 is the right table. This refers to joining 2 tables: Table1 and Table2. What about joining Table2 and Table3Table2 becomes the left table, and Table3 is the right table.

If we add a fourth table, Table3 becomes the left table, and Table4 is the right table. But it doesn’t end there. We can join another table to the Table1. Here’s an example:

SELECT *
FROM Table1 a
LEFT OUTER JOIN Table2 b on a.column1 = b.column1
LEFT OUTER JOIN Table3 c on b.column2 = c.column1
LEFT OUTER JOIN Table4 d on c.column1 = d.column2
LEFT OUTER JOIN Table5 e on a.column2 = e.column1

Table1 is the left table, and Table5 is the right table. You can also do the same with the other tables.

Okay, let’s go back to the list of expected outputs above. We can also derive the outer joins types from these.

Types of Outer Joins

There are 3 types based on the OUTER JOIN outputs.

LEFT OUTER JOIN (LEFT JOIN)

LEFT JOIN returns inner rows + Non-NULL values from the left table with the right table’s null counterparts. Hence, it is LEFT JOIN because the left table is the dominant of the two tables within the join having non-null values.

LEFT OUTER JOIN EXAMPLE 1
-- Return all customerIDs with orders and no orders

USE AdventureWorks
GO

SELECT
 c.CustomerID
,soh.OrderDate
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader soh ON c.CustomerID = soh.CustomerID 

In the example above, the Customer is the left table, and SalesOrderHeader is the right table. The result of the query is 32,166 records – it includes both inner and outer rows. You can see a portion of it in Figure 1:

Result set from a LEFT OUTER JOIN that shows both inner and outer rows. Outer rows are those with null values.
Figure 1. Result set from a LEFT OUTER JOIN that shows both inner and outer rows. Outer rows are those with null values.

Suppose we want to return the outer rows only or the customers with no orders. To do that, add a WHERE clause to include only rows with nulls from SalesOrderHeader.

SELECT
 c.CustomerID
,soh.OrderDate
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader soh ON c.CustomerID = soh.CustomerID
WHERE soh.SalesOrderID IS NULL

The result set I got is 701 records. All of them like the null OrderDate from Figure 1.

If I get the inner rows only, the result will be 31,465 records. I can do it by changing the WHERE clause to include those SalesOrderIDs that are not null. Or I can change the join to an INNER JOIN and remove the WHERE clause.

To see if it checks out from the output of the first example without the WHERE clause, let’s sum up the records.

Inner RowsOuter RowsTotal Rows
31,465 records701 records32,166 records

From the Total Rows above with 32,166 records, you can see that it checks out with the first example results. This also shows how LEFT OUTER JOIN works.

LEFT OUTER JOIN EXAMPLE 2

This time, the example is a multi-join. Notice also that we do away with the OUTER keyword.

-- show the people with and without addresses from AdventureWorks
USE AdventureWorks
GO

SELECT
 P.FirstName
,P.MiddleName
,P.LastName
,a.AddressLine1
,a.AddressLine2
,a.City
,adt.Name AS AddressType
FROM Person.Person p
LEFT JOIN Person.BusinessEntityAddress bea ON P.BusinessEntityID = bea.BusinessEntityID
LEFT JOIN Person.Address a ON bea.AddressID = a.AddressID
LEFT JOIN person.AddressType adt ON bea.AddressTypeID = adt.AddressTypeID 

It generated 19,996 records. You can check out the portion of the output in Figure 2 below. The records with null AddressLine1 are outer rows. Above it is inner rows.

Result set showing people with and without addresses.
Figure 2. Result set showing people with and without addresses.

RIGHT OUTER JOIN (RIGHT JOIN)

RIGHT JOIN returns inner rows + Non-NULL values from the right table with the left table’s null counterparts.

RIGHT OUTER JOIN EXAMPLE 1
-- From the product reviews, return the products without product reviews
USE AdventureWorks
GO

SELECT
P.Name
FROM Production.ProductReview pr
RIGHT OUTER JOIN Production.Product p ON pr.ProductID = p.ProductID
WHERE pr.ProductReviewID IS NULL 

Figure 3 shows 10 of 501 records in the result set.

RIGHT JOIN result set showing products without product reviews.
Figure 3. RIGHT JOIN result set showing products without product reviews.

In the above example, ProductReview is the left table, and the Product is the right table. Since this is a RIGHT OUTER JOIN, we intend to include the Non-NULL values from the right table.

However, choosing between LEFT JOIN or RIGHT JOIN depends on you. Why? Because you can express the query, whether LEFT or RIGHT JOIN, and get the same results. Let’s try it with a LEFT JOIN.

-- return the products without product reviews using LEFT OUTER JOIN
USE AdventureWorks
GO

SELECT
P.Name
FROM Production.Product p
LEFT OUTER JOIN Production.ProductReview pr ON pr.ProductID = p.ProductID
WHERE pr.ProductReviewID IS NULL

Try to execute the above, and you will have the same result as in Figure 3. But do you think the Query Optimizer will treat them differently? Let’s find out in the Execution Plan of both in Figure 4.

Execution plan of 2 queries having the same result set. But the first use RIGHT JOIN and the other LEFT JOIN.
Figure 4. Execution plan of 2 queries having the same result set. But the first use RIGHT JOIN and the other LEFT JOIN.

If you’re new to this, there are a few surprises in the Execution Plan.

  1. The diagrams look the same, and they are: try a Compare Showplan, and you will see the same QueryPlanHash.
  2. Notice the top diagram with a Merge join. We used a RIGHT OUTER JOIN, but SQL Server changed it to LEFT OUTER JOIN. It also switched the left and right tables. It makes it equal to the second query with the LEFT JOIN.

As you see now, the results are the same. So, choose which of the OUTER JOINs will be more convenient.

Why Did SQL Server Change the RIGHT JOIN to LEFT JOIN?

The database engine doesn’t have to follow the way you express the logical joins. As long as it can produce correct results in the fastest way it thinks possible, it will make changes. Even shortcuts.

Do not conclude that RGHT JOIN is bad and LEFT JOIN is good.

RIGHT OUTER JOIN EXAMPLE 2

Have a look at the below example:

-- Get the unassigned addresses and the address types with no addresses
SELECT
 P.FirstName
,P.MiddleName
,P.LastName
,a.AddressLine1
,a.AddressLine2
,a.City
,adt.Name AS AddressType
FROM Person.Person p
RIGHT JOIN Person.BusinessEntityAddress bea ON P.BusinessEntityID = bea.BusinessEntityID
RIGHT JOIN Person.Address a ON bea.AddressID = a.AddressID
RIGHT JOIN person.AddressType adt ON bea.AddressTypeID = adt.AddressTypeID
WHERE P.BusinessEntityID IS NULL 

There are 2 things that you can get from this query, as you can see in Figure 5 below:

Part of the RIGHT JOIN query result set showing unassigned addresses and address types with no addresses.
Figure 5. Part of the RIGHT JOIN query result set showing unassigned addresses and address types with no addresses.

The query results show the following:

  1. The unassigned addresses – these records are those with null names.
  2. Address Types with no addresses. Archive, Billing, and Primary address types have no corresponding addresses. Those are from records 817 to 819.

FULL OUTER JOIN (FULL JOIN)

FULL JOIN returns a combination of inner rows and outer rows, left and right.

-- Get people with and without addresses, unassigned addresses, and address types without addresses
SELECT
 P.FirstName
,P.MiddleName
,P.LastName
,a.AddressLine1
,a.AddressLine2
,a.City
,adt.Name AS AddressType
FROM Person.Person p
FULL JOIN Person.BusinessEntityAddress bea ON P.BusinessEntityID = bea.BusinessEntityID
FULL JOIN Person.Address a ON bea.AddressID = a.AddressID
FULL JOIN person.AddressType adt ON bea.AddressTypeID = adt.AddressTypeID

The result set includes 20,815 records. Like what you’d expect, it is a total number of records from the result set of INNER JOIN, LEFT JOIN, and RIGHT JOIN.

LEFT and RIGHT JOIN include a WHERE clause to show only the results with nulls in either left or right tables.

INNER JOINLEFT JOIN
(WHERE a.AddressID IS NULL)
RIGHT JOIN
(WHERE P.BusinessEntityID IS NULL)
TOTAL (Same as FULL JOIN)
18,798 records1,198 records819 records20,815 records

Note that the FULL JOIN can produce a huge result set from large tables. So, use it only when you need it only.

Practical Uses of OUTER JOIN

If you still hesitate when you can and should use OUTER JOIN, here are some ideas.

Outer Joins that Output Both Inner and Outer Rows

Examples can be:

  • Alphabetical list of paid and unpaid customer orders.
  • Alphabetical list of employees with tardiness or no tardiness record.
  • A list of policyholders who renewed and not renewed their most recent insurance policies.

Outer Joins That Output Outer Rows Only

Examples include:

  • alphabetical list of employees with no tardiness record for the zero-tardiness award
  • list of territories with no customers
  • list of sales agents with no sales of a particular product
  • getting results from missing values, like dates without sales orders in a given period (example below)
  • nodes with no child in a parent-child relationship (example below)

Getting Results from Missing Values

Suppose you need to produce a report. That report must show the number of days for each month in a given period where there weren’t any orders. The SalesOrderHeader in AdventureWorks contains the OrderDates,  but they don’t have dates with no orders. What can you do?

1. Create a Table of All Dates in a Period

A sample script below will create a table of dates for the whole of 2014:

DECLARE @StartDate date = '20140101', @EndDate date = '20141231';

CREATE TABLE dbo.Dates
(
	d DATE NOT null PRIMARY KEY
)

WHILE @StartDate <= @EndDate
BEGIN
  INSERT Dates([d]) SELECT @StartDate;
  SET @StartDate = DATEADD(DAY, 1, @StartDate);
END

SELECT d FROM Dates ORDER BY [d];
2. Use LEFT JOIN to Output the Days with No Orders
SELECT
 MONTH(d.d) AS [month]
,YEAR(d.d) AS [year]
,COUNT(*) AS NoOrderDays
FROM Dates d
LEFT JOIN Sales.SalesOrderHeader soh ON d.d = soh.OrderDate
WHERE soh.OrderDate IS NULL
GROUP BY YEAR(d.d), MONTH(d.d)
ORDER BY [year], [month]

The code above counts the number of days when no orders have been made. SalesOrderHeader contains the dates with orders. So, nulls returned in the join will count as days without orders.

Meanwhile, if you want to know the exact dates, you can remove the count and grouping.

SELECT
 d.d
,soh.OrderDate
FROM Dates d
LEFT JOIN Sales.SalesOrderHeader soh ON d.d = soh.OrderDate
WHERE soh.OrderDate IS NULL

Or, if you want to count orders in a given period and see which date has zero orders, here’s how:

SELECT DISTINCT
 D.d AS SalesDate
,COUNT(soh.OrderDate) AS NoOfOrders
FROM Dates d
LEFT JOIN Sales.SalesOrderHeader soh ON d.d = soh.OrderDate
WHERE d.d BETWEEN '02/01/2014' AND '02/28/2014'
GROUP BY d.d
ORDER BY d.d

The above code counts orders for February 2014. See the result:

Result of counting the orders per day within February 2014. Feb. 2 has no orders.
Figure 6. Result of counting the orders per day within February 2014. Feb. 2 has no orders.

Why does it highlight February 3, 2014? In my copy of AdventureWorks, there are no sales orders for that date.

Now, notice COUNT(soh.OrderDate) in the code. Later, we’ll clarify why this is so important.

Getting Childless Nodes in Parent-Child Relationships

Sometimes we need to know the nodes with no child in a parent-child relationship.

Let’s use the database I have used in my article about HierarchyID. You need to get nodes with no children in a parent-child relationship table using a self-join.

SELECT 
 r1.RankParentId
,r1.Rank AS RankParent
,r.RankId
FROM Ranks r
RIGHT JOIN Ranks r1 ON r.RankParentId = r1.RankId
WHERE r.RankId is NULL 

Caveats in Using OUTER JOIN

Since an OUTER JOIN can return inner rows like an INNER JOIN, it can confuse. Performance problems can also creep in. So, note the 3 points below (I come back to them from time to time – I’m not getting younger, so I forget too).

Filtering the Right Table in a LEFT JOIN with a Non-Null Value in the WHERE Clause

It can be a problem if you used a LEFT OUTER JOIN but filtered the right table with a non-null value in the WHERE clause. The reason is, it will become functionally equivalent to an INNER JOIN. Consider the example below:

USE AdventureWorks
GO

SELECT
 P.FirstName
,P.MiddleName
,P.LastName
,a.AddressLine1
,a.AddressLine2
,a.City
,adt.Name AS AddressType
FROM Person.Person p
LEFT JOIN Person.BusinessEntityAddress bea ON P.BusinessEntityID = bea.BusinessEntityID
LEFT JOIN Person.Address a ON bea.AddressID = a.AddressID
LEFT JOIN person.AddressType adt ON bea.AddressTypeID = adt.AddressTypeID
WHERE bea.AddressTypeID = 5 

From the code above, let’s examine the 2 tables: Person and BusinessEntityAddressThe Person is the left table, and BusinessEntityAddress is the right table.

LEFT JOIN is used, so it assumes a null BusinessEntityID somewhere in BusinessEntityAddress. Here, notice the WHERE clause. It filters the right table with AddressTypeID = 5. It completely discards all outer rows in BusinessEntityAddress.

This can either be any of these:

  • The developer is testing something in the result but forgot to remove it.
  • INNER JOIN was intended, but for some reason, LEFT JOIN was used.
  • The developer doesn’t understand the difference between LEFT JOIN and INNER JOIN. He assumes any of the 2 will work, and it doesn’t matter because the results are the same in this case.

Any of the 3 above is bad, but the third entry has another implication. Let’s compare the code above with the INNER JOIN equivalent:

SELECT
 P.FirstName
,P.MiddleName
,P.LastName
,a.AddressLine1
,a.AddressLine2
,a.City
,adt.Name AS AddressType
FROM Person.Person p
INNER JOIN Person.BusinessEntityAddress bea ON P.BusinessEntityID = bea.BusinessEntityID
INNER JOIN Person.Address a ON bea.AddressID = a.AddressID
INNER JOIN person.AddressType adt ON bea.AddressTypeID = adt.AddressTypeID
WHERE bea.AddressTypeID = 5

It looks similar to the previous code except for the type of join. The result is also the same, but you should note the logical reads in STATISTICS IO:

Higher logical reads result when an OUTER JOIN is used but functionally similar to INNER JOIN.
Figure 7. Higher logical reads result when an OUTER JOIN is used but functionally similar to INNER JOIN.

In Figure 7, the first I/O stats are from the use of INNER JOIN. A total of logical reads is 177. However, the second stats are for the LEFT JOIN with a higher logical reads value of 223. Thus, the wrong usage of LEFT JOIN in this example will require more pages or resources from SQL Server. Therefore, it will run slower.

Takeaway

If you intend to output inner rows, use INNER JOIN. Otherwise, do not filter the right table in a LEFT JOIN with a non-null value. If this happens, you end up with a slower query than if you use INNER JOIN.

BONUS TIP: This situation also happens in a RIGHT JOIN when the left table is filtered with a non-null value.

Improperly Using Join Types in a Multi-Join

Suppose we want to get all the vendors and the number of product purchase orders for each. Here’s the code:

USE AdventureWorks
GO

SELECT
 v.BusinessEntityID
,v.Name AS Vendor
,pod.ProductID
,pod.OrderQty
FROM Purchasing.Vendor v
LEFT JOIN Purchasing.PurchaseOrderHeader poh ON v.BusinessEntityID = poh.VendorID
LEFT JOIN Purchasing.PurchaseOrderDetail pod ON poh.PurchaseOrderID = pod.PurchaseOrderID 

The above code returns both the vendors with purchase orders and those without. Figure 8 shows the Actual Execution Plan of the above code.

Execution Plan of the Vendors and Purchase Order Tables showing LEFT OUTER JOIN.
Figure 8. Execution Plan of the Vendors and Purchase Order Tables showing LEFT OUTER JOIN.

Thinking that every purchase order has a guaranteed purchase order detail, an INNER JOIN would be better. However, is it really so?

First, let’s have the modified code with the INNER JOIN.

USE AdventureWorks
GO

SELECT
 v.BusinessEntityID
,v.Name AS Vendor
,pod.ProductID
,pod.OrderQty
FROM Purchasing.Vendor v
LEFT JOIN Purchasing.PurchaseOrderHeader poh ON v.BusinessEntityID = poh.VendorID
INNER JOIN Purchasing.PurchaseOrderDetail pod ON poh.PurchaseOrderID = pod.PurchaseOrderID 

 Remember, the requirement above says “all” vendors. Since we used the LEFT JOIN in the previous code, we’ll get vendors without purchase orders returned. That’s because of the null PurchaseOrderID.

Changing the join to an INNER JOIN will discard all the null PurchaseOrderIDs. It will also cancel all the null VendorIDs from the Vendor table. In effect, it becomes an INNER JOIN.

Is that a correct assumption? The Execution Plan will reveal the answer:

Execution Plan showing INNER JOIN in all joined tables after changing the join of the third table from LEFT JOIN.
Figure 9. Execution Plan showing INNER JOIN in all joined tables after changing the join of the third table from LEFT JOIN.

As you see, all tables were processed using INNER JOIN. Therefore, our assumption is correct. But for the worst part, the result set is now incorrect because the vendors with no orders were not included.

Takeaway

Like in the previous case, if you intend an INNER JOIN, use it. But you know what to do if you encounter a situation like the one here.

In this case, an INNER JOIN will discard all outer rows up to the top table in the relationship. Even if your other join is a LEFT JOIN, it won’t matter. We have proven that in the Execution Plans.

Incorrect Use of COUNT() in Outer Joins

Remember our sample code that counts the number of orders per date and the result in Figure 6?

Here, we’ll clarify why 02/03/2014 is highlighted and its relation to COUNT(soh.OrderDate).

If you try using COUNT(*), the number of orders for that date becomes 1, which is wrong. There are no orders on that date. So, when using COUNT() with an OUTER JOIN, use the correct column to count.

In our case, soh.OrderDate can be null or not. When it’s not null, COUNT() will include the row in the count. COUNT(*) will make it count everything, including the nulls. And in the end, wrong results.

The OUTER JOIN Takeaways

Let’s summarize the points:

  • OUTER JOIN can return both inner rows and outer rows. Inner rows are the result that’s similar to INNER JOIN’s result. Outer rows are the non-null values with their null counterparts based on the join condition.
  • OUTER JOIN can be LEFT, RIGHT, or FULL. We had examples for each.
  • The outer rows returned by OUTER JOIN can be used in a variety of practical ways. We had ideas on when you can use this stuff.
  • We also had caveats in using OUTER JOIN. Be mindful of the 3 points above to avoid bugs and performance problems.

The final part of this series will discuss CROSS JOIN. So, until then. And if you like this post, share some love by clicking the social media buttons. Happy coding!

Tags: , , , Last modified: September 18, 2021
Close