Inner join, outer join, cross join? What gives?
It’s a valid question. I once saw a Visual Basic code with T-SQL codes embedded in it. The VB code retrieves table records with multiple SELECT statements, one SELECT * per table. Then, it combines multiple result sets into a record set. Absurd?
To the young developers who did it, it was not. But when they asked me to evaluate why the system was slow, that issue was the first to catch my attention. That’s right. They never heard of SQL joins. In fairness to them, they were honest and open to suggestions.
How do you describe SQL joins? Perhaps, you remember one song – Imagine by John Lennon:
You may say I’m a dreamer, but I’m not the only one.
I hope someday you’ll join us, and the world will be as one.
In the song’s context, joining is uniting. In an SQL database, combining records of 2 or more tables into one result set forms a join.
This article is the start of a 3-part series talking about SQL joins:
- INNER JOIN
- OUTER JOIN, which includes LEFT, RIGHT, and FULL
- CROSS JOIN
But before we start discussing INNER JOIN, let’s describe joins in general.
More About SQL JOIN
Joins appear right after the FROM clause. In its simplest form, it looks like using the SQL-92 standard:
FROM <table source> [<alias1>]
<join type> JOIN <table source> [<alias2>] [ON <join condition>]
[<join type> JOIN <table source> [<alias3>] [ON <join condition>]
<join type> JOIN <table source> [<aliasN>] [ON <join condition>]]
[WHERE <condition>]
Let’s describe the everyday stuff surrounding the JOIN.
Table Sources
You can add up to 256 tables sources, according to Microsoft. Of course, it depends on your server resources. I have never joined more than 10 tables in my life, not to mention 256. Anyway, table sources can be any of the following:
- Table
- View
- Table or view synonym
- Table variable
- Table-valued function
- Derived table
Table Alias
An alias is optional, but it shortens your code and minimizes typing. It also helps you avoid errors when a column name exists in two or more tables used in a SELECT, UPDATE, INSERT, or DELETE. It also adds clarity to your code. It is optional, but I’d recommend using aliases. (Unless you love to type table sources by name.)
Join Condition
The ON keyword precedes the join condition that can be a single join or 2-key columns from the 2 tables joined. Or it can be a composite join using more than 2 key columns. It defines how tables are related.
However, we use the join condition only for INNER and OUTER joins. Using it on a CROSS JOIN will trigger an error.
Since join conditions define the relationships, they need operators.
The most common of join condition operators is equality (=) operator. Other operators like > or < work too, but they are rare.
SQL JOIN vs. Subqueries
Most joins can be rewritten as subqueries and vice-versa. Check out this article to learn more about subqueries compared to joins.
Joins and Derived Tables
Using derived tables in a join looks like this:
FROM table1 a
INNER JOIN (SELECT y.column3 from table2 x
INNER JOIN table3 y on x.column1 = y.column1) b ON a.col1 = b.col2
It is joining from the result of another SELECT statement, and it’s perfectly valid.
You will have more examples, but let’s deal with one last thing about SQL JOINS. It is how SQL Server’s Query Optimizer processes joins.
How SQL Server Processes Joins
To understand how the process works, you need to know about the two types of operations involved:
- Logical operations correspond to the joins types used in a query: INNER, OUTER, or CROSS. You, as the developer, define this part of the processing when forming the query.
- Physical operations – the Query Optimizer chooses the best physical operation applicable for your join. The best means the fastest to produce results. The Execution Plan of your query will show the physical join operators chosen. These operations are:
- Nested Loop Join. This operation is fast if one of the two tables is small and the second one is large and indexed. It requires the least I/O with the fewest comparisons, but it’s not good for large result sets.
- Merge Join. This is the fastest operation for large and sorted result sets by columns used in the join.
- Hash Join. The query optimizer uses it when the result set is too large for a Nested Loop, and inputs are unsorted for a Merge join. A hash is more efficient than sorting it first and applying a Merge join.
- Adaptive Join. Starting with SQL Server 2017, it enables the choice between a nested loop or hash. The join method is deferred until the first input is scanned. This operation dynamically switches to a better physical join without recompiling.
Why Do We Need to Bother with This?
One word: Performance.
One thing is to know how to form queries with joins to produce correct results. Another one is to make it run as fast as possible. You have to be extra concerned about this if you want a good reputation with your users.
So, what do you need to look out for in the Execution Plan for these logical operations?
- Suppose a Sort operator precedes the Merge Join. This sort operation is expensive for large tables (Figure 2). You can fix this by pre-sorting the input tables in the join.
- Suppose there are duplicates in the input tables of a Merge join. SQL Server will write the duplicates of the second table into a WorkTable in tempdb. Then, it will make the comparisons there. The STATISTICS IO will reveal any WorkTables involved.
- When huge data spills to tempdb in a Hash join, the STATISTICS IO will reveal a large logical read on WorkFiles or WorkTables. A warning will also appear in the Execution Plan (Figure 3). You can apply two things: pre-sort the input tables or lessen the joins, if possible. As a result, the Query Optimizer may choose another physical join.
Join Hints
Join hints are new in SQL Server 2019. When you use it in your joins, it tells the query optimizer to stop deciding what is best for the query. You’re the boss when it comes to the physical join to use.
Now stop, right there. The truth is, the query optimizer typically selects the best physical join for your query. If you don’t know what you’re doing, do not use join hints.
The possible hints you can specify are LOOP, MERGE, HASH, or REMOTE.
I haven’t used join hints, but here’s the syntax:
<join type> <join hint> JOIN <table source> [<alias>] ON <join condition>
All About INNER JOIN
INNER JOIN returns the rows with matching records in both tables, based on a condition. It is also the default join if you don’t specify the keyword INNER:
As you see, matching rows from Table1 and Table2 are returned using Key1 as the join condition. The Table1 record having Key1 = ‘C’ is excluded because there are no matching records in Table2.
Whenever I form a query, my first choice is INNER JOIN. OUTER JOIN comes when requirements dictate it only.
INNER JOIN Syntax
There are two INNER JOIN syntax supported in T-SQL: SQL-92 and SQL-89.
SQL-92 INNER JOIN
FROM <table source1> [<alias1>]
INNER JOIN <table source2> [<alias2>] ON <join condition1>
[INNER JOIN <table source3> [<alias3>] ON <join condition2>
INNER JOIN <table sourceN> [<aliasN>] ON <join conditionN>]
[WHERE <condition>]
SQL-89 INNER JOIN
FROM <table source1> [alias1], <table source2> [alias2] [, <table source3> [alias3], <table sourceN> [aliasN]]
WHERE (<join condition1>)
[AND (<join condition2>)
AND (<join condition3>)
AND (<join conditionN>)]
Which INNER JOIN Syntax is Better?
The first join syntax I learned was SQL-89. When SQL-92 finally arrived, I thought it’s too lengthy. I also thought as the output was the same, why bother typing more keywords? A graphical query designer had the generated code SQL-92, and I changed it back to SQL-89. But today, I prefer SQL-92 even if I have to type more. Here’s why:
- The intention of the type of join is clear. The next guy or gal who will maintain my code will know what is intended in the query.
- Forgetting the join condition in a SQL-92 syntax will trigger an error. Meanwhile, forgetting the join condition in SQL-89 will be treated as a CROSS JOIN. If I meant an INNER or OUTER join, it would be an unnoticeable logic bug until the users complain.
- New tools are more inclined to SQL-92. If I ever use a graphical query designer again, I don’t have to change it to SQL-89. I’m no longer stubborn, so my heart rate is back to normal. Cheers to me.
The reasons above are mine. You may have your reasons why you prefer SQL-92 or why you hate it. I wonder what those reasons are. Let me know in the Comments section below.
But we can’t end this article without examples and explanations.
10 INNER JOIN Examples
1. Joining 2 Tables
Here’s an example of 2 tables joined together using INNER JOIN in SQL-92 syntax.
-- Display Vests, Helmets, and Light products
USE AdventureWorks
GO
SELECT
p.ProductID
,P.Name AS [Product]
,ps.ProductSubcategoryID
,ps.Name AS [ProductSubCategory]
FROM Production.Product p
INNER JOIN Production.ProductSubcategory ps ON P.ProductSubcategoryID = ps.ProductSubcategoryID
WHERE P.ProductSubcategoryID IN (25, 31, 33); -- for vest, helmet, and light
-- product subcategories
You only specify the columns that you need. In the above example, 4 columns are specified. I know it’s too lengthy than SELECT * but keep this in mind: it’s best practice.
Notice also the use of table aliases. Both the Product and ProductSubcategory tables have a column named [Name]. If you don’t specify the alias, an error will be triggered.
Meanwhile, here’s the equivalent SQL-89 syntax:
-- Display Vests, Helmets, and Light products
USE AdventureWorks
GO
SELECT
p.ProductID
,P.Name AS [Product]
,ps.ProductSubcategoryID
,ps.Name AS [ProductSubCategory]
FROM Production.Product p, Production.ProductSubcategory ps
WHERE P.ProductSubcategoryID = ps.ProductSubcategoryID
AND P.ProductSubcategoryID IN (25, 31, 33);
They are the same except for the join condition mixed in the WHERE clause with an AND keyword. But under the hood, are they really the same? Let’s inspect the result set, the STATISTICS IO, and the Execution Plan.
See the result set of 9 records:
It’s not just the results, but the resources required by SQL Server is also the same.
See the logical reads:
Finally, the Execution Plan reveals the same query plan for both queries when their QueryPlanHashes are equal. Notice also the highlighted operations in the diagram:
Based on the findings, SQL Server query processing is the same, whether it’s SQL-92 or SQL-89. But as I said, the clarity in SQL-92 is much better for me.
Figure 7 also shows a Nested Loop Join used in the plan. Why? The result set is small.
2. Joining Multiple Tables
Check out the query below using 3 joined tables.
-- Get the total number of orders per Product Category
USE AdventureWorks
GO
SELECT
ps.Name AS ProductSubcategory
,SUM(sod.OrderQty) AS TotalOrders
FROM Production.Product p
INNER JOIN Sales.SalesOrderDetail sod ON P.ProductID = sod.ProductID
INNER JOIN Sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID
INNER JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID
WHERE soh.OrderDate BETWEEN '1/1/2014' AND '12/31/2014'
AND p.ProductSubcategoryID IN (1,2)
GROUP BY ps.Name
HAVING ps.Name IN ('Mountain Bikes', 'Road Bikes')
3. Composite Join
You can also join 2 tables using 2 keys to relate it. Check out the sample below. It uses 2 join conditions with an AND operator.
SELECT
a.column1
,b.column1
,b.column2
FROM Table1 a
INNER JOIN Table2 b ON a.column1 = b.column1 AND a.column2 = b.column2
4. INNER JOIN Using a Nested Loop Physical Join
In the example below, the Product table has 9 records – a small set. The joined table is SalesOrderDetail – a large set. The Query Optimizer will use a Nested Loop Join, as shown in Figure 8.
USE AdventureWorks
GO
SELECT
sod.SalesOrderDetailID
,sod.ProductID
,P.Name
,P.ProductNumber
,sod.OrderQty
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p ON sod.ProductID = p.ProductID
WHERE P.ProductSubcategoryID IN(25, 31, 33);
5. INNER JOIN Using a Merge Physical Join
The example below uses a Merge Join because both input tables are sorted by SalesOrderID.
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.SalesOrderDetailID
,sod.ProductID
,sod.LineTotal
FROM Sales.SalesOrderHeader soh
INNER JOIN sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
6. INNER JOIN Using a Hash Physical Join
The following example will use a Hash Join:
SELECT
s.Name AS Store
,SUM(soh.TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.Store s ON soh.SalesPersonID = s.SalesPersonID
GROUP BY s.Name
7. INNER JOIN Using Adaptive Physical Join
In the example below, the SalesPerson table has a Non-clustered ColumnStore Index on the TerritoryID column. The Query Optimizer decided on a Nested Loop Join, as shown in Figure 11.
SELECT
sp.BusinessEntityID
,sp.SalesQuota
,st.Name AS Territory
FROM Sales.SalesPerson sp
INNER JOIN Sales.SalesTerritory st ON sp.TerritoryID = st.TerritoryID
WHERE sp.TerritoryID BETWEEN 1 AND 5
8. Two Ways to Rewrite a Subquery to an INNER JOIN
Consider this statement with a nested subquery:
SELECT [SalesOrderID], [OrderDate], [ShipDate], [CustomerID]
FROM Sales.SalesOrderHeader
WHERE [CustomerID] IN (SELECT [CustomerID] FROM Sales.Customer
WHERE PersonID IN (SELECT BusinessEntityID FROM Person.Person
WHERE lastname LIKE N'I%' AND PersonType='SC'))
The same results can come out if you change it to an INNER JOIN, as below:
SELECT o.[SalesOrderID], o.[OrderDate], o.[ShipDate], o.[CustomerID]
FROM Sales.SalesOrderHeader o
INNER JOIN Sales.Customer c on o.CustomerID = c.CustomerID
INNER JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
WHERE p.PersonType = 'SC'
AND p.lastname LIKE N'I%'
Another way to rewrite it is by using a derived table as a table source for the INNER JOIN:
SELECT o.[SalesOrderID], o.[OrderDate], o.[ShipDate], o.[CustomerID]
FROM Sales.SalesOrderHeader o
INNER JOIN (SELECT c.CustomerID, P.PersonType, P.LastName
FROM Sales.Customer c
INNER JOIN Person.Person p ON c.PersonID = P.BusinessEntityID
WHERE p.PersonType = 'SC'
AND p.LastName LIKE N'I%') AS q ON o.CustomerID = q.CustomerID
All 3 queries output the same 48 records.
9. Using Join Hints
The following query uses a Nested Loop:
SELECT
sod.SalesOrderDetailID
,sod.ProductID
,P.Name
,P.ProductNumber
,sod.OrderQty
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p ON sod.ProductID = p.ProductID
WHERE P.ProductSubcategoryID IN(25, 31, 33);
If you want to force it to a Hash join, this is what happens:
SELECT
sod.SalesOrderDetailID
,sod.ProductID
,P.Name
,P.ProductNumber
,sod.OrderQty
FROM Sales.SalesOrderDetail sod
INNER HASH JOIN Production.Product p ON sod.ProductID = p.ProductID
WHERE P.ProductSubcategoryID IN(25, 31, 33);
However, note that STATISTICS IO shows that performance will turn bad when you force it to a Hash join.
Meanwhile, the query below uses a Merge Join:
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.SalesOrderDetailID
,sod.ProductID
,sod.LineTotal
FROM Sales.SalesOrderHeader soh
INNER JOIN sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
Here’s what it becomes when you force it to a Nested Loop:
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.SalesOrderDetailID
,sod.ProductID
,sod.LineTotal
FROM Sales.SalesOrderHeader soh
INNER LOOP JOIN sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
Upon checking the STATISTICS IO of both, forcing it to a Nested Loop require more resources to process the query:
Thus, using join hints should be your last resort when tweaking for performance. Let your SQL Server handle it for you.
10. Using INNER JOIN in UPDATE
You can also use INNER JOIN in an UPDATE statement. Here’s an example:
UPDATE Sales.SalesOrderHeader
SET ShipDate = getdate()
FROM Sales.SalesOrderHeader o
INNER JOIN Sales.Customer c on o.CustomerID = c.CustomerID
INNER JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
WHERE p.PersonType = 'SC'
Since it’s possible to use a join in an UPDATE, why not try it out using DELETE and INSERT?
SQL Join and INNER JOIN Takeaways
So, what’s the big deal about SQL join?
- An SQL JOIN combines records of 2 or more tables to form one result set.
- There are types of joins in SQL: INNER, OUTER, and CROSS.
- As a developer or administrator, you decide which logical operations or join types to use for your requirements.
- On the other hand, the Query Optimizer decides the best physical join operators to use. It can be Nested Loop, Merge, Hash, or Adaptive.
- You can use join hints to force what physical join to use, but it should be your last resort. In most cases, it’s better to let your SQL Server handle it.
- Knowing the physical join operators also helps you in tuning query performance.
- Also, subqueries can be rewritten using joins.
Meanwhile, this post showed 10 examples of INNER JOINs. It’s not just sample codes. Some of them also include an inspection of how the code works from the inside out. It’s not just to help you code but to help you be mindful of performance. At the end of the day, results should not just be correct but also delivered fast.
We’re not done yet. The next article will deal with OUTER JOINS. Stay tuned.
See also
SQL Joins let you fetch and combine data from more than one table. Watch this video to learn more about SQL Joins.
Tags: inner join, sql join, t-sql Last modified: September 18, 2021