Written by 17:30 Database administration, Work with data

Your Ultimate Guide to SQL Join: INNER JOIN – Part 1

Your Ultimate Guide to SQL Join: INNER JOIN – Part 1

Inner join, outer join, cross join? What gives?

It’s a valid question. I once saw a Visual Basic code with T-SQL codes embedded in it. The VB code retrieves table records with multiple SELECT statements, one SELECT * per table. Then, it combines multiple result sets into a record set. Absurd?

To the young developers who did it, it was not. But when they asked me to evaluate why the system was slow, that issue was the first to catch my attention. That’s right. They never heard of SQL joins. In fairness to them, they were honest and open to suggestions.

How do you describe SQL joins? Perhaps, you remember one song – Imagine by John Lennon:

You may say I’m a dreamer, but I’m not the only one.

I hope someday you’ll join us, and the world will be as one.

In the song’s context, joining is uniting. In an SQL database, combining records of 2 or more tables into one result set forms a join.

This article is the start of a 3-part series talking about SQL joins:

  • INNER JOIN
  • OUTER JOIN, which includes LEFT, RIGHT, and FULL
  • CROSS JOIN

But before we start discussing INNER JOIN, let’s describe joins in general.

CodingSight - Your Ultimate Guide to SQL Join: INNER JOIN – Part 1

More About SQL JOIN

Joins appear right after the FROM clause. In its simplest form, it looks like using the SQL-92 standard:

FROM <table source> [<alias1>]
<join type> JOIN <table source> [<alias2>] [ON <join condition>] 
[<join type> JOIN <table source> [<alias3>] [ON <join condition>]
<join type> JOIN <table source> [<aliasN>] [ON <join condition>]]
[WHERE <condition>]

Let’s describe the everyday stuff surrounding the JOIN.

Table Sources

You can add up to 256 tables sources, according to Microsoft. Of course, it depends on your server resources. I have never joined more than 10 tables in my life, not to mention 256. Anyway, table sources can be any of the following:

Table Alias

An alias is optional, but it shortens your code and minimizes typing. It also helps you avoid errors when a column name exists in two or more tables used in a SELECT, UPDATE, INSERT, or DELETE. It also adds clarity to your code. It is optional, but I’d recommend using aliases. (Unless you love to type table sources by name.)

Join Condition

The ON keyword precedes the join condition that can be a single join or 2-key columns from the 2 tables joined. Or it can be a composite join using more than 2 key columns. It defines how tables are related.

However, we use the join condition only for INNER and OUTER joins. Using it on a CROSS JOIN will trigger an error.

Since join conditions define the relationships, they need operators.

The most common of join condition operators is equality (=) operator. Other operators like > or < work too, but they are rare.

SQL JOIN vs. Subqueries

Most joins can be rewritten as subqueries and vice-versa. Check out this article to learn more about subqueries compared to joins.

Joins and Derived Tables

Using derived tables in a join looks like this:

FROM table1 a
INNER JOIN (SELECT y.column3 from table2 x
            INNER JOIN table3 y on x.column1 = y.column1) b ON a.col1 = b.col2

It is joining from the result of another SELECT statement, and it’s perfectly valid.

You will have more examples, but let’s deal with one last thing about SQL JOINS. It is how SQL Server’s Query Optimizer processes joins.

How SQL Server Processes Joins

To understand how the process works, you need to know about the two types of operations involved:

  • Logical operations correspond to the joins types used in a query: INNER, OUTER, or CROSS. You, as the developer, define this part of the processing when forming the query.
  • Physical operations – the Query Optimizer chooses the best physical operation applicable for your join. The best means the fastest to produce results. The Execution Plan of your query will show the physical join operators chosen. These operations are:
    • Nested Loop Join. This operation is fast if one of the two tables is small and the second one is large and indexed. It requires the least I/O with the fewest comparisons, but it’s not good for large result sets.
    • Merge Join. This is the fastest operation for large and sorted result sets by columns used in the join.
    • Hash Join. The query optimizer uses it when the result set is too large for a Nested Loop, and inputs are unsorted for a Merge join. A hash is more efficient than sorting it first and applying a Merge join.
    • Adaptive Join. Starting with SQL Server 2017, it enables the choice between a nested loop or hash. The join method is deferred until the first input is scanned. This operation dynamically switches to a better physical join without recompiling.
The 4 different physical join operators used in Execution Plans.
Figure 1: The 4 different physical join operators used in Execution Plans.

Why Do We Need to Bother with This?

One word: Performance.

One thing is to know how to form queries with joins to produce correct results. Another one is to make it run as fast as possible. You have to be extra concerned about this if you want a good reputation with your users.

So, what do you need to look out for in the Execution Plan for these logical operations?

  • Suppose a Sort operator precedes the Merge Join. This sort operation is expensive for large tables (Figure 2). You can fix this by pre-sorting the input tables in the join.
  • Suppose there are duplicates in the input tables of a Merge join. SQL Server will write the duplicates of the second table into a WorkTable in tempdb. Then, it will make the comparisons there. The STATISTICS IO will reveal any WorkTables involved.
  • When huge data spills to tempdb in a Hash join, the STATISTICS IO will reveal a large logical read on WorkFiles or WorkTables. A warning will also appear in the Execution Plan (Figure 3). You can apply two things: pre-sort the input tables or lessen the joins, if possible. As a result, the Query Optimizer may choose another physical join.
A sample execution plan having a Merge Join preceded by a Sort operator.
Figure 2: A sample execution plan having a Merge Join preceded by a Sort operator.
STATISTICS IO showing a WorkFile with logical reads and Execution Plan with Hash Join. A warning shows that data spilled in Tempdb.
Figure 3: STATISTICS IO showing a WorkFile with logical reads and Execution Plan with Hash Join. A warning shows that data spilled in Tempdb.

Join Hints

Join hints are new in SQL Server 2019. When you use it in your joins, it tells the query optimizer to stop deciding what is best for the query. You’re the boss when it comes to the physical join to use.

Now stop, right there. The truth is, the query optimizer typically selects the best physical join for your query. If you don’t know what you’re doing, do not use join hints.

The possible hints you can specify are LOOP, MERGE, HASH, or REMOTE.

I haven’t used join hints, but here’s the syntax:


<join type> <join hint> JOIN <table source> [<alias>] ON <join condition>

All About INNER JOIN

INNER JOIN returns the rows with matching records in both tables, based on a condition. It is also the default join if you don’t specify the keyword INNER:

A simple illustration of an INNER JOIN with 2 tables.
Figure 4. A simple illustration of an INNER JOIN with 2 tables.

As you see, matching rows from Table1 and Table2 are returned using Key1 as the join condition. The Table1 record having Key1 = ‘C’ is excluded because there are no matching records in Table2.

Whenever I form a query, my first choice is INNER JOIN. OUTER JOIN comes when requirements dictate it only.

INNER JOIN Syntax

There are two INNER JOIN syntax supported in T-SQL: SQL-92 and SQL-89.

SQL-92 INNER JOIN

FROM <table source1> [<alias1>]
INNER JOIN <table source2> [<alias2>] ON <join condition1>
[INNER JOIN <table source3> [<alias3>] ON <join condition2>
 INNER JOIN <table sourceN> [<aliasN>] ON <join conditionN>]
[WHERE <condition>]

SQL-89 INNER JOIN

FROM <table source1> [alias1], <table source2> [alias2] [, <table source3> [alias3], <table sourceN> [aliasN]]
WHERE (<join condition1>)
[AND (<join condition2>)
AND (<join condition3>)
AND (<join conditionN>)]

Which INNER JOIN Syntax is Better?

The first join syntax I learned was SQL-89. When SQL-92 finally arrived, I thought it’s too lengthy. I also thought as the output was the same, why bother typing more keywords? A graphical query designer had the generated code SQL-92, and I changed it back to SQL-89. But today, I prefer SQL-92 even if I have to type more. Here’s why:

  • The intention of the type of join is clear. The next guy or gal who will maintain my code will know what is intended in the query.
  • Forgetting the join condition in a SQL-92 syntax will trigger an error. Meanwhile, forgetting the join condition in SQL-89 will be treated as a CROSS JOIN. If I meant an INNER or OUTER join, it would be an unnoticeable logic bug until the users complain.
  • New tools are more inclined to SQL-92. If I ever use a graphical query designer again, I don’t have to change it to SQL-89. I’m no longer stubborn, so my heart rate is back to normal. Cheers to me.

The reasons above are mine. You may have your reasons why you prefer SQL-92 or why you hate it. I wonder what those reasons are. Let me know in the Comments section below.

But we can’t end this article without examples and explanations.

10 INNER JOIN Examples

1. Joining 2 Tables

Here’s an example of 2 tables joined together using INNER JOIN in SQL-92 syntax.

-- Display Vests, Helmets, and Light products

USE AdventureWorks
GO

SELECT
 p.ProductID
,P.Name AS [Product]
,ps.ProductSubcategoryID
,ps.Name AS [ProductSubCategory]
FROM Production.Product p
INNER JOIN Production.ProductSubcategory ps ON P.ProductSubcategoryID = ps.ProductSubcategoryID
WHERE P.ProductSubcategoryID IN (25, 31, 33);  -- for vest, helmet, and light 
                                            -- product subcategories

You only specify the columns that you need. In the above example, 4 columns are specified. I know it’s too lengthy than SELECT * but keep this in mind: it’s best practice.

Notice also the use of table aliases. Both the Product and ProductSubcategory tables have a column named [Name]. If you don’t specify the alias, an error will be triggered.

Meanwhile, here’s the equivalent SQL-89 syntax:

-- Display Vests, Helmets, and Light products

USE AdventureWorks
GO

SELECT
 p.ProductID
,P.Name AS [Product]
,ps.ProductSubcategoryID
,ps.Name AS [ProductSubCategory]
FROM Production.Product p, Production.ProductSubcategory ps 
WHERE P.ProductSubcategoryID = ps.ProductSubcategoryID
AND P.ProductSubcategoryID IN (25, 31, 33);

They are the same except for the join condition mixed in the WHERE clause with an AND keyword. But under the hood, are they really the same? Let’s inspect the result set, the STATISTICS IO, and the Execution Plan.

See the result set of 9 records:

Result set of 2 queries using different join syntax.
Figure 5: Result set of 2 queries using different join syntax.

It’s not just the results, but the resources required by SQL Server is also the same.

See the logical reads:

Both queries have the same logical reads for the 2 input tables.
Figure 6. Both queries have the same logical reads for the 2 input tables.

Finally, the Execution Plan reveals the same query plan for both queries when their QueryPlanHashes are equal. Notice also the highlighted operations in the diagram:

Execution Plan for both queries is the same and has the same query HashPlan.
Figure 7. Execution Plan for both queries is the same and has the same query HashPlan.

Based on the findings, SQL Server query processing is the same, whether it’s SQL-92 or SQL-89. But as I said, the clarity in SQL-92 is much better for me.

Figure 7 also shows a Nested Loop Join used in the plan. Why? The result set is small.

2. Joining Multiple Tables

Check out the query below using 3 joined tables.

-- Get the total number of orders per Product Category
USE AdventureWorks
GO

SELECT
 ps.Name AS ProductSubcategory
,SUM(sod.OrderQty) AS TotalOrders
FROM Production.Product p
INNER JOIN Sales.SalesOrderDetail sod ON P.ProductID = sod.ProductID
INNER JOIN Sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID
INNER JOIN Production.ProductSubcategory ps ON p.ProductSubcategoryID = ps.ProductSubcategoryID
WHERE soh.OrderDate BETWEEN '1/1/2014' AND '12/31/2014'
AND p.ProductSubcategoryID IN (1,2)
GROUP BY ps.Name
HAVING ps.Name IN ('Mountain Bikes', 'Road Bikes')

3. Composite Join

You can also join 2 tables using 2 keys to relate it. Check out the sample below. It uses 2 join conditions with an AND operator.

SELECT
 a.column1
,b.column1
,b.column2
FROM Table1 a
INNER JOIN Table2 b ON a.column1 = b.column1 AND a.column2 = b.column2

4. INNER JOIN Using a Nested Loop Physical Join

In the example below, the Product table has 9 records – a small set. The joined table is SalesOrderDetail – a large set. The Query Optimizer will use a Nested Loop Join, as shown in Figure 8.

USE AdventureWorks
GO

SELECT
 sod.SalesOrderDetailID
,sod.ProductID
,P.Name
,P.ProductNumber
,sod.OrderQty
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p ON sod.ProductID = p.ProductID
WHERE P.ProductSubcategoryID IN(25, 31, 33);
Nested Loop Join for a small input table and a large input table.
Figure 8. Nested Loop Join for a small input table and a large input table.

5. INNER JOIN Using a Merge Physical Join

The example below uses a Merge Join because both input tables are sorted by SalesOrderID.

SELECT
 soh.SalesOrderID
,soh.OrderDate
,sod.SalesOrderDetailID
,sod.ProductID
,sod.LineTotal
FROM Sales.SalesOrderHeader soh
INNER JOIN sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
Merge join for input tables sorted by SalesOrderID.
Figure 9. Merge join for input tables sorted by SalesOrderID.

6. INNER JOIN Using a Hash Physical Join

The following example will use a Hash Join:

SELECT
 s.Name AS Store
,SUM(soh.TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.Store s ON soh.SalesPersonID = s.SalesPersonID
GROUP BY s.Name
Execution Plan with a Hash Join.
Figure 10. Execution Plan with a Hash Join.

7. INNER JOIN Using Adaptive Physical Join

In the example below, the SalesPerson table has a Non-clustered ColumnStore Index on the TerritoryID column. The Query Optimizer decided on a Nested Loop Join, as shown in Figure 11.

SELECT
sp.BusinessEntityID
,sp.SalesQuota
,st.Name AS Territory
FROM Sales.SalesPerson sp
INNER JOIN Sales.SalesTerritory st ON sp.TerritoryID = st.TerritoryID
WHERE sp.TerritoryID BETWEEN 1 AND 5
An Actual Execution Plan Showing an Adaptive Join. The Query Optimizer decided on Nested Loop as the best physical join at runtime.
Figure 11. An Actual Execution Plan Showing an Adaptive Join. The Query Optimizer decided on Nested Loop as the best physical join at runtime.

8. Two Ways to Rewrite a Subquery to an INNER JOIN

Consider this statement with a nested subquery:

SELECT [SalesOrderID], [OrderDate], [ShipDate], [CustomerID]
FROM Sales.SalesOrderHeader 
WHERE [CustomerID] IN (SELECT [CustomerID] FROM Sales.Customer
			WHERE PersonID IN (SELECT BusinessEntityID FROM Person.Person
                                          WHERE lastname LIKE N'I%' AND PersonType='SC'))

The same results can come out if you change it to an INNER JOIN, as below:

SELECT o.[SalesOrderID], o.[OrderDate], o.[ShipDate], o.[CustomerID]
FROM Sales.SalesOrderHeader o
INNER JOIN Sales.Customer c on o.CustomerID = c.CustomerID
INNER JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
WHERE p.PersonType = 'SC'
AND p.lastname LIKE N'I%'

Another way to rewrite it is by using a derived table as a table source for the INNER JOIN:

SELECT o.[SalesOrderID], o.[OrderDate], o.[ShipDate], o.[CustomerID]
FROM Sales.SalesOrderHeader o
INNER JOIN (SELECT c.CustomerID, P.PersonType, P.LastName
            FROM Sales.Customer c
            INNER JOIN Person.Person p ON c.PersonID = P.BusinessEntityID
	      WHERE p.PersonType = 'SC'
	        AND p.LastName LIKE N'I%') AS q ON o.CustomerID = q.CustomerID

All 3 queries output the same 48 records.

9. Using Join Hints

The following query uses a Nested Loop:

SELECT
 sod.SalesOrderDetailID
,sod.ProductID
,P.Name
,P.ProductNumber
,sod.OrderQty
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p ON sod.ProductID = p.ProductID
WHERE P.ProductSubcategoryID IN(25, 31, 33);

If you want to force it to a Hash join, this is what happens:

SELECT
 sod.SalesOrderDetailID
,sod.ProductID
,P.Name
,P.ProductNumber
,sod.OrderQty
FROM Sales.SalesOrderDetail sod
INNER HASH JOIN Production.Product p ON sod.ProductID = p.ProductID
WHERE P.ProductSubcategoryID IN(25, 31, 33);

However, note that STATISTICS IO shows that performance will turn bad when you force it to a Hash join.

Meanwhile, the query below uses a Merge Join:

SELECT
 soh.SalesOrderID
,soh.OrderDate
,sod.SalesOrderDetailID
,sod.ProductID
,sod.LineTotal
FROM Sales.SalesOrderHeader soh
INNER JOIN sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID

Here’s what it becomes when you force it to a Nested Loop:

SELECT
 soh.SalesOrderID
,soh.OrderDate
,sod.SalesOrderDetailID
,sod.ProductID
,sod.LineTotal
FROM Sales.SalesOrderHeader soh
INNER LOOP JOIN sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID

Upon checking the STATISTICS IO of both, forcing it to a Nested Loop require more resources to process the query:

STATISTICS IO of 2 queries. The first one has no join hint. The other is the opposite. Forcing the physical join results in a higher logical read.
Figure 12. STATISTICS IO of 2 queries. The first one has no join hint. The other is the opposite. Forcing the physical join results in a higher logical read.

Thus, using join hints should be your last resort when tweaking for performance. Let your SQL Server handle it for you.

10. Using INNER JOIN in UPDATE

You can also use INNER JOIN in an UPDATE statement. Here’s an example:

UPDATE Sales.SalesOrderHeader
SET ShipDate = getdate() 
FROM Sales.SalesOrderHeader o
INNER JOIN Sales.Customer c on o.CustomerID = c.CustomerID
INNER JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
WHERE p.PersonType = 'SC'

Since it’s possible to use a join in an UPDATE, why not try it out using DELETE and INSERT?

SQL Join and INNER JOIN Takeaways

So, what’s the big deal about SQL join?

  • An SQL JOIN combines records of 2 or more tables to form one result set.
  • There are types of joins in SQL: INNER, OUTER, and CROSS.
  • As a developer or administrator, you decide which logical operations or join types to use for your requirements.
  • On the other hand, the Query Optimizer decides the best physical join operators to use. It can be Nested Loop, Merge, Hash, or Adaptive.
  • You can use join hints to force what physical join to use, but it should be your last resort. In most cases, it’s better to let your SQL Server handle it.
  • Knowing the physical join operators also helps you in tuning query performance.
  • Also, subqueries can be rewritten using joins.

Meanwhile, this post showed 10 examples of INNER JOINs. It’s not just sample codes. Some of them also include an inspection of how the code works from the inside out. It’s not just to help you code but to help you be mindful of performance. At the end of the day, results should not just be correct but also delivered fast.

We’re not done yet. The next article will deal with OUTER JOINS. Stay tuned.

See also

SQL Joins let you fetch and combine data from more than one table. Watch this video to learn more about SQL Joins.

SQL Joins : how to use JOINs in the SELECT statements?
Tags: , , Last modified: September 18, 2021
Close