Today I’ll discuss two very important items to keep in mind when coding with an outer join. These issues apply to both the left and right outer join.
Join order matters – with an inner join, the join order does not affect the results set. The SELECT statement returns the same number of rows regardless of the order in which the tables are listed. But that is not the case with an outer join. Here is an example from the AdventureWorks sample database.
First I’ll join the Production.Product table to the Production.ProductInventory table with a left join so I can list all the products(including those without any inventory) and their corresponding quantities.
SELECT p.Name as ProductName,
inv.Quantity
FROM Production.Product p
LEFT JOIN Production.ProductInventory inv
ON p.ProductID = inv.ProductId
But after reviewing the output, I realize that the location name is needed to make this query complete, so I’ll add the location table to the query with an inner join:
SELECT p.Name as ProductName,
inv.Quantity
FROM Production.Product p
LEFT JOIN Production.ProductInventory inv
ON p.ProductID = inv.ProductId
INNER JOIN Production.Location l
ON l.LocationID = inv.LocationID
ORDER BY p.Name desc
But now something is wrong with the code, because the result set is missing the products without inventory. Why?
That’s because using an inner join to add the location table filters out the products with no quantity. Since they have no inventory, they do not have a row in the ProductInventory table and hence nothing to join to in the ON l.LocationID = inv.LocationID clause. Since I want to keep the rows where inv.Quantity is null, let’s change this to a left join:
SELECT p.Name as ProductName,
inv.Quantity
FROM Production.Product p
LEFT JOIN Production.ProductInventory inv
ON p.ProductID = inv.ProductId
LEFT JOIN Production.Location l
ON l.LocationID = inv.LocationID
And that works! All of our products (even if they do not have any inventory) are in the result set.
WHERE clause vs. ON clause
The WHERE clause is typically used to restrict the results of a query. But in the case of an outer join, using the WHERE clause can give you the wrong results. Here I want to see all the products and the total quantity of each product currently in work. So I left join the Production.Product table to the Production.WorkOrder table to see all 504 products and the quantity in work. Quantity is zero for products not currently in production.
SELECT p.Name,
COUNT(wo.WorkOrderID)
FROM Production.Product p
LEFT JOIN Production.WorkOrder wo
ON p.productID = wo.ProductID
GROUP BY p.Name
But I don’t want to see all work orders, just the most recent ones. So I try to limit the query to the most recent work orders by putting the StartDate in the WHERE clause:
SELECT p.Name,
COUNT(wo.WorkOrderID)
FROM Production.Product p
LEFT JOIN Production.WorkOrder wo
ON p.productID = wo.ProductID
WHERE wo.StartDate >= ’1/1/2004′
GROUP BY p.Name
But now I have lost my products not currently in production. So now what? I’ll move the StartDate restriction to the ON clause, so it will be part of the join:
SELECT p.name,
COUNT(wo.WorkOrderID)
FROM Production.Product p
LEFT JOIN Production.WorkOrder wo
ON p.productID = wo.ProductID
AND StartDate >= ’1/1/2004′
GROUP BY p.Name
Now I will return all 504 products, with the quantities for the most recent work orders. So why does using the ON clause work and not the WHERE clause? There are two reasons. First the ON clause is applied as part of the join and the WHERE clause is applied after the join. Secondly, the WHERE clause is applied to the right side of the left outer join, the WorkOrder table, the table we only want rows from if there is a match.
So, don’t be afraid to use an outer join, just be aware of the gotcha’s when coding a left/right join.