Category Archives: T-SQL

The two things you need to know about outer joins

Today I’ll discuss two very important items to keep in mind when coding with an outer join.  These issues apply to both the left and right outer join.

Join order matters – with an inner join, the join order does not affect the results set.  The SELECT statement returns the same number of rows regardless of the order in which the tables are listed.  But that is not the case with an outer join.  Here is an example from the AdventureWorks sample database.

First I’ll join the Production.Product table to the Production.ProductInventory table with a left join so I can list all the products(including those without any inventory) and their corresponding quantities.

SELECT     p.Name as ProductName,
        inv.Quantity
FROM Production.Product p
LEFT JOIN Production.ProductInventory inv
    ON p.ProductID = inv.ProductId

But after reviewing the output, I realize that the location name is needed to make this query complete, so I’ll add the location table to the query with an inner join:

SELECT     p.Name as ProductName,
        inv.Quantity
FROM Production.Product p
LEFT JOIN Production.ProductInventory inv
    ON p.ProductID = inv.ProductId
INNER JOIN Production.Location l
    ON l.LocationID = inv.LocationID
ORDER BY p.Name desc

But now something is wrong with the code, because the result set is missing the products without inventory.  Why?

That’s because using an inner join to add the location table filters out the products with no quantity.  Since they have no inventory, they do not have a row in the ProductInventory table and hence nothing to join to in the ON l.LocationID = inv.LocationID clause.  Since I want to keep the rows where inv.Quantity is null, let’s change this to a left join:

SELECT     p.Name as ProductName,
        inv.Quantity
FROM Production.Product p
LEFT JOIN Production.ProductInventory inv
    ON p.ProductID = inv.ProductId
LEFT JOIN Production.Location l
    ON l.LocationID = inv.LocationID

And that works!  All of our products (even if they do not have any inventory) are in the result set.

WHERE clause vs. ON clause

The WHERE clause is typically used to restrict the results of a query.  But in the case of an outer join, using the WHERE clause can give you the wrong results.  Here I want to see all the products and the total quantity of each product currently in work.  So I left join the Production.Product table to the Production.WorkOrder table to see all 504 products and the quantity in work.  Quantity is zero for products not currently in production.

SELECT    p.Name,
        COUNT(wo.WorkOrderID)
FROM Production.Product p
LEFT JOIN Production.WorkOrder wo
    ON p.productID = wo.ProductID
GROUP BY p.Name

But I don’t want to see all work orders, just the most recent ones.  So I try to limit the query to the most recent work orders by putting the StartDate in the WHERE clause:

SELECT    p.Name,
        COUNT(wo.WorkOrderID)
FROM Production.Product p
LEFT JOIN Production.WorkOrder wo
    ON p.productID = wo.ProductID
WHERE wo.StartDate >= ’1/1/2004′
GROUP BY p.Name

But now I have lost my products not currently in production.  So now what?  I’ll move the StartDate restriction to the ON clause, so it will be part of the join:

SELECT    p.name,
        COUNT(wo.WorkOrderID)
FROM Production.Product p
LEFT JOIN Production.WorkOrder wo
    ON p.productID = wo.ProductID
        AND StartDate >= ’1/1/2004′
GROUP BY p.Name

Now I will return all 504 products, with the quantities for the most recent work orders.  So why does using the ON clause work and not the WHERE clause?  There are two reasons.  First the ON clause is applied as part of the join and the WHERE clause is applied after the join.  Secondly, the WHERE clause is applied to the right side of the left outer join, the WorkOrder table, the table we only want rows from if there is a match.

So, don’t be afraid to use an outer join, just be aware of the gotcha’s when coding a left/right join.

SQL to get all Month End and Begin Dates

Many web sites and blogs will tell you how to find the last day, first day of a month or year

But what if you need the month end and begin dates for a specific time period?  Well, here is the code for that:

–script to get month begining & end dates for a time period
–declare all variables
DECLARE @StartDate datetime
DECLARE @EndDate datetime
DECLARE @MonthEndDate datetime
DECLARE @MonthBeginDate datetime

DECLARE @MonthlyDates Table
(    MonthID INT NOT NULL IDENTITY(1,1),
MonthBeginDate SMALLDATETIME NOT NULL,
MonthEndDate SMALLDATETIME NOT NULL
)

–set the time period here
SET @StartDate = ’3/1/2009′
SET @EndDate = ’5/31/2012′

–Get the first month’s begin and end dates
SELECT @MonthBeginDate = DATEADD(dd,-(DAY(@StartDate)-1),@StartDate)
SELECT @MonthEndDate = DATEADD(mm,1,@StartDate) – DAY(DATEADD(mm,1,@StartDate))

–do the first insert into the holding table
INSERT INTO @MonthlyDates
(MonthBeginDate, MonthEndDate)
VALUES
(@MonthBeginDate, @MonthEndDate)

–use while loop to move through each month
WHILE @MonthEndDate < @EndDate
BEGIN

SELECT @MonthBeginDate = DATEADD(mm, 1, @MonthBeginDate)
SELECT @MonthEndDate = DATEADD(dd, -1, DATEADD(mm, 1, @MonthBeginDate))

INSERT INTO @MonthlyDates
(MonthBeginDate, MonthEndDate)
VALUES
(@MonthBeginDate, @MonthEndDate)

END
SELECT
MonthBeginDate,
MonthEndDate
FROM @MonthlyDates

So this code will give you all the months begin & end dates, in order.  Now in this code I’m not looking at when the @StartDate or @EndDate falls in the month.  For example, if I pass into the code a start date of March 10, 2009 it will return, as the first row, the begin and end dates for the month of March.  If you wanted to only return dates for whole months that your range covers (in the above example that would be a start month of April), you would need to alter the code.

Oh, and it is good for leap years too!

Good reference links for the TOP Keyword

This past week I needed to use the TOP keyword for some code and came across some excellent resources:

First, the quick basics of top:

Dynamically controlling the number of rows affected by a SQL Server query

Next, a good article on how to retrieve top and bottom rows together:

How to Retrieve TOP and BOTTOM Rows Together using T-SQL

And an excellent article on the TOP per group scenario:

Who’s On First? Solving the Top per Group Problem

And the book Inside Microsoft SQL Server 2008 T-SQL Querying there is a whole chapter mostly on TOP; Chapter 9 TOP and APPLY.

TOP isn’t something used everyday in SQL Server, but when you need it, you’ll be happy it’s part of TSQL.

The New Date & Time Datatypes are great, but…

The first time I read about the new Date and Time datatypes I though “Well, that’s nice”.  But the more I think about it the happier I am.  No more stripping off the time values to do a date only comparison.  Thank goodness.   Some other pluses;  possible use of indexes, safe to use BETWEEN for date ranges, easier reading of code without all of those CONVERT functions. 

But don’t be so eager to leave out the time part.  You need to be absolutely sure that you don’t need the time piece.  If your not sure, keep the time part in a column separate from the date.

There is some good sample data for these datatypes in the AdventureWorks2008 database.  Take a look at the Human Resources.Employee table, Birthdates & Hire Date columns.  And also look at Human Resources.Shift table, Start Time & End Time columns.

Linker Servers and Server Aliases

Just the other day I received a call from a developer with a database problem. He was receiving timeouts when he called a particular stored procedure from his application. So I took a look at the proc, reviewed the code, and tried to run it. It just ran and ran and ran, no error messages, nothing. Then I tried to recompile the proc, same thing, just hung. So I stopped it, looked at the amount of data it needed to run, not much data involved, so I cut the proc into pieces and ran it a chuck at a time. Finally, I got to the problem part of the code, the part that just hung. In reading the code, I noticed that it was referencing a linked server, hmmm, where does this point to? So, I go into the Client Network Utility and here we are, an old server that no longer exists! Change the server name and port number and yes! The proc complies and runs! Weird, wonder why it just didn’t just return an error message. Oh well, all in a day’s fun!

Adventures in SQL Server Performance Tuning – Joins vs. Where Exists

What is wrong with this query?

SELECT c.CustomerName, b.BillingID
FROM BillingInfo bi
JOIN Customer c
ON bi.CustomerID = c.CustomerID
JOIN DailyTransactions dt
ON bi.BillingID = dt.BillingID
WHERE dt.Transdate >= @startdate
AND dt.Transdate < @enddate

Well, the first thing that I see is that we are joining to the DailyTransactions table, but we aren’t returning
any columns in the results set. Why? The reason is because the goal of this query is to return all customers
that have transactions in a given period, so we must check the DailyTransations table for this. But we don’t care
if the customer had 1 transaction or 1000, nor do we need any columns of data from that table, so here is better
(quicker!) way to do this:

SELECT c.CustomerName, b.BillingID
FROM BillingInfo bi
JOIN Customer c
ON bi.CustomerID = c.CustomerID
WHERE EXISTS (SELECT 1 FROM DailyTransactions dt
WHERE bi.BillingID = dt.BillingID
AND Transdate >= @startdate
AND Transdate < @enddate)

A WHERE EXISTS will be much faster than a join because as soon as the first match is found for that BillingID,
SQL Server stops looking for more, unlike a join, which will match up every row.

When I fixed a stored procedure with this exact problem, I cut the run time required by two-thirds. Since this stored
procedure loads a dropdown in Reporting Services, that cut the time required to load the report.

A Very Simple PIVOT Example

When I first started working with the new PIVOT command in SQL Server 2005, it didn’t make any sense to me.  So I created some very simple examples on my own to help with my understanding.  Here is one.  First, I create the CTE (as I explained in my last post):

WITH Sales as (
SELECT  SalesOrderID, Name, OrderDate
FROM Sales.SalesTerritory st
INNER JOIN Sales.SalesOrderHeader soh
ON st.TerritoryID = soh.TerritoryID
WHERE OrderDate >= ’7/1/2001′
AND OrderDate < ’8/1/2001′
)

The CTE contains the sales orders and the territory name for that sale during the month of July 2001.  Next we will create the PIVOT.  The goal of the PIVOT is to count the number of sales by territory.  Not very exciting, but a good simple example:

SELECT Convert(varchar(20), OrderDate, 101) AS ‘Sales Date’,
Northwest,
Central,
Southwest,
Canada,
France,
Germany,
Australia
FROM sales  — the CTE above
PIVOT
(
COUNT (SalesOrderID)
FOR Name in ([Northwest], [Central], [Southwest], [Canada], [France], [Germany], [Australia])) as a
ORDER BY OrderDate

One downside to the PIVOT command is you need to know the column names at design time.  They must be hard coded.  The only way you can get around this is to use dynamic SQL.

A little bit about Common Table Expressions

I was planning to do a series of blogs about the new PIVOT and UNPIVOT functionality in SQL Server 2005, but I’ve decided to take a step back and do a post on Common Table Expressions (CTE’s) first. They are used in conjunction with PIVOT and UNPIVOT to make the code a bit cleaner.

If you have used derived tables in your scripts before than Common Table Expressions will easy for you to understand.
Here is a simple derived table example (from the Adventure Works sample DB) that returns the total quantity of items for each order in the month of December 2001:

SELECT SalesOrderID,
NumberOfItems
FROM(

SELECT soh.SalesOrderID,
Sum(sod.OrderQty) as NumberOfItems
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesOrderDetail sod
ON soh.SalesOrderID = sod.SalesOrderID
WHERE OrderDate >= ’12/1/2001′
AND OrderDate < ’1/1/2002′
GROUP BY soh.SalesOrderID

) AS TotalQty

And here is the same query, done as a CTE. Notice the use of the WITH keyword:
WITH TotalQty
AS (

SELECT soh.SalesOrderID,
SUM(sod.OrderQty) as NumberOfItems
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesOrderDetail sod
ON soh.SalesOrderID = sod.SalesOrderID
WHERE OrderDate >= ’12/1/2001′
AND OrderDate < ’1/1/2002′
GROUP BY soh.SalesOrderID

)

SELECT SalesOrderID,
NumberOfItems
FROM TotalQty

Is that Column Really a Varchar?

If the data is padded with trailing zeros, well than it might as well be a char datatype.

I’ve been working with a series of partitioned tables this past week that are just huge, well over 170 million rows total and over 100 columns. As I was writing some queries for the data I discovered that all of the varchar data was stored with trailing spaces up to the length of the column.  When the data was loaded into the tables, it wasn’t stripped of the trailing spaces.     So, if a column was defined as a varchar (30) and held a piece of data such as “Tom”, it took all 30 bytes not just 3.  Of course this made the tables very large.  I created a clean up script using the trim functions to strip the trailing spaces from the data.

My tables are much smaller now and faster too!