Another SSIS Gotcha!
Here is another difference between DTS & SSIS that will trip you up if you are not expecting it. The issue deals with loading flat files with varying numbers of columns. For example the 1st row has 5 columns, then the 2d row has 3 columns, and then the 3d row has 4 columns. DTS handled this by always looking for the row delimiter first, whereas SSIS expects to always have the same number of column delimiters in each row. Here are some great posts that can explain this better than I can:
October 2, 2008 1 Comment
I "Really" want to upgrade, but is it the best choice?
Currently my work life is deep in Reporting Services, doing some enhancements to some existing Accounting reports. They are all in SQL Server 2000 Reporting Services and I would love to upgrade them all to RS 2005. Upgrading the reports would be easy. But I won’t, and here’s why:
- We have various processes that auto generate these reports in various formats. Converting the reports to RS 2005 would require a re-write of these processes. That would be a lot of time, time better spent on other projects.
- The isn’t any real benefit to upgrading. They are pretty simple reports.
- Because of 1 & 2 above, I can’t justify upgrading to my boss. Enough said.
June 25, 2008 No Comments
Adventures in SQL Server Performance Tuning – Joins vs. Where Exists
What is wrong with this query?
SELECT c.CustomerName, b.BillingID
FROM BillingInfo bi
JOIN Customer c
ON bi.CustomerID = c.CustomerID
JOIN DailyTransactions dt
ON bi.BillingID = dt.BillingID
WHERE dt.Transdate >= @startdate
AND dt.Transdate < @enddate
Well, the first thing that I see is that we are joining to the DailyTransactions table, but we aren’t returning
any columns in the results set. Why? The reason is because the goal of this query is to return all customers
that have transactions in a given period, so we must check the DailyTransations table for this. But we don’t care
if the customer had 1 transaction or 1000, nor do we need any columns of data from that table, so here is better
(quicker!) way to do this:
SELECT c.CustomerName, b.BillingID
FROM BillingInfo bi
JOIN Customer c
ON bi.CustomerID = c.CustomerID
WHERE EXISTS (SELECT 1 FROM DailyTransactions dt
WHERE bi.BillingID = dt.BillingID
AND Transdate >= @startdate
AND Transdate < @enddate)
A WHERE EXISTS will be much faster than a join because as soon as the first match is found for that BillingID,
SQL Server stops looking for more, unlike a join, which will match up every row.
When I fixed a stored procedure with this exact problem, I cut the run time required by two-thirds. Since this stored
procedure loads a dropdown in Reporting Services, that cut the time required to load the report.
March 10, 2008 1 Comment
The Good Enough Database
We’ve all seen them, the poorly (or barely!!) normalized database. Every time you dig into the thing you just groan with pain. You go into it to fix one thing and you find half-dozen other things you want to fix as well. But how much work should you put into the thing?
Earlier in my career I would spend all sorts of time just fixing away. It’s so much fun to tweak some code and see it run 2, 3, 4 or more times faster then before. But do you really need to fix it? From a business point of view, can you justify the time? If you made that piece of code run 5 times faster, would anybody notice or care? And would all your hard work have a positive impact on the bottom line of your employer?
Just because you can fix something, doesn’t mean you should spend the time. There are plenty of databases that suffer from poor design, bad coding practice, lack of useable indexes, etc. but still manage to do an OK (but not great) job. Maybe they do not support many users, or they are a bit slow, but nobody complains too much about it. Maybe they support a declining part of the business. Maybe you would be providing more value to the business doing something else.
Just my 2 cents worth!
September 5, 2007 No Comments
DTS to SSIS – Don’t underestimate the paradigm shift needed
For some 3 years I worked in a shop that primarily did ETL work with DTS. I became quite familiar with all of the little quirks and idiosyncrasies of DTS, and I learned exactly how to get stuff done in DTS. DTS is like an old friend to me! But now I’ve started to work with SSIS – SQL Server Integration Services, the replacement for DTS that’s included with SQL Server 2005. The change to SSIS has been a challenge for me, particularly the separation of control flow from data flow. SSIS is vastly different from DTS, so if you have a need to start working with SSIS, do yourself a favor and budget some time to get up to speed. Believe me, you will need it!
July 19, 2007 No Comments
A Very Simple PIVOT Example
When I first started working with the new PIVOT command in SQL Server 2005, it didn’t make any sense to me. So I created some very simple examples on my own to help with my understanding. Here is one. First, I create the CTE (as I explained in my last post):
WITH Sales as (
SELECT SalesOrderID, Name, OrderDate
FROM Sales.SalesTerritory st
INNER JOIN Sales.SalesOrderHeader soh
ON st.TerritoryID = soh.TerritoryID
WHERE OrderDate >= ‘7/1/2001′
AND OrderDate < ‘8/1/2001′
)
The CTE contains the sales orders and the territory name for that sale during the month of July 2001. Next we will create the PIVOT. The goal of the PIVOT is to count the number of sales by territory. Not very exciting, but a good simple example:
SELECT Convert(varchar(20), OrderDate, 101) AS ‘Sales Date’,
Northwest,
Central,
Southwest,
Canada,
France,
Germany,
Australia
FROM sales — the CTE above
PIVOT
(
COUNT (SalesOrderID)
FOR Name in ([Northwest], [Central], [Southwest], [Canada], [France], [Germany], [Australia])) as a
ORDER BY OrderDate
One downside to the PIVOT command is you need to know the column names at design time. They must be hard coded. The only way you can get around this is to use dynamic SQL.
July 15, 2007 2 Comments
A little bit about Common Table Expressions
I was planning to do a series of blogs about the new PIVOT and UNPIVOT functionality in SQL Server 2005, but I’ve decided to take a step back and do a post on Common Table Expressions (CTE’s) first. They are used in conjunction with PIVOT and UNPIVOT to make the code a bit cleaner.
If you have used derived tables in your scripts before than Common Table Expressions will easy for you to understand.
Here is a simple derived table example (from the Adventure Works sample DB) that returns the total quantity of items for each order in the month of December 2001:
SELECT SalesOrderID,
NumberOfItems
FROM(
Sum(sod.OrderQty) as NumberOfItems
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesOrderDetail sod
ON soh.SalesOrderID = sod.SalesOrderID
WHERE OrderDate >= ‘12/1/2001′
AND OrderDate < ‘1/1/2002′
GROUP BY soh.SalesOrderID
) AS TotalQty
And here is the same query, done as a CTE. Notice the use of the WITH keyword:
WITH TotalQty
AS (
SUM(sod.OrderQty) as NumberOfItems
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesOrderDetail sod
ON soh.SalesOrderID = sod.SalesOrderID
WHERE OrderDate >= ‘12/1/2001′
AND OrderDate < ‘1/1/2002′
GROUP BY soh.SalesOrderID
)
SELECT SalesOrderID,
NumberOfItems
FROM TotalQty
June 2, 2007 No Comments
Error Handling in SQL Server 2005
If you are trying to get up to speed on error handling in SQL Server 2005, here is a great resource:
SQL Server Error Handling Workbench
The SQL Server Error Handling workbench will take you through several comparisons of error handling routines in SQL 2000 and 2005. The best example is at the end of the article which is a script that shows you how to retry a query again if it fails. Good stuff!
February 23, 2007 No Comments
Passed 70-431!!!
Recently I took and passed Exam 70-431: Microsoft SQL Server 2005-Implementation and Maintenance. I wasn’t sure I was going to pass it, so I was keeping my fingers crossed as the test calculated my score. My score was 841!
Here’s my take on the exam:
Lots and lots of simulations. You need to get some practice time with SQL Server Management Studio to pass the test. This part was hard because I usually write code to accomplish my tasks. I’m not much of a GUI user.
The MS prep guide is right on target. Use it as your study guide. You can find it at: http://www.microsoft.com/learning/exams/70-431.mspx
Study up on the major new features of SQL 2005 such as Database Mirroring, XML, Snapshots, and Service Broker.
Take your time and read the whole question!
February 22, 2007 No Comments
Is that Column Really a Varchar?
If the data is padded with trailing zeros, well than it might as well be a char datatype.
I’ve been working with a series of partitioned tables this past week that are just huge, well over 170 million rows total and over 100 columns. As I was writing some queries for the data I discovered that all of the varchar data was stored with trailing spaces up to the length of the column. When the data was loaded into the tables, it wasn’t stripped of the trailing spaces. So, if a column was defined as a varchar (30) and held a piece of data such as “Tom”, it took all 30 bytes not just 3. Of course this made the tables very large. I created a clean up script using the trim functions to strip the trailing spaces from the data.
My tables are much smaller now and faster too!
January 31, 2007 No Comments