You've heard my rant before "measure what you think you are measuring". If not, follow the link. Here's an example using the LAG function, new in SQL Server Denali, to measure sales trends. We'll start with a view from AdventureWorksDW2008R2 (the double cast is for clarity).
CREATE VIEW dbo.EmployeeSalesByMonth
AS
SELECT EmployeeKey as [Employee],
DATEPART(yyyy, CAST(CAST(OrderDateKey as varchar(8)) as DATE)) AS [Year] ,
DATEPART(mm, CAST(CAST(OrderDateKey as varchar(8)) as DATE)) AS [Month] ,
SUM(SalesAmount) AS [EmployeeTotal]
FROM dbo.FactResellerSales
WHERE EmployeeKey IS NOT NULL
GROUP BY EmployeeKey,
DATEPART(yyyy, CAST(CAST(OrderDateKey as varchar(8)) as DATE)),
DATEPART(mm, CAST(CAST(OrderDateKey as varchar(8)) as DATE))
GO
Let's use LAG to get sales for this month, previous month, and three months ago. I'm choosing a specific employee, just leave the WHERE clause out to get everyone.
SELECT Employee,
[Year] ,
[Month] ,
EmployeeTotal AS SalesThisMonth,
LAG(EmployeeTotal, 1, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesLastMonth ,
LAG(EmployeeTotal, 3, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesThreeMonthsAgo
FROM dbo.EmployeeSalesByMonth
WHERE Employee = 272
ORDER BY Employee, [Year], [Month];
GO
The LAG function works perfectly. Or does it…? Let's look at the first few rows.
This Month Last Month Three Months Ago
272 2005 8 20544.7015 0.00 0.00
272 2005 9 2039.994 20544.7015 0.00
272 2005 11 6341.551 2039.994 0.00
272 2006 2 61206.4782 6341.551 20544.7015
272 2006 3 18307.746 61206.4782 2039.994
Although this sounded like a good idea, and the function is working as advertised, the answer isn't what we want. Look at the 2005-11 line, for example. Last month was 2005-10, but there was no sales that month, so we get 2005-9 sales. Three months ago (2005-08) there were sales, but we get zero. What gives?
We're only producing rows for months where there were sales for employee 272. Probably not what you'd want. Although we think we're counting months with this LAG function, we're actually counting ROWS. SQL Server BOL implementation of LAG states that the offset (first parameter) is "The number of rows back from the current row from which to obtain a value."
So, can we fix things to get "reasonable" values? Because LAG uses rows, we'd need to come up with some blank (zero total) rows for months where there are no sales. Some databases have a special data densification syntax, like "partition-by joins" to fill in the gaps. Using Itzik Ben-Gan's dbo.GetNums TVF, we can make a table of allMonths and all Employees between a date range. (Note: this code is a quick hack and pretty fragile, but we will end up with the right answer. It also assumes you have dbo.GetNums in tempdb, see the SQLMag article for that function).
DECLARE
@startdt AS DATE = '20050701', — first date we care about
@enddt AS DATE = '20080731' — last date we care about
SELECT DATEADD(month, n-1, @startdt) AS dt, EmployeeKey AS Employee
INTO #allMonths
FROM tempdb.dbo.GetNums(DATEDIFF(month, @startdt, @enddt) + 1) AS Nums
CROSS JOIN
(
SELECT DISTINCT EmployeeKey
FROM dbo.FactResellerSales
) AS A;
GO
With this table in hand, we can use an OUTER APPLY to "add in" the months with no sales, and our sales trend query becomes:
WITH SalesByMonth AS
(
SELECT Employee,
DATEPART(yyyy,dt) as Year,
DATEPART(MONTH,dt) as Month,
ISNULL(EmployeeTotal, 0) as EmployeeTotal
FROM #allMonths m
OUTER APPLY (
SELECT EmployeeTotal
FROM EmployeeSalesByMonth e
WHERE m.Employee = e.Employee
AND DATEPART(yyyy,dt) = [Year]
AND DATEPART(mm,dt) = [Month]
) AS t
)
SELECT Employee,
[Year] ,
[Month] ,
EmployeeTotal AS SalesThisMonth,
LAG(EmployeeTotal, 1, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesLastMonth ,
LAG(EmployeeTotal, 3, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesThreeMonthsAgo
FROM SalesByMonth
WHERE Employee = 272
ORDER BY Employee,
[Year],
[Month];
GO
A few corresponding rows…the sales figures now agree with the column headings
This Month Last Month Three Months Ago
272 2005 8 20544.7015 0.00 0.00
272 2005 9 2039.994 20544.7015 0.00
272 2005 10 0.00 2039.994 0.00
272 2005 11 6341.551 0.00 20544.7015
272 2005 12 0.00 6341.551 2039.994
272 2006 1 0.00 0.00 0.00
272 2006 2 61206.4782 0.00 6341.551
272 2006 3 18307.746 61206.4782 0.00
So remember…although the default window for the OVER clause really is RANGE UNBOUNDED PRECEDING AND CURRENT ROW, the LAG function only counts ROWs, not RANGEs.
@bobbeauch
Sorry, comments are closed for this post.