Remember: LAG function in SQL Server Denali uses rows

You've heard my rant before "measure what you think you are measuring". If not, follow the link. Here's an example using the LAG function, new in SQL Server Denali, to measure sales trends. We'll start with a view from AdventureWorksDW2008R2 (the double cast is for clarity).

CREATE VIEW dbo.EmployeeSalesByMonth
AS
SELECT  EmployeeKey as [Employee],
        DATEPART(yyyy, CAST(CAST(OrderDateKey as varchar(8)) as DATE)) AS [Year] ,
        DATEPART(mm, CAST(CAST(OrderDateKey as varchar(8)) as DATE)) AS [Month] ,
        SUM(SalesAmount) AS [EmployeeTotal]
FROM    dbo.FactResellerSales
WHERE    EmployeeKey IS NOT NULL
GROUP BY EmployeeKey,
        DATEPART(yyyy, CAST(CAST(OrderDateKey as varchar(8)) as DATE)),
        DATEPART(mm, CAST(CAST(OrderDateKey as varchar(8)) as DATE))
GO

Let's use LAG to get sales for this month, previous month, and three months ago. I'm choosing a specific employee, just leave the WHERE clause out to get everyone.

SELECT  Employee,
        [Year] ,
        [Month] ,
        EmployeeTotal AS SalesThisMonth,
        LAG(EmployeeTotal, 1, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesLastMonth ,
        LAG(EmployeeTotal, 3, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesThreeMonthsAgo
FROM dbo.EmployeeSalesByMonth
WHERE Employee = 272
ORDER BY Employee, [Year], [Month];
GO

The LAG function works perfectly. Or does it…? Let's look at the first few rows.

                               This Month      Last Month      Three Months Ago
272     2005    8       20544.7015      0.00               0.00
272     2005    9       2039.994         20544.7015     0.00
272     2005    11      6341.551        2039.994         0.00
272     2006    2       61206.4782      6341.551        20544.7015
272     2006    3       18307.746       61206.4782     2039.994

Although this sounded like a good idea, and the function is working as advertised, the answer isn't what we want. Look at the 2005-11 line, for example. Last month was 2005-10, but there was no sales that month, so we get 2005-9 sales. Three months ago (2005-08) there were sales, but we get zero. What gives?

We're only producing rows for months where there were sales for employee 272. Probably not what you'd want. Although we think we're counting months with this LAG function, we're actually counting ROWS. SQL Server BOL implementation of LAG states that the offset (first parameter) is "The number of rows back from the current row from which to obtain a value."

So, can we fix things to get "reasonable" values? Because LAG uses rows, we'd need to come up with some blank (zero total) rows for months where there are no sales. Some databases have a special data densification syntax, like "partition-by joins" to fill in the gaps. Using Itzik Ben-Gan's dbo.GetNums TVF, we can make a table of allMonths and all Employees between a date range. (Note: this code is a quick hack and pretty fragile, but we will end up with the right answer. It also assumes you have dbo.GetNums in tempdb, see the SQLMag article for that function).

DECLARE
 @startdt AS DATE = '20050701',  — first date we care about
 @enddt   AS DATE = '20080731'   — last date we care about
 
SELECT DATEADD(month, n-1, @startdt) AS dt, EmployeeKey AS Employee
INTO #allMonths
FROM tempdb.dbo.GetNums(DATEDIFF(month, @startdt, @enddt) + 1) AS Nums
CROSS JOIN
(
SELECT DISTINCT EmployeeKey
FROM dbo.FactResellerSales
) AS A; 
GO

With this table in hand, we can use an OUTER APPLY to "add in" the months with no sales, and our sales trend query becomes:

WITH SalesByMonth AS
(
SELECT Employee,
       DATEPART(yyyy,dt) as Year,
    DATEPART(MONTH,dt) as Month,
    ISNULL(EmployeeTotal, 0) as EmployeeTotal
FROM #allMonths m
OUTER APPLY (
SELECT  EmployeeTotal
FROM    EmployeeSalesByMonth e
WHERE m.Employee = e.Employee  
 AND    DATEPART(yyyy,dt) = [Year]
 AND    DATEPART(mm,dt) = [Month]
) AS t
)
SELECT  Employee,
        [Year] ,
        [Month] ,
        EmployeeTotal AS SalesThisMonth,
        LAG(EmployeeTotal, 1, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesLastMonth ,
        LAG(EmployeeTotal, 3, 0.00) OVER (PARTITION BY Employee ORDER BY [Year], [Month]) AS SalesThreeMonthsAgo
FROM SalesByMonth
WHERE Employee = 272
ORDER BY Employee,
        [Year],
        [Month];
GO

A few corresponding rows…the sales figures now agree with the column headings

                               This Month      Last Month      Three Months Ago
272     2005    8       20544.7015      0.00               0.00
272     2005    9       2039.994         20544.7015     0.00
272     2005    10     0.00                2039.994         0.00
272     2005    11     6341.551         0.00               20544.7015
272     2005    12     0.00                6341.551        2039.994
272     2006    1       0.00                0.00                0.00
272     2006    2       61206.4782     0.00                 6341.551
272     2006    3       18307.746       61206.4782      0.00

So remember…although the default window for the OVER clause really is RANGE UNBOUNDED PRECEDING AND CURRENT ROW, the LAG function only counts ROWs, not RANGEs.

@bobbeauch  

Other articles

Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.