# Window clause, range specification, and missing values

In the previous post, I insinuated that allowing LAG/LEAD to be sensitive to value RANGEs might help with series of data with missing values. It won't do that unless there's some logic that allows the offset that LAG/LEAD uses to be calculated on a row-by-row basis (i.e. LAG/LEAD uses an expression based on the window values). That's a little too much to expect; notice I've updated the original post to back off a bit on that one. I'll need my densified data for that.

However, a common scenario that would be helped by an expansion of use of RANGE in the window specification is rolling totals/rolling balances. Here's an example.

Suppose I wanted to calculate a 3-month rolling average (or rolling total) of sales on a per-employee basis. Looking at an employee who doesn't make at least one sale every month, I get a skewed total/average. The totals don't reflect the last three calendar months, but instead they reflect the last three months in which the employee had a nonzero sales total. Skews the figures upward a little….

SELECT  Employee,
[Year] ,
[Month] ,
EmployeeTotal AS SalesThisMonth,
COUNT(*) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS NumberOfMonths,
SUM(EmployeeTotal) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS ThreeMonthTotal,
AVG(EmployeeTotal) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS ThreeMonthAverage
FROM EmployeeSalesByMonth
WHERE Employee = 272
ORDER BY Employee, [Year], [Month];

SalesThisMonth Number   ThreeMonthTotal ThreeMonthAverage
272     2005    8       20544.7015      1       20544.7015      20544.7015
272     2005    9       2039.994          2       22584.6955      11292.3477
272     2005    11     6341.551          3       28926.2465      9642.0821
272     2006    2       61206.4782      3       69588.0232      23196.0077
272     2006    3       18307.746        3       85855.7752      28618.5917
272     2006    4       33406.7043      3       112920.9285     37640.3095

Suppose I had a way to specify a time interval and use RANGE instead of ROWS. Something like: "RANGE BETWEEN '2 MONTHS' PRECEEDING AND CURRENT ROW" where '2 MONTHS' represents a time-interval data type. Then I'd get the "non-skewed" answer. Currently SQL Server Denali doesn't even allow that form the RANGE specification (RANGE BETWEEN <unsigned value specification> PRECEDING) with any data type currently. Instead, I can use the densified data (see previous post) with the rows window and get that "non-skewed" answer.

WITH SalesByMonth AS
(
SELECT Employee,
DATEPART(yyyy,dt) as Year,
DATEPART(MONTH,dt) as Month,
ISNULL(EmployeeTotal, 0) as EmployeeTotal
FROM #allMonths m
OUTER APPLY (
SELECT  EmployeeTotal
FROM    EmployeeSalesByMonth e
WHERE m.Employee = e.Employee
AND    DATEPART(yyyy,dt) = [Year]
AND    DATEPART(mm,dt) = [Month]
) AS t
)
SELECT  Employee,
[Year] ,
[Month] ,
EmployeeTotal AS SalesThisMonth,
COUNT(*) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS NumberOfMonths,
SUM(EmployeeTotal) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS ThreeMonthTotal,
AVG(EmployeeTotal) OVER (PARTITION BY Employee ORDER BY [Year], [Month] ROWS 2 PRECEDING) AS ThreeMonthAverage
FROM SalesByMonth
WHERE Employee = 272
ORDER BY Employee,
[Year],
[Month];
GO

SalesThisMonth Number   ThreeMonthTotal ThreeMonthAverage
272     2005    7       0.00                 1       0.00                 0.00
272     2005    8       20544.7015      2       20544.7015      10272.3507
272     2005    9       2039.994          3       22584.6955      7528.2318
272     2005    10     0.00                 3       22584.6955      7528.2318
272     2005    11     6341.551          3       8381.545          2793.8483

Notice, however, that there's another subtle change in the answer. In choosing to start the time series exactly on 2005-07 (rather than the month when each employee had his/her first sale), the averages for the first couple of months for this employee are a little low. We can account for this, if we want, by basing the #allmonths table on each employee's hire date, for example. Yet another instance of "always pay attention to what you're actually measuring".

@bobbeauch

## Other articles

### Over 1000 XEvents in SQL Server 2016 CTP2. Here are the new ones.

Extended events has firmly established itself as the premier diagnostic feature in SQL Server and SQL Server 2016 brings along more events to correspond to

Explore

### Taking the Azure SQL Database row-level security preview for a spin

The security announcements around Azure SQL Database keep coming. Auditing was implemented a few months ago, and today it was followed by a preview of

Explore

### Azure SQL Database V12 Preview – Spatial Fully Functional

Yesterday’s blog post about Azure SQL Database V12 mentioned that one of the features I was particularly interested in seeing/testing were the spatial features. Interestingly,

Explore

### Not a “me-too announcement” blog on Azure SQL Database V12 preview

In general, I usually hate “me too” announcement blog posts. Over the years, I’ve considered it less than useful to simply repeat “Product XXX released

Explore

### AzureML: What components are used by the sample experiments?

A few months ago, I embarked on a project to learn more about data mining, machine learning and, as a prerequisite, statistics. I was tired

Explore

### Book Review: Chris Webb’s “Power Query for Power BI and Excel”

I’ve been experimenting with Power Query and the rest of the Power BI suite in Excel (desktop edition) since it was originally released. But, like

Explore

## Imagine feeling confident enough to handle whatever your database throws at you.

With training and consulting from SQLskills, you’ll be able to solve big problems, elevate your team’s capacity, and take control of your data career.