Among the many new features introduced in SQL Server 2012 are enhanced windowing functions. The new functionality allows us to use the ORDER BY clause inside the OVER clause with aggregate functions, and new ROWS and RANGE clauses were introduced to limit the rows being processed. ORDER BY defines the order in which rows are processed, and the ROWS/RANGE clauses limit the rows processed within a partition. All the details of the OVER clause can be found on MSDN: OVER Clause (Transact-SQL).
ROWS/RANGE clause
The ROWS clause limits the rows in a partition by specifying a fixed number of rows preceding or following the current row. Which rows precede and which follow is determined by the order specified in the ORDER BY clause.
The limit can be specified by several methods:
- <unsigned integer> PRECEDING - fixed number of preceding rows
- CURRENT ROW - representing the current row being processed
- UNBOUNDED PRECEDING - all previous records
- <unsigned integer> FOLLOWING - fixed number of following rows
- UNBOUNDED FOLLOWING - all rows following the current row
So we can specify the limits like:
ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
RANGE CURRENT ROW
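To see one of these limits inside a complete query, here is a minimal sketch that does not depend on any table. The inline data set and the names Id, Amount and WindowSum are made up for illustration only.

```sql
-- Illustrative data only: Id and Amount are hypothetical columns.
SELECT
    v.Id
    ,v.Amount
    -- Up to 3 rows before the current row, the current row, and 1 row after it
    ,SUM(v.Amount) OVER (ORDER BY v.Id
                         ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS WindowSum
FROM (VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50)) AS v(Id, Amount)
ORDER BY v.Id;
-- WindowSum by Id: 1 -> 30, 2 -> 60, 3 -> 100, 4 -> 150, 5 -> 140
```

Note that the frame simply shrinks at the edges of the partition: the first row has no preceding rows and the last row has no following row.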
The RANGE clause can only be used with the UNBOUNDED limits and CURRENT ROW. The difference between the ROWS and RANGE clauses is that ROWS works with physical rows, while RANGE works with a range of rows based on the value of the current row in terms of the ORDER BY clause. This means that for the ROWS clause, CURRENT ROW represents only the single row being processed. For RANGE, CURRENT ROW represents all the rows within the current partition that have the same values in the ORDER BY fields as the row being processed. So if we use RANGE and multiple rows share the same position in the order within the partition, all of those rows together represent the current row.
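The difference can be sketched on a tiny inline data set in which two rows share the same ordering value. The names OrderKey, Amount, RowsSum and RangeSum and the values are assumptions made for this illustration.

```sql
-- Illustrative data only: two rows share OrderKey = 2.
SELECT
    v.OrderKey
    ,v.Amount
    ,SUM(v.Amount) OVER (ORDER BY v.OrderKey
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RowsSum
    ,SUM(v.Amount) OVER (ORDER BY v.OrderKey
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RangeSum
FROM (VALUES (1, 10), (2, 20), (2, 30), (3, 40)) AS v(OrderKey, Amount)
ORDER BY v.OrderKey;
-- RangeSum: 10, 60, 60, 100 (both rows with OrderKey = 2 see the same frame)
-- RowsSum:  10 for the first row and 100 for the last one; the two tied rows
--           get 30 and 60, or 40 and 60, depending on the order in which
--           they happen to be processed
```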
When no ROWS/RANGE clause is specified after the ORDER BY clause, SQL Server uses the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
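A small sketch of this default, again on a made-up inline data set (OrderKey, Amount and the column aliases are hypothetical). The first two columns should always match, while an OVER clause without ORDER BY covers the whole partition.

```sql
-- Illustrative data only: two rows share OrderKey = 2.
SELECT
    v.OrderKey
    ,v.Amount
    -- ORDER BY without ROWS/RANGE: the default frame applies
    ,SUM(v.Amount) OVER (ORDER BY v.OrderKey) AS DefaultFrameSum
    -- The same frame written out explicitly
    ,SUM(v.Amount) OVER (ORDER BY v.OrderKey
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ExplicitRangeSum
    -- No ORDER BY at all: every row sees the whole partition
    ,SUM(v.Amount) OVER () AS WholePartitionSum
FROM (VALUES (1, 10), (2, 20), (2, 30), (3, 40)) AS v(OrderKey, Amount)
ORDER BY v.OrderKey;
-- DefaultFrameSum and ExplicitRangeSum: 10, 60, 60, 100
-- WholePartitionSum: 100 in every row
```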
Samples of how to use the window functions
Let’s take a look at a few samples of how we can use the window functions and what results they provide.
Test data preparation
To be able to test the new functionality, we first create a test database with two tables and fill them with sample data:
--======================
-- Create test database
--======================
CREATE DATABASE WindowFunctionsTest
GO

USE WindowFunctionsTest
GO

--Create Testing Tables
CREATE TABLE [dbo].[Accounts](
    [TransactionID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    [TransactionDate] [datetime] NULL,
    [Balance] [float] NULL
)
GO

CREATE TABLE [dbo].[MultiAccounts](
    [TransactionID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    [AccountID] [int] NOT NULL,
    [TransactionDate] [datetime] NULL,
    [Balance] [float] NULL
)
GO

--Fill test tables with data
INSERT INTO [dbo].[Accounts]( [TransactionDate], [Balance] )
SELECT '2000-1-1', 100 UNION ALL
SELECT '2000-1-1', -50 UNION ALL
SELECT '2000-1-2', 200 UNION ALL
SELECT '2000-1-3', 500 UNION ALL
SELECT '2000-1-4', -200 UNION ALL
SELECT '2000-1-5', 1000 UNION ALL
SELECT '2000-1-5', -300 UNION ALL
SELECT '2000-1-6', -300 UNION ALL
SELECT '2000-1-7', -200 UNION ALL
SELECT '2000-1-8', 2000 UNION ALL
SELECT '2000-1-9', 100 UNION ALL
SELECT '2000-1-10', -50 UNION ALL
SELECT '2000-1-10', 500 UNION ALL
SELECT '2000-1-11', 200 UNION ALL
SELECT '2000-1-12', 200 UNION ALL
SELECT '2000-1-13', 1000 UNION ALL
SELECT '2000-1-14', 1000 UNION ALL
SELECT '2000-1-15', -500 UNION ALL
SELECT '2000-1-15', -300 UNION ALL
SELECT '2000-1-16', 1000 UNION ALL
SELECT '2000-1-17', 1000 UNION ALL
SELECT '2000-1-18', -800 UNION ALL
SELECT '2000-1-19', 2000 UNION ALL
SELECT '2000-1-20', -1000
GO

INSERT [dbo].[MultiAccounts] ( [AccountID], [TransactionDate], [Balance] )
SELECT 1, '2000-1-1', 100 UNION ALL
SELECT 1, '2000-1-1', -50 UNION ALL
SELECT 1, '2000-1-2', 200 UNION ALL
SELECT 1, '2000-1-3', 500 UNION ALL
SELECT 1, '2000-1-4', -200 UNION ALL
SELECT 1, '2000-1-5', 1000 UNION ALL
SELECT 1, '2000-1-5', -300 UNION ALL
SELECT 1, '2000-1-6', -300 UNION ALL
SELECT 1, '2000-1-7', -200 UNION ALL
SELECT 2, '2000-1-1', 2000 UNION ALL
SELECT 2, '2000-1-2', 100 UNION ALL
SELECT 2, '2000-1-3', -50 UNION ALL
SELECT 2, '2000-1-4', 500 UNION ALL
SELECT 2, '2000-1-5', 200 UNION ALL
SELECT 2, '2000-1-6', 200 UNION ALL
SELECT 2, '2000-1-7', 1000 UNION ALL
SELECT 2, '2000-1-7', 1000 UNION ALL
SELECT 3, '2000-1-1', 800 UNION ALL
SELECT 3, '2000-1-2', -300 UNION ALL
SELECT 3, '2000-1-3', 1000 UNION ALL
SELECT 3, '2000-1-4', 1000 UNION ALL
SELECT 3, '2000-1-5', -800 UNION ALL
SELECT 3, '2000-1-6', 2000 UNION ALL
SELECT 3, '2000-1-7', -1000
GO
Window functions samples
If we run any of the queries below, they will all return the same results:
--Using the ROWS clause
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO

--The same as above: ROWS UNBOUNDED PRECEDING will be completed by SQL Server to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--If we specify only the left boundary, SQL Server automatically fills in the right boundary
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS UNBOUNDED PRECEDING) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO

--Using the RANGE clause
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO

--The same as above: RANGE UNBOUNDED PRECEDING will be completed by SQL Server to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--If we specify only the left boundary, SQL Server automatically fills in the right boundary
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID RANGE UNBOUNDED PRECEDING) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO

--No ROWS/RANGE clause (SQL Server will use the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
The results are below, and we can see that the correct cumulative balance is calculated.
TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    100
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1550
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100
ROWS clause with non-unique order
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate
GO
The results are the same as in the previous example. They are the same only because there is no parallelism and we have a CLUSTERED INDEX on TransactionID; otherwise the final order could be different, because the order of rows with the same TransactionDate is not guaranteed here.
TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    100
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1550
7             2000-01-05 00:00:00.000 -300                   1250
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100
RANGE clause with non-unique order
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
Here we see that the results are quite different. The final sum is the same, but the intermediate sums are not.
TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    50
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1250
7             2000-01-05 00:00:00.000 -300                   1250
8             2000-01-06 00:00:00.000 -300                   950
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
17            2000-01-14 00:00:00.000 1000                   5700
18            2000-01-15 00:00:00.000 -500                   4900
19            2000-01-15 00:00:00.000 -300                   4900
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100
Here we can see that RANGE works as described above. All rows with the same value in the ORDER BY clause are considered the current row. Therefore, for the dates '2000/01/01', '2000/01/05' and '2000/01/15', the value is the same for every row of that date.
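One way to read this behavior is that, with RANGE, every row of a given date carries the balance at the end of that date. As a hedged sketch against the [dbo].[Accounts] test table (the alias EndOfDayBalance is made up here), DISTINCT can collapse the repeated values into one row per date:

```sql
-- Assumes the [dbo].[Accounts] test table created earlier in this post.
-- Window functions are evaluated before DISTINCT, so rows of the same date
-- carry identical values and collapse into a single row.
SELECT DISTINCT
    [TransactionDate]
    ,SUM(Balance) OVER (ORDER BY TransactionDate
                        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS EndOfDayBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate
GO
```

With the test data this should return one row per date, for example 50 for 2000-01-01 and 1250 for 2000-01-05, matching the values in the result set above.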
Working with FOLLOWING Rows
All the examples above worked with the current row and all previous rows. Besides this, we can also work with the rows that follow the current row in a particular order.
Here are a few more examples that also incorporate FOLLOWING rows.
--Sum of current row and all following rows
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

--SUM of 1 preceding, current and one following row
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

--SUM of all rows in each row
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER () AS FinalBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
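Since no results are listed for these queries, the following sketch applies the same three frames to a tiny inline data set where the sums are easy to verify by hand. The data and the names Id, Amount, RemainingSum, NeighbourSum and TotalSum are assumptions made for illustration.

```sql
-- Illustrative data only: four rows with a unique ordering column.
SELECT
    v.Id
    ,v.Amount
    -- Current row and everything after it
    ,SUM(v.Amount) OVER (ORDER BY v.Id
                         ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS RemainingSum
    -- One row before, the current row, one row after
    ,SUM(v.Amount) OVER (ORDER BY v.Id
                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS NeighbourSum
    -- Whole result set in every row
    ,SUM(v.Amount) OVER () AS TotalSum
FROM (VALUES (1, 10), (2, 20), (3, 30), (4, 40)) AS v(Id, Amount)
ORDER BY v.Id;
-- RemainingSum: 100, 90, 70, 40
-- NeighbourSum: 30, 60, 90, 70
-- TotalSum:     100 in every row
```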
Example with Partitioning results
In the previous examples we worked with a single partition. The OVER clause also allows partitioning the results, so let's look at a few examples with partitioning.
--Sum of current row and all following rows, partitioned by AccountID
SELECT
    [TransactionID]
    ,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate, TransactionID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of 1 preceding, current and one following row, partitioned by AccountID
SELECT
    [TransactionID]
    ,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate, TransactionID ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of all rows of the account in each row
SELECT
    [TransactionID]
    ,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID]) AS FinalBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of all preceding rows and the current row, partitioned by AccountID; order is based only on TransactionDate - using RANGE
SELECT
    [TransactionID]
    ,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate
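The key effect of PARTITION BY is that the frame never crosses a partition boundary, so a running sum restarts for each account. A minimal sketch on made-up inline data (AccountID, Seq, Amount and RunningSum are hypothetical names):

```sql
-- Illustrative data only: two accounts with two transactions each.
SELECT
    v.AccountID
    ,v.Seq
    ,v.Amount
    -- The running sum restarts whenever AccountID changes
    ,SUM(v.Amount) OVER (PARTITION BY v.AccountID
                         ORDER BY v.Seq
                         ROWS UNBOUNDED PRECEDING) AS RunningSum
FROM (VALUES (1, 1, 100), (1, 2, -50), (2, 1, 2000), (2, 2, 100)) AS v(AccountID, Seq, Amount)
ORDER BY v.AccountID, v.Seq;
-- RunningSum: account 1 -> 100, 50; account 2 -> 2000, 2100
```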
When to use ROWS and when RANGE
Now we can ask when we should use the ROWS clause and when the RANGE clause to limit the rows. The answer comes from the definition of how the ROWS and RANGE clauses work. As described, ROWS works with each individual row, while RANGE handles all rows with the same order position as the current row together.
So if the combination of fields specified in the ORDER BY clause does not uniquely determine the order of rows (as in the examples above where only TransactionDate was used), then you should use RANGE, as the processing order of rows with the same order position is not guaranteed. If the rows are uniquely identified, then ROWS should be used, as there are no rows with the same order position in the partition.
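If it is unclear whether the ORDER BY fields are unique, a simple grouping query can check for ties before choosing between ROWS and RANGE. This is only a sketch against the [dbo].[Accounts] test table; the alias RowsWithSameDate is made up.

```sql
-- Assumes the [dbo].[Accounts] test table created earlier in this post.
-- Lists every TransactionDate that occurs more than once, i.e. every
-- value for which ORDER BY TransactionDate alone leaves the order undefined.
SELECT
    [TransactionDate]
    ,COUNT(*) AS RowsWithSameDate
FROM [dbo].[Accounts]
GROUP BY [TransactionDate]
HAVING COUNT(*) > 1
ORDER BY [TransactionDate]
GO
```

With the test data this should return four dates (2000-01-01, 2000-01-05, 2000-01-10 and 2000-01-15), each with two rows. An empty result would mean the ordering is unique for the current data, although only a unique constraint can guarantee that it stays so.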
Conclusion
The new windowing functions bring new possibilities to writing T-SQL queries and can simplify many tasks that were problematic to write without these constructs. They allow us to avoid recursive CTEs and other solutions for calculating running totals or averages without knocking down the server, and they also allow us to avoid quirky updates and CLR solutions, which have some pitfalls.
In my next post I will take a closer look at the Running Totals problem using this new windowing functionality. I will also take a closer look at the query plans produced by these constructs and give some advice on using them.