Quantcast
Channel: T-SQL – Pavel's SQL Pages
Viewing all articles
Browse latest Browse all 18

MS SQL 2012 Window functions introduction

$
0
0

Among lot of new features introduced in SQL Server 2012 also a new Windowing functions were introduced. The new functionality allow us to use the ORDER BY clause in the OVER clause with aggregate functions and also new ROWS and RANGE clauses were introduced to limit rows. The ORDER BY allow us define the order of rows processing and the ROWS/RANGE clauses put limits on the rows being processed in partition. All the details related to the OVER clause you can find on MSDN: OVER Clause (Transact-SQL).

ROWS/RANGE clause

The ROWS clause limits the rows in a parittion by specifying a fixed number of rows preceding or folowing the current rows. The rows preceeding and following are determined by the order specified in the ORDER BY clause.

The limit can be specified by serveral methods:

  • <unsigned integer> PRECEDING -fixed number of preceding rows
  • CURRENT ROW - representing current row being processed
  • UNBOUNDED PRECEDING - all previous records
  • <unsigned integer> FOLLOWING - fixed number of following rows
  • UNBOUNDED FOLLOWING - all rows following current row

So we can specify the limits like

ROWS BETWEEN 3 PRECEEDING AND 1 FOLLOWING
ROWS BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW
ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING

RANGE BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
RENGE CURRENT ROW

The RANGE clause can be only used with the UNBOUNDED limit and CURRENT ROW. The difference between ROWS and RANGE clause is, that ROWS works with physical rows and RANGE works with range of rows based on the current row value in the terms of ORDER BY clause. This means that for ROWS clause the CURRENT ROW represents the only current row being processed. For RANGE the CURRENT ROW represents all the rows with the same value in the fields specified in the ORDER BY clause within current partition as the current row being processed. So if we use RANGE and multiple rows have the same rank in the terms of order within the partition, then all those rows will represent current row.

When there is no ROWS/RANGE clause specified after the ORDER BY clause, then the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is used by SQL Server.

Samples how to use the window functions

Let’s take a look on a few samples, how we can use the window functions and what results they will provide.

Test data preparation

To be able to test the new functionality

--======================
-- Create test database
--======================
CREATE DATABASE WindowFunctionsTest
GO
USE WindowFunctionsTest
GO 

--Create Testing Tables
CREATE TABLE [dbo].[Accounts](
	[TransactionID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
	[TransactionDate] [datetime] NULL,
	[Balance] [float] NULL
)
GO
CREATE TABLE [dbo].[MultiAccounts](
	[TransactionID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
	[AccountID] [int] NOT NULL,
	[TransactionDate] [datetime] NULL,
	[Balance] [float] NULL
)
GO
--Fill test tables with data
INSERT INTO [dbo].[Accounts](
	[TransactionDate],
	[Balance]
)
SELECT '2000-1-1', 100 UNION ALL
SELECT '2000-1-1', -50 UNION ALL
SELECT '2000-1-2', 200 UNION ALL
SELECT '2000-1-3', 500 UNION ALL
SELECT '2000-1-4', -200 UNION ALL
SELECT '2000-1-5', 1000 UNION ALL
SELECT '2000-1-5', -300 UNION ALL
SELECT '2000-1-6', -300 UNION ALL
SELECT '2000-1-7', -200 UNION ALL
SELECT '2000-1-8', 2000 UNION ALL
SELECT '2000-1-9', 100 UNION ALL
SELECT '2000-1-10', -50 UNION ALL
SELECT '2000-1-10', 500 UNION ALL
SELECT '2000-1-11', 200 UNION ALL
SELECT '2000-1-12', 200 UNION ALL
SELECT '2000-1-13', 1000 UNION ALL
SELECT '2000-1-14', 1000 UNION ALL
SELECT '2000-1-15', -500 UNION ALL
SELECT '2000-1-15', -300 UNION ALL
SELECT '2000-1-16', 1000 UNION ALL
SELECT '2000-1-17', 1000 UNION ALL
SELECT '2000-1-18', -800 UNION ALL
SELECT '2000-1-19', 2000 UNION ALL
SELECT '2000-1-20', -1000
GO

INSERT [dbo].[MultiAccounts] (
	[AccountID],
	[TransactionDate],
	[Balance]
)
SELECT 1, '2000-1-1', 100 UNION ALL
SELECT 1, '2000-1-1', -50 UNION ALL
SELECT 1, '2000-1-2', 200 UNION ALL
SELECT 1, '2000-1-3', 500 UNION ALL
SELECT 1, '2000-1-4', -200 UNION ALL
SELECT 1, '2000-1-5', 1000 UNION ALL
SELECT 1, '2000-1-5', -300 UNION ALL
SELECT 1, '2000-1-6', -300 UNION ALL
SELECT 1, '2000-1-7', -200 UNION ALL
SELECT 2, '2000-1-1', 2000 UNION ALL
SELECT 2, '2000-1-2', 100 UNION ALL
SELECT 2, '2000-1-3', -50 UNION ALL
SELECT 2, '2000-1-4', 500 UNION ALL
SELECT 2, '2000-1-5', 200 UNION ALL
SELECT 2, '2000-1-6', 200 UNION ALL
SELECT 2, '2000-1-7', 1000 UNION ALL
SELECT 2, '2000-1-7', 1000 UNION ALL
SELECT 3, '2000-1-1', 800 UNION ALL
SELECT 3, '2000-1-2', -300 UNION ALL
SELECT 3, '2000-1-3', 1000 UNION ALL
SELECT 3, '2000-1-4', 1000 UNION ALL
SELECT 3, '2000-1-5', -800 UNION ALL
SELECT 3, '2000-1-6', 2000 UNION ALL
SELECT 3, '2000-1-7', -1000
GO

Window functions samples

If we try any of below queries they will provide the same results

--Using the ROWS clause
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--The same as abowe the ROWS UNBOUNDED PRECEDING will be completed by SQL Server to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--If we specify only the left boundary, SQL Servers automaticaly fills the right BOUNDARY
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS UNBOUNDED PRECEDING ) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--Using the RANGE Clause
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--The same as above as RANGE UNBOUNDED PRECEDING AND CURRENT ROW will be complete by SQL Server as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--If we specify only the left boundary, SQL Servers automaticaly fills the right BOUNDARY
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID RANGE UNBOUNDED PRECEDING ) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO
--NO ROWS/RANGE Clause (SQL Server will use the DEFAULT RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO

Results are below and we can see, a correct cumulative balance is calculated.

TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    100
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1550
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100

ROWS clause with not unique order

SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate
GO

Results will be the same as in previous example. They are the same because there is no parallelism and we have CLUSTERED INDEX on the TransactionID (otherwise the final order could be different because the order of rows with the same TransactionDate is not guaranteed here.

TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    100
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1550
7             2000-01-05 00:00:00.000 -300                   1250
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100

RANGE Clause with not unique order

SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID
GO

Here we see, that the results are quite different. The final sum is the same, but the intermediate are not.

TransactionID TransactionDate         Balance                CummulativeBalance
------------- ----------------------- ---------------------- ----------------------
1             2000-01-01 00:00:00.000 100                    50
2             2000-01-01 00:00:00.000 -50                    50
3             2000-01-02 00:00:00.000 200                    250
4             2000-01-03 00:00:00.000 500                    750
5             2000-01-04 00:00:00.000 -200                   550
6             2000-01-05 00:00:00.000 1000                   1250
7             2000-01-05 00:00:00.000 -300                   1250
8             2000-01-06 00:00:00.000 -300                   950
.             .                       .                      .
.             .                       .                      .
.             .                       .                      .
17            2000-01-14 00:00:00.000 1000                   5700
18            2000-01-15 00:00:00.000 -500                   4900
19            2000-01-15 00:00:00.000 -300                   4900
20            2000-01-16 00:00:00.000 1000                   5900
21            2000-01-17 00:00:00.000 1000                   6900
22            2000-01-18 00:00:00.000 -800                   6100
23            2000-01-19 00:00:00.000 2000                   8100
24            2000-01-20 00:00:00.000 -1000                  7100

Here we can see, that the RANGE works as described above. All rows with the same value in the ORDER BY clause are considered as current row. Therefore for the dates ’2000/01/01′ ,  ’2000/01/05′ and ’2000/01/15′ the values for each date are the same.

Working with FOLLOWING Rows

All the examples above worked with current row and all previous rows. Except this we can even work with rows following current row in particular order.

Here are a few other examples incorporating also FOLLOWING rows.

--Sum of current row and all following rows
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

--SUM of 1 preceding, current and one following row
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (ORDER BY TransactionDate, TransactionID ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS CummulativeBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

--SUM of all rows in each row
SELECT
    [TransactionID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER () AS FinalBalance
FROM [dbo].[Accounts]
ORDER BY TransactionDate, TransactionID

Example with Partitioning results

In previous examples we have worked with single partition. The OVER clause also allows partitioning the results. So let see some a few examples with partitioning.

--Sum of current row and all following rows partitioned, by AccountID
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate, TransactionID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of 1 preceding, current and one following row, partitioned by AccountID
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate, TransactionID ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of all rows in each row
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID]) AS FinalBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate, TransactionID

--SUM of all preceeding and current row, partitioned by AccountID order is based only on TransactionDate - using RANGE
SELECT
    [TransactionID]
	,[AccountID]
    ,[TransactionDate]
    ,[Balance]
    ,SUM(Balance) OVER (PARTITION BY [AccountID] ORDER BY TransactionDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CummulativeBalance
FROM [dbo].[MultiAccounts]
ORDER BY AccountID, TransactionDate

When to use ROWS and when RANGE

Now we can ask, when we should use the ROWS clause and when RANGE clause to limit the rows. The answer comes fro the definition how the ROWS and RANGE clauses works. As described, ROWS works with each unique rows and RANGE handles multiple rows with the same order position as current row.

So in case the combination of fields specified in the ORDER BY clause does not uniquely specify the order of rows (as in case of examples above when only TransactionDate was used), then you should use RANGE, as the processing order of rows with the same order position is not guaranteed. In case the rows are uniquely identified, then ROWS should be used as there are no rows with the same order in the partition.

Conclusion

The new windowing functions brings new possibilities in writing T-SQL queries can simplify a lot of tasks which were problematic to write without these constructs. It allow us to bypass the recursive CTE, other solutions for calculation of running totals or averages without knocking down the server and also allow us to bypass quirky updates, CLR solutions which have some pitfalls when are used.

In my next post I will take a closer look on the Running Totals problem when using this new windowing functionality. Also I will take a closer look on the query plans produced by those constructs and give some advices for using them.


Viewing all articles
Browse latest Browse all 18

Trending Articles