Learn Basic Data Analysis with SQL Window Functions

Total: 9 Average: 3.7

This article is about T-SQL (Transact-SQL) Window functions and their basic use in day-to-day data analysis tasks.

There are many alternatives to T-SQL when it comes to data analysis. However, when improvements over time and introduction of Window functions are considered, T-SQL is capable of performing data analysis on a basic level and, in some cases, even beyond that.

About SQL Window Functions

First, let’s first get familiar with SQL Window functions in the context of the Microsoft documentation.

Microsoft Definition

A window function computes a value for each row in the window.

Simple Definition

A window function helps us to focus on a particular portion (window) of the result set so that we can perform data analysis on that specific part (window) only, rather than on the entire result set.

In other words, SQL window functions turn a result set into several smaller sets for data analysis purposes.

What is a result set

Simply put, a result set consists of all records retrieved by running a SQL query.

For example, we can create a table named Product and insert the following data into it:

Now, the result set retrieved by using the script below will contain all rows from the Product table:

Result Set

What is a Window

It is important to first understand the concept of a window as it relates to SQL window functions. In this context, a window is just a way of narrowing down your scope by targeting a specific part of the result set (as we already mentioned above).

You may be wondering now – what does ‘targeting a specific part of the result set’ actually means?

Returning to the example we looked at, we can create a SQL window based on the product region by dividing the result set into two windows.

Understanding Row_Number()

To proceed, we will need to use the Row_Number() function which temporarily gives a sequence number to the output rows.

For example, if we want to add a row numbers to the result set based on ProductID, we’ll need to use ROW_NUMBER() to order it by Product ID as follows:

Row-Number-by-Product-ID

Now, if we want the Row_Number() function to order the result set by ProductID descending, then the sequence of output rows based on ProductID will change as follows:

Row-Number-by-Product-ID-Sorted

There are no SQL windows yet since the only thing we’ve done is ordering the set by specific criteria. As discussed earlier, windowing means breaking the result set into several smaller sets to analyze each one of them separately.

Creating a Window with Row_Number()

To create a SQL window in our result set, we will need to to partition it based on any of the columns it contains.

We can now partition the result set by region as follows:

SQL-Window-Partition-by-Region

Select – Over Clause

In other words, Select with the Over clause paves the way for SQL window functions by partitioning a result set into smaller windows.

According to the Microsoft documentation, Select with the Over clause defines a window which can then be used by any window function.

Now, let’s create a table called KitchenProduct as follows:

Now, let’s view the table:

Kitchen-Product-Table

If you want to see each product with its own serial number rather than a number based on the generalized product ID, then you would have to use a SQL window function to partition the result set by product as follows:

Kitchen-Products-Select-Over

Compatibility (Select – Over Clause)

According to Microsoft documentation, Select – Over Clause is compatible with the following SQL database versions:

  1. SQL Server 2008 and upward
  2. Azure SQL Database
  3. Azure SQL Data Warehouse
  4. Parallel Data Warehouse

Syntax

SELECT – OVER (Partition by <column> Order by <column>)

Please note that I have simplified the syntax to make it easy to understand; please refer to the Microsoft documentation to see the full syntax.

Pre-requisites

This article is basically written for beginners, but there are still some pre-requisites which must be kept in mind.

Familiarity with T-SQL

This article assumes that the readers have a basic knowledge of T-SQL and are capable of writing and running basic SQL scripts.

Setup the Sales sample table

This article requires the following sample table so that we can run our SQL window function examples:

View all sales by running the following script:

Sales-Table

Group By vs SQL Window Functions

One can wonder – what is the difference between using the Group By clause and SQL window functions?

Well, the answer lies in the examples below.

Group By Example

In order to see total sales by product, we can use Group By as follows:

Total-Sales-Grouped-by-Product

So, the Group By clause helps us see the total sales. The total sales value is the sum of revenue for all similar products in the same row with no Group By clause used. What if we are interested in seeing the revenue (sale) of each individual product along with total sales?

This is where SQL window functions come into action.

SQL Window Function Example

In order to see the product, revenue and total revenue by all similar products, we have to partition the data on a by-product basis using OVER() as follows:

The output should be as follows:

Total-Sales-and-Revenue-Partiton-Inside-Over

So, we can now easily see the sales for each individual product along with total sales for that product. For example, the revenue for PC is 100.00 but total sales (sum of revenue for the PC product) is 300.00 because two different PC models were being sold.

Basic Analysis with the Aggregate Functions

Aggregate functions return a single value after performing calculations on a set of data.

In this section, we are going to further explore SQL window functions – specifically, by using them along with aggregate functions to perform basic data analysis.

Common Aggregate Functions

The most common aggregate functions are:

  1. Sum
  2. Count
  3. Min
  4. Max
  5. Avg (Average)

Aggregate Data Analysis by Product

In order to analyse the result set on a by-product basis with the help of aggregate functions, we simply have to use an aggregate function with a by-product partition inside of the OVER() statement:

Data-Analysis-With-Aggregate-Functions

If you take a closer look at the PC or Laptop products, you will see how aggregate functions are working together alongside the SQL window function.

In the example above, we can see that the Revenue value for PC is 100.00 first time and 200.00 next time, but Total Sales amount to 300.00. The similar information can be seen for the rest of the aggregate functions.

Aggregate Data Analysis by Date

Now, let’s perform some data analysis of the products on a by-date basis using SQL window functions in combination with aggregate functions.

This time, we are going to partition the result set by date rather than by product as follows:

Data-Analysis-With-Aggregate-Functions-2

With this, we have learned basic data analysis techniques using the SQL window functions approach.

Things to do

Now that you are familiar with SQL window functions, please try the following:

  1. Keeping in mind the examples we looked at, perform basic data analysis using SQL window functions on the sample database mentioned in this article.
  2. Adding a Customer column to the Sales sample table and see how rich your data analysis can become when another column (customer) is added to it.
  3. Adding a Region column to the Sales sample table and perform basic data analysis using aggregate functions by region.

Haroon Ashraf

Haroon's deep interest in logic and reasoning at an early age of his academic career paved his path to become a data professional.

He began his professional life as a computer programmer more than 10 years ago working on his first data venture to migrate and rewrite a public sector database driven examination system from IBM AS400 (DB2) to SQL Server 2000 using VB 6.0 and Classic ASP along with developing reports and archiving many years of data.

His work and interest revolves around Database-Centric Architectures and his expertise include database and reports design, development, testing, implementation and migration along with Database Life Cycle Management (DLM).
Haroon Ashraf