Written by 00:54 Languages & Coding, T-SQL

Calculate the median by using Transact SQL

The statistical median is the value which separates a dataset into halves – one comprises greater values, and the other comprises lesser ones. For a specified dataset, it can be considered as the “middle” value. For example, in the dataset {1, 3, 3, 4, 5, 6, 7, 8, 9}, the median is 5, which is fourth largest, and fourth smallest number in the dataset.

To calculate the median of any dataset, we first need to arrange all values from the dataset in a specific order. After arranging the data, we must determine the middle value of the specified dataset. If the dataset contains an odd number of values, than the middle value of the entire dataset will be considered as a median. Following is the example:

Median (M) = value of ((X + 1)/2) th item. (x is the number of values in the dataset)

For example, if we calculate the median of the dataset {1,2,3,4,6,7,8} by using the above formula, then Median (M)= (7+1)/2=4. So, the fourth value of the specified dataset, which is 4 in this case, is the median.

If the dataset contains an even number of rows, we must determine the two middle values of this dataset, add them together, and divide the result by two. The following formula is used to calculate the median:

Median (M) = value of [(x/2) th item + (x/2 + 1) th item]/2 (x is the number of values in the dataset)

For example, if we apply this formula to the dataset {1,2,4,6,8,10}, then the median value is calculated as shown below:

Median (M)= [ 6/2 ] = 3rd value of the dataset + [ 6/2 + 1 ]= 4th value of the dataset.

= (4+6)/2 = 5. So, the median value in this case is 5.

Usually, it is difficult to calculate the median value of a specified dataset by using Transact SQL because there is no built-in function available for this purpose. But we can make it easier by using the Rank function and common table expressions.

In this article, I am going to describe the following steps:

  1. Preparing the Demo setup.
  2. Calculating the median by using the RANK() function and CTE (common table expressions).
  3. Calculating the median by using PERCENTILE_COUNT(). It applies to SQL Server 2012 and later versions.

Preparing the Demo setup

To demonstrate how the median can be calculated with the help of Transact SQL, let’s first prepare a demo setup. To do that, we’ll need to create a SQL Table named “Invoice.” The Invoice table contains the invoice number, Sale quantity, and the Price column. Execute the following code to create the Invoice table.

CREATE TABLE INVOICE 
  ( 
     ID            INT IDENTITY(1, 1), 
     INVOICENUMBER VARCHAR(50), 
     PRICE         NUMERIC(10, 2), 
     QUANTITY      NUMERIC(10, 2) 
  )

Now insert some dummy data by executing the following query.

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (1, N'DES1112015794', CAST(9.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (2, N'DES1112015794', CAST(10.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (3, N'DES1112015794', CAST(11.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (4, N'DES1112015794', CAST(12.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (5, N'DES1112015794', CAST(12.00 AS NUMERIC(10, 2)), CAST(2.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (6, N'DES1112015794', CAST(18.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (7, N'DES1112018052', CAST(3.00 AS NUMERIC(10, 2)), CAST(2.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (8, N'DES1112018052', CAST(5.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (9, N'DES1112018052', CAST(12.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (10, N'DES1112018052', CAST(18.00 AS NUMERIC(10, 2)), CAST(2.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (11, N'DES1112019188', CAST(20.00 AS NUMERIC(10, 2)), CAST(5.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (12, N'DES1112019188', CAST(23.00 AS NUMERIC(10, 2)), CAST(5.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (13, N'DES1112021986', CAST(27.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (14, N'DES1112022597', CAST(9.00 AS NUMERIC(10, 2)), CAST(5.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (15, N'DES1112022597', CAST(12.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (16, N'DES1112022597', CAST(36.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (17, N'DES1112022597', CAST(155.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))
INSERT [DBO].[INVOICE] ([ID], [INVOICENUMBER], [PRICE], [QUANTITY]) VALUES (18, N'DES1112024469', CAST(95.00 AS NUMERIC(10, 2)), CAST(1.00 AS NUMERIC(10, 2)))

Execute the following query to review the data:

SELECT INVOICENUMBER, 
       PRICE, 
       QUANTITY 
FROM   [INVOICE]

This is the output we should get:

Demo Table

Now we want to calculate the median of the price of an individual invoice number. For example, Invoice XYZ123 has more than five items, so we will generate the median price of all items in this invoice.

Calculate the median by using the ranking function and common table expressions

If you are using SQL Server 2000/ 2005 / 2008, you can calculate the median by using 50 percent highest value and 50 percent lowest values. This method is quite simple and can be used to calculate the median value of an odd or even number of rows within a specified dataset.

To calculate this, execute the following query:

SELECT ( (SELECT MAX(PRICE), 
                 INVOICENUMBER 
          FROM   (SELECT TOP 50 PERCENT PRICE, 
                                        INVOICENUMBER 
                  FROM   INVOICE 
                  GROUP  BY INVOICENUMBER 
                  ORDER  BY INVOICENUMBER) AS BOTTOMHALF) 
         + (SELECT MIN(PRICE), 
                   INVOICENUMBER 
            FROM   (SELECT TOP 50 PERCENT PRICE, 
                                          INVOICENUMBER 
                    FROM   INVOICE 
                    GROUP  BY INVOICENUMBER 
                    ORDER  BY INVOICENUMBER DESC) AS TOPHALF) ) / 2 AS MEDIAN

This is the output we should get:

Calculate the Median with Ranking Function

The above value is the median of all rows from the invoice table. Now we want to calculate the median value of the price of items from an individual invoice number.

To do that, execute the following query:

SELECT INVOICENUMBER, 
       AVG(PRICE) AS MEDIANVAL 
FROM   (SELECT INVOICENUMBER, 
               PRICE, 
               ROW_NUMBER() 
                 OVER ( 
                   PARTITION BY INVOICENUMBER 
                   ORDER BY PRICE ASC, INVOICENUMBER ASC) AS ROWASC, 
               ROW_NUMBER() 
                 OVER ( 
                   PARTITION BY INVOICENUMBER 
                   ORDER BY PRICE DESC)                   AS ROWDESC 
        FROM   INVOICE SOH) X 
WHERE  ROWASC IN ( ROWDESC, ROWDESC - 1, ROWDESC + 1 ) 
GROUP  BY INVOICENUMBER 
ORDER  BY INVOICENUMBER;

The output is as follows:

Calculate the Median with Ranking Function 2

As you can see in the image above, 24.00 is the median value of the Invoice “DES1112022597.”

Calculate the median by using PERCENTILE_COUNT (SQL Server 2012 and later versions)

An analytic function called PERCENTILE_CONT was introduced in SQL Server 2012. It is capable of calculating the median within a partitioned set. It calculates the median when we pass 0.5 as an argument and specify the order within that dataset.

The following query demonstrates how to use this function:

SELECT INVOICENUMBER, 
       PRICE, 
       PERCENTILE_CONT(0.5) 
         WITHIN GROUP (ORDER BY PRICE) OVER ( 
           PARTITION BY INVOICENUMBER) AS MEDIANCONT 
FROM   INVOICE 
ORDER  BY INVOICENUMBER DESC

This is the output we should get:

Calculate the Median with PERCENTILE_COUNT

As you can see in the image above, Invoice DES1112022597 has four items in total and the median value for the Price column within this invoice equals to the the sum of Price in row 3 and Price in row 4 divided by two. With the right numbers in place, we get (36+12)/2 =24.00. Similarly, the median value for invoice DES1112022594 will equal (11+12)/2 = 11.5.

Creating a function to calculate the median value

Now, let’s create a T-SQL Function to calculate the median value of the specified dataset. This function can be used in all version of SQL Server. The procedure takes a table name and a column name as input parameters to calculate the median value for any data type. It creates a temporary table from the column values and sorts them. It uses a dynamic scrollable cursor which scrolls to the middle of the result set and gives the output.

The following is the function’s actual code:

USE VSDATA 
GO 

CREATE PROCEDURE DBO.PROC_CALCULATE_MEDIAN (@TABLENAME  VARCHAR(50), 
                                            @COLUMNNAME VARCHAR(50), 
                                            @OUTPUT     SQL_VARIANT OUTPUT) 
AS 
  BEGIN 
      DECLARE @SQLQUERY VARCHAR(200) 
      DECLARE @MIDDLE_VALUE INT 
      SET NOCOUNT ON 
      CREATE TABLE #WORKTABLE 
        ( 
           COLUMNS SQL_VARIANT 
        ) 
      SET @SQLQUERY = 'INSERT #WORKTABLE SELECT ' + @COLUMNNAME 
                      + ' FROM ' + @TABLENAME + ' ORDER BY 1 ASC ' 
      EXEC (@SQLQUERY) 
      DECLARE CURSOR_MED CURSOR SCROLL FOR 
        SELECT * 
        FROM   #WORKTABLE 
      SELECT @MIDDLE_VALUE = ROUND (COUNT(*) * 0.5, 0) 
      FROM   #WORKTABLE 
      OPEN CURSOR_MED 
      FETCH ABSOLUTE @MIDDLE_VALUE FROM CURSOR_MED INTO @OUTPUT 
      CLOSE CURSOR_MED 
      DEALLOCATE CURSOR_MED 
      DROP TABLE #WORKTABLE 
  END 
GO

Let’s run this procedure and check the output. Firstly, we’ll execute this procedure to get the median value of the Prices column from the Invoice Table. To run the procedure, execute the following query:

DECLARE @MEDIAN_VALUE SQL_VARIANT 

EXEC DBO.SP_CALC_MEDIAN 
  'INVOICE', 
  'PRICE', 
  @MEDIAN_VALUE OUTPUT 

SELECT @MEDIAN_VALUE

This is the output we should get:

Calculate the Median with a Custom Function

We can also calculate the median of the InvoiceNumber column from the Invoices table. To do that, run the following query:

DECLARE @MEDIAN_VALUE SQL_VARIANT 

EXEC DBO.SP_CALC_MEDIAN 
  'INVOICE', 
  'INVOICENUMBER', 
  @MEDIAN_VALUE OUTPUT 

SELECT @MEDIAN_VALUE

We should get the following output:

Calculate the Median with a Custom Function 2

Summary

In this article, I have covered the following:

  1. Overview of the  statistical median.
  2. Creating a demo setup and different approaches to calculating the median value of a specified dataset.
  3. Creating a SQL function which accepts a table name and a column name of any dataset and generates its median value.
Tags: , , Last modified: August 08, 2022
Close