Percentiles like Excel to the 4th power (T-SQL, SQL CLR, MDX, ASSP)

With the release of SQL 2012 additional analytic functions have been added to the SQL server T-SQL toolkit including 3 Percentile functions: PERCENTILE_CONT; PERCENTILE_DISC; PERCENTILE_RANK, sadly no new MDX Percentile functions where added. So I thought I would share 4 ways to calculate percentiles just like Excel using: MDX; SSAS Stored Procedures; T-SQL and SQL CLR Aggregates.

The example  queries in this post use the AdventureWorksDW2012 database and the AdventureWorksDW2012 UDM Analysis Services Database, with the primary tables being FactCallCenter and DimDate. The goal was to find the the 90th percentile for the ‘average issue wait time’ by shift.   To ensure that the data used for SSAS and SQL examples was the same I created a very simple cube called callcentre in the AdventureWorksDW2012 model, see screen shot below, the cube contains one measure and two dimensions

  • Measures
    • Average Time Per Issue
  • Dimensions
    • Date (existing date dimension)
    • Shifts ( degenerate from the shifts column in the FactCallCentre Table)

The source for percentile knowledge was Wikipedia, come on it’s on the internet it must be true, both Excel percentile formulas percentile.inc and pecentile.exc are defined here under the Alternative methods heading.

Basically according to Wikipedia calculating an Excel like percentile comes down to two steps:

– Get the ordinal of the value with the following formulas

  • Exclusive

n = \frac{P}{100}(N+1)

  • Inclusive
n = \frac{P}{100}(N-1)+1

– Calculate the percentile Value by splitting the ordinal into its integer component k and decimal component d,
such that n = k + d. Then v_P is calculated as:

 v_P = \begin{cases}   v_1, & \mbox{for }n=1 \\   v_N, & \mbox{for }n=N \\   v_k+d(v_{k+1}-v_k), & \mbox{for }1 < n < N   \end{cases}

Here Goes…

T-SQL

the code snippet below displays the query used to calculate a percentile value using the exclusive method


/*steps
1. determine the index where percentile value is located
2. Calculate the percentile value
*/

declare @percentile numeric(19,2)
set @percentile = .90

;with percentile (rowIndex, rangePoint, [Shift], AverageTimePerIssue) as
(
select
 row_number() over (partition by [Shift] order by AverageTimePerIssue asc) as rowIndex ,
 ((count(1) over (partition by [Shift])+1) * @percentile) as rangePoint,
 a.[Shift], AverageTimePerIssue from FactCallCenter a
where AverageTimePerIssue is not null
)

select [Shift],
case when max(rangePoint) % 1 <> 0 THEN
 CAST(ROUND(MIN(AverageTimePerIssue) +
 (
 (MAX(AverageTimePerIssue) - MIN(AverageTimePerIssue))
 * (max(rangePoint) % 1)),2) as Numeric(19,2))
ELSE
 CAST(round(max(AverageTimePerIssue),2) as numeric(19,2)) END as percentileValue from
percentile
where rowIndex between Floor(rangePoint) and ceiling(rangePoint)
group by [shift],rangePoint

if you want to use the inclusive method you would modify the ((count(1) over (partition by [Shift])+1) * @percentile)  portion of the query to read ((count(1) over (partition by [Shift])-1) * .9)+1

SQL CLR Aggregate

While a T-SQL script works it requires a lot of  repetitive typing for multiple calculations and is not very reusable, I mean you could always bookmark this post or save the script as a stub for reuse, but I thought that a CLR Aggregate would be a better solution. As you can see selecting a percentile using an aggregate function requires much less typing and allows for reuse of the function without having to re write the implementation.

select [shift], [dbo].[percentiles]([AverageTimePerIssue], 90, 'EXC') from FactCallCenter
group by [shift]

The Aggregate Function takes three parameters:

  1. The column to evaluate the percentile value from
  2. The percentile to find as an integer
  3. The calculation Method {‘EXC’, ‘INC’}

To create the CLR aggregate function you can create a new SQL Server Database Project if you are using Visual Studio 2010 and have installed the new SQL CLR Database Project template which allows you to deploy to SQL  2012 or a new Visual C# SQL CLR Database Project if you are using VS 2010 with SQL 2005 or 2008, these projects will deploy your function directly to SQL Server. If you do not want to create a database project you can also simply created a new class library, then compile and register your assembly and function using SQL scripts (see script below). If you do not want to recreate the function you can download the compiled .dll here and execute the code below, assuming you saved the file to ‘c:\Temp’ , in the context of the database where you want to register the assembly and create the function


EXEC sp_configure 'show advanced options' , '1';
go
reconfigure;
go
EXEC sp_configure 'clr enabled' , '1'
go
reconfigure;
-- Turn advanced options back off
EXEC sp_configure 'show advanced options' , '0';
go

CREATE ASSEMBLY [sql.clr.percentiles] from 'c:\Temp\sql.clr.percentiles.dll'
WITH PERMISSION_SET = SAFE
GO

CREATE AGGREGATE [dbo].[percentiles]
(@value [float], @percentile [smallint], @calcMethod [nvarchar](4000))
RETURNS[float]
EXTERNAL NAME [sql.clr.percentiles].[Percentiles]
GO

Below you will find the code for the aggregate, the code basically performs the same operations as the SQL Equivalent, it finds the ordinal based on the previously defined formulas and then returns the percentile value using the formula as shown earlier.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToDuplicates = true, IsInvariantToNulls = false, IsNullIfEmpty = true, IsInvariantToOrder = false, MaxByteSize = -1)]
public struct Percentiles : Microsoft.SqlServer.Server.IBinarySerialize
{
 public void Init()
 {
 valueList = new List();
 }

public void Accumulate(SqlDouble value, SqlInt16 percentile, SqlString calcMethod)
 {
 if (!value.IsNull)
 {
 valueList.Add(value.Value);
 }

if (!percentile.IsNull)
 {
 PercentileValue = percentile.Value > 100 ? (short)100 : percentile.Value;
 }
 Method = calcMethod.ToString();
 }

public void Merge(Percentiles Group)
 {
 foreach (double d in Group.valueList)
 {
 valueList.Add(d);
 }
 PercentileValue = Group.PercentileValue;
 }

public SqlDouble Terminate()
 {
 double rangePoint = 0;
 switch (Method)
 {
 case "EXC":
 {
 rangePoint = (((Convert.ToDouble(PercentileValue) / 100) * (valueList.Count + 1))) - 1; // remove 1 as array is zero based
 break;
 }
 case "INC":
 {
 rangePoint = (((Convert.ToDouble(PercentileValue) / 100) * (valueList.Count - 1)) + 1) - 1; // remove 1 as array is zero based
 break;
 }
 default:
 {
 return SqlDouble.Null;
 }
 }
 valueList.Sort();

 //if rangePoint is a whole number
 if ( rangePoint % 1 == 0)
 {
 return valueList[(int)rangePoint];
 }
 else if ((int)Math.Ceiling(rangePoint) < valueList.Count)
 {
 return valueList[(int)Math.Floor(rangePoint)] +
 (rangePoint % 1 * (valueList[(int)Math.Ceiling(rangePoint)] - valueList[(int)Math.Floor(rangePoint)]));
 }
 else
 {
 return SqlDouble.Null;
 }
 }

public void Read(BinaryReader binaryReader)
 {
 if (valueList == null)
 {
 valueList = new List();
 }

Method = binaryReader.ReadString();

 PercentileValue = binaryReader.ReadInt16();

long readerLength = binaryReader.BaseStream.Length;
 while (binaryReader.BaseStream.Position  {
 valueList.Add(binaryReader.ReadDouble());
 }
 }

public void Write(BinaryWriter binaryWriter)
 {
 binaryWriter.Write(Method);
 binaryWriter.Write(PercentileValue);
 foreach (double d in valueList)
 {
 binaryWriter.Write(d);
 }
 }

private string Method { get; set; }
 private short PercentileValue{ get; set; }
 private List valueList { get; set; }
}

MDX

calculating a percentile in MDX follows the same pattern; get the ordinal from your ordered set and calculate the value as seen in the code below. In an effort to keep the MDX readable, allow for easier troubleshooting and better performance I broke the calculation into several calculated measures with the final calculated measure ‘Percentile’ using the measures defined earlier in the script, thanks to Chris Webb and Greg Galloway for their MDX optimization suggestions.

The code below provides the MDX used to generate a percentile value.

//number of items in the set
WITH MEMBER [measures].[Count]
as COUNT(
NONEMPTY(({[Shifts].currentmember}*
{[Date].[Day of Month].[Day of Month].members})
 ,[Measures].[Average Time Per Issue]))

//percentile index

MEMBER [Measures].[RangePoint] AS
(([measures].[Count]+1) *.90) -1 /* subtract 1 from total as item array is 0 based*/

MEMBER [Measures].[RangePoint_Int]
 AS
 INT([Measures].[RangePoint])

member Measures.[floor] as
 (BOTTOMCOUNT(NONEMPTY(({[Shifts].currentmember}*
{[Date].[Day of Month].[Day of Month].members})
 ,[Measures].[Average Time Per Issue]), [measures].[Count], [Measures].[Average Time Per Issue]).
 item(([Measures].[RangePoint_int])),
 [Measures].[Average Time Per Issue])

member Measures.[Ceiling] as
(BOTTOMCOUNT(NONEMPTY(({[Shifts].currentmember}*
{[Date].[Day of Month].[Day of Month].members})
 ,[Measures].[Average Time Per Issue]), [measures].[Count], [Measures].[Average Time Per Issue]).
 item(([Measures].[RangePoint_int]+1)),
 [Measures].[Average Time Per Issue])

MEMBER [Measures].[Percentiles] AS
IIF([Measures].[RangePoint] - [Measures].[RangePoint_Int] = 0,
//rangepoint is a whole number
(mySet.item([Measures].[RangePoint]-1),
 [Measures].[Average Time Per Issue]),
 //rangepoint is not a whole number
 Measures.[floor]
+
([Measures].[RangePoint] - [Measures].[RangePoint_Int])
*
(
Measures.[ceiling]
-
measures.[floor])
)
select {[Measures].[Percentiles]} on 0
,
([Shifts].[Shift].[Shift].members) on 1
from callcentre

To switch to the inclusive method you would modify the rangePoint calculated measure to read


MEMBER [Measures].[RangePoint] AS
((([measures].[Count]-1) *.90)+1) -1

SSAS Stored Procedures (ASSP)

Like T-SQL, the MDX solution requires lots of typing and reuse requires re-implementation; if you type anywhere like I do extra typing means lots of room for error. Fortunately SSAS provides a mechanism for creating reusable user defined functions, Analysis Services Stored Procedures.

To use the predefined Stored Procedure requires a fraction of the code and no need to re-implement the logic


with member measures.[90th percentile] as
[ssas storedprocedures].ValueAtPercentile(
NONEMPTY(([Date].[Day of Month].[day of month].members, [Measures].[Average Time Per Issue])),
 [Measures].[Average Time Per Issue], .90, true, "EXC")

select measures.[90th percentile] on 0,
 [Shifts].[Shift].[Shift].members on 1
from callcentre

The function takes 5 Parameters:

  1. The set to evaluate
  2. The measure to get the percentile of
  3. The percentile to find as a decimal
  4. The order the set{true, false}
  5. The calculation Method{‘EXC’, ‘INC’}

It should be noted that ASSP’s used within calculated measures may not perform as well as straight MDX, however for me having both options available is really useful. I did some high level testing  with these scripts using the SQL 2012 Adventure Works Cube and found the ASSP’s performed up to 14 time faster than the straight MDX scripts, bonus.

  • Using my laptop, fully loaded with 3 instances of SQL Server including 2012 with all SSAS flavours,  the ASSP executed in two seconds for both the Inclusive and Exclusive modes where the straight MDX took 28 seconds for Inclusive and 9 Seconds for Exclusive

My work on this stored procedure was heavily influenced by the Analysis Services Stored Procedure Project on Codeplex, it was a huge help in the learning  how to write ASSP’s and it provided an Order() function that I was able to reuse. I have submitted the function to the ASSP Project Authors in hopes that it may be included there. For those of you that do not want to create the ASSP you can download the compiled .dll here and follow the steps outlined at the end of this section to add the assembly to Analysis Services.

If you want to create your own ASSP Library you can follow the steps below.

  • create a new C# Class Library project in Visual Studio
  • add the following references
    • Microsoft.AnalysisServices (C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.AnalysisServices.DLL)
    • msmgdsrv (C:\Program Files (x86)\Microsoft Analysis Services\AS OLEDB\10\msmgdsrv.dll)
  • Set the target framework project property to .Net Framework 3.5
  • In project properties, Sign the assembly
  • Rename the Class1.cs to something meaningful (in my solution it is called percentiles.cs)
  • Add the following using statements

using Microsoft.AnalysisServices.AdomdServer;
using Microsoft.AnalysisServices;
using System.Collections.Generic;

Download the following file and add it to your solution (setOrdering Class from ASSP codeplex project)

  • Replace all the code under the using statements with the following

namespace ssas.storedprocedures
{
 public class PercentileFunctions
 {
 [SafeToPrepare(true)]
 public static double RangePoint(Set inputSet, double percentileValue, string calcMethod)
 {
 switch (calcMethod)
 {
 case "EXC":
 {
 return ((percentileValue) * (inputSet.Tuples.Count + 1)) - 1;
 }
 case "INC":
 {
 return (((percentileValue) * (inputSet.Tuples.Count -1))+1) - 1;
 }
 default:
 {
 return ((percentileValue) * (inputSet.Tuples.Count + 1)) - 1;
 }
 }
 }

[SafeToPrepare(true)]
 public static double ValueAtPercentile(Set inputSet, Expression sortExpression, double percentileValue, bool sortAscending, string calcMethod)
 {
 //get position where percentile falls
 double Rank = RangePoint(inputSet, percentileValue, calcMethod);

 //order the set ascending using Codeplex SSAS Stored Procedure Function
 if (sortAscending)
 {
 Set s = setOrdering.Order(inputSet, sortExpression);

//if the Rank is a whole number
 if ((Rank % 1) == 0)
 {
 return sortExpression.Calculate(s.Tuples[Convert.ToInt32(Rank)]).ToDouble();
 }
 //if Rank is a decimal
 else
 {
 return
 (sortExpression.Calculate(s.Tuples[(int)Math.Floor(Rank)]).ToDouble())
 +
 (Rank % 1 *
 (sortExpression.Calculate(s.Tuples[(int)Math.Ceiling(Rank)]).ToDouble()
 - sortExpression.Calculate(s.Tuples[(int)Math.Floor(Rank)]).ToDouble())
 );
 /*(((sortExpression.Calculate(s.Tuples[Convert.ToInt32(Rank) + 1]).ToDouble()
 - sortExpression.Calculate(s.Tuples[Convert.ToInt32(Rank)]).ToDouble())) * (Rank - Convert.ToInt32(Rank)));*/
 }
 }
 else {
 //if the Rank is a whole number
 if ((Rank % 1) == 0)
 {
 return sortExpression.Calculate(inputSet.Tuples[Convert.ToInt32(Rank)]).ToDouble();
 }
 //if Rank is a decimal
 else
 {
 return
 (sortExpression.Calculate(inputSet.Tuples[Convert.ToInt32(Rank)]).ToDouble())
 +
 (((sortExpression.Calculate(inputSet.Tuples[Convert.ToInt32(Rank) + 1]).ToDouble()
 - sortExpression.Calculate(inputSet.Tuples[Convert.ToInt32(Rank)]).ToDouble())) * (Rank - Convert.ToInt32(Rank)));
 }
 }
 }
}
}
  • Compile the Code
  • Open SSMS, connect to Analysis Services
  • Right click on the Assemblies folder and select New Assembly
  • Browse to the compiled .dll
  • check the include debug information if you want debug the function (how to debug link)
  • use in a mdx query

Happy querying.

Advertisements

19 thoughts on “Percentiles like Excel to the 4th power (T-SQL, SQL CLR, MDX, ASSP)

  1. Thanks Richard. This is a great example. And just in time as I was just looking for a percentile calculation in MDX.

  2. This is exactly what I’ve been looking for. Unfortunately I can’t get it to work. When I run the following against my sample cube:

    with
    member [Measures].[Percentiles] AS
    [ASSP].ValueAtPercentile(
    NonEmpty([Dim Country].[Country ID].[Country ID].[Algeria], [Measures].[Base Salary])
    , [Measures].[Base Salary]
    , .90
    , true
    , “INC”
    )
    select {[measures].[percentiles]} on 0
    from [DWtest];

    it simply gives me the sum. I get the same result no matter what I change the percentile number to

    1. I also tried to debug the assembly to try to figure out what’s going on but I get nothing in VS when I execute the query. I guess it’s not hitting my breakpoints.

    2. Hi,

      The Original Member that you created passes in a single member set to the sp and will only ever evaluate to the sum of the base salaries for Algeria, if you are interested in seeing the 90th percentile base salary for Algeria you would need the following query

      with
      member [Measures].[Percentiles] AS
      [ASSP].ValueAtPercentile(
      NonEmpty( ([Dim Country].[Country ID].[algeria],[Dim Jobs].[Job Name].[job Name].members),
      [Measures].[Base Salary])
      , [Measures].[Base Salary]
      , .90
      , true
      , “INC”
      )
      select {[measures].[percentiles]} on 0
      from [DWtest];

      This Query Below will give you the 90th Percentile base Salary for jobs across countries

      with
      member [Measures].[Percentiles] AS
      [ASSP].ValueAtPercentile(
      NonEmpty( [Dim Country].[Country ID].[Country ID].members, [Measures].[Base Salary])
      , [Measures].[Base Salary]
      , .90
      , true
      , “INC”
      )
      select {[measures].[percentiles]} on 0,
      [Dim Jobs].[Job Name].[job Name].members on 1 from [DWtest];

      I hope that helps.

      Richard

      1. That helps a great deal. I’m obviously just wrestling with my own limited knowledge of MDX. Thanks for taking the time to take a look at this and Happy Holidays!

  3. Most of my research on creating a percentile measure in cubes lead me back to this post. Your explanation and walkthrough was very helpful as I am still working on my MDX knowledge.

    I was hoping you might shed some light in this issue I keep running into after importing the value at percentile stored procedure.

    I was able to run this MDX query without error.

    with member measures.[90th percentile] as

    [assp].ValueAtPercentile(
    NONEMPTY([Patient Number].[Vw Patient Info].members, [Measures].[Est Paid Amt] ),
    [Measures].[Est Paid Amt],.90, true, “EXC”)

    select measures.[90th percentile] on 0
    from PremiumMedexp

    However, when I tried to create a calculated measure in the cube with

    [assp].ValueAtPercentile(
    NONEMPTY([Patient Number].[Vw Patient Info].members, [Measures].[Est Paid Amt] ),
    [Measures].[Est Paid Amt],.90, true, “EXC”)

    I kept getting a syntax error at ‘.90’ from the MDX parser.
    I will really appreciate any thought on this.

  4. Thanks. Very useful.

    I got strange results using the ‘INC’ parameter.
    Looking at the code for INC:

    case “INC”:
    {
    return (((percentileValue) * (inputSet.Tuples.Count -1))+1) – 1;

    I would have expected:

    case “INC”:
    {
    return (((percentileValue) * (inputSet.Tuples.Count -1))+1) ;

    For example if you have 17 numbers and want the .25 percentile your calculation gives
    (((.25)*(17-1))+1)-1 = 4 for the rangepoint

    But the right rangepoint is 5 isn’t it?

    Thanks

  5. I got this error in the SQL Server version even though the field that is passed to the percentile function IS NOT NULL. I am testing this on a record set of over 29 million records

    Msg 6522, Level 16, State 1, Line 4
    A .NET Framework error occurred during execution of user-defined routine or aggregate “percentiles”:
    System.ArgumentNullException: Value cannot be null.
    Parameter name: value
    System.ArgumentNullException:
    at System.IO.BinaryWriter.Write(String value)
    at Percentiles.Write(BinaryWriter binaryWriter)

      1. select ‘Cnty’ GEO,’Month’ Time,Cnty_cd Geographic, SLS_RCDED_PERIOD YearMonth, PropType

        , util.[dbo].[percentiles](TotalAssessedValue, 10, ‘INC’) AS ASSD_PCT10
        , util.[dbo].[percentiles](TotalAssessedValue, 20, ‘INC’) AS ASSD_PCT20
        , util.[dbo].[percentiles](TotalAssessedValue, 25, ‘INC’) AS ASSD_PCT25
        , util.[dbo].[percentiles](TotalAssessedValue, 30, ‘INC’) AS ASSD_PCT30
        , util.[dbo].[percentiles](TotalAssessedValue, 40, ‘INC’) AS ASSD_PCT40
        , util.[dbo].[percentiles](TotalAssessedValue, 50, ‘INC’) AS ASSD_PCT50
        , util.[dbo].[percentiles](TotalAssessedValue, 60, ‘INC’) AS ASSD_PCT60
        , util.[dbo].[percentiles](TotalAssessedValue, 70, ‘INC’) AS ASSD_PCT70
        , util.[dbo].[percentiles](TotalAssessedValue, 75, ‘INC’) AS ASSD_PCT75
        , util.[dbo].[percentiles](TotalAssessedValue, 80, ‘INC’) AS ASSD_PCT80
        , util.[dbo].[percentiles](TotalAssessedValue, 90, ‘INC’) AS ASSD_PCT90

        from
        Sand_HealthySales with(nolock)
        where PropType = ‘ALL’ and TotalAssessedValue is not null
        group by Cnty_cd, SLS_RCDED_PERIOD, PropType
        order by Cnty_cd, SLS_RCDED_PERIOD, PropType

        FYI: TotalAssessedValue is a decimal field

  6. Pingback: Bandlines in SSRS

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s