# Percentiles like Excel to the 4th power (T-SQL, SQL CLR, MDX, ASSP)

With the release of SQL Server 2012, additional analytic functions were added to the T-SQL toolkit, including three percentile functions: PERCENTILE_CONT, PERCENTILE_DISC and PERCENT_RANK. Sadly, no new MDX percentile functions were added. So I thought I would share four ways to calculate percentiles just like Excel, using MDX, SSAS stored procedures, T-SQL and SQL CLR aggregates.

The example queries in this post use the AdventureWorksDW2012 database and the AdventureWorksDW2012 UDM Analysis Services database, with the primary tables being FactCallCenter and DimDate. The goal was to find the 90th percentile for the ‘average issue wait time’ by shift. To ensure that the data used for the SSAS and SQL examples was the same, I created a very simple cube called callcentre in the AdventureWorksDW2012 model (see the screenshot below). The cube contains one measure and two dimensions:

• Measures
• Average Time Per Issue
• Dimensions
• Date (existing date dimension)
• Shifts (degenerate, built from the Shift column in the FactCallCenter table)

The source for my percentile knowledge was Wikipedia (come on, it’s on the internet, it must be true). Both Excel percentile formulas, PERCENTILE.INC and PERCENTILE.EXC, are defined here under the Alternative methods heading.

According to Wikipedia, calculating an Excel-like percentile comes down to two steps:

– Get the ordinal of the value with the following formulas

• Exclusive

$n = \frac{P}{100}(N+1)$

• Inclusive
$n = \frac{P}{100}(N-1)+1$

– Calculate the percentile Value by splitting the ordinal into its integer component k and decimal component d,
such that $n = k + d$. Then $v_P$ is calculated as:

$v_P = \begin{cases} v_1, & \mbox{for }n=1 \\ v_N, & \mbox{for }n=N \\ v_k+d(v_{k+1}-v_k), & \mbox{for }1 < n < N \end{cases}$
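To make the two steps concrete, here is a minimal Python sketch of the recipe above. The function name `excel_percentile` and its arguments are my own for illustration, and clamping ordinals that fall outside 1..N to the first or last value is an assumption of this sketch (Excel's PERCENTILE.EXC reports an error in that situation instead).

```python
import math


def excel_percentile(values, p, method="EXC"):
    """Percentile following the two-step recipe; p is given in percent (0-100)."""
    v = sorted(values)
    N = len(v)
    if N == 0:
        raise ValueError("values must not be empty")

    # Step 1: the ordinal n (a 1-based position that may be fractional)
    if method == "EXC":
        n = p / 100 * (N + 1)
    elif method == "INC":
        n = p / 100 * (N - 1) + 1
    else:
        raise ValueError("method must be 'EXC' or 'INC'")

    # Step 2: boundary cases. Clamping n outside 1..N is an assumption of this sketch.
    if n <= 1:
        return v[0]
    if n >= N:
        return v[-1]

    # Split n into its integer part k and decimal part d, then interpolate
    k = math.floor(n)
    d = n - k
    return v[k - 1] + d * (v[k] - v[k - 1])  # v is zero-based, so v_k is v[k - 1]


data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(round(excel_percentile(data, 90, "EXC"), 2))  # n = 9.9 -> 99.0
print(round(excel_percentile(data, 90, "INC"), 2))  # n = 9.1 -> 91.0
```

With ten values, the exclusive ordinal is 9.9 and the inclusive ordinal is 9.1, so both results sit between the 9th and 10th values but at different distances.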

# T-SQL

The code snippet below shows the query used to calculate a percentile value using the exclusive method.


/* steps
   1. determine the index where the percentile value is located
   2. calculate the percentile value
*/

declare @percentile numeric(19,2)
set @percentile = .90

;with percentile (rowIndex, rangePoint, [Shift], AverageTimePerIssue) as
(
    select
        row_number() over (partition by [Shift] order by AverageTimePerIssue asc) as rowIndex,
        ((count(1) over (partition by [Shift])+1) * @percentile) as rangePoint,
        a.[Shift],
        AverageTimePerIssue
    from FactCallCenter a
    where AverageTimePerIssue is not null
)
select
    [Shift],
    case
        when max(rangePoint) % 1 <> 0 then
            -- rangePoint has a fraction: interpolate between the two neighbouring rows
            cast(round(min(AverageTimePerIssue) +
                ((max(AverageTimePerIssue) - min(AverageTimePerIssue)) * (max(rangePoint) % 1)), 2) as numeric(19,2))
        else
            -- rangePoint is a whole number: a single row qualifies
            cast(round(max(AverageTimePerIssue), 2) as numeric(19,2))
    end as percentileValue
from percentile
where rowIndex between floor(rangePoint) and ceiling(rangePoint)
group by [Shift], rangePoint



If you want to use the inclusive method, modify the ((count(1) over (partition by [Shift])+1) * @percentile) portion of the query to read ((count(1) over (partition by [Shift])-1) * @percentile)+1
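The query relies on a small trick: it keeps only the rows whose row number lies between floor(rangePoint) and ceiling(rangePoint), so MIN and MAX over that window are the two values to interpolate between. The Python sketch below mimics that logic on in-memory rows so the trick can be checked by hand. The function name `percentile_by_group` and the sample data are hypothetical and not part of AdventureWorks.

```python
import math
from collections import defaultdict


def percentile_by_group(rows, percentile=0.90, inclusive=False):
    """rows: iterable of (shift, value) pairs; None values are skipped like the WHERE clause."""
    groups = defaultdict(list)
    for shift, value in rows:
        if value is not None:
            groups[shift].append(value)

    result = {}
    for shift, values in groups.items():
        values.sort()                    # row_number() ... order by value asc
        n = len(values)                  # count(1) over (partition by shift)
        if inclusive:
            range_point = (n - 1) * percentile + 1
        else:
            range_point = (n + 1) * percentile
        lo, hi = math.floor(range_point), math.ceil(range_point)
        # where rowIndex between floor(rangePoint) and ceiling(rangePoint); rowIndex is 1-based
        window = [v for i, v in enumerate(values, start=1) if lo <= i <= hi]
        if window:                       # no qualifying rows means no output row
            frac = range_point % 1
            result[shift] = round(min(window) + (max(window) - min(window)) * frac, 2)
    return result


rows = [("AM", float(x)) for x in range(10, 101, 10)] + [("PM", 5.0), ("PM", 15.0), ("PM", None)]
print(percentile_by_group(rows))                  # exclusive method
print(percentile_by_group(rows, inclusive=True))  # inclusive method
```

Note what happens for the two-row PM group under the exclusive method: rangePoint is 2.7, only row 2 falls inside the window, and the result is simply the largest value.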

# SQL CLR Aggregate

While a T-SQL script works, it requires a lot of repetitive typing for multiple calculations and is not very reusable. You could always bookmark this post or save the script as a stub for reuse, but I thought that a CLR aggregate would be a better solution. As you can see, selecting a percentile using an aggregate function requires much less typing and allows the function to be reused without rewriting the implementation.

select [shift], [dbo].[percentiles]([AverageTimePerIssue], 90, 'EXC') from FactCallCenter
group by [shift]


The Aggregate Function takes three parameters:

1. The column to evaluate the percentile value from
2. The percentile to find as an integer
3. The calculation Method {‘EXC’, ‘INC’}

To create the CLR aggregate function, you can create a new SQL Server Database Project if you are using Visual Studio 2010 and have installed the new SQL CLR Database Project template, which allows you to deploy to SQL 2012, or a new Visual C# SQL CLR Database Project if you are using VS 2010 with SQL 2005 or 2008. These projects will deploy your function directly to SQL Server. If you do not want to create a database project, you can simply create a new class library, then compile and register your assembly and function using SQL scripts (see the script below). If you do not want to recreate the function, you can download the compiled .dll here and execute the code below in the context of the database where you want to register the assembly and create the function, assuming you saved the file to ‘c:\Temp’.


EXEC sp_configure 'show advanced options' , '1';
go
reconfigure;
go
EXEC sp_configure 'clr enabled' , '1'
go
reconfigure;
-- Turn advanced options back off
EXEC sp_configure 'show advanced options' , '0';
go

CREATE ASSEMBLY [sql.clr.percentiles] from 'c:\Temp\sql.clr.percentiles.dll'
WITH PERMISSION_SET = SAFE
GO

CREATE AGGREGATE [dbo].[percentiles]
(@value [float], @percentile [smallint], @calcMethod [nvarchar](4000))
RETURNS [float]
EXTERNAL NAME [sql.clr.percentiles].[Percentiles]
GO



Below you will find the code for the aggregate. It performs the same operations as the SQL equivalent: it finds the ordinal based on the previously defined formulas and then returns the percentile value using the interpolation formula shown earlier.

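using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

[Serializable]
// duplicates change a percentile, so the aggregate is not invariant to them
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToDuplicates = false, IsInvariantToNulls = false, IsNullIfEmpty = true, IsInvariantToOrder = false, MaxByteSize = -1)]
public struct Percentiles : Microsoft.SqlServer.Server.IBinarySerialize
{
    public void Init()
    {
        valueList = new List<double>();
    }

    public void Accumulate(SqlDouble value, SqlInt16 percentile, SqlString calcMethod)
    {
        // collect every non-null value; sorting is deferred until Terminate
        if (!value.IsNull)
        {
            valueList.Add(value.Value);
        }

        // cap the requested percentile at 100
        if (!percentile.IsNull)
        {
            PercentileValue = percentile.Value > 100 ? (short)100 : percentile.Value;
        }
        Method = calcMethod.ToString();
    }

    public void Merge(Percentiles Group)
    {
        // combine the values gathered by another partial aggregation
        foreach (double d in Group.valueList)
        {
            valueList.Add(d);
        }
        // take the settings from the other group when it has seen rows
        if (Group.Method != null)
        {
            Method = Group.Method;
            PercentileValue = Group.PercentileValue;
        }
    }

    public SqlDouble Terminate()
    {
        double rangePoint = 0;
        switch (Method)
        {
            case "EXC":
                {
                    rangePoint = (((Convert.ToDouble(PercentileValue) / 100) * (valueList.Count + 1))) - 1; // remove 1 as array is zero based
                    break;
                }
            case "INC":
                {
                    rangePoint = (((Convert.ToDouble(PercentileValue) / 100) * (valueList.Count - 1)) + 1) - 1; // remove 1 as array is zero based
                    break;
                }
            default:
                {
                    return SqlDouble.Null;
                }
        }

        // no rows, or the percentile falls outside the positions that can be interpolated
        if (valueList.Count == 0 || rangePoint < 0 || rangePoint > valueList.Count - 1)
        {
            return SqlDouble.Null;
        }
        valueList.Sort();

        //if rangePoint is a whole number
        if (rangePoint % 1 == 0)
        {
            return valueList[(int)rangePoint];
        }

        //otherwise interpolate between the two neighbouring values
        return valueList[(int)Math.Floor(rangePoint)] +
            (rangePoint % 1 * (valueList[(int)Math.Ceiling(rangePoint)] - valueList[(int)Math.Floor(rangePoint)]));
    }

    // Read mirrors Write: method, percentile, then every value until the stream ends
    public void Read(BinaryReader binaryReader)
    {
        if (valueList == null)
        {
            valueList = new List<double>();
        }
        Method = binaryReader.ReadString();
        PercentileValue = binaryReader.ReadInt16();
        long readerLength = binaryReader.BaseStream.Length;
        while (binaryReader.BaseStream.Position <= (readerLength - sizeof(double)))
        {
            valueList.Add(binaryReader.ReadDouble());
        }
    }

    public void Write(BinaryWriter binaryWriter)
    {
        binaryWriter.Write(Method ?? string.Empty); // BinaryWriter cannot write a null string
        binaryWriter.Write(PercentileValue);
        foreach (double d in valueList)
        {
            binaryWriter.Write(d);
        }
    }

    private string Method { get; set; }
    private short PercentileValue { get; set; }
    private List<double> valueList { get; set; }
}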
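The aggregate's Init / Accumulate / Merge / Terminate contract matters because SQL Server may aggregate parts of a group separately and then merge them. As a rough illustration only, the Python class below models that lifecycle. `PercentileAggregate` is a hypothetical stand-in, not a SQL Server API, and serialization is left out.

```python
import math


class PercentileAggregate:
    """Python model of the aggregate lifecycle (illustrative, not a SQL Server API)."""

    def __init__(self):                       # Init
        self.values = []
        self.percentile = 0
        self.method = None

    def accumulate(self, value, percentile, method):
        if value is not None:                 # NULL values are skipped
            self.values.append(value)
        if percentile is not None:
            self.percentile = min(percentile, 100)
        self.method = method

    def merge(self, other):
        self.values.extend(other.values)      # combine partial results
        if other.method is not None:
            self.method = other.method
            self.percentile = other.percentile

    def terminate(self):
        n = len(self.values)
        if self.method == "EXC":
            range_point = self.percentile / 100 * (n + 1) - 1   # zero-based position
        elif self.method == "INC":
            range_point = self.percentile / 100 * (n - 1)       # (+1 then -1 cancel out)
        else:
            return None
        if n == 0 or range_point < 0 or range_point > n - 1:
            return None                       # nothing to interpolate between
        ordered = sorted(self.values)
        lo = math.floor(range_point)
        frac = range_point - lo
        if frac == 0:
            return ordered[lo]
        return ordered[lo] + frac * (ordered[lo + 1] - ordered[lo])


# Two partial aggregations over the same group, merged before the final answer
a, b = PercentileAggregate(), PercentileAggregate()
for x in (50.0, 10.0, 30.0, None, 20.0, 40.0):
    a.accumulate(x, 90, "EXC")
for x in (100.0, 60.0, 80.0, 70.0, 90.0):
    b.accumulate(x, 90, "EXC")
a.merge(b)
print(round(a.terminate(), 2))  # 99.0
```

Because sorting only happens in Terminate, the order in which rows arrive or partial groups are merged does not affect the result.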


# MDX

Calculating a percentile in MDX follows the same pattern: get the ordinal from your ordered set and calculate the value, as seen in the code below. To keep the MDX readable, allow for easier troubleshooting and get better performance, I broke the calculation into several calculated measures, with the final calculated measure ‘Percentiles’ using the measures defined earlier in the script. Thanks to Chris Webb and Greg Galloway for their MDX optimization suggestions.

The code below provides the MDX used to generate a percentile value.

//number of items in the set
WITH MEMBER [measures].[Count]
as COUNT(
NONEMPTY(({[Shifts].currentmember}*
{[Date].[Day of Month].[Day of Month].members})
,[Measures].[Average Time Per Issue]))

//percentile index

MEMBER [Measures].[RangePoint] AS
(([measures].[Count]+1) *.90) -1 /* subtract 1 from total as item array is 0 based*/

MEMBER [Measures].[RangePoint_Int]
AS
INT([Measures].[RangePoint])

member Measures.[floor] as
(BOTTOMCOUNT(NONEMPTY(({[Shifts].currentmember}*
{[Date].[Day of Month].[Day of Month].members})
,[Measures].[Average Time Per Issue]), [measures].[Count], [Measures].[Average Time Per Issue]).
item(([Measures].[RangePoint_int])),
[Measures].[Average Time Per Issue])

member Measures.[Ceiling] as
(BOTTOMCOUNT(NONEMPTY(({[Shifts].currentmember}*
{[Date].[Day of Month].[Day of Month].members})
,[Measures].[Average Time Per Issue]), [measures].[Count], [Measures].[Average Time Per Issue]).
item(([Measures].[RangePoint_int]+1)),
[Measures].[Average Time Per Issue])

MEMBER [Measures].[Percentiles] AS
IIF([Measures].[RangePoint] - [Measures].[RangePoint_Int] = 0,
//rangepoint is a whole number, so the item at that index is the percentile value
Measures.[floor],
//rangepoint is not a whole number
Measures.[floor]
+
([Measures].[RangePoint] - [Measures].[RangePoint_Int])
*
(
Measures.[ceiling]
-
measures.[floor])
)
select {[Measures].[Percentiles]} on 0
,
([Shifts].[Shift].[Shift].members) on 1
from callcentre



To switch to the inclusive method you would modify the rangePoint calculated measure to read


MEMBER [Measures].[RangePoint] AS
((([measures].[Count]-1) *.90)+1) -1
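Both RangePoint definitions end with a trailing -1 because Item() is zero-based, while the ordinal $n$ from the Wikipedia formulas is one-based. As a short derivation (the symbols $r$ and $u_i$ are introduced here purely for illustration), write $r = n - 1$ for the zero-based position and $u_i = v_{i+1}$ for the zero-based items. Then:

• Exclusive

$r = \frac{P}{100}(N+1) - 1$

• Inclusive

$r = \frac{P}{100}(N-1)$

and the interpolation step becomes

$v_P = u_{\lfloor r \rfloor} + (r - \lfloor r \rfloor)(u_{\lfloor r \rfloor + 1} - u_{\lfloor r \rfloor})$

For example, with $N = 10$ and $P = 90$ the exclusive position is $r = 0.9 \times 11 - 1 = 8.9$ and the inclusive position is $r = 0.9 \times 9 = 8.1$, so in both cases the floor item is index 8 and the ceiling item is index 9.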



# SSAS Stored Procedures (ASSP)

Like T-SQL, the MDX solution requires lots of typing, and reuse requires re-implementation; if you type anything like I do, extra typing means lots of room for error. Fortunately SSAS provides a mechanism for creating reusable user-defined functions: Analysis Services Stored Procedures.

Using the predefined stored procedure requires a fraction of the code, with no need to re-implement the logic:


with member measures.[90th percentile] as
[ssas storedprocedures].ValueAtPercentile(
NONEMPTY(([Date].[Day of Month].[day of month].members, [Measures].[Average Time Per Issue])),
[Measures].[Average Time Per Issue], .90, true, "EXC")

select measures.[90th percentile] on 0,
[Shifts].[Shift].[Shift].members on 1
from callcentre



The function takes 5 Parameters:

1. The set to evaluate
2. The measure to get the percentile of
3. The percentile to find as a decimal
4. Whether to order the set {true, false}
5. The calculation Method{‘EXC’, ‘INC’}

It should be noted that ASSPs used within calculated measures may not perform as well as straight MDX; however, for me having both options available is really useful. I did some high-level testing with these scripts using the SQL 2012 Adventure Works cube and found the ASSPs performed up to 14 times faster than the straight MDX scripts. Bonus.

• Using my laptop, fully loaded with 3 instances of SQL Server including 2012 with all SSAS flavours, the ASSP executed in two seconds for both the inclusive and exclusive modes, whereas the straight MDX took 28 seconds for inclusive and 9 seconds for exclusive

My work on this stored procedure was heavily influenced by the Analysis Services Stored Procedure Project on Codeplex. It was a huge help in learning how to write ASSPs, and it provided an Order() function that I was able to reuse. I have submitted the function to the ASSP project authors in the hope that it may be included there. For those of you who do not want to create the ASSP, you can download the compiled .dll here and follow the steps outlined at the end of this section to add the assembly to Analysis Services.

If you want to create your own ASSP Library you can follow the steps below.

• Create a new C# Class Library project in Visual Studio
• Add references to the following assemblies:
• Microsoft.AnalysisServices (C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.AnalysisServices.DLL)
• msmgdsrv (C:\Program Files (x86)\Microsoft Analysis Services\AS OLEDB\10\msmgdsrv.dll)
• Set the target framework project property to .NET Framework 3.5
• In the project properties, sign the assembly
• Rename Class1.cs to something meaningful (in my solution it is called percentiles.cs)
• Add the following using statements

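using Microsoft.AnalysisServices;
// Set, Expression and SafeToPrepare live in the AdomdServer namespace (msmgdsrv.dll)
using Microsoft.AnalysisServices.AdomdServer;
using System.Collections.Generic;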


• Replace all the code under the using statements with the following

namespace ssas.storedprocedures
{
    public class PercentileFunctions
    {
        // Returns the zero-based position of the percentile within the set
        [SafeToPrepare(true)]
        public static double RangePoint(Set inputSet, double percentileValue, string calcMethod)
        {
            switch (calcMethod)
            {
                case "INC":
                    {
                        return (((percentileValue) * (inputSet.Tuples.Count - 1)) + 1) - 1;
                    }
                case "EXC":
                default:
                    {
                        // the exclusive method is also the fallback for unrecognised values
                        return ((percentileValue) * (inputSet.Tuples.Count + 1)) - 1;
                    }
            }
        }

        [SafeToPrepare(true)]
        public static double ValueAtPercentile(Set inputSet, Expression sortExpression, double percentileValue, bool sortAscending, string calcMethod)
        {
            //get position where percentile falls
            double Rank = RangePoint(inputSet, percentileValue, calcMethod);

            //order the set ascending using the Codeplex SSAS Stored Procedure Order() function
            //(setOrdering is the class holding that function; it is not shown here);
            //when sortAscending is false the set is used in the order supplied
            Set s = sortAscending ? setOrdering.Order(inputSet, sortExpression) : inputSet;

            //if the Rank is a whole number
            if ((Rank % 1) == 0)
            {
                return sortExpression.Calculate(s.Tuples[Convert.ToInt32(Rank)]).ToDouble();
            }

            //if Rank is a decimal, interpolate between the values on either side of it
            //(Floor/Ceiling rather than Convert.ToInt32, which rounds to the nearest integer)
            double lower = sortExpression.Calculate(s.Tuples[(int)Math.Floor(Rank)]).ToDouble();
            double upper = sortExpression.Calculate(s.Tuples[(int)Math.Ceiling(Rank)]).ToDouble();
            return lower + (Rank % 1 * (upper - lower));
        }
    }
}

• Compile the code
• Open SSMS and connect to Analysis Services
• Right-click the Assemblies folder and select New Assembly
• Browse to the compiled .dll
• Check ‘Include debug information’ if you want to debug the function (how to debug link)
• Use the function in an MDX query

Happy querying.