Percentiles like Excel to the 4th power (T-SQL, SQL CLR, MDX, ASSP)

With the release of SQL Server 2012, additional analytic functions have been added to the T-SQL toolkit, including three percentile-related functions: PERCENTILE_CONT, PERCENTILE_DISC, and PERCENT_RANK; sadly, no new MDX percentile functions were added. So I thought I would share four ways to calculate percentiles just like Excel, using: MDX, SSAS Stored Procedures, T-SQL, and SQL CLR aggregates.

The example queries in this post use the AdventureWorksDW2012 database and the AdventureWorksDW2012 UDM Analysis Services database, with the primary tables being FactCallCenter and DimDate. The goal was to find the 90th percentile for the ‘average issue wait time’ by shift. To ensure that the data used for the SSAS and SQL examples was the same, I created a very simple cube called callcentre in the AdventureWorksDW2012 model (see screenshot below); the cube contains one measure and two dimensions:

  • Measures
    • Average Time Per Issue
  • Dimensions
    • Date (existing date dimension)
    • Shifts (degenerate dimension from the Shift column in the FactCallCenter table)

The source for percentile knowledge was Wikipedia (come on, it’s on the internet, it must be true); both Excel percentile formulas, PERCENTILE.INC and PERCENTILE.EXC, are defined there under the Alternative methods heading.

Basically, according to Wikipedia, calculating an Excel-like percentile comes down to two steps:

– Get the ordinal of the value with one of the following formulas

  • Exclusive

n = \frac{P}{100}(N+1)

  • Inclusive

n = \frac{P}{100}(N-1)+1

– Calculate the percentile value by splitting the ordinal into its integer component k and decimal component d,
such that n = k + d. Then v_P is calculated as:

 v_P = \begin{cases}   v_1, & \mbox{for }n=1 \\   v_N, & \mbox{for }n=N \\   v_k+d(v_{k+1}-v_k), & \mbox{for }1 < n < N   \end{cases}
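To make the two steps concrete, here is a minimal Python sketch of both methods (the function name and structure are mine, not taken from any of the implementations below):

```python
def percentile(values, p, method="EXC"):
    """Excel-style percentile; p is a fraction, e.g. 0.90 for the 90th."""
    v = sorted(values)
    N = len(v)
    # Step 1: find the (1-based) ordinal n of the percentile value
    if method == "EXC":
        n = p * (N + 1)
    else:  # "INC"
        n = p * (N - 1) + 1
    # Step 2: split n into integer part k and decimal part d, then interpolate
    if n <= 1:
        return v[0]
    if n >= N:
        return v[-1]
    k, d = int(n), n % 1
    return v[k - 1] + d * (v[k] - v[k - 1])
```

For example, percentile([15, 20, 35, 40, 50], 0.40) returns 26.0 with the exclusive method and 29.0 with the inclusive one.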

Here Goes…

T-SQL

The code snippet below displays the query used to calculate a percentile value using the exclusive method.


/*steps
1. determine the index where percentile value is located
2. Calculate the percentile value
*/

declare @percentile numeric(19,2)
set @percentile = .90

;with percentile (rowIndex, rangePoint, [Shift], AverageTimePerIssue) as
(
    select
        row_number() over (partition by [Shift] order by AverageTimePerIssue asc) as rowIndex,
        ((count(1) over (partition by [Shift]) + 1) * @percentile) as rangePoint,
        a.[Shift],
        AverageTimePerIssue
    from FactCallCenter a
    where AverageTimePerIssue is not null
)
select
    [Shift],
    case when max(rangePoint) % 1 <> 0 then
        cast(round(min(AverageTimePerIssue) +
            ((max(AverageTimePerIssue) - min(AverageTimePerIssue))
            * (max(rangePoint) % 1)), 2) as numeric(19,2))
    else
        cast(round(max(AverageTimePerIssue), 2) as numeric(19,2))
    end as percentileValue
from percentile
where rowIndex between floor(rangePoint) and ceiling(rangePoint)
group by [Shift], rangePoint

If you want to use the inclusive method, you would modify the ((count(1) over (partition by [Shift]) + 1) * @percentile) portion of the query to read ((count(1) over (partition by [Shift]) - 1) * @percentile) + 1.
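If you want to sanity-check either variant of the query without opening Excel, Python's statistics.quantiles supports the same two interpolation schemes; to the best of my knowledge its 'exclusive' and 'inclusive' methods follow the same formulas as PERCENTILE.EXC and PERCENTILE.INC. The sample data here is made up:

```python
import statistics

data = [4.2, 5.1, 6.3, 7.7, 8.0, 9.4, 10.2, 11.8, 12.5, 13.1]

# quantiles(n=100) returns the 99 cut points; index 89 is the 90th percentile
p90_exc = statistics.quantiles(data, n=100, method="exclusive")[89]
p90_inc = statistics.quantiles(data, n=100, method="inclusive")[89]
```

With this data the exclusive 90th percentile works out to 13.04 and the inclusive one to 12.56, which makes for a handy cross-check against what the T-SQL returns.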

SQL CLR Aggregate

While the T-SQL script works, it requires a lot of repetitive typing for multiple calculations and is not very reusable. I mean, you could always bookmark this post or save the script as a stub for reuse, but I thought that a CLR aggregate would be a better solution. As you can see, selecting a percentile using an aggregate function requires much less typing and allows for reuse of the function without having to rewrite the implementation.

select [shift], [dbo].[percentiles]([AverageTimePerIssue], 90, 'EXC') from FactCallCenter
group by [shift]

The Aggregate Function takes three parameters:

  1. The column to evaluate the percentile value from
  2. The percentile to find as an integer
  3. The calculation Method {‘EXC’, ‘INC’}

To create the CLR aggregate function you can create a new SQL Server Database Project (if you are using Visual Studio 2010 and have installed the new SQL CLR Database Project template, which allows you to deploy to SQL 2012) or a new Visual C# SQL CLR Database Project (if you are using VS 2010 with SQL 2005 or 2008); these projects will deploy your function directly to SQL Server. If you do not want to create a database project, you can also simply create a new class library, then compile and register your assembly and function using SQL scripts (see script below). If you do not want to recreate the function, you can download the compiled .dll here and execute the code below, assuming you saved the file to ‘c:\Temp’, in the context of the database where you want to register the assembly and create the function.


EXEC sp_configure 'show advanced options', '1';
go
reconfigure;
go
EXEC sp_configure 'clr enabled', '1';
go
reconfigure;
go
-- Turn advanced options back off
EXEC sp_configure 'show advanced options', '0';
go
reconfigure;
go

CREATE ASSEMBLY [sql.clr.percentiles] from 'c:\Temp\sql.clr.percentiles.dll'
WITH PERMISSION_SET = SAFE
GO

CREATE AGGREGATE [dbo].[percentiles]
(@value [float], @percentile [smallint], @calcMethod [nvarchar](4000))
RETURNS [float]
EXTERNAL NAME [sql.clr.percentiles].[Percentiles]
GO

Below you will find the code for the aggregate. The code basically performs the same operations as the SQL equivalent: it finds the ordinal based on the previously defined formulas and then returns the percentile value using the formula shown earlier.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToDuplicates = true, IsInvariantToNulls = false, IsNullIfEmpty = true, IsInvariantToOrder = false, MaxByteSize = -1)]
public struct Percentiles : Microsoft.SqlServer.Server.IBinarySerialize
{
    public void Init()
    {
        valueList = new List<double>();
    }

    public void Accumulate(SqlDouble value, SqlInt16 percentile, SqlString calcMethod)
    {
        if (!value.IsNull)
        {
            valueList.Add(value.Value);
        }

        if (!percentile.IsNull)
        {
            PercentileValue = percentile.Value > 100 ? (short)100 : percentile.Value;
        }
        Method = calcMethod.ToString();
    }

    public void Merge(Percentiles Group)
    {
        foreach (double d in Group.valueList)
        {
            valueList.Add(d);
        }
        PercentileValue = Group.PercentileValue;
        // carry the calculation method across merged groups
        Method = Group.Method;
    }

    public SqlDouble Terminate()
    {
        double rangePoint = 0;
        switch (Method)
        {
            case "EXC":
                rangePoint = ((Convert.ToDouble(PercentileValue) / 100) * (valueList.Count + 1)) - 1; // subtract 1 as the list is zero based
                break;
            case "INC":
                rangePoint = (((Convert.ToDouble(PercentileValue) / 100) * (valueList.Count - 1)) + 1) - 1; // subtract 1 as the list is zero based
                break;
            default:
                return SqlDouble.Null;
        }
        valueList.Sort();

        // if rangePoint is a whole number
        if (rangePoint % 1 == 0)
        {
            return valueList[(int)rangePoint];
        }
        else if ((int)Math.Ceiling(rangePoint) < valueList.Count)
        {
            return valueList[(int)Math.Floor(rangePoint)] +
                (rangePoint % 1 * (valueList[(int)Math.Ceiling(rangePoint)] - valueList[(int)Math.Floor(rangePoint)]));
        }
        else
        {
            return SqlDouble.Null;
        }
    }

    public void Read(BinaryReader binaryReader)
    {
        if (valueList == null)
        {
            valueList = new List<double>();
        }

        Method = binaryReader.ReadString();
        PercentileValue = binaryReader.ReadInt16();

        long readerLength = binaryReader.BaseStream.Length;
        while (binaryReader.BaseStream.Position < readerLength)
        {
            valueList.Add(binaryReader.ReadDouble());
        }
    }

    public void Write(BinaryWriter binaryWriter)
    {
        binaryWriter.Write(Method);
        binaryWriter.Write(PercentileValue);
        foreach (double d in valueList)
        {
            binaryWriter.Write(d);
        }
    }

    private string Method { get; set; }
    private short PercentileValue { get; set; }
    private List<double> valueList { get; set; }
}

MDX

Calculating a percentile in MDX follows the same pattern: get the ordinal from your ordered set and calculate the value, as seen in the code below. In an effort to keep the MDX readable, allow for easier troubleshooting, and get better performance, I broke the calculation into several calculated measures, with the final calculated measure ‘Percentiles’ using the measures defined earlier in the script. Thanks to Chris Webb and Greg Galloway for their MDX optimization suggestions.

The code below provides the MDX used to generate a percentile value.

//number of items in the set
WITH MEMBER [measures].[Count]
as COUNT(
NONEMPTY(({[Shifts].currentmember}*
{[Date].[Day of Month].[Day of Month].members})
 ,[Measures].[Average Time Per Issue]))

//percentile index

MEMBER [Measures].[RangePoint] AS
(([measures].[Count]+1) *.90) -1 /* subtract 1 from total as item array is 0 based*/

MEMBER [Measures].[RangePoint_Int]
 AS
 INT([Measures].[RangePoint])

member Measures.[floor] as
 (BOTTOMCOUNT(NONEMPTY(({[Shifts].currentmember}*
{[Date].[Day of Month].[Day of Month].members})
 ,[Measures].[Average Time Per Issue]), [measures].[Count], [Measures].[Average Time Per Issue]).
 item(([Measures].[RangePoint_int])),
 [Measures].[Average Time Per Issue])

member Measures.[Ceiling] as
(BOTTOMCOUNT(NONEMPTY(({[Shifts].currentmember}*
{[Date].[Day of Month].[Day of Month].members})
 ,[Measures].[Average Time Per Issue]), [measures].[Count], [Measures].[Average Time Per Issue]).
 item(([Measures].[RangePoint_int]+1)),
 [Measures].[Average Time Per Issue])

MEMBER [Measures].[Percentiles] AS
IIF([Measures].[RangePoint] - [Measures].[RangePoint_Int] = 0,
//rangepoint is a whole number, so the floor measure already holds the percentile value
 Measures.[floor],
 //rangepoint is not a whole number
 Measures.[floor]
+
([Measures].[RangePoint] - [Measures].[RangePoint_Int])
*
(
Measures.[ceiling]
-
measures.[floor])
)
select {[Measures].[Percentiles]} on 0
,
([Shifts].[Shift].[Shift].members) on 1
from callcentre

To switch to the inclusive method, you would modify the RangePoint calculated measure to read:


MEMBER [Measures].[RangePoint] AS
((([measures].[Count]-1) *.90)+1) -1

SSAS Stored Procedures (ASSP)

Like T-SQL, the MDX solution requires lots of typing, and reuse requires re-implementation; if you type anything like I do, extra typing means lots of room for error. Fortunately, SSAS provides a mechanism for creating reusable user-defined functions: Analysis Services Stored Procedures.

Using the predefined stored procedure requires a fraction of the code, with no need to re-implement the logic.


with member measures.[90th percentile] as
[ssas storedprocedures].ValueAtPercentile(
NONEMPTY(([Date].[Day of Month].[day of month].members, [Measures].[Average Time Per Issue])),
 [Measures].[Average Time Per Issue], .90, true, "EXC")

select measures.[90th percentile] on 0,
 [Shifts].[Shift].[Shift].members on 1
from callcentre

The function takes 5 Parameters:

  1. The set to evaluate
  2. The measure to get the percentile of
  3. The percentile to find as a decimal
  4. Whether to order the set {true, false}
  5. The calculation method {‘EXC’, ‘INC’}

It should be noted that ASSPs used within calculated measures may not perform as well as straight MDX; however, for me, having both options available is really useful. I did some high-level testing with these scripts using the SQL 2012 Adventure Works cube and found the ASSPs performed up to 14 times faster than the straight MDX scripts, bonus.

  • Using my laptop, fully loaded with 3 instances of SQL Server including 2012 with all SSAS flavours, the ASSP executed in two seconds for both the inclusive and exclusive modes, whereas the straight MDX took 28 seconds for inclusive and 9 seconds for exclusive.

My work on this stored procedure was heavily influenced by the Analysis Services Stored Procedure Project on CodePlex; it was a huge help in learning how to write ASSPs, and it provided an Order() function that I was able to reuse. I have submitted the function to the ASSP project authors in hopes that it may be included there. For those of you that do not want to create the ASSP, you can download the compiled .dll here and follow the steps outlined at the end of this section to add the assembly to Analysis Services.

If you want to create your own ASSP Library you can follow the steps below.

  • create a new C# Class Library project in Visual Studio
  • add the following references
    • Microsoft.AnalysisServices (C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.AnalysisServices.DLL)
    • msmgdsrv (C:\Program Files (x86)\Microsoft Analysis Services\AS OLEDB\10\msmgdsrv.dll)
  • Set the target framework project property to .Net Framework 3.5
  • In project properties, Sign the assembly
  • Rename the Class1.cs to something meaningful (in my solution it is called percentiles.cs)
  • Add the following using statements

using System;
using Microsoft.AnalysisServices.AdomdServer;
using Microsoft.AnalysisServices;
using System.Collections.Generic;

  • Download the following file and add it to your solution (the setOrdering class from the ASSP CodePlex project)

  • Replace all the code under the using statements with the following

namespace ssas.storedprocedures
{
    public class PercentileFunctions
    {
        [SafeToPrepare(true)]
        public static double RangePoint(Set inputSet, double percentileValue, string calcMethod)
        {
            switch (calcMethod)
            {
                case "EXC":
                    return (percentileValue * (inputSet.Tuples.Count + 1)) - 1;
                case "INC":
                    return ((percentileValue * (inputSet.Tuples.Count - 1)) + 1) - 1;
                default:
                    return (percentileValue * (inputSet.Tuples.Count + 1)) - 1;
            }
        }

        [SafeToPrepare(true)]
        public static double ValueAtPercentile(Set inputSet, Expression sortExpression, double percentileValue, bool sortAscending, string calcMethod)
        {
            // get the position where the percentile falls
            double Rank = RangePoint(inputSet, percentileValue, calcMethod);

            // order the set ascending using the CodePlex SSAS Stored Procedure function
            if (sortAscending)
            {
                Set s = setOrdering.Order(inputSet, sortExpression);

                // if the Rank is a whole number
                if ((Rank % 1) == 0)
                {
                    return sortExpression.Calculate(s.Tuples[Convert.ToInt32(Rank)]).ToDouble();
                }
                // if Rank is a decimal, interpolate between the two surrounding tuples
                else
                {
                    return sortExpression.Calculate(s.Tuples[(int)Math.Floor(Rank)]).ToDouble()
                        + (Rank % 1 *
                            (sortExpression.Calculate(s.Tuples[(int)Math.Ceiling(Rank)]).ToDouble()
                            - sortExpression.Calculate(s.Tuples[(int)Math.Floor(Rank)]).ToDouble()));
                }
            }
            else
            {
                // if the Rank is a whole number
                if ((Rank % 1) == 0)
                {
                    return sortExpression.Calculate(inputSet.Tuples[Convert.ToInt32(Rank)]).ToDouble();
                }
                // if Rank is a decimal, interpolate between the two surrounding tuples
                else
                {
                    return sortExpression.Calculate(inputSet.Tuples[Convert.ToInt32(Rank)]).ToDouble()
                        + ((sortExpression.Calculate(inputSet.Tuples[Convert.ToInt32(Rank) + 1]).ToDouble()
                            - sortExpression.Calculate(inputSet.Tuples[Convert.ToInt32(Rank)]).ToDouble()) * (Rank - Convert.ToInt32(Rank)));
                }
            }
        }
    }
}
  • Compile the code
  • Open SSMS and connect to Analysis Services
  • Right-click on the Assemblies folder and select New Assembly
  • Browse to the compiled .dll
  • Check the include debug information option if you want to debug the function (how to debug link)
  • Use the function in an MDX query

Happy querying.


Dynamically generated Create Table Statement SQL Server

— Modified the script to capture and include column default values

It has been a while since I last added to the googleverse; I’ve been taking more than I’ve been giving lately, time to give a little back.

In this short post you will find a query that builds a CREATE TABLE statement for a given valid table name. The script only handles a single primary key constraint and does not handle foreign key constraints, but even with its limitations I hope this helps someone out there.

declare @SqlStart nvarchar(max) = ''
declare @sql nvarchar(max) = ''
declare @sqlEnd nvarchar(max) = ''
declare @table varchar(max) = 'T_ESB_Submission_Template'

set @SqlStart = 'CREATE TABLE [' + @table + '] (' + Char(13) 
				
Select @sql = @sql + a.Column_Name + ' ' 
+ Data_Type 
+ CASE WHEN CHARACTER_MAXIMUM_LENGTH  IS NULL or DATA_TYPE = 'xml' THEN '' 
ELSE '(' + 
CASE WHEN CHARACTER_MAXIMUM_LENGTH>0 THEN 
CAST(CHARACTER_MAXIMUM_LENGTH as varchar(10))
ELSE 'max' END +')' END
+ CASE WHEN NUMERIC_PRECISION IS NULL OR DATA_TYPE in ('Int', 'tinyint', 'bigint', 'smallint') THEN '' 
ELSE '(' + CAST(NUMERIC_PRECISION as varchar(10)) +','+CAST(NUMERIC_SCALE as varchar(10)) +')' END 
+ CASE when exists (  select id from syscolumns where object_name(id)=@table 
		and name=a.column_name 
		and columnproperty(id,name,'IsIdentity') = 1  ) 
THEN 
' IDENTITY(' +  
cast(ident_seed(@table) as varchar) + ',' +  
cast(ident_incr(@table) as varchar) + ')' ELSE '' END 
+ CASE WHEN b.default_value is not null THEN ' DEFAULT ' + 
substring(b.default_value, 2, Len(b.default_value)-2) + ' ' ELSE '' END
+ CASE WHEN IS_NULLABLE = 'NO' THEN ' NOT NULL' ELSE ' NULL ' END + CHAR(13) + ','
from  Information_Schema.COLUMNS a
join 
(  SELECT so.name AS table_name, 
          sc.name AS column_name, 
          sm.text AS default_value
     FROM sys.sysobjects so
     JOIN sys.syscolumns sc ON sc.id = so.id
 left JOIN sys.syscomments SM ON sm.id = sc.cdefault
    WHERE so.xtype = 'U' 
      AND SO.name = @table) b on b.column_name = a.COLUMN_NAME and b.table_name = a.TABLE_NAME
where a.Table_Name = @table

if((select count(1) from INFORMATION_SCHEMA.TABLE_CONSTRAINTS where TABLE_NAME = @table) >0)
BEGIN
     select @sqlEnd =char(13) + 'CONSTRAINT [PK_' + @table + '_1] PRIMARY KEY NONCLUSTERED' +char(13)
	+'(' +char(13) + Column_Name + ' ASC ' + char(13)+')WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,         IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]) ON [PRIMARY]'
from INFORMATION_SCHEMA.KEY_COLUMN_USAGE  where TABLE_NAME = @table					

set @Sql = @SqlStart + SUBSTRING(@sql, 0, len(@sql)-1) + @sqlEnd
END
else
BEGIN
     set @Sql = @SqlStart + SUBSTRING(@sql, 0, len(@sql)-1) + ')'
END

print @sql				
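Two T-SQL idioms do the heavy lifting here: SELECT @sql = @sql + ... accumulates a fragment for each column row, and SUBSTRING(@sql, 0, LEN(@sql)-1) trims the trailing separator before the statement is closed. A toy Python sketch of the same assemble-then-trim pattern (the column metadata is invented for illustration):

```python
# Accumulate one "name type [NOT] NULL,\n" fragment per column,
# then trim the trailing separator before closing the statement.
columns = [
    ("Id", "int", False),
    ("Name", "varchar(50)", True),
    ("Amount", "numeric(18,2)", True),
]

sql = ""
for name, data_type, nullable in columns:
    sql += f"{name} {data_type}{'' if nullable else ' NOT'} NULL,\n"

# strip the trailing ",\n" left by the last iteration, then close the DDL
ddl = "CREATE TABLE [MyTable] (\n" + sql.rstrip(",\n") + "\n)"
```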
			

Generating multiple rows from a single record with start and end columns

I’m working on a project where the reporting requirement is to take a single record containing the overall total and the start and end period, and generate multiple rows including all periods between the start and end. I knew that I could get the results I needed with a common table expression (CTE), but time was of the essence, so I turned to my trusty friend the ‘Cursor’. I know cursors are for the weak and the lazy, but I needed to prototype something really fast, and for me this was the easiest way. For those of you wondering why he is blogging about being weak and lazy: I did return to the CTE method and was rewarded with the required results using a fraction of the SQL, and produced a stored procedure which is easier to modify and tweak as reporting requirements change.

Below you will find some sample code used to generate multiple rows from a single data row containing a start and an end column. In this example I show both integers and dates used to denote the beginning and ending values.

/*using integers as starting and ending values*/
create table #cteTempInt
(
id int identity (1,1)
,amount numeric(18,2)
, Divisor int
, starting int
, ending int
)

insert into #cteTempInt Values(1200, 6, 1, 6)
insert into #cteTempInt Values(4300, 10, 2, 11)
insert into #cteTempInt Values(2400, 4, 3, 6)

print 'results from integer sample'
;with t(id, periodAmount, period, ending) AS
(
select id, amount/divisor as periodAmount, starting as period, ending from #cteTempInt
union all
select id, periodAmount, period +1 as period , ending from t
where period+1 <= ending
)

select id, periodamount, period,
row_number() over(partition by id order by id,period) as paymentNumber from t order by id, period

DROP TABLE #cteTempInt

/*using dates as starting and ending values*/

create table #cteTempDate
(
id int identity (1,1)
,amount numeric(18,2)
, Divisor int
, starting date
, ending date
)

insert into #cteTempDate Values(1200, 6, '2012-01-01','2012-06-01')
insert into #cteTempDate Values(4300, 10,'2012-02-01','2012-11-01')
insert into #cteTempDate Values(2400, 4, '2012-05-01','2012-08-01')

print 'results from date sample'
;with t(id, periodAmount, period, ending) AS
(
select id, amount/divisor as periodAmount, starting as period, ending from #cteTempDate
union all
select id, periodAmount, dateadd(mm, 1, period) as period , ending from t
where dateadd(mm,1,period) <= ending
)
select id, periodamount, period,
row_number() over(partition by id order by id,period) as paymentNumber from t order by id, period

drop table #cteTempDate

results from integer sample
id          periodamount                            period      paymentNumber
----------- --------------------------------------- ----------- --------------------
1           200.0000000000000                       1           1
1           200.0000000000000                       2           2
1           200.0000000000000                       3           3
1           200.0000000000000                       4           4
1           200.0000000000000                       5           5
1           200.0000000000000                       6           6
2           430.0000000000000                       2           1
2           430.0000000000000                       3           2
2           430.0000000000000                       4           3
2           430.0000000000000                       5           4
2           430.0000000000000                       6           5
2           430.0000000000000                       7           6
2           430.0000000000000                       8           7
2           430.0000000000000                       9           8
2           430.0000000000000                       10          9
2           430.0000000000000                       11          10
3           600.0000000000000                       3           1
3           600.0000000000000                       4           2
3           600.0000000000000                       5           3
3           600.0000000000000                       6           4

results from date sample
id          periodamount                            period     paymentNumber
----------- --------------------------------------- ---------- --------------------
1           200.0000000000000                       2012-01-01 1
1           200.0000000000000                       2012-02-01 2
1           200.0000000000000                       2012-03-01 3
1           200.0000000000000                       2012-04-01 4
1           200.0000000000000                       2012-05-01 5
1           200.0000000000000                       2012-06-01 6
2           430.0000000000000                       2012-02-01 1
2           430.0000000000000                       2012-03-01 2
2           430.0000000000000                       2012-04-01 3
2           430.0000000000000                       2012-05-01 4
2           430.0000000000000                       2012-06-01 5
2           430.0000000000000                       2012-07-01 6
2           430.0000000000000                       2012-08-01 7
2           430.0000000000000                       2012-09-01 8
2           430.0000000000000                       2012-10-01 9
2           430.0000000000000                       2012-11-01 10
3           600.0000000000000                       2012-05-01 1
3           600.0000000000000                       2012-06-01 2
3           600.0000000000000                       2012-07-01 3
3           600.0000000000000                       2012-08-01 4

Squarified Heat Maps SQL Reporting Services Part 2

##UPDATE##
Jason Thomas (check out his blog here: http://road-blogs.blogspot.com/) asked what the first column in the select statement passed into the stored procedure is used for; the answer is nothing, so I have removed it and modified the code and SQL statement in the post.

Thanks Jason
##

Well, it has been a while, and I know how annoying it can be when part II of a part I you are interested in never shows up; all I can hope is that there is somebody out there that is still interested.

Steps to creating the Report

  • You will need the Contoso Data Warehouse for the query used in this demo report to execute
  • Launch Visual Studio 2010 and create a new Visual C# SQL CLR Database Project

  • Follow the wizard and setup database connection


  • Add a new stored procedure to the project


Replace the contents of the new .cs file with the code below.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections.Generic;
using System.Globalization;

//Enumeration used in the TreeMap creation functions
enum myOrientation { Horizontal, Vertical };

public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void TreeMapGeography(SqlSingle Height, SqlSingle Width, SqlString select)
    {
        //Check to see if the SQL Statement passed into the proc contains bad words
        if (select.ToString().ToLower().Contains("update") || select.ToString().ToLower().Contains("delete")
            || select.ToString().ToLower().Contains("truncate") || select.ToString().ToLower().Contains("drop")
            || select.ToString().ToLower().Contains("alter") || select.ToString().ToLower().Contains("create"))
            return;
        //Open SQL Connection
        using (SqlConnection connection = new SqlConnection("context connection=true"))
        {
            try
            {
                connection.Open();
                SqlCommand command = new SqlCommand(select.ToString(), connection);
                SqlDataReader r = command.ExecuteReader();
                DataTable t = new DataTable();

                if (r.HasRows)
                {

                    t.Load(r, LoadOption.OverwriteChanges);
                    //Instantiate new TreeMap Object
                    /*sqlString property of the TreeMap will contain the sql statement required
                    to generate the treemap*/
                    TreeMap hm = new TreeMap(t, (Single)Height, (Single)Width);
                    if (hm.sqlString != "")
                    {
                        command.CommandText = hm.sqlString;
                        SqlContext.Pipe.ExecuteAndSend(command);
                    }
                    else
                    {
                        SqlContext.Pipe.Send("TreeMap Function Failed to generate any records");
                        return;
                    }
                }


            }
            catch (SqlException e)
            {
                SqlContext.Pipe.Send(e.Message);
            }
        }
    }


    private class TreeMap
    {
        public class TreeMapData
        {
           /*public int id { get; set; }*/
            public int amount { get; set; }
            public Single scaledAmount { get; set; }
            public Single x { get; set; }
            public Single y { get; set; }
            public Single w { get; set; }
            public Single h { get; set; }
            public string nameColumn { get; set; }
            public myOrientation Orientation { get; set; }

            public TreeMapData(/*int Id,*/ int Amount, string NameColumn)
            {
                /*id = Id;*/
                amount = Amount;
                nameColumn = NameColumn;
            }
        }

        private Single myHeight = 0;
        private Single myWidth = 0;
        private Single aspect = 0;
        private Single yOffSet = 0;
        private Single xOffSet = 0;
        private Single valueScale = 0;
        public string sqlString = string.Empty;

        private List<TreeMapData> valueList = new List<TreeMapData>();
        private List<TreeMapData> vList = new List<TreeMapData>();

        public TreeMap(DataTable TreeMapData, Single Height, Single Width)
        {
            myHeight = Height;
            myWidth = Width;
            foreach (DataRow dr in TreeMapData.Rows)
            {
                TreeMapData h = new TreeMapData(Convert.ToInt32(dr[0]), dr[1].ToString());
                valueList.Add(h);
            }
            if (valueList.Count > 0)
                sqlString = Squarify(valueList);
        }

        public string Squarify(List<TreeMapData> fullList)
        {
            string sqlString = String.Empty;
            Single tAspect = 999;
            Single lAspect = 0;
            int passes = 1;
            int i = 1;
            scaleAmounts();
            for (; i <= fullList.Count; i++)
            {
                lAspect = getAspect(myHeight, myWidth, fullList, passes, DrawingOrientation(myWidth, myHeight));
                if (lAspect >= tAspect || i == fullList.Count)
                {
                    tAspect = 999;
                    layout(myHeight, myWidth, fullList, passes, DrawingOrientation(myWidth, myHeight));
                    passes = 1;
                    if (fullList.Count == i)
                        fullList.Clear();
                    else
                        fullList.RemoveRange(0, i - 1);
                    i = 0;
                }
                else
                {
                    passes++;
                    tAspect = lAspect;
                }
            }
            foreach (TreeMapData t in vList)
            {
                sqlString += String.Format(CultureInfo.InvariantCulture," Select geometry::STPolyFromText('POLYGON (({0} {1}, {2} {3}, {4} {5}, {6} {7}, {8} {9}))', 0) as Geo, {10} as amount, "
                    + " '{11}' as Name \r\n Union all\r\n",
                   t.x, t.y,
                   t.x, t.h + t.y,
                   t.x + t.w, t.h + t.y,
                   t.w + t.x, t.y,
                   t.x, t.y,
                  t.amount, t.nameColumn.Replace("'", "").Trim());
            }
            return sqlString.Substring(0, sqlString.Length - 13);
            //if(getAspect(Height, Width, fullList, DrawingOrientation(Width, Height)) <= 
            //setRectangles(vList, 4, 6);
        }
        public void scaleAmounts()
        {
            Single dataTotal = 0;
            foreach (TreeMapData t in valueList)
            {
                dataTotal += t.amount;
            }
            valueScale = (myWidth * myHeight) / dataTotal;

            foreach (TreeMapData s in valueList)
            {
                s.scaledAmount = s.amount * valueScale;
            }
        }
        private myOrientation DrawingOrientation(Single width, Single height)
        {
            return width > height ? myOrientation.Vertical : myOrientation.Horizontal;
        }
        private Single getAspect(Single height, Single width, List<TreeMapData> fullList, int passes, myOrientation orientation)
        {
            Single tWidth = 0;
            Single tHeight = 0;
            Single total = 0;

            for (int t = 0; t < passes; t++)
            {
                total += fullList[t].scaledAmount;
            }
            switch (orientation)
            {
                case myOrientation.Vertical:

                    tWidth = total / myHeight;

                    for (int i = 0; i < passes; i++)
                    {
                        tHeight = myHeight * fullList[i].scaledAmount / total;
                        fullList[i].h = tHeight;
                        fullList[i].w = tWidth;
                    }
                    aspect = Math.Max((tHeight / tWidth), (tWidth / tHeight));
                    break;

                case myOrientation.Horizontal:
                    tHeight = total / myWidth;

                    for (int i = 0; i < passes; i++)
                    {
                        tWidth = myWidth * fullList[i].scaledAmount / total;
                        fullList[i].h = tHeight;
                        fullList[i].w = tWidth;
                    }
                    aspect = Math.Max((tHeight / tWidth), (tWidth / tHeight));
                    break;
            }
            return aspect;
        }
        private void layout(Single height, Single width, List<TreeMapData> fullList, int passes, myOrientation orientation)
        {
            Single tempHeight = 0;
            Single tempWidth = 0;
            Single tWidth = 0;
            Single tHeight = 0;
            Single total = 0;
            passes = fullList.Count == passes ? passes : passes - 1;

            for (int t = 0; t < passes; t++)
            {
                total += fullList[t].scaledAmount;
            }

            switch (orientation)
            {
                case myOrientation.Vertical:
                    tWidth = total / myHeight;
                    for (int i = 0; i < passes; i++)
                    {
                        tHeight = myHeight * fullList[i].scaledAmount / total;

                        fullList[i].h = tHeight;
                        fullList[i].w = tWidth;
                        fullList[i].Orientation = orientation;
                        fullList[i].x = xOffSet;
                        fullList[i].y = yOffSet + tempHeight;
                        vList.Add(fullList[i]);
                        tempHeight += tHeight;
                    }
                    xOffSet += tWidth;
                    myWidth -= tWidth;
                    tempHeight = 0;
                    break;
                case myOrientation.Horizontal:

                    tHeight = total / myWidth;
                    for (int i = 0; i < passes; i++)
                    {
                        tWidth = myWidth * fullList[i].scaledAmount / total;

                        fullList[i].h = tHeight;
                        fullList[i].w = tWidth;
                        fullList[i].Orientation = orientation;
                        fullList[i].x = xOffSet + tempWidth;
                        fullList[i].y = yOffSet;
                        vList.Add(fullList[i]);
                        tempWidth += tWidth;
                    }

                    yOffSet += tHeight;
                    myHeight -= tHeight;

                    break;
            }

        }
    }
};
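For readers who want the layout logic outside of C#, the scaling performed by scaleAmounts() and the aspect-ratio test in getAspect() can be sketched in a few lines of Python. This is an illustration only; the function names scale_amounts and worst_aspect are mine, not from the source above:

```python
def scale_amounts(amounts, width, height):
    """Scale raw amounts so their sum equals the total drawing area,
    mirroring scaleAmounts(): valueScale = (myWidth * myHeight) / dataTotal."""
    factor = (width * height) / sum(amounts)
    return [a * factor for a in amounts]

def worst_aspect(scaled, passes, width, height, vertical):
    """Aspect ratio of the last cell when the first `passes` scaled areas
    are laid out as one column (vertical) or one row (horizontal),
    mirroring getAspect() above."""
    total = sum(scaled[:passes])
    if vertical:
        w = total / height                       # shared column width
        h = height * scaled[passes - 1] / total  # last cell height
    else:
        h = total / width                        # shared row height
        w = width * scaled[passes - 1] / total   # last cell width
    return max(h / w, w / h)
```

In the squarified algorithm, cells keep being added to the current row or column as long as the worst aspect ratio improves; for example, laying out areas [1.0, 1.0, 2.0] in a 2x2 square, a one-item column has aspect 4.0, while a two-item column improves it to 1.0.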
  • Ensure that the .Net Framework being targeted is 3.5

  • Deploy the project
  • Create a new Report Server Project (SQL 2012 BI Templates are in Visual Studio 2010)

  • Add a new report to the Project


  • Drag a Map onto the report surface

  • Select spatial query option

  • Create new Dataset

  • Create a new data source
  • Click Next


  • In the query designer click ‘Edit as Text’
  • Copy the query below into the query text pane

exec dbo.TreeMapGeography 16, 25, 'select sum(SalesAmount) as salesamount,c.productsubcategoryname from contosodw..factsales a join contosodw..Dimproduct b on a.productKey = b.productKey join contosodw..[DimProductSubcategory] c on b.productsubcategorykey = c.productsubcategorykey where datepart(yyyy,a.DateKey) = 2009 group by b.productsubcategorykey, c.productsubcategoryname order by salesamount desc'

  • Click Next


  • Uncheck the embed data in report option
  • Click Next

  • Choose the Color Analytical Map Option
  • Click Next

  • Choose the current dataset as the Analytical Dataset
  • Click Next

  • Select Sum(amount) as the field to visualize
  • Check Display Labels
  • Choose the Name field as the Data field
  • Click Next

  • Delete the Legend and Scales

  • Right-click on the map and select Polygon Color Rules

  • Modify the Start color to Red, the Middle color to Yellow, and the End color to Green

  • Click Preview

Squarified Heat Maps SQL Reporting Services Part 1

Update:

A friend of mine, let's call him Gerald, asked me what the point of this report is, to which I said 'Green is good and Red is bad'. Not such a good answer. I suppose the best way I can describe the value of this type of visualization is this: for a reasonably sized data set, you are providing the end user with a report that displays each member's relative merit not only in relation to the parent or the whole, but also in relation to its siblings. By leveraging report actions in SSRS, you can click through to see how the children of any member contribute to the relative success or failure of that member. Please let me know your thoughts on the merits of this report and my implementation.

One day I was thinking about what it would take to create a set of rectangles within a larger rectangle, sized according to their weight relative to the larger rectangle, in order to build a heat map in SSRS.

So I Googled it and found a couple of great posts from Teo Lachev and Jonathan Hodgson, along with the "Squarified Treemaps" paper by Mark Bruls, Kees Huizing, and Jarke J. van Wijk of the Eindhoven University of Technology, Dept. of Mathematics and Computer Science (thank you very much).

Using these three resources I began my journey.

First I created a simple Windows Forms application with a multiline text box. All this app does is display a string of unioned SELECT statements, so I could test the effectiveness of my code in SSMS. After about 1000 attempts it actually worked: given a set of data, a width, and a height, the application generated a SELECT statement that delivered the correct 'Squarified' heat map visualization.

The next step was to create a CLR stored procedure that accepts the following parameters: Height, Width, and Query. I made sure that the procedure would not execute the query if it contained keywords like DELETE, INSERT, UPDATE, DROP, TRUNCATE, and ALTER.

Basically, what the procedure does is:

  • Executes the query passed in and loads a DataTable
  • Instantiates a HeatMap object, passing in the DataTable, Height, and Width
  • Sets the CommandText of a SqlCommand to the sqlString attribute of the HeatMap object previously created
  • Executes the SQL command and sends the results back
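The flow above, together with the keyword check mentioned earlier, can be sketched as follows. This is Python for illustration only; the real implementation is a C# CLR stored procedure, and every name here (is_query_safe, run_heatmap_proc, and the execute/make_heatmap callbacks standing in for ADO.NET and the HeatMap class) is hypothetical:

```python
import re

# Data-modification keywords the procedure refuses to run.
BLOCKED = ("delete", "insert", "update", "drop", "truncate", "alter")

def is_query_safe(query):
    """Reject queries containing any blocked keyword as a whole word."""
    words = re.findall(r"[a-z]+", query.lower())
    return not any(w in BLOCKED for w in words)

def run_heatmap_proc(height, width, query, execute, make_heatmap):
    """Execute the query, build a HeatMap from the rows, then run the
    heat map's generated spatial SELECT and return its results."""
    if not is_query_safe(query):
        raise ValueError("query contains a blocked keyword")
    table = execute(query)                       # load a data table
    heatmap = make_heatmap(table, height, width) # HeatMap(table, h, w)
    return execute(heatmap.sql_string)           # spatial rows for SSRS
```

Note that matching whole words (rather than substrings) avoids rejecting harmless column names such as updated_at.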

Once the stored procedure was created and working, it was time to put it to use. I created a new SSRS report, added a Map report item, used the SQL Server spatial query option, and set the stored procedure as my spatial query, passing in the appropriate parameters; the result is the report below. The data for this report is sourced from the Contoso dataset.

In Part Two I will go through the source code and provide step by step instructions for creating the report.

SQL – Converting an integer to datetime

Sometimes in life you come across integers that represent a datetime value, and you want to convert them to something more readable. Here is a handy little helper that will convert integers represented as 'YYYYMMDDHHMM', e.g. 200805051010:

 select CAST(STUFF(STUFF(STUFF(STUFF(CAST(200805051010 as varchar(20)), 5, 0, '-'), 8, 0, '-'), 11, 0, ' '), 14, 0, ':') as datetime) as dt

(You could also use the CONVERT function if you wanted a different date format. I know I could have cast the int to a varchar(8) and gotten a valid date without all the stuffing, but I needed to keep the time part of the datetime.)
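For comparison, the same positional parsing that the nested STUFF calls perform can be sketched in Python (int_to_datetime is my name for it, not part of the post):

```python
from datetime import datetime

def int_to_datetime(value):
    """Convert an integer like 200805051010 ('YYYYMMDDHHMM') to a datetime,
    parsing the digits positionally as the T-SQL STUFF calls do."""
    return datetime.strptime(str(value), "%Y%m%d%H%M")

# int_to_datetime(200805051010) -> datetime(2008, 5, 5, 10, 10)
```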

As with just about everything I do, I am sure there is a better, more efficient way to do it that involves less typing, but it works, and that is my ultimate goal.