Introduction Into Window Functions on SQL Server
Why window functions may come in handy
Let us dive directly into a real world example. Youve got a table with salaries of different job functions and you are asked to find out what the n-tiles (quantiles, mean etc.) are of each function within a company. So this means youve got many people that hold a position which you want to compare with others. You want to compare salaries of marketing professionals to their peers, but you dont want to compare said salaries to marketing directors because they have their own salary bands. So that means you need to frame (also called partition) job titles with their own with which you calculate the n-tiles. Without window functions, this could be a quite fussy task.
In order to walk through the problem, create a new database, Im going to name mine FoehnWind [^1]. The following script creates a new table and inserts some sample data of 20 rows into it.
Tutorial
USE [FoehnWind];GOCREATE TABLE dbo.Employee(ID TINYINT IDENTITY(1, 1) NOT NULL, FirstName VARCHAR(150) NOT NULL, LastName VARCHAR(150) NULL, JobTitle VARCHAR(100) NOT NULL, AnnualSalary INT NOT NULL, CONSTRAINT [PK_dbo.Employee] PRIMARY KEY CLUSTERED(ID ASC))ON [PRIMARY];GOINSERT INTO [dbo].[Employee] VALUES ( 'Ken','Sanchez','Chief Executive Officer', 530000),( 'Jo','Berry','Marketing Assistant', 71875 ),( 'Lori','Penor','Marketing Assistant', 73398 ),( 'Pat','Coleman','Marketing Assistant', 51874 ),( 'Stuart','Macrae','Marketing Assistant', 59108 ),( 'Mary','Dempsey','Marketing Assistant', 57641 ),( 'Wanida','Benshoof','Marketing Assistant', 60779 ),( 'David','Bradley','Marketing Manager', 130000 ),( 'Jill','Williams','Marketing Specialist', 108117 ),( 'John','Wood','Marketing Specialist', 99407 ),( 'Sariya','Pit','Marketing Specialist', 95048 ),( 'Terry','Eminhizer','Marketing Specialist', 106316 ),( 'Michael','Sullivan','Marketing Specialist', 87047 ),( 'Ovidiu','Cracium','Marketing Specialist', 107883 ),( 'Rob','Walters','Marketing Specialist', 109369 ),( 'Garrett','Young','Senior Marketing Specialist', 118051 ),( 'George','Li','Senior Marketing Specialist', 117253 ),( 'Russell','King','Senior Marketing Specialist', 123362 ),( 'Sandra','Alayo','Senior Marketing Specialist', 118185 ),( 'Denise','Smith','Vice President Marketing', 250000 );GO
Firstly, lets use a window function which calculates the averages and percentiles of salaries for each job title:
SELECT distinct e.JobTitle ,AVG(e.AnnualSalary) OVER (PARTITION BY e.JobTitle) AS AverageSalary ,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY e.AnnualSalary) OVER (PARTITION BY e.JobTitle) as MedianContinous ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY e.AnnualSalary) OVER (PARTITION BY e.JobTitle) as MedianDisContinousFROM dbo.Employee eorder by e.JobTitle
You perhaps now wondered how the functions PERCENTILE_CONT()
and PERCENTILE_DISC()
differ. The discontinous variant returns an actual value that exists in the calculated set and pointing out that this values is the closest one being to the calculative median. The continous percentile in contrast returns a value that is actually the calculated median. For example, Marketing Assistant Stuart Macrae has an Annual Salary of 59108. This amount is also the median in this job role, but the calculated median is 59943.5 so Stuart is slightly below the median but his salary is the closest one to the median.
JobTitleEmployeeNameAnnualSalaryAverageSalaryMedianContinousMedianDisContinousChief Executive OfficerKen Sanchez530000530000530000530000Marketing AssistantPat Coleman518746244559943.559108Marketing AssistantMary Dempsey576416244559943.559108Marketing AssistantStuart Macrae591086244559943.559108Marketing AssistantWanida Benshoof607796244559943.559108Marketing AssistantJo Berry718756244559943.559108Marketing AssistantLori Penor733986244559943.559108Marketing ManagerDavid Bradley130000130000130000130000Marketing SpecialistMichael Sullivan87047101883106316106316Marketing SpecialistSariya Pit95048101883106316106316Marketing SpecialistJohn Wood99407101883106316106316Marketing SpecialistTerry Eminhizer106316101883106316106316Marketing SpecialistOvidiu Cracium107883101883106316106316Marketing SpecialistJill Williams108117101883106316106316Marketing SpecialistRob Walters109369101883106316106316Senior Marketing SpecialistGeorge Li117253119212118118118051Senior Marketing SpecialistGarrett Young118051119212118118118051Senior Marketing SpecialistSandra Alayo118185119212118118118051Senior Marketing SpecialistRussell King123362119212118118118051Vice President MarketingDenise Smith250000250000250000250000
Further reading
This was just an example of what window functions are handy for. More window functions are documented here. Ranked functions are used quite often to find out what happened over time, e.g. if a status of a product changed. If you want to dig even deeper into this topic, I recommend (no affiliation) the book T-SQL Window Functions: For data analysis and beyond (2nd Edition) (Developer Reference) by Itzik Ben-Gan. He's authored some excellent books with great knowledge.
[^1]: Fohen wind is a fairly common wind in the European Alps, which is a reference to the popular sample database Nordwind.
Cover Photo Credit: msandersmusic, on Pixabay
]]>