Pivot and Unpivot

Pivot queries still trip me up, and I find myself looking up the syntax every time. In short, PIVOT and UNPIVOT change the shape of a result set: PIVOT turns row values into columns, and UNPIVOT turns columns back into rows.

One thing to note is that the column identifiers in the UNPIVOT clause follow the catalog collation. For SQL Database, the collation is always SQL_Latin1_General_CP1_CI_AS. For SQL Server partially contained databases, the collation is always Latin1_General_100_CI_AS_KS_WS_SC. If the column is combined with other columns, a collate clause (COLLATE DATABASE_DEFAULT) is required to avoid conflicts.
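As a rough sketch of where that matters, suppose the Employee column produced by the UNPIVOT query later in this post is joined to a character column in another table. The dbo.EmployeeCodes table and its EmployeeCode and FullName columns are hypothetical, made up only for this illustration, and a conflict would only show up where the catalog collation differs from the database default.

```sql
-- Hypothetical lookup table: dbo.EmployeeCodes, EmployeeCode and FullName are assumptions.
SELECT u.VendorID, u.Employee, u.Orders, c.FullName
FROM (SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5 FROM pvt) AS p
UNPIVOT (Orders FOR Employee IN (Emp1, Emp2, Emp3, Emp4, Emp5)) AS u
JOIN dbo.EmployeeCodes AS c
    -- Employee carries the catalog collation; the explicit COLLATE makes the
    -- comparison run under the database default collation instead.
    ON c.EmployeeCode = u.Employee COLLATE DATABASE_DEFAULT;
```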

So let’s take a basic select statement:

USE AdventureWorks2014 ;  
GO  
SELECT DaysToManufacture, AVG(StandardCost) AS AverageCost   
FROM Production.Product  
GROUP BY DaysToManufacture;

This returns the following:

DaysToManufacture AverageCost
----------------- -----------
0                 5.0885
1                 223.88
2                 359.1082
4                 949.4105

To pivot the table you can write this kind of query:

-- Pivot table with one row and five columns  
SELECT 'AverageCost' AS Cost_Sorted_By_Production_Days,   
[0], [1], [2], [3], [4]  
FROM  
(SELECT DaysToManufacture, StandardCost   
    FROM Production.Product) AS SourceTable  
PIVOT  
(  
AVG(StandardCost)  
FOR DaysToManufacture IN ([0], [1], [2], [3], [4])  
) AS PivotTable;  

This returns the following:

Cost_Sorted_By_Production_Days 0           1           2           3           4         
------------------------------ ----------- ----------- ----------- ----------- -----------
AverageCost                    5.0885      223.88      359.1082    NULL        949.4105

As you can see, it places the average standard cost for each DaysToManufacture value into the columns 0 through 4. Column 3 is NULL because no products take three days to manufacture.
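If the PIVOT syntax is hard to remember, it can help to see what it is logically doing. The sketch below should return the same single row using plain conditional aggregation. It is an equivalent formulation for this particular query, offered as an illustration rather than as the way PIVOT has to be written.

```sql
-- Same result as the PIVOT query above, written with AVG over CASE expressions.
-- A CASE with no matching branch yields NULL, and AVG ignores NULLs,
-- so each column averages only the rows for its own DaysToManufacture value.
SELECT 'AverageCost' AS Cost_Sorted_By_Production_Days,
       AVG(CASE WHEN DaysToManufacture = 0 THEN StandardCost END) AS [0],
       AVG(CASE WHEN DaysToManufacture = 1 THEN StandardCost END) AS [1],
       AVG(CASE WHEN DaysToManufacture = 2 THEN StandardCost END) AS [2],
       AVG(CASE WHEN DaysToManufacture = 3 THEN StandardCost END) AS [3],
       AVG(CASE WHEN DaysToManufacture = 4 THEN StandardCost END) AS [4]
FROM Production.Product;
```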

Here is an example of an Unpivot query:

-- Create a sample table (orders per vendor, one column per employee) and insert values.
CREATE TABLE pvt (VendorID int, Emp1 int, Emp2 int,  
    Emp3 int, Emp4 int, Emp5 int);  
GO  
INSERT INTO pvt VALUES (1,4,3,5,4,4);  
INSERT INTO pvt VALUES (2,4,1,5,5,5);  
INSERT INTO pvt VALUES (3,4,3,5,4,4);  
INSERT INTO pvt VALUES (4,4,2,5,5,4);  
INSERT INTO pvt VALUES (5,5,1,5,5,5);  
GO  
-- Unpivot the table.  
SELECT VendorID, Employee, Orders  
FROM   
   (SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5  
   FROM pvt) p  
UNPIVOT  
   (Orders FOR Employee IN   
      (Emp1, Emp2, Emp3, Emp4, Emp5)  
) AS unpvt;
GO  

Here is a part of the result set:

VendorID    Employee    Orders
----------- ----------- ------
1            Emp1       4
1            Emp2       3 
1            Emp3       5
1            Emp4       4
1            Emp5       4
2            Emp1       4
2            Emp2       1
2            Emp3       5
2            Emp4       5
2            Emp5       5
...

Remember that UNPIVOT is not the exact reverse of PIVOT. PIVOT aggregates, so it may merge multiple rows into a single row in the output, and UNPIVOT cannot recover the original rows. UNPIVOT also drops NULL values from its input, so those rows disappear from the output.
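A small round trip makes this easier to see. The table variable @Sales and its values are invented for this illustration: three rows go in, PIVOT sums the two East/Q1 rows into one value, and unpivoting the result gives back two rows, not the original three.

```sql
-- Made-up sample data: two rows share the same Region and Qtr.
DECLARE @Sales TABLE (Region VARCHAR(10), Qtr VARCHAR(2), Amount INT);
INSERT INTO @Sales VALUES ('East', 'Q1', 10), ('East', 'Q1', 5), ('East', 'Q2', 7);

SELECT Region, Qtr, Amount
FROM
(
    -- PIVOT aggregates: East/Q1 becomes a single value, 10 + 5.
    SELECT Region, [Q1], [Q2]
    FROM (SELECT Region, Qtr, Amount FROM @Sales) AS src
    PIVOT (SUM(Amount) FOR Qtr IN ([Q1], [Q2])) AS p
) AS pivoted
-- UNPIVOT only sees the aggregated values, so it returns one row per quarter.
UNPIVOT (Amount FOR Qtr IN ([Q1], [Q2])) AS u;
```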

Hope this helps you with your pivot queries!

Merge Statements

Over the years I have come across a lot of MERGE statements, and most of them share one fundamental flaw in how they are written.

In most cases the WHEN MATCHED clause has no condition that checks whether anything actually differs between the source and the target, so every matched row is updated on every run. With such a condition, only the rows that have changed are updated, which can save a lot of time.

Here is an example of a merge that checks whether anything has changed between source and target before updating the record:

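MERGE dbo.DimEmployee  AS TRG
USING Staging.Employee AS SRC
    ON TRG.EmployeeID = SRC.EmployeeID
-- Update only when at least one column differs. ISNULL maps NULL to a placeholder
-- so that NULL on one side and a value on the other counts as a change.
-- The placeholder must be valid for the column's data type and must not occur in the data.
WHEN MATCHED AND
(
       ISNULL(SRC.FirstName, 'Unknown')  != ISNULL(TRG.FirstName, 'Unknown')
    OR ISNULL(SRC.LastName, 'Unknown')   != ISNULL(TRG.LastName, 'Unknown')
    OR ISNULL(SRC.MiddleName, 'Unknown') != ISNULL(TRG.MiddleName, 'Unknown')
    OR ISNULL(SRC.NameStyle, 'Unknown')  != ISNULL(TRG.NameStyle, 'Unknown')
    OR ISNULL(SRC.Title, 'Unknown')      != ISNULL(TRG.Title, 'Unknown')
)
THEN UPDATE SET
     TRG.FirstName  = SRC.FirstName
    ,TRG.LastName   = SRC.LastName
    ,TRG.MiddleName = SRC.MiddleName
    ,TRG.NameStyle  = SRC.NameStyle
    ,TRG.Title      = SRC.Title

-- Insert source rows that have no match in the target.
-- EmployeeID is not inserted here, which assumes the target populates it;
-- otherwise add it to both lists.
WHEN NOT MATCHED THEN
INSERT
(
     FirstName
    ,LastName
    ,MiddleName
    ,NameStyle
    ,Title
)
VALUES
(
     SRC.FirstName
    ,SRC.LastName
    ,SRC.MiddleName
    ,SRC.NameStyle
    ,SRC.Title
); -- A MERGE statement must be terminated with a semicolon.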
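
One weakness of the placeholder approach is that a real value of 'Unknown' and a NULL look identical, so that particular change would be missed. A commonly used alternative, sketched here against the same tables and columns, compares the two rows with EXCEPT, which treats two NULLs as equal and a NULL and a value as different. This is one possible variant, not the only way to write the check.

```sql
MERGE dbo.DimEmployee  AS TRG
USING Staging.Employee AS SRC
    ON TRG.EmployeeID = SRC.EmployeeID
-- EXCEPT returns the source row only if it differs from the target row,
-- so EXISTS is true exactly when at least one column has changed.
WHEN MATCHED AND EXISTS
(
    SELECT SRC.FirstName, SRC.LastName, SRC.MiddleName, SRC.NameStyle, SRC.Title
    EXCEPT
    SELECT TRG.FirstName, TRG.LastName, TRG.MiddleName, TRG.NameStyle, TRG.Title
)
THEN UPDATE SET
     TRG.FirstName  = SRC.FirstName
    ,TRG.LastName   = SRC.LastName
    ,TRG.MiddleName = SRC.MiddleName
    ,TRG.NameStyle  = SRC.NameStyle
    ,TRG.Title      = SRC.Title
WHEN NOT MATCHED THEN
INSERT (FirstName, LastName, MiddleName, NameStyle, Title)
VALUES (SRC.FirstName, SRC.LastName, SRC.MiddleName, SRC.NameStyle, SRC.Title);
```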

Hope this helps you improve the performance of your MERGE statements.

Inline Table-Valued Functions (TVFs)

Let’s start with what a table-valued function (TVF) is. A TVF is a dynamic table produced at the time of execution, depending on parameters. Like a view, a TVF creates a result set only when it’s executed, but, unlike a view, it can be parameterized.

There are two types of TVF: the inline table-valued function (ITVF) and the multi-statement table-valued function (MTVF). They are easy to tell apart: think of the “I” in ITVF as “1” (a single statement) and the “M” in MTVF as “many” (multiple statements).
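
To make the difference concrete, here is a minimal sketch of what a multi-statement version might look like. It assumes the authors table from the pubs sample database that the examples further down use, and the name dbo.mtvf_authors is made up for this illustration. An MTVF declares the shape of its return table up front and fills it in a BEGIN...END body, whereas an ITVF is a single SELECT.

```sql
-- Hypothetical MTVF; the function name is an assumption.
CREATE FUNCTION dbo.mtvf_authors
(
    @au_lname VARCHAR(40)
)
RETURNS @result TABLE
(
    au_id    VARCHAR(11)
   ,au_fname VARCHAR(20)
   ,au_lname VARCHAR(40)
   ,city     VARCHAR(20)
)
AS
BEGIN
    -- The body may contain several statements; this one only needs an INSERT.
    INSERT INTO @result (au_id, au_fname, au_lname, city)
    SELECT au_id, au_fname, au_lname, city
    FROM authors
    WHERE au_lname = @au_lname;

    RETURN;
END;
GO
```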

As you can imagine, a TVF produces a result set that can be used as a virtual table or view. Yes, you can actually select data from a TVF, or join it with some other tables, views, or even other TVFs. In this post I will go into more detail on ITVFs.

ITVF Examples and Advantages

Suppose that you need to select the author id, name, city, and state from the authors table. Users will often request data by the author’s last name. Prior to SQL Server 2000, you could do it by creating either a view or a stored procedure. An ITVF offers a new option. Consider the following examples and the ease of data retrieval in each case:

-- Each CREATE statement must be the first statement in its batch, hence the GO separators.
CREATE VIEW view_authors
AS
SELECT
    au_id
   ,au_fname
   ,au_lname
   ,city
   ,state
FROM
    authors
;
GO

CREATE PROC proc_authors @au_lname VARCHAR(40)
AS
SELECT
    au_id
   ,au_fname
   ,au_lname
   ,city
   ,state
FROM
    authors
WHERE
    au_lname = @au_lname
;
GO

CREATE FUNCTION dbo.itvf_authors
(
    @au_lname VARCHAR(40)
)
RETURNS TABLE
AS
RETURN SELECT
           au_id
          ,au_fname
          ,au_lname
          ,city
          ,state
       FROM
           authors
       WHERE
           au_lname = @au_lname
;
GO

In order to get a list of authors whose last name is stored in local variable @author, you can use the objects as follows:

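-- Example value only; any last name from the authors table will do.
DECLARE @author VARCHAR(40) = 'Ringer';

-- View: the caller supplies the filter.
SELECT
    *
FROM
    view_authors
WHERE
    au_lname = @author;

-- Stored procedure: the filter is a parameter.
EXEC proc_authors @author;

-- ITVF: the filter is a parameter, and the result can still be queried like a table.
SELECT
    *
FROM
    dbo.itvf_authors(@author)
;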

So far, it’s hard to see any advantages to using the ITVF. A stored procedure call is the shortest, but this is the last factor I would consider when choosing an implementation method.

It’s worth mentioning that stored procedures are more limiting than views and ITVFs when you don’t need to select all columns. Views and ITVFs allow you to list only those columns that you want selected. A stored procedure will always return a fixed set of columns unless you write complicated code with dynamic T-SQL. Just consider what you would need to do if a user wasn’t interested in the columns “city” and “state.” The view and ITVF in the example would handle it just fine, but the stored procedure would be virtually useless.
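
For instance, leaving out “city” and “state” is just a matter of changing the column list. This short sketch assumes @author has already been declared and set in the same batch.

```sql
-- Only the listed columns come back from the view...
SELECT
    au_id
   ,au_fname
   ,au_lname
FROM
    view_authors
WHERE
    au_lname = @author;

-- ...and the same works for the ITVF, with the filter passed as a parameter.
SELECT
    au_id
   ,au_fname
   ,au_lname
FROM
    dbo.itvf_authors(@author);
```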

Now suppose that you need to get all titles written by authors whose last name is stored in variable @author. Here’s the code using the view:

SELECT
    a.*
   ,t.title_id
FROM
    view_authors a
    JOIN
    titleauthor  t ON a.au_id = t.au_id
WHERE
    a.au_lname = @author
;

Using the stored procedure for this task is quite inconvenient. Since you can’t join results of a stored procedure directly to another table, you have to save them in a temporary table first:

CREATE TABLE #authors
(
    au_id    VARCHAR(11)
   ,au_fname VARCHAR(20)
   ,au_lname VARCHAR(40)
   ,city     VARCHAR(20)
   ,state    VARCHAR(20)
);
INSERT #authors
EXEC proc_authors @author;
SELECT
    a.*
   ,t.title_id
FROM
    #authors    a
    JOIN
    titleauthor t ON a.au_id = t.au_id;
DROP TABLE #authors;

The ITVF approach turns out to be more elegant than a view because you don’t need a WHERE clause filter and can specify selection criteria as parameters. It would be even more obvious if you had more parameters:

SELECT
    a.*
   ,t.title_id
FROM
    dbo.itvf_authors(@author) a
    JOIN
    titleauthor               t ON a.au_id = t.au_id
;
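
As a sketch of the “more parameters” case, here is what a two-parameter variant could look like. The function name dbo.itvf_authors_by_name_state and the 'CA' value are invented for this illustration, and it assumes the authors table has a state column, as in the pubs sample database.

```sql
-- Hypothetical two-parameter ITVF; the name is an assumption.
CREATE FUNCTION dbo.itvf_authors_by_name_state
(
    @au_lname VARCHAR(40)
   ,@state    CHAR(2)
)
RETURNS TABLE
AS
RETURN SELECT
           au_id
          ,au_fname
          ,au_lname
          ,city
          ,state
       FROM
           authors
       WHERE
           au_lname = @au_lname
           AND state = @state
;
GO

-- Both filters travel as arguments, so the join needs no WHERE clause.
-- @author must be declared in this batch; 'CA' is an example value.
SELECT
    a.*
   ,t.title_id
FROM
    dbo.itvf_authors_by_name_state(@author, 'CA') a
    JOIN
    titleauthor                                   t ON a.au_id = t.au_id
;
```

With a view, each extra criterion would become another predicate that every caller has to remember to write.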

ITVFs can be very useful with applications that perform multiple updates, deletes, and inserts against a table. The application can pass through a table with flags that indicate the CRUD operation for each row, and then you can run a MERGE statement.

Hope you enjoyed this post!