PowerBI – Pivot and Unpivot Columns

In PowerBI there is a nifty trick where you can pivot or unpivot your data in the Query Editor. This saves you from having to do this in SQL or which ever source you are using for your data.

There are just 4 simple clicks to pivot your data!

First you need to select the column you want to pivot in the Query Editor, in our case being the Month column:

Next you select the Transform tab at the top and select Pivot Column:

Next you need to select what the values are, in this case its the Users column and click OK:

Then your data will be displayed as pivoted by Month as shown below:

Hope this makes your pivoting needs easier in the future!

PowerBI – Constant Line

The constant line feature in PowerBI seems to be one of those hidden little gems. Lets say you have an SLA with a customer of the amount of downtime allowed on the system and you want to show on a graph once they have gone above this downtime threshold, you can add a constant line to your chart.

In my example below I set the constant line to 1000 and anything above that is great for the purpose of this chart.

To set a constant line, plot your graph, then on the right hand side, select what looks like a magnify glass. This will only show if you have the graph selected.

Then your constant line shall appear on your graph in the colour and density you set it to.

Hope this little gem helps you out in the future!

PowerBI – Table Heat Map

Ever wondered how to create a heat map in PowerBI without having to use a custom visual? There is now a simple way to do this by using Conditional Formatting on a Matrix table!

In my example I took some Google Analytics data showing the times of the day and amount of visitors in that time.

Then plot the values on the Matrix table

Next we go and do some conditional formatting in the pane on the right hand side in PowerBI

Now we can change the colours we want to use by Selecting Advanced Controls

You will see that the table has turned into a heat map table using the conditional formatting with the colours you chose. As you can see below the highest value is pink and the lowest value is a light green and a gradient between the colours as the numbers range.

Just like that we have a heat map table with one simple step!

PowerBI – Turn on Preview Features

Every month PowerBI releases new features. Some of the features are in preview mode and unless you turn it on you don’t get to use the preview features. This post explains how to turn them on.

Firstly you need to have the latest version of PowerBI to get the latest features. You can download it here.

Next you go to File > Options and Settings > Options

Then select Preview Features and select the preview features you want to use.

You will then be prompted to restart PowerBI in order for the preview features to be available.

Once restarted you will have your preview features available and if they are visuals you will see them appear on the right hand pane. For example we can now see Python in the visualizations pane.

Creating PowerBI Layouts

PowerBI has this great functionality where you can go and download preset layouts which make your PowerBI reports stand out more. There is a nice trick to doing this yourself.

You can create layouts in PowerPoint and then save them as images. Then insert them into your PowerBI report as an image and send the image right to the back.

Here are the steps to go about doing this:
Open up PowerPoint, select Design on the ribbon and then select edit background. You can then change the gradient in the panel on the right hand side.

Next you insert shapes, set the fill to white and the shape effect to shadow centre:


Then design your shapes on your PowerPoint slide and create the layout you want:

Now that you have your layout, we need this to be an image, you simply save this slide as an image:

Then select “Just this one”

Now your layout is saved as an image. Next is to import it into PowerBI.

Open up PowerBI and select Image to import it:

Go to where you saved the image of the PowerPoint slide, select it and click open:

Then resize the image to cover the entire PowerBI page and set the scaling to fill on the right hand side:

Now that the image is in we want to make sure it is right at the back. Select send to back:

And now you have an awesome layout to go and place your data on!

Pivot and Unpivot

One thing that I still get confused about writing is pivot queries. I find myself needing to lookup the syntax every time. Basically you use Pivot and Unpivot to change the output of a table. If you would like rows turned into columns you can use pivot and for the opposite you can use unpivot.

One thing to note is the column identifiers in the unpivot clause follow the catalog collation. For SQL Database, the collation is always SQL_Latin_General_CP1_CI_AS. For SQL Server partially contained databases, the collation is always Latin1_General_100_CI_AS_KS_WS_SC. If the column is combined with other columns, then a collate clause (COLLATE DATABASE_DEFAULT) is required to avoid conflicts.

So lets take a basic select statement:

USE AdventureWorks2014 ;  
GO  
SELECT DaysToManufacture, AVG(StandardCost) AS AverageCost   
FROM Production.Product  
GROUP BY DaysToManufacture;

This returns the following:

DaysToManufacture AverageCost
----------------- -----------
0                 5.0885
1                 223.88
2                 359.1082
4                 949.4105

To pivot the table you can write this kind of query:

-- Pivot table with one row and five columns  
SELECT 'AverageCost' AS Cost_Sorted_By_Production_Days,   
[0], [1], [2], [3], [4]  
FROM  
(SELECT DaysToManufacture, StandardCost   
    FROM Production.Product) AS SourceTable  
PIVOT  
(  
AVG(StandardCost)  
FOR DaysToManufacture IN ([0], [1], [2], [3], [4])  
) AS PivotTable;  

This returns the following:

Cost_Sorted_By_Production_Days 0           1           2           3           4         
------------------------------ ----------- ----------- ----------- ----------- -----------
AverageCost                    5.0885      223.88      359.1082    NULL        949.4105

As you can see it places the average values of the standard cost into the columns specified 0-4.

Here is an example of an Unpivot query:

-- Create the table and insert values as portrayed in the previous example.  
CREATE TABLE pvt (VendorID int, Emp1 int, Emp2 int,  
    Emp3 int, Emp4 int, Emp5 int);  
GO  
INSERT INTO pvt VALUES (1,4,3,5,4,4);  
INSERT INTO pvt VALUES (2,4,1,5,5,5);  
INSERT INTO pvt VALUES (3,4,3,5,4,4);  
INSERT INTO pvt VALUES (4,4,2,5,5,4);  
INSERT INTO pvt VALUES (5,5,1,5,5,5);  
GO  
-- Unpivot the table.  
SELECT VendorID, Employee, Orders  
FROM   
   (SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5  
   FROM pvt) p  
UNPIVOT  
   (Orders FOR Employee IN   
      (Emp1, Emp2, Emp3, Emp4, Emp5)  
)AS unpvt;  
GO  

Here is a part of the result set:

VendorID    Employee    Orders
----------- ----------- ------
1            Emp1       4
1            Emp2       3 
1            Emp3       5
1            Emp4       4
1            Emp5       4
2            Emp1       4
2            Emp2       1
2            Emp3       5
2            Emp4       5
2            Emp5       5
...

Remember that unpivot is not the exact reverse of a pivot query. The pivot query uses aggregations and therefore merges possible multiple rows into a single row in the output.

Hope this helps you with your pivot queries!

Merge Statements

Over the years I have come across a lot of merge statements and I find most of the time there is one fundamental flaw in how it is written by most people.

In the update section in most cases I find that there is no section there to check if anything has changed against the target from the source, therefore everything gets updated each and every single time. Whereas if there was this section then only the records that changed would be updated. This saves a lot of time on performance of the query.

Here is an example of a merge that checks if something is changed between source and target before updating the record:

MERGE dbo.DimEmployee	TRG
USING Staging.Employee	SRC ON TRG.EmployeeID = SRC.EmployeeID
WHEN MATCHED AND
(
	      ISNULL(SRC.FirstName,'Unknown')	 != ISNULL(TRG.FirstName,'Unknown')
       OR ISNULL(SRC.LastName,'Unknown')	 != ISNULL(TRG.LastName,'Unknown')
       OR ISNULL(SRC.MiddleName,'Unknown')	 != ISNULL(TRG.MiddleName,'Unknown')
       OR ISNULL(SRC.NameStyle,'Unknown')	 != ISNULL(TRG.NameStyle,'Unknown')
       OR ISNULL(SRC.Title,'Unknown')		 != ISNULL(TRG.Title,'Unknown')
)
THEN UPDATE SET
	TRG.FirstName  = SRC.FirstName
    ,TRG.LastName   = SRC.LastName
    ,TRG.MiddleName = SRC.MiddleName
    ,TRG.NameStyle  = SRC.NameStyle
    ,TRG.Title	   = SRC.Title

WHEN NOT MATCHED THEN
INSERT
(
	 FirstName
	,LastName
	,MiddleName
	,NameStyle
	,Title
)
VALUES
(
	SRC.FirstName
	,SRC.LastName
	,SRC.MiddleName
	,SRC.NameStyle
	,SRC.Title
)

Hope this helps you improve the performance of you MERGE statements.

Querying Data in Temporal Tables

Following on from my previous post about modifying data in a temporal table, I will be looking into querying the data in this post.

When you want to get latest (actual) state of data in a temporal table, you can query the same way as you query a normal table. If the PERIOD columns are not hidden, their values will appear in a SELECT * query. If you specified PERIOD columns as hidden, their values won’t appear in a SELECT * query. When the PERIOD columns are hidden, reference the PERIOD columns specifically in the SELECT clause to return the values for these columns.

To perform any type of time-based analysis, use the new FOR SYSTEM_TIME clause with four temporal-specific sub-clauses to query data across the current and history tables.

  • AS OF <date_time>
  • FROM <start_date_time> TO <end_date_time>
  • BETWEEN <start_date_time> AND <end_date_time>
  • CONTAINED IN (<start_date_time> , <end_date_time>)
  • ALL

FOR SYSTEM_TIME can be specified independently for each table in a query. It can be used inside common table expressions, table-valued functions and stored procedures.

Using The AS OF Sub-Clause:
Use the AS OF sub-clause when you need to reconstruct state of data as it was at any specific time in the past. You can reconstruct the data with the precision of datetime2 datatype that was specified in the PERIOD column definitions.

The AS OF sub-clause clause can be used with constant literals or with variables, which allows you to dynamically specify time condition. The provided values are interpreted as UTC time.

This example returns the state of the dbo.Department table AS OF a specific date in the past.

/*State of entire table AS OF specific date in the past*/   
SELECT [DeptID], [DeptName], [SysStartTime],[SysEndTime]   
FROM [dbo].[Department]   
FOR SYSTEM_TIME AS OF '2018-07-01 T10:00:00.7230011' ; 

This example compares the values between two points in time for a subset of rows.

DECLARE @ADayAgo datetime2   
SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())   

/*Comparison between two points in time for subset of rows*/   
SELECT D_1_Ago.[DeptID], D.[DeptID],   
D_1_Ago.[DeptName], D.[DeptName],   
D_1_Ago.[SysStartTime], D.[SysStartTime],   
D_1_Ago.[SysEndTime], D.[SysEndTime]   
FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago   
JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]    
AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;  

Using Views With AS-OF Sub-Clause:
Using views is very useful in scenarios when complex point-in time analysis is required. A common example is generating a business report today with the values for previous month. Usually, customers have a normalized database model which involves many tables with foreign key relationships.

Answering the question how data from that normalized model looked like at a point in the past can be very challenging, since all tables change independently. In this case, the best option is to create a view and apply the AS OF sub-clause to the entire view.

Using this approach allows you to decouple modeling of the data access layer from point-in time analysis as SQL Server will apply AS OF clause transparently to all temporal tables that participate in the view definition.

Furthermore, you can combine temporal with non-temporal tables in the same view and AS OF will be applied only to temporal ones. If the view does not reference at least one temporal table, applying temporal querying clauses to it will fail with an error.

/* Create view that joins three temporal tables: Department, CompanyLocation, LocationDepartments */   
CREATE VIEW [dbo].[vw_GetOrgChart]   
AS   
SELECT   
     [CompanyLocation].LocID  
   , [CompanyLocation].LocName  
   , [CompanyLocation].City  
   , [Department].DeptID  
   , [Department].DeptName    
FROM [dbo].[CompanyLocation]   
LEFT JOIN [dbo].[LocationDepartments]    
   ON [CompanyLocation].LocID = LocationDepartments.LocID   
LEFT JOIN [dbo].[Department]    
   ON LocationDepartments.DeptID = [Department].DeptID ;  
GO   
/* Querying view AS OF */   
SELECT * FROM [vw_GetOrgChart]   
FOR SYSTEM_TIME AS OF '2018-09-01 T10:00:00.7230011' ;  

Query For Changes To Specific Rows Over Time:
The temporal sub-clauses FROM…TO, BETWEEN…AND and CONTAINED IN are useful when you want to perform a data audit.

The first two sub-clauses return row versions that overlap with a specified period (i.e. those that started before given period and ended after it), while CONTAINED IN returns only those that existed within the specified period boundaries.

If you search for non-current row versions only, I recommend you use CONTAINED IN as it works only with the history table and will yield the best query performance. Use ALL when you need to query current and historical data without any restrictions.

/* Query using BETWEEN...AND sub-clause*/  
SELECT   
     [DeptID]  
   , [DeptName]  
   , [SysStartTime]  
   , [SysEndTime]  
   , IIF (YEAR(SysEndTime) = 9999, 1, 0) AS IsActual   
FROM [dbo].[Department]   
FOR SYSTEM_TIME BETWEEN  '2018-01-01' AND '2018-12-31'   
WHERE DeptId = 1   
ORDER BY SysStartTime DESC;   

/*  Query using CONTAINED IN sub-clause */  
SELECT [DeptID], [DeptName], [SysStartTime],[SysEndTime]   
FROM [dbo].[Department]   
FOR SYSTEM_TIME CONTAINED IN ('2018-04-01', '2018-09-25')   
WHERE DeptId = 1   
ORDER BY SysStartTime DESC ;  

/*  Query using ALL sub-clause */   
SELECT    
     [DeptID]   
   , [DeptName]   
   , [SysStartTime]   
   , [SysEndTime]   
   , IIF (YEAR(SysEndTime) = 9999, 1, 0) AS IsActual    
FROM [dbo].[Department] FOR SYSTEM_TIME ALL   
ORDER BY [DeptID], [SysStartTime] Desc  

Hope you have enjoyed the 3 part series on Temporal Tables.

Modifying Temporal Tables

Following on from my previous post about Temporal Tables, this post talks about modifying temporal tables.

Data in a temporal table is modified using regular DML statements with one important difference: the period column data cannot be directly modified. When data is updated, it is versioned, with the previous version of each updated row and is inserted into the history table. When data is deleted, the delete is logical, with the row moved into the history table from the current table – it is not permanently deleted.

Inserting Data:
When you insert new data, you need to account for the PERIOD columns.

If you specify the column list in your INSERT statement, you can omit the PERIOD columns because the system will generate values for these columns automatically.

--Insert with column list and without period columns   
INSERT INTO [dbo].[Department] ([DeptID] ,[DeptName] ,[ManagerID] ,[ParentDeptID])   
VALUES(10, 'Marketing', 101, 1);

If you do specify the PERIOD columns in the column list in your INSERT statement, then you need to specify DEFAULT as their value.

INSERT INTO [dbo].[Department] ([DeptID] ,[DeptName] ,[ManagerID] ,[ParentDeptID], SysStartTime, SysEndTime)   
VALUES(11, 'Sales', 101, 1, default, default);  

If you do not specify the column list in your INSERT statement, specify DEFAULT for the PERIOD columns.

--Insert without  column list and DEFAULT values for period columns   
INSERT INTO [dbo].[Department]    
VALUES(12, 'Production', 101, 1, default, default);

If PERIOD columns are specified as HIDDEN, then you need only to specify the values for the visible columns when you use INSERT without specifying the column list. You do not need to account for the new PERIOD columns in your INSERT statement. This behavior guarantees that your legacy applications will continue to work when you enable system-versioning on tables that will benefit from versioning.

CREATE TABLE [dbo].[CompanyLocation]  
(   
     [LocID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY  
   , [LocName] [varchar](50) NOT NULL  
   , [City] [varchar](50) NOT NULL  
   , [SysStartTime] [datetime2](0) GENERATED ALWAYS AS ROW START HIDDEN NOT NULL   
   , [SysEndTime] [datetime2](0) GENERATED ALWAYS AS ROW END HIDDEN NOT NULL   
   , PERIOD FOR SYSTEM_TIME ([SysStartTime], [SysEndTime])   
)    
WITH ( SYSTEM_VERSIONING = ON );   
GO   
INSERT INTO [dbo].[CompanyLocation]   
VALUES  ('Headquarters', 'New York');

If the current table is partitioned, you can use partition switch as an efficient mechanism to load data into an empty partition or to load into multiple partitions in parallel. The staging table that is used in the PARTITION SWITCH IN statement with a system-versioned temporal table must have SYSTEM_TIME PERIOD defined, but it does not need to be a system-versioned temporal table. This ensures that temporal consistency checks are performed during the data insert into a staging table or when SYSTEM_TIME period is added to a pre-populated staging table.

/*Create staging table with period definition for SWITCH IN temporal table*/   
CREATE TABLE [dbo].[Staging_Department_Partition2]  
(   
     [DeptID] [int] NOT NULL  
   , [DeptName] [varchar](50)  NOT NULL  
   , [ManagerID] [int] NULL  
   , [ParentDeptID] [int] NULL  
   , [SysStartTime] [datetime2](7) GENERATED ALWAYS AS ROW START NOT NULL  
   , [SysEndTime] [datetime2](7) GENERATED ALWAYS AS ROW END NOT NULL  
   , PERIOD FOR SYSTEM_TIME ( [SysStartTime], [SysEndTime] )   
) ON [PRIMARY]   

/*Create aligned primary key*/   
ALTER TABLE [dbo].[Staging_Department_Partition2]    
ADD CONSTRAINT [Staging_Department_Partition2_PK]  
   PRIMARY KEY CLUSTERED  
   (  [DeptID] ASC )     
   ON [PRIMARY]   

/*Create and enforce constraints for partition boundaries.   
Partition 2 contains rows with DeptID > 100 and DeptID <=200*/ 
ALTER TABLE [dbo].[Staging_Department_Partition2] WITH CHECK ADD CONSTRAINT [chk_staging_Department_partition_2] CHECK ([DeptID]>N'100' AND [DeptID]<=N'200')   
ALTER TABLE [dbo].[Staging_Department_Partition2]    
   CHECK CONSTRAINT [chk_staging_Department_partition_2]   

/*Load data into staging table*/   
INSERT INTO [dbo].[staging_Department] ([DeptID],[DeptName],[ManagerID],[ParentDeptID])   
VALUES (101,'D101',1,NULL)  

/*Use PARTITION SWITCH IN to efficiently add data to current table */    
ALTER TABLE [Staging_Department]    
SWITCH TO [dbo].[Department] PARTITION 2;  

Updating Data:
You update data in the current table with a regular UPDATE statement. You can update data in the current table from the history table to for the “oops” scenario. However, you cannot update PERIOD columns and you cannot directly updated data in the history table while SYSTEM_VERSIONING = ON. Set SYSTEM_VERSIONING = OFF and update rows from current and history table but keep in mind that way system will not preserve history of changes.

UPDATE [dbo].[Department] SET [ManagerID] = 501 WHERE [DeptID] = 10  

You cannot update a PERIOD column and you cannot update the history table. In this example, an attempt to update a PERIOD column generates an error.

UPDATE [dbo].[Department]    
SET SysStartTime = '2015-09-23 23:48:31.2990175'    
WHERE DeptID = 10 ;  

Msg 13537, Level 16, State 1, Line 3   
Cannot update GENERATED ALWAYS columns in table 'TmpDev.dbo.Department'.  

You can use UPDATE on the current table to revert the actual row state to valid state at a specific point in time in the past (reverting to a “last good known row version”). The following example shows reverting to the values in the history table as of 2018-04-25 where the DeptID = 10.

UPDATE Department   
SET DeptName = History.DeptName   
FROM Department    
FOR SYSTEM_TIME AS OF '2018-04-25' AS History   
WHERE History.DeptID  = 10   
AND Department.DeptID = 10 ; 

Deleting Data:
You delete data in the current table with a regular DELETE statement. The end period column for deleted rows will be populated with the begin time of underlying transaction.
You cannot directly delete rows from history table while SYSTEM_VERSIONING = ON.

Set SYSTEM_VERSIONING = OFF and delete rows from current and history table but keep in mind that way system will not preserve history of changes. TRUNCATE, SWITCH PARTITION OUT of current table and SWITCH PARTITION IN history table are not supported while SYSTEM_VERSIONING = ON.

Using A Merge Statement:
MERGE operation is supported with the same limitations that INSERT and UPDATE statements have regarding PERIOD columns.

CREATE TABLE DepartmentStaging (DeptId INT, DeptName varchar(50));   
GO   
INSERT INTO DepartmentStaging VALUES (1, 'Company Management');   
INSERT INTO DepartmentStaging VALUES (10, 'Science & Research');   
INSERT INTO DepartmentStaging VALUES (15, 'Process Management');   

MERGE dbo.Department AS target   
USING (SELECT DeptId, DeptName FROM DepartmentStaging) AS source (DeptId, DeptName)   
ON (target.DeptId = source.DeptId)   
WHEN MATCHED THEN    
    UPDATE   
   SET DeptName = source.DeptName   
WHEN NOT MATCHED THEN   
   INSERT (DeptName)   
   VALUES (source.DeptName);  

Temporal Tables

Let’s start off with what is a temporal table? SQL Server 2016 introduced support for system-versioned temporal tables as a database feature that brings built-in support for providing information about data stored in the table at any point in time rather than only the data that is correct at the current moment in time.

A system-versioned temporal table is a type of user table designed to keep a full history of data changes and allow easy point in time analysis. This type of temporal table is referred to as a system-versioned temporal table because the period of validity for each row is managed by the system.

Every temporal table has two explicitly defined columns, each with a datetime2 data type. These columns are referred to as period columns. These period columns are used exclusively by the system to record period of validity for each row whenever a row is modified. A temporal table also contains a reference to another table with a mirrored schema. The system uses this table to automatically store the previous version of the row each time a row in the temporal table gets updated or deleted. This additional table is referred to as the history table, while the main table that stores current (actual) row versions is referred to as the current table or simply as the temporal table.

Benefits of using a temporal table:

  • Auditing all data changes and performing data forensics when necessary
  • Reconstructing state of the data as of any time in the past
  • Calculating trends over time
  • Maintaining a slowly changing dimension for decision support applications
  • Recovering from accidental data changes and application errors

Basic overview of temporal tables:

How does a temporal table work?
A current table and a history table are created, both with two datetime2 columns.

  • Period start column: The system records the start time for the row in this column, typically denoted as the SysStartTime column.
  • Period end column: The system records the end time for the row in this column, typically denoted at the SysEndTime column.

The current table contains the current value for each row. The history table contains each previous value for each row, if any, and the start time and end time for the period for which it was valid.

To create a temporal table you add WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory)) at the end of the create statement as below:

CREATE TABLE dbo.Employee   
(    
  [EmployeeID] int NOT NULL PRIMARY KEY CLUSTERED   
  , [Name] nvarchar(100) NOT NULL  
  , [Position] varchar(100) NOT NULL   
  , [Department] varchar(100) NOT NULL  
  , [Address] nvarchar(1024) NOT NULL  
  , [AnnualSalary] decimal (10,2) NOT NULL  
  , [ValidFrom] datetime2 (2) GENERATED ALWAYS AS ROW START  
  , [ValidTo] datetime2 (2) GENERATED ALWAYS AS ROW END  
  , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)  
 )    
 WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));  

How to query temporal tables:
The SELECT statement FROM clause has a new clause FOR SYSTEM_TIME with five temporal-specific sub-clauses to query data across the current and history tables. This new SELECT statement syntax is supported directly on a single table, propagated through multiple joins, and through views on top of multiple temporal tables.The following query searches for row versions for Employee row with EmployeeID = 1000 that were active at least for a portion of period between 1st January of 2014 and 1st January 2015 (including the upper boundary):

SELECT * FROM Employee FOR SYSTEM_TIME BETWEEN '2017-01-01 00:00:00.0000000' AND '2018-01-01 00:00:00.0000000' WHERE EmployeeID = 1000 ORDER BY ValidFrom;

In the next post we will focus on working with temporal tables.

Original post