Merge Statements

Over the years I have come across a lot of merge statements and I find most of the time there is one fundamental flaw in how it is written by most people.

In the update section in most cases I find that there is no section there to check if anything has changed against the target from the source, therefore everything gets updated each and every single time. Whereas if there was this section then only the records that changed would be updated. This saves a lot of time on performance of the query.

Here is an example of a merge that checks if something is changed between source and target before updating the record:

MERGE dbo.DimEmployee	TRG
USING Staging.Employee	SRC ON TRG.EmployeeID = SRC.EmployeeID
	      ISNULL(SRC.FirstName,'Unknown')	 != ISNULL(TRG.FirstName,'Unknown')
       OR ISNULL(SRC.LastName,'Unknown')	 != ISNULL(TRG.LastName,'Unknown')
       OR ISNULL(SRC.MiddleName,'Unknown')	 != ISNULL(TRG.MiddleName,'Unknown')
       OR ISNULL(SRC.NameStyle,'Unknown')	 != ISNULL(TRG.NameStyle,'Unknown')
       OR ISNULL(SRC.Title,'Unknown')		 != ISNULL(TRG.Title,'Unknown')
	TRG.FirstName  = SRC.FirstName
    ,TRG.LastName   = SRC.LastName
    ,TRG.MiddleName = SRC.MiddleName
    ,TRG.NameStyle  = SRC.NameStyle
    ,TRG.Title	   = SRC.Title


Hope this helps you improve the performance of you MERGE statements.

Querying Data in Temporal Tables

Following on from my previous post about modifying data in a temporal table, I will be looking into querying the data in this post.

When you want to get latest (actual) state of data in a temporal table, you can query the same way as you query a normal table. If the PERIOD columns are not hidden, their values will appear in a SELECT * query. If you specified PERIOD columns as hidden, their values won’t appear in a SELECT * query. When the PERIOD columns are hidden, reference the PERIOD columns specifically in the SELECT clause to return the values for these columns.

To perform any type of time-based analysis, use the new FOR SYSTEM_TIME clause with four temporal-specific sub-clauses to query data across the current and history tables.

  • AS OF <date_time>
  • FROM <start_date_time> TO <end_date_time>
  • BETWEEN <start_date_time> AND <end_date_time>
  • CONTAINED IN (<start_date_time> , <end_date_time>)
  • ALL

FOR SYSTEM_TIME can be specified independently for each table in a query. It can be used inside common table expressions, table-valued functions and stored procedures.

Using The AS OF Sub-Clause:
Use the AS OF sub-clause when you need to reconstruct state of data as it was at any specific time in the past. You can reconstruct the data with the precision of datetime2 datatype that was specified in the PERIOD column definitions.

The AS OF sub-clause clause can be used with constant literals or with variables, which allows you to dynamically specify time condition. The provided values are interpreted as UTC time.

This example returns the state of the dbo.Department table AS OF a specific date in the past.

/*State of entire table AS OF specific date in the past*/   
SELECT [DeptID], [DeptName], [SysStartTime],[SysEndTime]   
FROM [dbo].[Department]   
FOR SYSTEM_TIME AS OF '2018-07-01 T10:00:00.7230011' ; 

This example compares the values between two points in time for a subset of rows.

DECLARE @ADayAgo datetime2   
SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())   

/*Comparison between two points in time for subset of rows*/   
SELECT D_1_Ago.[DeptID], D.[DeptID],   
D_1_Ago.[DeptName], D.[DeptName],   
D_1_Ago.[SysStartTime], D.[SysStartTime],   
D_1_Ago.[SysEndTime], D.[SysEndTime]   
FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago   
JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]    
AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;  

Using Views With AS-OF Sub-Clause:
Using views is very useful in scenarios when complex point-in time analysis is required. A common example is generating a business report today with the values for previous month. Usually, customers have a normalized database model which involves many tables with foreign key relationships.

Answering the question how data from that normalized model looked like at a point in the past can be very challenging, since all tables change independently. In this case, the best option is to create a view and apply the AS OF sub-clause to the entire view.

Using this approach allows you to decouple modeling of the data access layer from point-in time analysis as SQL Server will apply AS OF clause transparently to all temporal tables that participate in the view definition.

Furthermore, you can combine temporal with non-temporal tables in the same view and AS OF will be applied only to temporal ones. If the view does not reference at least one temporal table, applying temporal querying clauses to it will fail with an error.

/* Create view that joins three temporal tables: Department, CompanyLocation, LocationDepartments */   
CREATE VIEW [dbo].[vw_GetOrgChart]   
   , [CompanyLocation].LocName  
   , [CompanyLocation].City  
   , [Department].DeptID  
   , [Department].DeptName    
FROM [dbo].[CompanyLocation]   
LEFT JOIN [dbo].[LocationDepartments]    
   ON [CompanyLocation].LocID = LocationDepartments.LocID   
LEFT JOIN [dbo].[Department]    
   ON LocationDepartments.DeptID = [Department].DeptID ;  
/* Querying view AS OF */   
SELECT * FROM [vw_GetOrgChart]   
FOR SYSTEM_TIME AS OF '2018-09-01 T10:00:00.7230011' ;  

Query For Changes To Specific Rows Over Time:
The temporal sub-clauses FROM…TO, BETWEEN…AND and CONTAINED IN are useful when you want to perform a data audit.

The first two sub-clauses return row versions that overlap with a specified period (i.e. those that started before given period and ended after it), while CONTAINED IN returns only those that existed within the specified period boundaries.

If you search for non-current row versions only, I recommend you use CONTAINED IN as it works only with the history table and will yield the best query performance. Use ALL when you need to query current and historical data without any restrictions.

/* Query using BETWEEN...AND sub-clause*/  
   , [DeptName]  
   , [SysStartTime]  
   , [SysEndTime]  
   , IIF (YEAR(SysEndTime) = 9999, 1, 0) AS IsActual   
FROM [dbo].[Department]   
FOR SYSTEM_TIME BETWEEN  '2018-01-01' AND '2018-12-31'   
WHERE DeptId = 1   
ORDER BY SysStartTime DESC;   

/*  Query using CONTAINED IN sub-clause */  
SELECT [DeptID], [DeptName], [SysStartTime],[SysEndTime]   
FROM [dbo].[Department]   
FOR SYSTEM_TIME CONTAINED IN ('2018-04-01', '2018-09-25')   
WHERE DeptId = 1   
ORDER BY SysStartTime DESC ;  

/*  Query using ALL sub-clause */   
   , [DeptName]   
   , [SysStartTime]   
   , [SysEndTime]   
   , IIF (YEAR(SysEndTime) = 9999, 1, 0) AS IsActual    
FROM [dbo].[Department] FOR SYSTEM_TIME ALL   
ORDER BY [DeptID], [SysStartTime] Desc  

Hope you have enjoyed the 3 part series on Temporal Tables.

Modifying Temporal Tables

Following on from my previous post about Temporal Tables, this post talks about modifying temporal tables.

Data in a temporal table is modified using regular DML statements with one important difference: the period column data cannot be directly modified. When data is updated, it is versioned, with the previous version of each updated row and is inserted into the history table. When data is deleted, the delete is logical, with the row moved into the history table from the current table – it is not permanently deleted.

Inserting Data:
When you insert new data, you need to account for the PERIOD columns.

If you specify the column list in your INSERT statement, you can omit the PERIOD columns because the system will generate values for these columns automatically.

--Insert with column list and without period columns   
INSERT INTO [dbo].[Department] ([DeptID] ,[DeptName] ,[ManagerID] ,[ParentDeptID])   
VALUES(10, 'Marketing', 101, 1);

If you do specify the PERIOD columns in the column list in your INSERT statement, then you need to specify DEFAULT as their value.

INSERT INTO [dbo].[Department] ([DeptID] ,[DeptName] ,[ManagerID] ,[ParentDeptID], SysStartTime, SysEndTime)   
VALUES(11, 'Sales', 101, 1, default, default);  

If you do not specify the column list in your INSERT statement, specify DEFAULT for the PERIOD columns.

--Insert without  column list and DEFAULT values for period columns   
INSERT INTO [dbo].[Department]    
VALUES(12, 'Production', 101, 1, default, default);

If PERIOD columns are specified as HIDDEN, then you need only to specify the values for the visible columns when you use INSERT without specifying the column list. You do not need to account for the new PERIOD columns in your INSERT statement. This behavior guarantees that your legacy applications will continue to work when you enable system-versioning on tables that will benefit from versioning.

CREATE TABLE [dbo].[CompanyLocation]  
   , [LocName] [varchar](50) NOT NULL  
   , [City] [varchar](50) NOT NULL  
   , [SysStartTime] [datetime2](0) GENERATED ALWAYS AS ROW START HIDDEN NOT NULL   
   , [SysEndTime] [datetime2](0) GENERATED ALWAYS AS ROW END HIDDEN NOT NULL   
   , PERIOD FOR SYSTEM_TIME ([SysStartTime], [SysEndTime])   
INSERT INTO [dbo].[CompanyLocation]   
VALUES  ('Headquarters', 'New York');

If the current table is partitioned, you can use partition switch as an efficient mechanism to load data into an empty partition or to load into multiple partitions in parallel. The staging table that is used in the PARTITION SWITCH IN statement with a system-versioned temporal table must have SYSTEM_TIME PERIOD defined, but it does not need to be a system-versioned temporal table. This ensures that temporal consistency checks are performed during the data insert into a staging table or when SYSTEM_TIME period is added to a pre-populated staging table.

/*Create staging table with period definition for SWITCH IN temporal table*/   
CREATE TABLE [dbo].[Staging_Department_Partition2]  
     [DeptID] [int] NOT NULL  
   , [DeptName] [varchar](50)  NOT NULL  
   , [ManagerID] [int] NULL  
   , [ParentDeptID] [int] NULL  
   , [SysStartTime] [datetime2](7) GENERATED ALWAYS AS ROW START NOT NULL  
   , [SysEndTime] [datetime2](7) GENERATED ALWAYS AS ROW END NOT NULL  
   , PERIOD FOR SYSTEM_TIME ( [SysStartTime], [SysEndTime] )   

/*Create aligned primary key*/   
ALTER TABLE [dbo].[Staging_Department_Partition2]    
ADD CONSTRAINT [Staging_Department_Partition2_PK]  
   (  [DeptID] ASC )     
   ON [PRIMARY]   

/*Create and enforce constraints for partition boundaries.   
Partition 2 contains rows with DeptID > 100 and DeptID <=200*/ 
ALTER TABLE [dbo].[Staging_Department_Partition2] WITH CHECK ADD CONSTRAINT [chk_staging_Department_partition_2] CHECK ([DeptID]>N'100' AND [DeptID]<=N'200')   
ALTER TABLE [dbo].[Staging_Department_Partition2]    
   CHECK CONSTRAINT [chk_staging_Department_partition_2]   

/*Load data into staging table*/   
INSERT INTO [dbo].[staging_Department] ([DeptID],[DeptName],[ManagerID],[ParentDeptID])   
VALUES (101,'D101',1,NULL)  

/*Use PARTITION SWITCH IN to efficiently add data to current table */    
ALTER TABLE [Staging_Department]    
SWITCH TO [dbo].[Department] PARTITION 2;  

Updating Data:
You update data in the current table with a regular UPDATE statement. You can update data in the current table from the history table to for the “oops” scenario. However, you cannot update PERIOD columns and you cannot directly updated data in the history table while SYSTEM_VERSIONING = ON. Set SYSTEM_VERSIONING = OFF and update rows from current and history table but keep in mind that way system will not preserve history of changes.

UPDATE [dbo].[Department] SET [ManagerID] = 501 WHERE [DeptID] = 10  

You cannot update a PERIOD column and you cannot update the history table. In this example, an attempt to update a PERIOD column generates an error.

UPDATE [dbo].[Department]    
SET SysStartTime = '2015-09-23 23:48:31.2990175'    
WHERE DeptID = 10 ;  

Msg 13537, Level 16, State 1, Line 3   
Cannot update GENERATED ALWAYS columns in table 'TmpDev.dbo.Department'.  

You can use UPDATE on the current table to revert the actual row state to valid state at a specific point in time in the past (reverting to a “last good known row version”). The following example shows reverting to the values in the history table as of 2018-04-25 where the DeptID = 10.

UPDATE Department   
SET DeptName = History.DeptName   
FROM Department    
FOR SYSTEM_TIME AS OF '2018-04-25' AS History   
WHERE History.DeptID  = 10   
AND Department.DeptID = 10 ; 

Deleting Data:
You delete data in the current table with a regular DELETE statement. The end period column for deleted rows will be populated with the begin time of underlying transaction.
You cannot directly delete rows from history table while SYSTEM_VERSIONING = ON.

Set SYSTEM_VERSIONING = OFF and delete rows from current and history table but keep in mind that way system will not preserve history of changes. TRUNCATE, SWITCH PARTITION OUT of current table and SWITCH PARTITION IN history table are not supported while SYSTEM_VERSIONING = ON.

Using A Merge Statement:
MERGE operation is supported with the same limitations that INSERT and UPDATE statements have regarding PERIOD columns.

CREATE TABLE DepartmentStaging (DeptId INT, DeptName varchar(50));   
INSERT INTO DepartmentStaging VALUES (1, 'Company Management');   
INSERT INTO DepartmentStaging VALUES (10, 'Science & Research');   
INSERT INTO DepartmentStaging VALUES (15, 'Process Management');   

MERGE dbo.Department AS target   
USING (SELECT DeptId, DeptName FROM DepartmentStaging) AS source (DeptId, DeptName)   
ON (target.DeptId = source.DeptId)   
   SET DeptName = source.DeptName   
   INSERT (DeptName)   
   VALUES (source.DeptName);  

Temporal Tables

Let’s start off with what is a temporal table? SQL Server 2016 introduced support for system-versioned temporal tables as a database feature that brings built-in support for providing information about data stored in the table at any point in time rather than only the data that is correct at the current moment in time.

A system-versioned temporal table is a type of user table designed to keep a full history of data changes and allow easy point in time analysis. This type of temporal table is referred to as a system-versioned temporal table because the period of validity for each row is managed by the system.

Every temporal table has two explicitly defined columns, each with a datetime2 data type. These columns are referred to as period columns. These period columns are used exclusively by the system to record period of validity for each row whenever a row is modified. A temporal table also contains a reference to another table with a mirrored schema. The system uses this table to automatically store the previous version of the row each time a row in the temporal table gets updated or deleted. This additional table is referred to as the history table, while the main table that stores current (actual) row versions is referred to as the current table or simply as the temporal table.

Benefits of using a temporal table:

  • Auditing all data changes and performing data forensics when necessary
  • Reconstructing state of the data as of any time in the past
  • Calculating trends over time
  • Maintaining a slowly changing dimension for decision support applications
  • Recovering from accidental data changes and application errors

Basic overview of temporal tables:

How does a temporal table work?
A current table and a history table are created, both with two datetime2 columns.

  • Period start column: The system records the start time for the row in this column, typically denoted as the SysStartTime column.
  • Period end column: The system records the end time for the row in this column, typically denoted at the SysEndTime column.

The current table contains the current value for each row. The history table contains each previous value for each row, if any, and the start time and end time for the period for which it was valid.

To create a temporal table you add WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory)) at the end of the create statement as below:

CREATE TABLE dbo.Employee   
  , [Name] nvarchar(100) NOT NULL  
  , [Position] varchar(100) NOT NULL   
  , [Department] varchar(100) NOT NULL  
  , [Address] nvarchar(1024) NOT NULL  
  , [AnnualSalary] decimal (10,2) NOT NULL  
  , [ValidFrom] datetime2 (2) GENERATED ALWAYS AS ROW START  
  , [ValidTo] datetime2 (2) GENERATED ALWAYS AS ROW END  
  , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)  

How to query temporal tables:
The SELECT statement FROM clause has a new clause FOR SYSTEM_TIME with five temporal-specific sub-clauses to query data across the current and history tables. This new SELECT statement syntax is supported directly on a single table, propagated through multiple joins, and through views on top of multiple temporal tables.The following query searches for row versions for Employee row with EmployeeID = 1000 that were active at least for a portion of period between 1st January of 2014 and 1st January 2015 (including the upper boundary):

SELECT * FROM Employee FOR SYSTEM_TIME BETWEEN '2017-01-01 00:00:00.0000000' AND '2018-01-01 00:00:00.0000000' WHERE EmployeeID = 1000 ORDER BY ValidFrom;

In the next post we will focus on working with temporal tables.

Original post

Inline Table-Valued Functions (TVFs)

Lets start off with what is a table-valued function (TVF)? A TVF is a dynamic table produced at the time of execution, depending on parameters. Like a view, a TVF creates a result set only when it’s executed, but, unlike a view, it can be parameterized.

You get two types of TVFs, an Inline Table-Valued Function (ITVFs) and Multi-statement Table-Valued Function (MTVFs). I find them easy to remember, think of the “I” in ITVF as 1 (single statement) and the “M” in MTVF as “many” (multiple statements).

As you can imagine, a TVF produces a result set that can be used as a virtual table or view. Yes, you can actually select data from a TVF, or join it with some other tables, views, or even other TVFs. In this post I will go into more detail on ITVFs.

ITVF Examples and Advantages
Suppose that you need to select author id, name, and city from the authors table. Users will often request data by author’s last name. Prior to SQL Server 2000, you could do it either by creating a view or stored procedure. ITVF offers a new option. Consider the following examples and the ease of data retrieval in each case:

CREATE VIEW view_authors

CREATE PROC proc_authors @au_lname VARCHAR(40)
    au_lname = @au_lname

CREATE FUNCTION dbo.itvf_authors
    @au_lname VARCHAR(40)
           au_lname = @au_lname

In order to get a list of authors whose last name is stored in local variable @author, you can use the objects as follows:

    au_lname = @author;
EXEC proc_authors @author;

So far, it’s hard to see any advantages to using the ITVF. A stored procedure call is the shortest, but this is the last factor I would consider when choosing an implementation method.

It’s worth mentioning that stored procs are more limiting than views and ITVFs when you don’t need to select all columns. Views and ITVF allow you to list those columns that you want selected. A stored procedure will always select a fixed number of columns unless you write complicated code with dynamic T-SQL. Just consider what you would need to do if a user wasn’t interested in the columns “city” and “state.” The view and ITVF in the example would handle it just fine, but the stored procedure would be virtually useless.

Now suppose that you need to get all titles written by authors whose last name is stored in variable @author. Here’s the code using the view:

    view_authors a
    titleauthor  t ON a.au_id = t.au_id
    a.au_lname = @author

Using the stored procedure for this task is quite inconvenient. Since you can’t join results of a stored procedure directly to another table, you have to save them in a temporary table first:

    au_id    VARCHAR(11)
   ,au_fname VARCHAR(20)
   ,au_lname VARCHAR(40)
   ,city     VARCHAR(20)
   ,state    VARCHAR(20)
INSERT #authors
EXEC proc_authors @author;
    #authors    a
    titleauthor t ON a.au_id = t.au_id;
DROP TABLE #authors

The ITVF approach turns out to be more elegant than a view because you don’t need a WHERE clause filter and can specify selection criteria as parameters. It would be even more obvious if you had more parameters:

    dbo.itvf_authors(@author) a
    titleauthor               t ON a.au_id = t.au_id

ITVfs can be very useful with applications that are doing multiple updates, deletes and inserts into a table. The application can pass through a table with flags to indicate the CRUD operation and then you can run a merge statement.

Hope you enjoyed this post!