Pivot and Unpivot

One thing I still get confused about writing is pivot queries; I find myself needing to look up the syntax every time. In short, you use PIVOT and UNPIVOT to change the shape of a result set: PIVOT turns rows into columns, and UNPIVOT does the opposite, turning columns back into rows.

One thing to note is that the column identifiers in the UNPIVOT clause follow the catalog collation. For SQL Database, the collation is always SQL_Latin1_General_CP1_CI_AS. For SQL Server partially contained databases, the collation is always Latin1_General_100_CI_AS_KS_WS_SC. If the column is combined with other columns, a collate clause (COLLATE DATABASE_DEFAULT) is required to avoid conflicts.

So let's take a basic select statement:

USE AdventureWorks2014 ;  
GO  
SELECT DaysToManufacture, AVG(StandardCost) AS AverageCost   
FROM Production.Product  
GROUP BY DaysToManufacture;

This returns the following:

DaysToManufacture AverageCost
----------------- -----------
0                 5.0885
1                 223.88
2                 359.1082
4                 949.4105

To pivot the table you can write this kind of query:

-- Pivot table with one row and five columns  
SELECT 'AverageCost' AS Cost_Sorted_By_Production_Days,   
[0], [1], [2], [3], [4]  
FROM  
(SELECT DaysToManufacture, StandardCost   
    FROM Production.Product) AS SourceTable  
PIVOT  
(  
AVG(StandardCost)  
FOR DaysToManufacture IN ([0], [1], [2], [3], [4])  
) AS PivotTable;  

This returns the following:

Cost_Sorted_By_Production_Days 0           1           2           3           4         
------------------------------ ----------- ----------- ----------- ----------- -----------
AverageCost                    5.0885      223.88      359.1082    NULL        949.4105

As you can see, it places the average standard cost values into the specified columns 0-4. Column 3 is NULL because no products in the source table take three days to manufacture.

Here is an example of an Unpivot query:

-- Create the table and insert values as portrayed in the previous example.  
CREATE TABLE pvt (VendorID int, Emp1 int, Emp2 int,  
    Emp3 int, Emp4 int, Emp5 int);  
GO  
INSERT INTO pvt VALUES (1,4,3,5,4,4);  
INSERT INTO pvt VALUES (2,4,1,5,5,5);  
INSERT INTO pvt VALUES (3,4,3,5,4,4);  
INSERT INTO pvt VALUES (4,4,2,5,5,4);  
INSERT INTO pvt VALUES (5,5,1,5,5,5);  
GO  
-- Unpivot the table.  
SELECT VendorID, Employee, Orders  
FROM   
   (SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5  
   FROM pvt) p  
UNPIVOT  
   (Orders FOR Employee IN   
      (Emp1, Emp2, Emp3, Emp4, Emp5)  
)AS unpvt;  
GO  

Here is a part of the result set:

VendorID    Employee    Orders
----------- ----------- ------
1            Emp1       4
1            Emp2       3 
1            Emp3       5
1            Emp4       4
1            Emp5       4
2            Emp1       4
2            Emp2       1
2            Emp3       5
2            Emp4       5
2            Emp5       5
...

Remember that UNPIVOT is not the exact reverse of a PIVOT query. The pivot query uses aggregation and can therefore merge multiple rows into a single row in the output, so the original detail cannot be recovered; in addition, NULL values in the input of UNPIVOT disappear in the output.
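
Coming back to the collation note at the start of this post, here is a rough sketch of where COLLATE DATABASE_DEFAULT comes in. The dbo.EmployeeMaster table and its EmployeeCode column are made up purely for illustration; the point is that the Employee column produced by UNPIVOT carries the catalog collation, so joining or concatenating it with another character column may raise a collation conflict unless you force a consistent collation:

SELECT unpvt.VendorID, unpvt.Employee, unpvt.Orders, em.FullName
FROM
   (SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
    FROM pvt) p
UNPIVOT
   (Orders FOR Employee IN (Emp1, Emp2, Emp3, Emp4, Emp5)) AS unpvt
JOIN dbo.EmployeeMaster em   -- hypothetical lookup table
   ON em.EmployeeCode COLLATE DATABASE_DEFAULT
    = unpvt.Employee  COLLATE DATABASE_DEFAULT;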

Hope this helps you with your pivot queries!

Merge Statements

Over the years I have come across a lot of MERGE statements, and most of the time I find the same fundamental flaw in how they are written.

In the update section, in most cases, there is no check on whether anything has actually changed between the source and the target, so every matched record gets updated every single time the statement runs. If that check is there, only the records that actually changed get updated, which can make a big difference to the performance of the query.

Here is an example of a merge that checks if something is changed between source and target before updating the record:

MERGE dbo.DimEmployee	TRG
USING Staging.Employee	SRC ON TRG.EmployeeID = SRC.EmployeeID
WHEN MATCHED AND
(
	      ISNULL(SRC.FirstName,'Unknown')	 != ISNULL(TRG.FirstName,'Unknown')
       OR ISNULL(SRC.LastName,'Unknown')	 != ISNULL(TRG.LastName,'Unknown')
       OR ISNULL(SRC.MiddleName,'Unknown')	 != ISNULL(TRG.MiddleName,'Unknown')
       OR ISNULL(SRC.NameStyle,'Unknown')	 != ISNULL(TRG.NameStyle,'Unknown')
       OR ISNULL(SRC.Title,'Unknown')		 != ISNULL(TRG.Title,'Unknown')
)
THEN UPDATE SET
	TRG.FirstName  = SRC.FirstName
    ,TRG.LastName   = SRC.LastName
    ,TRG.MiddleName = SRC.MiddleName
    ,TRG.NameStyle  = SRC.NameStyle
    ,TRG.Title	   = SRC.Title

WHEN NOT MATCHED THEN
INSERT
(
     EmployeeID
    ,FirstName
    ,LastName
    ,MiddleName
    ,NameStyle
    ,Title
)
VALUES
(
     SRC.EmployeeID
    ,SRC.FirstName
    ,SRC.LastName
    ,SRC.MiddleName
    ,SRC.NameStyle
    ,SRC.Title
);
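
As a side note, the “has anything changed” check can also be written with NOT EXISTS and INTERSECT, which treats NULLs as equal on both sides and removes the need for the ISNULL placeholders. This is only a sketch of the WHEN MATCHED clause from the statement above, assuming the same column list:

WHEN MATCHED AND NOT EXISTS
(
    SELECT SRC.FirstName, SRC.LastName, SRC.MiddleName, SRC.NameStyle, SRC.Title
    INTERSECT
    SELECT TRG.FirstName, TRG.LastName, TRG.MiddleName, TRG.NameStyle, TRG.Title
)
THEN UPDATE SET
     TRG.FirstName  = SRC.FirstName
    ,TRG.LastName   = SRC.LastName
    ,TRG.MiddleName = SRC.MiddleName
    ,TRG.NameStyle  = SRC.NameStyle
    ,TRG.Title      = SRC.Title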

Hope this helps you improve the performance of your MERGE statements.

Querying Data in Temporal Tables

Following on from my previous post about modifying data in a temporal table, I will be looking into querying the data in this post.

When you want to get the latest (actual) state of data in a temporal table, you can query it in exactly the same way as you query a normal table. If the PERIOD columns are not hidden, their values will appear in a SELECT * query. If you specified the PERIOD columns as hidden, their values won't appear in a SELECT * query; in that case, reference the PERIOD columns explicitly in the SELECT clause to return them.
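
For example, assuming the dbo.Department table used in the rest of this post (with SysStartTime and SysEndTime as its period columns), you can name the period columns explicitly whether or not they are hidden:

SELECT DeptID, DeptName, SysStartTime, SysEndTime
FROM dbo.Department;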

To perform any type of time-based analysis, use the new FOR SYSTEM_TIME clause with its five temporal-specific sub-clauses to query data across the current and history tables.

  • AS OF <date_time>
  • FROM <start_date_time> TO <end_date_time>
  • BETWEEN <start_date_time> AND <end_date_time>
  • CONTAINED IN (<start_date_time> , <end_date_time>)
  • ALL

FOR SYSTEM_TIME can be specified independently for each table in a query. It can be used inside common table expressions, table-valued functions and stored procedures.
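
For instance, a minimal sketch of FOR SYSTEM_TIME inside a common table expression (the date literal is arbitrary and dbo.Department is the table used throughout this post):

WITH DeptAsOf AS
(
    SELECT DeptID, DeptName
    FROM dbo.Department
    FOR SYSTEM_TIME AS OF '2018-07-01 10:00:00'
)
SELECT * FROM DeptAsOf;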

Using The AS OF Sub-Clause:
Use the AS OF sub-clause when you need to reconstruct the state of the data as it was at a specific time in the past. You can reconstruct the data with the precision of the datetime2 datatype defined in the PERIOD column definitions.

The AS OF sub-clause can be used with constant literals or with variables, which allows you to specify the time condition dynamically. The provided values are interpreted as UTC time.

This example returns the state of the dbo.Department table AS OF a specific date in the past.

/*State of entire table AS OF specific date in the past*/   
SELECT [DeptID], [DeptName], [SysStartTime],[SysEndTime]   
FROM [dbo].[Department]   
FOR SYSTEM_TIME AS OF '2018-07-01 10:00:00.7230011';

This example compares the values between two points in time for a subset of rows.

DECLARE @ADayAgo datetime2   
SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())   

/*Comparison between two points in time for subset of rows*/   
SELECT D_1_Ago.[DeptID], D.[DeptID],   
D_1_Ago.[DeptName], D.[DeptName],   
D_1_Ago.[SysStartTime], D.[SysStartTime],   
D_1_Ago.[SysEndTime], D.[SysEndTime]   
FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago   
JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]    
AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;  

Using Views With The AS OF Sub-Clause:
Using views is very useful in scenarios where complex point-in-time analysis is required. A common example is generating a business report today with the values from the previous month. Usually, customers have a normalized database model which involves many tables with foreign key relationships.

Answering the question of how the data in that normalized model looked at a point in the past can be very challenging, since all the tables change independently. In this case, the best option is to create a view and apply the AS OF sub-clause to the entire view.

Using this approach allows you to decouple the modeling of the data access layer from point-in-time analysis, as SQL Server applies the AS OF clause transparently to all temporal tables that participate in the view definition.

Furthermore, you can combine temporal with non-temporal tables in the same view and AS OF will be applied only to temporal ones. If the view does not reference at least one temporal table, applying temporal querying clauses to it will fail with an error.

/* Create view that joins three temporal tables: Department, CompanyLocation, LocationDepartments */   
CREATE VIEW [dbo].[vw_GetOrgChart]   
AS   
SELECT   
     [CompanyLocation].LocID  
   , [CompanyLocation].LocName  
   , [CompanyLocation].City  
   , [Department].DeptID  
   , [Department].DeptName    
FROM [dbo].[CompanyLocation]   
LEFT JOIN [dbo].[LocationDepartments]    
   ON [CompanyLocation].LocID = LocationDepartments.LocID   
LEFT JOIN [dbo].[Department]    
   ON LocationDepartments.DeptID = [Department].DeptID ;  
GO   
/* Querying view AS OF */   
SELECT * FROM [vw_GetOrgChart]   
FOR SYSTEM_TIME AS OF '2018-09-01 10:00:00.7230011';

Query For Changes To Specific Rows Over Time:
The temporal sub-clauses FROM…TO, BETWEEN…AND and CONTAINED IN are useful when you want to perform a data audit.

The first two sub-clauses return row versions that overlap with the specified period (i.e. those that started before the given period and ended after it), while CONTAINED IN returns only those that existed within the specified period boundaries.

If you search for non-current row versions only, I recommend you use CONTAINED IN as it works only with the history table and will yield the best query performance. Use ALL when you need to query current and historical data without any restrictions.

/* Query using BETWEEN...AND sub-clause*/  
SELECT   
     [DeptID]  
   , [DeptName]  
   , [SysStartTime]  
   , [SysEndTime]  
   , IIF (YEAR(SysEndTime) = 9999, 1, 0) AS IsActual   
FROM [dbo].[Department]   
FOR SYSTEM_TIME BETWEEN  '2018-01-01' AND '2018-12-31'   
WHERE DeptId = 1   
ORDER BY SysStartTime DESC;   

/*  Query using CONTAINED IN sub-clause */  
SELECT [DeptID], [DeptName], [SysStartTime],[SysEndTime]   
FROM [dbo].[Department]   
FOR SYSTEM_TIME CONTAINED IN ('2018-04-01', '2018-09-25')   
WHERE DeptId = 1   
ORDER BY SysStartTime DESC ;  

/*  Query using ALL sub-clause */   
SELECT    
     [DeptID]   
   , [DeptName]   
   , [SysStartTime]   
   , [SysEndTime]   
   , IIF (YEAR(SysEndTime) = 9999, 1, 0) AS IsActual    
FROM [dbo].[Department] FOR SYSTEM_TIME ALL   
ORDER BY [DeptID], [SysStartTime] Desc  

Hope you have enjoyed the 3 part series on Temporal Tables.

Modifying Temporal Tables

Following on from my previous post about Temporal Tables, this post talks about modifying temporal tables.

Data in a temporal table is modified using regular DML statements, with one important difference: the period column data cannot be directly modified. When data is updated, it is versioned, with the previous version of each updated row inserted into the history table. When data is deleted, the delete is logical: the row is moved from the current table into the history table rather than being permanently deleted.

Inserting Data:
When you insert new data, you need to account for the PERIOD columns.

If you specify the column list in your INSERT statement, you can omit the PERIOD columns because the system will generate values for these columns automatically.

--Insert with column list and without period columns   
INSERT INTO [dbo].[Department] ([DeptID] ,[DeptName] ,[ManagerID] ,[ParentDeptID])   
VALUES(10, 'Marketing', 101, 1);

If you do specify the PERIOD columns in the column list in your INSERT statement, then you need to specify DEFAULT as their value.

INSERT INTO [dbo].[Department] ([DeptID] ,[DeptName] ,[ManagerID] ,[ParentDeptID], SysStartTime, SysEndTime)   
VALUES(11, 'Sales', 101, 1, default, default);  

If you do not specify the column list in your INSERT statement, specify DEFAULT for the PERIOD columns.

--Insert without  column list and DEFAULT values for period columns   
INSERT INTO [dbo].[Department]    
VALUES(12, 'Production', 101, 1, default, default);

If the PERIOD columns are specified as HIDDEN, then you only need to supply values for the visible columns when you use INSERT without a column list; you do not need to account for the new PERIOD columns at all. This behavior guarantees that your legacy applications will continue to work when you enable system-versioning on tables that will benefit from versioning.

CREATE TABLE [dbo].[CompanyLocation]  
(   
     [LocID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY  
   , [LocName] [varchar](50) NOT NULL  
   , [City] [varchar](50) NOT NULL  
   , [SysStartTime] [datetime2](0) GENERATED ALWAYS AS ROW START HIDDEN NOT NULL   
   , [SysEndTime] [datetime2](0) GENERATED ALWAYS AS ROW END HIDDEN NOT NULL   
   , PERIOD FOR SYSTEM_TIME ([SysStartTime], [SysEndTime])   
)    
WITH ( SYSTEM_VERSIONING = ON );   
GO   
INSERT INTO [dbo].[CompanyLocation]   
VALUES  ('Headquarters', 'New York');

If the current table is partitioned, you can use partition switch as an efficient mechanism to load data into an empty partition or to load into multiple partitions in parallel. The staging table that is used in the PARTITION SWITCH IN statement with a system-versioned temporal table must have SYSTEM_TIME PERIOD defined, but it does not need to be a system-versioned temporal table. This ensures that temporal consistency checks are performed during the data insert into a staging table or when SYSTEM_TIME period is added to a pre-populated staging table.

/*Create staging table with period definition for SWITCH IN temporal table*/   
CREATE TABLE [dbo].[Staging_Department_Partition2]  
(   
     [DeptID] [int] NOT NULL  
   , [DeptName] [varchar](50)  NOT NULL  
   , [ManagerID] [int] NULL  
   , [ParentDeptID] [int] NULL  
   , [SysStartTime] [datetime2](7) GENERATED ALWAYS AS ROW START NOT NULL  
   , [SysEndTime] [datetime2](7) GENERATED ALWAYS AS ROW END NOT NULL  
   , PERIOD FOR SYSTEM_TIME ( [SysStartTime], [SysEndTime] )   
) ON [PRIMARY]   

/*Create aligned primary key*/   
ALTER TABLE [dbo].[Staging_Department_Partition2]    
ADD CONSTRAINT [Staging_Department_Partition2_PK]  
   PRIMARY KEY CLUSTERED  
   (  [DeptID] ASC )     
   ON [PRIMARY]   

/*Create and enforce constraints for partition boundaries.   
Partition 2 contains rows with DeptID > 100 and DeptID <=200*/ 
ALTER TABLE [dbo].[Staging_Department_Partition2] WITH CHECK ADD CONSTRAINT [chk_staging_Department_partition_2] CHECK ([DeptID]>N'100' AND [DeptID]<=N'200')   
ALTER TABLE [dbo].[Staging_Department_Partition2]    
   CHECK CONSTRAINT [chk_staging_Department_partition_2]   

/*Load data into staging table*/   
INSERT INTO [dbo].[Staging_Department_Partition2] ([DeptID],[DeptName],[ManagerID],[ParentDeptID])
VALUES (101,'D101',1,NULL)  

/*Use PARTITION SWITCH IN to efficiently add data to current table */    
ALTER TABLE [dbo].[Staging_Department_Partition2]
SWITCH TO [dbo].[Department] PARTITION 2;  

Updating Data:
You update data in the current table with a regular UPDATE statement. You can update data in the current table from the history table, for example for the “oops” scenario. However, you cannot update PERIOD columns and you cannot directly update data in the history table while SYSTEM_VERSIONING = ON. You can set SYSTEM_VERSIONING = OFF and update rows in both the current and history tables, but keep in mind that the system will then not preserve the history of those changes.

UPDATE [dbo].[Department] SET [ManagerID] = 501 WHERE [DeptID] = 10  

You cannot update a PERIOD column and you cannot update the history table. In this example, an attempt to update a PERIOD column generates an error.

UPDATE [dbo].[Department]    
SET SysStartTime = '2015-09-23 23:48:31.2990175'    
WHERE DeptID = 10 ;  

Msg 13537, Level 16, State 1, Line 3   
Cannot update GENERATED ALWAYS columns in table 'TmpDev.dbo.Department'.  

You can use UPDATE on the current table to revert the actual row state to valid state at a specific point in time in the past (reverting to a “last good known row version”). The following example shows reverting to the values in the history table as of 2018-04-25 where the DeptID = 10.

UPDATE Department   
SET DeptName = History.DeptName   
FROM Department    
FOR SYSTEM_TIME AS OF '2018-04-25' AS History   
WHERE History.DeptID  = 10   
AND Department.DeptID = 10 ; 

Deleting Data:
You delete data in the current table with a regular DELETE statement. The end period column for deleted rows is populated with the begin time of the underlying transaction.
You cannot directly delete rows from the history table while SYSTEM_VERSIONING = ON.

You can set SYSTEM_VERSIONING = OFF and delete rows from both the current and history tables, but keep in mind that the system will then not preserve the history of those changes. TRUNCATE, SWITCH PARTITION OUT of the current table and SWITCH PARTITION IN to the history table are not supported while SYSTEM_VERSIONING = ON.
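
A minimal sketch of both cases, assuming the dbo.Department table from earlier and a history table called dbo.DepartmentHistory (the history table name is an assumption here):

/* Regular delete: the removed row version is moved to the history table */
DELETE FROM [dbo].[Department]
WHERE DeptID = 10;

/* Deleting from the history table directly requires versioning to be off first */
ALTER TABLE [dbo].[Department] SET (SYSTEM_VERSIONING = OFF);
DELETE FROM [dbo].[DepartmentHistory] WHERE DeptID = 10;
ALTER TABLE [dbo].[Department]
   SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[DepartmentHistory]));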

Using A Merge Statement:
MERGE operation is supported with the same limitations that INSERT and UPDATE statements have regarding PERIOD columns.

CREATE TABLE DepartmentStaging (DeptId INT, DeptName varchar(50));   
GO   
INSERT INTO DepartmentStaging VALUES (1, 'Company Management');   
INSERT INTO DepartmentStaging VALUES (10, 'Science & Research');   
INSERT INTO DepartmentStaging VALUES (15, 'Process Management');   

MERGE dbo.Department AS target   
USING (SELECT DeptId, DeptName FROM DepartmentStaging) AS source (DeptId, DeptName)   
ON (target.DeptId = source.DeptId)   
WHEN MATCHED THEN    
    UPDATE   
   SET DeptName = source.DeptName   
WHEN NOT MATCHED THEN   
   INSERT (DeptName)   
   VALUES (source.DeptName);  

Temporal Tables

Let’s start off with what a temporal table actually is. SQL Server 2016 introduced support for system-versioned temporal tables as a database feature that brings built-in support for providing information about data stored in the table at any point in time, rather than only the data that is correct at the current moment in time.

A system-versioned temporal table is a type of user table designed to keep a full history of data changes and allow easy point in time analysis. This type of temporal table is referred to as a system-versioned temporal table because the period of validity for each row is managed by the system.

Every temporal table has two explicitly defined columns, each with a datetime2 data type. These columns are referred to as period columns. These period columns are used exclusively by the system to record period of validity for each row whenever a row is modified. A temporal table also contains a reference to another table with a mirrored schema. The system uses this table to automatically store the previous version of the row each time a row in the temporal table gets updated or deleted. This additional table is referred to as the history table, while the main table that stores current (actual) row versions is referred to as the current table or simply as the temporal table.

Benefits of using a temporal table:

  • Auditing all data changes and performing data forensics when necessary
  • Reconstructing state of the data as of any time in the past
  • Calculating trends over time
  • Maintaining a slowly changing dimension for decision support applications
  • Recovering from accidental data changes and application errors

Basic overview of temporal tables:

How does a temporal table work?
A current table and a history table are created, both with two datetime2 columns.

  • Period start column: The system records the start time for the row in this column, typically denoted as the SysStartTime column.
  • Period end column: The system records the end time for the row in this column, typically denoted as the SysEndTime column.

The current table contains the current value for each row. The history table contains each previous value for each row, if any, and the start time and end time for the period for which it was valid.

To create a temporal table you add WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory)) at the end of the create statement as below:

CREATE TABLE dbo.Employee   
(    
  [EmployeeID] int NOT NULL PRIMARY KEY CLUSTERED   
  , [Name] nvarchar(100) NOT NULL  
  , [Position] varchar(100) NOT NULL   
  , [Department] varchar(100) NOT NULL  
  , [Address] nvarchar(1024) NOT NULL  
  , [AnnualSalary] decimal (10,2) NOT NULL  
  , [ValidFrom] datetime2 (2) GENERATED ALWAYS AS ROW START  
  , [ValidTo] datetime2 (2) GENERATED ALWAYS AS ROW END  
  , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)  
 )    
 WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));  

How to query temporal tables:
The SELECT statement FROM clause has a new clause FOR SYSTEM_TIME with five temporal-specific sub-clauses to query data across the current and history tables. This new SELECT statement syntax is supported directly on a single table, propagated through multiple joins, and through views on top of multiple temporal tables. The following query searches for row versions of the Employee row with EmployeeID = 1000 that were active for at least a portion of the period between 1 January 2017 and 1 January 2018 (including the upper boundary):

SELECT * FROM Employee
FOR SYSTEM_TIME BETWEEN '2017-01-01 00:00:00.0000000' AND '2018-01-01 00:00:00.0000000'
WHERE EmployeeID = 1000
ORDER BY ValidFrom;

In the next post we will focus on working with temporal tables.

Inline Table-Valued Functions (TVFs)

Let’s start off with what a table-valued function (TVF) actually is. A TVF is a dynamic table produced at the time of execution, depending on parameters. Like a view, a TVF creates a result set only when it’s executed, but, unlike a view, it can be parameterized.

There are two types of TVFs: Inline Table-Valued Functions (ITVFs) and Multi-statement Table-Valued Functions (MTVFs). I find them easy to remember: think of the “I” in ITVF as 1 (a single statement) and the “M” in MTVF as “many” (multiple statements).

As you can imagine, a TVF produces a result set that can be used as a virtual table or view. Yes, you can actually select data from a TVF, or join it with some other tables, views, or even other TVFs. In this post I will go into more detail on ITVFs.

ITVF Examples and Advantages
Suppose that you need to select author id, name, and city from the authors table. Users will often request data by author’s last name. Prior to SQL Server 2000, you could do it either by creating a view or stored procedure. ITVF offers a new option. Consider the following examples and the ease of data retrieval in each case:

CREATE VIEW view_authors
AS
SELECT
    au_id
   ,au_fname
   ,au_lname
   ,city
FROM
    authors
;

CREATE PROC proc_authors @au_lname VARCHAR(40)
AS
SELECT
    au_id
   ,au_fname
   ,au_lname
   ,city
FROM
    authors
WHERE
    au_lname = @au_lname
;

CREATE FUNCTION dbo.itvf_authors
(
    @au_lname VARCHAR(40)
)
RETURNS TABLE
RETURN SELECT
           au_id
          ,au_fname
          ,au_lname
          ,city
       FROM
           authors
       WHERE
           au_lname = @au_lname
;

In order to get a list of authors whose last name is stored in local variable @author, you can use the objects as follows:

DECLARE @author VARCHAR(40) = 'Ringer';  -- example value for the local variable

SELECT
    *
FROM
    view_authors
WHERE
    au_lname = @author;
EXEC proc_authors @author;
SELECT
    *
FROM
    dbo.itvf_authors(@author)
;

So far, it’s hard to see any advantages to using the ITVF. A stored procedure call is the shortest, but this is the last factor I would consider when choosing an implementation method.

It’s worth mentioning that stored procedures are more limiting than views and ITVFs when you don’t need to select all columns. Views and ITVFs allow you to list only the columns you want returned, whereas a stored procedure always returns a fixed set of columns unless you write complicated code with dynamic T-SQL. Just consider what you would need to do if a user wasn’t interested in the “city” column: the view and ITVF in the example would handle it just fine, but the stored procedure would be virtually useless.
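
For example, to return only the columns a user cares about straight from the ITVF (still assuming @author holds the last name):

SELECT
    au_id
   ,au_lname
FROM
    dbo.itvf_authors(@author)
;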

Now suppose that you need to get all titles written by authors whose last name is stored in variable @author. Here’s the code using the view:

SELECT
    a.*
   ,t.title_id
FROM
    view_authors a
    JOIN
    titleauthor  t ON a.au_id = t.au_id
WHERE
    a.au_lname = @author
;

Using the stored procedure for this task is quite inconvenient. Since you can’t join results of a stored procedure directly to another table, you have to save them in a temporary table first:

CREATE TABLE #authors
(
    au_id    VARCHAR(11)
   ,au_fname VARCHAR(20)
   ,au_lname VARCHAR(40)
   ,city     VARCHAR(20)
);
INSERT #authors
EXEC proc_authors @author;
SELECT
    a.*
   ,t.title_id
FROM
    #authors    a
    JOIN
    titleauthor t ON a.au_id = t.au_id;
DROP TABLE #authors
;

The ITVF approach turns out to be more elegant than a view because you don’t need a WHERE clause filter and can specify selection criteria as parameters. It would be even more obvious if you had more parameters:

SELECT
    a.*
   ,t.title_id
FROM
    dbo.itvf_authors(@author) a
    JOIN
    titleauthor               t ON a.au_id = t.au_id
;

ITVFs can be very useful with applications that are doing multiple updates, deletes and inserts into a table. The application can pass in a table with a flag per row to indicate the CRUD operation, and then you can apply the whole set with a single MERGE statement.
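
Here is a rough sketch of that idea using a table-valued parameter; the table, type, procedure and column names are made up purely for illustration, and a real target table would need its full column list covered on insert:

/* Hypothetical target table and table type */
CREATE TABLE dbo.AuthorNames
(
    au_id    VARCHAR(11) NOT NULL PRIMARY KEY
   ,au_lname VARCHAR(40) NULL
);
GO
CREATE TYPE dbo.AuthorNameChange AS TABLE
(
    au_id    VARCHAR(11) NOT NULL
   ,au_lname VARCHAR(40) NULL
   ,Op       CHAR(1)     NOT NULL  -- 'I' = insert, 'U' = update, 'D' = delete
);
GO
/* The application passes the change set in, and one MERGE applies it */
CREATE PROC dbo.proc_apply_author_changes @changes dbo.AuthorNameChange READONLY
AS
MERGE dbo.AuthorNames AS TRG
USING @changes AS SRC ON TRG.au_id = SRC.au_id
WHEN MATCHED AND SRC.Op = 'D' THEN
    DELETE
WHEN MATCHED AND SRC.Op = 'U' THEN
    UPDATE SET TRG.au_lname = SRC.au_lname
WHEN NOT MATCHED BY TARGET AND SRC.Op = 'I' THEN
    INSERT (au_id, au_lname) VALUES (SRC.au_id, SRC.au_lname);
GO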

Hope you enjoyed this post!

SQL Transactions

Transactions in SQL Server are very useful; however, I have found that not many people use them. I am not sure why that is, so I thought I would share how transactions work and the different modes they can run in.

A transaction is a single unit of work. If a transaction is successful, all of the data modifications made during the transaction are committed and become a permanent part of the database. If a transaction encounters errors and must be canceled or rolled back, then all of the data modifications are erased.

SQL Server operates in the following transaction modes:
Autocommit transactions
Each individual statement is a transaction.

Explicit transactions
Each transaction is explicitly started with the BEGIN TRANSACTION statement and explicitly ended with a COMMIT or ROLLBACK statement.

Implicit transactions
A new transaction is implicitly started when the prior transaction completes, but each transaction is explicitly completed with a COMMIT or ROLLBACK statement.

Batch-scoped transactions
Applicable only to multiple active result sets (MARS), a Transact-SQL explicit or implicit transaction that starts under a MARS session becomes a batch-scoped transaction. A batch-scoped transaction that is not committed or rolled back when a batch completes is automatically rolled back by SQL Server.
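
A minimal sketch of the first three modes side by side (the dbo.Account table and the values are made up for illustration):

-- Autocommit: each statement is its own transaction
UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountID = 1;

-- Explicit: you control the boundaries yourself
BEGIN TRANSACTION;
    UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE dbo.Account SET Balance = Balance + 100 WHERE AccountID = 2;
COMMIT TRANSACTION;

-- Implicit: a transaction is opened for you, but you must end it yourself
SET IMPLICIT_TRANSACTIONS ON;
DELETE FROM dbo.Account WHERE AccountID = 3;
COMMIT;  -- without this the transaction stays open
SET IMPLICIT_TRANSACTIONS OFF;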

From my experience, explicit transactions are the most commonly used of all the different modes.

Distributed Transactions:
You can use distributed transactions, which specify the start of a SQL distributed transaction managed by the Microsoft Distributed Transaction Coordinator (MS DTC). This is only applicable to SQL Server, not to Azure SQL Database. Transaction-level snapshot isolation does not support distributed transactions.

Arguments:
transaction_name
Is a user-defined transaction name used to track the distributed transaction within MS DTC utilities. transaction_name must conform to the rules for identifiers and must be <= 32 characters.

@tran_name_variable
Is the name of a user-defined variable containing a transaction name used to track the distributed transaction within MS DTC utilities. The variable must be declared with a char, varchar, nchar, or nvarchar data type.

The instance of the SQL Server Database Engine executing the BEGIN DISTRIBUTED TRANSACTION statement is the transaction originator and controls the completion of the transaction. When a subsequent COMMIT TRANSACTION or ROLLBACK TRANSACTION statement is issued for the session, the controlling instance requests that MS DTC manage the completion of the distributed transaction across all of the instances involved.

For example, suppose BEGIN DISTRIBUTED TRANSACTION is issued on ServerA, and the session then calls a stored procedure on ServerB and another stored procedure on ServerC. If the stored procedure on ServerC executes a distributed query against ServerD, all four servers are involved in the distributed transaction. The instance of the Database Engine on ServerA is the originating controlling instance for the transaction.

USE AdventureWorks2012;  
GO  
BEGIN DISTRIBUTED TRANSACTION;  
-- Delete candidate from local instance.  
DELETE AdventureWorks2012.HumanResources.JobCandidate  
    WHERE JobCandidateID = 13;  
-- Delete candidate from remote instance.  
DELETE RemoteServer.AdventureWorks2012.HumanResources.JobCandidate  
    WHERE JobCandidateID = 13;  
COMMIT TRANSACTION;  
GO  

Begin Transaction
Marks the starting point of an explicit, local transaction. Explicit transactions start with the BEGIN TRANSACTION statement and end with the COMMIT or ROLLBACK statement. BEGIN TRANSACTION increments @@TRANCOUNT by 1.

BEGIN TRANSACTION represents a point at which the data referenced by a connection is logically and physically consistent. If errors are encountered, all data modifications made after the BEGIN TRANSACTION can be rolled back to return the data to this known state of consistency. Each transaction lasts until either it completes without errors and COMMIT TRANSACTION is issued to make the modifications a permanent part of the database, or errors are encountered and all modifications are erased with a ROLLBACK TRANSACTION statement.

Although BEGIN TRANSACTION starts a local transaction, it is not recorded in the transaction log until the application subsequently performs an action that must be recorded in the log, such as executing an INSERT, UPDATE, or DELETE statement.

Naming multiple transactions in a series of nested transactions with a transaction name has little effect on the transaction. Only the first (outermost) transaction name is registered with the system. A rollback to any other name (other than a valid save point name) generates an error. None of the statements executed before the rollback is, in fact, rolled back at the time this error occurs. The statements are rolled back only when the outer transaction is rolled back.
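
Save points, on the other hand, can safely be rolled back to inside an open transaction. A small sketch using the same AdventureWorks table as the example below:

BEGIN TRANSACTION;

    UPDATE HumanResources.JobCandidate
    SET ModifiedDate = SYSDATETIME()
    WHERE JobCandidateID = 13;

    SAVE TRANSACTION BeforeDelete;

    DELETE FROM HumanResources.JobCandidate
    WHERE JobCandidateID = 13;

    -- undo only the work done since the save point; the outer transaction stays open
    ROLLBACK TRANSACTION BeforeDelete;

COMMIT TRANSACTION;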

BEGIN TRY
	BEGIN TRANSACTION
	
	DELETE	FROM HumanResources.JobCandidate
	WHERE	JobCandidateID = 13
	
	COMMIT
	
END TRY
BEGIN CATCH
	ROLLBACK
	
END CATCH
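
A slightly more defensive variant of the same pattern is sketched below: it only rolls back if a transaction is actually open, and re-raises the error so the caller still sees the failure (THROW requires SQL Server 2012 or later):

BEGIN TRY
    BEGIN TRANSACTION;

    DELETE FROM HumanResources.JobCandidate
    WHERE JobCandidateID = 13;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;  -- re-raise the original error
END CATCH;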

Hope this helps you in future when using transactions. If you are doing INSERTS, UPDATES and DELETES it is advisable that you use a transaction.

Using a ZIP Task in SSIS

The ZIP file task is a great feature in SSIS for when you need to move or email a large amount of data. It also helps when you have multiple files going to the same location, since you can move one zipped file instead of many.

I prefer to use variables, which I can change if the location changes, over hard-coded values. The first step is to create the variables on the package:

The next step is to place the Zip task in your Control Flow area, then configure it as follows:

This will then compress and zip the files in the source location you set. You can then use a File System Task to move the file to wherever you require it, or use a Send Mail Task to email the file to people.

SSRS Cancel Report Still Running In The Background

There are some people who cancel an SSRS report that is taking too long by simply closing Visual Studio or their browser. They then reopen SSRS or the browser and try to view the report again. In the meantime, the query behind the report they originally tried to run is still executing on the database. This takes up quite a bit of resources, and people think their query has been cancelled, but this is not the case.

This is a common problem with users accessing reports. There is a way to stop the query from running in the background by using the KILL command; however, you will need a DBA to do this for you unless you have permission to do it yourself.
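
A minimal sketch of what that looks like; the session_id used in the KILL is just a placeholder, and finding the right session usually means filtering on the report server's login or host name:

/* Find long-running user sessions */
SELECT session_id, status, command, total_elapsed_time
FROM sys.dm_exec_requests
WHERE session_id > 50;   -- user sessions only

/* Kill the offending session (requires appropriate permissions) */
KILL 123;   -- replace 123 with the session_id found above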

Hope this helps others who are struggling with the same issue 🙂

SSIS Email Task Delay Validation

Recently I was trying to send a file via Email in SSIS.

For the connection to the file I was using a variable, as the file name changes every day.

This was causing an issue because SSIS validates the whole package before the file gets generated, and so the package would fail.

The simplest solution was to delay the validation on the Send Mail Task. You can do this in the properties by setting Delay Validation to True.

Hope this helps you!