SSRS Cancel Report Still Running In The Background

Some people cancel an SSRS report that is taking too long by simply closing Visual Studio or their browser, then reopen SSRS/the browser and try to view the report again. In the meantime, the query behind the previous report run is still executing on the database. This takes up quite a bit of resources, and people think their query has been cancelled, but this is not the case.

This is a common problem with users accessing reports. There is a way to stop the query from running in the background by using the KILL command; however, you will need a DBA to do this for you unless you have permission to do it yourself.
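
If you do have the necessary permissions, a minimal sketch of finding and killing the offending session could look like the following (the session ID of 53 is purely an example; confirm the right one from the query results or sp_who2 first):

-- Find active requests and the statement each one is running
SELECT	 r.session_id
		,r.status
		,r.start_time
		,t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id > 50;	-- ignore system sessions

-- Kill the session belonging to the cancelled report (example session ID)
KILL 53;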

Hope this helps others who are struggling with the same issue 🙂

MDS “Service Unavailable” Error

After restarting a server that has MDS on it, I could no longer access MDS via the browser. This seemed very strange, as I had restarted the server multiple times before and it had never done this.

It turned out the Application Pool for MDS had stopped. So I started it up, and then it would stop again, frustrating me further.

Then it occurred to me that it was running under my domain details and my password had changed. I went into the advanced settings of my MDS Application Pool to update my details under Identity.

And there we go, it works and I can access MDS again via the browser!

Hope this helps someone else with this error.

Indexing Temp Tables

While busy with query performance tuning and trying to see whether the query would run better using CTEs (Common Table Expressions) or temp tables and comparing times, I realised I had forgotten to think about adding indexes on my temp tables, which could increase the performance.

I then came across Brent Ozar’s post on Indexing Temp Tables, found it extremely useful and thought I would share it here.

Most people don’t think about indexing temp tables when they are trying to optimise their queries. Which is a shame, because I see people sticking fairly large amounts of data into temp tables. On the rare occasion that I do see them indexed, it’s a nonclustered index on a column or two. The optimizer promptly ignores this index while you select 10 columns and join 10,000 rows to another temp table with another ignored nonclustered index on it.

Now, not every temp table needs a clustered index. I don’t even have a rule of thumb here, because it depends on so many things, like

  • Are you joining it to other tables?
  • What does the execution plan look like?
  • Is this even part of the stored procedure that needs tuning?

Assuming that all three things lead us to index temp tables, let’s create some and see how they react to different indexing strategies.

SELECT TOP 100000 *
INTO    #TempUsers
FROM    dbo.Users AS u;

SELECT TOP 100000 p.*
INTO    #TempPosts
FROM    dbo.Posts AS p
JOIN #TempUsers AS tu ON p.OwnerUserId = tu.Id;

SELECT TOP 100000 c.*
INTO    #TempComments
FROM    dbo.Comments AS c
JOIN #TempUsers AS tu ON c.UserId = tu.Id
JOIN #TempPosts AS tp ON c.PostId = tp.Id;

SELECT  *
FROM    #TempUsers AS tu
JOIN    #TempPosts AS tp    ON tu.Id = tp.OwnerUserId
JOIN    #TempComments AS tc ON tc.UserId = tu.Id
                            AND tc.PostId = tp.Id;

We have three temp tables with 100k rows in them. I joined them a bit to make sure we get good correlation in the data.

If we get the plan for the select statement, this is our query plan: a bunch of table scans and hash joins. SQL is lazy. I don’t blame it one bit.

I’m so parallel…

After frequent user complaints and careful plan analysis, we decide to add some indexes. Let’s start with nonclustered indexes.

/*Nonclustered*/
CREATE NONCLUSTERED INDEX ix_tempusers ON #TempUsers (Id);
CREATE NONCLUSTERED INDEX ix_tempposts ON #TempPosts (Id, OwnerUserId);
CREATE NONCLUSTERED INDEX ix_tempposts2 ON #TempPosts (OwnerUserId, Id);
CREATE NONCLUSTERED INDEX ix_tempcomments ON #TempComments (UserId, PostId);

I’ve gone ahead and added two indexes to #TempPosts so the optimizer has a choice if joining in another order is more efficient.

How’d we do?

Not so hot…

Not quite there. Two table scans and an index seek followed by a key lookup that SQL estimates will execute 62k times.

Let’s try that again with clustered indexes. We can only give #TempPosts one, so I picked at random. In real life, you might want to do your job!

/*Clustered*/
CREATE CLUSTERED INDEX cx_tempposts ON #TempPosts (Id, OwnerUserId);
CREATE CLUSTERED INDEX cx_tempusers ON #TempUsers (Id);
CREATE CLUSTERED INDEX cx_tempcomments ON #TempComments (UserId, PostId);

Our indexes get used and everyone is happy. Well, sort of. TempDB isn’t happy, but then it never is. It’s like your liver, or Oscar the Grouch.

Used and abused
When starting to index temp tables, I usually start with a clustered index, and potentially add nonclustered indexes later if performance isn’t where I want it to be.

You can do yourself a lot of favors by only bringing columns into the temp table that you need. If it’s still a lot, there’s a much higher chance SQL will judge a table scan cheaper than using a narrow nonclustered index along with a lookup.

Remember to test this carefully. You want to make sure that adding the indexes doesn’t introduce more overhead than they alleviate, and that they actually make your query better.

Windows Update KB2919355 Not Applicable

When trying to install SQL Server, Windows Update KB2919355 is a prerequisite. On a new server it is not always installed, and the SQL install stops until this is rectified.

Occasionally, when you try to install the update, you get the error that KB2919355 is not applicable, yet the update is not installed. After searching the internet high and low, a solution was found.

If you install Windows Update KB2975061 first and restart the machine, then install KB2919355 and restart the machine again, the SQL install prerequisites check will pass and you can continue with your install.

Hope this solution helps you and you get less grey hair and frustration trying to figure it out.

Setting Up SSIS in SQL Server Management Studio

Before you can deploy an SSIS project you need to set up SSIS in SQL Server Management Studio. When installing SQL Server you can select to install SSIS; however, it does not set up the SSISDB or configure any settings for SSIS.

Here are the steps for setting up SSIS in SQL Server Management Studio.

Step 1:
Right-click on Integration Services Catalogs and select Create Catalog.

Step 2:
Check the box next to Enable CLR Integration, set a password for your SSISDB and click OK.

You will then see your SSISDB listed under Databases and under Integration Services Catalogs.
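
The Create Catalog dialog enables CLR integration for you, but if you want to confirm the instance-level setting afterwards, a quick check with sp_configure looks something like this:

-- Show the current value of the CLR integration setting (run_value should be 1)
EXEC sp_configure 'clr enabled';

-- If it is still 0, it can be enabled manually
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;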

Now you will be able to deploy your SSIS project to your server where you have set up SSIS.

The next post will be on deploying an SSIS project.

MDS Backup Strategies

Recently a colleague and I had a discussion about which method is the best to use for backing up MDS (Master Data Services) for disaster recovery. We came to the conclusion that it all depends on the SQL environment version you are recovering to. Below are the two different methods for backing up an MDS database.

Model Backup
Pros:

  • Fast to create, can backup specific versions
  • Will always be able to deploy a model to a MDS install
  • Can be automated via SSIS
  • Can be restored to a server version that is +1, 0 or -1

Cons:

  • If you want to save backups of different models, then the models will have to be backed up separately.
  • Security is not retained; it will have to be redone. Can take about 2 minutes per user. On a small model this should not be a problem, however, when scaled out it could be very time consuming.
  • Saving more than one backup of a model would require a script/maintenance task to remove backups older than X amount of time.

When deploying the model, the DeployClone option must be used, not DeployNew. DeployNew removes the MUID and therefore will not allow any changes to be deployed to that model at a later stage. DeployClone keeps it all intact and will allow for future updates.
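
For reference, the package and deployment steps are done with the MDSModelDeploy command-line tool. A rough sketch, assuming a service name of MDS1 and a model called Customer (substitute your own service, model, version and package names):

REM Create a package of the model, including its data
MDSModelDeploy createpackage -service MDS1 -model Customer -version VERSION_1 -package Customer.pkg -includedata

REM Deploy the package as a clone so the MUID is kept and future updates can still be deployed
MDSModelDeploy deployclone -package Customer.pkg -service MDS1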

Database Backup
Pros:

  • Security is retained
  • Simple to set-up and maintain
  • Easy to restore
  • Maintenance tasks to clean up old backups

Cons:

  • Cannot be restored to a server version less than the current version.

The database backup for MDS can be scheduled just like any other backup. You can use the maintenance plan wizard to set up the backup and clean-up tasks on a schedule.
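
A minimal T-SQL sketch of such a backup, assuming the MDS database is simply named MDS and backups go to C:\Backups (both names are assumptions; substitute your own):

-- Full backup of the MDS database
BACKUP DATABASE [MDS]
TO DISK = N'C:\Backups\MDS_Full.bak'
WITH COMPRESSION, INIT, STATS = 10;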

If you know what version the disaster recovery environment will be on, for example the same version as the current MDS server or higher, then it is simpler to use the Database Backup method. If you are uncertain of what version the disaster recovery environment will be, then you would want to use the Model Backup method in case you have to restore to a version lower than the current MDS server version.

Hope this helps you to choose what is best suited for your environment!

Special thanks to Janlo Du Toit for the discussion on these methods 🙂

Contained Database

A contained database is a database that is isolated from other databases and from the instance of SQL Server that hosts it. Essentially, it includes all database settings and metadata within itself, resulting in no configuration dependencies on the instance of the SQL Server Database Engine where the database is actually installed.

This feature is available from SQL Server 2012 and it is available on Azure SQL Database. This feature is not available on Azure SQL Data Warehouse or Parallel Data Warehouse.

One of the benefits of contained databases is that you will no longer struggle with orphaned users when moving databases or working with high availability. SQL Server 2012 introduced the notion of boundaries, which are borders that define where a database and its metadata and settings end and where server-level features, metadata and dependencies begin. In addition to implicitly allowing increased portability, boundaries are also used to help explicitly isolate data, settings, features and capabilities.

There are 3 types of containment levels:

  • NONE. This default containment mode is what exists prior to SQL Server 2012. There are no boundaries defined or visible.
  • PARTIAL. With partially contained databases, you have the ability to define clearer boundaries between databases and the server, making it easier for metadata to be hosted within the databases. This, in turn, makes SQL Server databases more portable and less dependent on underlying hosts.
  • FULL. Full containment will enable greater database portability and allow for strict enforcement of containment boundaries as a means of fully isolating databases from each other and from the underlying host.

In SQL Server 2016 there are 4 ways in which SQL Server helps users to isolate the database from the instance.

  • The metadata that describes the database is maintained in the database and not in the master database.
  • All metadata is defined using the same collation.
  • User authentication can be performed by the database, reducing the database’s dependency on the logins of the instance of SQL Server.
  • The SQL Server environment (DMVs, XEvents, etc.) reports on and can act upon containment information.

There are two types of users for contained databases:

  • Contained database user with password. These are authenticated by the database.
  • Windows principals. Authorized Windows users and members of authorized Windows groups can connect directly to the database and do not need logins in the master database. The database trusts the authentication by Windows.
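
As a quick illustration, here is a minimal sketch of enabling contained database authentication on the instance, switching an existing database to partial containment and creating a contained user with a password (the database name SalesDB and the user name Carla are just placeholders):

-- Allow contained databases on the instance
EXEC sp_configure 'contained database authentication', 1;
RECONFIGURE;

-- Switch an existing database to partial containment
ALTER DATABASE SalesDB SET CONTAINMENT = PARTIAL;

-- Create a contained database user that is authenticated by the database itself
USE SalesDB;
CREATE USER Carla WITH PASSWORD = N'StrongP@ssw0rd!';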

Benefits of using Partially Contained Databases
Database Movement
When moving a database, problems occur because some important information is unavailable after the move to another instance. Login information is stored within the instance and not within the database, so when the database is moved to another instance this information is lost. A partially contained database can store that important information within the database itself, so it is still available after the database is moved.

Benefit of Contained Database Users with Always On
By reducing the ties to the instance of SQL Server, partially contained databases can be useful during failover when you use Always On Availability Groups. Contained users are a very significant feature in an Always On solution: in the event of a failover, people can connect to the failover node without logins having to be created on the instance hosting the failover node.

Initial Database Development
When creating a database, the developer may not always know where the database will be deployed or all the environmental variables that could impact it. With a non-contained model, the developer would need to consider all the environmental variables and the impact they could have on the database and develop accordingly. By using partially contained databases, the developer can identify instance-level impacts on the database.

Database Administration
Database administration becomes easier when maintaining database settings: instead of keeping the settings in the master database, you can now maintain them in the database itself. This allows each database owner to have more control over their database without being given sysadmin permission.

Limitations with Partially Contained Databases
As with almost any feature, there are certain limitations. The following features are not allowed in a partially contained database:

  • Cannot use replication, change data capture, or change tracking.
  • Numbered procedures
  • Schema-bound objects that depend on built-in functions with collation changes
  • Binding change resulting from collation changes, including references to objects, columns, symbols, or types.

Identifying Database Containment
There are two tools to identify the containment status of a database: the sys.dm_db_uncontained_entities view and the database_uncontained_usage extended event. sys.dm_db_uncontained_entities is a view that shows all potentially uncontained entities in the database; however, if any entities use dynamic SQL, it cannot be determined until runtime whether the entity is contained or not. The database_uncontained_usage XEvent fires whenever an uncontained entity is identified at run time. Therefore, any uncontained user entities you have not run will not be identified by this XEvent.
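
A quick sketch of querying the view (run it in the database you are checking; the columns below are the ones I find most useful):

-- List potentially uncontained entities and the features that break containment
SELECT	 class_desc
		,OBJECT_NAME(major_id)	AS object_name
		,feature_name
FROM sys.dm_db_uncontained_entities;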

This could be an awesome feature for database developers to use from SQL Server 2012 onwards. Hope this post was helpful for you!


Temp Table VS Table Variable

There have been many discussions on whether a temp table is better to use or a table variable. It depends on what you would like to do. They both have their pros, cons and limitations.

Table Variable Limitations

  • Table variable changes do not get rolled back during a transaction ROLLBACK whereas temp table changes will.
  • Table variables cannot have Non-Clustered Indexes
  • You cannot create constraints in table variables
  • You cannot create default values on table variable columns
  • Statistics cannot be created against table variables
  • Table variables cannot have a parallel plan, whereas temp tables are not restricted in this manner
  • Table variables cannot be selected into, altered, truncated or be the target of DBCC commands such as DBCC CHECKIDENT or of SET IDENTITY_INSERT
  • Query hints are not supported
  • CHECK constraints on table variables are not considered by the optimizer for simplification, implied predicates or contradiction detection.

Similarities between Temp Tables and Table Variables

  • Clustered indexes can be created on both
  • Both are logged in the transaction log
  • Instantiated in the tempdb
  • You can perform DML queries against the tables

Logical Location
Table variables behave more as if they were part of the current database. Table variable column collations, if not specified explicitly, will be those of the current database. Temp tables use the default collation of tempdb. User-defined data types and XML collections must be in tempdb in order for you to use them in temp tables; table variables, however, can use them from the current database.

  • In a contained database, temporary table data is collated in the collation of the contained database. All metadata associated with the temporary tables will be in the catalog collation.
  • Named constraints may not be used in temporary tables.
  • Temporary tables may not refer to user-defined functions, user-defined types or XML collections.

Visibility to different scopes
Table variables are limited in such that they can only be accessed in the current batch and scope in which they are declared. Temp tables are accessible within child batches (nested triggers, procedures, exec calls). Temp tables created at the outer scope (@@NESTLEVEL=0) can span batches too as they persist until the session ends.

Lifetime
Table variables are created implicitly when a batch containing a DECLARE @MyTableVariable TABLE statement is executed (before any user code in that batch runs), and they are dropped implicitly at the end of the batch. Temp tables are created explicitly when the CREATE TABLE #MyTemp statement is encountered and can be dropped explicitly with DROP TABLE, or they will be dropped when the session ends. If created in a child batch with @@NESTLEVEL > 0, the temp table is dropped implicitly when that batch ends.

Within stored routines both types can be cached rather than repeatedly creating and dropping new tables. There are restrictions on when this can occur; it is possible to violate them for temp tables, whereas the restrictions on table variables prevent any violations. The maintenance overhead on cached temp tables is slightly greater than for table variables. For more info on this, take a look here.

Transactions
Temp table operations are carried out as part of the user transaction itself, whereas table variable operations are carried out as system transactions, independent of any outer user transaction. A ROLLBACK command will therefore affect a temp table but leave a table variable untouched.
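
A tiny sketch to see this for yourself: both tables get one row, the transaction is rolled back, and only the table variable keeps its row.

CREATE TABLE #TempDemo (Id INT);
DECLARE @VarDemo TABLE (Id INT);

BEGIN TRAN;
	INSERT INTO #TempDemo VALUES (1);
	INSERT INTO @VarDemo VALUES (1);
ROLLBACK;

-- Temp table: 0 rows, table variable: 1 row
SELECT	(SELECT COUNT(*) FROM #TempDemo)	AS TempTableRows
	   ,(SELECT COUNT(*) FROM @VarDemo)		AS TableVariableRows;

DROP TABLE #TempDemo;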

Logging
Both generate log records in the tempdb transaction log. A common misconception is that this is not the case for table variables, but the quantity of logging is very similar. A very important difference is that log records related to temp tables cannot be cleared out until any containing user transaction finishes. A long-running transaction that at some point writes to a temp table will prevent log truncation in tempdb, whereas the autonomous transactions spawned for table variables do not.

Indexes
For versions prior to SQL Server 2014, indexes can only be created implicitly on table variables as a side effect of adding a unique constraint or primary key. This means that only unique indexes are supported. Table variables do not support included columns, filtered indexes (until 2016) or partitioning, whereas temp tables do.

In SQL Server 2014, non-unique indexes can be declared inline in the table variable definition. And in SQL Server 2016, filtered indexes can be applied to table variables.
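
For example, the inline syntax introduced in SQL Server 2014 looks like this (the table and index names are just placeholders):

-- SQL Server 2014+: a non-unique index declared inline in the table variable definition
DECLARE @Orders TABLE
(
	OrderId		INT NOT NULL,
	CustomerId	INT NOT NULL,
	INDEX IX_CustomerId NONCLUSTERED (CustomerId)
);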

Hope the above gives you a bit more clarity on the benefits of each one in helping you decide whether to use a temp table or table variable in your scenarios.


Limit Memory Used By SQL

Ever noticed SQL Server using most of the memory (RAM) on the server and all your other applications starting to slow down? This could be because SQL Server is using the maximum amount of memory it can get hold of.

When SQL Server is installed, the default maximum amount of memory it is allowed to use is set to 2147483647 MB. You can check this by right-clicking on the server at the top of the Object Explorer, selecting Properties, then selecting Memory.
To decide what to set the max server memory to, you can use this simple rule: leave 4GB or 10% of total memory free, whichever is larger on your instance, to start with, and adjust this as needed. The free memory will be consumed by the OS and other applications on the server.

To calculate this, remember 1GB is 1024MB. If your server has 32GB of RAM, then 4GB free is the larger of the two (10% would only be 3.2GB), which leaves 28GB for SQL Server: 28 x 1024 = 28672MB. You would then enter this as the max server memory.
NB! This does not require restarting your SQL Server instance, but it will cause execution plans to recompile.
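
If you prefer to set it with T-SQL rather than through the GUI, a minimal sketch using the 32GB example above:

-- max server memory is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Cap SQL Server at 28GB (28 x 1024 = 28672 MB)
EXEC sp_configure 'max server memory (MB)', 28672;
RECONFIGURE;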

Detecting ETL Duration Anomalies with Simple Linear Regression

Thought I would share a piece on detecting ETL duration anomalies using Simple Linear Regression. This post gives a brief overview of what simple linear regression is and then demonstrates how to use it to detect ETL duration anomalies.

What happens with ETLs is that they normally run for a set amount of time, give or take a few minutes, every time. Then all of a sudden one starts running extremely long. If you have nothing in place to check this, other than looking at it manually each day, then this could be your solution.

Let me begin by explaining what Simple Linear Regression is.

The Simple Linear Regression model is used in statistics: a straight line is plotted through a set of points on a graph in such a way that the sum of the squared residuals of the model is as small as possible.

In order to plot the line on the graph you need to know where the line intercepts the y axis and the slope of the line. You will also need to set a standard deviation sensitivity to decide when something is in fact an anomaly. See the figure below.
[Figure: a simple linear regression line plotted through the duration data points, with the anomaly thresholds either side]
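
For reference, the quantities the function further down calculates are the standard least-squares estimates; in LaTeX notation:

b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
\qquad a = \bar{y} - b\,\bar{x}
\qquad \hat{y} = a + b\,x

A run is then flagged as an anomaly when its duration y falls outside \hat{y} \pm k\,\sigma, where \sigma is the standard deviation of the absolute residuals |y - \hat{y}| and k is the sensitivity (6 in the proc below).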

Step 1:
To start you will need to create a user-defined data type to accept the values of the X and Y data points.

CREATE TYPE dbo.XYDataPoints AS TABLE
(
	GroupID VARCHAR(200) NOT NULL,
	X FLOAT NULL,
	Y FLOAT NULL
)

Step 2:
You need to create a table-valued function which will calculate the simple linear regression of the ETL durations.

CREATE FUNCTION [dbo].[fn_GetSimpleLinearRegression] (@MyInput AS XYDataPoints READONLY)

RETURNS @MyOutput TABLE 
(
	-- Columns returned by the function
	 GroupID			VARCHAR(200)
	,Slope				FLOAT
	,Intercept			FLOAT
	,StDevError			DECIMAL(18,5)
)
AS 

BEGIN

	;WITH BASE AS -- Check for NULL source
	(	
		SELECT	 GroupID
				,X
				,Y
		FROM @MyInput
	)
	,XYBAR AS
	(
		SELECT	GroupID
				,Y
				,yBar = AVG(Y) OVER(PARTITION BY GroupID ORDER BY X ROWS BETWEEN UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING)
				,X
				,xBar = AVG(X) OVER(PARTITION BY GroupID ORDER BY X ROWS BETWEEN UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING)
				,xCount	= COUNT(X) OVER(PARTITION BY GroupID ORDER BY X ROWS BETWEEN UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING)
		FROM BASE
	)
	,SLOPE AS
	(
		SELECT	GroupID
				,Slope = SUM((X - xBar) * (Y - yBar)) / SUM(POWER(X - xBar,2))
				,MAX(yBar) AS yBar
				,MAX(xBar) AS xBar
				,MAX(xCount) AS xCount
		FROM XYBAR
		GROUP BY GroupID
	)
	,INTERCEPT AS
	(
		SELECT	 GroupID
				,Slope
				,xBar
				,yBar
				,xCount
				,Intercept = yBar - xBar * Slope
		FROM SLOPE
	)
	INSERT @MyOutput
	SELECT	 INTERCEPT.GroupID
			,Slope
			,Intercept
			,StDevError = STDEV	(	
									ABS(y - ((Slope * (x)) + Intercept))
								)
	FROM BASE
	JOIN INTERCEPT ON BASE.GroupID = INTERCEPT.GroupID
	GROUP BY Slope,Intercept, INTERCEPT.GroupID

	RETURN
END
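
A quick way to sanity-check the function in isolation, using a handful of made-up points (the values below are purely illustrative):

DECLARE @Points dbo.XYDataPoints;

INSERT INTO @Points (GroupID, X, Y)
VALUES	('MyETL', 1, 600),
		('MyETL', 2, 620),
		('MyETL', 3, 610),
		('MyETL', 4, 640);

SELECT	 GroupID
		,Slope
		,Intercept
		,StDevError
FROM dbo.fn_GetSimpleLinearRegression(@Points);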

Step 3:
You need to create a table to contain a history of the duration of ETLs. I chose to create a tests schema and to create all my objects for this detection under the tests schema.

CREATE TABLE [tests].[ETLLogDurationHistory](
	[ETLLogDurationHistoryID] [int] IDENTITY(1,1) NOT NULL,
	[ETLName] [varchar](250) NOT NULL,
	[DateRun] [date] NOT NULL,
	[TimeRun] [time](7) NOT NULL,
	[Duration] [time](0) NOT NULL,
	[DurationInt] [int] NOT NULL,
	[ETLOutcome] [varchar](250) NOT NULL,
	[IsError] [bit] NULL,
	[Sensitivity] [decimal](18, 3) NULL,
	[CurrCntSigma] [decimal](18, 5) NULL,
 CONSTRAINT [PK_ETLLogDurationHistory] PRIMARY KEY CLUSTERED 
(
	[ETLLogDurationHistoryID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Step 4:
You need to create a table to store the error details that were detected for the ETL duration anomalies.

CREATE TABLE [tests].[TestFailureLog](
	[TestFailureLogID] [int] IDENTITY(1,1) NOT NULL,
	[TestDate] [date] NOT NULL,
	[TestTime] [time](7) NOT NULL,
	[TestCollection] [varchar](100) NOT NULL,
	[TestName] [varchar](300) NOT NULL,
 CONSTRAINT [PK_TestFailureLog] PRIMARY KEY CLUSTERED 
(
	[TestFailureLogID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Step 5:
Create the proc to detect the ETL duration anomalies. This proc also inserts into the history table to keep track of how long ETLs ran, in order to work out whether there is an anomaly or not.

CREATE PROC [tests].[tst_ETLDurationAnomaly] 
AS
BEGIN
	SET NOCOUNT ON;
	/*----------------------------------------------	
	  * Uncomment for debug
	------------------------------------------------*/
	 --IF OBJECT_ID('tempdb..#PreOp') IS NOT NULL DROP TABLE #PreOp
	 --IF OBJECT_ID('tempdb..#tReport') IS NOT NULL DROP TABLE #tReport

	/*----------------------------------------------
	  * Tunable parameters
	------------------------------------------------*/
	DECLARE @nDaysHist			SMALLINT	= 90,	-- Set day count to limit history retention 
			@nSampleMAX			SMALLINT	= 30,	-- Set maximum history count to evaluate using simple linear regression
			@nSampleMIN			SMALLINT	= 3,	-- Set minimum history count to evaluate using simple linear regression
			@STDevSensitivity	SMALLINT	= 6		-- Set a value to increase or decrease the upper and lower threshold
			
	/*-----------------------------------------------------------------------------------------------------------------------------------
		* FLUSH TABLES                    
		* Clear history table of records older than @nDaysHist days or records added on current day to get latest stats
	-----------------------------------------------------------------------------------------------------------------------------------*/
	DELETE
	FROM tests.ETLLogDurationHistory
	WHERE DateRun < DATEADD(DAY, -@nDaysHist, GETDATE()) ; 
	/*-----------------------------------------------------------------------------------------------------------------------------------
		* Get duration for all ETLs
		* Duration will be recorded per ETL, no matter how many times they have been recorded for that day already
	-----------------------------------------------------------------------------------------------------------------------------------*/
	INSERT INTO [tests].[ETLLogDurationHistory] (ETLName,DateRun,TimeRun,Duration,DurationInt,ETLOutcome)
	SELECT	  ETLName	=	name
		 ,DateRun	=	CAST(CAST(CONVERT(DATE, CONVERT(VARCHAR(10), run_date)) AS VARCHAR(10)) AS DATE)
		 ,TimeRun	=	CAST(CAST(CONVERT(VARCHAR(8), DATEADD(SECOND, run_time, ''), 114) AS VARCHAR(10)) AS TIME(0))
		 ,Duration	=	CAST(CONVERT(CHAR(8), DATEADD(SECOND, run_duration, ''), 114)  AS TIME(0))
		 ,DurationInt	=	run_duration
		 ,ETLOutcome	=	CASE 
						WHEN SJH.run_status=0 THEN 'Failed'
						WHEN SJH.run_status=1 THEN 'Succeeded'
						WHEN SJH.run_status=2 THEN 'Retry'
						WHEN SJH.run_status=3 THEN 'Cancelled'
						ELSE 'Unknown'  
						END 
	FROM MSDB.dbo.sysjobhistory	SJH  
	JOIN MSDB.dbo.sysjobs		SJ ON SJH.job_id = SJ.job_id  
	WHERE step_id = 0  
	AND	name NOT IN (
				'collection_set_1_noncached_collect_and_upload'
				,'collection_set_2_upload'
				,'collection_set_3_upload'
				,'mdw_purge_data_[BI_MDW]'
				,'SSIS Server Maintenance Job'
				,'sysutility_get_cache_tables_data_into_aggregate_tables_hourly'
				,'sysutility_get_views_data_into_cache_tables'
				,'syspolicy_purge_history'
				,'sysutility_get_cache_tables_data_into_aggregate_tables_daily'
			    )
	AND	CAST(CONVERT(DATE, CONVERT(VARCHAR(10), run_date)) AS VARCHAR(10)) + ' ' 
		+ CAST(CONVERT(VARCHAR(8), DATEADD(SECOND, run_time, ''), 114) AS VARCHAR(10)) >= DATEADD(D,-1,GetDate())  
	ORDER BY name,run_date,run_time  

	/*-----------------------------------------------------------------------------------------------------------------------------------
	    * MAGIC
		* Using Simple Linear Regression, issue an estimation for what the Duration per ETL should be "today"
		* Based on the regression line, define thresholds that will determine alert

	-----------------------------------------------------------------------------------------------------------------------------------*/
	SELECT	 ETLName
			,X						= DATEDIFF(DAY, MIN(DateRun) OVER (PARTITION BY ETLName), DateRun) + 1
			,DateRun
			,Y						= DurationInt
			,SampleOrdinal			= ROW_NUMBER() OVER (PARTITION BY ETLName ORDER BY DateRun DESC) 
			,MeasurementsPerTable	= COUNT(*) OVER (PARTITION BY ETLName)
	INTO #PreOp
	FROM
	(
		SELECT	 ETLLogDurationHistoryID
				,ETLName
				,DateRun
				,TimeRun
				,Duration
				,DurationInt
				,ETLOutcome
				,RecentPerDay	= ROW_NUMBER() OVER (PARTITION BY ETLName, DateRun ORDER BY TimeRun DESC)
		FROM tests.ETLLogDurationHistory	EDH
		WHERE EDH.DateRun < CAST(GETDATE() AS DATE)
	)DT
	WHERE RecentPerDay = 1

	DECLARE @tXYData XYDataPoints
	INSERT INTO @tXYData(GroupID,X,Y)
	SELECT	 ETLName
			,X
			,Y
	FROM #PreOp
	WHERE SampleOrdinal <= @nSampleMAX AND MeasurementsPerTable >= @nSampleMIN

	/*-----------------------------------------------
	  * Table to use for updating the error log
	-----------------------------------------------*/
	DECLARE @ErrorOutput TABLE	( 
						FullTableName VARCHAR(200) NOT NULL
					)

	;WITH Pruned AS
	(
		SELECT	 ETLName
				,X						
				,DateRun
				,Y						
				,SampleOrdinal			
				,MeasurementsPerTable	
		FROM #PreOp pO
		WHERE	EXISTS	(
							SELECT *
							FROM @tXYData	tD
							WHERE tD.GroupID = pO.ETLName	
						)
	)
	,FNResponse AS
	(
		SELECT	 GroupID				
				,Slope				
				,Intercept			
				,StDevError			
		FROM dbo.fn_GetSimpleLinearRegression(@tXYData)
	)
	,TodayXCTE AS
	(
		SELECT	 ETLName
				,X						
				,DateRun
				,Y						
				,SampleOrdinal			
				,MeasurementsPerTable
				,GroupID				
				,Slope				
				,Intercept			
				,StDevError		
				,PredictedY	=	(	Slope * (
												--	TodayX
												DATEDIFF(DAY, MIN(DateRun) OVER (PARTITION BY ETLName), GETDATE()) + 1
											)
								) + Intercept
		FROM Pruned		P
		JOIN FNResponse	F	ON P.ETLName = F.GroupID
	)
	,ThresholdCTE AS
	(
		SELECT DISTINCT
				 ETLName
				,PredictedY
				,Y
				,UpperThreshold	= PredictedY + (@STDevSensitivity * STDevError)
				,LowerThreshold	= PredictedY - (@STDevSensitivity * STDevError)
				,StDevError
		FROM TodayXCTE
		WHERE SampleOrdinal = 1
	)
	,TblETLDurHistCTE AS
	(
		SELECT	 ETLName
				,DateRun
				,TimeRun
				,Duration
				,DurationInt
				,ETLOutcome
				,IsError
				,Sensitivity
				,CurrCntSigma
				,RecentPerDay	= ROW_NUMBER() OVER (PARTITION BY ETLName, DateRun ORDER BY TimeRun DESC)
		FROM tests.ETLLogDurationHistory	EDH
	)
	/*-----------------------------------------------
	  * Update only most recent record in history 
		table
	-----------------------------------------------*/
	UPDATE EDH
	SET  IsError		= 1
		,Sensitivity	= @STDevSensitivity
		,CurrCntSigma	= CEILING(ABS(EDH.DurationInt - ths.PredictedY) / NULLIF(ths.StDevError,0))
	/*-----------------------------------------------
	  * Output updated values to table variable 
		for later use 
	-----------------------------------------------*/
	OUTPUT	INSERTED.ETLName
	INTO	@ErrorOutput

	FROM TblETLDurHistCTE	EDH
	LEFT
	JOIN ThresholdCTE		ths	ON EDH.ETLName = ths.ETLName
	WHERE	EDH.RecentPerDay = 1
		AND EDH.DateRun = CAST(GETDATE() AS DATE)
		AND EDH.IsError IS NULL
		AND (
				(EDH.DurationInt NOT BETWEEN ths.LowerThreshold AND ths.UpperThreshold) 
				OR EDH.DurationInt = 0
			)

	/*-----------------------------------------------
	  * Log entry to failure log
	-----------------------------------------------*/
	INSERT INTO tests.TestFailureLog (TestDate, TestTime, TestCollection, TestName)		
	SELECT	 TestDate			= CAST(GETDATE() AS DATE)
			,TestTime			= CAST(GETDATE() AS TIME)
			,TestCollection		= 'ETLDuration'
			,FullTableName
	FROM @ErrorOutput
END
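
Once everything is created, the test can be run on a schedule (for example as a SQL Agent job step after your ETLs complete) and the failure log checked for anything it flagged. A minimal sketch:

-- Run the anomaly detection
EXEC tests.tst_ETLDurationAnomaly;

-- See what was flagged today
SELECT	 TestDate
		,TestTime
		,TestCollection
		,TestName
FROM tests.TestFailureLog
WHERE TestDate = CAST(GETDATE() AS DATE);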

You can download the script file here.

Hope you are able to find this as useful as I do. Feel free to ask any questions in the comment section below!