I am a consultant in Austin who can help make your data go fast, be secure and highly available. When I am engaged in a performance tuning project priority #1 isn’t to make sure your data go faster. Priority #1 is to make sure we get the same result sets while making your data go faster.
Free SQL Data Compare with T-SQL?
There are several tools out there that can be used to compare data. Today, I want to share how you can quickly do this on your own with T-SQL!
Let’s simplify the process. Our goal is to check two temp tables and validate if any of the data is different. This would include inserts, updates, and deletes. For this example, I will just do a dump of Sales.SalesOrderDetail in AdventureWorks into two temp tables as shown below.
SELECT *
INTO #Tmp1
FROM Sales.SalesOrderDetail
SELECT *
INTO #Tmp2
FROM Sales.SalesOrderDetail
Now we shouldn’t see any differences since we used the same table to create both temp tables. We are going to use two different SQL operators to compare these two temp tables while applying some data changes. We will focus on the UNION ALL and EXCEPT operators.
The Power of EXCEPT
Except is an underrated and underused SQL operation. In a nutshell, it will give you the results of the first query that are different from the next query. So, if the data of any column in #tmp1 is different from #tmp2 or if the row doesn’t exist in #tmp2 but is in #tmp1 it will get returned.
SELECT * FROM #Tmp1
EXCEPT
SELECT * FROM #Tmp2
Let’s go ahead and modify a column in #Tmp1 so you can see how this works. We are going to set OrderQty to five when SalesOrderId is 45313 and SalesOrderDetailId is 6210. This will change just one column in one row. We will then select these columns from both temp tables to see the change.
This is how most people would start using T-SQL to identify changes in data.
UPDATE #Tmp1 SET OrderQty = 5
WHERE SalesOrderID = 45313
AND SalesOrderDetailID = 6210
SELECT SalesOrderId, SalesOrderDetailID,
OrderQty FROM #Tmp1
WHERE SalesOrderID = 45313
AND SalesOrderDetailID = 6210
SELECT SalesOrderId, SalesOrderDetailID,
OrderQty FROM #Tmp2
WHERE SalesOrderID = 45313
AND SalesOrderDetailID = 6210
Finding Data Changes The Easy Way
Selecting the two tables is easy if we know what change occurred and there aren’t many changes. This can get complicated quickly. Therefore, if we just want to quickly know if we have differences lets take a look at my goto method using EXCEPT. To make this example easier to read instead of using “SELECT *” I will just focus on columns that are changing. In a real example, I would want to know if any columns changed.
SELECT SalesOrderId, SalesOrderDetailID, OrderQty
FROM #Tmp1
EXCEPT
SELECT SalesOrderId, SalesOrderDetailID, OrderQty
FROM #Tmp2
If an insert or a column change occurs in #tmp1 we will see it in our EXCEPT SQL statement. This isn’t true if the change is only in #tmp2.
For example, an insert in #tmp2 or delete in #tmp1 would not be shown. To see this we would have to switch the temp tables in the EXCEPT clause as shown below.
INSERT INTO #tmp2 (SalesOrderId, ProductID,
SpecialOfferID, OrderQty, UnitPrice,
UnitPriceDiscount,LineTotal,
rowguid, ModifiedDate)
VALUES (45313, 1, 3, 1,1.25,0,
1.25*1, NEWID(), GETDATE())
DELETE FROM #Tmp1
WHERE SalesOrderID = 45313
AND SalesOrderDetailID = 6211
SELECT SalesOrderId, SalesOrderDetailID, OrderQty
FROM #Tmp1
EXCEPT
SELECT SalesOrderId, SalesOrderDetailID, OrderQty
FROM #Tmp2
/* We will now see our insert and delete */
SELECT SalesOrderId, SalesOrderDetailID, OrderQty
FROM #Tmp2
EXCEPT
SELECT SalesOrderId, SalesOrderDetailID, OrderQty
FROM #Tmp1
Our first except shows us data in #tmp1 that is not in #tmp2 because the OrderQty column changed in #tmp1. The second EXCEPT shows us data in #tmp2 that isn’t in #tmp1 because of our insert into #tmp2 and also our delete from #tmp1 would be found in #tmp2 but not #tmp1.
UNION ALL for the Win!
To wrap this up now we can include a UNION ALL operation between the two EXCEPT operations. This would get us any data changes to the columns selected from the temp tables.
SELECT SalesOrderId, SalesOrderDetailID, OrderQty
FROM #Tmp1
EXCEPT
SELECT SalesOrderId, SalesOrderDetailID, OrderQty
FROM #Tmp2
UNION ALL
SELECT SalesOrderId, SalesOrderDetailID, OrderQty
FROM #Tmp2
EXCEPT
SELECT SalesOrderId, SalesOrderDetailID, OrderQty
FROM #Tmp1
Typically, I need to verify is the data before and after is the same. This is a quick and easy way to get that answer. Now I know you might want to take this to the next level. You might be thinking how do I just get the unique key for the table and columns that changed. I will leave that as an exercise for you.
UPDATE: 3/16/2020 The first script below was updated from setting QDS to “Read_Write” instead of “Read_Only”. Martin thank you for pointing this out in your comment below.
This month’s T-SQL Tuesday is hosted by Tracy Boggiano. Tracy invites us all to write about adopting Query Store. Today, I wanted to share my favorite but a very unique way I use the Query Store for Workload Replays.
You can read more about the invite in detail by clicking on the T-SQL Tuesday logo in this post.
Today, I wanted to talk about my least favorite part of replaying workloads. It’s having an extended event or server-side trace running during a workload replay only so we can compare the results at a query-level when the replay is finished. Now, this might seem like a trivial thing but when you have workloads over 10k batch requests/sec this can consume terabytes of data quickly. The worst part is waiting to read all the data, slice and dice the data for analysis.
Starting with SQL Server 2016 there is a better and faster way to go! You can replace your extended event or server-side trace with Query Store captured data. Today, I will show you how to use the Query Store for the same purpose.
Different Settings
Keep in mind our goal here is very different from the typical use case for using the Query Store. We want to capture metrics for all the queries executed during a workload replay. Nothing more and nothing less.
If we have the runtime results for multiple replays we can then easily compare the workload performance between the workload replays.
Most of our changes from the regular Query Store best practices are shown below:
Max Size (MB) – Need to make sure there is enough space to capture your whole workload. This size will vary by how much workload is being replayed.
Query Store Capture Mode set to All. Normally, not ideal, but remember we want to capture metrics for our whole workload being replayed.
Size Based Cleanup Mode set Off – Yup, we don’t want to lose our workload data that is capture until we persist in our ideal form. More on this later.
The Capture Process
Now, this is where you would use Database Experimentation Assistant (DEA), Distributed Replay or some other process to replay your consistent workload in an isolated non-production environment. This subject we will cover in another future post. For now, we will just have two replays called “Baseline” and “Change”. This simulates a baseline replay with no schema changes and then another change replay with a change introduced in the schema.
To capture our workload we just enable the Query store with our settings mentioned above and also clear out the query store right before our workload replay starts to help ensure we are just capturing our workload.
USE [master]
GO
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON
GO
ALTER DATABASE [YourDatabase]
SET QUERY_STORE (OPERATION_MODE = READ_WRITE,
MAX_STORAGE_SIZE_MB = 10000,
QUERY_CAPTURE_MODE = AUTO,
SIZE_BASED_CLEANUP_MODE = OFF)
GO
ALTER DATABASE [YourDatabase] SET QUERY_STORE CLEAR
GO
Stop Capturing Query Store Data
Once your replay is finished we will want to disable the query store from writing data into the query store. We want the least amount of non-workload data inside of the Query Store when we are using it for the sole purpose of comparing workloads.
USE [master]
GO
ALTER DATABASE [YourDatabase]
SET QUERY_STORE (OPERATION_MODE = READ_ONLY)
GO
Prepare Query Store Data for Long-Term Analysis
Now for smaller workloads, one might be happy with utilizing DBCC CLONEDATABASE to have a schema-copy of their workload with Query Store data persisted. This is perfectly fine. With bigger workloads being captured I have noticed there are ways to improve the performance of query store when doing analysis of the query store data. For example, clustered columnstore indexes can be very helpful for performance and compacity. Therefore, I like to have a schema for each replay and import the data. The following is a quick example of setting up a schema for a “baseline” replay and a “change” replay.
CREATE DATABASE [DBA]
GO
use [DBA]
GO
CREATE SCHEMA Baseline;
GO
CREATE SCHEMA Compare;
GO
Next, we will import our captured data from our baseline replay that’s in our read-only query store database. I also like to have a baked-in aggregate of metrics for reads, writes, duration and CPU at the query level.
use [YourDatabase]
GO
/* Load Data */
SELECT * INTO DBA.Baseline.query_store_runtime_stats
FROM sys.query_store_runtime_stats;
SELECT * INTO DBA.Baseline.query_store_runtime_stats_interval
from sys.query_store_runtime_stats_interval;
select * INTO DBA.Baseline.query_store_plan
from sys.query_store_plan;
select * INTO DBA.Baseline.query_store_query
from sys.query_store_query;
select * INTO DBA.Baseline.query_store_query_text
from sys.query_store_query_text;
/* Addition for SQL 2017 */
select * INTO DBA.Baseline.query_store_wait_stats
from sys.query_store_wait_stats
use [DBA]
GO
SELECT SUM(Count_executions) AS TotalExecutions,
SUM(Count_executions*avg_duration) AS TotalDuration,
SUM(Count_executions*avg_logical_io_reads) AS TotalReads,
SUM(Count_executions*avg_logical_io_writes) AS TotalWrites,
SUM(count_executions*avg_cpu_time) AS TotalCPU,
query_hash
INTO Baseline.QueryResults
FROM Baseline.query_store_runtime_stats rs
JOIN Baseline.query_store_plan p ON rs.plan_id = p.plan_id
JOIN Baseline.query_store_query q ON p.query_id = q.query_id
GROUP BY q.query_hash
Next, we would reset the database to our starting position and add our query store settings as mentioned above in this blog post and replay or workload again. This time, we would dump our data into the “change” schema
use [YourDatabase]
GO
/* Load Data */
SELECT * INTO DBA.Compare.query_store_runtime_stats
FROM sys.query_store_runtime_stats;
SELECT * INTO DBA.Compare.query_store_runtime_stats_interval
from sys.query_store_runtime_stats_interval;
select * INTO DBA.Compare.query_store_plan
from sys.query_store_plan;
select * INTO DBA.Compare.query_store_query
from sys.query_store_query;
select * INTO DBA.Compare.query_store_query_text
from sys.query_store_query_text;
select * INTO DBA.Compare.query_store_wait_stats
from sys.query_store_wait_stats
use [DBA]
GO
SELECT SUM(Count_executions) AS TotalExecutions,
SUM(Count_executions*avg_duration) AS TotalDuration,
SUM(Count_executions*avg_logical_io_reads) AS TotalReads,
SUM(Count_executions*avg_logical_io_writes) AS TotalWrites,
SUM(count_executions*avg_cpu_time) AS TotalCPU,
query_hash
INTO Compare.QueryResults
FROM Compare.query_store_runtime_stats rs
JOIN Compare.query_store_plan p ON rs.plan_id = p.plan_id
JOIN Compare.query_store_query q ON p.query_id = q.query_id
GROUP BY q.query_hash
Comparing Workload Results
Now that we have our two workloads imported we can now compare to see how the workload changed per query. I will break this down into two quick steps. First, get deltas per query. Second, get totals for how many times a query might be different in the query store. More on this a little later in the post.
/* Query Store Results */
use [DBA]
GO
SELECT DISTINCT
c.TotalExecutions - b.TotalExecutions AS ExecutionDelta,
c.TotalExecutions AS CompareExecutions,
b.TotalExecutions AS BaselineExecutions,
c.TotalDuration - b.TotalDuration AS DurationDelta,
c.TotalCPU - b.TotalCPU AS CPUDelta,
c.TotalReads - b.TotalReads AS ReadDelta,
c.TotalWrites - b.TotalWrites AS WriteDelta,
c.TotalReads AS CompareReads,
b.TotalReads AS BaselineReads,
c.TotalCPU AS CompareCPU,
b.TotalCPU AS BaselineCPU,
c.TotalDuration AS CompareDuration,
b.TotalDuration AS BaselineDuration,
c.query_hash
--q.query_sql_text
INTO #CTE
FROM Baseline.QueryResults b
JOIN Compare.QueryResults c ON b.query_hash = c.query_hash
select COUNT(query_sql_text) AS QueryCount,
MAX(query_sql_text) query_sql_text,
MIN(query_id) MinQueryID,
qsq.query_hash
INTO #Compare
from Compare.query_store_query qsq
JOIN Compare.query_store_query_text q
ON qsq.query_text_id = q.query_text_id
where qsq.is_internal_query = 0
GROUP BY query_hash
select COUNT(query_sql_text) AS QueryCount,
MAX(query_sql_text) query_sql_text,
MIN(query_id) MinQueryID,
qsq.query_hash
INTO #Baseline
from Baseline.query_store_query qsq
JOIN Baseline.query_store_query_text q
ON qsq.query_text_id = q.query_text_id
where qsq.is_internal_query = 0
GROUP BY query_hash
select cte.*
, a.QueryCount AS Compare_QueryCount
, b.QueryCount AS Baseline_QueryCount
, a.MinQueryID AS Compare_MinQueryID
, b.MinQueryID AS Baseline_MinQueryID
, a.query_sql_text
FROM #CTE cte
JOIN #Compare a on cte.query_hash = a.query_hash
JOIN #Baseline b on cte.query_hash = b.query_hash
WHERE 1=1
AND ExecutionDelta = 0
ORDER BY ReadDelta ASC
Query Store for Workload Replays
Lessons Learned Along the Way!
Initially, working with the query store I thought query_id was going to be my best friend. I quickly learned that my old friend query_hash is more helpful for multiple reasons. One, I can easily compare queries between different replays. That’s right now all workload replays get you the same query_id even when the workload is the exact same being replayed. Two, I can compare them with different databases as well. Finally, query_hash is very helpful with ad-hoc workloads as I can aggregate all the different query_ids that have the same query hash.
I recently spoke at a conference and was asked what is the easiest way to import databases to Azure SQL Database. Therefore, I wanted to share how I do this with DBATools.io. You can use the same code to just export if you need a local copy of an Azure SQL database as well.
Import-Module dbatools -Force
<# Variables #>
$BackupPath = "C:\Demo\AzureSQL\Bacpac" #folder location for backups
$SourceInstance = "sql2019\sql2016"
$DBName = "AdventureWorksLT2012"
$AzureDestInstance = "procuresqlsc.database.windows.net"
$DBNameDest = $DBName
<# backpac options for import and export #>
$option = New-DbaDacOption -Type Bacpac -Action Export
$option.CommandTimeout = 0
$option2 = New-DbaDacOption -Type Bacpac -Action Publish
$option2.CommandTimeout = 0
<#
The following assums Azure SQL Database exists and is empty
Azure will create database by default if it doesn't exist
#>
$bacpac = Export-DbaDacPackage -Type Bacpac -DacOption $option -Path `
$BackupPath -SqlInstance $SourceInstance -Database $DBName
Publish-DbaDacPackage -Type Bacpac -SqlInstance `
$AzureDestInstance -Database $DBNameDest -Path $bacpac.path `
-DacOption $option2 -SqlCredential username
What Is my Performance Tier?
Great question, as of 3/3/2020 if the database in Azure SQL Database does not exist then it will be created. When its created the following database uses the default performance tier. This is General Purpose (Gen5) with 2 vCores.
How to create cheaper databases
Great question, you can import databases to Azure SQL Database cheaper using PowerShell. It is as simple as using the Azure PowerShell Module. The following example below I use my existing Azure SQL Database server and I end up creating a new database with the “S0” tier.
You got all the nuts and bolts to generate a script that can migrate all your databases on an instance. Then you can import databases to Azure SQL Database in one loop.
Hello everyone, this is John your Austin SQL Server Consultant here and today I am going to answer a question that comes up often so I wanted to blog about it for everyone. The question of the day is where can I download the previous SQL Server Updates?
The History towards Updates
Back in the day when we were young but not a kid anymore there were service packs and cumulative updates. We could download these separately and all of the updates were easy to find. Now today, if you click on a KB article to download an update you get pointed to the latest update as shown below.
How far is My SQL Server on Updates?
This is also another great question. My favorite place to find all the history of updates toward SQL Server is the SQL Server Build List Blog. You can cross-reference this towards your version by running the following query below.
I fully get exactly why Microsoft is trying to point everyone to the latest update. Normally, it makes perfect sense but let’s take a look at today Jan 9th, 2020. I am planning to update SQL Server 2017 to CU17. Its been out for two months. Today CU18 is released and if I wasn’t careful I would have downloaded a different update than expected.
Getting a previous SQL Server Update
So, on to the solution. It’s actually an easy one but also one that is easy to overlook as well. Let’s go back to the new standard update page for SQL Server updates.
That is right, the Microsoft Update Catalog is your best friend to find all your updates for Microsoft products including SQL Server. You can search for the product you want. For example, in this case, I am looking for SQL Server 2017 and can see all the previous updates for SQL Server.
If you enjoyed this tip and found it helped make sure to join our newsletter so you can enjoy more free tips, tricks, and video demos.
Hello everyone! This is your SQL Server Consultant in Austin, TX and due to some posts on twitter about SQL PASS recordings costing $999 I wanted to share some of my favorite places to find free SQL Server training videos. I hope this helps make your data fast, secure and highly available in 2020 and beyond!
Where Is the Good Stuff Give Me Some More
Those who know me know I love music. I especially love the underground non-mainstream content. Therefore, my first recommendation is UserGroup.TV. As of December 27th, there are 127 videos tagged as SQL Saturday alone. Shawn goes around to almost every Tech conference he can find and brings his rig and records sessions for the community.
Are you in love with the new pop singles? Wish you could hear them before they hit the radio? If you like your tech like your music than Microsoft Ignite is for you. Every year Microsoft puts on a conference called, Ignite. This conference is usually where Microsoft will break its cutting edge tech. My favorite thing about the conference is that the content is available online for free. Midway through the page, you can search through the massive collection of free recorded sessions.
Next up, is the consistent greatest hits. Almost every session is a banger! This reminds me of my favorite Microsoft Data Platform conference. This is SQLBits and yes, their video content is also available for FREE.
In SQL Server 2016 we saw Query Store. Query Store was a game changer to help database administrators identify troublesome queries. Query Store helps DBAs make those queries run faster. Microsoft’s marketing team even jumped on to help coin the phrase, “SQL Server It Just Runs Faster.” With SQL Server 2017, this started to get even better with automatic tuning. Don’t worry database administrators. Automatic Tuning will just enhance your career and not replace it.
SQL Server 2017 Automatic Tuning looks for queries where execution plans change and performance regresses. This feature depends on Query Store being enabled. Note, even if you don’t turn on Automatic Tuning you still get the benefits of having access to the data. That is right. Automatic Tuning would tell you what it would do if it was enabled. Think of this as free performance tuning training. Go look at the DMVs and try to understand why the optimizer would want to lock in an execution plan. We will actually go through a real-world example:
Automatic Tuning with SQL Server 2017
First, let’s take a quick look at the output of the data. You can find the query and results we will focus on below.
SELECT reason, score,
script = JSON_VALUE(details, '$.implementationDetails.script'),
planForceDetails.*,
estimated_gain = (regressedPlanExecutionCount+recommendedPlanExecutionCount)
*(regressedPlanCpuTimeAverage-recommendedPlanCpuTimeAverage)/1000000,
error_prone = IIF(regressedPlanErrorCount>recommendedPlanErrorCount, 'YES','NO')
--INTO DBA.Compare.Tunning_Recommendations
FROM sys.dm_db_tuning_recommendations
CROSS APPLY OPENJSON (Details, '$.planForceDetails')
WITH ( [query_id] int '$.queryId',
[current plan_id] int '$.regressedPlanId',
[recommended plan_id] int '$.recommendedPlanId',
regressedPlanErrorCount int,
recommendedPlanErrorCount int,
regressedPlanExecutionCount int,
regressedPlanCpuTimeAverage float,
recommendedPlanExecutionCount int,
recommendedPlanCpuTimeAverage float
) as planForceDetails;
I will break the results down into two photos to make them fit well in this blog post.
Free Tuning Recommendations with SQL Server 2017 (1/2)
Free Tuning Recommendations with SQL Server 2017 (2/2)
Now we know in the query store query_id 2271 has CPU time changing from 7,235ms to 26ms. That’s a big difference. Let’s take that query and look at its findings by using the tracked query report inside SSMS.
Find my Changed Query. Did the plans change?
Here we can see the major difference between the two execution plans. One is averaging over 14 seconds in duration while the other is under a second.
Query Store showing the performance difference between the two plans
Now we can select both plans on the screen above and look at the execution plans side by side inside of SSMS. When doing so, we see the common example of the optimizer determining if it is better to scan an index vs a seek with a key lookup.
Using SSMS to compare auto tuning recommended query.
To complete the example I want to point out that automatic tuning would lock in the index seek plan (Plan 2392). In SQL Server 2016 you can do this as well manually inside Query Store. With SQL Server 2017 it can be done automatically for you with Automatic Tuning. If you have ever woken up to slow performance due to an execution plan changing and performance going through the drain this might be a life saver.
Last weekend I had a blast speaking at the SQL Saturday in Chicago. It was awesome to share my knowledge and also catch up with some good friends. My talk was on Automating the Pain Away with Query Store and Automated Tuning. I hope this session helped people leverage Query Store and Automated Tuning to resolve their parameter sniffing problems.
John Sterrett teaching Performance Tuning at SQL Saturday Chicago
This week at MVP Summit I got to talk with a friend who loves profiler. We were talking about capturing workloads and doing analysis on them. T-SQL or other 3rd party tools like ClearTrace, ReadTrace would be used to aggregate the data to get insight into the top offenders. I mentioned that workload analysis could be done with extended events without writing a single line of T-SQL. This was a lightbulb moment for him. Quickly, I learned that he is not alone and that there are a lot of people in the community who didn’t know this either.
I am including a quick video below to show you why Extended Events is a great solution for finding top offenders in a workload.
In some DBA circles, backups are just as popular as politicians! However, recoverability is the most important task for database administrators. While SQL Server 2017 added so many great features like graph, SQL on Linux, and more. Today, I want to focus on two small underdog features that might be game changers on how you do backups.
SQL Server Backups are often as popular as politicians.
Smart Differential Backups
Databases are getting bigger, not smaller. More storage capacity is needed for these backups. Backup compression might hurt your storage capacity. Today, I am seeing more policies include full and differential backups along with transactional log backups. Differential backups are used to offset daily full backups. Typically people will use time increments as the basis for when backups should occur. It’s very common to see automated jobs that do weekly full and daily differentials to reduce storage capacity needed for backups.
How often does your data change? Is the rate of change very consistent or does it change depending on the week? Let’s assume this week it’s Tuesday and over 80% of your data pages have changed. You are not benefiting from taking daily differentials for the rest of the week. The opposite goes for data that doesn’t change that often. Maybe you can save a lot of space by doing less frequent full backups.
Leveraging smart differential backups could greatly reduce your storage footprint and potentially reduce the time it takes to recover.
In SQL Server 2017 you can see exactly how many pages changed since your last full backup. This could be leveraged to determine if you should take a full or differential backup. Backup solutions and backup vendors will be better for this.
select CAST(ROUND((modified_extent_page_count*100.0)/allocated_extent_page_count,2)
as decimal(6,2)) AS 'DiffChangePct'
,modified_extent_page_count
,allocated_extent_page_count
from sys.dm_db_file_space_usage
GO
Smart Transactional Log Backups
The time your users are offline while you are recovering to the point of failure is critical. It could be the difference between keeping and losing customers. Point-in-time recovery is mandatory for a critical database. Transactional log backups have to be restored in order.
Recovery Point Objectives (RPO) drive how often you take transactional log backups. If you have a policy that says you can only lose ten minutes of data, you need transactional log backups every ten minutes. Is this really true if there were no changes? What if your RPO is driven by the amount of data loss and not the time of the loss? Either way, you can now control when transactional log backups occur based on the amount of data that has changed since the last transactional log backup.
SELECT name AS 'DatabaseName',dls.log_since_last_log_backup_mb,
dls.log_truncation_holdup_reason, dls.active_vlf_count,
dls.active_log_size_mb
FROM sys.databases s
CROSS APPLY sys.dm_db_log_stats(s.database_id) dls
This site uses functional cookies and external scripts to improve your experience. Which cookies and scripts are used and how they impact your visit is specified on the left. You may change your settings at any time. Your choices will not impact your visit.
NOTE: These settings will only apply to the browser and device you are currently using.