Typically, I am a big advocate of performance monitor but one place I commonly see performance monitor being misused is with benchmarking disk counters for SQL Server. Typically, you will see people applying best practices like having dedicated spindles for transactional log files and dedicated spindles for data files. With that said, multiple database files and/or transactional log files are collocated on the same logical or physical drive(s). Therefore, when your looking a disk latency like reads per second or writes per second it can almost be impossible to determine which data file(s) is causing the disk latency. You just know which physical or logical drive has latency issues.
Meet THE SYS.DM_IO_VIRTUAL_FILE_STATS DMV
Starting with SQL Server 2005 DBA’s were granted access to the sys.dm_io_virtual_file_stats dynamic management view. This DMV gives you access into how many physical I/O operations occurred, how much latency has occurred, how much data was written and more. The secret key is that this is for each independent database file and that this data is collected since the last time the instance started so we need to keep that in mind. Ideally, we would want to capture this data, wait for a period of time, capture this data again and then compare the results. This objective is completed in the code shown below. For this example we will wait five minutes between captures.
DECLARE @WaitTimeSec int
SET @WaitTimeSec = 300 -- seconds between samples.
/* If temp tables exist drop them. */
IF OBJECT_ID('tempdb..#IOStallSnapshot') IS NOT NULL
BEGIN
DROP TABLE #IOStallSnapshot
END
IF OBJECT_ID('tempdb..#IOStallResult') IS NOT NULL
BEGIN
DROP TABLE #IOStallResult
END
/* Create temp tables for capture baseline */
CREATE TABLE #IOStallSnapshot(
CaptureDate datetime,
read_per_ms float,
write_per_ms float,
num_of_bytes_written bigint,
num_of_reads bigint,
num_of_writes bigint,
database_id int,
file_id int
)
CREATE TABLE #IOStallResult(
CaptureDate datetime,
read_per_ms float,
write_per_ms float,
num_of_bytes_written bigint,
num_of_reads bigint,
num_of_writes bigint,
database_id int,
file_id int
)
/* Get baseline snapshot of stalls */
INSERT INTO #IOStallSnapshot (CaptureDate,
read_per_ms,
write_per_ms,
num_of_bytes_written,
num_of_reads,
num_of_writes,
database_id,
[file_id])
SELECT getdate(),
a.io_stall_read_ms,
a.io_stall_write_ms,
a.num_of_bytes_written,
a.num_of_reads,
a.num_of_writes,
a.database_id,
a.file_id
FROM sys.dm_io_virtual_file_stats (NULL, NULL) a
JOIN sys.master_files b ON a.file_id = b.file_id
AND a.database_id = b.database_id
/* Wait a few minutes and get final snapshot */
WAITFOR DELAY @WaitTimeSec
INSERT INTO #IOStallResult (CaptureDate,
read_per_ms,
write_per_ms,
num_of_bytes_written,
num_of_reads,
num_of_writes,
database_id,
[file_id])
SELECT getdate(),
a.io_stall_read_ms,
a.io_stall_write_ms,
a.num_of_bytes_written,
a.num_of_reads,
a.num_of_writes,
a.database_id,
a.[file_id]
FROM sys.dm_io_virtual_file_stats (NULL, NULL) a
JOIN sys.master_files b ON a.[file_id] = b.[file_id]
AND a.database_id = b.database_id
/* Get differences between captures */
SELECT
inline.CaptureDate
,CASE WHEN inline.num_of_reads =0 THEN 0
ELSE inline.io_stall_read_ms / inline.num_of_reads END AS read_per_ms
,CASE WHEN inline.num_of_writes = 0 THEN 0
ELSE inline.io_stall_write_ms / inline.num_of_writes END AS write_per_ms
,inline.io_stall_read_ms
,inline.io_stall_write_ms
,inline.num_of_reads
,inline.num_of_writes
,inline.num_of_bytes_written
,(inline.num_of_reads + inline.num_of_writes) / @WaitTimeSec AS iops
,inline.database_id
,inline.[file_id]
FROM (
SELECT r.CaptureDate
,r.read_per_ms - s.read_per_ms AS io_stall_read_ms
,r.num_of_reads - s.num_of_reads AS num_of_reads
,r.write_per_ms - s.write_per_ms AS io_stall_write_ms
,r.num_of_writes - s.num_of_writes AS num_of_writes
,r.num_of_bytes_written - s.num_of_bytes_written AS num_of_bytes_written
,r.database_id AS database_id
,r.[file_id] AS [file_id]
FROM #IOStallSnapshot s
JOIN #IOStallResult r
ON (s.database_id = r.database_id and s.[file_id] = r.[file_id])
) inline
The next few questions you might have after capturing these metrics includes how do I automate capturing these disk metrics similar to perfmon? Can I setup a parameter to be used as the normal wait period and also supply an interval for how long I would like to capture data to establish my baseline? Or better yet, can I capture data when a workload is not performing as expected and compare it to the baseline established when the workload performance was good?
The answer to theses questions is YES! Below is the code for my stored procedure to capture disk latency, IOPs and bytes written.
Download SQL Server Disk Latency Stored Procedure.
/*
Author: John Sterrett (https://johnsterrett.com)
NOTICE: This code is provided as-is run it at your own risk. John Sterrett assumes no responsibility
for you running this script.
GOAL: Get latency and IOPS for each data file, keep meta data in lookup table, results in another table.
PARAM: @WaitTime - time in seconds to wait between baselines
@Length - Amount of time to baseline, if null then don't stop
VERSION:
1.0 - 01/03/2012 - Original release
Includes two lookup tables for datafiles and runs
1.1 - 02/08/2012 - Includes computed column to get IOPs per datafile.
1.2 - 11/1/2013 - Changes IOPs so its not computed and has right data.
Missing Features: If you would like something added please follow up at https://johnsterrett.com/contact
-- Code to pull and update file path as needed
*/
/* Create tables */
CREATE SCHEMA DiskLatency
Go
CREATE TABLE DiskLatency.DatabaseFiles (
[ServerName] varchar(500),
[DatabaseName] varchar(500),
[LogicalFileName] varchar(500),
[Database_ID] int,
[File_ID] int
)
CREATE CLUSTERED INDEX idx_DiskLatency_DBID_FILE_ID ON DiskLatency.DatabaseFiles (Database_ID, File_ID)
CREATE TABLE DiskLatency.CaptureData (
ID bigint identity PRIMARY KEY,
StartTime datetime,
EndTime datetime,
ServerName varchar(500),
PullPeriod int
)
CREATE TABLE DiskLatency.CaptureResults (
CaptureDate datetime,
read_per_ms float,
write_per_ms float,
io_stall_read int,
io_stall_write int,
num_of_reads int,
num_of_writes int,
num_of_bytes_written bigint,
iops int,
database_id int,
file_id int,
CaptureDataID bigint
)
CREATE CLUSTERED INDEX [idx_CaptureResults_CaptureDate] ON [DiskLatency].[CaptureResults]
( [CaptureDate] DESC)
CREATE NONCLUSTERED INDEX idx_CaptureResults_DBID_FileID ON DiskLatency.CaptureResults (database_id, file_id)
CREATE NONCLUSTERED INDEX idx_CaptureResults_CaptureDataID ON DiskLatency.CaptureResults (CaptureDataId)
ALTER TABLE DiskLatency.CaptureResults ADD CONSTRAINT FK_CaptureResults_CaptureData FOREIGN KEY
( CaptureDataID) REFERENCES DiskLatency.CaptureData
( ID )
GO
CREATE PROCEDURE DiskLatency.usp_CollectDiskLatency
-- Add the parameters for the stored procedure here
@WaitTimeSec INT = 60,
@StopTime DATETIME = NULL
AS
BEGIN
DECLARE @CaptureDataID int
/* Check that stopdate is greater than current time. If not, throw error! */
/* If temp tables exist drop them. */
IF OBJECT_ID('tempdb..#IOStallSnapshot') IS NOT NULL
BEGIN
DROP TABLE #IOStallSnapshot
END
IF OBJECT_ID('tempdb..#IOStallResult') IS NOT NULL
BEGIN
DROP TABLE #IOStallResult
END
/* Create temp tables for capture baseline */
CREATE TABLE #IOStallSnapshot(
CaptureDate datetime,
read_per_ms float,
write_per_ms float,
num_of_bytes_written bigint,
num_of_reads bigint,
num_of_writes bigint,
database_id int,
file_id int
)
CREATE TABLE #IOStallResult(
CaptureDate datetime,
read_per_ms float,
write_per_ms float,
num_of_bytes_written bigint,
num_of_reads bigint,
num_of_writes bigint,
database_id int,
file_id int
)
DECLARE @ServerName varchar(300)
SELECT @ServerName = convert(nvarchar(128), serverproperty('servername'))
/* Insert master record for capture data */
INSERT INTO DiskLatency.CaptureData (StartTime, EndTime, ServerName,PullPeriod)
VALUES (GETDATE(), NULL, @ServerName, @WaitTimeSec)
SELECT @CaptureDataID = SCOPE_IDENTITY()
/* Do lookup to get property data for all database files to catch any new ones if they exist */
INSERT INTO DiskLatency.DatabaseFiles ([ServerName],[DatabaseName],[LogicalFileName],[Database_ID],[File_ID])
SELECT @ServerName, DB_NAME(database_id), name, database_id, [FILE_ID]
FROM sys.master_files mf
WHERE NOT EXISTS
(
SELECT 1
FROM DiskLatency.DatabaseFiles df
WHERE df.Database_ID = mf.database_id AND df.[File_ID] = mf.[File_ID]
)
/* Loop through until time expires */
IF @StopTime IS NULL
SET @StopTime = DATEADD(hh, 1, getdate())
WHILE GETDATE() < @StopTime
BEGIN
/* Get baseline snapshot of stalls */
INSERT INTO #IOStallSnapshot (CaptureDate,
read_per_ms,
write_per_ms,
num_of_bytes_written,
num_of_reads,
num_of_writes,
database_id,
[file_id])
SELECT getdate(),
a.io_stall_read_ms,
a.io_stall_write_ms,
a.num_of_bytes_written,
a.num_of_reads,
a.num_of_writes,
a.database_id,
a.file_id
FROM sys.dm_io_virtual_file_stats (NULL, NULL) a
JOIN sys.master_files b ON a.file_id = b.file_id
AND a.database_id = b.database_id
/* Wait a few minutes and get final snapshot */
WAITFOR DELAY @WaitTimeSec
INSERT INTO #IOStallResult (CaptureDate,
read_per_ms,
write_per_ms,
num_of_bytes_written,
num_of_reads,
num_of_writes,
database_id,
[file_id])
SELECT getdate(),
a.io_stall_read_ms,
a.io_stall_write_ms,
a.num_of_bytes_written,
a.num_of_reads,
a.num_of_writes,
a.database_id,
a.file_id
FROM sys.dm_io_virtual_file_stats (NULL, NULL) a
JOIN sys.master_files b ON a.file_id = b.file_id
AND a.database_id = b.database_id
INSERT INTO DiskLatency.CaptureResults (CaptureDataID,
CaptureDate,
read_per_ms,
write_per_ms,
io_stall_read,
io_stall_write,
num_of_reads,
num_of_writes,
num_of_bytes_written,
iops,
database_id,
[file_id])
SELECT @CaptureDataID
,inline.CaptureDate
,CASE WHEN inline.num_of_reads =0 THEN 0 ELSE inline.io_stall_read_ms / inline.num_of_reads END AS read_per_ms
,CASE WHEN inline.num_of_writes = 0 THEN 0 ELSE inline.io_stall_write_ms / inline.num_of_writes END AS write_per_ms
,inline.io_stall_read_ms
,inline.io_stall_write_ms
,inline.num_of_reads
,inline.num_of_writes
,inline.num_of_bytes_written
,(inline.num_of_reads + inline.num_of_writes) / @WaitTimeSec
,inline.database_id
,inline.[file_id]
FROM (
SELECT r.CaptureDate
,r.read_per_ms - s.read_per_ms AS io_stall_read_ms
,r.num_of_reads - s.num_of_reads AS num_of_reads
,r.write_per_ms - s.write_per_ms AS io_stall_write_ms
,r.num_of_writes - s.num_of_writes AS num_of_writes
,r.num_of_bytes_written - s.num_of_bytes_written AS num_of_bytes_written
,r.database_id AS database_id
,r.[file_id] AS [file_id]
FROM #IOStallSnapshot s
INNER JOIN #IOStallResult r ON (s.database_id = r.database_id and s.file_id = r.file_id)
) inline
TRUNCATE TABLE #IOStallSnapshot
TRUNCATE TABLE #IOStallResult
END -- END of WHILE
/* Update Capture Data meta-data to include end time */
UPDATE DiskLatency.CaptureData
SET EndTime = GETDATE()
WHERE ID = @CaptureDataID
END
GO
Now that we have our stored procedure ready to go. Here is a simple block of code that you can run or embed in a SQL Agent Job to collect your counters to measure disk latency for each individual database file. For this example, were going to collect for an hour and wait a minute between collections.
DECLARE @EndTime datetime, @WaitSeconds int
SELECT @EndTime = DATEADD(hh, 1, getdate()),
@WaitSeconds = 60
EXEC DiskLatency.usp_CollectDiskLatency
@WaitTimeSec = @WaitSeconds,
@StopTime = @EndTime
I hope you enjoyed this blog post on capturing the metrics needed to single out which database files you should focus on when you notice disk latency. Please check out my next blog post as I focus on some queries I might use to complete a review of the disk latency data that was collected from this blog post.