Monthly Archives: September 2009

Get index fragmentation statistics

I recently attended a Pittsburgh SQL Server user group meeting where Brent Ozar gave a presentation on the silent performance killer.  This motivated me to create a stored procedure that could leverage the DMVs in SQL 2005/2008 to gather index fragmentation statistics for all databases on a given server.

Goal

The goal is very simple.  Build a query that could be scheduled to grab statistics that are helpful towards determining if an index needs to be defragged or reorganized.  I would like to throw these results into a table so I could analyze them at a later date.  I would also like to monitor the fill factor and padding to determine if I need to make changes and to analyze if the changes are really helpful.

Download Scripts

The following script uses the following DMV’s sys.dm_db_index_physical_stats, sys.objects and sys.indexes and this script is provided as is.

To download the script click here

To download the create table script for the table used click here


Table Definition

The following is an explanation of the columns.  The following descriptions come from MSDN.

Column Name

Description

databaseName

Name of database, unique within an instance of SQL Server.

objectName Object name.
indexName Name of the index. name is unique only within the object.

NULL = Heap

partitionNumber 1-based partition number within the owning object; a table, view, or index.

1 = Nonpartitioned index or heap.

fragmentation Logical fragmentation for indexes, or extent fragmentation for heaps in the IN_ROW_DATA allocation unit.

The value is measured as a percentage and takes into account multiple files. For definitions of logical and extent fragmentation, see Remarks.

0 for LOB_DATA and ROW_OVERFLOW_DATA allocation units.

NULL for heaps when mode = SAMPLED.

fill_factor > 0 = FILLFACTOR percentage used when the index was created or rebuilt.

0 = Default value

is_padded 1 = PADINDEX is ON.

0 = PADINDEX is OFF.

type_desc Description of index type:

HEAP
CLUSTERED
NONCLUSTERED
XML
SPATIAL

page_count Total number of index or data pages.

For an index, the total number of index pages in the current level of the b-tree in the IN_ROW_DATA allocation unit.

For a heap, the total number of data pages in the IN_ROW_DATA allocation unit.

For LOB_DATA or ROW_OVERFLOW_DATA allocation units, total number of pages in the allocation unit.

date this is the current date GETDATE()

Script

   1:  ALTER PROCEDURE [dbo].[GetStatsForIndexes]
   2:      @PageCount INT = 100
   3:  
   4:  AS
   5:  BEGIN
   6:      -- SET NOCOUNT ON added to prevent extra result sets from
   7:      -- interfering with SELECT statements.
   8:      SET NOCOUNT ON;
   9:  
  10:      -- Declare varables
  11:      DECLARE @dbID INT, @dbName VARCHAR(128), @SQL NVARCHAR(MAX)
  12:  
  13:      -- Create a temp table to store all active databases
  14:      CREATE TABLE #databaseList
  15:      (
  16:            databaseID        INT
  17:          , databaseName      VARCHAR(128)
  18:      );
  19:  
  20:      -- we only want non-system databases who are currenlty online
  21:      INSERT INTO #databaseList (databaseID, databaseName)
  22:      SELECT d.database_id, d.name FROM sys.databases d where d.[state] = 0 and d.database_id > 4
  23:  
  24:  
  25:      -- Loop through all databases 
  26:         WHILE (SELECT COUNT(*) FROM #databaseList) > 0  BEGIN
  27:  
  28:             -- get a database id
  29:          SELECT TOP 1 @dbID = databaseID, @dbName = databaseName
  30:          FROM #databaseList;
  31:  
  32:              SET @SQL = 'INSERT INTO DBA_Tools.dbo.IDX_FRAG (databaseName, ObjectName, indexName, partitionNumber, fragmentation, fill_factor, is_padded, type_desc, page_count, [date])
  33:                  SELECT
  34:                    db.name AS databaseName
  35:                  , obj.name AS ObjectName
  36:                  , idx.name AS indexName
  37:                  , ps.partition_number AS partitionNumber
  38:                  , ps.avg_fragmentation_in_percent AS fragmentation
  39:                  ,idx.fill_factor
  40:                  ,idx.is_padded
  41:                  ,idx.type_desc
  42:                  , ps.page_count
  43:                  , GETDATE() as [date]
  44:              FROM sys.databases db
  45:                INNER JOIN sys.dm_db_index_physical_stats ('+CAST(@dbID AS VARCHAR(10))+', NULL, NULL , NULL, N''Limited'') ps
  46:                    ON db.database_id = ps.database_id
  47:                INNER JOIN '+ @dbName+'.sys.objects obj ON obj.object_id = ps.object_id
  48:                INNER JOIN '+ @dbName+'.sys.indexes idx ON idx.index_id = ps.index_id AND idx.object_id = ps.object_id
  49:              WHERE ps.index_id > 0
  50:                 AND ps.page_count > 100
  51:              ORDER BY page_count desc
  52:              OPTION (MaxDop 1);'
  53:  
  54:          EXECUTE sp_executesql @SQL
  55:          -- remove the database from the databases table
  56:          DELETE FROM #databaseList WHERE databaseID = @dbID
  57:  
  58:          -- get the next database in the databases table
  59:          SELECT TOP 1 @dbID = databaseID, @dbName = databaseName
  60:          FROM #databaseList;
  61:  
  62:      END
  63:      -- temp table is no longer needed, so we will kill it.
  64:      DROP TABLE #databaseList;
  65:  END

Building a Data Warehouse with SQL Server

Today, I am going to start a series of blogs on one of my favorite topics Data Warehousing with SQL Server.  I will start by defining a data warehouse.  I will then give some business cases that explain the benefits of a data warehouse.  Finally, I will start writing about the built in tools SQL Server provides to implement a data warehouse with SQL Server.

What is a Data Warehouse?

In its simplest form a Data Warehouse is a way to store data information and facts in an format that is informational.  Hopefully, you were able to pull this information from the photos above.   Personally, I like to think of a Data Warehouse as a tool used by decision makers to improve decision‐making.

Business cases for a Data Warehouse

The following are several reasons business cases that explain how “insert company name here” can benefit from a data warehouse.

  • A Data Warehouse is fast to query. After the initial setup is complete, queries can run up to 1000% faster in an OLAP database than in an OLTP database.
  • A Data Warehouses improves ROI by allowing end users to make more efficient use of enterprise information so many companies have all the information they need.
  • A Data Warehouse is a good solution for application’s that are great for data entry but lacks in depth reporting and drilldown capabilities.
  • A Data Warehouse could be used to bring several applications and/or data sources together.
  • If you are a service company a data warehouse could be used to analyze work completed to estimate future flat fee engagements. (I can go on and on on these examples…)
  • If you already own SQL Server you can implement a data warehouse solution with the built in tools.  This means no additional cost for software is needed.

What tools are needed to Build a Data Warehouse?

Both SQL Server 2005 and SQL Server 2008 include three tools that are very helpful towards implementing a Data Warehouse. These three tools allow you to create an killer data warehouse.

  • Analysis Services (SSAS)
  • Integration Services (SSIS)
  • Reporting Services (SSRS).

We will go over these tools in great detail in future posts.  For now, let’s just think of SSAS as the tool that provides storage for the data used in cubes for your data warehouse. SSIS is the tool we will use to extract, transform and load (ETL) data into our data warehouse.  SSRS is a tool used to view the data in our data warehouse.

Next post : Introduction to Data Warehouse Dimensions

SQLSaturday Update & Future Presentations

I wanted to give a quick update and thank everyone who has responded to my initial posting about bringing SQL Saturday to Wheeling, WV.    I have spoken to a couple technology groups and several individuals and everyone is very supportive of this event.

The following are upcoming work items:

  1. Establish a budget
  2. Build a support team that will help with organizing the event.
  3. Find a location and pinpoint a date (tentatively First Quarter 2010.)

I look forward to blogging the status of these work items.

On another note, It looks like I will be presenting the following topic “Building a Data Warehouse with SQL Server” twice in November.  I will present first at the Greater Wheeling Chapter of AITP meeting on November 11th.  I will also give the same presentation at the Pittsburgh SQL Server User Group Meeting on November 16th.

You will soon see a series of blog post that go over building a data warehouse.  You can find the first one here.

Developers, use Profiler to profile yourself

Problem

You’re starting to work on an existing application that doesn’t have sufficient documentation.  You need to trace the current flow to understand and review all SQL calls during the lifecycle of the application (RPC:Complete and SQL:BatchCompleted).  You know that SQL Profiler can do this but you’re unsure how to make use of the filter options.

Solution

In order to solve the following problem you would want to use SQL Profiler.  Start a new trace and in the event selection deselect all events other than RPC:Completed and SQL:BatchCompleted.

Profile1

Next check the checkbox to show all columns so we can add a few columns that are left off by default.  Now scroll to the right and you will notice many columns that could be helpful.  For example, I see DatabaseName, HostName, and NTLoginName.  Place a check in the checkbox for these columns.

Now we will move along to filters.  Click on the Column Filters button.  Notice that the columns you just added are also candidates for filtering.  In this example I am going to select HostName and use my computer name “WHBV53YC1” (You could also use NTUserName and use your login name)

Profile2

Its that simple, you just configured the trace to only capture Stored Procedures and TSQL commands issued by yourself.

Converting a vertical table to horizontal table

Last week I received a request to convert a vertical table from a vendor application into a horizontal table.  There was one catch, the vertical table included text columns that needed to be pivoted horizontally.  The following was my plan for tackling this request.

  1. You must figure out how many columns are required for the worst case scenario in you horizontal table.  You can do this by multiplying the columns being pivoted by the rows in the vertical table. In this example you know the worst case is eight (see figure two.)  In this example we will assume that you will not know how many columns are needed.
  2. Now we will use dynamic sql to create our new table that will support the columns needed in the horizontal table.
  3. Next we will create a cursor that will loop through the vertical table to create and execute insert statements to populate the horizontal table.

Example of vertical table (Input)

ManagerName ManagerEmail Review Employee
Jack Wilson jwilson@comp.com 2009 Review John Sterrett
Jack Wilson jwilson@comp.com 2009 Review Bo Smith
Hank Reed hreed@comp.com 2008 Review John Sterrett
Jack Wilson jwilson@comp.com 2008 Review Chris Cupp

Figure 1 – The vertical table

The following is the horizontal table (output)

ManagerName ManagerEmail Review1 Employee1 Review2 Employee2 Review3 Employee3
Jack Wilson jwilson@comp.com 2009 Review John Sterrett 2009 Review Bo Smith 2008 Review John Sterrett
Hank Reed hreed@comp.com 2008 Review Chris Cupp NULL NULL NULL NULL

Figure 2 – The horizontal table

For this example we used two scripts.  The first script will create a vertical table and insert the sample data.  The second script populates the horizontal table and also prints out all scripts created dynamically to the message window.

If you have any questions please feel free to leave a comment . I will try to point you in the right direction.