4 things I wish I learned sooner as a DBA

This topic continues from an earlier blog written by Mike Walsh. I was tagged by my good friend Tim Mitchell. Here is my take on four things I wish I had learned earlier in my DBA career. I hope they are helpful to you.

You Are Not Alone

You are not alone in your database career. Even if you are the only SQL Server professional in your company, the SQL Server community is always there to help. I was in those shoes once; I kept to myself and didn't even know there was a community out there to help me grow and become a better DBA. As SQL Server professionals we have several free events, like local user groups, SQL Saturdays, and virtual chapters, that give us the opportunity to connect, share, and learn.

There are seven days in a week and someday isn’t one of them.

One of my mentors and friends, Thomas LaRock, gave a presentation a few years ago titled SQL Someday. I still watch Tom's presentation occasionally to remind myself of this very simple quote:

There are seven days in a week and someday isn’t one of them.

The concept is very simple. There are somedays that will never happen. There are somedays that will happen. There are also somedays that have already happened and you didn't even know it. For someone who is goal-oriented like me, it can be easy to focus only on the next goal. I wish I could go back in time and truly enjoy some of my great achievements instead of just pushing forward to the next goal. At the same time, there are several things that are out of our control. I wish I had spent less time focusing on things that are out of my control.

Work Life Balance

When your career is also your hobby, you can easily forget to keep a good work-life balance. I used to have two weekly calendar reminders telling me to stop working. One was at 5pm and said "Stop Working." The other was at 6pm and said "REALLY, STOP WORKING NOW!" Occasionally I still run into problems with my work-life balance, but it is better than it used to be.

Soft Skills > Technical Skills

Throughout my career I have learned that soft skills are important. If you cannot work with people to solve business problems with technology, you are not going to go far in your career.

 

When will my replica be synchronized?

Recently, I was pulled into a gig to help troubleshoot a Windows failover cluster issue that was hindering an availability group. To give a little background, it is a two-node Windows cluster with two replicas, and the availability group is configured for synchronous commit. Once the cluster issue was resolved, we had a critical database that was behind on synchronization because one of the two replicas (a cluster node) had been down. This led to the following question: when will my replica be synchronized?

This question was easy to answer with database mirroring: we could pull up the Database Mirroring Monitor. We don't have a database replica monitor GUI tool built into SSMS. Lucky for us, it's not that hard with availability groups; we just have to use Performance Monitor. You will see below that there is a perfmon counter object called "SQLServer:Database Replica" that comes in handy for us.

When will replica be synchronized?


Above, you can see that this secondary replica, which is back online, is synchronizing: it is receiving log bytes, and the Recovery Queue shown in Performance Monitor is shrinking as the synchronization process catches up. According to MSDN, Recovery Queue is the "amount of log records in the log files of the secondary replica that has not yet been redone."
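
If you prefer a query window over the perfmon GUI, the same counters are exposed through the sys.dm_os_performance_counters DMV. The following is a minimal sketch of that idea (my own example, not from the original troubleshooting session); the counter names listed are simply the ones I would start with.

-- Read the "SQLServer:Database Replica" counters from T-SQL instead of the perfmon GUI.
SELECT RTRIM(instance_name) AS database_name,
       RTRIM(counter_name)  AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%Database Replica%'   -- covers default and named instances
  AND counter_name IN (N'Log Bytes Received/sec', N'Recovery Queue', N'Redone Bytes/sec')
ORDER BY database_name, counter_name;

When the Recovery Queue value trends toward zero while log bytes are still being received, the secondary is catching up.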

Unplanned Availability Group Failover

Whenever a disaster occurs, it is a stressful scenario regardless of how small or big the disaster is. The stress gets multiplied when it is your first time working with a newer technology, or the first time you are going through a disaster without a proper run book. Today, we're going to help you establish a run book for performing an unplanned failover with availability groups.

A Step By Step Guide for Unplanned SQL Server Availability Group Failover can be found here.
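
For reference, the heart of an unplanned (forced) failover is a single command run on the secondary replica you are promoting. The snippet below is only a minimal sketch with a placeholder availability group name; the step-by-step guide linked above covers the prerequisites and the cleanup afterwards.

-- Run on the secondary replica being promoted. [MyAG] is a placeholder name.
-- A forced failover allows data loss, so only use it when the primary is truly unavailable.
ALTER AVAILABILITY GROUP [MyAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;

-- Once the old primary comes back, data movement is suspended and must be resumed per database:
-- ALTER DATABASE [MyDatabase] SET HADR RESUME;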

May Adventures!

This month is going to be an exciting one for me. I enjoy training and sharing my knowledge with others, and this month I will have several opportunities to do exactly that.

This weekend, May 10th, I will be giving two sessions at SQL Saturday Houston. I will kick off the morning at 8:30 with my Proactive Monitoring with PBM and CMS presentation and close the day with Table Partitioning: the Secret Weapon for Your Big Data Problems. With Joey D’Antoni, Ryan Adams, Tamera Clark, and Kara in the house, it also feels like it might be a PASS Regional Mentor reunion.

I am also making my first visit to Kansas City to present at the Kansas City Developer Conference. I will be giving two sessions: SQL Server Performance Tuning with Free Tools on the 16th and Table Partitioning: Secret Weapon for Big Data Problems on the 17th.

Finally, I am heading back to my favorite city, Wheeling, WV, to speak at the Greater Wheeling Chapter of AITP on the 21st, where I will be giving a First Look at Availability Groups. This trip will also include a presentation at the Pittsburgh SQL Server User Group on the 20th about 12 Steps to Workload Tuning. The month of training will conclude with the first all-day training by Linchpin People in a series focused on the basics of High Availability and Disaster Recovery.

It's going to be a great month. I hope I get to catch up with you at one of these events.

Learn High Availability and Disaster Recovery with SQL Server in a Day!

If you are near Pittsburgh, PA on May 22, 2014, catch the following all-day training session, running from 8:30 AM to 4:00 PM (EDT).

Do you need to build a High Availability (HA) and/or Disaster Recovery (DR) strategy for SQL Server Databases? If so, you will no doubt have questions like:

•    What does a good High Availability or Disaster Recovery strategy look like?
•    What options are available to help implement a proper HA/DR strategy?
•    What do I do if my database server goes down?
•    How do I mitigate downtime for planned outages?
•    How much time do I have to recover, and which native SQL Server features help me reach my goal?
•    How does this work with my company’s existing HA/DR strategy?

Linchpin People & SIOS are offering this seminar to answer those questions and more!
In this all-day training, we will teach you the fundamentals and best practices of building and implementing a full HA/DR strategy for your company. You will learn how to build a recovery plan by mastering the basics of backups and restores. You will also learn how to implement and monitor log shipping, database mirroring, replication, Windows clustering, SQL Server Failover Cluster Instances, and the new Availability Groups feature. After this training session, you will have the knowledge to design, implement, and manage your very own high availability and disaster recovery (HA/DR) plan.

Meet The Presenters!

John Sterrett is a Group Principal and Sr. Consultant at Linchpin People. Previously, he was a Sr. Database Admin Advisor for Dell, directly responsible for several mission-critical databases behind dell.com. John has presented at many community events, is a PASS Regional Mentor, and is one of the founders of the WVPASS user group and the PASS HA/DR Virtual Chapter.

Tim Radney is the Lead System DBA for a top 40 US held bank. He is also the chapter leader for the Columbus, GA SQL Users Group and a PASS Regional Mentor for the Greater Southeast US. Tim is a Microsoft systems admin turned DBA. Prior to becoming a full-time DBA, he spent 10 years supporting Citrix, Novell, Windows, IIS, and MSSQL.

Event Details:

Thursday May 22nd, 2014 – 8:30am-4:30pm (EDT)
Cost:  $79.00 before April 30th, $99.00 after April 30th
Location:
Microsoft Office
30 Isabella St., Second Floor
Alcoa Business Services Center
Pittsburgh, PA 15212
Phone: (412) 323-6700
Food: Lunch and drinks will be provided at the event location. Please contact the organizer if you require a vegetarian option.

Where is my Availability Group?

In SQL Server 2012 we got a great new high availability feature called availability groups. With readable secondaries in the mix, it can be harder to figure out the answers to two questions: When did the availability group fail over? Where did the availability group go when the failover occurred? The goal of this blog post is to help you answer these questions.

AlwaysOn Extended Event

One of the things I really like about availability groups is that there is a built-in extended event session named "AlwaysOn_health" that runs and captures troubleshooting information. I took a look at the session and noticed that several error numbers are included in its filter. This is shown below; I scripted out the default extended event session for a quick review.

CREATE EVENT SESSION [AlwaysOn_health] ON SERVER 
ADD EVENT sqlserver.alwayson_ddl_executed,
ADD EVENT sqlserver.availability_group_lease_expired,
ADD EVENT sqlserver.availability_replica_automatic_failover_validation,
ADD EVENT sqlserver.availability_replica_manager_state_change,
ADD EVENT sqlserver.availability_replica_state_change,
ADD EVENT sqlserver.error_reported(
    WHERE ([error_number]=(9691) OR [error_number]=(35204) OR [error_number]=(9693) OR [error_number]=(26024) OR [error_number]=(28047) 
	OR [error_number]=(26023) OR [error_number]=(9692) OR [error_number]=(28034) OR [error_number]=(28036) OR [error_number]=(28048) 
	OR [error_number]=(28080) OR [error_number]=(28091) OR [error_number]=(26022) OR [error_number]=(9642) OR [error_number]=(35201) 
	OR [error_number]=(35202) OR [error_number]=(35206) OR [error_number]=(35207) OR [error_number]=(26069) OR [error_number]=(26070) 
	OR [error_number]>(41047) AND [error_number]<(41056) OR [error_number]=(41142) OR [error_number]=(41144) OR [error_number]=(1480) 
	OR [error_number]=(823) OR [error_number]=(824) OR [error_number]=(829) OR [error_number]=(35264) OR [error_number]=(35265))),
ADD EVENT sqlserver.lock_redo_blocked 
ADD TARGET package0.event_file(SET filename=N'AlwaysOn_health.xel',max_file_size=(5),max_rollover_files=(4))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
GO

This got me interested in learning why these specific errors were included in the extended event session created specifically for managing availability groups. Knowing that the descriptions for errors are kept in the sys.messages catalog view, I did a little digging.

System Messages

Taking the error numbers from the AlwaysOn_health extended event, I was able to build the following query to get the description of each error included in the session.

SELECT * 
FROM sys.messages m 
WHERE language_id = 1033 -- English
--AND m.message_id = 1480
AND ([message_id]=(9691) OR [message_id]=(35204) OR [message_id]=(9693) OR [message_id]=(26024) OR [message_id]=(28047) 
	OR [message_id]=(26023) OR [message_id]=(9692) OR [message_id]=(28034) OR [message_id]=(28036) OR [message_id]=(28048) 
	OR [message_id]=(28080) OR [message_id]=(28091) OR [message_id]=(26022) OR [message_id]=(9642) OR [message_id]=(35201) 
	OR [message_id]=(35202) OR [message_id]=(35206) OR [message_id]=(35207) OR [message_id]=(26069) OR [message_id]=(26070) 
	OR [message_id]>(41047) AND [message_id]<(41056) OR [message_id]=(41142) OR [message_id]=(41144) OR [message_id]=(1480) 
	OR [message_id]=(823) OR [message_id]=(824) OR [message_id]=(829) OR [message_id]=(35264) OR [message_id]=(35265))
ORDER BY Message_id

Now we will focus on one particular error message: error 1480. Looking at the description below, you will see that this error occurs every time a database in an availability group or a database mirroring session changes roles.

The %S_MSG database “%.*ls” is changing roles from “%ls” to “%ls” because the mirroring session or availability group failed over due to %S_MSG. This is an informational message only. No user action is required.

When did my AlwaysOn Availability Group fail over?

By now it should not be a big surprise how we can figure out when an availability group failed over. To answer this question we are going to filter the "AlwaysOn_health" extended event for error_number 1480.

The "AlwaysOn_health" extended event session writes to a file target which, by default, lives in the default log folder for SQL Server. Also keep in mind that, by default, the target rolls over four 5 MB files for a total of 20 MB. If events occur constantly, older data will be purged.

For the server used in this blog post, my path is "C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\"; if this is not your path, you will need to modify line 2 in the script below.

;WITH cte_HADR AS (SELECT object_name, CONVERT(XML, event_data) AS data
FROM sys.fn_xe_file_target_read_file('C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\AlwaysOn*.xel', null, null, null)
WHERE object_name = 'error_reported'
)

SELECT data.value('(/event/@timestamp)[1]','datetime') AS [timestamp],
	   data.value('(/event/data[@name=''error_number''])[1]','int') AS [error_number],
	   data.value('(/event/data[@name=''message''])[1]','varchar(max)') AS [message]
FROM cte_HADR
WHERE data.value('(/event/data[@name=''error_number''])[1]','int') = 1480

Below you will see an example of the result set, which shows my last failover.


 

You could also utilize the Extended Events GUI to watch the data. We will skip that today, as I recommend using T-SQL so you can find failovers in multiple availability groups on different servers. We will go into more detail about this process a little later in the blog post.

How Do We Become Proactive?

If you want to be proactive and have an action occur whenever a database inside an availability group changes roles, you can configure a SQL Agent alert. A SQL Agent alert can perform actions like sending an email to your DBA team or running another SQL Agent job to carry out your required action.

The following shows you how to configure this alert via the SSMS user interface.

[Screenshot: SQL Agent alert configured for error number 1480 in SSMS]
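
If you prefer scripting the alert instead of clicking through SSMS, the sketch below shows the same idea with the msdb stored procedures. The alert and operator names are placeholders I made up for the example; substitute your own.

USE msdb;
GO
-- Fire an alert whenever error 1480 (AG/mirroring role change) is raised.
EXEC dbo.sp_add_alert
     @name = N'AG Role Change (Error 1480)',   -- placeholder alert name
     @message_id = 1480,
     @severity = 0,
     @enabled = 1,
     @include_event_description_in = 1;
GO
-- Notify an existing operator by email (1 = email).
EXEC dbo.sp_add_notification
     @alert_name = N'AG Role Change (Error 1480)',
     @operator_name = N'DBA Team',              -- placeholder operator name
     @notification_method = 1;
GO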

 

How Do We Report Failovers Across the Enterprise?

Central Management Server (CMS) is your best friend for building reports that show availability group failovers across the enterprise. You can build a CMS group for your SQL Server 2012 instances and run the query above against the group to detect availability group failovers.

NOTE: This assumes you have a standard install process that keeps the default log path the same across your SQL Server 2012 instances. I strongly encourage you to have an automated SQL Server install process that uses the same path for all your installs, but we will keep that blog post for another day.

High Availability And Disaster Recovery Virtual Chapter Needs You!

Today I have some good news to share. David Klee (blog | twitter) and I are announcing a new SQL PASS Virtual Chapter dedicated to High Availability and Disaster Recovery topics. I am excited to work with David to build a virtual chapter from scratch.

When Does the HADR Virtual Chapter Meet?

The virtual chapter meets on the second Tuesday of each month at 12pm Central time. Our first meeting will be held next Tuesday, February 11, at noon Central, with Ryan Adams (blog | twitter) speaking on SQL Server 2012 AlwaysOn Availability Groups Live Deployment. To RSVP for this free event, register here at GoToWebinar!

Please follow @SQLPASS_HADR on Twitter for announcements of upcoming sessions, and check out the virtual chapter's home page at hadrvc.sqlpass.org for details on future sessions. Make sure to visit your MyPASS dashboard and add the HADR virtual chapter to your MyChapters page so you get the announcements as they happen!

Are you interested in joining us in building the HADR Virtual Chapter?

I have a feeling that David and I are not the only ones in the SQL community interested in building a virtual chapter from scratch. We are currently looking for people to help host meetings, manage the website, drive marketing, and find speakers. If you are interested in joining us, contact me.

Throwback Thursday #4: Plan Cache

Welcome to the first Throwback Thursday blog post of 2014. Throwback Thursday is a blog series where I dig deep into my Evernote collection, find some great content on a single subject, and share it with you. In the fourth installment of the Throwback Thursday series we are going to cover the plan cache.

One of the features that is helpful for performance tuning in SQL Server is the plan cache. Many DBAs understand that when you execute a stored procedure, by default the execution plan gets cached so it can be reused. This saves CPU cycles because stored procedures don't have to be recompiled every time they are executed. What some DBAs don't know is that you can also leverage the plan cache to get performance metrics. During this Throwback Thursday we are going to focus on some great community articles, white papers, and blogs that can greatly improve your performance tuning process through using the plan cache.
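
To make that concrete, here is a minimal sketch of the kind of plan cache query I mean (my simplified example, not the exact script from the "Finding Top Offenders" link below): it pulls the most expensive cached statements by total CPU time straight from the plan cache DMVs.

-- Top cached statements by total CPU time since they were cached.
SELECT TOP (10)
       qs.execution_count,
       qs.total_worker_time / 1000  AS total_cpu_ms,
       qs.total_elapsed_time / 1000 AS total_duration_ms,
       qs.total_logical_reads,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                        WHEN -1 THEN DATALENGTH(st.text)
                        ELSE qs.statement_end_offset END
                   - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;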

Plan Caching in SQL Server 2008 – This is a great white paper by Microsoft that gives you an introduction to what the plan cache is and how it can make your performance tuning life easier.

How to Drop One Plan from Cache – A lot of DBAs think that DBCC FREEPROCCACHE is used only to drop all the execution plans from your instance. Grant Fritchey does a great job showing how you can use this DBCC command to remove just a single plan.

Finding Top Offenders from Plan Cache – This is how I go about finding top offenders among frequently used execution plans. It's not the only tool in my tool belt for performance tuning, but it can be a good starting point.

Finding Key Lookups in Plan Cache – At the end of the day, an execution plan is cached as XML. Kendal Van Dyke does a great job showing how we can probe this XML structure to find key lookup operators inside the plans in the plan cache (see the sketch after this list for a simplified version of the idea).

SQL Server Plan Cache: The Junk Drawer for Your Queries – A great article by Tom LaRock that shows you how plans are stored, how to find plans with specific operators, and how to aggregate similar queries to get helpful usage statistics.

Plan cache, adhoc workloads and clearing the single-use plan cache bloat – Kimberly Tripp did a great job explaining how you can monitor and prevent ad hoc workloads from bloating your plan cache.

Can You Dig It? Plan Cache Series – I saved my personal favorite for last. Jason Strate did a great job providing a series of scripts that can be used for reviewing the plan cache.
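
As promised above, here is a simplified sketch of the key lookup idea (my own example, not Kendal Van Dyke's script): because cached plans are stored as XML, you can use an XQuery predicate to find plans containing lookup operators. I am assuming the common showplan convention where a key lookup appears as an IndexScan element with Lookup="1".

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       qs.execution_count,
       st.text  AS query_text,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle)    AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//IndexScan[@Lookup="1"]') = 1  -- plans containing a key lookup
ORDER BY qs.execution_count DESC;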

T-SQL Tuesday #50: Automation for LazyDBAs!

T-SQL Tuesday is a monthly blog party hosted by a different blogger each month.


T-SQL Tuesday 50 – Automation

This blog party was started by Adam Machanic (blog|twitter). You can take part by posting your own entry that fits the topic of the month. This month, SQLChow blessed us with a topic I am very passionate about. This month's T-SQL Tuesday topic is automation.

I have a confession to share: I am a lazy DBA. Those who know me won't be shocked reading this. For those who don't know me, trust me, I mean this in a good way. My laziness over the years has actually motivated me to be a better DBA and data professional. I learned early in my career that in order to be productive I must automate. We can no longer do manual daily checklists; we lose several hours that could have been spent on tasks that show our value, not just hold the status quo. Automation ends the cycle of repeated tasks and lets me spend that time doing things that provide value, save the company money, and make us happy.

Early Stages of Automation

Once in my career I was blessed with an opportunity to be a database administrator overseeing thousands of production databases. I quickly noticed there was no automated process for a daily checklist. How did we know if a database backup failed due to low disk space? Hopefully, we got an email from the SQL Agent. Hopefully, someone remembered to set up a SQL Agent notification. I knew this wasn't the answer. One of my first tasks was automating the whole process so we knew which databases passed and which failed an automated daily checklist. I was able to leverage Policy Based Management (PBM) and Central Management Server (CMS) with PowerShell to get this done. Little did I know it at the time, but this basic move changed my DBA career. I got to speak at the PASS Summit in 2011 on how I evaluated my automated daily checklist against 1,000+ servers during my morning coffee break. As I started to focus on performance, I noticed a better way to pull this information without PBM, so I built an automated framework using PowerShell and CMS to find my failed backups quicker. Still today I meet DBAs who don't know you can automate your daily checklist using only native tools built into SQL Server.
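
The framework itself was PowerShell plus CMS, but the core of any automated backup check is a simple query. Here is a minimal sketch of that idea (my simplified example, not the actual framework): flag databases with no successful full backup in the last 24 hours. Run it through a CMS multi-server query window and you have the start of an automated checklist.

-- Databases missing a full backup ('D') in the last 24 hours.
SELECT d.name AS database_name,
       MAX(b.backup_finish_date) AS last_full_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
      AND b.type = 'D'
WHERE d.name <> N'tempdb'
  AND d.state_desc = N'ONLINE'
GROUP BY d.name
HAVING MAX(b.backup_finish_date) IS NULL
    OR MAX(b.backup_finish_date) < DATEADD(HOUR, -24, GETDATE());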

Current Stages of Automation

Today, I am much more focused on performance and proactive monitoring. Learning from my past, I knew I wanted to automate as much as possible; this didn't change even though my core skills were changing. In the past year I have built some nice automated solutions that help me with performance tuning. When I take charge of a new instance, I automate the process of monitoring disk latency and proactively monitoring wait statistics. Once I have a good automated baseline, I can drill deeper as needed; for example, I can find out which queries are causing my waits. It has gotten to the point where most of the time I can find the root cause of a SQL Server performance problem in ten minutes. The automated benchmark process does the heavy lifting for me, so I can respond, stay as proactive as possible, and provide value instead of running processes that should be automated.
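
As a taste of what the wait statistics piece looks like, here is a minimal snapshot query (my simplified example; a real baseline stores these values on a schedule and reports on the deltas between snapshots, and the exclusion list below is only a small sample of benign wait types).

SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
  AND wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP',
                        N'XE_TIMER_EVENT', N'REQUEST_FOR_DEADLOCK_SEARCH')
ORDER BY wait_time_ms DESC;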

Future State of Automation

I see more things being automated. More and more parts of the current "Production DBA" role as we know it today will be automated. This is going to free us up to do amazing things. One day, an end-to-end performance tuning process will be automated. I look forward to seeing things that we thought were not possible become possible and automated. For example, automating server procurement and deployment once fell into this realm. Now it's already here; it's known as "the cloud." I will be honest: I was shocked to see how easy and quick it is to deploy a Windows Azure server.

What are your thoughts about automation? Where do you see it going in the future?

Throwback Thursday #3: SQL Server & Disks

I hope everyone is having a good time gearing up for the holidays. Throwback Thursday is a bi-weekly blog series where I dig deep into my Evernote collection, find some great content on a single subject, and share it with you. In the third installment of the Throwback Thursday series we are going to cover some helpful SQL Server disk articles.

IOPS Calculator – This is a great tool to figure out how many IOPS you should get from your storage configuration. It’s also a helpful tool for estimating how many disks are needed to support your workload if you are purchasing new storage.

The Fundamentals of Storage Systems – We couldn't talk about storage and SQL Server without mentioning my friend Wes Brown. He has a plethora of information on this subject, including this great blog series that hits all the basics.

Storage Top 10 Best Practices – A great quick list of best practices for storage with SQL Server provided by Microsoft. Over time, I have seen every one of these best practices neglected. I neglected a few myself when I was starting out, and I paid for it.

Benchmark SQL Server Disk Latency – The following is how I benchmark disk latency with SQL Server (a simplified sketch of the idea appears after this list). Remember, this only measures I/O from SQL Server's point of view, so you will also want to look at perfmon to verify whether SQL Server is the cause of your I/O problems.

Measuring Disk Latency with Windows Performance Monitor – Perfmon (Windows Performance Monitor) is the gold standard for measuring disk performance inside of Windows. Have you ever wondered what exactly is included in the stack when you look at Avg Disk reads/sec? Jeff Huges does a great job of answering this question and some others you might not be thinking about.

Analyzing I/O Characteristics and Sizing Storage Systems for SQL Server Database Applications – This white paper by Microsoft is a great guide for sizing storage based on your I/O characteristics. It's a great reference for understanding not only how to benchmark I/O, but also how SQL Server performs I/O and how to size your storage correctly.

How It Works: Bob Dorr's SQL Server I/O Presentation – Want to know how I/O works in SQL Server? This article on the CSS SQL Engineers blog by Bob Dorr is your best starting point. It also has tons of links for great additional reading on how SQL Server works with disks. For example, Bob's write-up on SQL 2000 I/O basics is still relevant today.
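
Here is the simplified sketch promised in the disk latency item above (my own example, not the full benchmark script): it uses sys.dm_io_virtual_file_stats to compute average read and write latency per database file since the last restart, which is the SQL Server-side view you would then compare against perfmon.

-- Average read/write latency per database file since the last SQL Server restart.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       CASE WHEN vfs.num_of_reads = 0  THEN 0
            ELSE vfs.io_stall_read_ms  / vfs.num_of_reads  END AS avg_read_latency_ms,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_read_latency_ms DESC;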

 
