Where is my Availability Group?

In SQL Server 2012 we got this great new high availability feature called availability groups. With readable secondaries under the covers it can be harder to figure out the following two questions. When did the availability group failover? Where did the availability group go when the failover occurred? The goal of this blog post is to help you answer these questions.

AlwaysON Extended Event

One of the things I really like about Availability Groups is that there is a built-in extended event named “ALwaysOn_health” that runs and captures troubleshooting information. I took a look at the extended event and noticed that there are several error numbers that were included in the filter for this extended event. This is shown below as I scripted out the default extended event for a quick review.

CREATE EVENT SESSION [AlwaysOn_health] ON SERVER 
ADD EVENT sqlserver.alwayson_ddl_executed,
ADD EVENT sqlserver.availability_group_lease_expired,
ADD EVENT sqlserver.availability_replica_automatic_failover_validation,
ADD EVENT sqlserver.availability_replica_manager_state_change,
ADD EVENT sqlserver.availability_replica_state_change,
ADD EVENT sqlserver.error_reported(
    WHERE ([error_number]=(9691) OR [error_number]=(35204) OR [error_number]=(9693) OR [error_number]=(26024) OR [error_number]=(28047) 
	OR [error_number]=(26023) OR [error_number]=(9692) OR [error_number]=(28034) OR [error_number]=(28036) OR [error_number]=(28048) 
	OR [error_number]=(28080) OR [error_number]=(28091) OR [error_number]=(26022) OR [error_number]=(9642) OR [error_number]=(35201) 
	OR [error_number]=(35202) OR [error_number]=(35206) OR [error_number]=(35207) OR [error_number]=(26069) OR [error_number]=(26070) 
	OR [error_number]>(41047) AND [error_number]<(41056) OR [error_number]=(41142) OR [error_number]=(41144) OR [error_number]=(1480) 
	OR [error_number]=(823) OR [error_number]=(824) OR [error_number]=(829) OR [error_number]=(35264) OR [error_number]=(35265))),
ADD EVENT sqlserver.lock_redo_blocked 
ADD TARGET package0.event_file(SET filename=N'AlwaysOn_health.xel',max_file_size=(5),max_rollover_files=(4))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
GO

This got me interested in learning why these specific errors were included in the extended event session created specifically for managing Availability Groups. Knowing that the descriptions for errors are kept in the sys.messages table I did a little digging.

System Messages

Taking the error numbers from the AlwaysON_health extended event I was able to build the following query to get the description of the errors included in the extended event.

 SELECT * 
 FROM sys.messages m where language_id = 1033 -- English
 --AND m.message_id =1480
AND ([message_id]=(9691) OR [message_id]=(35204) OR [message_id]=(9693) OR [message_id]=(26024) OR [message_id]=(28047) 
	OR [message_id]=(26023) OR [message_id]=(9692) OR [message_id]=(28034) OR [message_id]=(28036) OR [message_id]=(28048) 
	OR [message_id]=(28080) OR [message_id]=(28091) OR [message_id]=(26022) OR [message_id]=(9642) OR [message_id]=(35201) 
	OR [message_id]=(35202) OR [message_id]=(35206) OR [message_id]=(35207) OR [message_id]=(26069) OR [message_id]=(26070) 
	OR [message_id]>(41047) AND [message_id]<(41056) OR [message_id]=(41142) OR [message_id]=(41144) OR [message_id]=(1480) 
	OR [message_id]=(823) OR [message_id]=(824) OR [message_id]=(829) OR [message_id]=(35264) OR [message_id]=(35265))
ORDER BY Message_id

Now we will focus on one particular error message. This is error message 1480. Looking at the description below you will see that every time a database included in an availability group or in database mirroring changes its role this error occurs.

The %S_MSG database “%.*ls” is changing roles from “%ls” to “%ls” because the mirroring session or availability group failed over due to %S_MSG. This is an informational message only. No user action is required.

When did my AlwaysOn Availability Group Failover?

By now it should not be a big surprise to see how you can figure out when our availability group failed over. To answer this question we are going to filter the “AwaysOn_health” extended event for error_number 1480.

The “AlwaysOn_health” extended event target is to file and by default it will utilize the default log folder for SQL Server. Also keep in mind, that by default the target does rollover for 4  5 MB files for a total of 20 MB. If you are constantly having events occur data will be purged.

For my server used for this blog post my path is “C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\” if this is not your path you will need to modify line 2 in the script below.

;WITH cte_HADR AS (SELECT object_name, CONVERT(XML, event_data) AS data
FROM sys.fn_xe_file_target_read_file('C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\AlwaysOn*.xel', null, null, null)
WHERE object_name = 'error_reported'
)

SELECT data.value('(/event/@timestamp)[1]','datetime') AS [timestamp],
	   data.value('(/event/data[@name=''error_number''])[1]','int') AS [error_number],
	   data.value('(/event/data[@name=''message''])[1]','varchar(max)') AS [message]
FROM cte_HADR
WHERE data.value('(/event/data[@name=''error_number''])[1]','int') = 1480

Below you will see an example of the result set which shows my last failover.

AGFailover

 

You could also utilize the Extended Event GUI to watch data. We will skip that today as I would recommend using T-SQL so you can find failovers in multiple Availability Groups on different servers. We will go into more detail about this process a little later in the blog post.

How Do We Become Proactive?

If you want an action to occur when an database inside an availability group changes roles to be proactive you can configure an SQL Agent Alert. An SQL Agent alert can performs an actions like sending an email to your DBA team or running another SQL Agent job to perform your required action.

The following shows you how to configure this alert via the SSMS user interface.

AGAlert

 

How Do We Report failovers across the Enterprise?

Central Management Server (CMS) is your best friend for building reports to show Availability Group failovers across the enterprise. You can build an CMS group for your SQL 2012 instances and copy and paste the query above to detect Availability Group failovers.

NOTE: This assumes you have an standard install process that keeps the default log path the same across your SQL Server 2012 instances. I strongly encourage that you have an automated SQL Install process that keeps using the same path for all your installs but we will keep that blog post for another day.

High Availability And Disaster Recovery Virtual Chapter Needs You!

Today I have some good news to share. David Klee  (blog twitter) and I are announcing a new SQL PASS Virtual Chapter dedicated to High Availability and Disaster Recovery topics. I am excited to work with David to build a virtual chapter from scratch.

When Does HADR Virtual Chapter Meet?

The schedule for this virtual chapter meetings is the second Tuesday of each month at 12pm Central time. Our first meeting will be held next Tuesday, February 11, at noon Central with Ryan Adams (blog | twitter) speaking on SQL Server 2012 AlwaysOn Availability Groups Live Deployment. To RSVP for this free event, register here at GoToWebinar!

Please follow the Twitter tag @SQLPASS_HADR for announcements of upcoming sessions, and check out the home page for the virtual chapter at hadrvc.sqlpass.org for details on the future sessions. Make sure to visit MyPASS dashboard and add the HADR virtual chapter to your MyChapters page so you get the announcements as they happen!

Are you interested in joining us building the HADR Virtual Chapter?

I have a feeling that David and I are not the only ones in the SQL Community interested in building a virtual chapter from scratch. We are currently looking for people to help host meetings, manage the web site, drive marketing, and finding speakers,  If you are interested in joining us contact me.

Throwback Thursday #4: Plan Cache

Welcome to the first Throwback Thursday blog post of  2014.  Throwback Thursday is a blog series where I dig deep into my evernote collection and find some great content on a single subject and share it with you.  In the fourth installment of the Throwback Thursday series we are going to cover the plan cache.

One of the features that is helpful for performance tuning in SQL Server is the plan cache. Many DBA’s understand that when you execute a stored procedure by default the execution plan gets cached so it can be reused. This allows CPU cycles to be saved as you don’t have to recompile stored procedures every time they are executed. What some DBA’s don’t know is that you can leverage the plan cache to get performance metrics. During this Throwback Thursday we are going to focus on some great community articles, white papers and blogs that can greatly improve your performance tuning process though using the plan cache.

Plan Caching in SQL Server 2008  - This is a great white paper by Microsoft that gives you a great introduction into what the plan cache is and how can make your performance tuning life easier.

How to Drop One Plan from Cache – A lot of DBA’s think that DBCC FREEPROCCACHE is used only to drop all the execution plans from your instance. Grant Fritchey does a great job showing how you can utilize this DBCC command to remove only a single plan.

Finding Top Offenders from Plan Cache – This is how I go about finding top offenders for frequently used execution plans. It’s not the only tool in my tool-belt for performance tuning but it can be a good starting point.

Finding Key Lookups in Plan Cache - At the end of the day, a execution plan is cached as XML. Kendal Van Dyke does a great job showing us how we can probe this XML structure to find key lookup operators inside of the plans that are in the plan cache.

SQL Server Plan Cache: Junk Drawer Your Queries – A great article by Tom LaRock that shows you how plans are stored, how to find plans with specific operators, and how to aggregate similar queries to get helpful usage statistics.

Plan cache, adhoc workloads and clearing the single-use plan cache bloat - Kimberly Tripp did a great job explaining how you can monitor and prevent adhoc workloads  from bloating your plan cache.

Can You Dig It? Plan Cache Series – I saved my personal favorite for last. Jason Strate did a great job providing an series of several  scripts that can be used for reviewing the plan cache.

T-SQL Tuesday #50: Automation for LazyDBAs!

T-SQL Tuesday is a monthly blog party hosted by a different blogger each month.

T-SQL Tuesday 50 - Automation

T-SQL Tuesday 50 – Automation

This blog party was started by Adam Machanic (blog|twitter). You can take part by posting your own participating post that fits the topic of the month. This month, SQLChow blessed us with a topic I am very passionate about. This months T-SQL Tuesday topic is automation.

I have a confession to share. I am a Lazy DBA. Those who know me won’t be shocked by reading this. Those who don’t know me. Trust me, I mean this in a good way.  My lazyness over the years has actually motivated me to be a better DBA and data professional.  I learned early on in my career that in order to be productive I must automate. No longer can we do manual daily checklists. We lose several hours that could have been spent on tasks that show our value not just hold the status quo. Automation allows us to end the cycle of repeating tasks and allows me to spend that time doing things that provide value, save the company money and make us happy.

Early Stages of Automation

Once in my career I was blessed with an opportunity to be a Database Administrator overseeing thousands of production databases. Quickly, I noticed there was no automated process for a daily checklist.  How did we know if a database backup failed due to low disk space? Hopefully, we got an email from the SQL Agent. Hopefully, someone remembered to setup an SQL Agent notification. I knew this wasn’t the answer. One of my first tasks was automating this whole process so we knew which databases passed and failed an automated daily checklist. I was able to leverage Policy Based Management (PBM) and Central Management Server (CMS) with Powershell to get this done. Little did I know it at the time, but this basic move changed my DBA Career. I got to speak at the PASS Summit in 2011 on how I evaluated my automated daily checklist against 1000+ servers during my morning coffee break. Starting to focus on performance I noticed a better way to pull this information without PBM. I build an automated framework using Powershell and CMS to automate the process to get my failed backups quicker. Still today meet DBA’s today who didn’t know you can automate your daily checklist only using native tools built into SQL Server.

Current Stages of Automation

Today, I am much more focused on performance and proactive monitoring. Learning from my past I knew I wanted to automate as much as possible. This didn’t change even though my core skills were changing.  In the past year I built some nice automated solutions that help me with performance tuning.  When I am in charge of a new instance I automate the process of monitor disk latency, proactively automate the process to monitor wait statistics. Once I have a good automated baseline I can drill deeper as needed. For example, I can find out which queries are causing my waits. It has gotten me to the point where most of the time I can find the root cause to SQL Server performance problems in ten minutes. The automated benchmark process does the heavy lifting for me so I can respond and stay as proactive as possible and provide value instead of running processes that should be automated.

 Future State of Automation

I see more things being automated.  More and more parts of the current “Production DBA” role as we know it today will be automated. This is going to open us up to doing amazing things. One day, an end to end performance tuning process will be automated.  I look forward to seeing things that we thought were not possible be possible and automated. For example, automating server procurement and deployment once fell into this realm. Now, it’s already here. It’s known as “the cloud”. I will be honest, I was shocked to see how easy and quick it is to deploy an Windows Azure Server.

What are your thoughts about automation? Where do you see it going in the future?

Throwback Thursday #3: SQL Server & Disks

I hope everyone is having a good time gearing up for the holidays.   Throwback Thursday is a bi-weekly blog series where I dig deep into my evernote collection and find some great content on a single subject and share it with you.  In the third installment of the Throwback Thursday series we are going to cover some helpful SQL Server disk articles.

IOPS Calculator – This is a great tool to figure out how many IOPS you should get from your storage configuration. It’s also a helpful tool for estimating how many disks are needed to support your workload if you are purchasing new storage.

The Fundamental of Storage Systems - We couldn’t talk about storage and SQL Server without mentioning my friend Wes Brown. He has a plethora of information on this subject including this is a great blog series that hits all the basics.

Storage Top 10 Best Practices – A great quick list of best practices for storage with SQL Server provide by Microsoft. Over time, I have seen all of these best practices be neglected. I neglected a few myself when I was starting out and I paid for it.

Benchmark SQL Server Disk Latency - The following is how I benchmark disk latency with SQL Server. Remember this is just for SQL Server so you also want to take a look at perfmon too to verify if SQL Server is the cause to your I/O problems.

Measuring Disk Latency with Windows Performance Monitor –  Perfmon (Windows Performance Monitor) is the gold standard for measuring disk performance inside of windows. Have you ever wondered what exactly is included in the stack when you look at Avg Disk reads/sec? Jeff Huges does a great job of answering this question and some others you might not be thinking about.

Analyzing I/O Characteristics and Sizing Storage Systems for SQL Server Database Applications - This white paper by Microsoft is a great guide for sizing your I/O characteristics.  Its a great reference guide to use to understand how to not only benchmark I/O but also understand how to understand how SQL does I/O and also how to size your storage structure correctly.

How It Works: Bob Dorr’s SQL Server I/O Presentation – Want to know how I/O works in SQL Server? This is article on CSS SQL Engineers blog by Bob Dorr is your best starting point. It also has tons of links for great additional reading on how SQL Server works with disks. For example, Bob’s write up on SQL 2000 I/O basics is still relevant today.

 

-

Throwback Thursday #2: Get Your Speak On!

I hope everyone is having a great Thanksgiving with their families. When you come out of that food coma I hope you enjoy my second blog post in the Throwback Thursday series. Throwback Thursday is a bi-weekly blog series where I dig deep into my evernote collection and find some great content on a single subject and share it with you.

This week were going to cover becoming a speaker and becoming a better speaker. I am still working on several tips included in these links. 

The 30 skills every IT person should have – Questioning if public speaking is helpful for you IT career? Infoworld lists three skills involved with giving presentations in the top five of its list of skills everyone should have.

Presenting Opens Doors - Kendal Van Dyke shares a great story on how presenting helped change his life. This is stuff you wont see on an ROI report.

You Don’t Have to Be an Expert To Speak - One of the first fears I had when I considered building presentations was that I wasn’t an expert.  After doing some thinking I decided to write why I  got up and present regardless of my experience.

Public Speaking: A Primer & Getting started in speaking publicly – clear and concise presentations - Paul Randal shared his thoughts on things you should think about as your working on that first presentation. His wife Kimberly Tripp also wrote a response post with some more goodies.  I forward these around quite a bit when I get asked, “How do you start speaking?”

51 Questions About Your #SQLPASS Summit Submission – Getting an abstract selected can be tough. Today, I still need all the help I can get drafting an abstract. I strongly recommend looking at this checklist. It  has a bunch of items you didn’t think about before you hit submit. Brent Ozar did a great job with this. If you like it check out his presentation links.

How To Successfully Deliver Presentations for Community Leaders and Professional SpeakersDan Stolts shares some great objectives that you should focus on when you start giving technical presentations. I am still working on some of these objectives.

Twenty Tips Better Conference Speaking – More awesome tips provided by Cameron Moll. I actually really enjoy the tip about managing expectations for your first presentation. We only get better through experience and you have to start somewhere.

Why You Should Create A Speakers Resume – If you have completed the hard task of building and giving a presentation make sure you get some credit. Kendra Little gives some great advice. At first I didn’t do this. Once I started I found out it helped me get some great jobs and more speaking gigs.

T-SQL Tuesday #41 – Presenting and Loving It – Bob Pusateri had a great concept for a T-SQL Tuesday topic. Go straight to the comments. They include links to your peers in the community who shared their why they love presenting. I really like this one, that one, oh and also this one too.

SQL PASS Speaker Resources – That’s right folks, our favorite community actually has some great resources including tips on starting, tools to record and more..

I’m A SQL Server Rockstar Blogger!

 

SQL Server Rockstar Blogger

SQL Server Rockstar Blogger

I have a quick secret to share with my readers. Ever since I saw Tom Larock’s SQL Server Blogger rankings I have always wanted to be on that list. Today, that happened. To many it might seem like a small accomplishment towards bigger accomplishments. Right now, I am just happy take a moment and realize I accomplished something I didn’t think was possible when I stared blogging.

When I started blogging my goals were simple. I wanted to improve my communication skills, document lessons learned so I could find them and share knowledge.  In reality, it has opened doors and given me opportunities I never thought would come my way.

Thanks Tom!

Throwback Thursday #1: SQL Inspire

What Is Throwback Thursday?

One of my hobbies is being a turntablist. In high school I took a job just so I could buy myself two Technics 1200′s, crates full of records and a mixer. I love scratching, beat juggling, and mixing accapella tracks over instrumentals. The radio stations I like would always have a turntablist  mix old school hits on Thursday. The show was called Throwback Thursdays.

Gregory Turnables

I am not the only turntablist in the family.

Being a database professional I learned to like backups. Backups didn’t just apply to database backups. I use evernote to backup blogs and articles that I found that are helpful for my career. Recently, this gave me a great idea to group them and share the ones that are old but still very valuable to my career. I am basically mixing some great content that you forgot about or never found that still applies as great reference material. If they are still helpful to me I bet they are helpful to others who frequently visit this blog.

#1 Throwback Topic: SQL Inspire

I had many different ideas for the first thursday throwback. I decided to go with my all-time favorite needle in the haystack. When I would go crate digging for a great instrumental to mix with a new hip hop song I would typically look for something that was unique and different.  When I think of conferences I found that with SQL Inspire. All presentations were recorded and they are still online. The concept is simple, try to inspire SQL people. Not only did Andy and Brian put on an event that inspired me, it also gave me my favorite SQL Server presentation of all-time. You can watch it below its only 20 minutes.

You can also catch the follow up interview. Tom’s prediction on the future of PASS two years ago is spot on. I think PASS accomplished some of those goals in just two years.

Here are all the SQL Inspire Presentations. I recommend that you watch them.

 

Speaking Today at Big Data Virtual Chapter Meeting

Today, I have the privilege of speaking at the SQL PASS Big Data Virtual Chapter Meeting at 2PM EST time. These meetings are free for everyone to attend. If you are new to Virtual Chapters I highly suggest that you check them out. All sessions are recorded so you can find several recordings for a similar topic.

You can signup for today’s session for free. It starts at 2PM EST (1PM CST) I hope to see you over there.

** If you cannot see the powerpoint below you can download it.

Also, all reference material included scripts to build database, scripts and research notes are included here.

SQL Server Performance Root Cause Analysis in 10 Minutes

This year I was honored to be selected by Dell Software to present a ten minute session in their booth (#200) at the 2013 SQL PASS Member Summit. I decided to share how I do a SQL Server Performance Root Cause Analysis in 10 minutes with the SQL Community.

The following is my blog series and it includes all the sample code:

If you are attending the 2013 SQL PASS Member Summit lets connect. You can catch my presentation in the Dell Software Theater at Booth #200 at the times listed below.

Dates and Times:

  • Wednesday – October 16th @ 11:45am
  • Thursday – October 17th @ 1:45pm
  • Thursday – October 17th @ 3:15pm
  • Friday – October 18th @ 12:45pm