Tag Archives: High Availability

Free Video for Accidental DBA’s

Recently, Mike Walsh and John Sterrett teamed up with Embarcadero to give a 60 minute presentation focusing on the skills that are necessary for accidental DBA’s to master to become DBA’s with a successful long lasting career.

I wish I had all this advice when I started as a SQL Server DBA.

Resources:

All Resources mentioned in the video to complete the tasks discussed can be found at http://linchpinpeople.com/where

Video

Upcoming Presentations

It’s an honor to always be able to give back to the SQL Server Community. I would never be where I am today with out others helping me along the way.  I look forward connecting and sharing with everyone at the following events.

When will my replica be synchronized?

Recently, I was pulled into a gig to help troubleshoot an windows failover cluster service issue that was hindering an availability group. Just to give a little background information its a two node windows cluster with two replicas the availability groups are using synchronous mirroring. Once, this was resolved we had a critical database that was behind on synchronization because one of the two replicas (cluster node) was down. This lead to the following question. When will my replica by synchronized?

This question was easy to answer with database mirroring. We could pull up the database mirroring monitor. We don’t have an database replica monitor GUI tool built into SSMS. Lucky for us, its not that hard with availability groups. We just have to use performance monitor. You will see below there is an perfmon collection called “SQLServer:Database Replica” that comes in handy for us.

When will replica be synchronized?

When will replica be synchronized?

Above, you can see that this secondary replica which is back online is synchronizing as its receiving log bytes and the Recovery Queue shown in the performance monitor is also reducing as the synchronization process is catching up. According to MSDN Recovery Queue is the Amount of log records in the log files of the secondary replica that has not yet been redone.

Unplanned Availability Group Failover

Whenever a disaster occurs it will be a stressful scenario regardless of how small or big the disaster is. This gets multiplied when it is your first time working with a newer technology or the first time you are going through a disaster without a proper run book. Today, were going to help you establish a run book for creating a planned failover with availability groups.

A Step By Step Guide for Unplanned SQL Server Availability Group Failover can be found here.

Learn High Availability and Disaster Recovery with SQL Server in a Day!

If you are near Pittsburgh, PA on May 22, 2014 catch the following all-day training session from 8:30 AM to 4:00 PM (EDT) 

Do you need to build a High Availability (HA) and/or Disaster Recovery (DR) strategy for SQL Server Databases? If so, you will no doubt have questions like:

•    What does a good High Availability or Disaster Recovery strategy look like?
•    What options are available to help implement a proper HA/DR strategy?
•    What do I do if my database server goes down?
•    How do I mitigate downtime for planned outages?
•    How much time do I have to recover, and which native SQL Server features helps me reach my goal?
•    How does this work with my company’s existing HA/DR strategy?

Linchpin People & SIOS are offering this seminar to answer those questions and more!
In this all-day training, we will help teach you the fundamentals and best practices of building and implementing a full HA/DR Strategy for your company.  You will learn how to build a recovery plan by mastering the basics of backups and restores. You will also learn how to implement and monitor log shipping, database mirroring, replication, Windows clustering, SQL Server Failover Clustering Instances, and the new Availability Group features. After this training session, you will have the knowledge to design, implement and manage your very own high availability and disaster recovery (HA/DR) plan.

Meet The Presenters!

High Availability and Disaster Recovery Cheap TrainingJohn Sterrett is a Group Principal and Sr. Consultant at Linchpin People. Previously, he was a Sr. Database Admin Advisor for Dell, directly responsible for several mission-critical databases behind dell.com. John has presented at many community events, is a PASS Regional Mentor, and one of the founders of the WVPASS user group and the PASS HA/DR Virtual Chapter. Tim RadneyTim Radney is the Lead System DBA for a top 40 US held bank. He is also a chapter leader for the Columbus GA SQL Users Group, PASS Regional Mentor for Greater South East US. Tim is a Microsoft Systems Admin turned DBA. Prior to becoming a full time DBA, he spent 10 years supporting Citrix, Novell, Windows, IIS, and MSSQL.

 Event Details:

Thursday May 22nd, 2014 – 8:30am-4:30pm (EDT)
Cost:  $79.00 before April 30th, $99.00 after April 30th
Location:
Microsoft Office
30 Isabella St., Second Floor
Alcoa Business Services Center
Pittsburgh, PA 15212
Phone: (412) 323-6700
Food:  Lunch, and drinks will be provided at the event location.  Please contact the organizer if you require a vegetarian option.

Where is my Availability Group?

In SQL Server 2012 we got this great new high availability feature called availability groups. With readable secondaries under the covers it can be harder to figure out the following two questions. When did the availability group failover? Where did the availability group go when the failover occurred? The goal of this blog post is to help you answer these questions.

AlwaysON Extended Event

One of the things I really like about Availability Groups is that there is a built-in extended event named “ALwaysOn_health” that runs and captures troubleshooting information. I took a look at the extended event and noticed that there are several error numbers that were included in the filter for this extended event. This is shown below as I scripted out the default extended event for a quick review.

CREATE EVENT SESSION [AlwaysOn_health] ON SERVER 
ADD EVENT sqlserver.alwayson_ddl_executed,
ADD EVENT sqlserver.availability_group_lease_expired,
ADD EVENT sqlserver.availability_replica_automatic_failover_validation,
ADD EVENT sqlserver.availability_replica_manager_state_change,
ADD EVENT sqlserver.availability_replica_state_change,
ADD EVENT sqlserver.error_reported(
    WHERE ([error_number]=(9691) OR [error_number]=(35204) OR [error_number]=(9693) OR [error_number]=(26024) OR [error_number]=(28047) 
	OR [error_number]=(26023) OR [error_number]=(9692) OR [error_number]=(28034) OR [error_number]=(28036) OR [error_number]=(28048) 
	OR [error_number]=(28080) OR [error_number]=(28091) OR [error_number]=(26022) OR [error_number]=(9642) OR [error_number]=(35201) 
	OR [error_number]=(35202) OR [error_number]=(35206) OR [error_number]=(35207) OR [error_number]=(26069) OR [error_number]=(26070) 
	OR [error_number]>(41047) AND [error_number]<(41056) OR [error_number]=(41142) OR [error_number]=(41144) OR [error_number]=(1480) 
	OR [error_number]=(823) OR [error_number]=(824) OR [error_number]=(829) OR [error_number]=(35264) OR [error_number]=(35265))),
ADD EVENT sqlserver.lock_redo_blocked 
ADD TARGET package0.event_file(SET filename=N'AlwaysOn_health.xel',max_file_size=(5),max_rollover_files=(4))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
GO

This got me interested in learning why these specific errors were included in the extended event session created specifically for managing Availability Groups. Knowing that the descriptions for errors are kept in the sys.messages table I did a little digging.

System Messages

Taking the error numbers from the AlwaysON_health extended event I was able to build the following query to get the description of the errors included in the extended event.

 SELECT * 
 FROM sys.messages m where language_id = 1033 -- English
 --AND m.message_id =1480
AND ([message_id]=(9691) OR [message_id]=(35204) OR [message_id]=(9693) OR [message_id]=(26024) OR [message_id]=(28047) 
	OR [message_id]=(26023) OR [message_id]=(9692) OR [message_id]=(28034) OR [message_id]=(28036) OR [message_id]=(28048) 
	OR [message_id]=(28080) OR [message_id]=(28091) OR [message_id]=(26022) OR [message_id]=(9642) OR [message_id]=(35201) 
	OR [message_id]=(35202) OR [message_id]=(35206) OR [message_id]=(35207) OR [message_id]=(26069) OR [message_id]=(26070) 
	OR [message_id]>(41047) AND [message_id]<(41056) OR [message_id]=(41142) OR [message_id]=(41144) OR [message_id]=(1480) 
	OR [message_id]=(823) OR [message_id]=(824) OR [message_id]=(829) OR [message_id]=(35264) OR [message_id]=(35265))
ORDER BY Message_id

Now we will focus on one particular error message. This is error message 1480. Looking at the description below you will see that every time a database included in an availability group or in database mirroring changes its role this error occurs.

The %S_MSG database “%.*ls” is changing roles from “%ls” to “%ls” because the mirroring session or availability group failed over due to %S_MSG. This is an informational message only. No user action is required.

When did my AlwaysOn Availability Group Failover?

By now it should not be a big surprise to see how you can figure out when our availability group failed over. To answer this question we are going to filter the “AwaysOn_health” extended event for error_number 1480.

The “AlwaysOn_health” extended event target is to file and by default it will utilize the default log folder for SQL Server. Also keep in mind, that by default the target does rollover for 4  5 MB files for a total of 20 MB. If you are constantly having events occur data will be purged.

For my server used for this blog post my path is “C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\” if this is not your path you will need to modify line 2 in the script below.

;WITH cte_HADR AS (SELECT object_name, CONVERT(XML, event_data) AS data
FROM sys.fn_xe_file_target_read_file('C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\AlwaysOn*.xel', null, null, null)
WHERE object_name = 'error_reported'
)

SELECT data.value('(/event/@timestamp)[1]','datetime') AS [timestamp],
	   data.value('(/event/data[@name=''error_number''])[1]','int') AS [error_number],
	   data.value('(/event/data[@name=''message''])[1]','varchar(max)') AS [message]
FROM cte_HADR
WHERE data.value('(/event/data[@name=''error_number''])[1]','int') = 1480

Below you will see an example of the result set which shows my last failover.

AGFailover

 

You could also utilize the Extended Event GUI to watch data. We will skip that today as I would recommend using T-SQL so you can find failovers in multiple Availability Groups on different servers. We will go into more detail about this process a little later in the blog post.

How Do We Become Proactive?

If you want an action to occur when an database inside an availability group changes roles to be proactive you can configure an SQL Agent Alert. An SQL Agent alert can performs an actions like sending an email to your DBA team or running another SQL Agent job to perform your required action.

The following shows you how to configure this alert via the SSMS user interface.

AGAlert

 

How Do We Report failovers across the Enterprise?

Central Management Server (CMS) is your best friend for building reports to show Availability Group failovers across the enterprise. You can build an CMS group for your SQL 2012 instances and copy and paste the query above to detect Availability Group failovers.

NOTE: This assumes you have an standard install process that keeps the default log path the same across your SQL Server 2012 instances. I strongly encourage that you have an automated SQL Install process that keeps using the same path for all your installs but we will keep that blog post for another day.

Cutover 30+ GB databases in 60 seconds with SQL Server 2005/2008

You kid brother just messed up a database migration.  You now have sixty seconds to migrate a 30 GB database or you kid brother is…

Okay I gave it a shot 🙂

If you are familiar with moving databases across servers most likely you are using one of the following methods.  Today I am going to write about what goes on during these methods with a focus on speed and provide another option that might work for you.

  • Detach/Attach
  • Copy Database Wizard
  • Backup/Restore
  • 3rd Party Tools

Detach/Attach

The detach/attach method allows you to detach and reattach databases. Therefore, in order to move a database from one server to another you have to detach the database, physically move the data and log files to the new server and then attach the database.  An unforeseen bottleneck with this strategy might be network latency. While this method is very straight forward and simple how long can the database be offline while you are moving the database files (mdf, ndf, ldf) during the detach/attach process?

Copy Database Wizard

The copy database wizard is a tool that leverages SQL Server Integration Services (SSIS) to copy a database from one server to another.  The account that the package uses at runtime has to have sysadmin role on both the source and destination instances.  You have two options during the copy process.  The first method is detach/attach see the paragraph above for feedback on using detach/attach.  The second method includes using SMO to script database objects.  This method keeps the source database online during the copy but is much slower than the detach/attach.  Therefore, I have no feedback on this method.  Have you used it? If so please add your comments.

Backup/Restore

Assuming you are using the Full recovery mode for your database the backup restore method for moving a database involves taking a full backup and a transactional log backup where you backup the tail of the log, and leave the database in restoring state.  This will take the source database offline keep the data in sync.  With restore time being a factor you could restore the full backup and do incremental transaction logs up to the point of cutting over assuming assuming no full backups occurred on your database while you started applying transactional backups.

And the winner is…

Drum roll please……..  And the winner is Database Mirroring in High Availability Mode as it can allow you to cutover failover to migrate huge databases in less than sixty seconds.  Okay I might be cheating, it will take much more than sixty seconds to configure. The important fact is that it will seam like it only took seconds to the end users. If your application uses .NET 2.0+ framework and you configure client side redirect the end users might not experience an outage at all.

To setup database mirroring it is highly recommended to make sure the principal and mirror database engine services are using service accounts, they also need access the the ports used by SQL endpoints, and more….  Check out the MSDN site for a great guide for Database Mirroring