7/30/2019 Module 1 Application HA With SQL, Exchange and Other Servers
This video is part of the Microsoft Virtual Academy.
In this session we are going to be diving deeper into understanding Microsoft's high availability solutions. Part one of this series looked at the application infrastructure, meaning failover clustering, virtualization and some of the other key infrastructure components.
Part two is going to look at the applications which run on top of this infrastructure. We're going to spend most of our time looking at SQL Server, Exchange Server, and then briefly cover some of the other server high availability solutions.
I'm Symon Perriman, and I'm going to be joined by SQL program manager Justin Erickson and Exchange technical writer Scott Schnoll in this session.
Learn about Microsoft's different high availability technologies and when to use each of them.
High availability is important because it keeps our applications up and running, not only for availability's sake but also to make sure that our customers are happy; by maintaining continual service we can keep our customers connected in a 24/7 marketplace. This session will specifically focus on the application layer. We've covered the core infrastructure in part one of this video session, and part three will look at management, focusing on System Center.
16-Nov-11
I'm now going to turn it over to Justin Erickson, Senior Program Manager with the SQL team. Justin.
Justin: Hello everyone, I'm Justin Erickson. I'm a program manager on the SQL Server database engine team.
So let's quickly go through the introduction to each one of the technologies; if you have questions, there are sessions throughout the SQL Server track that go into more detail on a lot of the high availability technologies.
The key thing that I'd want to point out is that when you look at what comprises database downtime, there are two big portions. You have unplanned downtime, where I actually have a failure or a user caused an issue, and I have to move to a different system. And there's also planned downtime, where I'm doing an application upgrade, I'm doing a patch, or I'm just trying to maintain the SLAs that I need for my system throughput. So we look at all of these drivers as we look into what makes up SQL Server availability technology.
And so the gamut of technologies that we have, looking at existing releases as well as what's coming up in the SQL Server Denali release through AlwaysOn, is listed over here. I'll walk through each one of these and talk about how each technology builds on the previous one. You'll see that there's a sequence of backup and restore, log shipping and database mirroring, which is sort of the same technology being built up incrementally to give you better SLAs. There are also technologies like replication, which sort of fit into this space, and failover cluster instances, which use a lower level of data protection with SANs and shared storage, with SQL Server failing over along with the shared storage. And then we'll end by talking about some of the ways to manage planned downtime as well, which is a majority of the downtime that you'll see.
Backup and restore is the most basic technology. Regardless of what technology you're using on top of it, it's always a good idea to have a physical backup of your database so you can go and recreate the entire system from scratch should your high availability system go down, your entire data center go down, or you hit some other issue where you need to go back to a point in time. Backup and restore is your base set of technologies there.
When you look at the downtime of the backup and restore solution, though: you have the backup, and if something goes down you need to use the backup and restore process to get your system back up and running. You're now doing a full installation of that system and applying that restore; maybe you have the system there, but you actually go into the restore from scratch, which, if you're looking at terabyte-sized databases, could take you a good amount of time.
And that's where a system like SQL log shipping comes into place. This is basically an automated backup and restore process: you have transactions that are coming into your primary system, you have log backups happening on a periodic basis through the agent job schedule, and this guy's just copying backups out. There's another job that will copy the backups to your local system, and finally a third job that goes through the restore process. So this is basically doing the backup and restore without waiting for that failure, saying: I'm going to have the system ready to go, constantly running this backup job, the copy job and the restore job, so that when I have a failure the system is ready to go and I just have to apply whatever logs I haven't applied at that time.
And there's a nice wizard in SQL Server Management Studio to help you set this up and determine what intervals you want to configure this on, based on your needs.
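The three agent jobs above (backup, copy, restore) can be sketched as a simple loop over files. This is only a conceptual illustration modeled with plain files; the class and directory names are invented for the example and are not part of any SQL Server API.

```python
import shutil
import tempfile
from pathlib import Path

class LogShipping:
    """Toy model of log shipping's three jobs: backup -> copy -> restore."""
    def __init__(self, root: Path):
        self.primary = root / "primary_logs"   # log backups taken on the primary
        self.share = root / "backup_share"     # share the copy job writes to
        self.secondary = root / "secondary"    # restored state on the secondary
        for d in (self.primary, self.share, self.secondary):
            d.mkdir(parents=True)
        self.seq = 0
        self.restored = []                     # log backups already replayed

    def backup_job(self, transactions):
        """Periodic log backup on the primary."""
        self.seq += 1
        f = self.primary / f"log_{self.seq:04d}.trn"
        f.write_text("\n".join(transactions))
        return f

    def copy_job(self):
        """Copy any new log backups over to the secondary's share."""
        for f in sorted(self.primary.glob("*.trn")):
            target = self.share / f.name
            if not target.exists():
                shutil.copy(f, target)

    def restore_job(self):
        """Replay copied log backups on the secondary, in order."""
        for f in sorted(self.share.glob("*.trn")):
            if f.name not in self.restored:
                (self.secondary / f.name).write_text(f.read_text())
                self.restored.append(f.name)

ls = LogShipping(Path(tempfile.mkdtemp()))
ls.backup_job(["INSERT 1", "INSERT 2"])
ls.copy_job()
ls.restore_job()
# After one cycle the secondary has every log backup applied, so a
# failover only needs the backups taken since the last cycle.
print(len(ls.restored))  # 1
```

Because all three jobs run continuously, the secondary stays nearly current instead of being rebuilt from scratch after a failure.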
The next technology takes log shipping, which was the automated backup and restore process, and builds it into the engine. This looks at how we stream the log records, and now, because it's built into the engine, we can do things like provide synchronous commit, where I can make sure my secondary system is fully up to date with the primary, so that when I'm failing over there's zero data loss. The way this works is: your application is coming in, committing a set of transactions, and at the time we write the transactions locally to our log file we're sending them over to the secondary. If you're in synchronous mode we'll write them to the log file on the secondary side, send back an acknowledgement, and only then will we tell the application that, hey, your transaction has been committed. So this means that at any point in time when we execute that failover, my system is fully up to date with the primary. Of course you don't have to run in synchronous mode; you can always run in asynchronous mode, where this guy's just sending log records over as fast as he can, and the primary continues ahead without waiting for the acknowledgement, so I'm not slowing down the workload. That ends up being a choice depending on what your SLA needs are.
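The synchronous versus asynchronous trade-off above can be sketched as follows. The `Primary` and `Mirror` classes are invented for illustration, not a SQL Server API; the point is only when the application's commit is acknowledged relative to the mirror hardening the record.

```python
from collections import deque

class Mirror:
    def __init__(self):
        self.log = []

    def harden(self, record):
        self.log.append(record)   # write to the mirror's log, then "ack"

class Primary:
    def __init__(self, mirror, synchronous):
        self.log = []
        self.mirror = mirror
        self.synchronous = synchronous
        self.send_queue = deque()  # records shipped lazily in async mode

    def commit(self, record):
        self.log.append(record)            # harden locally first
        if self.synchronous:
            self.mirror.harden(record)     # block for the ack: zero data loss
            return "committed, hardened on mirror"
        self.send_queue.append(record)     # primary does not wait
        return "committed, mirror may lag"

sync = Primary(Mirror(), synchronous=True)
sync.commit("UPDATE accounts SET ...")
print(len(sync.mirror.log))    # 1: mirror is up to date at commit time

async_ = Primary(Mirror(), synchronous=False)
async_.commit("UPDATE accounts SET ...")
print(len(async_.mirror.log))  # 0: the record is still in flight
```

In synchronous mode a failover loses nothing, at the cost of waiting on the round trip; in asynchronous mode the workload runs at full speed but the mirror may be behind at failover time.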
Another technology which isn't really built to be a high availability technology, but is often used in high availability scenarios, is replication. The reason it is typically used is wanting extra utilization of that hardware. When I'm using a system like log shipping or database mirroring, I have a mirror or secondary that's sitting there waiting in the event of a failure, and sometimes we hear from a customer: well, if I have that hardware, I also want to do something with it. There are scenarios where customers have used replication in the past because it not only allows you to send the data to your secondary, it actually lets you read the data from the secondary as well, for reporting or for offloading other workloads. And I'll talk in a second about how AlwaysOn availability groups take away this need, so as we go forward we're simplifying our technology stack.
So, SQL Server AlwaysOn: in the upcoming release of SQL Server we took a holistic look at what we do with high availability and figured out how to build an integrated, flexible, efficient, single solution to meet your high availability needs, rather than the previous set of technologies that you put together to build a solution. From that we came up with two main feature areas. We have AlwaysOn availability groups, which provide database protection where SQL Server is doing the data replication, similar to database mirroring and log shipping. And we have AlwaysOn failover cluster instances, which allow customers to use their existing infrastructure and provide data protection at the lower layer of the hardware stack, using the SAN and shared storage for the data protection, with SQL Server failing over between the nodes. Failover cluster instances is a technology that existed in previous releases but was enhanced in Denali with multi-site clustering and a flexible failover policy that provides a better set of health detection and diagnostic infrastructure, as well as improved recovery times with indirect checkpoints.
AlwaysOn availability groups is a new feature in Denali that replaces database mirroring. It provides a multi-database failover unit and multiple secondaries, so I don't need to combine database mirroring and log shipping, as well as active secondaries that I can read from and take backups from, so replication doesn't end up being in the high availability mix.
As we look at the additional feature set we provided here, we also looked into providing an integrated HA management solution.
So what are SQL Server failover cluster instances? This is built similarly to what Symon went through with the other technologies. With AlwaysOn failover cluster instances we use a shared disk to do the data protection, so each one of the machines is accessing the same set of files, and when we're failing over we're moving access to that same data file over to another machine and having SQL Server start up on that side. So on the SQL side we're providing protection of the binaries and processes between machines, while relying on external SAN technologies to provide protection of the database files themselves.
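The model above can be sketched in a few lines: the database files exist once, on shared storage, and only the instance moves between nodes on failover. The class names are invented for illustration.

```python
class SharedDisk:
    """Stands in for the SAN/shared storage: one copy of the data files."""
    def __init__(self):
        self.files = {"db.mdf": "data pages...", "db.ldf": "log records..."}

class FailoverClusterInstance:
    def __init__(self, nodes, disk):
        self.nodes = nodes
        self.disk = disk          # every node accesses the same files
        self.owner = nodes[0]     # node currently running SQL Server

    def failover(self):
        # Move the instance, not the data: the shared disk stays put.
        self.owner = next(n for n in self.nodes if n != self.owner)
        return self.owner

fci = FailoverClusterInstance(["node1", "node2"], SharedDisk())
disk_before = fci.disk
print(fci.failover())             # node2
assert fci.disk is disk_before    # same files; nothing was copied
```

This is why an FCI protects against machine failure but not against damage to the data files themselves: there is only one copy of them.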
WSFC = Windows Server Failover Clustering
WSFC vs. FCI
Scoping: no replicas on the same node (Hyper-V)
WSFC is used for:
1. Primary selection and coordination
2. Primary health detection
3. Distributed changes and truth
Secondary health is driven from the primary (no impact to the primary)
SQL Server AlwaysOn availability groups use SQL Server to provide the data protection. It's still built on top of Windows clustering, which helps us with inter-node health detection and state configuration changes across the system, but it does not rely on any SAN or shared storage infrastructure; SQL Server is providing the data protection. We have collections of databases moving between the nodes, rather than the binaries and services moving between them.
When we look at AlwaysOn as a comprehensive solution, it's built to meet combinations of needs. In some cases you're looking at using shared storage and SANs for your data protection within your data center, and availability groups between data centers, which is like the picture on the right. In other cases you don't have any investments in shared storage and you want a cheaper solution that provides a faster failover, and that's where AlwaysOn availability groups come in. So you can mix and match these technologies to meet your needs, whatever they are.
Another common question that comes up is: well, what about virtualization, how does this fit into the mix? Virtualization is often used in consolidation scenarios with SQL Server, and virtualization on its own does provide some high availability guarantees as well. When you look into virtualization you need to look at both planned and unplanned downtime, at the host as well as the guest layer. Virtualization provides live migration, where you can move VMs between hosts with zero downtime, and that's the best solution for planned downtime at the host. If you have an unplanned event at the host level, that's when you're failing over the entire VM and doing an OS restart, so it provides some protection there, but you'll have a slower recovery time. If you have failures at the guest level, that's where virtualization doesn't provide any protection: if I have database file corruption, or the binaries themselves within that OS get corrupted for whatever reason, just using virtualization is not protecting me there. That's where you're falling back to backup and restore, or you can use an additional HA technology at the guest level, and together they provide you the best of both worlds in one solution. Similarly, at the planned level, when I'm patching the guest OS you're having downtime during the patch unless you have another technology within the guest OS to provide protection. So when you're deciding whether to use an AlwaysOn technology: if your customer looks at these sets of requirements and says this isn't enough, that's when it's worth investing in that extra complexity. If these requirements meet your SLAs, then it's good to stick with virtualization as your core technology rather than biting off the additional complexity of adding another solution in the guest. And all our technologies will work through virtualization.
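The host/guest decision the paragraph above walks through can be summarized as a small lookup. The scenario names and return strings below are just a restatement of the talk's points, not output from any Microsoft tool.

```python
# What protection does virtualization *alone* give for each failure scenario?
def virtualization_protection(layer: str, planned: bool) -> str:
    if layer == "host" and planned:
        return "live migration: zero downtime"
    if layer == "host" and not planned:
        return "VM restarts on another host: protected, but slower recovery"
    if layer == "guest" and planned:
        return "downtime during guest patching unless guest-level HA is added"
    # Unplanned guest-level failure, e.g. database file corruption:
    return "no protection: fall back to backup/restore or guest-level HA"

print(virtualization_protection("host", planned=True))
print(virtualization_protection("guest", planned=False))
```

If every cell of this matrix is acceptable against your SLAs, virtualization alone suffices; otherwise the guest-level gaps are what justify adding a SQL Server HA technology inside the VM.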
So that gives you a quick introduction to our unplanned downtime features.
When we look into planned downtime there are other things to consider: how do I handle OS as well as SQL Server upgrades? That's where each one of the technologies has a rolling upgrade story, where I can upgrade the mirror, fail over to the secondary, patch the old primary and then fail back if I need to. Online operations are another key thing. If I'm doing an application change where I'm actually changing the database structures or adding new data to the system, the online operations that are enhanced in Denali will allow you to make these changes without impacting the currently running workloads. There are new enhancements in SQL Server Denali where we can do more online index builds, with LOB data types and other large data types, as well as adding non-nullable columns online, which was not available in previous releases. Along with this, a lot of times you look into what other sources impact my SLAs. If I'm building my SLAs as a business, I'm looking not only at what happens in the event of a failover, when I'm taking the system down, but also at whether my system can respond at the throughput I need it to. That's where Resource Governor is a great technology: it allows you to throttle workloads to reserve capacity for your core workloads. I can say that I want to reserve 80% of my CPU for my core workload, allowing lower-priority workloads to still run on that same box, but restricting them so they can't take more than a certain set of resources. So Resource Governor is a great technology for keeping your lower-priority workloads from impacting the SLAs of your most critical workloads.
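The reservation idea behind that 80% example can be sketched as follows. The function and numbers are illustrative only; real Resource Governor is configured with T-SQL resource pools and workload groups, not Python.

```python
def allocate_cpu(core_reservation, low_priority):
    """Guarantee the core workload a fixed CPU share; split the rest."""
    shares = {"core": core_reservation}
    leftover = 1.0 - core_reservation
    for name in low_priority:
        shares[name] = leftover / len(low_priority)
    return shares

shares = allocate_cpu(0.80, ["reporting", "ad hoc queries"])
# The core workload keeps its 80%; each low-priority workload gets ~10%.
print(shares["core"])
```

The point is that the low-priority workloads contend only for the leftover capacity, so a burst of reporting queries cannot push the core workload below its reservation.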
That's a bit of a flash introduction to SQL Server; now I'll hand it over to Scott to talk about Exchange.
Scott: My name is Scott Schnoll. I'm a Principal Technical Writer on the Exchange team; among other things I write all the product documentation around high availability, site resilience, disaster recovery and a few other areas, so I'm really excited to talk to you about it.
I do want to tell you, though, that Exchange does things a little differently from what you've heard until now. We do use failover clustering technologies, but we don't use any shared storage, we don't use the resource model, and in fact we're more of a consumer of cluster technologies, as you'll see in a minute. We also have in Exchange a very specific definition of high availability. To have true high availability for an Exchange server you must meet three criteria: you must have service availability, data availability and automatic recovery from most failures. We say most failures because you're not going to get automatic recovery from all failures; for example, you wouldn't get automatic recovery from a data-center-level event. We have mechanisms to do manual recoveries for that, but that's not an automatic solution, and that would be a DR process, not a high availability process.
The other thing I want to tell you about is that we use this acronym called *overs a lot, and that is just our shorthand notation for switchovers and failovers. Failovers we've been talking about a lot; a failover is simply when the system takes the automatic corrective action for you, while a switchover is when an administrator manually activates, for instance, a passive copy of an Exchange database.
And then we have site resilience as well. Site resilience and HA are unified into a single platform inside Exchange 2010, for example, but they are different operations with different configurations, as you'll see here in a minute. Site resilience is that DR-type configuration that you do to protect yourself when you have multiple data centers and you want redundancy across those data centers.
Now, we actually introduced both service availability and *over capabilities way, way back in Exchange 5.5, but back in those days we were using Microsoft Cluster Server and NT 4. We were using the cluster resource model, and many of our core components were cluster-aware; Exchange knew it was being installed in a cluster, and it did something a little different from an unclustered Exchange server.
We also, back at that time, relied very heavily on third-party partner products. We didn't have any built-in data replication whatsoever, so we had no native data availability in Exchange and instead relied on hardware vendors, storage vendors and replication vendors to make copies of our data for us.
In Exchange 2007 we took a very revolutionary leap forward: we started breaking away from the old legacy way of doing Exchange clustering. We still supported the old style of Exchange clustering, where you use shared storage, but we gave that a different name; we called it a single copy cluster, to reflect that in that cluster you only had one single copy of your data. In 2007 we also introduced a second form of Exchange clustering called cluster continuous replication, and that in fact is when we introduced our continuous replication, or what we call log shipping, technology.
We actually have three different forms of continuous replication in Exchange 2007. One is local, where you're just shipping a copy of the logs to a database that's connected to the same server as your active copy. We also have cluster continuous replication, where every database you had on an active node was replicated to a passive node, and you always had them in pairs. And then we had one called standby continuous replication, which we introduced in Service Pack 1 for Exchange 2007, and what that did was allow you to replicate data pretty much anywhere: from a standalone mailbox server to another standalone server, from a cluster to a standby cluster, and so forth. In fact, as Exchange 2007 evolved and matured, it became pretty much the de facto configuration or architecture to use a combination of cluster continuous replication for high availability within the data center and standby continuous replication to get you site resilience for that data center as well.
And this is basically what it looks like: the information store in Exchange is doing what it's done since day one; it generates log files, and as those log files are closed they're copied over to the other copy of the database. They're inspected by the other copy of the database, and assuming they pass inspection they then get replayed into that copy, thereby making that copy pretty much an up-to-date, bit-for-bit duplicate of the original active copy.
Now, this is typically what it would look like in the organization topology. Here I've got two separate CCR clusters; remember, CCR was always a pair of two, an active and a passive, so I've got two separate clusters. I've got some Outlook, Outlook Web App and ActiveSync clients out there; they're going through our front-end component, called a client access server in 2007 and later, or, in the case of Outlook, going directly to the information store and talking to it. Basically we would replicate one-for-one within these pairs. If you wanted to extend that solution to another data center you used a separate technology, standby continuous replication, and that actually worked really well, but it had some challenges. It still worked: it got the data over there, you had a standby server, maybe a standby cluster, and so if you had any problem with your primary site, in this case San Jose, you could go ahead and activate the Dallas site, get your clustered mailbox server up and running, and life was good.
There were some challenges, though. When you clustered the mailbox role in 2007, it couldn't co-exist with any other server roles: the client access role, the transport role, the unified messaging role. You had to buy extra hardware for those; only the mailbox role was allowed in the cluster. That meant that, at a minimum, if you wanted high availability for Exchange 2007 you had to buy at least four servers. Another challenge was that you had to have some clustering knowledge, and that might not seem like a big deal if you've been doing it for a long time, but most of the administrators who manage Exchange solutions are Exchange pros, not cluster pros. So sometimes it was challenging for them to build the underlying cluster correctly before they would deploy Exchange. This wasn't so much true in the CCR paradigm, but it was especially true in the other type of cluster we had in 2007, the single copy cluster, where you also had to deal with the shared storage and the interconnects and getting that just right. Another challenge: even though in 2007 we supported 50 databases per server, if you had a problem with just a single database on that server you had to fail over the whole clustered mailbox server; the entire Exchange server's network identity had to be moved to another server, even if you only had one problematic database out of 50, so that wasn't very optimal.
We did introduce SCR, so people finally had a built-in way to get data replicated outside of the cluster and offsite to a different data center. But we introduced it in a service pack, and typically when we introduce major features in a service pack we don't put GUI around them. That meant that if you wanted to manage SCR you had to do it all from the Exchange Management Shell, which is a PowerShell-based console; you couldn't use the Exchange Management Console, which is an MMC snap-in where you click on pictures and such. So administrators had to learn to manage CCR one way and SCR a completely different way. And the last challenge was that even after you got the data over there, there was a pretty complex activation process to go through, with many, many steps that involved usurping the clustered mailbox server itself and forklifting it over to the recovery server. That took time, and for some administrators it was confusing because of the different technologies. So we looked at all of this and came up with a whole new solution in Exchange 2010.
And in fact Exchange 2010 is very different from anything that we've done in the past. First of all, there's no more clustered mailbox server. We don't use the cluster resource model anymore; or, put slightly differently, the cluster has no idea that we're even there, but we know the cluster is there because we use it. We use the cluster's node and membership APIs so that we can join the servers together in a group. We also use the cluster's heartbeating technology, which is very mature, proven technology that allows us to find out when servers are dropping off the network. And of course we use the cluster database, because there's data that we need to share between the members of the solution, and we need to share it very quickly, much more quickly than if we were to store it in Active Directory and wait for it to be replicated across.
So what you're seeing now is a representation of a new construct that we call a database availability group, or DAG for short. A DAG is simply a collection of mailbox servers, in this case five mailbox servers, that host replicated databases. For example, if you look at DB1, you can see that DB1 is shown in green on mailbox server 1; green means in this case that it's the active copy. Then we've got DB1 on mailbox server 2 and DB1 on mailbox server 4 in blue; those represent passive copies of the database: copies that the system keeps up to date and maintains itself, and that are waiting to become active in the case of some sort of failure affecting the active database. We also made another architectural change: now all clients, including Outlook MAPI clients, no longer connect directly to the information store. Instead they connect to a set of services on the client access server. One is the address book service, which is where they get their directory information, and the other is the RPC Client Access service, which is where they get their MAPI endpoint. So all Outlook knows is that it's got its MAPI and directory endpoints; it has no idea it's talking to a client access server rather than a mailbox server anymore.
So you can see here that I have the option to replicate databases as I see fit. It's not like CCR, where every database you have on the active node gets replicated to the passive node; it's more like SCR, in that the administrator gets to choose which databases get replicated, and to where. In this case the administrator only wanted three copies of DB1, so we spread them across mailbox servers 1, 2 and 4. Similarly, you can see that on mailbox server 1, DB1 and DB3 are both green; those are the active copies, but mailbox server 1 also hosts a passive copy of DB2. Again, this is another departure from our previous model, where you had only active instances on one server and only passive instances on another. Now you can have active and passive copies of multiple databases on multiple servers, as you see here.
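The copy layout described above can be sketched as a small table: each database has one active copy and zero or more passive copies, placed wherever the administrator chooses. The server and database names mirror the example in the talk; the data structure itself is invented for illustration.

```python
dag = {
    # database: (active server, [passive servers])
    "DB1": ("MBX1", ["MBX2", "MBX4"]),
    "DB2": ("MBX2", ["MBX1"]),
    "DB3": ("MBX1", ["MBX3"]),
}

def copies_on(server):
    """Which database copies (active or passive) does this server host?"""
    hosted = []
    for db, (active, passives) in dag.items():
        if active == server:
            hosted.append(f"{db} (active)")
        elif server in passives:
            hosted.append(f"{db} (passive)")
    return hosted

# MBX1 hosts active copies of DB1 and DB3 plus a passive copy of DB2 --
# active and passive copies mixed on one server, unlike CCR's strict pairs.
print(copies_on("MBX1"))
```

Note that failover granularity is now the database, not the server: any one entry in this table can change its active server without touching the others.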
Changing to this model changed everything from a failover perspective, because we don't have a clustered mailbox server anymore and we don't have a network identity to move anymore. We now only have to move the designation of the active copy, and I say move the designation of the active copy because we're not really picking up a database and moving it. All we're saying is: you're active now, you're passive now; you had a problem, so you're active now and you're passive now. It's that simple. Failover is now managed completely within Exchange, because there is no cluster resource model. If you open up Failover Cluster Manager on a mailbox server that's a member of a DAG and you look under services and applications, you're not going to see anything. There's no Exchange group, there are no Exchange resources, no IP addresses, no storage groups, no databases, no information store, no system attendant, nothing; we don't use the cluster resource model anymore. That means, though, that we had to have some mechanism to handle failover within Exchange; in previous versions, if we had a problem, the cluster moved that resource to another node for us. Now we have a whole brand new component inside Exchange called Active Manager, and Active Manager runs in a key service on these mailbox servers called the Microsoft Exchange Replication service. It's the same service we introduced in 2007 to do log shipping for CCR and SCR, but now there's a new component that runs inside that service, called Active Manager, that's the brain of the Exchange solution. Active Manager is not only responsible for managing everything, it's also responsible for initiating the corrective action when some sort of failure occurs. Say, for instance, the disk hosting DB1 just dies; we're not using RAID in this case, so the disk dies and the database is gone with it. Active Manager detects that and will automatically fail over the active copy to one of the other passive copies, whichever one it believes to be the best, most up-to-date, healthy copy.
I mentioned that all clients connect via CAS, so the system works somewhat like this: I've got a client out there; it might be Outlook, might be Outlook Web App, might be ActiveSync, we don't know, it's just a client accessing and getting messages. There's an Active Manager client that also runs inside the client access server, and it knows where the user's database is located, so users only connect to CAS, and it's CAS that talks RPC/MAPI to the information store. Users don't talk to the information store directly anymore; we've abstracted the user connection away from the information store so that we can get fast failover when one of these databases has to fail over.
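The "best copy" decision Active Manager makes can be sketched as follows. The fields and the single-criterion scoring are a simplification invented for illustration; the real selection algorithm weighs several health and copy/replay-queue criteria.

```python
def pick_best_copy(passive_copies):
    """Among healthy passive copies, activate the most up-to-date one."""
    healthy = [c for c in passive_copies if c["healthy"]]
    # Fewest log records not yet copied/replayed = most up to date.
    best = min(healthy, key=lambda c: c["copy_queue_length"])
    return best["server"]

copies = [
    {"server": "MBX2", "healthy": True,  "copy_queue_length": 0},
    {"server": "MBX4", "healthy": True,  "copy_queue_length": 12},
    {"server": "MBX5", "healthy": False, "copy_queue_length": 0},
]
print(pick_best_copy(copies))  # MBX2
```

Because only a designation changes, the chosen copy can be marked active immediately, which is what makes the sub-30-second failover described next possible.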
So messages come in, they go to the appropriate database and then the log files representing those
messages get replicated to the copies of that database these are using message icons, its not the
actual messages that we replicate its the actual transaction log files generated by the Exchange
database engine itself that gets replicated.
So if we have a failure affecting database 1, database 1 disappears for whatever reason maybe its
the storage, maybe its some sort of corruption we dont know, database 1 is gone, what happens in
30 seconds or less and you can see under mailbox server #2, the copy of DB1 has now gone green, in
30 seconds or less a new active copy replaces the failed active copy by choosing the best available
passive copy to activate. So in this case the system decided that the best copy was on mailbox server
2, notice the client still stays connected the CAS, even though their underlying database went away,
CAS understands whats going on because of the active manager clients, so active manager says no,your database is now over here, Im going to connect CAS to mailbox server 2 and the client is back
in business.
And because it happens so quickly, it's quite possible, and more often than not the case, that clients don't even
notice any of this happened. They're abstracted away from it, so as their database goes away
they don't get disconnected. They might get disconnected if the CAS server goes away, but we'll talk
about load balancing and how to deal with that in a minute. Assuming CAS doesn't go away and
it's just a failure of the mailbox server, the mailbox server's network, or the mailbox server's disks,
that's going to be a transparent failover to the client; they're probably not going to notice
anything. Now, if they happen to be in the middle of Outlook Web App, they're composing
something and they go to hit send, and in the middle of hitting send a failover occurs, they will get a
message saying that their mailbox is temporarily unavailable. But if they just wait a few seconds and
press F5 to refresh the browser, it will bring them right back into their mailbox and they won't
even have to log on again; it's that fast. And of course mail flow will continue because Active
Manager knows where the database is located, and replication will continue as long as you have
multiple copies left.
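The "best available passive copy" selection described above can be sketched very roughly in Python. The real Active Manager algorithm weighs many more criteria (activation preference, content index health, replay queue length, and so on); this sketch assumes only two illustrative fields, copy health and copy queue length, and the server names are made up for the example.

```python
# Simplified sketch of best-copy selection during a database failover.
# Fields and server names are illustrative, not the real Active Manager API.

def select_best_copy(copies):
    """Pick the healthy passive copy with the fewest unreplicated log files."""
    healthy = [c for c in copies if c["healthy"]]
    if not healthy:
        return None  # no copy can be activated; the outage persists
    # Prefer the copy that is most caught up (smallest copy queue).
    return min(healthy, key=lambda c: c["copy_queue_length"])

copies = [
    {"server": "MBX2", "healthy": True,  "copy_queue_length": 2},
    {"server": "MBX3", "healthy": True,  "copy_queue_length": 7},
    {"server": "MBX4", "healthy": False, "copy_queue_length": 0},
]
best = select_best_copy(copies)
print(best["server"])  # MBX2: healthy and most caught up
```

The point of the sketch is simply that an unhealthy copy is never chosen, and among the healthy copies the one with the least outstanding replication wins.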
23
Now, each DAG, and I'm showing a five-member DAG here, can go up to sixteen members, so
you can have sixteen copies of all your databases, and each Exchange server itself supports 100 databases, so you can have 1,600 databases inside a single DAG. Of course those would be non-replicated
databases, but you can have 800 databases where you've got 2 copies of each, 533 databases with 3
copies of each, and so forth. Also consider that our maximum recommended database size is
now 2 terabytes per database, so you can grow this very, very large; it scales incredibly well. In case
you're wondering how well, this solution is what's running Outlook.com, Office 365 and so forth. So
we're talking 75 million, almost 80 million mailboxes on this solution. The beauty of it is that the same
exact commands you use to create the DAG inside a data center are the same exact ones you would use
to extend it to another data center to put yourself in a site resilience configuration; it's that easy.
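The capacity numbers above follow from simple arithmetic: 16 members times 100 databases per server gives 1,600 database slots, and adding copies divides the number of unique databases. A quick sketch of that math:

```python
# Back-of-the-envelope DAG capacity math from the figures above:
# up to 16 members per DAG, up to 100 databases per server.
MAX_MEMBERS = 16
MAX_DBS_PER_SERVER = 100

def max_databases(copies_per_db):
    """Unique databases a fully populated DAG can host at a given copy count."""
    total_slots = MAX_MEMBERS * MAX_DBS_PER_SERVER  # 1600 database slots
    return total_slots // copies_per_db

print(max_databases(1))  # 1600 non-replicated databases
print(max_databases(2))  # 800 databases with 2 copies each
print(max_databases(3))  # 533 databases with 3 copies each
```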
24
Now, I mentioned the DAG, and this is basically, again, what the architecture would look like: clients
are talking to the Client Access Server, they're talking to the RPC Client Access service and the Address
Book service, and it's the Active Manager component that is telling those services where the user's
mailbox is located so that CAS can talk to the mailbox server for them.
25
So a DAG is simply a set of servers, up to sixteen, that host a set of replicated databases. You can
have multiple DAGs in a single org; obviously, if you need more than 16 members you have to use a second DAG, and so forth. We do leverage the Windows failover cluster technologies,
but we're not cluster-aware and we don't use the cluster resource model, and the DAG itself defines
the boundary for replication, so you won't be replicating outside the DAG.
26
And as we mentioned, we did add a second form of continuous replication in Service Pack 1, so
let me briefly talk about that.
27
So now I have two forms. In 2007 and in 2010 RTM we had one form of continuous replication,
a form of log shipping where we're shipping closed transaction log files. The active copy in
green would create the log files, and then the passive copy would say, hey, send me your latest log
files, I've got these so far. The latest log files would go across, and if the passive copy is able to keep
up and catch up with log generation activity on the active copy, which in this case it would, because
it now has log five, which was the last one generated, the system says, you know what, the database
copy is up to date. Now the system switches into block mode, and instead of shipping those
transaction log files we actually ship blocks of ESE transactions as they're being written to the log buffer.
So we write to the log buffer on the active side and at the same time we send that information over
to a corresponding buffer on the passive side and keep things up to date. Now, all continuous
replication is asynchronous, so we don't wait for acknowledgement from the other side. So there is potential for data loss, but we've got other mechanisms built into the system to get that data back. But
again, now we have the ability to replicate blocks; we don't have to wait for a transaction log file to
be closed in order to externalize that data, which means that the amount of losable data has
substantially decreased with Service Pack 1 as a result of databases being able to leverage block
mode. And of course once the buffer is full, it generates the corresponding log file on each side
separately; it's built and inspected separately, and of course replayed into the copy of the database after
that. We also have a mechanism whereby, if we only get a partial buffer and then the active copy
goes away, we'll actually use that. We'll take that, we call it a log fragment, and we'll convert it into a
full log, and if there are usable transactions in there, we'll play those transactions against the database
and at least recover the data that we were able to get over.
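The difference between the two modes comes down to when data is externalized off the active copy. A toy sketch of the worst-case loss window, assuming (as a simplification) that block mode has fully streamed the open log's writes and ignoring in-flight network latency:

```python
# Toy model of file mode vs block mode continuous replication.
# In file mode, data leaves the active copy only when a log file closes;
# in block mode, log buffer writes are streamed to the passive copy as
# they happen, so the losable window shrinks dramatically. This ignores
# the small amount of data still in flight on the wire at failure time.

LOG_FILE_SIZE = 1024 * 1024  # Exchange transaction log files are 1 MB

def losable_bytes(bytes_in_open_log, mode):
    """Worst-case unreplicated bytes if the active copy fails right now."""
    if mode == "file":
        return bytes_in_open_log  # the whole open log has not shipped yet
    elif mode == "block":
        return 0                  # blocks were streamed as they were written
    raise ValueError(mode)

print(losable_bytes(700_000, "file"))   # 700000 bytes at risk
print(losable_bytes(700_000, "block"))  # 0 (modulo in-flight data)
```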
We also have this concept of a lagged database copy, and this is a database copy for which you have the
ability to delay replay of log files for up to 14 days. So think of it as a point-in-time backup of your database, up to 14 days old. Don't go beyond 14 days; we have a hard-coded limit of 14 days for the
lag. It's basically there to provide you with a maximum of 14 days of protection against things like
logical corruption. If you have physical corruption in your store, that's not going to be a problem,
because continuous replication will detect that and block physical corruption from being
replicated to another database. For logical corruption there's no way for the system to tell, and
so as a fallback mechanism you have the ability to delay replay into a passive copy, so that if you do
detect logical corruption from an end user, you can go and activate a copy at a point in
time before that corruption took place. And of course lag copies will affect your storage design, where
you're holding those log files, so you are going to need to size them appropriately, but that's something else
we have as a protection.
29
Now, load balancing has changed a little bit with Exchange 2010 as well. In 2007, most customers were
used to doing load balancing for the reverse proxies so that traffic coming from the internet would get load balanced and not overwhelm a single reverse proxy. As a result of the architectural change
we made, where Outlook now connects to the Client Access Server instead of the information store,
you need to have a form of RPC load balancing that you can use for your Outlook clients so that all of
your Outlook clients aren't going to a single CAS server. So you will need a load balancer now, and it
will have to be an RPC load balancer, which means something like Windows Network Load
Balancing won't be able to handle that for you. It's got to be RPC load balancing, and the RPC load
balancer has to not only support RPC but also support affinity as well. This catches some
customers off guard because it's a new requirement we never had in Exchange before, so be aware of
that when you talk to customers who are migrating from 2003 or 2007 to 2010.
And the last thing we have is that Exchange does support backup and recovery; obviously that's disaster
recovery, not high availability. We used to support both the ESE streaming backup APIs and the VSS
APIs, but because we're dealing with much larger data sets now, the ESE streaming APIs just weren't
going to do the job, and so we cut them from Exchange 2010.
30
We now support only VSS-based backups, but the good news is we ship a plug-in for Windows Server
Backup in the box, so if you just want a basic VSS backup of your databases you get that in the box with Exchange; you don't have to buy other products.
If you want something more full-featured, that's when DPM or any other Exchange-aware third-party
VSS solution would work for you.
We also have some other DR technologies. One is called a recovery database; it's basically an object into
which you can restore a database and from which you can extract data. We also have this
concept of database portability, where you can take any Exchange 2010 database and have it
moved to any other Exchange 2010 server inside the org, so even if you didn't replicate it you can
pick it up and forklift it over to somewhere else.
The last thing we have is dial tone portability. If you do have a failure affecting your only database copy, you at least have the ability to spin up what we call a dial tone database. It's an
empty database that just allows users to send and receive mail. It doesn't have all their historical
data in it, it's empty, but it gives them dial tone so they can at least send and receive messages while
you're restoring their data in the background.
31
Thank you, Scott. Now I'm going to continue and talk about some of the other mission-critical servers
from Microsoft and their high availability solutions.
32
http://support.microsoft.com/kb/957006
First of all, think about virtualization. Virtualization is one of Microsoft's key investments with the Hyper-V platform, and all teams now test their products on Hyper-V. Microsoft has what's called the
Common Engineering Criteria, which is a series of guidelines that each engineering team must
follow to ensure their applications are enterprise-ready. One of the key tenets of this guide is to test
on Hyper-V to make sure that the application has equivalent performance and equivalent resiliency. You can
actually go online and check out KB article 957006; it's kept up to date as to which versions
of the various Microsoft products are supported in a Hyper-V
environment. And when we think of Hyper-V, think Hyper-V with failover clustering, meaning that we
can run all of these application services inside a VM guest and the VM itself is clustered.
33
The next major application is the file server, and the file server of course manages your storage, your
shares, replication, and search and indexing. Traditionally file servers use failover clustering; this is the default configuration, and you can have multiple file servers on a failover cluster.
DFS Replication is another technology which is part of the file server role, and DFS Replication can be used as
a high availability technology in that it allows you to push information from one server to
another server, so in theory it's in multiple locations. So if a primary server crashes or becomes
unavailable, you can recover the information from a secondary location. Now, you can do this within
a single site, within a data center, within a group of servers, or you could do this across multiple sites,
so this can build up into a disaster recovery solution if you have multiple data centers. Replication
also gives you the ability to access offline files, so if you use the Offline Files feature you're actually
using some DFS Replication on the back end to push updates and keep your local copy of all of these versions current. Now, one of the key things to keep in mind is that replication only happens when a
file is closed, so this can give you pretty good availability if you're working on something such as a
Word document or an Excel spreadsheet, but if we extend this concept to the enterprise, it doesn't
do a great job of replicating things which keep their files open. For example, a virtual machine's VHD
file or a SQL database: these types of resources are kept open indefinitely, and really they're only
closed when they're taken offline. Now, if these are kept open and replication hasn't happened, then
potentially you could lose all of the data, all the information which has been collected since the last
replication. And for this reason, failover clustering does not support DFSR as a replication technology,
since it is possible that some data could be lost. Additionally, you have to keep in mind that there
could be replication conflicts: if you have multiple people working on the same document
simultaneously in two different locations, when replication happens there could be some synchronization
conflicts which need to be resolved. Nevertheless, it is a great true in-box solution
to give you some level of high availability in your data center.
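The replicate-on-close behavior described above is the crux of why DFSR suits documents but not always-open files. A toy model, with made-up byte counts purely for illustration:

```python
# Toy model of DFS Replication's replicate-on-close semantics: changes to
# a file become eligible for replication only after the file is closed, so
# writes to a file that is never closed (a VHD, a SQL database) stay
# unreplicated and are lost if the primary server fails.

class ReplicatedFile:
    def __init__(self):
        self.local_bytes = 0       # written on the primary server
        self.replicated_bytes = 0  # what the secondary server has

    def write(self, n):
        self.local_bytes += n      # change exists only on the primary

    def close(self):
        # Replication is triggered only when the file closes.
        self.replicated_bytes = self.local_bytes

doc = ReplicatedFile()
doc.write(500)
doc.close()                # a Word document: saved and closed
vhd = ReplicatedFile()
vhd.write(500)             # a VHD: held open indefinitely, never closed

print(doc.replicated_bytes)  # 500: safe on the secondary
print(vhd.replicated_bytes)  # 0: lost if the primary crashes now
```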
34
Lync Server is one of the extensions of the unified communications server; it basically covers all types
of messaging, including IM, voice and video, and content sharing over live streaming mediums. Lync Server has a high availability architecture which is relatively flexible. The core is using load balancers
to connect people to a registrar, so when a user wants to connect to the Lync server, they're
going to get sent to a registrar. Now, there is a requirement to use hardware load balancers
for this registrar, and NLB, Microsoft's Network Load Balancing, is explicitly not supported. The
registrars themselves actually have access to what's called a backup registrar pool, so if the
primary registrar is unavailable when a client connects, or it crashes, the client will get sent to the
backup registrar. So from their perspective they may be disconnected temporarily, but their
transaction will be recovered and they can continue staying online. There is also DNS load balancing
available for other types of network traffic from Lync Server, so this gives you kind of the basic forms of high availability just by distributing incoming clients to different registrars or different Lync Server
components, to help distribute the traffic and make sure that one specific server or one specific
component isn't overloaded with too many client connections.
Now, there are partners which deliver what are called SBAs, or survivable branch appliances. These
are essentially customized unified communication appliances which contain a subset of all of the
Lync functionality, so by connecting to these SBAs, which are generally in a branch office, we can keep
a client connected and keep them using some of the basic communication and collaboration
tools, though they do have limited ability to use the full functionality of Lync Server. By having an SBA in a
branch office, we can keep our client up and running even if they cannot connect to their primary
data center.
35
Lync also has two multi-site high availability solutions, which are called Data Center Resiliency and
Metropolitan Data Center Resiliency. Data Center Resiliency allows us to spread our Lync
servers across multiple physical locations, and we can even have high availability for voice
communication, so if somebody's on a phone call and the primary data center crashes, we can actually
fail over to the secondary without dropping the call. Specifically, the high availability is built for voice
failover, so it is possible that other types of transactions, such as IM communication, could be
temporarily lost if a failover happens.
The more advanced version of this is what's called Metropolitan Data Center Resiliency, and with this
you have an active/active configuration with continual replication between the sites at the hardware level. The reason why this is called metropolitan is that it's generally going to be deployed within a
specific city, meaning that the distance across which you can stretch the sites is limited to a
few miles or a few dozen miles. This does give you higher availability since it is an active/active
configuration, so you'll have better resilience, but the distance between the data centers can be limited.
The final Lync Server high availability solution is simply backup and restore. If you lose information, or it
crashes, you can pull it back using an expedited service restoration process, which can be a workflow
that is pre-programmed or pre-orchestrated to help recover as quickly as possible.
36
http://blogs.msdn.com/b/joelo/archive/2007/03/09/sharepoint-backup-restore-high-availability-and-disaster-recovery.aspx
SharePoint Server is Microsoft's web platform for all types of collaboration and document
management. It's primarily built around a database where all of this shared content is stored, and
this database can be made highly available using SQL: it can use SQL backup and restore, database
mirroring, or log shipping, and this database can be protected using System Center Data Protection
Manager, or DPM. DPM has some nice integration points with SharePoint because it gives you the
ability to granularly restore specific objects, so if you lost a specific document you could go and just
recover that specific document rather than having to restore the whole database.
Two of the additional SharePoint servers, the Crawl or Index server and the Search or Query server, are deployed in a redundant topology, meaning that there are multiple instances of them
available throughout the infrastructure, and if one of them is unavailable, clients will simply be
reconnected to another one to help speed up the indexing or their search queries.
SharePoint does have a rich, front-end, web-based interaction experience, where clients or users
actually browse documents or collaborate on the site using a web front end, and this is
made highly available using Windows Network Load Balancing. NLB will distribute the traffic
across these multiple front-end servers to ensure that a single server is not overloaded.
An additional high availability feature which is unique or customized for SharePoint is the Recycle Bin,
and this gives you the ability to simply recover items that were accidentally deleted. So if you lose any
type of file, list, or application, by default they are still saved for 30 days before they're permanently
deleted, to give people the opportunity to recover any documents that were accidentally removed.
37
As we talked about with the web server for SharePoint, a lot of this is built on Microsoft's web server,
known as IIS. IIS has a rich series of clients and a rich topology to handle all the different kinds of web
services, from file transfers to actually serving websites.
Network Load Balancing is used for most of the web server roles with IIS. This means that when a
client tries to connect to anything, they can go through NLB and be load balanced across multiple
servers; additionally, hardware load balancing can be used. Now, Network Load Balancing does its
load balancing at layers 2-3 of the networking stack. However, with IIS and the web server,
there are often load balancing requirements at layer 7, which is where the HTTP traffic lives, and so a
load balancer at this level actually looks at the URL, such as http://microsoft.com, and it will load
balance traffic based on what is contained within that URL; it will load balance the HTTP traffic. And
it does this by what's called an Application Request Routing server, or ARR. ARR essentially contains
the logic to do the load balancing for this layer 7 traffic. However, the ARR server itself needs to be
made highly available so that it's not a single point of failure, and the ARR server can use Network
Load Balancing to be deployed in redundant arrays. So at the front end you're going to have ARR
with Network Load Balancing. This is going to not only give you traffic load balancing at layers 2 and 3,
but it's then going to figure out at layer 7 where it should redirect the clients among the content
servers, and then you have the middle tier which will actually go and serve the content up.
Additionally, IIS has high availability for two of its roles using failover clustering: the FTP role
and the WWW role. There are white papers out there that will actually show you how to explicitly
configure these roles on a Windows Server failover cluster, so that a client access point for
anyone trying to connect to either of these roles is always available and can move between different
nodes in the failover cluster.
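The contrast between the two load balancing layers can be sketched in a few lines: NLB-style balancing distributes connections without ever inspecting them, while an ARR-style layer 7 router picks a backend based on the request URL. The farm names and routing rules below are made up for illustration; this is not real NLB or ARR configuration.

```python
# Sketch of layer 2-3 vs layer 7 load balancing. Farm names and routing
# rules are illustrative only, not real NLB or ARR configuration.
from urllib.parse import urlparse

def nlb_pick(client_ip, servers):
    """Layer 2-3 style: spread clients across the farm by address hash,
    without looking inside the traffic."""
    return servers[hash(client_ip) % len(servers)]

def arr_pick(url, rules, default):
    """Layer 7 style: inspect the request URL and route on its path."""
    path = urlparse(url).path
    for prefix, farm in rules:
        if path.startswith(prefix):
            return farm
    return default

rules = [("/images", "static-farm"), ("/api", "app-farm")]
print(arr_pick("http://microsoft.com/images/logo.png", rules, "web-farm"))
# static-farm
print(arr_pick("http://microsoft.com/default.aspx", rules, "web-farm"))
# web-farm
```

Putting NLB in front of a redundant ARR array, as described above, combines both behaviors: connections are spread across the ARR servers at layers 2-3, and each ARR server then makes the layer 7 routing decision.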
38
As we review this section, we've covered quite a lot of the core servers and core applications from
Microsoft. As we know, downtime is inevitable, so not only is it important to keep our infrastructure up and running, but it's even more important to keep our applications up and running. While most of the
technologies and servers can use virtualization or network load balancing, many of them have
unique and specific dependencies on failover clustering. And as we've seen with Exchange Server as
well as SQL Server, they both use failover clustering as one of the underlying technologies, yet they
abstract a lot of the management and functions specific to SQL and Exchange away from clustering,
so while they might use the cluster for membership or for health checking, the rest of the
functionality is unique to Exchange and to SQL.
We hope that you found this module on application high availability useful, and check out part three
of this series, which will look at management high availability.
39
This video is a part of the Microsoft Virtual Academy.
Thank you.