high availability and disaster recovery topologies - omf canberra june 2014

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

High Availability and Disaster Recovery Topologies

Damien McAullay
Oracle Fusion Middleware
June 2014


Business Continuity Planning (BCP)

• How to keep your business running in the event of a disaster
  – It’s about the business, not the means
  – Does not necessarily involve IT (but in the 21st century it usually does)


Disaster recovery

• The “IT” part of BCP

• How do I recover my data, configurations, …?
• Where do I restore my data to?
• How can my users get access to the recovered environments?


Some data recovery strategies

Take periodic backups to local media and store the media offsite

Replicate data to another site

Back up directly to offsite storage

Replicate data to the cloud


High availability

Availability              Downtime per year
90% (one nine)            36.5 days
99% (two nines)           3.65 days
99.9% (three nines)       8.76 hours
99.99% (four nines)       52.56 minutes
99.999% (five nines)      5.26 minutes
99.9999% (six nines)      31.5 seconds
99.99999% (seven nines)   3.15 seconds

• Determined by uptime
  – availability = (total time − downtime) / total time

• Target availability is often expressed as a number of “nines”
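The availability formula can be turned around to give a downtime budget for each class of “nines”. A minimal Python sketch, assuming a 365-day year; the function names are illustrative and not taken from any product:

```python
# Sketch: convert an availability target into an annual downtime budget,
# using availability = (total time - downtime) / total time.
# Assumes a 365-day year.

HOURS_PER_YEAR = 365 * 24


def downtime_per_year_hours(target_percent: float) -> float:
    """Maximum downtime (hours per year) allowed by an availability target."""
    return (1.0 - target_percent / 100.0) * HOURS_PER_YEAR


def availability_percent(total_hours: float, downtime_hours: float) -> float:
    """Measured availability as a percentage of total time."""
    return (total_hours - downtime_hours) / total_hours * 100.0


if __name__ == "__main__":
    for target in (90, 99, 99.9, 99.99, 99.999, 99.9999, 99.99999):
        hours = downtime_per_year_hours(target)
        print(f"{target}%: {hours:.4f} h/year ({hours * 60:.2f} min)")
```

Each extra nine divides the downtime budget by ten, which is why the cost of HA rises so steeply.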


High availability

• It is often practical to pre-define planned downtime (e.g. maintenance windows, or periods when users are inactive, such as public holidays)

• Three key aspects:
  – No single points of failure
  – Reliable switching mechanism(s)
  – Capability to detect failure, recover or bypass, and alert a technician
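How pre-defined downtime is treated changes the headline availability figure. The sketch below assumes one common SLA convention, in which agreed maintenance windows are removed from the measurement period so that only unplanned outages count; the function names and sample figures are hypothetical.

```python
# Sketch, assuming a common SLA convention: pre-defined maintenance windows
# are excluded from the measurement period, so only unplanned outages count.


def availability_excluding_planned(total_hours: float,
                                   planned_hours: float,
                                   unplanned_hours: float) -> float:
    """Availability (%) over the hours the service was committed to be up."""
    committed = total_hours - planned_hours
    if committed <= 0:
        raise ValueError("planned downtime must be shorter than the total period")
    return (committed - unplanned_hours) / committed * 100.0


def availability_including_planned(total_hours: float,
                                   planned_hours: float,
                                   unplanned_hours: float) -> float:
    """Availability (%) if every outage, planned or not, counts as downtime."""
    return (total_hours - planned_hours - unplanned_hours) / total_hours * 100.0


if __name__ == "__main__":
    # Hypothetical year: a 4-hour maintenance window each month, 3 hours of outages.
    args = (365 * 24, 12 * 4, 3)
    print(f"excluding planned: {availability_excluding_planned(*args):.3f}%")
    print(f"including planned: {availability_including_planned(*args):.3f}%")
```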


WebLogic

Example scenario: WebLogic web-based application, database, and internal/external users

[Diagram: WebLogic Server (WLS)]


Eliminate SPOF

Add a second WebLogic server to the cluster

Add a second WebLogic node (host) to the cluster

Add a second database server

Add load balancers to distribute load across the WLS and database servers

[Diagram: WLS cluster and database servers behind load balancers (LB)]
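Why duplicating each tier helps can be shown with standard redundancy arithmetic. This sketch assumes independent failures and instant, perfect switching between copies, and uses a hypothetical 99% availability per server:

```python
# Sketch of why removing single points of failure helps, assuming components
# fail independently and switching between redundant copies is instant.
from math import prod


def parallel(availability: float, copies: int) -> float:
    """Availability of `copies` redundant instances where any one suffices."""
    return 1.0 - (1.0 - availability) ** copies


def series(*tiers: float) -> float:
    """Availability of a request path where every tier must be up."""
    return prod(tiers)


if __name__ == "__main__":
    # Hypothetical per-server availability of 99% for both WLS and the database.
    single = series(0.99, 0.99)                 # one WLS, one DB: two SPOFs
    redundant = series(parallel(0.99, 2),       # two WLS servers
                       parallel(0.99, 2))       # two database servers
    print(f"single instances: {single:.4%}, duplicated tiers: {redundant:.4%}")
```

Perfect switching is the assumption doing the work here, which is why reliable switching is the next requirement. A single load balancer in front of a redundant tier would itself become a new SPOF.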


Reliable switching

Use a network load balancer (e.g. F5 or CSM) to distribute requests across the WLS servers

Use an Active GridLink data source in WLS to connect to the clustered RAC database

[Diagram: F5 load balancer in front of the WLS cluster; WLS connects to the RAC database through Active GridLink]
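F5 and Active GridLink implement their own, much richer balancing and failover logic. Purely to illustrate distributing requests while bypassing failed members, here is a toy round-robin balancer; the class and host names are hypothetical.

```python
# Toy health-aware round-robin balancer (illustration only; not how F5 or
# Active GridLink are implemented). Member names are hypothetical.
from typing import Iterable, List, Set


class RoundRobinBalancer:
    def __init__(self, members: Iterable[str]) -> None:
        self.members: List[str] = list(members)
        self.healthy: Set[str] = set(self.members)
        self._next = 0

    def mark_down(self, member: str) -> None:
        """Called when a health check fails: stop sending requests here."""
        self.healthy.discard(member)

    def mark_up(self, member: str) -> None:
        """Called when a member recovers: put it back into rotation."""
        if member in self.members:
            self.healthy.add(member)

    def pick(self) -> str:
        """Return the next healthy member in rotation, skipping failed ones."""
        for _ in range(len(self.members)):
            member = self.members[self._next]
            self._next = (self._next + 1) % len(self.members)
            if member in self.healthy:
                return member
        raise RuntimeError("no healthy members available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["wls1", "wls2", "wls3", "wls4"])
    lb.mark_down("wls2")
    print([lb.pick() for _ in range(4)])  # ['wls1', 'wls3', 'wls4', 'wls1']
```

With wls2 marked down, traffic keeps flowing to the remaining members, and wls2 rejoins the rotation as soon as it is marked up again.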


Monitoring, recovery, and alerting

Use Oracle Enterprise Manager (OEM) to monitor WLS and the database, and push metrics into your service desk platform

Use OEM or scripts to remediate common, known problems (e.g. restart WLS after an out-of-memory error)

Notify technicians of outages, performance degradation, etc.

[Diagram: F5, WLS cluster, and RAC database monitored by Oracle Enterprise Manager, which feeds Your Service Desk Platform]
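A minimal sketch of the “restart WLS on OOM” remediation plus a performance-degradation alert. The `restart` and `alert` callables are hypothetical placeholders for an OEM corrective action or site script and a service desk notification; no real OEM API is assumed.

```python
# Sketch of "known problem" remediation and degradation alerting. The restart
# and alert callables are hypothetical stand-ins; no real OEM API is used.
from typing import Callable, Iterable, Sequence

OOM_MARKER = "java.lang.OutOfMemoryError"  # error class the JVM reports on OOM


def remediate_oom(log_lines: Iterable[str],
                  restart: Callable[[], None],
                  alert: Callable[[str], None]) -> bool:
    """Restart the server and alert a technician if the log shows an OOM."""
    for line in log_lines:
        if OOM_MARKER in line:
            restart()
            alert(f"WLS restarted after out-of-memory error: {line.strip()}")
            return True
    return False


def check_latency(samples_ms: Sequence[float],
                  threshold_ms: float,
                  alert: Callable[[str], None]) -> bool:
    """Alert when the mean response time exceeds a threshold."""
    if not samples_ms:
        return False
    mean = sum(samples_ms) / len(samples_ms)
    if mean > threshold_ms:
        alert(f"Mean response time {mean:.0f} ms exceeds {threshold_ms:.0f} ms")
        return True
    return False
```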


Fast disaster recovery

[Diagram: WLS clusters and RAC databases replicated across data centres DC1 and DC2]
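One common pattern for a two-data-centre topology is active-passive failover. The sketch below assumes DC1 is primary, fails over to DC2 after a hypothetical threshold of three consecutive failed health probes, and deliberately leaves failback as a manual step; none of these choices come from the original slide.

```python
# Sketch of active-passive site failover between two data centres.
# Assumptions (not from the slide): DC1 is primary, failover happens after
# `threshold` consecutive failed probes, and failback is a manual decision.


class SiteSelector:
    def __init__(self, primary: str, standby: str, threshold: int = 3) -> None:
        self.primary = primary
        self.standby = standby
        self.threshold = threshold
        self.consecutive_failures = 0
        self.active = primary

    def record_probe(self, primary_ok: bool) -> str:
        """Record one health probe of the primary site; return the active site."""
        if self.active == self.standby:
            return self.active  # already failed over; stay put until failback
        if primary_ok:
            self.consecutive_failures = 0  # a single success resets the count
        else:
            self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.active = self.standby
        return self.active

    def fail_back(self) -> None:
        """Manual failback to the primary once it has been verified."""
        self.active = self.primary
        self.consecutive_failures = 0


if __name__ == "__main__":
    selector = SiteSelector("DC1", "DC2")
    for ok in (True, False, False, True, False, False, False, True):
        print(ok, "->", selector.record_probe(ok))
```

Requiring several consecutive failures avoids failing over on a single lost probe; keeping failback manual avoids flapping between sites.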


Maintenance

• Server patching

[Diagram: upgrading from version 1.0 to 1.1]

• Deploying app updates

Closing advice
• Change control
  – Make sure you can identify the current “state” of your deployments

• Practice makes perfect
  – The more often you rebuild your environments, the better you’ll perform on race day
  – Private-cloud-style provisioning and continuous integration (CI) encourage practices that are useful for DR

• Don’t reinvent the wheel
  – Use Oracle Maximum Availability Architecture (MAA) as a starting point

• Start with your BCP
  – HA/DR is not cheap, so implement only what your BCP requires
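For the change-control advice, one lightweight way to identify the current state of a deployment is to record it as structured data and fingerprint it, so two environments (or the same one over time) can be compared at a glance. The keys used here are hypothetical; any description of versions, patches, and configuration would do.

```python
# Sketch: a stable fingerprint of a deployment description.
# The state keys shown in the usage line are hypothetical.
import hashlib
import json


def deployment_fingerprint(state: dict) -> str:
    """SHA-256 of a canonical JSON encoding, independent of key order."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(deployment_fingerprint({"app_version": "1.1", "wls_patch": "example"}))
```

Storing the fingerprint alongside each change record makes drift between production and the DR site easy to spot.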

