Azure Architecture Patterns change IT approaches to solutions
Ulrich (Uli) Homann
Partner Software Architect, Microsoft
DCIM-B214
Cloud Data Center – Chicago Designed 2007/ Opened 2009
Generation 2 Deployment (SLA 99.999)
Generation 3 Deployment (SLA 99.9)
Physical Redundancy: N+2, Tier 3
Software Geo-Redundancy: Active/Active nodes – geo-distributed
Raised Floor IT: Servers · Storage · Network
Container: DC in a box
3x9s
Enterprise Architecture Service Architecture
Mainframe
N/S Traffic
E/W Traffic (active/active)
                       Enterprise IT     Cloud-scale
Seats                  10,000            1,000,000,000
Talent                 Custodians        Designers
Budget                 Fixed Cost        Rates
Architectures          Many              Few
App Integration        Loose             Tight
Infrastructure         Overhead          Enabler
Reach                  Regional          Global
Cost/Mb                $1.74M            $0.026M
Network $/server       >$200             <$200
Hardware               Custom            Commodity
Availability           Infrastructure    Service
Operability            MTBF              MTTR
Reliability            Hardware          Software
Network Downtime       Impacting         Irrelevant
Network Availability   99.9999%          99.9%
Design                 Primary/Backup    Active/Active
Speed                  Speed             Performant
Deployment Time        Weeks             Minutes
From the enterprise to the cloud
Changing Behavior with Microsoft Tools
SCRY
Microsoft’s SCRY measurement tool aligns actual resource use with the chargeback model
Tracking Carbon
Tracking Utilization
From Allocating by Space… To Allocating by Power
Tracking Power
Billing & Cost Allocation
What Does Moving to an Online Service Mean?
DEPLOYMENT
Initial deploy is still required to migrate data to Office 365
AD cleanup and a network upgrade are often required
SERVICE CHANGE
Balance continuous innovation against minimizing change
Customer controls IT policies but not feature availability
STANDARDIZATION
Single architecture
Limited configuration and customization options
PRIVACY and SECURITY CONSIDERATIONS
Understand your internal security and privacy requirements
On-premises
Online
Lessons learned
Extreme Standardization
SLA-Driven Architecture
Process Maturity
Delegation & Control
Re-imagined Processes
Automation Change Control
Scale Out Application
Customer Self Service
(Mostly) Yesterday’s Platform
Each layer “early bound” to layer below
Must provision entire stack for each layer instance
Difficult to balance isolation and utilization/efficiency
1. Purchase
OS: 2. Install
Role: 3. Install
App: 4. Deploy
Context: 5. Configure
Requests
Today's Platform
Virtualization breaks the tight coupling between hardware & software
Software stack is still mostly statically bound though…
[Diagram: two software stacks (OS, Role, App, Context) running on a shared Virtualization layer]
“Fabric Based” Computing Platform
[Diagram: many OS/Role instances hosted on a shared Infrastructure Fabric]
Base infrastructure serves multiple workloads / roles
Infrastructure is managed as one resource
Provisioned to aggregate need rather than per project
Hardware becomes fungible
DEFINE THE FABRIC
Storage Consolidation
o Offloaded Data Transfer (ODX)
o Storage Spaces
o Thin-Provisioning
o Deduplication
o Tiering
Server Virtualization
o High Performance & Shared Nothing Live Migration
o System Center Multi Hypervisor support (Hyper-V, VMware, Xen)
o BitLocker Encryption
o Up to 64TB Virtual Hard Disk (VHDX) Size
Network Virtualization
o Software Defined Networking
o Virtual IP Address Management
o Datacenter Bridging
Access & Information Protection
o Windows Server & Azure Active Directory
o Active Directory Federation Services
Management
o PowerShell Automation, >3000 cmdlets
o Desired Configuration
o Windows Management Framework: WS-Management, REST, HTTP, PSRP
High Availability
o Hyper-V Replica
o Windows Azure Hyper-V Recovery Manager
System Center
Windows Server 2012
Hardware Stamp
Compute
Networking
Storage
Workloads
SQL
Lync
VDI
SharePoint
Exchange
CRM
Microsoft Private Cloud Fast Track
Guidance Set: http://technet.microsoft.com/en-us/jj572811
Microsoft Azure
App services: Integration, HPC, Analytics, Web sites, Mobile services, Caching, Identity, Service bus, Media, Cloud services
Data services: SQL database, HDInsight, Table, Blob storage
Infrastructure services: Virtual machines, Virtual network, VPN, Traffic manager, CDN
Health Endpoint Monitoring Pattern
Summary: Implement functional checks within an application that external tools can access through exposed endpoints at regular intervals. This pattern can help to verify that applications and services are performing correctly.
http://aka.ms/Health-Endpoint-Monitoring-Pattern
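A minimal sketch of the health endpoint idea, assuming the application exposes something like a /health route that a monitoring tool polls. The check names, the function names, and the 200/503 status convention are illustrative assumptions, not part of the pattern text.

```python
import json
from typing import Callable, Dict, Tuple


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every named functional check and return (HTTP status, JSON body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception as exc:
            # A check that raises is reported as unhealthy rather than crashing the endpoint.
            results[name] = f"error: {exc}"
    healthy = all(value == "ok" for value in results.values())
    status = 200 if healthy else 503
    body = json.dumps({"healthy": healthy, "checks": results}, sort_keys=True)
    return status, body


# Hypothetical checks; real ones would touch storage, a database, DNS, and so on.
def storage_reachable() -> bool:
    return True


def database_reachable() -> bool:
    raise TimeoutError("no response within the timeout period")
```

A web handler for the health route would call run_health_checks and return the status and body unchanged, so the external monitor only needs to look at the status code.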
Resilience Modeling and Analysis: Phases
Document: Identify failure points; component interaction diagram
Discover: Brainstorm failure modes; DIAL categories (Discovery, Auth, Incorrectness, Limits, Component)
Rate: Record failure effects; assess risk priority using Impact and Likelihood
Act: Prioritize reliability work; remediate against effects and validate mitigations
Resilience Modeling and Analysis: Discover
Discovery: Name resolution service health or configuration; caller configuration
Incorrectness: Protocol and version mismatch; corruption, data fidelity, poison message; duplicate request, invalid state, timing errors
Auth: Authentication service health or configuration; resource authorization configuration
Limits: Timeouts and blocking; service unavailable or unhealthy, throttling; flooding, congestion, slow response times
Component: Code or configuration changes; hangs, crashes, resource exhaustion; fault domains
Resilience Modeling and Analysis: Rate – Assessing Risk
Impact
Effects: When this failure occurs, how deeply is the functionality impaired?
Portion Affected: When this failure occurs, what portion of users or transactions are affected?
Detection: How long does it take until an automated system or human is notified to take corrective measures?
Resolution: How long does it take the automated system or human to restore functionality after the failure has been detected?
Likelihood
Likelihood: How frequently is this failure likely to occur?
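The rating step can be sketched in code. This is an illustration only: the slide says risk is assessed from Impact and Likelihood, but the numeric 1 to 3 ratings and the choice to multiply the factors are assumptions made here, as are the class and function names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    mode_id: int
    name: str
    effects: int      # how deeply functionality is impaired (assumed scale 1-3)
    portion: int      # share of users or transactions affected (assumed scale 1-3)
    detection: int    # time until someone or something is notified (assumed scale 1-3)
    resolution: int   # time to restore functionality after detection (assumed scale 1-3)
    likelihood: int   # how often the failure is expected (assumed scale 1-3)

    @property
    def impact(self) -> int:
        # Assumption: impact is the product of the four impact ratings.
        return self.effects * self.portion * self.detection * self.resolution

    @property
    def risk(self) -> int:
        # Assumption: risk is impact multiplied by likelihood.
        return self.impact * self.likelihood


def prioritize(modes: List[FailureMode]) -> List[FailureMode]:
    """Order failure modes from highest to lowest risk (ties keep input order)."""
    return sorted(modes, key=lambda m: m.risk, reverse=True)


# Hypothetical ratings for three failure modes.
modes = [
    FailureMode(9, "DNS failure", 3, 1, 2, 3, 1),
    FailureMode(4, "No response from storage", 3, 3, 3, 3, 2),
    FailureMode(6, "Latency from server API", 3, 1, 3, 3, 3),
]
ranked_ids = [m.mode_id for m in prioritize(modes)]
```

With these made-up ratings the storage failure ranks first, because it affects the largest portion of users even though it is not the most frequent failure.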
Resilience Modeling and Analysis: Act – Prioritize and Mitigate

ID 3. Component/Dependency Interaction: Storage Layer -> Azure Storage
Failure Short Name: Error 5xx from Azure Storage::Service
Failure Description: Azure Storage may respond with error
Consequences: Return Error to caller. Service closed
Impact: Effects: Major impairment of core functionality; Portion Affected: More than 50%; Detection: More than 15 min; Resolution: More than 45 min
Likelihood: Multiple times a year

ID 4. Component/Dependency Interaction: Storage Layer -> Azure Storage
Failure Short Name: No Response from Azure Storage::Service
Failure Description: Azure Storage may fail to respond within the timeout period
Consequences: No retry. Return Error to caller.
Impact: Effects: Major impairment of core functionality; Portion Affected: More than 50%; Detection: More than 15 min; Resolution: More than 45 min
Likelihood: Multiple times a year

ID 5. Component/Dependency Interaction: Storage Layer -> Azure Storage
Failure Short Name: Latency from Azure Storage::Service
Failure Description: Azure Storage component may
Consequences: When memory pressure is sufficient, return Error to caller.
Impact: Effects: Major impairment of core functionality; Portion Affected: More than 50%; Detection: More than 15 min; Resolution: More than 45 min
Likelihood: Less than once a year

ID 6. Component/Dependency Interaction: Web Service -> Server API
Failure Short Name: Latency from Server API
Failure Description: The Server API may be slow to respond
Consequences: Caller will timeout resulting in a client retry.
Impact: Effects: Major impairment of core functionality; Portion Affected: Less than 2%; Detection: More than 15 min; Resolution: More than 45 min
Likelihood: More than once a month

ID 7. Component/Dependency Interaction: Storage Layer -> Azure Storage
Failure Short Name: Error 5xx from Azure Storage::Service
Failure Description: Azure Storage may respond with error
Consequences: Return Error to caller. Service closed
Impact: Effects: Major impairment of core functionality; Portion Affected: More than 50%; Detection: More than 15 min; Resolution: More than 45 min
Likelihood: Multiple times a year

ID 8. Component/Dependency Interaction: Storage Layer -> Azure Storage
Failure Short Name: No Response from Azure Storage::Service
Failure Description: Azure Storage may fail to respond within the timeout period
Consequences: No retry. Return Error to caller.
Impact: Effects: Major impairment of core functionality; Portion Affected: Less than 50%; Detection: More than 15 min; Resolution: More than 45 min
Likelihood: Multiple times a year

ID 9. Component/Dependency Interaction: Azure DNS
Failure Short Name: Azure DNS Failure::ClientAPI
Failure Description: The Azure DNS system may fail to respond
Consequences: Error DNS not found returned to caller.
Impact: Effects: Major impairment of core functionality; Portion Affected: Less than 2%; Detection: Between 5 min and 15 min; Resolution: More than 45 min
Likelihood: Less than once a year
Risk: the same failure modes, re-ordered by risk
ID 4. Storage Layer -> Azure Storage: No Response from Azure Storage::Service (Portion Affected: More than 50%; Likelihood: Multiple times a year)
ID 7. Storage Layer -> Azure Storage: Error 5xx from Azure Storage::Service (Portion Affected: More than 50%; Likelihood: Multiple times a year)
ID 3. Storage Layer -> Azure Storage: Error 5xx from Azure Storage::Service (Portion Affected: More than 50%; Likelihood: Multiple times a year)
ID 5. Storage Layer -> Azure Storage: Latency from Azure Storage::Service (Portion Affected: More than 50%; Likelihood: Less than once a year)
ID 8. Storage Layer -> Azure Storage: No Response from Azure Storage::Service (Portion Affected: More than 50%; Likelihood: Multiple times a year)
ID 6. Web Service -> Server API: Latency from Server API (Portion Affected: Less than 2%; Likelihood: More than once a month)
ID 9. Azure DNS: Azure DNS Failure::ClientAPI (Portion Affected: Less than 2%; Likelihood: Less than once a year)
Retry Pattern
Summary: Enable an application to handle anticipated, temporary failures when it attempts to connect to a service or network resource by transparently retrying an operation that has previously failed in the expectation that the cause of the failure is transient. This pattern can improve the stability of the application.
http://aka.ms/Retry-Pattern
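A minimal sketch of a retry wrapper, assuming the caller can tell transient failures from permanent ones. The TransientError type, the exponential backoff with jitter, and the default values are illustrative assumptions; the pattern summary does not prescribe a particular backoff policy.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are expected to be temporary."""


def retry(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            # A little random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

Only TransientError is retried; any other exception propagates immediately, which keeps the wrapper from masking permanent faults. The sleep parameter is injectable so the behavior can be exercised without real waiting.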
Circuit Breaker Pattern
Summary: Handle faults that may take a variable amount of time to rectify when connecting to a remote service or resource. This pattern can improve the stability and resiliency of an application.
http://aka.ms/Circuit-Breaker-Pattern
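A minimal sketch of a circuit breaker, assuming a simple closed / open / half-open state machine driven by a failure count and a reset timeout. The class name, thresholds, and injectable clock are illustrative assumptions rather than anything specified by the pattern summary.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are rejected without being tried."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open the remote service is not called at all, which gives it time to recover and returns errors to callers quickly instead of letting them block on timeouts.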
Throttling Pattern
Summary: Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. This pattern can allow the system to continue to function and meet service level agreements, even when an increase in demand places an extreme load on resources.
http://aka.ms/Throttling-Pattern
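One common way to throttle is a token bucket, sketched below. The pattern summary does not name a specific algorithm, so the token bucket, the class name, and the rate and capacity values are assumptions chosen for illustration.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should reject or defer the request
```

Keeping one bucket per tenant (for example in a dictionary keyed by tenant ID) limits each tenant separately, so a single heavy tenant cannot exhaust the capacity shared by the others.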
Solution Design Patterns
http://aka.ms/Cloud-Design-Patterns
Copies of the poster are available at the SCT booth…
Come Visit Us in the Microsoft Solutions Experience!
Look for Datacenter and Infrastructure Management
TechExpo Level 1 Hall CD
For More Information
Windows Server 2012 R2: http://technet.microsoft.com/en-US/evalcenter/dn205286
Microsoft Azure: http://azure.microsoft.com/en-us/
System Center 2012 R2: http://technet.microsoft.com/en-US/evalcenter/dn205295
Azure Pack: http://www.microsoft.com/en-us/server-cloud/products/windows-azure-pack
Resources
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
MSDN
Resources for Developers
http://microsoft.com/msdn
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Sessions on Demand
http://channel9.msdn.com/Events/TechEd
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.