powerpoint presentation · 2018-01-29 · automated, consistent application updates updates to the...
TRANSCRIPT
Automated, Consistent Application Updates
Updates to the application occur in an automated way
Updates result in clean components, forcing consistency
Local storage and OS are left untouched
Automated, Consistent Configuration Changes
Updates to the settings occur in an automated way
Updates result in clean settings
Local storage and OS are left untouched
Multi-Instance Management
Identical instances are deployed across the service
Large scale-out services are guaranteed to be consistent
No configuration drift
Scale-out
Application scale-out can occur automatically
High Availability
The application has no downtime, even in the face of hardware failures.
Automated, Consistent OS Servicing
The OS hosting the application can be updated with the most recent patches in a coordinated and automated way.
[Matrix: deployment models (Single Instance, Persistent OS; Single Instance, Stateless OS; Multi-Instance, Stateless OS) versus capabilities (Automated, Consistent Application Updates; Automated, Consistent Configuration Changes; Multi-Instance Management; Scale-out; High Availability; Automated, Consistent OS Servicing)]
Windows Azure
[Diagram: role instances Front-End-1, Front-End-2, Middle Tier-1, Middle Tier-2, and Middle Tier-3 placed across Update Domain 1, Update Domain 2, and Update Domain 3]
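As a rough illustration of how update domains group instances for a rolling update, here is a minimal Python sketch. The round-robin assignment and the function names are assumptions made for this example; the slides name the instances and the update domains but do not state the placement policy.

```python
from collections import defaultdict


def assign_update_domains(instances, domain_count):
    """Spread instances over update domains round-robin (assumed policy)."""
    domains = defaultdict(list)
    for index, name in enumerate(instances):
        domains[index % domain_count + 1].append(name)
    return dict(domains)


def rolling_update_waves(roles):
    """Group instances of all roles by update domain number.

    Each wave holds the instances that would be updated together;
    the waves are processed one after another.
    """
    waves = defaultdict(list)
    for instances, domain_count in roles:
        assigned = assign_update_domains(instances, domain_count)
        for domain, members in assigned.items():
            waves[domain].extend(members)
    return [waves[domain] for domain in sorted(waves)]


# Instance names taken from the diagram labels; the domain counts are assumed.
front_end = (["Front-End-1", "Front-End-2"], 2)
middle_tier = (["Middle Tier-1", "Middle Tier-2", "Middle Tier-3"], 3)

waves = rolling_update_waves([front_end, middle_tier])
for number, members in enumerate(waves, start=1):
    print(f"Update Domain {number}: {members}")
```

Because only one wave is updated at a time, every role with at least two update domains keeps some instances running throughout the update.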
Mark’s Service
Role: Front-End
Definition
Type: Web
VM Size: Small
Endpoints: External-1
Configuration
Instances: 2
Update Domains: 2
Fault Domains: 2
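The service model above separates a role's definition from its configuration. A minimal Python sketch of that split follows; the class names, field names, and the sanity check are hypothetical and are not the actual Windows Azure service definition schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleDefinition:
    """Properties fixed when the role is authored (hypothetical field names)."""
    name: str
    role_type: str
    vm_size: str
    endpoints: tuple


@dataclass(frozen=True)
class RoleConfiguration:
    """Values that can differ per deployment (hypothetical field names)."""
    instances: int
    update_domains: int
    fault_domains: int

    def problems(self):
        """Return assumed sanity-check findings; an empty list means none."""
        found = []
        if min(self.instances, self.update_domains, self.fault_domains) < 2:
            found.append(
                "fewer than 2 instances or domains: a single update "
                "or hardware fault could take the role offline"
            )
        return found


# Values copied from the slide.
front_end_definition = RoleDefinition(
    name="Front-End",
    role_type="Web",
    vm_size="Small",
    endpoints=("External-1",),
)
front_end_config = RoleConfiguration(instances=2, update_domains=2, fault_domains=2)

print(front_end_definition)
print(front_end_config.problems())
```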
US-North Central Datacenter
FC
[Datacenter diagram: Datacenter Routers; Aggregation Routers and Load Balancers (Agg, LB); Racks of Nodes, each rack with a Top of Rack Switch (TOR) and a Power Distribution Unit (PDU). Other labels: Server, Datacenter.]
[Diagram labels: Fabric Controller; Image Repository (Role Images, Maintenance OS, Parent OS); PXE Server; Node (Maintenance OS, Windows Azure OS, FC Host Agent, Windows Azure Hypervisor)]
[Diagram labels: Fabric Controller (Primary); Fabric Controller (Replica), Fabric Controller (Replica), …; Physical Node; Host Partition with FC Host Agent (trusted); four Guest Partitions, each with a Guest Agent and a Role Instance; Trust boundary]
Role B Count: 2
Update Domains: 2
Fault Domains: 2
Size: Medium
Role A Count: 3
Update Domains: 2
Fault Domains: 2
Size: Large
Load Balancer
www.mycloudapp.net
Resource Volume
OS Volume
Role Volume
Guest Agent
Role Host
Role Entry Point
[Diagram: Role A and Role B instances, each labeled UD 1 or UD 2, placed across nodes]
• Allocation 1 allows for 2 nodes rebooting simultaneously
• Host OS upgrade rollout is 2x faster with allocation 1
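To see why one allocation can let more nodes reboot at once, the following Python sketch groups nodes into reboot rounds. The two allocations and the compatibility rule are hypothetical: the rule assumes nodes may reboot together only if, for every service, the instances they host fall in a single update domain. The grouping is greedy and not guaranteed to be optimal in general.

```python
def reboot_rounds(allocation):
    """Greedily group nodes into rounds that can reboot together.

    allocation maps node name -> {service name: update domain}.
    Assumed rule: nodes share a round only if no service would have
    instances from two different update domains down at the same time.
    """
    rounds = []  # each entry: (node names, {service: update domain in use})
    for node, hosted in allocation.items():
        for nodes, domains in rounds:
            if all(domains.get(svc, ud) == ud for svc, ud in hosted.items()):
                nodes.append(node)
                domains.update(hosted)
                break
        else:
            rounds.append(([node], dict(hosted)))
    return [nodes for nodes, _ in rounds]


# Hypothetical placements: same-numbered update domains are co-located
# in the first allocation and mixed in the second.
allocation_1 = {
    "node1": {"Service A": 1, "Service B": 1},
    "node2": {"Service A": 1, "Service B": 1},
    "node3": {"Service A": 2, "Service B": 2},
    "node4": {"Service A": 2, "Service B": 2},
}
allocation_2 = {
    "node1": {"Service A": 1, "Service B": 1},
    "node2": {"Service A": 1, "Service B": 2},
    "node3": {"Service A": 2, "Service B": 1},
    "node4": {"Service A": 2, "Service B": 2},
}

rounds_1 = reboot_rounds(allocation_1)
rounds_2 = reboot_rounds(allocation_2)
print(len(rounds_1), rounds_1)
print(len(rounds_2), rounds_2)
```

With these example placements the first allocation needs half as many rounds as the second, which is the effect the bullets describe.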
[Diagram: Allocation 1 and Allocation 2, two placements of Service A and Service B role instances (Role A-1, Role B-2) across nodes; every instance is labeled UD 2 in the transcript]
Problem | How Detected | Fabric Response
Role instance crashes | FC guest agent monitors role termination | FC restarts role
Guest VM or agent crashes | FC host agent notices missing guest agent heartbeats | FC restarts VM and hosted role
Host OS or agent crashes | FC notices missing host agent heartbeat | Tries to recover node; FC reallocates roles to other nodes
Detected node hardware issue | Host agent informs FC | FC migrates roles to other nodes; marks node “out for repair”
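The rows above can be read as a lookup from failure type to detector and response. A minimal Python sketch of that lookup follows; the enum, the dictionary, and the function name are hypothetical and only restate the table.

```python
from enum import Enum


class Failure(Enum):
    ROLE_INSTANCE_CRASH = "Role instance crashes"
    GUEST_VM_OR_AGENT_CRASH = "Guest VM or agent crashes"
    HOST_OS_OR_AGENT_CRASH = "Host OS or agent crashes"
    NODE_HARDWARE_ISSUE = "Detected node hardware issue"


# failure -> (how it is detected, ordered fabric responses), per the table
FABRIC_RESPONSES = {
    Failure.ROLE_INSTANCE_CRASH: (
        "FC guest agent monitors role termination",
        ["restart role"],
    ),
    Failure.GUEST_VM_OR_AGENT_CRASH: (
        "FC host agent notices missing guest agent heartbeats",
        ["restart VM and hosted role"],
    ),
    Failure.HOST_OS_OR_AGENT_CRASH: (
        "FC notices missing host agent heartbeat",
        ["try to recover node", "reallocate roles to other nodes"],
    ),
    Failure.NODE_HARDWARE_ISSUE: (
        "Host agent informs FC",
        ["migrate roles to other nodes", "mark node out for repair"],
    ),
}


def fabric_response(failure):
    """Return (detection mechanism, list of responses) for a failure."""
    return FABRIC_RESPONSES[failure]


detected_by, actions = fabric_response(Failure.HOST_OS_OR_AGENT_CRASH)
print(detected_by, actions)
```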
Guest Agent Connect Timeout: 25 min
Guest Agent Heartbeat: 5s
Role Instance Launch: Indefinite
Role Instance Start
Role Instance Ready (for updates only): 15 min
Role Instance Heartbeat: 15s
Guest Agent Heartbeat Timeout: 10 min
Role Instance “Unresponsive” Timeout: 30s
Load Balancer Heartbeat: 15s
Load Balancer Timeout: 30s
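The timer values above pair heartbeat intervals with timeouts. The Python sketch below restates some of them as constants and shows a simple timeout check. The constant and function names are hypothetical, and pairing each heartbeat with the similarly named timeout is an assumption about how the slide is meant to be read.

```python
from datetime import timedelta

# Values from the slide; the names are chosen for this example.
GUEST_AGENT_HEARTBEAT = timedelta(seconds=5)
GUEST_AGENT_HEARTBEAT_TIMEOUT = timedelta(minutes=10)
ROLE_INSTANCE_HEARTBEAT = timedelta(seconds=15)
ROLE_INSTANCE_UNRESPONSIVE_TIMEOUT = timedelta(seconds=30)
LOAD_BALANCER_HEARTBEAT = timedelta(seconds=15)
LOAD_BALANCER_TIMEOUT = timedelta(seconds=30)


def has_timed_out(since_last_heartbeat, timeout):
    """True once the silence has lasted longer than the timeout."""
    return since_last_heartbeat > timeout


def heartbeats_per_timeout(interval, timeout):
    """How many heartbeat intervals fit into one timeout window."""
    return timeout // interval


role_window = heartbeats_per_timeout(
    ROLE_INSTANCE_HEARTBEAT, ROLE_INSTANCE_UNRESPONSIVE_TIMEOUT
)
guest_window = heartbeats_per_timeout(
    GUEST_AGENT_HEARTBEAT, GUEST_AGENT_HEARTBEAT_TIMEOUT
)
print(role_window, guest_window)
print(has_timed_out(timedelta(seconds=31), ROLE_INSTANCE_UNRESPONSIVE_TIMEOUT))
```

Under this reading, a role instance is flagged after two missed heartbeat intervals, while the guest agent is given a much longer window of 120 intervals.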