Building SRE in the Enterprise with Istio


Posted on 18-Aug-2021


Page 1

Building SRE in the Enterprise with Istio

Hybrid Specialist: Shawn Ho

[email protected]

Page 2

1. What is SRE?

Page 3

Product Lifecycle

Concept → Business → Development → Operations → Market

Agile solves the gap between business and development.

DevOps solves the gap between development and operations.

Page 4

Developers: Agility

Operators: Stability

Dev and Ops KPIs aren't aligned

Page 5

What is the relationship between DevOps and SRE?

● DevOps is more of an abstract concept: a set of guidelines and disciplines for breaking down the silos between development and operations.

● SRE is Google's concrete implementation of DevOps practices.

“class SRE implements DevOps”

Page 6

Self-Service Platform: Monitoring, Automation, CI/CD

SRE

Developers

class SRE = a REAL PERSON

Page 7

#1. Decisions based on data: every decision is grounded in data.

Page 8

#2. Be user-centric: even if all the monitoring data looks normal, if customers feel the system is unstable, then the system is unstable.

Page 9

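#3. Blameless culture & shared responsibility: breaking down departmental silos starts with sharing responsibility across teams (developers, operators, leaders).

A system failure is not only the operators' responsibility; code quality, technical debt, and other factors can all be causes.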

Page 10

2. How to Implement SRE with Istio/Anthos?

Page 11

Istio in 2 minutes

[Figure: Istio architecture. Service A and Service B each run behind a proxy, and traffic between the proxies (HTTP, gRPC, TCP) is protected by mTLS. The Istio control plane, with its control plane API on the Kubernetes API server, consists of Pilot (routing + secure naming), Citadel (cert issuance, with a cert authority plugin), and policy enforcement + reporting (with logging and monitoring plugins). An ingress gateway and an egress gateway enforce perimeter security policies; Internal App 1 and External App 1 connect with JWT + TLS, and proxies apply local authz. Legend: data flow; control + metrics flow.]

Page 12

What does SRE implement on the platform?

● Metrics & monitoring: SLOs, dashboards, analytics

● Capacity planning: forecasting, demand-driven, performance

● Emergency response: on-call, incident analysis, postmortems

● Change management: release process, design consulting, automation

● Culture: toil management, blamelessness, shared responsibility


Page 14

Monitoring and Incident Management

● Understand system architecture: understand the system architecture and the deployed topology.

● System monitoring: monitor the system by gathering blackbox and whitebox metrics. SLIs and SLOs are extracted from the metrics and logs, and the information is visualized through dashboards.

● Log handling

● Managing planned events (releases, maintenance)

● Incident handling: create an incident ticket and roll back the change to resolve the incident; investigate the root cause with logging, monitoring metrics, and debugging.

● Postmortem: review the incident and prepare a plan to prevent recurrence.

Page 15

What to Monitor?

SLO = SLI + Target: “99% of REST API calls will complete in less than 100 ms every week” (SLI: the fraction of REST API calls that complete in less than 100 ms; target: 99%, measured weekly)

● SLI (service level indicator): a well-defined measure of “good enough”; used to specify SLOs and SLAs.

● SLO (service level objective): a top-line target for the fraction of good interactions; specifies goals (SLI + target).

● SLA (service level agreement): adds consequences; SLA = (SLO + margin) + consequences = SLI + target + consequences.

● Error budget: product management and SRE define an availability target; 100% - availability target is a “budget of unreliability” (the error budget).
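As a minimal sketch of how these definitions fit together, the Python below evaluates the example SLO above (99% of REST API calls under 100 ms over a week) against a list of request latencies. The `LatencySLO` class, its method names, and the sample data are hypothetical; in a real deployment the latencies would come from the mesh's telemetry rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class LatencySLO:
    """A latency SLO: an SLI (requests faster than a threshold) plus a target."""

    threshold_ms: float  # a request is "good" if it completes in less than this
    target: float        # required fraction of good requests, e.g. 0.99

    def sli(self, latencies_ms: list[float]) -> float:
        """SLI: fraction of requests that completed under the threshold."""
        if not latencies_ms:
            return 1.0  # assumption: no traffic counts as meeting the SLO
        good = sum(1 for t in latencies_ms if t < self.threshold_ms)
        return good / len(latencies_ms)

    def is_met(self, latencies_ms: list[float]) -> bool:
        """The SLO holds when the SLI reaches the target."""
        return self.sli(latencies_ms) >= self.target

    def error_budget(self) -> float:
        """Error budget: 100% minus the target, as a fraction of requests."""
        return 1.0 - self.target

    def budget_consumed(self, latencies_ms: list[float]) -> float:
        """Share of the error budget already spent (1.0 = fully spent)."""
        return (1.0 - self.sli(latencies_ms)) / self.error_budget()


slo = LatencySLO(threshold_ms=100, target=0.99)
# Hypothetical week of traffic: 1,000 calls, 5 of them slower than 100 ms.
week = [42.0] * 995 + [250.0] * 5
print(f"SLI = {slo.sli(week):.3f}, SLO met: {slo.is_met(week)}")
print(f"Error budget consumed: {slo.budget_consumed(week):.0%}")
```

Here the SLO is met, but half of the week's error budget is already gone, which is exactly the kind of signal the error budget is meant to surface before users notice.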

Page 16

Error Budget (Availability)

Availability SLO | Allowed unavailability per year | per quarter (90 days) | per 30 days | Error budget remaining at a 1% error rate (%)
90% | 36.5 days | 9 days | 3 days | 90
95% | 18.25 days | 4.5 days | 1.5 days | 80
99% | 3.65 days | 21.6 hours | 7.2 hours | 0
99.5% | 1.83 days | 10.8 hours | 3.6 hours | -100
99.9% | 8.76 hours | 2.16 hours | 43.2 minutes | -900
99.95% | 4.38 hours | 1.08 hours | 21.6 minutes | -1900
99.99% | 52.6 minutes | 12.96 minutes | 4.32 minutes | -9900
99.999% | 5.26 minutes | 1.30 minutes | 25.9 seconds | -99900

Negative values mean a 1% error rate overspends the error budget.
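The table's figures follow from simple arithmetic: the allowed unavailability is (1 - SLO) × period, and the budget remaining at an observed error rate e is 100 × (1 - e / (1 - SLO)). The sketch below reproduces a few rows, assuming a 365-day year and a 90-day quarter (the values the table appears to use); the function names are illustrative.

```python
from datetime import timedelta

# Period lengths that reproduce the table: 365-day year, 90-day quarter, 30 days.
PERIODS_DAYS = {"per year": 365, "per quarter": 90, "per 30 days": 30}


def allowed_unavailability(slo: float, days: int) -> timedelta:
    """Downtime allowed in a period of `days` while still meeting `slo`."""
    return timedelta(days=days) * (1.0 - slo)


def remaining_budget_pct(slo: float, error_rate: float) -> float:
    """Error budget left, in percent, at a constant observed error rate.

    Negative values mean the budget is overspent.
    """
    return 100.0 * (1.0 - error_rate / (1.0 - slo))


for slo in (0.90, 0.99, 0.999, 0.9999):
    windows = ", ".join(
        f"{name}: {allowed_unavailability(slo, days)}"
        for name, days in PERIODS_DAYS.items()
    )
    budget = remaining_budget_pct(slo, 0.01)
    print(f"{slo:.2%} -> {windows}; budget left at 1% errors: {budget:.0f}%")
```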

Page 17

Demo with Anthos: Monitoring + Incident Management

● Topology

● SLO/SLI Metrics

● Blackbox/Whitebox

● Log Viewer

● Tracing/Tracing Report

Page 18

Demo with Anthos: Monitoring + Incident Management

[Screenshots: Topology, Blackbox, Whitebox]

Page 19

Demo with Anthos: Monitoring + Incident Management

[Screenshots: Logging, Tracing]

Page 20

Error Budget Burn Down Rate
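Burn rate is usually expressed as the observed error rate divided by the error budget (1 - SLO): a burn rate of 1 spends the budget exactly over the SLO window, and 2 spends it in half the time. A minimal sketch, assuming a 30-day window and equal traffic every day (so each day's error rate can be weighted equally); `burn_rate`, `days_until_exhausted`, and `burn_down` are hypothetical helpers, not part of any Anthos API.

```python
def burn_rate(error_rate: float, slo: float) -> float:
    """Observed error rate relative to the error budget (1 - SLO).

    1.0 spends the budget exactly over the SLO window; 2.0 spends it in half.
    """
    return error_rate / (1.0 - slo)


def days_until_exhausted(error_rate: float, slo: float,
                         window_days: float = 30.0,
                         budget_left: float = 1.0) -> float:
    """Days until `budget_left` (a fraction of the whole budget) is gone."""
    rate = burn_rate(error_rate, slo)
    if rate == 0:
        return float("inf")
    return budget_left * window_days / rate


def burn_down(daily_error_rates: list[float], slo: float,
              window_days: int = 30) -> list[float]:
    """Remaining budget fraction after each day, assuming equal daily traffic."""
    remaining = 1.0
    history = []
    for rate in daily_error_rates:
        # One day at burn rate b consumes b / window_days of the total budget.
        remaining -= burn_rate(rate, slo) / window_days
        history.append(remaining)
    return history


# Hypothetical: 99.9% SLO, 10 quiet days at 0.1% errors, then 2 bad days at 0.5%.
history = burn_down([0.001] * 10 + [0.005] * 2, slo=0.999)
print(f"budget left after 12 days: {history[-1]:.0%}")
print(f"days left at 0.5% errors: "
      f"{days_until_exhausted(0.005, 0.999, budget_left=history[-1]):.1f}")
```

Plotting `history` over time gives the burn-down curve; a steep slope is the cue to slow releases before the budget reaches zero.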

Page 21

Demo with Anthos: Proactively Reduce Error Budget Consumption

● Alert Setting

● Canary Deployment

● Cross-Region Deployment

[Diagram: Clients → Cloud Load Balancing → Kubernetes Engine clusters in Taiwan-1 and Singapore, with traffic split 10 / 90 between the two clusters]
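For the alert-setting step, one widely used approach is to alert on burn rate over two windows at once: a long window so the alert is significant, and a short window so it stops firing soon after the problem is fixed. The sketch below is modeled on the multiwindow, multi-burn-rate alerts described in Google's SRE Workbook; the window lengths and thresholds are that book's example values for a 30-day SLO, not the settings used in this demo, and the `rates` input is a hypothetical mapping of window length (minutes) to observed error rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BurnRateAlert:
    long_window_min: int   # long window: is enough budget being spent to matter?
    short_window_min: int  # short window: is it still happening right now?
    threshold: float       # burn rate that must be exceeded in BOTH windows
    severity: str


# Example values for a 30-day (720-hour) SLO: threshold = share of budget spent
# within the long window * 720 h / long window length.
ALERTS = [
    BurnRateAlert(60, 5, 14.4, "page"),       # 2% of budget in 1 hour
    BurnRateAlert(360, 30, 6.0, "page"),      # 5% of budget in 6 hours
    BurnRateAlert(4320, 360, 1.0, "ticket"),  # 10% of budget in 3 days
]


def evaluate(rates: dict[int, float], slo: float) -> Optional[str]:
    """Return the severity of the first alert whose two windows both burn
    faster than its threshold, or None.

    `rates` maps a window length in minutes to the error rate observed over it.
    """
    budget = 1.0 - slo
    for alert in ALERTS:
        long_burn = rates[alert.long_window_min] / budget
        short_burn = rates[alert.short_window_min] / budget
        if long_burn > alert.threshold and short_burn > alert.threshold:
            return alert.severity
    return None


# Hypothetical observations for a 99.9% SLO during an ongoing incident.
rates = {5: 0.03, 30: 0.01, 60: 0.02, 360: 0.004, 4320: 0.0008}
print(evaluate(rates, slo=0.999))
```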

Page 22

Demo with Anthos: Proactively Reduce Error Budget Consumption

[Diagram: the same Clients → Cloud Load Balancing → Kubernetes Engine clusters (Taiwan-1, Singapore) topology, with traffic now split 50 / 50 between the two clusters]
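With Istio, a canary rollout is commonly expressed as weighted routes in a VirtualService. The sketch below builds such a manifest as a Python dict and adds a simple promotion rule that advances the canary only while the error-budget burn rate stays low. It assumes a DestinationRule already defines `stable` and `canary` subsets; the host name `reviews`, the 40-point step, and the burn-rate limit of 1.0 are illustrative choices, not values from the demo (which shifts traffic between regions through Cloud Load Balancing).

```python
import json


def canary_virtual_service(host: str, stable_weight: int,
                           canary_weight: int) -> dict:
    """Istio VirtualService (as a dict) splitting traffic between two subsets.

    Assumes a DestinationRule defines the `stable` and `canary` subsets.
    """
    if stable_weight + canary_weight != 100:
        raise ValueError("route weights must sum to 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-canary"},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": host, "subset": "stable"},
                     "weight": stable_weight},
                    {"destination": {"host": host, "subset": "canary"},
                     "weight": canary_weight},
                ]
            }],
        },
    }


def next_canary_weight(current: int, burn_rate: float, step: int = 40,
                       max_burn_rate: float = 1.0) -> int:
    """Promote the canary while the budget burn rate is acceptable;
    send all traffic back to stable (weight 0) as soon as it is not."""
    if burn_rate > max_burn_rate:
        return 0
    return min(100, current + step)


weight = next_canary_weight(10, burn_rate=0.4)  # healthy canary: 10 -> 50
manifest = canary_virtual_service("reviews", 100 - weight, weight)
print(json.dumps(manifest, indent=2))
```

Tying promotion to the burn rate means a bad release spends only the slice of the error budget that its traffic share exposes.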


Page 24

Capacity planning

Plan for organic growth

Increased product adoption and usage by customers.

Determine inorganic growth

Sudden jumps in demand due to feature launches, marketing campaigns, etc.
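A back-of-the-envelope way to combine the two kinds of growth: compound a monthly organic growth rate, then multiply in one-off factors for known inorganic events such as a launch or campaign. This is a simplified model chosen for illustration (real forecasting would typically fit historical data); the growth rate, event multipliers, per-replica capacity, and 30% headroom are hypothetical inputs.

```python
import math
from typing import Optional


def forecast_peak_qps(current_qps: float, monthly_growth: float, months: int,
                      events: Optional[dict[int, float]] = None) -> list[float]:
    """Projected peak QPS for each of the next `months` months.

    Organic growth compounds every month; inorganic growth is a one-off
    multiplier applied only in the month of a known event (launch, campaign).
    """
    events = events or {}
    baseline = current_qps
    forecast = []
    for month in range(1, months + 1):
        baseline *= 1.0 + monthly_growth                     # organic growth
        forecast.append(baseline * events.get(month, 1.0))  # inorganic spike
    return forecast


def replicas_needed(peak_qps: float, qps_per_replica: float,
                    headroom: float = 0.3) -> int:
    """Replicas required to serve `peak_qps` with spare `headroom` capacity."""
    return math.ceil(peak_qps * (1.0 + headroom) / qps_per_replica)


# Hypothetical inputs: 1,000 QPS today, 5% monthly growth, a campaign in
# month 4 expected to double traffic, and 200 QPS per replica.
forecast = forecast_peak_qps(1000, 0.05, months=6, events={4: 2.0})
for month, qps in enumerate(forecast, start=1):
    print(f"month {month}: {qps:7.0f} QPS -> {replicas_needed(qps, 200)} replicas")
```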

Page 25

Change Management: roughly 70%¹ of outages are due to changes in a live system

[Diagram: Kubernetes configuration service and continuous deployment from Cloud Source Repositories to multiple Kubernetes Engine cluster instances on GCP and on-premises (On-Prem1), connected via the Anthos Hub service and NAT, serving clients]

Page 26

Demo with Anthos: The Power of GitOps
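At its core, GitOps treats a Git repository as the desired state and continuously reconciles the cluster toward it, so a rollback is just reverting to a previous commit. The toy model below shows that diff-and-apply loop on plain dicts; it is not how Anthos Config Management is implemented, it assumes pruning (deleting resources removed from Git) is enabled, and the resource names are hypothetical.

```python
Manifest = dict[str, dict]  # resource name -> desired spec (simplified)


def plan(desired: Manifest, actual: Manifest) -> list[tuple[str, str]]:
    """Actions needed to make the cluster (`actual`) match Git (`desired`)."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))  # assumes pruning is enabled
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


def reconcile(desired: Manifest, actual: Manifest) -> Manifest:
    """Apply the plan and return the new cluster state, which matches Git."""
    state = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del state[name]
        else:
            state[name] = desired[name]
    return state


# Hypothetical commits: v2 upgrades the frontend and adds a worker.
commit_v1 = {"frontend": {"image": "frontend:1.0", "replicas": 3}}
commit_v2 = {"frontend": {"image": "frontend:1.1", "replicas": 3},
             "worker": {"image": "worker:1.0", "replicas": 1}}

cluster = reconcile(commit_v1, {})
cluster = reconcile(commit_v2, cluster)  # deploy by merging v2
print(plan(commit_v1, cluster))          # rolling back = reverting to v1
cluster = reconcile(commit_v1, cluster)
```

Because every change goes through Git, the same history that drives deployments also serves as the audit trail for change management.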

Page 27

Summary + Call to Action

● SRE has 3 key principles:

○ Decisions based on data (meaningful monitoring)

○ Be user-centric (black-box testing)

○ Blameless culture & shared responsibility (share the responsibility, work together)

● Kubernetes is a perfect platform for implementing SRE:

○ SLI + SLO + error budget

○ Watch the budget burn rate

○ Establish CI + CD with GitOps

● Pick a system and build your SRE practices

Page 28

Cover images used with permission. These books can be found on shop.oreilly.com.