Webinar: OpenStack Best Practices for Production
TRANSCRIPT
![Page 1: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/1.jpg)
7 OpenStack Best Practices
Private Clouds Made Easy
Sirish Raghuram, Co-founder, CEO, Platform9
Roopak Parikh, Co-founder, VP Engineering, Platform9
![Page 2: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/2.jpg)
© 2015 Platform9 Systems, Inc. Webinar: Best Practices for OpenStack in Production
Speaker Bio
Sirish Raghuram
• Co-founder, CEO at Platform9
• Previously: Staff Engineer at VMware (12 years)
• Technical and Management responsibility for multiple VMware products
Roopak Parikh
• Co-founder, VP Engineering at Platform9
• Previously: Staff Engineer at VMware (7 years)
• Architect for multiple VMware products
![Page 3: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/3.jpg)
Preamble
• Best practices from managing 50+ active OpenStack deployments
• Recommended for a technical audience looking to run OpenStack in production
• Assumes a fair knowledge of OpenStack
![Page 4: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/4.jpg)
OpenStack Architecture
[Diagram labels: Clarity UI; CLI / Tools; Scripts; Heat (Orchestration); Keystone (Identity); Nova; Scheduler; Cinder; Neutron; Glance (Images); Basic Storage; Compute; Basic Network; Block Storage; Network Controller]
![Page 5: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/5.jpg)
Platform9 Managed OpenStack
• Your servers host your data
• Platform9 hosts the OpenStack controller as a service, with an SLA
• No need to install, monitor, troubleshoot or upgrade OpenStack
![Page 6: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/6.jpg)
#1: Instrument & Monitor
• Controller API logs
  • Nginx or Apache
• Controller services
  • /var/log/nova/*, /var/log/glance/*, /var/log/keystone…
• RabbitMQ
  • /var/log/rabbitmq
• Controller system health
  • CPU, memory, disk, network
  • File descriptors
  • Sockets
• Compute node logs (occasionally)
  • nova, glance, other services
  • Rarely, libvirt
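As a minimal sketch of the "controller system health" item, the snippet below samples a few metrics with the Python standard library and reports which ones exceed a limit. The metric names, the threshold values, and the `collect_sample` / `breached` helpers are all assumptions made for illustration; they are not part of OpenStack or of Platform9's tooling.

```python
import os
import shutil

# Hypothetical thresholds; tune them for your own controller.
THRESHOLDS = {
    "disk_used_pct": 85.0,
    "load_per_cpu": 2.0,
    "open_fds": 4096,
}


def collect_sample(path="/"):
    """Collect a few health metrics using only the standard library."""
    usage = shutil.disk_usage(path)
    sample = {"disk_used_pct": 100.0 * usage.used / usage.total}
    if hasattr(os, "getloadavg"):  # not available on Windows
        sample["load_per_cpu"] = os.getloadavg()[0] / (os.cpu_count() or 1)
    fd_dir = "/proc/self/fd"  # Linux only: descriptors open in this process
    if os.path.isdir(fd_dir):
        sample["open_fds"] = len(os.listdir(fd_dir))
    return sample


def breached(sample, thresholds=THRESHOLDS):
    """Return the sorted names of metrics that exceed their threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if name in sample and sample[name] > limit
    )
```

In practice `breached(collect_sample())` would run on a schedule and feed whatever alerting channel you already use. Note that the file descriptor count here covers only the sampling process itself, so a real check would need to inspect the OpenStack service processes instead.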
![Page 7: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/7.jpg)
Platform9 Log Telemetry
[Diagram labels: raw logs (multiple sources); pre-process (filter); log storage, archival and search; alert filters; alert mechanism; alerts]
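The stages in the telemetry diagram can be sketched roughly as follows: raw lines are parsed and filtered, the surviving records go to storage, and a second filter selects the ones worth alerting on. This is an illustration only, not Platform9's pipeline. The log line layout (`date time pid LEVEL module message`) is an assumption about how the services are configured, and `preprocess`, `alert_filter` and `route` are hypothetical names.

```python
import re

# Assumed line layout: "<date> <time> <pid> <LEVEL> <module> <message>".
LINE_RE = re.compile(
    r"^(\S+ \S+) (\d+) (DEBUG|INFO|WARNING|ERROR|CRITICAL) (\S+) (.*)$"
)


def preprocess(raw_lines):
    """Parse raw lines, dropping DEBUG noise and lines that do not parse."""
    for line in raw_lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group(3) == "DEBUG":
            continue
        yield {
            "time": match.group(1),
            "level": match.group(3),
            "module": match.group(4),
            "message": match.group(5),
        }


def alert_filter(records, levels=("ERROR", "CRITICAL")):
    """Keep only the records severe enough to raise an alert."""
    return [record for record in records if record["level"] in levels]


def route(raw_lines):
    """Return (records to store, records to alert on)."""
    stored = list(preprocess(raw_lines))
    return stored, alert_filter(stored)
```

Dropping unparseable lines keeps the sketch short, but it also discards traceback continuation lines, which a real pipeline would want to attach to the preceding record.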
![Page 8: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/8.jpg)
Takeaways
• 100% automation is key
• Alerts can be very noisy
• Future:
  • Sentry / Rollbar to easily discern problem areas by severity and priority
  • Migrate from Papertrail to E-L-K?
![Page 9: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/9.jpg)
#2: High Availability Configuration
• Common points of failure
  • OpenStack Controller
    • Database
    • Python applications (Keystone, Nova, Glance, et al.)
    • RabbitMQ
  • Compute Nodes
    • Agent software uptime
![Page 10: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/10.jpg)
Platform9 HA Architecture
[Diagram labels: UI; Internet; Virtual IP; Load Balancer; OpenStack Controller (x3); Replicated DB; Intranet; Compute Nodes]
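To illustrate the idea behind placing a load balancer in front of several controllers, here is a minimal sketch of health-aware round-robin selection. It is not how Platform9's load balancer is implemented; the `ControllerPool` class and the controller names are hypothetical, and a production setup would rely on a real load balancer with its own health checks.

```python
import itertools


class ControllerPool:
    """Round-robin over controllers, skipping any marked unhealthy."""

    def __init__(self, controllers):
        self._controllers = list(controllers)
        self._healthy = set(self._controllers)
        self._cycle = itertools.cycle(self._controllers)

    def mark_down(self, controller):
        """Record that a health check failed for this controller."""
        self._healthy.discard(controller)

    def mark_up(self, controller):
        """Record that a known controller has recovered."""
        if controller in self._controllers:
            self._healthy.add(controller)

    def next_controller(self):
        """Return the next healthy controller, or raise if none remain."""
        for _ in range(len(self._controllers)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy controller available")
```

Losing one controller then costs capacity rather than availability: requests simply flow to the remaining healthy instances until the failed one is marked up again.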
![Page 11: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/11.jpg)
#3: Backup / Restore
• An SLA means you must recover quickly from losing the Controller
• Back up the Controller DB
• Back up the Controller state
• Automated recipe to restore from backup
• Test the restore recipe
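The "test the restore recipe" step can be sketched as: take a backup, restore it into a scratch database, and compare it with the live one. The example below uses SQLite from the Python standard library purely as a stand-in so that it runs anywhere; a real controller database would use its own dump and restore tooling. The helper names and the row-count comparison are assumptions for illustration, and a stricter check would compare contents, not just counts.

```python
import sqlite3


def take_backup(live, backup):
    """Copy the live database into the backup connection."""
    live.backup(backup)


def table_counts(conn):
    """Return {table name: row count} for every table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


def restore_matches(live, restored):
    """Check that the restored copy has the same tables and row counts."""
    return table_counts(live) == table_counts(restored)
```

Running a check like this after every backup, rather than only during an outage, is what turns a backup into a tested restore path.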
![Page 12: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/12.jpg)
#4: Upgrade / Patch Rollout
• Automated mechanism to roll out
  • Controller upgrade
  • Compute node agent upgrade
• Plan for testing the upgrade before committing
• Roll back if required
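One way to picture "test before committing, roll back if required" is a rollout loop that upgrades hosts one at a time and undoes everything as soon as a host fails its health check. This is a sketch under stated assumptions: `upgrade`, `is_healthy` and `rollback` are hypothetical hooks supplied by the caller, each taking a host name, and nothing here reflects Platform9's actual mechanism.

```python
def rollout(hosts, upgrade, is_healthy, rollback):
    """Upgrade hosts in order; undo all upgrades if any host fails its check.

    Returns (succeeded, upgraded_hosts).
    """
    upgraded = []
    for host in hosts:
        upgrade(host)
        upgraded.append(host)
        if not is_healthy(host):
            # Roll back in reverse order, most recent upgrade first.
            for done in reversed(upgraded):
                rollback(done)
            return False, []
    return True, upgraded
```

Putting a low-risk host first in `hosts` makes it act as a canary, so a bad build is caught before it reaches the rest of the fleet.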
![Page 13: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/13.jpg)
Platform9 Orchestration
[Diagram labels: Vanilla OS; Template Image V1; Customer Server V1; customer state; Fresh Install; Upgrade; Vanilla OS; Template Image V2; Customer Server V2]
![Page 14: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/14.jpg)
Platform9: Havana to Juno Upgrade
![Page 15: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/15.jpg)
#5: Workload Tiering
• Segregate underlying infrastructure for different classes of workloads (or users!)
  • By workload, hardware type, geography or organization
• Illustrations:
  • Test/Dev vs Production
  • Tier 1 vs Tier 2
  • SSD vs HDD
![Page 16: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/16.jpg)
Intelligent Placement
[Diagram labels: DevOps; Private Cloud; Tier-1; Tier-2; Tier-1 Infra; Tier-2 Infra]
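Tier-aware placement boils down to tagging hosts and filtering on those tags when a workload is scheduled. The sketch below shows that filtering step in plain Python. The inventory, tag names and `eligible_hosts` helper are hypothetical; in Nova this kind of segregation is commonly expressed with host aggregates or availability zones, which this example does not use.

```python
# Hypothetical inventory: host name -> tier tags.
HOSTS = {
    "node-1": {"tier": "1", "disk": "ssd"},
    "node-2": {"tier": "2", "disk": "hdd"},
    "node-3": {"tier": "1", "disk": "hdd"},
}


def eligible_hosts(hosts, required):
    """Return names of hosts whose tags satisfy every required key/value."""
    return sorted(
        name for name, tags in hosts.items()
        if all(tags.get(key) == value for key, value in required.items())
    )
```

For example, `eligible_hosts(HOSTS, {"tier": "1", "disk": "ssd"})` narrows placement to the Tier 1 SSD hosts, while a Test/Dev workload could request `{"tier": "2"}`.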
![Page 17: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/17.jpg)
#6: Hardened Messaging Libs
• OpenStack controller and compute node software communicate over message queues
• Reliable message delivery is critical to OpenStack
• Issue
  • Once in ~2-5000 API requests, a compute node or controller node can lose its connection to the queue
  • Result: messages get stuck in the queue and are never delivered
  • Result: operations can stall, seemingly at random
• Resolution
  • oslo.messaging heart-beating applied Jan 2015
    • Ref: https://github.com/openstack/oslo.messaging/commit/b9e134d7e955b9180482d2f7c8844501c750adf6
  • Disabled in April: https://github.com/openstack/oslo.messaging/commit/287a4f56f45ed9cd40116a9e7b6e529f3382a925
  • Platform9 has its own heartbeat mechanism, which leverages the Platform9 web socket architecture
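The general idea of heart-beating is that each peer reports in periodically and anything silent for longer than a timeout is treated as disconnected, so the connection can be re-established instead of leaving messages stranded. The sketch below shows only that bookkeeping. It is not the oslo.messaging implementation or Platform9's mechanism; the `HeartbeatMonitor` class, peer names and timeout are assumptions for illustration.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat per peer and report peers gone quiet."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._last_seen = {}

    def beat(self, peer):
        """Record a heartbeat from the given peer."""
        self._last_seen[peer] = self._clock()

    def stale_peers(self):
        """Return peers whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            peer for peer, seen in self._last_seen.items()
            if now - seen > self._timeout
        )
```

A supervisor would call `stale_peers()` on a timer and force a reconnect for each name returned, which turns a silent stall into a detectable, recoverable event.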
![Page 18: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/18.jpg)
#7: Troubleshooting / Debugging
• Issue #6 is an example of an issue you will run into
• Be prepared to
  • Debug / diagnose
    • It took us ~7 man days to debug issue #6 (worst case example)
  • Roll out a patch
• Techniques
  • Separate webinar topic!
![Page 19: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/19.jpg)
Recap
• Reviewed 7 best practices for running OpenStack successfully
• Share your own tips via the GTM chat panel!
![Page 20: Webinar: OpenStack Best Practices for Production](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55ca2e97bb61ebe04d8b45e8/html5/thumbnails/20.jpg)
Summary
• Production grade OpenStack without the hard work
• Request your own Platform9 account
• Related resources
  • OpenStack benefits for KVM / VMware: recorded webinars
  • Upcoming webinar: Jun 7, 2015
• Have questions?
  • Ask away!
• Get in touch:
  • @Platform9Sys