Considerations for Operating an OpenStack Cloud
Post on 02-Jul-2015
© 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1
Mark T. Voelker, Technical Leader @ Cisco
OpenStack ATC/StackForge Puppet Core/Foundation Member #54
All Things Open 2014
@marktvoelker
• Tech Lead at Cisco, StackForge Puppet core developer, OS Foundation Member #54
• Fact: can be bribed with doughnuts
• Currently works in Cisco’s Cloud & Virtualization Group
• In copious (hah!) spare time: OpenStack solutions, Big Data, Massively Scalable Data Centers, Devops, making sawdust with extreme prejudice
• Tech lead, manager, software developer, architect
• Started in OpenStack in 2011 at the Diablo Design Summit
The great thing about my job is that I get to have fun exploring a lot of new things…
…and I get to help build a LOT of clouds.
Today’s talk won’t be overly formal….
…because I tend to get excited by this stuff.
…then you know how to get to Day 1.
Now let’s talk about getting to Day 30…
• Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
High Availability? Sounds great--I’ll take two!
• Consider whether you want active/active or active/passive
• Setup and tooling differ a bit, but I generally like active/active
• Note that docs.openstack.org has an HA Guide
• A bit dated…patches welcome!
• Prioritize HA for the control plane
• That also means thinking about your database, network, and RPC bus
• Instance-level HA: there be dragons
• But yes, it’s being looked at
• Pets vs cattle
• Note: HA == more hardware
• Some components need at least 3 nodes
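Here is why “at least 3 nodes” matters. Quorum-based components such as a Galera cluster keep serving only while a strict majority of members can still see each other. The sketch below assumes a simple majority quorum with equally weighted nodes (real clusters can tune weights). It computes how many failures a cluster of a given size can survive.

```python
def quorum_size(nodes: int) -> int:
    """Smallest strict majority of a cluster with `nodes` equally weighted members."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Number of nodes that can fail while the survivors still form a majority."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} node(s): quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Two nodes tolerate zero failures, three tolerate one, four still tolerate only one, and five tolerate two. That is why odd-sized clusters of three or more are the usual starting point.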
• Stuff OpenStack needs to run: message brokers
• Check out RabbitMQ clustering and mirrored queues
• Check out Galera for MySQL/MariaDB
• I usually see Percona XtraDB
• Frontend with an HAProxy/Keepalived pair
• Don’t do rabbit clustering over a WAN
• Be aware of the SELECT…FOR UPDATE issue
• Long story short: Neutron and some parts of Nova invoke an SQL pattern known as “SELECT…FOR UPDATE”, which Galera doesn’t support due to issues with cross-node locking.
• Can cause deadlock symptoms.
• Neutron and Nova code is being refactored to remove it, but that work will likely not be done until Kilo at the earliest.
• Meanwhile: use HAProxy to send writes to a single Galera node and you should be fine
• With the obvious scalability bottleneck
• More info here.
• Thank Jay Pipes & Peter Boros for the find!
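One way to “use HAProxy to send writes to a single Galera node” is an HAProxy `listen` section with one active server, where every other server is marked `backup`. HAProxy then moves traffic to a standby only when the active node fails its health check. The Python below just renders that config text. The node names, addresses, VIP and timeout values are illustrative assumptions. This simplest variant also sends reads to the single node. The plain TCP check does not verify that a node is synced, so many deployments use an HTTP check script such as Percona's clustercheck instead.

```python
from typing import List, Tuple


def galera_listen_block(vip: str, nodes: List[Tuple[str, str]], port: int = 3306) -> str:
    """Render an HAProxy `listen` section that sends all MySQL traffic to the
    first Galera node and keeps the remaining nodes as standbys."""
    if not nodes:
        raise ValueError("need at least one Galera node")
    lines = [
        "listen galera",
        f"    bind {vip}:{port}",
        "    mode tcp",
        "    option tcpka",
        # Long timeouts (assumed values) so idle DB connection pools aren't cut.
        "    timeout client 90m",
        "    timeout server 90m",
    ]
    for i, (name, addr) in enumerate(nodes):
        # "backup" servers only get traffic once every non-backup server has
        # failed its check, so there is one writer at a time.
        suffix = "" if i == 0 else " backup"
        lines.append(f"    server {name} {addr}:{port} check inter 2000 rise 2 fall 5{suffix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(galera_listen_block(
        "192.0.2.10",  # hypothetical VIP held by Keepalived
        [("db1", "192.0.2.11"), ("db2", "192.0.2.12"), ("db3", "192.0.2.13")],
    ))
```

The price is the obvious bottleneck mentioned above: one node takes all the writes.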
• Use Swift, Ceph, or other highly available storage to back Glance
• Pick a highly available storage backend for Cinder too
• Use Keepalived/HAProxy to front-end multiple API servers
• Or another load balancer technology of your choice
• Can be deployed as dedicated nodes for scale, or cohabitate
• Network: DVR vs Provider Network Extensions
• Distributed Virtual Routers are a new experimental feature in Juno (not yet ready for production)
• Please go test it and report/fix bugs!
• Provider networks essentially punt the availability issue to your physical network
• Allows you to use standard tools like virtual port channels and VRRP
• Also highly performant
• Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
We start with bare metal.
• For a cloud of any real size, you don’t want to be installing operating systems by hand
• Remember that bare metal bring-up isn’t something that happens just once…it often recurs for upgrades, capacity expansion, etc.
• Bare metal bring-up tools can also have other uses, like inventory or bootstrapping configuration management agents.
• A simple (~15k lines of Python code) tool for managing bare metal deployments
• Flexible usage (API, CLI, GUI)
• Allows you to define systems (actual machines) and profiles (what you want to do with them)
• Provides hooks for Puppet so you can then do further automation once the OS is up and running
• Provides control for power (via IPMI or other means), DHCP/PXE (for netbooting machines), and more.
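Here is a minimal sketch of the systems-and-profiles idea. A profile says what to install, and a system binds a real machine (by MAC address) to a profile. The tool then renders the netboot config that machine will fetch over PXE. The class names, field names, paths and the Kickstart-style `ks=` boot argument are hypothetical illustrations, not the tool's actual API. The `01-<mac>` file naming is PXELINUX's per-machine config convention.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """What to do with a machine (hypothetical fields)."""
    name: str
    kernel: str
    initrd: str
    kickstart_url: str


@dataclass
class System:
    """An actual machine, identified by the MAC address it netboots from."""
    hostname: str
    mac: str
    profile: Profile


def pxelinux_config_path(mac: str) -> str:
    # PXELINUX looks for pxelinux.cfg/01-<mac>, lowercase and dash-separated;
    # "01" is the ARP hardware type for Ethernet.
    return "pxelinux.cfg/01-" + mac.lower().replace(":", "-")


def render_pxe_config(system: System) -> str:
    p = system.profile
    # `ks=` (Kickstart location) is an assumed installer argument for a
    # Red Hat-style install; other distros use different boot options.
    return (
        f"# {system.hostname} -> profile {p.name}\n"
        "DEFAULT install\n"
        "LABEL install\n"
        f"    KERNEL {p.kernel}\n"
        f"    APPEND initrd={p.initrd} ks={p.kickstart_url}\n"
    )
```

Keeping profiles separate from systems means a new batch of identical compute nodes only needs one new `System` entry per machine.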
• Razor
• Developed by EMC, managed by Puppet Labs (occasionally used with Chef too)
• Initial release in 2012
• Uses a “microkernel” loaded onto the machine to gather facts before provisioning
• Tag + Policy model
• Crowbar
• Originally written by Dell, now a community project
• Originally designed to deploy OpenStack all the way from bare metal
• Now deploys other stuff too (namely, Hadoop)
• Uses Chef to handle everything after the OS install
• Foreman
• Used by Red Hat among others
• Does baremetal bringup and serves as a Puppet ENC
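Razor's tag + policy model works roughly like this. Tags are rules evaluated against the facts the microkernel reports, and the first policy (in priority order) whose tags all match decides what gets installed. This is a toy model in Python with hypothetical names, not Razor's own rule language or API. The fact names `processorcount` and `is_virtual` are assumed Facter-style facts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

Facts = Dict[str, str]  # facts reported by the discovery microkernel


@dataclass
class Tag:
    name: str
    rule: Callable[[Facts], bool]  # predicate deciding whether a node gets this tag


@dataclass
class Policy:
    name: str
    tags: List[str]  # a node must carry all of these tags to match
    install: str     # what to provision (hypothetical image/recipe name)


def node_tags(facts: Facts, tags: List[Tag]) -> Set[str]:
    return {t.name for t in tags if t.rule(facts)}


def choose_policy(facts: Facts, tags: List[Tag], policies: List[Policy]) -> Optional[Policy]:
    """Return the first policy, in priority order, whose tags all apply."""
    matched = node_tags(facts, tags)
    for policy in policies:
        if set(policy.tags) <= matched:
            return policy
    return None  # in this sketch, an unmatched node is simply left unprovisioned


if __name__ == "__main__":
    tags = [
        Tag("many_cpus", lambda f: int(f.get("processorcount", "0")) >= 16),
        Tag("virtual", lambda f: f.get("is_virtual") == "true"),
    ]
    policies = [
        Policy("compute", ["many_cpus"], "centos-compute"),
        Policy("test-vm", ["virtual"], "centos-minimal"),
    ]
    print(choose_policy({"processorcount": "32", "is_virtual": "false"}, tags, policies))
```

Separating "what is this machine?" (tags) from "what should it become?" (policies) lets hardware be racked first and classified automatically later.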
• Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
“Cloud isn’t just an infrastructure technology….it’s a new operations model. And with OpenStack in particular, it’s one that’s very well suited to a DevOps style of management. Many companies aren’t just adopting cloud, they’re changing how they operate.”
“Besides, logging into servers to mess with config files makes me sad.”
--That ranty guy in Raleigh again
• Remember, OpenStack is a set of interoperating distributed systems
• That means you’re going to have a lot of software to configure on a lot of machines
• You’re probably going to want to make changes over time
• You’re probably going to have more than one person touching your cloud
• CM tools help you treat configuration as code, so you can collaborate more easily
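The core idea behind “configuration as code” is convergence. You declare the desired state, and the tool changes things only when reality differs, then reports whether it did. Below is a minimal sketch for a single file. It assumes plain text content and ignores the ownership and permissions that real CM tools also manage. The function name is hypothetical.

```python
import os
import tempfile


def ensure_file_content(path: str, desired: str) -> bool:
    """Converge `path` to contain exactly `desired`.

    Returns True if a change was made and False if the file was already
    correct, mirroring the changed/unchanged reporting of CM tools.
    """
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == desired:
                return False  # already in the desired state: do nothing
    except FileNotFoundError:
        pass
    # Write a temp file in the same directory, then atomically replace the
    # target so readers never see a half-written config. Note: mkstemp
    # creates the file with mode 0600; a real tool would manage modes too.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(desired)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

Running it twice with the same content changes nothing the second time. That idempotence is what makes it safe to re-apply configuration on every run.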
Pile of Bash Scripts
• An increasingly common pattern:
• Puppet or Chef for configuration management, PLUS
• Ansible or Salt for cross-node orchestration
• Recommendation: use the tools that work for you!
• But remember: you don’t have to do it alone.
• Several CM tools have thriving collaborators in the OpenStack community
• Links for later:
• Puppet for OpenStack
• Chef for OpenStack
• Ansible for OpenStack
• SaltStack for OpenStack
• Pile of bash scripts for OpenStack
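Cross-node orchestration is largely about ordering: which groups of machines change first, and how many change at once. Below is a sketch of a rolling plan. It assumes hypothetical group names, a hypothetical group order and a fixed batch size. Ansible's `serial` play keyword expresses a similar batching idea.

```python
from typing import Iterator, List, Tuple


def rolling_plan(groups: List[Tuple[str, List[str]]], batch_size: int) -> Iterator[Tuple[str, List[str]]]:
    """Yield (group, hosts) batches: groups strictly in the order given,
    hosts within a group in chunks of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for group, hosts in groups:
        for start in range(0, len(hosts), batch_size):
            yield group, hosts[start:start + batch_size]


if __name__ == "__main__":
    # Hypothetical inventory and order, touching at most 2 hosts at a time.
    inventory = [
        ("database", ["db1", "db2", "db3"]),
        ("controller", ["ctl1", "ctl2"]),
        ("compute", ["cmp1", "cmp2", "cmp3", "cmp4"]),
    ]
    for group, batch in rolling_plan(inventory, batch_size=2):
        print(group, batch)
```

Each batch would be handed to the CM tool (Puppet, Chef, etc.) to converge before the next one starts. Only a bounded slice of the cloud is in flux at any moment.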
• Unit tests for your deployment code are a good idea
• ServerSpec tests to make sure your config management system did what it was supposed to are great
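Post-deployment checks typically ask two things: is the service listening, and did the config file end up with the intended value? Below is a stdlib-only Python sketch in the spirit of ServerSpec (which is itself Ruby). The function names, paths and values are hypothetical. `strict=False` is used because some OpenStack config files repeat multi-valued options, which `configparser` rejects by default.

```python
import configparser
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def config_option_equals(path: str, section: str, option: str, expected: str) -> bool:
    """True if an INI-style config file (the format most OpenStack services
    use) sets `option` in `section` to `expected`."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    if not parser.read(path):
        return False  # file missing or unreadable
    return parser.get(section, option, fallback=None) == expected
```

Checks like these run after the CM tool finishes. They catch the case where the tool reported success but the service never came up.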
• Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
…well, haven’t you always wanted a butler?
• DevOps: actually pretty handy
• OpenStack change velocity (community’s and yours)
• Anecdote: the majority of deployments I work with have some customizations or backports from future releases
• It’s not just OpenStack, it’s all the underpinning components and your CM code too!
• OpenStack itself uses CI/CD tools in its development process…you should consider using them in your cloud build-out too!
• The OpenStack Infra team has created some awesome tools: JJB, Zuul, etc
• They’re all open source and you can even see how OpenStack’s own CI is set up (check out Elizabeth Joseph’s slides from yesterday for more!).
• The basics:
• An integration server (Jenkins, Go, Travis, etc)
• A code review and repository tool (Gerrit, Cgit, GitHub, etc)
• A battery of automated tests (lint checks, rspec-puppet, Tempest, Rally, etc)
• Some form of packaging (rpmbuild/mock, sbuild/pbuilder, etc)
• An artifact repository (Artifactory, yum/apt repos, etc)
• Optionally, some deployment jobs (usually powered by your CM tool)
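At the heart of such a pipeline is simple gating logic: run the stages in order and stop at the first failure, so a broken change never reaches packaging or deployment. Below is a minimal sketch that assumes each stage is just a command to execute. Real integration servers such as Jenkins add scheduling, logs and artifact handling on top. The function name is hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

Stage = Tuple[str, Sequence[str]]  # (stage name, command argv)


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run each stage's command in order and stop at the first failure.

    Returns (passed, names of the stages that actually ran)."""
    ran: List[str] = []
    for name, cmd in stages:
        ran.append(name)
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            print(f"stage {name!r} failed (exit {result.returncode}); stopping",
                  file=sys.stderr)
            return False, ran
    return True, ran
```

In practice the stage list might start with something like `("lint", ["puppet-lint", "manifests/"])`, followed by unit tests, a package build and a deployment job. Those commands are placeholders for whatever your own pipeline runs.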
• …you never intend to change the code yourself
• …building your own packages would violate a support contract with your distribution
• …you’ve never used a CI/CD pipeline before (but really: you should start learning)
• …you have a static environment that absolutely will not need to change, add capacity, etc.
• Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
• Now that you have a cloud, you’ll probably want to know that all its parts stay in good working order.
• I’ve worked on a lot of OpenStack clouds and almost everyone has their own preferred monitoring toolset.
• One possible exception: almost everybody seems to love Graphite.
• The golden rule is: use the tools that work for you!
• Very often this will be whatever you’re using in the rest of your infrastructure.
• Break it down into at least two buckets:
• Up/down and alerting (ex: Nagios or its derivatives…yes, there are OpenStack plugins out there on Nagios Exchange)
• Trending data collection/plotting (ex: collectd/statsd feeding Graphite)
• Also: use your peers!
• Check out Tong Li’s Monitoring as a Service talk later today!
• Operators are often willing to share, so ask on the openstack-operators mailing list.
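The two monitoring buckets map onto two simple wire conventions. Nagios plugins report status through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a line of text. Graphite's Carbon daemon accepts plaintext datapoints of the form `<metric path> <value> <unix timestamp>`, by default on TCP port 2003. Below is a sketch of both. The metric names, thresholds and function names are hypothetical.

```python
import socket
import time
from typing import Optional, Tuple

# Nagios plugin exit codes (the standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_threshold(label: str, value: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a measurement to a Nagios status and message; higher values are worse."""
    if value >= crit:
        return CRITICAL, f"CRITICAL - {label} is {value} (>= {crit})"
    if value >= warn:
        return WARNING, f"WARNING - {label} is {value} (>= {warn})"
    return OK, f"OK - {label} is {value}"


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one datapoint for Graphite's plaintext protocol, newline-terminated."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def send_to_graphite(line: str, host: str, port: int = 2003) -> None:
    """Ship plaintext-protocol lines to a Carbon listener."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))
```

A real plugin would print the message and call `sys.exit()` with the status code. In practice collectd or statsd usually handle shipping data to Graphite for you.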
• Architecture
• Components
• High Availability
• Bare metal bring-up
• Config management
• CI/CD
• Packaging
• Automated test
• Monitoring
• Up/down alerting
• Trending data
• Logging and log search
• Distributed systems generate logs… all over the place.
• Finding the root of problems may mean correlating logs from different machines…but which?
• OpenStack in particular *can* be pretty verbose
• You may also be dealing with logs from other distributed tools in your cloud (RabbitMQ, databases, etc)
• Generally you want to get logs together, be able to search them, and be able to visualize them.
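Correlating logs across machines usually means merging them into one timeline and filtering on something shared. A good candidate is the `req-<uuid>` request IDs that OpenStack services typically include in their log lines. The sketch below assumes each host's log is already in time order and that every line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp. Continuation lines without that prefix (e.g. tracebacks) are skipped. The function names are hypothetical.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed line prefix, e.g. "2014-10-23 14:02:11.123"
TS_LEN = 23


def _parse(host: str, lines: Iterable[str]) -> Iterator[Tuple[datetime, str, str]]:
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            continue  # no leading timestamp (e.g. traceback continuation): skip
        yield ts, host, line.rstrip("\n")


def correlate(logs: Dict[str, List[str]], needle: str) -> List[str]:
    """Merge per-host logs (each already in time order) into one timeline
    and keep only the lines that mention `needle`, such as a request ID."""
    streams = [_parse(host, lines) for host, lines in logs.items()]
    merged = heapq.merge(*streams)  # k-way merge ordered by timestamp
    return [f"{host}: {line}" for _ts, host, line in merged if needle in line]
```

Centralized log tools do the same job at scale. They index everything once instead of scanning files on every query.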
Unlike monitoring tools, there seems to be pretty broad consensus on good tools here in deployments I’ve worked with….
http://www.elasticsearch.org/blog/openstack-elastic-recheck-powered-elk-stack/
The ELK stack: Logstash (collection), Elasticsearch (search/analytics), Kibana (visualization)
Questions?
@marktvoelker
http://openstack.org/
http://cisco.com/go/openstack/
(yes, we’re hiring!)