Considerations for Operating an OpenStack Cloud

Post on 02-Jul-2015

DESCRIPTION

My talk from All Things Open 2014. Over the past four years, OpenStack has become a widely adopted cloud operating system. Cloud computing has made many tasks, like creating new servers and networks, easy for end users by creating abstractions above the infrastructure. However, cloud operators need to maintain not only the cloud operating system itself, but all of the underpinning systems beneath it. The challenge of managing a set of distributed systems isn’t small, but with proper tooling it is well within reach. This talk will discuss considerations for cloud operators such as logging, storage, monitoring, high availability, and configuration management, with a focus on open source solutions for common issues encountered when operating an OpenStack cloud. We’ll consider data gathered from the community and discuss “day 1” and “day 2” concerns as well as established patterns and technology choices among OpenStack deployers today.

TRANSCRIPT

© 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1

Mark T. Voelker, Technical Leader @ Cisco

OpenStack ATC/StackForge Puppet Core/Foundation Member #54

All Things Open 2014

@marktvoelker

• Tech Lead at Cisco, StackForge Puppet core developer, OpenStack Foundation Member #54

• Fact: can be bribed with doughnuts

• Currently works in Cisco’s Cloud & Virtualization Group

• In copious (hah!) spare time: OpenStack solutions, Big Data, Massively Scalable Data Centers, DevOps, making sawdust with extreme prejudice

• Tech lead, manager, software developer, architect

• Started in OpenStack in 2011 at the Diablo Design Summit

The great thing about my job is that I get to have fun exploring a lot of new things…

…and I get to help build a LOT of clouds.

Today’s talk won’t be overly formal…

…because I tend to get excited by this stuff.

…then you know how to get to Day 1.

Now let’s talk about getting to Day 30…

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

High Availability? Sounds great, I’ll take two!

• Consider whether you want active/active or active/passive

• Setup and tooling differ a bit, but I generally like active/active

• Note that docs.openstack.org has an HA Guide

• A bit dated…patches welcome!

• Prioritize HA for the control plane

• That also means thinking about your database, network, and RPC bus

• Instance-level HA: there be dragons

• But yes, it’s being looked at

• Pets vs cattle

• Note: HA == more hardware

• Some components need at least 3 nodes
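Why three? Quorum-based clusters such as Galera only keep accepting writes while a strict majority of members can still see each other. A minimal sketch of that arithmetic in Python; the function names are illustrative and not taken from any OpenStack or Galera API:

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many members can be lost while a majority still survives."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Two nodes tolerate zero failures, which is why three is the practical minimum. A fourth node adds hardware without adding failure tolerance.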

• Stuff OpenStack needs to run: message brokers

• Check out RabbitMQ clustering and mirrored queues

• Check out Galera for MySQL/MariaDB

• I usually see Percona XtraDB

• Frontend with an HAProxy/Keepalived pair
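In RabbitMQ 3.x, mirrored queues are switched on through a policy rather than per queue. A rough sketch of building that `rabbitmqctl set_policy` call from Python follows. The policy name `ha-all`, the catch-all pattern, and the helper name are assumptions for illustration, and nothing here actually runs the command:

```python
import json


def mirror_policy_command(name="ha-all", pattern="^", vhost="/"):
    """Build the argument list for a policy that mirrors matching queues.

    name, pattern and vhost are illustrative defaults; "^" matches every
    queue name in the vhost.
    """
    definition = {"ha-mode": "all", "ha-sync-mode": "automatic"}
    return ["rabbitmqctl", "set_policy", "-p", vhost,
            name, pattern, json.dumps(definition)]


if __name__ == "__main__":
    print(" ".join(mirror_policy_command()))
```

Mirroring to every node is the simplest choice to reason about, at the cost of more replication traffic between brokers.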

• Don’t do RabbitMQ clustering over a WAN

• Be aware of the SELECT…FOR UPDATE issue

• Long story short: Neutron and some parts of Nova use a SQL pattern known as “SELECT…FOR UPDATE”, which Galera doesn’t support due to issues with cross-node locking.

• This can cause deadlock symptoms.

• The Neutron and Nova code is being refactored to remove it, but that will likely not be done until at least Kilo.

• Meanwhile: use HAProxy to send writes to a single Galera node and you should be fine

• With the obvious scalability bottleneck

• More info here.

• Thank Jay Pipes & Peter Boros for the find!
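One way to express the “writes go to a single Galera node” workaround is to mark every node but one as `backup` in HAProxy, so the others only take traffic when the primary fails its health check. A minimal sketch that renders such a `listen` section; the VIP, node names, and addresses are made up, and a real deployment would likely want a Galera-aware health check rather than the plain TCP `check` used here:

```python
def galera_listen_block(nodes, vip="192.0.2.10", port=3306):
    """Render an HAProxy `listen` section with a single active Galera node.

    nodes is a list of (name, address) tuples. Every node after the first is
    marked `backup`; HAProxy sends traffic to a backup only when the
    non-backup server is down, and by default uses just the first available
    backup, so there is still only one writer at a time.
    """
    lines = [
        "listen galera",
        f"    bind {vip}:{port}",
        "    mode tcp",
        "    option tcpka",
    ]
    for index, (name, address) in enumerate(nodes):
        role = "" if index == 0 else " backup"
        lines.append(f"    server {name} {address}:{port} check{role}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical three-node cluster.
    print(galera_listen_block([("db1", "10.0.0.1"),
                               ("db2", "10.0.0.2"),
                               ("db3", "10.0.0.3")]))
```

Generating the stanza from a node list keeps the "exactly one primary" rule in one place, which fits nicely with driving HAProxy from your CM tool.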

• Use Swift, Ceph, or other highly available storage to back Glance

• Pick a highly available storage backend for Cinder too

• Use Keepalived/HAProxy to front-end multiple API servers

• Or another load balancer technology of your choice

• Can be deployed as dedicated nodes for scale, or cohabitate

• Network: DVR vs Provider Network Extensions

• Distributed Virtual Routers are a new experimental feature in Juno (not yet ready for production)

• Please go test it and report/fix bugs!

• Provider networks essentially punt the availability issue to your physical network

• Allows you to use standard tools like virtual port channels and VRRP

• Also highly performant

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

We start with bare metal.

• For a cloud of any real size, you don’t want to be installing operating systems by hand

• Remember that bare metal bring-up isn’t something that happens just once; it often recurs for upgrades, capacity expansion, etc.

• Bare metal bring-up tools can also have other uses, like inventory or bootstrapping configuration management agents.

• A simple (~15k lines of Python code) tool for managing bare metal deployments

• Flexible usage (API, CLI, GUI)

• Allows you to define systems (actual machines) and profiles (what you want to do with them)

• Provides hooks for Puppet so you can then do further automation once the OS is up and running

• Provides control for power (via IPMI or other means), DHCP/PXE (for netbooting machines), and more.
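To make the systems/profiles split and the power/PXE control a bit more concrete, here is a small sketch. The `Profile` and `System` classes are hypothetical illustrations of the model and are not the tool’s actual API. The `ipmitool` subcommands are real, and the commands are only built, not executed:

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """What you want to do with a machine: which OS to install and how."""
    name: str
    distro: str
    kickstart: str


@dataclass
class System:
    """An actual machine, tied to a profile and reachable via its BMC."""
    hostname: str
    mac: str
    bmc_address: str
    profile: Profile


def netboot_commands(system, user="admin"):
    """ipmitool invocations that make the machine PXE-boot and power-cycle it.

    The -E flag tells ipmitool to read the password from the IPMI_PASSWORD
    environment variable, so it never appears in the argument list.
    """
    base = ["ipmitool", "-I", "lanplus", "-H", system.bmc_address,
            "-U", user, "-E"]
    return [
        base + ["chassis", "bootdev", "pxe"],
        base + ["chassis", "power", "cycle"],
    ]


if __name__ == "__main__":
    # Hypothetical profile and machine.
    compute = Profile("compute", "ubuntu-14.04", "compute.ks")
    node = System("compute01", "52:54:00:12:34:56", "10.0.0.50", compute)
    for command in netboot_commands(node):
        print(" ".join(command))
```

Many machines can share one profile, which is what makes re-running bring-up for upgrades or capacity expansion cheap.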

• Razor

• Developed by EMC, managed by Puppet Labs (occasionally used with Chef too)

• Initial release in 2012

• Uses a “microkernel” loaded onto the machine to gather facts before provisioning

• Tag + Policy model

• Crowbar

• Originally written by Dell, now a community project

• Originally designed to deploy OpenStack all the way from bare metal

• Now deploys other stuff too (namely, Hadoop)

• Uses Chef to handle everything after the OS install

• Foreman

• Used by Red Hat among others

• Does bare metal bring-up and serves as a Puppet ENC (external node classifier)

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

“Cloud isn’t just an infrastructure technology…it’s a new operations model. And with OpenStack in particular, it’s one that’s very well suited to a DevOps style of management. Many companies aren’t just adopting cloud, they’re changing how they operate.”

“Besides, logging into servers to mess with config files makes me sad.”

--That ranty guy in Raleigh again

• Remember, OpenStack is a set of interoperating distributed systems

• That means you’re going to have a lot of software to configure on a lot of machines

• You’re probably going to want to make changes over time

• You’re probably going to have more than one person touching your cloud

• CM tools help you treat configuration as code, so you can collaborate more easily
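What “configuration as code” buys you is idempotence: you declare the state you want, and running the same code twice changes nothing the second time. A toy sketch of that idea for an INI-style file (the format OpenStack services use for files like nova.conf); `ensure_option` is a made-up helper and not part of Puppet, Chef, or any other CM tool:

```python
import configparser


def ensure_option(path, section, key, value):
    """Set key=value in an INI file; return True only if the file changed."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is simply treated as empty
    if parser.get(section, key, fallback=None) == value:
        return False  # already in the desired state, so do nothing
    if section != parser.default_section and not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    with open(path, "w") as handle:
        parser.write(handle)
    return True
```

Note that `configparser` rewrites the whole file and drops comments, which is one of many reasons to reach for a real CM tool rather than growing your own.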

Pile of Bash Scripts

• An increasingly common pattern:

• Puppet or Chef for configuration management, PLUS

• Ansible or Salt for cross-node orchestration

• Recommendation: use the tools that work for you!

• But remember: you don’t have to do it alone.

• Several CM tools have thriving collaborators in the OpenStack community

• Links for later:

• Puppet for OpenStack

• Chef for OpenStack

• Ansible for OpenStack

• SaltStack for OpenStack

• Pile of bash scripts for OpenStack

• Unit tests for your deployment code are a good idea

• ServerSpec tests to make sure your config management system did what it was supposed to are great
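ServerSpec itself is Ruby, but the idea carries over to any language: after the CM run, assert on what the machine actually looks like. A rough Python analogue (this is not ServerSpec, and the helper names are invented) that checks whether the services you expect are listening:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failed_checks(expected):
    """expected maps a service label to (host, port); return labels that are down."""
    return sorted(label for label, (host, port) in expected.items()
                  if not port_is_open(host, port))
```

For example, `failed_checks({"keystone": ("controller", 5000), "nova-api": ("controller", 8774)})` would return the labels that did not answer, where `controller` is a hypothetical hostname and the ports are the usual defaults for those services.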

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

…well, haven’t you always wanted a butler?

• DevOps: actually pretty handy

• OpenStack change velocity (community’s and yours)

• Anecdote: the majority of deployments I work with have some customizations or backports from future releases

• It’s not just OpenStack, it’s all the underpinning components and your CM code too!

• OpenStack itself uses CI/CD tools in its development process…you should consider using them in your cloud buildout too!

• The OpenStack Infra team has created some awesome tools: Jenkins Job Builder (JJB), Zuul, etc.

• They’re all open source and you can even see how OpenStack’s own CI is set up (check out Elizabeth Joseph’s slides from yesterday for more!).

• The basics:

• An integration server (Jenkins, Go, Travis, etc)

• A code review and repository tool (Gerrit, Cgit, GitHub, etc)

• A battery of automated tests (lint checks, rspec-puppet, Tempest, Rally, etc)

• Some form of packaging (rpmbuild/mock, sbuild/pbuilder, etc)

• An artifact repository (Artifactory, yum/apt repos, etc)

• Optionally, some deployment jobs (usually powered by your CM tool)
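Stripped to its core, what the integration server does with those pieces is run stages in order and stop at the first failure, so nothing gets packaged or deployed unless the tests passed. A deliberately tiny sketch of that gate; `run_pipeline` is a hypothetical helper and not how Jenkins or Zuul are implemented:

```python
import subprocess


def run_pipeline(stages, runner=subprocess.call):
    """Run (name, command) stages in order; stop at the first non-zero exit.

    Returns the name of the failing stage, or None if every stage passed.
    The runner argument exists so the gate can be exercised without
    executing real commands.
    """
    for name, command in stages:
        if runner(command) != 0:
            return name
    return None


# Hypothetical stage list; the paths and spec file name are placeholders.
EXAMPLE_STAGES = [
    ("lint", ["puppet-lint", "modules/"]),
    ("unit", ["rake", "spec"]),
    ("package", ["rpmbuild", "-ba", "mycloud.spec"]),
]
```

Calling `run_pipeline(EXAMPLE_STAGES)` would run lint, then unit tests, then packaging, and report which stage (if any) broke the build.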

• …you never intend to change the code yourself

• …building your own packages would violate a support contract with your distribution

• …you’ve never used a CI/CD pipeline before (but really: you should start learning)

• …you have a static environment that absolutely will not change, need to add capacity, etc.

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

• Now that you have a cloud, you’ll probably want to know that all its parts stay in good working order.

• I’ve worked on a lot of OpenStack clouds and almost everyone has their own preferred monitoring toolset.

• One possible exception: almost everybody seems to love Graphite.

• The golden rule is: use the tools that work for you!

• Very often this will be whatever you’re using in the rest of your infrastructure.

• Break it down into at least two buckets:

• Up/down and alerting (ex: Nagios or its derivatives…yes, there are OpenStack plugins out there on Nagios Exchange)

• Trending data collection/plotting (ex: collectd/statsd feeding Graphite)

• Also: use your peers!

• Check out Tong Li’s Monitoring as a Service talk later today!

• Operators are often willing to share, so ask on the openstack-operators list.
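The two buckets look different on the wire. A Nagios-style check boils a measurement down to an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), while Graphite’s plaintext protocol just wants `path value timestamp` lines. A small sketch of both; the metric name and thresholds in the example are invented:

```python
import time

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_threshold(label, value, warn, crit):
    """Up/down bucket: map a measurement to a Nagios-style (exit code, message)."""
    if value >= crit:
        return CRITICAL, f"CRITICAL - {label} is {value}"
    if value >= warn:
        return WARNING, f"WARNING - {label} is {value}"
    return OK, f"OK - {label} is {value}"


def graphite_line(path, value, timestamp=None):
    """Trending bucket: one sample in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


if __name__ == "__main__":
    depth = 15  # pretend this came from your message broker
    code, message = check_threshold("queue depth", depth, warn=10, crit=20)
    print(code, message)
    print(graphite_line("cloud.rabbit.queue_depth", depth), end="")
```

The same measurement can feed both buckets: the check pages someone when a threshold is crossed, while the Graphite line (sent to carbon, which listens on TCP port 2003 by default) lets you see the trend that led up to it.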

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

• Distributed systems generate logs…all over the place.

• Finding the root of problems may mean correlating logs from different machines…but which?

• OpenStack in particular *can* be pretty verbose

• You may also be dealing with logs from other distributed tools in your cloud (RabbitMQ, databases, etc)

• Generally you want to get logs together, be able to search them, and be able to visualize them.
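The “which machines?” question is usually answered by a shared identifier. OpenStack services tag log lines with a request ID of the form `req-<uuid>`, so once the logs are in one place you can group on it. A minimal sketch of that correlation step; the hostnames and log lines you feed it are up to you, and a real deployment would do this in a log search tool rather than by hand:

```python
import re
from collections import defaultdict

# OpenStack services tag log lines with a request ID of the form req-<uuid>.
REQUEST_ID = re.compile(r"req-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def correlate(logs_by_host):
    """Group log lines from many hosts by the request ID they mention.

    logs_by_host maps a hostname to an iterable of log lines. The result maps
    each request ID to a list of (host, line) pairs in the order encountered.
    """
    grouped = defaultdict(list)
    for host, lines in logs_by_host.items():
        for line in lines:
            for request_id in set(REQUEST_ID.findall(line)):
                grouped[request_id].append((host, line.rstrip("\n")))
    return dict(grouped)
```

Given logs from an API node and a compute node, `correlate` returns every line that belongs to the same request, whichever machine wrote it.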

Unlike monitoring tools, there seems to be pretty broad consensus on good tools here in deployments I’ve worked with…

http://www.elasticsearch.org/blog/openstack-elastic-recheck-powered-elk-stack/

Kibana (visualization)

Logstash (collection)

Elasticsearch (search/analytics)

Questions?

@marktvoelker

http://openstack.org/

http://cisco.com/go/openstack/

(yes, we’re hiring!)
