Lessons Learned Running The Largest OpenStack Clouds

KENNETH HUI
Senior Technical Marketing Engineer, Technology Evangelist
@kenhuiny

Post on 18-Feb-2017

AGENDA

• Rackspace and OpenStack
• A Short History Lesson
• Lessons Learned
• Resources

RACKSPACE AND OPENSTACK

RACKSPACE AND OPENSTACK: THEN AND NOW

RACKSPACE PUBLIC CLOUD

• 6 geographic regions around the globe
• Tens of thousands of hypervisors
• Over 350,000 cores, over 1.2 petabytes of RAM
• Hundreds of thousands of virtual machines
• Several hundred on-metal instances
• Hundreds of thousands of virtual switch ports

KEY COMMUNITY CONTRIBUTIONS TO OPENSTACK

• Nova Cells: the concept for scaling regions to thousands of nodes
• Tempest: the initial QA test framework for OpenStack
• OpenStack-Ansible: the deployment project
• Magnum: the container management system
• Rewriting the Swift object server in Go to meet hyper-scale demands
• Barbican: the key management service

RACKSPACE’S LEADERSHIP

• Freely share lessons learned
• Contribute code and ideas to the OpenStack project
• Open source tools based on what we use to operate our clouds

OPENSTACK INNOVATION CENTER (OSIC)

THREE PILLARS
1. Train the next generation of OpenStack contributors
2. Contribute to the removal of enterprise barriers to OpenStack adoption
3. Provide an avenue for operational scale testing to the OpenStack community

A SHORT HISTORY LESSON

ORIGIN STORY

• Before OpenStack, there was Slicehost
• Scaling limits led to OpenStack
• Xen is Slicehost’s legacy in the Rackspace Public Cloud
• Tens of thousands of existing customers meant starting at scale
• Private Cloud started with a clean sheet of paper

RACKSPACE’S APPROACH

• Continuously upgrade our public cloud
  – Deploy upstream OpenStack code
  – Patch regularly
• Only use projects stable enough to run in production at scale
• Don’t reinvent the wheel
• Change code in production to meet scale requirements
  – Certain bugs we only find in production
  – Contribute back upstream when appropriate
• Move ahead of community when necessary
  – Create service with internal software
  – Contribute code and lessons learned to project
  – Switch to project code when ready

LESSONS LEARNED

PARTITION YOUR CLOUD

• Why Cells?
  – Scaling: DB & RabbitMQ
  – Reduce failure impact
  – Broadcast domains / Nova
  – Multiple compute flavors: SSD
  – Multiple hardware types
• How we use Cells
  – ~100 hosts per cell: scaling / failure impact
  – Multiple cells per region: failure impact
  – Group same flavor types
  – Group servers from same vendor: live migration
• Takeaways
  – Use cells from day 1
  – Plan for scale
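The cell layout on this slide (roughly 100 hosts per cell, grouped by flavor type and hardware vendor) can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not Rackspace's tooling or the Nova cells API: the `Host` fields, the vendor and flavor labels, the cell naming scheme, and the `partition_into_cells` helper are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Host:
    name: str
    flavor_class: str  # hypothetical label, e.g. "ssd" or "standard"
    vendor: str        # hypothetical hardware vendor label


def partition_into_cells(
    hosts: List[Host], max_hosts_per_cell: int = 100
) -> Dict[str, List[Host]]:
    """Group hosts by (flavor class, vendor), then cap each cell's size."""
    groups: Dict[Tuple[str, str], List[Host]] = defaultdict(list)
    for host in hosts:
        groups[(host.flavor_class, host.vendor)].append(host)

    cells: Dict[str, List[Host]] = {}
    for (flavor_class, vendor), members in sorted(groups.items()):
        # Split each homogeneous group into chunks of at most max_hosts_per_cell.
        for start in range(0, len(members), max_hosts_per_cell):
            number = start // max_hosts_per_cell + 1
            cell_name = f"{flavor_class}-{vendor}-{number:03d}"  # made-up naming scheme
            cells[cell_name] = members[start:start + max_hosts_per_cell]
    return cells


# Hypothetical fleet: 250 SSD hosts from one vendor, 120 standard hosts from another.
fleet = (
    [Host(f"ssd-a-{i}", "ssd", "vendor_a") for i in range(250)]
    + [Host(f"std-b-{i}", "standard", "vendor_b") for i in range(120)]
)
cells = partition_into_cells(fleet)
cell_sizes = {name: len(members) for name, members in cells.items()}
print(cell_sizes)
```

Keeping every cell homogeneous in vendor and flavor type is what makes live migration within a cell straightforward, and the size cap bounds how many hosts a single cell failure can affect.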

ABSTRACT YOUR CONTROL PLANE

• iNova: ancestor to TripleO
  – Seed servers in each region
  – Seed servers & Cells run on VMs
  – Easy to deploy, tear down, redeploy services
  – React to issues quickly: spikes
• Virtualized compute nodes
  – Nova compute runs as VM on compute node
  – Limits impact of compute node failure
  – Reboot compute node but not hypervisor
  – Security isolation
• Takeaways
  – Explore TripleO: Red Hat OpenStack
  – Containerize your control plane: OSA
  – Protect your control plane: use HA

AUTOMATE EVERYTHING

• Operator error is more common than software failure
• Automation = Making time
• OpenStack Ansible
  – Encodes recommended practices
  – Rackspace Private Cloud RA
  – Highly customizable
  – Great community support
• Takeaways
  – Automation starts day 1
  – Pick an appropriate tool and run with it

USE FLEET MANAGEMENT

• Failure is inevitable at scale
• We created tools to manage the fleet
  – Auditor: monitor for rules compliance
  – Resolver: automate tasks based on events
  – Use cases
    • Upgrades and patches: Xen vulnerability live patch
    • Maintenance: live migration
• Takeaways
  – Focus on service availability over component availability
  – You can’t manage what you don’t know
  – Leverage live migration
  – Check out Project Craton
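The Auditor/Resolver split described on this slide (one component checks hosts against rules, another turns the resulting events into automated tasks) might look roughly like the sketch below. The slides do not show how Rackspace's tools are built, so everything here is hypothetical: the `HostState` fields, the single `hypervisor_patched` rule, and the `audit`, `plan_patch`, and `resolve` functions are invented for illustration, and the "tasks" are just strings describing a plan rather than real operations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HostState:
    name: str
    hypervisor_patched: bool
    running_instances: int


@dataclass(frozen=True)
class Event:
    host: str
    rule: str


# Auditor side: each rule maps a name to a predicate that is True when compliant.
RULES: Dict[str, Callable[[HostState], bool]] = {
    "hypervisor_patched": lambda host: host.hypervisor_patched,
}


def audit(hosts: List[HostState]) -> List[Event]:
    """Emit one event per (host, rule) pair that is out of compliance."""
    events = []
    for host in hosts:
        for rule_name, is_compliant in RULES.items():
            if not is_compliant(host):
                events.append(Event(host.name, rule_name))
    return events


def plan_patch(host: HostState) -> List[str]:
    """Plan a patch: drain the host by live migration first, then patch it."""
    steps = []
    if host.running_instances > 0:
        steps.append(f"live-migrate {host.running_instances} instances off {host.name}")
    steps.append(f"patch hypervisor on {host.name}")
    return steps


# Resolver side: each rule violation maps to a handler that plans the fix.
HANDLERS: Dict[str, Callable[[HostState], List[str]]] = {
    "hypervisor_patched": plan_patch,
}


def resolve(events: List[Event], hosts_by_name: Dict[str, HostState]) -> Dict[str, List[str]]:
    return {e.host: HANDLERS[e.rule](hosts_by_name[e.host]) for e in events}


hosts = [
    HostState("c1-h1", True, 12),
    HostState("c1-h2", False, 8),
    HostState("c1-h3", False, 0),
]
events = audit(hosts)
plans = resolve(events, {h.name: h for h in hosts})
for host_name, steps in plans.items():
    print(host_name, steps)
```

Draining a host through live migration before touching it reflects the takeaway above: the goal is to keep the service available even while individual components are taken out for maintenance.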

RESOURCES

• Rackspace Public Cloud: https://www.rackspace.com/cloud
• Rackspace Private Cloud: https://www.rackspace.com/cloud/private/openstacksolutions
• OpenStack Innovation Center: https://osic.org/
• Rackspace Blog: http://blog.rackspace.com/
• Rackspace Videos at OpenStack Summits: https://www.youtube.com/user/OpenStackFoundation/playlists
• Project Craton: https://github.com/openstack/craton

THANK YOU

ONE FANATICAL PLACE | SAN ANTONIO, TX 78218

US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM

© RACKSPACE LTD. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM