OpenStack Trove in Production at HP - TroveDay 2014

August 19, 2014 OpenStack Trove Day Vipul Sabhaya, Software Development Lead, HP Cloud Trove in Production at HP

Upload: tesora

Post on 29-Jun-2015


DESCRIPTION

Presentation by Vipul Sabhaya, Software Development Lead, HP Cloud at OpenStack Trove Day 2014

TRANSCRIPT

Page 1: OpenStack Trove in Production at HP  - TroveDay 2014

August 19, 2014

OpenStack Trove Day

Vipul Sabhaya, Software Development Lead, HP Cloud

Trove in Production at HP

Page 2: OpenStack Trove in Production at HP  - TroveDay 2014

tesora.com 2

What is this about?
• Trove
• How to deploy Trove with HA
• How we do config management
• Monitoring Trove
• Operating Trove

8/19/14

Page 3: OpenStack Trove in Production at HP  - TroveDay 2014

Trove
• Database as a Service
  • MySQL
  • MongoDB
  • Cassandra
  • Postgres
  • …
• Integrated OpenStack Project
  • Icehouse Release

Page 4: OpenStack Trove in Production at HP  - TroveDay 2014

Architecture

Page 5: OpenStack Trove in Production at HP  - TroveDay 2014

Which Cloud?
• Trove has only API dependencies
• Overcloud (bare-metal)?
• In-Cloud (VMs)?

Page 6: OpenStack Trove in Production at HP  - TroveDay 2014

HA Trove
• HA OverCloud
  • Availability Zones
• HA Trove Control Plane
  • Control Plane across availability zones
  • Galera Cluster
  • RabbitMQ Cluster
  • Multiple Trove API, TaskManager, Conductors
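The control plane layout above can be checked mechanically. The following is a minimal sketch, assuming a hand-written inventory of (service, availability zone) pairs; the zone names, the inventory itself, and the two-zone minimum are all illustrative assumptions, not details from the talk.

```python
from collections import defaultdict

# Hypothetical inventory: (service, availability zone) pairs for the control plane.
PLACEMENT = [
    ("trove-api", "az1"), ("trove-api", "az2"), ("trove-api", "az3"),
    ("trove-taskmanager", "az1"), ("trove-taskmanager", "az2"),
    ("trove-conductor", "az1"), ("trove-conductor", "az2"),
    ("galera", "az1"), ("galera", "az2"), ("galera", "az3"),
    ("rabbitmq", "az1"), ("rabbitmq", "az2"), ("rabbitmq", "az3"),
]


def zones_per_service(placement):
    """Map each service name to the set of zones it runs in."""
    zones = defaultdict(set)
    for service, zone in placement:
        zones[service].add(zone)
    return dict(zones)


def single_zone_services(placement, min_zones=2):
    """Return the services that run in fewer than min_zones availability zones."""
    zones = zones_per_service(placement)
    return sorted(name for name, seen in zones.items() if len(seen) < min_zones)
```

An empty result from `single_zone_services(PLACEMENT)` means every listed service survives the loss of one zone, at least on paper.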

Page 7: OpenStack Trove in Production at HP  - TroveDay 2014
Page 8: OpenStack Trove in Production at HP  - TroveDay 2014

How did we get here?
• SaltStack
  • Salt-based Trove deployment
    • https://github.com/saurabhsurana/trove-installer/tree/master/saltstack
  • Salt-based OpenStack deployment
    • https://github.com/EntropyWorks/salt-openstack

Page 9: OpenStack Trove in Production at HP  - TroveDay 2014

Configuration Management
• Helps define/control
  • Packages and dependencies to be installed
  • Configuration files to be copied
  • Users / groups
• Gives a reproducible state of the infrastructure
• Highstate Trove-managed VMs on first boot

Page 10: OpenStack Trove in Production at HP  - TroveDay 2014

Remote Execution
• No SSH
• Can control infrastructure from a single machine
• Can define user and resource level access
• Specifically useful for Trove to help manage DB instances

Page 11: OpenStack Trove in Production at HP  - TroveDay 2014

trove-api.sls

trove:
  user.present:
    - name: trove

trove-package:
  pip.installed:
    - name: trove
    - require:
      - user: trove

/etc/trove/trove.conf:
  file.managed:
    - source: salt://trove/api/trove.conf
    - template: jinja
    - user: trove
    - require:
      - pip: trove-package
      - user: trove

trove-api:
  service:
    - running
    - enable: True
    - watch:
      - pip: trove-package
      - file: /etc/trove/trove.conf

Page 12: OpenStack Trove in Production at HP  - TroveDay 2014

trove.conf

# Number of child processes to run
trove_api_workers = {{ pillar['trove_worker_threads']}}

# AMQP Connection info
rabbit_password = {{ pillar['trove_rabbit_password'] }}
rabbit_hosts = {{ pillar['trove_rabbit_hosts'] }}
rabbit_userid = {{ pillar['trove_rabbit_userid'] }}

sql_connection = {{ pillar['trove_mysql_connection']}}

{% if not pillar['devstack_setup'] %}

# Updates service and instance task statuses if instance failed become active
update_status_on_fail = True

# how long to wait for guest agent to become active (in sec) (default is 300)
usage_sleep_time = 30
usage_timeout = {{ salt['pillar.get']('trove_guestagent_active_timeout', 600) }}

{% endif %}

# Path to the extensions
api_extensions_path = {{ pillar['trove_path'] }}/extensions/routes

Page 13: OpenStack Trove in Production at HP  - TroveDay 2014

Trove @ HP Helion
• Image-based Deploys
  • TripleO
  • Trove Heat Templates
  • Trove Image Elements
• Saltcloud / Nova wrapper -> Salt Master -> Trove
• Seed -> Under -> Over -> Heat -> Trove

Page 14: OpenStack Trove in Production at HP  - TroveDay 2014

Operations - SaltStack
• Most of the DBaaS operations are based on SaltStack
• HA Deployment of Salt Masters
• Control the access to infrastructure with SaltStack
• Control access to customer instances
  • To help debug issues
  • But protect the data and access to the MySQL database
• Each Trove guest instance becomes a minion

Page 15: OpenStack Trove in Production at HP  - TroveDay 2014

Trove Upgrades
• Trove Datastore must be usable during all upgrades
• Upgrades usually involve downtime
• RPC Versioning
• Upgrade sequence that we follow:
  • Upgrade all the guest agents first (trove service)
  • Upgrade Task Manager and Conductor
  • Upgrade API servers
  • If a new RPC method is introduced, it must be available on the Guest before an API operation is performed
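The upgrade sequence on this slide amounts to an ordering constraint: nothing closer to the user is upgraded before the components behind it. A minimal sketch of that rule follows; the component keys and the version strings are hypothetical, and real tooling would read versions from the deployment rather than from a dict.

```python
# Upgrade order from the slide: guest agents, then Task Manager and Conductor, then API.
UPGRADE_ORDER = ["guestagent", "taskmanager", "conductor", "api"]


def next_component(versions, target):
    """Return the next component to upgrade, or None when all are at target."""
    for component in UPGRADE_ORDER:
        if versions[component] != target:
            return component
    return None


def api_upgrade_is_safe(versions, target):
    """True only when everything behind the API already runs the target version.

    This mirrors the rule that a new RPC method must exist on the guest
    before an API operation can call it.
    """
    return all(versions[name] == target for name in UPGRADE_ORDER[:-1])
```

A rollout script could call `next_component` in a loop and refuse to touch the API servers while `api_upgrade_is_safe` is false.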

Page 16: OpenStack Trove in Production at HP  - TroveDay 2014

Security of key Trove components
• Use SSL
  • Trove API
  • RabbitMQ
• Security Group
  • Database
    • Only Control Plane components need access
  • RabbitMQ
    • Control Plane and all the guest agents need access, but use the range wherever possible
• Use separate DB and RMQ credentials for each service
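One reading of "use the range" is that security group rules should name specific address ranges instead of allowing everything. The sketch below illustrates that idea with Python's standard `ipaddress` module; the CIDR blocks and component names are invented for the example and do not come from the talk.

```python
import ipaddress

# Hypothetical address ranges; real values depend on the deployment's networks.
ALLOWED_RANGES = {
    "database": ["10.0.0.0/24"],                  # control plane only
    "rabbitmq": ["10.0.0.0/24", "10.1.0.0/16"],   # control plane and guest agents
}


def is_allowed(component, source_ip):
    """Return True if source_ip falls inside a range permitted for component."""
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(cidr)
        for cidr in ALLOWED_RANGES[component]
    )
```

With these example ranges, a guest agent at 10.1.4.7 may reach RabbitMQ but not the control plane database.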

Page 17: OpenStack Trove in Production at HP  - TroveDay 2014

Monitoring of Trove Service / Instances
• Trove doesn’t ship with monitoring
• Upstart scripts respawn Trove services
• Monitor Trove API ports with Nagios
• Monitor RabbitMQ and DB connectivity from Control plane nodes
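A port check of the kind Nagios runs can be very small. The following is a sketch only, not the check HP used: it attempts a TCP connection and maps the result onto the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL). The function name and the timeout are assumptions.

```python
import socket

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def check_tcp(host, port, timeout=5.0):
    """Try a TCP connect and return (exit_code, status_line)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepting connections"
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable hosts.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
```

A wrapper script would print the status line and exit with the returned code. The same function works for RabbitMQ and database connectivity from the control plane nodes, given the right host and port for the deployment.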

Page 18: OpenStack Trove in Production at HP  - TroveDay 2014

Monitoring of key Trove components
• RabbitMQ
  • Number of Queues
  • Number of Sockets used
  • Number of Established Connections
  • Cluster Status
  • Failed access attempts
• Database
  • MySQL standard monitoring
  • Cluster status
  • Slow query log
  • error.log for unauthorized/failed access attempts
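The RabbitMQ items above translate naturally into threshold checks on a periodic metrics sample. This is a rough sketch: the dictionary keys, the default limits, and the three-node cluster size are all hypothetical, and how the sample is collected (management plugin, `rabbitmqctl`, log scraping) is left open.

```python
def rabbit_alerts(sample, max_queues=5000, socket_ratio=0.8, expected_nodes=3):
    """Return a list of alert strings for one RabbitMQ metrics sample.

    sample is a dict with the hypothetical keys: queues, sockets_used,
    sockets_total, running_nodes, failed_logins.
    """
    alerts = []
    if sample["queues"] > max_queues:
        alerts.append("queue count above limit")
    if sample["sockets_used"] > socket_ratio * sample["sockets_total"]:
        alerts.append("socket descriptors nearly exhausted")
    if sample["running_nodes"] < expected_nodes:
        alerts.append("cluster node down")
    if sample["failed_logins"] > 0:
        alerts.append("failed access attempts seen")
    return alerts
```

Alerting on a ratio of sockets used rather than an absolute count keeps the check valid after the descriptor limit is raised.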

Page 19: OpenStack Trove in Production at HP  - TroveDay 2014

Monitoring of key Trove components
• Trove Guest Agent Heartbeat status
• Trove Instance Audit (catch failed instances to help identify service issues)
• Connectivity to Trove instances from outside
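Heartbeat monitoring reduces to finding instances whose last report is too old. A minimal sketch, assuming the last heartbeat time per instance has already been read from wherever the deployment stores it; the five-minute limit and the instance names are arbitrary examples.

```python
from datetime import datetime, timedelta


def stale_instances(heartbeats, now, max_age=timedelta(minutes=5)):
    """Return sorted instance IDs whose last heartbeat is older than max_age.

    heartbeats maps an instance ID to the datetime of its last heartbeat.
    """
    return sorted(
        instance_id
        for instance_id, last_seen in heartbeats.items()
        if now - last_seen > max_age
    )


# Example with made-up instances: db-b has been silent for 12 minutes.
now = datetime(2014, 8, 19, 12, 0, 0)
example = {
    "db-a": now - timedelta(seconds=30),
    "db-b": now - timedelta(minutes=12),
}
print(stale_instances(example, now))
```

Passing `now` explicitly keeps the function deterministic, which makes the check easy to test.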

Page 20: OpenStack Trove in Production at HP  - TroveDay 2014

What we learned

Page 21: OpenStack Trove in Production at HP  - TroveDay 2014

OpenStack Trove: RabbitMQ
• RabbitMQ
  • Raise the default socket descriptor limit (it will be exhausted pretty soon)
  • The number of queues and sockets will keep growing if you don’t enable heartbeats on RabbitMQ connections
  • Monitoring is the key to dealing with a RabbitMQ cluster configured with mirrored queues

Page 22: OpenStack Trove in Production at HP  - TroveDay 2014

OpenStack Trove
• GuestAgent Heartbeats (Service Status notifications) should be monitored for failure
• Upgrading the Guest Agent is tricky on xsmall
• Quota mismatch between Trove and Nova would be the biggest reason for instance failures
• Resource mismatch between Trove and Nova
  • Schedule jobs to correct things
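A scheduled correction job starts by finding where Trove and Nova disagree. The sketch below compares two lists of server IDs and reports the differences; how the IDs are fetched and what the job does with each mismatch are deployment decisions not covered in the talk, and the result key names are invented for illustration.

```python
def reconcile(trove_ids, nova_ids):
    """Compare the server IDs Trove tracks with the ones Nova reports.

    Returns a dict with two sorted lists:
      missing_in_nova  - known to Trove but absent from Nova
      orphaned_in_nova - present in Nova but unknown to Trove
    """
    trove_set, nova_set = set(trove_ids), set(nova_ids)
    return {
        "missing_in_nova": sorted(trove_set - nova_set),
        "orphaned_in_nova": sorted(nova_set - trove_set),
    }
```

The same set difference applies to quota usage: compare what each service believes is consumed per tenant and flag the tenants where the two counts differ.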

Page 23: OpenStack Trove in Production at HP  - TroveDay 2014

Thank you