HKG15-204: OpenStack: 3rd-Party Testing and Performance Benchmarking
LEG HKG15-204
OpenStack Testing and Performance Benchmarking
3rd-Party CI, Rally and Tempest

Presented by: Andrew McDermott, Clark Laughlin
Date: Tuesday, 10 Feb 2015
Agenda
● Update on 3rd-party CI testing
  ○ why, where and how
● Update on Tempest
  ○ analysis of results
  ○ current issues
  ○ plan going forward
● Explanation of Rally
  ○ results
  ○ how we can make use of it
OpenStack 3rd-Party CI
● Goal: Get ARM recognized as an equal, supported platform for OpenStack
  ○ Path to recognition requires setting up a 3rd-party CI system
  ○ Must be able to demonstrate stability before being allowed to vote on patches
OpenStack 3rd-Party CI
● What it is:
  ○ Run Tempest against OpenStack, triggered by gerrit events
  ○ Report results back to OpenStack gerrit
  ○ Functional test of OpenStack components
● What it is not:
  ○ A general-purpose arm64 test environment
  ○ Testing hypervisor functionality
  ○ Testing performance or functionality of VMs
OpenStack 3rd-Party CI
● How?
  ○ Set up using OpenStack CI components
  ○ OpenStack deployment with KVM as hypervisor
  ○ Run devstack/tempest configured to use QEMU instances
Image credit: http://thoughtsoncloud.com/2014/09/creating-continuous-integration-environment-openstack/
[Diagram: arm64 nova-compute nodes]
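As a rough sketch of the setup just described, a minimal devstack local.conf that selects plain QEMU emulation instead of hardware-accelerated KVM might look like the following. The settings shown are assumptions for illustration, not the actual Linaro 3rd-party CI configuration:

```ini
# Illustrative devstack local.conf fragment (assumed values only,
# not the actual Linaro 3rd-party CI configuration).
[[local|localrc]]
# Run nova instances under plain QEMU emulation rather than KVM.
LIBVIRT_TYPE=qemu
# Enable the Tempest service so the functional tests can be run.
enable_service tempest
```

After stack.sh completes, Tempest can then be run against the resulting deployment.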
OpenStack 3rd-Party CI
● Setting up a dedicated testing environment in the Linaro co-lo facility
  ○ HP Moonshot
    ■ Single chassis
    ■ ~5 HP m300 cartridges (8-core amd64) running CI infrastructure services
    ■ ~20 HP m400 cartridges (8-core arm64) running test instances (KVM)
OpenStack 3rd-Party CI
● Plans:
  ○ Initially handle gerrit events for nova
  ○ Over time, scale to handle additional projects:
    ■ cinder
    ■ glance
    ■ swift
    ■ neutron
Questions
● What other projects does Linaro need to work towards adding test support for?
  ○ Network/storage plugins?
Questions
● Would anyone like to help?
  ○ Help debugging / fixing Tempest failures?
  ○ Experience setting up an OpenStack CI?
OpenStack Rally
● Rally is an OpenStack project that provides a framework for performance measurement, benchmarking and validation
  ○ runs benchmarks that show how the deployment scales
  ○ provides a historical view of the benchmarks that were run
  ○ details how fast they ran
  ○ validates that the workload ran successfully
Rally versus Tempest
● Rally is a higher-level tool than Tempest
  ○ Tempest is typically about running something once
  ○ Rally is more about testing across your data centres with 1000s of machines, each with 1000s of users/tenants
● Note: validation could use Tempest as a workload
Rally high-level use cases
● Rally for devops
  ○ use an existing cloud, simulate real-world load, aggregate results, verify SLAs have been met
● Rally for developers and QA
  ○ deploy, simulate real-world load, iterate on performance issues, aggregate results, make OpenStack better by upstreaming patches
● Rally for Continuous Integration / Delivery
  ○ deploy on a specific h/w configuration with the latest versions from tip, run a specific set of benchmarks, store performance data for historical trend analysis, report results - this use case is our initial focus
Rally Benchmark Scenarios
● A scenario is a benchmark specification
  ○ Typically grouped into OpenStack functional areas
● A scenario performs a small set of atomic operations
  ○ nova: boot then delete an instance
  ○ keystone: create a user, then list users
● Benchmark scenarios are also customisable
  ○ which image to use, how much RAM, disk, CPU
Rally Benchmark Runners
● Control the execution of a benchmark
● Provide different strategies for applying load to the deployment:
  ○ constant - generate a constant load N times
  ○ constant-for-duration - constant, but time-limited
  ○ periodic - intervals between consecutive runs
● The key aspect is concurrency
  ○ Run the same test but with concurrent invocations
  ○ This is quite different to Tempest testing
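The three load strategies above correspond to the "runner" section of a Rally task file. The fragment below is a hedged sketch: the fields follow the style of the example scenario later in this deck, but exact runner type names and supported fields vary between Rally versions, and the top-level keys here are illustrative labels only, not Rally syntax:

```json
{
  "constant_example":   { "type": "constant", "times": 15, "concurrency": 2 },
  "time_boxed_example": { "type": "constant_for_duration", "duration": 60, "concurrency": 2 },
  "periodic_example":   { "type": "periodic", "times": 10, "period": 5 }
}
```

Raising "concurrency" is what distinguishes a Rally run from a one-shot Tempest run: the same atomic operations are issued by several simulated users at once.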
Rally Benchmark Context
● A Context typically specifies:
  ○ the number of users/tenants
  ○ the roles granted to those users/tenants
  ○ whether they have extended or narrowed quotas
● Running a test on your laptop is different to running the test at scale
Rally Example Scenario

{
    "NovaServers.boot_server": [
        {
            "args": {
                "flavor_id": 42,
                "image_id": "73257560-c59b-4275-a1ec-ab140e5b9979"
            },
            "runner": {
                "type": "constant",
                "times": 15,
                "concurrency": 2
            },
            "context": {
                "users": {
                    "tenants": 1,
                    "users_per_tenant": 3
                },
                "quotas": {
                    "nova": {
                        "instances": 20
                    }
                }
            }
        }
    ]
}
Rally Benchmark Database
● Rally stores results in a database
  ○ data mining & trend analysis
  ○ looking at historical results
  ○ results can be arbitrarily tagged, then used in SQL queries
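Rally's own database schema is internal to the tool, but the idea of tagging benchmark results and mining them with SQL can be sketched in a few lines of Python with sqlite3. The table layout and values below are invented purely for illustration and are not Rally's actual schema:

```python
import sqlite3

# Illustrative only: a made-up table of tagged benchmark results,
# not Rally's actual database schema.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE results
              (scenario TEXT, tag TEXT, duration_s REAL)""")
db.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("NovaServers.boot_server", "juno", 11.2),
     ("NovaServers.boot_server", "juno", 12.8),
     ("NovaServers.boot_server", "icehouse", 15.1)])

# Trend analysis: average boot time per tagged release.
for tag, avg in db.execute(
        "SELECT tag, AVG(duration_s) FROM results "
        "GROUP BY tag ORDER BY tag"):
    print(tag, round(avg, 1))
```

Tagging each run (here by release name) is what makes historical comparisons such as "icehouse vs juno" a simple GROUP BY query.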
$ rally task list
+--------------------------+---------------------+----------+--------+
| uuid                     | created_at          | status   | failed |
+--------------------------+---------------------+----------+--------+
| fbdf6a3e-...fe47d6345d13 | 2014-10-22 15:26:37 | finished | False  |
| ab231519-...3a72b7460fad | 2014-10-22 15:29:32 | finished | False  |
| 67ff34c4-...a6a651f1c458 | 2014-10-24 13:33:15 | finished | False  |
| 495598c5-...98b0e9b005e6 | 2014-11-12 11:02:46 | finished | False  |
+--------------------------+---------------------+----------+--------+
Nova “boot-and-delete” scenario
● Manual runs of the “boot-and-delete” scenario
  ○ Results for 1 controller, 2 compute nodes
  ○ Results for 1 controller, 3 compute nodes
● Disclaimer: results and timings are illustrative only - machines and network were shared
How we use and run Rally
● Deployment, testing and running of Rally through LAVA and manually
● Start with nova scenarios
  ○ grow and expand to other OpenStack components
  ○ Future: benchmark ODP and NFV
● Run scenarios against icehouse, juno and tip
OpenStack Tempest Update
● Summary from LEG-SC meeting
● Analysis of results
● What are the current issues
● What we plan to do next cycle
Tempest Result Summary
● Bundle Stream: https://validation.linaro.org/dashboard/streams/private/team/mustang/mwhudson-devstack/bundles/7c4d42405460a199ae694d0affe8d9e3ae96c64e/

        ARMv8   x86 (OpenStack CI)
Pass     1379   2051
Fail       36      0
Skip      322    200
Understanding “skips”
● Components not installed
  ■ cinder, neutron, trove, sahara, ceilometer, zaqar, etc.
● Config setting not enabled
  ■ Nova v3 API, suspend, live migration
● Currently disabled (existing bugs)
● Configuration errors
  ■ ping/ssh access not enabled
  ■ not enough images in glance
Examining Tempest failures
● Some reasons:
  ○ HTTP timeouts in test setup
  ○ Invalid configuration when creating instances (attempting to use an IDE bus)
● Common ARM and x86 failures
  ○ Unable to locate instance/image by ID
  ○ Unable to establish SSH connection to running instance
  ○ Tempest test suite can hang when running concurrently (e.g., --concurrency=8)
Getting more tests passing
● We need to enable subsystems like cinder (needs PCIe)
● Get live migration working
  ○ Live migration is planned for 2015.03
  ○ PCIe (hot plug) is planned for 2015 Q2
● Neutron: getting it configured and working on ARMv8
Ongoing LAVA testing plan (1)
● Dedicate 3 (new) machines in LAVA for OpenStack testing
● Will improve test execution time
  ○ no reboot
  ○ no reinstall of the base OS for each run
  ○ not shared
● Machines will also be used for Rally benchmarking
Ongoing LAVA testing plan (2)
● Establish baseline results for:
  ○ icehouse vs juno vs tip
● CI jobs for both ARM and x86
  ○ Want a baseline to make comparisons
  ○ x86 is minimal, best effort only
● Investigate LAVA results
  ○ some lab issues
  ○ some test jobs fail very early
Linaro OpenStack Bugzilla

● Bug database set up:
  ○ https://bugs.linaro.org/enter_bug.cgi?product=OpenStack
● Capturing ARMv8-only bugs
  ○ Common bugs will be reported upstream