Continuous Validation at Scale


Upload: mirantis

Post on 08-May-2015


DESCRIPTION

Slide deck by Vijay Seshadri of Symantec, from the OpenStack at Mega-scale Meetup on April 2nd, 2014.

TRANSCRIPT

Page 1: Continuous Validation at Scale

Symantec Confidential – Cloud Platform Engineering 1

Continuous Validation at scale

Vijay Seshadri

Cloud Platform Engineering (CPE), Symantec

Page 2: Continuous Validation at Scale

Agenda

1. CPE Overview

2. What is Continuous Validation?

3. SCTF Overview & Usage

4. SCTF Design and Roadmap


Page 3: Continuous Validation at Scale

CPE Overview

• CPE Charter

– Consolidated cloud infrastructure that offers platform services for Symantec cloud applications

• Symantec Cloud Infrastructure already operating at scale

– Compute – Reputation based security

– Storage – Consumer and Enterprise backup

– Network – Hosted email security

• How do we leverage the best practices/insights from operating at scale to the new platform?

• Core objectives

– Secure, scalable and reliable OpenStack based cloud platform

Page 4: Continuous Validation at Scale

Cloud Platform Engineering (CPE)

CPE Platform Architecture

[Architecture diagram. Labels recoverable from the slide, grouped as far as the layout can be inferred:]

• Access – CLIs, Scripts, Cloud Applications, Web Portal, REST/JSON API

• Core Services – Compute, Networking, Storage, Big Data, Messaging

– Compute (Nova), Image (Glance), SDN (Neutron), Load Balancing, DNS, Object Store, SQL, K/V Store, Batch Analytics, Stream Processing, Msg Queue, Mem Cache, Email Relay, SSL

• Identity & Access (Keystone) – Authn, Roles, User Mgmt, Tenancy, Quotas

• Supporting Services – Logging, Metering, Monitoring, Deployment

Page 5: Continuous Validation at Scale

CPE Reference App #1 - Log Collection service

[Diagram labels: CPE Cloud; Compute (VM0, VM1); LB; Object Store (Swift) with a Container; DNS queries; Keystone Authentication; Log Collection App; Log Sources (e.g. security metadata, install logs, telemetry)]

1. Acquire an authentication token

2. Create two VMs, associate a network and start them using a CentOS image

3. Create a LB endpoint, place the two VMs in it and configure a DNS entry

4. Provision a container in the Object store

5. Deploy and start the flask application

6. Fetch log files from Object store
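Step 1 above can be sketched in Python. This is only an illustration, not the CPE implementation: it assumes the Keystone Identity API v3 password flow (the deck does not say which Keystone version CPE ran), and the function name, endpoint, user and project names are all hypothetical. The request is built but not sent.

```python
import json
import urllib.request


def build_token_request(keystone_url, user, password, project, domain_id="default"):
    """Build (but do not send) a Keystone v3 password-auth token request."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"id": domain_id},
                        "password": password,
                    }
                },
            },
            # Scope the token to one project (tenant).
            "scope": {"project": {"name": project, "domain": {"id": domain_id}}},
        }
    }
    return urllib.request.Request(
        url=keystone_url.rstrip("/") + "/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and credentials, for illustration only.
req = build_token_request(
    "https://keystone.example.com:5000", "logapp", "secret", "log-collection"
)
print(req.get_method(), req.full_url)
```

With Keystone v3, a successful response carries the token in the `X-Subject-Token` response header; the later steps would pass it as `X-Auth-Token` on calls to the other services.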

Page 6: Continuous Validation at Scale

Problem Statement

• Cloud infrastructure at scale is a highly dynamic environment

– Diversity of cloud workloads

• Cannot predict application behaviors and patterns

– Addition and removal of resources (machines, network equipment etc.)

– Configuration drift over a period of time

– External events causing huge variations in network, compute and storage consumption

– Stability issues occur when you cross scale boundaries (jump an order of magnitude)

• Key Question – What validation tools/frameworks do we need to identify issues at scale and remediate them?

Page 7: Continuous Validation at Scale

What capabilities do we need in a validation framework?

• Ability to test generic REST/JSON endpoints (services)

– Including OpenStack and platform services

• Ability to quickly create tests for functionality, stability and performance

– Should not be burdensome for developers

• Ability to customize/extend test conditions and/or verification functions

• Independent channel of verification

– Higher order verification

• E.g., don't just check the return status from individual services; verify the end-to-end function

– Extensible, pluggable design

• Provide continuous visibility into the health and performance of production cloud

– Proactively monitor transient and persistent errors
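One way to picture the "customize/extend verification functions" and "pluggable design" requirements is a registry of named checks that tests refer to by name. The sketch below is an assumption about how such a mechanism could look, not SCTF's actual design; every name in it is hypothetical, and a canned response stands in for a real REST call.

```python
import json

# Registry of named verification functions; new checks plug in via the decorator.
VERIFIERS = {}


def verifier(name):
    """Decorator that registers a verification function under a name."""
    def register(func):
        VERIFIERS[name] = func
        return func
    return register


@verifier("status_is")
def status_is(response, expected):
    return response["status"] == expected


@verifier("json_field_equals")
def json_field_equals(response, field, expected):
    return json.loads(response["body"]).get(field) == expected


def validate(response, checks):
    """Run each (verifier name, kwargs) pair; return the names of failed checks."""
    return [name for name, kwargs in checks if not VERIFIERS[name](response, **kwargs)]


# A canned response stands in for a real REST call.
response = {"status": 200, "body": '{"state": "ACTIVE"}'}
failures = validate(response, [
    ("status_is", {"expected": 200}),
    ("json_field_equals", {"field": "state", "expected": "ERROR"}),
])
print(failures)
```

Because checks are looked up by name, a test author can add a new verification function without touching the runner.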

Page 8: Continuous Validation at Scale

Continuous Validation State Transitions

Page 9: Continuous Validation at Scale

Symantec Cloud Test Framework (SCTF)

• What is SCTF? – A set of python libraries, scripts and simple text files (YAML) that facilitate the validation of a cloud infrastructure

– Primitives for expressing REST requests and validating responses

[Screenshot of an SCTF test definition; callouts: Built in exec function, Test Command, Validation condition]
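The SCTF test file itself is not reproduced in this transcript, so the following is only a sketch of the idea behind the three callouts: an exec-style function runs a test command and applies a validation condition to its output. The dictionary keys and the function name are hypothetical, and a Python dict stands in for the YAML input.

```python
import re
import subprocess
import sys


def run_exec_test(test):
    """Run the test's command and check its stdout against a regex condition."""
    completed = subprocess.run(
        test["command"], capture_output=True, text=True, timeout=test.get("timeout", 30)
    )
    passed = (
        completed.returncode == 0
        and re.search(test["expect_stdout"], completed.stdout) is not None
    )
    return {"name": test["name"], "passed": passed, "stdout": completed.stdout.strip()}


# Hypothetical test definition; in SCTF this would come from a YAML file.
test = {
    "name": "echo-smoke-test",
    "command": [sys.executable, "-c", "print('service ok')"],
    "expect_stdout": r"\bok\b",
}
result = run_exec_test(test)
print(result)
```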

Page 10: Continuous Validation at Scale

How to run SCTF?

[Screenshot of an SCTF run; callouts: Input YAML file, Test case name, Validation summary]

Page 11: Continuous Validation at Scale

SCTF Usage – Simple web request

[Screenshot of an SCTF test definition; callouts: Built in Web service function, Request URL and Method, Response Code]
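A simple web request test boils down to a URL, a method and an expected response code. The sketch below shows that shape under stated assumptions: the test dictionary layout, `run_web_test` and the example URL are hypothetical rather than SCTF's own syntax, and the transport is injectable so the example runs offline.

```python
import urllib.error
import urllib.request


def http_fetch(method, url, timeout=10):
    """Default transport: return the HTTP status code for a request."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code to validate


def run_web_test(test, fetch=http_fetch):
    """Issue the request and compare the status code with the expected one."""
    status = fetch(test["method"], test["url"])
    return {"name": test["name"], "status": status, "passed": status == test["expect_status"]}


# Hypothetical test; the fake transport keeps the example runnable offline.
test = {
    "name": "list-images",
    "method": "GET",
    "url": "https://glance.example.com/v2/images",
    "expect_status": 200,
}
result = run_web_test(test, fetch=lambda method, url: 200)
print(result)
```

Dropping the `fetch` argument makes the same test hit the real endpoint through `urllib`.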

Page 12: Continuous Validation at Scale

SCTF Usage – Reusable Primitives

[Screenshot of an SCTF test definition; callouts: Test Procedure Name, Variable definitions, Test case definition]
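The three callouts suggest a named procedure that is parameterized by variables and expanded into concrete test cases. A rough Python analogue, assuming simple `$name` substitution (the actual SCTF variable syntax is not shown in the transcript, and all names below are hypothetical):

```python
from string import Template

# Hypothetical reusable procedure: a named template with $placeholders.
PROCEDURES = {
    "check_endpoint": {
        "method": "GET",
        "url": Template("https://$host:$port/$path"),
        "expect_status": 200,
    }
}


def make_test_case(name, procedure, variables):
    """Expand a named procedure into a concrete test case using variable definitions."""
    template = PROCEDURES[procedure]
    return {
        "name": name,
        "method": template["method"],
        "url": template["url"].substitute(variables),
        "expect_status": template["expect_status"],
    }


# Shared variable definitions, overridden per test case.
defaults = {"host": "cloud.example.com", "port": 443}
cases = [
    make_test_case("nova-up", "check_endpoint", {**defaults, "path": "compute/v2.1"}),
    make_test_case("glance-up", "check_endpoint", {**defaults, "path": "image/v2"}),
]
for case in cases:
    print(case["name"], case["url"])
```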

Page 13: Continuous Validation at Scale

SCTF Usage – Independent channel of verification

[Screenshot of an SCTF test definition; callouts: Built in exec function started after VM create, ssh command line, Retry args]
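The point of this slide is that a "VM created" API response is not trusted on its own: the test also logs in over ssh, retrying while the VM boots. The sketch below illustrates that pattern only. The helper names, the `centos` login user and the address are assumptions, and the offline demonstration uses a fake probe instead of a real ssh call.

```python
import subprocess
import time


def ssh_probe_command(host, user="centos", connect_timeout=5):
    """Build an ssh command line that succeeds only if the host accepts a login."""
    return ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={connect_timeout}",
            f"{user}@{host}", "true"]


def run_command(argv):
    """Return True if the command exits with status 0."""
    return subprocess.run(argv, capture_output=True).returncode == 0


def retry(probe, attempts=5, delay=2.0, sleep=time.sleep):
    """Call probe() until it returns True; return the tries used, or 0 on failure."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)  # give the VM time to finish booting
    return 0


# Offline demonstration: a fake probe that succeeds on the third try.
outcomes = iter([False, False, True])
tries = retry(lambda: next(outcomes), attempts=5, delay=0)
print(tries)

# Against a real VM (hypothetical address), the probe would be:
#   retry(lambda: run_command(ssh_probe_command("10.0.0.12")))
```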

Page 14: Continuous Validation at Scale

SCTF Design

Page 15: Continuous Validation at Scale

SCTF Roadmap

• Stream files – enable large file downloads

• Test Runner – execute all test files in a directory hierarchy

• Preserve comments – retain comments after programmatic manipulation

• Improve error reporting – make stack traces and error reporting more descriptive

• Incorporate salt to allow remote execution and job management

• Allow tests to be run in parallel (multiple possible ways)

– Use pykka ( https://github.com/jodal/pykka ) for actors in a single process

– Call out to julia ( http://julialang.org/ ) and use the parallel facilities

Page 16: Continuous Validation at Scale

SCTF Roadmap – Cont’d

• Allow test results to be written to files and databases.

• Allow test documentation to be queried.

• Determine why the test failed

– Diagnosis

– Remediation

– Validate remediation

• Add timing and metadata to test output.

• Performance as test criteria

• Add extension type to allow type handlers to be added at run-time
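The "timing" and "performance as test criteria" items could combine as follows: each test records its wall-clock duration and fails if it exceeds a time budget, even when it is functionally correct. This is a sketch of the roadmap idea, not an SCTF feature; `timed_test` and its result fields are hypothetical.

```python
import time


def timed_test(name, func, max_seconds):
    """Run func, record wall-clock duration, and fail if it exceeds max_seconds."""
    start = time.perf_counter()
    try:
        functional_ok = bool(func())
    except Exception:
        functional_ok = False
    elapsed = time.perf_counter() - start
    return {
        "name": name,
        "functional_ok": functional_ok,
        "elapsed_seconds": elapsed,
        "within_budget": elapsed <= max_seconds,
        "passed": functional_ok and elapsed <= max_seconds,
    }


# Stand-in workloads; a real test would issue a REST call here.
fast = timed_test("fast-call", lambda: True, max_seconds=1.0)
slow = timed_test("slow-call", lambda: time.sleep(0.05) or True, max_seconds=0.01)
print(fast["passed"], slow["passed"])
```

Keeping `functional_ok` and `within_budget` as separate fields lets a report distinguish a broken service from a slow one.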

Page 17: Continuous Validation at Scale

Summary/Conclusion

• We plan to use SCTF as a primary means of functional and performance validation

– Enable continuous monitoring of the stability and performance of the CPE cloud

– Ability to associate diagnosis and remediation with failing functional tests

– Scale the ability to generate tests along with the cloud

– Enable shorter mean time to resolution

• Planning to collaborate with other similar open source projects

• Our primary motivation is to ensure the stability of an OpenStack based cloud when deployed at scale