War Stories from Building a Public Cloud
QCon New York - 2015
Amila Maharachchi
Senior Tech Lead, WSO2 [email protected]
Motivation
o History
o StratosLive
o Stratos -> Donated to Apache
o Wanted to provide a better user experience
o WSO2 API Manager was becoming a hot product
o WSO2 AppFactory was in the making
Beginning
o Two clouds
  o App Cloud
    ■ Powered by WSO2 AppFactory
  o API Cloud
    ■ Powered by WSO2 API Manager
o In beta for nearly two years
o API Cloud is commercial now
Why it is a war :)
o It is not a war, but
o More than 100 instances
o Can’t let a bug live too long
o Need to upgrade frequently
o Customer issues/questions
o We depend on other WSO2 products, but...
Will be sharing the experience on..
o Customizations and new developments
o Planning the deployment
o Configuration management
o Monitoring and alerts
o Bug fixes, upgrades and migrations
o Security
o Backups and restoration
o Statistics
o Feedback and customer support
o Performance issues
o Processes
Customizations and new developments
o New user model
  o Requirement of plugging in the wso2.com userstore
  o Ease of registering and working in organizations
  o Wrote our own userstore implementation
o Management app
  o Two clouds (and more in the future) to be centrally managed
Customizations and new developments
[Diagram: Old Model vs. New Model. Labels shown: user1@tenant1, user2@tenant2, user3@tenant3; Tenant 1, Tenant 2, Tenant 3]
Planning the deployment
o AWS as the IaaS
  o Previous experience in running a cloud in our infrastructure
  o We are not specialized in maintaining data centers
  o So, why waste our time
o EC2, VPC, RDS, R53
o High availability
Configuration Management
o We had previously used our own solution
o We were also playing with Puppet
o Some facts to consider
  o AWS instances are shut down for maintenance
  o Necessity of scaling
  o Setting up multiple environments
o Decided to go with Puppet
o We manage more than 100 nodes now
Monitoring & Alerting
o Three types of monitoring were needed
  o Health of the instances and JVM processes
    ■ SNMP, Nagios
    ■ Emails, phone alerts etc.
  o Functionality health
    ■ Our own heartbeat monitoring tool
    ■ Improved to track the uptime as well
    ■ We keep adding more tests
  o Logs
    ■ To smell trouble
    ■ Logstash and Kibana from https://www.elastic.co/
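The functionality-health idea above (a heartbeat tool that runs functional tests, tracks uptime, and keeps gaining tests) can be illustrated with a minimal sketch. This is not the tool described in the talk; the class name `HeartbeatMonitor`, the check names `login` and `create_api`, and the alerting via `print` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: runs named functional checks and tracks uptime."""
    checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)
    history: Dict[str, List[bool]] = field(default_factory=dict)

    def add_check(self, name: str, check: Callable[[], bool]) -> None:
        # New functional tests can be registered at any time.
        self.checks[name] = check
        self.history.setdefault(name, [])

    def run_once(self) -> List[str]:
        """Run every check once and return the names of the failing checks."""
        failed = []
        for name, check in self.checks.items():
            try:
                ok = bool(check())
            except Exception:
                # A check that raises counts as a failed heartbeat.
                ok = False
            self.history.setdefault(name, []).append(ok)
            if not ok:
                failed.append(name)
        return failed

    def uptime_percent(self, name: str) -> float:
        """Share of successful heartbeats for one check, in percent."""
        results = self.history[name]
        if not results:
            return 0.0
        return 100.0 * sum(results) / len(results)


# Stand-ins for real functional tests (assumed names, canned results).
monitor = HeartbeatMonitor()
monitor.add_check("login", lambda: True)
canned = iter([True, False, True, True])
monitor.add_check("create_api", lambda: next(canned))

for _ in range(4):
    failed = monitor.run_once()
    if failed:
        # A real setup would send an email or phone alert here.
        print("ALERT:", ", ".join(failed))

print(monitor.uptime_percent("create_api"))  # 75.0
```

In a real deployment each check would exercise a user-facing flow end to end, and the loop would run on a schedule rather than four times in a row.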
Bug fixes, Upgrades & Migration
o Bug fixing
  o We can’t let a bug exist in the live system
  o We are a customer of WSO2 :-)
  o Get patches from WSO2 Support
o Upgrades & Migrations
  o Deploy AppFactory milestones every 2 or 3 weeks
  o Some end up needing migrations
  o Have our own ways
o Target
  o Continuous deployment
Security
o Access to infrastructure
o Public/Private access for services
  o Which service/product should be exposed publicly/privately
o Securing users’ data
  o Native multi-tenancy support
  o Java security manager enabled
    ■ Very strict at the moment
Backups & Restoration
o Any user artifact is in Git or SVN
o Snapshots taken
  o RDS
  o EBS
  o Automated via AWS facilities
o LDAP backups
Statistics
o We need to know what is happening
  o How many users are using this daily
  o How far they go
  o Understand our UX
o Publish stats to WSO2 BAM on various user activities
o Run analytic scripts
o Uptime tracking
o Also refer to Logstash stats
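The two questions above (how many users are active daily, and how far they go) can be answered with a small analytic script over activity events. The sketch below is plain Python and does not use any WSO2 BAM API; the event shape `(day, user, step)` and the funnel step names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical funnel steps, in the order a user is expected to reach them.
FUNNEL = ["signed_up", "logged_in", "created_api", "published_api"]

Event = Tuple[str, str, str]  # (day, user, step)


def daily_active_users(events: Iterable[Event]) -> Dict[str, int]:
    """Count the distinct users seen on each day."""
    users_by_day: Dict[str, Set[str]] = defaultdict(set)
    for day, user, _step in events:
        users_by_day[day].add(user)
    return {day: len(users) for day, users in sorted(users_by_day.items())}


def funnel_reach(events: Iterable[Event]) -> Dict[str, int]:
    """Count the distinct users who reached each funnel step."""
    users_by_step: Dict[str, Set[str]] = defaultdict(set)
    for _day, user, step in events:
        users_by_step[step].add(user)
    return {step: len(users_by_step[step]) for step in FUNNEL}


# Made-up sample data for illustration only.
events = [
    ("2015-06-01", "alice", "signed_up"),
    ("2015-06-01", "alice", "logged_in"),
    ("2015-06-01", "bob", "signed_up"),
    ("2015-06-02", "alice", "created_api"),
    ("2015-06-02", "carol", "signed_up"),
    ("2015-06-02", "carol", "logged_in"),
]

print(daily_active_users(events))
print(funnel_reach(events))
```

A sharp drop between two adjacent steps in the funnel output points at the place in the user experience that deserves a closer look.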
Feedback and customer support
o There was no way for the users to contact us
o Provided a few methods
  o Contact us menu at the top
    ■ Via StackOverflow
    ■ Via email - which automatically creates a JIRA issue
o Improved our customer service
  o Monitor the dashboard
  o Keep the user informed regularly
Performance
o Identified and fixed several performance issues
o Changed some architectures as well
o Several issues
  o Registry related issues
  o File system related issues
  o Had problems when the number of tenants was growing
  o Now we have cleanup mechanisms in place
Processes
o If the same mistake happens more than once, it's negligence.
  o But, people do make mistakes
o Processes are the best way to minimize them
  o Applying patches
  o Making a config change
  o Monitoring logs for errors
  o Supporting users
o Checklists
  o In upgrades