Scaling the Platform for Your Startup - Startup Talks June 2015

Posted on 29-Jul-2015

TRANSCRIPT

1. Scaling the Platform for your Startup
  Dean Bryen, AWS Solutions Architecture
  Peter Mounce, Senior Software Developer at JUST EAT
2. Why are you here?
  Building the technology platform for your startup
  You want to prepare for success
  Learn about design patterns & scalability
  A pragmatic approach for startups
3. Priorities for startups
  Racing within a window of opportunity
  Small team with no legacy
  Focus on solving a problem
  Avoid over-engineering & re-engineering
  Reduce risk of failure when you go viral
4. A scalable architecture
  Can support growth in users, traffic, data size
  Without practical limits
  Without a drop in performance
  Seamlessly - just by adding more resources
  Efficiently - in terms of cost per user
5. Day 1 - Dev & private beta
6. Single host
  THE server (e.g. Apache, MySQL)
  Diagram labels: Elastic IP, www.example.com, Amazon Route 53 DNS service, Server Image (AMI)
7. Day 2 - Public beta
8. We need a bigger server
  Add larger & faster storage (EBS)
  Use the right instance type
  Easy to change instance sizes
  Not our long term strategy
  Will hit an endpoint eventually
  No fault tolerance
9. Separating web and DB
  More capacity
  Scale each tier individually
  Tailor instance for each tier
    Instance type
    Storage
  Security
    Security groups
    DB in a private VPC subnet
10. But how do I choose what DB technology I need?
  SQL? NoSQL?
11. Why start with a Relational DB?
  SQL is versatile & feature-rich
  Lots of existing code, tools, knowledge
  Clear patterns to scalability (for read-heavy apps)
  Reality: eventually you will have a polyglot data layer
  There will be workloads where NoSQL is a better fit
  Combination of both Relational and NoSQL
  Use the right tool for each workload
12. Key Insight: Relational Databases are Complex
  Our experience running Amazon.com taught us that relational databases can be a pain to manage and operate with high availability
  Poorly managed relational databases are a leading cause of lost sleep and downtime in the IT world!
  Especially for startups with small teams
13. Relational Databases
  MySQL, Aurora, PostgreSQL, Oracle, SQL Server
  Fully managed; zero admin
  Diagram labels: Amazon RDS, Aurora
14. Improving efficiency
15. Offload static content
  Amazon S3: highly available hosting that scales
  Static files (JavaScript, CSS, images)
  User uploads
  S3 URLs serve directly from S3
  Let the web server focus on dynamic content
16. Amazon CloudFront
  Worldwide network of edge locations
  Cache on the edge
  Reduce latency
  Reduce load on origin servers
  Static and dynamic content
  Even a few seconds of caching of popular content can have a huge impact
  Connection optimizations
  Optimize transfer route
  Reuse connections
  Benefits even non-cacheable content
17. CloudFront for static & dynamic content
  Diagram labels: Amazon Route 53, CloudFront distribution, EC2 instance(s), S3 bucket, Static content (css/*, js/*, Images/*), Dynamic content (Default (*))
18. Database caching
  Faster response from RAM
  Reduce load on database
  1. If data in cache, return result
  2. If not in cache, read from DB
  3. And store in cache
  Diagram labels: Application server, Amazon ElastiCache, RDS database
19. Amazon ElastiCache: in-memory cache
  Simple to Deploy
  Managed
  Automatically replaces failed nodes
  Patch management
  Elastic
  Compatible
20. Day 3 - Paying customers
21. High Availability
  Diagram labels: Availability Zone a, RDS DB instance, Web server, S3 bucket for static assets, www.example.com, Amazon Route 53 DNS service, Amazon CloudFront, ElastiCache node 1
22. High Availability
  Diagram labels: Availability Zone a, RDS DB instance, Availability Zone b, Web server x2, S3 bucket for static assets, www.example.com, Amazon Route 53 DNS service, Amazon CloudFront, ElastiCache node 1
23. High Availability
  Diagram labels: Availability Zone a, RDS DB instance, Availability Zone b, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, Web server x2, S3 bucket for static assets, Amazon CloudFront, ElastiCache node 1
24. Elastic Load Balancing
  Managed Load Balancing Service
  Fault tolerant
  Health Checks
  Distributes traffic across AZs
  Elastic - automatically scales its capacity
25. High Availability
  Diagram labels: Availability Zone a, RDS DB instance, Availability Zone b, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, Web server x2, S3 bucket for static assets, ElastiCache node 1, Amazon CloudFront
26. High Availability
  Diagram labels: Availability Zone a, RDS DB instance, Availability Zone b, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, Web server x2, RDS DB standby, S3 bucket for static assets, ElastiCache node 1, Amazon CloudFront
27. Data layer HA
  Diagram labels: Availability Zone a, RDS DB instance, ElastiCache node 1, Availability Zone b, S3 bucket for static assets, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, Web server x2, RDS DB standby
28. Data layer HA
  Diagram labels: Availability Zone a, RDS DB instance, ElastiCache node 1, Availability Zone b, S3 bucket for static assets, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, Web server x2, RDS DB standby, ElastiCache node 2
29. User sessions
  Problem: Often stored on local disk (not shared)
  Quickfix: ELB Session stickiness
  Solution: DynamoDB
  Diagram labels: Elastic Load Balancing, Web server x2, Logged in, Logged out
30. Amazon DynamoDB
  Managed document and key-value store
  Simple to launch and scale
  To millions of IOPS
  Both reads and writes
  Consistent, fast performance
  Durable: perfect for storage of session data
  https://github.com/aws/aws-dynamodb-session-tomcat
  http://docs.aws.amazon.com/aws-sdk-php/guide/latest/feature-dynamodb-session-handler.html
31. Day 4 - Let's go viral!
32. Replace guesswork with elastic IT
  Diagram labels: Startups pre-AWS, Demand, Unhappy Customers, Waste $$$, Traditional Capacity, Capacity, Demand, AWS Cloud
33. Scaling the web tier
  Diagram labels: Availability Zone a, RDS DB instance, ElastiCache node 1, Availability Zone b, S3 bucket for static assets, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, Web server x2, RDS DB standby, ElastiCache node 2
34. Scaling the web tier
  Diagram labels: Availability Zone a, RDS DB instance, ElastiCache node 1, Availability Zone b, S3 bucket for static assets, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, Web server x4, RDS DB standby, ElastiCache node 2
35. Scaling the web tier
  Diagram labels: Availability Zone a, RDS DB instance, ElastiCache node 1, Availability Zone b, S3 bucket for static assets, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, Web server x4, RDS DB standby, ElastiCache node 2
36. Auto Scaling: automatic resizing of compute clusters based on demand
  Feature | Details
  Control | Define minimum and maximum instance pool sizes and when scaling and cooldown occur.
  Integrated to Amazon CloudWatch | Use metrics gathered by CloudWatch to drive scaling.
  Instance types | Run Auto Scaling for on-demand and Spot Instances. Compatible with VPC.
  aws autoscaling create-auto-scaling-group --auto-scaling-group-name MyGroup --launch-configuration-name MyConfig --min-size 4 --max-size 200 --availability-zones us-west-2c,us-west-2b
  Diagram labels: Auto Scaling, Trigger auto-scaling policy, Amazon CloudWatch
37. Prerequisite
  Decompose into small, loosely coupled, stateless building blocks
38. What does this mean in practice?
  Only store transient data on local disk
  Needs to persist beyond a single HTTP request? Then store it elsewhere
  User uploads: Amazon S3
  User Sessions: AWS DynamoDB
  Application Data: Amazon RDS
39. Having decomposed into small, loosely coupled, stateless building blocks
  Having done that, you can now scale out with ease
40. Having decomposed into small, loosely coupled, stateless building blocks
  Having done that, we can also scale back with ease
41. Take the shortcut
  While this architecture is simple you still need to deal with:
  Configuration details
  Deploying code to multiple instances
  Maintaining multiple environments (Dev, Test, Prod)
  Maintaining different versions of the application
  Solution: Use AWS Elastic Beanstalk
42. AWS Elastic Beanstalk (EB)
  Easily deploy, monitor, and scale three-tier web applications and services.
  Infrastructure provisioned and managed by EB
  You maintain control.
  Preconfigured application containers
  Easily customizable.
  Support for these platforms: [platform logos]
43. Loose coupling with SQS
  Tight coupling
  Place asynchronous tasks into Amazon SQS
  SQS buffer that protects backend systems
  Process at own pace
  Respond quickly to end users
  Diagram labels: Front End EC2 Instance, Put Message, SQS, Get Message, Back End EC2 Instance
44. Day 5 - Add more features
45. [AWS services diagram]
  Mobile: Push Notifications, Mobile Analytics, Cognito, Cognito Sync
  Analytics: Kinesis, Data Pipeline, RedShift, EMR
  Your Applications
  AWS Global Infrastructure
  Network: VPC, Direct Connect, Route53
  Storage: EBS, S3, Glacier, CloudFront
  Database: DynamoDB, RDS, ElastiCache
  Deployment & Management: Elastic Beanstalk, OpsWorks, Cloud Formation, Code Deploy, Code Pipeline, Code Commit
  Security & Administration: CloudWatch, Config, Cloud Trail, IAM, Directory, KMS
  Application: SQS, SWF, App Stream, Elastic Transcoder, SES, Cloud Search, SNS
  Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
  Compute: EC2, ELB, Auto Scaling, Lambda, ECS
46. AWS building blocks
  Column headings: Inherently Scalable & Highly Available; Scalable & Highly Available with the right architecture
  Legend: Automated, Configurable
  Services listed: Elastic Load Balancing, Amazon CloudFront, Amazon Route 53, Amazon S3, Amazon SQS, Amazon SES, Amazon CloudSearch, AWS Lambda, Amazon DynamoDB, Amazon Redshift, Amazon RDS, Amazon ElastiCache, Amazon EC2, Amazon VPC
47. Stay focused as you scale your team
  Diagram labels: AWS Cloud-Based Infrastructure, Your Business, More Time to Focus on Your Business, Configuring Your Cloud Assets, 70%, 30%, 70%, On-Premise Infrastructure, 30%, Managing All of the Undifferentiated Heavy Lifting
48. Day 6 - Growing fast
49. Scaling Relational DBs
  Increase RDS instance specs
    Larger instance type
    More storage / more PIOPS
  Read Replicas (Master Slave)
    Scale out beyond capacity of single DB instance
    Available in Amazon RDS for MySQL, PostgreSQL and Amazon Aurora
    Replication lag
    Writes => master
    Reads with tolerance to stale data => read replica (slave)
    Reads with need for most recent data => master
50. Scaling the DB
  Diagram labels: Web server x4, Availability Zone a, RDS DB instance, ElastiCache node 1, Availability Zone b, S3 bucket for static assets, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, RDS DB standby, ElastiCache node 2
51. Scaling the DB
  Diagram labels: Web server x4, Availability Zone a, RDS DB instance, ElastiCache node 1, Availability Zone b, S3 bucket for static assets, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, RDS DB standby, ElastiCache node 2, RDS read replica
52. Scaling the DB
  Diagram labels: Web server x4, Availability Zone a, RDS DB instance, ElastiCache node 1, Availability Zone b, S3 bucket for static assets, www.example.com, Amazon Route 53 DNS service, Elastic Load Balancing, RDS DB standby, ElastiCache node 2, RDS read replica x2
53. What if your app is write-heavy?
  Challenge: You will eventually hit the write throughput or storage limit of the master node
  Solutions:
    Federation (splitting into multiple DBs based on function)
    Sharding (splitting one data set up across multiple hosts)
54. Database federation
  Split up tables to smaller autonomous databases
  Harder to do cross-function queries
  Essentially delaying the need for sharding
  Won't help with single huge functions/tables
  Diagram labels: Forums DB, Users DB, Products DB
55. Sharded horizontal scaling
  Each partition hosts a portion of the rows of a table
  More complex at the application layer
  ORM support can help
  No practical limit on scalability
  Operation complexity
  Shard by keyspace
  RDBMS or NoSQL
  User | Shard ID
  002345 | A
  002346 | B
  002347 | C
  002348 | B
  002349 | A
  Diagram labels: Shard A, Shard B, Shard C
56. NoSQL data stores
  Trade query & integrity features of Relational DBs for
    More flexible data model
    Horizontal scalability & predictable performance
  DynamoDB
    Provisioned read/write performance per table
57. Massive and Seamless Scale
  Distributed system that can scale both reads and writes
  Sharding + Replicas
  Automatic & transparent partitioning:
    Data set size growth
    Provisioned capacity increases
  Diagram label: table
58. Summary
59. Diagram labels: Amazon Route 53 DNS service, No limit, Availability Zone a, RDS DB instance, ElastiCache node 2, Availability Zone b, S3 bucket for static assets, www.example.com, Elastic Load Balancing, RDS DB standby, ElastiCache node 3, RDS read replica x4, DynamoDB, ElastiCache node 4, ElastiCache node 1, CloudSearch, Lambda, SES, SQS
60. A quick review
  Keep it simple and stateless
  Make use of managed self-scaling services
  Multi-AZ and AutoScale your EC2 infrastructure
  Use the right DB for each workload
  Cache data at multiple levels
  Simplify operations with deployment tools
61. Next steps?
  READ!
    aws.amazon.com/documentation
    aws.amazon.com/architecture
    aws.amazon.com/start-ups
  ASK FOR HELP!
    forums.aws.amazon.com
    aws.amazon.com/support
62. Performance testing @ JUST EAT
  (Or: DoS yourself every night in production to prove you can take it)
  @justeat_tech + @petemounce http://tech.just-eat.com
63. Please wait while I start my DoS attack...
  (Demo - start fake load, show dashboards)
64. The problem with performance tests & continuous delivery
  Don't want to sacrifice continuous delivery & decoupled teams
  Don't want performance to suffer
  All the usual problems:
    Bottleneck through single environment
    Individual tests take too long
65. Why?
  Continuously test performance capacity
  If we find a problem Thursday night:
    1. don't run fake load over the weekend
    2. enjoy weekend as normal
    3. fix it next week at leisure
66. Gamble!
  OH: "We deploy tens of small changes a day. I bet we won't break production..."
  OH: "Let's just do it in production with fake traffic at the same time as customers!"
67. Not that much of a gamble, really
  We have tight feedback loops at this point.
  Engineers being on call ... highly invested in not regressing performance.
68. How?
  Pick scenarios we care about
  Pick data variations to exercise
  Add header(s) to discriminate fake load vs customer load
  And then:
    Run it every night during peak time
    If no alerts fire, we're good
69. What did we gain?
  Continuous confidence in capacity
70. What did we gain?
  Continuous confidence in dealing with spikes
71. What did we gain?
  Performance as a 1st-class concern
72. What did we gain?
  Tests become independent of environments' data
73. (Remind me to stop my DoS attack now)
  (Demo - stop fake load, show dashboards)
74. Thank You
  Yes, we're recruiting too. http://tech.just-eat.com/jobs
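
Example for slide 18 (database caching). The three numbered steps on that slide describe what is often called the cache-aside pattern. The sketch below is only an illustration: a plain dictionary stands in for an ElastiCache node and another for the RDS database, and the class name `CacheAside` and the key `menu:42` are made up for the example.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Check the cache, fall back to the database, then populate the cache."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._cache: Dict[str, str] = {}   # stand-in for a cache node (assumption)
        self._load_from_db = load_from_db  # stand-in for a database query (assumption)
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key: str) -> Optional[str]:
        # 1. If the data is in the cache, return the result.
        if key in self._cache:
            return self._cache[key]
        # 2. If it is not in the cache, read it from the database.
        self.db_reads += 1
        value = self._load_from_db(key)
        # 3. And store it in the cache for the next caller.
        if value is not None:
            self._cache[key] = value
        return value


database = {"menu:42": "pizza"}   # hypothetical data
cache = CacheAside(database.get)
first = cache.get("menu:42")      # miss: goes to the database
second = cache.get("menu:42")     # hit: served from the cache
```

A real deployment would also need expiry or invalidation so cached rows do not outlive changes in the database; the slide does not cover that, so it is left out here.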
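
Example for slide 43 (loose coupling with SQS). The idea on that slide is that the front end puts a message on a queue and responds immediately, while the back end pulls messages at its own pace. The sketch below shows the same shape using Python's in-process `queue.Queue` and a thread as stand-ins; it does not call SQS, and the function names and order IDs are hypothetical.

```python
import queue
import threading

tasks = queue.Queue()   # stand-in for an SQS queue (assumption)
processed = []          # work completed by the back end


def handle_request(order_id):
    """Front end: enqueue the slow work and respond to the user right away."""
    tasks.put(order_id)
    return "accepted"


def worker():
    """Back end: pull messages and process them at its own pace."""
    while True:
        item = tasks.get()
        if item is None:      # sentinel used here to stop the worker
            break
        processed.append(f"processed {item}")


thread = threading.Thread(target=worker)
thread.start()
responses = [handle_request(f"order-{n}") for n in range(3)]
tasks.put(None)   # tell the worker there is nothing more to do
thread.join()
```

The front end's response does not depend on how long the back end takes, which is the buffering effect the slide describes.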
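
Example for slide 49 (read replicas). The routing rule on that slide is: writes go to the master, reads that tolerate stale data go to a read replica, and reads that need the most recent data go to the master. A minimal sketch of that rule, assuming hypothetical endpoint names and a class `ReplicaRouter` invented for this example:

```python
import random
from typing import List


class ReplicaRouter:
    """Pick a database endpoint according to the read/write rule on slide 49."""

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        self.replicas = list(replicas)

    def endpoint_for(self, is_write: bool, needs_fresh_data: bool = False) -> str:
        # Writes, and reads that cannot tolerate replication lag, use the master.
        if is_write or needs_fresh_data or not self.replicas:
            return self.master
        # Reads that tolerate stale data are spread across the replicas.
        return random.choice(self.replicas)


router = ReplicaRouter("master.db.example", ["replica-1.db.example", "replica-2.db.example"])
write_target = router.endpoint_for(is_write=True)
fresh_read_target = router.endpoint_for(is_write=False, needs_fresh_data=True)
stale_read_target = router.endpoint_for(is_write=False)
```

Random choice is just one simple way to spread reads; the slides do not say how replicas should be selected.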
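
Example for slide 55 (sharded horizontal scaling). The slide shows a table mapping each user to a shard ID but does not say how the mapping is computed. The sketch below assumes one common option, a stable hash of the key taken modulo the number of shards; it will not necessarily reproduce the A/B/C assignments shown on the slide.

```python
import zlib
from typing import Sequence

SHARDS = ("A", "B", "C")   # shard names taken from the slide


def shard_for(user_id: str, shards: Sequence[str] = SHARDS) -> str:
    """Map a user ID to a shard with a stable hash (an assumed scheme)."""
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    digest = zlib.crc32(user_id.encode("utf-8"))
    return shards[digest % len(shards)]


user_ids = ("002345", "002346", "002347", "002348", "002349")
assignments = {uid: shard_for(uid) for uid in user_ids}
```

With a plain modulo scheme, changing the number of shards remaps most keys, which is part of the operational complexity the slide mentions.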
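
Example for slide 68 (fake load in production). The slide says to add header(s) so fake load can be told apart from customer load, but gives no header name or handling logic. The sketch below assumes a hypothetical header `X-Fake-Load` with the value `1`, and only counts requests by source so dashboards could separate the two.

```python
from collections import Counter
from typing import Mapping

FAKE_LOAD_HEADER = "x-fake-load"   # hypothetical header name, not from the talk
requests_by_source = Counter()     # stand-in for a metrics system


def classify(headers: Mapping[str, str]) -> str:
    """Return "fake" for tagged load-test traffic, otherwise "customer"."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    return "fake" if normalized.get(FAKE_LOAD_HEADER) == "1" else "customer"


def handle(headers: Mapping[str, str]) -> str:
    """Record which kind of traffic a request belongs to."""
    source = classify(headers)
    requests_by_source[source] += 1
    return source


handle({"X-Fake-Load": "1"})
handle({"User-Agent": "browser"})
handle({"x-fake-load": "1"})
```

Downstream services would need to propagate the same header for the distinction to survive across service calls; that is an assumption about how such a setup would work, not something stated in the slides.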