AWS Webcast - Amazon DynamoDB

Posted on 25-Jan-2015

Category:

Technology

DESCRIPTION

Amazon DynamoDB is a fully managed, highly scalable distributed database service. In this technical talk, we take a deep dive into how to:
- Use DynamoDB to build high-scale applications such as social gaming, chat, and voting.
- Model these applications using DynamoDB, including how to use building blocks such as conditional writes, consistent reads, and batch operations to build higher-level functionality such as multi-item atomic writes and join queries.
- Incorporate best practices such as index projections, item sharding, and parallel scan for maximum scalability.

TRANSCRIPT

1. Amazon DynamoDB Deep Dive
   Matt Yanchyshyn: Solutions Architect, AWS
   Shekhar Deshkar: Chief Architect, Druva

2. Amazon DynamoDB Deep Dive
   • What's new
   • JSON Document Support
   • Flexible access control
   • Partitions & Scaling
   • Optimizing Data Access Cost
   • Customer use case: Druva

3. Amazon DynamoDB
   • Fast & flexible NoSQL database
   • Predictable performance
   • Seamless & massive scalability
   • Fully managed; zero admin

4. What's new?
   • JSON Document Support
   • Larger items: 400 KB max item size
   • Additional Scaling Options
   • Expanded Free Tier: 25 GB & 200M requests per month

5. Amazon DynamoDB Deep Dive (agenda as on slide 2)

6. New Data Types
   • Scalars: Number, String, Binary, Boolean, Null
   • Multi-value: String Set, Number Set, Binary Set
   • Document: List and Map

7. New Data Types
   {
     Day: "Monday",
     FavoriteDay: false,
     ItemsOnMyDesk: [
       "Telephone",
       {
         Pens: { Quantity : 3},
         Pencils: { Quantity : 2},
         Erasers: null
       }
     ]
   }
   Annotations: Map, Boolean, Nested List, Map, Null

8. Document Paths (same item as slide 7)
   • FavoriteDay
   • ItemsOnMyDesk[0]
   • ItemsOnMyDesk[1].Pens.Quantity

9. Example: AWS SDK for Java Document API
   Item item = new Item()
       .withPrimaryKey("Day", "Monday")
       .withJSON("document", json);
   table.putItem(item);

   Item documentItem = table.getItem(new GetItemSpec()
       .withPrimaryKey("Day", "Monday")
       .withAttributesToGet("document"));
   Result: {ItemsOnMyDesk=[Telephone, {Erasers=null, Pencils={Quantity=2}, Pens={Quantity=3}}], FavoriteDay=false}

   Item documentItem = table.getItem(new GetItemSpec()
       .withPrimaryKey("Day", "Monday")
       .withProjectionExpression("document.ItemsOnMyDesk[1].Pens.Quantity"));
   Result: {ItemsOnMyDesk=[{Pens={Quantity=3}}]}

10. Example: AWS SDK for Java Document API
   Item item = new Item()
       .withPrimaryKey("Day", "Friday")
       .withMap("document", new ValueMap()
           .withBoolean("FavoriteDay", true)
           .withList("ItemsOnMyDesk",
               "Telephone",
               new ValueMap()
                   .withMap("Pens", new ValueMap().withInt("Quantity", 1))
                   .withMap("Pencils", new ValueMap().withInt("Quantity", 0))
                   .withMap("Erasers", new ValueMap().withInt("Quantity", 1))));
   Shown next to the item from slide 7:
   {Day: "Monday", FavoriteDay: false, ItemsOnMyDesk: ["Telephone", {Pens: { Quantity : 3}, Pencils: { Quantity : 2}, Erasers: null}]}

11. Questions?

12. Amazon DynamoDB Deep Dive (agenda as on slide 2)

13. Flexible access control
   • Control access to individual items and attributes
   • Implement using AWS IAM roles that authenticated users assume
   • Leverage SAML or WIF

14. Flexible access control with AWS IAM
   ...
   "Effect": "Allow",
   "Action": [ "dynamodb:Query", "dynamodb:Scan" ],
   "Resource": "arn:aws:dynamodb:REGION:abcde12345:users/example",
   "Condition": {
       "ForAllValues:StringEquals": {
           "dynamodb:LeadingKeys": ["${saml:sub}"],
           "dynamodb:Attributes": [ "userId", "firstName", "lastName" ]
       },
       "StringEqualsIfExists": {
           "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
       }
   ...
   Annotations:
   • "Action": DynamoDB API actions allowed
   • "dynamodb:LeadingKeys": authenticated user ID / DynamoDB table key
   • "dynamodb:Attributes": item attributes that the user can access

15. Questions?
   ...and if you want to take DynamoDB's flexible security model to the next level, check out client-side encryption in Java with DynamoDBMapper: http://bit.ly/1zerqW2

16. Amazon DynamoDB Deep Dive (agenda as on slide 2)

17. Partitions & Scaling
   Votes Table: Candidate A (Votes: 21), Candidate B (Votes: 30)
   Voter: UpdateItem, ADD 1 to Candidate A (aka Atomic Increment)

18. Partitions & Scaling
   Votes Table
   You: Provision 1200 Write Capacity Units
   Need to scale for the election

19. Partitions & Scaling
   Votes Table: Partition 1, Partition 2
   600 Write Capacity Units (each)
   You: Provision 1200 Write Capacity Units

20. Partitions & Scaling
   Votes Table: Partition 1, Partition 2 (no sharing)
   You: Provision 1200 Write Capacity Units

21. Partitions & Scaling
   Votes Table: Partition 1 (600 WCU), Partition K (600 WCU), Partition M (600 WCU), Partition N (600 WCU)
   You: Provision 200,000 Write Capacity Units

22. Partitions & Scaling
   Votes Table: Partition 1 (600 WCU), Candidate A, Partition K (600 WCU), Partition M (600 WCU), Partition N (600 WCU), Candidate B
   Voters

23. Scaling Writes
   Votes Table: Candidate A_1 to Candidate A_8, Candidate B_1 to Candidate B_8
   Voter: UpdateItem: "CandidateA_" + rand(0, 10), ADD 1 to Votes

24. Scaling Writes
   Votes Table: Candidate A_1 to Candidate A_8, Candidate B_1 to Candidate B_8
   Periodic Process: 1. Sum, 2. Store
   Candidate A Total: 2.5M
   Voter

25. Scaling Reads
   Votes Table: Partition 1 (600 WCU), Candidate A_Total (Votes: 2.5M), Partition K (600 WCU), Partition M (600 WCU), Partition N (600 WCU), Candidate B_Total (Votes: 2.1M)
   Voters

26. Scaling Reads (same content as slide 25)

27. Scaling Best Practices
   • Design for uniform data access
   • Distribute write activity
   • Understand access patterns
   • Be careful not to over-provision

28. Questions?

29. Amazon DynamoDB Deep Dive (agenda as on slide 2)

30. Optimizing Query Cost

31. Provisioned Throughput Capacity Units
   • 1 Write Capacity Unit (WCU) = up to 1 KB item
   • 1 Read Capacity Unit (RCU) = up to 4 KB of items
   • 0.5 RCU = up to 4 KB of items (eventually consistent)

32. How much throughput do I need?
   • Investigate customer workload:
     How many requests per second per hash key?
     How much will those operations cost?
   • Investigate customer data size:
     How big will the table be initially?
     How large will the table grow?
     Will the throughput grow proportionally?

33. Best practices: Query vs. BatchGetItem
   • Query treats all items as a single read operation
     Items share the same hash key = same partition
   • BatchGetItem reads each item in the batch separately
   • Example: read 100 items in a table, all with the same hash key. Each item is 120 bytes in size.
     Query RCU = 3
     BatchGetItem RCU = 100

34. Filter cost calculation
   Query UserId=Alice, Filter where OpponentId=Bob
   UserId  GameId  Date        OpponentId
   Carol   e23f5a  2013-10-08  Charlie
   Alice   d4e2dc  2013-10-01  Bob
   Alice   e9cba3  2013-09-27  Charlie
   Alice   f6a3bd  2013-10-08  Bob
   • Filters do not reduce the throughput cost
   • Query/Scan: 1 MB max returned per operation. Limit applies before filtering

35. Example: Query & Read Capacity Units
   Page         Time              Views
   index.html   2014-06-25T12:00  120
   index.html   2014-06-25T12:01  122
   index.html   2014-06-25T12:02  125
   search.html  2014-06-25T12:00  200
   Query PageViews WHERE Page=index.html AND Time >= 2014-06-25T12:00 LIMIT 10
   42 bytes each x 3 items = 126 bytes = 1 Read Capacity Unit

36. Query cost calculation
   UserId  GameId  Date        OpponentId
   Carol   e23f5a  2013-10-08  Charlie
   Alice   d4e2dc  2013-10-01  Bob
   Alice   e9cba3  2013-09-27  Bob
   Alice   f6a3bd  2013-10-08
   (1 item = 600 bytes) (397 more games for Alice)
   400 (items evaluated by Query) x 600 (bytes per item) / 1024 (bytes per KB) / 4 (KB per Read Capacity Unit) ≈ 60 Read Capacity Units

37. Local Secondary Indexes (LSI)
   • Same hash key + alternate (scalar) range key
   • Index and table data is co-located (same partition)
   • Read and write capacity units consumed from the table
   • Increases WCUs per write
   • 10 GB max

38. Local Secondary Index Projections
   Table: A1 (hash), A2 (range), A3, A4, A5
   LSIs:
   • KEYS_ONLY: A1 (hash), A3 (range), A2 (table key)
   • INCLUDE A3: A1 (hash), A4 (range), A2 (table key), A3 (projected)
   • ALL: A1 (hash), A4 (range), A2 (table key), A3 (projected), A5 (projected)
   • Items with fewer projected attributes will be smaller: lower cost
   • Project with care: LSI queries auto-fetch non-projected attributes from the table at a cost: index matches + table matches

39. Global Secondary Indexes (GSI)
   • Any attribute indexed as new hash and/or range key
   • GSIs get their own provisioned throughput
   • Asynchronously updated, so eventual consistency only

40. Global Secondary Index Projections
   Table: A1 (hash), A2, A3, A4, A5
   GSIs:
   • KEYS_ONLY: A5 (hash), A3 (range), A1 (table key)
   • INCLUDE A3: A5 (hash), A4 (range), A1 (table key), A3 (projected)
   • ALL: A4 (hash), A5 (range), A1 (table key), A2 (projected), A3 (projected)
   • KEYS_ONLY: A2 (hash), A1 (table key)
   GSI queries can only request projected attributes

41. Reduce cost with Sparse Indexes
   • DynamoDB only writes to an index if the index key value is present in the item
   • Use sparse indexes to efficiently locate table items that have an uncommon attribute
   • Example: find hockey teams that have won the Stanley Cup
     GSI Hash Key: StanleyCupWinner
     GSI Range Key: TeamId

42. Optimizing Scans

43. What about non-indexed querying?
   • Sometimes you need to scan the full table, e.g. bulk export
   • Use Parallel Scan for faster retrieval, if:
     Your table is >= 20 GB
     Provisioned read throughput has room
     Sequential Scan operations are too slow
   • Consider cost and performance implications

44. Bulk Export: Scan
   Your Table: Partition 1, Partition 2, Partition 3, Partition 4
   • Slow: only uses one partition at a time
   • Non-uniform utilization
   • Consumes lots of RCUs

45. Bulk Export: Parallel Scan
   Your Table: Partition 1, Partition 2, Partition 3, Partition 4
   • Divide the work
   • Use threads, non-blocking IO, processes, or multiple servers to distribute the work
   • Can still consume lots of RCUs!

46. Bulk Export: Rate-limited Parallel Scan
   Your Table: Partition 1, Partition 2, Partition 3, Partition 4
   • Limit page size and rate
     (1 MB default page size / 4 KB item size) / 2 eventually consistent reads = 128 RCUs per page (default)
   • Rate-limit each segment

47. Many best practices are implemented already
   • Amazon Elastic Map Reduce: dynamodb.throughput.read/write.percent
   • Amazon Redshift: readratio
   • DynamoDB Mapper (Java SDK)
   • RateLimiter library (Google Guav
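
The capacity-unit arithmetic from the slides on provisioned throughput, Query vs. BatchGetItem, query cost, and rate-limited scans can be checked with a few lines of Java. This is a minimal sketch, not an AWS API: the class `CapacityMath` and its method names are hypothetical, and it assumes the rounding rules stated on the slides (reads round up to 4 KB units, writes to 1 KB units, eventually consistent reads cost half). With strict round-up, the 400-game example comes to 59 RCUs, which the slide rounds to 60.

```java
// Hypothetical helper for the capacity arithmetic on the slides; not part of the AWS SDK.
final class CapacityMath {
    private static final int BYTES_PER_KB = 1024;
    private static final int READ_UNIT_KB = 4;   // 1 RCU covers up to 4 KB
    private static final int WRITE_UNIT_KB = 1;  // 1 WCU covers up to 1 KB

    private CapacityMath() {}

    /** Ceiling division for non-negative values. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Query: the sizes of all evaluated items are summed, then rounded up to 4 KB units. */
    static double queryRcu(long itemCount, long bytesPerItem, boolean stronglyConsistent) {
        long totalBytes = itemCount * bytesPerItem;
        long units = ceilDiv(totalBytes, (long) READ_UNIT_KB * BYTES_PER_KB);
        return stronglyConsistent ? units : units / 2.0;
    }

    /** BatchGetItem: every item is rounded up to 4 KB units on its own. */
    static double batchGetRcu(long itemCount, long bytesPerItem, boolean stronglyConsistent) {
        long unitsPerItem = ceilDiv(bytesPerItem, (long) READ_UNIT_KB * BYTES_PER_KB);
        long units = itemCount * unitsPerItem;
        return stronglyConsistent ? units : units / 2.0;
    }

    /** Write: one item rounded up to 1 KB units. */
    static long writeWcu(long bytesPerItem) {
        return ceilDiv(bytesPerItem, (long) WRITE_UNIT_KB * BYTES_PER_KB);
    }

    public static void main(String[] args) {
        // 100 items of 120 bytes with the same hash key
        System.out.println(queryRcu(100, 120, true));     // 3.0
        System.out.println(batchGetRcu(100, 120, true));  // 100.0
        // 3 page-view items of 42 bytes each
        System.out.println(queryRcu(3, 42, true));        // 1.0
        // 400 games of 600 bytes each
        System.out.println(queryRcu(400, 600, true));     // 59.0
        // One 1 MB scan page of 4 KB items, eventually consistent
        System.out.println(queryRcu(256, 4096, false));   // 128.0
    }
}
```

The gap between `queryRcu` and `batchGetRcu` for the same 100 small items is the point of the Query vs. BatchGetItem slide: per-item rounding dominates when items are much smaller than 4 KB.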
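
The write-sharding pattern from the Scaling Writes and Scaling Reads slides (vote against a random `Candidate_N` shard, then periodically sum the shards into a `_Total` item) can be sketched as follows. This is an illustration under stated assumptions: the `ShardedCounter` class is hypothetical, an in-memory map stands in for the Votes table, and no DynamoDB calls are made. In a real table each `vote` would be an `UpdateItem` with `ADD 1`, which DynamoDB applies atomically; the map here is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical in-memory model of the sharded Votes table; not an AWS SDK class.
final class ShardedCounter {
    private final int shardCount;
    // Stand-in for the table: item key -> vote count.
    private final Map<String, Long> table = new HashMap<>();

    ShardedCounter(int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        this.shardCount = shardCount;
    }

    /** Key of one shard item, e.g. "CandidateA_3". */
    static String shardKey(String candidate, int shard) {
        return candidate + "_" + shard;
    }

    /** Records one vote on a randomly chosen shard, spreading writes across keys. */
    void vote(String candidate) {
        int shard = ThreadLocalRandom.current().nextInt(1, shardCount + 1);
        table.merge(shardKey(candidate, shard), 1L, Long::sum);
    }

    /** Periodic process: sum every shard, then store the result under "<candidate>_Total". */
    long aggregate(String candidate) {
        long total = 0;
        for (int shard = 1; shard <= shardCount; shard++) {
            total += table.getOrDefault(shardKey(candidate, shard), 0L);
        }
        table.put(candidate + "_Total", total);
        return total;
    }

    /** Readers fetch only the pre-computed total item. */
    long readTotal(String candidate) {
        return table.getOrDefault(candidate + "_Total", 0L);
    }
}
```

Readers see a total that is only as fresh as the last `aggregate` run, which is the trade-off the periodic process on the slide accepts in exchange for spreading write load across partitions.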