Everything You Need to Know About Sharding


Post on 30-Jun-2015







Learn about the various approaches to sharding your data with MongoDB. This presentation will help you answer questions such as when to shard and how to choose a shard key.


  • 1. Everything You Need to Know About Sharding. Dylan Tong (dylan.tong@mongodb.com), Senior Solutions Architect

2. Agenda
   • Overview: What is sharding? Why and what should I use sharding for?
   • Building Your First Sharded Cluster: What do I need to know to succeed with sharding?
   • Q&A

3. What is Sharding?
   Sharding is a means of partitioning data across servers to enable:
   • Scale: needed by modern applications to support massive workloads and data volume.
   • Geo-Locality: to support geographically distributed deployments and optimal UX for customers across vast geographies.
   • Hardware Optimizations: on performance vs. cost.
   • Lower Recovery Times: to make Recovery Time Objectives (RTO) feasible.

4. What is Sharding?
   Sharding involves a shard key, defined by a data modeler, that describes the partition space of a data set. Data is partitioned into chunks by the shard key, and these chunks are distributed evenly across shards that reside across many physical servers.
   MongoDB provides 3 types of sharding strategies:
   • Ranged
   • Hashed
   • Tag-aware

5. Range Sharding
   Shard Key: {deviceId}
   Diagram (chunk boundaries): 1000 | 1001, 2000 | 2001, 3000 | 3001, 4000 | 4001
   Composite keys supported: {deviceId, timestamp}
   Diagram (composite boundary): 1000,1418244824 | 1000,1418244825

6. Hash Sharding
   Hash sharding is a subset of range sharding. MongoDB applies an MD5 hash to the key when a hash shard key is used:
   Hash Shard Key(deviceId) = MD5(deviceId)
   This ensures data is distributed randomly within the range of MD5 values.
   Diagram (chunk boundaries): 3333 | 3334, 8000 | 8001, AAAA | AAAB, DDDD | DDDF

7. Tag-aware Sharding
   Tag-aware sharding allows a subset of shards to be tagged and assigned to a sub-range of the shard key.
   Example: sharding user data belonging to users from 100 regions.
   Collection: Users, Shard Key: {uId, regionCode}
   Tag | Start | End
   West | MinKey, MinKey | MaxKey, 50
   East | MinKey, 50 | MaxKey, MaxKey
   Assign regions 1-50 to the West and regions 51-100 to the East.
   Diagram: Shard1 (Tag=West), Shard2 (Tag=West), Shard3 (Tag=East), Shard4 (Tag=East), each shown with two secondaries.

8. Applying Sharding
   Usage | Required Strategy
   Scale | Range or Hash
   Geo-Locality | Tag-aware
   Hardware Optimization | Tag-aware
   Lower Recovery Times | Range or Hash
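The difference between the range and hash strategies above can be illustrated with a small sketch. This is not MongoDB's implementation: the chunk boundaries are the ones shown on the Range Sharding slide, while `NUM_HASH_CHUNKS`, the use of the first 8 bytes of the MD5 digest, and all function names are assumptions made for illustration.

```python
import bisect
import hashlib

# Upper bounds of the range chunks, taken from the Range Sharding slide.
# Chunk i holds keys <= RANGE_BOUNDS[i]; one extra chunk holds everything above.
RANGE_BOUNDS = [1000, 2000, 3000, 4000]

# Assumed values for the hashed variant (not from the slides).
HASH_SPACE = 2 ** 64
NUM_HASH_CHUNKS = 4


def range_chunk(device_id: int) -> int:
    """Return the index of the range chunk that contains device_id."""
    return bisect.bisect_left(RANGE_BOUNDS, device_id)


def md5_key(device_id: int) -> int:
    """Illustrative hashed key: first 8 bytes of the MD5 digest as an integer."""
    digest = hashlib.md5(str(device_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hash_chunk(device_id: int) -> int:
    """Range-partition the hashed key into NUM_HASH_CHUNKS equal-width chunks."""
    return md5_key(device_id) * NUM_HASH_CHUNKS // HASH_SPACE


if __name__ == "__main__":
    ids = range(1001, 1101)  # 100 consecutive device ids
    print("range chunks used:", sorted({range_chunk(d) for d in ids}))
    print("hash chunks used: ", sorted({hash_chunk(d) for d in ids}))
```

Consecutive device ids all land in one range chunk, while their hashed values scatter across the hash chunks, which is the trade-off the slides describe: hashing spreads writes but gives up key ordering.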
9. Sharding for Scale
   • Performance Scale: throughput and latency
   • Data Scale: cardinality, data volume

10. Typical Small Deployment
   A replica set: highly available but not scalable.
   • Writes: limited by the capacity of the primary's host.
   • Reads, when immediate consistency matters: limited by the capacity of the primary's host.
   • Reads, when eventual consistency is acceptable: limited by the capacity of the available replica set members.

11. Sharded Architecture
   • Auto-balancing: data is partitioned based on a shard key, and automatically balanced across shards by MongoDB.
   • Query Routing: database operations are transparently routed across the cluster through a routing proxy process (software).
   • Horizontal Scalability: load is distributed and resources are pooled across commodity servers, increasing read/write capacity.
   • Decoupling for Development and Operational Simplicity: Ops can add capacity without app dependencies and Dev involvement.

12. Value of Scale-out Architecture
   Chart (system capacity vs. $). Labels: scale-up limits; high capacity/$; scale out to 100s-1000s of servers; optimize on capacity/$; scale down and re-allocate resources; apps of the past vs. modern apps trend.

13. Sharding for Geo-Locality
   Adobe Cloud Services, among other popular consumer and enterprise services, uses sharding to run servers across multiple data centers across geographies.
   Network latency from West to East is ~80 ms.
   • Amazon: every 1/10-second delay resulted in a 1% loss of sales.
   • Google: a half-second delay caused a 20% drop in traffic.
   • Aberdeen Group: a 1-second delay in page-load time resulted in 11% fewer page views, a 16% decrease in customer satisfaction, and a 7% loss in conversions.

14. Multi-Active DCs via Tag-aware Sharding
   Collection: Users, Shard Key: {uId, regionCode}
   Tag | Start | End
   West | MinKey, MinKey | MaxKey, 50
   East | MinKey, 50 | MaxKey, MaxKey
   Diagram labels: WEST, EAST; Shard 1, Shard 2, Shard 3 with West/East tags and two secondaries each; Query; Update; Local Reads (e.g. Read Preference = Nearest); Priority=5, Priority=10.
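The tag table above splits the key space at regionCode 50. A simplified sketch of that assignment follows, keyed on regionCode alone rather than on the full {uId, regionCode} shard key. The region bounds (1-50 West, 51-100 East) and the four-shard layout come from the earlier Tag-aware Sharding slide; the function names are hypothetical, and this does not reproduce how MongoDB's balancer evaluates tag ranges.

```python
from typing import List

# Region-to-tag ranges from the slides: regions 1-50 -> West, 51-100 -> East.
TAG_RANGES = [
    ("West", 1, 50),
    ("East", 51, 100),
]

# Shard-to-tag assignment as drawn on the Tag-aware Sharding slide.
SHARD_TAGS = {
    "Shard1": "West",
    "Shard2": "West",
    "Shard3": "East",
    "Shard4": "East",
}


def tag_for_region(region_code: int) -> str:
    """Return the tag whose inclusive region range contains region_code."""
    for tag, low, high in TAG_RANGES:
        if low <= region_code <= high:
            return tag
    raise ValueError(f"region {region_code} is not covered by any tag range")


def eligible_shards(region_code: int) -> List[str]:
    """Return the shards allowed to hold data for region_code, sorted by name."""
    tag = tag_for_region(region_code)
    return sorted(shard for shard, t in SHARD_TAGS.items() if t == tag)


if __name__ == "__main__":
    print(eligible_shards(7))   # a West region
    print(eligible_shards(51))  # an East region
```

Keeping the tag ranges and the shard tags in separate tables mirrors the slides: ranges decide which tag a document belongs to, and tags decide which shards may store it.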
15. Optimizing Latency and Cost
   Magnitudes of difference in speed:
   Event | Latency | Normalized to 1 s
   RAM access | 120 ns | 6 min
   SSD access | 150 µs | 6 days
   HDD access | 10 ms | 12 months
   Magnitudes of difference in cost:
   Storage Type | Avg. Cost ($/GB) | Cost at 100TB ($)
   RAM | 5.50 | 550K
   SSD | 0.50-1.00 | 50K to 100K
   HDD | 0.03 | 3K

16. Optimizing Latency and Cost
   Use case: sensor data collected from millions of devices. Data is used for real-time decision automation, real-time monitoring and historical reporting.
   Data Type | Description | Latency SLA | Data Volume
   Meta Data | Fast look-ups to drive real-time decisions | 95th percentile [...]
   [...] 90 days ago | MaxKey, MaxKey
   Collection | Shard Key
   Meta | DeviceId
   Metrics | DeviceId, Timestamp
   Diagram (hardware tiers, each a shard with secondaries):
   • Tag: Cache. High memory ratio, fast cores, HDD.
   • Tag: Flash. Medium memory ratio, high compute, SSDs.
   • Tag: Archive. Low memory ratio, medium compute, HDD.

18. Restoration Times
   Scenario: an application bug causes logical corruption of the data, and the database needs to be rolled back to a previous point in time (PIT). What RTO does your business require in this event?
   Total DB snapshot size = 100TB
   • N = 10: 10 x 10TB snapshots generated and/or transferred in parallel.
   • N = 100: 100 x 1TB snapshots generated and/or transferred in parallel. Potentially 10X faster restoration time.

19. Building Your First Sharded Cluster

20. Life-cycle of Sharding
   Product Definition: starts with an idea to build something big!
   Predictive Maintenance Platform: a cloud platform for building predictive maintenance applications and services, such as a service that monitors various vehicle components by collecting data from sensors and automatically prescribes actions to take.
   • Allow tenants to register, ingest and modify data collected by sensors
   • Define and apply workflows and business rules
   • Publish and subscribe to notifications
   • Data access API for things like reporting
   Stages: Design & Development, Test/QA, Pre-Production, Production
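The cost table and the restoration-time comparison above reduce to simple arithmetic, sketched below. The $/GB figures and the 100TB total are from the slides; the per-stream restore throughput (`TB_PER_HOUR_PER_STREAM`) is a made-up value used only to show the 10X ratio, and the function names are hypothetical.

```python
GB_PER_TB = 1000  # decimal terabytes, which matches the totals on the slide

# (low, high) average cost in $/GB, from the cost table.
COST_PER_GB = {
    "RAM": (5.50, 5.50),
    "SSD": (0.50, 1.00),
    "HDD": (0.03, 0.03),
}

# Assumed restore throughput of one parallel stream; not from the slides.
TB_PER_HOUR_PER_STREAM = 0.5


def cost_at_volume(medium: str, total_tb: float = 100):
    """Return the (low, high) cost in dollars of storing total_tb on medium."""
    low, high = COST_PER_GB[medium]
    total_gb = total_tb * GB_PER_TB
    return low * total_gb, high * total_gb


def restore_hours(total_tb: float, n_shards: int,
                  tb_per_hour: float = TB_PER_HOUR_PER_STREAM) -> float:
    """Hours to restore when every shard's snapshot is restored in parallel."""
    per_shard_tb = total_tb / n_shards
    return per_shard_tb / tb_per_hour


if __name__ == "__main__":
    for medium in COST_PER_GB:
        print(medium, cost_at_volume(medium))
    slow = restore_hours(100, n_shards=10)
    fast = restore_hours(100, n_shards=100)
    print(f"N=10: {slow} h, N=100: {fast} h, speed-up: {slow / fast}x")
```

The speed-up depends only on the ratio of shard counts under this model; real restores may be limited by shared network or storage bandwidth, which is why the slide says "potentially".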
21. Life-cycle of Sharding
   Data Modeling: Do I need to shard?
   • Throughput: data from millions of sensors updated in real time.
   • Latency: the values of certain attributes need to be accessed with 95th percentile < 10 ms to support real-time decisions and automation.
   • Volume: 1TB of data collected per day, retained for 5 years.

22. Life-cycle of Sharding
   Data Modeling: Select a Good Shard Key (critical step)
   • Sharding is only as effective as the shard key.
   • Shard key attributes are immutable.
   • Re-sharding is non-trivial. It requires re-partitioning data.

23. Good Shard Key
   • Cardinality
   • Write Distribution
   • Query Isolation
   • Reliability
   • Index Locality

24. Cardinality
   Key = Data Center

25. Cardinality
   Key = Timestamp

26. Write Distribution
   Key = Timestamp

27. Write Distribution
   Key = Hash(Timestamp)

28. Query Isolation
   Key = Hash(Timestamp): scatter-gather query.

29. Query Isolation
   Key = Hash(DeviceId)
   *Assumes the bulk of queries on the collection are in the context of a single deviceId.

30. Reliability
   Key = Hash(Timestamp) vs. Key = Hash(DeviceId)

31. Index Locality
   • Key = Hash(DeviceId): random access index (MD5 values).
   • Key = DeviceId, Timestamp: right-balanced index (..., -2 day, -1 day, -0 day).
   A right-balanced index may only need to be partially in RAM to be effective.
   Diagram labels: Working Set, Device N.

32. Good Shard Key
   Key = DeviceId, Timestamp (random, sequential)
   • Cardinality
   • Write Distribution
   • Query Isolation
   • Reliability
   • Index Locality

33. Life-cycle of Sharding
   Performance Testing: Avoid Pitfalls
   Pitfall: "Sharding results in massive performance degradation!" Splits happen on demand as chunks grow, and migrations happen when an imbalance is detected.
   Best practices:
   • Pre-split:
     1. Hash shard key: specify numInitialChunks: http://docs.mongodb.org/manual/reference/command/shardCollection/
     2. Custom shard key: create a pre-split script: http://docs.mongodb.org/manual/tutorial/create-chunks-in-sharded-cluster/
   • Run mongos (query router) on the app server if possible.

34. Life-cycle of Sharding
   Capacity Planning: How many shards do I need?
   Sizing:
   • What are the total resources required for your initial deployment?
   • What are the ideal hardware specs, and the number of shards necessary?
   Capacity planning: create a model to scale MongoDB for a specific app.
   • How do I determine when more shards need to be added?
   • How much capacity do I gain from adding a shard?

35. How Many Servers?
   Strategy | Accuracy | Level of Effort | Feasibility of Early Project Analysis
   Domain Expert | High to low: inversely related to the complexity of the application | Low | Yes
   Empirical (Load Testing) | High | High | Unlikely

36. Domain Expert
   Normally performed by a MongoDB Solution Architect: http://bit.ly/1rkXcfN
   • What is the document model? Collections, documents, indexes.
   • What are the major operations? Throughput and latency.
   • What is the working set? E.g. the last 90 days of orders.
   Process: Business Solution Analysis, Model and Load Definition, Resource Analysis, Hardware Specification

37. Domain Expert
   Resource | Methodology
   RAM | Standard: working set + indexes. Adjust more or less depending on latency vs. cost requirements. Very large clusters should account for connection pooling/thread overhead (1MB per active thread).
   IOPs | Primarily based on throughput requirements: writes + an estimate of query page faults. Assume random IO. Account for replication, journal and log (note: sequential IO). Ideally, estimated empirically through prototype testing. Experts can use experience from similar applications as an estimate. Spot testing may be needed.
   Storage | Estimate using throughput, document and index size approximations, and retention requirements. Account for overhead like fragmentation if applicable.
   CPU | Rarely the bottleneck; a lot less CPU intensive than RDBMSs. Using current commodity CPU specs will suffice.
   Network | Estimate using throughput and document size approximations.
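The storage and RAM rows of the methodology table can be turned into a rough first-pass estimate. The sketch below uses the volume figures from the Data Modeling slide (1TB per day, retained for 5 years); the usable capacity per shard, the fragmentation overhead, and the example RAM figures are placeholder assumptions, and the function names are hypothetical. It sizes a single copy of the data and ignores replication.

```python
import math


def storage_tb(daily_tb: float, retention_days: int, overhead: float = 0.0) -> float:
    """Data volume for one copy: daily volume x retention, plus an overhead fraction."""
    return daily_tb * retention_days * (1.0 + overhead)


def shards_needed(total_tb: float, usable_tb_per_shard: float) -> int:
    """Smallest number of shards whose combined usable storage covers total_tb."""
    return math.ceil(total_tb / usable_tb_per_shard)


def ram_gb(working_set_gb: float, index_gb: float) -> float:
    """The table's standard rule of thumb: working set + indexes."""
    return working_set_gb + index_gb


if __name__ == "__main__":
    total = storage_tb(daily_tb=1, retention_days=5 * 365)  # figures from the slides
    print("storage (TB):", total)
    # 10TB usable per shard is an assumed value, not a recommendation.
    print("shards for storage:", shards_needed(total, usable_tb_per_shard=10))
    # Example RAM inputs are placeholders for a measured working set and index size.
    print("cluster RAM (GB):", ram_gb(working_set_gb=900, index_gb=300))
```

A model like this only bounds the storage dimension; the IOPs and network rows of the table would need their own throughput-based estimates, ideally checked against prototype testing as the slides suggest.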

