sharding overview

49
Director of Solution Architecture, 10gen Edouard Servan-Schreiber, Ph.D. #MongoDBdays Sharding

Upload: mongodb

Post on 15-Jan-2015

824 views

Category:

Documents


2 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Sharding Overview

Director of Solution Architecture, 10gen

Edouard Servan-Schreiber, Ph.D.

#MongoDBdays

Sharding

Page 2: Sharding Overview

Visual representation of vertical scaling

1970 - 2000: Vertical Scalability (scale up)

Page 3: Sharding Overview

Visual representation of horizontal scaling

Google, ~2000: Horizontal Scalability (scale out)

Page 4: Sharding Overview

Why shard?

Page 5: Sharding Overview

Working Set Exceeds Physical Memory

Page 6: Sharding Overview

Read/Write Throughput Exceeds I/O

Page 7: Sharding Overview

Partition data based on ranges

• User defines shard key

• Shard key defines range of data

• Key space is like points on a line

• Range is a segment of that line

Page 8: Sharding Overview

Distribute data in chunks across nodes

• Initially 1 chunk

• Default max chunk size: 64mb

• MongoDB automatically splits & migrates chunks when max reached

Page 9: Sharding Overview

MongoDB manages data access

• Queries routed to specific shards

• MongoDB balances cluster

• MongoDB migrates data to new nodes

Page 10: Sharding Overview

Architecture

Page 11: Sharding Overview

Data stored in shard

• Shard is a node of the cluster

• Shard can be a single mongod or a replica set

Page 12: Sharding Overview

• Config Server– Stores cluster chunk ranges and locations– Can have only 1 or 3 (production must have

3)– Two phase commit (not a replica set)

Config server stores meta data

Page 13: Sharding Overview

MongoS manages the data routing

• Mongos– Acts as a router / balancer– No local data (persists to config database)– Can have 1 or many

Page 14: Sharding Overview

Sharding infrastructure

Page 15: Sharding Overview

Mechanics

Page 16: Sharding Overview

Partitioning

• Remember it's based on ranges

Page 17: Sharding Overview

Chunk is a section of the entire range

Page 18: Sharding Overview

Chunk splitting

• A chunk is split once it exceeds the maximum size• There is no split point if all documents have the same

shard key• Chunk split is a logical operation (no data is moved)• If split creates too large of a discrepancy of chunk

count across cluster a balancing round starts

Page 19: Sharding Overview

Balancing

• Balancer is running on mongos• Once the difference in chunks between the most

dense shard and the least dense shard is above the migration threshold, a balancing round starts

Page 20: Sharding Overview

Acquiring the Balancer Lock

• The balancer on mongos takes out a “balancer lock”• To see the status of these locks:

- use config- db.locks.find({ _id: “balancer” })

Page 21: Sharding Overview

Moving the chunk

• The mongos sends a “moveChunk” command to source shard

• The source shard then notifies destination shard• The destination claims the chunk shard-key range• Destination shard starts pulling documents from

source shard

Page 22: Sharding Overview

Committing Migration

• When complete, destination shard updates config server- Provides new locations of the chunks

Page 23: Sharding Overview

Cleanup

• Source shard deletes moved data- Must wait for open cursors to either close or time out- NoTimeout cursors may prevent the release of the lock

• Mongos releases the balancer lock after old chunks are deleted

Page 24: Sharding Overview

Routing Requests

Page 25: Sharding Overview

Cluster Request Routing

• Targeted Queries

• Scatter Gather Queries

• Scatter Gather Queries with Sort

Page 26: Sharding Overview

Cluster Request Routing: Targeted Query

Page 27: Sharding Overview

Routable request received

Page 28: Sharding Overview

Request routed to appropriate shard

Page 29: Sharding Overview

Shard returns results

Page 30: Sharding Overview

Mongos returns results to client

Page 31: Sharding Overview

Cluster Request Routing: Non-Targeted Query

Page 32: Sharding Overview

Non-Targeted Request Received

Page 33: Sharding Overview

Request sent to all shards

Page 34: Sharding Overview

Shards return results to mongos

Page 35: Sharding Overview

Mongos returns results to client

Page 36: Sharding Overview

Cluster Request Routing: Non-Targeted Query with Sort

Page 37: Sharding Overview

Non-Targeted request with sort received

Page 38: Sharding Overview

Request sent to all shards

Page 39: Sharding Overview

Query and sort performed locally

Page 40: Sharding Overview

Shards return results to mongos

Page 41: Sharding Overview

Mongos merges sorted results

Page 42: Sharding Overview

Mongos returns results to client

Page 43: Sharding Overview

Shard Key

Page 44: Sharding Overview

Shard Key

• Choose a field common used in queries

• Shard key is immutable

• Shard key values are immutable

• Shard key requires index on fields contained in key

• Uniqueness of `_id` field is only guaranteed within individual shard

• Shard key limited to 512 bytes in size

Page 45: Sharding Overview

Shard Key Considerations

• Cardinality

• Write distribution

• Query isolation

• Data Distribution

Page 46: Sharding Overview

Hashed Sharding

Page 47: Sharding Overview

Hashed Shard Keys

• New in MongoDB 2.4

• Uses Hashed indexes

• The mongos will route all equality queries to a specific shard or set of shards; however, the mongos must route range queries to all shards.

• When using a hashed shard key on an empty collection, MongoDB automatically pre-splits the range of 64-bit hash values into chunks

Page 48: Sharding Overview

Use Cases for Hashed Shard Keys

• Write heavy applications

• Need good write distribution

• Tolerant of slower scatter gather queries

• Rarely perform range queries (Everything is directed to one document)

Page 49: Sharding Overview

Job Title, 10gen

Speaker Name

#ConferenceHashTag

Thank You