MongoDB and The Internet of Things
Arthur Viegers
Senior Solutions Architect, MongoDB
MongoDB IoT City Tour 2014
*
MongoDB
Document Database
Open-Source
General Purpose
*
Documents Are Core
Relational vs. MongoDB
{
  first_name: "Paul",
  surname: "Miller",
  city: "London",
  location: [45.123, 47.232],
  cars: [
    { model: "Bentley", year: 1973, value: 100000, … },
    { model: "Rolls Royce", year: 1965, value: 330000, … }
  ]
}
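To make the relational vs. document contrast concrete, here is a minimal Python sketch. It assumes a hypothetical relational layout with a `people` table and a `cars` table linked by a `person_id` foreign key, and folds the joined rows into one embedded document shaped like the example (the `…` fields are left out). The third car row, belonging to another person, is made up to show the filtering.

```python
from typing import Any


def embed_cars(person: dict[str, Any], car_rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold a person row and its matching car rows into one document.

    `person` and `car_rows` mimic rows from hypothetical `people` and
    `cars` tables; `id` and `person_id` are assumed key columns.
    """
    doc = {k: v for k, v in person.items() if k != "id"}
    # What would be a JOIN in SQL becomes an embedded array of sub-documents.
    doc["cars"] = [
        {k: v for k, v in row.items() if k != "person_id"}
        for row in car_rows
        if row["person_id"] == person["id"]
    ]
    return doc


person_row = {"id": 1, "first_name": "Paul", "surname": "Miller",
              "city": "London", "location": [45.123, 47.232]}
car_rows = [
    {"person_id": 1, "model": "Bentley", "year": 1973, "value": 100000},
    {"person_id": 1, "model": "Rolls Royce", "year": 1965, "value": 330000},
    {"person_id": 2, "model": "Mini", "year": 1999, "value": 5000},  # hypothetical
]
```

`embed_cars(person_row, car_rows)` returns a single dict whose `cars` array holds the Bentley and the Rolls Royce, so reading Paul's data back needs one lookup instead of a join.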
*
Modelling time series data in MongoDB
*
Rexroth NEXO Cordless Nutrunner
*
Time series schema design goals
• Store event data
• Support analytical queries
• Find the best compromise between:
  - Memory utilization
  - Write performance
  - Read/analytical query performance
• Accomplish this with a realistic amount of hardware
*
Modelling time series data
• Document per event
• Document per minute (average)
• Document per minute (by second)
• Document per hour (by second)
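A quick worked comparison shows why the bucketed layouts matter. This Python sketch assumes one reading per second per device (the resolution the "by second" layouts are built for) and counts the documents each schema produces per device per day. Fewer documents also means fewer entries in each single-field index, such as the one on `_id`. The schema labels are hypothetical names for the four options listed.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def docs_per_device_per_day(schema: str) -> int:
    """Documents one device produces per day at an assumed 1 Hz sample rate."""
    readings_per_doc = {
        "per_event": 1,               # one document per reading
        "per_minute_avg": 60,         # 60 readings folded into counters
        "per_minute_by_second": 60,   # 60 second slots per document
        "per_hour_by_second": 3600,   # 3,600 second slots per document
    }[schema]
    return SECONDS_PER_DAY // readings_per_doc
```

At this rate a single device yields 86,400 event documents a day, versus 1,440 per-minute documents and just 24 per-hour documents.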
*
Document per event
{ deviceId: "Test123", timestamp: ISODate("2014-07-03T22:07:38.000Z"), temperature: 21}
• Relational-centric approach
• Insert-driven workload
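With one document per event nothing is pre-computed, so a per-minute average has to be assembled from the individual documents at read time. This Python sketch filters an in-memory list as a stand-in for a MongoDB query or aggregation; the `minute_average` helper and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def minute_average(events: list[dict], device_id: str, minute: datetime) -> float:
    """Average temperature for one device over one minute, computed at read time.

    Every matching event document (up to 60 per minute at 1 Hz) has to be
    visited and aggregated by the query itself.
    """
    end = minute + timedelta(minutes=1)
    temps = [
        e["temperature"]
        for e in events
        if e["deviceId"] == device_id and minute <= e["timestamp"] < end
    ]
    return mean(temps)


start = datetime(2014, 7, 3, 22, 7, tzinfo=timezone.utc)
# Hypothetical stream: one reading per second, 20 for 30 s, then 21 for 30 s.
events = [
    {"deviceId": "Test123", "timestamp": start + timedelta(seconds=s),
     "temperature": 20 if s < 30 else 21}
    for s in range(60)
]
```

Here `minute_average(events, "Test123", start)` has to touch all 60 documents to produce 20.5, which is the read-side cost the pre-aggregated layouts avoid.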
*
Document per minute (average)
{ deviceId: "Test123", timestamp: ISODate("2014-07-03T22:07:00.000Z"), temperature_num: 18, temperature_sum: 357}
• Pre-aggregate so the per-minute average is cheap to compute
• Update-driven workload
• Resolution at the minute level
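The pre-aggregated layout turns each reading into an update instead of an insert. A minimal sketch, assuming readings are written with pymongo-style `update_one(filter, update, upsert=True)`: the helpers below only build the filter and `$inc` update documents, so no server is needed to run them. `minute_bucket`, `avg_update` and `average` are hypothetical names.

```python
from datetime import datetime, timezone


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def avg_update(device_id: str, ts: datetime, temperature: float) -> tuple[dict, dict]:
    """Filter and update documents for the pre-aggregated per-minute layout.

    With upsert=True, the first reading of a minute creates the document
    (counters start at the increment); later readings just bump the counters.
    """
    flt = {"deviceId": device_id, "timestamp": minute_bucket(ts)}
    upd = {"$inc": {"temperature_num": 1, "temperature_sum": temperature}}
    return flt, upd


def average(doc: dict) -> float:
    """Read side: the per-minute average is a single division."""
    return doc["temperature_sum"] / doc["temperature_num"]
```

For the example document (`temperature_num: 18`, `temperature_sum: 357`), `average` gives about 19.83 without scanning any individual readings.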
*
Document per minute (by second)
{ deviceId: "Test123", timestamp: ISODate("2014-07-03T22:07:00.000Z"), temperature: { 0: 18, 1: 18, …, 58: 21, 59: 21 }}
• Store per-second data at the minute level
• Update-driven workload
• Pre-allocate structure to avoid document moves
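Pre-allocation matters because the MMAPv1 storage engine MongoDB used in 2014 moved a document on disk when an update made it outgrow its allocated space. A minimal sketch, assuming temperatures are stored as integers as in the example: the document is inserted once with all 60 slots set to 0 (an assumed placeholder of the same BSON type, so overwriting it does not change the document's size), and each reading then overwrites one slot with a `$set` on a dotted path. The function names are hypothetical.

```python
from datetime import datetime, timezone


def preallocated_minute_doc(device_id: str, minute: datetime) -> dict:
    """Per-minute document with all 60 second slots created up front.

    Keys are strings because BSON field names are strings.
    """
    return {
        "deviceId": device_id,
        "timestamp": minute,
        "temperature": {str(s): 0 for s in range(60)},  # placeholder slots
    }


def second_update(device_id: str, ts: datetime, temperature: int) -> tuple[dict, dict]:
    """Filter/update pair that overwrites one existing slot in place with `$set`."""
    flt = {"deviceId": device_id,
           "timestamp": ts.replace(second=0, microsecond=0)}
    upd = {"$set": {f"temperature.{ts.second}": temperature}}
    return flt, upd
```

A reading at 22:07:38 becomes `{"$set": {"temperature.38": 21}}` against the 22:07 document, touching a slot that already exists.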
*
Document per hour (by second)
{ deviceId: "Test123", timestamp: ISODate("2014-07-03T22:00:00.000Z"), temperature: { 0: 18, 1: 18, …, 3598: 20, 3599: 20 }}
• Store per-second data at the hourly level
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating the last second requires 3599 steps
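The 3599 steps come from how a BSON document is laid out: fields are stored one after another with no index over their names, so locating a field means walking past every field before it. This Python sketch models that walk over a dict (which preserves insertion order); it counts skipped fields and is a model, not a measurement of MongoDB itself. `fields_skipped` is a hypothetical helper.

```python
def fields_skipped(doc: dict, key: str) -> int:
    """Count the fields walked past before reaching `key`.

    Models a linear scan over a document's fields in stored order.
    """
    for position, name in enumerate(doc):
        if name == key:
            return position
    raise KeyError(key)


# Flat hourly layout: 3,600 second slots in one sub-document.
hour_flat = {str(s): 0 for s in range(3600)}
```

`fields_skipped(hour_flat, "3599")` returns 3599, while the first second costs nothing, so update cost grows steadily through the hour.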
*
Document per hour (by second, nested by minute)
{ deviceId: "Test123", timestamp: ISODate("2014-07-03T22:00:00.000Z"), temperature: { 0: { 0: 18, …, 59: 18 }, …, 59: { 0: 21, …, 59: 20 } }}
• Store per-second data at the hourly level with nesting
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating the last second requires 59 + 59 steps
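Nesting by minute bounds that walk: the server skips at most 59 minute sub-documents, then at most 59 seconds inside the chosen one. A small Python sketch of the dotted update path and the same skip-count model, with hypothetical helper names:

```python
def nested_path(second_of_hour: int) -> str:
    """Dotted update path for one second in the minute-nested hourly layout."""
    minute, second = divmod(second_of_hour, 60)
    return f"temperature.{minute}.{second}"


def nested_fields_skipped(second_of_hour: int) -> int:
    """Fields walked past: first to the minute sub-document, then inside it."""
    minute, second = divmod(second_of_hour, 60)
    return minute + second


def flat_fields_skipped(second_of_hour: int) -> int:
    """Fields walked past in the flat layout: every earlier second."""
    return second_of_hour
```

For the last second of the hour the update is `{"$set": {nested_path(3599): value}}`, i.e. `temperature.59.59`, costing 118 skips in this model instead of 3599.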
*
Rexroth NEXO schema
{
  _id: ObjectId("52ecf3d6bf1e623a52000001"),
  assetId: "NEXO 109",
  hour: ISODate("2014-07-03T22:00:00.000Z"),
  status: "Online",
  type: "Nutrunner",
  serialNo: "100-210-ABC",
  ip: "127.0.0.1",
  positions: {
    0: {
      0: { x: "10", y: "40", zone: "itc-1", accuracy: "20" },
      …,
      59: { x: "15", y: "30", zone: "itc-1", accuracy: "25" }
    },
    …,
    59: {
      0: { x: "22", y: "27", zone: "itc-1", accuracy: "22" },
      …,
      59: { x: "18", y: "23", zone: "itc-1", accuracy: "24" }
    }
  }
}
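Recording a position fix in the NEXO schema follows the same pattern as the nested hourly layout. A minimal sketch, assuming the `positions` structure for the hour was pre-allocated and that each document is addressed by the `assetId` and `hour` fields from the example; `position_update` is a hypothetical helper that builds pymongo-style filter and `$set` documents.

```python
from datetime import datetime, timezone


def position_update(asset_id: str, ts: datetime, x: str, y: str,
                    zone: str, accuracy: str) -> tuple[dict, dict]:
    """Filter/update pair that records one position fix in the NEXO schema.

    Values are strings to mirror the schema example.
    """
    hour = ts.replace(minute=0, second=0, microsecond=0)
    flt = {"assetId": asset_id, "hour": hour}
    upd = {"$set": {f"positions.{ts.minute}.{ts.second}":
                    {"x": x, "y": y, "zone": zone, "accuracy": accuracy}}}
    return flt, upd
```

A fix at 22:59:59 targets `positions.59.59` in the 22:00 document for that asset. The filter fields are the same pair evaluated as a compound shard key in the shard key table.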
When to scale
*
Disk I/O Limitations
*
Working Set Exceeds Physical Memory
[Figure: the working set, made up of indexes and data, compared with physical memory]
How to scale
*
Scaling Up
*
Scaling Out
First Edition (1771): 3 volumes
Fifteenth Edition (2010): 32 volumes
MongoDB’s Approach
*
Shards and Shard Keys
[Figure: data partitioned across shards, each shard holding a shard key range]
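In range-based sharding, MongoDB splits a collection into chunks, each covering a contiguous range of shard key values (lower bound inclusive, upper bound exclusive), and assigns the chunks to shards. The lookup can be sketched in Python with `bisect`, assuming hypothetical split points on `assetId`; the helper name and values are illustrative only.

```python
from bisect import bisect_right


def chunk_for(key, split_points: list) -> int:
    """Index of the chunk whose shard key range contains `key`.

    `split_points` are the sorted lower bounds of chunks 1..n; chunk 0
    covers everything below the first split point (down to MinKey).
    """
    return bisect_right(split_points, key)


# Hypothetical split points: chunk 0 < "NEXO 100" <= chunk 1 < "NEXO 200" <= chunk 2
splits = ["NEXO 100", "NEXO 200"]
```

`chunk_for("NEXO 109", splits)` returns 1, and a value equal to a split point belongs to the chunk that starts there, matching the inclusive lower bound.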
*
Shard Key Considerations
• Cardinality
• Write distribution
• Query isolation
• Reliability
• Index locality
*
Shard Key Selection: Rexroth NEXO
Shard key     | Cardinality | Write distribution | Query isolation | Reliability          | Index locality
_id           | Doc level   | One shard          | Scatter/gather  | All users affected   | Good
hash(_id)     | Hash level  | All shards         | Scatter/gather  | All users affected   | Poor
assetId       | Many docs   | All shards         | Targeted        | Some assets affected | Good
assetId, hour | Doc level   | All shards         | Targeted        | Some assets affected | Good
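The write distribution column follows from how new keys arrive. ObjectId values start with a timestamp, so `_id` values generated over time tend to increase, and under range partitioning every new insert lands in the chunk holding the highest values. This sketch compares that with hashing, assuming three shards and hypothetical integer keys; MD5 modulo the shard count is an illustrative stand-in, not MongoDB's actual hashed sharding scheme (which assigns ranges of hash values to chunks).

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def range_shard(key: int, split_points: list[int]) -> int:
    """Range partitioning: the chunk (here, one per shard) containing `key`."""
    return bisect_right(split_points, key)


def hashed_shard(key: int, n_shards: int) -> int:
    """Hash partitioning stand-in: first 8 bytes of MD5, modulo the shard count."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "little") % n_shards


# Existing data covers keys 0..2999, split evenly over three shards.
splits = [1000, 2000]
# New inserts with a monotonically increasing key, like an ObjectId-based _id.
new_keys = range(3000, 4000)

range_load = Counter(range_shard(k, splits) for k in new_keys)
hashed_load = Counter(hashed_shard(k, 3) for k in new_keys)
```

All 1,000 new keys go to the last shard under range partitioning, while the hashed version spreads them over all three. The price, as the table notes, is poorer index locality: neighbouring keys no longer sit together.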
*
Scaling Data - Summarized
• MongoDB scales horizontally (sharding)
• Each shard is an independent database; collectively, the shards make up a single logical database
• MMS (MongoDB Management Service) makes it easy and reliable to run MongoDB at scale
• Sharding requires minimal changes to application code: the application uses the same interface as with a single mongod
Summary
*
Why is MongoDB a good fit for IoT?
• IoT processes are real-time
• Relational technologies simply cannot compete on cost, performance, scalability, and manageability
• IoT data can come in any format, structured or unstructured, ranging from text and numbers to audio, pictures, and video
• Time series data is a natural fit for the document model
• IoT applications often require geographically distributed systems
Thank you!