Socialite, the Open Source Status Feed Part 2: Managing the Social Graph
There are many ways to store and query relationships between users in a social network. This section dives into the details of storing a social user graph in MongoDB. It covers several schema designs for storing users' follower networks, compares their performance, and proposes a design suited to both insert and query performance.
![Page 1: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/1.jpg)
Building a Social Platform with MongoDB
MongoDB Inc
Darren Wood & Asya Kamsky
#MongoDBWorld
![Page 2: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/2.jpg)
Building a Social Platform
Part 2: Managing the Social Graph
![Page 3: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/3.jpg)
Socialite
• Open Source
• Reference Implementation
  – Various Fanout Feed Models
  – User Graph Implementation
  – Content storage
• Configurable models and options
• REST API in Dropwizard (Yammer)
  – https://dropwizard.github.io/dropwizard/
• Built-in benchmarking

https://github.com/10gen-labs/socialite
![Page 4: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/4.jpg)
Architecture
[Diagram; recoverable labels: Graph Service Proxy, Content Proxy]
![Page 5: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/5.jpg)
Graph Data - Social
[Diagram: a directed graph of four users (John, Kate, Bob, Pete) connected by four "follows" edges]
![Page 6: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/6.jpg)
Graph Data - Social
[Diagram: the same graph of John, Kate, Bob, and Pete with four "follows" edges, plus a candidate edge labelled "Recommendation ?"]
![Page 7: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/7.jpg)
Graph Data - Promotional
[Diagram: John, Kate, Bob, and Pete with five "follows" edges, a brand node "Acme Soda" connected by two "Mention" edges, and a candidate edge labelled "Recommendation ?"]
![Page 8: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/8.jpg)
Graph Data - Everywhere
• Retail
  – Complex product catalogues
  – Product recommendation engines
• Manufacturing and Logistics
  – Tracing failures to faulty component batches
  – Determining fallout from supply interruption
• Healthcare
  – Patient/Physician interactions
![Page 9: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/9.jpg)
Design Considerations
![Page 10: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/10.jpg)
The Tale of Two Biebers
VS
![Page 12: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/12.jpg)
Follower Churn
• Tempting to focus on scaling content
• Follow requests rival message send rates
• Twitter enforces per-day follow limits
![Page 13: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/13.jpg)
Edge Metadata
• Models – friends/followers
• Requirements typically start simple
• Add Groups, Favorites, Relationships
![Page 14: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/14.jpg)
Storing Graphs in MongoDB
![Page 15: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/15.jpg)
Option One – Embedding Edges
![Page 16: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/16.jpg)
Embedded Edge Arrays
• Storing connections with the user (popular choice)
  – Most compact form
  – Efficient for reads
• However…
  – User documents grow
  – Upper limit on degree (document size)
  – Difficult to annotate (and index) edges

{
  "_id" : "djw",
  "fullname" : "Darren Wood",
  "country" : "Australia",
  "followers" : [ "jsr", "ian" ],
  "following" : [ "jsr", "pete" ]
}
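To make the trade-off concrete, here is a minimal in-memory sketch in plain JavaScript (no MongoDB connection). The `users` map and the `follow` and `addToSet` helpers are hypothetical stand-ins for set-style array updates on two user documents; they are not Socialite code.

```javascript
// In-memory stand-in for a users collection with embedded edge arrays.
// Hypothetical sketch: `users`, `addToSet` and `follow` are illustrative names.
const users = new Map([
  ["djw", { _id: "djw", fullname: "Darren Wood", followers: [], following: [] }],
  ["jsr", { _id: "jsr", followers: [], following: [] }],
]);

// Add a value to an array only if it is not already present.
function addToSet(arr, value) {
  if (!arr.includes(value)) arr.push(value);
}

// One follow touches TWO user documents, and both arrays keep growing.
function follow(userMap, follower, followee) {
  addToSet(userMap.get(follower).following, followee);
  addToSet(userMap.get(followee).followers, follower);
}

follow(users, "jsr", "djw");
follow(users, "jsr", "djw"); // repeated follow is a no-op

console.log(users.get("djw").followers); // [ 'jsr' ]
console.log(users.get("jsr").following); // [ 'djw' ]
```

Reads stay cheap (one document holds the whole neighbourhood), but every new edge rewrites a document that only ever gets larger.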
![Page 17: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/17.jpg)
Embedded Edge Arrays
• Creating Rich Graph Information
  – Can become cumbersome

{
  "_id" : "djw",
  "fullname" : "Darren Wood",
  "country" : "Australia",
  "friends" : [
    { "uid" : "jsr", "grp" : "school" },
    { "uid" : "ian", "grp" : "work" }
  ]
}

{
  "_id" : "djw",
  "fullname" : "Darren Wood",
  "country" : "Australia",
  "friends" : [ "jsr", "ian" ],
  "group" : [ "school", "work" ]
}
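The two shapes above can be compared with a small plain-JavaScript sketch. The helper names (`groupFromSubdocs`, `groupFromParallel`, `unfriendParallel`) are hypothetical; the point is only to show why annotations kept in a parallel array are fragile.

```javascript
// Two hypothetical shapes for the same annotated friend list.
const withSubdocs = {
  _id: "djw",
  friends: [
    { uid: "jsr", grp: "school" },
    { uid: "ian", grp: "work" },
  ],
};
const withParallelArrays = {
  _id: "djw",
  friends: ["jsr", "ian"],
  group: ["school", "work"],
};

// Subdocuments: the annotation travels with the edge.
function groupFromSubdocs(doc, uid) {
  const edge = doc.friends.find((f) => f.uid === uid);
  return edge ? edge.grp : undefined;
}

// Parallel arrays: the annotation is found by position only.
function groupFromParallel(doc, uid) {
  const i = doc.friends.indexOf(uid);
  return i === -1 ? undefined : doc.group[i];
}

// Removing a friend from parallel arrays must edit both arrays at the same
// index; forgetting one silently shifts every later annotation.
function unfriendParallel(doc, uid) {
  const i = doc.friends.indexOf(uid);
  if (i === -1) return;
  doc.friends.splice(i, 1);
  doc.group.splice(i, 1);
}

console.log(groupFromSubdocs(withSubdocs, "ian")); // work
unfriendParallel(withParallelArrays, "jsr");
console.log(groupFromParallel(withParallelArrays, "ian")); // work
```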
![Page 18: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/18.jpg)
Option Two – Edge Collection
![Page 19: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/19.jpg)
Edge Collections
• Document per edge
• Very flexible for adding edge data

> db.followers.findOne()
{
  "_id" : ObjectId(…),
  "from" : "djw",
  "to" : "jsr"
}

> db.friends.findOne()
{
  "_id" : ObjectId(…),
  "from" : "djw",
  "to" : "jsr",
  "grp" : "work",
  "ts" : ISODate("2013-07-10T00:00:00Z")
}
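As a rough illustration of the document-per-edge idea, the sketch below models an edge collection as a plain JavaScript array. The `addEdge` helper and the sample edges other than djw to jsr are assumptions made for the example.

```javascript
// In-memory stand-in for an edge collection: one document per edge.
// Hypothetical sketch; `friends` mirrors the slide's collection name only.
const friends = [];

// Adding a relationship is a single insert; extra fields ride along freely.
function addEdge(collection, from, to, metadata = {}) {
  const edge = { from, to, ...metadata };
  collection.push(edge);
  return edge;
}

addEdge(friends, "djw", "jsr", { grp: "work", ts: new Date("2013-07-10") });
addEdge(friends, "djw", "pete", { grp: "school" });
addEdge(friends, "ian", "djw"); // no metadata needed

// Edge metadata can be filtered on like any other field.
const workFriends = friends
  .filter((e) => e.from === "djw" && e.grp === "work")
  .map((e) => e.to);

console.log(workFriends); // [ 'jsr' ]
```

No existing document is rewritten when an edge is added, which is the property the insert-rate comparison depends on.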
![Page 20: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/20.jpg)
Operational issues
• Updates of embedded arrays
  – Cost grows non-linearly with the number of indexed array elements
• Updating an edge collection => inserts
  – Cost grows close to linearly with the existing number of edges per user
![Page 21: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/21.jpg)
Edge Insert Rate
![Page 22: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/22.jpg)
Edge Collection Indexing Strategies
![Page 23: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/23.jpg)
Finding Followers
Consider our single followers collection:

> db.followers.find({from : "djw"}, {_id : 0, to : 1})
{ "to" : "jsr" }

Using index:

{
  "v" : 1,
  "key" : { "from" : 1, "to" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "from_1_to_1"
}

• "key": covered index when searching on "from" for all followers
• "unique": specify only if multiple edges between the same pair of users cannot exist
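The two properties of this index (uniqueness and covering) can be sketched with a toy model in plain JavaScript. `EdgeIndex` is a hypothetical class; a real MongoDB index is a B-tree, and a sorted array of keys is used here only because it is enough to show the behaviour.

```javascript
// Toy model of a unique compound index on { from: 1, to: 1 }.
// Hypothetical sketch: a sorted array of [from, to] keys stands in for a B-tree.
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

class EdgeIndex {
  constructor() {
    this.keys = []; // sorted [from, to] pairs
  }

  // "unique": true -> a second identical (from, to) pair is rejected.
  insert(from, to) {
    if (this.keys.some(([f, t]) => f === from && t === to)) {
      throw new Error(`duplicate key: { from: "${from}", to: "${to}" }`);
    }
    this.keys.push([from, to]);
    this.keys.sort((a, b) => cmp(a[0], b[0]) || cmp(a[1], b[1]));
  }

  // Covered query: the only requested field ("to") is already part of the
  // index key, so the answer is built without touching any document.
  followersOf(user) {
    return this.keys.filter(([f]) => f === user).map(([, t]) => ({ to: t }));
  }
}

const index = new EdgeIndex();
index.insert("djw", "jsr");
index.insert("djw", "ian");

console.log(index.followersOf("djw")); // [ { to: 'ian' }, { to: 'jsr' } ]

let rejected = false;
try {
  index.insert("djw", "jsr"); // same edge again
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
```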
![Page 24: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/24.jpg)
Finding Following
What about who a user is following?
Can use a reverse covered index:

{
  "v" : 1,
  "key" : { "from" : 1, "to" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "from_1_to_1"
}

{
  "v" : 1,
  "key" : { "to" : 1, "from" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "to_1_from_1"
}

Notice the flipped field order in the second index.
![Page 25: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/25.jpg)
Finding Following
Wait! There is an issue with the reverse index… SHARDING!

{
  "v" : 1,
  "key" : { "from" : 1, "to" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "from_1_to_1"
}

{
  "v" : 1,
  "key" : { "to" : 1, "from" : 1 },
  "unique" : true,
  "ns" : "socialite.followers",
  "name" : "to_1_from_1"
}

If we shard this collection by "from", looking up followers for a specific user is "targeted" to a single shard.
To find who the user is following, however, the query must be scatter-gathered across all shards.
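The difference between a targeted and a scatter-gather query can be sketched with a toy router in plain JavaScript. The hash function, the shard count, and the `shardFor` / `shardsVisited` names are assumptions for illustration and do not reflect how MongoDB actually routes queries internally.

```javascript
// Toy router for a collection sharded on "from". Hypothetical sketch:
// the hash function and shard count are illustrative, not MongoDB's.
const SHARD_COUNT = 3;

function shardFor(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % SHARD_COUNT;
}

// Which shards must a query visit?
function shardsVisited(query) {
  if ("from" in query) {
    return [shardFor(query.from)]; // shard key present: targeted
  }
  // No shard key in the query: every shard has to be asked.
  return Array.from({ length: SHARD_COUNT }, (_, i) => i);
}

console.log(shardsVisited({ from: "djw" }).length); // 1
console.log(shardsVisited({ to: "djw" }).length); // 3
```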
![Page 26: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/26.jpg)
Dual Edge Collections
![Page 27: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/27.jpg)
Dual Edge Collections
• When "following" queries are common
  – Not always the case
  – Consider overhead carefully
• Can use dual collections
  – One for each direction
  – Edges are duplicated, reversed
  – Can be sharded independently
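A minimal sketch of the dual-collection idea in plain JavaScript follows. The collection names come from the slides; the direction convention (in `followers`, "from" is the followed user) is inferred from the earlier query examples, and the helper functions are hypothetical.

```javascript
// Hypothetical sketch of dual edge collections, each sharded on its own
// "from" field. Direction convention is an assumption inferred from the slides.
const followers = []; // from = followed user, to = follower
const following = []; // from = follower,      to = followed user

// One logical follow becomes two inserts: the edge and its reverse.
function follow(follower, followee) {
  followers.push({ from: followee, to: follower });
  following.push({ from: follower, to: followee });
}

// Both lookups filter on "from", the shard key of their own collection,
// so neither needs to scatter-gather.
const followersOf = (user) =>
  followers.filter((e) => e.from === user).map((e) => e.to);
const followingOf = (user) =>
  following.filter((e) => e.from === user).map((e) => e.to);

follow("jsr", "djw");
follow("ian", "djw");
follow("djw", "pete");

console.log(followersOf("djw")); // [ 'jsr', 'ian' ]
console.log(followingOf("djw")); // [ 'pete' ]
console.log(followers.length + following.length); // 6 documents for 3 follows
```

The overhead mentioned above shows up directly: storage and write volume double in exchange for targeted reads in both directions.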
![Page 28: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/28.jpg)
Edge Query Rate Comparison
Number of shards vs number of queries

| Shards | Followers collection with forward and reverse indexes | Two collections (followers, following), one index each |
|---|---|---|
| 1 | 10,000 | 10,000 |
| 3 | 90,000 | 30,000 |
| 6 | 360,000 | 60,000 |
| 12 | 1,440,000 | 120,000 |
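The figures in the table fit a simple pattern, which the sketch below reproduces. The interpretation is an assumption, not something the slides state: each shard brings 10,000 queries of load, a dual-collection query is answered by one shard, and a reverse lookup on the single collection is sent to every shard.

```javascript
// Reproduces the table's figures under one ASSUMED reading of the slide:
// load scales with shard count, and scatter-gather multiplies it again.
const BASE_QUERIES_PER_SHARD = 10000; // taken from the 1-shard row

function shardLevelQueries(shards) {
  const incoming = BASE_QUERIES_PER_SHARD * shards;
  return {
    shards,
    singleCollection: incoming * shards, // scatter-gather: every shard answers
    dualCollections: incoming, // targeted: one shard answers
  };
}

for (const n of [1, 3, 6, 12]) {
  console.log(shardLevelQueries(n));
}
```

Under this reading, the single-collection design grows with the square of the shard count while the dual-collection design grows linearly.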
![Page 29: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/29.jpg)
Follower Counts
How to determine these counts?
Can use the edge indexes:

> db.followers.find({_f : "djw"}).count()
> db.following.find({_f : "djw"}).count()

However, this can be heavyweight
– Especially for rendering a landing page
– Consider maintaining counts on the user document
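Maintaining counts on the user document can be sketched as follows in plain JavaScript. The field names `followerCount` and `followingCount` and the `follow` helper are hypothetical; in MongoDB the increments would typically be expressed with the `$inc` update operator.

```javascript
// Hypothetical sketch: keep counters on the user document instead of
// counting edges at read time. Field names here are illustrative.
const userDocs = new Map([
  ["djw", { _id: "djw", followerCount: 0, followingCount: 0 }],
  ["jsr", { _id: "jsr", followerCount: 0, followingCount: 0 }],
]);
const edges = new Set(); // "follower->followee" keys keep follows idempotent

function follow(follower, followee) {
  const key = `${follower}->${followee}`;
  if (edges.has(key)) return false; // already following: counters untouched
  edges.add(key);
  userDocs.get(follower).followingCount += 1; // increment-style update
  userDocs.get(followee).followerCount += 1;
  return true;
}

follow("jsr", "djw");
follow("jsr", "djw"); // duplicate, ignored

// Rendering a profile now reads one document instead of counting edges.
console.log(userDocs.get("djw").followerCount); // 1
console.log(userDocs.get("jsr").followingCount); // 1
```

The cost moves from read time to write time: each follow now updates two user documents in addition to inserting the edge.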
![Page 30: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/30.jpg)
Socialite User Service
• Manages user profiles and the follower graph
• Supports arbitrary user data passthrough
• Options for graph storage
  – Uses edge collections (can shard by _f)
  – Options for maintaining separate follower/following graphs
  – Storing counts vs counting

{
  "_id" : ObjectId("52cd1d32a0ee9a1a76d369bb"),
  "_f" : "jsr",
  "_t" : "djw"
}

{
  "v" : 1,
  "key" : { "_f" : 1, "_t" : 1 },
  "unique" : true
}
![Page 31: Socialite, the Open Source Status Feed Part 2: Managing the Social Graph](https://reader033.vdocuments.mx/reader033/viewer/2022061103/540024c68d7f724c088b4b1f/html5/thumbnails/31.jpg)
Next up @ 11:50am : Scaling the Data Feed
• Delivering user content to followers
• Comparing fanout models
• Caching user timelines for fast retrieval
• Embedding vs Linking Content