yieldbot tech talk, sept 20, 2012
TRANSCRIPT
© 2012 Yieldbot / CONFIDENTIAL
Yieldbot Tech Talk – MongoDB to k/v
Yieldbot Tech Talk – MongoDB to key/value, Sept 20, 2012
What We Do
• Yieldbot technology creates marketplaces where advertisers target realtime consumer intent flowing through premium publishers.
• At a high level: Analytics + Ad Serving
  – Geo-distributed
    • Data collection
    • Realtime ad matching
  – Cascalog batch analytics
  – Rich analytics results visualizations
Why MongoDB (Dec 2009)
• Needed to be manageable by the dev team (1 person!)
• Flexible
• Easy to get started, run on a laptop, or deploy
• Scale wasn’t initially the biggest concern
• Could focus on other stuff
  – Lucene
  – Analytics
  – Ad serving dynamics
How MongoDB Was Used Initially
• Configuration
  – Publisher profiles, ad matching rules, etc.
• Data collection
  – Pageviews, impressions, clicks
• Analytics results
• Task state tracking
• Lookup tables for ad serving
• Real-time ad stats
A Couple of Aspects of Note
• Master/Slave
  – Convenient for simple durability
  – Convenient for geo distribution
  – Not unique to Mongo; we now run a similar redis topology
• Indexing
  – Easy to set up, but eventually a RAM scaling issue
  – Initially great for efficient views of data in the UI
  – Analytics results later moved to key/value access in Mongo
• Durable sharded config (replica sets) is expensive
Data Collection
• Mongo: collections for pageviews, impressions, clicks
  – Wasn’t archived anywhere else
  – Not where you want to infinitely scale
• Now flows through redis, to files, to S3
Data Collection with redis Assist
• redis lists populated as events come in
• Daemons pull off lists and write to files
• Periodically compress and archive files to S3
• S3 files used for input later
  – Hadoop (Cascalog) batch analytics
  – Advertising stats calculations
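The list-to-file-to-archive flow on this slide could look roughly like the sketch below. It is an illustration only, not Yieldbot's daemon: the list name `events:pageview` and both function names are made up, and `client` is assumed to be any object exposing redis-py's `lpop(name)`, which returns `None` once the list is empty.

```python
import gzip
import shutil
from pathlib import Path

EVENT_LIST = "events:pageview"  # hypothetical redis list name


def drain_events(client, log_path, list_key=EVENT_LIST, max_events=1000):
    """Pop up to max_events entries off a redis list and append them to a file.

    `client` is anything with redis-py's `lpop(name)` method.
    Returns the number of events written.
    """
    written = 0
    with open(log_path, "ab") as log:
        while written < max_events:
            raw = client.lpop(list_key)
            if raw is None:          # list is empty
                break
            if isinstance(raw, str):  # clients may decode responses to str
                raw = raw.encode("utf-8")
            log.write(raw + b"\n")
            written += 1
    return written


def compress_for_archive(log_path):
    """Gzip a finished log file, remove the original, return the new path."""
    src = Path(log_path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    src.unlink()
    return dst
```

Producers would push with something like `client.rpush(EVENT_LIST, json.dumps(event))`. The S3 upload is left out; the compressed file returned by `compress_for_archive` is what a periodic job would hand to an S3 client.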
Matching Lookup Tables
• Mongo: collections for different lookup types
  – E.g., geo, url
  – Built periodically, updated on config change
  – Lookup in each, correlate results
• redis
  – Ability to pipeline operations in a single server call
  – Set intersection across lookup dimensions and one response back
  – Same master/slave as Mongo for distribution
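As a rough illustration of "set intersection across lookup dimensions and one response back", the sketch below queues one `SINTER` per pageview on a pipeline and sends the batch in a single round trip. The key layout `lookup:<dimension>:<value>` and the function name are assumptions, not Yieldbot's schema. `client` is assumed to behave like redis-py's `Redis`, where `pipeline()` queues commands until `execute()`.

```python
def match_candidates(client, page_attrs_batch):
    """Return one set of matching ad ids per pageview, using one round trip.

    `page_attrs_batch` is a list of dicts such as {"geo": "us", "url": "news"}.
    Each (dimension, value) pair is assumed to name a redis set of ad ids.
    """
    pipe = client.pipeline()
    for attrs in page_attrs_batch:
        # Hypothetical key layout: one set per (dimension, value).
        keys = [f"lookup:{dim}:{value}" for dim, value in sorted(attrs.items())]
        pipe.sinter(keys)  # queued only; the intersection runs server-side
    # execute() sends every queued command and returns results in order.
    return [set(result) for result in pipe.execute()]
```

Compared with the Mongo approach of one lookup per collection plus client-side correlation, the intersection here happens inside redis and only the matching ids come back.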
Configuration
• Mongo
  – Database per publisher
  – Collections for objects
  – Denormalized where possible
  – Manual foreign keys
  – Obviously the best candidate for a relational model
• History and versioning were paramount to us
  – Rolled our own: HeroDB
HeroDB
• History and granular versioning were the highest goal
• Database built on top of git
  – Golden database is a bare repo
  – Can clone to anywhere, make changes, push
  – Changes in a single commit are atomic
• Records how, when, and by whom it was changed
• Ability to set the DB to a specific previous state
• Much more to do, in production 6+ months
  – Recent change: caching
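The properties listed on this slide (atomic multi-key commits, who/when/why metadata, reset to an earlier state) can be modelled with a toy in-memory commit log. This is not HeroDB and does not use git; the class and method names are hypothetical and only mimic the shape of a commit graph.

```python
import hashlib
import json
import time


class VersionedStore:
    """Toy commit log: every commit is an immutable snapshot with metadata."""

    def __init__(self):
        self.commits = {}   # commit id -> commit record
        self.head = None    # id of the latest commit

    def commit(self, changes, author, message):
        """Apply all `changes` (key -> value, None deletes) as one commit."""
        snapshot = dict(self.commits[self.head]["data"]) if self.head else {}
        for key, value in changes.items():
            if value is None:
                snapshot.pop(key, None)
            else:
                snapshot[key] = value
        record = {"parent": self.head, "author": author, "message": message,
                  "time": time.time(), "data": snapshot}
        # Content-derived id, loosely analogous to a git commit hash.
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        commit_id = hashlib.sha1(payload).hexdigest()
        self.commits[commit_id] = record
        self.head = commit_id
        return commit_id

    def get(self, key, at=None):
        """Read a key at HEAD, or as of an earlier commit id."""
        commit_id = at or self.head
        return self.commits[commit_id]["data"].get(key) if commit_id else None

    def history(self):
        """Yield (commit id, author, message) from newest to oldest."""
        commit_id = self.head
        while commit_id:
            record = self.commits[commit_id]
            yield commit_id, record["author"], record["message"]
            commit_id = record["parent"]

    def reset(self, commit_id):
        """Point HEAD back at a previous state of the database."""
        if commit_id not in self.commits:
            raise KeyError(commit_id)
        self.head = commit_id
```

Storing full snapshots per commit keeps the sketch short; a git-backed store gets the same semantics with far better space sharing between versions.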
Analytics Results
• ARCv1, Mongo: indexed collections
  – Very easy to code to
  – Initially with everything else in the same server
  – Moved out to a dedicated server
  – Memory became an issue
    • Indexes bigger than the data itself
  – Overhead of importing Cascalog results
    • Pull JSON files from S3 to local disk
    • mongoimport files into DB
Analytics Results Cont’d
• ARCv2, Mongo: paged data, key/value
  – Migrated app to key/value access pattern
  – Much better memory usage
  – Application sharded, publishers spread around
  – DB per day per publisher, most recent 7 held
  – Still overhead of importing Hadoop results
    • Pull JSON files from S3 to local disk
    • mongoimport files into DB
Analytics Results - ElephantDB
• Cascalog support to directly write EDB format
  – Berkeley DB or LevelDB
• Ring topology
  – Shards distributed around ring, consistent hashing
  – Configurable replication factor
  – Request to any node, forwards as necessary
  – Incrementally increase ring size
• Import from S3 efficient
  – Copy shard from S3 to local disk
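The ring bullets above (consistent hashing, a replication factor, growing the ring incrementally) can be illustrated with a generic consistent-hash ring. This is a textbook sketch, not ElephantDB's actual placement code; the class name, the use of MD5, and the virtual-node count are all assumptions.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a point on the ring (a large integer)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replication=2, vnodes=64):
        self.replication = replication
        self.vnodes = vnodes
        self._points = []   # sorted list of (ring position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place `vnodes` points for the node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes.

        The first node is the primary owner; the rest are replicas.
        """
        if not self._points:
            return []
        distinct = {node for _, node in self._points}
        wanted = min(self.replication, len(distinct))
        i = bisect.bisect(self._points, (_hash(key), ""))
        owners = []
        while len(owners) < wanted:
            node = self._points[i % len(self._points)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners
```

Adding a node only takes over the keys that now fall just before one of its points, so a key's primary either stays where it was or moves to the new node. That is what makes increasing the ring size incremental rather than a full reshuffle.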
Real-time Ad Stats
• Mongo: DB per day, collection by entity type
  – Document per entity instance
  – stat_type.hour.minute nested values, atomic increment
  – Never a good story around aggregating at larger timeframes
• Enter redis again
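The `stat_type.hour.minute` layout maps directly onto MongoDB's dotted field paths and the `$inc` update operator, which creates missing nested fields and increments atomically on the server. The helper names below are hypothetical, and the document shape is inferred from the slide rather than taken from Yieldbot's code.

```python
def minute_increment(stat_type, when, amount=1):
    """Build a MongoDB update document that bumps one per-minute counter.

    `when` is a datetime. The dotted path addresses nested fields, so
    "clicks.14.5" means doc["clicks"]["14"]["5"].
    """
    path = f"{stat_type}.{when.hour}.{when.minute}"
    return {"$inc": {path: amount}}


def hourly_total(doc, stat_type, hour):
    """Sum the minute counters for one hour of one entity's document.

    Done client-side on a fetched document, which hints at why rolling
    up to larger timeframes was awkward with this layout.
    """
    minutes = doc.get(stat_type, {}).get(str(hour), {})
    return sum(minutes.values())
```

With pymongo the update document would be passed to something like `collection.update_one({"_id": entity_id}, minute_increment("clicks", now), upsert=True)`.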
Real-time Ad Stats Cont’d
• redis has robust access patterns
  – More pipelining
• Initially both realtime and aggregated stats were kept in redis
• The issue with scaling redis is that the DB has to fit in memory
• Time-period aggregations are now kept in HBase
• Only the most recent hours are kept in redis
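One way to keep "only the most recent hours" in redis is a hash per entity per hour with a TTL that is refreshed on every write. The sketch below is an assumption about how that could be done, not a description of Yieldbot's schema: the key layout, the six-hour retention, and the function name are invented. `client` is assumed to behave like redis-py's `Redis`.

```python
RETENTION_SECONDS = 6 * 3600  # hypothetical: keep roughly six hours in redis


def record_stat(client, entity_id, stat_type, when, amount=1):
    """Increment a per-minute counter and refresh the hour bucket's TTL.

    `when` is a datetime. Both commands are queued on a pipeline and sent
    in one round trip. Returns the counter's new value.
    """
    key = f"stats:{entity_id}:{when:%Y%m%d%H}"   # one hash per entity per hour
    field = f"{stat_type}:{when.minute:02d}"
    pipe = client.pipeline()
    pipe.hincrby(key, field, amount)     # atomic increment inside the hash
    pipe.expire(key, RETENTION_SECONDS)  # old hour buckets age out on their own
    new_value, _ = pipe.execute()
    return new_value
```

Letting old buckets expire bounds the memory redis needs, while a separate job can roll finished hours up into a store such as HBase before they disappear.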
Task State Tracking
• The last holdout
• Collection of tasks
  – Each task is a document
  – Indexed as needed
  – Mongo query and update syntax is convenient
    • In static code, and also in the Python or Mongo REPL
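To give a feel for why the query and update syntax is convenient for task tracking, here is a minimal sketch of claiming the oldest pending task in one atomic call. The field names (`state`, `worker`, `started_at`, `created_at`) and the function name are assumptions. `tasks` is assumed to behave like a pymongo `Collection`, whose `find_one_and_update` returns the matched document (as it was before the update, by default) or `None`.

```python
def claim_next_task(tasks, worker_id, now):
    """Atomically move the oldest pending task to 'running' and return it.

    Returns None when no task is pending. Because the match and the
    update happen in one server-side operation, two workers cannot
    claim the same task.
    """
    return tasks.find_one_and_update(
        {"state": "pending"},
        {"$set": {"state": "running", "worker": worker_id, "started_at": now}},
        sort=[("created_at", 1)],   # oldest first
    )
```

The same filter and update documents can be typed directly into a Python or Mongo shell session, which is the ad hoc use the slide mentions.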
Honorable Mention
• redis for the celery backend, used for task messaging infrastructure
• but was never mongo anyway...
MongoDB Migration Summary
• Configuration: HeroDB
• Data Collection: to S3 via redis
• Analytics Results: ElephantDB
• Task State Tracking: still Mongo
• Matcher Lookup Tables: redis
• Real-time Ad Stats: redis/HBase
Thanks!
Site: yieldbot.com
Blog: blog.yieldbot.com
Twitter: @yieldbot
Email: [email protected]