Precog & MongoDB User Group: Skyrocket Your Analytics
DESCRIPTION
Learn how to do advanced analytics with the Precog data science platform on your MongoDB database. Precog is free to download, and once it is installed you can analyze all the data in your MongoDB database without exporting it into another tool or writing any custom code. Learn more here: www.precog.com/mongodb
TRANSCRIPT
![Page 1: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/1.jpg)
Skyrocket your Analytics
MongoDB Meetup on December 10, 2012
www.precog.com
@precogio
Nov - Dec 2012
![Page 2: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/2.jpg)
■ Welcome to the Precog & MongoDB Meetup!
■ Questions? Please ask away!
welcome & agenda
7:00 - 7:30 Overview of Precog for MongoDB by Derek Chen-Becker
7:30 - 7:45 Break (grab a beer, drink and snacks)
7:45 - 8:15 Analyzing Big Data with Quirrel by John A. De Goes
8:15 - 8:30 Precog Challenge Problems! Win some prizes!
![Page 3: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/3.jpg)
■ Precog Team
Derek Chen-Becker, Lead Infrastructure Engineer
John A. De Goes, CEO/Founder
Kris Nuttycombe, Dir of Engineering
Nathan Lubchenco, Developer Evangelist
■ MongoDB Host
Clay Mcllrath
■ Thank you to Google for hosting us!
who we are
![Page 4: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/4.jpg)
Current MongoDB Support for Analytics
Derek Chen-Becker
Precog Lead Infrastructure Engineer
@dchenbecker
![Page 5: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/5.jpg)
■ Mongo has support for a small set of simple aggregation primitives
○ count - returns the count of a given collection's documents with optional
filtering
○ distinct - returns the distinct values for given selector criteria
○ group - returns groups of documents based on given key criteria. Group
cannot be used in sharded configurations
current mongodb support for analytics
![Page 6: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/6.jpg)
> db.london_medals.group({
    key : {"Country":1},
    reduce : function(curr, result) { result.total += 1 },
    initial: { total : 0, fullTotal: db.london_medals.count() },
    finalize: function(result){ result.percent = result.total * 100 / result.fullTotal }
  })
[
{"Country" : "Great Britain", "total" : 88, "fullTotal" : 1019, "percent" : 8.635917566241414},
{"Country" : "Dominican Republic", "total" : 2, "fullTotal" : 1019, "percent" : 0.19627085377821393},
{"Country" : "Denmark", "total" : 16, "fullTotal" : 1019, "percent" : 1.5701668302257115},
...
■ More sophisticated queries are possible, but require a lot of JS and you'll hit the limits pretty quickly
■ Group cannot be used in sharded configurations. For that you need...
current mongodb support for analytics
![Page 7: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/7.jpg)
■ Map/Reduce: Exactly what its name says.
■ You utilize JavaScript functions to map your documents' data, then reduce that
data into a form of your choosing.
current mongodb support for analytics
[Diagram: Input Collection → Mapping Function → Reducing Function → Result Document or Output Collection]
![Page 8: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/8.jpg)
■ The mapping function is invoked with "this" bound to the current document
■ Output mapped keys and values are generated via the emit function
■ Emit can be called zero or more times for a single document
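// exactly one emit per document
function () { emit(this.Countryname, { count : 1 }); }

// one emit per array element (zero if the array is empty)
function () {
  for (var i = 0; i < this.Pupils.length; i++) {
    emit(this.Pupils[i].name, { count : 1 });
  }
}

// zero or one emit, depending on a condition
function () {
  if ((this.parents.age - this.age) < 25) { emit(this.age, { income : this.income }); }
}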
current mongodb support for analytics
![Page 9: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/9.jpg)
■ The reduction function is used to aggregate the outputs from the mapping
function
■ The function receives two inputs: the key for the elements being reduced, and
the values being reduced
■ The result of the reduction must be the same format as in the input elements,
and must be idempotent
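function (key, values) {
  var count = 0;
  // values is an array, so iterate by index (for...in would yield indices, not elements)
  for (var i = 0; i < values.length; i++) {
    count += values[i].count;
  }
  // return the same shape that the mapping function emits
  return { "count" : count };
}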
current mongodb support for analytics
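The idempotence requirement on the reduction function can be checked outside MongoDB. The following is a minimal sketch in plain JavaScript, not MongoDB's own machinery: the names reduceCount, brokenReduce and isReReducible, and the sample values, are assumptions introduced for illustration. It reduces two halves of the emitted values separately, reduces those partial results again, and compares the answer with a single reduction over everything.

```javascript
// Sketch: checking that a reduce function tolerates being re-applied to its
// own output. reduceCount, brokenReduce and isReReducible are illustrative
// names, not part of MongoDB.

// Well-behaved reducer: the output has the same shape as each input value.
function reduceCount(key, values) {
  var count = 0;
  for (var i = 0; i < values.length; i++) {
    count += values[i].count;
  }
  return { count: count };
}

// Broken reducer: it counts array entries, so partial results are miscounted.
function brokenReduce(key, values) {
  return { count: values.length };
}

// Reduce two halves separately, reduce the partial results, and compare
// against reducing everything in one call.
function isReReducible(reduce, key, values) {
  var mid = Math.floor(values.length / 2);
  var partial = [
    reduce(key, values.slice(0, mid)),
    reduce(key, values.slice(mid))
  ];
  return reduce(key, partial).count === reduce(key, values).count;
}

var emittedValues = [
  { count: 1 }, { count: 1 }, { count: 1 }, { count: 1 }, { count: 1 }
];

var goodReducerOk = isReReducible(reduceCount, "Denmark", emittedValues);
var brokenReducerOk = isReReducible(brokenReduce, "Denmark", emittedValues);
console.log(goodReducerOk, brokenReducerOk);
```

The broken reducer looks fine when it sees every value in one call, which is why a check like this is worth running before trusting results from a large collection.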
![Page 10: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/10.jpg)
■ Map/Reduce utilizes JavaScript to do all of its work
○ JavaScript in MongoDB is currently single-threaded (performance bottleneck)
○ Using external JS libraries is cumbersome and doesn't play well with sharding
○ No matter what language you're actually using, you'll be writing/maintaining
JavaScript
■ Troubleshooting the Map/Reduce functions is primitive. 10gen's advice: "write
your own emit function" (!)
■ Output options are flexible, but have some caveats
○ Output to a result document must fit in a BSON doc (16MB limit)
○ For an output collection: if you want indices on the result set, you need to
pre-create the collection then use the merge output option
current mongodb support for analytics
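The "write your own emit function" advice amounts to replacing emit with something that records its arguments, then calling the mapping function against one document at a time. Below is a hedged sketch of that idea in plain JavaScript; traceMap, mapPupils and the sample documents are hypothetical, and the stand-in emit here is not MongoDB's.

```javascript
// Sketch of the "write your own emit" debugging approach: a stand-in emit
// records its calls so a map function can be run on a single document.
// traceMap, mapPupils and the sample documents are hypothetical.
var emitted = [];

function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// The mapping function under test: one emit per entry in the Pupils array.
function mapPupils() {
  for (var i = 0; i < this.Pupils.length; i++) {
    emit(this.Pupils[i].name, { count: 1 });
  }
}

// Run mapFn with "this" bound to doc and return everything it emitted.
function traceMap(mapFn, doc) {
  emitted = [];
  mapFn.call(doc);
  return emitted;
}

var withPupils = traceMap(mapPupils, { Pupils: [{ name: "Ann" }, { name: "Bob" }] });
var noPupils = traceMap(mapPupils, { Pupils: [] });

console.log(withPupils);      // the two recorded emit calls
console.log(noPupils.length); // no emits for an empty array
```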
![Page 11: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/11.jpg)
■ The Aggregation Framework is designed to alleviate some of the issues with
Map/Reduce for common analytical queries
■ New in 2.2
■ Works by constructing a pipeline of operations on data. Similar to M/R, but
implemented in native code (higher performance, not single-threaded)
current mongodb support for analytics
[Diagram: Input Collection → Match → Project → Group]
![Page 12: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/12.jpg)
■ Filtering/paging ops
○ $match - utilize Mongo selection syntax to choose input docs
○ $limit
○ $skip
■ Field manipulation ops
○ $project - select which fields are processed. Can add new fields
○ $unwind - flattens a doc with an array field into multiple documents, one per array
value
■ Output ops
○ $group
○ $sort
■ Most common pipelines will be of the form $match ⇒ $project ⇒ $group
current mongodb support for analytics
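To make the $match ⇒ $project ⇒ $group shape concrete, here is a plain JavaScript analogue that runs on an in-memory array. It is a sketch only and does not use the MongoDB API: the stage functions, the sample athletes and the Age field are assumptions made up for this example.

```javascript
// Plain-JavaScript analogue of a $match => $project => $group pipeline.
// This is NOT the MongoDB API; the data, the Age field and the stage
// functions are hypothetical stand-ins for what each stage does.
var athletes = [
  { Name: "A", Countryname: "Denmark", Sportname: "Rowing", Age: 24 },
  { Name: "B", Countryname: "Great Britain", Sportname: "Cycling", Age: 31 },
  { Name: "C", Countryname: "Denmark", Sportname: "Sailing", Age: 29 },
  { Name: "D", Countryname: "Denmark", Sportname: "Rowing", Age: 17 }
];

// $match analogue: keep only the documents that satisfy a predicate.
function matchStage(docs) {
  return docs.filter(function (d) { return d.Age >= 18; });
}

// $project analogue: keep only the fields the later stages need.
function projectStage(docs) {
  return docs.map(function (d) { return { Countryname: d.Countryname }; });
}

// $group analogue with a { $sum : 1 } style counter per key.
function groupStage(docs) {
  var counts = {};
  docs.forEach(function (d) {
    counts[d.Countryname] = (counts[d.Countryname] || 0) + 1;
  });
  return Object.keys(counts).map(function (k) {
    return { _id: k, count: counts[k] };
  });
}

var pipelineResult = groupStage(projectStage(matchStage(athletes)));
console.log(pipelineResult);
```

Filtering first keeps the later stages small, which mirrors the reason the match stage usually leads the pipeline.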
![Page 13: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/13.jpg)
■ $match is very important to getting good performance
■ Needs to be the first op in the pipeline, otherwise indices can't be used
■ Uses normal MongoDB query syntax, with two exceptions
○ Can't use a $where clause (this requires JavaScript)
○ Can't use Geospatial queries (just because)
{ $match : { "Name" : "Fred" } }
{ $match : { "Countryname" : { $ne : "Great Britain" } } }
{ $match : { "Income" : { $exists : 1 } } }
current mongodb support for analytics
![Page 14: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/14.jpg)
■ $project is used to select/compute/augment the fields you want in the output
documents
{ $project : { "Countryname" : 1, "Sportname" : 1 } }
■ Can reference input document fields in computations via "$"
{ $project : { "country_name" : "$Countryname" } } /* renames field */
■ Computation of field values is possible, but it's limited and can be quite painful
{ $project: {
"_id":0, "height":1, "weight":1,
"bmi": { $divide : ["$weight", { $multiply : [ "$height", "$height" ] } ] } }
} /* omit "_id" field, inflict pain and suffering on future maintainers... */
current mongodb support for analytics
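The nested operator syntax in the BMI projection is easier to read once you see how such an expression tree evaluates. The toy evaluator below is a sketch under stated assumptions: evalExpr is a hypothetical helper, it handles only the four arithmetic operators named in these slides, and it is not how MongoDB itself evaluates expressions.

```javascript
// Toy evaluator for $project-style arithmetic expressions (illustrative
// only; evalExpr is a hypothetical helper, not part of MongoDB).
function evalExpr(expr, doc) {
  if (typeof expr === "number") {
    return expr; // literal value
  }
  if (typeof expr === "string" && expr.charAt(0) === "$") {
    return doc[expr.slice(1)]; // "$weight" reads doc.weight
  }
  var op = Object.keys(expr)[0];
  var args = expr[op].map(function (arg) { return evalExpr(arg, doc); });
  switch (op) {
    case "$add":
      return args.reduce(function (a, b) { return a + b; }, 0);
    case "$multiply":
      return args.reduce(function (a, b) { return a * b; }, 1);
    case "$subtract":
      return args[0] - args[1];
    case "$divide":
      return args[0] / args[1];
    default:
      throw new Error("unsupported operator: " + op);
  }
}

// Same expression tree as the "bmi" field in the slide above.
var bmiExpr = { $divide: ["$weight", { $multiply: ["$height", "$height"] }] };
var bmi = evalExpr(bmiExpr, { height: 2, weight: 80 });
console.log(bmi); // 80 / (2 * 2) = 20
```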
![Page 15: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/15.jpg)
■ $group, like the group command, collates and computes sets of values based
on the identity field ("_id"), and whatever other fields you want
{ $group : { "_id" : "$Countryname" } } /* distinct list of countries */
■ Aggregation operators can be used to perform computation ($max, $min, $avg,
$sum)
{ $group : { "_id" : "$Countryname", "count" : { $sum : 1 } } } /* histogram by country */
{ $group : { "_id" : "$Countryname", "weight" : { $avg : "$weight" } } }
{ $group : { "_id" : "$Countryname", "weight" : { $sum : "$weight" } } }
■ Set-based operations ($addToSet, $push)
{ $group : { "_id" : "$Countryname", "sport" : { $addToSet : "$sport" } } }
current mongodb support for analytics
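As a rough illustration of what the $avg and $addToSet accumulators compute per group, here is a plain JavaScript sketch over an in-memory array. The function groupByCountry and the sample documents are assumptions for this example, and the code does not call MongoDB.

```javascript
// Sketch of per-group accumulation in plain JavaScript (not MongoDB):
// an $avg-like mean of weight and an $addToSet-like list of distinct sports.
// groupByCountry and the sample data are hypothetical.
var medalists = [
  { Countryname: "Denmark", weight: 70, sport: "Rowing" },
  { Countryname: "Denmark", weight: 80, sport: "Sailing" },
  { Countryname: "Denmark", weight: 90, sport: "Rowing" },
  { Countryname: "Kenya", weight: 60, sport: "Athletics" }
];

function groupByCountry(docs) {
  var groups = {};
  docs.forEach(function (d) {
    var g = groups[d.Countryname];
    if (!g) {
      g = groups[d.Countryname] = { _id: d.Countryname, sum: 0, n: 0, sport: [] };
    }
    g.sum += d.weight;                 // running total for the average
    g.n += 1;                          // number of documents in the group
    if (g.sport.indexOf(d.sport) === -1) {
      g.sport.push(d.sport);           // keep each sport only once
    }
  });
  return Object.keys(groups).map(function (k) {
    var g = groups[k];
    return { _id: g._id, weight: g.sum / g.n, sport: g.sport };
  });
}

var grouped = groupByCountry(medalists);
console.log(grouped);
```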
![Page 16: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/16.jpg)
■ Aggregation framework has a limited set of operators
○ $project limited to $add/$subtract/$multiply/$divide, as well as some
boolean, string, and date/time operations
○ $group arithmetic is limited to $min/$max/$avg/$sum (alongside the $addToSet and $push collection operators)
■ Some operators, notably $group and $sort, are required to operate entirely in
memory
○ This may prevent aggregation on large data sets
○ Can't work around using subsetting like you can with M/R, because output is
strictly a document (no collection option yet)
current mongodb support for analytics
![Page 17: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/17.jpg)
■ Even with these tools, there are still limitations
○ MongoDB is not relational. This means a lot of work on your part if you have
datasets representing different things that you'd like to correlate. Clicks vs
views, for example
○ While the Aggregation Framework alleviates some of the performance issues
of Map/Reduce, it does so by throwing away flexibility
○ The best approach for parallelization (sharding) is fraught with operational
challenges (come see me for horror stories)
current mongodb support for analytics
![Page 18: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/18.jpg)
Overview of Precog for MongoDB
Derek Chen-Becker
Precog Lead Infrastructure Engineer
@dchenbecker
![Page 19: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/19.jpg)
■ Download file: http://www.precog.com/mongodb
■ Setup:
$ unzip precog.zip
$ cd precog
$ emacs -nw config.cfg (adjust ports, etc)
$ ./precog.sh
overview of precog for mongodb
![Page 20: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/20.jpg)
■ Precog for MongoDB allows you to perform sophisticated analytics utilizing
existing mongo instances
■ Self-contained JAR bundling:
○ The Precog Analytics service
○ Labcoat IDE for Quirrel
■ Does not include the full Precog stack
○ Minimal authentication handling (single api key in config)
○ No ingest service (just add data directly to mongo)
overview of precog for mongodb
![Page 21: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/21.jpg)
■ Some sample queries
-- histogram by country
data := //summer_games/athletes
solve 'country
{ country: 'country, count: count(data where data.Countryname = 'country) }
overview of precog for mongodb
![Page 22: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/22.jpg)
Analyzing Big Data with Quirrel
John A. De Goes
Precog CEO/Founder
@jdegoes
![Page 23: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/23.jpg)
Quirrel is a statistically-oriented query language designed for the analysis of large-scale, potentially heterogeneous data sets.
overview
![Page 24: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/24.jpg)
● Simple
● Set-oriented
● Statistically-oriented
● Purely declarative
● Implicitly parallel
quirrel
![Page 25: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/25.jpg)
pageViews := //pageViews
avg := mean(pageViews.duration)
bound := 1.5 * stdDev(pageViews.duration)
pageViews.userId where pageViews.duration > avg + bound
sneak peek
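For comparison, the same outlier query can be spelled out imperatively. The JavaScript below is a sketch, not a translation that Precog produces: the sample pageViews are invented, and stdDev is written here as the population standard deviation, which is an assumption about what Quirrel's stdDev computes.

```javascript
// Imperative sketch of the Quirrel query above: users whose page view
// duration exceeds mean + 1.5 * standard deviation.
// The sample data is made up; population stdDev is an assumption.
var pageViews = [
  { userId: "u1", duration: 100 },
  { userId: "u2", duration: 100 },
  { userId: "u3", duration: 100 },
  { userId: "u4", duration: 100 },
  { userId: "u5", duration: 600 }
];

function mean(xs) {
  var total = xs.reduce(function (a, b) { return a + b; }, 0);
  return total / xs.length;
}

function stdDev(xs) {
  var m = mean(xs);
  var squaredDiffs = xs.map(function (x) { return (x - m) * (x - m); });
  return Math.sqrt(mean(squaredDiffs));
}

var durations = pageViews.map(function (pv) { return pv.duration; });
var avg = mean(durations);             // 200
var bound = 1.5 * stdDev(durations);   // 1.5 * 200 = 300

var outliers = pageViews
  .filter(function (pv) { return pv.duration > avg + bound; })
  .map(function (pv) { return pv.userId; });

console.log(outliers); // only u5 is above the 500 threshold
```

The Quirrel version states the same filter in four lines with no explicit loops, which is the point of the "purely declarative" and "implicitly parallel" bullets.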
![Page 26: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/26.jpg)
1
true
[[1, 0, 0], [0, 1, 0], [0, 0, 1]]
"All work and no play makes jack a dull boy"
{"age": 23, "gender": "female", "interests": ["sports", "tennis"]}
quirrel speaks json
![Page 27: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/27.jpg)
-- Ignore me.
(- Ignore me, too -)
comments
![Page 28: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/28.jpg)
2 * 4
(1 + 2) * 3 / 9 > 23
3 > 2 & (1 != 2)
false & true | !false
basic expressions
![Page 29: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/29.jpg)
x := 2
square := x * x
named expressions
![Page 30: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/30.jpg)
//pageViews
load("/pageViews")
//campaigns/summer/2012
loading data
![Page 31: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/31.jpg)
pageViews := load("/pageViews")
pageViews.userId
pageViews.keywords[2]
drilldown
![Page 32: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/32.jpg)
count(//pageViews)
sum(//purchases.total)
stdDev(//purchases.total)
reductions
![Page 33: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/33.jpg)
pageViews := //pageViews
pageViews.userId where pageViews.duration > 1000
filtering
![Page 34: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/34.jpg)
clicks with {dow: dayOfWeek(clicks.time)}
augmentation
![Page 35: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/35.jpg)
import std::stats::rank
rank(//pageViews.duration)
standard library
![Page 36: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/36.jpg)
ctr(day) := count(clicks where clicks.day = day) / count(impressions where impressions.day = day)
ctrOnMonday := ctr(1)
ctrOnMonday
user-defined functions
![Page 37: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/37.jpg)
solve 'day {day: 'day, ctr: count(clicks where clicks.day = 'day) / count(impressions where impressions.day = 'day)}
grouping - implicit constraints
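One way to read the solve expression above is as a loop over each distinct day, computing one result object per day. The JavaScript below sketches that reading. It assumes the 'day variable ranges over days that appear in both clicks and impressions, which is an interpretation rather than a statement of Quirrel's exact semantics; the data and the helpers countWhere and ctrByDay are hypothetical.

```javascript
// Sketch of "solve 'day { day, ctr }" as an explicit loop in JavaScript.
// Assumption: 'day ranges over days present in BOTH sets. The data and
// helper names are invented for illustration.
var clicks = [{ day: 1 }, { day: 1 }, { day: 2 }];
var impressions = [
  { day: 1 }, { day: 1 }, { day: 1 }, { day: 1 },
  { day: 2 }, { day: 2 }, { day: 2 }, { day: 2 }
];

// count(set where set.day = day)
function countWhere(set, day) {
  return set.filter(function (row) { return row.day === day; }).length;
}

function ctrByDay(clickSet, impressionSet) {
  var days = [];
  clickSet.forEach(function (c) {
    var inImpressions = impressionSet.some(function (i) { return i.day === c.day; });
    if (inImpressions && days.indexOf(c.day) === -1) {
      days.push(c.day); // each qualifying day is solved for once
    }
  });
  return days.map(function (day) {
    return {
      day: day,
      ctr: countWhere(clickSet, day) / countWhere(impressionSet, day)
    };
  });
}

var ctrResults = ctrByDay(clicks, impressions);
console.log(ctrResults);
```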
![Page 38: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/38.jpg)
solve 'day = purchases.day {day: 'day, cummTotal: sum(purchases.total where purchases.day < 'day)}
grouping - explicit constraints
![Page 39: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/39.jpg)
http://quirrel-lang.org
questions?
![Page 40: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/40.jpg)
Now, it's your turn! Win some cool prizes!
Precog Challenge Problems
![Page 41: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/41.jpg)
■ Using the conversions data, find the state with the highest average income.
■ Variable names: conversions.customers.state and conversions.customers.income
precog challenge #1
![Page 42: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/42.jpg)
■ Use Labcoat to display a bar chart of the clicks per month.
■ Variable names: clicks.timestamp
precog challenge #2
![Page 43: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/43.jpg)
■ What product has the worst overall sales to women? To men?
■ Variable names: billing.product.ID, billing.product.price, billing.customer.gender
precog challenge #3
![Page 44: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/44.jpg)
conversions := //conversions
results := solve 'state
{state: 'state,
aveIncome: mean(conversions.customer.income where
conversions.customer.state = 'state)}
results where results.aveIncome = max(results.aveIncome)
precog challenge #1 possible solution
![Page 45: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/45.jpg)
clicks := //clicks
clicks' := clicks with {month: std::time::monthOfYear(clicks.timeStamp)}
solve 'month
{month: 'month, clicks: count(clicks'.product.price where clicks'.month = 'month)}
precog challenge #2 possible solution
![Page 46: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/46.jpg)
billing := //billing
results := solve 'product, 'gender
{product: 'product,
gender: 'gender,
sales: sum(billing.product.price where
billing.product.ID = 'product &
billing.customer.gender = 'gender)}
worstSalesToWomen := results where results.gender = "female" &
results.sales = min(results.sales where results.gender = "female")
worstSalesToMen := results where results.gender = "male" &
results.sales = min(results.sales where results.gender = "male")
worstSalesToWomen union worstSalesToMen
precog challenge #3 possible solution
![Page 47: Precog & MongoDB User Group: Skyrocket Your Analytics](https://reader033.vdocuments.mx/reader033/viewer/2022050905/548754465906b5dd0c8b458b/html5/thumbnails/47.jpg)
Thank you!
Follow us on Twitter
@precogio
@jdegoes
@dchenbecker
Download Precog for MongoDB for FREE: www.precog.com/mongodb
Try Precog for free and get a free account: www.precog.com
Subscribe to our monthly newsletter: www.precog.com/about/newsletter