Big Data Paris
DESCRIPTION
A talk I gave during the vendor pitch section at Big Data Paris.
TRANSCRIPT
1©MapR Technologies - Confidential
Expect More from Hadoop
Introducing MapR
MapR offers the technology-leading distribution for Hadoop
Industry Leaders Choose MapR in the Cloud
Google chose MapR to provide Hadoop on Google Compute Engine
Amazon EMR is the largest Hadoop provider by revenue and number of clusters
MapR Supports a Broad Set of Use Cases
– Log analysis
– HBase
– Customer targeting
– Social media analysis
– Customer revenue analytics
– ETL offload
– Advertising exchange analysis and optimization
– Clickstream analysis
– Quality profiling/field failure analysis
– Customer sentiment
– Network analytics
– Monitoring and measuring the behavior of online shoppers
– Fraud detection
– Channel analytics
– Customer behavior analysis
– Brand monitoring
– Viewer behavioral analytics
– Family tree connections
– Intrusion detection & prevention
– Forensic analysis
– Global threat analytics
– Virus analysis
– Patient care monitoring
– Recommendation engine (leading retailer)
– Fraud detection and prevention (leading bank)
Introducing Hadoop
Hadoop is deployed because of
a) big data
b) fast data
c) rapidly changing data
Introducing Change
Changing data implies a need for integration
If you copy, the data will change before you finish.
Controlling Change
Changing data implies a need for stabilization
Long-running analyses must have stable data
The Story Can Now Be Told
Here are three true stories about how Hadoop integration pays off
Story #1: ETL Offload
The Problem
Major telecom vendor
Key step in the billing pipeline handled by the enterprise data warehouse (EDW)
EDW at maximum capacity
Multiple rounds of software optimization already done
Revenue-limiting (= career-limiting) bottleneck
[Diagram: Original Flow. CDR billing records, ETL, Data Warehouse, billing reports, customer bills.]
ETL: 70% of total load, <10% of total code
Import by bulk load from NFS
[Diagram: With ETL Offload. CDR billing records, ETL, Data Warehouse, billing reports, customer billing.]
Import written to MapR via NFS
Bulk load via NFS from MapR
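The cluster is reachable over NFS, so the import step in the diagram can use plain file I/O with no Hadoop client library. Below is a minimal sketch. The mount path and the CSV batch format are hypothetical, since the talk gives neither. The write-then-rename step is an added safeguard so that a bulk loader polling the directory never reads a half-written file.

```python
import csv
import os
from pathlib import Path

# Hypothetical NFS mount point for the cluster; the real path depends on
# how the cluster is mounted on the ingest host.
DEFAULT_INCOMING = Path("/mapr/my.cluster.com/billing/cdr_incoming")


def write_cdr_batch(records, out_dir=DEFAULT_INCOMING, name="batch-0001.csv"):
    """Write one batch of CDR rows as CSV using ordinary file I/O.

    Writes to a temporary name first, then renames it, so a downstream
    loader watching the directory only ever sees complete files.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tmp_path = out_dir / (name + ".tmp")
    final_path = out_dir / name
    with open(tmp_path, "w", newline="") as f:
        csv.writer(f).writerows(records)
    os.replace(tmp_path, final_path)  # rename within the same directory
    return final_path
```

For a local trial, pass any writable directory as `out_dir`. The same code runs unchanged against an NFS-mounted cluster path.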
Simplified Analysis – EDW Strategy
70% of EDW capacity consumed by ETL processing
EDW direct hardware cost is approximately $30 million CAPEX and $12 million OPEX
Additional EDW only increases capacity by 50% due to poor division of labor
Simplified Analysis – MapR Strategy
Hardware + MapR cost ~ $1.5 million
ETL replacement development costs ~ $1.5 million
Result is 3x performance increase
Price/Performance
EDW strategy
– 1.5x performance
– $30 million
MapR strategy
– 3x performance
– $3 million
20x cost/performance advantage for MapR strategy
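The 20x figure follows from the slide's own numbers: divide each strategy's performance multiple by its cost. Here is a quick check using only the figures above. The MapR cost is $1.5 million for hardware and MapR plus $1.5 million for ETL replacement development. OPEX is left out, as it is on the slide.

```python
# Figures taken from the slides; costs in millions of USD, performance
# relative to the current EDW (1.0x). OPEX is left out, as on the slide.
EDW_COST_MUSD = 30.0
EDW_SPEEDUP = 1.5            # additional EDW only adds 50% capacity
MAPR_COST_MUSD = 1.5 + 1.5   # hardware + MapR, plus ETL replacement development
MAPR_SPEEDUP = 3.0


def perf_per_musd(speedup: float, cost_musd: float) -> float:
    """Performance multiple bought per million dollars spent."""
    return speedup / cost_musd


edw = perf_per_musd(EDW_SPEEDUP, EDW_COST_MUSD)     # 0.05x per $1M
mapr = perf_per_musd(MAPR_SPEEDUP, MAPR_COST_MUSD)  # 1.0x per $1M
advantage = mapr / edw

print(f"MapR strategy advantage: {advantage:.0f}x")  # MapR strategy advantage: 20x
```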
Story #2: Search Abuse
The Problem
Build a high-performance recommendation engine
– Use all kinds of available data
Deploy it to production
– Must have efficient deployment
Input Data
User transactions
– user id, merchant id
– SIC code, amount
Offer transactions
– user id, offer id
– vendor id, merchant ids
– offers, views, accepts
Import data via standard interfaces from log files, databases, and direct feeds
Find anomalous indicators of behavior
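The slides don't say which test flags an indicator as anomalous. One widely used choice, available in Mahout's cooccurrence-based item similarity, is the log-likelihood ratio (G²) computed on a 2x2 table of counts. Below is a minimal sketch. It assumes the per-pair counts (how many users did both A and B, only A, only B, or neither) have already been collected.

```python
import math


def x_log_x(x: int) -> float:
    # Convention: 0 * log(0) == 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    # N times the Shannon entropy (in nats) of the counts: N log N - sum(k log k)
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 log-likelihood ratio for a 2x2 cooccurrence table.

    k11: users who did both A and B
    k12: users who did A but not B
    k21: users who did B but not A
    k22: users who did neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))


# Two merchants always visited together vs. visited independently
print(round(llr(100, 0, 0, 100), 2))  # 277.26
print(llr(10, 10, 10, 10))            # close to 0: no association
```

A high score marks B as an indicator for A. In practice, only the highest-scoring pairs are kept as indicators.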
Search-based Recommendations
Sample “document”
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant ids
– Indicator industry (SIC) ids
– Indicator offers
– Indicator text
– Local top 40
User History (query)
– Current location
– Recent merchant descriptions
– Recent merchant ids
– Recent SIC codes
– Recent accepted offers
– Local top 40
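Each merchant document carries indicator fields, and the user's recent history becomes the query, so the search engine's ranking produces the recommendations. Below is a toy, in-memory sketch of that matching. The field names are hypothetical, and plain overlap counting stands in for Solr's actual relevance scoring.

```python
from dataclasses import dataclass, field


@dataclass
class MerchantDoc:
    """One merchant as a search 'document' (field names are illustrative)."""
    merchant_id: str
    # Indicator fields, filled offline from cooccurrence analysis
    indicator_merchant_ids: set = field(default_factory=set)
    indicator_sic_codes: set = field(default_factory=set)


@dataclass
class UserHistory:
    """The user's recent behavior, used as the query."""
    recent_merchant_ids: set
    recent_sic_codes: set


def score(doc: MerchantDoc, history: UserHistory) -> int:
    # Toy relevance: count query terms that hit each indicator field.
    # A real search engine would weight hits (e.g. TF-IDF or BM25).
    return (len(doc.indicator_merchant_ids & history.recent_merchant_ids)
            + len(doc.indicator_sic_codes & history.recent_sic_codes))


def recommend(docs, history, k=3):
    """Return up to k merchant ids with at least one indicator hit, best first."""
    hits = [(score(d, history), d.merchant_id) for d in docs]
    hits = [(s, m) for s, m in hits if s > 0]
    hits.sort(key=lambda sm: sm[0], reverse=True)  # stable: ties keep input order
    return [m for _, m in hits[:k]]


docs = [
    MerchantDoc("m1", {"m7", "m9"}, {"5812"}),
    MerchantDoc("m2", {"m3"}),
    MerchantDoc("m3", set(), {"5411"}),
]
history = UserHistory(recent_merchant_ids={"m9", "m3"}, recent_sic_codes={"5812"})
print(recommend(docs, history))  # ['m1', 'm2']
```

The heavy lifting (finding indicators) happens offline. Serving a recommendation is then just one search query.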
[Diagram: Solr indexing. Inputs: transactions, web views, email offers. Components: cooccurrence analysis (Mahout), item metadata, Solr indexers, index shards.]
Legacy code runs directly in the MapReduce framework
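The cooccurrence step in the indexing diagram is a typical MapReduce workload. Below is a minimal in-memory simulation of its two stages: group each user's items, then count item pairs. It is for illustration only and is not Mahout's implementation. The `(user_id, item_id)` event format is an assumption.

```python
from collections import Counter, defaultdict
from itertools import combinations


def group_by_user(events):
    """Stage 1 shuffle: collect each user's distinct items."""
    by_user = defaultdict(set)
    for user_id, item_id in events:
        by_user[user_id].add(item_id)
    return by_user


def emit_pairs(items):
    """Stage 1 reduce / stage 2 map: one ((a, b), 1) record per item pair."""
    for a, b in combinations(sorted(items), 2):
        yield (a, b), 1


def sum_counts(records):
    """Stage 2 reduce: add up the 1s for each item pair."""
    counts = Counter()
    for pair, n in records:
        counts[pair] += n
    return counts


def cooccurrence(events):
    records = (rec for items in group_by_user(events).values()
               for rec in emit_pairs(items))
    return sum_counts(records)


events = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u2", "c")]
print(cooccurrence(events))
```

These raw pair counts are what a significance test then filters down to the indicators stored in the search index.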
[Diagram: Solr search. Components: web tier, Solr search, index shards, item metadata, user history.]
SolrCloud runs without change via NFS
Objective Results
At a very large credit card company
History includes all transactions and all web interactions
Processing time cut from 20 hours per day to 3 hours
Recommendation engine load time decreased from 8 hours to 3 minutes
Story #3: Stable Learning
The Theme and Setting
A humble machine learning expert once lived in a small cubicle
One day the CEO walked in and said
– Your machine recommended PINK WAFFLES to my wife!!!
– Tell me why it is suddenly doing this
The machine learning expert could say nothing because he could not reproduce the conditions under which the model was trained
The CEO was not pleased
Why?
[Diagram: conventional data chain. Components: web-site, data logger, Kafka API, Kafka clusters, Storm, Flume, Hadoop, HDFS data, web service, NAS, web data.]
Data arrives continuously
It can be delayed arbitrarily
Learning steps can’t be tied to delayed data
The Essence of the Problem
Coupling data arrival with modeling makes the data chain brittle
– Minor delays in data delivery will break modeling SLAs
But if data can arrive late and restate the past, then we can’t easily replicate a model build
Existing data chains don’t support full bitemporal queries
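A bitemporal record keeps two timestamps: when an event happened and when the data chain learned about it. Filtering on both lets you rebuild exactly what a past training run saw, even after late or restated data has arrived. Below is a minimal sketch with a hypothetical record layout and integer timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    value: float
    event_time: int    # valid time: when the event happened
    arrival_time: int  # transaction time: when the data chain recorded it


def as_of(records, known_at, start, end):
    """Records with event_time in [start, end) as they were known at `known_at`.

    Ignores anything that arrived after `known_at`; when a (key, event_time)
    pair was restated, keeps the latest version that had arrived by then.
    """
    latest = {}
    for r in records:
        if r.arrival_time <= known_at and start <= r.event_time < end:
            k = (r.key, r.event_time)
            if k not in latest or r.arrival_time > latest[k].arrival_time:
                latest[k] = r
    return sorted(latest.values(), key=lambda r: (r.event_time, r.key))


records = [
    Record("u1", 1.0, event_time=10, arrival_time=11),
    Record("u2", 2.0, event_time=12, arrival_time=30),  # arrives late
    Record("u1", 5.0, event_time=10, arrival_time=40),  # restates u1 at t=10
]

# What a model trained at t=20 on events in [0, 20) could have seen:
print(as_of(records, known_at=20, start=0, end=20))
# The same event window, as known at t=50:
print(as_of(records, known_at=50, start=0, end=20))
```

The first query returns only the original `u1` value. The second returns the restated `u1` value plus the late `u2` record, so the two model builds can each be reproduced.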
[Diagram: MapR-based data chain. Components: web-site, data logger, MapR, data snapshots, modeling, a series of models, mirror, live system.]
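Reading model inputs from a snapshot freezes the data. Replaying a run also requires knowing which snapshot and which code produced each model. Below is a minimal sketch of a manifest stored next to each model. The snapshot directory, version string, and manifest fields are hypothetical. The content hash is an extra check and is not something the slides describe.

```python
import hashlib
import json
from pathlib import Path


def fingerprint_dir(root) -> str:
    """SHA-256 over every file's relative path and contents, in sorted order."""
    root = Path(root)
    h = hashlib.sha256()
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # The path is NUL-terminated and the per-file digest is fixed-length,
            # so the byte stream fed to the hash is unambiguous.
            h.update(path.relative_to(root).as_posix().encode("utf-8") + b"\0")
            h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()


def training_manifest(snapshot_dir, code_version: str, params: dict) -> str:
    """JSON record to store next to a model so its training run can be replayed."""
    manifest = {
        "snapshot_dir": str(snapshot_dir),  # frozen input data
        "data_sha256": fingerprint_dir(snapshot_dir),
        "code_version": code_version,       # e.g. a version-control commit id
        "params": params,
    }
    return json.dumps(manifest, indent=2, sort_keys=True)
```

To replay a model, check out `code_version`, confirm that the snapshot still hashes to `data_sha256`, and rerun training with `params`.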
The New Story
A humble machine learning expert once lived in a small cubicle
One day the CEO walked in and said
– Your machine recommended PINK WAFFLES to my wife!!!
– Tell me why it is suddenly doing this
The machine learning expert could
– Pull out all previously deployed models
– Exactly replicate any training run with any version of software
– Point out that PINK WAFFLES were actually quite stylish
The CEO was very pleased … he ran off to buy pink waffles
Expect More from Hadoop
Expect MapR
Contact me!
[email protected] or [email protected]
@ted_dunning
Come to the MapR booth