building a marketing data lake
TRANSCRIPT
How Big Data ISVs get marketing data into lakes
Sumit Sarkar
Chief Data Evangelist
Progress DataDirect
Gary Angel
Advisory Digital Analytics Center of Excellence Principal
EY
© 2015 Progress Software Corporation. All rights reserved.
Agenda
What is a Marketing Data Lake?
Industry trends around accessing marketing data in SaaS applications
How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications
How big data vendors can embed SaaS connectivity
What is a Marketing Data Lake?
A data lake is a large-scale storage repository and processing engine. A data lake provides “massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs”
- SAS Institute
Benefits of a Marketing Data Lake
Some of the benefits of a data lake include:
Store data in all shapes and sizes
Flexible analytics with “schema on read”
Query data using SQL or big data programming frameworks
Eliminate data silos
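To make “schema on read” concrete, here is a minimal sketch in Python. It assumes raw events are stored as JSON lines with no schema enforced at write time. The sample records, the field names, and the `read_with_schema` helper are all illustrative assumptions, not part of any product named in this deck.

```python
import json

# Raw events as they might land in the lake: heterogeneous JSON lines,
# stored as-is (the sample data below is made up for illustration).
RAW_LINES = [
    '{"visitor": "a1", "page": "/pricing", "ts": "2015-06-01T10:00:00"}',
    '{"visitor": "b2", "amount": "19.99", "ts": "2015-06-01T10:05:00"}',
    '{"visitor": "a1", "page": "/signup", "referrer": "search"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: pick the requested fields, cast them,
    and use None for fields that a record does not carry."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two different views over the same stored data, each with its own schema.
page_views = read_with_schema(RAW_LINES, {"visitor": str, "page": str})
purchases = read_with_schema(RAW_LINES, {"visitor": str, "amount": float})
```

The stored lines never change; only the schema passed in at read time decides which fields exist and what types they have.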
Why Marketing Data?
CMOs will outspend CIOs on technology by 2017 (Gartner)
Oracle spent $3B on a martech acquisition spree to gain CMO mindshare.
Expect more collaboration between CMO and CIO (CIO.com)
Modern Marketing Data Warehouse Webinar ~500 registrations (Progress)
Industry trends around accessing marketing data in SaaS applications
It’s easy to forget that it’s still about solving real business problems.
[Diagram: a continuous feedback loop through Manage Data, Perform Analytics, Drive Decisions. Other labels: Appropriate data sources; Relevant data; Transaction / behavior history; Insights; Answers to business questions.]
Strategy (Thinking) Moves Right to Left
Implementation Moves Left to Right
Before you think data, think decisions!
Our marketing data is almost all in the cloud
CRM
Web Behavior
Mobile Behavior
Search Buys
Display Buys
Owned Social
Public Social
Meta-Data
And it’s almost all complex, stream data – which means APIs that only give aggregations aren’t too useful
Detail is important because this digital data is true big data
The relationship between events is critical
We’re almost never solving for one problem with a big data system
Reporting and Analytics
Data levels: Summarized Data, Segmented Data, Detail Data
Use cases: Dashboarding; Campaign Optimization; Customer Drill-down; Attribution, CLTV, Experience, Personalization; Targeting; Forecasting
We can’t just aggregate / We can’t not aggregate
Segmentation is one important technique to aggregate and join
Segmentation allows for effective aggregation of the meaning and outcome of streamed event data:
Measurement Foundation
Customer segmentation: Customers v. prospects; Owned products; Persona
Visit type identification: Product focused; Shopping focused; Social focused; Customer service
Measurement of success specific to each segment and visit
RFM models: Recency and frequency for every segment and visit type
KPDs and metrics: Additional metrics that help identify drivers of success
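As a rough illustration of “recency and frequency for every segment and visit type”, the sketch below rolls visit-level records up to one row per (segment, visit type) pair. The sample visits, the tuple layout, and the `recency_frequency` helper are assumptions made for this example. A real RFM model would typically add a monetary dimension and work per customer.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit-level records:
# (customer_id, segment, visit_type, visit_date)
VISITS = [
    ("c1", "customer", "product focused", date(2015, 5, 1)),
    ("c1", "customer", "product focused", date(2015, 5, 20)),
    ("c2", "prospect", "shopping focused", date(2015, 4, 10)),
    ("c3", "customer", "customer service", date(2015, 5, 25)),
    ("c2", "prospect", "shopping focused", date(2015, 5, 28)),
]


def recency_frequency(visits, as_of):
    """Return {(segment, visit_type): (days_since_last_visit, visit_count)}."""
    last_seen = {}
    counts = defaultdict(int)
    for _customer, segment, visit_type, day in visits:
        key = (segment, visit_type)
        counts[key] += 1
        if key not in last_seen or day > last_seen[key]:
            last_seen[key] = day
    return {key: ((as_of - last_seen[key]).days, counts[key]) for key in counts}


summary = recency_frequency(VISITS, as_of=date(2015, 6, 1))
```

The detail rows stay available for drill-down, while `summary` is small enough to feed a report directly.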
End-to-End Strategies
1. Report, Cube, Park, Full Detail
2. Report, Park, Full Detail
3. Report, Semi-Detail, Park, Full Detail
Most organizations do some combination of at least 1 & 2
Direct to Detail (2) has many advantages if it can be made performant (more flexible reporting and much less maintenance)
Semi-Detail (3) is designed to capture most of the advantages of (2) when (2) isn’t performant
How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications
What is Apache Sqoop?
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Sqoop successfully graduated from the Incubator in March of 2012 and is now a Top-Level Apache project
http://sqoop.apache.org/
What is Apache Falcon?
Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters.
https://falcon.apache.org/
Note: Falcon uses Sqoop for import/export operations. Sqoop requires the appropriate database driver to connect to the relational database. Please refer to the Sqoop documentation for any Sqoop-related questions. Please make sure the database driver jar is copied into the Oozie share lib for Sqoop.
Data in SaaS Applications is Siloed, Protected by Proprietary APIs Designed for Process Integration, not Data Integration
How to ingest data directly from SaaS applications
JDBC access to SaaS data
Progress DataDirect JDBC Connector
Schema Manager
Apache Sqoop
Salesforce.com Schema
User Defined Schema
Driver uses SOAP API, Bulk API, Metadata API
Geek Speak
$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>            Specify JDBC connect string
   --connect-manager <jdbc-uri>    Specify connection manager class to use
   --driver <class-name>           Manually specify JDBC driver class to use
   --hadoop-mapred-home <dir>+     Override $HADOOP_MAPRED_HOME
   --help                          Print usage instructions
   -P                              Read password from console
   --password <password>           Set authentication password
   --username <username>           Set authentication username
   --verbose                       Print more information while working
   --hadoop-home <dir>+            Deprecated. Override $HADOOP_HOME
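The arguments listed above can be combined into an import call against a JDBC driver for a SaaS source. The Python sketch below only assembles the command line; it does not run Sqoop. The JDBC URL, the driver class, the user name, the table, and the target directory are all placeholders to be replaced with values from the driver's own documentation. `--table`, `--target-dir`, and `-m` are standard Sqoop import arguments that do not appear in the excerpt above.

```python
import shlex


def build_sqoop_import(connect, driver, username, table, target_dir, mappers=1):
    """Build the argument list for a `sqoop import` run.

    -P makes Sqoop prompt for the password on the console, so the
    password never appears on the command line.
    """
    return [
        "sqoop", "import",
        "--connect", connect,
        "--driver", driver,
        "--username", username,
        "-P",
        "--table", table,
        "--target-dir", target_dir,
        "-m", str(mappers),
    ]


args = build_sqoop_import(
    connect="jdbc:example:saas://login.example.com",  # placeholder URL
    driver="com.example.jdbc.SaasDriver",             # placeholder class name
    username="analyst@example.com",                   # placeholder user
    table="ACCOUNT",                                  # placeholder table
    target_dir="/data/lake/raw/account",              # placeholder HDFS path
)

# Shell-safe rendering, e.g. for logging or for pasting into a terminal.
command = " ".join(shlex.quote(a) for a in args)
```

On a host with Sqoop installed and the driver jar on its classpath, `args` could be handed to `subprocess.run`. Building it as a list rather than a string avoids quoting problems with URLs that contain semicolons.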
Why are ISVs turning to a single interface for SaaS?
Get JDBC interface on top of any API
Data Source             API
Eloqua                  Web Services API (REST/SOAP); Bulk and non-Bulk APIs; No query language
Oracle Service Cloud    Web Services APIs (REST/SOAP); ROQL
Google Analytics        Hypercube (query limits of 10 metrics grouped by max of 7 dimensions)
Veeva CRM               SOAP, BULK, Metadata APIs; SOQL
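The Google Analytics row hints at the kind of work each proprietary API pushes onto the integrator. As a minimal sketch, assuming the limits quoted above (10 metrics and 7 dimensions per query), a wide request has to be split into several queries that share the same dimensions. The `plan_queries` helper and the metric and dimension names are hypothetical.

```python
MAX_METRICS = 10     # per-query metric limit quoted on the slide
MAX_DIMENSIONS = 7   # per-query dimension limit quoted on the slide


def plan_queries(metrics, dimensions):
    """Split a wide request into queries that each respect the limits.

    Every query repeats the same dimensions so that the partial
    results can later be joined on them.
    """
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError("too many dimensions for a single query")
    return [
        {"dimensions": list(dimensions), "metrics": metrics[i:i + MAX_METRICS]}
        for i in range(0, len(metrics), MAX_METRICS)
    ]


metrics = ["metric_%02d" % n for n in range(1, 24)]  # 23 hypothetical metrics
queries = plan_queries(metrics, ["date", "channel"])
```

With 23 metrics this yields three queries of 10, 10, and 3 metrics. A single SQL interface is meant to hide this kind of per-source planning from the application.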
As the Market Switches from ETL to ELT, Data Access is critical
ETL: Operational Systems, Staging Area, Data Warehouse, Analytics Apps (steps: Extract, Transform, Load, View)
ELT: Operational Systems, Big Data Warehouse (steps: Extract & Load, Transform & View)
Analytics, Data Prep, and even traditional DW
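The ELT pattern can be sketched with an in-memory SQLite database standing in for the big data warehouse. This is an illustration under stated assumptions only: the table, the view, and the sample rows are invented, and a real lake would use Hive, Spark SQL, or similar rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the big data warehouse

# Extract & Load: land the source rows untouched, everything as text.
conn.execute("CREATE TABLE raw_leads (email TEXT, created TEXT, score TEXT)")
conn.executemany(
    "INSERT INTO raw_leads VALUES (?, ?, ?)",
    [
        ("Ann@Example.com ", "2015-05-01", "42"),
        ("bob@example.com", "2015-05-03", ""),
        ("ann@example.com", "2015-05-09", "57"),
    ],
)

# Transform & View: cleaning and typing happen inside the warehouse,
# at query time, instead of in a staging area before the load.
conn.execute(
    """
    CREATE VIEW leads AS
    SELECT lower(trim(email)) AS email,
           created,
           CAST(NULLIF(score, '') AS INTEGER) AS score
    FROM raw_leads
    """
)

best_scores = conn.execute(
    "SELECT email, MAX(score) FROM leads GROUP BY email ORDER BY email"
).fetchall()
```

Because the raw table is kept as loaded, the transformation in the view can be changed later without extracting the data from the source again.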
How big data vendors are embedding SaaS connectivity
Progress DataDirect
Embed Sales & Marketing Connectors into the Data Access Layer
Ingest data across 200+ data sources (beyond marketing data sources)
Big Data/NoSQL: Apache Hadoop Hive, Cloudera, Hortonworks, Pivotal HD, MapR, EMR, Pivotal HAWQ, Cloudera Impala, MongoDB, Spark SQL, Cassandra, SAP HANA
Data Warehouses: Amazon Redshift, SAP Sybase IQ, Teradata, Pivotal Greenplum
Relational: Oracle DB, Microsoft SQL Server, IBM DB2, MySQL, PostgreSQL, IBM Informix, SAP Sybase, Pervasive SQL, Progress OpenEdge, Progress Rollbase
SaaS/Cloud: Salesforce.com, Database.com, FinancialForce, Veeva CRM, ServiceMAX, Any Force.com App, Hubspot, Marketo, Microsoft Dynamics CRM, Microsoft SQL Azure, Oracle Eloqua, Oracle Service Cloud, Google Analytics
EDI/XML/Text: EDIFACT, EDIG@S, EANCOM, X12, IATA, Healthcare EDI (X12, HIPAA, ICD-10, HL7), Custom EDI, Flat files (CSV, TSV, dBase, Clipper, Foxpro, Paradox), Text Files
Any SDK
SequeLink Socket Server
Customer Engineering
Single API for data lake ingestion from SaaS sources
Ingest data against a single API (JDBC)
Get a single dedicated partner
Connect to unlimited data with a single API
Get unlimited support