building a marketing data lake

How Big Data ISVs get marketing data into lakes. Sumit Sarkar, Chief Data Evangelist, Progress DataDirect; Gary Angel, Advisory Digital Analytics Center of Excellence Principal, EY

Upload: sumit-sarkar

Post on 08-Jan-2017

Category: Technology


TRANSCRIPT

Page 1: Building a marketing data lake

How Big Data ISVs get marketing data into lakes

Sumit Sarkar

Chief Data Evangelist

Progress DataDirect

Gary Angel

Advisory Digital Analytics Center of Excellence Principal

EY

Page 2: Building a marketing data lake

© 2015 Progress Software Corporation. All rights reserved.

Audio Bridge Options & Question Submission

Page 4: Building a marketing data lake

Agenda

What is a Marketing Data Lake?

Industry trends around accessing marketing data in SaaS applications

How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications

How big data vendors can embed SaaS connectivity

Page 5: Building a marketing data lake

What is a Marketing Data Lake?

Page 6: Building a marketing data lake

What is a Marketing Data Lake?

A data lake is a large-scale storage repository and processing engine. A data lake provides “massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs”

- SAS Institute

Page 7: Building a marketing data lake

Benefits of a Marketing Data Lake

Some of the benefits of a data lake include:

Store data in all shapes and sizes

Flexible analytics with “schema on read”

Query data using SQL or big data programming frameworks

Eliminate data silos
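
To make “schema on read” concrete, here is a minimal Python sketch that is not taken from the webinar. Raw events are kept exactly as they arrived (a string of JSON lines stands in for files in the lake), and each consumer applies its own schema only when it reads. The event fields (`user`, `type`, `amount`, `page`) and the helper `read_with_schema` are hypothetical.

```python
import json

# Raw events stored as-is; records need not share the same fields.
RAW_EVENTS = """\
{"user": "u1", "type": "click", "page": "/pricing"}
{"user": "u2", "type": "purchase", "amount": "19.99"}
{"user": "u1", "type": "purchase", "amount": "5.00", "coupon": "NEW"}
"""

def read_with_schema(raw, schema):
    """Project each raw record onto `schema` (field -> converter) at read time.

    Fields missing from a record come back as None; extra fields are ignored.
    """
    for line in raw.splitlines():
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        }

# Two consumers read the same raw data with different schemas.
purchase_schema = {"user": str, "amount": float}
clickstream_schema = {"user": str, "type": str, "page": str}

purchases = [r for r in read_with_schema(RAW_EVENTS, purchase_schema)
             if r["amount"] is not None]
pages = [r["page"] for r in read_with_schema(RAW_EVENTS, clickstream_schema)]

print(purchases)
print(pages)
```

Nothing is rejected at load time here. A record that lacks a field simply yields `None` for the consumers that ask for it.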

Page 8: Building a marketing data lake

Why Marketing Data?

CMOs will outspend CIOs on technology by 2017 (Gartner)

Oracle spent $3B on a martech acquisition spree to gain CMO mindshare.

Expect more collaboration between CMO and CIO (CIO.com)

Modern Marketing Data Warehouse Webinar ~500 registrations (Progress)

Page 9: Building a marketing data lake

Industry trends around accessing marketing data in SaaS applications

Page 10: Building a marketing data lake

It’s easy to forget that it’s still about solving real business problems.

[Diagram: Manage Data, Perform Analytics, Drive Decisions, joined by a continuous feedback loop. Labels: Relevant data; Transaction / behavior history; Appropriate data sources; Insights; Answers to business questions]

Strategy (Thinking) Moves Right to Left

Implementation Moves Left to Right

Before you think data, think decisions!

Page 11: Building a marketing data lake

Our marketing data is almost all in the cloud

CRM

Web Behavior

Mobile Behavior

Search Buys

Display Buys

Owned Social

Public Social

Meta-Data

And it’s almost all complex, stream data – which means APIs that only give aggregations aren’t too useful

Page 12: Building a marketing data lake

Detail is important because this digital data is true big data

The relationship between events is critical
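
A small Python sketch of why event-level detail matters, offered as an illustration rather than anything shown in the webinar. The two hypothetical visitors below produce identical aggregate counts, so an aggregate-only API could not tell them apart, yet the order of their events differs. The event names and the `viewed_before_purchase` helper are assumptions.

```python
from collections import Counter

# Hypothetical event stream: (visitor, event) pairs in time order.
EVENTS = [
    ("a", "search"), ("a", "view_product"), ("a", "purchase"),
    ("b", "purchase"), ("b", "search"), ("b", "view_product"),
]

def events_for(visitor):
    """Return one visitor's events, preserving stream order."""
    return [event for who, event in EVENTS if who == visitor]

def viewed_before_purchase(sequence):
    """True if a product view occurs earlier in the sequence than a purchase."""
    if "view_product" not in sequence or "purchase" not in sequence:
        return False
    return sequence.index("view_product") < sequence.index("purchase")

# Aggregated counts are identical for both visitors...
counts_a = Counter(events_for("a"))
counts_b = Counter(events_for("b"))
print(counts_a == counts_b)

# ...but the relationship between their events is not.
print(viewed_before_purchase(events_for("a")))
print(viewed_before_purchase(events_for("b")))
```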

Page 13: Building a marketing data lake

We’re almost never solving for one problem with a big data system

Reporting

Analytics

Summarized Data

Segmented Data

Detail Data

We can’t just aggregate / We can’t not aggregate

Dashboarding

Campaign Optimization

Customer Drill-down

Attribution, CLTV, Experience, Personalization

Targeting

Forecasting

Page 14: Building a marketing data lake

Segmentation is one important technique to aggregate and join

Measurement Foundation

Customer segmentation

Visit type identification

RFM models

KPDs and metrics

Customers v. prospects

Owned products

Persona

Product focused

Shopping focused

Social focused

Customer service

Measurement of success specific to each segment and visit

Recency and frequency for every segment and visit type

Additional metrics that help identify drivers of success

Segmentation allows for effective aggregation of the meaning and outcome of streamed event data:

Measurement foundation
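
As a rough illustration of “recency and frequency for every segment and visit type”, the Python sketch below rolls hypothetical visit records up by (segment, visit type). The record layout, the segment and visit-type names, and the `recency_frequency` helper are assumptions. It covers only the recency and frequency parts of an RFM model.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit records: (customer, segment, visit_type, visit_date).
VISITS = [
    ("c1", "customer", "shopping", date(2015, 6, 1)),
    ("c1", "customer", "shopping", date(2015, 6, 20)),
    ("c2", "prospect", "product", date(2015, 5, 15)),
    ("c3", "customer", "customer_service", date(2015, 6, 25)),
]

def recency_frequency(visits, as_of):
    """Aggregate visits by (segment, visit_type).

    Returns {(segment, visit_type): {"frequency": n, "recency_days": d}},
    where recency is the number of days between the latest visit and `as_of`.
    """
    grouped = defaultdict(list)
    for _customer, segment, visit_type, visit_date in visits:
        grouped[(segment, visit_type)].append(visit_date)
    return {
        key: {"frequency": len(dates),
              "recency_days": (as_of - max(dates)).days}
        for key, dates in grouped.items()
    }

summary = recency_frequency(VISITS, as_of=date(2015, 6, 30))
for key, stats in sorted(summary.items()):
    print(key, stats)
```

The detail rows stay available for drill-down, while the summary gives each segment and visit type a compact, joinable set of metrics.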

Page 15: Building a marketing data lake

End-to-End Strategies

1. Report | Cube | Park | Full Detail

2. Report | Park | Full Detail

3. Report | Semi-Detail | Park | Full Detail

Most organizations do some combination of at least 1 & 2

Direct to Detail (2) has many advantages if it can be made performant (more flexible reporting and much less maintenance)

Semi-Detail (3) is designed to capture most of the advantages of (2) when (2) isn’t performant

Page 16: Building a marketing data lake

How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications

Page 17: Building a marketing data lake

What is Apache Sqoop?

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

Sqoop successfully graduated from the Incubator in March of 2012 and is now a Top-Level Apache project

http://sqoop.apache.org/

Page 18: Building a marketing data lake

What is Apache Falcon?

Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters.

https://falcon.apache.org/

Note: Falcon uses Sqoop for import/export operations. Sqoop requires the appropriate database driver to connect to the relational database. Refer to the Sqoop documentation for any Sqoop-related questions. Make sure the database driver JAR is copied into the Oozie share lib for Sqoop.

Page 19: Building a marketing data lake

Data in SaaS Applications is Siloed, Protected by Proprietary APIs Designed for Process Integration, not Data Integration

Page 20: Building a marketing data lake

How to ingest data directly from SaaS applications

Page 21: Building a marketing data lake

JDBC access to SaaS data

[Diagram: Apache Sqoop connects through the Progress DataDirect JDBC Connector. Labels: Schema Manager; Salesforce.com Schema; User Defined Schema. Driver uses: SOAP API, Bulk API, Metadata API]

Page 22: Building a marketing data lake

Geek Speak

$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                 Specify JDBC connect string
   --connection-manager <class-name>    Specify connection manager class to use
   --driver <class-name>                Manually specify JDBC driver class to use
   --hadoop-mapred-home <dir>+          Override $HADOOP_MAPRED_HOME
   --help                               Print usage instructions
   -P                                   Read password from console
   --password <password>                Set authentication password
   --username <username>                Set authentication username
   --verbose                            Print more information while working
   --hadoop-home <dir>+                 Deprecated. Override $HADOOP_HOME
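
The arguments above can be combined into an import command. The Python sketch below only assembles and prints the command line, so it runs without a Sqoop installation. `--table` and `--target-dir` are standard `sqoop import` options that are not part of the help excerpt. The connect string, driver class name, user name, table, and HDFS path are assumptions for illustration only, so check your driver's documentation for the real values. `build_sqoop_import` is a hypothetical helper.

```python
import shlex

def build_sqoop_import(connect, driver, username, table, target_dir):
    """Return the argv list for a `sqoop import` of one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", connect,        # JDBC connect string
        "--driver", driver,          # JDBC driver class
        "--username", username,
        "-P",                        # prompt for the password on the console
        "--table", table,
        "--target-dir", target_dir,  # HDFS destination directory
    ]

# Assumed values for illustration only; verify them against your driver's docs.
argv = build_sqoop_import(
    connect="jdbc:datadirect:sforce://login.salesforce.com",
    driver="com.ddtek.jdbc.sforce.SForceDriver",
    username="user@example.com",
    table="ACCOUNT",
    target_dir="/data/lake/salesforce/account",
)

# Print a copy-pasteable command line. Actually running it needs a Sqoop
# install with the driver JAR available, e.g. subprocess.run(argv, check=True).
print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a value contains spaces or semicolons.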

Page 23: Building a marketing data lake

Why are ISVs turning to a single interface for SaaS?

Get a JDBC interface on top of any API

Data Source | API

Eloqua | Web Services API (REST/SOAP); Bulk and non-Bulk APIs; no query language

Oracle Service Cloud | Web Services APIs (REST/SOAP); ROQL

Google Analytics | Hypercube (query limits of 10 metrics grouped by max of 7 dimensions)

Veeva CRM | SOAP, Bulk, Metadata APIs; SOQL

Page 24: Building a marketing data lake

As the Market Switches from ETL to ELT, Data Access is critical

ETL: Operational Systems, Staging Area, Data Warehouse, Analytics Apps. Steps: Extract, Transform, Load, View.

ELT: Operational Systems, Big Data Warehouse, then Analytics, Data Prep, and even traditional DW. Steps: Extract & Load, Transform & View.
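
A minimal Python sketch of the ELT pattern, using an in-memory SQLite database as a stand-in for the big data warehouse. The raw rows are loaded untouched, and the transformation happens later, inside the store, as a view. The table, column, and view names and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a SaaS source (all text).
RAW_LEADS = [
    ("L-1", "Search", "2015-06-01", "120.50"),
    ("L-2", "Display", "2015-06-01", "80.00"),
    ("L-3", "Search", "2015-06-02", "45.25"),
]

conn = sqlite3.connect(":memory:")  # stand-in for the big data warehouse

# Extract & Load: land the data as-is, with no transformation.
conn.execute(
    "CREATE TABLE raw_leads (lead_id TEXT, channel TEXT, day TEXT, spend TEXT)"
)
conn.executemany("INSERT INTO raw_leads VALUES (?, ?, ?, ?)", RAW_LEADS)

# Transform & View: cast and aggregate inside the warehouse, at query time.
conn.execute(
    """
    CREATE VIEW spend_by_channel AS
    SELECT channel,
           COUNT(*) AS leads,
           SUM(CAST(spend AS REAL)) AS total_spend
    FROM raw_leads
    GROUP BY channel
    """
)

rows = conn.execute(
    "SELECT channel, leads, total_spend FROM spend_by_channel ORDER BY channel"
).fetchall()
print(rows)
```

Because the raw table is preserved, a different transformation can be added later as another view without re-extracting from the source. This is why reliable data access matters more in ELT.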

Page 25: Building a marketing data lake

How big data vendors are embedding SaaS connectivity

Page 26: Building a marketing data lake

Progress DataDirect: Embed Sales & Marketing Connectors into the Data Access Layer

Page 27: Building a marketing data lake

Ingest data across 200+ data sources (beyond marketing data sources)

Big Data/NoSQL: Apache Hadoop Hive (Cloudera, Hortonworks, Pivotal HD, MapR, EMR), Pivotal HAWQ, Cloudera Impala, MongoDB, Spark SQL, Cassandra, SAP HANA

Data Warehouses: Amazon Redshift, SAP Sybase IQ, Teradata, Pivotal Greenplum

Relational: Oracle DB, Microsoft SQL Server, IBM DB2, MySQL, PostgreSQL, IBM Informix, SAP Sybase, Pervasive SQL, Progress OpenEdge, Progress Rollbase

SaaS/Cloud: Salesforce.com, Database.com, FinancialForce, Veeva CRM, ServiceMAX, Any Force.com App, Hubspot, Marketo, Microsoft Dynamics CRM, Microsoft SQL Azure, Oracle Eloqua, Oracle Service Cloud, Google Analytics

EDI/XML/Text: EDIFACT, EDIG@S, EANCOM, X12, IATA, Healthcare EDI (X12, HIPAA, ICD-10, HL7), Custom EDI, Flat files (CSV, TSV, dBase, Clipper, Foxpro, Paradox), Text Files

Any SDK

SequeLink Socket Server

Customer Engineering

Page 28: Building a marketing data lake

Single API for data lake ingestion from SaaS sources

Ingest data against a single API (JDBC)

Get a single dedicated partner

Connect to unlimited data with a single API

Get unlimited support
