T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Why Hadoop for 360-Degree Customer Insight? - A Technical Primer Mark Rittman, CTO, Rittman Mead November 2015
What is Customer 360-Degree Analysis?
• Gather together all meaningful information about the customer (“360-degree view”)
• Organize, match, profile & store every interaction in real time
• Matched and combined; factual, interpreted, learned
‣ Across all channels, and on public forums and social media
• Captures interactions across all touchpoints and all channels
‣ Including activity on social networks, forums, blogs
• Typically stored and processed in a Hadoop “data reservoir”
• Dynamic customer profiles with segmentation, behavioural analysis “at scale”
• Downstream feeds into DW, CRM and other systems
Adding “Who” and “Why” to Customer & Transaction Data
• Detailed events and transactions now combined with granular behavioural & attitudinal data
[Diagram: four categories of data feed a Single Customer View and an Enriched Customer Profile through correlating, modeling, machine learning and scoring]
• “Who” (Descriptive Data): Attributes, Segments, Relationships, Demographics
• “What” (Behavioural Data): Transaction History, Retail Activity, Payment History, Basket Analysis
• “Why” (Attitudinal Data): Blogs, Surveys, Social Media
• “How” (Interaction Data): Voice + Chat Transcripts, In-person dialogs, Web server logs
Connect the Silos, Understand Customers, Drive Decisions
[Diagram: siloed data sources feed an Enriched Customer Profile, which in turn drives decisions]
• Listen better (inputs): consumption logs, clickstream & devices; demographic, user and credit data; customer contacts and service cases; transactions and subscriptions; content metadata, ratings, comments; marketing campaign response; social media activity
• Learn faster (Enriched Customer Profile): Correlating, Modeling, Scoring, Micro-Segments, History, Preferences
• Execute smarter (outputs): programmatic advertising; audience acquisition, retention; multi-channel marketing; targeted promotions; next best offer; personalized content; product & service strategy; content acquisition
But Wait … Isn’t This Just Data Warehousing & Data Mining?
• Data warehouses were conceived as a single source of reporting truth
• Formally accept, model and integrate data to provide an analytical reporting platform
• Well-established design patterns for long-term data storage
• Stored in structured, indexed, optimised “schema on write” storage
• Data moved through layers via formal ETL
• Extreme performance, highly secure
• Analytic SQL, in-database analytics
‣ So why not use one for this Customer 360 data?
Data Warehouse Loading Requires Formal ETL and Modeling
[Diagram: a single analytic DBMS node ($1m) holding curated data, loaded through ETL into a data model, with an ETL developer and a data modeller maintaining them]
Traditional DW Databases Started as Single-Node Systems
[Diagram: a single analytic DBMS node ($1m) holding curated data, with ETL, data model, ETL developer and data modeller]
• ETL development takes time and is fragile, but results in well-curated data
‣ But what about data whose schema is not yet known? Or whose final use has not yet been determined?
• Dimensional data modelling gives structure to the data for business users
‣ But it also restricts how that data can be analysed
‣ What if the end user is better placed to apply that schema?
Traditional DW Databases Started as Single-Node Systems
[Diagram: the same single analytic DBMS node ($1m) with ETL and data model; curated data feeds tools labelled CRMR and Data Mining/Stats]
• Analytic workloads typically originate with tabular, structured data
• Well-suited to dashboards and reporting, dimensional analysis
• Most also support data mining, advanced analytics
• But limited in terms of support for flexible-schema datasets
• And limited support for unstructured and semi-structured data
Traditional DW Databases Started as Single-Node Systems
[Diagram: one analytic DBMS node ($1m) running a single DB instance and its compute, with ETL and data model]
• Databases such as Oracle were originally designed for a single server node
• Scalability achieved by vertical scaling (“scale-up”), i.e. buy a bigger server
• But servers have limits in terms of how powerful just one can get
• And cost rises steeply as RAM, CPU etc. increase
Shared-Everything DWs Can Scale to 5-10 Nodes - At Cost
[Diagram: four analytic DBMS nodes at $1m each, each with its own compute, sharing a single DB instance, with ETL and data model]
• But there are limits on how far this can go
‣ Maximum size of cluster is around 5-10 nodes
‣ And cost: each node typically costs $1m
Shared-Nothing DWs Scale Further - But Require Sharding
[Diagram: three analytic DBMS nodes at $1m each, each with its own compute and DB shard (A-H, I-M, N-Z), loaded by complex shard-aware ETL into one data model]
• Shared-nothing databases can potentially scale further
‣ No need to maintain a single database instance
• But scaling is achieved through “sharding” the dataset
‣ ETL and other processes need to consider data locality
‣ Leads to more complex ETL than single-instance
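To make "shard-aware ETL" concrete, here is a minimal Python sketch of range-based routing using the A-H / I-M / N-Z split from the diagram. The shard names, the choice of the customer's first letter as the shard key, and the function names are assumptions made for illustration; a real shared-nothing database would use its own distribution keys and loader.

```python
from bisect import bisect_right

# Hypothetical shard map mirroring the slide's A-H / I-M / N-Z split.
# Each entry in SHARD_STARTS is the first letter owned by the matching shard.
SHARD_STARTS = ["A", "I", "N"]
SHARD_NAMES = ["shard_a_h", "shard_i_m", "shard_n_z"]


def shard_for(customer_name: str) -> str:
    """Return the shard that owns a customer, keyed on the first letter."""
    first = customer_name.strip().upper()[:1]
    if not first or not ("A" <= first <= "Z"):
        raise ValueError(f"cannot route key: {customer_name!r}")
    # bisect_right counts how many range starts are <= the first letter.
    return SHARD_NAMES[bisect_right(SHARD_STARTS, first) - 1]


def route_batch(names):
    """Group an ETL batch by target shard so each shard gets its own load."""
    batches = {name: [] for name in SHARD_NAMES}
    for n in names:
        batches[shard_for(n)].append(n)
    return batches
```

Every load job has to pass through something like `route_batch` before it can write, which is the extra complexity the slide refers to: the ETL must know where each row lives.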
Shared-Nothing DWs Scale Further - But Require Sharding
[Diagram: the cluster grown to seven analytic DBMS nodes at $1m each, each with its own compute and DB shard; the shard ranges are now A-F, G-J, K-N, O-R, S-T, U-W and X-Z, still loaded by complex shard-aware ETL]
• ...and adding more nodes means re-sharding the dataset
• Also rules out mixed-workload DBs with OLTP
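The cost of re-sharding can be illustrated with a small Python sketch that compares the three-shard layout (A-H, I-M, N-Z) with the seven-shard layout shown in the diagram. The assignment of ranges to nodes is an assumption read off the diagram's layout (A-F, O-R and S-T on the original three nodes, the remaining ranges on four new nodes), and the node names are hypothetical. It counts key ranges by first letter only, not actual data volume.

```python
from string import ascii_uppercase

# Assumed layouts: node names and the range-to-node assignment are illustrative.
OLD_LAYOUT = {"node1": ("A", "H"), "node2": ("I", "M"), "node3": ("N", "Z")}
NEW_LAYOUT = {
    "node1": ("A", "F"), "node2": ("O", "R"), "node3": ("S", "T"),
    "node4": ("G", "J"), "node5": ("K", "N"), "node6": ("U", "W"),
    "node7": ("X", "Z"),
}


def owners(layout):
    """Expand {node: (first, last)} into {letter: node}."""
    out = {}
    for node, (first, last) in layout.items():
        for code in range(ord(first), ord(last) + 1):
            out[chr(code)] = node
    return out


def moved_letters(old, new):
    """Letters whose owning node changes between the two layouts."""
    old_owner, new_owner = owners(old), owners(new)
    return [c for c in ascii_uppercase if old_owner[c] != new_owner[c]]


moved = moved_letters(OLD_LAYOUT, NEW_LAYOUT)
print(f"{len(moved)} of 26 letter ranges change node: {''.join(moved)}")
```

Under these assumptions only A-F and S-T stay where they were, so most of the dataset has to be physically moved when the cluster grows.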
Introducing Hadoop - Cheap, Flexible Storage + Compute
• A new approach to data processing and data storage
• Rather than a small number of large, powerful servers, it spreads processing over large numbers of small, cheap, redundant servers
• Spreads the data you’re processing over lots of distributed nodes
• Has a scheduling/workload process that sends parts of a job to each of the nodes
• And does the processing where the data sits
• Shared-nothing architecture
• Low-cost and highly horizontally scalable
[Diagram: a Job Tracker dispatching work to Task Trackers, each co-located with a Data Node]
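The idea of sending parts of a job to where the data sits can be sketched in a few lines of Python. This is a single-process illustration only, not Hadoop's actual API: the node names, the sample log lines and the functions `map_task` and `run_job` are all hypothetical stand-ins for data nodes, task trackers and the job tracker.

```python
from collections import Counter

# Hypothetical cluster: each "node" holds its own slice of web-log lines.
NODE_BLOCKS = {
    "node1": ["cust42 /home", "cust7 /basket"],
    "node2": ["cust42 /checkout", "cust42 /home"],
    "node3": ["cust7 /home"],
}


def map_task(lines):
    """Runs against one node's local data: count page views per customer."""
    return Counter(line.split()[0] for line in lines)


def run_job(node_blocks):
    """Stand-in for the job tracker: one task per node, then merge results."""
    partials = [map_task(lines) for lines in node_blocks.values()]
    total = Counter()
    for partial in partials:
        total.update(partial)  # only the small partial counts are combined
    return dict(total)


page_views = run_job(NODE_BLOCKS)
```

Only the small per-node counts are combined centrally; the raw log lines never leave the node that stores them, which is what lets the approach scale by adding cheap nodes.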
Introducing Hadoop - Cheap, Flexible Storage + Compute
• Hadoop & NoSQL are better suited to exploratory analysis of newly-arrived data
‣ Flexible schema - applied by user rather than ETL
‣ Cheap expandable storage for detail-level data
‣ Better native support for machine-learning and data discovery tools and processes
‣ Potentially a great fit for our new and emerging customer 360 datasets, and a great platform for analysis
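A flexible schema "applied by user rather than ETL" is often called schema-on-read. The Python sketch below shows the idea under stated assumptions: the raw events, field names and the `read_with_schema` helper are invented for illustration, and real deployments would typically do this through a SQL-on-Hadoop engine or similar rather than hand-written code.

```python
import json

# Hypothetical raw feed: records are stored as they arrived, fields vary.
RAW_EVENTS = [
    '{"customer_id": "42", "channel": "web", "page": "/home"}',
    '{"customer_id": "7", "channel": "chat", "text": "where is my order?"}',
    '{"customer_id": "42", "channel": "web", "page": "/basket", "device": "mobile"}',
]


def read_with_schema(raw_lines, schema):
    """Project each raw record onto the fields and types the reader asks for."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }


# The analyst chooses the schema at query time; nothing was modelled up front.
web_schema = {"customer_id": int, "page": str}
rows = list(read_with_schema(RAW_EVENTS, web_schema))
```

A different analyst could read the same files with a schema of `customer_id` and `text` for chat analysis, without any change to how the data was loaded.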
Hadoop Designed for Real-Time Storage of Raw Data Feeds
[Diagram: real-time feeds (voice + chat transcripts, call center logs, chat logs, iBeacon logs, website logs) landing as raw data on a $50k Hadoop node]
Supplement with Batch + API Loads of ERP + 3rd Party Data
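[Diagram: the $50k Hadoop node receiving real-time feeds (voice + chat transcripts, call center logs, chat logs, iBeacon logs, website logs), now supplemented by batch loads and API/web service calls bringing in CRM data, transactions, social feeds and demographics, all stored as raw data]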
Supplement with Batch + API Loads of ERP + 3rd Party Data
[Diagram: raw data from real-time feeds, batch and API loads (voice + chat transcripts, call center logs, chat logs, iBeacon logs, website logs, CRM data, transactions, social feeds, demographics) on the $50k Hadoop node, now serving Customer 360 apps, predictive models, SQL-on-Hadoop and business analytics]
Supplement with Batch + API Loads of ERP + 3rd Party Data
[Diagram: the same real-time feeds, batch and API loads now spread across a cluster of Hadoop nodes, each with its own compute, with raw data held across the cluster filesystem; node price labels of $5k and $50k]
Hadoop-Based Storage & Compute : A Better Logical Fit
[Diagram: real-time feeds, batch and API loads (voice + chat transcripts, call center logs, chat logs, iBeacon logs, website logs, CRM data, transactions, social feeds, demographics) flowing into a cluster of seven $50k Hadoop nodes that form the Hadoop data reservoir]
• Raw customer data stored at detail level
• Enriched and processed for insights (modeling, scoring) into an Enriched Customer Profile
Typically Stored on Flexible, Scalable Hadoop + NoSQL
[Diagram: real-time feeds, batch and API loads (voice + chat transcripts, call center logs, chat logs, iBeacon logs, website logs, CRM data, transactions, social feeds, demographics) flowing into a cluster of seven $50k Hadoop nodes that form the Hadoop data reservoir]
• Raw customer data stored at detail level
• Enriched and processed for insights (modeling, scoring) into an Enriched Customer Profile
Deployed on Oracle Big Data Appliance Engineered System
• Oracle engineered system for big data processing and analysis
• Start with Oracle Big Data Appliance Starter Rack - expand up to 18 nodes per rack
• Cluster racks together for horizontal scale-out using enterprise-quality infrastructure
Oracle Big Data Appliance Starter Rack + Expansion
• Cloudera CDH + Oracle software
• 18 high-spec Hadoop nodes with InfiniBand switches for internal Hadoop traffic, optimised for network throughput
• 1 Cisco management switch
• Single place for support for H/W + S/W
[Diagram: Enriched Customer Profile, modeling and scoring running on the appliance, connected via InfiniBand]
Architected using “Data Reservoir” Design Pattern
• Data for a customer 360 system is typically landed into a Hadoop & NoSQL-based data reservoir
• Applies aggregation, joining and machine-learning processes to extract insights
[Diagram: data reservoir reference architecture on the Hadoop platform]
• Sources: operational data (transactions, customer master data), unstructured data (voice + chat transcripts), data streams
• Data Transfer: file-based integration, stream-based integration, ETL-based integration
• Data Factory and Data Reservoir:
‣ Raw Customer Data: data stored in the original format (usually files) such as SS7, ASN.1, JSON etc.
‣ Mapped Customer Data: datasets produced by mapping and transforming raw data
‣ Machine learning, models, segments
• Data Access: business intelligence tools, marketing/sales applications
• Discovery & Development Labs: safe & secure discovery and development environment, holding datasets and samples, models and programs
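The raw-to-mapped step, followed by the joining and aggregation that builds a customer profile, might look like the following Python sketch. Everything here is illustrative: the field names, the sample records, the customer master lookup and the functions `map_raw` and `build_profiles` are assumptions, and in practice this work would run as distributed jobs on the Hadoop platform rather than in one process.

```python
import json
from collections import defaultdict

# Hypothetical raw interaction files, kept in their original JSON form.
RAW_INTERACTIONS = [
    '{"cust": "C1", "type": "web", "ts": "2015-11-01T10:00:00"}',
    '{"cust": "C1", "type": "chat", "ts": "2015-11-02T09:30:00"}',
    '{"cust": "C2", "type": "web", "ts": "2015-11-02T11:15:00"}',
]

# Hypothetical customer master data from an operational system.
CUSTOMER_MASTER = {
    "C1": {"name": "A. Example", "segment": "retail"},
    "C2": {"name": "B. Example", "segment": "business"},
}


def map_raw(lines):
    """Raw -> mapped: parse each record and rename fields to a common shape."""
    return [
        {"customer_id": r["cust"], "channel": r["type"], "timestamp": r["ts"]}
        for r in map(json.loads, lines)
    ]


def build_profiles(mapped, master):
    """Aggregate interactions per customer and join to the master record."""
    counts = defaultdict(lambda: defaultdict(int))
    for event in mapped:
        counts[event["customer_id"]][event["channel"]] += 1
    return {
        cid: {**master.get(cid, {}), "interactions": dict(channels)}
        for cid, channels in counts.items()
    }


profiles = build_profiles(map_raw(RAW_INTERACTIONS), CUSTOMER_MASTER)
```

Keeping the raw records untouched and deriving the mapped and profile datasets from them means a new mapping or model can be rebuilt from source at any time.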