www.ovum.com
© Copyright Ovum 2015. All rights reserved.
Spark & the Enterprise
Tony Baer
Presentation for Spark Summit East 2016
Spark eating the Big Data world
§ 40+ committers
§ 1000 contributors
§ 179 projects using Spark engine
§ 370k+ LOC
The most active Apache project
Source: Databricks, January 2015
“The leading candidate for ‘successor’ to MapReduce today is Apache Spark.”
Mike Olson, Chief Strategy Officer, Cloudera, 12/30/2013
Bob Picciano, SVP, Analytics, IBM, 6/15/2015
When IBM put its muscle behind Linux in 1999, that move marked the beginning of its ascendancy in corporations and Internet-class data centers. The same sort of thing could happen now with Spark.
What’s there to like?
Ease of development
Performance
Flexibility
Extensibility
Versatility
10 – 100x faster than MapReduce
Batch + Real time
Core projects & 80+ libraries
Orchestrate multiple analytic processes
Higher level programming abstraction
What’s really there to like?
Ease of development
Performance
Flexibility
Extensibility
Versatility
10 – 100x faster than MapReduce
Orchestrate multiple analytic processes
Higher level programming abstraction
Batch + Real time
Handle different & complex scenarios
Better Productivity
Wider tool selection
Smarter predictions
Handle more varied scenarios
Better programming
Core projects & 80+ libraries
What’s really there to like?
Ease of development
Performance
Flexibility
Extensibility
Versatility
10 – 100x faster than MapReduce
Orchestrate multiple analytic processes
Higher level programming abstraction
Batch + Real time
Handle more complex scenarios
Better Productivity
Wider tool selection
Smarter predictions
Handle more varied & complex scenarios
Better programming
Core projects & 80+ libraries
So what?
Focus on the results
§ What use cases/business scenarios/business problems can Spark address?
§ How does Spark impact analytics?
§ What questions are asked?
§ How questions are asked?
§ Types of analytics that are performed?
§ Timeliness of results?
§ The insights that can be obtained?
Common analytics use cases
§ Workload shift: ETL processes, Batch analytics
§ Customer Engagement: Customer Retention, Customer Experience, Upsell/Cross-Sell, Social Tribe Influence, Real-time Customer Offer
§ Risk/Fraud/Security: Risk Mitigation, Fraud Detection/Prevention, Intrusion Detection
§ Operations: Operational Efficiency, Process Optimization, Asset & Service Mgmt., Performance Mgmt.
Many use cases are familiar… the results are different
Smart City: Manage Traffic flow
From Sense & respond to….
Real-time analytics + interactive query + long running ML = better insights for managing traffic
Monitoring Automotive product performance
From:
§ Track warranty & repair trends (after the fact)
To:
§ Identify signals from social media to prepare auto mfr & dealer network to anticipate performance issues
§ Use Spark MLlib machine learning capabilities
Benefits:
§ Provided advance warning of customer feedback
§ MLlib libraries eliminated need for custom programming ML functions
Source: Toyota 12-week pilot program
Data wrangling to spot financial fraud
From:
§ DW populated with data from internal sources (mostly OLTP data)
To:
§ Broadening data set to widely varying sources (transactions, text messages, social media) with 10s or 100s of millions of records
§ Use Spark-based ML-powered data prep tool to harmonize data to ID outliers & patterns
Benefits:
§ Spark performance enabled team to expand data pool, query interactively & run more what-if scenarios for spotting fraud
Customer Experience (CX) Management
From:
§ Surveys, focus groups, CRM data
To:
§ Predictive analytics for improving the customer experience
§ Spark-enabled machine learning for identifying CX trends, customer satisfaction levels; Graph analytics for connecting customer experiences across different channels and ID’ing influencers & followers
Benefits:
§ Changes CX management from reactive to proactive
Why Spark?
From the tech argument
Ease of development
Performance
Flexibility
Extensibility
Versatility
10 – 100x faster than MapReduce
Batch + Real time
Core projects & 80+ libraries
Orchestrate multiple analytic processes
Higher level programming abstraction
Why Spark?
To: Business Benefits
§ Automotive product performance
  § Machine learning enables the automotive OEM to be proactive in deciphering the signals to anticipate consumer sentiment/perceptions of product performance
  § Business Benefit: Head off product complaints/potential liability/reputational issues before they explode
§ Financial fraud detection
  § Spark’s scalability allows crunching of more complete data sets; performance produces more timely results; machine learning IDs emergent outliers of interest
  § Business Benefit: More thorough, timely detection of fraud
§ Customer Experience
  § Machine learning allows proactive deciphering of signals; graph computing identifies social tribes & influencers
  § Business Benefit: Keep more in sync with customers. Act, not react to events, trends, changes in customer climate
Takeaways
§ Spark enthusiasm in practitioner community has gone viral
§ Spark community highly successful in sparking vendor support.
§ Spark practitioners must take the message on Spark to higher level: Talk to the business
§ Keep your message real:
  § Business benefits
  § Don’t promise the sky
§ Spark is not the only path to ML, graph, streaming, etc. But API compatibility provides accessibility, enables flexibility & versatility
§ Spark is still in adolescence.
Thank you
Tony Baer
Ovum
(646) 546-5330
[email protected]
Twitter: @TonyBaer