Big Data In the Cloud
Simon Tang
Master Principal Sales Consultant
BI and Big Data Team, HK
October, 2016
3 Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Innovation & Execution
Transforming to Big Data
From Data Warehouse
• Relational
– On-Premises
• Transactional data
• Analytics + Data Mining
To Big Data
• Relational + Hadoop and NoSQL
– On-Premises + Cloud
• Transactional + Social, Web and IoT
• Analytics + Data Mining + Machine Learning
Transactional
Data Warehouse
SQL
Social, Web
Data Lake
IoT
Fast Data
node.js Java REST Python Scala R
To Big Data
• Consolidate Metadata across Silos
• Unique Architecture to deliver Performance:
– Optimize queries
– Push down processing
• Extend Security model
From Data Warehouse
Applications
Simplicity
Specialization
Performance
Analytics
Complexity
Fragmentation
Delays
Oracle Big Data SQL
Existing People
Existing Applications
More Data
Time to root cause identification (Reduction in) Time to insight
The Value of Minimizing Data Movement
86%
7 days
3 weeks
4 hours
Graph
• Massively-Scalable Graph Database
• 40+ in-memory parallel algorithms
• Simple standard interfaces: R, Python...
Comprehensive Data Science Capabilities
Any analysis across relational, Hadoop, Spark and NoSQL
Graph - Sample Use Cases
Traffic Pattern – Shortest path
Multi-Modal Routing
Find closest store within a specified drive time
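The shortest-path and drive-time use cases can be illustrated with a minimal sketch of Dijkstra's algorithm. The road network, node names and minute values below are hypothetical, and this is a single-machine illustration rather than how Oracle's graph engine implements routing.

```python
import heapq

# Hypothetical road network: node -> {neighbor: drive time in minutes}.
ROADS = {
    "home": {"a": 4, "b": 10},
    "a": {"b": 3, "store_1": 12},
    "b": {"store_1": 5, "store_2": 15},
    "store_1": {},
    "store_2": {},
}


def drive_times(graph, source):
    """Dijkstra's algorithm: minimum drive time from source to each reachable node."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        time, node = heapq.heappop(queue)
        if time > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, minutes in graph.get(node, {}).items():
            candidate = time + minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


def stores_within(graph, source, stores, max_minutes):
    """Return (minutes, store) pairs reachable within max_minutes, closest first."""
    times = drive_times(graph, source)
    return sorted((times[s], s) for s in stores if s in times and times[s] <= max_minutes)


if __name__ == "__main__":
    print(stores_within(ROADS, "home", ["store_1", "store_2"], max_minutes=15))
```

With these made-up numbers, only store_1 is within a 15-minute drive of "home"; store_2 is reachable but too far.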
Who is most important? There Are Lots of Answers.
• Answers from Aggregation
– Who spends the most?
– Who buys the highest margin goods?
– Who is most consistently a top contributor?
• Answers from Connectivity
– Who’s most influential?
– Which supplier do I depend on the most?
– What is the right product mix for millennials?
Tabular questions: Well-suited to SQL-like tools
Graph questions: We need something different!
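A small sketch may help contrast the two kinds of question. The purchase amounts and the follower edges below are invented for illustration. "Who spends the most?" is a group-and-sum over rows, while "who is most influential?" is approximated here by the simplest connectivity measure, in-degree; real influence measures are usually richer than this.

```python
from collections import defaultdict

# Hypothetical data for illustration only.
PURCHASES = [("ann", 120.0), ("bob", 300.0), ("cat", 80.0), ("ann", 60.0)]
FOLLOWS = [("bob", "ann"), ("cat", "ann"), ("dan", "ann"), ("ann", "cat")]  # (follower, followed)


def top_spender(purchases):
    """Aggregation question: sum the amount per customer and take the maximum."""
    totals = defaultdict(float)
    for customer, amount in purchases:
        totals[customer] += amount
    return max(totals, key=totals.get)


def most_followed(edges):
    """Connectivity question: count incoming edges per person (in-degree)."""
    in_degree = defaultdict(int)
    for _follower, followed in edges:
        in_degree[followed] += 1
    return max(in_degree, key=in_degree.get)


if __name__ == "__main__":
    print("Top spender:", top_spender(PURCHASES))
    print("Most followed:", most_followed(FOLLOWS))
```

In this toy data the top spender and the most-followed person are different people, which is the point of the slide: the two questions need different data shapes.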
What Big Data problems can Graphs address?
• Product Recommendation (Purchase Record: customers and items): recommend the most similar item purchased by similar people
• Influencer Identification (Communication Stream, e.g. tweets): find people who are central in the given network, e.g. influencer marketing
• Community Detection: identify groups of people that are close to each other, e.g. target group marketing
• Graph Pattern Matching: find all sets of entities that match the given pattern, e.g. fraud detection
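As an illustration of Product Recommendation on a purchase-record graph, here is a minimal sketch that treats customers and items as the two sides of a bipartite graph. The customers, items and the overlap-based scoring are assumptions made for this example, not the algorithm used by any Oracle product.

```python
from collections import Counter

# Hypothetical purchase records: customer -> set of items bought.
PURCHASES = {
    "ann": {"tent", "stove"},
    "bob": {"tent", "stove", "lantern"},
    "cat": {"tent", "lantern"},
    "dan": {"novel"},
}


def recommend(purchases, customer):
    """Rank items the customer does not own, weighted by how similar each other buyer is.

    Similarity is the number of items two customers have in common.
    """
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)
        if overlap == 0:
            continue  # no shared purchases, so not a "similar" customer
        for item in items - owned:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable result.
    return [item for item, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


if __name__ == "__main__":
    print(recommend(PURCHASES, "ann"))
```

A customer with no purchases in common with anyone else, such as "dan" here, gets no recommendations from this approach.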
Faster Time to Results: Graph
• Parallel and Distributed Graph Processing
• Optimized for InfiniBand network
• Ultra-balanced workload partitioning using state-of-the-art, proprietary techniques
[Chart: Execution Time (seconds, log scale) vs. Cluster Size (2, 4, 8, 16, 32)]
Oracle PGX outperforms GraphX by up to 160X
Weakly Connected Components (WCC) on a 1.4 billion edge graph
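For readers unfamiliar with the benchmark algorithm, weakly connected components groups nodes that are linked when edge direction is ignored. The sketch below uses a single-machine union-find structure on a hypothetical edge list; it only shows what WCC computes and says nothing about how PGX or GraphX implement or distribute it.

```python
def weakly_connected_components(edges):
    """Group nodes into components, ignoring edge direction (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[node] != root:
            parent[node], node = root, parent[node]
        return root

    for src, dst in edges:
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    components = {}
    for node in list(parent):
        components.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in components.values())


if __name__ == "__main__":
    # Hypothetical directed edges.
    print(weakly_connected_components([("a", "b"), ("c", "b"), ("d", "e")]))
```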
Graph
• Massively-Scalable Graph Database
• 40+ in-memory parallel algorithms
• Simple standard interfaces: R, Python...
Machine Learning
• Scale to big data using R and SQL
• 40+ parallel, distributed algorithms
• Automated data preparation and modeling
• Enhances and extends SparkML
Comprehensive Data Science Capabilities
Any analysis across relational, Hadoop, Spark and NoSQL
Machine Learning - Better Information, Valuable Insights and Predictions
Cell Phone Churners vs. Loyal Customers
[Chart axis: Customer Months]
Insight & Prediction
• Segment #1: IF CUST_MO > 14 AND INCOME < $90K, THEN Prediction = Cell Phone Churner, Confidence = 100%, Support = 8/39
• Segment #3: IF CUST_MO > 7 AND INCOME < $175K, THEN Prediction = Cell Phone Churner, Confidence = 83%, Support = 6/39
Source: Inspired by Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management by Michael J. A. Berry and Gordon S. Linoff
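The Confidence and Support figures attached to each segment can be reproduced with a few lines of code. The sketch below assumes the reading that support is the share of all customers who fall in the segment and confidence is the share of those who actually churned (consistent with 6/39 and 83%, i.e. 5 of 6). The six customer records and the field names are hypothetical.

```python
# Hypothetical customer records; field names are assumptions for this example.
CUSTOMERS = [
    {"cust_mo": 20, "income": 60_000, "churned": True},
    {"cust_mo": 16, "income": 85_000, "churned": True},
    {"cust_mo": 9, "income": 150_000, "churned": True},
    {"cust_mo": 8, "income": 120_000, "churned": False},
    {"cust_mo": 3, "income": 40_000, "churned": False},
    {"cust_mo": 30, "income": 200_000, "churned": False},
]


def evaluate_rule(records, condition):
    """Return (matched, total, confidence) for an IF-condition THEN-churner rule.

    Support is matched/total; confidence is the fraction of matched
    records that really churned (None when nothing matches).
    """
    matched = [r for r in records if condition(r)]
    churners = sum(1 for r in matched if r["churned"])
    confidence = churners / len(matched) if matched else None
    return len(matched), len(records), confidence


def segment_1(r):
    return r["cust_mo"] > 14 and r["income"] < 90_000


def segment_3(r):
    return r["cust_mo"] > 7 and r["income"] < 175_000


if __name__ == "__main__":
    for name, rule in (("Segment #1", segment_1), ("Segment #3", segment_3)):
        matched, total, confidence = evaluate_rule(CUSTOMERS, rule)
        print(f"{name}: support = {matched}/{total}, confidence = {confidence:.0%}")
```

A broader rule typically covers more customers (higher support) but catches some loyal ones too (lower confidence), which is the trade-off visible between the two segments on the slide.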
Be Specific in Problem Statement
Poorly Defined | Better
Predict employees that leave | Based on past employees that voluntarily left: create new attribute EmplTurnover 0/1
Predict customers that churn | Based on past customers that have churned: create new attribute Churn YES/NO
Target “best” customers | Recency, Frequency, Monetary (RFM) analysis; specific dollar amount over time window: who has spent $500+ in most recent 18 months
How can I make more $$? | What helps me sell soft drinks & coffee?
Which customers are likely to buy? | How much is each customer likely to spend?
Who are my “best customers”? | What descriptive “rules” describe “best customers”?
How can I combat fraud? | Which transactions are the most anomalous? Then roll up to physician, claimant, employee, etc.
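Turning a vague goal into a concrete target attribute can be as simple as the sketch below, which labels "best" customers as those who spent $500+ in the most recent 18 months. The order data, the as-of date and the calendar-month window arithmetic are assumptions made for illustration.

```python
from datetime import date

# Hypothetical orders: (customer, order date, amount in dollars).
ORDERS = [
    ("ann", date(2016, 9, 1), 350.0),
    ("ann", date(2015, 12, 15), 200.0),
    ("bob", date(2014, 1, 10), 900.0),
    ("cat", date(2016, 3, 3), 120.0),
]


def months_between(earlier, later):
    """Whole calendar months from earlier to later (days of month are ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def best_customers(orders, as_of, min_spend=500.0, window_months=18):
    """Customers whose spend inside the window reaches min_spend."""
    totals = {}
    for customer, when, amount in orders:
        if 0 <= months_between(when, as_of) < window_months:
            totals[customer] = totals.get(customer, 0.0) + amount
    return sorted(c for c, total in totals.items() if total >= min_spend)


def label_best_customers(orders, as_of):
    """Create the new 0/1 target attribute for every customer seen in the orders."""
    best = set(best_customers(orders, as_of))
    return {customer: int(customer in best) for customer, _, _ in orders}


if __name__ == "__main__":
    print(label_best_customers(ORDERS, as_of=date(2016, 10, 1)))
```

Here "bob" spent the most overall but falls outside the 18-month window, so the precise definition labels him 0, something the vague statement "best customers" would not have settled.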
Faster Time to Results: Machine Learning
Oracle Big Data Cloud Service outperforms Spark MLlib by 10X
[Chart: Spark 264 sec vs. Oracle 24 sec]
• Re-designed, optimized algorithms
• Better parallelization, distribution
• Efficient memory utilization
• Superior solver technology
Linear Model algorithm over airline data: predict 8,000+ coefficients over 120M records
Machine Learning: Not Just For Data Scientists
EXPERIMENT
ANALYZE & ACT
Manage, secure and make all data available
Connect people to information they need
Innovation through data experiments and advanced analytics
Transform workplace and workforce through insights
COLLECT
MANAGE
Oracle’s Unified Big Data Management and Analytics Strategy
EXPERIMENT
ANALYZE & ACT
COLLECT
MANAGE
DataFlow ML CS
Big Data Preparation CS
IoT CS
GoldenGate CS
Event Hub CS
Data Integrator CS
Big Data CS
NoSQL Database CS
Big Data CS CE
Big Data SQL CS
NoSQL Database CS
Exadata CS
Data Visualization CS
Business Intelligence CS
Spatial and Graph*
Advanced Analytics*
R on Hadoop*
Big Data Discovery CS
Stream Analytics CS*
R on Hadoop*
Spatial and Graph*
* Bundled with other Cloud Services
Oracle Cloud Platform for Big Data
New At Oracle OpenWorld
• Oracle Big Data Cloud Service – Compute Edition
• Oracle Dataflow ML Cloud Service
• Oracle NoSQL Database Cloud Service
• Oracle Event Hub Cloud Service
Choice of Big Data Cloud Services
Big Data CS
• Spark and all Hadoop services
• Bursting (new)
• Dedicated compute and storage
• Automated, Flexible
• Includes: R, multimedia, Spatial, Graph analytics; data integration tools, Cloudera Enterprise Data Hub
• Big Data SQL Cloud Service coming
Big Data CS – Compute Edition (announcing)
• Spark and HDFS services
• Fully Elastic
• Shared compute and storage
• Fully Managed
Choice of Big Data Deployment Models
• Big Data Appliance: Customer Data Center, Purchased, Customer Managed
• Big Data Cloud Machine: Customer Data Center, Subscription, Oracle Managed
• Big Data Cloud Service: Oracle Cloud, Subscription, Oracle Managed