a6 big data_in_the_cloud

Big Data In the Cloud. Simon Tang, Master Principal Sales Consultant, BI and Big Data Team, HK. October, 2016

Upload: dr-wilfred-lin-phd

Post on 11-Jan-2017


TRANSCRIPT

Page 1: A6 big data_in_the_cloud

Big Data In the Cloud

Simon Tang

Master Principal Sales Consultant

BI and Big Data Team, HK

October, 2016

Page 2: A6 big data_in_the_cloud


Page 3: A6 big data_in_the_cloud

Copyright © 2016, Oracle and/or its affiliates. All rights reserved.

Innovation

Page 4: A6 big data_in_the_cloud

Innovation & Execution

Page 5: A6 big data_in_the_cloud


Transforming to Big Data

From Data Warehouse

• Relational

– On-Premises

• Transactional data

• Analytics + Data Mining

To Big Data

• Relational + Hadoop and NoSQL

– On-Premises + Cloud

• Transactional + Social, Web and IoT

• Analytics + Data Mining + Machine Learning


Page 6: A6 big data_in_the_cloud


Transactional

Data Warehouse

SQL

Social, Web

Data Lake

IoT

Fast Data

node.js, Java, REST, Python, Scala, R

To Big Data

Consolidate Metadata across Silos

Unique Architecture to deliver Performance:

Optimize queries

Pushing down processing

Extend Security model

From Data Warehouse

Applications

Simplicity

Specialization

Performance

Analytics

Complexity

Fragmentation

Delays

Oracle Big Data SQL

Page 7: A6 big data_in_the_cloud


Existing People

Existing Applications

More Data

Page 8: A6 big data_in_the_cloud


Time to root cause identification (Reduction in) Time to insight


The Value of Minimizing Data Movement

86%

7 days

3 weeks

4 hours

Page 9: A6 big data_in_the_cloud


Graph

• Massively-Scalable Graph Database

• 40+ in-memory parallel algorithms

• Simple standard interfaces: R, Python...

Comprehensive Data Science Capabilities: any analysis across relational, Hadoop, Spark and NoSQL

Page 10: A6 big data_in_the_cloud


Graph - Sample Use Cases

Traffic Pattern – Shortest path

Multi-Modal Routing

Find closest store within a specified drive time
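The "closest store within a specified drive time" use case can be sketched as a shortest-path search. The following is a minimal single-machine illustration in plain Python, not Oracle's graph API; the road network `ROADS`, the node names and the helper functions are hypothetical, and edge weights are assumed to be drive times in minutes.

```python
import heapq

def drive_times(graph, source):
    """Dijkstra's algorithm: shortest drive time from source to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, minutes in graph.get(node, []):
            nd = d + minutes
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

def closest_store(graph, source, stores, max_minutes):
    """Return the nearest store reachable within max_minutes, or None."""
    dist = drive_times(graph, source)
    reachable = [(dist[s], s) for s in stores if s in dist and dist[s] <= max_minutes]
    return min(reachable)[1] if reachable else None

# Hypothetical directed road network: node -> [(neighbor, minutes), ...]
ROADS = {
    "home": [("junction", 4), ("mall", 15)],
    "junction": [("mall", 5), ("store_a", 12)],
    "mall": [("store_b", 3)],
}

if __name__ == "__main__":
    print(closest_store(ROADS, "home", {"store_a", "store_b"}, max_minutes=15))
```

In this toy network the route through the junction makes store_b reachable in 12 minutes, while store_a takes 16, so only store_b qualifies under a 15-minute limit.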

Page 11: A6 big data_in_the_cloud


Who is most important? There Are Lots of Answers.

• Answers from Aggregation

– Who spends the most?

– Who buys the highest margin goods?

– Who is most consistently a top contributor?

• Answers from Connectivity

– Who’s most influential?

– Which supplier do I depend on the most?

– What is the right product mix for millennials?

Tabular questions: Well-suited to SQL-like tools

Graph questions: We need something different!
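The difference between the two kinds of question can be shown on toy data. In this sketch the purchase rows and the "follows" edges are invented for illustration, and in-degree is used only as a crude stand-in for influence; real graph engines typically offer richer centrality measures.

```python
from collections import Counter, defaultdict

# Hypothetical tabular data: (customer, amount spent)
PURCHASES = [("ann", 120.0), ("bob", 40.0), ("ann", 30.0), ("cy", 200.0)]

# Hypothetical graph data: (follower, followed)
FOLLOWS = [("ann", "bob"), ("cy", "bob"), ("bob", "ann"), ("dee", "bob")]

def top_spender(purchases):
    """Aggregation question: group by customer, sum, take the maximum."""
    totals = defaultdict(float)
    for customer, amount in purchases:
        totals[customer] += amount
    return max(totals, key=totals.get)

def most_followed(edges):
    """Connectivity question: count incoming edges per node."""
    in_degree = Counter(target for _, target in edges)
    return in_degree.most_common(1)[0][0]

if __name__ == "__main__":
    print(top_spender(PURCHASES), most_followed(FOLLOWS))
```

The first function needs only rows and a GROUP BY style sum; the second depends on who is linked to whom, which is the information a tabular total discards.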

Page 12: A6 big data_in_the_cloud


What Big Data problems can Graphs address?

Example inputs: purchase records (customers, items); communication streams (e.g. tweets)

• Product Recommendation: recommend the most similar item purchased by similar people

• Influencer Identification: find people who are central in the given network – e.g. influencer marketing

• Community Detection: identify groups of people that are close to each other – e.g. target group marketing

• Graph Pattern Matching: find all sets of entities that match a given pattern – e.g. fraud detection

Page 13: A6 big data_in_the_cloud


Faster Time to Results: Graph

• Parallel and Distributed Graph Processing

• Optimized for InfiniBand network

• Ultra balanced workload partitioning using state-of-the-art, proprietary techniques

[Chart: execution time in seconds (log scale, 1 to 1000) versus cluster size (2, 4, 8, 16, 32)]

Oracle PGX outperforms GraphX by up to 160X

Benchmark: Weakly Connected Components (WCC) on a 1.4-billion-edge graph
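For readers unfamiliar with the benchmark algorithm, weakly connected components groups nodes that are linked when edge direction is ignored. The sketch below is a small single-machine union-find version for illustration only; it is neither the PGX nor the GraphX implementation, and the sample nodes and edges are hypothetical.

```python
def weakly_connected_components(nodes, edges):
    """Group nodes into components, treating every edge as undirected."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

if __name__ == "__main__":
    sample_nodes = ["a", "b", "c", "d", "e"]
    sample_edges = [("a", "b"), ("c", "b"), ("d", "e")]
    print(weakly_connected_components(sample_nodes, sample_edges))
```

Distributed engines partition this work across machines, which is where workload balancing and network speed, the points listed on this slide, start to matter.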

Page 14: A6 big data_in_the_cloud


Graph

• Massively-Scalable Graph Database

• 40+ in-memory parallel algorithms

• Simple standard interfaces: R, Python...

Machine Learning

• Scale to big data using R and SQL

• 40+ parallel, distributed algorithms

• Automated data preparation and modeling

• Enhances and extends SparkML

Comprehensive Data Science Capabilities: any analysis across relational, Hadoop, Spark and NoSQL

Page 15: A6 big data_in_the_cloud


Machine Learning - Better Information, Valuable Insights and Predictions

Customer Months

Cell Phone Churners vs. Loyal Customers

Insight & Prediction

Segment #1: IF CUST_MO > 14 AND INCOME < $90K, THEN Prediction = Cell Phone Churner, Confidence = 100%, Support = 8/39

Segment #3: IF CUST_MO > 7 AND INCOME < $175K, THEN Prediction = Cell Phone Churner, Confidence = 83%, Support = 6/39

Source: Inspired from Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management by Michael J. A. Berry, Gordon S. Linoff
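The confidence and support figures attached to each segment can be computed as follows. This sketch assumes support means the share of all customers covered by the rule's condition, and confidence means the share of those covered customers who actually churned; the five-row dataset is invented and is not the 39-customer data behind the slide.

```python
from fractions import Fraction

# Hypothetical customers; not the data set used on the slide.
CUSTOMERS = [
    {"cust_mo": 20, "income": 60_000, "churned": True},
    {"cust_mo": 16, "income": 85_000, "churned": True},
    {"cust_mo": 15, "income": 95_000, "churned": False},
    {"cust_mo": 5, "income": 50_000, "churned": False},
    {"cust_mo": 30, "income": 40_000, "churned": False},
]

def rule_stats(rows, condition, target="churned"):
    """Return (support, confidence) for the rule: IF condition THEN target."""
    covered = [r for r in rows if condition(r)]
    support = Fraction(len(covered), len(rows))
    hits = sum(1 for r in covered if r[target])
    confidence = Fraction(hits, len(covered)) if covered else Fraction(0)
    return support, confidence

def segment_1(row):
    """Condition of Segment #1 as written on the slide."""
    return row["cust_mo"] > 14 and row["income"] < 90_000

if __name__ == "__main__":
    support, confidence = rule_stats(CUSTOMERS, segment_1)
    print(f"support={support}, confidence={confidence}")
```

On this toy data the rule covers three of five customers, two of whom churned, so it is less clean than the 100% confidence reported for Segment #1 on the slide.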

R

Page 16: A6 big data_in_the_cloud


Be Specific in Problem Statement

Poorly defined: Predict employees that leave
Better: Based on past employees that voluntarily left:
• Create new attribute EmplTurnover 0/1

Poorly defined: Predict customers that churn
Better: Based on past customers that have churned:
• Create new attribute Churn YES/NO

Poorly defined: Target “best” customers
Better: Recency, Frequency, Monetary (RFM) analysis; specific dollar amount over a time window:
• Who has spent $500+ in the most recent 18 months

Poorly defined: How can I make more $$?
Better: What helps me sell soft drinks & coffee?

Poorly defined: Which customers are likely to buy?
Better: How much is each customer likely to spend?

Poorly defined: Who are my “best customers”?
Better: What descriptive “rules” describe “best customers”?

Poorly defined: How can I combat fraud?
Better: Which transactions are the most anomalous?
• Then roll up to physician, claimant, employee, etc.
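Turning a vague goal into a concrete target attribute can be sketched for the "$500+ in the most recent 18 months" example. The order rows, the function names and the 0/1 label encoding are assumptions made for illustration; the month arithmetic clamps the day to 28 as a simplification.

```python
from datetime import date

def months_before(day, months):
    """Date roughly `months` calendar months before `day` (day clamped to 28)."""
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(day.day, 28))

def label_best_customers(orders, as_of, months=18, threshold=500.0):
    """Return {customer: 1 or 0}, where 1 means spend in the window met the threshold."""
    cutoff = months_before(as_of, months)
    spend = {}
    for customer, day, amount in orders:
        spend.setdefault(customer, 0.0)
        if cutoff <= day <= as_of:
            spend[customer] += amount
    return {customer: int(total >= threshold) for customer, total in spend.items()}

# Hypothetical order history: (customer, order date, amount in dollars)
ORDERS = [
    ("ann", date(2016, 9, 1), 300.0),
    ("ann", date(2015, 12, 15), 250.0),
    ("bob", date(2014, 1, 10), 900.0),  # falls outside the 18-month window
    ("bob", date(2016, 5, 2), 100.0),
]

if __name__ == "__main__":
    print(label_best_customers(ORDERS, as_of=date(2016, 10, 1)))
```

The resulting 0/1 column plays the same role as the EmplTurnover and Churn attributes above: a precisely defined target that a model can be trained to predict.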

Page 17: A6 big data_in_the_cloud


Faster Time to Results: Machine Learning

Oracle Big Data Cloud Service outperforms Spark MLlib by 10X

• Spark: 264 sec

• Oracle: 24 sec

• Re-designed, optimized algorithms

• Better parallelization, distribution

• Efficient memory utilization

• Superior solver technology

Benchmark: linear model algorithm over airline data, predicting 8,000+ coefficients over 120M records

Page 18: A6 big data_in_the_cloud


Machine Learning: Not Just For Data Scientists

Page 19: A6 big data_in_the_cloud

EXPERIMENT

ANALYZE & ACT

Manage, secure and make all data available

Connect people to information they need

Innovation through data experiments and advanced analytics

Transform workplace and workforce through insights

COLLECT

MANAGE

Oracle’s Unified Big Data Management and Analytics Strategy

Page 20: A6 big data_in_the_cloud


Page 21: A6 big data_in_the_cloud


Page 22: A6 big data_in_the_cloud

DataFlow ML CS

Big Data Preparation CS

IoT CS

Big Data CS

NoSQL Database CS

Data Visualization CS Big Data Discovery CS

Stream Analytics CS*

R on Hadoop*

Spatial and Graph*


EXPERIMENT

ANALYZE & ACT

COLLECT

MANAGE

DataFlow ML CS

Big Data Preparation CS

IoT CS

GoldenGate CS

Event Hub CS

Data Integrator CS

Big Data CS

NoSQL Database CS

Big Data CS CE

Big Data SQL CS

NoSQL Database CS

Exadata CS

Data Visualization CS

Business Intelligence CS

Spatial and Graph*

Advanced Analytics*

R on Hadoop*

Big Data Discovery CS

Stream Analytics CS*

R on Hadoop*

Spatial and Graph*

* Bundled with other Cloud Services

Oracle Cloud Platform for Big Data

Page 23: A6 big data_in_the_cloud

New At Oracle OpenWorld

• Oracle Big Data Cloud Service – Compute Edition

• Oracle Dataflow ML Cloud Service

• Oracle NoSQL Database Cloud Service

• Oracle Event Hub Cloud Service

Page 24: A6 big data_in_the_cloud

Big Data CS

• Spark and all Hadoop services

• Bursting new

• Dedicated compute and storage

• Automated, Flexible

• Includes: R, multimedia, Spatial, Graph analytics; data integration tools, Cloudera Enterprise Data Hub

• Big Data SQL Cloud Service coming

Big Data CS – Compute Edition

• Spark and HDFS services

• Fully Elastic

• Shared compute and storage

• Fully Managed


Choice of Big Data Cloud Services Announcing

Page 25: A6 big data_in_the_cloud

Choice of Big Data Deployment Models

• Big Data Appliance

– Customer Data Center

– Purchased

– Customer Managed

• Big Data Cloud Machine

– Customer Data Center

– Subscription

– Oracle Managed

• Big Data Cloud Service

– Oracle Cloud

– Subscription

– Oracle Managed

Page 26: A6 big data_in_the_cloud


Page 27: A6 big data_in_the_cloud
Page 28: A6 big data_in_the_cloud
