SCG Data Management for Advanced Analytics (2017-11-26)

Data Analytics: Data Is the New GOLD
Dr. Sittapong Settapat

Uploaded by g-able, posted on 24-Jan-2018



Agenda
1. Introduction
2. Business Value of Big Data Analytics
3. Customer Use Cases
4. Solution Architecture

Big Data Is Only Getting Bigger: Particularly Relevant in the Manufacturing Space

[Figure: Data growth from 1980 to today, driven successively by end-user applications, the internet, mobile devices, and sophisticated machines. Structured data is about 10% of the total and complex data about 90%. Data domains shown: devices & sensors; plant & operations; supply chain & inventory; marketing & CRM; public & trade.]

● What makes “big data” big?

⎼ Volume?

⎼ Variety?

⎼ Velocity?

• Data becomes big when we take one or more large data sets and start to analyze relationships between observations.

Big Data and Data Products

Data is your most important asset

Your data can help achieve your objectives and goals… only if…

Data, Information, Knowledge

The data-driven enterprise

IoT explosion of new data: 30B connected devices, 440x more data

Enterprises re-architect to modernize IT infrastructure: open source, cloud, machine learning

Modern platform for Big Data Analytics

Data Science & Engineering

Analytic Database

Operational Database

Drive Customer Insights

Improve Product & Services Efficiency

Lower Business Risks

Cloudera Enterprise: Fast, Easy, Secure

Business Value

Technology / Use Cases

ONE platform. MANY applications.

2. Business Value of Big Data Analytics

Industry Use Cases – Driving Business Value: Real Customer Use Cases


Financial Services
• Customer 360 • Fraud / Cyber • Compliance • Risk • Operational Data Store • Market Data • Algo Trading • Active Archive

Telco / Media
• Customer 360 • Churn Prediction • Network Optimization • Data Monetization • EDW Augmentation • Media Streaming • Active Archive

Manufacturing / Life Science
• Connected: Car, Plane, Equipment • Agile Supply Chain • Predictive Maintenance • IoT Data-Enabled “Smart Services” • Clinical Trials • Diagnostics

Retail / CPG and Transportation
• Ship to Store • Agile Supply Chain • Next Best Offer • Connected Store • Completed Baskets • IoT – Stores • Active Archiving • Smart Vessel • Customer Loyalty

Government / Health Care
• Border Control • Risk / Intelligence • 360 Taxpayer • Tax Optimization • Cyber Threat • Fraud Prevention • Intelligence • Patient Care • Citizen 360

3. Customer Use Cases

Customer Use Cases of a modern analytic database

IoT Data Characteristics: The Foundation of Hadoop’s Potential

IoT data comes from a variety of different sources:
• Massive volumes of intermittent data streams
• Generated from a variety of data sources
• Predominantly time-series
• Can come in streams (real-time) or batches
• Diverse data structures and schemas
• Some of it may be perishable

Combining sensor data with contextual data is the key to value creation from IoT
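The idea that value comes from combining sensor data with context can be pictured as a join of time-series readings against asset metadata. The sketch below is minimal and makes several assumptions. The record types, the field names (asset_id, line, max_temp_c) and the over-limit rule are all hypothetical. On a Hadoop platform the same join would more likely run in Spark, Hive, or Impala than in plain Python.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reading:
    """One raw sensor reading (hypothetical schema)."""
    asset_id: str
    ts: datetime
    temperature_c: float


@dataclass
class AssetContext:
    """Contextual data about the asset (hypothetical schema)."""
    asset_id: str
    line: str
    max_temp_c: float  # assumed operating limit for this asset


def enrich(readings, contexts):
    """Join readings to asset context in time order and flag limit breaches.

    Readings for assets with no known context are skipped rather than guessed.
    """
    by_id = {c.asset_id: c for c in contexts}
    enriched = []
    for rd in sorted(readings, key=lambda rd: rd.ts):  # time-series order
        ctx = by_id.get(rd.asset_id)
        if ctx is None:
            continue
        enriched.append({
            "asset_id": rd.asset_id,
            "ts": rd.ts,
            "line": ctx.line,
            "temperature_c": rd.temperature_c,
            "over_limit": rd.temperature_c > ctx.max_temp_c,
        })
    return enriched


if __name__ == "__main__":
    readings = [
        Reading("pump-1", datetime(2017, 11, 26, 8, 5), 83.2),
        Reading("pump-2", datetime(2017, 11, 26, 8, 0), 64.0),
        Reading("pump-1", datetime(2017, 11, 26, 8, 0), 71.5),
    ]
    contexts = [AssetContext("pump-1", "Line A", 80.0),
                AssetContext("pump-2", "Line B", 90.0)]
    for row in enrich(readings, contexts):
        print(row["ts"], row["asset_id"], row["line"], row["over_limit"])
```

A bare reading of 83.2 °C carries little meaning on its own. Once it is joined to the asset's limit, it becomes an actionable "over limit on Line A".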


Where Is the Manufacturing Data? Mapping and Consolidation Are the Tip of the Iceberg for Big Data

Devices & Sensors
• Device Readings • Device Performance • Device Diagnostics • Battery / Power Consumption • Software Logs • Environmental Interactions • R&D • Quality / Testing

Plant & Operations
• MES • Sensors • Video / Surveillance • Line Productivity • Machines • Staffing / Scheduling

Supply Chain & Inventory
• ERP • Supplier / Manufacturer • Orders / Receivables • Commodity Supplies / Prices

Marketing & CRM
• Transactions • Accounts • Warranties / Aftermarket • Customer Service Logs • Campaigns / Promotions • Website / SEO • Affiliates / Merchants • Surveys • Competitive Intelligence

Public & Trade
• Market Intelligence • Policy / Regulation • Demographic / Census • Psychographic • Inflation / Macroeconomic • Gas Prices • Labor Statistics • Social / Search • Public Health Data • Clinical Studies • Store Schematics • Journals / Editorial • Seismic / Speculation


Where Is the Retail Data? Mapping and Consolidation Are the Tip of the Iceberg for Big Data

Customer Transactions
• POS / TLOG • E-commerce / Mobile Sales • In-Store Ordering • Memberships / Loyalty Programs • Warranties

Shopper Behavior
• Video / Surveillance • Sensors • Internet of Things

Out-of-Store Behavior
• Social / Sentiment • Clickstreams • Consumer / Consumption

Merchandising & Operations
• Schematic / Displays • Store Layout / Characteristics • Orders / Receipts • Staffing / Scheduling • Retailtainment • Supplier / Manufacturer

Marketing & CRM
• Promotions / Trade • Campaigns / SEO / Affiliate • Direct / Indirect • Customer Support Logs • Surveys • Competitive Intelligence

Public & Trade
• Demographic / Census • Psychographic • Gas Prices • Labor Statistics • Weather Data • Public Health Data • Industry Research


Merchandising

Use Case: First-in-Basket Analysis. Use exploratory analytics (e.g., clickstream) to identify promoted items that drive greater numbers of transactions and larger total transaction size.

Problem: Many Rigid Systems. Complex grid architecture is expensive, inflexible, error-prone, and hard to test. It does not scale to accommodate analysis of millions of SKU combinations.

Solution: Regression Analysis. Impala offsets the latency and constraints of EDWs to expand the data available for merchandise regressions, also driving down the cost of ad hoc modeling.
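First-in-basket analysis can be read as grouping transactions by the item that opened the basket, then comparing the size or value of those baskets with the overall baseline. The sketch below assumes each basket is an ordered item list plus a total value. It is an illustration under that assumption, not the retailer's actual method. At scale, the same aggregation would typically be a GROUP BY in Impala or Spark.

```python
from collections import defaultdict
from statistics import mean


def first_in_basket(baskets, promoted):
    """Summarize baskets by their first item, for promoted items only.

    baskets: list of (ordered item list, total transaction value) pairs.
    Returns {item: (basket_count, average_total)} for promoted items that
    opened at least one basket.
    """
    totals = defaultdict(list)
    for items, total in baskets:
        if items and items[0] in promoted:
            totals[items[0]].append(total)
    return {item: (len(vals), mean(vals)) for item, vals in totals.items()}


if __name__ == "__main__":
    # Hypothetical baskets: (items in the order they were added, total value).
    baskets = [
        (["coffee", "milk", "bread"], 12.50),
        (["coffee", "sugar"], 7.00),
        (["soap"], 3.00),
        (["detergent", "soap", "sponge"], 15.00),
    ]
    baseline = mean(total for _, total in baskets)
    for item, (n, avg) in first_in_basket(baskets, {"coffee", "detergent"}).items():
        print(f"{item}: {n} baskets, average {avg:.2f} vs baseline {baseline:.2f}")
```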


Buying

Use Case: Consumer-Driven Assortment. Overcome SKU rationalization across categories by isolating the products or mix of products that are most indicative of larger baskets or key customer groups.

Problem: Moving Data to Compute. Expanding sources to include broad market data from research, Google, Facebook, Twitter, etc. overwhelms systems built on traditional data warehouses.

Solution: Automate in Real Time. Centralize data from silos: transactions, clickstreams, service logs, social, etc. Find data using Search and build models with Pig, Mahout, Spark, and other analytics tools.
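One simple way to isolate products that are indicative of larger baskets is to compare the mean basket size with and without each product. The sketch below is a crude stand-in for the modeling described on the slide. The lift score, the min_support cutoff, and the sample SKUs are all assumptions.

```python
from statistics import mean


def basket_size_lift(baskets, min_support=2):
    """Rank products by how much larger the baskets containing them are.

    baskets: list of item lists. For every product present in at least
    min_support baskets (and absent from at least one), the score is
    mean basket size with the product / mean basket size without it.
    A score above 1 suggests the product travels with bigger baskets.
    The product itself counts toward basket size, a known bias of this
    crude indicator.
    """
    products = {p for b in baskets for p in b}
    scores = {}
    for p in products:
        with_p = [len(b) for b in baskets if p in b]
        without_p = [len(b) for b in baskets if p not in b]
        if len(with_p) < min_support or not without_p:
            continue
        scores[p] = mean(with_p) / mean(without_p)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical SKUs "a" to "g".
    baskets = [["a", "b", "c", "d"], ["a", "b", "c"], ["b"],
               ["c", "e"], ["a", "e", "f", "g"]]
    for product, lift in basket_size_lift(baskets):
        print(product, round(lift, 2))
```

A production version would also control for category and customer group, which this sketch deliberately ignores.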

Customer 360


Increase Customer Satisfaction

Challenge:
• Disparate view of customers
• Unable to analyze unstructured data cost-effectively and consistently
• Manual and random analysis of web chats and customer sentiment

Solution:
• Analyzing over 250K web chats/month
• Tapping into 100% of unstructured data versus 1% previously
• Discovering valuable patterns in web chats that were previously undetectable
• Reducing customer complaints by 25%
• Roadmap: expand usage to include additional omni-banking channels for a 360 view

RETAIL BANK » CUSTOMER 360
DRIVE CUSTOMER INSIGHTS


Increase Customer Retention, Loyalty and Acquisition Rates

Challenge:
• Fragmented systems and disparate view of customers

Solution:
• Serve trend information back to customers via Santander’s “Spendlytics” application
• Capture, transform and enrich data in near-real time

RETAIL BANK » CUSTOMER 360 » CRM, MARKETING PROGRAMS » FRAUD » RISK » COMPLIANCE – BCBS239
DRIVE CUSTOMER INSIGHTS


DRIVE CUSTOMER INSIGHTS

GLOBAL INFORMATION SERVICES» CUSTOMER 360

Gains insights from customer spend data and behavior-based lifestyle segmentation

Challenge:
• High storage costs; acquiring mass archival data was cost-prohibitive
• Unable to obtain a 360 view of consumer spend data, preferences and behavior, or to tap into new data sources

Solution:
• Using Apache Spark to analyze consumers’ preferences and interests based on their spending behavior patterns, and enhancing spend data with new data sources
• Processing 500% more matches per day
• 50X performance gains
• Deployment in < 6 months

“Nobody is doing what we’re doing with Hadoop today, especially at this order of magnitude. The Experian Marketing Suite’s Identity Manager is the first real-time linkage engine that accepts data, links information together across an entire marketing ecosystem, and puts it into a usable format for a solid customer experience.”

Emad Georgy, SVP Global Software Development, Experian Marketing Services
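The slide does not say which method produced the behavior-based lifestyle segmentation. K-means, shown below in plain Python, is one common choice for this kind of segmentation, and Spark MLlib provides a distributed KMeans. Two things here are assumptions made to keep the sketch small and deterministic: the features (each customer's share of spend on travel and on dining) and the seeding from the first k points.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means on feature tuples.

    Each pass assigns every point to its nearest centroid, then moves each
    centroid to the mean of its members. The first k points seed the
    centroids so the result is deterministic (a simplification; real runs
    usually use random or k-means++ seeding).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical customers: (share of spend on travel, share on dining).
    customers = [(0.9, 0.1), (0.1, 0.9), (0.85, 0.15), (0.8, 0.2), (0.15, 0.85)]
    labels, centroids = kmeans(customers, k=2)
    print(labels)  # [0, 1, 0, 0, 1]
```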


New Products, Gained Insights, and Reduced Costs
• Analyzing data from its 20+ automotive ecosystem brands (e.g., Autotrader, Kelley Blue Book)
• Combining data for new products and offerings that aren't otherwise possible
• Fine-grained, real-time view of activity, responses, inventory, and pricing
• Reduced TCO by 50% by consolidating over 1PB of data, adding 200M rows daily
• “Impala provides analysts with near-Netezza speeds but on the Hadoop cluster”

DRIVE CUSTOMER INSIGHTS


Measure user interaction across the ecosystem, help direct R&D and development spend

• Real-time streaming and batch data from product logs, web analytics, channel data and ERP

• Virtuous cycle: Identify features that facilitate sharing of content that drive new customers

• Analyze utilization of new community attributes that drive adoption

MANUFACTURING» CUSTOMER 360» DATA DRIVEN PRODUCTS» DATA DRIVEN SERVICES

DATA-DRIVEN PRODUCTS


IoT


Predictive Maintenance on Thousands of Industrial Machines in Real Time

Challenge:
• Collect and analyze data from thousands of diverse manufacturing systems in real time

Solution:
• iTrak application using Cloudera in the cloud to monitor the performance of individual manufacturing systems in real time
• Predictive maintenance: proactively identifying and fixing issues before equipment breaks

MANUFACTURING » INDUSTRIAL IoT » PREDICTIVE MAINTENANCE » IMPROVED EFFICIENCIES
Industrial IoT – Predictive Maintenance
DATA-DRIVEN PROCESS
CASE STUDY
DATA-DRIVEN PRODUCTS
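The source does not describe how iTrak predicts failures. One of the simplest predictive-maintenance ideas works like this: fit a trend to a degradation signal, then extrapolate it to the point where it crosses a failure limit, so the fix can be scheduled before that point. The sketch below assumes a hypothetical vibration reading that drifts roughly linearly, along with a known limit. Production systems typically use richer models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need readings at two or more distinct times")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def hours_until_limit(hours, readings, limit):
    """Extrapolate a linear wear trend to estimate hours left before `limit`.

    Returns None when the trend is flat or improving (no predicted crossing),
    and 0.0 when the fitted line has already crossed the limit.
    """
    slope, intercept = fit_line(hours, readings)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - hours[-1])


if __name__ == "__main__":
    # Hypothetical vibration amplitude logged every hour on one machine.
    hours = [0, 1, 2, 3]
    vibration = [1.0, 1.5, 2.0, 2.5]
    print(hours_until_limit(hours, vibration, limit=4.0))  # 3.0
```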


Fraud


LOWER BUSINESS RISKS

MAJOR RETAIL BANK» CYBER SECURITY

Top Retail Bank Uses EDH to Detect and Prevent Malware Attacks

Challenge:
• One malware source on SharePoint took 9 months to find, and re-infection kept occurring
• Unable to determine the source of malware

Solution:
• Uses Cloudera Enterprise to ingest internal network communications, proxy logs, etc., and Apache Spark (machine learning techniques) to build a network graph
• Reduces the spread of malware within the bank and finds the malware entry source
• Mobilized quickly to respond to the “Shellshock” bug


GLOBAL PAYMENT PROCESSOR » REAL-TIME FRAUD DETECTION & PREVENTION » CUSTOMER 360° » ETL OFFLOAD/STORAGE OPTIMIZATION
GLOBAL PAYMENT PROCESSOR
FRAUD
LOWER BUSINESS RISKS

Challenge:
• Spending $1 billion on the EDW environment annually
• Data scientists and statisticians were unable to access more than a year’s worth of data
• Unable to perform faster queries or mine data for fraud and risk factors

Solution:
• Performs real-time fraud detection using Apache Spark and Impala
• Creates and back-tests new fraud models over historic data
• Identified the largest case of fraud in the company’s history
• Ingesting 4TB of data per day
• Using Cloudera Enterprise for ETL offload, resulting in a 10-15% workload reduction, and for EDW optimization with $30M in annual savings
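Back-testing a fraud model means replaying it over historical transactions whose outcome is already known, then measuring what it would have caught and how many false alarms it would have raised. The scoring rule and features below are deliberately simple and hypothetical. The sketch shows only the shape of the evaluation loop, not the payment processor's model; at 4TB per day the loop would run in Spark rather than in plain Python.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    amount: float
    country_mismatch: bool  # hypothetical feature: card country != merchant country
    is_fraud: bool          # label known after the fact (chargeback, investigation)


def score(t):
    """A deliberately simple, hypothetical fraud score in [0, 1]."""
    s = 0.0
    if t.amount > 1000:
        s += 0.5
    if t.country_mismatch:
        s += 0.5
    return s


def backtest(history, threshold):
    """Replay the score over labelled history and report hit rates."""
    flagged = [score(t) >= threshold for t in history]
    tp = sum(f and t.is_fraud for f, t in zip(flagged, history))
    fp = sum(f and not t.is_fraud for f, t in zip(flagged, history))
    fn = sum((not f) and t.is_fraud for f, t in zip(flagged, history))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


if __name__ == "__main__":
    history = [
        Txn(1500, True, True),
        Txn(50, True, False),
        Txn(2000, False, False),
        Txn(20, False, False),
        Txn(30, True, True),
    ]
    for threshold in (1.0, 0.5):
        print(threshold, backtest(history, threshold))
```

Comparing thresholds on the same history shows the trade-off that back-testing is meant to expose. A stricter threshold raises precision but misses fraud, and a looser one catches more fraud at the cost of false positives.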


FRAUD

GLOBAL PAYMENT PROCESSOR» DATA SECURITY» FRAUD DETECTION & PREVENTION» CUSTOMER 360» IT COST REDUCTION

Cloudera Enterprise: First PCI Certified Hadoop Platform

• Performs real-time fraud detection and prevention with Apache Spark and Impala

• Secures 10 PB of data in a PCI-compliant manner every day

• Optimizes EDW and ETL Offload with savings in millions

• MasterCard Advisors partners with Cloudera


LOWER BUSINESS RISKS

REGULATORY AUTHORITY » TRADE SURVEILLANCE

Builds Holistic Picture of US Market By Looking at 30BN Events/Day

Challenge:
• Overseeing transactions from more than 4,100 firms, including exchanges, broker-dealers, and trade reporting facilities
• Difficult and costly to aggregate and analyze the increasing volume of data from numerous sources, including orders, quotes, and trades

Solution:
• Built a market event graph database using EDH
• Provides interactive access to graph data for investigations
• Using EDH on-premise and in the cloud
• Monitoring and analyzing transactions to detect fraud and insider trading and to check short sales and best execution

• Savings of $10-20M annually
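Graph data supports investigations in a simple way: entities that share something are linked (for example, two accounts that trade with each other, or an account and a device), and analysts then examine the connected groups that emerge. The sketch below computes those connected components with union-find. The link data and names are hypothetical, and the sketch does not describe the regulator's graph database.

```python
def find_rings(links):
    """Group entities into connected components (candidate rings).

    links: iterable of (a, b) pairs meaning "a and b are connected".
    Uses union-find with path halving; returns the groups as sets,
    largest first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two components

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    # Hypothetical links: accounts that shared a device or traded together.
    links = [("acct1", "dev9"), ("acct2", "dev9"),
             ("acct3", "acct4"), ("acct2", "acct5")]
    for ring in find_rings(links):
        print(sorted(ring))
```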


4. Solution Architecture


Legacy Approach vs. Hadoop Machine Learning

The Legacy Approach
• Batch File Ingestion: Discover Threats Too Late
• Rules Based: Don’t Discover Zero-Day Attack Methods; False Positive Overload
• Data Silos: No Crime 360 Signals
• Flat-World Forensics: Discover Incidents, Not Crime Rings
• High-Cost Proprietary Architecture: Limited Data Due to Cost Constraints

The Hadoop Machine Learning Approach
• Real-Time Packet Ingestion: Discover in Seconds vs. Hours or Days
• Real-Time Anomaly Detection: Discover 250% to 350% More Fraud; 20 to 30 Times Fewer False Positives
• Enterprise Data Hub: Discover Crime 360 Signals
• Graph-Based Visual Analytics: Discover Crime Rings
• Native Hadoop Architecture: Unlimited Data Storage & Analytics
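The contrast between rules-based and real-time anomaly detection can be illustrated with a running z-score. Rather than testing against a fixed limit, each event is compared with statistics learned from the events that came before it. The sketch below uses Welford's online mean and variance. The threshold, the warm-up length, and the sample amounts are assumptions, and a real deployment would score many features per event, not one.

```python
import math


class RunningStats:
    """Welford's online mean and variance: O(1) update, no stored history."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until there are two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect(stream, z_threshold=3.0, warmup=5):
    """Yield (index, value) for events more than z_threshold standard
    deviations away from the running mean of the events before them."""
    stats = RunningStats()
    for i, x in enumerate(stream):
        if stats.n >= warmup:
            sd = stats.std()
            if sd > 0 and abs(x - stats.mean) / sd > z_threshold:
                yield i, x
        stats.update(x)


def rule_based(stream, limit):
    """Fixed-rule baseline: flag anything above a hard-coded limit."""
    return [(i, x) for i, x in enumerate(stream) if x > limit]


if __name__ == "__main__":
    # Hypothetical card amounts for one account; 180 is unusual for it.
    amounts = [20, 22, 19, 21, 20, 23, 180, 21]
    print(list(detect(amounts)))      # [(6, 180)]
    print(rule_based(amounts, 1000))  # []
```

The fixed rule misses the spike because 180 is "small" in absolute terms. The running statistics flag it because it is unusual for this particular stream.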

© Cloudera, Inc. All rights reserved.

Challenges Architectures: Traditional Architecture
[Figure: Data sources (Portfolio: Contracts; Financial: Ledger, P&L; Risks: Market, Counterparty, Ratings; Payments: Collections, Charges; unstructured data) pass through separate ingest, ETL, process, and load steps involving an HPC grid and two storage tiers (Storage #1, Storage #2) before reaching the Enterprise Data Warehouse (with ELT), operational data stores, and an archive. The EDW serves the applications: BI system, modeling, and reporting.]


Modern Architecture: New Architecture with Big Data
[Figure: Risk data sources (Portfolio: Contracts; Financial: Ledger, P&L; Risks: Market, Counterparty, Ratings; Payments: Collections, Charges; unstructured data) are ingested into the Cloudera Enterprise Data Hub (EDH), which combines storage, compute, and transform and holds active structured data and the archive. The EDH extracts and loads data into the Enterprise Data Warehouse (EDW). Applications (BI system, modeling, reporting) are served from both the EDH and the EDW.]
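A key difference between the two architectures is where transformation happens. In the modern architecture, raw data is first landed in the hub and then transformed there, rather than being reshaped by ETL before it is stored. The sketch below uses Python's built-in sqlite3 only as a stand-in for a SQL engine on the hub such as Hive or Impala. The table and column names are hypothetical, and the in-memory database is discarded at the end, whereas a real hub would keep the raw table for reprocessing.

```python
import sqlite3


def load_then_transform(raw_rows):
    """ELT sketch: land raw records unchanged, then derive a reporting table
    with SQL inside the same store (sqlite3 stands in for Hive/Impala)."""
    con = sqlite3.connect(":memory:")
    try:
        # Load: keep the records exactly as received, amounts still as text.
        con.execute("CREATE TABLE raw_payments (account TEXT, amount TEXT, status TEXT)")
        con.executemany("INSERT INTO raw_payments VALUES (?, ?, ?)", raw_rows)
        # Transform: cast, filter and aggregate only after the data is stored.
        con.execute(
            """
            CREATE TABLE payments_by_account AS
            SELECT account, SUM(CAST(amount AS REAL)) AS total_paid
            FROM raw_payments
            WHERE status = 'SETTLED'
            GROUP BY account
            """
        )
        return con.execute(
            "SELECT account, total_paid FROM payments_by_account ORDER BY account"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    raw = [("A", "100.50", "SETTLED"), ("B", "20", "FAILED"),
           ("A", "49.50", "SETTLED"), ("B", "30", "SETTLED")]
    print(load_then_transform(raw))  # [('A', 150.0), ('B', 30.0)]
```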

http://www.jobs.ac.uk/enhanced/industry/lifesciences-london/

Process and Tools: Bringing the goals to life
• Data exploration
• Data preparation
• Data modelling
• Data visualization
• Machine learning



A. Technology Savvy

● Data management

● Analytics & virtualization

B. Service Oriented

● Architectural design

● System development

● Quality management

● System management

C. Our Experiences

● Massive and real-time data processing

● Advanced data analytics

● Natural language processing (Thai and English)

D. Applications

● Data lake and virtual platform

● Voice of customer management

● Machine learning for personalization and recommendation

E. Team Proficiency

● Data architect

● Data engineer

● Report designer

● Data scientist

F. Our Alliances

● Big data experience center

● Consulting firm & Experts

● MS Partner Development Unit

G-ABLE data and analytics unit


Q&A

Data Warehouse Offloading

Before Hadoop
• No Hub / No Data Lake
• No C360
• Tape – expensive to read data
• Expensive ETL tooling
• Expensive EDW per TB
• No enterprise search
• Long time to get value from data
• Slow to get access to the data
• IT led
• Problems with scale
• Data to analysis – days
• Not petabyte scale
• Logs, clickstream data archived

With Hadoop
• Enterprise Data Hub – governed with security and search
• C360 – all data: web logs, clickstream, active archive
• Cheap – about 1/30 the cost of the EDW; easy to scale; PB+
• Seconds from log and clickstream data to analytics

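Analytics on Hadoop

Analytics – BI and Predictive on all data

[Figure: Structured data (DB2, Oracle, MySQL, cloud) is ingested with Sqoop. Semi-structured data (web logs, clickstream data) is ingested with Flafka (Flume + Kafka). Data is queried with Impala and Hive, reached by BI tools through Hive / ODBC and ODBC / JDBC connectivity, and used for predictive analytics / data science.]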