Large Scale Data Analytics Shankar Radhakrishnan linkedin.com/in/connect2shankar

Post on 18-Jul-2015


Page 1: Large Scale Data Analytics

Large Scale Data Analytics

Shankar Radhakrishnan linkedin.com/in/connect2shankar

Page 2: Large Scale Data Analytics

Scenario

• An insurer uses meteorological data for its pricing model
• At present, data from 2,000 weather stations is collected for analysis
• The plan is to use data from 10,000 weather stations (or more)
• Stochastic simulations need to run to identify patterns in the weather data, to determine pricing
• Volume: petabytes of information (for 1 region)
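As a rough illustration of the stochastic simulation mentioned in the scenario, the sketch below runs a small Monte Carlo simulation in Python. Everything in it is an assumption made for illustration: the gamma-distributed rainfall, the threshold, the payout and the function names are hypothetical and are not the insurer's actual model.

```python
import random
import statistics


def simulate_annual_loss(rng, n_stations, threshold_mm, payout):
    """Simulate one year: each station draws a peak rainfall value and
    triggers a fixed payout when the draw exceeds the threshold."""
    loss = 0.0
    for _ in range(n_stations):
        # Hypothetical rainfall model: gamma-distributed peak rainfall (mm).
        peak_rain = rng.gammavariate(2.0, 40.0)
        if peak_rain > threshold_mm:
            loss += payout
    return loss


def estimate_expected_loss(n_years, n_stations, threshold_mm=200.0,
                           payout=1000.0, seed=42):
    """Average the simulated annual loss over many simulated years."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    losses = [simulate_annual_loss(rng, n_stations, threshold_mm, payout)
              for _ in range(n_years)]
    return statistics.mean(losses), statistics.pstdev(losses)


if __name__ == "__main__":
    mean_loss, spread = estimate_expected_loss(n_years=200, n_stations=2000)
    print(f"expected annual loss ~ {mean_loss:.0f}, std dev ~ {spread:.0f}")
```

Moving from 2,000 to 10,000 stations multiplies the work per simulated year, which is one reason such simulations are usually spread across a cluster rather than run on one machine.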

Page 3: Large Scale Data Analytics

Trends


Page 4: Large Scale Data Analytics

Data Analytics Is Mostly About $$, Customers, Markets


Page 5: Large Scale Data Analytics

How Widespread Is Data Analytics?


Page 6: Large Scale Data Analytics

Expectations On Payback Period ( Aggressive )


Page 7: Large Scale Data Analytics

Large Scale Data Analytics


“Involves using different algorithms, distributed platforms, tools and techniques to analyze big data and provide actionable insights”

Page 8: Large Scale Data Analytics

Big Data

“Data sets that are very large in volume and complex”

New platforms, tools and techniques have emerged to manage Big Data

We broke away from traditional ways to process and analyze them

Page 9: Large Scale Data Analytics

Data Structures

Columns: Vector, Matrix, or Complex Structure | Free Text | Image or Binary Data | Data “bags” | Iterative Logic or Complex Branching | Advanced Analytic Routines | Rapidly Repeated Measurements | Extreme Low Latency | Access to All Data Required

Search Ranking X X X X X X

Ad Tracking X X X X X X X X  

Location or Proximity Tracking X   X X     X X  

Social CRM X X X X X X      X

Document Similarity Testing X X X X X X   X X

Genomic Analysis X X X X X

Customer Cohort groups X X   X X X     X

Fraud Detection X X X X X X X X X

Smart Utility Metering X X X X X X

Churn Analysis X X X X X X   X  

Satellite Image Analysis X X X X

Game Gesture Analysis X X X X X X X X

Data Bag Exploration X X X X X X


Page 10: Large Scale Data Analytics

Business Interests : Well Informed Customer Executive

Speech to Text Conversion
Voice Data
Unstructured Data Analytical System

Customer Persona
• Customer Persona: demographics, top interactions, channel preferences, dissatisfiers
• Customer Lifetime Value
• Recent contact history
• Customer sentiment and trend during the call

Customer’s state of mind
Sentiment Analysis

Social media
Depositions
Complaints
Other channel information (ATM, Branch)

Big Data Warehouse
Traditional Warehouse

Decision Engine
• The Customer Executive Dashboard presents all the intelligence required to make a decision
• The decision engine also presents important decisions to be taken for the particular customer issue

Page 11: Large Scale Data Analytics

Well Informed Customer Executive…

Customer calls Banking Call Center

Executive understands the customer problem

Executive authenticates customer and pulls up Customer Persona

Executive reviews risk of attrition against Customer Lifetime Value

Executive reviews last 5 call center and banking transactions

Executive views customer’s state of mind (risk of attrition) through a barometer chart

Analytical Solution: converts speech to text

Analytical engine listens to customer voice

Analytical engine monitors sentiment

Executive analyzes Customer Persona (demographics / preferences / satisfiers / dissatisfiers, etc.)

Decision Engine: suggested top 5 actions required

Executive performs the actions below based on their analysis and recommendations from the Decision Engine:
1. Reversal of overdraft fee
2. One time fee waiver on Cheque book (predicting customer need based on historic usage cycles)
3. Cash back Reward card for a minimum spend of $X through debit card
4. Offer interest revision for investment products or mortgage
5. Promote new mutual funds or credit cards based on customer willingness

Page 12: Large Scale Data Analytics

Business Interests : Fraud Prevention

Envisaged Benefits
▪ New fraud patterns can be identified by building ‘analytical models’ to run against historical data
▪ ‘Web crawling’, ‘contextual text analysis’ and ‘natural language processing’ allow fraud behavior to be identified from social media, which may increase the fraud detection success rate
▪ ‘Real time’ models capture behavioral patterns and analyze them against history data to evaluate fraud case validity. The model learns by itself and updates the ‘fraud pattern master sets’
▪ Brings ‘artificial intelligence’ to fraud pattern detection and analysis
▪ ‘Real time’ alerts (refresh rate in the order of 0.5 to 1 minute) to fraud analysts about ‘self learned’ fraud patterns based on new customer behavior patterns

Big Data Usage
▪ Formation of key value groups in the order of XcY (X choose Y, where X is the number of attributes relevant to fraud and Y is the number of attributes that should be combined to identify patterns)
▪ High speed loading of history data from source systems
▪ Efficient real time fraud detection by identifying patterns through customer behavioral events and processing them over X years of history data, e.g. using HBase

Scenario
Formation of fraud pattern reference tables using
▪ Real time data coming from different departments such as IVR, web, customer profile, transactions, etc.
▪ Real time mining and analysis of history data to form prior patterns (several years of data, in the range of 50-100 TB)
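To make the XcY key value groups more concrete, here is a minimal Python sketch that enumerates every group of Y attributes out of X and builds a composite key for one event. The attribute names, the sample event and the helper names `key_groups` and `build_pattern_key` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import comb

# Hypothetical fraud-relevant attributes (X = 5); not taken from the slides.
ATTRIBUTES = ["channel", "amount_band", "merchant_type", "geo", "hour_of_day"]


def key_groups(attributes, y):
    """Return every group of `y` attributes that could be combined into a key."""
    return list(combinations(attributes, y))


def build_pattern_key(event, group):
    """Build a composite key for one event from the attributes in `group`."""
    return "|".join(f"{name}={event[name]}" for name in group)


if __name__ == "__main__":
    groups = key_groups(ATTRIBUTES, 2)
    # The number of groups matches X choose Y.
    print(len(groups), comb(len(ATTRIBUTES), 2))
    event = {"channel": "web", "amount_band": "high", "merchant_type": "travel",
             "geo": "US", "hour_of_day": 3}
    print(build_pattern_key(event, groups[0]))
```

Because the number of groups grows combinatorially with X and Y, a key value store is a natural fit for holding one pattern entry per composite key.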

Page 13: Large Scale Data Analytics

Fraud Pattern Detection…

Legacy Fraud Data
Customer Profile Data
IVR Audio Data
Web / Online
Card Transaction Data

Fraud Pattern Master Table
Fraud Analyst

History data processing to determine fraud patterns over X years
Real-time customer behavior analysis for fraud detection
Customer Behavior Change Events

Real time analysis of behavior patterns over historical data
Real time update to Master Table on new fraud patterns
Real time alert to Fraud Analyst

RDBMS, RDBMS (JSON Files), RDBMS

Page 14: Large Scale Data Analytics

Fraud Prevention…


Page 15: Large Scale Data Analytics

Benefits

Benefits by Industry

Financial Services
▪ Customer Insights – Integrating transactional data (CRM/Payments) and unstructured social feeds
▪ Regulatory Compliance – Risk exposures across asset classes, LOBs and firms
▪ Fraud Detection in Credit Cards & Financial Crimes (AML) in Banks

Travel, Hospitality & Retail
▪ Customer Centricity – Customer behavior analysis from omni-channel retailing & social feeds
▪ Markdown Optimization – Improve markdown based on actual customer buying patterns
▪ Market Basket Analysis – Narrow down market basket analysis by demographics

Life Science
▪ Improve Targeting & Predictions – Automatic detection of Adverse Drug Effects (ADEs)
▪ Patient Data Analysis – Longitudinal Patient Data (LPD) analysis
▪ Predictive Sciences – Analyze preclinical side effect profiles of marketed drugs

Healthcare (Payers & Providers)
▪ Cost of Care – Drug effectiveness & cost of care analysis based on Electronic Medical Records (EMR)
▪ Self Service Healthcare – Increase in mHealth & eHealth to allow consumer access to health information
▪ Claims Analytics – Analyze insurance claims data for fraud detection & preferred treatment plans

Communication, Media & Entertainment
▪ Discover churn patterns based on Call Data Records (CDRs) and activity in subscribers’ networks
▪ Digital Asset Management (DAM) – Analyze & capitalize on digital data assets

Manufacturing
▪ Proactive Maintenance & Recommendation – Sensor monitoring for automobiles, buildings & machinery
▪ Energy Efficiency – Leveraging smart meters for utility energy consumption
▪ Location or Proximity Tracking – Location based analytics using GPS data

Hi-Tech
▪ Extend and complement the conventional information supply chain with a big data path
▪ Predictive analysis and real time decision support

Page 16: Large Scale Data Analytics

Hadoop


Page 17: Large Scale Data Analytics

Hadoop - HDFS


Page 18: Large Scale Data Analytics

Hadoop - MapReduce


Page 19: Large Scale Data Analytics

Hadoop - MapReduce

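The MapReduce model can be sketched without a cluster. The example below is a plain Python imitation of the map, shuffle and reduce phases and does not use the Hadoop API; the record layout, the station IDs and the function names are assumptions made for illustration. It computes the maximum temperature per weather station.

```python
from collections import defaultdict

# Hypothetical input records: "station_id,temperature_celsius".
RECORDS = [
    "ST001,21.5",
    "ST002,18.0",
    "ST001,25.1",
    "ST003,30.2",
    "ST002,19.4",
]


def map_phase(record):
    """Map: parse one record and emit a (station_id, temperature) pair."""
    station_id, temperature = record.split(",")
    yield station_id, float(temperature)


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, max(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

In a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle; only the two user-written functions carry the business logic.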

Page 20: Large Scale Data Analytics

Apache Spark


Spark

Iterative Processing

Batch Processing

Machine Learning

SQL

Stream Processing

Graph Processing

Page 21: Large Scale Data Analytics

Hadoop


Page 22: Large Scale Data Analytics

NoSQL Databases


Page 23: Large Scale Data Analytics

NoSQL Databases


Page 24: Large Scale Data Analytics

Modern Data Architecture


Page 25: Large Scale Data Analytics

Lambda Architecture


Page 26: Large Scale Data Analytics

Lambda Architecture

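A Lambda Architecture is commonly described as a batch layer, a speed layer and a serving layer. The sketch below shows that split in plain Python under simplifying assumptions: the event shape, the `SpeedLayer` class and the `query` helper are hypothetical, and a real deployment would use distributed storage and stream processing rather than in-memory counters.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute the view from the full, immutable master dataset."""
    return Counter(event["station"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally track events not yet covered by a batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        self.realtime_view[event["station"]] += 1

    def reset(self):
        # Called once a new batch view has absorbed these events.
        self.realtime_view.clear()


def query(station, batch, speed):
    """Serving layer: answer a query by merging batch and real-time views."""
    return batch[station] + speed.realtime_view[station]


if __name__ == "__main__":
    history = [{"station": "ST001"}, {"station": "ST001"}, {"station": "ST002"}]
    batch = batch_view(history)
    speed = SpeedLayer()
    speed.on_event({"station": "ST001"})  # arrives after the last batch run
    print(query("ST001", batch, speed))
```

The design choice is that the batch view is always rebuilt from raw data, so errors in the speed layer are corrected at the next batch run.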

Page 27: Large Scale Data Analytics

Data Analytics Lifecycle


Page 28: Large Scale Data Analytics

Analytics - Trends

• Big Data Analytics in the Cloud
• AWS, AWS-Redshift
• Hadoop
• Enterprise Data Operating System
• Data Analytics Platform
• SQL on Hadoop
• NoSQL
• IoT (Internet of Things)
• Multi-polar Analytics
• Predictive Analytics (Spark)
• In-memory Analytics
• Data Lake
• Deep Learning
• Machine Learning
• Neural Networks
• Data Monetization

Page 29: Large Scale Data Analytics

Q & A

Page 30: Large Scale Data Analytics

Thank You!

“Any Sufficiently Advanced Technology Is Indistinguishable From Magic”

- Arthur C. Clarke