Transcript
Page 1: Entering the Data Analytics industry

UPGRAD WORKSHOP

10TH DEC’16 @HYD

ENTERING THE

DATA ANALYTICS INDUSTRY

B GANES KESARI

VP, GRAMENER

Page 2: Entering the Data Analytics industry

DATA ANALYTICS?

WHAT’S THE BUZZ AROUND ANALYTICS

Page 3: Entering the Data Analytics industry

We have internal information. Getting

information from outside is our challenge. There’s no

way of doing that.

– Senior Editor, Leading Media Company

Page 4: Entering the Data Analytics industry

INDIA’S RELIGIONS

Page 5: Entering the Data Analytics industry

AUSTRALIA’S RELIGIONS

Page 6: Entering the Data Analytics industry

Page 7: Entering the Data Analytics industry

WHAT ARE PEOPLE LOOKING FOR IN DATA ANALYTICS?

USA: data analytics jobs, data analytics tools, data analytics salary, data analytics training

India: data analytics courses, data analytics tools, data analytics jobs, data analytics companies

Legend: Jobs & Salary | Tools | Companies | Training & Courses

Source: https://google.com, https://google.co.in

Page 8: Entering the Data Analytics industry

WHAT’S THE POPULARITY OVER TIME?

“Data Analytics”

Source: https://trends.google.com/

Page 9: Entering the Data Analytics industry

WHICH CITIES HAVE INTEREST IN DATA ANALYTICS?

Source: https://trends.google.com/

Cities shown: Gurgaon, Pimpri-Chinchwad, Noida, Bengaluru, Hyderabad, Chennai, Singapore, Mumbai, San Francisco, Dublin, Boston, Washington, Pune, Howrah, Toronto, New York, Sydney, New Delhi, Chicago, Melbourne

Page 10: Entering the Data Analytics industry

WHAT’S THE STATE OF THE

DATA ANALYTICS JOBS

Page 11: Entering the Data Analytics industry

WHO’S RECRUITING THE TEAMS?

IBM India

Accenture

JPMorgan

KPMG

Concentrix Daksh

Microsoft India

Ernst & Young

UnitedHealth Group

Shell India Markets

Amazon Dev Centre

GE India Technology

Hewlett-Packard

Deloitte

Cisco Systems

WNS

Xerox

eClerx Services

Mphasis

AIG Analytics

Sapient Consulting

#Jobs

Source: https://www.naukri.com

Page 12: Entering the Data Analytics industry

WHAT INDUSTRIES USE DATA ANALYTICS?

Software

Banking, Financial Services

Internet, Ecommerce

KPO, Research, Analytics

BPO, Call Centre, ITES

Recruitment, Staffing

Strategy Mgmt Consulting

Media & Entertainment

Advertising & PR

Accounting & Finance

Telecom, ISP

Education, Teaching & Training

Pharma, Biotech & Clinical Research

Insurance

FMCG, Foods & Beverage

Source: https://www.naukri.com

Page 13: Entering the Data Analytics industry

WHAT DO THEY PAY?

0-3 Lakhs

3-6 Lakhs

6-10 Lakhs

10-15 Lakhs

15-25 Lakhs

25-50 Lakhs

50-75 Lakhs

75-100 Lakhs

100+ Lakhs

Source: https://www.naukri.com

Page 14: Entering the Data Analytics industry

WHERE ARE THE DATA ANALYTICS JOBS?

Source: https://www.naukri.com

Bengaluru

Delhi NCR

Mumbai

Gurgaon

Hyderabad

Others

Pune

Noida

Chennai

Delhi

Page 15: Entering the Data Analytics industry

WHO ARE THE BIG PLAYERS IN THIS SPACE?

Source: Gartner BI Magic Quadrant

Page 16: Entering the Data Analytics industry

WHICH STARTUPS OFFER DATA ANALYTICS IN INDIA?

Source: https://angel.co/... and more

Page 17: Entering the Data Analytics industry

WHY DATA ANALYTICS?

WHAT’S CAUSING ALL THIS BUZZ

Page 18: Entering the Data Analytics industry

CLASSES OF ANALYTICAL SOLUTIONS

Mode | Approach to data | Benefits
Proactive Action | What should I do to achieve my goal? | Data products, data validated actions, increased success rate of strategic initiatives
Proactive Decisions | What is likely to happen? | Support for strategic initiatives, forward looking decision making
Proactive Consumption | Why did it happen? | Significant improvements from status quo, data backed management
Active | What happened? | Marginal business benefits, process gap identification

Page 19: Entering the Data Analytics industry

CLASSES OF ANALYTICAL SOLUTIONS

Mode | Approach to data | Benefits
Proactive Action
Proactive Decisions
Proactive Consumption
Active | Operational Reporting for measurement of efficiency & compliance | Marginal business benefits, process gap identification

Page 20: Entering the Data Analytics industry

TIMES NOW COVERAGE HAD

80%+ VIEWERSHIP

Page 21: Entering the Data Analytics industry

CLASSES OF ANALYTICAL SOLUTIONS

Mode | Approach to data | Benefits
Proactive Action
Proactive Decisions
Proactive Consumption | Root Cause Analysis, Benchmarking and multi-dimensional analysis | Significant improvements from status quo, data backed management
Active

Page 22: Entering the Data Analytics industry

DETECTING FRAUD

“ We know meter readings are incorrect, for various reasons.

We don’t, however, have the concrete proof we need to start the process of meter reading automation.

Part of our problem is the volume of data that needs to be analysed. The other is the inexperience in tools or analyses to identify such patterns.

ENERGY UTILITY

Page 23: Entering the Data Analytics industry

This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries.

This clearly shows collusion of some form with the customers.

Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
217     219     200     200     200     200     200     200     200     350     200     200
250     200     200     200     201     200     200     200     250     200     200     150
250     150     150     200     200     200     200     200     200     200     200     150
150     200     200     200     200     200     200     200     200     200     200     50
200     200     200     150     180     150     50      100     50      70      100     100
100     100     100     100     100     100     100     100     100     100     110     100
100     150     123     123     50      100     50      100     100     100     100     100
0       111     100     100     100     100     100     100     100     100     50      50
0       100     27      100     50      100     100     100     100     100     70      100
1       1       1       100     99      50      100     100     100     100     100     100

This happens with specific customers, not randomly. Here are such customers’ meter readings.

Section    Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
Section 1  70%     97%     136%    65%     110%    116%    121%    107%    114%    88%     74%     109%
Section 2  66%     92%     66%     87%     70%     64%     63%     50%     58%     38%     41%     54%
Section 3  90%     46%     47%     43%     28%     31%     50%     32%     19%     38%     8%      34%
Section 4  44%     24%     36%     39%     21%     18%     24%     49%     56%     44%     31%     14%
Section 5  4%      63%     -27%    20%     41%     82%     26%     34%     43%     2%      37%     15%
Section 6  18%     23%     30%     21%     28%     33%     39%     41%     39%     18%     0%      33%
Section 7  36%     51%     33%     33%     27%     35%     10%     39%     12%     5%      15%     14%
Section 8  22%     21%     28%     12%     24%     27%     10%     31%     13%     11%     22%     17%
Section 9  19%     35%     14%     9%      16%     32%     37%     12%     9%      5%      -3%     11%

If we define the “extent of fraud” as the percentage excess of the 100-unit meter reading, the value varies considerably across sections and over time

New section manager arrives

… and is transferred out

… with some explainable anomalies.

Why would these happen?

Simple histograms have been applied to manage ALM compliance, fraud in corporate directorships, and collusion in schools
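
The histogram check described on this slide can be sketched in a few lines of Python. This is only an illustration, not the method used in the original analysis: the function name boundary_excess, the window of 5 neighbouring values used as the baseline, and the sample readings are all assumptions. It counts how many readings fall exactly on a slab boundary and compares that with the average count at nearby values.

```python
from collections import Counter


def boundary_excess(readings, boundary, window=5):
    """Percentage excess of readings exactly at `boundary` over the
    average count of the neighbouring values within `window` units.
    The neighbour-average baseline is an assumption for illustration."""
    counts = Counter(readings)
    neighbours = [counts[boundary + d] for d in range(-window, window + 1) if d != 0]
    expected = sum(neighbours) / len(neighbours)
    if expected == 0:
        # No neighbouring readings at all: any reading at the boundary is unbounded excess.
        return float("inf") if counts[boundary] else 0.0
    return 100.0 * (counts[boundary] - expected) / expected


# Hypothetical readings: one each for 95..105 units, plus three extra at exactly 100.
readings = list(range(95, 106)) + [100, 100, 100]
excess = boundary_excess(readings, boundary=100)
print(f"Excess at the 100-unit boundary: {excess:.0f}%")
```

Running the same function per section and per month would give a grid comparable in shape to the section table above, though the exact baseline behind those percentages is not stated on the slide.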

Page 24: Entering the Data Analytics industry

What do the children in schools know and can do at different stages of elementary education?

Have the inputs made into the elementary education system had a beneficial effect or not?

Page 25: Entering the Data Analytics industry

HAVING BOOKS IMPROVES READING ABILITY

Having more books at home improves the performance of children when it comes to reading. (But children typically have only 1-10 books at home.)

Number of students sampled

What is the impact? How many more marks can having more books fetch?

Circle size indicates number of students with this response. Few students have no books.

Is this response (“25+ books”) good or bad? Small red bars indicate low marks. Large green bars indicate high marks. Students having 25+ books tend to score high marks.

The most common response is marked in blue. This is also the circle.

The graphic is summarized in words

Indicates whether the best response is the most popular. Blue means that it is not. Green means that it is. Red means that the worst level is the most popular response.

Page 26: Entering the Data Analytics industry

HAVING MORE SIBLINGS DOESN’T HELP READING

Children with 1 sibling do much better than children with many siblings

Page 27: Entering the Data Analytics industry

… BUT HELPS A LOT IN MATHEMATICS

Children with 4+ siblings do very well, children with 1 sibling fare poorly

Page 28: Entering the Data Analytics industry

TUITIONS HELP A LITTLE

… BUT NOT CHILDREN WITH 4+ SIBLINGS

Page 29: Entering the Data Analytics industry

TUITIONS HELP A LITTLE

… BUT NOT CHILDREN OF ILLITERATE PARENTS

Page 30: Entering the Data Analytics industry

CHILDREN LIKE GAMES, AND THEY’RE GOOD

… but playing daily hurts reading ability

Page 31: Entering the Data Analytics industry

CLASSES OF ANALYTICAL SOLUTIONS

Mode | Approach to data | Benefits
Proactive Action
Proactive Decisions | Statistical Analysis thru Segmentation, Decision Trees and Cause-effect Modelling | Support for strategic initiatives, forward looking decision making
Proactive Consumption
Active

Page 32: Entering the Data Analytics industry

Telecommunication

“How to predict customer churn at least a month ahead”

Page 33: Entering the Data Analytics industry

CLASSES OF ANALYTICAL SOLUTIONS

Background & Objective

Customer churn is a well-known problem in the telecom industry today. One of the leading telecom operators in the country wanted to predict the churn rate 2/4 weeks in advance using an analytical model.

Gramener Approach

Exploratory Analysis & influencers: Exploratory business analysis performed to identify influencers & create additional derived metrics & derived dimensions

Linear Discriminant Parameters: Using selective metrics, models were built on Linear Classification like Decision trees, Linear Discriminant Parameters

Non-Linear Models: Using selective metrics, non-linear families of models were built: Neural Networks, Random Forests & Support Vector Machines

Predictive Intervention:
• The best model was implemented & compared with a control set
• Targeted promotions for predicted set yielded ~60% reduction in churn

Page 34: Entering the Data Analytics industry

MODEL BUILDING & FINE-TUNING

Models Deployed
• Pair-wise correlation
• Multi-linear regression
• Linear Discriminant Analysis
• Decision Tree
• Support Vector Machines
• Neural Networks
• Random Forest

Other Variability
• Predict Duration
• Ageing of model

Input Metrics - Customer
• Incoming & Outgoing Minutes
• Incoming & Outgoing Calls
• Daily Mobile Usage
• Closing Balance
• Customer activation date

Input - Derived & Growth Metrics
• Last/Average Closing balance in a month
• Days since the last Outgoing Call
• Days since the last Recharge
• Total Decrement
• Monthly Refill Amount
• Total Minutes incl. Incoming & Outgoing
• Percentage of Incoming Minutes
• Recharge Values
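
A few of the derived metrics listed above can be computed from raw customer fields as in the sketch below. The record layout, the field names and the sample customer are hypothetical; the slide names the metrics but does not show how they were calculated.

```python
from datetime import date


def derived_metrics(customer, as_of):
    """Compute a handful of derived churn inputs for one customer record.
    The dictionary keys used here are assumed for illustration."""
    total_minutes = customer["incoming_minutes"] + customer["outgoing_minutes"]
    return {
        "days_since_last_outgoing_call": (as_of - customer["last_outgoing_call"]).days,
        "days_since_last_recharge": (as_of - customer["last_recharge"]).days,
        "total_minutes": total_minutes,
        # Guard against customers with no usage at all.
        "pct_incoming_minutes": (
            100.0 * customer["incoming_minutes"] / total_minutes if total_minutes else 0.0
        ),
    }


# Hypothetical customer record.
customer = {
    "incoming_minutes": 300,
    "outgoing_minutes": 100,
    "last_outgoing_call": date(2016, 11, 20),
    "last_recharge": date(2016, 11, 1),
}
metrics = derived_metrics(customer, as_of=date(2016, 12, 1))
print(metrics)
```

A long gap since the last outgoing call or recharge, or a rising share of incoming minutes, is the kind of signal these metrics are meant to expose to the models.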

Page 35: Entering the Data Analytics industry

MODEL: Base

MISSED: 8.3% | WASTED: 0.0% | COST PER CUST.: 6.61 | IMPROVEMENT: 0.0%

Actual \ Prediction | No churn | Churn
No churn | OK | WASTED (Marketing cost Rs 40)
Churn | MISSED (Acquisition cost Rs 80) | OK

Page 36: Entering the Data Analytics industry

MODELS: Random Forest/SVM/etc

MISSED: ~1-2% | WASTED: ~2-3% | COST PER CUST.: ~2.0-3.0 | IMPROVEMENT: ~40-50%
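
The cost-per-customer figures can be approximated with a short calculation. The formula below is an assumption, since the slides do not state it: each missed churner is charged the Rs 80 acquisition cost and each wasted promotion the Rs 40 marketing cost. The model rates of 1.5% and 2.5% are simply midpoints of the ranges shown.

```python
MARKETING_COST_RS = 40    # spent on a customer wrongly predicted to churn ("wasted")
ACQUISITION_COST_RS = 80  # spent replacing a churner the model did not flag ("missed")


def cost_per_customer(missed_rate, wasted_rate):
    """Expected cost per customer under the assumed formula:
    missed share x acquisition cost + wasted share x marketing cost."""
    return missed_rate * ACQUISITION_COST_RS + wasted_rate * MARKETING_COST_RS


# Base case from the slide: 8.3% missed, nothing wasted.
base_cost = cost_per_customer(missed_rate=0.083, wasted_rate=0.0)

# Midpoints of the ~1-2% missed and ~2-3% wasted ranges (an assumption).
model_cost = cost_per_customer(missed_rate=0.015, wasted_rate=0.025)

print(f"Base: Rs {base_cost:.2f} per customer, model: Rs {model_cost:.2f} per customer")
```

Under this assumption the base case comes to about Rs 6.6, close to the 6.61 on the slide, and the model case falls inside the ~2.0-3.0 range.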

Page 37: Entering the Data Analytics industry

CLASSES OF ANALYTICAL SOLUTIONS

Mode | Approach to data | Benefits
Proactive Action | Data driven decision making, thru advanced mathematical models and scenario planning | Data products, data validated actions, increased success rate of strategic initiatives
Proactive Decisions
Proactive Consumption
Active

Page 38: Entering the Data Analytics industry

HEURISTICS

EMERGENCY

“ A man is rushed to a hospital in the throes of a heart attack.

The nurse needs to decide whether the victim should be admitted into emergency care.

Although this decision can save or cost a life, the nurse must decide using only the available cues, and within a few seconds – preferably using some fancy statistical software package.

Page 39: Entering the Data Analytics industry

HEURISTICS

EMERGENCY

Pressure < 91? (No / Yes)

Age > 62? (No / Yes)

Pulse > 100? (No / Yes)
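
The three questions above form a short decision tree that can be written as plain conditionals. The transcript shows only the questions, not where each branch leads, so the outcomes in this sketch are assumptions made for illustration: low pressure is treated as an immediate admit, and otherwise only older patients with a fast pulse are admitted.

```python
def needs_emergency_care(pressure, age, pulse):
    """Three-question triage heuristic.
    The thresholds come from the slide; the branch outcomes are assumed."""
    if pressure < 91:
        return True       # assumed: low pressure alone means admit
    if age <= 62:
        return False      # assumed: younger patient with normal pressure is low risk
    return pulse > 100    # assumed: older patient is admitted only if pulse is high


# Hypothetical patients.
print(needs_emergency_care(pressure=85, age=40, pulse=70))
print(needs_emergency_care(pressure=120, age=70, pulse=110))
print(needs_emergency_care(pressure=120, age=70, pulse=80))
```

The appeal of such a heuristic is that each question can end the decision early, so at most three cues are ever needed.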

Page 40: Entering the Data Analytics industry

VISUAL ANALYTICS IS IMPERATIVE FOR

ANALYTICS → INSIGHTS → ACTION

Spot the unusual | Communicate patterns | Simplify decisions

Page 41: Entering the Data Analytics industry

SKILLS & ROLES

THAT YOU SHOULD PICK UP

Page 42: Entering the Data Analytics industry

SO, WHAT’S THE SKILL NEEDED TO CREATE THESE?

Deep Domain Expertise

Visual Design & Presentation

Deep Programming

Statistics & Machine Learning

Passion for Numbers

Domain Orientation

Page 43: Entering the Data Analytics industry

…AND WHAT ARE THE ROLES AVAILABLE?

Deep Domain Expertise

Visual Design & Presentation

Deep Programming

Statistics & Machine Learning

Passion for Numbers

Domain Orientation

Data Scientist

Page 44: Entering the Data Analytics industry

SO, WHAT’S THE SKILL NEEDED TO CREATE THESE?

Deep Domain Expertise

Visual Design & Presentation

Deep Programming

Statistics & Machine Learning

Passion for Numbers

Domain Orientation

Functional Consultant

Page 45: Entering the Data Analytics industry

SO, WHAT’S THE SKILL NEEDED TO CREATE THESE?

Deep Domain Expertise

Visual Design & Presentation

Deep Programming

Statistics & Machine Learning

Passion for Numbers

Domain Orientation

Information Designer

Page 46: Entering the Data Analytics industry

SO, WHAT’S THE SKILL NEEDED TO CREATE THESE?

Deep Domain Expertise

Visual Design & Presentation

Deep Programming

Statistics & Machine Learning

Passion for Numbers

Domain Orientation

Data Analyst

Page 47: Entering the Data Analytics industry

SO, WHAT’S THE SKILL NEEDED TO CREATE THESE?

Deep Domain Expertise

Visual Design & Presentation

Deep Programming

Statistics & Machine Learning

Passion for Numbers

Domain Orientation

Data Scientist

Functional Consultant

Information Designer

Data Analyst

Page 48: Entering the Data Analytics industry

TOOLS & SOFTWARE

THAT YOU SHOULD BE LOOKING AT

Page 49: Entering the Data Analytics industry

THE DATA SCIENCE TOOLKIT

Alteryx, Amazon EC2, Azure ML, BigQuery, Birst, Caffe, Cassandra, Cloud Compute, Cloudera, Cognos, CouchDB, D3, Decision tree, ElasticSearch, Excel, Gephi,

ggplot2, Hadoop, HP Vertica, IBM Watson, Impala, Julia, Jupyter Notebook, Kafka, Kibana, Kinesis, Lambda, Leaflet, Logstash, MapR, MapReduce, Matplotlib, Microstrategy,

MongoDB, NodeXL, Pandas, Pentaho, Pivotal, PowerPoint, Power BI, Qlikview, R, R Studio, Random Forest, Redis, Redshift, Regression, Revolution R, S3, SAP Hana,

SAS, Spark, Spotfire, SPSS, SQL Server, Stanford NLP, Storm, SVM, Tableau, TensorFlow, Teradata, Theano, Thrift, Torch, Weka, Word2Vec

The tool does not matter. A person’s skill with the tool does.

Pick the person. Let them pick the tool.

Page 50: Entering the Data Analytics industry

TRAINING & COURSES

THAT WILL HELP YOU ENTER THE INDUSTRY

Page 51: Entering the Data Analytics industry

SELF-LEARNING

TAILORED COURSES

LEARN ON THE JOB

Page 52: Entering the Data Analytics industry

GRAMENER

CONSULTING | SERVICES | PRODUCTS

[email protected] | @KESARITWEETS

