'Visual Intelligence' by Ganes Kesari, at Hyderabad Analytics Club

VISUAL INTELLIGENCE Ganes Kesari B, VP, Gramener

Uploaded by gramener on 16-Apr-2017


Category: Data & Analytics



TRANSCRIPT

Page 1

VISUAL INTELLIGENCE

Ganes Kesari B, VP, Gramener

Page 2

WHY VISUALISE?

Page 3

100 YEARS OF INDIA’S WEATHER

[Chart: years 1901 to 2001 by month, Jan to Dec]

http://www.youtube.com/watch?v=WT0Aq41BaOQ

Page 4

THERE ARE MANY WAYS TO AID DATA CONSUMPTION

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

[Chart axes: effort required of the Creator and of the Consumer, each from low to high]

Page 5

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

Page 6

INDIA’S RELIGIONS

Page 7

AUSTRALIA’S RELIGIONS

Page 8

https://gramener.com/search/#questions/how-to-

Page 9

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

Simplifying access to data is a big win

Page 10

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

Page 11

DETECTING FRAUD

“We know meter readings are incorrect, for various reasons.

We don’t, however, have the concrete proof we need to start the process of meter reading automation.

Part of our problem is the volume of data that needs to be analysed. The other is our inexperience with the tools or analyses needed to identify such patterns.”

ENERGY UTILITY

Page 12

AN ENERGY UTILITY DETECTED BILLING FRAUD

This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the slab boundaries.

Below is a simple histogram (or frequency distribution) of usage levels. Each bar represents the number of customers with a specific bill amount (in units, or kWh).

Tariffs are based on the usage slab. Someone with 101 units is billed in full at a higher tariff than someone with 100 units. So people have a strong incentive to stay at or within a slab boundary.

An energy utility (with over 50 million subscribers) had 10 years’ worth of customer billing data available.

Most fraud detection software failed to load the data, and sampled data revealed little or no insight.

This can happen in one of two ways.

First, people may be monitoring their usage very carefully, and turn off their lights and fans the instant their usage hits the slab boundary.

Or, more realistically, there’s probably some level of corruption involved, where customers pay a small sum to the meter reading staff to ensure that it stays exactly at the slab boundary, giving them the advantage of a lower price.
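The slab-boundary check can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method the utility or Gramener used: readings are integer units (kWh), every boundary other than 100 units is hypothetical, and a boundary is flagged when its count exceeds three times the average count of readings within five units either side. The data is synthetic.

```python
from collections import Counter
import random


def boundary_spikes(readings, boundaries, window=5, ratio=3.0):
    """Flag slab boundaries where readings pile up far above nearby values."""
    counts = Counter(readings)  # histogram: reading value -> number of readings
    flagged = {}
    for b in boundaries:
        # Local baseline: mean count of the values within +/- window units,
        # excluding the boundary value itself.
        neighbours = [counts[b + d] for d in range(-window, window + 1) if d != 0]
        baseline = sum(neighbours) / len(neighbours)
        # max(..., 1.0) stops near-empty parts of the histogram from being
        # flagged on a handful of readings.
        if counts[b] > ratio * max(baseline, 1.0):
            flagged[b] = (counts[b], baseline)
    return flagged


# Synthetic data: smooth usage around 120 units, plus readings "held" at 100.
random.seed(0)
readings = [int(random.gauss(120, 40)) for _ in range(50_000)]
readings = [r for r in readings if r > 0]
readings += [100] * 2_000
print(boundary_spikes(readings, boundaries=[50, 100, 200, 300]))
```

Comparing each boundary with its immediate neighbours, rather than with the overall average, keeps the check meaningful even though the usage distribution itself is skewed.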

Page 13

PREDICTING MARKS

“What determines a child’s marks?

Do girls score better than boys?

Does the choice of subject matter?

Does the medium of instruction matter?

Does community or religion matter?

Does their birthday matter?

Does the first letter of their name matter?”

EDUCATION

Page 14

TN CLASS X: ENGLISH

[Histogram: number of students (y-axis, 0 to 40,000) by marks scored (x-axis, 0 to 99)]

Page 15

TN CLASS X: SOCIAL SCIENCE

[Histogram: number of students (y-axis, 0 to 40,000) by marks scored (x-axis, 0 to 99)]

Page 16

TN CLASS X: MATHEMATICS

[Histogram: number of students (y-axis, 0 to 40,000) by marks scored (x-axis, 0 to 99)]

Page 17

[Chart of names: Jain, Harini, Shweta, Sneha, Pooja, Ashwin, Shah, Deepti, Sanjana, Varshini, Ezhumalai, Venkatesan, Silambarasan, Pandiyan, Kumaresan, Manikandan, Thirupathi, Agarwal, Kumar, Priya]

https://gramener.com/names/

Page 18

Based on the results of the 20 lakh students who took the Class XII exams in Tamil Nadu over the last 3 years, it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200.

June-borns score the lowest

The marks shoot up for Aug-borns

… and peak for Sep-borns

120 marks out of 1,200 explainable by month of birth

An identical pattern was observed in 2009 and 2010…

… and across districts, gender, subjects, and class X & XII.

“It’s simply that in Canada the eligibility cut-off for age-class hockey is January 1. A boy who turns ten on January 2, then, could be playing alongside someone who doesn’t turn ten until the end of the year—and at that age, in preadolescence, a twelve-month gap in age represents an enormous difference in physical maturity.”

-- Malcolm Gladwell, Outliers
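The month-of-birth effect comes down to averaging marks per birth month and measuring the gap between the lowest- and highest-scoring months. A minimal sketch, assuming each record is a hypothetical (birth_month, total_marks) pair with totals out of 1,200; the toy records are invented for illustration and are not the Tamil Nadu results.

```python
from collections import defaultdict
from statistics import mean


def marks_by_birth_month(records):
    """Average total marks per birth month (1 = Jan ... 12 = Dec).

    records: iterable of hypothetical (birth_month, total_marks) pairs.
    """
    groups = defaultdict(list)
    for month, marks in records:
        groups[month].append(marks)
    return {m: mean(v) for m, v in sorted(groups.items())}


def month_gap(averages):
    """Return (lowest-scoring month, highest-scoring month, gap in marks)."""
    lo = min(averages, key=averages.get)
    hi = max(averages, key=averages.get)
    return lo, hi, averages[hi] - averages[lo]


# Toy records for illustration only.
records = [(6, 640), (6, 660), (8, 720), (8, 740), (9, 760), (9, 770), (1, 700)]
avgs = marks_by_birth_month(records)
print(avgs)
print(month_gap(avgs))
```

On real data, the same grouping repeated per year, district, gender and subject is what would show whether the pattern holds across all of them.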

Page 19

This is a dataset (1975 – 1990) that has been around for several years, and has been studied extensively. Yet, a visualization can reveal patterns that are neither obvious nor well known.

For example,
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?

A very high number of births in September. This is fairly well known: most conceptions happen during the winter holiday season.

Relatively few births during the Christmas and Thanksgiving holidays, as well as New Year and Independence Day.

Most people prefer not to have children on the 13th of any month, given that it’s considered an unlucky day.

Some special days like April Fool’s Day are avoided, but Valentine’s Day is quite popular.

[Legend: more births to fewer births, on average, for each day of the year (from 1975 to 1990)]

LET’S LOOK AT 15 YEARS OF US BIRTH DATA

Page 20

THE PATTERN IN INDIA IS QUITE DIFFERENT

This is a birth date dataset obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns.

For example,
• Is there an aversion to the 13th, or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?

Very few children are born in August and the months that follow. Most births are concentrated in the first half of the year.

We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month, that is, on round-numbered dates.

Such round-numbered patterns are a typical indication of fraud. Here, birth dates are brought forward to aid early school admission.

[Legend: more births to fewer births, on average, for each day of the year (from 2007 to 2013)]
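One way to quantify the round-number pattern is to compare the share of birth dates falling on the 5th, 10th, 15th, 20th and 25th with the share expected if births were spread evenly across the calendar. The sketch below assumes a plain list of Python `date` objects and uses a hypothetical helper name, `round_day_excess`; a ratio near 1 means no excess, while values well above 1 point to rounded dates. The sample data is synthetic.

```python
from datetime import date, timedelta

ROUND_DAYS = {5, 10, 15, 20, 25}  # round-numbered days named in the text


def round_day_excess(birth_dates):
    """Observed share of births on round days divided by the expected share.

    The expected share assumes births are spread evenly over calendar days
    in the years covered by the data.
    """
    observed = sum(d.day in ROUND_DAYS for d in birth_dates) / len(birth_dates)
    years = {d.year for d in birth_dates}
    all_days = [date(y, 1, 1) + timedelta(days=i)
                for y in years
                for i in range((date(y + 1, 1, 1) - date(y, 1, 1)).days)]
    expected = sum(d.day in ROUND_DAYS for d in all_days) / len(all_days)
    return observed / expected


# Synthetic example: one birth on every day of 2007, plus 25 extra
# "births" on the 10th of each month, mimicking rounded admission dates.
sample = [date(2007, 1, 1) + timedelta(days=i) for i in range(365)]
sample += [date(2007, m, 10) for m in range(1, 13) for _ in range(25)]
print(round(round_day_excess(sample), 2))
```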

Page 21

THIS ADVERSELY IMPACTS CHILDREN’S MARKS

It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer.

Children “born” on the 1st, 5th, 10th, 15th, etc. of the month tend to score lower marks on average.

[Legend: higher marks to lower marks, on average, for children born on a given day of the year (from 2007 to 2013)]

Children “born” on round-numbered days score lower marks on average, due to a higher proportion of younger children.

Page 22

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

… to inform and to highlight

Page 23

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

Page 24

IMPACT OF THE BUDGET ON STOCK PRICES

https://gramener.com/budget/?Year=2010

Page 25

“Which is the least successful party in the history of Indian elections?”

Page 26

WHICH IS THE LEAST SUCCESSFUL PARTY?

https://gramener.com/election/parliament#story.ddp

Page 27

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

… to connect the dots for your readers

Page 28

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

Page 29

Recruiting top-quality developers is always a problem. We decided to use an algorithmic approach and pulled out the social network of developers on GitHub (a social network for open source code).

In this visualisation, each circle is a person. The size of the circle represents the number of followers. Larger circles have more followers (but not in proportion – it’s a log scale.)

The circle’s colour represents the city the programmer lives in. This visual is a slice showing the tale of two cities: Bangalore and Singapore.

Two people are connected if one follows the other. This leads to a clustering of people in the form of a network.

Here, you can see that Bangalore and Singapore are reasonably well connected cities. Bangalore has more developers, but Singapore has more popular ones (larger circles).

However, interactions between Bangalore and Singapore are few and far between, except for a few people across both cities, such as:

… etc.

Sudar, Yahoo!; Anand C, Consultant; Kiran, Hasgeek; Anand S, Gramener

Mugunth, Steinlogic; Honcheng, buUuk; Sau Sheong, HP Labs; Lim Chee Aung

[Legend: circle colour shows the city (Bangalore, Singapore); circle size runs from 1 follower to 100 followers; a line means A follows B, or B follows A]

Most followed in Bangalore

Most followed in Singapore

Ciju Cherian, Lin Junjie, Amudhi Sebastian

There are, of course, a number of smaller independent circles – people who are not connected to others in the same city. (They may be connected to people in other cities.)

Apart from this, there are a few small networks of connected people – often people within the same company or start-up – who form a community of their own.

THE SOCIAL TALE OF TWO CITIES: BANGALORE & SINGAPORE

https://gramener.com/codersearch/
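The two encodings described on this slide, undirected links and log-scaled circle sizes, can be sketched as follows. This assumes follow relationships arrive as hypothetical (follower, followed) username pairs; the radius parameters `min_r` and `scale` are illustrative styling choices, not values taken from the original visualisation.

```python
import math


def undirected_edges(follows):
    """Collapse directed 'A follows B' pairs into undirected links.

    Two people are linked if either follows the other, so (A, B) and
    (B, A) become one edge, stored as a sorted tuple. Self-follows are dropped.
    """
    return {tuple(sorted(pair)) for pair in follows if pair[0] != pair[1]}


def circle_radius(followers, min_r=2.0, scale=4.0):
    """Map a follower count to a circle radius on a log scale.

    log10(1 + n) keeps zero-follower users visible and compresses very
    popular users, so size grows with followers but not in proportion.
    """
    return min_r + scale * math.log10(1 + followers)


# Hypothetical usernames for illustration.
follows = [("asha", "ravi"), ("ravi", "asha"), ("lin", "asha")]
print(undirected_edges(follows))
print([circle_radius(n) for n in (0, 9, 99)])
```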

Page 30

“Exploring politics as data stories…”

Page 31

Page 32

Has there ever been an all-woman election?

Who’s the oldest candidate ever?

Who won by the lowest margins ever in history?

Was there ever an uncontested win?

Som Marandi (BJP) and Konathala Ramakrishna (INC) won by just 9 votes in Bihar, 1998 and AP, 1989 respectively.

Since 1989, no election has been won uncontested. Srinagar, J&K was the last, where Mohammad Shafi Bhat of JKN won without competition.

Only two elections had women candidates exclusively: Karur, TN (1967) and Panskura, WB (1977). Only eight ever had a majority of women candidates.

Arif Ahmed Shaikh Jafhar (NBNP) contested the 2009 elections from Dhule, MH at age 99, making him the oldest candidate ever in India.

Page 33

https://gramener.com/election/cartogram?ST_NAME=Tamil%20Nadu

Page 34

SOMETIMES, STATISTICS CAN BE EVEN MORE FUN THAN POLITICS

Page 35

THERE ARE MANY WAYS TO AID DATA CONSUMPTION

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

[Chart axes: effort required of the Creator and of the Consumer, each from low to high]

Page 36

More examples at gramener.com and blog.gramener.com

A data analytics and visualisation company

We handle terabyte-size data via non-traditional analytics and visualise it in real time.

Reach me at @kesaritweets