data exploration assignment ppt

Data Exploration Thoroughbred Horse Racing N RAMACHANDRAN

Upload: ramachandran-n

Posted on 09-Aug-2015


Page 1: Data exploration assignment ppt

Data Exploration Thoroughbred Horse Racing

N RAMACHANDRAN

Page 2

Transform Qualitative Variables to Quantitative

• The qualitative variables that have a significant impact on the handle are transformed into dummy variables:

1. Race_Type: dummy_allowance, dummy_handicap, dummy_stakes, dummy_maiden, dummy_starters, dummy_claiming

2. Age Restriction: dummy_is2allowed, dummy_is3allowed, dummy_is4allowed, dummy_is5allowed, dummy_isg5allowed

3. Surface: dummy_dirt, dummy_turf

4. Track Id: dummy_AD, dummy_CD, dummy_CRC, dummy_FG
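The dummy coding above can be sketched in plain Python. This is a minimal illustration, not the author's SAS code. The record layout (dicts keyed by `race_type` and `surface`) and the helper `add_dummies` are hypothetical. Each listed level becomes a 0/1 column named `dummy_<level>`. When the dummies feed a regression, one level per variable is normally left out as the baseline so the columns are not perfectly collinear.

```python
def add_dummies(rows, column, levels, prefix="dummy_"):
    """Return copies of rows with one 0/1 indicator per level of `column`.

    Hypothetical helper: `rows` is a list of dicts and `levels` are the
    category values to encode (e.g. the race types listed on this slide).
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input records are not mutated
        value = str(row.get(column, "")).strip().lower()
        for level in levels:
            new_row[prefix + level.lower()] = 1 if value == level.lower() else 0
        out.append(new_row)
    return out


# Hypothetical records; field names are illustrative only.
races = [
    {"race_type": "Allowance", "surface": "Dirt"},
    {"race_type": "Stakes", "surface": "Turf"},
]
races = add_dummies(races, "race_type",
                    ["allowance", "handicap", "stakes", "maiden", "starters", "claiming"])
races = add_dummies(races, "surface", ["dirt", "turf"])
print(races[1]["dummy_stakes"], races[1]["dummy_turf"], races[1]["dummy_dirt"])  # 1 1 0
```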

Page 3

Derived Variables

• Hour of race: the hour at which the race is run, in 24-hour format

• Day of race: the day of the week (1 = Sunday, 7 = Saturday)

• Month of race: the month of the race (1 = Jan, 12 = Dec)
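A minimal sketch of these three derived variables, assuming each race has a post time available as a Python `datetime`. The function name `derive_time_fields` and the output keys are hypothetical. The slide's day coding (1 = Sunday ... 7 = Saturday) matches SAS's WEEKDAY function. Python's `isoweekday()` instead returns 1 = Monday ... 7 = Sunday, so the value is shifted.

```python
from datetime import datetime


def derive_time_fields(post_time):
    """Derive hour, day of week and month from a race's post time."""
    return {
        "hour_of_race": post_time.hour,                 # 0-23, 24-hour clock
        "day_of_race": post_time.isoweekday() % 7 + 1,  # shift Mon=1..Sun=7 to Sun=1..Sat=7
        "month_of_race": post_time.month,               # 1 = Jan, 12 = Dec
    }


# 7 January 2024 was a Sunday; the time of day is illustrative.
print(derive_time_fields(datetime(2024, 1, 7, 13, 30)))
# {'hour_of_race': 13, 'day_of_race': 1, 'month_of_race': 1}
```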

Page 4

Summary Statistics

• No missing values. Where conditions_of_races or sex_restriction is blank, the blank is assumed to mean that the race has no conditions or restrictions.

• PROC MEANS and PROC FREQ output is in line with expectations; nothing unusual to report.
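For readers without SAS, a rough Python analogue of these checks is sketched below. `proc_means` reports PROC MEANS' default statistics (N, mean, standard deviation, minimum, maximum). `proc_freq` counts category values. Both function names and the `"none"` label are hypothetical. Blanks are recoded according to the slide's assumption that an empty restriction field means "no restriction".

```python
import statistics
from collections import Counter


def proc_means(values):
    """Rough analogue of PROC MEANS defaults: N, mean, std dev, min, max."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation, as in PROC MEANS
        "min": min(values),
        "max": max(values),
    }


def proc_freq(values, blank_label="none"):
    """Frequency table; blank or missing entries are counted as `blank_label`."""
    cleaned = [v.strip() if v and v.strip() else blank_label for v in values]
    return Counter(cleaned)


# Illustrative values only, not the assignment data.
handles = [120000, 250000, 90000, 410000]
sex_restriction = ["F", "", "", "C"]
print(proc_means(handles))
print(proc_freq(sex_restriction))  # Counter({'none': 2, 'F': 1, 'C': 1})
```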

Page 5

Graphical Analysis

• Each independent variable was compared against the dependent variable, handle (the total amount wagered), and the results were charted.
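Each "Average Handle vs ..." chart in this deck plots the mean handle for each value of one variable. Below is a minimal sketch of that aggregation. It assumes each race is a dict with a numeric `handle` field. The helper `average_by` is hypothetical, and the plotting step is omitted.

```python
from collections import defaultdict


def average_by(rows, key, value="handle"):
    """Mean of `value` for each distinct `key`, returned in key order."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
        counts[row[key]] += 1
    return {k: totals[k] / counts[k] for k in sorted(totals)}


# Illustrative records only.
rows = [
    {"day_of_race": 7, "handle": 500000},
    {"day_of_race": 7, "handle": 300000},
    {"day_of_race": 3, "handle": 150000},
]
print(average_by(rows, "day_of_race"))  # {3: 150000.0, 7: 400000.0}
```

Passing a different key, such as race number, number of runners or track id, yields the series for the other charts.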

Page 6

Average Handle vs Day of Week

[Chart: average handle by day of week; axes: Day of Week, Handle]

• The chart shows that the average handle peaks on Wednesday, Friday and Saturday (1 = Sunday, 7 = Saturday).

Page 7

Average Handle vs Purse_USA

[Chart: average handle by Purse_USA; axes: Purse_USA, Handle]

• Handle rises steeply once the total prize money (Purse_USA) exceeds 125,000.
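One way to quantify this jump is to split races at the purse threshold and compare the mean handle on each side. The sketch below uses a hypothetical helper `handle_above_below` and hypothetical field names `purse_usa` and `handle`. The default threshold of 125000 is the figure on this slide.

```python
def handle_above_below(rows, threshold=125000, purse="purse_usa", value="handle"):
    """Compare mean handle for races at or below vs. above a purse threshold."""

    def mean(xs):
        # NaN marks an empty group instead of raising ZeroDivisionError
        return sum(xs) / len(xs) if xs else float("nan")

    below = [r[value] for r in rows if r[purse] <= threshold]
    above = [r[value] for r in rows if r[purse] > threshold]
    return {
        "mean_at_or_below": mean(below),
        "mean_above": mean(above),
        "n_at_or_below": len(below),
        "n_above": len(above),
    }


# Illustrative records only, not the assignment data.
sample = [
    {"purse_usa": 20000, "handle": 100000},
    {"purse_usa": 60000, "handle": 200000},
    {"purse_usa": 200000, "handle": 1500000},
]
print(handle_above_below(sample))
# {'mean_at_or_below': 150000.0, 'mean_above': 1500000.0, 'n_at_or_below': 2, 'n_above': 1}
```

Calling it with `threshold=150000` also checks the 150000$ figure quoted in the summary slide.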

Page 8

Average Handle vs Race No

[Chart: average handle by race number; axes: Race No, Handle]

• The average handle rises with race number up to race 10 or 11. The client is advised to restrict the card to 11 races per day, since races beyond the 11th bring lower returns. Race no. 15 is an outlier.

Page 9

Average Handle vs No of Runners

• The average handle increases as the number of runners rises from 4 to 12, so the client is advised to keep field sizes within this range to maximize profits.

[Chart: average handle by number of runners; axes: No of Runners, Handle]

Page 10

Average Handle vs Hour of Day

[Chart: average handle by hour of day; axes: Hour of Day, Handle]

• Handle is significantly higher on days when the first race is run between 11 am and 12 pm and the last race between 7 and 8 pm. The client could schedule races accordingly.
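To reproduce this finding, each race day first has to be reduced to the hour of its first and last race. The sketch below assumes post times are available as `datetime` values. `race_card_window` and `in_recommended_window` are hypothetical helpers. The defaults (first race in hour 11, last race in hour 19) are one reading of "11-12 pm" and "7-8 pm", so treat them as an assumption.

```python
from collections import defaultdict
from datetime import datetime


def race_card_window(post_times):
    """Map each race date to (hour of first race, hour of last race)."""
    by_day = defaultdict(list)
    for t in post_times:
        by_day[t.date()].append(t)
    return {day: (min(ts).hour, max(ts).hour) for day, ts in sorted(by_day.items())}


def in_recommended_window(first_hour, last_hour, first_hours=(11,), last_hours=(19,)):
    """True if the card starts and ends in the assumed recommended hours."""
    return first_hour in first_hours and last_hour in last_hours


# Illustrative post times only.
times = [
    datetime(2024, 1, 6, 11, 40), datetime(2024, 1, 6, 15, 5), datetime(2024, 1, 6, 19, 30),
    datetime(2024, 1, 7, 13, 0), datetime(2024, 1, 7, 17, 45),
]
for day, (first, last) in race_card_window(times).items():
    print(day, first, last, in_recommended_window(first, last))
# 2024-01-06 11 19 True
# 2024-01-07 13 17 False
```

Joining the resulting flag back to each day's total handle lets the two groups of days be compared directly.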

Page 11

Handle Value Graph

• The high handle values look like outliers, but most of them fall on weekends (i.e., on holidays), which explains them.

[Chart: Handle Value Graph; y axis: handle]
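The weekend explanation can be checked by measuring what share of the highest-handle records fall on a Saturday or Sunday. The sketch below uses the deck's day coding (1 = Sunday, 7 = Saturday). The helper `weekend_share_of_top` and the field names are hypothetical.

```python
def weekend_share_of_top(rows, top_n=10, value="handle", day="day_of_race"):
    """Fraction of the top_n highest-handle records that fall on a weekend."""
    top = sorted(rows, key=lambda r: r[value], reverse=True)[:top_n]
    if not top:
        return 0.0
    weekend = sum(1 for r in top if r[day] in (1, 7))  # 1 = Sunday, 7 = Saturday
    return weekend / len(top)


# Illustrative records only.
rows = [
    {"day_of_race": 7, "handle": 5_000_000},
    {"day_of_race": 1, "handle": 4_200_000},
    {"day_of_race": 4, "handle": 3_100_000},
    {"day_of_race": 3, "handle": 150_000},
]
print(weekend_share_of_top(rows, top_n=2))  # 1.0
print(weekend_share_of_top(rows, top_n=3))  # 0.6666666666666666
```

Calling it with `top_n=len(rows)` gives the weekend share across all records. The spikes are meaningfully weekend-driven only if the top-N share is clearly above that baseline.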

Page 12

Handle vs Track Id

• The data indicate that the average handle at Churchill Downs (CD), in Kentucky, is significantly greater than at the other tracks.

[Chart: count of handle and average of handle by track; tracks: AP, CD, CRC, FG]

Page 13

Anomaly Detection

• In the handle graph (slide 11), the spikes in value turn out to fall on weekends, when handle is naturally high, so they should not be treated as outliers.

• 26-Oct-04 is the only day with 15 races, so that day may be an outlier.
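The 15-race day can be found by counting races per date and flagging dates above a cap. Below is a minimal sketch with a hypothetical helper `unusual_days`. The default cap of 11 follows the per-day limit suggested in this deck. The input has one entry per race.

```python
from collections import Counter


def unusual_days(race_dates, max_races=11):
    """Return the dates whose race count exceeds max_races, sorted."""
    counts = Counter(race_dates)  # number of races run on each date
    return sorted(d for d, n in counts.items() if n > max_races)


# Toy input: one entry per race; 'day_1' has 15 races, 'day_2' has 9.
race_dates = ["day_1"] * 15 + ["day_2"] * 9
print(unusual_days(race_dates))  # ['day_1']
```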

Page 14

Suggestions for Client (Summary)

• As shown in the graphs and histograms, the client should take the following into account:

1. Wed, Fri, Sat and Sun are the highest gross handle days of the week.

2. Handle increases steeply when the purse is higher than $150,000.

3. Restrict the number of races to 11 per day.

4. Average handle increases when the number of runners is in the 4-12 range.

5. Handle is significantly higher if the first race is run between 11 am and 12 pm and the last race between 7 and 8 pm.