My first attempt at Kaggle - Higgs Machine Learning Challenge: 755th and proud!

Upload: dhiana-deva

Post on 14-Jun-2015

DESCRIPTION

The Higgs Machine Learning Challenge is not only a place for PhDs! As an undergraduate with a student license of MATLAB and a couple of dollars for Amazon Web Services, I entered in the last 8 days of the challenge and overtook more than half of the competitors! In this talk, I'll present the challenge, my approach, and walk through the code.

TRANSCRIPT

Higgs Challenge
My first attempt at Kaggle: 755th and proud!

@dhianadeva

Err… Kaggle?!

Platform for data science competitions
Machine Learning, Big Data, Statistics, Data mining ...

Community for data scientists
Users, leaderboard, forums …

Sponsors!

$$$ponsored competitions!

We don’t need no PhD!

Yes, we can!

My guilty pleasure:
Student license of MATLAB <3

Open source alternatives:
Python + Scikit + Numpy + …
R + randomForest + e1071 + caret + …
Octave!?

Higgs Challenge

Datasets

Training (labeled):
250k events
30 features
Event id, weight and class (s/b)

Test (unlabeled):
18% Public (500k events)
82% Private

training.csv

EventId , DER_mass_MMC , … , Weight , Class
100000 , 138.47 , … , 0.00265331133733 , s
100001 , 160.937 , … , 2.23358448717 , b
100002 , -999.0 , … , 2.34738894364 , b
100003 , 143.905 , … , 5.44637821192 , b
…
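
A minimal sketch of reading rows in this layout with Python's standard library. Only the columns visible on the slide are used (the columns elided with "…" stay elided), and treating -999.0 as a marker for an undefined value is an assumption based on the challenge's public documentation, not something the slide states. The function and field names are made up for illustration.

```python
import csv
import io

# Sample rows copied from the slide; the elided feature columns are omitted.
SAMPLE = """EventId,DER_mass_MMC,Weight,Class
100000,138.47,0.00265331133733,s
100001,160.937,2.23358448717,b
100002,-999.0,2.34738894364,b
100003,143.905,5.44637821192,b
"""

MISSING = -999.0  # assumption: sentinel for an undefined feature value


def load_events(text):
    """Parse CSV text into a list of dicts with typed fields."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        mass = float(row["DER_mass_MMC"])
        events.append({
            "id": int(row["EventId"]),
            "mass": None if mass == MISSING else mass,
            "weight": float(row["Weight"]),
            "is_signal": row["Class"] == "s",
        })
    return events


events = load_events(SAMPLE)
# Total weight of the events labeled as signal in the sample.
signal_weight = sum(e["weight"] for e in events if e["is_signal"])
print(len(events), signal_weight)
```

For the real file, the same function could be fed the contents of training.csv after extending it to the full list of feature columns.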

test.csv

EventId , DER_mass_MMC , … , PRI_jet_all_pt
350000 , -999 , … , -0.0
350001 , 106.398 , … , 47.575
350002 , 117.794 , … , 0.0
350003 , 135.861 , … , 0.0
…

submission.csv

EventId , RankOrder , Class
350000 , 262328 , b
350001 , 201479 , b
350002 , 212810 , b
350003 , 134945 , b
…
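
A rough sketch of how rows in this format could be produced from classifier scores. It assumes that RankOrder is a permutation of 1..N in which a larger rank means "more signal-like", and that Class is decided by a score threshold; both are assumptions about the challenge rules rather than something shown on the slide. The scores below are hypothetical, and `make_submission` is an illustrative name.

```python
def make_submission(event_ids, scores, threshold):
    """Return (EventId, RankOrder, Class) rows.

    Assumption: RankOrder is a permutation of 1..N in which a larger
    rank means "more signal-like"; Class is 's' above the threshold.
    """
    # Indices sorted from least to most signal-like.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        rank[i] = position
    return [
        (eid, rank[i], "s" if scores[i] > threshold else "b")
        for i, eid in enumerate(event_ids)
    ]


# Hypothetical network outputs for the four test events on the slide.
rows = make_submission([350000, 350001, 350002, 350003],
                       [0.91, 0.12, 0.40, 0.05], threshold=0.8)
for row in rows:
    print(",".join(str(v) for v in row))
```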

End-to-end

A little math...

(Approximate Median Significance)
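
The formula on this slide did not survive extraction. As a hedged reference, the challenge's public documentation defines the metric as AMS = sqrt(2 * ((s + b + b_reg) * ln(1 + s / (b + b_reg)) - s)), where s and b are the summed weights of true signal and true background events that were classified as signal, and b_reg = 10 is a regularization constant. The Python sketch below follows that definition; the function names are illustrative and this is not the talk's MATLAB code.

```python
import math

B_REG = 10.0  # regularization term, per the challenge documentation


def ams(s, b, b_reg=B_REG):
    """Approximate Median Significance for weighted signal s and background b."""
    inner = (s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 2.0 * inner))


def ams_from_predictions(weights, is_signal, predicted_signal):
    """Sum weights of true and false positives, then evaluate the AMS."""
    triples = list(zip(weights, is_signal, predicted_signal))
    s = sum(w for w, y, p in triples if p and y)
    b = sum(w for w, y, p in triples if p and not y)
    return ams(s, b)
```

Only events predicted as signal contribute to the score, which is why the choice of decision threshold matters so much for this metric.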

755th/1785 secrets

I entered in the last 8 days of the 127-day challenge and overtook more than half of the competitors using:

MATLAB 2014b (student license)
Neural Network Toolbox
$20 of EC2 at Amazon Web Services
9 code files totaling 674 words

Neural netwhat?!

[Diagram labels: Neurons, Inputs, Output]

For now, a Black box!

[Diagram labels: Inputs, Output]

It trains

[Diagram labels: Inputs, Output, Target, Error]

It runs

[Diagram labels: Inputs, Output]
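
The "it trains" and "it runs" steps can be sketched with a single sigmoid neuron in Python. This is only an illustration of the inputs / output / target / error loop: it uses plain gradient descent on a toy problem, not the MATLAB network or the trainlm (Levenberg-Marquardt) training used in the talk, and all names and data are hypothetical.

```python
import math


def neuron(weights, bias, inputs):
    """One sigmoid neuron: weighted sum of inputs squashed into (0, 1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=2000, rate=0.5):
    """Plain gradient descent on squared error (not Levenberg-Marquardt)."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = neuron(weights, bias, inputs)   # it runs
            error = target - output                  # compare with target
            # Gradient of 0.5 * error**2 through the sigmoid.
            delta = error * output * (1.0 - output)
            weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
            bias += rate * delta
    return weights, bias


# Toy data (hypothetical): the target is the logical OR of the two inputs.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(samples)
predictions = [round(neuron(weights, bias, x)) for x, _ in samples]
print(predictions)
```

Training adjusts the weights to shrink the error; running is just the forward pass with the weights frozen.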

Moonlighting!

1. nprtool
2. fixunknowns
3. trainlm
4. processpca
5. 0.8 threshold
6. ams threshold pick
7. hidden neurons pick
8. 0.25*targets + amsweights

8 days!
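
Step 6 ("ams threshold pick") presumably means choosing the decision threshold that maximizes AMS on labeled data instead of fixing it at 0.8. The Python sketch below shows one way to do that under this reading; it is not the talk's MATLAB code, the AMS definition (with regularization 10) is taken from the challenge's public documentation, and the scores, weights, and labels are hypothetical.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate Median Significance (b_reg = 10 per the challenge docs)."""
    inner = (s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s
    return math.sqrt(max(0.0, 2.0 * inner))


def pick_threshold(scores, weights, is_signal, candidates):
    """Return (best_threshold, best_ams) over the candidate thresholds."""
    best = (None, -1.0)
    for t in candidates:
        # Events scoring above the threshold are classified as signal.
        selected = [i for i, score in enumerate(scores) if score > t]
        s = sum(weights[i] for i in selected if is_signal[i])
        b = sum(weights[i] for i in selected if not is_signal[i])
        value = ams(s, b)
        if value > best[1]:  # ties keep the lowest threshold
            best = (t, value)
    return best


# Hypothetical validation data: network outputs, event weights, true labels.
scores = [0.95, 0.85, 0.70, 0.60, 0.30, 0.10]
weights = [2.0, 1.0, 50.0, 1.0, 80.0, 90.0]
is_signal = [True, True, False, True, False, False]
candidates = [k / 10 for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
threshold, value = pick_threshold(scores, weights, is_signal, candidates)
print(threshold, round(value, 3))
```

Because heavily weighted background events dominate b, the best threshold tends to sit high, keeping only the most confident signal predictions.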

Some stats...

Day 1

Day 2

Day 3

Day 4

Day 5

Day 6

Day 7

Day 8

Oops!
(weighted errors using AMS, regularization, mapstd, … nothing worked!)

Lessons learned

+ Optimize self-learning by doing things from scratch (or from a default baseline)

+ Kaggle is way more fun than studying with traditional datasets (iris, cancer, thyroid...)

+ Data science needs good engineering practices!

+ The competition fact sheet was a great way of assessing what I know I know, what I know I don’t know…

Let’s hack?!

Re-considering PCA
PCD?
Dimensionality Reduction
Stop on best AMS (hack nn toolbox!)
Ensemble
Auto-encoder
MATLAB unit tests
MATLAB continuous integration

Thanks! ;)