rubin presentation

A consulting project by Douglas Rubin. Y Combinator backed startup; data-based analytics to growers and sellers; ~ $2 billion high growth market (Forbes, Feb. 2015); completely new space. http://douglasrubin.zohosites.com

Upload: dsrubn

Post on 25-Jan-2017


Category:

Data & Analytics



TRANSCRIPT

A consulting project by Douglas Rubin

•  Y Combinator backed startup
•  Data-based analytics to growers and sellers
•  ~ $2 billion high growth market (Forbes, Feb. 2015)
•  Completely new space

http://douglasrubin.zohosites.com

Data

•  User experience data
•  Lab data


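Lab data table (columns: Strain Name, Chem 1, Chem 2; the strain names are not preserved in the transcript, and the values are listed in the order they were extracted):

25.42    1.23
30.31    0.97
15.56    2.01
19.01    1.55
21.59    0.79

User experience data table (columns: Strain Name, Euphoric, Paranoid; the strain names are not preserved in the transcript):

Euphoric    Paranoid
1           0
1           1
0           1
0           0
1           1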


Algorithm  


•  Cluster in chemical space (k-means)
•  MC iteration for each cluster to match names
•  Build up “feelings” distribution for each cluster
•  Statistically validate difference between “feelings” distributions
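The clustering and "feelings" steps above can be sketched roughly as follows. This is a minimal, standard-library illustration rather than the project's actual code: the strain names (`strain_a` and so on) are hypothetical, the numbers are the example rows shown on the data slide, and pairing the lab rows with the user-experience rows by position is an assumption. The MC name-matching step is not covered, since the slides do not describe it. In practice a library implementation such as scikit-learn's `KMeans` would likely replace the hand-written loop, and the features would probably be standardized first.

```python
import random
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, lab_c in zip(points, labels) if lab_c == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


def feelings_by_cluster(names, labels, reports):
    """Fraction of strains in each cluster that reported each feeling."""
    collected = defaultdict(lambda: defaultdict(list))
    for name, label in zip(names, labels):
        for feeling, value in reports[name].items():
            collected[label][feeling].append(value)
    return {
        c: {f: sum(v) / len(v) for f, v in feelings.items()}
        for c, feelings in collected.items()
    }


# Hypothetical strain names; values are the example rows from the slides.
lab = {
    "strain_a": (25.42, 1.23),
    "strain_b": (30.31, 0.97),
    "strain_c": (15.56, 2.01),
    "strain_d": (19.01, 1.55),
    "strain_e": (21.59, 0.79),
}
# Assumption: user-experience rows line up with lab rows by position.
reports = {
    "strain_a": {"euphoric": 1, "paranoid": 0},
    "strain_b": {"euphoric": 1, "paranoid": 1},
    "strain_c": {"euphoric": 0, "paranoid": 1},
    "strain_d": {"euphoric": 0, "paranoid": 0},
    "strain_e": {"euphoric": 1, "paranoid": 1},
}

names = list(lab)
labels = kmeans([lab[n] for n in names], k=2)
summary = feelings_by_cluster(names, labels, reports)
print(summary)
```

The per-cluster fractions in `summary` are the kind of quantity whose distributions the last step compares between clusters.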


Results  


Kolmogorov-Smirnov test: Ps < 0.1

[Figure: probability distribution of the fraction who felt energetic, cluster 1 vs. cluster 2]
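For readers unfamiliar with the test, a two-sample Kolmogorov-Smirnov comparison of two clusters can be sketched as below. This is an illustration only: the two samples are hypothetical numbers, not the project's data, and the p-value uses the standard asymptotic series approximation. A library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
import math


def ks_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every value equal to x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value for the KS statistic d (approximation)."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the true value is ~1
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical "fraction who felt energetic" samples for two clusters.
cluster_1 = [0.42, 0.47, 0.51, 0.55, 0.58, 0.61, 0.63, 0.66]
cluster_2 = [0.21, 0.26, 0.30, 0.33, 0.37, 0.41, 0.44, 0.52]

d = ks_statistic(cluster_1, cluster_2)
p = ks_pvalue(d, len(cluster_1), len(cluster_2))
print(f"D = {d:.3f}, p = {p:.4f}")
```

A small p-value indicates that the two clusters' distributions are unlikely to come from the same underlying distribution.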

Actionable  Insights  for  Company  


•  Several broad chemical clusters associate more (or less) with particular feelings
•  Consistent nomenclature

[Figure legend: cluster 1, cluster 2, cluster 3, cluster 4]

About  me  

Douglas  Rubin  

www.Klickpredict.net  

http://douglasrubin.zohosites.com


Supervised  Learning  


•  Standard classification algorithms performed poorly (random forest, logistic regression, etc.)