big data meetup
![Page 1: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/1.jpg)
Data Science
Data Meetup Jan. 12
![Page 2: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/2.jpg)
What is data science?
Besides a reason to have beer and pizza…
![Page 3: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/3.jpg)
![Page 4: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/4.jpg)
![Page 5: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/5.jpg)
What does the literature say?
![Page 6: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/6.jpg)
Hacking
“Good data scientists understand, in a deep way, that the heavy lifting of cleanup and preparation isn’t something that gets in the way of solving the problem… it is the problem”
DJ Patil
bash/awk/sed
![Page 7: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/7.jpg)
Statistics
What’s the probability that 2 people in the front 2 rows share a birthday?
1. ~10%
2. ~20%
3. ~50%
4. ~90%
What’s the probability that a positive result from a 99% accurate test correctly diagnoses a 1/1000 disease?
1. ~10%
2. ~50%
3. ~90%
4. ~99%
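Both quiz questions above can be checked with a few lines of arithmetic. The sketch below is an illustration and not part of the original slides. It rests on two assumptions: the audience size of 23 is a placeholder (the slide does not say how many people sit in the front two rows), and "99% accurate" is read as 99% sensitivity and 99% specificity. The function names are made up for this example.

```python
def shared_birthday_probability(n_people: int, days: int = 365) -> float:
    """Probability that at least two of n_people share a birthday.

    Assumes birthdays are uniform over `days` and independent.
    """
    p_all_distinct = 1.0
    for i in range(n_people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Bayes' rule: P(disease | positive test)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # 23 people is an assumed room size, not a figure from the talk.
    print(f"shared birthday, 23 people: {shared_birthday_probability(23):.3f}")
    # "99% accurate" interpreted as sensitivity = specificity = 0.99.
    print(f"P(disease | positive): {positive_predictive_value(0.001, 0.99, 0.99):.3f}")
```

Under these assumptions the first value comes out near 50% and the second near 9%: with a rare disease, false positives from the healthy majority outnumber true positives.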
![Page 8: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/8.jpg)
Domain Expertise
![Page 9: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/9.jpg)
Intelligence Cookbook
Just follow the steps
![Page 10: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/10.jpg)
The Recipe
First, make it valuable.
Then, make it possible.
Then, make it beautiful.
Then, make it smart.
![Page 11: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/11.jpg)
Example
E-Commerce website
![Page 12: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/12.jpg)
Make it valuable
Find a KPI that is correlated to bottom-line revenue
e.g. number of products the visitor browses through
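One way to check whether a candidate KPI tracks revenue is to compute their correlation over past visits. The sketch below uses a plain Pearson correlation; the data values are invented purely for illustration, and the choice of Pearson (rather than, say, a rank correlation) is an assumption, not something the talk specifies.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-visit data: products browsed and revenue from that visit.
    products_browsed = [1, 3, 4, 7, 9]
    revenue = [0.0, 5.0, 4.0, 20.0, 25.0]
    print(f"correlation: {pearson_correlation(products_browsed, revenue):.2f}")
```

A high correlation only suggests the KPI is worth optimizing; it does not show that raising the KPI causes revenue to rise.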
![Page 13: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/13.jpg)
Make it possible
Develop the simplest heuristic
e.g. show the visitor one of the top 10 selling products
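The top-sellers heuristic needs little more than a counter. In this minimal sketch the input format (a list of `(product_id, quantity)` pairs) and the function names are assumptions made for the example.

```python
import random
from collections import Counter


def top_sellers(order_lines, n=10):
    """Return the ids of the n best-selling products, best first.

    order_lines: iterable of (product_id, quantity) pairs (assumed format).
    """
    units_sold = Counter()
    for product_id, quantity in order_lines:
        units_sold[product_id] += quantity
    return [product_id for product_id, _ in units_sold.most_common(n)]


def recommend(order_lines, rng=random):
    """Pick one of the top 10 sellers at random, or None if nothing has sold."""
    best = top_sellers(order_lines, n=10)
    return rng.choice(best) if best else None


if __name__ == "__main__":
    # Hypothetical order history.
    orders = [("tent", 5), ("stove", 2), ("lamp", 9), ("tent", 1)]
    print(top_sellers(orders, n=2))
    print(recommend(orders))
```

The heuristic ignores who the visitor is, which is the point: it gives a working baseline that later models have to beat.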
![Page 14: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/14.jpg)
Make it beautiful
Create a method to quickly test new algorithms against old ones
e.g. create a framework that split tests two models and reports which one is better
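A split-testing framework of this kind can start very small. The sketch below is one possible shape, with every name in it hypothetical: visitors are assigned to an arm by hashing their id, each arm is served by its own model, and the report compares the mean KPI per arm. It deliberately omits a significance test, which a real comparison would need before declaring a winner.

```python
import hashlib
from collections import defaultdict


class SplitTest:
    """Route visitors to one of two models and compare a KPI per arm."""

    def __init__(self, model_a, model_b):
        # Each model is a callable: visitor_id -> recommendation.
        self.models = {"A": model_a, "B": model_b}
        self.kpi_values = defaultdict(list)

    def assign(self, visitor_id: str) -> str:
        """Deterministic assignment, so a visitor always sees the same arm."""
        digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
        return "A" if digest[0] % 2 == 0 else "B"

    def recommend(self, visitor_id: str):
        arm = self.assign(visitor_id)
        return arm, self.models[arm](visitor_id)

    def record(self, arm: str, kpi_value: float) -> None:
        """Store the KPI observed for one visit (e.g. products browsed)."""
        self.kpi_values[arm].append(kpi_value)

    def report(self):
        """Return (mean KPI per arm, arm with the highest mean or None)."""
        means = {arm: sum(vals) / len(vals)
                 for arm, vals in self.kpi_values.items() if vals}
        winner = max(means, key=means.get) if means else None
        return means, winner


if __name__ == "__main__":
    test = SplitTest(lambda v: "top-seller", lambda v: "personalized")
    arm, recommendation = test.recommend("visitor-42")
    test.record(arm, 3)
    print(test.report())
```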
![Page 15: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/15.jpg)
Make it smart
Figure out in what field your problem is and choose an off-the-shelf algorithm
e.g. recognize that the problem is product recommendation and use collaborative filtering
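To make the collaborative filtering idea concrete, here is a small item-based variant: two products are similar when the same visitors bought both, and a visitor is recommended unseen products similar to what they already have. This is a sketch under stated assumptions (binary purchase data, cosine similarity, invented names and sample data), not the specific algorithm used in the talk; in practice an off-the-shelf library would replace it.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(baskets):
    """Cosine similarity between products based on shared buyers.

    baskets: dict mapping visitor id -> set of product ids (assumed format).
    """
    buyers = defaultdict(set)
    for visitor, items in baskets.items():
        for item in items:
            buyers[item].add(visitor)

    sims = defaultdict(dict)
    for a in buyers:
        for b in buyers:
            if a == b:
                continue
            overlap = len(buyers[a] & buyers[b])
            if overlap:
                sims[a][b] = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(visitor, baskets, n=3):
    """Rank products the visitor has not bought by summed similarity."""
    sims = item_similarities(baskets)
    seen = baskets.get(visitor, set())
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += sim
    # Highest score first; ties broken by product id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


if __name__ == "__main__":
    # Hypothetical purchase history.
    baskets = {
        "u1": {"tent", "stove"},
        "u2": {"tent", "stove", "lamp"},
        "u3": {"tent", "lamp"},
        "u4": {"book"},
    }
    print(recommend("u1", baskets))
```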
![Page 16: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/16.jpg)
Common ML problems
• Supervised learning
  • Classification
  • Regression
  • Anomaly detection
• Unsupervised learning
  • Clustering
  • Separation
• Recommendation
  • Feature-based recommendation
  • Collaborative filtering
• Search
  • Indexing
  • Ranking
![Page 17: Big data meetup](https://reader033.vdocuments.mx/reader033/viewer/2022052904/557ef121d8b42ad17d8b4c25/html5/thumbnails/17.jpg)
To sum it all up
Real data science is hard
but …
Real data science is the last step in data science, not the first
and besides …
The most important thing in data science is the business, not the science