Introduction to Data Science using Python

Issam Laradji

Post on 31-May-2020


Data Science

- The term “data science” has exploded in business environments
- Many academics and journalists see no distinction between data science and statistics

Data Science

Nate Silver:

- Sexed-up term for statistics. Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.

Data Science

- Find and interpret rich data sources
- Create visualizations to aid in understanding data
- Data Scientists are people who turn data into applications

Data Science

Python libraries for data science

- Pandas (for data manipulation and visualization)
- Scikit-learn (for machine learning)
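To give a feel for the data-manipulation side, here is a minimal sketch of a pandas group-by. The table is a made-up toy example, and the column names (`sex`, `ticket_class`, `survived`) are assumptions chosen for illustration, not the columns of any real dataset.

```python
import pandas as pd

# Made-up toy rows for illustration only; not taken from any real dataset.
passengers = pd.DataFrame({
    "sex": ["female", "male", "female", "male", "male", "female"],
    "ticket_class": [1, 3, 2, 1, 3, 3],
    "survived": [1, 0, 1, 1, 0, 0],
})

# The mean of a 0/1 column is the fraction of ones,
# so this gives the survival rate within each group.
rate_by_sex = passengers.groupby("sex")["survived"].mean()
print(rate_by_sex)
```

If matplotlib is installed, `rate_by_sex.plot(kind="bar")` would probably be the quickest way to turn the same summary into a chart.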

Titanic Dataset

On April 15, 1912:

- The Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew.
- There were not enough lifeboats for the passengers and crew.
- Some groups of people were more likely to survive than others, such as women, children, and the upper-class.

Task

- What sorts of people were likely to survive?
- Use Data Science or Machine Learning to predict which passengers survived the tragedy.
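The prediction task can be sketched roughly as follows. This is not the code from the talk's notebook: the rows are made up, the column names are hypothetical, and a decision tree is just one reasonable choice of model.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up toy rows for illustration only; the real dataset has its own columns.
train = pd.DataFrame({
    "sex": ["female", "male", "female", "male", "male", "female", "male", "female"],
    "ticket_class": [1, 3, 2, 1, 3, 3, 2, 1],
    "age": [29, 40, 8, 54, 22, 35, 30, 4],
    "survived": [1, 0, 1, 0, 0, 1, 0, 1],
})

# scikit-learn models need numeric inputs, so encode the text column as 0/1.
X = train[["ticket_class", "age"]].copy()
X["is_female"] = (train["sex"] == "female").astype(int)
y = train["survived"]

# Fit a shallow decision tree on the features.
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

# Predict for two hypothetical passengers (same column order as X).
new_passengers = pd.DataFrame({
    "ticket_class": [2, 3],
    "age": [30, 25],
    "is_female": [1, 0],
})
predictions = model.predict(new_passengers)
print(predictions)
```

On real data the features would need more care (missing ages, for instance), and the score should be measured on passengers the model has not seen during training.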

Titanic Dataset

- Download link: goo.gl/oF5GBc

Sklearn

Check out the exercises at the bottom of the Jupyter notebook

Exercise 1: Try different depths of the decision tree (see code in the notebook)
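One possible way to approach Exercise 1 is sketched below. The notebook's own code is not reproduced here, so this uses synthetic data from `make_classification` as a stand-in for the Titanic features; the list of depths is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; swap in the Titanic features and labels).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit one tree per depth and record (train accuracy, test accuracy).
# max_depth=None lets the tree grow until its leaves are pure.
scores = {}
for depth in [1, 2, 4, 8, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))

for depth, (train_acc, test_acc) in scores.items():
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

Comparing the two columns is the point of the exercise: training accuracy typically keeps rising with depth, while test accuracy tends to level off or drop once the tree starts to overfit.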

Exercise 2: Use three sklearn methods and see which gives the highest score (see code in the notebook)
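Exercise 2 could be sketched as below. The three models are an assumed selection (the slides do not say which methods to use), and the data is again synthetic rather than the Titanic set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; swap in the Titanic features and labels).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)

# Three candidate classifiers; any scikit-learn estimators could be used here.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each model.
mean_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

best = max(mean_scores, key=mean_scores.get)
for name, score in mean_scores.items():
    print(f"{name}: {score:.3f}")
print("highest score:", best)
```

Cross-validation is used here instead of a single train/test split so that the comparison depends less on one lucky or unlucky split.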