Logistic Regression in Data Science


Uploaded by: edureka

Posted on 12-Aug-2015


TRANSCRIPT

Page 1: Logistic Regression In Data Science

www.edureka.in/data-science

Data Science: Inject Intelligence Into Business Decisions Using Regression

Page 2

www.edureka.co/r-for-analytics

Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions

Objectives

At the end of this session, you will be able to:

Explain what data mining is

Describe the stages of data mining

Explain what R is

Explain what data science is

Describe what is needed to become a data scientist

Describe the roles and responsibilities of a data scientist

Understand logistic regression

Page 3

Data Science Applications: Wine Recommendation


Page 4

Data Science Applications: Predict Accidents


Page 5

Cross-Industry Standard Process for Data Mining (CRISP-DM)

Stages of Analytics / Data Mining

Page 6

Components of Data Science

Page 7

Components of Data Science

R Programming Language

Page 8

Data Science: Demand-Supply Gap

Filled vs. unfilled jobs in big data (% of positions)

Role                         Filled (%)   Unfilled (%)
Big Data Analyst                 50            50
Big Data Architect               43            57
Big Data Engineer                44            56
Big Data Research Analyst        31            69
Big Data Visualizer              23            77
Data Scientist                   18            82

Source: Gartner, "Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015", http://www.gartner.com/newsroom/id/2207915

Page 9

Hadoop and R together

Page 10

Machine Learning

Many data mining algorithms can learn from past data to build a system that accommodates future data and derives useful insight from it.

Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.

Page 11

Machine Learning: Types of Learning

Supervised Learning

1. Uses a known dataset to make predictions.

2. The training dataset includes input data and response values.

3. From it, the supervised learning algorithm builds a model that predicts the response values for a new dataset.

Unsupervised Learning

1. Draws inferences from datasets consisting of input data without labeled responses.

2. Used for exploratory data analysis to find hidden patterns or groupings in data.

3. The most common unsupervised learning method is cluster analysis.
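To make the contrast concrete, here is a minimal Python sketch on made-up one-dimensional data. The values and helper names are hypothetical and do not come from the slides. The supervised part learns one centroid per known label and predicts a label for a new point. The unsupervised part runs a two-cluster k-means that never sees the labels. Nearest-centroid and k-means are simple stand-ins chosen for brevity, not the algorithms this course teaches.

```python
# Hypothetical 1-D data: supervised vs. unsupervised learning.

def nearest_centroid_fit(xs, labels):
    """Supervised: learn the mean of x for each known label."""
    centroids = {}
    for label in set(labels):
        members = [x for x, y in zip(xs, labels) if y == label]
        centroids[label] = sum(members) / len(members)
    return centroids

def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to a new point x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def two_means(xs, iterations=10):
    """Unsupervised: 1-D k-means with k = 2; labels are never used."""
    c0, c1 = min(xs), max(xs)  # start the two centres at the extremes
    for _ in range(iterations):
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        # Move each centre to the mean of its group (keep it if the group is empty).
        if group0:
            c0 = sum(group0) / len(group0)
        if group1:
            c1 = sum(group1) / len(group1)
    return c0, c1

xs = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["low", "low", "low", "high", "high", "high"]  # only the supervised model sees these

model = nearest_centroid_fit(xs, labels)
print(nearest_centroid_predict(model, 7.0))  # prints "high"
print(two_means(xs))                         # prints (1.5, 8.5)
```

Both methods find the same two groups here. Only the supervised one can name them, because only it was given response values.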

Page 12

• Common Machine Learning Algorithms

Page 13

Logistic Regression

Page 14

Logistic Regression

In statistics, logistic regression (also called logit regression or the logit model) is a direct probability model.

Rather than modeling the response Y directly, logistic regression models the probability that Y belongs to a particular category.

In logistic regression, we use the logistic function:

$p(X) = \frac{e^{A + BX}}{1 + e^{A + BX}}$
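Below is a minimal Python sketch of the logistic function $p(X) = e^{A+BX} / (1 + e^{A+BX})$. It assumes made-up coefficients A = -4 and B = 2. When the exponent is non-negative, it evaluates the algebraically equivalent form $1 / (1 + e^{-(A+BX)})$, so that `math.exp` does not overflow for large inputs.

```python
import math

def logistic(x, a, b):
    """p(X) = e^(A + BX) / (1 + e^(A + BX)): probability that Y = 1 given X.

    For z >= 0 the equivalent form 1 / (1 + e^-z) is used so math.exp never
    receives a large positive argument; for z < 0, e^z / (1 + e^z) is safe.
    """
    z = a + b * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients A = -4, B = 2 (not from the slides).
for x in [0, 1, 2, 3, 4]:
    print(x, round(logistic(x, -4.0, 2.0), 4))
```

With these coefficients, A + BX = 0 at X = 2, so p(X) = 0.5 there. The output rises toward 1 as X increases and falls toward 0 as X decreases, but never leaves the interval [0, 1].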

Page 15

Logistic Regression

Rearranging the logistic function gives: $\frac{p(X)}{1 - p(X)} = e^{A + BX}$

The quantity $p(X)/[1 - p(X)]$ is called the odds and can take any value between 0 and ∞.

Odds close to 0 indicate a very low probability; odds approaching ∞ indicate a very high probability.

Taking the logarithm of both sides gives $\log\left(\frac{p(X)}{1 - p(X)}\right) = A + BX$. The left-hand side is called the log-odds or logit.

The logit of a logistic regression model is therefore linear in X.
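The following Python snippet is a worked numeric check of the odds and log-odds identities. It uses hypothetical values A = -3, B = 0.5 and X = 4, so that A + BX = -1. It also shows one consequence of the logit being linear in X: raising X by one unit multiplies the odds by $e^{B}$.

```python
import math

# Hypothetical coefficients and input, not taken from the slides.
A, B, X = -3.0, 0.5, 4.0

def p_of(x):
    """Logistic model p(x) = e^(A + Bx) / (1 + e^(A + Bx))."""
    z = A + B * x
    return math.exp(z) / (1.0 + math.exp(z))

p = p_of(X)
odds = p / (1.0 - p)        # should equal e^(A + BX) = e^-1
log_odds = math.log(odds)   # the logit; should equal A + BX = -1

# A one-unit increase in X multiplies the odds by e^B.
p_next = p_of(X + 1.0)
odds_ratio = (p_next / (1.0 - p_next)) / odds

print(round(p, 4), round(odds, 4), round(log_odds, 4), round(odds_ratio, 4))
```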

Page 16

Sigmoid Function for Logistic Regression
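The S-shaped sigmoid curve is the logistic function written in terms of the linear predictor $z = A + BX$. The short derivation below is added for reference and is not reproduced from the slide. It shows that the two forms are equal and that the curve is symmetric about $\sigma(0) = 1/2$. It also gives the derivative $\sigma(z)(1 - \sigma(z))$, which appears when the log-likelihood is maximized.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = A + BX

\frac{e^{z}}{1 + e^{z}}
  = \frac{e^{z}}{1 + e^{z}} \cdot \frac{e^{-z}}{e^{-z}}
  = \frac{1}{e^{-z} + 1}
  = \sigma(z)

\sigma(-z) = \frac{1}{1 + e^{z}} = \frac{e^{-z}}{e^{-z} + 1} = 1 - \sigma(z),
\qquad \sigma(0) = \tfrac{1}{2}

\frac{d\sigma}{dz}
  = \frac{e^{-z}}{(1 + e^{-z})^{2}}
  = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
  = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)
```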

Page 17

www.edureka.in/pmi-acp

Page 18

Maximum Likelihood Estimation (MLE)

→ MLE is a statistical method for estimating the coefficients of a model.

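→ The likelihood function (L) measures the probability of observing the particular set of dependent variable values (Y1, Y2, ..., Yn) that occur in the sample. If $p_i$ is the probability the model assigns to the value actually observed for observation i (that is, $p_i = p(X_i)$ if $Y_i = 1$ and $p_i = 1 - p(X_i)$ if $Y_i = 0$), then:

$L = p_1 \times p_2 \times \cdots \times p_n = \prod_{i=1}^{n} p_i$

→ The higher L is, the more probable the observed sample is under the model.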
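The following minimal Python sketch computes L and the log-likelihood LL for a logistic model. It uses a tiny hypothetical sample and two hypothetical candidate coefficient pairs. Each $p_i$ is the probability the model gives to the Y value actually observed, so L is the product of those numbers. LL is the sum of their logarithms.

```python
import math

def observed_probs(xs, ys, a, b):
    """p_i: probability the model p(X) = 1 / (1 + e^-(a + bX)) gives to the observed Y_i."""
    probs = []
    for x, y in zip(xs, ys):
        p1 = 1.0 / (1.0 + math.exp(-(a + b * x)))  # P(Y = 1 | X = x)
        probs.append(p1 if y == 1 else 1.0 - p1)
    return probs

def likelihood(xs, ys, a, b):
    """Return (L, LL): the product of the p_i and its natural logarithm."""
    probs = observed_probs(xs, ys, a, b)
    L = math.prod(probs)                  # L = p1 * p2 * ... * pn
    LL = sum(math.log(p) for p in probs)  # log-likelihood, never positive
    return L, LL

# Hypothetical sample and candidate coefficients, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1]

L_fit, LL_fit = likelihood(xs, ys, -2.5, 1.0)   # coefficients that track the data
L_flat, LL_flat = likelihood(xs, ys, 0.0, 0.0)  # p(X) = 0.5 for every X
print(round(L_fit, 4), round(L_flat, 4))
```

The first pair of coefficients gives the larger L, so MLE would prefer it to the flat model. Summing logs is usually preferred in practice, because a product of many small probabilities can underflow to zero.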

Page 19

Maximum Likelihood Estimation (MLE)

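→ MLE involves finding the coefficients (A, B) that make the log of the likelihood function (LL) as large as possible. Because L ≤ 1, LL is never positive.

→ Equivalently, it finds the coefficients that make -2 times the log of the likelihood function (-2LL) as small as possible.

→ The maximum likelihood estimates solve the following condition, summed over all observations i = 1, …, n:

$\sum_{i=1}^{n} \left(Y_i - p(X_i)\right) X_i = 0$

where $p(X_i) = P(Y_i = 1 \mid X_i)$. For the intercept A, the same condition holds with $X_i = 1$, i.e. $\sum_{i=1}^{n} \left(Y_i - p(X_i)\right) = 0$.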
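One common way to solve these conditions numerically is Newton-Raphson iteration on the log-likelihood. R's `glm` uses iteratively reweighted least squares, which for the logit link amounts to the same update. The Python sketch below is a minimal version for the single-predictor model, run on a small hypothetical dataset. At the fitted (A, B), both score sums (with $X_i$ and with $X_i = 1$) come out numerically zero. The sketch assumes the two classes overlap in X. With perfectly separated data the MLE does not exist, and B would grow without bound.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit p(X) = 1 / (1 + e^-(A + BX)) by Newton-Raphson on the log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Score (gradient of LL): the sums that must be zero at the MLE.
        g_a = sum(y - p for y, p in zip(ys, ps))
        g_b = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix = minus the Hessian of LL, with weights p(1 - p).
        w = [p * (1.0 - p) for p in ps]
        h_aa = sum(w)
        h_ab = sum(wi * x for wi, x in zip(w, xs))
        h_bb = sum(wi * x * x for wi, x in zip(w, xs))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step (a, b) += I^-1 g, using the explicit 2x2 inverse.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Hypothetical, non-separable sample.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
a_hat, b_hat = fit_logistic(xs, ys)

ps = [1.0 / (1.0 + math.exp(-(a_hat + b_hat * x))) for x in xs]
score_a = sum(y - p for y, p in zip(ys, ps))               # condition with X_i = 1
score_b = sum((y - p) * x for x, y, p in zip(xs, ys, ps))  # condition with X_i
minus_2ll = -2.0 * sum(math.log(p if y == 1 else 1.0 - p) for y, p in zip(ys, ps))
print(a_hat, b_hat, score_a, score_b, minus_2ll)
```

Newton-Raphson is used here because the logistic log-likelihood is concave, so a few steps from (0, 0) usually suffice. Plain gradient ascent would also work but needs a step size and many more iterations.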

Page 20

Course Topics

Module 1

» Introduction to Data Science

Module 2

» Basic Data Manipulation using R

Module 3

» Machine Learning Techniques using R Part 1

- Clustering

- TF-IDF and Cosine Similarity

- Association Rule Mining

Module 4

» Machine Learning Techniques using R Part 2

- Supervised and Unsupervised Learning

- Decision Tree Classifier

Module 5

» Machine Learning Techniques using R Part 3

- Random Forest Classifier

- Naïve Bayes Classifier

Module 6

» Introduction to Hadoop Architecture

Module 7

» Integrating R with Hadoop

Module 8

» Mahout Introduction and Algorithm Implementation

Module 9

» Additional Mahout Algorithms and Parallel Processing in R

Module 10

» Project


Page 21

Questions?

Enroll for the complete course at: www.edureka.in/data_science


Please don't forget to fill in the survey.

The class recording and presentation will be available within 24 hours at: http://www.edureka.in/blog/application-of-clustering-in-data-science-using-real-life-examples/