Transcript
Page 1: Data Science : Make Smarter Business Decisions

www.edureka.in/data-science

Data Science: Make Smarter Business Decisions

Page 2: Data Science : Make Smarter Business Decisions

www.edureka.co/r-for-analytics

Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions

Objectives

At the end of this session, you will be able to answer:

What is data mining?

What is data science?

Why is a data scientist needed?

What are the stages of data mining?

What are the roles and responsibilities of a data scientist?

How is sentiment analysis performed on Zomato reviews?

Page 3: Data Science : Make Smarter Business Decisions

Data Science Applications: Wine Recommendation

Page 4: Data Science : Make Smarter Business Decisions

Data Science Applications: Pizza Hut

Page 5: Data Science : Make Smarter Business Decisions

Data Science Applications: Summarize News

Page 6: Data Science : Make Smarter Business Decisions

How about this?

Page 7: Data Science : Make Smarter Business Decisions

What’s Common in these Applications?

According to Wikipedia: Data science is the study of the generalizable extraction of knowledge from data, yet the key word is science.

These scenarios involve:

Storing, organizing and integrating huge amounts of unstructured data

Processing and analyzing the data

Extracting knowledge and insights from the data, and predicting the future from it

Big data is typically stored in Hadoop. For more details on Hadoop, please refer to the Big Data and Hadoop blog: http://www.edureka.in/blog/category/big-data-and-hadoop/

Processing, analyzing, and extracting knowledge and insights are done through machine learning.

Together, the technologies and steps above can be termed the data mining process.

Page 8: Data Science : Make Smarter Business Decisions

Stages of Analytics / Data Mining

Cross-Industry Standard Process for Data Mining (CRISP-DM)

Page 9: Data Science : Make Smarter Business Decisions

Stages of Analytics / Data Mining

Knowledge Discovery and Data Mining (KDD)

Page 10: Data Science : Make Smarter Business Decisions

What is data science?

“More data usually beats better algorithms.” For example: recommending movies or music based on past preferences.

No matter how clever your algorithm is, it can often be beaten simply by having more data (and a less sophisticated algorithm).

Page 11: Data Science : Make Smarter Business Decisions

Components of data science

Page 12: Data Science : Make Smarter Business Decisions

What is R?

R is a programming language

R is an environment for statistical analysis

R is data analysis software

Page 13: Data Science : Make Smarter Business Decisions

Data Science: Demand Supply Gap

Filled vs. unfilled jobs in big data (%)

Role                         Filled   Unfilled
Big Data Analyst               50        50
Big Data Architect             43        57
Big Data Engineer              44        56
Big Data Research Analyst      31        69
Big Data Visualizer            23        77
Data Scientist                 18        82

Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015
http://www.gartner.com/newsroom/id/2207915

Page 14: Data Science : Make Smarter Business Decisions

Page 15: Data Science : Make Smarter Business Decisions

R : Characteristics

An effective and fast data handling and storage facility

A suite of operators for calculations on arrays, lists, vectors, etc.

A large, integrated collection of tools for data analysis and visualization

Facilities for graphical data analysis, with display either directly on the computer or on paper

A well-implemented and effective programming language, built on the ‘S’ language

A complete range of packages to extend and enrich the functionality of R

Page 16: Data Science : Make Smarter Business Decisions

Data Visualization in R

This plot represents the locations of all the traffic signals in the city.

It is recognizable as Toronto without any other geographic data being plotted - the structure of the city comes out in the data alone.

Page 17: Data Science : Make Smarter Business Decisions

Data Science: Job Trends

Page 18: Data Science : Make Smarter Business Decisions

Machine Learning

Many data mining algorithms can be used to build systems that read past data and generate a model that can accommodate future data and derive useful insights from it.

This set of algorithms comes under machine learning.

Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.

Training data → ML algorithms → Model
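
The flow from training data through a learning algorithm to a model can be sketched in a few lines. The slides do not name a specific algorithm here, so the example below assumes ordinary least-squares line fitting, and the training values are hypothetical. The course itself works in R; Python is used here purely for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Learn a slope and intercept from training data by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the learned model to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: past inputs (x) and observed outcomes (y).
train_x = [1, 2, 3, 4, 5]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]

model = fit_line(train_x, train_y)   # training data -> algorithm -> model
new_prediction = predict(model, 6)   # model applied to future data
```

The model is just the pair of numbers the algorithm learned; once fitted, it can be applied to any new input without revisiting the training data.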

Page 19: Data Science : Make Smarter Business Decisions

Machine Learning: Types of Learning

Supervised Learning

1. Uses a known dataset to make predictions.

2. The training dataset includes input data and response values.

3. From it, the supervised learning algorithm builds a model to make predictions of the response values for a new dataset.

Unsupervised Learning

1. Draws inferences from datasets consisting of input data without labeled responses.

2. Used for exploratory data analysis to find hidden patterns or groupings in data.

3. The most common unsupervised learning method is cluster analysis.
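
As a minimal sketch of the supervised case, the example below trains on input data with known response values and then predicts the response for new inputs. It assumes a k-nearest-neighbours classifier, which the slides do not mention; the data points and the labels "low" and "high" are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_inputs, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    ranked = sorted(
        zip(train_inputs, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled training set: input data plus known response values.
train_inputs = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["low", "low", "low", "high", "high", "high"]

# Predict response values for inputs that were not in the training set.
prediction_a = knn_predict(train_inputs, train_labels, (1.1, 0.9))
prediction_b = knn_predict(train_inputs, train_labels, (4.9, 5.0))
```

Removing `train_labels` from this picture is what turns the problem into an unsupervised one: only the inputs remain, and the task becomes finding structure in them.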

Page 20: Data Science : Make Smarter Business Decisions

Common Machine Learning Algorithms

Supervised Learning algorithms:

Naïve Bayes

Support Vector Machines

Random Forests

Decision Trees

Unsupervised Learning algorithms:

K-means

Fuzzy Clustering

Hierarchical Clustering

Gaussian mixture models

Self-organizing maps
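
To make one of the unsupervised algorithms concrete, here is a minimal sketch of K-means, assuming the standard Lloyd iteration with fixed starting centroids and a fixed number of passes. The points are hypothetical, and real implementations usually add random restarts and a convergence check.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

No labels are supplied anywhere: the grouping emerges only from distances between the points.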

Page 21: Data Science : Make Smarter Business Decisions

Use Case : Zomato Ratings Review
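
The transcript does not show how the Zomato reviews were analyzed. One common approach to review sentiment, sketched below as an assumption rather than as the presenter's method, is a Naïve Bayes classifier over word counts with add-one smoothing. The reviews are invented for illustration and are not real Zomato data.

```python
from collections import Counter
from math import log


def train_naive_bayes(labeled_reviews):
    """Count words per sentiment label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in labeled_reviews:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-probability, using add-one smoothing."""
    word_counts, label_counts, vocabulary = model
    total_reviews = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        score = log(label_counts[label] / total_reviews)  # log prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Smoothed likelihood, so unseen words do not zero out the score.
            score += log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical reviews, invented for illustration only.
reviews = [
    ("great food and friendly staff", "positive"),
    ("loved the pizza great ambience", "positive"),
    ("terrible service and cold food", "negative"),
    ("bad experience rude staff", "negative"),
]
model = train_naive_bayes(reviews)
sentiment_1 = classify(model, "great pizza")
sentiment_2 = classify(model, "rude and terrible service")
```

With only four training reviews this is a toy; a realistic version would need far more labeled text, plus tokenization that handles punctuation and negation.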

Page 22: Data Science : Make Smarter Business Decisions

Course Topics

Module 1
» Introduction to Data Science

Module 2
» Basic Data Manipulation using R

Module 3
» Machine Learning Techniques using R Part -1
- Clustering
- TF-IDF and Cosine Similarity
- Association Rule Mining

Module 4
» Machine Learning Techniques using R Part -2
- Supervised and Unsupervised Learning
- Decision Tree Classifier

Module 5
» Machine Learning Techniques using R Part -3
- Random Forest Classifier
- Naïve Bayes Classifier

Module 6
» Introduction to Hadoop Architecture

Module 7
» Integrating R with Hadoop

Module 8
» Mahout Introduction and Algorithm Implementation

Module 9
» Additional Mahout Algorithms and Parallel Processing in R

Module 10
» Project

Page 23: Data Science : Make Smarter Business Decisions

Questions?

Enroll for the complete course at: www.edureka.in/data_science

Please don’t forget to fill in the survey report.

Class recording and presentation will be available in 24 hours at: http://www.edureka.in/blog/application-of-clustering-in-data-science-using-real-life-examples/

