Introduction to Data Science
![Page 1: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/1.jpg)
Introduction to Data Science
Dr. Bill Howe - Director of Research, Scalable Data Analytics
![Page 2: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/2.jpg)
Introduction to Data Science
What is data science?
A set of theories and principles for performing data-related tasks, such as:
◦ Data collection
◦ Data cleaning
◦ Data integration
◦ Data modeling
◦ Data visualization
![Page 3: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/3.jpg)
Introduction to Data Science
Data science is different from
◦ Business intelligence
◦ Statistics
◦ Database management
◦ Visualization
◦ Machine learning
![Page 4: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/4.jpg)
Suggestions for students
DBA: unstructured data
Statistician: data that does not fit in memory
Software engineer: statistical models and how to communicate results
Business analyst: algorithms and tradeoffs at scale
![Page 5: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/5.jpg)
What do data scientists do?
Three common skills of data scientists
◦ Statistics: traditional analysis
◦ Data munging: parsing, scraping, and formatting data
◦ Visualization: graphs, tools, etc.
![Page 6: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/6.jpg)
What do data scientists do?
Three types of tasks:
◦ Preparing to run a model
◦ Running the model
◦ Communicating the results
![Page 7: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/7.jpg)
◦ Preparing to run a model
Gathering
Cleaning
Integrating
Restructuring
Transforming
Loading
Filtering
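The preparation steps listed above can be sketched on a toy example. This is only a minimal illustration using the Python standard library; the two CSV sources, their column names, and the helper functions (`gather`, `clean_users`, `clean_orders`, `integrate`, `prepare`) are all hypothetical and are not taken from the slides.

```python
import csv
import io

# Hypothetical raw inputs: two small CSV sources with messy values.
RAW_USERS = """user_id,name,state
1, Alice ,WA
2,Bob,
3,Carol,ca
"""

RAW_ORDERS = """user_id,amount
1,10.50
1,4.25
3,not_available
3,7.00
"""


def gather(text):
    """Gathering and loading: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_users(rows):
    """Cleaning: trim whitespace, normalize case, drop rows with no state."""
    cleaned = []
    for row in rows:
        state = row["state"].strip().upper()
        if state:
            cleaned.append({
                "user_id": int(row["user_id"]),
                "name": row["name"].strip(),
                "state": state,
            })
    return cleaned


def clean_orders(rows):
    """Cleaning and transforming: parse amounts, skip non-numeric values."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # e.g. "not_available"
        cleaned.append({"user_id": int(row["user_id"]), "amount": amount})
    return cleaned


def integrate(users, orders):
    """Integrating and restructuring: join orders to users, one row per user."""
    totals = {}
    for order in orders:
        totals[order["user_id"]] = totals.get(order["user_id"], 0.0) + order["amount"]
    return [dict(user, total=totals.get(user["user_id"], 0.0)) for user in users]


def prepare(min_total=0.0):
    """Run all steps, then filter to users whose total exceeds min_total."""
    users = clean_users(gather(RAW_USERS))
    orders = clean_orders(gather(RAW_ORDERS))
    table = integrate(users, orders)
    return [row for row in table if row["total"] > min_total]


if __name__ == "__main__":
    for row in prepare():
        print(row)
```

In this sketch each function corresponds to one or two of the listed steps, so a step can be replaced (for example, reading from a database instead of CSV text) without touching the others.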
![Page 8: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/8.jpg)
◦ Running the model
Choosing appropriate machine learning algorithms for regression, classification, clustering, and recommendation
Validating the model
Improving the model
◦ Communicating the results
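As a rough illustration of the "running the model" step, the sketch below fits a one-feature linear regression, validates it on held-out data, and compares it with a trivial baseline to check that the model is an improvement. The synthetic data (y = 3x + 2 plus noise), the 80/20 split, and all function names are assumptions made for this example, not something stated in the slides.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(predict, xs, ys):
    """Average squared difference between predictions and targets."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(n=200, seed=0):
    """Hypothetical synthetic data: y = 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def run_experiment():
    xs, ys = make_data()

    # Validation: hold out the last 20% of the data for testing.
    split = int(0.8 * len(xs))
    train_x, test_x = xs[:split], xs[split:]
    train_y, test_y = ys[:split], ys[split:]

    # Baseline: always predict the mean of the training targets.
    mean_y = sum(train_y) / len(train_y)
    baseline_mse = mean_squared_error(lambda x: mean_y, test_x, test_y)

    # Candidate model: a fitted regression line.
    slope, intercept = fit_line(train_x, train_y)
    model_mse = mean_squared_error(lambda x: slope * x + intercept, test_x, test_y)

    return {
        "slope": slope,
        "intercept": intercept,
        "baseline_mse": baseline_mse,
        "model_mse": model_mse,
    }


if __name__ == "__main__":
    print(run_experiment())
```

Comparing the held-out error of the model against a simple baseline is one common way to decide whether a model is worth improving further; other validation schemes (such as cross-validation) would fit the same slot.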
![Page 9: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/9.jpg)
Data science dimensions
Breadth
◦ MapReduce / relational algebra / logistic regression / visualization
Depth
◦ Structure (relational algebra) / statistics (linear algebra)
Scale
◦ Desktop (R) / cloud (Hadoop)
Target
◦ Hackers (R, Java, Python) / analysts (little or no programming)
![Page 10: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/10.jpg)
Data science dimensions
Scale: cloud for big data
Big data can be measured by the three V’s:
◦ Volume: number of rows (size)
◦ Variety: number of columns or sources (text, images, audio, video)
◦ Velocity: number of rows or bytes per unit time (processing time)
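The row and column reading of the three V’s can be made concrete on a small batch of records. The sketch below is only an illustration: the record layout (a numeric `timestamp` in seconds and a `source` label on every record) and the function name `three_vs` are assumptions for this example.

```python
def three_vs(records):
    """Summarize a batch of records as volume, variety, and velocity.

    Assumes each record is a dict with a numeric 'timestamp' (seconds)
    and a 'source' label, plus any other fields.
    """
    if not records:
        return {"volume": 0, "columns": 0, "sources": 0, "rows_per_second": 0.0}

    # Volume: number of rows.
    volume = len(records)

    # Variety: number of distinct columns and distinct sources.
    columns = set()
    sources = set()
    for record in records:
        columns.update(record.keys())
        sources.add(record["source"])

    # Velocity: rows per unit time over the span of the batch.
    timestamps = [record["timestamp"] for record in records]
    elapsed = max(timestamps) - min(timestamps)
    rows_per_second = volume / elapsed if elapsed > 0 else float("nan")

    return {
        "volume": volume,
        "columns": len(columns),
        "sources": len(sources),
        "rows_per_second": rows_per_second,
    }


# Hypothetical batch mixing text, image, and audio records.
SAMPLE = [
    {"timestamp": 0.0, "source": "text", "body": "hello"},
    {"timestamp": 1.0, "source": "image", "width": 640, "height": 480},
    {"timestamp": 2.0, "source": "text", "body": "world"},
    {"timestamp": 4.0, "source": "audio", "duration": 3.5},
]

if __name__ == "__main__":
    print(three_vs(SAMPLE))
```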
![Page 11: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/11.jpg)
Where does big data come from?
“Data exhaust” from customers
New and pervasive sensors
The ability to “keep everything”
![Page 12: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/12.jpg)
Prerequisites
Prior programming experience
◦ SQL
◦ Python
Basic statistics
Basic database concepts
![Page 13: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/13.jpg)
Programming Assignment 1
Twitter sentiment analysis
◦ Extract tweets from the Twitter API
◦ Calculate the sentiment score for each tweet
◦ Calculate the sentiment score for terms in tweets
◦ Calculate the frequency of terms in tweets
◦ Identify the happiest state
◦ Identify the top ten hashtags
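The assignment steps after extraction can be sketched as below. This is not the assignment's reference solution: the tiny lexicon, the sample tweets, the per-tweet `state` field, and the rule used to score unknown terms (the average score of the tweets containing them) are all assumptions made for illustration. No Twitter API call is shown; the tweets are assumed to be already extracted.

```python
from collections import Counter, defaultdict

# Hypothetical sentiment lexicon (term -> score).
LEXICON = {"love": 3, "great": 3, "happy": 3, "bad": -3, "hate": -3}

# Hypothetical tweets, already extracted; 'state' is assumed to be known.
TWEETS = [
    {"text": "I love this great coffee #monday", "state": "WA"},
    {"text": "Traffic is bad, I hate it #monday #commute", "state": "CA"},
    {"text": "So happy with the rain #seattle", "state": "WA"},
    {"text": "Bad coffee today #commute", "state": "CA"},
]


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    terms = (t.strip(".,!?") for t in text.lower().split())
    return [t for t in terms if t]


def tweet_score(text):
    """Sentiment of a tweet: sum of lexicon scores of its terms."""
    return sum(LEXICON.get(term, 0) for term in tokenize(text))


def derived_term_scores(tweets):
    """Score terms missing from the lexicon by the average score of the
    tweets they appear in (an assumed rule, one of several possible)."""
    scores = defaultdict(list)
    for tweet in tweets:
        score = tweet_score(tweet["text"])
        for term in set(tokenize(tweet["text"])):
            if term not in LEXICON:
                scores[term].append(score)
    return {term: sum(vals) / len(vals) for term, vals in scores.items()}


def term_frequencies(tweets):
    """Frequency of a term: its count divided by the count of all terms."""
    counts = Counter(t for tweet in tweets for t in tokenize(tweet["text"]))
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def happiest_state(tweets):
    """State with the highest average tweet score."""
    by_state = defaultdict(list)
    for tweet in tweets:
        by_state[tweet["state"]].append(tweet_score(tweet["text"]))
    return max(by_state, key=lambda s: sum(by_state[s]) / len(by_state[s]))


def top_hashtags(tweets, n=10):
    """The n most frequent hashtags with their counts."""
    tags = Counter(
        t for tweet in tweets for t in tokenize(tweet["text"]) if t.startswith("#")
    )
    return tags.most_common(n)


if __name__ == "__main__":
    print(happiest_state(TWEETS), top_hashtags(TWEETS))
```

Keeping each step as a separate function mirrors the list of sub-tasks, so each one can be checked on a handful of tweets before running on a larger extract.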
![Page 14: Introduction to data science](https://reader036.vdocuments.mx/reader036/viewer/2022083002/558c1a96d8b42a9d2c8b45d3/html5/thumbnails/14.jpg)
Thanks!!