data science and big data analytics chap1: intro to big data analytics charles tappert seidenberg...
TRANSCRIPT
![Page 1: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/1.jpg)
Data Science and Big Data Analytics
Chap1: Intro to Big Data Analytics
Charles TappertSeidenberg School of CSIS, Pace
University
![Page 2: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/2.jpg)
1.1 Big Data Overview
Industries that gather and exploit data Credit card companies monitor purchase
Good at identifying fraudulent purchases Mobile phone companies analyze calling
patterns – e.g., even on rival networks Look for customers might switch providers
For social networks data is primary product
Intrinsic value increases as data grows
![Page 3: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/3.jpg)
Attributes Defining Big Data Characteristics
Huge volume of data Not just thousands/millions, but billions of
items Complexity of data types and
structures Varity of sources, formats, structures
Speed of new data creation and grow High velocity, rapid ingestion, fast
analysis
![Page 4: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/4.jpg)
Sources of Big Data Deluge
Mobile sensors – GPS, accelerometer, etc. Social media – 700 Facebook updates/sec
in2012 Video surveillance – street cameras, stores,
etc. Video rendering – processing video for
display Smart grids – gather and act on information Geophysical exploration – oil, gas, etc. Medical imaging – reveals internal body
structures Gene sequencing – more prevalent, less
expensive, healthcare would like to predict personal illnesses
![Page 5: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/5.jpg)
Sources of Big Data Deluge
![Page 7: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/7.jpg)
1.1.1 Data Structures:Characteristics of Big
Data
![Page 8: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/8.jpg)
Data Structures:Characteristics of Big
Data
Structured – defined data type, format, structure Transactional data, OLAP cubes, RDBMS, CVS files,
spreadsheets Semi-structured
Text data with discernable patterns – e.g., XML data Quasi-structured
Text data with erratic data formats – e.g., clickstream data Unstructured
Data with no inherent structure – text docs, PDF’s, images, video
![Page 9: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/9.jpg)
Example of Structured Data
![Page 10: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/10.jpg)
Example of Semi-Structured Data
![Page 11: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/11.jpg)
Example of Quasi-Structured Data
visiting 3 websites adds 3 URLs to user’s log files
![Page 12: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/12.jpg)
Example of Unstructured Data
Video about Antarctica Expedition
![Page 13: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/13.jpg)
1.1.2 Types of Data Repositories
from an Analyst Perspective
![Page 14: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/14.jpg)
1.2 State of the Practicein Analytics
Business Intelligence (BI) versus Data Science
Current Analytical Architecture Drivers of Big Data Emerging Big Data Ecosystem
and a New Approach to Analytics
![Page 15: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/15.jpg)
Business Drivers for Advanced Analytics
![Page 16: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/16.jpg)
1.2.1 Business Intelligence (BI) versus Data Science
![Page 17: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/17.jpg)
1.2.2 Current Analytical Architecture
Typical Analytic Architecture
![Page 18: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/18.jpg)
Current Analytical Architecture
Data sources must be well understood EDW – Enterprise Data Warehouse From the EDW data is read by
applications Data scientists get data for
downstream analytics processing
![Page 19: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/19.jpg)
1.2.3 Drivers of Big DataData Evolution & Rise of Big Data
Sources
![Page 20: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/20.jpg)
1.2.4 Emerging Big Data Ecosystem and a New Approach
to Analytics
Four main groups of players Data devices
Games, smartphones, computers, etc. Data collectors
Phone and TV companies, Internet, Gov’t, etc. Data aggregators – make sense of data
Websites, credit bureaus, media archives, etc. Data users and buyers
Banks, law enforcement, marketers, employers, etc.
![Page 21: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/21.jpg)
Emerging Big Data Ecosystem and a New Approach to Analytics
![Page 22: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/22.jpg)
1.3 Key Roles for theNew Big Data
Ecosystem
1. Deep analytical talent Advanced training in quantitative
disciplines – e.g., math, statistics, machine learning
2. Data savvy professionals Savvy but less technical than group 1
3. Technology and data enablers Support people – e.g., DB admins,
programmers, etc.
![Page 23: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/23.jpg)
Three Key Roles of theNew Big Data
Ecosystem
![Page 24: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/24.jpg)
Three Recurring Data Scientist Activities
1. Reframe business challenges as analytics challenges
2. Design, implement, and deploy statistical models and data mining techniques on Big Data
3. Develop insights that lead to actionable recommendations
![Page 25: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/25.jpg)
Profile of Data ScientistFive Main Sets of Skills
![Page 26: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/26.jpg)
Profile of Data ScientistFive Main Sets of Skills
Quantitative skill – e.g., math, statistics
Technical aptitude – e.g., software engineering, programming
Skeptical mindset and critical thinking – ability to examine work critically
Curious and creative – passionate about data and finding creative solutions
Communicative and collaborative – can articulate ideas, can work with others
![Page 27: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/27.jpg)
1.4 Examples of Big Data Analytics
Retailer Target Uses life events: marriage, divorce,
pregnancy Apache Hadoop
Open source Big Data infrastructure innovation
MapReduce paradigm, ideal for many projects
Social Media Company LinkedIn Social network for working professionals Can graph a user’s professional network 250 million users in 2014
![Page 28: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/28.jpg)
Data Visualization of User’s
Social Network Using InMaps
![Page 29: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/29.jpg)
Summary
Big Data comes from myriad sources Social media, sensors, IoT, video surveillance, and
sources only recently considered Companies are finding creative and novel ways
to use Big Data Exploiting Big Data opportunities requires
New data architectures New machine learning algorithms, ways of working People with new skill sets
Always Review Chapter Exercises
![Page 30: Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics Charles Tappert Seidenberg School of CSIS, Pace University](https://reader033.vdocuments.mx/reader033/viewer/2022061612/56649ef65503460f94c09fb4/html5/thumbnails/30.jpg)
Focus of Course
Focus on quantitative disciplines – e.g., math, statistics, machine learning
Provide overview of Big Data analytics
In-depth study of a several key algorithms