![Page 1: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/1.jpg)
Data Analytics – All about data
Harish Dixit
![Page 2: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/2.jpg)
Getting the context
• Examining raw data.
• Discovering useful information.
• Making better business decisions.
![Page 3: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/3.jpg)
Life cycle stages
![Page 4: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/4.jpg)
What is Big Data?
![Page 5: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/5.jpg)
Where is the data coming from?
![Page 6: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/6.jpg)
Complex ecosystem
Cloud
![Page 7: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/7.jpg)
Hadoop Architecture
![Page 8: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/8.jpg)
HDFS
• Distributed file system.
• Partitioning of data.
• Fault tolerant.
• Java API.
• Higher scalability.
• Master-slave paradigm (NameNode as master, DataNodes as slaves).
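The partitioning and fault-tolerance points above can be illustrated with a small sketch. It is a toy model, not HDFS code: the function names `place_blocks` and `without_node`, the node names, the 128 MB block size, and the replication factor of 3 are all assumptions chosen for illustration, and the round-robin placement stands in for the rack-aware placement a real NameNode would decide.

```python
MB = 1024 * 1024


def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into fixed-size blocks and pick `replication` distinct
    nodes for each block (simple round-robin; hypothetical, not HDFS's policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def without_node(layout, failed_node):
    """Return the replica locations that remain after one node fails."""
    return {
        block: [n for n in replicas if n != failed_node]
        for block, replicas in layout.items()
    }


# Assumed example: a 300 MB file, 128 MB blocks, four data nodes.
nodes = ["node1", "node2", "node3", "node4"]
layout = place_blocks(300 * MB, 128 * MB, nodes)
for block, replicas in layout.items():
    print("block", block, "->", replicas)

# Every block is still readable after losing a single node.
degraded = without_node(layout, "node1")
print(all(len(replicas) >= 1 for replicas in degraded.values()))
```

Because each block lives on several nodes, losing one node leaves at least one copy of every block, which is the idea behind the "fault tolerant" bullet.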
![Page 9: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/9.jpg)
HDFS
![Page 10: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/10.jpg)
Map Reduce
• Parallel processing model.
• Move operations to the data, not data to the operations.
• Distributed computations.
• User-defined functions:
– Map()
– Reduce()
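As a rough illustration of the two user-defined functions, the sketch below runs a word count in a single process. It only mimics the programming model: `map_fn`, `reduce_fn`, `run_mapreduce`, and the sample lines are hypothetical names and data, and the in-memory grouping step stands in for the shuffle that Hadoop would perform across machines.

```python
from collections import defaultdict


def map_fn(line):
    """Map(): emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce(): combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for record in records:                 # map phase
        for key, value in map_fn(record):
            groups[key].append(value)      # shuffle: group values by key
    return dict(reduce_fn(key, values) for key, values in groups.items())


lines = ["big data is big", "data is everywhere"]
result = run_mapreduce(lines, map_fn, reduce_fn)
print(result)
```

In a real cluster the map calls would run in parallel on the nodes that hold the input blocks, and only the much smaller (key, value) pairs would travel over the network.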
![Page 11: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/11.jpg)
Map/Reduce Operations
![Page 12: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/12.jpg)
Example
![Page 13: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/13.jpg)
R - Add analytic power to Hadoop
![Page 14: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/14.jpg)
Data modeling techniques
• Regression
• Classification
• Clustering
• Recommendation
• Text mining
![Page 15: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/15.jpg)
Regression
Regression can be formulated as follows:
y = ax + e

x     y
-----------------------------
63    3.1
64    3.6
65    3.8
66    4
-----------------------------
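A minimal sketch of fitting this model follows. It assumes the slide's table reads as the four pairs (63, 3.1), (64, 3.6), (65, 3.8), (66, 4), and it estimates a by ordinary least squares through the origin, since the formula on the slide has no intercept term. The function name `fit_slope` is hypothetical.

```python
# Data points as read from the slide's table (an assumed reading).
xs = [63, 64, 65, 66]
ys = [3.1, 3.6, 3.8, 4.0]


def fit_slope(xs, ys):
    """Least-squares estimate of a in y = a*x + e (no intercept):
    a = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


a = fit_slope(xs, ys)

# e is whatever the line does not explain for each observation.
residuals = [y - a * x for x, y in zip(xs, ys)]

print("estimated a:", round(a, 4))
for x, y, e in zip(xs, ys, residuals):
    print(x, y, "predicted", round(a * x, 3), "residual", round(e, 3))
```

A model with an intercept, y = ax + b + e, would usually fit data like this more closely; the through-origin form is kept here only to match the formula as given.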
![Page 16: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/16.jpg)
Classification
![Page 17: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/17.jpg)
Clustering
![Page 18: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/18.jpg)
Recommendation
![Page 19: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/19.jpg)
Text Mining
![Page 20: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/20.jpg)
NoSQL
• Non-relational.
• Distributed environment.
• Large volume.
• No fixed schemas.
• Horizontally scalable.
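Two of the properties above, no fixed schema and horizontal scaling, can be sketched with a toy key-value store that spreads records across shards by hashing the key. Everything here is hypothetical (the `ShardedStore` class, the keys, the records); production systems typically add replication and consistent hashing so that adding a shard does not remap most keys.

```python
import hashlib


class ShardedStore:
    """Toy schema-less store: each shard is a dict standing in for one server."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns the record.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, record):
        self._shard_for(key)[key] = record

    def get(self, key):
        return self._shard_for(key).get(key)


store = ShardedStore(num_shards=3)

# Records need not share the same fields (no fixed schema).
store.put("user:1", {"name": "A"})
store.put("user:2", {"name": "B", "tags": ["analytics", "hadoop"]})

print(store.get("user:1"))
print(store.get("user:2"))
print([len(shard) for shard in store.shards])
```

Capacity grows by adding shards (more machines) rather than by moving to a larger single machine, which is what "horizontally scalable" refers to.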
![Page 21: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/21.jpg)
CAP Theorem (any 2 of 3: Consistency, Availability, Partition tolerance)
![Page 22: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/22.jpg)
Variants of NoSQL
• Key-Value Systems.
• Document-based Systems.
• Column-based Systems.
• Graph-based Systems.
![Page 23: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/23.jpg)
Distributed Key-Value Systems
![Page 24: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/24.jpg)
Column-based Systems
![Page 25: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/25.jpg)
Applications of Big Data
![Page 26: Data analytics all about data v5](https://reader033.vdocuments.mx/reader033/viewer/2022061214/547e44fab47959a2508b4b06/html5/thumbnails/26.jpg)
Think Big
Q & A