Microsoft Big Data Essentials
Module 1 - Introduction to Big Data
Saptak Sen, Microsoft
Bill Ramos, Advaiya
Agenda
• Why Big Data?
• Big Data Lambda Architecture
• Getting started with Windows Azure HDInsight Service
The Business Imperative
• Human fault tolerance
• Minimize CapEx
• Low learning curve
• Hyper scale on demand
CAP Theorem
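• Consistency (C): every read sees the most recent write
• Availability (A): every request to a non-failing node receives a response
• Partition tolerance (P): the system keeps working when the network splits
A distributed system cannot guarantee all three at once; when a partition occurs, it must give up either consistency or availability.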
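A toy Python model can make the trade-off concrete. It is not a real database: the `Replica` and `Cluster` classes and the `mode` switch are hypothetical, and a network partition is simulated with a flag. In "CP" mode a write that cannot reach the other replica is refused (consistent but unavailable); in "AP" mode it is accepted on one replica only (available but inconsistent).

```python
class Replica:
    """A single copy of a key-value store (toy model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two replicas that normally apply every write to both copies."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.replicas = [Replica("r1"), Replica("r2")]
        self.partitioned = False  # simulated network split between r1 and r2

    def write(self, key, value, via=0):
        """Write through replica `via`; return True if the write was accepted."""
        if self.partitioned:
            if self.mode == "CP":
                # The peer is unreachable: refuse the write to stay consistent,
                # giving up availability.
                return False
            # AP: accept locally; the peer does not see this write yet,
            # giving up consistency.
            self.replicas[via].data[key] = value
            return True
        for r in self.replicas:
            r.data[key] = value
        return True

    def read(self, key, via=0):
        return self.replicas[via].data.get(key)


cp = Cluster("CP")
cp.write("x", 1)
cp.partitioned = True
print(cp.write("x", 2))     # False: write rejected during the partition
print(cp.read("x", via=1))  # 1: both replicas still agree

ap = Cluster("AP")
ap.write("x", 1)
ap.partitioned = True
print(ap.write("x", 2))                          # True: accepted on r1 only
print(ap.read("x", via=0), ap.read("x", via=1))  # 2 1: replicas disagree
```

Healing the partition (reconciling r1 and r2 in AP mode) is deliberately left out; real systems use techniques such as last-writer-wins or version vectors for that step.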
Big Data Lambda Architecture
Big Data Lambda Architecture
• Batch layer
  • Stores the master dataset
  • Computes arbitrary views
• Speed layer
  • Fast, incremental algorithms
  • Batch layer eventually overrides the speed layer
• Serving layer
  • Random access to batch views
  • Updated by the batch layer
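The rule that the batch layer eventually overrides the speed layer can be sketched with a hypothetical page-view counter. `LambdaPipeline` and its methods are illustrative names, not an Azure or Hadoop API. Each event goes both to the append-only master dataset and to the speed layer; once a batch run covers events up to some timestamp, the speed layer discards those events, so query results stay the same while responsibility moves to the batch view.

```python
from collections import Counter


class LambdaPipeline:
    """Toy page-view counter illustrating the batch/speed handoff."""

    def __init__(self):
        self.master = []             # append-only master dataset: (ts, page)
        self.batch_view = Counter()  # output of the last batch run
        self.recent = []             # speed layer: events not yet batched

    def ingest(self, ts, page):
        self.master.append((ts, page))  # batch layer keeps everything
        self.recent.append((ts, page))  # speed layer sees it immediately

    def run_batch(self, up_to):
        # Recompute from scratch over the whole master dataset.
        self.batch_view = Counter(p for ts, p in self.master if ts <= up_to)
        # The batch result now supersedes these real-time increments.
        self.recent = [(ts, p) for ts, p in self.recent if ts > up_to]

    def query(self):
        realtime_view = Counter(p for _, p in self.recent)
        return self.batch_view + realtime_view


pipe = LambdaPipeline()
for ts, page in [(1, "home"), (2, "about"), (3, "home")]:
    pipe.ingest(ts, page)
print(pipe.query())      # answered entirely by the speed layer
pipe.run_batch(up_to=3)
print(len(pipe.recent))  # 0: the batch layer has taken over
pipe.ingest(4, "about")
print(pipe.query())      # batch view plus one real-time increment
```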
The Batch Layer
• Stores master dataset (in append mode)
• Unrestrained computation
• Horizontally scalable
• High latency
Diagram: incoming data streams → master dataset → batch views
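A minimal sketch of how a batch view might be computed, assuming the master dataset is a list of immutable `(user, page)` records split into partitions as if stored on different nodes (the data and function names are hypothetical). Each partition is processed independently, which is what lets the layer scale horizontally, and the partial results are merged in a MapReduce-like style. Counting unique visitors per page is used because it is easy to recompute from scratch but awkward to update incrementally, which illustrates the point about unrestrained computation.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical master dataset, already split into partitions.
partitions = [
    [("alice", "home"), ("bob", "home")],
    [("alice", "home"), ("carol", "about")],
    [("bob", "about")],
]


def map_partition(records):
    """Per-partition work: which users visited each page."""
    seen = defaultdict(set)
    for user, page in records:
        seen[page].add(user)
    return seen


def merge(a, b):
    """Combine two partial results by taking the set union per page."""
    out = defaultdict(set, {k: set(v) for k, v in a.items()})
    for page, users in b.items():
        out[page] |= users
    return out


def compute_batch_view(partitions):
    # Partitions are independent, so this list could be computed in parallel.
    partials = [map_partition(p) for p in partitions]
    merged = reduce(merge, partials, defaultdict(set))
    return {page: len(users) for page, users in merged.items()}


print(compute_batch_view(partitions))  # {'home': 2, 'about': 2}
```

Because the view is a pure function of the master dataset, a bug in the view logic can be fixed by rerunning the computation; the raw records are never modified.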
The Speed Layer
• Stream processing of data
• Stores a limited window of data
• Dynamic computation
Diagram: incoming data streams → process stream (real-time increments) → increment views → real-time views
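One way to keep a real-time view over a limited window is a sliding-window counter, sketched here under the assumption that events arrive in timestamp order. `SlidingWindowCounter` is a hypothetical name; each new event updates the counts incrementally, and events that fall out of the window are evicted so memory stays bounded.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Real-time view: page counts over the last `window` time units only."""

    def __init__(self, window):
        self.window = window
        self.events = deque()    # (timestamp, page), oldest first
        self.counts = Counter()  # incrementally maintained view

    def process(self, ts, page):
        # Incremental update: touch only the new event...
        self.events.append((ts, page))
        self.counts[page] += 1
        # ...then evict whatever has fallen out of the window.
        self._evict(ts)

    def _evict(self, now):
        while self.events and self.events[0][0] <= now - self.window:
            _, old_page = self.events.popleft()
            self.counts[old_page] -= 1
            if self.counts[old_page] == 0:
                del self.counts[old_page]

    def view(self):
        return dict(self.counts)


w = SlidingWindowCounter(window=10)
for ts, page in [(0, "home"), (3, "about"), (9, "home"), (12, "home")]:
    w.process(ts, page)
print(w.view())  # {'home': 2, 'about': 1}
```

The event at time 0 is dropped when the event at time 12 arrives; in a Lambda setup that older data is still available to the batch layer.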
The Serving Layer
• Queries the batch and real-time views
• Merges the results
Diagram: batch views + real-time views → querying and merging → output
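The query-and-merge step can be as small as two lookups and a combine function. This sketch assumes both views are in-memory dictionaries keyed by page; the `serve` function and the numbers are hypothetical. Summing suits additive metrics such as counts, while other metrics need a different `combine`, such as `max`.

```python
from typing import Callable, Dict


def serve(key: str,
          batch_view: Dict[str, int],
          realtime_view: Dict[str, int],
          combine: Callable[[int, int], int] = lambda b, r: b + r,
          empty: int = 0) -> int:
    """Answer a query by looking up both views and merging the results."""
    b = batch_view.get(key, empty)     # random access into the batch view
    r = realtime_view.get(key, empty)  # recent data not yet in the batch view
    return combine(b, r)


batch_view = {"home": 120, "about": 40}    # hypothetical precomputed counts
realtime_view = {"home": 3, "contact": 1}  # hypothetical recent increments

print(serve("home", batch_view, realtime_view))     # 123
print(serve("contact", batch_view, realtime_view))  # 1
print(serve("home", {"home": 7}, {"home": 9}, combine=max))  # 9
```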
Microsoft Lambda Architecture Support
Diagram: Serving Layer, Speed Layer, Batch Layer
Windows Azure HDInsight
Azure Blob storage
MapReduce, Hive, Pig, Oozie, SSIS
Federations in Windows Azure SQL Database
Azure tables
Memcached/MongoDB
SQL Server database engine
SQL Server VM:
• Columnstore indexes
• Analysis Services
• StreamInsight
Azure Storage Explorer
Microsoft Excel
Power Query
PowerPivot
Power View
Power Map
Reporting Services
LINQ to Hive
Analysis Services
Yahoo!
Diagram: Serving Layer, Speed Layer, Batch Layer
• Apache Hadoop (Hadoop data)
• Third-party database
• Staging database
• SQL Server Connector (Hadoop Hive ODBC)
• SQL Server Analysis Services (SSAS cube)
• Microsoft Excel and PowerPivot for Excel
• Other BI tools and custom applications
Ferranti Computer Systems
Diagram: Serving Layer, Speed Layer, Batch Layer
• Data feed from smart meters
• Reactive Extensions (Rx)
• SQL Server database (In-Memory OLTP)
• Windows Azure HDInsight
• SQL Server Analysis Services
• SQL Server Reporting Services
• Microsoft Dynamics AX
Windows Azure Storage
Demo 1: Setting up the Windows Azure storage account
• Windows Azure Blob storage
• Azure Storage Explorer
Blob Storage Concepts
• Store large amounts of unstructured text or binary data with fast read performance
• Highly scalable, durable, and available storage
• Blobs can be exposed publicly over HTTP
• Securely lock down permissions to blobs
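Hierarchy: Account → Container → Blob → Blocks/Pages
Example: account Contoso, with container Images (PIC01.JPG, PIC02.JPG) and container Video (VID1.AVI); each blob is stored as blocks or pages
URL format: http://<account>.blob.core.windows.net/<container>/<blobname>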
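A small helper can build and parse blob URLs in the `http://<account>.blob.core.windows.net/<container>/<blobname>` format. `blob_url` and `parse_blob_url` are hypothetical helpers, not part of an Azure SDK. The example uses lowercase account and container names because Azure requires lowercase for both.

```python
from urllib.parse import quote, unquote, urlsplit


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build http://<account>.blob.core.windows.net/<container>/<blobname>."""
    # Percent-encode the blob name but keep '/' so virtual folders stay readable.
    return (f"http://{account}.blob.core.windows.net/"
            f"{container}/{quote(blob_name, safe='/')}")


def parse_blob_url(url: str):
    """Split a blob URL back into (account, container, blob_name)."""
    parts = urlsplit(url)
    account = parts.hostname.split(".")[0]
    container, _, blob_name = parts.path.lstrip("/").partition("/")
    return account, container, unquote(blob_name)


url = blob_url("contoso", "images", "PIC01.JPG")
print(url)                  # http://contoso.blob.core.windows.net/images/PIC01.JPG
print(parse_blob_url(url))  # ('contoso', 'images', 'PIC01.JPG')
```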
Getting started with HDInsight Service
Demo 2: Setting up the Windows Azure HDInsight cluster
• Windows Azure HDInsight
• Windows Azure Blob storage
• HDInsight Console: https://<ClusterName>.azurehdinsight.net/
Demo 3: Loading data into Windows Azure storage for use with HDInsight
• Source: CSV files from local disk
• Windows Azure Blob storage
• Windows Azure HDInsight
• HDInsight Console: https://<ClusterName>.azurehdinsight.net/
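Before copying CSV files from local disk, it can help to work out where each file will land in the storage container. The sketch below uses a hypothetical helper, `plan_csv_upload`, that walks a local folder, keeps only files matching `*.csv` (files named `*.CSV` may be missed on case-sensitive platforms), and maps each one to a blob URL that preserves its relative folder path. It only plans the copy and never contacts Azure; the transfer itself would be done with a tool such as Azure Storage Explorer or an Azure SDK. The account `contoso`, container `data`, and prefix `input` are made-up values.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote


def plan_csv_upload(local_dir, account, container, prefix=""):
    """Return (local_path, blob_url) pairs for every CSV file under local_dir.

    This only plans the copy; it does not contact Azure.
    """
    root = Path(local_dir)
    plan = []
    for path in sorted(root.rglob("*.csv")):
        # Keep the relative folder structure as a virtual folder in the blob name.
        rel = path.relative_to(root).as_posix()
        blob_name = f"{prefix.strip('/')}/{rel}" if prefix else rel
        url = (f"http://{account}.blob.core.windows.net/"
               f"{container}/{quote(blob_name, safe='/')}")
        plan.append((path, url))
    return plan


# Usage with a throwaway folder standing in for the local disk.
with tempfile.TemporaryDirectory() as d:
    Path(d, "2013").mkdir()
    Path(d, "2013", "sales.csv").write_text("id,amount\n1,10\n")
    Path(d, "notes.txt").write_text("not a CSV, so it is skipped")
    example = [(p.name, url)
               for p, url in plan_csv_upload(d, "contoso", "data", prefix="input")]

print(example)
```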
Easy Access to Data, Big & Small
Easy Access to Data, Big & Small
• Simplify access to public & corporate data
• Easily preview, shape, & format your data
• Combine and refine data across multiple sources
• Gain insight across relational, unstructured, & semi-structured data
• Common management of structured & unstructured data
• Query across relational DB & Hadoop with a single T-SQL query
Products: Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with PolyBase
Learn more
• Getting Started with HDInsight: http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx
• Azure HDInsight and Azure Storage: http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx
Questions?