![Page 1: Integration with Hadoop PolyBase in SQL 2016download.microsoft.com/documents/cs-cz/enterprise/... · 20/4/2016 · ADASTRA CZECH REPUBLIC Adastra, s.r.o. Karolinská 654/2, 186 00](https://reader033.vdocuments.mx/reader033/viewer/2022042413/5f2cb751ea5a6967bc63abdb/html5/thumbnails/1.jpg)
Integration with Hadoop
PolyBase in SQL 2016
Adastra
Pavel Stejskal, Consultant
linkedin.com/in/pavelstejskal
20.4.2016
Integration with Hadoop using PolyBase
• Excel + Power BI add-ins: Query, Pivot, View, Map
• SharePoint: Power Pivot Gallery, Power View
• Excel: Data Mining
• Power BI Desktop, Power BI Portal
• Azure ML
• Power BI Mobile App
• Analytics Platform System (APS)
PolyBase allows you to use Transact-SQL (T-SQL) statements to access data stored in Hadoop or Azure Blob Storage and query it in an ad-hoc fashion.
PolyBase
What is PolyBase and where does it belong?
[Diagram: PolyBase sits between the relational and the non-relational world and integrates with standard BI tools.]
• Cloud solution: Hadoop cluster (Hortonworks / Cloudera), Azure Blob Storage
• On-premises solution: Hadoop cluster (Hortonworks / Cloudera)
How to start with PolyBase - requirements
• Hardware
– Server for SQL (SMP architecture)
– Hadoop cluster (MPP architecture)
– Fast network between SQL and Hadoop
• Software
– Microsoft SQL Server 2016 (RDBMS)
– Hadoop distribution (Hortonworks or Cloudera)
• In case of cloud solution
– Hadoop in cloud
– Azure blob storage
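Once the software listed above is in place, PolyBase still has to be told which Hadoop flavour it will talk to. The following is a minimal sketch, assuming the PolyBase feature was selected during SQL Server 2016 setup. The connectivity value 7 is only an illustration (in Microsoft's documentation it covers certain Hortonworks HDP releases and Azure Blob Storage) and should be checked against the documented list for your distribution.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Tell PolyBase which Hadoop distribution / version it will connect to.
-- The value 7 is an assumption for illustration; use the value that the
-- SQL Server documentation lists for your distribution.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- The SQL Server service and the PolyBase services must be restarted
-- before the new setting takes effect.
```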
PolyBase for SQL Server 2016 – How it works
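[Diagram: a T-SQL query enters MS SQL 2016, which contains the SQL Server engine, the PolyBase engine and PolyBase DMS*. Data is transferred between MS SQL 2016 and the Hadoop cluster (one NameNode, three DataNodes). A DB table and an external table can be combined in a direct JOIN, without ETL.]
* DMS = Data Movement Service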
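The "direct JOIN without ETL" idea can be sketched as an ordinary T-SQL query. This is an illustration only: `ClickStream` is the external table over HDFS defined elsewhere in this deck, while `dbo.BlockedAddresses` and its `ip_address` column are hypothetical names for a regular table stored in SQL Server.

```sql
-- dbo.BlockedAddresses is a hypothetical local table (ip_address varchar(50)).
-- ClickStream is the external table whose rows live in HDFS.
SELECT cs.url,
       cs.event_date,
       cs.user_IP
FROM ClickStream AS cs
JOIN dbo.BlockedAddresses AS b
    ON b.ip_address = cs.user_IP
WHERE cs.event_date >= '2016-01-01';
```

From the query author's point of view the external table behaves like any other table; PolyBase and DMS handle the data transfer from Hadoop.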
3 basic concepts for PolyBase objects
1. External data source
CREATE EXTERNAL DATA SOURCE HadoopHDP2 WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.xxx.xx.xxx:xxxx',
    RESOURCE_MANAGER_LOCATION = '10.xxx.xx.xxx:xxxx',
    CREDENTIAL = HadoopUser1 -- only for Kerberos-secured Hadoop
);
2. External file format
CREATE EXTERNAL FILE FORMAT TextFileFormat WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|',
                    USE_TYPE_DEFAULT = TRUE)
);
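For a Kerberos-secured cluster, the credential named in the data source has to exist before the data source is created. A minimal sketch, with every value in angle brackets being a placeholder rather than something taken from the slides:

```sql
-- A database master key is required before a database scoped credential
-- can be created. The password is a placeholder.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- IDENTITY is the Kerberos user name and SECRET its password (placeholders).
-- The credential name matches the one used in the data source definition.
CREATE DATABASE SCOPED CREDENTIAL HadoopUser1
WITH IDENTITY = '<hadoop_user_name>',
     SECRET   = '<hadoop_password>';
```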
3 basic concepts for PolyBase objects (continued)
3. External table
CREATE EXTERNAL TABLE ClickStream (
    url varchar(50),
    event_date date,
    user_IP varchar(50)
)
WITH (
    LOCATION = '/webdata/employee.tbl', -- path in HDFS
    DATA_SOURCE = HadoopHDP2,
    FILE_FORMAT = TextFileFormat
);
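Once the three objects exist, the external table can be queried like a regular table. The sketch below is illustrative; the date filter is arbitrary. The optional `FORCE EXTERNALPUSHDOWN` hint asks PolyBase to push the computation to the Hadoop cluster, which relies on `RESOURCE_MANAGER_LOCATION` being set on the data source.

```sql
-- Count hits per URL directly over the file stored in HDFS.
SELECT url,
       COUNT(*) AS hits
FROM ClickStream
WHERE event_date >= '2016-04-01'
GROUP BY url
OPTION (FORCE EXTERNALPUSHDOWN); -- optional hint: compute in Hadoop where possible
```

Without the hint, the optimizer decides on its own whether pushing work to Hadoop is worthwhile.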
External table
• Adding a shape to semi-structured data
1. File format – “|” as delimiter
2. Defined types of columns
3. Table for T-SQL query
Demo
Sqoop vs. PolyBase
[Diagram: side-by-side comparison of Sqoop and PolyBase. In both cases SQL holds 2 TB and the Hadoop cluster holds 100 TB of data. Labels: T-SQL query, Hive SQL, data volume, “???”.]
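Where Sqoop copies data between the two systems as a separate job, a comparable import with PolyBase can be written as plain T-SQL. A minimal sketch, assuming the `ClickStream` external table from this deck; `dbo.ClickStream_Local` is a hypothetical name for the local target table, and the date filter is arbitrary.

```sql
-- Materialize a slice of the Hadoop data as a regular SQL Server table.
-- dbo.ClickStream_Local is a hypothetical target table name.
SELECT url,
       event_date,
       user_IP
INTO dbo.ClickStream_Local
FROM ClickStream
WHERE event_date >= '2016-01-01';
```

Filtering before the import keeps the transferred volume small relative to what is stored in the cluster.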
ADASTRA CZECH REPUBLIC
Adastra, s.r.o.
Karolinská 654/2, 186 00 Praha 8
Tel.: +420 271 733 303
www.adastra.cz
ADASTRA GROUP North America
8500 Leslie St.
Markham, Ontario, L3T 7M8
Tel: +1 905 881 7946
Restrictions for public release and use: This document may contain confidential information. As such, it may not be copied or transferred without Adastra’s prior consent.
Important: All brands and product names given in this documentation are or may be registered trademarks of their owners. © 2016 Adastra, all rights reserved.
Thank you!