PXF BDAM 2016
TRANSCRIPT
![Page 1: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/1.jpg)
Shivram Mani (Pivotal)
PXF: A Unified Access Framework for HDFS Datasets
![Page 2: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/2.jpg)
Agenda
● Motivations
● PXF Introduction
● Architecture/Design
● Developer View
● Usage/Plugins
● Value Proposition to new applications
● What's coming
![Page 3: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/3.jpg)
Motivations: SQL on Hadoop
RDBMS
● ANSI SQL
● Cost-based optimizer
● Transactions
● ...
?
Various formats and storage systems supported on HDFS
Foreign Tables!
![Page 4: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/4.jpg)
What is PXF?
PXF is an extension framework that:
● Provides a uniform tabular view of heterogeneous data sources
● Exploits parallelism for data access
● Offers a pluggable framework for custom connectors
● Provides built-in connectors for accessing data in HDFS files, Hive/HBase tables, etc.
![Page 5: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/5.jpg)
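PXF Communication
[Diagram: HAWQ segments read native tables through libhdfs3 (written in C) and external tables over HTTP (port 51200) through the REST API of the PXF webapp, which runs in Apache Tomcat. PXF reaches the underlying data stores through their Java APIs and Java/Thrift.]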
![Page 6: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/6.jpg)
Deployment Architecture
● HAWQ Master Node / NN: pxf, HBase Master
● DN1: pxf, HAWQ seg1, HBase Region Server1
● DN2: pxf, HAWQ seg2, HBase Region Server2
● DN3: pxf, HAWQ seg3, HBase Region Server3
● DN4: pxf, HAWQ seg4
* PXF needs to be installed on all DataNodes (DN)
* PXF is recommended to be installed on the NameNode (NN)
![Page 7: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/7.jpg)
PXF Components
● Fragmenter: Splits the dataset into partitions. Returns the location of each partition.
● Accessor: Understands and reads/writes the fragment. Returns records.
● Resolver: Converts records to a consumable format (data types).
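The division of labor between the three components can be sketched in a few lines. This is a minimal illustration in Python, not PXF's actual Java plugin API: the class names (`LineFragmenter`, `LineAccessor`, `CsvResolver`), the in-memory "dataset", and the fixed fragment size are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Fragment:
    """One partition of the dataset plus the host assumed to store it."""
    index: int
    host: str


class LineFragmenter:
    """Hypothetical fragmenter: splits a list of lines into fixed-size parts."""

    def __init__(self, lines: List[str], hosts: List[str], size: int = 2):
        self.lines, self.hosts, self.size = lines, hosts, size

    def get_fragments(self) -> List[Fragment]:
        # Ceiling division: the last fragment may be shorter than `size`.
        count = (len(self.lines) + self.size - 1) // self.size
        return [Fragment(i, self.hosts[i % len(self.hosts)])
                for i in range(count)]


class LineAccessor:
    """Hypothetical accessor: reads the raw records of one fragment."""

    def __init__(self, lines: List[str], size: int = 2):
        self.lines, self.size = lines, size

    def read(self, fragment: Fragment) -> Iterator[str]:
        start = fragment.index * self.size
        yield from self.lines[start:start + self.size]


class CsvResolver:
    """Hypothetical resolver: turns 'id,name' text into typed fields."""

    def resolve(self, record: str) -> Tuple[int, str]:
        raw_id, name = record.split(",", 1)
        return int(raw_id), name


def scan(lines: List[str], hosts: List[str]) -> List[Tuple[int, str]]:
    """Fragment the data, read every fragment, resolve every record."""
    fragmenter = LineFragmenter(lines, hosts)
    accessor = LineAccessor(lines)
    resolver = CsvResolver()
    return [resolver.resolve(record)
            for fragment in fragmenter.get_fragments()
            for record in accessor.read(fragment)]
```

In a real deployment the fragments would be read in parallel by different segments; the sequential loop in `scan` only shows the order of responsibilities.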
![Page 8: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/8.jpg)
Architecture - Read Data Flow
0. select * from ext_table
1. getFragments() API, pxf://<location>:<port>/<path>
2. Fragments (JSON)
3. Split mapping (fragment -> segment)
4. Query dispatched to Segment 1,2,3… (Interconnect)
5. Read() REST
6. Records
7. Records (stream)
8. Query result
Nodes shown: HAWQ Master Node / NN with pxf; DN1 with pxf and HAWQ seg1.
PXF components shown: Fragmenter, Accessor, Resolver.
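The "split mapping (fragment -> segment)" step can be pictured as an assignment problem. The sketch below is one plausible policy, assumed for illustration only: prefer a segment on the host that stores the fragment, otherwise pick the least-loaded segment. The slides do not state which policy HAWQ actually uses, and the function name and input shapes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_fragments_to_segments(
    fragments: List[Tuple[str, str]],   # (fragment_id, host storing it)
    segments: List[Tuple[str, str]],    # (segment_id, host it runs on)
) -> Dict[str, List[str]]:
    """Return segment_id -> list of fragment_ids assigned to that segment."""
    if not segments:
        raise ValueError("at least one segment is required")

    assignment: Dict[str, List[str]] = {seg_id: [] for seg_id, _ in segments}
    segments_by_host: Dict[str, List[str]] = defaultdict(list)
    for seg_id, host in segments:
        segments_by_host[host].append(seg_id)

    for frag_id, host in fragments:
        # Local segments first; fall back to every segment if none is local.
        candidates = segments_by_host.get(host) or list(assignment)
        # Among the candidates, choose the one with the fewest fragments so far.
        target = min(candidates, key=lambda seg: len(assignment[seg]))
        assignment[target].append(frag_id)
    return assignment
```

For example, with segments on `dn1`, `dn2`, and `dn3`, a fragment stored on `dn1` goes to the `dn1` segment, while a fragment on a host without a segment goes to whichever segment currently holds the fewest fragments.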
![Page 9: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/9.jpg)
Read Data Flow - Take 2
![Page 10: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/10.jpg)
PXF Developer View
![Page 11: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/11.jpg)
PXF Usage
Built-in with Plugins
● HDFS
● Hive
● HBase
Community (https://bintray.com/big-data/maven/pxf-plugins/view)
● Cassandra
● Accumulo
● Redis
● ...
CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
    ( column_name data_type [, ...] )
LOCATION ('pxf://host[:port]/path-to-data?PROFILE=<profile-name> [&custom-option=value...]')
FORMAT '[TEXT | CSV | CUSTOM]' (<formatting_properties>);
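To make the template concrete, the following sketch assembles a statement from its parts. It is only a string builder written for this illustration: the helper names (`pxf_location`, `create_external_table`), the host `namenode`, the path, and the column list are assumptions, and no identifier quoting or SQL escaping is attempted.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def pxf_location(host: str, path: str, profile: str,
                 port: Optional[int] = None, **options: str) -> str:
    """Build pxf://host[:port]/path-to-data?PROFILE=<profile>[&option=value...]."""
    authority = f"{host}:{port}" if port is not None else host
    # PROFILE is listed first; custom options follow in the order given.
    query = urlencode({"PROFILE": profile, **options})
    return f"pxf://{authority}/{path.lstrip('/')}?{query}"


def create_external_table(name: str, columns: List[Tuple[str, str]],
                          location: str, fmt: str = "TEXT",
                          format_props: str = "",
                          writable: bool = False) -> str:
    """Fill in the CREATE EXTERNAL TABLE template (illustration only)."""
    kind = "WRITABLE" if writable else "READABLE"
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    props = f" ({format_props})" if format_props else ""
    return (f"CREATE {kind} EXTERNAL TABLE {name} ({cols})\n"
            f"LOCATION ('{location}')\n"
            f"FORMAT '{fmt}'{props};")


# Hypothetical example values.
location = pxf_location("namenode", "/data/sales.csv", "HdfsTextSimple", port=51200)
ddl = create_external_table("sales_ext", [("id", "int"), ("comments", "text")],
                            location, fmt="TEXT", format_props="delimiter=E','")
```

Here `ddl` holds a three-line statement whose LOCATION clause points at `pxf://namenode:51200/data/sales.csv?PROFILE=HdfsTextSimple`.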
![Page 12: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/12.jpg)
PXF HDFS Plugin
Fragment - Splits (blocks)
● Supports reading multiple formats (see the profiles listed here)
● Supports writing to SequenceFiles
● Chunked read optimization
● Support for stats
| Profile | Description |
|---|---|
| HdfsTextSimple | Read delimited single-line records (plain text) |
| HdfsTextMulti | Read delimited multiline records (plain text) |
| Avro | Read Avro records |
| JSON | Supports simple/pretty-printed JSON with field projection |
![Page 13: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/13.jpg)
PXF Hive Plugin
Fragment - Splits of the file stored in table
● Text based
● SequenceFile
● RCFile
● ORCFile
● Parquet
● Avro
*Complex types are converted to text
Partition Filtering
Metadata API *
| Profile | Description |
|---|---|
| Hive | Read all Hive tables (all types) |
| HiveRC | Hive tables stored in RC (serialized with ColumnarSerDe/LazyBinaryColumnarSerDe) |
| HiveText | Faster access for Hive tables stored as Text |
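The slide names "Partition Filtering" without detail. A minimal sketch of the general idea, assuming that equality predicates on partition columns let the plugin skip whole partitions before any fragment is read; the function name and the partition representation are hypothetical and not taken from PXF:

```python
from typing import Dict, List


def prune_partitions(partitions: List[dict],
                     predicates: Dict[str, str]) -> List[dict]:
    """Keep partitions whose key values satisfy every applicable predicate.

    Each partition is a dict with a 'path' and a 'keys' dict (partition
    column -> value). Predicates on columns that are not partition keys are
    ignored here; they would have to be evaluated later, row by row.
    """
    kept = []
    for part in partitions:
        keys = part["keys"]
        if all(keys[col] == val
               for col, val in predicates.items() if col in keys):
            kept.append(part)
    return kept


# Hypothetical table partitioned by month.
partitions = [
    {"path": "sales/month=01", "keys": {"month": "01"}},
    {"path": "sales/month=02", "keys": {"month": "02"}},
]
selected = prune_partitions(partitions, {"month": "02", "region": "EU"})
```

Only the `month=02` partition survives; the `region` predicate is left for row-level evaluation because `region` is not a partition key in this example.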
![Page 14: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/14.jpg)
PXF HBase Plugin
Fragment - Regions
● Read-only. Uses profile 'HBase'
● Filter pushdown to the HBase scanner
○ (Operators: EQ, NE, LT, GT, LE, GE, and AND)
● Direct Mapping
● Indirect Mapping
○ Lookup table - pxflookup
○ Maps attribute name to HBase <cf:qualifier>

| (row key) | mapping |
|---|---|
| sales | id=cf1:saleid |
| sales | cmts=cf8:comments |
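The indirect mapping can be pictured as a dictionary lookup keyed by HBase table name. The sketch below assumes each lookup entry has the form `attribute=cf:qualifier`, and that under direct mapping the attribute name itself is already `cf:qualifier`; both function names are hypothetical and this is not PXF code.

```python
from typing import Dict, Iterable, Tuple

Lookup = Dict[str, Dict[str, Tuple[str, str]]]


def parse_lookup(rows: Iterable[Tuple[str, str]]) -> Lookup:
    """Turn (hbase_table, 'attr=cf:qualifier') rows into a nested dict."""
    lookup: Lookup = {}
    for table, mapping in rows:
        attr, target = mapping.split("=", 1)
        family, qualifier = target.split(":", 1)
        lookup.setdefault(table, {})[attr.strip()] = (family.strip(),
                                                      qualifier.strip())
    return lookup


def resolve_column(lookup: Lookup, table: str, attr: str) -> Tuple[str, str]:
    """Return (column family, qualifier) for an external-table attribute."""
    mapped = lookup.get(table, {}).get(attr)
    if mapped is not None:
        return mapped                      # indirect mapping via lookup table
    family, sep, qualifier = attr.partition(":")
    if not sep:
        raise KeyError(f"no mapping for attribute {attr!r} on table {table!r}")
    return family, qualifier               # direct mapping: name is cf:qualifier


lookup = parse_lookup([("sales", "id=cf1:saleid"),
                       ("sales", "cmts=cf8:comments")])
```

With this lookup, the attribute `cmts` on table `sales` resolves to column family `cf8` and qualifier `comments`, while an attribute already named `cf1:saleid` is split directly.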
![Page 15: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/15.jpg)
Value Proposition of PXF
● Abstracts the application from external data sources, APIs, and versions
● Focus on one data layout
● Off-the-shelf support for various data sources
● Extensibility. Ease of supporting custom data sources
● Provides means for filter pushdown
● Dataset statistics for performance optimization
![Page 16: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/16.jpg)
PXF with Postgres
● Using FDW callback functions that will interact with PXF.
[Diagram: an FDW connects over HTTP (port 51200) to the REST API of the PXF webapp, which runs in Apache Tomcat; PXF uses Java APIs and Java/Thrift to reach the data sources.]
![Page 17: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/17.jpg)
What's coming
● HA
● Schema Auto Discovery (Metadata)
● Support for more dataset statistics
● Time series data optimization
● More plugins (Gemfire, Solr, etc.)
● Additional filter pushdown support
● Custom Output Format
![Page 18: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/18.jpg)
Contribution
● Feature areas: custom plugins (storage, formats), push-down filters, custom applications
● Documentation (Wiki/Docs): cwiki.apache.org/confluence/display/HAWQ/PXF, http://hawq.incubator.apache.org/docs/pxf/javadoc
● Code / Review (GitHub, Apache): github.com/apache/incubator-hawq/tree/master/pxf
● Issues: issues.apache.org/jira/browse/HAWQ, Component = PXF
● Join discussion / ask questions (Apache DLs): [email protected] @hawq.incubator.apache.org
● GitHub (Field): github.com/Pivotal-Field-Engineering/pxf-field
![Page 19: PXF BDAM 2016](https://reader035.vdocuments.mx/reader035/viewer/2022070514/5880d5681a28ab9c3a8b5f63/html5/thumbnails/19.jpg)
Thank you!