Apache Tez: Accelerating Hadoop Query Processing
DESCRIPTION
Jeff Markham, Technical Director for APAC at Hortonworks, introduces Tez. Tez is software that replaces MapReduce to accelerate query processing on Hadoop. The talk covers why Tez was built, how it is structured, how its optimizations work, and how much performance has improved.
TRANSCRIPT
Apache Tez : Accelerating Hadoop Query Processing
Jeff Markham, Technical Director, APAC, Hortonworks
© Hortonworks Inc. 2013
Tez – Introduction
• Distributed execution framework targeted towards data-processing applications.
• Based on expressing a computation as a dataflow graph.
• Built on top of YARN – the resource management framework for Hadoop.
• Open source Apache incubator project and Apache licensed.
YARN: Taking Hadoop Beyond Batch
HADOOP 1.0 (MapReduce as Base)
• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)
• Pig (data flow), Hive (sql), Others (cascading)
HADOOP 2.0 (Apache Tez as Base)
• HDFS2 (redundant, reliable storage)
• YARN (cluster resource management)
• Tez (execution engine): Data Flow (Pig), SQL (Hive), Others (cascading)
• Batch: MapReduce
• Real Time Stream Processing: Storm
• Online Data Processing: HBase, Accumulo
Apache Tez (“Speed”)
• Replaces MapReduce as primitive for Pig, Hive, Cascading etc.
– Smaller latency for interactive queries
– Higher throughput for batch queries
– 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft
YARN ApplicationMaster to run DAG of Tez Tasks
Task with pluggable Input, Processor and Output
Tez Task - <Input, Processor, Output>
Tez: Building blocks for scalable data processing
• Classical ‘Map’: HDFS Input → Map Processor → Sorted Output
• Classical ‘Reduce’: Shuffle Input → Reduce Processor → HDFS Output
• Intermediate ‘Reduce’ for Map-Reduce-Reduce: Shuffle Input → Reduce Processor → Sorted Output
Hive-on-MR vs. Hive-on-Tez
SELECT a.x, AVERAGE(b.y) AS avg FROM a JOIN b ON (a.id = b.id) GROUP BY a
UNION
SELECT x, AVERAGE(y) AS AVG FROM c GROUP BY x
ORDER BY AVG;
[Figure: a query plan (SELECT a.state, c.itemId; SELECT c.price; SELECT b.id; JOIN (a, c); JOIN (a, b); GROUP BY a.state; COUNT(*); AVERAGE(c.price)) shown twice. Hive – MR runs it as several Map (M) and Reduce (R) jobs with HDFS writes between them. Hive – Tez runs it as one graph of M and R stages.]
Tez avoids unneeded writes to HDFS
Tez Sessions
… because Map/Reduce query startup is expensive
• Tez Sessions
– Hot containers ready for immediate use
– Removes task and job launch overhead (~5s – 30s)
• Hive
– Session launch/shutdown in background (seamless, user not aware)
– Submits query plan directly to Tez Session
Native Hadoop service, not ad-hoc
Tez Delivers Interactive Query - Out of the Box!
Feature | Description | Benefit
Tez Session | Overcomes Map-Reduce job-launch latency by pre-launching Tez AppMaster. | Latency
Tez Container Pre-Launch | Overcomes Map-Reduce latency by pre-launching hot containers ready to serve queries. | Latency
Tez Container Re-Use | Finished maps and reduces pick up more work rather than exiting. Reduces latency and eliminates difficult split-size tuning. Out of box performance! | Latency
Runtime re-configuration of DAG | Runtime query tuning by picking aggregation parallelism using online query statistics. | Throughput
Tez In-Memory Cache | Hot data kept in RAM for fast access. | Latency
Complex DAGs | Tez Broadcast Edge and Map-Reduce-Reduce pattern improve query scale and throughput. | Throughput
Tez – Design Themes
• Empowering End Users
• Execution Performance
Tez – Empowering End Users
• Expressive dataflow definition APIs
• Flexible Input-Processor-Output runtime model
• Data type agnostic
• Simplifying deployment
Tez – Empowering End Users
• Expressive dataflow definition APIs
– Enable definition of complex data flow pipelines using simple graph connection APIs. Tez expands the logical plan at runtime.
– Targeted towards data processing applications like Hive/Pig but not limited to them. Hive/Pig query plans naturally map to Tez dataflow graphs with no translation impedance.
[Figure: a logical plan expanded at runtime into tasks TaskA-1, TaskA-2, TaskB-1, TaskB-2, TaskC-1, TaskC-2, TaskD-1, TaskD-2, TaskE-1, TaskE-2.]
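The idea of a simple graph connection API whose logical plan is expanded at runtime can be sketched as below. This is a minimal illustration in Python, not the Tez API: the `Dag` and `Vertex` classes, their method names, and the edge layout are all assumptions made for the example. Only the task names come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str         # logical stage name, e.g. "TaskA"
    parallelism: int  # number of physical tasks to run for this stage


@dataclass
class Dag:
    """Hypothetical logical dataflow graph; not the real Tez DAG class."""
    vertices: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (producer, consumer) pairs

    def add_vertex(self, name, parallelism):
        self.vertices[name] = Vertex(name, parallelism)
        return self

    def connect(self, producer, consumer):
        if producer not in self.vertices or consumer not in self.vertices:
            raise KeyError("both vertices must be added before connecting")
        self.edges.append((producer, consumer))
        return self

    def expand(self):
        """Expand each logical vertex into its numbered physical tasks."""
        return {
            v.name: [f"{v.name}-{i}" for i in range(1, v.parallelism + 1)]
            for v in self.vertices.values()
        }


dag = Dag()
for name in ["TaskA", "TaskB", "TaskC", "TaskD", "TaskE"]:
    dag.add_vertex(name, parallelism=2)

# The edge layout is illustrative; the slide does not spell it out.
dag.connect("TaskA", "TaskD").connect("TaskB", "TaskD")
dag.connect("TaskC", "TaskE").connect("TaskD", "TaskE")

tasks = dag.expand()
print(tasks["TaskA"])
```

The user describes five logical vertices and four edges; the ten physical tasks only appear when `expand()` runs, which mirrors the split between logical plan and runtime expansion.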
Tez – Empowering End Users
• Expressive dataflow definition APIs
[Figure: Distributed Sort. A Preprocessor Stage, a Partition Stage and an Aggregate Stage, each running Task-1 and Task-2, plus a Sampler that receives Samples and supplies Ranges.]
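A sample-based distributed sort of the kind named on this slide can be sketched in a single process as follows. The function names, the sample size and the way split points are picked are assumptions for illustration; the slide only names the Sampler, Samples, Ranges and the stages.

```python
import bisect
import random


def pick_ranges(samples, num_partitions):
    """Sampler: choose num_partitions - 1 split points from the samples."""
    ordered = sorted(samples)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition(records, ranges):
    """Partition stage: route each record to the range that contains it."""
    buckets = [[] for _ in range(len(ranges) + 1)]
    for record in records:
        buckets[bisect.bisect_right(ranges, record)].append(record)
    return buckets


def distributed_sort(records, num_partitions=3, sample_size=20, seed=0):
    if not records:
        return []
    rng = random.Random(seed)
    # Preprocessor stage: draw a small sample of the input.
    samples = rng.sample(records, min(sample_size, len(records)))
    ranges = pick_ranges(samples, num_partitions)
    # Aggregate stage: each partition is sorted independently.
    sorted_parts = [sorted(bucket) for bucket in partition(records, ranges)]
    # Buckets are already ordered by range, so concatenation is sorted.
    return [value for part in sorted_parts for value in part]


data = [random.Random(1).randint(0, 1000) for _ in range(200)]
result = distributed_sort(data)
print(result[:5])
```

Because every record in one bucket is smaller than every record in the next, the per-partition sorts can run in parallel and their outputs only need to be concatenated.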
Tez – Empowering End Users
• Flexible Input-Processor-Output runtime model
– Construct physical runtime executors dynamically by connecting different inputs, processors and outputs.
– End goal is to have a library of inputs, outputs and processors that can be programmatically composed to generate useful tasks.
[Figure: example tasks. Mapper: HDFSInput → MapProcessor → FileSortedOutput. Reducer: ShuffleInput → ReduceProcessor → HDFSOutput. PairwiseJoin: Input1 and Input2 → JoinProcessor → FileSortedOutput.]
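Composing a task from interchangeable inputs, processors and outputs might look like the sketch below. All class names here (`ListInput`, `SortedOutput`, `WordCountMapProcessor`, `SumReduceProcessor`, `Task`) are hypothetical stand-ins written for this example; they are not the Tez runtime classes, and in-memory lists stand in for HDFS and shuffle data.

```python
class ListInput:
    """Stands in for an input such as HDFSInput or ShuffleInput."""

    def __init__(self, records):
        self.records = records

    def read(self):
        return iter(self.records)


class SortedOutput:
    """Stands in for a sorted output; keeps records in memory."""

    def __init__(self):
        self.records = []

    def write(self, records):
        self.records = sorted(records)


class WordCountMapProcessor:
    def run(self, inputs, output):
        pairs = [(word, 1)
                 for source in inputs
                 for line in source.read()
                 for word in line.split()]
        output.write(pairs)


class SumReduceProcessor:
    def run(self, inputs, output):
        totals = {}
        for source in inputs:
            for key, value in source.read():
                totals[key] = totals.get(key, 0) + value
        output.write(totals.items())


class Task:
    """A task is just <inputs, processor, output> wired together."""

    def __init__(self, inputs, processor, output):
        self.inputs = inputs
        self.processor = processor
        self.output = output

    def run(self):
        self.processor.run(self.inputs, self.output)
        return self.output


mapper = Task([ListInput(["a b a", "b a"])], WordCountMapProcessor(), SortedOutput())
map_output = mapper.run()
reducer = Task([ListInput(map_output.records)], SumReduceProcessor(), SortedOutput())
counts = reducer.run().records
print(counts)
```

Swapping the processor or the output type changes what the task does without touching the `Task` class, which is the point of keeping the three pieces pluggable.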
Tez – Empowering End Users
• Data type agnostic
– Tez is only concerned with the movement of data: files and streams of bytes.
– Does not impose any data format on the user application. An MR application can use key-value pairs on top of Tez. Hive and Pig can use tuple-oriented formats that are natural and native to them.
[Figure: a Tez Task moves Bytes in and out as a File or Stream; User Code interprets them as Key Value pairs or Tuples.]
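One way to picture the data-type-agnostic design is a transport that only ever sees bytes, with each application layering its own format on top. The sketch below is an assumption-laden illustration: `move_bytes` stands in for the framework, and the JSON and tab-separated encodings are arbitrary choices made for the example, not formats used by Tez, MapReduce, Hive or Pig.

```python
import json


def move_bytes(payload):
    """Stands in for the framework: it accepts and returns opaque bytes."""
    if not isinstance(payload, bytes):
        raise TypeError("the transport only moves bytes")
    return payload


# A key-value application picks its own encoding (JSON here, arbitrarily).
def encode_key_values(pairs):
    return json.dumps(pairs).encode("utf-8")


def decode_key_values(payload):
    return [tuple(pair) for pair in json.loads(payload.decode("utf-8"))]


# A tuple-oriented application picks a different one (tab-separated rows).
def encode_tuples(rows):
    text = "\n".join("\t".join(str(col) for col in row) for row in rows)
    return text.encode("utf-8")


def decode_tuples(payload):
    return [tuple(line.split("\t")) for line in payload.decode("utf-8").split("\n")]


kv_result = decode_key_values(move_bytes(encode_key_values([("a", 1), ("b", 2)])))
tuple_result = decode_tuples(move_bytes(encode_tuples([("x", "1", "ca"), ("y", "2", "ny")])))
print(kv_result, tuple_result)
```

Both applications share the same `move_bytes` path, and neither format is visible to it.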
Tez – Empowering End Users
• Simplifying deployment
– Tez is a completely client side application.
– No deployments to do. Simply upload to any accessible FileSystem and change local Tez configuration to point to that.
– Enables running different versions concurrently. Easy to test new functionality while keeping stable versions for production.
– Leverages YARN local resources.
[Figure: a TezClient on each Client Machine, Tez Lib 1 and Tez Lib 2 stored on HDFS, and a TezTask running under each Node Manager.]
Tez – Empowering End Users
• Expressive dataflow definition APIs
• Flexible Input-Processor-Output runtime model
• Data type agnostic
• Simplifying usage
With powerful APIs come great responsibilities :)
Tez is a framework on which end user applications can be built
Tez – Execution Performance
• Performance gains over Map Reduce
• Optimal resource management
• Plan reconfiguration at runtime
• Dynamic physical data flow decisions
Tez – Execution Performance
• Performance gains over Map Reduce
– Eliminate replicated write barrier between successive computations.
– Eliminate job launch overhead of workflow jobs.
– Eliminate extra stage of map reads in every workflow job.
– Eliminate queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.
[Figure: Pig/Hive - MR compared with Pig/Hive - Tez.]
Tez – Execution Performance
• Plan reconfiguration at runtime
– Dynamic runtime concurrency control based on data size, user operator resources, available cluster resources and locality.
– Advanced changes in dataflow graph structure.
– Progressive graph construction in concert with user optimizer.
[Figure: Stage 1 runs 50 maps producing 100 partitions, informed by HDFS Blocks and YARN Resources. Stage 2 is planned with 100 reducers and is reduced at runtime to 10 reducers because there is only 10 GB of data.]
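The reducer-count adjustment in the figure can be sketched as a small calculation. The 1 GB-per-reducer target and both function names are assumptions chosen so that the slide's numbers (100 planned reducers, 10 GB of data, 10 reducers at runtime) fall out; the slide does not state the actual rule Tez applies.

```python
import math

GB = 1024 ** 3


def choose_reducers(planned_reducers, observed_bytes, bytes_per_reducer):
    """Shrink the planned parallelism to fit the data actually produced."""
    needed = max(1, math.ceil(observed_bytes / bytes_per_reducer))
    return min(planned_reducers, needed)


def assign_partitions(num_partitions, reducers):
    """Spread the already-written partitions over the reduced task count."""
    return [list(range(r, num_partitions, reducers)) for r in range(reducers)]


# Assumed target of 1 GB per reducer; 10 GB observed after Stage 1.
reducers = choose_reducers(planned_reducers=100,
                           observed_bytes=10 * GB,
                           bytes_per_reducer=1 * GB)
assignment = assign_partitions(num_partitions=100, reducers=reducers)
print(reducers, assignment[0])
```

Stage 1 still writes 100 partitions as planned; only the number of Stage 2 tasks changes, with each reducer reading several partitions.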
Tez – Execution Performance
• Optimal resource management
– Reuse YARN containers to launch new tasks.
– Reuse YARN containers to enable shared objects across tasks.
[Figure: the Tez Application Master sends Start Task to a TezTask Host running in a YARN container and receives Task Done. TezTask1 and TezTask2 run in the same container and use Shared Objects.]
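Container reuse with shared objects can be illustrated with the sketch below. The `Container` class, the `shared_objects` dictionary and the lookup-table task are hypothetical; they only model the idea that a later task in the same container finds state left by an earlier one.

```python
class Container:
    """Stands in for a YARN container that hosts tasks one after another."""

    def __init__(self, container_id):
        self.container_id = container_id
        self.shared_objects = {}  # survives across tasks in this container
        self.tasks_run = 0

    def run(self, task):
        self.tasks_run += 1
        return task(self.shared_objects)


def load_lookup_table(shared):
    """Expensive setup, done once per container and then cached."""
    if "lookup" not in shared:
        shared["lookup"] = {"a": 1, "b": 2}
        shared["loads"] = shared.get("loads", 0) + 1
    return shared["lookup"]


def make_task(key):
    def task(shared):
        return load_lookup_table(shared)[key]
    return task


container = Container("container_1")
results = [container.run(make_task(key)) for key in ["a", "b", "a"]]
print(results, container.shared_objects["loads"])
```

Three tasks run, but the table is built once, which is the saving the shared-objects bullet describes.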
Tez – Execution Performance
• Dynamic physical data flow decisions
– Decide the type of physical byte movement and storage on the fly.
– Store intermediate data on distributed store, local store or in-memory.
– Transfer bytes via blocking files or streaming and the spectrum in between.
[Figure: decided at runtime. A producer with small output passes data to its consumer in-memory; otherwise the producer writes a local file that the consumer reads.]
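Choosing the storage for intermediate data at runtime could be sketched as follows. The size threshold, the `publish` and `consume` functions and the use of `pickle` are assumptions for illustration only; the slide says the choice is made on the fly but gives no rule.

```python
import os
import pickle
import tempfile

IN_MEMORY_LIMIT_BYTES = 1024  # hypothetical threshold for this sketch


def publish(records, limit=IN_MEMORY_LIMIT_BYTES):
    """Producer side: keep small outputs in memory, spill large ones to disk."""
    payload = pickle.dumps(records)
    if len(payload) <= limit:
        return ("in-memory", payload)
    fd, path = tempfile.mkstemp(suffix=".sketch")
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    return ("local-file", path)


def consume(ticket):
    """Consumer side: read from wherever the producer put the data."""
    kind, ref = ticket
    if kind == "in-memory":
        return pickle.loads(ref)
    with open(ref, "rb") as handle:
        records = pickle.loads(handle.read())
    os.remove(ref)  # the spill file is no longer needed
    return records


small = publish([1, 2, 3])
large = publish(list(range(10000)))
print(small[0], large[0])
small_records = consume(small)
large_records = consume(large)
```

The consumer code is the same in both cases; only the ticket returned by the producer differs.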
Tez – Sessions
• Key for interactive queries
• Analogous to database sessions and represents a connection between the user and the cluster
• Run multiple DAGs / queries in the same session
• Maintains a pool of reusable containers for low latency execution of tasks within and across queries
• Takes care of data locality and releasing resources when idle
• Session cache in the Application Master and in the container pool reduces re-computation and re-initialization
[Figure: the Client sends Start Session and Submit DAG to the Application Master, which holds the Task Scheduler, a Container Pool of pre-warmed JVMs and the Shared Object Registry.]
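A session that keeps a pool of pre-warmed containers and runs several DAGs against it can be sketched like this. The `Session` class, its `submit_dag` method and the container names are hypothetical and run tasks sequentially in one process; they illustrate the reuse pattern, not the Tez session API.

```python
class Session:
    """Hypothetical session: a container pool plus a cache shared by DAGs."""

    def __init__(self, prewarm=2):
        self.idle = [f"container-{i}" for i in range(prewarm)]
        self.launched = prewarm
        self.cache = {}

    def _acquire(self):
        if self.idle:
            return self.idle.pop()  # reuse a hot container
        name = f"container-{self.launched}"
        self.launched += 1          # cold launch only when the pool is empty
        return name

    def submit_dag(self, tasks):
        results = []
        for task in tasks:
            container = self._acquire()
            try:
                results.append(task(container, self.cache))
            finally:
                self.idle.append(container)  # return it to the pool
        return results


def scan(container, cache):
    cache["scans"] = cache.get("scans", 0) + 1
    return container


session = Session(prewarm=2)
first = session.submit_dag([scan, scan, scan])
second = session.submit_dag([scan, scan])
print(session.launched, session.cache["scans"])
```

Two queries and five tasks run without launching any container beyond the two that were pre-warmed when the session started.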
Tez – Benchmark Performance
Significant (but not all) speed-ups due to Tez:
• DAG support and runtime graph re-configuration enable utilizing the parallelism of the cluster
• Tez Session and container re-use enable efficient and low latency execution
Tez – Performance Analysis
TPC-DS – Query 27 with Hive on Tez
• Tez Session populates container pool
• Dimension table calculation and HDFS split generation in parallel
• Dimension tables broadcast to Hive MapJoin tasks
• Final Reducer pre-launched and fetches completed inputs
Tez – Current status
• Apache Incubator Project
– Rapid development. Over 600 JIRAs opened. Over 400 resolved.
– Growing community of contributors and users.
• Focus on stability
– Testing and quality are highest priority.
– Code ready and deployed on multi-node environments.
• Support for a vast topology of DAGs
– Already functionally equivalent to Map Reduce. Existing Map Reduce jobs can be executed on Tez with few or no changes.
– Hive re-targeted to use Tez for execution of queries (HIVE-4660).
– Work started on Pig to use Tez for execution of scripts (PIG-3446).
Tez – Roadmap
• Richer DAG support
– Support for co-scheduling and streaming
– Better fault tolerance with checkpoints
• Performance optimizations
– More efficiencies in transfer of data
– Improve session performance
• Usability
– Stability and testability
– Recovery and history
– Tools for performance analysis and debugging
Tez – Key Takeaways
• Distributed execution framework that works on computations represented as dataflow graphs
• Naturally maps to execution plans produced by query optimizers
• Customizable execution architecture designed to enable dynamic performance optimizations at runtime
• Works out of the box with the platform figuring out the hard stuff
• Spans the spectrum from interactive latency to batch
• Open source Apache project: your use-cases and code are welcome
• It works and is already being used by Hive and Pig
Thank You!