X-Tracing Hadoop
TRANSCRIPT
8/6/2019 Andy Konwinski X-tracing Hadoop
UC Berkeley
X-Tracing Hadoop
Andy Konwinski, Matei Zaharia, Randy Katz, Ion Stoica
CS Division, UC Berkeley, [email protected]
Motivation & Objectives
Motivation:
Hadoop-style parallel processing masks failures and performance problems
Objectives:
Help Hadoop developers debug and profile Hadoop
Help operators monitor and optimize MapReduce jobs
-
About the RAD Lab
Reliable, Adaptive Distributed Systems
Setup for X-Tracing
Instrument Hadoop using the X-Trace framework
Trace analysis
Visualization via web-based UI
Statistical analysis and anomaly detection
Identify potential problems
-
Outline
X-Trace overview
Applications of trace analysis
Conclusion
Checkpoint
X-Trace overview
Applications of trace analysis
Conclusion
-
How X-Trace Works
Path-based tracing framework
Generate event graph to capture causality of events across the network
Examples of events: RPCs, HTTP requests
Annotate messages with trace metadata (16 bytes) carried along the execution path
Instrument protocol APIs and RPC libraries
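The metadata propagation described above can be sketched as follows. This is a minimal illustration, not the actual X-Trace library: the class name and the `trace_id`/`report_id` fields are assumptions based on the report fields shown later in the deck.

```python
import os

class XTraceMetadata:
    """Trace metadata carried along the execution path: a trace ID shared by
    all events of one task, plus a per-event report ID."""

    def __init__(self, trace_id, report_id):
        self.trace_id = trace_id
        self.report_id = report_id

    @staticmethod
    def new_trace():
        # Start a new trace at the first instrumented operation.
        return XTraceMetadata(os.urandom(8).hex(), os.urandom(8).hex())

    def next_event(self):
        # Metadata for a causally-following event: same trace ID, fresh
        # report ID; the old report ID becomes the causal (parent) edge.
        return XTraceMetadata(self.trace_id, os.urandom(8).hex())

def send_rpc(payload, metadata):
    # An instrumented RPC layer attaches metadata to the outgoing message,
    # so the receiver can continue the same trace.
    return {"payload": payload, "xtrace": metadata.next_event()}

md = XTraceMetadata.new_trace()
msg = send_rpc("write block", md)
assert msg["xtrace"].trace_id == md.trace_id    # same execution path
assert msg["xtrace"].report_id != md.report_id  # new event
```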
DFS Write Message Flow
[Figure: write messages flowing from the DFS client through the DataNode pipeline]
-
Example DFS Write
Successful DFS write
Report fields: Label, Trace ID #, Report ID #, Hostname, Timestamp
Each report becomes a graph node
Begin write operation on DFS Client (r37)
First write happens on DataNode (r33)
Second write on different DataNode (r32)
Third write on final DataNode (r31)
r31 completes and returns
r32 completes and returns
r33 completes and returns
DFS Client write operation completes
Failed DFS Write
Failed DFS write: 3-minute timeout
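The successful write above can be reconstructed as a small report graph. A minimal sketch, assuming a simplified report record with the fields listed on the slide (label, trace ID, report ID, hostname, timestamp); the hostnames and parent links are illustrative, not taken from a real trace.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Report:
    label: str            # human-readable event label
    trace_id: str         # shared by all reports in one trace
    report_id: str        # unique per event (e.g. "r37")
    hostname: str
    timestamp: float = field(default_factory=time.time)
    parents: tuple = ()   # report IDs this event causally follows

# The successful DFS write from the slide, as a chain of reports.
trace = "t1"
reports = [
    Report("begin write on DFS client", trace, "r37", "client"),
    Report("first write", trace, "r33", "datanode-a", parents=("r37",)),
    Report("second write", trace, "r32", "datanode-b", parents=("r33",)),
    Report("third write", trace, "r31", "datanode-c", parents=("r32",)),
    Report("r31 completes", trace, "r31-done", "datanode-c", parents=("r31",)),
    Report("r32 completes", trace, "r32-done", "datanode-b", parents=("r31-done",)),
    Report("r33 completes", trace, "r33-done", "datanode-a", parents=("r32-done",)),
    Report("client write completes", trace, "done", "client", parents=("r33-done",)),
]

# Edges of the causal graph: parent report -> child report.
edges = [(p, r.report_id) for r in reports for p in r.parents]
assert ("r37", "r33") in edges
```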
-
MapReduce Event Graph
Example MapReduce graph
Properties of Path Tracing
Deterministic causality and concurrency
Control over which events get traced
Cross-layer
Low overhead
Modest modification of app source code: fewer than 500 lines for Hadoop
-
Our Architecture
[Diagram: the Hadoop master and slave nodes each run an X-Trace front end (FE); reports flow (HTTP/TCP) to the X-Trace backend, which stores them in BerkeleyDB; a trace-analysis web UI and fault-detection programs query the backend, and the user accesses the web UI over HTTP]
Trace Analysis UI
Web-based
Provides
Performance statistics
Graphs of utilization
Critical path analysis
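The critical path analysis listed above can be sketched as a longest-path computation over the causal DAG, weighting each event by its duration. A minimal illustration, not the UI's actual algorithm; the event names are hypothetical.

```python
from collections import defaultdict

def critical_path(durations, edges):
    """Longest path (by summed event duration) through a causal DAG.
    durations: {event: seconds}; edges: [(parent, child), ...]."""
    children = defaultdict(list)
    indeg = defaultdict(int)
    for p, c in edges:
        children[p].append(c)
        indeg[c] += 1
    # Process events in topological order (Kahn's algorithm), keeping the
    # costliest path ending at each event.
    order = [e for e in durations if indeg[e] == 0]
    best = {e: (durations[e], [e]) for e in order}
    i = 0
    while i < len(order):
        e = order[i]; i += 1
        cost, path = best[e]
        for c in children[e]:
            cand = cost + durations[c]
            if c not in best or cand > best[c][0]:
                best[c] = (cand, path + [c])
            indeg[c] -= 1
            if indeg[c] == 0:
                order.append(c)
    return max(best.values())  # (total seconds, path)

# Two parallel maps feeding one reduce: the slower map is on the critical path.
dur = {"job": 1, "map1": 10, "map2": 3, "reduce": 5}
deps = [("job", "map1"), ("job", "map2"), ("map1", "reduce"), ("map2", "reduce")]
total, path = critical_path(dur, deps)
assert path == ["job", "map1", "reduce"] and total == 16
```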
-
Checkpoint
X-Trace overview
Applications of trace analysis
Conclusion
Optimizing Job Performance
Examined performance of the Apache Nutch web indexing engine on a Wikipedia crawl
Time to create an inverted link index of a 50 GB crawl:
With default configuration: 2 hours
With optimized configuration: 7 minutes
-
Optimizing Job Performance
Machine utilization under default configuration
Optimizing Job Performance
Problem: a single reduce task, which actually fails several times at the beginning
Three 10-minute timeouts
-
Optimizing Job Performance
Active tasks vs. time with improved configuration (50 reduce tasks instead of one)
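The fix above corresponds to raising the reduce-task count in the job configuration. A sketch for Hadoop of that era, assuming the classic `mapred.reduce.tasks` property (settable cluster-wide in `mapred-site.xml` or per job):

```xml
<!-- mapred-site.xml: replace the single default reduce task with 50 -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>50</value>
</property>
```

A reasonable rule of thumb is to set this to a small multiple of the cluster's reduce slots so reduces run in parallel rather than serially.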
Faulty Machine Detection
Motivated by observing a slow machine
Diagnosed to be a failing hard drive
[Figure: per-host task activity; axis labeled Host Number, with the master node and the anomalous host annotated]
-
Statistical Analysis
Off-line machine learning
Faulty machine detection
Buggy software detection
Current work on graph processing and analysis
Future Work
Tracing more production MapReduce applications
Larger clusters + real workloads
More advanced trace processing tools
Migrating our code into the Hadoop codebase
-
Conclusion
Efficient, low-overhead tracing of Hadoop
Exposed multiple interesting behaviors
http://x-trace.net
Questions?
-
Andy's ToDo Slide:
Overhead of X-Trace
Negligible overhead in our 40-node test cluster
Because Hadoop operations are large
E.g. an 18-minute Apache Nutch indexing job with 390 tasks generates 50,000 reports (~20 MB of text data)
-
Faulty Machine Detection
[Figure: success rates and false positive rates of the Welch, Rank, Wilcoxon, and Permutation tests; left: failing disk, significance = 0.01; right: no failing disk, significance = 0.01]
Observations about Hadoop
Unusual run - very slow small reads at start of job
-
Optimizing Job Performance
Tracing also illustrates the inefficiency of having too many mappers and reducers
Breakdown of longest map with 3x more mappers
Related Work
Per-machine logs
Active tracing: log4j, etc.
Passive tracing: dtrace, strace
Log aggregation tools: Sawzall, Pig
Cluster monitoring tools: Ganglia
-
Event Graphs
Events are nodes in a causal directed graph
Captures causality (graph edges)
Captures concurrency (fan-out, fan-in, parallel edges)
Spans layers, applications, administrative boundaries
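Concurrency in such a graph can be read off directly: two events are concurrent exactly when neither can reach the other along causal edges. A minimal sketch (not X-Trace code; event names are hypothetical):

```python
from collections import defaultdict

def reachable(edges, src):
    """All events causally after src (transitive closure from src)."""
    adj = defaultdict(list)
    for p, c in edges:
        adj[p].append(c)
    seen, stack = set(), [src]
    while stack:
        for c in adj[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def concurrent(edges, a, b):
    # Neither event causally precedes the other.
    return b not in reachable(edges, a) and a not in reachable(edges, b)

# Fan-out from a job to two maps (concurrent), fan-in to one reduce.
deps = [("job", "map1"), ("job", "map2"), ("map1", "reduce"), ("map2", "reduce")]
assert concurrent(deps, "map1", "map2")
assert not concurrent(deps, "map1", "reduce")
```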
Observations about Hadoop
Hardcoded timeouts may be inappropriate in some cases (e.g. long reduce tasks)
Solution: expose timeouts in configuration file?
Highly variable DFS performance under load (on our cluster), which can slow the entire job
Solution: multiple hard disks per node?
-
Hardware Provisioning
Node with 4 disks (variance typically lower)
Node with 1 disk (high variance typical)
Faulty Machine Detection
Approach: compare performance of each machine vs. the others in that trace
Statistical anomaly detection tests
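The per-machine comparison can be sketched as a two-sample permutation test (one of the tests named in the backup chart) on task durations: pool one machine's samples with everyone else's, repeatedly shuffle, and count how often a random split produces a mean gap at least as large as the observed one. A minimal pure-Python illustration with made-up durations, not the actual analysis code:

```python
import random

def permutation_test(suspect, others, trials=2000, seed=0):
    """One-sided p-value for 'suspect machine is slower than the rest'.
    suspect, others: lists of task durations in seconds."""
    rng = random.Random(seed)
    observed = sum(suspect) / len(suspect) - sum(others) / len(others)
    pooled = suspect + others
    n = len(suspect)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / len(others)
        if diff >= observed:
            hits += 1
    return hits / trials

# A machine whose tasks run roughly 3x slower than its peers.
slow  = [30.1, 29.5, 31.2, 30.8, 29.9]
peers = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 10.0, 9.7]
p = permutation_test(slow, peers)
assert p < 0.01  # flag the machine at significance 0.01
```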