programming clusters with dryadlinq

Post on 23-Feb-2016

49 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Programming clusters with DryadLINQ . Mihai Budiu Microsoft Research, Silicon Valley HPC & GPU Supercomputing Group of Silicon Valley Dec 5 , 2011. Design Space. Grid. Internet. Data- parallel. X. Search. Shared memory. D ata center. Transaction. HPC. Latency (interactive). - PowerPoint PPT Presentation

TRANSCRIPT

Programming clusters with DryadLINQ

Mihai Budiu Microsoft Research, Silicon Valley

HPC & GPU Supercomputing Group of Silicon Valley Dec 5, 2011

2

Design Space

Throughput(batch)

Latency(interactive)

Internet

Datacenter

Data-parallel

Sharedmemory

XSearch

HPC

Grid

Transaction

3

Software Stack

Windows Server

Cluster (Windows HPC Cluster)

Storage (Distributed Storage Catalog)

Execution (Dryad)

Language (DryadLINQ)

Windows Server

Windows Server

Windows Server

Applications

Execution

Application

Data-Parallel Computation

4

Storage

Language

ParallelDatabases

Map-Reduce

GFSBigTable DSC

Dryad

DryadLINQScope

Sawzall,FlumeJava

Hadoop

HDFS

Pig, HiveSQL ≈SQL LINQ, SQLSawzall, Java

5

Dryad• Execution layer of Bing analytics• Continuously deployed since 2006• Running on >> 104 machines• Sifting through > 10Pb data daily• Runs on clusters > 3000 machines• Handles jobs with > 105 processes each• Platform for rich software ecosystem

The Dryad by Evelyn De Morgan.

6

Dryad = Execution Layer

Job (application)

Dryad

Cluster

Pipeline

Shell

Machine≈

7

2-D Piping• Unix Pipes: 1-D

grep | sed | sort | awk | perl

• Dryad: 2-D grep1000 | sed500 | sort1000 | awk500 | perl50

8

Virtualized 2-D Pipelines

9

Virtualized 2-D Pipelines

10

Virtualized 2-D Pipelines

11

Virtualized 2-D Pipelines

12

Virtualized 2-D Pipelines• 2D DAG• multi-machine• virtualized

13

Dryad Job Structure

grep

sed

sortawk

perlgrep

grepsed

sort

sort

awk

Inputfiles

Vertices (processes)

Outputfiles

ChannelsStage

14

Dryad System Architecture

Files, TCP, FIFO, Networkjob schedule

data plane

control plane

NS,Sched RE RERE

V V V

Graph manager cluster

Fault Tolerance

16

LINQ

Dryad

=> DryadLINQ

17

LINQ = .Net+ Queries

Collection<T> collection;bool IsLegal(Key);string Hash(Key);

var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};

18

Collections

class Collection<T> : IEnumerable<T>;

Elements of type TIterator(current element)

19

DryadLINQ Data Model

Partition

Collection

.Net objects

20

Collection<T> collection;bool IsLegal(Key k);string Hash(Key);

var results = from c in collection where IsLegal(c.key) select new { Hash(c.key), c.value};

DryadLINQ = LINQ + Dryad

C#

collection

results

C# C# C#

Vertexcode

Queryplan(Dryad job)Data

21

Demo

Example: counting lines

var table = PartitionedTable.Get<LineRecord>(file);int count = table.Count();

Parse, Count

Sum

Example: counting words

var table = PartitionedTable.Get<LineRecord>(file);int count = table

.SelectMany(l => l.line.Split(‘ ‘))

.Count();

Parse, SelectMany, Count

Sum

Example: counting unique wordsvar table = PartitionedTable.Get<LineRecord>(file);int count = table

.SelectMany(l => l.line.Split(‘ ‘))

.GroupBy(w => w)

.Count();

GroupBy;Count

HashPartition

Example: word histogramvar table = PartitionedTable.Get<LineRecord>(file);var result = table.SelectMany(l => l.line.Split(' ')) .GroupBy(w => w) .Select(g => new { word = g.Key, count = g.Count() });

GroupBy;Count

GroupByCountHashPartition

Example: high-frequency wordsvar table = PartitionedTable.Get<LineRecord>(file);var result = table.SelectMany(l => l.line.Split(' '))

.GroupBy(w => w)

.Select(g => new { word = g.Key, count = g.Count() })

.OrderByDescending(t => t.count)

.Take(100);

Sort; Take Mergesort;

Take

Example: words by frequencyvar table = PartitionedTable.Get<LineRecord>(file);var result = table.SelectMany(l => l.line.Split(' '))

.GroupBy(w => w)

.Select(g => new { word = g.Key, count = g.Count() })

.OrderByDescending(t => t.count);

Sample

Histogram

Broadcast

Range-partition

Sort

Example: Map-Reducepublic static IQueryable<S> MapReduce<T,M,K,S>( IQueryable<T> input,

Func<T, IEnumerable<M>> mapper,Func<M,K> keySelector,Func<IGrouping<K,M>,S> reducer)

{ var map = input.SelectMany(mapper); var group = map.GroupBy(keySelector); var result = group.Select(reducer); return result;}

Expectation Maximization

29

• 160 lines • 3 iterations shown

30

Probabilistic Index MapsImages

features

31

Language Summary

WhereSelectGroupByOrderByAggregateJoin

32

Using PLINQQuery

LINQ to HPC

PLINQ

Local query

33

So, What Good is it For?

34

Application: Kinect Training

35

Input device

36

Depth computation

Source: http://nuit-blanche.blogspot.com/2010/11/unsing-kinect-for-compressive-sensing.html

37

Vision Input: Depth Map + RGB

Why is it hard?

Background segmentationPlayer separation

Body Part Classifier

Skeletal Tracking Pipeline

39

Depth mapSensor

Body Part Identification

Skeleton

40

The Classifier

InputDepth map

OutputBody parts

Classifier

Runs on GPU @ 320x240

41

Motion Capture

42

Ground Truth Data

3D avatar model

43

Learn from Data

Classifier

Training examplesMachine learning

Kinect Training

Dryad

DryadLINQ

Machine learning

Decisionforest

Millions of examples needed

Cluster

45

Highly efficient parallellization

time

mac

hine

Conclusions

46

ClusterLINQDryad

46

=

top related