PLINQ: A Query Language for Data Parallel Programming Joe Duffy, Microsoft Declarative Aspects of Multicore Programming (DAMP) Workshop – POPL’07 © 2007, Microsoft, Corp. All rights reserved.


PLINQ: A Query Language for Data Parallel Programming

Joe Duffy, Microsoft
Declarative Aspects of Multicore Programming (DAMP) Workshop – POPL’07

© 2007, Microsoft, Corp. All rights reserved.


Research Context (1)

- New Microsoft technology: Language Integrated Query (LINQ)
- Primary goals:
  - Data-source agnostic, type-safe query language
  - Simplify expression of complex, multi-step operations over sets of data
- Why?
  - Many programs contain textual (untyped) SQL, XPath, XQuery, …
  - Programs must deal with increasingly large quantities of data
    - Hardware: memory and hard disk capacities continue to grow (GB → TB → PB …)
    - Industry software: rich media, interactive visualizations, AI, NLP
  - Databases are ubiquitous, but aren’t always the solution
- Features
  - SQL-like relational algebra language syntax and libraries
  - Supports queries over in-memory collections, XML, and RDBMSs
- In Microsoft’s “developer division”
  - Entity responsible for Visual Studio, Visual C#, Basic, and C++
  - Releasing as part of Visual Studio 2007


Research Context (2)

- This talk describes extensions to LINQ to accomplish parallel query execution, i.e. Parallel LINQ (PLINQ)
- Goals:
  - Apply data parallelism to LINQ query execution
  - Preserve the LINQ programming model, with little to no required interface changes
  - Deal efficiently with composition and nesting of query operators
- Audience: developers on Microsoft’s .NET platform using C#, VB, and VC++
- Architectures: those running Windows – mostly MIMD (with SIMD/vector extensions), multi-core machines in the range of 2…64 processors, typically AMD or Intel
- Mostly an application of other techniques: RDBMS parallel query execution (Volcano, SQL Server, Oracle), NESL, GpH, and others


Syntax: A Query Language

    var q = from x1 in y
            join x2 in z on x1.fA equals x2.fA
            where p(x2.fB)
            orderby x1.fC
            select new { x1.fA, x2.fB, x1.fC };

    int r = q.Sum(a => a.fB*a.fC);
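As a rough illustration of the query shape on this slide, the sketch below runs it over small in-memory arrays. The sample data, the predicate p, and the class name QuerySyntaxSketch are assumptions made up for the example; only the query form comes from the slide. In released C# the key drawn from the outer source goes on the left of equals.

```csharp
using System;
using System.Linq;

public static class QuerySyntaxSketch
{
    public static int Run()
    {
        // Hypothetical sample data standing in for the slide's sources y and z.
        var y = new[] { new { fA = 1, fC = 10 }, new { fA = 2, fC = 20 }, new { fA = 3, fC = 30 } };
        var z = new[] { new { fA = 1, fB = 5 }, new { fA = 2, fB = -1 }, new { fA = 3, fB = 7 } };

        // Assumed predicate p: keep only positive fB values.
        Func<int, bool> p = v => v > 0;

        var q = from x1 in y
                join x2 in z on x1.fA equals x2.fA
                where p(x2.fB)
                orderby x1.fC
                select new { x1.fA, x2.fB, x1.fC };

        // Terminal reduction: 5*10 + 7*30 for the sample data above.
        return q.Sum(a => a.fB * a.fC);
    }
}
```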


Queries == Trees of Operators

- A query is composed of a tree of operators
  - Most operators operate on a stream t of type T* and produce a (lazily, on-demand generated) stream u of type U*, i.e. T* → U*
        var q = from x in A where (x % seed) == 0 select x/0.33f;
  - Many operators are unary, forming a stream, but others are binary, forming a tree
  - Some operators “terminate” the stream by reducing to a non-stream, i.e. T* → U
        float s = q.Sum();
- As with a program AST, these trees can be analyzed and rewritten
- Declarative, data-intensive, and bulk transformation nature means execution technique == implementation detail
  - This is why we can safely introduce parallelism
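The lazy, on-demand behavior of the operator chain can be observed with a small sketch using standard LINQ to Objects. The counter, the sample array, and the value of seed are assumptions added for illustration: the predicate runs zero times when the query is declared and once per input element when Sum forces it.

```csharp
using System;
using System.Linq;

public static class LazyTreeSketch
{
    public static (int before, int after) Run()
    {
        int[] A = { 3, 6, 7, 9, 12 };   // hypothetical input
        int seed = 3;                   // assumed value
        int evaluated = 0;              // counts predicate invocations

        // Where -> Select chain; declaring it executes nothing.
        var q = A.Where(x => { evaluated++; return (x % seed) == 0; })
                 .Select(x => x / 0.33f);
        int before = evaluated;

        // Sum is a terminal operator (T* -> U) and forces evaluation.
        float s = q.Sum();
        int after = evaluated;

        return (before, after);
    }
}
```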


[Figure: query operator tree with nodes labeled Where, Select, Where, and Join.]

Declaring Queries

- Queries are expressed with one of two mechanisms
  - “Query comprehensions” – syntax extensions to Visual C# and VB to build a query
  - Calling query APIs directly
- The former is transformed into the latter by compilers, e.g.
        var q = from x in Y where p(x) orderby x.f1 select x.f2;
  becomes…
        var q = Enumerable.Select(
                    Enumerable.OrderBy(
                        Enumerable.Where(Y, x => p(x)),
                        x => x.f1),
                    x => x.f2);
- Comprehensions allow query declaration “left-to-right” instead of “inside out”
  - But they support only a subset of query operators right now
  - My hope is that one day this restriction is gone
- To obtain results from a query, execution must be forced, e.g.
        foreach (T e in q) a(e);
  or
        T[] results = q.ToArray();
  etc.
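A minimal check that the comprehension and the nested library calls describe the same query, using the Enumerable methods as they ship in System.Linq. The tuple element type, the sample data, and the predicate p are assumptions for illustration.

```csharp
using System;
using System.Linq;

public static class DesugarSketch
{
    public static bool Run()
    {
        // Hypothetical source Y with fields f1 and f2.
        var Y = new (int f1, string f2)[] { (3, "c"), (1, "a"), (2, "b") };
        Func<(int f1, string f2), bool> p = x => x.f1 > 1;   // assumed predicate

        // Comprehension form, read left to right.
        var viaSyntax = from x in Y where p(x) orderby x.f1 select x.f2;

        // Equivalent direct calls, read inside out.
        var viaCalls = Enumerable.Select(
                           Enumerable.OrderBy(
                               Enumerable.Where(Y, x => p(x)),
                               x => x.f1),
                           x => x.f2);

        // SequenceEqual forces both queries and compares them element by element.
        return viaSyntax.SequenceEqual(viaCalls);
    }
}
```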


Query Inputs and Outputs

- Input to an operator is any data sequence expressible in the CLR’s type system
  - Input to one query operator is often the output of a child operator
  - Leaves: arrays, vectors, sets, trees, infinite streams of data
    - Non-linear data types are flattened for execution, but presented to the programmer in original form
  - This works because all sequences are unified by a common .NET interface, IEnumerable<T> (standard enumerator, e.g. MoveNext, Current), i.e. T*
- Query evaluation is mostly lazy; we can get the “first” U from an operator o without forcing complete calculation of the input
        var q = from x in infiniteStream where p(x) select x;
  - Much like many streaming/vector processing systems
  - Some exceptions:
    - a Sort needs to evaluate its whole subtree before producing one item
    - a Join evaluates one of its subtrees fully
    - And so on…
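Lazy evaluation over an unbounded input can be sketched with an iterator method. Naturals, the divisibility predicate, and the use of Take to bound consumption are assumptions for illustration; the point is only that the filter pulls elements on demand, so the query terminates even though its source does not.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class InfiniteStreamSketch
{
    // Hypothetical unbounded source: 0, 1, 2, ...
    public static IEnumerable<int> Naturals()
    {
        for (int i = 0; ; i++) yield return i;
    }

    public static int[] FirstMultiplesOfSeven(int count)
    {
        // Declaring the query does not enumerate the stream.
        var q = from x in Naturals() where x % 7 == 0 select x;

        // Take stops pulling from the source after 'count' matches.
        return q.Take(count).ToArray();
    }
}
```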


C# Query Comprehension Syntax


expr ::= … | query-expr
query-expr ::= from-clause query-body
from-clause ::= ‘from’ itemNameExpr ‘in’ srcExpr
query-body ::= join-clause*
               (from-clause join-clause* | let-clause | where-clause)*
               orderby-clause?
               (select-clause | groupby-clause)
               query-continuation?
join-clause ::= ‘join’ itemNameExpr ‘in’ srcExpr ‘on’ keyExpr1 ‘equals’ keyExpr2 (‘into’ itemNameExpr)?
let-clause ::= ‘let’ itemNameExpr ‘=’ selExpr
where-clause ::= ‘where’ predExpr
orderby-clause ::= ‘orderby’ (keyExpr (‘ascending’ | ‘descending’)?)*
select-clause ::= ‘select’ selExpr
groupby-clause ::= ‘group’ selExpr ‘by’ keyExpr
query-continuation ::= ‘into’ itemNameExpr query-body


Common Query Operators

- Binding operators, used to express operations on abstract elements
  - Bind: from x in A – bind variable x to a single element e in the data source A, one at a time, so that x may be referenced in the query text
  - Cross product bind: from x in A from y in B – create the relational cross-product, A × B, binding x and y to members of the resulting pairs (x, y)
  - Let bind: let x = e – bind variable x to the result of evaluating expression e
- General operators, to perform relational operations
  - Selection: where p – for each element e of type T, yield only those for which the selection predicate, p(e), of form T → bool, evaluates to true
  - Sort: orderby k (ascending | descending)? – order the elements of type T ascending or descending based on keys generated with the key-selection function k, of form T → K
  - Map: select p – transform each element e from type T to U via the projection function, p(e), of form T → U
  - Equi-join: join y in B on k1 equals k2 (into z)? – for each pair of elements (x, y) in the cross-product of the “left” input A and the “right” input B for which k1(x) == k2(y), bind the result to y (or to z if specified, a “group join”)
  - Grouping: group p by k – yield groupings of data, of type (K, T*), for which k(e), of the form T → K, is equal for all e in the group


Some Example Queries


Word counts:

    string doc = …;
    var counts = from w in doc.Split(' ') group w by w;

Weighted average:

    float[] D = …, W = …;
    float avg = D.ZipWith(W, (x,y) => x*y).Sum() / W.Sum();

“Select customers whose billing address is in Washington in the United States, or whose cumulative order total is >= $25 USD; order them by total $ descending, group them by state, and project just their name and total”:

    Set<Customer> custs = …;
    Set<Order> ords = …;
    Set<Address> addrs = …;

    var q = from c in custs
            join o in ords on o.CustomerID equals c.ID into co
            join a in addrs on a.AddressID equals o.BillingID
            let ordTotal = co.Sum(o => o.TotalCost)
            where (a.State == "WA" && a.Country == "United States") || ordTotal >= $25.00
            orderby ordTotal descending
            group new {c.LastName,c.FirstName,ordTotal} by a.State;
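The first two example queries can be sketched against the System.Linq API as it later shipped. One assumption to flag: the slide's ZipWith is written here as Enumerable.Zip with a result selector, which is the released operator that plays that role. The class and method names are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExampleQueriesSketch
{
    // Word counts: group identical words, then count each group.
    public static Dictionary<string, int> WordCounts(string doc)
    {
        var counts = from w in doc.Split(' ')
                     group w by w into g
                     select new { Word = g.Key, Count = g.Count() };
        return counts.ToDictionary(c => c.Word, c => c.Count);
    }

    // Weighted average: pairwise products of data and weights, divided by total weight.
    public static float WeightedAverage(float[] D, float[] W)
    {
        return D.Zip(W, (x, y) => x * y).Sum() / W.Sum();
    }
}
```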


Additional Query Operators


- Some have no syntactic representation and must be accessed with library calls:
  - ForAll(A, a): invoke side-effecting operation a(x) for each element x in A
  - Concat(A, B): linearly concatenate the data inputs A and B
  - Zip(A, B): combine two inputs A and B into pairs by overlaying data
  - Reverse(A): reverse the ordering of elements in vector A
  - Range(x, y): generate a stream representing the range [x, y)
  - Set operators: Distinct(A), Union(A, B), Intersect(A, B)
  - Reductions (a.k.a. aggregations, folds): Aggregate(A, binOp), Count(A), Sum(A), Min(A), Max(A), Average(A), EqualAll(A, B), Any(A, p), All(A, p), Contains(A, e)
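A short sketch of two of these library-only operators using the released System.Linq API. Note one difference that is an artifact of the shipped library rather than of this talk: Enumerable.Range takes a start and a count, so the slide's half-open range [x, y) is expressed as Range(x, y - x). The class and method names are hypothetical.

```csharp
using System.Linq;

public static class LibraryOperatorsSketch
{
    // Fold the half-open range [x, y) with a binary operator (here: addition).
    public static int SumOfRange(int x, int y)
    {
        return Enumerable.Range(x, y - x).Aggregate(0, (acc, v) => acc + v);
    }

    // Set union: duplicates removed, first-seen order kept.
    public static int[] DistinctUnion(int[] A, int[] B)
    {
        return A.Union(B).ToArray();
    }
}
```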


Runtime: Parallel Execution


Operator Parallelism

- Intra-operator, i.e. partitioning:
  - Input to a single operator is “split” into p pieces and run in parallel
  - Adjacent and nested operators can enjoy fusion
  - Good temporal locality of data – each datum “belongs” to a partition
- Inter-operator, i.e. pipelining:
  - Operators run concurrently with respect to one another
  - Can avoid “data skew”, i.e. imbalanced partitions, as can occur with partitioning
  - Typically incurs more synchronization overhead and yields considerably worse locality than intra-operator parallelism, so is less attractive
- Partitioning is preferred unless there is no other choice
  - For example, sometimes the programmer wants a single-CPU view, e.g.:
        foreach (var x in q) a(x);
  - Consumption action a might be written to assume no parallelism
  - Bad if a(x) costs more than the element production latency
    - Otherwise, parallel tasks just eat up memory, eventually stopping when the bounded buffer fills
  - But a(x) can be parallel too
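Intra-operator parallelism can be sketched by hand: split an array into p contiguous ranges, run a fused filter and projection on each range in its own task, then concatenate the per-partition results. This is an illustrative sketch built on System.Threading.Tasks, not the PLINQ runtime's implementation; the class name, the ceiling-division chunking, and the one-task-per-partition choice are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PartitioningSketch
{
    // Runs "where pred select proj" over p contiguous partitions of A (p >= 1).
    public static List<U> Run<T, U>(T[] A, int p, Func<T, bool> pred, Func<T, U> proj)
    {
        int chunk = (A.Length + p - 1) / p;   // ceiling division
        var tasks = new Task<List<U>>[p];

        for (int i = 0; i < p; i++)
        {
            int start = Math.Min(i * chunk, A.Length);
            int end = Math.Min(start + chunk, A.Length);
            tasks[i] = Task.Run(() =>
            {
                var local = new List<U>();
                for (int j = start; j < end; j++)
                    if (pred(A[j])) local.Add(proj(A[j]));   // fused where + select
                return local;
            });
        }

        // Reading Result blocks until each task finishes; partition order is preserved.
        return tasks.SelectMany(t => t.Result).ToList();
    }
}
```

Each datum is touched by exactly one task, which is the locality property the slide attributes to partitioning.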


Parallelism Illustrations

    q = from x in A where p(x) select x3;

[Figure: thread diagrams (Thread 1 … Thread 4) showing how “where p(x)” and “select x3” over A are assigned to threads under three strategies: intra-operator, inter-operator, and both composed.]
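Inter-operator parallelism (pipelining) for the same query can be sketched with a bounded buffer between two stages. This uses BlockingCollection from System.Collections.Concurrent as a stand-in for whatever buffer the runtime uses; the buffer size, the class name, and reading the slide's x3 as x cubed are assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PipeliningSketch
{
    public static List<int> Run(int[] A, Func<int, bool> p, int bufferSize = 4)
    {
        // Bounded buffer connecting the two operators.
        using (var buffer = new BlockingCollection<int>(bufferSize))
        {
            // Stage 1, on its own task: where p(x).
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (int x in A)
                        if (p(x)) buffer.Add(x);   // blocks while the buffer is full
                }
                finally
                {
                    buffer.CompleteAdding();       // lets the consumer loop end
                }
            });

            // Stage 2, on the calling thread: select x cubed.
            var results = new List<int>();
            foreach (int x in buffer.GetConsumingEnumerable())
                results.Add(x * x * x);

            producer.Wait();
            return results;
        }
    }
}
```

Every element crosses a synchronized buffer, which is the extra synchronization cost the previous slide attributes to pipelining.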


Deciding Parallel Execution Strategy

- Tree analysis informs decision making:
  - Where to introduce parallelism? And what kind? (partition vs. pipeline)
  - Based on intrinsic query properties and operator costs
    - Data sizes, selectivity (for filter f, what % satisfies the predicate?)
    - Intelligent “guesses”, code analysis, adaptive feedback over time
- But not just parallelism; higher-level optimizations too, e.g.
  - Common sub-expression elimination, e.g.
        from x in X where p(f(x)) select f(x);
  - Reordering operations to:
    - Decrease cost of query execution, e.g. put a filter before the sort, even if the user wrote it the other way around
    - Achieve better operator fusion, reducing synchronization cost
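The two rewrites named above can be imitated by hand in ordinary LINQ, which gives a feel for what an optimizer would be doing automatically. The functions f and p, the call counter, and the class name are assumptions for illustration; the first method counts calls to f with and without a let binding, the second checks that filtering before sorting gives the same answer as sorting before filtering.

```csharp
using System;
using System.Linq;

public static class RewriteSketch
{
    // Common sub-expression elimination by hand: bind f(x) once with let.
    public static (int naiveCalls, int cseCalls) CountCalls(int[] X)
    {
        int calls = 0;
        Func<int, int> f = x => { calls++; return x * 2; };   // assumed function
        Func<int, bool> p = v => v > 4;                       // assumed predicate

        var naive = (from x in X where p(f(x)) select f(x)).ToArray();
        int naiveCalls = calls;

        calls = 0;
        var cse = (from x in X let fx = f(x) where p(fx) select fx).ToArray();
        int cseCalls = calls;

        return (naiveCalls, cseCalls);
    }

    // Reordering: a filter placed before a sort yields the same sequence,
    // while the sort sees fewer elements.
    public static bool FilterFirstMatchesSortFirst(int[] X)
    {
        Func<int, bool> p = x => x % 2 == 0;
        var asWritten = X.OrderBy(x => x).Where(p);
        var reordered = X.Where(p).OrderBy(x => x);
        return asWritten.SequenceEqual(reordered);
    }
}
```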


Partitioning Techniques

- Partitioning can be data-source sensitive
  - If a nested query, can fuse existing partitions
  - If an array, calculate strides and contiguous ranges (+spatial locality)
  - If a (possibly infinite) stream, lazily hand out chunks
- Partitioning can be operator sensitive
  - E.g. equi-joins employ a hashtable to turn an O(nm) “nested join” into O(n+m)
    - Build hash table out of one data source; then probe it for matches
    - Only works if all data elements in data source A with key k are in the same partition as those elements in data source B also with key k
    - We can use “hash partitioning” to accomplish this: for p partitions, calculate k for each element e in A and in B, and then assign to partition based on key, e.g. k.GetHashCode() % p
  - Output of sort: we can fuse, but restrict ordering, ordinal and key based
- Existing partitions might be repartitioned
  - Can’t “push down” key partitioning information to leaves: types changed during stream data flow, e.g. select operator
  - Nesting: join processing output of another join operator
  - Or just to combat partition skew
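The hash-partitioned equi-join described above can be sketched as follows: both inputs are partitioned by key hash so that equal keys land in the same partition, then each partition pair is joined independently with a build-and-probe step. This is a hand-written illustration, not the PLINQ implementation; the class name, the use of Parallel.For, and masking the hash to keep the partition index non-negative are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class HashPartitionJoinSketch
{
    public static List<(TA, TB)> Join<TA, TB, TKey>(
        IEnumerable<TA> A, IEnumerable<TB> B,
        Func<TA, TKey> kA, Func<TB, TKey> kB, int p)
    {
        // Equal keys hash equally, so matching elements share a partition index.
        var partsA = Partition(A, kA, p);
        var partsB = Partition(B, kB, p);

        var results = new List<(TA, TB)>[p];
        Parallel.For(0, p, i =>
        {
            var table = partsA[i].ToLookup(kA);      // build
            var matches = new List<(TA, TB)>();
            foreach (var b in partsB[i])             // probe
                foreach (var a in table[kB(b)])
                    matches.Add((a, b));
            results[i] = matches;
        });

        return results.SelectMany(m => m).ToList();
    }

    private static List<T>[] Partition<T, TKey>(IEnumerable<T> src, Func<T, TKey> key, int p)
    {
        var parts = new List<T>[p];
        for (int i = 0; i < p; i++) parts[i] = new List<T>();
        foreach (var e in src)
        {
            int h = EqualityComparer<TKey>.Default.GetHashCode(key(e));
            parts[(h & int.MaxValue) % p].Add(e);    // mask keeps the index non-negative
        }
        return parts;
    }
}
```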


Example: Query Nesting and Fusion


- Nesting queries inside of others is common
- We can fuse partitions
        var q1 = from x in A select x*2;
        var q2 = q1.Sum();

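[Figure: partitioned execution plans for three cases: 1. Select (alone), with “select x*2” applied per partition; 2. Sum (alone), with a “+” per partition combined by a final “+”; 3. Select + Sum, with each partition’s “select x*2” feeding its own “+” before the final “+”.]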
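The fused Select + Sum plan can be sketched by hand: each partition applies the projection and accumulates a local sum in one pass, and a final addition combines the partial sums. The class name, the contiguous-range partitioning, and the use of Parallel.For are assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class FusionSketch
{
    // Computes (from x in A select x*2).Sum() over p partitions (p >= 1).
    public static long SelectThenSum(int[] A, int p)
    {
        long[] partial = new long[p];
        int chunk = (A.Length + p - 1) / p;   // ceiling division

        Parallel.For(0, p, i =>
        {
            int start = Math.Min(i * chunk, A.Length);
            int end = Math.Min(start + chunk, A.Length);
            long local = 0;
            for (int j = start; j < end; j++)
                local += A[j] * 2L;           // select x*2 fused into the partition's '+'
            partial[i] = local;               // each task writes only its own slot
        });

        return partial.Sum();                 // final '+' across partitions
    }
}
```

No intermediate stream of doubled values is materialized between the two operators, which is the benefit of fusing them within a partition.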


Execution of Work


- Windows’ finest granularity of work is a thread
  - Each partition has at most one thread assigned to it, assigned via a gang scheduling / dynamic work stealing-like (à la Cilk) algorithm
  - Tension between creating “just the right number of threads” (static+dynamic adaptivity) versus over-partitioning work: would change some things, but maybe for the better
  - Hard to predict things like IO and blocking
- Developer still has shared memory, can make horrible mistakes, e.g.:
        int s_x = 0;
        var q = from x in A where x == s_x++ select x;
  - Analysis can sometimes catch this, but often not (dynamic function invocation, e.g. where x == side_effecting_func(…))
  - C#’s, and generally the CLR’s, type system doesn’t support the notion of purity (though some research systems, e.g. Spec#, provide hope)
  - Where is transactional memory when you need it?
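One way to avoid the shared counter in this example is to let the query supply the element's position. The sketch below uses the PLINQ API as it later shipped in .NET (AsParallel, AsOrdered, and the indexed Where overload), which may differ from the prototype discussed in this talk; the class and method names are hypothetical.

```csharp
using System.Linq;

public static class SideEffectSketch
{
    // The slide's query: only meaningful when elements are visited
    // one at a time and in order, because the predicate mutates s_x.
    public static int[] SequentialWithSharedCounter(int[] A)
    {
        int s_x = 0;
        return (from x in A where x == s_x++ select x).ToArray();
    }

    // Same intent without shared mutable state: the indexed Where
    // overload passes each element's position to the predicate.
    public static int[] ParallelWithoutSharedState(int[] A)
    {
        return A.AsParallel().AsOrdered()
                .Where((x, i) => x == i)
                .ToArray();
    }
}
```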


Some Conclusions & Observations


- Results have been encouraging: about what you’d expect given prior related research
  - Good performance, few changes required to the serial programming model
  - Given the upcoming public release of LINQ in VS, we hope reach will be good
- Not a silver bullet – just one tool in a developer’s belt
  - Hard to “catch up” to huge parallelism constants on Windows, particularly given small data inputs and/or inexpensive operators
        var q = Range(0,100).Sum(); // add up #s [0,100)
  - Also easy to run into memory bottlenecks, possible opportunities for architecture-aware optimizations (we already try to maximize spatial+temporal locality)
- Costs are hard to get right
  - Too much dynamism in the platform to arrive at a correct #
  - Even if we did, hard to create heuristics that scale well across platforms
  - Too much decomposition, too little, unexpected IO (paging, …), synchronization
  - But in the end: do costs really matter?
    - Or is it better to represent concurrency using a fixed granule and let another scheduling mechanism apply policy (work stealing)?
- Many queries are candidates for SIMD/vector architectures
  - Targeting other instruction sets (SSEx, GPU) could be profitable


The End


- No paper yet – tentative plans for ’07
- Public release dates TBD; for more information, watch: http://www.bluebytesoftware.com/blog/
- Thanks for coming …
