![Page 1: Data-parallel Abstractions for Irregular Applications](https://reader035.vdocuments.mx/reader035/viewer/2022062814/56816790550346895ddcc065/html5/thumbnails/1.jpg)
Data-parallel Abstractions for Irregular Applications

Keshav Pingali, University of Texas, Austin
Motivation

• Multicore processors are here
  – but no one knows how to program them
• A few domains have succeeded in exploiting parallelism
  – Databases: billions of SQL queries are run in parallel every day
  – Computational science
• Both these domains deal with structured data
  – Databases: relations
  – Computational science: mostly dense and sparse arrays
• “Universal” parallel computing
  – Unstructured data is the norm: graphs, trees, lists, …
• What can we do to make it easier for programs that manipulate unstructured data to exploit multicore parallelism?
Organization of talk
• Two case studies
  – Delaunay mesh refinement
  – Agglomerative clustering
  – Lesson: irregular programs have “generalized” data-parallelism
• Exploiting irregular data-parallelism: the Galois system
  – Programming model
  – Implementation
• Experimental evaluation
• Ongoing work
  – Exploiting locality
  – Scheduling
Two case studies
Delaunay Mesh Refinement

• Meshes are useful for
  – the finite element method for solving PDEs
  – graphics rendering
• Delaunay meshes (2-D)
  – Triangulation of a surface, given vertices
  – Delaunay property: the circumcircle of any triangle does not contain another point of the mesh
• In practice, we want all triangles in the mesh to meet certain quality constraints
  – e.g., no angle > 120°
• Mesh refinement: fix bad triangles through iterative refinement
Refinement Algorithm
    while there are bad triangles {
      pick a bad triangle
      add a new vertex at the center of its circumcircle
      gather all triangles that no longer satisfy the Delaunay property into a cavity
      re-triangulate the affected region, including the new point
      // some new triangles may be bad themselves
    }
Refinement Example
Original Mesh Refined Mesh
Sequential Algorithm
    Mesh m = /* read in mesh */;
    WorkList wl;
    wl.add(mesh.badTriangles());
    while (true) {
      if (wl.empty()) break;
      Element e = wl.get();
      if (e no longer in mesh) continue;
      Cavity c = new Cavity(e);   // determine new cavity
      c.expand();
      c.retriangulate();          // re-triangulate region
      m.update(c);                // update mesh
      wl.add(c.badTriangles());
    }
Parallelization Opportunities
• Unit of work: fixing a bad triangle
• Bad triangles with non-overlapping cavities can be processed in parallel
• No obvious way to tell whether the cavities of two bad triangles will overlap without actually building the cavities
  ⇒ compile-time parallelization will not work
Agglomerative Clustering
• Input:
  – Set of data points
  – Measure of “distance” (similarity) between them
• Output: dendrogram
  – Tree that exposes the similarity hierarchy
• Applications:
  – Data mining
  – Graphics: lightcuts for rendering with large numbers of light sources
Clustering algorithm
• Sequential algorithm: iterative
  – Find the two closest points in the data set
  – Cluster them in the dendrogram
  – Replace the pair in the data set with a “supernode” that represents it
    • Placement of the supernode: use heuristics like center of mass
  – Repeat until there is only one point left
Key Data Structures

• Priority queue:
  – Elements are pairs <p,n> where
    • p is a point in the data set
    • n is its nearest neighbor
  – Ordered by increasing distance
• kdTree:
  – Answers queries for the nearest neighbor of a point
  – Convention: if there is only one point, its nearest neighbor is the point at infinity (ptAtInfinity)
  – Similar to a binary search tree, but in higher dimensions
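The kdTree described above can be sketched in a few dozen lines. This is a hypothetical 2-d implementation (the class and method names follow the slides, but the code itself is ours, not Galois code): it alternates split axes level by level and prunes any subtree that cannot contain a closer point; an `exclude` set keeps a point from being reported as its own nearest neighbor.

```python
import math

class Node:
    def __init__(self, point, axis):
        self.point, self.axis = point, axis
        self.left = self.right = None

class KDTree:
    def __init__(self, points):
        self.root = self._build(list(points), 0)

    def _build(self, pts, axis):
        # split on the median along the current axis, alternating x/y
        if not pts:
            return None
        pts.sort(key=lambda p: p[axis])
        mid = len(pts) // 2
        node = Node(pts[mid], axis)
        node.left = self._build(pts[:mid], 1 - axis)
        node.right = self._build(pts[mid + 1:], 1 - axis)
        return node

    def nearest(self, query, exclude=frozenset()):
        """Return the point closest to `query`, skipping points in `exclude`."""
        best = [None, math.inf]   # [best point, best distance]

        def visit(node):
            if node is None:
                return
            if node.point not in exclude:
                d = math.dist(node.point, query)
                if d < best[1]:
                    best[0], best[1] = node.point, d
            diff = query[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            visit(near)
            if abs(diff) < best[1]:   # the far side may still hold a closer point
                visit(far)

        visit(self.root)
        return best[0]
```

For example, `KDTree([(0,0),(1,1),(5,5)]).nearest((0,0), exclude={(0,0)})` finds `(1,1)` without scanning every point in a larger tree.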
Clustering algorithm: implementation
    kdTree := new KDTree(points);
    pq := new PriorityQueue();
    for each p in points
      pq.add(<p, kdTree.nearest(p)>);
    while (true) do {
      if (pq.size() == 0) break;
      pair <p,n> := pq.get();          // get closest pair
      ....
      Cluster c := new Cluster(p,n);   // create supernode
      dendrogram.add(c);
      kdTree.remove(p);                // update kdTree
      kdTree.remove(n);
      kdTree.add(c);
      Point m := kdTree.nearest(c);    // update priority queue
      ....
      pq.add(<c,m>);
    }
Clustering algorithm: details

    kdTree := new KDTree(points);
    pq := new PriorityQueue();
    for each p in points
      pq.add(<p, kdTree.nearest(p)>);
    while (true) do {
      if (pq.size() == 0) break;
      pair <p,n> := pq.get();
      if (p.isAlreadyClustered()) continue;
      if (n.isAlreadyClustered()) {
        pq.add(<p, kdTree.nearest(p)>);
        continue;
      }
      Cluster c := new Cluster(p,n);
      dendrogram.add(c);
      kdTree.remove(p);
      kdTree.remove(n);
      kdTree.add(c);
      Point m := kdTree.nearest(c);
      if (m != ptAtInfinity) pq.add(<c,m>);
    }
Parallelization Opportunities
• Natural unit of work: processing of a pair from the priority queue
• Algorithm appears to be sequential
  – the pair enqueued into the PQ in one iteration may be the pair dequeued in the next iteration
• However, in the example, <a,b> and <c,d> can be clustered in parallel
• Cost per pair in the graphics app: 100K instructions, 4K floating-point operations
Take-away lessons

• Irregular programs have data-parallelism
  – Data-parallelism has been studied in the context of arrays
  – For unstructured data, data-parallelism arises from work-lists of various kinds
    • Delaunay mesh refinement: list of bad triangles
    • Agglomerative clustering: priority queue of pairs of points
    • Maxflow algorithms: list of active nodes
      – Boykov-Kolmogorov algorithm for image segmentation
      – Preflow-push algorithm
    • Approximate SAT solvers
    • …
• Data-parallelism in irregular programs is obscured within while loops, exit conditions, etc.
  – Need transparent syntax similar to FOR loops for structured data-parallelism
Take-away lessons (contd.)
• Parallelism may depend on data values
  – whether or not two potential data-parallel computations conflict may depend on the input data
    • e.g., Delaunay mesh generation: depends on the shape of the mesh
• Optimistic parallelization is necessary in general
  – Compile-time approaches using points-to analysis or shape analysis may be adequate for some cases
  – In general, runtime conflict-checking is needed
• Handling of conflicts depends on the application
  – Delaunay mesh generation: roll back all but one conflicting computation
  – Agglomerative clustering: must respect priority queue order
Galois programming model and implementation
Beliefs underlying the Galois system

• Optimistic parallelism is the only general approach to parallelizing irregular apps
  – Static analysis can be used to optimize optimistic execution
• Concurrency should be packaged within syntactic constructs that are natural for application programmers and obvious to compilers and runtime systems
  – Libraries/runtime system should manage concurrency (cf. SQL)
  – Application code should be sequential
• Crucial to exploit the abstractions provided by object-oriented languages
  – in particular, the distinction between an abstract data type and its implementation type
• Concurrent access to shared mutable objects is essential
Components of Galois approach
1) Two syntactic constructs for packaging optimistic parallelism as iteration over sets
2) Assertions about methods in class libraries
3) Runtime system for detecting and recovering from potentially unsafe accesses by optimistic computations
(1) Concurrency constructs: two set iterators
• for each e in Set S do B(e)
  – evaluate block B(e) for each element of set S
  – sequential implementation
    • set elements are unordered, so there is no a priori order on iterations
    • there may be dependences between iterations
  – set S may get new elements during execution
• for each e in PoSet S do B(e)
  – evaluate block B(e) for each element of set S
  – sequential implementation
    • perform iterations in the order specified by the poset
    • there may be dependences between iterations
  – set S may get new elements during execution
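The unordered iterator's contract — sequential semantics over a set that may grow while it is being iterated — can be sketched as a worklist. The names below are illustrative, not Galois API:

```python
from collections import deque

class Workset:
    """Unordered worklist: the body may add elements during iteration."""

    def __init__(self, items=()):
        self._items = deque(items)

    def add(self, item):
        self._items.append(item)     # new elements become future iterations

    def for_each(self, body):
        # body(e, workset) may call workset.add() to schedule more work
        while self._items:
            body(self._items.popleft(), self)
```

A body that generates new work, like the bad-triangle loop, just calls `add` on the workset it was handed; iteration ends when no work remains.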
Galois version of mesh refinement
    Mesh m = /* read in mesh */;
    Set wl;
    wl.add(mesh.badTriangles());   // non-deterministic order

    for each e in Set wl do {      // unordered iterator
      if (e no longer in mesh) continue;
      Cavity c = new Cavity(e);    // determine new cavity
      c.expand();                  // determine affected triangles
      c.retriangulate();           // re-triangulate region
      m.update(c);                 // update mesh
      wl.add(c.badTriangles());    // add new bad triangles to workset
    }
Observations
• Application program has a well-defined sequential semantics
  – No notion of threads/locks/critical sections, etc.
• Set iterators
  – The SETL language was probably the first to introduce set iterators
  – However, SETL's set iterators did not permit the set being iterated over to grow during execution, which is important for our applications
Parallel computational model
• Object-based shared-memory model
• Computation performed by some number of threads
• Threads can have their own local memory
• Threads must invoke methods to access the internal state of objects
  – mesh refinement: shared objects are
    • worklist
    • mesh
  – agglomerative clustering:
    • priority queue
    • kdTree
    • dendrogram

[figure: threads with local memory invoking methods on objects in shared memory]
Parallel execution of iterators
• Master thread and some number of worker threads
  – the master thread begins execution of the program and executes the code between iterators
  – when it encounters an iterator, worker threads help by executing some iterations concurrently with the master
  – threads synchronize by barrier synchronization at the end of the iterator
• Key technical problem
  – Parallel execution must respect the sequential semantics of the application program
    • the result of parallel execution must appear as though the iterations were performed in some interleaved order
    • for the PoSet iterator, this order must correspond to the poset order
  – Non-trivial problem
    • each iteration may access mutable shared objects
Implementing semantics of iterators
1. Concurrent method invocations that modify an object should not step on each other (mutual exclusion)
   – The library writer uses locks or some other mutex mechanism
   – Locks are acquired during a method invocation and released when the invocation ends
2. Uncontrolled interleaving may violate iterator semantics
   – In (a), contains?(x) must always return false, but some interleavings will violate this (e.g., [add(x), contains?(x), remove(x)])
   – Sometimes interleaving is OK and is needed for concurrency
     • In (b) (motivated by Delaunay mesh refinement), method invocations can be interleaved provided the result of get() is not the argument of add()
(II) Assertions on methods
• Concurrent accesses to a mutable object by multiple threads are OK provided the method invocations commute

[figure: two threads issuing get() and add() invocations on a shared object, with interleavings such as get() get() add() add() and get() add() get() add()]
Assertions on methods (contd.)
• Semantic commutativity vs. concrete commutativity
  – for most implementations of a workset, the concrete data structure will be different after the sequences get() get() add() add() and get() add() get() add(), so concrete commutativity fails
  – however, at the semantic level, these set operations commute provided they operate on different set elements
• Conclusion:
  – semantic commutativity is crucial
  – the class implementor must specify this information
• Commutativity of method invocations, not methods
  – get() commutes with add() only if the element inserted by add() is not the same as the element returned by get()
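A minimal rendering of invocation-level semantic commutativity for a set might look like this. The conflict rule is our reading of the slides, not Galois code: only `contains?` is a pure read, and two invocations commute unless they touch the same element.

```python
def commutes(inv1, inv2):
    """Each invocation is (method, element); get() records the element it returned."""
    (m1, x1), (m2, x2) = inv1, inv2
    reads = {"contains?"}
    if m1 in reads and m2 in reads:
        return True        # two reads always commute
    return x1 != x2        # otherwise they commute iff the elements differ

# invocation-level, not method-level: the same pair of methods can go either way
assert commutes(("get", "a"), ("add", "b"))        # different elements: OK
assert not commutes(("get", "a"), ("add", "a"))    # same element: conflict
```

Note that the check takes invocations, not methods: `get()`/`add()` commute or conflict depending on the actual elements involved, which is exactly the point of the slide.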
Assertions on methods (contd.)
• Updates to objects happen before the iteration completes (eager commit)
• So we need a way of undoing the effect of a method invocation
• The class implementer must provide an ‘inverse’ method
• As before, the semantic inverse is key, not the concrete inverse

[figure: method invocations m1, m2, m3 on objects in shared memory]
Example: Set

    class SetInterface {
      void add(Element x);
        [conflicts]
          - add(x)
          - remove(x)
          - contains?(x)
          - get() : x
        [inverse] remove(x)

      void remove(Element x);
        [conflicts]
          - add(x)
          - remove(x)
          - contains?(x)
          - get() : x
        [inverse] add(x)

      ...
    }
Remarks

• Commutativity information is optional
  – No commutativity information for a mutable object means only one iteration can manipulate the object at a time
• Inverse methods are more or less essential
  – for a class without commutativity information, inverse methods can be implemented by data copying
• Difficulty of writing specifications
  – in our apps, most shared objects are collections (sets, bags, maps)
    • e.g., the kdTree is simply a set with a nearestNeighbor operation
  – writing specifications is quite easy
• Relationship to the Abelian group axioms
  – commutativity, inverse, identity
(III) Runtime system: commit pool

• Maintains an iteration record for each ongoing iteration in the system
• Status of an iteration:
  – running
  – ready-to-commit (RTC)
  – aborted
• Life-cycle of an iteration
  – a thread goes to the commit pool for work
  – the commit pool
    • obtains the next element from the iterator
    • assigns a priority to the iteration based on the priority of the element in the set
    • creates an iteration record with status running
  – when the iteration completes
    • the status of the iteration record is set to RTC
    • when that record has the highest priority in the system, it is allowed to commit
  – if a commutativity conflict is detected
    • the commit pool arbitrates to determine which iteration(s) should be aborted
    • the commit pool executes the undo logs of the aborted iterations
• The role of the commit pool is similar to that of the reorder buffer in out-of-order microprocessors
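The commit pool's ordering rule — an RTC iteration commits only when it holds the highest outstanding priority — can be sketched as follows. This is a single-threaded simplification (no worker threads, no aborts); smaller numbers mean higher priority.

```python
class CommitPool:
    """Commits ready-to-commit iterations strictly in priority order."""

    def __init__(self):
        self._records = {}      # priority -> status ("running" or "RTC")
        self._committed = []

    def start(self, priority):
        self._records[priority] = "running"

    def finish(self, priority):
        self._records[priority] = "RTC"   # ready-to-commit
        self._drain()

    def _drain(self):
        # commit only while the highest-priority outstanding record is RTC;
        # an earlier still-running iteration blocks everything behind it
        while self._records and self._records[min(self._records)] == "RTC":
            p = min(self._records)
            del self._records[p]
            self._committed.append(p)

    @property
    def committed(self):
        return list(self._committed)
```

If iteration 2 finishes before iteration 1, it waits in RTC state; once 1 finishes, both commit, in order — the same discipline a reorder buffer applies to instructions.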
(III) Runtime system: conflict logs

• Each object has a conflict log
  – Contains the sequence of method invocations that have been performed by ongoing iterations
• Each thread has an undo log that contains the sequence of inverse method invocations it must execute if it aborts
• When a thread invokes method m on object O:
  – Check whether m commutes with the method invocations and their inverses in the conflict log of O
  – If so, add m to the conflict log of O, add m⁻¹ to the thread's undo log, and execute the method
  – Otherwise, the iteration aborts
• When a thread commits an iteration:
  – Remove its invocations from the conflict logs of all objects it has touched
  – Zero out its undo log
• Easy to extend this to support nested method invocations
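Putting the conflict log and undo log together for a shared set, a sketch of the protocol might look like this. The commutativity rule is deliberately simplified (invocations from different iterations conflict iff they touch the same element), and the class names are ours, not the Galois runtime's.

```python
class Conflict(Exception):
    pass

class Iteration:
    def __init__(self):
        self.undo_log = []          # inverse invocations, executed on abort

class SharedSet:
    _inverse = {"add": "remove", "remove": "add"}

    def __init__(self):
        self.elems = set()
        self.conflict_log = []      # (iteration, method, element)

    def invoke(self, iteration, method, x):
        # check commutativity against invocations of other ongoing iterations
        for it, m, y in self.conflict_log:
            if it is not iteration and y == x:
                raise Conflict(f"{method}({x}) conflicts with {m}({y})")
        self.conflict_log.append((iteration, method, x))
        iteration.undo_log.append((self._inverse[method], x))
        getattr(self, "_" + method)(x)

    def _add(self, x):
        self.elems.add(x)

    def _remove(self, x):
        self.elems.discard(x)

    def commit(self, iteration):
        # drop the iteration's invocations from the conflict log
        self.conflict_log = [r for r in self.conflict_log if r[0] is not iteration]

    def abort(self, iteration):
        while iteration.undo_log:
            m, x = iteration.undo_log.pop()   # run inverses in reverse order
            getattr(self, "_" + m)(x)
        self.commit(iteration)                # its log entries go away too
```

Two iterations can freely `add` different elements; as soon as one touches an element logged by the other, `invoke` raises `Conflict`, and `abort` replays the undo log to restore the set.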
Experiments
Experimental Setup
• Machines
  – 4-processor 1.5 GHz Itanium 2
    • 16 KB L1, 256 KB L2, 3 MB L3 cache
    • no shared cache between processors
    • Red Hat Linux
  – Dual-processor, dual-core 3.0 GHz Xeon
    • 32 KB L1, 4 MB L2 cache
    • the two cores share the L2
    • Red Hat Linux
Delaunay mesh generation
• Workset: implemented using an STL queue
• Mesh: implemented as a graph
  – each triangle is a node
  – edges in the graph represent triangle adjacencies
  – used an adjacency-list representation of the graph
• Input mesh:
  – from Shewchuk's Triangle program
  – 10,156 triangles, of which 4,837 were bad
Code versions
• Three versions
  – reference: sequential version without locks/threads/etc.
  – FGL: handwritten code that uses fine-grain locks on triangles
  – meshgen: Galois version
• Galois workset implementation
  – used an STL queue first: high abort ratio
    • Sequential code: 21,918 completed + 0 aborted
    • Galois(q): 21,736 completed + 28,290 aborted
  – replaced the queue with an array + random choice
    • Galois(r): 21,908 completed + 49 aborted
Results
Performance Breakdown
*4 processor numbers are summed over all processors
Agglomerative clustering
• Two versions
  – reference: sequential version without locks/threads
  – treebuild: Galois version
• Data structures
  – priority queue
  – kd-tree
  – dendrogram
• Data set
  – from a graphics scene with roughly 50,000 light sources
Speedups
• sequential version is best on 1 processor
• self-relative speed-up of almost 2.75 on 4 processors
Abort ratios and CPI
• Sequential and treebuild perform almost the same number of instructions
• As before, cycles per instruction (CPI) is higher for treebuild, mainly because of L3 cache misses
  – mainly from the kdTree

|        | Committed iterations | Aborted iterations |
|--------|----------------------|--------------------|
| 1 proc | 57,486               | n/a                |
| 4 proc | 57,861               | 2,528              |
Degree of speculation
• Measured the number of iterations ready to commit (RTC) whenever the commit pool creates/aborts/commits an iteration
• Histogram shown above
  – the x-axis in the figure is truncated to show detail near the origin
  – the maximum number of RTC iterations is 120
• Most of the time, we do not need to speculate deeply to keep 4 threads busy
  – but on occasion, we do need to speculate deeply
Take-away points

• Support for ordering speculative computations is very useful for some apps
  – hard to do agglomerative clustering otherwise
• May need to speculate deeply in some apps
• Domain-specific information is very useful for proper scheduling
  – workset implementation made a huge difference in performance
  – will probably need to provide hooks for the user to specify a scheduling policy
• Reducing cache traffic is important to improve performance further
Ongoing work
Improving Performance
• Locality enhancement
  – the Galois approach can expose data-parallelism in irregular applications
  – scalable exploitation of parallelism requires attending to locality
• Specifying scheduling strategies
  – the Delaunay mesh refinement example shows that the scheduling of iterations can be critical to lowering abort ratios
  – domain knowledge was needed to fix the problem
Galois methodology
• How easy is it to specify commutativity of method invocations?
  – How important is the distinction between semantic and concrete commutativity?
• How easy is it to write inverse methods?
• Given a specification of the ADT, can we check the commutativity and inverse directives?
Benchmarks
• Existing benchmarks are useless
  – Wirth: Program = Algorithm + Data structure
  – current benchmarks are programs
  – we need algorithms and data structures
    • cf. our experience with Delaunay mesh generation and the STL queue
  – variety of input data sets to illustrate the range of behavior
Conclusions

• Irregular programs have data-parallelism
  – work-list-based iterative algorithms over irregular data structures
• Data-parallelism may be inherently data-dependent
  – pointer/shape analysis cannot work for these apps
• Optimistic parallelization is essential for such apps
  – analysis might be useful to optimize parallel program execution
• Exploiting the abstractions provided by OO languages is critical
  – only CS people still worry about F77 and C anyway…
• Exploiting high-level semantic information about programs is critical
  – Galois knows about sets and ordered sets
  – commutativity information is crucial
• Support for ordering speculative computations is important
• Concurrent access to mutable objects is important
• Benchmark programs are bad
  – we need algorithms + data structures, not just programs