EECS 583 – Class 17
Research Topic 1
Decoupled Software Pipelining
University of Michigan
November 9, 2011
- 2 -
Announcements + Reading Material
2nd paper review due today
» Should have submitted to andrew.eecs.umich.edu:/y/submit
Next Monday – Midterm exam in class
Today’s class reading
» “Automatic Thread Extraction with Decoupled Software Pipelining,” G. Ottoni, R. Rangan, A. Stoler, and D. I. August, Proceedings of the 38th IEEE/ACM International Symposium on Microarchitecture, Nov. 2005.
Next class reading (Wednesday, Nov 16)
» “Spice: Speculative Parallel Iteration Chunk Execution,” E. Raman, N. Vachharajani, R. Rangan, and D. I. August, Proc. 2008 Intl. Symposium on Code Generation and Optimization, April 2008.
- 3 -
Midterm Exam
When: Monday, Nov 14, 2011, 10:40-12:30
Where
» 1005 EECS: uniquenames starting with A-H go here
» 3150 Dow (our classroom): uniquenames starting with I-Z go here
What to expect
» Open book/notes, no laptops
» Apply techniques we discussed in class on examples
» Reason about solving compiler problems – why things are done
» A couple of thinking problems
» No LLVM code
» Reasonably long but you should finish
Last 2 years’ exams are posted on the course website
» Note – Past exams may not accurately predict future exams!!
- 4 -
Midterm Exam
Office hours between now and Monday if you have questions
» Daya: Thurs and Fri 3-5pm
» Scott: Wed 4:30-5:30, Fri 4:30-5:30
Studying
» Yes, you should study even though it’s open notes
   Lots of material that you have likely forgotten
   Refresh your memories
   No memorization required, but you need to be familiar with the material to finish the exam
» Go through lecture notes, especially the examples!
» If you are confused on a topic, go through the reading
» If still confused, come talk to me or Daya
» Go through the practice exams as the final step
- 5 -
Exam Topics
Control flow analysis
» Control flow graphs, Dom/pdom, Loop detection
» Trace selection, superblocks
Predicated execution
» Control dependence analysis, if-conversion, hyperblocks
» Can ignore control height reduction
Dataflow analysis
» Liveness, reaching defs, DU/UD chains, available defs/exprs
» Static single assignment
Optimizations
» Classical: Dead code elim, constant/copy prop, CSE, LICM, induction variable strength reduction
» ILP optimizations - unrolling, renaming, tree height reduction, induction/accumulator expansion
» Speculative optimization – like HW 1
- 6 -
Exam Topics - Continued
Acyclic scheduling
» Dependence graphs, Estart/Lstart/Slack, list scheduling
» Code motion across branches, speculation, exceptions
Software pipelining
» DSA form, ResMII, RecMII, modulo scheduling
» Make sure you can modulo schedule a loop!
» Execution control with LC, ESC
Register allocation
» Live ranges, graph coloring
Research topics
» Can ignore these
- 7 -
Last Class
Scientific codes – Successful parallelization
» KAP, SUIF, Parascope, gcc w/ Graphite
» Affine array dependence analysis
» DOALL parallelization
C programs
» Not dominated by array accesses – classic parallelization fails
» Speculative parallelization – Hydra, Stampede, Speculative multithreading
   Profiling to identify statistical DOALL loops
   But not all loops are DOALL; outer loops typically are not!!
This class – Parallelizing loops with dependences
- 8 -
What About Non-Scientific Codes???
Scientific Codes (FORTRAN-like):
  for(i=1; i<=N; i++)        // C
    a[i] = a[i] + 1;         // X
General-purpose Codes (legacy C/C++):
  while(ptr = ptr->next)     // LD
    ptr->val = ptr->val + 1; // X
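[Figure: execution schedules on Core 1 and Core 2 over cycles 0-5. Independent Multithreading (IMT), example: DOALL parallelization, runs the array loop with whole iterations on alternating cores (C:1/X:1, C:3/X:3, C:5/X:5 on one core; C:2/X:2, C:4/X:4, C:6/X:6 on the other). Cyclic Multithreading (CMT), example: DOACROSS [Cytron, ICPP 86], runs the list loop with iterations also alternating between cores, each LD:i+1 issuing one cycle after LD:i.]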
- 9 -
Alternative Parallelization Approaches
while(ptr = ptr->next)     // LD
  ptr->val = ptr->val + 1; // X
[Figure: the list loop on Core 1 and Core 2 over cycles 0-5. Cyclic Multithreading (CMT): iterations alternate between the cores. Pipelined Multithreading (PMT), example: DSWP [PACT 2004]: Core 1 issues LD:1 through LD:6 back to back while Core 2 runs X:1 through X:5, one cycle behind.]
- 10 -
Comparison: IMT, PMT, CMT
[Figure: side-by-side two-core schedules over cycles 0-5 for IMT (array loop, whole iterations alternate between cores), CMT (list loop, iterations alternate between cores), and PMT (list loop, LD stage on Core 1, X stage on Core 2).]
- 11 -
Comparison: IMT, PMT, CMT
[Figure: the IMT, PMT, and CMT schedules with communication latency lat(comm) = 1; each achieves 1 iter/cycle.]
- 12 -
Comparison: IMT, PMT, CMT
[Figure: the same schedules with lat(comm) = 2. PMT still overlaps LD:1 through LD:6 on Core 1 with X:1 through X:4 on Core 2; CMT completes only LD:1/X:1 through LD:3/X:3 in cycles 0-5.]
Throughput
» lat(comm) = 1: IMT 1 iter/cycle, PMT 1 iter/cycle, CMT 1 iter/cycle
» lat(comm) = 2: IMT 1 iter/cycle, PMT 1 iter/cycle, CMT 0.5 iter/cycle
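The throughput numbers on this slide can be reproduced with a small steady-state model. The sketch below is an illustration under stated assumptions, not the paper's analysis: each of the two cores issues one instruction per cycle, `lat_comm` counts cycles from a producer issuing on one core to the earliest dependent issue on the other core (which is how the schedules appear to be drawn), and `imt_rate`, `cmt_rate`, and `pmt_rate` are hypothetical helper names.

```c
/* Steady-state iterations per cycle for the two example loops on 2 cores.
 * Assumed model: one instruction issues per cycle per core; lat_comm is
 * the issue-to-issue delay for a value that crosses cores. */
static double min2(double a, double b) { return a < b ? a : b; }
static double max2(double a, double b) { return a > b ? a : b; }

/* IMT (DOALL, array loop): iterations are independent, so each core
 * retires a whole iteration (C then X) every lat_c + lat_x cycles. */
static double imt_rate(int lat_c, int lat_x) {
    return 2.0 / (lat_c + lat_x);
}

/* CMT (DOACROSS, list loop): iterations alternate between cores, so the
 * loop-carried ptr recurrence crosses cores on every iteration. */
static double cmt_rate(int lat_ld, int lat_x, int lat_comm) {
    double resource = 2.0 / (lat_ld + lat_x);         /* both cores busy */
    double recurrence = 1.0 / max2(lat_ld, lat_comm); /* ptr hand-off */
    return min2(resource, recurrence);
}

/* PMT (DSWP, list loop): Core 1 runs every LD, Core 2 every X. The ptr
 * recurrence stays on Core 1, so lat_comm only delays the first X. */
static double pmt_rate(int lat_ld, int lat_x) {
    return 1.0 / max2(lat_ld, lat_x);
}
```

For instance, `cmt_rate(1, 1, 2)` evaluates to 0.5 while `pmt_rate(1, 1)` stays at 1.0: only CMT routes the `ptr` recurrence through the other core, so only CMT pays the communication latency on every iteration.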
- 13 -
Comparison: IMT, PMT, CMT
[Figure: the IMT, PMT, and CMT schedules, annotated with their properties.]
Cross-thread dependences → wide applicability (PMT, CMT)
Thread-local recurrences → fast execution (IMT, PMT)
- 14 -
Our Objective: Automatic Extraction of Pipeline Parallelism using DSWP
[Figure: 197.parser as a three-stage pipeline, Find English Sentences → Parse Sentences (95%) → Emit Results, parallelized with Decoupled Software Pipelining and with PS-DSWP (speculative DOALL middle stage).]
Decoupled Software Pipelining
- 16 -
Decoupled Software Pipelining (DSWP)
A: while(node)
B:   ncost = doit(node);
C:   cost += ncost;
D:   node = node->next;
Inter-thread communication latency is a one-time cost
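[MICRO 2005]
[Figure: the loop's Dependence Graph over A-D is condensed into DAG_SCC with SCCs {A, D}, {B}, {C}; Thread 1 runs A and D, Thread 2 runs B and C, connected by a communication queue. Legend: intra-iteration, loop-carried, register, control, communication queue.]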
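A minimal two-thread rendering of this partition is sketched below, assuming POSIX threads and a mutex-protected ring buffer standing in for the communication queue (the paper relies on dedicated inter-core queue support instead). `produce`/`consume` are hypothetical helpers whose names echo the later slides, and `doit` is a placeholder body. Thread 1 runs SCC {A, D}; the calling thread plays Thread 2 (B and C), and the loop exit reaches it as a NULL sentinel. Compile with `-pthread`.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct node { int val; struct node *next; } node_t;

/* Hypothetical stand-in for the slide's doit(). */
static int doit(const node_t *n) { return 2 * n->val; }

/* Bounded single-producer/single-consumer queue. */
#define QCAP 8
typedef struct {
    const node_t *buf[QCAP];
    int head, tail, count;
    pthread_mutex_t mu;
    pthread_cond_t not_empty, not_full;
} queue_t;

static void produce(queue_t *q, const node_t *v) {
    pthread_mutex_lock(&q->mu);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->mu);
    q->buf[q->tail] = v;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

static const node_t *consume(queue_t *q) {
    pthread_mutex_lock(&q->mu);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->mu);
    const node_t *v = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->mu);
    return v;
}

typedef struct { queue_t *q; const node_t *list; } thread1_arg_t;

/* Thread 1: SCC {A, D}, the loop test and the pointer chase. */
static void *thread1(void *p) {
    thread1_arg_t *a = p;
    for (const node_t *node = a->list; node; node = node->next)  /* A, D */
        produce(a->q, node);
    produce(a->q, NULL);  /* forward the loop exit to Thread 2 */
    return NULL;
}

/* Runs the loop as a two-thread pipeline and returns the final cost. */
static long dswp_cost(const node_t *list) {
    queue_t q = { .count = 0 };
    pthread_mutex_init(&q.mu, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    thread1_arg_t arg = { &q, list };
    pthread_t t1;
    pthread_create(&t1, NULL, thread1, &arg);

    /* Thread 2: B and C; A is replicated as "is the next item NULL?". */
    long cost = 0;
    const node_t *node;
    while ((node = consume(&q)) != NULL) {
        int ncost = doit(node);  /* B */
        cost += ncost;           /* C */
    }
    pthread_join(t1, NULL);
    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.mu);
    return cost;
}
```

Because the `node` recurrence never leaves Thread 1, the queue latency delays only the start of Thread 2; after that, both threads stream independently, which is why the communication latency is a one-time cost.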
- 17 -
Implementing DSWP
[Figure: data flow graph (DFG) of a loop with blocks L1 and Aux. Legend: intra-iteration, loop-carried, register, memory, control.]
- 18 -
Optimization: Node Splitting to Eliminate Cross-Thread Control
[Figure: loop with blocks L1 and L2. Legend: intra-iteration, loop-carried, register, memory, control.]
- 19 -
Optimization: Node Splitting to Reduce Communication
[Figure: loop with blocks L1 and L2. Legend: intra-iteration, loop-carried, register, memory, control.]
- 20 -
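Constraint: Strongly Connected Components
Solution: DAG_SCC
Consider:
[Figure: an example in which an SCC is split across threads. Legend: intra-iteration, loop-carried, register, memory, control.]
Splitting an SCC across threads eliminates the pipelined/decoupled property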
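Finding the SCCs is a standard graph problem; the sketch below uses Tarjan's algorithm on the four-statement `while(node)` loop from the DSWP slide. The edge list in `build_example` is read off that loop (control from A, `ncost` from B to C, loop-carried `cost` and `node`) and is an assumption, since the slide figure is not reproduced here. All names are hypothetical.

```c
#include <string.h>

/* Statements: A (loop test), B (doit), C (cost +=), D (node = node->next). */
enum { A, B, C, D, NSTMT };

/* g[u][v] != 0: v depends on u (intra-iteration or loop-carried). */
typedef int depgraph_t[NSTMT][NSTMT];

static void build_example(depgraph_t g) {
    memset(g, 0, sizeof(depgraph_t));
    g[A][B] = g[A][C] = g[A][D] = 1;  /* control: the loop test guards the body */
    g[B][C] = 1;                      /* register ncost, intra-iteration */
    g[C][C] = 1;                      /* register cost, loop-carried */
    g[D][A] = g[D][B] = g[D][D] = 1;  /* register node, loop-carried */
}

typedef struct {
    int index[NSTMT], low[NSTMT], on_stack[NSTMT], comp[NSTMT];
    int stack[NSTMT], sp, next_index, ncomp;
} tarjan_t;

static void strongconnect(depgraph_t g, tarjan_t *t, int u) {
    t->index[u] = t->low[u] = t->next_index++;
    t->stack[t->sp++] = u;
    t->on_stack[u] = 1;
    for (int v = 0; v < NSTMT; v++) {
        if (!g[u][v]) continue;
        if (t->index[v] < 0) {             /* tree edge: recurse */
            strongconnect(g, t, v);
            if (t->low[v] < t->low[u]) t->low[u] = t->low[v];
        } else if (t->on_stack[v] && t->index[v] < t->low[u]) {
            t->low[u] = t->index[v];       /* back edge into current SCC */
        }
    }
    if (t->low[u] == t->index[u]) {        /* u is the root of an SCC */
        int w;
        do {
            w = t->stack[--t->sp];
            t->on_stack[w] = 0;
            t->comp[w] = t->ncomp;
        } while (w != u);
        t->ncomp++;
    }
}

/* Fills comp[] with an SCC id per statement and returns the SCC count.
 * Tarjan completes SCCs in reverse topological order of DAG_SCC, so
 * every cross-SCC edge runs from a higher id to a lower id. */
static int find_sccs(depgraph_t g, int comp[NSTMT]) {
    tarjan_t t;
    memset(&t, 0, sizeof t);
    for (int i = 0; i < NSTMT; i++) t.index[i] = -1;
    for (int u = 0; u < NSTMT; u++)
        if (t.index[u] < 0) strongconnect(g, &t, u);
    memcpy(comp, t.comp, sizeof t.comp);
    return t.ncomp;
}
```

On this graph `find_sccs` reports three SCCs, {A, D}, {B}, and {C}. Assigning whole SCCs to threads in DAG_SCC order keeps every cross-thread edge pointing forward, so no thread ever waits on a later stage.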
- 21 -
2 Extensions to the Basic Transformation
Speculation
» Break statistically unlikely dependences
» Form better-balanced pipelines
Parallel Stages
» Execute multiple copies of certain “large” stages
» Stages that contain inner loops are perfect candidates
- 22 -
Why Speculation?
A: while(node)
B:   ncost = doit(node);
C:   cost += ncost;
D:   node = node->next;
[Figure: Dependence Graph over A-D and its DAG_SCC with SCCs {A, D}, {B}, {C}. Legend: intra-iteration, loop-carried, register, control, communication queue.]
- 23 -
Why Speculation?
A: while(cost < T && node)
B:   ncost = doit(node);
C:   cost += ncost;
D:   node = node->next;
[Figure: with the cost < T exit test, DAG_SCC collapses into a single SCC {A, B, C, D}; the predictable dependences are highlighted. Legend: intra-iteration, loop-carried, register, control, communication queue.]
- 24 -
Why Speculation?
A: while(cost < T && node)
B:   ncost = doit(node);
C:   cost += ncost;
D:   node = node->next;
[Figure: once the predictable dependences are removed, DAG_SCC splits into four SCCs {D}, {B}, {C}, {A}. Legend: intra-iteration, loop-carried, register, control, communication queue.]
- 25 -
Execution Paradigm
[Figure: the speculative pipeline (DAG_SCC D → B → C → A) running on separate threads; misspeculation is detected, and misspeculation recovery reruns iteration 4. Legend: intra-iteration, loop-carried, register, control, communication queue.]
- 26 -
Understanding PMT Performance
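[Figure: two PMT schedules on Core 1 and Core 2 over time. Left: A:i on Core 1 and B:i on Core 2; the slowest thread takes 1 cycle/iter, so the iteration rate is 1 iter/cycle. Right: A:i and B:i both on Core 1 and C:i on Core 2, which sits idle half the time; the slowest thread takes 2 cycles/iter, giving 0.5 iter/cycle.]
$T \propto \max_i(t_i)$, where $T$ is the pipeline's time per iteration and $t_i$ is the time thread $i$ spends per iteration.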
1. $t_i$ is at least as large as the longest dependence recurrence assigned to thread $i$.
2. Finding the longest recurrence is NP-hard.
3. Large loops make the problem difficult in practice.
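The rate rule can be made concrete with a sketch that sums instruction latencies per thread and takes the slowest thread. This assumes each thread issues one instruction per cycle and never stalls on a queue; `pmt_iteration_rate` and `MAX_THREADS` are hypothetical names.

```c
#define MAX_THREADS 16

/* t_i = sum of latencies of the instructions assigned to thread i; in
 * steady state the pipeline completes one iteration per max_i(t_i)
 * cycles. Returns iterations per cycle, or 0.0 on invalid input. */
static double pmt_iteration_rate(const int lat[], const int thread_of[],
                                 int n_instr, int n_threads) {
    int t[MAX_THREADS] = {0};
    int slowest = 0;
    if (n_threads < 1 || n_threads > MAX_THREADS) return 0.0;
    for (int k = 0; k < n_instr; k++) {
        if (thread_of[k] < 0 || thread_of[k] >= n_threads) return 0.0;
        t[thread_of[k]] += lat[k];
    }
    for (int i = 0; i < n_threads; i++)
        if (t[i] > slowest) slowest = t[i];
    return slowest > 0 ? 1.0 / slowest : 0.0;
}
```

With three 1-cycle instructions, putting one per thread on two threads (A on one, B on the other) gives 1.0 iter/cycle, while putting two on the same thread (as in the slide's second schedule) gives 0.5; the idle time on the lighter thread is the cost of the imbalance.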
- 27 -
Selecting Dependences To Speculate
A: while(cost < T && node)
B:   ncost = doit(node);
C:   cost += ncost;
D:   node = node->next;
[Figure: Dependence Graph and DAG_SCC with SCCs {D}, {B}, {C}, {A}, assigned to Thread 1 (D), Thread 2 (B), Thread 3 (C), and Thread 4 (A). Legend: intra-iteration, loop-carried, register, control, communication queue.]
- 28 -
Detecting Misspeculation
[Figure: DAG_SCC with SCCs D, B, C, A.]
Thread 1:
A1: while(consume(4))
D1:   node = node->next;
      produce({0,1}, node);
Thread 2:
A2: while(consume(5))
B2:   ncost = doit(node);
      produce(2, ncost);
D2:   node = consume(0);
Thread 3:
A3: while(consume(6))
B3:   ncost = consume(2);
C3:   cost += ncost;
      produce(3, cost);
Thread 4:
A4: while(cost < T && node)
B4:   cost = consume(3);
C4:   node = consume(1);
      produce({4,5,6}, cost < T && node);
- 29 -
Detecting Misspeculation
[Figure: DAG_SCC with SCCs D, B, C, A.]
Thread 1:
A1: while(TRUE)
D1:   node = node->next;
      produce({0,1}, node);
Thread 2:
A2: while(TRUE)
B2:   ncost = doit(node);
      produce(2, ncost);
D2:   node = consume(0);
Thread 3:
A3: while(TRUE)
B3:   ncost = consume(2);
C3:   cost += ncost;
      produce(3, cost);
Thread 4:
A4: while(cost < T && node)
B4:   cost = consume(3);
C4:   node = consume(1);
      produce({4,5,6}, cost < T && node);
- 30 -
Detecting Misspeculation
[Figure: DAG_SCC with SCCs D, B, C, A.]
Thread 1:
A1: while(TRUE)
D1:   node = node->next;
      produce({0,1}, node);
Thread 2:
A2: while(TRUE)
B2:   ncost = doit(node);
      produce(2, ncost);
D2:   node = consume(0);
Thread 3:
A3: while(TRUE)
B3:   ncost = consume(2);
C3:   cost += ncost;
      produce(3, cost);
Thread 4:
A4: while(cost < T && node)
B4:   cost = consume(3);
C4:   node = consume(1);
      if(!(cost < T && node)) FLAG_MISSPEC();
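A sequential simulation of this scheme, sketched under simplifying assumptions: the speculative threads (while(TRUE)) run ahead over the whole list, recording the state each iteration starts from; the checking thread then re-evaluates the real exit test in order and, at the first failure, squashes that iteration and everything after it. Stopping the run-ahead at NULL stands in for the fault a real run-ahead past the end of the list would raise, and `doit`, `spec_cost`, and `MAXN` are hypothetical.

```c
#include <stddef.h>

typedef struct node { long val; struct node *next; } node_t;

/* Hypothetical stand-in for doit(). */
static long doit(const node_t *n) { return n->val; }

/* Reference: the original loop from the slides. */
static long seq_cost(const node_t *node, long T) {
    long cost = 0;
    while (cost < T && node) {     /* A */
        long ncost = doit(node);   /* B */
        cost += ncost;             /* C */
        node = node->next;         /* D */
    }
    return cost;
}

#define MAXN 64  /* run-ahead buffer (assumption: list has <= MAXN nodes) */

typedef struct {
    long cost;         /* committed cost after recovery */
    int misspec_iter;  /* 0-based iteration flagged by FLAG_MISSPEC, or -1 */
    int squashed;      /* speculative iterations discarded */
} spec_result_t;

static spec_result_t spec_cost(const node_t *list, long T) {
    long cost_before[MAXN];
    int n = 0;
    long cost = 0;
    /* Threads 1-3: the cost < T exit is speculated never to be taken. */
    for (const node_t *node = list; node && n < MAXN; node = node->next) {
        cost_before[n++] = cost;   /* state this iteration starts from */
        cost += doit(node);        /* B, C */
    }
    spec_result_t r = { cost, -1, 0 };
    /* Thread 4: the real exit test, in iteration order. node is non-NULL
     * for every recorded iteration, so only cost < T can fail here. */
    for (int i = 0; i < n; i++) {
        if (!(cost_before[i] < T)) {   /* FLAG_MISSPEC() */
            r.misspec_iter = i;
            r.squashed = n - i;        /* iterations i..n-1 are discarded */
            r.cost = cost_before[i];   /* roll back to the last valid state */
            break;
        }
    }
    return r;
}
```

On a list 1, 2, 3, 4, 5 with `T = 6`, the check fails at the fourth iteration, two speculative iterations are squashed, and the committed cost is 6, the same as `seq_cost`. Recovery then only needs to rerun the flagged iteration non-speculatively, which simply takes the loop exit.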
- 31 -
Breaking False Memory Dependences
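[Figure: Dependence Graph over A-D with a false memory dependence broken by memory versioning: iterations write separate memory versions (e.g., Memory Version 3), and the oldest version is committed by the recovery thread. Legend: intra-iteration, loop-carried, register, control, communication queue, false memory.]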
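The renaming idea behind memory versioning can be shown in software. The sketch below is an analogy only, not the versioned-memory mechanism or commit protocol from the paper: a two-stage pipeline in which stage 1 writes a scratch location and stage 2 of the same iteration reads it, with stage 1 running `AHEAD` iterations ahead. With one location the later writes clobber the value (a false dependence); with one version per in-flight iteration the result matches sequential execution. `run_pipeline`, `N_ITERS`, and `AHEAD` are hypothetical.

```c
#define N_ITERS 6
#define AHEAD 2  /* stage 1 runs this many iterations ahead of stage 2 */

/* Iteration i: stage 1 writes scratch = i*i; stage 2 stores
 * out[i] = scratch + 1, so sequentially out[i] == i*i + 1. Within a
 * step, stage 1 writes before stage 2 reads. */
static void run_pipeline(int versions, int out[N_ITERS]) {
    int scratch[AHEAD + 1] = {0};  /* room for up to AHEAD + 1 versions */
    if (versions < 1 || versions > AHEAD + 1) return;
    for (int step = 0; step < N_ITERS + AHEAD; step++) {
        int w = step;          /* iteration in stage 1 at this step */
        int r = step - AHEAD;  /* iteration in stage 2 at this step */
        if (w < N_ITERS)
            scratch[w % versions] = w * w;       /* stage 1: write */
        if (r >= 0)
            out[r] = scratch[r % versions] + 1;  /* stage 2: read */
    }
}
```

With `AHEAD + 1` versions every `out[i]` matches the sequential `i*i + 1`; with a single version, `out[0]` picks up iteration 2's value. In the speculative setting the extra bookkeeping is deciding when the oldest version may be written back, which is the recovery thread's job in the figure.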
- 32 -
Adding Parallel Stages to DSWP
LD = 1 cycle, X = 2 cycles
while(ptr = ptr->next)     // LD
  ptr->val = ptr->val + 1; // X
Throughput
» DSWP: 1/2 iteration/cycle
» DOACROSS: 1/2 iteration/cycle
» PS-DSWP: 1 iteration/cycle
Comm. latency = 2 cycles
[Figure: PS-DSWP schedule over cycles 0-7: Core 1 issues LD:1 through LD:8 back to back; Core 2 runs X:1, X:3, X:5 and Core 3 runs X:2, X:4, each X starting 2 cycles after its LD.]
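The PS-DSWP throughput can be checked with a cycle-level sketch that replays the slide's schedule rule. Assumptions: LD issues back to back on one core, X(i) may start `comm_lat` cycles after LD(i) issues, and X iterations go round-robin to `replicas` copies of the X stage. `pipeline_cycles` is a hypothetical helper.

```c
#define MAX_REPLICAS 16

/* Returns the cycle at which the last X finishes, or -1 on bad input. */
static int pipeline_cycles(int n_iters, int ld_lat, int x_lat,
                           int comm_lat, int replicas) {
    int free_at[MAX_REPLICAS] = {0};  /* next free cycle of each X core */
    int done = 0;
    if (replicas < 1 || replicas > MAX_REPLICAS) return -1;
    for (int i = 0; i < n_iters; i++) {
        int ld_issue = i * ld_lat;          /* LD recurrence: back to back */
        int ready = ld_issue + comm_lat;    /* ptr reaches the X stage */
        int core = i % replicas;            /* round-robin over replicas */
        int start = ready > free_at[core] ? ready : free_at[core];
        free_at[core] = start + x_lat;
        if (free_at[core] > done) done = free_at[core];
    }
    return done;
}
```

With 8 iterations, one X core (plain DSWP) needs 18 cycles and two X replicas (PS-DSWP) need 11. In steady state that is 2 versus 1 cycle per iteration, the 1/2 and 1 iteration/cycle figures listed above; the 2-cycle communication latency only shifts the start.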
- 33 -
Thread Partitioning
p = list; sum = 0;
A: while (p != NULL) {
B:   id = p->id;
E:   q = p->inner_list;
C:   if (!visited[id]) {
D:     visited[id] = true;
F:     while (foo(q))
G:       q = q->next;
H:     if (q != NULL)
I:       sum += p->value;
     }
J:   p = p->next;
   }
[Figure: statements annotated with profile weights (10, 10, 10, 10, 55, 50, 50, 5, 3), with the update of sum marked as a reduction.]
- 34 -
Thread Partitioning: DAG_SCC
- 35 -
Thread Partitioning
Merging Invariants
• No cycles
• No loop-carried dependence inside a doall node
[Figure: DAG_SCC nodes weighted 20, 10, 15, 5, 100, 5, 3.]
- 36 -
Thread Partitioning
[Figure: partially merged graph with nodes weighted 20, 10, 15, and 113, annotated "Treated as sequential".]
- 37 -
Thread Partitioning
[Figure: final partition with two nodes weighted 45 and 113.]
Modified MTCG [Ottoni, MICRO’05] to generate code from partition
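A deliberately simple merging rule that respects both invariants is sketched below. It assumes the DAG_SCC nodes form a chain already in topological order (so fusing neighbours can never create a cycle) and fuses consecutive nodes only when they are of the same kind, so a doall stage never absorbs a loop-carried dependence. The real heuristic is not given on the slides, and the doall labels in the usage example are an assumption chosen so the totals match the 45 and 113 shown; `scc_node_t` and `merge_chain` are hypothetical.

```c
typedef struct { int weight; int doall; } scc_node_t;

/* Greedily fuses a topologically ordered chain of DAG_SCC nodes into
 * stages: consecutive nodes of the same kind (doall or sequential) are
 * merged. Returns the number of stages written to out[]. */
static int merge_chain(const scc_node_t in[], int n, scc_node_t out[]) {
    int m = 0;
    for (int i = 0; i < n; i++) {
        if (m > 0 && out[m - 1].doall == in[i].doall)
            out[m - 1].weight += in[i].weight;  /* same kind: fuse */
        else
            out[m++] = in[i];                   /* start a new stage */
    }
    return m;
}
```

Feeding it the chain 20, 10, 15 (sequential) followed by 5, 100, 5, 3 (doall) yields one sequential stage of weight 45 and one doall stage of weight 113. A production partitioner would also weigh stage balance and communication, which this rule ignores.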
- 38 -
Discussion Point 1 – Speculation
How do you decide which dependences to speculate?
» Look solely at profile data?
» What about code structure?
How do you manage speculation in a pipeline?
» Traditional definition of a transaction is broken
» Transaction execution spread out across multiple cores
- 39 -
Discussion Point 2 – Pipeline Structure
When is a pipeline a good/bad choice for parallelization?
Is pipelining good or bad for cache performance?
» Is DOALL better/worse for cache?
Can a pipeline be adjusted when the number of available cores increases/decreases?