CMPUT429/CMPE382 Amaral 1/17/01
CMPUT429/CMPE382 Winter 2001
Topic 9: Software Pipelining
(Some slides from David A. Patterson’s CS252, Spring 2001 Lecture Slides)
Another possibility: Software Pipelining
• Observation: if the iterations of a loop are independent, we can get more ILP by scheduling instructions from different iterations together
• Software pipelining: reorganizes a loop so that each iteration of the new loop is made from instructions chosen from different iterations of the original loop
[Figure: iterations 0 to 4 of the original loop, with one software-pipelined iteration marked across them]
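The definition above can be made concrete with a small sketch. It assumes a three-stage loop body (load, add, store) and that pass t of the reorganized loop runs stage k on behalf of original iteration t - k; the function name and the stage names are illustrative, not taken from the slides.

```python
def software_pipeline_schedule(stages, n_iters):
    """Return one row per pass; each row lists (stage, original_iteration) pairs.

    Assumption: stage k of original iteration i is executed in pass i + k.
    """
    depth = len(stages)
    rows = []
    for t in range(n_iters + depth - 1):
        row = []
        # List later stages first, as in a kernel that stores before it loads.
        for k in reversed(range(depth)):
            it = t - k
            if 0 <= it < n_iters:
                row.append((stages[k], it))
        rows.append(row)
    return rows


for t, row in enumerate(software_pipeline_schedule(["load", "add", "store"], 5)):
    print(t, " ".join(f"{op}[{it}]" for op, it in row))
```

With five original iterations the sketch yields seven passes. The first two and the last two are only partly filled (start-up and wind-down), and the three in between each combine a store, an add and a load from three different original iterations.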
Software Pipelining Example

Before: Unrolled 3 times
      L.D     F0,0(R1)
      ADD.D   F4,F0,F2
      S.D     0(R1),F4
      L.D     F6,-8(R1)
      ADD.D   F8,F6,F2
      S.D     -8(R1),F8
      L.D     F10,-16(R1)
      ADD.D   F12,F10,F2
      S.D     -16(R1),F12
      DSUBUI  R1,R1,#24
      BNEZ    R1,LOOP

After: Software Pipelined
      S.D     0(R1),F4     ; Stores M[i]
      ADD.D   F4,F0,F2     ; Adds to M[i-1]
      L.D     F0,-16(R1)   ; Loads M[i-2]
      DSUBUI  R1,R1,#8
      BNEZ    R1,LOOP

• Symbolic loop unrolling
  – Maximize result-use distance
  – Less code space than unrolling
  – Fill & drain pipe only once per loop vs. once per each unrolled iteration in loop unrolling

[Figure: overlapped ops vs. time, one plot for the SW pipeline and one for the unrolled loop]
5 cycles per iteration
Software Pipelining Example

Before: Unrolled 3 times (same 11-instruction loop as on the previous slide)

After: Software Pipelined
      L.D     F0,0(R1)
      ADD.D   F4,F0,F2
      L.D     F0,-8(R1)
      ------------------------------------
L:    S.D     0(R1),F4     ; Stores M[i]
      ADD.D   F4,F0,F2     ; Adds to M[i-1]
      L.D     F0,-16(R1)   ; Loads M[i-2]
      DSUBUI  R1,R1,#8
      BNEZ    R1,L
      ------------------------------------
      S.D     -8(R1),F4
      ADD.D   F4,F0,F2
      S.D     -16(R1),F4
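The three-part shape of the code above (start-up instructions, the loop L, and wind-down instructions, often called prologue, kernel and epilogue) can be checked with a small model. This is a sketch under assumptions: it walks the array upward by index rather than downward by address as the slide does, the variables f0 and f4 stand in for registers F0 and F4, and the function names are hypothetical.

```python
def add_scalar_reference(x, s):
    """Plain loop: add s to every element."""
    return [v + s for v in x]


def add_scalar_pipelined(x, s):
    """Same computation, arranged as prologue / kernel / epilogue."""
    n = len(x)
    if n < 3:
        # Too short to fill the three-stage pipeline; fall back to the plain loop.
        return add_scalar_reference(x, s)
    m = list(x)
    # Prologue: start the first two iterations.
    f0 = m[0]              # load element 0
    f4 = f0 + s            # add for element 0
    f0 = m[1]              # load element 1
    # Kernel: each pass touches three different original iterations.
    for i in range(n - 2):
        m[i] = f4          # store result for element i
        f4 = f0 + s        # add for element i + 1
        f0 = m[i + 2]      # load element i + 2
    # Epilogue: drain the two iterations still in flight.
    m[n - 2] = f4
    f4 = f0 + s
    m[n - 1] = f4
    return m


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(add_scalar_pipelined(data, 10.0) == add_scalar_reference(data, 10.0))
```

The kernel overwrites f4 and f0 only after the values they held have been consumed, which is why two registers suffice here where the unrolled version uses six.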
Software Pipelining Example: execution of the start-up code
(code as on the previous slide; each animation frame shows the state after one instruction)

Registers: F0, F2 (holds s), F4; R1 points into the array
Memory: X[1000], X[999], X[998], X[997], ... (addresses shown: 0xFF00, 0xFEE8, 0xFEE0, 0xFED8, ...)

After L.D    F0,0(R1):     F0 = X[1000]
After ADD.D  F4,F0,F2:     F4 = T1 (= X[1000] + s)
After L.D    F0,-8(R1):    F0 = X[999]
Software Pipelining Example: first pass through the loop L

After S.D    0(R1),F4:     the memory slot of X[1000] now holds T1
After ADD.D  F4,F0,F2:     F4 = T2 (= X[999] + s)
After L.D    F0,-16(R1):   F0 = X[998]
After DSUBUI R1,R1,#8:     R1 moves down one element; F0 = X[998], F4 = T2
Software Pipelining Example: second pass through the loop L

After S.D    0(R1),F4:     the memory slot of X[999] now holds T2; F0 = X[998], F4 = T2
Software Pipelining Example in the IA-64

loop: (p16) ldl r32 = [r12], 1
      (p17) add r34 = 1, r33
      (p18) stl [r13] = r35, 1
            br.ctop loop

Initial state
  Predicate registers: p16 = 1, p17 = 0, p18 = 0
  LC = 4, EC = 3, RRB = 0
  Memory: x1 x2 x3 x4 x5
  General registers: labels 32 33 34 35 36 37 38 39, all empty
Software Pipelining Example in the IA-64: kernel pass 1
  (p16) ldl     executes: r32 = x1
  (p17) add     predicated off
  (p18) stl     predicated off
  br.ctop       LC 4 -> 3, EC = 3, RRB 0 -> -1; predicates become p16 = 1, p17 = 1, p18 = 0
  Registers:    labels rotate to 33 34 35 36 37 38 39 32, so x1 is now visible as r33
  Memory:       x1 x2 x3 x4 x5
Software Pipelining Example in the IA-64: kernel pass 2
  (p16) ldl     executes: r32 = x2
  (p17) add     executes: r34 = y1 (= x1 + 1)
  (p18) stl     predicated off
  br.ctop       LC 3 -> 2, EC = 3, RRB -1 -> -2; predicates become p16 = 1, p17 = 1, p18 = 1
  Registers:    labels rotate to 34 35 36 37 38 39 32 33; holding x1, x2, y1
  Memory:       x1 x2 x3 x4 x5
Software Pipelining Example in the IA-64: kernel pass 3
  (p16) ldl     executes: r32 = x3
  (p17) add     executes: r34 = y2
  (p18) stl     executes: stores y1
  br.ctop       LC 2 -> 1, EC = 3, RRB -2 -> -3; predicates stay p16 = 1, p17 = 1, p18 = 1
  Registers:    labels rotate to 35 36 37 38 39 32 33 34
  Memory:       x1 x2 x3 x4 x5; results so far: y1
Software Pipelining Example in the IA-64: kernel pass 4
  (p16) ldl     executes: r32 = x4
  (p17) add     executes: r34 = y3
  (p18) stl     executes: stores y2
  br.ctop       LC 1 -> 0, EC = 3, RRB -3 -> -4; predicates stay p16 = 1, p17 = 1, p18 = 1
  Registers:    labels rotate to 36 37 38 39 32 33 34 35
  Memory:       x1 x2 x3 x4 x5; results so far: y1 y2
Software Pipelining Example in the IA-64: kernel pass 5
  (p16) ldl     executes: r32 = x5
  (p17) add     executes: r34 = y4
  (p18) stl     executes: stores y3
  br.ctop       LC stays 0, EC 3 -> 2, RRB -4 -> -5; predicates become p16 = 0, p17 = 1, p18 = 1
  Registers:    labels rotate to 37 38 39 32 33 34 35 36
  Memory:       x1 x2 x3 x4 x5; results so far: y1 y2 y3
Software Pipelining Example in the IA-64: kernel pass 6
  (p16) ldl     predicated off
  (p17) add     executes: r34 = y5
  (p18) stl     executes: stores y4
  br.ctop       LC = 0, EC 2 -> 1, RRB -5 -> -6; predicates become p16 = 0, p17 = 0, p18 = 1
  Registers:    labels rotate to 38 39 32 33 34 35 36 37
  Memory:       x1 x2 x3 x4 x5; results so far: y1 y2 y3 y4
Software Pipelining Example in the IA-64: kernel pass 7
  (p16) ldl     predicated off
  (p17) add     predicated off
  (p18) stl     executes: stores y5
  br.ctop       LC = 0, EC 1 -> 0, RRB -6 -> -7; predicates become p16 = 0, p17 = 0, p18 = 0; the loop exits
  Memory:       x1 x2 x3 x4 x5; results: y1 y2 y3 y4 y5
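The animation can be replayed with a small model of rotating registers, stage predicates and the LC/EC counters. This is a simplified sketch, not a description of the full IA-64 semantics: it assumes eight rotating general registers (r32 to r39), only the three predicates p16 to p18, a non-empty input, and a br.ctop that rotates on every execution with EC >= 1. The function name run_ctop_loop and the list-based memory are hypothetical.

```python
def run_ctop_loop(xs, stages=3, nregs=8):
    """Model the slide's kernel, which computes y[i] = x[i] + 1.

    Returns (ys, passes, rrb, lc, ec, preds) after the loop exits.
    """
    if not xs:
        raise ValueError("this sketch assumes at least one element")
    phys = [None] * nregs               # physical rotating registers 32..39
    preds = [1] + [0] * (stages - 1)    # p16, p17, p18
    lc, ec, rrb = len(xs) - 1, stages, 0
    src = 0                             # stands in for the r12 load pointer
    ys = []                             # stands in for memory written via r13
    passes = 0

    def reg(logical):
        # Logical register number -> index of the physical register behind it.
        return (logical - 32 + rrb) % nregs

    while True:
        passes += 1
        if preds[0]:                    # (p16) ldl r32 = [r12], 1
            phys[reg(32)] = xs[src]
            src += 1
        if preds[1]:                    # (p17) add r34 = 1, r33
            phys[reg(34)] = 1 + phys[reg(33)]
        if preds[2]:                    # (p18) stl [r13] = r35, 1
            ys.append(phys[reg(35)])
        # br.ctop: update counters, then rotate predicates and registers.
        if lc > 0:
            lc -= 1
            new_p16, taken = 1, True
        elif ec > 1:
            ec -= 1
            new_p16, taken = 0, True
        else:                           # ec == 1: last pass, fall through
            ec -= 1
            new_p16, taken = 0, False
        preds = [new_p16] + preds[:-1]
        rrb -= 1
        if not taken:
            break
    return ys, passes, rrb, lc, ec, preds


print(run_ctop_loop([10, 20, 30, 40, 50]))
```

For five elements the model makes seven kernel passes (five iterations plus two to drain the pipeline) and finishes with LC = 0, EC = 0, RRB = -7 and all three predicates cleared, which matches the last frame of the animation.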