Processor Architectures and Program Mapping (5kk10)
H. Corporaal, J. van Meerbergen, and B. Mesman
-
Processor types (flexibility versus efficiency):
  programmable CPU
  programmable DSP
  application specific instruction set processor (ASIP)
  application specific processor
-
[Figure: flexibility (low, medium, high) versus efficiency (low, medium, high) for ASIC, GP proc, FPGA, DSP, and ASIP]
-
Programmable CPU cores
  introduction
  architecture of the MIPS core, discussed as an example
  pipelining
  application examples
  software issues
  comparison between different CPU cores
  towards application specific architectures
  discussion
-
Introduction
Rationale: as high a multiplex factor R as possible.
Consequence: often a manual, handcrafted design optimised for clock rate.
Problem: fast changes in the IC process technology.
Embedded examples:
  MIPS (the first one; licenses the instruction set architecture)
  ARM (Advanced RISC Machines; telecom, low power, small code size; the most popular one; also licenses the micro-architecture as hard or soft IP)
  Sparc
Derivatives from general purpose CPUs: Intel, NEC, Hitachi, National, PowerPC
-
Introduction: instruction set architectures
  implicit operands
  explicit operands
-
Introduction: code for C = A + B in four classes of instruction set

stack     accum      Reg-mem        Reg-reg
Push A    Load A     Load R1,A      Load R1,A
Push B    Add B      Add R1,B       Load R2,B
Add       Store C    Store C,R1     Add R3,R1,R2
Pop C                               Store C,R3
-
Architecture of the MIPS core [Hennessy&Patterson]
-
MIPS instruction formats (32 bits) [Hennessy&Patterson]
  op          operation of the instruction
  rs, rt, rd  source and destination registers
  shamt       shift amount
  funct       operation of the instruction, part 2
  imm         for program constants
  addr        target address of a jump
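The field list above can be made concrete with a small sketch for the R-type format, using the standard MIPS32 bit positions (op 31..26, rs 25..21, rt 20..16, rd 15..11, shamt 10..6, funct 5..0). The names `rtype_fields`, `encode_rtype` and `decode_rtype` are hypothetical helpers chosen for illustration and are not part of the course material.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 32-bit MIPS R-type instruction word. */
typedef struct {
    uint8_t op;     /* bits 31..26: operation of the instruction */
    uint8_t rs;     /* bits 25..21: first source register */
    uint8_t rt;     /* bits 20..16: second source register */
    uint8_t rd;     /* bits 15..11: destination register */
    uint8_t shamt;  /* bits 10..6 : shift amount */
    uint8_t funct;  /* bits  5..0 : operation of the instruction, part 2 */
} rtype_fields;

/* Hypothetical helper: pack the six fields into one instruction word. */
uint32_t encode_rtype(unsigned op, unsigned rs, unsigned rt,
                      unsigned rd, unsigned shamt, unsigned funct)
{
    /* Every field must fit in its slot: 6, 5, 5, 5, 5 and 6 bits. */
    assert(op < 64 && rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return ((uint32_t)op << 26) | ((uint32_t)rs << 21) | ((uint32_t)rt << 16) |
           ((uint32_t)rd << 11) | ((uint32_t)shamt << 6) | (uint32_t)funct;
}

/* Hypothetical helper: split an instruction word back into its fields. */
rtype_fields decode_rtype(uint32_t word)
{
    rtype_fields f;
    f.op    = (uint8_t)((word >> 26) & 0x3Fu);
    f.rs    = (uint8_t)((word >> 21) & 0x1Fu);
    f.rt    = (uint8_t)((word >> 16) & 0x1Fu);
    f.rd    = (uint8_t)((word >> 11) & 0x1Fu);
    f.shamt = (uint8_t)((word >> 6) & 0x1Fu);
    f.funct = (uint8_t)(word & 0x3Fu);
    return f;
}
```

For example, `add $8, $9, $10` has op = 0, rs = 9, rt = 10, rd = 8, shamt = 0 and funct = 0x20, which packs to the word 0x012A4020.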
-
Example 1: R-type: add instruction [Hennessy&Patterson]
-
Critical path R-type operation [Hennessy&Patterson]
[Figure: datapath with PC, instruction memory (instruction address, instruction), register file of 32 32-bit registers (Rw, Ra, Rb; Rd, Rs, Rt), 16-bit Imm, and data memory (data address, data in, data out), all clocked by Clk]
-
Critical path R-type operation
[Timing diagram: Clock; clock-to-Q of the PC; instruction memory access time (rs, rt, rd, op, funct); RFile access time (Bus A, B); ALU delay (Bus W); set-up + skew; write into RFile]
-
Example 2: I-type: load word [Hennessy&Patterson]
lw rs, rt, imm16
  mem[PC]
  addr = R[rs] + ext[imm16]
  R[rt] = mem[addr]
  PC = PC + 4
-
Critical path load operation
[Timing diagram: Clock; clock-to-Q of the PC; instruction memory access time (rs, rt, rd, op, funct); RFile access time (Bus A, B); ALU delay (address); mem access time (Bus W); set-up + skew]
-
Example 3: I-type: branch [Hennessy&Patterson]
beq rs, rt, imm16
  mem[PC]
  cond = R[rs] - R[rt]
  if cond = 0
    PC = PC + 4 + ext(imm16)*4
  else
    PC = PC + 4
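The register-transfer descriptions of lw and beq can be checked with a small sketch. It assumes that ext() means sign extension of the 16-bit immediate; the function names `sign_extend16`, `lw_address` and `beq_next_pc` are hypothetical and only model the address arithmetic, not the memory accesses.

```c
#include <assert.h>
#include <stdint.h>

/* ext(imm16): sign-extend a 16-bit immediate to 32 bits. */
int32_t sign_extend16(uint16_t imm16)
{
    return imm16 >= 0x8000u ? (int32_t)imm16 - 0x10000 : (int32_t)imm16;
}

/* lw: addr = R[rs] + ext(imm16). Arithmetic wraps modulo 2^32. */
uint32_t lw_address(uint32_t rs_value, uint16_t imm16)
{
    return rs_value + (uint32_t)sign_extend16(imm16);
}

/* beq: the next PC is PC + 4 + ext(imm16)*4 when R[rs] == R[rt],
   and PC + 4 otherwise. */
uint32_t beq_next_pc(uint32_t pc, uint32_t rs_value, uint32_t rt_value,
                     uint16_t imm16)
{
    assert(pc % 4 == 0);                 /* instructions are word aligned */
    uint32_t cond = rs_value - rt_value; /* cond = R[rs] - R[rt] */
    if (cond == 0)
        return pc + 4 + (uint32_t)(sign_extend16(imm16) * 4);
    return pc + 4;
}
```

With pc = 0x100 and equal register values, an immediate of 3 gives 0x110, while an immediate of 0xFFFF (that is, -1) branches back to 0x100 itself.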
-
Example 3: I-type: branch [Hennessy&Patterson]
[Figure: branch datapath with register file of 32 32-bit registers (Rw, Ra, Rb; Rs, Rt, Rd; RegDst, RegWr), BusA, BusB and Bus W, Extender for Imm16 (ExtOp, ALUSrc), ALU (ALUctr) with Zero output, and next address logic (Branch, Imm16) driving the PC to the instruction memory]
-
Example 3: I-type: branch [Hennessy&Patterson]
[Figure: next address logic with PC, sign extension of Imm16, two adders on 30-bit addresses, a multiplexer controlled by Branch and Zero, and the instruction memory address formed by appending "00"]
-
Example 3: I-type: branch [Hennessy&Patterson]
[Figure: alternative next address logic with PC, sign extension of Imm16, a 30-bit adder with carry input c_in, a multiplexer controlled by Branch and Zero, and the instruction memory address formed by appending "00"]
-
Architecture of the MIPS core
Problem: long critical path, defined by the slowest instruction (load).
Solution: pipelining
  break the instruction into smaller steps
  all steps have about the same critical path
-
Pipelining lw instructions [Hennessy&Patterson]

      cycle 1  cycle 2  cycle 3  cycle 4  cycle 5   cycle 6   cycle 7
lw    Ifetch   RF read  ALU      dmem     RF write
lw             Ifetch   RF read  ALU      dmem      RF write
lw                      Ifetch   RF read  ALU       dmem      RF write

One instruction enters the pipeline every clock cycle.
One instruction leaves the pipeline every clock cycle.
=> CPI = 1 (Cycles per Instruction)
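A short sketch of the arithmetic behind CPI = 1, assuming an ideal pipeline without stalls: the first instruction needs as many cycles as there are stages, and every further instruction completes one cycle later. The function names `pipeline_cycles` and `pipeline_cpi` are hypothetical.

```c
#include <assert.h>

/* Cycles needed to run n instructions through an ideal pipeline
   with the given number of stages and no stalls. */
unsigned long pipeline_cycles(unsigned long n_instructions, unsigned stages)
{
    assert(stages > 0);
    if (n_instructions == 0)
        return 0;
    return (unsigned long)stages + n_instructions - 1;
}

/* Average cycles per instruction; approaches 1 for long instruction streams. */
double pipeline_cpi(unsigned long n_instructions, unsigned stages)
{
    assert(n_instructions > 0);
    return (double)pipeline_cycles(n_instructions, stages) / (double)n_instructions;
}
```

Three lw instructions in the five-stage pipeline take 5 + 3 - 1 = 7 cycles, matching the seven cycles in the diagram; the CPI only approaches 1 once the fill time of the pipeline is amortised over many instructions.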
-
Pipelining lw instructions
[Figure: pipeline stages I, R, A, M, W; instructions and data plotted against the current CPU cycle]
-
4 stages of R-type instruction, e.g. ADD [Hennessy&Patterson]

cycle 1  cycle 2  cycle 3  cycle 4
Ifetch   RF read  ALU      RF write
-
Pipelining lw and R-type instructions [Hennessy&Patterson]

       cycle 1  cycle 2  cycle 3  cycle 4  cycle 5
lw     Ifetch   RF read  ALU      dmem     RF write
add             Ifetch   RF read  ALU      RF write

Both instructions need to write the register file in cycle 5.
-
Solution: stretch R-type to 5 stages [Hennessy&Patterson]

Ifetch   RF read  ALU      dmem              RF write
                           Dummy op (noop)
-
[Figure: pipelined datapath with stages Ifetch, Reg/dec, exec, mem, wr: Next PC (+4), Progmem, Rfile (Ra, Rb, Rw, Di; Rs, Rt, Rd; RegDst, RegWr), ext. of Imm16 (ExtOp, ALUSrc), ALU (ALUop, flags, branch), Datamem (adr, Din, Dout; MemWr, MemtoReg), BusA, BusB] [Hennessy&Patterson]
-
Data dependencies: R-type instructions [Hennessy&Patterson]
R1 = ...
... = R1 + ...
... = R1 + ...
... = R1 + ...
... = R1 + ...
-
Data dependencies: R-type instructions [Hennessy&Patterson]
R1 = ...
... = R1 + ...
... = R1 + ...
... = R1 + ...
... = R1 + ...
Solution: bypasses
-
Bypasses [Hennessy&Patterson]
[Figure: pipelined datapath with bypass paths; labels: Datamem, adr]
-
Data dependencies: load instruction [Hennessy&Patterson]
R1 = lw ...
... = R1 + ...
... = R1 + ...
... = R1 + ...
-
Data dependencies: load instruction [Hennessy&Patterson]
R1 = lw ...
... = R1 + ...
... = R1 - ...
... = R1 - ...
Bypass is no solution for the + instruction
-
Data dependencies: load instruction [Hennessy&Patterson]
R1 = lw ...
... = R1 + ...
... = R1 - ...
... = R1 - ...
[Figure: pipeline diagram with IM, RF, DM, RF stages for the dependent instructions]
Solution: pipeline interlock = detects a data hazard and stalls the pipeline until the hazard is cleared
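A minimal sketch of the check a pipeline interlock has to make for the load case. It uses the textbook load-use condition: stall when the instruction in the execute stage is a load whose destination register is a source of the instruction being decoded. The struct and function names are hypothetical, and real interlock logic covers more cases than this one.

```c
#include <assert.h>
#include <stdbool.h>

/* What the interlock needs to know about the instruction in the execute stage. */
typedef struct {
    bool is_load;   /* true for lw */
    unsigned dest;  /* register the instruction will write */
} ex_stage_info;

/* Source registers of the instruction in the decode (RF read) stage. */
typedef struct {
    unsigned src1;
    unsigned src2;
} decode_stage_info;

/* Returns true when the decode stage must be stalled for one cycle
   because it needs a value that the preceding load has not fetched yet. */
bool must_stall(ex_stage_info ex, decode_stage_info id)
{
    assert(ex.dest < 32 && id.src1 < 32 && id.src2 < 32);
    /* Register 0 is hard-wired to zero in MIPS, so it never causes a hazard. */
    if (!ex.is_load || ex.dest == 0)
        return false;
    return ex.dest == id.src1 || ex.dest == id.src2;
}
```

A load into r10 followed directly by an add that reads r9 and r10 triggers the stall; the same add after an ALU instruction does not, because a bypass can deliver the result in time.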
-
Instructions:
i1) lw r10, r2, r0
i2) add r8, r9, r10
[Figure: pipeline stages I, R (interlocked), A, M, W for i1 and i2, with a marker where the data is available from the data cache]
-
Instructions:
i1) MULT r3, r2, r1
i2) ADD r5, r4, r3
[Figure: pipeline stages I, R (interlocked), A, M, W for i1 and i2]
-
Control hazards [Hennessy&Patterson]
[Figure: pipelined datapath with Next PC (+4), Progmem, Rfile (Ra, Rb, Rw, Di; Rs, Rt, Rd), ext. of Imm16, ALU flags and branch signal, Datamem (adr, Din, Dout), BusA, BusB]
-
Control hazards [Hennessy&Patterson]
[Figure: the same pipelined datapath, highlighting the comparison with 0 that decides the branch]
-
Control hazards
i1) beq r10, r2, 1b
i2) nop / independent instruction
i3) add r8, r9, r10
[Figure: pipeline timing of i1, i2, i3, with a marker where the address is available for instruction fetch]
Solution: compiler action, possibly filling the branch delay slot
-
PR3930 CPU

caches             I$ 8K, 2-way; D$ 4K, 4-way
process            0.35 µm, 5M
voltage            2.7-3.6 V
frequency          81/100 MHz (Tj = 125/90 °C; 2.7 V, wcp)
area               20 mm2
power dissipation  4 mW/MHz
-
TCP chip: TV controller
PR3930 (with I$ and D$) + peripherals
  Gfx, SDRAM controller, serial interconnect bus, I2C, UART, timers
  PI bus architecture
  80 mm2
  352 pins
  0.35 micron process
  48 MHz (96 for gfx)
-
-
Application examples (1)
#define NTAPS 4

/* FIR filter: shifts the new sample into the delay line and returns
   the sum of state[i] * coeff[i] for i = 0..NTAPS. */
int fir(int in)
{
    int i;
    /* NTAPS+1 entries, because the code below indexes 0..NTAPS. */
    static int state[NTAPS + 1];
    static int coeff[NTAPS + 1];
    int out[NTAPS + 1];

    state[NTAPS] = in;                  /* newest sample goes in last */
    out[0] = state[0] * coeff[0];
    for (i = 1; i < NTAPS + 1; i++) {
        out[i] = out[i-1] + state[i] * coeff[i];
        state[i-1] = state[i];          /* shift the delay line */
    }
    return out[NTAPS];
}
-
Application examples (1)
19 instructions per tap!!

.L1000006:
  sll    $3, $2, 2          R3=R2<<2                    R3=i-1
  addu   $14, $15, $3       R14=R15+R3
  lw     $24, 0($14)        R24=load(*R14)              R24=coeff[i-1]
  addiu  $12, $6, -4        R12=R6-4
  addu   $11, $12, $3       R11=R12+R3
  lw     $13, 0($11)        R13=load(*R11)              R13=state[i-1]
  nop
  mult   $24, $13           HI,LO=R24*R13
  addu   $25, $sp, $3       R25=sp+R3
  lw     $9, -4($25)        R9=load(R25-4)              R9=out[i-1]
  addiu  $2, $2, 1          R2=R2+1                     i=i+1
  mflo   $13                R13=move from low mpy reg
  addu   $10, $9, $13       R10=R9+R13                  R10=out[i]
  sw     $10, 0($25)        mem(*R25)=R10
  addu   $25, $7, $3        R25=R7+R3
  sw     $24, 0($25)        mem(*R25)=R24
  slti   $24, $2, 10
  bne    $24, $0, .L1000006
  addiu  $15, $7, -4
-
Application examples (2)
Bit level operations: finite field arithmetic
temp1 = input
10 instructions!!
Very simple in hardware
-
Application examples (2)
Bit level operations: DES example
-
Application examples (2)
Bit level operations: A5 example (GSM encryption)
-
Application examples (3)
Video conferencing, H.263
CIF format = 352 * 288 px, 2:1:1, 8 bits/sample
QCIF = 1/4 CIF
SQCIF = 96 * 128
Process = 0.25 micron
Power consumption = 100 mW @ 10 Hz
Raw data rate: 96 * 128 * 1.5 * 10 Hz = 180 KB/s
At 20 Kb/s this requires a compression factor of 72
Compare: 852 * 576 * 2 B/p * 50 = 49 MB/s
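The data-rate figures on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes that 1 K means 1024 and reads the "72" as the compression factor needed to fit the raw SQCIF stream into a 20 Kb/s channel; both function names are hypothetical.

```c
#include <assert.h>

/* Raw (uncompressed) video data rate in bytes per second. */
double raw_video_bytes_per_s(int width, int height,
                             double bytes_per_pixel, double frames_per_s)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0.0 && frames_per_s > 0.0);
    return (double)width * (double)height * bytes_per_pixel * frames_per_s;
}

/* Compression factor needed to send a raw stream over a given channel. */
double compression_factor(double raw_bytes_per_s, double channel_bits_per_s)
{
    assert(channel_bits_per_s > 0.0);
    return raw_bytes_per_s * 8.0 / channel_bits_per_s;
}
```

SQCIF at 1.5 bytes per pixel and 10 Hz gives 184320 B/s, which is 180 KB/s, and dividing that by a 20 Kb/s channel gives the factor of 72. The comparison case, 852 * 576 pixels at 2 bytes per pixel and 50 Hz, gives 49075200 B/s, roughly 49 MB/s.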
-
Application examples (3)
H.263 video encoder
-
Application examples (3)
[Figure: PR3940 with I$, D$, and memory]
10 Hz => 140 MHz CPU

indicator    value
Code size    249 kB
Data size    189 kB
I-frame      8.8 Mcc
P-frame      13.8 Mcc
Motion Est.  2.1 Mcc
Bus load     18 %
I$ misses    0.8 %
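One plausible reading of "10 Hz => 140 MHz" is that the clock rate follows from the cycle count of the most expensive frame type times the frame rate. This is an assumption; the slide does not show the derivation. Taking Mcc as mega clock cycles per frame, a sketch with the hypothetical helper `required_clock_mhz`:

```c
#include <assert.h>

/* Clock rate in MHz needed to process frames that cost the given number
   of mega clock cycles each, at the given frame rate in Hz. */
double required_clock_mhz(double mega_cycles_per_frame, double frames_per_s)
{
    assert(mega_cycles_per_frame >= 0.0 && frames_per_s >= 0.0);
    return mega_cycles_per_frame * frames_per_s;
}
```

With the P-frame cost of 13.8 Mcc at 10 Hz this gives about 138 MHz, close to the 140 MHz on the slide; the I-frame cost of 8.8 Mcc would need about 88 MHz.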
-
Application examples (3)
In which process can the H.263 video encoder be executed on a single MIPS processor?
Conclude: power consumption is the limiting factor!!

year                            95    97    99    01    03    06    09    12
gate length (nm)                350   250   180   150   130   100   70    50
VDD (V)                         2.7   2.5   1.8   1.5   1.5   1.2   0.9   0.6
s                               1     0.71  0.51  0.43  0.37  0.29  0.20  0.14
p                               1     0.93  0.67  0.56  0.56  0.44  0.33  0.22
area, scales as s^2             20    10.2  5.3   3.67  2.76  1.63  0.8   0.41
max. clock freq (MHz), p/s^2    81    147   204   245   326   441   675   882
energy/ins (nJ), s*p^2          4     2.45  0.91  0.53  0.46  0.23  0.09  0.03
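The scaling rules in the table can be written out as a small sketch. It assumes, consistent with the first two columns, that s is the gate length relative to 350 nm and p is VDD relative to 2.7 V, and it takes the 1995 column (20 mm2 as on the PR3930 slide, 81 MHz, 4 nJ per instruction) as the reference point. The names `scaled_core` and `scale_core` are hypothetical.

```c
#include <assert.h>

/* Estimated figures for the same core moved to another process. */
typedef struct {
    double area_mm2;           /* scales with s^2 */
    double max_clock_mhz;      /* scales with p / s^2 */
    double energy_per_ins_nj;  /* scales with s * p^2 */
} scaled_core;

scaled_core scale_core(double gate_length_nm, double vdd)
{
    /* Reference point: the 1995 column of the table. */
    const double ref_gate_nm = 350.0, ref_vdd = 2.7;
    const double ref_area = 20.0, ref_clock = 81.0, ref_energy = 4.0;

    assert(gate_length_nm > 0.0 && vdd > 0.0);
    double s = gate_length_nm / ref_gate_nm;  /* geometry scaling factor */
    double p = vdd / ref_vdd;                 /* supply scaling factor */

    scaled_core c;
    c.area_mm2 = ref_area * s * s;
    c.max_clock_mhz = ref_clock * p / (s * s);
    c.energy_per_ins_nj = ref_energy * s * p * p;
    return c;
}
```

For 250 nm at 2.5 V this yields roughly 10.2 mm2, 147 MHz and 2.45 nJ per instruction, which matches the 1997 column; later columns agree up to the rounding of s and p in the table.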
-
Application examples: conclusions
CPUs offer flexibility, but are
  not efficient in performance
  not efficient in code size
  not efficient in power consumption
-
Source code:
func() {
  a = x.value & 0x3;
  if (a != 0) {
    b = a * c + d;
  } else {
    b = ;
  }
  y.post(b);
}

Parser: splits the code into basic blocks BB1 to BB4
  a = x.value & 0x3;
  b = a * c + d;       (taken when a != 0)
  b = ;                (taken when a == 0)
  y.post(b);

Compile each BB to instructions:
  ldi #0x3, R5
  and R4,R5,R6
  cmp R0,R6,R7
  br R7,true
  ba false

Arch. model: ldi = 2 cycles, nop = 1 cycle, ...

Generate new C with delay counts:
func() {
  a = x.value & 0x3;
  DelayCycles(7);
  if (a != 0) {
    b = a * c + d;
    DelayCycles(8);
  } else {
    b = ;
    DelayCycles(5);
  }
  y.post(b);
  DelayCycles(4);
}

Compile and run.
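A runnable sketch of the annotated code, to show how the delay counts add up along each path. Several things here are assumptions made only for illustration: `DelayCycles` simply adds to a global counter, the inputs and the posted value are passed as plain integers instead of the slide's `x.value` and `y.post`, and the else branch assigns 0 because the slide leaves that value out.

```c
#include <assert.h>

/* Cycle counter advanced by the annotations (assumed implementation). */
static unsigned long total_cycles = 0;

void DelayCycles(unsigned n)
{
    total_cycles += n;
}

/* Annotated version of func(); returns the value that would be posted. */
int func_annotated(int x_value, int c, int d)
{
    int a, b;
    a = x_value & 0x3;
    assert(a >= 0 && a <= 3);  /* the mask keeps only the two low bits */
    DelayCycles(7);
    if (a != 0) {
        b = a * c + d;
        DelayCycles(8);
    } else {
        b = 0;                 /* assumption: value elided on the slide */
        DelayCycles(5);
    }
    DelayCycles(4);            /* cost of y.post(b) */
    return b;
}

/* Runs func_annotated once and reports the cycles it accumulated. */
unsigned long cycles_for(int x_value, int c, int d)
{
    total_cycles = 0;
    (void)func_annotated(x_value, c, d);
    return total_cycles;
}
```

With these delay counts the path through the multiply costs 7 + 8 + 4 = 19 cycles and the other path 7 + 5 + 4 = 16 cycles, so the simulated time depends on the data, without running an instruction set simulator.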
-
Comparison between different CPU cores

processor     size (mm2)  process (micron)  inter-connect  clock (MHz)  Spec Int  Spec FP  power (W)
microsparc    225         0.8               3              50           23        18       4
I486DX2       82          0.8               3              66           32        16       7
Power PC 601  121         0.6               4              66           60        80       7
Pentium       294         0.8               3              66           85        63       16
R4200         76          0.6               3              80           55        30       1.5
R4000SC       184         0.8               2              100          62        63       12
R4400SC       184         0.6               2              150          88        97       15
Alpha         238         0.75              3              200          130       184      30
-
Comparison between different CPU cores
Source: http://bwrc.eecs.berkeley.edu/cic
Charts (technology in micron on the horizontal axis): Specint92 vs technology, MOPS/W vs technology, Specint92/mm^2 vs technology, Specint92/W vs technology

processor        clock (MHz)  Specint-92  tech (micron)  Specint per sq. mm  Specint per W
Intel i386SX     33           6.2         1              0.144               3.1
Intel i486DX     50           33.4        0.8            0.264               8.35
Intel P5         66           78          0.8            0.169               4.875
Motorola 68040   25           21          0.8            0.082               3.5
Sparc Micro      50           26.4        0.8            0.075               6.6
Alpha 21064      200          138         0.75           0.332               4.6
PowerPC 601      50           40          0.6            0.119               6.15
Sparc Super      60           89          0.6            0.125               6.27
PowerPC 604      100          128         0.5            0.162               9.14
Sparc Ultra      167          269         0.47           0.189               8.97
Intel P6         166          293         0.35           0.184               12.52
PowerPC 604e     200          -           0.35           -                   -
Sparc turbo      170          143         0.35           0.133               15.89
Alpha 21164a     400          500         0.35           0.293               15.15

Further rows without Specint-92 figures: PowerPC 604e (300 MHz), Alpha 21364, PowerPC 7400, TM PNX8500, ARM 610, 710, 810, SA-110, SA-1100, 940T
-
Comparison between different CPU cores
[Chart: Specint92 vs technology (technology in micron)]
-
Comparison between different CPU cores
[Chart: Specint92/mm^2 vs technology (technology in micron)]
-
Comparison between different CPU cores
[Chart: Specint92/W vs technology (technology in micron)]
-
Power consumption in microprocessors
Power consumption is (becoming) the limiting factor in processor design.
Solution in the direction of:
  hardware acceleration
  instruction level parallelism instead of clock speed
  code size efficiency
Source: ISSCC2001, Patrick Gelsinger, Intel
-
Towards application specific architectures
ConCISe [Bernardo Kastrup]
-
Towards application specific architectures
Example equation for one output bit (12) is shown!
-
Towards application specific architectures
-
Towards application specific architectures
[Figure: ConCISe integrated tool-set. Source code and profile data feed the core compiler and the hardware/software partitioning; the hardware partition goes through the translator to an HDL file and through the hardware compiler to a hardware netlist (does it fit? Y/N); the assembly code becomes modified assembly with ASIs, which the assembler/linker turns into an executable for the simulator]
-
Towards application specific architectures
ConCISe [Bernardo Kastrup]
Advantages: faster execution, smaller code size, lower power
The Configurable Functional Unit (CFU) can be:
  standard cell
  Field-Programmable Logic (FPL)
    considerably bigger in silicon (4 to 5 mm2 in C075)
    but it is reconfigurable = reprogrammable for different application programs
-
Some benchmarks
-
Amdahl's law
The impact of an improvement on the execution time of a program depends on 2 parameters:
  f = fraction of the original computation time that is affected by the improvement
  s = speedup factor (local)
exec_time_new = exec_time_old * (1 - f) + exec_time_old * f / s
speedup_overall = exec_time_old / exec_time_new = 1 / (1 - f + f / s)
if s >> 1 then speedup_overall approaches 1 / (1 - f)
Example: 40 % of the program can be executed 10 x faster
speedup_overall = 1 / (0.6 + 0.4 / 10) = 1.56
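The formula translates directly into code, which makes it easy to try other values of f and s. The function name `amdahl_speedup` is a hypothetical helper for illustration.

```c
#include <assert.h>

/* Overall speedup when a fraction f of the original execution time
   is accelerated by a local speedup factor s. */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

For f = 0.4 and s = 10 this evaluates to 1 / 0.64 = 1.5625. Even with an arbitrarily large s the result stays below 1 / (1 - 0.4), about 1.67, which is why accelerating only part of a program gives limited gains.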
-
Towards application specific architectures
www.tensilica.com
-
Conclusions
Programmable CPU cores are important for the control parts of the application.
They are well supported with tools for the development of end-user software (vs. deeply embedded software).
"Keep it simple" heuristic (RISC vs. CISC):
  Make frequent cases fast and rare cases correct.
  Regular (orthogonal) instruction set.
  No special features that match a high-level language construct.
  At least 16 registers to ease register allocation.
Embedded cores are often "light" cores, which are a compromise between performance, area and power dissipation (vs. stand-alone CPU cores, which are optimised for performance).
-
Hands-on
Implement a FIR filter in assembly and simulate
-
Hands-on
SPIM (MIPS assembly simulator): link from the PAM website
Use appendix A (same site)
Example assembly file on the PAM website
1 or 2 page report in 2 weeks:
  Engineering decisions (e.g. addressing of samples)
  Verify that C code and assembly match
  Assembly in appendix
  # instructions/tap? Conclusions?