ΗΜΥ 307 (ECE 307): DIGITAL INTEGRATED CIRCUITS
Spring Semester 2018
LECTURES 14-15: Arithmetic and Logic Circuits
CHARIS THEOCHARIDES ([email protected]) (ack: Prof. Mary Jane Irwin and Vijay Narayanan)
[Adapted from “Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.”]
ΗΜΥ307 Lectures 14-15 Arithmetic and Logic Circuits .2 © Theocharides, ΗΜΥ (ECE), 2018
Review: Basic Building Blocks
• Datapath
  - Execution units
    - Adder, multiplier, divider, shifter, etc. – Today!
  - Register file and pipeline registers – MEMORY – See below!
  - Multiplexers, decoders, etc. – THIS lecture (and L.15)
• Control
  - Finite state machines (PLA, ROM – Lecture 17)
• Interconnect
  - Switches, arbiters, buses – Lecture 16
• Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers – Lecture 17
The 1-bit Binary Adder
[Figure: 1-bit full adder (FA) with inputs A, B, Cin and outputs S, Cout]
S = A ⊕ B ⊕ Cin
Cout = A&B | A&Cin | B&Cin (majority function)
• How can we use it to build a 64-bit adder?
• How can we modify it easily to build an adder/subtractor?
• How can we make it better (faster, lower power, smaller)?
A  B  Cin | Cout  S | carry status
0  0  0   |  0    0 | kill
0  0  1   |  0    1 | kill
0  1  0   |  0    1 | propagate
0  1  1   |  1    0 | propagate
1  0  0   |  0    1 | propagate
1  0  1   |  1    0 | propagate
1  1  0   |  1    0 | generate
1  1  1   |  1    1 | generate
G = A&B, P = A ⊕ B, K = !A & !B
S = P ⊕ Cin
Cout = G | P&Cin
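The identities on this slide can be checked mechanically. The minimal Python sketch below (the function names `full_adder` and `carry_status` are illustrative, not from the course material) computes S and Cout through G and P, compares them with the XOR and majority definitions for all eight input combinations, and prints the rows of the truth table.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple:
    """1-bit full adder on 0/1 ints, using the generate/propagate form."""
    g = a & b              # generate:  G = A&B
    p = a ^ b              # propagate: P = A xor B
    s = p ^ cin            # S = P xor Cin
    cout = g | (p & cin)   # Cout = G | P&Cin
    return s, cout


def carry_status(a: int, b: int) -> str:
    """Classify a bit position by what it does to an incoming carry."""
    if a & b:
        return "generate"
    if a ^ b:
        return "propagate"
    return "kill"          # K = !A & !B


# Compare against S = A xor B xor Cin and the majority function; print the table rows.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert s == a ^ b ^ cin
    assert cout == (a & b) | (a & cin) | (b & cin)
    print(a, b, cin, cout, s, carry_status(a, b))
```

The printed rows match the truth table row for row.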
FA Gate Level Implementations
[Figure: two gate-level FA implementations (inputs A, B, Cin; outputs S, Cout) with gate delays annotated t0, t1, t2]
• The way you learned to design in ECE 210 and ECE 211
Review: XOR FA
[Figure: XOR-based FA with inputs A, B, Cin and outputs S, Cout]
16 transistors
Review: CPL FA
[Figure: complementary pass-transistor logic (CPL) FA with dual-rail inputs A/!A, B/!B, Cin/!Cin and outputs S/!S, Cout/!Cout]
20+8 transistors, dual rail – beware of threshold drops
Review: Mirror Adder
[Figure: transistor-level mirror adder (24+4 transistors) producing !Cout and !S, with kill, generate, 0-propagate and 1-propagate paths marked and transistor widths annotated]
Cout = A&B | B&Cin | A&Cin
S = A&B&Cin | !Cout&(A | B | Cin)
Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since !Cout drives 2 internal gates and 2 inverter transistor gates (to form Cin for the next most significant bit's adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.
Mirror Adder Features
• The NMOS and PMOS chains are completely symmetrical, with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.
• When laying out the cell, the most critical issue is minimizing the capacitance at node !Cout (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.
• The transistors connected to Cin are placed closest to the output.
• Only the transistors in the carry stage need to be sized for speed; all transistors in the sum stage can be minimum size.
A 64-bit Adder/Subtractor
[Figure: 64 1-bit FAs chained from C0 = Cin to C64 = Cout, with inputs A0-A63 and B0-B63, sums S0-S63, and an add/subt control line]
• Ripple Carry Adder (RCA) built out of 64 FAs
• Subtraction – complement all subtrahend bits (XOR gates controlled by add/subt) and set the low-order carry-in C0 to 1
• RCA
  - advantage: simple logic, small (low cost)
  - disadvantage: slow (O(N) for N bits) and lots of glitching (hence high energy consumption)
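A bit-level model makes the adder/subtractor trick concrete. This sketch assumes unsigned n-bit operands and uses a hypothetical helper `ripple_add_sub`; it chains n full-adder evaluations the same way the 64 FAs are chained, XORing each B bit with the add/subt control and feeding that same control into C0.

```python
def ripple_add_sub(a: int, b: int, subtract: bool, n: int = 64) -> tuple:
    """Bit-level model of an n-bit ripple-carry adder/subtractor.

    Each B bit passes through an XOR gate controlled by add/subt, and add/subt
    also drives the low-order carry-in C0, so subtraction computes
    a + ~b + 1 = a - b (mod 2**n). Returns (sum as an int, carry-out Cn).
    """
    ctrl = 1 if subtract else 0
    carry = ctrl                          # C0 = Cin = add/subt
    result = 0
    for i in range(n):                    # one full adder per bit, LSB first
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctrl        # conditional complement of the subtrahend
        result |= (ai ^ bi ^ carry) << i  # S_i uses the incoming carry C_i
        carry = (ai & bi) | (ai & carry) | (bi & carry)  # C_{i+1}
    return result, carry


MASK64 = (1 << 64) - 1
print(ripple_add_sub(5, 7, subtract=False))                      # (12, 0)
print(ripple_add_sub(5, 7, subtract=True)[0] == (5 - 7) & MASK64)  # True
```

For unsigned operands, a final carry-out of 1 after a subtraction means no borrow occurred: 7 - 5 returns Cn = 1, while 5 - 7 returns Cn = 0.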
Ripple Carry Adder (RCA)
[Figure: 4-bit RCA: FAs chained from C0 = Cin to Cout = C4, with inputs A0-A3, B0-B3 and sums S0-S3]
T = O(N) worst-case delay
T_adder ≈ T_FA(A,B → Cout) + (N-2)·T_FA(Cin → Cout) + T_FA(Cin → S)
Real goal: make the fastest possible carry path
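To see where the worst-case formula comes from, the sketch below tracks when each carry settles under a deliberately simple delay model. The model and its default unit delays are assumptions for illustration, not measured gate delays, and `rca_settle_time` is a hypothetical name. With operands that generate a carry in bit 0 and propagate it through every higher bit, the settling time equals T_FA(A,B → Cout) + (N-2)·T_FA(Cin → Cout) + T_FA(Cin → S).

```python
def rca_settle_time(a: int, b: int, n: int,
                    t_ab_cout: float = 1.0, t_cin_cout: float = 1.0,
                    t_cin_s: float = 1.0) -> float:
    """Data-dependent settling time of an n-bit RCA under a toy delay model.

    Assumptions (illustrative only): A, B and Cin are stable at t = 0; a bit
    that kills or generates has its carry-out ready t_ab_cout after A/B; a bit
    that propagates has its carry-out ready t_cin_cout after its carry-in;
    every sum bit is ready t_cin_s after its carry-in arrives.
    """
    t_cin = 0.0            # arrival time of the carry into the current bit
    t_done = 0.0           # latest sum-bit settling time seen so far
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        t_done = max(t_done, t_cin + t_cin_s)
        if ai ^ bi:        # propagate: carry-out waits for carry-in
            t_cin = t_cin + t_cin_cout
        else:              # kill or generate: carry-out set by A and B alone
            t_cin = t_ab_cout
    return t_done


N = 16
worst = rca_settle_time(1, (1 << N) - 1, N)   # generate in bit 0, propagate in bits 1..N-1
print(worst, 1 + (N - 2) + 1)                  # 16.0 16
```

Most operand pairs settle much sooner (two all-zero operands settle after 2 units in this model), which is why the ripple adder's O(N) is a worst-case bound.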
Inversion Property
[Figure: an FA with inputs A, B, Cin is equivalent to an FA whose inputs and outputs are all inverted]
!Cout(A, B, Cin) = Cout(!A, !B, !Cin)
!S(A, B, Cin) = S(!A, !B, !Cin)
• Inverting all inputs to an FA results in inverted values for all outputs
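The property holds because the majority function is self-dual and a three-input XOR flips whenever all three of its inputs flip. A short exhaustive check in Python (the helper names `fa` and `inversion_property_holds` are illustrative):

```python
from itertools import product


def fa(a: int, b: int, c: int) -> tuple:
    """Return (S, Cout) of a 1-bit full adder."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def inversion_property_holds() -> bool:
    """Check !S(A,B,Cin) = S(!A,!B,!Cin) and !Cout(A,B,Cin) = Cout(!A,!B,!Cin)."""
    for a, b, c in product((0, 1), repeat=3):
        s, cout = fa(a, b, c)
        s_inv, cout_inv = fa(1 - a, 1 - b, 1 - c)
        if s_inv != 1 - s or cout_inv != 1 - cout:
            return False
    return True


print(inversion_property_holds())   # True
```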
Exploiting the Inversion Property
[Figure: 4-bit RCA from C0 = Cin to Cout = C4 built from FA’ cells, with inputs A0-A3, B0-B3 and sums S0-S3]
Now we need two “flavors” of FAs: a regular cell and an inverted cell.
• Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (the transistor sizing on the carry-chain portion of the mirror adder will need to be increased).
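The sketch below models the alternating arrangement at the logic level only. It assumes a mirror-style cell that natively produces the complemented outputs (!S, !Cout), as the mirror adder does, and that the A and B inputs of every odd (inverted) cell pass through inverters that sit off the carry path; `mirror_cell` and `alternating_rca` are hypothetical names. No inverter appears between consecutive carry outputs, yet the result matches ordinary addition.

```python
def mirror_cell(x: int, y: int, z: int) -> tuple:
    """Mirror-adder style cell: natively produces the complemented outputs (!S, !Cout)."""
    s = x ^ y ^ z
    cout = (x & y) | (x & z) | (y & z)
    return 1 - s, 1 - cout


def alternating_rca(a: int, b: int, n: int, cin: int = 0) -> tuple:
    """RCA whose carry path has no inverters between cells.

    Even bits use the regular cell (true A, B, Cin in; !Cout passed on, sum
    inverted off the carry path). Odd bits use the inverted cell: A and B are
    complemented and the carry arrives complemented, so by the inversion
    property the cell yields the true S and true Cout.
    """
    carry = cin                     # true polarity entering bit 0
    total = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i % 2 == 0:
            s_n, carry = mirror_cell(ai, bi, carry)        # carry is now !C(i+1)
            s = 1 - s_n                                     # sum inverter, not on the carry path
        else:
            s, carry = mirror_cell(1 - ai, 1 - bi, carry)  # true S_i and true C(i+1)
        total |= s << i
    cout = carry if n % 2 == 0 else 1 - carry              # fix polarity after the last cell
    return total, cout
```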
Fast Carry Chain Design
• The key to fast addition is a low-latency carry network
• What matters is whether in a given position a carry is
  - generated: Gi = Ai & Bi = AiBi
  - propagated: Pi = Ai ⊕ Bi (sometimes Ai | Bi is used)
  - annihilated (killed): Ki = !Ai & !Bi
• Giving a carry recurrence of Ci+1 = Gi | PiCi
C1 = G0 | P0C0
C2 = G1 | P1G0 | P1P0 C0
C3 = G2 | P2G1 | P2P1G0 | P2P1P0 C0
C4 = G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 C0
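The recurrence and its flattened form compute the same carries; the difference is purely structural (a serial chain versus wide two-level gates whose fan-in grows with the bit index). A minimal Python check over all 4-bit operands, with illustrative function names:

```python
from functools import reduce
from itertools import product


def carries_by_recurrence(g: list, p: list, c0: int) -> list:
    """[C0, C1, ..., CN] from the recurrence C(i+1) = G(i) | P(i)&C(i)."""
    c = [c0]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))
    return c


def carry_by_expansion(g: list, p: list, c0: int, i: int) -> int:
    """C(i) as the flattened two-level sum of products (lookahead form)."""
    def all_of(bits):
        return reduce(lambda x, y: x & y, bits, 1)
    terms = [g[j] & all_of(p[j + 1:i]) for j in range(i)]  # G(j) propagated through bits j+1..i-1
    terms.append(c0 & all_of(p[:i]))                         # C0 propagated through bits 0..i-1
    return reduce(lambda x, y: x | y, terms, 0)


# Exhaustive 4-bit check: both forms agree, and C4 is the true carry-out of a + b + c0.
for a, b, c0 in product(range(16), range(16), (0, 1)):
    g = [(a >> k) & (b >> k) & 1 for k in range(4)]
    p = [((a >> k) ^ (b >> k)) & 1 for k in range(4)]
    ripple = carries_by_recurrence(g, p, c0)
    assert all(ripple[i] == carry_by_expansion(g, p, c0, i) for i in range(1, 5))
    assert ripple[4] == (a + b + c0) >> 4
```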
Manchester Carry Chain
• Switches controlled by Gi and Pi
• Total delay of
  - time to form the switch control signals Gi and Pi
  - setup time for the switches
  - signal propagation delay through N switches in the worst case
[Figure: one dynamic Manchester carry-chain stage: Gi and Pi switches between !Ci and !Ci+1, precharged by clk]
4-bit Sliced MCC Adder
[Figure: 4-bit sliced MCC adder: per-bit & and ⊕ gates form G and P from A0-A3 and B0-B3; the clocked chain carries !C0 through !C4; ⊕ gates form S0-S3]
Domino Manchester Carry Chain Circuit
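[Figure: domino Manchester carry chain from Ci,0 to Ci,4, clocked by clk, with G0-G3 and P0-P3 switches; chain nodes 1-6 and transistor sizes annotated]
Inverted carries along the chain:
!C1 = !(G0 | P0 Ci,0)
!C2 = !(G1 | P1G0 | P1P0 Ci,0)
!C3 = !(G2 | P2G1 | P2P1G0 | P2P1P0 Ci,0)
!C4 = !(G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 Ci,0)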
Binary Adder Landscape (T = delay, A = area)
synchronous word parallel adders
• ripple carry adders (RCA): T = O(N), A = O(N)
• carry prop min adders
  - signed-digit adders: T = O(1), A = O(N)
  - residue adders: T = O(1), A = O(N)
  - fast carry prop adders
    - Manchester carry chain: T = O(N), A = O(N)
    - carry skip, carry select: T = O(√N), A = O(N)
    - parallel prefix, conditional sum: T = O(log N), A = O(N log N)
Carry-Skip (Carry-Bypass) Adder
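If (P0 & P1 & P2 & P3 = 1) then Co,3 = Ci,0; otherwise the block itself kills or generates the carry internally.
[Figure: 4-bit block of FAs (inputs A0-A3, B0-B3; sums S0-S3) rippling from Ci,0, with a multiplexer choosing between the ripple carry-out and Ci,0 to form Co,3]
BP = P0 P1 P2 P3 “Block Propagate”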
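A logic-level sketch of the bypass, assuming 4-bit blocks and the hypothetical helper names `skip_block` and `carry_skip_add`. Note that when BP = 1 the ripple chain would also output Ci,0, so the bypass multiplexer never changes the computed value; it only shortens the worst-case path that the carry has to travel.

```python
def skip_block(a_bits: list, b_bits: list, cin: int) -> tuple:
    """One carry-skip block, bits LSB first. Returns (sum bits, block carry-out)."""
    sums, c, bp = [], cin, 1
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        bp &= p                        # BP = P0 & P1 & ... (block propagate)
        sums.append(p ^ c)
        c = (a & b) | (p & c)          # internal ripple
    cout = cin if bp else c            # bypass mux: skip the ripple when BP = 1
    return sums, cout


def carry_skip_add(a: int, b: int, n: int = 16, block: int = 4, cin: int = 0) -> tuple:
    """n-bit carry-skip adder made of n // block skip blocks (block must divide n)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    total, c = 0, cin
    for start in range(0, n, block):
        sums, c = skip_block(a_bits[start:start + block], b_bits[start:start + block], c)
        for k, s in enumerate(sums):
            total |= s << (start + k)
    return total, c
```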
Carry-Skip Chain Implementation
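[Figure: carry-skip chain implementation: a Manchester chain from Cin through the G0-G3 / P0-P3 switches to the block carry-out, plus a BP-controlled bypass from the block carry-in to the carry-out; output !Cout]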
4-bit Block Carry-Skip Adder
Worst-case delay → carry from bit 0 to bit 15: the carry is generated in bit 0, ripples through bits 1, 2, and 3, skips the middle two groups (B is the group size in bits), and ripples in the last group from bit 12 to bit 15.
[Figure: 16-bit carry-skip adder of four 4-bit blocks (bits 0 to 3, 4 to 7, 8 to 11, 12 to 15), each with setup, carry propagation and sum stages; the carry enters at Ci,0]
T_add = t_setup + B·t_carry + ((N/B) - 1)·t_skip + B·t_carry + t_sum
Optimal Block Size and Time
• Assuming one stage of ripple (t_carry) has the same delay as one skip logic stage (t_skip), and both are 1:
T_CSkA = 1 + B + (N/B - 1) + B + 1
(terms: t_setup, ripple in block 0, skips, ripple in the last block, t_sum)
= 2B + N/B + 1
• So the optimal block size B satisfies
dT_CSkA/dB = 2 - N/B² = 0 ⇒ B_opt = √(N/2)
• And the optimal time is
Optimal T_CSkA = 2√(2N) + 1
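The optimum can be checked numerically. This sketch uses the unit-delay model from the slide; restricting practical block sizes to integers that divide N evenly is an added assumption for illustration, and `t_cska` is a hypothetical name.

```python
import math


def t_cska(n: int, b: float) -> float:
    """Unit-delay carry-skip adder time: 1 + B + (N/B - 1) + B + 1 = 2B + N/B + 1."""
    return 2 * b + n / b + 1


for n in (16, 32, 64):
    b_opt = math.sqrt(n / 2)
    # Real designs need an integer block size; here it must also divide N.
    best = min((b for b in range(1, n + 1) if n % b == 0), key=lambda b: t_cska(n, b))
    print(f"N={n}: B_opt={b_opt:.2f}, T_opt={2 * math.sqrt(2 * n) + 1:.2f}, "
          f"best integer B={best}, T={t_cska(n, best):.0f}")
```

For N = 64, for example, B_opt ≈ 5.66 is not an integer, so a design picks a nearby size such as 4 or 8 and lands slightly above the ideal 2√(2N) + 1.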
Carry-Skip Adder Extensions
• Variable block sizes
  - A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so the inner blocks can be bigger without increasing the overall delay
[Figure: variable-block-size carry-skip adder from Cin to Cout]
• Multiple levels of skip logic
[Figure: two-level carry-skip adder from Cin to Cout; skip level 2 uses the AND of the first-level skip signals (BP’s)]
Carry-Skip Adder Comparisons
[Figure: delay of RCA, CSkA and VSkA for 8-, 16-, 32-, 48- and 64-bit adders; CSkA points labelled with block sizes B = 2 to 6]
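Parallel Prefix Adders (PPAs)
• Define the carry operator ∘ on (G,P) signal pairs: (G,P) = (G’’,P’’) ∘ (G’,P’), where G = G’’ ∨ P’’G’ and P = P’’P’
  - ∘ is associative, i.e., [(G’’’,P’’’) ∘ (G’’,P’’)] ∘ (G’,P’) = (G’’’,P’’’) ∘ [(G’’,P’’) ∘ (G’,P’)]
[Figure: ∘ cell with inputs (G’’,P’’) and (G’,P’) and output (G,P); its gate-level implementation produces !G from G’’, P’’ and G’]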
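One way to see why ∘ is associative but not commutative: a pair (G,P) describes the function c ↦ G | P&c that a group of bits applies to its incoming carry, and (G’’,P’’) ∘ (G’,P’) is the composition of the more significant group's function with the less significant one's. Function composition is associative but order-dependent. The Python check below (with an illustrative `carry_op`) confirms both properties over all (G,P) values.

```python
from itertools import product


def carry_op(hi: tuple, lo: tuple) -> tuple:
    """(G'',P'') o (G',P') = (G'' | P''&G', P''&P'); `hi` is the left operand."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


pairs = list(product((0, 1), repeat=2))          # every (G, P) value
associative = all(carry_op(carry_op(x, y), z) == carry_op(x, carry_op(y, z))
                  for x, y, z in product(pairs, repeat=3))
commutative = all(carry_op(x, y) == carry_op(y, x) for x, y in product(pairs, repeat=2))
print(associative, commutative)   # True False
```

For instance, a killing group above a generating group gives (0,0) ∘ (1,0) = (0,0), but swapping them gives (1,0).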
PPA General Structure
• Given the P and G terms for each bit position, computing all the carries is equivalent to finding all the prefixes in parallel:
(GN-1,PN-1) ∘ (GN-2,PN-2) ∘ … ∘ (G1,P1) ∘ (G0,P0)
• Since ∘ is associative, we can group the terms in any order
  - but note that it is not commutative
• Measures to consider:
  - number of ∘ cells
  - tree cell depth (time)
  - tree cell area
  - cell fan-in and fan-out
  - max wiring length
  - wiring congestion
  - delay path variation (glitching)
[Figure: Pi, Gi logic (1 unit delay) → Ci parallel prefix logic tree (1 unit delay per level) → Si logic (1 unit delay)]
Brent-Kung PPA
[Figure: 16-bit Brent-Kung parallel prefix computation: (G0,P0) … (G15,P15) and Cin feed a forward tree and a reverse tree of ∘ cells that produce the carries C1 … C16; the trees are annotated with their delay T and area A]
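A software model of the Brent-Kung structure, as a sketch only: it assumes N is a power of two and folds Cin into bit 0 as G0 | P0&Cin, whereas the slide's tree takes Cin as a separate input; `brent_kung_carries` and `bits` are hypothetical names. The forward tree builds prefixes for spans of 2, 4, 8, … bits; the reverse tree fills in the remaining positions. The function also counts levels and ∘ cells, which for this formulation come to 2log2N - 1 and 2N - 2 - log2N.

```python
def carry_op(hi: tuple, lo: tuple) -> tuple:
    """(G'',P'') o (G',P'): `hi` covers the more significant bits."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def brent_kung_carries(g: list, p: list, cin: int = 0) -> tuple:
    """Carries C1..CN via a Brent-Kung prefix tree; N must be a power of two.

    Cin is folded into bit 0 as G0 | P0&Cin (a simplification of this sketch).
    Returns (carries, levels, cells).
    """
    n = len(g)
    x = [(g[0] | (p[0] & cin), p[0])] + list(zip(g[1:], p[1:]))
    levels = cells = 0
    d = 1
    while d < n:                                   # forward tree
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = carry_op(x[i], x[i - d])
            cells += 1
        levels += 1
        d *= 2
    d = n // 4
    while d >= 1:                                  # reverse tree fills in the gaps
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = carry_op(x[i], x[i - d])
            cells += 1
        levels += 1
        d //= 2
    return [gi for gi, _ in x], levels, cells


def bits(v: int, n: int) -> list:
    """LSB-first bit list of v."""
    return [(v >> i) & 1 for i in range(n)]


a, b = 0xBEEF, 0x1234
g = [x & y for x, y in zip(bits(a, 16), bits(b, 16))]
p = [x ^ y for x, y in zip(bits(a, 16), bits(b, 16))]
carries, levels, cells = brent_kung_carries(g, p)
print(carries[-1] == (a + b) >> 16, levels, cells)   # True 7 26
```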
Kogge-Stone PPF Adder
[Figure: 16-bit Kogge-Stone parallel prefix computation: (G0,P0) … (G15,P15) and Cin in, carries C1 … C16 out, through log2N levels of ∘ cells; annotated with delay T and area A]
T_add = t_setup + log2N·t∘ + t_sum
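The Kogge-Stone tree can be sketched the same way: at level k every position i ≥ 2^k combines with position i - 2^k, so all prefixes are complete after log2N levels, which corresponds to the log2N·t∘ term above. Folding Cin into bit 0 is again an assumption of this sketch, and `kogge_stone_add` is a hypothetical name.

```python
def carry_op(hi: tuple, lo: tuple) -> tuple:
    """(G'',P'') o (G',P'): `hi` covers the more significant bits."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def kogge_stone_add(a: int, b: int, n: int = 16, cin: int = 0) -> tuple:
    """n-bit Kogge-Stone adder model; returns (sum, carry-out, prefix levels)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]    # setup: G_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]    # setup: P_i
    # Cin folded into bit 0 (simplifying assumption of this sketch).
    x = [(g[0] | (p[0] & cin), p[0])] + list(zip(g[1:], p[1:]))
    levels, d = 0, 1
    while d < n:                                         # one row of o cells per level
        x = [carry_op(x[i], x[i - d]) if i >= d else x[i] for i in range(n)]
        levels += 1
        d *= 2
    carries = [cin] + [gi for gi, _ in x]                # C0 .. CN
    s = sum((p[i] ^ carries[i]) << i for i in range(n))  # sum: S_i = P_i xor C_i
    return s, carries[n], levels
```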
Normalized Delay - Adder Comparisons
[Figure: normalized delay of RCA, CSkA, VSkA and KS PPA adders for 8-, 16-, 32-, 48- and 64-bit widths]
Multiply Operation
• Multiplication as repeated additions
[Figure: N-bit multiplicand × N-bit multiplier; the N rows of the partial product array can be formed in parallel and sum to a 2N-bit double-precision product]
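A minimal sketch of forming the partial product array, assuming unsigned operands (`partial_products` is an illustrative name). Each row depends only on the multiplicand and one multiplier bit, so all N rows can be produced at once; their sum is the 2N-bit product.

```python
def partial_products(multiplicand: int, multiplier: int, n: int) -> list:
    """Rows of the N x N partial product array, already shifted into position.

    Row j is the multiplicand ANDed with multiplier bit j, shifted left by j.
    No row depends on any other, which is why all N can be formed in parallel.
    """
    return [(multiplicand if (multiplier >> j) & 1 else 0) << j for j in range(n)]


rows = partial_products(0b1011, 0b1101, 4)   # 11 x 13
for r in rows:
    print(f"{r:08b}")
print(f"{sum(rows):08b}")                    # 2N-bit (double precision) product: 10001111
```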
Shift & Add Multiplication
• Right shift and add
  - Partial product array rows are accumulated from top to bottom on an N-bit adder
  - After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
  - Time for N bits: T_serial_mult = O(N·T_adder) = O(N²) for an RCA
• Making it faster
  - Use a faster adder
  - Use higher radix (e.g., base 4) multiplication
    - Use multiplier recoding to simplify multiple formation
  - Form the partial product array in parallel and add it in parallel
• Making it smaller (i.e., slower)
  - Use an array multiplier
    - Very regular structure with only short wires to nearest-neighbor cells; thus, a very simple and efficient VLSI layout
    - Can be easily and efficiently pipelined
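The right-shift-and-add scheme can be modelled with a 2N-bit accumulator split into two halves. This is a sketch for unsigned operands; the function name and the register names `hi` and `lo` are assumptions. The loop performs N additions, which is where T_serial_mult = O(N·T_adder) comes from.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Right-shift-and-add multiply of two unsigned n-bit numbers.

    A 2n-bit accumulator is kept as (hi, lo). Each of the n steps adds the
    multiplicand into hi when the current multiplier bit (lo's LSB) is 1,
    using one n-bit addition whose carry-out is kept, then shifts the whole
    accumulator right by one bit.
    """
    hi, lo = 0, multiplier & ((1 << n) - 1)
    for _ in range(n):
        if lo & 1:
            hi += multiplicand                        # n-bit add; carry-out lands in bit n
        lo = (lo >> 1) | ((hi & 1) << (n - 1))        # bit shifted out of hi enters lo
        hi >>= 1
    return (hi << n) | lo


print(shift_add_multiply(11, 13, 4))   # 143
```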
Tree Multiplier Structure
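[Figure: tree multiplier: multiple forming circuits (muxes selecting multiples of D, the multiplicand, including 0, under control of Q, the multiplier) feed a partial product array reduction tree, followed by a fast carry propagate adder (CPA) that produces P, the product; mux + reduction tree (log N) + CPA (log N)]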
(4,2) Counter
• Built out of two (3,2) counters (just FAs!)
  - all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
  - the internal output is carried to the next higher weight position (as indicated in the figure)
[Figure: two cascaded (3,2) counters forming a (4,2) counter]
Note: there are two carry-outs, one “internal” and one “external”.
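A logic-level sketch of the counter, with hypothetical function names; which three external inputs feed the first (3,2) counter is an assumption. The property that matters for tiling is visible in the code: the internal carry-out depends only on x1, x2 and x3, never on the incoming internal carry.

```python
from itertools import product


def fa(a: int, b: int, c: int) -> tuple:
    """(3,2) counter: returns (sum, carry) with a + b + c == sum + 2*carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple:
    """(4,2) counter built from two (3,2) counters.

    cin is the internal carry from the neighbouring lower-weight counter.
    Returns (s, carry, cout): s keeps this column's weight; carry (external)
    and cout (internal, to the neighbour's cin) both have the next weight.
    """
    s1, cout = fa(x1, x2, x3)      # first (3,2); cout does not depend on cin
    s, carry = fa(s1, x4, cin)     # second (3,2)
    return s, carry, cout


# Counting identity over all 32 input combinations.
for inputs in product((0, 1), repeat=5):
    s, carry, cout = counter_4_2(*inputs)
    assert sum(inputs) == s + 2 * (carry + cout)
```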
Tiling (4,2) Counters
• Reduces columns four high to columns only two high
  - Tiles with neighboring (4,2) counters
  - Internal carry-in at the same “level” (i.e., bit position weight) as the internal carry-out
[Figure: three (4,2) counters, each built from two (3,2) counters, tiled side by side]
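Because each counter's internal carry-out ignores its internal carry-in, a whole row of counters settles in constant time. The sketch below uses hypothetical names, and its counter wiring is the same assumed arrangement as a two-(3,2) (4,2) counter, repeated so the block runs on its own. It compresses four operands into a sum row and a carry row whose total is unchanged.

```python
def fa(a: int, b: int, c: int) -> tuple:
    """(3,2) counter: (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple:
    """(4,2) counter from two (3,2) counters: returns (s, carry, cout)."""
    s1, cout = fa(x1, x2, x3)
    s, carry = fa(s1, x4, cin)
    return s, carry, cout


def reduce_4_to_2(w: int, x: int, y: int, z: int, n: int) -> tuple:
    """Compress four n-bit operands into two rows with a row of tiled (4,2) counters.

    Each counter's internal cout feeds the cin of its higher-weight neighbour;
    since cout never depends on cin, nothing ripples more than one position.
    Returns (sum_row, carry_row) with w + x + y + z == sum_row + carry_row.
    """
    sum_row = carry_row = 0
    cin = 0
    for i in range(n + 1):                     # one extra column absorbs the top cout
        col = [(v >> i) & 1 for v in (w, x, y, z)]
        s, carry, cout = counter_4_2(*col, cin)
        sum_row |= s << i
        carry_row |= carry << (i + 1)          # carry has the next higher weight
        cin = cout
    return sum_row, carry_row
```

The two output rows are what a final carry propagate adder (CPA) would then add.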
4x4 Partial Product Array Reduction
[Figure: 4x4 multiplicand × multiplier partial product array reduced with (4,2) counters to a two-row array (to the CPA), giving the double-precision product]
• Fast 4x4 multiplication using (4,2) counters
8x8 Partial Product Array Reduction
[Figure: 8x8 multiplicand × multiplier partial product array and the reduced partial product array]
What is the minimum number of (4,2) counters needed to reduce it to 2 rows?
Answer: 24
Alternate 8x8 Partial Product Array Reduction
[Figure: an alternate reduction of the 8x8 multiplicand × multiplier partial product array]
More (4,2) counters, so what is the advantage?
Array Reduction Layout Approach
[Figure: multiple generators driven by the multiplicand and the multiple selection signals (multiplier) feed a stack of (4,2) counter slices, followed by a CPA]
Parallel Programmable Shifters
[Figure: shifter block, Data In → Data Out, with Control = shift amount, shift direction, shift type (logical, arith, circular)]
• Shifters are used in multipliers and floating-point units
• They consume lots of area if done in random logic gates
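A behavioural reference model of the control interface described above can be handy for checking a shifter implementation. The string-valued parameters, the function name, and the restriction 0 ≤ amount < n are assumptions of this sketch, not part of any particular design.

```python
def programmable_shift(data: int, amount: int, direction: str, kind: str, n: int = 8) -> int:
    """Behavioural model of an n-bit programmable shifter.

    direction: "left" or "right"; kind: "logical", "arith" or "circular".
    Assumes 0 <= amount < n.
    """
    if not 0 <= amount < n:
        raise ValueError("shift amount must be in [0, n)")
    mask = (1 << n) - 1
    data &= mask
    if kind == "circular":
        if direction == "left":
            return ((data << amount) | (data >> (n - amount))) & mask
        return ((data >> amount) | (data << (n - amount))) & mask
    if direction == "left":                  # logical and arithmetic left shifts coincide
        return (data << amount) & mask
    if kind == "arith" and data >> (n - 1):  # negative: replicate the sign bit
        return (data >> amount) | ((mask << (n - amount)) & mask)
    return data >> amount
```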
A Programmable Binary Shifter
[Figure: binary shifter cell with inputs Ai, Ai-1, one-hot controls rgt, nop, left, and outputs Bi, Bi-1]
Ai  Ai-1 | rgt nop left | Bi  Bi-1
A1  A0   |  0   1   0   | A1  A0
A1  A0   |  1   0   0   | 0   A1
A1  A0   |  0   0   1   | A0  0
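The table corresponds to a one-hot selection per output bit: Bi = rgt&Ai+1 | nop&Ai | left&Ai-1, with missing neighbours reading as 0. A small sketch (bit lists are LSB first; `binary_shifter` is an illustrative name):

```python
def binary_shifter(a_bits: list, rgt: int, nop: int, left: int) -> list:
    """One-position programmable shifter; a_bits = [A0, A1, ...] (LSB first).

    Bi = rgt&Ai+1 | nop&Ai | left&Ai-1, where out-of-range neighbours read as 0.
    """
    if rgt + nop + left != 1:
        raise ValueError("exactly one of rgt, nop, left must be 1")
    n = len(a_bits)
    out = []
    for i in range(n):
        upper = a_bits[i + 1] if i + 1 < n else 0   # Ai+1, used by a right shift
        lower = a_bits[i - 1] if i > 0 else 0       # Ai-1, used by a left shift
        out.append((rgt & upper) | (nop & a_bits[i]) | (left & lower))
    return out
```

With A1 = 1 and A0 = 0, for example, `binary_shifter([0, 1], 1, 0, 0)` returns `[1, 0]`, i.e. B1 B0 = 0 A1 as in the rgt row.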
4-bit Barrel Shifter
[Figure: 4x4 barrel shifter with inputs A0-A3, outputs B0-B3 and one-hot controls Sh0-Sh3]
Example:
Sh0 = 1: B3B2B1B0 = A3A2A1A0
Sh1 = 1: B3B2B1B0 = A3A3A2A1
Sh2 = 1: B3B2B1B0 = A3A3A3A2
Sh3 = 1: B3B2B1B0 = A3A3A3A3
Area dominated by wiring
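The example outputs follow one rule: with Shk active, output Bi connects to Ai+k, and positions beyond the top reuse A3 (the sign bit, so this is an arithmetic right shift). A sketch of that rule for N bits, inferred from the example rather than from the transistor schematic (`barrel_shift` is a hypothetical name):

```python
def barrel_shift(a_bits: list, sh: list) -> list:
    """Barrel shifter rule from the 4-bit example; a_bits = [A0, ..., AN-1].

    sh is the one-hot control [Sh0, Sh1, ...]. With Shk active, Bi = Ai+k,
    and positions past the top reuse AN-1 (sign extension).
    """
    n = len(a_bits)
    if len(sh) != n or sum(sh) != 1:
        raise ValueError("exactly one Sh# may be active at a time")
    k = sh.index(1)
    return [a_bits[min(i + k, n - 1)] for i in range(n)]


names = ["A0", "A1", "A2", "A3"]
print(barrel_shift(names, [0, 1, 0, 0])[::-1])   # ['A3', 'A3', 'A2', 'A1'], as in the Sh1 row
```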
4-bit Barrel Shifter Layout
[Figure: barrel shifter layout: buffer and control lines Sh3, Sh2, Sh1, Sh0 crossing data lines A3-A0]
Width_barrel ≈ 2·pm·N, where N = max shift distance and pm = metal pitch
Delay ≈ 1 fet + N diff caps
• Only one Sh# is active at a time
8-bit Logarithmic Shifter
[Figure: logarithmic shifter with inputs A0-A3, outputs B0-B3, and stages controlled by Sh1/!Sh1, Sh2/!Sh2, Sh3/!Sh3]
8-bit Logarithmic Shifter Layout Slice
Width_log ≈ pm(2K + (1 + 2 + … + 2^(K-1))) = pm(2K + 2^K - 1), K = log2N
Delay ≈ K fets + 2 diff caps
[Figure: layout slice with inputs A0-A3, outputs B0-B3 and shift stages of 1, 2 and 4]
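The logarithmic shifter reaches any shift amount in K = log2N stages of 1, 2, 4, … by decomposing the amount into binary. For comparison with the barrel shifter example, the sketch below assumes a right shift that fills vacated bits with the sign bit; that fill choice and the function name `log_shift` are assumptions.

```python
def log_shift(a_bits: list, amount: int) -> list:
    """Logarithmic right shifter with sign fill; a_bits = [A0, ..., AN-1].

    Stage k (k = 0 .. K-1) shifts by 2**k when bit k of `amount` is 1, so any
    amount in [0, N) takes K = log2 N stages, one pass transistor per stage.
    """
    n = len(a_bits)
    sign = a_bits[-1]
    bits = list(a_bits)
    k = 0
    while (1 << k) < n:
        if (amount >> k) & 1:
            d = 1 << k
            bits = [bits[i + d] if i + d < n else sign for i in range(n)]
        k += 1
    return bits
```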
Shifter Implementation Comparisons
N  | K | Barrel width | Barrel speed | Logarithmic width | Logarithmic speed
N  | K | 2N pm        | 1 + N diffs  | pm(2K + 2^K - 1)  | K + 2 diffs
8  | 3 | 16 pm        | 1 + 8        | 13 pm             | 3 + 2
16 | 4 | 32 pm        | 1 + 16       | 23 pm             | 4 + 2
32 | 5 | 64 pm        | 1 + 32       | 41 pm             | 5 + 2
64 | 6 | 128 pm       | 1 + 64       | 75 pm             | 6 + 2
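The table entries follow directly from the two width and delay expressions. A short script (function names are illustrative) regenerates them:

```python
import math


def barrel_cost(n: int) -> tuple:
    """Barrel shifter: (width in metal pitches, (fets, diffusion caps)) = (2N, (1, N))."""
    return 2 * n, (1, n)


def log_cost(n: int) -> tuple:
    """Logarithmic shifter: width 2K + (1 + 2 + ... + 2**(K-1)) pitches, delay K fets + 2 diffs."""
    k = int(math.log2(n))
    return 2 * k + sum(2 ** j for j in range(k)), (k, 2)


for n in (8, 16, 32, 64):
    print(n, int(math.log2(n)), barrel_cost(n), log_cost(n))
```

Since 1 + 2 + … + 2^(K-1) = N - 1, the logarithmic shifter's width is 2K + N - 1 pitches, roughly half of the barrel shifter's 2N for large N, while its delay grows with K instead of N.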
Decoders
• Decodes inputs to activate one of many outputs
  - 2-to-4: two inverters, four 2-input NAND gates, four inverters plus enable logic
  - how about a 3-to-8, 4-to-16, etc. decoder?
[Figure: 2x4 decoder with inputs In0, In1 and Enable]
Out0 = !In1 & !In0
Out1 = !In1 & In0
Out2 = In1 & !In0
Out3 = In1 & In0
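The same pattern extends to any n-to-2^n decoder: output k is the AND of Enable with each input or its complement, chosen by the bits of k. A 3-to-8 decoder therefore needs eight 3-input terms, a 4-to-16 sixteen 4-input terms, and so on. A behavioural sketch (inputs listed LSB first; the function name is illustrative):

```python
def decoder(inputs: list, enable: int = 1) -> list:
    """n-to-2**n decoder with active-high enable; inputs are [In0, In1, ...]."""
    outputs = []
    for k in range(2 ** len(inputs)):
        term = enable
        for j, bit in enumerate(inputs):
            term &= bit if (k >> j) & 1 else 1 - bit   # In_j or !In_j
        outputs.append(term)                             # Out_k
    return outputs


print(decoder([0, 1]))   # [0, 0, 1, 0]: In0 = 0, In1 = 1 selects Out2
```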
Dynamic NOR Decoder
[Figure: dynamic NOR decoder: precharged outputs B0-B3 with pull-down transistors driven by A0/!A0 and A1/!A1, between Vdd and GND]
Dynamic NAND Decoder
[Figure: dynamic NAND decoder: precharged outputs B0-B3 with series pull-down stacks to GND driven by A0/!A0 and A1/!A1]
Building Big Decoders from Small
[Figure: 5-input decoder assembled from a 1x2 decoder (A4, enable) and 2x4 decoders (A3 A2 and A1 A0)]
• Active-low enable, active-low output
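A behavioural sketch of the hierarchy, assuming the 1x2 decoder on A4 enables two 2x4 decoders on A3 A2, each of whose outputs enables one 2x4 decoder on A1 A0 (1 + 2 + 8 small decoders for 32 outputs). With active-low enables and outputs, each level's outputs drive the next level's enables directly. Function names are hypothetical.

```python
def small_decoder_n(inputs: list, enable_n: int) -> list:
    """Small decoder, active-low enable and outputs; inputs LSB first."""
    value = sum(bit << j for j, bit in enumerate(inputs))
    return [0 if (enable_n == 0 and k == value) else 1 for k in range(2 ** len(inputs))]


def decoder_5to32_n(addr: int, enable_n: int = 0) -> list:
    """5-to-32 decoder assembled as 1x2 (A4) -> 2x4 (A3 A2) -> 2x4 (A1 A0).

    Because outputs and enables are both active low, each small decoder's
    outputs can drive the next level's enables with no inverters in between.
    """
    a = [(addr >> i) & 1 for i in range(5)]                  # A0..A4
    outs = []
    for e1 in small_decoder_n([a[4]], enable_n):             # 1x2 on A4
        for e2 in small_decoder_n([a[2], a[3]], e1):         # 2x4 on A3 A2
            outs.extend(small_decoder_n([a[0], a[1]], e2))   # 2x4 on A1 A0
    return outs
```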
Multiplexers
• Selects one of several inputs to gate to the single output
  - 4x1: two inverters, four 3-input NANDs, one 4-input NAND
  - how about an 8x1, 16x1, etc. mux?
[Figure: 4x1 mux with inputs In0-In3 and selects S1, S0]
Out = In0 & !S1 & !S0 | In1 & !S1 & S0 | In2 & S1 & !S0 | In3 & S1 & S0
Review: TG 2x1 Multiplexer
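[Figure: transmission-gate (TG) 2x1 multiplexer with inputs In1, In2, select S and !S, and output F, shown with its transistor-level implementation between VDD and GND]
F = !((In1 & S) | (In2 & !S))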
Building Big Muxes from Small
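[Figure: 4x1 mux built from three 2x1 muxes: A0/A1 and A2/A3 are each selected by S0, and the two results are selected by S1 to give Out]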
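Both the sum-of-products 4x1 equation and the tree of three 2x1 muxes select In(2·S1 + S0). A quick Python check (hypothetical names) that the two forms agree for every input and select combination:

```python
from itertools import product


def mux2(in0: int, in1: int, s: int) -> int:
    """2x1 mux: passes in1 when s = 1, otherwise in0."""
    return in1 if s else in0


def mux4_tree(ins: list, s1: int, s0: int) -> int:
    """4x1 mux from three 2x1 muxes: S0 picks within each pair, S1 picks the pair."""
    return mux2(mux2(ins[0], ins[1], s0), mux2(ins[2], ins[3], s0), s1)


def mux4_sop(ins: list, s1: int, s0: int) -> int:
    """Sum-of-products form: In0&!S1&!S0 | In1&!S1&S0 | In2&S1&!S0 | In3&S1&S0."""
    return ((ins[0] & (1 - s1) & (1 - s0)) | (ins[1] & (1 - s1) & s0)
            | (ins[2] & s1 & (1 - s0)) | (ins[3] & s1 & s0))


for ins in product((0, 1), repeat=4):
    for s1, s0 in product((0, 1), repeat=2):
        assert mux4_tree(list(ins), s1, s0) == mux4_sop(list(ins), s1, s0) == ins[2 * s1 + s0]
```

The same recursion builds an 8x1 or 16x1 mux from 2x1 cells in log2 of the input count levels.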
Review: Datapath Bit-Sliced Organization
[Figure: bit-sliced datapath: identical bit-slice elements for Bit 0 to Bit 3 are tiled; each slice contains a register file, pipeline registers, an adder, a shifter and multiplexers, with connections from I$ and to/from D$ and a decoder; control flow and data flow directions are marked]
Layout of Bit-Sliced Datapaths
Layout of Bit-sliced Datapaths
• Without feedthroughs or pitch matching (4.2 µm²)
• With feedthroughs (3.2 µm²)
• With feedthroughs and pitch matching (2.2 µm²)
Alpha 21264 Integer Unit Datapath
[Figure: Alpha 21264 integer unit datapath floorplan: multimedia engine, shifter, intercluster bypasses, adders, logic boxes, register file and register file decoder, load bypass, store FIFO, address drivers, tristate bus driver and bus driver; signals RC1_0, RC1_1, RC2_0, RC2_1, LSD_0 and LSD_1 to D$]