
A Design Tradeoff Study with Monolithic 3D Integration

Chang Liu and Sung Kyu Lim

Georgia Institute of Technology, Atlanta, Georgia, 30332

Phone: (404) 894-0315, Fax: (404) 385-1746

Abstract: This paper studies design tradeoffs in monolithic 3D integration technology. Two design styles for monolithic 3D ICs are studied: transistor-level monolithic integration (MI-TR) and gate-level integration (MI-G). GDSII-level layouts of monolithic 3D designs are constructed and analyzed. Compared with their 2D counterparts, MI-TR designs have advantages in footprint area, wirelength, timing, and power, because of the smaller footprint. The MI-G design style also demonstrates advantages in area, timing, and power over TSV-based designs, because inter-tier vias are smaller and have lower parasitics than TSVs. To take further advantage of monolithic 3D technology, several technology improvement options are explored. Finally, some possible design challenges with monolithic 3D are studied, including global variation and signal integrity issues.

I. INTRODUCTION

3D integration technology is actively being studied as a solution to continue the scaling trajectory predicted by Moore's Law. Compared with other existing 3D integration technologies (wire-bonding, interposer, TSV, etc.), monolithic 3D integration is the only one that enables ultra-fine-grained vertical integration of devices and interconnects, thanks to the extremely small size of inter-tier vias (typically 50 nm in diameter). Monolithic 3D technology, by definition, is a 3D integration technology that fabricates two or more device tiers sequentially, rather than bonding two fabricated dies together using bumps or TSVs.

Figure 1 shows a typical monolithic 3D structure. The two device tiers are connected by inter-tier vias, which are essentially local vias. Metal layers can be placed between the two device layers. To fabricate the top device tier, a low-thermal-budget process must be applied. Several monolithic 3D integration processes have been developed. CEA/LETI [1][2] developed a sequential integration flow based on a low-temperature bonding process. Samsung [3] developed an "S3" technology for 3-tier SRAM cells using a low-thermal TFT process. Rajendran [4] developed a 3D sequential integration process using wafer bonding or seeded crystallization.

Existing works [1][2][4] mainly focus on monolithic 3D process and device-level studies. A few circuit- and system-level studies focus on memory design using monolithic 3D technology [3][5][6], which belongs to a highly regular custom design style. However, the following critical questions need to be answered before monolithic 3D ICs can be adopted: how to design a large logic system (CPU, DSP, etc.) with monolithic 3D technology, and how much benefit monolithic 3D ICs can provide for a digital system compared with existing technologies.

This paper studies the new design opportunities that monolithic 3D offers for logic circuits, and compares them with existing 2D and TSV-based 3D circuits. The rest of the paper is organized as follows. Section II analyzes the benefits of monolithic 3D compared with TSV-based 3D. Section III demonstrates two design approaches using monolithic 3D. Section IV compares the monolithic 3D designs with the 2D and TSV-based 3D designs based on real layouts and sign-off analysis. Section V further discusses the results and suggests options for technology improvement. Section VI analyzes some potential design challenges with monolithic 3D. Section VII concludes the paper.

This material is based upon work supported by the Semiconductor Research Corporation (SRC) under the Integrated Circuit & Systems Sciences (ICSS, Task ID: 2193.001) and the Interconnect Focus Center (IFC, Theme ID: 2050.001) programs.

Fig. 1. Monolithic 3D structure in this study. Figure labels: silicon substrate, oxide, N+ and P+ regions, ILD, M1bot, M1top, M2top, M3top, internal via, inter-tier via, and local vias.

II. BENEFITS OF MONOLITHIC 3D FOR 3D INTEGRATION

Compared with TSV-based 3D integration, monolithic 3D integration has the following merits from the designer's perspective.

First, since the inter-tier via in monolithic 3D ICs is much smaller than a TSV, fine-grained vertical integration is feasible, which provides more design freedom for designers and EDA tools.

3D vertical integration can be categorized into several levels in terms of partitioning granularity. The first is core-level integration. A typical example is a core + memory stack [7], which provides very high memory access bandwidth. The second is block-level integration [8], where functional blocks are partitioned into different tiers based on their logical connections. Block-level integration usually needs more vertical connections than core + memory stacking. The third is gate-level integration [9], where individual gates are assigned to tiers. Since the number of gates in a digital system is huge, the demand for vertical interconnections is very high. The last is transistor-level integration, which partitions transistors into different tiers. In terms of vertical connections, transistor-level integration has an even finer granularity than gate-level integration.

With current TSV technology, the typical TSV diameter is about 5 µm, which is much larger than a standard cell. If we consider a 2.5 µm keep-out zone around each TSV to reduce mechanical reliability problems, the actual silicon area occupied by a single TSV is 100 µm², which is about 5 standard cell rows in 45 nm technology. Figure 2 shows the size comparison between a TSV and a standard cell. We see that the TSV is many times larger than a gate. The huge size gap implies that fine-grained vertical integration with many vertical connections cannot be achieved with TSVs. In other words, the number of TSVs that can be used in a 3D design is strongly limited by the TSV size. For example, for a chip with a 1 mm × 1 mm footprint, if we limit the total TSV area to 30 % of the total area, we can use at most 3000 TSVs. Therefore, core-level or block-level 3D partitioning is usually preferable for TSV-based 3D integration.

Fig. 2. Size comparison among a TSV, a gate, and an inter-tier via. Figure labels: TSV (5 µm), NAND2_4X (1.4 µm), inter-tier via (50 nm diameter).

Gate-level partitioning is acceptable only when the cut size is small. Recently, nano-scale TSVs have been actively studied and developed. The diameter of future TSVs can reach 0.1 µm, which will boost the vertical interconnection density significantly. However, despite the effort in reducing the TSV size, the alignment precision of the 3D bonding process becomes a major constraint on further improving the 3D IC design granularity. The current alignment precision is about 1 µm [1], and it is very difficult to improve further. In contrast, an inter-tier via is as small as a local via (50 nm in diameter). For the same design with a 1 mm × 1 mm footprint, the maximum number of inter-tier vias is 30 million, which means there is almost no limitation on the number of vertical connections between device tiers. And since the devices and the inter-tier vias are fabricated sequentially, the alignment precision is extremely high. Therefore, monolithic 3D technology is very suitable for gate-level, or even transistor-level, 3D integration.
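The two density estimates above follow from simple area arithmetic. The sketch below reproduces them under stated assumptions: each TSV is charged a square site of side 5 µm + 2 × 2.5 µm = 10 µm (the 100 µm² quoted earlier), and each inter-tier via is charged a hypothetical 100 nm × 100 nm site (50 nm diameter plus spacing), a value chosen here only because it reproduces the 30 million figure. The function and parameter names are illustrative.

```python
def max_vertical_connections(footprint_nm: int, site_nm: int, budget_pct: int = 30) -> int:
    """Upper bound on vertical connections that fit in a square footprint.

    footprint_nm: side of the square chip footprint, in nm
    site_nm:      side of the square site charged to one connection, in nm
    budget_pct:   share of the footprint that connections may occupy, in percent
    """
    usable_area = footprint_nm * footprint_nm * budget_pct // 100
    return usable_area // (site_nm * site_nm)


CHIP_SIDE_NM = 1_000_000            # 1 mm x 1 mm footprint
TSV_SITE_NM = 5_000 + 2 * 2_500     # 5 um TSV plus a 2.5 um keep-out zone on each side
VIA_SITE_NM = 100                   # assumed site for a 50 nm inter-tier via

tsv_limit = max_vertical_connections(CHIP_SIDE_NM, TSV_SITE_NM)
via_limit = max_vertical_connections(CHIP_SIDE_NM, VIA_SITE_NM)
print(tsv_limit, via_limit)
```

Integer arithmetic in nanometres is used so that the counts come out exact rather than depending on floating-point rounding.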

Second, an inter-tier via has much better electrical characteristics than a TSV in terms of parasitics, mechanical stress, electrical coupling, etc., due to its small size. Consider the TSV in Figure 2 with a 5 µm diameter, a 0.1 µm thick liner, and a 50 µm height. The parasitic capacitance from the TSV to the substrate is about 80 fF [10], which is roughly equal to the capacitance of a 200 µm long wire. In contrast, the parasitic capacitance of an inter-tier via is less than 1 fF, which is negligible. Figure 3 shows the timing comparison between two timing paths, in which a TSV and an inter-tier via are each driven by a 4X inverter. We see that the delay on the TSV is about 0.73 ns, which is much larger than the delay on the inter-tier via (0.04 ns). Therefore, to reach the same timing performance, the TSV-based design needs more effort in buffering and gate sizing, which in turn increases the power consumption.

Third, in some monolithic 3D design styles, existing 2D tools can handle the 3D design well without the need for 3D-specific tools. This feature is discussed in Section III.

III. DESIGN STYLES IN MONOLITHIC 3D

This section demonstrates two design styles in monolithic 3D ICs. The pros and cons of each design style are analyzed as well.

A. Transistor Level Design

Since monolithic 3D technology is suitable for fine-grained vertical integration, we focus on gate-level and transistor-level designs for monolithic 3D technology.

Fig. 3. Transient simulation of an INV driving a TSV and an inter-tier via. Axes: voltage (V), 0.0 to 1.4, versus time (s), 0.0 to 6.0 n. Traces: input, inter-tier via, TSV.

In transistor-level 3D ICs, the basic units for tier partitioning are transistors. In a standard-cell-based ASIC flow, the most intuitive way of partitioning transistors is to split the NMOS and PMOS of each standard cell into two device tiers, as shown in Figure 4. The merits of this design style are two-fold. First, very few interconnect layers (usually one) are needed in the bottom device tier, because they are only used for local interconnection inside each cell. Second, existing 2D physical design tools can be used for place and route, since the two device tiers in each cell are strictly aligned.

However, to perform monolithic 3D designs at the transistor level, we need to design monolithic 3D standard cells first. Figure 6 shows monolithic 3D standard cell designs for INV, NAND2, NOR2, and DFF. The design uses only local interconnects in the bottom PMOS tier. The area reduction compared with the 2D counterparts is about 30 %. The area reduction does not reach 50 % for two reasons. First, the inter-tier vias occupy some area, which is not needed in a 2D standard cell. Second, the PMOS occupies more area than the NMOS because it is sized larger to compensate for its lower mobility. Fortunately, this area skew is alleviated in newer technologies, such as 32 nm and 22 nm, thanks to strained silicon for PMOS [11]. It is reported that with strained silicon, the mobility of PMOS is significantly improved and is comparable with that of NMOS. Therefore, the area skew between PMOS and NMOS can be eliminated. With balanced PMOS and NMOS, the area of MI-TR gates can be further reduced. Therefore, monolithic 3D ICs show more advantages in advanced technologies below 32 nm. Since we use a 45 nm technology library in this work, we still consider the area skew between PMOS and NMOS. Using these redesigned standard cells and their physical library, we can use existing physical design tools to construct the full-chip layout, which is a significant benefit over TSV-based 3D ICs, considering that 3D-specific EDA tools have not been fully developed yet.
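A back-of-the-envelope model helps show why folding a cell onto two tiers yields less than a 50 % footprint reduction. The sketch below assumes that the 2D cell area splits into a PMOS share and an NMOS share, that the folded footprint is set by the larger share, and that inter-tier vias add a fixed overhead. The 60/40 split and the 10 % via overhead are hypothetical values picked only to illustrate how a figure near 30 % can arise; they are not measurements from this work.

```python
def folded_cell_reduction(pmos_share: float, via_overhead: float) -> float:
    """Fractional footprint reduction of a cell folded onto two tiers.

    pmos_share:   fraction of the 2D cell area taken by the PMOS region (0..1)
    via_overhead: extra footprint for inter-tier vias, as a fraction of the 2D area
    """
    nmos_share = 1.0 - pmos_share
    # The two tiers are stacked, so the larger tier sets the footprint.
    folded_footprint = max(pmos_share, nmos_share) + via_overhead
    return 1.0 - folded_footprint


# Hypothetical skewed cell: PMOS region larger than NMOS, plus via overhead.
skewed = folded_cell_reduction(pmos_share=0.60, via_overhead=0.10)
# Hypothetical balanced cell (PMOS comparable to NMOS), same via overhead.
balanced = folded_cell_reduction(pmos_share=0.50, via_overhead=0.10)
print(f"skewed: {skewed:.0%}, balanced: {balanced:.0%}")
```

In this toy model, balancing the two device types moves the reduction closer to the 50 % ideal, with the via overhead accounting for the remaining gap.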

B. Gate Level Design

The second design style is gate-level monolithic 3D ICs. In this design style, each gate is placed either on the top device tier or on the bottom device tier, as shown in Figure 5. Each device tier has several metal layers for interconnect, and inter-tier vias are used to connect the two tiers. The major merit of this design style is that we can use existing 2D standard cells. Also, we can control the inter-tier via count by properly partitioning the design. However, gate-level monolithic 3D ICs tend to use more metal layers because the bottom tier also needs enough metal layers for cell interconnection. Moreover, traditional 2D design tools are not applicable here. Instead, a 3D placer is required, which is not yet mature in industry.

Fig. 4. Illustration of transistor-level monolithic 3D design. Figure labels: INV, NOR, INV; NMOS tier; PMOS tier.

Fig. 5. Illustration of gate-level monolithic 3D design. Figure labels: NAND, INV; top tier; ILD; bottom tier.

IV. DESIGN AND ANALYSIS

A. Design Technology

The monolithic 3D technology used in this study is similar to the CEA/LETI process [1]. The 3D structure is shown in Figure 1, where inter-tier vias and internal vias [2] are used to connect the gate, source, drain, and metal of the upper device tier to the lower-tier metals.

CEA/LETI [2] claimed that regular 2D performance can be achieved for a single transistor with their monolithic 3D process. Therefore, since the main purpose of this study is to explore the possible design options, we assume that transistor performance does not change much in monolithic 3D ICs, and hence that the 2D device model is still applicable.

In this study, we compare four design styles: 2D, transistor-level monolithic 3D ICs (MI-TR), gate-level monolithic 3D ICs (MI-G), and TSV-based 3D ICs. The TSV-based 3D structure is shown in Figure 8. We use via-first TSVs that are 2.5 µm in diameter and 30 µm in height. Each TSV is connected to the metal wires through M1 and Mtop landing pads.

B. Design and Analysis Flows

We use two different design and analysis flows for the four design styles. The 2D and MI-TR designs can be fully handled by existing commercial 2D tools. Therefore, we use Cadence Encounter to place and route these designs and obtain the layouts. We also use Encounter to perform timing optimization and power analysis. Finally, we use Synopsys PrimeTime to perform timing analysis.

The MI-G and TSV-based 3D design styles require 3D layout construction and analysis tools. No commercial tools are available for 3D design. Therefore, we use the partition-based 3D placer in [9] for cell placement. The idea of this placer is to perform Z-direction cuts in addition to XY-direction cuts to assign cells to different tiers. Then, we use Encounter to route each die separately. After layout construction, we use a timing-scaling method to perform 3D timing optimization. Then we use PrimeTime to perform timing and power analysis. The 3D analysis is based on stitching the RC parasitic files (.SPEF) and the netlist files (.v), and inserting the TSV/inter-tier-via information. Figure 7 summarizes the two design flows. We see that the MI-TR design flow is much simpler than the MI-G design flow in terms of the number of design steps.

Fig. 6. Monolithic standard-cell designs (INV, NAND2, NOR2, DFF), with inter-tier vias marked. The NMOS tier (left or top) uses 1 metal layer. The PMOS tier (right or bottom) uses only local interconnect.

Fig. 7. Design flow comparison. (a) Gate-level monolithic 3D and TSV-based 3D: partitioning, 3D placement, route each die separately (repeated while there is a next die), then analysis and optimization. (b) Transistor-level monolithic 3D: 2D placement, route only on the top tier, then analysis and optimization.

Since the design tools used in these two design flows are very different, comparisons among designs built with different flows are not fair. For example, the placement quality of Cadence Encounter is better than that of the partition-based 3D placer. Our experiments show that for the same 2D design, Encounter is about 10 % better than the 3D placer in terms of wirelength. Since the goal of this study is not to compare the 3D placer with the commercial tool, we only compare designs built with the same design flow (2D vs. MI-TR and TSV-3D vs. MI-G) for fair comparisons.

C. Testbench Circuits

We choose three circuits with different gate counts as our benchmark circuits: an FIR filter, an FFT processor, and a JPEG decoder, which contain 130K, 591K, and 1.17 million gates, respectively. We implement four design styles for each circuit: 2D, TSV-based 3D, MI-TR, and MI-G. The physical design library we use is based on the Nangate 45 nm PDK. We use the same size for both local vias and inter-tier vias. We also design our own monolithic 3D standard cells for MI-TR, as shown in Figure 6.

Fig. 9. Layouts of different types of designs for the FIR filter: TR-level MI (overall, with NMOS-tier and PMOS-tier zoom-ins), gate-level MI (overall, with top-tier and bottom-tier zoom-ins), and TSV-based 3D (overall, with top-tier and bottom-tier zoom-ins). The yellow dots shown in the monolithic designs are inter-tier vias, and the blue squares shown in the TSV designs are TSV M1 landing pads.

Fig. 8. (a) TSV size (b) TSV-based 3D structure. Figure labels: TSV (2.5 µm diameter, 30 µm height), M1 landing pad, Mtop landing pad, 3 µm, Die0 and Die1 (face, back), device, metal layers.

The layouts of the three 3D designs for the FIR filter are shown in Figure 9. The cell placement density of all the designs is 60 %. Tables I and II list the area and the vertical connection (TSV or inter-tier via) counts of each design. We see that MI-TR has the most fine-grained vertical connections and therefore shows the largest footprint area among the 3D designs. Compared with the 2D designs, the MI-TR circuits have a smaller footprint because of the smaller standard cell footprint. In the TSV-based 3D designs, TSVs occupy 2 % to 8 % of the area due to the large size of the TSV. In contrast, the percentage of inter-tier via area in the MI-G designs is almost 0. This is why a TSV-based 3D design usually occupies a larger area than the MI-G design.

D. Analysis and Comparison

The metrics used in this study include area, total wirelength, timing, and power consumption. In terms of timing, we compare both the longest path delay (LPD) and the total negative slack (TNS). All the metrics reported are simulation results after timing optimization.

We first compare the 2D designs with the MI-TR designs. Table I shows the analysis results. In terms of total wirelength, all the MI-TR designs show a shorter wirelength than the 2D designs, by 9 % to 17 %. This is expected because a reduced footprint naturally leads to a reduced wirelength. We also see from the FIR circuit that the MI-TR design tends to use more metal layers. This is because the routing space for MI-TR is smaller than for 2D designs, so the routability of MI-TR is worse than that of 2D ICs.

We also observe that all the MI-TR designs achieve a better longest path delay than the 2D designs, by approximately 3 % to 8 %, because of the reduced wirelength. The longest path delay reduction is not as significant as the wirelength reduction. This is because the path delay is also strongly affected by the devices, which we assume to be the same in the two design styles. As for the total negative slack, the improvement of MI-TR ranges from 7 % to 35 %, which is very significant. Compared with the longest path delay, the TNS is a metric that evaluates many timing paths together rather than only one path. To obtain a more comprehensive understanding of how MI-TR improves the timing compared with 2D ICs, we plot the path delay distribution for the three designs, as shown in Figure 10. From the distribution, we clearly see that MI-TR has a better potential for timing improvement, because it has far fewer paths that violate the timing constraints.

Fig. 10. Negative timing slack distribution of 2D and MI-TR designs for (a) FIR filter (b) FFT processor (c) JPEG decoder. Each panel plots the number of timing paths versus timing slack (ns).
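The difference between the two timing metrics can be made concrete with a small sketch. It assumes that each path is summarized by its delay and that all paths share a single clock-period constraint, defines slack as the constraint minus the delay, and reports TNS as a positive magnitude (the convention Tables I and II appear to use). The path delays and the 4.0 ns constraint are made-up values for illustration.

```python
from typing import Iterable, Tuple


def timing_metrics(path_delays_ns: Iterable[float], clock_ns: float) -> Tuple[float, float, int]:
    """Return (longest path delay, total negative slack magnitude, violating path count)."""
    delays = list(path_delays_ns)
    lpd = max(delays)
    # Slack is positive when a path meets the constraint and negative when it violates it.
    negative_slacks = [clock_ns - d for d in delays if d > clock_ns]
    tns = -sum(negative_slacks)
    return lpd, tns, len(negative_slacks)


# Two hypothetical designs with the same worst path but different numbers of violators.
design_a = [4.5, 4.4, 4.3, 4.2, 3.9, 3.5]
design_b = [4.5, 3.9, 3.8, 3.7, 3.6, 3.5]

lpd_a, tns_a, n_a = timing_metrics(design_a, clock_ns=4.0)
lpd_b, tns_b, n_b = timing_metrics(design_b, clock_ns=4.0)
print(lpd_a, round(tns_a, 2), n_a)
print(lpd_b, round(tns_b, 2), n_b)
```

Both hypothetical designs report the same LPD, yet the second has far fewer violating paths and a much smaller TNS, which is the kind of difference a slack histogram exposes.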

We also find that the power consumption of the MI-TR designs is lower than that of the 2D designs for the FIR, FFT, and JPEG circuits, by 1 % to 7 %. Both the wire power and the device power are reduced. The wire power is reduced because of the reduced wirelength. The device power is reduced because fewer buffers are added, owing to the shorter wirelength. The reduction in total power is not as significant as the reduction in wirelength, because the power consumption is also strongly affected by the devices, whose power reduction is not as significant as that of the wires. Experimental results show that wire power accounts for 15 %, 28 %, and 40 % of the total power in FIR, FFT, and JPEG, respectively. This explains why the JPEG circuit achieves the biggest power reduction: it has the biggest wire power portion due to its large circuit size. Based on this result, we predict that larger MI-TR designs have more potential for power saving because of their bigger wire power portion.
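The percentage improvements quoted in this subsection can be recomputed from Table I. The sketch below copies the wirelength, LPD, TNS, and total power entries from that table and reports the relative reduction of MI-TR over 2D. The dictionary layout and function name are illustrative, and the computed values need not match the rounded ranges quoted in the text exactly.

```python
# (2D value, MI-TR value) pairs copied from Table I.
TABLE_I = {
    "FIR":  {"WL_um": (6.65e5, 5.65e5), "LPD_ns": (4.75, 4.40),
             "TNS_ns": (843.5, 792.8), "power_mW": (59.4, 58.6)},
    "FFT":  {"WL_um": (11.6e6, 9.71e6), "LPD_ns": (3.39, 3.16),
             "TNS_ns": (236.9, 154.3), "power_mW": (175.5, 172.4)},
    "JPEG": {"WL_um": (1.41e7, 1.33e7), "LPD_ns": (5.98, 5.79),
             "TNS_ns": (500.2, 355.6), "power_mW": (330.2, 309.9)},
}


def reduction_pct(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


improvements = {
    circuit: {metric: reduction_pct(*pair) for metric, pair in metrics.items()}
    for circuit, metrics in TABLE_I.items()
}

for circuit, metrics in improvements.items():
    summary = ", ".join(f"{name}: {value:.1f}%" for name, value in metrics.items())
    print(f"{circuit:5s} {summary}")
```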

Now, we compare the MI-G designs with the TSV-based 3D designs. Table II shows the analysis results. We observe that, compared with the TSV-based 3D designs, the MI-G designs have smaller area, shorter wirelength, better timing, and lower power consumption. The reduction in area is due to the inter-tier via being smaller than the TSV. The improved timing of MI-G comes from both the reduced wirelength and the smaller parasitics of inter-tier vias compared with TSVs, as analyzed in Section II. Because of the small parasitics of the inter-tier vias, the timing optimizer does not need to insert as many buffers in the MI-G designs as in the TSV-based 3D circuits. Therefore, power is saved through the reduced number of buffers in the MI-G case.

V. DISCUSSIONS

A. The Impact of Chip Area

The analysis in Section II explains why MI-G performs better than TSV-based 3D ICs. On the other hand, the benefits of MI-TR come from its smaller chip area compared with 2D ICs, as analyzed in Section IV. It is the smaller footprint area of the MI-TR designs that results in a reduced wirelength, and the reduced wirelength in turn leads to better timing and power. To better understand how the footprint area helps MI-TR outperform the 2D designs, we study the impact of using different chip areas in MI-TR designs. We take the FIR filter as an example.

To vary the chip area, we change the placement density of the FIR filter from 50 % to 85 %. To ensure routability at high placement density, we allow the router to use up to 10 metal layers. We record how the wirelength, timing, and power change with the area, as shown in Figure 11. From Figure 11(a), we clearly see that a smaller chip area is beneficial for wirelength reduction. Of course, as we push the placement density toward the upper limit, the routability becomes worse, and the wirelength begins to increase. This is because the routing quality degrades as routing becomes more difficult. The timing results for both LPD and TNS show that the MI-TR designs achieve better timing than the 2D ones. The general trend also reveals that the timing improves as the area is reduced. The same trend holds for the power consumption. The timing and power improvement trend with area reduction is not as strong as that of the wirelength, because timing and power are also strongly affected by the devices.
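Placement density and footprint are tied together by a simple relation: footprint area equals total cell area divided by density. The sketch below infers the MI-TR FIR cell area per tier from Table I (365 µm × 361 µm at the 60 % density used in Section IV) and estimates the footprint at other densities. Treating the cell area as fixed across densities is a simplifying assumption of this sketch, since buffer insertion during optimization changes the cell area in practice; the equivalent square side is printed only for intuition.

```python
import math

BASE_DENSITY = 0.60                  # placement density used for Table I
BASE_FOOTPRINT_UM2 = 365 * 361       # MI-TR FIR footprint from Table I

# Cell area per tier implied by the baseline footprint and density.
CELL_AREA_UM2 = BASE_DENSITY * BASE_FOOTPRINT_UM2


def footprint_at_density(density: float) -> float:
    """Footprint area (um^2) needed to place the fixed cell area at a given density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return CELL_AREA_UM2 / density


for density in (0.50, 0.60, 0.70, 0.85):
    area = footprint_at_density(density)
    print(f"{density:.0%}: {area:,.0f} um^2, equivalent square side ~{math.sqrt(area):.0f} um")
```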

B. Exploration on Technology Improvement Options

As discussed above, the smaller area is the key factor that helps reduce the wirelength in MI-TR designs. However, in larger designs such as FFT and JPEG, we cannot push the placement density as high as in the 2D designs. This is because in MI-TR designs, each standard cell footprint is shrunk by 30 %, resulting in a 30 % area reduction. However, the interconnect does not scale along with the standard cells, which makes routing difficult in MI-TR designs. For example, using a force-directed placer without routability awareness, we obtain more and more DRC errors as we keep reducing the area. Figure 12 shows the DRC errors of the JPEG circuit with different placement densities. We see that a larger circuit such as JPEG is severely wire-constrained rather than device-constrained. Therefore, the interconnect should be improved by either adding more metal layers or reducing the metal width and pitch.

We first examine the impact of adding more metal layers. Assume that the default number of metal layers is 10. We increase the available metal layers in the physical design library from 10 to 14. The routing results in Table III show that the DRC errors decrease as the number of metal layers increases, which is expected. However, the DRC error count is still huge even with 14 metal layers. This is because adding more top metal layers is not very effective in solving the local interconnection problem. Moreover, more metal layers significantly increase the fabrication cost. Therefore, we conclude that adding more metal layers is not an efficient solution to the routing problem in MI-TR designs.

TABLE I
DESIGN AND ANALYSIS SUMMARY OF 2D VS MI-TR

Design  Footprint  Total silicon  Inter-tier  % area by       Total metal  WL         LPD   TNS    Total power  Wire power  Device power
        (µm²)      area (µm²)     via count   inter-tier via  layers       (µm)       (ns)  (ns)   (mW)         (mW)        (mW)

FIR filter (130K gates)
MI-TR   365×361    263,530        550K        4%              7            5.65×10^5  4.40  792.8  58.6         9.5         49.1
2D      449×445    199,805        0           -               6            6.65×10^5  4.75  843.5  59.4         10.0        49.4

FFT processor (591K gates)
MI-TR   874×874    1,527,752      3.9M        5%              10           9.71×10^6  3.16  154.3  172.4        49.6        122.8
2D      1126×1126  1,267,876      0           -               10           11.6×10^6  3.39  236.9  175.5        51.5        124.0

JPEG decoder (1.17M gates)
MI-TR   1081×1081  2,337,122      7M          6%              10           1.33×10^7  5.79  355.6  309.9        119.9       190.0
2D      1319×1312  1,730,528      0           -               10           1.41×10^7  5.98  500.2  330.2        133.8       196.4

TABLE II
DESIGN AND ANALYSIS SUMMARY OF MI-G AND TSV-BASED 3D ICS

Design  Footprint  Inter-tier via/  % area by TSV/   Total metal  WL         LPD   TNS   Total power  Wire power  Device power
        (µm²)      TSV count        inter-tier vias  layers       (µm)       (ns)  (ns)  (mW)         (mW)        (mW)

FIR filter (130K gates)
MI-G    334×334    373              almost 0         12           6.72×10^5  4.96  330   64.4         10.3        54.1
TSV-3D  342×342    373              8%               12           6.74×10^5  5.58  883   69.2         10.6        58.6

FFT processor (591K gates)
MI-G    775×775    470              almost 0         20           1.21×10^7  3.96  403   173.6        51.7        121.9
TSV-3D  778×778    470              2%               20           1.23×10^7  4.23  582   185.1        53.5        131.6

JPEG decoder (1.17M gates)
MI-G    930×930    780              almost 0         20           1.46×10^7  6.11  174   349.7        134.1       215.6
TSV-3D  933×933    780              2%               20           1.46×10^7  6.20  362   355.1        136.3       218.8

Fig. 11. Impact of chip area on (a) wirelength (b) longest path delay (c) total negative slack (d) power

We then reduce the metal pitch and width to see the impact. The original and new metal pitches are shown in Table IV. With the JPEG design above, we see from Table V that the DRC errors drop significantly at each placement density with the same number of metal layers. Therefore, reducing the metal pitch and width is a more efficient process option for solving the routability problem than adding more metal layers.

Fig. 12. # DRC errors with different placement densities. Axes: number of DRC errors (log scale, 10^0 to 10^5) versus placement density (65 %, 70 %, 75 %, 85 %).

TABLE III
IMPACT OF ADDING MORE METAL LAYERS ON ROUTING FOR JPEG BENCHMARK, MI-TR DESIGN STYLE

# Metal layers  # DRC errors  Wirelength (µm)
10              193519        1.469×10^7
11              160116        1.465×10^7
12              139039        1.454×10^7
13              114008        1.452×10^7
14              101137        1.451×10^7

We further explore the impact of smaller metal width/pitch on the circuit performance. We re-characterize the interconnect models for the reduced metal pitch/width, complete the physical design, and perform timing analysis. Table VI lists the design and analysis results for the JPEG circuit. We observe that after reducing the metal width/pitch, the wirelength decreases further. This is because with a smaller metal pitch, more routing tracks are available on the same footprint, so the router has more freedom to find a better routing. The timing also improves because of the reduced wirelength and reduced wire parasitics.
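The effect of pitch on routing resources can be estimated directly from Table IV. The sketch below counts how many parallel routing tracks fit across the 1081 µm side of the MI-TR JPEG footprint for each metal group, taking one track per pitch. This ignores blockages, power routing, and via spacing, so it is only a first-order estimate; the grouping of layers into ranges and the names used are illustrative.

```python
FOOTPRINT_SIDE_UM = 1081.0   # MI-TR JPEG footprint side from Table I

# Metal group -> (default pitch, reduced pitch) in um, copied from Table IV.
PITCH_UM = {
    "M1-M3":  (0.19, 0.125),
    "M4-M6":  (0.285, 0.185),
    "M7-M8":  (0.885, 0.535),
    "M9-M10": (1.71, 1.065),
}


def tracks(side_um: float, pitch_um: float) -> int:
    """Number of whole routing tracks that fit across one side, one track per pitch."""
    return int(side_um / pitch_um)


track_table = {
    group: (tracks(FOOTPRINT_SIDE_UM, default), tracks(FOOTPRINT_SIDE_UM, reduced))
    for group, (default, reduced) in PITCH_UM.items()
}

for group, (n_default, n_reduced) in track_table.items():
    gain = n_reduced / n_default
    print(f"{group:7s} default {n_default:5d}  reduced {n_reduced:5d}  gain x{gain:.2f}")
```

Under these assumptions every metal group gains roughly 1.5 times as many tracks on the same footprint, including the lower layers that carry local interconnect.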

VI. DESIGN CHALLENGES WITH MONOLITHIC 3D ICS

In the above study, we showed the benefits of monolithic 3D ICs in terms of area, wirelength, timing, and power consumption compared with 2D and TSV-based 3D ICs. However, there are some unique design challenges associated with monolithic 3D technology. One challenge is routability, which we discussed in Section V. Moreover, with the denser wires in MI-TR designs, the coupling-induced signal integrity (SI) problem may become severe. If we reduce the metal width and pitch as suggested in Section V, SI may become even worse. In addition, since the two device tiers are fabricated sequentially, there is a high chance of global variation between the two tiers, which affects timing and yield. This section discusses these two potential challenges (inter-tier global variation and signal integrity) in monolithic 3D designs.

A. Inter-tier Global Variation Study

Compared with the 2D process, one of the unique characteristics of the MI-TR circuit is that the PMOS and NMOS are fabricated sequentially. This process introduces global variation between the PMOS and NMOS tiers, which is known as global P-to-N skew. In this section, we analyze how the global P-to-N skew affects the performance of the circuit.

To analyze the impact of global P-to-N skew, we generate different timing libraries for each standard cell considering Vth variations of the top NMOS tier. We first examine the impact of the global NMOS Vth variation on the longest path delay by performing deterministic simulations. Figure 13 shows the impact of Vth variation on the longest path delay for the FIR filter design. Based on these timing libraries, we perform Monte Carlo timing simulations considering a Poisson distributed 30 mV global variation on Vth for the NMOS tier, as shown in Figure 14. We see that the global Vth variation causes more than 10 % variation in the longest path delay, which can significantly affect the yield. If we consider local variation as well, the problem could be more severe.

TABLE IV
DEFAULT AND REDUCED METAL WIDTH/PITCH USED IN THE EXPERIMENT

Metal layers  Default width (µm)  Default pitch (µm)  Reduced width (µm)  Reduced pitch (µm)
M1-M3         0.065               0.19                0.045               0.125
M4-M6         0.14                0.285               0.095               0.185
M7-M8         0.4                 0.885               0.265               0.535
M9-M10        0.8                 1.71                0.535               1.065

TABLE V
DRC ERROR COUNTS BASED ON DIFFERENT PLACEMENT DENSITIES USING SMALLER METAL PITCH/WIDTH

Placement density    65%  70%  75%   85%
Default pitch/width  22   67   1384  193519
Reduced pitch/width  0    0    5     832
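To give a feel for how a tier-wide threshold shift translates into delay spread, the sketch below runs a small Monte Carlo experiment with the alpha-power-law delay approximation, in which delay is proportional to Vdd / (Vdd − Vth)^α. This is not the library-based flow used in this study: the supply voltage, nominal Vth, and α are hypothetical values, the same shift is applied to every gate on the path, and the shift is drawn from a zero-mean Gaussian with a 30 mV standard deviation, a substitution made here purely for simplicity.

```python
import random

VDD = 1.1          # assumed supply voltage (V)
VTH_NOM = 0.40     # assumed nominal NMOS threshold voltage (V)
ALPHA = 1.3        # assumed velocity-saturation exponent
SIGMA_VTH = 0.030  # global Vth variation (V), 30 mV as in the text


def relative_delay(delta_vth: float) -> float:
    """Path delay relative to nominal for a global Vth shift (alpha-power law)."""
    nominal_drive = (VDD - VTH_NOM) ** ALPHA
    shifted_drive = (VDD - VTH_NOM - delta_vth) ** ALPHA
    return nominal_drive / shifted_drive


def monte_carlo(samples: int, seed: int = 0) -> list:
    """Sample global Vth shifts and return the resulting relative delays."""
    rng = random.Random(seed)
    return [relative_delay(rng.gauss(0.0, SIGMA_VTH)) for _ in range(samples)]


delays = monte_carlo(10_000)
spread = max(delays) - min(delays)
mean = sum(delays) / len(delays)
print(f"mean relative delay {mean:.3f}, min-to-max spread {spread:.1%}")
```

Even this crude model shows that a shift of a few tens of millivolts moves the path delay by several percent in either direction, which is why a tier-to-tier offset matters for timing yield.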

B. Signal Integrity Issues in Monolithic 3D ICs

As discussed in Section V, the smaller routing space in MI-TR designs may cause routability problems. Routing congestion also creates a potential signal integrity problem: coupling-induced delay degradation harms the timing performance of MI-TR circuits. This section analyzes the signal integrity problems in MI-TR designs.

We perform timing degradation analysis on the MI-TR FFT design and compare it with the 2D design. Figure 15 compares the delay degradation distributions. We observe that, because of wire coupling induced by routing congestion, the MI-TR design shows more timing degradation than its 2D counterpart. Therefore, MI-TR designs require more attention to SI issues than 2D designs. Also, SI-aware routing and timing optimization should be adopted in the MI-TR design flow.

VII. CONCLUSIONS

Monolithic 3D technology provides new opportunities for further reducing wirelength and improving chip performance. We propose two physical design methodologies, namely gate-level monolithic 3D ICs (MI-G) and transistor-level monolithic 3D ICs (MI-TR). Experimental results show that the MI-TR design style has advantages in area, wirelength, timing, and power compared with 2D ICs, because of the smaller footprint. In addition, thanks to the small size and parasitics of inter-tier vias, MI-G designs also demonstrate advantages in area, timing, and power compared with their TSV-based 3D counterparts. Since the smaller footprint of MI-TR is the key reason why MI-TR is superior to 2D designs, we analyze the impact of footprint on circuit performance. To overcome the routing problem caused by the smaller footprint, we suggest improving the process technology by reducing the metal width and pitch rather than adding more metal layers. Finally, the challenges in monolithic 3D designs are discussed. The simulation and analysis results show that designers should pay attention to inter-tier global variation and signal integrity issues when designing monolithic 3D ICs.

Fig. 13. The impact of Vth variation on the longest path delay for FFT design before timing optimization.

Fig. 14. Monte Carlo timing analysis considering the inter-tier global Vth variation.

Fig. 15. Coupling-caused delay degradation analysis on transistor-level monolithic 3D and 2D designs. Axes: number of victim nets (log scale, 10 to 10000) versus coupling-caused delay degradation (ps), in bins 50-100, 100-150, 150-200, 200-250, 250-300, and >300.

TABLE VI
IMPACT OF WIRE WIDTH/PITCH ON DESIGN QUALITIES

Design                       Footprint (µm²)  # Metal layers  Wirelength (µm)  LPD (ns)
2D (default pitch/width)     1319×1312        10              1.41×10^7        5.98
MI-TR (default pitch/width)  1081×1081        10              1.33×10^7        5.79
MI-TR (reduced pitch/width)  1081×1081        10              1.19×10^7        5.70

REFERENCES

[1] P. Batude et al., “Advances in 3D CMOS Sequential Integration,” in Proc. IEEE Int. Electron Devices Meeting, 2009.

[2] O. Thomas et al., “Compact 6T SRAM cell with robust Read/Write stabilizing design in 45nm Monolithic 3D IC technology,” in Proc. IEEE Int. Conf. on Integrated Circuit Design and Technology, 2009.

[3] S.-M. Jung et al., “A 500-MHz DDR High-Performance 72-Mb 3-D SRAM Fabricated With Laser-Induced Epitaxial c-Si Growth Technology for a Stand-Alone and Embedded Memory Application,” in IEEE Trans. on Electron Devices, 2010.

[4] B. Rajendran, “Sequential 3D IC Fabrication: Challenges and Prospects,” in IEEE Trans. on Electron Devices, 2010.

[5] P. Batude et al., “3D CMOS Integration: Introduction of Dynamic coupling and Application to Compact and Robust 4T SRAM,” in Proc. IEEE Int. Conf. on Integrated Circuit Design and Technology, 2008.

[6] L. Chang et al., “Stable SRAM Cell Design for the 32nm Node and Beyond,” in Symposium on VLSI Technology, 2005.

[7] M. B. Healy et al., “Design and Analysis of 3D-MAPS: A Many-Core 3D Processor with Stacked Memory,” in Proc. IEEE Custom Integrated Circuits Conf., 2010.

[8] D. H. Kim, R. Topaloglu, and S. K. Lim, “Block-level 3D IC Design with Through-Silicon-Via Planning,” in Proc. Asia and South Pacific Design Automation Conf., 2012.

[9] D. H. Kim, K. Athikulwongse, and S. K. Lim, “A Study of Through-Silicon-Via Impact on the 3D Stacked IC Layout,” in Proc. IEEE Int. Conf. on Computer-Aided Design, Nov. 2009, pp. 674–680.

[10] G. V. der Plas et al., “Design Issues and Considerations for Low-Cost 3-D TSV IC Technology,” in IEEE Journal of Solid-State Circuits, Jan. 2011, pp. 293–307.

[11] S.-Y. Wu et al., “A 32nm CMOS Low Power SoC Platform Technology for Foundry Applications with Functional High Density SRAM,” in Proc. IEEE Int. Electron Devices Meeting, 2007.
