
Advanced Low Power, Multi-Supply Implementation Techniques for 65nm and Beyond using DCT and ICC

Dwight Galbi

Project Leader

Analog Devices, Inc. [email protected]

Brandon T. Waldo

Senior Design Consultant Synopsys Professional Services, Synopsys, Inc.

[email protected]

ABSTRACT

Designing low-power ASICs in the nanometer era, at 65nm and beyond, can be complex. As leakage power becomes more dominant with each process shrink, more methods are needed to reduce idle power. Multi-supply designs with power-down blocks allow large reductions in leakage power at the cost of added design complexity. In this paper we discuss the methodology and flow used to implement a multi-supply design with the latest EDA tools: Design Compiler Topographical and IC Compiler.


Table of Contents
1.0 Introduction
2.0 Design and Flow Overview
3.0 Logic Synthesis
  3.1 RTL Implementation
    3.1.1 Power Domains
    3.1.2 Isolation Cells
  3.2 Clock Gating
  3.3 Synthesis
  3.4 Scan Insertion
4.0 Physical Implementation
  4.1 Power-Down Control Design
  4.2 Floorplanning
    4.2.1 Power Domains and Voltage Areas
    4.2.2 MTCMOS Placement and Configuration
    4.2.3 Power Grid Insertion
  4.3 Physical Synthesis
    4.3.1 Power Optimization
    4.3.2 Scan Reordering
    4.3.3 CTS
5.0 Sign-Off
  5.1 Verification
  5.2 Power Analysis
  5.3 Static IR-Drop
  5.4 Dynamic IR-Drop and In-Rush Current Analysis
6.0 Conclusion
7.0 Acknowledgements
8.0 References
9.0 Appendix
  9.1 Script to place the switch cells
  9.2 Script to connect the switch cell chains

SNUG Boston 2007 Advanced Low Power, Multi-Supply Implementation


Table of Figures
Figure 1 - Top Level ASIC
Figure 2 - Design Flow
Figure 3 - GTECH Isolation Cells
Figure 4 - Switch Cell Design and Usage
Figure 5 - DSP Block with Power-Down Region
Figure 6 - Switch Cell Placement and Sleep Enable Signal Connections
Figure 7 - Power Grid Connections (Top Metal Layers Only)
Figure 8 - Power-down block scan chains before reordering
Figure 9 - Power-down block scan chains after reordering
Figure 10 - Power-down domain clock crossings
Figure 11 - VDD and VSS Static IR-Drop Maps
Figure 12 - Power-Down Region In-Rush Current

© COPYRIGHT 2007 Analog Devices, Inc.
© COPYRIGHT 2007 Synopsys, Inc.
ALL RIGHTS RESERVED
This entire notice must be reproduced on all copies of this file and copies of this file may only be made by a person if such person is permitted to do so under the terms of a subsisting license agreement from Analog Devices, Inc. and Synopsys, Inc.


1.0 Introduction

The purpose of this paper is to discuss the implementation of a low-power, multi-supply ASIC in a 65nm technology. Analog Devices, Inc. (ADI) developed and implemented this ASIC with the assistance of Synopsys Professional Services. Designing a 65nm ASIC can be quite complex. Making the design low-power increases the complexity. Reducing the idle (leakage) power of the ASIC increases it even more, especially when power-down blocks are used.

For this chip, reducing leakage power was a major goal: when parts of the chip were idle, we wanted them to consume as little power as possible. We addressed dynamic power with aggressive clock gating and an analysis of target technologies. This avoided lowering the supply voltage of specific portions of the design (a multi-voltage approach). A design that needs only a single voltage level but includes a power-down region is known as a multi-supply design. This design model suited our goals, so we chose to trade added design complexity for lower leakage power. This paper discusses the flow and design methods used to implement this multi-supply ASIC. Due to the proprietary nature of this design, we are not at liberty to divulge specific data that could compromise ADI’s intellectual property.

2.0 Design and Flow Overview

The ADI design was a large ASIC containing onboard system memory, a microprocessor core and a high-performance DSP core. We determined that the DSP core would benefit most from a power-down region to reduce leakage power. To minimize the design complexity added by the power-down block, no retention registers were used. Because the whole chip operated at the same voltage, no level shifters were needed. Without retention registers, we estimated that the power-down portion of the DSP block would hold approximately 90% of the core's standard cell logic. Figure 1 shows the top-level layout of the completed ASIC.

For this multi-supply design, we developed a specific design flow that took advantage of the multi-voltage capabilities of the Synopsys tools. Figure 2 gives a top-level overview of the flow. It also indicates the tools and the major tasks of each step. These steps are discussed in more detail below.
We implemented this design with a modified version of Synopsys Professional Services’ Pilot flow. We began with Pilot version 1.2 and made several project-specific modifications and enhancements. The Pilot flow let us run most of the flow with automated scripts, which made design exploration very easy.


Figure 1 - Top Level ASIC

Figure 2 - Design Flow (RTL, then Logic Synthesis in DCT: clock gating, synthesis, leakage optimization, scan insertion; then Physical Implementation in ICC: floorplanning, MTCMOS structure, physical synthesis, leakage optimization, scan reordering; then Sign-Off with ICC, PrimeTime-PX, Astro-Rail and PrimeRail: verification, power analysis, IR-drop analysis)


3.0 Logic Synthesis

For this design we used Design Compiler – Topographical (DCT) version 2006.06 as the logic synthesis tool. DCT uses topographical information to estimate interconnect properties, so a wire-load model is no longer necessary. Physical constraints from the floorplanning step can be fed back into DCT to give a more accurate account of the design's physical characteristics. With DCT, both the power analysis results and the timing correlate much better between DC and ICC. DCT also improves the quality of results (QOR) of the ICC timing optimization. The trade-off is time to results (TTR), since DCT requires a longer runtime. Even so, DCT produces a netlist that improves QOR in ICC and also reduces ICC runtime, especially as the design size increases. Other papers have shown the advantages of using DCT over DC.[1][2]

For multi-voltage or multi-supply designs, create the voltage areas within DCT so that their physical constraints are available during synthesis. This information can be very important for correlating results between DCT and ICC. Once the design's floorplan was in place, we created the voltage areas in DCT from the fed-back floorplan information. The rest of this section covers the logic synthesis step in detail.

3.1 RTL Implementation

To implement a multi-supply design properly, you must capture the design's power distribution and management specification in a form that the EDA tools can comprehend and interpret. The current method is to embed these power specifications in the RTL code. For a multi-supply design, you specify power domains and signal isolation intent in the RTL with two system tasks, $power and $isolate. These tasks supply the information needed to implement and verify the design's power structure. Power domains and isolation cells can also be created by other methods, as discussed below. In the future, the new Unified Power Format (UPF) standard [3] could be used to specify the power characteristics of a design. This industry-wide standard will allow design power specifications to be used across multiple implementation and verification tools without RTL modifications.

3.1.1 Power Domains

Power domains are logical groupings of one or more hierarchical blocks in a design that share three things: power net hookup requirements, power-down control and acknowledge signals, and power switching style. These front-end design constructs describe the design partitions that behave differently, from a power standpoint, than the rest of the design. Power net info objects are used with power domains to specify the power and ground net connections. Power domains are strictly a logical construct. Together with power net info objects, they specify the power distribution and management of the design. From this specification, the tool derives the domain interfaces, shut-down logic connections and always-on logic connections automatically.


In Verilog, a power domain is specified with the $power system task, through which the Presto reader passes power domain information to synthesis:

$power (<domain_name>, <power_on_net>, <on_sense_expression>, <power_on_ack_net>, <ack_sense_expression>, <instance1>, <instance2>, …, <instanceN>);

This system task gives the same power specification to one or more module instances. However, it does not create explicit connections to the power network or express any constraints that are not associated with RTL simulation. The $power system tasks are implemented as PLI calls into the RTL simulation engine. To avoid the complexity of a PLI call, the ADI design created the power domains during synthesis with the create_power_domain command, as discussed in section 3.3 below. ADI developed special verification methods to fully test the power domains without the $power system tasks.

3.1.2 Isolation Cells

The power-down enabled blocks must be isolated to maintain predictable output levels. Signal isolation is achieved with isolation cells. It is important to note that when the voltage level is the same for the whole design, the timing engine does not have to scale the signals crossing the domain interfaces. Because isolation cells alter design functionality, adding these cells in the RTL code instead of later in the design flow can avoid equivalence coverage problems. Isolation cell insertion can be done by GTECH instantiation or by using the $isolate system task. The syntax of $isolate is:

$isolate( <out>, <enable>, <iso_out>, <data> )

Based on the polarity of the enable signal and the constant output value, Presto generates the correct GTECH cells during elaboration. The compile command then maps these inferred GTECH cells to the corresponding target library cells. Note that $isolate specifies isolation on a hierarchical net basis, so the user can choose different types of isolation for inputs, outputs, or specific nets. The GTECH library contains six cells that model isolation cell functionality, shown in Figure 3.[4] The mapping engine of the compile command recognizes them, and they have simulation models for both VHDL and Verilog. During compile, they are mapped to the isolation cells of the technology library.


GTECH Cell Name       DO when EN = 0        DO when EN = 1
GTECH_ISO1_EN1        Isolating: 1          DI
GTECH_ISO1_EN0        DI                    Isolating: 1
GTECH_ISO0_EN1        Isolating: 0          DI
GTECH_ISO0_EN0        DI                    Isolating: 0
GTECH_ISOLATCH_EN1    Isolating: last DI    DI
GTECH_ISOLATCH_EN0    DI                    Isolating: last DI

Figure 3 - GTECH Isolation Cells
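The six cells differ only in the value they force while isolating and in the enable level at which they pass data. The Python sketch below models DO for each cell. It assumes a reading of Figure 3 in which the *_EN1 cells pass DI when EN = 1 and isolate when EN = 0, and the *_EN0 cells do the opposite. It is a behavioral illustration, not the GTECH simulation models, and iso_cell_out is a hypothetical helper.

```python
# Behavioral sketch of the six GTECH isolation cells, assuming the reading of
# Figure 3 where *_EN1 cells pass DI at EN = 1 and *_EN0 cells pass DI at EN = 0.
GTECH_ISO_CELLS = {
    "GTECH_ISO1_EN1", "GTECH_ISO1_EN0",
    "GTECH_ISO0_EN1", "GTECH_ISO0_EN0",
    "GTECH_ISOLATCH_EN1", "GTECH_ISOLATCH_EN0",
}


def iso_cell_out(cell: str, en: int, di: int, last_di: int) -> int:
    """Return DO for a GTECH isolation cell (hypothetical helper)."""
    if cell not in GTECH_ISO_CELLS:
        raise ValueError(f"unknown isolation cell: {cell}")
    passing_level = 1 if cell.endswith("_EN1") else 0
    if en == passing_level:
        return di              # not isolating: data passes through
    if cell.startswith("GTECH_ISOLATCH"):
        return last_di         # latch holds the last DI value
    return 1 if cell.startswith("GTECH_ISO1") else 0  # clamp to a constant


# A clamp-to-1 cell while isolating, and a latch cell holding its last value
print(iso_cell_out("GTECH_ISO1_EN1", en=0, di=0, last_di=0))
print(iso_cell_out("GTECH_ISOLATCH_EN0", en=1, di=0, last_di=1))
```

In this model, a latch-type cell differs from a clamp cell only in what it drives while isolating: the last valid data value rather than a constant.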

If the $isolate system task is used in the RTL, verification with VCS requires a special PLI. Alternatively, isolation cells can be included in the RTL by simply instantiating the GTECH cells in the code. We chose this method because it did not require a PLI and therefore reduced implementation complexity. Because the power domains were not specified in the RTL, the following variable had to be set before elaboration for the instantiated GTECH isolation cells to be handled properly:

set allow_iso_cell_without_power_domains true

3.2 Clock Gating

In order to reduce the dynamic power of the ASIC design, clock gating was used at all levels of the design implementation. We set the clock gating style as shown below:

set_clock_gating_style \
    -sequential_cell latch \
    -max_fanout 8 \
    -no_sharing \
    -minimum_bitwidth 4 \
    -control_point before \
    -control_signal test_mode \
    -positive_edge_logic {integrated:<library_name>/<ICG_cell>} \
    -negative_edge_logic {integrated:<library_name>/<ICG_cell>}

We used high-Vth (HVT) cells for the clock gating elements where possible, and standard-Vth (SVT) cells where timing required them. Simply letting DCT insert clock gating cells into the design significantly reduced the dynamic power consumption.
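The simplified Python model below shows how registers sharing an enable might be grouped under integrated clock gating (ICG) cells. It illustrates the effect of -minimum_bitwidth 4 and -max_fanout 8 in the style above. It is a sketch only: it ignores -no_sharing, hierarchy and timing, and it splits oversized banks evenly. It is not how DCT implements clock gating, and plan_clock_gating and the enable names are hypothetical.

```python
import math


def plan_clock_gating(banks, minimum_bitwidth=4, max_fanout=8):
    """Map each qualifying enable to the register counts driven by its ICG cells.

    banks: {enable_name: number_of_register_bits_sharing_that_enable}
    """
    plan = {}
    for enable, width in banks.items():
        if width < minimum_bitwidth:
            continue  # too narrow to gate; keeps its enable (feedback) logic
        n_icg = math.ceil(width / max_fanout)  # ICGs needed to respect max_fanout
        base, extra = divmod(width, n_icg)     # spread bits evenly across ICGs
        plan[enable] = [base + 1 if i < extra else base for i in range(n_icg)]
    return plan


print(plan_clock_gating({"en_a": 3, "en_b": 8, "en_c": 20}))
# {'en_b': [8], 'en_c': [7, 7, 6]}
```

The 3-bit bank falls below the minimum bitwidth and stays ungated. The 20-bit bank needs three ICG cells so that none drives more than 8 registers.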


3.3 Synthesis

After elaboration, the power domains must be created. Because our design required isolation cells, the power domains had to exist before the compile command ran; otherwise DC would fail. Had the RTL defined the power domains with the $power system task, the infer_power_domain command could have generated them. Since we did not include these system tasks in the RTL, we used the create_power_domain command, as discussed earlier. The drawback of this method is that the RTL simulation engine cannot fully interpret the power specifications for proper simulation. Other methods of verifying the power specifications were therefore necessary. We created the power domains during synthesis of the DSP block with the following commands:

# Create the top-level power domain
create_power_domain TOP

# Create the power domain for the power-down block
create_power_domain POWER_DOWN \
    -object_list [get_cells pwrdwn_domain_iso/pwrdwn_domain] \
    -power_down \
    -power_down_ctrl [get_net LONGCHAIN_BSWITCH_ENABLE]

During compilation, isolation cells are inserted at the interface between the power-down block and the top-level block. DC automatically sets a dont_touch attribute on the inputs of these cells and on the isolation cells themselves. This protects the isolation cells from being changed and prevents cells from being inserted between the output of the power-down block and the isolation cell. The designer cannot remove these dont_touch attributes. In many cases, the isolation cells (and other multi-voltage specific cells) live in a separate library that sets dont_touch and dont_use on them. These attributes must be removed before the first compile so that isolation cells can be inserted, for example:

# Remove the dont_use and dont_touch attributes from the isolation cells
remove_attribute <library_file>.db:<iso_library_name>/<cell_name> dont_use
remove_attribute <library_file>.db:<iso_library_name>/<cell_name> dont_touch
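The rule these isolation cells enforce can be stated as a simple netlist check: every net driven from the power-down domain into always-on logic must land on an isolation cell. The Python sketch below is a toy version of that check. It assumes the netlist is a list of (net, driver, load) tuples and that isolation cells sit in the always-on domain. The cell and net names are hypothetical, and this is not how DC or check_mv_design (section 4.3) implement it.

```python
def find_unisolated_crossings(connections, domain_of, iso_cells,
                              shutdown_domain="POWER_DOWN"):
    """Return nets that leave the shutdown domain without passing an isolation cell.

    connections: iterable of (net, driver_cell, load_cell)
    domain_of:   {cell_name: power_domain_name}
    iso_cells:   set of isolation cell names (placed in the always-on domain)
    """
    missing = []
    for net, driver, load in connections:
        leaves_domain = (domain_of[driver] == shutdown_domain
                         and domain_of[load] != shutdown_domain)
        if leaves_domain and load not in iso_cells:
            missing.append(net)
    return missing


domain_of = {
    "pd/u1": "POWER_DOWN", "pd/u2": "POWER_DOWN", "pd/u3": "POWER_DOWN",
    "iso_0": "TOP", "top/u5": "TOP", "top/u6": "TOP", "top/u7": "TOP",
}
connections = [
    ("n1", "pd/u1", "iso_0"),   # power-down output into an isolation cell: OK
    ("n2", "iso_0", "top/u5"),  # isolated signal into always-on logic: OK
    ("n3", "pd/u2", "top/u6"),  # power-down output straight into TOP: flagged
    ("n4", "top/u7", "pd/u3"),  # always-on into power-down: not checked here
]
print(find_unisolated_crossings(connections, domain_of, {"iso_0"}))  # ['n3']
```

Only outputs of the shutdown domain are checked. Because the whole chip runs at one voltage, signals entering the power-down region need no level shifting.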

In multi-voltage designs, different operating conditions are specified for the voltage areas in order to apply different voltage levels to the multi-voltage islands. In the ADI design, there is only one voltage level for the entire chip. The power-down region does not require a different voltage operating condition, so the settings do not need to change.


Before the initial compile, the power net information must also be specified. This tells the tool the power net names and whether each is a power or a ground net. Once the power nets have been specified, their connections to the power domains can be specified, for example:

## Create Power Net Info
create_power_net_info VDD -power
create_power_net_info VSS -gnd
create_power_net_info VDD_PWRDWN -power

## Connect the power nets to the power domains
connect_power_domain TOP -primary_power_net VDD -primary_ground_net VSS
connect_power_domain POWER_DOWN \
    -primary_power_net VDD \
    -primary_ground_net VSS \
    -internal_power_net VDD_PWRDWN
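Each connect_power_domain option expects a net of a particular type, so a small consistency check can catch a missing or mistyped create_power_net_info entry before compile. The Python sketch below mirrors the nets and connections in the commands above as plain dictionaries. The option-to-type mapping is an assumption about the intent of each option, and check_power_connections is a hypothetical helper, not a Design Compiler command.

```python
# Nets declared with create_power_net_info in the example above.
POWER_NETS = {"VDD": "power", "VSS": "gnd", "VDD_PWRDWN": "power"}

# Nets attached to each domain with connect_power_domain.
DOMAINS = {
    "TOP": {"primary_power_net": "VDD", "primary_ground_net": "VSS"},
    "POWER_DOWN": {
        "primary_power_net": "VDD",
        "primary_ground_net": "VSS",
        "internal_power_net": "VDD_PWRDWN",
    },
}

# Assumed net type expected by each connect_power_domain option.
EXPECTED_TYPE = {
    "primary_power_net": "power",
    "primary_ground_net": "gnd",
    "internal_power_net": "power",
}


def check_power_connections(nets, domains):
    """Return a list of problems; an empty list means every connection is consistent."""
    problems = []
    for domain, connections in domains.items():
        for option, net in connections.items():
            declared = nets.get(net)
            if declared is None:
                problems.append(f"{domain}: -{option} {net} has no power net info")
            elif declared != EXPECTED_TYPE[option]:
                problems.append(f"{domain}: -{option} {net} is declared {declared}")
    return problems


print(check_power_connections(POWER_NETS, DOMAINS))  # []
```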

3.4 Scan Insertion

For an IC Compiler based design, the recommended DFT flow is to stitch the scan chains during logic synthesis and then reorder them during physical implementation. All scan synthesis was performed with DFT Compiler within Design Compiler - Topographical. After scan synthesis, IC Compiler performed physically aware scan chain reordering during placement optimization. The scan ports for the power-down region of the block were instantiated in the RTL. This allowed isolation cells to be placed on the scan-out ports as necessary. Adding the isolation cells after scan insertion would have complicated formal verification and increased verification time in the long run. On a multi-supply design, DFT Compiler clusters the scan chains within each power domain. This reduces the number of power domain crossings, and with it the number of isolation cells and the extra routing complexity. The insert_dft command is multi-voltage aware, so it handles the power domains and isolation cells within the ADI design. To perform scan reordering within ICC, DFT-related information must be passed from DCT to ICC. The write_scan_def command generated a SCANDEF file to provide this information. This ASCII file lists the stub chains, or scan chain segments, based on the DFT configuration provided during logic synthesis. The SCANDEF file is also multi-voltage aware, supporting isolation cells and power domains.[5]

4.0 Physical Implementation

The physical implementation of the ADI chip was performed with IC Compiler version 2006.06. ICC was used for floorplan generation, placement, clock-tree synthesis, and routing. ICC is multi-voltage aware, and we used many of its features to implement our design. The rest of this section covers specific details of the physical implementation flow.

4.1 Power-Down Control Design

ADI’s designers chose to implement coarse-grained multi-threshold CMOS (MTCMOS) gating using p-channel header transistors. Footer transistors would require a deep N-well under all of the gated logic. This would create a large reverse-biased diode, which could add another leakage current path and unnecessary capacitance on the power domain. With header cells, a global VSS is used across the die, which should result in less total IR-drop.

Where to place the switch cells relative to the power-down region is another trade-off. The cells could be placed in columns inside the power-down region, in a grid inside it, or around its perimeter. Placing the switch cells inside the power-down region should, in theory, give better IR-drop results. However, it increases design complexity because the power-down region then has two power levels: the global VDD and the power-down internal VDD. Placing the switch cells around the perimeter creates a clear boundary between the global VDD and the power-down internal VDD nets. This makes power routing less complex and avoids implementing “always-on” logic within the power-down region. To reduce complexity, we chose to place the switch cells around the perimeter of the power-down region.

When the design is in sleep mode, the header switch cells are all turned off, gating off the local VDD. When the design wakes up, the switch cells are turned back on, bringing the local VDD back to the global VDD level. The turn-on rate of these switch cells must be controlled to limit the charge drawn from the global VDD and reduce the in-rush current. We accomplished this by limiting both the number of cells turned on and the rate at which they were turned on. Figure 4 shows the switch cell design and how it is used with the power-down region.
The turn-on rate is controlled by connecting the sleep enable signals of the switch cells through a buffer chain. An initial circuit analysis determined the number of switch cells in the chain, and dynamic IR-drop analysis later confirmed and refined it. To reduce the IR-drop between the global and local VDD, the impedance between these supplies must be as low as possible. This requires turning on more switch cells after the local VDD supply has been restored. Given the size of the power-down region, many switch cells could be placed around its perimeter, far more than needed to bring the local VDD back up to the global VDD level. These extra switch cells were used to lower the impedance between the supplies. Figure 5 shows the design block containing the power-down region, with its switch cells and isolation cells.

Two enable signals control wakeup of the power-down block. The first enable signal, the “trickle”, is sent to one switch cell. The trickle signal is then buffered and sent to the next switch cell, and so on until all of the switch cells in the trickle chain are turned on. This allows the local VDD to turn on in a controlled manner. Then, after a predetermined time, the second enable signal, the “hammer”, is sent to the first of the remaining switch cells, which are also connected in a buffer chain. The trickle chain had only a small number of cells, while the hammer chain had all the remaining cells that fit around the perimeter of the power-down region.
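The trickle/hammer scheme can be reasoned about with a small timing model: in a buffer chain, the k-th switch cell sees its enable k buffer delays after the chain is enabled. The Python sketch below computes those turn-on times and the number of switch cells conducting at a given moment. It also computes the first-order estimate that the average in-rush current is roughly the gated-supply capacitance times VDD divided by the ramp time (I = C * V / t_ramp). The chain lengths, delays and capacitance are hypothetical placeholders, not values from the ADI design. The ADI chain sizes came from circuit and dynamic IR-drop analysis.

```python
def switch_on_times(n_trickle, n_hammer, buf_delay_ps, hammer_start_ps):
    """Return (trickle, hammer): turn-on time in ps of each switch cell in each chain."""
    trickle = [k * buf_delay_ps for k in range(n_trickle)]
    if trickle and hammer_start_ps < trickle[-1]:
        raise ValueError("hammer should start after the trickle chain is fully on")
    hammer = [hammer_start_ps + k * buf_delay_ps for k in range(n_hammer)]
    return trickle, hammer


def switches_on(t_ps, trickle, hammer):
    """Number of switch cells conducting at time t_ps."""
    return sum(1 for t in trickle + hammer if t <= t_ps)


def avg_inrush_current_ma(c_gated_nf, vdd_v, t_ramp_ns):
    """First-order average in-rush current I = C * V / t_ramp, in mA (nF * V / ns = A)."""
    return c_gated_nf * vdd_v / t_ramp_ns * 1e3


# Hypothetical example: 4 trickle cells, 12 hammer cells, 50 ps per buffer stage
trickle, hammer = switch_on_times(4, 12, buf_delay_ps=50, hammer_start_ps=10_000)
print(switches_on(100, trickle, hammer))     # 3 (cells at 0, 50 and 100 ps)
print(switches_on(10_000, trickle, hammer))  # 5 (all trickle cells + first hammer cell)
print(switches_on(20_000, trickle, hammer))  # 16
print(avg_inrush_current_ma(2.0, 1.0, 100.0))  # about 20 mA
```

In practice the ramp time is set mainly by the combined on-resistance of the few trickle switches charging the gated capacitance, not by the chain delay alone. The model only shows why fewer, slower-enabled switches stretch the ramp and lower the average current.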

Figure 4 - Switch Cell Design and Usage (labels: VDD, VDD_PWRDWN, switch cell EN_IN and EN_OUT pins, Power-Down Region)

4.2 Floorplanning

The placement of the power-down region along with the switch cells must be done in the floorplanning step. Below we discuss the details of each major stage of the floorplanning step.

4.2.1 Power Domains and Voltage Areas

The power domains were created during initial synthesis and transferred to ICC in the ddc file format. The power domains and voltage areas should maintain a one-to-one mapping. As stated earlier, the power-down area of the DSP core contains approximately 90% of the core's standard cells. The size and shape of the power-down region are critical to the performance of the design. In this instance, the power-down region needed to interface with many areas of the always-on region. To reduce complexity, we chose a rectangular shape rather than a rectilinear one. As seen in Figure 5, this left a small area around the power-down region that still maintained always-on power. We sized the power-down region by limiting its standard cell utilization, while leaving enough area around it to route the always-on region. The voltage area was created using the following command:


create_voltage_area \
    -coordinate "$va(llx) $va(lly) $va(urx) $va(ury)" \
    -power_domain POWER_DOWN
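The coordinates passed as $va(llx) $va(lly) $va(urx) $va(ury) follow from the utilization target described above. The minimal Python sketch below shows that arithmetic. It assumes a chosen aspect ratio, a fixed lower-left corner, and edges snapped up to whole placement sites and rows. The function name, parameters and numbers are hypothetical, not ADI's sizing procedure.

```python
import math


def size_voltage_area(cell_area_um2, target_util, aspect_ratio,
                      llx, lly, site_w_um, row_h_um):
    """Return (llx, lly, urx, ury) for a rectangular VA at the target utilization.

    aspect_ratio is height / width. Width is rounded up to whole sites and
    height to whole rows so the VA edges fall on the placement grid.
    """
    area = cell_area_um2 / target_util           # area needed at this utilization
    width = math.sqrt(area / aspect_ratio)
    height = width * aspect_ratio
    # round() first so float noise such as 100.00000000001 does not add a site
    width = math.ceil(round(width / site_w_um, 6)) * site_w_um
    height = math.ceil(round(height / row_h_um, 6)) * row_h_um
    return (llx, lly, llx + width, lly + height)


# Hypothetical numbers: 7000 um^2 of cells at 70% utilization, 2:1 tall VA
print(size_voltage_area(7000, 0.7, 2.0, llx=10, lly=20, site_w_um=1, row_h_um=2))
# (10, 20, 81, 162)
```

Snapping up (never down) keeps the actual utilization at or below the target. The ring of switch cells and the always-on routing still need space outside this rectangle.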

Figure 5 - DSP Block with Power-Down Region (labels: Power-Down Region, Switch Cells, Isolation Cells)

In ICC, during the floorplan step, the power nets must be connected to the appropriate power domains. An example is shown below:

connect_power_domain TOP \
    -primary_power_net VDD \
    -primary_ground_net VSS
connect_power_domain POWER_DOWN \
    -primary_power_net VDD_PWRDWN \
    -primary_ground_net VSS


4.2.2 MTCMOS Placement and Configuration

With the voltage area created, we could place the MTCMOS switch cells. The switch cells were defined as macros and placed with a custom script that used the add_header_footer_cell_array command; see Appendix 9.1 for the full script. The script works with a rectangular voltage area of any size: it detects the size of the VA and adds the MTCMOS macros all around its perimeter. The add_header_footer_cell_array command was somewhat tricky to use, because the macros needed a different orientation on each side and every other macro on a side was flipped. This required a total of eight calls of the placement command (as seen in the script). Figure 6 shows the top-right corner of the power-down region with the switch cell placement and connections.

After the switch cells are placed, the sleep enable signals are connected. As described above, the switch cells are connected in two chains: the trickle chain and the hammer chain. A custom script, shown in Appendix 9.2, connects the two chains. The script handles any number of cells and detects the switch cell placement and connection requirements. If the size of the voltage area changes, the placement and connection scripts still work without any change to their inputs. The script makes the connections by creating an ECO change file and then applying it to the design.
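The appendix scripts are not reproduced here, but the geometry they handle can be sketched. The Python below is an illustrative model, not the add_header_footer_cell_array flow. It walks the perimeter of a rectangular VA at a fixed pitch, recording each cell's side (standing in for the per-side orientation) and flipping every other cell on a side. It then splits the ring into a short trickle chain and a hammer chain and lists the EN_OUT to EN_IN connections an ECO change file would need. The cell names (SW_0, SW_1, ...), the pitch and the counterclockwise ordering are assumptions. Four sides times flipped and unflipped cells presumably accounts for the eight placement calls.

```python
from dataclasses import dataclass


@dataclass
class SwitchCell:
    name: str
    side: str       # "bottom", "right", "top" or "left" (sets the orientation)
    x: float
    y: float
    flipped: bool   # every other cell on a side is flipped


def place_perimeter(llx, lly, urx, ury, pitch):
    """Place switch cells around a rectangular VA, walking counterclockwise."""
    sides = [
        ("bottom", (llx, lly), (1, 0), urx - llx),
        ("right", (urx, lly), (0, 1), ury - lly),
        ("top", (urx, ury), (-1, 0), urx - llx),
        ("left", (llx, ury), (0, -1), ury - lly),
    ]
    cells = []
    for side, (x0, y0), (dx, dy), length in sides:
        for i in range(int(length // pitch)):
            cells.append(SwitchCell(f"SW_{len(cells)}", side,
                                    x0 + dx * i * pitch, y0 + dy * i * pitch,
                                    flipped=(i % 2 == 1)))
    return cells


def connect_chains(cells, n_trickle):
    """Split the ring into trickle and hammer chains and list EN_OUT -> EN_IN links."""
    trickle, hammer = cells[:n_trickle], cells[n_trickle:]
    links = []
    for chain in (trickle, hammer):
        for a, b in zip(chain, chain[1:]):
            links.append((f"{a.name}/EN_OUT", f"{b.name}/EN_IN"))
    return trickle, hammer, links


cells = place_perimeter(0, 0, 100, 50, pitch=10)
trickle, hammer, links = connect_chains(cells, n_trickle=4)
print(len(cells), len(trickle), len(hammer), len(links))  # 30 4 26 28
print(links[0])                                           # ('SW_0/EN_OUT', 'SW_1/EN_IN')
```

Because both functions derive everything from the VA corners and the pitch, resizing the voltage area only changes their inputs. This mirrors the property the paper describes for its own scripts.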

Figure 6 - Switch Cell Placement and Sleep Enable Signal Connections (figure labels: Isolation Cells, Sleep Enable Signals, Switch Cells)
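The perimeter arithmetic performed by the Appendix 9.1 script can be checked outside the tool. This is a minimal plain-Tcl sketch, assuming a rectangular voltage area given as {llx lly urx ury}, switch cells stepped by their height along each side, and every other cell on a side flipped. The proc names switch_ring and slot_orient and the example dimensions are hypothetical, and the sketch computes only geometry and counts, not the add_header_footer_cell_array calls themselves.

```tcl
# Side bounding boxes and cell counts for a ring of switch cells around a
# rectangular voltage area, following the geometry of Appendix 9.1.
#   va:   {llx lly urx ury} of the voltage area
#   w, h: switch cell width (ring depth) and height (step along a side)
proc switch_ring {va w h} {
    lassign $va llx lly urx ury
    set boxes [dict create \
        bottom [list $llx [expr {$lly - $w}] $urx $lly] \
        left   [list [expr {$llx - $w}] $lly $llx $ury] \
        top    [list $llx $ury $urx [expr {$ury + $w}]] \
        right  [list $urx $lly [expr {$urx + $w}] $ury]]
    # Whole cells that fit along each side when stepped by the cell height
    set nx [expr {int(($urx - $llx) / double($h))}]
    set ny [expr {int(($ury - $lly) / double($h))}]
    set counts [dict create bottom $nx top $nx left $ny right $ny]
    return [list $boxes $counts]
}

# Every other cell on a side is flipped: even slots use the first
# orientation, odd slots the second (e.g. R90 / R90_MX on the bottom side)
proc slot_orient {k orient0 orient1} {
    expr {$k % 2 == 0 ? $orient0 : $orient1}
}

# Example with hypothetical dimensions (microns)
lassign [switch_ring {0 0 100 50} 4 2] boxes counts
set total 0
dict for {side n} $counts { incr total $n }
puts "bottom bbox: [dict get $boxes bottom]"
puts "total switch cells: $total"
```

Because the boxes and counts depend only on the VA bounding box and the cell size, resizing the voltage area changes the result automatically, which is the property the placement script relies on.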


4.2.3 Power Grid Insertion

Once the switch cells had been placed and connected, the power grid for the DSP core was created. The switch cells sat on the periphery of the power-down block and were designed to be connected to each other with power and ground rings. The MTCMOS cells were designed to connect to the global VDD on the outside and to the local VDD (VDD_PWRDWN) on the inside, with VSS in between. Figure 7 shows the power and ground connections to the switch cells and the power-down region (top two metal layers only).

Figure 7 - Power Grid Connections (Top Metal Layers Only) (figure labels: VDD, VSS, VDD_PWRDWN)

Once the rings connecting the MTCMOS cells were created, the regular mesh was built in the usual way, creating the VDD and VSS mesh as if this were a normal block. The VDD straps were then cut and deleted from the power-down region. Next, the internal supply net, VDD_PWRDWN, was created exactly as the VDD net had been. The tool recognized that the VDD straps were already in place and simply filled the cut-out sections with VDD_PWRDWN straps. This step shows how placing the switch cells around the perimeter of the power-down region, rather than inside it, reduces the complexity of the power mesh design.
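The net effect of the cut-and-refill step can be pictured as plain geometry. In this sketch (plain Tcl; the proc split_straps and the coordinates are hypothetical), each vertical VDD strap that crosses the power-down voltage area is split into VDD pieces outside the area and a VDD_PWRDWN piece inside it. The switch-cell ring and the horizontal straps are ignored for brevity.

```tcl
# Split vertical straps at the voltage area boundary.
#   straps: list of strap x positions
#   core:   {ylo yhi} vertical extent of the straps
#   va:     {llx lly urx ury} of the power-down voltage area
# Returns a list of {net x y_start y_end} segments.
proc split_straps {straps core va} {
    lassign $core ylo yhi
    lassign $va llx lly urx ury
    set segs {}
    foreach x $straps {
        if {$x < $llx || $x > $urx} {
            # Strap misses the voltage area: keep the full VDD strap
            lappend segs [list VDD $x $ylo $yhi]
        } else {
            # Strap crosses the voltage area: VDD outside, VDD_PWRDWN inside
            if {$lly > $ylo} { lappend segs [list VDD $x $ylo $lly] }
            lappend segs [list VDD_PWRDWN $x $lly $ury]
            if {$ury < $yhi} { lappend segs [list VDD $x $ury $yhi] }
        }
    }
    return $segs
}

# Three hypothetical straps; only the last two cross the voltage area
set segs [split_straps {10 50 90} {0 200} {40 60 160 140}]
foreach s $segs { puts $s }
```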


4.3 Physical Synthesis

Physical synthesis was done with IC Compiler, which is fully multi-voltage aware. ICC automatically preserved the isolation cells that were already inserted into the design and placed them close to the voltage area boundaries (see Figure 5). Because ICC sets dont_touch attributes on the nets that cross the VA boundary, placing the isolation cells close to the boundary reduces the routing length of these nets. ICC's automatic high-fanout net synthesis (AHFS), which is part of the place_opt command, is also multi-voltage aware: the buffer trees are built with separate sub-trees for the fanout cones in each voltage area, using buffers with the appropriate operating conditions. At the end of each physical synthesis step, we verified the isolation cell connections using the check_mv_design command. This guards against an optimization step making an inappropriate change that would break functionality, and is discussed further in section 5.1.

4.3.1 Power Optimization

The main power optimization during physical synthesis was leakage optimization. Two standard cell libraries with different threshold voltages (Vth) were available: Standard Vth (SVT) and High Vth (HVT). SVT cells are faster but leak more than HVT cells, which are slower but leak less. DCT and ICC can perform leakage optimization with multiple Vth libraries, replacing SVT cells with HVT cells on paths that are faster than the timing goals require. The clock trees of the ADI design were built using only HVT cells to reduce leakage across the chip. We used two different methods for multi-Vth leakage optimization. For blocks that required high performance, only SVT cells were used from initial synthesis in DCT until after placement in ICC; HVT cells were then introduced during post-CTS optimization, and multi-Vth optimization continued through the rest of the flow. For blocks that were not speed critical, only HVT cells were used until post-CTS optimization; SVT cells were then introduced to bring the design up to speed.

4.3.2 Scan Reordering

Scan reordering was done in ICC during placement. As discussed in section 3.4, scan reordering required a SCANDEF file generated after scan insertion in DCT. The SCANDEF file contains the scan chain definitions and was read with the read_def command before placement. The place_opt command then reorders the scan chains based on the physical locations of the registers. ICC reorders the scan chains with awareness of the voltage area and can be set to reorder horizontally or vertically. Figure 8 shows the scan chain connections in the power-down region before any reordering. Figure 9 shows the power-down region after scan reordering with the vertical and the horizontal settings, respectively (two different place_opt results). We chose the vertical orientation because it gave better results.


Figure 8 - Power-down block scan chains before reordering

Figure 9 - Power-down block scan chains after reordering
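Placement-based reordering can be illustrated with a simple serpentine ordering. This plain-Tcl sketch is a stand-in under stated assumptions, not ICC's reordering algorithm: registers are binned into columns (vertical) or rows (horizontal), each bin is visited in alternating direction, and the Manhattan length of the resulting chain is reported. The proc names, bin pitch, and register coordinates are hypothetical, and a real flow also keeps each chain within its SCANDEF partition and voltage area, which this sketch ignores.

```tcl
# Serpentine scan order from register placement.
#   regs:  list of {name x y}
#   pitch: width of a column (vertical) or row (horizontal) bin
#   dir:   "vertical" or "horizontal"
proc scan_order {regs pitch dir} {
    # pa: coordinate used to form bins; sa: coordinate sorted within a bin
    if {$dir eq "vertical"} { set pa 1; set sa 2 } else { set pa 2; set sa 1 }
    set bins [dict create]
    foreach r $regs {
        set b [expr {int(floor([lindex $r $pa] / double($pitch)))}]
        dict lappend bins $b $r
    }
    set order {}
    set flip 0
    foreach b [lsort -integer [dict keys $bins]] {
        set members [lsort -real -index $sa [dict get $bins $b]]
        # Alternate direction in each bin so the chain snakes back and forth
        if {$flip} { set members [lreverse $members] }
        foreach r $members { lappend order $r }
        set flip [expr {!$flip}]
    }
    return $order
}

# Total Manhattan length of a chain visited in the given order
proc chain_length {order} {
    set len 0
    for {set k 1} {$k < [llength $order]} {incr k} {
        lassign [lindex $order [expr {$k - 1}]] name0 x0 y0
        lassign [lindex $order $k] name1 x1 y1
        set len [expr {$len + abs($x1 - $x0) + abs($y1 - $y0)}]
    }
    return $len
}

# Five hypothetical registers inside the power-down region
set regs {{r1 5 5} {r2 5 95} {r3 15 50} {r4 15 10} {r5 25 90}}
foreach dir {vertical horizontal} {
    set order [scan_order $regs 10 $dir]
    set names {}
    foreach r $order { lappend names [lindex $r 0] }
    puts "$dir order: $names (length [chain_length $order])"
}
```

With these made-up coordinates the horizontal order happens to be shorter; the choice of vertical reordering described above came from ICC's own results, which this toy model does not attempt to predict.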

4.3.3 CTS

The CTS engine in ICC is also multi-voltage aware: it builds the clock tree bottom-up by clustering sink points from the same voltage area. After the clock sub-trees are built for each voltage area, ICC joins them at the root of the clock net. CTS allows only one crossing of the power-down region boundary. Figure 10 shows two clock trees, each crossing the power-down boundary only once. The clock sub-tree in the power-down region must be timing-equivalent to the sub-tree outside the region; otherwise it adds skew to the clock. A well-balanced tree is hard to build with this floorplan, because the region outside the voltage area is narrow and congested, very different from the open area inside the power-down region.


Figure 10 - Power-down domain clock crossings
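The single-crossing structure follows from building one sub-tree per voltage area and joining the sub-trees at the root. The plain-Tcl sketch below is a simplified model of that bookkeeping, not ICC's CTS engine; the procs, sink names, and latency numbers are hypothetical. It groups sinks by voltage area, counts the boundary crossings for a root in the TOP domain, and computes the skew that appears at the root when the sub-tree insertion delays differ.

```tcl
# Group clock sinks {name voltage_area} into one sub-tree per area
proc cluster_sinks {sinks} {
    set groups [dict create]
    foreach s $sinks {
        lassign $s name va
        dict lappend groups $va $name
    }
    return $groups
}

# One crossing per sub-tree that lies outside the root's voltage area
proc boundary_crossings {groups root_va} {
    set n 0
    foreach va [dict keys $groups] {
        if {$va ne $root_va} { incr n }
    }
    return $n
}

# Skew at the root = spread of the sub-tree insertion delays
proc subtree_skew {latencies} {
    set vals [dict values $latencies]
    set hi [tcl::mathfunc::max {*}$vals]
    set lo [tcl::mathfunc::min {*}$vals]
    return [expr {$hi - $lo}]
}

set sinks {{ff1 TOP} {ff2 POWER_DOWN} {ff3 POWER_DOWN} {ff4 TOP}}
set groups [cluster_sinks $sinks]
puts "sub-trees: [dict keys $groups]"
puts "crossings: [boundary_crossings $groups TOP]"
# Hypothetical insertion delays (ps) of the two sub-trees
puts "skew: [subtree_skew {TOP 410 POWER_DOWN 385}] ps"
```

The skew term is why the sub-tree inside the power-down region has to be timing-equivalent to the one outside it.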

5.0 Sign-Off

For the purposes of this paper, we discuss only the sign-off steps related to the multi-supply nature of the design. The power and IR-drop analysis results were obtained under several scenarios defined by different PVT corners, extraction methods, and timing modes. The details of these scenarios are confidential and are not covered in this paper.

5.1 Verification

To verify the proper operation of the multi-voltage aspects of the design, we used Formality, ICC and PrimeRail. Formal verification took the isolation cells into account because they were included in the RTL code, even for the scan signals. Another important check was run in ICC after each physical optimization step, using the check_mv_design command as shown below:

    check_mv_design -verbose -isolation \
        -opcond_mismatches -target_library_subset \
        -connection_rules

The main benefit of this command is its -isolation check for the proper use of isolation cells. If an isolation cell is missing, or if a cell of any kind is placed between the boundary of the power-down region and the isolation cell, the command reports it to the user. This check should also be run at the end of any ECO implementation script, because the creation of ECO change files may not take the multi-supply requirements of the design into account. This command should be made a sign-off requirement. The dynamic IR-drop and in-rush current analyses were done with PrimeRail, as discussed in section 5.4. PrimeRail was also used to verify the proper operation of the power-down region, since the analysis of the DSP core could only succeed if the power-down region was operating correctly.
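The isolation rule described above can be stated compactly. This plain-Tcl sketch models it on a hand-written list of boundary crossings rather than a real netlist; the proc check_isolation, the cell-type tags ISO, BUF and INV, and the signal names are hypothetical. It is only an illustration of the rule and is no substitute for check_mv_design, which works on the actual design database.

```tcl
# Each crossing is {signal {cell_type ...}}, listing the cells on the
# signal in order starting at the power-down region boundary.
# Rule: the first cell after the boundary must be the isolation cell.
proc check_isolation {crossings} {
    set errors {}
    foreach c $crossings {
        lassign $c signal cells
        set pos [lsearch -exact $cells ISO]
        if {$pos < 0} {
            lappend errors "${signal}: missing isolation cell"
        } elseif {$pos > 0} {
            lappend errors "${signal}: [lindex $cells 0] between boundary and isolation cell"
        }
    }
    return $errors
}

set crossings {
    {dsp_irq {ISO BUF}}
    {dsp_rdy {BUF ISO}}
    {dsp_ack {BUF INV}}
}
set errs [check_isolation $crossings]
foreach e $errs { puts $e }
```

An ECO that inserts a buffer between the boundary and an existing isolation cell would produce an error like the one reported for dsp_rdy, which is why the check belongs at the end of every ECO script.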


5.2 Power Analysis

The power analysis of the ADI design was done using PrimeTime-PX, which was run alongside timing analysis for the different scenarios. This provided accurate values for both dynamic and leakage power at several PVT corners and extraction methods. The switching activity used to generate the power and IR-drop values was 20% on the inputs of the design and on the outputs of the design registers. We are unable to report the actual results of the analysis because of the early stage of this product. We can report that the leakage current of the DSP core is reduced to one third of its normal value when the design is put into sleep mode.

5.3 Static IR-Drop

The static and dynamic IR-drop analysis for this ADI design is discussed in detail in another SNUG Boston 2007 paper [6], whose authors provided the static and dynamic IR-drop graphics used here. Static IR-drop was measured with Astro-Rail, which took the resistance of the switch cells into account. IR-drop was measured at the block level as well as the chip level. Figure 11 shows the IR-drop maps for the VDD and VSS supplies, respectively. In all tests, the IR-drop results met or exceeded the design targets.

Figure 11 - VDD and VSS Static IR-Drop Maps
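Why the switch-cell resistance matters can be seen from a first-order estimate. Assuming N identical header switches conducting in parallel, each with on-resistance Ron, the switch ring adds a drop of I * Ron / N on top of the grid drop. The plain-Tcl sketch below evaluates that lumped estimate and the smallest N that meets a drop budget; the procs and all numbers are hypothetical, units are mA, ohms and mV (so mA times ohm gives mV), and a rail analysis tool such as Astro-Rail resolves the full grid rather than this single lumped resistance.

```tcl
# Drop across N identical switches in parallel (lumped model):
# Reff = Ron / N, so V = I * Ron / N. Units: mA * ohm = mV.
proc switch_drop {i_load_ma ron_ohm n} {
    expr {$i_load_ma * $ron_ohm / double($n)}
}

# Smallest number of switches that keeps the switch drop within budget
proc min_switches {i_load_ma ron_ohm budget_mv} {
    expr {int(ceil($i_load_ma * $ron_ohm / double($budget_mv)))}
}

# Hypothetical values: 50 mA load, 400 ohm on-resistance per switch cell
puts "drop with 200 switches: [switch_drop 50 400 200] mV"
puts "switches for a 30 mV budget: [min_switches 50 400 30]"
```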


5.4 Dynamic IR-Drop and In-Rush Current Analysis

Dynamic power analysis was crucial to verifying the power-down block within the ADI chip design. The analysis highlighted areas of concern and helped determine the number of switch cells to use during wakeup of the power-down region. Figure 12 shows an example of in-rush current analysis results for the DSP core during device wakeup. The first graph shows the in-rush current; note the point where the main chain is activated. The second graph shows the IR-drop on the VDD supply of a cell outside of, and close to, the power-down region. The third graph shows the IR-drop of a cell far from the power-down region. As expected, the in-rush current affects the IR-drop of the nearby cell more than that of the distant cell.

Figure 12 - Power-Down Region In-Rush Current (panels: (1) In-Rush Current; (2) IR-Drop: Cell close to Power-Down Region; (3) IR-Drop: Cell far from Power-Down Region)
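The in-rush profile depends on how quickly the switch cells turn on. The plain-Tcl sketch below models only the timing of a daisy-chained wakeup, assuming each switch cell passes its enable from EN_IN to EN_OUT after a fixed delay d; it does not model current. The procs, delays, and chain sizes are hypothetical. Counting how many switches turn on within any time window shows how a chain bounds the number of switches turning on at once, and how a sparse trickle phase differs from a denser hammer phase.

```tcl
# Turn-on time of each cell in a daisy chain: cell k (from 0) switches on
# at t_start + k*d, since each cell forwards EN_IN to EN_OUT after d.
proc chain_on_times {n t_start d} {
    set times {}
    for {set k 0} {$k < $n} {incr k} {
        lappend times [expr {$t_start + $k * $d}]
    }
    return $times
}

# Largest number of turn-on events in any half-open window [t, t + w)
proc peak_per_window {times w} {
    set times [lsort -real $times]
    set peak 0
    set lo 0
    for {set hi 0} {$hi < [llength $times]} {incr hi} {
        # Drop events that are w or more earlier than the current one
        while {[lindex $times $hi] - [lindex $times $lo] >= $w} { incr lo }
        set peak [expr {max($peak, $hi - $lo + 1)}]
    }
    return $peak
}

# Hypothetical wakeup: 4 trickle cells 4 ns apart, then 20 hammer cells
# 1 ns apart starting at 16 ns
set trickle [chain_on_times 4 0 4]
set hammer  [chain_on_times 20 16 1]
puts "trickle peak per 5 ns: [peak_per_window $trickle 5]"
puts "hammer peak per 5 ns:  [peak_per_window $hammer 5]"
```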

Many other tests were run using dynamic analysis, with all results pointing to a working design.

6.0 Conclusion

This paper discussed the implementation of a low-power, multi-supply ASIC in a 65nm technology, developed by Analog Devices, Inc. with the assistance of Synopsys Professional Services, using DCT and ICC. We touched on several of the design aspects necessary for a multi-supply design and showed some of the results. Many trade-offs were made with the goal of reducing the overall leakage power while also limiting the design complexity needed to achieve those reductions.

7.0 Acknowledgements

Thanks go to Julio Hernandez and Kaijian Shi for the use of their IR-drop data and pictures. Thanks also to Michael Solka for help in reviewing this document.


8.0 References

[1] Malnar, Zelic, Lewis, Raman, Maiman (2007) ‘Evaluation of DC-Topographical’, SNUG San Jose 2007.

[2] Kiegle, Kinney, Suh, Yock (2007) ‘Will DCT really let me get rid of wire load models? Analysis of a DCT qualification effort’, SNUG San Jose 2007.

[3] (2007) Unified Power Format (UPF) Standard, Accellera, Version 1.0, February 22, 2007.

[4] (2006) Multivoltage Design Flow Methodology Application Note, Synopsys, August 4, 2006.

[5] (2006) DFT Compiler User Guide Vol. 1: Scan, Synopsys, Version Y-2006.06, June 2006.

[6] Shi, Hernandez, Geisler (2007) ‘IR-drop analysis of a UDSM complex power-gating design’, SNUG Boston 2007.


9.0 Appendix

9.1 Script to place the switch cells

    ## ----------------------------------------------------------------------
    ## Add the switch cells
    ## ----------------------------------------------------------------------
    ## va(llx), va(lly), va(urx), va(ury): voltage area bounding box,
    ## set earlier in the flow
    set sw(cellname) header_switch
    set sw(width)  x.x ;# Width of the switch cell
    set sw(height) x.x ;# Height of the switch cell

    # Bottom Side: two staggered calls, the second offset by one cell
    # height and mirrored, so every other cell is flipped
    set sw(llx) $va(llx)
    set sw(lly) [expr {$va(lly) - $sw(width)}]
    set sw(urx) $va(urx)
    set sw(ury) $va(lly)
    set sw(bbox) "$sw(llx) $sw(lly) $sw(urx) $sw(ury)"
    set sw(prefix) header_bottom_0_
    add_header_footer_cell_array \
        -lib_cell $sw(cellname) \
        -bounding_box $sw(bbox) \
        -pattern staggered \
        -orientation R90 \
        -x_increment $sw(height) \
        -prefix $sw(prefix)
    set sw(prefix) header_bottom_1_
    set sw(bbox) "[expr {$sw(llx) + $sw(height)}] $sw(lly) $sw(urx) $sw(ury)"
    add_header_footer_cell_array \
        -lib_cell $sw(cellname) \
        -bounding_box $sw(bbox) \
        -pattern staggered \
        -orientation R90_MX \
        -x_increment $sw(height) \
        -prefix $sw(prefix)

    # Left Side
    set sw(llx) [expr {$va(llx) - $sw(width)}]
    set sw(lly) $va(lly)
    set sw(urx) $va(llx)
    set sw(ury) $va(ury)
    set sw(bbox) "$sw(llx) $sw(lly) $sw(urx) $sw(ury)"
    set sw(prefix) header_left_0_
    add_header_footer_cell_array \
        -lib_cell $sw(cellname) \
        -bounding_box $sw(bbox) \
        -pattern staggered \
        -prefix $sw(prefix)
    set sw(prefix) header_left_1_
    set sw(bbox) "$sw(llx) [expr {$sw(lly) + $sw(height)}] $sw(urx) $sw(ury)"
    add_header_footer_cell_array \
        -lib_cell $sw(cellname) \
        -bounding_box $sw(bbox) \
        -pattern staggered \
        -orientation R0_MY \
        -prefix $sw(prefix)

    # Top Side
    set sw(llx) $va(llx)
    set sw(lly) $va(ury)
    set sw(urx) $va(urx)
    set sw(ury) [expr {$va(ury) + $sw(width)}]


    set sw(bbox) "$sw(llx) $sw(lly) $sw(urx) $sw(ury)"
    set sw(prefix) header_top_0_
    add_header_footer_cell_array \
        -lib_cell $sw(cellname) \
        -bounding_box $sw(bbox) \
        -pattern staggered \
        -orientation R270 \
        -x_increment $sw(height) \
        -prefix $sw(prefix)
    set sw(prefix) header_top_1_
    set sw(bbox) "[expr {$sw(llx) + $sw(height)}] $sw(lly) $sw(urx) $sw(ury)"
    add_header_footer_cell_array \
        -lib_cell $sw(cellname) \
        -bounding_box $sw(bbox) \
        -pattern staggered \
        -orientation R90_MY \
        -x_increment $sw(height) \
        -prefix $sw(prefix)

    # Right Side
    set sw(llx) $va(urx)
    set sw(lly) $va(lly)
    set sw(urx) [expr {$va(urx) + $sw(width)}]
    set sw(ury) $va(ury)
    set sw(bbox) "$sw(llx) $sw(lly) $sw(urx) $sw(ury)"
    set sw(prefix) header_right_0_
    add_header_footer_cell_array \
        -lib_cell $sw(cellname) \
        -bounding_box $sw(bbox) \
        -orientation R0_MX \
        -pattern staggered \
        -prefix $sw(prefix)
    set sw(prefix) header_right_1_
    set sw(bbox) "$sw(llx) [expr {$sw(lly) + $sw(height)}] $sw(urx) $sw(ury)"
    add_header_footer_cell_array \
        -lib_cell $sw(cellname) \
        -bounding_box $sw(bbox) \
        -orientation R180 \
        -pattern staggered \
        -prefix $sw(prefix)


9.2 Script to connect the switch cell chains

    ## -----------------------------------------------------------------------
    ## Logically connect powerdown signal
    ## -----------------------------------------------------------------------
    ## Uses va() and sw() from the placement script (Appendix 9.1); the
    ## short-chain length small_chain_number, library, and block_name are
    ## assumed to be set earlier in the flow.
    set eco_change_file switch_cell_connections.eco

    # Setup sleep enable ports: remove any cells in their existing fanout
    set power_cntrl_net(1) SHORTCHAIN_BSWITCH_ENABLE
    set afanout [all_fanout -from [get_nets $power_cntrl_net(1)]]
    if {$afanout != ""} {
        remove_cell [get_cells -of_objects $afanout]
    }
    set power_cntrl_net(0) LONGCHAIN_BSWITCH_ENABLE
    set afanout [all_fanout -from [get_nets $power_cntrl_net(0)]]
    if {$afanout != ""} {
        remove_cell [get_cells -of_objects $afanout]
    }

    # switch-cell sleep enable input and output
    set power_cntrl_pin     EN_IN
    set power_cntrl_out_pin EN_OUT

    # set the chain lengths (index 0: long chain, effectively unbounded;
    # index 1: short chain)
    set chain_lengths [list 10000 $small_chain_number]

    # start with chain 1 (the short chain)
    set chain_num 1

    # Procedure to select a header cell by point: returns the first cell
    # whose bounding box contains (my_x, my_y), or "" if none does
    proc sel_header_pt {my_x my_y cells_to_check} {
        foreach_in_collection _cell [get_cells $cells_to_check] {
            set cell_llx [get_attribute -quiet $_cell bbox_llx]
            set cell_lly [get_attribute -quiet $_cell bbox_lly]
            set cell_urx [get_attribute -quiet $_cell bbox_urx]
            set cell_ury [get_attribute -quiet $_cell bbox_ury]
            if {($my_x >= $cell_llx) && ($my_x <= $cell_urx) \
                    && ($my_y >= $cell_lly) && ($my_y <= $cell_ury)} {
                return $_cell
            }
        }
        return ""
    }

    # Left First
    set i 1
    # open eco file
    set fid [open $eco_change_file w]
    # setup the offsets: the probe point falls inside the first left cell
    set x [expr {$va(llx) - $sw(width) / 2.0}]
    set y [expr {$va(lly) + $sw(height) - 0.2}]
    # get the first switch-cell
    set prev_cell_inst [sel_header_pt $x $y *header_left*]
    set prev_cell_name [get_attribute $prev_cell_inst full_name]
    # Connect the first switch-cell to the short-chain enable
    set create_net $power_cntrl_net($chain_num)
    set pinin "$prev_cell_name/$power_cntrl_pin"
    puts $fid "+HC $create_net $pinin"
    set num_switch_cells 1 ;# count the first switch cell connected above
    for {set y [expr {$y + $sw(height)}]} {$y <= $va(ury)} {set y [expr {$y + $sw(height)}]} {
        set cell_inst [sel_header_pt $x $y *header_left*]
        set cell_name [get_attribute $cell_inst full_name]
        puts "Connecting switch-cell: $cell_name"


        incr num_switch_cells
        set pinout "$prev_cell_name/$power_cntrl_out_pin"
        set pinin  "$cell_name/$power_cntrl_pin"
        set cntrl_net $power_cntrl_net($chain_num)
        set create_net ${cntrl_net}_${i}
        set prev_cell_name $cell_name
        if {$i != [lindex $chain_lengths $chain_num]} {
            # Chain the previous cell's EN_OUT to this cell's EN_IN
            puts $fid "+HN $create_net"
            puts $fid "+HC $create_net $pinout"
            puts $fid "+HC $create_net $pinin"
            incr i
        } else {
            # move to the next chain
            set chain_num [expr {$chain_num - 1}]
            set i 1
            # connect the chain enable pin to the first switch
            # (the original used a site-specific tproc_msg proc here)
            puts "-- Connecting chain $chain_num to $power_cntrl_net($chain_num)"
            set create_net $power_cntrl_net($chain_num)
            set pinin "$prev_cell_name/$power_cntrl_pin"
            puts $fid "+HC $create_net $pinin"
        }
    }

    # TOP
    set x [expr {$va(llx) + $sw(height) - 0.2}]
    set y [expr {$va(ury) + $sw(width) / 2.0}]
    for {} {$x <= $va(urx)} {set x [expr {$x + $sw(height)}]} {
        set cell_inst [sel_header_pt $x $y *header_top*]
        set cell_name [get_attribute $cell_inst full_name]
        puts "Connecting switch-cell: $cell_name"
        incr num_switch_cells
        set pinout "$prev_cell_name/$power_cntrl_out_pin"
        set pinin  "$cell_name/$power_cntrl_pin"
        set cntrl_net $power_cntrl_net($chain_num)
        set create_net ${cntrl_net}_${i}
        set prev_cell_name $cell_name
        puts $fid "+HN $create_net"
        puts $fid "+HC $create_net $pinout"
        puts $fid "+HC $create_net $pinin"
        incr i
    }

    # RIGHT
    set x [expr {$va(urx) + $sw(width) / 2.0}]
    set y [expr {$va(ury) - ($sw(height) - 0.2)}]
    for {} {$y >= $va(lly)} {set y [expr {$y - $sw(height)}]} {
        set cell_inst [sel_header_pt $x $y *header_right*]
        set cell_name [get_attribute $cell_inst full_name]
        puts "Connecting switch-cell: $cell_name"
        incr num_switch_cells
        set pinout "$prev_cell_name/$power_cntrl_out_pin"
        set pinin  "$cell_name/$power_cntrl_pin"
        set cntrl_net $power_cntrl_net($chain_num)
        set create_net ${cntrl_net}_${i}
        set prev_cell_name $cell_name
        puts $fid "+HN $create_net"
        puts $fid "+HC $create_net $pinout"


        puts $fid "+HC $create_net $pinin"
        incr i
    }

    # BOTTOM
    set x [expr {$va(urx) - ($sw(height) - 0.2)}]
    set y [expr {$va(lly) - $sw(width) / 2.0}]
    for {} {$x >= $va(llx)} {set x [expr {$x - $sw(height)}]} {
        set cell_inst [sel_header_pt $x $y *header_bot*]
        set cell_name [get_attribute $cell_inst full_name]
        puts "Connecting switch-cell: $cell_name"
        incr num_switch_cells
        set pinout "$prev_cell_name/$power_cntrl_out_pin"
        set pinin  "$cell_name/$power_cntrl_pin"
        set cntrl_net $power_cntrl_net($chain_num)
        set create_net ${cntrl_net}_${i}
        set prev_cell_name $cell_name
        puts $fid "+HN $create_net"
        puts $fid "+HC $create_net $pinout"
        puts $fid "+HC $create_net $pinin"
        incr i
    }
    close $fid

    puts "####################################"
    puts " Number of Switch Cells: $num_switch_cells"
    puts "####################################"
    puts "Reading in the switch cell connection ECO file"
    read_mw_eco_list -lib $library -change_file $eco_change_file $block_name
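Separating the chaining rules from the geometric cell lookup makes them easy to check without a design database. In this plain-Tcl sketch, the proc chain_eco and the cell names are hypothetical, and an ordered list of cell names stands in for the sel_header_pt lookups. The first short_len cells of the ring form the short chain (taken here to be the trickle chain) and the rest form the long chain. The proc returns the same +HN/+HC change-file lines that the appendix script writes, using the same net and pin names.

```tcl
# Chaining rules of Appendix 9.2 applied to an ordered list of cells.
# The first short_len cells form the short chain; the rest form the long
# chain. Returns the ECO change-file lines (+HN creates a net, +HC
# connects a net to a pin).
proc chain_eco {cells short_len} {
    set in_pin  EN_IN
    set out_pin EN_OUT
    set chains [list [lrange $cells 0 [expr {$short_len - 1}]] \
                     [lrange $cells $short_len end]]
    set nets {SHORTCHAIN_BSWITCH_ENABLE LONGCHAIN_BSWITCH_ENABLE}
    set lines {}
    foreach chain $chains net $nets {
        if {[llength $chain] == 0} { continue }
        # The chain enable drives the first cell's EN_IN directly
        lappend lines "+HC $net [lindex $chain 0]/$in_pin"
        # Each later cell gets a new net from the previous cell's EN_OUT
        for {set i 1} {$i < [llength $chain]} {incr i} {
            set n ${net}_$i
            lappend lines "+HN $n"
            lappend lines "+HC $n [lindex $chain [expr {$i - 1}]]/$out_pin"
            lappend lines "+HC $n [lindex $chain $i]/$in_pin"
        }
    }
    return $lines
}

# Hypothetical ring of five switch cells, two of them in the short chain
set eco [chain_eco {hl_0 hl_1 hl_2 ht_0 ht_1} 2]
foreach line $eco { puts $line }
```

Writing these lines to a change file and reading it with read_mw_eco_list, as the appendix script does, would apply them to the design in ICC.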
