

Page 1: [IEEE 2009 52nd IEEE International Midwest Symposium on Circuits and Systems (MWSCAS) - Cancun, Mexico (2009.08.2-2009.08.5)] 2009 52nd IEEE International Midwest Symposium on Circuits

Low-Power FPGA Routing Switches Using Adaptive Body Biasing Technique

George V. Leming and Kundan Nepal Dept. of Electrical Engineering, Bucknell University, Lewisburg, PA 17837

Abstract: As technology scales and transistor geometries shrink, leakage current and, subsequently, total power consumption increase considerably. Many of the benefits brought forth by smaller transistors will be lost if the high power consumption problem cannot be solved. The leakage power problem is especially relevant to an FPGA because of the amount of unused interconnect and logic fabric in the chip during any operation. In this paper, we propose to lower the power consumption of a standard SRAM based FPGA by using half-width transistor stacks and adaptive body biasing techniques. SPICE simulations of a standard pass-transistor based switch block and a switch matrix from the Xilinx XC4000 FPGA show that the leakage power can be reduced by up to 46% for a 45nm technology node and up to 10% for a 70nm technology node when a switch matrix is fully loaded.

1. Introduction

In recent years, reconfigurable fabrics such as Field Programmable Gate Arrays (FPGAs) have quickly closed the performance gap with custom ASICs. FPGAs offer an extremely versatile and powerful platform through their reconfigurable architecture, and they effectively eliminate the manufacturing process involved with ASIC production. Their attractiveness has been further enhanced through increasing logic densities scaled to smaller packages (now at 65nm). Complex designs can now be quickly synthesized and tested on these platforms leading to cheaper and quicker time-to-market implementations and faster returns for digital designers.

The flexibility and versatility of the FPGA lie in its programmability. Programmability comes at the cost of extra circuitry, resulting in more power consumption. To perform the same function, a circuit implemented in an FPGA generally consumes much more dynamic and static power than its ASIC counterpart. The large area (e.g. long wires, SRAM cells) and inefficient use of the chip's resources contribute to the FPGA's poor dynamic power efficiency. Additionally, leakage current is becoming increasingly problematic as process technology is scaled down and supply and threshold voltages decrease.

Interconnection fabric occupies the most area in an FPGA. The programmability of circuitry and the generic nature of computation blocks within the FPGA mean that longer wires are usually needed compared to ASIC implementations. Longer wires present larger capacitive loading on circuits and longer charge and discharge times. Dynamic power is proportional to the capacitive loading and to the square of the power supply voltage (VDD). Most work on reducing dynamic power has focused on VDD scaling and on using multiple supply voltages [1,2]. Li et al. showed that using programmable VDD levels, the total FPGA power could be reduced by about 50% compared to a single VDD case at the same target clock frequency [2]. The majority of their savings in total power came from savings in the interconnect fabric due to multiple VDD programmability.
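The quadratic dependence on VDD can be illustrated with a minimal sketch. The function name, the activity factor, and the operating point (100 fF of wire capacitance, a 100 MHz clock) are assumptions chosen for illustration and do not come from the cited work.

```python
def dynamic_power(c_load, vdd, freq, activity=1.0):
    """Switching power in watts: activity * C * VDD^2 * f."""
    return activity * c_load * vdd ** 2 * freq


# Hypothetical operating point: 100 fF of wire capacitance, 100 MHz clock.
p_high = dynamic_power(100e-15, 1.0, 100e6)  # VDD = 1.0 V
p_low = dynamic_power(100e-15, 0.8, 100e6)   # VDD = 0.8 V

# Lowering VDD by 20% scales the switching power by 0.8^2.
ratio = p_low / p_high
```

Because capacitance enters linearly and VDD enters squared, a modest supply reduction on long, heavily loaded interconnect wires has a disproportionate effect on dynamic power.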

Static power in the form of subthreshold leakage and gate leakage is another major source of power dissipation in an FPGA. Subthreshold leakage refers to the fact that a MOS transistor that is nominally off still conducts a small current that depends exponentially on the threshold voltage. Gate leakage occurs when electrons tunnel through the gate oxide layer. Quantum mechanics shows that the probability of electron tunneling increases exponentially as the oxide layer thickness decreases. As process technology has been scaled down and threshold voltage and oxide thickness have decreased, leakage power has begun to dominate FPGA power consumption. Leakage power is even more important in an FPGA because most applications use only a small amount of the available reconfigurable fabric. The remaining unused fabric and its associated interconnect continue to leak, wasting power. Numerous researchers have shown that the interconnect structure accounts for approximately 60% of the total power dissipation in an FPGA [3,4,5,6]. Any reduction in the power consumption of this structure will lead to an overall power savings for the entire design. In this work, we have chosen to focus on the switch block and the connection blocks that form the major components of the interconnect resources inside an FPGA. We have constructed general switch points, switch blocks and connection blocks using half-width transistors and adaptive body biasing.

978-1-4244-4480-9/09/$25.00 ©2009 IEEE

2. Background and Related Work

Since the interconnect fabric has been identified as the majority power consumer in an FPGA, numerous studies have been conducted on static and dynamic power reduction in this area. At the architecture level, much research has focused on dual-VDD/dual-VT fabrics and deemed them effective for interconnect power reduction. [1] introduced such an architecture with a pre-defined ratio of VDD(high) tracks to VDD(low) tracks, resulting in power reductions of 23.45% on average. [2] varied this strategy by introducing a fine-grained VDD-programming algorithm. Coupled with fine-grained power gating, this method reduced total FPGA power by 26%, but carried an area overhead of 220%. Furthermore, Li et al. improved their power gating techniques and VDD assignment algorithm, and were able to reduce total FPGA power by 45% with an area overhead of 186% [7].

Additionally, several studies have focused on switch block circuit design to achieve low power operation. Lemieux and Lewis introduced novel routing switches using multiplexers and pass-transistors in [8] that reduced area-delay by 14% and delay by 7%. Anderson and Najm improved upon the multiplexer circuit by adding high-speed, low-power, and sleep modes to it, reducing leakage by up to 40% in low-power mode and 61% in sleep mode [9].

Furthermore, as process technology scales toward 65nm, circuit design techniques have been applied to FPGA interconnections solely for the purpose of leakage reduction. Early research in this area identified three promising leakage reduction strategies: high-VT SRAM cells, redundant SRAM cells, and negative gate biasing [5]. Lodi et al. combined super cut-off, body biasing, and multi-threshold techniques to reduce leakage current [10]. Researchers in [11] were able to reduce leakage current by 70% using input control techniques for the routing multiplexers.

3. FPGA Model and Leakage Reduction

For this study, we assume an SRAM based FPGA model commonly seen in the Xilinx family. An SRAM based FPGA is composed of an array of configurable logic blocks (CLBs) surrounded by interconnect resources as shown in Figure 1. Long wires connect between one or more CLBs and run vertically and horizontally between the CLB blocks. Connections and routing of these wires are done by switch blocks (SB) or connection blocks (CB). These blocks are generally a collection of switch points designed using tristate buffers, multiplexers or transmission gates/pass transistors. The connection and routing of a wire to other wires or logic blocks is governed by the configuration bit stored in SRAM memory. In this work, we focus on the switch block. A generic 3 connection switch block is shown in Figure 2. Three vertical wires and three horizontal wires meet at a switch point which consists of 6 programmable pass-transistors to route the signal.

Figure 1. Island style FPGA, the switch block and the connection block.

Figure 2. A generic switch block with three connections and a switch point with 6 programmable pass-transistors.
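One way to read the count of 6 pass-transistors, assuming the switch point joins four wire ends (one per side), is that there is one pass-transistor for every pair of sides. The sketch below models a switch point that way; the side labels and the `reachable_sides` helper are hypothetical names used only for illustration.

```python
from itertools import combinations

# Hypothetical labels for the four wire ends meeting at a switch point.
SIDES = ("north", "east", "south", "west")

# One programmable pass-transistor per unordered pair of sides.
PASS_TRANSISTORS = list(combinations(SIDES, 2))


def reachable_sides(source, enabled):
    """Return the sides connected to `source` through enabled pass-transistors."""
    reached = {source}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for a, b in enabled:
            if node in (a, b):
                other = b if node == a else a
                if other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return reached - {source}
```

With no pass-transistor enabled the source side reaches nothing, and enabling the pairs that include the source side routes the signal to up to three other sides.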

Subthreshold leakage current is directly related to the threshold voltage of a MOS transistor. The threshold voltage VT can be written in terms of the drain-source voltage Vds and source-body voltage Vsb as:

$$V_T = V_{T0} - \eta V_{ds} + \gamma \left( \sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s} \right)$$

where η describes the drain-induced barrier lowering (DIBL), γ is the body-effect coefficient, and φs is the surface potential. In an NMOS transistor, the threshold voltage can be increased by applying a negative bias to the substrate. This body biasing of the NMOS leads to a lower leakage current.
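The following minimal sketch evaluates the standard threshold-voltage expression with DIBL and body-effect terms (zero-bias threshold VT0, surface potential φs) and shows how a small rise in VT translates into an exponential drop in subthreshold current. All device parameters (VT0 = 0.3 V, η = 0.1, γ = 0.4 V^0.5, φs = 0.8 V, slope factor n = 1.5) are illustrative assumptions and are not taken from the 45nm or 70nm models used in this work. The thermal voltage of about 32 mV corresponds to 100 °C.

```python
import math


def threshold_voltage(v_ds, v_sb, v_t0=0.3, eta=0.1, gamma=0.4, phi_s=0.8):
    """V_T = V_T0 - eta*V_ds + gamma*(sqrt(phi_s + V_sb) - sqrt(phi_s))."""
    return v_t0 - eta * v_ds + gamma * (math.sqrt(phi_s + v_sb) - math.sqrt(phi_s))


def leakage_ratio(delta_vt, n=1.5, v_thermal=0.032):
    """Factor by which subthreshold current changes when V_T rises by delta_vt."""
    return math.exp(-delta_vt / (n * v_thermal))


# Assumed bias points: V_ds = 1.0 V, with and without 0.2 V of reverse body bias.
vt_zero_bias = threshold_voltage(v_ds=1.0, v_sb=0.0)
vt_reverse = threshold_voltage(v_ds=1.0, v_sb=0.2)

# A positive V_sb (negative substrate bias on an NMOS) raises V_T,
# so the leakage ratio falls below 1.
ratio = leakage_ratio(vt_reverse - vt_zero_bias)
```

The exact reduction depends entirely on the assumed parameters; the point of the sketch is only the direction and the exponential sensitivity of the effect.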

Figure 3. A series-transistor stack.

As transistor channels get shorter, an increase in the voltage at the drain of the MOSFET causes the depletion regions of the drain and source to interact and lower the source potential barrier, resulting in a decrease of the threshold voltage and an increase in the leakage current. This effect can be reduced by stacking transistors. For an NMOS transistor of width W in the switch shown in Figure 3, the function of the transistor is unaltered by creating two series-connected NMOS transistors each of width W/2. The gates of the two half-transistors are tied together. When the input to the gate is 0, the NMOS transistors are OFF, and due to the stacking effect, the voltage at the midway point C is slightly positive instead of 0. The bulk or substrate of the NMOS is normally tied to 0V, so this positive voltage at C makes the Vbs of transistor AC negative, increasing the body effect and hence the threshold voltage. Because the gate voltage is 0, the slight positive voltage at C also makes the Vgs of transistor AC negative, decreasing the subthreshold current. Finally, the Vds of transistor AC is reduced, which increases the barrier and reduces the DIBL effect, thereby increasing the threshold.
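The stacking effect can be explored numerically with a simple sketch. It assumes node A sits at VDD and the far end of the stack at 0 V, uses a textbook exponential subthreshold model with a linearized body effect, and finds the midpoint voltage at C by bisection so that both half-width transistors carry the same current. Every constant below is an illustrative assumption, not a value from the simulations in this paper.

```python
import math

V_THERMAL = 0.032  # kT/q near 100 C, in volts
N_SLOPE = 1.5      # assumed subthreshold slope factor
V_T0 = 0.3         # assumed zero-bias threshold, volts
ETA = 0.1          # assumed DIBL coefficient
K_BODY = 0.2       # assumed linearized body-effect coefficient
I0 = 1e-6          # assumed current scale per unit width, amperes


def subthreshold_current(width, v_gs, v_ds, v_sb):
    """Exponential subthreshold model with DIBL and linearized body effect."""
    v_t = V_T0 - ETA * v_ds + K_BODY * v_sb
    return (width * I0 * math.exp((v_gs - v_t) / (N_SLOPE * V_THERMAL))
            * (1.0 - math.exp(-v_ds / V_THERMAL)))


def stack_midpoint(vdd, width, tol=1e-9):
    """Bisect for the voltage at C where both half-width currents match."""
    lo, hi = 0.0, vdd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Top transistor (AC): gate 0, source at C, drain at VDD.
        i_top = subthreshold_current(width / 2, -mid, vdd - mid, mid)
        # Bottom transistor: gate 0, source at 0 V, drain at C.
        i_bottom = subthreshold_current(width / 2, 0.0, mid, 0.0)
        if i_top > i_bottom:
            lo = mid  # C must rise to throttle the top transistor
        else:
            hi = mid
    return 0.5 * (lo + hi)


VDD = 1.0
v_c = stack_midpoint(VDD, width=1.0)
i_single = subthreshold_current(1.0, 0.0, VDD, 0.0)  # one full-width device
i_stack = subthreshold_current(0.5, 0.0, v_c, 0.0)   # current through the stack
```

In this model the midpoint settles at a small positive voltage, and the stack current is lower than that of the single full-width transistor because the bottom device sees a much smaller Vds while the top device sees negative Vgs and Vbs.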

In our approach for lowering the leakage power consumption in the switch matrix of an FPGA interconnect network, we combine these useful techniques. We designed our routing switch with two options: a) a fixed body bias applied to the NMOS transistor, b) an adaptive body bias applied to the NMOS.

Both these cases were tested for a single full-width and two half-width series MOSFETs.

4. Results

We developed three scenarios in order to evaluate the performance of the proposed routing switch, shown in Figure 4. These scenarios are as follows:

(1) The switch circuit was tested in isolation at full capacity so several leakage-reduction techniques could be compared.

(2) The proposed circuit was tested in isolation with iterations for 1 active output, 2 active outputs, and 3 active outputs.

(3) The proposed circuit was implemented in the Xilinx XC4000 switch matrix architecture for W = 3...6, shown in Figure 5. Node 0 was arbitrarily chosen to carry the input signal, leaving all other nodes as outputs.

Figure 4. Proposed FPGA Routing Switch

Minimum-sized inverters were connected to the switch and matrix outputs in each respective scenario to simulate the typical loads seen by interconnect structures in current FPGAs. Additionally, MOSFETs were configured for routing and allowed to leak up to 10% of the supply voltage. All simulations were performed using ELDO SPICE, and were run at 100 °C for both 45nm and 70nm process technologies [12].

The simulations for the first scenario involved five types of FPGA pass-transistor routing switches:

(1) standard switch

(2) switch with single pass-transistors replaced by two half-width transistors

(3) switch (2) with a body voltage bias of 0.1 V

(4) switch (2) with a body voltage bias of 0.2 V

(5) our proposed switch

Figure 5. Xilinx XC4000 Switch Matrix Architectures for W = 3, 4, 5 and 6

Each circuit was configured to operate at full capacity, with the switch routing a constant logic 1 signal to all outputs. The average power was measured over a period of 4000ns. Figure 6 plots the results, with the five switches on the x-axis and power on the y-axis. All power measurements are normalized to those of the standard routing switch. Our proposed low-leakage switch outperforms the others by an average of 10.8% in 70nm technology and by 19.6% in 45nm technology.

Figure 6. Simulation results for Scenario 1.


Simulations for the second testing scenario were written to measure how well the proposed switch performs for different routing configurations. Each switch has four possible configurations that correspond to the number of conducting pass-transistors.

The simulations cycled sequentially through these configurations and measured the average power over a period of 4000ns. The results, shown in Figure 7, use normalized power to compare the leakage properties of our proposed switch to those of a standard FPGA routing switch. For the respective number of outputs, our switch consumed 20.8%, 10.5%, and 6.25% less power in 70nm technology. In 45nm technology, our switch consumed 15.9%, 10.7%, and 9.0% less power for 1, 2, and 3 active outputs, respectively.
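Since Figure 7 is expressed in normalized power, it can help to see how the percentage savings quoted above map onto that scale. The sketch below is a plain arithmetic conversion under the assumption that normalized power equals one minus the fractional savings relative to the standard switch. The dictionary layout and function name are hypothetical; the percentages are the ones reported in the text.

```python
# Savings reported in the text for 1, 2, and 3 active outputs, in percent.
REPORTED_SAVINGS = {
    "70nm": (20.8, 10.5, 6.25),
    "45nm": (15.9, 10.7, 9.0),
}


def normalized_power(savings_percent):
    """Power relative to the standard switch (standard switch = 1.0)."""
    return 1.0 - savings_percent / 100.0


# Normalized power per technology node, ordered by number of active outputs.
normalized = {
    node: [round(normalized_power(s), 4) for s in savings]
    for node, savings in REPORTED_SAVINGS.items()
}
```

For example, a 20.8% saving corresponds to a normalized power of 0.792.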

Figure 7. Simulation results for Scenario 2

The simulation process for the third experiment was driven by the structure of the switch itself as well as the structure of the switch matrix. We first recognized that when the switch is supplied with an input at a single node, it has four states that correspond to the number of active pass-transistors. The switch can be completely off (zero pass-transistors are conducting), or it can be transferring the input to some combination of three possible outputs (one or more pass-transistors are conducting). Furthermore, any switch in the matrix can be in any of those states at any particular point in time, resulting in 4^W possible states for the matrix.
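The size of this state space, four states for each of W switches, can be checked with a short enumeration. The sketch assumes each switch state is encoded simply as the number of conducting pass-transistors (0 to 3); the function name is hypothetical and the code is not the authors' simulation harness.

```python
from itertools import product


def matrix_states(width, states_per_switch=4):
    """Yield every assignment of a state (0..3 active pass-transistors) to each switch."""
    return product(range(states_per_switch), repeat=width)


# Count the matrix states for the switch matrix sizes considered, W = 3..6.
counts = {w: sum(1 for _ in matrix_states(w)) for w in range(3, 7)}
# counts == {3: 64, 4: 256, 5: 1024, 6: 4096}
```

Even at W = 6 the space holds only 4096 states, which is why an exhaustive sweep of all matrix states is practical.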

We developed the SPICE simulations to systematically consider all possible matrix states, giving us a comprehensive examination of the matrix’s leakage properties. The simulation starts by applying a constant voltage of 1 V at node 0 for input. Next, the simulation cycles through all possible matrix states by turning on the necessary pass-transistors. The pass-transistors are allowed to leak up to 10% of the supply voltage. Finally, the average power is measured for each matrix state, yielding a complete leakage profile for switch matrices with W = 3...6. For W = 3 to 6, the average power savings increased as the percentage of active connections in the switch matrix increased. The maximum leakage savings of our approach over a standard switch, for a fully loaded switch matrix of size W = 6 designed in 70nm technology, was 10%. The savings were much higher (46%) in the 45nm technology case.

5. Conclusion

In this paper we proposed a leakage power reduction technique for an SRAM based FPGA model commonly seen in commercial FPGAs from Xilinx. We targeted the switch block and the switch matrix structure to maximize the reduction in power. Simulation results show that our proposed method of creating a switch block and a switch matrix using a stack of half-width transistors with adaptive body biasing can lead to significant savings in leakage power as technology scales even further.

6. References

[1] Mondal, S., & Memik, S. O. (2005). A low power FPGA routing architecture. Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on, 1222-1225 Vol. 2.

[2] Li, F., Lin, Y., & He, L. (2004). VDD programmability to reduce FPGA interconnect power. Computer Aided Design, 2004. ICCAD-2004. IEEE/ACM International Conference on, 760-765.

[3] Li, F., Chen, D., He, L., & Cong, J. (2003). Architecture evaluation for power-efficient FPGAs. FPGA '03: Proceedings of the 2003 ACM/SIGDA Eleventh International Symposium on Field Programmable Gate Arrays, Monterey, California, USA. 175-184.

[4] Tuan, T., & Lai, B. (2003). Leakage power analysis of a 90nm FPGA. Custom Integrated Circuits Conference, 2003. Proceedings of the IEEE 2003, 57-60.

[5] Rahman, A., & Polavarapuv, V. (2004). Evaluation of low-leakage design techniques for field programmable gate arrays. FPGA '04: Proceedings of the 2004 ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, Monterey, California, USA. 23-30.

[6] Poon, K. K. W., Wilton, S. J. E., & Yan, A. (2005). A detailed power model for field-programmable gate arrays. ACM Trans. Des. Autom. Electron. Syst., 10(2), 279-302.

[7] Lin, Y., Li, F., & He, L. (2005). Routing track duplication with fine-grained power-gating for FPGA interconnect power reduction. ASP-DAC '05: Proceedings of the 2005 Conference on Asia South Pacific Design Automation, Shanghai, China. 645-650.

[8] Lemieux, G., & Lewis, D. (2002). Circuit design of routing switches. FPGA '02: Proceedings of the 2002 ACM/SIGDA Tenth International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA. 19-28.

[9] Anderson, J. H., & Najm, F. N. (2004). A novel low-power FPGA routing switch. Custom Integrated Circuits Conference, 2004. Proceedings of the IEEE 2004, 719-722.

[10] Lodi, A., Ciccarelli, L., & Giansante, R. (2005). Combining low-leakage techniques for FPGA routing design. FPGA '05: Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA. 208-214.

[11] Srinivasan, S., Gayasen, A., Vijaykrishnan, N., & Tuan, T. (2005). Leakage control in FPGA routing fabric. ASP-DAC '05: Proceedings of the 2005 Conference on Asia South Pacific Design Automation, Shanghai, China. 661-664.

[12] Available online at http://www.eas.asu.edu/~ptm/
