

Page 1: [IEEE 2013 IEEE Postgraduate Research in Microelectronics and Electronics Asia (PrimeAsia) - Visakhapatnam, India (2013.12.19-2013.12.21)] 2013 IEEE Asia Pacific Conference on Postgraduate

A Power Gating GALS Interface Implementation

A. Rajakumari, B.V.R.I.T, Narsapur, Hyderabad, India

N. S. Murthy Sharma, BVCE College, Odalarevu, India, 533210

K. Lal Kishore, JNTU Ananthapur, Ananthapur, India

Vasantha Kumar Petta, Synopsys, Inc., Hillsboro, U.S.A.

Abstract—In today’s nanometric VLSI designs, achieving both power and performance targets is the topmost priority for design closure. Globally asynchronous locally synchronous (GALS) architectures can offer lower dynamic power and improved performance due to the absence of a global clock. In GALS SoC architectures, each synchronous block runs on its own local clock. Synchronous blocks communicate with each other by pausing their local clocks using an asynchronous interface, which is implemented using various handshake protocols. However, any synchronous block that has to wait a long time for data from another block must remain idle and, as a result, dissipates significant leakage power in nanometric designs. Power gating is an effective technique for reducing the leakage power of an idle circuit in synchronous designs. However, implementing such a power gating interface in a GALS architecture is a challenge in the absence of a clock. To reduce the leakage power in idle blocks that are waiting for data, a new GALS wrapper interface that can generate a power gating sequence is proposed. To corroborate the proposed interface, a GALS 8051 was implemented using Synopsys SAED 90 nm libraries. The power gating sequence of the 8051 asynchronous wrappers is used to gate the power of the Random Access Memory (RAM) block while the Arithmetic Logic Unit (ALU) block is busy performing arithmetic operations. The experimental results show a 30% reduction in the leakage power of the RAM block due to power gating.

Keywords—Globally asynchronous locally synchronous (GALS); Power Gating; Clock Gating; Signal Transition Graph (STG); Asynchronous Wrappers; 4-Phase handshaking; Petri Nets; Relative Timing.

I. INTRODUCTION

Throughout the last decade there has been a resurgence of inquiry into asynchronous designs as an alternative to power-hungry synchronous designs. The absence of a global clock gives asynchronous designs advantages over synchronous designs, such as high performance and low power dissipation [1, 2, 3, 4]. Asynchronous circuits use handshake protocol control signals to transfer data and ensure correct circuit operation [5]. The critical clock-network difficulties of large SoC designs, such as clock skew, crosstalk, on-chip variation and electromigration, can be avoided in asynchronous designs. However, implementing a whole design in an asynchronous style is a challenge due to the lack of EDA tool support [6]. GALS (Globally-Asynchronous Locally-Synchronous) design methods offer a compromise between the synchronous and asynchronous design styles by separating the synchronous blocks of an SoC with asynchronous interfaces.

978-1-4799-2751-7/13/$31.00 ©2013 IEEE

In GALS architectures, each synchronous block in the SoC operates on its own local clock. Hence, EDA tools can be used to implement these synchronous blocks. Data communication among synchronous blocks is achieved via handshaking signals after pausing the local clocks. Although GALS architectures ensure lower dynamic power through the clockless asynchronous interface, the leakage contribution to total power is still a challenge in deep submicron technologies. When a local synchronous block must pause its local clock and wait a long time for data, the entire circuit is idle and dissipates a significant amount of leakage power. Power reduction is the topmost goal in the design of modern portable wireless communication devices. At the same time, the demand for low dynamic power dissipation, higher device integration and higher performance motivates CMOS scaling. The primary challenge of technology scaling is the tremendous increase in leakage power. Leakage power dissipation is closely related to process variation and hence affects standby power, active power and design margins as well [7, 8, 9, 10, 11]. Fig. 1 highlights the ITRS predictions, in which normalized leakage power gradually rises as technologies scale down [12, 13].

Designers can address dynamic power and leakage power by using advanced optimization techniques within their design flows. Power optimization is a design strategy that aims to reduce circuit power consumption without significantly degrading circuit performance. Among the power optimization and management techniques commonly used today, power gating is the most effective technique for leakage power reduction in synchronous circuits [14, 15, 16, 17, 18, 19, 20]. In power gating, the power supply to individual blocks is switched on or off depending on whether they are in the active or idle state. Power gating uses “sleep” transistors connected in series between the transistors of the logic block and the supply rails to create virtual power and ground connections in a coarse-grained manner, as shown in Fig. 2. Switching between “sleep” and “active” modes is controlled by a power gating sequence generated by a power controller that operates on the global clock [16]. The power controller is complex and must be carefully designed to prevent synchronization failures and timing violations. GALS architectures do not have a global clock, and local clocks are paused during data exchange, so designing a power controller that generates the “sleep” and “active” power gating sequence is a challenge. Hence the focus of this work is to explore the possibility of generating a power gating sequence in GALS architectures. In this paper we propose a novel GALS wrapper interface that can generate a power gating sequence. The synthesis flow for implementing the proposed GALS wrapper is also discussed in detail.

2013 IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia)

This paper is organized into seven sections. Section II briefly explains the necessary background on the conventional GALS handshaking interface. The possibility of generating a power gating sequence from the handshaking control signals, together with the proposed GALS wrapper, is discussed in Section III along with the modeling equations. Sections IV, V and VI present the experimental setup, implementation details and results of the GALS 8051, respectively, to corroborate the proposed power gating interface. Section VII concludes the paper.

II. GALS ASYNCHRONOUS INTERFACE

The conventional GALS interface is shown in Fig. 3 and Fig. 4. The data exchange between two GALS modules is handled by port controllers using handshake protocols that involve request and acknowledge signals. Port controllers are of two types, poll-type and demand-type, depending on how they control the local clock generators. A poll-type port controller stops the local clock during the handshake only if necessary, so the locally synchronous block can keep functioning while the data transfer is handled. A demand-type port controller, on the other hand, stops the local clock as soon as the locally synchronous block needs to receive or send data. Local synchronous modules are responsible for initiating any data transfer. A module does this by asserting a clock enable (Ri) signal, which stops the next cycle of its local clock generator. After the local clock of the synchronous block is paused, the request port signal (Rp) is asserted to notify the receiver block that the sender is ready to transfer data. On receiving Rp, the receiver synchronous block pauses its local clock and sets its acknowledge port signal (Ap) to indicate that it is ready to accept the data. At this point both synchronous blocks have stopped their clocks and can exchange data without any timing violations. Once the data transfer completes, the sender de-asserts its Rp signal and, in response, the receiver de-asserts its Ap signal. After the data transfer completes, the local clocks are released and the two synchronous blocks resume normal operation [21, 22].
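The demand-type handshake described above can be sketched as a short event trace. This is only an illustrative model, not the wrapper implementation: the `Block` class, the `four_phase_transfer` function and the event labels are hypothetical names introduced here, and all signal delays are ignored.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A locally synchronous block with a pausible local clock."""
    name: str
    clock_running: bool = True


def four_phase_transfer(sender, receiver, data):
    """Return (event trace, received data) for one demand-type transfer."""
    trace = []
    sender.clock_running = False      # Ri+ pauses the sender's local clock
    trace.append("Ri+")
    trace.append("Rp+")               # sender requests the transfer
    receiver.clock_running = False    # receiver pauses its clock on Rp+
    trace.append("Ap+")               # receiver acknowledges readiness
    # Both clocks are stopped here, so the data can be exchanged safely.
    assert not sender.clock_running and not receiver.clock_running
    received = data
    trace.append("data")
    trace.append("Rp-")               # sender withdraws the request
    trace.append("Ap-")               # receiver withdraws the acknowledge
    sender.clock_running = True       # local clocks are released
    receiver.clock_running = True
    trace.append("release")
    return trace, received
```

Calling `four_phase_transfer(Block("sender"), Block("receiver"), 0x5A)` returns the trace `Ri+, Rp+, Ap+, data, Rp-, Ap-, release`, which mirrors the return-to-zero ordering of the 4-phase protocol.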

Figure 1. ITRS road map showing static power trend

Figure 2. Power Gating in a System on Chip [16]

Figure 3. GALS system with two synchronous blocks

Figure 4. GALS synchronous blocks handshaking interface

Figure 5. Power gating timing sequence in synchronous designs

Figure 6. Proposed GALS power gating timing sequence


III. GALS POWER GATING WRAPPER PROPOSAL

A. Possibility for power gating in GALS architectures

Asynchronous and GALS architectures use handshaking protocols for data synchronization instead of the global clock used by synchronous architectures. Through their handshaking control signals, GALS and asynchronous circuits inherently determine the start and end of a circuit operation. Hence the handshake control signals can be used to generate a power gating sequence that determines when to switch between idle and active modes. In pure asynchronous circuits, because of the micropipeline stages, only fine-grained power gating is possible, which costs more circuit area and power [23], whereas in GALS designs any local synchronous block can be power gated using coarse-grained power switches.

B. New Power Gating Proposal

In the conventional GALS interface with demand-type ports discussed above, when data exchange between two blocks is initiated by pausing the local clocks, the receiver synchronous block becomes ready to accept the data transfer on the “Rp” output signal from the sender synchronous block. If this request signal is used as a “wake-up” signal for a power-gated receiver that is waiting for data, a power gating sequence that switches the receiver between “active” and “sleep” modes can be designed with minimal control overhead. The proposed power gating sequence is shown in Fig. 6. As with the synchronous clock-based power gating sequence described in [16] and shown in Fig. 5, a local synchronous block that has to wait a long time for data from another synchronous block can be power gated using the proposed power gating timing sequence. The typical power gating logic signals, namely isolation signals for the output ports, retention signals to save and restore sequential data, and power sleep signals to switch the block between “active” and “sleep” modes, can be generated within the 4-phase asynchronous wrapper communication sequence. Note that the clock shown in Fig. 6 represents the local pausible clock of a synchronous block, which needs to be enabled or disabled in a separate handshake sequence.
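The idea of deriving power control from handshake events can be sketched as follows. The ordering of the control actions used here (pause clock, isolate, save, power off; then power on, restore, de-isolate, resume clock) is an assumption based on common power gating practice, not a transcription of Fig. 6, and the `"idle"` event, the action names and the `power_gating_actions` function are hypothetical.

```python
# Assumed ordering of control actions; the actual sequence is defined by Fig. 6.
SLEEP_ACTIONS = ["clock_pause", "isolate_on", "retention_save", "power_off"]
WAKE_ACTIONS = ["power_on", "retention_restore", "isolate_off", "clock_resume"]


def power_gating_actions(events):
    """Map a stream of wrapper events to power-control actions.

    "idle" (hypothetical) marks the block starting a long wait for data;
    "Rp+" is the sender's request, used as the wake-up signal.
    """
    asleep = False
    actions = []
    for event in events:
        if event == "idle" and not asleep:
            actions.extend(SLEEP_ACTIONS)
            asleep = True
        elif event == "Rp+" and asleep:
            actions.extend(WAKE_ACTIONS)
            asleep = False
    return actions
```

In this sketch a request that arrives while the block is already active produces no power-control action, so the ordinary handshake is left untouched.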

C. Petri-Net and Signal Transition Graph (STG)

GALS port controllers are designed as asynchronous finite state machines triggered by input request and output acknowledge transitions. The proposed power sequence uses the same signals, so the same design methodology can be adopted. However, the design of asynchronous circuits is a complex procedure. To synthesize the power gating sequence logic of Fig. 5, the signal events need to be modeled as signal transition graphs (STGs). STGs are a subclass of Petri nets in which events are modeled as signal transitions of the circuit. Concurrent systems can be described and implemented using the Petri net (PN) model [5], which consists of transition and place components. A Petri net is a triple $N = (P, T, F)$ such that $P$ and $T$ are disjoint sets of places and transitions, respectively, and $F \subseteq (P \times T) \cup (T \times P)$ is a flow relation. A marking of $N$ is a multiset $M$ of places, i.e., $M: P \to \{0, 1, 2, \ldots\}$. In an STG, the transitions are associated with changes in the values of binary variables. These variables can be associated with wires, when modeling interfaces between blocks, or with input, output and internal signals in a control circuit [26].

A Signal Transition Graph (STG) is a quadruple $\Gamma = (N, M_0, Z, \lambda)$, where
- $\Sigma = (N, M_0)$ is a Petri net (PN) based on a net $N = (P, T, F)$,
- $Z$ is a finite set of binary signals, which generates a finite alphabet $Z^{\pm} = Z \times \{+, -\}$ of signal transitions,
- $\lambda: T \to Z^{\pm}$ is a labeling function.

In an STG, a transition of a signal “a” is labeled “a+” when the signal rises and “a-” when it falls. An STG describes the relation between events and can be viewed as a high-level specification of design behavior [27]. However, the synthesis process needs binary-labeled state transitions in order to map the logic to a technology library. For this purpose the STG is translated into a state graph (SG), which is a binary-labeled finite state machine. A state graph is a quintuple $SG = (A, S, E, \pi, s_0)$, where $A$ is a finite set of signals, $S$ is a set of states, $E \subseteq S \times S$ is a set of transitions, $\pi$ is a labeling function that labels each state $s \in S$ with a bit-vector over $A$, and $s_0$ is the initial state [28, 29]. Designers describe the Signal Transition Graph (STG) manually and then pass it to a tool such as Petrify to obtain the State Graph (SG) and the corresponding equations for the implementation circuit.
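To make the STG-to-SG translation concrete, the sketch below plays the Petri net token game on a small STG for a 4-phase handshake (Rp+, Ap+, Rp-, Ap-) and collects the reachable binary-labeled states. It is not the STG of Fig. 8 and does not reproduce Petrify; the place names `p0` to `p3` and the helper functions are hypothetical, and the net is assumed to be safe (at most one token per place), so a marking is stored as a set of places.

```python
from collections import deque

# Each transition maps to (input places, output places): the flow relation F.
TRANSITIONS = {
    "Rp+": ({"p0"}, {"p1"}),
    "Ap+": ({"p1"}, {"p2"}),
    "Rp-": ({"p2"}, {"p3"}),
    "Ap-": ({"p3"}, {"p0"}),
}
M0 = frozenset({"p0"})      # initial marking
SIGNALS = ("Rp", "Ap")      # bit-vector order used for state labels


def fire(marking, label):
    """Return the marking after firing `label`, or None if it is not enabled."""
    pre, post = TRANSITIONS[label]
    if not pre <= marking:
        return None
    return frozenset((marking - pre) | post)


def state_graph(m0, init_values):
    """Breadth-first reachability: returns (states, edges) of the state graph."""
    start = (m0, init_values)
    states, edges, queue = {start}, [], deque([start])
    while queue:
        marking, values = queue.popleft()
        for label in TRANSITIONS:
            nxt = fire(marking, label)
            if nxt is None:
                continue
            index = SIGNALS.index(label[:-1])
            bit = 1 if label[-1] == "+" else 0
            new_values = values[:index] + (bit,) + values[index + 1:]
            edges.append((values, label, new_values))
            if (nxt, new_values) not in states:
                states.add((nxt, new_values))
                queue.append((nxt, new_values))
    return states, edges
```

Starting from `state_graph(M0, (0, 0))`, the exploration visits the four binary codes 00, 10, 11 and 01 for (Rp, Ap), one per phase of the handshake.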

IV. EXPERIMENTAL SETUP

To corroborate the proposed GALS power gating technique, the Dalton project [25] 8051 synchronous microcontroller RTL is used, and asynchronous wrappers are created for it to meet the GALS criteria. The operation of the asynchronous microcontroller is similar to that of the synchronous microcontroller, with a few key differences. Unlike the synchronous version of the 8051, in the asynchronous version the clock is stopped while the I8051_CTR module waits for the I8051_ALU module to produce the result of a given operation, using the 4-phase handshaking signals generated by the ALU and controller wrappers, as shown in Fig. 7.

Figure 7. GALS 8051 architecture with asynchronous wrappers


The controller wrapper produces a request signal when it needs the ALU to perform an operation. The controller de-asserts this request signal when it receives an acknowledge signal from the ALU wrapper. The ALU wrapper produces the acknowledge signal to indicate that the ALU has completed the operation requested by the controller. The requested operation and its delay are determined by the ALU op-code. Apart from this, the onboard local clocking element behaves like the global clock, except that this clock is stopped when the request signal is asserted and the acknowledge signal is de-asserted.
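The clock-stopping rule stated above reduces to a single Boolean condition. The function below is a minimal sketch of that rule only; the name `local_clock_running` is hypothetical and the actual pausible clock element is not modeled.

```python
def local_clock_running(request: bool, acknowledge: bool) -> bool:
    """Clock is stopped only while request is asserted and acknowledge is not."""
    return not (request and not acknowledge)


# Truth table over the four (request, acknowledge) combinations.
TRUTH_TABLE = {
    (req, ack): local_clock_running(req, ack)
    for req in (False, True)
    for ack in (False, True)
}
```

Under this rule the clock is paused in exactly one of the four combinations, the interval in which the controller is waiting for the ALU result.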

Figure 8. Signal transition graph (STG) and State Graph (SG) for proposed power gating sequence

Figure 9. Implementation Flow for synthesis

V. SYNTHESIS FLOW

In the asynchronous 8051, only the 8051_CTR and 8051_RAM blocks use the clock signal. The RAM block is implemented with the SAED 90 nm [30, 31, 32] standard cell library along with the other blocks and contributes about 50% of the total power. The synthesis results show that the critical delay of the 8051_ALU is that of the division operation; the next two sub-critical delays are those of the multiplication and addition operations. To take advantage of the proposed power gating technique, a UPF (Unified Power Format) based power switch is created for 8051_RAM using the power controller logic signals. During division, multiplication and addition operations, 8051_RAM is switched off through the UPF switch. With the UPF specification, the switching activity of the 8051_RAM logic in post-synthesis simulation is not counted in the power analysis results during “sleep” mode. The power intent diagram for the 8051_RAM block is shown in Fig. 10 with all power gating sequence control signals. The asynchronous wrappers for 8051_ALU and 8051_CTR, along with the local pausible clocking element, were designed based on 4-phase handshaking communication protocol principles [33]. The STG, SG and corresponding equations for the power gating sequence are generated using Petrify. After the equations for the SG are obtained, relative timing constraints [34, 35, 36] are developed for synthesis to obtain variable delays. The STG and SG for the proposed GALS power gating sequence are shown in Fig. 8. The implementation flow for synthesis is shown in Fig. 9. Pre- and post-synthesis simulation is carried out using Synopsys VCS® with UPF. Synthesis is done with Synopsys Design Compiler®. The power analysis results are obtained using Synopsys PrimeTime PX® with switching activity information from simulation.

Figure 10. Power Intent Specification for post synthesis simulation with power switch, isolation to outputs and retention control

VI. EXPERIMENTAL RESULTS

Tables I and II compare the power reports of the 8051_RAM block without and with power gating. There is a 30% reduction in the leakage power of the 8051_RAM block during the power gating stage. Fig. 11 shows the power gating timing sequence during the division operation of the ALU. Fig. 12 shows the entire simulation result for a complete division operation, which includes reading data from the I8051_RAM module, sending the fetched data to the I8051_ALU module for the appropriate logical or arithmetic operation, and then writing the results of the ALU operation back into the I8051_RAM module. Besides the division operation, several other ALU operations are executed during this simulation time. For example, the PCSADD ALU op-code is used in calculating address offsets in jump instructions executed by the controller, and the PCUADD ALU op-code is used in incrementing the program counter inside the controller; since both op-codes make use of the addition function, these ALU operations are assumed to be part of the addition delay. Because of this, the power gating sequence is also generated during PCSADD and PCUADD related arithmetic operations, based on the request and acknowledge signals from the ALU and CTR asynchronous wrappers, as shown in Fig. 12. During the power gating stage all the interface signals are inactive and appear in the “CORRUPT” state in the simulation results, as shown in Fig. 12. The local clock generator output is paused whenever the CTR WRAPPER block asserts the request signal, which eventually causes the power-down (sleep) sequence to be generated. Likewise, on receiving the acknowledge from the ALU WRAPPER block after an addition, multiplication or division operation completes, the power-up (active/wake-up) sequence is generated.

TABLE I. 8051 Power report without power gating

S.No   Hierarchy    Leakage Power   Total Power   %
1      I8051_ALL    3.44E-04        6.79E-04      100
2      I8051_RAM    2.08E-04        3.63E-04      53.4

TABLE II. 8051 Power report with power gating

S.No   Hierarchy    Leakage Power   Total Power   %
1      I8051_ALL    2.88E-04        6.68E-04      100
2      I8051_RAM    1.51E-04        3.54E-04      53
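As a cross-check of the tabulated figures, the relative leakage reduction can be recomputed directly from Tables I and II. The snippet below uses only the values printed in the tables (whose units are not stated in the reports); the dictionary and function names are hypothetical. With these values the I8051_RAM leakage reduction works out to roughly 27%, which is of the same order as the approximately 30% quoted in the text.

```python
# (leakage power, total power) per hierarchy, copied from Tables I and II.
WITHOUT_PG = {"I8051_ALL": (3.44e-4, 6.79e-4), "I8051_RAM": (2.08e-4, 3.63e-4)}
WITH_PG = {"I8051_ALL": (2.88e-4, 6.68e-4), "I8051_RAM": (1.51e-4, 3.54e-4)}


def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


def leakage_reductions():
    """Leakage power reduction per hierarchy due to power gating."""
    return {
        name: reduction_percent(WITHOUT_PG[name][0], WITH_PG[name][0])
        for name in WITHOUT_PG
    }


if __name__ == "__main__":
    for name, value in leakage_reductions().items():
        print(f"{name}: {value:.1f}% lower leakage power")
```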

Figure 11. Power gating sequence timing during division operation

Figure 12. Simulation results for division operation

VII. CONCLUSIONS

In deep submicron technology, leakage power is a critical issue, so it is very important to minimize leakage power as much as possible. Although GALS system architectures ensure lower dynamic power through clock gating, leakage power needs to be addressed as well. In this paper a novel power gating sequence for GALS designs, derived from their 4-phase handshaking signals, is proposed along with a relative timing constraint based synthesis implementation flow. The experimental results are promising in terms of reduced leakage power in the power-gated blocks. As future work, more techniques need to be explored for scenarios where the power-gated block has to remain in the sleep state for a long time.

REFERENCES

[1] I. E. Sutherland and J. Ebergen, “Computers without Clocks”, Scientific American, August 2002, pp. 62-69.

[2] A. Davis and S. M. Nowick, “An Introduction to Asynchronous Circuit Design”, Technical Report UUCS-97-013, Computer Science Department, University of Utah, Sep 1997.

[3] S. Hauck, “Asynchronous design methodologies: an overview”, Proceedings of the IEEE, Vol. 83, Issue 1, Jan 1995, pp. 69-93.

[4] A. Bink and Mark de Clercq, “ARM996HS Synthesizable CPU with Clockless Technology”, Information Quarterly, Vol. 5, No. 4, 2006, pp. 20-24.

[5] J. Sparso and S. Furber, Principles of Asynchronous Circuit Design: A Systems Perspective, Kluwer Academic Publishers, London, 2001.

[6] C. J. Chen, W. M. Cheng, H. Y. Tsai and J. C. Wu, “A Quasi-Delay-Insensitive Microprocessor Core Implementation for Microcontrollers”, Journal of Information Science and Engineering, Vol. 25, No.2, March 2009, pp. 543-557.

[7] Kumar, B.V.P.V, Sharma N.S.M, Kishore K.L and Goel N. “Leakage Power recovery in spare cells by using state dependent leakage tables from library models”, Asia Pacific Conference on Post Graduate Research in Microelectronics & Electronics, 5-12 Dec 2012, pp.19-24.

[8] A. Agarwal, S. Mukhopadhyay, A. Raychowdhury, K. Roy, and C.H. Kim, “Leakage Power Analysis and Reduction for Nanoscale Circuits,” IEEE Micro, vol. 26, no. 2, pp. 68-80, Mar. 2006.

[9] H. Rahman and C. Chakrabarti, “A Leakage Estimation and Reduction Technique for Scaled CMOS Logic Circuits Considering Gate Leakage,” Proc. Int’l Symp. Circuits and Systems, pp. 297-300, vol. 2, 2004.


[10] T. Krishnamohan, Z. Krivokapic, K. Uchida, Y. Nishi, and K. C. Saraswat, “High-Mobility Ultrathin Strained Ge MOSFETs on Bulk and SOI with Low Band-to-Band Tunneling Leakage: Experiments,” IEEE Trans. Electron Devices, vol. 53, no. 5, pp. 990-999, May 2006.

[11] Kim N.S, Austin T, Blaauw D, Mudge T, Flautner K, Hu J.S, Irwin M.J, Kandemir M, Narayanan V., “Leakage current: Moore's law meets static power”, IEEE Computer Society, vol. 36, no. 12, pp. 68-75, Dec 2004.

[12] International Technology Roadmap for Semiconductors, [Online]. Available: http://www.itrs.net

[13] P. E. Zeitzoff and J. E. Chung, “A Perspective from the 2003 ITRS: MOSFET Scaling Trends, Challenges, and Potential Solutions,” IEEE Circuits and Devices Magazine, vol. 21, no. 1, pp. 4-15, Jan./Feb. 2005.

[14] K. Lal Kishore, V.S.V. Prabhakar, “VLSI Design”, I.K International Pvt Ltd, Jan 2009, pp 414.

[15] Raj Nair, Donald Bennett, “Power Integrity Analysis and Management for Integrated Circuits”, Prentice Hall, May 2010, pp. 432.

[16] Michael Keating, David Flynn, Robert Aitken, Alan Gibbons, Kaijian Shi, “Low Power Methodology Manual for System-on-Chip Design”, Springer, Jan 2007, pp 300.

[17] “Reducing Power with Advanced Synthesis”, Synopsys Insight, 2011, Issue4, [online] Available, http://www.synopsys.com/Company/Publications, Synopsys, Inc.

[18] Prance Zhang, “Dynamic Power Optimization with IC Compiler and Prime Time PX”, proc. SNUG China 2011.

[19] Li Ju Chi, “Power Gating Implementation Techniques for 90nm IP Library”, proc. SNUG Taiwan 2008.

[20] Maria Tovey, “Analyzing the Effectiveness of Multi-Voltage Power Saving Techniques with PT-PX”, Tutorial, proc. SNUG San Jose 2011.

[21] X. Fan, M. Krstic and E. Grass, “Analysis and optimization of pausible clocking based GALS design”, in proc. 27th IEEE ICCD, 2009.

[22] Chong-Fatt Law, Bah-Hwee Gwee and Joseph S. Chang, “Modeling and Synthesis of Asynchronous Pipelines”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 19, No. 4, April 2011.

[23] Tong Lin, Kwen-Siong Chong, Bah-Hwee Gwee, and Joseph S. Chang. “Fine-grained power gating for leakage and short-circuit power reduction by using asynchronous-logic”, IEEE International Symposium on Circuits and Systems, 24-27 May 2009, pp 3162-3165.

[24] S. Hauck, “Asynchronous design methodologies: An overview,” Proc. of the IEEE, vol. 83, no. 1, Jan. 1995, pp. 69–93.

[25] Dalton Project. University of California, Department of Computer Science, [online] Available , http://www.cs.ucr.edu/~dalton/8051

[26] Sufian Sudeng and Arthit Thongtak “Signal transition graph based logic synthesis for asynchronous control circuits using template based method” IEEE TENCON Conference, 2007, pp 1-4.

[27] L. Rosenblum and A. Yakovlev, “Signal graphs: From self-timed to timed ones”, Proceedings of International Workshop on Timed Petri Nets, 1985.

[28] L. Lavagno, “Synthesis and Testing of Bounded Wire Delay Asynchronous Circuits from Signal Transition Graphs”, PhD thesis, U.C. Berkeley, 1992.

[29] T. Murata, “Petri nets: Properties, analysis and applications,” in Proceedings of the IEEE, vol. 77, no. 4, April 1989, pp. 541–580.

[30] A. Rajakumari, N. S. Murthy Sharma and K. Lal Kishore, “A Novel Approach to Reduce Leakage Power in GALS system architectures”, vol 36, no. 5, Dec 2011.

[31] Goldman R., Bartleson K., Wood T., Kranen K., Cao C., Melikyan V., Markosyan G., "Synopsys' open educational design kit: capabilities, deployment and future", IEEE International Conference on Microelectronic Systems Education, 2009, pp. 20-24

[32] Goldman R., Bartleson K., Wood T., Kranen K., Cao C., Melikyan V., "Synopsys' Interoperable Process Design Kit", European Workshop on Microelectronics Education, 2010

[33] Yuan-Teng Chang, Wei-Che Chen, Hung-Yue Tsai, Wei-Min Cheng, Chang-Jiu Chen and Fu-Chiung Cheng, “A Low-latency GALS Interface Implementation”, Proceedings of IEEE Asia Pacific Conference on Circuits and Systems, 2010, pp 1183-1186.

[34] Kenneth S. Stevens, Ran Ginosar and Shai Rotem, “Relative Timing”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol 11, no. 1, Feb 2003, pp 129-140.

[35] Kenneth S. Stevens, Yang Xu, and Vikas Vij, “Characterization of Asynchronous Templates for Integration into Clocked CAD Flows”, 15th IEEE Symposium on Asynchronous Circuits and Systems, 2009, pp 151-161.

[36] Ryan Mabry, “Asynchronous Implementation of 8051 Microcontroller”, Honor’s thesis, USF, 2005.
