ieeevlsi1011

CEDRONICS Academic Projects

A Brand for the Successful Projects

VLSI IEEE PROJECTS 2010-12

CEDRONICS, G2, Dharani Residency, Behind Allahabad Bank, Radhika Theatre to Moulali Road, Gayathrinagar, ECIL Hyderabad, Andhra Pradesh.

[email protected], [email protected] 550 550, (0) 99660 56748, (0)8897252837

Academic Projects

2010 - 2012

Why Cedronics

Develops your concept or IEEE paper of your choice * 100+ projects to choose from * Complete Guidance * On time Completion * Excellent Support * Multi platform Training * Flexibility

PROJECTS SUPPORTS & DELIVERABLES

Project Abstracts & IEEE Paper

Project Block Diagram & Circuit Diagram

Datasheets and Manuals

PPT & Review Details Guidance

Project Report Guidance

Working Procedure & Screen Shots

Materials & Reference Books in DVD





4-bit SFQ Multiplier Based on Booth Encoder

Abstract

We have designed a 2-bit Booth encoder with Josephson Transmission Lines (JTLs) and Passive Transmission Lines (PTLs) using cell-based techniques and tools. Booth encoding is one of the algorithms for obtaining partial products. With this method, the number of partial products is halved compared to the AND array method. We have fabricated a test chip for a multiplier with a 2-bit Booth encoder with JTLs and PTLs. It has a processing frequency of 20 GHz with a bias margin of 25%. The frequency of this circuit increases to 45 GHz when the bias voltage is raised 25% above the design voltage. The circuit area of the multiplier designed with the Booth encoder method is compared to that designed with the AND array method.
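The halving of partial products can be illustrated in software. The sketch below assumes that the "2-bit Booth encoder" corresponds to radix-4 (modified) Booth recoding, which the abstract does not spell out; the function names are hypothetical and the code models only the arithmetic, not the SFQ circuit.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit i carries weight 4**i.
    Assumption: radix-4 recoding stands in for the paper's 2-bit encoder.
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width
    # Two's-complement bit pattern with an implicit 0 appended below bit 0.
    pattern = (multiplier & ((1 << bits) - 1)) << 1
    digits = []
    for i in range(0, bits, 2):
        triplet = (pattern >> i) & 0b111  # bits y[i+1], y[i], y[i-1]
        y_prev = triplet & 1
        y_cur = (triplet >> 1) & 1
        y_next = (triplet >> 2) & 1
        digits.append(y_prev + y_cur - 2 * y_next)
    return digits


def booth_multiply(multiplicand, multiplier, bits):
    """Sum one partial product per Booth digit (bits/2 of them)."""
    product = 0
    for i, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        product += (digit * multiplicand) << (2 * i)
    return product
```

For a 4-bit multiplier this yields two partial products where an AND array would produce four.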

A New Quaternary FPGA Based on a Voltage-mode Multi-valued Circuit

Abstract

FPGA structures are widely used due to early time-to-market and reduced non-recurring engineering costs in comparison to ASIC designs. Interconnections play a crucial role in modern FPGAs, because they dominate delay, power and area. Multiple-valued logic reduces the number of signals in the circuit, and hence can serve as a means to effectively curtail the impact of interconnections. In this work we propose a new FPGA structure based on a low-power quaternary voltage-mode device. The most important characteristics of the proposed architecture are the reduced fanout, the low number of wires and switches, and the small wire length. We use a set of FIR filters as a demonstrator of the benefits of the quaternary representation in FPGAs. Results show a significant reduction in power consumption with small timing penalties.

Voltage Scalable High-Speed Robust Hybrid Arithmetic Units Using Adaptive Clocking

Abstract

In this paper, we explore various arithmetic units for possible use in high-speed, high-yield ALUs operated at scaled supply voltage with adaptive clock stretching. We demonstrate that careful logic optimization of the existing arithmetic units (to create hybrid units) indeed makes them further amenable to supply voltage scaling. Such hybrid units result from mixing the right amount of fast arithmetic into the slower ones. Simulations on different hybrid adders and multipliers in BPTM 70 nm technology show 18%–50% improvements in power compared to standard adders with only a 2%–8% increase in die area at iso-yield. These optimized datapath units can be used to construct voltage scalable robust ALUs that can operate at high clock frequency with minimal performance degradation due to occasional clock stretching.

Complexity Analysis and Efficient Implementations of Bit Parallel Finite Field Multipliers Based on Karatsuba-Ofman Algorithm on FPGAs

Abstract

This paper presents complexity analysis [both in application-specific integrated circuits (ASICs) and on field-programmable gate arrays (FPGAs)] and efficient FPGA implementations of bit parallel mixed Karatsuba–Ofman multipliers (KOM). By introducing common expression sharing and complexity analysis on odd-term polynomials, we achieve a lower gate bound than previous ASIC discussions. The analysis is extended by using 4-input/6-input lookup tables (LUTs) on FPGAs. For an arbitrary bit-depth, the optimum iteration step is shown. The optimum iteration steps differ for ASICs, 4-input LUT-based FPGAs and 6-input LUT-based FPGAs. We evaluate the LUT complexity and area-time product tradeoffs on FPGAs with different computer-aided design (CAD) tools. Furthermore, the experimental results on FPGAs for bit parallel modular multipliers are shown and compared with previous implementations. To the best of our knowledge, our bit parallel multipliers consume the least resources among known FPGA implementations to date.
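As a rough software illustration of the Karatsuba–Ofman idea for binary polynomials, the sketch below splits each operand in half and replaces four half-size products with three. Polynomials over GF(2) are held as Python integers (bit i is the coefficient of x^i). The `threshold` parameter, where recursion gives way to schoolbook multiplication, is a hypothetical stand-in for the paper's "optimum iteration step"; the paper's own values and its expression sharing are not reproduced here.

```python
def clmul_schoolbook(a, b):
    """Carry-less (GF(2)) polynomial multiplication, one shift-and-XOR per set bit."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a, b, n, threshold=4):
    """Multiply two GF(2) polynomials with at most n coefficients each.

    Assumption: `threshold` is an illustrative cut-over point, not a value
    taken from the paper.
    """
    if n <= max(threshold, 1):
        return clmul_schoolbook(a, b)
    half = (n + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = karatsuba_gf2(a_lo, b_lo, half, threshold)
    hi = karatsuba_gf2(a_hi, b_hi, n - half, threshold)
    # One product of the folded halves replaces the two cross products.
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, half, threshold)
    return lo ^ ((mid ^ lo ^ hi) << half) ^ (hi << (2 * half))
```

Reduction modulo the field polynomial, needed for a complete finite field multiplier, is deliberately left out of this sketch.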

Systematic Design of RSA Processors Based on High-Radix Montgomery Multipliers

Abstract

This paper presents a systematic design approach to provide optimized Rivest–Shamir–Adleman (RSA) processors based on high-radix Montgomery multipliers satisfying various user requirements, such as circuit area, operating time, and resistance against side-channel attacks. To account for the tradeoff between performance and resistance, we apply four types of exponentiation algorithms: two variants of the binary method with/without the Chinese Remainder Theorem (CRT). We also introduce three multiplier-based datapath architectures using different intermediate data forms: 1) single form, 2) semi carry-save form, and 3) carry-save form, and combine them with a wide variety of arithmetic components. Their radices are parameterized. A total of 242 datapaths for 1024-bit RSA processors were obtained for each radix. The potential of the proposed approach is demonstrated through an experimental synthesis of all possible processors with a 90-nm CMOS standard cell library. As a result, designs ranging from the smallest, with 861 gates at 118.47 ms/RSA, to the fastest, at 0.67 ms/RSA with 153 862 gates, were obtained. In addition, the use of the CRT technique reduced the RSA operation time of the fastest design to 0.24 ms. Even when we employed the exponentiation algorithm resistant to typical side-channel attacks, the fastest design can perform the RSA operation in less than 1.0 ms.
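A minimal software sketch of digit-serial (high-radix) Montgomery multiplication, with the radix 2^k as a parameter, may help make the approach concrete. The function names and the left-to-right binary exponentiation loop are illustrative assumptions; the paper's datapath forms (single, semi carry-save, carry-save) and its side-channel countermeasures are not modeled. `pow(n, -1, base)` requires Python 3.8 or later.

```python
def montgomery_multiply(x, y, n, k, digits):
    """Return x * y * R^-1 mod n, with R = 2**(k * digits).

    Requires n odd, n < R, and 0 <= x, y < n. One radix-2**k digit of x
    is consumed per loop iteration.
    """
    base = 1 << k
    n_neg_inv = (-pow(n, -1, base)) % base  # -n^-1 mod 2**k
    acc = 0
    for i in range(digits):
        x_digit = (x >> (k * i)) & (base - 1)
        acc += x_digit * y
        q = (acc * n_neg_inv) & (base - 1)  # makes acc + q*n divisible by 2**k
        acc = (acc + q * n) >> k
    return acc - n if acc >= n else acc


def montgomery_modexp(message, exponent, n, k=16):
    """Binary-method exponentiation carried out in the Montgomery domain."""
    digits = -(-n.bit_length() // k)  # ceiling division
    r = 1 << (k * digits)
    msg_bar = montgomery_multiply(message % n, (r * r) % n, n, k, digits)
    acc = r % n  # Montgomery form of 1
    for bit in bin(exponent)[2:]:
        acc = montgomery_multiply(acc, acc, n, k, digits)
        if bit == "1":
            acc = montgomery_multiply(acc, msg_bar, n, k, digits)
    return montgomery_multiply(acc, 1, n, k, digits)  # leave the Montgomery domain
```

A larger `k` means fewer loop iterations per multiplication at the cost of wider digit products, which is the area/time tradeoff the parameterized radix exposes.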

A Lightweight High-Performance Fault Detection Scheme for the Advanced Encryption Standard Using Composite Fields

Abstract

The faults that accidentally or maliciously occur in the hardware implementations of the Advanced Encryption Standard (AES) may cause erroneous encrypted/decrypted output. The use of appropriate fault detection schemes for the AES makes it robust to internal defects and fault attacks. In this paper, we present a lightweight concurrent fault detection scheme for the AES. In the proposed approach, the composite field S-box and inverse S-box are divided into blocks and the predicted parities of these blocks are obtained. Through exhaustive searches among all available composite fields, we have found the optimum solutions for the least overhead parity-based fault detection structures. Moreover, through our error injection simulations for one S-box (respectively inverse S-box), we show that a total error coverage of almost 100% for 16 S-boxes (respectively inverse S-boxes) can be achieved. Finally, it is shown that both the application-specific integrated circuit and field-programmable gate array implementations of the fault detection structures using the obtained optimum composite fields have better hardware and time complexities compared to their counterparts.

A Median Filter FPGA with Harvard Architecture

Abstract

To improve the speed of the image processing chip, to capture market share quickly and to reduce costs, this paper designs a chip with Harvard Architecture on an FPGA. The chip also uses a new hardware algorithm. Using the chip, the processing time is 13.2% less than that of the chip with Von Neumann Architecture. The filter occupies 13% of the FPGA's gates, less than the portion claimed for the multi-image processing chip.
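For readers unfamiliar with the operation being accelerated, a plain software median filter can be sketched as follows. The 3x3 window and the unchanged border pixels are assumptions made for illustration; the abstract states neither the window size nor the paper's new hardware algorithm.

```python
def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighbourhood.

    `image` is a list of equal-length rows of integers. Border pixels are
    copied unchanged (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = sorted(
                image[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            out[r][c] = window[4]  # middle of nine sorted values
    return out
```

A single bright outlier in an otherwise flat region is removed, which is why the filter is popular for impulse noise.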

A Pipeline VLSI Architecture for High-Speed Computation of the 1-D Discrete Wavelet Transform

Abstract

In this paper, a scheme for the design of high-speed pipeline VLSI architecture for the computation of the 1-D discrete wavelet transform (DWT) is proposed. The main focus of the scheme is on reducing the number and period of clock cycles for the DWT computation with little or no overhead on the hardware resources by maximizing the inter- and intrastage parallelisms of the pipeline. The interstage parallelism is enhanced by optimally mapping the computational load associated with the various DWT decomposition levels to the stages of the pipeline and by synchronizing their operations. The intrastage parallelism is enhanced by decomposing the filtering operation equally into two subtasks that can be performed independently in parallel and by optimally organizing the bitwise operations for performing each subtask so that the delay of the critical data path from a partial-product bit to a bit of the output sample for the filtering operation is minimized. It is shown that an architecture designed based on the proposed scheme requires a smaller number of clock cycles compared to that of the architectures employing comparable hardware resources. In fact, the requirement on the hardware resources of the architecture designed by using the proposed scheme also gets improved due to a smaller number of registers that need to be employed. Based on the proposed scheme, a specific example of designing architecture for the DWT computation is considered. In order to assess the feasibility and the efficiency of the proposed scheme, the architecture thus designed is simulated and implemented on a field-programmable gate-array board. It is seen that the simulation and implementation results conform to the stated goals of the proposed scheme, thus making the scheme a viable approach for designing a practical and realizable architecture for real-time DWT computation.
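The computational load that the scheme maps onto pipeline stages halves at every decomposition level. The sketch below shows this with a multi-level 1-D DWT; the Haar (average/difference) filters are an assumption chosen for brevity and are not the filters used in the paper, and the input length is assumed to be divisible by 2**levels.

```python
def haar_dwt_level(signal):
    """One decomposition level: pairwise averages and pairwise half-differences."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Repeatedly decompose the approximation band.

    Returns the final approximation and a list of detail bands,
    one per level, finest first.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)
    return approx, details
```

With eight input samples and three levels, the detail bands hold 4, 2 and 1 coefficients, so level 1 carries half of the total work and each later level half of what remains.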

An Autonomous Vector/Scalar Floating Point Coprocessor for FPGAs

Abstract

We present a Floating Point Vector Coprocessor (FPVC) that works with the Xilinx embedded processors. The FPVC is completely autonomous from the embedded processor, exploiting parallelism and exhibiting greater speedup than alternative vector processors. The FPVC supports scalar computation so that loops can be executed independently of the main embedded processor. Floating point addition, multiplication, division and square root are implemented with the Northeastern University VFLOAT library. The FPVC is parameterized so that the number of vector lanes and maximum vector length can be easily modified. We have implemented the FPVC on a Xilinx Virtex 5 connected via the Processor Local Bus (PLB) to the embedded PowerPC. Our results show more than five times improved performance over the PowerPC augmented with the Xilinx Floating Point Unit on applications from linear algebra: QR and Cholesky decomposition.

An FPGA Based Real-time Remote Temperature Measurement System

Abstract

This project presents a wireless re-programmable real-time temperature measurement system designed using a hardware description language and realized in hardware using a field programmable gate array (FPGA). The proposed system is able to measure the real-time temperature of various remote locations, each to an accuracy of 0.25 °C. It uses wireless transmission with a data rate of 115 Kbytes/s to transmit the measured temperatures to the central control system for monitoring purposes. In addition, the proposed system incorporates a feature that controls the temperature at the remote locations in real time. The system works effectively over a distance of 60 m between the temperature measurement locations and the control station. Since the proposed system uses a reprogrammable controller, it is possible to customize the design for various industry applications. This paper presents the simulation and experimental results of the proposed system.

An FPGA-based Architecture for Linear and Morphological Image Filtering

Abstract

Field Programmable Gate Array (FPGA) technology has become a viable target for the implementation of real-time algorithms suited to video image processing applications. The unique architecture of the FPGA has allowed the technology to be used in many applications encompassing all aspects of video image processing. Among those algorithms, linear filtering based on a 2D convolution and non-linear 2D morphological filters represent a basic set of image operations for a number of applications. In this work, an implementation of linear and morphological image filtering on a Nexys II board with a Xilinx Spartan 3E FPGA, intended for educational purposes, is presented. The system is connected to a USB port of a personal computer, and together they form a powerful and low-cost design station. The FPGA-based system is accessed through a MATLAB graphical user interface, which handles the communication setup. A comparison between results obtained from MATLAB simulations and the described FPGA-based implementation is presented.

Design and Characterization of Parallel Prefix Adders using FPGAs

Abstract

Parallel-prefix adders (also known as carry tree adders) are known to have the best performance in VLSI designs. However, this performance advantage does not translate directly into FPGA implementations due to constraints on logic block configurations and routing overhead. This paper investigates three types of carry-tree adders (the Kogge-Stone, sparse Kogge-Stone, and spanning tree adder) and compares them to the simple Ripple Carry Adder (RCA) and Carry Skip Adder (CSA). These designs of varied bit-widths were implemented on a Xilinx Spartan 3E FPGA and delay measurements were made with a high-performance logic analyzer. Due to the presence of a fast carry-chain, the RCA designs exhibit better delay performance up to 128 bits. The carry-tree adders are expected to have a speed advantage over the RCA as bit widths approach 256.
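The Kogge-Stone structure named above can be modeled bit-parallel in software. The sketch below is a behavioural illustration only: the function name is hypothetical, and it says nothing about the FPGA mapping or delays measured in the paper. Each pass of the loop corresponds to one prefix level, so a `width`-bit adder needs about log2(width) levels.

```python
def kogge_stone_add(a, b, width):
    """Add two width-bit unsigned integers with a Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    generate = a & b
    propagate = a ^ b
    half_sum = propagate
    distance = 1
    while distance < width:
        # Combine each group with the group `distance` bits below it.
        generate = (generate | (propagate & (generate << distance))) & mask
        propagate = (propagate & (propagate << distance)) & mask
        distance <<= 1
    # The carry into bit i is the group generate of bits 0..i-1.
    carries = (generate << 1) & mask
    carry_out = (generate >> (width - 1)) & 1
    return half_sum ^ carries, carry_out
```

A ripple carry adder would instead need `width` sequential carry steps, which is the gap that FPGA fast carry chains narrow in the comparison above.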

Design and FPGA Implementation of Modified Distributive Arithmetic Based DWT – IDWT Processor for Image Compression

Abstract

Image compression is one of the major image processing techniques and is widely used in medical, automotive, consumer and military applications. The discrete wavelet transform (DWT) is the most popular transformation technique adopted for image compression. The complexity of the DWT is always high due to the large number of arithmetic operations. In this work a modified Distributive Arithmetic (DA) based DWT architecture is proposed and implemented on FPGA. The modified approach occupies 6% of the area of a Virtex-II Pro FPGA and operates at 134 MHz. The modified DA-DWT architecture has a latency of 44 clock cycles and a throughput of 4 clock cycles. This design is twice as fast as the reference design and is thus suitable for applications that require high speed image processing algorithms.
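The sketch below illustrates the basic lookup-table idea, assuming that "Distributive Arithmetic" here refers to the standard distributed arithmetic technique: a filter inner product with fixed coefficients is computed from a precomputed table and shift-accumulate steps, with no multipliers. The paper's modification is not described in the abstract and is not reproduced; the function names are hypothetical.

```python
def build_da_lut(coeffs):
    """Table entry `addr` holds the sum of coefficients whose bit is set in addr."""
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_inner_product(lut, samples, bits):
    """Compute sum(coeffs[k] * samples[k]) one bit-plane at a time.

    `samples` are two's-complement integers that fit in `bits` bits.
    """
    acc = 0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k  # gather bit b of every sample
        term = lut[addr] << b
        # The sign bit of a two's-complement number has negative weight.
        acc += -term if b == bits - 1 else term
    return acc
```

The table grows as 2**taps, so practical designs split long filters into several smaller tables.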





Design and Implementation of a Low-Complexity RAKE Receiver and Channel Estimator for DS-UWB

Abstract

In this paper, the design and implementation of a low complexity Direct Sequence Ultra-Wideband (DS-UWB) receiver subsystem which incorporates a Channel Estimator (CE) and a novel hybrid Partial/Selective (HPS) RAKE Receiver (RR) using maximal ratio combining (MRC) is presented. The proposed architecture demonstrates the tradeoff between energy capture, performance and receiver complexity by combining the benefits of both partial and selective RAKE receiver algorithms. We focus our work on a highly parallel, modular, synthesizable design which is based on FPGA technology and is optimized for high performance.

Design and Implementation of CORDIC Processor for Complex DPLL

Abstract

Nowadays various Digital Signal Processing systems are implemented on a platform of programmable signal processors or on application specific VLSI chips. The COordinate Rotation DIgital Computer (CORDIC) algorithm has turned out to be one such programmable signal processor. In recent times, it has been a widely researched topic in the field of vector rotated Digital Signal Processing (DSP) applications due to its simplicity. This paper presents the design of a pipelined architecture for the coordinate rotation algorithm for the computation of loop performance of a complex Digital Phase Locked Loop (DPLL) in an in-phase and quadrature channel receiver. The design of CORDIC in the vector rotation mode results in high system throughput due to its pipelined architecture, where latency is reduced in each of the pipelined stages. For on-chip application, the area reduction in the proposed design is achieved through optimization of the number of micro rotations. For better loop performance of the first order complex DPLL and to minimize quantization error, the number of iterations is also optimized.
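A floating-point sketch of CORDIC in rotation mode shows how the number of micro rotations trades accuracy against hardware. Each iteration uses only shifts and adds in a real datapath; floating point is used here purely for readability, and the default of 16 iterations is an illustrative assumption, not the paper's optimized value.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians using CORDIC micro rotations.

    Converges for |angle| up to roughly 1.74 rad. Each micro rotation i
    turns the vector by +/- atan(2**-i).
    """
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    residual = angle
    for i in range(iterations):
        direction = 1 if residual >= 0 else -1
        shift = 2.0 ** -i
        x, y = x - direction * y * shift, y + direction * x * shift
        residual -= direction * math.atan(shift)
    # Undo the accumulated CORDIC gain (about 1.6468 for many iterations).
    return x / gain, y / gain
```

Calling `cordic_rotate(1.0, 0.0, theta)` approximates `(cos(theta), sin(theta))`; fewer iterations save pipeline stages and area but leave a larger residual angle.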





Design and Optimization of Serial communication system interface module

Abstract

Serial communication systems have been widely used in data communication and control systems because they need few hardware resources, resist jamming, and are easy to implement. In this paper, an FPGA-based high performance serial communication system interface module which includes the full functions of the UART16550 is designed and optimized based on the communication protocol and working principles. Various technologies are adopted during the design and optimization procedure, such as the three always block coding style, EDA optimization, circuit optimization, and so on. The frequency of the optimized design is up to 166 MHz, and the power consumption is reduced by 63.9% to 0.147 W. The test data at a typical baud rate of 115200 and the results analyzed using MATLAB are presented. The test results indicate that the optimized design communicates correctly and steadily.

FPGA-based improvement of classical current tracking methods for high-frequency power converters

Abstract

Embedded microprocessors require efficient supply management systems to optimize their power consumption and to enhance their calculation potential. Typically, the modules performing this function are known as Voltage Regulator Modules (VRMs). It is widely accepted that current-programmed regulation techniques offer advantages in the control of this kind of power converter. However, these strategies require fine inductor-current sensing to achieve accurate results. One critical issue in inductor-current sensing is the effect of parasitic inductances in the measurement loop. This undesirable effect produces a considerable mismatch between the real inductor current waveform and the equivalent voltage image captured across the shunt resistance. Further, this unwanted deviation grows as the current value increases. As a result, the data obtained become unreliable. However, today's commercial digital controllers, like FPGAs, can be used to greatly reduce the aforementioned drawback. The presented work exploits some intrinsic advantages of FPGAs, such as their great processing speed and their parallel working mode, to overcome this drawback. Therefore, a new digital auto-tuning system is proposed in which this undesirable effect is treated and compensated. The obtained result is a digital signal which avoids the parasitic effect of the inductance in the measurement loop. In the last part of our work, some experimental results, using an FPGA, validate the advantages of the proposed method.

FPGA Based Wide Range Optical Sensor: Vibration Detection of Compressor Blades in High Speed Turbine Engines

Abstract

Vibration in high speed turbine engines is a performance limiting factor. Optical sensors can be used to accurately measure vibration in high speed turbo machinery. This paper gives an overview of the development of the hardware and software whose purpose is to quantify these vibrations and present the data to the end user in an easy to understand fashion.

FPGA Design for Multi-Filtering Techniques Using Flag-Bit and Flicker Clock

Abstract

Real time systems typically suffer from delay in data processing. This delay has many causes, such as computational power, processor unit architecture, and synchronization signals in these systems. To increase the processing power, and hence the performance, a new architecture and clocking technique are presented in this paper. This new architecture design, called Embedded Parallel Systolic Filters (EPSF), processes data gathered from sensors and landmarks using a high density FPGA chip. The results show that the EPSF architecture and the flag bit with a flicker clock perform significantly better on multiple input sensor signals under both continuous and interrupted conditions. Unlike the usual processing units in previous tracking and navigation systems used in robots, this system allows autonomous control of the robot through multiple filtering techniques and processing strategies. Furthermore, it also offers speedy performance, reducing the delay by about 50%.

FPGA-based multi-channel CRC generator implementation

Abstract

This article mainly describes a way of designing a parallel and highly pipelined Cyclic Redundancy Code (CRC) generator. The design can handle five different channels at an input rate of 2 Gbps each. The generated CRCs are compatible with the 32-bit Ethernet standards. This circuit has been implemented on the Altera EP2C35F672C6 chip using the properties of Galois Fields. The synthesis results show that the design can meet the needs of high-speed data integrity checks.
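As a reference point for the 32-bit Ethernet CRC, the sketch below computes it in software two ways: bit-serially, and with a byte-wide table that processes eight bits per step (a small-scale analogue of the parallelism a hardware generator exploits). The function names are hypothetical, and the paper's five-channel pipelined circuit is not modeled.

```python
CRC32_POLY_REFLECTED = 0xEDB88320  # bit-reflected form of the Ethernet polynomial 0x04C11DB7


def crc32_bitwise(data):
    """Bit-serial CRC-32 as used by Ethernet (reflected, init and final XOR all ones)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def build_crc32_table():
    """Precompute the effect of eight shift steps for every possible byte."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32_POLY_REFLECTED if crc & 1 else 0)
        table.append(crc)
    return table


def crc32_bytewise(data, table):
    """Table-driven CRC-32: one lookup per input byte."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = table[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

Both functions return 0xCBF43926 for the standard check string `b"123456789"`, matching `zlib.crc32`.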

High-Speed Low-Power Viterbi Decoder Design for TCM Decoders

Abstract

High-speed, low-power design of Viterbi decoders for trellis coded modulation (TCM) systems is presented in this paper. It is well known that the Viterbi decoder (VD) is the dominant module determining the overall power consumption of TCM decoders. We propose a pre-computation architecture incorporated with -algorithm for VD, which can effectively reduce the power consumption without degrading the decoding speed much. A general solution to derive the optimal pre-computation steps is also given in the paper. Implementation results for a VD for a rate-3/4 convolutional code used in a TCM system show that, compared with the full trellis VD, the pre-computation architecture reduces the power consumption by as much as 70% without performance loss, while the degradation in clock speed is negligible.

Accelerating the Non-uniform Fast Fourier Transform using FPGAs

Abstract

We present an FPGA accelerator for the Non-uniform Fast Fourier Transform (NuFFT), which is a technique to reconstruct images from arbitrarily sampled data. We accelerate the compute-intensive interpolation step of the NuFFT Gridding algorithm by implementing it on an FPGA. In order to ensure efficient memory performance, we present a novel FPGA implementation for Geometric Tiling based sorting of the arbitrary samples. The convolution is then performed by a novel Data Translation architecture which is composed of a multi-port local memory, a dynamic coordinate-generator and a plug-and-play kernel pipeline. Our implementation is in single-precision floating point and has been ported onto the BEE3 platform. Experimental results show that our FPGA implementation can generate fairly high performance without sacrificing flexibility for various data sizes and kernel functions. We demonstrate up to 8X speedup and up to 27 times higher performance-per-watt over a comparable CPU implementation and up to 20% higher performance-per-watt when compared to a relevant GPU implementation.





(CMOS design) Low-Power and Area-Efficient Carry Select Adder

Abstract

Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for reducing the area and power consumption in the CSLA. This work uses a simple and efficient gate-level modification to significantly reduce the area and power of the CSLA. Based on this modification, 8-, 16-, 32-, and 64-b square-root CSLA (SQRT CSLA) architectures have been developed and compared with the regular SQRT CSLA architecture. The proposed design has reduced area and power as compared with the regular SQRT CSLA with only a slight increase in the delay. This work evaluates the performance of the proposed designs in terms of delay, area, power, and their products by hand with logical effort and through custom design and layout in 0.18-µm CMOS process technology. The results analysis shows that the proposed CSLA structure is better than the regular SQRT CSLA.
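To show where the redundancy in a regular CSLA comes from, the behavioural sketch below computes every block twice, once for each possible carry-in, and then selects one result. This models the regular structure only; the paper's gate-level modification is not described in the abstract and is not reproduced. The function name and the example block sizes are assumptions.

```python
def carry_select_add(a, b, block_sizes, carry_in=0):
    """Add two unsigned integers with a regular carry select structure.

    `block_sizes` lists the bit width of each block from the least
    significant end; their sum is the adder width.
    Returns (sum modulo 2**width, carry_out).
    """
    total = 0
    shift = 0
    carry = carry_in
    for size in block_sizes:
        mask = (1 << size) - 1
        a_block = (a >> shift) & mask
        b_block = (b >> shift) & mask
        sum_if_0 = a_block + b_block      # speculative result for carry-in 0
        sum_if_1 = a_block + b_block + 1  # speculative result for carry-in 1
        chosen = sum_if_1 if carry else sum_if_0  # the multiplexer
        total |= (chosen & mask) << shift
        carry = chosen >> size
        shift += size
    return total, carry
```

For example, `carry_select_add(a, b, [2, 2, 3, 4, 5])` models a 16-bit adder with growing block widths in the square-root style; the duplicated adder in each block is the area and power cost that the abstract targets.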

Multi-Output Synchronously-Rectified Forward Converter with Load Transient Considered

Abstract

In this paper, an FPGA-counter-based scheme is presented and applied to a forward converter with a single isolation stage and multiple outputs having synchronous rectification (SR). With only the required comparators and without any analog-to-digital converter (ADC), the information on the feedback output voltage is obtained entirely from a counter. Therefore, the proposed control topology for an SR forward converter can improve the load transient response and the cross regulation. Besides, to further improve the load transient response, the proposed nonlinear control technique is applied. The proposed control scheme is described and some experimental results are provided to verify its effectiveness.

MULTIPLE COMMUNICATION–DOMAINS DESIGN IN FPGA–BASED SYSTEMS–ON–CHIP

ABSTRACT

The increasing complexity of modern digital devices brings ever increasing communication requirements and an ever increasing heterogeneity of the target applications. Specifically, different communication domains may be implemented using the same chip area, for instance to allow multiple parallel applications to be loaded onto the device. A flexible, reliable yet performant communication infrastructure is hereby proposed, to ensure inter–domain communication and cooperation. A novel communication–centric design is proposed to easily integrate classical bus–based systems with Network–on–Chip architectures, taking directly into consideration the resource requirements of the target FPGA device.

VLSI ARCHITECTURE OF PARALLEL MULTIPLIER–ACCUMULATOR BASED ON RADIX-2 MODIFIED BOOTH ALGORITHM

ABSTRACT

A new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic is proposed. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator, which has the largest delay in the MAC, was merged into the CSA, the overall performance was elevated. The proposed CSA tree uses 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for the sign extension in order to increase the bit density of the operands. The proposed MAC showed superior properties to the standard design in many ways and twice the performance of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.

LIBOR MARKET MODEL SIMULATION ON AN FPGA PARALLEL MACHINE

ABSTRACT

In this paper, we present a high performance scalable FPGA design and implementation of an interest rate derivative pricing engine that targets cap pricing. The design consists of a Gaussian random number generator, based on the Mersenne Twister uniform random generator, and a Monte Carlo path generation engine which calculates the prices of an interest rate derivative based on the LIBOR market model. We implemented this design on the Maxwell FPGA supercomputer using up to 32 Xilinx XC4VFX100 FPGA nodes. We have also compared our FPGA hardware implementation with an equivalent optimized pure software implementation running on up to 32 2.8 GHz Xeon processors with 1 GB RAM each. This showed our FPGA implementation to be 58x faster than the optimized software implementation, while being more than two orders of magnitude more energy efficient. These results scale linearly with the number of FPGA and Xeon processor nodes used.

The Design of Image Edge Detection System Based on EDA Technique

Abstract

The principle of the image edge detection system and the advantages of FPGA technology in processing speed and development time are discussed briefly, and the feasibility of implementing image edge detection based on the Sobel operator is analyzed. A coprocessor module for image edge detection using EDA and FPGA techniques is presented to meet the real-time requirements of image edge detection. Synthesis and waveform simulation prove that the design of the image edge detection system is correct.
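The Sobel operator mentioned above can be sketched in a few lines. The magnitude is approximated as |Gx| + |Gy|, an assumption commonly made to avoid a square root in hardware; the threshold parameter and the function name are likewise illustrative and not taken from the paper.

```python
def sobel_edges(image, threshold):
    """Return a binary edge map for a grayscale image given as a list of rows.

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Horizontal gradient: right column minus left column, centre row doubled.
            gx = (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r][c - 1] - image[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row, centre column doubled.
            gy = (image[r + 1][c - 1] + 2 * image[r + 1][c] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r - 1][c] - image[r - 1][c + 1])
            magnitude = abs(gx) + abs(gy)
            edges[r][c] = 1 if magnitude > threshold else 0
    return edges
```

On an image with a vertical step from dark to bright, only the two pixel columns adjacent to the step are marked.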

Three-phase Voltage Doubler Rectifier Based on Three-state switching Cell for Uninterruptible Power Supply Applications Using FPGA

Abstract

This paper presents a three-phase voltage doubler rectifier based on three-state switching cells for Uninterruptible Power Supply (UPS) applications using FPGA. Its main features are: high power factor, reduced conduction losses, weight and volume, a simple control strategy based on One-cycle Control (OCC), and a connection between input and output enabling the use of an inverter and bypass. A theoretical analysis, simulation results and preliminary experimental results from a 9 kW development stage lab model are presented.





(CMOS design) Design and Implementation of a Parallel Turbo-Decoder ASIC for 3GPP-LTE

Abstract

Turbo-decoding for the 3GPP-LTE (Long Term Evolution) wireless communication standard is among the most challenging tasks in terms of computational complexity and power consumption of corresponding cellular devices. This paper addresses design and implementation aspects of parallel turbo decoders that reach the 326.4 Mb/s LTE peak data-rate using multiple soft-input soft-output decoders that operate in parallel. To highlight the effectiveness of our design approach, we realized a 3.57 mm² radix-4-based 8× parallel turbo-decoder ASIC in 0.13 µm CMOS technology achieving 390 Mb/s. At the more realistic 100 Mb/s LTE milestone targeted by industry today, the turbo-decoder consumes only 69 mW.

(CMOS design) A 2 Gb/s 5.6 mW Digital LOS/NLOS Equalizer for the 60 GHz Band

Abstract

The wide unlicensed bandwidth of a 60 GHz channel presents an attractive opportunity for high data rate and low power personal area networks (PANs). The use of single-carrier modulation can yield energy-efficient transmitter and receiver implementation, but equalization of the long channel response in non-line-of-sight (NLOS) conditions presents a significant challenge. A digital equalizer for 60 GHz channels has been designed for both line of sight (LOS) and NLOS channel conditions to meet the IEEE WPAN standard. Power consumption is minimized by using a parallelized distributed arithmetic (DA) architecture. A 2 mm × 2 mm test chip in 65 nm CMOS implements a 6 tap feed forward and 32 tap feedback equalizer that can be configured to cancel the response of up to 72 symbols, and consumes 5.6 mW at 2 Gb/s throughput. The chip also includes a channel estimator based on a Golay correlator for setting the equalizer coefficients and estimating frequency and timing error.





(CMOS design) An Efficient 10GBASE-T Ethernet LDPC Decoder Design With Low Error Floors

Abstract

A grouped-parallel low-density parity-check (LDPC) decoder is designed for the (2048, 1723) Reed-Solomon-based LDPC (RS-LDPC) code suitable for 10GBASE-T Ethernet. A two-step decoding scheme reduces the word length to 4 bits while lowering the error floor to below 10 �_ BER. The proposed post-processor is conveniently integrated with the decoder, adding minimal area and power. The decoder architecture is optimized by groupings so as to localize irregular interconnects and regularize global interconnects, and the overall wiring overhead is minimized. The 5.35 mm², 65 nm CMOS chip achieves a decoding throughput of 47.7 Gb/s. With scaled frequency and voltage, the chip delivers a 6.67 Gb/s throughput necessary for 10GBASE-T while dissipating 144 mW of power.

(CMOS design) Characterization of Dynamic SRAM Stability in 45 nm CMOS

Abstract

Optimization of SRAM yield using dynamic stability metrics has been evaluated in the past to ensure continued scaling of bitcell size and supply voltage in future technology nodes. Various dynamic stability metrics have been proposed but they have not been used in practical failure analysis and compared with conventional static margins. This work compares static and dynamic metrics to identify expected correlations. A dynamic stability characterization architecture using pulsed word-lines is implemented in 45 nm CMOS to identify sources of variability, and their impact on SRAM stability. Static read margins were observed to overestimate failures by 10–100× while static write margins failed to predict outliers in critical writeability. Critical writeability was demonstrated to exhibit an enhanced sensitivity to process variations, random telegraph noise (RTN), and negative bias temperature instability (NBTI), compared to static write margins.
