
Computer Engineering
Mekelweg 4,

2628 CD Delft
The Netherlands

http://ce.et.tudelft.nl/

2010

MSc THESIS

Experimental Analysis on ECC Schemes for Fault-Tolerant Hybrid Memories

Zaiyan Ahyadi

Abstract

Faculty of Electrical Engineering, Mathematics and Computer Science

CE-MS-2010-02

Hybrid memories are one of the emerging memory technologies for future data storage. These memories are structured by integrating non-CMOS nanodevices (e.g., carbon nanotubes, single electron junctions, organic molecules) with CMOS devices. The non-CMOS nanodevices build up the crossbar-based memory cells, whereas the CMOS devices form the peripheral circuits. CMOS/Molecular (CMOL) memory is an example of hybrid memories. In spite of providing a huge data capacity and low power consumption, such memories suffer from a high degree of cluster faults impacting their reliability.
This thesis investigates the use of error correction codes (ECCs) to tolerate faults in hybrid memories. The ECCs considered in this work are Hamming, Reed Solomon (RS), and Redundant Residue Number System (RRNS) codes. The error correction capability and the cost incurred (in terms of area and time overhead of the encoder and decoder) for each ECC and for different input data widths are analyzed. The experimental results show that RS and RRNS codes are able to correct cluster faults, yet require a higher cost as compared to the Hamming code, which can only correct a single fault at a lower cost. Moreover, the area cost of the RS and RRNS encoder/decoder tends to increase linearly and exponentially, respectively, as the input data width becomes bigger. Meanwhile, the time overhead of RS remains steady while that of RRNS increases linearly as the input data width increases. Overall, RS is the best ECC to tolerate cluster faults in hybrid memories.


Experimental Analysis on ECC Schemes for Fault-Tolerant Hybrid Memories

THESIS

submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

in

COMPUTER ENGINEERING

by

Zaiyan Ahyadi
born in Banjarmasin, Indonesia

Computer Engineering
Department of Electrical Engineering
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology


Experimental Analysis on ECC Schemes for Fault-Tolerant Hybrid Memories

by Zaiyan Ahyadi

Abstract

Hybrid memories are one of the emerging memory technologies for future data storage. These memories are structured by integrating non-CMOS nanodevices (e.g., carbon nanotubes, single electron junctions, organic molecules) with CMOS devices. The non-CMOS nanodevices build up the crossbar-based memory cells, whereas the CMOS devices form the peripheral circuits. CMOS/Molecular (CMOL) memory is an example of hybrid memories. In spite of providing a huge data capacity and low power consumption, such memories suffer from a high degree of cluster faults impacting their reliability.

This thesis investigates the use of error correction codes (ECCs) to tolerate faults in hybrid memories. The ECCs considered in this work are Hamming, Reed Solomon (RS), and Redundant Residue Number System (RRNS) codes. The error correction capability and the cost incurred (in terms of area and time overhead of the encoder and decoder) for each ECC and for different input data widths are analyzed. The experimental results show that RS and RRNS codes are able to correct cluster faults, yet require a higher cost as compared to the Hamming code, which can only correct a single fault at a lower cost. Moreover, the area cost of the RS and RRNS encoder/decoder tends to increase linearly and exponentially, respectively, as the input data width becomes bigger. Meanwhile, the time overhead of RS remains steady while that of RRNS increases linearly as the input data width increases. Overall, RS is the best ECC to tolerate cluster faults in hybrid memories.

Laboratory : Computer Engineering
Codenumber : CE-MS-2010-02

Committee Members :

Advisor: Dr. Ir. Said Hamdioui, CE, TU Delft

Advisor: Nor Zaidi Haron, MSc., CE, TU Delft

Chairperson: Dr. Ir. Koen Bertels, CE, TU Delft

Member: Dr. Ir. Stephan Wong, CE, TU Delft

Member: Dr. Ir. Jaap Hoekstra, EE, TU Delft


Contents

List of Figures

List of Tables

Acknowledgements

1 INTRODUCTION
   1.1 Potentials and Challenges of Hybrid Memories
   1.2 Related Work
   1.3 Contribution of This Project
   1.4 Outline of The Report

2 HYBRID CMOS/NON-CMOS MEMORIES
   2.1 CMOS Memories
      2.1.1 Memory classification
      2.1.2 Behavioral model of memory
      2.1.3 Functional model of memory
      2.1.4 Electrical model of RAM
   2.2 Hybrid Memory
      2.2.1 Hybrid memory classification
      2.2.2 CMOS/Molecular hybrid memories
      2.2.3 CMOL memory structure and operation
   2.3 Defect and Fault in Hybrid Memories

3 ERROR CORRECTION CODES
   3.1 Error Correction Code Concept
   3.2 Hamming Code
      3.2.1 Hamming encoding
      3.2.2 Hamming decoding
   3.3 Reed Solomon Code
      3.3.1 RS encoding
      3.3.2 RS decoding
   3.4 Redundant Residue Number System Code
      3.4.1 RRNS encoding
      3.4.2 RRNS decoding

4 IMPLEMENTATION
   4.1 Hamming Code
      4.1.1 Design of Hamming encoder
      4.1.2 Design of Hamming decoder
   4.2 Reed Solomon Code
      4.2.1 Design of RS encoder
      4.2.2 Design of RS decoder
   4.3 Redundant Residue Number System ECC
      4.3.1 Design of RRNS encoder
      4.3.2 Design of RRNS decoder

5 EXPERIMENTAL RESULTS AND ANALYSIS
   5.1 Simulation Setup
      5.1.1 Hamming simulation
      5.1.2 Reed Solomon simulation
      5.1.3 RRNS simulation
   5.2 Synthesis Results
   5.3 Discussion and Analysis
      5.3.1 Memory cell array overhead
      5.3.2 Error correction capability
      5.3.3 Overall comparison
   5.4 Exploring Area and Time Overhead for General Cases
      5.4.1 Synthesis results of RS and RRNS for various data width
      5.4.2 Cost estimation of RRNS for higher error correction capability

6 CONCLUSIONS AND RECOMMENDATIONS
   6.1 Conclusions
   6.2 Recommendations

Bibliography


List of Figures

2.1 Memory classification
2.2 Block diagram of RAM behavioral model
2.3 Functional model (data path and control) of a RAM chip
2.4 A six-transistor CMOS SRAM cell
2.5 A one-transistor CMOS DRAM cell
2.6 An 8 by 8 decoder architecture
2.7 Static row decoder circuit
2.8 Fast column decoder
2.9 RAM write circuit
2.10 Schematic view of a nanoelectronic crossbar-based nanoarchitecture
2.11 Classification of hybrid memory
2.12 Structure of three dimensional hybrid memory
2.13 The generic CMOL circuit (a) a side view schematic (b) a top view schematic showing the idea of addressing a particular nanodevice via a pair of CMOS lines and interface pins (c) an equivalent electrical circuit of the top view schematic
2.14 Schematic structure of (a) top level of CMOL memory (b) one block of CMOL
2.15 Low level structure of CMOL memory
2.16 Classification of defects in CMOL
3.1 Classification of ECCs
3.2 Generic codeword for systematic code
3.3 Classification of ECCs based on types of faults they can correct
3.4 A block diagram of a hybrid memory with ECC scheme
3.5 Circuit that generates the sequence of elements of GF(2^3) defined by x^3 + x + 1 = 0
3.6 Codeword for Reed Solomon with GF(2^3)
3.7 Structure of RRNS codeword
4.1 Hamming codeword for 16 bits data
4.2 Circuit for Hamming encoder for 16 bits data
4.3 Hamming decoder circuit for 16 bits data
4.4 Reed Solomon codeword for 16 bits dataword
4.5 The modified RS codeword (a) before interleave (b) after interleave
4.6 GF(2^4) generator defined by x^4 + x^3 + 1 = 0
4.7 Circuit diagram of hardware RS encoder for 8 bits data using GF(2^4)
4.8 (a) Multiplying input b_i with α^i (b) Circuit diagram multiplying b_i with α^5
4.9 Circuit diagram of RS decoder for 8 bits data using GF(2^4)
4.10 RRNS codeword for 16 bits input
4.11 Block diagram of RRNS encoder
4.12 Block diagram of RRNS decoder
5.1 (a) Simulation setup to verify the functionality of encoder and decoder (b) Simulation setup to evaluate the error correction capability of each ECC
5.2 The error file that is masked with the codeword in memory cells
5.3 Hamming simulation
5.4 Bit positions for each symbol in RS codeword
5.5 Reed Solomon simulation
5.6 Bit positions for each residue in RRNS codeword
5.7 RRNS simulation
5.8 Comparison of the three ECCs (a) area overhead and (b) time overhead using Xilinx (from Table 5.1)
5.9 Comparison of the three ECCs (a) area overhead and (b) time overhead using Synopsys (from Table 5.2)
5.10 RS and RRNS (a) area overhead and (b) time overhead (using Xilinx tools)
5.11 RS and RRNS (a) area overhead and (b) time overhead using Synopsys tools


List of Tables

3.1 Illustration of the encoding process of non-systematic Hamming code
3.2 Illustration of the decoding process of non-systematic Hamming code
3.3 Table of elements of GF(2^3)
4.1 Table of elements of GF(2^4)
4.2 Multiplying 4 bits input with an element of GF(2^4)
5.1 Synthesis results of the three ECCs using Xilinx ISE 10.1 for 16 bits data input
5.2 Synthesis results of the three ECCs using Synopsys for 16 bits data input
5.3 Codeword bit length of Hamming, RS and RRNS code
5.4 Error correction capability of Hamming, RS and RRNS code
5.5 Area overhead of RS and RRNS encoder-decoder (using Xilinx tools)
5.6 Time overhead of RS and RRNS encoder-decoder (using Xilinx tools)
5.7 Area overhead of RS and RRNS encoder-decoder (using Synopsys tools)
5.8 Time overhead of RS and RRNS encoder and decoder (using Synopsys tools)


Acknowledgements

Alhamdulillah, thanks to Allah, the God Almighty, for giving me the grace and strength to complete this thesis.

I would like to take this opportunity to thank all the people who contributed to this work and helped me during the course of this project.

First and foremost, I owe my deepest gratitude to Dr. Ir. Said Hamdioui for his guidance, patience, and fruitful ideas. Secondly, thanks to Nor Zaidi Haron for the interesting discussions, feedback, and time.

Special thanks to Seyab, Muttaqillah, Innocent, Andre, and Tisha for their friendship and support.

Most importantly, I would like to thank my family members for their support and love from afar during my study at TU Delft. They always cheered me up in my difficult times.

Zaiyan Ahyadi
Delft, The Netherlands
January 4, 2010


1 INTRODUCTION

The size of the devices used to construct electronic components has decreased at an astonishing pace over the past fifty years. It has followed the familiar Moore's Law, which states that the number of devices that can be placed inexpensively in a single unit area of an integrated circuit doubles approximately every two years. The continuation of technology scaling in order to follow Moore's Law brings electronics to the nanoscale, hence the name nanoelectronics.

Besides high density, nanoelectronics has other advantages such as high speed, low power dissipation, and potentially low fabrication cost. In nanoelectronics, the Complementary Metal Oxide Semiconductor (CMOS) gate length becomes smaller. The shorter the gate length, the faster the transistor and the less power it dissipates. Non-CMOS materials such as carbon nanotubes (CNT) [1] and III-V materials [2] have significantly higher intrinsic mobility than silicon (Si), the material used in CMOS. Therefore, they can potentially be used to replace Si as the channel of the transistor for high speed applications [1, 2]. However, non-CMOS devices cannot instantly replace CMOS devices because of some limitations and disadvantages. Moreover, the non-CMOS technology is still new and requires more understanding.

The lack of a full understanding of non-CMOS technology raises the idea of using a combination of CMOS and non-CMOS devices in a single circuit, which is named a hybrid circuit. One of the emerging hybrid technologies is hybrid memories. In a hybrid memory, non-CMOS devices are used to form the memory cell array, whereas CMOS devices are used to build the peripheral circuits such as encoder, decoder, and controller. One of the hybrid memories is CMOS/Molecular (CMOL) memory [3]. This memory is structured by integrating a non-CMOS circuit on top of a CMOS circuit using an architecture known as crossbar-based nano-on-CMOS [4]. Conical-shaped interface pins are used to connect these two circuits. According to [5], CMOL memory is predicted to be able to store about 1 terabit of data.

1.1 Potentials and Challenges of Hybrid Memories

As mentioned above, the advantages of hybrid memories are huge storage capacity, high speed, low power dissipation, and potentially low fabrication cost. Despite that, these memories are prone to defects and faults. Defects, including stuck-open/close and missing non-CMOS devices, occur because of the immature bottom-up fabrication techniques (e.g., self-assembly) [6]. Moreover, the imprecise top-down fabrication techniques (e.g., nanoimprint and lithography) introduce defects such as open, short, misaligned, or loose nanowires and interface pins. These defects might lead to hard or intermittent faults. In addition, the tiny CMOS components are likely to suffer from transient faults, due to the lower signal-to-noise ratio and parametric variations. Furthermore, because the nanodevices used to structure the memory arrays are closely connected, the impact of defects and faults might extend over several contiguous memory cells. When this problem happens, the data in the impacted memory cells may flip either from 0 to 1 or vice versa. The flipped data is said to be in an erroneous state, which deviates from the original data it represents.

In order to mitigate these problems, researchers have reinvented and reinnovated classic fault tolerance schemes, such as Built-In Self Test (BIST) [7], Triple Modular Redundancy (TMR) [8], and Error Correcting Codes (ECC) [9]. Among the fault tolerance schemes, ECC is the most used due to its dynamic error detecting and correcting capability. There are numerous types of ECC such as Hamming, Golay, Reed Muller, Bose-Chaudhuri-Hocquenghem, Reed Solomon, and Redundant Residue Number System codes.

This thesis investigates the capability of three error correction codes, namely Hamming, Reed Solomon (RS), and Redundant Residue Number System (RRNS), to deal with faults in hybrid memories. To achieve that, a comparison in terms of correction capability, area, and time overhead is performed. The purpose of this comparison is to determine the appropriate ECC to be used for hybrid CMOS/non-CMOS memories.

1.2 Related Work

Existing memory technology has applied dynamic fault tolerance schemes based on ECC. The most used ECC for high speed memories is the Hamming code, because of its simplicity and low latency [9]. This code provides single error correction and double error detection (SEC-DED). For example, IBM has applied this code in their computing systems, including memory [10].

Reed Solomon code has been used in telecommunication systems. This code is used to correct bursts or clusters of errors. In the computing field, Reed Solomon code has been used as a fault tolerance scheme for memories. Cardarilli et al. proposed a Reed Solomon code for DRAM for satellite applications [11]. B. Chen and X. Zhang proposed to use Gray code instead of binary code for bit mapping to reduce the bit error rate of multi-level NAND flash memory [12]. They claimed that it can achieve a coding gain without any overhead and similar error-correcting performance as the Bose-Chaudhuri-Hocquenghem (BCH) code.

RRNS has been used in telecommunication and signal processing [13]. Recently, this code has been proposed for memory systems. Haron and Hamdioui [14] employed RRNS in their research on fault-tolerant hybrid memories. They show that RRNS is suitable to correct cluster errors in hybrid memories. In another work [15], the authors introduced the idea of using a specific moduli set instead of an arbitrary moduli set in order to obtain a fault-tolerant hybrid memory with low impact on area and time overhead. The researchers in [16] tried to optimize the area overhead of RRNS by reducing the number of components like multipliers and adders in the design. Research on improving the reliability of hybrid memories has been done by Strukov by applying a Bose-Chaudhuri-Hocquenghem ECC to CMOL memory [17]. Sun and Zhang propose a two-level Bose-Chaudhuri-Hocquenghem ECC as fault tolerance for nanoscale memory [18]. Naeimi and DeHon propose Euclidean Geometry codes for hybrid memory [19], and Jeffrey and Figueiredo propose hierarchical fault tolerance by using a multilevel Hamming ECC for nanoscale memory [20]. According to their publications, these works have proven that ECC is able to improve the reliability of hybrid memories.

1.3 Contribution of This Project

This project investigates the cost of implementing three ECCs, namely Hamming, RS, and RRNS, for transient fault mitigation in hybrid memories. The area and time overhead of the encoder and decoder of each ECC are estimated using hardware design tools. This gives an indication of which ECC is suitable for hybrid memories.

The main contributions of this work are:

• Experimental evaluation of the efficiency of existing ECCs in terms of error correction capability, hardware, and time overhead. Simulations on Hamming and Reed Solomon codes have been carried out to accomplish this evaluation. This is to evaluate the suitability of the existing ECCs to address faults in hybrid memories.

• Introduction of RRNS as a fault tolerance scheme for hybrid memories. RRNS has been extensively used in digital signal processing and communication, but not in memory systems. This work, along with [14, 15], is the first to use RRNS in a memory system.

• Design of one-symbol error correction for RS and RRNS. Hardware implementations of RS exist, but an RRNS implementation is still unavailable. This work designs an RRNS encoder and decoder for single-error correction using the Very High Speed Integrated Circuit Hardware Description Language (VHDL). The circuits are also synthesized using Xilinx and Synopsys tools for FPGA and ASIC implementation.

1.4 Outline of The Report

This report is organized into six chapters as follows:

Chapter 1 introduces the background of this work, the potentials and challenges of hybrid memories, the related work, the contributions of the work, and the structure of the report.

Chapter 2 describes two types of memories: CMOS memory and hybrid memory. In the CMOS memory section, the classification of CMOS memory and the functional and electrical models of CMOS memories are presented. In the hybrid memory section, the architecture of CMOL memory, its structure, and its operation are given. The last section describes the defects and faults in hybrid memories.

Chapter 3 overviews the theoretical aspects of error correcting codes. This chapter consists of three sections discussing three types of error correcting codes: Hamming, Reed Solomon, and Redundant Residue Number System. Each section gives the theory of the code, the generic encoding and decoding algorithms, and some examples.


Chapter 4 presents the implementation of the codes considered in this work. The encoder and decoder circuits of each code are based on a 16-bit dataword memory.

Chapter 5 reports the experimental and synthesis results. First, the simulation setup and the results for each code are presented. Next, the synthesis of the designed encoders and decoders based on Xilinx and Synopsys tools is reported. Afterwards, the comparison analysis is given.

Chapter 6 concludes the work and proposes some recommendations for further work, with the aim to enhance the fault tolerance capability of hybrid memories and to optimize the schemes investigated in this work.


2 HYBRID CMOS/NON-CMOS MEMORIES

This chapter describes CMOS and hybrid CMOS/non-CMOS memories. It begins with the CMOS memories: it provides a classification, gives the model and the structure, and explains their operations. After that, the chapter introduces the classification of hybrid memories, their structures, and their operations. The chapter ends with a discussion about reliability issues in hybrid memories.

2.1 CMOS Memories

Memory, in the computer field, is a device that stores data or programs (sequences of instructions) on a temporary or permanent basis for use in an electronic digital computer. Existing memories are fabricated mainly based on Complementary Metal Oxide Semiconductor (CMOS) technology. This section overviews CMOS memory, its classification, structure, and operations.

2.1.1 Memory classification

There are two main classes of CMOS memory: Random Access Memory (RAM) and Read Only Memory (ROM). The name RAM implies that the memory is accessed randomly for read and write operations. RAM is volatile, meaning it loses its stored data when the power to the memory is turned off. The name ROM, in contrast, implies that only read operations can be performed (although writes can be performed a limited number of times). ROM is non-volatile, meaning that it retains its stored data when the power is removed. Figure 2.1 shows the classification of CMOS memory.

Figure 2.1: Memory classification


RAM can be classified into two types: Static RAM (SRAM) and Dynamic RAM (DRAM). SRAM uses bistable latching circuitry to store each bit, so it does not need to be refreshed. On the other hand, DRAM uses a separate capacitor to store each bit, so the capacitor charge needs to be refreshed periodically, because it gradually leaks away. A typical DRAM cell uses one transistor and one capacitor to form one memory cell. Therefore, DRAM has a higher density compared to an SRAM cell, which is typically formed of six transistors. However, the operation of DRAM is slower than that of SRAM. These advantages and disadvantages are explained further in the subsection on the structure and operation of RAM in this chapter.

ROM can be distinguished into non-programmable ROM, One-Time Programmable ROM (OTP), Erasable Programmable ROM (EPROM), Electrically EPROM (EEPROM), and flash memory. These ROMs differ in how many times they can be programmed and how they are erased.

Non-programmable ROM. This type of ROM, also known as basic ROM, is programmed during manufacturing and after that it cannot be programmed any more.

One-time programmable ROM (OTP). This type of ROM can be programmed only once, either by the end user or the manufacturer.

EPROM. This type of ROM can be programmed several times after the stored data has been erased. The stored data is erased by using ultraviolet light. On top of the package there is a transparent window to allow the UV light to reach the memory cells. During operation the transparent window has to be covered so that light cannot reach the memory cells.

EEPROM. This type of ROM can be programmed several times. The data is erased by using electrical field electron emission, which is done by applying a certain voltage level to the memory. A byte of data is programmed and erased at a time.

Flash ROM. This type of ROM is a special type of EEPROM, which is erased and programmed in large blocks (block-wise), in contrast to EEPROM, which is erased and programmed byte-wise. Hence the access time of flash memory is faster than that of EEPROM.

In the rest of the report, if it is not mentioned specifically, the term memory generally refers to a RAM.

2.1.2 Behavioral model of memory

A behavioral model is an abstraction level of the specifications of a system. The internals of the system are not visible because only the function of the system is specified. In the behavioral model of a memory system, the inputs are Data-in, Address, and the control signal Read/Write (R/W), whereas the output is Data-out, as shown in Figure 2.2. The read process retrieves the logic states of the memory cells. For this process, the address points to the memory cells, control signal R is activated, and the data becomes available at Data-out. The write process stores logic states into memory cells. For this process, the address points to the memory cells, control signal W is activated, and the data is asserted on Data-in.

Figure 2.2: Block diagram of RAM behavioral model
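As an illustration of this abstraction level, the following minimal Python sketch (not part of the thesis, which implements its designs in VHDL; the class name and interface are chosen here for illustration) exposes only the Address, Data-in, Data-out, and R/W interface and hides the internal organization:

    class BehavioralRAM:
        """Behavioral (black-box) model of a RAM: only Address, Data and R/W are visible."""

        def __init__(self, num_words, data_width=16):
            self.data_width = data_width
            # The internal storage is hidden from the user of the model.
            self._cells = [0] * num_words

        def access(self, address, read_write, data_in=None):
            """Perform one memory operation.

            read_write: 'R' returns the stored word (Data-out),
                        'W' stores data_in and returns None.
            """
            if read_write == 'R':
                return self._cells[address]            # Data-out
            if read_write == 'W':
                mask = (1 << self.data_width) - 1
                self._cells[address] = data_in & mask  # Data-in
                return None
            raise ValueError("read_write must be 'R' or 'W'")

    # Example usage: write a word, then read it back.
    ram = BehavioralRAM(num_words=64)
    ram.access(50, 'W', 0xBEEF)
    assert ram.access(50, 'R') == 0xBEEF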

2.1.3 Functional model of memory

The functional model is based on the functional specification of the system. It describes a collection of subsystems that interact with each other to form the complete system. Each subsystem has its own function and behavioral model. The functional model of a memory system consists of an address latch, row decoder, column decoder, memory cell array, write driver, sense amplifier, data register, precharge circuit, and refresh logic for DRAM.

A functional model of a RAM is shown in Figure 2.3. Note that the refresh logic subsystem only applies to DRAMs and is not required for SRAMs. The address latch (A) contains the memory address. The higher-order bits of the address are connected to the row decoder (B) to select a row in the memory cell array (D). The lower-order bits are connected to the column decoder (C) to select a column in the memory cell array. The number of columns selected depends on the data width (data lines) of the chip, which determines the number of bits accessed during a read or write operation. Generally, in current memory technology, the data width of a chip is either 8, 16, 32, 64, or 128 bits. The data width is also known as the dataword.

When a read operation is performed, the contents of the selected memory cells are amplified by the sense amplifiers (F), loaded into the data registers (G), and presented on the data-out lines. When a write operation is performed, the data on the data-in lines are loaded into the data registers and written into the memory cell array through the write drivers (E). Usually the data-in and data-out lines are combined to form bidirectional data lines, thus reducing the number of pins of the chip.

In the physical form of a memory, a logical address does not necessarily mean that the data associated with that address are placed in adjacent locations. Based on the physical arrangement of memory cells, there are three types of memory organization: adjacent, interleaved, and subarray [21]. In the adjacent organization, the bits of one word are placed contiguously in memory cells. In the interleaved organization, the bits of one word are spread over one row in such a way that they are interspersed with the bits of the other words in that row. In the subarray organization, each bit of a word is taken from a different subarray such that each of the bits has the same address in each subarray.
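As an illustration (the exact bit-to-column mapping is chip dependent; the mapping below is an assumption made for this example, not taken from the thesis), the adjacent and interleaved organizations can be contrasted with two small mapping functions. The subarray organization is analogous, with bit i of a word going to subarray i at the same local address.

    def adjacent_column(word_in_row, bit, bits_per_word):
        """Adjacent organization: the bits of one word sit next to each other."""
        return word_in_row * bits_per_word + bit

    def interleaved_column(word_in_row, bit, words_per_row):
        """Interleaved organization: bit i of every word in the row is grouped,
        so the bits of one word are scattered across the row."""
        return bit * words_per_row + word_in_row

    # For a row holding 4 words of 4 bits, bit 2 of word 1 lands in
    # column 6 (adjacent) but column 9 (interleaved).
    assert adjacent_column(1, 2, bits_per_word=4) == 6
    assert interleaved_column(1, 2, words_per_row=4) == 9

Note that in the interleaved organization two physically adjacent cells belong to different words, which follows directly from the mapping above.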


Figure 2.3: Functional model (data path and control) of a RAM chip

2.1.4 Electrical model of RAM

The electrical model is based on both the functional specifications of the system and the complete knowledge of the internal structure at the electrical level.

2.1.4.1 Memory cells

SRAM. A typical SRAM cell is formed by six metal oxide semiconductor field-effect transistors (MOSFETs), as shown in Figure 2.4. Two cross-coupled inverters, formed by Q1, Q2, Q3, and Q4, behave as a latch. Besides that, there are the word line (WL), which is connected to the row decoder, and a pair of bit lines, BL and its complement BL', which are connected to the column decoder. Two access transistors, Q5 and Q6, each connecting BL and BL' to the cross-coupled inverters, are driven by WL. During a write or read operation WL is set high; otherwise it is low. The logic state of the cell is represented by the voltages on the nodes Q and Q'.

An SRAM cell has three states: standby, writing, and reading. When the SRAM is in the standby state the word line is not asserted (it is at logic 0), so the access transistors Q5 and Q6 disconnect the cell from the bit lines. The two cross-coupled inverters formed by Q1-Q4 will continue to reinforce each other as long as they are connected to the power supply VDD.

During a write operation, the value to be written is put on BL and its inverse on BL'; e.g., when writing a 1, BL is set to logic 1 and BL' to logic 0. This is similar to applying a reset pulse to an SR latch, which causes the flip-flop to change state. WL is then asserted and the value to be stored is latched in. Note that this works because the bit line input drivers are designed to be much stronger than the relatively weak transistors in the cell itself, so that they can easily override the previous state of the cross-coupled inverters.

Figure 2.4: A six-transistor CMOS SRAM cell

Next, the reading process is explained by assuming that the content of the memory cell is a 1. The read cycle starts by precharging both bit lines to a logical 1, then asserting the word line WL, enabling both access transistors (Q5 and Q6). In the second step, the values stored in Q and Q' are transferred to the bit lines by leaving BL at its precharged value and discharging BL' through Q2 and Q6 to a logical 0. On the BL side, the transistors Q3 and Q5 pull the bit line toward VDD, a logical 1. If the content of the memory cell were a 0, the opposite would happen: BL' would be pulled toward 1 and BL toward 0.

DRAM. A typical DRAM uses one transistor and one capacitor to form one memory cell, as shown in Figure 2.5. Therefore DRAM has a higher density and is cheaper compared to SRAM, but it is slower. In order to keep the charge in the capacitor, a refresh operation must be applied periodically.

Figure 2.5: A one-transistor CMOS DRAM cell


During a read operation the bit line (BL) is precharged to the threshold level of the sense amplifier, which lies between a logic 0 and a logic 1. After that the word line (WL) is driven high so that the charge from the capacitor C is transferred to BL. This causes a voltage swing on BL, whose magnitude is determined by the ratio of the capacitance of C to the capacitance of BL. This voltage swing is rather small, so a very sensitive sense amplifier is required. The charge transfer from the capacitor during a read operation is destructive, hence a write-back step is required to restore the original charge. During a write operation, WL is set high and activates the transistor, which drives BL to charge the capacitor C to the desired value.
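For illustration, a standard charge-sharing approximation (not given explicitly in the thesis) relates the read voltage swing to this capacitance ratio, with C the cell capacitor of Figure 2.5, C_BL the bit line capacitance, V_cell the stored cell voltage, and V_pre the precharge level:

    \Delta V_{BL} = \frac{C}{C + C_{BL}}\,\left( V_{cell} - V_{pre} \right)

Since C_BL is typically much larger than C, the swing is only a small fraction of the stored voltage, which is why a very sensitive sense amplifier and a write-back step are needed.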

2.1.4.2 Address decoder

A decoder is used to access a particular cell or a group of cells in the memory cell array. It consists of a row decoder and a column decoder, which form a two-dimensional addressing scheme in the chip. Figure 2.6 shows a decoder for a memory with 64 cells organized as 8 rows of 8 cells. WLr drives row r and CLc drives column c, such that the lines BL and BL' of that cell are fed to the differential read amplifier.

Figure 2.6: An 8 by 8 decoder architecture

Row decoder. A static row decoder, as shown in Figure 2.7, consists of a NOR gate. The inputs to the decoder are the address bits a0 through ak-1 or their complements, where k is the address width. The output (out) is the WL line in the case of a row decoder, or the CL line in the case of a column decoder. All inputs have to be low in order for the output to be high. For example, to access the cell with address 50 (110010 in binary, with a0 the least significant bit), the gate inputs should be a0, a1', a2, a3, a4', a5', i.e., the complement of every address bit that is 1 and the true value of every address bit that is 0.
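The same selection logic can be expressed in a short Python sketch (illustrative only; the function and its parameters are not from the thesis): each word line is the NOR of the address bits, complemented according to that row's index, so exactly one word line is asserted for a given address.

    def row_decoder(address, k):
        """One-hot row decoder: word line r is high only when address == r.

        Models an array of k-input NOR gates: for word line r, address bit i is
        fed in complemented form if bit i of r is 1, and in true form otherwise,
        so all NOR inputs are 0 (output 1) exactly when the address matches r.
        """
        word_lines = []
        for r in range(2 ** k):
            inputs = []
            for i in range(k):
                a_i = (address >> i) & 1
                # Complement the input when bit i of this row index is 1.
                inputs.append(1 - a_i if (r >> i) & 1 else a_i)
            word_lines.append(int(not any(inputs)))  # NOR of the inputs
        return word_lines

    # Word line 50 (binary 110010) is the only one asserted for address 50.
    lines = row_decoder(address=50, k=6)
    assert lines[50] == 1 and sum(lines) == 1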


Figure 2.7: Static row decoder circuit

Figure 2.8: Fast column decoder

Column decoder. The column decoder is responsible for selecting n bits, where n is the size of the dataword, from the accessed row. Figure 2.8 shows a fast column decoder circuit. There are transmission gates on BL and BL', controlled by the signals CL and CL'. The CL and CL' signals can be generated by circuits like the one in Figure 2.7, so they have the potential of being very fast. The transmission gates allow signals to propagate in both directions, thus allowing for read and write operations.


2.1.4.3 Read/write circuit

The write circuitry of a RAM cell is shown in Figure 2.9. It consists of a sequence of two inverters; their opposite outputs are connected to the lines BL and BL', respectively, through pass transistors. When the write signal is high, the pass transistors allow the signal to propagate from the outputs of the inverters to BL and BL'.

Figure 2.9: RAM write circuit

2.2 Hybrid Memory

A hybrid memory is formed from nanoscale CMOS devices and non-CMOS nanodevices. The memory cells are structured by the non-CMOS nanodevices, and the peripheral circuits of the memory, such as the address latch, decoder, sense amplifier and read/write driver, are built up using nanoscale CMOS devices. Basically, the peripheral circuits have the same architecture as in existing CMOS-based circuits. However, the memory cells are based on a special crossbar architecture.

Figure 2.10: Schematic view of a nanoelectronic crossbar-based nanoarchitecture

The structure of the memory cells in a hybrid memory is based on the nanowire crossbar. G. Snider et al. [22] explain the basic structure of a crossbar-based nanoarchitecture built from nanowires, as depicted in Figure 2.10. The crossbar consists of two sets of wires aligned perpendicularly. A junction is the crosspoint where two wires cross perpendicularly. At any crosspoint, a diode or a transistor can be formed by using molecular nanodevices, single electron junctions, phase-change material, or semiconductor nanowires, which act like a non-volatile switch.


2.2.1 Hybrid memory classification

Based on the physical dimensional structure, hybrid circuits can be classified into two types: two-dimensional structures and three-dimensional structures [14], as illustrated in Figure 2.11. In the two-dimensional structure, the nanoscale CMOS devices and non-CMOS nanodevices are fabricated on the same plane (die), meaning that they are placed on the same layer. So in the two-dimensional structure, the total chip area is the nanoscale CMOS area plus the non-CMOS nanodevice area. The nano-memory proposed by Naeimi and DeHon [19] is an example of a two-dimensional hybrid memory.

Figure 2.11: Classification of hybrid memory

In the three-dimensional structure, nanoscale CMOS devices and non-CMOS nanodevices are fabricated on different planes (dies); the nanowire crossbar is placed on top of the CMOS circuit, as shown in Figure 2.12. The main advantage of the three-dimensional hybrid memory structure over the two-dimensional one is the memory capacity. CMOS/molecular (CMOL) memory is an example that uses this structure. We choose CMOL as the three-dimensional hybrid memory because:

• CMOL has almost complete information available about its structure, components, operation, and proposed fabrication [23, 17].

• CMOL offers abundant data storage, which is predicted to be as high as 1 terabit/cm2.

Figure 2.12: Structure of three dimensional hybrid memory


• CMOL memories do not require specific non-CMOS devices for the memory cells. As long as the devices can behave as two-terminal latching switches, they can be used [23].

• The International Technology Roadmap for Semiconductors (ITRS) 2007 has included CMOL as one of the future memory technologies to replace SDRAM and DRAM [24].

2.2.2 CMOS/Molecular hybrid memories

CMOL is the hybrid CMOS/molecular nanoarchitecture introduced by Likharev and Strukov [3]. The most straightforward applications of CMOL are embedded and stand-alone memories. In [25], the authors claimed that 1 terabit of data could be stored in a single integrated circuit of CMOL memory. Other applications of CMOL include FPGAs and neuromorphic processors.

The architecture of CMOL is depicted in Figure 2.13.

Figure 2.13: The generic CMOL circuit (a) a side view schematic (b) a top view schematic showing the idea of addressing a particular nanodevice via a pair of CMOS lines and interface pins (c) an equivalent electrical circuit of the top view schematic

As shown in Figure 2.13(a), the side view of CMOL consists of a nanowire crossbar on top and a CMOS stack at the bottom. These two circuits are connected by conical-shaped interface pins. The blue (longer) interface pins connect one layer of CMOS lines to the upper level of nanowires. In contrast, the red (shorter) interface pins connect another layer of CMOS lines to the lower level of nanowires. A two-terminal non-CMOS nanodevice is embedded at each nanowire crosspoint.

Figure 2.13(b) shows the top view of the CMOL architecture, and Figure 2.13(c) shows an equivalent electrical circuit representing the connection given in Figure 2.13(b). These figures show the way to access one memory cell S1. CMOS line C1 controls the pass transistor that connects CMOS line C3 to a nearly-vertical nanowire via interface pin 1. Similarly, CMOS line C4 controls the pass transistor that connects CMOS line C2 to a nearly-horizontal nanowire via interface pin 2. When these CMOL lines are activated, the non-CMOS nanodevice can be accessed.

To structure these memories, a number of components are involved. The following gives a brief description of the functionality and the potential fabrication methods of the components used to build a CMOL memory.

• CMOL lines. CMOS lines with a size of 45 nm can be used for global interconnect routing [3]. The standard CMOS lithography process can be employed in producing these components.

• Nanowires. Nanowires are used as local interconnect routing in the memory arrays. Silicon nanowires or carbon nanotubes are materials that can be used as nanowires [26, 27]. Several fabrication techniques have been suggested for manufacturing these components, such as nanoimprint lithography [28], extreme ultraviolet (EUV) lithography, interference lithography, and block copolymer lithography.

• Nanodevices. Molecular nanodevices serve as 1-bit memory cells in CMOL memory and can be fabricated by using a self-assembly technique based on chemical synthesis. In [29], Likharev proposed several other nanodevices that can potentially be used as memory cells, such as amorphous metal-oxide films, self-assembled monolayers (SAM) of molecules, and a combination of two single-electron devices.

• Interface pins. Conical-shaped interface pins connect the peripheral circuits with the crossbar memory array. These interface pins are fabricated using the technique used to fabricate tips in field emission arrays (FEA) [30].

• Pass transistors. For activating the CMOS-nanowire interface pins, CMOS-based pass transistors are suggested in the original work. M. Barua and Z. Abid have proposed transmission gates instead of pass transistors for proper and low-power voltage in accessing the programmable diodes [31]. The standard CMOS lithography can be used as the fabrication technique.


2.2.3 CMOL memory structure and operation

Top level structure and operation. The top level structure of a CMOL memory is quite similar to that of a conventional memory, as pictured in Figure 2.14(a) [17]. Arrays of CMOL memory blocks are connected to a block address decoder and a data interface. The block address decoder supplies the global address to the CMOL memory blocks. Data from and to the blocks is accessed through the data interfaces. In each block, a memory array is surrounded by a pair of data decoders to the north and south, and another pair of address decoders to the west and east, as exhibited in Figure 2.14(b). Notice that only the south data decoder has bidirectional data flow, because only that decoder connects to the input and output data lines; the other three decoders are unidirectional. Although the north decoder is called a data decoder, it is used to decode the output of the mapping table. We assume that the term data decoder is used for the north decoder to replicate the existing memory architecture, which decodes data in the column direction. The mapping table and address control circuits are used to access selected memory cells inside the memory array.

Figure 2.14: Schematic structure of (a) top level of CMOL memory (b) one block ofCMOL.

Low level structure and operation. The low level structure of CMOL memory is illustrated in Figure 2.15 [17]. The nearly-horizontal nanowires stretch over the whole block, but the nearly-vertical nanowires are naturally cut into segments of equal length; in this figure the segment length is equal to four CMOL cells. As an example, the figure shows a portion of the nanowire crossbar that holds sixteen two-terminal non-CMOS nanodevices. For this, each nearly-vertical nanowire has a length that is sufficient to have sixteen crosspoints with nearly-horizontal nanowires. To access the sixteen two-terminal non-CMOS nanodevices, the appropriate address and data signals must be activated. This begins with address signals Acol1 and Arow1, which select only the nearly-vertical nanowire with the two-terminal non-CMOS devices. Next, address signals Arow2a and Arow2b select sixteen nearly-horizontal nanowires. At this moment, the sixteen two-terminal non-CMOS nanodevices have been accessed. To accomplish the read or write operation on these nanodevices, address signal Acol2 must be activated.

Figure 2.15: Low level structure of CMOL memory.

2.3 Defect and Fault in Hybrid Memories

Hybrid memories are prone to a high degree of defects and faults. This section explains the terminology of defects and faults, followed by the types of defects in CMOL memory. A fault is the physical difference between the "good" or "correct" system and the current system. A fault causes an error, which is a logic state difference in the system. An error causes a failure, which is the malfunction of an operation. Faults can be classified into two types [32]:

• Permanent fault. A permanent fault is the presence of a fault that affects the functional behavior of the system permanently. Examples of permanent faults:

– Incorrect connection between Integrated Circuits (ICs), board, track, etc.

– Broken components or parts of components.

– Incorrect IC masks, internal silicon-to-metal or metal-to-package connections (a manufacturing problem).


– Functional design errors (the implementation of the logic is not correct).

A defect is a specific permanent fault that occurs as a result of an imperfect fabrication process.

• Non-permanent fault. Non-permanent faults can be divided into two groups:

– Transient faults are caused by environmental conditions such as cosmic rays, α particles, pollution, humidity, and temperature. The frequency of these faults is unpredictable. This type of fault causes soft errors, which are inconsistent errors in data that are unrelated to component or manufacturing failures.

– Intermittent faults are caused by non-environmental conditions such as loose connections, deteriorating or aging components, critical timing, resistance and capacitance variations, physical irregularities, and noise. A characteristic of intermittent faults is that they behave like permanent faults for short durations.

Figure 2.16: Classification of defects in CMOL

Defect types in CMOL hybrid memories. Figure 2.16 shows the classification of defects that can occur in a CMOL hybrid memory. According to [17], some possible causes of defects in CMOL hybrid memories are:

• CMOS circuits

– Defective CMOS circuitry. Such faults may be simply resulted from de-fects in CMOS circuitry, in particular, stuck short/open CMOS transistor,open/short broken CMOS lines.

• Non-CMOS circuits

Page 35: thesis all f - Delft University of Technologyce-publications.et.tudelft.nl/publications/377_experimental_analysis... · COMPUTER ENGINEERING by Zaiyan Ahyadi born in Banjarmasin,

2.3. DEFECT AND FAULT IN HYBRID MEMORIES 19

– Stuck-on-open defects in nanodevices. Such defect corresponds to miss-ing and improper self -assembled of two terminal nanodevices.

– Stuck-on-close defects in nanodevices. This type of defect correspondsto extra or improper assemble of two terminal nanowire

– Broken or shortened nanowires. When a nonowire broken, some nan-odevices in that nanowire can not be accessed. On another hand when it isshortened, some nanodevices which are connected to different nanowire willaccessed wrongly.

• Interface pin

– Defective nano-to-CMOS interface pins. Such defect because of im-proper shape and location of the interface pin. This means that the locationsof the end points of the pins may deviate from precisely defined ones. Alter-natively, a defective interface can be resulted from a too small overlappingarea between the surface of a pin and nanowire.

Soft error in CMOL hybrid memories. As mentioned before soft errors isinconsistent error in data that are unrelated to components or manufacturing failures.This error is caused by external radiation are also known as Single-Event effect (SEE).Soft errors cause single event upset (SEU), which manifests as either Single-bit upset(SBU) or Multiple-bit upset (MBU). SBU refers to the flipping of one bit due to thepassage of a single energetic radiation particle, where the physical separation from anyother flipped bit is at least two memory cells. MBU refers to the flipping of severalof adjacent memory cells due to the passage of one or more radiation particles. SEUare random and they do not normally destroy a device. Three commons source of SEUare low-energy alpha particles, high energy cosmic particles and thermal neutron [33].Soft error is becoming important since it is predicted will occur frequently in nanoscalecircuits [34, 35].

Soft error mitigation technique. Mitigation of soft error can be classified into twomethods: prevention (which is also known as device and circuit-level fault tolerance) andrecovery (which is also known system and architectural-level fault tolerance) [33]. Theprevention methods protects microchips from soft error. These are implemented duringchip design and development for example by changing from conventional packaging to anultra low alpha precharging. The package can reduce the effect of radiation from particlealpha. The recovery methods recover the the corrupted data by certain mechanism.Hence the data in memory can be read correctly or fixed. Error Correction Code (ECC)is one of the recovery method. In memory circuit, this method uses extra bits in memorycell array as parity data which are produced in encoding process and will be vanishedin the output of decoding process. In next chapter we will discuss about theory of ECCschemes, and followed by chapter how this work will implement the design of ECCs.


3 ERROR CORRECTION CODES

This chapter discusses error correction codes (ECCs). It first introduces the concept of error correction codes, classifies them, and provides their advantages and disadvantages. Even though there are many ECC schemes, we will elaborate only three schemes: Hamming, Reed Solomon (RS), and Redundant Residue Number System (RRNS), because these three ECCs can be used as fault tolerance techniques for hybrid memories. The Hamming code is suitable to correct one-bit random faults, while RS and RRNS are more suitable to repair cluster faults, which can occur in hybrid memories.

3.1 Error Correction Code Concept

The principle of error correction codes (ECCs) is redundancy. Redundant bits are added to the dataword in order to detect and correct errors that may reside in the memory. ECCs can be classified into two major variants [36], as depicted in Figure 3.1:

• Block codes.
Block codes process the data on a block-by-block basis, treating each block of data bits independently from the others. Examples of block codes are Hamming [37], Bose-Chaudhuri-Hocquenghem (BCH) [38], Reed Solomon [39], and Redundant Residue Number System (RRNS) [40].

• Convolutional codes.
Convolutional codes process a stream of data; the output of the convolutional encoder depends not only on the current input data, but also on previous inputs, either on a block-by-block or bit-by-bit basis. An example of this type of code is the binary convolutional code, introduced by Elias [41].

Figure 3.1: Classification of ECCs


Based on their structure, block codes can be further divided into:

• Systematic codewords.
A codeword is said to be systematic when the dataword appears at the beginning of the codeword, followed by the checkword at the end of the codeword.

• Non-systematic codewords.
A codeword is said to be non-systematic when the dataword and the checkword are mixed up or interleaved in the codeword.

A generic systematic codeword for an ECC scheme is shown in Figure 3.2. Here, the n bits codeword is formed by a k bits dataword and a j bits checkword (also known as parity word). The following notation will be used throughout this thesis to describe the codewords:

Figure 3.2: Generic codeword for systematic code

Dataword length: k
Checkword length: j
Codeword length: n = j + k
Error correction capability: t

A code is usually represented by (n, k, t). For example, Hamming(21,16,1) means a Hamming code consisting of a 21 bits codeword, in which 16 bits represent the dataword, and with an error correction capability of 1 bit.

ECCs can be used as fault tolerance techniques for computer systems, e.g., for hybrid memories. Figure 3.3 shows the classification of ECCs based on the faults that they can correct. There are two types of hybrid memory faults:

• Random faults.
Random faults are faults or errors that occur at random locations in the memory cells. Hamming or BCH codes can be used to correct this type of fault. Hamming codes can correct single-bit errors, while BCH codes can correct multi-bit errors.

• Cluster faults.
Cluster faults are faults that occur in a group of adjacent locations in the memory cells. Reed Solomon and Redundant Residue Number System codes are typically used to correct cluster faults.


Figure 3.3: Classification of ECCs based on the types of faults they can correct

The process of converting input data to a codeword is called encoding; this is performed by an encoder. Conversely, the process of converting the codeword back to output data is called decoding; this is done by a decoder.

Figure 3.4 shows a hybrid memory with the encoder and the decoder. As mentioned earlier, this type of memory is structured by combining non-CMOS devices (for the memory cells) and CMOS devices (for the peripheral circuitry, like the encoder and the decoder).

Figure 3.4: A block diagram of a hybrid memory with ECC scheme

During a memory write cycle, a k bits input dataword is encoded to an n bits codeword prior to the actual storage. During a memory read, an n bits codeword is decoded to k bits output data. Faults can be induced in the stored data (codeword) due to hard and soft errors. The ECC decoder circuit is able to repair these induced errors. The detection and correction capability depends on the ECC scheme and the number of bits in the checkword. When data is read from the memory, the codeword is decoded; from this codeword, the dataword is restored and verified to be error free.


3.2 Hamming Code

One of the most popular ECCs is the Hamming code, which was introduced in 1950 [37]. The code operates at the bit level, meaning that the dataword and checkword are represented by a number of bits. Hamming codes detect and correct only one faulty bit. A Hamming codeword is constructed by adding redundant bits, called the checkword or parity bits, to the data. The minimum distance dmin is defined as the smallest distance between distinct codewords, i.e., the number of bits that differ between any two error-free codewords. The following parameters describe Hamming codes with dmin = 3 and a checkword length j ≥ 3:

Error correction capability: t = ⌊(dmin − 1)/2⌋ = 1
Dataword length: k
Checkword length: j = ⌈log2 k⌉ + 1
Codeword length: n = k + j = k + ⌈log2 k⌉ + 1

For example, when the dataword is k = 16 bits, the length of the checkword equals j = ⌈log2 16⌉ + 1 = 5 bits. The total codeword length equals n = 21 bits, with an error correction capability t = 1. Hence, this Hamming code can be expressed as a Hamming(21,16,1) code.
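To make the relation between these parameters concrete, the following minimal Python sketch (an illustration only, not part of the thesis design flow; the function name is hypothetical) computes j and n for a given dataword length k:

import math

def hamming_params(k):
    """Return (n, k, j) for a single-error-correcting Hamming code."""
    j = math.ceil(math.log2(k)) + 1   # checkword length
    n = k + j                         # codeword length
    return n, k, j

print(hamming_params(16))             # -> (21, 16, 5), i.e. Hamming(21,16,1)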

3.2.1 Hamming encoding

In Hamming encoding, a j bits checkword is added to the k bits dataword to form a codeword. The following algorithm describes the parity generation procedure and the parity positions in the final codeword [42]:

1. Distribute the checkword bits.
The dataword is expanded by inserting the checkword or parity bits Pi at the positions 2^i, where i = 0, 1, 2, 3, ...

2. Mark a sequence of bit positions for determining the parity of each checkword bit.
This sequence depends on the position of the checkword bit Pi. It starts from the LSB. The first 2^i − 1 bits are not marked, the next 2^i bits are marked, then the next 2^i bits are skipped, and the last two steps are repeated, depending on the codeword length. Here, i = 0, 1, 2, 3, ...

3. Determine the parity of the checkword bits.
All the marked bit positions, except for the LSB marked position (which is the checkword bit itself that has to be calculated), are summed using modulo-2 addition, and the result determines the value of Pi. In hardware, these modulo-2 sums can be executed using XOR gates.

Page 41: thesis all f - Delft University of Technologyce-publications.et.tudelft.nl/publications/377_experimental_analysis... · COMPUTER ENGINEERING by Zaiyan Ahyadi born in Banjarmasin,

3.2. HAMMING CODE 25

4. Insert the checkword bits.
Finally, the codeword is created by inserting Pi into the appropriate positions. If the code is non-systematic, Pi is inserted at the positions derived from step 1. If the code is systematic, the codeword is formed by appending the bits of the checkword on the LSB side.

The following example illustrates the Hamming encoding procedure using Table 3.1. Suppose we want to generate a Hamming code for the 8 bits data D = 11010100. The dataword length is k = 8, and the checkword length equals j = ⌈log2 8⌉ + 1 = 3 + 1 = 4 bits, which results in a 12 bits codeword.

First, we distribute the checkword bits P0, P1, P2, and P3 inside the 12 bits codeword according to step 1 of the encoding algorithm. P0, P1, P2, and P3 are inserted at bit positions 1, 2, 4, and 8 of the codeword. Note that this step requires the dataword bits, which were originally located at positions 1, 2, 4, and 8, to be shifted to higher bit positions. For this example, it means that D0 is shifted to bit position 3 to give space for P0 and P1 at bit positions 1 and 2. Therefore, the codeword is now formed by D7D6D5D4P3D3D2D1P2D0P1P0. This is depicted in the first row of the table.

The checking sequence is marked for each Pi. For P0, bit positions 1, 3, 5, 7, 9, and 11 are marked. For P1, bit positions 2, 3, 6, 7, 10, and 11 are marked. For P2, bit positions 4, 5, 6, 7, and 12 are marked. And for P3, bit positions 8, 9, 10, 11, and 12 are marked, all according to step 2 of the algorithm. These steps are depicted in rows 3 to 6 of the table. Next, the parity of each Pi is determined. For P0, P1, and P2, the marked sequences contain two binary 1's; thus P0, P1, and P2 are set to 0. For P3, three binary 1's are included in its marked positions, at positions 12, 11, and 9; hence P3 is set to 1.

Finally, inserting each Pi at its position as explained in step 1 results in the non-systematic codeword C = 110110100000. This can be seen in the last row of the table. If, in contrast, a systematic code is considered, the codeword is C = 110101001000, where P3, P2, P1, and P0 are the four LSB bits.

Bit position    12  11  10   9   8   7   6   5   4   3   2   1   Calculated parity
Codeword bit    D7  D6  D5  D4  P3  D3  D2  D1  P2  D0  P1  P0
Dataword         1   1   0   1       0   1   0       0
P0                   x       x       x       x       x       x          0
P1                   x   x           x   x           x   x              0
P2               x               x   x   x       x                      0
P3               x   x   x   x   x                                      1
Codeword         1   1   0   1   1   0   1   0   0   0   0   0

Table 3.1: Illustration of the encoding process of the non-systematic Hamming code
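The encoding procedure above can be summarized by a short Python sketch (purely illustrative; the function name and bit ordering are choices made here, not the thesis design). It reproduces the 12 bits example of Table 3.1:

import math

def hamming_encode(data_bits):
    """Non-systematic Hamming encoding.
    data_bits: list of data bits, LSB first (D0 is data_bits[0])."""
    k = len(data_bits)
    j = math.ceil(math.log2(k)) + 1
    n = k + j
    code = [0] * (n + 1)                       # 1-indexed codeword positions
    parity_pos = [2 ** i for i in range(j)]
    d = iter(data_bits)
    for pos in range(1, n + 1):                # step 1: distribute the data bits
        if pos not in parity_pos:
            code[pos] = next(d)
    for p in parity_pos:                       # steps 2-3: P_i covers positions with bit i set
        for pos in range(1, n + 1):
            if pos != p and (pos & p):
                code[p] ^= code[pos]
    return code[1:]                            # positions 1..n, LSB first

# 8 bits example of Table 3.1: D = 11010100 (D7..D0), given here D0 first
data = [0, 0, 1, 0, 1, 0, 1, 1]
cw = hamming_encode(data)
print(''.join(str(b) for b in reversed(cw)))   # prints 110110100000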


3.2.2 Hamming decoding

Detecting and correcting erroneous bits in a Hamming codeword can be performed using the following algorithm [42].

1. Separate the dataword from the checkword.
If the codeword is systematic, the separation is easily done: the j LSB bits of the codeword represent the checkword and the remaining bits are the dataword. If the code is non-systematic, we take the bits at the positions 2^i, for i = 0, 1, ..., j − 1, as the checkword, and the other bits form the dataword.

2. Calculate the parity bits of the dataword by applying the encoding procedure. This means that the decoder also includes an encoder.

3. Generate the syndrome bits by comparing the parity bits calculated in step 2 with the parity bits obtained in step 1. The comparison can be performed using a bitwise XOR operation.

4. Check the syndrome value. If the syndrome value is equal to 0, the codeword is error free; otherwise an error is detected. In case an error occurred, the value of the syndrome points to the location of the error. The erroneous bit can be corrected by flipping the logic value at that bit position.

As an example, consider the non-systematic codeword of the previous example, C = 110110100000. In case an error is induced during operation, e.g., at position C11, the faulty codeword becomes C = 100110100000. The example is depicted in Table 3.2.

The first step is the separation of the dataword from the checkword; the dataword = 10010100 and the checkword = 1000. After separating the dataword from the checkword, the same procedure as in the encoding process is applied to the dataword to calculate the parity bits. In this case the calculated parity bits are 0011. This value is XOR-ed with the received checkword to obtain the syndrome. The syndrome bits S = 1000 XOR 0011 = 1011 = 11 in decimal, which indicates that an error occurred. The decimal value of the syndrome is 11 (eleven), which means that the bit at position eleven is the erroneous bit. The correction is applied by flipping this bit from 0 to 1.

Bit position         12  11  10   9   8   7   6   5   4   3   2   1   Calculated parity
Codeword bit         D7  D6  D5  D4  P3  D3  D2  D1  P2  D0  P1  P0
Received codeword     1   0   0   1   1   0   1   0   0   0   0   0
P0                        x       x       x       x       x       x          1
P1                        x   x           x   x           x   x              1
P2                    x               x   x   x       x                      0
P3                    x   x   x   x   x                                      0
Corrected codeword    1   1   0   1   1   0   1   0   0   0   0   0

Table 3.2: Illustration of the decoding process of the non-systematic Hamming code
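A matching decoder sketch (again an illustration of the algorithm, not the circuit of Chapter 4; names are hypothetical) recomputes the parities, forms the syndrome, and flips the indicated bit:

def hamming_decode(codeword):
    """Non-systematic Hamming decoding.
    codeword: list of bits, position 1 first (LSB side first)."""
    n = len(codeword)
    code = [0] + list(codeword)                # 1-indexed
    parity_pos = [p for p in (1, 2, 4, 8, 16) if p <= n]
    syndrome = 0
    for p in parity_pos:
        s = 0
        for pos in range(1, n + 1):
            if pos & p:
                s ^= code[pos]                 # includes the stored parity bit itself
        if s:
            syndrome |= p
    if syndrome:                               # a non-zero syndrome points at the faulty bit
        code[syndrome] ^= 1
    # drop the parity positions to recover the dataword (LSB first)
    return [code[pos] for pos in range(1, n + 1) if pos not in parity_pos]

# example of Table 3.2: error injected at position 11
received = [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1]   # codeword positions 1..12
print(hamming_decode(received))                   # -> [0, 0, 1, 0, 1, 0, 1, 1], i.e. D = 11010100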


3.3 Reed Solomon Code

Reed Solomon (RS) codes were introduced by Reed and Solomon in 1960 [39]. These codes are a special case of a more general class of codes called Bose-Chaudhuri-Hocquenghem (BCH) codes, discovered in 1959; however, they were developed independently.

This section describes the RS correction code with a more practical approach instead of a traditional theoretical one. It starts with an explanation of the theory of the finite field or Galois field (GF), which is required to understand the basis of the RS code.

A finite field or Galois field is a field that contains only a finite number of elements [42]. The RS code uses the polynomial notation GF(2^m), where m is a positive integer. All elements of the field are generated by a primitive polynomial. Primitive polynomials are polynomials that cannot be factorized into other polynomials. For example, x^2 − 1 is not a primitive polynomial, since it can be written as (x−1)(x+1). An example of a primitive polynomial for GF(2^3) is x^3 + x + 1 = 0. The circuit of this primitive polynomial generator is shown in Figure 3.5 [42].

Figure 3.5: Circuit that generates the sequence of elements of GF(2^3) defined by x^3 + x + 1 = 0

From the circuit of Figure 3.5, we can obtain the table of GF(2^3) elements. At each clock cycle of the circuit, the next member of the field is generated, represented by an increasing power of α. In the figure, α is defined by the state q2q1q0. Initially, we start from α^0 = 001. If we define α^n = [q2, q1, q0], with [q2, q1, q0] as in Figure 3.5, we obtain α^(n+1) = [q1, q2 ⊕ q0, q2]. Every element of the GF is a power of the primitive root α. The complete field GF(2^3) can be seen in Table 3.3.
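The recurrence above can be reproduced with a few lines of Python (a sketch of the LFSR behaviour only, not of the actual circuit; the function name is hypothetical):

def gf8_elements():
    """Generate the elements of GF(2^3) for the primitive polynomial x^3 + x + 1."""
    q2, q1, q0 = 0, 0, 1              # alpha^0 = 001
    table = []
    for _ in range(7):
        table.append((q2 << 2) | (q1 << 1) | q0)
        q2, q1, q0 = q1, q2 ^ q0, q2  # alpha^(n+1) = [q1, q2 xor q0, q2]
    return table

print(gf8_elements())                 # -> [1, 2, 4, 3, 6, 7, 5], as in Table 3.3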

All arithmetic operations applied to the powers of the primitive root of a certain GF lead to a result within the same GF. For the field in Table 3.3, this means that any mathematical operation, like α^2 × α^3, results in some α^n with 0 ≤ n ≤ 6. The important mathematical operations for the RS error correction code are multiplication, division, addition and subtraction.

Multiplication and division

Generally, multiplication and division operations can be calculated by adding or subtracting the powers of α. It is worth noting that when n = 2^m − 1, then α^n = α^0 = 1.


Power      Binary value   Decimal value
-          000            0
α^0 = 1    001            1
α^1        010            2
α^2        100            4
α^3        011            3
α^4        110            6
α^5        111            7
α^6        101            5

Table 3.3: The elements of GF(2^3)

For example, in GF(2^3), n = 2^3 − 1 = 7, so α^7 = α^0 = 1. This is useful to simplify the mathematical operations.

Examples for GF(2^3):

Multiplication:
α^2 · α^4 = α^6
α^5 · α^6 = α^11 = α^7 · α^4 = (1) · α^4 = α^4

Division:
α^4 / α^2 = α^2
α^2 / α^4 = (1) · α^2 / α^4 = α^7 · α^2 / α^4 = α^5

Since power indexes are involved in the multiplication and division operations in the finite field, an efficient and simple way to operate on the elements of GF(2^m) in hardware is to store the power indexes in a Look-Up Table (LUT) [42]. In the LUT, the element α^i is indexed by its power i. This method will be explained in more detail in Chapter 4 (Implementation).

Addition and subtraction

Additions and subtractions in the finite field work in a similar manner. These operations translate to a bitwise XOR operation performed on the binary numbers that represent the symbol values.

Examples for GF(2^3):

Addition:
α^5 + α^6 = 111 ⊕ 101 = 010 = α
α^2 + α^7 = 100 ⊕ 001 = 101 = α^6

Subtraction (⊕ denotes XOR):
α^5 − α^6 = α^5 + α^6 = 111 ⊕ 101 = 010 = α
α^2 − α^7 = α^2 + α^7 = 100 ⊕ 001 = 101 = α^6


Both subtraction operations above are correct since they are modulo-2 operations on binary numbers. Addition and subtraction operations in the finite field operate on the bit level. Therefore, in the hardware implementation a LUT is used to convert a power index of the GF to its binary value.
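In software, the same LUT idea can be sketched with two tables: an antilog table (power of α to binary value) and a log table (binary value to power). The snippet below is an illustrative sketch only (names are hypothetical) and reproduces the GF(2^3) operations above:

ANTILOG = [1, 2, 4, 3, 6, 7, 5]                  # alpha^i for i = 0..6 (Table 3.3)
LOG = {v: i for i, v in enumerate(ANTILOG)}      # binary value -> power of alpha

def gf_mul(a, b):
    """Multiply two GF(2^3) elements given as binary values."""
    if a == 0 or b == 0:
        return 0
    return ANTILOG[(LOG[a] + LOG[b]) % 7]        # add the powers modulo 2^3 - 1

def gf_div(a, b):
    return ANTILOG[(LOG[a] - LOG[b]) % 7]        # subtract the powers modulo 2^3 - 1

def gf_add(a, b):                                # addition = subtraction = XOR
    return a ^ b

print(gf_mul(4, 6))   # alpha^2 * alpha^4 = alpha^6 = 101 = 5
print(gf_div(4, 6))   # alpha^2 / alpha^4 = alpha^5 = 111 = 7
print(gf_add(7, 5))   # alpha^5 + alpha^6 = alpha   = 010 = 2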

The RS code works on symbols instead of bits. An RS codeword consists of data symbols and checkword symbols. A symbol is an element of the finite field GF(2^m); one symbol in the RS code consists of m bits. A perfect RS codeword consists of 2^m − 1 symbols. Hence, the total number of bits of the perfect RS code is m × (2^m − 1) [42]. The parameters of RS codewords for GF(2^m) with m ≥ 3 are as follows:

Number of bits per symbol: m
Error correction capability: t
Number of bits per dataword: k_bit
Number of dataword symbols: k = k_bit / m
Number of checkword symbols: j = 2 × t
Number of bits per checkword: j_bit = (2 × t) × m
Number of codeword symbols: n = j + k
Number of bits per codeword: n_bit = n × m

Figure 3.6 shows the dataword and the checkword sections of an RS code over GF(2^3) with the ability to correct one error. In the figure, the number of codeword symbols is n = 7, where A, B, C, D, E are dataword symbols and R and S are checkword symbols. Each symbol consists of m = 3 bits, which results in a 21 bits RS codeword. This codeword has an error correction capability of t = j/2 = 1 symbol.

Figure 3.6: Codeword for Reed Solomon with GF(2^3)

3.3.1 RS encoding

As described before, a symbol consists of several bits, where the length is defined by the finite field used. The algorithm used to calculate the checkword from the dataword to form an RS codeword is as follows [42]:

1. Determine the GF that will be used.
The choice of GF depends on the number of bits used to form a symbol. This step results in the number of codeword symbols, dataword symbols and checkword symbols.


2. Construct j orthogonal polynomial equations from the codeword, where j equals the number of checkword symbols.
The term orthogonal loosely applies to these equations in the sense that each symbol is associated with a unique combination of GF elements. This can be achieved by multiplying each symbol with a different coefficient in each equation. Since the coefficients of the symbols are elements of the GF, a simple way to do this is to increase the power of the coefficient α sequentially for each symbol, using powers in the range [1, 2^m − 1]. Note that α^i with i = 2^m − 1 is equal to α^0 = 1.

3. Solve these equations by substitution or elimination to get the formula for each checkword symbol.
In this step, the multiplication, division, addition and subtraction operations of the finite field are used. After this step, each checkword symbol is expressed in terms of dataword symbols.

4. Append the checkword to the dataword. To make the codeword systematic, the checkword symbols are inserted at the right side of the dataword.

The following example illustrates the above algorithm. A 15 bits input data is to be encoded into an RS codeword. The error correction capability is set to t = 1, so that the checkword contains 2 symbols (i.e., j = 2). For this example, GF(2^3) is chosen, which means that one RS symbol is represented by 3 bits. Thus, the input data will be converted into k = k_bit/m = 15/3 = 5 dataword symbols (A, B, C, D, E) and 2 checkword symbols (R, S). The RS codeword will be 7 symbols with a total length of 21 bits.

From the symbols A, B, C, D, E, R and S, we construct two orthogonal equations. There are different ways to do this [42]; the following two equations are one possibility:

A + B + C + D + E + R + S = 0   (3.1)

α^1 A + α^2 B + α^3 C + α^4 D + α^5 E + α^6 R + α^7 S = 0   (3.2)

In Equation 3.2, the first symbol is multiplied by the coefficient α, and the subsequent symbols are multiplied by α^2, α^3, ..., α^7. Rearranging Equation 3.1 (note that the addition is a modulo-2 operation) gives

S = A + B + C + D + E + R

This equation holds because addition is a modulo-2 operation (see the addition and subtraction operations for the finite field). Substituting it into Equation 3.2,

α^1 A + α^2 B + α^3 C + α^4 D + α^5 E + α^6 R + α^7 (A + B + C + D + E + R) = 0

After gathering up the terms, and using the fact that α^7 = α^0 = 1,

R(α^6 + 1) = A(α^1 + 1) + B(α^2 + 1) + C(α^3 + 1) + D(α^4 + 1) + E(α^5 + 1)

Simplifying the additions in the brackets leads to:


R(α^2) = A(α^3) + B(α^6) + C(α^1) + D(α^5) + E(α^4)

Dividing by α^2 leads to:

R = Aα + Bα^4 + Cα^6 + Dα^3 + Eα^2   (3.3)

S is obtained by substituting R back into Equation 3.1 and solving for S:

S = A + B + C + D + E + Aα + Bα^4 + Cα^6 + Dα^3 + Eα^2

S = A(α + 1) + B(α^4 + 1) + C(α^6 + 1) + D(α^3 + 1) + E(α^2 + 1)

S = Aα^3 + Bα^5 + Cα^2 + Dα + Eα^6   (3.4)

The last step is appending the checkword to the dataword to form the codeword, as shown in Figure 3.6. For example, if the 15 bits data is 110000010100111, then A = 110 = α^4, B = 000 = 0, C = 010 = α^1, D = 100 = α^2 and E = 111 = α^5. From Equations 3.3 and 3.4, the checkword symbols R and S are obtained:

R = Aα + Bα^4 + Cα^6 + Dα^3 + Eα^2
R = α^4·α + 0·α^4 + α^1·α^6 + α^2·α^3 + α^5·α^2
R = α^5 + 0 + α^7 + α^5 + α^7
R = 111 ⊕ 000 ⊕ 001 ⊕ 111 ⊕ 001
R = 000

S = Aα^3 + Bα^5 + Cα^2 + Dα + Eα^6
S = α^4·α^3 + 0·α^5 + α^1·α^2 + α^2·α + α^5·α^6
S = α^7 + 0 + α^3 + α^3 + α^4
S = 001 ⊕ 000 ⊕ 011 ⊕ 011 ⊕ 110
S = 111

Finally the codeword is 110000010100111000111.
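The checkword computation of Equations 3.3 and 3.4 can be sketched in a few lines of Python, reusing the GF(2^3) log/antilog tables introduced above (an illustrative sketch only; the thesis implementation is in VHDL and the function names here are hypothetical):

ANTILOG = [1, 2, 4, 3, 6, 7, 5]                       # Table 3.3
LOG = {v: i for i, v in enumerate(ANTILOG)}

def mul(a, b):                                        # GF(2^3) multiplication
    return 0 if a == 0 or b == 0 else ANTILOG[(LOG[a] + LOG[b]) % 7]

def rs_encode(A, B, C, D, E):
    """Compute the checkword symbols of Equations 3.3 and 3.4."""
    a = ANTILOG                                       # a[i] = alpha^i as a binary value
    R = mul(A, a[1]) ^ mul(B, a[4]) ^ mul(C, a[6]) ^ mul(D, a[3]) ^ mul(E, a[2])
    S = mul(A, a[3]) ^ mul(B, a[5]) ^ mul(C, a[2]) ^ mul(D, a[1]) ^ mul(E, a[6])
    return R, S

# dataword 110 000 010 100 111  ->  A=6, B=0, C=2, D=4, E=7
print(rs_encode(6, 0, 2, 4, 7))                       # -> (0, 7), i.e. R = 000, S = 111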

3.3.2 RS decoding

The general algorithm to decode RS codewords is as follows [36]:

1. Compute the syndromes by using the encoder polynomials, for example Equations 3.1 and 3.2, and detect whether an error occurred or not.

2. Find the location of one or more corrupted symbols.

3. Find the values of the corrupted symbol.

4. Correct the received codeword based on the error location and the calculated valuesfound.


Note that, in general, the RS decoding process is similar to that of Hamming. First, error detection is performed; if an error occurred, the correction process is needed, otherwise no correction is required. Two steps are performed to correct the erroneous symbol: determining the location of the erroneous symbol and then calculating its correct value; both steps are more complex than in the case of Hamming codes.

Using the previous example from the encoding subsection, the equations to obtain the syndromes S0 and S1 are:

S0 = A + B + C + D + E + R + S   (3.5)

S1 = α^1 A + α^2 B + α^3 C + α^4 D + α^5 E + α^6 R + α^7 S   (3.6)

The values of the syndromes indicate whether or not errors have occurred: if all syndromes are equal to zero, no error occurred; otherwise an error exists in the codeword. The next step is to find the location of the corrupted symbol. For an RS code with a one-symbol error correction capability, as in the previous example, the method is simple and works as follows. Suppose the symbol B is corrupted by a pattern ε, such that the decoder receives B′ = B + ε = B ⊕ ε. Starting from Equations 3.5 and 3.6, the syndromes now take the following values:

S0 = A + (B + ε) + C + D + E + R + S
S0 = (A + B + C + D + E + R + S) + ε = ε

S1 = α^1 A + α^2 (B + ε) + α^3 C + α^4 D + α^5 E + α^6 R + α^7 S
S1 = (α^1 A + α^2 B + α^3 C + α^4 D + α^5 E + α^6 R + α^7 S) + α^2 ε = α^2 ε

This leaves S0 with the value ε, and S1 with α^2 ε; here we used Equations 3.1 and 3.2. In other words, S0 holds the error pattern, while S1 holds additional information about the error position [42]. The location k of the corrupted symbol can be calculated as k = S1/S0 = α^2 ε / ε = α^2. This result points to the exact error location: dataword symbol B is the symbol multiplied by α^2, so symbol B is the faulty symbol. To recover from the error, we take the received symbol B′ and add ε to obtain B = B′ − ε = B′ + ε (these are modulo-2 operations), which is error free.

For example, suppose the codeword of the encoding example is corrupted at symbol E by the error ε = 011 = α^3, such that E = 111 = α^5 changes to E′ = 100 = α^2. Calculate both syndromes using Equations 3.5 and 3.6:

S0 = A + B + C + D + E′ + R + S
S0 = 110 + 000 + 010 + 100 + 100 + 000 + 111
S0 = 011 = α^3

S1 = α^1 A + α^2 B + α^3 C + α^4 D + α^5 E′ + α^6 R + α^7 S
S1 = 111 + 000 + 110 + 101 + 001 + 000 + 111
S1 = 010 = α


The location of the corrupted symbol is k = S1/S0 = α^7·α / α^3 = α^5. This result points to symbol E. The error is ε = S0 = α^3, so the corrected symbol is retrieved by the following operation:

E = E′ ⊕ S0 = 100 ⊕ 011 = 111
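A sketch of this single-symbol decoding procedure (syndromes, error location, correction) for the same GF(2^3) code is given below; it is illustrative only, and the function and variable names are hypothetical:

ANTILOG = [1, 2, 4, 3, 6, 7, 5]
LOG = {v: i for i, v in enumerate(ANTILOG)}

def mul(a, b):
    return 0 if a == 0 or b == 0 else ANTILOG[(LOG[a] + LOG[b]) % 7]

def rs_decode(symbols):
    """symbols = [A, B, C, D, E, R, S]; corrects at most one faulty symbol."""
    S0 = 0
    S1 = 0
    for idx, sym in enumerate(symbols):
        S0 ^= sym
        S1 ^= mul(ANTILOG[(idx + 1) % 7], sym)   # coefficients alpha^1 .. alpha^7
    if S0 == 0 and S1 == 0:
        return symbols                           # error free
    k = (LOG[S1] - LOG[S0]) % 7                  # S1/S0 gives the coefficient alpha^k
    pos = (k - 1) % 7                            # the symbol multiplied by alpha^(pos+1)
    corrected = list(symbols)
    corrected[pos] ^= S0                         # S0 holds the error pattern
    return corrected

# codeword of the encoding example with E corrupted from 111 to 100
print(rs_decode([6, 0, 2, 4, 4, 0, 7]))          # -> [6, 0, 2, 4, 7, 0, 7]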

3.4 Redundant Residue Number System Code

Redundant Residue Number System (RRNS) codes were introduced in 1967 [40]. The RRNS codes are derived from the Residue Number System (RNS), which is usually found in high speed arithmetic applications (e.g., digital signal processing, cryptography and communication). RRNS codes are an example of block codes in which the checkword is not computed from the dataword but from the input data. Note that for RRNS the dataword is different from the input data: the dataword also has to be calculated from the input data. This is the major difference compared to the Hamming and RS codes, where the dataword straightforwardly equals the input data without complex calculation.

An RRNS codeword is constructed from a set of encoded numbers called residues. An input data of d bits is encoded into an n symbols codeword, which is divided into two sets of residues (see Figure 3.7) [15]: (i) k non-redundant residues, denoted by x_i, form the dataword, and (ii) the redundant residues, denoted by x_j and consisting of (n − k) symbols, form the checkword, where 1 ≤ i ≤ k and k + 1 ≤ j ≤ n.

Figure 3.7: Structure of the RRNS codeword

We use the following notation and parameters for the RRNS code:

Error correction capability: t
Number of non-redundant residues (dataword): k
Number of redundant residues (checkword): j = 2t
Codeword length in symbols: n = k + j
Moduli of the non-redundant residues: m_i
Moduli of the redundant residues: m_j
Moduli set of all residues: m_r
Dataword residues: x_i
Checkword residues: x_j
Number of bits per non-redundant residue: x_i-bit = ⌈log2 m_i⌉
Number of bits per redundant residue: x_j-bit = ⌈log2 m_j⌉


The residues x_i and x_j are generated by performing a modulo operation on the input data with the set of non-redundant moduli m_i and the redundant moduli m_j, respectively. Notationally, these residues are written as x_i = |X|_m_i and x_j = |X|_m_j, where X is the input data.

The product of all non-redundant and redundant moduli, M_r = ∏_{r=1}^{n} m_r = ∏_{i=1}^{k} m_i × ∏_{j=k+1}^{n} m_j = M_i × M_j, is called the dynamic range. Thus the RRNS code can represent all numbers in the range [0, M_r − 1]. This range consists of a legitimate range [0, M_i − 1] and an illegitimate range [M_i, M_r − 1]. A codeword is error free when the decoding process (converting the RRNS code to binary) produces a number in the legitimate range; otherwise the number is in the illegitimate range, indicating an error.

The construction of an RRNS code is bound by three rules [43]:

1. Any two moduli in the selected moduli set, say m_a and m_b with a ≠ b, must be relatively prime positive integers, such that their greatest common divisor gcd(m_a, m_b) = 1. This holds for both the redundant and the non-redundant moduli;

2. The values of the redundant moduli are greater than those of the non-redundant moduli, i.e., {m_{k+1}, ..., m_n} > {m_1, ..., m_k};

3. The product of the non-redundant moduli, M_i = ∏_{i=1}^{k} m_i, is sufficient to represent all numbers in the legitimate range [0, M_i − 1].

In addition to these three rules, the redundant moduli m_j are chosen such that (i) they ensure the desired error correction capability, and (ii) their product is sufficient to represent all numbers in the legitimate range.

Note that each modulus (and hence each residue) in an RRNS code can have a different bit length, depending on the chosen moduli. Hence, choosing appropriate moduli can reduce the total bit length of the codeword. This is in contrast to RS codes, where all symbols have a fixed length.

3.4.1 RRNS encoding

The encoding process that converts an input data to an RRNS codeword is based on modulo operations of the input data with respect to a moduli set. The non-redundant residues are generated by taking the input data X modulo the set of non-redundant moduli m_i. The redundant residues are generated by performing the same operation on the input data X, but using the redundant moduli m_j instead of m_i. Note that a residue is the remainder of the division of one value by another. Mathematically this is represented by the following equation:

x_i = |X|_m_i ,  x_j = |X|_m_j   (3.7)

The numbers of bits for x_i and x_j are ⌈log2(m_i)⌉ and ⌈log2(m_j)⌉ bits, respectively. Thus, the total bit length of the RRNS codeword is the sum over the individual residues, i.e., Σ_{i=1}^{k} ⌈log2(m_i)⌉ + Σ_{j=k+1}^{n} ⌈log2(m_j)⌉. All the residues can be calculated simultaneously by placing multiple modulo circuits in parallel. The concatenated residues form the RRNS codeword; this encoded dataword is stored in the memory.

Choosing appropriate moduli can simplify the arithmetic operations significantly, thus speeding up the encoding process. It has been proven that moduli of the form 2^n − 1, 2^n, or 2^n + 1, where n is a positive integer, are advantageous over a randomly chosen moduli set [44]. With these optimal moduli, the more complex hardware division process can be translated into a process of summations and subtractions, which is simple to implement in hardware with less area overhead.

For example, an 8 bits input data can be encoded into an RRNS codeword based on the moduli set {m1, m2} = {2^4, 2^4 + 1} = {16, 17} for the non-redundant residues, and {m3, m4} = {2^5 − 1, 2^5 + 1} = {31, 33} for the redundant residues. The legitimate range of this RRNS code is 16 × 17 = 272; since the maximum input representable by 8 bits is equal to 255, we can restrict the legitimate range to Mi = 255. Encoding the data X = 207_10 = 11001111_2 into this RRNS codeword is done as follows:

{x1, x2, x3, x4} = {|X|_16, |X|_17, |X|_31, |X|_33}
{x1, x2, x3, x4} = {|207|_16, |207|_17, |207|_31, |207|_33}
{x1, x2, x3, x4} = {15, 3, 21, 9} = {1111_2, 00011_2, 10101_2, 001001_2}
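The encoding step is just a set of modulo operations; the short Python sketch below reproduces this example (illustrative only; the function name and the default moduli are the ones assumed above):

import math

def rrns_encode(X, moduli=(16, 17, 31, 33)):
    """Return the RRNS residues of X and their binary representations."""
    widths = [math.ceil(math.log2(m)) for m in moduli]     # 4, 5, 5, 6 bits
    residues = [X % m for m in moduli]
    return residues, [format(r, '0{}b'.format(w)) for r, w in zip(residues, widths)]

print(rrns_encode(207))
# -> ([15, 3, 21, 9], ['1111', '00011', '10101', '001001'])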

3.4.2 RRNS decoding

The decoding process of an RRNS codeword consists of two phases: (i) detection and (ii) possible error correction. Error detection has to be applied to all codewords read from the memory. When the value of the decoded codeword is within the legitimate range, the codeword is assumed to be error free, and no error correction is required. Otherwise, if the value of the decoded codeword is larger than the legitimate range, an error occurred and error correction is needed.

In the correction phase, an iterative systematic procedure is performed to correct the faulty symbol. This procedure is based on calculating the decimal value of the corrupted codeword while discarding t residues, as explained below. The procedure stops once the calculated value is within the legitimate range. Note that in the worst case C(n,t) = n!/(t!(n−t)!) iterations have to be performed, where n denotes the total number of residues representing the codeword and t denotes the error correction capability. If C(n,t) iterations are performed without finding any X that is within the legitimate range, then the considered RRNS code cannot correct the corrupted codeword, because the error is beyond its correction capability.

The error detection and correction procedures are as follows:

1. Decode the codeword using all n residues and compute the value of X. At this stage X is not verified yet.

2. If X ≤ Mi, no error occurred, thus the data X is verified.

3. If X > Mi, one or more errors are detected. Therefore, the correction procedure starts by discarding t residues, where t is the error correction capability of the RRNS code. Compute X′ from the remaining (n − t) residues and compare it to Mi. (Note that this procedure is based on trial and error; it must be repeated at most C(n,t) times, each time with a different combination of (n − t) residues.)

4. If any X′ falls within the legitimate range Mi, then this particular X′ is the correct dataword.

5. When all X′ fall outside Mi, the RRNS code is unable to correct the erroneous codeword, because the number of errors that occurred is beyond the error correction capability of the chosen RRNS code.


Two algorithms can be used in the decoding process: (i) the Chinese Remainder Theorem (CRT) and (ii) Mixed-Radix Conversion (MRC) [40]. In this work we use MRC, since it is advantageous in that it uses smaller multiplicative inverses. The multiplicative inverses are used to compute the data X from the codeword. MRC is based on the following equation:

X = Σ_{s=1}^{n} v_s × w_s   (3.8)

where the mixed-radix weights w_s are given by

w_1 = 1,   w_s = ∏_{r=1}^{s−1} m_r   for s = 2, 3, ..., n   (3.9)

and the mixed-radix digits v_s are calculated by

v_1 = |X|_m1 = x_1   (3.10)
v_2 = |(x_2 − v_1) × |m_1^(−1)|_m2|_m2
v_3 = |((x_3 − v_1) × |m_1^(−1)|_m3 − v_2) × |m_2^(−1)|_m3|_m3

and, for s > 3, by

v_s = |(···((x_s − v_1) × |m_1^(−1)|_ms − v_2) × |m_2^(−1)|_ms − ··· − v_{s−1}) × |m_{s−1}^(−1)|_ms|_ms   (3.11)

where x_s = |X|_ms and |m_r^(−1)|_ms denotes the multiplicative inverse of m_r with respect to m_s, i.e., the value that satisfies

|m_r × |m_r^(−1)|_ms|_ms = 1.

Consider the following example, in which the input data X equals 207 and the RRNS codeword uses the moduli set {m1, m2, m3, m4} = {16, 17, 31, 33}. The residues are {x1, x2, x3, x4} = {1111, 00011, 10101, 001001} = {15, 3, 21, 9}, so the error-free codeword equals 11110001110101001001. Suppose that two bit errors occur at bit positions 18 and 19; the corrupted codeword then becomes 10010001110101001001, i.e., the received residues are {x1, x2, x3, x4} = {9, 3, 21, 9}. The legitimate range is Mi = 16 × 17 = 272; since the 8 bits input data X lies in the range [0, 255], we can restrict the legitimate range to Mi = 255. The first step is the detection process:

v1 = x1 = 9
v2 = |(x2 − v1) × |m1^(−1)|_m2|_m2 = |(3 − 9) × 16|_17 = 6
v3 = |((x3 − v1) × |m1^(−1)|_m3 − v2) × |m2^(−1)|_m3|_m3 = |((21 − 9) × 2 − 6) × 11|_31 = 12
v4 = |(((x4 − v1) × |m1^(−1)|_m4 − v2) × |m2^(−1)|_m4 − v3) × |m3^(−1)|_m4|_m4
   = |(((9 − 9) × 31 − 6) × 2 − 12) × 16|_33 = 12
X  = v1 + v2 × m1 + v3 × m1 × m2 + v4 × m1 × m2 × m3
   = 9 + 6 × 16 + 12 × 16 × 17 + 12 × 16 × 17 × 31 = 104553 > Mi


Since the value of X is outside the legitimate range, it can be concluded that an error occurred, and the correction procedure should be applied. For this example, there are at most C(n,t) = n!/(t!(n−t)!) = 4!/(1!(4−1)!) = 4 iterations.

• Iteration 1: discard x4. The remaining residues are x1, x2, x3. Calculate X′ from these residues:

v1 = x1 = 9
v2 = |(x2 − v1) × |m1^(−1)|_m2|_m2 = |(3 − 9) × 16|_17 = 6
v3 = |((x3 − v1) × |m1^(−1)|_m3 − v2) × |m2^(−1)|_m3|_m3 = |((21 − 9) × 2 − 6) × 11|_31 = 12
X′ = v1 + v2 × m1 + v3 × m1 × m2 = 9 + 6 × 16 + 12 × 16 × 17 = 3369

Since X′ = 3369 > Mi, the error still exists in one of the remaining residues.

• Iteration 2: discard x3. Now x1, x2, x4 are involved in the process. Calculate X′ from these residues:

v1 = x1 = 9
v2 = |(x2 − v1) × |m1^(−1)|_m2|_m2 = |(3 − 9) × 16|_17 = 6
v3 = |((x4 − v1) × |m1^(−1)|_m4 − v2) × |m2^(−1)|_m4|_m4 = |((9 − 9) × 31 − 6) × 2|_33 = 21
X′ = v1 + v2 × m1 + v3 × m1 × m2 = 9 + 6 × 16 + 21 × 16 × 17 = 5817

The result X′ = 5817 > Mi also falls outside the legitimate range, so we have to proceed to iteration 3.

• Iteration 3: discard x2. The remaining residues are x1, x3, x4. Calculating X′ from these residues results in:

v1 = x1 = 9
v2 = |(x3 − v1) × |m1^(−1)|_m3|_m3 = |(21 − 9) × 2|_31 = 24
v3 = |((x4 − v1) × |m1^(−1)|_m4 − v2) × |m3^(−1)|_m4|_m4 = |((9 − 9) × 31 − 24) × 16|_33 = 12
X′ = v1 + v2 × m1 + v3 × m1 × m3 = 9 + 24 × 16 + 12 × 16 × 31 = 6345

The result X′ = 6345 > Mi; in this iteration the error is still not found and the dataword is invalid.


• Iteration 4: discard x1. The remaining residues are x2, x3, x4. Again, calculating X′ results in:

v1 = x2 = 3
v2 = |(x3 − v1) × |m2^(−1)|_m3|_m3 = |(21 − 3) × 11|_31 = 12
v3 = |((x4 − v1) × |m2^(−1)|_m4 − v2) × |m3^(−1)|_m4|_m4 = |((9 − 3) × 2 − 12) × 16|_33 = 0
X′ = v1 + v2 × m2 + v3 × m2 × m3 = 3 + 12 × 17 + 0 × 17 × 31 = 207

The result X′ = 207 ≤ Mi. We can conclude that the error resides in residue x1, and the error-free value is 207.

In this example, all possible sets of (n − t) residues are explored to recover the error; this is the worst case. In practice, the number of iterations can be much smaller than in the worst case.
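The whole detect-and-correct procedure can be summarized by the following Python sketch, which implements the MRC conversion and the trial discarding of residues. It is an illustrative model only (names and structure are assumptions, not the VHDL architecture of Chapter 4), and it relies on Python 3.8+ for the modular inverse pow(m, -1, mod):

from itertools import combinations

def mrc_value(residues, moduli):
    """Mixed-Radix Conversion: reconstruct X from residues and pairwise coprime moduli."""
    v = [residues[0] % moduli[0]]
    for s in range(1, len(moduli)):
        t = residues[s]
        for r in range(s):
            t = (t - v[r]) * pow(moduli[r], -1, moduli[s]) % moduli[s]
        v.append(t)
    X, w = 0, 1
    for vs, m in zip(v, moduli):
        X += vs * w                       # X = sum of v_s * w_s, with w_s = m_1 * ... * m_(s-1)
        w *= m
    return X

def rrns_decode(residues, moduli=(16, 17, 31, 33), legitimate=255, t=1):
    X = mrc_value(residues, moduli)
    if X <= legitimate:
        return X                          # error free
    for keep in combinations(range(len(moduli)), len(moduli) - t):
        Xp = mrc_value([residues[i] for i in keep], [moduli[i] for i in keep])
        if Xp <= legitimate:
            return Xp                     # corrected value
    return None                           # beyond the correction capability

print(rrns_decode([9, 3, 21, 9]))         # -> 207 (the faulty residue x1 is discarded)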


4 IMPLEMENTATION

This chapter describes the implementation of the encoders and decoders of the Hamming, Reed Solomon, and Redundant Residue Number System codes. For simplicity, the implementation focuses on designing the encoder and decoder for a 16 bits dataword. The VHSIC Hardware Description Language (VHDL) is used to describe the encoder and decoder of each code.

The chapter starts with the design of the Hamming encoder and decoder. Then it continues with the Reed Solomon encoder and decoder; a modification is made to the RS encoder and decoder in order to optimize the area and time overhead. The chapter ends with the design of the RRNS encoder and decoder; a modification is also applied to the RRNS decoder to optimize the area and the time overhead.

4.1 Hamming Code

As described in Chapter 3, to construct a Hamming code for a 16 bits dataword, we need a 5 bits checkword to form a 21 bits codeword. This code can correct a 1 bit error and detect a 2 bits error. Figure 4.1 shows the codeword for Hamming(21,16,1).

Figure 4.1: Hamming codeword for 16 bits data

4.1.1 Design of Hamming encoder

The Hamming encoder performs the encoding algorithm explained in Section 3.2, Chapter 3. Here we present the algorithm again with the specific parameters of a 16 bits dataword and a 5 bits checkword. The algorithm for the non-systematic Hamming(21,16,1) code is as follows:

1. Distribute the checkword bits.
The five checkword bits P0, P1, P2, P3, and P4 are distributed at positions 1, 2, 4, 8 and 16.

2. Mark the sequence of the appropriate bit positions for determining the parity of each checkword bit.


Position P0: mark the bits at positions 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21
Position P1: mark the bits at positions 2, 3, 6, 7, 10, 11, 14, 15, 18, 19
Position P2: mark the bits at positions 4, 5, 6, 7, 12, 13, 14, 15, 20, 21
Position P3: mark the bits at positions 8, 9, 10, 11, 12, 13, 14, 15
Position P4: mark the bits at positions 16, 17, 18, 19, 20, 21

3. Determine the parity of the checkword bits.
All the marked bits, except the marked bit at the first position (because it is the parity bit itself), are summed modulo 2 and the result determines the value of Pi. In the hardware implementation, this operation is implemented using XOR gates.

4. Insert the checkword bits.
This step inserts the checkword bits Pi into their positions to form a non-systematic codeword; a non-systematic codeword is a codeword in which the dataword and the checkword are mixed up or interleaved.

Figure 4.2: Circuit of the Hamming encoder for 16 bits data

Based on the above algorithm, we design the Hamming encoder. The circuit diagram of the designed Hamming encoder is shown in Figure 4.2. The figure shows that steps 2 and 3 are implemented by cascaded XOR gates for each parity bit. For example, P4 is obtained from the cascaded XORs whose inputs are d11, d12, d13, d14, and d15; these are the bits at positions 17, 18, 19, 20, and 21, respectively (see steps 2 and 3). As this work targets a systematic codeword, an interleaver is added to the encoder to convert the non-systematic codeword into a systematic one. Recall that a codeword is systematic when the dataword appears at the beginning of the codeword, followed by the checkword at the end. We choose a systematic codeword to follow the structure of most current memory chips, which separate the dataword from the checkword; in existing memory chips the parity bits are stored in a dedicated chip. In addition, we want to use the same codeword structure as the other two codes in this thesis (RS and RRNS), which use systematic codewords as well. For the RS and RRNS codes the systematic codeword is advantageous because they operate on symbols to handle cluster errors.
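The complete encoder behaviour (parity generation plus the interleaver that produces the systematic 21 bits codeword) can be sketched in software as follows. This is only an illustrative model of the circuit in Figure 4.2, not the VHDL code itself, and the chosen bit ordering of the output is an assumption:

PARITY_POS = (1, 2, 4, 8, 16)        # positions of P0..P4 in the non-systematic codeword

def hamming21_encode(data16):
    """data16: 16 data bits, d0 first.  Returns the systematic codeword as a list:
    d15 .. d0 followed by P4 .. P0 (checkword on the LSB side)."""
    code = [0] * 22                              # 1-indexed, non-systematic layout
    d = iter(data16)
    for pos in range(1, 22):
        if pos not in PARITY_POS:
            code[pos] = next(d)
    parity = []
    for p in PARITY_POS:                         # P_i covers the positions with bit i set
        bit = 0
        for pos in range(1, 22):
            if pos != p and (pos & p):
                bit ^= code[pos]
        parity.append(bit)                       # parity = [P0, P1, P2, P3, P4]
    # interleaver output: dataword first, then the checkword on the LSB side
    return list(reversed(data16)) + list(reversed(parity))

print(hamming21_encode([0] * 16))                # all-zero data gives the all-zero codeword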

4.1.2 Design of Hamming decoder

During the reading process, the codewords read from the memory cells have to be decoded to obtain the original data; the decoder has to produce the output data without the checkword. We consider here the 21 bits codeword that encodes 16 bits data (the code of the previous section). The decoding can be performed using the Hamming decoding algorithm discussed in Section 3.2, Chapter 3. The decoding of the 21 bits Hamming codeword into 16 bits output data is given next:

1. Extract the 5 bits checkword from the 21 bits codeword read from the memory.

2. Calculate new checkword bits from the remaining 16 bits of the codeword by applying steps 2 and 3 of the encoder algorithm; see Section 3.2, Chapter 3.

3. Obtain the syndrome bits by XOR-ing the 5 bits of the extracted checkword (step 1) with the 5 bits of the checkword calculated in step 2.

4. If the syndrome ≠ 0, then an error has occurred and the value of the syndrome points to the position of the erroneous bit; the correction is made by flipping the bit at that position. If the syndrome = 0, no error occurred and no correction is needed.

Figure 4.3 shows the circuit diagram of the designed Hamming decoder. In the figure, we can see that the input of the decoder is the 21 bits systematic codeword. It has to be interleaved to make the input non-systematic, so that the bit positions of the dataword and checkword follow the pattern used in the encoder. The non-systematic codeword is stored in a buffer to be used later. The same cascaded XOR gates as in the encoder are used in the decoder to calculate the new checkword bits. The new checkword is then compared with the extracted checkword using XOR gates to produce the 5 bits syndrome. A circuit called the flipper flips the erroneous bit stored in the buffer; it operates only when the syndrome is not equal to 0, in which case it flips the bit pointed to by the syndrome. The last block is the extractor, which passes the dataword, but not the checkword, to the output port to produce the 16 bits output data.


4.2 Reed Solomon Code

We design the RS encoder in such a way that 16 bits data are encoded into 2 dataword symbols A and B, each consisting of 8 bits. For this purpose, GF(2^m) with m = 8 (i.e., the number of bits per symbol) is chosen, meaning that a maximum of 255 symbols can be constructed in a codeword.

We target the correction of one erroneous symbol, which is equal to a maximum of 8 bits of cluster errors. Therefore, the RS codeword needs j = 2 × t checkword symbols, where t is the error correction capability. Since we target t = 1, the number of checkword symbols is j = 2, as shown in Figure 4.4 (see Section 3.3, Chapter 3).

However, GF(2^8) has many elements, which results in a complex conversion from binary to the GF elements α^i during encoding and the other way around during decoding; therefore we use GF(2^4) to reduce this problem.

Figure 4.3: Hamming decoder circuit for 16 bits data


Figure 4.4: Reed Solomon codeword for 16 bits dataword

This means that one 8 bits symbol (Figure 4.4) is represented by two symbols of 4 bits each, as shown in Figure 4.5. For example, the dataword symbols A and B become A1 and A2, and B1 and B2, respectively. Similarly, the checkword symbols R and S become R1 and R2, and S1 and S2, respectively. Then we use an interleaving technique to group A1, B1, R1, S1 and A2, B2, R2, S2 into two 16 bits codewords. We name them the first-codeword (which consists of the first-dataword and the first-checkword) and the second-codeword (which consists of the second-dataword and the second-checkword), respectively. There are two advantages of this modification:

• First, it reduces the area and the time latency, because with this scheme we can process each 16 bits codeword separately. The symbols in each 16 bits codeword consist of 4 bits. Therefore, we can use the finite field GF(2^4), which has fewer elements than GF(2^8); thus, the conversion to the binary representation is simpler and faster compared to GF(2^8).

• Second, this method can correct cluster errors impacting two symbols of 4 bits each, provided the two symbols belong to the two different 16 bits codewords; they can be adjacent or non-adjacent (see Figure 4.5). For example, the error can be in A1 and A2, or in A1 and B2, but not in A1 and B1, nor in A1 and R1.

Figure 4.5: The modified RS codeword: (a) before interleaving, (b) after interleaving


4.2.1 Design of RS encoder

The generic algorithm of RS encoding has been described in Section 3.3, Chapter 3. In this section, we present the algorithm used in the implementation, based on the modified scheme explained above. The algorithm is as follows [42]:

1. Generate the GF(2^4) table.

2. Divide the first-dataword into symbols A1 and B1; each symbol consists of 4 bits.

3. Construct two orthogonal equations using the four symbols of the first-codeword:

A1 + B1 + R1 + S1 = 0   (4.1)

α^1 A1 + α^2 B1 + α^3 R1 + α^4 S1 = 0   (4.2)

4. Solve the two orthogonal equations to find the values of the checkword symbols R1 and S1 from the dataword symbols A1 and B1.

5. Append the checkword R1 and S1 to the dataword A1 and B1 to form the RS first-codeword.

6. Repeat steps 2, 3, 4 and 5 for the second-dataword. Now A2 and B2 are used to produce the second-codeword A2, B2, R2 and S2.

7. Combine the two codewords using the interleaving structure of Figure 4.5.

In the rest of this section we discuss the implementation and design of the RS encoder for the first-codeword (see Figure 4.5); the second-codeword has the same structure.

We start the design process by finding the binary values of the GF(2^4) elements, as this is an integral part of the RS encoder. These can be obtained using a polynomial generator. For this task, GF(2^4) is generated from the primitive polynomial x^4 + x^3 + 1 = 0, as shown in Figure 4.6. This results in the elements shown in Table 4.1. In the table, the left column represents the elements of GF(2^4), the middle column their binary values, and the right column the corresponding decimal values. Initially, the binary value [q3, q2, q1, q0] in the circuit of Figure 4.6 is equal to [0, 0, 0, 1], so the binary value in the first row of Table 4.1 is equal to α^0 = 0001. The other values in the table are produced by the polynomial generator at each clock cycle: if we define α^n = [q3, q2, q1, q0], we obtain α^(n+1) = [q2 ⊕ q3, q1, q0, q3]. After every fifteen clock cycles, the generator returns to the initial state.

By rearranging Equations 4.1 and 4.2, we get

S1 = A1 + B1 + R1,

so

α^1 A1 + α^2 B1 + α^3 R1 + α^4 (A1 + B1 + R1) = 0.


Figure 4.6: GF(2^4) generator based on x^4 + x^3 + 1 = 0

Power      Binary value   Decimal value
α^0 = 1    0001           1
α^1        0010           2
α^2        0100           4
α^3        1000           8
α^4        1001           9
α^5        1011           11
α^6        1111           15
α^7        0111           7
α^8        1110           14
α^9        0101           5
α^10       1010           10
α^11       1101           13
α^12       0011           3
α^13       0110           6
α^14       1100           12

Table 4.1: The elements of GF(2^4)

Gathering up the terms, we get

R1(α^3 + α^4) = A1(α^1 + α^4) + B1(α^2 + α^4).

The additions of the α^i terms in the brackets are performed by XOR-ing their binary values as given in the GF(2^4) table. This results in

R1(α^15) = A1(α^5) + B1(α^11).

Note that α^15 = α^0 = 1, so

R1 = A1 α^5 + B1 α^11,   (4.3)

and substituting R1 into S1,

S1 = A1 + B1 + A1(α^5) + B1(α^11)
S1 = A1(1 + α^5) + B1(1 + α^11) = A1(α^0 + α^5) + B1(α^0 + α^11)

S1 = A1 α^10 + B1 α^14   (4.4)

The circuit of the RS encoder that executes the above algorithm is shown in Figure 4.7.


Figure 4.7: Circuit diagram of the hardware RS encoder for 8-bit data using GF(2^4)

From Equations 4.3 and 4.4 we can see that symbols A1 and B1 are multiplied with constants α^i, which are elements of the Galois field. In the designed circuit, each such multiplication is implemented as an XOR network of selected input bits determined by the constant. The complete table of multiplier constants α^i for GF(2^4) is shown in Table 4.2 [42]. This table can be generated with the circuit of Figure 4.6, in a similar way as the elements in Table 4.1 were generated; the power i of α indicates how many clock cycles are needed to obtain the output. In Table 4.2, biout (i = 0, 1, 2, 3) represents the binary value of a GF element after it has been multiplied with a constant α^i (see Figure 4.8(a)).

Figure 4.8: (a) Multiplying input bi with α^i; (b) circuit diagram for multiplying bi with α^5


For a multiplier block α^i, the bits bi are the input bits and the bits biout are the output bits, where i = 0, 1, 2, 3. For example, to multiply a 4-bit input b3b2b1b0 with α^5 (see Figure 4.8(b)), the output b3out b2out b1out b0out can be found from Table 4.2 as follows:

b3out = b0 XOR b1 XOR b3

b2out = b1 XOR b2 XOR b3

b1out = b0 XOR b1 XOR b2 XOR b3

b0out = b0 XOR b1 XOR b2

Multiply with   b3out              b2out              b1out              b0out
α^0 = 1         b3                 b2                 b1                 b0
α^1             b2⊕b3              b1                 b0                 b3
α^2             b1⊕b2⊕b3           b0                 b3                 b2⊕b3
α^3             b0⊕b1⊕b2⊕b3        b3                 b2⊕b3              b1⊕b2⊕b3
α^4             b0⊕b1⊕b2           b2⊕b3              b1⊕b2⊕b3           b0⊕b1⊕b2⊕b3
α^5             b0⊕b1⊕b3           b1⊕b2⊕b3           b0⊕b1⊕b2⊕b3        b0⊕b1⊕b2
α^6             b0⊕b2              b0⊕b1⊕b2⊕b3        b0⊕b1⊕b2           b0⊕b1⊕b3
α^7             b1⊕b3              b0⊕b1⊕b2           b0⊕b1⊕b3           b0⊕b2
α^8             b0⊕b2⊕b3           b0⊕b1⊕b3           b0⊕b2              b1⊕b3
α^9             b1⊕b2              b0⊕b2              b1⊕b3              b0⊕b2⊕b3
α^10            b0⊕b1              b1⊕b3              b0⊕b2⊕b3           b1⊕b2
α^11            b0⊕b3              b0⊕b2⊕b3           b1⊕b2              b0⊕b1
α^12            b2                 b1⊕b2              b0⊕b1              b0⊕b3
α^13            b1                 b0⊕b1              b0⊕b3              b2
α^14            b0                 b0⊕b3              b2                 b1

Table 4.2: Multiplying a 4-bit input with an element of GF(2^4); each output bit is the XOR of the listed input bits
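The rows of Table 4.2 can also be derived in software. The following Python sketch (an added illustration; multiplier_row is our own helper name) drives one input bit at a time, multiplies it by the constant α^i, and collects which input bits feed each output bit:

    ALPHA = [1, 2, 4, 8, 9, 11, 15, 7, 14, 5, 10, 13, 3, 6, 12]   # Table 4.1 (decimal)
    LOG = {v: i for i, v in enumerate(ALPHA)}

    def gf_mul(a, b):
        return 0 if a == 0 or b == 0 else ALPHA[(LOG[a] + LOG[b]) % 15]

    def multiplier_row(i):
        """Return, for the constant alpha^i, the input bits XOR-ed into each output bit."""
        row = {out: [] for out in range(4)}
        for k in range(4):                          # drive one input bit b_k at a time
            product = gf_mul(1 << k, ALPHA[i])      # b_k alone has the field value 2^k
            for out in range(4):
                if (product >> out) & 1:
                    row[out].append(f"b{k}")
        return row

    print(multiplier_row(5))
    # {0: ['b0', 'b1', 'b2'], 1: ['b0', 'b1', 'b2', 'b3'],
    #  2: ['b1', 'b2', 'b3'], 3: ['b0', 'b1', 'b3']}   -- the alpha^5 row of Table 4.2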

4.2.2 Design of RS decoder

In this work we design an RS decoder for a 32-bit codeword using GF(2^4). Because the 32-bit codeword at the encoder is structured as two 16-bit codewords (see the previous subsection), the 32-bit decoder is based on two duplicated 16-bit codeword decoders. Therefore, we describe only the part of the decoder that converts the first-codeword (A1, B1, R1, S1) into the first-dataword (A1, B1); the second part of the decoder has the same design. The algorithm used in the implementation of the decoding process is as follows [42] (see also Section 3.3, Chapter 3):

1. Divide the first-codeword into 4 symbols A1, B1, R1 and S1; each symbol consists of 4 bits.


2. Calculate the syndrome Sy0 = A1 XOR B1 XOR R1 XOR S1.

3. Calculate the syndrome Sy1 = (α^1 A1) XOR (α^2 B1) XOR (α^3 R1) XOR (α^4 S1).

4. Find the error location. This step is done by calculating Sy1/Sy0. Note that division in the finite field is easily performed by subtracting the powers of α. Since the result of Sy1/Sy0 has to be in the range of 1 to 15, the power of Sy0 is subtracted from the power of Sy1, and a modulo-15 operation is performed on the obtained result.

5. If the error location Sy1/Sy0 = 1, then symbol A has an error; the erroneous A is corrected by XOR-ing it with Sy0. Otherwise, if the error location ≠ 1, no error occurred in A.

6. If the error location = 2, then symbol B has an error; the erroneous B is corrected by XOR-ing it with Sy0. Otherwise, if the error location ≠ 2, no error occurred in B.

Figure 4.9 shows the circuit diagram of the RS decoder for the input first-codeword. There are four constant multipliers α^1, α^2, α^3 and α^4; these circuits perform the multiplications of the equation in step 3. There are four 4-bit XORs: two of them (at the left side) compute the two syndromes Sy0 (step 2) and Sy1 (step 3), while the other two perform the correcting procedure of steps 5 and 6. The submod block is a look-up table (LUT) that finds the error location of step 4; it maps the power of α to its binary value and vice versa. Furthermore, the diagram contains two multiplexers, which select the correct values of symbols A and B in steps 5 and 6 depending on the error location.
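The following Python sketch (an added illustration of steps 2 to 6, not the circuit of Figure 4.9; rs_decode is our own helper name) assumes at most one erroneous symbol per codeword and mirrors the syndrome, error-location and correction steps described above:

    ALPHA = [1, 2, 4, 8, 9, 11, 15, 7, 14, 5, 10, 13, 3, 6, 12]   # Table 4.1 (decimal)
    LOG = {v: i for i, v in enumerate(ALPHA)}

    def gf_mul(a, b):
        return 0 if a == 0 or b == 0 else ALPHA[(LOG[a] + LOG[b]) % 15]

    def rs_decode(a1, b1, r1, s1):
        """Steps 2-6 for one 16-bit codeword; a single erroneous symbol is assumed."""
        sy0 = a1 ^ b1 ^ r1 ^ s1                                        # step 2
        sy1 = (gf_mul(a1, ALPHA[1]) ^ gf_mul(b1, ALPHA[2]) ^
               gf_mul(r1, ALPHA[3]) ^ gf_mul(s1, ALPHA[4]))            # step 3
        if sy0 == 0 and sy1 == 0:
            return a1, b1                          # no error detected
        location = (LOG[sy1] - LOG[sy0]) % 15      # step 4: Sy1/Sy0 as a power of alpha
        if location == 1:
            a1 ^= sy0                              # step 5: symbol A1 is erroneous
        elif location == 2:
            b1 ^= sy0                              # step 6: symbol B1 is erroneous
        return a1, b1                              # errors in R1 or S1 need no data fix

    # Encode A1=0x3, B1=0xA (Equations 4.3 and 4.4), corrupt B1, and decode.
    a1, b1 = 0x3, 0xA
    r1 = gf_mul(a1, ALPHA[5]) ^ gf_mul(b1, ALPHA[11])
    s1 = gf_mul(a1, ALPHA[10]) ^ gf_mul(b1, ALPHA[14])
    assert rs_decode(a1, b1 ^ 0b0110, r1, s1) == (a1, b1)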

Figure 4.9: Circuit diagram of the RS decoder for 8-bit data using GF(2^4)


4.3 Redundant Residue Number System ECC

We design an RRNS encoder and decoder for 16-bit data to correct one erroneous symbol or residue. According to [45], RRNS codes have an error correction capability t = ⌊(n − k)/2⌋, where n is the number of codeword symbols and k is the number of dataword symbols. We use a moduli set of the form {2^f − 1, 2^f, 2^f + 1}, where f is a positive integer, instead of arbitrary moduli, because it results in smaller hardware and lower latency [45].

For this work, we divide a 16-bit dataword into two non-redundant residues (equivalent to the two symbols of the RS code). These non-redundant residues are generated based on the moduli set {m1, m2} = {2^f, 2^f + 1}. To correct one symbol error, we add two redundant residues to form a codeword. These redundant residues are generated based on the moduli set {m3, m4} = {2^f − 1, 2^f + 1}, chosen to prevent excessive area and time overhead while fulfilling the RRNS rules. For this purpose, f is set to 8 for the non-redundant residues and 9 for the redundant residues. Thus, the design uses the moduli set {256, 257} for {m1, m2} and {511, 513} for {m3, m4} to form an RRNS code, as illustrated in Figure 4.10. Note that the bit length of each residue depends on the corresponding modulus and is calculated as xi = ⌈log2 mi⌉, where i = 1, 2, 3, 4. The RRNS codeword therefore has a bit length of ⌈log2 256⌉ + ⌈log2 257⌉ + ⌈log2 511⌉ + ⌈log2 513⌉ = 8 + 9 + 9 + 10 = 36.
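As an added quick check, the residue widths and the total codeword length follow directly from xi = ⌈log2 mi⌉:

    from math import ceil, log2

    moduli = [256, 257, 511, 513]                  # {m1, m2, m3, m4}
    widths = [ceil(log2(m)) for m in moduli]       # x_i = ceil(log2 m_i)
    print(widths, sum(widths))                     # [8, 9, 9, 10] 36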

Figure 4.10: RRNS codeword for a 16-bit input (x1 mod 256: 8 bits, x2 mod 257: 9 bits, x3 mod 511: 9 bits, x4 mod 513: 10 bits; x1 and x2 are non-redundant, x3 and x4 are redundant residues)

4.3.1 Design of RRNS encoder

The RRNS encoder is designed using shifting, adding and subtracting operations for each residue. The following describes the method used for the implementation [45].

• For modulus m1 = 256 = 2^8, the residue x1 = |X|m1 (X denotes the input data) is obtained by taking the eight least significant bits of the input.

• For modulus m2 = 257 = 2^8 + 1, we divide the 16-bit data input into two groups B0 and B1; B0 consists of bits 0 to 7, while B1 consists of bits 8 to 15. The residue x2 is then obtained by x2 = |X|m2 = B0 − B1 if (B0 ≥ B1), else x2 = 257 − (B1 − B0).

• For modulus m3 = 511 = 2^9 − 1, we divide the 16-bit data input into two groups B0 and B1; B0 consists of bits 0 to 8, while B1 consists of bits 9 to 15. The residue x3 is then obtained by x3 = |X|m3 = |B0 + B1|m3.

• For modulus m4 = 513 = 2^9 + 1, we divide the 16-bit data input into two groups B0 and B1; B0 consists of bits 0 to 8, while B1 consists of bits 9 to 15. The residue x4 is then obtained by x4 = |X|m4 = B0 − B1 if (B0 ≥ B1), else x4 = 513 − (B1 − B0).

The block diagram of the RRNS encoder is shown in Figure 4.11. Block Mod256_257 converts the input data into x1 and x2, the residues for modulo 256 and modulo 257, respectively. Block Mod511_513 converts the input data into x3 and x4, the residues for modulo 511 and modulo 513, respectively. For the implementation, modulo 256 and modulo 257 are described in VHDL as one entity because they share the same value f = 8 of the chosen moduli set; the same holds for modulo 511 and modulo 513, which share f = 9.
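The following Python sketch (an added software model of the shift/add method above, not the VHDL entities themselves; rrns_encode is our own helper name) computes the four residues and cross-checks them against a plain modulo operation for every 16-bit input:

    def rrns_encode(x):
        """Residues of a 16-bit word x for the moduli {256, 257, 511, 513}."""
        b0_8, b1_8 = x & 0xFF, x >> 8              # 8-bit split used for 256 and 257
        b0_9, b1_9 = x & 0x1FF, x >> 9             # 9-bit split used for 511 and 513
        x1 = b0_8                                                     # |X| mod 2^8
        x2 = b0_8 - b1_8 if b0_8 >= b1_8 else 257 - (b1_8 - b0_8)     # |X| mod 2^8+1
        x3 = (b0_9 + b1_9) % 511                                      # |X| mod 2^9-1
        x4 = b0_9 - b1_9 if b0_9 >= b1_9 else 513 - (b1_9 - b0_9)     # |X| mod 2^9+1
        return x1, x2, x3, x4

    # Cross-check the shift/add formulas against plain modulo for all 16-bit inputs.
    for x in range(1 << 16):
        assert rrns_encode(x) == (x % 256, x % 257, x % 511, x % 513)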

Figure 4.11: Block diagram of the RRNS encoder

4.3.2 Design of RRNS decoder

As described in Section 3.4, Chapter 3, the decoding process for RRNS codes has two steps. The first step is to detect an error by calculating all the residues and checking whether the result lies within the legitimate range. Only if it is not within the legitimate range (i.e., beyond it) is the second step, the correcting step, performed. The latter is done in our case by discarding one residue at a time (since t = 1) and then recalculating from the remaining residues. This process is performed iteratively until a value within the legitimate range is found. The method is based on Mixed Radix Conversion (MRC) (see Section 3.4, Chapter 3).

In our implementation, we modify the decoding algorithm: the design does not perform a separate detection step but performs the correction directly, because the detection is already included in the correction procedure. For this, we check the mixed radix digits vi (see Section 3.4, Chapter 3). In the design under consideration, there are three mixed radix digits v1, v2, v3, which have to be calculated from the three remaining residues in each iteration. The output data X′ is calculated with the formula:

X ′ = v1 + v2 × m1 + v3 × m1 × m2

where the vi are, for example in the iteration step with discarded residue x4:

v1 = x1
v2 = |(x2 − v1) × c1|2|m2
v3 = |((x3 − v1) × c1|3 − v2) × c2|3|m3

As mentioned before, the legitimate range is [0, m1 × m2). In the above equation, if v3 ≠ 0, then X′ ≥ m1 × m2, meaning that X′ is not within the legitimate range; this indicates that an error has occurred. Otherwise, if v3 = 0, then X′ is within the legitimate range, meaning that the output data is valid.

In calculating X′, precalculated multiplicative inverses ca|s are needed; ca|s is the multiplicative inverse of ma modulo ms, obtained by solving |ma × ca|s|ms = 1 for a ∈ {1, 2, 3}, s ∈ {2, 3, 4} and a < s (see also Section 3.4, Chapter 3).

So, the values for all multiplicative inverses required for this design are:

|256 × c1|2|257 = 1 ⇒ c1|2 = 256
|256 × c1|3|511 = 1 ⇒ c1|3 = 2
|256 × c1|4|513 = 1 ⇒ c1|4 = 511
|257 × c2|3|511 = 1 ⇒ c2|3 = 171
|257 × c2|4|513 = 1 ⇒ c2|4 = 2
|511 × c3|4|513 = 1 ⇒ c3|4 = 256
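As an added cross-check, these inverses can be recomputed with Python's built-in modular inverse (pow(a, -1, m), available from Python 3.8):

    for a, m in [(256, 257), (256, 511), (256, 513), (257, 511), (257, 513), (511, 513)]:
        print(f"inverse of {a} mod {m} = {pow(a, -1, m)}")
    # Printed values: 256, 2, 511, 171, 2, 256 -- i.e. c1|2, c1|3, c1|4, c2|3, c2|4, c3|4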

The following procedure describes the decoding process in the RRNS decoder under study (a software sketch of the iterations is given after the list):

1. Discard residue x4.
   Calculate, from the remaining residues x1, x2, x3, the mixed radix digits using MRC:
   v1 = x1
   v2 = |(x2 − v1) × c1|2|m2
   v3 = |((x3 − v1) × c1|3 − v2) × c2|3|m3
   If v3 ≠ 0, then an error has occurred; go to step 2.
   Else calculate X′ = v1 + (v2 × m1), and go to step 6.

2. Discard residue x3.
   Calculate, from the remaining residues x1, x2, x4, the mixed radix digits using MRC:
   v1 = x1
   v2 = |(x2 − v1) × c1|2|m2
   v3 = |((x4 − v1) × c1|4 − v2) × c2|4|m4
   If v3 ≠ 0, then an error has occurred; go to step 3.
   Else calculate X′ = v1 + (v2 × m1), and go to step 6.

3. Discard residue x2.
   Calculate, from the remaining residues x1, x3, x4, the mixed radix digits using MRC:
   v1 = x1
   v2 = |(x3 − v1) × c1|3|m3
   v3 = |((x4 − v1) × c1|4 − v2) × c3|4|m4
   If v3 ≠ 0, then an error has occurred; go to step 4.
   Else calculate X′ = v1 + (v2 × m1), and go to step 6.

4. Discard residue x1.
   Calculate, from the remaining residues x2, x3, x4, the mixed radix digits using MRC:
   v1 = x2
   v2 = |(x3 − v1) × c2|3|m3
   v3 = |((x4 − v1) × c2|4 − v2) × c3|4|m4
   If v3 ≠ 0, then an error has occurred that cannot be corrected, because the erroneous residues are beyond the error correction capability; go to step 5.
   Else calculate X′ = v1 + (v2 × m2), and go to step 6.

5. Take X′ = v1 + (v2 × m1) as a default value for the uncorrectable error. Note that this can be any value, as the result is wrong anyway.

6. Read data out = X ′
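The following Python sketch (an added illustration of steps 1 to 6; mrc3 and rrns_decode are our own helper names) executes the iterations sequentially, whereas the hardware of Figure 4.12 evaluates them in parallel:

    def mrc3(res, mod):
        """Mixed radix digits v1, v2, v3 of three residues res w.r.t. moduli mod."""
        (x_a, x_b, x_c), (m_a, m_b, m_c) = res, mod
        v1 = x_a
        v2 = ((x_b - v1) * pow(m_a, -1, m_b)) % m_b          # uses c_a|b
        v3 = (((x_c - v1) * pow(m_a, -1, m_c) - v2)
              * pow(m_b, -1, m_c)) % m_c                     # uses c_a|c and c_b|c
        return v1, v2, v3

    def rrns_decode(x1, x2, x3, x4):
        """Steps 1-6: try each 3-residue subset; return the decoded data X'."""
        subsets = [((x1, x2, x3), (256, 257, 511)),          # step 1: discard x4
                   ((x1, x2, x4), (256, 257, 513)),          # step 2: discard x3
                   ((x1, x3, x4), (256, 511, 513)),          # step 3: discard x2
                   ((x2, x3, x4), (257, 511, 513))]          # step 4: discard x1
        for res, mod in subsets:
            v1, v2, v3 = mrc3(res, mod)
            if v3 == 0:                                      # within the legitimate range
                return v1 + v2 * mod[0]
        # step 5: uncorrectable; default value from the non-redundant residues x1, x2
        v2 = ((x2 - x1) * pow(256, -1, 257)) % 257
        return x1 + v2 * 256

    # Encode 0xBEEF, corrupt residue x2, and recover the dataword (step 3 succeeds).
    x = 0xBEEF
    assert rrns_decode(x % 256, (x % 257 + 5) % 257, x % 511, x % 513) == x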

In general, MRC operates sequentially. In our implementation, however, we parallelize the algorithm to make it faster: each iteration becomes one block of the circuit, and some signals are shared. The RRNS decoder is shown in Figure 4.12. In the figure there are three types of circuit blocks, as follows:

• Block SubmodmulM(i), where M = 257, 511, 513 and i = 1, 2, 3, 4 (for example Submodmul257(1,2,3); the numbers in brackets indicate that the block is used in iteration steps 1, 2 and 3). These blocks perform a subtraction, a modulo-mi reduction and a multiplication, for example to calculate v2 = |(x2 − v1) × c1|2|m2 (see step 1 of the decoding algorithm).

• Block AddmulM(i), where M = 256, 257 and i = 1, 2, 3 (for example Addmul256(1,2); the numbers in brackets indicate that the block is used in iteration steps 1 and 2). These blocks multiply v2 with the modulus M and then add the result to v1, for example to calculate X′ = v1 + (v2 × m1) (see iteration step 1 of the decoding algorithm).

• Block Mux. This block performs the multiplexing operation; its inputs are the signals from the left and its selectors are the signals from the bottom.

We can also see three types of signals:

• Signals xi, where i = 1, 2, 3, 4 (i.e., x1, x2, x3 and x4). These signals come from the input codeword: signal x1 consists of bits 35 downto 28, signal x2 of bits 27 downto 19, signal x3 of bits 18 downto 10, and signal x4 of bits 9 downto 0.


Figure 4.12: Block diagram of the RRNS decoder (Submodmul = subtract, modulo, multiply; Addmul = multiply by the modulus and add)

• Signals vn(i), where n = 1, 2, 3 and i = 1, 2, 3, 4. These signals are the mixed radix digits of each iteration step. For example, v1(1,2,3) is v1 in iteration steps 1, 2 and 3 of the decoding algorithm.

• Signals ca|b, where a = 1, 2, 3 and b = 2, 3, 4. These signals are the multiplicative inverses required in the MRC calculation. For example, c1|2 is the multiplicative inverse of m1 with respect to m2.

Each iteration step of the decoding algorithm is realized with the blocks and signals of Figure 4.12. The first four steps of the decoding algorithm are represented in the figure as follows:

• The first step of the decoding algorithm is realized with the blocks Submodmul257(1,2), Submodmul511(1,3), Submodmul511(1) and Addmul256(1,2).

• The second step with the blocks Submodmul257(1,2,3), Submodmul513(2,3), Submodmul513(2) and Addmul256(1,2).

• The third step with the blocks Submodmul257(1,2,3), Submodmul513(2,3), Submodmul513(3) and Addmul256(3).

• The fourth step with the blocks Submodmul511(4), two Submodmul513(4) blocks and Addmul257(4).


Iteration step 1 of the decoding algorithm is chosen as an example to describe the operation of the block diagram. The calculations in iteration step 1 are:

• v1 = x1. This calculation is represented by signal v1(1,2,3), taken directly from x1.

• v2 = |(x2 − v1) × c1|2|m2, where m2 = 257. This calculation is represented by block Submodmul257(1,2) with input signals x2, v1 and c1|2; the output is v2(1,2).

• v3 = |((x3 − v1) × c1|3 − v2) × c2|3|m3, where m3 = 511. This calculation is represented by block Submodmul511(1,3), with input signals v1(1,2,3), x3 and c1|3, and by block Submodmul511(1), with input signals v2(1,2), c2|3 and the output of block Submodmul511(1,3). The output of Submodmul511(1) is signal v3(1).

• X′ = v1 + (v2 × m1), where m1 = 256. This calculation is represented by block Addmul256(1,2) with input signals v1(1,2,3) and v2(1,2). The output of block Addmul256(1,2) becomes one of the multiplexer inputs.


5 EXPERIMENTAL RESULTS AND ANALYSIS

This chapter presents the experimental results and analysis of the implementation work. The chapter starts with a short description of the simulation setup together with some results; the simulation is performed using the Modelsim simulator. Then, the chapter presents the synthesis results for 16-bit data, obtained with Xilinx and Synopsys design tools. Thereafter, it discusses and analyzes the obtained experimental results. Finally, the obtained results are explored for more general cases.

5.1 Simulation Setup

In Chapter 4, we designed the encoder and decoder for three ECCs: Hamming, RS and RRNS. Note that the Hamming ECC has a one-bit error correction capability, whereas RS and RRNS have a one-symbol/residue error correction capability.

In order to verify the functionality of the designed encoders and decoders, we perform simulations using the Modelsim simulator. Two simulation cases are considered: (1) the error-free case and (2) the error case, as shown in Figures 5.1(a) and (b). The first case verifies the correctness of the encoder and decoder circuits; the second case verifies the error correction capability of each ECC scheme.

For the first case, we use the simulation setup depicted in Figure 5.1(a). A 16-bit data word is encoded by the encoder, producing a codeword. The codewords are stored in a 1 kB RAM, named error-free-RAM. Next, the outputs of the RAM become the inputs of the decoder. The decoded codeword must be equal to the value of the input data, which verifies that the encoder and the decoder work correctly. For the second case, two memories of 1 kB each are used, namely error-RAM and error-free-RAM, as shown in Figure 5.1(b). The errors from the error-file are injected only into the error-RAM. The output data of the error-free-RAM are used to verify whether the decoder can correct the errors or not.

The error-file consists of two columns separated by a space character. Figure 5.2 illustrates a portion of the error file used to inject errors for the 21-bit Hamming codeword. The left column is the memory address and the right column is a binary number representing the error, whose length equals the length of the codeword. A binary 1 indicates that the error corrupts the data bit at that position in the error-RAM. Figure 5.2 shows two memory locations that are impacted by errors: the first is a single-bit error at position 2 in memory address 1; the second is a two-bit error at positions 2 and 4 in memory address 3. The contents of the error file are XOR-ed with the stored data in the error-RAM, so the stored bits are flipped at the bit positions where the error file contains a binary 1.


Figure 5.1: (a) Simulation setup to verify the functionality of the encoder and decoder; (b) simulation setup to evaluate the error correction capability of each ECC

0 000000000000000000000
1 000000000000000000010
2 000000000000000000000
3 000000000000000001010
4 000000000000000000000
5 000000000000000000000
6 000000000000000000000
7 000000000000000000000

Figure 5.2: The error file that is masked with the codewords in the memory cells
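The following Python sketch (an added illustration; the actual testbench is written in VHDL and simulated in Modelsim) shows the masking step for the two error lines of Figure 5.2, using an all-zero toy memory so that only the injected bit flips are visible:

    error_file = """\
    1 000000000000000000010
    3 000000000000000001010"""                        # the two error lines of Figure 5.2

    error_ram = {addr: 0 for addr in range(8)}        # toy model; real codewords would
                                                      # have been written by the encoder
    for line in error_file.splitlines():
        addr, mask = line.split()
        error_ram[int(addr)] ^= int(mask, 2)          # a '1' flips the stored bit

    print(f"{error_ram[1]:021b}", f"{error_ram[3]:021b}")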

5.1.1 Hamming simulation

The simulation result of the Hamming ECC is shown in Figure 5.3. Note that only eight memory locations are shown for visibility purposes. There are six groups of signals: address, input data, output data, control signals, stored codewords, and test control signals. The address is represented by address_s, the input data by data_in, and the two output data by data_out (the output of the error-RAM) and correct_data (the output of the error-free-RAM). The control signals are we_s for write enable and oe_s for output enable. The stored codewords at the different addresses are represented by the ram_block signals. The test control signal fault_in enables the injection of faults from the error file into the error-RAM. The syndrom signal shows the syndrome value and indicates whether an error occurred in the codeword.

The first step in the simulation is to write all memory addresses with data_in. The value of data_in is equal to the value of its address; for example, memory address 000 is written with data_in = 0000, memory address 001 with data_in = 0001, and so on. These input data are written into both RAMs, i.e., the error-RAM and the error-free-RAM. The contents of the error-RAM are the Hamming codewords, which are displayed by the ram_block signals.

After all memory addresses have been written, faults are injected into the error-RAM. This is indicated by the signal fault_in = 1 for a short period, which enables the contents of the error-file to mask the contents of ram_block. We can see that at memory addresses 1 and 3 the values of ram_block change. Memory address 1 (indicated by the circle) is impacted by an error at bit position 10, so it changes from 000023h to 000423h. Memory address 3 (indicated by the rectangle) is impacted by errors at bit positions 12 and 10, so the codeword changes from 000066h to 001466h.

Next, the read process is performed on both RAMs, the error-RAM and the error-free-RAM. The output of the error-RAM is data_out, while the output of the error-free-RAM is correct_data. When memory address 1 is read, syndrom is not equal to 0, which means that an error has been found in the codeword. The signal syndrom_s = 0Ah means that the bit at position 10 is flipped. The error has been corrected by the decoder, and the read data results in data_out = 0001h. To verify that this output is correct, it is compared to correct_data; since they are the same, the error has been corrected, because it is a one-bit error.

Figure 5.3: Hamming simulation

Page 74: thesis all f - Delft University of Technologyce-publications.et.tudelft.nl/publications/377_experimental_analysis... · COMPUTER ENGINEERING by Zaiyan Ahyadi born in Banjarmasin,

58 CHAPTER 5. EXPERIMENTAL RESULTS AND ANALYSIS

When memory address 3 is read, the syndrome is also not 0, meaning that an error has been found in the codeword. The signal syndrom_s = 06h means that the bit at position 6 is flipped. The error correction procedure results in data_out = 00A7h. The comparison to correct_data indicates that the corrupted codeword is not corrected properly. This is expected, because there are two erroneous bits, one more than the error correction capability of Hamming. Hence, if the number of corrupted bits per codeword exceeds the correction capability, the error will not be corrected.

5.1.2 Reed Solomon simulation

The simulation result of the Reed Solomon ECC is shown in Figure 5.5. Note that only eight memory locations are shown for visibility purposes. Similarly to the Hamming simulation, there are six groups of signals, and each signal in Figure 5.5 has the same name as in Figure 5.3. However, there are four syndrome signals for RS instead of one for Hamming. Additional signals called location are also shown in the figure; these represent the location of the erroneous symbol. The four syndrome signals of the two decoder circuits are dec1/sy0, dec1/sy1, dec2/sy0 and dec2/sy1. Note that in the RS ECC design we divide the codeword into two parts (first-codeword and second-codeword, see Section 4.2, Chapter 4). The symbols of the codeword A1, A2, B1, B2, R1, R2, S1 and S2 are the bits at positions 31 downto 28, 27 downto 24, 23 downto 20, 19 downto 16, 15 downto 12, 11 downto 8, 7 downto 4, and 3 downto 0, respectively (see Figure 5.4).

Figure 5.4: Bit positions for each symbol in the RS codeword (datawords A1: 31..28, A2: 27..24, B1: 23..20, B2: 19..16; checkwords R1: 15..12, R2: 11..8, S1: 7..4, S2: 3..0; 4 bits each)

The first step of the RS simulation is a write operation, similar to that of the Hamming simulation. In the second step, the error-file is used to inject errors into the error-RAM. At memory address 1, the codeword 00010D0Ch is masked by the error 00FF0000h, resulting in 00FE0D0Ch; the binary value of the error represents a cluster error at bit positions 16 to 23, which corrupts both symbols B1 and B2. At memory address 3, the codeword 00030E0Dh is masked by the error 40100000h, resulting in 40130E0Dh; these errors impact two different symbols of the first-codeword, namely A1 and B1. At memory address 5, the codeword 00050B0Eh is masked by the error 0FF00000h, resulting in 0FF50B0Eh; this cluster error corrupts the two symbols A2 and B1.

When the read operation is performed at memory address 1, all four syndrome signals are not equal to 0, indicating that errors occurred in the codeword. The correcting procedure of the decoder results in data_out = 0001h. The comparison to correct_data shows that the output is correct, which means that the error has been corrected. This conforms to the RS theory, since the code can correct one symbol per codeword and the corrupted symbols B1 and B2 belong to different interleaved codewords.

Figure 5.5: Reed Solomon simulation

When the read operation is performed at memory address 3, not all of the syndrome signals are equal to 0. This indicates that errors occurred in the codeword. The correcting procedure of the decoder results in data_out = 4013h. The comparison to correct_data shows that the output is not corrected; this means that the errors cannot be corrected. This is expected, because the errors occurred in two different symbols of the same codeword, while the correction capability of the designed RS encoder and decoder is only one symbol per codeword.

When the read operation is performed at memory address 5, all the syndrome signals are not equal to 0, indicating that errors occurred in the codeword. The correcting procedure of the decoder results in data_out = 0005h. The comparison to correct_data shows that the output is correct, which means that the errors have been corrected. Although the codeword is corrupted at two different symbols, A2 (belonging to the second-codeword) and B1 (belonging to the first-codeword), the designed RS ECC can correct the errors because the symbols belong to different codewords. This is the advantage of applying the interleaving structure to the RS codeword.

5.1.3 RRNS simulation

The simulation result of the RRNS ECC is shown in Figure 5.7. Note that only eight memory locations are shown for visibility purposes. All signal names carry the same meaning as in the RS simulation. There are four detecting signals in the decoder circuit, v3_1, v4_2, v4_3 and v4_4, which represent the syndrome signals: v3_1 represents the syndrome for iteration step 1 of the decoding algorithm (see Section 4.3, Chapter 4), v4_2 for iteration step 2, v4_3 for iteration step 3, and v4_4 for iteration step 4. Note that in the RRNS ECC design we divide the codeword into four residues x1, x2, x3 and x4; these are the bits at positions 35 downto 28, 27 downto 19, 18 downto 10, and 9 downto 0, respectively (see Figure 5.6).

Figure 5.6: Bit positions for each residue in the RRNS codeword (x1: 35..28, x2: 27..19, x3: 18..10, x4: 9..0; x1 and x2 are non-redundant, x3 and x4 are redundant residues)

Figure 5.7: RRNS simulation

First, all data are encoded and stored in both RAMs, the error-RAM and the error-free-RAM. Then, the content of the error-file is used to inject errors into the error-RAM. At memory address 1, the codeword 010080401h is masked by 00FF00000h, resulting in 01FF80401h; this cluster error corrupts the non-redundant residue x2. At memory address 3, the codeword 030180C03h is masked by 080200000h, resulting in 0B0380C03h; these random multi-bit errors corrupt the codeword at bit positions 31 and 21 (corrupting x1 and x2). At memory address 5, the codeword 050281405h is masked by 00003FFFFh, resulting in 0502BEBFAh; the errors occur in bits 0 to 17 (18 adjacent bits, which corrupt x3 and x4).

The next step is to perform read operations. When memory address 1 is read, three of the four detecting signals, v3_1, v4_2 and v4_4, are not equal to zero, meaning that errors have occurred. One of the detecting signals, v4_3, is equal to zero, which means that the error can be corrected. The correcting procedure results in data_out = 0001h. The correct_data output from the error-free-RAM is 0001h, which confirms that the correction procedure has corrected the errors.


When memory address 3 is read, all four detecting signals v3_1, v4_2, v4_3 and v4_4 are not equal to zero, which means that errors have occurred and cannot be corrected. In this case, the default output is calculated from the non-redundant residues, resulting in data_out = EF0Bh. The comparison to correct_data shows that the output is uncorrectable. This follows the theory, since the designed RRNS encoder and decoder can correct only one erroneous residue, whereas here the errors corrupt two residues, which is beyond the error correction capability.

When memory address 5 is read, all four detecting signals v3_1, v4_2, v4_3 and v4_4 are not equal to zero, which means that errors have occurred and cannot be corrected. The default output is calculated from the non-redundant residues, resulting in data_out = 0005h. Although the comparison to correct_data shows that the output is correct, this is just a fortunate case and does not hold in general, because the error is beyond the error correction capability of the designed RRNS encoder and decoder.

The simulation results of the three ECCs verify and validate that the three designed ECCs, namely Hamming, RS and RRNS, work properly and conform to the theory regarding their error correction capability.

5.2 Synthesis Results

We synthesize the encoders and decoders of Hamming, RS and RRNS to estimate the area and time overhead. For this work, we use the Xilinx ISE 10.1 and Synopsys A-2007 tools. Xilinx synthesizes the design to a Virtex 4 Field Programmable Gate Array (FPGA), while Synopsys synthesizes the design to an Application Specific Integrated Circuit (ASIC) based on 90 nm CMOS technology. The Xilinx tool gives the area estimation in terms of slices or Look-Up Tables (LUTs), whereas the Synopsys tool gives the area estimation in terms of cell area (μm2). Both tools provide the time estimation in nanoseconds (ns).

Unit           Hamming               Reed Solomon           RRNS
               encoder   decoder     encoder   decoder      encoder   decoder
4-input LUT    12        73          22        86           97        493
Delay (ns)     5.823     7.793       5.546     10.374       7.616     25.693

Table 5.1: Synthesis results of the three ECCs using Xilinx ISE 10.1 for 16-bit data input

Table 5.1 gives the area and time overhead of all designed encoder and decoder circuits for 16-bit data on the FPGA. The table clearly shows that Hamming requires the smallest area, followed by RS and RRNS. Specifically, the Hamming encoder and decoder require 12 and 73 LUTs, respectively, i.e., 85 LUTs in total. The RS encoder and decoder require 22 and 86 LUTs, respectively, 108 LUTs in total. The RRNS encoder and decoder require 97 and 493 LUTs, respectively, 590 LUTs in total. This area overhead is also illustrated in Figure 5.8(a).


Page 78: thesis all f - Delft University of Technologyce-publications.et.tudelft.nl/publications/377_experimental_analysis... · COMPUTER ENGINEERING by Zaiyan Ahyadi born in Banjarmasin,

62 CHAPTER 5. EXPERIMENTAL RESULTS AND ANALYSIS

Figure 5.8: Comparison of the three ECCs: (a) area overhead and (b) time overhead, using Xilinx (from Table 5.1)

In terms of time latency, the Hamming code is the fastest, with 5.823 ns for the encoder and 7.793 ns for the decoder, followed by Reed Solomon with 5.546 ns for the encoder and 10.374 ns for the decoder (the RS encoder alone is slightly faster, but the total Hamming latency is the lowest), and RRNS with 7.616 ns for the encoder and 25.693 ns for the decoder. This latency information is also illustrated in Figure 5.8(b). Note that the decoding process of RRNS is by far the most time consuming.

The synthesis results using the Synopsys tool for the same designed encoders and decoders are given in Table 5.2. The results show that the total area (encoder plus decoder) of Hamming is the smallest, followed by RS and RRNS. Specifically, the area overhead of Hamming is 236.376 + 405.014 = 641.390 μm2, that of RS is 261.072 + 1155.772 = 1416.844 μm2, and that of RRNS is 1042.171 + 5936.212 = 6978.383 μm2.

In terms of time latency, the Hamming encoder takes 0.46 ns, followed by RS with 0.83 ns and RRNS with 1.99 ns. The Hamming decoder latency is 0.98 ns, followed by RS with 2.02 ns and RRNS with 6.57 ns.

From the Xilinx and Synopsys synthesis results for 16-bit input data, we can conclude that Hamming is the smallest in terms of area overhead and the fastest in terms of time latency, but it can correct only one bit error. In the case of cluster errors, RS is better than RRNS in terms of area overhead and time latency. Overall, RRNS is the most expensive, with the largest area overhead and the largest impact on performance, while Hamming is the cheapest. However, RRNS is better in terms of error correction capability, as we will see in the next section.


Unit          Hamming                Reed Solomon             RRNS
              encoder    decoder     encoder    decoder       encoder     decoder
Area (μm2)    236.376    405.014     261.072    1155.772      1042.171    5936.212
Delay (ns)    0.46       0.98        0.83       2.02          1.99        6.57

Table 5.2: Synthesis results of the three ECCs using Synopsys for 16-bit data input

Figure 5.9: Comparison of the three ECCs: (a) area overhead and (b) time overhead, using Synopsys (from Table 5.2)

5.3 Discussion and Analysis

This section discusses and analyzes the different aspects of the studied ECCs in terms of their error correction capabilities and the required cost. First, the memory cell array area overhead is estimated; thereafter, the error correction capabilities are compared; finally, an overall comparison is given.

5.3.1 Memory cell array overhead

The area of the memory cell array depends on the bit length of the codewords. Therefore, we did not synthesize the memory cell array; its area can be estimated by multiplying the codeword bit length by the number of addresses. Furthermore, the memory cell array of a hybrid memory is formed by non-CMOS nanodevices, and tools to estimate the area of such non-CMOS devices do not exist yet.

Table 5.3 shows a comparison of the codeword bit lengths of Hamming, RS and RRNS for 16, 32 and 64 data bits. From the table we can see that the codeword bit length for each ECC is obtained as follows:

• The Hamming codeword bit length follows the formula n = k + ⌈log2 k⌉ + 1, where n is the codeword bit length and k is the dataword bit length (see Section 3.2, Chapter 3).


• The RS codeword bit length follows the formula n = k + j = k + 2t, where n is the number of codeword symbols, k the number of dataword symbols, j the number of checkword symbols, and t the number of correctable erroneous symbols. Since our designed RS code divides the dataword into two symbols (k = 2) and we target the correction of up to one erroneous symbol (t = 1), we have n = 2 + 2 = 4 symbols. This means that the codeword bit length of our designed RS code is double the data bit length.

• The RRNS codeword bit length follows the formula n = k + j = k + 2t, where n is the number of codeword residues, k the number of dataword residues, j the number of checkword residues, and t the number of correctable erroneous residues. Since we target a one-erroneous-residue correction capability (t = 1) and divide the input data into two residues (k = 2), the number of codeword residues is n = k + 2t = 2 + 2 = 4. The number of codeword residues is thus twice the number of dataword residues, as for the RS codeword. However, the bit length of each residue depends on the moduli set. In our design, we use a moduli set {m1, m2, m3, m4} equal to {2^f, 2^f + 1, 2^(f+1) − 1, 2^(f+1) + 1}, where f is a positive integer, and the residue bit lengths for x1, x2, x3, x4 are f, f + 1, f + 1 and f + 2, respectively. Since our designed RRNS code chooses f = kbit/2 (where kbit is the dataword bit length) to minimize the bit length, this moduli set results in a codeword bit length of 2·kbit + 4.

Data          Codeword bit length
bit length    Hamming    RS      RRNS
16            21         32      36
32            38         128     132
64            71         256     260

Table 5.3: Codeword bit length of the Hamming, RS and RRNS codes
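As an added check for the 16-bit designs of this work, the first row of Table 5.3 follows directly from the formulas above:

    from math import ceil, log2

    k = 16                                    # dataword bit length of the implemented designs
    hamming = k + ceil(log2(k)) + 1           # 16 + 4 + 1 = 21 bits
    rs = 2 * k                                # doubled data width: 32 bits
    rrns = 2 * k + 4                          # moduli {2^8, 2^8+1, 2^9-1, 2^9+1}: 36 bits
    print(hamming, rs, rrns)                  # matches the 16-bit row of Table 5.3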

From Table 5.3 we can conclude that RRNS has the longest codeword, which means that the memory cell array area for the RRNS code is the largest. The Hamming code has the shortest codeword and therefore requires the smallest memory cell array area.

5.3.2 Error correction capability

Table 5.4 shows a comparison of the error correction capability of Hamming, RS and RRNS. Hamming has a one-bit error correction capability. RS can correct one erroneous symbol, which consists of 8, 16 or 32 bits for 16-, 32- and 64-bit input data, respectively. RRNS can also correct one erroneous residue, which consists of 8 to 10 bits, 16 to 18 bits, or 32 to 34 bits for 16-, 32- and 64-bit input data, respectively. The residues of the RRNS code have different bit lengths, with a maximum difference of 2 bits between the shortest and the longest residue.

For 16-bit input data, Hamming can correct 1/16 = 6.25%, RS can correct 8/16 = 50%, and RRNS can correct 8/16 = 50% to 10/16 = 62.5% of the data bits of each stored codeword. For 32-bit input data, Hamming can correct 1/32 = 3.13%, RS 16/32 = 50%, and RRNS 16/32 = 50% to 18/32 = 56.25%.


Data          Error correction capability
bit length    Hamming    RS                     RRNS
16            1 bit      1 symbol (8 bits)      1 residue (8 to 10 bits)
32            1 bit      1 symbol (16 bits)     1 residue (16 to 18 bits)
64            1 bit      1 symbol (32 bits)     1 residue (32 to 34 bits)

Table 5.4: Error correction capability of the Hamming, RS and RRNS codes

For 64-bit input data, Hamming can correct 1/64 = 1.56%, RS 32/64 = 50%, and RRNS 32/64 = 50% to 34/64 = 53.1%. These results show that the RRNS code is the best in terms of error correction capability.

5.3.3 Overall comparison

From the synthesis results we can see that, overall, Hamming is the best compared to RS and RRNS in terms of area overhead and time latency. However, the error correction capability of Hamming is only one bit, which is the worst, whereas RS and RRNS can correct one symbol or one residue consisting of several bits. Because of this, it is not fair to compare the cost of Hamming to that of RS and RRNS; therefore, we only analyze the cost trade-off between RS and RRNS.

The synthesis results show that the area and time overhead of the RS encoder and decoder are lower than those of RRNS. The reasons are the following:

1. RS symbols are based on Galois field elements, so all symbols have the same bit length. RRNS symbols, however, are residues generated from pairwise relatively prime moduli, which means that each residue has a different bit length; moreover, the redundant residues (checkword) must be larger than the non-redundant residues (dataword). Hence, the bit length of an RRNS codeword is longer than that of RS: e.g., for 16-bit input data we use a 32-bit codeword for RS but a 36-bit codeword for RRNS. The longer the bit length, the higher the cost.

2. RS uses XOR gates for bit-wise operations to compute the syndrome bits (as explained in Chapter 4), while the RRNS decoder uses multipliers to compute the residues when determining the legitimate range. Multipliers occupy a larger area and require a longer computation time than XORs. Furthermore, the RRNS decoder consists of more complex circuitry, including adders and subtracters for the modulo operations.

3. In this work we considered only a one-symbol error correction capability. RS checks and corrects the error in one shot, without iteration, whereas RRNS uses the mixed radix conversion (MRC) algorithm, which requires iterations to find the correct data. We parallelized the iteration process in our design, resulting in a bigger area but a faster computing time; this is the main reason why the area overhead of the RRNS decoder is much bigger than that of RS. Note, however, that if the error correction capability is more than one, RS decoding requires an iterative algorithm as well.


4. The error correction capability of the RS code is smaller than that of the RRNS code, but the codeword bit length of RS is also shorter than that of RRNS, as explained in the previous subsections.

From all the reasons above, we can conclude that, overall, RS is better than RRNS.

5.4 Exploring Area and Time Overhead for General Cases

In this section we explore the results for more general cases. First, the area and latency overhead of the RS and RRNS encoding and decoding circuits for different data bit lengths are discussed; we do not explore Hamming, since it is not fair to compare its cost to the others. Then we explore the RRNS code for a correction capability of more than one erroneous residue.

5.4.1 Synthesis results of RS and RRNS for various data widths

Additional work has been done to design the RS and RRNS encoders and decoders for data bit lengths of 32 and 64 in order to determine the trend in area and latency overhead. Tables 5.5 and 5.6 show the area and time of the RS and RRNS encoders and decoders generated by Xilinx for data bit lengths of 16, 32 and 64, respectively.

Data bit length   Reed Solomon                     RRNS
(bits)            encoder   decoder   total        encoder   decoder   total
                  (LUT)     (LUT)     (LUT)        (LUT)     (LUT)     (LUT)
16                22        86        108          97        493       590
32                44        172       216          197       847       1044
64                88        344       432          389       1623      2012

Table 5.5: Area overhead of the RS and RRNS encoder-decoder (using Xilinx tools)

Data bit length   Reed Solomon                     RRNS
(bits)            encoder   decoder   total        encoder   decoder   total
                  (ns)      (ns)      (ns)         (ns)      (ns)      (ns)
16                5.546     10.374    15.920       7.616     25.917    33.533
32                5.546     10.374    15.920       7.619     25.693    33.312
64                5.546     10.374    15.920       8.763     32.383    41.146

Table 5.6: Time overhead of the RS and RRNS encoder-decoder (using Xilinx tools)

Figure 5.10 gives the same results graphically. It shows that, as the data bit length increases, the area of the RS encoder plus decoder increases linearly, whereas its latency remains constant. For the RRNS encoder plus decoder, the area increases almost by a factor of 2 when the data bit length is doubled, while the time latency remains roughly constant for the 16- and 32-bit data lengths and increases by about 20% for the 64-bit data length.

To estimate the area and latency overhead more accurately, the same work has been done using the Synopsys tools. Tables 5.7 and 5.8 show the area and time overhead of the RS and RRNS encoders and decoders generated by Synopsys for data bit lengths of 16, 32 and 64.


Figure 5.10: RS and RRNS (a) area overhead and (b) time overhead (using Xilinx tools)

Data bit length   Reed Solomon                           RRNS
(bits)            encoder    decoder    total            encoder     decoder      total
                  (μm2)      (μm2)      (μm2)            (μm2)       (μm2)        (μm2)
16                261.072    1155.772   1416.844         1042.171    5936.212     6978.383
32                522.144    2218.406   2740.550         2128.089    9680.126     11808.215
64                1044.288   4487.616   5531.904         4692.240    36718.013    41410.013

Table 5.7: Area overhead of the RS and RRNS encoder-decoder (using Synopsys tools)

Data bit length   Reed Solomon                     RRNS
(bits)            encoder   decoder   total        encoder   decoder   total
                  (ns)      (ns)      (ns)         (ns)      (ns)      (ns)
16                0.83      2.02      2.85         1.99      6.57      8.46
32                0.83      2.12      2.95         3.23      11.28     14.61
64                0.83      2.17      3.00         4.96      20.11     25.07

Table 5.8: Time overhead of the RS and RRNS encoder and decoder (using Synopsys tools)

Figure 5.11 graphically presents the same results. One can clearly see that the area of the RS encoder plus decoder increases linearly, while its time latency remains roughly constant. For the RRNS encoder plus decoder, the area overhead increases exponentially, while the time latency increases almost linearly.

The main conclusions based on the above are:

• The area overhead of the RS encoder-decoder increases linearly with the input data bit length, while its time latency remains almost constant regardless of the input data bit length.

• The area overhead of the RRNS encoder-decoder increases exponentially with the input data bit length, while its time latency roughly doubles every time the input data bit length is doubled.

• These results show that RS is better than RRNS.


Figure 5.11: RS and RRNS (a) area overhead and (b) time overhead (using Synopsys tools)

5.4.2 Cost estimation of RRNS for higher error correction capability

Our design can only correct errors that occur in a single residue. Note that for an RRNS code j = 2t, where j is the number of redundant residues and t is the error correction capability; the code therefore requires two additional redundant residues for every additional correctable residue. If the code is extended to correct more erroneous residues, the memory cell array, the encoder and the decoder all become larger, and the time latency increases because more iterations have to be performed during the correction step. In the rest of this section we explore the RRNS code for more than one erroneous-residue correction capability, with respect to the memory cell array area, the encoder and decoder area, and the usefulness of RRNS for higher error correction capabilities.

5.4.2.1 Memory cells array area

The area of the memory cell array depends on the moduli set used to produce the residues. An RRNS code formed by four residues using the moduli set {2^f, 2^f + 1, 2^(f+1) − 1, 2^(f+1) + 1}, where f is a positive integer, can correct one erroneous residue, with a total codeword bit length of 4f + 4 bits. Note that this moduli set is used to realize fast and simple hardware [44]. The same type of moduli set cannot be used for a six-residue RRNS code that corrects two erroneous residues, because the moduli would no longer be pairwise relatively prime. For example, if the moduli 2^10 − 1 and 2^10 + 1 are added to the four moduli above, the new moduli set becomes {mi}, i = 1..6 = {256, 257, 511, 513, 1023, 1025}. However, checking the greatest common divisor (gcd) over all pairs of moduli, we find that the gcd of 513 and 1023 equals 3. This violates the rule that the gcd of every pair of moduli must be one, i.e., that all moduli are pairwise relatively prime. Breaking this rule causes ambiguity between the residues for 513 and 1023: for example, the numbers 174933 and 0 both give residue 0 for modulus 513 and residue 0 for modulus 1023, which can cause confusion during the decoding step.

One solution is to use an arbitrary moduli set: instead of moduli of the form 2^f, 2^f − 1, 2^f + 1, any primes relative to the existing moduli can be used. The advantage of an arbitrary moduli set is that the codeword length can be minimized; for example, for an RRNS code formed by six residues from the moduli set {mi}, i = 1..6 = {256, 257, 263, 509, 511, 513}, the codeword bit length is 8 + 9 + 9 + 9 + 9 + 10 = 54 bits. Nevertheless, an arbitrary moduli set leads to complex hardware, because the modulo operations can no longer be built from simple shifters, adders and subtracters. Another solution is to keep the pattern 2^f, 2^f − 1, 2^f + 1 and find values of f that fulfill the RRNS rules. The advantage is that the encoder-decoder can still be realized with fast and simple hardware; the disadvantage is that the codeword bit length becomes larger. For example, we can use the moduli set {mi}, i = 1..6 = {2^8, 2^8 + 1, 2^9 − 1, 2^9 + 1, 2^10 + 1, 2^11 − 1} = {256, 257, 511, 513, 1025, 2047}, where the codeword bit length is 8 + 9 + 9 + 10 + 11 + 11 = 58 bits.
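The pairwise-coprimality requirement can be checked quickly in software; the following added Python sketch (pairwise_coprime is our own helper name) confirms the violation for the first six-moduli set and the validity of the alternatives mentioned above:

    from math import gcd
    from itertools import combinations

    def pairwise_coprime(moduli):
        """True if every pair of moduli has gcd 1 (the RRNS requirement)."""
        return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))

    print(pairwise_coprime([256, 257, 511, 513, 1023, 1025]))   # False: gcd(513, 1023) = 3
    print(pairwise_coprime([256, 257, 263, 509, 511, 513]))     # True (arbitrary primes)
    print(pairwise_coprime([256, 257, 511, 513, 1025, 2047]))   # True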

5.4.2.2 Cost of the area and time latency of encoder and decoder

Increasing the error correction capability requires more variables (signals in the hardware implementation) and more iterations during correction. Thus, we can expect the area and time latency to grow as the error correction capability increases. For example, to reconstruct the data when correcting one erroneous residue, the calculation is X = v1 + v2 × m1; with three input signals, this requires one adder and one multiplier. To correct two erroneous residues, the calculation becomes X = v1 + v2 × m1 + v3 × m1 × m2, which needs six signals, two adders and three multipliers. Besides the bigger area, the second case requires more time to produce the result and consequently has a higher latency.

5.4.2.3 RRNS code for higher residue error correction capability

Using the RRNS code for a higher error correction capability may not be a good option, for several reasons:

1. Huge memory cell array overhead. As explained in the previous subsection, the RRNS codeword bit length becomes long if we use moduli of the form 2^f, 2^f − 1 or 2^f + 1 (with f a positive integer) to keep the moduli set pairwise relatively prime.

2. Complex encoder and decoder circuits. Using an arbitrary moduli set to keep the moduli pairwise relatively prime avoids a huge memory cell array area; however, as explained before, the encoder and decoder circuits then become more complex.

3. Specifically for a three-error correction capability, Triple Modular Redundancy (TMR) is simpler and has lower latency than an RRNS encoder-decoder. TMR only uses simple XORs for comparison and a multiplexer to vote on the read data during decoding, while the RRNS decoder uses multipliers and iteration steps; a higher error correction capability results in more iteration steps.


6 CONCLUSIONS AND RECOMMENDATIONS

This chapter presents the main conclusions of the performed work and provides some recommendations and directions for future work.

6.1 Conclusions

This thesis is mainly concerned with the benefits and costs of applying error correction codes (ECCs) in hybrid memories to improve their overall quality and reliability. The main conclusions that can be drawn from this work are summarized as follows:

• Hybrid memories are one of the candidates to replace CMOS memories. The advantages of abundant data storage, low power, and potentially low fabrication cost are the motivation for this.

• Full understanding of the physics of hybrid memories is far from complete. In addition, the fabrication and prototyping of such devices are still in their infancy.

• It is expected that hybrid memories such as CMOL (i.e., CMOS/molecular hybrid memories) will have high defect rates and will be more susceptible to soft errors. In addition, these defects and soft errors will impact several contiguous bits, causing cluster faults.

• In order to tolerate high defect rates and soft errors, fault tolerance schemes are required. An error correction code (ECC) is one of the solutions that can be used to tolerate faults.

• Reed Solomon (RS) and Redundant Residue Number System (RRNS) codes are suitable ECCs to address cluster faults. RS has been used extensively in traditional memories (e.g., flash memories); however, to the best knowledge of the author, this is the first work to use RRNS as a fault tolerance scheme for hybrid memories.

• Hamming is suitable for hybrid memories that operate fast, but it can only tolerate a single bit error.

• Reed Solomon is suitable for fast hybrid memories that must tolerate cluster errors.

• Redundant Residue Number System is suitable for hybrid memories that operate slowly and must tolerate cluster errors.

• Each ECC incurs a different level of area overhead for its encoder and decoder. Hamming incurs the smallest overhead, followed by RS and RRNS: for 16-bit data, the area of the Hamming encoder and decoder synthesized on a Virtex 4 FPGA equals 85 LUTs, that of RS 108 LUTs, and that of RRNS 590 LUTs. Overall, RS is the best ECC to address cluster faults in hybrid memories.

6.2 Recommendations

There are a number of issues that can be further explored.

1. This work designed the RS and RRNS encoders and decoders to correct one symbol only. An estimate of the cost of a two-error correction capability for RRNS was given in Section 5.4.2. Further work can extend the designed encoder and decoder to more than one symbol/residue error correction capability and analyze the impact on the area overhead of these circuits. Note that, for more than one correctable symbol, both RS and RRNS require iterative decoding; thus, more complex decoder circuits can be expected (see the first sketch after this list).

2. An ECC corrects errors during the reading process (i.e., during decoding). Although the read values will be correct, the corrupted codeword stored in the memory cell array remains unchanged. To prevent the accumulation of errors in the memory cell array, one can overwrite the corrupted codeword with the corrected one. This technique is called "scrubbing": internal periodic read and write operations on the stored codewords are performed during the idle time of the memory (see the second sketch after this list).
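As a starting point for the first recommendation, the sketch below is an algorithmic reference model of iterative RRNS decoding, written in Python rather than as a circuit. It corrects up to t residue errors by brute force: it discards every subset of at most t residue positions and accepts the first CRT reconstruction that falls inside the legitimate range. The moduli set, the helper names, and the brute-force discard strategy are illustrative assumptions, not the decoder designed in this thesis; the sketch only makes explicit how the number of iterations grows with the correction capability.

```python
from math import prod
from itertools import combinations

def crt(residues, moduli):
    """Chinese Remainder Theorem reconstruction for pairwise
    relatively prime moduli (pow(x, -1, m) needs Python >= 3.8)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # modular inverse of Mi mod m
    return x % M

def rrns_decode(residues, moduli, n_info, t=1):
    """Correct up to t residue errors by discarding every subset of
    at most t positions and keeping the first reconstruction that
    lies in the legitimate range [0, product of information moduli)."""
    legit = prod(moduli[:n_info])
    positions = range(len(moduli))
    for k in range(t + 1):                       # k residues assumed faulty
        for drop in combinations(positions, k):  # iteration count grows with t
            keep = [i for i in positions if i not in drop]
            x = crt([residues[i] for i in keep], [moduli[i] for i in keep])
            if x < legit:
                return x
    raise ValueError("uncorrectable residue error pattern")

# Example: 3 information moduli, 2 redundant moduli, one residue error.
moduli, n_info = [15, 16, 17, 19, 23], 3
data = 1234
codeword = [data % m for m in moduli]
codeword[2] = (codeword[2] + 5) % moduli[2]      # inject a residue error
assert rrns_decode(codeword, moduli, n_info, t=1) == data
```

With t = 1 and five moduli, at most 1 + 5 = 6 reconstructions are tried; for t = 2 this already grows to 1 + 5 + 10 = 16, which is the iteration growth referred to above.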
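For the scrubbing idea in the second recommendation, a minimal sketch is given below, assuming a word-addressable memory model and generic encode/decode callbacks. All names here are hypothetical and only illustrate the read, correct, and write-back cycle; in a real memory controller this would be a small state machine triggered during idle periods.

```python
from typing import Callable, List

def scrub_pass(memory: List[List[int]],
               decode: Callable[[List[int]], int],
               encode: Callable[[int], List[int]],
               memory_is_idle: Callable[[], bool]) -> None:
    """One scrubbing pass: read every stored codeword, let the ECC
    decoder correct it, and write the freshly encoded codeword back,
    so that correctable errors do not accumulate in the cell array."""
    for address, codeword in enumerate(memory):
        if not memory_is_idle():        # give way to normal accesses
            return                      # resume on the next idle period
        data = decode(codeword)         # ECC read: returns corrected data
        memory[address] = encode(data)  # overwrite the corrupted codeword
```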

Curriculum Vitae

Zaiyan Ahyadi was born in Banjarmasin, South Kalimantan, Indonesia, on June 1st. In November 2001, he received the Bachelor of Engineering degree from Gadjah Mada University, Yogyakarta, Indonesia, after completing the Electrical Engineering program. He started working as a lecturer at the State Polytechnic of Banjarmasin in December 2001 and intends to remain in that position. He continued his studies at the Computer Engineering Department of Delft University of Technology in August 2007 with a scholarship from the MCIT (Ministry of Communication and Information) of Indonesia. He completed his MSc thesis, Experimental Analysis on ECC Schemes for Fault-Tolerant Hybrid Memories, under the supervision of Dr. Ir. Said Hamdioui and advisor Nor Zaidi Haron, MSc.