
1680 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 48, NO. 5, OCTOBER 2001

Estimating Error Rates in Processor-Based Architectures

Sana Rezgui, Raoul Velazco, Robert Ecoffet, Santiago Rodriguez, and José Ramon Mingo

Abstract—This paper investigates a new technique to predict error rates in digital architectures based on microprocessors. Three case studies are presented, concerning three different processors. Two of these processors are included in the instruments of a satellite project. The actual space applications of these two instruments were implemented using the capabilities of a dedicated system. Results of the fault-injection and radiation testing experiments are presented, together with a discussion of the potential of this technique.

Index Terms—Fault injection, ground testing.

I. INTRODUCTION

THE increasing demand for high dependability and reliability of safety-critical systems (spacecraft, satellites, etc.) requires in-depth study of methods suitable for the qualification of microprocessor-based architectures, in order to improve the correlation between results obtained from ground tests and those obtained in orbit.

Methods usually adopted to perform single event upset (SEU) ground testing of processors differ in the way the circuit is exercised while exposed to the particle beam. The so-called register testing corresponds to a static strategy, in which the whole observable SEU-sensitive area (internal registers and memory) is permanently observed. An alternative, sometimes called dynamic testing, is to make the processor execute simple programs and to observe only the program results, in order to activate the sensitive areas in a way closer to that of the final application.
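As a rough illustration of the static strategy, the sketch below compares a read-back snapshot of the observable area with the reference pattern written before irradiation, counting every differing bit as one upset. It is a minimal sketch under stated assumptions: the byte-oriented snapshot, the 0x55 pattern, and the function name count_upsets are hypothetical and do not describe the actual test software.

```python
# Hypothetical sketch of one static ("register testing") check:
# the observable area is preloaded with a known pattern and periodically
# read back; every bit that differs from the pattern counts as one upset.


def count_upsets(expected: bytes, observed: bytes) -> int:
    """Return the number of bit positions where observed differs from expected."""
    if len(expected) != len(observed):
        raise ValueError("snapshots must have the same length")
    # XOR leaves a 1 exactly where a bit was flipped; count those 1s.
    return sum(bin(e ^ o).count("1") for e, o in zip(expected, observed))


if __name__ == "__main__":
    pattern = bytes([0x55] * 8)      # assumed reference pattern
    snapshot = bytearray(pattern)
    snapshot[2] ^= 0x01              # simulate one upset in byte 2
    snapshot[7] ^= 0x80              # and another one in byte 7
    print(count_upsets(pattern, bytes(snapshot)))  # 2
```

Counting flipped bits rather than flipped bytes keeps the tally compatible with a per-bit cross-section.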

The results of these two tests can be reported as:

1) the register-bit cross-section, which is the underlying SEU cross-section;

2) the application error rate.

Traditionally, the underlying SEU cross-section in a processor is assumed to be a direct measure of the rate of observable errors induced by SEUs in a system. This supposes that every SEU arising in a processor's memory cell induces errors in the program results. This interpretation is in fact the worst-case situation. Indeed, SEUs will induce observable errors only if they occur in a target during its sensitivity period, called the duty cycle in [1].

Manuscript received January 26, 2001; revised June 12, 2001. This work was supported by the French Space Agency (CNES) under Grant 721/CNES/2/99/043.

S. Rezgui is with the TIMA Laboratory, F-38031 Grenoble, France.

R. Velazco is with the French Research Agency, 31055 Toulouse, France, and the TIMA Laboratory, 38031 Grenoble, France.

R. Ecoffet is with the French Research Agency, 31055 Toulouse, France.

S. Rodríguez and J. R. Mingo are with the Spanish Research Agency, 28850 Torrejón de Ardoz, Spain.

Publisher Item Identifier S 0018-9499(01)09023-2.

Ground test data (obtained at particle accelerator facilities) presented in [2], as well as fault-injection results obtained by means of a simulator [3], show that the observable SEU-induced error rate in a processor-based architecture is in fact some fraction of the underlying SEU cross-section and depends on the software executed. In [4] and [5], a benchmark of simple programs (matrix multiplication, fast Fourier transform (FFT), sorting, etc.) was used to characterize the SEU vulnerability of the Harris H80C85 and the Motorola 68020 microprocessors. Although the programs were fairly simple, significant differences were revealed between the observable SEU-induced error rates of the tested programs.

This paper presents experimental data for the Intel 80C51 microcontroller, the Analog Devices ADSP21060 digital signal processor (also called SHARC), and the Motorola TS6 833 216A microprocessor. The latter two processors are included in the instruments of the CESAR satellite project from INTA (Spanish Research Space Agency). These data were obtained by running simple benchmark programs, as well as program modules of the final applications, on digital architectures built around these processors for ground-testing purposes. Comparing the resulting SEU error rates with those derived from the commonly used strategy makes it possible to draw objective conclusions from the deviations between the measured error rates.

To predict the error rate of the different applications, we used a new method that characterizes and quantifies the effects of upsets on the operation of microprocessor-based digital architectures. This technique, recently developed at TIMA Laboratory, injects upset faults concurrently with the program execution on the tested processor. It makes it possible to quantify the rate of "effective" upsets for the tested programs, and thus to derive realistic figures for the expected error rate in flight. Such experiments could lead to a well-founded methodology for estimating the error rate of the final application, based on both limited radiation testing (to evaluate the SEU cross-section per register type) and fault-injection experiments (to statistically evaluate the fraction of upsets that have consequences for the program execution).

In recent previous work [7], we described the implementation of this technique and its validation on two different digital architectures, based on the Intel 80C51 microcontroller and the Texas Instruments TMS320C50 digital signal processor, respectively. Preliminary results showed that, for the studied programs, the code emulating an upset (CEU) injection method leads to error-rate predictions in very good agreement with the error rates measured during testing under radiation.

0018–9499/01$10.00 © 2001 IEEE

REZGUI et al.: ERROR RATES IN PROCESSOR-BASED ARCHITECTURES 1681

This paper aims to demonstrate that the proposed approach to error-rate prediction can be applied efficiently, with minor hardware/software intrusion, to most microprocessor-based applications. The fault-injection experiments are performed while the studied processors (the 80C51, the TS6 833 216A, and the SHARC) execute various programs, including modules of flight software. As the SHARC device was not available for radiation testing, only the 80C51 and the TS6 833 216A were exposed to heavy-ion beams. The ground-testing experiments allowed us to measure the underlying cross-sections of the 80C51 and the TS6 833 216A and to assess objectively how efficiently the fault-injection method predicts the error rate of a studied program.

The first section presents the predicted and measured error-rate curves of the 80C51. The second section presents predicted error rates for simple programs (discrete Fourier transform (DFT), matrix multiplication) and for a CESAR satellite application running on the SHARC, as well as for two benchmark programs (FFT, matrix multiplication) and for the application developed to control the second instrument of the CESAR satellite, implemented on the TS6 833 216A. Finally, predicted and measured cross-sections for different applications running on the TS6 833 216A are compared in order to assess the efficiency of this technique.

II. FAULT INJECTION METHODOLOGY [6]

A. Principle

The approach relies on the injection of bit flips, randomly in time and location, concurrently with the execution of a program. This can be achieved with minimal "intrusiveness" by software/hardware means, using the interrupt mechanism. Implementing this method supposes that the tested application is a processor-based digital board, organized around a device capable of executing instruction sequences and of taking into account asynchronous signals (interrupts). The key idea is to generate, and store at an appropriate memory address, a piece of code (called CEU in the following) whose execution inverts the content of the selected bit, called the CEU target. If the processor is properly configured, the CEU-code execution can be triggered by the assertion of an interrupt-like signal, as shown in Fig. 1. The interrupt activation instant and the CEU target can be chosen pseudorandomly by an ad hoc external mechanism. In this way, bit flips may be injected into all of the processor's accessible CEU targets (internal registers and SRAM memory area), as well as into the external SRAM where program data and code are stored. It is important to note that the CEU code may include instruction sequences to read, modify, and overwrite values stored in the stack. This makes it possible to inject CEUs into critical control registers (program counter, stack pointer, status registers, etc.) that are often not directly accessible through the instruction set.
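The mechanism can be mimicked in a few lines of software to show what a campaign measures. The sketch below is a toy model, not the THESIC implementation: a hypothetical processor sums a small array, an "interrupt" fires at a pseudorandom instant, and its handler (playing the role of the CEU) inverts one pseudorandom bit of a register or memory word before execution resumes. The names ToyCPU, ceu, run, and campaign, the 16-bit word width, and the summing program are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field

WORD_BITS = 16  # assumed word width of the toy processor


@dataclass
class ToyCPU:
    # Registers and memory stand in for the CEU targets (illustrative only).
    regs: dict = field(default_factory=lambda: {"ACC": 0, "IDX": 0})
    mem: list = field(default_factory=lambda: [3, 1, 4, 1, 5, 9, 2, 6])


def ceu(cpu: ToyCPU, target: tuple, bit: int) -> None:
    """Interrupt handler playing the role of the CEU: invert one bit of the target."""
    kind, key = target
    if kind == "reg":
        cpu.regs[key] ^= 1 << bit
    else:
        cpu.mem[key] ^= 1 << bit


def run(cpu: ToyCPU, inject_at=None, target=None, bit=None) -> int:
    """Sum mem[] into ACC, one 'instruction' per step.

    The interrupt is only serviced between instructions, so, as with real
    CEUs, upsets occurring during an instruction cannot be emulated.
    """
    step = 0
    while cpu.regs["IDX"] < len(cpu.mem):
        if step == inject_at:
            ceu(cpu, target, bit)
            if cpu.regs["IDX"] >= len(cpu.mem):
                break  # a corrupted loop index ends the loop early
        cpu.regs["ACC"] += cpu.mem[cpu.regs["IDX"]]
        cpu.regs["IDX"] += 1
        step += 1
    return cpu.regs["ACC"]


def campaign(n: int, seed: int = 0) -> float:
    """Inject n CEUs at pseudorandom instants and targets; return the error fraction."""
    rng = random.Random(seed)
    golden = run(ToyCPU())  # fault-free reference result
    errors = 0
    for _ in range(n):
        cpu = ToyCPU()
        targets = [("reg", r) for r in cpu.regs] + [("mem", a) for a in range(len(cpu.mem))]
        result = run(
            cpu,
            inject_at=rng.randrange(len(cpu.mem)),
            target=rng.choice(targets),
            bit=rng.randrange(WORD_BITS),
        )
        errors += result != golden
    return errors / n


if __name__ == "__main__":
    print(f"fraction of CEUs causing a result error: {campaign(10_000):.3f}")
```

In this toy, flipping an array element that has already been summed (for example mem[0] after the fifth step) leaves the result unchanged, which is why the measured fraction stays well below one.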

The main advantages of this fault-injection strategy are its reduced intrusiveness in the system, its low cost, its potential for automation, and its flexibility. Nevertheless, two limitations of the CEU injection approach must be mentioned: 1) as interrupts are always taken into account at predetermined fixed instants, the effects of SEUs occurring during instruction execution cannot be simulated, and 2) not all possible sensitive targets can be reached. In spite of these limitations, we assume that the performance of modern processors and their huge internal memory space make the accessible area a significant percentage of the total sensitive area, which gives significance to the results of the proposed fault-injection approach.

Fig. 1. Soft error injection by means of interrupts.

Implementing the proposed fault-injection approach requires extra hardware to load the memory with data corresponding to the desired CEU code, to trigger the interrupt signal, and to compare the program execution time and outputs to expected values. The architecture of a dedicated test system called THESIC (testbed for harsh environment studies on integrated circuits), developed at TIMA for SEU ground-testing purposes [8], offered a suitable platform for CEU injection. THESIC is organized in two boards: a motherboard, for both the control of test operation under radiation and user-interface purposes, and a daughterboard, for the adaptation of the device under test (DUT) to the motherboard bus protocol (Fig. 2). The two boards communicate asynchronously through a common memory area called the memory mapped interface (MMI). Typically, during a test, the DUT signals with an interrupt when the MMI area holds data to be transferred to the motherboard. When this happens, the motherboard interrupts the DUT board to read the results and thus detect possible errors. To cope with critical errors, a programmable software watchdog was implemented in the motherboard.

The THESIC motherboard was recently enhanced with pseudorandom interrupt generation capabilities and a new operation mode providing different options for CEU injection. With these options, the two parameters of simulated upsets (the bit location to be corrupted and the instant of fault occurrence) can be chosen either pseudorandomly or deterministically. This flexibility is very useful for investigating the effects of upsets in complex applications. For instance, repeated experiments with a pseudorandom choice of both the CEU target and the occurrence instant provide objective figures for the fraction of upsets that have no effect on a given program. Moreover, they may also reveal the configurations (occurrence instant and location) of critical upsets.

Fig. 2. THESIC experimental setup.
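The two selection modes can be pictured as two ways of generating the list of (instant, bit location) pairs that drive a campaign. The sketch below is a hypothetical illustration, not the THESIC motherboard firmware: the function names, the use of clock cycles for the instant, and a single global bit address for the location are assumptions. A fixed seed makes a pseudorandom campaign repeatable, while the deterministic sweep revisits chosen instants and locations exhaustively, which is the kind of experiment that can pin down critical upset configurations.

```python
import itertools
import random
from typing import Iterator, Tuple

Injection = Tuple[int, int]  # (instant in clock cycles, global bit address); assumed encoding


def pseudorandom_schedule(n: int, n_instants: int, n_bits: int, seed: int = 0) -> Iterator[Injection]:
    """Draw n (instant, bit) pairs uniformly; the seed makes a campaign repeatable."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.randrange(n_instants), rng.randrange(n_bits)


def deterministic_schedule(instants, bits) -> Iterator[Injection]:
    """Sweep every chosen instant against every chosen bit location."""
    yield from itertools.product(instants, bits)


if __name__ == "__main__":
    print(list(deterministic_schedule([100, 200], [0, 31])))
    # [(100, 0), (100, 31), (200, 0), (200, 31)]
```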

B. Error-Rate Prediction

The main contribution of this work is the prediction of the cross-section of a processor executing a given program exclusively from the underlying cross-section (1) and from fault-injection experimental results, and thus without running the evaluated program under beam exposure. Obviously, radiation experiments are needed to determine the underlying cross-section, but the hardware and software developments needed to perform such static experiments are significantly simpler than those needed to run each application under beam:

$$\sigma_{SEU} = \frac{\text{Number of errors}}{\text{Number of particles}} \qquad (1)$$

The error rate $\tau_{inj}$ predicted for a studied program from fault-injection experiments is calculated according to (2):

$$\tau_{inj} = \frac{\text{Number of errors}}{\text{Number of CEUs injected}} \qquad (2)$$

The cross-section of a processor running a particular program can be estimated as the product of the underlying SEU cross-section $\sigma_{SEU}$, corresponding to the sensitive areas of the processor, and the percentage $\tau_{inj}$ of global errors caused by injected CEUs, as shown in (3):

$$\sigma_{program} = \sigma_{SEU} \times \tau_{inj} \qquad (3)$$
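Equations (1) to (3) reduce to two ratios and one product, as the sketch below shows. It assumes that the particle count in (1) is expressed as a fluence (particles/cm²), so the cross-sections come out in cm². The function names and the numbers in the usage block are hypothetical and are not measured data from this paper.

```python
def underlying_cross_section(upsets: int, fluence_per_cm2: float) -> float:
    """Eq. (1): upsets counted in static testing divided by the particle fluence.

    Assuming the fluence is in particles/cm^2, the result is in cm^2.
    """
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return upsets / fluence_per_cm2


def injection_error_rate(errors: int, ceus_injected: int) -> float:
    """Eq. (2): fraction of injected CEUs that produced an observable error."""
    if ceus_injected <= 0:
        raise ValueError("at least one CEU must be injected")
    return errors / ceus_injected


def predicted_cross_section(sigma_seu: float, tau_inj: float) -> float:
    """Eq. (3): program cross-section predicted without running it under beam."""
    return sigma_seu * tau_inj


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    sigma_seu = underlying_cross_section(upsets=500, fluence_per_cm2=1e6)
    tau_inj = injection_error_rate(errors=800, ceus_injected=40_000)
    print(f"predicted program cross-section: {predicted_cross_section(sigma_seu, tau_inj):.1e} cm^2")
```

The design keeps (1) and (2) separate because they come from different experiments: only (1) needs beam time, while (2) can be repeated off-line for every new program.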

Radiation testing campaigns, in which the 80C51 and TS6 833 216A processors were exposed to beams of several heavy-ion species, were performed at two facilities.¹ The results obtained for the 80C51 microcontroller are presented in the following.

III. EXPERIMENTAL RESULTS FOR THE 80C51 DAUGHTERBOARD

The predicted error rate for the 80C51 was estimated through sessions of CEU injection performed while running a matrix multiplication program. A static test (continual verification of the internal memory and register contents) of the 80C51 microcontroller was performed with heavy ions having a linear energy transfer (LET) between 2.97 and 40.7 MeV/mg/cm², in order to determine the underlying cross-section. From (3), we estimated the cross-sections for the matrix multiplication program. Predicted and measured cross-sections are given in Table I and plotted in Fig. 3.

¹The Tandem Van de Graaff particle accelerator (Institute of Nuclear Physics, Orsay, France) and the cyclotron Cyclone (Louvain-la-Neuve, Belgium).

Fig. 3. Predicted and measured cross-sections for the 80C51. Exposed program: a matrix multiplication (6 × 6).

TABLE I
PREDICTED AND MEASURED CROSS-SECTIONS. EXPOSED PROGRAM: A MATRIX MULTIPLICATION (6 × 6)

The curves in Fig. 3 show an excellent correlation between predicted and measured cross-sections. In fact, the two curves are practically superimposed, except for neon ions, where the difference is still negligible (see Table I). The excellent agreement between predicted and observed cross-sections can be explained by the large portion (estimated at 93%) of the sensitive area that is accessible to the CEU injection mechanism.

Nevertheless, to demonstrate the generality of the approach, it has to be applied to different processors with more complex features. For this purpose, we chose the SHARC and the TS6 833 216A, because modules of a flight program were available for them.

IV. CESAR PROJECT

The CESAR project is an Earth observation satellite mission developed in cooperation between INTA (Spain) and the National Commission for Space Activities of Argentina. The mission, with a proposed launch date for the CESAR satellite of 2002/2003, consists of the design, construction, launch, and operation of a small satellite of around 400 kg, and of the upgrade of the existing ground segment capabilities in Spain and Argentina to receive and process the data generated by CESAR. The primary objectives will be cartography, topography, thematic studies, and geophysics, with a satellite payload composed of different instruments (Fig. 4):

1) Panchromatic camera (IRIS): in the visible range of the spectrum; it will be used for cartography and topography.

2) Multispectral camera: with six bands in the visible and near-infrared range of the spectrum; it will be used for natural resources applications.

3) Spectrometer (MEGA): to measure the concentration of the atmospheric gases involved in the ozone destruction process.

4) High-sensitivity panchromatic camera: in the visible range of the spectrum, but with very high sensitivity; it will be used to take images of clouds and the polar vortex at night.

Fig. 4. CESAR satellite current flight configuration.

INTA is responsible for the design and development of two instruments of CESAR's payload: the panchromatic camera IRIS and the spectrometer MEGA. From the point of view of electronic design, both benefit from state-of-the-art architectures and devices intended for commercial ground applications. This allows the best performance, meaning top image quality for the camera and the highest resolution in detecting atmospheric components for the spectrometer.

The CESAR program has specified that, regarding the level of qualification, the electronic components must comply with the MIL-STD-883 B standard. Nonetheless, complex devices for which there is no information on sensitivity to single-event effects (SEEs) should be tested in a radiation environment.

Military versions of two processors and two A/D converters have been selected:

1) ADSP-21060 (Analog Devices): 40 MIPS; it will be used as the CPU for control and acquisition in the IRIS camera.

2) AD 9040 (Analog Devices): A/D converter with a sampling rate of 40 MHz and a resolution of 10 bits, used in the IRIS camera.

3) TS6 833 216A (Thomson-CSF): a 32-bit, 16-MHz microprocessor that will serve as the control CPU in the MEGA spectrometer.

4) AD 677 (Analog Devices): A/D converter with a resolution of 16 bits and 100 kSPS, with a serial interface. It is used for reading a linear image sensor (1024 pixels) in the MEGA spectrometer.

Fig. 5. Architecture of the MEGA instrument.

Two software applications have been developed, for the ADSP21060 and the TS6 833 216A respectively, to control the operation of the two instruments IRIS and MEGA. Details of these programs are presented in the next section.

V. TESTED ARCHITECTURES

A. MEGA Application

To capture the solar spectra that contain information on gases present in the atmosphere, the TS6 833 216A microprocessor performs thermal control of the detector and of the instrument itself, as well as maintenance, calibration, and self-test functions, plus data downloading to the satellite storage unit. The MEGA application running on this microprocessor controls the exposure time of a photodiode array (PDA) image detector of 1024 pixels and reads it by means of the AD677 converter, connected to a QSPI synchronous channel of the TS6 833 216A, as shown in Fig. 5. It also communicates over a MIL-1553 data bus with the satellite's onboard data handling (OBDH) unit, which is in charge of telemetry, and it executes telecommands through an interpreter of high-level commands.

B. IRIS Application

The software for the IRIS application has been developed and implemented on the ADSP21060. The philosophy of this application is to activate the functions of the instrument as commanded by the OBDH. Communication between the OBDH and the IRIS instrument is achieved by means of a MIL-1553 data bus. First, the program carries out a test of the different units of the camera. Once the test is completed, the camera rests in standby mode. The configuration allows the modification of parameters such as exposure time and acquisition gain for each of the four channels. After clock synchronization, the application returns to standby mode, where it pauses until the OBDH starts the image acquisition. Fig. 6 depicts the architecture of the IRIS instrument.

Fig. 6. Architecture of the IRIS for digital signal processor ADSP21060.

Fig. 7. Architecture of the connection of the AD 9040 with the processor ADSP-21060.

All charge-coupled device (CCD) control lines are generated using the link-port channels of the digital signal processor (DSP). One link-port channel provides the CCD clock signals, and a second one is used to control the sample-and-hold circuit (S&H), the A/D converter, and the programmable logic device (PLD). Data transmission to the PAD (storage subsystem) is carried out through a 10-bit parallel synchronous channel at a rate of 20 MHz. Some information is attached to each image line: the current line number, the channels used, and the time of acquisition.

The CCD detector is made up of 12 172 pixels, divided into four channels of 3043 pixels each. The CCD image acquisition is achieved by reading the four channels simultaneously through the DSP link ports. Each channel has an S&H, a 10-bit A/D converter, and a PLD, which computes the average of the four serial readings. Fig. 7 depicts the details of the connection between the A/D converter and the DSP ADSP21060.
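The readout figures can be checked and modeled with a small sketch. It confirms that four channels of 3043 pixels make up the 12 172-pixel line, and it shows one plausible reading of the PLD averaging step, in which four 10-bit readings of the same pixel are averaged. Several details are assumptions not given in the text: the contiguous assignment of pixels to channels, the truncating integer average, and the names split_line and pld_average.

```python
PIXELS_PER_LINE = 12_172
CHANNELS = 4
PIXELS_PER_CHANNEL = PIXELS_PER_LINE // CHANNELS  # 3043 pixels per channel
ADC_BITS = 10                                     # resolution of the AD 9040
SAMPLES_PER_PIXEL = 4                             # assumed: the PLD averages four readings per pixel


def split_line(line):
    """Assign pixels to the four channels as contiguous blocks (assumed layout)."""
    if len(line) != PIXELS_PER_LINE:
        raise ValueError(f"expected {PIXELS_PER_LINE} pixels, got {len(line)}")
    return [line[c * PIXELS_PER_CHANNEL:(c + 1) * PIXELS_PER_CHANNEL] for c in range(CHANNELS)]


def pld_average(readings):
    """Average four 10-bit readings of one pixel; truncating division is assumed."""
    if len(readings) != SAMPLES_PER_PIXEL:
        raise ValueError("the PLD averages exactly four readings")
    if any(not 0 <= r < 2 ** ADC_BITS for r in readings):
        raise ValueError("readings must be 10-bit codes")
    return sum(readings) // SAMPLES_PER_PIXEL
```

Because the average of 10-bit codes never exceeds 1023, the averaged value still fits the 10-bit parallel channel to the PAD.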

Fig. 8. Block diagram of the TS6 833 216A daughterboard.

For this purpose, two new THESIC daughterboards were designed and developed, for the ADSP21060 processor and the TS6 833 216A microprocessor, for ground testing and fault injection.

VI. EXPERIMENTAL SETUP

A. THESIC Daughterboard

A daughterboard for the THESIC testbed is composed mainly of the DUT, glue logic, the MMI for communication between the two boards, external SRAM memory, an EEPROM for program storage, and the clock system. The glue logic adapts the signals of the DUT to the motherboard bus. The TS6 833 216A board also includes an A/D converter (AD677), needed to adapt the programs developed for the CESAR instrument. During ground tests, appropriate signal stimuli exercise the A/D converter, partially simulating the measurements performed in flight. As an example, Fig. 8 depicts the block diagram of the TS6 833 216A daughterboard and the AD 677 A/D converter.

The SHARC daughterboard has an architecture similar to that of the TS6 833 216A board. Fig. 9 shows the motherboard connected to the ADSP21060 daughterboard.

B. Implementation of IRIS Software on the SHARC Daughterboard

Two link ports are used in this application program: link port 0 and link port 3, connected to the CCD and the A/D converter, respectively. They generate the control and clock signals for the camera and the A/D converter. The analog signal generated by the CCD is sent to the A/D converter. The camera operates in two modes: standby and transfer. The clock signal of the A/D converter is only generated during the transfer period, and the CCD standby mode must be held within a period of 706 µs by means of a timer interrupt. During standby mode, the operation of the CCD is controlled by link port 0. For testing purposes, two other link ports were used to detect any error that might occur during operation under radiation: link ports 1 and 2, connected to link ports 0 and 3, respectively. To avoid missing any data, the receivers operate at twice the speed of the transmitters. Note that the A/D converter was not implemented on the DSP daughterboard.

Fig. 9. The ADSP21060 daughterboard plugged on the THESIC motherboard.

Fault-injection experiments were performed on both the SHARC and the TS6 833 216A daughterboards for different programs, including simple benchmark programs and the CESAR flight software. These experiments aimed at quantifying the rate of "effective" upsets for the tested programs, and thus at deriving realistic figures for the expected error rate in the final environment. The results of these fault-injection experiments are presented in the following.

VII. EXPERIMENTAL RESULTS

Three different programs (an 11 × 11 matrix multiplication, a DFT, and the program modules of the IRIS application) were implemented on the ADSP21060 for fault-injection testing. For all of these programs, both code and data reside in the SHARC internal memory. For the TS6 833 216A microprocessor, fault-injection and radiation experiments were performed for two simple programs (an FFT and a matrix multiplication) and for the MEGA flight software, intended to control the operation of the spectrometer.

A. ADSP21060 Daughterboard

1) Targets of CEU Injection: The areas considered sensitive in the SHARC are the internal memory and the registers. The former is partitioned into 1.5 Mbits for code (48-bit words) and 2 Mbits for data (32-bit words). The latter comprises 172 I/O registers and 101 control or status registers, each coded in 32 bits. Fig. 10 shows the mapping of the I/O registers and internal memories in the memory map of the SHARC.

Within the whole sensitive area (approximately 3.5 Mbits), only 151 bits in the I/O registers are not accessible to the user. Compared with the huge zones where bit flips can be injected, these bits represent a fraction of around 4 × 10⁻⁵, which is negligible. In addition, it must be noted that reserved zones of the SHARC (whose content is unknown to the user) and control parts of this processor may be perturbed by radiation. These zones, where bit flips cannot be injected, could make the resulting error rate differ from the one obtained by radiation ground tests. Considering the large internal memory area where errors can be simulated, we expect this difference to be negligible.

Fig. 10. Targets of CEU injection for SHARC.

TABLE II
ADSP21060 SENSITIVE AREA OCCUPATION FOR EACH TESTED PROGRAM
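The negligible share of inaccessible bits follows directly from the figures above. The sketch recomputes it under two assumptions: 1 Mbit is taken as 2^20 bits, and the register bits are added to the internal memory, although it is unclear whether the approximately 3.5 Mbits quoted includes them. Neither choice changes the order of magnitude; taking 1 Mbit as 10^6 bits gives about 4.3 × 10⁻⁵ instead. The function name is hypothetical.

```python
MBIT = 2 ** 20  # assumption: 1 Mbit = 2^20 bits


def inaccessible_fraction(inaccessible_bits: int = 151) -> float:
    """Fraction of the SHARC sensitive area that CEUs cannot reach."""
    code_bits = int(1.5 * MBIT)        # internal memory reserved for code
    data_bits = 2 * MBIT               # internal memory reserved for data
    register_bits = (172 + 101) * 32   # I/O plus control/status registers
    total = code_bits + data_bits + register_bits
    return inaccessible_bits / total


if __name__ == "__main__":
    print(f"{inaccessible_fraction():.1e}")  # 4.1e-05
```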

The effects of CEU injection differ according to the occupation of the sensitive areas (internal memory, registers, etc.) while the considered program executes on the target processor. Table II summarizes the percentage of the sensitive areas occupied by each program running on the SHARC, for the code, the data, and the registers.

2) Results of CEU Injection: For each program, we injected 40 000 pseudorandom bit flips. The results are classified into three groups: tolerated errors, result errors, and loss of sequence. The first group, tolerated errors, corresponds to bit flips injected into memory elements that do not cause any effect at the outputs of the program. The second covers cases where the obtained results and the expected ones differ in at least one bit. Finally, cases where no answer is received from the processor after fault injection are classified in the loss-of-sequence group. The consequences of CEUs belonging to this last malfunction type are unrecoverable and require a hardware reset to restart program execution.
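The three-way classification can be written as a small decision function plus a tally over all runs. This is a minimal sketch, not the THESIC software: it assumes the program output is a sequence of integer words, and it represents a watchdog expiry (no answer from the DUT) as None. Outcome, classify, and tally are hypothetical names.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List, Optional, Sequence


class Outcome(Enum):
    TOLERATED = "tolerated error"
    RESULT_ERROR = "result error"
    LOSS_OF_SEQUENCE = "loss of sequence"


def classify(expected: Sequence[int], observed: Optional[Sequence[int]]) -> Outcome:
    """Classify one injection run.

    observed is None when the watchdog expired without any answer from the DUT.
    """
    if observed is None:
        return Outcome.LOSS_OF_SEQUENCE
    if list(observed) != list(expected):  # any differing word means at least one flipped bit
        return Outcome.RESULT_ERROR
    return Outcome.TOLERATED


def tally(outcomes: List[Outcome]) -> Dict[Outcome, float]:
    """Percentage of runs falling into each group."""
    if not outcomes:
        raise ValueError("no injection runs to tally")
    counts = Counter(outcomes)
    return {o: 100.0 * counts[o] / len(outcomes) for o in Outcome}
```

Applied to the 40 000 runs of one program, tally returns the per-group percentages; the result-error and loss-of-sequence shares together give the fraction of injected CEUs that had consequences for the program.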

The error rate predicted for each program is calculated according to (3). The results are summarized in Table III.

From these experiments, it can be forecast that only a few of the upsets arising in flight during IRIS software operation will provoke instrument misbehavior. Indeed, only around 2% of the injected CEUs caused errors in the results of the IRIS application. This low sensitivity is certainly a direct consequence of two factors: the small occupation of the register and memory areas, around 3% for the IRIS software (recall that faults are injected randomly over the whole sensitive area), and the fact that injecting an error into a memory element whose content has just been consumed (that is, will not be read again before being overwritten) obviously does not cause errors.

TABLE III. RESULTS OF CEU FAULT-INJECTION EXPERIMENTS FOR THE SHARC

Fig. 11. Targets of CEU injection for TS6 833 216A.
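Both factors named above, low occupation and already-consumed values, can be illustrated with a simple lifetime analysis of an access trace. This is a minimal sketch under stated assumptions, not the authors' method: an upset in a location matters only if the next access to that location is a read, and a flip lands at a uniformly random location and cycle. The trace format and function names are hypothetical, and the result is an upper bound because a corrupted value that is read may still be masked.

```python
from typing import Dict, List, Tuple

Access = Tuple[int, str]  # (cycle, "R" for read or "W" for write)


def live_cycles(accesses: List[Access]) -> int:
    """Cycles during which an upset in this location would be read before being overwritten.

    The interval before each access is live if that access is a read; after the
    last access the value is dead. Content present at cycle 0 counts as live
    if the first access is a read (e.g. preloaded operands).
    """
    live = 0
    prev = 0
    for cycle, kind in sorted(accesses):
        if kind == "R":
            live += cycle - prev
        prev = cycle
    return live


def vulnerable_fraction(trace: Dict[int, List[Access]], n_locations: int, duration: int) -> float:
    """Probability that a flip at a uniformly random (location, cycle) hits a live value."""
    total_live = sum(live_cycles(accesses) for accesses in trace.values())
    return total_live / (n_locations * duration)


# Hypothetical trace: 4 locations, 100 cycles; locations 2 and 3 are never used.
trace = {
    0: [(10, "W"), (30, "R")],             # live from 10 to 30, dead once consumed
    1: [(10, "W"), (20, "R"), (90, "R")],  # live from 10 to 90
}
print(vulnerable_fraction(trace, n_locations=4, duration=100))
```

Here only a quarter of the location-cycles are vulnerable: half of the locations are unused, and location 0 is dead for most of the run after its value is consumed.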

For the two simple programs (DFT and matrix multiplication), the error rate is negligible. The differences in sensitivity are due to the sensitive area occupied by each program: the IRIS application uses approximately three times (respectively, four times) more of the DSP's sensitive area than the DFT (respectively, the matrix multiplication).

Concerning the detected error types, sequence losses were significantly more frequent for the DFT program than for the other programs. This can be explained by the large number of registers used during the calculation loop to store critical parameters. Bit flips in these targets lead to DSP crashes that exceed the time limit of the software watchdog implemented on the motherboard.
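Why a flip in a loop-control register tends to end in a watchdog timeout rather than a wrong result can be seen with a toy count-down loop. This is an illustrative sketch, not a model of the SHARC or of the DFT code: the 32-bit counter width, the iteration budget standing in for the watchdog, and the function name are all assumptions.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def run_countdown(n_iterations: int, flip_at: int, flip_bit: int, watchdog_limit: int) -> str:
    """Run a count-down loop whose counter register suffers one bit flip.

    The flip is applied just before the loop test of iteration `flip_at`.
    Returns "tolerated", "result error" or "loss of sequence".
    """
    counter = n_iterations & MASK
    executed = 0
    while True:
        if executed == flip_at:
            counter ^= 1 << flip_bit          # injected upset
        if counter == 0:                      # loop exit test
            break
        executed += 1                         # one loop body executed
        counter = (counter - 1) & MASK
        if executed > watchdog_limit:         # hypothetical watchdog budget
            return "loss of sequence"
    return "tolerated" if executed == n_iterations else "result error"


for bit in (0, 6, 31):
    print(bit, run_countdown(100, flip_at=10, flip_bit=bit, watchdog_limit=1000))
print("late", run_countdown(100, flip_at=200, flip_bit=31, watchdog_limit=1000))
```

Low-order flips only change the number of iterations slightly (a result error), a flip in a high-order bit makes the loop run far beyond the budget (a loss of sequence), and a flip arriving after the loop has finished is tolerated because the register is no longer in use.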

B. TS6 833 216A Daughterboard

1) Targets of CEU Injection: Fig. 11 illustrates the mapping of registers and internal memory (in gray) in the TS6 833 216A microprocessor. The registers of this microprocessor control five modules:

1) SIM: system integration module, for system control (watchdog, protection, etc.);

2) CPU32: central processing unit, which executes the code;

3) TPU: time processor unit, a dedicated micro-engine operating independently of the CPU32;

4) QSM: queued serial module, containing a serial communication interface (SCI) and a queued serial peripheral interface;

5) TPURAM CTL: control registers for the TPU microcode emulation RAM.

TABLE IV. TS6 833 216A SENSITIVE AREA OCCUPATION FOR EACH TESTED PROGRAM

TABLE V. RESULTS OF CEU FAULT-INJECTION EXPERIMENTS FOR THE TS6 833 216A

As shown in Fig. 11, reserved internal memory areas represent 320 bytes. If these areas are used during the operation of the microprocessor, upsets in them may also cause some difference between radiation and CEU injection results.

The percentages of internal memory and registers used by each program running on the TS6 833 216A are given in Table IV.

Except for the matrix multiplication program, none of the programs (neither the FFT nor the MEGA software) uses the internal memory. The matrix multiplication, in contrast, stores all the elements of its matrix operands in the internal memory.

2) Results of CEU Injection: Table V summarizes the results of the CEU injection sessions obtained for the TS6 833 216A microprocessor when running the three programs.

As shown in Table V, around 2% of the injected faults caused program result errors for the FFT application, whereas more than 13% of the injected CEUs corrupted the outputs of the matrix multiplication. This difference is explained by the fact that the FFT program does not use the internal memory, whereas the matrix multiplication stores all its operands and results in the internal memory of the microprocessor.

Finally, only 0.86% of the CEUs injected during the matrix multiplication program caused system crashes. This may be because the program uses only a few control registers.

One advantage of the CEU injection method is that it allows exhaustive experiments on particular registers, on critical data stored in internal memory, or even on the code area. The program counter (PC) was investigated in particular while executing the matrix multiplication program: 14.7% of the injected CEUs were tolerated, 37.7% caused errors in the matrix result, and 47.7% provoked system crashes. This demonstrates that the sensitivity of this register is not 100%, as is generally assumed.

TABLE VI. PREDICTED AND MEASURED ERROR RATES FOR THE TS6 833 216A UNDER NEON IONS

Fig. 12. Predicted and measured cross-sections for the TS6 833 216A. Exposed program: a matrix multiplication.
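An exhaustive campaign on a single register, as done above for the PC, flips every bit of that register at every instant of a reference run and tallies the outcomes. The sketch below shows the procedure on a hypothetical six-instruction toy machine with a 3-bit PC; the program, the unmapped addresses, and the watchdog budget are all assumptions, and the resulting percentages say nothing about the TS6 833 216A itself.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Toy straight-line program: acc = DATA[0] + ... + DATA[3].
PROGRAM: List[Tuple] = [("CLR",), ("ADD", 0), ("ADD", 1), ("ADD", 2), ("ADD", 3), ("HALT",)]
DATA = [1, 2, 3, 4]
PC_BITS = 3            # PC can address 8 words; addresses 6 and 7 are unmapped
WATCHDOG_STEPS = 50    # hypothetical watchdog budget, in executed instructions


def run(flip_step: Optional[int] = None, flip_bit: int = 0) -> Optional[int]:
    """Execute PROGRAM, flipping bit `flip_bit` of the PC just before step `flip_step`.

    Returns the accumulator at HALT, or None on a crash or watchdog timeout.
    """
    pc, acc = 0, 0
    for step in range(WATCHDOG_STEPS):
        if step == flip_step:
            pc ^= 1 << flip_bit               # injected upset on the PC
        if not 0 <= pc < len(PROGRAM):
            return None                       # fetch from unmapped memory: crash
        op = PROGRAM[pc]
        if op[0] == "HALT":
            return acc
        if op[0] == "CLR":
            acc = 0
        else:                                 # "ADD"
            acc += DATA[op[1]]
        pc += 1
    return None                               # watchdog expired


def exhaustive_pc_campaign() -> Counter:
    """Flip every PC bit at every instant of the golden run and tally outcomes."""
    golden = run()
    # A straight-line program executes each instruction once, HALT included.
    n_instants = len(PROGRAM)
    counts: Counter = Counter()
    for step in range(n_instants):
        for bit in range(PC_BITS):
            result = run(step, bit)
            if result is None:
                counts["loss of sequence"] += 1
            elif result == golden:
                counts["tolerated"] += 1
            else:
                counts["result error"] += 1
    return counts


counts = exhaustive_pc_campaign()
total = sum(counts.values())
for outcome, n in sorted(counts.items()):
    print(f"{outcome}: {n}/{total} = {100 * n / total:.1f}%")
```

Even in this toy case, some PC flips are tolerated (for example, a jump back to the `CLR` instruction simply recomputes the sum), which mirrors the observation that the PC is not 100% sensitive.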

C. Radiation Testing Results for the TS6 833 216A Daughterboard

We ran the different programs on the TS6 833 216A while it was exposed to neon heavy ions. Table VI summarizes the predicted and measured error rates for each program. Except for the matrix multiplication, the error rates are negligible, showing that SEUs have practically no effect on an application running on the TS6 833 216A if it does not use the internal memory. The matrix multiplication program was also run under several heavy-ion beams; the predicted and measured error rates are shown in Fig. 12. Note that no errors were obtained with boron (B) and nitrogen (N) ions.²

Again, the good agreement between the predicted and measured error rates demonstrates the effectiveness of the CEU injection method for predicting error rates using only the underlying SEU cross-section of the studied processor, without exposing the processor running the specific program to radiation.

VIII. CONCLUSION AND FUTURE WORK

In this paper, we have presented a methodology for predicting the error rates of digital architectures operating under radiation. A flexible tool for injecting bit flips concurrently with program execution was developed and implemented on three digital architectures, built respectively around the Intel 80C51 microcontroller, the ADSP21060 digital signal processor, and the TS6 833 216A microprocessor. The behavior of these architectures in the presence of bit flips was studied for different kinds of programs. For the 80C51 microcontroller, a matrix multiplication was developed. For the SHARC architecture, we implemented a matrix multiplication, a DFT, and a program module devoted to the control of a CCD camera included in a scientific satellite project. For the board based on the TS6 833 216A, two simple programs and the flight software intended to control the operation of the MEGA spectrometer were implemented. The results of the fault-injection experiments allowed us to forecast a very low sensitivity to bit flips provoked by charged particles for both flight software programs, developed for the SHARC and the TS6 833 216A, respectively.

²Results obtained for the TS6 833 216A were also presented for review in [12].

Radiation testing was performed to demonstrate that application error rates can be predicted by combining the results of CEU injection experiments with the sensitivity to upsets of the processor's memory elements, as measured by radiation testing. For both architectures, built around the 80C51 and the TS6 833 216A and running different programs, the comparison between predicted and measured error rates demonstrated the effectiveness of the CEU injection approach.

ACKNOWLEDGMENT

The authors would like to thank W. K. Ho, R. Ong, J. Li, and E. Y. Ong from Ngee Ann Polytechnic (Singapore) for their help in software and hardware development.

REFERENCES

[1] R. Koga, W. A. Kolasinski, M. T. Marra, and W. A. Hanna, “Techniques of microprocessor testing and SEU-rate prediction,” IEEE Trans. Nucl. Sci., vol. NS-32, pp. 4219–4224, Dec. 1985.

[2] F. Bezerra, D. Hardy, R. Velazco, and H. Ziade, “A new SEU latch-up tester for microprocessors: initial results on 32-bit floating point DSP’s,” in Proc. RADECS (Radiation and its Effects on Components and Systems), 1995, pp. 296–301.

[3] V. Asenek et al., “SEU induced errors observed in microprocessor systems,” IEEE Trans. Nucl. Sci., vol. 45, pp. 2876–2883, Dec. 1998.

[4] J. H. Elder, J. Osborn, W. A. Kolasinski, and R. Koga, “A method for characterizing microprocessor’s vulnerability to SEU,” IEEE Trans. Nucl. Sci., vol. 35, pp. 1679–1681, Dec. 1988.

[5] R. Velazco, S. Karoui, T. Chapuis, D. Benezech, and L. H. Rosier, “Heavy ion tests for the 68 020 microprocessor and the 68 882 coprocessor,” IEEE Trans. Nucl. Sci., vol. 39, Dec. 1992.

[6] R. Velazco, S. Rezgui, and R. Ecoffet, “Injecting CEU’s (Code Emulating Upsets) to evaluate the error rate of microprocessor-embedded digital applications,” in 2000 Single Event Effects (SEE) Symp., Manhattan Beach, Los Angeles, Apr. 11–13, 2000.

[7] R. Velazco, S. Rezgui, and R. Ecoffet, “Predicting error rate for microprocessor-based digital architectures through C.E.U. (Code Emulating Upsets) injection,” IEEE Trans. Nucl. Sci., vol. 47, Dec. 2000.

[8] R. Velazco, P. Cheynet, A. Bofill, and R. Ecoffet, “THESIC: A testbed suitable for the qualification of integrated circuits devoted to operate in harsh environment,” in Proc. IEEE Eur. Test Workshop (ETW’98), Sitges, Spain, May 1998, pp. 89–90.

[9] R. Velazco and S. Rezgui, “Error rate estimation process for digital architectures exposed to radiation,” patent pending, Jan. 16, 2001.