Expediting GA-Based Evolution Using Group Testing Techniques for Reconfigurable Hardware¹

Rashad S. Oreifej, Carthik A. Sharma, and Ronald F. DeMara
University of Central Florida

ReConFig’06, San Luis Potosi, Mexico

¹ Research supported in part by NSF grant CRCD: 0203446



Evolvable Hardware

Evolutionary Design:
• Start with available CLBs and IOBs
• Implement a design using Genetic Operators etc. [Fogarty97]
• Limited or no ability to re-design to account for suspected faulty resources

Evolutionary Regeneration:
• Start with an existing pool of designs
• Some existing configurations may use faulty resources
• Eliminate use of suspected faulty resources
• Genetic Operators can be applied to refurbish designs [Vigander01]

Previous Work

• Pre-compiled Column-Based Dual FPGA architecture [Mitra04]
  - Autonomous detection, repair by shifting pre-compiled columns
  - Isolation using distributed CED-checkers and “blind” reconfiguration attempts

• Overview of Combinatorial Group Testing and Applications [Du00]
  - Provides taxonomy and general algorithms for applying CGT
  - Examples of CGT applications: DNA clone library filtering, vaccine screening, computer fault diagnosis, etc.

• CGT Enhanced Circuit Diagnosis [Kahng04]
  - Present doubling, halving, etc. for circuit fault diagnosis using BIST, CGT
  - Requires ability to test resources individually

• Chinese Remainder Sieve technique [Eppstein05]
  - Efficient non-adaptive and two-stage CGT based on prime number driven test formation
  - Improved algorithms for practical problem sizes (n < 1080) with small number of defectives (d < 4)

Genetic Algorithms & Evolvable Hardware

GAs are strong candidates for implementing system refurbishment:
  - They implement guided trial-and-error search using principles of Darwinian evolution
  - Iterative selection enforces “survival of the fittest”
  - Genetic operators (mutation, crossover, …) can be used to refurbish designs
  - Hypothesis: Information regarding resource performance can expedite GA-based refurbishment

[Figure: an Individual (Chromosome) made up of GENEs]

GAs frequently use strings of 1s and 0s to represent candidate solutions
  - FPGA Configuration File is a String of 1s and 0s
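The bit-string representation and the two genetic operators named on this slide can be illustrated with a minimal Python sketch. The function names, the 16-bit length, and the 5% mutation rate are illustrative assumptions; the slides do not give the simulator's actual encoding or parameters.

```python
import random

def random_chromosome(n_bits, rng):
    """A candidate solution: a list of 0/1 genes (stand-in for a configuration bitstream)."""
    return [rng.randint(0, 1) for _ in range(n_bits)]

def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails of two equal-length parents."""
    point = rng.randrange(1, len(parent_a))
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

rng = random.Random(0)            # fixed seed so the sketch is repeatable
a = random_chromosome(16, rng)
b = random_chromosome(16, rng)
child_1, child_2 = crossover(a, b, rng)
mutant = mutate(child_1, 0.05, rng)
print("".join(map(str, mutant)))
```

Single-point crossover is used here only because it is the simplest variant; the slides do not say which crossover scheme the authors used.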

Conventional vs. CGT-Pruned GA

• Conventional GA:
  - Searches the whole space to evolve a working design or repair
  - Information about resource suitability may accelerate search

• CGT-Pruned GA:
  - Prefers resources of higher fitness to evolve a working design or repair

Q. How to obtain resource fitness information?
A. Using Group Testing Techniques.

Combinatorial Group Testing identifies a decreasing group of “defectives” by iterative refinement
  - Tests on subsets of suspects
  - Is expected to take less time: “Faster Design and Faster Repair”
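The idea of shrinking a suspect group by testing subsets can be sketched with adaptive halving. This is a textbook binary-splitting scheme under a single-defective assumption, not the authors' competitive algorithm; `group_fails` is a hypothetical oracle standing in for "a configuration that uses this subset of resources misbehaves".

```python
def locate_single_defective(resources, group_fails):
    """Binary splitting: return (defective, number_of_group_tests).

    Assumes exactly one defective resource is present.
    """
    suspects = list(resources)
    tests = 0
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        tests += 1
        if group_fails(half):
            suspects = half                    # defective is in the tested half
        else:
            suspects = suspects[len(half):]    # defective is in the untested half
    return suspects[0], tests

FAULTY_LUT = 37  # hypothetical fault location among 60 LUTs

def group_fails(group):
    """Hypothetical test oracle: a group fails if it contains the faulty LUT."""
    return FAULTY_LUT in group

found, tests = locate_single_defective(range(60), group_fails)
print(found, tests)
```

For 60 resources this needs at most 6 group tests, compared with up to 59 individual tests.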

CGT-Pruned GA Simulator

[Figure: block diagram of the simulator. Labels: Settings, Truth Table, Seed Config., Fitness Report, Best Config., CGT, GA, If Repair, Resource Info]

Example contents shown in the figure:
• Settings: No. Of CLBs = ..., No. LUTs = ..., Pop. Size = …
• Truth Table: I1 I2 ... O1 O2 ...; 0 0 ... 0 0 0 ...; 0 0 ... 0 1 0 …
• Seed Config.: CLB #: 0, LUT #: 0, FunctionType: OR, InputLine#0: 4, InputLine#1: 3, …
• Fitness Report (Gen. / Max / Ave): 2 / 154 / 142; 3 / 155 / 139; …
• Best Config.: CLB #: 0, LUT #: 0, FunctionType: XOR, InputLine#0: 0, InputLine#1: 5, …

Experimental Setup

• Target Circuit: 3-bit x 2-bit Multiplier
• No. of Experiments: 120 (60 per experiment type: Repair and Design)
• FPGA Architecture: Feed-Forward design
• No. of Resources: 60 LUTs (15 CLBs, 4 LUTs/CLB)
• Fault Model: Logic Single Fault Model
• Fault Type: Stuck at One
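To make the target circuit and fault type concrete, the sketch below builds the truth table of a 3-bit x 2-bit multiplier and scores a candidate by the number of matching output bits. The bit-counting fitness measure is an assumption, since the slides do not define the fitness function. The stuck-at-one fault is also simplified: it is modeled on one output bit rather than on an internal LUT.

```python
from itertools import product

A_BITS, B_BITS, OUT_BITS = 3, 2, 5   # 7 * 3 = 21 fits in 5 output bits

def multiplier_truth_table():
    """Map every (a, b) input pair to the expected product."""
    return {(a, b): a * b
            for a, b in product(range(2 ** A_BITS), range(2 ** B_BITS))}

def fitness(circuit, table):
    """Assumed fitness: number of output bits matching the truth table."""
    score = 0
    for (a, b), expected in table.items():
        actual = circuit(a, b)
        for bit in range(OUT_BITS):
            if ((actual >> bit) & 1) == ((expected >> bit) & 1):
                score += 1
    return score

def with_stuck_at_one(circuit, bit):
    """Wrap a circuit so that one output bit is forced to 1 (simplified fault)."""
    return lambda a, b: circuit(a, b) | (1 << bit)

def healthy(a, b):
    """Reference behavior of a fault-free multiplier."""
    return a * b

table = multiplier_truth_table()
faulty = with_stuck_at_one(healthy, bit=0)
print(fitness(healthy, table), fitness(faulty, table))
```

Under these assumptions the table has 32 rows and 5 output bits, so a fully correct circuit scores 160.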

CGT-Pruned Refurbishment

• Isolate suspect resources and Avoid using them
• Hypothesis: CGT-Pruned GA Repair evolves a full fitness circuit faster than Conventional GA Repair
  - Results show performance improvement in CGT-Pruned Repair

Results: Conventional Vs. CGT-Pruned Repair

CGT-Pruned GA out-performs Conventional GA

| Experiment Type | Conventional Repair | CGT-pruned Repair |
| --- | --- | --- |
| Circuit | 3-bit x 2-bit Multiplier | 3-bit x 2-bit Multiplier |
| Number of Experiments | 30 | 30 |
| Arithmetic Mean (Generations) | 17150 | 10700 |
| Standard Deviation | 15650 | 12550 |
| Standard Error of the Mean | 2850 | 2300 |
| 68% Confidence Interval | [14300 → 20000] | [8400 → 13000] |

Achieving Refurbishment with Cell Swapping

• Isolate and Swap suspect resources
• Cell Swapping Operator
  - Copy suspect resource “Cell” configuration to another unused cell
  - GA searches for routing strategy to re-route interconnect to the previously-unused cell
• Refurbishment with Cell Swapping
  - Swap suspect cells one by one and evaluate fitness until full fitness is evolved
  - If swapping all suspect cells does not realize complete refurbishment, then employ other GA operators
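The swap-and-evaluate loop described above might look like the following sketch. The configuration is reduced to a hypothetical mapping from cell id to LUT function, and the GA routing search is replaced by a direct remap, so this illustrates only the control flow. `fitness` and `full_fitness` are placeholders supplied by the caller.

```python
def refurbish_by_cell_swapping(config, suspects, unused, fitness, full_fitness):
    """Move suspect cells to unused cells one at a time until full fitness.

    Returns (new_config, repaired). If `repaired` is False, the caller would
    fall back to other GA operators.
    """
    config = dict(config)        # work on a copy: cell id -> LUT function
    spares = list(unused)
    for cell in suspects:
        if fitness(config) >= full_fitness:
            break                # already fully refurbished
        if not spares:
            break                # nothing left to swap into
        if cell not in config:
            continue             # suspect cell is not used by this design
        target = spares.pop(0)
        config[target] = config.pop(cell)   # copy the cell configuration to the spare
    return config, fitness(config) >= full_fitness

# Toy example: cell 1 is faulty, and CGT has narrowed the suspects to cells 0 and 1.
FAULTY_CELL = 1
design = {0: "AND", 1: "XOR", 2: "OR"}

def toy_fitness(cfg):
    """Hypothetical fitness: count of configured cells that avoid the faulty cell."""
    return sum(1 for cell in cfg if cell != FAULTY_CELL)

new_design, repaired = refurbish_by_cell_swapping(
    design, suspects=[0, 1], unused=[3, 4], fitness=toy_fitness, full_fitness=3)
print(new_design, repaired)
```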

Repair Progress

CGT-Pruned GA Design

• Evolve the entire circuit design from scratch
• Avoid suspect resources and take advantage of resource redundancy within the FPGA

CGT-Pruning outperforms Conventional GA-based techniques

Results: Conventional Vs. CGT-Pruned Design

Design of a circuit in the presence of a single stuck-at fault

| Experiment Type | Conventional design | CGT-pruned design |
| --- | --- | --- |
| Circuit | 3-bit x 2-bit Multiplier | 3-bit x 2-bit Multiplier |
| Number of Experiments | 30 | 30 |
| Arithmetic Mean (Generations) | 64500 | 53900 |
| Standard Deviation | 36000 | 37300 |
| Standard Error of the Mean | 7200 | 7450 |
| 68% Confidence Interval | [57300 → 71700] | [46450 → 61350] |

Comparison of Performance – Number of Generations for Repair

More than 70% of the experiments benefited substantially from resource information generated using CGT

Results Summary

As opposed to Conventional GAs, CGT-Pruned GAs:
  - Completely refurbish configurations in 38% fewer generations
  - Design fully functional configurations in 16% fewer generations
  - Faulty resources are eliminated from the pool of unused resources in the case of repair, as opposed to the pool of all resources in the case of design

Repair complexity vs. Design complexity
  - Repair complexity << Design complexity
  - Repairs were realized in one-fifth of the time required for Design
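The summary percentages follow from the mean generation counts reported in the two results tables (17150 vs. 10700 for repair, 64500 vs. 53900 for design). A short check, with `percent_reduction` as an illustrative helper name:

```python
def percent_reduction(baseline, improved):
    """Relative saving of `improved` over `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline

repair_gain = percent_reduction(17150, 10700)   # Conventional vs. CGT-pruned Repair
design_gain = percent_reduction(64500, 53900)   # Conventional vs. CGT-pruned Design
repair_to_design = 10700 / 53900                # CGT-pruned Repair vs. CGT-pruned Design

print(f"repair: {repair_gain:.1f}%  design: {design_gain:.1f}%  ratio: {repair_to_design:.2f}")
```

The "one-fifth" figure compares generation counts, which assumes that each generation costs about the same in repair and in design.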

Backup Slides

• On following pages …

Motivation

• Mission-critical Embedded Systems require high reliability and availability

• Characteristics of Operating Environment may induce hardware failures:
  - Aging, Manufacturing Defects, … etc.

• System Reliability:
  - Fault Avoidance. “Always Possible?” … No
  - Design Margin. “Always Adequate?” … No
  - Modular Redundancy. “Always Recoverable?” … No
  - Fault Refurbishment. “Highly Flexible?” … Yes … but technically challenging to achieve

Group Testing Techniques

• Competitive Group Testing
  - Algorithm based on group testing methods
  - Use competition between configurations
  - Temporal information stored in H matrix
  - Successive intersection
  - Monitor health history of resources, which presents resource fitness
  - Simulated using C programming language and GSL functions [Sharma06]

Example H matrix, H[i,j]:

0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 2 1 0 0 1 0 0 0
0 0 1 0 1 1 0 1 0 0
0 0 1 1 0 1 0 0 0 0
0 0 1 0 0 1 1 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

Relative fitness of resource (i,j) ∝ 1/H[i,j]
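A rough sketch of how a history matrix, successive intersection, and the 1/H relation could fit together is given below. The update rule (add one to H[i][j] for each cell used by a configuration that showed a discrepancy) is an assumption, not the published algorithm of [Sharma06]. So is the 1/(1 + H) form, chosen only to avoid dividing by zero for cells with no recorded discrepancies.

```python
def update_history(H, used_cells):
    """Increment H[i][j] for every cell used by a configuration that showed a discrepancy."""
    for i, j in used_cells:
        H[i][j] += 1

def suspects_by_intersection(failing_configs):
    """Successive intersection: cells common to every failing configuration."""
    sets = [set(cells) for cells in failing_configs]
    return set.intersection(*sets) if sets else set()

def relative_fitness(H):
    """Assumed form of 'fitness proportional to 1/H'; 1/(1 + H) avoids division by zero."""
    return [[1.0 / (1 + h) for h in row] for row in H]

# Hypothetical 3x3 resource grid and three failing configurations.
H = [[0] * 3 for _ in range(3)]
failing = [
    [(0, 0), (1, 1)],
    [(1, 1), (2, 2)],
    [(1, 1), (0, 2)],
]
for cells in failing:
    update_history(H, cells)

suspects = suspects_by_intersection(failing)
weights = relative_fitness(H)
print(suspects, weights[1][1])
```

A GA could then use `weights` to bias which resources it picks during mutation, or to exclude cells whose weight falls below a threshold.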

Three Fast Runs of the CGT-pruned GA Repair

The GA reaches a relatively high fitness within the first few hundred generations, but takes significantly more generations to reach the maximum fitness

References

[1] Fogarty T. C., J. F. Miller, and P. Thomson, "Evolving Digital Logic Circuits on Xilinx 6000 Family FPGAs," in Proceedings of The 2nd Online Conference on Soft Computing, 23-27 June 1997.

[2] Sverre Vigander, “Evolutionary Fault Repair in Space Applications”, Master’s Thesis, Dept. of Computer & Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, 2001.

[3] C. A. Sharma, R. F. DeMara, "A Combinatorial Group Testing Method for FPGA Fault Location", accepted to International Conference on Advances in Computer Science and Technology (ACST 2006), Puerto Vallarta, Mexico, January 23 - 25, 2006

[4] S. Mitra and E. J. McCluskey, “Which Concurrent Error Detection Scheme to Choose?,” in Proceedings of the International Test Conference 2000, p. 985, October 2000.

[5] D. Du and F. K. Hwang. Combinatorial Group Testing and its Applications, volume 12 of Series on Applied Mathematics. World Scientific, 2000.

[6] A. B. Kahng and S. Reda. “Combinatorial Group Testing Methods for the BIST Diagnosis Problem,” in Proceedings of the Asia and South Pacific Design Automation Conference, January 2004.

[7] Keymeulen, D.; Zebulum, R.S.; Jin, Y.; Stoica, A.. “Fault-Tolerant Evolvable Hardware Using Field-Programmable Transistor Arrays”, IEEE Transactions On Reliability, Vol. 49, No. 3, September 2000

[8] Lohn, J.; Larchev, G.; DeMara, R. “Evolutionary fault recovery in a Virtex FPGA using a representation that incorporates routing”, Parallel and Distributed Processing Symposium, 2003. Proceedings. International 22-26 April 2003

[9] Lach, J.; Mangione-Smith, W.H.; Potkonjak, M. “Low overhead fault-tolerant FPGA systems”, Very Large Scale Integration (VLSI) Systems, IEEE Transactions on Volume 6,  Issue 2,  June 1998

[10] Miron Abramovici, John M. Emmert and Charles E. Stroud, “Roving STARs: An Integrated Approach to On-Line Testing, Diagnosis, and Fault Tolerance for FPGAs in Adaptive Computing Systems”, The Third NASA/DoD Workshop on Evolvable Hardware, Long Beach, California, 2001

[11] DeMara, R.F.; Kening Zhang. “Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration”, Evolvable Hardware, 2005. Proceedings. 2005 NASA/DoD Conference on 29-01 June 2005

[12] D. Eppstein, M. T. Goodrich, and D. S. Hirschberg. “Improved combinatorial group testing for real-world problem sizes”, in Workshop on Algorithms and Data Structures (WADS), Lecture Notes Comput. Sci. Springer, 2005.

[13] J. F. Miller, P. Thomson, and T. Fogarty. “Designing Electronic Circuits Using Evolutionary Algorithms. Arithmetic Circuits: A Case Study”, In D. Quagliarella, J. Periaux, C. Poloni, and G. Winter, editors, Genetic Algorithms and Evolution Strategy in Engineering and Computer Science, pages 105--131. Morgan Kaufmann, Chichester, England, 1998.

Previous Work

[Table: Fault Tolerant Design and Detection Characteristics]

*** Incorporates resource performance information

Previous Work

[Table: Fault Recovery Characteristics]

Our Goal: Autonomous FPGA Refurbishment

increase availability without carrying pre-configured spares …

| Characteristic | Redundancy | Refurbishment |
| --- | --- | --- |
| Overhead from Unutilized Spares (weight, size, power) | increases with amount of spare capacity | weakly-related to number recovery capacity |
| Granularity of Fault Coverage (resolution where fault handled) | restricted at design-time | variable at recovery-time |
| Fault-Resolution Latency (availability via downtime required to handle fault) | based on time required to select spare resource | based on time required to find suitable recovery |
| Quality of Repair (likelihood and completeness) | determined by adequacy of spares available (?) | affected by multiple characteristics (+ or -) |
| Autonomous Operation (fix without outside intervention) | yes | yes |
| Everyday example | spare tires | can of fix-a-flat |

GA Success Stories

Commercial Applications:
  - Nextel: frequency allocation for cellular phone networks; $15M predicted savings in NY market
  - Pratt & Whitney: turbine engine design; engineer: 8 weeks; GA: 2 days w/ 3x improvement
  - International Truck: production scheduling improved by 90% in 5 plants
  - NASA: superior Jupiter trajectory optimization, antennas, FPGAs
  - Koza: 25 instances showing human-competitive performance such as analog circuit design, amplifiers, filters

Adaptive GA Design

Circuit: 2 to 4 Decoder

CLBs: 2

LUTs/CLB: 4

Fault: Stuck at 1 and Stuck at 0

Traditional GA: 220 Generations *, std dev 240**

Adaptive GA: 152 Generations *, std dev 120**

* Arithmetic mean for twenty experiments

** Standard Deviation for twenty experiments

Analysis Metrics

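Mean:

\[ \bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k \]

Standard Deviation:

\[ \sigma_x = \sqrt{\frac{\sum_{k=1}^{n} (x_k - \bar{x})^2}{n - 1}} \]

Standard Error of the Mean:

\[ SEM_x = \frac{\sigma_x}{\sqrt{n}} \]

Confidence Level:

\[ CL_x(68\%) = \bar{x} \pm SEM_x \]

\[ CL_x(95\%) = \bar{x} \pm 2\,SEM_x \]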
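The four metrics on this slide (mean, standard deviation, standard error of the mean, and the 68% and 95% confidence levels) can be computed as in the sketch below. It assumes the sample (n - 1) form of the standard deviation and treats the 68% and 95% levels as the mean plus or minus one and two standard errors. The generation counts are made-up illustration data, not results from the experiments.

```python
import math

def mean(xs):
    """Arithmetic mean of the samples."""
    return sum(xs) / len(xs)

def sample_std(xs):
    """Standard deviation with the n - 1 denominator (assumed form)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def sem(xs):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sample_std(xs) / math.sqrt(len(xs))

def confidence_interval(xs, k=1):
    """Mean +/- k standard errors: k=1 for about 68%, k=2 for about 95%."""
    m, s = mean(xs), sem(xs)
    return m - k * s, m + k * s

generations = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data for illustration
low, high = confidence_interval(generations, k=1)
print(mean(generations), sample_std(generations), sem(generations), (low, high))
```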

CGT-Pruned GA Simulator

• C++ based console application
• Consists of:
  - Combinatorial Group Testing component
    - Uses GNU Scientific Library (GSL)
  - Genetic Algorithm component
    - Object oriented architecture that models FPGA resources
• Modes of Operation:
  - CGT-Pruned GA Repair
    - Use CGT to isolate suspect resources
    - Avoid use of suspect-faulty resource in design refurbishment process
  - CGT-Pruned GA Repair with Cell Swapping
    - Swap suspect-faulty resources with previously unused resources to evolve a recovery
  - CGT-Pruned GA Design
    - Evolve a new working design while avoiding suspect resources