1
7 July 2008
Sustainable Fault-Handling of Reconfigurable Logic using Throughput-Driven Assessment
Carthik Anand Sharma
University of Central Florida
2
Motivation
• Mission-critical embedded systems require high reliability and availability
• Characteristics of the operating environment may induce hardware failures: aging, manufacturing defects, etc.
• System reliability:
  Fault Avoidance. "Always possible?" … No
  Design Margin. "Always adequate?" … No
  Modular Redundancy. "Always recoverable?" … No
  Fault Refurbishment. "Highly flexible?" … Yes, but technically challenging to achieve
3
Technical Objective: Autonomous FPGA Regeneration
Criterion | Redundancy | Regeneration
Overhead from unutilized spares (weight, size, power) | increases with amount of spare capacity | weakly related to recovery capacity
Granularity of fault coverage (resolution where fault is handled) | restricted at design time | variable at recovery time
Fault-resolution latency (availability via downtime required to handle fault) | based on time required to select spare resource | based on time required to find suitable recovery
Quality of repair (likelihood and completeness) | determined by adequacy of spares available (?) | affected by multiple characteristics (+ or -)
Autonomous operation (recover without outside intervention) | yes | yes
Increased availability without pre-configured spares; everyday example: spare tire (redundancy) vs. can of fix-a-flat (regeneration)
NASA Moon, Mars, and Beyond: realize 10's of years of service life?
Reconfiguration allows a new fault-handling paradigm
4
Fault-Handling Techniques for SRAM-based FPGAs
[Figure: taxonomy of reprogrammable device failure and fault-handling approaches. Characteristics: Duration (transient: SEU; permanent: SEL, oxide breakdown, electron migration, LPD) and Target (device configuration, processing datapath). Methods: Detection, Isolation, Diagnosis, Recovery. Approaches shown: TMR (conventional spatial redundancy), BIST, CED [McCluskey04], Repetitive Readback [Wells00], STARS [Abramovici01], Sussex [Vigander01], evolutionary methods, and CRR. Techniques named in the chart: bitwise comparison, invert bit value, ignore discrepancy, majority vote, supplementary testbench, Cartesian intersection, worst-case clock period dilation, replicate in spare resource, duplex output comparison, fast run-time location, select spare resource, population-based GA using extrinsic fitness evaluation, evolutionary algorithm using intrinsic fitness evaluation. Some cells are marked "not addressed" or "unnecessary".]
5
Contributions
• Strategy for integrating all phases of the fault-handling process
  detection, isolation, diagnosis and recovery work in synergy
• Elimination of additional test vectors
  enables detection and isolation with minimal system downtime
• Autonomous Group Testing techniques for FPGA devices
  isolates faults in FPGA while maintaining system performance
• Competitive Runtime Reconfiguration
  leverages iterative pairwise comparison and functional regeneration to provide adaptive refurbishment with resource recycling
6
Previous Work: Detection Characteristics of FPGA Fault-Handling Schemes
Approach | Fault Handling Method | Fault Detection: Latency | Fault Detection: Distinguish Transients | Resource Coverage: Logic | Resource Coverage: Interconnect | Resource Coverage: Comparator | Fault Isolation: Granularity
TMR | Spatial voting | Negligible | No | Yes | Yes | No | Voting element
[Vigander01] | Spatial voting & offline evolutionary regeneration | Negligible | No | Yes | No | No | Voting element
[Lohn, Larchev, DeMara03] | Offline evolutionary regeneration | Negligible | No | Yes | Yes | No | Unnecessary
[Lach98] | Static-capability tile reconfiguration | Relies on independent fault detection mechanism (applies to all remaining columns)
STARS [Abramovici01] | Roving Test Area | Up to 8.5M erroneous outputs | Test pattern transients | Yes | Yes | No | LUT function
[Keymeulen, Stoica, Zebulum00] | Population-based fault insensitive design | Design-time prevention emphasis | No | Yes | Yes | No | Not addressed at runtime
Competitive Runtime Reconfiguration (CRR) | Competing configurations with temporal voting and online regeneration | Negligible | Transients are attenuated automatically | Yes | Yes | Yes | Unnecessary, but can isolate functional components
Strategies:
1) Evolve redundancy into design before anticipated failure
2) Redesign after detection of failure
3) Combine desirable aspects of both strategies 1) + 2) …
7
Group Testing Algorithms
• Origin: World War II blood testing
  Problem: test samples from millions of new recruits
  Solution: test blocks of samples before testing individual samples
• Problem Definition
  Identify subset Q of defectives from set P
  Minimize number of tests
  Test v-subsets of P
  Form suitable blocks
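The block-before-individual idea can be illustrated with a minimal sketch. It assumes exactly one defective item and a pooled test that reports whether a block contains it; the function name `find_defective` and the callback `block_is_bad` are hypothetical names chosen for this illustration, and repeated halving is only one of many group testing schemes.

```python
from typing import Callable, List, Sequence, Tuple


def find_defective(
    items: Sequence[int],
    block_is_bad: Callable[[List[int]], bool],
) -> Tuple[int, int]:
    """Locate a single defective item by testing blocks, not individuals.

    Assumes exactly one defective item exists in `items`.
    Returns (defective item, number of block tests performed).
    """
    tests = 0
    candidates = list(items)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        tests += 1
        if block_is_bad(half):
            # The defective item is in the tested block.
            candidates = half
        else:
            # The tested block is clean, so the defective is in the rest.
            candidates = candidates[len(half):]
    return candidates[0], tests


# 64 samples, one defective (item 37): six block tests instead of up to 64.
item, tests = find_defective(range(64), lambda block: 37 in block)
print(item, tests)
```

For a set of 64 items this needs log2(64) = 6 pooled tests, which is the kind of saving that motivates forming suitable blocks.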
8
CRR Arrangement in SRAM FPGA
Configurations in Population
• C = C_L ∪ C_R
• C_L = subset of left-half configurations
• C_R = subset of right-half configurations
• |C_L| = |C_R| = |C|/2
Discrepancy Operator
• Baseline Discrepancy Operator is a dyadic operator with binary output:
$$ C_i^L \otimes C_i^R = \begin{cases} 0 & \text{if } Z(C_i^L) = Z(C_i^R) \\ 1 & \text{otherwise} \end{cases} $$
• Z(C_i) is the FPGA data throughput output of configuration C_i
• Each half-configuration evaluates using an embedded checker (XNOR gate) within each individual
• Any fault in the checker lowers that individual's fitness, so that individual is no longer preferred and eventually undergoes repair
[Figure: CRR arrangement in an SRAM-based FPGA. Labels: L Half-Configuration and R Half-Configuration, each with Function Logic (L, R) and Discrepancy Check (L, R); INPUT DATA; DATA OUTPUT; FEEDBACK; CONTROL; Reconfiguration Algorithm; CONFIGURATION BIT STREAM; OFF-CHIP EEPROM. NOTE: a non-volatile memory is already required to boot any SRAM FPGA from cold start, so this is not an additional chip.]
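RS (Hamming distance): $\otimes_i = \sum_j \left( C_{i,j}^L \;\mathrm{EOR}\; C_{i,j}^R \right)$
WTA (equivalence): $\otimes_i = \bigwedge_j \left( C_{i,j}^L \;\mathrm{EOR}\; C_{i,j}^R \right)$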
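A small sketch may help make the discrepancy operators concrete. It assumes each half-configuration's output Z is available as an integer word; the function names and the modeled fault (output bit 0 stuck at 1 in one half of a 3x3 multiplier) are hypothetical illustrations, not taken from the slides.

```python
def baseline_discrepancy(z_left: int, z_right: int) -> int:
    """Binary discrepancy: 0 when both halves agree, 1 otherwise."""
    return 0 if z_left == z_right else 1


def hamming_discrepancy(z_left: int, z_right: int) -> int:
    """Number of output bits on which the two halves differ."""
    return bin(z_left ^ z_right).count("1")


def multiplier(a: int, b: int) -> int:
    """Fault-free 3-bit x 3-bit unsigned multiplier."""
    return a * b


def faulty_multiplier(a: int, b: int) -> int:
    """Hypothetical fault: output bit 0 stuck at 1."""
    return (a * b) | 1


# Compare the two halves over all 64 input combinations.
discrepant_inputs = sum(
    baseline_discrepancy(multiplier(a, b), faulty_multiplier(a, b))
    for a in range(8)
    for b in range(8)
)
print(discrepant_inputs)  # inputs on which the halves disagree
```

Neither function needs to know which half is correct; a disagreement only says that at least one of the two is faulty, which is why fitness has to be accumulated over many pairings.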
9
Sketch of CRR Approach
Premise: Recovery Complexity << Design Complexity
1. Initialization
   Population P of functionally-identical yet physically-distinct configurations
   Partition P into sub-populations that use supersets of physically-distinct resources, e.g. size |P|/2 to designate physical FPGA left-half or right-half resource utilization
2. Fitness Assessment
   Discrepancy Operator is some function of bitwise agreement between each half's output
   Four fitness states defined for configurations as {C_P, C_S, C_U, C_R}, with transitions, respectively: Pristine → Suspect → Under Repair → Refurbished
   Fitness Evaluation Window W determines the comparison interval
   Fitness assessment via pairwise discrepancy (temporal voting vs. spatial voting)
3. Regeneration
   Genetic Operators used to recover from fault based on Reintroduction Rate
   Operators are applied only once, then offspring are returned to "service" without concern about increasing fitness
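The four fitness states can be sketched as a small state machine. The transition rules below are assumptions made for illustration (the slides name the states and their order but not the exact conditions): a configuration becomes Suspect on any discrepancy in a window, moves to Under Repair when its windowed fitness falls below a repair threshold, and is Refurbished once it meets the threshold again. `State`, `next_state`, and the parameter names are hypothetical.

```python
from enum import Enum


class State(Enum):
    PRISTINE = "C_P"
    SUSPECT = "C_S"
    UNDER_REPAIR = "C_U"
    REFURBISHED = "C_R"


def next_state(state: State, discrepancies: int, window: int,
               repair_threshold: float) -> State:
    """Advance one configuration after an evaluation window of `window` iterations.

    Fitness is taken as the fraction of discrepancy-free iterations,
    which is an assumption of this sketch.
    """
    fitness = 1.0 - discrepancies / window
    if state is State.PRISTINE:
        return State.PRISTINE if discrepancies == 0 else State.SUSPECT
    if state is State.SUSPECT:
        return State.UNDER_REPAIR if fitness < repair_threshold else State.SUSPECT
    # UNDER_REPAIR and REFURBISHED: stay in service only while fitness holds.
    return State.REFURBISHED if fitness >= repair_threshold else State.UNDER_REPAIR


s = State.PRISTINE
for seen in (0, 1, 10, 0):  # discrepancies observed in four successive windows
    s = next_state(s, seen, window=600, repair_threshold=0.99)
    print(s.name)
```

Because the state depends only on windowed agreement between halves, no golden reference output is needed to drive these transitions.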
10
FPGA Genetic Representations
• Chromosome Goals:
  Allow all possible LUT configurations
  Allow all possible CLB interconnections given constraints of routing support
  Disallow illegal FPGA configurations and non-coding introns (junk DNA)
  Facilitate crossover operator
• Bitstring representation is a natural choice, though it may not scale well (investigating generative representations)
• Representation shown here is a sample specific to the Xilinx Virtex FPGA
[Figure: chromosome layout. The chromosome is a sequence of CLB fields (CLB 0, CLB 1, …, CLB n); each CLB holds LUT0 through LUT3; each LUT field holds its LUT bits and its LUT inputs, with each input given as an (R-CLB, R-LUT) pair. R-CLB = remote CLB; R-LUT = remote LUT.]
11
Competitive Runtime Reconfiguration (CRR)
Evolutionary Computation strategies are effective for more than just the repair phase: continually detect, rank, and isolate faults entirely within the underlying data throughput flow
Conceptual Innovation: novel fitness assessment via pairwise discrepancy without any pre-conceived oracle for correctness (emergent behavior)
• no test vectors
• diverse alternatives working a-priori
• fault detection by robust consensus over time
• device remains online during repair
• no reconfiguration when fault-free
• fault isolation is model-free and self-calibrating
• completely-repaired criteria can be ignored
• performance readily adjustable
• graceful degradation via ranking of alternatives
• checking logic is part of the individual, hence also competes for correctness
• failures in population memory covered
[Figure: CRR arrangement in an SRAM-based FPGA, with L and R half-configurations (Function Logic and Discrepancy Check in each), INPUT DATA, DATA OUTPUT, FEEDBACK, CONTROL, Reconfiguration Algorithm, CONFIGURATION BIT STREAM, and OFF-CHIP EEPROM. NOTE: a non-volatile memory is already required to boot any SRAM FPGA from cold start, so this is not an additional chip.]
[Flowchart: Reconfiguration Algorithm, PRIMARY LOOP]
  Initialization: population partitioned into functionally-identical yet physically-distinct half-configurations
  Selection: choose FPGA configuration(s) labeled L and R
  Detection: apply functional inputs to compute FPGA outputs using L, R
  Branch on whether L, R results are discrepancy-free (L = R) or not
  Fitness Adjustment: update fitness of only L and R based on detection results
  Decision: is either L's or R's fitness < Repair Threshold? NO: continue the primary loop. YES: invoke Genetic Operators only once and only on L or R
  Adjust Controls: detection mode, overlap interval, ...
12
Fitness Evaluation Window
• Fitness Evaluation Window: E denotes the number of iterations used to evaluate fitness before the state of an individual is determined
• Determination of E for 3x3 multiplier
  6 input pins articulating 2^6 = 64 possible inputs
  W should be selected so that all possible inputs appear
  More formally,
  Let rand(X) return some x_i ∈ X at random
  Seek W : $\bigcup_{i=1}^{W} [\,\mathrm{rand}(X)\,] = X$ with high probability
$$ x_K = K^D - \sum_{m=1}^{K-1} \binom{K}{m}\, x_m, \qquad P_K = \frac{x_K}{K^D} $$
• x_K = distinct orderings of K inputs showing in D trials
• if D is constant, P_K for K > 1 can be calculated successively
• probability P_K of K inputs showing after D trials is the ratio x_K / K^D
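The successive calculation can be sketched as follows, assuming inputs are drawn uniformly at random and that x_K counts the length-D input sequences in which all K inputs appear (so x_1 = 1). The function name `prob_all_inputs_seen` is a hypothetical helper; exact integer arithmetic keeps the recurrence stable for large D.

```python
from math import comb


def prob_all_inputs_seen(k: int, d: int) -> float:
    """Probability that all k equiprobable inputs appear within d trials.

    x[n] = n**d - sum_{m=1}^{n-1} C(n, m) * x[m] counts the sequences of
    length d over n inputs in which every one of the n inputs shows up.
    """
    x = [0] * (k + 1)
    for n in range(1, k + 1):
        x[n] = n ** d - sum(comb(n, m) * x[m] for m in range(1, n))
    return x[k] / k ** d


# 3x3 multiplier: K = 64 possible inputs, window of D = 600 iterations.
print(round(prob_all_inputs_seen(64, 600), 3))
```

Under these assumptions, a 600-iteration window articulates all 64 inputs with probability of about 0.995, and the same function can be swept over D to pick a window for a target confidence.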
13
E Determination
[Figure: E determination when K = 64]
14
Integer Multiplier Case Study
• 3-bit x 3-bit unsigned multiplier, automated design:
  – Building blocks
    Half-Adder: 18 templates created
    Full-Adder: 24 templates
    Parallel-And: 1 template created
  – Randomly select templates for instantiation in modules
GA operators
  External-Module-Crossover
  Internal-Module-Crossover
  Internal-Module-Mutation
GA parameters
  Population size: 20 individuals
  Crossover rate: 5%
  Mutation rate: up to 80% per bit
Experimental Evaluation
  Xilinx Virtex II Pro on Avnet PCI board
Experiments Demonstrate …
• Objective fitness function replaced by the Consensus-based Evaluation Approach and Relative Fitness
• Elimination of additional test vectors
• Temporal Assessment process
15
Regeneration Performance
Parameters:
  Difference (vs. Hamming Distance)
  Evaluation Window, Ew = 600
  Suspect Threshold: S = 1 - 6/600 = 99%
  Repair Threshold: R = 1 - 4/600 = 99.3%
  Re-introduction rate: r = 0.1
Repairs evolved in-situ, in real-time, without additional test vectors, while allowing the device to remain partially online.
System Throughput during Regeneration for a 3x3 multiplier
Exp. Number | Fault Location | Failure Type | Correctness after Fault | Total Iterations | Discrepant Iterations | Repair Iterations | Final Correctness | Throughput (%)
1 | CLB3, LUT0, Input1 | Stuck-at-1 | 52 / 64 | 1.7 × 10^7 | 4.2 × 10^5 | 1194 | 64 / 64 | 97.7
2 | CLB6, LUT0, Input1 | Stuck-at-0 | 33 / 64 | 8.0 × 10^5 | 1.7 × 10^4 | 47 | 64 / 64 | 97.9
3 | CLB5, LUT2, Input0 | Stuck-at-1 | 22 / 64 | 3.1 × 10^6 | 6.8 × 10^4 | 193 | 64 / 64 | 97.8
4 | CLB7, LUT2, Input0 | Stuck-at-0 | 38 / 64 | 8.1 × 10^6 | 1.8 × 10^5 | 513 | 64 / 64 | 97.7
5 | CLB9, LUT0, Input1 | Stuck-at-0 | 40 / 64 | 2.3 × 10^6 | 7.1 × 10^4 | 219 | 64 / 64 | 96.9
Average | | | 32.6 / 64 | 6.4 × 10^6 | 1.5 × 10^5 | 433 | 64 / 64 | 97.6
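The throughput column appears to be consistent with the fraction of discrepancy-free iterations, that is, 1 minus discrepant iterations over total iterations. That relation is an inference from the tabulated numbers rather than something stated on the slide, and `throughput_percent` is a hypothetical helper used only to check it.

```python
def throughput_percent(total_iterations: float, discrepant_iterations: float) -> float:
    """Percentage of iterations that produced discrepancy-free output."""
    return 100.0 * (1.0 - discrepant_iterations / total_iterations)


# (total iterations, discrepant iterations, reported throughput %) per experiment
experiments = {
    1: (1.7e7, 4.2e5, 97.7),
    2: (8.0e5, 1.7e4, 97.9),
    3: (3.1e6, 6.8e4, 97.8),
    4: (8.1e6, 1.8e5, 97.7),
    5: (2.3e6, 7.1e4, 96.9),
}

for number, (total, discrepant, reported) in experiments.items():
    estimate = throughput_percent(total, discrepant)
    print(f"exp {number}: estimated {estimate:.1f}%, reported {reported}%")
```

The estimates agree with the reported values to within a few tenths of a percent, which is about what the two-significant-digit iteration counts allow.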
16
Isolation Problem Outline
Objectives
  Locate faulty logic and/or interconnect resource: a single stuck-at fault model is assumed
  Online Fault Isolation: device not entirely removed from service
Features
  Runtime Reconfiguration: FPGA resources configured dynamically
  Utilize Runtime Inputs: avoid special test-vectors, improve availability
Constraints
  Use pre-designed configurations: defined by target application
  Subsets under test have constant resource utilization range for a given isolation problem
  Resource grouping influences fault articulation: resource-mapping and input vector might mask hardware faults
  Do not use specialized "block designs"
  Runtime reconfiguration initially limited to column-swapping
  "Non-reasonable" algorithm: "tests" may be repeated without gaining new isolation information
17
Discrepancy Mirror
Fault Coverage
• Mechanism for Checking-the-Checker (“golden element” problem)
• Makes checker part of configuration that competes [DeMara PDPTA-05]
18
Influence of LUT Utilization
[Figure panels: Perpetually Articulating Inputs with Equiprobable Distribution; Intermittently Articulating Inputs with Equiprobable Distribution]
• expected number of pairings grows sub-linearly in the number of resources
• utilization below 20% or above 80% implicates (or exonerates) a smaller subset of resources
• at 50% utilization, the expected numbers of pairings for 1,000, 10,000, and 100,000 resources are 11.1, 14.9, and 17.6
• at 90% utilization, a mean of 258 pairings is required to isolate the faulty resource
19
Fault Location Using Dueling
The set of all competing configurations is represented by S.
Set C_k represents the resources utilized by configuration k.
Each competing configuration k, 1 ≤ k ≤ |S|, has a unique binary Usage Matrix U_k, 1 ≤ k ≤ p.
Elements U_k[i,j], 1 ≤ i ≤ m, 1 ≤ j ≤ n, where m and n represent the rows and columns in the device layout respectively.
Elements U_k[i,j] = 1 denote the usage of resource (i, j) by C_k.
The History Matrix H, with elements H[i,j], 1 ≤ i ≤ m, 1 ≤ j ≤ n, is an integer matrix used to represent the relative fitness of individual resources.
H[i,j] provides instantaneous relative fitness values of resources.
20
Dueling Example
H[i,j] @ t = 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
U1
0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 1 0 0
0 0 0 1 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
U2
0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 1 0 0 0
0 0 1 1 0 0 1 0 0 0
0 0 1 0 1 0 0 0 0 0
0 0 1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
H[i,j] @ t = 2
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 0
0 0 2 1 0 0 1 0 0 0
0 0 1 0 1 1 0 1 0 0
0 0 1 1 0 1 0 0 0 0
0 0 1 0 0 1 1 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
• H[i,j] changes after C1 and C2 are loaded
• U1 and U2 are the corresponding Usage Matrices
• (3,3) is identified as the faulty resource
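The example can be reproduced with a short sketch, assuming both C1 and C2 produced discrepant outputs so that each usage matrix is added to the history matrix. The helper names `parse`, `update_history`, and `most_suspect` are hypothetical; the matrices are the U1 and U2 shown on the slide, written one string per row.

```python
from typing import List, Tuple

U1 = ["0000000000", "0000100000", "0010000000", "0000010100", "0001000000",
      "0010011000", "0000100000", "0010000100", "0000000000", "0000000000"]
U2 = ["0000000000", "0001011000", "0011001000", "0010100000", "0010010000",
      "0000000000", "0000000000", "0000000000", "0000000000", "0000000000"]


def parse(rows: List[str]) -> List[List[int]]:
    """Turn rows of '0'/'1' characters into a binary usage matrix."""
    return [[int(ch) for ch in row] for row in rows]


def update_history(history: List[List[int]], usage: List[List[int]]) -> None:
    """Increment H[i][j] for every resource used by a discrepant configuration."""
    for i, row in enumerate(usage):
        for j, used in enumerate(row):
            history[i][j] += used


def most_suspect(history: List[List[int]]) -> List[Tuple[int, int]]:
    """Return 1-indexed (row, column) positions holding the largest H value."""
    worst = max(max(row) for row in history)
    return [(i + 1, j + 1)
            for i, row in enumerate(history)
            for j, value in enumerate(row) if value == worst]


H = [[0] * 10 for _ in range(10)]  # H at t = 0
for usage in (parse(U1), parse(U2)):
    update_history(H, usage)

print(most_suspect(H))  # resources implicated by both discrepant configurations
```

Only resource (3,3) is used by both discrepant configurations, so it is the single entry of H that reaches 2.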
21
Isolation Progress without Halving
[Figure: number of suspected faulty elements (log scale, 100 to 10,000) vs. number of iterations (0 to 30), without halving]
• Initially |S| = 20,000
• Resource utilization = 40%
• Number of suspected faulty elements constant at 36 after 23 iterations
• No subsequent improvement due to lack of differentiating information between competing configurations
Temporary stasis in isolation due to insufficient design diversity
22
Dueling with Halving
[Figure: number of suspected faulty elements (log scale, 100 to 10,000) vs. number of iterations (0 to 25), dueling with modified halving]
• Halving works by swapping half the used columns with unused ones
• Halving progressively reduces the size of the set of suspected faulty elements
• Isolation proceeds until a single faulty element is isolated
• Fault isolated after 19 iterations
Symptoms of stasis invoke the halving procedure for fast isolation
23
Enhancing Embedded Core BIST using Group Testing
BIST Structure Used for Embedded Core Testing
XCVLX30 device: 32 DSP48E cores divided into n = 8 groups
8 x 6 2x1 multiplexers are needed.
6 columns of comparators; each column has 8 comparators
Comparators k_n(i,j), 0 ≤ i, j ≤ 3, i ≠ j, complete the test for a group of 4
Flip-flops FF0 through FF5 register comparison results for each group
Fault diagnosis script processes the result of each set of 6 outputs
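How six pairwise comparison results can diagnose a group of four blocks under test may be sketched as below. The sketch assumes that a faulty block never matches the output of any other block and that at least two blocks per group are fault-free; `compare_group` and `diagnose_group` are hypothetical names, and this is not the diagnosis script referred to on the slide.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def compare_group(outputs: Sequence[int]) -> Dict[Pair, bool]:
    """Model the comparators: True marks a mismatch between blocks i and j."""
    return {(i, j): outputs[i] != outputs[j]
            for i, j in combinations(range(len(outputs)), 2)}


def diagnose_group(mismatch: Dict[Pair, bool], size: int = 4) -> List[int]:
    """Return the blocks that agree with no other block in the group."""
    agrees_with_someone = [False] * size
    for (i, j), differs in mismatch.items():
        if not differs:
            # Two blocks that agree are both taken to be fault-free.
            agrees_with_someone[i] = agrees_with_someone[j] = True
    return [b for b in range(size) if not agrees_with_someone[b]]


# Four blocks, expected output 42; blocks 2 and 3 are faulty in different ways.
results = compare_group([42, 42, 7, 9])
print(len(results), diagnose_group(results))
```

With four blocks there are exactly six unordered pairs, matching the six comparator outputs registered per group, and two faulty blocks still leave one agreeing pair to anchor the diagnosis.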
24
Embedded Core BIST using Group Testing – Resource Utilization
Faults in up to 2 BUTs in each group of 4 can be isolated
Isolation is achieved without device reconfiguration in a single stage
25
Logic Element Isolation Using Autonomous Group Testing (AGT)
[Figure: N LUTs divided among individuals Ind1, Ind2, Ind3, Ind4, …, Ind R, each using M LUTs]
In each stage, suspect resources S are equally shared among p_stage individuals
If S = S_max then mutually exclusive shares are possible; else, n_share = n_reqd - |R| - |S| are shared
26
Equal Share Strategy
27
Fault Isolation Using FIAT
Fault Insertion and Analysis Toolkit (FIAT)
• provides methods to modify Xilinx FPGA configurations
• inserts stuck-at faults at LUT inputs
• precludes the need to edit the configuration bitstream
• works in conjunction with Xilinx ISE software (COTS design suite)
28
AGT Experiments
• Experimental Setup
  DES-56 encryption circuit
  Xilinx ISE design tools to place and route the design
  Virtex II Pro FPGA device
  Fault Injection and Analysis Toolkit (FIAT)
    Application Programmer Interfaces (APIs) to interact with the Xilinx ISE tools to inject and evaluate faults
    Editing the design file rather than the configuration bitstreams to introduce stuck-at faults
    Editing User Constraint Files (UCF) to control resource usage
29
AGT – Isolation Progress
30
AGT – Maintaining Goodput
With p_preset = 5, goodput is maintained at > 90%
Because better-performing individuals are selected to keep goodput high, the rate of fault isolation is slower
Fault detection latency is minimal compared to STARS; isolation is achieved with manageable system performance degradation
31
Conclusion
• Graceful Performance Degradation
  elimination of additional test vectors
  temporal assessment using aging and outlier detection
  resource recycling to utilize residual functionality
• Population-Centric Assessment
  Provides adaptability and self-calibrating autonomy with a relative assessment method
  fitness assessment using population information and competition
  create a fully functional solution using partially-fit individuals
• Autonomous Group Testing
  Minimal latency fault detection
  Fault isolation without additional test vectors
  Efficient strategies for fast fault isolation with minimal reconfiguration
  Fast first-responder to faults via resource tracking
• Run-time Fault Management
  Can be realized using consensus-driven assessment methods, and using information contained in the population
  Integrate Detection, Isolation, Repair under a single population-based technique
32
Future Work
• Evolvable Sequential Logic Circuits
  Fitness assessment is a major challenge for large circuits
• Logic and Interconnect fault handling
  Need to integrate fault handling methods for faults in logic and the interconnects
  Extend group testing principles to interconnect faults
• Challenges in partial reconfiguration
  Need well-tested and supported APIs for runtime reconfiguration of commercial FPGAs
  Open standards in partial reconfiguration will assist reliability studies
  Decreased dependence on vendor-provided design tools with an open bitstream structure is essential
  FIAT can be used to study fault isolation properties of different approaches, and for evaluating other group testing algorithms for fault isolation
• Extending AGT to other domains
  Group testing techniques presented here can be adapted for fault-tolerant nano-scale mechanisms, software, etc.
  Reliable, self-monitoring, self-adaptive organic systems are a need, with increasing design complexity and computational capabilities
33
Publications
Michael Georgiopoulos , Ronald F. DeMara, Avelino J. Gonzalez, Annie S. Wu, Mansooreh Mollaghasemi, Erol Gelenbe, Marcella Kysilka, Jimmy Secretan, Carthik A. Sharma and Ayman J. Alnsour, “A Sustainable Model for Integrating Current Topics in Machine Learning Research into the Undergraduate Curriculum,” accepted to the IEEE Transactions in Education, July 2008.
A. Sarvi, C. A. Sharma and R. F. DeMara, “BIST-Based Group Testing for Diagnosis of Embedded FPGA Cores,” accepted to The 2008 International Conference on Embedded Systems and Applications, Las Vegas, Nevada, USA (July 14-17, 2008).
C. A. Sharma, R. F. DeMara and A. Sarvi, “Self-Healing Reconfigurable Logic using Autonomous Group Testing,” submitted to ACM Transactions on Autonomous and Adaptive Systems (TAAS) of Special Issue on Organic Computing May 2007.
R. F. DeMara, K. Zhang, C. A. Sharma, “Consensus-based Evolvable Hardware for Sustainable Fault Handling,” submitted to The IEEE Transactions in Evolutionary Computation Aug 2007.
R. N. Al-Haddad, C. A. Sharma, R. F. DeMara, “Performance Evaluation of Two Allocation Schemes for Combinatorial Group Testing Fault Isolation,” in Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA’07), Las Vegas, Nevada, U.S.A, June 25 – 28, 2007.
R. S. Oreifej, C. A. Sharma, R. F. DeMara, “Expediting GA-Based Evolution Using Group Testing Techniques for Reconfigurable Hardware,” in Proceedings of the IEEE International Conference on Reconfigurable Computing and FPGAs (Reconfig’06), San Luis Potosi, Mexico, September 20-22, 2006, pp 106-113.
C. A. Sharma, R. F. DeMara, “A Combinatorial Group Testing Method for FPGA Fault Location,” in Proceedings of the International Conference on Advances in Computer Science and Technology (ACST 2006), Puerto Vallarta, Mexico, January 23 - 35, 2006.
C. J. Milliord, C. A. Sharma, R. F. DeMara, “Dynamic Voting Schemes to Enhance Evolutionary Repair in Reconfigurable Logic Devices,” in Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig’05), pp. 8.1.1 - 8.1.6, Puebla City, Mexico, September 28 - 30, 2005.
K. Zhang, R. F. DeMara, C. A. Sharma, “Consensus-based Evaluation for Fault Isolation and On-line Evolutionary Regeneration,” in Proceedings of the International Conference in Evolvable Systems (ICES’05), pp. 12 -24, Barcelona, Spain, September 12 - 14, 2005.
R. F. DeMara and C. A. Sharma, “Self-Checking Fault Detection using Discrepancy Mirrors,” in Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’05), pp. 311-317, Las Vegas, Nevada, U.S.A, June 27 – 30, 2005.
34
Backup Slides
• On following pages …
35
Isolation: Block Dueling
• Algorithm based on group testing methods
• Successive intersection to assess health of resources
  Each configuration k has a binary Usage Matrix U_k[i,j], 1 ≤ i ≤ m and 1 ≤ j ≤ n
  m, n are the number of rows and columns of resources in the device
  Elements U_k[i,j] = 1 are resources used in k
  History Matrix H[i,j], 1 ≤ i ≤ m and 1 ≤ j ≤ n, initially all zero, exists in which:
    entries represent the fitness of resources (i, j)
    information regarding the fitness of resources over time is stored
  A discrepant output will lead to an increase in the value of H[i,j] for all (i, j) with U_k[i,j] = 1, k ∈ S
  All elements of H corresponding to resources used by a discrepant configuration will be incremented by one
  At any point in time, H[i,j] will be a record of the outcomes of competitions
  Successive intersections are performed until |S| = 1
36
Isolation of a single faulty individual with 1-out-of-64 impact
• Outliers are identified after W iterations have elapsed
• E.V. = (1/64)*600 = 9.375 from minimum impact faulty individual
• Isolated individual’s f differs from the average DV by 33 after 1 or more observation intervals of length W
37
Isolation of a single faulty L individual with 10-out-of-64 impact
• Compare with 1-out-of-64 fault impact
  E.V. of (10/64)*600 = 93.75 discrepancies for faulty configuration
  One isolation will be complete approx. once in every 93.75/5 = 19 Observation Intervals
  Fault isolation demonstrated in 100% of cases
38
Isolation of 8 faulty individuals L4&R4 with 1-out-of-64 impact
• Expected isolations do not occur approximately 40% of the time
  Average discrepancy value of the population is higher
  Outlier isolation is difficult
  Multiple faulty individuals, discrepancies scattered
39
Online Dueling Evaluation
• Objective
  Isolate faults by successive intersection between sets of FPGA resources used by configurations
  Analyze complexity of isolation process
• Variables
  Total resources available: measured in number of LUTs
  Number of competing configurations: number of initial “seed” designs in CRR process
  Degree of articulation: some inputs may not manifest faults, even if the faulty resource is used by the individual
  Resource utilization factor: percentage of FPGA resources required by target application/design
  Number of iterations for isolation: measure of complexity and time involved in isolating the fault
40
For further info … EH Website: http://cal.ucf.edu
41
Fast Reconfiguration for Autonomously Reprogrammable Logic
• Motivation
  – Dynamic reconfiguration required by application
  – Exploit architectural & performance improvements fully
  – Reconfiguration delay: a major performance barrier
• Previous Work
• Methodology
  – Multilayer Runtime Reconfiguration Architecture (MRRA)
  – Spatial Management
• Prototype Development
  – Loosely-Coupled solution
  – Timing Analysis
  – System-On-Chip solution
42
Reconfiguration Demand during CRR
For a complete repair
– Approximately 2,000 generations ($G_{CR}$) may be required
– For each generation, the number of evaluations ($O_{new}$) may be up to 100
– Yielding the Cumulative Number of Reconfigurations (CNR) up to $CNR = G_{CR} \cdot O_{new} = 20{,}000$
– For each reconfiguration task i, the delay $L_i$ is the sum of three task-specific time components T(i)
– Therefore, the total delay $L_{tot} = \sum_{i=1}^{CNR} L_i$
Even if the reconfiguration delay alone is assumed to be on the order of tens or hundreds of milliseconds, L_tot >= 5.5 hours
43
Previous Work - Algorithm Level
Approach | Method | Partial Reconfig | Spatial Relocation | Temporal Parallelism | Area shape | Run-Time | Potential Limitations
Hauck, Li, Schwabe | Bit file compression | N/A | No | N/A | N/A | No | Full reconfiguration required
Shirazi, Luk, Cheung | Identifying common components | Yes | No | Yes | N/A | No | Design time work required
Mak, Young | Dynamic Partitioning | Yes | No | Yes | N/A | Yes | Only desirable for large designs
Ganesan, Vemuri | Pipelining | Yes | No | Yes | N/A | Yes | Limited pipeline depth
Compton, Li, Knol, Hauck | Relocation and Defragmentation with new FPGA architecture | Yes | Yes | No | Row-based | Yes | Special FPGA architecture required
Diessel, Middendorf, Schmeck, Schmidt | Task Remapped and Relocated | Yes | Yes | No | Rectangle | Yes | Overhead for remapping calculations
Herbert, Christoph, Macro | Partitioning and 2D Hashing | Yes | Yes | Yes | Rectangle | Yes | Rigid task modeling assumptions
Legend: compression method, temporal method, spatial method
44
Multilayer Runtime Reconfiguration Architecture (MRRA)
[Figure: MRRA block diagram. Labels: Control System, Fault-Repair Genetic Algorithm, Reconfiguration Engine, Microprocessor, System Bus, Virtex-II Pro FPGA, RAM]
• Develop MRRA fast reconfiguration paradigm for the CRR approach
• Validate with real hardware platform along with detailed performance analysis
• First general-purpose framework for a wide variety of applications requiring dynamic reconfiguration
• Extend existing theories on reconfiguration
45
Loosely Coupled Solution
[Figure: loosely coupled prototype. Labels: Avnet FPGA Development Board, PCI Interface, Virtex-II Pro FPGA, Off-Chip RAM, Control hosted on PC, Bit file, Input Data, FPGA Output]
The entire system operates on a 32-bit basis
The Virtex-II Pro is mounted on a development board which can then be interfaced with a workstation running Xilinx EDK and ISE.
46
Result Assessment
• Establish full functional framework of both prototypes
• Communication overhead, throughput and overall speed-up analysis
– Communication overhead for the SOC solution is decreased to the microsecond or sub-microsecond order vs. the millisecond order of the Loosely Coupled solution
– Up to 5-fold speedup is expected compared to the Loosely Coupled solution
• Translation Complexity Analysis
– The quantity of information that needs to be translated to generate the reconfiguration bitstream
– Simplification from file level to bit level is expected
• Storage Complexity Analysis
– The memory space required for the run-time algorithms
– Decreased memory requirement is expected due to the translation complexity improvement
47
Publications
Accepted Manuscripts
1. R. F. DeMara and K. Zhang, “Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration,” to appear in NASA/DoD Conference on Evolvable Hardware (EH’05), Washington D.C., U.S.A., June 29 – July 1, 2005.
2. H. Tan and R. F. DeMara, “A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management,” to appear in International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA’05), Las Vegas, Nevada, U.S.A., June 27 – 30, 2005.
3. R. F. DeMara and C. A. Sharma, “Self-Checking Fault Detection using Discrepancy Mirrors,” to appear in International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’05), Las Vegas, Nevada, U.S.A., June 27 – 30, 2005.
Submitted Manuscripts
1. R. F. DeMara and K. Zhang, “Populational Fault Tolerance Analysis Under CRR Approach,” submitted to International Conference on Evolvable Systems (ICES’05), Barcelona, Sept. 12 – 14, 2005.
2. R. F. DeMara and C. A. Sharma, “FPGA Fault Isolation and Refurbishment using Iterative Pairing,” submitted to IFIP VLSI-SOC Conference, Perth, W. Australia, October 17 – 19, 2005.
Manuscripts In-preparation
1. R. F. DeMara and K. Zhang, “Autonomous Fault Occlusion through Competitive Runtime Reconfiguration,” submission planned to IEEE Transactions on Evolutionary Computation.
2. R. F. DeMara and C. A. Sharma, “Multilayer Dynamic Reconfiguration Supporting Heterogeneous FPGA Resource Management,” submission planned to IEEE Design and Test of Computers.
Field Testing
Implementation of CRR on-board SRAM-based FPGA in a Cubesat mission
48
EHW Environments
• Evolvable Hardware (EHW) Environments enable experimental methods to research soft computing intelligent search techniques
• EHW operates by repetitive reprogramming of real-world physical devices using an iterative refinement process:
Two modes of Evolvable Hardware:
– Extrinsic Evolution: simulation in the loop; Genetic Algorithm operates on a software model (Done? Build it); device “design-time” refinement
– Intrinsic Evolution: hardware in the loop; Genetic Algorithm operates on the device; device “run-time” refinement; new approach to Autonomous Repair of failed devices
Application: Stardust Satellite
• >100 FPGAs onboard
• hostile environment: radiation, thermal stress
• How to achieve reliability to avoid mission failure?
49
Genetic Algorithms (GAs)
Mechanism coarsely modeled after neo-Darwinism (natural selection + genetics)
[Figure: genetic algorithm cycle. Labels: start, population of candidate solutions, evaluate fitness of individuals (fitness function), goal reached, selection of parents, parents, crossover, mutation, offspring, replacement]
50
Genetic Mechanisms
• Guided trial-and-error search techniques using principles of Darwinian evolution
– iterative selection, “survival of the fittest”
– genetic operators -- mutation, crossover, …
– implementor must define fitness function
• GAs frequently use strings of 1s and 0s to represent candidate solutions
– if 100101 is better than 010001 it will have more chance to breed and influence future population
• GAs “cast a net” over entire solution space to find regions of high fitness
• Can invoke Elitism Operator (E=1, E=2 …)
– guarantees monotonically non-decreasing fitness of best individual over all generations
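A minimal sketch of these mechanisms on bit strings, assuming a toy fitness function (the count of 1s), two-way tournament selection, single-point crossover, and per-bit mutation. None of these specific choices or parameter values come from the slides; `run_ga` and its arguments are hypothetical names used only for illustration. With `elite=1` the best individual is copied unchanged into the next generation, so the best fitness never decreases.

```python
import random


def run_ga(fitness, n_bits=16, pop_size=20, generations=30,
           elite=1, p_mut=0.05, seed=0):
    """Toy GA over bit strings; returns best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best_history.append(fitness(pop[0]))
        # Elitism operator: carry the top `elite` individuals over unchanged.
        next_pop = [ind[:] for ind in pop[:elite]]
        while len(next_pop) < pop_size:
            # Selection: fitter of two random individuals becomes a parent.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Crossover: single cut point, head of p1 joined to tail of p2.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with probability p_mut.
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return best_history


history = run_ga(sum)  # fitness = number of 1 bits (assumed toy objective)
print(history)
```

Setting `elite=0` removes the guarantee: the best individual may then be lost to crossover or mutation between generations.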
51
GA Success Stories
Commercial Applications:
– Nextel: frequency allocation for cellular phone networks -- $15M predicted savings in NY market
– Pratt & Whitney: turbine engine design -- engineer: 8 weeks; GA: 2 days w/3x improvement
– International Truck: production scheduling improved by 90% in 5 plants
– NASA: superior Jupiter trajectory optimization, antennas, FPGAs
– Koza: 25 instances showing human-competitive performance such as analog circuit design, amplifiers, filters
52
Representing Candidate Solutions
[Figure: an Individual (Chromosome) composed of GENEs]
An individual can be represented using discrete values (binary, integer, or any other system with a discrete set of values)
[Figure: example of binary DNA encoding]
53
Genetic Operators
[Figure: population at generation t transformed into generation t + 1. Labels: selection, reproduction, recombination (crossover), mutation]
54
Crossover Operator
Population: . . .
Parents (cut after the third gene): 1 1 1 | 1 1 1 1 and 0 0 0 | 0 0 0 0
Offspring: 1 1 1 0 0 0 0 and 0 0 0 1 1 1 1
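The crossover example on this slide can be reproduced with a short sketch. The function name `single_point_crossover` and the list-of-bits representation are illustrative assumptions; the parents and cut position are the ones shown above.

```python
def single_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two equal-length chromosomes at index `cut`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


parent_a = [1, 1, 1, 1, 1, 1, 1]
parent_b = [0, 0, 0, 0, 0, 0, 0]
child_1, child_2 = single_point_crossover(parent_a, parent_b, cut=3)
print(child_1)  # [1, 1, 1, 0, 0, 0, 0]
print(child_2)  # [0, 0, 0, 1, 1, 1, 1]
```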
55
Procedural Flow under Competitive Runtime Reconfiguration
[Figure: CRR procedural flowchart, summarized as steps]
– Initialization: Population partitioned into functionally-identical yet physically-distinct half-configurations
– Selection: choose FPGA configuration(s) labeled L and R
– Detection: apply functional inputs to compute FPGA outputs using L, R
– Is L = R? Discrepancy-free L, R results continue the PRIMARY LOOP
– Fitness Adjustment: update fitness of only L and R based on detection results
– Is either L's or R's fitness < Repair Threshold? YES: invoke Genetic Operators only once and only on L or R
– Adjust Controls: detection mode, overlap interval, ...
• Integrates all fault handling stages using EC strategy
– Detects faults by the occurrence of discrepancy
– Isolates faults by accumulation of discrepancies
– Failure-specific refurbishment using Genetic Operators: Intra-Module-Crossover, Inter-Module-Crossover, Intra-Module-Mutation
• Realize online device refurbishment
– Refurbished online without additional function or resource test vectors
– Repair during the normal data throughput process
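Detection by discrepancy and isolation by accumulated discrepancies can be illustrated with a small software model. Everything here is an assumption made for the sketch: half-configurations are modeled as Python functions computing a half adder, one of them has its carry output stuck at 0, fitness is adjusted by ±1, and every pair is exercised on every input instead of being chosen by the online selection process.

```python
from itertools import combinations, product


def half_adder(a, b):
    """Fault-free half adder: returns (sum, carry)."""
    return (a ^ b, a & b)


def half_adder_carry_stuck_at_0(a, b):
    """Same function realized on a faulty resource: carry stuck at 0."""
    return (a ^ b, 0)


# Hypothetical population of functionally-identical configurations.
population = {
    "cfg0": half_adder,
    "cfg1": half_adder,
    "cfg2": half_adder_carry_stuck_at_0,  # uses the faulty resource
    "cfg3": half_adder,
}
fitness = {name: 0 for name in population}

# Normal throughput inputs double as test inputs: compare L and R outputs.
pairs = combinations(population, 2)
inputs = product((0, 1), repeat=2)
for (left, right), (a, b) in product(pairs, inputs):
    agree = population[left](a, b) == population[right](a, b)
    delta = 1 if agree else -1  # assumed symmetric up/down adjustment
    fitness[left] += delta      # only L and R are updated
    fitness[right] += delta

suspect = min(fitness, key=fitness.get)
print(fitness, suspect)
```

A single discrepancy cannot say which of L or R is at fault, so both are penalized; the faulty configuration stands out only because it accumulates discrepancies across many pairings. Note that the fault is articulated by just one of the four inputs, (1, 1).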
56
Template Fault Coverage
[Figure: Half-Adder Template A and Half-Adder Template B]
Template A
– Gate3 is an AND gate
– Will lose correctness if a Stuck-At-Zero fault occurs in the second input line of Gate3
Template B
– Gate3 is a NOT gate and only uses the first input line
– Will work correctly even if the second input line is stuck at Zero or One
57
Evolvable Hardware
Evolutionary Design:
• Start with available CLBs and IOBs
• Implement a design using Genetic Operators etc.
• Limited or no ability to re-design to account for suspected faulty resources
Evolutionary Regeneration:
• Start with an existing pool of designs
• Some existing configurations may use faulty resources
• Eliminate use of suspected faulty resources
• Genetic Operators can be applied to refurbish designs
58
Competitive Runtime Reconfiguration (CRR) Overview
• Uses a Relative Fitness Measure
– Pairwise discrepancy checking yields relative fitness measure
– Broad temporal consensus in the population used to determine fitness metric
– Transition between Fitness States occurs in the population
– Provides graceful degradation in presence of changing environments, applications and inputs, since this is a moving measure
• Test Inputs = Normal Inputs for Data Throughput
– CBE does not utilize additional functional or resource test vectors
– Potential for higher availability as regeneration is integrated with normal operation
59
Exploiting Population Information
• Population contains more robust information than individuals
– Utilize this information for robust fault detection, faster regeneration, increased diversity for adaptation
• Detect Failure and Isolate Faulty Resources
– Detect by inconsistencies among the population
– Isolate faults using outlier identification and aging
• Realize Regeneration
– Recovery Complexity << Design Complexity: utilize diverse raw material during regeneration vs. isolated re-design
– Temporal consensus directs search
• Adaptable Performance based on Online Inputs
– The population evolves to changing physical environment, input vectors, and target application while increasing availability
60
Selection Process
[Figure: Selection Process flowchart, summarized as steps]
– Any Pristine individuals? YES: Select* one Pristine individual as L half-configuration
– NO: Any Suspect individuals? YES: Select** one Suspect individual as L half-configuration
– NO: Select*** one Refurbished individual as L half-configuration
– Choose random number X on [0..1] and test X > Re-introduction rate (X > R) to pick the R half-configuration, either:
   Select*** one Under Repair individual as R half-configuration, or
   Select one Operational (Pristine*, Suspect**, or Refurbished***) individual as R half-configuration
– goto Detection process
* = selection that favors inventory rotation
** = selection based on fitness ranking that favors correctness
*** = selection based on fitness ranking that favors correctness with optional second-order metric such as routing delay (to automatically evolve better throughput performance at no additional cost)
61
Fitness Adjustment Procedure
[Figure: Fitness Adjustment flowchart, summarized as steps]
– Discrepancy? NO: Increase L's & R's fitness according to fitness up-adjustment process
– Discrepancy? YES: Decrease L's & R's fitness according to fitness down-adjustment process
– After a decrease: Is the individual Pristine? YES: Mark individual as Suspect
– After a decrease: Is its fitness < Repair Threshold (fL,R < fRT)? YES: Mark individual as Under Repair; invoke Genetic Operators only once and only on L or R
– After an increase: Is individual Under Repair? YES: Is its fitness > Operational Threshold (fL,R > fOT)? YES: Mark individual as Refurbished
– adjust controls & goto Selection process
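The state transitions in this procedure can be sketched as a small state machine. This is a hedged illustration under assumptions: the threshold values, the initial fitness, the unit step size, and the names `Individual` and `adjust` are hypothetical, and the Genetic Operators are represented only by a comment.

```python
from dataclasses import dataclass

REPAIR_THRESHOLD = 2       # hypothetical value for f_RT
OPERATIONAL_THRESHOLD = 4  # hypothetical value for f_OT


@dataclass
class Individual:
    fitness: int = 5
    state: str = "Pristine"


def adjust(ind, discrepancy, step=1):
    """Apply one fitness adjustment to an individual used as L or R."""
    if discrepancy:
        ind.fitness -= step  # down-adjustment
        if ind.state == "Pristine":
            ind.state = "Suspect"
        if ind.fitness < REPAIR_THRESHOLD:
            ind.state = "Under Repair"
            # Genetic Operators would be invoked here, once, on L or R.
    else:
        ind.fitness += step  # up-adjustment
        if ind.state == "Under Repair" and ind.fitness > OPERATIONAL_THRESHOLD:
            ind.state = "Refurbished"
    return ind.state


ind = Individual()
trace = [adjust(ind, discrepancy=True) for _ in range(4)]
trace += [adjust(ind, discrepancy=False) for _ in range(4)]
print(trace, ind.fitness)
```

With these assumed values, four discrepancies take a Pristine individual through Suspect to Under Repair, and four discrepancy-free evaluations then raise its fitness above the operational threshold so that it is marked Refurbished.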
62
Discrepancy Mirror Circuit
Fault Coverage
Component | Fault Scenario 1 | Fault Scenario 2 | Fault Scenario 3 | Fault Scenario 4 | Fault-Free
Function Output A | Fault | Correct | Correct | Correct | Correct
Function Output B | Correct | Fault | Correct | Correct | Correct
XNOR A | Disagree (0) | Disagree (0) | Fault: Disagree (0) | Agree (1) | Agree (1)
XNOR B | Disagree (0) | Disagree (0) | Agree (1) | Fault: Disagree (0) | Agree (1)
Buffer A | 0 | 0 | High-Z | 0 | 1
Buffer B | 0 | 0 | 0 | High-Z | 1
Match Output | 0 | 0 | 0 | 0 | 1
63
CGT-Pruned GA Simulator
[Figure: simulator data flow. Labels: Settings, Truth Table, Seed Config. (If Repair), Resource Info, CGT, GA, Fitness Report, Best Config.]
Sample file contents:
– Settings: No. Of CLBs = ... No. LUTs = ... Pop. Size = … . . .
– Truth Table: I1 I2 ... O1 O2 ... / 0 0 ... 0 0 0 ... / 0 0 ... 0 1 0 … . . .
– Seed Config.: CLB #:0 LUT #:0 FunctionType: OR, LUT input line InputLine#0:4 InputLine#1:3 . . .
– Fitness Report: Gen. Max Ave / 2 154 142 / 3 155 139 . . .
– Best Config.: CLB #:0 LUT #:0 FunctionType: XOR, LUT input line InputLine#0:0 InputLine#1:5 . . .
64
Repair Progress