Framework for Quantification and Reliability Analysis for Layered Uncertainty using Optimization: NASA UQ Challenge

Anirban Chaudhuri, Garrett Waycaster, Taiki Matsumura, Nathaniel Price, Raphael T. Haftka
Structural and Multidisciplinary Optimization Group, University of Florida


Prioritized Observation Uncertainty Quantification (POUQ) Anirban Chaudhuri


NASA Problem Description

Combined aleatory and epistemic uncertainty:
Epistemic uncertainty: 31 θs (sub-parameters)
Aleatory uncertainty: 21 p's (parameters)

[Slide diagram: design variables and parameters feed intermediate variables, constraints, the worst-case scenario, and performance metrics.]

Toy Problem

Functions of G:
G1 = 5(−P1 + P2 − P3^0.5)
G2 = 0.7·P3

P1: constant
P2: normal distribution
P3: beta distribution
No intermediate variables.

w(p) = max(g1, g2)

Symbol | Category | Uncertainty model                             | True value
p1     | II       | Interval [0, 1]                               | p1 = 0.5
p2     | III      | Normal, −2 ≤ E[p2] ≤ 1, 0.5 ≤ V[p2] ≤ 1.1     | E[p2] = 0, V[p2] = 1
p3     | III      | Beta, 0.6 ≤ E[p3] ≤ 0.8, 0.02 ≤ V[p3] ≤ 0.04  | E[p3] = 0.7, V[p3] = 0.03

(True distribution of G1 shown on the slide.)

Task A: Uncertainty Characterization

Assumption: the distribution of each uncertain parameter is modeled as a uniform distribution.
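To make the toy problem concrete, here is a minimal Python sketch of sampling w(p) = max(g1, g2), assuming the garbled constraint expressions read G1 = 5(−P1 + P2 − P3^0.5) and G2 = 0.7·P3; the helper names and sample sizes are illustrative.

```python
import numpy as np

def sample_parameters(theta, n, rng):
    """Draw aleatory realizations for one epistemic realization theta.

    theta = (p1, mu2, var2, e3, v3): p1 is the fixed constant, (mu2, var2)
    define the normal P2, and (e3, v3) the mean/variance of the beta P3.
    """
    p1 = np.full(n, theta[0])
    p2 = rng.normal(theta[1], np.sqrt(theta[2]), n)
    # Convert the beta mean/variance into shape parameters a, b.
    e3, v3 = theta[3], theta[4]
    common = e3 * (1 - e3) / v3 - 1
    a, b = e3 * common, (1 - e3) * common
    p3 = rng.beta(a, b, n)
    return p1, p2, p3

def worst_case(theta, n=10000, seed=0):
    """w(p) = max(g1, g2) over n aleatory samples (assumed constraint forms)."""
    rng = np.random.default_rng(seed)
    p1, p2, p3 = sample_parameters(theta, n, rng)
    g1 = 5 * (-p1 + p2 - p3 ** 0.5)  # assumed reading of the garbled slide
    g2 = 0.7 * p3
    return np.maximum(g1, g2)

# True sub-parameter values from the slide: p1=0.5, E[p2]=0, V[p2]=1, E[p3]=0.7, V[p3]=0.03
w = worst_case((0.5, 0.0, 1.0, 0.7, 0.03))
```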

Approaches:
1. Bayesian-based approach
2. CDF matching approach
3. Prioritized Observation UQ approach

Bayesian-Based Approach

Uncertainty models are updated by Bayesian inference.

The marginal distribution of each sub-parameter θi is obtained by integration.

Each marginal distribution (posterior) is obtained as a sample distribution by the Markov chain Monte Carlo (MCMC) method.

θ: set of uncertain parameters
x1,obs: first set of observations
P(θ): prior distribution
L(θ|x1,obs): likelihood function
f(θ|x1,obs): posterior distribution
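The Bayesian update described above can be sketched with a simple random-walk Metropolis sampler. This is a hypothetical one-parameter illustration (inferring the mean of P2 from stand-in observations with a unit-variance Gaussian likelihood), not the authors' implementation:

```python
import numpy as np

def metropolis(log_post, theta0, n_steps, step, rng):
    """Random-walk Metropolis: returns a sample-based approximation of the posterior."""
    chain = [theta0]
    lp = log_post(theta0)
    for _ in range(n_steps):
        prop = chain[-1] + rng.normal(0.0, step)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject step
            chain.append(prop); lp = lp_prop
        else:
            chain.append(chain[-1])
    return np.array(chain)

# Illustrative setup: infer E[p2] from observations, uniform prior on [-2, 1].
rng = np.random.default_rng(0)
obs = rng.normal(0.0, 1.0, 20)  # stand-in for the 20 observations

def log_post(mu):
    if not (-2.0 <= mu <= 1.0):            # uniform prior bounds from the slide
        return -np.inf
    return -0.5 * np.sum((obs - mu) ** 2)  # Gaussian log-likelihood (unit variance)

chain = metropolis(log_post, 0.0, 5000, 0.3, rng)
posterior = chain[2500:]  # discard the first half as burn-in, as on the slides
```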

CDF Matching Approach

An eCDF is constructed using the given observations; Greenwood's formula is used for the variance of the eCDF.

The DIRECT optimizer is used (Finkel et al.).

Prioritized Observation UQ

Both performance metrics measure risk. The UQ is refined based on the amount of risk attached to each observation. The strategy is similar to the CDF matching method, except the objective function is a weighted modified K-S statistic.
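A minimal sketch of the CDF-matching idea using SciPy's DIRECT implementation (`scipy.optimize.direct`): the sub-parameters of a hypothetical one-dimensional normal model are chosen to minimize a simple KS-type distance between the model CDF and the eCDF of stand-in observations. The observation source and bounds are illustrative.

```python
import numpy as np
from scipy.optimize import direct
from scipy.stats import norm

rng = np.random.default_rng(1)
obs = np.sort(rng.normal(0.3, 1.0, 20))       # stand-in observations
ecdf = np.arange(1, len(obs) + 1) / len(obs)  # eCDF values at the sorted points

def ks_distance(theta):
    """Simple KS-type discrepancy between the model CDF and the eCDF."""
    mu, var = theta
    return np.max(np.abs(norm.cdf(obs, mu, np.sqrt(var)) - ecdf))

# Prior bounds on the sub-parameters, as in the toy problem: E[p2] and V[p2].
res = direct(ks_distance, [(-2.0, 1.0), (0.5, 1.1)])
best_mu, best_var = res.x
```

DIRECT is derivative-free and global over the box, which matches its role here: the KS objective is non-smooth, so gradient-based optimizers are a poor fit.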

WR is the weight of an observation according to the risk associated with it; it could be decided based on the J2 value. The implementation is very expensive because finding J2 requires Monte Carlo simulation. Importance sampling or surrogate-based strategies will be explored in future work.

Posterior distributions are updated using the 20 observations of G1.

Initially, the mean and variance of P2 are the most uncertain (widest ranges).

While MCMC reduced the ranges of the mean and variance of P2, the ranges of the other parameters remain largely unchanged, which makes sense.

Toy Problem Results: Posterior Distributions using Bayesian approach

MCMC found that the correlation coefficients are almost zero for most combinations of the sub-parameters. (True values marked on the slide.)

Toy Problem Results: Reduced bounds using CDF matching

Epistemic parameter | True value | Given prior  | Reduced bounds using 20 observations (median) | Reduction in median range (% of prior range)
p1                  | 0.5        | [0, 1]       | [0.0556, 0.9444]                              | 11.1%
E[p2]               | 0          | [−2, 1]      | [−0.5, 0.5]                                   | 66.7%
V[p2]               | 1          | [0.5, 1.1]   | [0.7663, 1.0988]                              | 44.6%
E[p3]               | 0.7        | [0.6, 0.8]   | [0.6037, 0.7963]                              | 3.7%
V[p3]               | 0.03       | [0.02, 0.04] | [0.0204, 0.0396]                              | 3.7%

Using the 20 observations of G1, the maximum reduction in bounds is for the mean and variance of P2 — similar results as the Bayesian approach.

Toy Problem Results: Effects of Number of Observations

K-S test rejection percentage:

Prior | Bayesian (5 obs) | Bayesian (20 obs) | CDF matching (5 obs) | CDF matching (20 obs)
60.8% | 69%              | 0.7%              | 85.3%                | 13.8%

The rejection rate is substantially reduced by both approaches when 20 observations are used. Procedure: create an eCDF using 1000 samples, then perform a K-S test of the hypothesis that the CDF is the same as the eCDF of all 20 given observations. (Note: How many thetas from MCMC?)

NASA Problem Results: Posterior Distribution using Bayesian approach

Posterior by first 25 observations

Posterior by all 50 observations (correlation coefficients are almost zero).

NASA Problem Results: Reduced bounds using CDF matching

Using the first 25 observations:

Symbol | Given prior  | Reduced uncertainty model | Reduction in range (% of prior range)
E[p1]  | [0.6, 0.8]   | [0.6012, 0.7444]          | 28.4%
V[p1]  | [0.02, 0.04] | [0.0209, 0.0344]          | 32.1%
p2     | [0, 1]       | [0.1173, 0.9983]          | 12.4%
E[p4]  | [−5, 5]      | [−4.8148, 4.4444]         | 7.4%
V[p4]  | [0.0025, 4]  | [0.0765, 3.9589]          | 2.9%
E[p5]  | [−5, 5]      | [−4.4444, 0]              | 55.6%
V[p5]  | [0.0025, 4]  | [0.6688, 3.7779]          | 22.2%
ρ      | [−1, 1]      | [−0.6914, 0.8889]         | 21%

Using all 50 observations:

Symbol | Given prior  | Reduced uncertainty model | Reduction in range (% of prior range)
E[p1]  | [0.6, 0.8]   | [0.6267, 0.7667]          | 30%
V[p1]  | [0.02, 0.04] | [0.0231, 0.04]            | 15.6%
p2     | [0, 1]       | [0.1296, 0.9979]          | 13.2%
E[p4]  | [−5, 5]      | [−4.8148, 3.2922]         | 18.9%
V[p4]  | [0.0025, 4]  | [0.1423, 3.9260]          | 5.3%
E[p5]  | [−5, 5]      | [−4.4444, 0.0412]         | 55.1%
V[p5]  | [0.0025, 4]  | [1.5571, 3.9424]          | 40.3%
ρ      | [−1, 1]      | [−0.6667, 0.8916]         | 22.1%

NASA Problem Results: Effects of Number of Observations

K-S test rejection percentage:

Prior | Bayesian (25 obs) | Bayesian (50 obs) | CDF matching (25 obs) | CDF matching (50 obs)
62.4% | 51.8%             | 37.6%             | 41.9%                 | 30.8%

The rejection rate is reduced by both approaches as compared to the prior. Procedure: create an eCDF using 1000 samples, then perform a K-S test of the hypothesis that the CDF is the same as the eCDF of all 50 given observations.

Task B: Sensitivity Analysis

Primary objectives:

Effect of reduced sub-parameter bounds on intermediate variable uncertainty

Fix parameter values without error in intermediate variables

Effect of reduced bounds on range of values of interest, J1 and J2

Fix parameter values without error in the range of J1 or J2.

Intermediate Variable Sensitivity

Empirical estimate of the p-box of intermediate variable x, using double-loop Monte Carlo simulation: sample sub-parameter values within the bounds and subsequent parameter realizations. Then reduce the range of the sub-parameters by 25% and repeat the process in three ways:
Reduce the upper bound.
Increase the lower bound.
Centered reduction.

This is a sensitivity analysis based on changes in the bounds of a variable, rather than its value. The average change in the area of the p-box brought about by these three reductions is a measure of the sensitivity of these bounds.
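A rough sketch of the double-loop empirical p-box and the area-based sensitivity measure, using a hypothetical one-dimensional intermediate-variable model whose only sub-parameter is an uncertain mean; all names and sample counts are illustrative.

```python
import numpy as np

def empirical_pbox(bounds, model, n_outer=200, n_inner=500, seed=0):
    """Double-loop MC: the outer loop samples the sub-parameter within its bounds,
    the inner loop samples the aleatory variable; returns lower/upper CDF envelopes."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-5, 5, 101)
    lo, hi = np.ones_like(grid), np.zeros_like(grid)
    for _ in range(n_outer):
        mu = rng.uniform(*bounds)                    # epistemic: uncertain mean
        x = model(rng.normal(mu, 1.0, n_inner))      # aleatory: normal realizations
        cdf = np.searchsorted(np.sort(x), grid, side="right") / n_inner
        lo, hi = np.minimum(lo, cdf), np.maximum(hi, cdf)
    return grid, lo, hi

def pbox_area(grid, lo, hi):
    """Area between the p-box envelopes (simple mean-width estimate)."""
    return float(np.mean(hi - lo) * (grid[-1] - grid[0]))

model = lambda p: 0.5 * p  # hypothetical intermediate-variable function
g, lo, hi = empirical_pbox((-1.0, 1.0), model)
a_initial = pbox_area(g, lo, hi)
# 25% centered reduction of the sub-parameter bounds, as described above.
g2, lo2, hi2 = empirical_pbox((-0.75, 0.75), model)
a_revised = pbox_area(g2, lo2, hi2)
```

The change from `a_initial` to `a_revised` is the per-reduction quantity that the slides average over the three bound reductions.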

[Figure: initial vs. revised p-box areas, A_initial and A_revised.]

J1 and J2 Range Sensitivity

For J1 and J2, we use the range of values from Monte Carlo simulation.

Surrogate models are used to reduce the computational cost of J1 and J2.

Parameters are ranked based on each parameter's sensitivity on J1 and J2 using a rank sum score.

The surrogates are fitted using 5th-order polynomial response surfaces. Using the normalized root mean square error (NRMSE) as a measure of goodness of fit, we continue to add test points until each of the fits converges to a reasonable NRMSE (~8%).
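The add-points-until-NRMSE-converges loop can be sketched in one dimension; the test function, point counts, and 8% threshold placement are illustrative, and NumPy's `Polynomial.fit` stands in for the authors' response-surface fitting:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)
true_fn = lambda x: np.sin(3 * x)  # hypothetical expensive J(theta) response

def nrmse(model, x_test, y_test):
    """Normalized root mean square error, the goodness-of-fit measure on the slides."""
    err = model(x_test) - y_test
    return np.sqrt(np.mean(err ** 2)) / (y_test.max() - y_test.min())

# Keep adding points until the fit reaches a reasonable NRMSE (~8% on the slides).
n, e = 10, 1.0
while e >= 0.08 and n <= 500:
    x = rng.uniform(-1, 1, n)
    surrogate = Polynomial.fit(x, true_fn(x), 5)  # 5th-order response surface
    x_test = rng.uniform(-1, 1, 100)
    e = nrmse(surrogate, x_test, true_fn(x_test))
    n += 10
```

Normalizing by the output range makes the 8% threshold comparable across J1 and J2, which have very different scales.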

Fixing Parameter Values

We use DIRECT global optimization to maximize the remaining uncertainty (either p-box area or J1/J2 range) while fixing a single parameter.

We generate an initial large random sample of all parameters and replace one parameter with a constant.
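The fix-one-parameter check can be sketched with `scipy.optimize.direct` (SciPy's DIRECT implementation) on a hypothetical two-parameter output function: fixing an unimportant parameter barely changes the attainable J range, while fixing an important one collapses it. Function and bounds are made up for illustration.

```python
import numpy as np
from scipy.optimize import direct

# Hypothetical output: J depends strongly on theta[0], weakly on theta[1].
J = lambda theta: theta[0] ** 2 + 0.01 * theta[1]

def remaining_range(fixed_idx, fixed_val, bounds):
    """Width of the attainable J range when parameter fixed_idx is held at
    fixed_val: maximize and minimize J over the remaining parameters via DIRECT."""
    free = [b for i, b in enumerate(bounds) if i != fixed_idx]

    def embed(x):
        theta = list(x)
        theta.insert(fixed_idx, fixed_val)
        return theta

    j_min = direct(lambda x: J(embed(x)), free).fun
    j_max = -direct(lambda x: -J(embed(x)), free).fun
    return j_max - j_min

bounds = [(-1.0, 1.0), (-1.0, 1.0)]
# Over the full box, J ranges over roughly [-0.01, 1.01] (width ~1.02).
r1 = remaining_range(1, 0.0, bounds)  # fix the unimportant parameter: range barely shrinks
r0 = remaining_range(0, 0.0, bounds)  # fix the important parameter: range collapses
```

Comparing each `remaining_range` against the full-box range is the slides' criterion: parameters whose fixed-value range stays close to the initial value are safe to fix.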

We fix parameters where the optimized uncertainty measure is close to the initial value. We use an aggressive threshold in order to fix as many parameters as possible, thereby reducing the computational burden of our Monte Carlo simulation.

Toy Problem Results

Monte Carlo simulation introduces some error, as should be expected.

The method is able to accurately rank the sensitivities of each of the parameter bounds, and suggests fixing unimportant parameters at reasonable values.

Percent change | Effect on J1 | Effect on J2
Parameter 1    | 43.0%        | 21.8%
Parameter 2    | 86.3%        | 44.1%
Parameter 3    | 85.8%        | 44.2%

G1 = 5(−P1 + P2 − P3^0.5)
G2 = 0.7·P3

P1: constant; P2: normal distribution; P3: beta distribution.

(Notes: We have to show which one is sensitive in the true sense. I am not sure how I see that the unimportant parameters are fixed at reasonable values.)

NASA Problem Results: Revised uncertainty model

Parameter | Percent change in J1 | Percent change in J2
1         | 40%                  | 88%
6         | 13%                  | 48%
10        | 6%                   | 53%
12        | 19%                  | 47%
16        | 25%                  | 64%
18        | 33%                  | 89%
20        | 24%                  | 55%
21        | 51%                  | 115%

Initial intermediate variable analysis: we are able to fix nine parameters: 2, 4, 5, 7, 8, 13, 14, 15, and 17. Based on their expected impact on both J1 and J2, we select revised models for parameters 1, 16, 18, and 21.

(Note: I am not sure we need the absolute value of changes in J1 and J2.)

Tasks C & D: Uncertainty Propagation & Extreme Case Analysis

Primary objectives:

Uncertainty Propagation — find the range of J1 and J2.

Extreme Case Analysis — find the epistemic realizations that yield extreme J1 and J2 values, and find a few representative realizations of x leading to J2 > 0.

Double Loop Monte Carlo Sampling (DLS)

The parameter loop samples the sub-parameters (epistemic uncertainty): 31 distribution parameters. The probability loop samples the parameters (aleatory uncertainty): 17 parameters (p's). Challenge: computationally expensive.

Efficient Reliability Re-Analysis (ERR) (Importance Sampling Method)

Full double-loop MCS is infeasible: the black-box function g = f(x, d_baseline) is computationally expensive.
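For reference, the DLS structure described above can be sketched as follows, with a hypothetical scalar constraint standing in for the black-box g and simple stand-ins for the two metrics; the 400/1,000 sample counts match those reported for the NASA problem later in the slides.

```python
import numpy as np

def double_loop(n_outer=400, n_inner=1000, seed=4):
    """Double-loop MCS sketch: the outer loop samples an epistemic sub-parameter
    (here an uncertain mean), the inner loop samples the aleatory parameter and
    evaluates a hypothetical constraint g."""
    rng = np.random.default_rng(seed)
    j1, j2 = [], []
    for _ in range(n_outer):
        mu = rng.uniform(-1.0, 1.0)               # epistemic realization
        g = rng.normal(mu, 1.0, n_inner) - 2.0    # hypothetical constraint values
        j1.append(g.max())                        # stand-in worst-case metric
        j2.append(np.mean(g > 0.0))               # probability of failure
    return np.array(j1), np.array(j2)

j1, j2 = double_loop()
j1_range = (j1.min(), j1.max())  # ranges over the epistemic realizations
j2_range = (j2.min(), j2.max())
```

The cost is n_outer × n_inner constraint evaluations, which is exactly what ERR tries to avoid.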

Instead of re-evaluating the constraints at each epistemic realization, we weight existing points based on likelihood. This is not importance sampling in the traditional sense (i.e., there is no importance region).
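The reweighting idea can be sketched in one dimension, in the spirit of the efficient-reanalysis formulation of [1]; the constraint, densities, and epistemic realization here are all illustrative:

```python
import numpy as np
from scipy.stats import norm

# Constraint values g(p) are computed once for p drawn from a broad sampling
# density; each new epistemic realization is handled by reweighting those
# stored points with a likelihood ratio instead of re-running the model.
rng = np.random.default_rng(5)
p = rng.normal(0.0, 4.5, 100_000)   # broad initial sampling distribution
g = p - 2.0                         # hypothetical constraint, evaluated once
q = norm.pdf(p, 0.0, 4.5)           # sampling density at the stored points

def pof(mu, sigma):
    """P(g > 0) under the epistemic realization N(mu, sigma), no re-evaluation."""
    w = norm.pdf(p, mu, sigma) / q  # likelihood-ratio weights
    return float(np.mean(w * (g > 0.0)))

pf = pof(1.0, 1.0)  # here the exact answer is P(N(1,1) > 2) = 1 - Phi(1)
```

The accuracy hinges on the initial samples covering the epistemic realization well, which is exactly the failure mode reported later for the 21-dimensional NASA problem.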

How do we handle fixed but unknown constants that lie within a given interval? Generate initial p samples over the entire range, [0, 1], and use a narrow normal distribution as the true pdf: pi ~ N(θi, 0.25θi).

[1] Farizal, F., and Efstratios Nikolaidis. "Assessment of Imprecise Reliability Using Efficient Probabilistic Reanalysis." System 2013: 10-17.

Optimized Parameter Sampling

The stochastic MCS-based objective function is replaced with a deterministic function.

Validation of ERR Method on Toy Problem

Method      | DLS           | ERR           | Optimization with ERR | MCS with optimized epistemic realizations
Range of J1 | [−0.10, 4.71] | [−0.11, 4.50] | [−0.10, 5.22]         | [−0.10, 4.82]
Range of J2 | [0.23, 0.94]  | [0.13, 0.92]  | [0.11, 1]             | [0.22, 0.96]

An MCS was performed using the epistemic realizations from the optimization (last column). The ERR method performed well compared to the more expensive DLS method for the toy problem.

(We just used fmincon here, as it is a simple problem.)

Results of DLS for NASA problem

Results show a significant reduction in the range of J1. It was only possible to use a small number of samples due to the computational time required: 400 samples of the epistemic uncertainty and 1,000 samples of the aleatory uncertainty. Can we trust these results with such a small sample size?

Results of ERR method: NASA problem

The ERR results didn't correspond very well with the DLS results.

Method                                  | Range of J1    | Range of J2
DLS                                     | [0.02, 5.19]   | [0.08, 0.70]
ERR                                     | [0, 2.72]      | [0, 1]
DLS with x-to-g 5th-order PRS surrogate | [11.01, 33.26] | [0.36, 0.78]

Limitations of the current importance sampling based approach

Good agreement with double-loop sampling results for the toy problem, but not for the NASA problem.

We hypothesize that the poor performance of the importance sampling based approach is due to:
Difficulty in creating an initial set of samples with good coverage of the 21-dimensional space (limited samples).
Fixed but unknown constant parameters that were modeled using narrow normal distributions.

Possible fixes:
Dimensionality reduction by fixing parameters through sensitivity analysis.
Use of surrogates to reduce computational time.

Summary

Uncertainty quantification using a given set of samples was successfully performed using a Bayesian approach and a CDF matching approach.

P-box area / reduction in range was used as the criterion to decide the sensitivity of the parameters.

An importance sampling based approach was utilized for uncertainty propagation and extreme case analysis.

A simpler toy problem was used to validate all our methods, increasing our confidence in them.

Thank You

Questions?

Back-Up Slides

Reduced bounds using CDF matching (process repeated 50 times):

Epistemic parameter | True value | Given prior  | Reduced bounds using 5 observations (median) | Reduction in median range (% of prior range) | Std. dev. of lower bound | Std. dev. of upper bound
p1                  | 0.5        | [0, 1]       | [0.0556, 0.9444]                             | 11.1%                                        | 0.0425                   | 0.0201
E[p2]               | 0          | [−2, 1]      | [−0.5, 0.5]                                  | 66.7%                                        | 0.1410                   | 0.0078
V[p2]               | 1          | [0.5, 1.1]   | [0.7663, 1.0988]                             | 44.6%                                        | 0.1382                   | 0.0586
E[p3]               | 0.7        | [0.6, 0.8]   | [0.6037, 0.7963]                             | 3.7%                                         | 0.0032                   | 0.0073
V[p3]               | 0.03       | [0.02, 0.04] | [0.0204, 0.0396]                             | 3.7%                                         | 0.0014                   | 0.0007

MCMC Implementation (Backup Slide)

Metropolis MCMC is used: 20 MCMC runs (m = 20) with different starting points*, 10,000 posterior samples each (2n = 10,000), with the first 5,000 samples discarded for accuracy.

The proposal distribution* is a normal distribution with a standard deviation of 10% of the prior range.

1,000 random samples* are generated to construct an empirical PDF of G1 for calculating the likelihood.

The likelihood (empirical PDF) is calculated by kernel density estimation (MATLAB ksdensity).
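A Python analogue of this step using `scipy.stats.gaussian_kde` in place of MATLAB's ksdensity, with a hypothetical stand-in model for G1; the model, observation source, and sub-parameter values are illustrative:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)

def log_likelihood(theta, observations, n_model=1000):
    """For a candidate theta = (mu, sigma) of a hypothetical G1 model: build an
    empirical PDF of G1 from model samples and evaluate it at the observations."""
    g1_samples = rng.normal(theta[0], theta[1], n_model)  # 1,000 model runs per theta
    kde = gaussian_kde(g1_samples)                        # kernel density estimate of the G1 PDF
    dens = np.maximum(kde(observations), 1e-300)          # guard against log(0)
    return float(np.sum(np.log(dens)))

obs = rng.normal(0.0, 1.0, 20)  # stand-in for the 20 G1 observations
# The likelihood should prefer sub-parameters near the truth (0, 1).
ll_true = log_likelihood((0.0, 1.0), obs)
ll_off = log_likelihood((3.0, 1.0), obs)
```

Because each evaluation re-samples the model, the likelihood is noisy — one of the "sources of noise in the output" flagged on the slide.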

* Sources of noise in the output.

MCMC Convergence (Backup Slide)

Potential scale reduction factor: R̂ = sqrt( (((n−1)/n)·W + B/n) / W ), where W is the within-chain variance and B is the between-chain variance.

Correlations between sub-parameters (E[p1], V[p1], p2, E[p4], V[p4], E[p5], V[p5]): all correlation-matrix entries lie between −0.06 and 0.08, i.e., essentially zero.

Task B Summary

Evaluating sensitivity using p-box area and range as metrics to quantify changes
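The potential scale reduction factor can be computed as follows; this is the standard Gelman-Rubin diagnostic, sketched here with synthetic chains rather than the actual MCMC output:

```python
import numpy as np

def psrf(chains):
    """Potential scale reduction factor (Gelman-Rubin R-hat) for m chains of length n."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled posterior-variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(7)
# Well-mixed chains (all sampling the same target) give R-hat close to 1.
mixed = rng.normal(0.0, 1.0, (20, 5000))   # 20 chains, as on the slides
r_mixed = psrf(mixed)
# Chains stuck at different locations give R-hat well above 1.
stuck = mixed + np.arange(20)[:, None]
r_stuck = psrf(stuck)
```

Values near 1 indicate the 20 runs have converged to the same posterior; large values flag non-convergence.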

Surrogate models are utilized to reduce the computational expense of the double loop simulation

Parameter values are fixed by optimizing the remaining uncertainty using DIRECT global optimization

Refined models are requested based on the rank sum score of each parameter for both values of interest, J1 and J2.

Though the Monte Carlo simulation and surrogate models introduce errors through approximation, our simple toy problem suggests this method is still adequate to provide rankings of parameter sensitivity. (Note: Do you use surrogates for the toy problem? — last point.)

Other Methods That Were Tried

P-box convolution sampling: requires replacing the distributional p-box with a free p-box.

Failure domain bounding (homothetic deformations): the NASA UQ toolbox for Matlab has a steep learning curve, and the theoretical background is challenging.

Replacing the x-to-g function with surrogates: requires 8 surrogates (one for each constraint function) in 5-dimensional space. Exploration of the functions indicates delta-function-type behavior that is difficult to fit with a surrogate; attempts at creating PRS and kriging surrogates resulted in poor accuracy.

Importance Sampling Formulation

Worst-case requirement metric:

Similarly, for the probability of failure.

Sampling Distributions

19 p's are bounded between 0 and 1 (beta, uniform, or constant); a uniform sampling distribution is used.

2 p's are normally distributed and possibly correlated, so the samples must cover a large range: −5 ≤ E[pi] ≤ 5 and 1/400 ≤ V[pi] ≤ 4. An uncorrelated multivariate normal distribution with mean 0 and standard deviation 4.5 is used.

8 constraint functions are evaluated for 1e6 realizations of p.

Epistemic Realizations Corresponding to J1/J2 Extrema: Toy Problem

Sub-parameter | Min J1 | Max J1 | Min J2 | Max J2
p1            | 0.60   | 0.07   | 1.00   | 0.03
E[p2]         | −2.00  | 1.00   | −1.43  | 1.00
V[p2]         | 0.52   | 1.10   | 0.50   | 0.50
E[p3]         | 0.80   | 0.60   | 0.80   | 0.60
V[p3]         | 0.03   | 0.02   | 0.02   | 0.02

Given uncertainty model

[Figure: J1 and J2 for the updated uncertainty model.]

NASA Problem ERR Error

Percent error (%)     | Worst-case requirement metric (J1) | Probability of failure (J2)
Percent error in mean | 75%                                | 29%
Percent error at max  | 97%                                | 84%
Percent error at min  | 70%                                | 40%
Max percent error     | 4,110%                             | 5,670%

Percent error between MCS estimates for J1 and J2 using 1,000 p samples and ERR estimates using 1e6 initial samples.