In Uncertainty Quantification, the analyst maps distributed, uncertain inputs through a non-linear simulation model and studies the relationships between those inputs and the outputs, as well as among the outputs themselves. This means careful consideration must be given to the form of the variability applied to each input parameter varied as part of the study.
Choose your distributions wisely
The first question the analyst has to ask is: “How do we know the form of the variability?” Is it the best case, backed by empirical data with confidence intervals small enough at the probability levels of interest (usually the tails: the 1st percentile or lower and the 99th percentile or higher)? And how do you know those confidence intervals are small enough?
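As a quick sanity check on that last question, a bootstrap on the tail percentile gives a feel for how tight the interval really is. Below is a minimal Python sketch, assuming your measurements live in a 1-D array called `samples`; the synthetic data here only stands in for real measurements.

```python
import numpy as np

rng = np.random.default_rng(42)
samples = rng.normal(loc=10.0, scale=2.0, size=500)  # stand-in for your empirical data

# Bootstrap the 99th percentile to see how wide its confidence interval is
n_boot = 2000
p99_estimates = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(samples, size=samples.size, replace=True)
    p99_estimates[i] = np.percentile(resample, 99)

lo, hi = np.percentile(p99_estimates, [2.5, 97.5])
print(f"99th percentile ~ {np.percentile(samples, 99):.2f}, "
      f"95% bootstrap CI: [{lo:.2f}, {hi:.2f}]")
```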
If there isn’t enough empirical data for the parameter to justify a small family of probability models, **DO NOT ASSUME THAT A UNIFORM DISTRIBUTION IS CONSERVATIVE**. Mapping through a non-linear model, in combination with the other uncertain variables, can concentrate failures away from the extremes of a range of values. The Uniform distribution assumes that every point in the range is equally likely; that may not be true in reality, and the probability mass it places in the interior could sit exactly in the region that exacerbates outliers or non-behavioral “broken” cases (violations of requirements). The analyst is usually better off taking the limits they would have chosen for the uniform distribution and treating that parameter as an epistemic variable, sampled periodically within the range. (See other blog entry for hybrid epistemic/aleatory studies)
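As a rough illustration of what that looks like in practice, here is a minimal sketch of stepping an epistemic parameter across its bounds in an outer loop while running the aleatory Monte Carlo inside it. The `model` function and all the numbers are hypothetical placeholders, not anything from a real study.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x_epistemic, x_aleatory):
    # placeholder non-linear model
    return np.exp(0.5 * x_epistemic) * x_aleatory

# Step the poorly-characterized parameter across the bounds you would
# otherwise have handed to a uniform distribution
epistemic_values = np.linspace(0.8, 1.2, 9)
for x_e in epistemic_values:
    x_a = rng.normal(loc=1.0, scale=0.1, size=10_000)  # aleatory input
    y = model(x_e, x_a)
    print(f"x_e={x_e:.2f}: P99 of output = {np.percentile(y, 99):.3f}")
```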
For each parameter, ask yourself whether the uncertainty model is properly bounded within physical limits. For example, does an infinite tail violate the laws of physics, materials science, geometry, or any other reasonable limitation at some value? How much probability lies beyond that value, in the region of non-physicality? If it does, use a truncated model to ensure the physicality of every sampled value. Depending on the type of variability, this can be either a one-sided or a two-sided truncation. When truncating, ensure that the uncertainty model’s Cumulative Distribution Function (CDF), both theoretical and empirical, still integrates to 1.0.
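For a closed-form model, most statistics libraries will do the renormalization for you. Here is a minimal sketch using scipy.stats.truncnorm with illustrative, made-up bounds, showing that the truncated CDF still reaches 1.0 at the upper physical limit.

```python
import numpy as np
from scipy.stats import truncnorm

mu, sigma = 100.0, 15.0     # nominal fit
lower, upper = 60.0, 140.0  # physical limits (example values)

# truncnorm takes the bounds expressed in standard deviations about the mean
a, b = (lower - mu) / sigma, (upper - mu) / sigma
dist = truncnorm(a, b, loc=mu, scale=sigma)

samples = dist.rvs(size=100_000, random_state=1)
print(samples.min(), samples.max())  # every sample respects the physical limits
print(dist.cdf(upper))               # CDF reaches 1.0 at the upper bound
```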
Not all parameters can be represented by neat, mathematically convenient probabilistic models. If there is a crap-ton of data but it’s very messy (e.g. multi-modal, fat-tailed), consider sampling a probability density histogram directly, with the caveat that it only represents the data recorded to date and may be incomplete. Cases like this should almost always be used to justify taking more data to ensure the variability form is appropriate, with the ultimate goal of understanding WHY the distribution looks the way it does. Perhaps it needs to be broken into multiple simpler models combined parametrically, for example. This situation usually points to an underlying model form that does not adequately capture all the phenomena necessary for accurate predictive modeling.
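If you do go the histogram route, scipy.stats.rv_histogram builds a sampleable, piecewise distribution straight from the binned data. A minimal sketch, with synthetic bimodal data standing in for real measurements:

```python
import numpy as np
from scipy.stats import rv_histogram

rng = np.random.default_rng(3)
# messy, bimodal "data" in place of real measurements
data = np.concatenate([rng.normal(5, 1, 4000), rng.normal(12, 2, 6000)])

hist = np.histogram(data, bins=50)
empirical_dist = rv_histogram(hist)  # piecewise distribution built from the bins

draws = empirical_dist.rvs(size=100_000, random_state=4)
print(draws.min(), draws.max())          # samples stay within the observed range
print(empirical_dist.cdf(data.max()))    # CDF still integrates to 1.0
```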
Consider variability of the model form
If there isn’t enough empirical data to converge on a single closed-form probability model, consider a *family* of probability models that could plausibly fit the limited data available, and run sensitivity studies against those models and their parameters (see next section). For example, the data may LOOK Gaussian, but the shoulders or tails of the empirical data don’t fit the form; perhaps there is a slight multi-modality that isn’t quite clear from the data. Whatever the case, select the most conservative form for your application and look for ways to get more empirical data to shrink the family and/or reduce the parameter uncertainties of the fitted probability forms. Studies in this area can be used to justify spending more money on physical testing, for instance, to get more empirical data or to improve test accuracy.
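One lightweight way to screen such a family is to fit several candidate forms and compare goodness-of-fit statistics. The sketch below uses scipy fits and a Kolmogorov-Smirnov statistic on a small synthetic sample; the candidate list is an illustrative choice, and KS p-values are optimistic when the parameters were fit from the same data, so treat them as a relative ranking rather than a verdict.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.lognormal(mean=0.0, sigma=0.3, size=80)  # small, vaguely bell-shaped sample

candidates = {
    "norm": stats.norm,
    "lognorm": stats.lognorm,
    "gamma": stats.gamma,
    "weibull_min": stats.weibull_min,
}
for name, dist in candidates.items():
    params = dist.fit(data)                 # maximum-likelihood fit
    ks = stats.kstest(data, name, args=params)
    print(f"{name:12s} KS statistic = {ks.statistic:.3f}, p = {ks.pvalue:.3f}")
```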
Consider the variability of closed-form probability distribution parameters
Once a probability form is chosen, look at all the empirical data and how well it fits the chosen distribution. How does it deviate from the ideal form? Is there a range of parameters that could be viable? If so, undertake a sensitivity study on those parameters, a so-called “second order probability” study: the “variability of the variability”. If there is a strong sensitivity driver (usually the case when the tails drive failure cases of extreme risk), use it to justify design modifications or additional testing, or to inform the risk assessment.
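Structurally, a second-order probability study is a double loop: an outer (epistemic) loop perturbs the fitted distribution parameters within their plausible ranges, and an inner (aleatory) loop samples the resulting distribution and records the tail quantity of interest. A minimal sketch, with made-up parameter ranges and a made-up requirement threshold:

```python
import numpy as np

rng = np.random.default_rng(11)

mu_range = (9.8, 10.2)    # plausible range on the fitted mean
sigma_range = (1.8, 2.4)  # plausible range on the fitted std dev
threshold = 16.0          # requirement (example value)

for _ in range(10):                          # outer (epistemic) loop over parameters
    mu = rng.uniform(*mu_range)
    sigma = rng.uniform(*sigma_range)
    x = rng.normal(mu, sigma, size=200_000)  # inner (aleatory) sampling
    p_fail = np.mean(x > threshold)
    print(f"mu={mu:.2f}, sigma={sigma:.2f} -> P(exceed threshold) = {p_fail:.2e}")
```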
In conclusion…
Picking a variability form for a parameter in a model under study is only the beginning: each parameter’s variability form is a model unto itself that must be justified and grounded through its own Verification, Validation, and Uncertainty Quantification. Ultimately, the best UQ has aleatory variability that can be justified by empirical data, and includes epistemic variability where necessary. Driving epistemic variability down to aleatory variability is not required, unless the epistemic variability, mapped through the non-linear model, violates a requirement and that violation poses unacceptable risk to the task at hand.