[1]:
import emat
import numpy
from matplotlib import pyplot as plt
[2]:
from emat.util.distributions import pert, triangle, uniform, get_bounds
This page reviews some common continuous distributions used for exploratory and risk analysis.
EMAT can also use any named continuous distribution from the scipy.stats module.
Uniform Distribution
The uniform distribution is defined by a probability density function that is a rectangle. It is parameterized using two parameters (minimum, maximum). It is a simple distribution that is easy to understand and explain, and is often assumed as the implied default distribution for exploratory analysis.
[3]:
y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 4
        min: 1
        max: 4
        dist: uniform
        dtype: float
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = (0,5)
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)
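For reference, the rectangle plotted above has a constant height of 1 / (max - min) between the bounds and zero elsewhere. The following is a minimal plain-Python sketch of that density; the helper name `uniform_pdf` is made up for illustration and is not part of EMAT or scipy.

```python
def uniform_pdf(x, lower, upper):
    """Density of a uniform distribution on [lower, upper].

    Illustrative helper only; not an EMAT or scipy function.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    # Constant height inside the bounds, zero outside.
    return 1.0 / (upper - lower) if lower <= x <= upper else 0.0


# Bounds taken from the scope definition above (min=1, max=4).
heights = [uniform_pdf(v, 1, 4) for v in (0.5, 1.0, 2.5, 4.0, 4.5)]
print(heights)
```

In scipy.stats terms, the same distribution would be described with loc equal to the minimum and scale equal to the width (maximum minus minimum).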
It is also valid to include the min and max values under the dist key, instead of as top-level keys for the parameter definition.
[4]:
y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 4
        dist:
            name: uniform
            min: 1
            max: 4
        dtype: float
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = (0,5)
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)
Triangle Distribution
The triangle distribution is defined by a probability density function that is a triangle. It is parameterized using three parameters (minimum, peak, maximum). It is a simple distribution that is easy to understand and explain, and unlike the uniform distribution, it allows more likelihood to be directed towards some particular value.
[5]:
x = numpy.linspace(0,5)
plt.plot(x, triangle(lower_bound=0, upper_bound=5, peak=0.0).pdf(x), label='Peak=0.0')
plt.plot(x, triangle(lower_bound=0, upper_bound=5, peak=0.5).pdf(x), label='Peak=0.5')
plt.plot(x, triangle(lower_bound=0, upper_bound=5, peak=1.0).pdf(x), label='Peak=1.0')
plt.plot(x, triangle(lower_bound=0, upper_bound=5, peak=2.5).pdf(x), label='Peak=2.5')
_=plt.legend()
emat.util.distributions.triangle(lower_bound, upper_bound=None, *, rel_peak=None, peak=None, width=None)
Generate a frozen scipy.stats.triang distribution.
This function provides the same actual distribution as triang, but offers multiple intuitive ways to identify the peak and upper bound of the distribution, while triang is less flexible and intuitive (it must be defined using arguments labeled as ‘c’, ‘loc’, and ‘scale’).
Parameters:
- lower_bound (numeric) – The lower bound of the distribution.
- upper_bound (numeric, optional) – The upper bound of the distribution. Can be inferred from width if not given.
- rel_peak (numeric, optional) – The relative position of the peak of the triangle. Must be in the range (0,1). If neither peak nor rel_peak is given, a default value of 0.5 is used.
- peak (numeric, optional) – The location of the peak of the triangle, given as a particular value, which must be between the lower and upper bounds inclusive.
- width (numeric, optional) – The distance between the lower and upper bounds. Can be inferred from those values if not given.
Returns: scipy.stats.rv_frozen
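To make the relationship to the ‘c’, ‘loc’, and ‘scale’ arguments concrete, the sketch below converts a (lower bound, upper bound, peak) description into the three values that scipy.stats.triang expects. The helper `triangle_to_triang` is hypothetical and written only for illustration; it is an assumption about the mapping and not a copy of EMAT's implementation.

```python
def triangle_to_triang(lower_bound, upper_bound, peak=None, rel_peak=None):
    """Map (lower, upper, peak) to the c/loc/scale arguments of scipy.stats.triang.

    Hypothetical helper for illustration; not EMAT's own code.
    """
    scale = upper_bound - lower_bound  # width of the distribution
    if scale <= 0:
        raise ValueError("upper_bound must be greater than lower_bound")
    if peak is not None:
        if not lower_bound <= peak <= upper_bound:
            raise ValueError("peak must lie between the bounds")
        # triang's shape parameter is the peak position as a fraction of the width
        c = (peak - lower_bound) / scale
    elif rel_peak is not None:
        c = rel_peak
    else:
        c = 0.5  # symmetric triangle when no peak is specified
    return {"c": c, "loc": lower_bound, "scale": scale}


params = triangle_to_triang(0, 5, peak=4)
print(params)
```

With bounds of 0 and 5 and a peak at 4, the shape parameter is the peak's relative position, 4 / 5. If scipy is installed, the resulting dictionary could be passed along as scipy.stats.triang(**params).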
[6]:
y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 4
        min: 0
        max: 5
        dist:
            name: triangle
            peak: 4
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = get_bounds(s['uncertain_variable_name'])
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)
It is also valid to include the min and max values under the dist key, instead of as top-level keys for the parameter definition.
[7]:
y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 4
        dist:
            name: triangle
            min: 0
            peak: 4
            max: 5
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = get_bounds(s['uncertain_variable_name'])
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)
PERT Distribution
The PERT distribution (“PERT” is an acronym for “program evaluation and review technique”) is a generally bell-shaped curve that, unlike the normal distribution, has finite minimum and maximum values. It can be parameterized similarly to the triangular distribution, using three parameters (minimum, peak, maximum). This allows a skew to be introduced by setting the peak value to something other than the midpoint between the minimum and maximum values.
[8]:
plt.plot(x, pert(lower_bound=0, upper_bound=5, peak=0.0).pdf(x), label='Peak=0.0')
plt.plot(x, pert(lower_bound=0, upper_bound=5, peak=0.5).pdf(x), label='Peak=0.5')
plt.plot(x, pert(lower_bound=0, upper_bound=5, peak=1.0).pdf(x), label='Peak=1.0')
plt.plot(x, pert(lower_bound=0, upper_bound=5, peak=2.5).pdf(x), label='Peak=2.5')
_=plt.legend()
The relative peakiness (i.e., kurtosis) of the distribution can be controlled using the gamma parameter. The default value of gamma for a PERT distribution is 4.0, but other positive numbers can be used as well: higher values favor outcomes near the peak, while smaller values give less pronounced weight to values near the peak and relatively more weight to the tails. In the limit, setting gamma to zero results in a uniform distribution.
[9]:
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=1).pdf(x), label='gamma=1')
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=2).pdf(x), label='gamma=2')
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=3).pdf(x), label='gamma=3')
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=4).pdf(x), label='gamma=4', lw=3.0)
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=5).pdf(x), label='gamma=5')
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=10).pdf(x), label='gamma=10')
_=plt.legend()
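The role of gamma described above can be sketched with the commonly cited modified-PERT parameterization, in which the distribution is a beta distribution rescaled to the interval between the bounds. The helper `pert_to_beta` below is hypothetical, and the formulas are an assumption based on that standard parameterization rather than a transcription of EMAT's source code.

```python
def pert_to_beta(lower_bound, upper_bound, peak, gamma=4.0):
    """Return (alpha, beta, mean) for a modified-PERT distribution.

    Assumes the standard modified-PERT formulas; illustrative only,
    not EMAT's own implementation.
    """
    scale = upper_bound - lower_bound
    if scale <= 0:
        raise ValueError("upper_bound must be greater than lower_bound")
    if not lower_bound <= peak <= upper_bound:
        raise ValueError("peak must lie between the bounds")
    # Shape parameters of the underlying beta distribution on [0, 1]
    alpha = 1.0 + gamma * (peak - lower_bound) / scale
    beta = 1.0 + gamma * (upper_bound - peak) / scale
    # Mean of the rescaled distribution: the peak is weighted by gamma
    mean = (lower_bound + gamma * peak + upper_bound) / (gamma + 2.0)
    return alpha, beta, mean


# gamma = 0 gives alpha = beta = 1, which is the uniform distribution.
print(pert_to_beta(0, 5, peak=2.5, gamma=0))
# A larger gamma pulls the mean toward the peak.
print(pert_to_beta(0, 5, peak=4, gamma=3))
```

Under this parameterization, both shape parameters equal 1 when gamma is zero, which is exactly the uniform case mentioned above, and increasing gamma concentrates probability mass around the peak.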
emat.util.distributions.pert(lower_bound, upper_bound=None, *, rel_peak=None, peak=None, width=None, gamma=4.0)
Generate a frozen scipy.stats.beta PERT distribution.
For details on the PERT distribution see Wikipedia.
Parameters:
- lower_bound (numeric) – The lower bound of the distribution.
- upper_bound (numeric, optional) – The upper bound of the distribution. Can be inferred from width if not given.
- rel_peak (numeric, optional) – The relative position of the peak of the triangle. Must be in the range (0,1). If neither peak nor rel_peak is given, a default value of 0.5 is used.
- peak (numeric, optional) – The location of the peak of the triangle, given as a particular value, which must be between the lower and upper bounds inclusive.
- width (numeric, optional) – The distance between the lower and upper bounds. Can be inferred from those values if not given.
Returns: scipy.stats.rv_frozen
The PERT distribution can be indicated in a yaml scope file using the name “pert”, with optional values for other named arguments outlined in the function docstring shown above.
[10]:
y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 1.0
        min: 0
        max: 5
        dist:
            name: pert
            peak: 4
            gamma: 3
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = get_bounds(s['uncertain_variable_name'])
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)
It is also valid to include the min and max values under the dist key, instead of as top-level keys for the parameter definition.
[11]:
y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 1.0
        dist:
            name: pert
            min: 0
            max: 5
            peak: 4
            gamma: 3
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = get_bounds(s['uncertain_variable_name'])
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)
Other Distributions
It is possible to use any other continuous distribution provided in the scipy.stats module.
As a demonstration, below we define a trapezoidal distribution for an uncertainty. Instead of using the more intuitively named keys shown above, it is necessary to fall back to the standard scipy.stats names for each of the distribution parameters, and they must all be defined within the dist key, which may be less intuitive than the suggested distributions above. For example, note in the example below that the upper bound of the distribution is implicitly set to 7 (loc + scale = 2 + 5) based on the parameters, and that upper bound is not explicitly identified in the YAML file.
[12]:
y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 1.0
        dist:
            name: trapz
            c: 0.2
            d: 0.5
            loc: 2
            scale: 5
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = get_bounds(s['uncertain_variable_name'])
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)
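Because the bounds are only implied by the scipy.stats parameters, it can help to work them out by hand before writing the scope file. The sketch below does this for the trapezoidal parameters used above, following the scipy.stats convention that the density rises from loc to loc + c * scale, is flat until loc + d * scale, and falls to zero at loc + scale. The helper name `trapezoid_breakpoints` is made up for illustration and is not part of EMAT or scipy.

```python
def trapezoid_breakpoints(c, d, loc=0.0, scale=1.0):
    """Return (lower, flat_start, flat_end, upper) for trapezoid parameters.

    Illustrative helper only; follows the scipy.stats loc/scale convention.
    """
    if not 0 <= c <= d <= 1:
        raise ValueError("parameters must satisfy 0 <= c <= d <= 1")
    if scale <= 0:
        raise ValueError("scale must be positive")
    lower = loc
    flat_start = loc + c * scale  # end of the rising edge
    flat_end = loc + d * scale    # start of the falling edge
    upper = loc + scale
    return lower, flat_start, flat_end, upper


# Parameters from the scope definition above.
points = trapezoid_breakpoints(c=0.2, d=0.5, loc=2, scale=5)
print(points)
```

For the values in the example, the lower bound is 2 and the upper bound is 7, which matches the implicit upper bound noted in the text.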