[1]:

import emat
import numpy
from matplotlib import pyplot as plt

[2]:

from emat.util.distributions import pert, triangle, uniform, get_bounds

This page reviews some common continuous distributions used for exploratory and risk analysis. EMAT can also use any named continuous distribution from the scipy.stats module.

Uniform Distribution

The uniform distribution is defined by a probability density function that is a rectangle. It is parameterized using two parameters (minimum, maximum). It is a simple distribution that is easy to understand and explain, and is often treated as the implicit default distribution for exploratory analysis.
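As a minimal sketch of what "a rectangle" means numerically, the density is constant at 1 / (max - min) inside the bounds and zero outside, so the total area is 1. `uniform_pdf` is a hypothetical helper written only for illustration; in scipy.stats the same distribution is `scipy.stats.uniform(loc=min, scale=max - min)`.

```python
def uniform_pdf(x, minimum, maximum):
    """Density of a uniform distribution on [minimum, maximum] (illustrative helper)."""
    if maximum <= minimum:
        raise ValueError("maximum must exceed minimum")
    # The rectangle is (maximum - minimum) wide, so its height must be
    # 1 / (maximum - minimum) for the total area to equal 1.
    if minimum <= x <= maximum:
        return 1.0 / (maximum - minimum)
    return 0.0

# Same bounds as the scope example that follows (min: 1, max: 4).
print(uniform_pdf(2.5, 1, 4))  # a point inside the bounds
print(uniform_pdf(0.5, 1, 4))  # a point outside the bounds
```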

[3]:

y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 4
        min: 1
        max: 4
        dist: uniform
        dtype: float
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = (0,5)
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)

It is also valid to include the min and max values under the dist key, instead of as top-level keys in the parameter definition.

[4]:

y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 4
        dist:
            name: uniform
            min: 1
            max: 4
        dtype: float
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = (0,5)
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)
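Both YAML layouts shown above carry the same information. The sketch below illustrates that equivalence with a hypothetical `resolve_bounds` helper operating on the parsed YAML as plain Python dicts; it is not EMAT's own loader, and the precedence it uses (keys under dist checked first) is an assumption made only for illustration.

```python
def resolve_bounds(param_def):
    """Return (min, max) from either YAML layout (hypothetical helper).

    Keys nested under 'dist' are checked first, then top-level keys;
    this precedence is an assumption, not documented EMAT behavior.
    """
    dist = param_def.get("dist")
    # 'dist' may be a plain name string (e.g. "uniform") or a mapping of keys.
    dist_keys = dist if isinstance(dist, dict) else {}
    lower = dist_keys.get("min", param_def.get("min"))
    upper = dist_keys.get("max", param_def.get("max"))
    return lower, upper

# The two layouts used in the uniform examples, as parsed dicts.
top_level = {"min": 1, "max": 4, "dist": "uniform", "dtype": "float"}
nested = {"dist": {"name": "uniform", "min": 1, "max": 4}, "dtype": "float"}
print(resolve_bounds(top_level), resolve_bounds(nested))
```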

Triangle Distribution

The triangle distribution is defined by a probability density function that is a triangle. It is parameterized using three parameters (minimum, peak, maximum). It is a simple distribution that is easy to understand and explain, and unlike the uniform distribution, it allows more probability to be concentrated around a particular value.
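A minimal sketch of the triangle density follows: it rises linearly from the minimum to a height of 2 / (max - min) at the peak, then falls linearly to zero at the maximum, so the triangle's area is 1. `triangle_pdf` is a hypothetical helper for illustration, not part of EMAT.

```python
def triangle_pdf(x, minimum, peak, maximum):
    """Density of a triangular distribution with minimum <= peak <= maximum."""
    if not (minimum <= peak <= maximum) or minimum == maximum:
        raise ValueError("require minimum <= peak <= maximum and minimum < maximum")
    if x < minimum or x > maximum:
        return 0.0
    # Area = width * height / 2 = 1, so the height at the peak is 2 / width.
    height = 2.0 / (maximum - minimum)
    if x < peak:  # rising edge (peak > minimum here, so no division by zero)
        return height * (x - minimum) / (peak - minimum)
    if x > peak:  # falling edge (maximum > peak here)
        return height * (maximum - x) / (maximum - peak)
    return height  # exactly at the peak

# Bounds and peak matching the scope examples later in this section.
for point in (2.0, 4.0, 4.5):
    print(point, triangle_pdf(point, 0, 4, 5))
```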

[5]:

x = numpy.linspace(0,5)
plt.plot(x, triangle(lower_bound=0, upper_bound=5, peak=0.0).pdf(x), label='Peak=0.0')
plt.plot(x, triangle(lower_bound=0, upper_bound=5, peak=0.5).pdf(x), label='Peak=0.5')
plt.plot(x, triangle(lower_bound=0, upper_bound=5, peak=1.0).pdf(x), label='Peak=1.0')
plt.plot(x, triangle(lower_bound=0, upper_bound=5, peak=2.5).pdf(x), label='Peak=2.5')
_=plt.legend()

emat.util.distributions.triangle(lower_bound, upper_bound=None, *, rel_peak=None, peak=None, width=None)

Generate a frozen scipy.stats.triang distribution.

This function provides the same actual distribution as triang, but offers multiple intuitive ways to identify the peak and upper bound of the distribution, while triang is less flexible and intuitive (it must be defined using arguments labeled as ‘c’, ‘loc’, and ‘scale’).

Parameters:

lower_bound (numeric) – The lower bound of the distribution.
upper_bound (numeric, optional) – The upper bound of the distribution. Can be inferred from width if not given.
rel_peak (numeric, optional) – The relative position of the peak of the triangle. Must be in the range (0,1). If neither peak nor rel_peak is given, a default value of 0.5 is used.
peak (numeric, optional) – The location of the peak of the triangle, given as a particular value, which must be between the lower and upper bounds inclusive.
width (numeric, optional) – The distance between the lower and upper bounds. Can be inferred from those values if not given.

Returns:

scipy.stats.rv_frozen
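The argument handling documented above can be sketched as a translation into the (c, loc, scale) arguments of scipy.stats.triang, whose support is [loc, loc + scale] with its peak at loc + c * scale. `triang_args` is a hypothetical helper that follows only the documented rules; how EMAT treats conflicting inputs (for example, both peak and rel_peak) is not documented here, so raising an error in that case is an assumption.

```python
def triang_args(lower_bound, upper_bound=None, *, rel_peak=None, peak=None, width=None):
    """Map the documented triangle() arguments to scipy.stats.triang's c, loc, scale.

    A sketch of the documented rules only; it is not EMAT's source code.
    """
    if width is None:
        if upper_bound is None:
            raise ValueError("give upper_bound or width")
        width = upper_bound - lower_bound  # infer width from the two bounds
    # If both upper_bound and width are given, this sketch simply uses width.
    if width <= 0:
        raise ValueError("upper bound must exceed lower bound")
    if peak is not None and rel_peak is not None:
        raise ValueError("give peak or rel_peak, not both")  # assumed behavior
    if peak is not None:
        rel_peak = (peak - lower_bound) / width  # absolute -> relative position
    elif rel_peak is None:
        rel_peak = 0.5  # documented default: peak at the midpoint
    return {"c": rel_peak, "loc": lower_bound, "scale": width}

# Two equivalent descriptions of the distribution used in the scope examples.
print(triang_args(0, 5, peak=4))
print(triang_args(0, width=5, rel_peak=0.8))
```

The returned dict could be passed on as `scipy.stats.triang(**args)` to obtain a frozen distribution; keeping the translation separate from the scipy call makes the parameter mapping easy to check by hand.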

[6]:

y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 4
        min: 0
        max: 5
        dist:
            name: triangle
            peak: 4
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = get_bounds(s['uncertain_variable_name'])
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)

It is also valid to include the min and max values under the dist key, instead of as top-level keys in the parameter definition.

[7]:

y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 4
        dist:
            name: triangle
            min: 0
            peak: 4
            max: 5
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = get_bounds(s['uncertain_variable_name'])
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)

PERT Distribution

The PERT distribution (“PERT” is an acronym for “program evaluation and review technique”) is a generally bell-shaped curve that, unlike the normal distribution, has finite minimum and maximum values. Like the triangle distribution, it can be parameterized using three parameters (minimum, peak, maximum). Setting the peak to a value other than the midpoint between the minimum and maximum introduces skew.
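One way to see the skew is through the mean. Assuming EMAT follows the standard beta-based PERT parameterization (consistent with the gamma behavior described in this section), the mean is (min + gamma * peak + max) / (gamma + 2), which for the default gamma of 4 is the classic estimate (min + 4 * peak + max) / 6. `pert_mean` is a hypothetical helper for illustration.

```python
def pert_mean(minimum, peak, maximum, gamma=4.0):
    """Mean of a PERT distribution under the standard beta parameterization.

    With the default gamma = 4 this is (minimum + 4 * peak + maximum) / 6.
    """
    return (minimum + gamma * peak + maximum) / (gamma + 2.0)

# A peak above the midpoint (2.5) pulls the mean upward, but not all the way
# to the peak, because the longer lower tail keeps some weight on small values.
print(pert_mean(0, 4, 5))
print(pert_mean(0, 2.5, 5))  # symmetric case: mean equals the midpoint
```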

[8]:

x = numpy.linspace(0, 5)  # evaluate each PDF on the same [0, 5] grid used for the triangle plots
plt.plot(x, pert(lower_bound=0, upper_bound=5, peak=0.0).pdf(x), label='Peak=0.0')
plt.plot(x, pert(lower_bound=0, upper_bound=5, peak=0.5).pdf(x), label='Peak=0.5')
plt.plot(x, pert(lower_bound=0, upper_bound=5, peak=1.0).pdf(x), label='Peak=1.0')
plt.plot(x, pert(lower_bound=0, upper_bound=5, peak=2.5).pdf(x), label='Peak=2.5')
_=plt.legend()

The relative peakiness (i.e., kurtosis) of the distribution can be controlled using the gamma parameter. The default value of gamma for a PERT distribution is 4.0, but other positive numbers can be used as well: higher numbers give a distribution that more strongly favors outcomes near the peak, while smaller numbers give less weight to values near the peak and relatively more weight to the tails. In the limit, setting gamma to zero results in a uniform distribution.
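Under the same assumed standard parameterization, gamma enters the beta shape parameters as alpha = 1 + gamma * (peak - min) / (max - min) and beta = 1 + gamma * (max - peak) / (max - min). The sketch below uses hypothetical helpers `pert_shape` and `pert_variance` to show that gamma = 0 gives alpha = beta = 1 (the uniform distribution, with variance width² / 12) and that the spread shrinks as gamma grows.

```python
def pert_shape(minimum, peak, maximum, gamma=4.0):
    """Beta shape parameters (alpha, beta) for a PERT distribution (assumed form)."""
    width = maximum - minimum
    alpha = 1.0 + gamma * (peak - minimum) / width
    beta = 1.0 + gamma * (maximum - peak) / width
    return alpha, beta

def pert_variance(minimum, peak, maximum, gamma=4.0):
    """Variance: the beta variance a*b / ((a+b)^2 (a+b+1)), scaled by width squared."""
    a, b = pert_shape(minimum, peak, maximum, gamma)
    width = maximum - minimum
    return width ** 2 * a * b / ((a + b) ** 2 * (a + b + 1))

# Symmetric case on [0, 5] for a few of the gamma values plotted in this section.
for g in (0, 1, 4, 10):
    print(g, pert_shape(0, 2.5, 5, g), round(pert_variance(0, 2.5, 5, g), 4))
```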

[9]:

plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=1).pdf(x), label='gamma=1')
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=2).pdf(x), label='gamma=2')
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=3).pdf(x), label='gamma=3')
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=4).pdf(x), label='gamma=4', lw=3.0)
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=5).pdf(x), label='gamma=5')
plt.plot(x, pert(lower_bound=0, upper_bound=5, gamma=10).pdf(x), label='gamma=10')
_=plt.legend()

emat.util.distributions.pert(lower_bound, upper_bound=None, *, rel_peak=None, peak=None, width=None, gamma=4.0)

Generate a frozen scipy.stats.beta PERT distribution.

For details on the PERT distribution, see Wikipedia.

Parameters:

lower_bound (numeric) – The lower bound of the distribution.
upper_bound (numeric, optional) – The upper bound of the distribution. Can be inferred from width if not given.
rel_peak (numeric, optional) – The relative position of the peak of the distribution. Must be in the range (0,1). If neither peak nor rel_peak is given, a default value of 0.5 is used.
peak (numeric, optional) – The location of the peak of the distribution, given as a particular value, which must be between the lower and upper bounds inclusive.
width (numeric, optional) – The distance between the lower and upper bounds. Can be inferred from those values if not given.
gamma (numeric, default 4.0) – Controls the relative peakiness of the distribution; higher values put more weight near the peak, and zero gives a uniform distribution.

Returns:

scipy.stats.rv_frozen
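Because pert returns a frozen scipy.stats.beta, its density is a beta density rescaled to the bounds; assuming the standard PERT shape formulas, `scipy.stats.beta(alpha, beta, loc=min, scale=max - min)` gives the same curve. The standard-library-only sketch below (`pert_pdf` is a hypothetical helper) evaluates that density for the values in the scope example that follows (min 0, peak 4, max 5, gamma 3) and checks numerically that it integrates to 1.

```python
import math

def pert_pdf(x, minimum, peak, maximum, gamma=4.0):
    """PERT density: a beta(alpha, beta) density rescaled to [minimum, maximum]."""
    width = maximum - minimum
    alpha = 1.0 + gamma * (peak - minimum) / width  # assumed standard form
    beta = 1.0 + gamma * (maximum - peak) / width
    if x < minimum or x > maximum:
        return 0.0
    z = (x - minimum) / width  # position within the bounds, in [0, 1]
    # Beta function B(alpha, beta) via the gamma function.
    beta_fn = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return z ** (alpha - 1) * (1 - z) ** (beta - 1) / (beta_fn * width)

# Midpoint-rule check that the density integrates to (approximately) 1.
n = 10_000
step = 5.0 / n
area = sum(pert_pdf((i + 0.5) * step, 0, 4, 5, gamma=3) * step for i in range(n))
print(round(area, 4))
```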

The PERT distribution can be indicated in a YAML scope file using the name “pert”, with optional values for the other named arguments described in the function docstring above.

[10]:

y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 1.0
        min: 0
        max: 5
        dist:
            name: pert
            peak: 4
            gamma: 3
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = get_bounds(s['uncertain_variable_name'])
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)

It is also valid to include the min and max values under the dist key, instead of as top-level keys in the parameter definition.

[11]:

y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 1.0
        dist:
            name: pert
            min: 0
            max: 5
            peak: 4
            gamma: 3
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = get_bounds(s['uncertain_variable_name'])
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)

Other Distributions

It is possible to use any other continuous distribution provided in the scipy.stats module. As a demonstration, below we define a trapezoidal distribution for an uncertainty. Instead of the more intuitively named keys shown above, such distributions require the standard scipy.stats names for each distribution parameter, and all of these parameters must be defined within the dist key, which can be less intuitive than the distributions suggested above. For example, in the example below the upper bound of the distribution is implicitly set to 7 (loc + scale) by the parameters, and that upper bound is not explicitly identified in the YAML file.

[12]:

y = """---
scope:
    name: demonstration
inputs:
    uncertain_variable_name:
        ptype: uncertainty
        desc: Slightly More Verbose Description
        default: 1.0
        dist:
            name: trapz
            c: 0.2
            d: 0.5
            loc: 2
            scale: 5
outputs:
    performance_measure_name:
        kind: maximize
...
"""
s = emat.Scope('t.yaml', scope_def=y)
bounds = get_bounds(s['uncertain_variable_name'])
x = numpy.linspace(*bounds)
y = s['uncertain_variable_name'].dist.pdf(x)
_=plt.plot(x,y)
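The implicit upper bound of 7 in the trapz example follows from how scipy.stats parameterizes the trapezoid: the density rises from loc to loc + c * scale, stays flat until loc + d * scale, and falls back to zero at loc + scale. The hypothetical helpers below compute those corner points and the plateau height that makes the total area equal 1, which can help when translating scipy-style parameters back into intuitive bounds.

```python
def trapezoid_corners(c, d, loc=0.0, scale=1.0):
    """Corner points (start, plateau start, plateau end, end) of the trapezoid."""
    if not 0.0 <= c <= d <= 1.0:
        raise ValueError("require 0 <= c <= d <= 1")
    if scale <= 0:
        raise ValueError("scale must be positive")
    return (loc, loc + c * scale, loc + d * scale, loc + scale)

def trapezoid_height(c, d, scale=1.0):
    """Plateau height: total area = scale * height * (1 + d - c) / 2 = 1."""
    return 2.0 / (scale * (1.0 + d - c))

# Parameters from the trapz example: c=0.2, d=0.5, loc=2, scale=5.
print(trapezoid_corners(0.2, 0.5, loc=2, scale=5))
print(trapezoid_height(0.2, 0.5, scale=5))
```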