Road Test

[1]:
import emat, os, numpy, pandas, functools, asyncio
emat.versions()
emat 0.5.2, plotly 4.14.3
[2]:
logger = emat.util.loggers.log_to_stderr(30, True)

Defining the Exploratory Scope

The model scope is defined in a YAML file. For this Road Test example, the scope file is named road_test.yaml and is included in the model/tests directory.

[3]:
road_test_scope_file = emat.package_file('model','tests','road_test.yaml')

The filename for the YAML file is the first argument when creating a Scope object, which will load and process the content of the file.

[4]:
road_scope = emat.Scope(road_test_scope_file)
road_scope
[4]:
<emat.Scope with 2 constants, 7 uncertainties, 4 levers, 7 measures>

A short summary of the scope can be reviewed using the info method.

[5]:
road_scope.info()
name: EMAT Road Test
desc: prototype run
constants:
  free_flow_time = 60
  initial_capacity = 100
uncertainties:
  alpha = 0.1 to 0.2
  beta = 3.5 to 5.5
  input_flow = 80 to 150
  value_of_time = 0.001 to 0.25
  unit_cost_expansion = 95.0 to 145.0
  interest_rate = 0.025 to 0.04
  yield_curve = -0.0025 to 0.02
levers:
  expand_capacity = 0.0 to 100.0
  amortization_period = 15 to 50
  debt_type = categorical
  interest_rate_lock = boolean
measures:
  no_build_travel_time
  build_travel_time
  time_savings
  value_of_time_savings
  net_benefits
  cost_of_capacity_expansion
  present_cost_expansion

Alternatively, more detailed information about each part of the scope can be accessed in four list attributes:

[6]:
road_scope.get_constants()
[6]:
[Constant('free_flow_time', 60), Constant('initial_capacity', 100)]
[7]:
road_scope.get_uncertainties()
[7]:
[<emat.RealParameter 'alpha'>,
 <emat.RealParameter 'beta'>,
 <emat.IntegerParameter 'input_flow'>,
 <emat.RealParameter 'value_of_time'>,
 <emat.RealParameter 'unit_cost_expansion'>,
 <emat.RealParameter 'interest_rate'>,
 <emat.RealParameter 'yield_curve'>]
[8]:
road_scope.get_levers()
[8]:
[<emat.RealParameter 'expand_capacity'>,
 <emat.IntegerParameter 'amortization_period'>,
 <emat.CategoricalParameter 'debt_type'>,
 <emat.BooleanParameter 'interest_rate_lock'>]
[9]:
road_scope.get_measures()
[9]:
[Measure('no_build_travel_time'),
 Measure('build_travel_time'),
 Measure('time_savings'),
 Measure('value_of_time_savings'),
 Measure('net_benefits'),
 Measure('cost_of_capacity_expansion'),
 Measure('present_cost_expansion')]

Using a Database

The exploratory modeling process will typically generate many different sets of outputs, for different explored modeling scopes, or for different applications. It is convenient to organize these outputs in a database structure, so they are stored consistently and readily available for subsequent analysis.

The SQLiteDB object will create a database to store results. When instantiated with no arguments, the database is initialized in-memory, which will not store anything to disk (which is convenient for this example, but in practice you will generally want to store data to disk so that it can persist after this Python session ends).

[10]:
emat_db = emat.SQLiteDB('tempfile')

An EMAT Scope can be stored in the database, to provide needed information about what the various inputs and outputs represent.

[11]:
road_scope.store_scope(emat_db)

Trying to store another scope with the same name (or the same scope) raises a KeyError.

[12]:
try:
    road_scope.store_scope(emat_db)
except KeyError as err:
    print(err)
'scope named "EMAT Road Test" already exists'

We can review the names of scopes already stored in the database using the read_scope_names method.

[13]:
emat_db.read_scope_names()
[13]:
['EMAT Road Test']

Experimental Design

Actually running the model can be done by the user on an ad hoc basis (i.e., manually defining every combination of inputs that will be evaluated) but the real power of EMAT comes from runnning the model using algorithm-created experimental designs.

An important experimental design used in exploratory modeling is the Latin Hypercube. This design selects a random set of experiments across multiple input dimensions, to ensure “good” coverage of the multi-dimensional modeling space.

The design_latin_hypercube function creates such a design based on a Scope, and optionally stores the design of experiments in a database.

[14]:
from emat.experiment.experimental_design import design_experiments
[15]:
design = design_experiments(road_scope, db=emat_db, n_samples_per_factor=10, sampler='lhs')
design.head()
[15]:
alpha amortization_period beta debt_type expand_capacity input_flow interest_rate interest_rate_lock unit_cost_expansion value_of_time yield_curve free_flow_time initial_capacity
experiment
1 0.184682 38 5.237143 Rev Bond 18.224793 115 0.031645 False 118.213466 0.059510 0.015659 60 100
2 0.166133 36 4.121963 Paygo 87.525790 129 0.037612 True 141.322696 0.107772 0.007307 60 100
3 0.198937 44 4.719838 GO Bond 45.698048 105 0.028445 False 97.783320 0.040879 -0.001545 60 100
4 0.158758 42 4.915816 GO Bond 51.297546 113 0.036234 True 127.224901 0.182517 0.004342 60 100
5 0.157671 42 3.845952 Paygo 22.824149 133 0.039257 False 107.820482 0.067102 0.001558 60 100
[16]:
large_design = design_experiments(road_scope, db=emat_db, n_samples=5000, sampler='lhs', design_name='lhs_large')
large_design.head()
[16]:
alpha amortization_period beta debt_type expand_capacity input_flow interest_rate interest_rate_lock unit_cost_expansion value_of_time yield_curve free_flow_time initial_capacity
experiment
111 0.154130 21 5.061648 Rev Bond 75.542217 112 0.029885 True 124.452736 0.056335 0.001425 60 100
112 0.148731 29 4.088663 Rev Bond 91.184595 145 0.028659 False 131.688623 0.051854 0.007850 60 100
113 0.124027 34 3.956884 Paygo 60.436585 80 0.038101 False 95.462532 0.045672 0.011101 60 100
114 0.129724 41 4.969628 Paygo 74.271040 139 0.029665 False 98.206495 0.044312 0.010072 60 100
115 0.185723 22 4.485432 Paygo 61.084166 95 0.039195 False 140.792308 0.144629 0.019277 60 100

We can review what experimental designs have already been stored in the database using the read_design_names method of the Database object.

[17]:
emat_db.read_design_names('EMAT Road Test')
[17]:
['lhs', 'lhs_large']

Core Model in Python

Up until this point, we have been considering a model in the abstract, defining in the Scope what the inputs and outputs will be, and designing some experiments we would like to run with the model. Now we will actually interface with the model itself.

Model Definition

In the simplest approach for EMAT, a model can be defined as a basic Python function, which accepts all inputs (exogenous uncertainties, policy levers, and externally defined constants) as named keyword arguments, and returns a dictionary where the dictionary keys are names of performace measures, and the mapped values are the computed values for those performance measures. The Road_Capacity_Investment function provided in EMAT is an example of such a function. This made-up example considers the investment in capacity expansion for a single roadway link. The inputs to this function are described above in the Scope, including uncertain parameters in the volume-delay function, traffic volumes, value of travel time savings, unit construction costs, and interest rates, and policy levers including the amount of capacity expansion and amortization period.

[18]:
from emat.model.core_python import PythonCoreModel, Road_Capacity_Investment

The PythonCoreModel object provides an interface that links the basic Python function that represents the model, the Scope, and optionally the Database used to manage data storage.

[19]:
m = PythonCoreModel(Road_Capacity_Investment, scope=road_scope, db=emat_db)

From the PythonCoreModel, which links the model, scope, design, and database, we can run the design of experiments. This will systematically run the core model with each set of input parameters in the design, store the results in the database, and return a pandas.DataFrame containing the results.

Model Execution

[20]:
lhs_results = m.run_experiments(design_name='lhs')
lhs_results.head()
[20]:
alpha beta input_flow value_of_time unit_cost_expansion interest_rate yield_curve expand_capacity amortization_period debt_type interest_rate_lock free_flow_time initial_capacity no_build_travel_time build_travel_time time_savings value_of_time_savings net_benefits cost_of_capacity_expansion present_cost_expansion
experiment
1 0.184682 5.237143 115 0.059510 118.213466 0.031645 0.015659 18.224793 38 Rev Bond False 60 100 83.038716 69.586789 13.451927 92.059972 -22.290905 114.350877 2154.415985
2 0.166133 4.121963 129 0.107772 141.322696 0.037612 0.007307 87.525790 36 Paygo True 60 100 88.474313 62.132583 26.341730 366.219659 -16.843014 383.062672 12369.380535
3 0.198937 4.719838 105 0.040879 97.783320 0.028445 -0.001545 45.698048 44 GO Bond False 60 100 75.027180 62.543328 12.483852 53.584943 -113.988412 167.573355 4468.506839
4 0.158758 4.915816 113 0.182517 127.224901 0.036234 0.004342 51.297546 42 GO Bond True 60 100 77.370428 62.268768 15.101660 311.462907 11.539561 299.923347 6526.325171
5 0.157671 3.845952 133 0.067102 107.820482 0.039257 0.001558 22.824149 42 Paygo False 60 100 88.328990 72.848428 15.480561 138.156464 78.036616 60.119848 2460.910705

If running a large number of experiments, it may be valuable to parallelize the processing using a DistributedEvaluator instead of the default SequentialEvaluator. The DistributedEvaluator uses dask.distributed to distribute the workload to a cluster of processes, which can all be on the same machine or distributed over multiple networked computers. (The details of using dask.distributed in more complex environments are beyond this scope of this example, but interested users can refer to that package’s documentation.)

[21]:
lhs_large_async = m.async_experiments(large_design, max_n_workers=8, batch_size=157)
[22]:
lhs_large_results = await lhs_large_async.final_results()

Once a particular design has been run once, the results can be recovered from the database without re-running the model itself.

[23]:
reload_results = m.read_experiments(design_name='lhs')
reload_results.head()
[23]:
free_flow_time initial_capacity alpha beta input_flow value_of_time unit_cost_expansion interest_rate yield_curve expand_capacity amortization_period debt_type interest_rate_lock no_build_travel_time build_travel_time time_savings value_of_time_savings net_benefits cost_of_capacity_expansion present_cost_expansion
experiment
1 60 100 0.184682 5.237143 115 0.059510 118.213466 0.031645 0.015659 18.224793 38 Rev Bond False 83.038716 69.586789 13.451927 92.059972 -22.290905 114.350877 2154.415985
2 60 100 0.166133 4.121963 129 0.107772 141.322696 0.037612 0.007307 87.525790 36 Paygo True 88.474313 62.132583 26.341730 366.219659 -16.843014 383.062672 12369.380535
3 60 100 0.198937 4.719838 105 0.040879 97.783320 0.028445 -0.001545 45.698048 44 GO Bond False 75.027180 62.543328 12.483852 53.584943 -113.988412 167.573355 4468.506839
4 60 100 0.158758 4.915816 113 0.182517 127.224901 0.036234 0.004342 51.297546 42 GO Bond True 77.370428 62.268768 15.101660 311.462907 11.539561 299.923347 6526.325171
5 60 100 0.157671 3.845952 133 0.067102 107.820482 0.039257 0.001558 22.824149 42 Paygo False 88.328990 72.848428 15.480561 138.156464 78.036616 60.119848 2460.910705

It is also possible to load only the parameters, or only the performance meausures.

[24]:
lhs_params = m.read_experiment_parameters(design_name='lhs')
lhs_params.head()
[24]:
free_flow_time initial_capacity alpha beta input_flow value_of_time unit_cost_expansion interest_rate yield_curve expand_capacity amortization_period debt_type interest_rate_lock
experiment
1 60 100 0.184682 5.237143 115 0.059510 118.213466 0.031645 0.015659 18.224793 38 Rev Bond False
2 60 100 0.166133 4.121963 129 0.107772 141.322696 0.037612 0.007307 87.525790 36 Paygo True
3 60 100 0.198937 4.719838 105 0.040879 97.783320 0.028445 -0.001545 45.698048 44 GO Bond False
4 60 100 0.158758 4.915816 113 0.182517 127.224901 0.036234 0.004342 51.297546 42 GO Bond True
5 60 100 0.157671 3.845952 133 0.067102 107.820482 0.039257 0.001558 22.824149 42 Paygo False
[25]:
lhs_outcomes = m.read_experiment_measures(design_name='lhs')
lhs_outcomes.head()
[25]:
no_build_travel_time build_travel_time time_savings value_of_time_savings net_benefits cost_of_capacity_expansion present_cost_expansion
experiment run
1 fdeef8a8-be52-11eb-a248-acde48001122 83.038716 69.586789 13.451927 92.059972 -22.290905 114.350877 2154.415985
2 fdf0e65e-be52-11eb-a248-acde48001122 88.474313 62.132583 26.341730 366.219659 -16.843014 383.062672 12369.380535
3 fdf254b2-be52-11eb-a248-acde48001122 75.027180 62.543328 12.483852 53.584943 -113.988412 167.573355 4468.506839
4 fdf3ac86-be52-11eb-a248-acde48001122 77.370428 62.268768 15.101660 311.462907 11.539561 299.923347 6526.325171
5 fdf525a2-be52-11eb-a248-acde48001122 88.328990 72.848428 15.480561 138.156464 78.036616 60.119848 2460.910705

Feature Scoring

[26]:
m.get_feature_scores('lhs')
[26]:
alpha beta input_flow value_of_time unit_cost_expansion interest_rate yield_curve expand_capacity amortization_period debt_type interest_rate_lock
no_build_travel_time 0.075444 0.060221 0.540386 0.039323 0.037390 0.046159 0.040861 0.051164 0.049716 0.030621 0.028717
build_travel_time 0.055973 0.049988 0.261898 0.057534 0.052644 0.050073 0.053859 0.318194 0.035832 0.035718 0.028288
time_savings 0.092528 0.079272 0.443769 0.049315 0.041853 0.049267 0.051544 0.078433 0.046920 0.034810 0.032290
value_of_time_savings 0.088381 0.066305 0.313696 0.180042 0.040443 0.059434 0.061504 0.068176 0.053497 0.036622 0.031900
net_benefits 0.074451 0.055493 0.277536 0.096591 0.058608 0.049967 0.062057 0.179746 0.076616 0.043170 0.025766
cost_of_capacity_expansion 0.033834 0.037829 0.037349 0.036240 0.079136 0.045176 0.037441 0.523266 0.105854 0.039362 0.024514
present_cost_expansion 0.032350 0.031131 0.031986 0.036581 0.089316 0.044133 0.030549 0.627419 0.031182 0.024638 0.020715

Visualization

[27]:
from emat.analysis import display_experiments
display_experiments(road_scope, lhs_results, rows=['time_savings', 'net_benefits', 'input_flow'])

Time Savings

Net Benefits