Database Walkthrough¶
[1]:
import os
import numpy as np
import pandas as pd
import seaborn; seaborn.set_theme()
import plotly.io; plotly.io.templates.default = "seaborn"
import emat
import yaml
from emat.util.show_dir import show_dir
from emat.analysis import display_experiments
emat.versions()
emat 0.5.2, plotly 4.14.3
For this walkthrough of database features, we’ll work in a temporary directory. (In real projects you’ll likely want to save your data somewhere less ephemeral, so don’t just copy this tempfile code into your work.)
[2]:
import tempfile
tempdir = tempfile.TemporaryDirectory()
os.chdir(tempdir.name)
We begin our example by populating a database with some experimental data, by creating and running a single design of experiments for the Road Test model.
[3]:
import emat.examples
scope, db, model = emat.examples.road_test()
design = model.design_experiments()
model.run_experiments(design);
Single-Design Datasets¶
Writing Out Raw Data¶
When the database has only a single design of experiments, or if we don’t care about any differentiation between multiple designs that we may have created and ran, we can dump the entire set of model runs, including uncertainties, policy levers, and performance measures, all consolidated into a single pandas DataFrame using the read_experiment_all
function. The constants even appear in this DataFrame too, for good measure.
[4]:
df = db.read_experiment_all(scope.name)
df
[4]:
free_flow_time | initial_capacity | alpha | beta | input_flow | value_of_time | unit_cost_expansion | interest_rate | yield_curve | expand_capacity | amortization_period | debt_type | interest_rate_lock | no_build_travel_time | build_travel_time | time_savings | value_of_time_savings | net_benefits | cost_of_capacity_expansion | present_cost_expansion | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
experiment | ||||||||||||||||||||
1 | 60 | 100 | 0.184682 | 5.237143 | 115 | 0.059510 | 118.213466 | 0.031645 | 0.015659 | 18.224793 | 38 | Rev Bond | False | 83.038716 | 69.586789 | 13.451927 | 92.059972 | -22.290905 | 114.350877 | 2154.415985 |
2 | 60 | 100 | 0.166133 | 4.121963 | 129 | 0.107772 | 141.322696 | 0.037612 | 0.007307 | 87.525790 | 36 | Paygo | True | 88.474313 | 62.132583 | 26.341730 | 366.219659 | -16.843014 | 383.062672 | 12369.380535 |
3 | 60 | 100 | 0.198937 | 4.719838 | 105 | 0.040879 | 97.783320 | 0.028445 | -0.001545 | 45.698048 | 44 | GO Bond | False | 75.027180 | 62.543328 | 12.483852 | 53.584943 | -113.988412 | 167.573355 | 4468.506839 |
4 | 60 | 100 | 0.158758 | 4.915816 | 113 | 0.182517 | 127.224901 | 0.036234 | 0.004342 | 51.297546 | 42 | GO Bond | True | 77.370428 | 62.268768 | 15.101660 | 311.462907 | 11.539561 | 299.923347 | 6526.325171 |
5 | 60 | 100 | 0.157671 | 3.845952 | 133 | 0.067102 | 107.820482 | 0.039257 | 0.001558 | 22.824149 | 42 | Paygo | False | 88.328990 | 72.848428 | 15.480561 | 138.156464 | 78.036616 | 60.119848 | 2460.910705 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
106 | 60 | 100 | 0.169674 | 4.939898 | 150 | 0.131775 | 112.348054 | 0.033034 | -0.000120 | 24.215074 | 34 | Rev Bond | True | 135.446332 | 85.847986 | 49.598345 | 980.371960 | 841.462782 | 138.909178 | 2720.516457 |
107 | 60 | 100 | 0.148297 | 3.824779 | 110 | 0.103255 | 105.248708 | 0.033437 | 0.007041 | 38.013885 | 22 | GO Bond | True | 72.811506 | 63.736168 | 9.075338 | 103.078214 | -146.712793 | 249.791007 | 4000.912327 |
108 | 60 | 100 | 0.134701 | 3.627795 | 144 | 0.035233 | 132.063099 | 0.036702 | 0.018681 | 52.155613 | 32 | Paygo | True | 90.340993 | 66.617994 | 23.722999 | 120.358653 | -112.568104 | 232.926757 | 6887.831931 |
109 | 60 | 100 | 0.176125 | 4.243675 | 100 | 0.188515 | 97.503468 | 0.035896 | 0.016807 | 38.352260 | 33 | Paygo | True | 70.567477 | 62.664842 | 7.902635 | 148.976184 | 25.480553 | 123.495631 | 3739.478399 |
110 | 60 | 100 | 0.142843 | 4.486119 | 145 | 0.064909 | 99.837764 | 0.035372 | 0.011055 | 15.851006 | 46 | GO Bond | False | 105.386378 | 83.456509 | 21.929869 | 206.401068 | 127.311542 | 79.089526 | 1582.528991 |
110 rows × 20 columns
Exporting this data is simply a matter of using the usual pandas methods to save the dataframe to a format of your choosing. We’ll save our data into a gzipped CSV file, which is somewhat compressed (we’re not monsters here) but still widely compatible for a variety of uses.
[5]:
df.to_csv("road_test_1.csv.gz")
This table contains most of the information we want to export from our database, but not everything. We also probably want to have access to all of the information in the exploratory scope as well. Our example generator gives us a Scope
reference directly, but if we didn’t have that we can still extract it from the database, using the read_scope
method.
[6]:
s = db.read_scope()
s
[6]:
<emat.Scope with 2 constants, 7 uncertainties, 4 levers, 7 measures>
[7]:
s.dump(filename="road_test_scope.yaml")
[8]:
show_dir('.')
/
├── road_test_1.csv.gz
└── road_test_scope.yaml
Reading In Raw Data¶
Now, we’re ready to begin anew, constructing a fresh database from scratch, using only the raw formatted files.
First, let’s load our scope from the yaml file, and initialize a clean database using that scope.
[9]:
s2 = emat.Scope("road_test_scope.yaml")
[10]:
db2 = emat.SQLiteDB("road_test_2.sqldb")
[11]:
db2.store_scope(s2)
Just as we used pandas to save out our consolidated DataFrame of experimental results, we can use it to read in a consolidated table of experiments.
[12]:
df2 = pd.read_csv("road_test_1.csv.gz", index_col='experiment')
df2
[12]:
free_flow_time | initial_capacity | alpha | beta | input_flow | value_of_time | unit_cost_expansion | interest_rate | yield_curve | expand_capacity | amortization_period | debt_type | interest_rate_lock | no_build_travel_time | build_travel_time | time_savings | value_of_time_savings | net_benefits | cost_of_capacity_expansion | present_cost_expansion | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
experiment | ||||||||||||||||||||
1 | 60 | 100 | 0.184682 | 5.237143 | 115 | 0.059510 | 118.213466 | 0.031645 | 0.015659 | 18.224793 | 38 | Rev Bond | False | 83.038716 | 69.586789 | 13.451927 | 92.059972 | -22.290905 | 114.350877 | 2154.415985 |
2 | 60 | 100 | 0.166133 | 4.121963 | 129 | 0.107772 | 141.322696 | 0.037612 | 0.007307 | 87.525790 | 36 | Paygo | True | 88.474313 | 62.132583 | 26.341730 | 366.219659 | -16.843014 | 383.062672 | 12369.380535 |
3 | 60 | 100 | 0.198937 | 4.719838 | 105 | 0.040879 | 97.783320 | 0.028445 | -0.001545 | 45.698048 | 44 | GO Bond | False | 75.027180 | 62.543328 | 12.483852 | 53.584943 | -113.988412 | 167.573355 | 4468.506839 |
4 | 60 | 100 | 0.158758 | 4.915816 | 113 | 0.182517 | 127.224901 | 0.036234 | 0.004342 | 51.297546 | 42 | GO Bond | True | 77.370428 | 62.268768 | 15.101660 | 311.462907 | 11.539561 | 299.923347 | 6526.325171 |
5 | 60 | 100 | 0.157671 | 3.845952 | 133 | 0.067102 | 107.820482 | 0.039257 | 0.001558 | 22.824149 | 42 | Paygo | False | 88.328990 | 72.848428 | 15.480561 | 138.156464 | 78.036616 | 60.119848 | 2460.910705 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
106 | 60 | 100 | 0.169674 | 4.939898 | 150 | 0.131775 | 112.348054 | 0.033034 | -0.000120 | 24.215074 | 34 | Rev Bond | True | 135.446332 | 85.847986 | 49.598345 | 980.371960 | 841.462782 | 138.909178 | 2720.516457 |
107 | 60 | 100 | 0.148297 | 3.824779 | 110 | 0.103255 | 105.248708 | 0.033437 | 0.007041 | 38.013885 | 22 | GO Bond | True | 72.811506 | 63.736168 | 9.075338 | 103.078214 | -146.712793 | 249.791007 | 4000.912327 |
108 | 60 | 100 | 0.134701 | 3.627795 | 144 | 0.035233 | 132.063099 | 0.036702 | 0.018681 | 52.155613 | 32 | Paygo | True | 90.340993 | 66.617994 | 23.722999 | 120.358653 | -112.568104 | 232.926757 | 6887.831931 |
109 | 60 | 100 | 0.176125 | 4.243675 | 100 | 0.188515 | 97.503468 | 0.035896 | 0.016807 | 38.352260 | 33 | Paygo | True | 70.567477 | 62.664842 | 7.902635 | 148.976184 | 25.480553 | 123.495631 | 3739.478399 |
110 | 60 | 100 | 0.142843 | 4.486119 | 145 | 0.064909 | 99.837764 | 0.035372 | 0.011055 | 15.851006 | 46 | GO Bond | False | 105.386378 | 83.456509 | 21.929869 | 206.401068 | 127.311542 | 79.089526 | 1582.528991 |
110 rows × 20 columns
Writing experiments to a database is not quite as simple as reading them. There is a parallel write_experiment_all
method for the Database
class, but to use it we need to provide not only the DataFrame of actual results, but also a name for the design of experiments we are writing (all experiments exist within designs) and the source of the performance measure results (zero means actual results from a core model run, and non-zero values are ID numbers for metamodels). This allows many
different possible sets of performance measures to be stored for the same set of input parameters.
[13]:
db2.write_experiment_all(
scope_name=s2.name,
design_name='general',
source=0,
xlm_df=df2,
)
[14]:
display_experiments(s2, 'general', db=db2, rows=['time_savings'])
Time Savings
Multiple-Design Datasets¶
The EMAT database is not limited to storing a single design of experiments. Multiple designs can be stored for the same scope. We’ll add a set of univariate sensitivity test to our database, and a “ref” design that contains a single experiment with all inputs set to their default values.
[15]:
design_uni = model.design_experiments(sampler='uni')
model.run_experiments(design_uni)
model.run_reference_experiment();
We now have three designs stored in our database. We can confirm this by reading out the set of design names.
[16]:
db.read_design_names(s.name)
[16]:
['lhs', 'ref', 'uni']
The design names we se here are the default names given when designs are created with each of the given samplers. When creating new designs, we can override the default names with other names of our choice using the design_name
argument. The names can be any string not already in use.
[17]:
design_b = model.design_experiments(sampler='lhs', design_name='bruce')
db.read_design_names(s.name)
[17]:
['bruce', 'lhs', 'ref', 'uni']
If you try to re-use a name you’ll get an error, as having multiple designs with the same name does not allow you to make it clear which design you are referring to.
[18]:
try:
model.design_experiments(sampler='lhs', design_name='bruce')
except ValueError as err:
print(err)
the design "bruce" already exists for scope "EMAT Road Test"
As noted above, the design name, which can be any string, is separate from the sampler method. A default design name based on the name of the sampler method is used if no design name is given. The selected sampler must be one available in EMAT, as the sampler defines a particular logic about how to generate the design.
[19]:
try:
model.design_experiments(sampler='uni')
except ValueError as err:
print(err)
Note that there can be some experiments that are in more than one design. This is not merely duplicating the experiment and results, but actually assigning the same experiment to both designs. We can see this for the ‘uni’ and ‘ref’ designs – both contain the all-default parameters experiment, and when we read these designs out of the database, the same experiment number is reported out in both designs.
[20]:
db.read_experiment_all(scope.name, design_name='uni').head()
[20]:
free_flow_time | initial_capacity | alpha | beta | input_flow | value_of_time | unit_cost_expansion | interest_rate | yield_curve | expand_capacity | amortization_period | debt_type | interest_rate_lock | no_build_travel_time | build_travel_time | time_savings | value_of_time_savings | net_benefits | cost_of_capacity_expansion | present_cost_expansion | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
experiment | ||||||||||||||||||||
111 | 60 | 100 | 0.15 | 4.0 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 69.0 | 66.147121 | 2.852879 | 21.396592 | -30.807346 | 52.203937 | 1000.0 |
112 | 60 | 100 | 0.10 | 4.0 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 66.0 | 64.098081 | 1.901919 | 14.264395 | -37.939543 | 52.203937 | 1000.0 |
113 | 60 | 100 | 0.20 | 4.0 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 72.0 | 68.196161 | 3.803839 | 28.528789 | -23.675148 | 52.203937 | 1000.0 |
114 | 60 | 100 | 0.15 | 3.5 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 69.0 | 66.447155 | 2.552845 | 19.146338 | -33.057600 | 52.203937 | 1000.0 |
115 | 60 | 100 | 0.15 | 5.5 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 69.0 | 65.328227 | 3.671773 | 27.538295 | -24.665642 | 52.203937 | 1000.0 |
[21]:
db.read_experiment_all(scope.name, design_name='ref')
[21]:
free_flow_time | initial_capacity | alpha | beta | input_flow | value_of_time | unit_cost_expansion | interest_rate | yield_curve | expand_capacity | amortization_period | debt_type | interest_rate_lock | no_build_travel_time | build_travel_time | time_savings | value_of_time_savings | net_benefits | cost_of_capacity_expansion | present_cost_expansion | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
experiment | ||||||||||||||||||||
111 | 60 | 100 | 0.15 | 4.0 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 69.0 | 66.147121 | 2.852879 | 21.396592 | -30.807346 | 52.203937 | 1000.0 |
One “gotcha” to be wary of is unintentionally replicating experiments. By default, the random_seed
for randomly generated experiemnts is set to 0 for reproducibility. This means that, for example, the ‘bruce’ design is actually the same as the original ‘lhs’ design:
[22]:
db.read_experiment_all(scope.name, design_name='lhs').equals(
db.read_experiment_all(scope.name, design_name='bruce')
)
[22]:
True
If we want a new set of random experiments with the same sampler and other parameters, we’ll need to provide a different random_seed
.
[23]:
design_b = model.design_experiments(sampler='lhs', design_name='new_bruce', random_seed=42)
db.read_experiment_all(scope.name, design_name='lhs').equals(design_b)
[23]:
False
Writing Out Raw Data¶
We can read a single dataframe containing all the experiments associated with this scope by omitting the design_name
argument, just as if there was only one design.
[24]:
df = db.read_experiment_all(scope.name)
df
[24]:
free_flow_time | initial_capacity | alpha | beta | input_flow | value_of_time | unit_cost_expansion | interest_rate | yield_curve | expand_capacity | amortization_period | debt_type | interest_rate_lock | no_build_travel_time | build_travel_time | time_savings | value_of_time_savings | net_benefits | cost_of_capacity_expansion | present_cost_expansion | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
experiment | ||||||||||||||||||||
1 | 60 | 100 | 0.184682 | 5.237143 | 115 | 0.059510 | 118.213466 | 0.031645 | 0.015659 | 18.224793 | 38 | Rev Bond | False | 83.038716 | 69.586789 | 13.451927 | 92.059972 | -22.290905 | 114.350877 | 2154.415985 |
2 | 60 | 100 | 0.166133 | 4.121963 | 129 | 0.107772 | 141.322696 | 0.037612 | 0.007307 | 87.525790 | 36 | Paygo | True | 88.474313 | 62.132583 | 26.341730 | 366.219659 | -16.843014 | 383.062672 | 12369.380535 |
3 | 60 | 100 | 0.198937 | 4.719838 | 105 | 0.040879 | 97.783320 | 0.028445 | -0.001545 | 45.698048 | 44 | GO Bond | False | 75.027180 | 62.543328 | 12.483852 | 53.584943 | -113.988412 | 167.573355 | 4468.506839 |
4 | 60 | 100 | 0.158758 | 4.915816 | 113 | 0.182517 | 127.224901 | 0.036234 | 0.004342 | 51.297546 | 42 | GO Bond | True | 77.370428 | 62.268768 | 15.101660 | 311.462907 | 11.539561 | 299.923347 | 6526.325171 |
5 | 60 | 100 | 0.157671 | 3.845952 | 133 | 0.067102 | 107.820482 | 0.039257 | 0.001558 | 22.824149 | 42 | Paygo | False | 88.328990 | 72.848428 | 15.480561 | 138.156464 | 78.036616 | 60.119848 | 2460.910705 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
238 | 60 | 100 | 0.164649 | 4.851786 | 104 | 0.012456 | 127.667298 | 0.025673 | 0.012674 | 24.114264 | 30 | GO Bond | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
239 | 60 | 100 | 0.113568 | 4.545709 | 105 | 0.137105 | 139.911784 | 0.028280 | 0.016936 | 92.883345 | 15 | GO Bond | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
240 | 60 | 100 | 0.184434 | 4.732430 | 144 | 0.100242 | 99.904559 | 0.030392 | 0.019534 | 84.793007 | 15 | GO Bond | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
241 | 60 | 100 | 0.146598 | 3.616301 | 107 | 0.065130 | 135.593036 | 0.035733 | 0.002085 | 29.107628 | 34 | Paygo | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
242 | 60 | 100 | 0.193327 | 5.402220 | 134 | 0.048285 | 140.945872 | 0.026210 | -0.001255 | 26.285726 | 44 | Rev Bond | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
242 rows × 20 columns
This dataframe is different than the one we saw earlier with the same command, as we have since added a few more experiments to the database in a few different designs. If we don’t give a design_name
argument, we’ll retrieve every (unique) experiment from every design currently stored in the database.
[25]:
df.to_csv("road_test_2.csv.gz")
If we want to be able to reconstruct the various designs of experiments later, we’ll also need to write out instructions for that. The read_all_experiment_ids
method can give us a dictionary of all the relevant information.
[26]:
design_experiments = db.read_all_experiment_ids(scope.name, design_name='*',grouped=True)
design_experiments
[26]:
{'bruce': '1-110',
'lhs': '1-110',
'new_bruce': '133-242',
'ref': '111',
'uni': '111-132',
'uni_2': '111-132'}
We can write this dictionary to a file in ‘yaml’ format.
[27]:
with open("road_test_design_experiments.yaml", 'wt') as f:
yaml.dump(design_experiments, f)
Reading In Raw Data¶
To construct a new emat Database with multiple designs of experients,…
[28]:
db3 = emat.SQLiteDB("road_test_3.sqldb")
db3.store_scope(s2)
[29]:
df3 = pd.read_csv("road_test_2.csv.gz", index_col='experiment')
df3
[29]:
free_flow_time | initial_capacity | alpha | beta | input_flow | value_of_time | unit_cost_expansion | interest_rate | yield_curve | expand_capacity | amortization_period | debt_type | interest_rate_lock | no_build_travel_time | build_travel_time | time_savings | value_of_time_savings | net_benefits | cost_of_capacity_expansion | present_cost_expansion | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
experiment | ||||||||||||||||||||
1 | 60 | 100 | 0.184682 | 5.237143 | 115 | 0.059510 | 118.213466 | 0.031645 | 0.015659 | 18.224793 | 38 | Rev Bond | False | 83.038716 | 69.586789 | 13.451927 | 92.059972 | -22.290905 | 114.350877 | 2154.415985 |
2 | 60 | 100 | 0.166133 | 4.121963 | 129 | 0.107772 | 141.322696 | 0.037612 | 0.007307 | 87.525790 | 36 | Paygo | True | 88.474313 | 62.132583 | 26.341730 | 366.219659 | -16.843014 | 383.062672 | 12369.380535 |
3 | 60 | 100 | 0.198937 | 4.719838 | 105 | 0.040879 | 97.783320 | 0.028445 | -0.001545 | 45.698048 | 44 | GO Bond | False | 75.027180 | 62.543328 | 12.483852 | 53.584943 | -113.988412 | 167.573355 | 4468.506839 |
4 | 60 | 100 | 0.158758 | 4.915816 | 113 | 0.182517 | 127.224901 | 0.036234 | 0.004342 | 51.297546 | 42 | GO Bond | True | 77.370428 | 62.268768 | 15.101660 | 311.462907 | 11.539561 | 299.923347 | 6526.325171 |
5 | 60 | 100 | 0.157671 | 3.845952 | 133 | 0.067102 | 107.820482 | 0.039257 | 0.001558 | 22.824149 | 42 | Paygo | False | 88.328990 | 72.848428 | 15.480561 | 138.156464 | 78.036616 | 60.119848 | 2460.910705 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
238 | 60 | 100 | 0.164649 | 4.851786 | 104 | 0.012456 | 127.667298 | 0.025673 | 0.012674 | 24.114264 | 30 | GO Bond | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
239 | 60 | 100 | 0.113568 | 4.545709 | 105 | 0.137105 | 139.911784 | 0.028280 | 0.016936 | 92.883345 | 15 | GO Bond | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
240 | 60 | 100 | 0.184434 | 4.732430 | 144 | 0.100242 | 99.904559 | 0.030392 | 0.019534 | 84.793007 | 15 | GO Bond | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
241 | 60 | 100 | 0.146598 | 3.616301 | 107 | 0.065130 | 135.593036 | 0.035733 | 0.002085 | 29.107628 | 34 | Paygo | True | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
242 | 60 | 100 | 0.193327 | 5.402220 | 134 | 0.048285 | 140.945872 | 0.026210 | -0.001255 | 26.285726 | 44 | Rev Bond | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
242 rows × 20 columns
[30]:
with open("road_test_design_experiments.yaml", 'rt') as f:
design_experiments2 = yaml.safe_load(f)
design_experiments2
[30]:
{'bruce': '1-110',
'lhs': '1-110',
'new_bruce': '133-242',
'ref': '111',
'uni': '111-132',
'uni_2': '111-132'}
[31]:
db3.write_experiment_all(
scope_name=s2.name,
design_name=design_experiments2,
source=0,
xlm_df=df3,
)
[32]:
db3.read_design_names(s.name)
[32]:
['bruce', 'lhs', 'new_bruce', 'ref', 'uni', 'uni_2']
[33]:
db3.read_all_experiment_ids(scope.name, design_name='*',grouped=True)
[33]:
{'bruce': '1-110',
'lhs': '1-110',
'new_bruce': '133-242',
'ref': '111',
'uni': '111-132',
'uni_2': '111-132'}
[34]:
db3.read_experiment_all(scope.name, design_name='uni').head()
[34]:
free_flow_time | initial_capacity | alpha | beta | input_flow | value_of_time | unit_cost_expansion | interest_rate | yield_curve | expand_capacity | amortization_period | debt_type | interest_rate_lock | no_build_travel_time | build_travel_time | time_savings | value_of_time_savings | net_benefits | cost_of_capacity_expansion | present_cost_expansion | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
experiment | ||||||||||||||||||||
111 | 60 | 100 | 0.15 | 4.0 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 69.0 | 66.147121 | 2.852879 | 21.396592 | -30.807346 | 52.203937 | 1000.0 |
112 | 60 | 100 | 0.10 | 4.0 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 66.0 | 64.098081 | 1.901919 | 14.264395 | -37.939543 | 52.203937 | 1000.0 |
113 | 60 | 100 | 0.20 | 4.0 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 72.0 | 68.196161 | 3.803839 | 28.528789 | -23.675148 | 52.203937 | 1000.0 |
114 | 60 | 100 | 0.15 | 3.5 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 69.0 | 66.447155 | 2.552845 | 19.146338 | -33.057600 | 52.203937 | 1000.0 |
115 | 60 | 100 | 0.15 | 5.5 | 100 | 0.075 | 100.0 | 0.03 | 0.01 | 10.0 | 30 | GO Bond | False | 69.0 | 65.328227 | 3.671773 | 27.538295 | -24.665642 | 52.203937 | 1000.0 |
Re-running Experiments¶
This section provides a short walkthrough of how to handle mistakes in an EMAT database. By “mistakes” we are referring to incorrect values that have been written into the database by accident, generally arising from core model runs that were misconfigured or suffered non-fatal errors that caused the results to be invalid.
One approach to handling such problems is to simply start over with a brand new clean database file. However, this may be inconvenient if the database already includes a number of valid results, especially if those valid results were expensive to generate. It may also be desirable to keep prior invalid results on hand, so as to easily recognized when errors recur.
We begin this example by populating our database with some more experimental data, by creating and running a single design of experiments for the Road Test model, except these experiments will be created with a misconfigured model (lane_width = 11, it should be 10), so the results will be bad. (In general, you probably won’t intentionally create corrupt data, but we’re doing so here for expository purposes, so we’ll give this design a name of ‘oops’ so we can readily recall what we’ve done.)
[35]:
model.lane_width = 10.3
oops = model.design_experiments(design_name='oops', random_seed=12345)
model.run_experiments(oops);
We can review a dataframe of results as before, using the read_experiment_all
method. This time we will add with_run_ids=True
, which will add an extra column to the index, showing a universally unique id attached to each row of results.
[36]:
oops_result1 = db.read_experiment_all(scope.name, 'oops', with_run_ids=True)
oops_result1.head()
[36]:
free_flow_time | initial_capacity | alpha | beta | input_flow | value_of_time | unit_cost_expansion | interest_rate | yield_curve | expand_capacity | amortization_period | debt_type | interest_rate_lock | no_build_travel_time | build_travel_time | time_savings | value_of_time_savings | net_benefits | cost_of_capacity_expansion | present_cost_expansion | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
experiment | run | ||||||||||||||||||||
243 | b2358b5e-be73-11eb-809d-acde48001122 | 60 | 100 | 0.157955 | 4.270404 | 112 | 0.151487 | 116.445987 | 0.029784 | 0.018968 | 50.163393 | 38 | Rev Bond | False | 75.376818 | 79.495869 | -4.119051 | -69.886026 | -381.303111 | 311.417085 | 5841.325841 |
244 | b2376db6-be73-11eb-809d-acde48001122 | 60 | 100 | 0.165144 | 4.008218 | 134 | 0.039670 | 102.879244 | 0.032505 | 0.000708 | 57.330143 | 40 | Rev Bond | False | 92.024075 | 83.059349 | 8.964726 | 47.654780 | -220.075486 | 267.730266 | 5898.081794 |
245 | b238ff32-be73-11eb-809d-acde48001122 | 60 | 100 | 0.103964 | 4.019636 | 92 | 0.100936 | 131.282035 | 0.038432 | 0.002658 | 64.949710 | 39 | GO Bond | True | 64.461446 | 77.449591 | -12.988145 | -120.608948 | -520.723349 | 400.114402 | 8526.730135 |
246 | b23a6b42-be73-11eb-809d-acde48001122 | 60 | 100 | 0.172162 | 4.920619 | 144 | 0.135484 | 95.348634 | 0.025803 | 0.008870 | 77.498534 | 26 | Paygo | False | 122.134147 | 80.716842 | 41.417305 | 808.041389 | 513.014691 | 295.026699 | 7389.379292 |
247 | b23bdc66-be73-11eb-809d-acde48001122 | 60 | 100 | 0.136954 | 4.901048 | 100 | 0.109919 | 114.310247 | 0.027538 | 0.016733 | 46.312847 | 41 | Paygo | False | 68.217253 | 78.289223 | -10.071970 | -110.710261 | -276.303370 | 165.593109 | 5294.033017 |
[37]:
display_experiments(scope, oops_result1, rows=['time_savings'])
Time Savings
Some of these results are obviously problematic. Increasing capacity cannot possibly result in a negative travel time savings. (Braess paradox doesn’t apply here because it’s just one link, not a network.) So those negative values are clearly wrong. We can fix the model so they won’t be wrong, but by default the run_experiments
method won’t actually re-run models when the results are already available in the database. To solve this conundrum, we can mark the incorrect results as invalid,
using a query to pull out the rows that can be flagged as wrong.
[38]:
db.invalidate_experiment_runs(
queries=['time_savings < 0']
)
[38]:
[73]
The [73]
returned here indicates that 73 sets of results were invalidated by this command. The invalidation command actually sets a “valid” flag in the database to False for these experiment runs, so that a persistant record that they are bad is stored in the database. Now we can fix our model, and then use the run_experiments
method to get new model runs for the invalidated results.
[39]:
model.lane_width = 10
oops_result2 = model.run_experiments(oops)
[40]:
display_experiments(scope, 'oops', db=db, rows=['time_savings'])
Time Savings
The re-run fixed the negative values, although it left in place the other experimental runs in the database. By the way we constructed this example, we know those are wrong too, and it’s evident in the apparent discontinuity in the input flow graph, which we can zoom in on.
[41]:
ax = oops_result2.plot.scatter(x='input_flow', y='time_savings', color='r')
ax.plot([109,135], [0,35], '--',color='y');
Those original results are bad too, and we want to invalidate them as well. In addition to giving conditional queries to the invalidate_experiment_runs
method, we can also give a dataframe of results that have run ids attached, and those unique ids will be used to to find and invalidate results in the database. Here, we pass in the dataframe of all the results, which contains all 110 runs, but only 37 runs are newly invalidated (77 were invalidated previously).
[42]:
db.invalidate_experiment_runs(
oops_result1
)
[42]:
37
Now when we run the experiments again, those 37 experiments are re-run.
[43]:
oops_result3 = model.run_experiments(oops)
[44]:
display_experiments(scope, 'lhs', db=db, rows=['time_savings'])
Time Savings
Writing Out All Runs¶
By default, the read_experiment_all
method returns the most recent valid set of performance measures for each experiment, but we can override this behavior to ask for 'all'
run results, or all 'valid'
or 'invalid'
results, by setting the runs
argument to those literal values. This allows us to easily write out data files containing all the results stored in the database.
[45]:
db.read_experiment_all(scope.name, with_run_ids=True, runs='all')
[45]:
free_flow_time | initial_capacity | alpha | beta | input_flow | value_of_time | unit_cost_expansion | interest_rate | yield_curve | expand_capacity | amortization_period | debt_type | interest_rate_lock | no_build_travel_time | build_travel_time | time_savings | value_of_time_savings | net_benefits | cost_of_capacity_expansion | present_cost_expansion | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
experiment | run | ||||||||||||||||||||
1 | af00ae82-be73-11eb-809d-acde48001122 | 60 | 100 | 0.184682 | 5.237143 | 115 | 0.059510 | 118.213466 | 0.031645 | 0.015659 | 18.224793 | 38 | Rev Bond | False | 83.038716 | 69.586789 | 13.451927 | 92.059972 | -22.290905 | 114.350877 | 2154.415985 |
2 | af0320e0-be73-11eb-809d-acde48001122 | 60 | 100 | 0.166133 | 4.121963 | 129 | 0.107772 | 141.322696 | 0.037612 | 0.007307 | 87.525790 | 36 | Paygo | True | 88.474313 | 62.132583 | 26.341730 | 366.219659 | -16.843014 | 383.062672 | 12369.380535 |
3 | af04aaaa-be73-11eb-809d-acde48001122 | 60 | 100 | 0.198937 | 4.719838 | 105 | 0.040879 | 97.783320 | 0.028445 | -0.001545 | 45.698048 | 44 | GO Bond | False | 75.027180 | 62.543328 | 12.483852 | 53.584943 | -113.988412 | 167.573355 | 4468.506839 |
4 | af0628e4-be73-11eb-809d-acde48001122 | 60 | 100 | 0.158758 | 4.915816 | 113 | 0.182517 | 127.224901 | 0.036234 | 0.004342 | 51.297546 | 42 | GO Bond | True | 77.370428 | 62.268768 | 15.101660 | 311.462907 | 11.539561 | 299.923347 | 6526.325171 |
5 | af07aea8-be73-11eb-809d-acde48001122 | 60 | 100 | 0.157671 | 3.845952 | 133 | 0.067102 | 107.820482 | 0.039257 | 0.001558 | 22.824149 | 42 | Paygo | False | 88.328990 | 72.848428 | 15.480561 | 138.156464 | 78.036616 | 60.119848 | 2460.910705 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
350 | b3f00848-be73-11eb-809d-acde48001122 | 60 | 100 | 0.127109 | 4.855843 | 103 | 0.130109 | 110.140882 | 0.039189 | 0.016098 | 69.050648 | 31 | GO Bond | False | 68.803652 | 60.687779 | 8.115873 | 108.762921 | -345.671302 | 454.434223 | 7605.299268 |
351 | b2d23300-be73-11eb-809d-acde48001122 | 60 | 100 | 0.191902 | 4.144514 | 102 | 0.145524 | 141.761624 | 0.037483 | 0.004611 | 20.006234 | 45 | GO Bond | True | 72.499003 | 83.787080 | -11.288077 | -167.553833 | -295.911196 | 128.357363 | 2836.116224 |
b3f17d7c-be73-11eb-809d-acde48001122 | 60 | 100 | 0.191902 | 4.144514 | 102 | 0.145524 | 141.761624 | 0.037483 | 0.004611 | 20.006234 | 45 | GO Bond | True | 72.499003 | 65.869676 | 6.629328 | 98.401992 | -29.955371 | 128.357363 | 2836.116224 | |
352 | b2d3b568-be73-11eb-809d-acde48001122 | 60 | 100 | 0.189252 | 4.588903 | 111 | 0.191116 | 105.423753 | 0.038912 | 0.012596 | 70.999408 | 16 | Rev Bond | True | 78.330479 | 78.364867 | -0.034388 | -0.729512 | -597.920964 | 597.191453 | 7485.024087 |
b3f2fdd2-be73-11eb-809d-acde48001122 | 60 | 100 | 0.189252 | 4.588903 | 111 | 0.191116 | 105.423753 | 0.038912 | 0.012596 | 70.999408 | 16 | Rev Bond | True | 78.330479 | 61.563090 | 16.767389 | 355.700934 | -241.490519 | 597.191453 | 7485.024087 |
462 rows × 20 columns
In the resulting dataframe above, we can see that we have retrieved two different runs for some of the experiments. Only one of them is valid for each. If we want to get all the stored runs and also mark the valid and invalid runs, we can read them seperately and attach a tag to the two dataframes.
[46]:
runs_1 = db.read_experiment_all(scope.name, with_run_ids=True, runs='valid')
runs_1['is_valid'] = True
runs_0 = db.read_experiment_all(scope.name, with_run_ids=True, runs='invalid', only_with_measures=True)
runs_0['is_valid'] = False
all_runs = pd.concat([runs_1, runs_0])
all_runs.sort_index()
[46]:
free_flow_time | initial_capacity | alpha | beta | input_flow | value_of_time | unit_cost_expansion | interest_rate | yield_curve | expand_capacity | ... | debt_type | interest_rate_lock | no_build_travel_time | build_travel_time | time_savings | value_of_time_savings | net_benefits | cost_of_capacity_expansion | present_cost_expansion | is_valid | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
experiment | run | |||||||||||||||||||||
1 | af00ae82-be73-11eb-809d-acde48001122 | 60 | 100 | 0.184682 | 5.237143 | 115 | 0.059510 | 118.213466 | 0.031645 | 0.015659 | 18.224793 | ... | Rev Bond | False | 83.038716 | 69.586789 | 13.451927 | 92.059972 | -22.290905 | 114.350877 | 2154.415985 | True |
2 | af0320e0-be73-11eb-809d-acde48001122 | 60 | 100 | 0.166133 | 4.121963 | 129 | 0.107772 | 141.322696 | 0.037612 | 0.007307 | 87.525790 | ... | Paygo | True | 88.474313 | 62.132583 | 26.341730 | 366.219659 | -16.843014 | 383.062672 | 12369.380535 | True |
3 | af04aaaa-be73-11eb-809d-acde48001122 | 60 | 100 | 0.198937 | 4.719838 | 105 | 0.040879 | 97.783320 | 0.028445 | -0.001545 | 45.698048 | ... | GO Bond | False | 75.027180 | 62.543328 | 12.483852 | 53.584943 | -113.988412 | 167.573355 | 4468.506839 | True |
4 | af0628e4-be73-11eb-809d-acde48001122 | 60 | 100 | 0.158758 | 4.915816 | 113 | 0.182517 | 127.224901 | 0.036234 | 0.004342 | 51.297546 | ... | GO Bond | True | 77.370428 | 62.268768 | 15.101660 | 311.462907 | 11.539561 | 299.923347 | 6526.325171 | True |
5 | af07aea8-be73-11eb-809d-acde48001122 | 60 | 100 | 0.157671 | 3.845952 | 133 | 0.067102 | 107.820482 | 0.039257 | 0.001558 | 22.824149 | ... | Paygo | False | 88.328990 | 72.848428 | 15.480561 | 138.156464 | 78.036616 | 60.119848 | 2460.910705 | True |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
350 | b3f00848-be73-11eb-809d-acde48001122 | 60 | 100 | 0.127109 | 4.855843 | 103 | 0.130109 | 110.140882 | 0.039189 | 0.016098 | 69.050648 | ... | GO Bond | False | 68.803652 | 60.687779 | 8.115873 | 108.762921 | -345.671302 | 454.434223 | 7605.299268 | True |
351 | b2d23300-be73-11eb-809d-acde48001122 | 60 | 100 | 0.191902 | 4.144514 | 102 | 0.145524 | 141.761624 | 0.037483 | 0.004611 | 20.006234 | ... | GO Bond | True | 72.499003 | 83.787080 | -11.288077 | -167.553833 | -295.911196 | 128.357363 | 2836.116224 | False |
b3f17d7c-be73-11eb-809d-acde48001122 | 60 | 100 | 0.191902 | 4.144514 | 102 | 0.145524 | 141.761624 | 0.037483 | 0.004611 | 20.006234 | ... | GO Bond | True | 72.499003 | 65.869676 | 6.629328 | 98.401992 | -29.955371 | 128.357363 | 2836.116224 | True | |
352 | b2d3b568-be73-11eb-809d-acde48001122 | 60 | 100 | 0.189252 | 4.588903 | 111 | 0.191116 | 105.423753 | 0.038912 | 0.012596 | 70.999408 | ... | Rev Bond | True | 78.330479 | 78.364867 | -0.034388 | -0.729512 | -597.920964 | 597.191453 | 7485.024087 | False |
b3f2fdd2-be73-11eb-809d-acde48001122 | 60 | 100 | 0.189252 | 4.588903 | 111 | 0.191116 | 105.423753 | 0.038912 | 0.012596 | 70.999408 | ... | Rev Bond | True | 78.330479 | 61.563090 | 16.767389 | 355.700934 | -241.490519 | 597.191453 | 7485.024087 | True |
462 rows × 21 columns
These mechanisms can be use to write out results of multiple runs, and to repopulate a database with both valid and invalid raw results. This can be done multiple ways (seperate files, one combined file, keeping track of invalidation queries, etc.). The particular implementations of each are left as an exercise for the reader.