Database Walkthrough¶

[1]:

import os
import numpy as np
import pandas as pd
import seaborn; seaborn.set_theme()
import plotly.io; plotly.io.templates.default = "seaborn"
import emat
import yaml
from emat.util.show_dir import show_dir
from emat.analysis import display_experiments
emat.versions()

emat 0.5.2, plotly 4.14.3

For this walkthrough of database features, we’ll work in a temporary directory. (In real projects you’ll likely want to save your data somewhere less ephemeral, so don’t just copy this tempfile code into your work.)

[2]:

import tempfile
tempdir = tempfile.TemporaryDirectory()
os.chdir(tempdir.name)

We begin our example by populating a database with some experimental data, by creating and running a single design of experiments for the Road Test model.

[3]:

import emat.examples
scope, db, model = emat.examples.road_test()
design = model.design_experiments()
model.run_experiments(design);

Single-Design Datasets¶

Writing Out Raw Data¶

When the database has only a single design of experiments, or if we don’t care about any differentiation between multiple designs that we may have created and ran, we can dump the entire set of model runs, including uncertainties, policy levers, and performance measures, all consolidated into a single pandas DataFrame using the read_experiment_all function. The constants even appear in this DataFrame too, for good measure.

[4]:

df = db.read_experiment_all(scope.name)
df

[4]:

	free_flow_time	initial_capacity	alpha	beta	input_flow	value_of_time	unit_cost_expansion	interest_rate	yield_curve	expand_capacity	amortization_period	debt_type	interest_rate_lock	no_build_travel_time	build_travel_time	time_savings	value_of_time_savings	net_benefits	cost_of_capacity_expansion	present_cost_expansion
experiment
1	60	100	0.184682	5.237143	115	0.059510	118.213466	0.031645	0.015659	18.224793	38	Rev Bond	False	83.038716	69.586789	13.451927	92.059972	-22.290905	114.350877	2154.415985
2	60	100	0.166133	4.121963	129	0.107772	141.322696	0.037612	0.007307	87.525790	36	Paygo	True	88.474313	62.132583	26.341730	366.219659	-16.843014	383.062672	12369.380535
3	60	100	0.198937	4.719838	105	0.040879	97.783320	0.028445	-0.001545	45.698048	44	GO Bond	False	75.027180	62.543328	12.483852	53.584943	-113.988412	167.573355	4468.506839
4	60	100	0.158758	4.915816	113	0.182517	127.224901	0.036234	0.004342	51.297546	42	GO Bond	True	77.370428	62.268768	15.101660	311.462907	11.539561	299.923347	6526.325171
5	60	100	0.157671	3.845952	133	0.067102	107.820482	0.039257	0.001558	22.824149	42	Paygo	False	88.328990	72.848428	15.480561	138.156464	78.036616	60.119848	2460.910705
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
106	60	100	0.169674	4.939898	150	0.131775	112.348054	0.033034	-0.000120	24.215074	34	Rev Bond	True	135.446332	85.847986	49.598345	980.371960	841.462782	138.909178	2720.516457
107	60	100	0.148297	3.824779	110	0.103255	105.248708	0.033437	0.007041	38.013885	22	GO Bond	True	72.811506	63.736168	9.075338	103.078214	-146.712793	249.791007	4000.912327
108	60	100	0.134701	3.627795	144	0.035233	132.063099	0.036702	0.018681	52.155613	32	Paygo	True	90.340993	66.617994	23.722999	120.358653	-112.568104	232.926757	6887.831931
109	60	100	0.176125	4.243675	100	0.188515	97.503468	0.035896	0.016807	38.352260	33	Paygo	True	70.567477	62.664842	7.902635	148.976184	25.480553	123.495631	3739.478399
110	60	100	0.142843	4.486119	145	0.064909	99.837764	0.035372	0.011055	15.851006	46	GO Bond	False	105.386378	83.456509	21.929869	206.401068	127.311542	79.089526	1582.528991

110 rows × 20 columns

Exporting this data is simply a matter of using the usual pandas methods to save the dataframe to a format of your choosing. We’ll save our data into a gzipped CSV file, which is somewhat compressed (we’re not monsters here) but still widely compatible for a variety of uses.

[5]:

df.to_csv("road_test_1.csv.gz")

This table contains most of the information we want to export from our database, but not everything. We also probably want to have access to all of the information in the exploratory scope as well. Our example generator gives us a Scope reference directly, but if we didn’t have that we can still extract it from the database, using the read_scope method.

[6]:

s = db.read_scope()
s

[6]:

<emat.Scope with 2 constants, 7 uncertainties, 4 levers, 7 measures>

[7]:

s.dump(filename="road_test_scope.yaml")

[8]:

show_dir('.')

/
├── road_test_1.csv.gz
└── road_test_scope.yaml

Reading In Raw Data¶

Now, we’re ready to begin anew, constructing a fresh database from scratch, using only the raw formatted files.

First, let’s load our scope from the yaml file, and initialize a clean database using that scope.

[9]:

s2 = emat.Scope("road_test_scope.yaml")

[10]:

db2 = emat.SQLiteDB("road_test_2.sqldb")

[11]:

db2.store_scope(s2)

Just as we used pandas to save out our consolidated DataFrame of experimental results, we can use it to read in a consolidated table of experiments.

[12]:

df2 = pd.read_csv("road_test_1.csv.gz", index_col='experiment')
df2

[12]:

	free_flow_time	initial_capacity	alpha	beta	input_flow	value_of_time	unit_cost_expansion	interest_rate	yield_curve	expand_capacity	amortization_period	debt_type	interest_rate_lock	no_build_travel_time	build_travel_time	time_savings	value_of_time_savings	net_benefits	cost_of_capacity_expansion	present_cost_expansion
experiment
1	60	100	0.184682	5.237143	115	0.059510	118.213466	0.031645	0.015659	18.224793	38	Rev Bond	False	83.038716	69.586789	13.451927	92.059972	-22.290905	114.350877	2154.415985
2	60	100	0.166133	4.121963	129	0.107772	141.322696	0.037612	0.007307	87.525790	36	Paygo	True	88.474313	62.132583	26.341730	366.219659	-16.843014	383.062672	12369.380535
3	60	100	0.198937	4.719838	105	0.040879	97.783320	0.028445	-0.001545	45.698048	44	GO Bond	False	75.027180	62.543328	12.483852	53.584943	-113.988412	167.573355	4468.506839
4	60	100	0.158758	4.915816	113	0.182517	127.224901	0.036234	0.004342	51.297546	42	GO Bond	True	77.370428	62.268768	15.101660	311.462907	11.539561	299.923347	6526.325171
5	60	100	0.157671	3.845952	133	0.067102	107.820482	0.039257	0.001558	22.824149	42	Paygo	False	88.328990	72.848428	15.480561	138.156464	78.036616	60.119848	2460.910705
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
106	60	100	0.169674	4.939898	150	0.131775	112.348054	0.033034	-0.000120	24.215074	34	Rev Bond	True	135.446332	85.847986	49.598345	980.371960	841.462782	138.909178	2720.516457
107	60	100	0.148297	3.824779	110	0.103255	105.248708	0.033437	0.007041	38.013885	22	GO Bond	True	72.811506	63.736168	9.075338	103.078214	-146.712793	249.791007	4000.912327
108	60	100	0.134701	3.627795	144	0.035233	132.063099	0.036702	0.018681	52.155613	32	Paygo	True	90.340993	66.617994	23.722999	120.358653	-112.568104	232.926757	6887.831931
109	60	100	0.176125	4.243675	100	0.188515	97.503468	0.035896	0.016807	38.352260	33	Paygo	True	70.567477	62.664842	7.902635	148.976184	25.480553	123.495631	3739.478399
110	60	100	0.142843	4.486119	145	0.064909	99.837764	0.035372	0.011055	15.851006	46	GO Bond	False	105.386378	83.456509	21.929869	206.401068	127.311542	79.089526	1582.528991

110 rows × 20 columns

Writing experiments to a database is not quite as simple as reading them. There is a parallel write_experiment_all method for the Database class, but to use it we need to provide not only the DataFrame of actual results, but also a name for the design of experiments we are writing (all experiments exist within designs) and the source of the performance measure results (zero means actual results from a core model run, and non-zero values are ID numbers for metamodels). This allows many different possible sets of performance measures to be stored for the same set of input parameters.

[13]:

db2.write_experiment_all(
    scope_name=s2.name,
    design_name='general',
    source=0,
    xlm_df=df2,
)

[14]:

display_experiments(s2, 'general', db=db2, rows=['time_savings'])

Time Savings

Multiple-Design Datasets¶

The EMAT database is not limited to storing a single design of experiments. Multiple designs can be stored for the same scope. We’ll add a set of univariate sensitivity test to our database, and a “ref” design that contains a single experiment with all inputs set to their default values.

[15]:

design_uni = model.design_experiments(sampler='uni')
model.run_experiments(design_uni)
model.run_reference_experiment();

We now have three designs stored in our database. We can confirm this by reading out the set of design names.

[16]:

db.read_design_names(s.name)

[16]:

['lhs', 'ref', 'uni']

The design names we se here are the default names given when designs are created with each of the given samplers. When creating new designs, we can override the default names with other names of our choice using the design_name argument. The names can be any string not already in use.

[17]:

design_b = model.design_experiments(sampler='lhs', design_name='bruce')
db.read_design_names(s.name)

[17]:

['bruce', 'lhs', 'ref', 'uni']

If you try to re-use a name you’ll get an error, as having multiple designs with the same name does not allow you to make it clear which design you are referring to.

[18]:

try:
    model.design_experiments(sampler='lhs', design_name='bruce')
except ValueError as err:
    print(err)

the design "bruce" already exists for scope "EMAT Road Test"

As noted above, the design name, which can be any string, is separate from the sampler method. A default design name based on the name of the sampler method is used if no design name is given. The selected sampler must be one available in EMAT, as the sampler defines a particular logic about how to generate the design.

[19]:

try:
    model.design_experiments(sampler='uni')
except ValueError as err:
    print(err)

Note that there can be some experiments that are in more than one design. This is not merely duplicating the experiment and results, but actually assigning the same experiment to both designs. We can see this for the ‘uni’ and ‘ref’ designs – both contain the all-default parameters experiment, and when we read these designs out of the database, the same experiment number is reported out in both designs.

[20]:

db.read_experiment_all(scope.name, design_name='uni').head()

[20]:

	free_flow_time	initial_capacity	alpha	beta	input_flow	value_of_time	unit_cost_expansion	interest_rate	yield_curve	expand_capacity	amortization_period	debt_type	interest_rate_lock	no_build_travel_time	build_travel_time	time_savings	value_of_time_savings	net_benefits	cost_of_capacity_expansion	present_cost_expansion
experiment
111	60	100	0.15	4.0	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	69.0	66.147121	2.852879	21.396592	-30.807346	52.203937	1000.0
112	60	100	0.10	4.0	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	66.0	64.098081	1.901919	14.264395	-37.939543	52.203937	1000.0
113	60	100	0.20	4.0	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	72.0	68.196161	3.803839	28.528789	-23.675148	52.203937	1000.0
114	60	100	0.15	3.5	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	69.0	66.447155	2.552845	19.146338	-33.057600	52.203937	1000.0
115	60	100	0.15	5.5	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	69.0	65.328227	3.671773	27.538295	-24.665642	52.203937	1000.0

[21]:

db.read_experiment_all(scope.name, design_name='ref')

[21]:

	free_flow_time	initial_capacity	alpha	beta	input_flow	value_of_time	unit_cost_expansion	interest_rate	yield_curve	expand_capacity	amortization_period	debt_type	interest_rate_lock	no_build_travel_time	build_travel_time	time_savings	value_of_time_savings	net_benefits	cost_of_capacity_expansion	present_cost_expansion
experiment
111	60	100	0.15	4.0	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	69.0	66.147121	2.852879	21.396592	-30.807346	52.203937	1000.0

One “gotcha” to be wary of is unintentionally replicating experiments. By default, the random_seed for randomly generated experiemnts is set to 0 for reproducibility. This means that, for example, the ‘bruce’ design is actually the same as the original ‘lhs’ design:

[22]:

db.read_experiment_all(scope.name, design_name='lhs').equals(
    db.read_experiment_all(scope.name, design_name='bruce')
)

[22]:

True

If we want a new set of random experiments with the same sampler and other parameters, we’ll need to provide a different random_seed.

[23]:

design_b = model.design_experiments(sampler='lhs', design_name='new_bruce', random_seed=42)
db.read_experiment_all(scope.name, design_name='lhs').equals(design_b)

[23]:

False

Writing Out Raw Data¶

We can read a single dataframe containing all the experiments associated with this scope by omitting the design_name argument, just as if there was only one design.

[24]:

df = db.read_experiment_all(scope.name)
df

[24]:

	free_flow_time	initial_capacity	alpha	beta	input_flow	value_of_time	unit_cost_expansion	interest_rate	yield_curve	expand_capacity	amortization_period	debt_type	interest_rate_lock	no_build_travel_time	build_travel_time	time_savings	value_of_time_savings	net_benefits	cost_of_capacity_expansion	present_cost_expansion
experiment
1	60	100	0.184682	5.237143	115	0.059510	118.213466	0.031645	0.015659	18.224793	38	Rev Bond	False	83.038716	69.586789	13.451927	92.059972	-22.290905	114.350877	2154.415985
2	60	100	0.166133	4.121963	129	0.107772	141.322696	0.037612	0.007307	87.525790	36	Paygo	True	88.474313	62.132583	26.341730	366.219659	-16.843014	383.062672	12369.380535
3	60	100	0.198937	4.719838	105	0.040879	97.783320	0.028445	-0.001545	45.698048	44	GO Bond	False	75.027180	62.543328	12.483852	53.584943	-113.988412	167.573355	4468.506839
4	60	100	0.158758	4.915816	113	0.182517	127.224901	0.036234	0.004342	51.297546	42	GO Bond	True	77.370428	62.268768	15.101660	311.462907	11.539561	299.923347	6526.325171
5	60	100	0.157671	3.845952	133	0.067102	107.820482	0.039257	0.001558	22.824149	42	Paygo	False	88.328990	72.848428	15.480561	138.156464	78.036616	60.119848	2460.910705
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
238	60	100	0.164649	4.851786	104	0.012456	127.667298	0.025673	0.012674	24.114264	30	GO Bond	True	NaN	NaN	NaN	NaN	NaN	NaN	NaN
239	60	100	0.113568	4.545709	105	0.137105	139.911784	0.028280	0.016936	92.883345	15	GO Bond	True	NaN	NaN	NaN	NaN	NaN	NaN	NaN
240	60	100	0.184434	4.732430	144	0.100242	99.904559	0.030392	0.019534	84.793007	15	GO Bond	True	NaN	NaN	NaN	NaN	NaN	NaN	NaN
241	60	100	0.146598	3.616301	107	0.065130	135.593036	0.035733	0.002085	29.107628	34	Paygo	True	NaN	NaN	NaN	NaN	NaN	NaN	NaN
242	60	100	0.193327	5.402220	134	0.048285	140.945872	0.026210	-0.001255	26.285726	44	Rev Bond	False	NaN	NaN	NaN	NaN	NaN	NaN	NaN

242 rows × 20 columns

This dataframe is different than the one we saw earlier with the same command, as we have since added a few more experiments to the database in a few different designs. If we don’t give a design_name argument, we’ll retrieve every (unique) experiment from every design currently stored in the database.

[25]:

df.to_csv("road_test_2.csv.gz")

If we want to be able to reconstruct the various designs of experiments later, we’ll also need to write out instructions for that. The read_all_experiment_ids method can give us a dictionary of all the relevant information.

[26]:

design_experiments = db.read_all_experiment_ids(scope.name, design_name='*',grouped=True)
design_experiments

[26]:

{'bruce': '1-110',
 'lhs': '1-110',
 'new_bruce': '133-242',
 'ref': '111',
 'uni': '111-132',
 'uni_2': '111-132'}

We can write this dictionary to a file in ‘yaml’ format.

[27]:

with open("road_test_design_experiments.yaml", 'wt') as f:
    yaml.dump(design_experiments, f)

Reading In Raw Data¶

To construct a new emat Database with multiple designs of experients,…

[28]:

db3 = emat.SQLiteDB("road_test_3.sqldb")
db3.store_scope(s2)

[29]:

df3 = pd.read_csv("road_test_2.csv.gz", index_col='experiment')
df3

[29]:

	free_flow_time	initial_capacity	alpha	beta	input_flow	value_of_time	unit_cost_expansion	interest_rate	yield_curve	expand_capacity	amortization_period	debt_type	interest_rate_lock	no_build_travel_time	build_travel_time	time_savings	value_of_time_savings	net_benefits	cost_of_capacity_expansion	present_cost_expansion
experiment
1	60	100	0.184682	5.237143	115	0.059510	118.213466	0.031645	0.015659	18.224793	38	Rev Bond	False	83.038716	69.586789	13.451927	92.059972	-22.290905	114.350877	2154.415985
2	60	100	0.166133	4.121963	129	0.107772	141.322696	0.037612	0.007307	87.525790	36	Paygo	True	88.474313	62.132583	26.341730	366.219659	-16.843014	383.062672	12369.380535
3	60	100	0.198937	4.719838	105	0.040879	97.783320	0.028445	-0.001545	45.698048	44	GO Bond	False	75.027180	62.543328	12.483852	53.584943	-113.988412	167.573355	4468.506839
4	60	100	0.158758	4.915816	113	0.182517	127.224901	0.036234	0.004342	51.297546	42	GO Bond	True	77.370428	62.268768	15.101660	311.462907	11.539561	299.923347	6526.325171
5	60	100	0.157671	3.845952	133	0.067102	107.820482	0.039257	0.001558	22.824149	42	Paygo	False	88.328990	72.848428	15.480561	138.156464	78.036616	60.119848	2460.910705
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
238	60	100	0.164649	4.851786	104	0.012456	127.667298	0.025673	0.012674	24.114264	30	GO Bond	True	NaN	NaN	NaN	NaN	NaN	NaN	NaN
239	60	100	0.113568	4.545709	105	0.137105	139.911784	0.028280	0.016936	92.883345	15	GO Bond	True	NaN	NaN	NaN	NaN	NaN	NaN	NaN
240	60	100	0.184434	4.732430	144	0.100242	99.904559	0.030392	0.019534	84.793007	15	GO Bond	True	NaN	NaN	NaN	NaN	NaN	NaN	NaN
241	60	100	0.146598	3.616301	107	0.065130	135.593036	0.035733	0.002085	29.107628	34	Paygo	True	NaN	NaN	NaN	NaN	NaN	NaN	NaN
242	60	100	0.193327	5.402220	134	0.048285	140.945872	0.026210	-0.001255	26.285726	44	Rev Bond	False	NaN	NaN	NaN	NaN	NaN	NaN	NaN

242 rows × 20 columns

[30]:

with open("road_test_design_experiments.yaml", 'rt') as f:
    design_experiments2 = yaml.safe_load(f)
design_experiments2

[30]:

{'bruce': '1-110',
 'lhs': '1-110',
 'new_bruce': '133-242',
 'ref': '111',
 'uni': '111-132',
 'uni_2': '111-132'}

[31]:

db3.write_experiment_all(
    scope_name=s2.name,
    design_name=design_experiments2,
    source=0,
    xlm_df=df3,
)

[32]:

db3.read_design_names(s.name)

[32]:

['bruce', 'lhs', 'new_bruce', 'ref', 'uni', 'uni_2']

[33]:

db3.read_all_experiment_ids(scope.name, design_name='*',grouped=True)

[33]:

{'bruce': '1-110',
 'lhs': '1-110',
 'new_bruce': '133-242',
 'ref': '111',
 'uni': '111-132',
 'uni_2': '111-132'}

[34]:

db3.read_experiment_all(scope.name, design_name='uni').head()

[34]:

	free_flow_time	initial_capacity	alpha	beta	input_flow	value_of_time	unit_cost_expansion	interest_rate	yield_curve	expand_capacity	amortization_period	debt_type	interest_rate_lock	no_build_travel_time	build_travel_time	time_savings	value_of_time_savings	net_benefits	cost_of_capacity_expansion	present_cost_expansion
experiment
111	60	100	0.15	4.0	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	69.0	66.147121	2.852879	21.396592	-30.807346	52.203937	1000.0
112	60	100	0.10	4.0	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	66.0	64.098081	1.901919	14.264395	-37.939543	52.203937	1000.0
113	60	100	0.20	4.0	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	72.0	68.196161	3.803839	28.528789	-23.675148	52.203937	1000.0
114	60	100	0.15	3.5	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	69.0	66.447155	2.552845	19.146338	-33.057600	52.203937	1000.0
115	60	100	0.15	5.5	100	0.075	100.0	0.03	0.01	10.0	30	GO Bond	False	69.0	65.328227	3.671773	27.538295	-24.665642	52.203937	1000.0

Re-running Experiments¶

This section provides a short walkthrough of how to handle mistakes in an EMAT database. By “mistakes” we are referring to incorrect values that have been written into the database by accident, generally arising from core model runs that were misconfigured or suffered non-fatal errors that caused the results to be invalid.

One approach to handling such problems is to simply start over with a brand new clean database file. However, this may be inconvenient if the database already includes a number of valid results, especially if those valid results were expensive to generate. It may also be desirable to keep prior invalid results on hand, so as to easily recognized when errors recur.

We begin this example by populating our database with some more experimental data, by creating and running a single design of experiments for the Road Test model, except these experiments will be created with a misconfigured model (lane_width = 11, it should be 10), so the results will be bad. (In general, you probably won’t intentionally create corrupt data, but we’re doing so here for expository purposes, so we’ll give this design a name of ‘oops’ so we can readily recall what we’ve done.)

[35]:

model.lane_width = 10.3
oops = model.design_experiments(design_name='oops', random_seed=12345)
model.run_experiments(oops);

We can review a dataframe of results as before, using the read_experiment_all method. This time we will add with_run_ids=True, which will add an extra column to the index, showing a universally unique id attached to each row of results.

[36]:

oops_result1 = db.read_experiment_all(scope.name, 'oops', with_run_ids=True)
oops_result1.head()

[36]:

		free_flow_time	initial_capacity	alpha	beta	input_flow	value_of_time	unit_cost_expansion	interest_rate	yield_curve	expand_capacity	amortization_period	debt_type	interest_rate_lock	no_build_travel_time	build_travel_time	time_savings	value_of_time_savings	net_benefits	cost_of_capacity_expansion	present_cost_expansion
experiment	run
243	b2358b5e-be73-11eb-809d-acde48001122	60	100	0.157955	4.270404	112	0.151487	116.445987	0.029784	0.018968	50.163393	38	Rev Bond	False	75.376818	79.495869	-4.119051	-69.886026	-381.303111	311.417085	5841.325841
244	b2376db6-be73-11eb-809d-acde48001122	60	100	0.165144	4.008218	134	0.039670	102.879244	0.032505	0.000708	57.330143	40	Rev Bond	False	92.024075	83.059349	8.964726	47.654780	-220.075486	267.730266	5898.081794
245	b238ff32-be73-11eb-809d-acde48001122	60	100	0.103964	4.019636	92	0.100936	131.282035	0.038432	0.002658	64.949710	39	GO Bond	True	64.461446	77.449591	-12.988145	-120.608948	-520.723349	400.114402	8526.730135
246	b23a6b42-be73-11eb-809d-acde48001122	60	100	0.172162	4.920619	144	0.135484	95.348634	0.025803	0.008870	77.498534	26	Paygo	False	122.134147	80.716842	41.417305	808.041389	513.014691	295.026699	7389.379292
247	b23bdc66-be73-11eb-809d-acde48001122	60	100	0.136954	4.901048	100	0.109919	114.310247	0.027538	0.016733	46.312847	41	Paygo	False	68.217253	78.289223	-10.071970	-110.710261	-276.303370	165.593109	5294.033017

[37]:

display_experiments(scope, oops_result1, rows=['time_savings'])

Time Savings

Some of these results are obviously problematic. Increasing capacity cannot possibly result in a negative travel time savings. (Braess paradox doesn’t apply here because it’s just one link, not a network.) So those negative values are clearly wrong. We can fix the model so they won’t be wrong, but by default the run_experiments method won’t actually re-run models when the results are already available in the database. To solve this conundrum, we can mark the incorrect results as invalid, using a query to pull out the rows that can be flagged as wrong.

[38]:

db.invalidate_experiment_runs(
    queries=['time_savings < 0']
)

[38]:

[73]

The [73] returned here indicates that 73 sets of results were invalidated by this command. The invalidation command actually sets a “valid” flag in the database to False for these experiment runs, so that a persistant record that they are bad is stored in the database. Now we can fix our model, and then use the run_experiments method to get new model runs for the invalidated results.

[39]:

model.lane_width = 10
oops_result2 = model.run_experiments(oops)

[40]:

display_experiments(scope, 'oops', db=db, rows=['time_savings'])

Time Savings

The re-run fixed the negative values, although it left in place the other experimental runs in the database. By the way we constructed this example, we know those are wrong too, and it’s evident in the apparent discontinuity in the input flow graph, which we can zoom in on.

[41]:

ax = oops_result2.plot.scatter(x='input_flow', y='time_savings', color='r')
ax.plot([109,135], [0,35], '--',color='y');

../../_images/source_emat.database_database-walkthrough_71_0.png

Those original results are bad too, and we want to invalidate them as well. In addition to giving conditional queries to the invalidate_experiment_runs method, we can also give a dataframe of results that have run ids attached, and those unique ids will be used to to find and invalidate results in the database. Here, we pass in the dataframe of all the results, which contains all 110 runs, but only 37 runs are newly invalidated (77 were invalidated previously).

[42]:

db.invalidate_experiment_runs(
    oops_result1
)

[42]:

Now when we run the experiments again, those 37 experiments are re-run.

[43]:

oops_result3 = model.run_experiments(oops)

[44]:

display_experiments(scope, 'lhs', db=db, rows=['time_savings'])

Time Savings

Writing Out All Runs¶

By default, the read_experiment_all method returns the most recent valid set of performance measures for each experiment, but we can override this behavior to ask for 'all' run results, or all 'valid' or 'invalid' results, by setting the runs argument to those literal values. This allows us to easily write out data files containing all the results stored in the database.

[45]:

db.read_experiment_all(scope.name, with_run_ids=True, runs='all')

[45]:

		free_flow_time	initial_capacity	alpha	beta	input_flow	value_of_time	unit_cost_expansion	interest_rate	yield_curve	expand_capacity	amortization_period	debt_type	interest_rate_lock	no_build_travel_time	build_travel_time	time_savings	value_of_time_savings	net_benefits	cost_of_capacity_expansion	present_cost_expansion
experiment	run
1	af00ae82-be73-11eb-809d-acde48001122	60	100	0.184682	5.237143	115	0.059510	118.213466	0.031645	0.015659	18.224793	38	Rev Bond	False	83.038716	69.586789	13.451927	92.059972	-22.290905	114.350877	2154.415985
2	af0320e0-be73-11eb-809d-acde48001122	60	100	0.166133	4.121963	129	0.107772	141.322696	0.037612	0.007307	87.525790	36	Paygo	True	88.474313	62.132583	26.341730	366.219659	-16.843014	383.062672	12369.380535
3	af04aaaa-be73-11eb-809d-acde48001122	60	100	0.198937	4.719838	105	0.040879	97.783320	0.028445	-0.001545	45.698048	44	GO Bond	False	75.027180	62.543328	12.483852	53.584943	-113.988412	167.573355	4468.506839
4	af0628e4-be73-11eb-809d-acde48001122	60	100	0.158758	4.915816	113	0.182517	127.224901	0.036234	0.004342	51.297546	42	GO Bond	True	77.370428	62.268768	15.101660	311.462907	11.539561	299.923347	6526.325171
5	af07aea8-be73-11eb-809d-acde48001122	60	100	0.157671	3.845952	133	0.067102	107.820482	0.039257	0.001558	22.824149	42	Paygo	False	88.328990	72.848428	15.480561	138.156464	78.036616	60.119848	2460.910705
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
350	b3f00848-be73-11eb-809d-acde48001122	60	100	0.127109	4.855843	103	0.130109	110.140882	0.039189	0.016098	69.050648	31	GO Bond	False	68.803652	60.687779	8.115873	108.762921	-345.671302	454.434223	7605.299268
351	b2d23300-be73-11eb-809d-acde48001122	60	100	0.191902	4.144514	102	0.145524	141.761624	0.037483	0.004611	20.006234	45	GO Bond	True	72.499003	83.787080	-11.288077	-167.553833	-295.911196	128.357363	2836.116224
351	b3f17d7c-be73-11eb-809d-acde48001122	60	100	0.191902	4.144514	102	0.145524	141.761624	0.037483	0.004611	20.006234	45	GO Bond	True	72.499003	65.869676	6.629328	98.401992	-29.955371	128.357363	2836.116224
352	b2d3b568-be73-11eb-809d-acde48001122	60	100	0.189252	4.588903	111	0.191116	105.423753	0.038912	0.012596	70.999408	16	Rev Bond	True	78.330479	78.364867	-0.034388	-0.729512	-597.920964	597.191453	7485.024087
352	b3f2fdd2-be73-11eb-809d-acde48001122	60	100	0.189252	4.588903	111	0.191116	105.423753	0.038912	0.012596	70.999408	16	Rev Bond	True	78.330479	61.563090	16.767389	355.700934	-241.490519	597.191453	7485.024087

462 rows × 20 columns

In the resulting dataframe above, we can see that we have retrieved two different runs for some of the experiments. Only one of them is valid for each. If we want to get all the stored runs and also mark the valid and invalid runs, we can read them seperately and attach a tag to the two dataframes.

[46]:

runs_1 = db.read_experiment_all(scope.name, with_run_ids=True, runs='valid')
runs_1['is_valid'] = True
runs_0 = db.read_experiment_all(scope.name, with_run_ids=True, runs='invalid', only_with_measures=True)
runs_0['is_valid'] = False
all_runs = pd.concat([runs_1, runs_0])
all_runs.sort_index()

[46]:

		free_flow_time	initial_capacity	alpha	beta	input_flow	value_of_time	unit_cost_expansion	interest_rate	yield_curve	expand_capacity	...	debt_type	interest_rate_lock	no_build_travel_time	build_travel_time	time_savings	value_of_time_savings	net_benefits	cost_of_capacity_expansion	present_cost_expansion	is_valid
experiment	run
1	af00ae82-be73-11eb-809d-acde48001122	60	100	0.184682	5.237143	115	0.059510	118.213466	0.031645	0.015659	18.224793	...	Rev Bond	False	83.038716	69.586789	13.451927	92.059972	-22.290905	114.350877	2154.415985	True
2	af0320e0-be73-11eb-809d-acde48001122	60	100	0.166133	4.121963	129	0.107772	141.322696	0.037612	0.007307	87.525790	...	Paygo	True	88.474313	62.132583	26.341730	366.219659	-16.843014	383.062672	12369.380535	True
3	af04aaaa-be73-11eb-809d-acde48001122	60	100	0.198937	4.719838	105	0.040879	97.783320	0.028445	-0.001545	45.698048	...	GO Bond	False	75.027180	62.543328	12.483852	53.584943	-113.988412	167.573355	4468.506839	True
4	af0628e4-be73-11eb-809d-acde48001122	60	100	0.158758	4.915816	113	0.182517	127.224901	0.036234	0.004342	51.297546	...	GO Bond	True	77.370428	62.268768	15.101660	311.462907	11.539561	299.923347	6526.325171	True
5	af07aea8-be73-11eb-809d-acde48001122	60	100	0.157671	3.845952	133	0.067102	107.820482	0.039257	0.001558	22.824149	...	Paygo	False	88.328990	72.848428	15.480561	138.156464	78.036616	60.119848	2460.910705	True
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
350	b3f00848-be73-11eb-809d-acde48001122	60	100	0.127109	4.855843	103	0.130109	110.140882	0.039189	0.016098	69.050648	...	GO Bond	False	68.803652	60.687779	8.115873	108.762921	-345.671302	454.434223	7605.299268	True
351	b2d23300-be73-11eb-809d-acde48001122	60	100	0.191902	4.144514	102	0.145524	141.761624	0.037483	0.004611	20.006234	...	GO Bond	True	72.499003	83.787080	-11.288077	-167.553833	-295.911196	128.357363	2836.116224	False
351	b3f17d7c-be73-11eb-809d-acde48001122	60	100	0.191902	4.144514	102	0.145524	141.761624	0.037483	0.004611	20.006234	...	GO Bond	True	72.499003	65.869676	6.629328	98.401992	-29.955371	128.357363	2836.116224	True
352	b2d3b568-be73-11eb-809d-acde48001122	60	100	0.189252	4.588903	111	0.191116	105.423753	0.038912	0.012596	70.999408	...	Rev Bond	True	78.330479	78.364867	-0.034388	-0.729512	-597.920964	597.191453	7485.024087	False
352	b3f2fdd2-be73-11eb-809d-acde48001122	60	100	0.189252	4.588903	111	0.191116	105.423753	0.038912	0.012596	70.999408	...	Rev Bond	True	78.330479	61.563090	16.767389	355.700934	-241.490519	597.191453	7485.024087	True

462 rows × 21 columns

These mechanisms can be use to write out results of multiple runs, and to repopulate a database with both valid and invalid raw results. This can be done multiple ways (seperate files, one combined file, keeping track of invalidation queries, etc.). The particular implementations of each are left as an exercise for the reader.