Running Experiments
tl;dr
When you run a VisionEval model, it takes a bunch of files as input, it does some stuff, and then it gives you a bunch of files as output. To run an experiment from EMAT, we need to set up the input files to reflect the values of policy levers and exogenous uncertainties for that experiment, run the VisionEval model to get the outputs, then extract whatever performance measures we want from those outputs and feed them back to EMAT.
The idea behind EMAT is to run a number of experiments, and then analyze the results of those experiments. The number of experiments that needs to be run is a function of the complexity of the EMAT scope, but in general it is more experiments than a user would want to run manually. Thus, the EMAT toolset is designed to automate the process of running experiments.
When working with VisionEval, at least as defined in this demonstration repository, we will be treating the VisionEval model as a "files-based core model". Doing so requires a few steps for each experiment:
- Prepare the input files for the VisionEval model, based on the values of policy levers and exogenous uncertainties defined for the experiment.
- Run the VisionEval model, using the input files that have been prepared.
- (Optional) Run any post-processing steps that are needed to extract the results of the experiment from the output files of the VisionEval model.
- Collect the output files from the VisionEval model and parse them to extract the results of the experiment.
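Conceptually, EMAT repeats these steps for every experiment in a design. The sketch below illustrates that loop using the method names from the `FilesCoreModel` interface described in this document; the hand-written design and the `load_measures` call are illustrative assumptions, not the literal EMAT internals.

```python
# A conceptual sketch of the per-experiment loop that EMAT automates.
# `model` would be an instance of the VEModel class described below.
hypothetical_design = [
    {"CARSVCAVAILSCEN": "low", "LUDENSITYMIX": 0.25},
    {"CARSVCAVAILSCEN": "high", "LUDENSITYMIX": 0.75},
]

def run_one_experiment(model, params):
    model.setup(params)           # 1. prepare the VisionEval input files
    model.run()                   # 2. run the VisionEval model
    model.post_process(params)    # 3. (optional) post-process the raw outputs
    return model.load_measures()  # 4. parse outputs into performance measures
```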
Each of these steps is encapsulated in a Python function that is part of the `FilesCoreModel` interface. In the implementation code, you will see a class that is a subclass of `FilesCoreModel`, and that class will define the specific steps needed to prepare the input files, run the model, and extract the results.
from emat.model.core_files import FilesCoreModel
class VEModel(FilesCoreModel): # (1)!
"""
A class for using Vision Eval as a files core model.
"""
...
1. The `VEModel` class is a subclass of `FilesCoreModel`, which defines the specific steps needed to prepare the input files, run the model, and extract the results.

You can see some examples of the `FilesCoreModel` interface here and here.
The process for creating a new analysis with EMAT and VisionEval includes creating a similar class that is a subclass of `FilesCoreModel`, and then defining the specific methods needed to carry out the steps of the integration. This can be done from scratch, or by copying and modifying an existing example.
Setting Up an Experiment
Each experiment involves making a complete copy of the VisionEval model in a contained environment, and then modifying the input files for that copy of the VisionEval model to reflect the specific values of policy levers and exogenous uncertainties for that experiment. The `FilesCoreModel` interface defines the `setup` method as the place to create a new copy of the VisionEval model in a contained environment, and then modify the input files for that copy. The `setup` method needs to be overloaded in a subclass of `FilesCoreModel` to define the specific steps needed to modify the input files for the experiment.
class VEModel(FilesCoreModel):
...
def setup(self, params: dict): # (1)!
"""
Configure the core model with the experiment variable values.
Args:
params (dict):
experiment variables including both exogenous
uncertainty and policy levers
Raises:
KeyError:
if a defined experiment variable is not supported
by the core model
"""
1. The `setup` method accepts a dictionary of parameters, which includes the values of policy levers and exogenous uncertainties for the experiment.

Within the `setup` method, the subclass of `FilesCoreModel` will need to make a complete copy of the VisionEval model in a contained environment, and then modify the input files for that copy of the VisionEval model to reflect the specific values of policy levers and exogenous uncertainties for that experiment.
There are numerous possible ways to prepare the input files for the VisionEval model, depending on the exploratory scope and the types of inputs that need to be modified. This demo repository includes a few different examples of how to prepare input files based on the scope:
- Categorical Drop-In
- Mixture of Data Tables
- Scaling Data Tables
- Additive Data Tables
- Template Injection
- Direct Injection
- Custom Methods
Each of these methods can be implemented in a bespoke manner for each specific input parameter (both policy levers and exogenous uncertainties), or you can use generic methods that can be applied to a wide range of input parameters. The generic approach is shown in the example repositories.
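For orientation, here is a minimal sketch of what a generic `setup` implementation might look like, assuming `_manipulate_*` sub-methods like the ones described in the sections below. The copy step and the `master_model_path` attribute are illustrative assumptions, not the exact code from the example repositories.

```python
import shutil
from emat.model.core_files import FilesCoreModel

class VEModel(FilesCoreModel):
    ...

    def setup(self, params: dict):
        # Make a fresh, contained copy of the VisionEval model for this
        # experiment. `master_model_path` is a hypothetical attribute naming
        # the pristine model; `resolved_model_path` appears in the methods below.
        shutil.copytree(
            self.master_model_path,
            self.resolved_model_path,
            dirs_exist_ok=True,
        )
        # Rewrite the input files to reflect this experiment's values.
        self._manipulate_carsvcavail(params)  # categorical drop-in (see below)
        self._manipulate_ludensity(params)    # mixture of data tables (see below)
```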
Categorical Drop-In
Many of the input files for VisionEval are in the form of CSV files. The simplest way to actuate a change in the input files is to simply select an entire file that has the desired values, and copy that file into the requisite input location. This is limited to categorical inputs, which are inputs that can be represented as discrete categorical values. For example, you may have two different population projections, one that represents scenario "A" where a particular brownfield area is cleaned up and developed, and another that represents scenario "B" where the brownfield is left as is. Under this policy lever, it doesn't make sense to have an intermediate value ("we'll just clean up part of the toxic waste, and let only a few people move in").
An advantage of this method is that it is simple to implement, and it places no limits on the format of the input files. There is no need to have a specific format or a matching number of rows or columns in the input files. In the population projection example considered above, the input files for the two scenarios could have different numbers of rows as one of the two scenarios could imply a different zonal structure within the region.
In this example repository, this approach is called the "categorical drop-in" method. The VisionEval model will either use input file "A" or input file "B", but not a mix of the two. This is expressed in the code by the `_manipulate_by_categorical_drop_in` method, which is a method of the `FilesCoreModel` subclass.
def _manipulate_by_categorical_drop_in(
self,
params: dict, # (1)!
cat_param: str, # (2)!
ve_scenario_dir: os.PathLike, # (3)!
):
scenario_dir = params[cat_param]
for i in os.scandir(scenario_input(ve_scenario_dir, scenario_dir)): # (4)!
if i.is_file():
shutil.copyfile(
scenario_input(ve_scenario_dir, scenario_dir, i.name),
join_norm(self.resolved_model_path, "inputs", i.name),
)
1. The `params` dictionary is passed through to the `_manipulate_by_categorical_drop_in` method. This dictionary includes the values of all the policy levers and exogenous uncertainties for the experiment.
2. The `cat_param` argument is the name of the parameter in the `params` dictionary that is the categorical drop-in.
3. The `ve_scenario_dir` argument is the directory where the categorical input files for the categorical drop-in are stored.
4. The `_manipulate_by_categorical_drop_in` method will scan the appropriate directory where the categorical input files are stored, and copy the input files for the selected categorical value into the requisite input location for the VisionEval model.
This method is in turn called from individual `setup` sub-methods, which will define the specific input parameters that are categorical drop-ins. For example, the `_manipulate_carsvcavail` method can define the specific input parameters that are categorical drop-ins for car service availability inputs.
def _manipulate_carsvcavail(self, params):
return self._manipulate_by_categorical_drop_in(
params, # (1)!
"CARSVCAVAILSCEN", # (2)!
self.scenario_input_dirs.get("CARSVCAVAILSCEN"), # (3)!
)
1. The `params` dictionary is passed through to the `_manipulate_by_categorical_drop_in` method.
2. The second argument to the `_manipulate_by_categorical_drop_in` method is the name of the parameter in the `params` dictionary that is the categorical drop-in, in this case the `CARSVCAVAILSCEN` parameter.
3. The third argument to the `_manipulate_by_categorical_drop_in` method is the directory where the categorical input files for the categorical drop-in are stored.
You will find this function mirrored in the EMAT exploratory scope definition, where the categorical drop-in is defined as an uncertainty.
inputs:
CARSVCAVAILSCEN:
shortname: Car Service Availability
address: CARSVCAVAILSCEN
ptype: exogenous uncertainty
dtype: cat # (1)!
desc: Different levels of car service availability
default: mid # (2)!
values: # (3)!
- low
- mid
- high
1. The `dtype` is set to `cat` to indicate that this is a categorical input, which can only take on one of a discrete set of values.
2. The `default` value is set to `mid`, which will be the selected value for this parameter if no other value is specified.
3. The `values` list defines the discrete set of values that this parameter can take on. These should be strings, so that we can match against sub-directory names in the Scenario-Inputs directory of the VisionEval model.
This structure also requires each categorical drop-in to have a corresponding directory in the Scenario-Inputs directory of the VisionEval model, where the input file(s) for each categorical drop-in are stored. Note that there is a directory matching each categorical value, and within that directory are the input files that are to be used when that categorical value is selected. Generally, the names of the input files will be the same across all categorical values, as shown here.
📁 Scenario-Inputs/
└── 📁 OTP/
├── 📁 ANOTHER_PARAMETER/
├── 📁 CARSVCAVAILSCEN/
│ ├── 📁 low/
│ │ └── 📄 marea_carsvc_availability.csv
│ ├── 📁 mid/
│ │ └── 📄 marea_carsvc_availability.csv
│ └── 📁 high/
│ └── 📄 marea_carsvc_availability.csv
└── 📁 OTHER_PARAMETER/
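The `self.scenario_input_dirs` mapping used by the sub-methods above ties each scope parameter name to its directory of scenario inputs. A hypothetical illustration of how it might be populated is shown below; the actual initialization in the example repositories, and whether the values include the full path, may differ.

```python
from emat.model.core_files import FilesCoreModel

class VEModel(FilesCoreModel):
    ...

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Illustrative only: each key is a scope parameter name, each value
        # names the directory of scenario inputs for that parameter.
        self.scenario_input_dirs = {
            "CARSVCAVAILSCEN": "CARSVCAVAILSCEN",
            "LUDENSITYMIX": "LUDENSITYMIX",
        }
```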
Mixture of Data Tables
In contrast to the categorical drop-in method, the "mixture of data tables" method allows for creating "intermediate" input files that are a mix of different input files. The approach is suitable for continuous inputs, which are inputs that can take on a range of values. For example, you may have a land use density projection that has upper and lower bounds, and you want to explore the effects of different levels of density between those limits.
An advantage of this method is that it allows for a more fine-grained exploration of the input space, and it can be used for continuous inputs. However, it does require that the input files have a specific format (a CSV table containing primarily numeric data), and that the number of rows and columns match across both input files, which are labeled "1" and "2" in this example.
Instead of copying an entire file, the mixture of data tables method will read in both input files, and then linearly interpolate between the two input files based on the value of the policy lever or exogenous uncertainty. This is expressed in the code by the `_manipulate_by_mixture` method, which is a method of the `FilesCoreModel` subclass.
def _manipulate_by_mixture(
self,
params, # (1)!
weight_param, # (2)!
ve_scenario_dir, # (3)!
no_mix_cols=(
"Year",
"Geo",
), # (4)!
float_dtypes=False, # (5)!
):
weight_2 = params[weight_param]
weight_1 = 1.0 - weight_2
# Gather list of all files in directory "1", and confirm they
# are also in directory "2"
filenames = []
for i in os.scandir(scenario_input(ve_scenario_dir, "1")):
if i.is_file():
filenames.append(i.name)
f2 = scenario_input(ve_scenario_dir, "2", i.name)
if not os.path.exists(f2):
raise FileNotFoundError(f2)
for filename in filenames:
df1 = pd.read_csv(scenario_input(ve_scenario_dir, "1", filename))
isna_ = (df1.isnull().values).any()
df1.fillna(0, inplace=True) # (6)!
df2 = pd.read_csv(scenario_input(ve_scenario_dir, "2", filename))
df2.fillna(0, inplace=True)
float_mix_cols = list(df1.select_dtypes("float").columns)
if float_dtypes:
float_mix_cols = float_mix_cols + list(df1.select_dtypes("int").columns)
for j in no_mix_cols:
if j in float_mix_cols:
float_mix_cols.remove(j)
if float_mix_cols:
df1_float = df1[float_mix_cols]
df2_float = df2[float_mix_cols]
df1[float_mix_cols] = df1_float * weight_1 + df2_float * weight_2
int_mix_cols = list(df1.select_dtypes("int").columns)
if float_dtypes:
int_mix_cols = list()
for j in no_mix_cols:
if j in int_mix_cols:
int_mix_cols.remove(j)
if int_mix_cols:
df1_int = df1[int_mix_cols]
df2_int = df2[int_mix_cols]
df_int_mix = df1_int * weight_1 + df2_int * weight_2
df1[int_mix_cols] = np.round(df_int_mix).astype(int) # (7)!
out_filename = join_norm(self.resolved_model_path, "inputs", filename)
if isna_:
df1.replace(0, np.nan, inplace=True)
df1.to_csv(out_filename, index=False, float_format="%.5f", na_rep="NA")
1. The `params` dictionary is passed through to the `_manipulate_by_mixture` method.
2. The `weight_param` argument is the name of the parameter in the `params` dictionary that is the weight for the mixture of data tables.
3. The `ve_scenario_dir` argument is the directory where the input files for the mixture of data tables are stored. There should be two subdirectories, "1" and "2".
4. The `no_mix_cols` argument is a list of column names that should not be mixed. This is useful for columns that are not numerical, such as year or geography, for which there is no reasonable linear interpolation. These columns will be copied from the input file in directory "1" to the output file.
5. The `float_dtypes` argument is a boolean that indicates whether integer columns should be treated as float columns for the purposes of mixing. Setting this to `True` will treat integer columns as float columns, and will mix them as such, which can be problematic if VisionEval is expecting integers.
6. The `isna_` variable is set to `True` if there are any `NaN` values in the input file. If there are, these will be replaced with zeros for the purposes of mixing, and then replaced with `NaN` in the output file, as linear interpolation of `NaN` values is not possible.
7. The `df_int_mix` variable is the linear interpolation of the integer columns, which is rounded to the nearest integer. This is done to ensure that the output file has integer values, which is important if VisionEval is expecting integers.
This method is in turn called from individual `setup` sub-methods, which will define the specific input parameters that are mixtures of data tables. For example, the `_manipulate_ludensity` method can define the specific input parameters that are mixtures of data tables for land use density inputs.
def _manipulate_ludensity(self, params):
return self._manipulate_by_mixture(
params, # (1)!
"LUDENSITYMIX", # (2)!
self.scenario_input_dirs.get("LUDENSITYMIX"), # (3)!
)
1. The `params` dictionary is passed through to the `_manipulate_by_mixture` method.
2. The second argument to the `_manipulate_by_mixture` method is the name of the parameter in the `params` dictionary that controls the mixture, in this case the `LUDENSITYMIX` parameter.
3. The third argument to the `_manipulate_by_mixture` method is the directory where the input files for the mixture bounds are stored.
You will find this function mirrored in the EMAT exploratory scope definition, where the mixture of data tables is defined as an exogenous uncertainty.
inputs:
LUDENSITYMIX:
shortname: Urban Mix Prop
address: LUDENSITYMIX
ptype: exogenous uncertainty
dtype: float # (1)!
desc: Urban proportion for each marea by year
default: 0
min: 0 # (2)!
max: 1 # (3)!
1. The `dtype` is set to `float` to indicate that this is a continuous input, which can take on a range of values.
2. The `min` value for mixtures is always set to `0`, which represents the lower bound for this parameter, and will set the weight of the "1" input file to `1.0` and the weight of the "2" input file to `0.0`.
3. The `max` value for mixtures is always set to `1`, which represents the upper bound for this parameter, and will set the weight of the "1" input file to `0.0` and the weight of the "2" input file to `1.0`.
This structure also requires each mixture to have a corresponding directory in the Scenario-Inputs directory of the VisionEval model, where the input file(s) for the mixture bounds are stored. Note that there are exactly two sub-directories in this parameter's directory, named "1" and "2", and within those two directories are the input files that are to be mixed together. The names of the input file(s) must be the same across both sub-directories, as shown here, and they must be in the same format (a CSV table containing primarily numeric data).
📁 Scenario-Inputs/
└── 📁 OTP/
├── 📁 ANOTHER_PARAMETER/
├── 📁 LUDENSITYMIX/
│ ├── 📁 1/
│ │ └── 📄 marea_mix_targets.csv
│ └── 📁 2/
│ └── 📄 marea_mix_targets.csv
└── 📁 OTHER_PARAMETER/
Scaling Data Tables
The scaling data tables method is much like the mixture of data tables method, but instead of linearly interpolating between two input files, the scaling data tables method will scale all the values in selected columns of an input file up or down based on the value of the policy lever or exogenous uncertainty. This is useful for continuous inputs that are best represented as a single table, but where the values in that table can be scaled up or down. For example, you may have a population projection that represents a "baseline" scenario, and you want to explore the effects of different levels of population growth.
The `_manipulate_by_scale` function shown below can be included in an integration's subclass of `FilesCoreModel`, and used to scale the values in the input files based on the value of the policy lever or exogenous uncertainty.
def _manipulate_by_scale(
self,
params, # (1)!
param_map, # (2)!
ve_scenario_dir, # (3)!
max_thresh=1e9, # (4)!
):
# Gather list of all files in scenario input directory
filenames = []
for i in os.scandir(scenario_input(ve_scenario_dir)):
if i.is_file():
filenames.append(i.name)
for filename in filenames:
df1 = pd.read_csv(scenario_input(ve_scenario_dir, filename))
for param_name, column_names in param_map.items():
if isinstance(column_names, str):
column_names = [column_names]
for column_name in column_names:
df1[[column_name]] = (df1[[column_name]] * params.get(param_name)).clip(
lower=-max_thresh, upper=max_thresh
) # (5)!
out_filename = join_norm(self.resolved_model_path, "inputs", filename)
df1.to_csv(out_filename, index=False, float_format="%.5f", na_rep="NA")
1. The `params` dictionary is passed through from the `setup` method to the `_manipulate_by_scale` method.
2. The `param_map` argument is a dictionary that maps the parameter names in the `params` dictionary to the column names in the input file that should be scaled.
3. The `ve_scenario_dir` argument is the directory where the input files for the scaling are stored.
4. The `max_thresh` argument is the largest magnitude that any scaled value is allowed to take.
5. The `clip` method enforces this threshold, ensuring that the scaled values stay within the range of values that VisionEval is expecting.
If you use this approach, you would not set the `min` and `max` values for the relevant parameter in the exploratory scope definition to 0 and 1, as you would for the mixture model. Instead, set those limits to the minimum and maximum values that you want to use for the scaling factor. The upper and lower limits need not be symmetric around 1.0, as the scaling factor can be used to scale values up or down, or both.
inputs:
LUDENSITYMIX:
shortname: Urban Mix Prop
address: LUDENSITYMIX
ptype: exogenous uncertainty
dtype: float # (1)!
desc: Urban proportion for each marea by year
default: 0
min: 0.75 # (2)!
max: 1.5 # (3)!
1. The `dtype` is set to `float` to indicate that this is a continuous input, which can take on a range of values.
2. The `min` value for the scaling factor can be any value. Positive values less than or equal to 1.0 are most common, but negative values are also allowable, if the signs on the targeted values might be inverted.
3. The `max` value for the scaling factor can be any value. Positive values greater than or equal to 1.0 are most common.
The function `_manipulate_by_scale` shown above also implies that the input files to which the scaling factor is applied are located directly in the scenario directory, not in a subdirectory of the scenario directory, as was the case for the mixture method. This is because the scaling factor method is applied to a single set of inputs, so there is no need to have multiple subdirectories for the input files.
📁 Scenario-Inputs/
└── 📁 OTP/
├── 📁 ANOTHER_PARAMETER/
├── 📁 LUDENSITYMIX/
│ └── 📄 marea_mix_targets.csv
└── 📁 OTHER_PARAMETER/
Additive Data Tables
Forthcoming: documentation of the `_manipulate_by_delta` method, which allows mixtures based on additive deltas instead of linear interpolation.
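Until that documentation is written, the basic idea can be summarized as adding a (possibly scaled) delta table onto a base table, rather than interpolating between two complete tables. A purely hypothetical numeric illustration, not the actual implementation:

```python
import pandas as pd

# Hypothetical: base table plus `weight` times a delta table.
base = pd.DataFrame({"Geo": ["RVMPO"], "Year": [2038], "NumHh": [100000]})
delta = pd.DataFrame({"Geo": ["RVMPO"], "Year": [2038], "NumHh": [20000]})

weight = 0.5  # value of the experiment parameter
result = base.copy()
result["NumHh"] = base["NumHh"] + weight * delta["NumHh"]
print(result["NumHh"].iloc[0])  # 110000.0
```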
Template Injection
Forthcoming: documentation of the template injection method, which writes parameter values directly into the input files based on a template.
Direct Injection
Forthcoming: documentation of the direct injection method, which writes parameter values directly into the input files, overwriting existing values.
Custom Methods
Forthcoming: documentation of how to define custom methods for preparing input files.
Running an Experiment
Once the input files have been prepared, the VisionEval model can be run. The `FilesCoreModel` interface defines the `run` method as the place to run the VisionEval model. The `run` method needs to be overloaded in a subclass of `FilesCoreModel` to define the specific steps needed to run the VisionEval model.

In this example, the main thing we do in the `run` method is to set the `path` environment variable to include the path to the R executable, and then run a small script that opens the VisionEval model and runs it with the desired inputs.
class VEModel(FilesCoreModel):
...
def run(self): # (1)!
os.environ["path"] = (
join_norm(self.config["r_executable"]) + ";" + os.environ["path"]
)
cmd = "Rscript"
# write a small script that opens the model and runs it
with open(join_norm(self.local_directory, "vemodel_runner.R"), "wt") as script:
script.write(f"""
thismodel <- openModel("{r_join_norm(self.local_directory, self.modelname)}")
thismodel$run("reset")
""")
self.last_run_result = subprocess.run( # (2)!
[cmd, "vemodel_runner.R"],
cwd=self.local_directory,
capture_output=True,
)
1. The `run` method accepts no arguments. All the information needed to run the experiment is stored in files written during the `setup` method.
2. The `subprocess.run` command runs a command line tool. The name of the command line tool, plus all the command line arguments for the tool, are given as a list of strings, not one string. The `cwd` argument sets the current working directory from which the command line tool is launched. Setting `capture_output` to `True` will capture both stdout and stderr from the command line tool, and make these available in the result object to facilitate debugging.
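Because `capture_output=True` stores the R output on the result object, it can be useful to check the return code and surface the captured stderr when VisionEval fails. A small, hypothetical addition after the `subprocess.run` call, not part of the original example:

```python
# Hypothetical error handling, placed after the subprocess.run call above.
if self.last_run_result.returncode != 0:
    raise RuntimeError(
        "VisionEval run failed:\n"
        + self.last_run_result.stderr.decode(errors="replace")
    )
```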
Extracting Results
Once the VisionEval model has been run, the output files need to be collected and parsed to extract the performance measures that quantify the results of the experiment. The `FilesCoreModel` interface defines some standardized data extraction processes that can be configured entirely in the exploratory scope (i.e. the YAML config file), so it may not be necessary to write Python code to extract the results.
Each performance measure that is to be extracted from the output files of the VisionEval model is defined in the exploratory scope definition, under the `outputs` section. Each output is defined by a unique name, and the `parser` section of the output definition defines how to extract the performance measure from the output files. The `parser` section can include a `file` argument, which is the name of the output file from which the performance measure is to be extracted. It can also include a `loc` or an `iloc` argument, which gives the location in the output file where the performance measure is to be found. These locations correspond to the `pandas.DataFrame` accessors `loc` and `iloc`, respectively. For the `loc` argument, the file is read in as a `pandas.DataFrame`, with the first row as the column names and the first column as the index, and the performance measure is extracted by selecting the row and column with the labels that match the value of the `loc` argument. For the `iloc` argument, the performance measure is extracted by selecting the row and column with the integer positions that match the value of the `iloc` argument.
For example, the following output definition extracts the `HouseholdDvmtPerHh` measure for 2050 using a `loc` parser:
outputs:
HouseholdDvmtPerHh:
kind: info
desc: Average daily vehicle miles traveled by households in 2050
metamodeltype: linear
parser:
file: state_validation_measures.csv
loc: [HouseholdDvmtPerHh, 2050]
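In effect, this `loc` parser does something roughly equivalent to the following pandas lookup; this is a sketch for intuition, not the actual EMAT parsing code.

```python
import pandas as pd

# Read the output file with the first column as the index, then select the
# row labeled "HouseholdDvmtPerHh" and the column labeled "2050".
df = pd.read_csv("state_validation_measures.csv", index_col=0)
value = df.loc["HouseholdDvmtPerHh", "2050"]
```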
Since most VisionEval output files are in CSV format, this simple parsing will be sufficient for most cases. However, if the desired performance measure is not directly available as a scalar value in an output file, there is also an `eval` parser that can be used to evaluate a Python expression that computes the desired result. For example, this parser will compute the difference between the 2010 and 2038 values of `UrbanHhCO2e`, and report the difference as the performance measure.
outputs:
UrbanHhCO2eReduction:
shortname: Cars GHG Reduction
kind: info
desc: Reduction from 2010 level in average annual production of greenhouse gas emissions from light-duty
vehicle travel by households residing in the urban area
transform: none
metamodeltype: linear
parser:
file: Measures_VERSPM_2010,2038_Marea=RVMPO.csv
eval: loc['UrbanHhCO2e','2010'] - loc['UrbanHhCO2e','2038']
For more complex parsing and analysis, you can define a custom Python function to compute arbitrarily complex performance measures. In the example repository, the Oregon VE State model has a custom parser written in R that computes a number of performance measures. This R script is called as a subprocess from the `post_process` method of the `FilesCoreModel` subclass. The built-in parser described above is then used to extract the performance measures from the output of the R post-processing script.
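A minimal sketch of how such a `post_process` method could call an external script, following the same subprocess pattern as the `run` method above, is shown below. The script name is a placeholder and the method signature is assumed from the EMAT core model interface; the actual Oregon implementation differs in detail.

```python
import subprocess
from emat.model.core_files import FilesCoreModel

class VEModel(FilesCoreModel):
    ...

    def post_process(self, params, measure_names=None, output_path=None):
        # Hypothetical: run an R post-processing script that writes summary
        # CSV files, which the built-in parsers described above then read.
        self.last_post_process_result = subprocess.run(
            ["Rscript", "compute_measures.R"],  # placeholder script name
            cwd=self.local_directory,
            capture_output=True,
        )
```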