Running Experiments
tl;dr
When you run a VisionEval model, it takes a bunch of files as input, it does some stuff, and then it gives you a bunch of files as output. To run an experiment from EMAT, we need to set up the input files to reflect the values of policy levers and exogenous uncertainties for that experiment, run the VisionEval model to get the outputs, then extract whatever performance measures we want from those outputs and feed them back to EMAT.
The idea behind EMAT is to run a number of experiments, and then analyze the results of those experiments. The number of experiments that needs to be run is a function of the level of complexity of the EMAT scope, but in general it is more experiments than a user would want to run manually. Thus, the EMAT toolset is designed to automate the process of running experiments.
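For orientation, the sketch below shows what that automated workflow looks like from the analyst's side. It assumes the standard TMIP-EMAT API (emat.Scope, emat.SQLiteDB, and the design_experiments / run_experiments methods of a core model) together with the VEModel class described in the rest of this section; the file names and the VEModel constructor arguments shown here are placeholders, not values taken from the demo repository.
import emat

# Load the exploratory scope and a database in which to store experiment results.
scope = emat.Scope("VERSPM-scope.yml")        # placeholder scope file name
db = emat.SQLiteDB("VERSPM-experiments.db")   # placeholder database file name

# Construct the VisionEval core model interface; VEModel is the subclass described
# below, and its actual constructor arguments depend on the demo implementation.
model = VEModel(scope=scope, db=db)

# Design a batch of experiments across the scope, then run them all.  Each run
# performs the setup / run / post-process / extract steps described below.
design = model.design_experiments(n_samples=100, sampler="lhs")
results = model.run_experiments(design)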
When working with VisionEval, at least as defined in this demonstration repository, we will be treating the VisionEval model as a "files-based core model". Doing so requires a few steps for each experiment:
- Prepare the input files for the VisionEval model, based on the values of policy levers and exogenous uncertainties defined for the experiment.
- Run the VisionEval model, using the input files that have been prepared.
- (Optional) Run any post-processing steps that are needed to extract the results of the experiment from the output files of the VisionEval model.
- Collect the output files from the VisionEval model and parse them to extract the results of the experiment.
Each of these steps is encapsulated in a Python function that is part of the
FilesCoreModel
interface. In the implementation code, you will see a class that is a subclass of
FilesCoreModel, and that class will define the specific steps needed to prepare
the input files, run the model, and extract the results.
from emat.model.core_files import FilesCoreModel

class VEModel(FilesCoreModel): # (1)!
    """
    A class for using Vision Eval as a files core model.
    """
    ...
- The VEModel class is a subclass of FilesCoreModel, which defines the specific steps needed to prepare the input files, run the model, and extract the results.
You can see some examples of the FilesCoreModel interface here
and here.
The process for creating a new analysis with EMAT and VisionEval includes creating
a similar class that is a subclass of FilesCoreModel, and then defining the specific
methods needed to carry out the steps of the integration. This can be done from
scratch, or by copying and modifying an existing example.
Setting Up an Experiment
Each experiment involves making a complete copy of the VisionEval model in a contained
environment, and then modifying the input files for that copy of the VisionEval model
to reflect the specific values of policy levers and exogenous uncertainties for that
experiment. The FilesCoreModel interface defines the setup method as the place to
create a new copy of the VisionEval model in a contained environment, and then modify
the input files for that copy of the VisionEval model. The setup method needs to be
overloaded in a subclass of FilesCoreModel to define the specific steps needed to
modify the input files for the experiment.
class VEModel(FilesCoreModel):
    ...
    def setup(self, params: dict): # (1)!
        """
        Configure the core model with the experiment variable values.

        Args:
            params (dict):
                experiment variables including both exogenous
                uncertainty and policy levers

        Raises:
            KeyError:
                if a defined experiment variable is not supported
                by the core model
        """
- The setup method accepts a dictionary of parameters, which includes the values of policy levers and exogenous uncertainties for the experiment.
Within the setup method, the subclass of FilesCoreModel will need to make a complete
copy of the VisionEval model in a contained environment, and then modify the input files
for that copy of the VisionEval model to reflect the specific values of policy levers and
exogenous uncertainties for that experiment.
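A minimal sketch of what such a setup method might look like is shown below. The dispatching sub-methods are the ones described in the rest of this section, while _copy_model_to_local is a hypothetical helper standing in for whatever routine the implementation uses to clone the model into the experiment's working directory, and the parameter check assumes the Scope API's get_parameter_names method; the real implementation in the demo repository does more bookkeeping (logging, archiving paths, and so on).
def setup(self, params: dict):
    """Configure the core model with the experiment variable values."""
    # Refuse to run if an experiment variable is not part of the scope.
    for key in params:
        if key not in self.scope.get_parameter_names():
            raise KeyError(f"unsupported experiment variable: {key}")

    # Make a fresh, contained copy of the VisionEval model for this experiment.
    # `_copy_model_to_local` is a hypothetical helper; the demo repository has
    # its own routine for populating `self.resolved_model_path`.
    self._copy_model_to_local()

    # Rewrite the input files, one sub-method per scoped parameter, using the
    # techniques described in the remainder of this section.
    self._manipulate_carsvcavail(params)   # categorical drop-in
    self._manipulate_ludensity(params)     # mixture of data tables
    self._manipulate_income(params)        # template injection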
There are numerous possible ways to prepare the input files for the VisionEval model, depending on the exploratory scope and the types of inputs that need to be modified. This demo repository includes a few different examples of how to prepare input files based on the scope:
- Categorical Drop-In
- Mixture of Data Tables
- Scaling Data Tables
- Additive Data Tables
- Template Injection
- Direct Injection
- Custom Methods
Each of these methods can be implemented in a bespoke manner for each specific input parameter (both policy levers and exogenous uncertainties), or you can use generic methods that can be applied to a wide range of input parameters. The generic approach is shown in the example repositories.
Categorical Drop-In
Many of the input files for VisionEval are in the form of CSV files. The simplest way to effect a change in the input files is to select an entire file that has the desired values, and copy that file into the requisite input location. This is limited to categorical inputs, which are inputs that can be represented as discrete categorical values. For example, you may have two different population projections, one that represents scenario "A" where a particular brownfield area is cleaned up and developed, and another that represents scenario "B" where the brownfield is left as is. Under this policy lever, it doesn't make sense to have an intermediate value ("we'll just clean up part of the toxic waste, and let only a few people move in").
An advantage of this method is that it is simple to implement, and it places no limits on the format of the input files. There is no need to have a specific format or a matching number of rows or columns in the input files. In the population projection example considered above, the input files for the two scenarios could have different numbers of rows as one of the two scenarios could imply a different zonal structure within the region.
In this example repository, this approach is called the "categorical drop-in" method.
The VisionEval model will use either input file "A" or input file "B",
but not a mix of the two. This is expressed in the code by the
_manipulate_by_categorical_drop_in method, which is defined in the FilesCoreModel subclass.
def _manipulate_by_categorical_drop_in(
    self,
    params: dict, # (1)!
    cat_param: str, # (2)!
    ve_scenario_dir: os.PathLike, # (3)!
):
    scenario_dir = params[cat_param]
    for i in os.scandir(scenario_input(ve_scenario_dir, scenario_dir)): # (4)!
        if i.is_file():
            shutil.copyfile(
                scenario_input(ve_scenario_dir, scenario_dir, i.name),
                join_norm(self.resolved_model_path, "inputs", i.name),
            )
- The params dictionary is passed through to the _manipulate_by_categorical_drop_in method. This dictionary includes the values of all the policy levers and exogenous uncertainties for the experiment.
- The cat_param argument is the name of the parameter in the params dictionary that is the categorical drop-in.
- The ve_scenario_dir argument is the directory where the categorical input files for the categorical drop-in are stored.
- The _manipulate_by_categorical_drop_in method will scan the appropriate directory where the categorical input files are stored, and copy the input files for the selected categorical value into the requisite input location for the VisionEval model.
This method is in turn called from individual setup sub-methods, which will
define the specific input parameters that are categorical drop-ins. For example,
the _manipulate_carsvcavail method can define the specific input parameters that are
categorical drop-ins for car service availability inputs.
def _manipulate_carsvcavail(self, params):
    return self._manipulate_by_categorical_drop_in(
        params, # (1)!
        "CARSVCAVAILSCEN", # (2)!
        self.scenario_input_dirs.get("CARSVCAVAILSCEN"), # (3)!
    )
- The params dictionary is passed through to the _manipulate_by_categorical_drop_in method.
- The second argument to the _manipulate_by_categorical_drop_in method is the name of the parameter in the params dictionary that is the categorical drop-in, in this case the CARSVCAVAILSCEN parameter.
- The third argument to the _manipulate_by_categorical_drop_in method is the directory where the categorical input files for the categorical drop-in are stored.
You will find this function mirrored in the EMAT exploratory scope definition, where the categorical drop-in is defined as an uncertainty.
inputs:
  CARSVCAVAILSCEN:
    shortname: Car Service Availability
    address: CARSVCAVAILSCEN
    ptype: exogenous uncertainty
    dtype: cat # (1)!
    desc: Different levels of car service availability
    default: mid # (2)!
    values: # (3)!
      - low
      - mid
      - high
- The dtype is set to cat to indicate that this is a categorical input, which can only take on one of a discrete set of values.
- The default value is set to mid, which will be the selected value for this parameter if no other value is specified.
- The values list defines the discrete set of values that this parameter can take on. These should be strings, so that we can match against sub-directory names in the Scenario-Inputs directory of the VisionEval model.
This structure also requires each categorical drop-in to have a corresponding
directory in the inputs directory of the VisionEval model, where the input file(s)
for each categorical drop-in are stored. Note that there is a directory matching
each categorical value, and within that directory are the input files that are
to be used when that categorical value is selected. Generally, the names of the
input files will be the same across all categorical values, as shown here.
📁 Scenario-Inputs/
├── 📁 OTP/
├── 📁 ANOTHER_PARAMETER/
├── 📁 CARSVCAVAILSCEN/
│   ├── 📁 low/
│   │   └── 📄 marea_carsvc_availability.csv
│   ├── 📁 mid/
│   │   └── 📄 marea_carsvc_availability.csv
│   └── 📁 high/
│       └── 📄 marea_carsvc_availability.csv
└── 📁 OTHER_PARAMETER/
Mixture of Data Tables
In contrast to the categorical drop-in method, the "mixture of data tables" method allows for creating "intermediate" input files that are a mix of different input files. The approach is suitable for continuous inputs, which are inputs that can take on a range of values. For example, you may have a land use density projection that has upper and lower bounds, and you want to explore the effects of different levels of density between those limits.
An advantage of this method is that it allows for a more fine-grained exploration of the input space, and it can be used for continuous inputs. However, it does require that the input files have a specific format (a CSV table containing primarily numeric data), and that the number of rows and columns in the input files match across both the input files, which are labeled as "1" and "2" in this example.
Instead of copying an entire file, the mixture of data tables method will read in
both input files, and then linearly interpolate between the two input files based
on the value of the policy lever or exogenous uncertainty. This is expressed in the
code by the _manipulate_by_mixture method, which is defined in the FilesCoreModel subclass.
def _manipulate_by_mixture(
    self,
    params, # (1)!
    weight_param, # (2)!
    ve_scenario_dir, # (3)!
    no_mix_cols=(
        "Year",
        "Geo",
    ), # (4)!
    float_dtypes=False, # (5)!
):
    weight_2 = params[weight_param]
    weight_1 = 1.0 - weight_2

    # Gather list of all files in directory "1", and confirm they
    # are also in directory "2"
    filenames = []
    for i in os.scandir(scenario_input(ve_scenario_dir, "1")):
        if i.is_file():
            filenames.append(i.name)
            f2 = scenario_input(ve_scenario_dir, "2", i.name)
            if not os.path.exists(f2):
                raise FileNotFoundError(f2)

    for filename in filenames:
        df1 = pd.read_csv(scenario_input(ve_scenario_dir, "1", filename))
        isna_ = (df1.isnull().values).any()
        df1.fillna(0, inplace=True) # (6)!
        df2 = pd.read_csv(scenario_input(ve_scenario_dir, "2", filename))
        df2.fillna(0, inplace=True)
        float_mix_cols = list(df1.select_dtypes("float").columns)
        if float_dtypes:
            float_mix_cols = float_mix_cols + list(df1.select_dtypes("int").columns)
        for j in no_mix_cols:
            if j in float_mix_cols:
                float_mix_cols.remove(j)
        if float_mix_cols:
            df1_float = df1[float_mix_cols]
            df2_float = df2[float_mix_cols]
            df1[float_mix_cols] = df1_float * weight_1 + df2_float * weight_2
        int_mix_cols = list(df1.select_dtypes("int").columns)
        if float_dtypes:
            int_mix_cols = list()
        for j in no_mix_cols:
            if j in int_mix_cols:
                int_mix_cols.remove(j)
        if int_mix_cols:
            df1_int = df1[int_mix_cols]
            df2_int = df2[int_mix_cols]
            df_int_mix = df1_int * weight_1 + df2_int * weight_2
            df1[int_mix_cols] = np.round(df_int_mix).astype(int) # (7)!
        out_filename = join_norm(self.resolved_model_path, "inputs", filename)
        if isna_:
            df1.replace(0, np.nan, inplace=True)
        df1.to_csv(out_filename, index=False, float_format="%.5f", na_rep="NA")
- The params dictionary is passed through to the _manipulate_by_mixture method.
- The weight_param argument is the name of the parameter in the params dictionary that is the weight for the mixture of data tables.
- The ve_scenario_dir argument is the directory where the input files for the mixture of data tables are stored. There should be two subdirectories, "1" and "2".
- The no_mix_cols argument is a list of column names that should not be mixed. This is useful for columns that are not numerical, such as year or geography, which should not be mixed (or for which there is no reasonable linear interpolation). These columns will be copied from the input file in directory "1" to the output file.
- The float_dtypes argument is a boolean that indicates whether integer columns should be treated as float columns for the purposes of mixing. Setting this to True will treat integer columns as float columns, and will mix them as such, which can be problematic if VisionEval is expecting integers.
- The isna_ variable is set to True if there are any NaN values in the input file. If there are, these will be replaced with zeros for the purposes of mixing, and then replaced with NaN in the output file, as linear interpolation of NaN values is not possible.
- The df_int_mix variable is the linear interpolation of the integer columns, and is optionally rounded to the nearest integer. This is done to ensure that the output file has integer values, which is important if VisionEval is expecting integers.
This method is in turn called from individual setup sub-methods, which will
define the specific input parameters that are mixtures of data tables. For example,
the _manipulate_ludensity method can define the specific input parameters that are
mixtures of data tables for land use density inputs.
def _manipulate_ludensity(self, params):
    return self._manipulate_by_mixture(
        params, # (1)!
        "LUDENSITYMIX", # (2)!
        self.scenario_input_dirs.get("LUDENSITYMIX"), # (3)!
    )
- The params dictionary is passed through to the _manipulate_by_mixture method.
- The second argument to the _manipulate_by_mixture method is the name of the parameter in the params dictionary that controls the mixture, in this case the LUDENSITYMIX parameter.
- The third argument to the _manipulate_by_mixture method is the directory where the input files for the mixture bounds are stored.
You will find this function mirrored in the EMAT exploratory scope definition, where the mixture of data tables is defined as an exogenous uncertainty.
inputs:
  LUDENSITYMIX:
    shortname: Urban Mix Prop
    address: LUDENSITYMIX
    ptype: exogenous uncertainty
    dtype: float # (1)!
    desc: Urban proportion for each marea by year
    default: 0
    min: 0 # (2)!
    max: 1 # (3)!
- The dtype is set to float to indicate that this is a continuous input, which can take on a range of values.
- The min value for mixtures is always set to 0, which represents the lower bound for this parameter, and will set the weight of the "1" input file to 1.0 and the weight of the "2" input file to 0.0.
- The max value for mixtures is always set to 1, which represents the upper bound for this parameter, and will set the weight of the "1" input file to 0.0 and the weight of the "2" input file to 1.0.
This structure also requires each mixture to have a corresponding
directory in the inputs directory of the VisionEval model, where the input file(s)
for the two mixture bounds are stored. Note that there are exactly two sub-directories
in this parameter's directory, named "1" and "2", and within those two directories
are the input files that are to be mixed together. The names of the
input file(s) must be the same across both sub-directories, as shown here,
and they must be in the same format (a CSV table containing primarily numeric data).
📁 Scenario-Inputs/
├── 📁 OTP/
├── 📁 ANOTHER_PARAMETER/
├── 📁 LUDENSITYMIX/
│   ├── 📁 1/
│   │   └── 📄 marea_mix_targets.csv
│   └── 📁 2/
│       └── 📄 marea_mix_targets.csv
└── 📁 OTHER_PARAMETER/
Scaling Data Tables
The scaling data tables method is much like the mixture of data tables method, but instead of linearly interpolating between two input files, the scaling data tables method will scale all the values in selected columns of an input file up or down based on the value of the policy lever or exogenous uncertainty. This is useful for continuous inputs that are best represented as a single table, but where the values in that table can be scaled up or down. For example, you may have a population projection that represents a "baseline" scenario, and you want to explore the effects of different levels of population growth.
The _manipulate_by_scale function shown below can be included in an integration's
subclass of FilesCoreModel, and used to scale the values in the input files
based on the value of the policy lever or exogenous uncertainty.
def _manipulate_by_scale(
    self,
    params, # (1)!
    param_map, # (2)!
    ve_scenario_dir, # (3)!
    max_thresh=1e9, # (4)!
):
    # Gather list of all files in scenario input directory
    filenames = []
    for i in os.scandir(scenario_input(ve_scenario_dir)):
        if i.is_file():
            filenames.append(i.name)

    for filename in filenames:
        df1 = pd.read_csv(scenario_input(ve_scenario_dir, filename))
        for param_name, column_names in param_map.items():
            if isinstance(column_names, str):
                column_names = [column_names]
            for column_name in column_names:
                df1[[column_name]] = (df1[[column_name]] * params.get(param_name)).clip(
                    lower=-max_thresh, upper=max_thresh
                ) # (5)!
        out_filename = join_norm(self.resolved_model_path, "inputs", filename)
        df1.to_csv(out_filename, index=False, float_format="%.5f", na_rep="NA")
- The params dictionary is passed through from the setup method to the _manipulate_by_scale method.
- The param_map argument is a dictionary that maps the parameter names in the params dictionary to the column names in the input file that should be scaled.
- The ve_scenario_dir argument is the directory where the input files for the scaling are stored.
- The max_thresh argument is the maximum value that any value in the input file can be scaled to. This is important to ensure that the scaled values are not too large or too small.
- The clip method is used to ensure that the scaled values are not too large or too small. This is important to ensure that the scaled values are within the range of values that VisionEval is expecting.
If you use this approach, you would not set the min and max values for the
relevant parameter in the exploratory scope definition to 0 and 1, as you would
for the mixture model. Instead, set those limits to the minimum and maximum
values that you want to use for the scaling factor. The upper and lower limits
need not be symmetric around 1.0, as the scaling factor can be used to scale
values up or down, or both.
inputs:
  LUDENSITYMIX:
    shortname: Urban Mix Prop
    address: LUDENSITYMIX
    ptype: exogenous uncertainty
    dtype: float # (1)!
    desc: Urban proportion for each marea by year
    default: 0
    min: 0.75 # (2)!
    max: 1.5 # (3)!
- The dtype is set to float to indicate that this is a continuous input, which can take on a range of values.
- The min value for the scaling factor can be any value. Positive values less than or equal to 1.0 are most common, but negative values are also allowable if the signs on the targeted values might be inverted.
- The max value for the scaling factor can be any value. Positive values greater than or equal to 1.0 are most common.
The _manipulate_by_scale function written above also implies that the input
files to which the scaling factor is applied are in the scenario directory itself, not
a subdirectory of the scenario directory, as was the case for the mixture method.
This is because the scaling factor method is applied to a single set of inputs,
so there is no need to have multiple subdirectories for the input files.
📁 Scenario-Inputs/
├── 📁 OTP/
├── 📁 ANOTHER_PARAMETER/
├── 📁 LUDENSITYMIX/
│   └── 📄 marea_mix_targets.csv
└── 📁 OTHER_PARAMETER/
Additive Data Tables
The additive data tables method allows scenario-specific inputs to be generated by adding a fraction of the difference between two baseline input datasets. Instead of scaling a single dataset or blending two datasets directly, the _manipulate_by_delta method computes the delta (difference) between the two inputs and applies a fraction of that delta to the first dataset. This is useful when changes between scenarios represent additive differences rather than proportional shifts.
The _manipulate_by_delta function shown below is designed to perform this interpolation. It can be included in an integration's
subclass of FilesCoreModel and used to generate intermediate scenario inputs based on the value of a policy lever or exogenous uncertainty.
def _manipulate_by_delta(
    self,
    params, # (1)!
    weight_param, # (2)!
    ve_scenario_dir, # (3)!
    no_mix_cols=('Year', 'Geo',), # (4)!
):
    weight_ = params[weight_param]

    # Gather list of all files in directory "1", and confirm they
    # are also in directory "2"
    filenames = []
    for i in os.scandir(scenario_input(ve_scenario_dir, '1')):
        if i.is_file():
            filenames.append(i.name)
            f2 = scenario_input(ve_scenario_dir, '2', i.name)
            if not os.path.exists(f2):
                raise FileNotFoundError(f2)

    for filename in filenames:
        df1 = pd.read_csv(scenario_input(ve_scenario_dir, '1', filename))
        df2 = pd.read_csv(scenario_input(ve_scenario_dir, '2', filename))
        float_mix_cols = list(df1.select_dtypes('float').columns)
        for j in no_mix_cols:
            if j in float_mix_cols:
                float_mix_cols.remove(j)
        if float_mix_cols:
            df1_float = df1[float_mix_cols]
            df2_float = df2[float_mix_cols]
            delta_float = df2_float - df1_float
            df1[float_mix_cols] = df1_float + (delta_float * weight_) # (5)!
        int_mix_cols = list(df1.select_dtypes('int').columns)
        for j in no_mix_cols:
            if j in int_mix_cols:
                int_mix_cols.remove(j)
        if int_mix_cols:
            df1_int = df1[int_mix_cols]
            df2_int = df2[int_mix_cols]
            delta_int = df2_int - df1_int
            df_int_mix = df1_int + (delta_int * weight_)
            df1[int_mix_cols] = np.round(df_int_mix).astype(int)
        out_filename = join_norm(
            self.resolved_model_path, 'inputs', filename
        )
        df1.to_csv(out_filename, index=False, float_format="%.5f")
- The params dictionary is passed through from the setup method to the _manipulate_by_delta method.
- The weight_param is the name of the parameter that determines how much of the difference between the two datasets should be applied. A value of 0 results in no change from the first dataset, while 1 results in a full shift to the second.
- The ve_scenario_dir must contain two subdirectories: 1 (baseline) and 2 (target scenario). These subfolders must contain identically named files for proper delta computation.
- The no_mix_cols argument specifies the columns that should not be modified.
- This computes the delta between the two input datasets and applies the weight (e.g., 30% of the delta) to the first dataset, updating its columns in place.
This approach expects your scope file to define the parameter on the [0, 1] interval, similar to linear interpolation, but conceptually it applies partial additive changes rather than a blend.
You will find this function mirrored in the EMAT exploratory scope definition:
inputs:
  LANEMILESCEN:
    shortname: Marea Lane Miles
    address: TRIPINCREMENT
    ptype: policy lever
    dtype: float # (1)!
    desc: Different marea lane mile scenario
    default: 0
    min: 0.0 # (2)!
    max: 1.0 # (3)!
- The dtype is set to float to indicate that this is a continuous input, which can take on a range of values.
- The min value for the delta factor should normally be 0.0. Positive values less than or equal to 1.0 are also acceptable, as long as the min is less than the max value.
- The max value for the delta factor can be any value up to 1.0, as long as it is greater than the min value.
The input folder structure should look like this:
📁 Scenario-Inputs/
├── 📁 OTP/
├── 📁 LANEMILESCEN/
│   ├── 📁 1/
│   │   └── 📄 marea_lane_miles.csv
│   └── 📁 2/
│       └── 📄 marea_lane_miles.csv
└── 📁 OTHER_PARAMETER/
Template Injection
The template injection method modifies the input file based on a predefined template using parameter values from the experimental setup to directly update specific fields. This is useful when values in a table are calculated rather than interpolated or scaled, such as applying a compound growth rate to income projections or updating a single parameter across multiple years.
The _manipulate_income function shown below demonstrates how to apply this approach to per capita income data, adjusting values across simulation years using a user-defined growth rate.
def _manipulate_income(
    self,
    params # (1)!
):
    income_df = pd.read_csv(join_norm(scenario_input(self.scenario_input_dirs.get('INCOMEGROWTHRATE'), 'azone_per_cap_inc.csv'))) # (2)!
    unique_years = income_df.Year.unique()
    base_year = self.model_base_year
    for run_year in unique_years:
        year_diff = run_year - base_year
        income_df.loc[income_df.Year == run_year, ['HHIncomePC.2005', 'GQIncomePC.2005']] = \
            income_df.loc[income_df.Year == run_year, ['HHIncomePC.2005', 'GQIncomePC.2005']] * (params['INCOMEGROWTHRATE'] ** year_diff) # (3)!
    out_filename = join_norm(
        self.resolved_model_path, 'inputs', 'azone_per_cap_inc.csv'
    )
    _logger.debug(f"writing updates to: {out_filename}")
    income_df.to_csv(out_filename, index=False)
- The params dictionary contains the value of the INCOMEGROWTHRATE parameter, which may vary across experimental runs.
- The azone_per_cap_inc.csv input file is treated as a template. The structure of the file is retained but the specific fields are updated using the growth rate.
- For each unique year in the file, the function computes how far that year is from the base model year and applies compound growth accordingly. The columns HHIncomePC.2005 and GQIncomePC.2005 are multiplied by the growth factor.
Here is an example setup in the scope file.
inputs:
  INCOMEGROWTHRATE:
    shortname: Income Growth Rate
    address: INCOMEGROWTHRATE
    ptype: exogenous uncertainty
    dtype: float # (1)!
    desc: Annual compound growth rate for per capita income
    default: 1.0
    min: 0.95 # (2)!
    max: 1.05 # (3)!
- The dtype is set to float to indicate that this is a continuous input, which can take on a range of values.
- The min and max values define the range of annual growth factors. A default value of 1 indicates no change, while values below or above that reflect decreases or increases, respectively.
Unlike additive or scaling methods, template injection uses a single version of the input file, typically stored in the scenario-specific directory:
📁 Scenario-Inputs/
├── 📁 OTP/
├── 📁 ANOTHER_PARAMETER/
├── 📁 INCOMEGROWTHRATE/
│   └── 📄 azone_per_cap_inc.csv
└── 📁 OTHER_PARAMETER/
Direct Injection
The direct injection method is used to overwrite values directly in input files using parameter values. Unlike interpolation or scaling methods that manipulate entire tables or columns, direct injection targets specific cells, typically a single value in a known row and column. This approach is ideal when a policy lever or exogenous uncertainty corresponds to a single numeric input that varies across EMAT experiments.
For instance, consider a scenario where you want to update the average occupancy in shared car services for a particular year.
The _manipulate_shdcarsvc function shown below is an example implementation.
def _manipulate_shdcarsvc(
    self,
    params # (1)!
):
    shdcarsvc_occp_df = pd.read_csv(join_norm(scenario_input(self.scenario_input_dirs.get('SHDCARSVCOCCUPRATE'), 'region_carsvc_shd_occup.csv'))) # (2)!
    future_year = self.model_future_year # (3)!
    shdcarsvc_occp_df.loc[shdcarsvc_occp_df.Year == future_year, 'ShdCarSvcAveOccup'] = params['SHDCARSVCOCCUPRATE'] # (4)!
    out_filename = join_norm(
        self.resolved_model_path, 'inputs', 'region_carsvc_shd_occup.csv'
    )
    _logger.debug(f"writing updates to: {out_filename}")
    shdcarsvc_occp_df.to_csv(out_filename, index=False)
- The params dictionary contains the value of the SHDCARSVCOCCUPRATE parameter, which may vary across experimental runs.
- The input file region_carsvc_shd_occup.csv is read from the scenario folder.
- The method identifies the target year using self.model_future_year.
- The ShdCarSvcAveOccup value for that year only is set to the value of the SHDCARSVCOCCUPRATE parameter.
- The modified file is saved to the model's resolved input directory for use in the experiment run.
Here is an example setup in the scope file.
inputs:
  SHDCARSVCOCCUPRATE:
    shortname: Shared Car Svc Occup
    address: SHDCARSVCOCCUPRATE
    ptype: exogenous uncertainty
    dtype: float
    desc: Average occupancy in shared car services for future year.
    default: 2.25
    min: 1.0 # (1)!
    max: 3 # (1)!
- The default, min, and max values define the range of average occupancy that will be injected into the input file.
Similar to template injection, direct injection uses a single version of the input file, typically stored in the scenario-specific directory:
📁 Scenario-Inputs/
├── 📁 OTP/
├── 📁 ANOTHER_PARAMETER/
├── 📁 SHDCARSVCOCCUPRATE/
│   └── 📄 region_carsvc_shd_occup.csv
└── 📁 OTHER_PARAMETER/
Custom Methods
The various methods for manipulating input files described above are likely
to be sufficient for most experiments. However, it is not strictly necessary
to follow any of these recipes. Advanced users can harness the full power and
flexibility of Python to manipulate or create new VisionEval input files in any
way they see fit, by writing bespoke methods to do so and calling those methods
from the setup method of your subclass of FilesCoreModel. Virtually any
method or process that can be called from Python can be used to manipulate the
input files. This also includes potentially modifying or creating new input files
using R or any other programming language, by calling the necessary commands as
subprocesses from Python. The Python code necessary to call R (or any other tool)
as a subprocess is very similar to the code shown in the run method below,
and can be used to run R scripts or any other command line tool from within Python.
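For example, a bespoke setup sub-method could shell out to an R script that writes one of the VisionEval input files using the standard-library subprocess module. The sketch below is illustrative only: the prepare_transit_inputs.R script and the TRANSITSERVICELEVEL parameter are hypothetical, and only the subprocess pattern mirrors the run method shown below.
def _manipulate_transit_by_r_script(self, params):
    # Invoke a hypothetical R script that builds an input file from the value
    # of a scoped parameter; any command-line tool could be called the same way.
    result = subprocess.run(
        [
            "Rscript",
            "prepare_transit_inputs.R",           # hypothetical script
            str(params["TRANSITSERVICELEVEL"]),   # hypothetical parameter
            join_norm(self.resolved_model_path, "inputs"),
        ],
        cwd=self.local_directory,
        capture_output=True,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.decode(errors="replace"))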
Running an Experiment
Once the input files have been prepared, the VisionEval model can be run. The
FilesCoreModel interface defines the run method as the place to run the VisionEval
model. The run method needs to be overloaded in a subclass of FilesCoreModel to define
the specific steps needed to run the VisionEval model.
In this example, the main thing we do in the run method is to set the path environment
variable to include the path to the R executable, and then run a small script
that opens the VisionEval model and runs it with the desired inputs.
class VEModel(FilesCoreModel):
    ...
    def run(self): # (1)!
        os.environ["path"] = (
            join_norm(self.config["r_executable"]) + ";" + os.environ["path"]
        )
        cmd = "Rscript"
        # write a small script that opens the model and runs it
        with open(join_norm(self.local_directory, "vemodel_runner.R"), "wt") as script:
            script.write(f"""
                thismodel <- openModel("{r_join_norm(self.local_directory, self.modelname)}")
                thismodel$run("reset")
            """)
        self.last_run_result = subprocess.run( # (2)!
            [cmd, "vemodel_runner.R"],
            cwd=self.local_directory,
            capture_output=True,
        )
- The run method accepts no arguments. All the information needed to run the experiment is stored in files written during the setup method.
- The subprocess.run command runs a command line tool. The name of the command line tool, plus all the command line arguments for the tool, are given as a list of strings, not one string. The cwd argument sets the current working directory from which the command line tool is launched. Setting capture_output to True will capture both stdout and stderr from the command line tool, and make these available in the result object to facilitate debugging.
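Because capture_output is True, self.last_run_result is a standard subprocess.CompletedProcess object, so a failed run can be detected and its captured console output logged afterwards. A minimal sketch (assuming the same module-level _logger used in the earlier examples):
result = self.last_run_result
if result.returncode != 0:
    # stdout and stderr are captured as bytes; decode them before logging.
    _logger.error("VisionEval run failed")
    _logger.error(result.stdout.decode(errors="replace"))
    _logger.error(result.stderr.decode(errors="replace"))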
Extracting Results
Once the VisionEval model has been run, the output files need to be collected and
parsed to extract the performance measures that quantify the results of the
experiment. The FilesCoreModel interface defines some standardized data extraction
processes that can be configured entirely in the exploratory scope (i.e. the YAML
config file), so it may not be necessary to write Python code to extract the results.
Each performance measure that is to be extracted from the output files of the VisionEval
model is defined in the exploratory scope definition, under the outputs section. Each
output is defined by a unique name, and the parser section of the output definition
defines how to extract the performance measure from the output files. The parser
section can include a file argument, which is the name of the output file from which
the performance measure is to be extracted. It can also include a loc or an iloc argument,
which is the location in the output file where the performance measure is to be found.
These locations correlate with the pandas.DataFrame accessors loc and iloc,
respectively. For the loc argument, the file is read in as a pandas.DataFrame, with the
first row as the column names, and the first column as the index, and the performance
measure is extracted by selecting the row & column with the labels that match the
value of the loc argument. For the iloc argument, the performance measure is
extracted by selecting the row & column with the integer positions that match the
value of the iloc argument.
outputs:
  HouseholdDvmtPerHh:
    kind: info
    desc: Average daily vehicle miles traveled by households in 2050
    metamodeltype: linear
    parser:
      file: state_validation_measures.csv
      loc: [HouseholdDvmtPerHh, 2050]
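For intuition, this loc parser behaves roughly like the following pandas operation. This is a sketch only; the built-in parser resolves file locations and type handling itself, and the column label may be matched as a string or an integer depending on how the file is read.
import pandas as pd

# Read the output file with the first column as the index, then pick out the
# cell at the named row and column, mirroring pandas.DataFrame.loc.
df = pd.read_csv("state_validation_measures.csv", index_col=0)
household_dvmt_per_hh = df.loc["HouseholdDvmtPerHh", "2050"]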
Since most VisionEval output files are in CSV format, this simple parsing will
be sufficient for most cases. However, if the desired performance measure is
not directly available as a scalar value in an output file, there is also an
eval parser that can be used to evaluate a Python expression that computes
the desired result. For example, this parser will compute the difference between
the 2010 and 2038 values of the UrbanHhCO2e values, and report the difference
as the performance measure.
outputs:
  UrbanHhCO2eReduction:
    shortname: Cars GHG Reduction
    kind: info
    desc: Reduction from 2010 level in average annual production of greenhouse gas
      emissions from light-duty vehicle travel by households residing in the urban area
    transform: none
    metamodeltype: linear
    parser:
      file: Measures_VERSPM_2010,2038_Marea=RVMPO.csv
      eval: loc['UrbanHhCO2e','2010'] - loc['UrbanHhCO2e','2038']
For more complex parsing and analysis, you can define a custom Python function
to compute arbitrarily complex performance measures. In the example repository,
the Oregon VE State model has a custom parser
written in R that computes a number of performance measures. This R script is
called as a subprocess from the post_process
method of the FilesCoreModel subclass. The built-in parser described above is
then used to extract the performance measures from the output of the R
post-processing script.
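As an illustration, a post_process override that shells out to an R script might look something like the sketch below. The method name and its role come from the FilesCoreModel interface, but the signature shown here and the summarize_outputs.R script name are assumptions for the sake of the example, not code copied from the Oregon VE State model.
def post_process(self, params, measure_names, output_path=None):
    # Run a hypothetical R post-processing script that reads the raw VisionEval
    # outputs and writes a CSV of summary performance measures.  The built-in
    # parsers defined in the exploratory scope then read that CSV.
    result = subprocess.run(
        ["Rscript", "summarize_outputs.R"],  # hypothetical script name
        cwd=self.local_directory,
        capture_output=True,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.decode(errors="replace"))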