
Running Experiments

tl;dr

When you run a VisionEval model, it takes a bunch of files as input, it does some stuff, and then it gives you a bunch of files as output. To run an experiment from EMAT, we need to set up the input files to reflect the values of policy levers and exogenous uncertainties for that experiment, run the VisionEval model to get the outputs, then extract whatever performance measures we want from those outputs and feed them back to EMAT.

The idea behind EMAT is to run a number of experiments, and then analyze the results of those experiments. The number of experiments that needs to be run is a function of the complexity of the EMAT scope, but in general it is more experiments than a user would want to run manually. Thus, the EMAT toolset is designed to automate the process of running experiments.

When working with VisionEval, at least as defined in this demonstration repository, we will be treating the VisionEval model as a "files-based core model". Doing so requires a few steps for each experiment:

  • Prepare the input files for the VisionEval model, based on the values of policy levers and exogenous uncertainties defined for the experiment.
  • Run the VisionEval model, using the input files that have been prepared.
  • (Optional) Run any post-processing steps that are needed to extract the results of the experiment from the output files of the VisionEval model.
  • Collect the output files from the VisionEval model and parse them to extract the results of the experiment.

Each of these steps is encapsulated in a Python function that is part of the FilesCoreModel interface. In the implementation code, you will see a class that is a subclass of FilesCoreModel, and that class will define the specific steps needed to prepare the input files, run the model, and extract the results.

from emat.model.core_files import FilesCoreModel


class VEModel(FilesCoreModel):  # (1)!
    """
    A class for using Vision Eval as a files core model.
    """

    ...
  1. The VEModel class is a subclass of FilesCoreModel; this subclass defines the specific steps needed to prepare the input files, run the model, and extract the results.

You can see some examples of the FilesCoreModel interface here and here. The process for creating a new analysis with EMAT and VisionEval includes creating a similar class that is a subclass of FilesCoreModel, and then defining the specific methods needed to carry out the steps of the integration. This can be done from scratch, or by copying and modifying an existing example.
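
To give a sense of the overall shape of the integration before diving into the details, the sketch below outlines such a subclass. The class name is hypothetical and the method bodies are elided; the setup, run, and post_process methods are each described in the sections that follow.

from emat.model.core_files import FilesCoreModel


class MyVEModel(FilesCoreModel):  # hypothetical name, for illustration only
    """Sketch of a VisionEval integration (not a complete implementation)."""

    def setup(self, params: dict):
        # Copy the VisionEval model into a contained working directory, then
        # rewrite its input files to reflect `params` (the policy lever and
        # exogenous uncertainty values for this experiment), typically by
        # calling one _manipulate_* sub-method per scoped parameter.
        ...

    def run(self):
        # Launch VisionEval (e.g. via Rscript) against the prepared inputs.
        ...

    def post_process(self, params, measure_names, output_path=None):
        # Optional: run extra scripts that convert raw VisionEval outputs
        # into files that the scope's output parsers can read.
        ...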

Setting Up an Experiment

Each experiment involves making a complete copy of the VisionEval model in a contained environment, and then modifying the input files for that copy to reflect the specific values of policy levers and exogenous uncertainties for that experiment. The FilesCoreModel interface defines the setup method as the place to do this work. The setup method needs to be overloaded in a subclass of FilesCoreModel to define the specific steps needed to modify the input files for each experiment.

class VEModel(FilesCoreModel):
    ...

    def setup(self, params: dict):  # (1)!
        """
        Configure the core model with the experiment variable values.

        Args:
            params (dict):
                experiment variables including both exogenous
                uncertainty and policy levers

        Raises:
            KeyError:
                if a defined experiment variable is not supported
                by the core model
        """
  1. The setup method accepts a dictionary of parameters, which includes the values of policy levers and exogenous uncertainties for the experiment.

Within the setup method, the subclass of FilesCoreModel will need to make a complete copy of the VisionEval model in a contained environment, and then modify that copy's input files to reflect the experiment's specific policy lever and exogenous uncertainty values.
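
A simplified sketch of such a setup method is shown below. It assumes a hypothetical master_model_path attribute pointing at a pristine copy of the VisionEval model, and it dispatches to the per-parameter sub-methods described in the following sections; the actual implementation in the example repositories is more involved.

import shutil


class VEModel(FilesCoreModel):
    ...

    def setup(self, params: dict):
        # Make a contained copy of the VisionEval model for this experiment.
        # (`master_model_path` is a hypothetical attribute holding the
        # location of the pristine model; `resolved_model_path` is the
        # working copy used by this experiment.)
        shutil.copytree(
            self.master_model_path,
            self.resolved_model_path,
            dirs_exist_ok=True,
        )
        # Then rewrite the inputs, one sub-method per scoped parameter.
        self._manipulate_carsvcavail(params)  # categorical drop-in (below)
        self._manipulate_ludensity(params)    # mixture of data tables (below)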

There are numerous possible ways to prepare the input files for the VisionEval model, depending on the exploratory scope and the types of inputs that need to be modified. This demo repository includes a few different examples of how to prepare input files based on the scope, which are described in the sections that follow.

Each of these methods can be implemented in a bespoke manner for each specific input parameter (both policy levers and exogenous uncertainties), or you can use generic methods that can be applied to a wide range of input parameters. The generic approach is shown in the example repositories.

Categorical Drop-In

Many of the input files for VisionEval are in the form of CSV files. The simplest way to effect a change in the input files is to select an entire file that already has the desired values, and copy that file into the requisite input location. This is limited to categorical inputs, which are inputs that can be represented as discrete categorical values. For example, you may have two different population projections, one that represents scenario "A" where a particular brownfield area is cleaned up and developed, and another that represents scenario "B" where the brownfield is left as is. Under this policy lever, it doesn't make sense to have an intermediate value ("we'll just clean up part of the toxic waste, and let only a few people move in").

An advantage of this method is that it is simple to implement, and it places no limits on the format of the input files. There is no need to have a specific format or a matching number of rows or columns in the input files. In the population projection example considered above, the input files for the two scenarios could have different numbers of rows as one of the two scenarios could imply a different zonal structure within the region.

In this example repository, this approach is called the "categorical drop-in" method. The VisionEval model will either use the inputs file "A" or the inputs file "B", but not a mix of the two. This is expressed in the code by the _manipulate_by_categorical_drop_in method, which is a method defined in the FilesCoreModel subclass.

def _manipulate_by_categorical_drop_in(
    self,
    params: dict,  # (1)!
    cat_param: str,  # (2)!
    ve_scenario_dir: os.PathLike,  # (3)!
):
    scenario_dir = params[cat_param]
    for i in os.scandir(scenario_input(ve_scenario_dir, scenario_dir)):  # (4)!
        if i.is_file():
            shutil.copyfile(
                scenario_input(ve_scenario_dir, scenario_dir, i.name),
                join_norm(self.resolved_model_path, "inputs", i.name),
            )
  1. The params dictionary is passed through to the _manipulate_by_categorical_drop_in method. This dictionary includes the values of all the policy levers and exogenous uncertainties for the experiment.
  2. The cat_param argument is the name of the parameter in the params dictionary that is the categorical drop-in.
  3. The ve_scenario_dir argument is the directory where the categorical input files for the categorical drop-in are stored.
  4. The _manipulate_by_categorical_drop_in method will scan the appropriate directory where the categorical input files are stored, and copy the input files for the selected categorical value into the requisite input location for the VisionEval model.
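
The code above also relies on two small path helpers, scenario_input and join_norm, which are defined elsewhere in the example repository. Their exact definitions are not shown here, but they behave roughly like the following approximation (the base directory used below is a placeholder):

import os


def join_norm(*args):
    # join path components and normalize the result for the current OS
    return os.path.normpath(os.path.join(*args))


def scenario_input(*args):
    # build a path under the scenario-inputs directory, e.g.
    # scenario_input("CARSVCAVAILSCEN", "low", "marea_carsvc_availability.csv")
    return join_norm("Scenario-Inputs", "OTP", *args)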

This method is in turn called from individual setup sub-methods, which will define the specific input parameters that are categorical drop-ins. For example, the _manipulate_carsvcavail method can define the specific input parameters that are categorical drop-ins for car service availability inputs.

def _manipulate_carsvcavail(self, params):
    return self._manipulate_by_categorical_drop_in(
        params,  # (1)!
        "CARSVCAVAILSCEN",  # (2)!
        self.scenario_input_dirs.get("CARSVCAVAILSCEN"),  # (3)!
    )
  1. The params dictionary is passed through to the _manipulate_by_categorical_drop_in method.
  2. The second argument to the _manipulate_by_categorical_drop_in method is the name of the parameter in the params dictionary that is the categorical drop-in, in this case the CARSVCAVAILSCEN parameter.
  3. The third argument to the _manipulate_by_categorical_drop_in method is the directory where the categorical input files for the categorical drop-in are stored.

You will find this parameter mirrored in the EMAT exploratory scope definition, where the categorical drop-in is defined as an exogenous uncertainty.

inputs:
  CARSVCAVAILSCEN:
    shortname: Car Service Availability
    address: CARSVCAVAILSCEN
    ptype: exogenous uncertainty
    dtype: cat # (1)!
    desc: Different levels of car service availability
    default: mid # (2)!
    values: # (3)!
        - low    
        - mid    
        - high   
  1. The dtype is set to cat to indicate that this is a categorical input, which can only take on one of a discrete set of values.
  2. The default value is set to mid, which will be the selected value for this parameter if no other value is specified.
  3. The values list defines the discrete set of values that this parameter can take on. These should be strings, so that we can match against sub-directory names in the Scenario-Inputs directory of the VisionEval model.

This structure also requires each categorical drop-in to have a corresponding directory in the inputs directory of the VisionEval model, where the input file(s) for each categorical drop-in are stored. Note that there is a directory matching each categorical value, and within that directory are the input files that are to be used when that categorical value is selected. Generally, the names of the input files will be the same across all categorical values, as shown here.

📁 Scenario-Inputs/
└── 📁 OTP/
    ├── 📁 ANOTHER_PARAMETER/
    ├── 📁 CARSVCAVAILSCEN/
    │   ├── 📁 low/
    │   │   └── 📄 marea_carsvc_availability.csv
    │   ├── 📁 mid/
    │   │   └── 📄 marea_carsvc_availability.csv
    │   └── 📁 high/
    │       └── 📄 marea_carsvc_availability.csv
    └── 📁 OTHER_PARAMETER/

Mixture of Data Tables

In contrast to the categorical drop-in method, the "mixture of data tables" method allows for creating "intermediate" input files that are a mix of different input files. The approach is suitable for continuous inputs, which are inputs that can take on a range of values. For example, you may have a land use density projection that has upper and lower bounds, and you want to explore the effects of different levels of density between those limits.

An advantage of this method is that it allows for a more fine-grained exploration of the input space, and it can be used for continuous inputs. However, it does require that the input files have a specific format (a CSV table containing primarily numeric data), and that the number of rows and columns match across both input files, which are labeled "1" and "2" in this example.

Instead of copying an entire file, the mixture of data tables method will read in both input files, and then linearly interpolate between the two based on the value of the policy lever or exogenous uncertainty. This is expressed in the code by the _manipulate_by_mixture method, which is a method defined in the FilesCoreModel subclass.

def _manipulate_by_mixture(
    self,
    params,  # (1)!
    weight_param,  # (2)!
    ve_scenario_dir,  # (3)!
    no_mix_cols=(
        "Year",
        "Geo",
    ),  # (4)!
    float_dtypes=False,  # (5)!
):
    weight_2 = params[weight_param]
    weight_1 = 1.0 - weight_2

    # Gather list of all files in directory "1", and confirm they
    # are also in directory "2"
    filenames = []
    for i in os.scandir(scenario_input(ve_scenario_dir, "1")):
        if i.is_file():
            filenames.append(i.name)
            f2 = scenario_input(ve_scenario_dir, "2", i.name)
            if not os.path.exists(f2):
                raise FileNotFoundError(f2)

    for filename in filenames:
        df1 = pd.read_csv(scenario_input(ve_scenario_dir, "1", filename))
        isna_ = (df1.isnull().values).any()
        df1.fillna(0, inplace=True)  # (6)!
        df2 = pd.read_csv(scenario_input(ve_scenario_dir, "2", filename))
        df2.fillna(0, inplace=True)

        float_mix_cols = list(df1.select_dtypes("float").columns)
        if float_dtypes:
            float_mix_cols = float_mix_cols + list(df1.select_dtypes("int").columns)
        for j in no_mix_cols:
            if j in float_mix_cols:
                float_mix_cols.remove(j)

        if float_mix_cols:
            df1_float = df1[float_mix_cols]
            df2_float = df2[float_mix_cols]
            df1[float_mix_cols] = df1_float * weight_1 + df2_float * weight_2

        int_mix_cols = list(df1.select_dtypes("int").columns)
        if float_dtypes:
            int_mix_cols = list()
        for j in no_mix_cols:
            if j in int_mix_cols:
                int_mix_cols.remove(j)

        if int_mix_cols:
            df1_int = df1[int_mix_cols]
            df2_int = df2[int_mix_cols]
            df_int_mix = df1_int * weight_1 + df2_int * weight_2
            df1[int_mix_cols] = np.round(df_int_mix).astype(int)  # (7)!

        out_filename = join_norm(self.resolved_model_path, "inputs", filename)
        if isna_:
            df1.replace(0, np.nan, inplace=True)
        df1.to_csv(out_filename, index=False, float_format="%.5f", na_rep="NA")
  1. The params dictionary is passed through to the _manipulate_by_mixture method.
  2. The weight_param argument is the name of the parameter in the params dictionary that is the weight for the mixture of data tables.
  3. The ve_scenario_dir argument is the directory where the input files for the mixture of data tables are stored. There should be two subdirectories, "1" and "2".
  4. The no_mix_cols argument is a list of column names that should not be mixed. This is useful for columns that are not numerical, such as year or geography, which should not be mixed (or for which there is no reasonable linear interpolation). These columns will be copied from the input file in directory "1" to the output file.
  5. The float_dtypes argument is a boolean that indicates whether integer columns should be treated as float columns for the purposes of mixing. Setting this to True will treat integer columns as float columns, and will mix them as such, which can be problematic if VisionEval is expecting integers.
  6. The isna_ variable is set to True if there are any NaN values in the input file. If there are, these will be replaced with zeros for the purposes of mixing, and then replaced with NaN in the output file, as linear interpolation of NaN values is not possible.
  7. The df_int_mix variable is the linear interpolation of the integer columns, and is optionally rounded to the nearest integer. This is done to ensure that the output file has integer values, which is important if VisionEval is expecting integers.
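
As a concrete illustration of the interpolation (a toy example, not code from the repository): with a mixture weight of 0.25, each numeric cell in the output is 75 percent of the value from file "1" plus 25 percent of the value from file "2".

import pandas as pd

# toy stand-ins for one input file as it appears in directories "1" and "2"
df1 = pd.DataFrame({"Geo": ["RVMPO"], "Target": [100.0]})
df2 = pd.DataFrame({"Geo": ["RVMPO"], "Target": [200.0]})

weight_2 = 0.25            # value of the mixture parameter for this experiment
weight_1 = 1.0 - weight_2

mixed = df1.copy()
mixed[["Target"]] = df1[["Target"]] * weight_1 + df2[["Target"]] * weight_2
print(mixed)               # Target is 125.0; "Geo" is a no-mix column, left as-is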

This method is in turn called from individual setup sub-methods, which will define the specific input parameters that are mixtures of data tables. For example, the _manipulate_ludensity method can define the specific input parameters that are mixtures of data tables for land use density inputs.

def _manipulate_ludensity(self, params):
    return self._manipulate_by_mixture(
        params,  # (1)!
        "LUDENSITYMIX",  # (2)!
        self.scenario_input_dirs.get("LUDENSITYMIX"),  # (3)!
    )
  1. The params dictionary is passed through to the _manipulate_by_mixture method.
  2. The second argument to the _manipulate_by_mixture method is the name of the parameter in the params dictionary that controls the mixture, in this case the LUDENSITYMIX parameter.
  3. The third argument to the _manipulate_by_mixture method is the directory where the input files for the mixture bounds are stored.

You will find this parameter mirrored in the EMAT exploratory scope definition, where the mixture weight is defined as an exogenous uncertainty.

inputs:
    LUDENSITYMIX:
        shortname: Urban Mix Prop
        address: LUDENSITYMIX
        ptype: exogenous uncertainty
        dtype: float # (1)!
        desc: Urban proportion for each marea by year
        default: 0
        min: 0 # (2)!
        max: 1 # (3)!
  1. The dtype is set to float to indicate that this is a continuous input, which can take on a range of values.
  2. The min value for mixtures is always set to 0, which represents the lower bound for this parameter, and will set the weight of the "1" input file to 1.0 and the weight of the "2" input file to 0.0.
  3. The max value for mixtures is always set to 1, which represents the upper bound for this parameter, and will set the weight of the "1" input file to 0.0 and the weight of the "2" input file to 1.0.

This structure also requires each mixture to have a corresponding directory in the inputs directory of the VisionEval model, where the input file(s) to be mixed are stored. Note that there are exactly two sub-directories in this parameter's directory, named "1" and "2", and within those two directories are the input files that are to be mixed together. The names of the input file(s) must be the same across both sub-directories, as shown here, and they must be in the same format (a CSV table containing primarily numeric data).

📁 Scenario-Inputs/
└── 📁 OTP/
    ├── 📁 ANOTHER_PARAMETER/
    ├── 📁 LUDENSITYMIX/
    │   ├── 📁 1/
    │   │   └── 📄 marea_mix_targets.csv
    │   └── 📁 2/
    │       └── 📄 marea_mix_targets.csv
    └── 📁 OTHER_PARAMETER/

Scaling Data Tables

The scaling data tables method is much like the mixture of data tables method, but instead of linearly interpolating between two input files, the scaling data tables method will scale all the values in selected columns of an input file up or down based on the value of the policy lever or exogenous uncertainty. This is useful for continuous inputs that are best represented as a single table, but where the values in that table can be scaled up or down. For example, you may have a population projection that represents a "baseline" scenario, and you want to explore the effects of different levels of population growth.

The _manipulate_by_scale function shown below can be included in an integration's subclass of FilesCoreModel, and used to scale the values in the input files based on the value of the policy lever or exogenous uncertainty.

def _manipulate_by_scale(
    self,
    params,  # (1)!
    param_map,  # (2)!
    ve_scenario_dir,  # (3)!
    max_thresh=1e9,  # (4)!
):
    # Gather list of all files in scenario input directory
    filenames = []
    for i in os.scandir(scenario_input(ve_scenario_dir)):
        if i.is_file():
            filenames.append(i.name)

    for filename in filenames:
        df1 = pd.read_csv(scenario_input(ve_scenario_dir, filename))
        for param_name, column_names in param_map.items():
            if isinstance(column_names, str):
                column_names = [column_names]
            for column_name in column_names:
                df1[[column_name]] = (df1[[column_name]] * params.get(param_name)).clip(
                    lower=-max_thresh, upper=max_thresh
                )  # (5)!

        out_filename = join_norm(self.resolved_model_path, "inputs", filename)
        df1.to_csv(out_filename, index=False, float_format="%.5f", na_rep="NA")
  1. The params dictionary is passed through from the setup method to the _manipulate_by_scale method.
  2. The param_map argument is a dictionary that maps the parameter names in the params dictionary to the column names in the input file that should be scaled.
  3. The ve_scenario_dir argument is the directory where the input files for the scaling are stored.
  4. The max_thresh argument caps the magnitude of the scaled values; anything beyond ±max_thresh is clipped.
  5. The clip method enforces this cap, so that an extreme scaling factor cannot push values outside the range that VisionEval can accept.
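
As with the other approaches, this generic method would be called from a per-parameter setup sub-method. The parameter name and column mapping below are hypothetical, shown only to illustrate the calling pattern.

def _manipulate_popgrowth(self, params):
    # hypothetical example: scale the "Population" column of an input file
    # by the value of the POPGROWTHSCALE parameter
    return self._manipulate_by_scale(
        params,
        {"POPGROWTHSCALE": "Population"},
        self.scenario_input_dirs.get("POPGROWTHSCALE"),
    )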

If you use this approach, you would not set the min and max values for the relevant parameter in the exploratory scope definition to 0 and 1, as you would for the mixture model. Instead, set those limits to the minimum and maximum values that you want to use for the scaling factor. The upper and lower limits need not be symmetric around 1.0, as the scaling factor can be used to scale values up or down, or both.

inputs:
    LUDENSITYMIX:
        shortname: Urban Mix Prop
        address: LUDENSITYMIX
        ptype: exogenous uncertainty
        dtype: float # (1)!
        desc: Urban proportion for each marea by year
        default: 0
        min: 0.75 # (2)!
        max: 1.5 # (3)!
  1. The dtype is set to float to indicate that this is a continuous input, which can take on a range of values.
  2. The min value for the scaling factor can be any value. Positive values less than or equal to 1.0 are most common, but negative values are also allowable if the signs of the targeted values should be inverted.
  3. The max value for the scaling factor can be any value. Positive values greater than or equal to 1.0 are most common.

The _manipulate_by_scale function shown above also implies that the input files to which the scaling factor is applied are located directly in the scenario directory, not in a subdirectory of the scenario directory, as was the case for mixtures. This is because the scaling factor method is applied to a single set of inputs, so there's no need to have multiple subdirectories for the input files.

📁 Scenario-Inputs/
└── 📁 OTP/
    ├── 📁 ANOTHER_PARAMETER/
    ├── 📁 LUDENSITYMIX/
    │   └── 📄 marea_mix_targets.csv
    └── 📁 OTHER_PARAMETER/

Additive Data Tables

Forthcoming: documentation of the _manipulate_by_delta method, which allows mixtures based on additive deltas instead of linear interpolation.

Template Injection

Forthcoming: documentation of the template injection method, which writes parameter values directly into the input files based on a template.

Direct Injection

Forthcoming: documentation of the direct injection method, which writes parameter values directly into the input files, overwriting existing values.

Custom Methods

Forthcoming: documentation of how to define custom methods for preparing input files.

Running an Experiment

Once the input files have been prepared, the VisionEval model can be run. The FilesCoreModel interface defines the run method as the place to run the VisionEval model. The run method needs to be overloaded in a subclass of FilesCoreModel to define the specific steps needed to run the VisionEval model.

In this example, the main things we do in the run method are to prepend the path to the R executable to the PATH environment variable, and then run a small script that opens the VisionEval model and runs it with the desired inputs.

class VEModel(FilesCoreModel):
    ...

    def run(self):  # (1)!
        os.environ["path"] = (
            join_norm(self.config["r_executable"]) + ";" + os.environ["path"]
        )
        cmd = "Rscript"

        # write a small script that opens the model and runs it
        with open(join_norm(self.local_directory, "vemodel_runner.R"), "wt") as script:
            script.write(f"""
            thismodel <- openModel("{r_join_norm(self.local_directory, self.modelname)}")
            thismodel$run("reset")
            """)

        self.last_run_result = subprocess.run(  # (2)!
            [cmd, "vemodel_runner.R"],
            cwd=self.local_directory,
            capture_output=True,
        )
  1. The run method accepts no arguments. All the information needed to run the experiment is stored in files written during the setup method.
  2. The subprocess.run command runs a command line tool. The name of the command line tool, plus all the command line arguments for the tool, are given as a list of strings, not one string. The cwd argument sets the current working directory from which the command line tool is launched. Setting capture_output to True will capture both stdout and stderr from the command line tool, and make these available in the result object to facilitate debugging.
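
Because capture_output is set to True, the returned object (a subprocess.CompletedProcess) retains the console output from the R session, which is useful when diagnosing a failed run. A minimal check along these lines (illustrative, not part of the repository code) could be added at the end of the run method:

result = self.last_run_result
if result.returncode != 0:
    # Rscript exited with an error; surface its console output for debugging
    raise RuntimeError(
        "VisionEval run failed:\n"
        + result.stdout.decode(errors="replace")
        + result.stderr.decode(errors="replace")
    )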

Extracting Results

Once the VisionEval model has been run, the output files need to be collected and parsed to extract the performance measures that quantify the results of the experiment. The FilesCoreModel interface defines some standardized data extraction processes that can be configured entirely in the exploratory scope (i.e. the YAML config file), so it may not be necessary to write Python code to extract the results.

Each performance measure that is to be extracted from the output files of the VisionEval model is defined in the exploratory scope definition, under the outputs section. Each output is defined by a unique name, and the parser section of the output definition defines how to extract the performance measure from the output files. The parser section can include a file argument, which is the name of the output file from which the performance measure is to be extracted. It can also include a loc or an iloc argument, which gives the location in the output file where the performance measure is found; these correspond to the pandas.DataFrame accessors loc and iloc, respectively. For the loc argument, the file is read in as a pandas.DataFrame, with the first row as the column names and the first column as the index, and the performance measure is extracted by selecting the row and column whose labels match the value of the loc argument. For the iloc argument, the performance measure is extracted by selecting the row and column at the integer positions given by the iloc argument.

For example, this parser extracts the HouseholdDvmtPerHh value for the year 2050 from the state_validation_measures.csv output file:

outputs:
    HouseholdDvmtPerHh:
        kind: info
        desc: Average daily vehicle miles traveled by households in 2050
        metamodeltype: linear
        parser:
            file: state_validation_measures.csv
            loc: [HouseholdDvmtPerHh, 2050]
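
In pandas terms, this loc parser behaves roughly like the following (an illustration of the behavior described above, not the library's internal code):

import pandas as pd

# the output file is read with the first row as column names and the
# first column as the index, as described above
df = pd.read_csv("state_validation_measures.csv", index_col=0)

# the performance measure is the cell at the given row and column labels;
# column labels read from a CSV header are strings, hence "2050"
household_dvmt_per_hh = df.loc["HouseholdDvmtPerHh", "2050"]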

Since most VisionEval output files are in CSV format, this simple parsing will be sufficient for most cases. However, if the desired performance measure is not directly available as a scalar value in an output file, there is also an eval parser that can be used to evaluate a Python expression that computes the desired result. For example, this parser computes the difference between the 2010 and 2038 values of UrbanHhCO2e, and reports the difference as the performance measure.

outputs:
    UrbanHhCO2eReduction:
        shortname: Cars GHG Reduction
        kind: info
        desc: Reduction from 2010 level in average annual production of greenhouse gas emissions from light-duty
            vehicle travel by households residing in the urban area
        transform: none
        metamodeltype: linear
        parser:
            file: Measures_VERSPM_2010,2038_Marea=RVMPO.csv
            eval: loc['UrbanHhCO2e','2010'] - loc['UrbanHhCO2e','2038']

For more complex parsing and analysis, you can define a custom Python function to compute arbitrarily complex performance measures. In the example repository, the Oregon VE State model has a custom parser written in R that computes a number of performance measures. This R script is called as a subprocess from the post_process method of the FilesCoreModel subclass. The built-in parser described above is then used to extract the performance measures from the output of the R post-processing script.
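
A sketch of what such a post_process method might look like is shown below. The script name and its outputs are hypothetical placeholders, not the repository's actual post-processing script; the pattern mirrors the run method above.

import subprocess


class VEModel(FilesCoreModel):
    ...

    def post_process(self, params=None, measure_names=None, output_path=None):
        # hypothetical sketch: call an R script that summarizes raw
        # VisionEval outputs into CSV files the scope's parsers can read
        result = subprocess.run(
            ["Rscript", "compute_measures.R"],  # placeholder script name
            cwd=self.local_directory,
            capture_output=True,
        )
        if result.returncode != 0:
            raise RuntimeError(result.stderr.decode(errors="replace"))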