Basic EMAT Model API

class emat.model.AbstractCoreModel(configuration: Optional[Union[str, Mapping]], scope, safe=True, db=None, name='EMAT', metamodel_id=0)[source]

Bases: abc.ABC, emat.workbench.em_framework.model.AbstractModel

An interface for using a model with EMAT.

Individual models should be instantiated using derived subclasses of this abstract base class, and not using this class directly.

Parameters:
  • configuration (str or Mapping or None) – The configuration for this core model. This can be given explicitly as a dict, or as a str which gives the filename of a YAML file that will be loaded. If there is no configuration, giving None is also acceptable.
  • scope (Scope or str) – The exploration scope, as a Scope object or as a str which gives the filename of a YAML file that will be loaded.
  • safe (bool) – Load the configuration YAML file in ‘safe’ mode. This can be disabled if the configuration requires custom Python types or is otherwise not compatible with safe mode. Loading configuration files with safe mode off is not secure and should not be done with files from untrusted sources.
  • db (Database, optional) – An optional Database to store experiments and results.
  • name (str, default "EMAT") – A name for this model, given as an alphanumeric string. The name is required by workbench operations.
  • metamodel_id (int, optional) – An identifier for this model, if it is a meta-model. Defaults to 0 (i.e., not a meta-model).
property allow_short_circuit

Allow model runs to be skipped if measures already appear in the database.

Type:bool
async_experiments(design: Optional[pandas.core.frame.DataFrame] = None, db=None, *, design_name=None, evaluator=None, max_n_workers=None, stagger_start=None, batch_size=None)[source]

Asynchronously runs a design of combined experiments using this model.

A combined experiment includes a complete set of input values for all exogenous uncertainties (a Scenario) and all policy levers (a Policy). Unlike the perform_experiments function in the EMA Workbench, this method pairs each Scenario and Policy in sequence, instead of running all possible combinations of Scenario and Policy. This change ensures compatibility with the EMAT database modules, which preserve the complete set of input information (both uncertainties and levers) for each experiment. To conduct a full cross-factorial set of experiments similar to the default settings for EMA Workbench, use a factorial design, by setting the jointly argument for the design_experiments to False, or by designing experiments outside of EMAT with your own approach.

Parameters:
  • design (pandas.DataFrame, optional) – experiment definitions given as a DataFrame, where each exogenous uncertainty and policy lever is given as a column, and each row is an experiment.
  • db (Database, required) – The database to use for loading and saving experiments. If none is given, the default database for this model is used. If there is no default db, and none is given here, these experiments will be aborted.
  • design_name (str, optional) – The name of a design of experiments to load from the database. This design is only used if design is None.
  • evaluator (emat.workbench.Evaluator, optional) – Optionally give an evaluator instance. If not given, a default DistributedEvaluator will be instantiated. Passing any other kind of evaluator will currently cause an error, although in the future other async compatible evaluators may be provided.
  • max_n_workers (int, optional) – The maximum number of workers that will be created for a default dask.distributed LocalCluster. If the number of cores available is smaller than this number, fewer workers will be spawned. This value is only used if a default LocalCluster has not yet been created.
  • stagger_start (int, optional) – If provided, wait this number of seconds between initial dispatch of experiments to the evaluator. For models that do a lot of file copying up front, this can prevent over-saturating the file storage system.
  • batch_size (int, optional) – For fast-running core models, the overhead from multi-processing can represent a big chunk of overall runtime. Grouping experiments into batches that are sent to workers as a group can mitigate this. Setting batch_size to 1 will process every experiment separately. If no batch size is given, a guess is made as to an efficient batch_size based on the number of experiments and the number of workers.
Raises:

ValueError – If there are no experiments defined. This includes the situation where design is given but no database is available.
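
A minimal sketch of dispatching a design asynchronously (the design and worker settings below are illustrative, and a default database attached to the model is assumed):

    # Illustrative sketch: dispatch experiments to a default dask.distributed
    # LocalCluster. Assumes `model` is a concrete AbstractCoreModel subclass
    # with a default database attached.
    design = model.design_experiments(n_samples=200, sampler='lhs')
    handle = model.async_experiments(
        design,
        max_n_workers=4,   # cap the size of the default LocalCluster
        stagger_start=5,   # wait 5 seconds between initial dispatches
        batch_size=10,     # group fast experiments to cut multiprocessing overhead
    )
    # Results are written to the attached database as individual experiments complete.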

enter_run_model()[source]

A hook for actions at the very beginning of the run_model step.

exit_run_model()[source]

A hook for actions at the very end of the run_model step.

property ignore_crash

Allow model runs to post_process and archive even after an apparent crash in run.

Type:bool
property killed_indicator

The name of a file that indicates the model was killed due to an unrecoverable error.

The flag is the mere existence of a file with this name, not any particular file content. This file is deleted automatically when the model run is initiated, so that it can be recreated to indicate an unrecoverable error.

Type:str
property local_directory

The current local working directory for this model.

Type:Path
log(message, level=20)[source]

Log a message.

This facility will attempt to send log messages to the attached database, falling back to the regular module logger in case that fails.

Parameters:
  • message (str) – Message to send to log.
  • level (int, default logging.INFO) – Log level.

property resolved_model_path

The resolved model path.

For core models that don’t rely on the file system, this is set to the current working directory and is generally irrelevant. Overload this property for models that do rely on the file system.

Type:Path
property success_indicator

The name of a file that indicates the model has run successfully.

The flag is the mere existence of a file with this name, not any particular file content. This file is deleted automatically when the model run is initiated, so that it can be recreated to indicate a success.

Type:str

Abstract Methods

The interface for these methods is defined in this abstract base class, but any implementation must provide implementation-specific overrides for each of these methods.

Note

When overriding these methods, the function signature (what arguments and types each method accepts, and what types it returns) should not be changed, even though Python itself technically allows doing so.

abstract AbstractCoreModel.setup(params)[source]

Configure the core model with the experiment variable values.

This method is the place where the core model set up takes place, including creating or modifying files as necessary to prepare for a core model run. When running experiments, this method is called once for each core model experiment, where each experiment is defined by a set of particular values for both the exogenous uncertainties and the policy levers. These values are passed to the experiment only here, and not in the run method itself. This facilitates debugging, as the setup method can potentially be used without the run method, allowing the user to manually inspect the prepared files and ensure they are correct before actually running a potentially expensive model.

Each input exogenous uncertainty or policy lever can potentially be used to manipulate multiple different aspects of the underlying core model. For example, a policy lever that includes a number of discrete future network “build” options might trigger the replacement of multiple related network definition files. Or, a single uncertainty relating to the cost of fuel might scale both a parameter linked to the modeled per-mile cost of operating an automobile, as well as the modeled total cost of fuel used by transit services.

At the end of the setup method, a core model experiment should be ready to run using the run method.

Parameters:params (dict) – experiment variables, including both exogenous uncertainties and policy levers
Raises:KeyError – if a defined experiment variable is not supported by the core model
abstract AbstractCoreModel.run()[source]

Run the core model.

This method is the place where the core model run takes place. Note that this method takes no arguments; all the input exogenous uncertainties and policy levers are delivered to the core model in the setup method, which will be executed prior to calling this method. This facilitates debugging, as the setup method can potentially be used without the run method, allowing the user to manually inspect the prepared files and ensure they are correct before actually running a potentially expensive model. When running experiments, this method is called once for each core model experiment, after the setup method completes.

If the core model requires some post-processing by the post_process method defined in this API, then when this method terminates the model directory should be in a state that is ready to run the post_process command next.

Raises:UserWarning – If model is not properly setup
abstract AbstractCoreModel.load_measures(measure_names: Optional[Collection[str]] = None, *, rel_output_path=None, abs_output_path=None) → dict[source]

Import selected measures from the core model.

This method is the place to put code that can actually reach into files in the core model’s run results and extract performance measures. It is expected that it should not do any post-processing of results (i.e. it should read from but not write to the model outputs directory).

Imports measures from the active scenario.

Parameters:
  • measure_names (Collection[str]) – Collection of measures to be loaded.
  • rel_output_path (str, optional) – Path to the model output location, given relative to the model_path directory (when a subclass is a type that has a model path). If neither this nor abs_output_path is given, the default is equivalent to setting rel_output_path to ‘Outputs’.
  • abs_output_path (str, optional) – Path to the model output location, given as an absolute directory, as an alternative to rel_output_path.
Returns:

dict of measure names and values from the active scenario

Raises:

KeyError – If load_measures is not available for specified measure

AbstractCoreModel.post_process(params, measure_names, output_path=None)[source]

Runs post processors associated with particular performance measures.

This method is the place to conduct automatic post-processing of core model run results, in particular any post-processing that is expensive or that will write new output files into the core model’s output directory. The core model run should already have been completed using setup and run. If the relevant performance measures do not require any post-processing to create (i.e. they can all be read directly from output files created during the core model run itself) then this method does not need to be overloaded for a particular core model implementation.

Parameters:
  • params (dict) – Dictionary of experiment variables, with keys as variable names and values as the experiment settings. Most post-processing scripts will not need to know the particular values of the inputs (exogenous uncertainties and policy levers), but this method receives the experiment input parameters as an argument in case one or more of these parameter values needs to be known in order to complete the post-processing.
  • measure_names (List[str]) – List of measures to be processed. Normally for the first pass of core model run experiments, post-processing will be completed for all performance measures. However, it is possible to use this argument to give only a subset of performance measures to post-process, which may be desirable if the post-processing of some performance measures is expensive. Additionally, this method may also be called on archived model results, allowing it to run to generate only a subset of (probably new) performance measures based on these archived runs.
  • output_path (str, optional) – Path to model outputs. If this is not given (typical for the initial run of core model experiments) then the local/default model directory is used. This argument is provided primarily to facilitate post-processing archived model runs to make new performance measures (i.e. measures that were not in-scope when the core model was actually run).
Raises:

KeyError – If post process is not available for specified measure

abstract AbstractCoreModel.get_experiment_archive_path(experiment_id=None, makedirs=False, parameters=None, run_id=None)[source]

Returns a file system location to store model run outputs.

For core models with long model run times, it is recommended to store the complete model run results in an archive. This will facilitate adding additional performance measures to the scope at a later time.

Both the scope name and experiment id can be used to create the folder path.

Parameters:
  • experiment_id (int) – The experiment id, which is also the row id of the experiment in the database. If this is omitted, an experiment id is read or created using the parameters.
  • makedirs (bool, default False) – If this archive directory does not yet exist, create it.
  • parameters (dict, optional) – The parameters for this experiment, used to create or lookup an experiment id. The parameters are ignored if experiment_id is given.
  • run_id (UUID, optional) – The run_id of this model run. If not given but a run_id attribute is stored in this FilesCoreModel instance, that value is used.
Returns:

Experiment archive path (no trailing backslashes).

Return type:

str

abstract AbstractCoreModel.archive(params, model_results_path, experiment_id: int = 0)[source]

Copies model outputs to archive location.

Parameters:
  • params (dict) – Dictionary of experiment variables
  • model_results_path (str) – archive path
  • experiment_id (int, optional) – The id number for this experiment.
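
As an illustration of the interface only (not a working model), a derived class might stub out the five abstract methods as sketched below; all file handling shown is hypothetical:

    import os
    from emat.model import AbstractCoreModel

    class MyCoreModel(AbstractCoreModel):
        """Hypothetical skeleton showing the abstract interface only."""

        def setup(self, params):
            # Write the experiment's uncertainty and lever values into the
            # core model's input files (details depend entirely on the model).
            ...

        def run(self):
            # Launch the core model itself, e.g. via a subprocess call.
            ...

        def load_measures(self, measure_names=None, *,
                          rel_output_path=None, abs_output_path=None):
            # Read performance measures back out of the model's output files.
            return {name: float('nan') for name in (measure_names or [])}

        def get_experiment_archive_path(self, experiment_id=None, makedirs=False,
                                        parameters=None, run_id=None):
            # Illustrative folder layout built from the scope name and experiment id.
            path = os.path.join("archive", self.scope.name, f"exp_{experiment_id}")
            if makedirs:
                os.makedirs(path, exist_ok=True)
            return path

        def archive(self, params, model_results_path, experiment_id=0):
            # Copy the completed run's outputs into the archive location.
            ...

    # Hypothetical instantiation of the derived class:
    # my_model = MyCoreModel(configuration=None, scope=my_scope, db=my_db)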

Data Management

AbstractCoreModel.read_experiments(design_name, db=None, only_pending=False, only_complete=False, only_with_measures=False)[source]

Reads results from a design of experiments from the database.

Parameters:
  • design_name (str) – The name of the design to load.
  • db (Database, optional) – The Database from which to read experiments. If no db is given, the default db for this model is used.
  • only_pending (bool, default False) – If True, only pending experiments (which have no performance measure results stored in the database) are returned.
  • only_complete (bool, default False) – If True, only complete experiments (which have no performance measure results missing in the database) are returned.
  • only_with_measures (bool, default False) – If True, only experiments with at least one stored performance measure are returned.
Returns:

A DataFrame that contains all uncertainties, levers, and measures for the experiments.

Return type:

pandas.DataFrame

Raises:

ValueError – If there is no Database connection db set.
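
For example, assuming a design named ‘lhs’ has already been stored in the attached database (the name is illustrative):

    # Only experiments with a complete set of stored performance measures.
    complete = model.read_experiments('lhs', only_complete=True)
    # Only experiments that have not yet been run.
    pending = model.read_experiments('lhs', only_pending=True)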

AbstractCoreModel.read_experiment_parameters(design_name=None, db=None, only_pending=False, *, experiment_ids=None)[source]

Reads uncertainties and levers from a design of experiments from the database.

Parameters:
  • design_name (str, optional) – If given, only experiments associated with both the scope and the named design are returned, otherwise all experiments associated with the scope are returned.
  • db (Database, optional) – The Database from which to read experiments. If no db is given, the default db for this model is used.
  • only_pending (bool, default False) – If True, only pending experiments (which have no performance measure results stored in the database) are returned.
  • experiment_ids (Collection, optional) – A collection of experiment ids to load. If given, both design_name and only_pending are ignored.
Returns:

A DataFrame that contains all uncertainties, levers, and measures for the experiments.

Return type:

pandas.DataFrame

Raises:

ValueError – If db is not given and there is no default Database connection set.

AbstractCoreModel.read_experiment_measures(design_name, experiment_id=None, db=None)[source]

Reads performance measures from a design of experiments from the database.

Parameters:
  • design_name (str) – The name of the design to load.
  • experiment_id (int, optional) – The id of the experiment to load.
  • db (Database, optional) – The Database from which to read experiment(s). If no db is given, the default db for this model is used.
Returns:

A DataFrame that contains all uncertainties, levers, and measures for the experiments.

Return type:

pandas.DataFrame

Raises:

ValueError – If db is not given and there is no default Database connection set.

AbstractCoreModel.ensure_dtypes(df: pandas.core.frame.DataFrame)[source]

Convert columns of dataframe to correct dtype as needed.

Parameters:df (pandas.DataFrame) – A dataframe with column names that are uncertainties, levers, or measures.
Returns:The same data as input, but with dtypes as appropriate.
Return type:pandas.DataFrame

Model Execution

Assuming that the abstract methods outlined above are properly implemented, these model execution methods should not need to be overridden.

AbstractCoreModel.design_experiments(*args, **kwargs)[source]

Create a design of experiments based on this model.

Parameters:
  • n_samples_per_factor (int, default 10) – The number of samples in the design per random factor.
  • n_samples (int or tuple, optional) – The total number of samples in the design. If jointly is False, this is the number of samples in each of the uncertainties and the levers; the total number of samples will then be the square of this value. Give a 2-tuple to set values for uncertainties and levers respectively, to set them independently. If this argument is given, it overrides n_samples_per_factor.
  • random_seed (int or None, default 1234) – A random seed for reproducibility.
  • db (Database, optional) – If provided, this design will be stored in the database indicated. If not provided, the db for this model will be used, if one is set.
  • design_name (str, optional) – A name for this design, to identify it in the database. If not given, a unique name will be generated based on the selected sampler.
  • sampler (str or AbstractSampler, default 'lhs') –

    The sampler to use for this design. Available pre-defined samplers include:

    • ‘lhs’: Latin Hypercube sampling
    • ‘ulhs’: Uniform Latin Hypercube sampling, which ignores defined distribution shapes from the scope and samples everything as if it were from a uniform distribution
    • ‘mc’: Monte Carlo sampling
    • ‘uni’: Univariate sensitivity testing, whereby experiments are generated setting each parameter individually to its minimum and maximum values (for numeric dtypes) or to all possible values (for boolean and categorical dtypes). Note that designs for univariate sensitivity testing are deterministic and the number of samples given is ignored.
  • sample_from ('all', 'uncertainties', or 'levers') – Which scope components to sample from. Components not sampled are set at their default values in the design.
  • jointly (bool, default True) – Whether to sample jointly all uncertainties and levers in a single design, or, if False, to generate separate samples for levers and uncertainties, and then combine the two in a full-factorial manner. This argument has no effect unless sample_from is ‘all’. Note that setting jointly to False may produce a very large design, as the total number of experiments will be the product of the number of experiments for the levers and the number of experiments for the uncertainties, which are set separately (i.e. if n_samples is given, the total number of experiments is the square of that value).
Returns:

The resulting design.

Return type:

pandas.DataFrame
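
For example, to create a 100-experiment Latin Hypercube design sampled jointly over uncertainties and levers (the argument values are illustrative):

    design = model.design_experiments(
        n_samples=100,          # overrides n_samples_per_factor
        sampler='lhs',          # Latin Hypercube sampling
        random_seed=42,         # for reproducibility
        design_name='lhs_100',  # stored under this name if a database is attached
    )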

AbstractCoreModel.run_experiments(design=None, evaluator=None, *, design_name=None, db=None, allow_short_circuit=None)[source]

Runs a design of combined experiments using this model.

A combined experiment includes a complete set of input values for all exogenous uncertainties (a Scenario) and all policy levers (a Policy). Unlike the perform_experiments function in the EMA Workbench, this method pairs each Scenario and Policy in sequence, instead of running all possible combinations of Scenario and Policy. This change ensures compatibility with the EMAT database modules, which preserve the complete set of input information (both uncertainties and levers) for each experiment. To conduct a full cross-factorial set of experiments similar to the default settings for EMA Workbench, use a factorial design, by setting the jointly argument for the design_experiments to False, or by designing experiments outside of EMAT with your own approach.

Parameters:
  • design (pandas.DataFrame, optional) – experiment definitions given as a DataFrame, where each exogenous uncertainty and policy lever is given as a column, and each row is an experiment.
  • evaluator (emat.workbench.Evaluator, optional) – Optionally give an evaluator instance. If not given, a default SequentialEvaluator will be instantiated.
  • design_name (str, optional) – The name of a design of experiments to load from the database. This design is only used if design is None.
  • db (Database, optional) – The database to use for loading and saving experiments. If none is given, the default database for this model is used. If there is no default db, and none is given here, the results are not stored in a database. Set to False to explicitly not use the default database, even if it exists.
Returns:

A DataFrame that contains all uncertainties, levers, and measures for the experiments.

Return type:

pandas.DataFrame

Raises:

ValueError – If there are no experiments defined. This includes the situation where design is given but no database is available.
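
A typical call, reusing the design sketched above (names are illustrative):

    # Run a design held in memory...
    results = model.run_experiments(design=design)
    # ...or load a stored design from the attached database by name.
    results = model.run_experiments(design_name='lhs_100')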

AbstractCoreModel.io_experiment(params)[source]

Run an experiment, and return a dictionary of inputs and outputs together.

Parameters:params (dict) – Dictionary of experiment variables.
Returns:A dictionary of inputs and outputs together.
Return type:dict
AbstractCoreModel.run_reference_experiment(evaluator=None, *, db=None)[source]

Runs a reference experiment using this model.

This single experiment includes a complete set of input values for all exogenous uncertainties (a Scenario) and all policy levers (a Policy). Each is set to the default value indicated by the scope.

Parameters:
  • evaluator (emat.workbench.Evaluator, optional) – Optionally give an evaluator instance. If not given, a default SequentialEvaluator will be instantiated.
  • db (Database, optional) – The database to use for loading and saving experiments. If none is given, the default database for this model is used. If there is no default db, and none is given here, the results are not stored in a database. Set to False to explicitly not use the default database, even if it exists.
Returns:

A DataFrame that contains all uncertainties, levers, and measures for the experiments.

Return type:

pandas.DataFrame
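
For example, to evaluate the model once with every input at its scope default:

    reference = model.run_reference_experiment()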

AbstractCoreModel.optimize(searchover='levers', evaluator=None, nfe=10000, convergence='default', display_convergence=True, convergence_freq=100, constraints=None, reference=None, reverse_targets=False, algorithm=None, epsilons='auto', min_epsilon=0.1, cache_dir=None, cache_file=None, check_extremes=False, **kwargs)[source]

Perform multi-objective optimization over levers or uncertainties.

The targets for the multi-objective optimization (i.e. whether each individual performance measure is to be maximized or minimized) are read from the model’s scope.

Parameters:
  • searchover ({'levers', 'uncertainties'}) – Which group of inputs to search over. The other group will be set at their default values, unless other values are provided in the reference argument.
  • evaluator (Evaluator, optional) – The evaluator to use to run the model. If not given, a SequentialEvaluator will be created.
  • nfe (int, default 10_000) – Number of function evaluations. This generally needs to be fairly large to achieve stable results in all but the most trivial applications.
  • convergence ('default', None, or emat.optimization.ConvergenceMetrics) – A convergence display during optimization. The default value is to report the epsilon-progress (the number of solutions that ever enter the candidate pool of non-dominated solutions) and the number of solutions remaining in that candidate pool. Pass None explicitly to disable convergence tracking.
  • display_convergence (bool, default True) – Whether to automatically display figures that dynamically track convergence. Set to False if you are not using this method within a Jupyter interactive environment.
  • convergence_freq (int, default 100) – How frequently to update the convergence measures. There is some computational overhead to these convergence updates, so setting a value too small may noticeably slow down the process.
  • constraints (Collection[Constraint], optional) – Solutions will be constrained to only include values that satisfy these constraints. The constraints can be based on the search parameters (levers or uncertainties, depending on the value given for searchover), or performance measures, or some combination thereof.
  • reference (Mapping) – A set of values for the non-active inputs, i.e. the uncertainties if searchover is ‘levers’, or the levers if searchover is ‘uncertainties’. Any values not set here revert to the default values identified in the scope.
  • reverse_targets (bool, default False) – Whether to reverse the optimization targets given in the scope (i.e., changing minimize to maximize, or vice versa). This will result in the optimization searching for the worst outcomes, instead of the best outcomes.
  • algorithm (platypus.Algorithm, optional) – Select an algorithm for multi-objective optimization. The default algorithm is EpsNSGAII. See platypus documentation for details.
  • epsilons (float or array-like) – Used to limit the number of distinct solutions generated. Set to a larger value to get fewer distinct solutions.
  • cache_dir (path-like, optional) – A directory in which to cache results. Most of the arguments will be hashed to develop a unique filename for these results, making this generally safer than cache_file.
  • cache_file (path-like, optional) – A file into which to cache results. If this file exists, the contents of the file will be loaded and all other arguments are ignored. Use with great caution.
  • kwargs – Any additional arguments will be passed on to the platypus algorithm.
Returns:

The set of non-dominated solutions found. When convergence is given, the convergence measures are included, as a pandas.DataFrame in the convergence attribute.

Return type:

emat.OptimizationResult
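
A minimal sketch (argument values are illustrative; the direction of each performance measure comes from the scope):

    result = model.optimize(
        searchover='levers',  # uncertainties held at defaults, or at `reference` values
        nfe=5000,             # larger values give more stable Pareto fronts
        epsilons=0.05,        # smaller values give more distinct solutions
    )
    # When convergence tracking is enabled, the convergence measures are
    # available as a pandas.DataFrame in result.convergence.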

AbstractCoreModel.robust_optimize(robustness_functions, scenarios, evaluator=None, nfe=10000, convergence='default', display_convergence=True, convergence_freq=100, constraints=None, epsilons=0.1, cache_dir=None, cache_file=None, algorithm=None, check_extremes=False, **kwargs)[source]

Perform robust optimization.

Robust optimization is generally a multi-objective optimization task. It is undertaken using statistical measures of outcomes evaluated across a number of scenarios, instead of using the individual outcomes themselves. For each candidate policy, the model is evaluated against all of the considered scenarios, and then the robustness measures are evaluated using the set of outcomes from the original runs. The robustness measures are aggregate measures that are computed from a set of outcomes. For example, this may be expected value, median, n-th percentile, minimum, or maximum value of any individual outcome. It is also possible to have joint measures, e.g. expected value of the larger of outcome 1 or outcome 2.

Each robustness function is indicated as a maximization or minimization target, where higher or lower values are better, respectively. The optimization process then tries to identify one or more non-dominated solutions for the possible policy levers.

Parameters:
  • robustness_functions (Collection[Measure]) – A collection of aggregate statistical performance measures.
  • scenarios (int or Collection) – A collection of scenarios to use in the evaluation(s), or give an integer to generate that number of random scenarios.
  • evaluator (Evaluator, optional) – The evaluator to use to run the model. If not given, a SequentialEvaluator will be created.
  • nfe (int, default 10_000) – Number of function evaluations. This generally needs to be fairly large to achieve stable results in all but the most trivial applications.
  • convergence ('default', None, or emat.optimization.ConvergenceMetrics) – A convergence display during optimization.
  • display_convergence (bool, default True) – Automatically display the convergence metric figures when optimizing.
  • convergence_freq (int, default 100) – The frequency at which convergence metric figures are updated.
  • constraints (Collection[Constraint], optional) – Solutions will be constrained to only include values that satisfy these constraints. The constraints can be based on the policy levers, or on the computed values of the robustness functions, or some combination thereof.
  • epsilons (float or array-like) – Used to limit the number of distinct solutions generated. Set to a larger value to get fewer distinct solutions.
  • cache_dir (path-like, optional) – A directory in which to cache results. Most of the arguments will be hashed to develop a unique filename for these results, making this generally safer than cache_file.
  • cache_file (path-like, optional) – A file into which to cache results. If this file exists, the contents of the file will be loaded and all other arguments are ignored. Use with great caution.
  • algorithm (platypus.Algorithm or str, optional) – Select an algorithm for multi-objective optimization. The algorithm can be given directly, or named in a string. See platypus documentation for details.
  • check_extremes (bool or int, default False) – Conduct additional evaluations, setting individual policy levers to their extreme values, for each candidate Pareto optimal solution.
  • kwargs – any additional arguments will be passed on to the platypus algorithm.
Returns:

The set of non-dominated solutions found. When convergence is given, the convergence measures are included, as a pandas.DataFrame in the convergence attribute.

Return type:

emat.OptimizationResult
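
A hedged sketch, assuming the scope defines a performance measure named ‘net_benefits’ (hypothetical) and that the robustness measures are defined with emat.Measure, as in the TMIP-EMAT robust optimization examples:

    import numpy
    from emat import Measure

    # Aggregate a core-model measure across scenarios: maximize its mean value.
    expected_net_benefit = Measure(
        'Expected Net Benefits',
        kind=Measure.MAXIMIZE,
        variable_name='net_benefits',  # hypothetical measure from the scope
        function=numpy.mean,
    )

    robust_result = model.robust_optimize(
        robustness_functions=[expected_net_benefit],
        scenarios=50,   # evaluate each candidate policy against 50 random scenarios
        nfe=5000,
        epsilons=0.05,
    )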

AbstractCoreModel.robust_evaluate(robustness_functions, scenarios, policies, evaluator=None, cache_dir=None, suspend_db=True)[source]

Perform robust evaluation(s).

The robust evaluation is used to generate statistical measures of outcomes, instead of generating the individual outcomes themselves. For each policy, the model is evaluated against all of the considered scenarios, and then the robustness measures are evaluated using the set of outcomes from the original runs. The robustness measures are aggregate measures that are computed from a set of outcomes. For example, this may be expected value, median, n-th percentile, minimum, or maximum value of any individual outcome. It is also possible to have joint measures, e.g. expected value of the larger of outcome 1 or outcome 2.

Parameters:
  • robustness_functions (Collection[Measure]) – A collection of aggregate statistical performance measures.
  • scenarios (int or Collection) – A collection of scenarios to use in the evaluation(s), or give an integer to generate that number of random scenarios.
  • policies (int, or collection) – A collection of policies to use in the evaluation(s), or give an integer to generate that number of random policies.
  • evaluator (Evaluator, optional) – The evaluator to use to run the model. If not given, a SequentialEvaluator will be created.
  • cache_dir (path-like, optional) – A directory in which to cache results.
  • suspend_db (bool, default True) – Suspend writing the results of individual model runs to the database. Robust evaluation potentially generates a large number of model executions, and storing all these individual results may not be useful.
Returns:

The computed value of each item in robustness_functions, for each policy in policies.

Return type:

pandas.DataFrame
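
For example, reusing the robustness measure sketched above, with randomly generated scenarios and policies (counts are illustrative):

    robust_measures = model.robust_evaluate(
        robustness_functions=[expected_net_benefit],
        scenarios=50,
        policies=20,
    )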

AbstractCoreModel.feature_scores(design, return_type='styled', random_state=None, cmap='viridis', measures=None, shortnames=None)[source]

Calculate feature scores based on a design of experiments.

This method is provided as a convenient pass-through to the feature_scores function in the analysis sub-package, using the scope and database attached to this model.

Parameters:
  • design (str or pandas.DataFrame) – The name of the design of experiments to use for feature scoring, or a single pandas.DataFrame containing the experimental design and results.
  • return_type ({'styled', 'figure', 'dataframe'}) – The format to return, either a heatmap figure as an SVG render in an xmle.Elem, or a plain pandas.DataFrame, or a styled dataframe.
  • random_state (int or numpy.RandomState, optional) – Random state to use.
  • cmap (string or colormap, default 'viridis') – matplotlib colormap to use for rendering.
  • measures (Collection, optional) – The performance measures on which feature scores are to be generated. By default, all measures are included.
Returns:

Returns a rendered SVG as xml, or a DataFrame, depending on the return_type argument.

Return type:

xmle.Elem or pandas.DataFrame

This function internally uses feature_scoring from the EMA Workbench, which in turn scores features using the “extra trees” regression approach.
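
For example, to compute feature scores for a stored design and get back a plain DataFrame (the design name is illustrative):

    scores = model.feature_scores('lhs_100', return_type='dataframe')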

Meta-Model Construction

AbstractCoreModel.create_metamodel_from_data(experiment_inputs: pandas.core.frame.DataFrame, experiment_outputs: pandas.core.frame.DataFrame, output_transforms: Optional[dict] = None, metamodel_id: Optional[int] = None, include_measures=None, exclude_measures=None, db=None, random_state=None, experiment_stratification=None, suppress_converge_warnings=False, regressor=None, find_best_metamodeltype=False)[source]

Create a MetaModel from a set of input and output observations.

Parameters:
  • experiment_inputs (pandas.DataFrame) – This dataframe should contain all of the experimental inputs, including values for each uncertainty, lever, and constant.
  • experiment_outputs (pandas.DataFrame) – This dataframe should contain all of the experimental outputs, including a column for each performance measure. The index for the outputs should match the index for the experiment_inputs, so that the I-O matches row-by-row.
  • output_transforms (dict) – Deprecated. Specify the output transforms directly in the scope instead.
  • metamodel_id (int, optional) – An identifier for this meta-model. If not given, a unique id number will be created randomly.
  • include_measures (Collection[str], optional) – If provided, only output performance measures with names in this set will be included.
  • exclude_measures (Collection[str], optional) – If provided, only output performance measures with names not in this set will be included.
  • db (Database, optional) – The database to use for loading and saving metamodels. If none is given, the default database for this model is used. If there is no default db, and none is given here, the metamodel is not stored in a database.
  • random_state (int, optional) – A random state to use in the metamodel regression fitting.
  • experiment_stratification (pandas.Series, optional) – A stratification of experiments, used in cross-validation.
  • suppress_converge_warnings (bool, default False) – Suppress convergence warnings during metamodel fitting.
  • regressor (Estimator, optional) – A scikit-learn estimator implementing a multi-target regression. If not given, a detrended simple Gaussian process regression is used.
  • find_best_metamodeltype (int, default 0) – Run a search to find the best metamodeltype for each performance measure, repeating each cross-validation step this many times. For more stable results, choose 3 or more, although larger numbers will be slow. If domain knowledge about the normal expected range and behavior of each performance measure is available, it is better to give the metamodeltype explicitly in the Scope.
Returns:

a callable object that, when called as if it were a function, accepts keyword arguments as inputs and returns a dictionary of (measure name: value) pairs.

Return type:

MetaModel
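
A sketch, assuming the inputs and measures of a completed design have already been read back from the attached database (the design name is illustrative):

    inputs = model.read_experiment_parameters('lhs_100')
    outputs = model.read_experiment_measures('lhs_100')
    mm = model.create_metamodel_from_data(inputs, outputs)
    # The returned MetaModel is callable, accepting the scope's inputs as
    # keyword arguments and returning a dict of measure values.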

AbstractCoreModel.create_metamodel_from_design(design_name: str, metamodel_id: Optional[int] = None, include_measures=None, exclude_measures=None, db=None, random_state=None, suppress_converge_warnings=False, regressor=None, find_best_metamodeltype=False)[source]

Create a MetaModel from a set of input and output observations.

Parameters:
  • design_name (str) – The name of the design to use.
  • metamodel_id (int, optional) – An identifier for this meta-model. If not given, a unique id number will be created randomly.
  • include_measures (Collection[str], optional) – If provided, only output performance measures with names in this set will be included.
  • exclude_measures (Collection[str], optional) – If provided, only output performance measures with names not in this set will be included.
  • random_state (int, optional) – A random state to use in the metamodel regression fitting.
  • suppress_converge_warnings (bool, default False) – Suppress convergence warnings during metamodel fitting.
  • regressor (Estimator, optional) – A scikit-learn estimator implementing a multi-target regression. If not given, a detrended simple Gaussian process regression is used.
  • find_best_metamodeltype (int, default 0) – Run a search to find the best metamodeltype for each performance measure, repeating each cross-validation step this many times. For more stable results, choose 3 or more, although larger numbers will be slow. If domain knowledge about the normal expected range and behavior of each performance measure is available, it is better to give the metamodeltype explicitly in the Scope.
Returns:

a callable object that, when called as if it were a function, accepts keyword arguments as inputs and returns a dictionary of (measure name: value) pairs.

Return type:

MetaModel

Raises:

ValueError – If the named design still has pending experiments.
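
More commonly, a metamodel is created directly from a completed design stored in the database (the design name is illustrative):

    mm = model.create_metamodel_from_design('lhs_100')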

AbstractCoreModel.create_metamodel_from_designs(design_names: str, metamodel_id: Optional[int] = None, include_measures=None, exclude_measures=None, db=None, random_state=None, suppress_converge_warnings=False)[source]

Create a MetaModel from multiple sets of input and output observations.

Parameters:
  • design_names (Collection[str]) – The names of the designs to use.
  • metamodel_id (int, optional) – An identifier for this meta-model. If not given, a unique id number will be created randomly.
  • include_measures (Collection[str], optional) – If provided, only output performance measures with names in this set will be included.
  • exclude_measures (Collection[str], optional) – If provided, only output performance measures with names not in this set will be included.
  • random_state (int, optional) – A random state to use in the metamodel regression fitting.
  • suppress_converge_warnings (bool, default False) – Suppress convergence warnings during metamodel fitting.
Returns:

a callable object that, when called as if it were a function, accepts keyword arguments as inputs and returns a dictionary of (measure name: value) pairs.

Return type:

MetaModel

Raises:

ValueError – If the named design still has pending experiments.