Basic EMAT Database API

class emat.database.Database(readonly=False)[source]

Bases: abc.ABC

Abstract Base Class for EMAT data storage

Database constains the design experiments, meta-model parameters, and the core and meta-model results (performance measures)

abstract add_scope_meas(scope_name, scp_m)[source]

Update the set of performance measures associated with the scope

Use this function when the core model runs are complete to add performance measures to the scope and post-process against the archived results

Parameters:
  • scope_name (str) – scope name, used to identify experiments, performance measures, and results associated with this run
  • scp_m (List[str]) – scope performance measures
Raises:

KeyError – If scope name does not exist or the performance measures are not initialized in the database.

abstract delete_experiment_measures(experiment_ids=None)[source]

Delete experiment performance measure results.

The method removes only the performance measures, not the parameters. This can be useful if a set of corrupted model results was stored in the database.

Parameters:experiment_ids (Collection, optional) – A collection of experiment id’s for which measures shall be deleted. Note that no scope or design are given here, experiments must be individually identified.
get_db_info()[source]

Get a short string describing this Database

Returns:str
info(stream=None)[source]

Print info about scopes and designs in this database.

abstract init_xlm(parameter_list, measure_list)[source]

Initialize or extend set of experiment variables and measures

Initialize database with universe of risk variables, policy variables, and performance measures. All variables and measures defined in scopes must be defined in this set. This method only needs to be run once after creating a new database.

Parameters:
  • parameter_list (List[tuple]) – Experiment variable tuples (variable name, type) where variable name is a string and type is ‘uncertainty’, ‘lever’, or ‘constant’
  • measure_list (List[tuple]) – Performance measure tuples (name, transform), where name is a string and transform is a defined transformation used in metamodeling, currently supported include {‘log’, None}.
property lock

Context manager to temporarily mark this database as locked.

abstract new_run_id(scope_name=None, parameters=None, location=None, experiment_id=None, source=0)[source]

Create a new run_id in the database.

Parameters:
  • scope_name (str) – scope name, used to identify experiments, performance measures, and results associated with this run
  • parameters (dict) – keys are experiment parameters, values are the experimental values to look up. Subsequent positional or keyword arguments are used to update parameters.
  • location (str or True, optional) – An identifier for this location (i.e. this computer). If set to True, the name of this node is found using the platform module.
  • experiment_id (int, optional) – The experiment id associated with this run. If given, the parameters are ignored.
  • source (int, default 0) – The metamodel_id of the source for this run, or 0 for a core model run.
Returns:

The run_id and experiment_id of the identified experiment

Return type:

Tuple[Int,Int]

Raises:
  • ValueError – If scope name does not exist
  • ValueError – If multiple experiments match an experiment definition. This can happen, for example, if the definition is incomplete.
abstract read_experiment_id(scope_name, *args, **kwargs)[source]

Read the experiment id previously defined in the database

Parameters:
  • scope_name (str) – scope name, used to identify experiments, performance measures, and results associated with this run
  • parameters (dict) – keys are experiment parameters, values are the experimental values to look up. Subsequent positional or keyword arguments are used to update parameters.
Returns:

the experiment id of the identified experiment

Return type:

int

Raises:
  • ValueError – If scope name does not exist
  • ValueError – If multiple experiments match an experiment definition. This can happen, for example, if the definition is incomplete.
abstract read_experiment_measure_sources(scope_name, design_name=None, experiment_id=None, design=None)[source]

Read all source ids from the results stored in the database.

Parameters:
  • scope_name (str) – A scope name, used to identify experiments, performance measures, and results associated with this exploratory analysis.
  • design_name (str, optional) – If given, only experiments associated with both the scope and the named design are returned, otherwise all experiments associated with the scope are returned.
  • experiment_id (int, optional) – The id of the experiment to retrieve. If omitted, get all experiments matching the named scope and design.
  • design (str) – Deprecated, use design_name.
Returns:

performance measure source ids

Return type:

List[Int]

abstract store_scope(scope)[source]

Save an emat.Scope directly to the database.

Parameters:scope (Scope) – The scope object to store.
Raises:KeyError – If scope name already exists.
abstract update_scope(scope)[source]

Update the emat scope information in the database.

Parameters:scope (Scope) – scope to update

Scopes

abstract Database.read_scope(scope_name=None)[source]

Load the pickled scope from the database.

Parameters:scope_name (str, optional) – The name of the scope to load. If not given and there is only one scope stored in the database, that scope is loaded. If not given and there are multiple scopes stored in the database, a KeyError is raised.
Returns:Scope
Raises:KeyError – If a name is given but is not found in the database, or if no name is given but there is more than one scope stored.
abstract Database.delete_scope(scope_name)[source]

Delete the scope from the database

Deletes the scope as well as any experiments and results associated with the scope

Parameters:scope_name (str) – scope name, used to identify experiments, performance measures, and results associated with this run
abstract Database.read_scope_names(design_name=None)list[source]

A list of all available scopes in the database.

Parameters:design_name (str, optional) – If a design name, is given, only scopes containing a design with this name are returned.
Returns:list

Scope Features

abstract Database.read_uncertainties(scope_name: str)list[source]

A list of all uncertainties for a given scope.

Parameters:scope_name (str) – scope name
abstract Database.read_levers(scope_name: str)list[source]

A list of all levers for a given scope.

Parameters:scope_name (str) – scope name
abstract Database.read_constants(scope_name: str)list[source]

A list of all constants for a given scope.

Parameters:scope_name (str) – scope name
abstract Database.read_measures(scope_name: str)list[source]

A list of all performance measures for a given scope.

Parameters:scope_name (str) – scope name

Boxes

abstract Database.write_box(box, scope_name=None)[source]

Write a single box to the database.

Parameters:
  • box (Box) – The Box to write to the database.
  • scope_name (str, optional) – The scope name to use when writing to the database. If the boxes has a particular scope assigned, the name of that scope is used.
Raises:

ValueError – If the box has a particular scope assigned, and scope_name is given but it is not the same name of the assigned scope.

abstract Database.write_boxes(boxes, scope_name=None)[source]

Write Boxes to the database.

Parameters:
  • boxes (Boxes) – The collection of Boxes to write to the database.
  • scope_name (str, optional) – The scope name to use when writing to the database. If the boxes has a particular scope assigned, the name of that scope is used.
Raises:

ValueError – If the boxes has a particular scope assigned, and scope_name is given but it is not the same name of the assigned scope.

abstract Database.read_box(scope_name: str, box_name: str, scope=None)[source]

Read a Box from the database.

Parameters:
  • scope_name (str) – The name of the scope from which to read the box.
  • box_name (str) – The name of the box to read.
  • scope (Scope, optional) – The Scope to assign to the Box that is returned. If not given, no Scope object is assigned to the box.
Returns:

Box

abstract Database.read_boxes(scope_name: Optional[str] = None, scope=None)[source]

Read Boxes from the database.

Parameters:
  • scope_name (str, optional) – The name of the scope from which to load Boxes. This is used exclusively to identify the Boxes to load from the database, and the scope by this name is not attached to the Boxes, unless scope is given, in which case this argument is ignored.
  • scope (Scope, optional) – The scope to assign to the Boxes. If not given, no Scope object is assigned.
Returns:

Boxes

abstract Database.read_box_names(scope_name: str)[source]

Get the names of all boxes associated with a particular scope.

Parameters:scope_name (str) – The name of the scope from which to read the Box names.
Returns:list[str]
abstract Database.read_box_parent_name(scope_name: str, box_name: str)[source]

Get the name of the parent box for a particular box in the database

Parameters:
  • scope_name (str) – The name of the scope from which to read the Box parent.
  • box_name (str) – The name of the box from which to read the parent.
Returns:

If the identified box has a parent, this is the name of that parent, otherwise None is returned.

Return type:

str or None

abstract Database.read_box_parent_names(scope_name: str)[source]

Get the name of the parent box for each box in the database.

Parameters:scope_name (str) – The name of the scope from which to read Box parents.
Returns:
dict
A dictionary, with keys giving Box names and values giving the respective Box parent names.

Experiments

abstract Database.write_experiment_parameters(scope_name, design_name, xl_df)[source]

Write experiment definitions the the database.

This method records values for each experiment parameter, for each experiment in a design of one or more experiments.

Parameters:
  • scope_name (str) – A scope name, used to identify experiments, performance measures, and results associated with this exploratory analysis. The scope with this name should already have been stored in this database.
  • design_name (str) – An experiment design name. This name should be unique within the named scope, and typically will include a reference to the design sampler, for example: ‘uni’ - generated by univariate sensitivity test design ‘lhs’ - generated by latin hypercube sample design The design_name is used primarily to load groups of related experiments together.
  • xl_df (pandas.Dataframe) – The columns of this DataFrame are the experiment parameters (i.e. policy levers, uncertainties, and constants), and each row is an experiment.
Returns:

the experiment id’s of the newly recorded experiments

Return type:

list

Raises:
  • UserWarning – If scope name does not exist
  • TypeError – If not all scope variables are defined in the exp_def
Database.write_experiment_parameters_1(scope_name, design_name: str, *args, **kwargs)[source]

Write experiment definitions for a single experiment.

This method records values for each experiment parameter, for a single experiment only.

Parameters:
  • scope_name (str) – A scope name, used to identify experiments, performance measures, and results associated with this exploratory analysis. The scope with this name should already have been stored in this database.
  • design_name (str) – An experiment design name. This name should be unique within the named scope, and typically will include a reference to the design sampler, for example: ‘uni’ - generated by univariate sensitivity test design ‘lhs’ - generated by latin hypercube sample design The design_name is used primarily to load groups of related experiments together.
  • *args (Mapping[s]) – A dictionary where the keys are experiment parameter names (i.e. policy levers, uncertainties, and constants), and values are the the parameter values for this experiment. Subsequent positional or keyword arguments are used to update the parameters.
  • **kwargs

    A dictionary where the keys are experiment parameter names (i.e. policy levers, uncertainties, and constants), and values are the the parameter values for this experiment. Subsequent positional or keyword arguments are used to update the parameters.

Returns:

The experiment id of the newly recorded experiments

Return type:

int

Raises:
  • UserWarning – If scope name does not exist
  • TypeError – If not all scope variables are defined in the exp_def
abstract Database.write_experiment_measures(scope_name, source, m_df, run_ids=None, experiment_id=None)[source]

Write experiment results to the database.

Write the performance measure results for each experiment in the scope - if the scope does not exist, nothing is recorded.

Note that the design_name is not required to write experiment measures, as the individual experiments from any design are uniquely identified by the experiment id’s.

Parameters:
  • scope_name (str) – A scope name, used to identify experiments, performance measures, and results associated with this exploratory analysis. The scope with this name should already have been stored in this database.
  • source (int) – An indicator of performance measure source. This should be 0 for a bona-fide run of the associated core models, or some non-zero metamodel_id number.
  • m_df (pandas.DataFrame) – The columns of this DataFrame are the performance measure names, and row indexes are the experiment id’s.
  • run_ids (pandas.Index, optional) – Provide an optional index of universally unique run ids (UUIDs) for these results. The UUIDs can be used to help identify problems and organize model runs.
Raises:

UserWarning – If scope name does not exist

abstract Database.write_experiment_all(scope_name, design_name, source, xlm_df)[source]

Write experiment definitions and results

Writes the values from each experiment variable and the results for each performance measure per experiment

Parameters:
  • scope_name (str) – A scope name, used to identify experiments, performance measures, and results associated with this exploratory analysis. The scope with this name should already have been stored in this database.
  • design_name (str) – An experiment design name. This name should be unique within the named scope, and typically will include a reference to the design sampler, for example: ‘uni’ - generated by univariate sensitivity test design ‘lhs’ - generated by latin hypercube sample design The design_name is used primarily to load groups of related experiments together.
  • source (int) – An indicator of performance measure source. This should be 0 for a bona fide run of the associated core models, or some non-zero metamodel_id number.
  • xlm_df (pandas.Dataframe) – The columns of this DataFrame are the experiment parameters (i.e. policy levers, uncertainties, and constants) and performance measures, and each row is an experiment.
Raises:
  • DesignExistsError – If scope and design already exist
  • TypeError – If not all scope variables are defined in the experiment
abstract Database.read_experiment_parameters(scope_name, design_name=None, only_pending=False, design=None, *, experiment_ids=None, ensure_dtypes=True)[source]

Read experiment definitions from the database.

Read the values for each experiment parameter per experiment.

Parameters:
  • scope_name (str) – A scope name, used to identify experiments, performance measures, and results associated with this exploratory analysis.
  • design_name (str, optional) – If given, only experiments associated with both the scope and the named design are returned, otherwise all experiments associated with the scope are returned.
  • only_pending (bool, default False) – If True, only pending experiments (which have no performance measure results stored in the database) are returned.
  • design (str, optional) – Deprecated. Use design_name.
  • experiment_ids (Collection, optional) – A collection of experiment id’s to load. If given, both design_name and only_pending are ignored.
  • ensure_dtypes (bool, default True) – If True, the scope associated with these experiments is also read out of the database, and that scope file is used to format experimental data consistently (i.e., as float, integer, bool, or categorical).
Returns:

The experiment parameters are returned in a subclass of a normal pandas.DataFrame, which allows attaching the design_name as meta-data to the DataFrame.

Return type:

emat.ExperimentalDesign

Raises:

ValueError – if scope_name is not stored in this database

abstract Database.read_experiment_measures(scope_name, design_name=None, experiment_id=None, source=None, design=None, runs=None)[source]

Read experiment results from the database.

Parameters:
  • scope_name (str or Scope) – A scope or just its name, used to identify experiments, performance measures, and results associated with this exploratory analysis.
  • design_name (str, optional) – If given, only experiments associated with both the scope and the named design are returned, otherwise all experiments associated with the scope are returned.
  • experiment_id (int, optional) – The id of the experiment to retrieve. If omitted, get all experiments matching the named scope and design.
  • source (int, optional) – The source identifier of the experimental outcomes to load. If not given, but there are only results from a single source in the database, those results are returned. If there are results from multiple sources, an error is raised.
  • design (str) – Deprecated, use design_name.
  • runs ({None, 'all', 'valid', 'invalid'}, default None) – By default, this method fails if there is more than one valid model run matching the given design_name and source (if any) for any experiment. Set this to ‘valid’ or ‘invalid’ to get all valid or invalid model runs (instead of raising an exception). Set to ‘all’ to get everything, including both valid and invalidated results.
  • formulas (bool, default True) – If the scope includes formulaic measures (computed directly from other measures) then compute these values and include them in the results.
Returns:

performance measures

Return type:

results (pandas.DataFrame)

Raises:

ValueError – When the database contains multiple sets of results matching the given design_name and/or source (if any) for any experiment.

abstract Database.read_experiment_all(scope_name, design_name=None, source=None, *, only_pending=False, only_incomplete=False, only_complete=False, only_with_measures=False, ensure_dtypes=True, with_run_ids=False, runs=None)[source]

Read experiment definitions and results

Read the values from each experiment variable and the results for each performance measure per experiment.

Parameters:
  • scope_name (str) – A scope name, used to identify experiments, performance measures, and results associated with this exploratory analysis.
  • design_name (str or Collection[str], optional) – The experimental design name (a single str) or a collection of design names to read.
  • source (int, optional) – The source identifier of the experimental outcomes to load. If not given, but there are only results from a single source in the database, those results are returned. If there are results from multiple sources, an error is raised.
  • only_pending (bool, default False) – If True, only pending experiments (which have no performance measure results stored in the database) are returned. Experiments that have any results, even if only partial results, are excluded.
  • only_incomplete (bool, default False) – If True, only incomplete experiments (which have at least one missing performance measure result that is not stored in the database) are returned. Only complete experiments (that have every performance measure populated) are excluded.
  • only_complete (bool, default False) – If True, only complete experiments (which have no missing performance measure results stored in the database) are returned.
  • only_with_measures (bool, default False) – If True, only experiments with at least one stored performance measure are returned.
  • ensure_dtypes (bool, default True) – If True, the scope associated with these experiments is also read out of the database, and that scope file is used to format experimental data consistently (i.e., as float, integer, bool, or categorical).
  • with_run_ids (bool, default False) – Whether to use a two-level pd.MultiIndex that includes both the experiment_id (which always appears in the index) as well as the run_id (which only appears in the index if this argument is set to True).
  • runs ({None, 'all', 'valid', 'invalid'}, default None) – By default, this method returns the one and only valid model run matching the given design_name and source (if any) for any experiment, and fails if there is more than one such valid run. Set this to ‘valid’ or ‘invalid’ to get all valid or invalid model runs (instead of raising an exception). Set to ‘all’ to get everything, including both valid and invalidated results.
Returns:

The experiment parameters are returned in a subclass of a normal pandas.DataFrame, which allows attaching the design_name as meta-data to the DataFrame.

Return type:

emat.ExperimentalDesign

Raises:

ValueError – When no source is given but the database contains results from multiple sources.

abstract Database.read_experiment_ids(scope_name, xl_df)[source]

Read the experiment ids previously defined in the database.

This method is used to recover the experiment id, if the set of parameter values is known but the id of the experiment is not known.

Parameters:
  • scope_name (str) – scope name, used to identify experiments, performance measures, and results associated with this run
  • xl_df (pandas.DataFrame) – columns are experiment parameters, each row is a full experiment
Returns:

the experiment id’s of the identified experiments

Return type:

list

Raises:
  • ValueError – If scope name does not exist
  • ValueError – If multiple experiments match an experiment definition. This can happen, for example, if the definition is incomplete.
abstract Database.read_design_names(scope_name: str)list[source]

A list of all available designs for a given scope.

Parameters:scope_name (str) – scope name, used to identify experiments, performance measures, and results associated with this run
abstract Database.delete_experiments(scope_name, design_name=None, design=None)[source]

Delete experiment definitions and results.

The method removes the linkage between experiments and the identified experimental design. Experiment parameters and results are only removed if they are also not linked to any other experimental design stored in the database.

Parameters:
  • scope_name (str) – scope name, used to identify experiments, performance measures, and results associated with this run
  • design_name (str) – Only experiments associated with both the scope and the named design are deleted.
  • design (str) – Deprecated, use design_name.

Meta-Models

abstract Database.get_new_metamodel_id(scope_name)[source]

Get a new unused metamodel id for a given scope.

Parameters:scope_name (str) – scope name
Returns:int
abstract Database.write_metamodel(scope_name, metamodel, metamodel_id=None, metamodel_name='')[source]

Store a meta-model in the database

Parameters:
  • scope_name (str) – scope name
  • metamodel (emat.MetaModel) – The meta-model to be stored. If a PythonCoreModel containing a MetaModel is given, the MetaModel will be extracted.
  • metamodel_id (int, optional) – A unique id number for this metamodel. If no id number is given and it cannot be inferred from metamodel, a unique id number will be created.
  • metamodel_name (str, optional) – A name for this meta-model. If no name is given and it cannot be inferred from metamodel, an empty string is used.
abstract Database.read_metamodel_ids(scope_name)[source]

A list of all metamodel id’s for a given scope.

Parameters:scope_name (str) – scope name
abstract Database.read_metamodel(scope_name, metamodel_id=None)[source]

Retrieve a meta-model from the database.

Parameters:
  • scope_name (str) – scope name
  • metamodel_id (int, optional) – A unique id number for this metamodel. If not given but there is exactly one metamodel stored for the given scope, that metamodel will be returned.
Returns:

The meta-model, ready to use

Return type:

PythonCoreModel