Scope¶
The exploratory Scope provides a high-level set of instructions describing what inputs and outputs a model provides, and what ranges and/or distributions of these inputs will be considered in an exploratory analysis.
The Scope API outlined below provides the framework for interacting with an existing scope in Python. However, the process of defining a new scope is generally done by writing a scope definition YAML file.
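To give a sense of the general shape of such a file, here is a small sketch of a scope definition. All names and values below are hypothetical illustrations; consult the emat documentation for the authoritative schema.

```yaml
# Hypothetical scope definition sketch -- field names and values are
# illustrative, not a verbatim copy of the emat schema.
scope:
    name: example-scope
    desc: A toy scope with one uncertainty, one lever, and one measure.

inputs:
    fuel_cost:           # an exogenous uncertainty
        ptype: uncertainty
        dtype: float
        min: 0.50
        max: 2.50
    lane_widening:       # a policy lever
        ptype: lever
        dtype: bool
        default: false

outputs:
    total_travel_time:
        kind: minimize
```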
class emat.Scope(scope_file, scope_def=None)[source]¶
Definitions for the relevant inputs and outputs for a model.
A Scope provides a structure to define the nature of the inputs and outputs for exploratory modeling.
Read / Write¶
Scope.store_scope(db: emat.database.database.Database)[source]¶
Write variables and scope definition to database.
Writing the scope to the database is required prior to running any experiments that will be stored.
Parameters: db (Database) – database object
Scope.delete_scope(db: emat.database.database.Database)[source]¶
Deletes scope from database.
Parameters: db (Database) – The database from which to delete this Scope.
Note: Only the name attribute is used to identify the scope to delete. If a different scope is stored in the database with the same name as this scope, it will be deleted.
Scope.dump(stream=None, filename=None, strip_measure_transforms=False, include_measures=None, exclude_measures=None, default_flow_style=False, **kwargs)[source]¶
Serialize this scope into a YAML stream.
Parameters: - stream (file-like or None) – Serialize into this stream. If None, return the produced string instead, unless filename is given.
- filename (path-like or None) – If given and stream is None, then write the serialized result into this file.
- strip_measure_transforms (bool, default False) – Remove the ‘transform’ values from all measures in the output.
- include_measures (Collection[str], optional) – If provided, only output performance measures with names in this set will be included.
- exclude_measures (Collection[str], optional) – If provided, only output performance measures with names not in this set will be included.
- default_flow_style (bool, default False) – Use the default_flow_style, see yaml.dump for details.
- **kwargs – All other keyword arguments are forwarded as-is to yaml.dump.
Returns: If both stream and filename are None, the serialized YAML content is returned as a string.
Raises: - FileExistsError – If filename already exists.
- ValueError – If both stream and filename are given.
Scope.duplicate(strip_measure_transforms=False, include_measures=None, exclude_measures=None)[source]¶
Create a duplicate scope, optionally stripping some features.
Parameters: - strip_measure_transforms (bool, default False) – Remove the ‘transform’ values from all measures.
- include_measures (Collection[str], optional) – If provided, only output performance measures with names in this set will be included.
- exclude_measures (Collection[str], optional) – If provided, only output performance measures with names not in this set will be included.
Returns: Scope
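The include_measures/exclude_measures options (shared by dump and duplicate) read as straightforward set filtering. The sketch below is a plausible rendering of the documented semantics, not the actual emat code; `filter_measures` is a hypothetical helper.

```python
def filter_measures(measure_names, include=None, exclude=None):
    """Illustrative filter matching the documented include/exclude semantics."""
    if include is not None:
        # Keep only measures whose names are in the include set.
        measure_names = [m for m in measure_names if m in set(include)]
    if exclude is not None:
        # Drop measures whose names are in the exclude set.
        measure_names = [m for m in measure_names if m not in set(exclude)]
    return measure_names
```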
Feature Access¶
Names¶
Utilities¶
Scope.info(return_string=False)[source]¶
Print a summary of this Scope.
Parameters: return_string (bool, default False) – If True, return the summary as a string instead of printing it to stdout.
Scope.design_experiments(*args, **kwargs)[source]¶
Create a design of experiments based on this Scope.
Parameters: - n_samples_per_factor (int, default 10) – The number of samples in the design per random factor.
- n_samples (int or tuple, optional) – The total number of samples in the design. If jointly is False, this is the number of samples in each of the uncertainties and the levers, the total number of samples will be the square of this value. Give a 2-tuple to set values for uncertainties and levers respectively, to set them independently. If this argument is given, it overrides n_samples_per_factor.
- random_seed (int or None, default 1234) – A random seed for reproducibility.
- db (Database, optional) – If provided, this design will be stored in the database indicated.
- design_name (str, optional) – A name for this design, to identify it in the database. If not given, a unique name will be generated based on the selected sampler. Has no effect if no db is given.
- sampler (str or AbstractSampler, default 'lhs') – The sampler to use for this design. Available pre-defined samplers include:
  - 'lhs': Latin hypercube sampling
  - 'ulhs': Uniform Latin hypercube sampling, which ignores defined distribution shapes from the scope and samples everything as if it was from a uniform distribution
  - 'mc': Monte Carlo sampling
  - 'uni': Univariate sensitivity testing, whereby experiments are generated setting each parameter individually to minimum and maximum values (for numeric dtypes) or all possible values (for boolean and categorical dtypes). Note that designs for univariate sensitivity testing are deterministic and the number of samples given is ignored.
- sample_from ('all', 'uncertainties', or 'levers') – Which scope components to sample from. Components not sampled are set at their default values in the design.
- jointly (bool, default True) – Whether to sample jointly all uncertainties and levers in a single design, or, if False, to generate separate samples for levers and uncertainties, and then combine the two in a full-factorial manner. This argument has no effect unless sample_from is ‘all’. Note that setting jointly to False may produce a very large design, as the total number of experiments will be the product of the number of experiments for the levers and the number of experiments for the uncertainties, which are set separately (i.e. if n_samples is given, the total number of experiments is the square of that value).
Returns: The resulting design.
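To give a sense of what the default 'lhs' sampler produces, here is a minimal, self-contained Latin hypercube sketch using only the standard library. The real sampler comes from the underlying exploratory-modeling toolkit and respects the distributions defined in the scope; this only illustrates the stratification idea, and `latin_hypercube` is a hypothetical name.

```python
import random


def latin_hypercube(n_samples, n_factors, seed=1234):
    """Draw an n_samples x n_factors Latin hypercube on [0, 1)."""
    rng = random.Random(seed)
    design = []
    for _ in range(n_factors):
        # One draw from each of n_samples equal-width strata ...
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # ... paired randomly across factors by shuffling each column.
        rng.shuffle(column)
        design.append(column)
    # Transpose so rows are experiments and columns are factors.
    return [list(row) for row in zip(*design)]
```

Because every factor gets exactly one draw per stratum, each marginal distribution is evenly covered even with few samples, which is why 'lhs' is a common default for exploratory designs.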
Scope.ensure_dtypes(df)[source]¶
Convert columns of a dataframe to the correct dtype as needed.
Parameters: df (pandas.DataFrame) – A dataframe with column names that are uncertainties, levers, or measures.
Returns: The same data as input, but with dtypes as appropriate.
Return type: pandas.DataFrame