lir.config package

Submodules

lir.config.base module

class lir.config.base.ConfigParser

Bases: ABC

Abstract base configuration parser class.

Each subclass must implement a parse() method dedicated to parsing a specific aspect of the configuration, e.g. the configuration for setting up the numpy CSV writer.

static get_type_name(cls: Any) str

Return the full type name of the cls argument.

abstractmethod parse(config: ContextAwareDict, output_dir: Path) Any

Dedicated function to parse a specific section of a YAML configuration.

Parameters:
  • config – a section of a YAML configuration, which carries its context (the path of this section in the YAML configuration)

  • output_dir – the directory where the returned object may write results during its lifetime

Returns:

an object that is configured according to config.

reference() str

Return the full class name that was used to initialize this parser.

By default, the name of this class is returned. In a subclass that was initialized with another class or function that does the actual work, the name of that class or function is returned.

class lir.config.base.ContextAwareDict(context: list[str], *args: Any, **kwargs: Any)

Bases: dict

Dictionary wrapper which has knowledge about its context.

clone(context: list[str] | None = None) ContextAwareDict

Create a new instance of the ContextAwareDict, keeping the context (or using context, if given).
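To illustrate the idea, a dict subclass that remembers its context could be sketched as follows. The class name SketchContextDict and its context attribute are hypothetical; the sketch only mirrors the constructor and clone() signatures above, and the assumption that clone() without an argument keeps the current context is inferred from the description.

```python
from __future__ import annotations

from typing import Any


class SketchContextDict(dict):
    """Hypothetical dict that remembers its path (context) in a YAML configuration."""

    def __init__(self, context: list[str], *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.context = list(context)

    def clone(self, context: list[str] | None = None) -> SketchContextDict:
        # Copy the items (shallow); keep the current context unless a new one is given.
        new_context = self.context if context is None else context
        return SketchContextDict(new_context, self)


section = SketchContextDict(["lr_system", "comparing"], method="logistic_regression")
cloned = section.clone()
```

Keeping the context next to the data means an error raised deep inside a parser can still report where in the YAML file the offending section lives.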

class lir.config.base.ContextAwareList(context: list[str], *args: Any, **kwargs: Any)

Bases: list

List wrapper which has knowledge about its context.

clone(context: list[str] | None = None) ContextAwareList

Create a new instance of the ContextAwareList, keeping the context (or using context, if given).

class lir.config.base.GenericConfigParser(component_class: type[Any])

Bases: ConfigParser

Config parser that returns an instance of component_class, initialized with the arguments specified in the configuration.

parse(config: ContextAwareDict, output_dir: Path) Any

Instantiate component_class, using the arguments specified in config.

reference() str

Return the full name of the component_class class argument.

class lir.config.base.GenericFunctionConfigParser(component_class: Callable)

Bases: ConfigParser

Parser for callable functions or component classes.

parse(config: ContextAwareDict, output_dir: Path) Callable

Return a callable based on component_class, configured according to config.

reference() str

Return the full name of the component_class function argument.

exception lir.config.base.YamlParseError(config_context_path: list[str], message: str)

Bases: ValueError

Error raised when parsing a YAML configuration fails; the error mentions the YAML path at which parsing failed.

lir.config.base.check_is_empty(config: ContextAwareDict, accept_keys: Sequence[str] | None = None) None

Ensure that all expected arguments have been parsed, and raise an error about any ignored arguments.

If any unexpected arguments remain in config, a YamlParseError is raised indicating that the argument was unexpected and not taken into account (i.e. not parsed). This ensures that users do not assume that arguments are parsed when they are in fact not recognized.
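The pop-then-check methodology can be sketched with plain dictionaries. The helper names sketch_check_is_empty and parse_scaler are hypothetical, the ValueError stands in for YamlParseError (which is a ValueError subclass), and accept_keys is left out of this sketch.

```python
from __future__ import annotations

from typing import Any


def sketch_check_is_empty(config: dict[str, Any]) -> None:
    """Raise if any argument is left in config after parsing (hypothetical helper)."""
    if config:
        leftover = ", ".join(sorted(config))
        raise ValueError(f"unexpected argument(s), not taken into account: {leftover}")


def parse_scaler(config: dict[str, Any]) -> dict[str, Any]:
    # Pop every argument this parser understands...
    with_mean = config.pop("with_mean", True)
    # ...then verify that nothing unexpected was left behind.
    sketch_check_is_empty(config)
    return {"with_mean": with_mean}
```

With this pattern a misspelled key such as with_men is reported instead of being silently ignored.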

lir.config.base.check_not_none(v: Any) Any

Validate that the value is not None.

lir.config.base.check_type(type_class: Any, v: ContextAwareDict | ContextAwareList | None | int | float | str, message: str | None = None) Any

Check whether a value is an instance of a type.

Returns the value if successful, raises an exception otherwise.

Value types that may be found in YAML configurations: dict, list, int, float, str and NoneType.

Parameters:
  • type_class – the target type

  • v – the value to check

  • message – an optional message that is used in case of an error

Returns:

the value
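A check that returns its input can be used inline while parsing. The sketch below assumes a ValueError (the description only says an exception is raised), and the helper name sketch_check_type is hypothetical.

```python
from __future__ import annotations

from typing import Any


def sketch_check_type(type_class: Any, v: Any, message: str | None = None) -> Any:
    """Return v if it is an instance of type_class, raise otherwise (hypothetical helper)."""
    if not isinstance(v, type_class):
        expected = getattr(type_class, "__name__", str(type_class))
        raise ValueError(message or f"expected {expected}, found {type(v).__name__}")
    return v


learning_rate = sketch_check_type(float, 0.1)
```

Because bool is a subclass of int in Python, a YAML boolean passes an int check in this sketch.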

lir.config.base.config_parser(func: Callable[[ContextAwareDict, Path], Any]) Callable

Wrap parsing functions in a ConfigParser object by providing a decorator.

The ConfigParser object exposes a parse() method, required by the API.

The @config_parser decorator exposes the body of the decorated function through the parse() method of the wrapping object.

This decorator can be used as follows (example):

```
from lir.config.base import config_parser, pop_field


@config_parser
def foo(config, output_dir):
    # pop_field() raises an error if a required field is missing
    some_argument = pop_field(config, "some_argument")
    another_argument = pop_field(config, "another_argument")
    return Bar(some_argument, another_argument)  # Bar: the class to be configured
```

Now, the function foo() is wrapped within a ConfigParser object, which exposes the function body of foo() through the parse() method. See documentation of ConfigParser for the meaning of the arguments.
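The wrapping mechanism itself can be illustrated with a self-contained sketch: a decorator that returns an object whose parse() method calls the decorated function. SketchParser and sketch_config_parser are hypothetical names that only mirror the behavior described above; they are not the lir implementation.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable


class SketchParser:
    """Hypothetical stand-in for a ConfigParser wrapping a plain function."""

    def __init__(self, func: Callable[[dict, Path], Any]) -> None:
        self._func = func

    def parse(self, config: dict, output_dir: Path) -> Any:
        # Delegate to the body of the decorated function.
        return self._func(config, output_dir)

    def reference(self) -> str:
        # Report the wrapped function rather than the wrapper class.
        return f"{self._func.__module__}.{self._func.__qualname__}"


def sketch_config_parser(func: Callable[[dict, Path], Any]) -> SketchParser:
    return SketchParser(func)


@sketch_config_parser
def foo(config: dict, output_dir: Path) -> tuple:
    return (config["some_argument"], output_dir)


result = foo.parse({"some_argument": 1}, Path("out"))
```

After decoration, the name foo refers to the wrapper object, so callers use foo.parse(...) rather than foo(...).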

lir.config.base.parse_pairing_config(module_config: ContextAwareDict | str, output_dir: Path, context: list[str]) PairingMethod

Parse the pairing configuration and delegate to the corresponding function of the defined pairing method.

The argument module_config defines the pairing method. If its value is a str, the registry is queried and the corresponding pairing method is returned. If its value is a dict, the pairing method is defined by the value module_config["method"], and the registry is queried for the config parser of the corresponding pairing method. The remaining values in module_config are passed as arguments to the configuration parser of the pairing method.

If the registry cannot resolve the pairing method, an exception is raised.
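The str/dict dispatch can be sketched against a plain dictionary standing in for the registry. The registry contents, the method name instance_pairs and the helper sketch_parse_pairing are hypothetical, and as a simplification a string is treated like a dictionary containing only a method field.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical registry mapping a pairing method name to its config parser.
REGISTRY: dict[str, Callable[..., Any]] = {
    "instance_pairs": lambda **kwargs: ("instance_pairs", kwargs),
}


def sketch_parse_pairing(module_config: str | dict[str, Any]) -> Any:
    if isinstance(module_config, str):
        # Simplification: a string behaves like {"method": <string>}.
        module_config = {"method": module_config}
    arguments = dict(module_config)
    method = arguments.pop("method")
    if method not in REGISTRY:
        raise ValueError(f"unknown pairing method: {method}")
    # The remaining values become arguments of the method's config parser.
    return REGISTRY[method](**arguments)
```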

lir.config.base.pop_field(config: ContextAwareDict | Any, field: str, default: Any = None, required: bool | None = None, validate: Callable[[Any], Any] | None = None) Any

Validate and retrieve the value for a given field, after which it is removed from the configuration.

Parameters:
  • config – the configuration

  • field – the field to obtain from the config

  • default – the value to return if the field is not found; defaults to None; if the value is not None, the required argument defaults to False

  • required – if True and the field was not found, raise an error; defaults to True unless default is not None

  • validate – a callable to validate the value type

Returns:

the field value, or the default value if the field is not found; an error is raised if a required field is missing
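The interplay between default and required can be sketched with a plain dict. The name sketch_pop_field and the ValueError are assumptions; whether lir also validates the default value is not stated, so this sketch validates only values actually found in the configuration.

```python
from __future__ import annotations

from typing import Any, Callable


def sketch_pop_field(
    config: dict[str, Any],
    field: str,
    default: Any = None,
    required: bool | None = None,
    validate: Callable[[Any], Any] | None = None,
) -> Any:
    if required is None:
        # A field is required unless a non-None default is given.
        required = default is None
    if field not in config:
        if required:
            raise ValueError(f"missing required field: {field}")
        return default
    # Remove the field so a later emptiness check only sees unparsed arguments.
    value = config.pop(field)
    return validate(value) if validate is not None else value
```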

lir.config.data_providers module

lir.config.data_providers.parse_data_provider(cfg: ContextAwareDict, output_path: Path) DataProvider

Instantiate a specific implementation of DataProvider as configured.

The type field is parsed, which is expected to refer to a name in the registry. See for example lir.config.data_sources.synthesized_normal_binary or lir.config.data_sources.synthesized_normal_multiclass.

Data sources are provided under the data_sources key.

lir.config.data_strategies module

lir.config.data_strategies.parse_data_strategy(cfg: ContextAwareDict, output_path: Path) DataStrategy

Instantiate a specific implementation of DataStrategy as configured.

The setup field is parsed, which is expected to refer to a name in the registry. See for example lir.data_setup.binary_cross_validation or lir.data_setup.binary_train_test_split.

Data setup configuration is provided under the data_setup key.

lir.config.experiment_strategies module

class lir.config.experiment_strategies.ExperimentStrategyConfigParser

Bases: ConfigParser, ABC

Base class for an experiment strategy configuration parser.

data() tuple[DataProvider, DataStrategy]

Parse the data section of the configuration.

Returns the corresponding data provider and data (splitting) strategy instances.

abstractmethod get_experiment(name: str) Experiment

Get the experiment by name for the defined LR system.

lrsystem() tuple[ContextAwareDict, list[Hyperparameter]]

Parse the LR system section, including hyperparameters.

Returns the baseline configuration along with the parameters to vary (the defined hyperparameters).

output_list() Sequence[Aggregation]

Initialize the aggregation classes defined in the output section.

The initialized aggregation objects are returned as a sequence, to be iterated over at a later stage.

parse(config: ContextAwareDict, output_dir: Path) Experiment

Parse the experiment section of the configuration.

primary_metric() Callable

Parse the primary_metric field.

class lir.config.experiment_strategies.GridStrategy

Bases: ExperimentStrategyConfigParser

Prepare an Experiment consisting of multiple runs using configuration values.

get_experiment(name: str) Experiment

Get the experiment for the grid strategy run, based on its name.
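A grid strategy commonly runs every combination of hyperparameter options. Assuming that is what GridStrategy does (the description above does not spell it out), the combinations can be enumerated with itertools.product. The Option tuple mirrors HyperparameterOption; the paths data.seed and comparing.clf values and the name-joining and substitution-merging scheme are hypothetical.

```python
from itertools import product
from typing import Any, Mapping, NamedTuple


class Option(NamedTuple):
    name: str
    substitutions: Mapping[str, Any]


clf_options = [
    Option("logit", {"comparing.clf": {"method": "logistic_regression", "C": 1}}),
    Option("svm", {"comparing.clf": {"method": "svm", "probability": True}}),
]
seed_options = [Option("seed0", {"data.seed": 0}), Option("seed1", {"data.seed": 1})]

runs = []
for combination in product(clf_options, seed_options):
    # One run per combination: join the option names, merge the substitutions.
    name = "_".join(option.name for option in combination)
    substitutions = {k: v for option in combination for k, v in option.substitutions.items()}
    runs.append((name, substitutions))
```

The number of runs is the product of the option counts, so a grid grows quickly; that is where an optimizing strategy such as OptunaStrategy becomes attractive.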

class lir.config.experiment_strategies.OptunaStrategy

Bases: ExperimentStrategyConfigParser

Prepare an Experiment for optimizing configuration parameters.

get_experiment(name: str) Experiment

Get the experiment for the Optuna run, based on its name.

class lir.config.experiment_strategies.SingleRunStrategy

Bases: ExperimentStrategyConfigParser

Prepare an Experiment consisting of a single run using configuration values.

get_experiment(name: str) Experiment

Get an experiment for a single run, based on its name.

lir.config.experiment_strategies.parse_experiment_strategy(config: ContextAwareDict, output_path: Path) Experiment

Instantiate the corresponding experiment strategy class, e.g. for a single or grid run.

The resulting Experiment object is returned.

lir.config.experiment_strategies.parse_experiments(cfg: ContextAwareDict, output_path: Path) Mapping[str, Experiment]

Extract the experiments to run, as defined in the configuration.

Parameters:
  • cfg – a dict object describing the experiments

  • output_path – the filesystem path to the results directory

Returns:

a mapping of names to experiments

lir.config.lrsystem_architectures module

class lir.config.lrsystem_architectures.ParsedLRSystem(lrsystem: LRSystem, config: ContextAwareDict, output_dir: Path)

Bases: LRSystem

Represent an initialized LR system, parsed from the provided configuration.

apply(instances: InstanceData) LLRData

Use the fitted LR system to calculate LLR data for the input instance data.

fit(instances: InstanceData) Self

Fit the LR system on the instance data.

lir.config.lrsystem_architectures.parse_augmented_lrsystem(baseline_lrsystem_config: ContextAwareDict, hyperparameters: dict[str, HyperparameterOption], output_dir: Path, dirname_prefix: str = '') ParsedLRSystem

Parse an augmented LR system.

The LR system is parsed from a base configuration and a set of parameter substitutions that override parts of the base configuration. Results are written to a subdirectory of output_dir that is named by its parameter substitutions and prefixed by dirname_prefix.

Parameters:
  • baseline_lrsystem_config – the base LR system configuration

  • hyperparameters – hyperparameter substitutions that override parts of the base configuration

  • output_dir – the directory in which to create a results directory

  • dirname_prefix – the prefix of the created directory name

Returns:

the LR system

lir.config.lrsystem_architectures.parse_default_pipeline(config: ContextAwareDict) str

Parse the intermediate result field from the configuration to determine the default pipeline method.

Parameters:

config – the configuration dictionary

Returns:

the default method to use ('logging_pipeline' or 'pipeline')

lir.config.lrsystem_architectures.parse_lrsystem(config: ContextAwareDict, output_dir: Path) ParsedLRSystem

Determine and initialize the corresponding LR system from configuration values.

LR systems are provided under the architectures key.

lir.config.metrics module

lir.config.metrics.parse_individual_metric(name: str, output_path: Path, context: list[str]) Callable

Use a config parser to interpret an individual metric from the metric section of the configuration.

lir.config.substitution module

Substitution module.

This module offers support functions for replacing or modifying components of an LR Benchmark pipeline at runtime, for example to compare a logistic regression approach with a support vector approach, or to optimize a given (hyper)parameter of the system.

For example, in the model_selection_run benchmark below, the parameters section defines comparing.clf as the path to modify, with the options defined in the values section. Each option replaces (updates) the comparing module defined in the LR system configuration used in this pipeline.

```
benchmarks:
  model_selection_run:
    lr_system: …
    …
    parameters:
      - path: comparing.clf
        values:
          - name: logit
            method: logistic_regression
            C: 1
          - name: svm
            method: svm
            probability: True
```

class lir.config.substitution.CategoricalHyperparameter(name: str, options: list[HyperparameterOption])

Bases: Hyperparameter

A categorical hyperparameter.

A categorical hyperparameter has the following fields in a YAML configuration:
  • path: the path of this hyperparameter in the LR system configuration

  • options: a list of options

options() list[HyperparameterOption]

Provide API access to the options for the hyperparameter.

class lir.config.substitution.FloatHyperparameter(path: str, low: float, high: float, step: float | None, log: bool)

Bases: Hyperparameter

A floating point hyperparameter.

A floating point hyperparameter has the following fields in a YAML configuration:
  • path: the path of this hyperparameter in the LR system configuration

  • low: the lowest possible value

  • high: the highest possible value

  • step (optional): the step size

  • log (optional): if True, search in log space instead of linear space; cannot be combined with step (defaults to False)

options() list[HyperparameterOption]

Provide API access to the options for the hyperparameter.

class lir.config.substitution.FolderHyperparameter(path: str, folder: str, ignore_files: list[str] | None = None)

Bases: Hyperparameter

A folder hyperparameter that takes all files in a given folder as options.

A folder hyperparameter has the following fields in a YAML configuration:
  • path: the path of this hyperparameter in the LR system configuration

  • folder: the path of the folder containing the options

  • ignore_files (optional): a list of file patterns to ignore

The generated options will have the full path of each file as both name and value.

An example configuration is as follows:

```
hyperparameters:
  - path: data.provider.path
    type: folder
    folder: project_files/my_dataset/
    ignore_files:  # Optional list of file patterns to ignore.
      - '*.tmp'
      - 'ignore_this_file.csv'
```

A ValueError can be raised in the following situations:
  • the given folder does not exist (applies during initialization)

  • no valid files are found in the folder after applying the ignore list (applies when calling the options() method)

options() list[HyperparameterOption]

Generate the options by walking over the folder.
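Walking a folder and filtering by ignore patterns can be sketched with pathlib and fnmatch. Several details are assumptions: the walk is recursive, patterns are matched against file names only, both ValueError checks happen in one function (in the documented class the missing-folder check happens at initialization), and each option is a plain (name, substitutions) tuple. The helper name sketch_folder_options is hypothetical.

```python
from __future__ import annotations

import fnmatch
from pathlib import Path


def sketch_folder_options(
    path: str, folder: str, ignore_files: list[str] | None = None
) -> list[tuple[str, dict[str, str]]]:
    root = Path(folder)
    if not root.is_dir():
        raise ValueError(f"folder does not exist: {folder}")
    patterns = ignore_files or []
    options = []
    for file in sorted(root.rglob("*")):
        if not file.is_file():
            continue
        if any(fnmatch.fnmatch(file.name, pattern) for pattern in patterns):
            continue
        full_path = str(file.resolve())
        # The full path serves as both the option name and its value.
        options.append((full_path, {path: full_path}))
    if not options:
        raise ValueError(f"no valid files found in folder: {folder}")
    return options
```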

class lir.config.substitution.Hyperparameter(name: str)

Bases: ABC

Base class for all hyperparameters.

abstractmethod options() list[HyperparameterOption]

Get a list of values that a hyperparameter can take in the context of a particular experiment.

Returns:

a list of HyperparameterOption

class lir.config.substitution.HyperparameterOption(name: str, substitutions: Mapping[str, Any])

Bases: NamedTuple

An option for a value of a hyperparameter.

A HyperparameterOption is a named tuple with two fields:
  • name: a descriptive name of this option

  • substitutions: a mapping of configuration paths to values

name: str

Alias for field number 0

substitutions: Mapping[str, Any]

Alias for field number 1

lir.config.substitution.parse_folder(spec: ContextAwareDict, output_path: Path) FolderHyperparameter

Parse the parameters section of the configuration into a FolderHyperparameter object.

lir.config.substitution.parse_hyperparameter(spec: ContextAwareDict, output_dir: Path) Hyperparameter

Parse the parameters section of the configuration into the corresponding Hyperparameter object.

lir.config.substitution.substitute_hyperparameters(base_config: ContextAwareDict, hyperparameters: Mapping[str, Any], context: list[str]) ContextAwareDict

Substitute hyperparameters in an LR system configuration and return the updated configuration.

Parameters:
  • base_config – the original LR system configuration

  • hyperparameters – the hyperparameters and their values

  • context – the context path of the augmented configuration

Returns:

the augmented LR system configuration
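Applying substitutions to a nested configuration can be sketched as follows, assuming paths are dot-separated keys into nested dictionaries (as in comparing.clf and data.provider.path above). The helper name sketch_substitute is hypothetical; creating missing sections and not handling list indices are simplifications of this sketch, not documented behavior.

```python
from __future__ import annotations

import copy
from typing import Any, Mapping


def sketch_substitute(base_config: Mapping[str, Any], substitutions: Mapping[str, Any]) -> dict[str, Any]:
    config = copy.deepcopy(dict(base_config))  # leave the original configuration untouched
    for dotted_path, value in substitutions.items():
        *parents, leaf = dotted_path.split(".")
        node = config
        for key in parents:
            # Walk (or create) the nested sections along the path.
            node = node.setdefault(key, {})
        node[leaf] = value
    return config
```

Copying the base configuration first allows the same baseline to be reused for every hyperparameter option.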

lir.config.transform module

class lir.config.transform.GenericTransformerConfigParser(component_class: object)

Bases: ConfigParser

Parser class to help parse the defined component into its corresponding Transformer object.

Since the scikit-learn Pipeline expects a fit() and transform() method on each of the pipeline steps, the configured components should adhere to this contract and implement these methods.

The parse() method of this helper class determines which wrapping strategy is best suited to make the component compatible with the scikit-learn pipeline.

parse(config: ContextAwareDict, output_dir: Path) Transformer

Prepare the defined component to support the expected methods in the scikit-learn Pipeline.

class lir.config.transform.NumpyWrappingConfigParser(module_parser: ConfigParser)

Bases: ConfigParser

Wrap a Transformer to add a header to FeatureData.

parse(config: ContextAwareDict, output_dir: Path) Transformer

Parse the provided header configuration.

reference() str

Return the full name of the module_parser class argument.

lir.config.transform.parse_module(module_config: ContextAwareDict | str | None, output_dir: Path, config_context_path: list[str], default_method: str | None = None) Transformer

Construct a Transformer from a string or configuration section.

If the module_config argument is None, the Identity transformer is returned.

If module_config is a dictionary, it must have the field method, whose value is the name of an object to be looked up in the registry. All other fields are initialization arguments. If no arguments are required, module_config can be just the object name instead.

If the class is:
  • a subclass of ConfigParser, it is instantiated, and the return value of its parse() method is returned;

  • a class which has a transform attribute, or a Transformer subclass, it is instantiated and returned;

  • a class which has a predict_proba attribute, it is instantiated, wrapped by EstimatorTransformer and returned;

  • any other callable, it is wrapped by FunctionTransformer and returned.

If module_config is a str, the call to this function has the same effect as if it was a dictionary whose single field method had this value.

Parameters:
  • module_config – the specification of this module

  • output_dir – where any output is written

  • config_context_path – the context of this configuration

  • default_method – the default value for the method field of the module_config

Returns:

a transformer object
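The branching described above can be sketched with hypothetical stand-ins for ConfigParser, EstimatorTransformer and FunctionTransformer. The registry lookup and the None/Identity case are left out, and how arguments reach a plain callable is not described, so this sketch ignores them for that case.

```python
from __future__ import annotations

from typing import Any, Callable


class SketchConfigParser:
    """Hypothetical stand-in for ConfigParser."""

    def parse(self, config: dict[str, Any], output_dir: str) -> Any:
        raise NotImplementedError


class EstimatorWrapper:
    """Hypothetical stand-in for EstimatorTransformer."""

    def __init__(self, estimator: Any) -> None:
        self.estimator = estimator


class FunctionWrapper:
    """Hypothetical stand-in for FunctionTransformer."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func


def sketch_build(component: Any, args: dict[str, Any], output_dir: str) -> Any:
    is_class = isinstance(component, type)
    if is_class and issubclass(component, SketchConfigParser):
        # Config parsers receive the remaining fields and build the object themselves.
        return component().parse(args, output_dir)
    if is_class and hasattr(component, "transform"):
        return component(**args)
    if is_class and hasattr(component, "predict_proba"):
        return EstimatorWrapper(component(**args))
    if callable(component):
        return FunctionWrapper(component)
    raise TypeError(f"cannot build a transformer from {component!r}")
```

The order of the checks matters: a class offering both transform and predict_proba is used as a transformer directly rather than wrapped.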

lir.config.util module

class lir.config.util.TeeParser

Bases: ConfigParser

Parse a configuration that applies multiple tasks to the same input.

parse(config: ContextAwareDict, output_dir: Path) Any

Read the modules section of the configuration and return the corresponding transformers, wrapped together.

lir.config.util.simplify_data_structure(data: Any) Any

Simplify a data structure by replacing specialized data types with their built-in equivalents.

For example, ContextAwareDict is replaced by dict.
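A recursive conversion of dict and list subclasses to plain dict and list can be sketched as follows. The helper name sketch_simplify is hypothetical, and the assumption that nested values are converted as well is not stated above.

```python
from typing import Any


def sketch_simplify(data: Any) -> Any:
    if isinstance(data, dict):
        # dict subclasses (such as a context-aware dict) become plain dicts
        return {key: sketch_simplify(value) for key, value in data.items()}
    if isinstance(data, list):
        return [sketch_simplify(item) for item in data]
    return data
```

A plain structure like this is convenient when serializing a configuration, e.g. writing it back to YAML.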

Module contents