lir.config package

Submodules

lir.config.base module

class lir.config.base.ConfigParser

Bases: ABC

Abstract base configuration parser class.

Each implementation should implement a custom parse() method which is dedicated to parsing a specific aspect, e.g. the configuration for setting up the numpy CSV writer.

static get_type_name(cls: Any) → str: Return the full type name of the cls argument.

abstractmethod parse(config: ContextAwareDict, output_dir: Path) → Any

Dedicated function to parse a specific section of a YAML configuration.

Arguments:

- config: a section of a YAML configuration
- config_context_path: the path in the YAML configuration to config, the section to be parsed
- output_dir: the directory where the returned object may write results during its lifetime

Returns: an object that is configured according to config.

reference() → str

Return the full class name that was used to initialize this parser.

By default, return the name of this class. In a subclass that was initialized with another class or function that does the actual work, the name of that class is returned.
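The subclassing contract described above can be sketched as follows. This is a minimal illustration that does not import lir: `SketchConfigParser` and `PrefixParser` are hypothetical stand-ins, a plain dict replaces ContextAwareDict, and the body of reference() is an assumption about what "the name of this class" could look like.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class SketchConfigParser(ABC):
    """Hypothetical stand-in for lir.config.base.ConfigParser."""

    @abstractmethod
    def parse(self, config: dict, output_dir: Path) -> Any:
        """Turn one configuration section into a configured object."""

    def reference(self) -> str:
        # Assumed default: report the fully qualified name of this class.
        cls = type(self)
        return f"{cls.__module__}.{cls.__qualname__}"


class PrefixParser(SketchConfigParser):
    """Hypothetical parser dedicated to one aspect: a string prefix."""

    def parse(self, config: dict, output_dir: Path) -> Any:
        prefix = config.pop("prefix")
        return lambda text: prefix + text


add_prefix = PrefixParser().parse({"prefix": "lr_"}, Path("output"))
print(add_prefix("system"))
print(PrefixParser().reference())
```

Because parse() is abstract, instantiating `SketchConfigParser` directly raises a TypeError; only subclasses that implement parse() can be created.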

class lir.config.base.ContextAwareDict(context: list[str], *args: Any, **kwargs: Any)

Bases: dict

Dictionary wrapper which has knowledge about its context.

clone(context: list[str] | None = None) → ContextAwareDict: Allow creating a new instance of the ContextAware dictionary, keeping the context.
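As an illustration of a dictionary that carries its own context, here is a minimal sketch. `SketchContextDict` is a hypothetical stand-in, not the lir class; the attribute name `context` and the shallow-copy behaviour of clone() are assumptions.

```python
from __future__ import annotations

from typing import Any


class SketchContextDict(dict):
    """Hypothetical stand-in: a dict that remembers its path in the YAML file."""

    def __init__(self, context: list[str], *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.context = context

    def clone(self, context: list[str] | None = None) -> SketchContextDict:
        # Copy the items; keep the current context unless a new one is given.
        new_context = self.context if context is None else context
        return SketchContextDict(new_context, self)


section = SketchContextDict(["lr_system", "comparing"], {"method": "svm"})
copy = section.clone()
copy["method"] = "logistic_regression"  # does not touch the original section
```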

class lir.config.base.ContextAwareList(context: list[str], *args: Any, **kwargs: Any)

Bases: list

List wrapper which has knowledge about its context.

clone(context: list[str] | None = None) → ContextAwareList: Allow creating a new instance of the ContextAware list, keeping the context.

class lir.config.base.GenericConfigParser(component_class: type[Any])

Bases: ConfigParser

Return an instantiation of a class, initialized with the specified arguments.

parse(config: ContextAwareDict, output_dir: Path) → Any: Perform parsing of a class, based on configuration.

reference() → str: Return the full name of the component_class class argument.

class lir.config.base.GenericFunctionConfigParser(component_class: Callable)

Bases: ConfigParser

Parser for callable functions or component classes.

parse(config: ContextAwareDict, output_dir: Path) → Callable: Perform the parsing, based on component class.

reference() → str: Return the full name of the component_class function argument.

exception lir.config.base.YamlParseError(config_context_path: list[str], message: str)

Bases: ValueError

Error raised when parsing YAML configuration fails, mentioning specific YAML path.

lir.config.base.check_is_empty(config: ContextAwareDict, accept_keys: Sequence[str] | None = None) → None

Ensure all defined expected arguments are parsed and warn about ignored arguments.

If any unexpected arguments remain, a YamlParseError is raised indicating the argument was unexpected and not taken into account (i.e. not parsed). This methodology ensures the user does not assume arguments are parsed that are in fact not recognized.
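The "consume, then check for leftovers" methodology can be sketched without lir as below. `ensure_consumed` is a hypothetical helper that raises a plain ValueError; the real function raises YamlParseError, and its exact message format is not shown in this reference.

```python
def ensure_consumed(config: dict, context: list, accept_keys=()) -> None:
    """Raise if any key is left in config that was neither parsed nor accepted."""
    leftover = [key for key in config if key not in accept_keys]
    if leftover:
        path = ".".join(context)
        raise ValueError(f"{path}: unexpected argument(s): {', '.join(leftover)}")


config = {"method": "svm", "probabilty": True}  # note the misspelled key
config.pop("method")  # the parser consumes the fields it recognizes

try:
    ensure_consumed(config, ["lr_system", "comparing"])
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

The misspelled key is never popped, so it is reported instead of being silently ignored.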

lir.config.base.check_not_none(v: Any) → Any: Validate the value to not be equal to None.

Check whether a value is an instance of a type.

Returns the value if successful, raises an exception otherwise.

Value types that may be found in YAML configurations:

- dict
- list
- int
- float
- str
- NoneType

Parameters:

type_class – the target type
v – the value to check
message – an optional message that is used in case of an error

Returns:

the value

lir.config.base.config_parser(func: Callable[[ContextAwareDict, Path], Any]) → Callable

Wrap parsing functions in a ConfigParser object by providing a decorator.

The ConfigParser object exposes a parse() method, required by the API.

Using the @config_parser decorator exposes the body of the function through the wrapped parse() method.

This decorator can be used as follows (example):

```
@config_parser
def foo(config, config_context_path, output_dir):
    if "some_argument" not in config or "another_argument" not in config:
        raise YamlParseError(config_context_path, "a required argument is missing")

    return Bar(config["some_argument"], config["another_argument"])
```

Now, the function foo() is wrapped within a ConfigParser object, which exposes the function body of foo() through the parse() method. See documentation of ConfigParser for the meaning of the arguments.
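How a decorator can turn a function into an object with a parse() method is sketched below. This is not the lir implementation: `config_parser_sketch` and `FunctionParser` are hypothetical names, the function takes the two arguments given in the signature above, and returning an instance (rather than a class) is an assumption.

```python
from pathlib import Path
from typing import Any, Callable


def config_parser_sketch(func: Callable[[dict, Path], Any]):
    """Hypothetical decorator: wrap func in an object that exposes parse()."""

    class FunctionParser:
        def parse(self, config: dict, output_dir: Path) -> Any:
            # The body of the decorated function becomes the parse() behaviour.
            return func(config, output_dir)

        def reference(self) -> str:
            return f"{func.__module__}.{func.__qualname__}"

    return FunctionParser()


@config_parser_sketch
def foo(config: dict, output_dir: Path) -> tuple:
    return (config["some_argument"], config["another_argument"])


result = foo.parse({"some_argument": 1, "another_argument": 2}, Path("output"))
print(result)
```

After decoration, the name `foo` refers to the wrapper object, so callers use `foo.parse(...)` rather than `foo(...)`.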

lir.config.base.parse_pairing_config(module_config: ContextAwareDict | str, output_dir: Path, context: list[str]) → PairingMethod

Parse and delegate pairing to the corresponding function for the defined pairing method.

The argument module_config defines the pairing method. If its value is a str, the registry is queried and the corresponding pairing method is returned. If its value is a dict, the pairing method is defined by the value module_config["method"], and the registry is queried for the config parser of the corresponding pairing method. The remaining values in module_config are passed as arguments to the configuration parser of the pairing method.

If the registry cannot resolve the pairing method, an exception is raised.
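The str-or-dict dispatch described above can be sketched as follows. The registry here is a plain dict and `all_pairs` is a made-up method name; the real registry, its lookup API, and the exception type it raises are not documented in this section.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps a method name to a factory for that method.
REGISTRY: Dict[str, Callable[..., Any]] = {
    "all_pairs": lambda **kwargs: ("all_pairs", kwargs),
}


def resolve_method(module_config, registry=REGISTRY):
    """Resolve a method from a bare name or from a dict with a 'method' key."""
    if isinstance(module_config, str):
        name, arguments = module_config, {}
    else:
        arguments = dict(module_config)  # copy, so the caller's dict is untouched
        name = arguments.pop("method")
    if name not in registry:
        raise KeyError(f"unknown method: {name}")
    # Everything except 'method' is passed on as arguments.
    return registry[name](**arguments)


print(resolve_method("all_pairs"))
print(resolve_method({"method": "all_pairs", "seed": 1}))
```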

lir.config.base.pop_field(config: ContextAwareDict | Any, field: str, default: Any = None, required: bool | None = None, validate: Callable[[Any], Any] | None = None) → Any

Validate and retrieve the value for a given field, after which it is removed from the configuration.

Parameters:

config – the configuration
field – the field to obtain from the config
default – the value to return if the field is not found; defaults to None; if the value is not None, the required argument defaults to False
required – if True and the field was not found, raise an error; defaults to True unless default is not None
validate – a callable to validate the value type

Returns:

the field value, or the default value if the field is absent; an error is raised if a required field is missing
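The interplay between default and required can be sketched as below. `pop_field_sketch` is a hypothetical re-implementation on a plain dict; it raises ValueError where lir would raise its own error, and it assumes that the return value of validate is what gets returned.

```python
from __future__ import annotations

from typing import Any, Callable


def pop_field_sketch(
    config: dict,
    field: str,
    default: Any = None,
    required: bool | None = None,
    validate: Callable[[Any], Any] | None = None,
) -> Any:
    # required defaults to True, unless a non-None default was supplied.
    if required is None:
        required = default is None
    if field not in config:
        if required:
            raise ValueError(f"missing required field: {field}")
        return default
    value = config.pop(field)  # remove the field so it counts as parsed
    return validate(value) if validate is not None else value


config = {"C": "1"}
c = pop_field_sketch(config, "C", validate=float)
kernel = pop_field_sketch(config, "kernel", default="rbf")
print(c, kernel, config)
```

Since every retrieved field is removed, an empty config at the end of parsing indicates that all arguments were recognized.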

lir.config.data_providers module

lir.config.data_providers.parse_data_provider(cfg: ContextAwareDict, output_path: Path) → DataProvider

Instantiate specific implementation of DataProvider as configured.

The type field is parsed, which is expected to refer to a name in the registry. See for example lir.config.data_sources.synthesized_normal_binary or lir.config.data_sources.synthesized_normal_multiclass.

Data sources are provided under the data_sources key.

lir.config.data_strategies module

lir.config.data_strategies.parse_data_strategy(cfg: ContextAwareDict, output_path: Path) → DataStrategy

Instantiate specific implementation of DataStrategy as configured.

The setup field is parsed, which is expected to refer to a name in the registry. See for example lir.data_setup.binary_cross_validation or lir.data_setup.binary_train_test_split.

Data setup configuration is provided under the data_setup key.

lir.config.experiment_strategies module

class lir.config.experiment_strategies.ExperimentStrategyConfigParser

Bases: ConfigParser, ABC

Base class for an experiment strategy configuration parser.

data() → tuple[DataProvider, DataStrategy]

Parse the data section of the configuration.

The corresponding data provider and splitting strategy instances are provided.

abstractmethod get_experiment(name: str) → Experiment: Get the experiment by name for the defined LR system.

lrsystem() → tuple[ContextAwareDict, list[Hyperparameter]]

Parse the LR System section including hyperparameters.

The baseline configuration is provided along with the specified parameters to vary (the defined hyperparameters).

output_list() → Sequence[Aggregation]

Initialize corresponding aggregation classes based on the output section.

The initialized aggregation classes are returned as a sequence, to be iterated over in a later stage.

parse(config: ContextAwareDict, output_dir: Path) → Experiment: Parse the experiment section of the configuration.

primary_metric() → Callable: Parse the primary_metric field.

class lir.config.experiment_strategies.GridStrategy

Bases: ExperimentStrategyConfigParser

Prepare Experiment consisting of multiple runs using configuration values.

get_experiment(name: str) → Experiment: Get experiment for the grid strategy run, based on its name.

class lir.config.experiment_strategies.OptunaStrategy

Bases: ExperimentStrategyConfigParser

Prepare Experiment for optimizing configuration parameters.

get_experiment(name: str) → Experiment: Get experiment for the optuna run, based on its name.

class lir.config.experiment_strategies.SingleRunStrategy

Bases: ExperimentStrategyConfigParser

Prepare Experiment consisting of a single run using configuration values.

get_experiment(name: str) → Experiment: Get an experiment for a single run, based on its name.

lir.config.experiment_strategies.parse_experiment_strategy(config: ContextAwareDict, output_path: Path) → Experiment

Instantiate the corresponding experiment strategy class, e.g. for a single or grid run.

A corresponding Experiment class is returned.

lir.config.experiment_strategies.parse_experiments(cfg: ContextAwareDict, output_path: Path) → Mapping[str, Experiment]

Extract which Experiment to run as dictated in the configuration.

Parameters:

cfg – a dict object describing the experiments
output_path – the filesystem path to the results directory

Returns:

a mapping of names to experiments

lir.config.lrsystem_architectures module

class lir.config.lrsystem_architectures.ParsedLRSystem(lrsystem: LRSystem, config: ContextAwareDict, output_dir: Path)

Bases: LRSystem

Represent a given initialized LR system based on the provided configuration.

apply(instances: InstanceData) → LLRData: Use the fitted LR system to calculate LLR data for the input instance data.

fit(instances: InstanceData) → Self: Fit the LR system on the instance data.

lir.config.lrsystem_architectures.parse_augmented_lrsystem(baseline_lrsystem_config: ContextAwareDict, hyperparameters: dict[str, HyperparameterOption], output_dir: Path, dirname_prefix: str = '') → ParsedLRSystem

Parse an augmented LR system.

The LR system is parsed from a base configuration and a set of parameter substitutions that override parts of the base configuration. Results are written to a subdirectory of output_dir that is named by its parameter substitutions and prefixed by dirname_prefix.

Parameters:

baseline_lrsystem_config – the base LR system configuration
hyperparameters – hyperparameter substitutions that override parts of the base configuration
output_dir – the directory in which to create a results directory
dirname_prefix – the prefix of the created directory name

Returns:

the LR system

lir.config.lrsystem_architectures.parse_default_pipeline(config: ContextAwareDict) → str

Parse the intermediate result field from configuration, with the goal of determining the default pipeline method.

Parameters:

config – the configuration dictionary

Returns:

the default method to use ('logging_pipeline' or 'pipeline')

lir.config.lrsystem_architectures.parse_lrsystem(config: ContextAwareDict, output_dir: Path) → ParsedLRSystem

Determine and initialise corresponding LR system from configuration values.

LR systems are provided under the architectures key.

lir.config.metrics module

lir.config.metrics.parse_individual_metric(name: str, output_path: Path, context: list[str]) → Callable: Leverage config parser to interpret metric section of the configuration.

lir.config.substitution module

Substitution module.

This module offers support functions for replacing or modifying components of an LR Benchmark pipeline at runtime. This makes it possible, for example, to compare a logistic regression approach with a support vector approach, or to optimize a given (hyper)parameter of the system.

For example, the parameters section of the model_selection_run benchmark below defines comparing.clf as the path to modify, with the options listed in the values section. Each option replaces (updates) the defined comparing module in the LR system configuration used in this pipeline.

```
benchmarks:
  model_selection_run:
    lr_system: …
    …
    parameters:
      - path: comparing.clf
        values:
          - name: logit
            method: logistic_regression
            C: 1
          - name: svm
            method: svm
            probability: True
```

class lir.config.substitution.CategoricalHyperparameter(name: str, options: list[HyperparameterOption])

Bases: Hyperparameter

A categorical hyperparameter.

A categorical hyperparameter has the following fields in a YAML configuration:

- path: the path of this hyperparameter in the LR system configuration
- options: a list of options

options() → list[HyperparameterOption]: Provide API access to the options for the hyperparameter.

class lir.config.substitution.FloatHyperparameter(path: str, low: float, high: float, step: float | None, log: bool)

Bases: Hyperparameter

A floating point hyperparameter.

A floating point hyperparameter has the following fields in a YAML configuration:

- path: the path of this hyperparameter in the LR system configuration
- low: the lowest possible value
- high: the highest possible value
- step (optional): the step size
- log (optional): if True, search in log space instead of linear space; cannot be combined with step (defaults to False)

options() → list[HyperparameterOption]: Provide API access to the options for the hyperparameter.

class lir.config.substitution.FolderHyperparameter(path: str, folder: str, ignore_files: list[str] | None = None)

Bases: Hyperparameter

A folder hyperparameter that takes all files in a given folder as options.

A folder hyperparameter has the following fields in a YAML configuration:

- folder: the path of the folder containing the options
- ignore_files: a list of file patterns to ignore

The generated options will have the full path of each file as both name and value.

An example configuration is as follows:

```
hyperparameters:
  - path: data.provider.path
    type: folder
    folder: project_files/my_dataset/
    ignore_files:  # Optional list of file patterns to ignore.
      - '*.tmp'
      - 'ignore_this_file.csv'
```

A ValueError can be raised in the following situations:

- the given folder does not exist (applies during initialization)
- no valid files are found in the folder, after applying the ignore list (applies when calling the options() method)

options() → list[HyperparameterOption]: Generates the options by walking over the folder.

class lir.config.substitution.Hyperparameter(name: str)

Bases: ABC

Base class for all hyperparameters.

abstractmethod options() → list[HyperparameterOption]

Get a list of values that a hyperparameter can take in the context of a particular experiment.

Returns:

a list of HyperparameterOption

class lir.config.substitution.HyperparameterOption(name: str, substitutions: Mapping[str, Any])

Bases: NamedTuple

An option for a value of a hyperparameter.

A HyperparameterOption is a named tuple with two fields:

- name: a descriptive name of this option
- substitutions: a mapping of configuration paths to values

name: str: Alias for field number 0

substitutions: Mapping[str, Any]: Alias for field number 1

lir.config.substitution.parse_folder(spec: ContextAwareDict, output_path: Path) → FolderHyperparameter: Parse the parameters section of the configuration into a FolderHyperparameter object.

lir.config.substitution.parse_hyperparameter(spec: ContextAwareDict, output_dir: Path) → Hyperparameter: Parse the parameters section of the configuration into a dedicated value wrapper object.

lir.config.substitution.substitute_hyperparameters(base_config: ContextAwareDict, hyperparameters: Mapping[str, Any], context: list[str]) → ContextAwareDict

Substitute hyperparameters in an LR system configuration and return the updated configuration.

Parameters:

base_config – the original LR system configuration
hyperparameters – the hyperparameters and their values
context – the context path of the augmented configuration

Returns:

the augmented LR system configuration
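One way such a substitution could work is sketched below on plain dicts. It assumes that a path such as comparing.clf addresses nested keys separated by dots and that the value at that path is replaced wholesale; `substitute` is a hypothetical function and the actual lir semantics may differ.

```python
import copy
from typing import Any, Mapping


def substitute(base_config: dict, substitutions: Mapping[str, Any]) -> dict:
    """Return a copy of base_config with each dotted path set to its new value."""
    config = copy.deepcopy(base_config)  # never modify the baseline in place
    for path, value in substitutions.items():
        *parents, leaf = path.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # create missing intermediate levels
        node[leaf] = value
    return config


base = {"comparing": {"clf": {"method": "logistic_regression", "C": 1}}}
updated = substitute(base, {"comparing.clf": {"method": "svm", "probability": True}})
print(updated)
```

Copying the baseline first keeps it reusable, which matters when the same base configuration is combined with many hyperparameter options in turn.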

lir.config.transform module

class lir.config.transform.GenericTransformerConfigParser(component_class: object)

Bases: ConfigParser

Parser class to help parse the defined component into its corresponding Transformer object.

Since the scikit-learn Pipeline expects a fit() and transform() method on each of the pipeline steps, the configured components should adhere to this contract and implement these methods.

The parse() function offered in this helper class, implements a branching strategy to determine which strategy is best suited to make the component compatible with the scikit-learn pipeline.

parse(config: ContextAwareDict, output_dir: Path) → Transformer: Prepare the defined component to support the expected methods in the scikit-learn Pipeline.

class lir.config.transform.NumpyWrappingConfigParser(module_parser: ConfigParser)

Bases: ConfigParser

Wrap a Transformer to add a header to FeatureData.

parse(config: ContextAwareDict, output_dir: Path) → Transformer: Parse the provided header configuration.

reference() → str: Return the full name of the module_parser class argument.

lir.config.transform.parse_module(module_config: ContextAwareDict | str | None, output_dir: Path, config_context_path: list[str], default_method: str | None = None) → Transformer

Construct a Transformer from a string or configuration section.

If the module_config argument is None, the Identity transformer is returned.

If module_config is a dictionary, it must have the field method, which is the name of an object that is looked up in the registry. All other fields are initialization arguments. If no arguments are required, the input can be just the object name instead.

If the class is:

- a subclass of ConfigParser, then the class is instantiated, and the return value of its parse() method is returned;
- a class which has a transform attribute, or a Transformer subclass, it is instantiated and returned;
- a class which has a predict_proba attribute, it is instantiated, wrapped by EstimatorTransformer and returned;
- any other callable, it is wrapped by FunctionTransformer, and returned.

If module_config is a str, the call to this function has the same effect as if it was a dictionary whose single field method had this value.
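The order of the branching rules can be sketched as a function that only names the chosen strategy. Nothing here imports lir or scikit-learn: `ParserBase`, `MyParser`, `Scaler` and `Classifier` are hypothetical stand-ins, and the sketch returns a label instead of building a transformer.

```python
import math


class ParserBase:
    """Hypothetical stand-in for ConfigParser."""


class MyParser(ParserBase):
    pass


class Scaler:
    def transform(self, instances):
        return instances


class Classifier:
    def predict_proba(self, instances):
        return instances


def choose_strategy(component) -> str:
    """Name the rule that applies to component, checking the rules in order."""
    if isinstance(component, type) and issubclass(component, ParserBase):
        return "instantiate and call parse()"
    if hasattr(component, "transform"):
        return "instantiate and return"
    if hasattr(component, "predict_proba"):
        return "instantiate and wrap in EstimatorTransformer"
    if callable(component):
        return "wrap in FunctionTransformer"
    raise TypeError(f"unsupported component: {component!r}")


strategies = [choose_strategy(c) for c in (MyParser, Scaler, Classifier, math.log)]
print(strategies)
```

The order matters: a class that offers both transform and predict_proba is treated as a transformer because that rule is checked first.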

Parameters:

module_config – the specification of this module
output_dir – where any output is written
config_context_path – the context of this configuration
default_method – the default value for the method field of the module_config

Returns:

a transformer object

lir.config.util module

class lir.config.util.TeeParser

Bases: ConfigParser

Parse configuration for allowing multiple tasks for given input.

parse(config: ContextAwareDict, output_dir: Path) → Any: Read configuration for modules section and provide wrapped corresponding transformers.

lir.config.util.simplify_data_structure(data: Any) → Any

Simplify data structure: specialized data types are replaced.

For example, ContextAwareDict is replaced by dict.
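A recursive replacement of specialized containers by built-in ones can be sketched as follows. `TaggedDict` and `TaggedList` are hypothetical stand-ins for ContextAwareDict and ContextAwareList, and which other types the real function handles is not stated in this reference.

```python
from typing import Any


class TaggedDict(dict):
    """Hypothetical specialized dict, standing in for ContextAwareDict."""


class TaggedList(list):
    """Hypothetical specialized list, standing in for ContextAwareList."""


def simplify(data: Any) -> Any:
    """Rebuild nested dicts and lists using only the built-in types."""
    if isinstance(data, dict):
        return {key: simplify(value) for key, value in data.items()}
    if isinstance(data, list):
        return [simplify(item) for item in data]
    return data  # scalars such as int, float, str and None pass through


nested = TaggedDict(modules=TaggedList([TaggedDict(method="svm")]))
plain = simplify(nested)
print(type(plain).__name__, plain)
```

A simplified structure like this is easier to serialize, for example when writing the effective configuration back to YAML or JSON.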
