lir.config.data module

class lir.config.data.DataSetup(provider: DataProvider, strategy: DataStrategy, data_filter: Transformer | None)

Bases: object

Data setup, consisting of three components: a data provider, a filter, and a strategy.

The filter is a Transformer whose apply() method can be called without first calling fit(). Unlike in LR system pipelines, this transformer may change the number of instances in the dataset.
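To illustrate a filter that changes the number of instances, here is a minimal sketch of a stateless filter. Instances and DropIncompleteFilter are hypothetical stand-ins, not part of lir; the real InstanceData and Transformer interfaces may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Instances:
    """Hypothetical stand-in for InstanceData: feature rows plus labels."""
    features: list[list[float]]
    labels: list[int]


class DropIncompleteFilter:
    """Hypothetical filter that drops every instance with a NaN feature.

    It learns nothing from the data, so apply() works without a prior fit().
    """

    def fit(self, instances: Instances) -> "DropIncompleteFilter":
        # No state to learn; kept only to mirror a fit/apply interface.
        return self

    def apply(self, instances: Instances) -> Instances:
        keep = [
            i for i, row in enumerate(instances.features)
            if not any(math.isnan(v) for v in row)
        ]
        # The output may contain fewer instances than the input.
        return Instances(
            features=[instances.features[i] for i in keep],
            labels=[instances.labels[i] for i in keep],
        )


data = Instances(
    features=[[0.1, 0.2], [float("nan"), 1.0], [0.5, 0.4]],
    labels=[0, 1, 1],
)
filtered = DropIncompleteFilter().apply(data)
print(len(filtered.labels))  # 2
```

Because the filter is stateless, fit() is a no-op, which is what allows apply() to be called directly during data setup.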

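Parameters:
  • provider (DataProvider) – Data provider that supplies the instances.

  • strategy (DataStrategy) – Data strategy that arranges the instances into one or more train/test splits.

  • data_filter (Transformer | None) – Optional filter applied to the instances before splitting; it may change the number of instances.
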
get_splits() → Iterable[tuple[InstanceData, InstanceData]]

Return the data in the form of one or more train/test splits.

This method performs three steps:
  • retrieve instances from the data provider;

  • pass them through the filter by calling its apply() method;

  • apply the data strategy to arrange them into one or more train/test splits.

Returns:

An iterable of (train, test) tuples, one per split.

Return type:

Iterable[tuple[InstanceData, InstanceData]]
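The three steps of get_splits() can be sketched with plain callables standing in for DataProvider, Transformer and DataStrategy. SketchDataSetup, half_split and the callable-based interfaces are hypothetical; the real classes expose their own methods, and skipping the filter when data_filter is None is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional

Instances = list[int]  # hypothetical stand-in for InstanceData


@dataclass
class SketchDataSetup:
    """Hypothetical re-creation of the flow of DataSetup.get_splits()."""
    provider: Callable[[], Instances]  # stands in for DataProvider
    strategy: Callable[[Instances], Iterable[tuple[Instances, Instances]]]  # stands in for DataStrategy
    data_filter: Optional[Callable[[Instances], Instances]] = None  # stands in for Transformer.apply

    def get_splits(self) -> Iterator[tuple[Instances, Instances]]:
        instances = self.provider()  # 1. retrieve instances
        if self.data_filter is not None:  # 2. filter (assumed skipped when None)
            instances = self.data_filter(instances)
        yield from self.strategy(instances)  # 3. arrange into train/test splits


def half_split(xs: Instances) -> Iterator[tuple[Instances, Instances]]:
    # Toy strategy: one split, first half for training, second half for testing.
    mid = len(xs) // 2
    yield xs[:mid], xs[mid:]


setup = SketchDataSetup(
    provider=lambda: list(range(10)),
    strategy=half_split,
    data_filter=lambda xs: [x for x in xs if x % 2 == 0],
)
for train, test in setup.get_splits():
    print(train, test)  # [0, 2] [4, 6, 8]
```

Here the filter keeps five of the ten instances before the strategy splits them, which mirrors the order of operations described above.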

lir.config.data.parse_data_provider(cfg: ContextAwareDict, output_path: Path) → DataProvider

Instantiate a specific implementation of DataProvider as configured.

The method field is parsed; it is expected to refer to a name in the registry. See, for example, lir.config.data_sources.synthesized_normal_binary or lir.config.data_sources.synthesized_normal_multiclass.

Data sources are provided under the data_sources key.

Parameters:
  • cfg (ContextAwareDict) – Data provider configuration.

  • output_path (Path) – Output path for created objects.

Returns:

Parsed data provider instance.

Return type:

DataProvider
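A minimal sketch of name-based lookup, assuming a plain dict registry keyed by short names and a plain dict in place of ContextAwareDict. PROVIDER_REGISTRY, register, make_synthesized_normal_binary and parse_provider_sketch are hypothetical, and passing the remaining fields to the factory as keyword arguments is an assumption about how parameters are forwarded.

```python
from pathlib import Path
from typing import Any, Callable

# Hypothetical registry: configured name -> factory. Not lir's actual registry.
PROVIDER_REGISTRY: dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a factory under `name` in the registry."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        PROVIDER_REGISTRY[name] = factory
        return factory
    return decorator


@register("synthesized_normal_binary")
def make_synthesized_normal_binary(output_path: Path, **params: Any) -> dict[str, Any]:
    # Placeholder: a real factory would return a DataProvider instance.
    return {"name": "synthesized_normal_binary", "output_path": output_path, "params": params}


def parse_provider_sketch(cfg: dict[str, Any], output_path: Path) -> Any:
    """Look up the `method` field in the registry and call the matching factory."""
    params = dict(cfg)  # copy so the caller's config is not mutated
    name = params.pop("method")  # the `method` field selects the implementation
    if name not in PROVIDER_REGISTRY:
        raise ValueError(f"unknown data provider: {name!r}")
    # Assumption: the remaining fields are passed to the factory as keyword arguments.
    return PROVIDER_REGISTRY[name](output_path, **params)


provider = parse_provider_sketch({"method": "synthesized_normal_binary", "seed": 0}, Path("out"))
print(provider["params"])  # {'seed': 0}
```

A registry keeps configuration files decoupled from import paths: adding a new provider only requires registering another factory name.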

lir.config.data.parse_data_setup(cfg: ContextAwareDict, output_path: Path) → DataSetup

Parse the data provider, filter and data strategy from the configuration.

The fields provider, filter and splits are parsed; they are expected to refer to specific implementations of DataProvider, Transformer and DataStrategy, respectively. See parse_data_provider, parse_module and parse_data_strategy for more information.

Parameters:
  • cfg (ContextAwareDict) – Configuration section containing the provider, filter and split strategy.

  • output_path (Path) – Output path for created objects.

Returns:

A DataSetup combining the parsed data provider, filter and strategy.

Return type:

DataSetup
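The composition performed by parse_data_setup can be sketched as follows, with the three section parsers passed in as arguments so the example stays self-contained. SetupSketch, parse_setup_sketch and echo are hypothetical, the configuration values are illustrative only, and in lir the filter section is handled by parse_module. Treating a missing filter section as None is an assumption based on the Transformer | None type of data_filter.

```python
from pathlib import Path
from typing import Any, Callable, NamedTuple, Optional

SectionParser = Callable[[dict[str, Any], Path], Any]


class SetupSketch(NamedTuple):
    """Hypothetical stand-in for DataSetup."""
    provider: Any
    data_filter: Optional[Any]
    strategy: Any


def parse_setup_sketch(
    cfg: dict[str, Any],
    output_path: Path,
    parse_provider: SectionParser,
    parse_filter: SectionParser,
    parse_strategy: SectionParser,
) -> SetupSketch:
    """Parse the provider, filter and splits sections and bundle the results."""
    provider = parse_provider(cfg["provider"], output_path)
    # Assumption: an absent `filter` section means no filter is applied.
    filter_cfg = cfg.get("filter")
    data_filter = parse_filter(filter_cfg, output_path) if filter_cfg is not None else None
    strategy = parse_strategy(cfg["splits"], output_path)
    return SetupSketch(provider, data_filter, strategy)


def echo(section: dict[str, Any], output_path: Path) -> dict[str, Any]:
    # Trivial parser for the demo: returns its configuration section unchanged.
    return section


cfg = {
    "provider": {"method": "synthesized_normal_binary"},
    "splits": {"strategy": "binary_train_test_split"},
}
setup = parse_setup_sketch(cfg, Path("out"), echo, echo, echo)
print(setup.data_filter)  # None
```

Injecting the section parsers keeps the sketch testable; a real implementation would call its own parsing functions directly.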

lir.config.data.parse_data_strategy(cfg: ContextAwareDict, output_path: Path) → DataStrategy

Instantiate a specific implementation of DataStrategy as configured.

The strategy field is parsed; it is expected to refer to a name in the registry. See, for example, lir.data_setup.binary_cross_validation or lir.data_setup.binary_train_test_split.

Data setup configuration is provided under the data_setup key.

Parameters:
  • cfg (ContextAwareDict) – Data strategy configuration.

  • output_path (Path) – Output path for created objects.

Returns:

Parsed data strategy instance.

Return type:

DataStrategy
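To show what a data strategy produces, here is a minimal sketch of two strategies: a k-fold cross-validation that yields several (train, test) pairs and a single train/test split that yields one. k_fold_splits and single_split are hypothetical; they operate on a plain sequence and ignore class labels, so they do not reproduce the behavior of lir.data_setup.binary_cross_validation or lir.data_setup.binary_train_test_split.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def k_fold_splits(instances: Sequence[T], n_folds: int) -> Iterator[tuple[list[T], list[T]]]:
    """Hypothetical cross-validation strategy: yield n_folds (train, test) pairs.

    Instance i goes to test fold i % n_folds; no shuffling and no class stratification.
    """
    if not 2 <= n_folds <= len(instances):
        raise ValueError("n_folds must be between 2 and the number of instances")
    for fold in range(n_folds):
        train = [x for i, x in enumerate(instances) if i % n_folds != fold]
        test = [x for i, x in enumerate(instances) if i % n_folds == fold]
        yield train, test


def single_split(instances: Sequence[T], test_fraction: float = 0.25) -> Iterator[tuple[list[T], list[T]]]:
    """Hypothetical train/test strategy: yield exactly one (train, test) pair."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    items = list(instances)
    n_test = max(1, round(len(items) * test_fraction))
    yield items[:-n_test], items[-n_test:]


data = list(range(6))
print(list(k_fold_splits(data, 3)))
# [([1, 2, 4, 5], [0, 3]), ([0, 2, 3, 5], [1, 4]), ([0, 1, 3, 4], [2, 5])]
print(list(single_split(data, 0.5)))
# [([0, 1, 2], [3, 4, 5])]
```

Both functions return iterators of (train, test) pairs, matching the shape that get_splits() hands back to the caller; in cross-validation every instance appears in exactly one test fold.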