lir.config.data module
- class lir.config.data.DataSetup(provider: DataProvider, strategy: DataStrategy, data_filter: Transformer | None)[source]
Bases: object
Data setup, consisting of three components: a data provider, a filter, and a strategy.
The filter is a Transformer that supports calling the apply() method without first calling fit(). Unlike in LR system pipelines, this transformer may change the number of instances in the dataset.
- Parameters:
provider (DataProvider) – The DataProvider that retrieves the data from some data source, such as a CSV file or a database.
strategy (DataStrategy) – The DataStrategy that determines how the data are used.
data_filter (Transformer | None) – An optional filter (Transformer) to apply to the raw data before doing anything else.
- get_splits() → Iterable[tuple[InstanceData, InstanceData]][source]
Return the data in the form of one or more train/test splits.
This method follows three steps:
- retrieve instances from the data provider;
- pass them through the filter by calling its apply() method;
- apply the data strategy to arrange them into one or more train/test splits.
- Returns:
An iterable of (train, test) tuples, one tuple per split.
- Return type:
Iterable[tuple[InstanceData, InstanceData]]
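The three steps of get_splits() can be sketched with plain Python stand-ins. This is a minimal illustration, not the lir implementation: SketchDataSetup, drop_negatives and halves are hypothetical names, and a list of floats stands in for InstanceData.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# Assumption for illustration: a list of floats stands in for InstanceData.
Instances = List[float]
Split = Tuple[Instances, Instances]


@dataclass
class SketchDataSetup:
    """Hypothetical stand-in mirroring the three components of DataSetup."""

    provider: Callable[[], Instances]                  # retrieves the raw instances
    strategy: Callable[[Instances], Iterable[Split]]   # arranges instances into splits
    data_filter: Optional[Callable[[Instances], Instances]] = None

    def get_splits(self) -> Iterator[Split]:
        # Step 1: retrieve instances from the provider.
        instances = self.provider()
        # Step 2: apply the optional filter; it may drop instances.
        if self.data_filter is not None:
            instances = self.data_filter(instances)
        # Step 3: let the strategy produce one or more train/test splits.
        yield from self.strategy(instances)


def drop_negatives(instances: Instances) -> Instances:
    """Example filter that changes the number of instances."""
    return [x for x in instances if x >= 0]


def halves(instances: Instances) -> Iterator[Split]:
    """Example strategy yielding a single train/test split."""
    mid = len(instances) // 2
    yield instances[:mid], instances[mid:]


setup = SketchDataSetup(
    provider=lambda: [1.0, -2.0, 3.0, 4.0, -5.0, 6.0],
    strategy=halves,
    data_filter=drop_negatives,
)
splits = list(setup.get_splits())
```

The filter runs before the strategy, so the splits are built only from the instances that survive filtering.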
- lir.config.data.parse_data_provider(cfg: ContextAwareDict, output_path: Path) → DataProvider[source]
Instantiate the configured implementation of DataProvider.
The method field is parsed, which is expected to refer to a name in the registry. See, for example, lir.config.data_sources.synthesized_normal_binary or lir.config.data_sources.synthesized_normal_multiclass.
Data sources are provided under the data_sources key.
- Parameters:
cfg (ContextAwareDict) – Data provider configuration.
output_path (Path) – Output path for created objects.
- Returns:
Parsed data provider instance.
- Return type:
DataProvider
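Resolving a method field against a registry can be sketched as follows. This is a generic illustration under stated assumptions: REGISTRY, register, constant_source and parse_by_method are hypothetical names, a plain dict stands in for ContextAwareDict, and passing the remaining fields to the factory as keyword arguments is an assumption rather than documented lir behaviour.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a configured name to a factory.
# This is not the lir registry; it only illustrates the lookup pattern.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a factory under the given name."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = factory
        return factory
    return decorator


@register("constant_source")
def constant_source(value: float = 0.0, size: int = 3) -> list:
    """Hypothetical provider: returns `size` copies of `value`."""
    return [value] * size


def parse_by_method(cfg: Dict[str, Any]) -> Any:
    """Look up cfg['method'] in the registry and call the factory.

    Assumption: the remaining fields are passed on as keyword arguments.
    """
    cfg = dict(cfg)  # copy, so the caller's configuration is left untouched
    try:
        name = cfg.pop("method")
    except KeyError:
        raise ValueError("configuration is missing the 'method' field") from None
    if name not in REGISTRY:
        raise ValueError(f"unknown method: {name!r}")
    return REGISTRY[name](**cfg)


provider_data = parse_by_method({"method": "constant_source", "value": 1.5, "size": 2})
```

Failing early on a missing or unknown method name keeps configuration mistakes close to where they are made.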
- lir.config.data.parse_data_setup(cfg: ContextAwareDict, output_path: Path) → DataSetup[source]
Parse the data provider, filter and data strategy from configuration.
The fields provider, filter and splits are parsed, which are expected to refer to specific implementations of DataProvider, Transformer and DataStrategy, respectively. See parse_data_provider, parse_module and parse_data_strategy for more information.
- Parameters:
cfg (ContextAwareDict) – Configuration section containing provider and split strategy.
output_path (Path) – Output path for created objects.
- Returns:
Parsed data provider, filter and strategy.
- Return type:
DataSetup
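How the provider, filter and splits fields map onto three sub-parsers can be sketched like this. It is an illustration only: ParsedSetup and parse_setup_sketch are hypothetical, a plain dict stands in for ContextAwareDict, and treating filter as optional is an assumption based on the data_filter parameter accepting None. The method and strategy values are placeholders borrowed from names mentioned on this page; the exact spelling the registry expects is not shown here.

```python
from typing import Any, Callable, Dict, NamedTuple, Optional


class ParsedSetup(NamedTuple):
    """Hypothetical container for the three parsed components."""

    provider: Any
    data_filter: Optional[Any]
    strategy: Any


def parse_setup_sketch(
    cfg: Dict[str, Any],
    parse_provider: Callable[[Dict[str, Any]], Any],
    parse_filter: Callable[[Any], Any],
    parse_strategy: Callable[[Dict[str, Any]], Any],
) -> ParsedSetup:
    """Dispatch the three configuration fields to their sub-parsers."""
    missing = [key for key in ("provider", "splits") if key not in cfg]
    if missing:
        raise ValueError(f"missing configuration fields: {missing}")
    # Assumption: an absent 'filter' field means no filter is applied.
    filter_cfg = cfg.get("filter")
    return ParsedSetup(
        provider=parse_provider(cfg["provider"]),
        data_filter=parse_filter(filter_cfg) if filter_cfg is not None else None,
        strategy=parse_strategy(cfg["splits"]),
    )


# Placeholder configuration; the values are illustrative only.
example_cfg = {
    "provider": {"method": "synthesized_normal_binary"},
    "splits": {"strategy": "binary_train_test_split"},
}

parsed = parse_setup_sketch(
    example_cfg,
    parse_provider=lambda c: ("provider", c["method"]),
    parse_filter=lambda c: ("filter", c),
    parse_strategy=lambda c: ("strategy", c["strategy"]),
)
```

Passing the sub-parsers in as arguments keeps the sketch self-contained; in the module itself these roles are filled by parse_data_provider, parse_module and parse_data_strategy.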
- lir.config.data.parse_data_strategy(cfg: ContextAwareDict, output_path: Path) → DataStrategy[source]
Instantiate the configured implementation of DataStrategy.
The strategy field is parsed, which is expected to refer to a name in the registry. See, for example, lir.data_setup.binary_cross_validation or lir.data_setup.binary_train_test_split.
Data setup configuration is provided under the data_setup key.
- Parameters:
cfg (ContextAwareDict) – Data strategy configuration.
output_path (Path) – Output path for created objects.
- Returns:
Parsed data strategy instance.
- Return type:
DataStrategy
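The difference between a single train/test split and cross-validation, as seen by a consumer of the splits, can be sketched as follows. The two functions are hypothetical, unshuffled simplifications that operate on a list of integers; they are not the lir strategies and make no claim about how those handle labels, shuffling or stratification.

```python
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def train_test_split_sketch(items: Sequence[int], test_size: float = 0.25) -> Iterator[Split]:
    """Yield exactly one (train, test) pair; the last items form the test set."""
    n_test = max(1, round(len(items) * test_size))
    yield list(items[:-n_test]), list(items[-n_test:])


def cross_validation_sketch(items: Sequence[int], folds: int = 3) -> Iterator[Split]:
    """Yield one (train, test) pair per fold; each item is tested exactly once."""
    for k in range(folds):
        test = list(items[k::folds])
        train = [x for i, x in enumerate(items) if i % folds != k]
        yield train, test


data = list(range(6))
single = list(train_test_split_sketch(data, test_size=0.5))
folded = list(cross_validation_sketch(data, folds=3))
```

Both functions yield the same (train, test) pair shape, so code that iterates over the splits does not need to know which kind of strategy produced them.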