lir.data_strategies.predefined module

class lir.data_strategies.predefined.PredefinedCrossValidation[source]

Bases: DataStrategy

Split data into cross validation folds based on predefined assignments.

This strategy expects a fold_assignments field in the data. For example, parse_features_from_csv_file creates this field when fold_assignment_column is specified.

Each instance should be labelled with the test set (fold) to which it belongs. Take care to use the intended number of folds (= number of unique labels) and to decide whether the folds are based on sources or on instances.
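The difference between source-based and instance-based folds can be illustrated with a minimal sketch. It is not part of lir: the helper folds_by_source, the source_ids list and the round-robin assignment are assumptions made purely for illustration. It shows one way to produce fold labels such that all instances of the same source end up in the same fold.

```python
def folds_by_source(source_ids, n_folds):
    """Hypothetical helper: give every instance the fold label of its source."""
    # Unique sources in first-seen order, so the result is deterministic.
    unique_sources = list(dict.fromkeys(source_ids))
    # Assign sources to folds round-robin (an arbitrary choice for this sketch).
    fold_of_source = {s: i % n_folds for i, s in enumerate(unique_sources)}
    return [fold_of_source[s] for s in source_ids]


# Six instances originating from four sources (made-up identifiers).
source_ids = ["s1", "s1", "s2", "s3", "s3", "s4"]
fold_labels = folds_by_source(source_ids, n_folds=2)

# The number of folds equals the number of unique labels.
n_unique_folds = len(set(fold_labels))
```

Labels produced this way could be stored in the column that is later read as fold assignments. With instance-based folds, by contrast, instances of one source may be spread over several folds.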

In the experiment setup file, this split strategy can be referenced as follows:

cross_validation_splits:
    strategy: predefined_cross_validation

apply(instances: DataType) → Iterator[tuple[DataType, DataType]][source]

Perform cross-validation based on predefined fold assignments.

This strategy expects a fold_assignments field in the data, where each instance is labelled with a fold identifier. The strategy will return one train/test split for each unique fold identifier, using the instances with that identifier as the test set and the others as the training set.

Parameters: instances (InstanceDataType) – Input instances to be processed by this method.
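The splitting rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the lir implementation: the function predefined_folds is hypothetical, it works on a plain sequence of fold labels rather than on the library's instance data type, and it yields index lists instead of data objects.

```python
from collections.abc import Hashable, Iterator, Sequence


def predefined_folds(
    fold_assignments: Sequence[Hashable],
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) once per unique fold identifier."""
    # Iterate over unique fold identifiers in first-seen order.
    for fold in dict.fromkeys(fold_assignments):
        # Instances carrying this identifier form the test set ...
        test = [i for i, f in enumerate(fold_assignments) if f == fold]
        # ... and all remaining instances form the training set.
        train = [i for i, f in enumerate(fold_assignments) if f != fold]
        yield train, test


# Five instances assigned to three folds (made-up labels).
splits = list(predefined_folds(["a", "b", "a", "c", "b"]))
```

Each instance appears in exactly one test set, and the number of splits equals the number of unique fold identifiers.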

class lir.data_strategies.predefined.PredefinedTrainTestSplit[source]

Bases: DataStrategy

Split data into a training set and a test set based on predefined assignments.

This strategy expects a role_assignments field in the data, where each instance is labelled either "train" (included in the training set) or "test" (included in the test set).

In the experiment setup file, this split strategy can be referenced as follows:

train_test_splits:
    strategy: predefined_train_test

apply(instances: DataType) → Iterator[tuple[DataType, DataType]][source]

Split the data into a training set and a test set.

Parameters: instances (InstanceDataType) – Input instances to be processed by this method.
Yields: tuple[DataType, DataType] – An iterator over a single item, which is a tuple of the training set and the test set.
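A minimal sketch of a role-based split follows. It is not the lir implementation: the Role enum is a hypothetical stand-in that mirrors the "train" and "test" values, split_by_role is an invented helper, and the input is assumed to be a plain list of role labels rather than the library's instance data type.

```python
from collections.abc import Iterator, Sequence
from enum import Enum


class Role(Enum):
    """Hypothetical stand-in with the same values as the documented roles."""

    TRAIN = "train"
    TEST = "test"


def split_by_role(
    role_assignments: Sequence[str],
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield a single (train_indices, test_indices) pair."""
    # Converting to the enum rejects any label other than "train" or "test"
    # by raising ValueError.
    roles = [Role(r) for r in role_assignments]
    train = [i for i, r in enumerate(roles) if r is Role.TRAIN]
    test = [i for i, r in enumerate(roles) if r is Role.TEST]
    yield train, test


pairs = list(split_by_role(["train", "test", "train", "test"]))
```

The generator yields exactly one item, which matches the documented behaviour of an iterator over a single train/test pair.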

class lir.data_strategies.predefined.RoleAssignment(*values)[source]

Bases: Enum

Indicate whether the data is part of the train or the test split.

TEST = 'test'

TRAIN = 'train'

lir.data_strategies.predefined.is_valid_input(instances: InstanceData) → bool[source]

Return True iff predefined strategies can be applied.