lir.data_strategies.predefined module
- class lir.data_strategies.predefined.PredefinedCrossValidation[source]
Bases: DataStrategy
Split data into cross-validation folds based on predefined assignments.
This strategy expects a fold_assignments field in the data. For example, parse_features_from_csv_file with fold_assignment_column specified will create this field.
Each instance should be labelled with the test set (fold) it belongs to. This means that care should be taken to use the correct number of folds (= number of unique labels) and to decide whether the folds are based on sources or on instances.
In the experiment setup file, this split strategy can be referenced as follows:
cross_validation_splits:
  strategy: predefined_cross_validation
- apply(instances: DataType) → Iterator[tuple[DataType, DataType]][source]
Perform cross-validation based on predefined fold assignments.
This strategy expects a fold_assignments field in the data, where each instance is labelled with a fold identifier. The strategy will return one train/test split for each unique fold identifier, using the instances with that identifier as the test set and the others as the training set.
- Parameters:
instances (InstanceDataType) – Input instances to be processed by this method.
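The fold logic described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name `predefined_folds`, the use of lists instead of the library's data types, and the sorted fold order are all assumptions made for the example.

```python
from collections.abc import Iterator, Sequence


def predefined_folds(
    instances: Sequence[str], fold_assignments: Sequence[int]
) -> Iterator[tuple[list[str], list[str]]]:
    """Yield one (train, test) pair per unique fold identifier (hypothetical helper)."""
    # Sorting makes the fold order deterministic; this ordering is an assumption.
    for fold in sorted(set(fold_assignments)):
        # Instances labelled with the current fold form the test set ...
        test = [x for x, f in zip(instances, fold_assignments) if f == fold]
        # ... and all remaining instances form the training set.
        train = [x for x, f in zip(instances, fold_assignments) if f != fold]
        yield train, test


instances = ["i0", "i1", "i2", "i3", "i4", "i5"]
fold_assignments = [0, 1, 0, 2, 1, 2]
splits = list(predefined_folds(instances, fold_assignments))
```

With three unique labels, `splits` holds three train/test pairs, and every instance appears in exactly one test set.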
- class lir.data_strategies.predefined.PredefinedTrainTestSplit[source]
Bases: DataStrategy
Split data into a training set and a test set based on predefined assignments.
This strategy expects a role_assignments field in the data, where each instance is labelled either "train" (included in the training set) or "test" (included in the test set).
In the experiment setup file, this split strategy can be referenced as follows:
train_test_splits:
  strategy: predefined_train_test
- apply(instances: DataType) → Iterator[tuple[DataType, DataType]][source]
Split the data into a training set and a test set.
- Parameters:
instances (InstanceDataType) – Input instances to be processed by this method.
- Yields:
tuple[DataType, DataType] – An iterator over a single item, which is a tuple of the training set and the test set.
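As a rough sketch of the behaviour described above, the split can be written as a generator that yields exactly one (train, test) pair. The name `predefined_train_test`, the list-based inputs, and the error raised for unknown labels are assumptions for illustration; the library's own handling may differ.

```python
from collections.abc import Iterator, Sequence


def predefined_train_test(
    instances: Sequence[str], role_assignments: Sequence[str]
) -> Iterator[tuple[list[str], list[str]]]:
    """Yield a single (train, test) pair based on per-instance roles (hypothetical helper)."""
    # Rejecting unknown labels is an assumption; the source only states that
    # each instance is labelled either "train" or "test".
    unknown = set(role_assignments) - {"train", "test"}
    if unknown:
        raise ValueError(f"unexpected role labels: {sorted(unknown)}")
    train = [x for x, r in zip(instances, role_assignments) if r == "train"]
    test = [x for x, r in zip(instances, role_assignments) if r == "test"]
    yield train, test


instances = ["a", "b", "c", "d"]
role_assignments = ["train", "test", "train", "train"]
result = list(predefined_train_test(instances, role_assignments))
```

Because the generator yields once, `result` contains a single tuple, which lets the same consuming loop handle both this strategy and cross-validation.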
- class lir.data_strategies.predefined.RoleAssignment(*values)[source]
Bases: Enum
Indicate whether the data is part of the train or the test split.
- TEST = 'test'
- TRAIN = 'train'
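Since the class derives from Enum, raw string labels can be mapped to members by value using standard enum lookup. The sketch below defines a local stand-in with the two documented members rather than importing the library class; the `raw_roles` list is made up for the example.

```python
from enum import Enum


class RoleAssignment(Enum):
    """Local stand-in mirroring the two documented members."""

    TEST = "test"
    TRAIN = "train"


# Lookup by value converts raw labels into enum members.
raw_roles = ["train", "test", "train"]
roles = [RoleAssignment(label) for label in raw_roles]

# A label that matches no member raises ValueError (standard Enum behaviour).
try:
    RoleAssignment("validation")
    rejected = False
except ValueError:
    rejected = True
```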
- lir.data_strategies.predefined.is_valid_input(instances: InstanceData) → bool[source]
Return True iff predefined strategies can be applied.