lir.data_strategies.pairs module

class lir.data_strategies.pairs.PairsTrainTestSplit(test_size: float | int, seed: int | None = None)

Bases: DataStrategy

A train/test split policy for paired instances.

The input instances must have source_ids with two columns, one for each source of a pair. The split assigns every source to either the training set or the test set. A pair is assigned to the training set or the test set only if both of its sources have that role. Pairs whose two sources have different roles are omitted from both sets.

Parameters:

test_size (float | int) – Fraction (float) or absolute number (int) of items assigned to the test split.
seed (int | None) – Random seed controlling stochastic behaviour for reproducible results.
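The split policy described above can be sketched with NumPy as follows. This is a hypothetical helper (split_pair_masks is not part of lir) that works on a plain (n_pairs, 2) source_ids array rather than lir's DataType, and it assumes that a float test_size is a fraction of the distinct sources while an int is an absolute number of sources. It returns boolean masks over the pairs instead of data subsets.

```python
from __future__ import annotations

import numpy as np


def split_pair_masks(source_ids, test_size: float | int, seed: int | None = None):
    """Hypothetical sketch: boolean (train, test) masks over the rows of source_ids.

    Not lir's implementation; it only illustrates the documented policy.
    """
    source_ids = np.asarray(source_ids)
    if source_ids.ndim != 2 or source_ids.shape[1] != 2:
        raise ValueError("source_ids must have shape (n_pairs, 2)")

    # Every distinct source, regardless of the column it appears in.
    sources = np.unique(source_ids)

    # Assumption: a float is a fraction of sources, an int an absolute count.
    if isinstance(test_size, float):
        n_test = int(round(test_size * len(sources)))
    else:
        n_test = int(test_size)

    # Shuffle the sources reproducibly; the first n_test get the test role.
    rng = np.random.default_rng(seed)
    test_sources = rng.permutation(sources)[:n_test]

    # in_test[i, j] is True when source j of pair i is a test source.
    in_test = np.isin(source_ids, test_sources)
    test_mask = in_test.all(axis=1)       # both sources are test sources
    train_mask = (~in_test).all(axis=1)   # both sources are training sources
    # Pairs with one source in each role match neither mask and are dropped.
    return train_mask, test_mask


ids = np.array([[0, 1], [1, 2], [2, 3], [0, 3], [4, 5]])
train, test = split_pair_masks(ids, test_size=2, seed=0)
```

Because roles are assigned per source rather than per pair, no source can appear in both sets, which is what keeps the evaluation free of source overlap; the cost is that mixed pairs are discarded.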

apply(instances: DataType) → Iterator[tuple[DataType, DataType]]

Split the data into a training set and a test set.

Parameters: instances (DataType) – Input instances to be split.
Yields: tuple[DataType, DataType] – A tuple of the training set and the test set. The iterator yields exactly one such tuple.
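Since apply yields a single tuple, a caller can take it with next() instead of writing a loop. The generator below is a hypothetical stand-in for apply (fake_apply is not part of lir, and it simply halves a list); it only illustrates how the single-item iterator is consumed.

```python
from __future__ import annotations

from collections.abc import Iterator


def fake_apply(instances: list[int]) -> Iterator[tuple[list[int], list[int]]]:
    """Hypothetical stand-in for apply(): yields exactly one (train, test) tuple."""
    half = len(instances) // 2
    yield instances[:half], instances[half:]


# Pull the single (train, test) split out of the iterator.
train, test = next(fake_apply([1, 2, 3, 4]))
```

Returning an iterator keeps the signature compatible with strategies that yield several splits (such as cross-validation folds), so the same calling code can loop over any strategy.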

lir.data_strategies.pairs.is_valid_input(instances: InstanceData) → bool

Return True iff pair-based strategies can be applied to instances.
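A minimal sketch of such a check, assuming the only requirement is the one stated for PairsTrainTestSplit (source_ids with two columns). The function has_pair_source_ids is hypothetical: it takes a bare source_ids array rather than an InstanceData object, and lir's is_valid_input may check more than this.

```python
from __future__ import annotations

from typing import Any

import numpy as np


def has_pair_source_ids(source_ids: Any) -> bool:
    """Hypothetical check: True when source_ids is a 2-D array with two columns."""
    if source_ids is None:
        return False
    arr = np.asarray(source_ids)
    # One row per pair, one column per source of the pair.
    return arr.ndim == 2 and arr.shape[1] == 2
```

For example, has_pair_source_ids(np.zeros((3, 2))) is True, while a 1-D array of per-instance source ids is rejected.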