lir.data.models module
- class lir.data.models.DataProvider
Bases: ABC
Base class for data providers.
Each data provider should provide access to instance data by implementing the get_instances() method.
- abstractmethod get_instances() -> InstanceData
Return an InstanceData object, containing data for a set of instances.
- Returns:
Instance data object produced by this operation.
- Return type:
InstanceData
- class lir.data.models.DataStrategy
Bases: ABC
Base class for data (splitting) strategies.
- abstractmethod apply(instances: DataType) -> Iterable[tuple[DataType, DataType]]
Provide an iterator over training and test sets.
Returns an iterator over tuples of a training set and a test set. Both the training set and the test set are represented by InstanceData objects.
- Parameters:
instances (DataType) – Input instances to be processed by this method.
- Returns:
Iterable of (train_set, test_set) splits for the provided data.
- Return type:
Iterable[tuple[DataType, DataType]]
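The contract above can be illustrated with a small standalone sketch. It is not lir code: the class names SplitStrategy and LeaveOneOut are hypothetical, and plain lists take the place of InstanceData objects. It only shows the shape of what apply() is documented to return, namely an iterable of (train_set, test_set) tuples.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable, Iterator


class SplitStrategy(ABC):
    """Hypothetical stand-in for a data (splitting) strategy."""

    @abstractmethod
    def apply(self, instances: list) -> Iterable[tuple[list, list]]:
        """Yield (train_set, test_set) tuples."""


class LeaveOneOut(SplitStrategy):
    """Hypothetical strategy: each instance forms the test set exactly once."""

    def apply(self, instances: list) -> Iterator[tuple[list, list]]:
        for i in range(len(instances)):
            train = instances[:i] + instances[i + 1:]  # everything except item i
            test = [instances[i]]                      # item i on its own
            yield train, test


splits = list(LeaveOneOut().apply(["a", "b", "c"]))
```

Each element of splits is one (train_set, test_set) pair, so a caller can loop over the result without knowing which strategy produced it.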
- class lir.data.models.FeatureData(*, labels: Annotated[ndarray | None, AfterValidator(func=_validate_labels)] = None, source_ids: Annotated[ndarray | None, AfterValidator(func=_validate_source_ids)] = None, features: Annotated[ndarray, AfterValidator(func=_validate_features)], **extra_data: Any)
Bases: InstanceData
Data class for feature data.
Feature data can be any type of numeric data that is associated with the instances, such as measurements on a single instance or similarity scores between a pair of instances.
If the object describes single instance data, the features attribute is generally 2-dimensional, with one row per instance and one or more feature columns.
More than 2 dimensions may be used for paired data, see PairedFeatureData.
- features
An array of instance features, with one row per instance.
- check_features() -> Self
Validate the features.
- Returns:
This feature-data object after numeric type validation.
- Return type:
Self
- check_matching_shapes() -> Self
Validate that the shapes of the features and the labels match.
- Returns:
This feature-data object after shape consistency checks.
- Return type:
Self
- features: Annotated[ndarray, AfterValidator(func=_validate_features)]
- model_config = {'arbitrary_types_allowed': True, 'extra': 'allow', 'frozen': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class lir.data.models.InstanceData(*, labels: Annotated[ndarray | None, AfterValidator(func=_validate_labels)] = None, source_ids: Annotated[ndarray | None, AfterValidator(func=_validate_source_ids)] = None, **extra_data: Any)
Bases: BaseModel, ABC
Base class for data on instances.
An InstanceData object may or may not carry ground-truth labels. If it is labeled, the labels correspond to the hypotheses and take the value 0 or 1. In the literature, the hypotheses for values 1 and 0, respectively, go by different names, such as:
hypothesis 1 and hypothesis 2 (or H1 and H2)
prosecutor’s hypothesis and defense hypothesis (or Hp and Hd)
same-source and different-source (or Hss and Hds)
The instances may optionally be associated with sources by means of the source_ids attribute. If available, each instance will generally have one source id if the object holds single instances, or two source ids if the object holds pairs of instances.
This class imposes no restrictions on the actual instance data. Subclass implementations will specialize in particular data types.
- labels
The hypothesis labels of the instances, as a 1-dimensional array with one value per instance; each value is either 0 or 1.
- source_ids
The ids of all sources that contributed to the instances. Each instance is from a single source, except if it is a pair, in which case it has two sources. The source ids are either a 1-dimensional array or a 2-dimensional array with two columns.
- property all_fields: list[str]
Return all available field names for this data object.
- Returns:
Names of all standard and extra fields available on the instance.
- Return type:
list[str]
- apply(fn: Callable, *args: Any, **kwargs: Any) -> Self
Apply a custom function to this InstanceData object.
The function fn is applied to all Numpy fields. Other fields are copied as-is.
- Parameters:
fn (Callable) – The function to apply to each Numpy field.
*args (Any) – Additional positional arguments forwarded to the underlying call.
**kwargs (Any) – Additional keyword arguments forwarded to the underlying call.
- Returns:
New instance data object after applying the function to numpy fields.
- Return type:
Self
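The behaviour described for apply() (run fn on every array field, copy everything else) can be sketched without lir or numpy. In the sketch below, the helper apply_to_array_fields is hypothetical, a plain dict stands in for the InstanceData object, and lists stand in for numpy arrays; the real method returns a new InstanceData rather than a dict.

```python
from typing import Any, Callable


def apply_to_array_fields(
    fields: dict[str, Any], fn: Callable, *args: Any, **kwargs: Any
) -> dict[str, Any]:
    """Apply fn to list-valued fields (stand-ins for numpy arrays); copy the rest as-is."""
    return {
        name: fn(value, *args, **kwargs) if isinstance(value, list) else value
        for name, value in fields.items()
    }


data = {"labels": [0, 1, 1], "features": [[0.2], [1.5], [0.9]], "description": "demo"}

# Keep only the first two instances; the extra positional argument is forwarded to fn.
first_two = apply_to_array_fields(data, lambda arr, n: arr[:n], 2)
```

Because the same function is applied to every array field, row-wise operations such as slicing keep labels, features and any other per-instance fields aligned.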
- check_both_labels() -> ndarray
Return labels or raise an error if they are missing or if they do not represent both hypotheses.
- Raises:
ValueError – If hypothesis labels are missing or either label is not represented.
- Returns:
Label array containing both classes 0 and 1.
- Return type:
np.ndarray
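As a minimal sketch of the documented check, the function below mirrors the described behaviour on a plain list. It is a standalone illustration, not the lir implementation; the error messages are made up.

```python
from typing import Optional


def check_both_labels(labels: Optional[list[int]]) -> list[int]:
    """Return labels, or raise ValueError if they are missing or cover only one hypothesis."""
    if labels is None:
        raise ValueError("labels are missing")
    present = set(labels)
    if 0 not in present or 1 not in present:
        raise ValueError("both label values 0 and 1 must be represented")
    return labels


checked = check_both_labels([0, 1, 1])
```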
- check_sourceids_labels_match() -> Self
Validate that the source_ids and labels have matching shapes.
- Returns:
This instance data object after post-init validation.
- Return type:
Self
- combine(others: list[InstanceData] | InstanceData, fn: Callable, *args: Any, **kwargs: Any) -> Self
Apply a custom combination function to InstanceData objects.
All objects must have the same types and fields, and the same values for all non-numpy array fields, or an error is raised. Numpy fields are combined using fn. Other fields are copied as-is.
- Parameters:
others (list[InstanceData] | InstanceData) – The other object or objects to combine with this one.
fn (Callable) – The function used to combine the Numpy fields.
*args (Any) – Additional positional arguments forwarded to the underlying call.
**kwargs (Any) – Additional keyword arguments forwarded to the underlying call.
- Returns:
New instance data object after applying the combination function.
- Return type:
Self
- concatenate(*others: InstanceData) -> Self
Concatenate instances from InstanceData objects.
All concatenated objects must have the same types and fields. How fields are concatenated may depend on the subclass. By default, they must have the same values for all non-numpy array fields, or an error is raised. Numpy fields are concatenated using np.concatenate. Other fields are copied as-is.
Returns a new object with the concatenated instances.
- Parameters:
*others (InstanceData) – The InstanceData objects to concatenate to this one.
- Returns:
New instance data object with concatenated rows.
- Return type:
Self
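The default concatenation rules stated above can be sketched as follows. This is a hedged stand-in, not lir code: concatenate_fields is a hypothetical helper, dicts stand in for InstanceData objects, and lists stand in for numpy arrays (so list extension replaces np.concatenate).

```python
from typing import Any


def concatenate_fields(first: dict[str, Any], *others: dict[str, Any]) -> dict[str, Any]:
    """Join array-like fields row-wise; require all other fields to be equal."""
    for other in others:
        if other.keys() != first.keys():
            raise ValueError("all objects must have the same fields")
    result: dict[str, Any] = {}
    for name, value in first.items():
        if isinstance(value, list):
            # Array-like field: append the rows of every other object, in order.
            merged = list(value)
            for other in others:
                merged.extend(other[name])
            result[name] = merged
        else:
            # Non-array field: values must agree, then the value is copied as-is.
            if any(other[name] != value for other in others):
                raise ValueError(f"field {name!r} differs between objects")
            result[name] = value
    return result


a = {"labels": [0, 1], "unit": "cm"}
b = {"labels": [1], "unit": "cm"}
joined = concatenate_fields(a, b)
```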
- property has_labels: bool
Indicate whether label values are available.
- Returns:
True when label information is present.
- Return type:
bool
- has_same_type(other: Any) -> bool
Compare this instance data to another object.
Returns True iff other has the same class, other has the same fields, and all fields have the same type.
- Parameters:
other (Any) – The object to compare with.
- Returns:
True when type, fields, and field value types all match.
- Return type:
bool
- labels: Annotated[ndarray | None, AfterValidator(func=_validate_labels)]
- model_config = {'arbitrary_types_allowed': True, 'extra': 'allow', 'frozen': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- replace(**kwargs: Any) -> Self
Return a modified copy with updated values.
- Parameters:
**kwargs (Any) – Additional keyword arguments forwarded to the underlying call.
- Returns:
Copy of this object with the provided fields replaced.
- Return type:
Self
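Since model_config sets 'frozen': True, fields cannot be reassigned in place, which is why replace() returns a modified copy. The sketch below shows the same pattern using only the standard library: a frozen dataclass with dataclasses.replace. The Instances class is a hypothetical stand-in and is not how lir implements InstanceData, which is a pydantic model.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Instances:
    """Hypothetical frozen stand-in for an InstanceData-like object."""

    labels: Optional[tuple[int, ...]] = None
    source_ids: Optional[tuple[str, ...]] = None


original = Instances(labels=(0, 1), source_ids=("a", "b"))

# Copy-with-changes: only the named field differs; the original is untouched.
relabelled = replace(original, labels=(1, 1))

# In-place assignment is rejected on a frozen object.
try:
    original.labels = (1, 1)
    in_place_allowed = True
except FrozenInstanceError:
    in_place_allowed = False
```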
- replace_as(datatype: type[InstanceDataType], **kwargs: Any) -> InstanceDataType
Return a modified copy with updated data type and values.
- Parameters:
datatype (type[InstanceDataType]) – The data type of the returned copy.
**kwargs (Any) – Additional keyword arguments forwarded to the underlying call.
- Returns:
Instance data object produced by this operation.
- Return type:
InstanceDataType
- property require_labels: ndarray
Return labels and guarantee that it is not None (or raise an error).
- Returns:
Label array, guaranteed not to be None.
- Return type:
np.ndarray
- source_ids: Annotated[ndarray | None, AfterValidator(func=_validate_source_ids)]
- property source_ids_1d: ndarray
Return source identifiers as a one-dimensional array.
- Returns:
One-dimensional source-id array with one source per instance.
- Return type:
np.ndarray
- class lir.data.models.LLRData(*, labels: Annotated[ndarray | None, AfterValidator(func=_validate_labels)] = None, source_ids: Annotated[ndarray | None, AfterValidator(func=_validate_source_ids)] = None, features: Annotated[ndarray, AfterValidator(func=_validate_features)], llr_upper_bound: float | None = None, llr_lower_bound: float | None = None, **extra_data: Any)
Bases: FeatureData
Representation of calculated LLR values.
An object of LLRData adds a specific interpretation to the features attribute.
If the features attribute has a single column (i.e. dimensions (n, 1)), the values are LLRs.
If the features attribute has three columns (i.e. dimensions (n, 3)), the values are LLRs and their confidence intervals.
The values are also accessible by the attributes llrs and llr_intervals.
- llrs
1-dimensional numpy array of LLR values
- has_intervals
indicates whether the LLRs have intervals
- llr_intervals
numpy array of LLR values of dimensions (n, 2), or None if the LLRs have no intervals
- llr_upper_bound
upper bound applied to the LLRs, or None if no upper bound was applied
- llr_lower_bound
lower bound applied to the LLRs, or None if no lower bound was applied
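The one-column versus three-column interpretation of features can be sketched as below. This is an illustration under stated assumptions, not the lir implementation: split_llr_features is a hypothetical helper, nested lists stand in for numpy arrays, and the column order (LLR, lower bound, upper bound) is assumed, since the text above only says that the three columns hold the LLRs and their intervals.

```python
from typing import Optional

Row = list[float]


def split_llr_features(features: list[Row]) -> tuple[list[float], Optional[list[Row]]]:
    """Split (n, 1) or (n, 3) rows into LLRs and optional (lower, upper) intervals."""
    widths = {len(row) for row in features}
    if widths == {1}:
        # Single column: plain LLRs, no intervals.
        return [row[0] for row in features], None
    if widths == {3}:
        # ASSUMPTION: columns are ordered (llr, lower, upper).
        return [row[0] for row in features], [row[1:3] for row in features]
    raise ValueError("expected rows with exactly 1 or 3 columns")


llrs_only, no_intervals = split_llr_features([[0.5], [-1.0]])
llrs, intervals = split_llr_features([[1.2, 0.8, 1.6], [-0.4, -0.9, 0.1]])
```

In this sketch, the second return value being None plays the role of has_intervals being False.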
- check_features_are_llrs() -> Self
Validate the feature data.
- Returns:
This LLR object after validating LLR-specific feature constraints.
- Return type:
Self
- check_misleading_finite() -> None
Check whether all values are either finite or not misleading.
- feature_for_plot(source_key: str) -> ndarray | None
Return the feature values for a given source key, or None if not available.
The return value has to be saved during the LR system execution by using the save_features_after_step configuration option. If the feature values for the given source key are not available, this method returns None. Use require_feature_for_plots if you want an error to be raised instead when the feature values are not available.
- Parameters:
source_key (str) – Key identifying the source of the feature values to be returned.
- Returns:
Feature values for the specified source key, or None if not available.
- Return type:
np.ndarray | None
- property has_intervals: bool
Indicate whether interval bounds are present for each LLR.
- Returns:
True when lower and upper interval bounds are included.
- Return type:
bool
- property llr_bounds: tuple[float | None, float | None]
Return global lower and upper bounds applied to LLR values.
- Returns:
Tuple containing global lower and upper LLR clipping bounds.
- Return type:
tuple[float | None, float | None]
- property llr_intervals: ndarray | None
Return interval bounds for each LLR when available.
- Returns:
Two-column array with lower and upper LLR bounds, if available.
- Return type:
np.ndarray | None
- llr_lower_bound: float | None
- llr_upper_bound: float | None
- property llrs: ndarray
Return the core LLR values.
- Returns:
One-dimensional array containing the central LLR values.
- Return type:
np.ndarray
- model_config = {'arbitrary_types_allowed': True, 'extra': 'allow', 'frozen': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- require_feature_for_plots(source_key: str) -> ndarray
Return the feature values for a given source key, raising an error if not available.
If the feature values for the given source key are not available, this method raises a ValueError with an informative error message. Use the feature_for_plot method if you want to return None instead of raising an error when the feature values for the given source key are not available.
- Parameters:
source_key (str) – Key identifying the source of the feature values to be returned.
- Returns:
Feature values for the specified source key.
- Return type:
np.ndarray
- Raises:
ValueError – If the feature values for the given source key are not available.
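feature_for_plot and require_feature_for_plots form a common pair: one lookup returns None on a miss, the other raises. The sketch below shows that pattern in isolation. The module-level saved_features dict is purely hypothetical, because how lir stores the saved feature values is not described here, and the error message is made up.

```python
from typing import Optional

# Hypothetical store of saved feature values, keyed by source key.
saved_features: dict[str, list[float]] = {"step_a": [0.1, 0.7]}


def feature_for_plot(source_key: str) -> Optional[list[float]]:
    """Return the saved values for source_key, or None if nothing was saved."""
    return saved_features.get(source_key)


def require_feature_for_plots(source_key: str) -> list[float]:
    """Return the saved values for source_key, or raise ValueError if missing."""
    values = feature_for_plot(source_key)
    if values is None:
        raise ValueError(f"no feature values saved for source key {source_key!r}")
    return values


available = require_feature_for_plots("step_a")
missing = feature_for_plot("step_b")
```

The None-returning variant suits optional plots that can simply be skipped; the raising variant suits code paths where a missing value indicates a configuration mistake.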
- class lir.data.models.PairedFeatureData(*, labels: Annotated[ndarray | None, AfterValidator(func=_validate_labels)] = None, source_ids: Annotated[ndarray | None, AfterValidator(func=_validate_source_ids)] = None, features: Annotated[ndarray, AfterValidator(func=_validate_features)], n_trace_instances: int, n_ref_instances: int, **extra_data: Any)
Bases: FeatureData
Data class for instance pair data.
Each item in this data set represents instances from the “trace” source and from the “reference” source. The number of instances from either source must be at least one.
The features attribute has at least 3 dimensions:
the pairs are along the first dimension;
the instances are along the second dimension (e.g. in a comparison of 1 trace instance and 1 reference instance, the length of this dimension is 2);
the features are along the third dimension onward.
The source_ids, if available, must have two values for each item, i.e. 2 columns.
- n_trace_instances
the number of trace instances in each pair
- n_ref_instances
the number of reference instances in each pair
- features
the features of all instances in the pair, with pairs along the first dimension, and instances along the second
- source_ids
the source ids of the trace and reference instances of each pair, a 2-dimensional array with two columns
- features_trace
the features of the trace instances
- features_ref
the features of the reference instances
- source_ids_trace
the source ids of the trace instances
- source_ids_ref
the source ids of the reference instances
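The relationship between features, n_trace_instances and n_ref_instances can be sketched as a split along the second dimension. This is an illustration under a loud assumption: the text above does not state in which order trace and reference instances are stored, so the sketch assumes trace instances come first. The helper split_pairs is hypothetical, and nested lists stand in for numpy arrays.

```python
Features = list[list[list[float]]]  # pairs x instances x feature values


def split_pairs(
    features: Features, n_trace_instances: int, n_ref_instances: int
) -> tuple[Features, Features]:
    """Split each pair's instances into trace and reference parts."""
    trace, ref = [], []
    for pair in features:
        if len(pair) != n_trace_instances + n_ref_instances:
            raise ValueError("second dimension must hold all trace and reference instances")
        # ASSUMPTION: trace instances precede reference instances.
        trace.append(pair[:n_trace_instances])
        ref.append(pair[n_trace_instances:])
    return trace, ref


# One pair with 1 trace instance and 2 reference instances, 2 features each.
features = [[[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]]
features_trace, features_ref = split_pairs(features, n_trace_instances=1, n_ref_instances=2)
```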
- check_features_dimensions() -> Self
Validate feature dimensions.
- Returns:
This paired-feature object after feature-dimension validation.
- Return type:
Self
- check_sourceid_shape() -> Self
Override the InstanceData implementation.
- Returns:
This paired-feature object after source-id shape validation.
- Return type:
Self
- property features_ref: ndarray
Get the features of the reference instances.
- Returns:
Feature tensor slice containing reference-instance features.
- Return type:
np.ndarray
- property features_trace: ndarray
Get the features of the trace instances.
- Returns:
Feature tensor slice containing trace-instance features.
- Return type:
np.ndarray
- model_config = {'arbitrary_types_allowed': True, 'extra': 'allow', 'frozen': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- n_ref_instances: int
- n_trace_instances: int
- property source_ids_ref: ndarray | None
Get the source ids of the reference instances.
- Returns:
Reference source IDs when available, otherwise None.
- Return type:
np.ndarray | None
- property source_ids_trace: ndarray | None
Get the source ids of the trace instances.
- Returns:
Trace source IDs when available, otherwise None.
- Return type:
np.ndarray | None
- lir.data.models.concatenate_instances(first: InstanceDataType, *others: InstanceDataType) -> InstanceDataType
Concatenate the results of the InstanceData objects.
Alias for first.concatenate(*others).
- Parameters:
first (InstanceDataType) – The first InstanceData object.
*others (InstanceDataType) – The InstanceData objects to concatenate to first.
- Returns:
Instance data object produced by this operation.
- Return type:
InstanceDataType
- lir.data.models.get_instances_by_category(instances: InstanceDataType, category_field: str, category_shape: tuple[int] | None = None) -> Iterator[tuple[ndarray, InstanceDataType]]
Return subsets of a set of instances by category.
The instances object must have a field by the name of category_field. That field is a numpy array with one row per instance. Its values are the categories of each instance. The field may have any shape, as long as the number of rows matches the number of instances.
If category_shape is provided, the shape of the category field is checked against this value.
The returned value is an iterator with each item being a tuple of the category and the subset of instances of that category.
- Parameters:
instances (InstanceDataType) – Input instances to be processed by this method.
category_field (str) – Name of the field that holds the category of each instance.
category_shape (tuple[int] | None) – Expected shape of the category field; if provided, the field's shape is checked against it.
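The grouping described above, where each item is a tuple of a category and the subset of instances in that category, can be sketched in plain Python. This is not the lir implementation: rows_by_category is a hypothetical helper, a dict of equal-length lists stands in for the instances object, categories are scalar values rather than array rows, and the order in which categories are yielded (first seen) is an assumption.

```python
from collections.abc import Iterator
from typing import Any


def rows_by_category(
    fields: dict[str, list], category_field: str
) -> Iterator[tuple[Any, dict[str, list]]]:
    """Yield (category, subset) pairs, where subset keeps only that category's rows."""
    categories = fields[category_field]
    # Collect distinct categories in order of first appearance (an assumption).
    seen: list = []
    for category in categories:
        if category not in seen:
            seen.append(category)
    for category in seen:
        keep = [i for i, c in enumerate(categories) if c == category]
        # Select the same rows from every field so the fields stay aligned.
        yield category, {name: [values[i] for i in keep] for name, values in fields.items()}


instances = {"labels": [0, 1, 1, 0], "lab": ["x", "y", "x", "y"]}
subsets = list(rows_by_category(instances, "lab"))
```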