lir.transform package

Submodules

lir.transform.distance module

class lir.transform.distance.ElementWiseDifference

Bases: Transformer

Calculate the element-wise absolute difference between pairs.

Takes an array of sample pairs and returns the element-wise absolute difference.

Expects:

a PairedFeatureData object with n_trace_instances=1 and n_ref_instances=1.

Return type:

a copy of the FeatureData object with features of shape (n, f)

apply(instances: InstanceData) → FeatureData: Calculate the absolute difference between all elements in the instance data (pairs).
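As a rough illustration of the operation (not the library's implementation), the sketch below uses plain Python lists in place of a PairedFeatureData object. The function name `elementwise_abs_difference` and the nested-list layout (n pairs, each holding one trace vector and one reference vector of f features) are assumptions made for this example.

```python
def elementwise_abs_difference(pairs):
    """Return |trace - ref| per feature for each (trace, ref) pair.

    `pairs` is a list of n pairs; each pair holds two feature vectors
    of equal length f. The result is a list of n rows with f values.
    """
    return [
        [abs(t - r) for t, r in zip(trace, ref)]
        for trace, ref in pairs
    ]


# Two pairs with three features each: input shape (2, 2, 3).
pairs = [
    ([1.0, 5.0, 2.0], [4.0, 3.0, 2.0]),
    ([0.5, 0.0, -1.0], [0.0, 2.0, 1.0]),
]
differences = elementwise_abs_difference(pairs)
print(differences)  # [[3.0, 2.0, 0.0], [0.5, 2.0, 2.0]]
```

The output keeps one row per pair and one column per feature, which matches the documented (n, f) shape.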

class lir.transform.distance.ManhattanDistance

Bases: Transformer

Calculate the Manhattan distance between pairs.

Takes a PairedFeatureData object or a FeatureData object and returns the Manhattan distance.

If the input is a PairedFeatureData object, the distance is computed as the Manhattan distance, i.e. the sum over all features of the element-wise absolute differences between both sides of each pair.

If the input is a FeatureData object, it is assumed that it contains the element-wise differences, and the sum over these differences is calculated.

Returns: a FeatureData object with features of shape (n, 1)

apply(instances: InstanceData) → FeatureData: Calculate the Manhattan distance between all elements in the instance data (pairs).
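The two input cases described above can be sketched with plain Python lists. This is only an illustration under assumptions: the helper names `manhattan_from_pairs` and `manhattan_from_differences` are hypothetical, and the lists stand in for PairedFeatureData and FeatureData objects.

```python
def manhattan_from_pairs(pairs):
    """Sum the absolute per-feature differences of each (trace, ref) pair."""
    return [
        [sum(abs(t - r) for t, r in zip(trace, ref))]
        for trace, ref in pairs
    ]


def manhattan_from_differences(differences):
    """Sum rows that already contain element-wise absolute differences."""
    return [[sum(row)] for row in differences]


pairs = [
    ([1.0, 5.0, 2.0], [4.0, 3.0, 2.0]),
    ([0.5, 0.0, -1.0], [0.0, 2.0, 1.0]),
]
precomputed = [[3.0, 2.0, 0.0], [0.5, 2.0, 2.0]]

from_pairs = manhattan_from_pairs(pairs)
from_differences = manhattan_from_differences(precomputed)
print(from_pairs)        # [[5.0], [4.5]]
print(from_differences)  # [[5.0], [4.5]]
```

Both paths give one value per pair, i.e. a result of shape (n, 1).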

lir.transform.pairing module

class lir.transform.pairing.InstancePairing(same_source_limit=None, different_source_limit=None, ratio_limit=None, seed=None)

Bases: PairingMethod

Construct pairs from a set of instances.

Note that this pairing method may cause performance problems with large datasets, even if the number of instances in the output is limited.

pair(instances: InstanceData, n_trace_instances: int = 1, n_ref_instances: int = 1) → PairedFeatureData

Construct pairs.

Parameters:

instances – the set of instances to be paired
n_trace_instances – the number of trace instances in each pair (must be 1 for this pairing method)
n_ref_instances – the number of reference instances in each pair (must be 1 for this pairing method)

Returns:

paired instances

property rng: Obtain a random number generator using a provided seed.

class lir.transform.pairing.PairingMethod

Bases: ABC

Base class for pairing methods.

A pairing method should implement the pair() function.

abstractmethod pair(instances: InstanceData, n_trace_instances: int = 1, n_ref_instances: int = 1) → PairedFeatureData

Takes instances as input, and returns pairs.

A pair may be a pair of sources, with multiple instances per source.

The returned features have dimensions (p, i, …), where the first dimension is the pairs, the second dimension is the instances, and subsequent dimensions are the features. If the input has labels, the returned labels are an array of source labels, one label per pair, where the labels are 0=different source, 1=same source. Any other attributes are combined into tuples.

Parameters:

instances – An array of instance features, with one row per instance.
n_trace_instances – Number of instances per trace.
n_ref_instances – Number of instances per reference source.

Returns:

instance pairs

class lir.transform.pairing.SourcePairing(same_source_limit: int | None = None, different_source_limit: int | None = None, ratio_limit: int | None = None, seed: Any | int = None)

Bases: PairingMethod

Construct pairs of sources (i.e. classes) from an array of instances.

While pairing at instance level results in pairs of instances, some same-source and some different-source, pairing at source level results in pairing of multiple instances of source A against multiple instances of source B, where A and B can be same-source or different-source.

pair(instances: InstanceData, n_trace_instances: int = 1, n_ref_instances: int = 1) → PairedFeatureData

Pair sources.

Takes a FeatureData object that contains instances. Returns pairs as a PairedFeatureData object.

The input is expected to have source_ids, that govern how pairs are compiled. The input instances may be used in a pair, either as a trace instance or as a reference instance.

Parameters:

instances – the set of instances to be paired
n_trace_instances – the number of trace instances in each pair
n_ref_instances – the number of reference instances in each pair

Returns:

paired instances
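To make the difference with instance-level pairing concrete, the sketch below pairs sources rather than instances. It is an illustration under assumptions, not the SourcePairing code: `pair_sources` is a hypothetical helper, the choice to visit every unordered combination of sources once is an assumption, and the limit arguments of the constructor are left out.

```python
import itertools
import random
from collections import defaultdict


def pair_sources(features, source_ids, n_trace_instances=1, n_ref_instances=1, seed=None):
    """Build one pair per combination of sources (hypothetical helper).

    Each pair holds n_trace_instances rows followed by n_ref_instances rows,
    so the result has shape (p, n_trace_instances + n_ref_instances, f).
    """
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for row, source in zip(features, source_ids):
        by_source[source].append(row)

    pairs, labels = [], []
    for trace_src, ref_src in itertools.combinations_with_replacement(sorted(by_source), 2):
        if trace_src == ref_src:
            # Same source: trace and reference instances must not overlap.
            pool = by_source[trace_src]
            if len(pool) < n_trace_instances + n_ref_instances:
                continue
            drawn = rng.sample(pool, n_trace_instances + n_ref_instances)
        else:
            trace_pool, ref_pool = by_source[trace_src], by_source[ref_src]
            if len(trace_pool) < n_trace_instances or len(ref_pool) < n_ref_instances:
                continue
            drawn = rng.sample(trace_pool, n_trace_instances) + rng.sample(ref_pool, n_ref_instances)
        pairs.append(drawn)
        labels.append(int(trace_src == ref_src))
    return pairs, labels


features = [[0.1], [0.2], [0.3], [0.9], [1.0], [1.1]]
source_ids = ["a", "a", "a", "b", "b", "b"]
pairs, labels = pair_sources(features, source_ids, n_trace_instances=1, n_ref_instances=2, seed=0)
print(labels)                   # [1, 0, 1]
print([len(p) for p in pairs])  # [3, 3, 3]
```

The second dimension of each pair has length n_trace_instances + n_ref_instances, in line with the (p, i, …) layout described for PairingMethod.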

lir.transform.pipeline module

class lir.transform.pipeline.LoggingPipeline(steps: list[tuple[str, Transformer | Any]], output_file: PathLike, include_batch_number: bool = True, include_labels: bool = True, include_fields: list[str] | None = None, include_steps: list[str] | None = None, include_input: bool = True)

Bases: Pipeline

A pipeline that writes debugging output to a CSV file.

apply(instances: InstanceData) → InstanceData: Apply the pipeline to the incoming instances.

class lir.transform.pipeline.Pipeline(steps: list[tuple[str, Transformer | Any]])

Bases: Transformer

A pipeline of processing modules.

A module may be a scikit-learn style transformer, a scikit-learn style estimator, or a LIR Transformer.

apply(instances: InstanceData) → InstanceData: Apply the fitted model on the instance data.

fit(instances: InstanceData) → Self: Fit the model on the instance data.

fit_apply(instances: InstanceData) → InstanceData: Combine fitting the transformer/estimator and applying the model to the instances.
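The relationship between fit(), apply() and fit_apply() in a chain of named steps can be sketched as follows. The classes `SketchPipeline`, `Scale` and `Center` are stand-ins invented for this example, and plain lists of numbers replace InstanceData; the real Pipeline may differ in detail.

```python
class Scale:
    """Stateless step: multiply every value by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def fit(self, instances):
        return self

    def apply(self, instances):
        return [x * self.factor for x in instances]


class Center:
    """Fitted step: subtract the mean seen during fit()."""

    def fit(self, instances):
        self.mean_ = sum(instances) / len(instances)
        return self

    def apply(self, instances):
        return [x - self.mean_ for x in instances]


class SketchPipeline:
    """Chain of (name, step) tuples, applied in order."""

    def __init__(self, steps):
        self.steps = steps

    def fit_apply(self, instances):
        # Each step is fitted on the output of the previous step.
        for _name, step in self.steps:
            instances = step.fit(instances).apply(instances)
        return instances

    def fit(self, instances):
        self.fit_apply(instances)
        return self

    def apply(self, instances):
        for _name, step in self.steps:
            instances = step.apply(instances)
        return instances


pipeline = SketchPipeline([("scale", Scale(2.0)), ("center", Center())])
fitted_output = pipeline.fit_apply([1.0, 2.0, 3.0])
print(fitted_output)          # [-2.0, 0.0, 2.0]
print(pipeline.apply([5.0]))  # [6.0]
```

Note that the second step is fitted on the scaled values, not on the raw input, which is why new data passed to apply() is centred around 4.0.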

lir.transform.pipeline.parse_steps(config: ContextAwareDict | None, output_dir: Path) → list[tuple[str, Transformer]]: Parse the defined pipeline steps in the configuration and return the initialized modules as a list.

Module contents

class lir.transform.BinaryClassifierTransformer(estimator: SKLearnPipelineModule)

Bases: Transformer

Implementation of a binary classifier as a scikit-learn Pipeline step.

apply(instances: InstanceData) → InstanceData: Convert instances by applying the fitted model.

fit(instances: InstanceData) → Self: Fit the model on the provided instances.

class lir.transform.CsvWriter(path: Path, include: list[str] | None = None, header: list[str] | None = None, include_labels: bool = False, include_meta: bool = False, include_input: bool = True, include_batch: bool = False)

Bases: Transformer

Implementation of a transformation step in a scikit-learn Pipeline that writes to CSV.

This might be used to obtain temporary or intermediate results for logging or debugging purposes.

apply(instances: InstanceData) → FeatureData: Write numpy feature vector to CSV output file.

fit_apply(instances: InstanceDataType) → InstanceDataType

Provide required fit_apply() and return all instances.

Since the CsvWriter is implemented as a step (Transformer) in the pipeline, it should support the fit_apply method which is called on all transformers of the pipeline.

We don’t need to actually fit or transform anything, so we simply return the instances (as is).
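A pass-through writer step of this kind might look like the sketch below. It is an assumption-laden illustration: `SketchCsvWriter` is a hypothetical class, it writes to any text stream rather than a path, and it only handles rows of plain numbers. The standard-library `csv.writer` object offers the `writerow()` method that a DataWriter-like object is expected to have.

```python
import csv
import io


class SketchCsvWriter:
    """Write rows to a CSV stream and hand the input back unchanged."""

    def __init__(self, stream, header=None):
        self._writer = csv.writer(stream)
        if header is not None:
            self._writer.writerow(header)

    def apply(self, instances):
        for row in instances:
            self._writer.writerow(row)
        return instances

    def fit_apply(self, instances):
        # Nothing to fit or transform: return the instances as they are.
        return instances


buffer = io.StringIO()
writer = SketchCsvWriter(buffer, header=["f1", "f2"])
rows = [[1.0, 2.0], [3.0, 4.0]]
result = writer.apply(rows)
lines = buffer.getvalue().splitlines()
print(lines)  # ['f1,f2', '1.0,2.0', '3.0,4.0']
```

Because both methods return their input, such a step can be placed anywhere in a pipeline without changing what the following steps receive.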

class lir.transform.DataWriter(*args, **kwargs)

Bases: Protocol

Representation of a data writer and necessary methods.

writerow(row: Any) → None: Write row to output.

class lir.transform.FunctionTransformer(func: Callable)

Bases: Transformer

Implementation of a transformer function as scikit-learn Pipeline step.

apply(instances: InstanceData) → FeatureData: Call the custom defined function on the feature data instances and use output as features.

class lir.transform.Identity

Bases: Transformer

Represent the Identity function of a transformer.

When apply() is called on such a transformer, it simply returns the instances.

apply(instances: InstanceDataType) → InstanceDataType: Simply provide the instances.

class lir.transform.NumpyTransformer(transformer: Transformer, header: list[str] | None)

Bases: TransformerWrapper

Implementation of a transformer wrapper.

apply(instances: InstanceData) → InstanceData: Extend the instances with the desired header data, call base apply.

fit_apply(instances: InstanceData) → InstanceData: Extend the instances with the desired header data, call base fit_apply.

class lir.transform.SKLearnPipelineModule(*args, **kwargs)

Bases: Protocol

Representation of the interface required for estimators by the scikit-learn Pipeline.

fit(X: ndarray, y: ndarray | None) → Self

predict_proba(X: ndarray) → Any

transform(X: ndarray) → Any

class lir.transform.SklearnTransformer(transformer: SklearnTransformerType)

Bases: Transformer

Implementation of a scikit-learn style transformer as a Pipeline step.

apply(instances: InstanceData) → InstanceData: Convert instances by applying the fitted model.

fit(instances: InstanceData) → Self: Fit the model on the provided instances.

fit_apply(instances: InstanceData) → FeatureData: Combine call to .fit() followed by .apply().

class lir.transform.SklearnTransformerType(*args, **kwargs)

Bases: Protocol

Representation of the interface required for transformers by the scikit-learn Pipeline.

fit(features: ndarray, labels: ndarray | None) → Self

fit_transform(features: ndarray, labels: ndarray | None) → ndarray

transform(features: ndarray) → Any

class lir.transform.Tee(transformers: list[Transformer])

Bases: Transformer

Implementation of a custom transformer that performs several separate tasks on the same input.

apply(instances: InstanceData) → InstanceData: Delegate apply() to all specified transformers.

fit(instances: InstanceData) → Self: Delegate fit() to all specified transformers.
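The delegation pattern can be sketched as below. Two things here are assumptions rather than documented behaviour: that every delegate receives the same, unmodified input, and that apply() returns that input unchanged (in the spirit of the Unix `tee` command). `SketchTee` and `Recorder` are hypothetical names.

```python
class Recorder:
    """Toy delegate that remembers what it was asked to process."""

    def __init__(self):
        self.fitted = False
        self.seen = []

    def fit(self, instances):
        self.fitted = True
        return self

    def apply(self, instances):
        self.seen.append(list(instances))
        return instances


class SketchTee:
    """Send the same input to several transformers."""

    def __init__(self, transformers):
        self.transformers = transformers

    def fit(self, instances):
        for transformer in self.transformers:
            transformer.fit(instances)
        return self

    def apply(self, instances):
        for transformer in self.transformers:
            transformer.apply(instances)
        # Assumption: the input is passed on unchanged.
        return instances


first, second = Recorder(), Recorder()
tee = SketchTee([first, second])
data = [1, 2, 3]
output = tee.fit(data).apply(data)
print(output)                   # [1, 2, 3]
print(first.seen, second.seen)  # [[1, 2, 3]] [[1, 2, 3]]
```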

class lir.transform.Transformer

Bases: ABC

Transformer module which is compatible with the scikit-learn Pipeline.

The transformer should provide an apply() method. Since a plain transformer is not fitted to the data, the default fit() simply returns the object it was called upon without side effects.

abstractmethod apply(instances: InstanceData) → InstanceData: Convert the instance data based on the (optionally fitted) model.

fit(instances: InstanceData) → Self: Perform (optional) fitting of the instance data.

fit_apply(instances: InstanceData) → InstanceData: Combine call to fit() with directly following call to apply().
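The contract described above (one abstract apply(), a no-op fit(), and fit_apply() as their combination) can be sketched with the standard `abc` module. `SketchTransformer` and `AbsoluteValue` are illustrative names, and lists of numbers stand in for InstanceData; this is not the library's source.

```python
from abc import ABC, abstractmethod


class SketchTransformer(ABC):
    """Base class with one required method: apply()."""

    @abstractmethod
    def apply(self, instances):
        """Convert the instances; subclasses must implement this."""

    def fit(self, instances):
        # Default: nothing to learn, so return self without side effects.
        return self

    def fit_apply(self, instances):
        return self.fit(instances).apply(instances)


class AbsoluteValue(SketchTransformer):
    """Stateless subclass that only needs to override apply()."""

    def apply(self, instances):
        return [abs(x) for x in instances]


transformer = AbsoluteValue()
result = transformer.fit_apply([-1, 2, -3])
print(result)  # [1, 2, 3]
```

A subclass that does need fitting would override fit() to store what it learns and still return self, so that fit_apply() keeps working unchanged.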

class lir.transform.TransformerWrapper(wrapped_transformer: Transformer)

Bases: Transformer

Base class for a transformer wrapper.

This class is derived from Transformer and has a default implementation of all functions by forwarding the call to the wrapped transformer. A subclass may add or change functionality by overriding functions.

apply(instances: InstanceData) → InstanceData: Delegate the call to the underlying wrapped transformer and return its result.

fit(instances: InstanceData) → Self: Delegate calls to underlying wrapped transformer but return the Wrapper instance.
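The forwarding pattern, and how a subclass might add behaviour on top of it, can be sketched as follows. All class names here (`SketchWrapper`, `CountingWrapper`, `Doubler`) are hypothetical and lists of numbers replace InstanceData.

```python
class Doubler:
    """Toy transformer to be wrapped."""

    def fit(self, instances):
        return self

    def apply(self, instances):
        return [2 * x for x in instances]


class SketchWrapper:
    """Forward every call to the wrapped transformer."""

    def __init__(self, wrapped_transformer):
        self.wrapped_transformer = wrapped_transformer

    def fit(self, instances):
        self.wrapped_transformer.fit(instances)
        return self  # the wrapper, not the wrapped object

    def apply(self, instances):
        return self.wrapped_transformer.apply(instances)


class CountingWrapper(SketchWrapper):
    """Subclass that adds functionality by overriding apply()."""

    def __init__(self, wrapped_transformer):
        super().__init__(wrapped_transformer)
        self.apply_calls = 0

    def apply(self, instances):
        self.apply_calls += 1
        return super().apply(instances)


wrapper = CountingWrapper(Doubler())
fitted = wrapper.fit([1, 2])
output = wrapper.apply([1, 2])
print(output)               # [2, 4]
print(wrapper.apply_calls)  # 1
```

Returning the wrapper from fit() keeps chained calls such as `wrapper.fit(x).apply(x)` going through the overriding subclass rather than the inner object.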

lir.transform.as_transformer(transformer_like: Any) → Transformer

Provide a Transformer instance for the provided transformer-like input.

For any transformer-like object, wrap if necessary, and return a Transformer.

The transformer-like object may be one of the following:
- an instance of Transformer, which is returned as-is;
- a scikit-learn style transformer which implements transform() and optionally fit() and/or fit_transform();
- a scikit-learn style estimator, which implements fit() and predict_proba(); or
- a callable which takes an np.ndarray argument and returns another np.ndarray.

Parameters: transformer_like – the object to wrap or return
Returns: a Transformer instance
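One way such a dispatch could be organised is sketched below. The order of the checks, the use of `hasattr`, and all class and function names (`SketchTransformer`, `TransformWrapper`, `EstimatorWrapper`, `FunctionWrapper`, `as_sketch_transformer`) are assumptions for illustration. Labels are omitted from fit() for brevity, and lists replace np.ndarray.

```python
class SketchTransformer:
    """Stand-in base class with the fit()/apply() interface."""

    def fit(self, instances):
        return self

    def apply(self, instances):
        raise NotImplementedError


class TransformWrapper(SketchTransformer):
    """Wrap an object that offers transform() and optionally fit()."""

    def __init__(self, inner):
        self.inner = inner

    def fit(self, instances):
        if hasattr(self.inner, "fit"):
            self.inner.fit(instances)
        return self

    def apply(self, instances):
        return self.inner.transform(instances)


class EstimatorWrapper(SketchTransformer):
    """Wrap an object that offers fit() and predict_proba()."""

    def __init__(self, inner):
        self.inner = inner

    def fit(self, instances):
        self.inner.fit(instances)
        return self

    def apply(self, instances):
        return self.inner.predict_proba(instances)


class FunctionWrapper(SketchTransformer):
    """Wrap a plain callable."""

    def __init__(self, func):
        self.func = func

    def apply(self, instances):
        return self.func(instances)


def as_sketch_transformer(transformer_like):
    if isinstance(transformer_like, SketchTransformer):
        return transformer_like  # returned as-is
    if hasattr(transformer_like, "transform"):
        return TransformWrapper(transformer_like)
    if hasattr(transformer_like, "fit") and hasattr(transformer_like, "predict_proba"):
        return EstimatorWrapper(transformer_like)
    if callable(transformer_like):
        return FunctionWrapper(transformer_like)
    raise TypeError(f"cannot wrap {type(transformer_like).__name__}")


class Halver:
    """Toy scikit-learn style transformer with only transform()."""

    def transform(self, rows):
        return [x / 2 for x in rows]


existing = FunctionWrapper(len)
halved = as_sketch_transformer(Halver()).fit([2.0]).apply([2.0, 4.0])
incremented = as_sketch_transformer(lambda rows: [x + 1 for x in rows]).apply([1, 2])
print(halved)       # [1.0, 2.0]
print(incremented)  # [2, 3]
```

Checking for an existing transformer first means already wrapped objects are never wrapped twice.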