lir.transform package

Submodules

lir.transform.distance module

class lir.transform.distance.ElementWiseDifference

Bases: Transformer

Calculate the element-wise absolute difference between pairs.

Takes an array of sample pairs and returns the element-wise absolute difference.

Expects:
  • a PairedFeatureData object with n_trace_instances=1 and n_ref_instances=1

Returns:

  • a copy of the FeatureData object with features of shape (n, f)

apply(instances: InstanceData) FeatureData

Calculate the absolute difference between all elements in the instance data (pairs).
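The computation described above can be sketched in plain Python. This is an illustration only: it does not import lir, the function name elementwise_abs_difference is hypothetical, and nested lists stand in for a PairedFeatureData object holding n pairs of one trace and one reference instance with f features each.

```python
# Plain-Python stand-in for the transformation; it does not call lir.
def elementwise_abs_difference(pairs):
    """pairs: n items, each [trace_features, ref_features] with f values per side."""
    return [
        [abs(t - r) for t, r in zip(trace, ref)]
        for trace, ref in pairs
    ]


# Two pairs (n=2) with three features each (f=3).
pairs = [
    [[1.0, 5.0, 2.0], [4.0, 3.0, 2.0]],
    [[0.0, 1.0, 7.0], [2.0, 1.0, 3.0]],
]
differences = elementwise_abs_difference(pairs)
print(differences)  # [[3.0, 2.0, 0.0], [2.0, 0.0, 4.0]]
```

The result has one row per pair and one column per feature, matching the documented output shape (n, f).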

class lir.transform.distance.ManhattanDistance

Bases: Transformer

Calculate the Manhattan distance between pairs.

Takes a PairedFeatureData object or a FeatureData object and returns the Manhattan distance.

If the input is a PairedFeatureData object, the distance is computed as the Manhattan distance, i.e. the sum over all features of the element-wise absolute differences between both sides of each pair.

If the input is a FeatureData object, it is assumed that it contains the element-wise differences, and the sum over these differences is calculated.

Returns:

a FeatureData object with features of shape (n, 1)

apply(instances: InstanceData) FeatureData

Calculate the Manhattan distance between all elements in the instance data (pairs).
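The two input cases described above can be sketched as follows. This is a plain-Python illustration that does not use lir; both function names are hypothetical, and the second one assumes its input already holds absolute differences, as the description states.

```python
# Plain-Python stand-ins for the two documented input cases; lir is not used.
def manhattan_from_pairs(pairs):
    """pairs: n items, each [trace_features, ref_features]. Returns shape (n, 1)."""
    return [
        [sum(abs(t - r) for t, r in zip(trace, ref))]
        for trace, ref in pairs
    ]


def manhattan_from_differences(differences):
    """differences: n rows of element-wise absolute differences. Returns shape (n, 1)."""
    return [[sum(row)] for row in differences]


pairs = [
    [[1.0, 5.0, 2.0], [4.0, 3.0, 2.0]],
    [[0.0, 1.0, 7.0], [2.0, 1.0, 3.0]],
]
differences = [[3.0, 2.0, 0.0], [2.0, 0.0, 4.0]]

from_pairs = manhattan_from_pairs(pairs)
from_differences = manhattan_from_differences(differences)
print(from_pairs)        # [[5.0], [6.0]]
print(from_differences)  # [[5.0], [6.0]]
```

Both routes give the same distances, which is why the element-wise difference step can be placed before the distance step without changing the result.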

lir.transform.pairing module

class lir.transform.pairing.InstancePairing(same_source_limit=None, different_source_limit=None, ratio_limit=None, seed=None)

Bases: PairingMethod

Construct pairs from a set of instances.

Note that this pairing method may cause performance problems with large datasets, even if the number of instances in the output is limited.

pair(instances: InstanceData, n_trace_instances: int = 1, n_ref_instances: int = 1) PairedFeatureData

Construct pairs.

Parameters:
  • instances – the set of instances to be paired

  • n_trace_instances – the number of trace instances in each pair (must be 1 for this pairing method)

  • n_ref_instances – the number of reference instances in each pair (must be 1 for this pairing method)

Returns:

paired instances

property rng

Obtain a random number generator using a provided seed.
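A rough sketch of instance-level pairing with a seeded random number generator is given below. It does not use lir, and how InstancePairing actually applies its limits is not stated here: the function pair_instances, its subsampling strategy, and the use of random.Random are assumptions made for illustration.

```python
import itertools
import random


def pair_instances(features, source_ids, different_source_limit=None, seed=None):
    """Hypothetical helper: pair every two instances and label each pair."""
    rng = random.Random(seed)  # a seeded generator makes the subsample reproducible
    same, different = [], []
    for i, j in itertools.combinations(range(len(features)), 2):
        pair = (features[i], features[j])
        if source_ids[i] == source_ids[j]:
            same.append(pair)
        else:
            different.append(pair)
    # Assumed behaviour: cap the different-source pairs by random subsampling.
    if different_source_limit is not None and len(different) > different_source_limit:
        different = rng.sample(different, different_source_limit)
    pairs = same + different
    labels = [1] * len(same) + [0] * len(different)  # 1=same source, 0=different source
    return pairs, labels


features = [[0.1], [0.2], [0.9], [1.0]]
source_ids = ["a", "a", "b", "b"]
pairs, labels = pair_instances(features, source_ids, different_source_limit=2, seed=0)
print(len(pairs), labels)  # 4 [1, 1, 0, 0]
```

In this sketch every candidate pair is enumerated before the limit is applied, so the work grows quadratically with the number of instances even when the output is small.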

class lir.transform.pairing.PairingMethod

Bases: ABC

Base class for pairing methods.

A pairing method should implement the pair() function.

abstractmethod pair(instances: InstanceData, n_trace_instances: int = 1, n_ref_instances: int = 1) PairedFeatureData

Takes instances as input, and returns pairs.

A pair may be a pair of sources, with multiple instances per source.

The returned features have dimensions (p, i, …), where the first dimension is the pairs, the second dimension is the instances, and subsequent dimensions are the features. If the input has labels, the returned labels are an array of source labels, one label per pair, where the labels are 0=different source, 1=same source. Any other attributes are combined into tuples.

Parameters:
  • instances – An array of instance features, with one row per instance.

  • n_trace_instances – Number of instances per trace.

  • n_ref_instances – Number of instances per reference source.

Returns:

instance pairs

class lir.transform.pairing.SourcePairing(same_source_limit: int | None = None, different_source_limit: int | None = None, ratio_limit: int | None = None, seed: Any | int = None)

Bases: PairingMethod

Construct pairs of sources (i.e. classes) from an array of instances.

While pairing at instance level results in pairs of instances, some same-source and some different-source, pairing at source level results in pairing of multiple instances of source A against multiple instances of source B, where A and B can be same-source or different-source.

pair(instances: InstanceData, n_trace_instances: int = 1, n_ref_instances: int = 1) PairedFeatureData

Pair sources.

Takes a FeatureData object that contains instances. Returns pairs as a PairedFeatureData object.

The input is expected to have source_ids that govern how pairs are compiled. Each input instance may be used in a pair, either as a trace instance or as a reference instance.

Parameters:
  • instances – the set of instances to be paired

  • n_trace_instances – the number of trace instances in each pair

  • n_ref_instances – the number of reference instances in each pair

Returns:

paired instances
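Source-level pairing can be sketched in plain Python as follows. This is not the lir implementation: the function pair_sources is hypothetical, it picks the first available instances of each source deterministically, and it assumes that the trace instances come first along the instance dimension.

```python
from collections import defaultdict
from itertools import combinations


def pair_sources(features, source_ids, n_trace_instances=1, n_ref_instances=1):
    """Hypothetical helper: build one pair per source combination."""
    by_source = defaultdict(list)
    for row, source in zip(features, source_ids):
        by_source[source].append(row)

    pairs, labels = [], []
    needed = n_trace_instances + n_ref_instances
    # Same-source pairs: trace and reference instances come from one source,
    # and no instance is used twice within a pair.
    for rows in by_source.values():
        if len(rows) >= needed:
            pairs.append(rows[:needed])
            labels.append(1)
    # Different-source pairs: trace instances from one source, references from another.
    for (_, rows_a), (_, rows_b) in combinations(by_source.items(), 2):
        if len(rows_a) >= n_trace_instances and len(rows_b) >= n_ref_instances:
            pairs.append(rows_a[:n_trace_instances] + rows_b[:n_ref_instances])
            labels.append(0)
    return pairs, labels


features = [[0.1], [0.2], [0.3], [0.9], [1.0], [1.1]]
source_ids = ["a", "a", "a", "b", "b", "b"]
pairs, labels = pair_sources(features, source_ids, n_trace_instances=1, n_ref_instances=2)
print(labels)    # [1, 1, 0]
print(pairs[2])  # [[0.1], [0.9], [1.0]]
```

Each pair here holds three instances (one trace plus two references), so the nested result has dimensions (p, i, f) = (3, 3, 1).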

lir.transform.pipeline module

class lir.transform.pipeline.LoggingPipeline(steps: list[tuple[str, Transformer | Any]], output_file: PathLike, include_batch_number: bool = True, include_labels: bool = True, include_fields: list[str] | None = None, include_steps: list[str] | None = None, include_input: bool = True)

Bases: Pipeline

A pipeline that writes debugging output to a CSV file.

apply(instances: InstanceData) InstanceData

Apply the pipeline to the incoming instances.

class lir.transform.pipeline.Pipeline(steps: list[tuple[str, Transformer | Any]])

Bases: Transformer

A pipeline of processing modules.

A module may be a scikit-learn style transformer, a scikit-learn style estimator, or a LIR Transformer.

apply(instances: InstanceData) InstanceData

Apply the fitted model on the instance data.

fit(instances: InstanceData) Self

Fit the model on the instance data.

fit_apply(instances: InstanceData) InstanceData

Combine fitting the transformer/estimator and applying the model to the instances.
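The relationship between fit(), apply() and fit_apply() in a chain of named steps can be sketched as below. The classes Scale, Square and SketchPipeline are hypothetical stand-ins operating on plain lists; they are not lir classes, and the real Pipeline may differ in detail.

```python
class Scale:
    """Hypothetical step: divides by the maximum value seen during fit()."""

    def fit(self, values):
        self.maximum = max(values)
        return self

    def apply(self, values):
        return [v / self.maximum for v in values]


class Square:
    """Hypothetical stateless step."""

    def fit(self, values):
        return self

    def apply(self, values):
        return [v * v for v in values]


class SketchPipeline:
    def __init__(self, steps):
        self.steps = steps  # list of (name, step) tuples

    def fit_apply(self, values):
        # Each step is fitted on the output of the previous step.
        for _name, step in self.steps:
            values = step.fit(values).apply(values)
        return values

    def fit(self, values):
        self.fit_apply(values)
        return self

    def apply(self, values):
        # Uses the state learned during fitting; nothing is refitted.
        for _name, step in self.steps:
            values = step.apply(values)
        return values


pipeline = SketchPipeline([("scale", Scale()), ("square", Square())])
train_out = pipeline.fit_apply([1.0, 2.0, 4.0])
test_out = pipeline.apply([2.0])
print(train_out)  # [0.0625, 0.25, 1.0]
print(test_out)   # [0.25]
```

The second call reuses the maximum learned from the first, which is the reason to keep fitting and applying as separate operations.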

lir.transform.pipeline.parse_steps(config: ContextAwareDict | None, output_dir: Path) list[tuple[str, Transformer]]

Parse the defined pipeline steps in the configuration and return the initialized modules as a list.

Module contents

class lir.transform.BinaryClassifierTransformer(estimator: SKLearnPipelineModule)

Bases: Transformer

Implementation of a binary classifier as a scikit-learn Pipeline step.

apply(instances: InstanceData) InstanceData

Convert instances by applying the fitted model.

fit(instances: InstanceData) Self

Fit the model on the provided instances.

class lir.transform.CsvWriter(path: Path, include: list[str] | None = None, header: list[str] | None = None, include_labels: bool = False, include_meta: bool = False, include_input: bool = True, include_batch: bool = False)

Bases: Transformer

Implementation of a transformation step in a scikit-learn Pipeline that writes to CSV.

This might be used to obtain temporary or intermediate results for logging or debugging purposes.

apply(instances: InstanceData) FeatureData

Write numpy feature vector to CSV output file.

fit_apply(instances: InstanceDataType) InstanceDataType

Provide required fit_apply() and return all instances.

Since the CsvWriter is implemented as a step (Transformer) in the pipeline, it should support the fit_apply method which is called on all transformers of the pipeline.

We don’t need to actually fit or transform anything, so we simply return the instances (as is).
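A pass-through step of this kind can be sketched with the standard csv module. The class PassThroughCsvStep is hypothetical and not part of lir; it writes to any text stream, and it assumes that fit_apply() returns the rows untouched without writing, which is one reading of the description above.

```python
import csv
import io


class PassThroughCsvStep:
    """Hypothetical step: writes rows to CSV and hands them on unchanged."""

    def __init__(self, stream, header=None):
        self.writer = csv.writer(stream, lineterminator="\n")
        if header is not None:
            self.writer.writerow(header)

    def apply(self, rows):
        for row in rows:
            self.writer.writerow(row)
        return rows  # the data itself is not modified

    def fit_apply(self, rows):
        # Nothing to fit or transform; assumed here not to write either.
        return rows


buffer = io.StringIO()
step = PassThroughCsvStep(buffer, header=["x", "y"])
rows = [[1, 2], [3, 4]]
returned = step.apply(rows)
print(buffer.getvalue())  # x,y / 1,2 / 3,4 on separate lines
```

Because the step returns its input, it can be placed between any two steps of a pipeline to inspect intermediate values.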

class lir.transform.DataWriter(*args, **kwargs)

Bases: Protocol

Representation of a data writer and necessary methods.

writerow(row: Any) None

Write row to output.

class lir.transform.FunctionTransformer(func: Callable)

Bases: Transformer

Implementation of a transformer function as a scikit-learn Pipeline step.

apply(instances: InstanceData) FeatureData

Call the custom defined function on the feature data instances and use output as features.
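Wrapping a plain function as a step can be sketched like this. FunctionStep and root_features are hypothetical names, and nested lists stand in for the feature array; the sketch does not use lir.

```python
import math


class FunctionStep:
    """Hypothetical step: the output of func becomes the new features."""

    def __init__(self, func):
        self.func = func

    def fit(self, features):
        return self  # a plain function has nothing to fit

    def apply(self, features):
        return self.func(features)


def root_features(features):
    """Example function: element-wise square root."""
    return [[math.sqrt(v) for v in row] for row in features]


step = FunctionStep(root_features)
result = step.fit([[1.0]]).apply([[4.0, 9.0], [16.0, 25.0]])
print(result)  # [[2.0, 3.0], [4.0, 5.0]]
```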

class lir.transform.Identity

Bases: Transformer

Represent the Identity function of a transformer.

When apply() is called on such a transformer, it simply returns the instances.

apply(instances: InstanceDataType) InstanceDataType

Simply provide the instances.

class lir.transform.NumpyTransformer(transformer: Transformer, header: list[str] | None)

Bases: TransformerWrapper

Implementation of a transformer wrapper.

apply(instances: InstanceData) InstanceData

Extend the instances with the desired header data, call base apply.

fit_apply(instances: InstanceData) InstanceData

Extend the instances with the desired header data, call base fit_apply.

class lir.transform.SKLearnPipelineModule(*args, **kwargs)

Bases: Protocol

Representation of the interface required for estimators by the scikit-learn Pipeline.

fit(X: ndarray, y: ndarray | None) Self

predict_proba(X: ndarray) Any

transform(X: ndarray) Any

class lir.transform.SklearnTransformer(transformer: SklearnTransformerType)

Bases: Transformer

Implementation of a scikit-learn style transformer as a scikit-learn Pipeline step.

apply(instances: InstanceData) InstanceData

Convert instances by applying the fitted model.

fit(instances: InstanceData) Self

Fit the model on the provided instances.

fit_apply(instances: InstanceData) FeatureData

Combine call to .fit() followed by .apply().

class lir.transform.SklearnTransformerType(*args, **kwargs)

Bases: Protocol

Representation of the interface required for transformers by the scikit-learn Pipeline.

fit(features: ndarray, labels: ndarray | None) Self

fit_transform(features: ndarray, labels: ndarray | None) ndarray

transform(features: ndarray) Any

class lir.transform.Tee(transformers: list[Transformer])

Bases: Transformer

Implementation of a custom transformer that performs several separate tasks on a given input.

apply(instances: InstanceData) InstanceData

Delegate apply() to all specified transformers.

fit(instances: InstanceData) Self

Delegate fit() to all specified transformers.
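The delegation can be sketched as follows. Recorder and TeeSketch are hypothetical classes, not lir code, and the sketch assumes that apply() hands back the original input unchanged, in the manner of the Unix tee command; the return value of the real Tee is not stated here.

```python
class Recorder:
    """Hypothetical side task: remembers what it was fitted on and applied to."""

    def __init__(self):
        self.fitted = False
        self.seen = []

    def fit(self, values):
        self.fitted = True
        return self

    def apply(self, values):
        self.seen.append(list(values))
        return values


class TeeSketch:
    def __init__(self, transformers):
        self.transformers = transformers

    def fit(self, values):
        for transformer in self.transformers:
            transformer.fit(values)
        return self

    def apply(self, values):
        for transformer in self.transformers:
            transformer.apply(values)  # every transformer sees the same input
        return values  # assumption: the input is passed on unchanged


first, second = Recorder(), Recorder()
tee = TeeSketch([first, second])
output = tee.fit([1, 2]).apply([3, 4])
print(output, first.seen, second.seen)  # [3, 4] [[3, 4]] [[3, 4]]
```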

class lir.transform.Transformer

Bases: ABC

Transformer module which is compatible with the scikit-learn Pipeline.

A transformer should provide an apply() method. By default, a transformer is not fitted to the data, so fit() simply returns the object it was called on, without side effects.

abstractmethod apply(instances: InstanceData) InstanceData

Convert the instance data based on the (optionally fitted) model.

fit(instances: InstanceData) Self

Perform (optional) fitting of the instance data.

fit_apply(instances: InstanceData) InstanceData

Combine call to fit() with directly following call to apply().
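The contract described above (an abstract apply(), a default fit() that returns the object itself, and fit_apply() chaining the two) can be sketched with the standard abc module. SketchTransformer and AddOne are hypothetical names and plain lists stand in for InstanceData; this is not the lir source.

```python
from abc import ABC, abstractmethod


class SketchTransformer(ABC):
    @abstractmethod
    def apply(self, instances):
        """Convert the instances; subclasses must implement this."""

    def fit(self, instances):
        return self  # default: nothing to fit, no side effects

    def fit_apply(self, instances):
        return self.fit(instances).apply(instances)


class AddOne(SketchTransformer):
    """Hypothetical stateless transformer: only apply() is needed."""

    def apply(self, instances):
        return [x + 1 for x in instances]


step = AddOne()
result = step.fit_apply([1, 2, 3])
print(result)  # [2, 3, 4]
```

A subclass that does learn something from the data would override fit() to store that state and still return self, so that fit_apply() keeps working unchanged.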

class lir.transform.TransformerWrapper(wrapped_transformer: Transformer)

Bases: Transformer

Base class for a transformer wrapper.

This class is derived from Transformer and has a default implementation of all functions by forwarding the call to the wrapped transformer. A subclass may add or change functionality by overriding functions.

apply(instances: InstanceData) InstanceData

Delegate the call to the underlying wrapped transformer and return its output.

fit(instances: InstanceData) Self

Delegate calls to underlying wrapped transformer but return the Wrapper instance.
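The forwarding pattern can be sketched as below. Doubler, WrapperSketch and CountingWrapper are hypothetical names used only for illustration and are not lir classes. The point shown is that fit() forwards to the wrapped object but returns the wrapper, so a subclass that overrides apply() stays in control after fitting.

```python
class Doubler:
    """Hypothetical wrapped transformer."""

    def fit(self, values):
        self.fitted = True
        return self

    def apply(self, values):
        return [2 * v for v in values]


class WrapperSketch:
    def __init__(self, wrapped_transformer):
        self.wrapped_transformer = wrapped_transformer

    def fit(self, values):
        self.wrapped_transformer.fit(values)
        return self  # the wrapper, not the wrapped object

    def apply(self, values):
        return self.wrapped_transformer.apply(values)


class CountingWrapper(WrapperSketch):
    """Hypothetical subclass: adds behaviour by overriding one method."""

    def __init__(self, wrapped_transformer):
        super().__init__(wrapped_transformer)
        self.apply_calls = 0

    def apply(self, values):
        self.apply_calls += 1
        return super().apply(values)


wrapper = CountingWrapper(Doubler())
fitted = wrapper.fit([1, 2])
result = fitted.apply([1, 2])
print(result, wrapper.apply_calls)  # [2, 4] 1
```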

lir.transform.as_transformer(transformer_like: Any) Transformer

Return a Transformer instance for the given transformer-like input.

For any transformer-like object, wrap if necessary, and return a Transformer.

The transformer-like object may be one of the following:
  • an instance of Transformer, which is returned as-is;

  • a scikit-learn style transformer, which implements transform() and optionally fit() and/or fit_transform();

  • a scikit-learn style estimator, which implements fit() and predict_proba(); or

  • a callable which takes an np.ndarray argument and returns another np.ndarray.

Parameters:

transformer_like – the object to wrap or return

Returns:

a Transformer instance
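The dispatch over the four kinds of transformer-like objects can be sketched with duck typing. All names below are hypothetical and the code is not the lir implementation: the order of the checks is an assumption, labels are left out for brevity, and plain lists stand in for arrays.

```python
class SketchTransformer:
    """Hypothetical minimal base class standing in for Transformer."""

    def fit(self, data):
        return self

    def apply(self, data):
        raise NotImplementedError


class TransformAdapter(SketchTransformer):
    """Wraps an object with transform() and, optionally, fit()."""

    def __init__(self, inner):
        self.inner = inner

    def fit(self, data):
        if hasattr(self.inner, "fit"):
            self.inner.fit(data)
        return self

    def apply(self, data):
        return self.inner.transform(data)


class EstimatorAdapter(SketchTransformer):
    """Wraps an object with fit() and predict_proba()."""

    def __init__(self, inner):
        self.inner = inner

    def fit(self, data):
        self.inner.fit(data)
        return self

    def apply(self, data):
        return self.inner.predict_proba(data)


class CallableAdapter(SketchTransformer):
    """Wraps a plain function."""

    def __init__(self, func):
        self.func = func

    def apply(self, data):
        return self.func(data)


def as_transformer_sketch(transformer_like):
    if isinstance(transformer_like, SketchTransformer):
        return transformer_like  # returned as-is
    if hasattr(transformer_like, "transform"):
        return TransformAdapter(transformer_like)
    if hasattr(transformer_like, "fit") and hasattr(transformer_like, "predict_proba"):
        return EstimatorAdapter(transformer_like)
    if callable(transformer_like):
        return CallableAdapter(transformer_like)
    raise TypeError(f"not transformer-like: {transformer_like!r}")


class Halver:
    """Hypothetical scikit-learn style transformer without fit()."""

    def transform(self, data):
        return [v / 2 for v in data]


existing = CallableAdapter(abs)
from_transformer = as_transformer_sketch(Halver())
from_callable = as_transformer_sketch(lambda data: [v + 1 for v in data])
print(as_transformer_sketch(existing) is existing)  # True
print(from_transformer.fit([4.0]).apply([4.0]))     # [2.0]
print(from_callable.apply([1, 2]))                  # [2, 3]
```

Checking for an existing instance first means already-wrapped objects are never wrapped twice.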