lir.transform package
Submodules
lir.transform.distance module
- class lir.transform.distance.ElementWiseDifference
Bases: Transformer
Calculate the element-wise absolute difference between pairs.
Takes an array of sample pairs and returns the element-wise absolute difference.
- Expects:
a PairedFeatureData object with n_trace_instances=1 and n_ref_instances=1
- Return type:
a copy of the FeatureData object with features of shape (n, f)
- apply(instances: InstanceData) FeatureData
Calculate the absolute difference between all elements in the instance data (pairs).
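The computation described above can be sketched with plain Python lists instead of the library's PairedFeatureData class. The function name `elementwise_abs_difference` and the nested-list layout of shape (n, 2, f) are assumptions made for illustration only, not part of the lir API.

```python
def elementwise_abs_difference(pairs):
    """Return |trace - ref| per feature for each pair.

    `pairs` is a nested list of shape (n, 2, f): n pairs, two sides per pair,
    f features per side (assumed layout). The result has shape (n, f).
    """
    return [[abs(a - b) for a, b in zip(trace, ref)] for trace, ref in pairs]


pairs = [
    [[1.0, 5.0], [4.0, 3.0]],  # pair 0: trace side, reference side
    [[2.0, 2.0], [2.0, 7.0]],  # pair 1
]
differences = elementwise_abs_difference(pairs)
print(differences)  # [[3.0, 2.0], [0.0, 5.0]]
```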
- class lir.transform.distance.ManhattanDistance
Bases: Transformer
Calculate the Manhattan distance between pairs.
Takes a PairedFeatureData object or a FeatureData object and returns the Manhattan distance.
If the input is a PairedFeatureData object, the distance is computed as the Manhattan distance, i.e. the sum of the element-wise absolute differences between both sides of each pair, over all features.
If the input is a FeatureData object, it is assumed that it contains the element-wise differences, and the sum over these differences is calculated.
- Returns:
a FeatureData object with features of shape (n, 1)
- apply(instances: InstanceData) FeatureData
Calculate the Manhattan distance between all elements in the instance data (pairs).
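The two input cases can be illustrated with a small sketch on plain lists. The helper names `manhattan_from_pairs` and `manhattan_from_differences` are hypothetical, and the nested-list shapes stand in for the library's data classes.

```python
def manhattan_from_pairs(pairs):
    """Paired input of shape (n, 2, f) -> distances of shape (n, 1)."""
    return [[sum(abs(a - b) for a, b in zip(trace, ref))] for trace, ref in pairs]


def manhattan_from_differences(differences):
    """Precomputed element-wise differences of shape (n, f) -> shape (n, 1)."""
    return [[sum(row)] for row in differences]


pairs = [[[1.0, 5.0], [4.0, 3.0]], [[2.0, 2.0], [2.0, 7.0]]]
from_pairs = manhattan_from_pairs(pairs)
from_differences = manhattan_from_differences([[3.0, 2.0], [0.0, 5.0]])
print(from_pairs)        # [[5.0], [5.0]]
print(from_differences)  # [[5.0], [5.0]]
```

Both routes give the same result here because the second input already holds the absolute differences of the first.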
lir.transform.pairing module
- class lir.transform.pairing.InstancePairing(same_source_limit=None, different_source_limit=None, ratio_limit=None, seed=None)
Bases: PairingMethod
Construct pairs from a set of instances.
Note that this pairing method may cause performance problems with large datasets, even if the number of instances in the output is limited.
- pair(instances: InstanceData, n_trace_instances: int = 1, n_ref_instances: int = 1) PairedFeatureData
Construct pairs.
- Parameters:
instances – the set of instances to be paired
n_trace_instances – the number of trace instances in each pair (must be 1 for this pairing method)
n_ref_instances – the number of reference instances in each pair (must be 1 for this pairing method)
- Returns:
paired instances
- property rng
Obtain a random number generator using a provided seed.
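A rough sketch of instance-level pairing is shown here to make the idea concrete. The function `pair_instances`, its arguments, and the way the limits are applied (random down-sampling after enumerating all candidate pairs) are assumptions for illustration; the actual InstancePairing implementation may differ. Labels follow the convention 1 = same source, 0 = different source.

```python
import itertools
import random


def pair_instances(features, source_ids, same_source_limit=None,
                   different_source_limit=None, seed=None):
    """Pair every instance with every other instance, then apply optional limits."""
    rng = random.Random(seed)  # seeded generator for reproducible sampling
    same, different = [], []
    # Enumerating all candidate pairs is quadratic in the number of instances.
    for i, j in itertools.combinations(range(len(features)), 2):
        pair = (features[i], features[j])
        (same if source_ids[i] == source_ids[j] else different).append(pair)
    if same_source_limit is not None and len(same) > same_source_limit:
        same = rng.sample(same, same_source_limit)
    if different_source_limit is not None and len(different) > different_source_limit:
        different = rng.sample(different, different_source_limit)
    labels = [1] * len(same) + [0] * len(different)
    return same + different, labels


features = [[0.1], [0.2], [0.9], [1.0]]
source_ids = ["a", "a", "b", "b"]
pairs, labels = pair_instances(features, source_ids, different_source_limit=2, seed=0)
print(labels)  # [1, 1, 0, 0]
```

In this sketch all candidate pairs are built before any limit is applied, so the cost grows quadratically with the input even when the output is small.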
- class lir.transform.pairing.PairingMethod
Bases: ABC
Base class for pairing methods.
A pairing method should implement the pair() function.
- abstractmethod pair(instances: InstanceData, n_trace_instances: int = 1, n_ref_instances: int = 1) PairedFeatureData
Takes instances as input, and returns pairs.
A pair may be a pair of sources, with multiple instances per source.
The returned features have dimensions (p, i, …), where the first dimension is the pairs, the second dimension is the instances, and subsequent dimensions are the features. If the input has labels, the returned labels are an array of source labels, one label per pair, where the labels are 0=different source, 1=same source. Any other attributes are combined into tuples.
- Parameters:
instances – An array of instance features, with one row per instance.
n_trace_instances – Number of instances per trace.
n_ref_instances – Number of instances per reference source.
- Returns:
instance pairs
- class lir.transform.pairing.SourcePairing(same_source_limit: int | None = None, different_source_limit: int | None = None, ratio_limit: int | None = None, seed: Any | int = None)
Bases: PairingMethod
Construct pairs of sources (i.e. classes) from an array of instances.
While pairing at instance level results in pairs of instances, some same-source and some different-source, pairing at source level results in pairing of multiple instances of source A against multiple instances of source B, where A and B can be same-source or different-source.
- pair(instances: InstanceData, n_trace_instances: int = 1, n_ref_instances: int = 1) PairedFeatureData
Pair sources.
Takes a FeatureData object that contains instances. Returns pairs as a PairedFeatureData object.
The input is expected to have source_ids, which govern how pairs are compiled. Each input instance may be used in a pair either as a trace instance or as a reference instance.
- Parameters:
instances – the set of instances to be paired
n_trace_instances – the number of trace instances in each pair
n_ref_instances – the number of reference instances in each pair
- Returns:
paired instances
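Source-level pairing can be sketched as follows. The function `pair_sources` is hypothetical, and its selection rule (take the first available instances of each source, skip sources with too few instances) is an assumption chosen to keep the example deterministic; the library may sample or limit pairs differently. Each returned pair holds n_trace_instances + n_ref_instances rows, so the result has shape (p, i, f).

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def pair_sources(features, source_ids, n_trace_instances=1, n_ref_instances=1):
    """Pair instances of source A (trace side) with instances of source B (reference side)."""
    by_source = defaultdict(list)
    for row, source in zip(features, source_ids):
        by_source[source].append(row)

    pairs, labels = [], []
    for a, b in combinations_with_replacement(sorted(by_source), 2):
        if a == b:
            rows = by_source[a]
            if len(rows) < n_trace_instances + n_ref_instances:
                continue  # not enough instances to fill both sides
            trace = rows[:n_trace_instances]
            ref = rows[n_trace_instances:n_trace_instances + n_ref_instances]
        else:
            if len(by_source[a]) < n_trace_instances or len(by_source[b]) < n_ref_instances:
                continue
            trace = by_source[a][:n_trace_instances]
            ref = by_source[b][:n_ref_instances]
        pairs.append(trace + ref)
        labels.append(1 if a == b else 0)  # 1 = same source, 0 = different source
    return pairs, labels


features = [[1.0], [1.1], [1.2], [5.0], [5.1], [5.2]]
source_ids = ["a", "a", "a", "b", "b", "b"]
pairs, labels = pair_sources(features, source_ids, n_trace_instances=1, n_ref_instances=2)
print(labels)    # [1, 0, 1]
print(pairs[1])  # [[1.0], [5.0], [5.1]]
```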
lir.transform.pipeline module
- class lir.transform.pipeline.LoggingPipeline(steps: list[tuple[str, Transformer | Any]], output_file: PathLike, include_batch_number: bool = True, include_labels: bool = True, include_fields: list[str] | None = None, include_steps: list[str] | None = None, include_input: bool = True)
Bases: Pipeline
A pipeline that writes debugging output to a CSV file.
- apply(instances: InstanceData) InstanceData
Apply the pipeline to the incoming instances.
- class lir.transform.pipeline.Pipeline(steps: list[tuple[str, Transformer | Any]])
Bases: Transformer
A pipeline of processing modules.
A module may be a scikit-learn style transformer, estimator, or a LIR Transformer.
- apply(instances: InstanceData) InstanceData
Apply the fitted model on the instance data.
- fit(instances: InstanceData) Self
Fit the model on the instance data.
- fit_apply(instances: InstanceData) InstanceData
Combine fitting the transformer/estimator and applying the model to the instances.
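The fit/apply/fit_apply flow of a pipeline can be illustrated with a minimal stand-in. `MiniPipeline`, `Center`, and `Absolute` are hypothetical classes operating on plain lists; they only mirror the method names documented above and are not the library's implementation.

```python
class Center:
    """Fitted step: learns per-column means and subtracts them."""

    def fit(self, rows):
        n = len(rows)
        self.means = [sum(col) / n for col in zip(*rows)]
        return self

    def apply(self, rows):
        return [[x - m for x, m in zip(row, self.means)] for row in rows]


class Absolute:
    """Stateless step: fit() does nothing."""

    def fit(self, rows):
        return self

    def apply(self, rows):
        return [[abs(x) for x in row] for row in rows]


class MiniPipeline:
    def __init__(self, steps):
        self.steps = steps  # list of (name, step) tuples

    def fit(self, rows):
        # Each step is fitted on the output of the previous steps.
        for _name, step in self.steps:
            rows = step.fit(rows).apply(rows)
        return self

    def apply(self, rows):
        for _name, step in self.steps:
            rows = step.apply(rows)
        return rows

    def fit_apply(self, rows):
        return self.fit(rows).apply(rows)


pipeline = MiniPipeline([("center", Center()), ("abs", Absolute())])
result = pipeline.fit_apply([[1.0, 10.0], [3.0, 30.0]])
print(result)  # [[1.0, 10.0], [1.0, 10.0]]
```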
- lir.transform.pipeline.parse_steps(config: ContextAwareDict | None, output_dir: Path) list[tuple[str, Transformer]]
Parse the pipeline steps defined in the configuration and return the initialized modules as a list.
Module contents
- class lir.transform.BinaryClassifierTransformer(estimator: SKLearnPipelineModule)
Bases: Transformer
Implementation of a binary classifier as a scikit-learn Pipeline step.
- apply(instances: InstanceData) InstanceData
Convert instances by applying the fitted model.
- fit(instances: InstanceData) Self
Fit the model on the provided instances.
- class lir.transform.CsvWriter(path: Path, include: list[str] | None = None, header: list[str] | None = None, include_labels: bool = False, include_meta: bool = False, include_input: bool = True, include_batch: bool = False)
Bases: Transformer
Implementation of a transformation step in a scikit-learn Pipeline that writes to CSV.
This might be used to obtain temporary or intermediate results for logging or debugging purposes.
- apply(instances: InstanceData) FeatureData
Write numpy feature vector to CSV output file.
- fit_apply(instances: InstanceDataType) InstanceDataType
Provide required fit_apply() and return all instances.
Since the CsvWriter is implemented as a step (Transformer) in the pipeline, it should support the fit_apply method which is called on all transformers of the pipeline.
We don’t need to actually fit or transform anything, so we simply return the instances (as is).
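The pass-through behaviour described above (write the rows, then return the instances unchanged) can be sketched with the standard `csv` module. The function `write_rows_passthrough` and its parameters are hypothetical and much simpler than the CsvWriter constructor documented here.

```python
import csv
import io


def write_rows_passthrough(rows, stream, header=None):
    """Write feature rows to a CSV stream and return the rows unchanged."""
    writer = csv.writer(stream, lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return rows  # nothing is fitted or transformed


buffer = io.StringIO()  # in-memory stand-in for an output file
rows = [[0.5, 1.5], [2.0, 3.0]]
returned = write_rows_passthrough(rows, buffer, header=["f0", "f1"])
print(buffer.getvalue())
```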
- class lir.transform.DataWriter(*args, **kwargs)
Bases: Protocol
Representation of a data writer and necessary methods.
- writerow(row: Any) None
Write row to output.
- class lir.transform.FunctionTransformer(func: Callable)
Bases: Transformer
Implementation of a transformer function as scikit-learn Pipeline step.
- apply(instances: InstanceData) FeatureData
Call the custom function on the feature data instances and use its output as features.
- class lir.transform.Identity
Bases: Transformer
Represent the Identity function of a transformer.
When apply() is called on such a transformer, it simply returns the instances.
- apply(instances: InstanceDataType) InstanceDataType
Simply provide the instances.
- class lir.transform.NumpyTransformer(transformer: Transformer, header: list[str] | None)
Bases: TransformerWrapper
Implementation of a transformer wrapper.
- apply(instances: InstanceData) InstanceData
Extend the instances with the desired header data, call base apply.
- fit_apply(instances: InstanceData) InstanceData
Extend the instances with the desired header data, call base fit_apply.
- class lir.transform.SKLearnPipelineModule(*args, **kwargs)
Bases: Protocol
Representation of the interface required for estimators by the scikit-learn Pipeline.
- fit(X: ndarray, y: ndarray | None) Self
- predict_proba(X: ndarray) Any
- transform(X: ndarray) Any
- class lir.transform.SklearnTransformer(transformer: SklearnTransformerType)
Bases: Transformer
Implementation of a scikit-learn style transformer as a Pipeline step.
- apply(instances: InstanceData) InstanceData
Convert instances by applying the fitted model.
- fit(instances: InstanceData) Self
Fit the model on the provided instances.
- fit_apply(instances: InstanceData) FeatureData
Combine call to .fit() followed by .apply().
- class lir.transform.SklearnTransformerType(*args, **kwargs)
Bases: Protocol
Representation of the interface required for transformers by the scikit-learn Pipeline.
- fit(features: ndarray, labels: ndarray | None) Self
- fit_transform(features: ndarray, labels: ndarray | None) ndarray
- transform(features: ndarray) Any
- class lir.transform.Tee(transformers: list[Transformer])
Bases: Transformer
A custom transformer that allows two separate tasks to be performed on a given input.
- apply(instances: InstanceData) InstanceData
Delegate apply() to all specified transformers.
- fit(instances: InstanceData) Self
Delegate fit() to all specified transformers.
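The delegation pattern can be sketched as below. `SketchTee` and `Recorder` are hypothetical classes, and returning the input unchanged from apply() is an assumption; the documentation above does not state what Tee.apply() returns.

```python
class Recorder:
    """Hypothetical side-effect step that remembers what it was given."""

    def __init__(self):
        self.seen = []

    def fit(self, instances):
        return self

    def apply(self, instances):
        self.seen.append(list(instances))
        return instances


class SketchTee:
    def __init__(self, transformers):
        self.transformers = transformers

    def fit(self, instances):
        for transformer in self.transformers:
            transformer.fit(instances)
        return self

    def apply(self, instances):
        for transformer in self.transformers:
            transformer.apply(instances)  # every branch sees the same input
        return instances  # assumption: the input is passed through unchanged


first, second = Recorder(), Recorder()
tee = SketchTee([first, second])
output = tee.fit([[1, 2]]).apply([[1, 2]])
print(first.seen, second.seen)  # [[[1, 2]]] [[[1, 2]]]
```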
- class lir.transform.Transformer
Bases: ABC
Transformer module which is compatible with the scikit-learn Pipeline.
The transformer should provide an apply() method. Since transformers are not necessarily fitted to the data, the default fit() simply returns the object it was called upon, without side effects.
- abstractmethod apply(instances: InstanceData) InstanceData
Convert the instance data based on the (optionally fitted) model.
- fit(instances: InstanceData) Self
Perform (optional) fitting of the instance data.
- fit_apply(instances: InstanceData) InstanceData
Combine call to fit() with directly following call to apply().
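The contract described above (abstract apply(), a no-op fit() that returns the object itself, and fit_apply() chaining the two) can be sketched with Python's `abc` module. `SketchTransformer` and `Doubler` are hypothetical names, and plain lists stand in for InstanceData.

```python
from abc import ABC, abstractmethod


class SketchTransformer(ABC):
    def fit(self, instances):
        # Default: nothing to fit, return the object itself.
        return self

    @abstractmethod
    def apply(self, instances):
        """Convert the instances; subclasses must implement this."""

    def fit_apply(self, instances):
        return self.fit(instances).apply(instances)


class Doubler(SketchTransformer):
    def apply(self, instances):
        return [[2 * x for x in row] for row in instances]


doubled = Doubler().fit_apply([[1, 2], [3, 4]])
print(doubled)  # [[2, 4], [6, 8]]
```

Because apply() is abstract, instantiating `SketchTransformer` directly raises a TypeError; only subclasses that implement it can be created.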
- class lir.transform.TransformerWrapper(wrapped_transformer: Transformer)
Bases: Transformer
Base class for a transformer wrapper.
This class is derived from Transformer and has a default implementation of all functions by forwarding the call to the wrapped transformer. A subclass may add or change functionality by overriding functions.
- apply(instances: InstanceData) InstanceData
Delegate the call to the underlying wrapped transformer and return its result.
- fit(instances: InstanceData) Self
Delegate calls to underlying wrapped transformer but return the Wrapper instance.
- lir.transform.as_transformer(transformer_like: Any) Transformer
Return a Transformer instance for the given transformer-like input.
For any transformer-like object, wrap if necessary, and return a Transformer.
The transformer-like object may be one of the following:
- an instance of Transformer, which is returned as-is;
- a scikit-learn style transformer which implements transform() and optionally fit() and/or fit_transform();
- a scikit-learn style estimator, which implements fit() and predict_proba(); or
- a callable which takes an np.ndarray argument and returns another np.ndarray.
- Parameters:
transformer_like – the object to wrap or return
- Returns:
a Transformer instance
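The four cases listed above suggest a dispatch on the capabilities of the object. The sketch below shows one way such a dispatch could look; `ApplyAdapter` and `as_transformer_sketch` are hypothetical, use duck typing on plain Python objects, and do not reproduce the wrapper classes the library actually returns.

```python
class ApplyAdapter:
    """Hypothetical adapter exposing fit()/apply() around plain callables."""

    def __init__(self, apply_func, fit_func=None):
        self._apply = apply_func
        self._fit = fit_func

    def fit(self, features, labels=None):
        if self._fit is not None:
            self._fit(features, labels)
        return self

    def apply(self, features):
        return self._apply(features)


def as_transformer_sketch(obj):
    if isinstance(obj, ApplyAdapter):
        return obj  # already a transformer: returned as-is
    if hasattr(obj, "transform"):
        return ApplyAdapter(obj.transform, getattr(obj, "fit", None))
    if hasattr(obj, "fit") and hasattr(obj, "predict_proba"):
        return ApplyAdapter(obj.predict_proba, obj.fit)
    if callable(obj):
        return ApplyAdapter(obj)
    raise TypeError(f"not transformer-like: {obj!r}")


negate = as_transformer_sketch(lambda rows: [[-x for x in row] for row in rows])
print(negate.fit([[1, 2]]).apply([[1, 2]]))  # [[-1, -2]]
```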