lir.transform.pipeline module

class lir.transform.pipeline.LoggingPipeline(steps: list[tuple[str, Transformer | Any]], output_file: PathLike, include_batch_number: bool = True, include_labels: bool = True, include_fields: list[str] | None = None, include_steps: list[str] | None = None, include_input: bool = True)[source]

Bases: Pipeline

A pipeline that writes debugging output to a CSV file.

This pipeline acts like a normal Pipeline, but writes a CSV file as a side effect. Depending on the settings and the data, the CSV file may have the following columns:

  • batch: if the data strategy yields multiple train/test splits, the batch value is the sequence number of the test set

  • label: the hypothesis label

In addition, there may be columns for the input features and the output of individual steps. These columns are named features<I> for input features, or <stepname><I> for step output, where <stepname> is replaced by the name of the step and <I> is the index of the feature value.
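The column layout can be illustrated with a minimal sketch. The helper below, the step name "clf", and the column widths are hypothetical, not part of the lir API; the sketch only reproduces the naming scheme described above:

```python
# Sketch of how the CSV header of a LoggingPipeline run is composed:
# optional "batch" and "label" columns, then "features<I>" for the input
# features, then "<stepname><I>" for each logged step's output values.

def column_names(n_features, step_outputs, include_batch=True, include_labels=True):
    """Build a hypothetical CSV header; step_outputs is a list of (name, width)."""
    columns = []
    if include_batch:
        columns.append("batch")
    if include_labels:
        columns.append("label")
    columns += [f"features{i}" for i in range(n_features)]
    for step_name, width in step_outputs:
        columns += [f"{step_name}{i}" for i in range(width)]
    return columns

# Two input features and a classifier step with a single output column:
print(column_names(2, [("clf", 1)]))
# → ['batch', 'label', 'features0', 'features1', 'clf0']
```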

Parameters:
  • steps (list[tuple[str, Transformer | Any]]) – Ordered transformer steps executed by this pipeline.

  • output_file (PathLike) – Destination file used to log intermediate pipeline output.

  • include_batch_number (bool) – Whether to include the batch number in logged output.

  • include_labels (bool) – Whether to include labels in logged output.

  • include_fields (list[str] | None) – Additional instance fields to include in logged output.

  • include_steps (list[str] | None) – Names of the steps whose output should be included in logged output.

  • include_input (bool) – Whether to include original inputs in logged output.

apply(instances: InstanceData) InstanceData[source]

Apply the pipeline to the incoming instances.

Parameters:

instances (InstanceData) – Input instances to be processed by this method.

Returns:

Instance data object produced by this operation.

Return type:

InstanceData

class lir.transform.pipeline.Pipeline(steps: list[tuple[str, Transformer | Any]])[source]

Bases: Transformer

A pipeline of processing modules.

Each step in the pipeline may be:

  • a scikit-learn style transformer (with fit() and transform() functions),

  • a scikit-learn style estimator (with fit() and predict_proba()), or

  • a LiR Transformer object.

Example:

from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from lir.transform.pipeline import Pipeline
from lir.algorithms.logistic_regression import LogitCalibrator
from lir.util import probability_to_odds

pipeline = Pipeline(steps=[
        ('scaler', StandardScaler()),       # a scikit-learn transformer, for scaling the data
        ('clf', RandomForestClassifier()),  # a scikit-learn estimator, to calculate pseudo-probabilities
        ('to_odds', probability_to_odds),   # a plain function, to convert probabilities to pseudo-LLRs
        ('calibrator', LogitCalibrator()),  # a LiR transformer, to calibrate the LLRs
])
Parameters:

steps (list[tuple[str, Transformer | Any]]) – Ordered transformer steps executed by this pipeline.

apply(instances: InstanceData) InstanceData[source]

Apply the fitted model on the instance data.

Parameters:

instances (InstanceData) – Input instances to be processed by this method.

Returns:

Instance data object produced by this operation.

Return type:

InstanceData

fit(instances: InstanceData) Self[source]

Fit the model on the instance data.

Parameters:

instances (InstanceData) – Input instances to be processed by this method.

Returns:

The fitted pipeline instance.

Return type:

Self

fit_apply(instances: InstanceData) InstanceData[source]

Combine fitting the transformer/estimator and applying the model to the instances.

Parameters:

instances (InstanceData) – Input instances to be processed by this method.

Returns:

Instance data object produced by this operation.

Return type:

InstanceData
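The relationship between the three methods can be sketched with a toy stand-in. The Scale class and the list-of-floats "instances" below are hypothetical substitutes for lir's Transformer and InstanceData; the sketch only shows that fit returns the fitted object (enabling chaining) and that fit_apply is the usual shorthand for fit followed by apply on the same data:

```python
class Scale:
    """Toy transformer: scales values by the maximum seen during fit."""

    def fit(self, instances):
        self.max_ = max(instances)
        return self  # returning self allows chaining: Scale().fit(xs).apply(xs)

    def apply(self, instances):
        return [x / self.max_ for x in instances]

    def fit_apply(self, instances):
        # Fit and apply in one call, as in the method documented above.
        return self.fit(instances).apply(instances)

data = [1.0, 2.0, 4.0]
assert Scale().fit_apply(data) == Scale().fit(data).apply(data) == [0.25, 0.5, 1.0]
```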

lir.transform.pipeline.logging_pipeline

alias of ConfigParserFunction