lir.transform.pipeline module
- class lir.transform.pipeline.LoggingPipeline(steps: list[tuple[str, Transformer | Any]], output_file: PathLike, include_batch_number: bool = True, include_labels: bool = True, include_fields: list[str] | None = None, include_steps: list[str] | None = None, include_input: bool = True)[source]
Bases: Pipeline

A pipeline that writes debugging output to a CSV file.

This pipeline acts like a normal Pipeline, but writes a CSV file as a side effect. Depending on the settings and the data, the CSV file may have the following columns:

- batch: if the data strategy yields multiple train/test splits, the batch value is the sequence number of the test set
- label: the hypothesis label

In addition, there may be columns for the input features and the output of individual steps. These columns are named featuresI for input features or stepnameI for step output, where stepname is replaced by the name of the step, and I is the index of the feature value.

- Parameters:
steps (list[tuple[str, Transformer | Any]]) – Ordered transformer steps executed by this pipeline.
output_file (PathLike) – Destination file used to log intermediate pipeline output.
include_batch_number (bool) – Whether to include the batch number in logged output.
include_labels (bool) – Whether to include labels in logged output.
include_fields (list[str] | None) – Additional instance fields to include in logged output.
include_steps (list[str] | None) – Names of the pipeline steps whose output to include in logged output.
include_input (bool) – Whether to include original inputs in logged output.
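To make the column naming scheme concrete, here is a minimal sketch of how such a header row could be built. The helper `column_names` is hypothetical, written for illustration only; it is not part of the lir API.

```python
# Hypothetical helper illustrating the column naming scheme described
# above: 'featuresI' for input features, '<stepname>I' for step output.
import csv
import io

def column_names(n_input, step_outputs):
    """Build a header row; step_outputs maps step name -> output width."""
    names = ['batch', 'label']
    names += [f'features{i}' for i in range(n_input)]
    for step, n in step_outputs.items():
        names += [f'{step}{i}' for i in range(n)]
    return names

buf = io.StringIO()
csv.writer(buf).writerow(column_names(2, {'scaler': 2, 'calibrator': 1}))
print(buf.getvalue().strip())
# batch,label,features0,features1,scaler0,scaler1,calibrator0
```

With two input features, a two-column `scaler` step and a one-column `calibrator` step, the header contains `features0`, `features1`, `scaler0`, `scaler1` and `calibrator0`.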
- apply(instances: InstanceData) InstanceData[source]
Apply the pipeline to the incoming instances.
- Parameters:
instances (InstanceData) – Input instances to be processed by this method.
- Returns:
Instance data object produced by this operation.
- Return type:
InstanceData
- class lir.transform.pipeline.Pipeline(steps: list[tuple[str, Transformer | Any]])[source]
Bases: Transformer

A pipeline of processing modules.

Each step in the pipeline may be:

- a scikit-learn style transformer (with fit() and transform() methods),
- a scikit-learn style estimator (with fit() and predict_proba()), or
- a LiR Transformer object.
Example:
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import RandomForestClassifier
    from lir.transform.pipeline import Pipeline
    from lir.algorithms.logistic_regression import LogitCalibrator
    from lir.util import probability_to_odds

    pipeline = Pipeline(steps=[
        ('scaler', StandardScaler()),       # a scikit-learn transformer, for scaling the data
        ('clf', RandomForestClassifier()),  # a scikit-learn estimator, to calculate pseudo-probabilities
        ('to_odds', probability_to_odds),   # a plain function, to convert probabilities to pseudo-LLRs
        ('calibrator', LogitCalibrator()),  # a LiR transformer, to calibrate the LLRs
    ])
- Parameters:
steps (list[tuple[str, Transformer | Any]]) – Ordered transformer steps executed by this pipeline.
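The three step types differ only in how they are invoked. The following self-contained sketch shows one way a pipeline could dispatch on each step's interface; it is an assumption about the mechanism, not the actual lir implementation, and `Doubler`, `ThresholdEstimator` and `apply_step` are hypothetical names.

```python
# Illustrative dispatch over the three step types (hypothetical, not lir code).

class Doubler:
    """Scikit-learn style transformer: fit() and transform()."""
    def fit(self, xs, ys=None):
        return self

    def transform(self, xs):
        return [2 * x for x in xs]

class ThresholdEstimator:
    """Scikit-learn style estimator: fit() and predict_proba()."""
    def fit(self, xs, ys=None):
        return self

    def predict_proba(self, xs):
        # toy rule: positive-class probability grows linearly with x
        return [(1 - min(x / 10, 1), min(x / 10, 1)) for x in xs]

def probability_to_odds(p):
    return p / (1 - p)

def apply_step(step, xs):
    """Dispatch on the step's interface, mirroring the three step types."""
    if hasattr(step, 'transform'):
        return step.transform(xs)
    if hasattr(step, 'predict_proba'):
        # keep only the positive-class probability
        return [probs[1] for probs in step.predict_proba(xs)]
    if callable(step):
        return [step(x) for x in xs]
    raise TypeError(f'unsupported step: {step!r}')

steps = [
    ('doubler', Doubler()),
    ('clf', ThresholdEstimator()),
    ('to_odds', probability_to_odds),
]
data = [1.0, 2.0]
for name, step in steps:
    data = apply_step(step, data)
print(data)  # odds per instance, e.g. 0.25 for a probability of 0.2
```

Duck-typing on `transform`, `predict_proba` and `callable` keeps the pipeline agnostic to where each step comes from, which is why scikit-learn objects, plain functions and LiR Transformers can be mixed freely.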
- apply(instances: InstanceData) InstanceData[source]
Apply the fitted model on the instance data.
- Parameters:
instances (InstanceData) – Input instances to be processed by this method.
- Returns:
Instance data object produced by this operation.
- Return type:
InstanceData
- fit(instances: InstanceData) Self[source]
Fit the model on the instance data.
- Parameters:
instances (InstanceData) – Input instances to be processed by this method.
- Returns:
The fitted pipeline instance.
- Return type:
Self
- fit_apply(instances: InstanceData) InstanceData[source]
Combine fitting the transformer/estimator and applying the model to the instances.
- Parameters:
instances (InstanceData) – Input instances to be processed by this method.
- Returns:
Instance data object produced by this operation.
- Return type:
InstanceData
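The fit/apply/fit_apply pattern can be sketched with a toy transformer. This is an assumption about the usual semantics of the pattern, not the lir implementation: `fit` returns `self` so calls can be chained, and `fit_apply` behaves like `fit` followed by `apply`.

```python
# Hypothetical toy transformer illustrating the fit/apply/fit_apply pattern.

class MeanCenterer:
    """Learns the mean in fit(), subtracts it in apply()."""
    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        return self  # returning self enables chaining: t.fit(xs).apply(xs)

    def apply(self, xs):
        return [x - self.mean for x in xs]

    def fit_apply(self, xs):
        # equivalent to fitting and then applying on the same data
        return self.fit(xs).apply(xs)

print(MeanCenterer().fit_apply([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```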
- lir.transform.pipeline.logging_pipeline
alias of ConfigParserFunction