lir.algorithms.bootstraps module
- class lir.algorithms.bootstraps.Bootstrap(steps: list[tuple[str, Any]], n_bootstraps: int = 400, interval: tuple[float, float] = (0.05, 0.95), seed: int | None = None)
Bases:
Pipeline, ABC
Bootstrap system that estimates confidence intervals around the best estimate of a pipeline.
This bootstrap system creates bootstrap samples from the training data, fits the pipeline on each sample, and then computes confidence intervals for the pipeline outputs based on the variability across the bootstrap samples.
The intervals are computed by building interpolation functions that map the best estimate to the offsets between the best estimate and the lower and upper bounds of the confidence interval. Two subclasses are provided; they differ in how the data points for interval estimation are obtained:
BootstrapAtData: Uses the original training data points for interval estimation.
BootstrapEquidistant: Uses equidistant points within the range of the training data for interval estimation.
The AtData variant allows for more complex data types, while the Equidistant variant is only suitable for continuous features.
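The procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the lir implementation: the "pipeline" is replaced by a least-squares line, and the helper names (`fit_line`, `fit_with_bootstrap`, `apply_with_interval`) are hypothetical. It resamples the training data with replacement, refits on each sample, takes quantiles of the refitted outputs at a set of evaluation points, and interpolates the offsets from the best estimate.

```python
import random
from bisect import bisect_left


def fit_line(xs, ys):
    """Least-squares line; a toy stand-in for fitting the pipeline."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def quantile(sorted_vals, q):
    """Quantile of an already sorted list, with linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def interp(x, knots_x, knots_y):
    """Piecewise-linear interpolation, clamped outside the knot range."""
    if x <= knots_x[0]:
        return knots_y[0]
    if x >= knots_x[-1]:
        return knots_y[-1]
    i = bisect_left(knots_x, x)  # knots_x[i - 1] < x <= knots_x[i]
    t = (x - knots_x[i - 1]) / (knots_x[i] - knots_x[i - 1])
    return knots_y[i - 1] + t * (knots_y[i] - knots_y[i - 1])


def fit_with_bootstrap(xs, ys, eval_points, n_bootstraps=400,
                       interval=(0.05, 0.95), seed=None):
    """Return the best-estimate model and (best, lower offset, upper offset) knots."""
    rng = random.Random(seed)
    best_model = fit_line(xs, ys)
    n = len(xs)
    outputs = [[] for _ in eval_points]
    for _ in range(n_bootstraps):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        for out, p in zip(outputs, eval_points):
            out.append(predict(model, p))
    knots = []
    for p, out in zip(eval_points, outputs):
        out.sort()
        best = predict(best_model, p)
        knots.append((best, quantile(out, interval[0]) - best,
                      quantile(out, interval[1]) - best))
    knots.sort()  # interpolation needs knots ordered by best estimate
    return best_model, knots


def apply_with_interval(best_model, knots, x):
    """Return (best estimate, lower bound, upper bound) for one input."""
    best = predict(best_model, x)
    kx = [k[0] for k in knots]
    lower = best + interp(best, kx, [k[1] for k in knots])
    upper = best + interp(best, kx, [k[2] for k in knots])
    return best, lower, upper


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(0, 10) for _ in range(50)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    # Evaluating at the training points mirrors the AtData idea.
    model, knots = fit_with_bootstrap(xs, ys, eval_points=xs, seed=1)
    print(apply_with_interval(model, knots, 5.0))
```

Passing the training points as `eval_points` corresponds to the AtData idea; passing an evenly spaced grid instead corresponds to the Equidistant idea.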
- Parameters:
steps (list[tuple[str, Any]]) – The pipeline steps to bootstrap.
n_bootstraps (int, optional) – Number of bootstrap samples to generate.
interval (tuple[float, float], optional) – Lower and upper quantiles for the confidence interval.
seed (int | None, optional) – Random seed for reproducibility.
- apply(instances: InstanceData) -> LLRData
Transform the provided instances to include the best estimate and confidence intervals.
- Parameters:
instances (InstanceData) – The feature data to transform.
- Returns:
The transformed feature data with best estimate and confidence intervals.
- Return type:
LLRData
- fit(instances: InstanceData) -> Self
Fit the bootstrap system to the provided instances.
- Parameters:
instances (InstanceData) – The feature data to fit the bootstrap system on.
- Returns:
The fitted bootstrap system.
- Return type:
Self
- fit_apply(instances: InstanceData) -> LLRData
Combine fitting and transforming in one step.
- Parameters:
instances (InstanceData) – The feature data to fit and transform.
- Returns:
The transformed feature data with best estimate and confidence intervals.
- Return type:
LLRData
- abstractmethod get_bootstrap_data(instances: InstanceData) -> InstanceData
Get the data points to use for interval estimation.
This method should be implemented by subclasses to specify how the data points for interval estimation are obtained.
- Parameters:
instances (InstanceData) – The feature data to fit the bootstrap system on.
- Returns:
The feature data to use for interval estimation.
- Return type:
InstanceData
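The abstract hook described above can be illustrated with a toy class hierarchy. This is a sketch of the pattern only: `ToyBootstrap` and `ToyAtData` are hypothetical names, plain lists stand in for InstanceData, and none of this reflects how lir implements `fit` internally.

```python
from abc import ABC, abstractmethod


class ToyBootstrap(ABC):
    """Toy base class: fit() delegates the choice of points to a subclass hook."""

    def fit(self, instances):
        # The subclass decides which points are used for interval estimation.
        self.points_ = self.get_bootstrap_data(instances)
        return self

    @abstractmethod
    def get_bootstrap_data(self, instances):
        """Return the points at which intervals will be estimated."""


class ToyAtData(ToyBootstrap):
    """Use the training points themselves, in the spirit of the AtData variant."""

    def get_bootstrap_data(self, instances):
        return list(instances)


if __name__ == "__main__":
    print(ToyAtData().fit([3.0, 1.0, 2.0]).points_)
```

Because the hook is abstract, the base class cannot be instantiated directly; Python raises `TypeError` until a subclass provides `get_bootstrap_data`.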
- class lir.algorithms.bootstraps.BootstrapAtData(steps: list[tuple[str, Any]], n_bootstraps: int = 400, interval: tuple[float, float] = (0.05, 0.95), seed: int | None = None)
Bases:
Bootstrap
Bootstrap system that uses the original training data points for interval estimation.
See the Bootstrap class for more details.
- get_bootstrap_data(instances: InstanceData) -> InstanceData
Get the data points to use for interval estimation.
The original training data points are used.
- Parameters:
instances (InstanceData) – The feature data to fit the bootstrap system on.
- Returns:
The feature data to use for interval estimation.
- Return type:
InstanceData
- class lir.algorithms.bootstraps.BootstrapEquidistant(steps: list[tuple[str, Any]], n_bootstraps: int = 400, interval: tuple[float, float] = (0.05, 0.95), seed: int | None = None, n_points: int | None = 1000)
Bases:
Bootstrap
Bootstrap system that uses equidistant points within the range of the training data for interval estimation.
See the Bootstrap class for more details.
- Parameters:
steps (list[tuple[str, Any]]) – The pipeline steps to bootstrap.
n_bootstraps (int, optional) – Number of bootstrap samples to generate.
interval (tuple[float, float], optional) – Lower and upper quantiles for the confidence interval.
seed (int | None, optional) – Random seed for reproducibility.
n_points (int | None, optional) – Number of equidistant points to use for interval estimation.
- get_bootstrap_data(instances: InstanceData) -> FeatureData
Get the data points to use for interval estimation.
This is done by creating equidistant points within the range of the training data.
- Parameters:
instances (InstanceData) – The feature data to fit the bootstrap system on.
- Returns:
The feature data to use for interval estimation, consisting of equidistant points within the range of the training data.
- Return type:
FeatureData
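As a rough illustration of "equidistant points within the range of the training data", the sketch below builds an evenly spaced grid between the minimum and maximum of a one-dimensional list of values. The function name `equidistant_points` is hypothetical, the input is a plain list rather than InstanceData, and the behaviour of `n_points=None` in lir is not modelled here.

```python
def equidistant_points(values, n_points=1000):
    """Return n_points evenly spaced values spanning [min(values), max(values)]."""
    if n_points < 1:
        raise ValueError("n_points must be at least 1")
    lo, hi = min(values), max(values)
    if n_points == 1:
        return [float(lo)]
    step = (hi - lo) / (n_points - 1)
    points = [lo + i * step for i in range(n_points)]
    points[-1] = float(hi)  # avoid floating-point drift at the upper end
    return points


if __name__ == "__main__":
    print(equidistant_points([3.0, 1.0, 2.0], n_points=5))
```

A grid like this only makes sense when the feature is continuous and ordered, which matches the note that the Equidistant variant is only suitable for continuous features.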