lir.algorithms package
Submodules
lir.algorithms.bayeserror module
Normalized Bayes Error-rate (NBE).
See: Peter Vergeer, Andrew van Es, Arent de Jongh, Ivo Alberink and Reinoud Stoel, "Numerical likelihood ratios outputted by LR systems are often based on extrapolation: when to stop extrapolating?", Science and Justice 56 (2016) 482–491.
- class lir.algorithms.bayeserror.ELUBBounder(lower_llr_bound: float | None = None, upper_llr_bound: float | None = None)
Bases: LLRBounder
Calculate the Empirical Upper and Lower Bounds for a given LR system.
Class that, given an LR system, outputs the same LRs as the system but bounded by the Empirical Upper and Lower Bounds as described in P. Vergeer, A. van Es, A. de Jongh, I. Alberink and R.D. Stoel, "Numerical likelihood ratios outputted by LR systems are often based on extrapolation: when to stop extrapolating?", Sci. Justice 56 (2016) 482–491.
MATLAB code from the authors:

    clear all; close all;
    llrs_hp = csvread('…');
    llrs_hd = csvread('…');
    start = -7; finish = 7;
    rho = start:0.01:finish; theta = 10.^rho;
    nbe = [];
    for k = 1:length(rho)
        if rho(k) < 0
            llrs_hp = [llrs_hp; rho(k)];
            nbe = [nbe; (theta(k)^(-1))*mean(llrs_hp<=rho(k)) + mean(llrs_hd>rho(k))];
        else
            llrs_hd = [llrs_hd; rho(k)];
            nbe = [nbe; theta(k)*mean(llrs_hd>=rho(k)) + mean(llrs_hp<rho(k))];
        end
    end
    plot(rho, -log10(nbe)); hold on;
    plot([start finish], [0 0]);
    a = rho(-log10(nbe) > 0);
    empirical_bounds = [min(a) max(a)]
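The same computation as a direct Python translation of the MATLAB reference above, shown here as a sketch with synthetic data. Note that, as in the original, the added misleading LLRs accumulate across thresholds:

    import numpy as np

    # synthetic example data: log10-LRs under Hp and Hd
    rng = np.random.default_rng(0)
    llrs_hp = list(rng.normal(1.0, 1.0, 200))
    llrs_hd = list(rng.normal(-1.0, 1.0, 200))

    start, finish = -7, 7
    rho = np.arange(start, finish + 0.005, 0.01)  # candidate log10-LR bounds
    theta = 10.0 ** rho
    nbe = np.empty_like(rho)
    for k in range(len(rho)):
        if rho[k] < 0:
            llrs_hp.append(rho[k])  # one misleading Hp LLR at this threshold
            nbe[k] = (np.mean(np.array(llrs_hp) <= rho[k]) / theta[k]
                      + np.mean(np.array(llrs_hd) > rho[k]))
        else:
            llrs_hd.append(rho[k])  # one misleading Hd LLR at this threshold
            nbe[k] = (theta[k] * np.mean(np.array(llrs_hd) >= rho[k])
                      + np.mean(np.array(llrs_hp) < rho[k]))

    a = rho[-np.log10(nbe) > 0]            # thresholds where NBE beats the trivial system
    empirical_bounds = (a.min(), a.max())  # ELUB lower and upper bound log10-LRs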
- lir.algorithms.bayeserror.calculate_expected_utility(lrs: ndarray, y: ndarray, threshold_lrs: ndarray, add_misleading: int = 0) float
Calculate the expected utility of a set of LRs for a given threshold.
- Parameters:
lrs – an array of LRs
y – an array of ground-truth labels (values 0 for Hd or 1 for Hp); must be of the same length as lrs
threshold_lrs – an array of threshold lrs: minimum LR for acceptance
add_misleading – the number of consequential misleading LRs to be added.
- Returns:
an array of utility values, one element for each threshold LR
- lir.algorithms.bayeserror.elub(llrdata: LLRData, add_misleading: int = 1, step_size: float = 0.01, substitute_extremes: tuple[float, float] = (-9, 9)) tuple[float, float]
Calculate and return the empirical upper and lower bound log10-LRs (ELUB LLRs).
- Parameters:
llrdata – An instance of LLRData containing LLRs and ground-truth labels
add_misleading – the number of consequential misleading LLRs to be added to both sides (labels 0 and 1)
step_size – required accuracy on a base-10 logarithmic scale
substitute_extremes – tuple of scalars: substitute for extreme LRs, i.e. LRs of 0 and inf are substituted by these values
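A usage sketch (the LLRData import path and constructor shown here are assumptions for illustration; consult the LLRData documentation for the actual interface):

    import numpy as np
    from lir.algorithms.bayeserror import elub
    from lir.data import LLRData  # assumed import path

    llrs = np.array([-2.1, -1.4, -0.3, 0.2, 1.1, 2.6])  # log10-LRs
    y = np.array([0, 0, 0, 1, 1, 1])                    # 0 = Hd, 1 = Hp
    llrdata = LLRData(llrs, y)                          # hypothetical constructor
    lower_llr, upper_llr = elub(llrdata, add_misleading=1)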
- lir.algorithms.bayeserror.plot_nbe(llrdata: LLRData, log_lr_threshold_range: tuple[float, float] | None = None, add_misleading: int = 1, step_size: float = 0.01, ax: Axes = matplotlib.pyplot) None
Generate the visual NBE plot using matplotlib.
lir.algorithms.bootstraps module
- class lir.algorithms.bootstraps.Bootstrap(steps: list[tuple[str, Any]], n_bootstraps: int = 400, interval: tuple[float, float] = (0.05, 0.95), seed: int | None = None)
Bases: Pipeline, ABC
Bootstrap system that estimates confidence intervals around the best estimate of a pipeline.
This bootstrap system creates bootstrap samples from the training data, fits the pipeline on each sample, and then computes confidence intervals for the pipeline outputs based on the variability across the bootstrap samples.
Computing these intervals is done by creating interpolation functions that map the best estimate to the difference between the best estimate and the lower and upper bounds of the confidence interval. To achieve this, two subclasses are provided that differ in how the data points for interval estimation are obtained.
BootstrapAtData: Uses the original training data points for interval estimation.
BootstrapEquidistant: Uses equidistant points within the range of the training data for interval estimation.
The AtData variant allows for more complex data types, while the Equidistant variant is only suitable for continuous features.
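The interval mechanism can be illustrated with a minimal numpy/scipy sketch, in which fit_and_predict is a hypothetical stand-in for fitting the pipeline on a sample and evaluating it:

    import numpy as np
    from scipy.interpolate import interp1d

    rng = np.random.default_rng(0)
    x_train = rng.normal(size=200)

    def fit_and_predict(sample, x_eval):
        # stand-in for a fitted pipeline; here a trivial shift-by-mean "model"
        return x_eval - sample.mean()

    # equidistant evaluation points (the AtData variant would use x_train itself)
    x_eval = np.linspace(x_train.min(), x_train.max(), 50)
    boots = np.stack([
        fit_and_predict(rng.choice(x_train, size=x_train.size), x_eval)
        for _ in range(400)
    ])
    best = fit_and_predict(x_train, x_eval)
    lo, hi = np.quantile(boots, [0.05, 0.95], axis=0)

    # interpolation functions mapping a best estimate to its interval deltas
    f_delta_interval_lower = interp1d(best, best - lo, fill_value='extrapolate')
    f_delta_interval_upper = interp1d(best, hi - best, fill_value='extrapolate')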
- steps
- Type:
list[tuple[str, Any]]: The steps of the pipeline to be bootstrapped.
- n_bootstraps
- Type:
int: The number of bootstrap samples to generate.
- interval
- Type:
tuple[float, float]: The lower and upper quantiles for the confidence interval.
- seed
- Type:
int | None: The random seed for reproducibility.
- n_points
- Type:
int | None: Number of equidistant points to use for interval estimation (BootstrapEquidistant only).
- f_delta_interval_lower
- Type:
Interpolation function for the lower bound of the interval.
- f_delta_interval_upper
- Type:
Interpolation function for the upper bound of the interval.
- apply(instances: InstanceData) LLRData
Transform the provided instances to include the best estimate and confidence intervals.
- Parameters:
instances – InstanceData: The instance data to transform.
- Return LLRData:
The transformed feature data with best estimate and confidence intervals.
- fit(instances: InstanceData) Self
Fit the bootstrap system to the provided instances.
- Parameters:
instances – InstanceData: The instance data to fit the bootstrap system on.
- Return Self:
The fitted bootstrap system.
- fit_apply(instances: InstanceData) LLRData
Combine fitting and transforming in one step.
- Parameters:
instances – InstanceData: The instance data to fit and transform.
- Return LLRData:
The transformed feature data with best estimate and confidence intervals.
- abstractmethod get_bootstrap_data(instances: InstanceData) InstanceData
Get the data points to use for interval estimation.
- Parameters:
instances – InstanceData: The instance data to fit the bootstrap system on.
- Return InstanceData:
The instance data to use for interval estimation.
- class lir.algorithms.bootstraps.BootstrapAtData(steps: list[tuple[str, Any]], n_bootstraps: int = 400, interval: tuple[float, float] = (0.05, 0.95), seed: int | None = None)
Bases: Bootstrap
Bootstrap system that uses the original training data points for interval estimation.
See the Bootstrap class for more details.
- get_bootstrap_data(instances: InstanceData) InstanceData
Get the data points to use for interval estimation. The original training data points are used.
- Parameters:
instances – InstanceData: The instance data to fit the bootstrap system on.
- Return InstanceData:
The instance data to use for interval estimation.
- class lir.algorithms.bootstraps.BootstrapEquidistant(steps: list[tuple[str, Any]], n_bootstraps: int = 400, interval: tuple[float, float] = (0.05, 0.95), seed: int | None = None, n_points: int | None = 1000)
Bases: Bootstrap
Bootstrap system that uses equidistant points within the range of the training data for interval estimation.
See the Bootstrap class for more details.
- get_bootstrap_data(instances: InstanceData) FeatureData
Get the data points to use for interval estimation.
This is done by creating equidistant points within the range of the training data.
- Parameters:
instances – InstanceData: The instance data to fit the bootstrap system on.
- Return FeatureData:
The feature data to use for interval estimation.
lir.algorithms.invariance_bounds module
Extrapolation bounds on LRs using the Invariance Verification method by Alberink et al. (2025).
See: Ivo Alberink, Jeannette Leegwater, Jonas Malmborg, Anders Nordgaard, Marjan Sjerps and Leen van der Ham, "A transparent method to determine limit values for Likelihood Ratio systems", submitted for publication in 2025.
- class lir.algorithms.invariance_bounds.IVBounder(lower_llr_bound: float | None = None, upper_llr_bound: float | None = None)
Bases: LLRBounder
Calculate Invariance Verification bounds for a given LR system.
Class that, given an LR system, outputs the same LRs as the system but bounded by the Invariance Verification bounds as described in: Ivo Alberink, Jeannette Leegwater, Jonas Malmborg, Anders Nordgaard, Marjan Sjerps and Leen van der Ham, "A transparent method to determine limit values for Likelihood Ratio systems", submitted for publication in 2025.
- lir.algorithms.invariance_bounds.calculate_invariance_bounds(llrdata: LLRData, llr_threshold: ndarray | None = None, step_size: float = 0.001, substitute_extremes: tuple[float, float] = (-20, 20)) tuple[float, float, ndarray, ndarray]
Return the upper and lower Invariance Verification bounds of the LRs.
- Parameters:
llrdata – an instance of LLRData containing LLRs and ground-truth labels
llr_threshold – predefined values of LLRs as possible bounds
step_size – required accuracy on a base-10 logarithmic scale
substitute_extremes – (tuple of scalars) substitute for extreme LLRs, i.e. LLRs smaller than the lower value or greater than the upper value are clipped
- lir.algorithms.invariance_bounds.calculate_invariance_delta_functions(llrdata: LLRData, llr_threshold: ndarray) tuple[ndarray, ndarray]
Calculate the Invariance Verification delta functions for a set of LRs at given threshold values.
- Parameters:
llrdata – An instance of LLRData containing LLRs and ground-truth labels
llr_threshold – an array of threshold LLRs
- Returns:
two arrays of delta-values, at all threshold LR values
- lir.algorithms.invariance_bounds.plot_invariance_delta_functions(llrdata: LLRData, llr_threshold_range: tuple[float, float] | None = None, step_size: float = 0.001, ax: Axes | None = None) None
Return a figure of the Invariance Verification delta functions along with the upper and lower bounds of the LRs.
- Parameters:
llrdata – An instance of LLRData containing LLRs and ground-truth labels
llr_threshold_range – lower limit and upper limit for the LLRs to include in the figure
step_size – required accuracy on a base-10 logarithmic scale
ax – matplotlib axes
lir.algorithms.isotonic_regression module
- class lir.algorithms.isotonic_regression.IsotonicCalibrator(add_misleading: int = 0)
Bases: BaseEstimator, TransformerMixin
Calculate LR from a score belonging to one of two distributions using isotonic regression.
Calculates a likelihood ratio of a score value, provided it is from one of two distributions. Uses isotonic regression for interpolation.
In contrast to IsotonicRegression, this class:
- has an initialization argument that provides the option of adding misleading data points
- outputs log-odds instead of probabilities
- fit(X: ndarray, y: ndarray) IsotonicCalibrator
Fit the estimator on the given data.
- transform(X: ndarray) ndarray
Transform a given value, using the fitted Isotonic Regression model.
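A minimal usage sketch with synthetic scores (note that the output is in log-odds):

    import numpy as np
    from lir.algorithms.isotonic_regression import IsotonicCalibrator

    rng = np.random.default_rng(0)
    scores = np.concatenate([rng.normal(-1, 1, 100), rng.normal(1, 1, 100)])
    y = np.concatenate([np.zeros(100), np.ones(100)])

    calibrator = IsotonicCalibrator(add_misleading=1)
    calibrator.fit(scores, y)
    log_odds = calibrator.transform(scores)  # log-odds, not probabilities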
- class lir.algorithms.isotonic_regression.IsotonicRegression(*, y_min=None, y_max=None, increasing=True, out_of_bounds='nan')
Bases: IsotonicRegression
Wrap the sklearn implementation to support infinite values.
sklearn's IsotonicRegression raises an error when input values are Inf or -Inf, even though isotonic regression can handle infinite values. This wrapper around the sklearn implementation prevents that error from being raised when Inf or -Inf values are provided.
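The idea can be sketched as follows; this illustrates one way to avoid the error, not necessarily the wrapper's exact implementation:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    X = np.array([-np.inf, -2.0, -1.0, 0.5, 2.0, np.inf])
    y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

    # isotonic regression uses only the ordering of X, so infinite inputs can
    # be replaced by finite sentinels just outside the finite range
    finite = X[np.isfinite(X)]
    X_sub = np.clip(X, finite.min() - 1, finite.max() + 1)

    reg = IsotonicRegression(out_of_bounds='clip').fit(X_sub, y)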
- fit(X: ArrayLike, y: ArrayLike, sample_weight: ArrayLike | tuple | None = None) IsotonicRegression
Fit the model using X, y as training data.
- Parameters:
X (array-like of shape (n_samples,)) – Training data.
y (array-like of shape (n_samples,)) – Training target.
sample_weight (array-like of shape (n_samples,), default=None) – Weights. If set to None, all weights will be set to 1 (equal weights).
- Returns:
self – Returns an instance of self.
- Return type:
object
Notes
X is stored for future use, as transform() needs X to interpolate new input data.
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') IsotonicRegression
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') IsotonicRegression
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- transform(T: ArrayLike) ndarray
Transform new data by linear interpolation.
- Parameters:
T (array-like of shape (n_samples,)) – Data to transform.
- Returns:
T_ – The transformed data.
- Return type:
array, shape=(n_samples,)
lir.algorithms.kde module
- class lir.algorithms.kde.KDECalibrator(bandwidth: Callable | str | float | tuple[float, float] | None = None)
Bases: Transformer
Calculate LR from a score belonging to one of two distributions using KDE.
Calculates a likelihood ratio of a score value, provided it is from one of two distributions. Uses kernel density estimation (KDE) for interpolation.
- apply(instances: InstanceData) LLRData
Provide LLRs as output.
- Parameters:
instances – InstanceData to apply the calibrator to.
- Returns:
LLRData with calibrated log-likelihood ratios.
- static bandwidth_silverman(X: ndarray, y: ndarray) tuple[float, float]
Estimate the optimal bandwidth parameter using Silverman’s rule of thumb.
- Parameters:
X – n * 1 np.array of scores
y – n * 1 np.array of labels (Booleans).
- Returns:
bandwidth for class 0, bandwidth for class 1
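For reference, a common formulation of Silverman's rule of thumb for a one-dimensional KDE (the exact variant implemented here may differ):

    h = 0.9 \, \min\!\left(\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right) n^{-1/5}

where \hat{\sigma} is the sample standard deviation, IQR the interquartile range and n the sample size; the rule is applied separately to the class-0 and class-1 scores to obtain the two returned bandwidths.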
- fit(instances: InstanceData) Self
Fit the KDE model on the data.
- lir.algorithms.kde.compensate_and_remove_neginf_inf(log_odds: ndarray, y: ndarray) tuple[ndarray, ndarray, float, float]
For Gaussian and KDE calibrator fitting: remove -Inf and Inf values and compensate.
- Parameters:
log_odds – n * 1 np.array of log-odds
y – n * 1 np.array of labels (Booleans).
- Returns:
log_odds (with -Inf and Inf removed), y (with the corresponding entries removed), numerator compensator, denominator compensator
- lir.algorithms.kde.parse_bandwidth(bandwidth: Callable | str | float | tuple[float, float] | None) Callable[[Any, Any], tuple[float, float]]
Parse and return the corresponding bandwidth based on input type.
Returns the bandwidth as a tuple of two (optional) floats. A single bandwidth value is expanded to a tuple, applying the same value to both classes.
- Parameters:
bandwidth – provided bandwidth
- Returns:
bandwidth used for kde0, bandwidth used for kde1
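An illustrative re-implementation of the dispatch implied by the signature (the behavior for None and the set of recognized names are assumptions; the library's actual rules may differ):

    from typing import Any, Callable

    def parse_bandwidth_sketch(bandwidth) -> Callable[[Any, Any], tuple[float, float]]:
        from lir.algorithms.kde import KDECalibrator

        if callable(bandwidth):
            return bandwidth  # already a (X, y) -> (bw0, bw1) callable
        if bandwidth is None or bandwidth == 'silverman':  # assumed default rule
            return KDECalibrator.bandwidth_silverman
        if isinstance(bandwidth, tuple):
            return lambda X, y: bandwidth  # separate bandwidths per class
        return lambda X, y: (float(bandwidth), float(bandwidth))  # one value for both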
lir.algorithms.llr_overestimation module
- lir.algorithms.llr_overestimation.calc_fiducial_density_functions(data: ndarray, grid: ndarray, df_type: str = 'pdf', num_fids: int = 1000, smoothing_grid_fraction: float = 0.1, smoothing_sample_size_correction: float = 1, seed: None | int = None) ndarray
Calculate (smoothed) density functions of fiducial distributions of a dataset.
- Parameters:
data – 1-dimensional array of data points
grid – 1-dimensional array of equally spaced grid points, at which to calculate the density functions
df_type – type of density function (df) to generate: either probability (‘pdf’) or cumulative (‘cdf’)
num_fids – number of fiducial distributions to generate
smoothing_grid_fraction – fraction of grid points to use as half window during smoothing
smoothing_sample_size_correction – value to use for sample size correction of smoothing window; 0 is no correction
seed – seed for the random number generator used to draw samples from a uniform distribution.
- lir.algorithms.llr_overestimation.calc_llr_overestimation(llrs: ndarray, y: ndarray, num_fids: int = 1000, bw: tuple[str | float, str | float] = ('silverman', 'silverman'), num_grid_points: int = 100, alpha: float = 0.05, **kwargs: Any) tuple[ndarray | None, ndarray | None, ndarray | None]
Calculate the LLR-overestimation as function of the system LLR.
- The LLR-overestimation is defined as the log-10 of the ratio between
the system LRs (the outputs of the LR-system), and
the empirical LRs (the ratios between the relative frequencies of the H1-LLRs and H2-LLRs).
It quantifies the deviation from the requirement that ‘the LR of the LR is the LR’: the ‘LR-consistency’.
For a perfect LR-system, the LLR-overestimation is 0: the system and empirical LRs are the same.
A positive LLR-overestimation indicates that the system LRs are too high, compared to the empirical LRs.
An LLR-overestimation of +1 indicates that the system LRs are too high by a factor of 10.
An LLR-overestimation of -1 indicates that the system LRs are too low by a factor of 10.
The relative frequencies are estimated with KDE using Silverman’s rule-of-thumb for the bandwidths.
An interval around the LLR-overestimation can be calculated using fiducial distributions.
- Parameters:
llrs – the log-10 likelihood ratios (LLRs), as calculated by the LR-system
y – the corresponding labels (0 for H2 or Hd, 1 for H1 or Hp)
num_grid_points – number of points used in the grid to calculate the LLR-overestimation on
bw – two bandwidths for the KDEs of H1 & H2; for each specify a method (string) or a value (float)
num_fids – number of fiducial distributions to base the interval on; use 0 for no interval
alpha – level of confidence to use for the interval
kwargs – additional arguments to pass to
calc_fiducial_density_functions()
- Returns:
a tuple of LLRs, their overestimation (best estimate), and their overestimation interval
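The best-estimate computation can be sketched with scipy's gaussian_kde, using Silverman bandwidths as described above (a minimal sketch; the fiducial interval machinery is omitted):

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    llrs = np.concatenate([rng.normal(-1, 1, 300), rng.normal(1, 1, 300)])
    y = np.concatenate([np.zeros(300), np.ones(300)])

    kde_h1 = gaussian_kde(llrs[y == 1], bw_method='silverman')
    kde_h2 = gaussian_kde(llrs[y == 0], bw_method='silverman')

    grid = np.linspace(llrs.min(), llrs.max(), 100)
    # empirical LLR: log10 of the ratio of relative frequencies at each point
    empirical_llr = np.log10(kde_h1(grid) / kde_h2(grid))
    # overestimation: system LLR minus empirical LLR
    llr_overestimation = grid - empirical_llr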
- lir.algorithms.llr_overestimation.plot_llr_overestimation(llrdata: LLRData, num_fids: int = 1000, ax: Axes = matplotlib.pyplot, **kwargs: Any) None
Plot the LLR-overestimation as function of the system LLR.
- The LLR-overestimation is defined as the log-10 of the ratio between
the system LRs (the outputs of the LR-system), and
the empirical LRs (the ratios between the relative frequencies of the H1-LLRs and H2-LLRs).
See the documentation of calc_llr_overestimation() for more details on the LLR-overestimation.
An interval around the LLR-overestimation can be calculated using fiducial distributions.
The average absolute LLR-overestimation can be used as a single metric.
- Parameters:
llrdata – An instance of LLRData containing LLRs and ground-truth labels
num_fids – number of fiducial distributions to base the interval on; use 0 for no interval
ax – matplotlib axes to plot into
kwargs – additional arguments to pass to calc_llr_overestimation() and/or calc_fiducial_density_functions().
lir.algorithms.logistic_regression module
- class lir.algorithms.logistic_regression.FourParameterLogisticCalibrator
Bases: Transformer
Calculate LR of a score belonging to one of two distributions, using a logistic model.
Calculates a likelihood ratio of a score value, provided it is from one of two distributions. Depending on the training data, a 2-, 3- or 4-parameter logistic model is used.
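For illustration, a common four-parameter logistic form (the exact parameterization used by this class is not specified here):

    p(s) = \delta + \frac{\gamma - \delta}{1 + e^{-(\alpha + \beta s)}}

where p(s) is the modelled probability of H1 given score s; with \delta = 0 and \gamma = 1 this reduces to the ordinary two-parameter logistic model, consistent with the remark above that a 2-, 3- or 4-parameter model is used depending on the training data.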
- apply(instances: InstanceData) LLRData
Apply the fitted calibrator to new data.
- Parameters:
instances – InstanceData to apply the calibrator to.
- Returns:
LLRData with calibrated log-likelihood ratios.
- fit(instances: InstanceData) Self
Fit the calibrator to data.
- Parameters:
instances – InstanceData to fit the calibrator to.
- Returns:
Self
- class lir.algorithms.logistic_regression.LogitCalibrator(**kwargs: dict)
Bases: Transformer
Calculate LR from a score belonging to one of two distributions using logistic regression.
Calculates a likelihood ratio of a score value, provided it is from one of two distributions. Uses logistic regression for interpolation.
Infinite values in the input are ignored, except if they are misleading, which is an error.
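The underlying idea can be sketched with scikit-learn (illustrative only; the class's own handling of priors and infinite values is more careful):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    scores = np.concatenate([rng.normal(-1, 1, 200), rng.normal(1, 1, 200)])
    y = np.concatenate([np.zeros(200), np.ones(200)])

    model = LogisticRegression().fit(scores.reshape(-1, 1), y)
    # natural-log posterior odds from the fitted model ...
    log_posterior_odds = model.decision_function(scores.reshape(-1, 1))
    # ... minus the training prior odds gives a log10 likelihood ratio
    log_prior_odds = np.log(y.mean() / (1 - y.mean()))
    llrs = (log_posterior_odds - log_prior_odds) / np.log(10)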
- apply(instances: InstanceData) LLRData
Calculate LLR data from the fitted model, using instance data.
- fit(instances: InstanceData) Self
Fit the model on the data.
lir.algorithms.percentile_rank module
- class lir.algorithms.percentile_rank.PercentileRankTransformer
Bases: TransformerMixin
Compute the percentile rankings of a dataset relative to another dataset.
Rankings are in the range [0, 1]. Ties are handled by assigning to each tied value the maximum of the ranks that would have been assigned to all the tied values.
To compute the rankings of dataset Z relative to dataset X, fit creates a ranking function for each feature separately, based on X. The transform method then ranks Z against X.
This class has the methods fit() and transform(), both of which take a parameter X with one row per instance, e.g. dimensions (n, f) with n = number of measurements and f = number of features. The number of features must be the same in fit() and transform().
If the parameter X has a pair of measurements per row, i.e. has dimensions (n, f, 2), the percentile rank is fitted and applied independently for the first and second measurement of the pair.
Fit expects: X, a numpy array with one row per instance.
Transform expects: X, a numpy array with one row per instance.
Transform returns: a numpy array with the same shape as X.
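The tie handling corresponds to scipy's rankdata(method='max') scaled to [0, 1]; a minimal single-feature sketch (illustrative only):

    import numpy as np

    def percentile_rank(x_fit: np.ndarray, z: np.ndarray) -> np.ndarray:
        # for each z, count reference values <= z (the 'max' rank), scaled to [0, 1]
        return np.searchsorted(np.sort(x_fit), z, side='right') / len(x_fit)

    x = np.array([1.0, 2.0, 2.0, 3.0])
    percentile_rank(x, np.array([2.0, 3.0]))  # -> array([0.75, 1.0])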
- fit(X: ndarray, y: ndarray | None = None) PercentileRankTransformer
Fit the transformer model on the data.
- transform(X: ndarray) ndarray
Use the fitted model to transform the input data.