lir.lrsystems.two_level module

class lir.lrsystems.two_level.TwoLevelModelNormalKDE

Bases: object

Implement two-level model as outlined by Bolck et al.

An implementation of the two-level model as outlined in FSI191(2009)42 by Bolck et al., “Different likelihood ratio approaches to evaluate the strength of evidence of MDMA tablet comparisons”.

Model description:

Definitions:

X_ij = vector, measurement of reference j, ith repetition, with i=1..n

Y_kl = vector, measurement of trace l, kth repetition, with k=1..m

Model:

First level of variance:

X_ij ~ N(theta_j, sigma_within)

Y_kl ~ N(theta_k, sigma_within)

where theta_j is the true but unknown mean of the reference and theta_k the true but unknown mean of the trace. sigma_within is assumed equal for trace and reference (and for repeated measurements of some background data).

Second level of variance:

theta_j and theta_k are each distributed as KDE(means background database, h), with h the kernel bandwidth.

H1: theta_j = theta_k

H2: theta_j independent of theta_k

Numerator LR = Integral_theta N(X_mean|theta, sigma_within, n) * N(Y_mean|theta, sigma_within, m) * KDE(theta|means background database, h)

Denominator LR = [Integral_theta N(X_mean|theta, sigma_within, n) * KDE(theta|means background database, h)] * [Integral_theta N(Y_mean|theta, sigma_within, m) * KDE(theta|means background database, h)]

The appendix of Bolck et al. gives a closed-form solution for these integrals.

sigma_within and h (and other parameters) are estimated from repeated measurements of background data.
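The numerator and denominator above can be illustrated for a single feature. The following is a minimal sketch, not the library's implementation: it assumes a univariate measurement, Gaussian kernels centred on the background source means, and that sigma_within and h are standard deviations. The names normal_pdf and two_level_lr_1d are hypothetical. Under these assumptions each kernel contributes a bivariate normal density to the numerator and a product of univariate normal densities to the denominator.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution with the given variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def two_level_lr_1d(x_mean, y_mean, n, m, sigma_within, background_means, h):
    """Univariate two-level LR with a Gaussian-kernel KDE on the source means.

    Illustrative only: parameter names and the univariate restriction are
    assumptions made for this sketch.
    """
    mu = np.asarray(background_means, dtype=float)
    var_x = sigma_within ** 2 / n  # variance of the reference mean given theta
    var_y = sigma_within ** 2 / m  # variance of the trace mean given theta
    h2 = h ** 2

    # Numerator: for each kernel centre mu_i, (x_mean, y_mean) is bivariate
    # normal with mean (mu_i, mu_i) and covariance [[a, c], [c, b]].
    a, b, c = var_x + h2, var_y + h2, h2
    det = a * b - c ** 2
    dx, dy = x_mean - mu, y_mean - mu
    quad = (b * dx ** 2 - 2.0 * c * dx * dy + a * dy ** 2) / det
    numerator = np.mean(np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(det)))

    # Denominator: under H2 the two means are independent, so the two
    # integrals factorise into separate kernel averages.
    denominator = np.mean(normal_pdf(x_mean, mu, a)) * np.mean(normal_pdf(y_mean, mu, b))
    return numerator / denominator


background = [0.0, 1.0, 3.0]  # hypothetical background source means
lr_close = two_level_lr_1d(0.9, 1.1, n=3, m=2, sigma_within=0.3,
                           background_means=background, h=0.5)
lr_far = two_level_lr_1d(0.0, 3.0, n=3, m=2, sigma_within=0.3,
                         background_means=background, h=0.5)
```

In this sketch, a reference mean and a trace mean that lie far apart relative to sigma_within give an LR well below 1, as expected for different sources.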

fit_on_unpaired_instances(X: ndarray, y: ndarray) → TwoLevelModelNormalKDE

Fit the model on unpaired instances.

X: np.ndarray of measurements; rows are sources/repetitions, columns are features.

y: 1-D np.ndarray of labels. Each source has a unique identifier (label); repetitions of the same source get the same label.

Construct the necessary matrices/scores/etc. based on the supplied data (X) so that a score can be predicted later on. Any calculated parameters are stored in self.

Parameters:

X (np.ndarray) – Measurements; one row per source/repetition, one column per feature.
y (np.ndarray) – 1-D array of source labels, one per row of X.

Returns:

Fitted two-level KDE model instance.

Return type:

TwoLevelModelNormalKDE
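The expected layout of X and y can be illustrated with plain NumPy. The sketch below uses made-up data and computes two quantities the model description refers to: the mean of each source and a pooled within-source covariance. It shows one common way to estimate these statistics and is not necessarily how fit_on_unpaired_instances computes them.

```python
import numpy as np

# Hypothetical background data: 3 sources, 2 repetitions each, 2 features.
X = np.array([[1.0, 10.0], [1.2, 10.4],
              [2.0, 20.0], [2.2, 20.2],
              [3.0, 30.0], [3.4, 30.6]])
y = np.array([0, 0, 1, 1, 2, 2])  # repetitions of one source share a label

labels, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)

# Per-source mean: sum the rows of each source, then divide by its repetition count.
sums = np.zeros((labels.size, X.shape[1]))
np.add.at(sums, inverse, X)
source_means = sums / counts[:, None]

# Pooled within-source covariance: residuals around each source mean,
# divided by (number of rows - number of sources).
residuals = X - source_means[inverse]
within_cov = residuals.T @ residuals / (X.shape[0] - labels.size)
```

The source means would play the role of the "means background database" in the model description, and the pooled covariance the role of sigma_within.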

predict_proba(X_trace: ndarray, X_ref: ndarray) → ndarray

Predict probability scores, using the fitted model.

Predict probability scores, making use of the parameters constructed during fitting (see fit_on_unpaired_instances), which should now be stored in self.

X_trace: measurements of trace object. np.ndarray of shape (instances, repetitions_trace, features)

X_ref: measurements of reference object. np.ndarray of shape (instances, repetitions_ref, features)

returns: probabilities for same source and different source: np.ndarray with shape (instances, 2)

Parameters:

X_trace (np.ndarray) – Trace measurements, shape (instances, repetitions_trace, features).
X_ref (np.ndarray) – Reference measurements, shape (instances, repetitions_ref, features).

Returns:

Two-column probability matrix for Hd and Hp.

Return type:

np.ndarray
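The three-dimensional input shapes can be illustrated as follows. This is a sketch with randomly generated, hypothetical measurements; the repetition counts (2 for the trace, 5 for the reference) are arbitrary. Averaging over the repetition axis yields one mean vector per instance, corresponding to the X_mean and Y_mean terms in the model description. Whether the library reduces the input in exactly this way is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_instances, n_features = 4, 3

# Hypothetical measurements: 2 repetitions per trace, 5 per reference.
X_trace = rng.normal(size=(n_instances, 2, n_features))
X_ref = rng.normal(size=(n_instances, 5, n_features))

# Averaging over the repetition axis (axis 1) gives one mean vector per instance.
trace_means = X_trace.mean(axis=1)
ref_means = X_ref.mean(axis=1)
```

The first axis pairs traces with references: row i of X_trace is compared with row i of X_ref, so both arrays need the same number of instances.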

transform(X_trace: ndarray, X_ref: ndarray) → ndarray

Transform the input data using the fitted model.

Predict odds scores, making use of the parameters constructed during fitting (see fit_on_unpaired_instances), which should now be stored in self.

X_trace: measurements of trace object. np.ndarray of shape (instances, repetitions_trace, features)

X_ref: measurements of reference object. np.ndarray of shape (instances, repetitions_ref, features)

returns: odds of same source / different source: one-dimensional np.ndarray with one element per instance

Parameters:

X_trace (np.ndarray) – Trace measurements, shape (instances, repetitions_trace, features).
X_ref (np.ndarray) – Reference measurements, shape (instances, repetitions_ref, features).

Returns:

Log10 LR scores for each trace/reference pair.

Return type:

np.ndarray
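The descriptions of transform and predict_proba mention odds, log10 LR scores, and probabilities. These three scales are related by simple conversions, sketched below. The helper names are hypothetical, the probability conversion assumes equal prior odds, and the column order (different source first, same source second) is an assumption rather than a documented guarantee.

```python
import numpy as np


def odds_to_log10(odds):
    """Convert same-source / different-source odds to log10 LR scores."""
    return np.log10(np.asarray(odds, dtype=float))


def log10_to_odds(log10_lr):
    """Convert log10 LR scores back to odds."""
    return 10.0 ** np.asarray(log10_lr, dtype=float)


def odds_to_probabilities(odds):
    """Two-column probability matrix, assuming equal prior odds.

    Column order (different source, same source) is an assumption.
    """
    odds = np.asarray(odds, dtype=float)
    p_same = odds / (1.0 + odds)
    return np.column_stack([1.0 - p_same, p_same])


probs = odds_to_probabilities([1.0, 9.0])
```

Odds of 1 correspond to a log10 LR of 0 and to probabilities of 0.5 for each hypothesis.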

class lir.lrsystems.two_level.TwoLevelSystem(preprocessing_pipeline: Transformer | None, pairing_function: PairingMethod, postprocessing_pipeline: Transformer | None, n_trace_instances: int, n_ref_instances: int)

Bases: LRSystem

Implement the two-level model as a common-source, feature-based LR system architecture.

During the training phase, the system calculates statistics on the unpaired instances. On application, it calculates LRs for same-source and different-source pairs. Each side of the pair may consist of multiple instances.