lir.algorithms.mcmc module

class lir.algorithms.mcmc.McmcLLRModel(distribution_h1: str, parameters_h1: dict[str, dict[str, float | int | str]] | None, distribution_h2: str, parameters_h2: dict[str, dict[str, float | int | str]] | None, bounding: Callable[[], lir.bounding.LLRBounder] | None = lir.algorithms.bayeserror.ELUBBounder, interval: tuple[float, float] = (0.05, 0.95), **mcmc_kwargs: Any)

Bases: Transformer

Use Markov Chain Monte Carlo simulations to fit a statistical distribution for each of the two hypotheses.

Using samples from the posterior distributions of the model parameters, a posterior distribution of the LR is obtained. The median of this distribution is used as the best estimate of the LR; a credible interval is also determined.
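The summary step described above can be sketched with the standard library alone. This is an illustration, not the library's implementation: the helper names `quantile` and `summarise_llr` are hypothetical, the posterior samples are made up, and pairing the H1 and H2 draws by index is an assumption.

```python
import random
import statistics


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an ascending list, with q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarise_llr(log10_p_h1, log10_p_h2, interval=(0.05, 0.95)):
    """Return (median, lower, upper) of the posterior LLR samples."""
    # Each LLR sample is the difference of the two log10 likelihoods
    # (assumption: draws are paired by index).
    llr_samples = sorted(a - b for a, b in zip(log10_p_h1, log10_p_h2))
    return (
        statistics.median(llr_samples),
        quantile(llr_samples, interval[0]),
        quantile(llr_samples, interval[1]),
    )


# Made-up posterior samples of the log10 likelihood under each hypothesis.
rng = random.Random(0)
log10_p_h1 = [rng.gauss(-1.0, 0.2) for _ in range(4000)]
log10_p_h2 = [rng.gauss(-2.5, 0.3) for _ in range(4000)]

median, lower, upper = summarise_llr(log10_p_h1, log10_p_h2)
print(f"LLR median {median:.2f}, credible interval [{lower:.2f}, {upper:.2f}]")
```

With the default interval of (0.05, 0.95), the two outer values enclose the central 90% of the posterior LLR samples.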

Parameters:
  • distribution_h1 (str) – Statistical distribution used to model H1.

  • parameters_h1 (dict[str, dict[str, float | int | str]] | None) – Parameter definitions and priors for the H1 distribution.

  • distribution_h2 (str) – Statistical distribution used to model H2.

  • parameters_h2 (dict[str, dict[str, float | int | str]] | None) – Parameter definitions and priors for the H2 distribution.

  • bounding (Callable[[], LLRBounder] | None, optional) – Bounding method factory to prevent over-extrapolation.

  • interval (tuple[float, float], optional) – Lower and upper quantile levels of the credible interval, each in the range [0, 1]. The default (0.05, 0.95) gives a 90% interval.

  • **mcmc_kwargs (Any) – Additional MCMC simulation settings passed to McmcModel, such as chain_count, tune_count, draw_count and random_seed.

apply(instances: InstanceData) → LLRData

Apply the fitted model to the supplied instances.

Parameters:

instances (InstanceData) – Instances to transform.

Returns:

LLR estimates with median and credible interval columns.

Return type:

LLRData

fit(instances: InstanceData) → Self

Fit the defined model to the supplied instances.

Parameters:

instances (InstanceData) – Training instances.

Returns:

Fitted model.

Return type:

Self

class lir.algorithms.mcmc.McmcModel(distribution: str, parameters: dict[str, dict[str, float | int | str]] | None, chain_count: int = 4, tune_count: int = 1000, draw_count: int = 1000, random_seed: int | None = None)

Bases: object

Use Markov Chain Monte Carlo simulations to fit a statistical distribution.

Parameters:
  • distribution (str) – Statistical distribution used, for example ‘normal’ or ‘binomial’.

  • parameters (dict[str, dict[str, float | int | str]] | None) – Definitions of distribution parameters and their prior distributions.

  • chain_count (int, optional) – Number of parallel MCMC chains.

  • tune_count (int, optional) – Number of tune/warm-up/burn-in samples per chain.

  • draw_count (int, optional) – Number of posterior draws per chain.

  • random_seed (int | None, optional) – Seed for the random number generator, for reproducible sampling.

Notes

Supported distributions are betabinomial, binomial, and normal. Parameter names follow the PyMC naming conventions. The parameters dictionary maps each model parameter to a prior specification containing a prior key and the corresponding prior parameters.
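As an illustration of the structure described in these notes, a parameters dictionary for a normal distribution might look like the sketch below. Only the outer shape (parameter name mapped to a prior specification) comes from the notes. The literal key name `prior`, the prior names `normal` and `halfnormal`, their arguments, and the helper `check_prior_specs` are assumptions made for this example and should be checked against the library.

```python
# Hypothetical prior specification for a 'normal' distribution.
# 'mu' and 'sigma' are PyMC's parameter names for a normal distribution;
# the prior names and their arguments below are illustrative assumptions.
parameters_normal = {
    "mu": {"prior": "normal", "mu": 0.0, "sigma": 10.0},
    "sigma": {"prior": "halfnormal", "sigma": 5.0},
}


def check_prior_specs(parameters):
    """Check that every parameter maps to a dict that contains a 'prior' key."""
    for name, spec in parameters.items():
        if not isinstance(spec, dict) or "prior" not in spec:
            raise ValueError(f"parameter {name!r} needs a dict with a 'prior' key")
    return True


check_prior_specs(parameters_normal)
```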

fit(features: ndarray) → Self

Draw samples from the posterior distributions of the parameters of a specified statistical distribution.

The posteriors are based on the specified prior distributions of these parameters and observed feature values.

Parameters:

features (np.ndarray) – Observed feature values used to update parameter priors.

Returns:

Fitted model with posterior samples.

Return type:

Self
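To make the roles of chain_count, tune_count and draw_count concrete, here is a minimal standard-library sketch of posterior sampling for the mean of a normal distribution with known standard deviation. It uses a plain random-walk Metropolis sampler, which is a stand-in chosen for brevity and not necessarily the sampler the library uses. The function names, the fixed step size, and the prior N(0, 10) are all assumptions of this example. Here the tuning draws are simply discarded, whereas real samplers typically also use them to adapt the step size.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_mu=0.0, prior_sigma=10.0):
    """Unnormalised log posterior of mu: normal likelihood times normal prior."""
    log_prior = -0.5 * ((mu - prior_mu) / prior_sigma) ** 2
    log_likelihood = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_likelihood


def sample_chain(data, tune_count, draw_count, rng, step=0.5):
    """Run one random-walk Metropolis chain; return the draws kept after tuning."""
    mu = 0.0
    current = log_posterior(mu, data)
    draws = []
    for i in range(tune_count + draw_count):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept the proposal with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if i >= tune_count:
            draws.append(mu)
    return draws


def fit(data, chain_count=4, tune_count=1000, draw_count=1000, random_seed=None):
    """Pool the post-tuning draws of several chains (run sequentially here)."""
    rng = random.Random(random_seed)
    samples = []
    for _ in range(chain_count):
        samples.extend(sample_chain(data, tune_count, draw_count, rng))
    return samples


features = [1.0, 1.2, 0.8, 1.1, 0.9]
posterior_mu = fit(features, random_seed=42)
print(len(posterior_mu), sum(posterior_mu) / len(posterior_mu))
```

With the defaults, the pooled result holds chain_count × draw_count = 4000 posterior draws, and its mean should lie close to the sample mean of the observed features because the prior is wide.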

transform(features: ndarray) → ndarray

Get samples of the posterior distribution of the (log10) probability.

Use the samples of the posterior distributions of the parameters, in combination with the selected statistical distribution, to get samples of the posterior distribution of the (log10) probability, evaluated for specified feature values.

Parameters:

features (np.ndarray) – Feature values for which probabilities are calculated.

Returns:

Samples of log10 probabilities.

Return type:

np.ndarray
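The evaluation described for transform can be sketched for a normal distribution as follows. This is a plain-Python illustration under stated assumptions: the function names are hypothetical, the posterior draws are made up, and the layout of one row per posterior draw and one column per feature value is an assumption about the returned array, not something the documentation confirms.

```python
import math


def log10_normal_pdf(x, mu, sigma):
    """log10 of the normal density at x."""
    ln_pdf = -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return ln_pdf / math.log(10)


def transform(features, posterior_samples):
    """Return one row per posterior draw and one column per feature value."""
    return [
        [log10_normal_pdf(x, mu, sigma) for x in features]
        for mu, sigma in posterior_samples
    ]


# Made-up posterior draws of (mu, sigma) and feature values to evaluate.
posterior_samples = [(0.0, 1.0), (0.1, 0.9), (-0.1, 1.1)]
features = [0.0, 0.5, 2.0]
log10_probabilities = transform(features, posterior_samples)
```

Each column then holds the posterior samples of the log10 probability for one feature value, which is the quantity that a median and credible interval can be computed from.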