BioCCP

Intro

BioCCP.jl applies the Coupon Collector's Problem to combinatorial biotechnology, in particular to help determine the minimum sample size of screening experiments.

Modular designs are considered, created by randomly combining r modules from a set of n available modules. The probability of drawing each module during the generation of the designs is specified by a probability/abundance vector p_vec. The parameter m sets how many complete sets of modules one wants to observe; it defaults to 1 and can be increased.
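
The set-up above can be illustrated with a small Monte Carlo sketch. It does not call BioCCP.jl: `simulate_min_designs` is a hypothetical helper, and it assumes that the r modules of a design are drawn independently and with replacement according to p_vec, which is a simplification made here for illustration.

```julia
using Random

# Hypothetical helper (not part of BioCCP.jl): generate random designs until
# every module has been observed at least m times; return the number of designs.
function simulate_min_designs(rng, p_vec; m = 1, r = 1)
    n = length(p_vec)
    cdf = cumsum(p_vec ./ sum(p_vec))      # normalized cumulative probabilities
    counts = zeros(Int, n)                 # how often each module was observed
    t = 0
    while any(c -> c < m, counts)
        t += 1
        for _ in 1:r                       # assumption: r independent draws per design
            idx = min(searchsortedfirst(cdf, rand(rng)), n)
            counts[idx] += 1
        end
    end
    return t
end

rng = MersenneTwister(42)
samples = [simulate_min_designs(rng, ones(100)) for _ in 1:2000]
println(sum(samples) / length(samples))    # sample mean of T for n = 100, m = 1, r = 1
```

For 100 equally likely modules the sample mean should land near the expected value of roughly 519 designs reported in the examples on this page.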

For a specific combinatorial design set-up of interest, a report with results regarding minimum sample sizes can be easily retrieved by using the provided Pluto notebook.

Functions

BioCCP.expectation_minsamplesize - Function
expectation_minsamplesize(n; p_vec = ones(n), m = 1, r = 1, normalize = true)

Calculates E[T], the expected minimum number of designs needed to observe each module at least m times.

  • n: number of modules in the design space
  • p_vec: vector with the probabilities or abundances of the different modules
  • m: number of complete sets of modules that need to be collected
  • r: number of modules per design
  • normalize: if true, normalize p_vec

References:

  • Doumas, A. V., & Papanicolaou, V. G. (2016). The coupon collector’s problem revisited: generalizing the double Dixie cup problem of Newman and Shepp. ESAIM: Probability and Statistics, 20, 367-399.
  • Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1), 39-66.

Examples

julia> n = 100
julia> expectation_minsamplesize(n; p_vec = ones(n), m = 1, r = 1, normalize = true)
519.0
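
As a sanity check on the example above, the classical closed form for equally likely modules with m = 1 and r = 1 is E[T] = n * H_n, where H_n is the n-th harmonic number. The sketch below evaluates it directly; it is an independent check and not necessarily how the package computes the value.

```julia
# Closed form for the uniform case with m = 1 and r = 1: E[T] = n * H_n.
n = 100
harmonic = sum(1 / k for k in 1:n)   # H_n, the n-th harmonic number
expected_T = n * harmonic            # about 518.74
println(ceil(expected_T))            # 519.0
```

Rounding up (or to the nearest integer) gives 519, in agreement with the documented output.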

BioCCP.std_minsamplesize - Function
std_minsamplesize(n; p_vec = ones(n), m = 1, r = 1, normalize = true)

Calculates std[T], the standard deviation of the minimum number of designs needed to observe each module at least m times.

  • n: number of modules in the design space
  • p_vec: vector with the probabilities or abundances of the different modules
  • m: number of complete sets of modules that need to be collected
  • r: number of modules per design
  • normalize: if true, normalize p_vec

Examples

julia> n = 100
julia> std_minsamplesize(n; p_vec = ones(n), m = 1, r = 1, normalize = true)
126.0
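
For equally likely modules with m = 1 and r = 1, the variance of T has the classical closed form Var[T] = n^2 * sum(1/k^2) - n * H_n, with the sum running over k = 1..n. The sketch below evaluates it as an independent check of the example; it is not claimed to be the package's own computation.

```julia
# Closed form for the uniform case with m = 1 and r = 1:
# Var[T] = n^2 * sum_{k=1}^{n} 1/k^2 - n * H_n
n = 100
harmonic = sum(1 / k for k in 1:n)
sum_inv_sq = sum(1 / k^2 for k in 1:n)
var_T = n^2 * sum_inv_sq - n * harmonic
std_T = sqrt(var_T)
println(round(std_T; digits = 2))    # about 125.82
```

Rounded to an integer this gives 126, in agreement with the documented output.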

BioCCP.success_probability - Function
success_probability(n, t; p_vec = ones(n), m = 1, r = 1, normalize = true)

Calculates the success probability F(t) = P(T < t), i.e. the chance that the number of designs required to see each module at least m times is smaller than t.

  • n: number of modules in design space
  • t: sample size/number of designs for which to calculate the success probability
  • p_vec: vector with the probabilities or abundances of the different modules
  • m: number of complete sets of modules that need to be collected
  • r: number of modules per design
  • normalize: if true, normalize p_vec

References:

  • Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1), 39-66.

Examples

julia> n = 100
julia> t = 600
julia> success_probability(n, t; p_vec = ones(n), m = 1, r = 1, normalize = true)
0.7802171997092149
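
One way to approximate this probability is to treat the number of occurrences of each module after t designs as an independent Poisson variable with mean p_i * t, so that F(t) is the product over modules of the probability of at least m occurrences. The helper `poisson_success_probability` below is hypothetical and only sketches that approximation for r = 1; it reproduces the value in the example but is not claimed to be the package's implementation.

```julia
# Hypothetical helper (not the BioCCP.jl source): Poisson approximation of
# the probability that every module has been seen at least m times after t designs.
function poisson_success_probability(p_vec, t; m = 1)
    p = p_vec ./ sum(p_vec)                  # normalize to probabilities
    prob = 1.0
    for p_i in p
        λ = p_i * t                          # expected number of occurrences of this module
        # probability that this module occurred fewer than m times
        fewer_than_m = sum(exp(-λ) * λ^j / factorial(j) for j in 0:(m - 1))
        prob *= 1 - fewer_than_m
    end
    return prob
end

println(poisson_success_probability(ones(100), 600))   # ≈ 0.7802
```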

BioCCP.expectation_fraction_collected - Function
expectation_fraction_collected(n, t; p_vec = ones(n), r = 1, normalize=true)

Calculates the expected fraction of all modules observed after collecting t designs.

  • n: number of modules in design space
  • t: sample size/number of designs for which to calculate the expected fraction of modules observed
  • p_vec: vector with the probabilities or abundances of the different modules
  • r: number of modules per design
  • normalize: if true, normalize p_vec

References:

  • Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1), 39-66.

Examples

julia> n = 100
julia> t = 200
julia> expectation_fraction_collected(n, t; p_vec = ones(n), r = 1, normalize=true)
0.8660203251420364
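
For r = 1, the probability that module i is still unobserved after t designs is (1 - p_i)^t, so the expected fraction observed is the average of 1 - (1 - p_i)^t over all modules. The helper `expected_fraction_sketch` below is hypothetical and only illustrates this formula; it matches the example value but is not claimed to be the package's code.

```julia
# Hypothetical helper (not the BioCCP.jl source): expected fraction of modules
# observed at least once after t designs with one module per design (r = 1).
function expected_fraction_sketch(p_vec, t)
    p = p_vec ./ sum(p_vec)                       # normalize to probabilities
    return sum(1 - (1 - p_i)^t for p_i in p) / length(p)
end

println(expected_fraction_sketch(ones(100), 200))  # ≈ 0.8660
```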

BioCCP.prob_occurence_module - Function
prob_occurence_module(p, t, j)

Calculates the probability that a specific module with module probability p has occurred j times after collecting t designs.

The sampling of the modules is assumed to follow independent Poisson processes.

  • p: module probability
  • t: sample size/number of designs
  • j: number of occurrences

References:

  • Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1), 39-66.

Examples

julia> p = 0.005
julia> t = 500
julia> j = 2
julia> prob_occurence_module(p, t, j)
0.25651562069968376
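
Under the Poisson assumption stated above, the number of occurrences of a module after t designs has mean p * t, and the probability of exactly j occurrences is the Poisson probability mass function. The helper `poisson_occurrence_probability` is hypothetical; it sketches that formula and reproduces the example value for p = 0.005, t = 500 and j = 2.

```julia
# Hypothetical helper (not the BioCCP.jl source): Poisson probability that a
# module with probability p occurs exactly j times in t designs.
function poisson_occurrence_probability(p, t, j)
    λ = p * t                                # expected number of occurrences
    return exp(-λ) * λ^j / factorial(j)
end

p = 0.005
t = 500
j = 2
println(poisson_occurrence_probability(p, t, j))   # ≈ 0.2565
```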

References

  • Doumas, A. V., & Papanicolaou, V. G. (2016). The coupon collector’s problem revisited: generalizing the double Dixie cup problem of Newman and Shepp. ESAIM: Probability and Statistics, 20, 367-399.
  • Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1), 39-66.