BioCCP
Intro
BioCCP.jl applies the Coupon Collector's Problem to combinatorial biotechnology, in particular to aid minimum sample size determination of screening experiments.
Modular designs are considered, created by randomly combining r modules from a set of n available modules. The module probabilities during the generation of the designs are specified by a probability/abundance vector p_vec. Depending on how many complete sets of modules one wants to observe, parameter m can be increased from its default value of 1 to a higher value.
For a specific combinatorial design set-up of interest, a report with results regarding minimum sample sizes can be easily retrieved by using the provided Pluto notebook.
Functions
BioCCP.expectation_minsamplesize — Function
expectation_minsamplesize(n; p_vec = ones(n), m = 1, r = 1, normalize = true)
Calculates E[T], the expected number of designs needed (the expected minimum sample size) to observe each module at least m times.
- n: number of modules in the design space
- p_vec: vector with the probabilities or abundances of the different modules
- m: number of complete sets of modules that need to be collected
- r: number of modules per design
- normalize: if true, normalize p_vec
References:
- Doumas, A. V., & Papanicolaou, V. G. (2016). The coupon collector’s problem revisited: generalizing the double Dixie cup problem of Newman and Shepp. ESAIM: Probability and Statistics, 20, 367-399.
- Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1), 39-66.
Examples
julia> n = 100
julia> expectation_minsamplesize(n; p_vec = ones(n), m = 1, r = 1, normalize = true)
519.0
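For the default setting used in this example (equal module probabilities, m = 1, r = 1), the result can be cross-checked against the classical coupon collector formula E[T] = n * H_n, where H_n is the n-th harmonic number. The sketch below is plain Julia and does not call BioCCP; the helper name classical_expectation is hypothetical, and the formula covers only this special case, not the general p_vec, m, and r handling of the package.

```julia
# Classical coupon collector expectation for n equally likely modules,
# one module per design (r = 1) and one complete set (m = 1).
# Closed-form cross-check only, not BioCCP's implementation.
function classical_expectation(n::Integer)
    harmonic = sum(1 / k for k in 1:n)   # H_n = 1 + 1/2 + ... + 1/n
    return n * harmonic                  # E[T] = n * H_n
end

n = 100
expected_designs = classical_expectation(n)
println(expected_designs)        # about 518.74
println(ceil(expected_designs))  # 519.0
```

Rounded up to a whole number of designs, the closed form agrees with the 519.0 shown above.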
BioCCP.std_minsamplesize — Function
std_minsamplesize(n; p_vec = ones(n), m = 1, r = 1, normalize = true)
Calculates std[T], the standard deviation of the minimum number of designs needed to observe each module at least m times.
- n: number of modules in the design space
- p_vec: vector with the probabilities or abundances of the different modules
- m: number of complete sets of modules that need to be collected
- r: number of modules per design
- normalize: if true, normalize p_vec
Examples
julia> n = 100
julia> std_minsamplesize(n; p_vec = ones(n), m = 1, r = 1, normalize = true)
126.0
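In the same equal-probability case with m = 1 and r = 1, T is a sum of independent geometric waiting times: once n - k modules have been seen, a new one appears with probability k/n per design. Summing the geometric variances (1 - q) / q^2 over q = k/n gives the variance of T. The sketch below assumes exactly this special case; classical_std is a hypothetical helper name, not a BioCCP function.

```julia
# Standard deviation of the classical coupon collector waiting time
# for n equally likely modules (m = 1, r = 1).
# Each stage is geometric with success probability q = k / n,
# and a geometric variable has variance (1 - q) / q^2.
function classical_std(n::Integer)
    variance = sum((1 - k / n) / (k / n)^2 for k in 1:n)
    return sqrt(variance)
end

n = 100
println(classical_std(n))        # about 125.82
println(ceil(classical_std(n)))  # 126.0
```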
BioCCP.success_probability — Function
success_probability(n, t; p_vec = ones(n), m = 1, r = 1, normalize = true)
Calculates the success probability F(t) = P(T < t) or the chance that the required number of designs to see each module at least m times is smaller than t.
- n: number of modules in design space
- t: sample size/number of designs for which to calculate the success probability
- p_vec: vector with the probabilities or abundances of the different modules
- m: number of complete sets of modules that need to be collected
- r: number of modules per design
- normalize: if true, normalize p_vec
References:
- Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1), 39-66.
Examples
julia> n = 100
julia> t = 600
julia> success_probability(n, t; p_vec = ones(n), m = 1, r = 1, normalize = true)
0.7802171997092149
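The value above is consistent with the Poissonization approach surveyed by Boneh and Hofri: if each module's count after t designs is treated as an independent Poisson variable with rate p_i * t, the success probability is the product over modules of the probability of at least m occurrences. The sketch below is written under that assumption with r = 1; poisson_at_least and poissonized_success are hypothetical helper names, and this is not claimed to be the package's actual code.

```julia
# Probability that a Poisson(rate) count is at least m.
poisson_at_least(rate, m) =
    1 - sum(exp(-rate) * rate^j / factorial(j) for j in 0:m-1)

# Poissonized success probability for t designs with one module each (r = 1).
# p_vec may hold probabilities or abundances; it is normalized here.
function poissonized_success(t, p_vec; m = 1)
    p = p_vec ./ sum(p_vec)
    return prod(poisson_at_least(p_i * t, m) for p_i in p)
end

n = 100
println(poissonized_success(600, ones(n)))         # about 0.7802
println(poissonized_success(600, ones(n); m = 2))  # lower: two full sets are harder
```

For m = 1 and equal probabilities this reduces to (1 - exp(-t/n))^n.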
BioCCP.expectation_fraction_collected — Function
expectation_fraction_collected(n, t; p_vec = ones(n), r = 1, normalize=true)
Calculates the expected fraction of all modules observed after collecting t designs.
- n: number of modules in design space
- t: sample size/number of designs for which to calculate the expected fraction of modules observed
- p_vec: vector with the probabilities or abundances of the different modules
- r: number of modules per design
- normalize: if true, normalize p_vec
References:
- Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1), 39-66.
Examples
julia> n = 100
julia> t = 200
julia> expectation_fraction_collected(n, t; p_vec = ones(n), r = 1, normalize=true)
0.8660203251420364
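With one module per design (r = 1), a module with probability p_i is still missing after t designs with probability (1 - p_i)^t, so the expected fraction observed is the average of 1 - (1 - p_i)^t over all modules. A minimal sketch under that r = 1 assumption follows; expected_fraction_seen is a hypothetical helper name and does not come from BioCCP.

```julia
# Expected fraction of modules seen at least once after t designs,
# assuming one module per design (r = 1).
# p_vec may hold probabilities or abundances; it is normalized here.
function expected_fraction_seen(t, p_vec)
    p = p_vec ./ sum(p_vec)
    return sum(1 - (1 - p_i)^t for p_i in p) / length(p)
end

n = 100
println(expected_fraction_seen(200, ones(n)))  # about 0.8660
```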
BioCCP.prob_occurence_module — Function
prob_occurence_module(p, t, j)
Calculates the probability that a specific module with module probability p has occurred j times after collecting t designs.
The sampling processes of the modules are assumed to be independent Poisson processes.
- p: module probability
- t: sample size/number of designs
- j: number of occurrences
References:
- Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1), 39-66.
Examples
julia> p = 0.005
julia> t = 500
julia> j = 2
julia> prob_occurence_module(p, t, j)
0.25651562069968376
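Under the stated Poisson assumption, the number of occurrences of a module after t designs has rate p * t, so the probability of exactly j occurrences is the Poisson probability mass exp(-p*t) * (p*t)^j / j!. The short sketch below evaluates that formula directly for p = 0.005, t = 500 and j = 2; poisson_occurrence is a hypothetical helper name used for illustration, not the package function.

```julia
# Poisson probability of exactly j occurrences of a module
# with probability p after t designs (rate = p * t).
poisson_occurrence(p, t, j) = exp(-p * t) * (p * t)^j / factorial(j)

p = 0.005
t = 500
j = 2
println(poisson_occurrence(p, t, j))  # about 0.2565
```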
References
- Doumas, A. V., & Papanicolaou, V. G. (2016). The coupon collector’s problem revisited: generalizing the double Dixie cup problem of Newman and Shepp. ESAIM: Probability and Statistics, 20, 367-399.
- Boneh, A., & Hofri, M. (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Stochastic Models, 13(1), 39-66.