Documentation for PottsEvolver.
PottsEvolver.aa_alphabetPottsEvolver.codon_alphabetPottsEvolver.nt_alphabetBioSequenceMappings.AlignmentPottsEvolver.AASequencePottsEvolver.AASequencePottsEvolver.BranchLengthMeaningPottsEvolver.CodonSequencePottsEvolver.CodonSequencePottsEvolver.NumSequencePottsEvolver.PottsGraphPottsEvolver.PottsGraphPottsEvolver.SamplingParametersPottsEvolver.average_transition_ratePottsEvolver.energyPottsEvolver.genetic_codePottsEvolver.genetic_codePottsEvolver.mcmc_samplePottsEvolver.mcmc_samplePottsEvolver.mcmc_steps!PottsEvolver.read_graphPottsEvolver.read_potts_graph
PottsEvolver.mcmc_steps! — Function
mcmc_steps!(
s::AbstractSequence, g, num_steps::Integer, p::SamplingParameters;
gibbs_holder, kwargs...
)Perform num_steps discrete MCMC steps starting from sequence s and using graph g. The step type (:gibbs, :metropolis) and the interpretation of num_steps (:changed, :accepted, :proposed) is set using p (see ?SamplingParameters). Modifies the input sequence s and returns it.
mcmc_steps!(
sequence::AbstractSequence, g, Tmax::AbstractFloat, parameters::SamplingParameters;
kwargs...
)
mcmc_steps!(
state::CTMCState, g, Tmax::AbstractFloat, step_type::Symbol; kwargs...
)Sample g during time Tmax (continuous) from sequence seq. The step type is p.step_type (see ?SamplingParameters). Modifies the input sequence s and returns it. Allocates a CTMCState (three matrices of order q*L).
Notes
- The form with
state::CTMCStatewill use a pre-allocatedCTMCStatefor calculations. - Expects
parameters.sampling_typeto be:continuous. Fails if otherwise.
PottsEvolver.aa_alphabet — Constant
aa_alphabet
nt_alphabet
codon_alphabetDefault alphabets for PottsEvolver.
PottsEvolver.codon_alphabet — Constant
aa_alphabet
nt_alphabet
codon_alphabetDefault alphabets for PottsEvolver.
PottsEvolver.nt_alphabet — Constant
aa_alphabet
nt_alphabet
codon_alphabetDefault alphabets for PottsEvolver.
BioSequenceMappings.Alignment — Method
Alignment(sequences; alphabet, names, as_codons=true)Construct a BioSequenceMappings.Alignment from a set of sequences. If the sequences are AASequence, alphabet defaults to aa_alphabet. If they are CodonSequence, as_codons can be used to decide whether the alignment should store codons or amino acids. alphabet can be determined automatically from this.
PottsEvolver.AASequence — Type
mutable struct AASequence{T<:Integer} <: AbstractSequenceField: seq::Vector{T}. Wrapper around a vector of integers, with implied alphabet PottsEvolver.aa_alphabet.
PottsEvolver.AASequence — Method
AASequence(L; T)Return a random AASequence{T} of length L.
PottsEvolver.BranchLengthMeaning — Type
BranchLengthMeaning
type::Symbol
length::SymbolDefine how the branch lengths should be interpreted when sampling on a tree with a discrete-time algorithm. This is done in two consecutive steps.
- Decide whether the branch length represents mcmc steps or sweeps.
- If
type==:sweep, multiply the branch length by the length of the sequence. - Else, if
type==:step, leave it as is.
- If
- Convert the branch length to an integer value which will be the number of mcmc steps.
- if
length==:exact, the branch length is expected to be an integer (rtol=1e-6). - if
length==:round, round it to the closestInt. - if
length==:poisson, sample a poisson distribution with the branch length as mean.
- if
PottsEvolver.CodonSequence — Method
CodonSequence(L::Int; source=:aa, T)Sample L states at random of the type of source (:aa or :codon):
- if
:codon, sample codons at random - if
:aa, sample amino acids at random and reverse translate them randomly to matching codons
Underlying integer type is T.
PottsEvolver.CodonSequence — Method
CodonSequence(seq::Vector{Integer}; source=:aa)Build a CodonSequence from seq:
- if
source==:codon,seqis interpreted as representing codons (seecodon_alphabet); - if
source==:aa,seqis interpreted as representing amino acids (seeaa_alphabet); matching codons are randomly chosen using thePottsEvolver.reverse_code_randmethod.
PottsEvolver.NumSequence — Type
NumSequence{T<:Integer, q}A mutable struct representing a sequence of integers with a maximum value constraint q.
- Explicit Construction:
NumSequence(seq::AbstractVector{T}, q::Integer)orNumSequence{T,q}(seq) - Random Construction:
NumSequence{T,q}(L::Integer)orNumSequence(L::Integer, q::Integer; T=IntType). Construct aNumSequenceof lengthLwith random integers of typeTin the range[1, q].
Examples
julia> seq = [1, 2, 3, 4]
julia> num_seq = NumSequence(seq, 4)
julia> random_seq = NumSequence(10, 5; T=Int8)
julia> max_value = num_seq.q # Returns 4
julia> copied_seq = copy(num_seq)PottsEvolver.PottsGraph — Type
PottsGraph(L, q[, T]; init = :null, alphabet)Return a PottsGraph{T} of the required size.
init == :null: parameters are intialized to zero.init == :rand: parameters are randomly sampled usingJrandandhrandkeywords.
alphabet is aa_alphabet if q=21, nothing otherwise.
Random initialization
hrandshould be a functionq -> h.Jrandshould be a functionq -> J.
Jrand does not have to return a symetric matrix. The output matrix is made symetric with zeroes on the diagonal blocks.
PottsEvolver.PottsGraph — Type
PottsGraph{T}- Array
Jof dimensionsq x q x L x Land eltypeT - Array
hof dimensionsq x Land eltypeT - Inverse temperature
β alphabet
PottsEvolver.SamplingParameters — Type
mutable struct SamplingParametersConstruct using keyword arguments:
sampling_type::Symbol = :discrete
step_type::Symbol = :gibbs
Teq::Real = 0 # time between samples, for chain
burnin::Real = 5*Teq # time before first sample
step_meaning::Symbol = :accepted # relevant for :discrete
fraction_gap_step::Float64 = 0.9 # relevant for :discrete and codon sampling
branchlength_meaning::BranchLengthMeaning # relevant for :discrete and sampling on a tree
substitution_rate::Union{Nothing,Float64} = nothing # relevant for :continuous
track_substitutions::Bool = false # relevant for continuoussampling_type can be :discrete or :continuous. The time between samples is Teq, and an extra time burnin is taken before the first sample. Allowed values of step_type differ between the continuous and discrete cases.
Discrete sampling
Teqis measured in swaps: accepted (or attempted) changes of one sequence position. It must be an integer.burnin: number of steps starting from the initial sequence before the first sample. Must be an integer.step_type::gibbsonly. I intend to implement:metropolisand maybe:glauber.step_meaning(:discrete only): whether a swap count towards equilibration or not. It can take three values:proposed: all mcmc steps count towards equilibration:accepted: only accepted steps count (all steps for non-codon Gibbs):changed: only steps that lead to a change count (Gibbs can resample the same state)
Note: Gibbs steps for codons are more complicated, since they involve the possibility Metropolis step for gaps, which can be rejected.
fraction_gap_step: fraction of:metropolissteps concerning gaps when sampling with codons. This is only relevant for discrete sampling with codons.branchlength_meaning: how branch-lengths on a tree must be converted to swaps. See?BranchLengthMeaningfor information. This is only useful if you sample along branches of a tree.
Continuous sampling
For continuous sampling, there is no notion of swap/sweep or of accepted/rejected steps. Since any positive real number is acceptable as a sampling time, branch lenghts of trees can be used directly.
Teqandburninare floats.step_typecan be:metropolis,:glauberor:sqrt.substitution_rateis the average substitution rate for a given Potts model (the average runs over sequences). It is the result ofaverage_substitution_rate. This is computed automatically if not provided (but takes some time).track_substitutions: track all substitutions (position, state, time) occuring during the Gillespie simulation. They are returned in theinfooutput ofmcmc_sample.
PottsEvolver.average_transition_rate — Method
average_transition_rate(g::PottsGraph, step_type, s0::AbstractSequence)
average_transition_rate(g::PottsGraph, step_type, S::Vector{<:AbstractSequence})
average_transition_rate(
g::PottsGraph, step_type, fastafile::AbstractString; seq_type=AASequence
)Compute the average transition rate for the continuous time Markov chain based on g. A sample from g is generated using a discrete time Markov chain for this purpose.
In the first form, a sample from g is used for averaging. s0 is only used to initialize the CTMCState and provide a type (codon or aa), and the keyword arguments are used to parametrize the sampling process.
In the second form, the sample is provided as argument, either as a vector of sequences or as a fasta file.
PottsEvolver.energy — Method
energy(s, g::PottsGraph)Energy of sequence s in g.
PottsEvolver.genetic_code — Method
genetic_code(c::Codon)Translate c and return the amino acid as a Char.
PottsEvolver.genetic_code — Method
genetic_code(x::Integer)Translate the ith codon and return the index of the corresponding amino acid, using the default aa_alphabet
PottsEvolver.mcmc_sample — Function
mcmc_sample(
g::PottsGraph, M::Integer, s0::AbstractSequence, params::SamplingParameters;
rng=Random.GLOBAL_RNG, verbose=0, progress_meter=true, alignment_output=true,
)
mcmc_sample(
g::PottsGraph, M::Integer, params::SamplingParameters; init=:random_num, kwargs...)
)
mcmc_sample(g, tvals::AbstractVector, s0, params::SamplingParameters; kwargs...)First form: sample g for M steps starting from sequence s0, using parameters in params. Return value: named tuple with fields
sequences: alignment (or vector) of sequencestvals: vector with the number of steps at each sampleinfo: information about the runparams: parameters of the run.
Second form: same, but initial sequence is provided through the init kwarg. See ?get_init_sequence for details on how the initial sequence is determined from init.
Third form: provide a set of times tvals at which samples are taken. Can also be used with the init kwarg.
Note: this function is not very efficient if M is small.
Sampling details are determined by parameters, see ?SamplingParameters. Whether to use the genetic code is determined by the type of the init sequence: it is used if init::CodonSequence, otherwise not.
PottsEvolver.mcmc_sample — Method
mcmc_sample(g, tree, M=1, params; alignment_output, translate_output, init, kwargs...)Sample g along branches of tree. Repeat the process M times, returning an array of named tuples of the form (; tree, leaf_sequences, internal_sequences). If M is omitted, the output is just a named tuple (no array). Sequences in leaf_sequences and internal_sequences are sorted in post-order traversal.
The sequence to be used as the root should be provided using the init kwarg, see ?PottsEvolver.get_init_sequence.
If alignment_output, the sequences will be wrapped into an Alignment structure. Otherwise, they are in a dictionary indexed by node label. If translate_output and if the root sequence was a CodonSequence, the output alignment will contain the amino acid sequence and not the codons.
Warning
The Teq field of params is not used in the sampling. However, the burnin field will be used to set the root sequence: burnin mcmc steps will be performed starting from the input sequence, and the result is placed at the root. If you want a precise root sequence to be used, set burnin=0 in params.
PottsEvolver.read_graph — Function
read_graph
read_potts_graphRead PottsGraph object from a file.
PottsEvolver.read_potts_graph — Function
read_potts_graphAlias for read_graph.