Reference · PottsEvolver.jl

Documentation for PottsEvolver.

PottsEvolver.aa_alphabet
PottsEvolver.codon_alphabet
PottsEvolver.nt_alphabet
BioSequenceMappings.Alignment
PottsEvolver.AASequence
PottsEvolver.AASequence
PottsEvolver.BranchLengthMeaning
PottsEvolver.CodonSequence
PottsEvolver.CodonSequence
PottsEvolver.NumSequence
PottsEvolver.PottsGraph
PottsEvolver.PottsGraph
PottsEvolver.SamplingParameters
PottsEvolver.average_transition_rate
PottsEvolver.energy
PottsEvolver.genetic_code
PottsEvolver.genetic_code
PottsEvolver.mcmc_sample
PottsEvolver.mcmc_sample
PottsEvolver.mcmc_steps!
PottsEvolver.read_graph
PottsEvolver.read_potts_graph

PottsEvolver.mcmc_steps! — Function

mcmc_steps!(
    s::AbstractSequence, g, num_steps::Integer, p::SamplingParameters;
    gibbs_holder, kwargs...
)

Perform num_steps discrete MCMC steps starting from sequence s and using graph g. The step type (:gibbs, :metropolis) and the interpretation of num_steps (:changed, :accepted, :proposed) is set using p (see ?SamplingParameters). Modifies the input sequence s and returns it.

mcmc_steps!(
    sequence::AbstractSequence, g, Tmax::AbstractFloat, parameters::SamplingParameters;
    kwargs...
)
mcmc_steps!(
    state::CTMCState, g, Tmax::AbstractFloat, step_type::Symbol; kwargs...
)

Sample g during time Tmax (continuous) from sequence seq. The step type is p.step_type (see ?SamplingParameters). Modifies the input sequence s and returns it. Allocates a CTMCState (three matrices of order q*L).

Notes

The form with state::CTMCState will use a pre-allocated CTMCState for calculations.
Expects parameters.sampling_type to be :continuous. Fails if otherwise.

PottsEvolver.aa_alphabet — Constant

aa_alphabet
nt_alphabet
codon_alphabet

Default alphabets for PottsEvolver.

PottsEvolver.codon_alphabet — Constant

aa_alphabet
nt_alphabet
codon_alphabet

Default alphabets for PottsEvolver.

PottsEvolver.nt_alphabet — Constant

aa_alphabet
nt_alphabet
codon_alphabet

Default alphabets for PottsEvolver.

BioSequenceMappings.Alignment — Method

Alignment(sequences; alphabet, names, as_codons=true)

Construct a BioSequenceMappings.Alignment from a set of sequences. If the sequences are AASequence, alphabet defaults to aa_alphabet. If they are CodonSequence, as_codons can be used to decide whether the alignment should store codons or amino acids. alphabet can be determined automatically from this.

PottsEvolver.AASequence — Type

mutable struct AASequence{T<:Integer} <: AbstractSequence

Field: seq::Vector{T}. Wrapper around a vector of integers, with implied alphabet PottsEvolver.aa_alphabet.

PottsEvolver.AASequence — Method

AASequence(L; T)

Return a random AASequence{T} of length L.

PottsEvolver.BranchLengthMeaning — Type

BranchLengthMeaning

    type::Symbol
    length::Symbol

Define how the branch lengths should be interpreted when sampling on a tree with a discrete-time algorithm. This is done in two consecutive steps.

Decide whether the branch length represents mcmc steps or sweeps.
- If type==:sweep, multiply the branch length by the length of the sequence.
- Else, if type==:step, leave it as is.
Convert the branch length to an integer value which will be the number of mcmc steps.
- if length==:exact, the branch length is expected to be an integer (rtol=1e-6).
- if length==:round, round it to the closest Int.
- if length==:poisson, sample a poisson distribution with the branch length as mean.

PottsEvolver.CodonSequence — Method

CodonSequence(L::Int; source=:aa, T)

Sample L states at random of the type of source (:aa or :codon):

if :codon, sample codons at random
if :aa, sample amino acids at random and reverse translate them randomly to matching codons

Underlying integer type is T.

PottsEvolver.CodonSequence — Method

CodonSequence(seq::Vector{Integer}; source=:aa)

Build a CodonSequence from seq:

if source==:codon, seq is interpreted as representing codons (see codon_alphabet);
if source==:aa, seq is interpreted as representing amino acids (see aa_alphabet); matching codons are randomly chosen using the PottsEvolver.reverse_code_rand method.

PottsEvolver.NumSequence — Type

NumSequence{T<:Integer, q}

A mutable struct representing a sequence of integers with a maximum value constraint q.

Explicit Construction: NumSequence(seq::AbstractVector{T}, q::Integer) or NumSequence{T,q}(seq)
Random Construction: NumSequence{T,q}(L::Integer) or NumSequence(L::Integer, q::Integer; T=IntType). Construct a NumSequence of length L with random integers of type T in the range [1, q].

Examples

julia> seq = [1, 2, 3, 4]
julia> num_seq = NumSequence(seq, 4)

julia> random_seq = NumSequence(10, 5; T=Int8)


julia> max_value = num_seq.q  # Returns 4


julia> copied_seq = copy(num_seq)

PottsEvolver.PottsGraph — Type

PottsGraph(L, q[, T]; init = :null, alphabet)

Return a PottsGraph{T} of the required size.

init == :null: parameters are intialized to zero.
init == :rand: parameters are randomly sampled using Jrand and hrand keywords.

alphabet is aa_alphabet if q=21, nothing otherwise.

Random initialization

hrand should be a function q -> h.
Jrand should be a function q -> J.

Jrand does not have to return a symetric matrix. The output matrix is made symetric with zeroes on the diagonal blocks.

PottsEvolver.PottsGraph — Type

PottsGraph{T}

Array J of dimensions q x q x L x L and eltype T
Array h of dimensions q x L and eltype T
Inverse temperature β
alphabet

PottsEvolver.SamplingParameters — Type

mutable struct SamplingParameters

Construct using keyword arguments:

sampling_type::Symbol = :discrete
step_type::Symbol = :gibbs
Teq::Real = 0 # time between samples, for chain
burnin::Real = 5*Teq # time before first sample
step_meaning::Symbol = :accepted # relevant for :discrete
fraction_gap_step::Float64 = 0.9 # relevant for :discrete and codon sampling
branchlength_meaning::BranchLengthMeaning # relevant for :discrete and sampling on a tree
substitution_rate::Union{Nothing,Float64} = nothing # relevant for :continuous
track_substitutions::Bool = false # relevant for continuous

sampling_type can be :discrete or :continuous. The time between samples is Teq, and an extra time burnin is taken before the first sample. Allowed values of step_type differ between the continuous and discrete cases.

Discrete sampling

Teq is measured in swaps: accepted (or attempted) changes of one sequence position. It must be an integer.
burnin: number of steps starting from the initial sequence before the first sample. Must be an integer.
step_type: :gibbs only. I intend to implement :metropolis and maybe :glauber.
step_meaning (:discrete only): whether a swap count towards equilibration or not. It can take three values
- :proposed: all mcmc steps count towards equilibration
- :accepted: only accepted steps count (all steps for non-codon Gibbs)
- :changed: only steps that lead to a change count (Gibbs can resample the same state)
Note: Gibbs steps for codons are more complicated, since they involve the possibility Metropolis step for gaps, which can be rejected.
fraction_gap_step: fraction of :metropolis steps concerning gaps when sampling with codons. This is only relevant for discrete sampling with codons.
branchlength_meaning: how branch-lengths on a tree must be converted to swaps. See ?BranchLengthMeaning for information. This is only useful if you sample along branches of a tree.

Continuous sampling

For continuous sampling, there is no notion of swap/sweep or of accepted/rejected steps. Since any positive real number is acceptable as a sampling time, branch lenghts of trees can be used directly.

Teq and burnin are floats.
step_type can be :metropolis, :glauber or :sqrt.
substitution_rate is the average substitution rate for a given Potts model (the average runs over sequences). It is the result of average_substitution_rate. This is computed automatically if not provided (but takes some time).
track_substitutions: track all substitutions (position, state, time) occuring during the Gillespie simulation. They are returned in the info output of mcmc_sample.

PottsEvolver.average_transition_rate — Method

average_transition_rate(g::PottsGraph, step_type, s0::AbstractSequence)
average_transition_rate(g::PottsGraph, step_type, S::Vector{<:AbstractSequence})
average_transition_rate(
    g::PottsGraph, step_type, fastafile::AbstractString; seq_type=AASequence
)

Compute the average transition rate for the continuous time Markov chain based on g. A sample from g is generated using a discrete time Markov chain for this purpose.

In the first form, a sample from g is used for averaging. s0 is only used to initialize the CTMCState and provide a type (codon or aa), and the keyword arguments are used to parametrize the sampling process.

In the second form, the sample is provided as argument, either as a vector of sequences or as a fasta file.

PottsEvolver.energy — Method

energy(s, g::PottsGraph)

Energy of sequence s in g.

PottsEvolver.genetic_code — Method

genetic_code(c::Codon)

Translate c and return the amino acid as a Char.

PottsEvolver.genetic_code — Method

genetic_code(x::Integer)

Translate the ith codon and return the index of the corresponding amino acid, using the default aa_alphabet

PottsEvolver.mcmc_sample — Function

mcmc_sample(
    g::PottsGraph, M::Integer, s0::AbstractSequence, params::SamplingParameters;
    rng=Random.GLOBAL_RNG, verbose=0, progress_meter=true, alignment_output=true,
)
mcmc_sample(
    g::PottsGraph, M::Integer, params::SamplingParameters; init=:random_num, kwargs...)
)
mcmc_sample(g, tvals::AbstractVector, s0, params::SamplingParameters; kwargs...)

First form: sample g for M steps starting from sequence s0, using parameters in params. Return value: named tuple with fields

sequences: alignment (or vector) of sequences
tvals: vector with the number of steps at each sample
info: information about the run
params: parameters of the run.

Second form: same, but initial sequence is provided through the init kwarg. See ?get_init_sequence for details on how the initial sequence is determined from init.

Third form: provide a set of times tvals at which samples are taken. Can also be used with the init kwarg.

Note: this function is not very efficient if M is small.

Sampling details are determined by parameters, see ?SamplingParameters. Whether to use the genetic code is determined by the type of the init sequence: it is used if init::CodonSequence, otherwise not.

PottsEvolver.mcmc_sample — Method

mcmc_sample(g, tree, M=1, params; alignment_output, translate_output, init, kwargs...)

Sample g along branches of tree. Repeat the process M times, returning an array of named tuples of the form (; tree, leaf_sequences, internal_sequences). If M is omitted, the output is just a named tuple (no array). Sequences in leaf_sequences and internal_sequences are sorted in post-order traversal.

The sequence to be used as the root should be provided using the init kwarg, see ?PottsEvolver.get_init_sequence.

If alignment_output, the sequences will be wrapped into an Alignment structure. Otherwise, they are in a dictionary indexed by node label. If translate_output and if the root sequence was a CodonSequence, the output alignment will contain the amino acid sequence and not the codons.

Warning

The Teq field of params is not used in the sampling. However, the burnin field will be used to set the root sequence: burnin mcmc steps will be performed starting from the input sequence, and the result is placed at the root. If you want a precise root sequence to be used, set burnin=0 in params.

PottsEvolver.read_graph — Function

read_graph
read_potts_graph

Read PottsGraph object from a file.

PottsEvolver.read_potts_graph — Function

read_potts_graph

Alias for read_graph.