Utilities

Hamming distance

hamming(x, y) returns the normalized Hamming distance between two integer vectors.

pairwise_hamming(A) returns a vector containing all Hamming distances between pairs of sequences in A. For large alignments, it is often practical to consider only a subset of Hamming distances: the step=n keyword can be used to only consider every nth sequence. The function is also adapted to two alignments: [pairwise_hamming(A,B)] will consider all pairs of sequences with one member in A and the other one in B.

Statistics

the profile of the alignment: site_specific_frequencies(A) returns a q x L matrix where q is the size of the alphabet and L the length of sequences, with element (a, i) being the fraction of sequences in which character a was found at position i.
the pairwise frequencies: pairwise_frequencies(A)
the pairwise correlations: pairwise_correlations(A)

All these functions accept a vector of weights as a second argument, and will by default use A.weights if it is not provided.

Weights

In DCA-like algorithms, it is customary to weight sequences by their degree of phylogenetic relations. Typically, the weight of a sequence is inversely proportional to the number of other sequences with a Hamming distance smaller than some threshold. For an alignment X, computing the weights is done using compute_weights:


julia> A = read_fasta("../../example/toy_fasta_dna.fasta")Alignment of M=5 sequences of length L=10 - Shown as `MxL` matrix
5×10 adjoint(::Matrix{Int64}) with eltype Int64:
 5  3  4  2  4  1  5  1  5  5
 2  4  4  1  4  2  3  2  2  3
 4  3  5  3  3  5  5  3  5  1
 3  3  5  2  2  1  5  3  1  2
 2  3  2  5  3  2  4  3  2  4
julia> compute_weights(A) # compute and return the weight vector, as well as the effective number of sequences([0.2, 0.2, 0.2, 0.2, 0.2], 5.0)
julia> compute_weights!(A); # same, but sets the weights field in A
julia> A.weights5-element Vector{Float64}:
 0.2
 0.2
 0.2
 0.2
 0.2

Utilities

Hamming distance

Alignment statistics

Pairwise Hamming distance

Statistics

Weights