Options
run_treeknit!
uses a topogy based heuristic optimization to find maximally compatible clades. Options to this heuristic are provided through the OptArgs
object, that is optionally passed as a second argument. Essential options are detailed here.
Note that when the OptArgs()
object is initialized it's default parameters are the same as those for running TreeKnit
for 2 trees. When instead the constructor OptArgs(K::Int)
is called, the default parameters for running TreeKnit
with K
trees are returned.
The default parameters for K=2
trees are the parameters for the :better_MCCs
method, whereas the default parameters for K>2
trees are the parameters for the :better_trees
method. See TreeKnit Methods.
Parsimony parameter
The heuristic method used by TreeKnit tries to prune consistent clades from a pair of trees in order to increase a compatibility score between other clades. Pruning a clade is interpreted as fixing a reassortment right above it, while increasing the compatibility between remaining clades removes reassortments. A purely parsimonious heuristic should thus give the same weight to fixing a reassortment through pruning a clade and fixing one incompatibility in the trees.
Here, we assign the score $\gamma$ to each pruned clade, and count as $1$ each incompatibility fixed for the remaining clades. For $\gamma=1$, we obtain a parsimonious method that attempts to minimize the overall number of reassortments. For higher values, pruning a clade must "fix" at least $\gamma$ incompatibilities to be considered a good move, making the obtained MCCs less parsimonious. For infinite $\gamma$, pruning clades is impossible, and we fall back on the naive estimation of MCCs.
The example below illustrates the difference between different $\gamma$ values:
t1 = parse_newick_string("((((A,B),C),D),E);"; label="t1")
t2 = parse_newick_string("((((D,B),E),A),C);"; label="t2") # Same topology, but shuffled leave
Here, pruning the two leaves (A,C)
or (D,E)
results in compatible trees (resp. ((B,D),E)
and ((A,B),C)
). These moves each have a cost 2$\gamma$ (removing 2 clades), but bring us from trees with 5 incompatibilities to 0. They will only be accepted if $\gamma \leq 2.5$.
julia> run_treeknit!(t1, t2, OptArgs(γ=2))
MCC_set(2, ["t1", "t2"], Dict{Set{String}, Vector{Vector{String}}}(Set(["t2", "t1"]) => [["D"], ["E"], ["A", "B", "C"]]))
julia> run_treeknit!(t1, t2, OptArgs(γ=3))
MCC_set(2, ["t1", "t2"], Dict{Set{String}, Vector{Vector{String}}}(Set(["t2", "t1"]) => [["A"], ["B"], ["C"], ["D"], ["E"]]))
Resolving trees with polytomies
See Resolving for more information about tree resolution. Note we have multiple options for resolving trees:
pre_resolve:
if this option is passed torun_treeknit!
(throughOptArgs
),resolve!
will be called on all trees together prior to MCC inference.resolve:
under this optionresolve!
is called on each pair of trees before each iteration of the MCC inference procedure. For example if treeknit is run on treest1
,t2
andt3
andresolve
is set totrue
,resolve!
will be called on each combination of tree pairs (see MultiTreeKnit for more details). Furthermore, this option allows for tree resolution during MCC inference on tree pairs and will resolve trees after each pair-wise iteration ofTreeKnit
using the inferred MCCs.
Note thatresolve=true
will also resolve trees with each other prior to inference, and thus pre_resolve
is not needed for 2 trees if resolve=true
, but we distinguish between the two options for consistency with $>2$ trees).
When resolving trees using MCCs we distinguish between two options (see strict vs liberal resolution for a detailed description of the two methods):
strict=true:
only add unambiguous splits from one tree into another.strict=false
also known asliberal
resolve: will resolve trees as much as possible, fully resolving shared regions of the two trees, but arbitrarily choosing the location of ambiguous splits, leading to potentially wrong splits in the output trees.
Degeneracy: sorting with likelihood
When several MCC decompositions are possible, degeneracy is removed by using the likelihood_sort
option (activated by default). In the example below, there are three equivalent decompositions if only topology is considered:
t1 = parse_newick_string("((A:2,B:2):2,C:4);")
t2 = parse_newick_string("(A:2,(B:1,C:1):1);")
oa = OptArgs(likelihood_sort = false)
unique([run_treeknit!(t1, t2, oa).mccs[Set([t1.label, t2.label])] for rep in 1:50]) # Repeating computation many times
3-element Vector{Vector{Vector{String}}}:
[["C"], ["A", "B"]]
[["B"], ["A", "C"]]
[["A"], ["B", "C"]]
When taking branch lengths into account, this degeneracy vanishes:
oa = OptArgs(likelihood_sort = true)
unique([run_treeknit!(t1, t2, oa).mccs[Set([t1.label, t2.label])] for rep in 1:50])
2-element Vector{Vector{Vector{String}}}:
[["C"], ["A", "B"]]
[["B"], ["A", "C"]]
Verbosity
The TreeKnit cli has four level of verbosity that can be set with the --verbosity-level <value>
option:
-1
means no visible output0
means relatively little output: e.g. input file names, number of MCCs found,... The default options result in less than 10 lines.1
gives slightly more details: e.g. the MCC inference process for each pair of trees is detailed2
gives a lot of information and is only useful for debugging and when using small trees
The --verbose
or -v
flag set the verbosity to 1
instead of 0
.
When using from a julia session, it can be useful to also set the verbosity. Let's take the simple case where one wants to use run_treeknit!(tree1, tree2, OptArgs())
with some pre-loaded trees and default arguments. If run like this, no info will be shown. To trigger verbosity, create a custom logger in the following way:
julia> using TreeKnit, TreeTools, Logging
julia> t1 = parse_newick_string("((A:2,B:2):2,C:4);"; label="t1");
julia> t2 = parse_newick_string("(A:2,(B:1,C:1):1);"; label="t2");
julia> verbosity = 1;
julia> logger = ConsoleLogger(LogLevel(-verbosity)) # corresponds to `--verbosity-level 1` from the CLI
Logging.ConsoleLogger(IOBuffer(data=UInt8[...], readable=false, writable=false, seekable=false, append=false, size=0, maxsize=0, ptr=1, mark=-1), LogLevel(-1), Logging.default_metafmt, true, 0, Dict{Any, Int64}())
julia> output = with_logger(logger) do run_treeknit!(t1, t2, OptArgs()) end
┌ LogLevel(-1): ROUND: 1 │ └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:37 ┌ LogLevel(-1): Infering MCCs for trees: t2 and t1 │ │ └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:49 ┌ LogLevel(-1): Initial state: 3 naive MCCs └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:216 ┌ LogLevel(-1): --- Iteration 1 (max. 15) - 3 leaves remaining --- └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:221 ┌ LogLevel(-1): Running optimization to find MCCs... └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:224 ┌ LogLevel(-1): Cooling schedule: geometric / Temperature values: 101 / Total of MCMC steps: 150 └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:225 ┌ LogLevel(-1): Sorting 2 topologically equivalent configurations. └ @ TreeKnit.SplitGraph ~/work/TreeKnit.jl/TreeKnit.jl/src/SplitGraph/SplitGraph.jl:56 ┌ LogLevel(-1): Found 1 new mccs. └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:239 ┌ LogLevel(-1): Proceeding based on newly found MCCs... └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:243 ┌ LogLevel(-1): Found mccs do not cover all leaves. Pruning them from trees. └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:294 ┌ LogLevel(-1): Resulting trees are compatible: final decomposition found. Stopping. └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:304 ┌ LogLevel(-1): found MCCs for trees: t2 and t1 └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:58 ┌ LogLevel(-1): ladderized and sorted trees: t2 and t1 └ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:67 MCC_set(2, ["t1", "t2"], Dict{Set{String}, Vector{Vector{String}}}(Set(["t2", "t1"]) => [["C"], ["A", "B"]]))
The above works because during inference, TreeKnit emits log messages in the form @logmsg LogLevel(0/-1/-2) msg
. The default Julia logger will only show messages with a positive level, and this can be set as described above. See the Logging and the LoggingExtras package for more information.