Options

run_treeknit! uses a topogy based heuristic optimization to find maximally compatible clades. Options to this heuristic are provided through the OptArgs object, that is optionally passed as a second argument. Essential options are detailed here.

Note that when the OptArgs() object is initialized it's default parameters are the same as those for running TreeKnit for 2 trees. When instead the constructor OptArgs(K::Int) is called, the default parameters for running TreeKnit with K trees are returned.

The default parameters for K=2 trees are the parameters for the :better_MCCs method, whereas the default parameters for K>2 trees are the parameters for the :better_trees method. See TreeKnit Methods.

Parsimony parameter

The heuristic method used by TreeKnit tries to prune consistent clades from a pair of trees in order to increase a compatibility score between other clades. Pruning a clade is interpreted as fixing a reassortment right above it, while increasing the compatibility between remaining clades removes reassortments. A purely parsimonious heuristic should thus give the same weight to fixing a reassortment through pruning a clade and fixing one incompatibility in the trees.

Here, we assign the score $\gamma$ to each pruned clade, and count as $1$ each incompatibility fixed for the remaining clades. For $\gamma=1$, we obtain a parsimonious method that attempts to minimize the overall number of reassortments. For higher values, pruning a clade must "fix" at least $\gamma$ incompatibilities to be considered a good move, making the obtained MCCs less parsimonious. For infinite $\gamma$, pruning clades is impossible, and we fall back on the naive estimation of MCCs.

The example below illustrates the difference between different $\gamma$ values:

t1 = parse_newick_string("((((A,B),C),D),E);"; label="t1")
t2 = parse_newick_string("((((D,B),E),A),C);"; label="t2") # Same topology, but shuffled leave

Here, pruning the two leaves (A,C) or (D,E) results in compatible trees (resp. ((B,D),E) and ((A,B),C)). These moves each have a cost 2$\gamma$ (removing 2 clades), but bring us from trees with 5 incompatibilities to 0. They will only be accepted if $\gamma \leq 2.5$.

julia> run_treeknit!(t1, t2, OptArgs(γ=2))MCC_set(2, ["t1", "t2"], Dict{Set{String}, Vector{Vector{String}}}(Set(["t2", "t1"]) => [["D"], ["E"], ["A", "B", "C"]]))
julia> run_treeknit!(t1, t2, OptArgs(γ=3))MCC_set(2, ["t1", "t2"], Dict{Set{String}, Vector{Vector{String}}}(Set(["t2", "t1"]) => [["A"], ["B"], ["C"], ["D"], ["E"]]))

Resolving trees with polytomies

See Resolving for more information about tree resolution. Note we have multiple options for resolving trees:

pre_resolve: if this option is passed to run_treeknit! (through OptArgs), resolve! will be called on all trees together prior to MCC inference.
resolve: under this option resolve! is called on each pair of trees before each iteration of the MCC inference procedure. For example if treeknit is run on trees t1, t2 and t3 and resolve is set to true, resolve! will be called on each combination of tree pairs (see MultiTreeKnit for more details). Furthermore, this option allows for tree resolution during MCC inference on tree pairs and will resolve trees after each pair-wise iteration of TreeKnit using the inferred MCCs.

pre-resolve and resolve for tree pairs

Note thatresolve=true will also resolve trees with each other prior to inference, and thus pre_resolve is not needed for 2 trees if resolve=true, but we distinguish between the two options for consistency with $>2$ trees).

When resolving trees using MCCs we distinguish between two options (see strict vs liberal resolution for a detailed description of the two methods):

strict=true: only add unambiguous splits from one tree into another.
strict=false also known as liberal resolve: will resolve trees as much as possible, fully resolving shared regions of the two trees, but arbitrarily choosing the location of ambiguous splits, leading to potentially wrong splits in the output trees.

Degeneracy: sorting with likelihood

When several MCC decompositions are possible, degeneracy is removed by using the likelihood_sort option (activated by default). In the example below, there are three equivalent decompositions if only topology is considered:

t1 = parse_newick_string("((A:2,B:2):2,C:4);")
t2 = parse_newick_string("(A:2,(B:1,C:1):1);")
oa = OptArgs(likelihood_sort = false)
unique([run_treeknit!(t1, t2, oa).mccs[Set([t1.label, t2.label])] for rep in 1:50]) # Repeating computation many times

3-element Vector{Vector{Vector{String}}}:
 [["B"], ["A", "C"]]
 [["A"], ["B", "C"]]
 [["C"], ["A", "B"]]

When taking branch lengths into account, this degeneracy vanishes:

oa = OptArgs(likelihood_sort = true)
unique([run_treeknit!(t1, t2, oa).mccs[Set([t1.label, t2.label])] for rep in 1:50])

2-element Vector{Vector{Vector{String}}}:
 [["C"], ["A", "B"]]
 [["B"], ["A", "C"]]

Verbosity

The TreeKnit cli has four level of verbosity that can be set with the --verbosity-level <value> option:

-1 means no visible output
0 means relatively little output: e.g. input file names, number of MCCs found,... The default options result in less than 10 lines.
1 gives slightly more details: e.g. the MCC inference process for each pair of trees is detailed
2 gives a lot of information and is only useful for debugging and when using small trees

The --verbose or -v flag set the verbosity to 1 instead of 0.

When using from a julia session, it can be useful to also set the verbosity. Let's take the simple case where one wants to use run_treeknit!(tree1, tree2, OptArgs()) with some pre-loaded trees and default arguments. If run like this, no info will be shown. To trigger verbosity, create a custom logger in the following way:

julia> using TreeKnit, TreeTools, Logging
julia> t1 = parse_newick_string("((A:2,B:2):2,C:4);"; label="t1");
julia> t2 = parse_newick_string("(A:2,(B:1,C:1):1);"; label="t2");
julia> verbosity = 1;
julia> logger = ConsoleLogger(LogLevel(-verbosity)) # corresponds to `--verbosity-level 1` from the CLILogging.ConsoleLogger(IOBuffer(data=UInt8[...], readable=false, writable=false, seekable=false, append=false, size=0, maxsize=0, ptr=1, mark=-1), LogLevel(-1), Logging.default_metafmt, true, 0, Dict{Any, Int64}())
julia> output = with_logger(logger) do
         run_treeknit!(t1, t2, OptArgs())
       end┌ LogLevel(-1): ROUND: 1
│ 
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:37
┌ LogLevel(-1): Infering MCCs for trees: t2 and t1
│ 
│ 
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:49
┌ LogLevel(-1): Initial state: 3 naive MCCs
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:216
┌ LogLevel(-1): --- Iteration 1 (max. 15) - 3 leaves remaining ---
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:221
┌ LogLevel(-1): Running optimization to find MCCs...
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:224
┌ LogLevel(-1): Cooling schedule: geometric / Temperature values: 101 / Total of MCMC steps: 150
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:225
┌ LogLevel(-1): Sorting 3 topologically equivalent configurations.
└ @ TreeKnit.SplitGraph ~/work/TreeKnit.jl/TreeKnit.jl/src/SplitGraph/SplitGraph.jl:56
┌ LogLevel(-1): Found 1 new mccs.
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:239
┌ LogLevel(-1): Proceeding based on newly found MCCs...
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:243
┌ LogLevel(-1): Found mccs do not cover all leaves. Pruning them from trees.
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:294
┌ LogLevel(-1): Resulting trees are compatible: final decomposition found. Stopping.
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:304
┌ LogLevel(-1): found MCCs for trees: t2 and t1
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:58
┌ LogLevel(-1): ladderized and sorted trees: t2 and t1
└ @ TreeKnit ~/work/TreeKnit.jl/TreeKnit.jl/src/main.jl:67
MCC_set(2, ["t1", "t2"], Dict{Set{String}, Vector{Vector{String}}}(Set(["t2", "t1"]) => [["C"], ["A", "B"]]))

The above works because during inference, TreeKnit emits log messages in the form @logmsg LogLevel(0/-1/-2) msg. The default Julia logger will only show messages with a positive level, and this can be set as described above. See the Logging and the LoggingExtras package for more information.