The runopt function

For larger trees, opttrees will not find all reassortments at once. The reason for this is that it is often necessary to "clean" reassorted leaves in smaller clades in order to see other reassortments deeper in the tree. As a consequence, trees obtained after one round of opttrees will still have incompatibilities. Below is a relatively simple example of trees for which opttrees does not find all reassorted leaves in one go:

nwk1 = "(Z,(G,(((A,X),(B,C)),((D,Y),(E,F)))));"
nwk2 = "(G,((A,(B,(C,X))),((D,(E,(F,Y))),Z)));"
t1 = parse_newick_string(nwk1; label="t1")
t2 = parse_newick_string(nwk2; label="t2")

These trees are constructed in the following way (you're encouraged to draw them!):

  • Clade ((A,X),(B,C)) in the first tree "corresponds" to clade (A,(B,(C,X))) in the second tree, with X as the reassorted leaf. Same goes for clades ((D,Y),(E,F)) and (D,(E,(F,Y))) and the reassorted leaf Y.
  • We respectively name these not yet compatible clades ABC and DEF. At a deeper level, the trees are now of the form (Z,(G,(ABC,DEF))) for the first and (G,(ABC,(DEF,Z))) for the second, with Z as the obvious reassorted leaf.

The important property here is that for a high enough value of $\gamma$, it is only possible for opttrees to see that Z is reassorted if clades ABC and DEF are "coarse-grained" to leaves. In return, this coarse-graining is only possible after X and Y have been identified as reassorted leaves, which will happen after a first iteration of opttrees. The task of iterating opttrees is performed by the runopt function. Here, we walk through the typical steps it takes.

Let's do a first pass with opttrees, with $\gamma=3$. We first build the SplitGraph object, as detailed in the opttrees page.

treelist = Any[t1, t2]
mccs_naive = naive_mccs(treelist) # these are just the leaves in this example
mcc_names = TreeKnit.name_mcc_clades!(treelist, mccs_naive)
for (i,t) in enumerate(treelist)
  treelist[i] = TreeKnit.reduce_to_mcc(t, mccs_naive)
end
g = SplitGraph.trees2graph(treelist);

We then run the simulated annealing optimization to find optimal leaves to remove.

opt_confs = SplitGraph.sa_opt(g; Trange = reverse(1e-3:1e-2:1), M = 10, γ = 3)[1]
mccs_found = [mcc_names[x] for x in g.labels[.!opt_confs[1]]]
2-element Vector{Vector{String}}:
 ["Y"]
 ["X"]
On setting $\gamma$

Here $\gamma = 2$, if you run this example using $\gamma \leq 2$, Z will immediatly be found as a reassorted strain. It is indeed not easy to find an example of a case with trees with a small number of leaves (9 here), obvious reassortments, and that are not solved in one go by opttrees with a low value of $\gamma$. However, when dealing with trees with hundreds of leaves, finding all MCCs in one go is the exception rather than the rule.

As expected, X Y are found as reassortments. However, the two trees will still have incompatibilities when removing those two leaves. To make this explicit, we remove the leaves X and Y and compute naive mccs again.

TreeKnit.pruneconf!(mccs_found, treelist...) # prune clades in a list of trees. Wrapper around TreeTools.prunesutree!
mccs_naive = naive_mccs(treelist...)
4-element Vector{Vector{String}}:
 ["G"]
 ["Z"]
 ["A", "B", "C"]
 ["D", "E", "F"]

Now that X and Y are removed, we see that clades ABC and DEF are common to both trees. If we reduce the pruned trees to their new naive MCCs again, we now see that Z is an obvious choice for a reassorted strain:

julia> mcc_names = TreeKnit.name_mcc_clades!(treelist, mccs_naive)Dict{Any, Any} with 4 entries:
  "Z"     => ["Z"]
  "MCC_4" => ["D", "E", "F"]
  "G"     => ["G"]
  "MCC_3" => ["A", "B", "C"]
julia> for (i,t) in enumerate(treelist) treelist[i] = TreeKnit.reduce_to_mcc(t, mccs_naive) end
julia> treelist[1] not all branch lengths known, assuming identical branch lengths ________________________ Z _| | ________________________ G |_______________________| | ________________________ MCC_3 |_______________________| |________________________ MCC_4
julia> treelist[2] not all branch lengths known, assuming identical branch lengths ________________________ G _| | ________________________ MCC_3 |_______________________| | ________________________ MCC_4 |_______________________| |________________________ Z

To finish the inference of MCCs, we would now have to re-run the optimization process, i.e. opttrees has to be iterated. This is performed automatically by the runopt function.

This process stops when one of the following end conditions is fulfilled:

  1. If no new MCCs are found in a given iteration of opttrees stop.
  2. If new MCCs were found, prune them from the trees. If the resulting trees do not have any incompatibilities stop.
  3. If the maximum number of iterations has been reached stop. This number can be set through OptArgs, the default is 15.