Iteration: going through a tree
TreeTools offers two different ways to iterate through nodes: post-order traversal, and arbitrary iteration.
Post-order traversal
The exact definition can be found here. This order of iteration guarantees that:
- children nodes are always visited before parent nodes
- the order in which children of a node are visited is the same as that given in
children(node)
.
julia> tree = parse_newick_string("((A:1,B:1)AB:2,C:3)R;")
_________________________ A __________________________________________________| _| |_________________________ B | |____________________________________________________________________________ C
julia> for n in POT(tree) println(n) end
TreeNode{TreeTools.EmptyData}: A TreeNode{TreeTools.EmptyData}: B TreeNode{TreeTools.EmptyData}: AB TreeNode{TreeTools.EmptyData}: C TreeNode{TreeTools.EmptyData}: R (root)
If you want to access only leaves, you can use POTleaves
, or simply filter the results:
julia> [label(n) for n in POTleaves(tree["AB"])]
2-element Vector{String}: "A" "B"
julia> for n in Iterators.filter(isleaf, POT(tree)) println("$(label(n)) is a leaf") end
A is a leaf B is a leaf C is a leaf
Note that POT
can also be called on TreeNode
objects. In this case, it will only iterate through the clade below the input node, including its root:
julia> let node = tree["AB"] X = map(label, POT(node)) println("The nodes in the clade defined by $(label(node)) are $(X).") end
The nodes in the clade defined by AB are ["A", "B", "AB"].
map
, count
, etc...
In some cases, one wants to do something like count the number of nodes that have a given property, or apply some function f
to each node and collect the result. To facilitate this, TreeTools extends the Base functions map
, map!
and count
to Tree
and TreeNode
objects. Using these functions will traverse the tree in post-order. If called on a TreeNode
object, they will only iterate through the clade defined by this node.
julia> map(branch_length, tree) # Branch length of all nodes, in POT
5-element Vector{Union{Missing, Float64}}: 1.0 1.0 2.0 3.0 missing
julia> map!(tree) do node # Double the length of all branches - map! returns `nothing` x = branch_length(node) if !ismissing(x) branch_length!(node, 2*x) end end
julia> map(branch_length, tree) # Doubled branch length, except for root (`missing`)
5-element Vector{Union{Missing, Float64}}: 2.0 2.0 4.0 6.0 missing
julia> count(n -> label(n)[1] == 'A', tree) # count the nodes with a label starting with 'A'
2
Note that there is a difference between the TreeTools extension of map!
with respect Base: in TreeTools, map!
returns nothing instead of an array.
Arbitrary order
As explained in Basic concepts, a Tree
object is mainly a dictionary mapping labels to TreeNode
objects. We can thus iterate through nodes in the tree using this dictionary. For this, TreeTools provides the nodes
, leaves
and internals
methods. This will traverse the tree in an arbitrary order but is faster than POT
.
julia> for n in leaves(tree) println("$(label(n)) is a leaf") end
B is a leaf A is a leaf C is a leaf
julia> for n in internals(tree) println("$(label(n)) is an internal node") end
R is an internal node AB is an internal node
julia> map(label, nodes(tree)) == union( map(label, leaves(tree)), map(label, internals(tree)) )
true
A note on speed
Iterating through tree
using nodes(tree)
will be faster than using POT(tree)
. This is mainly because of my inability to write an efficient iterator: currently, POT
will allocate a number of times that is proportional to the size of the tree. Below is a simple example where we define functions that count the number of nodes in a tree:
count_nodes_pot(tree) = sum(x -> 1, POT(tree)) # Traversing the tree in post-order while summing 1
count_nodes_arbitrary(tree) = sum(x -> 1, nodes(tree)) # Arbitrary order
These two functions obviously give the same result, but not with the same run time. Here, we try it on the example/tree_10.nwk
file:
julia> tree = read_tree("../../examples/tree_10.nwk")
_________________ 10 _______________________________________________________| | | __ 8 | |______________| | |,_ 9 | || _| |, 4 | || | | 6 | | ______________________________ 2 | | |__________________________________________| , 1 | ___________________| | | | 5 |_________| | ,_ 3 |__________________| |_ 7
julia> using BenchmarkTools
julia> @btime count_nodes_pot(tree)
474.867 ns (19 allocations: 912 bytes) 19
julia> @btime count_nodes_arbitrary(tree)
54.213 ns (0 allocations: 0 bytes) 19
If a fast post-order is needed, the only solution in the current state of TreeTools is to "manually" program it. The code below defines a more efficient to count nodes, traversing the tree in post-order.
function count_nodes_eff_pot(n::TreeNode) # this counts the number of nodes below `n`
cnt = 0
for c in children(n)
cnt += count_nodes_eff_pot(c) #
end
return cnt + 1
end
function count_nodes_eff_pot(tree::Tree)
return count_nodes_eff_pot(tree.root)
end
This will run faster than count_nodes_pot
, and does not allocate. For counting nodes, this is really overkill, and one could just call length(nodes(tree))
. In particular, traversing the tree in a precise order does not matter at all. But for more complex use case, writing short recursive code as above does not add a lot of complexity.
julia> @btime count_nodes_eff_pot(tree)
97.704 ns (0 allocations: 0 bytes) 19