Iteration: going through a tree
TreeTools offers two different ways to iterate through nodes: post/pre-order traversals, and arbitrary iteration.
Traversals
The call traversal(tree, style)
returns an iterator over the nodes of tree
. style
can take two values: :preorder
or :postorder
. The exact definition of the two traversals can be found here. In short, the traversal guarantees that:
- for
:postorder
(resp.:preorder
), children nodes are always visited before (resp. after) parent nodes; - the order in which children of a node are visited is the same as that given in
children(node)
; - the order is "depth-first": iteration over a subtree finishes before the next subtree starts.
julia> tree = parse_newick_string("((A:1,B:1)AB:2,C:3)R;");
julia> for node in traversal(tree, :postorder)
# do something with node
end
julia> map(label, traversal(tree, :postorder))
5-element Vector{String}:
"A"
"B"
"AB"
"C"
"R"
julia> map(label, traversal(tree, :preorder))
5-element Vector{String}:
"R"
"AB"
"A"
"B"
"C"
Note that traversal
can also be called on TreeNode
objects:
julia> map(label, traversal(tree, :preorder)) == map(label, traversal(root(tree), :preorder))
true
The traversal
function accepts boolean keyword arguments leaves
, root
and internals
. If any of those is set to false, the corresponding nodes are skipped:
julia> map(label, traversal(tree, :postorder, leaves=false))
2-element Vector{String}:
"AB"
"R"
julia> map(label, traversal(tree, :preorder, internals=false))
3-element Vector{String}:
"A"
"B"
"C"
One can also skip nodes by passing a function f
as the first argument. The traversal will only return nodes n
such that f(n)
is true:
julia> map(label, traversal(n -> label(n)[1] == 'A', tree, :postorder))
2-element Vector{String}:
"A"
"AB"
map
, count
, etc...
In some cases, one wants to do something like count the number of nodes that have a given property, or apply some function f
to each node and collect the result. To facilitate this, TreeTools extends the Base functions map
, map!
and count
to Tree
and TreeNode
objects. Using these functions will traverse the tree in post-order. If called on a TreeNode
object, they will only iterate through the clade defined by this node.
julia> map(branch_length, tree) # Branch length of all nodes, in postorder
5-element Vector{Union{Missing, Float64}}: 1.0 1.0 2.0 3.0 missing
julia> map!(tree) do node # Double the length of all branches - map! returns `nothing` x = branch_length(node) if !ismissing(x) branch_length!(node, 2*x) end end
julia> map(branch_length, tree) # Doubled branch length, except for root (`missing`)
5-element Vector{Union{Missing, Float64}}: 2.0 2.0 4.0 6.0 missing
julia> count(n -> label(n)[1] == 'A', tree) # count the nodes with a label starting with 'A'
2
Note that there is a difference between the TreeTools extension of map!
with respect Base: in TreeTools, map!
returns nothing instead of an array.
Arbitrary order
As explained in Basic concepts, a Tree
object is mainly a dictionary mapping labels to TreeNode
objects. We can thus iterate through nodes in the tree using this dictionary. For this, TreeTools provides the nodes
, leaves
and internals
methods. This will traverse the tree in an arbitrary order but is faster than traversal
.
julia> for n in leaves(tree) println("$(label(n)) is a leaf") end
B is a leaf A is a leaf C is a leaf
julia> for n in internals(tree) println("$(label(n)) is an internal node") end
R is an internal node AB is an internal node
julia> map(label, nodes(tree)) == union( map(label, leaves(tree)), map(label, internals(tree)) )
true
A note on speed
Iterating through tree
using nodes(tree)
will be faster than using traversal(tree, ...)
. This is mainly because of my inability to write an efficient iterator (any help appreciated). Below is a simple example where we define functions that count the number of nodes in a tree:
count_nodes_traversal(tree) = sum(x -> 1, traversal(tree, :postorder)) # Traversing the tree in post-order while summing 1
count_nodes_arbitrary(tree) = sum(x -> 1, nodes(tree)) # Arbitrary order
These two functions obviously give the same result, but not with the same run time. Here, we try it on the example/tree_10.nwk
file:
julia> tree = read_tree("../../examples/tree_10.nwk")
_________________ 10 _______________________________________________________| | | __ 8 | |______________| | |,_ 9 | || _| |, 4 | || | | 6 | | ______________________________ 2 | | |__________________________________________| , 1 | ___________________| | | | 5 |_________| | ,_ 3 |__________________| |_ 7
julia> using BenchmarkTools
julia> @btime count_nodes_traversal(tree)
7.564 μs (139 allocations: 6.00 KiB) 19
julia> @btime count_nodes_arbitrary(tree)
54.318 ns (0 allocations: 0 bytes) 19
Benchmarks show that this time difference tends to reduce when trees get larger (hundreds of leaves). In any case, if a fast post/pre-order is needed, the only solution in the current state of TreeTools is to "manually" program it using a recursive function. The code below defines a more efficient to count nodes, traversing the tree in post-order.
function count_nodes_recursive(n::TreeNode) # this counts the number of nodes below `n`
cnt = 0
for c in children(n)
cnt += count_nodes_recursive(c) #
end
return cnt + 1
end
count_nodes_recursive(tree::Tree) = count_nodes_recursive(tree.root)
This will run faster than count_nodes_traversal
, and does not allocate. For counting nodes, this is really overkill, and one could just call length(nodes(tree))
. In particular, traversing the tree in a precise order does not matter at all. But for more complex use case, writing short recursive code as above does not add a lot of complexity.
julia> @btime count_nodes_recursive(tree)
96.484 ns (0 allocations: 0 bytes) 19