Iteration: going through a tree

TreeTools offers two different ways to iterate through nodes: post-order traversal, and arbitrary iteration.

Post-order traversal

The exact definition can be found here. This order of iteration guarantees that:

  1. children nodes are always visited before parent nodes
  2. the order in which children of a node are visited is the same as that given in children(node).

julia> tree = parse_newick_string("((A:1,B:1)AB:2,C:3)R;") _________________________ A __________________________________________________| _| |_________________________ B | |____________________________________________________________________________ C
julia> for n in POT(tree) println(n) endTreeNode{TreeTools.EmptyData}: A TreeNode{TreeTools.EmptyData}: B TreeNode{TreeTools.EmptyData}: AB TreeNode{TreeTools.EmptyData}: C TreeNode{TreeTools.EmptyData}: R (root)

If you want to access only leaves, you can use POTleaves, or simply filter the results:

julia> [label(n) for n in POTleaves(tree["AB"])]2-element Vector{String}:
 "A"
 "B"
julia> for n in Iterators.filter(isleaf, POT(tree)) println("$(label(n)) is a leaf") endA is a leaf B is a leaf C is a leaf

Note that POT can also be called on TreeNode objects. In this case, it will only iterate through the clade below the input node, including its root:

julia> let
       	node = tree["AB"]
       	X = map(label, POT(node))
       	println("The nodes in the clade defined by $(label(node)) are $(X).")
       endThe nodes in the clade defined by AB are ["A", "B", "AB"].

map, count, etc...

In some cases, one wants to do something like count the number of nodes that have a given property, or apply some function f to each node and collect the result. To facilitate this, TreeTools extends the Base functions map, map! and count to Tree and TreeNode objects. Using these functions will traverse the tree in post-order. If called on a TreeNode object, they will only iterate through the clade defined by this node.

julia> map(branch_length, tree) # Branch length of all nodes, in POT5-element Vector{Union{Missing, Float64}}:
 1.0
 1.0
 2.0
 3.0
  missing
julia> map!(tree) do node # Double the length of all branches - map! returns `nothing` x = branch_length(node) if !ismissing(x) branch_length!(node, 2*x) end end
julia> map(branch_length, tree) # Doubled branch length, except for root (`missing`)5-element Vector{Union{Missing, Float64}}: 2.0 2.0 4.0 6.0 missing
julia> count(n -> label(n)[1] == 'A', tree) # count the nodes with a label starting with 'A'2

Note that there is a difference between the TreeTools extension of map! with respect Base: in TreeTools, map! returns nothing instead of an array.

Arbitrary order

As explained in Basic concepts, a Tree object is mainly a dictionary mapping labels to TreeNode objects. We can thus iterate through nodes in the tree using this dictionary. For this, TreeTools provides the nodes, leaves and internals methods. This will traverse the tree in an arbitrary order but is faster than POT.

julia> for n in leaves(tree)
       	println("$(label(n)) is a leaf")
       endB is a leaf
A is a leaf
C is a leaf
julia> for n in internals(tree) println("$(label(n)) is an internal node") endR is an internal node AB is an internal node
julia> map(label, nodes(tree)) == union( map(label, leaves(tree)), map(label, internals(tree)) )true

A note on speed

Iterating through tree using nodes(tree) will be faster than using POT(tree). This is mainly because of my inability to write an efficient iterator: currently, POT will allocate a number of times that is proportional to the size of the tree. Below is a simple example where we define functions that count the number of nodes in a tree:

count_nodes_pot(tree) = sum(x -> 1, POT(tree)) # Traversing the tree in post-order while summing 1
count_nodes_arbitrary(tree) = sum(x -> 1, nodes(tree)) # Arbitrary order

These two functions obviously give the same result, but not with the same run time. Here, we try it on the example/tree_10.nwk file:

julia> tree = read_tree("../../examples/tree_10.nwk")
                                                          _________________ 10
  _______________________________________________________|
 |                                                       |               __ 8
 |                                                       |______________|
 |                                                                      |,_ 9
 |                                                                      ||
_|                                                                       |, 4
 |                                                                       ||
 |                                                                        | 6
 |
 |                                           ______________________________ 2
 |                                          |
 |__________________________________________|                             , 1
                                            |          ___________________|
                                            |         |                   | 5
                                            |_________|
                                                      |                  ,_ 3
                                                      |__________________|
                                                                         |_ 7
julia> using BenchmarkTools
julia> @btime count_nodes_pot(tree) 474.867 ns (19 allocations: 912 bytes) 19
julia> @btime count_nodes_arbitrary(tree) 54.213 ns (0 allocations: 0 bytes) 19

If a fast post-order is needed, the only solution in the current state of TreeTools is to "manually" program it. The code below defines a more efficient to count nodes, traversing the tree in post-order.

function count_nodes_eff_pot(n::TreeNode) # this counts the number of nodes below `n`
	cnt = 0
	for c in children(n)
		cnt += count_nodes_eff_pot(c) #
	end
	return cnt + 1
end

function count_nodes_eff_pot(tree::Tree)
	return count_nodes_eff_pot(tree.root)
end

This will run faster than count_nodes_pot, and does not allocate. For counting nodes, this is really overkill, and one could just call length(nodes(tree)). In particular, traversing the tree in a precise order does not matter at all. But for more complex use case, writing short recursive code as above does not add a lot of complexity.

julia> @btime count_nodes_eff_pot(tree)  97.704 ns (0 allocations: 0 bytes)
19