Reading and writing

For now, TreeTools only handles the Newick format, and the functions are fairly basic at this stage.

Reading

To read from a Newick file, use read_tree. Here is an example with the examples/tree_10.nwk file:

using TreeTools
tree = read_tree("../../examples/tree_10.nwk")

                                                          _________________ 10
  _______________________________________________________|
 |                                                       |               __ 8
 |                                                       |______________|
 |                                                                      |,_ 9
 |                                                                      ||
_|                                                                       |, 4
 |                                                                       ||
 |                                                                        | 6
 |
 |                                           ______________________________ 2
 |                                          |
 |__________________________________________|                             , 1
                                            |          ___________________|
                                            |         |                   | 5
                                            |_________|
                                                      |                  ,_ 3
                                                      |__________________|
                                                                         |_ 7

The documentation reproduced below gives more information:

TreeTools.read_tree (Function)
read_tree(
	nwk_filename::AbstractString;
	node_data_type=DEFAULT_NODE_DATATYPE, label=default_tree_label(), force_new_labels=false
)
read_tree(
	io::IO;
	node_data_type=DEFAULT_NODE_DATATYPE, label=default_tree_label(), force_new_labels=false
)

Read a Newick file and create a Tree{node_data_type} object from it. The input file may contain several Newick strings on separate lines; the output is then an array of Tree objects.

The call node_data_type() must return a valid instance of a subtype of TreeNodeData. You can implement your own subtypes, or see ?TreeNodeData for already implemented ones.

Use force_new_labels=true to force the renaming of all internal nodes. By default the tree is labeled with default_tree_label(); use the label keyword to assign a different label.
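As a minimal sketch, assuming the read_tree(io::IO; ...) method listed in the signature above, a tree can also be read from an in-memory IOBuffer while setting the keywords described here. The tree label "my_tree" and the Newick string are arbitrary examples.

```julia
using TreeTools

# A single Newick string held in memory; IOBuffer provides the IO interface
io = IOBuffer("(A:3,(B:1,C:1)BC:1);")

# Set an explicit tree label instead of default_tree_label(), and force
# internal nodes (here "BC" and the root) to be renamed
tree = read_tree(io; label = "my_tree", force_new_labels = true)
```

Since the buffer holds a single line, a single Tree is expected rather than an array.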

If you have a variable containing a Newick string and want to build a tree from it, use parse_newick_string instead.

Note on labels

The Tree type identifies nodes by their labels. This means that labels have to be unique. For this reason, the following is done when reading a tree:

  • if an internal node does not have a label, a unique one will be created of the form "NODE_i"
  • if a node has a label that was already found earlier in the tree, a random identifier is appended to it to make it unique. Since the identifier is created with randstring(8), uniqueness is technically not guaranteed.
  • if force_new_labels is used, a unique identifier is appended to node labels
  • if node labels in the Newick file are identified as confidence/bootstrap values, a random identifier is appended to them, even if they are unique in the tree. See ?TreeTools.isbootstrap for which labels are identified as confidence values.
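The first two rules can be illustrated with a small standalone sketch. This is a hypothetical function, uniquify_labels, written only for illustration and not TreeTools' own implementation: it works on a flat list of labels, the "__" separator before the random suffix is an arbitrary choice, and the bootstrap and force_new_labels rules are left out.

```julia
using Random

# Hypothetical sketch of the labelling rules above (NOT TreeTools' code).
# `labels` lists node labels in traversal order; `nothing` marks an
# unlabelled internal node. The "__" separator is an arbitrary choice.
function uniquify_labels(labels::Vector{Union{Nothing,String}})
    seen = Set{String}()
    out = String[]
    counter = 0
    for lab in labels
        if lab === nothing
            # unlabelled internal node: name of the form "NODE_i"
            counter += 1
            lab = "NODE_$counter"
        end
        if lab in seen
            # duplicate label: append a random 8-character identifier;
            # a clash is very unlikely but not impossible
            lab = string(lab, "__", randstring(8))
        end
        push!(seen, lab)
        push!(out, lab)
    end
    return out
end

labels = Union{Nothing,String}["A", "B", nothing, "A", nothing]
new_labels = uniquify_labels(labels)
# e.g. ["A", "B", "NODE_1", "A__<8 random characters>", "NODE_2"]
```

Note that a generated "NODE_i" that clashes with a name already present in the input also falls into the duplicate branch and receives a random suffix.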

read_tree will also read files containing several Newick strings, provided they are on separate lines. It then returns an array of Tree objects.
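A minimal sketch of this behaviour, using a temporary file from tempname() and two arbitrary Newick strings written on separate lines:

```julia
using TreeTools

# Two Newick strings, one per line, in a temporary file
path = tempname() * ".nwk"
write(path, "(A:1,B:1);\n((C:1,D:1):1,E:2);")

# The file has several lines, so an array of Tree objects is returned
trees = read_tree(path)
```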

If you have a variable containing a Newick string, simply call parse_newick_string to return a tree:

nwk = "(A:3,(B:1,C:1)BC:1);"
tree = parse_newick_string(nwk)

  ____________________________________________________________________________ A
_|
 |                         __________________________ B
 |________________________|
                          |__________________________ C

If internal nodes of a Newick string do not have names, TreeTools gives them default names of the form NODE_i, with i::Int. This happens while parsing the Newick string, in the parse_newick! function. Such a label is not guaranteed to be unique, since the Newick string may itself contain a node with the same name. In some cases it is therefore necessary to create a unique identifier for a node. This is done by generating a random string with Random.randstring(8), at a later stage, when calling the node2tree function (see the section about Tree). This happens when:

  • the node label is found to be a bootstrap value (see ?TreeTools.isbootstrap).
  • the option force_new_labels is used when calling read_tree. This is useful if some internal nodes of the Newick string have redundant names.
  • for some reason, the node does not yet have a label.

There are $62^8 \approx 2\cdot 10^{14}$ strings of length 8 made of alphanumeric characters, so this should be fine for most problems. A quick birthday-problem calculation, $p \approx n^2 / (2 \cdot 62^8)$ with $n \approx 1000$ identifiers, shows that for a tree of 1000 leaves the probability of obtaining two equal identifiers for different nodes is $\sim 2 \cdot 10^{-9}$, which is probably acceptable for most applications. If you think this is not enough, I can add an option letting users create longer strings, or solve this in a more elegant way.
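The estimate above can be checked numerically. This sketch assumes identifiers are drawn uniformly from the 62 characters of randstring's default alphabet; collision_probability is a hypothetical helper, not part of TreeTools. It computes the exact birthday-problem probability in log space and compares it with the first-order approximation $n(n-1)/(2N)$.

```julia
"""
    collision_probability(n; len = 8, alphabet = 62)

Probability that at least two of `n` random strings of length `len`,
drawn uniformly from `alphabet` characters, are equal.
"""
function collision_probability(n::Integer; len::Integer = 8, alphabet::Integer = 62)
    N = float(alphabet)^len  # number of possible strings (62^8 ≈ 2.2e14)
    # P(all distinct) = prod_{k=0}^{n-1} (1 - k/N); sum logs to avoid rounding loss
    logp_distinct = sum(log1p(-k / N) for k in 0:n-1)
    return -expm1(logp_distinct)
end

p = collision_probability(1000)        # ≈ 2.3e-9
approx = 1000 * 999 / (2 * 62.0^8)     # first-order approximation n(n-1)/(2N)
```

Both values agree to many digits here, because $p$ is so small that the higher-order terms are negligible.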

Writing

To write t::Tree to a Newick file, simply call write(filename, t). If you want to append to a file, call write(filename, t, "a"). Note that write(filename, t) adds a newline '\n' character at the end of the Newick string. This is done in case other trees have to be added to the file.
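A minimal round trip, assuming a temporary path from tempname() and two arbitrary example trees: write one tree, append a second one with mode "a", then read both back. Because write adds '\n' after each tree, the two trees end up on separate lines, which is what read_tree expects for several trees.

```julia
using TreeTools

t1 = parse_newick_string("(A:3,(B:1,C:1)BC:1);")
t2 = parse_newick_string("((D:1,E:1)DE:1,F:2);")

path = tempname() * ".nwk"
write(path, t1)        # create the file; a '\n' is added after the tree
write(path, t2, "a")   # append a second tree on its own line

trees = read_tree(path)  # two lines, so an array of Tree objects
```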