Reading and writing
For now, TreeTools only handles the Newick format. Functions are quite basic at this stage.
Reading
To read from a Newick file, use read_tree. Here is an example with the examples/tree_10.nwk file:
tree = read_tree("../../examples/tree_10.nwk")
_________________ 10
_______________________________________________________|
| | __ 8
| |______________|
| |,_ 9
| ||
_| |, 4
| ||
| | 6
|
| ______________________________ 2
| |
|__________________________________________| , 1
| ___________________|
| | | 5
|_________|
| ,_ 3
|__________________|
|_ 7
The documentation reproduced below gives more information:
TreeTools.read_tree (Function)
read_tree(
nwk_filename::AbstractString;
node_data_type=DEFAULT_NODE_DATATYPE, label=default_tree_label(), force_new_labels=false
)
read_tree(
io::IO;
node_data_type=DEFAULT_NODE_DATATYPE, label=default_tree_label(), force_new_labels=false
)
Read Newick file and create a Tree{node_data_type} object from it. The input file can contain multiple Newick strings on different lines. The output will then be an array of Tree objects.
The call node_data_type() must return a valid instance of a subtype of TreeNodeData. You can implement your own subtypes, or see ?TreeNodeData for already implemented ones.
Use force_new_labels=true to force the renaming of all internal nodes. By default the tree will be assigned a default_tree_label(), however the label of the tree can also be assigned with the label parameter.
If you have a variable containing a Newick string and want to build a tree from it, use parse_newick_string instead.
Note on labels
The Tree type identifies nodes by their labels. This means that labels have to be unique. For this reason, the following is done when reading a tree:
- if an internal node does not have a label, a unique one will be created of the form "NODE_i"
- if a node has a label that was already found before in the tree, a random identifier will be appended to it to make it unique. Note that the identifier is created using randstring(8), so uniqueness is technically not guaranteed.
- if force_new_labels is used, a unique identifier is appended to node labels
- if node labels in the Newick file are identified as confidence/bootstrap values, a random identifier is appended to them, even if they're unique in the tree. See ?TreeTools.isbootstrap to see which labels are identified as confidence values.
read_tree will also read files containing several Newick strings, provided they are on separate lines. It then returns an array of Tree objects.
If you have a variable containing a Newick string, simply call parse_newick_string to return a tree:
nwk = "(A:3,(B:1,C:1)BC:1);"
tree = parse_newick_string(nwk)
____________________________________________________________________________ A
_|
| __________________________ B
|________________________|
|__________________________ C
If internal nodes of a Newick string do not have names, TreeTools will by default give them names of the form NODE_i with i::Int. This happens during parsing of the Newick string, in the parse_newick! function. This label is technically not guaranteed to be unique: the Newick string may also contain nodes with the same name. In some cases, it is thus necessary to create a unique identifier for a node. This is done by creating a random string obtained with the call Random.randstring(8), and happens at a later stage, when calling the node2tree function (see the section about Tree). This happens when:
- the node label is found to be a bootstrap value (see ?TreeTools.isbootstrap).
- the option force_new_labels is used when calling read_tree. This is useful if some internal nodes of the Newick string have redundant names.
- for some reason, the node does not yet have a label.
There are about $2\cdot 10^{14}$ strings of length 8 (alphabetic + numeric characters), so this should be fine for most problems. A quick calculation shows that for a tree of 1000 leaves, the probability of obtaining two equal identifiers for different nodes is $\sim 2 \cdot 10^{-9}$, which is probably acceptable for most applications. If you think this is not enough, I can add an option to let users create longer strings, or solve this in a more elegant way.
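The estimate can be reproduced with the usual birthday-problem approximation. This is only a sketch under two assumptions that are not stated in the text above: identifiers are drawn uniformly and independently from the $N = 62^8$ strings made of the characters a-z, A-Z and 0-9, and about $n \approx 1000$ nodes receive one (roughly the number of internal nodes of a binary tree with 1000 leaves).

$$
N = 62^8 \approx 2.18 \cdot 10^{14}, \qquad
P(\text{collision}) \approx \frac{n(n-1)}{2N} \approx \frac{10^{6}}{4.4 \cdot 10^{14}} \approx 2.3 \cdot 10^{-9}
$$

Since the probability grows roughly as $n^2$, a tree ten times larger would have a collision probability about a hundred times higher under the same assumptions.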
Writing
To write t::Tree to a Newick file, simply call write(filename, t). If you want to append to a file, call write(filename, t, "a"). Note that write(filename, t) adds a newline '\n' character at the end of the Newick string. This is done in case other trees have to be added to the file.
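As a rough illustration of how writing and reading fit together, the sketch below writes two trees to one file and reads them back. It uses only the calls described on this page (parse_newick_string, write and read_tree). The second Newick string and the temporary file path are made up for the example, and the behavior is assumed to match the description above.

```julia
using TreeTools

# Two small trees built from Newick strings (the second string is illustrative)
t1 = parse_newick_string("(A:3,(B:1,C:1)BC:1);")
t2 = parse_newick_string("((A:1,B:1)AB:2,C:3);")

# Hypothetical output location: a temporary file with a .nwk extension
path = tempname() * ".nwk"

write(path, t1)        # creates the file; the Newick string is followed by '\n'
write(path, t2, "a")   # appends the second tree on its own line

# The file now holds two Newick strings on separate lines,
# so read_tree is expected to return an array of Tree objects
trees = read_tree(path)
```

The trailing newline added by write is what makes the appended tree land on its own line, which is the layout read_tree expects for files containing several trees.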