It is usually a good idea to control for species' evolutionary history if we want robust results. Species are not independent of each other, so their data violate the independence assumption of most statistical models. Fortunately, with more and more genetic data and software available, building phylogenies is getting easier.
Phylomatic is an easy way to fetch phylogenies online, especially for plants. Thanks to packages developed by rOpenSci, we can now use Phylomatic within R. One big advantage of this is reproducibility: we can regenerate the phylogeny whenever we want without clicking buttons on the website. In addition, most ecologists already use R for downstream analyses, so fetching phylogenies within R makes the workflow more natural and easier to follow.
The basic procedure for fetching phylogenies with Phylomatic in R is:

- Compile the species names to include in the phylogeny, and clean them if necessary (the `taxize` package, `rotl::tnrs_match_names()`);
- Clean and prepare the species names in the format used by Phylomatic (`brranching::phylomatic_names()`);
- Query Phylomatic and return the phylogeny (`brranching::phylomatic()`; if you have hundreds of species, it is better to run Phylomatic locally with `brranching::phylomatic_local()`).¹
It is possible to merge steps 2 and 3, but I prefer to keep them separate.
I assume that you already have a list of species named `sp_list`. Then we can use the `phylomatic()` function from the `brranching` package. If you do not have it installed, install it first with `install.packages("brranching")`.
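```r
# Replace with your own species; these three names are only placeholders
sp_list = c("Quercus alba", "Acer rubrum", "Pinus strobus")
# Look up families and query the Phylomatic web service in one call
tree = brranching::phylomatic(sp_list)
```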
If you have only a few species, this will likely give you a phylogeny with all of them. In practice, however, you may well get a warning like this:
NOTE: 3 taxa not matched: NA/genus/species, ...
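To see exactly which species were dropped, one option is to compare the input names with the tree's tip labels. Below is a minimal base-R sketch. `missing_from_tree()` is a hypothetical helper, and it assumes Phylomatic returns tips in `genus_species` form. It lower-cases both sides, so capitalisation does not matter.

```r
# Hypothetical helper (not part of brranching): return the input names that
# do not appear among the tree's tip labels.
missing_from_tree = function(sp_list, tip_labels) {
  # Accept plain names ("Quercus alba") or Phylomatic strings
  # ("fagaceae/quercus/quercus_alba"): keep the last path component and
  # turn spaces into underscores to match the assumed tip-label format.
  wanted = tolower(gsub(" +", "_", trimws(sub(".*/", "", sp_list))))
  sp_list[!wanted %in% tolower(tip_labels)]
}

# Usage: missing_from_tree(sp_list, tree$tip.label)
```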
When some taxa are not matched, we can try preparing the species names first with `brranching::phylomatic_names()`. Its default database is `ncbi`, which can be slow if you have hundreds of species. I would suggest using `apg` first instead, because it is much faster (this is the default within `brranching::phylomatic()`). Then take the species that have `NA` as their family and try `ncbi` or `itis` for them (`ncbi`, `itis` and `apg` are the three supported databases). Sometimes species names are not clean, for example because they contain synonyms. In that case the R package `taxize` is really handy. I also find `rotl::tnrs_match_names()` good for checking and resolving names, because it matches them against the Open Tree of Life taxonomy.
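After running `rotl::tnrs_match_names()`, the results still have to be turned back into a species list. Below is a minimal sketch of one way to do this. `clean_from_tnrs()` is a hypothetical helper. It assumes the returned table has the columns `search_string`, `unique_name`, `approximate_match` and `is_synonym`, and that unmatched names have `NA` in `unique_name`. Unmatched names are kept as they are, and fuzzy matches, synonyms and unmatched names are flagged for a manual check rather than trusted blindly.

```r
# Hypothetical helper (not part of rotl): build a cleaned species list from a
# tnrs_match_names()-style table (column names are assumptions, see above).
clean_from_tnrs = function(sp_list, tnrs) {
  # Line rows up with the input via the (case-insensitive) search string
  idx = match(tolower(trimws(sp_list)), tolower(tnrs$search_string))
  accepted = tnrs$unique_name[idx]
  matched = !is.na(accepted)
  data.frame(
    original = sp_list,
    # Use the Open Tree name when there is one, otherwise keep the input
    cleaned = ifelse(matched, accepted, sp_list),
    # Flag anything that deserves a second look
    check = (!matched) |
      (tnrs$approximate_match[idx] %in% TRUE) |
      (tnrs$is_synonym[idx] %in% TRUE),
    stringsAsFactors = FALSE
  )
}

# Usage (needs an internet connection):
# tnrs = rotl::tnrs_match_names(sp_list)
# sp_table = clean_from_tnrs(sp_list, tnrs)
# sp_list = sp_table$cleaned
```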
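To prepare the names with NCBI, for example:

```r
# Look up each species' family in NCBI and build
# "family/genus/genus_species" strings for Phylomatic
sp_list_phylocom = brranching::phylomatic_names(sp_list,
                                                format = "isubmit",
                                                db = "ncbi")
```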
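The "fast database first, slower ones only for the leftovers" strategy described above can be sketched as a small loop. `resolve_families()` is a hypothetical helper, not part of brranching. By default it calls `brranching::phylomatic_names()` with the `apg`, `ncbi` and `itis` databases in turn. It assumes `phylomatic_names()` returns one string per species, in input order, and that an unmatched family shows up as a leading `NA/` (as in the note above). The `resolve` argument lets you swap in another lookup function, which also makes the loop easy to test offline.

```r
# Hypothetical helper (not part of brranching): try the fast "apg" lookup
# first, then retry only the still-unmatched species against slower databases.
# `resolve` must return one "family/genus/genus_species" string per species.
resolve_families = function(species,
                            dbs = c("apg", "ncbi", "itis"),
                            resolve = function(x, db) {
                              brranching::phylomatic_names(x, format = "isubmit", db = db)
                            }) {
  out = rep(NA_character_, length(species))
  todo = seq_along(species)
  for (db in dbs) {
    if (length(todo) == 0) break
    res = resolve(species[todo], db)
    # Assumed: an unmatched family comes back as "NA/genus/genus_species"
    ok = !is.na(res) & !grepl("^na/", res, ignore.case = TRUE)
    out[todo[ok]] = res[ok]
    todo = todo[!ok]
  }
  out  # still NA for species that no database could place in a family
}
```

A call would then look like `sp_list_phylocom = resolve_families(sp_list)`. Drop any remaining `NA` entries before passing the result to `brranching::phylomatic(..., taxnames = FALSE)`.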
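Now let’s try to fetch the phylogeny again with the updated species list. The names are already in Phylomatic's `family/genus/genus_species` format, so we set `taxnames = FALSE` to stop `phylomatic()` from looking up the families a second time.

```r
tree = brranching::phylomatic(sp_list_phylocom, taxnames = FALSE)
```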
As mentioned earlier, these two steps can be merged into one by passing the raw names directly, e.g. `tree = brranching::phylomatic(sp_list, db = "ncbi")`, but I prefer to resolve species names first.
The default backbone phylogeny is `R20120829`, which is based on APG III. To use the Zanne et al. 2014 phylogeny instead, set `storedtree = "zanne2014"`:
```r
# Use the Zanne et al. (2014) megatree instead of the default R20120829
tree = brranching::phylomatic(sp_list_phylocom,
                              taxnames = FALSE,
                              storedtree = "zanne2014")
plot(tree)
```
Finally, I have a reproducible example on GitHub that shows how to use the `brranching` package to get a phylogeny for plants. Feel free to check it out (and the associated paper, if you are interested)!