Citation Sunday: Understanding Ash Dieback

Understanding Ash Dieback 

Ash Trees | Derek Harper

Gross, A., Hosoya, T., & Queloz, V. (2014). Population structure of the invasive forest pathogen Hymenoscyphus pseudoalbidus. Molecular Ecology, n/a–n/a. doi:10.1111/mec.12792


Share your publication with us and we may just post it here. Email Melissa with a link or cite Geneious.

Citation Sunday: Multiple Paternity of Perches

Multiple Paternity of Perches 

Striped Perch | Steve Lonhart (SIMoN / MBNMS)

LaBrecque, J. R., Alva-Campbell, Y. R., Archambeault, S., & Crow, K. D. (2014). Multiple paternity is a shared reproductive strategy in the live-bearing surfperches (Embiotocidae) that may be associated with female fitness. Ecology and Evolution, n/a–n/a. doi:10.1002/ece3.1071


Hilary Miller is attending the Annual Scientific Meeting of the Genetics Society of AustralAsia this week in Sydney. 

She’ll also be presenting the work behind the Geneious Circular Assembler. 

Citation Sunday: Blood Parasites of Frogs

Blood Parasites in Frogs

Green Frog | Cephas

Leveille, A. N., Ogedengbe, M. E., Hafeez, M. A., Tu, H.-H. A., & Barta, J. R. (2014). The complete mitochondrial genome sequence of Hepatozoon catesbianae (Apicomplexa; Coccidia; Adeleorina), a blood parasite of the Green frog, Lithobates (formerly Rana) clamitans. The Journal of Parasitology. doi:10.1645/13-449.1


Geneious at Evolution and CodeFest 2014

Evolution 2014

The Geneious team has been out and about meeting researchers all over the world.

We’re proud to be a gold sponsor of this year’s Evolution Meeting in Raleigh and it’s been great to see so many phylo-nerds don our temporary tattoos of Darwin’s Evolutionary Tree sketch.  

Geneious CodeFest

Although the tattoos are all gone, we do have some sweet t-shirts to give away at the Geneious CodeFest this week.

CodeFest will be held at the North Carolina Museum of Natural Sciences, on Thursday 25 June from 9:00-5:00 and is open to the public.

This will be a great opportunity for open source bioinformatics developers to meet and work collaboratively on plugins for the Geneious sequence analysis platform. Some examples of community-developed plugins include Augustus, InterProScan, QuRE, and PlasmoDB.

Whether you would like to contribute your Java coding skills or just want to pop in to say hello, please stop by to cheer us on! More event details can be found here.

Citation Sunday: The Symbiodinium Biosphere in Reef-Building Corals

A diversity of corals | Toby Hudson

Deep-Sequencing Method for Quantifying Background Abundances of Symbiodinium Types:

Quigley, K. M., Davies, S. W., Kenkel, C. D., Willis, B. L., Matz, M. V., & Bay, L. K. (2014). Deep-sequencing method for quantifying background abundances of Symbiodinium types: exploring the rare Symbiodinium biosphere in reef-building corals. PLoS ONE, 9(4), e94297. doi:10.1371/journal.pone.0094297

Citation Sunday: Cement Proteins in Goose Barnacles

Goose Barnacle | Minette Layne

Cement Proteins in the Goose Barnacle

Perina, A., von Reumont, B. M., Martínez-Lage, A., & González-Tizón, A. M. (2014). Accessing transcriptomic data for ecologically important genes in the goose barnacle (Pollicipes pollicipes), with particular focus on cement proteins. Marine Genomics. doi:10.1016/j.margen.2014.02.003

Citation Sunday: Cute Penguins and Giant Snails

A novel virus in Adelie Penguins

Adelie Penguin | Nanosmile

Varsani, A., Kraberger, S., Jennings, S., Porzig, E. L., Julian, L., Massaro, M., … Ainley, D. G. (2014). A novel papillomavirus in Adelie penguin (Pygoscelis adeliae) faeces sampled at the Cape Crozier colony, Antarctica. The Journal of General Virology, vir.0.064436–0–. doi:10.1099/vir.0.064436-0

Species variation in New Zealand’s giant snail.

Giant Snails | Erin Bowkett 

Buckley, T. R., White, D. J., Howitt, R., Winstanley, T., Ramon-Laca, A., & Gleeson, D. (2014). Nuclear and mitochondrial DNA variation within threatened species and subspecies of the giant New Zealand land-snail genus Powelliphanta: implications for classification and conservation. Journal of Molluscan Studies, eyu014–. doi:10.1093/mollus/eyu014

Which Tree Builder Should I Use? Making the Most of Maximum Likelihood in Geneious

By Hilary Miller.

At Geneious support, we often get questions like: “I want to make a tree with 50,000 taxa - what’s the best tree builder to use?” or “Why has my PHYML tree been running for 3 days and how can I speed it up?” In this blog post I’ll review the maximum-likelihood tree builders you can run within Geneious (PHYML, Garli, RAxML and FastTree) and try to answer these questions. I’ll tell you what sort of datasets each is most suitable for, which is fastest, and what sort of options you get with each. I won’t go into the differences between the algorithms that these programs use, but if you are interested in that you can check out the links I’ve given for each one.

Background to each program

PHYML

PHYML was written by Stéphane Guindon and his colleagues at Université de Montpellier, France, and the University of Auckland, New Zealand. It was first published in 2003, and we are now up to version 3.0, which is described in this paper. PHYML is one of the best-known maximum-likelihood programs, valued for its simplicity, accuracy and speed.

RAxML

RAxML comes from Alexandros Stamatakis' Exelixis lab at Ludwig-Maximilians-Universität in Munich. It is based on the dnaml program written by Joe Felsenstein as part of the PHYLIP package, and was developed for handling large datasets, with comparatively low memory consumption, advanced search algorithms and accelerated likelihood calculations.

The Geneious plugin currently uses RAxML version 7.2.8, so the features listed in the table below are for that version.

Garli

Garli is written and maintained by Derrick Zwickl, who is currently at the University of Kansas, but the supporting documents for the software are hosted at NESCent. It is based on the program GAML (Lewis 1998), and a description of how it works is given here.

FastTree

FastTree was developed by Morgan N. Price in Adam Arkin’s group at Lawrence Berkeley National Lab. It is optimized for extremely large alignments of up to 1 million sequences and uses a combination of neighbor-joining, minimum evolution and maximum likelihood to infer approximately-maximum-likelihood trees. A detailed description of how it works is given here, but to summarize, FastTree uses neighbor-joining to get an approximate starting tree, then minimum evolution methods to reduce the length of the tree, and then maximum likelihood to further improve the tree. Geneious implements FastTree 2.1.5.

What can you do with these programs?

All four programs will build trees from both DNA and protein alignments; however, there are some differences in the options you get with each one, which I’ve summarized in the table below.

image

PHYML gives you the widest choice of models, with the ability to input any of the models that Modeltest compares for DNA data. However, bear in mind that most of these models are nested within the General-Time-Reversible (GTR) model which is implemented in the other programs.  

If you want to be able to partition your data and apply different models to each partition, then you’ll need to use RAxML.  PHYML also gives you a variety of methods for calculating support values, but it does have an inbuilt constraint on the number of taxa.  I’m not aware of similar dataset size constraints for Garli and RAxML (although as you’ll see below these programs are all out-performed by FastTree for really large datasets).

A brief note about how these programs run in Geneious

These plugins don’t run within the Geneious Java run-time environment; instead, they run as stand-alone programs with Geneious providing an interface. Geneious exports your file to the plugin, the plugin program runs, and then the results are imported back into Geneious.

This has implications for how much memory you should allocate to Geneious, because the more you allocate to Geneious, the less the plugin will have available to it.  However, you need to have enough RAM allocated to Geneious for it to be able to handle the export/import of files - and for large files this can require a significant amount (often more than is required to actually build the tree).   

Which is fastest?

The answer to this question depends a lot on the type of dataset you have. In my testing of these programs (using a standard laptop with a 2.4 GHz Intel Core 2 Duo processor, 8 GB of total RAM, 4 GB allocated to Geneious) I have found that, as a very general rule, speed goes something like this: FastTree >> RAxML > PHYML > Garli.

FastTree is by far the fastest algorithm for trees with a large number of taxa. FastTree can produce a 10,000 taxon tree with support values in only a couple of minutes, whereas the same tree built by RAxML or Garli may take several days to run. PHYML won’t even run on an alignment this large, as it has a built-in cutoff of 4,000 taxa. However, trees produced by FastTree are “approximately maximum likelihood” trees, and for datasets where the relationships between taxa are not so clear-cut, they may not be as accurate as trees produced by the other methods, which perform a more intensive search of tree topologies (see the FastTree website for a more thorough discussion on the speed and accuracy of FastTree vs PHYML vs RAxML).

If you have extremely long sequences, but only a few taxa (for example if you’re building a tree from a small number of bacterial genomes), then RAxML and PHYML out-perform FastTree. A tree of five sequences, 4 million bases in length (computed without support values) took around 14 minutes in FastTree and only about 1 minute in RAxML and PHYML. Garli is also inefficient on long sequences, as it performs a time-consuming “pattern processing” step on the alignment before it builds the tree. In my test of a 5-taxon, 4-million-base alignment, this step was still going after 12 hours!

Of the full maximum-likelihood tree builders, RAxML appears to be most efficient for large trees from DNA data.  PHYML is a good choice for smaller datasets, as according to the PHYML manual the “comfort zone” for PhyML generally lies around 100-200 sequences less than 2,000 characters long.  The PHYML website has some extensive comparisons between PHYML and RAxML using a range of datasets.  
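The rules of thumb above can be written down as a small helper. This is only a rough sketch of the guidance in this post, not part of Geneious or any of the four programs: the function name and the thresholds for "few taxa, very long sequences" (10 taxa, 1 million bases) are assumptions chosen for illustration, while the 4,000-taxon PHYML cutoff and the PhyML "comfort zone" come from the text.

```python
# Rough encoding of this post's rules of thumb. The function name and the
# LONG_ALIGNMENT / FEW_TAXA thresholds are illustrative assumptions.
PHYML_MAX_TAXA = 4000        # built-in PHYML cutoff mentioned in the post
LONG_ALIGNMENT = 1_000_000   # assumed threshold for "extremely long" sequences
FEW_TAXA = 10                # assumed threshold for "only a few taxa"


def suggest_tree_builders(n_taxa, alignment_length, need_partitions=False):
    """Return candidate tree builders, most suitable first (rule of thumb only)."""
    if need_partitions:
        # Only RAxML lets you apply different models to each partition.
        return ["RAxML"]
    if n_taxa > PHYML_MAX_TAXA:
        # PHYML will not run; FastTree is far faster at this scale.
        return ["FastTree", "RAxML", "Garli"]
    if n_taxa <= FEW_TAXA and alignment_length >= LONG_ALIGNMENT:
        # Garli's pattern-processing step is very slow on long alignments.
        return ["RAxML", "PHYML", "FastTree"]
    if n_taxa <= 200 and alignment_length < 2000:
        # The PhyML "comfort zone" described in its manual.
        return ["PHYML", "RAxML", "Garli", "FastTree"]
    return ["RAxML", "FastTree", "PHYML", "Garli"]


if __name__ == "__main__":
    print(suggest_tree_builders(10000, 1500))
    print(suggest_tree_builders(5, 4_000_000))
```

Treat the output as a starting point only: the speed ranking in this post comes from one laptop and a handful of datasets, so your own data may behave differently.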

How can I make my tree run faster?

The short answer is to get a faster computer. Having more RAM available for your tree builder won’t necessarily speed it up, but may mean you can build larger trees without running out of memory. Speed is primarily determined by the speed of your processor, and currently all the tree builders I’ve covered here only use a single processor and cannot be configured to run across multiple cores.

So, which tree is best?

There is no one answer to this thorny question, as it is entirely dependent on the nature of your dataset and how well the chosen model fits your data. Maximum-likelihood tree builders return the tree under which your data are most probable, given the model you have chosen, but because of the differences in algorithms, the likelihood values produced by each program can’t be directly compared. It is good practice to use more than one method of tree building to assess how robust your tree topology is.
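One simple way to compare the topologies from two tree builders is to count the splits (bipartitions of the taxa) that appear in one tree but not the other, often called the Robinson-Foulds distance. The sketch below is an illustration only and is not something Geneious or the plugins provide in this form: it assumes each tree has already been converted to nested Python tuples of taxon names (parsing exported tree files is left out), and the function names are made up for this example.

```python
# Minimal sketch: compare two tree topologies by their splits.
# Trees are assumed to be nested tuples of taxon names, e.g. (("A", "B"), "C").

def leaves(tree):
    """Return the set of taxon names under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits of the taxa implied by the tree."""
    all_leaves = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_leaves - side
        # Ignore splits that separate nothing or only a single taxon.
        if len(side) > 1 and len(other) > 1:
            found.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Count the splits found in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    raxml_like = ((("A", "B"), "C"), ("D", "E"))
    phyml_like = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(raxml_like, phyml_like))
```

A distance of zero means the two programs agree on every grouping. Because splits ignore where the tree is rooted, the same unrooted tree written with a different root still scores zero.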

And finally, if you are publishing your results, please remember to cite the original authors of the program you used.  You can find the links on the respective plugin pages. 

Geneious Webinars - May 2014

Interested in learning more about Geneious? 

Geneious Field Application Scientist Christian Olsen is hosting the following webinars in May. Email Christian to sign up. 

  • May 12th, 9:00AM PST - Geneious R7: A Bioinformatics Platform for Biologists
  • May 19th, 9:00AM PST - Geneious R7: Sequence Alignment & Assembly