2014-05-11

Phylogenetic position of Ctenophora

The first ctenophore genome, that of Mnemiopsis leidyi, has now been sequenced (Ryan et al. 2013). Genomes for most of the 30 or so animal phyla are still unavailable, but this might change in the coming years (Bracken-Grissom et al. 2014). I had great hopes that the genome would give new insights into the phylogenetic position of ctenophores. Unfortunately, the Science paper was disappointing, not so much because the phylogenetic position of Ctenophora remains unresolved, but because the authors give the impression that it has been resolved. The abstract of the paper is terribly misleading. Of course, it is more attractive to present the results as if one of the long-standing questions in animal evolution had been solved than to admit that, even after sequencing the whole genome, the answer is still uncertain.

Ryan et al. want to make the case that ctenophores are the sister group to all other animals. This was first suggested by Dunn et al. (2008), but it wasn't taken very seriously even by the authors, and a year later Philippe et al. (2009) showed that the likely cause of such a strange placement was the fast molecular evolution of ctenophores. Taking this into account, Philippe et al. found that ctenophores most likely belong to Eumetazoa (animals with true tissues, muscles and a nervous system) and that the sister group to all other animals is Porifera (sponges, which lack muscles and a nervous system), as had long been suspected. According to Philippe et al., ctenophores are most closely related to Cnidaria, which they superficially resemble and with which they were traditionally classified together as Coelenterata. The results of Philippe et al. can by no means be considered final, however, and reliably deciphering the relationships between the main lineages of animals (Porifera, Placozoa, Ctenophora, Cnidaria, and Bilateria) is still difficult (Nosenko et al. 2013).

So what evidence do Ryan et al. provide for such a surprising phylogenetic placement of ctenophores? The two main lines of evidence are phylogenetic analyses of protein sequences and of gene content (the presence or absence of genes in each genome).

Despite the impression the authors give, their phylogenetic analyses are far from conclusive. They used two methods, maximum likelihood (ML) and Bayesian inference, to construct phylogenies from protein sequences. The two methods gave different results, most likely not because of the methodological differences but because of the different evolutionary models employed.

In the ML framework, they used the standard GTR model, which is a site-homogeneous model. This means that the probabilities describing the different amino acid (or nucleotide) replacements (together termed the replacement matrix) do not vary along the sequence. Although the overall rate of replacement can vary among sites when a gamma rate parameter is introduced into the model, the relative probabilities of amino acid replacements remain the same. In reality, however, different regions of proteins not only evolve faster or slower but also evolve qualitatively differently because of various constraints. Depending on the position in the protein, only some types of amino acids tend to be allowed (e.g. hydrophobic or aromatic). It is therefore necessary to consider different amino acid replacement probabilities for different positions even when the rate of change is the same (models of this kind are called site-heterogeneous). Fortunately, there is no need to assign every position in the sequence its own replacement matrix (which would make the analyses computationally intractable); positions can instead be grouped into a smaller number of categories. The Bayesian CAT model (Lartillot & Philippe 2004) estimates from the data the number of categories and the amino acid replacement pattern that best describes each of them. As with the GTR model, site-heterogeneous models can be combined with a gamma rate parameter to vary the overall rate among sites, which adds another layer of complexity (and makes the analysis computationally more demanding).
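
To make the difference between the two kinds of model concrete, here is a minimal sketch in Python. It is my own toy illustration, not the GTR or CAT implementation used by Ryan et al.: it assumes a four-letter alphabet instead of 20 amino acids, a simple Poisson-type (F81-like) process with a closed-form transition probability, and two hand-picked site categories. All function names are hypothetical. It compares how often two distantly related sequences are expected to show the same state at a site when every site may use all four states versus when each site is restricted to two of them.

```python
import math


def transition_matrix(profile, t):
    """Poisson-type (F81-like) process: after time t a site either keeps its
    state (probability exp(-t)) or is redrawn from the site's profile."""
    decay = math.exp(-t)
    n = len(profile)
    return [
        [decay * (i == j) + (1.0 - decay) * profile[j] for j in range(n)]
        for i in range(n)
    ]


def prob_same_state(profile, t):
    """Probability that two sequences separated by time t show the same state
    at a site that evolves under the given profile."""
    p = transition_matrix(profile, t)
    return sum(profile[i] * p[i][i] for i in range(len(profile)))


def mixture_prob_same_state(categories, t):
    """Site-heterogeneous version: average over (weight, profile) categories."""
    return sum(weight * prob_same_state(profile, t) for weight, profile in categories)


# Site-homogeneous toy model: every site may use all four states.
uniform_profile = [0.25, 0.25, 0.25, 0.25]

# Site-heterogeneous toy model (assumed categories): half of the sites allow
# only states 0 and 1, the other half only states 2 and 3. Averaged over all
# sites, the composition is the same as in the homogeneous model.
site_categories = [
    (0.5, [0.5, 0.5, 0.0, 0.0]),
    (0.5, [0.0, 0.0, 0.5, 0.5]),
]

long_time = 10.0  # long enough for the sites to be saturated
homogeneous = prob_same_state(uniform_profile, long_time)
heterogeneous = mixture_prob_same_state(site_categories, long_time)

if __name__ == "__main__":
    print("homogeneous model:  ", round(homogeneous, 3))
    print("heterogeneous model:", round(heterogeneous, 3))
```

At saturation the homogeneous model expects chance identity at about a quarter of the sites, whereas the restricted sites match about half of the time. A model that ignores such restrictions therefore underestimates how much similarity between distant, fast-evolving lineages is due to convergence rather than common ancestry.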

The ML analysis using the GTR+gamma model favored a tree where ctenophores were the sister group to all other animals. The Bayesian analyses with the CAT model favored either a tree where ctenophores were the sister group to Porifera (105 000-site dataset with little missing data but small taxon sampling) or a tree where they were positioned within Eumetazoa (88 000-site dataset with a lot of missing data but large taxon sampling).

The GTR model is less realistic and clearly more prone to long-branch attraction artefacts than CAT (Lartillot et al. 2007). Long-branch attraction causes fast-evolving (long-branch) taxa to group together regardless of their phylogenetic affinities, or pulls them towards distant outgroup taxa. As ctenophores appear to be fast evolving, at least at the molecular level (Philippe et al. 2009; Pett et al. 2011; Kohn et al. 2012), it cannot be excluded that the position of ctenophores in the ML analysis is a long-branch attraction artefact. Ryan et al.'s results also show that ctenophores are among the faster-evolving taxa in their dataset, but because the ctenophores were not extremely fast evolving, the authors did not consider this a problem (it is unclear to me what gave them this confidence).
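
Long-branch attraction is easiest to see in the classic four-taxon case. The sketch below is a textbook-style toy, not a reanalysis of the Ryan et al. data: it assumes two-state characters, a symmetric change probability on each branch, and maximum parsimony rather than ML as the tree-building criterion. The taxon labels, branch values and function names are all hypothetical. The true tree is ((A,B),(C,D)), and the code computes the exact expected frequency of the site patterns that support each of the three possible groupings.

```python
from itertools import product


def branch_prob(parent, child, p_change):
    """Probability of the child state given the parent state on one branch."""
    return p_change if parent != child else 1.0 - p_change


def pattern_support(p_a, p_b, p_c, p_d, p_internal):
    """Expected frequency of parsimony-informative site patterns when the
    true tree is ((A,B),(C,D)); u and v are the two internal nodes."""
    support = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for a, b, c, d in product((0, 1), repeat=4):
        prob = 0.0
        for u, v in product((0, 1), repeat=2):
            prob += (
                0.5  # both states are equally likely at node u
                * branch_prob(u, a, p_a)
                * branch_prob(u, b, p_b)
                * branch_prob(u, v, p_internal)
                * branch_prob(v, c, p_c)
                * branch_prob(v, d, p_d)
            )
        if a == b and c == d and a != c:
            support["AB|CD"] += prob
        elif a == c and b == d and a != b:
            support["AC|BD"] += prob
        elif a == d and b == c and a != b:
            support["AD|BC"] += prob
    return support


def parsimony_choice(support):
    """With unlimited data, parsimony picks the grouping whose supporting
    patterns are most frequent."""
    return max(support, key=support.get)


# All branches short: the true grouping AB|CD is recovered.
all_short = pattern_support(0.05, 0.05, 0.05, 0.05, 0.05)

# A and C evolve fast (long branches) although they are not sister taxa.
long_a_and_c = pattern_support(0.40, 0.05, 0.40, 0.05, 0.05)

if __name__ == "__main__":
    print("all branches short:", parsimony_choice(all_short))
    print("A and C long:      ", parsimony_choice(long_a_and_c))
```

With the assumed values, the two fast-evolving taxa A and C are grouped together even though the data were generated on a tree where they are not sisters, and adding more sites only strengthens the wrong answer. Likelihood methods with an inadequate substitution model can fail in an analogous way, which is the worry with the GTR analysis.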

Unfortunately, their Bayesian analyses are not without problems either. The small-taxon analyses (where a Mnemiopsis+sponge clade was sister to the other animals) were problematic precisely because of poor taxon sampling (15–19 taxa depending on the outgroup size). A large number of taxa is required (Lartillot & Philippe 2004) to reliably estimate the parameters of the CAT model and to decide between ancestral and derived character states. For the large-taxon datasets the problem appeared to be the opposite: they were too big to give reliable results even after running the analyses for 200 days on average. This could perhaps have been solved by excluding some of the taxa (especially among the well-sampled Bilateria) and by analyzing datasets containing, for example, a random 50% of the original positions (44 000 instead of 88 000). The PhyloBayes-MPI manual mentions that getting consistent results already becomes challenging beyond 20 000 positions.
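
Drawing a random subset of alignment positions, as suggested above, takes only a few lines. The following is a minimal sketch assuming the alignment is held in memory as a dictionary from taxon name to aligned sequence; the function name, the taxon labels and the tiny alignment are made up for illustration, and a real analysis would read and write alignment files with a dedicated library.

```python
import random


def subsample_columns(alignment, fraction, seed=0):
    """Keep a random fraction of alignment columns, using the same columns
    for every taxon and preserving their original order."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    n_keep = round(n_sites * fraction)
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    kept = sorted(rng.sample(range(n_sites), n_keep))
    return {
        taxon: "".join(seq[i] for i in kept)
        for taxon, seq in alignment.items()
    }


# Hypothetical toy alignment with eight positions.
toy_alignment = {
    "taxon_1": "MKLVAGTE",
    "taxon_2": "MRLIAG-E",
    "taxon_3": "MKIVSGTD",
}

half = subsample_columns(toy_alignment, 0.5, seed=1)

if __name__ == "__main__":
    for taxon, seq in half.items():
        print(taxon, seq)
```

Repeating the draw with several different seeds and comparing the resulting trees would also show how sensitive the result is to which positions happen to be included.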

Although the CAT model is not available in the ML framework, similar alternatives exist for ML. Examples are the structural and empirical mixture models containing 2–6 matrices (instead of just one) implemented in PhyML (Le & Gascuel 2010; Le et al. 2012). Some of these models have already been used to study ancient phylogenetic relationships and have been shown to affect the results (Lasek-Nesselquist & Gogarten 2013). It is a pity that Ryan et al. did not explore these models.

The second main line of evidence Ryan et al. gave for the phylogenetic position of ctenophores was the gene content analyses. It appears that Mnemiopsis lacks many genes that are present in all other animals (including sponges) but not in outgroup species. Although the list of missing genes for ctenophores as a whole is somewhat smaller (the authors themselves found that a few genes missing in Mnemiopsis were in fact present in some other ctenophore species), it probably remains quite large, as all ctenophores appear to be rather closely related to each other (Podar et al. 2001). As ctenophores evolve fast and the genome of Mnemiopsis is compact and among the smallest in animals (Ryan et al. 2013), it seems likely that the missing genes have been lost secondarily. Two features of ctenophore reproductive biology might explain their fast evolution: inbreeding caused by self-fertilization (almost all ctenophores are hermaphrodites) and the capability for rapid and massive reproduction. This can lead to frequent massive die-offs that create genetic bottlenecks, which facilitate the accumulation of deleterious mutations (Pett et al. 2011). It is also evident from Ryan et al.'s ML phylogenetic analyses of gene content that nonsensical phylogenetic relationships can be produced: for example, Annelida was not monophyletic, because one annelid species grouped with a mollusk as the sister group to a cephalochordate.
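
The kind of gene list discussed here can be derived from a presence/absence table with simple set operations. The sketch below uses invented taxa and gene labels purely for illustration (none of it is data from Ryan et al.), and the function name is hypothetical. It finds the genes that are present in every other animal but absent from both the focal genome and the outgroup.

```python
def genes_missing_in_focal(presence, focal, other_animals, outgroups):
    """Genes found in all other animals but in neither the focal taxon
    nor any outgroup taxon."""
    shared_by_others = set.intersection(*(presence[t] for t in other_animals))
    in_outgroups = set().union(*(presence[t] for t in outgroups))
    return sorted(shared_by_others - presence[focal] - in_outgroups)


# Hypothetical presence/absence data: taxon -> set of gene families present.
presence = {
    "ctenophore": {"gene_1"},
    "sponge": {"gene_1", "gene_2", "gene_3"},
    "cnidarian": {"gene_1", "gene_2", "gene_3", "gene_4"},
    "bilaterian": {"gene_1", "gene_2", "gene_3"},
    "choanoflagellate": {"gene_1", "gene_3"},
}

missing = genes_missing_in_focal(
    presence,
    focal="ctenophore",
    other_animals=["sponge", "cnidarian", "bilaterian"],
    outgroups=["choanoflagellate"],
)

if __name__ == "__main__":
    print(missing)
```

In this toy table only gene_2 qualifies. Such a pattern is what one expects if the gene arose after ctenophores split from the other animals, but also if it arose earlier and was later lost in ctenophores. The table alone cannot tell the two scenarios apart, which is why fast evolution and genome compaction matter for interpreting it.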

In summary, Ryan et al.'s phylogenetic analyses aren't particularly convincing, and the claim that the phylogenetic position of ctenophores is now resolved is annoying. Before we rearrange the animal tree of life, let's wait for more thorough analyses. And more data wouldn't hurt either.

References

Bracken-Grissom H, Collins AG, Collins T, Crandall K, Distel D, Dunn C, Giribet G, Haddock S, Knowlton N, Martindale M, Medina M, Messing C, O’Brien SJ, Paulay G, Putnam N, Ravasi T, Rouse GW, Ryan JF, Schulze A, Wörheide G, Adamska M, Bailly X, Breinholt J, Browne WE, Diaz MC, Evans N, Flot J-F, Fogarty N, Johnston M, Kamel B, Kawahara AY, Laberge T, Lavrov D, Michonneau F, Moroz LL, Oakley T, Osborne K, Pomponi SA, Rhodes A, Santos SR, Satoh N, Thacker RW, Van de Peer Y, Voolstra CR, Welch DM, Winston J, Zhou X (2014) The Global Invertebrate Genomics Alliance (GIGA): developing community resources to study diverse invertebrate genomes. The Journal of heredity 105: 1–18. doi: 10.1093/jhered/est084
Dunn CW, Hejnol A, Matus DQ, Pang K, Browne WE, Smith SA, Seaver E, Rouse GW, Obst M, Edgecombe GD, Sørensen MV, Haddock SHD, Schmidt-Rhaesa A, Okusu A, Kristensen RM, Wheeler WC, Martindale MQ, Giribet G (2008) Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452: 745–749. doi: 10.1038/nature06614
Kohn AB, Citarella MR, Kocot KM, Bobkova YV, Halanych KM, Moroz LL (2012) Rapid evolution of the compact and unusual mitochondrial genome in the ctenophore, Pleurobrachia bachei. Molecular phylogenetics and evolution 63: 203–207. doi: 10.1016/j.ympev.2011.12.009
Lartillot N, Philippe H (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular biology and evolution 21: 1095–109. doi: 10.1093/molbev/msh112
Lartillot N, Brinkmann H, Philippe H (2007) Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC evolutionary biology 7 Suppl 1: S4. doi: 10.1186/1471-2148-7-S1-S4
Le SQ, Gascuel O (2010) Accounting for solvent accessibility and secondary structure in protein phylogenetics is clearly beneficial. Systematic biology 59: 277–87. doi: 10.1093/sysbio/syq002
Le SQ, Dang CC, Gascuel O (2012) Modeling protein evolution with several amino acid replacement matrices depending on site rates. Molecular biology and evolution 29: 2921–36. doi: 10.1093/molbev/mss112
Lasek-Nesselquist E, Gogarten JP (2013) The effects of model choice and mitigating bias on the ribosomal tree of life. Molecular phylogenetics and evolution 69: 17–38. doi: 10.1016/j.ympev.2013.05.006
Nosenko T, Schreiber F, Adamska M, Adamski M, Eitel M, Hammel J, Maldonado M, Müller WEG, Nickel M, Schierwater B, Vacelet J, Wiens M, Wörheide G (2013) Deep metazoan phylogeny: When different genes tell different stories. Molecular phylogenetics and evolution 67: 223–233. doi: 10.1016/j.ympev.2013.01.010
Pett W, Ryan JF, Pang K, Mullikin JC, Martindale MQ, Baxevanis AD, Lavrov DV (2011) Extreme mitochondrial evolution in the ctenophore Mnemiopsis leidyi: Insight from mtDNA and the nuclear genome. Mitochondrial DNA 22: 130–142. doi: 10.3109/19401736.2011.624611
Philippe H, Derelle R, Lopez P, Pick K, Borchiellini C, Boury-Esnault N, Vacelet J, Renard E, Houliston E, Quéinnec E, Da Silva C, Wincker P, Le Guyader H, Leys S, Jackson DJ, Schreiber F, Erpenbeck D, Morgenstern B, Wörheide G, Manuel M (2009) Phylogenomics revives traditional views on deep animal relationships. Current biology 19: 706–712. doi: 10.1016/j.cub.2009.02.052
Podar M, Haddock SH, Sogin ML, Harbison GR (2001) A molecular phylogenetic framework for the phylum Ctenophora using 18S rRNA genes. Molecular phylogenetics and evolution 21: 218–230. doi: 10.1006/mpev.2001.1036
Ryan JF, Pang K, Schnitzler CE, Nguyen A-D, Moreland RT, Simmons DK, Koch BJ, Francis WR, Havlak P, Smith SA, Putnam NH, Haddock SHD, Dunn CW, Wolfsberg TG, Mullikin JC, Martindale MQ, Baxevanis AD (2013) The genome of the ctenophore Mnemiopsis leidyi and its implications for cell type evolution. Science 342: 1242592. doi: 10.1126/science.1242592