The first
genome of Ctenophora has now been sequenced (Ryan et al. 2013),
specifically that of Mnemiopsis leidyi. Genomes of most of the 30 or so
animal phyla are still unavailable, but this might change in the coming years (Bracken-Grissom et al. 2014). I had great hopes of gaining new insights
into the phylogenetic position of ctenophores. Unfortunately, the Science paper was disappointing,
not so much because the phylogenetic position of Ctenophora remains unresolved,
but because the authors want to give the impression that it has been resolved. The abstract of
the paper is terribly misleading. Of course, it is more attractive to present
the results as if one of the long-standing questions in animal evolution
had been solved, rather than admitting that even after sequencing the whole
genome, the answer is still uncertain.
Ryan et al.
want to make the case that ctenophores are the sister group to all other
animals. This was first suggested in 2008 (Dunn et al.),
but it wasn't taken very seriously even by the authors, and a year later Philippe et al. (2009) showed that the likely cause
of such a strange placement of ctenophores was their fast molecular
evolution. Taking this into account, Philippe et al. found that ctenophores most likely belong to Eumetazoa (animals with true tissues,
muscles, and a nervous system) and that the sister group to all other animals is Porifera
(sponges, which lack muscles and a nervous system), as always suspected.
According to Philippe et al., ctenophores are most closely related to Cnidaria,
which they superficially resemble and with which they were traditionally classified
together as Coelenterata. The results of Philippe et al. can by no means be
considered final, however, and reliably deciphering the relationships between
the main lineages of animals (Porifera, Placozoa, Ctenophora, Cnidaria, and
Bilateria) is still difficult (Nosenko et al. 2013).
So what
evidence do Ryan et al. provide for such a surprising phylogenetic placement of ctenophores?
The two main lines of evidence are phylogenetic analyses of protein sequences
and of gene content (the presence or absence of genes in each genome).
Despite the
impression the authors are giving, their phylogenetic analyses are far from
conclusive. They used two methods, maximum likelihood (ML) and Bayesian
inference, to construct phylogenies based on protein sequences. These methods
gave different results, most likely not because of the methods themselves
but because of the different evolutionary models employed.
In the ML
framework, they used the standard GTR model, which is a site-homogeneous model.
This means that the probabilities describing different amino acid (or nucleotide)
replacements (collectively termed the replacement matrix)
do not vary along the sequence. Although the overall rate of replacement can
vary among sites when a gamma rate parameter is introduced into the model, the
relative probabilities of amino acid replacements remain the same. In reality,
however, different regions of proteins do not only evolve faster or slower, but
also qualitatively differently because of various constraints. This means that
depending on the position in the protein, only some types of amino acids tend
to be allowed (e.g. hydrophobic or aromatic ones). This makes it necessary
to consider different amino acid replacement probabilities for different
positions even when the rate of change is the same (models of this kind are
called site-heterogeneous). Fortunately, there is no need to assign every
position in the sequence its own replacement matrix (which would make the
analyses computationally intractable); instead, positions can be grouped into a smaller number of
categories. The Bayesian CAT model (Lartillot & Philippe 2004)
estimates from the data both the number of categories and the pattern of
amino acid replacements that best describes each category. As with the GTR model,
site-heterogeneous models can be combined with a gamma rate parameter to vary
the overall rate among sites, adding another layer of complexity (and
making the analysis computationally more demanding).
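To make the distinction concrete, here is a minimal Python sketch. It assumes a deliberately simplified Poisson-style replacement process, in which the probability of going from amino acid x to y in time t is e^(-t) if x = y, plus (1 - e^(-t)) times the equilibrium frequency of y, rather than a full GTR matrix. The three frequency profiles and all function names are made up for illustration; nothing here is taken from Ryan et al.'s analyses or from the PhyloBayes code.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def make_profile(favoured, weight=0.9):
    """Hypothetical site profile: `weight` of the probability mass is spread
    evenly over the favoured amino acids, the rest over all the others."""
    others = [a for a in AMINO_ACIDS if a not in favoured]
    profile = {a: weight / len(favoured) for a in favoured}
    profile.update({a: (1.0 - weight) / len(others) for a in others})
    return profile

def transition_prob(profile, x, y, t):
    """P(x -> y after time t) under the simplified Poisson-style process."""
    decay = math.exp(-t)
    stay = decay if x == y else 0.0
    return stay + (1.0 - decay) * profile[y]

# Site-homogeneous: one profile for every site. A gamma rate only rescales t,
# so the RELATIVE probabilities of different replacements never change.
uniform = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
for rate in (0.5, 2.0):
    ratio = (transition_prob(uniform, "L", "I", rate)
             / transition_prob(uniform, "L", "D", rate))
    print(f"homogeneous, rate {rate}: P(L->I) / P(L->D) = {ratio:.2f}")

# Site-heterogeneous: each site category has its own profile, so the same
# replacement can be likely in one category and very unlikely in another.
categories = {
    "hydrophobic core": make_profile("AILMFVW"),
    "charged surface": make_profile("DEKR"),
}
for name, profile in categories.items():
    p = transition_prob(profile, "L", "I", 1.0)
    print(f"{name}: P(L->I, t=1) = {p:.3f}")
```

In the homogeneous case the ratio is the same for both rates, which is the point made above: a gamma parameter changes how fast a site evolves, not which replacements it tolerates.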
The ML
analysis using the GTR+gamma model favored a tree in which ctenophores were the sister
group to all other animals. The Bayesian analyses with the CAT model favored either
a tree in which ctenophores were the sister group to Porifera (the 105 000-site dataset
with little missing data but small taxon sampling) or one in which they were positioned within
Eumetazoa (the 88 000-site dataset with a lot of missing data but large taxon
sampling).
The GTR
model is less realistic and clearly more prone to long-branch attraction
artefacts than CAT (Lartillot et al. 2007). Long-branch attraction causes fast-evolving
(long-branch) taxa to group together regardless of their phylogenetic
affinities, or pulls them towards distant outgroup taxa. As ctenophores appear
to be fast evolving, at least at the molecular level (Philippe et al. 2009; Pett et al. 2011; Kohn et al. 2012), it cannot be excluded that the position of
ctenophores in the ML analysis is a long-branch attraction artefact. Ryan
et al.'s results also show that ctenophores are among the faster-evolving
taxa in their dataset, but because the ctenophores were not extremely fast evolving, the
authors did not consider this a problem (it is unclear to me what gave them this
confidence).
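For readers unfamiliar with long-branch attraction, the following Python sketch reproduces the textbook four-taxon illustration. It is not Ryan et al.'s analysis, and it uses a two-state symmetric model with parsimony-style site patterns instead of ML on amino acids; the function names and the branch change probabilities (0.4 for long, 0.05 for short) are arbitrary assumptions. The true tree is ((A,B),(C,D)), with A and C on long branches.

```python
from itertools import product

def branch_prob(parent, child, p_change):
    """Probability of the child state given the parent state on one branch."""
    return p_change if parent != child else 1.0 - p_change

def pattern_prob(states, long_p, short_p):
    """Probability of tip states (A, B, C, D) on the true tree ((A,B),(C,D)).
    A and C hang on long branches; B, D and the internal branch are short."""
    a, b, c, d = states
    total = 0.0
    # Sum over the unobserved states of the two internal nodes u and v.
    for u, v in product((0, 1), repeat=2):
        total += (0.5
                  * branch_prob(u, a, long_p)
                  * branch_prob(u, b, short_p)
                  * branch_prob(u, v, short_p)
                  * branch_prob(v, c, long_p)
                  * branch_prob(v, d, short_p))
    return total

def pattern_support(long_p, short_p):
    """Expected frequency of the site patterns supporting each grouping.
    The factor 2 covers the mirror-image pattern (0 and 1 swapped)."""
    return {
        "AB|CD": 2 * pattern_prob((0, 0, 1, 1), long_p, short_p),  # true tree
        "AC|BD": 2 * pattern_prob((0, 1, 0, 1), long_p, short_p),  # long branches together
        "AD|BC": 2 * pattern_prob((0, 1, 1, 0), long_p, short_p),
    }

for label, long_p in (("all branches short", 0.05), ("A and C long", 0.4)):
    support = pattern_support(long_p, 0.05)
    best = max(support, key=support.get)
    print(label, {k: round(v, 4) for k, v in support.items()}, "-> favours", best)
```

With equal short branches the patterns supporting the true grouping dominate; once A and C are long, the patterns uniting them become the most frequent even though A and C are not sister taxa. An ML analysis under a model that underestimates convergent change can be misled in an analogous way.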
Unfortunately
their Bayesian analyses are not without problems either. The small-taxon
analyses (where a Mnemiopsis+sponge clade was sister to other animals) were
problematic precisely because of poor taxon sampling (15–19 taxa depending on
the outgroup size). A large number of taxa is required (Lartillot & Philippe 2004) to reliably estimate the parameters of the CAT model and to decide between ancestral
and derived character states. For the large-taxon datasets the problem appeared
to be the opposite: they were too big to give reliable results even after the
analyses had run for 200 days on average. This could perhaps have been solved by excluding
some of the taxa (especially among the well-sampled Bilateria) and analyzing
datasets containing, for example, a random 50% of the original positions (44 000
instead of 88 000). The PhyloBayes-MPI manual mentions that getting consistent results already becomes challenging beyond
20 000 positions.
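Drawing a random subset of alignment positions, as suggested above, is simple to do. The sketch below is one possible way, assuming the alignment is already held in memory as a dictionary of equal-length sequences; the function name, the toy alignment and the fixed seed are illustrative choices, and a real analysis would read and write alignment files with a dedicated library.

```python
import random

def subsample_columns(alignment, fraction, seed=0):
    """Return a new alignment keeping a random `fraction` of the columns.
    The same columns are kept for every taxon and their order is preserved."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    n_keep = round(n_sites * fraction)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    keep = sorted(rng.sample(range(n_sites), n_keep))
    return {taxon: "".join(seq[i] for i in keep)
            for taxon, seq in alignment.items()}

# Toy alignment with made-up sequences, only to show the call.
toy = {
    "Mnemiopsis": "MKV-LAGTRE",
    "Amphimedon": "MKVQLSGTRD",
    "Nematostella": "MRVQLAGSRE",
}
half = subsample_columns(toy, 0.5)
for taxon, seq in half.items():
    print(taxon, seq)
```

Repeating the draw with several different seeds would also show how sensitive the resulting tree is to the particular positions sampled.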
Although
the CAT model is not available in the ML framework, there are similar
alternatives for ML, for example the structural and empirical mixture models
containing 2–6 matrices (instead of just one) implemented in the PhyML programs (Le & Gascuel 2010; Le et al. 2012). Some of these models have already been
used in studying ancient phylogenetic relationships and shown to affect the
results (Lasek-Nesselquist & Gogarten 2013). It is a pity that Ryan et al. did not explore
these models.
The second
main line of evidence Ryan et al. gave for the phylogenetic position of ctenophores
was gene content analysis. It appears that Mnemiopsis
lacks many genes that are present in all other animals (including sponges) but
not in outgroup species. Although the list of genes missing from ctenophores
as a whole is somewhat shorter (the authors themselves found that a few genes
missing in Mnemiopsis were in fact
present in some other ctenophore species), it probably remains quite long, as
all ctenophores appear to be rather closely related to each other (Podar et al. 2001). As ctenophores evolve fast and the genome of Mnemiopsis is compact and among the smallest known in animals (Ryan et al. 2013), it seems likely that the missing genes have been lost
secondarily. Two features of ctenophore reproductive biology might explain their
fast evolution: inbreeding caused by self-fertilization (almost all ctenophores
are hermaphrodites) and the capacity for rapid and massive reproduction. This can
lead to frequent massive die-offs that create genetic bottlenecks, which
facilitate the accumulation of deleterious mutations (Pett et al. 2011). It is
also evident from Ryan et al.'s ML phylogenetic analyses of gene content that nonsensical phylogenetic relationships can be
produced: for example, Annelida was not monophyletic, because one annelid species
grouped with a mollusk as the sister group to a cephalochordate.
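The gene content comparison described above boils down to set operations on presence/absence data. The sketch below shows the idea on invented data: the taxon labels and gene identifiers are hypothetical, the function name is my own, and this is not the pipeline Ryan et al. used.

```python
def ingroup_genes_missing_from(focal, ingroup, outgroup, gene_content):
    """Genes present in every other ingroup taxon, absent from all outgroup
    taxa, and absent from the focal taxon. `gene_content` maps each taxon
    to the set of gene families found in its genome."""
    others = [t for t in ingroup if t != focal]
    shared_by_others = set.intersection(*(gene_content[t] for t in others))
    seen_in_outgroup = set().union(*(gene_content[t] for t in outgroup))
    return shared_by_others - seen_in_outgroup - gene_content[focal]

# Invented presence/absence data, for illustration only.
toy_content = {
    "ctenophore": {"g1", "g5"},
    "sponge": {"g1", "g2", "g3"},
    "cnidarian": {"g1", "g2", "g3", "g4"},
    "bilaterian": {"g1", "g2", "g3", "g4"},
    "choanoflagellate": {"g1", "g3"},
}
animals = ["ctenophore", "sponge", "cnidarian", "bilaterian"]
missing = ingroup_genes_missing_from(
    "ctenophore", animals, ["choanoflagellate"], toy_content)
print(sorted(missing))
```

A list produced this way says nothing about why a gene is absent: a gene that was never present and a gene that was lost secondarily look identical in a presence/absence matrix, which is exactly why fast-evolving, compact genomes make this kind of evidence hard to interpret.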
In summary, Ryan et al.'s phylogenetic analyses aren't particularly convincing. Claiming that
the phylogenetic position of ctenophores is now resolved is annoying. Before
we rearrange the animal tree of life, let's wait for more thorough analyses. And
more data wouldn't hurt either.
References
Bracken-Grissom H, Collins AG,
Collins T, Crandall K, Distel D, Dunn C, Giribet G, Haddock S, Knowlton N,
Martindale M, Medina M, Messing C, O’Brien SJ, Paulay G, Putnam N, Ravasi T,
Rouse GW, Ryan JF, Schulze A, Wörheide G, Adamska M, Bailly X, Breinholt J,
Browne WE, Diaz MC, Evans N, Flot J-F, Fogarty N, Johnston M, Kamel B, Kawahara
AY, Laberge T, Lavrov D, Michonneau F, Moroz LL, Oakley T, Osborne K, Pomponi
SA, Rhodes A, Santos SR, Satoh N, Thacker RW, Van de Peer Y, Voolstra CR, Welch
DM, Winston J, Zhou X (2014) The Global Invertebrate Genomics Alliance (GIGA):
developing community resources to study diverse invertebrate genomes. The
Journal of heredity 105: 1–18. doi: 10.1093/jhered/est084
Dunn CW, Hejnol A, Matus DQ, Pang K,
Browne WE, Smith SA, Seaver E, Rouse GW, Obst M, Edgecombe GD, Sørensen MV,
Haddock SHD, Schmidt-Rhaesa A, Okusu A, Kristensen RM, Wheeler WC, Martindale
MQ, Giribet G (2008) Broad phylogenomic sampling improves resolution of the
animal tree of life. Nature 452: 745–749. doi: 10.1038/nature06614
Kohn AB, Citarella MR, Kocot KM,
Bobkova YV, Halanych KM, Moroz LL (2012) Rapid evolution of the compact and
unusual mitochondrial genome in the ctenophore, Pleurobrachia bachei. Molecular
phylogenetics and evolution 63: 203–207. doi: 10.1016/j.ympev.2011.12.009
Lartillot N, Philippe H (2004) A
Bayesian mixture model for across-site heterogeneities in the amino-acid
replacement process. Molecular biology and evolution 21: 1095–109. doi: 10.1093/molbev/msh112
Lartillot N, Brinkmann H, Philippe H
(2007) Suppression of long-branch attraction artefacts in the animal phylogeny
using a site-heterogeneous model. BMC evolutionary biology 7 Suppl 1: S4. doi: 10.1186/1471-2148-7-S1-S4
Le SQ, Gascuel O (2010) Accounting
for solvent accessibility and secondary structure in protein phylogenetics is
clearly beneficial. Systematic biology 59: 277–87. doi: 10.1093/sysbio/syq002
Le SQ, Dang CC, Gascuel O (2012)
Modeling protein evolution with several amino acid replacement matrices
depending on site rates. Molecular biology and evolution 29: 2921–36. doi: 10.1093/molbev/mss112
Lasek-Nesselquist E, Gogarten JP
(2013) The effects of model choice and mitigating bias on the ribosomal tree of
life. Molecular phylogenetics and evolution 69: 17–38. doi: 10.1016/j.ympev.2013.05.006
Nosenko T, Schreiber F, Adamska M,
Adamski M, Eitel M, Hammel J, Maldonado M, Müller WEG, Nickel M, Schierwater B,
Vacelet J, Wiens M, Wörheide G (2013) Deep metazoan phylogeny: When different
genes tell different stories. Molecular phylogenetics and evolution 67:
223–233. doi: 10.1016/j.ympev.2013.01.010
Pett W, Ryan JF, Pang K, Mullikin
JC, Martindale MQ, Baxevanis AD, Lavrov DV (2011) Extreme mitochondrial
evolution in the ctenophore Mnemiopsis leidyi: Insight from mtDNA and the
nuclear genome. Mitochondrial DNA 22: 130–142. doi: 10.3109/19401736.2011.624611
Philippe H, Derelle R, Lopez P, Pick
K, Borchiellini C, Boury-Esnault N, Vacelet J, Renard E, Houliston E, Quéinnec
E, Da Silva C, Wincker P, Le Guyader H, Leys S, Jackson DJ, Schreiber F,
Erpenbeck D, Morgenstern B, Wörheide G, Manuel M (2009) Phylogenomics revives
traditional views on deep animal relationships. Current biology 19: 706–712.
doi: 10.1016/j.cub.2009.02.052
Podar M, Haddock SH, Sogin ML,
Harbison GR (2001) A molecular phylogenetic framework for the phylum Ctenophora
using 18S rRNA genes. Molecular phylogenetics and evolution 21: 218–230. doi: 10.1006/mpev.2001.1036
Ryan JF, Pang K, Schnitzler CE,
Nguyen A-D, Moreland RT, Simmons DK, Koch BJ, Francis WR, Havlak P, Smith SA,
Putnam NH, Haddock SHD, Dunn CW, Wolfsberg TG, Mullikin JC, Martindale MQ,
Baxevanis AD (2013) The genome of the ctenophore Mnemiopsis leidyi and its
implications for cell type evolution. Science 342: 1242592. doi: 10.1126/science.1242592