Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances.

ABSTRACT

The reconstruction of phylogenetic trees from DNA and protein sequences is confounded by unequal rate effects. These effects can group rapidly evolving taxa together, whether or not they are genealogically related. All algorithms are sensitive to these effects whenever the assumptions on which they are based are not met. The algorithm presented here, called paralinear distances, is valid for a much broader class of substitution processes than previous algorithms and is accordingly less affected by unequal rate effects. It may be used with all nucleic acid, protein, or other sequences, provided that their evolution may be modeled as a succession of Markov processes. The properties of the method have been proven both analytically and by computer simulations. Like all other methods, paralinear distances can fail when sequences are misaligned or when site-to-site variation in substitution rates is extensive. To examine the usefulness of paralinear distances, the "origin of the eukaryotes" has been investigated by analyzing elongation factor Tu sequences with a variety of sequence alignments. The order in which sequences are pairwise aligned was found to strongly determine the topology reconstructed by paralinear distances (as it does for all other reconstruction methods tested). When only the parts of the alignment that are unaffected by alignment order are analyzed, paralinear distances strongly select the eocyte topology. This provides evidence that the eocyte prokaryotes are the closest prokaryotic relatives of the eukaryotes.
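
The abstract does not state the distance formula itself. As a rough illustration, the sketch below uses the form in which the paralinear distance is commonly cited: d = -ln( det J / sqrt(det Dx * det Dy) ), where J is the matrix of joint state counts for two aligned sequences and Dx, Dy are diagonal matrices holding each sequence's state counts. That formula, the function names, the four-letter DNA alphabet, and the choice to skip sites with gaps or ambiguous characters are all assumptions made for this example and should be checked against the full paper; this is not the author's implementation.

```python
import math

ALPHABET = "ACGT"  # assumed DNA alphabet; a protein alphabet would work the same way


def divergence_matrix(seq_x, seq_y, alphabet=ALPHABET):
    """Count how often state i in seq_x is aligned with state j in seq_y."""
    index = {state: i for i, state in enumerate(alphabet)}
    n = len(alphabet)
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(seq_x, seq_y):
        # Assumption: sites with gaps or ambiguity codes are simply skipped.
        if a in index and b in index:
            counts[index[a]][index[b]] += 1.0
    return counts


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det  # a row swap flips the sign
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def paralinear_distance(seq_x, seq_y, alphabet=ALPHABET):
    """Sketch of d = -ln(det J / sqrt(det Dx * det Dy)) for two aligned sequences."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    j = divergence_matrix(seq_x, seq_y, alphabet)
    n = len(alphabet)
    # Determinants of the diagonal matrices are products of the state counts.
    det_dx = math.prod(sum(j[i]) for i in range(n))
    det_dy = math.prod(sum(j[i][k] for i in range(n)) for k in range(n))
    det_j = determinant(j)
    if det_j <= 0.0 or det_dx <= 0.0 or det_dy <= 0.0:
        raise ValueError("distance undefined for these sequences")
    return -math.log(det_j / math.sqrt(det_dx * det_dy))
```

Under these assumptions, two identical sequences that contain every state give a distance of zero, and the value is symmetric in the two sequences because a matrix and its transpose have the same determinant. A full analysis would compute this for every pair of taxa and pass the resulting matrix to a distance-based tree-building method.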
