Back-propagation and counter-propagation neural networks for phylogenetic classification of ribosomal RNA sequences.

Abstract

A neural network system has been developed for rapid and accurate classification of ribosomal RNA sequences according to phylogenetic relationship. The molecular sequences are encoded into neural input vectors using an n-gram hashing method. An SVD (singular value decomposition) method is used to compress the long, sparse n-gram input vectors into shorter ones. The neural networks are three-layered, feed-forward networks that employ supervised learning paradigms, including the back-propagation algorithm and a modified counter-propagation algorithm. A pedagogical pattern selection strategy is used to reduce training time. After being trained with ribosomal RNA sequences from the RDP (Ribosomal Database Project) database, the system can classify query sequences into more than one hundred phylogenetic classes with 100% accuracy, at a rate of less than 0.3 CPU seconds per sequence on a workstation. Compared with other sequence similarity search methods, including Similarity Rank, BLAST and FASTA, the neural network method achieves higher classification accuracy and runs about an order of magnitude faster. The software tool will be made available to the biology community, and the system may be extended into a gene identification system for classifying indiscriminately sequenced DNA fragments.
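The abstract does not specify the n-gram length, the alphabet handling, or the hashing function used to build the input vectors, so the following is only a minimal Python sketch under stated assumptions: each overlapping n-gram over A, C, G, U is hashed to a bin by reading it as a base-4 number, giving a vector of length 4^n that holds normalized n-gram frequencies. The function name `ngram_vector`, the default `n=3`, the T-to-U mapping and the skipping of ambiguity codes are hypothetical choices, not the authors' method.

```python
# Hypothetical n-gram hashing encoder (assumed scheme, not the paper's exact one).
ALPHABET = {"A": 0, "C": 1, "G": 2, "U": 3}  # assumption: RNA alphabet only


def ngram_index(ngram):
    """Hash an n-gram to a bin by reading it as a base-4 number."""
    idx = 0
    for base in ngram:
        idx = idx * 4 + ALPHABET[base]
    return idx


def ngram_vector(seq, n=3):
    """Return normalized overlapping n-gram frequencies as a 4**n vector."""
    seq = seq.upper().replace("T", "U")  # assumption: treat DNA T as RNA U
    vec = [0.0] * (4 ** n)
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        # Skip n-grams containing ambiguity codes such as N (assumed policy).
        if all(b in ALPHABET for b in gram):
            vec[ngram_index(gram)] += 1.0
    total = sum(vec)
    if total:
        vec = [v / total for v in vec]
    return vec
```

For example, `ngram_vector("ACGU", n=2)` yields a 16-element vector with weight 1/3 in each of the bins for AC, CG and GU, and zeros elsewhere. For larger n most of the 4^n bins stay empty, which is why the vectors are long and sparse.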
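The abstract names SVD but not how it is applied to the n-gram vectors. One common approach, assumed here, is truncated SVD: stack the training n-gram vectors as rows of a matrix X, find its top k right singular vectors, and represent each sequence by its k projections onto them. To stay dependency-free, this minimal sketch swaps in power iteration on X^T X with orthogonal deflation instead of a full SVD routine; in practice a LAPACK-backed routine such as `numpy.linalg.svd` would be the usual choice. The names `top_right_singular_vectors` and `project`, and the parameters `k`, `iters` and `tol`, are illustrative assumptions.

```python
import random


def gram_matvec(X, v):
    """Compute (X^T X) v without forming X^T X explicitly."""
    Xv = [sum(xi * vi for xi, vi in zip(row, v)) for row in X]
    out = [0.0] * len(v)
    for row, s in zip(X, Xv):
        for j in range(len(v)):
            out[j] += row[j] * s
    return out


def _orthogonalize(w, basis):
    """Remove components of w along already-found unit vectors; return (w, norm)."""
    for b in basis:
        dot = sum(wi * bi for wi, bi in zip(w, b))
        w = [wi - dot * bi for wi, bi in zip(w, b)]
    return w, sum(wi * wi for wi in w) ** 0.5


def top_right_singular_vectors(X, k, iters=200, seed=0, tol=1e-12):
    """Approximate the top-k right singular vectors of X by deflated power iteration.

    Returns fewer than k vectors if X has rank < k.
    """
    rng = random.Random(seed)
    d = len(X[0])
    basis = []
    for _ in range(k):
        w, norm = _orthogonalize([rng.gauss(0.0, 1.0) for _ in range(d)], basis)
        if norm < tol:
            break
        v = [wi / norm for wi in w]
        for _ in range(iters):
            w, norm = _orthogonalize(gram_matvec(X, v), basis)
            if norm < tol:
                return basis  # remaining directions carry no variance
            v = [wi / norm for wi in w]
        basis.append(v)
    return basis


def project(x, basis):
    """Compress vector x to its coordinates along the singular vectors."""
    return [sum(xi * bi for xi, bi in zip(x, b)) for b in basis]
```

The choice of k trades input size against retained information; the abstract does not state the value used.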
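The back-propagation network described above is a three-layer (input, hidden, output) feed-forward network trained with supervised learning. This minimal Python sketch assumes sigmoid units, a squared-error loss, online (per-pattern) weight updates and one output unit per phylogenetic class; the abstract gives none of these details, nor the layer sizes or the learning rate, and the class name `BackPropNet` and its methods are hypothetical. The modified counter-propagation variant and the pedagogical pattern selection strategy are not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BackPropNet:
    """Hypothetical three-layer feed-forward net trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row carries a trailing bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]  # input plus bias
        h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]  # hidden activations plus bias
        y = [sigmoid(sum(w * hi for w, hi in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target, lr=0.5):
        """One online update; returns the squared error before the update."""
        xb, hb, y = self.forward(x)
        # Output deltas for squared error with sigmoid units.
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Hidden deltas: propagate output deltas back through w2 (bias column excluded).
        d_hid = [
            hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
            for j in range(len(hb) - 1)
        ]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]
        return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))

    def classify(self, x):
        """Return the index of the most active output unit (the predicted class)."""
        y = self.forward(x)[2]
        return max(range(len(y)), key=y.__getitem__)


# Toy usage: two input vectors standing in for encoded sequences, two classes.
data = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
net = BackPropNet(n_in=2, n_hidden=4, n_out=2)
for _ in range(2000):
    for x, target in data:
        net.train_step(x, target)
print([net.classify(x) for x, _ in data])
```

In the real system the inputs would be the compressed n-gram vectors and the outputs would span the more than one hundred RDP classes; one-hot targets with an arg-max decision are a common convention for this kind of classifier, assumed here.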
