Predição de RNAs não-codificadores no transcriptoma do fungo Paracoccidioides brasiliensis usando aprendizagem de máquina

AUTOR(ES)
DATA DE PUBLICAÇÃO

2008

RESUMO

Paracoccidioides brasiliensis (Pb) is a saprophytic and dimorphic fungus of clinical importance because its propagules, when inhaled by humans, cause the disease known as paracoccidioidomycosis. In the year 2005 the Pb transcriptome was published, pointing out several potential drug targets, but still a significative amount of sequenced transcripts lack identified homologous proteins. This work suggests that these RNAs may be non-coding RNAs (ncRNAs), a class of biologically functional molecules that do not code for any protein product. Aiming this, a strictly computational approach was made, using known examples of mRNAs and ncRNAs for training two machine learning algorithms: naive Bayes (nB) and Support Vector Machines (SVM). Several programs available from literature and locally developed were used to obtain properties from transcripts and its corresponding protein products, in such a way that machine learning algorithms could successfully discriminate between mRNA and ncRNA. Several efficiency measurements show that both algorithms, SVM and nB, induced classifiers able to efficiently discriminate the two classes of RNAs, and also indicate that SVM has a significative advantage regarding ncRNA detection. Mean accuracy as estimated by 10-fold cross-validation procedure was 92.4% for SVM and 75.3% for nB. When used in the Pb transcriptome, SVM and nB detect, respectively, 970 and 262 ncRNAs, of which the majority is composed of singlets and unnanotated transcripts, two characteristics that support the possibility that these transcripts are real ncRNAs. Comparison to related works indicates that the described program offers a computational speed improvement without hindering accuracy. This work describes the design of a computational program for ab initio analysis, named PORTRAIT, specialized in detection of ncRNAs in transcriptomes from poorly characterized organisms.

ASSUNTO(S)

rnas não-codificadores máquinas de vetores de suporte transcriptoma paracoccidoides brasiliensis machine learning non-coding rnas aprendizagem de máquina transcriptome support vector machines paracoccidoides brasiliensis biologia molecular

Documentos Relacionados