Identificação e Análise de Sequências Codificantes com Atributos Conflitantes em Genomas Procariotos / Analysis and Identification of Prokaryotic Coding Sequences With Conflicting Attributes

AUTHOR(S)
PUBLICATION DATE

2010

ABSTRACT

The advent of new sequencing technologies and the development of computational tools that facilitate genome analysis have driven the exponential growth of genome databases. New in silico approaches in comparative genomics rely on these data. Nevertheless, recent work on the genome of Escherichia coli indicates that the coding sequences (CDSs) of annotated genomes contain several errors that need to be verified (Ochman and Davalos 2006). A correct description of each CDS is therefore important for future genomic comparisons. The biological database community has recently proposed standards for the submission of draft genome sequences in the new era of sequencing. Within this context, the identification and/or correction of frameshifts during the assembly of genomic sequences is highlighted. The goal of this work was to develop a tool with two comparative methods to identify CDSs with conflicting attributes. Here, a conflict denotes attributes such as frameshifts, large insertions or deletions, and truncations that are detected relative to one CDS or several CDSs used as references, depending on the model. The proposed tool also lets the user view the results graphically and provides access to other tools, supporting faster and more user-friendly genomic analyses. As a case study, CDSs with conflicting attributes were analyzed in the genome of E. coli strain CFT073 (NCBI), version AE014075.1 (last updated April 20, 2006), using as a reference the genome of E. coli strain O157:H7 EDL933, version AE005174.2 (last updated June 6, 2008). This analysis identified and stored 1865 CDSs (including possible paralogs), keeping only alignments whose coverage exceeds 30% of the reference CDS.
In a more detailed analysis of these results, 144 CDSs stand out in the target genome because they probably contain frameshifts, 21 of which occur in intergenic regions. The tool developed in this work was also applied to a case study of a genomic region of the bacterium Klebsiella pneumoniae strain KP13. The genome of this bacterium was sequenced at the Computational Genomics Unit (UGC) Darcy Fontoura de Almeida, LNCC (unpublished data). The analysis of these genomes shows the importance of using the tool in the identification, verification, and correction of annotation errors, and thus the need to include it in sequencing projects that aim for high standards in the submission of genomic data.
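
The abstract does not spell out how coverage or frameshift candidates are computed, so the following is only a minimal Python sketch of one plausible reading. It assumes that coverage is the fraction of the reference CDS spanned by the union of its alignment segments, and that a frameshift candidate is a reference whose segments fall in more than one reading frame on the same strand of the target. The `Hit` record, the `coverage` and `classify` functions, and the label strings are hypothetical names introduced for illustration; only the 30% threshold comes from the text.

```python
from dataclasses import dataclass

MIN_COVERAGE = 0.30  # threshold quoted in the abstract


@dataclass
class Hit:
    """One alignment segment of a reference CDS on the target genome (hypothetical record)."""
    ref_id: str
    ref_len: int       # length of the reference CDS
    ref_start: int     # 1-based, inclusive, on the reference
    ref_end: int       # 1-based, inclusive, on the reference
    target_frame: int  # reading frame on the target: 1..3 or -1..-3


def coverage(hits):
    """Fraction of the reference covered by the union of all segments (assumed definition)."""
    intervals = sorted((h.ref_start, h.ref_end) for h in hits)
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:
            # Overlapping or adjacent segment: extend the current interval.
            cur_end = max(cur_end, end)
        else:
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    covered += cur_end - cur_start + 1
    return covered / hits[0].ref_len


def classify(hits):
    """Label the segments of one reference CDS (labels are illustrative, not the tool's own)."""
    if not hits:
        return "no_hit"
    if coverage(hits) <= MIN_COVERAGE:
        return "below_threshold"
    frames = {h.target_frame for h in hits}
    same_strand = all(f > 0 for f in frames) or all(f < 0 for f in frames)
    if len(frames) > 1 and same_strand:
        # Assumed heuristic: a frame change on one strand suggests a frameshift.
        return "possible_frameshift"
    return "candidate"
```

For example, two overlapping segments of a 300-residue reference in frames 1 and 2 would be labeled `possible_frameshift`, while a single segment covering 20% of the reference would fall below the threshold. A real pipeline would also have to handle large insertions, deletions, and truncations, which this sketch does not attempt.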

SUBJECT(S)

bioinformatics, comparative genomics, coding sequences, MVC, conflicting attributes, computability and models of computation
