Development of a database for classification and analysis of type IV secretion systems / Desenvolvimento de um banco de dados para classificação e análise de sistemas de secreção do tipo IV bacteriano

AUTHOR(S)
PUBLICATION DATE

2008

ABSTRACT

The type IV secretion system (T4SS) is a large family of macromolecule transporters divided into three recognized sub-families involved in different bacterial functions. The major sub-family of T4SS is the conjugation system, which allows the transfer of genetic material as a nucleoprotein via cell contact among bacteria. Analogously to bacterial conjugation, the T4SS can transfer genetic material from bacteria to eukaryotic cells; such is the case of T-DNA transfer from Agrobacterium tumefaciens to host plant cells. The effector protein transport system constitutes the second sub-family and is indispensable for the infection processes of several mammalian and plant pathogens. The third sub-family corresponds to the DNA uptake/release system involved in genetic transformation competence, independently of cell contact, as described for the VirB/D4 system of Campylobacter jejuni and the ComB system of Helicobacter pylori. Several essential features of T4SS are well known, but the knowledge needed for a straightforward classification or proper protein annotation of the system subunits remains confusing, which in some cases can prevent inferences about the evolution of the system in bacterial species. The purpose of this work was to organize, classify and integrate the knowledge about T4SS by building a database devoted to this bacterial secretion system. The T4SS database was created using the MySQL database management system and the Perl programming language, with a web interface (HTML/CGI) that gives access to the database. Currently, this database holds genomic data from 43 bacteria and 10 plasmids acquired from NCBI GenBank. These organisms comprise groups ranging from Actinobacteria to Gram-negative Proteobacteria, including symbiotic and pathogenic bacteria. By applying the Bidirectional Best-Hits method, it was possible to obtain a core set of 75 clusters comprising 974 proteins involved in the T4SS. The BlastP, MUSCLE and ClustalW algorithms were also applied during this procedure.
The database was manually annotated, supported by cross-references built into the T4SS annotation pages, such as UniProtKB/Swiss-Prot, COG, InterPro and TCDB, as well as by methods for predicting signal peptides and transmembrane regions. All T4SS protein records, distributed across 75 ortholog clusters, were organized into five different classes of type IV secretion system proteins: (i) Type IVA Mpf/T4CP; (ii) Type IVA Dtr; (iii) F-type plasmid; (iv) IncP-1-type plasmid; (v) Type IVB Icm/Dot. All 974 proteins were annotated into 68 well-known families, which can be involved in conjugation, effector translocation or DNA uptake/release, or can even be bifunctional proteins. Using the Maximum Likelihood method, 70 unrooted phylogenetic trees were built, representing 70 of the 75 clusters, because five clusters contained only two protein sequences. In addition, five unrooted phylogenetic trees were built for the groups of the first hierarchical classification, one unrooted tree including proteins from the archetype systems of all groups, one unrooted tree from the 16S sequences of the organisms, and one rooted tree including a sequence from a Gram-positive bacterium as an outgroup. The phylogenetic analyses show that some T4SS proteins are more divergent than others, which indicates that few sequence mutations were needed for some functions, whereas other proteins required many sequence mutations to acquire new functions. These results show that proteins belonging to the same cluster can have different functions: conjugation, DNA uptake/release or effector translocation. It was also possible to verify that proteins with similar functions grouped together within the phylogenetic trees, which allowed a probable function to be assigned to some uncharacterized proteins, possibly because sequence similarity reflects a similar evolutionary path toward the same function.
Thus, the phylogenetic trees made it possible to confirm the protein annotation and to infer whether uncharacterized proteins have a known function. The T4SS database will be open access and will offer users sequence search and submission tools, providing insight into the classification and phylogeny of T4SS sequences of interest. The T4SS Database is accessible at http://www.t4ss.lncc.br.
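
The Bidirectional Best-Hits step mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's pipeline (the work reports using Perl and BlastP), and the function names, protein identifiers and scores below are hypothetical, invented only for illustration. It assumes each comparison has already been reduced to (query, subject, score) tuples, as could be derived from tabular BlastP output. Two proteins form a pair only when each is the other's top-scoring hit.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    hits: iterable of (query, subject, score) tuples (hypothetical format).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)   # genome A proteins searched against genome B
    reverse = best_hits(b_vs_a)   # genome B proteins searched against genome A
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data: identifiers and scores are made up for illustration only.
a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 90.0),
          ("A2", "B2", 180.0), ("A3", "B1", 120.0)]
b_vs_a = [("B1", "A1", 245.0), ("B1", "A3", 118.0),
          ("B2", "A2", 175.0), ("B3", "A3", 60.0)]

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, A3's best hit is B1, but B1's best hit is A1, so A3 is left unpaired. Grouping such pairs across many genomes into ortholog clusters is a further step that this sketch does not cover.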

SUBJECT(S)

computability and models of computation; genomics; classification of type IV secretion systems; database; bioinformatics
