Genomic structure of three megabases of Eucalyptus (shotgun) genomic DNA: nucleotide content, repetitive sequences and genes

AUTHOR(S)
PUBLICATION DATE

2004

ABSTRACT

In this work we sought to obtain an overview of the structure and composition of the Eucalyptus genome by sample sequencing 10,000 genomic DNA fragments from an E. grandis shotgun genomic library, representing 3.0 Mbp of the E. grandis genome. Reads were filtered by quality and length (Phred quality ≥ 20; length ≥ 150 bases) and analyzed for nucleotide content, repetitive patterns, repetitive elements and gene content. The program RepeatMasker was used to analyze %GC content and repetitive patterns and elements. The results indicate that the Eucalyptus genome has an average GC content of 40.15%. Approximately 1.4% of all sequenced bases were located in transposons, distributed among 310 interspersed repetitive genetic elements, of which 299 were classified as retroelements, mainly LTR elements. We also identified 986 microsatellites and 1,636 low-complexity sequences. In total, 5.8% of the sequenced bases were located in repetitive sequences.

We used an alternative approach to identify putative genes, comparing the genomic sequences with a Eucalyptus EST database using the GenESTate software. Putative functions were assigned using a pipeline in which the exons of each gene were joined and compared with protein domain databases. This procedure avoids the misleading results that can arise when DNA sequences are compared directly with sequences deposited in GenBank. The sequences were clustered with the CAP3 software, yielding 766 contigs and 5,428 singletons; the contigs had an average length of 1,200 bp. These 766 contigs were compared with more than 5,000 E. grandis ESTs from mature leaf tissue and 6,000 E. urophylla ESTs from xylem, and 44 of them showed high similarity to ESTs. The coding portion accounted for around 2% of the total sequence.
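The read-filtering and GC-content steps described above can be illustrated with a minimal sketch. It is not the pipeline used in this work (GC content was reported by RepeatMasker), and it assumes one particular reading of the filter: a read is kept if it has at least 150 bases with Phred quality ≥ 20. The names `Read`, `passes_filter` and `gc_percent` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical thresholds mirroring "Phred >= 20; length >= 150".
# Assumption: a read is kept when it has at least MIN_LEN bases of quality >= MIN_PHRED.
MIN_PHRED = 20
MIN_LEN = 150


@dataclass
class Read:
    name: str
    seq: str
    qual: List[int]  # one Phred score per base


def passes_filter(read: Read, min_phred: int = MIN_PHRED, min_len: int = MIN_LEN) -> bool:
    """Return True if the read has at least `min_len` bases with Phred >= `min_phred`."""
    high_quality = sum(1 for q in read.qual if q >= min_phred)
    return high_quality >= min_len


def gc_percent(seqs: Iterable[str]) -> float:
    """Pooled GC content (%) over A/C/G/T bases; N and other symbols are ignored."""
    gc = at = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


if __name__ == "__main__":
    reads = [
        Read("good", "ACGT" * 50, [30] * 200),   # 200 bases at Q30: kept
        Read("short", "AT" * 50, [30] * 100),    # only 100 bases: dropped
        Read("noisy", "GGCC" * 50, [10] * 200),  # every base below Q20: dropped
    ]
    kept = [r for r in reads if passes_filter(r)]
    print([r.name for r in kept])           # ['good']
    print(gc_percent(r.seq for r in kept))  # 50.0
```

GC is pooled over all kept bases rather than averaged per read, so long reads weigh more; either convention is plausible, and the abstract does not say which one underlies the 40.15% figure.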
It is important to highlight that this approach made it possible to identify introns and exons, as well as core promoter regions, which cannot be identified in ESTs. Another 166 possible genes were identified among the genomic sequences using blastx against the NCBI nr database. We also identified putative genes encoding 16 tRNAs using the tRNAscan-SE software. These sequences are being used in the Genolyptus Project to develop novel, randomly distributed microsatellite markers and to identify promoter regions, and will be used to help design overgo probes for anchoring the genetic map to the physical map.
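To make the notion of a microsatellite (simple sequence repeat) concrete for marker development, the following is a minimal sketch that scans a sequence for perfect tandem repeats of 1 to 6 bp motifs. It is not RepeatMasker's method, and the minimum repeat counts in `MIN_REPEATS` are hypothetical values, not the criteria behind the 986 microsatellites reported above.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical minimum copy numbers per motif length (1 = mono, ..., 6 = hexanucleotide).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif == motif[:d] * (n // d):
            return False
    return True


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, int, str, int]]:
    """Return sorted (start, end, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        # A k-mer followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip non-primitive motifs such as 'TT', already reported as 'T'.
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


if __name__ == "__main__":
    print(find_ssrs("GG" + "AC" * 8 + "TT"))  # [(2, 18, 'AC', 8)]
    print(find_ssrs("C" + "T" * 12 + "G"))    # [(1, 13, 'T', 12)]
```

Only perfect repeats are reported; imperfect or compound microsatellites, which marker-development pipelines usually also consider, would need an approximate-matching tool.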

SUBJECT(S)

genomes, nucleotide sequence, eucalyptus
