SCORE: A computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data

AUTHOR(S)
SOURCE

The National Academy of Sciences

ABSTRACT

A large fraction of the information content of metazoan genomes resides in the transcriptional and posttranscriptional cis-regulatory elements that collectively provide the blueprint for using the protein-coding capacity of the DNA, thus guiding the development and physiology of the entire organism. As successive whole-genome sequencing projects (including those of mice and humans) are completed, we gain full access to the regulatory genome of yet another species. But our ability to decipher the cis-regulatory code, and hence to link genes into regulatory networks on a global scale, is currently very limited. Here we describe SCORE (Site Clustering Over Random Expectation), a computational method for identifying transcriptional cis-regulatory modules based on the fact that they often contain, in statistically improbable concentrations, multiple binding sites for the same transcription factor. We have carried out a Drosophila genome-wide inventory of predicted binding sites for the Notch-regulated transcription factor Suppressor of Hairless [Su(H)] and found that the fly genome contains highly nonrandom clusterings of Su(H) sites over a broad range of sequence intervals. We found that the most statistically significant clusters are very heavily enriched in both known and logical targets of Su(H) binding and regulation. The utility of the SCORE approach was validated by in vivo experiments showing that proper expression of the novel gene Him in adult muscle precursor cells depends on both Su(H) gene activity and sequences that include a previously unstudied cluster of four Su(H) sites, indicating that Him is a likely direct target of Su(H).
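The core idea of site clustering over random expectation can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the statistics, so the null model here (shuffling the sequence and re-counting) is an assumption, as are the function names, the window size, and the motif `TGGGAA`, which is a placeholder rather than the Su(H) consensus used in the study.

```python
import random
import re


def find_sites(seq, motif_regex):
    """Return start positions of all (possibly overlapping) motif matches."""
    pattern = re.compile(f"(?=({motif_regex}))")
    return [m.start() for m in pattern.finditer(seq)]


def max_cluster(positions, window):
    """Largest number of site starts that fit within any span of `window` bp.

    `positions` must be sorted in ascending order.
    """
    best, left = 0, 0
    for right in range(len(positions)):
        # Shrink the span from the left until it fits inside the window.
        while positions[right] - positions[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


def shuffle_seq(seq, rng):
    """Shuffle bases, preserving composition (assumed null model)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def cluster_p_value(seq, motif_regex, window, trials=200, seed=0):
    """Empirical p-value for the densest observed cluster of sites.

    Counts how often a shuffled sequence yields a cluster at least as
    dense as the observed one; the +1 terms avoid reporting p = 0.
    """
    rng = random.Random(seed)
    observed = max_cluster(find_sites(seq, motif_regex), window)
    hits = sum(
        max_cluster(find_sites(shuffle_seq(seq, rng), motif_regex), window)
        >= observed
        for _ in range(trials)
    )
    return observed, (hits + 1) / (trials + 1)


# Toy sequence: four placeholder sites packed into a short interval,
# flanked by background that contains none.
background = "ACGT" * 200
toy_seq = background + "TGGGAAC" * 4 + background
observed, p = cluster_p_value(toy_seq, "TGGGAA", window=50)
print(observed, p)
```

A genome-scale analysis would need a more realistic background model and a correction for the number of windows tested; the sketch only shows how an observed cluster can be compared against random expectation.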
