A simple method for determining the target genes of transcription factors
By timothy auyeung, Section Biology Posted on Mon May 3rd, 2010 at 12:12:19 AM PST
In response to changes in physiological conditions, cells switch the expression of specific groups of genes on and off to initiate specific biological processes. The expression of these genes is controlled by the binding of transcription factors to short DNA segments called transcription factor binding sites (TFBS), located in the cis-regulatory regions of the genes (Stormo 2000). The bound transcription factors subsequently recruit RNA polymerase and other transcriptional machinery to synthesize gene transcripts. Together, these molecular players constitute the gene regulatory network that controls the expression of each gene in the cell. A comprehensive understanding of this network would advance many fields, including cancer and pathology research, because the network can be used to identify novel drug targets.
Currently, one of the preferred experimental strategies for comprehensively probing the binding sites of a given transcription factor in vivo and on a genome-wide scale is to perform chromatin immunoprecipitation (ChIP) on the transcription factor and then characterize the bound sequences by direct sequencing (ChIP-seq). These sequences are subsequently used to construct transcription factor binding site profiles, with which the locations of the binding sites in the genome are determined. However, determining the target genes of the transcription factor remains a challenge. At present, target genes are predicted primarily from the distance between the transcriptional start site of each gene and the transcription factor binding sites: the gene closest to a given binding site is considered its target. We wish to propose a simple method that uses a series of filtering steps to improve target gene predictions.
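The nearest-gene rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `nearest_gene`, the gene names and the coordinates are all hypothetical, and the sketch handles a single chromosome and ignores strand.

```python
from bisect import bisect_left

def nearest_gene(site_pos, tss_list):
    """Return the gene whose transcriptional start site (TSS) is closest.

    tss_list is a list of (tss_position, gene_name) tuples for one
    chromosome, sorted by position. Ties go to the upstream (left) gene.
    """
    positions = [pos for pos, _ in tss_list]
    i = bisect_left(positions, site_pos)
    # Only the TSS just left and just right of the site can be the nearest.
    candidates = tss_list[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t[0] - site_pos))[1]

# Hypothetical TSS coordinates and binding-site positions.
tss = [(1000, "geneA"), (5000, "geneB"), (9000, "geneC")]
sites = [1200, 4100, 8000]
assignments = {s: nearest_gene(s, tss) for s in sites}
print(assignments)
```

Every binding site receives exactly one target gene under this rule, whether or not that gene responds to the transcription factor, which is the limitation the filtering steps are meant to address.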
The ever-accumulating data from genome-wide expression experiments (e.g. microarrays and RNA-seq) are a valuable resource that the method can harness.
Since each of these experiments investigates the expression patterns of every gene in the genome under a few physiological conditions of interest (e.g. gene knockout vs. wild type), one can combine the results of many such experiments side by side to determine the expression patterns of the genes under a wide variety of conditions (Obayashi et al. 2008). By comparing the expression pattern of the transcription factor of interest with those of the other genes in the genome, one can gain confidence about which genes are likely to be regulated by that transcription factor. Specifically, if the transcription factor activates target gene transcription, one would expect a positive correlation between the transcription factor's expression profile and the target genes' expression profiles (i.e. if the expression level of the transcription factor goes up, so will that of its target genes). Conversely, one would expect a negative correlation if the transcription factor represses target gene transcription. This step requires prior knowledge of whether the transcription factor functions as an activator or a repressor; when this is not known, both hypotheses need to be tested.
Correlation alone does not establish direct regulation, since co-expressed genes may simply share an upstream regulator, so the set of candidate target genes whose expression profiles correlate with that of the transcription factor needs further filtering. The second step investigates the concordance between the known biological function of the transcription factor and those of the candidate target genes. This may be accomplished with a simple approach that compares the Gene Ontology (GO) annotations of the transcription factor with those of the candidate target genes. Specifically, one can determine which major biological processes the target genes may be involved in by finding the GO terms that are attached to most of the target genes (such terms are denoted over-represented GO terms).
By finding which target genes carry over-represented GO terms that overlap with the GO terms of the transcription factor, an improved set of candidate target genes can likely be obtained.
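The two filtering steps can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names, the correlation cutoff of 0.8, the 50% threshold standing in for "most of the target genes", and all gene names, expression values and GO labels are assumptions made for the example. Pearson correlation is used as one possible measure of profile similarity; a formal enrichment test against the whole genome (for example a hypergeometric test) is a common alternative to the simple fraction used here.

```python
from collections import Counter
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def correlated_genes(tf_profile, profiles, activator=True, cutoff=0.8):
    """Step 1: keep genes that track (activator) or oppose (repressor) the TF."""
    sign = 1 if activator else -1
    return [g for g, p in profiles.items()
            if sign * pearson(tf_profile, p) >= cutoff]

def go_filtered(candidates, gene_go, tf_go, min_fraction=0.5):
    """Step 2: keep candidates with a frequent GO term shared with the TF."""
    counts = Counter(t for g in candidates for t in gene_go.get(g, set()))
    frequent = {t for t, c in counts.items()
                if c / len(candidates) >= min_fraction}
    shared = frequent & tf_go
    return [g for g in candidates if gene_go.get(g, set()) & shared]

# Hypothetical expression profiles across four conditions.
tf_profile = [1.0, 2.0, 3.0, 4.0]
profiles = {
    "geneA": [2.1, 3.9, 6.2, 8.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.1, 2.0, 2.9, 4.2],
    "geneD": [3.0, 1.0, 4.0, 2.0],
    "geneE": [2.0, 4.0, 6.0, 8.0],
}
# Placeholder GO labels, not real GO identifiers.
gene_go = {"geneA": {"GO:X", "GO:Y"}, "geneC": {"GO:X"}, "geneE": {"GO:Z"}}
tf_go = {"GO:X", "GO:W"}

step1 = correlated_genes(tf_profile, profiles, activator=True)
step2 = go_filtered(step1, gene_go, tf_go)
print(step1, step2)
```

When it is not known whether the transcription factor is an activator or a repressor, the first step can be run once with `activator=True` and once with `activator=False`, and each resulting candidate set passed through the second step.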
In summary, the proposed method provides a straightforward and feasible way to screen for candidate target genes whose expression signatures and biological annotations agree with those of the transcription factor of interest.
References
Stormo G. 2000. DNA binding sites: representation and discovery. Bioinformatics 16(1): 16-23.
Obayashi T, Hayashi S, Shibaoka M, Saeki M, Ohta H, Kinoshita K. 2008. COXPRESdb: a database of coexpressed gene networks in mammals. Nucleic Acids Res. 36: D77-D82.