
Scanning the human genome with artificial combinatorial transcription factors

2019.5.20

Scanning the human genome with combinatorial transcription factor libraries

Published online: 18 February 2003, doi:10.1038/nbt794
March 2003, Volume 21, Number 3, pp. 269–274

Pilar Blancafort, Laurent Magnenat & Carlos F. Barbas III.
 

Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037.
Correspondence should be addressed to C F Barbas. e-mail: carlos@scripps.edu


Despite the critical importance of transcription factors in mediating gene regulation, there exists no general, genome-wide tool that uses transcription factors to induce or silence a target gene or select for a particular phenotype. In the strategy described here, we prepared large combinatorial libraries of artificial transcription factors comprising three or six zinc-finger domains, and selected transcription factor–DNA interactions able to upregulate several genes in human cells. Selected transcription factors either induced the expression of an endothelial-specific differentiation marker, VE-cadherin, in non-endothelial cell lines or, when combined with a repression domain, knocked down expression. Potential binding sites for a number of these transcription factors were mapped along the promoter of CDH5, the gene encoding VE-cadherin. Transcription factor libraries represent a useful approach for studying and modulating gene function in cells and potentially in whole organisms.

Regulatory sequences and their attendant transcription factors provide the spatial-temporal cues that direct when, where, and to what extent a given gene is expressed. Most regulatory sequences contain binding sites for repertoires of transcription factors that mediate activation or repression of target genes. Considerable efforts have been devoted to engineering artificial sequence-specific transcription factors able to regulate specific genes, particularly therapeutic targets1. Other approaches to studying gene function, such as RNA interference, ribozymes, or antisense RNA, yield only knock-down phenotypes2, 3. Transcription factor–based tools, by contrast, can generate both loss-of-function phenotypes (when the transcription factor is linked to a repressor domain) and gain-of-function phenotypes (when it is linked to an activator domain). Nevertheless, no general genome-wide transcription-factor tools have been described4.

Current transcription factor–based strategies involve the individualized design and testing of transcription factors targeted to particular genes. Modular zinc-finger DNA-recognition domains allow the assembly of transcription factors with predictable in vitro specificity. Such 'de novo' design has been used successfully to regulate a small number of genes (including ERBB2, ERBB3, VEGF, and EPO; refs. 5-8). However, rational design has not always yielded functional regulators in vivo. The main reason is that knowledge of the regulatory regions is often very limited6, 7, as is knowledge of the endogenous factors that affect transcription factor–DNA interactions (such as chromatin structure, accessibility of the regulatory region, DNA modifications, and the presence of other cellular or tissue-specific factors). In the combinatorial strategy described here, we created large libraries of artificial transcription factors. We then used them to select in vivo protein-DNA interactions that confer a desired phenotype or molecular function on human cells through the activation of one or more genomic loci.

Results and discussion

Construction of zinc-finger libraries.

We created libraries of zinc-finger transcription factors (TFZFs) for the recognition of DNA target sites of 9 and 18 base pairs (bp). Zinc-finger domains have exquisite sequence specificity and modularity1. Previous studies have identified alpha-helical sequences in the zinc-finger domain that confer specific recognition of 3 bp of DNA sequence, and have shown that these domains can be recombined to prepare polydactyl zinc-finger proteins of desired specificity5-13. Use of characterized zinc-finger domains allowed the prediction of potential DNA binding sites for each TFZF after the functional screen or selection was done.


We created the 3ZF library by combinatorial assembly of three different zinc-finger repertoires (ZF1, ZF2, and ZF3). Each repertoire consisted of an equimolar mixture of a subset of defined zinc-finger DNA sequences. Each sequence encoded a characteristic alpha-helical element previously optimized to provide specific recognition of 3 bp of DNA (Fig. 1A). For ZF1, ZF2, and ZF3 we combined a variety of available specific zinc-finger recognition helices: all the helices recognizing DNA triplets of type GNN, and a subset of those recognizing ANN and TNN triplets8, 10, 12. This yielded a 9,177-member, 9 bp–targeting 3ZF library. The 3ZF library was then used as a template to assemble the 18 bp–targeting 6ZF library (8.4 × 10⁷ members). TFZFs were linked to a potent transcriptional activation domain (VP64)10. We expected the 3ZF library to recognize a subset of genomic DNA sequences of type 5'-(NNN)3-3', and the 6ZF library to recognize a subset of genomic sequences of type 5'-(NNN)6-3'. Given the zinc-finger domains used, both libraries were more likely to recognize (RNN)x-type sequences (R = G or A). In theory, the human genome contains 750 million (RNN)3 sites (considering both strands) and 93.75 million (RNN)6 sites14. Any (RNN)3-binding TFZF might be expected to bind many sites in the genome. In the living cell, however, many of these sites would be inaccessible or located in regions with no impact on regulation. (RNN)6-binding TFZFs, by contrast, can bind unique sites in the genome.
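The site counts above can be reproduced with a few lines of arithmetic. The sketch assumes a haploid genome of 3 × 10⁹ bp with independent, equally frequent bases. That genome size is our assumption; the text does not state the figure it used. The sketch also checks that the quoted 6ZF library size is consistent with pairing every 3ZF member with every other: 9,177² ≈ 8.4 × 10⁷.

```python
# Back-of-envelope checks for the library sizes and genome-site counts quoted above.
# Assumption (not stated in the text): a 3e9 bp haploid genome scanned on both strands,
# with independent, equiprobable bases. Edge effects at chromosome ends are ignored.

GENOME_BP = 3_000_000_000  # assumed haploid genome size
STRANDS = 2                # "considering both strands"
P_PURINE = 0.5             # P(R) = P(G or A) with equal base frequencies


def expected_rnn_sites(n_triplets: int) -> float:
    """Expected (RNN)n sites: only the first base of each triplet is constrained."""
    return GENOME_BP * STRANDS * P_PURINE ** n_triplets


def expected_specific_site(length_bp: int) -> float:
    """Expected chance occurrences of one fully specified site of the given length."""
    return GENOME_BP * STRANDS / 4 ** length_bp


def six_finger_diversity(three_finger_size: int) -> int:
    """Theoretical 6ZF diversity if every 3ZF unit is paired with every other."""
    return three_finger_size ** 2


print(f"(RNN)3 sites: {expected_rnn_sites(3):,.0f}")             # 750,000,000
print(f"(RNN)6 sites: {expected_rnn_sites(6):,.0f}")             # 93,750,000
print(f"6ZF diversity: {six_finger_diversity(9177):.2e}")        # 8.42e+07
print(f"specific 9-bp site: {expected_specific_site(9):,.0f}")   # 22,888
print(f"specific 18-bp site: {expected_specific_site(18):.3f}")  # 0.087
```

The last two lines show why an 18-bp site can be unique. Under these assumptions, a specific 18-bp sequence is expected fewer than once by chance, whereas a specific 9-bp sequence is expected tens of thousands of times.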

Figure 1: The TFZF library design.
(A) The TFZF library construction based on the modular organization of protein-DNA contacts. (B) Screening for functional TFZF activators in A431 cells. (C–F) Flow cytometric analysis of A431 cells infected with some of the selected pMX-TFZF pools from the 3ZF selections (C, D) or 6ZF selections (E, F). The panels show upregulation of ERBB-2 (C), VE-cadherin (D, E), and ICAM-1 (F). Blue, A431 cells infected with the selected pMX-TFZF pools and stained with the corresponding antibody; orange, A431 cells infected with the 3ZF or 6ZF unselected libraries; green, mock-infected cells; stippled line, control staining without primary antibody.

Screening for upregulation of target genes in human cell lines.

We delivered millions of TFZFs into the human squamous carcinoma cell line A431 using a retroviral vector, pMX-IRES-GFP15 (Fig. 1B). Infection efficiency and expression of individual library members were tracked with the green fluorescent protein (GFP) marker. Cells overexpressing a target gene product on the cell surface were selected by flow cytometry. The DNA encoding the zinc-finger domains was recovered by PCR and re-cloned into the retroviral vector for subsequent rounds of selection. Finally, individual TFZF clones were isolated and sequenced, and their functional properties were analyzed in vivo and in vitro.

A431 cells infected with the 3ZF and 6ZF libraries were screened with monoclonal antibodies against ten different markers:
- vascular endothelial cadherin (also known as VE-cadherin; cadherin-5 type 2, CDH5; CD144);
- 3-FAL selectin ligand (fucosyltransferase-4, FUT4; CD15);
- Apo1-FAS antigen (tumor necrosis factor receptor superfamily member 6, TNFRSF6; CD95);
- integrin-alpha6 (ITGA6; CD49f) and integrin-beta4 (ITGB4; CD104);
- the adhesion molecules CD54 (intercellular adhesion molecule 1, ICAM-1) and leukocyte function-associated antigen 3 (LFA-3; CD58);
- the receptors erythroblastic leukemia viral oncogene homolog-2 (ERBB2), ERBB-3, and epidermal growth factor (EGF).

Independent selections were carried out for each marker (Fig. 1B). These markers were chosen for two reasons. They localize on the cell surface, which facilitates cell sorting, and they are involved in important aspects of tumor biology such as cell proliferation, adhesion, and migration.
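The selection loop described above enriches rare activating clones geometrically. Each round consists of infecting cells, sorting those that overexpress the target marker, recovering the zinc-finger DNA by PCR, and re-cloning. The toy model below is an illustration only. The gate capture probabilities and the starting frequency are hypothetical. The model also ignores PCR bias, clone-to-clone expression differences, and sampling noise.

```python
# Toy model of clone enrichment over rounds of FACS-based selection.
# All parameter values are hypothetical illustrations, not measurements from this study.


def enrich(active_fraction: float, capture_active: float, capture_background: float) -> float:
    """Fraction of activating clones among cells recovered from one sort.

    capture_active: probability that a cell carrying an activating TFZF falls in the sort gate.
    capture_background: probability that a cell carrying an inactive TFZF falls in the gate anyway.
    """
    kept_active = active_fraction * capture_active
    kept_background = (1.0 - active_fraction) * capture_background
    return kept_active / (kept_active + kept_background)


def run_selection(start: float, rounds: int, capture_active: float,
                  capture_background: float) -> list[float]:
    """Active-clone fraction before selection and after each round."""
    fractions = [start]
    for _ in range(rounds):
        fractions.append(enrich(fractions[-1], capture_active, capture_background))
    return fractions


# Hypothetical: 1 activating clone per 10^5, a gate catching 50% of activated cells and 0.5% of others.
history = run_selection(1e-5, rounds=4, capture_active=0.5, capture_background=0.005)
for i, fraction in enumerate(history):
    print(f"round {i}: active fraction = {fraction:.3g}")
```

While a clone is still rare, each round with these made-up settings enriches it roughly 100-fold, the ratio of the two capture probabilities. This suggests how a few rounds of sorting can bring a rare activator to dominance.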

After three rounds of selection with the 3ZF library and four rounds with the 6ZF library, pools of infected A431 cells were analyzed by flow cytometry. For both libraries, five cell surface markers showed changes in expression levels (Fig. 1 and Supplementary Fig. 1 online): ERBB-2 and VE-cadherin were the most highly regulated by the 3ZF library (Fig. 1C,D), and VE-cadherin and ICAM-1 were the most highly regulated by the 6ZF library (Fig. 1E,F). The remaining three markers showed only small changes in gene expression with both 3ZF and 6ZF selections. Both the 3ZF (Fig. 1D) and 6ZF (Fig. 1E) libraries induced expression of the strictly endothelial-specific marker VE-cadherin. This marker is not significantly expressed in A431 cells, as determined by FACS and RT-PCR. VE-cadherin is a transmembrane glycoprotein that self-associates in the adherens junctions of endothelial cells, controlling the permeability of the endothelium16. In addition, VE-cadherin is necessary for vascular morphogenesis17 and is involved in several aspects of angiogenesis18, tumor growth, and metastasis19, 20. We focused further studies on the characterization of TFZFs activating its associated gene, CDH5.

In vitro and in vivo analysis of TFZFs regulating CDH5.

The sequences of the TFZFs regulating CDH5 and their predicted binding sites are presented in Table 1. We tested 48 3ZF and 36 6ZF clones. A number of these were identical at the nucleotide level, indicating selective pressure for particular clones from the libraries. Some TFZFs induced strong CDH5 expression, for example the 6ZF clone 144-13 and the 3ZF clones VE-1 and VE-8. To test the specificity of these TFZFs for CDH5, we delivered each TFZF into A431 cells and probed the cells with antibodies specific for the ten cell surface markers. The TFZF clones VE-1, VE-5, VE-8, VE-13, 144-4, 144-5, and 144-13 preferentially activated CDH5 compared with the other genes tested (Fig. 2 and Supplementary Fig. 2 online). The 3ZF clone VE-1 was the most specific TFZF regulator in vivo, as determined by FACS (Fig. 2A). Because a 9-bp site can occur many times in the genome, 3ZF proteins may bind multiple sites and activate more than one gene to varying degrees. Depending on the application, this could be a limitation or an advantage.

Table 1: 6ZF (top) and 3ZF (bottom) clones activating VE-cadherin

To verify that the selected TFZFs bound their predicted DNA substrates in vitro, we expressed the zinc-finger binding domains as C-terminal fusions with bacterial maltose-binding protein (MBP). We then tested the DNA-binding specificity of each fusion protein by ELISA against a panel of DNA substrates (Fig. 2B,C). The predicted DNA binding site of each TFZF was decoded from the alpha-helical sequences of its zinc fingers (Table 1). As expected, most TFZFs specifically bound their predicted target site in vitro. Notably, some of the alpha-helices selected in TFZFs VE-1, VE-5, and VE-8 were identical or very similar (Table 1), explaining their similar binding-site preferences (Fig. 2C). VE-1 and VE-8 shared two identical alpha-helices that interact with the subsequence 5'-GGGGAA-3'. As a result, both proteins recognized the VE-1 predicted target site. These binding-site preferences, in particular the strong recognition of the same target site by both VE-1 and VE-8, raise the possibility that these TFZFs were selected to bind partially overlapping genomic sites.
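The decoding step can be sketched as a table lookup followed by a reversal. Zinc fingers bind antiparallel to the DNA, so ZF1 (the N-terminal finger) reads the 3'-most triplet. This matches the text's later assignment of nucleotide 9 to ZF1 and nucleotide 1 to ZF3. The helix labels and the helix-to-triplet table are hypothetical placeholders, and neither example protein is VE-1 or VE-8. In practice the table would come from the characterized recognition helices cited above.

```python
# Sketch: predict a zinc-finger protein's DNA site from its recognition helices.
# HELIX_TO_TRIPLET uses hypothetical helix labels standing in for real helix sequences.
# Within each finger, helix position -1 contacts the 3' base, +3 the middle base,
# and +6 the 5' base of the triplet; here the whole triplet is simply looked up.

HELIX_TO_TRIPLET = {  # hypothetical helix label -> recognized triplet, written 5'->3'
    "helix_GAA": "GAA",
    "helix_GGG": "GGG",
    "helix_TAG": "TAG",
    "helix_GCC": "GCC",
}


def predicted_site(helices: list[str]) -> str:
    """Predicted 5'->3' site for fingers listed N- to C-terminal (ZF1 first)."""
    triplets = [HELIX_TO_TRIPLET[h] for h in helices]
    # Antiparallel binding: ZF1 reads the 3' triplet, the last finger reads the 5' triplet.
    return "".join(reversed(triplets))


def shared_3prime_segment(a: list[str], b: list[str]) -> str:
    """Site segment read by the identical fingers two proteins share, counting from ZF1."""
    n_shared = 0
    for helix_a, helix_b in zip(a, b):
        if helix_a != helix_b:
            break
        n_shared += 1
    return predicted_site(a[:n_shared])


protein_x = ["helix_GAA", "helix_GGG", "helix_TAG"]  # hypothetical 3ZF protein
protein_y = ["helix_GAA", "helix_GGG", "helix_GCC"]  # hypothetical 3ZF protein
print(predicted_site(protein_x))                      # TAGGGGGAA
print(predicted_site(protein_y))                      # GCCGGGGAA
print(shared_3prime_segment(protein_x, protein_y))    # GGGGAA
```

Two proteins that share their ZF1 and ZF2 helices agree on six of the nine predicted bases. The text describes this situation for VE-1 and VE-8 and the subsequence 5'-GGGGAA-3', which helps explain why each can recognize the other's target.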

Figure 2: Specificity of isolated TFZF clones in vivo and in vitro.
(A) A431 cells were infected with different pMX-TFZF constructs (containing the VP64 activator domain), stained with ten different antibodies, and analyzed by flow cytometry. Blue, A431 cells infected with pMX-TFZF VE-1 (a single clone selected for VE-cadherin activation); orange, A431 cells infected with the 3ZF unselected library; green, mock-infected cells; stippled line, control staining without primary antibody. Genes encode: CD58, leukocyte function-associated antigen 3 (LFA-3); CDH5, VE-cadherin (CD144); EGF, epidermal growth factor; FUT4, 3-FAL selectin ligand (CD15); ICAM1, intercellular adhesion molecule 1 (CD54); ERBB2, ERBB3, erythroblastic leukemia viral oncogene homolog-2 and -3; ITGA6, integrin-alpha6 (CD49f); ITGB4, integrin-beta4 (CD104); TNFRSF6, Apo1-FAS antigen (CD95). (B, C) DNA-binding ELISA of the selected 6ZF (B) and 3ZF (C) protein domains expressed as fusions with MBP. All TFZFs were selected for VE-cadherin upregulation except 54.3, which was selected for ICAM-1 activation. The DNA substrates contained the 18 bp or 9 bp predicted binding site for each 6ZF or 3ZF protein, respectively (Table 1).

To verify that the selected TFZFs regulated CDH5 at the level of transcription, we used RT-PCR to measure CDH5 mRNA levels in A431 cells infected with clones 144-4, 144-13, and VE-1. Human umbilical vein endothelial cells (HUVEC), which express CDH5, served as a positive control. A specific CDH5 product was detected in A431 cells infected with the TFZF constructs, showing that these clones upregulate CDH5 at the level of transcription (Fig. 3A,B).

Figure 3: Semiquantitative RT-PCR analysis of A431 cells infected with several pMX-TFZF selected for CDH5 activation.
(A) RT-PCR analysis of CDH5 expression in these infected cells (clones 144-4, 144-13, and VE-1). HUVEC cells, which express CDH5, were used as a positive control. A431, mock-infected cells; –, control experiment in absence of cDNA. (B) Relative CDH5 mRNA levels were normalized to TFZF expression using VP64-specific primers. Equal loading was controlled using GAPDH-specific primers.
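The sketch below shows one reasonable reading of the normalization described in the Fig. 3 legend. CDH5 band intensity is divided by VP64 band intensity so that clones expressed at different levels can be compared, and GAPDH serves only as a loading check. The function name, the 20% loading tolerance, and all band intensities are hypothetical. Mock-infected cells cannot be normalized this way, because they carry no VP64 transcript.

```python
# Sketch of semiquantitative RT-PCR normalization (one reading of the Fig. 3 legend).
# Band intensities are hypothetical densitometry values in arbitrary units.


def relative_cdh5(samples: dict[str, dict[str, float]], reference: str,
                  gapdh_tolerance: float = 0.2) -> dict[str, float]:
    """CDH5/VP64 ratio for each sample, expressed relative to the reference sample.

    GAPDH serves only as a loading check: a sample whose GAPDH signal deviates from
    the mean by more than `gapdh_tolerance` (as a fraction of the mean) raises ValueError.
    """
    gapdh_mean = sum(s["GAPDH"] for s in samples.values()) / len(samples)
    for name, s in samples.items():
        if abs(s["GAPDH"] - gapdh_mean) / gapdh_mean > gapdh_tolerance:
            raise ValueError(f"unequal loading in sample {name!r}")
    ratios = {name: s["CDH5"] / s["VP64"] for name, s in samples.items()}
    return {name: ratio / ratios[reference] for name, ratio in ratios.items()}


bands = {  # hypothetical values for two TFZF-infected samples
    "clone_1": {"CDH5": 40.0, "VP64": 80.0, "GAPDH": 100.0},
    "clone_2": {"CDH5": 90.0, "VP64": 60.0, "GAPDH": 105.0},
}
print(relative_cdh5(bands, reference="clone_1"))  # {'clone_1': 1.0, 'clone_2': 3.0}
```

Dividing by VP64 plausibly separates how strongly a TFZF activates CDH5 from how much TFZF the cells produced, since retroviral expression can vary between infections.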

Next, we investigated whether or not the TFZFs were able to directly activate the proximal human CDH5 promoter. In mice, a promoter fragment (-2486 to +24) is sufficient to drive endothelial-specific expression of a reporter gene in transgenic animals21. We cloned a homologous region of the human CDH5 promoter upstream of a luciferase reporter and carried out transactivation studies using TFZFs in transient transfection assays of A431 cells (Fig. 4). Only TFZFs VE-1, VE-5, and VE-8 strongly activated the CDH5 promoter (up to 200-fold; Fig. 4A and Supplementary Fig. 3 online). We mapped the VE-1, VE-5, and VE-8 response elements in the CDH5 promoter using serial deletions of the promoter. Important transactivation determinants of VE-1 were located between positions -2369 and -1861, whereas VE-5 and VE-8 responded significantly to elements located between nucleotides -1861 and -1342 (Fig. 4A). In addition, both VE-1 and VE-8 (but not VE-5) activated the proximal (-403 to +80) fragment of the CDH5 promoter 10–15-fold.
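The deletion-mapping logic can be sketched as follows. Fold activation is the reporter signal with the TFZF divided by the signal with an empty vector. An interval is flagged when removing it lowers fold activation by at least a chosen factor. The construct endpoints echo positions named in the text. The fold values and the 2-fold threshold are invented for illustration and are not the measurements in Fig. 4A.

```python
# Sketch: flag promoter intervals whose removal lowers reporter activation.
# Fold values below are hypothetical; only the endpoint positions come from the text.


def fold_activation(with_tfzf: float, empty_vector: float) -> float:
    """Luciferase signal with the TFZF relative to the empty-vector control."""
    return with_tfzf / empty_vector


def locate_drops(series: list[tuple[int, float]],
                 min_loss: float = 2.0) -> list[tuple[int, int, float]]:
    """Return (upstream_end, next_end, loss) for each deleted interval costing >= min_loss-fold.

    `series` lists (5' endpoint, fold activation) from the longest to the shortest construct,
    so consecutive entries differ by the interval between their 5' endpoints.
    """
    hits = []
    for (end_a, fold_a), (end_b, fold_b) in zip(series, series[1:]):
        loss = fold_a / fold_b
        if loss >= min_loss:
            hits.append((end_a, end_b, loss))
    return hits


deletions = [(-2369, 200.0), (-1861, 20.0), (-1342, 15.0), (-403, 12.0)]  # hypothetical folds
print(locate_drops(deletions))  # [(-2369, -1861, 10.0)]
```

A real analysis would need replicate measurements and error estimates before calling an interval important. A single ratio threshold is the simplest possible stand-in.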

Figure 4: Interactions of TFZFs VE-1, VE-5, and VE-8 with the CDH5 promoter.
(A) Luciferase transactivation assay of VE-1, VE-5, and VE-8 with several 5' deletions of the CDH5 promoter in A431 cells. (B) DNA-binding ELISA of several promoter fragments with the TFZFs VE-1, VE-5, and VE-8 purified as fusions with MBP. Promoter fragments (boxes) were amplified by PCR using 5'-biotinylated primers. The binding of each fragment was normalized and expressed as a percentage of the highest value. Binding data are represented as a color gradient (darker boxes indicate higher binding). (C) DNA-binding ELISA of VE-1, VE-5, and VE-8 proteins with the DNA duplex pr-88 and with the mutant pr-88(G4→T4). (D) Luciferase transactivation assay of VE-1, VE-5, and VE-8 with the proximal -88 CDH5 promoter fragment and the same fragment containing a point mutation (G4→T4). (E) Summary of putative interactions between VE-1 and CDH5 promoter fragments. Open boxes, potential binding sites for VE-1 as determined in vitro; underlining, putative EBS. The sequence of the -88 bp proximal human CDH5 promoter and the G→T point mutation introduced for transactivation studies are indicated. (F) Interaction of VE-1 with several potential binding sites located in the CDH5 promoter. The Kd (± s.d.) of VE-1 for its predicted DNA substrate (VE-1 subs) was determined by gel shift assay. Kd values for VE-1 with promoter DNA duplexes containing potential VE-1 binding sites (the 9 bp putative interacting sequence plus three flanking base pairs) were determined by ELISA and normalized to VE-1 subs. The positions of the potential binding sites relative to the transcription start site are indicated. Nucleotides that differ from the theoretical VE-1 binding site (VE-1 subs) are indicated in red. (G) DNA sequences selected in vitro from a randomized DNA library (N10) for interaction with VE-1 by CAST assay. The number of sequences containing an identical VE-1 binding site is indicated. Open box, invariant nucleotides (consensus).

Promoter regions associated with luciferase activation correlated with TFZF binding in vitro (Fig. 4B). We localized a putative VE-1 and VE-8 binding site between positions -88 and +80 of the proximal CDH5 promoter (the pr-88 duplex, 5'-CAGGGGGAA-3', in which the guanine at position 4 is designated G4). This site matched 8 of 9 bp of the predicted VE-1 binding site (Fig. 4E). Indeed, both VE-1 and VE-8 interacted specifically with this duplex in vitro (Fig. 4C), and a single point mutation (G4→T4) completely disrupted these interactions. A promoter fragment (-88 to +80) containing this sequence retained VE-1- and VE-8-mediated transactivation, whereas the fragment bearing the point mutation was unresponsive to the transcription factors (Fig. 4D). The sequence 5'-GGAA-3' (ETS-binding site 2, EBS2) is conserved between the mouse and human promoters. In the mouse it interacts with ETS-1, a transcription factor of the ETS family expressed in endothelial cells during blood vessel formation22-24. In the mouse proximal promoter, Ets-1 binds to two neighboring GGAA sites (EBS2 and EBS4) and activates CDH5 in endothelial cells23. Our experiments showed that VE-1 and VE-8 activate the proximal (-88 to +80) human CDH5 promoter fragment 10–15-fold through interaction with a single EBS. Interaction with distal sequences located between positions -2369 and -1342 enhanced activation of the reporter up to 200-fold. In mice, the proximal CDH5 promoter (-139 to +24) drives ubiquitous transcription, but upstream sequences are necessary to silence this basal activity in non-endothelial cells21. The TFZFs might therefore interfere with silencing of the CDH5 promoter between positions -2369 and -1342, resulting in enhanced transactivation. Examination of the CDH5 promoter revealed several potential VE-1, VE-5, and VE-8 binding sites in that region.
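Finding a near-match such as the pr-88 duplex (8 of 9 bp) amounts to scanning both strands for windows within one mismatch of the predicted site. In the sketch below, the query 5'-TAGGGGGAA-3' is hypothetical and is not the published VE-1 target. The promoter string is a made-up sequence that embeds 5'-CAGGGGGAA-3'. Treating every position equally is a simplification, because the CAST results for VE-1 indicate that the flanking positions tolerate variation better than the core.

```python
# Sketch: scan both strands of a sequence for windows within k mismatches of a site.
# The query site and the test sequence are hypothetical, for illustration only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan(seq: str, site: str, max_mismatches: int = 1) -> list[tuple[int, str, str, int]]:
    """Return (offset, strand, matched 5'->3' sequence, mismatches) for each hit.

    `offset` is the 0-based start of the window in `seq`; for minus-strand hits the
    matched sequence is the window's reverse complement.
    """
    width = len(site)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        for strand, candidate in (("+", window), ("-", revcomp(window))):
            mismatches = hamming(candidate, site)
            if mismatches <= max_mismatches:
                hits.append((i, strand, candidate, mismatches))
    return hits


promoter = "ATCTCAGGGGGAATCGATC"  # made-up sequence embedding 5'-CAGGGGGAA-3'
print(scan(promoter, "TAGGGGGAA"))  # [(4, '+', 'CAGGGGGAA', 1)]
```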

Next, we focused on TFZF VE-1 to study zinc finger–binding determinants along this distal promoter region. To determine the binding-site preferences of VE-1, we carried out in vitro DNA selection experiments (cyclic amplification of selected targets (CAST) assay) using a randomized 10 bp DNA library and purified VE-1 protein. After four rounds of DNA selection, all the analyzed selected targets contained a 7 bp invariant consensus core, 5'-AGGGGGA-3' (Fig. 4G). Positions 1 and 9, flanking this core, tolerated nucleotide variation. This is consistent with the zinc-finger contacts. Nucleotide 1 is the partner of Thr+6 in the alpha-helical region of VE-1 ZF3, and as in the case of Zif268 (ref. 25), Thr+6 is not expected to make specific hydrogen bonds. It therefore cannot unambiguously discriminate its target nucleotide. Nucleotide 9 is a target of Gln–1 in the alpha-helix of ZF1. In this particular zinc finger, Gln–1 prefers A at the 3' position of the triplet but can also tolerate T, C, or G, as reported for the same GAA-binding zinc finger of a Zif268 variant12. Figure 4F shows an alignment of the potential VE-1 binding sites found in the distal CDH5 promoter between positions -2369 and -1342. In vitro binding data showed that three DNA sequences in this region (duplexes -2303, -1990, and -1591) interacted with VE-1 with affinities similar to those of the predicted VE-1 substrate and the -88 duplex. In agreement with the CAST data, these duplexes share an identical core but differ at positions 1 and 9. As expected, every mutation in the conserved core decreased the affinity of VE-1 for its target DNA duplex. Overall, these data suggest that TFZF VE-1 may activate the promoter directly by binding multiple sites in both the proximal and distal regions.
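Reading an invariant core off CAST-selected sequences, as in Fig. 4G, reduces to checking which alignment columns contain a single base. The four aligned sequences below are invented to mimic the pattern described in the text: variable positions 1 and 9 around an invariant 5'-AGGGGGA-3'. They are not the sequences reported in the figure, and the function names are hypothetical.

```python
# Sketch: find the longest invariant run in aligned CAST-selected sequences.
# The aligned sequences are invented illustrations, not the data in Fig. 4G.

from collections import Counter


def column_profile(aligned: list[str]) -> list[Counter]:
    """Base counts for each alignment column."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [Counter(s[i] for s in aligned) for i in range(length)]


def invariant_core(aligned: list[str]) -> tuple[int, int, str]:
    """Return (start, end, sequence) of the longest run of single-base columns (1-based, inclusive)."""
    profile = column_profile(aligned)
    best = (0, -1)  # 0-based (start, end); empty run
    run_start = None
    for i, counts in enumerate(profile + [Counter()]):  # empty sentinel closes a trailing run
        if len(counts) == 1:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best[1] - best[0] + 1:
                best = (run_start, i - 1)
            run_start = None
    start, end = best
    return start + 1, end + 1, aligned[0][start:end + 1]


selected = ["CAGGGGGAA", "TAGGGGGAT", "GAGGGGGAC", "AAGGGGGAG"]  # invented examples
print(invariant_core(selected))  # (2, 8, 'AGGGGGA')
```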

Many TFZFs activated CDH5 in cancer cell lines in which the gene product was not significantly expressed (as determined by FACS), such as A431 (squamous carcinoma), HeLa, MDA-MB-435s (breast cancer), and HT29 (colon cancer) cells (Fig. 5). Notably, some regulators also affected CDH5 in cell lines where the gene is well expressed, such as melanoma C8161 (Fig. 5B) or SKBR-3 cells (Fig. 5C). They activated CDH5 expression when linked to the VP64 activator domain and repressed it when linked to the KRAB repression domain5. In melanoma C8161 cells, CDH5 expression has been associated with the formation of vascular-like networks in three-dimensional collagen gels26. The selected TFZFs could therefore be useful tools for studying the role of CDH5 in angiogenesis, tumor progression, and metastasis in these cancer cell lines.

