Analysis of core promoter elements: in-vivo functionality and contribution to enhancer-promoter specificity

(Hebrew title: Analysis of core promoter motifs: in-vivo activity and contribution to enhancer-promoter specificity)

Anna Sloutskin

Ph.D. proposal

Supervised by Dr. Tamar Juven-Gershon

Tevet 5775/ December 2014 Ramat Gan, Israel

1 INTRODUCTION
1.1 TRANSCRIPTION INITIATION AND THE CORE PROMOTER
1.2 CORE PROMOTER ELEMENTS
1.3 DISTAL REGULATORY ELEMENTS
1.4 DPE-DEPENDENT REGULATION OF MESODERMAL GENES
1.5 THE DPE MOTIF IS NECESSARY FOR TRANSCRIPTIONAL REGULATION IN VIVO
1.6 EVOLUTIONARY CONSERVATION OF THE DPE ELEMENT

2 RESEARCH IMPORTANCE AND AIMS:
2.1 TO EXAMINE THE CHARACTERISTICS OF 3D INTERACTION PROFILES OF DIFFERENT CLASSES OF PROMOTERS
2.1.1 To characterize the 3D interactions profile of Mesodermal promoters containing different core-promoter composition.
2.1.2 To characterize the 3D interactions profile of Caudal-responsive promoters under differential regulation of Caudal expression.
2.2 TO EXAMINE THE ROLE OF DPE-INR PROMOTERS WITHIN A DEVELOPMENTAL NETWORK IN TRANSCRIPTIONAL REGULATION IN-VIVO
2.2.1 To evaluate the contribution of the DPE motif to the transcriptional output of Dorsal target genes using in-vivo assays.
2.2.2 To evaluate the transcriptional output following the in-vivo flipping of core-promoter types.

3 PRELIMINARY RESULTS AND METHODS
3.1 THE MESODERMAL CONTEXT OF DOWNSTREAM CORE PROMOTER ELEMENT (DPE)
3.1.1 In-vivo contribution of the DPE to the transcriptional regulation of mesodermal genes.
3.1.2 Classification of mesodermal genes according to their promoter composition.
3.2 GENERATION OF CONSTRUCTS FOR THE EXAMINATION OF CAUDAL-REGULATED TARGETS IN-VIVO
3.3 SEARCH FOR EVOLUTIONARY CONSERVED HUMAN DPE
3.3.1 Bioinformatics-based detection of putative human DPE
3.3.2 Experimental validation of putative human DPE sequences
3.4 DEVELOPMENT OF ELEMENT: A CORE PROMOTER ELEMENTS NAVIGATOR TOOL

4 FUTURE PLANS
4.1 TO EXAMINE THE CHARACTERISTICS OF 3D INTERACTION PROFILES OF DIFFERENT CLASSES OF PROMOTERS
4.1.1 To characterize the 3D interactions profile of Mesodermal promoters containing different core-promoter composition.
4.1.2 To characterize the 3D interactions profile of Caudal-responsive promoters under differential regulation of Caudal expression.
4.2 TO EXAMINE THE ROLE OF DPE-INR PROMOTERS WITHIN A DEVELOPMENTAL NETWORK IN TRANSCRIPTIONAL REGULATION IN-VIVO
4.2.1 To evaluate the contribution of the DPE motif to the transcriptional output of Dorsal target genes using in-vivo assays.
4.2.2 To evaluate the transcriptional output following the in-vivo flipping of core-promoter types.
4.3 TO EXAMINE THE INVOLVEMENT OF CBP IN THE REGULATION OF DORSAL-RESPONSIVE GENES THROUGH THE DPE MOTIF

5 REFERENCES

Appendix I A paper describing the ElemeNT and CORE resources, currently in revision

Abstract (in Hebrew)

1 Introduction

1.1 Transcription initiation and the core promoter

The uniqueness of each cell, as well as the various developmental processes in a multicellular organism, is largely achieved by distinctive transcriptional programs, which are executed by the transcription machinery. The regulation of transcription initiation is a complex process that is primarily based on the direct interactions between general (basal) transcription factors and DNA. Transcription initiation occurs at the core promoter region where RNA Polymerase II (RNAPII) binds, which is often referred to as the ‘gateway to transcription’ [1–4]. Although the core promoter was previously considered a universal component that works by a similar mechanism for all protein-coding genes, it is now established that core promoters are diverse in their architecture and function, with a growing body of evidence indicating that each individual core promoter is rather unique [5–7]. Moreover, distinct core promoter compositions were demonstrated to result in various transcriptional outputs [8–11]. The transcription machinery, as well as the underlying regulatory principles, are largely conserved from Drosophila to humans; thus, the use of Drosophila has facilitated the discovery of many of the known regulatory networks.

Transcription initiation can generally occur in either a focused or a dispersed manner; however, multiple combinations of the ‘focused’ and ‘dispersed’ modes of transcription initiation have been described [4,6]. Promoters that exhibit a dispersed initiation pattern typically contain multiple weak transcription start sites (TSSs) within a 50 to 100 bp region, and are associated with CpG islands. In vertebrates, dispersed transcription initiation appears to account for the majority of protein-coding genes and is believed to direct the transcription of constitutively expressed genes.

Focused promoters contain a single TSS and are highly correlated with tightly regulated gene expression [4]. The focused core promoter region is designated as the 80 bp sequence located from -40 to +40 relative to the first transcribed nucleotide, which is usually described as “the +1 position”. The focused core promoter area encompasses distinct DNA sequence motifs, termed core promoter elements or motifs, which are recognized by the basal transcription machinery to recruit RNAPII and form the preinitiation complex (PIC) [12–14]. The TFIID multi-subunit complex is the key basal transcription factor that recognizes the core promoter in the process of transcription initiation [12–15]. A distinct set of TFIID subunits, namely TBP-associated factors (TAFs), recognize specific core promoter sequences, as detailed below [3,5,12,16–19].

1.2 Core promoter elements

The relative locations and lengths of the majority of the core promoter elements are depicted in Figure 1 and discussed below.

The initiator motif (Inr), which is the most prevalent among core promoter elements, encompasses the transcription start site, with a consensus sequence of ‘YYANWYY’ for mammals and ‘TCAKTY’ for Drosophila [20]. The A nucleotide is generally designated as the +1 position, even when it is not the predominant initiating nucleotide [4]. The Inr is recognized and bound by the TAF1 and TAF2 subunits of TFIID [21].
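The consensus sequences quoted in this section use the standard IUPAC nucleotide ambiguity codes (for example, Y = C or T, K = G or T, W = A or T). As a minimal illustrative sketch, and not a description of any tool used in this proposal, such a consensus can be translated into a regular expression and searched for in a sequence. The function names and the toy sequence below are hypothetical and serve only as an example.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motif(sequence, consensus):
    """Return the 0-based start index of every (possibly overlapping) match."""
    # A lookahead makes each match zero-width, so overlapping sites are reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Drosophila Inr consensus as quoted in the text; the sequence is made up.
toy_sequence = "GGCGTCAGTCGGATTCATTTGC"
inr_hits = find_motif(toy_sequence, "TCAKTY")

# The A of TCAKTY is the third base of the motif, i.e. the +1 position.
a_plus1_indices = [start + 2 for start in inr_hits]
print(inr_hits, a_plus1_indices)
```

A plain consensus match of this kind treats every permitted nucleotide as equally good; it is meant only to make the notation concrete.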

The TATA box motif, which was the first core promoter motif identified [22], is conserved from archaebacteria to humans [23]. The consensus sequence ‘TATAWAAR’, with the 5’ T usually located at -30 or -31 relative to the TSS, is bound by the TBP subunit of TFIID [4,22]. The TFIIB general transcription factor was found to bind immediately upstream or downstream of the TATA box at the TFIIB recognition elements (BRE) [24]. The upstream BRE (BREu) consensus is ‘SSRCGCC’, while the downstream BRE (BREd) consensus is ‘RTDKKKK’ [25,26].

The downstream core promoter element (DPE), which was originally discovered in Drosophila as a TFIID recognition site that is downstream of the Inr, is precisely located at +28 to +33 relative to the A+1 of the Inr, with a functional range set of ‘DSWYVY’ [16,17,27]. The DPE is conserved from Drosophila to humans [16]. The motif ten element (MTE) was identified as an overrepresented core promoter sequence that is located immediately upstream of the DPE, encompassing positions +18 to +29 relative to the A+1 of the Inr [28,29]. The MTE was validated as a functional TFIID recognition site, with the consensus sequence ‘CSARCSSAAC’ [29]. Both the DPE and the MTE are recognized by the TAF6 and TAF9 subunits of TFIID [16,19]. The MTE and DPE together were found to encompass three functional sub-regions, located at nucleotides +18 to +22, +27 to +29 and +30 to +33 downstream of the A+1 in the Inr, which contribute to core promoter activity [19]. The bridge configuration includes the first and third functional sub-regions (bridge I, positions +18 to +22, favored nucleotides ‘CGANC’; bridge II, positions +30 to +33, favored nucleotides ‘WYGT’) and was shown to be a naturally rare, yet functional, core promoter element [4,19]. The MTE, DPE and bridge elements are exclusively dependent on the presence of a functional initiator, and are enriched in TATA-less promoters [3,4,16,19,27,29].
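Because the downstream elements are defined by a fixed position relative to the A+1 of the Inr, checking for them amounts to slicing the sequence at the stated coordinates and comparing each slice with the consensus. The sketch below illustrates this for the DPE and the two bridge sub-regions, using only the positions and sequences quoted above. The function names and the toy promoter are hypothetical, and exact consensus matching is a simplifying assumption.

```python
# IUPAC nucleotide codes mapped to the set of bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# (first position, last position, consensus), relative to the A+1 of the Inr,
# as quoted in the text.
DOWNSTREAM_ELEMENTS = {
    "DPE": (28, 33, "DSWYVY"),
    "bridge I": (18, 22, "CGANC"),
    "bridge II": (30, 33, "WYGT"),
}


def matches_consensus(site, consensus):
    """True if every base of `site` is allowed by the IUPAC consensus."""
    if len(site) != len(consensus):
        return False  # the sequence ended before the element did
    return all(base in IUPAC[code] for base, code in zip(site, consensus))


def downstream_elements(sequence, a_plus1_index):
    """Check each element at its fixed offset from the A+1.

    `a_plus1_index` is the 0-based index of the Inr A in `sequence`,
    so position +p sits at index a_plus1_index + p - 1.
    """
    sequence = sequence.upper()
    found = {}
    for name, (first, last, consensus) in DOWNSTREAM_ELEMENTS.items():
        start = a_plus1_index + first - 1
        end = a_plus1_index + last  # exclusive slice bound
        found[name] = matches_consensus(sequence[start:end], consensus)
    return found


# Toy promoter: 4 upstream bases, the A+1, filler up to +27, then AGACGT at +28 to +33.
toy_promoter = "GGTC" + "A" + "GTC" + "C" * 23 + "AGACGT" + "C" * 7
result = downstream_elements(toy_promoter, a_plus1_index=4)
print(result)
```

In this toy promoter the DPE and bridge II windows match, while the bridge I window (all C) does not, which shows how the same downstream stretch can satisfy more than one overlapping definition.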

Figure 1. Schematic representation of the majority of the core promoter elements. The region of the core promoter area (-40 to +40 relative to the TSS) is illustrated. The diagram is roughly to scale. The TATA box and the BREu and BREd are located upstream of the TSS, while the DPE, MTE and Bridge elements are located downstream of the TSS.

The downstream core element (DCE) is composed of three sub-elements, located at positions +6 to +11 (necessary motif ‘CTTC’), +16 to +21 (necessary motif ‘CTGT’), and +30 to +34 (necessary motif ‘AGC’) relative to the transcription start site [30,31]. The DCE is distinct from the DPE, as it is frequently found in TATA box-containing promoters and is bound by the TAF1 subunit of TFIID [31], as opposed to the binding of the DPE by the TAF6 and TAF9 subunits.

The polypyrimidine initiator motif (TCT) is another element encompassing the transcription start site. The Drosophila consensus sequence is ‘YYC+1TTTYY’ and the human consensus sequence is ‘YC+1TYTYY’, where +1 marks the C at the transcription start site (TSS) [32]. Although the initiator consensus resembles the TCT consensus, the TCT cannot substitute for the Inr and initiate transcription [32].

Two additional core promoter motifs, which are located around TSSs, are sufficient to drive transcription and are enriched among TATA-less promoters, were identified in the hepatitis B virus X gene promoter. The X gene core promoter element 1 (XCPE1) drives RNAPII transcription when accompanied by co-activator sites [33]. It is found in ~1% of human genes, and its consensus sequence ‘DSGYGGRASM’ spans positions -8 to +2 relative to the TSS [33]. The X gene core promoter element 2 (XCPE2) is sufficient to drive RNAPII transcription by itself, in contrast to XCPE1. Its consensus sequence ‘VCYCRTTRCMY’ spans positions -9 to +2 relative to the TSS [34].

Specific core promoter motifs have previously been implicated in the regulation of specific regulatory networks. The TCT motif is enriched among genes encoding ribosomal proteins and other translation-related factors, and is considered to be devoted to the synthesis of ribosomal proteins [32]. The downstream core promoter element (DPE) has previously been identified in multiple developmentally regulated genes [35]. Most of the Drosophila homeotic (Hox) gene promoters (all of which lack the TATA box element) contain functionally important DPE motifs [35]. Moreover, Caudal, a master regulator of Hox gene expression, exhibits preferential activation of DPE-containing promoters as compared to TATA-containing promoters [35]. Caudal, a sequence-specific homeodomain transcription factor, is required for the specification of the posterior part of the embryo and for patterning the anterior-posterior axis [36]. Caudal-like genes are highly conserved in evolution and have been found in multiple species [37]. The vertebrate Caudal-related (Cdx) homeobox proteins have been identified as factors that mediate anterior-posterior patterning through Hox gene regulation [38,39].

Based on the findings above, it is likely that the distribution of specific core promoter elements in the genome is not uniform, but is rather clustered within functional groups. However, no comprehensive examination of the distribution of core promoter elements within functional classes has been carried out yet.

1.3 Distal regulatory elements

Enhancer regions are functionally defined as containing clusters of binding sites for sequence-specific transcription factors and are typically several hundred base pairs in length [40,41]. Strikingly, enhancers can work over long distances, even over a million base pairs or more in multicellular organisms [42,43]. Co-activators and co-repressors comprise another class of general transcription factors, which serve as the molecular bridges that bring distant DNA sequences into proximity through the formation of chromatin loops (Figure 2) [12,41]. Interestingly, it was demonstrated that the widely used transcriptional co-activator, the CREB-binding protein (CBP), co-occupies genomic loci in Drosophila embryos to which the Dorsal transcription factor binds [44,45]. These findings indicate that mediator complexes can play a tissue-specific regulatory role, and should not be considered a universal component.

Nonetheless, the combinatorial mode of transcription factor binding is of critical importance to the transcriptional outcome. For example, the recruitment of multiple transcription factors, both activators and repressors, enables the formation of the delicate stripe expression patterns in the Drosophila embryo [40,46]. It was demonstrated that the combinatorial binding of transcription factors can be used to accurately predict the spatio-temporal activity of mesodermal enhancers, and the actual combinations of transcription factors that bind these elements at specific stages of development were mapped [47]. Furthermore, only one or two types of TF binding motifs were found to be capable of driving specific spatio-temporal patterns during development [48]. Another aspect influencing the expression pattern is the spacing and orientation of the binding sites within the enhancer [48].

Furthermore, histone modifications of active mesodermal enhancers were examined using a pure population of mesodermal nuclei isolated from whole embryos [49,50]. It was found that rather than having a unique chromatin state, active developmental enhancers show heterogeneous histone modifications and Pol II occupancy, with Pol II recruitment being highly predictive of the timing of enhancer activity [50]. The authors demonstrated that the combined chromatin signatures and Pol II occupancy are sufficient to predict enhancer activity de novo. However, it is not known how these activity states are related to different promoter types and how different promoter types interact with the vast collection of mesodermal enhancers. A recent study indicates that most promoter-enhancer interactions in the Drosophila genome appear unchanged between tissue contexts and across development, arising before gene activation [43]. In addition, these loops were found to be frequently associated with paused RNA polymerase [43]. However, highly dynamic and transient contacts would not be visible when averaging over millions of nuclei; therefore, a more comprehensive examination of promoter-enhancer loops will shed further light on the regulation of transcription.

Figure 2. Enhancer-promoter chromatin looping. The core promoter region (in the immediate vicinity of the TSS, marked by an arrow pointing to the right) is regulated by the basal transcription machinery and by enhancers, which can be located at any distance from their target genes along the linear genomic DNA sequence. In a given tissue, active enhancers are bound by activating TFs and are brought into proximity of their respective target promoters by looping. Figure modified from [41].

1.4 DPE-dependent regulation of mesodermal genes

Dorsal-ventral patterning and mesoderm formation in the Drosophila embryo are among the most critical events during early embryogenesis. The complex transcriptional regulatory networks governing Drosophila mesoderm development have been studied for many years using genetics, genomics and computational biology, with enhancers and cis-regulatory modules as the main focus [42,47,51,52].

The overall interactions between the modules regulating the same process can be mapped to generate a gene regulatory network (GRN) [53]. The dorsal-ventral GRN includes multiple genes that are activated by different nuclear concentrations of the Dorsal transcription factor along the dorsal-ventral axis. Activation of Dorsal target genes is achieved by the recruitment of Dorsal to the enhancers of these genes, which contain Dorsal-binding sites hundreds or even thousands of base pairs upstream of the transcription start site [54–58].

Our lab has recently demonstrated that the DPE motif plays an important role in the expression of Dorsal target genes regulating dorsal-ventral patterning [9]. Over two-thirds of Dorsal target genes were found to contain DPE sequence motifs, which is significantly higher than the proportion of DPE-containing promoters among total Drosophila genes. Furthermore, multiple Dorsal target genes were shown to be evolutionarily conserved and functionally dependent on the DPE [9]. Despite the TATA box being a strong core promoter element, only a few of the Dorsal targets that are regulated by high or intermediate nuclear concentrations of Dorsal contain a TATA box without a DPE motif. In addition, the extent of activation of some TATA box-containing Dorsal-responsive genes by transfected Dorsal is reduced, as compared to the extent of activation of DPE-containing promoters [59].
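An enrichment statement of this kind compares the fraction of DPE-containing promoters in a gene set with the fraction in the whole genome. One common way to quantify such a comparison is a one-sided hypergeometric test; the text does not state which test was used, so the sketch below is only an illustration of the reasoning. All counts in it are hypothetical placeholders, not data from this proposal or from reference 9.

```python
from math import comb


def enrichment_p_value(genome_total, genome_with_motif, targets_total, targets_with_motif):
    """One-sided hypergeometric tail: P(X >= targets_with_motif).

    X is the number of motif-containing genes obtained when `targets_total`
    genes are drawn at random, without replacement, from a genome in which
    `genome_with_motif` of `genome_total` genes carry the motif.
    """
    all_draws = comb(genome_total, targets_total)
    max_hits = min(targets_total, genome_with_motif)
    tail = sum(
        comb(genome_with_motif, k)
        * comb(genome_total - genome_with_motif, targets_total - k)
        for k in range(targets_with_motif, max_hits + 1)
    )
    return tail / all_draws


# HYPOTHETICAL placeholder counts, chosen only to mimic "over two-thirds":
# 1000 genes in total, 300 of them with a DPE; 30 target genes, 21 with a DPE.
p_value = enrichment_p_value(1000, 300, 30, 21)
print(f"p = {p_value:.2e}")
```

With these made-up numbers, 21 of 30 target genes carry the motif against an expectation of 9, so the tail probability is very small; real counts would of course have to come from the annotated promoter sets.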

Figure 3. The core promoter composition establishes an additional dimension in the dorsal-ventral gene regulatory network. Dorsal target genes are classified according to their embryonic tissues: the mesoderm, neurogenic ectoderm and dorsal ectoderm. The Dorsal nuclear gradient is represented by the depth of the blue color. The upper side of the cube displays the color coding of the possible combinations of the discussed core promoter elements. The front depicts selected Dorsal target genes with the corresponding color-coded core promoter composition. The relative frequency of each core promoter combination among all Dorsal target genes in each of the three tissues (using the same color code) is shown on the right. The figure is published as Figure 4 in [59].

Together with published findings [10,60,61], the recent findings indicate that the core promoter itself is a critical regulatory component for gene expression. We suggest that the core promoter composition should be envisioned as an additional component of the dorsal-ventral gene regulatory network, which contributes to the combinatorial transcriptional output (Figure 3). The enrichment of the DPE in Dorsal target promoters, as compared to the TATA box motif, is most striking in the mesodermal genes responding to the highest Dorsal nuclear concentrations. The enrichment of the DPE motif in the promoters of Dorsal target genes decreases as the Dorsal nuclear concentration decreases. The correlation between Dorsal nuclear concentration and core promoter composition, as well as the regulation of multiple key mesodermal Dorsal target genes via the DPE motif, reinforces this model. Further support for this model is provided by the in vitro expression pattern of hybrid enhancer-promoter constructs. Strikingly, combining the natural enhancer of twist with the tinman core promoter recapitulates the same in vitro pattern detected for the natural tinman core promoter, and not for the twist enhancer [59].

1.5 The DPE Motif Is Necessary for Transcriptional Regulation in vivo

Previous work in our lab generated transgenic flies expressing the EGFP reporter gene driven by the natural enhancer and promoter of one of the Hox genes, Antennapedia (Antp P2) [59]. The Antp P2 promoter, as well as the majority of the promoters of the Hox genes (which determine the identity of the segments along the anterior-posterior axis in the developing embryo), was previously analyzed using Drosophila embryo nuclear extracts and Schneider cells and was shown to be regulated in a DPE-dependent manner [35]. Flies containing either the natural Antp P2 enhancer-promoter or an Antp P2 enhancer-promoter with a mutation in the DPE motif were generated. The EGFP expression, which was detected by immunostaining of 2-16 hour embryos with anti-GFP antibodies, is typical for the Antp gene and is highly dependent on the DPE, as mDPE-Antp P2-EGFP flies do not express EGFP (Figure 4). Hence, the DPE motif is necessary for transcription of the Antp gene in the developing Drosophila embryo.

In order to decipher the in vivo importance of the core promoter's composition, it is important to analyze the effect of mutations in different core promoter elements of multiple developmental regulators. It would be exciting to compare the in vivo expression driven by promoters in which the TATA box could compensate for the loss of a DPE in vitro, and to examine whether DPE-driven transcription in vivo is different from TATA-driven transcription. Such analysis may also provide new insights into the spatial and temporal expression of different core promoter variants.

Figure 4. Expression of Antp P2-EGFP in transgenic Drosophila embryos is dependent on the DPE motif. Homozygous transgenic flies expressing the EGFP reporter gene driven by the natural enhancer and downstream promoter of the DPE-dependent Hox gene Antennapedia (Antp P2) were generated using site-specific integration. Transgenic embryos for EGFP driven by the wild type (wt) enhancer-promoter of Antp P2 display an expression pattern (marked by an arrowhead) that is typical for the Antp P2 genomic fragment used, whereas transgenic embryos for EGFP driven by the Antp P2 enhancer-promoter containing a mutated DPE (mDPE) do not express EGFP. The figure is published as Figure 3 in [59].

1.6 Evolutionary conservation of the DPE element

Previous studies have indicated that the DPE motif is conserved from Drosophila to humans [16]. Although numerous genes with a functional DPE were identified in Drosophila [9,35,62], the only human genes reported to contain a functional DPE are the Interferon Regulatory Factor 1 (IRF1) and Calmodulin 2 [16,63]. However, these two genes have distinct functions (a transcription factor and a phosphorylase kinase, respectively) and, based on STRING analysis, are not clustered together. In contrast, the Drosophila DPE seems to contribute a regulatory dimension to two distinct developmental gene networks (see sections 1.2 and 1.4 above).

Hox genes, whose regulation was demonstrated to be DPE-specific in Drosophila [35], are highly conserved from Drosophila to humans [64]. Ongoing work in our lab indicates that some human Hox genes possess a functional DPE motif. Nevertheless, the transcriptional activity observed for the DPE-mutant human promoters is typically 50-70% of that of the wt, whereas mutation of the Drosophila DPE reduces activity by 70-90% (Hila Shir-Shapira and Yehuda M. Danino, unpublished data).

It would be interesting to explore whether additional human genes contain a functional DPE motif, and whether the DPE motif is associated with distinct transcriptional networks.

2 Research Importance and Aims:

The core promoter region, which contains a unique set of core promoter elements, has been demonstrated to be an important regulator of transcriptional output.

Dissecting the role of the downstream core promoter element (DPE) can provide new and exciting insights into the regulation of transcription initiation. Understanding gene regulation in the context of the developing Drosophila embryo will advance our understanding of embryonic development, one of the most fascinating processes in an animal’s life.

The proposed research would explore the following specific aims:

2.1 To examine the characteristics of 3D interaction profiles of different classes of promoters

2.1.1 To characterize the 3D interactions profile of Mesodermal promoters containing different core-promoter composition.

Using 4C on mesodermal nuclei, we will examine the 3D connectivity of specific promoters, based on their core promoter element composition. The resulting data will provide the first view of the extent of interactions that an individual promoter makes, and will reveal the inherent differences between classes of promoters. This information may provide new insights into why certain classes of genes, e.g. developmental regulators versus ubiquitously expressed genes, have specific promoter types.

2.1.2 To characterize the 3D interactions profile of Caudal-responsive promoters under differential regulation of Caudal expression.

Using transgenic flies expressing reporter genes driven by different Caudal target genes, we will perform 4C experiments on a defined set of Caudal-responsive promoters (mostly Hox genes). Three promoters of Caudal target genes will be examined: Sex combs reduced, a synthetic enhancer composed of Caudal binding sites, and caudal itself, which has been implicated as auto-regulated (preliminary results from our lab). This analysis will reveal the extent of interactions that Caudal target promoters make, as well as distinct enhancer properties that are unique to DPE+Inr-dependent promoters.

Our enhancer-promoter specificity analysis will provide a better understanding of the physical and functional connections between enhancers and promoters in Drosophila, as well as in higher eukaryotes.

2.2 To examine the role of DPE-Inr promoters within a developmental network in transcriptional regulation in-vivo

2.2.1 To evaluate the contribution of the DPE motif to the transcriptional output of Dorsal target genes using in-vivo assays.

Direct examination of changes in the in-vivo transcripts, using the CRISPR-Cas9 system for genome editing, will help to elucidate the in-vivo effect of core promoter elements on transcription. To gain a better understanding of the role of the DPE in vivo, we will examine the in-vivo significance of DPE mutations in Dorsal target genes.

2.2.2 To evaluate the transcriptional output following the in-vivo flipping of core-promoter types.

Re-wiring of the developmental networks studied by flipping promoter types, based on the results from the previous sections, will probably result in changes to the 3D contacts that the promoters make, potentially linking them to different enhancer elements. This approach may prove to be a very specific perturbation of the regulatory circuit.


3 Preliminary Results and Methods

3.1 The mesodermal context of the downstream core promoter element (DPE)

Our group has demonstrated that the DPE is enriched among mesodermal genes (ref. 9), and the proposed

study will explore this unique relationship in greater detail. As part of this work, I participated in

preparing the manuscript of an Extra View article on the previously published work

(ref. 59).

3.1.1 In-vivo contribution of the DPE to the transcriptional regulation of mesodermal genes.

To assess the in-vivo functionality of the previously analyzed mesodermal genes (ref. 9) during early embryonic

development, two series of transgenic fly strains were generated by Yonathan Zehavi, for the

dorsal-responsive tinman and brinker genes. Each tinman or brinker transgenic series consisted of

the natural enhancer, coupled to three different core promoter versions: wt, mutant DPE (mDPE),

and mDPE to which a TATA box was added (mDPE+TATA). All the enhancer-promoter constructs

drive the expression of the LacZ reporter gene and were integrated into the Drosophila genome using the

φC31 integration system (refs. 65, 66).

To analyze the transcriptional outcome of the different core-promoters in-vivo, the expression of

LacZ was assayed by several techniques, using both over-night and optimal-hours collections of the

transgenic embryos. The highest expression level of tinman is documented to appear at 0-8 hours

after egg laying (AEL), while brinker is most highly expressed at 2-4 hours AEL. Optimal expression

times were determined using the FlyBase database (ref. 67).

Initially, the enzymatic X-Gal assay was used to visualize the expression patterns of the tinman and

brinker transgenes (data not shown). Although some embryos were specifically stained by the blue

color, no distinguishable pattern could be detected with this assay for any of the analyzed strains.

In order to achieve a better resolution, the embryos were stained with β-gal antibodies (Figure 5,

Figure 6). The staining procedure was repeated at least three times for each series of transgenes,

using either over-night or optimal-hours collections of embryos. Surprisingly, for both tinman and

brinker the staining revealed no difference in the expression pattern between the wt and the two


mutant constructs, in contrast to the in-vitro findings (ref. 9). In addition, no early expression (before

embryonic stage 10) of the β-galactosidase reporter was detected in any of the analyzed embryos,

although early embryos were clearly present. This disagreement may stem either from the

complex in-vivo regulatory network as compared to in-vitro assays, or from technical issues related

to the stability of the β-galactosidase protein (Prof. Adi Salzberg, Faculty of Medicine, Technion,

personal communication). The fact that the antibody staining was unable to detect the early

expression, which is well-documented for tinman and brinker (BDGP in situ; refs. 68, 69), led us to examine

whether the detection of the β-galactosidase reporter protein using antibodies is an adequate

indicator of gene expression at early Drosophila embryonic stages.

In order to analyze the actual RNA expression, we performed RNA in-situ hybridization of the

transgenic embryos using a LacZ probe. The RNA in-situ hybridization was performed in Prof. Adi

Salzberg’s lab. The results, illustrated in Figure 7, recapitulated the findings obtained at the

protein level, using β-galactosidase antibodies. These results suggest that the expression of LacZ

(either RNA or protein) cannot be detected in early embryos.

Notably, tinman transgenes stained much more weakly than brinker transgenes analyzed in

parallel. This difference might reflect the differential regulation governing the two distinct genes.

Figure 5. Visualization of brinker transgenes expression patterns, using β-Gal antibodies. Transgenic brinker embryos were collected over-night and stained using β-galactosidase antibody. No staining of early embryos (prior to embryonic stage 10) was detected. The staining pattern did not reveal any difference in gene expression between the wt and mutated transgenes. CantonS flies were used as a negative control (see lower panels in Figure 6).


Figure 6. Visualization of tinman transgenes expression patterns, using β-Gal antibodies. Transgenic tinman embryos were collected at 0-8 hours AEL and stained using β-galactosidase antibody. The weak staining was not detected in early embryos (prior to embryonic stage 10), and did not reveal any difference in gene expression between the wt and mutated transgenes. CantonS embryos collected at 0-8 hours AEL were used as a negative control for both tinman (upper panels) and brinker embryos (Figure 5).

Figure 7. Visualization of brinker transgenes expression patterns, using in situ hybridization with LacZ probe. brinker transgenic embryos were collected at 2-4 hours AEL, and assayed for LacZ RNA expression using LacZ probe. The anti-sense probe detected brinker expression in the head region (black arrows), the caudal mesoderm (white arrow) and in a segment-specific pattern (asterisks). No staining of early embryos (prior to embryonic stage 10) was detected. The sense probe was used as a negative control, and no staining pattern was detected.


Taking into consideration the obtained results, it is likely that some critical regulatory sequences are

missing from the engineered constructs, for both tinman and brinker. These missing regulatory

sequences are probably responsible for the early expression of both genes, and may be differentially

regulated by the different core promoter composition. Exploring which regulatory elements are

missing in the current transgenic flies is beyond the scope of the proposed research. However,

designing other transgenic flies, based on further findings of the functional wiring of the Drosophila

genome, will shed light on transcriptional regulation by the DPE.

3.1.2 Classification of mesodermal genes according to their promoter composition.

In order to analyze another aspect of the DPE contribution to the transcriptional regulation, we aim

to examine the chromatin connectivity of DPE-containing promoters (see section 2.1.1). To this end,

we are collaborating with the lab of Dr. Eileen Furlong at EMBL, Heidelberg, Germany. The goal of the

joint project is to determine the 3D architecture of the different core promoter classes, based on

their core promoter elements composition. The core promoter will serve as a “viewpoint”

in a 4C assay, revealing all the genomic contacts of that specific promoter. The mesodermal genes

expressed at 6-8 hours of embryonic development (ref. 50) were analyzed for the presence of the different

core promoter element combinations, and the top 20 candidate genes were chosen. The analyzed

combinations include Inr+DPE, TATA+DPE, TATA+Inr, Inr only and no Inr (Table 1). The classification

of core promoter composition was carried out using the CORE database, generated by Yonathan Zehavi

(manuscript in revision, see Appendix I).

In addition, the total number of mesodermal genes harboring each specific combination of core

promoter elements is indicated. These numbers support the observation that mesodermal genes are

indeed enriched for the DPE motif and depleted of the TATA-box motif, as was previously described (ref. 9).


Inr + DPE (total no. of genes: 101) | TATA + DPE (total no. of genes: 17) | TATA + Inr (total no. of genes: 20) | Inr only (total no. of genes: 115) | no Inr (total no. of genes: 33)

Biniou Aldehyde dehydrogenase

Act57B Bagpipe Adenylate kinase-2

brinker Asph Dmel_CG10191 bendless capping protein beta

Dorsocross3 Dmel_CG11315 Dmel_CG10327 delilah Catalase

Drop Dmel_CG15064 Dmel_CG1221 Delta Cyclophilin 1

folded gastrulation

Dmel_CG17752 Dmel_CG1582 derailed death executioner Bcl-2 homologue

stumps Dmel_CG7271 Dmel_CG1620 diminutive Dmel_CG17256

heartless Dmel_CG7379 Dmel_CG18754 kin of irre Dorsocross1

Lame duck E(spl) region transcript mbeta

Dmel_CG31999 Dymeclin Eukaryotic translation initiation factor 3 subunit J

Multi drug resistance 49

even skipped Dmel_CG5656 Ecdysone-inducible gene L3

Glutamine synthetase 1

Myocyte enhancer factor 2

Flavin-containing monooxygenase 2

Dmel_CG6910 Enigma Glutathione S-transferase C-terminal domain-containing protein homolog

Mes2 giant Dmel_CG9615 Eukaryotic translation initiation factor 3 subunit G-1

lethal (3) 03670

meso18E

H/ACA ribonucleoprotein complex subunit 2-like protein

E(spl) region transcript m1

Homeodomain protein 2.0 Mannose-6-phosphate isomerase

odd paired rhomboid E(spl) region transcript m3

Mes4 mitochondrial ribosomal protein L14

phantom sloppy paired 1 extra macrochaetae odd skipped N-myristoyl transferase

retained Snail lethal (2) 09851 Protein bric-a-brac 2 Probable elongation factor G, mitochondrial

Rho-like Sulfated mitochondrial ribosomal protein L19

ribbon Probable peroxisomal acyl-coenzyme A oxidase 1

Tinman Troponin C at 73F optomotor-blind-related-gene-1

sticks and stones proliferation disrupter

tribbles wntD Peroxiredoxin 2540 sugarbabe Ribosomal protein S3

Twist

spalt-adjacent technical knockout Slender lobes-like protein

zfh1

Taspase 1 u-shaped UPF0595 protein CG11755

Table 1. Classification of mesodermal genes according to their core promoter elements composition. The mesodermal genes list was constructed based on references 9 and 50. The annotation of core promoter elements employed the CORE database (see Appendix I).

3.2 Generation of constructs for the examination of Caudal-regulated targets in-vivo

Obtaining tissue-specific chromatin from the whole-embryo context was demonstrated to be a

powerful technique (ref. 50). In order to isolate specific nuclei from whole embryos using the batch isolation of

tissue-specific chromatin (BiTS) technique, the nuclei of interest should either be tagged or

express a nuclear-specific antigen with an available antibody (ref. 49). We plan to employ endogenous

tagging, and will therefore generate transgenic flies with Streptavidin-Binding Peptide (SBP)

tagging of histone 2B, under the regulation of the natural caudal enhancer-promoter genomic region.

The SBP tag will then be used to isolate the relevant nuclei from all the other embryonic nuclei. The

isolated nuclei will then be subjected to 4C analysis with several Caudal-responsive promoters as


‘view-points’. In addition, the composition of the enhancer used, either native or synthetic, is expected to affect

the 3D wiring of the examined gene, thereby allowing the exploration of enhancer-promoter

specificity effects.

In order to present a more comprehensive characterization of Caudal-responsive gene wiring, two

additional types of transgenic flies with SBP-tagged His2B will be generated. The first is driven by

a synthetic enhancer containing 6 naturally occurring Caudal binding sites, which is activated by

Caudal, while the second is driven by the natural enhancer of sex combs reduced (Scr), previously

demonstrated to be a DPE-dependent Hox gene that is activated by Caudal (ref. 35).

The original plasmid used to generate the mesodermal nuclei tagged by SBP on His2B was obtained

from Dr. Furlong’s lab (ref. 49), and standard molecular cloning techniques were employed in order to

replace the mesodermal regulatory sequences with Caudal-related sequences. The synthetic

enhancer and Scr constructs have been generated, while the natural caudal enhancer-promoter

construct is still missing due to technical issues.

The current plasmids are built using the pCASPER4 plasmid backbone, which is designed for P-element

insertion. In case we decide to generate transgenic flies with site-specific integration using the φC31

integrase system, we will subclone the attB sites.

3.3 Search for evolutionarily conserved human DPE

This work was initiated by Yehuda M. Danino and Hila Shir-Shapira from our lab, and performed in

close collaboration with them.

3.3.1 Bioinformatics-based detection of putative human DPE

In order to examine whether additional human genes, apart from the two published (refs. 16, 63), harbor a

functional DPE motif, a bioinformatics approach was adopted. The Matlab-based hDPEsearcher code

was developed by Amitay Drummer, as part of a collaboration with the lab of Dr. Sol Efroni, BIU.

The hDPEsearcher is designed to search for putative Inr+DPE combinations within the human genome.

Subsequently, the detected combinations are intersected with RefSeq’s annotated transcription start

sites (TSS), to yield the Inr+DPE combinations which are closest to the TSS. The proximity to the TSS,


as well as the DPE match score, is considered to be indicative of potential biological function of the

detected DPE motif.

hDPEsearcher analyzes each strand of each chromosome separately, and its algorithm can be

described by the following steps:

1) Search for DPE sequence, with at least 4/6 matches to the functional Drosophila range set

DSWYVY, where each match has a score of 1.

2) If the DPE score is ≥4 (out of 6), search for an initiator starting at position -2, considering the

S (G/C) of the detected DPE to be precisely located at the +29 position (based on previous

experiments that indicate that having a G or a C in this position is very important; ref. 27).

3) Calculate the Inr score, based on the mammalian consensus (YYANWYY), where the +1

position must be A, and the -1 position is C/T (Y).

4) An Inr+DPE combination is considered to be a ‘hit’ only if the combined score of the Inr and DPE is at

least 8; i.e., if the DPE is highly conserved, the corresponding Inr can be less conserved, and

vice versa.
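
The four steps above can be sketched in Python as follows. This is a minimal illustration and not the Matlab hDPEsearcher code: the names `match_score` and `find_inr_dpe_hits` are hypothetical, and the sketch assumes that both the A at +1 and the pyrimidine at -1 are mandatory, that the threshold of 8 applies to the sum of the Inr and DPE scores, and that a single strand is scanned.

```python
# Hypothetical sketch of the Inr+DPE search; not the original hDPEsearcher code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "D": "AGT", "S": "CG", "W": "AT", "Y": "CT", "V": "ACG", "N": "ACGT",
}
DPE_CONSENSUS = "DSWYVY"   # Drosophila DPE, +28 to +33 (S at +29)
INR_CONSENSUS = "YYANWYY"  # mammalian Inr, -2 to +5 (A at +1)
DPE_OFFSET = 29            # index distance from the Inr start (-2) to the DPE start (+28)


def match_score(window, consensus):
    """Count positions in `window` that agree with the IUPAC consensus."""
    return sum(base in IUPAC[code] for base, code in zip(window, consensus))


def find_inr_dpe_hits(seq, min_dpe=4, min_total=8):
    """Return (index of A+1, Inr score, DPE score) for every hit on one strand."""
    seq = seq.upper()
    hits = []
    for dpe_start in range(DPE_OFFSET, len(seq) - len(DPE_CONSENSUS) + 1):
        dpe_score = match_score(seq[dpe_start:dpe_start + 6], DPE_CONSENSUS)
        if dpe_score < min_dpe:            # steps 1-2: DPE must match at >= 4/6
            continue
        inr_start = dpe_start - DPE_OFFSET
        inr = seq[inr_start:inr_start + 7]
        # Step 3 (assumption): A at +1 and C/T at -1 are both required.
        if inr[2] != "A" or inr[1] not in "CT":
            continue
        inr_score = match_score(inr, INR_CONSENSUS)
        if inr_score + dpe_score >= min_total:   # step 4 (assumed to be a sum)
            hits.append((inr_start + 2, inr_score, dpe_score))
    return hits
```

For a synthetic test sequence such as `"TCAGTCT" + "G" * 22 + "AGACGT"`, the function reports one hit with the A+1 at index 2. The opposite strand could be handled by running the same function on the reverse complement.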

In addition to the chromosomal coordinates of the putative Inr+DPE combinations, the software

plots the combinations found against the genomic coordinates for each chromosome. These graphs

describe a rather uniform distribution of putative Inr+DPE combinations along the chromosome, with

some regions containing peaks in the frequency of Inr+DPE combinations. However, no significant

correlation between the peaks and potential biological function was discovered.

The complete list of human TSS locations was extracted from the UCSC table browser, with the

kind help of Dr. Tirza Doniger. Alternative splice variants starting at the same position were

considered as a single TSS, denoted by its official gene symbol. Inr+DPE combinations found within

±5bp of annotated TSS were considered to be potentially functional.
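
The TSS filtering described above can be illustrated with a short Python sketch. The record layouts, the function names (`collapse_tss`, `hits_near_tss`) and the example coordinates are hypothetical and do not reproduce the UCSC table format or the actual code used; the sketch only shows how transcripts sharing a start position collapse into one TSS and how hits within ±5 bp are retained.

```python
from bisect import bisect_left


def collapse_tss(transcripts):
    """Collapse transcripts that share a start position into one TSS per gene.

    transcripts: iterable of (gene_symbol, chrom, strand, tss_position).
    Returns {(chrom, strand): sorted list of (tss_position, gene_symbol)}.
    """
    index = {}
    for gene, chrom, strand, tss in transcripts:
        index.setdefault((chrom, strand), set()).add((tss, gene))
    return {key: sorted(sites) for key, sites in index.items()}


def hits_near_tss(hits, tss_index, window=5):
    """Keep Inr+DPE hits whose A+1 lies within +/- `window` bp of a TSS.

    hits: iterable of (chrom, strand, plus1_position).
    Returns (chrom, strand, plus1_position, gene_symbol, offset) tuples.
    """
    near = []
    for chrom, strand, pos in hits:
        sites = tss_index.get((chrom, strand), [])
        # First annotated TSS at or after the left edge of the window.
        i = bisect_left(sites, (pos - window, ""))
        while i < len(sites) and sites[i][0] <= pos + window:
            tss, gene = sites[i]
            near.append((chrom, strand, pos, gene, pos - tss))
            i += 1
    return near
```

Keeping the TSS positions sorted per chromosome and strand lets each hit be checked with a binary search rather than a scan of the whole annotation.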

3.3.2 Experimental validation of putative human DPE sequences

Based on the distance of the putative Inr+DPE combination from the annotated TSS, and the match

score of both the Inr and DPE, a list of putative human DPE target genes was constructed. Additional


evidence available from the UCSC genome browser, such as deposited clones and tags, was

integrated as well. The exact core promoter composition and core-promoter elements location of

each relevant gene was manually annotated. Finally, eleven genes were considered as the most

promising candidates, and were assigned for experimental validation (Table 2).

Gene UCSC transcript ID Sequence (5' to 3')

P21 uc021yzb.1 aacatgtcccAACATGTTGAGCTCTGGCATAGAAGAGGCTGGTGGCTATT

TP53INP2 uc002xau.1 gcggccgcacAGACTCAAAGCCCCGCGGGCGAGCTCAGCAGCCCGGAGCG

SNAIL 1 uc002xuz.3 tgctgcattcATTGCGCCGCGGCACGGCCTAGCGAGTGGTTCTTCTGCGC

TWIST 2 uc021vyw.2 cagcccagctAGAGTTTCCAAAAAAGTTAGAATAACTTCCTCTCCCGGAG

Protein S uc010hoo.3 tgtttccttcAGTTTTGTCAAAGCAACAGGCTTCACAAGTCCTGGTTAGG

CCND1 uc010hoo.3 cagtaacgtcACACGGACTACAGGGGAGTTTTGTTGAAGTTGCAAAGTCc

CDC34 uc010hoo.3 cggccaaggcAAGCGCCGGTGGGGCGGCGGCGCCAGAGCTGCTGGAGCGC

CDC25B uc002wjn.3 gctgctgctcagcGCAGCCAGTCGCGGAGGCGGGGAGGCTGCGCGGTCAG

CDC25A uc003csh.1 CAGCGAAGACAGCGTGAGCCTGGGCCGTTGCCTCGAGGCTCTCGCCCGGC

HOX B6 uc010dbh.1 cctggtggttaTAATGCAGCATTCTTTTGGACACCACACCTAGGTCGGAG

HOX D13 uc002ukf.1 cgagcgaaccagaGAGAAAGGAGAGGAGGGAGGAGGCGCGCCGCGCCATG

Table 2. Target promoters for experimental validation of functional human DPE motif. The sequences of the experimentally assessed genes’ promoters are described. The initiator motif is colored yellow, while the DPE motif is colored purple. The colored nucleotides represent matches to the relevant consensus sequence, while an uncolored position within the box does not match the consensus. For each Inr, the A+1 position is indicated in bold. Positions that were found to be enriched in DPE-containing promoters are marked green. Capital letters represent exons. None of the selected transcripts contained a putative TATA box.

For each putative human DPE described in Table 2, two firefly luciferase reporter plasmids were

generated. The minimal promoter sequence from -10 to +40 relative to the A+1, either the wt or

mutant DPE (mDPE) version (mutating nucleotides +28-34 to CTCATGT), was inserted into the firefly

luciferase pGL3 vector. The generated plasmids were transfected into HEK-293 cells, along with a

TK-driven Renilla luciferase reporter plasmid for transfection-efficiency normalization. The relative

activity of the mDPE (mD) version, compared to the wt, was examined using a dual-luciferase assay. All the

experiments were performed in triplicate, with at least 2 replicates.
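
The normalization logic can be sketched in Python as shown below. This is an illustration under stated assumptions, not the analysis actually used: the function name `relative_activity` is hypothetical, each well is assumed to yield one firefly and one Renilla reading, and the wt mean is set to 1 as in Figure 8.

```python
from statistics import mean, stdev


def relative_activity(wt_wells, mut_wells):
    """Return (mean, standard deviation) of mutant activity relative to wt.

    Each argument is a list of (firefly, renilla) readings, one pair per well.
    """
    # Normalize firefly to the co-transfected Renilla signal in each well.
    wt_ratios = [firefly / renilla for firefly, renilla in wt_wells]
    mut_ratios = [firefly / renilla for firefly, renilla in mut_wells]
    wt_mean = mean(wt_ratios)                 # the wt activity is set to 1
    relative = [ratio / wt_mean for ratio in mut_ratios]
    return mean(relative), stdev(relative)
```

A relative activity close to 1 would indicate that mutating the DPE had little effect on the minimal promoter, whereas a value well below 1 would indicate DPE dependence.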

No significant difference between the transcriptional activity of the wt and the mDPE minimal

promoters was detected for any of the analyzed genes (Figure 8). A close examination of the genes giving


an apparent reduction of expression upon mutation of the DPE (i.e. CCND1 and ProS) reveals a

difference in Renilla luciferase activity between the wt and mD

transfected cells, and therefore the general conclusions seem to be applicable to all the analyzed

genes. Notably, the constructs used in the dual-luciferase assays only contained the minimal

promoter comprising 50 nucleotides, and may therefore not represent the actual transcriptional

difference. Moreover, Drosophila and humans are quite distant from an evolutionary standpoint,

and the precise functional Drosophila DPE sequence might have evolved over time to represent a

modified set of nucleotides in humans. In addition, the strict requirements for a functional initiator and

precise spacing, which are known to be functionally important in Drosophila, might be altered in humans.

Therefore, since the initial results did not seem to reveal a functional DPE with sequence identical to

Drosophila, further work will only be carried out in accordance with new insights, based on other

projects in the lab.

Figure 8. Multiple human promoters that contain a match to the Drosophila DPE sequence do not contain a functional human DPE. The activity of the wt promoter was set to 1 for each gene, and the mD version was analyzed accordingly. Overall, there is no substantial difference in expression between the wt and the mD version of the minimal core promoter. Error bars represent standard deviations. n=4 for P21, CCND1, TWIST, and HOX D13. n=3 for TP53INP2, ProS, CDC25A, CDC25B, and CDC34. n=2 for SNAIL and HOX B6.


3.4 Development of ElemeNT: a core promoter Elements Navigation Tool

Notably, there is no available resource allowing the identification of all the specific core promoter

elements and their potential combinations within a given sequence. Therefore, each annotation of

core promoter elements described above was performed individually for each sequence of interest.

To automate this process and alleviate the time burden associated with manual scanning of tens of

sequences at once, we have developed the Elements Navigation Tool (ElemeNT).

A paper describing this work is in revision.

Briefly, ElemeNT is a web-based, interactive tool (implemented in Perl) for rapid and convenient

detection of core promoter elements and their combinations within any given sequence. It is

accessible at http://lifefaculty.biu.ac.il/gershon-tamar/index.php/element-description (password-

protected until publication; Username-GershonLab, password- TJGL2014). ElemeNT searches the

input sequences for the presence of certain core promoter elements specified by the user. The

elements are represented by position weight matrices (PWMs), which are constructed based on the

validated biologically functional sequences. Notably, for some elements, the PWM differs from the

defined consensus, reflecting differences in the analyses of the sequences.

The elements that can be searched for are: Mammalian Initiator, Drosophila Initiator, TATA box,

MTE, DPE, Bridge, BREu, BREd, Human TCT, Drosophila TCT, XCPE1 and XCPE2. The MTE, DPE and

Bridge motifs are only calculated at the precise location relative to each detected

mammalian/Drosophila Initiator, based on the known strict spacing requirement. The scores are

normalized to be between 0 and 1, generating more intuitively interpretable results. For each

element, the user should specify a threshold between 0 and 1, which determines whether the

element is present or not at a position. Default threshold values were empirically determined for

each element, based on known functional sequence elements, and are provided.
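
As an illustration of PWM-based scoring with a normalized threshold, a minimal Python sketch is given below. ElemeNT itself is implemented in Perl, and its matrices and normalization are not reproduced here: the toy matrix values, the function names and the normalization (dividing by the best score attainable for the matrix) are assumptions made for this example only.

```python
# Toy 4-column matrix for illustration only; these are NOT ElemeNT's PWM values.
TOY_PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]


def normalized_pwm_score(window, pwm):
    """Score a window against a PWM and scale the result to the 0-1 range.

    Assumed normalization: product of per-position frequencies divided by
    the best product the matrix can produce.
    """
    score = 1.0
    best = 1.0
    for base, column in zip(window, pwm):
        score *= column.get(base, 0.0)
        best *= max(column.values())
    return score / best


def scan(seq, pwm, threshold):
    """Return (start index, score) for every window scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = normalized_pwm_score(seq[start:start + width], pwm)
        if score >= threshold:
            hits.append((start, score))
    return hits
```

With this toy matrix, `scan("GGTATAGG", TOY_PWM, 0.5)` reports a single window starting at index 2. For elements with strict spacing requirements, such as the MTE, DPE and Bridge, the same scoring function would be applied only at the fixed offset from each detected initiator rather than at every position.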

The output of the program contains the analyzed sequence, a color display of some possible core

promoter elements combinations found, and a table containing each of the detected elements


alongside its position, PWM and consensus match scores. A sample output of the ElemeNT program

is depicted in Figure 9.

In addition to the automation of core promoter elements annotation, the ElemeNT program utilizes

PWM data, rather than consensus sequence, to score the putative motifs. The use of the PWM

enables a better reflection of the biological significance of the distribution of different nucleotides at

specific positions, which is hard to account for by manual annotation of sequences.

For further description of the program, please refer to Appendix I.

Figure 9. A sample output of the ElemeNT program. (A) A screen-shot of the sample input sequence and the combinations of elements identified in it. ElemeNT has detected a TATA box flanked by both a BREu element and a BREd element, Drosophila and Mammalian initiator elements and MTE, DPE and Bridge elements. The two possible combinations result from a sequence match to both the Drosophila and mammalian initiators, due to the partial sequence redundancy of the two elements. (B) The table displaying all the elements identified within the sample input sequence, their location, PWM and consensus match scores. Note the message displayed for the TATA-box, indicating the presence of mammalian and Drosophila initiator, as well as BREu and BREd, at optimal distances for transcriptional synergy.


4 Future Plans

4.1 To examine the characteristics of 3D interaction profiles of different classes of

promoters

This part will be performed as a joint project with Dr. Eileen Furlong, EMBL, Germany.

The proposed working schemes are subject to change, based on findings from the Furlong lab.

4.1.1 To characterize the 3D interactions profile of mesodermal promoters containing different

core-promoter compositions.

In collaboration with Dr. Eileen Furlong’s lab, representative candidates from each promoter class will be

used as the ‘view-points’ in 4C experiments, based on Table 1. The mesodermal nuclei will be

isolated using the BiTS protocol (ref. 49), and 4C analysis will be carried out as described in ref. 43.

4.1.2 To characterize the 3D interactions profile of Caudal-responsive promoters under

differential regulation of Caudal expression.

Transgenic flies will be generated using the φC31 integrase site-specific integration system (ref. 70) to

generate 3 types of Caudal-responsive tagged histone transgenes. The first line will contain the

caudal gene driven by the natural caudal enhancer itself (which is DPE-dependent and is activated by

Caudal). The second line will express SBP tagged His2B under the control of a synthetic enhancer

containing 6 naturally occurring Caudal binding sites, which is activated by Caudal, and the third will

contain the Scr (sex combs reduced) natural enhancer (which is a DPE-dependent Hox gene activated

by Caudal; ref. 35).

Here again, BiTS will be used to isolate SBP-His2B expressing nuclei under the Caudal-responsive

promoter, and 4C analysis will be performed using the following caudal-responsive genes:

labial (lab), proboscipedia (pb), Deformed (Dfd), Sex combs reduced (Scr), Antennapedia P1 (Antp P1),

Antennapedia P2 (Antp P2), Abdominal-B (Abd-B), hairy (h), forkhead (fkh), fushi tarazu (ftz), giant

(gt), and caudal (cad) itself. Abdominal-A (abd-A) and Ultrabithorax (Ubx) lack both a TATA box and

a DPE and will serve as negative controls.


4.2 To examine the role of DPE-Inr promoters within a developmental network in

transcriptional regulation in-vivo

4.2.1 To evaluate the contribution of the DPE motif to the transcriptional output of Dorsal target

genes using in-vivo assays.

Although the Antp P2 transcript was found to be DPE-dependent in vivo (ref. 59), this effect has not yet been

recapitulated with other genes, mainly due to the complexity of genomic loci (see section 3.1.1).

Using genome editing, rather than reporter assays, we expect to gain a better understanding of the

role of core promoter elements in-vivo.

We will construct several transgenic Drosophila lines, using the CRISPR-Cas technology, to examine

the contribution of the DPE to the transcriptional output of dorsal-responsive genes. The examined genes will include tinman

and twist, which were demonstrated to be regulated by the DPE using both S2R+ cells and 0-12h

embryo nuclear extracts (ref. 9). In addition, the dorsal gene itself will be examined; Dorsal auto-regulation

is supported mainly by ChIP signals (ref. 71), indicating extensive binding of the Dorsal protein in the coding

region of the dorsal gene.

The generated transgenes will include two versions: a mutated DPE (mDPE), and a mutated DPE with

the addition of a TATA box (mDPE+TATA).

4.2.2 To evaluate the transcriptional output following the in-vivo flipping of core-promoter types.

The CRISPR-Cas9 system will be used to induce specific mutations, similarly to the experiments in

section 4.2. The target genes will be selected based on the information gathered in the previous

steps.

4.3 To examine the involvement of CBP in the regulation of Dorsal-responsive genes

through the DPE motif

The regulatory contribution of the CBP co-activator protein to DPE-dependent activation of Dorsal

target genes will be examined in S2R+ cells. CBP and Dorsal will be transfected into the cells in

different combinations in order to examine the extent of DPE-dependent activation. The analyzed


genes will include the DPE-dependent enhancer constructs of the tinman, brinker, leak and twist genes

(published in ref. 9). Additional enhancer-containing constructs of Dorsal-responsive genes, such as

phantom and Rho-like, will be examined as well. In order to determine the regulatory relationships

between the CBP and Dorsal proteins, both dual-luciferase assays and primer-extension analysis

will be used.


5 References 1. Smale, S. T. Core promoters: active contributors to combinatorial gene regulation. Genes Dev.

15, 2503–2508 (2001). 2. Smale, S. T. & Kadonaga, J. T. The RNA polymerase II core promoter. Annu. Rev. Biochem. 72,

449–479 (2003). 3. Juven-Gershon, T., Hsu, J.-Y., Theisen, J. W. & Kadonaga, J. T. The RNA polymerase II core

promoter - the gateway to transcription. Curr. Opin. Cell Biol. 20, 253–259 (2008). 4. Kadonaga, J. T. Perspectives on the RNA polymerase II core promoter. Wiley Interdiscip. Rev. Dev.

Biol. 1, 40–51 (2012). 5. Juven-Gershon, T. & Kadonaga, J. T. Regulation of gene expression via the core promoter and the

basal transcriptional machinery. Dev. Biol. 339, 225–229 (2010). 6. Lenhard, B., Sandelin, A. & Carninci, P. Metazoan promoters: emerging characteristics and

insights into transcriptional regulation. Nat. Rev. Genet. 13, 233–245 (2012). 7. Müller, F., Demény, M. A. & Tora, L. New Problems in RNA Polymerase II Transcription Initiation:

Matching the Diversity of Core Promoters with a Variety of Promoter Recognition Factors. J. Biol. Chem. 282, 14685–14689 (2007).

8. Juven-Gershon, T., Cheng, S. & Kadonaga, J. T. Rational design of a super core promoter that enhances gene expression. Nat. Methods 3, 917–922 (2006).

9. Zehavi, Y., Kuznetsov, O., Ovadia-Shochat, A. & Juven-Gershon, T. Core promoter functions in the regulation of gene expression of Drosophila dorsal target genes. J. Biol. Chem. 289, 11993–12004 (2014).

10. Butler, J. E. & Kadonaga, J. T. Enhancer-promoter specificity mediated by DPE or TATA core promoter motifs. Genes Dev. 15, 2515–2519 (2001).

11. Dikstein, R. The unexpected traits associated with core promoter elements. Transcription 2, 201–206 (2011).

12. Thomas, M. C. & Chiang, C.-M. The general transcription machinery and general cofactors. Crit. Rev. Biochem. Mol. Biol. 41, 105–178 (2006).

13. He, Y., Fang, J., Taatjes, D. J. & Nogales, E. Structural visualization of key steps in human transcription initiation. Nature 495, 481–486 (2013).

14. Grünberg, S. & Hahn, S. Structural insights into transcription initiation by RNA polymerase II. Trends Biochem. Sci. 38, 603–611 (2013).

15. Cianfrocco, M. A. et al. Human TFIID binds to core promoter DNA in a reorganized structural state. Cell 152, 120–131 (2013).

16. Burke, T. W. & Kadonaga, J. T. The downstream core promoter element, DPE, is conserved from Drosophila to humans and is recognized by TAFII60 of Drosophila. Genes Dev. 11, 3020–3031 (1997).

17. Burke, T. W. & Kadonaga, J. T. Drosophila TFIID binds to a conserved downstream basal promoter element that is present in many TATA-box-deficient promoters. Genes Dev. 10, 711–724 (1996).

18. Wu, C. H. et al. Analysis of core promoter sequences located downstream from the TATA element in the hsp70 promoter from Drosophila melanogaster. Mol. Cell. Biol. 21, 1593–1602 (2001).

19. Theisen, J. W. M., Lim, C. Y. & Kadonaga, J. T. Three Key Subregions Contribute to the Function of the Downstream RNA Polymerase II Core Promoter. Mol. Cell. Biol. 30, 3471–3479 (2010).

20. Smale, S. T. & Baltimore, D. The ‘initiator’ as a transcription control element. Cell 57, 103–113 (1989).

21. Chalkley, G. E. & Verrijzer, C. P. DNA binding site selection by RNA polymerase II TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator. EMBO J. 18, 4835–4845 (1999).

22. Goldberg, M. thesis, Stanford Univ. (1979).

23. Reeve, J. N. Archaeal chromatin and transcription. Mol. Microbiol. 48, 587–598 (2003).

24. Deng, W. & Roberts, S. G. E. TFIIB and the regulation of transcription by RNA polymerase II. Chromosoma 116, 417–429 (2007).

25. Lagrange, T., Kapanidis, A. N., Tang, H., Reinberg, D. & Ebright, R. H. New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor IIB. Genes Dev. 12, 34–44 (1998).

26. Deng, W. & Roberts, S. G. E. A core promoter element downstream of the TATA box that is recognized by TFIIB. Genes Dev. 19, 2418–2423 (2005).

27. Kutach, A. K. & Kadonaga, J. T. The Downstream Promoter Element DPE Appears To Be as Widely Used as the TATA Box in Drosophila Core Promoters. Mol. Cell. Biol. 20, 4754–4764 (2000).

28. Ohler, U., Liao, G., Niemann, H. & Rubin, G. M. Computational analysis of core promoters in the Drosophila genome. Genome Biol. 3, RESEARCH0087 (2002).

29. Lim, C. Y. et al. The MTE, a new core promoter element for transcription by RNA polymerase II. Genes Dev. 18, 1606–1617 (2004).

30. Lewis, B. A., Kim, T.-K. & Orkin, S. H. A downstream element in the human β-globin promoter: Evidence of extended sequence-specific transcription factor IID contacts. Proc. Natl. Acad. Sci. 97, 7172–7177 (2000).

31. Lee, D.-H. et al. Functional characterization of core promoter elements: the downstream core element is recognized by TAF1. Mol. Cell. Biol. 25, 9674–9686 (2005).

32. Parry, T. J. et al. The TCT motif, a key component of an RNA polymerase II transcription system for the translational machinery. Genes Dev. 24, 2013–2018 (2010).

33. Tokusumi, Y., Ma, Y., Song, X., Jacobson, R. H. & Takada, S. The New Core Promoter Element XCPE1 (X Core Promoter Element 1) Directs Activator-, Mediator-, and TATA-Binding Protein-Dependent but TFIID-Independent RNA Polymerase II Transcription from TATA-Less Promoters. Mol. Cell. Biol. 27, 1844–1858 (2007).

34. Anish, R., Hossain, M. B., Jacobson, R. H. & Takada, S. Characterization of Transcription from TATA-Less Promoters: Identification of a New Core Promoter Element XCPE2 and Analysis of Factor Requirements. PLoS ONE 4, e5103 (2009).

35. Juven-Gershon, T., Hsu, J.-Y. & Kadonaga, J. T. Caudal, a key developmental regulator, is a DPE-specific transcriptional factor. Genes Dev. 22, 2823–2830 (2008).

36. Lall, S. & Patel, N. H. Conservation and Divergence in Molecular Mechanisms of Axis Formation. Annu. Rev. Genet. 35, 407–437 (2001).

37. Moreno, E. & Morata, G. Caudal is the Hox gene that specifies the most posterior Drosophile segment. Nature 400, 873–877 (1999).

38. Epstein, M., Pillemer, G., Yelin, R., Yisraeli, J. K. & Fainsod, A. Patterning of the embryo along the anterior-posterior axis: the role of the caudal genes. Dev. Camb. Engl. 124, 3805–3814 (1997).

39. Copf, T., Schröder, R. & Averof, M. Ancestral role of caudal genes in axis elongation and segmentation. Proc. Natl. Acad. Sci. U. S. A. 101, 17711–17715 (2004).

40. Spitz, F. & Furlong, E. E. M. Transcription factors: from enhancer binding to developmental control. Nat. Rev. Genet. 13, 613–626 (2012).

41. Shlyueva, D., Stampfel, G. & Stark, A. Transcriptional enhancers: from properties to genome-wide predictions. Nat. Rev. Genet. 15, 272–286 (2014).

42. Bulger, M. & Groudine, M. Enhancers: the abundance and function of regulatory sequences beyond promoters. Dev. Biol. 339, 250–257 (2010).

43. Ghavi-Helm, Y. et al. Enhancer loops appear stable during development and are associated with paused polymerase. Nature advance online publication, (2014).

44. Holmqvist, P.-H. et al. Preferential Genome Targeting of the CBP Co-Activator by Rel and Smad Proteins in Early Drosophila melanogaster Embryos. PLoS Genet 8, e1002769 (2012).

45. Holmqvist, P.-H. & Mannervik, M. Genomic occupancy of the transcriptional co-activators p300 and CBP. Transcription 4, 18–23 (2013).

46. Levine, M., Cattoglio, C. & Tjian, R. Looping Back to Leap Forward: Transcription Enters a New Era. Cell 157, 13–25 (2014).

47. Zinzen, R. P., Girardot, C., Gagneur, J., Braun, M. & Furlong, E. E. M. Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462, 65–70 (2009).

48. Erceg, J. et al. Subtle Changes in Motif Positioning Cause Tissue-Specific Effects on Robustness of an Enhancer’s Activity. PLoS Genet 10, e1004060 (2014).

49. Bonn, S. et al. Cell type-specific chromatin immunoprecipitation from multicellular complex samples using BiTS-ChIP. Nat. Protoc. 7, 978–994 (2012).

50. Bonn, S. et al. Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development. Nat. Genet. 44, 148–156 (2012).

51. Sandmann, T. et al. A core transcriptional network for early mesoderm development in Drosophila melanogaster. Genes Dev. 21, 436–449 (2007).

52. Bonn, S. & Furlong, E. E. M. cis-Regulatory networks during development: a view of Drosophila. Curr. Opin. Genet. Dev. 18, 513–520 (2008).

53. Levine, M. & Davidson, E. H. Gene regulatory networks for development. Proc. Natl. Acad. Sci. U. S. A. 102, 4936–4942 (2005).

54. Stathopoulos, A., Van Drenth, M., Erives, A., Markstein, M. & Levine, M. Whole-genome analysis of dorsal-ventral patterning in the Drosophila embryo. Cell 111, 687–701 (2002).

55. Markstein, M., Markstein, P., Markstein, V. & Levine, M. S. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl. Acad. Sci. U. S. A. 99, 763–768 (2002).

56. Zeitlinger, J. et al. Whole-genome ChIP-chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes in the Drosophila embryo. Genes Dev. 21, 385–390 (2007).

57. Hong, J.-W., Hendrix, D. A., Papatsenko, D. & Levine, M. S. How the Dorsal gradient works: insights from postgenome technologies. Proc. Natl. Acad. Sci. U. S. A. 105, 20072–20076 (2008).

58. Reeves, G. T. & Stathopoulos, A. Graded dorsal and differential gene regulation in the Drosophila embryo. Cold Spring Harb. Perspect. Biol. 1, a000836 (2009).

59. Zehavi, Y., Sloutskin, A., Kuznetsov, O. & Juven-Gershon, T. The core promoter composition establishes a new dimension in developmental gene networks. Nucl. Austin Tex 5, (2014).

60. Ohtsuki, S., Levine, M. & Cai, H. N. Different core promoters possess distinct regulatory activities in the Drosophila embryo. Genes Dev. 12, 547–556 (1998).

61. Lagha, M. et al. Paused Pol II coordinates tissue morphogenesis in the Drosophila embryo. Cell 153, 976–987 (2013).

62. Kedmi, A. et al. Drosophila TRF2 is a preferential core promoter regulator. Genes Dev. 28, 2163–2174 (2014).

63. Duttke, S. H. C. RNA polymerase III accurately initiates transcription from RNA polymerase II promoters in vitro. J. Biol. Chem. 289, 20396–20404 (2014).

64. Pearson, J. C., Lemons, D. & McGinnis, W. Modulating Hox gene functions during animal body patterning. Nat. Rev. Genet. 6, 893–904 (2005).

65. Markstein, M., Pitsouli, C., Villalta, C., Celniker, S. E. & Perrimon, N. Exploiting position effects and the gypsy retrovirus insulator to engineer precisely expressed transgenes. Nat. Genet. 40, 476–483 (2008).

66. Fish, M. P., Groth, A. C., Calos, M. P. & Nusse, R. Creating transgenic Drosophila by microinjecting the site-specific phiC31 integrase mRNA and a transgene-containing donor plasmid. Nat. Protoc. 2, 2325–2331 (2007).

67. St Pierre, S. E., Ponting, L., Stefancsik, R., McQuilton, P. & FlyBase Consortium. FlyBase 102--advanced approaches to interrogating FlyBase. Nucleic Acids Res. 42, D780–788 (2014).

68. Tomancak, P. et al. Systematic determination of patterns of gene expression during Drosophila embryogenesis. Genome Biol. 3, RESEARCH0088 (2002).

69. Tomancak, P. et al. Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biol. 8, R145 (2007).

70. Bischof, J., Maeda, R. K., Hediger, M., Karch, F. & Basler, K. An optimized transgenesis system for Drosophila using germ-line-specific phiC31 integrases. Proc. Natl. Acad. Sci. U. S. A. 104, 3312–3317 (2007).

71. MacArthur, S. et al. Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biol. 10, R80 (2009).

Appendix I A paper describing the ElemeNT and CORE resources, currently in revisions

ElemeNT: A computational tool for detecting core promoter elements

Anna Sloutskin¹, Yehuda M. Danino¹, Yonathan Zehavi¹, Yaron Orenstein², Tirza Doniger¹, Ron Shamir² and Tamar Juven-Gershon¹*

¹The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 5290002, Israel

²Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 6997801, Israel.

Abstract

Core promoter elements play a pivotal role in the transcriptional output, yet their detection within

sequences of interest is largely manually-performed. Here, we present two contributions in the

curation and detection of core promoter elements within given sequences. First, the CORE is a

collection of TATA-box, initiator and downstream core promoter element sequences within

Drosophila melanogaster promoters. Second, the Elements Navigation Tool (ElemeNT) is a

convenient web-based, interactive tool for prediction and display of putative core promoter

elements and their biologically-relevant combinations. These resources, accessible at

http://lifefaculty.biu.ac.il/gershon-tamar/index.php/resources, facilitate the identification of core

promoter elements as active contributors to gene expression.

Introduction

The uniqueness of each cell, as well as the differences between cell types in multicellular organisms,

are largely achieved by distinct transcriptional programs. The regulation of transcription initiation is a

complex process that is primarily based on the direct interactions between transcription factors and

DNA. Transcription initiation occurs at the core promoter region where the RNA Polymerase II

(RNAPII) binds, which is often referred to as the ‘gateway to transcription’ [1-6]. Although it was

previously believed that the core promoter is a universal component that works in a similar

mechanism for all protein-coding genes, it is nowadays established that core promoters differ in their

architecture and function [5-9]. Moreover, distinct core promoter compositions were demonstrated

to result in various transcriptional outputs [10-14].

Transcription initiation is generally thought to occur in either a focused or a dispersed manner with

multiple detected combinations between these modes [6,7]. Promoters that exhibit a dispersed

initiation pattern typically contain multiple weak transcription start sites (TSSs) within a 50 to 100 bp

region, and are associated with CpG islands. In vertebrates, dispersed transcription initiation appears

to account for the majority of protein-coding genes and is believed to direct the transcription of

constitutively-expressed genes.

Focused promoters contain a single TSS and are highly correlated with tightly regulated gene

expression [6]. The focused core promoter typically spans the region from -40 to +40 relative to the

first transcribed nucleotide, which is usually termed “the +1 position”. The focused core promoter

area encompasses distinct DNA sequence motifs, termed core promoter elements or motifs. These

elements are recognized by the basal transcription machinery to recruit RNAPII and form the

preinitiation complex [15-17]. The TFIID multi-subunit complex is a key basal transcription factor that

recognizes the core promoter in the process of transcription initiation [15-18]. A distinct set of TFIID

subunits, namely TATA box-binding protein (TBP) and TBP-associated factors (TAFs), recognize

specific core promoter sequences [4-6,15,19-22]. Table 1 and Figure 1 provide a summary of the

characteristics of the known core promoter elements that have been shown to function at a precise

distance from the TSS. Remarkably, the MTE, DPE and Bridge elements are exclusively dependent on

the presence of a functional initiator with a strict spacing requirement, and are typically enriched in

TATA-less promoters [4-6,19,20,22-24].

An important aspect of core promoter elements is their synergistic nature. Although the presence of

a specific core promoter element is usually sufficient to affect transcription, different combinations

of core promoter elements exist, with some shown to act in concert and hence affect the potency of

the transcriptional outcome [10,25]. It is therefore important to consider all the elements present

within the same promoter in order to assess its transcriptional strength.

Manual annotation of experimentally-validated Drosophila promoters for the presence of TATA-box,

Initiator and DPE was previously described [23]. This analysis includes 205 promoters, whose TSSs

were empirically determined. This mapping of core promoter elements has facilitated the discovery

that the Drosophila Hox gene network is regulated via the DPE [26]. A more comprehensive analysis

of the whole Drosophila transcriptome revealed that DPE-containing genes are conserved and highly

prevalent among the target genes of Dorsal, a key regulator of dorsal-ventral axis formation [11].

These examples demonstrate that the comprehensive annotation of core promoter elements in each

transcript can greatly advance the understanding of gene expression regulation.

Methods and Results

We have constructed CORE, a database of all RefSeq-defined Drosophila melanogaster transcripts,

annotated for the presence of TATA-box, Drosophila Initiator and downstream core promoter

element (DPE) (File S1). The database is downloadable at http://lifefaculty.biu.ac.il/gershon-tamar/index.php/core-description.

All the annotated Drosophila transcripts initiating at the same nucleotide were treated as a single

TSS. For a given TSS, an Initiator score was calculated for each position from -50 to +50 relative to +1

of the RefSeq TSS. For a position between -10 and +10 relative to the RefSeq’s TSS, each adenosine

was examined as a potential A+1, and was assigned a score based on its nucleotide match to the

consensus Drosophila initiator sequence (Table 1). Only a match of 4 out of 6 nucleotides was

considered for further analysis. Two putative initiators were determined for each RefSeq TSS, with

the first priority Inr located closer to the annotated TSS. DPE motifs were calculated for each putative

initiator position by scoring the sequence that is precisely located at +28 to +33 relative to the A+1

of the corresponding initiator, based on a match to the DPE functional range set (DSWYVY; an

experimentally defined broad DPE consensus [23], presented in Table 1).
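
The DPE scoring step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used to build CORE: the function names and the 0-based indexing convention are assumptions made here, and only the DSWYVY functional range set and the +28 to +33 window are taken from the text. Because promoter coordinates have no position 0, the +28 position lies 27 bases downstream of the A+1.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "Y": "CT", "R": "AG", "K": "GT", "M": "AC",
    "D": "AGT", "V": "ACG", "H": "ACT", "B": "CGT", "N": "ACGT",
}

# DPE functional range set quoted in the text; it spans +28 to +33.
DPE_FUNCTIONAL_RANGE = "DSWYVY"


def consensus_matches(site, consensus):
    """Count positions at which `site` agrees with an IUPAC `consensus`."""
    return sum(base in IUPAC[code] for base, code in zip(site.upper(), consensus))


def dpe_match(seq, a_plus1_index):
    """Score the +28..+33 window for an A+1 given as a 0-based index.

    Hypothetical helper: returns (site, number_of_matches), or None when
    the window runs past the end of the sequence.
    """
    start = a_plus1_index + 27  # +28 is 27 bases downstream of +1
    site = seq[start:start + len(DPE_FUNCTIONAL_RANGE)]
    if len(site) < len(DPE_FUNCTIONAL_RANGE):
        return None
    return site, consensus_matches(site, DPE_FUNCTIONAL_RANGE)
```

Tying the window to each candidate A+1, rather than scanning the whole sequence, is what enforces the strict Inr-DPE spacing.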

The presence of TATA-box motifs was determined by searching for a 4-nucleotide ‘TATA’ sequence

match in the region between -45 and -19 relative to the RefSeq +1 position. This loose criterion was

used in order to avoid missing functional TATA box-containing promoters that do not match the

8-nucleotide-long consensus (TATAWAAR). Furthermore, the frequencies of the TATA-box, Drosophila

initiator and DPE motifs among the Drosophila transcripts were summarized.
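
The loose TATA criterion can be illustrated with a short Python sketch. It is not the CORE implementation: the function name is hypothetical, the +1 position is passed as a 0-based index, and the sketch assumes that the entire 4-nucleotide match must lie within the -45 to -19 window, which the text does not state explicitly.

```python
def find_loose_tata(seq, plus1_index, upstream=45, downstream=19):
    """Return promoter coordinates (negative integers) of 'TATA' matches.

    The window covers positions -upstream to -downstream relative to +1.
    Assumption: the whole 4-mer must fall inside the window.
    """
    lo = max(0, plus1_index - upstream)   # index of position -45 (clamped)
    hi = plus1_index - downstream + 1     # exclusive end, just past -19
    if hi <= lo:
        return []
    window = seq[lo:hi].upper()
    hits = []
    pos = window.find("TATA")
    while pos != -1:
        # Convert the window offset back to a coordinate relative to +1.
        hits.append((lo + pos) - plus1_index)
        pos = window.find("TATA", pos + 1)  # allow overlapping matches
    return hits
```

A reported coordinate of -30, for example, means that the first T of the match lies 30 bases upstream of the +1 position.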

Overall, in addition to a comprehensive analysis of the core promoter composition of Drosophila

melanogaster transcripts, the CORE provides clues (based on the core promoter composition) with

regards to an optimal TSS.

However, a drawback of the CORE database is the use of RefSeq’s annotation of 5’ ends, which is not

always in accordance with published expressed sequence tags (ESTs), FlyBase annotation and high

throughput transcription data.

Prediction of promoter elements that affect the transcriptional output, in the absence of

experimental validation, is a difficult task. Although high throughput transcription data, such as cap

analysis gene expression (CAGE) and genomic run-on assay followed by deep sequencing (GRO-seq)

exist, the RefSeq annotation is still considered the “gold standard” (see Discussion). The majority of

currently available promoter prediction programs search for over-represented motifs in a given set of

promoter sequences, rather than known core promoter elements [27-29]. Most of these programs

utilize other features, such as transcription factor binding sites, physical properties of the DNA, DNA

accessibility, RNA polymerase II occupancy and various epigenetic markers [29-35]. However, even

available programs, such as McPromoter [36] and Eukaryotic Core Promoter Predictor (YAPP,

http://www.bioinformatics.org/yapp/cgi-bin/yapp.cgi), rarely consider the strict spacing required by

the Inr-dependent elements, namely, DPE, MTE and Bridge.

The selection of promoters that comprise the data set used to predict core promoter elements

based on position weight matrices (PWMs) is of pivotal importance, as subtle variations in the

sequences may generate completely different PWMs [31]. Optimized algorithms, such as XXmotif,

can be used to accurately construct a PWM for over-represented motifs within the given set of

sequences [37,38]. Unfortunately, even a perfect model that is only based on sequence features,

cannot exclusively account for the observed transcriptional activity, as most of the sequence motifs

are short and redundant, and can thus be found in many non-transcriptionally active regions of the

genome [31]. Using experimentally-validated sequences rather than over-represented motifs, can

greatly enhance the strength of the prediction program, but cannot fully guarantee the accuracy of

the prediction. Nevertheless, experimental readout of transcription strength and start sites resulting

from a mutated promoter sequence is still not performed on a high-throughput scale; hence, the

currently available experimental results are prone to contain a statistical bias.

Notably, none of the available resources, including CORE, allow the identification of all the core

promoter elements presented in Table 1 and their potential combinations within a given sequence.

Moreover, the known biologically functional sequences may slightly differ from the determined

consensus, and therefore the detection of candidate core promoter elements cannot be easily

performed using currently available resources.

In order to facilitate the joint identification of the vast majority of core promoter elements and their

biologically-relevant combinations within a sequence, we developed the Elements Navigation Tool

(ElemeNT). ElemeNT is a web-based, interactive tool (implemented in Perl) for rapid and convenient

detection of core promoter elements and their combinations within any given sequence. It is

accessible at http://lifefaculty.biu.ac.il/gershon-tamar/index.php/element-description, where the

source files can be downloaded as well. ElemeNT searches the input sequences for the presence of

the core promoter elements specified by the user (Figure 2). The elements are represented by

PWMs, which are constructed based on the validated biologically functional sequences (File S2, Table

1). Notably, for some elements, the PWMs differ from the defined consensus sequences, reflecting

differences in the analyses of the sequences.

The elements that can be searched for are: Mammalian Initiator, Drosophila Initiator, TATA box,

MTE, DPE, Bridge, BREu, BREd, Human TCT, Drosophila TCT, XCPE1 and XCPE2 (Table 1, Figure 1).

Notably, the MTE, DPE and Bridge motifs are only calculated at the precise location relative to each

detected mammalian/Drosophila Initiator, based on the known strict spacing requirement that is

crucial for these elements to be functional. The scores are normalized to be between 0 and 1,

generating more intuitively interpretable results. For each element, the user should specify a

threshold between 0 and 1, which determines whether the element is present or not at a position.

Default threshold values were empirically determined for each element, based on known functional

sequence elements, and are provided.

For a PWM matrix P with k columns, the PWM score is calculated for each sub-sequence of length k

(k-mer) in the sequences, by multiplying the appropriate values of the PWM for each consecutive

position, as follows:

$$\mathrm{PWM\_SCORE}(S_{i+1:i+k}, P) = \prod_{j=1}^{k} P'(j, S_{i+j}),$$

where $S_{i+1:i+k}$ is a k-mer starting at position i+1 in sequence S and $P'(j, x)$ is the probability for nucleotide x at position j in P, normalized so that for a given j, $\max_x \{P'(j, x)\} = 1$. The role of this normalization is to guarantee that the final PWM score

for every element is between 0 and 1, irrespective of the PWM’s parameters. Each sub-sequence

with a score exceeding the specified threshold is termed ‘hit’. The score is calculated for

$0 \le i \le n-k$, where n is the length of the input sequence S, and hits are displayed in a list sorted in

descending score order for each element. Consensus match scores, which are the number of base

matches of the hit to the motif’s consensus, are also reported for each hit (Table 1).
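
The scoring scheme described above (column-wise normalization, a product over the k positions, a threshold, and sorting by descending score) can be sketched in Python as follows. ElemeNT itself is implemented in Perl, so this is only an illustrative reimplementation under stated assumptions: the function names are hypothetical, the PWM is represented as a list of per-position dictionaries, and nucleotides missing from a column are given a score of 0.

```python
from math import prod


def normalize_pwm(pwm):
    """Scale each column so that its largest entry equals 1 (P' in the text)."""
    return [{base: p / max(col.values()) for base, p in col.items()} for col in pwm]


def pwm_hits(seq, pwm, threshold):
    """Return (1-based start, k-mer, score) tuples with score > threshold.

    Hits are sorted by descending score; ties keep their order in the sequence.
    """
    norm = normalize_pwm(pwm)
    k = len(norm)
    hits = []
    for i in range(len(seq) - k + 1):  # 0 <= i <= n - k
        kmer = seq[i:i + k].upper()
        # Product of the normalized probabilities at consecutive positions.
        score = prod(col.get(base, 0.0) for col, base in zip(norm, kmer))
        if score > threshold:
            hits.append((i + 1, kmer, score))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits
```

Because every normalized column has a maximum of 1, a k-mer carrying the most probable nucleotide at every position scores exactly 1, and every other k-mer scores lower.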

The output of the program contains the analyzed sequence, a color display of some possible core

promoter element combinations found, and a table containing each of the detected elements

alongside its position, PWM and consensus match scores.

In order to indicate potential synergism between elements that may inspire further exploration,

suggested combinations of core promoter elements are displayed. The elements that are considered

to form possible combinations are any combination of the following: 1) the mammalian/Drosophila

Initiator and either the MTE, DPE or Bridge motifs, 2) TATA box and mammalian/Drosophila Initiator,

3) TATA box and either BREu or BREd (Figure 3A).

In the output table, the elements are ordered based on the type of elements, and then sorted by

PWM scores (Figure 3B). The MTE, DPE and Bridge motifs, which are strictly dependent on the

presence of a functional initiator [4-6,19,20,22,24], are displayed immediately below the

corresponding initiator. For TATA box motifs, a message is displayed if the specific TATA-box is

located 26 to 40 bp upstream of the A+1 of the initiator. In addition, a message is displayed if a BREu

or BREd is located in close proximity to the specific TATA-box [39-41].
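
As an illustration of the TATA-initiator spacing check, the following Python sketch pairs detected TATA boxes with initiators. It is a hypothetical helper, not part of ElemeNT: positions are plain sequence indices, and the sketch assumes that the 26 to 40 bp distance is measured from the first nucleotide of the TATA box to the A+1, a convention the text does not specify.

```python
def tata_inr_pairs(tata_starts, inr_a_plus1, min_gap=26, max_gap=40):
    """Pair each TATA box with every initiator whose A+1 lies 26 to 40 bp downstream.

    Both inputs are lists of sequence indices; the output lists
    (tata_start, a_plus1, distance) tuples for the compatible pairs.
    """
    pairs = []
    for tata in tata_starts:
        for a_plus1 in inr_a_plus1:
            distance = a_plus1 - tata  # positive when the TATA box is upstream
            if min_gap <= distance <= max_gap:
                pairs.append((tata, a_plus1, distance))
    return pairs
```

For example, `tata_inr_pairs([10, 100], [40, 45, 200])` pairs the TATA box at index 10 with the initiators at 40 and 45, and reports nothing for the TATA box at index 100.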

To assess the functionality of the ElemeNT program, several experimentally-validated core promoter

sequences were analyzed by the program. The analysis of the Drosophila Inr is presented as an

example (Figure S1A). As expected, lower threshold values generated a larger number of hits, including

the correct ones. False negative hits were scored as well, based on missed motifs. The intersection of

both parameters (discovery rate versus false positives) had a strong correlation with scores obtained

for previously validated motifs’ sequence variations [23].

An additional evaluated parameter was the length of the sequence, which directly affects the

number of discovered motifs, as they are mostly composed of redundant sequences. Since only one

motif was validated in each input sequence, the other discovered motifs were considered as false

positives. With a cut-off of 0.1, the results vary little with sequence length up to ~300 bp;

however, 40% of the sequences are not detected. With a cut-off of 0.01, the program detects all the

correct Inr elements; however, the number of false positive hits increases linearly. Hence, the

ElemeNT program performs better for short sequences (Figure S1B). Taken together, both the CORE

database and the ElemeNT program present new improved tools to assess the presence of core

promoter elements within a given DNA sequence.

Discussion

Core promoter elements, located in the immediate vicinity of the TSS, were demonstrated to greatly

affect the transcriptional output [6,7]. The majority of these motifs were identified as elements that

are recognized by components of the preinitiation complex [19,39,40,42,43]. In addition, many

statistically overrepresented motifs were identified in the region surrounding annotated TSS [44-46],

with some of them being experimentally demonstrated to exert an effect on transcriptional outcome

[24], or bind transcription-regulating proteins [47]. However, the analysis of large-scale experiments

involves critical decision making and hence, might be prone to errors.

The determination of actual TSSs, which influence the motifs discovered in their vicinity, is a critical

factor in the prediction of core promoter elements. The comprehensive determination of TSS

provided by RefSeq is based on the alignment of sequenced high-quality RNA [48]. However, the

start site of the same gene can differ across the various developmental stages, tissues, and time

points sampled, which poses a great challenge for integration of the data provided by different

studies.

A wealth of novel high-throughput techniques to identify features and sequences that might affect

transcription is rapidly evolving; these include PEAT [49], 5' RACE [50], CAGE [51], FAIRE-seq [52],

ChIP-seq [53], and GRO-seq [54]. The above techniques are applied by major projects and consortia,

which are aimed to dissect the rules governing transcriptional regulation, including ENCODE [55],

modENCODE [56], and FANTOM5 [57], as well as other genome-wide studies [58,59]. These different

strategies complement each other and together introduce a much more complicated view of

RNA transcription initiation than previously anticipated [60].

Furthermore, core promoter elements are associated with focused, rather than dispersed,

transcription [6], while the classification of promoters into these classes is largely lacking. A careful and

comprehensive examination of the already available CAGE [61-63] and GRO-seq [60,64] data might

provide the answer to this question in the near future. As additional data becomes available, it will

be of utmost importance to integrate the information and perhaps, re-define transcription start sites.

Insights gained during the integration process will enable the re-evaluation of current tools.

The CORE database, which relies on the RefSeq’s annotation, should be revisited in the future, when

new standardized data for transcription start sites becomes available. The overall distribution of TATA

box, Inr and DPE motifs among the Drosophila transcripts might change then, providing novel and

exciting aspects of gene regulation.

The uniqueness of the ElemeNT program, as compared to other promoter-prediction software, is its

major focus on biologically-functional core promoter elements, manifested by two major concepts

that lie at the foundation of the ElemeNT algorithm. The first is the exclusive use of experimentally

validated core promoter motifs, rather than overrepresented motifs, to construct the PWMs used.

The second is the obligatory presence of an initiator, and the strict spacing for the downstream

promoter elements MTE, DPE and Bridge. Both the presence of a functional initiator and the strict

spacing are crucial for the functionality of the downstream elements, and are frequently omitted by

other core promoter element prediction programs available [27,29,32,35,36]. Moreover, the

identification of combinations of elements, which were experimentally demonstrated to result in

synergistic effects [10,25], may spark new research directions. Despite the fact that the presence of

potential core promoter elements, or any combination of them, may not necessarily imply that the

elements are functional, their presence might indicate that the specific genomic locus is

transcriptionally active. However, in contrast to most of the available promoter prediction programs,

ElemeNT is not designed to produce or analyze genome-scale data, but is rather intended to

narrow down a given region of interest, considering the currently available, experimentally-validated

information about core promoter motifs themselves. The redundancy of the core promoter motifs

leads to the identification of sequences that perfectly match functionally-verified sequences, yet are

not functional. Based on experience with transcription factor binding motifs [65], sorting out only

the functionally-relevant hits might prove to be a difficult task. Future modifications of the algorithm

finding core promoter elements will be based on new insights and a better understanding of

transcription regulation, obtained by the abovementioned techniques and consortia.

Importantly, the ElemeNT program can assist in the analysis of sequences from organisms whose

TSSs have not yet been comprehensively defined. For example, both the TATA box and the BRE

motifs are conserved from archaebacteria to humans [66], and many organisms whose

transcriptomes have not been annotated are likely to contain such core promoter elements. To

conclude, we anticipate that the ElemeNT program, along with the CORE database, will make the

search for specific core promoter elements and their combinations within Drosophila transcripts or

any sequence of interest accessible to scientists, and will help in elucidating the major role core

promoter elements play in gene expression.

Acknowledgments

We thank Marina Socol, Boris Komraz and Dr. Eli Sloutskin for invaluable assistance in ElemeNT

development and web execution. We thank Dr. Diana Ideses, Dan Even, Adi Kedmi, Hila Shir-Shapira

and Gal Nuta for critical reading of the manuscript.

Funding Statement

This research was supported by grants from the Israel Science Foundation to T.J-G (no. 798/10) and

R.S (no. 317/13) and the European Union Seventh Framework Programme (Marie Curie International

Reintegration Grant) to T.J-G (no. 256491). Y.O was supported by the Edmond J. Safra Center for

Bioinformatics at Tel-Aviv University and the Israeli Center for Research Excellence (I-CORE), Gene

Regulation in Complex Human Disease, center 41/11.

References

1. Smale ST (2001) Core promoters: active contributors to combinatorial gene regulation. Genes & Development 15: 2503-2508.

2. Smale ST, Kadonaga JT (2003) The RNA polymerase II core promoter. Annual Review of Biochemistry 72: 449-479.

3. Heintzman ND, Ren B (2007) The gateway to transcription: identifying, characterizing and understanding promoters in the eukaryotic genome. Cellular and Molecular Life Sciences 64: 386-400.

4. Juven-Gershon T, Hsu J-Y, Theisen JWM, Kadonaga JT (2008) The RNA polymerase II core promoter - the gateway to transcription. Current Opinion in Cell Biology 20: 253-259.

5. Juven-Gershon T, Kadonaga JT (2010) Regulation of gene expression via the core promoter and the basal transcriptional machinery. Dev Biol 339: 225-229.

6. Kadonaga JT (2012) Perspectives on the RNA polymerase II core promoter. Wiley Interdiscip Rev Dev Biol 1: 40-51.

7. Lenhard B, Sandelin A, Carninci P (2012) Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nature Reviews Genetics 13: 233-245.

8. Muller F, Demeny MA, Tora L (2007) New problems in RNA polymerase II transcription initiation: matching the diversity of core promoters with a variety of promoter recognition factors. J Biol Chem 282: 14685-14689.

9. Muller F, Tora L (2014) Chromatin and DNA sequences in defining promoters for transcription initiation. Biochim Biophys Acta 1839: 118-128.

10. Juven-Gershon T, Cheng S, Kadonaga JT (2006) Rational design of a super core promoter that enhances gene expression. Nature Methods 3: 917-922.

11. Zehavi Y, Kuznetsov O, Ovadia-Shochat A, Juven-Gershon T (2014) Core promoter functions in the regulation of gene expression of Drosophila dorsal target genes. J Biol Chem 289: 11993-12004.

12. Zehavi Y, Sloutskin A, Kuznetsov O, Juven-Gershon T (2014) The core promoter composition establishes a new dimension in developmental gene networks. Nucleus 5.

13. Butler JE, Kadonaga JT (2001) Enhancer-promoter specificity mediated by DPE or TATA core promoter motifs. Genes Dev 15: 2515-2519.

14. Dikstein R (2011) The unexpected traits associated with core promoter elements. Transcription 2: 201-206.

15. Thomas MC, Chiang CM (2006) The general transcription machinery and general cofactors. Critical Reviews in Biochemistry and Molecular Biology 41: 105-178.

16. He Y, Fang J, Taatjes DJ, Nogales E (2013) Structural visualization of key steps in human transcription initiation. Nature 495: 481-486.

17. Grunberg S, Hahn S (2013) Structural insights into transcription initiation by RNA polymerase II. Trends Biochem Sci 38: 603-611.

18. Cianfrocco MA, Kassavetis GA, Grob P, Fang J, Juven-Gershon T, et al. (2013) Human TFIID binds to core promoter DNA in a reorganized structural state. Cell 152: 120-131.

19. Burke TW, Kadonaga JT (1996) Drosophila TFIID binds to a conserved downstream basal promoter element that is present in many TATA-box-deficient promoters. Genes & Development 10: 711-724.

20. Burke TW, Kadonaga JT (1997) The downstream core promoter element, DPE, is conserved from Drosophila to humans and is recognized by TAF(II)60 of Drosophila. Genes & Development 11: 3020-3031.

21. Wu CH, Madabusi L, Nishioka H, Emanuel P, Sypes M, et al. (2001) Analysis of core promoter sequences located downstream from the TATA element in the hsp70 promoter from Drosophila melanogaster. Mol Cell Biol 21: 1593-1602.

22. Theisen JW, Lim CY, Kadonaga JT (2010) Three key subregions contribute to the function of the downstream RNA polymerase II core promoter. Mol Cell Biol 30: 3471-3479.

23. Kutach AK, Kadonaga JT (2000) The downstream promoter element DPE appears to be as widely used as the TATA box in Drosophila core promoters. Mol Cell Biol 20: 4754-4764.

24. Lim CY, Santoso B, Boulay T, Dong E, Ohler U, et al. (2004) The MTE, a new core promoter element for transcription by RNA polymerase II. Genes & Development 18: 1606-1617.

25. Gershenzon NI, Ioshikhes IP (2005) Synergy of human Pol II core promoter elements revealed by statistical sequence analysis. Bioinformatics 21: 1295-1300.

26. Juven-Gershon T, Hsu J-Y, Kadonaga JT (2008) Caudal, a key developmental regulator, is a DPE-specific transcriptional factor. Genes & Development 22: 2823-2830.

27. Bajic VB, Tan SL, Suzuki Y, Sugano S (2004) Promoter prediction analysis on the whole human genome. Nat Biotechnol 22: 1467-1473.

28. Frith MC, Valen E, Krogh A, Hayashizaki Y, Carninci P, et al. (2008) A code for transcription initiation in mammalian genomes. Genome Res 18: 1-12.

29. Narlikar L, Ovcharenko I (2009) Identifying regulatory elements in eukaryotic genomes. Brief Funct Genomic Proteomic 8: 215-230.

30. Ohler U, Niemann H (2001) Identification and analysis of eukaryotic promoters: recent computational approaches. Trends Genet 17: 56-60.

31. Pedersen AG, Baldi P, Chauvin Y, Brunak S (1999) The biology of eukaryotic promoter prediction--a review. Comput Chem 23: 191-207.

32. Rach EA, Winter DR, Benjamin AM, Corcoran DL, Ni T, et al. (2011) Transcription initiation patterns indicate divergent strategies for gene regulation at the chromatin level. PLoS Genet 7: e1001274.

33. Duran E, Djebali S, Gonzalez S, Flores O, Mercader JM, et al. (2013) Unravelling the hidden DNA structural/physical code provides novel insights on promoter location. Nucleic Acids Res 41: 7220-7230.

34. Abeel T, Saeys Y, Rouze P, Van de Peer Y (2008) ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles. Bioinformatics 24: i24-31.

35. Datta S, Mukhopadhyay S (2013) A composite method based on formal grammar and DNA structural features in detecting human polymerase II promoter region. PLoS One 8: e54843.

36. Ohler U (2006) Identification of core promoter modules in Drosophila and their application in accurate transcription start site prediction. Nucleic Acids Res 34: 5943-5950.

37. Hartmann H, Guthohrlein EW, Siebert M, Luehr S, Soding J (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res 23: 181-194.

38. Luehr S, Hartmann H, Soding J (2012) The XXmotif web server for eXhaustive, weight matriX-based motif discovery in nucleotide sequences. Nucleic Acids Res 40: W104-109.

39. Deng W, Roberts SG (2005) A core promoter element downstream of the TATA box that is recognized by TFIIB. Genes Dev 19: 2418-2423.

40. Lagrange T, Kapanidis AN, Tang H, Reinberg D, Ebright RH (1998) New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor IIB. Genes Dev 12: 34-44.

41. Deng W, Roberts SG (2007) TFIIB and the regulation of transcription by RNA polymerase II. Chromosoma 116: 417-429.

42. Chalkley GE, Verrijzer CP (1999) DNA binding site selection by RNA polymerase II TAFs: a TAF(II)250-TAF(II)150 complex recognizes the initiator. EMBO J 18: 4835-4845.

43. Tokusumi Y, Ma Y, Song X, Jacobson RH, Takada S (2007) The new core promoter element XCPE1 (X Core Promoter Element 1) directs activator-, mediator-, and TATA-binding protein-dependent but TFIID-independent RNA polymerase II transcription from TATA-less promoters. Mol Cell Biol 27: 1844-1858.

44. FitzGerald PC, Sturgill D, Shyakhtenko A, Oliver B, Vinson C (2006) Comparative genomics of Drosophila and human core promoters. Genome Biol 7: R53.

45. Ohler U, Liao GC, Niemann H, Rubin GM (2002) Computational analysis of core promoters in the Drosophila genome. Genome Biol 3: RESEARCH0087.

46. Xi H, Yu Y, Fu Y, Foley J, Halees A, et al. (2007) Analysis of overrepresented motifs in human core promoters reveals dual regulatory roles of YY1. Genome Res 17: 798-806.

47. Li J, Gilmour DS (2013) Distinct mechanisms of transcriptional pausing orchestrated by GAGA factor and M1BP, a novel transcription factor. EMBO J 32: 1829-1841.

48. Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501-504.

49. Ni T, Corcoran DL, Rach EA, Song S, Spana EP, et al. (2010) A paired-end sequencing strategy to map the complex landscape of transcription initiation. Nat Methods 7: 521-527.

50. Frohman MA, Dush MK, Martin GR (1988) Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proc Natl Acad Sci U S A 85: 8998-9002.

51. Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, et al. (2003) Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci U S A 100: 15776-15781.

52. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD (2007) FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res 17: 877-885.

53. Furey TS (2012) ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Nat Rev Genet 13: 840-852.

54. Core LJ, Waterfall JJ, Lis JT (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322: 1845-1848.

55. (2004) The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306: 636-640.

56. Washington NL, Stinson EO, Perry MD, Ruzanov P, Contrino S, et al. (2011) The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details. Database (Oxford) 2011: bar023.

57. Forrest AR, Kawaji H, Rehli M, Baillie JK, de Hoon MJ, et al. (2014) A promoter-level mammalian expression atlas. Nature 507: 462-470.

58. Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y, et al. (2007) Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet 8: 424-436.

59. Lenhard B, Sandelin A, Carninci P (2012) Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat Rev Genet 13: 233-245.

60. Core LJ, Martins AL, Danko CG, Waters CT, Siepel A, et al. (2014) Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat Genet 46: 1311-1320.

61. Consortium F, the RP, Clst, Forrest AR, Kawaji H, et al. (2014) A promoter-level mammalian expression atlas. Nature 507: 462-470.

62. Hoskins RA, Landolin JM, Brown JB, Sandler JE, Takahashi H, et al. (2011) Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res 21: 182-192.

63. Nechaev S, Fargo DC, dos Santos G, Liu L, Gao Y, et al. (2010) Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science 327: 335-338.

64. Saunders A, Core LJ, Sutcliffe C, Lis JT, Ashe HL (2013) Extensive polymerase pausing during Drosophila axis patterning enables high-level and pliable transcription. Genes Dev 27: 1146-1158.

65. Shlyueva D, Stampfel G, Stark A (2014) Transcriptional enhancers: from properties to genome-wide predictions. Nat Rev Genet 15: 272-286.

66. Reeve JN (2003) Archaeal chromatin and transcription. Mol Microbiol 48: 587-598.

67. Smale ST, Baltimore D (1989) The "initiator" as a transcription control element. Cell 57: 103-113.

68. Goldberg ML (1979) Ph.D. Thesis. Sequence analysis of Drosophila histone genes.

69. Parry TJ, Theisen JWM, Hsu J-Y, Wang Y-L, Corcoran DL, et al. (2010) The TCT motif, a key component of an RNA polymerase II transcription system for the translational machinery. Genes & Development 24: 2013-2018.

70. Anish R, Hossain MB, Jacobson RH, Takada S (2009) Characterization of transcription from TATA-less promoters: identification of a new core promoter element XCPE2 and analysis of factor requirements. PLoS One 4: e5103.

71. Lewis BA, Kim TK, Orkin SH (2000) A downstream element in the human beta-globin promoter: evidence of extended sequence-specific transcription factor IID contacts. Proc Natl Acad Sci U S A 97: 7172-7177.

72. Lee DH, Gershenzon N, Gupta M, Ioshikhes IP, Reinberg D, et al. (2005) Functional characterization of core promoter elements: the downstream core element is recognized by TAF1. Mol Cell Biol 25: 9674-9686.

Figure 1. Schematic representation of the major core promoter elements. The region of the core

promoter area (-40 to +40 relative to the TSS) is illustrated. The diagram is roughly to scale, and each

element is colored according to its color in the output table (see Figure 3B).

Figure 2. Flow diagram of the ElemeNT calculation process. The flowchart demonstrates the input,

processing and output steps of the ElemeNT program. The input consists of a set of sequences and

the elements to search for with their corresponding thresholds. ElemeNT calculates hits for each

element, and considers possible combinations. The output includes combinations of core promoter

elements and a table containing all the identified elements, their location, PWM score and consensus

match score.

Figure 3. A sample output of the ElemeNT program. (A) A screen-shot of the sample input sequence and

the combinations of elements identified in it. ElemeNT has detected a TATA box flanked by both a BREu

element and a BREd element, Drosophila and mammalian initiator elements and MTE, DPE and Bridge

elements. The two possible combinations result from a sequence match to both the Drosophila and

mammalian initiators, due to the partial sequence redundancy of the two elements. (B) The table

displaying all the elements identified within the sample input sequence, their location, PWM and

consensus match scores. Note the message displayed for the TATA box, indicating the presence of

mammalian and Drosophila initiator, as well as BREu and BREd, at optimal distances for transcriptional

synergy.

Figure S1. Determination of optimal parameters for the ElemeNT program. (A) Evaluation of discovery

rates. False positive hits (wrong/seq) and discovery rates (calculated based on the missed/seq) for

Drosophila initiator motif were scored as a function of the cutoff used. n=43, sequence length = 50 bp.

(B) Assessment of optimal sequence length. Sequences of different length containing individual

experimental TSS were scored for false positive hits of Drosophila initiator motif. False positive hits

are found in correlation with both sequence length and cutoff value. n=34.
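
The evaluation summarized in Figure S1A can be sketched as a simple cutoff sweep. This is an illustrative reconstruction under stated assumptions, not the procedure actually used: the function name and input layout are hypothetical, a hit is counted as wrong when it passes the cutoff at any position other than the experimentally determined one, and the discovery rate is assumed to equal 1 minus missed/seq.

```python
def evaluate_cutoffs(scored_sequences, cutoffs):
    """Sweep score cutoffs over sequences with a known true site.

    scored_sequences: list of (true_position, {position: score}) tuples,
    one per sequence (hypothetical layout, chosen for the example).
    Returns one dict per cutoff with wrong/seq and the assumed discovery rate.
    """
    n = len(scored_sequences)
    rows = []
    for cutoff in cutoffs:
        wrong = 0
        missed = 0
        for true_pos, scores in scored_sequences:
            called = {pos for pos, score in scores.items() if score >= cutoff}
            wrong += len(called - {true_pos})  # hits away from the true site
            if true_pos not in called:
                missed += 1                    # true site not recovered
        rows.append({"cutoff": cutoff,
                     "wrong_per_seq": wrong / n,
                     "discovery_rate": 1.0 - missed / n})
    return rows
```

Raising the cutoff in such a sweep can only remove hits, so wrong/seq never increases and the discovery rate never increases; the chosen cutoff is a trade-off between the two.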

Abstract

Proper regulation of the transcription process is essential for the development of a living organism,

and is also responsible for the differences between the various cell types. Transcription is regulated

at several levels, including the binding of specific transcription factors to DNA in the enhancer

region, which can be located millions of bases away from the transcription start site. The RNA itself

is produced by the enzyme RNA polymerase II, which binds to the transcription start region together

with the general transcription factors. The region bound by the proteins responsible for transcription

is called the core promoter. The core promoter is defined as a region of 80 bases, from -40 to +40

relative to the transcription start site, the +1. The core promoter contains various sequence elements

(core promoter elements), which confer different transcriptional properties on the promoter in which

they reside.

The proposed research will examine the importance of the various core promoter elements, mainly the

downstream core promoter element and the TATA box, for transcriptional regulation in the fruit fly,

Drosophila melanogaster. The research will make extensive use of the well-developed molecular and

genetic tools available, as well as various bioinformatic analyses. The activity of the core promoter

elements will be examined in the natural context, within the whole organism. An additional aspect to be

examined is the spatial connection between the different promoter types and their enhancers, which are

located tens of thousands of bases away from them. The differential connectivity of the promoters

constitutes an additional layer of transcriptional regulation, and thus adds to our understanding of

the regulation of gene transcription in general, and during embryonic development in particular.

In summary, through the proposed research we aim to establish the hypothesis that, in addition to the

information currently available, the composition of the core promoter constitutes an important layer

of transcriptional regulation.