modeling the regulatory network of histone acetylation in saccharomyces cerevisiae

10
Modeling the regulatory network of histone acetylation in Saccharomyces cerevisiae Hung Pham 1 , Roberto Ferrari 2 , Shawn J Cokus 1 , Siavash K Kurdistani 2 and Matteo Pellegrini 1, * 1 Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA, USA and 2 Department of Biological Chemistry, University of California, Los Angeles, CA, USA * Corresponding author. Department of Molecular, Cell and Developmental Biology, 621 Charles Young Drive South, University of California, Los Angeles, CA 90095, USA. Tel.: þ 1 310 825 0012; Fax: þ 1 310 206 3987; E-mail: [email protected] Received 20.8.07; accepted 17.10.07 Acetylation of histones plays an important role in regulating transcription. Histone acetylation is mediated partly by the recruitment of specific histone acetyltransferases (HATs) and deacetylases (HDACs) to genomic loci by transcription factors, resulting in modulation of gene expression. Although several specific interactions between transcription factors and HATs and HDACs have been elaborated in Saccharomyces cerevisiae, the full regulatory network remains uncharacterized. We have utilized a linear regression of optimized sigmoidal functions to correlate transcription factor binding patterns to the acetylation profiles of 11 lysines in the four core histones measured at all S. cerevisiae promoters. The resulting associations are combined with large-scale protein–protein interaction data sets to generate a comprehensive model that relates recruitment of specific HDACs and HATs to transcription factors and their target genes and the resulting effects on individual lysines. This model provides a broad and detailed view of the regulatory network, describing which transcription factors are most significant in regulating acetylation of specific lysines at defined promoters. We validate the model, both computationally and experimentally, to demonstrate that it yields accurate predictions of these regulatory mechanisms. Molecular Systems Biology 18 December 2007; doi:10.1038/msb4100194 Subject Categories: chromatin & transcription Keywords: histone acetylation; linear regression; networks; regulatory network This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits distribution and reproduction in any medium, provided the original author and source are credited. Creation of derivative works is permitted but the resulting work may be distributed only under the same or similar licence to this one. This licence does not permit commercial exploitation without specific permission. Introduction Transcriptional regulation is a highly complex process that involves not only the recruitment of polymerases to active promoters but also the remodeling of chromatin. Within silent chromatin, DNA is tightly wrapped around histone proteins (Richmond and Davey, 2003) leading to transcriptional suppression. In contrast, in active promoters, the chromatin state is more open, enabling access to the transcription machinery. One of the mechanisms used to control the state of chromatin structure is the modification of histones (Kurdistani and Grunstein, 2003). Histone acetyltransferases (HATs) and histone deacetylases (HDACs) alter the charge state of lysines in histone tails by adding and removing acetyl groups. The pattern of these modifications, along with methylation, phosphorylation and ubiquitination, contributes to establish a proposed histone code that regulates transcrip- tion through alterations of chromatin structure and generation of binding sites for chromatin-interacting factors (Strahl and Allis, 2000; Jenuwein and Allis, 2001). Histone acetylation patterns at individual genomic loci are in large part determined by the recruitment of specific HATs and HDACs by transcription factors (TFs) (Kurdistani and Grunstein, 2003). For instance, the Ume6 TF recruits the Rpd3 HDAC to its target genes, leading to repression of the downstream genes (Rundlett et al, 1998). Although a few interactions between HATs and HDACs and specific TFs have been established, we do not yet know the full network of interactions between these and the effect that such interactions have on establishing acetylation patterns on histone tails. While TF binding has been correlated with histone acetylation patterns, a regulatory network that controls histone acetylation levels has not been deter- mined (Guo et al, 2006). Therefore, we set out to utilize genome-wide TF-binding and acetylation data, along with protein–protein interaction data, to determine this network on a genome-wide scale. To construct a comprehensive network, we utilized two data sets that provide a systematic measurement of the acetylation levels of 11 lysines (Kurdistani et al, 2004) and the binding & 2007 EMBO and Nature Publishing Group Molecular Systems Biology 2007 1 Molecular Systems Biology 3; Article number 153; doi:10.1038/msb4100194 Citation: Molecular Systems Biology 3:153 & 2007 EMBO and Nature Publishing Group All rights reserved 1744-4292/07 www.molecularsystemsbiology.com

Upload: blogspot

Post on 28-Nov-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

Modeling the regulatory network of histone acetylationin Saccharomyces cerevisiae

Hung Pham1, Roberto Ferrari2, Shawn J Cokus1, Siavash K Kurdistani2 and Matteo Pellegrini1,*

1 Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA, USA and 2 Department of Biological Chemistry, University ofCalifornia, Los Angeles, CA, USA* Corresponding author. Department of Molecular, Cell and Developmental Biology, 621 Charles Young Drive South, University of California, Los Angeles, CA 90095, USA.Tel.: þ 1 310 825 0012; Fax: þ 1 310 206 3987; E-mail: [email protected]

Received 20.8.07; accepted 17.10.07

Acetylation of histones plays an important role in regulating transcription. Histone acetylation ismediated partly by the recruitment of specific histone acetyltransferases (HATs) and deacetylases(HDACs) to genomic loci by transcription factors, resulting in modulation of gene expression.Although several specific interactions between transcription factors and HATs and HDACs have beenelaborated in Saccharomyces cerevisiae, the full regulatory network remains uncharacterized. Wehave utilized a linear regression of optimized sigmoidal functions to correlate transcription factorbinding patterns to the acetylation profiles of 11 lysines in the four core histones measured at allS. cerevisiae promoters. The resulting associations are combined with large-scale protein–proteininteraction data sets to generate a comprehensive model that relates recruitment of specific HDACsand HATs to transcription factors and their target genes and the resulting effects on individuallysines. This model provides a broad and detailed view of the regulatory network, describing whichtranscription factors are most significant in regulating acetylation of specific lysines at definedpromoters. We validate the model, both computationally and experimentally, to demonstrate that ityields accurate predictions of these regulatory mechanisms.Molecular Systems Biology 18 December 2007; doi:10.1038/msb4100194Subject Categories: chromatin & transcriptionKeywords: histone acetylation; linear regression; networks; regulatory network

This is an open-access article distributed under the terms of the Creative Commons Attribution Licence,which permits distribution and reproduction in any medium, provided the original author and source arecredited. Creation of derivative works is permitted but the resulting work may be distributed only under thesame or similar licence to this one. This licence does not permit commercial exploitation without specificpermission.

Introduction

Transcriptional regulation is a highly complex process thatinvolves not only the recruitment of polymerases to activepromoters but also the remodeling of chromatin. Within silentchromatin, DNA is tightly wrapped around histone proteins(Richmond and Davey, 2003) leading to transcriptionalsuppression. In contrast, in active promoters, the chromatinstate is more open, enabling access to the transcriptionmachinery. One of the mechanisms used to control the stateof chromatin structure is the modification of histones(Kurdistani and Grunstein, 2003). Histone acetyltransferases(HATs) and histone deacetylases (HDACs) alter the chargestate of lysines in histone tails by adding and removing acetylgroups. The pattern of these modifications, along withmethylation, phosphorylation and ubiquitination, contributesto establish a proposed histone code that regulates transcrip-tion through alterations of chromatin structure and generationof binding sites for chromatin-interacting factors (Strahl andAllis, 2000; Jenuwein and Allis, 2001).

Histone acetylation patterns at individual genomic loci arein large part determined by the recruitment of specific HATsand HDACs by transcription factors (TFs) (Kurdistani andGrunstein, 2003). For instance, the Ume6 TF recruits theRpd3 HDAC to its target genes, leading to repression ofthe downstream genes (Rundlett et al, 1998). Althougha few interactions between HATs and HDACs and specificTFs have been established, we do not yet know the fullnetwork of interactions between these and the effect thatsuch interactions have on establishing acetylation patternson histone tails. While TF binding has been correlatedwith histone acetylation patterns, a regulatory networkthat controls histone acetylation levels has not been deter-mined (Guo et al, 2006). Therefore, we set out to utilizegenome-wide TF-binding and acetylation data, along withprotein–protein interaction data, to determine this network ona genome-wide scale.

To construct a comprehensive network, we utilized two datasets that provide a systematic measurement of the acetylationlevels of 11 lysines (Kurdistani et al, 2004) and the binding

& 2007 EMBO and Nature Publishing Group Molecular Systems Biology 2007 1

Molecular Systems Biology 3; Article number 153; doi:10.1038/msb4100194Citation: Molecular Systems Biology 3:153& 2007 EMBO and Nature Publishing Group All rights reserved 1744-4292/07www.molecularsystemsbiology.com

levels of 204 TFs in promoter regions (Harbison et al, 2004) ona genome-wide basis in Saccharomyces cerevisiae. In addition,protein–protein interaction data from all yeast complexeshave been recently determined (Gavin et al, 2006), andhave been added to the pre-existing set of interactionscontained in various databases such as DIP (Salwinskiet al, 2004) and SGD. Using multiple linear regression ofoptimized sigmoidal functions, we construct a networkthat links TFs to individual lysines in histones. When known,we also introduce in our model interactions between TFsand specific histone-modifying enzymes. We provide bothcomputational and experimental validation of the network,indicating that it accurately captures the regulation ofacetylation levels of histones promoters in S. cerevisiae on agenome-wide scale.

We find that there is selectivity of histone acetylation siteusage by various TFs, with each TF associating with a specificsubset of acetylation sites. We also find that certain acetylationsites are linked to a greater number of TFs than others,indicating that some acetylation sites may have widespreadroles in transcription. We show that the network supports theview that the deacetylation of promoters tends to represstranscription, and acetylation to activate it, although theregulation of histone genes represents an interesting exceptionto this rule. Finally, we show that in most cases the TFs causethe changes in acetylation through the recruitment of HATsand HDACs, rather than being recruited by the histonemodifications.

Results

Construction of a global histone acetylationnetwork

We have constructed a model of the regulatory networkof histone acetylation in S. cerevisiae. This model describesthe interactions between HAT and HDAC complexes, the TFsthat recruit them and the lysines that are modified.To construct this model, we analyzed two data sets thatmeasure the binding of TFs and the acetylation state ofhistones at all yeast promoters. Both data sets were colleted inrich media; therefore our model represents the regulatorynetwork in this condition and may potentially be altered inother conditions that subject the cell to stresses. Since ourhypothesis is that the particular acetylation profile of apromoter is in large part determined through the recruitmentof HATs and HDACs by TFs that bind the promoter, weconstructed a multivariate linear regression framework ofoptimized sigmoidal functions for modeling acetylation interms of TF-binding data.

Multivariate regression allows us to model combinatorialregulation by multiple TFs. That is, the acetylation level ofa gene may be determined by the binding of multiple factors toits promoter. In fact, the majority of the promoters in ourmodel are regulated by more than one TF, as shown inSupplementary Figure 1. Nonlinear sigmoidal functions allowus to model the expected sigmoidal relationship betweenTF-binding and acetylation levels (see Supplementary Figure 2).If the binding of a TF to a promoter, as measured by thelogarithm of the ratios of immunoprecipitated DNA to input, is

negative (implying little or no binding), the effect on acetylationis close to zero, whereas if the binding is positive and large, theeffect saturates at some maximal level of acetylation. We use themidpoint of the sigmoidal function as the threshold for definingtargets of a factor: promoters that are bound by levels above themidpoint are considered bona fide targets, whereas those withlower binding are not. The distribution of the number of targetsper factor is shown in Supplementary Figure 3 and shows thatthe factors may generally be classified as global regulators thataffect many genes, or local ones that affect a small number (lessthan 200).

Using this approach, we describe the acetylation ofeach lysine as a linear superposition of the binding profilesof all TFs transformed by each function. For example, wemodel the acetylation level of H3K18ac at each promoter i asfollows:

H3K18aci ¼XN

j¼1

FjðBijÞ

where Fj is the sigmoidal function that describes the couplingbetween TF j and H3K18ac, and Bij is the binding of TF j topromoter i. The functional form of F is fit for each TF, but iskept constant for a specific factor across all promoters. Thuswe need to fit three parameters per TF in our model: the firstparameter describes the midpoint of the sigmoidal function,the second its slope and the third its maximal value. If themaximal value of the function is positive, then we hypothesizethat the binding of the factor leads to the acetylation ofthe promoter. In contrast, if the maximal value is negative,then the binding leads to deacetylation. Multivariate regres-sion identifies the most significant TFs in the model and thefunctional form F between each TF and histone lysine. Theserelationships may then be represented as a network thatcontains the acetylated lysines and the significant factors thatmodify them. The details of this approach are described inMaterials and methods below.

The full network of the relationships between TFs andhistone lysines is shown in Figures 1 and 2. For clarity, we haveseparated the full network into two subnetworks that naturallyemerge from this analysis: one network describes theregulation of hyperacetylation while the other hypoacetyla-tion. Both networks depict the 11 histone lysines whoseacetylation levels are measured at all promoters (Kurdistaniet al, 2004) as yellow ovals. The TFs that emerge in our modelas significant regulators of the acetylation or deacetylation ofspecific lysines are depicted as rounded rectangles in shades ofblue or red, indicating the average magnitude of the maximalvalue of the associated sigmoidal function; dark blue indicatesa large positive maximal value while dark red a large negativeone. The shade of the link indicates the maximal value of thesigmoidal function that couples a specific TF and lysine—bluefor positive values and red for negative ones.

We hypothesize that these TFs recruit proteins that areresponsible for the acetylation and deacetylation of histonelysines. As a result, we have extended the network byincluding interactions between HATs and HDACs and TFsfrom protein–protein interactions databases. These are shownin the second panel on the right side of the networks. Finally, tocomplete the network, we show the known HAT and HDACcomplexes in the rightmost panel.

Modeling the regulatory network of histone acetylation in S. cerevisiaeH Pham et al

2 Molecular Systems Biology 2007 & 2007 EMBO and Nature Publishing Group

Combinatorial interaction of TFs and histoneacetylation

Considering that there are more TFs than histone modifiers orhistone acetylation sites, there should be overlapping butdistinct patterns of acetylation site usage by TFs. The networkshows this to be indeed the case. Figure 3A, derived from thecombination of the two hyper- and hypoacetylation networks,shows the number of TFs that are associated with each lysinein our network. From the number of interactions, we identifyH3K18 as the most widely regulated acetylation site with linksto 15 TFs. At the opposite end of the spectrum, H4K16 isregulated by only two TFs.

It is interesting to note that H3K18 and H4K16 show the leastcorrelation between any pair of acetylated lysines, which areotherwise often highly correlated. H3K18 acetylation appearsto be a general mark of active transcription (Kurdistani et al,2004). In contrast, H3K16 could be a more specific mark,possibly associated with the regulation of ribosomal genes assuggested by the fact that FHL1, a known regulator of rRNAprocessing, is one of the few factors that appears to regulate it.These findings suggest that certain acetylation sites, by virtueof being regulated by multiple TFs, may have broader and

more general roles in transcriptional regulation while othersare likely to have more specific ones.

Figure 3B describes the converse relationship: the number oflysines to which each TF is linked. Here, we find a moredramatic distribution, with five TFs interacting with nine ormore lysines in class 1, while nine TFs interact with only onelysine in class 2. We investigated whether there were generaltrends that separated the two classes of TFs. We found that inboth hyper- and hypoacetylation networks, TFs that interactwith many lysines are more likely to have direct interactionswith HATs and HDACs (total eight) than those that interactwith single lysines (total zero), possibly indicating that class 1TFs have stronger interactions with the HAT or HDACcomplexes, leading to the modification of more lysines.

Validation of the network

The network demonstrates the specificity of regulation ofhistone acetylation within cells. Of the 204 TFs analyzed, weare able to identify 26 TFs that are likely to regulate histoneacetylation at their target genes. These include factors thathave previously been associated with this role as well as newfactors that were not known to function in regulating

Lysines

Transcription factors Complexes

Spt20

H3K24

H3K14

H3K9

Eaf6

H3K18

Spt7

Ada2

Spt2

Epl1

Taf12

H2AK7

Taf14

H4K12 Spt3

Taf6H2BK16

H4K8

H3K23

Ngg1

Taf9

H4K16

Yng2

Mot2

H2BK11

Taf5

HIF1Lysines

HATs

Spt20

Abf1

Hir2

H3K27

H3K14

H4/H2Ahistoneacetyl-

transferase

Fhl1

H3K9

Eaf6

YBL054W

Taf1H3K18

Histoneacetyl-

transfer-ase B

Spt7Gcn4

Ada2

Spt2

Epl1

Taf12

H2AK7

Taf14

H4K12 Spt3

Taf6H2BK16

H4K8

H3K23

Ngg1

Taf9

YML081W

H4K16

Yng2

Mot2

SAGA

H2BK11

Taf5

HIF1

Hir3Snf1

Hir1

Rts2

Figure 1 The regulatory network of histone acetylation in Saccharomyces cerevisiae. The network has been divided into two subnetworks: the one shownhere describes the regulation of histone hyperacetylation. The yellow ovals represent the 11 lysines whose acetylation at each promoter has been measured. The TFsthat we predict to regulate the acetylation levels of these lysines are shown linked to the lysines. The color of the TFs and the links between TFs and lysinesare determined by the linear regression coefficients. Blue indicates correlation with hyperacetylation. White indicates a weak association. The known HAT andHDAC complexes, along with their subunits, are also included in the network. The interactions between HATs, HDACs and TFs are determined from protein–proteininteraction data.

Modeling the regulatory network of histone acetylation in S. cerevisiaeH Pham et al

& 2007 EMBO and Nature Publishing Group Molecular Systems Biology 2007 3

acetylation levels. In both cases, we have included in thenetwork the known interactions between TFs and subunits ofthe HATand HDAC complexes. As a result, the model generatesspecific axes of interaction between histone modifiers, TFs attheir target genes and specific lysine residues.

We validate the couplings between TFs and lysines in ournetwork using two approaches: through a statistical test andby verifying individual relationships experimentally. Thestatistical validation is based on the assumption that TFs thatare correlated with low acetylation levels (red) should interactwith HDACs. Conversely, we expect that TFs that arecorrelated with high acetylation levels should interact withHATs. We find that our network contains 22 correct associa-tions and only one incorrect one. This is striking since our TF–lysine interactions are not trained on protein–protein interac-tions at all, yet we are able to correctly identify the type ofnearly all the TF–HAT or HDAC interactions. We performed astatistical test to measure how often we obtain these correctassociations versus incorrect ones in the real versus randomnetworks and found that the number of correct associations inthe real network is significant below the 1e–5 level.

We have also conducted experimental validation of thenetwork. We tested the regulatory interactions described inour network by measuring changes in acetylation in mutantstrains of S. cerevisiae that lack specific TFs. In the TF mutant

strains, the recruitment of the HATs or HDACs that areresponsible for the observed acetylation should be eliminated,yielding a corresponding change in acetylation, which wemeasured. We tested six TFs, three that were significant in ourmodel, Hir3, Sum1 and YML081W, and three negative controls.Two of the three significant factors were selected because oneinteracted with a HAT and one with an HDAC; thus, we couldtest whether these had opposite effects on acetylation afterdeletion. The third had no interaction with HAT or HDACenzymes and is virtually uncharacterized in the literature andtherefore represents a potentially novel finding of our model.The three negative controls were factors that did not appear inany of our models, and we would therefore predict that theirdeletion has little or no effect on acetylation. This is in fact byand large what we see in Figure 4F and G, where the changesinduced by the deletion of these control factors are generallysmall.

The acetylation network reveals specificinteractions

The first experiment we conducted involved deleting Sum1,a transcriptional repressor required for mitotic repression ofmiddle sporulation-specific genes (Xie et al, 1999). This factor

Lysines

Transcription factors HDACs Complexes

TAF10

HDBSds3

Rlr1

Rpd3

Hst1

SIN3

Nrg1Pho23

Rfm1

Rxt3

Sin3

Ume6

Sum1

Mig3

Rtg3

Thi2

Ino4

Stp4

H3K27

H3K14

H3K9

H3K18

H2AK7

H4K12

H2BK16

H4K8

H3K23

H4K16

H2BK11

Mig2

Stp4

Met18

Gat3

Figure 2 The regulatory network of histone acetylation in Saccharomyces cerevisiae. The network has been divided into two subnetworks: the one shown heredescribes the regulation of histone hypoacetylation. The yellow ovals represent the 11 lysines whose acetylation at each promoter has been measured. The TFs that wepredict to regulate the acetylation levels of these lysines are shown linked to the lysines. The color of the TFs and the links between TFs and lysines are determined bythe linear regression coefficients. Red nodes and links indicate that the TF is correlated with hypoacetylation. White indicates a weak association. The known HAT andHDAC complexes, along with their subunits, are also included in the network. The interactions between HATs, HDACs and TFs are determined from protein–proteininteraction data.

Modeling the regulatory network of histone acetylation in S. cerevisiaeH Pham et al

4 Molecular Systems Biology 2007 & 2007 EMBO and Nature Publishing Group

is known to interact with Hst1, an NADþ -dependent histonedeacetylase that is a subunit of the Sum1p/Rfm1p/Hst1pcomplex, and we would therefore expect it to regulate histonedeacetylation. However, the specific lysines that it regulatesand the targeted promoters were previously not known on agenomic scale. Our model allows us to define both theregulated lysines and the promoters that are regulated bySum1 (based on the midpoint of its sigmoidal function).Accordingly, we predict that Sum1 regulates all lysines otherthan H4K16 and H2BK11 and it does so at B40 promoters,where the binding ratio is greater than two.

We tested these predictions by comparing acetylation levelsof three lysines at three promoters between wild-type andsum1-deletion strains. At the first two promoters (iYFL011Wand iYIR027C), the binding of Sum1 is above the threshold setby the sigmoidal function for all three lysines (H3K14, H4K16and H3K18). The model predicts that both H3K14 and H3K18levels are regulated by Sum1, whereas H4K16 levels are not.What we find at these two promoters is that the first two

lysines are in fact significantly hyperacetylated in the mutantwith respect to the wild type, especially in the iYIR027Cpromoter, while K4K16 levels are not (see Figure 4A and B).This result supports our previous finding that the regulation ofH4K16 is largely independent of H3K18. The third promoterwe selected, iYDR224C, is one where Sum1 is bound belowthreshold, and we would therefore predict that the deletion ofSum1 should have little or no effect on acetylation. Again, theexperiments largely support this prediction, since we see littleor negative change in the acetylation levels of all three lysinesat this promoter (Figure 4C).

Regulation of transcription by histone acetylation

It is generally assumed that the deacetylation of promotersleads to transcriptional repression, while the acetylation toactivation. This may be in part due to the tightening andloosening of histone DNA interactions, as well as due to therecruitment of chromatin-modifying proteins that recognize

Figure 3 In the top panel we show that among all 11 acetylation sites, lysine H3K18 is linked to the largest number of TFs. This suggests that this lysine may play amore general role in regulating transcription than other lysines that are likely to regulate specific processes, such as H4K16. In the bottom panel we show the number oflysines that are linked to each TF. Ume6 and Hir3 are linked to all 10 lysines, while a large number of factors are linked to only one lysine.

Modeling the regulatory network of histone acetylation in S. cerevisiaeH Pham et al

& 2007 EMBO and Nature Publishing Group Molecular Systems Biology 2007 5

specific acetylation marks. We tested whether our networksupports this view of transcriptional regulation by annotatingfactors as transcriptional activators and repressors based on ananalysis of the transcriptional profiles of knockout strainsmeasured in rich media (Hu et al, 2007). As discussed in theMaterials and methods section, we used the same linearregression of optimized sigmoidal function strategy to modelexpression profiles in terms of TF-binding profiles as we did foracetylation profiles. If the model of a specific profile containedthe TF that was deleted in that mutant, then we were able toinfer whether the factor was an activator or repressor, based onthe sign of the maximal value of its sigmoidal function. If thesign was positive, and the genes bound by the deleted factorwere overexpressed, then the factor was considered a

repressor. In contrast, if the sign was negative, then it wasconsidered an activator because the genes it bound to wererepressed in the mutant strain.

We computed models of the expression profiles for the 26TFs found in our acetylation network and found five caseswhere the models contained the deleted factor (Table I). Formany of the other cases, the mutant strain had a very similarexpression profile to the wild type, or the perturbation wasexplained by the activation of other factors. From this analysis,we were able to infer that Gcn4 and Mot2 are activators, whileHir3, Sum1 and Ume6 are repressors, as one might expect fromprevious characterizations of these factors. Both the activatorsare associated with hyperacetylation in our network, while therepressors, with the exception of Hir2, are associated with

0

5

10

15

20

H3K14Ac H4K16Ac H3K18Ac

iYDR224C

Enr

ichm

ent

rela

tive

to te

lom

ere

WThir3

0

10

20

30

H3K14Ac H4K16Ac H3K18Ac

iYIR027C

Enr

ichm

ent

rela

tive

to te

lom

ere WT

sum1

0

5

10

H3K14Ac H4K16Ac H3K18Ac

iYDR224C

Enr

ichm

ent

rela

tive

to te

lom

ere

WTsum1

H3

H30

3

6

9

12

H3K14Ac H4K16Ac H3K18Ac

iYFL011WE

nric

hmen

t re

lativ

e to

telo

mer

e WTsum1

H3

H3

yml081WWT 0.6

0.2

0.4

0.0

15

12

9

6

3

0

Enr

ichm

ent

rela

tive

to te

lom

ere

H3H3K18Ac

iYDR525W

H3

ylr278WT

Enr

ichm

ent

rela

tive

to te

lom

ere

iYHR019W

H3K18AcH3K14Ac

8

6

4

2

0

ynr063w

H3

tos8WT

Enr

ichm

ent

rela

tive

to te

lom

ere

iYDR251W

H3K18AcH3K14Ac

8

6

4

2

0

Figure 4 Experimental validation of the regulatory network. We examine the change in acetylation in six mutants that lack a TF. For each mutant, we measure thechange in acetylation in one or more promoters where the factor is known to bind. In (A) and (B) we show that H3K14 and H3K18 acetylation levels increase in sum1mutants at the YFL011W and YIR027C promoters where Sum1 binds strongly, but as shown in (C) are not significantly changed at the YDR244C promoter where Sum1binds weakly. As predicted by our model H4K16 levels are not significantly affected by Sum1. In (D), we see that the hir3 mutant has significantly less acetylation at theYDR224C promoter that regulates the H2B gene. In (E), we see that Yml081W affects the acetylation of H3K18 at a promoter where it is bound. Finally, in (F) and (G),we see our control experiments, which show only small changes in promoters where factors that are not in our model are strongly bound.

Modeling the regulatory network of histone acetylation in S. cerevisiaeH Pham et al

6 Molecular Systems Biology 2007 & 2007 EMBO and Nature Publishing Group

hypoacetylation. Thus in four of the five cases, our analysissupports the conventional view that transcriptional deacetyla-tion of promoters leads to transcriptional repression, whileacetylation leads to activation.

An intriguing exception to this rule is the regulation ofhistone genes. The Hir1, Hir2 and Hir3 TFs, which form the Hircomplex, are believed to be primarily responsible for repres-sing the expression of these genes (Prochasson et al, 2005).This repression is usually removed in the G1/S phase of thecell cycle, when histone genes are transcribed at high levels.However, it appears that the Hir complex remains bound to thehistone genes throughout the cell cycle, and therefore the exactmechanism of the switch that leads from repression toactivation of these genes is not known. One possibility is thathistone acetylation plays an important role in this transition,as the acetylation of certain lysines has been shown to vary in acell cycle-dependent manner (Xu et al, 2005).

We tested the effect of Hir3 deletion on the acetylation levelsof a histone H2B gene, YDR224C. Protein–protein interactiondata indicate that Hir3 interacts with Taf1, a putative histoneacetyltransferase. At the iYDR224C promoter region, deletionof hir3 leads to a marked decrease of acetylation at all threelysines that we examined, as predicted by our model(Figure 4D). This result then raises the possibility that therepressive function of the Hir complex is mediated by therecruitment of a histone acetyltrasferase. In the mutant strain,where the recruitment is inhibited, the promoters are no longeracetylated, leading to increased transcription of histone genes.Therefore, in contrast to the standard view, in this caseacetylation of promoters appears to be associated with therepression of transcription and not the activation. None-theless, it does not appear that the acetylation levels of H3K18,one of the lysines we examined, vary periodically during thecell cycle (Xu et al, 2005). Therefore, although our dataindicate that Hir3 regulates the acetylation of various lysinesand is associated with the repression of histone genes, themechanism by which this repression is relieved during the Sphase remains to be determined.

The network predicts novel interactions foruncharacterized TFs

The network enables us to infer partial functions forpreviously uncharacterized TFs. For example, TF Yml081w isvery poorly characterized in the literature. However, we findthat it is significantly linked to several lysines in thehyperacetylation network and would therefore conclude thatit likely recruits a HAT. To test this hypothesis, we constructed

strains lacking Yml081w and measured the change inacetylation at a promoter where the factor binds strongly.The result, shown in Figure 4E, indicates that there is a smallbut reproducible decrease in acetylation in the mutant strain,supporting the notion that this factor does in fact recruit a HAT.

This result leads us to postulate that Yml081w is likely anactivator of transcription. To test this hypothesis, we looked atthe expression of genes during the yeast cell cycle (Spellmanet al, 1998). We compared the average expression of 153 genesstrongly bound by this TF (log ratio greater than one) to theexpression of the transcript of Yml081w. We find that the twoare related by a correlation coefficient of 0.7, indicating thatwhen the transcripts of Yml081w are high, the expression of itstarget genes is also high and vice versa. This finding thereforesupports the hypothesis that Yml081w is likely an activator.Therefore, although our regulatory network cannot determinethe specific function of each individual TF, as this exampleshows, in some cases it may allow us to predict whether afactor is an activator or repressor.

TFs recruit HATs and HDACs

An important question that emerges from our network iswhether there are cases in which the histone modificationrecruits the TFs, as opposed to the factors recruiting HATs andHDACs. The analysis so far identifies only correlationsbetween TFs and acetylation profiles, but is not able toestablish a causal connection. To answer this question, weexamined a data set in which four lysines were mutated toarginine to inhibit their acetylation (Dion et al, 2005). Each ofthe 15 possible combinatorial combinations of mutants wasconstructed and the resulting expression was measured. Weagain built models for the expression profiles as above andtested whether the presence of TFs in our model correlatedwith the mutant state of any of the four lysines.

We found that out of all possible combinations of TFs andlysines, only one was statistically significant: the factor Ste12was present with a negative coefficient only in models inwhich H4K16 was mutated (Po0.001). This could imply thatSte12 binding recognizes the acetylation of H4K16. However,Dion et al speculate that an alternate explanation is that theH4K16 mutant derepresses the silent mating-type locus andsubsequently leads to a diploid-like repression of the matingpathway (Dion et al, 2005). Since Ste12 regulates matinggenes, it is possible that the repression of Ste12-bound genes inthe mutant results from this effect rather than the inhibition ofSte12 binding.

As a result of this analysis, we conclude that TF binding doesnot depend on the acetylation state of lysines. Furthermore, asour experiments on hir3, yml081w and sum1 mutantsdemonstrate, the deletion of TFs does in fact affect acetylationlevels. We therefore hypothesize that in most cases the TF iscausing changes in acetylation and not being recruited by theacetylated lysines.

Discussion

Previous studies have attempted to determine the relationshipbetween histone acetylation and gene expression using a

Table I Association of acetylation with transcriptional regulation

TF Coef Type Recruits

GCN4 �2.1 Activator HATMOT2 �0.2 Activator HATUME6 1.4 Repressor HDACHIR2 3.0 Repressor HATSUM1 4.6 Repressor HDAC

Modeling the regulatory network of histone acetylation in S. cerevisiaeH Pham et al

& 2007 EMBO and Nature Publishing Group Molecular Systems Biology 2007 7

variety of techniques. For example, a recent study usedregression approaches to correlate acetylation patterns withexpression levels, demonstrating that the cumulative acetyla-tion levels are predictive of expression (Yuan et al, 2006). Anexperimental effort that measured the effect of histonemutants on gene expression levels reached similar conclusions(Dion et al, 2005).

In contrast to these efforts, here we do not focus on the effectof acetylation on expression, but rather attempt to reconstructthe regulatory network that establishes genome-wide acetyla-tion profiles. The enzymes that are responsible for acetylatingand deacetylating histones have been largely characterized inthe past couple of decades. In a limited number of cases, it isalso known how they are specifically recruited to specificgenomic loci to modify histone tails.

In contrast to these more limited studies, using a multiplelinear regression of optimized sigmoidal function analysis, wehave constructed a comprehensive and genome-wide networkthat links TFs and their target genes to acetylation profiles ofspecific lysine residues in the four core histones. For each TF,these associations are likely due to recruitment of select HATsand HDACs. We utilize protein–protein interaction data to inferwhich HAT or HDAC is likely recruited by each TF in the fewcases where this is known. The validity of the resultingnetwork is supported by the observation that TFs that arecorrelated with hypoacetylation tend to recruit deacetylases,while TFs that correlate with hyperacetylation tend to recruitacetylases. When we compare the constructed network tothose that are randomly generated, we find that the enrich-ment of correct associations is statistically very significant.

Furthermore, we have performed experimental tests of thehypotheses generated from the network. In particular, we havemeasured the changes in acetylation in three deletion mutantsof TFs that were present in our model: Yml081W, Hir3 andSum1. We note a marked decrease in the acetylation of threelysines in the hir3 mutant at a histone promoter where Hir3binds strongly, suggesting that histone acetylation may play arole in repressing the expression of histone genes. We observeincreased acetylation of two lysines in two promoters whereSum1 binds, while observing little or no effect on a controllysine and a control promoter, supporting our specificpredictions of which lysines and which promoters areregulated by Sum1 through the recruitment of Hst1. Finally,we have also shown that the deletion of a previouslyuncharacterized TF, YML081W, decreases acetylation levelsat a bound promoter, as predicted by our network.

We have also shown that in general the TFs cause changes inacetylation rather than being recruited by the modifications.Beyond these observations, our model also generates morespeculative predictions about the mechanisms by which thespecificity of histone acetylase and deacetylase complexes isachieved. For instance, we note that the same HAT or HDACcomplex appears to be recruited by different TFs throughalternate subunits. It is conceivable that differential recruit-ment of the complex through its distinct subunits can affect itsspecificity for different lysines.

Our effort parallels attempts to reconstruct transcriptionalregulatory networks: models of how the interactions betweenTFs and the promoters regulate gene expression. These effortshave advanced considerably over the past few years, providing

a comprehensive network in S. cerevisiae (Harbison et al,2004). Knowledge of this network allows us to pose questionsregarding its dynamics (Luscombe et al, 2004) along with itsevolution (Babu et al, 2004). We expect that as knowledge ofthe histone acetylation regulatory network increases and datafrom additional species become available, we will be able topose analogous questions.

In future, our approach can be expanded to yield moresophisticated regulatory networks. Use of data from moreprecise acetylation profiles and TF binding generated usingtiling arrays could reveal significant and more specificassociations that would be hidden in the current data sets(Pokholok et al, 2005). Our methodology is also applicable tohuman cells, in which reconstruction of analogous networksmay identify mechanisms that lead to the deregulation ofhistone acetylation networks in cancer cells (Seligson et al,2005).

Materials and methods

Data

The histone acetylation data used in this experiment are fromKurdistani et al (2004). This data set includes measurements of theacetylation levels of 11 lysines found on histone tails (H2AK7, H4K8,H3K9, H2BK11, H4K12, H3K14, H2BK16, H3K18, H3K23, H3K24,H4K16) for 5366 S. cerevisiae promoter regions. The acetylation levelsin this data set were measured relative to genomic DNA rather thannucleosomal DNA. However, since the acetylation levels of histonesare affected by the occupancy of nucleosomes in that region, wedivided the acetylation data by the average level of H3 and H2Ahistones from the Bernstein et al (2004) data set.

The TF-binding data set is from Harbison et al (2004), whichincludes the TF-binding levels of 203 TFs to 4956 promoters. We usethe normalized and averaged ratios that are reported in Supplementarydata. Only the promoter entries that are present in the acetylation,nucleosome occupancy, and TF-binding data sets were kept. Inaddition, since TF-binding data will serve as the basis matrix for thelinear regression, the column and row entries need to be wellpopulated for our analysis. Therefore, the rows of the acetylation datawith two or more missing values are removed. In the binding data set,rows with more than 20 missing values are removed and, subse-quently, columns with 50 or more missing values were also removed.After the filtering process, we are left with the binding values of 181TFs and the acetylation levels of 11 lysines for 3221 promoters.

The other data we utilize involve protein–protein interactionsbetween TFs and HDACs and HATs. We obtain protein–proteininteractions from the Saccharomyces Genome Database (http://www.yeastgenome.org/). This data set was complemented with theinteractions from a systematic screen of protein complexes (Gavinet al, 2006), the Database of Interacting Proteins (Salwinski et al, 2004)and the MIPS database (Salwinski et al, 2004) to ensure completeness.From these genome-wide data sets, we select only the interactions thatcontain proteins whose Gene Ontology (GO) annotation is associatedwith the terms ‘histone acetylation/deacetylation,’ ‘histone acetyl-transferase activity/contributes_to’ and ‘histone acetyltransferase/deacetylase complex’ (Ashburner et al, 2000). The proteins that arefound to be part of the histone acetylase or deacetylase complex arethen linked to the HAT and HDAC protein complexes, as defined byMIPS (Salwinski et al, 2004).

Linear regression of optimized sigmoidalfunctions

Before the linear regression is performed, each column of theacetylation data is normalized to yield a mean value of zero and avariance of one. We define the TF-binding data as the basis and the

Modeling the regulatory network of histone acetylation in S. cerevisiaeH Pham et al

8 Molecular Systems Biology 2007 & 2007 EMBO and Nature Publishing Group

normalized acetylation data as the target. Thus, the model for theacetylation of a single lysine is

log Ajk

� �¼

X181

i¼1

Fik log TFji

� �� �

where Ajk is the acetylation ratio for lysine k at gene j, TFji is thebinding ratio of TF i to gene j and Fik is the nonlinear function thatdescribes the effect of the binding of TF i on the acetylation of lysine k.We have chosen sigmoidal functions to model this relationship (seeSupplementary Figure 2),

F ¼ a

1 þ eb�x

c

since these functions capture the expected saturation effects of highbinding as well as the non-effects of low binding. The sigmoidal curveis a three-parameter function: the midpoint, b, the slope, c, and themaximal values, a, must all be specified. To simplify the complexoptimization problem we are confronted with, we have chosen to varyonly two of these parameters a and b, setting the value of c to 1 for allfactors. The maximal value, a, of the sigmoidal function represents themaximal coupling strength between a TF and the acetylation of aspecific lysine. If it is positive it implies that the TF is correlated withhigh levels of acetylation, and if it is negative it is correlated withhypoacetylation.

We use the following protocol to identify the TFs, and theirfunctional form, that regulate each lysine. This is similar to theapproach developed by Das et al (2006) using multiple adaptiveregression splines (MARS). We first divide our genes into two groups:test and training. The test genes are 10% of the total and the traininggenes are the remaining 90%. Using the training genes, we greedilybuild up a model containing successively more TFs. We use thereduction in variance of our model as the figure of merit:

RIV ¼ 1 � varðresidualÞvarðdataÞ

where the residual is the difference between the observed and modeleddata. After we select the best factor, we select the second that incombination with the first maximally reduces the variance of theresidual. We then iterate this procedure until we reach a maximummodel size of 25. At each step, we monitor the reduction in variance ofthe test set and consider the model size with the lowest variance as theoptimum. We repeat all these steps ten times for 10 different randomlyselected test sets.

This approach generates 10 independent models. From these, wegenerate a single consensus model, by selecting the top TFs that appearin all 10 models, making sure that the consensus model is not largerthan the average optimal model size. Finally, we compute thereduction in variance of this final model on a randomly selected testdata set, as shown in Supplementary Figure 4. We see that on averagethese models have a reduction in variance of about 0.15. The fullmodel is described in Supplementary Table 1, including all the factorsand their parameters for each lysine. We also report the computedchanges in acetylation that arise from the deletion of TFs at targetpromoters in Supplementary Table 2.

Linear regression of optimized sigmoidal function models ofexpression profiles is computed using the same protocol, simplyreplacing the acetylation profile with the logarithm of the expressionratios.

Statistical validation of network

The validation test is built on the premise that TFs that are correlatedwith hyperacetylation should recruit HATs while factors that arecorrelated with hypoacetylation should recruit HDACs. In practice, wetest for the number of associations in our network between factorswith positive alpha coefficients and HATs and negative alphacoefficients and HDACs. The statistical significance of this number iscomputed using the hypergeometric cumulative distribution, whichcomputes the probability of correct matches, or greater, given thenumber of positive and negative alpha coefficient TFs and the numberof HATs and HDACs in our network.

Network construction

From the linear regression computation described above, we define theTFs that regulate each lysine. We use these couplings to form the edgesbetween the TFs and lysines. To form the link between TFs and HATsand HDACs, we use protein–protein interactions from the integrateddata set described above.

The TFs are displayed in various shades of blue and red. The bluenodes indicate TFs that are correlated with histone acetylation, the rednodes indicate TFs that are linked to histone deacetylation and whitesuggests neither activity. The various shades of blue and red for eachTF were derived from the maximal value of their correspondingsigmoidal function. The strength of the color is determined by thesumming of maximal values associated with the same TFacross all thelysines it is associated with. The TF with the darkest shade of blue hasthe maximum positive sum and the TF with the darkest shade of redhas the most negative sum, with all other shades of red and blue beingin between. Finally, the histone acetylases, histone deacetylases andprotein complexes in the third and topmost tiers are colored blue andred as well to indicate histone acetyltransferase and deacetylaseactivity, respectively, as defined by GO and MIPS.

The individual links between the TFs and lysines follow the samecoloring scheme as above. In this case, the color intensity of the link isdetermined by the maximal value of the sigmoidal function betweenthe TF and the lysine.

Chromatin immunoprecipitation

Chromatin immunoprecipitation has been performed as previouslydescribed (Suka et al, 2001). Briefly, yeast strains were grown to OD600

of 0.8 in YPD media. DNA was fragmented with an average size ofapproximately 500 bp by sonication and 50ml of the lysate wasimmunoprecipitated using 0.5ml of H3 antibody (Abcam), 2ml ofH3K18Ac-specific antibody and 4ml of H4K16-specific antibody. PCRwas performed within the linear range of amplification for each primerset and DNA sample. Typically, 1/100th of the immunoprecipitatedDNA and 0.8mCi/ml [a-32P]dATP (specific activity, 3000 Ci/mmol)were used for each PCR (12.5ml). Quantitations of ChIP data wereperformed using a PhosphorImager and ImageQuant software(Molecular Dynamics, Sunnyvale, CA). All PCR reactions includeprimers designed to a region 500 bp from the telomere of chromosomeVI-R as an internal control for loading and quantitation.

Supplementary information

Supplementary information is available at the Molecular SystemsBiology website (www.nature.com/msb).

AcknowledgementsThe computations were performed by HP, the experimental validationwas performed by RF and the manuscript was written by HP and MPwith help from SKK. SC and SKK provided guidance and support forthe computation and experiments respectively.

References

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM,Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP,Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE,Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool forthe unification of biology. The gene ontology consortium. Nat Genet25: 25–29

Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA(2004) Structure and evolution of transcriptional regulatorynetworks. Curr Opin Struct Biol 14: 283–291

Bernstein BE, Liu CL, Humphrey EL, Perlstein EO, Schreiber SL (2004)Global nucleosome occupancy in yeast. Genome Biol 5: R62

Modeling the regulatory network of histone acetylation in S. cerevisiaeH Pham et al

& 2007 EMBO and Nature Publishing Group Molecular Systems Biology 2007 9

Das D, Nahle Z, Zhang MQ (2006) Adaptively inferring humantranscriptional subnetworks. Mol Syst Biol 2: 2006 0029

Dion MF, Altschuler SJ, Wu LF, Rando OJ (2005) Genomiccharacterization reveals a simple histone H4 acetylation code.Proc Natl Acad Sci U S A 102: 5501–5506

Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C,Jensen LJ, Bastuck S, Dumpelfeld B, Edelmann A, Heurtier MA,Hoffman V, Hoefert C, Klein K, Hudak M, Michon AM, Schelder M,Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T,Casari G, Drewes G, Neubauer G, Rick JM, Kuster B, Bork P, RussellRB, Superti-Furga G (2006) Proteome survey reveals modularity ofthe yeast cell machinery. Nature 440: 631–636

Guo X, Tatsuoka K, Liu R (2006) Histone acetylation andtranscriptional regulation in the genome of Saccharomycescerevisiae. Bioinformatics 22: 392–399

Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW,Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J,Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES,Gifford DK, Fraenkel E, Young RA (2004) Transcriptional regulatorycode of a eukaryotic genome. Nature 431: 99–104

Hu Z, Killion PJ, Iyer VR (2007) Genetic reconstruction of a functionaltranscriptional regulatory network. Nat Genet 39: 683–687

Jenuwein T, Allis CD (2001) Translating the histone code. Science 293:1074–1080

Kurdistani SK, Grunstein M (2003) Histone acetylation anddeacetylation in yeast. Nat Rev Mol Cell Biol 4: 276–284

Kurdistani SK, Tavazoie S, Grunstein M (2004) Mapping global histoneacetylation patterns to gene expression. Cell 117: 721–733

Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M(2004) Genomic analysis of regulatory network dynamics revealslarge topological changes. Nature 431: 308–312

Pokholok DK, Harbison CT, Levine S, Cole M, Hannett NM, Lee TI,Bell GW, Walker K, Rolfe PA, Herbolsheimer E, Zeitlinger J, Lewitter F,Gifford DK, Young RA (2005) Genome-wide map of nucleosomeacetylation and methylation in yeast. Cell 122: 517–527

Prochasson P, Florens L, Swanson SK, Washburn MP, Workman JL(2005) The HIR corepressor complex binds to nucleosomes

generating a distinct protein/DNA complex resistant toremodeling by SWI/SNF. Genes Dev 19: 2534–2539

Richmond TJ, Davey CA (2003) The structure of DNA in thenucleosome core. Nature 423: 145–150

Rundlett SE, Carmen AA, Suka N, Turner BM, Grunstein M (1998)Transcriptional repression by UME6 involves deacetylation oflysine 5 of histone H4 by RPD3. Nature 392: 831–835

Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D(2004) The database of interacting proteins: 2004 update. NucleicAcids Res 32: D449–D451

Seligson DB, Horvath S, Shi T, Yu H, Tze S, Grunstein M, Kurdistani SK(2005) Global histone modification patterns predict risk of prostatecancer recurrence. Nature 435: 1262–1266

Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB,Brown PO, Botstein D, Futcher B (1998) Comprehensive identificationof cell cycle-regulated genes of the yeast Saccharomyces cerevisiae bymicroarray hybridization. Mol Biol Cell 9: 3273–3297

Strahl BD, Allis CD (2000) The language of covalent histonemodifications. Nature 403: 41–45

Suka N, Suka Y, Carmen AA, Wu J, Grunstein M (2001) Highly specificantibodies determine histone acetylation site usage in yeastheterochromatin and euchromatin. Mol Cell 8: 473–479

Xie J, Pierce M, Gailus-Durner V, Wagner M, Winter E, Vershon AK (1999)Sum1 and Hst1 repress middle sporulation-specific gene expressionduring mitosis in Saccharomyces cerevisiae. EMBO J 18: 6448–6454

Xu F, Zhang K, Grunstein M (2005) Acetylation in histone H3 globulardomain regulates gene expression in yeast. Cell 121: 375–385

Yuan GC, Ma P, Zhong W, Liu JS (2006) Statistical assessment of theglobal regulatory role of histone acetylation in Saccharomycescerevisiae. Genome Biol 7: R70

Molecular Systems Biology is an open-access journalpublished by European Molecular Biology Organiza-

tion and Nature Publishing Group.This article is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Licence.

Modeling the regulatory network of histone acetylation in S. cerevisiaeH Pham et al

10 Molecular Systems Biology 2007 & 2007 EMBO and Nature Publishing Group