
AS and A Level Biology

TOPIC GUIDE: EPIGENETICS


Biology Topic Guide: Epigenetics

© Pearson Education Ltd 2015. Copying permitted for purchasing institution only. This material is not copyright free.

Introduction

This guide is intended to provide supporting material and background information for the following aspects of the new Edexcel A level Biology B specification.

7.1 Using base sequencing

● Understand what is meant by the term ‘genome’.

● Understand how base sequencing can be used to:

(a) analyse evolutionary patterns and to identify separate species

(b) predict the amino acid sequence of proteins and possible links to genetically determined conditions.

7.2 Factors affecting gene expression

● Know that transcription factors are proteins that bind to DNA.

● Understand the role of transcription factors in regulating gene expression.

● Understand how post-transcription modification of mRNA in eukaryotic cells (RNA splicing) can result in different products from a single gene.

● Know that gene expression can be changed by epigenetic modification, limited to DNA methylation, and that this is important in ensuring cell differentiation.

● Understand what is meant by a stem cell and how its totipotency provides opportunities to develop new medical advances.

● Understand how epigenetic modifications can result in totipotent stem cells in the embryo developing into pluripotent cells in the blastocyst and finally into fully differentiated somatic cells.

It assumes you are already familiar with the structure of DNA and RNA (including 5’ and 3’ ends) and the basics of gene transcription, translation and the genetic code.

The same material is also found in Edexcel A level Biology A, Topic 3 (The voice of the genome).

Acknowledgement: I am grateful to Robert Johnston for helpful discussion.

Andrew Read

July 2015


The genome

Your genome is the totality of your DNA – not just the protein-coding genes, but all the non-coding DNA within (introns) and between the protein-coding genes. It does not include all the various RNA species present in cells.

One of the surprising features of the human genome is how little of it is protein-coding – only about 1.2%. The same is true of the genomes of other higher organisms. About half of the rest is repetitive, comprising huge numbers of copies of certain short sequences whose function, if any, is mostly unknown. Much of the non-repetitive DNA is involved in regulating expression of the protein-coding sequences. Gene regulation is the subject matter of epigenetics.

DNA sequencing

The standard technique for identifying the sequence of nucleotides in a piece of DNA was developed by Dr Fred Sanger in Cambridge in the 1970s. It earned him a share of the 1980 Nobel Prize for Chemistry. It works by using a DNA polymerase enzyme to make copies of the DNA to be sequenced, but spiking the pool of individual nucleotides with a small amount of a chemically modified nucleotide (a dideoxy nucleotide) that will terminate growth of any copy in which it gets incorporated (Figure 1).

Figure 1: The principle of dideoxy (Sanger) sequencing of DNA. (a) DNA polymerase makes many copies of the test sequence by extending a specially designed primer oligonucleotide. Whenever by chance it incorporates a dideoxy nucleotide instead of the corresponding normal deoxy nucleotide, the chain terminates. Each dideoxy nucleotide is tagged with a different coloured molecule. (b) An automated sequencing machine uses electrophoresis to separate the reaction products by size. (c) It reads the colours and shows the sequence as a series of coloured peaks. From New Clinical Genetics, Read & Donnai, Scion Publishing 2015.
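The chain-termination idea in Figure 1 can be mimicked with a short simulation. This is only a sketch for classroom illustration: the function names, the number of copies and the 5% chance of picking up a dideoxy nucleotide at each step are assumptions made here, not values from the guide.

```python
import random

def sanger_fragments(new_strand, n_copies=20000, p_dideoxy=0.05, seed=1):
    """Simulate many polymerase copies of one strand.

    At every base added there is a small chance (p_dideoxy, an assumed value)
    that a dideoxy nucleotide is used, which terminates that copy.
    Returns {fragment_length: last_base}; the last base is what the
    coloured tag on the terminator would report.
    """
    rng = random.Random(seed)
    ends = {}
    for _ in range(n_copies):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < p_dideoxy:
                ends[length] = base  # chain stops here
                break
    return ends

def read_sequence(ends):
    """Mimic electrophoresis: sort fragments by size, then read the end bases."""
    return "".join(ends[length] for length in sorted(ends))

strand = "ATGGTGCATCTGACTCCTGAG"  # start of the beta-globin coding sequence
print(read_sequence(sanger_fragments(strand)))
```

With enough copies, a fragment ending at every position is produced, so reading the end bases in order of size should reproduce the strand being synthesised.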


Sanger’s method can give very accurate sequence of a DNA fragment up to around 800 base pairs in length. The Human Genome Project used Sanger sequencing (on banks of automated sequencing machines); it was necessary to piece together millions of short sequences in the computer to produce the overall 3200 million base pair human genome sequence. It took 15 years and cost around 3 billion dollars.

Starting around the year 2005, a number of revolutionary new DNA sequencing technologies became available. Different competing companies produced different methods, but all the so-called ‘Next-Generation Sequencing’ methods have in common that they sequence millions of random DNA fragments in parallel. Depending on the technology, the fragments may be fixed on nanobeads in arrays of tiny wells; they may be anchored in arrays to a solid surface, or they may be in arrays of nanopores in a membrane. Sequencing works by synthesis, like Sanger sequencing. In different technologies each nucleotide added generates a light signal or a pulse of hydrogen ions. Whatever the detailed technology, use of these methods has vastly increased the amount of DNA a lab can sequence, to the point that it is now possible to sequence an individual’s whole genome in a week for around £1,000. We are only beginning to see the impact of this new capability on the National Health Service.

Using genome sequence to define species

Classically, species are defined as groups of individuals able to interbreed and produce fertile offspring. That requires observation of their behaviour, and maybe experimental crosses. An alternative approach is to consider their genome sequence. This is not completely straightforward, because genome sequences vary between individuals of the same species – reflecting the fact that we are all different. But just as we can readily appreciate that all humans, despite their individual differences, are more similar to each other than to chimpanzees, so we can see from the DNA sequence that humans are one species and chimpanzees another.

An interesting example of defining species based on DNA sequence concerns the Wood White butterfly, Leptidea sinapis (Figure 2). This butterfly, rare in Britain though less so in Ireland, looks fairly similar to the common Small White (Pieris rapae), but can be readily distinguished by a trained eye. However, it has turned out that ‘Wood Whites’ actually comprise three species, L. sinapis, L. reali and L. juvernica, that can only be distinguished reliably by sequencing their DNA (Dincă et al., 2011).

Dincă, V. et al. Unexpected layers of cryptic diversity in wood white Leptidea butterflies. Nat. Commun. 2:324 doi: 10.1038/ncomms1329 (2011).

Analysing evolutionary patterns

When genome sequences of related species are compared, the degree of difference between each pair can be used to construct an evolutionary tree. One might use the DNA sequences of one or a few selected genes that are present in each species. Alternatively, the gene sequences can be translated to give the amino acid sequences of the proteins they encode. This approach is preferred for more distantly related species, because it ignores changes that simply convert one codon for an amino acid into another for the same amino acid (see below). Constructing a tree for real uses computer programs that apply elaborate statistical arguments (there is an example in the Dincă et al. paper mentioned above). Figure 3 shows a simple example.

Figure 2: Wood White butterfly © Davidtomlinsonphotos.co.uk


Figure 3: Comparison of the last 50 amino acids of the zeta-globin protein in six species. (a) The raw sequences, using 1-letter codes for the amino acids (see below). Dots show unchanged amino acids. (b) Tabulation of pairwise differences. For example, humans and chimps differ at 1 position out of 50, so the difference is 0.02. (c) Tree constructed from the data. You can see how human/chimp and mouse/rat form close couples; then chick is about equidistant from both, and zebrafish equidistant from all five. The distances can be used to estimate the time of divergence, but to do that properly requires heavy statistics and computing. From Human Molecular Genetics, Strachan & Read, Garland 2011.
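The tabulation in Figure 3(b) is simply the proportion of aligned positions at which two sequences differ. A minimal sketch of that calculation follows; the three 10-residue sequences and the species labels are invented placeholders for illustration, not the zeta-globin data of the figure.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def distance_table(sequences):
    """Pairwise differences for every pair of named sequences."""
    return {
        (name_a, name_b): p_distance(sequences[name_a], sequences[name_b])
        for name_a, name_b in combinations(sequences, 2)
    }

# Hypothetical aligned fragments (1-letter amino acid codes), made up for this sketch.
sequences = {
    "species_A": "MKVLSPADKT",
    "species_B": "MKVLSPADKS",
    "species_C": "MRVLTPEDKS",
}

table = distance_table(sequences)
for pair, distance in sorted(table.items(), key=lambda item: item[1]):
    print(pair, distance)
```

The pair with the smallest difference would be joined first when drawing a tree by hand; real tree-building programs use more elaborate statistics, as the caption notes.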

Possible teaching approach

Class could construct the table and tree from the data. Other examples can be found on the Web – some possibilities include:

http://www-tc.pbs.org/wgbh/evolution/educators/teachstuds/pdf/unit3.pdf

http://evolution.berkeley.edu/evolibrary/article/0_0_0/phylogenetics_01

http://serc.carleton.edu/sp/process_of_science/examples/73104.html

Predicting the amino acid sequence of proteins

When a protein-coding gene is expressed, the enzyme RNA polymerase synthesises an RNA molecule (messenger RNA) that is complementary to the sequence of one strand of the DNA (the template strand) and identical to the sequence of the other strand (the sense strand). Databases and publications always cite the sequence of the sense strand, written in the 5’ – 3’ direction (Figure 4).

Figure 4: From New Clinical Genetics, Read & Donnai, Scion Publishing 2015.


Possible teaching approach

Give the class a DNA sequence as conventionally written (you can get any number of real examples from http://www.ensembl.org/Homo_sapiens/Info/Index).

Ask them to write the complementary strand, in the conventional 5’ – 3’ direction. Then ask them to translate each strand using the table of the genetic code. The results are completely different, making the point about the sense strand and template strand.

An alternative would be to give them a sequence of the bases on a template strand and get them to predict the sense strand, the mRNA, the tRNA and the amino acid sequence. Then they should do it backwards to prove it produces a completely different amino acid sequence.

The messenger RNA (after splicing out any introns, see below) is ‘read’ by ribosomes. A ribosome attaches at the 5’ end of the mRNA and slides along until it encounters a start signal: the triplet AUG embedded in a suitable consensus sequence (known as the Kozak sequence). It then starts assembling a polypeptide chain, the choice of amino acid at each position being determined by a triplet of three consecutive nucleotides in the mRNA.

Individual amino acids are covalently attached to specific small RNA molecules, the transfer RNAs, by amino acid-activating enzymes that are specific for each type of transfer RNA. Three nucleotides on the transfer RNA base-pair with three nucleotides of the mRNA within a special pocket of the ribosome. When the ribosome encounters a stop codon it falls off the mRNA and releases the polypeptide it has been making. The genetic code (Figure 5) consists of unpunctuated non-overlapping triplets of nucleotides.

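UUU Phe (F)    CUU Leu (L)    AUU Ile (I)    GUU Val (V)
UUC Phe (F)    CUC Leu (L)    AUC Ile (I)    GUC Val (V)
UUA Leu (L)    CUA Leu (L)    AUA Ile (I)    GUA Val (V)
UUG Leu (L)    CUG Leu (L)    AUG Met (M)    GUG Val (V)

UCU Ser (S)    CCU Pro (P)    ACU Thr (T)    GCU Ala (A)
UCC Ser (S)    CCC Pro (P)    ACC Thr (T)    GCC Ala (A)
UCA Ser (S)    CCA Pro (P)    ACA Thr (T)    GCA Ala (A)
UCG Ser (S)    CCG Pro (P)    ACG Thr (T)    GCG Ala (A)

UAU Tyr (Y)    CAU His (H)    AAU Asn (N)    GAU Asp (D)
UAC Tyr (Y)    CAC His (H)    AAC Asn (N)    GAC Asp (D)
UAA STOP       CAA Gln (Q)    AAA Lys (K)    GAA Glu (E)
UAG STOP       CAG Gln (Q)    AAG Lys (K)    GAG Glu (E)

UGU Cys (C)    CGU Arg (R)    AGU Ser (S)    GGU Gly (G)
UGC Cys (C)    CGC Arg (R)    AGC Ser (S)    GGC Gly (G)
UGA STOP       CGA Arg (R)    AGA Arg (R)    GGA Gly (G)
UGG Trp (W)    CGG Arg (R)    AGG Arg (R)    GGG Gly (G)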

Figure 5: The genetic code. The corresponding DNA sequence in the sense strand has T instead of U. By writing out the nucleotide sequence of a protein-coding gene, you can predict the amino acid sequence of the protein it encodes.
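The sense-strand versus template-strand exercise can be checked by computer. The sketch below builds the standard genetic code of Figure 5 in its DNA (sense strand) form and translates both a sense strand and its complementary strand, each read 5’ – 3’. The function names are chosen here for illustration; the test sequence is the start of the beta-globin coding sequence.

```python
from itertools import product

# Standard genetic code, codons in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate codon by codon from the first base, stopping at a stop codon."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

sense = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTT"
template = reverse_complement(sense)

print("sense strand   :", translate(sense))
print("template strand:", translate(template))
```

The two translations share nothing in common, which is the point of the exercise: only the sense strand predicts the protein.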


Links to genetically determined disease

Replacing one nucleotide by another in a protein-coding gene can have one of three effects: a synonymous variant, a mis-sense variant or a nonsense variant (Figure 6).

(a) ATG GTG CAT CTG ACT CCT GAG GAG AAG TCT GCC GTT…

     M   V   H   L   T   P   E   E   K   S   A   V  …

(b) ATG GTG CAT CTG ACT CCT GAG GAG AAG TCA GCC GTT…

(c) ATG GTG CAT CTG ACT CCT GTG GAG AAG TCT GCC GTT…

(d) ATG GTG CAT CTG ACT CCT GAG TAG AAG TCT GCC GTT…

Figure 6: (a) The coding sequence for the start of the beta-globin gene, with the amino acids encoded. (b) A synonymous (same-sense) change that does not affect the amino acid encoded. (c) A mis-sense change, replacing glutamic acid with valine (this is the sickle cell variant; as is usually the case, the initial methionine is cleaved off during post-translation processing, so the variant can be described as Glu6Val). (d) A nonsense change, introducing a premature stop codon.
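The three outcomes in Figure 6 can be told apart by comparing the reference and variant sequences codon by codon. The following is a sketch under simple assumptions: both sequences start at the first base of the start codon, differ in a single codon, and are the same length. The function name is invented for this example. Codons are counted here from the initial ATG, so the sickle cell change appears at codon 7 rather than position 6 of the mature protein.

```python
from itertools import product

AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

def classify_substitution(reference, variant):
    """Return (kind, codon_number, reference_aa, variant_aa) for the first changed codon."""
    if len(reference) != len(variant):
        raise ValueError("a substitution leaves the sequence length unchanged")
    for start in range(0, len(reference) - 2, 3):
        ref_codon = reference[start:start + 3]
        var_codon = variant[start:start + 3]
        if ref_codon != var_codon:
            ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
            if var_aa == ref_aa:
                kind = "synonymous"
            elif var_aa == "*":
                kind = "nonsense"
            else:
                kind = "missense"
            return kind, start // 3 + 1, ref_aa, var_aa
    return "no change", None, None, None

# Sequences copied from Figure 6, spaces removed.
seq_a = "ATG GTG CAT CTG ACT CCT GAG GAG AAG TCT GCC GTT".replace(" ", "")
seq_b = "ATG GTG CAT CTG ACT CCT GAG GAG AAG TCA GCC GTT".replace(" ", "")
seq_c = "ATG GTG CAT CTG ACT CCT GTG GAG AAG TCT GCC GTT".replace(" ", "")
seq_d = "ATG GTG CAT CTG ACT CCT GAG TAG AAG TCT GCC GTT".replace(" ", "")

for label, variant in (("b", seq_b), ("c", seq_c), ("d", seq_d)):
    print(label, classify_substitution(seq_a, variant))
```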

Inserting or deleting one or more nucleotides has a more drastic effect: it alters the reading frame (a frameshift change) and so changes the entire amino acid sequence downstream of the change.

(a) ATG GTG CAT CTG ACT CCT GAG GAG AAG TCT GCC GTT…

     M   V   H   L   T   P   E   E   K   S   A   V  …

(b) ATG GTG CAA TCT GAC TCC TGA GGA GAA GTC TGC CGT T…

     M   V   Q   S   D   S  STOP G   E   V   C   R

Figure 7: (a) The wild-type beta-globin sequence. (b) Inserting a single nucleotide alters the entire message (and in this case introduces a premature stop codon).
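The frameshift in Figure 7 can be reproduced in a few lines. This sketch inserts a single A after the eighth base of the wild-type sequence, which gives the sequence shown in Figure 7(b), and translates both versions. The helper names are invented for this example; here stop codons are written as "*" rather than ending the translation, so that the shifted downstream codons remain visible.

```python
from itertools import product

AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

def translate_all(dna):
    """Translate every complete codon; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE[dna[start:start + 3]] for start in range(0, len(dna) - 2, 3)
    )

def insert_base(dna, position, base):
    """Insert one base after the first `position` bases of the sequence."""
    return dna[:position] + base + dna[position:]

wild_type = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTT"
frameshift = insert_base(wild_type, 8, "A")

print(translate_all(wild_type))
print(translate_all(frameshift))
# The polypeptide actually made ends at the first stop codon:
print(translate_all(frameshift).split("*")[0])
```

Students could vary the inserted base or its position and see that almost any single-base insertion scrambles everything downstream.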

Possible teaching approach

This readily lends itself to class exercises. To access endless examples, go to http://www.ensembl.org/Homo_sapiens/Info/Index; enter a gene or condition, e.g. ‘cystic fibrosis’, ‘Factor VIII’. From the list that appears, click on a promising looking gene; click on a transcript, then the ‘cDNA’ item on the top left.

Predicting the effect of a change on the protein encoded is fairly straightforward (and can be made the subject of many classroom exercises). Predicting the effect on the person carrying the variant is not at all straightforward. Some changes will have a major effect, like the sickle cell mutation. Some will slightly alter the structure or activity of the protein, maybe contributing a little to susceptibility or resistance to a common multifactorial (not monogenic) condition like diabetes or hypertension. Some will have no overt effect on the patient, even if there is a very major effect on the protein – some proteins are not important, or their role can be taken over by other proteins.

Moreover, supposing a sequence change makes an important protein non-functional, we can ask whether we can get by with a single functional copy (remember, we are diploid and have two copies of each autosomal gene), or whether 50% overall function is not sufficient. In the first case the condition will be recessive: carriers of one non-functional copy will be normal. Cystic fibrosis is an example. In the second case the condition will be dominant – for example, individuals with achondroplastic dwarfism have a single malfunctioning copy of the FGFR3 (fibroblast growth factor receptor 3) gene. Which of these alternatives happens depends on the detailed role of that particular protein in the cells where it is expressed.

The general conclusion is that without very detailed knowledge of the particular protein and its exact role in the biology of specific cells, it is impossible to predict the phenotypic effect of a DNA sequence change, however radical the effect may be on the encoded protein.

Introns, exons and splicing

In most genes in humans and other multicellular organisms, the protein-coding sequence is split into segments (exons) that are separated by non-coding sequence (introns). This arrangement was a complete surprise when first discovered in the late 1970s. Bacterial genes, which were the best understood genes at the time, do not have introns. It seems completely counter-intuitive. The number of exons in genes varies with no apparent logic (Figure 8). The average is around 8–10, but there are genes with no introns, and the record is held by the gene for the muscle protein titin, which has 362 exons.

Gene sizes also vary independently of the number of exons, because introns vary extremely widely in size, both within and between genes. Some introns are only a few dozen base pairs, some are more than 100 kilobases. In Figure 8, all the gene diagrams have been made to fit the box, but the real sizes vary widely: 1.43 kb for the insulin gene, 1.61 kb (beta-globin), 4.62 kb (HLA-A), 80.72 kb (phenylalanine hydroxylase) and 188.7 kb (CFTR, the gene mutated in cystic fibrosis).

Figure 8: Data from Ensembl

When a gene is transcribed, the RNA polymerase traverses the entire sequence, exons and introns, to make the primary transcript. This is then processed, within the nucleus, by being physically cut at exon-intron boundaries; the exons are spliced together to make the mature mRNA, and the introns are discarded. The machinery that does this, the spliceosome, is exceedingly complicated, incorporating five species of small RNAs and around 170 different proteins. Many transcripts can be spliced in more than one way – certain exons may be sometimes incorporated and sometimes skipped.

Alternative splicing is often tissue-specific, and the different splice isoforms may have clearly different functions. For example, some proteins exist in either a cell-surface form or a secreted form, depending whether an exon encoding a transmembrane domain is included in the final spliced mRNA.
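Splicing and exon skipping can be pictured as simple string operations on the primary transcript. The sketch below uses a made-up toy transcript (exons in upper case, introns in lower case); the sequences, exon numbering and function names are all invented for illustration and do not correspond to any real gene.

```python
def build_transcript(segments):
    """Join ('exon' | 'intron', sequence) segments into a primary transcript.

    Returns the transcript and the (start, end) coordinates of each exon.
    """
    transcript, exons, position = "", [], 0
    for kind, sequence in segments:
        if kind == "exon":
            exons.append((position, position + len(sequence)))
        transcript += sequence
        position += len(sequence)
    return transcript, exons

def splice(transcript, exons, skip=()):
    """Join the exons in order, leaving out any exon numbers listed in `skip`."""
    return "".join(
        transcript[start:end]
        for number, (start, end) in enumerate(exons, start=1)
        if number not in skip
    )

# Hypothetical three-exon gene.
segments = [
    ("exon", "AUGGCU"),
    ("intron", "guaagucag"),
    ("exon", "GGAUUC"),
    ("intron", "guacguuag"),
    ("exon", "AAAUGA"),
]
primary, exons = build_transcript(segments)

print(splice(primary, exons))            # all three exons included
print(splice(primary, exons, skip={2}))  # exon 2 skipped: a second isoform
```

One primary transcript gives two different mature mRNAs, which is how a single gene can yield more than one protein product.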


Alternative splicing is not a peculiar and exceptional event; it is quite normal. The average gene encodes about 5 different splice isoforms, and there are genes (neurexin B, for example) that encode over 1,000. This forces a significant extension to the one-gene-one-enzyme hypothesis of Beadle and Tatum.

Possible teaching approaches

1. Ask students to identify the parts of a gene (Figure 8)

Figure 9: [diagram: DNA → RNA → protein, with the 5’UT and 3’UT labelled]

Where is the:

• promoter

• transcription start site

• transcription termination site

• 5’ end of exon 3

• 3’ end of intron 2?

What are the 5’ and 3’ untranslated regions?

2. Ask groups of students to access a gene in Ensembl (url as above), and to report the number of exons, the number of different transcripts and the relation between them. Suitable simple genes are HBB (beta-globin) or GJB2 (connexin 26, mutated in about half of autosomal recessive profound childhood deafness). More complex genes could include CFTR (cystic fibrosis), BRCA1 (familial breast cancer) and PAX3 (mutated in the Waardenburg syndrome of hearing loss and pigmentary anomalies). The Ensembl entries include diagrams showing the exons of each transcript.

Factors affecting gene expression

Some genes are expressed in every cell of our body (so-called housekeeping genes) but most are not. Haemoglobin is made only by red cell precursors, keratins only in skin and hair; the ADH4 alcohol dehydrogenase gene is expressed only in liver cells. Tissue-specific gene expression is the key to our complexity, compared to simpler organisms. How is it achieved?

For a gene to be expressed, two things are necessary:

● the DNA must be accessible, not buried in densely packed chromatin

● sequence-specific DNA-binding proteins (transcription factors) must bind to the promoter, upstream of the sequence to be transcribed, to help recruit RNA polymerase.


These depend on the interactions of a complex set of players – sequence elements (promoters and enhancers), proteins (including transcription factors, DNA methyltransferases, histone-modifying enzymes and chromatin remodelling complexes), and a battery of small RNA species. The A level specification rather arbitrarily includes only DNA methylation; we include brief details of the other players here to provide context and depth.

Promoters

In order to transcribe a gene, the RNA polymerase must attach to the DNA just upstream of the transcription start site. This region is called the promoter. Binding is determined by the DNA sequence, but also by sequence-specific binding of a whole set of other proteins that together constitute the transcription initiation complex. Individual protein-DNA interactions may be quite weak, but they are cemented by protein-protein interactions (Figure 10). Some of those other proteins are present only in certain cells, and the many possible combinations are one route to tissue-specific gene expression.

Figure 10

Enhancers

Enhancers are promoter-like sequences that are located some way away from the gene they regulate. They can be upstream or downstream of the gene, and in some cases up to a million base pairs away. Like promoters, they bind a variety of proteins, many of them tissue-specific, and the DNA loops round to bring them into contact with the promoter (Figure 11). Many genes are controlled by a variety of different tissue-specific enhancers.

Figure 11


Transcription factors

Transcription factors are proteins that bind to promoters and enhancers. There are general transcription factors, present in every cell and part of the basal transcription machinery, and tissue-specific factors. These in turn are produced by genes that are themselves controlled by other transcription factors, allowing a cascade of regulatory effects. Acting in a combinatorial way, around 1000 transcription factors can exert subtle control over the expression of our 20–25 000 protein-coding genes.

DNA methyltransferases

These add methyl (-CH3) groups to DNA, specifically to the 5-position of cytosines that lie immediately upstream of guanines (so-called CpG dinucleotides, the p representing the phosphate joining adjacent nucleotides). 5-methyl cytosine base-pairs with guanine exactly the same as normal cytosine, but the methyl groups act as a signal to methyl DNA binding proteins, which in turn recruit other regulatory proteins.

Figure 12: From New Clinical Genetics, Read & Donnai, Scion Publishing 2015.

Histone modifying enzymes and chromatin remodelling complexes

Every diploid human cell nucleus contains 2 metres of DNA.

Possible teaching approach

Chemically adept students could work out the molecular weight of an A-T or G-C base pair, given the formulae of nucleotides (the answer is about 550). A diploid cell contains about 6 picograms (6 × 10⁻¹² g) of DNA. Students can use this, together with the Avogadro number (6 × 10²³), to work out the number of base pairs in a (diploid) cell. Having worked that out, or being given the figure of 6 × 10⁹, and knowing the spacing of base pairs is 0.34 nm (from the X-ray diffraction work of Rosalind Franklin), students can work out the length of DNA in a (diploid) cell. Given that a person consists of around 10¹³ cells, they can then work out the length of DNA in their body. If nothing else, this should give them practice in manipulating indices, and illustrate how thin the DNA double helix must be!
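For teachers who want to check the arithmetic, the calculation can be laid out as below. All input values are the rounded figures quoted above; the variable and function names are simply chosen for this sketch.

```python
AVOGADRO = 6e23               # particles per mole (rounded, as in the text)
BASE_PAIR_MASS = 550          # grams per mole of base pairs (approximate)
DNA_MASS_PER_CELL = 6e-12     # grams of DNA in a diploid cell
BASE_PAIR_SPACING = 0.34e-9   # metres between adjacent base pairs
CELLS_PER_PERSON = 1e13       # rough number of cells in a human body

def base_pairs_per_cell():
    """Mass of DNA -> moles of base pairs -> number of base pairs."""
    return DNA_MASS_PER_CELL / BASE_PAIR_MASS * AVOGADRO

def dna_length_per_cell():
    """Length in metres of the DNA in one diploid cell."""
    return base_pairs_per_cell() * BASE_PAIR_SPACING

def dna_length_per_person():
    """Length in metres of all the DNA in one body."""
    return dna_length_per_cell() * CELLS_PER_PERSON

print(f"base pairs per cell : {base_pairs_per_cell():.2e}")
print(f"DNA per cell        : {dna_length_per_cell():.2f} m")
print(f"DNA per person      : {dna_length_per_person():.2e} m")
```

The answers come out at roughly 6.5 × 10⁹ base pairs and a little over 2 metres of DNA per cell, consistent with the figures quoted in the text, and of the order of 10¹³ metres per person.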


The DNA needs to be tightly packaged to fit into the nucleus, and the first level of packaging is into nucleosomes. A nucleosome is an octamer of histones (small basic proteins whose positive charge gives them an affinity for the negatively charged phosphate groups of DNA). Each nucleosome contains two molecules each of histones H2A, H2B, H3 and H4, with 147 base-pairs of DNA wound round it. At the basic level, DNA is organised into a string of beads, nucleosomes, separated by variable lengths of spacer DNA.

Figure 13: Nucleosomes. Histone H1 is not part of the nucleosome, but binds the immediately adjacent DNA.

If a gene is to be expressed it must be in accessible chromatin. DNA that is wrapped up in nucleosomes (and especially when the string of beads is in turn tightly coiled in higher levels of packaging) is not accessible to RNA polymerase and the other DNA-binding proteins necessary to initiate transcription. Chromatin-remodelling complexes are large ATP-driven multiprotein complexes that control the positioning of nucleosomes along the DNA so as to make specific promoters available for transcription.

In nucleosomes, the histone molecules have protruding N-terminal ‘tails’ that can interact with other proteins. Different proteins bind to the histone tails to stimulate or inhibit transcription. Binding is controlled by covalent modifications to the histone tails. Specific enzymes tag particular amino acid residues in specific histones with methyl, acetyl and other groups to allow complex and flexible control of gene expression. There are ‘writers’ that apply the tags, ‘readers’ that bind in response to the tags, and ‘erasers’ that remove tags.

Regulatory RNAs

Our genomes encode a remarkable number of non-coding RNAs – that is, RNA molecules that are made by transcribing specific DNA sequences, but that are not messenger RNAs. Ribosomal RNA and transfer RNA are the best known examples, but in recent years we have seen an explosive growth in the number of other species identified. In fact, we have more genes for non-coding RNAs than for proteins. We don’t know what the function of all those RNAs is, but it is generally supposed that their primary role is, one way or another, to regulate the expression of protein-coding genes. Some have been shown to be involved in controlling chromatin structure, and hence gene expression.

You can see that controlling when and where a gene is expressed is immensely complicated and subtle. But this should not come as a surprise, given that we construct all the 200 or so different cell types of our bodies, and organise them into flexible tissues and responsive organs, using hardly more protein-coding genes than the nematode worm Caenorhabditis elegans uses to organise its 1000 cells into its 1 mm long body (around 22 000 in man, 19 000 in the worm).


Epigenetic memory

Epigenetics (literally ‘above genetics’) is about the mechanisms that allow cells to retain a memory of their particular patterns of gene expression, and to pass that memory on to daughter cells. In some cases the memory can be transmitted across generations, from parent to child, although it is quite controversial how general such transgenerational effects are in humans (they are better characterised in plants, in vernalisation for example). The epigenetic modifications themselves are the same DNA methylation and histone modifications that we have seen regulate transcription within a cell; the question is how epigenetic memory works.

The key to epigenetic memory lies in the DNA methyltransferases. Remember that these can methylate cytosines in CpG sequences – that is, cytosines immediately upstream of a guanine. In the DNA double helix, CpG will base-pair with GpC. But because the two strands are anti-parallel, reading in the standard 5’ – 3’ direction, opposite every CpG in one strand is a CpG in the other (Figure 13).

Figure 13: From New Clinical Genetics, Read & Donnai, Scion Publishing 2015.

We have three DNA methyltransferase enzymes. Two of them are responsible for de novo DNA methylation, adding methyl groups to CpG sequences that were previously unmethylated. The third, DNMT1, is the maintenance methylase. When a DNA molecule is replicated, the newly synthesised strands are initially completely unmethylated. However, DNMT1 then specifically methylates any CpG on a daughter strand that lies opposite a methylated CpG on the template strand. Thus the specific pattern of methylation is inherited from mother cell to daughter cells.
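The logic of maintenance methylation can be modelled very simply. In the sketch below a DNA duplex is reduced to two sets recording which CpG sites carry a methyl group on each strand; replication hands each daughter one old (marked) strand and one new (unmarked) strand, and a function standing in for DNMT1 copies the marks across. The data representation, the example sequence and the function names are all assumptions made for this illustration.

```python
def cpg_sites(top_strand):
    """Positions of the C in every CpG on the strand as written 5' to 3'."""
    return [i for i in range(len(top_strand) - 1) if top_strand[i:i + 2] == "CG"]

def replicate(duplex):
    """Semi-conservative replication.

    Each daughter duplex keeps one parental strand with its methyl marks
    and gains a new strand that is initially completely unmethylated.
    """
    top_marks, bottom_marks = duplex
    return (top_marks, frozenset()), (frozenset(), bottom_marks)

def dnmt1(duplex):
    """Maintenance methylation: methylate any CpG lying opposite a methylated CpG."""
    top_marks, bottom_marks = duplex
    restored = top_marks | bottom_marks
    return (restored, restored)

sequence = "ACGTTCGACGGA"            # hypothetical sequence with three CpG sites
sites = cpg_sites(sequence)
pattern = frozenset({1, 8})          # assume the first and third CpG are methylated
parent = (pattern, pattern)          # both strands carry the same marks

hemimethylated = replicate(parent)
daughters = tuple(dnmt1(duplex) for duplex in hemimethylated)

print("CpG sites            :", sites)
print("after replication    :", [tuple(sorted(s) for s in d) for d in hemimethylated])
print("after DNMT1          :", [tuple(sorted(s) for s in d) for d in daughters])
```

Both daughters end up with the parental pattern, and the unmethylated CpG stays unmethylated; without the maintenance step the marks would be diluted at each round of replication.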

Other mechanisms besides DNA maintenance methylation may contribute to epigenetic memory, since Drosophila flies do not methylate their DNA, yet can clearly regulate gene expression and maintain cell differentiation. This whole area is one of active research. Perhaps the basic question is which is the primary factor: DNA methylation, histone modification or something else? It appears that the various mechanisms reinforce one another by positive feedback. Methyl DNA-binding proteins recruit histone modifying enzymes, but modified histones recruit DNA methyltransferases. It seems possible that transcription factors play the key role in all of this, and that binding transcription factors may be the primary cause, setting all the other processes in train.

Stem cells

The cells of a very early embryo are totipotent, that is, they can differentiate to form all cell types of the fetus and adult, including the placenta. Later, at the blastocyst stage, when the embryo consists of a hollow ball of a few hundred cells, the dozen or so cells of the inner cell mass are pluripotent: they can develop into all cell types of the adult body, but not into the cells of the placenta and membranes. As development proceeds, cells become more specialised. Terminally differentiated cells do not normally divide; tissues are maintained by small populations of multipotent or unipotent stem cells. Stem cells can divide symmetrically, to produce two daughter stem cells, or asymmetrically, to produce one stem cell and one cell (a transit amplifying cell) that can divide rapidly and produce the terminally differentiated cells of a tissue.
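The difference between the two modes of stem cell division can be made concrete with a toy count. This is only a sketch: the function name, the starting numbers and the assumption that every transit amplifying cell doubles once per round are all illustrative choices, not figures from the guide.

```python
def next_round(stem, transit, mode):
    """Advance the cell counts by one round of division."""
    transit = transit * 2              # assumed: transit amplifying cells double
    if mode == "symmetric":
        return stem * 2, transit       # each stem cell gives two stem cells
    if mode == "asymmetric":
        return stem, transit + stem    # one stem cell plus one transit cell
    raise ValueError("mode must be 'symmetric' or 'asymmetric'")

for mode in ("symmetric", "asymmetric"):
    stem, transit = 1, 0
    for _ in range(3):
        stem, transit = next_round(stem, transit, mode)
    print(mode, "stem:", stem, "transit amplifying:", transit)
```

Under these assumptions symmetric division expands the stem cell pool, while asymmetric division keeps it constant and feeds a growing population of cells destined to differentiate.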

All this progression is the result of successive epigenetic modification of the genome. Many years ago, long before any of this was understood, CH Waddington put forward the idea of an ‘epigenetic landscape’. He conceived a model of a ball rolling down a tilted three-dimensional surface with hills and bifurcating valleys. As the ball rolls down, its options are limited to the valleys that open up from the particular valley it is currently occupying, and the further down the surface it rolls, the fewer its options are. As a model of the progressive epigenetic restriction of differentiation potency as embryonic development proceeds, it is very good.

In 2015 we can put flesh on Waddington’s concept. Each valley is defined by the battery of genes a cell expresses, and this depends on the transcription factors present (Figure 14). Among those genes are genes for further transcription factors, which in turn define the secondary valleys. Choices between valleys can depend on signals from the surrounding cells or medium, or they can be generated within a cell by asymmetric cell division, or simple chance. Transcription factors active in higher valleys may be actively turned off as differentiation proceeds, or they may be simply diluted out as the cells multiply. Replacing them may reverse differentiation (see below).

Figure 14

Possible teaching approach

All blood cell types (erythrocytes, lymphocytes, granulocytes, platelets and dendritic cells) are produced by descendants of a small population of multipotent haematopoietic stem cells in the bone marrow. This is a nice illustration of these principles (Figure 15).

Figure 15
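For a classroom exercise, the progressive narrowing of options in blood cell development can be sketched as a small lineage tree. The tree below is deliberately simplified: the two intermediate progenitor names are standard textbook labels added for this sketch rather than taken from the guide, and dendritic cells are left out because they can arise from more than one branch.

```python
# Simplified, illustrative lineage tree (assumed for this sketch).
LINEAGE = {
    "haematopoietic stem cell": ["myeloid progenitor", "lymphoid progenitor"],
    "myeloid progenitor": ["erythrocyte", "platelet", "granulocyte"],
    "lymphoid progenitor": ["lymphocyte"],
}

def possible_fates(cell):
    """Return the set of terminal cell types still reachable from `cell`."""
    children = LINEAGE.get(cell)
    if not children:
        return {cell}                  # already terminally differentiated
    fates = set()
    for child in children:
        fates |= possible_fates(child)
    return fates

for cell in ("haematopoietic stem cell", "myeloid progenitor", "lymphoid progenitor"):
    print(cell, "->", sorted(possible_fates(cell)))
```

Each step down the tree leaves a smaller set of reachable fates, which mirrors the ball in Waddington's landscape having fewer valleys open to it the further it rolls.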

Pluripotent stem cells are of great medical interest because, in principle, pluripotent cells from a patient could be grown and differentiated into any body cell type, and then used to replace damaged cells or tissues of the patient without any of the problems of rejection that complicate normal transplants.

The first human pluripotent stem cells were embryonic stem (ES) cells, obtained in the late 1990s by delicate and difficult manipulation of cells from the inner cell mass of blastocysts. These proved quite controversial, because in order to obtain them a human embryo had to be destroyed. The embryos used were spare ones from in vitro fertilisation clinics: the procedure normally produces more embryos than would be re-implanted, and the couple concerned might agree to donate the surplus for research.

Ideally, to avoid rejection, a patient should receive ES cells derived from their own cells. This gave rise to the idea of therapeutic cloning, in which a donated unfertilised egg is enucleated and the nucleus replaced by one from a somatic cell of the patient (the procedure that created Dolly the sheep). The egg would then be grown to the blastocyst stage and patient-specific ES cells obtained.

Because of the many practical and ethical difficulties, all this remained rather theoretical, until the discovery that differentiation could be reversed. If normal, differentiated, somatic cells are treated with a special cocktail of transcription factors, some of them revert to pluripotency. With appropriate culture conditions, the pluripotent cells can be multiplied in culture and then induced to differentiate into any desired cell type.

Development of these iPS (induced pluripotent stem) cells has opened the door to a new world of clinical possibilities. Patient-specific cells of any type might now be produced in the laboratory (neurons for a patient with Parkinson’s disease, blood cells for a patient with bone marrow failure, and so on) without any of the problems surrounding ES cells. Producing iPS cells is a highly skilled and uncertain business, and questions remain about the safety of introducing the derived cells into a patient: might some of them develop into tumours? Even so, the future looks exceedingly promising.
