[III] Genes, Genomics, and Chromosomes
• Eukaryotic gene structure, Cot analysis, Rot analyses, chromosomal organization of genes and noncoding DNA
• Genomics: Genome-wide analysis of gene structure and expression
• Structural organization of eukaryotic chromosomes
• Morphology and functional elements of eukaryotic chromosome
Molecular Definition of a Gene
• Definition of a “gene”: the entire nucleic acid sequence that is necessary for the synthesis of a functional gene product (polypeptide or RNA)
• A gene includes more than the nucleic acid sequence encoding the amino acid sequence of the protein (the coding region). It also includes:
  - The sequences required for the synthesis of an RNA transcript
  - The transcription-control regions (e.g., enhancers or silencers)
  - The sequences that specify 3’ cleavage and polyadenylation [poly(A)] sites, and splice sites
• Most genes are transcribed into mRNAs, but some are transcribed into other RNA molecules such as tRNA, rRNA and snRNA
Gene Expression in Prokaryotes and Eukaryotes
• Gene expression in prokaryotes takes place in a single compartment, but gene expression in eukaryotes takes place in multiple compartments in multiple stages
(Figure panels: Eukaryotes; Prokaryotes)
Eukaryotic Genes Produce Monocistronic mRNAs and Contain Lengthy Introns
• While prokaryotes produce polycistronic mRNA, eukaryotes produce monocistronic mRNA
• In the polycistronic mRNA, a ribosome binding site is present near the start site for each of the cistron, and translation can be initiated from each of these sites
• In eukaryotic mRNA, the 5’ cap directs the binding of the ribosome to the mRNA, and protein synthesis begins from the closest AUG codon. Furthermore, most mRNAs also possess poly(A) tails
• In eukaryotes, introns, which are often longer than exons, must be removed from the precursor mRNA (pre-mRNA) before it can direct protein synthesis. Some introns in human genes are as big as 17 kb. The median intron length is about 3 kb.
Comparison of Structures of the cDNA and Its Genomic Gene
The main differences between a cDNA and a genomic gene are:
• cDNA does not have introns
• cDNA does not have a regulatory/promoter sequence
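To make the cDNA/genomic comparison concrete, here is a minimal Python sketch that removes the promoter and introns from a toy genomic sequence. The sequences, the uppercase-exon convention, and the names `build_gene` and `splice` are hypothetical, chosen only for illustration. A real cDNA also carries untranslated regions, which this toy ignores.

```python
# Hypothetical toy gene: lowercase = promoter/introns, uppercase = exons.
PROMOTER, EXON1, INTRON1 = "ttgaca", "ATGGCC", "gtaagtttcag"
EXON2, INTRON2, EXON3 = "GATTAC", "gtgagtctag", "TGA"


def build_gene():
    """Return the genomic sequence and its exon intervals (0-based, end-exclusive)."""
    genomic, exons = "", []
    for segment in (PROMOTER, EXON1, INTRON1, EXON2, INTRON2, EXON3):
        if segment.isupper():  # exons are written in uppercase in this toy example
            exons.append((len(genomic), len(genomic) + len(segment)))
        genomic += segment
    return genomic, exons


def splice(genomic, exons):
    """Join the exon intervals in order, dropping the promoter and introns."""
    return "".join(genomic[start:end] for start, end in exons)


genomic, exons = build_gene()
cdna_like = splice(genomic, exons)
print(len(genomic), len(cdna_like))  # the spliced product is shorter than the gene
print(cdna_like)
```

The spliced sequence contains only exon sequence, which is why a cDNA is shorter than its genomic gene and lacks its regulatory region.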
Distribution of Uninterrupted and Interrupted Genes in Various Eukaryotes
• The majority of genes in yeast are uninterrupted
• Most genes in flies are interrupted by one or two introns
• Most genes in mammals are interrupted by many introns
Sizes of Genes in Various Organisms
• Yeast genes are short
• Genes in flies and mammals have a dispersed bimodal distribution extending to very long sizes
Sizes of Exons and Introns
• Exons coding for proteins usually are short, but introns usually range from very short to very long
Exons Introns
Simple Eukaryotic Transcription Unit
• In eukaryotes, some genes encode a single protein while others encode more than one protein
• This means that some genes have simple transcription units while others have complex transcription units. This slide shows a simple transcription unit
Complex Eukaryotic Transcription Unit
• There are three different ways to process the primary transcript of a gene to give rise to different mRNAs:
  - Using different splice sites to produce different mRNA species
  - Using alternative poly(A) sites to produce mRNAs with different 3’ exons
  - Using alternative promoters to produce mRNAs with different 5’ exons and the same 3’ exons
• Differential splicing of a precursor mRNA leads to the production of isoforms of the gene product
Kinetics of DNA Hybridization
• The rate of DNA annealing is proportional to the concentration of nucleic acid and time of hybridization
• $dC/dt = -kC^2$. Integrating between $C_0$ (the initial concentration) and $C$ (the concentration after time $t$) gives $C/C_0 = 1/(1 + kC_0t)$. When $C/C_0 = 1/2$, $C_0t_{1/2} = 1/k$
Suggested Reading:
1. Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Peterson et al. (2002) Genome Res 12: 795-807.
2. Repeated sequences in DNA. Britten and Kohne (1968) Science 161: 529-540
Kinetics of DNA Reassociation (Cot Analysis)
• Britten and Kohne (1968) studied genomic DNA sequences by measuring the kinetics of DNA reassociation (Assigned Reading: Repeated sequences in DNA)
• The rate of DNA reassociation depends on random collision of the complementary strands (i.e., on the concentration of DNA) and on the time available for collisions to occur
  $dC/dt = -kC^2$, where $k$ is the reassociation constant
  By integration: $C/C_0 = 1/(1 + kC_0t)$
  This indicates that the parameter controlling the reassociation reaction is the product of the initial DNA concentration and time ($C_0t$, written "Cot")
  $C/C_0 = 1/2 = 1/(1 + kC_0t_{1/2})$, so $C_0t_{1/2} = 1/k$
• $C_0t_{1/2}$ is the product of concentration and time required for 50% reassociation
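The second-order relation C/Co = 1/(1 + k·Cot) can be checked numerically. The Python sketch below is illustrative only: the function names and the value of k are assumptions, not values from the lecture.

```python
def fraction_single_stranded(cot, k):
    """C/Co after reassociation to a given Cot value (second-order kinetics)."""
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """Cot value at which half of the DNA has reassociated: Cot1/2 = 1/k."""
    return 1.0 / k


k = 0.25  # hypothetical reassociation constant, chosen only for illustration
half = cot_half(k)
print(half, fraction_single_stranded(half, k))

# The fraction remaining single-stranded falls as Cot increases.
for cot in (0.1 * half, half, 10 * half):
    print(cot, round(fraction_single_stranded(cot, k), 3))
```

Because the curve depends only on the product k·Cot, a sequence with a smaller k (a more complex, less repeated sequence) reaches 50% reassociation at a proportionally larger Cot.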
Reassociation Kinetics of Eukaryotic DNA
$\dfrac{C_0t_{1/2}\ \text{(DNA of any genome)}}{C_0t_{1/2}\ \text{(E. coli DNA)}} = \dfrac{\text{Complexity of any genome}}{4.2 \times 10^6\ \text{bp}}$
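The proportionality between Cot1/2 and complexity can be written as a one-line calculation. In this Python sketch the E. coli complexity (4.2 x 10^6 bp) comes from the slide. The function name and the two Cot1/2 values are hypothetical and serve only to show the arithmetic.

```python
ECOLI_COMPLEXITY_BP = 4.2e6  # E. coli standard quoted on the slide


def genome_complexity(cot_half_sample, cot_half_ecoli):
    """Complexity (bp) of a DNA component, scaled from the E. coli standard."""
    return ECOLI_COMPLEXITY_BP * cot_half_sample / cot_half_ecoli


# Hypothetical measurements made under identical conditions (illustration only).
sample_cot_half = 400.0
ecoli_cot_half = 4.0
print(genome_complexity(sample_cot_half, ecoli_cot_half))
```

A component whose Cot1/2 is 100 times that of E. coli DNA therefore has 100 times its complexity, provided both were measured under the same conditions.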
Calculating the Complexity of a Genome
• Non-repetitive DNA: present only once per genome; found in both prokaryotic and eukaryotic genomes
• Intermediate (moderately) repetitive DNA: repeated 10-1000 times per genome; dispersed throughout the genome in eukaryotes
• Highly repetitive DNA: short repetitive DNA (<100 bp) present up to 1 million times in the eukaryotic genome
• Larger genomes are not generated by increasing the number of copies of the same sequences present in smaller genomes; their greater size is due to the presence of more repetitive DNA
• Suggested Reading II: Initial sequencing and analysis of the human genome. Nature 409: 860-921, 2001. Finishing the euchromatic sequence of the human genome. Nature 431: 931-945, 2004.
Repetitive and Unique DNA Sequence in Eukaryotes
The Proportions of Different Sequence Components in Eukaryotic Genomes
• The absolute content of non-repetitive DNA increases with genome size but reaches a plateau at ~2-3 x 10^9 bp
• mRNA is typically derived from non-repetitive DNA sequence
• A significant part of the moderately repetitive DNA consists of transposons (sequences able to move around the genome)
Genomes of Many Organisms Contain Much Noncoding DNA
• Much of the DNA in many eukaryotic cells does not encode RNA or have any apparent regulatory function
  - Genome sizes: yeast, 12 Mb; fruit flies, 180 Mb; chicken, 1,300 Mb; human, 3,300 Mb
  - Many organisms considered simpler than humans have a higher DNA content than humans
• DNA sequence analysis revealed that the genomes of higher eukaryotes contain a large amount of noncoding DNA
• Gene rich region vs. gene desert region
Genome Size and Gene Numbers in Various Organisms
The number of genes in bacterial and archaeal genomes is proportional to the genome size
Relationship of Gene Number and Genome Size
• The number of genes in prokaryotes correlates well with the size of the genome
• The number of genes in eukaryotes does not correlate well with the size of the genome
Protein-Coding Genes
• Solitary genes: about 25-50 percent of the protein-coding genes are represented only once in the haploid genome
  - The chicken lysozyme gene is a 15-kb DNA sequence that constitutes a simple transcription unit with four exons and three introns
• Duplicated genes: genes with close but nonidentical sequences, often located within 5-50 kb of one another, that together are called a “gene family”
  - Gene family: a set of duplicated genes that encode proteins with similar but not identical amino acid sequences; each family may contain from a few to 30 or so members. Examples: cytoskeletal proteins, the myosin heavy chain, and the α- and β-globins
  - Protein family: the closely related, homologous proteins encoded by a gene family. Examples: protein kinases, vertebrate immunoglobulins, and olfactory receptors. Protein families include from just a few to 30 or more members
  - The genes encoding the β-like globins are a good example of a gene family; it contains five functional genes: β, δ, Aγ, Gγ, and ε
Total Number of Genes and Duplicated Genes
• In bacteria, most genes are unique, so the number of distinct families is close to the total gene number
• In eukaryotes, many genes are duplicated, and as a result the number of different gene families is much lower than the total number of genes
Proportions of Unique and Duplicated Genes
The proportion of unique genes drops sharply with genome size: bacteria have the highest proportion of unique genes, and the proportion is much lower in yeast, flies, worms, and Arabidopsis
Heavily Used Gene Products (rRNA and snRNA Genes) are Arranged in Tandem Repeat
• In vertebrates and invertebrates, the genes encoding rRNAs and some other noncoding RNAs such as snRNA are arranged in tandemly repeated arrays
• These tandemly repeated genes appear one after the other and encode identical or almost identical proteins or functional RNAs
• The tandemly repeated rRNA and snRNA genes are needed to meet the great cellular demand for their transcripts. Example: cells have 100 or more copies of the 5S rRNA genes
• Multiple copies of tRNA and histone genes are also present in clusters, but generally not in tandem repeats
A Tandem rDNA Gene Cluster
A tandem gene cluster of rRNA gene
Electron Micrograph of DNA Being Transcribed into RNA
• Green arrow indicates DNA and Red arrow indicates RNA
• This micrograph was taken by O.L. Miller, Jr, and Barbara R. Beatty at Oak Ridge National Lab showing the transcription of tandem repeat of rRNA genes in Xenopus oocytes
Non-Protein Coding Genes Encode Functional RNAs
• There are non-protein genes in the genome that encode functional RNAs. These RNAs are important in regulating the expression of genes
• Assigned Reading: The functional genomics of noncoding RNA. Mattick et al. (2005), Science 309: 1527-1528.
How Many Genes Are There in All Organisms?
• This slide shows the comparison of fly genes to those of the worm and yeast
• Orthologous genes (orthologs): genes that encode corresponding polypeptides in different organisms. Two gene products from different organisms are considered orthologs when their sequences are shared over >80% of their lengths
• In flies, ~20% of the genes have orthologs in both worm and yeast. These are likely to be required genes in all three organisms
• When fly genes are compared only with those of the worm, an additional 10% of genes have orthologs. This means that ~30% of the genes are likely required in both flies and worms
• The total number of proteins can be a good estimate of the total proteome size
Proportion of Protein Encoding Genes in Human Genome
• The human haploid genome comprises 22 autosomes plus the X and Y chromosomes; the chromosomes range from 45 to 279 Mb of DNA
• The total haploid genome size is 3,286 Mb (~3.3 x 10^9 bp)
• Euchromatin makes up the majority of the genome (~2.9 x 10^9 bp)
• Although protein-coding genes occupy about 25% of the human genome, the actual exons make up only ~1%
The Structure of an Average Human Gene
Different Classes of Repetitive DNA Sequences in the Human Genome
• Five classes of repetitive DNA sequences in the human genome:
  - Transposons: ~45% of the genome, multiple copies
  - Pseudogenes: ~3,000 in all
  - Simple-sequence repetitive DNA: ~3% of total DNA
  - Segmental duplications: blocks of 10 to 300 kb that have been duplicated, ~5%
  - Tandem repeats: blocks of one type of sequence
Genomic DNA of Eukaryotic Organisms
Classes of DNA                  #/genome      % of Human Genome
Protein-coding genes            ~25,000       55
Tandemly repeated genes
  U2 snRNA                      ~20           <0.001
  rRNA                          ~300          0.4
Repetitious DNA
  Simple-sequence DNA           variable      ~6
  Interspersed repeats          ~3.26         45
  Processed pseudogenes         1-~100        ~0.4
Unclassified spacer DNA         n.a.          25
Interspersed repeats: DNA transposons, LTR retrotransposons, and non-LTR retrotransposons (LINEs and SINEs)
Satellite DNAs
• When eukaryotic DNA is centrifuged on a CsCl gradient, two components are observed:
  - Main band: most of the genomic DNA
  - Satellite band: one or multiple minor bands; they can be heavier or lighter than the main band
• The main-band DNA has a buoyant density of 1.701 g/cm^3 with a G-C content of 42%, and the minor-band DNA has a buoyant density of 1.690 g/cm^3 with a G-C content of 30%
Satellite DNAs Lie in Heterochromatin
• Highly repetitive DNA (simple-sequence DNA): satellite DNA is characterized by a rapid rate of hybridization and consists of very short sequences repeated many times in tandem in large clusters. It is typically <10% of the genome
• In addition, multicellular eukaryotes have complex satellites with longer repeat units, mainly in heterochromatic regions
• In humans, α-satellite DNA consists of 171-bp repeats. The β-satellite DNA family has repeat units interspersed with longer 3.3-kb repeats
• Tandemly repeated DNA often has a distinct physical property that can be used to isolate it. This property is the buoyant density, which here is lower than the buoyant density of the non-repetitive DNA
• Therefore, by equilibrium centrifugation on a CsCl gradient, the satellite DNA can be separated from the non-repetitive DNA
• The buoyant density of a duplex DNA depends on its G-C content according to the following formula:
  Buoyant density = 1.660 + 0.00098 (% G-C) g/cm^3
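The formula above can be checked against the main-band and satellite-band values quoted earlier in this section (42% and 30% G-C). In this Python sketch the function names are illustrative, and the inverse function is a simple rearrangement of the same formula.

```python
def buoyant_density(gc_percent):
    """Buoyant density (g/cm^3) of duplex DNA in CsCl from its % G-C content."""
    return 1.660 + 0.00098 * gc_percent


def gc_percent_from_density(density):
    """Rearranged formula: estimate % G-C from a measured buoyant density."""
    return (density - 1.660) / 0.00098


for label, gc in (("main band", 42), ("satellite band", 30)):
    print(label, gc, round(buoyant_density(gc), 3))
```

The computed densities round to about 1.701 and 1.689 g/cm^3, in line with the 1.701 and 1.690 g/cm^3 quoted for the two bands.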
Most Simple-Sequence DNAs are Concentrated in Specific Chromosomal Locations
• Repetitious DNA is present in the genome of eukaryotic cells:
  - Simple-sequence DNA, also called satellite DNA (6% of the human genome), with repeat units of 14 to 500 bp
  - Microsatellites, with repeat units of 1-13 bp
  - Interspersed repetitive DNA, dispersed throughout the genome (also called transposable elements)
• By fluorescence in situ hybridization (FISH), the simple-sequence DNAs are localized near the centromeres and telomeres of mouse chromosomes
• Centromeric heterochromatin: necessary for separation of chromosomes to daughter cells
Diseases Associated with Microsatellites
• Microsatellites occasionally occur within transcription units
• At least 14 different types of neuromuscular disease are associated with microsatellite repeats in the transcription unit of a gene
• Myotonic dystrophy and spinocerebellar ataxia are examples. In myotonic dystrophy, the transcript of the DMPK (dystrophia myotonica protein kinase) gene contains 1,000 to 4,000 repeats of the sequence CUG in the 3’ untranslated region, which interfere with normal RNA processing and with export of the mature RNA from the nucleus to the cytosol
Probing Minisatellite DNA by Southern Blot Hybridization
• DNA samples from three different individuals were digested with the restriction enzyme HinfI, separated on agarose gels, transferred to nylon membranes, and probed with three different radiolabeled minisatellites
• A unique banding pattern was observed for each individual
• DNA fingerprinting depends on differences in the length of simple-sequence DNA
DNA Fingerprinting
• Minisatellite DNA: repeat units of 14 to 100 bp, arranged in regions of 1 to 5 kb that are made up of 20-50 repeat units
• Even a slight difference in the total length of the repeats can be detected by PCR analysis. This forms the basis of DNA fingerprinting
• This technique can be used in population studies, paternal or maternal identity test and criminal identification
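The length differences that fingerprinting detects follow from simple arithmetic: fragment length equals the flanking sequence plus the repeat unit length times the repeat count. The Python sketch below uses hypothetical numbers (a 33-bp unit, 100 bp of flanking DNA, and alleles with 20 and 22 repeats), picked from within the ranges quoted above.

```python
def fragment_length(flank_bp, unit_bp, n_repeats):
    """Length (bp) of a fragment spanning a tandem-repeat locus."""
    return flank_bp + unit_bp * n_repeats


# Hypothetical locus and alleles, for illustration only.
FLANK_BP = 100   # assumed total flanking sequence between primers or cut sites
UNIT_BP = 33     # assumed minisatellite repeat unit
alleles = {"individual A": 20, "individual B": 22}

lengths = {who: fragment_length(FLANK_BP, UNIT_BP, n) for who, n in alleles.items()}
for who, length in lengths.items():
    print(who, length)
```

Two extra repeat units change the fragment by 66 bp in this example, a difference that gel electrophoresis can resolve.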
Hybridization Kinetics of cDNAs to mRNAs
• The population complexity of mRNA isolated from a cell can be estimated by studying the kinetics of hybridization of mRNAs to their cDNAs
• The example below compares the mRNA population isolated from estrogen-treated trout liver with that of the untreated control:
  1. Isolate total RNA samples from the livers of estrogen-treated fish and control fish (RNAind and RNAunind)
  2. Prepare 32P-labeled cDNAind by reverse transcription
  3. Set up hybridization between 32P-cDNAind and RNAunind at different Rot values (RNA concentration x time)
  4. Determine the amount of hybridization by treating the hybridization mixture with S1 nuclease
Hybridization between mRNA and cDNA
• This slide shows the hybridization profile of excess mRNA of chick oviduct with the cDNA of chick oviduct
• 32P-labeled cDNA was synthesized from chick oviduct mRNA and hybridized to excess chick oviduct mRNA
• The result showed three components of cDNA, present at different frequencies, hybridizing to chick oviduct mRNA:
  - About 50% of the cDNA hybridizes at a Rot1/2 of 0.0015
  - About 15% of the cDNA hybridizes at a Rot1/2 of 0.04
  - About 35% of the cDNA hybridizes at a Rot1/2 of 30
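A three-component Rot curve like the one described can be sketched numerically. This Python example assumes pseudo-first-order kinetics for each component, the usual model when RNA is in large excess over cDNA. That model is an assumption added here and is not stated on the slide. The fractions and Rot1/2 values are the ones listed above.

```python
import math

# (fraction of cDNA, Rot1/2) for the three components listed above.
COMPONENTS = [(0.50, 0.0015), (0.15, 0.04), (0.35, 30.0)]


def fraction_hybridized(rot):
    """Fraction of cDNA in hybrid at a given Rot, summed over all components.

    Assumes each component follows 1 - exp(-ln2 * Rot / Rot1/2).
    """
    return sum(
        frac * (1.0 - math.exp(-math.log(2) * rot / rot_half))
        for frac, rot_half in COMPONENTS
    )


for rot in (0.0015, 0.04, 30.0, 3000.0):
    print(rot, round(fraction_hybridized(rot), 3))
```

Under this assumption the abundant component is half-hybridized long before the rare component starts to react, which is what separates the curve into three transitions.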
Rot Analysis of Excess mRNA and cDNA of Chick Oviduct Cells
• Total mRNA was isolated from chick oviduct cells
• 32P-cDNA was prepared from the total mRNA by reverse transcription
• Rot analysis was conducted between the radiolabeled cDNA and an excess of total mRNA
• The Rot analysis data showed three components of sequences hybridizing to the cDNA:
  - The first component has the characteristics of ovalbumin mRNA
  - The second component has a total complexity of 15 kb (7-8 different mRNAs of ~2,000 bases each)
  - The last component has a complexity of 26 Mb (~13,000 mRNAs)
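The mRNA counts quoted for the second and third components follow from dividing each component's complexity by an average mRNA length of about 2,000 bases. A short Python check (the function name is illustrative):

```python
AVERAGE_MRNA_LENGTH = 2000  # bases, the average length used on the slide


def mrna_species(complexity_bases, mean_length=AVERAGE_MRNA_LENGTH):
    """Approximate number of distinct mRNAs in a kinetic component."""
    return complexity_bases / mean_length


print(mrna_species(15_000))      # second component, 15 kb
print(mrna_species(26_000_000))  # third component, 26 Mb
```

The results, 7.5 and 13,000, match the "7-8 different mRNAs" and "~13,000 mRNAs" quoted above.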
cDNA of estrogen-treated oviduct RNA hybridize to un-treated oviduct RNA
Number of Expressed Gene Measured by DNA Microarray Analysis
• Although Rot analysis can reveal the complexity of the mRNA population in any cell type, the number of genes expressed in a cell type can be determined by DNA microarray analysis
• In this assay, the mRNA isolated from the cell type of interest is reverse transcribed into tagged (labeled) cDNA
• The labeled cDNA is hybridized to a DNA array that contains the entire set of genes of the organism of interest
• The genes that hybridize to the tagged cDNA can be visualized by scanning the array
• This slide shows results of DNA microarray analysis to determine expression of 12 genes in 59 individual breast tumor tissues of breastfed and breast-unfed women
• Genes highly expressed are shown “red”, lower expression in “blue”, equal expression in “grey”
• Genomics: Genome-wide analysis of gene structure and expression
Database of Genomes
• Using automated DNA sequencing techniques, methods for cloning DNA fragments on the order of 100 kb in length, and computer algorithms to piece together the stored sequence data, scientists have determined vast amounts of DNA sequence, including the entire genomes of humans and of many key experimental organisms, e.g., the roundworm (C. elegans), fruit flies, mice, medaka, and zebrafish
• Since the cost of sequencing a megabase of DNA is becoming very low, the genomes of many organisms are rapidly being determined
• There are two databases for the human genome:
  - GenBank at the National Institutes of Health in Bethesda, MD
  - The EMBL sequence database at the European Molecular Biology Laboratory in Heidelberg, Germany
Comparison of the Regions of Human NF1 Protein with Ira Protein of S. cerevisiae
• Ira, a GTPase-activating protein (GAP), modulates the GTPase activity of the monomeric G protein called Ras. Both GAP and Ras function to control cell replication and differentiation in response to signals from outside the cell
Structural Motifs
• When a protein shows no significant similarity to other proteins with the BLAST (basic local alignment search tool) algorithm, it may nevertheless share a short sequence that is functionally important. Such short sequences, recurring in many different proteins, are referred to as structural motifs
Comparison of Related Sequences from Different Species Can Give Clues to Evolutionary Relationships Among Proteins
• Paralogous: sequences that diverged as the result of gene duplication
• Orthologous: sequences that arose because of speciation
Genes Can be Identified within Genomic DNA Sequences
• By scanning for “open reading frames” (ORFs):
  - An ORF is defined as a stretch of DNA of at least 100 bp that begins with a start codon and ends with a stop codon of translation
  - ORF analysis has identified more than 90% of the genes in bacteria and yeast
  - Both very short genes and long genes are missed by this method
• For eukaryotic genes, because of the presence of multiple exons and introns, scanning for ORFs is not a good method to identify genes. One needs to use computer programs to compare the genomic DNA sequences with cDNA sequences, splice-site sequences, and the sequences of expressed sequence tags (ESTs)
• Another powerful method for identifying human genes is to compare the human genomic sequence with that of the mouse, since human and mouse are sufficiently related to have most genes in common
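The ORF scan described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a production gene finder: it reads only the three forward-strand frames, treats ATG as the only start codon, and uses the 100-bp minimum length as a default. The function name `find_orfs` and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_length_bp=100):
    """Return (start, end) pairs (0-based, end-exclusive) of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame through the next in-frame
    stop codon and must span at least min_length_bp.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3                    # include the stop codon
                if end - start >= min_length_bp:
                    orfs.append((start, end))
                start = None                   # look for the next ORF in this frame
    return orfs


# Hypothetical test sequence: ATG + 40 GCT codons + TAA, with short flanks.
toy = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"
print(find_orfs(toy))
print(find_orfs(toy, min_length_bp=200))
```

A complete scan would also read the reverse-complement strand. As noted above, this approach does not work well for intron-containing eukaryotic genes, because introns interrupt the reading frame.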
Comparison of the Gene Number and Type of Proteins Encoded in the Genomes of different Organisms
Structural organization of eukaryotic chromosomes
Questions?
• How are DNA molecules organized within eukaryotic cells?
  - The total length of cellular DNA is up to a hundred thousand times the cell’s length, so the packing of DNA is crucial to cell architecture
  - During interphase, DNA exists as a nucleoprotein complex called chromatin, dispersed throughout the nucleus
  - During mitosis, chromatin compacts further into metaphase chromosomes, which can be visualized under a microscope
Package of DNA in Microorganisms
• In viruses, the genomic DNA molecule is associated with protein molecules and packaged inside the viral capsid. In bacteria, the genomic DNA is associated with proteins and packaged as a compact mass in the center of the cell, called the “nucleoid”
Electron Micrographs of Extended and Condensed Chromatin
Extended form Condensed form
• Nucleosomes: when chromatin is isolated from nuclei at low salt concentration and without divalent cations (Mg2+), it resembles “beads on a string”. The beads are termed nucleosomes and the string is termed linker DNA
• The nucleosome is about 10 nm in diameter and is the primary structural unit of chromatin
Nuclear DNA Associate with Histones to form Chromatin
• When DNA from eukaryotic nuclei is isolated in an isotonic buffer (i.e., ~0.15 M KCl), it is associated with an equal mass of proteins (histones, which are basic proteins) as chromatin
• Five different histones are found in chromatin, namely H1, H2A, H2B, H3, and H4. The sequences of four histones (H2A, H2B, H3, and H4) are similar among different organisms, suggesting that these proteins fold into similar three-dimensional conformations
Four Classes of Histones
• Histones: Small basic chromosomal proteins rich in basic amino acids (lysine-rich and arginine-rich; positive charge)
Separation of Nucleosomes
• When the chromatin of nuclei is digested with micrococcal nuclease, discrete DNA fragments with definitive sizes can be recovered from the digested fraction
• When the digested materials were separated by gradient centrifugation, different size particles were isolated
• These particles are monomers, dimers, trimers and tetramers with different DNA size fragments
DNA fragments isolated from chromatin digested with micrococcal nuclease (limited digestion)
Individual Nucleosomes Released by Digestion of Chromatin with Limited Amounts of Micrococcal Nuclease
(Scale bar: 100 nm)
Structure of Nucleosomes
• The DNA in nucleosomes is less susceptible to nuclease digestion than the DNA in the linkers
• By controlling the digestion with nuclease, free nucleosomes can be isolated
• A nucleosome consists of a protein core with DNA wrapped around its surface like thread around a spool
• The protein core is an octamer containing two copies each of H2A, H2B, H3, and H4
• Nucleosomes from all eukaryotes contain 147 bp of DNA wrapped slightly less than 2 turns around the protein core
• The length of the linker DNA is variable, ranging from 8 to 114 bp; H1 associates with the linker DNA
Nucleosome
Dimer and Monomer of Nucleosomes
• Mononucleosomes typically have ~200 bp of DNA. End trimming reduces the DNA to ~165 bp
• The core particles have a DNA fragment of ~140 bp
• The linker DNA between two nucleosomes varies from 8 to 114 bp
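The numbers in this section combine into a quick estimate of nucleosome spacing and count. The Python sketch below uses the 147-bp core and the 8-114 bp linker range from these slides, plus the ~200 bp per mononucleosome and the ~3.3 x 10^9 bp haploid human genome size quoted in these notes. The function names are illustrative, and a uniform repeat length is a simplifying assumption.

```python
CORE_DNA_BP = 147            # DNA wrapped around the histone octamer
LINKER_RANGE_BP = (8, 114)   # linker range quoted on the slides


def repeat_length(linker_bp):
    """Nucleosome repeat length (bp): core DNA plus one linker."""
    return CORE_DNA_BP + linker_bp


def nucleosome_count(genome_bp, repeat_bp):
    """Approximate number of nucleosomes, assuming one uniform repeat length."""
    return genome_bp // repeat_bp


shortest, longest = (repeat_length(linker) for linker in LINKER_RANGE_BP)
print(shortest, longest)

# Rough estimate for one haploid human genome at ~200 bp per nucleosome.
print(nucleosome_count(3_300_000_000, 200))
```

On these assumptions the repeat length spans 155 to 261 bp, and one haploid human genome would be packaged into roughly 16.5 million nucleosomes.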
Beads-on-a-String Structure of Chromatin
a, In the presence of histone H1
b, In the absence of histone H1
Structures of Nucleosome
• The mononucleosome is a 10-nm particle that contains 200 bp of DNA and a histone octamer (two copies each of H2A, H2B, H3, and H4)
• DNA occupies most of the outer surface of the nucleosome
• Sequences on the DNA that lie on different turns around the nucleosome may be close together
Structures of the Four Core Histones
• Histones H2A and H2B each have two short α-helical regions and one long α-helical region. These regions can form a special fold
• Similarly, histones H3 and H4 also have two short α-helical regions and one long α-helical region. These regions also form a special fold, as shown in the next slide
The histone fold is formed by the three α-helical regions of the core histones (a). The histone fold regions of two histone molecules allow them to associate to form a heterodimer (b)
Histone Fold
Structure of the Nucleosome
(a) Left: top view of the nucleosome; right: side view of the nucleosome
(b) Model of a nucleosome viewed from the top, with the histones shown as a ribbon diagram
Interaction of Histone 1 with Nucleosome
Histone H1 interacts with the central gyre of the DNA at the dyad axis, as well as with the linker DNA at either the entry or the exit
Histone Tails
The N- and C-termini of histones H2A, H2B, H3, and H4 project out from the core of the nucleosome. These regions are termed histone tails
• Histone tails are chemically modified by acetylation, methylation, phosphorylation, and ubiquitination to form the histone code
• The lysine residues in the histone tails of H3 and H4 can undergo reversible acetylation and deacetylation. Acetylation of lysine prevents the chromatin from condensing
• Histone tails can also associate with other chromosomal proteins and thus affect transcription and DNA replication; this interaction can be affected by acetylation of lysine or by methylation of lysine and arginine in the histone tails
• Phosphorylation of serine residues on histones is another modification of histone tails
Modification of Histone Tails
Acetylation and Deacetylation of Histones
The enzymes responsible for acetylation of histones are histone acetyltransferases (HATs) [Gcn5 N-acetyltransferases, the p300/CBP family, and the MYST family]
The enzymes responsible for deacetylation of histones are histone deacetylases (HDACs)
The level of acetylation of the N-termini of histones is controlled by the balance between HAT and HDAC activity
Acetylation of Lysine Residues on Core Histones
Acetylation and Methylation of H3 and H4 Histones
The enzyme responsible for methylation is a methylase (histone methyltransferase)
The methyl group can also be removed by specific enzymes
Lysine 9 in H3 can be either acetylated or methylated
Methylation of lysine 9 in H3 can inhibit the acetylation of lysine 14
Histone Modifications
• Lysine ε-amino groups can be methylated several times, which prevents acetylation and thus maintains their positive charge
• Arginine side chains can also be methylated
• Serine and threonine side chains can be reversibly phosphorylated, introducing a negative charge
• A single 76-amino acid ubiquitin molecule can be reversibly added to a lysine in the C-terminal tails of H2A and H2B. Addition of ubiquitin could reduce the positive charge of H2A and H2B
Overall Modifications of Histone Tails
Important Terms
• Histone code: the pattern of acetylation, methylation, phosphorylation, ubiquitination, and sumoylation of the histone tails. The pattern of modification affects the activity of the genes on the chromatin
• Changes in the charge of the histones resulting from histone modification also affect the binding of non-histone proteins to chromatin. This is essential for gene expression
• Bromodomain: a protein domain, found in many transcription factors, that recognizes acetylated lysine residues in the histone tails
• Chromodomain (chromatin organization modifier domain): a protein structural domain of about 60 amino acid residues, commonly found in proteins associated with the remodeling and manipulation of chromatin
• Chromoshadow domain: a protein domain that is distantly related to the chromodomain. Proteins containing a chromoshadow domain include Su(var)205 (HP1) and mammalian modifier 1 and modifier 2
• PHD finger: the Cys4-His-Cys3 motif of HAT3. It is related to epigenetics
• TUDOR domain: a protein domain that recognizes methylated histones
Sites on Histone Modification and Functions
Most modified sites in histones have a single, specific type of modification, but some sites have more than one type of modification
Structure of Condensed Chromatin
• When chromatin is extracted from cells in isotonic buffers, it appears as fibers ~30 nm in diameter
• Nucleosomes in this type of chromatin are packaged into an irregular spiral or solenoid arrangement (about 6 nucleosomes per turn)
• H1 histone is associated with each nucleosome
• Electron microscopic observation revealed that the 30 nm fiber is less uniform than the perfect solenoid
• Condensed chromatin may be very dynamic with regions occasionally unfolding and then refolding into solenoid structure
• Chromatin in chromosomal regions that are not being actively transcribed exists in condensed fiber form or in higher-order folded structures
Condensed chromatin in 30 nm fiber structure.
The solenoid model for the structure of the 30 nm chromatin fiber
Structure of the 30-nm Chromatin Fiber
• The structure of chromatin is highly conserved in different organisms
• The amino acid sequences of four histones (H2A, H2B, H3 and H4) are highly conserved between distantly related species
• The amino acid sequence of H1 varies more from organism to organism
• The similarity in histone structures suggests that they fold into very similar 3-dimensional conformations which were optimized for histone function early in evolution in a common ancestor of all modern eukaryotes
H1 and Other Modified Histones on the Formation of 30 nm fiber
(a) The attraction between the positive charge of H4 and the negative charge of H2A and H2B results in closer packing of the two nucleosomes. (b) Acetylation of lysine 16 (K16) removes the positive charge at this position and neutralizes the “+”/“-” attraction between the two nucleosomes
The globular region of the H1 (in pink color) interacts with the linker DNA as it exits the nucleosome and changes its path to produce a more compact structure
Further Condensation of Chromatin Fiber
The 30 nm chromatin fiber can be thrown into a series of loops, each of approximately 50,000-200,000 bp in size, producing a looped fiber with diameter of approximately 300 nm
Nonhistone Proteins, a Structural Scaffold for Long Chromatin Loops
• In addition to histones, non-histone proteins are also involved in organizing chromosome structure
• Chromosome scaffold: nonhistone proteins associated with the metaphase chromosome
• The shape of the scaffold is maintained even after the DNA on the metaphase chromosome is removed by DNase digestion
The loops of the 30 nm fiber are attached to the nuclear scaffold via the matrix attachment regions (MARs)
An individual loop can change its structure from the 30 nm fiber to the beads-on-a-string form, allowing transcription to occur. Such regions can be detected by their sensitivity to limited digestion by DNase I
Is DNA attached to the scaffold via specific sequence?
• MARs (matrix attachment regions): DNA regions in chromatin that attach to the scaffold proteins; they are also called SARs (scaffold attachment regions)
• The figure in this slide shows how MARs can be isolated
• DNA sequence analysis revealed no consensus sequence among the DNAs that bind to the scaffold matrix, except that the DNA is ~70% A-T rich
• Furthermore, MARs have been found to contain cis-acting elements that regulate transcription, as well as topoisomerase II recognition sites, suggesting that MARs may provide sites for topological change in DNA
Heterochromatin
• Heterochromatin: a region of chromatin that does not uncoil after mitosis. It is a darkly staining area of the chromatin
• In mammalian cells, heterochromatin appears as darkly staining regions of the nucleus, often associated with the nuclear envelope
• Pulse-labeling experiments with 3H-uridine followed by autoradiography showed that most transcription occurs in regions of euchromatin and in the nucleolus
• In general, heterochromatic regions are sites of inactive genes; however, some transcribed genes have been located in regions of heterochromatin. Not all inactive genes and non-transcribed regions of DNA are visible as heterochromatin
Heterochromatin Versus Euchromatin
Bone marrow stem cell
Dark stained regions show heterochromatin and the light stained regions show euchromatin
The modifications of histone N-terminal tails in the heterochromatin and euchromatin of the histone H3
Probing Nontranscribed Genes from Transcribed Genes
• Transcribable genes are sensitive to limited digestion by DNase I
• Nuclei from 14-day chicken embryo erythroblasts and from undifferentiated chicken lymphoblastic leukemia cells were exposed to increasing amounts of DNase I, and DNA was isolated from the digested nuclei
• The DNA was digested with BamHI, separated on an agarose gel, transferred to a nylon membrane, and probed with the 4.5 kb globin gene fragment
Model for the Formation of Heterochromatin by Binding to Histone H3 Trimethylated at Lysine 9
• HP1 (heterochromatin protein 1) contributes to the condensation of heterochromatin by binding to the N-terminal tail of histone H3 once lysine 9 is trimethylated
• HP1 molecules bound to histone H3 then associate with one another (HP1 oligomerization), causing chromatin aggregation and condensation
• Heterochromatin condensation can spread along a chromosome because HP1 binds a histone methyltransferase (HMT) that methylates lysine 9 of histone H3. This creates a binding site for HP1 on the neighboring nucleosome. The spreading process continues until a “boundary element” is encountered
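The spreading model above can be pictured as a walk along a row of nucleosomes that stops at boundary elements. The following is a minimal sketch under simplifying assumptions: nucleosomes are list positions, spreading moves one neighbor at a time in both directions, and a boundary element is just a blocked index. The function name and the toy numbers are hypothetical.

```python
def spread_heterochromatin(n_nucleosomes: int, seed: int, boundaries: set) -> list:
    """Return per-nucleosome H3K9me3 state after spreading from `seed`.

    Spreading proceeds nucleosome by nucleosome in both directions and
    stops at the first boundary element met on each side (assumed model).
    """
    if not 0 <= seed < n_nucleosomes:
        raise ValueError("seed must index a nucleosome")
    marked = [False] * n_nucleosomes
    if seed in boundaries:
        return marked  # nothing spreads from a boundary position
    marked[seed] = True
    for direction in (-1, 1):
        pos = seed + direction
        # HP1-bound HMT methylates the neighbor, which then recruits HP1.
        while 0 <= pos < n_nucleosomes and pos not in boundaries:
            marked[pos] = True
            pos += direction
    return marked


state = spread_heterochromatin(10, seed=4, boundaries={1, 7})
picture = "".join("H" if m else "-" for m in state)
print(picture)  # --HHHHH---
```

With boundaries at positions 1 and 7, only nucleosomes 2 through 6 become heterochromatic; removing a boundary from the set lets the mark run to the end of the array on that side.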
Chromatin Contains Small Amounts of Nonhistone Proteins
• Besides histones and scaffold proteins, chromatin also contains small amounts of nonhistone proteins
• High mobility group (HMG) proteins: proteins that bind to transcription factors, thereby stabilizing the transcription factor complex that regulates gene expression. In yeast, removal of HMG genes disturbs the expression of other genes all over the genome
• DNA binding transcription factors: Regulate the transcription of genes
Model for the Folding of the 30-nm Chromatin Fiber in a Metaphase Chromosome
Model for the Packing of Chromatin and the Chromosome Scaffold in Metaphase
Overview of the Structure of Genes & Chromosomes
Eukaryotic Chromosomes Contain One Linear DNA Molecule
• The largest DNA molecules of lower eukaryotes can be extracted intact from cells, and their sizes indicate that each chromosome contains a single DNA molecule
– DNA molecules (2.3 × 10^5 to 1.5 × 10^6 bp) from S. cerevisiae can be separated by pulsed-field gel electrophoresis
– Drosophila genomic DNA (6 × 10^7 to 1 × 10^8 bp) can be readily analyzed
– The largest human chromosomal DNA (2.8 × 10^8 bp) is too large to be extracted as an intact molecule
• In summary, a eukaryotic chromosome is a linear structure composed of an immensely long, single DNA molecule that is wound around histone octamers about every 200 bp, forming strings of closely packed nucleosomes. The nucleosomes fold to form 30-nm chromatin fibers. The fibers attach to scaffold proteins to form loops. In addition, thousands of transcription factors and HMG proteins are also present
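To give a feel for the scale implied by these figures, the short calculation below divides each chromosomal DNA size quoted above by the ~200 bp nucleosome repeat. It is a rough back-of-the-envelope sketch: the dictionary labels and the function name are made up for illustration, and it assumes the whole molecule is packaged at one nucleosome per 200 bp.

```python
BP_PER_NUCLEOSOME = 200  # approximate repeat length quoted in the summary

# Sizes in bp as quoted on the slide; the labels are illustrative.
dna_sizes_bp = {
    "S. cerevisiae (smallest)": 230_000,
    "S. cerevisiae (largest)": 1_500_000,
    "Drosophila (smallest)": 60_000_000,
    "Drosophila (largest)": 100_000_000,
    "Human (largest)": 280_000_000,
}


def nucleosome_count(length_bp: int, repeat_bp: int = BP_PER_NUCLEOSOME) -> int:
    """Approximate number of nucleosomes on a DNA molecule of this length."""
    return round(length_bp / repeat_bp)


for name, size in dna_sizes_bp.items():
    print(f"{name}: about {nucleosome_count(size):,} nucleosomes")
```

On these assumptions the largest human chromosome carries on the order of 1.4 million nucleosomes, compared with roughly a thousand for the smallest yeast chromosome.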
Morphology and functional elements of eukaryotic chromosome
Microscopic Appearance of a Typical Metaphase Chromosome
• Colchicine or colcemid: compounds that disrupt microtubules, leaving the two sister chromatids attached together in metaphase
• Karyotype: number, size and shapes of metaphase chromosomes
Karyotypes of Human Chromosomes
• In non-dividing cells, chromosomes are not visible
• During mitosis or meiosis, chromosomes condense and become visible by light microscopy
• During metaphase of mitosis, each chromosome consists of two sister chromatids attached at the centromere
Giemsa Staining of Chromosomes
• G bands: Giemsa staining of human chromosomes gives chromosome-specific banding patterns (G-banding)
• G-bands correspond to large regions of the human genome that have low “G+C” content
• R Bands: R bands are produced by treating human chromosomes with hot alkaline solution and subsequent staining with Giemsa reagent. The pattern of R-bands is opposite to the pattern of G-bands
• Cytogeneticists use R-bands and G-bands to identify chromosome aberrations
• Chromosome painting: visualizing chromosomes by fluorescence in situ hybridization (FISH) with fluorescent probes. It can be done in a single color or in multiple colors
Using G-Banding and Multicolor FISH to Reveal Translocation
A translocation between chromosome 9 and chromosome 22 produces the Philadelphia chromosome, which is found in nearly all chronic myelogenous leukemia patients
Banding on Drosophila Polytene Salivary Gland Chromosomes
Band revealed by in situ hybridization
• Polytene chromosomes arise by DNA amplification in which the daughter chromosomes do not separate.
Interphase Polytene Chromosomes in the Salivary Gland of Drosophila melanogaster Arise by DNA Amplification
Functional Elements Required for Replication and Stable Inheritance of Chromosomes
• Although chromosomes differ in length and number between species, the chromosomes behave similarly at the time of cell division
• Three functional elements are required for any eukaryotic chromosome to replicate and segregate correctly: replication origins, the centromere, and two telomeres
• Experiments described in the next few slides are designed to demonstrate the importance of these functional elements
Yeast Transfection Experiment
ARS (autonomously replicating sequence) is required for DNA replication in yeast
Yeast Transfection Experiment
CEN (Yeast centromere sequence) is required for proper segregation
Yeast Transfection Experiment
TEL (telomere sequence) is required for replication of linear chromosomal DNA
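The three transfection results can be condensed into a simple decision rule. The sketch below is only a restatement of the slide conclusions as code, not a model of the actual experiments: the function name, argument names, and outcome labels are hypothetical, and it assumes that TEL matters only when the transfected DNA is linear.

```python
def predict_plasmid_fate(has_ars: bool, has_cen: bool, has_tel: bool,
                         linear: bool) -> str:
    """Qualitative fate of a transfected DNA in yeast (illustrative labels)."""
    if not has_ars:
        # Without a replication origin the DNA is not copied at all.
        return "not replicated"
    if linear and not has_tel:
        # Assumed: linear DNA without telomeres is not stably maintained.
        return "linear DNA not maintained"
    if not has_cen:
        # Replicates, but daughter cells do not inherit it reliably.
        return "replicated, poor segregation"
    return "replicated, proper segregation"


# Circular plasmid with ARS only, then with ARS + CEN.
print(predict_plasmid_fate(True, False, False, linear=False))
print(predict_plasmid_fate(True, True, False, linear=False))
# Linearized ARS + CEN DNA without and with TEL at both ends.
print(predict_plasmid_fate(True, True, False, linear=True))
print(predict_plasmid_fate(True, True, True, linear=True))
```

The order of the checks mirrors the order of the experiments: replication first, then end maintenance for linear DNA, then segregation.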
Comparison of CEN Sequence between Yeast and Drosophila
• Centromeres from yeast and Drosophila vary greatly in length
• Region I and region III are short, and their sequences are conserved
• Region II, although variable in sequence, is fairly constant in length and is rich in AT content
• While regions I and III bind to about 30 proteins and also bind to microtubules of the spindle apparatus during mitosis, region II is bound to a nucleosome in which H3 has been replaced by a variant form of H3 (e.g., CENP-A in humans)
Yeast Artificial Chromosomes Serve as Cloning Vector to Clone Megabase DNA Fragments
• A yeast artificial chromosome (YAC) consists of yeast TEL sequences, a yeast CEN and ARS, a selection marker, and the DNA to be cloned, making up more than 50 kb
• Only 1 daughter cell out of 1,000 to 10,000 fails to receive an artificial chromosome
• The successful propagation of YACs and the studies presented earlier strongly support the conclusion that yeast chromosomes, and probably all eukaryotic chromosomes, are linear double-stranded DNA molecules containing special regions that ensure replication and proper segregation
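The quoted loss rate can be turned into a rough estimate of how stable a YAC is over many divisions. The sketch below assumes that each division loses the YAC independently with the same probability and that no selection is applied; the function name and the choice of 20 generations are illustrative.

```python
def fraction_retaining(loss_per_division: float, generations: int) -> float:
    """Expected fraction of a lineage still carrying the YAC.

    Assumes independent loss at each division and no selection.
    """
    if not 0.0 <= loss_per_division <= 1.0:
        raise ValueError("loss_per_division must be a probability")
    return (1.0 - loss_per_division) ** generations


# Loss rates quoted on the slide: 1 in 1,000 to 1 in 10,000 daughter cells.
for loss in (1e-3, 1e-4):
    kept = fraction_retaining(loss, generations=20)
    print(f"loss {loss:g} per division: {kept:.4f} retained after 20 generations")
```

Even at the higher loss rate, about 98% of cells in a lineage would still carry the YAC after 20 generations under these assumptions.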
Action of Telomerase to Prevent Shortening of Chromosomes
• Telomeres of several organisms have been shown to contain repetitive oligomers with a high G content in the strand whose 3’ end is at the end of the chromosome. The repeat sequence in humans and other vertebrates is TTAGGG
• The repeats span a total of a few hundred bp in protozoans and several thousand bp in vertebrates
• The region is bound by specific proteins that protect the ends of the linear chromosomes from exonuclease digestion
• Synthesis of DNA on the lagging strand cannot reach completion as it does on the leading strand, which results in shortening of the chromosomes. Telomerase adds telomeric repeats to the 3’ end, allowing the missing lagging-strand sequence to be filled in and thus maintaining the proper length of the chromosome
• Reading List:
– Maintenance of chromosomes by telomeres and telomerase. The Nobel Prize in Physiology or Medicine 2009
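As a small illustration of the telomere repeat structure described above, the sketch below counts how many tandem copies of TTAGGG sit at the 3’ end of a sequence. The function name and the toy sequence are assumptions made for this example; real telomeres contain thousands of base pairs of repeats and some variant repeats that this exact-match count would ignore.

```python
REPEAT = "TTAGGG"  # telomere repeat quoted on the slide


def terminal_repeat_count(seq: str, repeat: str = REPEAT) -> int:
    """Count exact tandem copies of `repeat` at the 3' end of `seq`."""
    seq = seq.upper()
    count = 0
    end = len(seq)
    # Walk backwards from the 3' end one repeat unit at a time.
    while end >= len(repeat) and seq[end - len(repeat):end] == repeat:
        count += 1
        end -= len(repeat)
    return count


# Hypothetical toy chromosome end: unique sequence followed by 5 repeats.
toy_end = "ACGTACGT" + REPEAT * 5
print(terminal_repeat_count(toy_end))  # 5
```

Trimming repeat units from the end of `toy_end` and re-running the count gives a simple picture of the shortening that telomerase counteracts.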
Assigned Readings [III]:
1. Repeated sequences in DNA
2. Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery
3. Initial sequencing and analysis of the human genome
4. Finishing the euchromatic sequence of the human genome
5. The functional genomics of noncoding RNA
6. Maintenance of chromosomes by telomeres and telomerase, A Nobel Lecture
7. DNA methylation and histone modifications: teaming up to silence genes
8. Histone lysine demethylases: emerging roles in development, physiology and disease
9. The key to development: interpreting the histone code?
10. Histones