
[III] Genes, Genomics, and Chromosomes

• Eukaryotic gene structure, Cot analysis, Rot analyses, chromosomal organization of genes and noncoding DNA

• Genomics: Genome-wide analysis of gene structure and expression

• Structural organization of eukaryotic chromosomes

• Morphology and functional elements of eukaryotic chromosomes

Molecular Definition of a Gene

• Definition of a “gene”: the entire nucleic acid sequence that is necessary for the synthesis of a functional gene product (polypeptide or RNA)

• A gene includes:

Nucleic acid sequences encoding the amino acid sequence of the protein (the coding region)

All other sequences required for the synthesis of a specific RNA transcript, including the transcription-control regions (e.g., enhancers or silencers)

Sequences that specify 3′ cleavage and polyadenylation [poly(A)] sites, and splice sites

• Most genes are transcribed into mRNAs, but some are transcribed into other RNA molecules such as tRNA, rRNA and snRNA

Gene Expression in Prokaryotes and Eukaryotes

• Gene expression in prokaryotes takes place in a single compartment, but gene expression in eukaryotes takes place in multiple compartments in multiple stages

[Figure panels: Eukaryotes; Prokaryotes]

Eukaryotic Genes Produce Monocistronic mRNAs and Contain Lengthy Introns

• While prokaryotes produce polycistronic mRNA, eukaryotes produce monocistronic mRNA

• In polycistronic mRNA, a ribosome-binding site is present near the start site of each cistron, and translation can be initiated from each of these sites

• In eukaryotic mRNA, the 5′ cap directs the binding of the ribosome to the mRNA, and protein synthesis begins at the AUG codon closest to the 5′ end. Furthermore, most mRNAs also possess poly(A) tails

• In eukaryotes, introns, which are often much longer than exons, must be removed from the precursor mRNA (pre-mRNA) before it can direct protein synthesis. Some introns in human genes are as large as 17 kb, and the median intron length is about 3 kb.

Comparison of Structures of the cDNA and Its Genomic Gene

The main differences between a cDNA and its genomic gene are:

cDNA does not contain introns

cDNA does not contain the regulatory/promoter sequences
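As a rough illustration of the first difference, the sketch below "splices" a made-up genomic gene into a cDNA-like sequence by keeping only the exons. The sequences, the exon coordinates and the function name `splice_exons` are hypothetical; real cDNA is made by reverse transcription of mRNA, which would also carry a poly(A) tail that this toy ignores.

```python
def splice_exons(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon segments in order, dropping introns and flanking DNA.

    Exon coordinates are 0-based, half-open (start, end) intervals.
    """
    ordered = sorted(exons)
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 < end1:
            raise ValueError("exons must not overlap")
    return "".join(genomic[start:end] for start, end in ordered)


if __name__ == "__main__":
    promoter = "TATAAA"                      # upstream regulatory DNA (absent from cDNA)
    exon1, intron1, exon2 = "ATGGCC", "GTAAGTTTTTAG", "GCTTAA"
    gene = promoter + exon1 + intron1 + exon2
    print(splice_exons(gene, [(6, 12), (24, 30)]))   # ATGGCCGCTTAA
```

The promoter and the intron (which begins with GT and ends with AG) are both absent from the output, mirroring the two differences listed above.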

Distribution of Uninterrupted and Interrupted Genes in Various Eukaryotes

• The majority of genes in yeast are uninterrupted

• Most genes in flies are interrupted by one or two introns

• Most genes in mammals are interrupted by many introns

Sizes of Genes in Various Organisms

• Yeast genes are short

• Genes in flies and mammals have a dispersed bimodal distribution extending to very long sizes

Sizes of Exons and Introns

• Exons coding for proteins usually are short, but introns usually range from very short to very long

[Figure panels: Exons; Introns]

Simple Eukaryotic Transcription Unit

• In eukaryotes, some genes encode a single protein while others encode more than one protein

• That is, some genes have simple transcription units while others have complex transcription units. This slide shows a simple transcription unit

Complex Eukaryotic Transcription Unit

• There are three ways in which the primary transcript of a gene can give rise to different mRNAs:

Using alternative splice sites to produce different mRNA species

Using alternative poly(A) sites to produce mRNAs with different 3′ exons

Using alternative promoters to produce mRNAs with different 5′ exons and the same 3′ exons

• Alternative splicing of a precursor mRNA leads to the production of different isoforms of the gene product
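To make the first mechanism concrete, the sketch below enumerates the exon combinations that could arise if some internal ("cassette") exons can be either included or skipped. The exon names and the choice of optional exons are hypothetical, and the model deliberately ignores alternative 5′/3′ splice sites, alternative promoters and poly(A) sites; it only shows how a few optional exons multiply the number of possible isoforms.

```python
from itertools import product


def enumerate_cassette_isoforms(exons: list[str], optional: set[str]) -> list[tuple[str, ...]]:
    """List every exon combination obtained by including or skipping each optional exon.

    Constitutive exons are always kept, and exon order is preserved.
    """
    # Each exon contributes either one option (always included) or two (included / skipped)
    options = [((name,), ()) if name in optional else ((name,),) for name in exons]
    isoforms = []
    for choice in product(*options):
        isoforms.append(tuple(exon for part in choice for exon in part))
    return isoforms


if __name__ == "__main__":
    gene_exons = ["E1", "E2", "E3", "E4"]          # hypothetical exon names
    for iso in enumerate_cassette_isoforms(gene_exons, optional={"E2", "E3"}):
        print("-".join(iso))
    # E1-E2-E3-E4
    # E1-E2-E4
    # E1-E3-E4
    # E1-E4
```

With n independent cassette exons this simple model allows 2^n isoforms; real genes use only a subset, regulated by splicing factors.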

Kinetics of DNA Hybridization

• The extent of DNA reannealing depends on the concentration of nucleic acid and the time allowed for hybridization

• $dC/dt = -kC^2$; integrating between $C_0$ (initial concentration) and $C$ (at time $t$) gives $C/C_0 = 1/(1 + kC_0t)$. When $C/C_0 = 1/2$, $C_0t_{1/2} = 1/k$

Suggested Reading:

1. Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Peterson et al. (2002) Genome Res 12: 795-807.

2. Repeated sequences in DNA. Britten and Kohne (1968) Science 161: 529-540.

Kinetics of DNA Reassociation (Cot Analysis)

• Britten and Kohne (1968) studied genomic DNA sequences by measuring the kinetics of DNA reassociation (Assigned Reading: Repeated sequences in DNA)

• The rate of DNA reassociation depends on random collisions between complementary strands (i.e., on DNA concentration) and on the time available for collisions to occur

$dC/dt = -kC^2$, where $k$ is the reassociation rate constant

Integrating: $C/C_0 = 1/(1 + kC_0t)$

This shows that the parameter controlling the reassociation reaction is the product of the initial DNA concentration and time ($C_0t$, "Cot")

At half-reassociation, $C/C_0 = 1/2 = 1/(1 + kC_0t_{1/2})$, so $C_0t_{1/2} = 1/k$

• $C_0t_{1/2}$ is the product of DNA concentration and time required for 50% reassociation
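The integrated rate law above can be checked numerically. The sketch below, a minimal illustration assuming ideal second-order kinetics, evaluates $C/C_0 = 1/(1 + kC_0t)$ and compares it with a crude step-by-step integration of $dC/dt = -kC^2$. The rate constant and concentrations are arbitrary illustrative values, not data from the slides, and the function names are hypothetical.

```python
def fraction_single_stranded(c0t: float, k: float) -> float:
    """Closed-form second-order result: C/C0 = 1 / (1 + k * C0 * t)."""
    return 1.0 / (1.0 + k * c0t)


def cot_half(k: float) -> float:
    """C0t at which half of the DNA has reassociated (C/C0 = 1/2)."""
    return 1.0 / k


def reassociate_numerically(k: float, c0: float, t_end: float, steps: int = 100_000) -> float:
    """Integrate dC/dt = -k * C**2 with small explicit Euler steps; return C(t_end)/C0."""
    dt = t_end / steps
    c = c0
    for _ in range(steps):
        c -= k * c * c * dt
    return c / c0


if __name__ == "__main__":
    k = 0.25                                     # illustrative rate constant
    print(cot_half(k))                           # 4.0
    print(fraction_single_stranded(4.0, k))      # 0.5
    # The same C0t (2.0 x 2.0 = 4.0), reached with a different C0 and t, also gives ~0.5
    print(reassociate_numerically(k, c0=2.0, t_end=2.0))
```

Because only the product $C_0t$ enters the closed form, any combination of concentration and time with the same product gives the same fraction reassociated, which is why Cot is the natural variable for these curves.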

Reassociation Kinetics of Eukaryotic DNA

$\text{Complexity of any genome} = \dfrac{C_0t_{1/2}\ (\text{DNA of that genome})}{C_0t_{1/2}\ (\text{E. coli DNA})} \times 4.2 \times 10^{6}\ \text{bp}$
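The complexity formula can be applied directly. In this minimal sketch the $C_0t_{1/2}$ values are hypothetical numbers chosen only to show the arithmetic, and the 4.2 × 10^6 bp reference is the E. coli value used on the slide. The optional `fraction` argument is an assumption added for illustration: for a kinetic component that makes up only part of the DNA, the observed $C_0t_{1/2}$ is commonly multiplied by that fraction before the comparison.

```python
ECOLI_COMPLEXITY_BP = 4.2e6  # E. coli reference complexity used on the slide


def genome_complexity(cot_half_sample: float, cot_half_ecoli: float,
                      fraction: float = 1.0,
                      ecoli_bp: float = ECOLI_COMPLEXITY_BP) -> float:
    """Complexity (bp) = [C0t1/2(sample) / C0t1/2(E. coli)] x 4.2e6 bp.

    `fraction` (hypothetical helper parameter) scales the observed C0t1/2 of a
    kinetic component that represents only part of the DNA.
    """
    if cot_half_ecoli <= 0:
        raise ValueError("reference C0t1/2 must be positive")
    return (cot_half_sample * fraction) / cot_half_ecoli * ecoli_bp


if __name__ == "__main__":
    # Hypothetical DNA that reassociates 10x more slowly than E. coli DNA
    print(genome_complexity(40.0, 4.0))                 # 42000000.0
    # Same observed C0t1/2, but the component is only half of the DNA
    print(genome_complexity(40.0, 4.0, fraction=0.5))   # 21000000.0
```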

Calculating the Complexity of a Genome

• Non-repetitive DNA: present only once per genome; found in both prokaryotic and eukaryotic genomes

• Intermediate (moderately) repetitive DNA: repeated several times (10-1000×) per genome; dispersed throughout the genome in eukaryotes

• Highly repetitive DNA: short repetitive sequences (<100 bp) present up to 1 million times in the eukaryotic genome

• Larger genomes are not generated simply by increasing the copy number of the sequences present in smaller genomes; their extra size is due mainly to the presence of more repetitive DNA

• Suggested Reading II: Initial sequencing and analysis of the human genome. Nature 409: 860-921, 2001. Finishing the euchromatic sequence of the human genome. Nature 431: 931-945, 2004.

Repetitive and Unique DNA Sequence in Eukaryotes

The Proportions of Different Sequence components in eukaryotic Genomes

• The absolute content of non-repetitive DNA increases with genome size but reaches a plateau at ~2-3 $\times 10^{9}$ bp

• mRNA is typically derived from non-repetitive DNA sequence

• A significant part of the moderately repetitive DNA consists of transposons (sequences able to move around the genome)

Genomes of Many Organisms Contain Much Noncoding DNA

• Much of the DNA in many eukaryotic cells neither encodes RNA nor has any apparent regulatory function. Genome sizes: yeast, 12 Mb; fruit fly, 180 Mb; chicken, 1300 Mb; human, 3300 Mb

• Many organisms considered lower than humans have higher DNA contents than humans

• DNA sequence analysis has revealed that the genomes of higher eukaryotes contain a large amount of noncoding DNA

• Gene-rich regions vs. gene deserts

Genome Size and Gene Numbers in Various Organisms

The number of genes in bacterial and archaeal genomes is proportional to genome size

Relationship of Gene Number and Genome Size

• The number of genes in prokaryotes correlates well with the sizes of their genome

• The number of genes in eukaryotes does not correlate well with their genome sizes

Protein-Coding Genes

• Solitary genes: about 25-50 percent of protein-coding genes are represented only once in the haploid genome. For example, the chicken lysozyme gene lies within a 15-kb region of DNA and constitutes a simple transcription unit with three exons and two introns

• Duplicated genes: close but nonidentical copies that are often located within 5-50 kb of one another and form a “gene family”. Each gene family can contain from a few to 30 or so members

Gene family: a set of duplicated genes that encode proteins with similar but not identical amino acid sequences. Examples include the genes for cytoskeletal proteins, the myosin heavy chain, and the α- and β-globins

Protein family: a set of genes encoding closely related, homologous proteins. Examples: protein kinases, vertebrate immunoglobulins and olfactory receptors. Protein families include from just a few to 30 or more members

The genes encoding the β-like globins are a good example of a gene family; the cluster contains five functional genes: β, δ, Aγ, Gγ and ε

Total Number of Genes and Duplicated Genes

• In bacteria, most genes are unique, so the number of distinct gene families is close to the total number of genes

• In eukaryotes, many genes are duplicated, and as a result the number of different gene families is much less than the total number of genes

Proportions of Unique and Duplicated Genes

The proportion of unique genes drops sharply with genome size: bacteria have the highest proportion of unique genes, and the proportion falls sharply in yeast, flies, worms and Arabidopsis

Heavily Used Gene Products (rRNA and snRNA Genes) Are Arranged in Tandem Repeats

• In vertebrates and invertebrates, the genes encoding rRNAs and some other noncoding RNAs such as snRNA are arranged in tandemly repeated arrays

• These tandemly repeated genes, which appear one after the other, encode identical or nearly identical proteins or functional RNAs

• The tandemly repeated rRNA and snRNA genes are needed to meet the great cellular demand for their transcripts. For example, cells have 100 or more copies of the 5S rRNA genes

• Multiple copies of tRNA and histone genes are also present in clusters, but generally not as tandem repeats

A Tandem rDNA Gene Cluster

A tandem cluster of rRNA genes

Electron Micrograph of DNA Being Transcribed into RNA

• Green arrow indicates DNA and Red arrow indicates RNA

• This micrograph, taken by O.L. Miller, Jr., and Barbara R. Beatty at Oak Ridge National Laboratory, shows the transcription of tandemly repeated rRNA genes in Xenopus oocytes

Non-Protein-Coding Genes Encode Functional RNAs

• The genome contains non-protein-coding genes that encode functional RNAs. These RNAs are important in regulating gene expression

• Assigned Reading: The functional genomics of noncoding RNA. Mattick et al. (2005), Science 309: 1527-1528.

How Many Genes Are There in All Organisms?

• This slide shows the comparison of fly genes to those of the worm and yeast

• Orthologous genes (orthologs): genes that encode corresponding polypeptides in different organisms. Two gene products from different organisms whose sequences are similar over >80% of their lengths are considered orthologs

• In flies, ~20% of the genes have orthologs in both the worm and yeast. These are probably genes required by all eukaryotes

• When fly genes are compared with those of the worm alone, an additional 10% of fly genes have orthologs. These ~30% of genes are therefore probably required by both flies and worms

• The total number of proteins can be a good estimate of the total proteome size

Proportion of Protein Encoding Genes in Human Genome

• The human haploid genome contains 22 autosomes plus the X and Y chromosomes, which range from 45 to 279 Mb of DNA

• The total haploid genome size is 3286 Mb (~3.3 $\times 10^{9}$ bp)

• Euchromatin comprises the majority of the genome (~2.9 $\times 10^{9}$ bp)

• Although protein-coding genes (including their introns) account for about 25% of the human genome, their exons make up only about 1%

The Structure of an Average Human Gene

Different Classes of Repetitive DNA Sequences in the Human Genome

• Five classes of repetitive DNA sequences in the human genome:

Transposons: ~45% of the genome, present in multiple copies

Pseudogenes: ~3,000 in all

Simple-sequence repetitive DNA: ~3% of total DNA

Segmental duplications: blocks of 10 to 300 kb that have been duplicated, ~5% of the genome

Tandem repeats: blocks of one type of sequence

Genomic DNA of Eukaryotic Organisms

Class of DNA | #/genome | % of human genome
Protein-coding genes | ~25,000 | 55
Tandemly repeated genes: U2 snRNA | ~20 | <0.001
Tandemly repeated genes: rRNA | ~300 | 0.4
Repetitious DNA: simple-sequence DNA | variable | ~6
Repetitious DNA: interspersed repeats | ~3.26 | 45
Repetitious DNA: processed pseudogenes | 1-~100 | ~0.4
Unclassified spacer DNA | n.a. | 25

Interspersed repeats: DNA transposons, LTR retrotransposons and non-LTR retrotransposons (LINEs and SINEs)

Satellite DNAs

• When eukaryotic DNA is centrifuged on a CsCl gradient, two kinds of components are observed:

Main band: most of the genomic DNA

Satellite bands: one or more minor bands, which can be heavier or lighter than the main band

• The main-band DNA has a buoyant density of 1.701 g/cm³ with a G-C content of 42%, and the minor-band DNA has a buoyant density of 1.690 g/cm³ with a G-C content of 30%

Satellite DNAs Lie in Heterochromatin

• Highly repetitive DNA (simple-sequence DNA): satellite DNA reassociates rapidly and consists of very short sequences repeated many times in tandem in large clusters. It typically makes up <10% of the genome

• In addition, multicellular eukaryotes have complex satellites with longer repeat units, located mainly in heterochromatic regions

• In humans, α-satellite DNA consists of 171-bp repeats. The α-satellite DNA family has repeat units interspersed with longer 3.3 repeats

• Tandemly repeated DNA often has a distinctive physical property that can be used to isolate it: its buoyant density, which differs from (in the example above, is lower than) that of the non-repetitive DNA

• Therefore, by equilibrium centrifugation on a CsCl gradient, the satellite DNA can be separated from the non-repetitive DNA

• The buoyant density of duplex DNA depends on its G-C content according to the following formula:

$\rho = 1.660 + 0.00098 \times (\%\ \text{G-C})\ \text{g/cm}^{3}$
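Applying the formula to the two bands described above gives about 1.701 g/cm³ for 42% G-C and about 1.689 g/cm³ for 30% G-C, close to the slide's 1.690. The small sketch below also inverts the formula to estimate G-C content from a measured density; the function names are illustrative.

```python
def buoyant_density(gc_percent: float) -> float:
    """Buoyant density (g/cm^3) of duplex DNA in CsCl: 1.660 + 0.00098 x (% G-C)."""
    return 1.660 + 0.00098 * gc_percent


def gc_percent_from_density(density: float) -> float:
    """Invert the relation to estimate % G-C from a measured buoyant density."""
    return (density - 1.660) / 0.00098


if __name__ == "__main__":
    for label, gc in (("main band", 42), ("satellite band", 30)):
        print(label, round(buoyant_density(gc), 3))
    # main band 1.701
    # satellite band 1.689
```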

Most Simple-Sequence DNAs Are Concentrated in Specific Chromosomal Locations

• Repetitious DNA in the genomes of eukaryotic cells includes:

Simple-sequence DNA, also called satellite DNA (~6% of the human genome), with repeat units of 14 to 500 bp

Microsatellites, with repeat units of 1 to 13 bp

Interspersed repetitive DNA, dispersed throughout the genome (also called transposable elements)

• Fluorescence in situ hybridization (FISH) localizes the simple-sequence DNAs near the centromeres and telomeres of mouse chromosomes

• Centromeric heterochromatin is necessary for the separation of chromosomes into daughter cells

Diseases Associated with Microsatellites

• Microsatellites occasionally occur within transcription units

• At least 14 different neuromuscular diseases are associated with microsatellite repeats within the transcription units of genes

• Myotonic dystrophy and spinocerebellar ataxia are examples. In myotonic dystrophy, the transcript of the DMPK (dystrophia myotonica protein kinase) gene contains 1000 to 4000 repeats of the sequence CUG in its 3′ untranslated region; these repeats interfere with normal RNA processing and with export of the mature RNA from the nucleus to the cytosol
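Expanded microsatellites such as the CUG repeats in the DMPK transcript can be detected computationally by looking for long uninterrupted runs of the repeat unit. The sketch below is a minimal illustration that scans every offset of a sequence for the longest tandem run of a given unit. The function name and test sequences are hypothetical, and real repeat-detection tools also tolerate interrupted or imperfect repeats, which this version does not.

```python
def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of consecutive, uninterrupted copies of `unit` in `seq`."""
    n = len(unit)
    best = 0
    for offset in range(n):                   # a run can start at any of n offsets
        run = 0
        for i in range(offset, len(seq) - n + 1, n):
            if seq[i:i + n] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0                       # an interruption resets the run
    return best


if __name__ == "__main__":
    toy_3utr = "GCA" + "CUG" * 12 + "AAUAAA"     # toy 3' UTR with 12 CUG repeats
    print(longest_tandem_run(toy_3utr, "CUG"))   # 12
```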

Probing Minisatellite DNA by Southern Blot Hybridization

• DNA samples from three different individuals were digested with the restriction enzyme HinfI, separated on agarose gels, transferred to nylon membranes and probed with three different radiolabeled minisatellite probes

• Each individual showed a distinct, unique band pattern

• DNA Fingerprinting depends on differences in length of simple-sequence DNA

DNA Fingerprinting

• Minisatellite DNA: repeat units of 14 to 100 bp, arranged in regions of 1 to 5 kb made up of 20-50 repeat units

• Slight differences between individuals in the total length of the repeat region can be detected by PCR analysis. This forms the basis of DNA fingerprinting

• This technique can be used in population studies, paternity or maternity testing and criminal identification
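Because a minisatellite allele is a fixed flanking region plus a variable number of repeat units, the fragment length seen on a blot or gel follows directly from the repeat count. The sketch below assumes a hypothetical 30-bp repeat unit and 200 bp of flanking DNA (illustrative numbers, not from the slides). It shows how two individuals with different repeat numbers give different band sizes, and how a repeat count can be read back from a measured length.

```python
def fragment_length(n_repeats: int, unit_bp: int, flank_bp: int) -> int:
    """Length of a minisatellite fragment: flanking DNA plus n tandem repeat units."""
    return flank_bp + n_repeats * unit_bp


def repeats_from_length(length_bp: int, unit_bp: int, flank_bp: int) -> int:
    """Infer the repeat count from a fragment length (must be a whole number of units)."""
    extra = length_bp - flank_bp
    if extra < 0 or extra % unit_bp != 0:
        raise ValueError("length is not consistent with whole repeat units")
    return extra // unit_bp


def band_pattern(alleles: tuple[int, int], unit_bp: int = 30, flank_bp: int = 200) -> list[int]:
    """Sorted band sizes (bp) for a diploid individual carrying two alleles (repeat counts)."""
    return sorted(fragment_length(n, unit_bp, flank_bp) for n in alleles)


if __name__ == "__main__":
    print(band_pattern((25, 32)))   # individual A: [950, 1160]
    print(band_pattern((20, 41)))   # individual B: [800, 1430]
```

Real fingerprints combine many such loci, so the chance that two unrelated people share every band becomes very small.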

Hybridization Kinetics of cDNAs to mRNAs

• The population complexity of mRNA isolated from a cell can be estimated by studying the kinetics of hybridization of mRNAs to their cDNAs

• The example below compares the mRNA populations of liver RNA from estrogen-treated trout and from untreated controls:

Isolate total RNA from the livers of estrogen-treated and control fish (RNAind and RNAunind)

Prepare 32P-labeled cDNAind by reverse transcription

Set up hybridization between 32P-cDNAind and an excess of RNAunind at different Rot values (RNA concentration × time)

Determine the amount of hybridization by treating the hybridization mixture with S1 nuclease, which digests the unhybridized single-stranded cDNA

Hybridization between mRNA and cDNA

• This slide shows the hybridization profile of excess mRNA of chick oviduct with the cDNA of chick oviduct

• 32P-labeled cDNA was synthesized from chick oviduct mRNA and hybridized to an excess of chick oviduct mRNA

• The results showed three components of cDNA, present at different frequencies, hybridizing to chick oviduct mRNA:

About 50% of the cDNA hybridized at an $R_0t_{1/2}$ of 0.0015

About 15% of the cDNA hybridized at an $R_0t_{1/2}$ of 0.04

About 35% of the cDNA hybridized at an $R_0t_{1/2}$ of 30

Rot Analysis of Excess mRNA and cDNA of Chick Oviduct Cells

• Total mRNA was isolated from chick oviduct cells

• 32P-cDNA was prepared from the total mRNA by reverse transcription

• Rot analysis was conducted between radio labeled cDNA and excess amount of total mRNA

• The Rot analysis showed three components of sequences hybridizing to the cDNA:

The first component has the characteristics of ovalbumin mRNA

The second component has a total complexity of 15 kb (7-8 different mRNAs of ~2000 bases)

The last component has a complexity of 26 Mb (~13,000 mRNAs)
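The complexities of the second and third components can be estimated by comparing each component's "pure" $R_0t_{1/2}$ (the observed $R_0t_{1/2}$ multiplied by the fraction of cDNA in that component) with that of the ovalbumin component. The sketch below assumes an ovalbumin mRNA length of ~2000 nucleotides as the reference and an average mRNA length of 2000 bases. With these assumptions it gives about 16 kb (~8 mRNAs) and about 28 Mb (~14,000 mRNAs), close to the 15 kb and 26 Mb quoted above; the difference reflects the assumed reference length and rounding. Function and variable names are illustrative.

```python
def pure_rot_half(observed_rot_half: float, fraction: float) -> float:
    """R0t1/2 a component would show if it were the only RNA present."""
    return observed_rot_half * fraction


def component_complexity(observed_rot_half: float, fraction: float,
                         ref_rot_half: float, ref_fraction: float,
                         ref_complexity_nt: float) -> float:
    """Complexity (nucleotides) scaled against a reference component of known complexity."""
    ratio = pure_rot_half(observed_rot_half, fraction) / pure_rot_half(ref_rot_half, ref_fraction)
    return ratio * ref_complexity_nt


if __name__ == "__main__":
    OVALBUMIN_NT = 2000      # assumed ovalbumin mRNA length (reference component)
    AVG_MRNA_NT = 2000       # assumed average mRNA length
    components = {"second": (0.04, 0.15), "third": (30.0, 0.35)}
    for name, (rot_half, frac) in components.items():
        cx = component_complexity(rot_half, frac, 0.0015, 0.50, OVALBUMIN_NT)
        print(name, round(cx), "nt, ~", round(cx / AVG_MRNA_NT), "mRNAs")
```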

cDNA of estrogen-treated oviduct RNA hybridized to untreated oviduct RNA

Number of Expressed Gene Measured by DNA Microarray Analysis

• Although Rot analysis can reveal the complexity of the mRNA population in any cell type, the number of genes expressed in a cell type can be determined by DNA microarray analysis

• In this assay, the mRNA isolated from the cell type of interest is reverse transcribed into tagged (labeled) cDNA

• The labeled cDNA is hybridized to a DNA array that contains all the genes of the organism of interest

• The genes that hybridized to the tagged cDNA can be visualized by scanning the array

• This slide shows results of DNA microarray analysis to determine expression of 12 genes in 59 individual breast tumor tissues of breastfed and breast-unfed women

• Genes highly expressed are shown “red”, lower expression in “blue”, equal expression in “grey”

• Genomics: Genome-wide analysis of gene structure and expression

Database of Genomes

• Using automated DNA sequencing, methods for cloning DNA fragments on the order of 100 kb in length, and computer algorithms to piece together the sequence data, scientists have determined vast amounts of DNA sequence, including the entire genomes of humans and many key experimental organisms, e.g., the roundworm (C. elegans), fruit flies, mice, medaka and zebrafish

• Because sequencing megabases of DNA has become very cheap, the genomes of many organisms are being determined rapidly

• There are two databases for the human genome:

GenBank at the National Institutes of Health in Bethesda, MD

The EMBL sequence database at the European Molecular Biology Laboratory in Heidelberg, Germany

Comparison of the Regions of Human NF1 Protein with Ira Protein of S. cerevisiae

• Ira, a GTPase-activating protein (GAP), modulates the GTPase activity of the monomeric G protein Ras. Both GAP and Ras function to control cell replication and differentiation in response to signals from outside the cell

Structural Motifs

• When a protein shows no significant similarity to other proteins with the BLAST (Basic Local Alignment Search Tool) algorithm, it may nevertheless share a short sequence that is functionally important. Such short sequences, which recur in many different proteins, are referred to as structural motifs

Comparison of Related Sequences from Different Species Can Give Clues to Evolutionary Relationships Among Proteins

• Paralogous: sequences that diverged as a result of gene duplication

• Orthologous: sequences that arose as a result of speciation

By scanning for open reading frames (ORFs):

An ORF is defined as a stretch of DNA containing at least 100 codons that begins with a start codon and ends with a stop codon of translation

ORF analysis identifies more than 90% of the genes in bacteria and yeast, but some very short genes are missed by this method, and long ORFs that are not genes occasionally arise by chance

For eukaryotic genes, because of the presence of multiple exons and introns, scanning for ORFs is not a good way to identify genes. Instead, computer programs are used to compare the genomic DNA sequence with cDNA sequences, splice-site sequences and the sequences of expressed sequence tags (ESTs)
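A minimal ORF scan of the kind described above can be sketched as follows. It assumes a DNA string over A/C/G/T, the standard stop codons TAA, TAG and TGA, ATG as the only start codon, and the 100-codon cutoff from the definition (counted here from the ATG up to, but not including, the stop codon). It checks all three frames on both strands and reports minus-strand coordinates on the reverse-complemented sequence. It is an illustration only and, as noted above, is not suitable for intron-containing eukaryotic genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for ATG...stop ORFs with >= min_codons sense codons.

    `end` is exclusive and includes the stop codon; minus-strand positions refer
    to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                          # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None                       # look for the next ATG
    return orfs


if __name__ == "__main__":
    toy_gene = "CC" + "ATG" + "GCT" * 99 + "TAA" + "GG"   # 100 sense codons, then a stop
    print(find_orfs(toy_gene))   # [('+', 2, 305)]
```

Lowering `min_codons` finds more candidate ORFs but also more chance ORFs; the 100-codon threshold is a trade-off that misses some very short genes.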

Another powerful method for identifying human genes is to compare the human genomic sequence with that of the mouse since human and mouse are sufficiently related to have most genes in common

Genes Can be Identified within Genomic DNA Sequences

Comparison of the Gene Number and Type of Proteins Encoded in the Genomes of different Organisms

Structural organization of eukaryotic chromosomes

Questions?

• How are DNA molecules organized within eukaryotic cells?

The total length of cellular DNA can be up to a hundred thousand times the length of the cell, so the packing of DNA is crucial to cell architecture

During interphase, DNA exists as a nucleoprotein complex, called chromatin, dispersed throughout the nucleus

During mitosis, chromatin compacts further into metaphase chromosomes, which can be seen under a microscope
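A back-of-the-envelope calculation supports the "hundred thousand times" figure. The sketch assumes B-form DNA with a rise of 0.34 nm per base pair, a diploid human cell carrying about 2 × 3.3 × 10^9 bp (twice the haploid size), and a cell diameter of about 20 µm; the last two numbers are rough assumptions for illustration.

```python
RISE_PER_BP_M = 0.34e-9        # B-DNA rise per base pair, in metres


def dna_contour_length_m(n_bp: float) -> float:
    """Contour length (m) of fully extended B-form DNA with n_bp base pairs."""
    return n_bp * RISE_PER_BP_M


if __name__ == "__main__":
    diploid_bp = 2 * 3.3e9          # assumed diploid human genome size
    cell_diameter_m = 20e-6         # assumed cell diameter (20 micrometres)
    length = dna_contour_length_m(diploid_bp)
    print(f"DNA length: {length:.2f} m")                          # DNA length: 2.24 m
    print(f"DNA/cell length ratio: {length / cell_diameter_m:.1e}")
```

About 2 m of DNA in a cell tens of micrometres across gives a ratio on the order of 10^5.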

Packaging of DNA in Microorganisms

• In viruses, the genomic DNA molecule is associated with proteins and packaged inside the viral capsid. In bacteria, the genomic DNA is associated with proteins and packaged as a compact mass in the interior of the cell, called the “nucleoid”

Electron Micrographs of Extended and Condensed Chromatin

[Figure panels: extended form; condensed form]

• Nucleosomes: when chromatin is isolated from nuclei in low salt and in the absence of divalent cations (Mg²⁺), it resembles “beads on a string”. The beads are termed nucleosomes and the string is termed linker DNA

• Nucleosome is about 10 nm in diameter and is the primary structural unit of chromatin

Nuclear DNA Associates with Histones to Form Chromatin

• When DNA from eukaryotic nuclei is isolated in an isotonic buffer (i.e., ~0.15 M KCl), it is recovered as chromatin, associated with an equal mass of histones (basic proteins)

• There are five different histones in chromatin, namely H1, H2A, H2B, H3 and H4. The sequences of four of them (H2A, H2B, H3 and H4) are similar among different organisms, suggesting that these proteins fold into similar three-dimensional conformations

Four Classes of Histones

• Histones: small chromosomal proteins rich in the basic amino acids lysine and arginine, which give them a positive charge

Separation of Nucleosomes

• When the chromatin of nuclei is digested with micrococcal nuclease, discrete DNA fragments with definitive sizes can be recovered from the digested fraction

• When the digested materials were separated by gradient centrifugation, different size particles were isolated

• These particles are monomers, dimers, trimers and tetramers with different DNA size fragments

DNA fragments isolated from chromatin digested with micrococcal nuclease (limited digestion)

Individual Nucleosomes Released by Digestion of Chromatin with Limited Amounts of Micrococcal Nuclease


Structure of Nucleosomes

• The DNA in nucleosomes is less susceptible to nuclease digestion than the DNA in the linkers

• By controlling the nuclease digestion, free nucleosomes can be isolated

• A nucleosome consists of a protein core with DNA wrapped around its surface like thread around a spool

• The protein core is an octamer containing two copies each of H2A, H2B, H3 and H4

• Nucleosomes from all eukaryotes contain 147 bp of DNA wrapped slightly less than two turns around the protein core

• The length of the linker DNA is variable, ranging from 8 to 114 bp. H1 associates with the linker DNA

Nucleosome

Dimer and Monomer of Nucleosomes

• Mononucleosomes typically have ~200 bp of DNA. Trimming the ends of mononucleosomes reduces the DNA to ~165 bp

• The core particles have DNA fragment of ~140 bp

• The linker DNA between two nucleosomes varies from 8 to 114 bp
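Since the core particle holds a fixed ~147 bp, the linker length follows from the nucleosome repeat length (core DNA plus one linker), and a linker range of 8-114 bp corresponds to repeat lengths of roughly 155-261 bp. The sketch below makes that arithmetic explicit; the 200-bp repeat and the 3.3 × 10^9 bp genome size are illustrative inputs, and the function names are hypothetical.

```python
CORE_DNA_BP = 147   # DNA wrapped on one histone octamer (core particle)


def linker_length(repeat_bp: int, core_bp: int = CORE_DNA_BP) -> int:
    """Linker DNA = nucleosome repeat length minus core DNA."""
    linker = repeat_bp - core_bp
    if linker < 0:
        raise ValueError("repeat length shorter than core DNA")
    return linker


def nucleosomes_in(region_bp: int, repeat_bp: int) -> int:
    """Approximate number of whole nucleosomes in a stretch of chromatin."""
    return region_bp // repeat_bp


if __name__ == "__main__":
    print(linker_length(200))                        # 53
    print(linker_length(155), linker_length(261))    # 8 114
    print(nucleosomes_in(3_300_000_000, 200))        # 16500000
```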

Beads-on-a-String Structure of Chromatin

a, In the presence of histone H1

b, In the absence of histone H1

Structures of Nucleosome

• A mononucleosome is a 10-nm particle that contains ~200 bp of DNA and a histone octamer (two copies each of H2A, H2B, H3 and H4)

• DNA occupies most of the outer surface of the nucleosome

• Sequences on the DNA that lie on different turns around the nucleosome may be close together

Structures of the Four Core Histones

• Histones H2A and H2B each have two short α-helical regions and one long α-helical region. These regions form a characteristic fold

• Similarly, histones H3 and H4 also have two short α-helical regions and one long α-helical region, which form the same fold, as shown in the next slide

The histone fold is formed by the three α-helical regions of the core histones (a). The histone-fold regions of two histone molecules allow them to associate to form a heterodimer (b)

Histone Fold

Structure of the Nucleosome

(a) Left: top view of the nucleosome; right: side view of the nucleosome. (b) Model of a nucleosome viewed from the top, with the histones shown as ribbon diagrams

Interaction of Histone H1 with the Nucleosome

Histone H1 interacts with the central gyre of the DNA at the dyad axis, as well as with the linker DNA at either the entry or the exit point

Histone Tails

The N- and C-termini of histones H2A, H2B, H3 and H4 project out from the core of the nucleosome. These regions are termed histone tails

• Histone tails are chemically modified by acetylation, methylation, phosphorylation and ubiquitination, forming a “histone code”

• The lysine residues in the histone tails of H3 and H4 can undergo reversible acetylation and deacetylation. Acetylation of lysine residues prevents chromatin from condensing

• Histone tails can also associate with other chromosomal proteins and thus affect transcription and DNA replication; this interaction can be affected by acetylation of the lysine or methylation of lysine and arginine in the histone tails

• Phosphorylation of serine residues on histones is another modification of histone tails

Modification of Histone Tails

Acetylation and Deacetylation of Histones

Enzymes responsible for acetylation of histones are histone acetyltransferases (HATs) [GCN5-related N-acetyltransferases, the p300/CBP family and the MYST family]

Enzymes responsible for deacetylation of histones are histone deacetylases (HDACs)

The level of acetylation of histone N-terminal tails is controlled by the balance between HAT and HDAC activities
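One simple way to picture this "balance" is a two-state model in which lysines are acetylated by HATs at a rate proportional to the unacetylated fraction and deacetylated by HDACs at a rate proportional to the acetylated fraction. This first-order model and its rate constants are assumptions for illustration, not measured values. Under them the steady-state acetylated fraction is $k_{HAT}/(k_{HAT} + k_{HDAC})$, which the sketch checks against a step-by-step simulation.

```python
def steady_state_acetylation(k_hat: float, k_hdac: float) -> float:
    """Steady-state acetylated fraction when dA/dt = k_hat*(1 - A) - k_hdac*A."""
    return k_hat / (k_hat + k_hdac)


def simulate_acetylation(k_hat: float, k_hdac: float, a0: float = 0.0,
                         t_end: float = 50.0, steps: int = 50_000) -> float:
    """Euler integration of the same hypothetical two-state model; returns A at t_end."""
    dt = t_end / steps
    a = a0
    for _ in range(steps):
        a += (k_hat * (1.0 - a) - k_hdac * a) * dt
    return a


if __name__ == "__main__":
    # Hypothetical rate constants: raising HDAC activity lowers steady-state acetylation
    for k_hdac in (0.5, 1.0, 3.0):
        print(k_hdac, steady_state_acetylation(1.0, k_hdac),
              round(simulate_acetylation(1.0, k_hdac), 4))
```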

Acetylation of Lysine Residues on Core Histones

Acetylation and Methylation of Histones H3 and H4

The enzymes responsible for methylation are histone methyltransferases (methylases)

The methyl groups can also be removed by specific enzymes

Lysine 9 in H3 can be either acetylated or methylated

Methylation of lysine 9 in H3 can inhibit the acetylation of lysine 14

Histone Modifications

• Lysine ε-amino groups can be methylated up to three times; methylation prevents acetylation and thus maintains their positive charge

• Arginine side chains can also be methylated

• Serine and threonine side chains can be reversibly phosphorylated, introducing a negative charge

• A single 76-amino-acid ubiquitin molecule can be reversibly added to a lysine in the C-terminal tails of H2A and H2B. Ubiquitination of H2A and H2B can reduce the positive charge of these histones

Overall Modifications of Histone Tails

Important Terms

• Histone code: the combined pattern of acetylation, methylation, phosphorylation, ubiquitination and sumoylation of the histone tails. This pattern affects the activity of the genes in that chromatin

• Changes in the charges on histones resulting from histone modification also affect the binding of non-histone proteins to chromatin. This is essential for the regulation of gene expression

• Bromodomain: a protein domain, found in many transcription factors, that recognizes acetylated lysine residues in histone tails

• Chromodomain (chromatin organization modifier domain): a protein structural domain of about 60 amino acid residues associated with the remodeling and manipulation of chromatin

• Chromoshadow domain: a protein domain distantly related to the chromodomain. Proteins containing a chromoshadow domain include Su(var)205 (HP1) and mammalian modifier 1 and modifier 2

• PHD finger: a Cys4-His-Cys3 motif of HAT3, involved in epigenetic regulation

• TUDOR domain: a protein domain that recognizes methylated histones

Sites of Histone Modification and Their Functions

Most modified sites in histones have a single, specific type of modification, but some sites have more than one type of modification

Structure of Condensed Chromatin

• When chromatin is extracted from cells in isotonic buffers, it appears as fibers ~30 nm in diameter

• Nucleosomes in this type of chromatin are packaged into an irregular spiral or solenoid arrangement (about 6 nucleosomes per turn)

• H1 histone is associated with each nucleosome

• Electron microscopic observation revealed that the 30 nm fiber is less uniform than the perfect solenoid

• Condensed chromatin may be very dynamic with regions occasionally unfolding and then refolding into solenoid structure

• Chromatin in chromosomal regions that are not being actively transcribed exists in condensed fiber form or in higher-order folded structures

Condensed chromatin in 30 nm fiber structure.

The solenoid model for the structure of the 30 nm chromatin fiber

Structure of the 30-nm Chromatin Fiber

• The structure of chromatin is highly conserved in different organisms

• The amino acid sequences of four histones (H2A, H2B, H3 and H4) are highly conserved between distantly related species

• The amino acid sequence of H1 varies more from organism to organism

• The similarity in histone structures suggests that they fold into very similar 3-dimensional conformations which were optimized for histone function early in evolution in a common ancestor of all modern eukaryotes

Roles of H1 and Modified Histones in Formation of the 30-nm Fiber

(a) The positive charge of the H4 tail and the negative charge of H2A and H2B result in closer packing of two adjacent nucleosomes. (b) Acetylation of lysine 16 (K16) of H4 removes the positive charge at this position and neutralizes the “+” and “-” attraction between the two nucleosomes

The globular region of the H1 (in pink color) interacts with the linker DNA as it exits the nucleosome and changes its path to produce a more compact structure

Further Condensation of Chromatin Fiber

The 30 nm chromatin fiber can be thrown into a series of loops, each of approximately 50,000-200,000 bp in size, producing a looped fiber with diameter of approximately 300 nm

Nonhistone Proteins, a Structural Scaffold for Long Chromatin Loops

• In addition to histones, non-histone proteins are also involved in organizing chromosome structure

• Chromosome scaffold: nonhistone proteins associated with the metaphase chromosome

• The shape of the scaffold is maintained even after the DNA on the metaphase chromosome is removed by DNase digestion

The loops of the 30-nm fiber are attached to the nuclear scaffold via matrix-attachment regions (MARs)

An individual loop can change its structure from the 30-nm fiber to the beads-on-a-string form, allowing transcription to occur. These regions can be detected by their sensitivity to limited digestion by DNase I

Is DNA attached to the scaffold via specific sequence?

• MARs (matrix attachment regions): DNA regions in chromatin that attach to scaffold proteins; they are also called SARs (scaffold attachment regions)

• The figure in this slide shows how MARs can be isolated

• DNA sequence analysis revealed no consensus sequence in the DNA that binds to the scaffold, except that this DNA is ~70% AT-rich

• Furthermore, MARs have been found to contain cis-acting elements that regulate transcription, as well as topoisomerase II recognition sites, suggesting that MARs may provide sites for topological changes in DNA

Heterochromatin

• Heterchromatin: A region of the chromatin that does not uncoil after mitosis. It is a dark staining area of the chromatin

• In mammalian cells, heterochromatin appears as darkly staining regions of the nucleus, often associated with the nuclear envelope

• Experiments of pulse labeling with 3H-uridine and autoradiography showed that most transcription occurs in regions of euchromatin and the nucleolus

• In general, heterochromatic regions contain inactive genes; however, some transcribed genes have been found in heterochromatin. Conversely, not all inactive genes and nontranscribed regions of DNA appear as heterochromatin

Heterochromatin Versus Euchromatin

Bone marrow stem cell

Dark stained regions show heterochromatin and the light stained regions show euchromatin

Modifications of the N-terminal tail of histone H3 in heterochromatin and euchromatin

Distinguishing Transcribed from Nontranscribed Genes

• Transcriptionally active genes are more sensitive to limited digestion by DNase I than inactive genes

• Nuclei from 14-day chicken embryo erythroblasts and from undifferentiated chicken lymphoblastic leukemia cells were exposed to increasing amounts of DNase I, and DNA was isolated from the digested nuclei

• The DNA was digested with BamHI, separated by agarose gel electrophoresis, transferred to a nylon membrane, and probed with the 4.5-kb globin gene fragment
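The logic of this experiment is that the intact 4.5-kb globin band survives only if DNase I makes no cut inside it, so the band fades at lower DNase I doses in cells where the gene is active and its chromatin is open (the erythroblasts). The sketch below illustrates this with a single-hit (Poisson) model. The cut rates are hypothetical values chosen only to show the trend, not fitted to the data on the slide, and the function name is illustrative.

```python
import math


def fraction_intact(fragment_bp, dose, cuts_per_bp_per_dose):
    """Probability that a fragment of fragment_bp base pairs receives no DNase I cut.

    Single-hit (Poisson) model: cuts land independently and uniformly, so the
    expected number of cuts is fragment_bp * dose * cuts_per_bp_per_dose and the
    chance of zero cuts is exp(-expected_cuts).
    """
    expected_cuts = fragment_bp * dose * cuts_per_bp_per_dose
    return math.exp(-expected_cuts)


if __name__ == "__main__":
    fragment_bp = 4_500   # size of the globin probe fragment on the slide
    open_rate = 1e-4      # hypothetical cut rate for active (erythroblast) chromatin
    closed_rate = 1e-6    # hypothetical cut rate for inactive (lymphoblast) chromatin
    for dose in (0, 1, 2, 5):
        active = fraction_intact(fragment_bp, dose, open_rate)
        inactive = fraction_intact(fragment_bp, dose, closed_rate)
        print(f"dose {dose}: active {active:.2f}, inactive {inactive:.2f}")
```

With these assumed rates, the "active" band disappears within a few dose units while the "inactive" band stays almost fully intact, mirroring the qualitative difference the blot is designed to detect.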

Model for the Formation of Heterochromatin by HP1 Binding to Histone H3 Trimethylated at Lysine 9

• HP1 (heterochromatin protein 1) contributes to the condensation of heterochromatin by binding to the N-terminal tail of histone H3 once lysine 9 has been trimethylated (H3K9me3)

• HP1 molecules bound to neighboring histone H3 tails associate with one another (HP1 oligomerization), causing chromatin aggregation and condensation

• Heterochromatin condensation can spread along a chromosome because HP1 binds a histone methyltransferase (HMT) that methylates lysine 9 of histone H3 on the neighboring nucleosome, creating a new binding site for HP1. The spreading process continues until a “boundary element” is encountered
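The spreading mechanism described above is essentially a loop: each newly bound HP1 recruits the HMT to the next nucleosome, until a boundary element stops it. The toy model below represents a nucleosome array as a list of states and applies that rule in both directions from a seed. It is a deterministic sketch under simplifying assumptions (one nucleosome per step, no competing demethylation or stochastic loss); the state labels and function name are hypothetical.

```python
BOUNDARY = "boundary"  # boundary element: spreading stops here
OPEN = "open"          # nucleosome without H3K9me3
HET = "H3K9me3+HP1"    # methylated H3K9 with HP1 bound


def spread_heterochromatin(states, seed_index):
    """Return a new list in which heterochromatin has spread from seed_index.

    Toy rules (assumptions): HP1 at a methylated nucleosome recruits the HMT,
    which methylates H3K9 on the next nucleosome, creating a new HP1 site.
    This repeats in both directions until a boundary element or the array end.
    """
    states = list(states)
    if states[seed_index] == BOUNDARY:
        return states
    states[seed_index] = HET
    for direction in (-1, 1):
        i = seed_index + direction
        while 0 <= i < len(states) and states[i] != BOUNDARY:
            states[i] = HET  # neighbor methylated, so HP1 can now bind it
            i += direction
    return states


if __name__ == "__main__":
    array = [OPEN] * 3 + [BOUNDARY] + [OPEN] * 5 + [BOUNDARY] + [OPEN] * 2
    result = spread_heterochromatin(array, seed_index=6)
    symbols = {HET: "H", BOUNDARY: "|", OPEN: "."}
    print("".join(symbols[s] for s in result))  # ...|HHHHH|..
```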

Chromatin Contains Small Amounts of Nonhistone Proteins

• Besides histones and scaffold proteins, chromatin also contains small amounts of other nonhistone proteins

• High-mobility group (HMG) proteins: proteins that can bind transcription factors. In yeast, deleting HMG genes changes the expression of other genes throughout the genome

HMG proteins bind transcription factors and thereby stabilize the transcription-factor complexes that regulate gene expression

• DNA binding transcription factors: Regulate the transcription of genes

Model for the Folding of the 30-nm Chromatin Fiber in a Metaphase Chromosome

Model for the Packing of Chromatin and the Chromosome Scaffold in Metaphase

Overview of the Structure of Genes & Chromosomes

Eukaryotic Chromosomes Contain One Linear DNA Molecule

• Intact DNA molecules as long as entire chromosomes can be extracted from lower eukaryotes, indicating that each chromosome contains a single DNA molecule
  – DNA molecules from S. cerevisiae (2.3 × 10⁵ to 1.5 × 10⁶ bp) can be separated by pulsed-field gel electrophoresis
  – Drosophila genomic DNA molecules (6 × 10⁷ to 1 × 10⁸ bp) can also be readily analyzed
  – The DNA of the largest human chromosome (2.8 × 10⁸ bp) is too large to be extracted as an intact molecule
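Converting these sizes into physical lengths makes the size problem concrete. The sketch below uses the standard rise of about 0.34 nm per base pair for B-form DNA; the function name is hypothetical. The numbers alone show that the yeast molecules are tens to hundreds of micrometers long, whereas the largest human chromosomal DNA would be several centimeters long when fully extended, which makes it very hard to handle intact.

```python
RISE_NM_PER_BP = 0.34  # approximate rise per base pair in B-form DNA


def contour_length_um(bp):
    """Length of a fully extended B-DNA molecule of bp base pairs, in micrometers."""
    return bp * RISE_NM_PER_BP / 1_000  # 1,000 nm = 1 um


if __name__ == "__main__":
    examples = [
        ("smallest S. cerevisiae DNA", 2.3e5),
        ("largest S. cerevisiae DNA", 1.5e6),
        ("largest human chromosome DNA", 2.8e8),
    ]
    for label, bp in examples:
        um = contour_length_um(bp)
        print(f"{label}: {um:,.0f} um ({um / 10_000:.2f} cm)")
```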

• In summary, a eukaryotic chromosome is a linear structure composed of a single, immensely long DNA molecule wound around histone octamers (about 200 bp per nucleosome), forming strings of closely packed nucleosomes. The nucleosomes fold into 30-nm chromatin fibers, which attach to scaffold proteins to form loops. In addition, thousands of transcription factors and HMG proteins are also found in chromatin
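The ~200 bp per nucleosome figure in this summary lets us estimate how many nucleosomes a chromosome carries, and how many fit in one of the 50,000–200,000 bp loops described earlier. The sketch below simply divides and ignores any remainder; the function name is hypothetical and the 2.8 × 10⁸ bp input is the largest-human-chromosome figure from these notes.

```python
NUCLEOSOME_REPEAT_BP = 200  # ~200 bp per nucleosome (core DNA plus linker), from the slide


def nucleosome_count(bp, repeat_bp=NUCLEOSOME_REPEAT_BP):
    """Approximate number of nucleosomes on bp base pairs of DNA (remainder ignored)."""
    return bp // repeat_bp


if __name__ == "__main__":
    # 1,400,000 nucleosomes on a 2.8e8-bp chromosome
    print(f"{nucleosome_count(280_000_000):,} nucleosomes on a 2.8e8-bp chromosome")
    # 250 to 1000 nucleosomes per loop
    print(nucleosome_count(50_000), "to", nucleosome_count(200_000), "nucleosomes per loop")
```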

Morphology and functional elements of eukaryotic chromosome

Microscopic Appearance of a Typical Metaphase Chromosome

• Colchicine or colcemid: compounds that disrupt spindle microtubules, arresting cells in metaphase with the two sister chromatids still attached

• Karyotype: the number, sizes, and shapes of an organism's metaphase chromosomes

Karyotypes of Human Chromosomes

• In nondividing cells, chromosomes are not visible

• During mitosis or meiosis, chromosomes condense and become visible by light microscopy

• At metaphase of mitosis, each chromosome consists of two sister chromatids attached at the centromere

Giemsa Staining of Chromosomes

• G-bands: Giemsa staining of human chromosomes produces a characteristic pattern of bands (G-banding)

• G-bands correspond to large regions of the human genome that have low “G+C” content

• R Bands: R bands are produced by treating human chromosomes with hot alkaline solution and subsequent staining with Giemsa reagent. The pattern of R-bands is opposite to the pattern of G-bands

• Cytogeneticists use R-bands and G-bands to identify chromosomal aberrations

• Chromosome painting: labeling whole chromosomes by fluorescence in situ hybridization (FISH) with chromosome-specific fluorescent probes, in a single color or in multiple colors


Using G-Banding and Multicolor FISH to Reveal Translocations

A translocation between chromosomes 9 and 22 produces the Philadelphia chromosome, found in nearly all patients with chronic myelogenous leukemia

Banding on Drosophila Polytene Salivary Gland Chromosomes

Band revealed by in situ hybridization

• Polytene chromosomes arise by DNA amplification: the chromosomes replicate repeatedly, but the daughter chromosomes do not separate
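Because each round of replication doubles the DNA and the copies stay aligned side by side, the number of parallel copies grows as 2ⁿ after n rounds. The sketch below shows that arithmetic; the round counts are illustrative inputs rather than measured values for Drosophila, and the function name is hypothetical.

```python
def strands_after_rounds(rounds, starting_strands=1):
    """Number of parallel DNA copies after `rounds` of replication without separation."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return starting_strands * 2 ** rounds


if __name__ == "__main__":
    for rounds in (1, 5, 10):  # illustrative values only
        print(rounds, "rounds ->", strands_after_rounds(rounds), "copies")
```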

Interphase Polytene Chromosomes in the Salivary Gland of Drosophila melanogaster Arise by DNA Amplification

Functional Elements Required for Replication and Stable Inheritance of Chromosomes

• Although chromosomes differ in length and number between species, the chromosomes behave similarly at the time of cell division

• Three functional elements are required for a eukaryotic chromosome to replicate and segregate correctly: replication origins, a centromere, and two telomeres

• The experiments described in the next few slides demonstrate the importance of these functional elements

Yeast Transfection Experiment

ARS (autonomously replicating sequence) is required for DNA replication in yeast


CEN (Yeast centromere sequence) is required for proper segregation


TEL (telomere sequence) is required for stable replication of linear chromosomal DNA
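The three transfection results can be condensed into a small decision rule: without an ARS a plasmid cannot replicate, a linear plasmid without TEL sequences is not maintained, and without a CEN a replicating plasmid segregates poorly. The sketch below encodes that simplified reading. The `Plasmid` class and `predicted_outcome` function are hypothetical, and real experiments measure loss frequencies rather than yes/no outcomes.

```python
from dataclasses import dataclass


@dataclass
class Plasmid:
    """Which yeast chromosomal elements a transfected plasmid carries (hypothetical model)."""
    ars: bool = False     # autonomously replicating sequence (replication origin)
    cen: bool = False     # centromere sequence
    tel: bool = False     # telomere sequences at both ends
    linear: bool = False  # linear rather than circular DNA


def predicted_outcome(p: Plasmid) -> str:
    """Simplified reading of the ARS / CEN / TEL experiments on this slide."""
    if not p.ars:
        return "no replication: plasmid lost"
    if p.linear and not p.tel:
        return "ends not maintained: linear plasmid lost"
    if not p.cen:
        return "replicates but segregates poorly: frequently lost"
    return "replicates and segregates stably"


if __name__ == "__main__":
    for plasmid in (Plasmid(), Plasmid(ars=True), Plasmid(ars=True, cen=True),
                    Plasmid(ars=True, cen=True, linear=True),
                    Plasmid(ars=True, cen=True, tel=True, linear=True)):
        print(plasmid, "->", predicted_outcome(plasmid))
```

The last case, a linear molecule carrying ARS, CEN, and TEL, is the design that the yeast artificial chromosome builds on.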

Comparison of CEN Sequence between Yeast and Drosophila

• Centromeres from yeast and Drosophila vary greatly in length

• In the yeast CEN sequence, regions I and III are short and their sequences are conserved

• Region II varies in sequence but is fairly constant in length and is AT-rich

• Regions I and III bind about 30 proteins that also bind the microtubules of the spindle apparatus during mitosis, whereas region II is bound to a nucleosome in which histone H3 is replaced by a variant form of H3 (e.g., CENP-A in humans)

Yeast Artificial Chromosomes Serve as Cloning Vector to Clone Megabase DNA Fragments

• A yeast artificial chromosome (YAC) consists of yeast TEL sequences, a yeast CEN and an ARS, a selectable marker, and the DNA to be cloned, for a total length of more than 50 kb

• Only 1 daughter cell in 1,000 to 10,000 fails to receive the artificial chromosome

• The successful propagation of YACs and studies presented earlier strongly support the conclusion that yeast chromosomes, and probably all eukaryotic chromosomes are linear double-stranded DNA molecules containing special regions that ensure replication and proper segregation
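To see what a loss rate of 1 in 1,000 to 1 in 10,000 means over many generations, the sketch below treats loss as an independent event with a fixed probability at each division, so the retained fraction after n divisions is (1 − p)ⁿ. This assumes no selection for the YAC's marker, and the function name is hypothetical.

```python
def fraction_retaining(loss_per_division, divisions):
    """Expected fraction of cells still carrying the YAC after `divisions` divisions.

    Assumes each division loses the YAC independently with a fixed probability
    and that no selection acts on the selectable marker.
    """
    if not 0 <= loss_per_division <= 1:
        raise ValueError("loss_per_division must be a probability")
    return (1 - loss_per_division) ** divisions


if __name__ == "__main__":
    for loss in (1e-3, 1e-4):  # 1 in 1,000 and 1 in 10,000, from the slide
        print(f"loss {loss:g}: {fraction_retaining(loss, 100):.3f} retained after 100 divisions")
```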

Action of Telomerase to Prevent Shortening of Chromosomes

• Telomeres of many organisms consist of short, tandemly repeated G-rich sequences on the strand whose 3′ end lies at the end of the chromosome; in vertebrates, including humans, the repeat unit is TTAGGG

• The lengths of repeats are several bp in protozoans and several thousand bp in vertebrates

• This region is bound by specific proteins that protect the ends of linear chromosomes from exonuclease digestion

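• Unlike leading-strand synthesis, lagging-strand synthesis cannot be completed at the very end of a linear DNA molecule, so each round of replication would shorten the chromosome. Telomerase, a reverse transcriptase that carries its own RNA template, extends the 3′ end with new repeats, which provides a template for filling in the lagging strand, thus maintaining the proper length of the chromosome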
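The two ideas on this slide, terminal TTAGGG repeats and length maintenance by telomerase, can be sketched in a few lines. `terminal_repeat_count` counts tandem repeat units at the 3′ end of a G-rich strand written 5′→3′, and `telomere_lengths` tracks length over divisions when a fixed amount is lost each time and, optionally, a fixed amount is added back by telomerase. Both function names are hypothetical, and the starting length and per-division loss and gain are illustrative inputs, not measured values.

```python
TELOMERE_REPEAT = "TTAGGG"  # vertebrate repeat unit, read 5'->3' on the G-rich strand


def terminal_repeat_count(g_strand, unit=TELOMERE_REPEAT):
    """Count tandem copies of unit at the 3' end of a strand written 5'->3'."""
    count, end = 0, len(g_strand)
    while end >= len(unit) and g_strand[end - len(unit):end] == unit:
        count += 1
        end -= len(unit)
    return count


def telomere_lengths(divisions, start_bp, loss_bp, telomerase_bp=0):
    """Telomere length after each division (index 0 = starting length).

    loss_bp: end sequence lost per division from incomplete lagging-strand synthesis.
    telomerase_bp: sequence restored per division by telomerase (0 = no telomerase).
    Both values are hypothetical inputs; length is floored at 0.
    """
    lengths = [start_bp]
    for _ in range(divisions):
        lengths.append(max(lengths[-1] - loss_bp + telomerase_bp, 0))
    return lengths


if __name__ == "__main__":
    print(terminal_repeat_count("ACGTAC" + TELOMERE_REPEAT * 4))  # 4
    print(telomere_lengths(5, 10_000, 50))                      # shrinks by 50 bp per division
    print(telomere_lengths(5, 10_000, 50, telomerase_bp=50))    # telomerase offsets the loss
```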

• Reading list: Maintenance of chromosomes by telomeres and telomerase (The Nobel Prize in Physiology or Medicine 2009)

Assigned Readings [III]:

1. Repeated sequences in DNA
2. Integration of Cot analysis, DNA cloning and high-throughput sequencing facilitate genome characterization and gene discovery
3. Initial sequencing and analysis of the human genome
4. Finishing the euchromatic sequence of the human genome
5. The functional genomics of noncoding RNA
6. Maintenance of chromosomes by telomeres and telomerase, a Nobel Lecture
7. DNA methylation and histone modifications: teaming up to silence genes
8. Histone lysine demethylases: emerging roles in development, physiology and disease
9. The key to development: interpreting the histone code?
10. Histones
