genetic control
TRANSCRIPT
Page 1 of 16
Genetic control
(a) describe the structure of RNA and DNA and explain the importance of base pairing and hydrogen bonding
DNA:- Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints, like a recipe or a code, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules. The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information. Chemically, DNA consists of two long polymers of simple units called nucleotides, with backbones made of sugars and phosphate groups joined by ester bonds. These two strands run in opposite directions to each other and are therefore anti-parallel. Attached to each sugar is one of four types of molecules called bases. It is the sequence of these four bases along the backbone that encodes information. This information is read using the genetic code, which specifies the sequence of the amino acids within proteins. The code is read by copying stretches of DNA into the related nucleic acid RNA, in a process called transcription.
STRUCTURE:- Base pairing Each type of base on one strand forms a bond with just one type of base on the other strand. This is called complementary base pairing. Here, purines form hydrogen bonds to pyrimidines, with a bonding only to T, and C bonding only to G. This arrangement of two nucleotides binding together across the double helix is called a base pair. As hydrogen bonds are not covalent, they can be broken and rejoined relatively easily. The two strands of DNA in a double helix can therefore be pulled apart like a zipper, either by a mechanical force or high temperature. As a result of this complementarities, all the information in the double-stranded sequence of a DNA helix is duplicated on each strand, which is vital in DNA replication. Indeed, this reversible and specific interaction between complementary base pairs is critical for all the functions of DNA in living organisms.
Sense and antisense A DNA sequence is called "sense" if its sequence is the same as that of a messenger RNA copy that is translated into protein. The sequence on the opposite strand is called the "antisense" sequence. Both sense and antisense sequences can exist on different parts of the same strand of DNA (i.e. both strands contain both sense and antisense sequences). In both prokaryotes and eukaryotes, antisense RNA sequences are produced, but the functions of these RNAs are not entirely clear. One proposal is that antisense RNAs are involved in regulating gene expression through RNA-RNA base pairing. A few DNA sequences in prokaryotes and eukaryotes, and more in plasmids and viruses, blur the distinction between sense and antisense strands by having overlapping genes. In these cases, some DNA sequences do double duty, encoding one protein when read along one strand, and a second protein when read in the opposite direction along the other strand. In bacteria, this overlap may be involved in the regulation of gene transcription, while in viruses, overlapping genes increase the amount of information that can be encoded within the small viral genome.
Page 2 of 16
Super coiling DNA super coil DNA can be twisted like a rope in a process called DNA super coiling. With DNA in its "relaxed" state, a strand usually circles the axis of the double helix once every 10.4 base pairs, but if the DNA is twisted the strands become more tightly or more loosely wound .If the DNA is twisted in the direction of the helix, this is positive super coiling, and the bases are held more tightly together. If they are twisted in the opposite direction, this is negative super coiling, and the bases come apart more easily. In nature, most DNA has slight negative super coiling that is introduced by enzymes called topoisomerases. These enzymes are also needed to relieve the twisting stresses introduced into DNA strands during processes such as transcription and DNA replication.
Importance of Base Pairing
Deoxy-ribonucleic Acid, DNA as we know it to be, has a double helical structure and is made up of small molecules
called nucleotides. (DNA can in other words be referred to as a nucleotide polymer.)Each nucleotide comprises a 5-
carbon sugar, a nitrogenous base and a phosphate group. These are joined together to form a ''twisted-ladder''.
One complete turn of the twisted ladder (i.e the double helical structure), which could be likened to the pitch of a
screw, has an approx. length of 3.4nm and a 2nm width. This is achieved chiefly by the complementarities of the
base pairing that occurs with the bases in DNA. (Remember, Guanine always bonds to Cytosine with 3 hydrogen
bonds and Adenine to Thymine with two hydrogen bonds!). In RNA however, the base thymine is replaced with
Uracil. This means that the base that is complementary to Adenine on a DNA strand , would be Uracil on the RNA
strand. The main significance of base pairing is to enable the maintenance of the gene pool within a species. You
will greatly appreciate that the DNA sequence that codes for a certain protein must be constant so as to make sure
that it codes for that same protein when it is Translated. For example, if the base sequence for the coding of the
hormone insulin were to change, the body would not be able to produce normal insulin and this could result in a
condition such as diabetes! This is the significance of base pairing; to maintain genetic stability and thus the
genetic characteristics that pass on to the offspring of a species.
RNA
Page 3 of 16
Ribonucleic acid (RNA) is a biologically important type of molecule that consists of a long chain of nucleotide units.
Each nucleotide consists of a nitrogenous base, a ribose sugar, and a phosphate. RNA is very similar to DNA, but
differs in a few important structural details: in the cell, RNA is usually single-stranded, while DNA is usually double-
stranded; RNA nucleotides contain ribose while DNA contains deoxyribose (a type of ribose that lacks one oxygen
atom); and RNA has the base uracil rather than thymine that is present in DNA. RNA is transcribed from DNA by
enzymes called RNA polymerases and is generally further processed by other enzymes. RNA is central to protein
synthesis. Here, a type of RNA called messenger RNA carries information from DNA to structures called ribosomes.
These ribosomes are made from proteins and ribosomal RNAs, which come together to form a molecular machine
that can read messenger RNAs and translate the information they carry into proteins. There are many RNAs with
other roles – in particular regulating which genes are expressed, but also as the genomes of most viruses.
STRUCTRE
Each nucleotide in RNA contains a ribose sugar, with carbons numbered 1' through 5'. A base is attached to the 1'
position, generally adenine (A), cytosine (C), guanine (G) or uracil (U). Adenine and guanine are purines, cytosine
and uracil are pyrimidines. A phosphate group is attached to the 3' position of one ribose and the 5' position of the
next. The phosphate groups have a negative charge each at physiological pH, making RNA a charged molecule
(polyanion). The bases may form hydrogen bonds between cytosine and guanine, between adenine and uracil and
between guanine and uracil. However other interactions are possible, such as a group of adenine bases binding to
each other in a bulge, or the GNRA tetraloop that has a guanine–adenine base-pair.
Chemical structure of RNA
An important structural feature of RNA that distinguishes it from DNA is the presence of a hydroxyl group at the 2'
position of the ribose sugar. The presence of this functional group causes the helix to adopt the A-form geometry
rather than the B-form most commonly observed in DNA.This results in a very deep and narrow major groove and
a shallow and wide minor groove. A second consequence of the presence of the 2'-hydroxyl group is that in
conformationally flexible regions of an RNA molecule (that is, not involved in formation of a double helix), it can
chemically attack the adjacent phosphodiester bond to cleave the backbone.
b) explain how DNA replicates semi-conservatively during interphase
Replication is the process by which a cell copies its DNA prior to division. In humans, for example, each
parent cell must copy its entire six billion base pairs of DNA before undergoing mitosis. The molecular
details of DNA replication are described elsewhere, and they were not known until some time after
Watson and Crick's discovery. In fact, before such details could be determined, scientists were faced with
a more fundamental research concern. Specifically, they wanted to know the overall nature of the process
by which DNA replication occurs.
Watson and Crick themselves had specific ideas about DNA replication, and these ideas were based on
the structure of the DNA molecule. In particular, the duo hypothesized that replication occurs in a
"semiconservative" fashion. According to the semiconservative replication model, which is illustrated in
Figure 1, the two original DNA strands (i.e., the two complementary halves of the double helix) separate
during replication; each strand then serves as a template for a new DNA strand, which means that each
newly synthesized double helix is a combination of one old (or original) and one new DNA strand.
Conceptually, semiconservative replication made sense in light of the double helix structural model of
DNA, in particular its complementary nature and the fact that adenine always pairs with thymine and
cytosine always pairs with guanine. Looking at this model, it is easy to imagine that during replication,
Page 4 of 16
each strand serves as a template for the synthesis of a new strand, with complementary bases being
added in the order indicated.
Semiconservative replication was not the only model of DNA replication proposed during the mid-1950s,
however. In fact, two other prominent hypotheses were put also forth: conservative replication and
dispersive replication. According to the conservative replication model, the entire original DNA double
helix serves as a template for a new double helix, such that each round of cell division produces one
daughter cell with a completely new DNA double helix and another daughter cell with a completely intact
old (or original) DNA double helix. On the other hand, in the dispersive replication model, the original DNA
double helix breaks apart into fragments, and each fragment then serves as a template for a new DNA
fragment. As a result, every cell division produces two cells with varying amounts of old and new DNA
Figure 1: Three proposed models of replication are conservative replication, dispersive replication, and
semiconservative replication.
Making Predictions Based on the Models
When these three models were first proposed, scientists had few clues about what might be occurring at
the molecular level during DNA replication. Fortunately, the models yielded different predictions about the
distribution of old versus new DNA in newly divided cells, no matter what the underlying molecular
mechanisms. These predictions were as follows:
According to the semiconservative model, after one round of replication, every new DNA double helix
would be a hybrid that consisted of one strand of old DNA bound to one strand of newly synthesized
DNA. Then, during the second round of replication, the hybrids would separate, and each strand would
pair with a newly synthesized strand. Afterward, only half of the new DNA double helices would be
hybrids; the other half would be completely new. Every subsequent round of replication therefore would
result in fewer hybrids and more completely new double helices.
According to the conservative model, after one round of replication, half of the new DNA double helices
would be composed of completely old, or original, DNA, and the other half would be completely new.
Then, during the second round of replication, each double helix would be copied in its entirety. Afterward,
one-quarter of the double helices would be completely old, and three-quarters would be completely new.
Thus, each subsequent round of replication would result in a greater proportion of completely new DNA
double helices, while the number of completely original DNA double helices would remain constant.
Page 5 of 16
According to the dispersive model, every round of replication would result in hybrids, or DNA double
helices that are part original DNA and part new DNA. Each subsequent round of replication would then
produce double helices with greater amounts of new DNA.
(c) state that a gene is a sequence of nucleotides as part of a DNA
molecule, which codes for a polypeptide.
A gene is a unit of heredity in a living organism. It is normally a stretch of DNA that codes for a type of
protein or for an RNA chain that has a function in the organism. All living things depend on genes, as they
specify all proteins and functional RNA chains. Genes hold the information to build and maintain an
organism's cells and pass genetic traits to offspring. A modern working definition of a gene is "a locatable
region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory
regions, transcribed regions, and or other functional sequence regions ".Colloquial usage of the term
gene (e.g. "good genes, "hair color gene") may actually refer to an allele: a gene is the basic instruction, a
sequence of nucleic acid (DNA or, in the case of certain viruses RNA), while an allele is one variant of
that instruction. Thus, when the mainstream press refers to "having" a "gene" for a specific trait, the press
is wrong. All people would have the gene in question, but certain people will have a specific allele of that
gene, which results in the trait.
The notion of a gene is evolving with the science of genetics, which began when Gregor Mendel noticed
that biological variations are inherited from parent organisms as specific, discrete traits. The biological
entity responsible for defining traits was later termed a gene, but the biological basis for inheritance
remained unknown until DNA was identified as the genetic material in the 1940s. All organisms have
many genes corresponding to many different biological traits, some of which are immediately visible, such
as eye color or number of limbs, and some of which are not, such as blood type or increased risk for
specific diseases, or the thousands of basic biochemical processes that comprise life.
The chemical structure of a four-base fragment of a DNA double helix.
The vast majority of living organisms encode their genes in long strands of DNA. DNA (deoxyribonucleic
acid) consists of a chain made from four types of nucleotide subunits, each composed of: a five-carbon
sugar (2'-deoxyribose), a phosphate group, and one of the four bases adenine, cytosine, guanine, and
thymine. The most common form of DNA in a cell is in a double helix structure, in which two individual
DNA strands twist around each other in a right-handed spiral. In this structure, the base pairing rules
specify that guanine pairs with cytosine and adenine pairs with thymine. The base pairing between
guanine and cytosine forms three hydrogen bonds, whereas the base pairing between adenine and
thymine forms two hydrogen bonds. The two strands in a double helix must therefore be complementary,
that is, their bases must align such that the adenines of one strand are paired with the thymines of the
other strand, and so on.
Due to the chemical composition of the pentose residues of the bases, DNA strands have directionality.
One end of a DNA polymer contains an exposed hydroxyl group on the deoxyribose; this is known as the
3' end of the molecule. The other end contains an exposed phosphate group; this is the 5' end. The
directionality of DNA is vitally important to many cellular processes, since double helices are necessarily
directional (a strand running 5'-3' pairs with a complementary strand running 3'-5'), and processes such
as DNA replication occur in only one direction. All nucleic acid synthesis in a cell occurs in the 5'-3'
direction, because new monomers are added via a dehydration reaction that uses the exposed 3'
hydroxyl as a nucleophile.
Page 6 of 16
The expression of genes encoded in DNA begins by transcribing the gene into RNA, a second type of
nucleic acid that is very similar to DNA, but whose monomers contain the sugar ribose rather than
deoxyribose. RNA also contains the base uracil in place of thymine. RNA molecules are less stable than
DNA and are typically single-stranded. Genes that encode proteins are composed of a series of three-
nucleotide sequences called codons, which serve as the words in the genetic language. The genetic code
specifies the correspondence during protein translation between codons and amino acids. The genetic
code is nearly the same for all known organisms.
This stylistic diagram shows a gene in relation to the double helix structure of DNA and to a chromosome
(right). The chromosome is X-shaped because it is dividing. Introns are regions often found in eukaryote
genes that are removed in the splicing process (after the DNA is transcribed into RNA): Only the exons
encode the protein. This diagram labels a region of only 50 or so bases as a gene. In reality, most genes
are hundreds of times larger.
(d) describe the way in which the nucleotide sequence codes for the amino acid sequence in a polypeptide with reference to the nucleotide sequence for HbA (normal) and HbS (sickle cell) alleles of the gene for the β-haemoglobin polypeptide
The genetic code is the set of rules by which information encoded in genetic material (DNA or mRNA
sequences) is translated into proteins (amino acid sequences) by living cells. The code defines a
mapping between tri-nucleotide sequences, called codons, and amino acids. With some exceptions, a
triplet codon in a nucleic acid sequence specifies a single amino acid. Because the vast majority of genes
are encoded with exactly the same code (see the RNA codon table), this particular code is often referred
to as the canonical or standard genetic code, or simply the genetic code, though in fact there are many
variant codes. For example, protein synthesis in human mitochondria relies on a genetic code that differs
from the standard genetic code.
Not all genetic information is stored using the genetic code. All organisms' DNA contains regulatory
sequences, intergenic segments, and chromosomal structural areas that can contribute greatly to
phenotype. Those elements operate under sets of rules that are distinct from the codon-to-amino acid
paradigm underlying the genetic code.
Page 7 of 16
A series of codons in part of a mRNA molecule. Each codon consists of three nucleotides, usually
representing a single amino acid.
Transfer of information via the genetic code
The genome of an organism is inscribed in DNA, or in the case of some viruses, RNA. The portion of the
genome that codes for a protein or an RNA is referred to as a gene. Those genes that code for proteins
are composed of tri-nucleotide units called codons, each coding for a single amino acid. Each nucleotide
sub-unit consists of a phosphate, deoxyribose sugar and one of the 4 nitrogenous nucleobases. The
purine bases adenine (A) and guanine (G) are larger and consist of two aromatic rings. The pyrimidine
bases cytosine (C) and thymine (T) are smaller and consist of only one aromatic ring. In the double-helix
configuration, two strands of DNA are joined to each other by hydrogen bonds in an arrangement known
as base pairing. These bonds almost always form between an adenine base on one strand and a thymine
on the other strand and between a cytosine base on one strand and a guanine base on the other. This
means that the number of A and T residues will be the same in a given double helix, as will the number of
G and C residues. In RNA, thymine (T) is replaced by uracil (U), and the deoxyribose is substituted by
ribose.
Each protein-coding gene is transcribed into a template molecule of the related polymer RNA, known as
messenger RNA or mRNA. This, in turn, is translated on the ribosome into an amino acid chain or
polypeptide. The process of translation requires transfer RNAs specific for individual amino acids with the
amino acids covalently attached to them, guanosine triphosphate as an energy source, and a number of
translation factors. tRNAs have anticodons complementary to the codons in mRNA and can be "charged"
covalently with amino acids at their 3' terminal CCA ends. Individual tRNAs are charged with specific
amino acids by enzymes known as aminoacyl tRNA synthetases, which have high specificity for both their
cognate amino acids and tRNAs. The high specificity of these enzymes is a major reason why the fidelity
of protein translation is maintained.
There are 4³ = 64 different codon combinations possible with a triplet codon of three nucleotides; all 64
codons are assigned for either amino acids or stop signals during translation. If, for example, an RNA
sequence, UUUAAACCC is considered and the reading frame starts with the first U (by convention, 5' to
3'), there are three codons, namely, UUU, AAA and CCC, each of which specifies one amino acid. This
RNA sequence will be translated into an amino acid sequence, three amino acids long. A comparison
may be made with computer science, where the codon is similar to a word, which is the standard "chunk"
Page 8 of 16
The nucleotide sequence for HbS(SICKLE CELL)
Sickle-cell gene mutation probably arose spontaneously in different geographic areas, as suggested by
restriction endonuclease analysis. These variants are known as Cameroon, Senegal, Benin, Bantu and
Saudi-Asian. Their clinical importance springs from the fact that some of them are associated with higher
HbF levels, e.g., Senegal and Saudi-Asian variants, and tend to have milder disease.
In people heterozygous for HgbS (carriers of sickling haemoglobin), the polymerisation problems are
minor, because the normal allele is able to produce over 50% of the haemoglobin. In people homozygous
for HgbS, the presence of long-chain polymers of HbS distort the shape of the red blood cell from a
smooth donut-like shape to ragged and full of spikes, making it fragile and susceptible to breaking within
capillaries. Carriers have symptoms only if they are deprived of oxygen (for example, while climbing a
mountain) or while severely dehydrated. Under normal circumstances, these painful crises occur about
0.8 times per year per patient.[citation needed] The sickle-cell disease occurs when the seventh amino
acid (if the initial methionine is counted), glutamic acid, is replaced by valine to change its structure and
function.
The gene defect is a known mutation of a single nucleotide (see single-nucleotide polymorphism - SNP)
(A to T) of the β-globin gene, which results in glutamate being substituted by valine at position 6.
Haemoglobin S with this mutation are referred to as HbS, as opposed to the normal adult HbA. The
genetic disorder is due to the mutation of a single nucleotide, from a GAG to GTG codon mutation. This is
normally a benign mutation, causing no apparent effects on the secondary, tertiary, or quaternary
structure of haemoglobin in conditions of normal oxygen concentration. What it does allow for, under
conditions of low oxygen concentration, is the polymerization of the HbS itself. The deoxy form of
haemoglobin exposes a hydrophobic patch on the protein between the E and F helices. The hydrophobic
residues of the valine at position 6 of the beta chain in haemoglobin are able to associate with the
hydrophobic patch, causing haemoglobin S molecules to aggregate and form fibrous precipitates.
The allele responsible for sickle-cell anaemia is autosomal recessive and can be found on the short arm
of chromosome 11. A person that receives the defective gene from both father and mother develops the
disease; a person that receives one defective and one healthy allele remains healthy, but can pass on the
disease and is known as a carrier. If two parents who are carriers have a child, there is a 1-in-4 chance of
their child's developing the disease and a 1-in-2 chance of their child's being just a carrier. Since the gene
is incompletely recessive, carriers can produce a few sickled red blood cells, not enough to cause
symptoms, but enough to give resistance to malaria. Because of this, heterozygotes have a higher fitness
than either of the homozygotes. This is known as heterozygote advantage.
Due to the adaptive advantage of the heterozygote, the disease is still prevalent, especially among
people with recent ancestry in malaria-stricken areas, such as Africa, the Mediterranean, India and the
Middle East. Malaria was historically endemic to southern Europe, but it was declared eradicated in the
mid-20th century, with the exception of rare sporadic cases.
Inheritance
Sickle-cell conditions are inherited from parents in much the same way as blood type, hair colour and
texture, eye colour, and other physical traits.
The types of haemoglobin a person makes in the red blood cells depend on what haemoglobin genes are
inherited from his parents.
Page 9 of 16
If one parent has sickle-cell anaemia (SS) and the other has sickle-cell trait (AS), there is a 50% chance
of a child's having sickle-cell disease (SS) and a 50% chance of a child's having sickle-cell trait (AS).
When both parents have sickle-cell trait (AS), a child has a 25% chance (1 of 4) of sickle-cell disease
(SS), as shown in the diagram.
(e) describe how the information on DNA is used during transcription and translation to construct polypeptides, including the role of messenger RNA (mRNA), transfer RNA (tRNA) and
the ribosome
TRANSCRIPTION
ranscription is the process of creating an equivalent RNA copy of a sequence of DNA[1]. Both RNA and
DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be
converted back and forth from DNA to RNA in the presence of the correct enzymes. During transcription,
a DNA sequence is read by RNA polymerase, which produces a complementary, antiparallel RNA strand.
As opposed to DNA replication, transcription results in an RNA complement that includes uracil (U) in all
instances where thymine (T) would have occurred in a DNA complement.
Transcription is the first step leading to gene expression. The stretch of DNA transcribed into an RNA
molecule is called a transcription unit and encodes at least one gene. If the gene transcribed encodes for
a protein, the result of transcription is messenger RNA (mRNA), which will then be used to create that
protein via the process of translation. Alternatively, the transcribed gene may encode for either ribosomal
RNA (rRNA) or transfer RNA (tRNA), other components of the protein-assembly process, or other
ribozymes.
A DNA transcription unit encoding for a protein contains not only the sequence that will eventually be
directly translated into the protein (the coding sequence) but also regulatory sequences that direct and
regulate the synthesis of that protein. The regulatory sequence before (upstream from) the coding
sequence is called the five prime untranslated region (5'UTR), and the sequence following (downstream
from) the coding sequence is called the three prime untranslated region (3'UTR).
Page 10 of 16
Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls
for copying DNA; therefore, transcription has a lower copying fidelity than DNA replication.[2]
As in DNA replication, DNA is read from 3' → 5' during transcription. Meanwhile, the complementary RNA
is created from the 5' → 3' direction. Although DNA is arranged as two antiparallel strands in a double
helix, only one of the two DNA strands, called the template strand, is used for transcription. This is
because RNA is only single-stranded, as opposed to double-stranded DNA. The other DNA strand is
called the coding strand, because its sequence is the same as the newly created RNA transcript (except
for the substitution of uracil for thymine). The use of only the 3' → 5' strand eliminates the need for the
Okazaki fragments seen in DNA replication.
Transcription is divided into 5 stages: pre-initiation, initiation, promoter clearance, elongation and
termination.
Major steps
Pre-initiation
In eukaryotes, RNA polymerase, and therefore the initiation of transcription, requires the presence of a
core promoter sequence in the DNA. Promoters are regions of DNA which promote transcription and in
eukaryotes, are found at -30, -75 and -90 base pairs upstream from the start site of transcription. Core
promoters are sequences within the promoter which are essential for transcription initiation. RNA
polymerase is able to bind to core promoters in the presence of various specific transcription factors.
The most common type of core promoter in eukaryotes is a short DNA sequence known as a TATA box,
found -30 base pairs from the start site of transcription. The TATA box, as a core promoter, is the binding
site for a transcription factor known as TATA binding protein (TBP), which is itself a subunit of another
transcription factor, called Transcription Factor II D (TFIID). After TFIID binds to the TATA box via the
TBP, five more transcription factors and RNA polymerase combine around the TATA box in a series of
stages to form a preinitiation complex. One transcription factor, DNA helicase, has helicase activity and
so is involved in the separating of opposing strands of double-stranded DNA to provide access to a
single-stranded DNA template. However, only a low, or basal, rate of transcription is driven by the
preinitiation complex alone. Other proteins known as activators and repressors, along with any associated
coactivators or corepressors, are responsible for modulating transcription rate.
The transcription preinitiation in archaea is essentially homologous to that of eukaryotes, but is much less
complex.[3] The archaeal preinitiation complex assembles at a TATA-box binding site; however, in
archaea, this complex is composed of only RNA polymerase II, TBP, and TFB (the archaeal homologue
of eukaryotic transcription factor II B (TFIIB)).[4][5]
Initiation
Simple diagram of transcription initiation. RNAP = RNA polymerase
In bacteria, transcription begins with the binding of RNA polymerase to the promoter in DNA. RNA
polymerase is a core enzyme consisting of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω
subunit. At the start of initiation, the core enzyme is associated with a sigma factor (number 70) that aids
in finding the appropriate -35 and -10 base pairs downstream of promoter sequences.
Transcription initiation is more complex in eukaryotes. Eukaryotic RNA polymerase does not directly
recognize the core promoter sequences. Instead, a collection of proteins called transcription factors
Page 11 of 16
mediate the binding of RNA polymerase and the initiation of transcription. Only after certain transcription
factors are attached to the promoter does the RNA polymerase bind to it. The completed assembly of
transcription factors and RNA polymerase bind to the promoter, forming a transcription initiation complex.
Transcription in the archaea domain is similar to transcription in eukaryotes.[6]
Promoter clearance
After the first bond is synthesized, the RNA polymerase must clear the promoter. During this time there is
a tendency to release the RNA transcript and produce truncated transcripts. This is called abortive
initiation and is common for both eukaryotes and prokaroytes[7]. Abortive initiation continues to occur
until the σ factor rearranges, resulting in the transcription elongation complex (which gives a 35 bp
moving footprint). The σ factor is released before 80 nucleotides of mRNA are synthesized[8]. Once the
transcript reaches approximately 23 nucleotides, it no longer slips and elongation can occur. This, like
most of the remainder of transcription, is an energy-dependent process, consuming adenosine
triphosphate (ATP).
Promoter clearance coincides with phosphorylation of serine 5 on the carboxy terminal domain of RNA
Pol in eukaryotes, which is phosphorylated by TFIIH.
Elongation
Simple diagram of transcription elongation
One strand of DNA, the template strand (or noncoding strand), is used as a template for RNA synthesis.
As transcription proceeds, RNA polymerase traverses the template strand and uses base pairing
complementarity with the DNA template to create an RNA copy. Although RNA polymerase traverses the
template strand from 3' → 5', the coding (non-template) strand and newly-formed RNA can also be used
as reference points, so transcription can be described as occurring 5' → 3'. This produces an RNA
molecule from 5' → 3', an exact copy of the coding strand (except that thymines are replaced with uracils,
and the nucleotides are composed of a ribose (5-carbon) sugar where DNA has deoxyribose (one less
oxygen atom) in its sugar-phosphate backbone).
Unlike DNA replication, mRNA transcription can involve multiple RNA polymerases on a single DNA
template and multiple rounds of transcription (amplification of particular mRNA), so many mRNA
molecules can be rapidly produced from a single copy of a gene.
Elongation also involves a proofreading mechanism that can replace incorrectly incorporated bases. In
eukaryotes, this may correspond with short pauses during transcription that allow appropriate RNA editing
factors to bind. These pauses may be intrinsic to the RNA polymerase or due to chromatin structure.
Termination
Simple diagram of transcription termination
Bacteria use two different strategies for transcription termination. In Rho-independent transcription
termination, RNA transcription stops when the newly synthesized RNA molecule forms a G-C rich hairpin
loop followed by a run of Us, which makes it detach from the DNA template. In the "Rho-dependent" type
of termination, a protein factor called "Rho" destabilizes the interaction between the template and the
mRNA, thus releasing the newly synthesized mRNA from the elongation complex.
Page 12 of 16
Transcription termination in eukaryotes is less understood but involves cleavage of the new transcript
followed by template-independent addition of As at its new 3' end, in a process called polyadenylation
Simple diagram of transcription initiation. RNAP = RNA polymerase(figure-1)
Simple diagram of transcription elongation(f-2)
Simple diagram of transcription termination(f-3)
Translation
Translation is the first stage of protein biosynthesis (part of the overall process of gene expression). In
translation, messenger RNA (mRNA) produced by transcription is decoded by the ribosome to produce a
specific amino acid chain, or polypeptide, that will later fold into an active protein. Translation occurs in
the cell's cytoplasm, where the large and small subunits of the ribosome are located, and bind to the
mRNA. The ribosome facilitates decoding by inducing the binding of tRNAs with complementary
anticodon sequences to that of the mRNA. The tRNAs carry specific amino acids that are chained
together into a polypeptide as the mRNA passes through and is "read" by the ribosome in a fashion
reminiscent to that of a stock ticker and ticker tape.
A ribosome translating a protein that is secreted into the endoplasmic reticulum. tRNAs are colored dark
blue.
In many instances, the entire ribosome/mRNA complex will bind to the outer membrane of the rough
endoplasmic reticulum and release the nascent protein polypeptide inside for later vesicle transport and
secretion outside of the cell. Many types of transcribed RNA, such as transfer RNA, ribosomal RNA, and
small nuclear RNA, do not undergo translation into proteins.
Page 13 of 16
Translation proceeds in four phases: activation, initiation, elongation and termination (all describing the
growth of the amino acid chain, or polypeptide that is the product of translation). Amino acids are brought
to ribosomes and assembled into proteins.
In activation, the correct amino acid is covalently bonded to the correct transfer RNA (tRNA). The amino
acid is joined by its carboxyl group to the 3' OH of the tRNA by a peptide bond. When the tRNA has an
amino acid linked to it, it is termed "charged". Initiation involves the small subunit of the ribosome binding
to 5' end of mRNA with the help of initiation factors (IF). Termination of the polypeptide happens when the
A site of the ribosome faces a stop codon (UAA, UAG, or UGA). No tRNA can recognize or bind to this
codon. Instead, the stop codon induces the binding of a release factor protein that prompts the
disassembly of the entire ribosome/mRNA complex.
A number of antibiotics act by inhibiting translation; these include anisomycin, cycloheximide,
chloramphenicol, tetracycline, streptomycin, erythromycin, and puromycin, among others. Prokaryotic
ribosomes have a different structure from that of eukaryotic ribosomes, and thus antibiotics can
specifically target bacterial infections without any detriment to a eukaryotic host's cells.
Basic mechanisms
Further information: Prokaryotic translation and Eukaryotic translation
Diagram showing the translation of mRNA and the synthesis of proteins by a ribosome.
Diagram showing the translation of mRNA and the synthesis of proteins by a ribosome.
The mRNA carries genetic information encoded as a ribonucleotide sequence from the chromosomes to
the ribosomes. The ribonucleotides are "read" by translational machinery in a sequence of nucleotide
triplets called codons. Each of those triplets codes for a specific amino acid.
The ribosome molecules translate this code to a specific sequence of amino acids. The ribosome is a
multisubunit structure containing rRNA and proteins. It is the "factory" where amino acids are assembled
into proteins. tRNAs are small noncoding RNA chains (74-93 nucleotides) that transport amino acids to
the ribosome. tRNAs have a site for amino acid attachment, and a site called an anticodon. The
anticodon is an RNA triplet complementary to the mRNA triplet that codes for their cargo amino acid.
Page 14 of 16
Aminoacyl tRNA synthetase (an enzyme) catalyzes the bonding between specific tRNAs and the amino
acids that their anticodons sequences call for. The product of this reaction is an aminoacyl-tRNA
molecule. This aminoacyl-tRNA travels inside the ribosome, where mRNA codons are matched through
complementary base pairing to specific tRNA anticodons. The amino acids that the tRNAs carry are then
used to assemble a protein. The energy required for translation of proteins is significant. For a protein
containing n amino acids, the number of high-energy Phosphate bonds required to translate it is
4n+1[citation needed]. The rate of translation varies; it is significantly higher in prokaryotic cells (up to 17-
21 amino acid residues per second) than in eukaryotic cells (up to 6-9 amino acid residues per second)
[1]
Genetic code
Whereas other aspects such as the 3D structure, called tertiary structure, of protein can only be predicted
using sophisticated algorithms, the amino acid sequence, called primary structure, can be determined
solely from the nucleic acid sequence with the aid of a translation table.
This approach may not give the correct amino acid composition of the protein, in particular if
unconventional amino acids such as selenocysteine are incorporated into the protein, which is coded for
by a conventional stop codon in combination with a downstream hairpin (SElenoCysteine Insertion
Sequence, or SECIS).
There are many computer programs capable of translating a DNA/RNA sequence into a protein
sequence. Normally this is performed using the Standard Genetic Code; many bioinformaticians have
written at least one such program at some point in their education. However, few programs can handle all
the "special" cases, such as the use of the alternative initiation codons. For example, the rare alternative
start codon CTG codes for Methionine when used as a start codon, and for Leucine in all other positions.
Example: Condensed translation table for the Standard Genetic Code (from the NCBI Taxonomy
(f) explain that, as enzymes are proteins, their synthesis is controlled by
DNA
Synthesis
Main article: Protein biosynthesis
The DNA sequence of a gene encodes the amino acid sequence of a protein.
Proteins are assembled from amino acids using information encoded in genes. Each protein has its own
unique amino acid sequence that is specified by the nucleotide sequence of the gene encoding this
protein. The genetic code is a set of three-nucleotide sets called codons and each three-nucleotide
combination designates an amino acid, for example AUG (adenine-uracil-guanine) is the code for
methionine. Because DNA contains four nucleotides, the total number of possible codons is 64; hence,
there is some redundancy in the genetic code, with some amino acids specified by more than one
codon.[12] Genes encoded in DNA are first transcribed into pre-messenger RNA (mRNA) by proteins
such as RNA polymerase. Most organisms then process the pre-mRNA (also known as a primary
transcript) using various forms of post-transcriptional modification to form the mature mRNA, which is
then used as a template for protein synthesis by the ribosome. In prokaryotes the mRNA may either be
used as soon as it is produced, or be bound by a ribosome after having moved away from the nucleoid. In
contrast, eukaryotes make mRNA in the cell nucleus and then translocate it across the nuclear
Page 15 of 16
membrane into the cytoplasm, where protein synthesis then takes place. The rate of protein synthesis is
higher in prokaryotes than eukaryotes and can reach up to 20 amino acids per second.[13]
The process of synthesizing a protein from an mRNA template is known as translation. The mRNA is
loaded onto the ribosome and is read three nucleotides at a time by matching each codon to its base
pairing anticodon located on a transfer RNA molecule, which carries the amino acid corresponding to the
codon it recognizes. The enzyme aminoacyl tRNA synthetase "charges" the tRNA molecules with the
correct amino acids. The growing polypeptide is often termed the nascent chain. Proteins are always
biosynthesized from N-terminus to C-terminus.[12]
The DNA sequence of a gene encodes the amino acid sequence of a protein.
Chemical structure of the peptide bond (left) and a peptide bond between leucine and threonine (right).
The size of a synthesized protein can be measured by the number of amino acids it contains and by its
total molecular mass, which is normally reported in units of daltons (synonymous with atomic mass units),
or the derivative unit kilodalton (kDa). Yeast proteins are on average 466 amino acids long and 53 kDa in
mass.[11] The largest known proteins are the titins, a component of the muscle sarcomere, with a
molecular mass of almost 3,000 kDa and a total length of almost 27,000 amino acids.[14]
Resonance structures of the peptide bond that links individual amino acids to form a protein polymer.
Chemical synthesis
Short proteins can also be synthesized chemically by a family of methods known as peptide synthesis,
which rely on organic synthesis techniques such as chemical ligation to produce peptides in high
Page 16 of 16
yield.[15] Chemical synthesis allows for the introduction of non-natural amino acids into polypeptide
chains, such as attachment of fluorescent probes to amino acid side chains.[16] These methods are
useful in laboratory biochemistry and cell biology, though generally not for commercial applications.
Chemical synthesis is inefficient for polypeptides longer than about 300 amino acids, and the synthesized
proteins may not readily assume their native tertiary structure. Most chemical synthesis methods proceed
from C-terminus to N-terminus, opposite the biological reaction.[17]