genetic control

16
Page 1 of 16 Genetic control (a) describe the structure of RNA and DNA and explain the importance of base pairing and hydrogen bonding DNA:- Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints, like a recipe or a code, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules. The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information. Chemically, DNA consists of two long polymers of simple units called nucleotides, with backbones made of sugars and phosphate groups joined by ester bonds. These two strands run in opposite directions to each other and are therefore anti-parallel. Attached to each sugar is one of four types of molecules called bases. It is the sequence of these four bases along the backbone that encodes information. This information is read using the genetic code, which specifies the sequence of the amino acids within proteins. The code is read by copying stretches of DNA into the related nucleic acid RNA, in a process called transcription. STRUCTURE:- Base pairing Each type of base on one strand forms a bond with just one type of base on the other strand. This is called complementary base pairing. Here, purines form hydrogen bonds to pyrimidines, with a bonding only to T, and C bonding only to G. This arrangement of two nucleotides binding together across the double helix is called a base pair. As hydrogen bonds are not covalent, they can be broken and rejoined relatively easily. The two strands of DNA in a double helix can therefore be pulled apart like a zipper, either by a mechanical force or high temperature. As a result of this complementarities, all the information in the double-stranded sequence of a DNA helix is duplicated on each strand, which is vital in DNA replication. Indeed, this reversible and specific interaction between complementary base pairs is critical for all the functions of DNA in living organisms. Sense and antisense A DNA sequence is called "sense" if its sequence is the same as that of a messenger RNA copy that is translated into protein. The sequence on the opposite strand is called the "antisense" sequence. Both sense and antisense sequences can exist on different parts of the same strand of DNA (i.e. both strands contain both sense and antisense sequences). In both prokaryotes and eukaryotes, antisense RNA sequences are produced, but the functions of these RNAs are not entirely clear. One proposal is that antisense RNAs are involved in regulating gene expression through RNA-RNA base pairing. A few DNA sequences in prokaryotes and eukaryotes, and more in plasmids and viruses, blur the distinction between sense and antisense strands by having overlapping genes. In these cases, some DNA sequences do double duty, encoding one protein when read along one strand, and a second protein when read in the opposite direction along the other strand. In bacteria, this overlap may be involved in the regulation of gene transcription, while in viruses, overlapping genes increase the amount of information that can be encoded within the small viral genome.

Upload: soath

Post on 08-Apr-2015

182 views

Category:

Documents


2 download

TRANSCRIPT

Page 1 of 16

Genetic control

(a) describe the structure of RNA and DNA and explain the importance of base pairing and hydrogen bonding

DNA:- Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints, like a recipe or a code, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules. The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information. Chemically, DNA consists of two long polymers of simple units called nucleotides, with backbones made of sugars and phosphate groups joined by ester bonds. These two strands run in opposite directions to each other and are therefore anti-parallel. Attached to each sugar is one of four types of molecules called bases. It is the sequence of these four bases along the backbone that encodes information. This information is read using the genetic code, which specifies the sequence of the amino acids within proteins. The code is read by copying stretches of DNA into the related nucleic acid RNA, in a process called transcription.

STRUCTURE:- Base pairing Each type of base on one strand forms a bond with just one type of base on the other strand. This is called complementary base pairing. Here, purines form hydrogen bonds to pyrimidines, with a bonding only to T, and C bonding only to G. This arrangement of two nucleotides binding together across the double helix is called a base pair. As hydrogen bonds are not covalent, they can be broken and rejoined relatively easily. The two strands of DNA in a double helix can therefore be pulled apart like a zipper, either by a mechanical force or high temperature. As a result of this complementarities, all the information in the double-stranded sequence of a DNA helix is duplicated on each strand, which is vital in DNA replication. Indeed, this reversible and specific interaction between complementary base pairs is critical for all the functions of DNA in living organisms.

Sense and antisense A DNA sequence is called "sense" if its sequence is the same as that of a messenger RNA copy that is translated into protein. The sequence on the opposite strand is called the "antisense" sequence. Both sense and antisense sequences can exist on different parts of the same strand of DNA (i.e. both strands contain both sense and antisense sequences). In both prokaryotes and eukaryotes, antisense RNA sequences are produced, but the functions of these RNAs are not entirely clear. One proposal is that antisense RNAs are involved in regulating gene expression through RNA-RNA base pairing. A few DNA sequences in prokaryotes and eukaryotes, and more in plasmids and viruses, blur the distinction between sense and antisense strands by having overlapping genes. In these cases, some DNA sequences do double duty, encoding one protein when read along one strand, and a second protein when read in the opposite direction along the other strand. In bacteria, this overlap may be involved in the regulation of gene transcription, while in viruses, overlapping genes increase the amount of information that can be encoded within the small viral genome.

Page 2 of 16

Super coiling DNA super coil DNA can be twisted like a rope in a process called DNA super coiling. With DNA in its "relaxed" state, a strand usually circles the axis of the double helix once every 10.4 base pairs, but if the DNA is twisted the strands become more tightly or more loosely wound .If the DNA is twisted in the direction of the helix, this is positive super coiling, and the bases are held more tightly together. If they are twisted in the opposite direction, this is negative super coiling, and the bases come apart more easily. In nature, most DNA has slight negative super coiling that is introduced by enzymes called topoisomerases. These enzymes are also needed to relieve the twisting stresses introduced into DNA strands during processes such as transcription and DNA replication.

Importance of Base Pairing

Deoxy-ribonucleic Acid, DNA as we know it to be, has a double helical structure and is made up of small molecules

called nucleotides. (DNA can in other words be referred to as a nucleotide polymer.)Each nucleotide comprises a 5-

carbon sugar, a nitrogenous base and a phosphate group. These are joined together to form a ''twisted-ladder''.

One complete turn of the twisted ladder (i.e the double helical structure), which could be likened to the pitch of a

screw, has an approx. length of 3.4nm and a 2nm width. This is achieved chiefly by the complementarities of the

base pairing that occurs with the bases in DNA. (Remember, Guanine always bonds to Cytosine with 3 hydrogen

bonds and Adenine to Thymine with two hydrogen bonds!). In RNA however, the base thymine is replaced with

Uracil. This means that the base that is complementary to Adenine on a DNA strand , would be Uracil on the RNA

strand. The main significance of base pairing is to enable the maintenance of the gene pool within a species. You

will greatly appreciate that the DNA sequence that codes for a certain protein must be constant so as to make sure

that it codes for that same protein when it is Translated. For example, if the base sequence for the coding of the

hormone insulin were to change, the body would not be able to produce normal insulin and this could result in a

condition such as diabetes! This is the significance of base pairing; to maintain genetic stability and thus the

genetic characteristics that pass on to the offspring of a species.

RNA

Page 3 of 16

Ribonucleic acid (RNA) is a biologically important type of molecule that consists of a long chain of nucleotide units.

Each nucleotide consists of a nitrogenous base, a ribose sugar, and a phosphate. RNA is very similar to DNA, but

differs in a few important structural details: in the cell, RNA is usually single-stranded, while DNA is usually double-

stranded; RNA nucleotides contain ribose while DNA contains deoxyribose (a type of ribose that lacks one oxygen

atom); and RNA has the base uracil rather than thymine that is present in DNA. RNA is transcribed from DNA by

enzymes called RNA polymerases and is generally further processed by other enzymes. RNA is central to protein

synthesis. Here, a type of RNA called messenger RNA carries information from DNA to structures called ribosomes.

These ribosomes are made from proteins and ribosomal RNAs, which come together to form a molecular machine

that can read messenger RNAs and translate the information they carry into proteins. There are many RNAs with

other roles – in particular regulating which genes are expressed, but also as the genomes of most viruses.

STRUCTRE

Each nucleotide in RNA contains a ribose sugar, with carbons numbered 1' through 5'. A base is attached to the 1'

position, generally adenine (A), cytosine (C), guanine (G) or uracil (U). Adenine and guanine are purines, cytosine

and uracil are pyrimidines. A phosphate group is attached to the 3' position of one ribose and the 5' position of the

next. The phosphate groups have a negative charge each at physiological pH, making RNA a charged molecule

(polyanion). The bases may form hydrogen bonds between cytosine and guanine, between adenine and uracil and

between guanine and uracil. However other interactions are possible, such as a group of adenine bases binding to

each other in a bulge, or the GNRA tetraloop that has a guanine–adenine base-pair.

Chemical structure of RNA

An important structural feature of RNA that distinguishes it from DNA is the presence of a hydroxyl group at the 2'

position of the ribose sugar. The presence of this functional group causes the helix to adopt the A-form geometry

rather than the B-form most commonly observed in DNA.This results in a very deep and narrow major groove and

a shallow and wide minor groove. A second consequence of the presence of the 2'-hydroxyl group is that in

conformationally flexible regions of an RNA molecule (that is, not involved in formation of a double helix), it can

chemically attack the adjacent phosphodiester bond to cleave the backbone.

b) explain how DNA replicates semi-conservatively during interphase

Replication is the process by which a cell copies its DNA prior to division. In humans, for example, each

parent cell must copy its entire six billion base pairs of DNA before undergoing mitosis. The molecular

details of DNA replication are described elsewhere, and they were not known until some time after

Watson and Crick's discovery. In fact, before such details could be determined, scientists were faced with

a more fundamental research concern. Specifically, they wanted to know the overall nature of the process

by which DNA replication occurs.

Watson and Crick themselves had specific ideas about DNA replication, and these ideas were based on

the structure of the DNA molecule. In particular, the duo hypothesized that replication occurs in a

"semiconservative" fashion. According to the semiconservative replication model, which is illustrated in

Figure 1, the two original DNA strands (i.e., the two complementary halves of the double helix) separate

during replication; each strand then serves as a template for a new DNA strand, which means that each

newly synthesized double helix is a combination of one old (or original) and one new DNA strand.

Conceptually, semiconservative replication made sense in light of the double helix structural model of

DNA, in particular its complementary nature and the fact that adenine always pairs with thymine and

cytosine always pairs with guanine. Looking at this model, it is easy to imagine that during replication,

Page 4 of 16

each strand serves as a template for the synthesis of a new strand, with complementary bases being

added in the order indicated.

Semiconservative replication was not the only model of DNA replication proposed during the mid-1950s,

however. In fact, two other prominent hypotheses were put also forth: conservative replication and

dispersive replication. According to the conservative replication model, the entire original DNA double

helix serves as a template for a new double helix, such that each round of cell division produces one

daughter cell with a completely new DNA double helix and another daughter cell with a completely intact

old (or original) DNA double helix. On the other hand, in the dispersive replication model, the original DNA

double helix breaks apart into fragments, and each fragment then serves as a template for a new DNA

fragment. As a result, every cell division produces two cells with varying amounts of old and new DNA

Figure 1: Three proposed models of replication are conservative replication, dispersive replication, and

semiconservative replication.

Making Predictions Based on the Models

When these three models were first proposed, scientists had few clues about what might be occurring at

the molecular level during DNA replication. Fortunately, the models yielded different predictions about the

distribution of old versus new DNA in newly divided cells, no matter what the underlying molecular

mechanisms. These predictions were as follows:

According to the semiconservative model, after one round of replication, every new DNA double helix

would be a hybrid that consisted of one strand of old DNA bound to one strand of newly synthesized

DNA. Then, during the second round of replication, the hybrids would separate, and each strand would

pair with a newly synthesized strand. Afterward, only half of the new DNA double helices would be

hybrids; the other half would be completely new. Every subsequent round of replication therefore would

result in fewer hybrids and more completely new double helices.

According to the conservative model, after one round of replication, half of the new DNA double helices

would be composed of completely old, or original, DNA, and the other half would be completely new.

Then, during the second round of replication, each double helix would be copied in its entirety. Afterward,

one-quarter of the double helices would be completely old, and three-quarters would be completely new.

Thus, each subsequent round of replication would result in a greater proportion of completely new DNA

double helices, while the number of completely original DNA double helices would remain constant.

Page 5 of 16

According to the dispersive model, every round of replication would result in hybrids, or DNA double

helices that are part original DNA and part new DNA. Each subsequent round of replication would then

produce double helices with greater amounts of new DNA.

(c) state that a gene is a sequence of nucleotides as part of a DNA

molecule, which codes for a polypeptide.

A gene is a unit of heredity in a living organism. It is normally a stretch of DNA that codes for a type of

protein or for an RNA chain that has a function in the organism. All living things depend on genes, as they

specify all proteins and functional RNA chains. Genes hold the information to build and maintain an

organism's cells and pass genetic traits to offspring. A modern working definition of a gene is "a locatable

region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory

regions, transcribed regions, and or other functional sequence regions ".Colloquial usage of the term

gene (e.g. "good genes, "hair color gene") may actually refer to an allele: a gene is the basic instruction, a

sequence of nucleic acid (DNA or, in the case of certain viruses RNA), while an allele is one variant of

that instruction. Thus, when the mainstream press refers to "having" a "gene" for a specific trait, the press

is wrong. All people would have the gene in question, but certain people will have a specific allele of that

gene, which results in the trait.

The notion of a gene is evolving with the science of genetics, which began when Gregor Mendel noticed

that biological variations are inherited from parent organisms as specific, discrete traits. The biological

entity responsible for defining traits was later termed a gene, but the biological basis for inheritance

remained unknown until DNA was identified as the genetic material in the 1940s. All organisms have

many genes corresponding to many different biological traits, some of which are immediately visible, such

as eye color or number of limbs, and some of which are not, such as blood type or increased risk for

specific diseases, or the thousands of basic biochemical processes that comprise life.

The chemical structure of a four-base fragment of a DNA double helix.

The vast majority of living organisms encode their genes in long strands of DNA. DNA (deoxyribonucleic

acid) consists of a chain made from four types of nucleotide subunits, each composed of: a five-carbon

sugar (2'-deoxyribose), a phosphate group, and one of the four bases adenine, cytosine, guanine, and

thymine. The most common form of DNA in a cell is in a double helix structure, in which two individual

DNA strands twist around each other in a right-handed spiral. In this structure, the base pairing rules

specify that guanine pairs with cytosine and adenine pairs with thymine. The base pairing between

guanine and cytosine forms three hydrogen bonds, whereas the base pairing between adenine and

thymine forms two hydrogen bonds. The two strands in a double helix must therefore be complementary,

that is, their bases must align such that the adenines of one strand are paired with the thymines of the

other strand, and so on.

Due to the chemical composition of the pentose residues of the bases, DNA strands have directionality.

One end of a DNA polymer contains an exposed hydroxyl group on the deoxyribose; this is known as the

3' end of the molecule. The other end contains an exposed phosphate group; this is the 5' end. The

directionality of DNA is vitally important to many cellular processes, since double helices are necessarily

directional (a strand running 5'-3' pairs with a complementary strand running 3'-5'), and processes such

as DNA replication occur in only one direction. All nucleic acid synthesis in a cell occurs in the 5'-3'

direction, because new monomers are added via a dehydration reaction that uses the exposed 3'

hydroxyl as a nucleophile.

Page 6 of 16

The expression of genes encoded in DNA begins by transcribing the gene into RNA, a second type of

nucleic acid that is very similar to DNA, but whose monomers contain the sugar ribose rather than

deoxyribose. RNA also contains the base uracil in place of thymine. RNA molecules are less stable than

DNA and are typically single-stranded. Genes that encode proteins are composed of a series of three-

nucleotide sequences called codons, which serve as the words in the genetic language. The genetic code

specifies the correspondence during protein translation between codons and amino acids. The genetic

code is nearly the same for all known organisms.

This stylistic diagram shows a gene in relation to the double helix structure of DNA and to a chromosome

(right). The chromosome is X-shaped because it is dividing. Introns are regions often found in eukaryote

genes that are removed in the splicing process (after the DNA is transcribed into RNA): Only the exons

encode the protein. This diagram labels a region of only 50 or so bases as a gene. In reality, most genes

are hundreds of times larger.

(d) describe the way in which the nucleotide sequence codes for the amino acid sequence in a polypeptide with reference to the nucleotide sequence for HbA (normal) and HbS (sickle cell) alleles of the gene for the β-haemoglobin polypeptide

The genetic code is the set of rules by which information encoded in genetic material (DNA or mRNA

sequences) is translated into proteins (amino acid sequences) by living cells. The code defines a

mapping between tri-nucleotide sequences, called codons, and amino acids. With some exceptions, a

triplet codon in a nucleic acid sequence specifies a single amino acid. Because the vast majority of genes

are encoded with exactly the same code (see the RNA codon table), this particular code is often referred

to as the canonical or standard genetic code, or simply the genetic code, though in fact there are many

variant codes. For example, protein synthesis in human mitochondria relies on a genetic code that differs

from the standard genetic code.

Not all genetic information is stored using the genetic code. All organisms' DNA contains regulatory

sequences, intergenic segments, and chromosomal structural areas that can contribute greatly to

phenotype. Those elements operate under sets of rules that are distinct from the codon-to-amino acid

paradigm underlying the genetic code.

Page 7 of 16

A series of codons in part of a mRNA molecule. Each codon consists of three nucleotides, usually

representing a single amino acid.

Transfer of information via the genetic code

The genome of an organism is inscribed in DNA, or in the case of some viruses, RNA. The portion of the

genome that codes for a protein or an RNA is referred to as a gene. Those genes that code for proteins

are composed of tri-nucleotide units called codons, each coding for a single amino acid. Each nucleotide

sub-unit consists of a phosphate, deoxyribose sugar and one of the 4 nitrogenous nucleobases. The

purine bases adenine (A) and guanine (G) are larger and consist of two aromatic rings. The pyrimidine

bases cytosine (C) and thymine (T) are smaller and consist of only one aromatic ring. In the double-helix

configuration, two strands of DNA are joined to each other by hydrogen bonds in an arrangement known

as base pairing. These bonds almost always form between an adenine base on one strand and a thymine

on the other strand and between a cytosine base on one strand and a guanine base on the other. This

means that the number of A and T residues will be the same in a given double helix, as will the number of

G and C residues. In RNA, thymine (T) is replaced by uracil (U), and the deoxyribose is substituted by

ribose.

Each protein-coding gene is transcribed into a template molecule of the related polymer RNA, known as

messenger RNA or mRNA. This, in turn, is translated on the ribosome into an amino acid chain or

polypeptide. The process of translation requires transfer RNAs specific for individual amino acids with the

amino acids covalently attached to them, guanosine triphosphate as an energy source, and a number of

translation factors. tRNAs have anticodons complementary to the codons in mRNA and can be "charged"

covalently with amino acids at their 3' terminal CCA ends. Individual tRNAs are charged with specific

amino acids by enzymes known as aminoacyl tRNA synthetases, which have high specificity for both their

cognate amino acids and tRNAs. The high specificity of these enzymes is a major reason why the fidelity

of protein translation is maintained.

There are 4³ = 64 different codon combinations possible with a triplet codon of three nucleotides; all 64

codons are assigned for either amino acids or stop signals during translation. If, for example, an RNA

sequence, UUUAAACCC is considered and the reading frame starts with the first U (by convention, 5' to

3'), there are three codons, namely, UUU, AAA and CCC, each of which specifies one amino acid. This

RNA sequence will be translated into an amino acid sequence, three amino acids long. A comparison

may be made with computer science, where the codon is similar to a word, which is the standard "chunk"

Page 8 of 16

The nucleotide sequence for HbS(SICKLE CELL)

Sickle-cell gene mutation probably arose spontaneously in different geographic areas, as suggested by

restriction endonuclease analysis. These variants are known as Cameroon, Senegal, Benin, Bantu and

Saudi-Asian. Their clinical importance springs from the fact that some of them are associated with higher

HbF levels, e.g., Senegal and Saudi-Asian variants, and tend to have milder disease.

In people heterozygous for HgbS (carriers of sickling haemoglobin), the polymerisation problems are

minor, because the normal allele is able to produce over 50% of the haemoglobin. In people homozygous

for HgbS, the presence of long-chain polymers of HbS distort the shape of the red blood cell from a

smooth donut-like shape to ragged and full of spikes, making it fragile and susceptible to breaking within

capillaries. Carriers have symptoms only if they are deprived of oxygen (for example, while climbing a

mountain) or while severely dehydrated. Under normal circumstances, these painful crises occur about

0.8 times per year per patient.[citation needed] The sickle-cell disease occurs when the seventh amino

acid (if the initial methionine is counted), glutamic acid, is replaced by valine to change its structure and

function.

The gene defect is a known mutation of a single nucleotide (see single-nucleotide polymorphism - SNP)

(A to T) of the β-globin gene, which results in glutamate being substituted by valine at position 6.

Haemoglobin S with this mutation are referred to as HbS, as opposed to the normal adult HbA. The

genetic disorder is due to the mutation of a single nucleotide, from a GAG to GTG codon mutation. This is

normally a benign mutation, causing no apparent effects on the secondary, tertiary, or quaternary

structure of haemoglobin in conditions of normal oxygen concentration. What it does allow for, under

conditions of low oxygen concentration, is the polymerization of the HbS itself. The deoxy form of

haemoglobin exposes a hydrophobic patch on the protein between the E and F helices. The hydrophobic

residues of the valine at position 6 of the beta chain in haemoglobin are able to associate with the

hydrophobic patch, causing haemoglobin S molecules to aggregate and form fibrous precipitates.

The allele responsible for sickle-cell anaemia is autosomal recessive and can be found on the short arm

of chromosome 11. A person that receives the defective gene from both father and mother develops the

disease; a person that receives one defective and one healthy allele remains healthy, but can pass on the

disease and is known as a carrier. If two parents who are carriers have a child, there is a 1-in-4 chance of

their child's developing the disease and a 1-in-2 chance of their child's being just a carrier. Since the gene

is incompletely recessive, carriers can produce a few sickled red blood cells, not enough to cause

symptoms, but enough to give resistance to malaria. Because of this, heterozygotes have a higher fitness

than either of the homozygotes. This is known as heterozygote advantage.

Due to the adaptive advantage of the heterozygote, the disease is still prevalent, especially among

people with recent ancestry in malaria-stricken areas, such as Africa, the Mediterranean, India and the

Middle East. Malaria was historically endemic to southern Europe, but it was declared eradicated in the

mid-20th century, with the exception of rare sporadic cases.

Inheritance

Sickle-cell conditions are inherited from parents in much the same way as blood type, hair colour and

texture, eye colour, and other physical traits.

The types of haemoglobin a person makes in the red blood cells depend on what haemoglobin genes are

inherited from his parents.

Page 9 of 16

If one parent has sickle-cell anaemia (SS) and the other has sickle-cell trait (AS), there is a 50% chance

of a child's having sickle-cell disease (SS) and a 50% chance of a child's having sickle-cell trait (AS).

When both parents have sickle-cell trait (AS), a child has a 25% chance (1 of 4) of sickle-cell disease

(SS), as shown in the diagram.

(e) describe how the information on DNA is used during transcription and translation to construct polypeptides, including the role of messenger RNA (mRNA), transfer RNA (tRNA) and

the ribosome

TRANSCRIPTION

ranscription is the process of creating an equivalent RNA copy of a sequence of DNA[1]. Both RNA and

DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be

converted back and forth from DNA to RNA in the presence of the correct enzymes. During transcription,

a DNA sequence is read by RNA polymerase, which produces a complementary, antiparallel RNA strand.

As opposed to DNA replication, transcription results in an RNA complement that includes uracil (U) in all

instances where thymine (T) would have occurred in a DNA complement.

Transcription is the first step leading to gene expression. The stretch of DNA transcribed into an RNA

molecule is called a transcription unit and encodes at least one gene. If the gene transcribed encodes for

a protein, the result of transcription is messenger RNA (mRNA), which will then be used to create that

protein via the process of translation. Alternatively, the transcribed gene may encode for either ribosomal

RNA (rRNA) or transfer RNA (tRNA), other components of the protein-assembly process, or other

ribozymes.

A DNA transcription unit encoding for a protein contains not only the sequence that will eventually be

directly translated into the protein (the coding sequence) but also regulatory sequences that direct and

regulate the synthesis of that protein. The regulatory sequence before (upstream from) the coding

sequence is called the five prime untranslated region (5'UTR), and the sequence following (downstream

from) the coding sequence is called the three prime untranslated region (3'UTR).

Page 10 of 16

Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls

for copying DNA; therefore, transcription has a lower copying fidelity than DNA replication.[2]

As in DNA replication, DNA is read from 3' → 5' during transcription. Meanwhile, the complementary RNA

is created from the 5' → 3' direction. Although DNA is arranged as two antiparallel strands in a double

helix, only one of the two DNA strands, called the template strand, is used for transcription. This is

because RNA is only single-stranded, as opposed to double-stranded DNA. The other DNA strand is

called the coding strand, because its sequence is the same as the newly created RNA transcript (except

for the substitution of uracil for thymine). The use of only the 3' → 5' strand eliminates the need for the

Okazaki fragments seen in DNA replication.

Transcription is divided into 5 stages: pre-initiation, initiation, promoter clearance, elongation and

termination.

Major steps

Pre-initiation

In eukaryotes, RNA polymerase, and therefore the initiation of transcription, requires the presence of a

core promoter sequence in the DNA. Promoters are regions of DNA which promote transcription and in

eukaryotes, are found at -30, -75 and -90 base pairs upstream from the start site of transcription. Core

promoters are sequences within the promoter which are essential for transcription initiation. RNA

polymerase is able to bind to core promoters in the presence of various specific transcription factors.

The most common type of core promoter in eukaryotes is a short DNA sequence known as a TATA box,

found -30 base pairs from the start site of transcription. The TATA box, as a core promoter, is the binding

site for a transcription factor known as TATA binding protein (TBP), which is itself a subunit of another

transcription factor, called Transcription Factor II D (TFIID). After TFIID binds to the TATA box via the

TBP, five more transcription factors and RNA polymerase combine around the TATA box in a series of

stages to form a preinitiation complex. One transcription factor, DNA helicase, has helicase activity and

so is involved in the separating of opposing strands of double-stranded DNA to provide access to a

single-stranded DNA template. However, only a low, or basal, rate of transcription is driven by the

preinitiation complex alone. Other proteins known as activators and repressors, along with any associated

coactivators or corepressors, are responsible for modulating transcription rate.

The transcription preinitiation in archaea is essentially homologous to that of eukaryotes, but is much less

complex.[3] The archaeal preinitiation complex assembles at a TATA-box binding site; however, in

archaea, this complex is composed of only RNA polymerase II, TBP, and TFB (the archaeal homologue

of eukaryotic transcription factor II B (TFIIB)).[4][5]

Initiation

Simple diagram of transcription initiation. RNAP = RNA polymerase

In bacteria, transcription begins with the binding of RNA polymerase to the promoter in DNA. RNA

polymerase is a core enzyme consisting of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω

subunit. At the start of initiation, the core enzyme is associated with a sigma factor (number 70) that aids

in finding the appropriate -35 and -10 base pairs downstream of promoter sequences.

Transcription initiation is more complex in eukaryotes. Eukaryotic RNA polymerase does not directly

recognize the core promoter sequences. Instead, a collection of proteins called transcription factors

Page 11 of 16

mediate the binding of RNA polymerase and the initiation of transcription. Only after certain transcription

factors are attached to the promoter does the RNA polymerase bind to it. The completed assembly of

transcription factors and RNA polymerase bind to the promoter, forming a transcription initiation complex.

Transcription in the archaea domain is similar to transcription in eukaryotes.[6]

Promoter clearance

After the first bond is synthesized, the RNA polymerase must clear the promoter. During this time there is

a tendency to release the RNA transcript and produce truncated transcripts. This is called abortive

initiation and is common for both eukaryotes and prokaroytes[7]. Abortive initiation continues to occur

until the σ factor rearranges, resulting in the transcription elongation complex (which gives a 35 bp

moving footprint). The σ factor is released before 80 nucleotides of mRNA are synthesized[8]. Once the

transcript reaches approximately 23 nucleotides, it no longer slips and elongation can occur. This, like

most of the remainder of transcription, is an energy-dependent process, consuming adenosine

triphosphate (ATP).

Promoter clearance coincides with phosphorylation of serine 5 on the carboxy terminal domain of RNA

Pol in eukaryotes, which is phosphorylated by TFIIH.

Elongation

Simple diagram of transcription elongation

One strand of DNA, the template strand (or noncoding strand), is used as a template for RNA synthesis.

As transcription proceeds, RNA polymerase traverses the template strand and uses base pairing

complementarity with the DNA template to create an RNA copy. Although RNA polymerase traverses the

template strand from 3' → 5', the coding (non-template) strand and newly-formed RNA can also be used

as reference points, so transcription can be described as occurring 5' → 3'. This produces an RNA

molecule from 5' → 3', an exact copy of the coding strand (except that thymines are replaced with uracils,

and the nucleotides are composed of a ribose (5-carbon) sugar where DNA has deoxyribose (one less

oxygen atom) in its sugar-phosphate backbone).

Unlike DNA replication, mRNA transcription can involve multiple RNA polymerases on a single DNA

template and multiple rounds of transcription (amplification of particular mRNA), so many mRNA

molecules can be rapidly produced from a single copy of a gene.

Elongation also involves a proofreading mechanism that can replace incorrectly incorporated bases. In

eukaryotes, this may correspond with short pauses during transcription that allow appropriate RNA editing

factors to bind. These pauses may be intrinsic to the RNA polymerase or due to chromatin structure.

Termination

Simple diagram of transcription termination

Bacteria use two different strategies for transcription termination. In Rho-independent transcription

termination, RNA transcription stops when the newly synthesized RNA molecule forms a G-C rich hairpin

loop followed by a run of Us, which makes it detach from the DNA template. In the "Rho-dependent" type

of termination, a protein factor called "Rho" destabilizes the interaction between the template and the

mRNA, thus releasing the newly synthesized mRNA from the elongation complex.

Page 12 of 16

Transcription termination in eukaryotes is less understood but involves cleavage of the new transcript

followed by template-independent addition of As at its new 3' end, in a process called polyadenylation

Simple diagram of transcription initiation. RNAP = RNA polymerase(figure-1)

Simple diagram of transcription elongation(f-2)

Simple diagram of transcription termination(f-3)

Translation

Translation is the first stage of protein biosynthesis (part of the overall process of gene expression). In

translation, messenger RNA (mRNA) produced by transcription is decoded by the ribosome to produce a

specific amino acid chain, or polypeptide, that will later fold into an active protein. Translation occurs in

the cell's cytoplasm, where the large and small subunits of the ribosome are located, and bind to the

mRNA. The ribosome facilitates decoding by inducing the binding of tRNAs with complementary

anticodon sequences to that of the mRNA. The tRNAs carry specific amino acids that are chained

together into a polypeptide as the mRNA passes through and is "read" by the ribosome in a fashion

reminiscent to that of a stock ticker and ticker tape.

A ribosome translating a protein that is secreted into the endoplasmic reticulum. tRNAs are colored dark

blue.

In many instances, the entire ribosome/mRNA complex will bind to the outer membrane of the rough

endoplasmic reticulum and release the nascent protein polypeptide inside for later vesicle transport and

secretion outside of the cell. Many types of transcribed RNA, such as transfer RNA, ribosomal RNA, and

small nuclear RNA, do not undergo translation into proteins.

Page 13 of 16

Translation proceeds in four phases: activation, initiation, elongation and termination (all describing the

growth of the amino acid chain, or polypeptide that is the product of translation). Amino acids are brought

to ribosomes and assembled into proteins.

In activation, the correct amino acid is covalently bonded to the correct transfer RNA (tRNA). The amino

acid is joined by its carboxyl group to the 3' OH of the tRNA by a peptide bond. When the tRNA has an

amino acid linked to it, it is termed "charged". Initiation involves the small subunit of the ribosome binding

to 5' end of mRNA with the help of initiation factors (IF). Termination of the polypeptide happens when the

A site of the ribosome faces a stop codon (UAA, UAG, or UGA). No tRNA can recognize or bind to this

codon. Instead, the stop codon induces the binding of a release factor protein that prompts the

disassembly of the entire ribosome/mRNA complex.

A number of antibiotics act by inhibiting translation; these include anisomycin, cycloheximide,

chloramphenicol, tetracycline, streptomycin, erythromycin, and puromycin, among others. Prokaryotic

ribosomes have a different structure from that of eukaryotic ribosomes, and thus antibiotics can

specifically target bacterial infections without any detriment to a eukaryotic host's cells.

Basic mechanisms

Further information: Prokaryotic translation and Eukaryotic translation

Diagram showing the translation of mRNA and the synthesis of proteins by a ribosome.

Diagram showing the translation of mRNA and the synthesis of proteins by a ribosome.

The mRNA carries genetic information encoded as a ribonucleotide sequence from the chromosomes to

the ribosomes. The ribonucleotides are "read" by translational machinery in a sequence of nucleotide

triplets called codons. Each of those triplets codes for a specific amino acid.

The ribosome molecules translate this code to a specific sequence of amino acids. The ribosome is a

multisubunit structure containing rRNA and proteins. It is the "factory" where amino acids are assembled

into proteins. tRNAs are small noncoding RNA chains (74-93 nucleotides) that transport amino acids to

the ribosome. tRNAs have a site for amino acid attachment, and a site called an anticodon. The

anticodon is an RNA triplet complementary to the mRNA triplet that codes for their cargo amino acid.

Page 14 of 16

Aminoacyl tRNA synthetase (an enzyme) catalyzes the bonding between specific tRNAs and the amino

acids that their anticodons sequences call for. The product of this reaction is an aminoacyl-tRNA

molecule. This aminoacyl-tRNA travels inside the ribosome, where mRNA codons are matched through

complementary base pairing to specific tRNA anticodons. The amino acids that the tRNAs carry are then

used to assemble a protein. The energy required for translation of proteins is significant. For a protein

containing n amino acids, the number of high-energy Phosphate bonds required to translate it is

4n+1[citation needed]. The rate of translation varies; it is significantly higher in prokaryotic cells (up to 17-

21 amino acid residues per second) than in eukaryotic cells (up to 6-9 amino acid residues per second)

[1]

Genetic code

Whereas other aspects such as the 3D structure, called tertiary structure, of protein can only be predicted

using sophisticated algorithms, the amino acid sequence, called primary structure, can be determined

solely from the nucleic acid sequence with the aid of a translation table.

This approach may not give the correct amino acid composition of the protein, in particular if

unconventional amino acids such as selenocysteine are incorporated into the protein, which is coded for

by a conventional stop codon in combination with a downstream hairpin (SElenoCysteine Insertion

Sequence, or SECIS).

There are many computer programs capable of translating a DNA/RNA sequence into a protein

sequence. Normally this is performed using the Standard Genetic Code; many bioinformaticians have

written at least one such program at some point in their education. However, few programs can handle all

the "special" cases, such as the use of the alternative initiation codons. For example, the rare alternative

start codon CTG codes for Methionine when used as a start codon, and for Leucine in all other positions.

Example: Condensed translation table for the Standard Genetic Code (from the NCBI Taxonomy

(f) explain that, as enzymes are proteins, their synthesis is controlled by

DNA

Synthesis

Main article: Protein biosynthesis

The DNA sequence of a gene encodes the amino acid sequence of a protein.

Proteins are assembled from amino acids using information encoded in genes. Each protein has its own

unique amino acid sequence that is specified by the nucleotide sequence of the gene encoding this

protein. The genetic code is a set of three-nucleotide sets called codons and each three-nucleotide

combination designates an amino acid, for example AUG (adenine-uracil-guanine) is the code for

methionine. Because DNA contains four nucleotides, the total number of possible codons is 64; hence,

there is some redundancy in the genetic code, with some amino acids specified by more than one

codon.[12] Genes encoded in DNA are first transcribed into pre-messenger RNA (mRNA) by proteins

such as RNA polymerase. Most organisms then process the pre-mRNA (also known as a primary

transcript) using various forms of post-transcriptional modification to form the mature mRNA, which is

then used as a template for protein synthesis by the ribosome. In prokaryotes the mRNA may either be

used as soon as it is produced, or be bound by a ribosome after having moved away from the nucleoid. In

contrast, eukaryotes make mRNA in the cell nucleus and then translocate it across the nuclear

Page 15 of 16

membrane into the cytoplasm, where protein synthesis then takes place. The rate of protein synthesis is

higher in prokaryotes than eukaryotes and can reach up to 20 amino acids per second.[13]

The process of synthesizing a protein from an mRNA template is known as translation. The mRNA is

loaded onto the ribosome and is read three nucleotides at a time by matching each codon to its base

pairing anticodon located on a transfer RNA molecule, which carries the amino acid corresponding to the

codon it recognizes. The enzyme aminoacyl tRNA synthetase "charges" the tRNA molecules with the

correct amino acids. The growing polypeptide is often termed the nascent chain. Proteins are always

biosynthesized from N-terminus to C-terminus.[12]

The DNA sequence of a gene encodes the amino acid sequence of a protein.

Chemical structure of the peptide bond (left) and a peptide bond between leucine and threonine (right).

The size of a synthesized protein can be measured by the number of amino acids it contains and by its

total molecular mass, which is normally reported in units of daltons (synonymous with atomic mass units),

or the derivative unit kilodalton (kDa). Yeast proteins are on average 466 amino acids long and 53 kDa in

mass.[11] The largest known proteins are the titins, a component of the muscle sarcomere, with a

molecular mass of almost 3,000 kDa and a total length of almost 27,000 amino acids.[14]

Resonance structures of the peptide bond that links individual amino acids to form a protein polymer.

Chemical synthesis

Short proteins can also be synthesized chemically by a family of methods known as peptide synthesis,

which rely on organic synthesis techniques such as chemical ligation to produce peptides in high

Page 16 of 16

yield.[15] Chemical synthesis allows for the introduction of non-natural amino acids into polypeptide

chains, such as attachment of fluorescent probes to amino acid side chains.[16] These methods are

useful in laboratory biochemistry and cell biology, though generally not for commercial applications.

Chemical synthesis is inefficient for polypeptides longer than about 300 amino acids, and the synthesized

proteins may not readily assume their native tertiary structure. Most chemical synthesis methods proceed

from C-terminus to N-terminus, opposite the biological reaction.[17]