LSM4242 Protein Engineering
Lecture 1 – Prokaryotic Expression Systems
Example of recombinant insulin:
Digestion of plasmid with restriction endonucleases
Insertion of insulin gene into plasmid through ligation
Transformation into host cell where recombinant plasmid will be propagated.
Use blue-white assay for screening:
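Host strain carries the lacZ deletion mutant (lacZΔM15), which encodes the
ω-peptide; on its own this fragment is inactive.
Transformed strain carries the α-peptide on the plasmid, which complements the
ω-peptide to give fully functional β-galactosidase; this turns X-gal blue in the
presence of IPTG. An insert in the cloning site disrupts the α-peptide, so
recombinant colonies stay white.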
Expression systems are based on the insertion of the gene of interest into a host cell for its
translation and expression into protein.
Every protein is unique and no single strategy works every time, thus different
systems have to be tried.
Expression systems include bacteria, yeast, insect and mammalian cells.
Prokaryotic expression system
Advantages:
Low cost
Simple and well characterized physiology
Short time required for growth and expression
Large yield of products
Elements needed for expression which are often in a vector:
Promoters – control point for transcription
Strong promoters have high affinity for RNA polymerase and are
frequently transcribed
Regulatable promoters can be controlled by inducers/co-repressors
E.g. lac promoter.
o Negative and inducible regulation where repressor binds to
operator and inducer (allolactose/IPTG) binds to repressor
to inactivate it.
o Positive regulation where high concentrations of cAMP will
bind to CAP during low glucose conditions, activating the
promoter.
E.g. lacUV5 promoter.
o CAP binding site is deleted and the -10 sequence is optimised.
E.g. Trp promoter
o Gene is inactive when repressor + co-repressor (Trp) is
present.
E.g. tac (trc) promoter
o Hybrid of lac and trp promoters where the -35 region is from
trp and the -10 region is from lac. Inducible by IPTG and has
about 3 times the strength of trp promoter and 11 times the
strength of the lac promoter.
E.g. λ pL promoter
o Regulated by the λ repressor which is encoded by the cI gene.
Often the cI857 mutant is used as it encodes a
temperature-sensitive repressor. At 28 °C, it represses the λ
promoter strongly, but at 42 °C, the protein denatures, which
allows expression.
E.g. Phage T7 promoter
o Requires T7 RNA polymerase for expression. Often the T7
RNA polymerase gene is placed under control of the lac
promoter, and the gene of interest is placed under control of the T7
gene 10 promoter, which ensures tighter control.
o In BL21(DE3) strain of E. coli, the gene for T7 RNA
polymerase is integrated into its genome under the control of
Plac and thus inducible by IPTG.
Ribosome binding sites (RBS) – necessary for translation
Shine-Dalgarno sequence in prokaryotes, which is 6-8 bp in length and often
about 10 bp upstream of the AUG start codon. The sequence is complementary to the 3’
end region of the 16S rRNA in the small ribosomal subunit.
The stronger the binding between mRNA and 16S rRNA, the greater
the efficiency of translation initiation.
Antibiotic resistance and tags – for selection, purification and detection.
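The Shine-Dalgarno description above can be illustrated with a minimal sketch that scans upstream of each ATG for a stretch resembling the core consensus AGGAGG. The spacer window (4 to 12 nt), the scoring by identical positions and the function names are assumptions made for illustration only; they are not taken from the lecture.

```python
import re

SD_CONSENSUS = "AGGAGG"  # core Shine-Dalgarno consensus, written as DNA


def score_sd(window):
    """Count positions in the window that are identical to the consensus."""
    return sum(1 for a, b in zip(window, SD_CONSENSUS) if a == b)


def find_rbs(seq, min_spacer=4, max_spacer=12, min_score=5):
    """For every ATG, report the best SD-like stretch found upstream.

    spacer = number of bases between the end of the SD stretch and the ATG.
    The spacer range and min_score are illustrative assumptions.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for m in re.finditer("(?=ATG)", seq):
        atg = m.start()
        best = None
        for spacer in range(min_spacer, max_spacer + 1):
            sd_start = atg - spacer - len(SD_CONSENSUS)
            if sd_start < 0:
                continue
            window = seq[sd_start:sd_start + len(SD_CONSENSUS)]
            s = score_sd(window)
            if s >= min_score and (best is None or s > best[2]):
                best = (sd_start, spacer, s)
        if best:
            hits.append({"atg": atg, "sd_start": best[0],
                         "spacer": best[1], "score": best[2]})
    return hits


if __name__ == "__main__":
    print(find_rbs("TTTAGGAGGTTTTTTTATGAAA"))
```

A higher score here only means closer identity to the consensus; it is a rough stand-in for the strength of mRNA/16S rRNA pairing, not a measure of translation efficiency.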
Examples of popular plasmid vectors
pUC family: ColE1 origin of replication which has high copy number, drug
resistance marker is β-lactamase (ampR), has lacZ for blue/white selection. Plac
and Olac are used to control the gene expression. Gene insertion site (polylinker
region – pLink) contains many restriction enzyme sites.
pET family: widely used due to strong selectivity of T7 RNA polymerase to
promoter, high activity of polymerase, and high translation efficiency. Has
ColE1 origin of replication and ampR resistance marker. The main differences from
pUC are the T7 promoter and the presence of the lacI gene, which codes for the lac
repressor.
Two main problems associated with production of recombinant proteins are
degradation/insolubility and purity. Solution is to use a cleavable fusion protein system
which attaches target protein to a stable cellular protein.
Results in a protein that is more soluble and more resistant to proteases. E.g. a
thioredoxin (11.7 kDa) fusion can stay soluble even when it makes up 40% of total
protein.
For fusion protein, stop codon of stable cellular protein must be removed and
reading frame of fusion protein must be contiguous.
Example would be to fuse with ompF (Porin – forms pores that allow passive
diffusion of small molecules across the outer membrane) upstream for secretion or
lacZ downstream for stability.
Host proteins fused to the protein of interest act as tags, which allows for
purification in one step.
Affinity chromatography - a small molecule (e.g. streptavidin/biotin) or
antigen is covalently linked to the column matrix so that the fusion protein binds
to the column.
Insert a protease cleavage sequence between fusion partner and gene of interest to
isolate your desired protein.
Examples of proteases: factor Xa, enterokinase (serine protease) and TEV
protease.
The pTrcHis system uses lacIq which produces
more lacR than normal, the tac(trc) promoter
which is stronger, lacO which allows for IPTG
regulation, “His tag” region for purification via
metal chelation and an enterokinase (EK)
cleavage recognition sequence downstream of the
“His tag”.
Presence of His residues may prevent
normal protein function, so the tag can be
cleaved away by the endopeptidase at the EK site.
NH2-(His)6-(Asp)4-Lys-(Protein of interest)
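The tag layout above can be sketched in code as a one-letter amino acid string, with cleavage after the Lys of the (Asp)4-Lys enterokinase site. The function names and the example protein sequence are hypothetical and used only for illustration.

```python
EK_SITE = "DDDDK"  # (Asp)4-Lys; enterokinase cuts after the Lys


def build_fusion(protein, n_his=6):
    """Assemble NH2-(His)n-(Asp)4-Lys-(protein) as a one-letter string."""
    return "H" * n_his + EK_SITE + protein


def cleave_enterokinase(fusion):
    """Split the fusion after the first EK site.

    Returns (tag_fragment, released_protein); tag_fragment is None when
    no EK site is present, and the input is returned unchanged.
    """
    idx = fusion.find(EK_SITE)
    if idx == -1:
        return None, fusion
    cut = idx + len(EK_SITE)
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    fusion = build_fusion("MKTAYIAK")  # hypothetical protein of interest
    print(fusion)
    print(cleave_enterokinase(fusion))
```

Because the cut falls directly after the Lys, the released protein carries no residues from the tag, which is why the EK site is placed immediately upstream of the protein of interest.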
PinPoint Xa system is used for the
production and purification of fusion
proteins that are biotinylated in vivo.
Gene of interest is fused to a
biotinylated lysine tag followed by
a Xa protease recognition site.
Purification can be carried out on
an avidin/streptavidin resin
followed by proteolytic cleavage.
Disadvantages of prokaryotic expression: the protein often does not fold properly
and there is a lack of post-translational modifications. Toxicity can be managed by
using inducible promoters, while insoluble proteins can be expressed with a highly
soluble partner such as glutathione-S-transferase (GST) or maltose binding protein
(MBP) to improve solubility.
Lecture 2 – Eukaryotic Expression Systems
Problems with regard to stability and activity would arise when eukaryotic proteins are
expressed in prokaryotic cells mainly because of the absence of PTM.
E.g. correct disulfide bond formation, proteolytic cleavage of inactive precursor,
glycosylation, alteration of amino acids such as phosphorylation and acetylation.
Genetic features in eukaryotic expression vectors include
selectable markers (e.g. AmpR), eukaryotic promoters, mRNA
polyA signal, origin of replication (orieuk) if plasmid based
and chromosomal DNA segment for homologous
recombination into host chromosome.
Yeast Expression Systems
Advantages:
Single cell
Well characterized genetically
Strong promoters available
Natural plasmid (2µ)
Post-translational modification
Secretes a few proteins normally
Generally recognized as safe (GRAS) organism according to FDA.
Three types of expression vectors:
Episomal – most widely used but unstable in large scale cultures (>10L).
Strategy is to alter growth conditions to try and stabilize the episomal vector
by using mutant host strains requiring specific nutrients (e.g. Leu, Trp, His)
Integrating – generally stable but only has 1 copy inserted into chromosome.
Tandem arrays of the gene can be inserted, which increases expression but also
instability.
E.g. Use AOX1 gene sequence to flank gene of interest and selection
marker, where integration into yeast chromosome will be done
through homologous recombination.
Yeast artificial chromosome (YAC) – designed to clone large pieces of DNA
(100 to 1500kb) and thus is generally not stable and not used for protein
production.
Yeast promoters:
Disadvantages:
Instability of plasmids in scale-up especially for episomal vectors.
Over-glycosylation of glycoproteins which may alter protein activity
100+ mannose residues vs. 8-13 normally
The solution to these problems is to use other yeasts such as Pichia
pastoris or Candida sp. rather than the usual S. cerevisiae, or to use other eukaryotic cells.
Current applications of yeasts include:
Production of human glycoproteins by deleting yeast glycosylation genes
and adding human ones.
Yeast as cell factories by reconstructing heterologous biosynthesis pathways
to produce complex natural products such as isoprenoids and sterols
through the mevalonate pathway and flavonoids, stilbenes and opioids
through amino acid biosynthesis pathways.
Higher eukaryotic cell expression systems – often needed for production of therapeutic
proteins where correct PTMs are needed e.g. erythropoietin, interleukin-2, mAbs, growth
hormones etc.
Generalized mammalian expression vectors
often include the following:
Origin of replication which is derived
from animal virus (SV40)
Promoters derived from animal
viruses or highly expressed
mammalian genes
Selectable markers e.g. methionine sulfoximine (MSX) inhibits glutamine
synthetase and thus cells cannot make endogenous glutamine, transfection
with a vector encoding Glutamine Synthetase ensures cell survival in culture.
Translation control elements on mRNA in order from 5’ to 3’ end:
Kozak sequence CCRCCAUGG
Signal sequence for secretion
Affinity tag for purification
Proteolytic cleavage site
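As a small illustration of the Kozak sequence listed above, the following sketch searches a transcript for CCRCCAUGG (R = A or G) and reports where the embedded start codon lies. The function name is an assumption, and real start sites often deviate from the exact consensus, so an exact-match search like this is only a first pass.

```python
import re

# CCRCCAUGG with R = purine (A or G); written as DNA, U is converted to T.
# A lookahead is used so that overlapping matches are not skipped.
KOZAK = re.compile(r"(?=CC[AG]CC(ATG)G)")


def find_kozak(seq):
    """Return 0-based positions of the ATG inside each exact Kozak match."""
    seq = seq.upper().replace("U", "T")
    return [m.start(1) for m in KOZAK.finditer(seq)]


if __name__ == "__main__":
    print(find_kozak("GGGCCACCATGGCT"))
    print(find_kozak("ccgccaugg"))
```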
Some therapeutic proteins are composed of two chains such as insulin and thus we
can express both subunits at the same time in stoichiometric amounts through:
Two vector expression system – clone target genes into 2 vectors using 2
different selectable markers and co-transfect them into cells.
Issues: can lose one of the plasmids, different copy numbers and
different promoter strength.
Two gene expression vector (double cassette vectors) – put both genes into
one plasmid where each gene is a separate transcription unit with own
promoter and polyA region.
Issues: may not get same amount of protein due to differences in
transcription and translation.
Dicistronic vectors – gene expression of both genes controlled by a single
promoter, thus sharing the same transcription unit. However this requires
an internal ribosome entry site which is derived from mammalian virus
genomes.
Insect cell expression systems are based on baculovirus, which exclusively infects
invertebrates.
Baculovirus infects the insect cell and, later in infection, produces polyhedrin protein which
traps many virions in a stable polyhedron package. Upon ingestion by insect host,
polyhedrin protein will be broken down and virions will be released, starting the
infection.
Since polyhedrin promoter is strong, by replacing the polyhedrin coding sequence
with gene of interest, lots of protein can be expressed within 36-48 hours
post-infection.
Virus utilized is often the Autographa californica multiple nucleopolyhedrovirus (AcMNPV),
which is able to infect over 30 insect species, and the cell line used is Sf9, derived
from Spodoptera frugiperda (fall armyworm moth).
Transfer vector is designed by flanking with AcMNPV sequences at both 5’ and 3’
ends and introduced into genome by homologous recombination.
To improve this process, linearise the AcMNPV prior to transfection to increase
frequency of recombination due to crossing over.
Flow chart: transform gene of interest into bacteria → amplify the recombinant
bacterial plasmid → transfect using baculovirus into insect cell line and obtain
recombinant virus particles → for higher expression, infect in new insect cell line →
harvest protein.
Disadvantages of this method would be that it is expensive and production is not
continuous since baculovirus kills the host.
GATEWAY cloning technology is used to transfer DNA fragments between plasmids using a set
of recombination sequences and enzymes. Goal is to move the gene from one vector
backbone to another.
Site-specific recombination mediated by phage λ recombination proteins. Specific
and directional which requires two different combinations of enzymes.
The att sites contain binding sites for proteins that mediate recombination, and
the reaction is mediated by integrase and integration host factor (IHF) for the
BP reaction and additionally excisionase (Xis) for the LR reaction.
When integration occurs, two new sites are created which flank the integrated
prophage with no loss of DNA sequence.
Advantages: entry clone can be easily sub-cloned into wide variety of destination
vectors, thus minimize planning, eliminate cloning and maximize compatibility and
flexibility.
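The directionality of the two Gateway reactions described above can be summarised as a lookup table: attB x attP gives attL + attR (BP), and attL x attR gives attB + attP (LR). The sketch below is only bookkeeping for which sites and enzymes go together; the dictionary layout and function name are assumptions for illustration.

```python
# Site and enzyme bookkeeping for the two reactions; "Int" = integrase.
REACTIONS = {
    "BP": {"substrates": ("attB", "attP"),
           "products": ("attL", "attR"),
           "enzymes": ("Int", "IHF")},
    "LR": {"substrates": ("attL", "attR"),
           "products": ("attB", "attP"),
           "enzymes": ("Int", "IHF", "Xis")},
}


def recombine(site_a, site_b):
    """Return (reaction name, product sites, enzymes) for a pair of att sites.

    Raises ValueError for pairs that neither reaction accepts.
    """
    pair = frozenset((site_a, site_b))
    for name, rxn in REACTIONS.items():
        if pair == frozenset(rxn["substrates"]):
            return name, rxn["products"], rxn["enzymes"]
    raise ValueError(f"no Gateway reaction for {site_a} x {site_b}")


if __name__ == "__main__":
    print(recombine("attB", "attP"))
    print(recombine("attL", "attR"))
```

Each reaction regenerates the substrates of the other, which is what lets an insert be moved between vector backbones in either direction.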
Lecture 3 – In vitro translation and Site-directed Mutagenesis
Wide variety of applications:
Rapid identification of gene products
Localization of mutations via synthesis of truncated gene products
Incorporation of modified or unnatural amino acids for functional studies
Protein folding studies by using chaperones
Advantages over in vivo expression when:
Product is toxic to host cell
Product is insoluble or forms inclusion bodies
Protein undergoes rapid proteolytic degradation by intracellular proteases
Standard translation systems require purified RNA as a template for translation. If DNA is
used, then transcription and translation are coupled.
They generally contain all the macromolecular components required for translation
such as 70S (bacteria)/80S (eukaryotic) ribosomes, tRNAs, aminoacyl-tRNA
synthetases, initiation, elongation and termination factors supplemented with amino
acids, energy sources and energy regenerating systems.
In an Eppendorf tube containing the system, we can add either DNA or RNA to
kick-start the transcription or translation process (~1 h each at 30 °C). Product can be
quantified by incorporating 35S-labelled methionine and performing SDS-PAGE followed by
autoradiography, since other proteins would also be present in the system.
Examples of standard translation systems are:
Rabbit reticulocyte lysate – highly efficient in vitro eukaryotic protein synthesis
system used for translation of exogenous RNAs. In vivo, reticulocytes are highly
specialized cells primarily responsible for the synthesis of hemoglobin, which
represents more than 90% of the protein made in the reticulocyte. These immature
red cells have already lost their nuclei, but contain adequate mRNA, as well as
complete translation machinery, for extensive globin synthesis. They are often
treated with nuclease to reduce background and increase efficient utilization of
exogenous RNAs.
Wheat germ extract – has low background incorporation due to its low level of
endogenous mRNA. Recommended for translation of RNA containing small
fragments of double-stranded RNA or oxidized thiols, which are inhibitory to the
rabbit reticulocyte lysate.
E. coli cell-free system – simple translational apparatus with less complicated
control at the initiation level, allowing this system to be very efficient in protein
synthesis. Bacterial extracts are often unsuitable for translation of RNA, because
exogenous RNA is rapidly degraded by endogenous nucleases. However, E. coli
extracts are ideal for coupled transcription-translation from DNA templates.
Linked transcription-translation:
Transcription with bacteriophage polymerase in prokaryotic system followed by
translation in eukaryotic system.
Coupled transcription-translation:
Simultaneously in E. coli extract, one-step reaction in vitro which results in efficient
expression of either prokaryotic or eukaryotic gene products.
Important elements in the template for translation – eukaryotic systems require a 7-methyl-GTP 5’ cap, 3’
poly A tail and Kozak sequence while prokaryotic systems require a Shine-Dalgarno sequence (RBS).
Site-directed mutagenesis (SDM) – alteration of amino acids at a given position.
Conventional mutagenesis is random, results in multiple possible mutations where
most are detrimental.
SDM allows for the characterization of the dynamic and complex relationships
between protein structure and function, study of gene expression elements and
vector modification.
Requires:
DNA sequence to determine which codons to alter.
3D structure of protein or bioinformatics to determine which amino acids
are candidates for modification in relation to active site, protein stability and
regulatory elements.
Oligonucleotide directed mutagenesis requires templates:
M13 ssDNA – single stranded bacteriophage, but double stranded in replicative form.
Use an oligonucleotide complementary to the target region except at the
desired codon, where it carries the mutation.
Theoretically, half of phage should be mutants,
but in reality only 1-5% is mutated due to DNA
repair mechanisms.
Improved method for M13 vector – grow M13 vector
in mutant E. coli carrying two mutations:
dut – defective dUTPase which elevates
intracellular level of dUTP, resulting in
some being incorporated in DNA.
ung – defective uracil N-glycosylase
which prevents uracil from being
removed in DNA.
Thus ~1% of U exists in DNA which
lowers the possibility of DNA repair at
the mutagenesis site.
When transformed back into wild-type E.
coli (ung+), the uracil-containing parental strand is degraded
while the mutated strand, which contains no uracil, is not.
Note that it is inconvenient to work with
M13 phage as it requires many steps.
Plasmid dsDNA – preferred method for
mutagenesis as it is specific and quicker, any kind of DNA is usable.
Introduce mutagenic oligonucleotide in
PCR to generate mutants
In 2-step overlap PCR, first reaction can
be used to introduce the mutation in
two halves where the second PCR can
be used to get a clonable mutated gene.
QuikChange II site-directed mutagenesis – Commercial
product which allows for mutagenesis in any double
stranded plasmid.
Quick three-step procedure which has a
mutagenesis efficiency of greater than 80% in a single reaction.
Requires:
2 synthetic oligonucleotide primers containing desired mutation.
High-fidelity (HF) DNA polymerase which extends primer with highest
fidelity.
DpnI endonuclease which digests the parental (methylated) DNA template.
Note that dam- strains are not suitable for preparing the template, as DpnI only digests methylated DNA.
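The requirement for two complementary primers carrying the mutation can be illustrated with a minimal sketch that replaces one codon and returns the forward primer and its reverse complement. The flank length, the in-frame template assumption and the function names are illustrative assumptions; primer length and melting temperature guidelines of the commercial kit are not modelled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(template, codon_index, new_codon, flank=15):
    """Return (forward, reverse) mutagenic primers for one codon change.

    codon_index is 0-based and counts codons from the first base of the
    template, which is assumed to be in frame. The mutated codon sits in
    the middle, flanked by up to `flank` unchanged bases on each side.
    """
    template = template.upper()
    new_codon = new_codon.upper()
    start = codon_index * 3
    if len(new_codon) != 3 or start + 3 > len(template):
        raise ValueError("codon out of range or not three bases long")
    left = template[max(0, start - flank):start]
    right = template[start + 3:start + 3 + flank]
    forward = left + new_codon + right
    return forward, revcomp(forward)


if __name__ == "__main__":
    # Hypothetical template: ATG GCT AAA GGT GAA CTG CGT TAA
    print(design_primers("ATGGCTAAAGGTGAACTGCGTTAA", 2, "GCA", flank=6))
```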
Lecture 4 – Molecular Evolution
Enzymes are adapted to function optimally in living cells for the conversion of natural
substrates, metabolic control and rapid turnover.
Since they generally have limited stress tolerance, a natural variant may not perform
perfectly in an industrial process because of the distinct conditions and demands.
Thus laboratory evolution methods are required to fine-tune the selectivity and
activity of enzymes.
Differs from natural evolution as it is directed towards a functional goal –
think of breeding.
Have been successfully applied in:
Protein/ligand binding – antibody detection, stronger ligand binding
Improving protein stability – heat and solvent stability
Modifying enzyme selectivity – accepting other substrates
Rate of evolution of a single gene can indeed be accelerated under in vitro selective
pressure through the generation of a new and more efficient functional variant of the same
gene. General approach: construct library of variant genes and screen/select the protein
products of the genes.
Library construction method:
Random mutagenesis – introduce changes throughout the gene, useful if you
don’t know what mutation to use.
Use physical (UV) or chemical mutagens (ROS)
Error prone PCR - add Mn2+ and biased concentrations of dNTPs
along with error prone DNA polymerases (e.g. Mutazyme or Taq).
Mutator strains containing error-prone DNA polymerases
Directed methods – randomize only at a specific position, useful if you know
the area of interest.
Recombination (chimeric) methods – bring existing sequence diversity
together in novel combinations, either from point mutants or from different
parental DNA sequences, results in overall structural change.
DNA shuffling – use DNaseI digestion on dsDNA to generate small
fragments which act as overlapping primers where they are
randomly annealed to obtain full-length DNA. Factors involved:
♦ Similarity of genes selected
♦ Size of DNA fragments
♦ Annealing temperature
Staggered extension process (StEP) where small segments are added
to the end of a growing DNA strand in a series of very short extension
steps (extension time is varied)
Random chimeragenesis on transient templates (RACHITT) which
produce chimeras with a much larger number of crossovers.
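The difference between random mutagenesis and recombination methods listed above can be shown with a toy in-silico sketch: one function introduces point substitutions at a chosen per-base rate (a stand-in for error-prone PCR), the other builds a chimera by switching between equal-length parents at random crossover points (a stand-in for shuffling or StEP). The rates, the crossover model and all names are assumptions; no real polymerase or annealing behaviour is modelled.

```python
import random

BASES = "ACGT"


def error_prone_copy(seq, rate, rng):
    """Copy seq, substituting each base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # A substitution always changes the base to a different one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def shuffle_parents(parents, n_crossovers, rng):
    """Build one chimera from equal-length parents.

    Crossover points are drawn at random; each segment between two
    points is copied from a randomly chosen parent.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must have equal length in this sketch")
    points = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + points + [length]
    segments = [rng.choice(parents)[start:end]
                for start, end in zip(bounds, bounds[1:])]
    return "".join(segments)


if __name__ == "__main__":
    rng = random.Random(0)
    print(error_prone_copy("ACGTACGTACGTACGTACGT", 0.1, rng))
    print(shuffle_parents(["A" * 20, "C" * 20, "G" * 20], 3, rng))
```

Point mutagenesis explores sequences close to one parent, while the chimera function recombines whole blocks from several parents in a single step.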
Example of DNA shuffling:
4 genes from 4 microbial species encoding class C cephalosporinases which are 58-
82% identical at DNA level were shuffled either individually or as a pool where the
transformants with resultant libraries were screened for antibiotic moxalactam
resistance.
Single gene shuffling only resulted in 8-fold increased resistance while multi-
gene shuffling resulted in 270-540 fold increase in resistance.
By mixing several gene sequences, the enzyme structure can change
which may prove to be more effective than changing sequences on a
single gene.
In viruses, multiple parental sequences of the viral glycoprotein used in a viral vector
can be shuffled to obtain vectors with altered tropism.
Directed evolution – majority of reported experiments are a combination of error-prone
PCR and DNA shuffling.
E.g. GFP gene was amplified by ep-PCR and cut into 50-300bp pieces. They were
then assembled by second PCR without primer (random annealing) and the
products were cloned into an expression vector.
Selection of resulting clones was done by FACS and the most fluorescent cells
(> 100 fold) were amplified, sorted, characterized and sequenced.
Resulted in the creation of eBFP, eCFP, eGFP, eYFP and dsRED.
E.g. Engineering of P450 BM3 from Bacillus megaterium to metabolize hydrocarbons
Medium chain fatty acid monooxygenase heme enzyme.
Single polypeptide chain containing hydroxylase domain and reductase
domain.
Upon mutagenic PCR, StEP and ep-PCR, the mutant obtained was able to use
other hydrocarbons as substrate with higher maximum turnover rate as
compared to the wild type.
General strategy for large scale analysis of protein function:
DNA library → express proteins → select/screen proteins for desired
function → isolate DNA and select DNA sequence → amplify or mutate for
improvements → repeat
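The general strategy above is an iterative loop, which can be sketched as a toy simulation: make a mutant library from the current best sequence, screen every member, keep the best and repeat. The screen here is simply identity to a hidden target string, a purely hypothetical stand-in for a functional assay; library size, mutation rate and names are likewise assumptions.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Replace each base with a random base with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b
                   for b in seq)


def screen(seq, target):
    """Toy screen: number of positions identical to the target."""
    return sum(a == b for a, b in zip(seq, target))


def evolve(parent, target, rounds=10, library_size=50, rate=0.05, seed=0):
    """Run mutate -> screen -> select for a fixed number of rounds."""
    rng = random.Random(seed)
    best = parent
    history = [screen(best, target)]
    for _ in range(rounds):
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        library.append(best)  # keep the parent so the score never drops
        best = max(library, key=lambda s: screen(s, target))
        history.append(screen(best, target))
    return best, history


if __name__ == "__main__":
    best, history = evolve("A" * 30, "ACGT" * 7 + "AC")
    print(best)
    print(history)
```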
Lecture 5 – Display Technology
Genetic material is physically associated with the proteins for selection/screening in library
and thus the success of display selection relies on the ability to retrieve the genetic
information along with the functional protein.
In vivo display – based on M13 and phagemid-based cloning system
Mutated gene is inserted into M13 g3p (pilus) or g8p (surface coat protein) gene to
form C-terminal fusion protein, and then transformed into E. coli.
By fusing proteins to the pIII (>50 aa) or pVIII (6-30 aa) gene, they can be displayed on
the phage. Note that this only works for small proteins/peptides; bigger proteins
would interfere with phage assembly.
In screening the phage library assuming they express a scFv coupled to a HA tag,
incubate them in an antigen coated well.
Eluted phages can then be used for enrichment by infecting E. coli again to generate a
secondary library. Use ELISA to test for binding affinity.
Single-Chain Fv (variable fragment) is the favoured form to be displayed.
In construction of antibody libraries, DNA sequences encoding VH and VL domains
are amplified by PCR and paired randomly. The scFv sequences are amplified by PCR
using primers incorporating restriction sites and then cloned into the phagemid
vector on pIII.
In vitro display – cell free display system through formation of stable protein-ribosome-
mRNA (PRM) complexes.
In this system, stable PRM complexes and correct folding of protein need to be
established.
Stop codon can be removed to ensure that protein does not dissociate from the
ribosome.
Advantages:
Larger screening capacities (10^14/mL) - probably the smallest system since
ribosomes are used.
No limit on transformation efficiency
PCR products can be utilized
Able to handle toxic, proteolytically sensitive and modified amino acids
which may not be possible in vivo.
Can be used for the improvement of stability and activity of proteins, antibody
engineering and generation of new multidomain/multifunctional proteins.
Lecture 6 – Antibodies and siRNA
Polyclonal antibodies are those collected from serum of exposed animals which can
recognize multiple antigenic sites of injected biochemical.
Monoclonal antibodies (mAbs) are secreted by cloned and cultured individual B lymphocyte
hybridomas and collected from culture media; they are only able to recognize one
antigenic site of injected biochemical.
Hybridoma technology is a method for producing large numbers of identical antibodies
(also called monoclonal antibodies).
This process starts by injecting a mouse (or other mammal) with an antigen that
provokes an immune response. B cells will produce antibodies that bind to the
injected antigen and these newly produced cells are then harvested.
These isolated B cells are in turn fused with immortal B cell cancer cells, a myeloma
to produce a hybrid cell line called a hybridoma, which has both the antibody-
producing ability of the B-cell and the exaggerated longevity and reproductivity of
the myeloma.
B cells are HGPRT+ (HGPRT plays a central role in the generation of purine
nucleotides through the purine salvage pathway) while tumor cells are HGPRT- and
thus cannot utilize the salvage pathway.
When grown in a HAT (hypoxanthine, aminopterin and thymidine) medium which
inhibits de novo synthesis of nucleotides, myeloma cells cannot switch over to
the salvage pathway and are killed due to lack of HGPRT.
Aminopterin inhibits the de novo pathway which is required for cell division
while hypoxanthine and thymidine provide the source for the salvage pathway, provided the right
enzymes are present.
In vitro culture would be less concentrated and contains bovine serum while ascites
fluid (from mouse) will contain high concentration with minor contamination of
mouse Ig.
The mAbs can be purified via affinity purification using epitope (the part of an
antigen molecule to which an antibody attaches itself).
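The HAT selection logic described above can be written as a small truth table: a cell keeps growing only if it has HGPRT (so it can use the salvage pathway while aminopterin blocks de novo synthesis) and is immortal. That unfused B cells drop out because of their limited lifespan in culture is stated here as an assumption added for completeness; the names are illustrative.

```python
# (HGPRT positive, immortal in culture) for each cell type after fusion.
CELL_TYPES = {
    "unfused B cell": (True, False),   # HGPRT+, but limited lifespan
    "unfused myeloma": (False, True),  # immortal, but HGPRT-
    "hybridoma": (True, True),         # inherits both properties
}


def survives_hat(hgprt_positive, immortal):
    """Long-term growth in HAT needs the salvage pathway and immortality."""
    return hgprt_positive and immortal


def hat_survivors(cells=CELL_TYPES):
    """Names of the cell types expected to grow out in HAT medium."""
    return [name for name, (hgprt, immortal) in cells.items()
            if survives_hat(hgprt, immortal)]


if __name__ == "__main__":
    print(hat_survivors())
```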
ScFv fragments are linked by a C to N terminal linker peptide to stabilize them. Their size
and specificity may allow for attachment to cryptic sites. Can be selected through phage
display technology.
Testing of mAb production by enzyme linked immunosorbent assay (ELISA)
Direct sandwich method for testing antigen: antibody + antigen + enzyme-linked
antibody
Indirect method for testing antiserum: antigen + antibody + enzyme-linked antibody
With enzyme-linked antibody, substrate is added and reaction produces a visible
colour change.
PNPP (p-Nitrophenyl Phosphate, Disodium Salt) is a widely used substrate
for detecting alkaline phosphatase in ELISA applications.
TMB (3,3',5,5'-tetramethylbenzidine) soluble substrates yield a blue color
when detecting HRP.
Monoclonal antibodies can be used for:
Protein purification in affinity chromatography
Identification and isolation of cell subpopulations using FACS – fluorescent
antibodies bind to surface markers on cells
Tumor detection, imaging and killing – select antibodies from phage display library
using antigen from cancer patient serum.
E.g. mAbs can be tagged with radioactive tracer which when injected into the
body will localize in areas of recurrent carcinoma cells.
Molecular drugs – against TNF-α (septic shock), CD3 (transplants e.g. anti-CD3 mAbs
such as Muromonab which eliminates graft vs. host reaction), IL-2R
(leukaemia/lymphoma), anti-venoms (snake bites), viruses (infections)
E.g. Metastatic breast cancer patients overexpress HER-2 (EGF) receptor and
thus anti-HER-2 mAbs can be used to block EGF binding to HER-2, which slows
cell growth.
E.g. fuse toxin to antibody to generate immunotoxin, only the targeted
tumour cells are killed by the toxin.
Problems arising in mAb production:
Repeated immunisation required in mice, lengthy procedure which involves
recovery of B cells and generation of hybridomas.
Humans can generate immune response to mouse antibodies.
Human mAb production will require large blood volumes/excised lymph nodes and
thus ethical difficulties, cell lines are also often unstable.
Thus instead of producing human antibodies, we produce chimeric human-mouse
antibodies which can function the same way.
Xenomouse – genetically engineered mouse where murine IgH and IgK loci
are replaced with human Ig counterparts.
Method works as human Ig transgenes carry majority of the variable
repertoire and can undergo class switching from IgM to IgG isotypes.
Antisense oligonucleotides can be used to selectively suppress and shut down target mRNA
through:
When directed to terminus of 5’ UTR, no initiation due to prevention of ribosome
binding.
When directed to downstream of 5’ UTR, no translation occurs
When oligomer directed to splice site, no splicing occurs.
When oligomer directed to critical region of RNA domain of ribonucleoprotein (e.g.
telomerase), it inhibits the activity.
E.g. PKC is a central enzyme in tumor progression. The use of ISIS3521 specifically
reduces the expression of a protein kinase C (PKC) isozyme.
Disadvantages include:
Limited efficacy
Poor specificity
Platelet toxicity
Overall down regulation of gene expression resulting in increased cell
invasiveness.
Solution lies in siRNAs (small interfering RNAs), which are duplexes of 19-21 nucleotide
RNAs with symmetric 2 nucleotide 3’ overhangs.
Their precursors are encoded in the genome and transcribed as RNA; siRNAs are produced
when DICER (an endonuclease) cuts this longer RNA into short pieces.
Binds to mRNA in the RISC complex which causes mRNA degradation, thus silencing
the gene.
Been shown to be able to suppress expression of GFP in oocytes.
Been shown to be able to fight HIV virus by silencing the gag gene which encodes for
an essential HIV core protein p24.
Guidelines for siRNA design (note that 50% of them give >50% silencing while 25% give >
70% silencing, thus need to test with at least 4 to be sure):
1. Find occurrences within mRNA with “AA” dinucleotide overhangs
2. Capture following 19 nucleotides
3. GC content should be 30-50%
4. BLAST search to find sequences with low homology to other genes
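Steps 1 to 3 of the guidelines above can be sketched as a simple scan: find each "AA" dinucleotide, take the following 19 nucleotides and keep the candidate if its GC content lies between 30% and 50%. Step 4 (the BLAST search) is not modelled. The function names and the example sequence are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def sirna_candidates(mrna, length=19, gc_min=0.30, gc_max=0.50):
    """Return (AA position, target sequence, GC fraction) for each candidate.

    A candidate is the `length` nucleotides directly after an 'AA'
    dinucleotide, kept only if its GC fraction is within the limits.
    """
    mrna = mrna.upper().replace("U", "T")
    out = []
    # The last valid AA start must leave room for AA plus the target.
    for i in range(len(mrna) - length - 1):
        if mrna[i:i + 2] == "AA":
            target = mrna[i + 2:i + 2 + length]
            gc = gc_fraction(target)
            if gc_min <= gc <= gc_max:
                out.append((i, target, round(gc, 2)))
    return out


if __name__ == "__main__":
    # Hypothetical sequence used only to exercise the function.
    print(sirna_candidates("GGAAGCATGCATTAGCTAATCGATT"))
```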
Methods to produce siRNAs:
In vitro:
Chemical synthesis
In vitro transcription
RNase III/DICER digestion of long dsRNA
In vivo:
Plasmids, PCR Templates (siRNA expression cassettes), viral vectors.
Consist of promoter, siRNA template (hairpin) consisting of sense +
antisense strand and termination signal (3-5Ts or polyA signal)
Usage of plasmid vectors eliminates the need to work with RNA and is able to
produce large quantities; however, it requires cloning and is thus troublesome.
Usage of siRNA expression cassettes involves three-step PCR method and
thus skips cloning, however requires time to optimize the PCR conditions.
Lecture 7 – Genome Editing
Genome editing is the introduction of targeted genomic sequence changes including
targeted deletions, insertions and precise sequence changes into living cells and organisms.
First step is to create a DNA double-stranded break (DSB) which can be repaired by
non-homologous end joining (NHEJ) or homology-directed repair (HDR).
Systems for inducing DSB initially relied on protein based systems with
customizable DNA-binding specificities such as zinc finger nucleases (ZFNs)
and transcription activator-like effector nucleases (TALENs)
Both use protein-DNA
interactions for targeting and
have extended DNA recognition
sequences (14 to 40bp)
The construction of engineered
zinc finger array is difficult
while the highly repetitive nature of TALEN-coding sequences is also a problem for
delivery using viral vectors.
More recently, the bacterial CRISPR-associated protein 9 (Cas9) nuclease from Streptococcus
pyogenes, which is an RNA-guided nuclease, has been developed as an editing tool.
Use simple base-pairing rules between engineered RNA and target DNA site.
CRISPR systems are adaptable immune mechanisms used by bacteria to protect
themselves from foreign nucleic acids such as viruses or plasmids.
Requires either a crRNA/tracrRNA hybrid or a single gRNA bound to the Cas9 protein,
which recognizes a target site adjacent to a PAM (NGG) sequence and cleaves it.
Cas9 has two nuclease domains: the HNH domain cleaves the DNA strand that is
complementary to the gRNA, and the RuvC domain cleaves the other strand.
Cas9 has been shown to be able to insert/delete base pairs, insert/replace
sequences, and delete/rearrange sequences.
By inactivating its nuclease activity, catalytically dead Cas9 (dCas9) can be used for:
Gene activation by fusing it to an activation domain
DNA modification by fusing it to an effector domain
Imaging of a genomic locus by fusing it to GFP
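The PAM requirement can be illustrated with a short search for candidate Cas9 target sites. This is a sketch under stated assumptions: a 20-nt protospacer immediately 5' of the NGG PAM is assumed (the length is not given in these notes), the function names are made up, and positions on the "-" strand are reported relative to the reverse complement.

```python
import re


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cas9_sites(dna, protospacer_len=20):
    """List (strand, start, protospacer, PAM) for every N{20}-NGG match.

    Hypothetical helper; a lookahead is used so overlapping sites are found.
    """
    dna = dna.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for m in pattern.finditer(seq):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites


# Illustrative sequence only
for site in find_cas9_sites("A" * 20 + "TGG"):
    print(site)
```

Every site found this way is only a candidate; it would still need to be checked for off-target matches elsewhere in the genome.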
Parameters to evaluate genome editing tool:
Targeting efficiency: % of desired mutation achieved
Cas9 (>70% in zebrafish) performs much better than TALENs and ZFNs
(1-50% in human cells)
Off-target mutations – likely to appear at sites that differ from the target sequence
by only a few nucleotides, as long as they are adjacent to a PAM sequence.
Cas9 can tolerate up to 5 base mismatches within the protospacer region or
a single base difference in the PAM sequence.
Off-target mutations are hard to detect as whole genome sequencing is
required.
To reduce off-target mutations:
Use a truncated gRNA or add two extra Gs at the 5’ end
Use paired nickases – two sgRNAs complementary to adjacent sites on
opposite strands. This induces a DSB in the target DNA, but only
creates single nicks at off-target locations, and thus gives minimal
off-target mutations.
Use web based tools to facilitate identification of potential CRISPR
target sites and assess their potential for off-target cleavage.
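The kind of check performed by such tools can be sketched as a mismatch count between the guide and every PAM-adjacent site. This is a simplified illustration, not how any particular tool works: the function names are made up, only the forward strand is scanned, only NGG PAMs are accepted, and the 5-mismatch cutoff is taken from the tolerance quoted above.

```python
def mismatches(guide, site):
    """Number of positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide, site))


def potential_off_targets(guide, genome, max_mismatches=5):
    """Return (position, site, PAM, mismatches) for imperfect NGG-adjacent matches.

    Hypothetical helper: perfect matches (the intended target) are skipped.
    """
    guide, genome = guide.upper(), genome.upper()
    n = len(guide)
    hits = []
    for i in range(len(genome) - n - 2):
        site = genome[i:i + n]
        pam = genome[i + n:i + n + 3]
        if pam[1:] == "GG":
            mm = mismatches(guide, site)
            if 0 < mm <= max_mismatches:
                hits.append((i, site, pam, mm))
    return hits


# Illustrative sequences only
guide = "ACGT" * 5
off_site = "TCGT" + "ACGT" * 3 + "ACGA"   # differs from the guide at 2 positions
genome = guide + "AGG" + off_site + "TGG"
print(potential_off_targets(guide, genome))
```

A real assessment would also scan the reverse strand and weight mismatches by their position in the protospacer.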
Current applications of CRISPR/Cas9:
Has already been used in many cell lines and organisms such as human, bacteria,
zebrafish, C. elegans, plants, Xenopus tropicalis, yeast, Drosophila, monkeys, rabbits,
pigs, rats and mice.
Single point mutations in a particular target gene via single gRNA.
Induce large deletions or rearrangements such as inversions or translocations using
a pair of gRNA-directed Cas9 nucleases.
Using dCas9 to target protein domains for transcriptional regulation, epigenetic
modification and visualization of genome loci.
Enables rapid genome-wide interrogation of gene function by generating large gRNA
libraries for genomic screening.
Lecture 8 – Structure-based Protein Design and Engineering
Protein engineering is needed to produce better catalysts for industry and better therapeutic agents.
We want to manipulate proteins in a controlled and rational way. Thus we need to know the
principles and mechanisms such as structure, folding and stability as well as catalysis before
using molecular biology to engineer them.
E.g. the dengue virus genome is translated as one long polyprotein chain which gets
cleaved by proteases. The genome encodes 10 proteins, of which 3 are structural proteins
(coat and RNA delivery) and the remaining 7 are non-structural proteins (production of
new viruses).
Structural studies reveal that NS3 protease has an intrinsically disordered
chymotrypsin fold which requires NS2B for correct folding and functional dynamics.
Solution conformations of NS2B and NS3 proteins show that they can be inhibited by
natural products from edible plants.
Proteins are made up of 20 natural amino acids which are hydrophobic, charged or polar.
Their functions are defined by their 3D structures, which are built from elements such as
random coil, alpha helix and beta sheet.
Protein folding is spontaneous and starts from the random coil state.
Anfinsen experiment on ribonuclease:
Adding urea and mercaptoethanol to ribonuclease results in an inactive enzyme
with reduced disulfide bonds.
Removing urea first allowed the protein to refold into its native state,
after which removal of mercaptoethanol allowed the correct disulfide
bonds to form.
Removing mercaptoethanol first caused the wrong disulfide bonds to form,
so the enzyme was inactive even after urea was subsequently removed.
This suggests that the amino acid sequence determines the folding of the
protein, which folds so as to minimise the Gibbs free energy of the whole system.
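The link between free energy and folding can be made concrete with a simple two-state (unfolded/folded) model, which is an assumption added here and not part of the lecture. If the free energy of folding is ΔG = G(folded) − G(unfolded), the equilibrium constant is K = exp(−ΔG/RT) and the fraction folded is K/(1 + K). The ΔG values below are illustrative, not measured.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def fraction_folded(delta_g_fold, temperature=298.15):
    """Fraction of molecules folded in a two-state model.

    delta_g_fold is G(folded) - G(unfolded) in J/mol; negative favours folding.
    """
    k_fold = math.exp(-delta_g_fold / (R * temperature))
    return k_fold / (1.0 + k_fold)


# Illustrative values only
for dg in (-20000.0, 0.0, 20000.0):
    print(dg, round(fraction_folded(dg), 4))
```

Even a modest negative ΔG of a few tens of kJ/mol is enough for almost all molecules to be folded at equilibrium.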
Folding models:
Framework model – secondary structures form first, then diffuse and pack into the
final structure with well-defined side chain packing.
Nucleation model – the most stable secondary structure forms first; folding starts at
this nucleation site and spreads throughout the protein.
Hydrophobic collapse & “molten globule” model – the chain first collapses into a more
compact state with hydrophobic side chains inside.
Folding funnels – there are many possible pathways to the native state, however there
will only be one global minimum.
Driving forces for protein folding:
Hydrophobic effect is the main driving force: hydrophobic side chains cluster and
exclude water, which releases the water cages that surround them in the unfolded state
and lowers the free energy.
Hydrogen bonds, electrostatic interactions (salt bridge) and chemical cross links
such as disulfides or metal ions also stabilize protein structure.
Some comments:
Are there any exact rules for H-bonds as the basis for folding? We can remove an H-bond
via SDM to test this, but there is still no clear answer because it is context dependent.
The entire system of H-bonds must be considered for the entire structure. Removing
one may cause the formation of another H-bond etc. Can we do a systems approach
for protein folding?
Introducing disulfide bonds may not always stabilize the protein. Protein stability
may be higher, but the protein may not fold into the desired conformation.
Each mutation does not just affect one area; it will affect many other factors such as
Van der Waals interactions, electrostatic interactions etc.
Misfolded proteins can insert into membranes by forming helices/loops, which may be
caused by the environment, and this may lead to toxicity.
Lecture 9 – Intrinsically Disordered Proteins and Protein-Protein Interactions
Many gene sequences in eukaryotic genomes encode entire proteins or large segments of
proteins that lack a well-structured three-dimensional fold.
Disordered regions can be highly conserved between species in both composition
and sequence.
Most are functional: many disordered segments fold on binding to their
biological targets (coupled folding and binding), whereas others constitute flexible
linkers that have a role in the assembly of macromolecular arrays
E.g. CREB-binding protein binding to CREB.
Well folded proteins have high complexity sequences but they only make up 50% of
all proteins. IDPs are hard to study as they get rapidly degraded by proteases.
Hydrophobicity and mean net charge can help to determine if a protein is
natively unfolded or folded.
IDPs are characterised by high net charge and low hydrophobicity.
IDPs function by coupled folding and binding: the IDP folds upon binding to a partner protein.
E.g. IDP binds to PAK4 and forms a helix structure.
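The net charge versus hydrophobicity criterion mentioned above can be sketched as a charge-hydropathy calculation. Several things here are assumptions not stated in these notes: hydropathy is taken from the Kyte-Doolittle scale rescaled to 0-1, His is treated as uncharged, and the boundary line R = 2.785·H − 1.151 is the commonly quoted charge-hydropathy boundary. Function names and test sequences are made up, and the result is only a rough indication.

```python
# Kyte-Doolittle hydropathy values
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled so each residue lies in 0-1."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute net charge per residue (K, R = +1; D, E = -1; His ignored)."""
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


def predicted_disordered(seq):
    """True if the sequence lies above the assumed charge-hydropathy boundary."""
    seq = seq.upper()
    boundary = 2.785 * mean_hydropathy(seq) - 1.151
    return mean_net_charge(seq) > boundary


# Artificial sequences for illustration only
print(predicted_disordered("EEEEEEEEEE"))  # highly charged, low hydropathy
print(predicted_disordered("LLLLLLLLLL"))  # uncharged, very hydrophobic
```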
Infectious prions can cause a change in secondary structure in normal prions upon contact.
Prion-like domains have low complexity sequences enriched in polar and uncharged
amino acids such as Gln, Asn, Ser, Gly and Tyr.
Generally involved in neurodegenerative diseases such as mad-cow disease and
Creutzfeldt-Jakob disease.
Liquid-liquid phase separation (LLPS) is the principle behind the formation of membrane-
less organelles.
ATP concentration is very high in the cell and thus it is suggested that ATP helps to
solubilize hydrophobic molecules in aqueous solutions, acting as a bivalent binder
(binding to different parts) and thus preventing the formation of protein aggregates.
Phase-separated droplets dissolve at high ATP concentrations.
Lack of ATP can also enhance aggregation.
Proteins can interact with other proteins, nucleic acids and small molecules at genome,
proteome and metabolome level.
Focus is at the proteome level where protein-protein interactions are examined
Scaffolding proteins are very important for integrating signals from upstream.
Driving forces for protein-protein interactions are the same as those for protein
folding
Protein interfaces are diverse in size, shape, composition and solvent content and thus no
reliable method is available so far to detect protein-protein interaction interfaces.
However there are two categories which are well studied – interactions between enzymes
and their protein substrates, and pure protein-protein interactions.
Protein-ligand interaction can be described by 3 theories:
Lock and key model – specificity of enzyme where substrate has
specific complementary shape that fits active site.
Induced fit hypothesis – some complexes have different
conformation from unbound state as the bound conformations are
induced by the binding partner.
Conformational selection and population shift – biomolecules exist in
dynamic ensembles of conformations; during binding, the pre-existing
conformers that are most complementary to the ligand are
preferentially bound. This disturbs the equilibrium, resulting in a
population shift towards the bound conformation such that equilibrium is
restored. Note that the population shift is induced by the ligand, whereas the bound conformation already exists before binding.
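The population shift can be illustrated with a minimal equilibrium model. Everything in it is an assumption made for illustration: two conformers A and B interconvert with K_conf = [B]/[A], the ligand binds only conformer B with dissociation constant K_d, and the free ligand concentration is held fixed. The numbers are arbitrary.

```python
def conformer_populations(k_conf, ligand, k_d):
    """Equilibrium fractions of A, free B and ligand-bound B (BL).

    Scheme: A <-> B (k_conf = [B]/[A]), B + L <-> BL (k_d = [B][L]/[BL]).
    ligand is the free ligand concentration in the same units as k_d.
    """
    weight_a = 1.0
    weight_b = k_conf
    weight_bl = k_conf * ligand / k_d
    total = weight_a + weight_b + weight_bl
    return {"A": weight_a / total, "B": weight_b / total, "BL": weight_bl / total}


# Arbitrary illustrative numbers: B is a minor conformer without ligand
without_ligand = conformer_populations(k_conf=0.1, ligand=0.0, k_d=1.0)
with_ligand = conformer_populations(k_conf=0.1, ligand=100.0, k_d=1.0)
print(without_ligand)
print(with_ligand)
```

In this sketch conformer B exists as a minor species before any ligand is added, and adding ligand pulls most of the ensemble into the B conformation (free plus bound).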
Design and discovery of molecules to disrupt protein-protein interaction interfaces is a
critical approach to developing therapeutics, either by random screening or by rational design.