Engineering RNA-binding proteins: Unravelling the code
Muhammad Fazril MOHAMAD RAZIF B.Forensics and B.Sc (Hons)
This thesis is presented for the degree of Doctor of Philosophy of Medicine and Pharmacology of The University of Western Australia
School of Medicine and Pharmacology
2012
DECLARATION FOR THESES CONTAINING PUBLISHED WORK AND/OR WORK PREPARED FOR PUBLICATION
The examination of the thesis is an examination of the work of the student. The work must have been substantially conducted by the student during enrolment in the degree.
Where the thesis includes work to which others have contributed, the thesis must include a statement that makes the student’s contribution clear to the examiners. This may be in the form of a description of the precise contribution of the student to the work presented for examination and/or a statement of the percentage of the work that was done by the student.
In addition, in the case of co-authored publications included in the thesis, each author must give their signed permission for the work to be included. If signatures from all the authors cannot be obtained, the statement detailing the student’s contribution to the work must be signed by the coordinating supervisor.
Please sign one of the statements below.
1. This thesis does not contain work that I have published, nor work under review for publication.
Student Signature:
2. This thesis contains only sole-authored work, some of which has been published and/or prepared for publication under sole authorship. The bibliographical details of the work and where it appears in the thesis are outlined below.
Student Signature:
3. This thesis contains published work and/or work prepared for publication, some of which has been co-authored. The bibliographical details of the work and where it appears in the thesis are outlined below. The student must attach to this declaration a statement for each publication that clarifies the contribution of the student to the work. This may be in the form of a description of the precise contributions of the student to the published work and/or a statement of percent contribution by the student. This statement must be signed by all authors. If signatures from all the authors cannot be obtained, the statement detailing the student’s contribution to the published work must be signed by the coordinating supervisor.
Filipovska, A., Razif, M. F., Nygard, K. K., Rackham, O. (2011) A universal code for RNA recognition by PUF proteins. Nat Chem Biol. 7(7), 425-427
Student Signature:
Coordinating Supervisor Signature:
DECLARATION
I, Muhammad Fazril MOHAMAD RAZIF, declare that the PhD thesis entitled “Engineering
RNA-binding proteins: Unravelling the code” contains published work within Chapters 3
and 4. The work presented in Chapters 3 and 4 has been published in the following
publication:
Filipovska, A., Razif, MF., Nygard, KK., Rackham, O., 2011, A universal code for RNA
recognition by PUF proteins, Nat Chem Biol., 7(7):425-7 (IMPACT FACTOR: 14.69)
For this publication, I performed the construction of plasmids, protein purifications and
the associated troubleshooting for appropriate induction conditions, and established and
optimized the yeast three-hybrid assays. The work described in Chapters 5 and 6 was
performed entirely by me, with the exception of the northern blot.
Student Signature: Date:
Co-ordinating Supervisor Signature: Date:
ABSTRACT
RNA-protein interactions have key roles in the regulation of gene expression and
are vital for many cellular processes and complex developmental programs in eukaryotes.
Proteins that have the ability to bind RNAs tend to do so in various modes that are often
difficult to predict, limiting the ability to engineer these RNA-binding proteins for medical
and biotechnological use. Hence, engineering proteins that can bind a specific RNA
sequence has many potential applications, analogous to those of siRNAs and miRNAs, with
even more potential benefits. The ability to fuse RNA-binding proteins to any desired
effector domain, in turn enabling the manipulation of any aspect of the target RNA’s
metabolism, makes engineering these proteins highly appealing. Here, the recognition
repertoire of PUF repeats has been extended beyond adenine, guanine and uracil through directed
evolution, enabling them to specifically bind cytosine. With this code, PUF domains
capable of selectively binding RNA targets of diverse sequence and structure can be
designed. In contrast to the PUFs, the basis for RNA nucleotide recognition by pentatricopeptide
repeat (PPR) proteins, another family of RNA-binding proteins, remains ambiguous. Here,
computational methods have been used to create a stable, highly reduced PPR
architecture for the study of RNA-binding specificity and the design of specific tools to
manipulate RNA metabolism. We used these synthetic PPRs to examine the amino acid
codes for nucleotide recognition by PPRs, which also revealed that PPRs have a modular
recognition mechanism similar to that of PUFs. These findings provide a significant step
towards predicting native binding sites of the vast number of PPR proteins found in
nature. They also highlight the possibility of engineering a PPR scaffold for new
functions and sequence specificities.
ACKNOWLEDGEMENTS
The decision to pursue this PhD has definitely been a life-changing experience for
me and the past four years would not have been possible without the remarkable amount
of support and guidance that I received. First of all, I would like to say a very big thank you
to my supervisors, Dr. Oliver Rackham and Dr. Aleksandra Filipovska, for their never-
ending encouragement and support throughout this journey. Their passion for science and
incredible focus motivated me to come into the lab every day; in my eyes, they are truly
inspirational. They never cease to amaze me and I hope that I too can become the
accomplished scientist that they are (although I may require some genetic engineering).
Many thanks also to Dr. Stefan Davies and Anne-Marie for guiding me not only
with experiments, but also for all the entertaining discussions and invaluable advice. To all
the other students in Lab 555, Isabel, Tara, Ross, Louis and Tiong-Sun, you have all
been by my side throughout this PhD, living every single minute of it in the lab with me,
encouraging one another to strive for the best and for that I am grateful. Many thanks to
Kristina and Karoline for readily assisting me whenever I needed help. I am indebted to my
closest friends, Karina, Ellen, Kailin, Szymon, Aaron, Michael S. and Mike J., who shared
their lives with me. I cannot thank you enough for being the most awesome group of
friends one could ever wish for. Without all of you, I think I would have gone cray cray. I
would also like to say a heartfelt thank you to my Mum and Dad for always believing in me
and encouraging me to follow my dreams. Many thanks also to my grandmother and my
siblings, Nur, Lin, Nina and Ariq, for helping in whatever way they could during this
challenging period. I love you all very much.
Finally, I gratefully acknowledge the funding received towards my PhD from the
UWA Scholarships for International Research Fees (SIRF) and University International
Stipend (UIS). Thanks to the Graduate Research School for both the travel grant and
the GRST award. These have enabled me to present my work at Lorne and have provided
funding for additional experiments.
TABLE OF CONTENTS
DECLARATION
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
ABBREVIATIONS
CHAPTER 1: Introduction
1.1 Gene Expression
1.1.1 The Importance of post-transcriptional control
1.2 RNA-binding proteins and their roles in gene expression
1.2.1 Co-transcriptional recruitment of RNA-binding proteins during mRNA processing
1.2.2 Capping
1.2.3 Splicing
1.2.4 Polyadenylation
1.3 Mitochondrial gene expression
1.4 Candidate RNA-binding proteins for biotechnology applications
1.4.1 Zinc finger proteins that bind RNA
1.4.2 PUF proteins
1.4.3 PUF family of proteins
1.4.4 Functions of PUF proteins
1.4.5 Human PUF proteins
1.4.6 Features of PUF proteins
1.5 Pentatricopeptide repeat (PPR) proteins
1.5.1 Members of the PPR family and its functions
1.5.2 PPR motif and structure
1.5.3 RNA recognition code of PPR proteins
1.5.4 Human PPRs
1.6 Purview of the thesis
CHAPTER 2: Materials and Methods
2.1 Materials
2.2 Methods
2.2.1 Plasmid construction
2.2.2 E. coli competent cell preparation
2.2.3 E. coli transformation
2.2.4 Bacterial colony screening
2.2.5 Plasmid preparation and analysis
2.2.6 Yeast transformation
2.2.7 PUF protein expression and purification
2.2.8 cPPR protein expression and purification
2.2.9 Bicinchoninic acid (BCA) protein assay
2.2.10 SDS-PAGE gel
2.2.11 RNA electrophoretic mobility shift assays
2.2.12 PUF library selections
2.2.13 Yeast three-hybrid assays
2.2.14 β-galactosidase assays
2.2.15 Cell culture
2.2.16 Transfection
2.2.17 Northern blotting
2.2.18 Mitochondrial protein synthesis
2.3 Graphic maps of plasmids
2.3.1 pIIIA/MS2-2 plasmid
2.3.2 pTYB3-EYFP plasmid
2.3.3 pETM30-EYFP plasmid
2.3.4 pTYB3 cPPRcaps poly(A) plasmid
2.3.5 pcDNA3-OTC-cPPRcaps poly(A)-CTAP plasmid
CHAPTER 3: Engineering Cytosine-binding PUF repeats
3.1 Methods to study RNA protein interactions
3.2 Genetic selection of PUF library in yeast three-hybrid system
3.3 Yeast three-hybrid sensitivity testing for engineering individual PUF repeats
3.3.1 Yeast three-hybrid system is sensitive for engineering PUFs
3.4 Library screening for cytosine-binding PUF
3.4.1 Five PUF mutants interact with cytosine
3.5 In vitro analysis of PUF-NRE interaction
3.5.1 Purifying PUF proteins
3.5.2 RNA electrophoretic mobility shift assay of PUF proteins
3.6 Summary
CHAPTER 4: Exploring Features of PUF-RNA Interactions
4.1 General applicability of cytosine-binding code
4.1.1 The cytosine code is modular
4.2 PUF-RNA interaction with increasing structure
4.3 Extending the PUF domain beyond eight repeats
4.4 Summary
CHAPTER 5: Engineering Consensus PPRs
5.1 Designing consensus PPR
5.1.1 Purifying cPPRcaps protein
5.1.2 RNA electrophoretic mobility shift assay of the cPPRcaps protein
5.2 In vitro analysis of other consensus PPR interactions based on Barkan et al. (2012)
5.2.1 Purifying cPPRcaps poly(A) proteins
5.2.2 RNA electrophoretic mobility shift assay of the cPPRcaps poly(A) protein
5.2.3 Purifying cPPRcaps poly(G) proteins
5.2.4 RNA electrophoretic mobility shift assay of the cPPRcaps poly(G) protein
5.2.5 Purifying cPPRcaps poly(U/C) proteins
5.3 In vitro analysis of other consensus PPR combinations
5.3.1 Purifying cPPRcaps poly(C) [NS/NT] proteins
5.3.2 RNA electrophoretic mobility shift assay of the cPPRcaps poly(C) [NS/NT] protein
5.3.3 Purifying cPPRcaps poly(G) [GD/SD] proteins
5.3.4 RNA electrophoretic mobility shift assay of the cPPRcaps poly(G) [GD/SD] protein
5.4 Summary
CHAPTER 6: Engineering Designer PPR proteins
6.1 Design of consensus PPR protein that binds the NRE RNA
6.1.1 Purifying cPPRcaps NRE proteins
6.1.2 RNA Electrophoretic Mobility Shift Assay of cPPRcaps NRE proteins
6.2 Mammalian mitochondrial RNA metabolism
6.2.1 cPPRcaps poly(A) reduces the translation of mitochondrially encoded proteins
6.3 cPPRcaps poly(A) does not affect mitochondrial RNA stability
6.4 Summary
CHAPTER 7: Discussion
BIBLIOGRAPHY
ABBREVIATIONS
3-AT    3-amino triazole
A    adenine
APUM    Arabidopsis Pumilio
ARE    AU-rich elements
ASF    arginine- and serine-rich domains
Ash1    a histone-lysine N-methyltransferase enzyme
Asn    asparagine
ATP    adenosine triphosphate
C    cytosine
CDK2    Cyclin-dependent kinase 2
cDNA    complementary DNA
CF    cleavage factors
COXI    cytochrome c oxidase subunit I
cPPR    consensus pentatricopeptide protein
CPSF    cleavage/polyadenylation specificity factor
CSTF    cleavage stimulatory factor
CTD    carboxyl terminal domain
CYTB    cytochrome B
dsRNA    double stranded RNA
eIF2α    eukaryotic initiation factor 2-alpha
eIF4G    eukaryotic translation initiation factor 4 gamma
EMSA    electrophoretic mobility shift assay
ETC    electron transport chain
EYFP    enhanced yellow fluorescent protein
FBF    fem-3 binding factor
G    guanine
Gly    glycine
GST    glutathione-S-transferase tag
hb    hunchback
His6    hexa histidine-tag
hnRNP A1    Heterogeneous nuclear ribonucleoprotein A1
HSP    heavy (H)-strand promoter
IMP1    insulin-like growth factor II mRNA binding protein 1
IPTG    isopropylthio-β-galactoside
IRES    internal ribosome entry site
LB    lysogeny broth
LRPPRC    leucine-rich pentatricopeptide repeat cassette
LRR    leucine-rich repeats
LSP    light (L)-strand promoter
MCEI    mRNA-capping enzyme catalytic subunit
Met    methionine
miRNA    microRNA
mRNA    messenger RNA
MRPP3    mitochondrial RNase P protein 3
MRPS27    mitochondrial ribosomal protein of the small subunit 27
mtDNA    mitochondrial DNA
ncRNA    non-coding RNA
ND1    NADH dehydrogenase 1
ND6    NADH dehydrogenase 6
NRE    Nanos response element
Nt    nucleotide
O/N    overnight
ORF    open reading frame
OTC    ornithine transcarbamylase
PABII    poly(A)-binding protein II
PABP    poly(A)-binding protein
PAP    poly(A) polymerase
PAPD1    PAP associated domain containing 1
PARN    poly(A)-specific ribonuclease
PBS    phosphate buffered saline
PCR    polymerase chain reaction
PDE12    phosphodiesterase 12
PFAM    a database of protein families
POLRMT    mitochondrial RNA polymerase
poly(A)    polyadenylate
poly(C)    polycytosine
poly(G)    polyguanine
poly(U)    polyuracil
PPR    Pentatricopeptide repeat
PSSM    position-specific scoring matrix
PTB    polypyrimidine tract binding proteins
PTCD    PPR domain containing protein
PUF    PUMILIO and fem-3 binding factor
PUM1    Pumilio homolog 1
RBP    RNA-binding protein
RNAi    RNA interference
RNAPII    RNA polymerase II
RNR1    RNA, ribosomal 1
RNR2    RNA, ribosomal 2
rRNA    ribosomal RNA
RVDS    repeat variable di-residues
SC media    Synthetic Complete (SC) Media
Ser    serine
siRNA    short interfering RNA
snRNA    small nuclear RNA
snRNPs    small nuclear ribonucleoproteins
T    Thymine
TALE    TAL effectors
TEV    tobacco etch virus
TFAM    mitochondrial transcription factor A
TFB1M    mitochondrial transcription factor B1
TPR    tetratricopeptide repeat protein
TRIM71    tripartite motif containing 71
U    uracil
UTR    untranslated region
VEGF-A    Vascular endothelial growth factor
ZF    zinc finger
CHAPTER 1
Introduction
1.1 Gene Expression
Eukaryotic gene expression is a complex stepwise process. A simplified view of the
process begins with the initiation of transcription, followed by the elongation of the
messenger RNA (mRNA) transcript and its termination. During transcription, the pre-
mRNA undergoes several structural changes which include capping at the 5’ end, the
splicing of introns, and the polyadenylation of the 3’ end. The mature mRNA is then
released and exported to the cytoplasm for translation. The view that transcriptional
regulation is the predominant regulatory mechanism has been challenged over the past
few decades with the discovery of post-transcriptional mechanisms for regulating gene
expression (reviewed in Kishore et al., 2010; Glisovic et al., 2008).
1.1.1 The Importance of post-transcriptional control
Given that DNA is stably maintained at one or two copies as a permanent source of
genetic information in most cells, there are not many opportunities for cells to respond
rapidly to environmental changes or stresses. As most eukaryotic mRNAs are transcribed
as much larger precursors with a great deal of intronic material to be removed, their rate
of synthesis presents a hurdle against rapid changes in gene expression. RNA polymerase
II (RNAPII) transcribes genes of an average length of 60 kb in human cells (Wong et al.,
2001). Since the elongation rate of RNAPII is approximately 30 nucleotides per second,
this results in an average transcription time of over 30 minutes for an average precursor
mRNA (excluding the often slow step of transcription initiation). With some larger
transcripts taking much longer than this (e.g. the human dystrophin locus requires 16
hours to transcribe; Tennyson et al., 1995), it would be extremely advantageous for cells
to be able to quickly change their protein complement post-transcriptionally. In addition
to providing a rapid means to respond, post-transcriptional control is particularly useful
because it is readily reversible. For instance, phosphorylation of initiation factor eIF2α
inactivates its ability to exchange GDP for GTP, shutting down translation globally
(Scheper et al., 1998). Simple dephosphorylation can reactivate translation rapidly and in
an energy efficient manner; this could not occur at the transcriptional level (Prostko et al.,
1993).
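The transcription-time estimate given earlier in this section (an average 60 kb gene transcribed at roughly 30 nucleotides per second) can be checked with a short calculation. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes a constant elongation rate with no allowance for initiation or pausing.

```python
def transcription_time_minutes(gene_length_nt: int, elongation_rate_nt_per_s: float) -> float:
    """Return the elongation time in minutes for a gene of the given length.

    Assumes a constant elongation rate and ignores initiation and pausing.
    """
    if elongation_rate_nt_per_s <= 0:
        raise ValueError("elongation rate must be positive")
    return gene_length_nt / elongation_rate_nt_per_s / 60.0


# Values quoted in the text: average human gene length and RNAPII elongation rate.
AVERAGE_GENE_LENGTH_NT = 60_000
ELONGATION_RATE_NT_PER_S = 30.0

average_minutes = transcription_time_minutes(AVERAGE_GENE_LENGTH_NT, ELONGATION_RATE_NT_PER_S)
print(f"{average_minutes:.1f} minutes")  # 60,000 nt / 30 nt per s = 2,000 s, about 33.3 minutes
```

The result, just over 33 minutes, is consistent with the "over 30 minutes" figure stated above.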
As DNA is present in a completely different compartment from where its
expression is actualized, the nucleus and cytoplasm, respectively, the only way spatial
information can be integrated into gene expression is at the post-transcriptional level. This
can be achieved by either incorporating this information into the final protein product or
into the mRNA so that its expression can be directed to a particular zone in the cell. As the
average mRNA is translated into thousands of proteins, it is much more energy efficient to
transport the mRNA rather than the protein to the required location (Stern et al., 2007;
Pradet-Balade et al., 2001). The asymmetric partitioning of mRNA within the cytoplasm
plays an important role in determining cell polarity (Latham et al., 1994), differential cell
division (Long et al., 1997; Takizawa et al., 1997), organelle function (Margeot et al.,
2002), nuclear import (Levadoux et al., 1999), synaptic remodeling (Miller et al., 2002),
cancer metastasis (Shestakova et al., 1999), specification of germ cells (Kloc et al., 2001)
and embryonic axes (van Eeden and St Johnston, 1999). All these roles necessitate post-
transcriptional control.
In summary, post-transcriptional control is of considerable use as mRNAs can be
stored at high copy number, rapidly removed if necessary, moved to the exact location
where their product is required, grouped and regrouped into different packages, and
activated or silenced as required. This considerable power and flexibility in terms of
combinatorial assortment led Keene to propose that dynamic clusters of mRNAs from
groups of genes represent the eukaryotic equivalent of bacterial operons (Keene and
Tenenbaum, 2002). However, unlike their prokaryotic equivalents, these non-covalent
groupings can be reconfigured whenever the expression of their constituents needs to be
changed.
Although the regulation of mRNAs from synthesis to destruction is occasionally
controlled directly by the mRNA sequence itself (Hesselberth and Ellington, 2002), more
often the roles of mRNAs are dictated by an array of proteins
(Shyu and Wilkinson, 2000; Dreyfuss et al., 2002). Proteins control the efficiency of
transcription, processing, nuclear export, translation, localization and degradation of
mRNA (Hieronymus and Silver, 2004). In recent years many proteins have been found to
bind mRNA and influence its lifecycle. However, the exact functions of many of these
proteins remain elusive. In this introduction, I will describe some of the roles RNA-binding
proteins play in the control of gene expression and RNA-binding proteins that have the
potential for use in biotechnology applications.
1.2 RNA-binding proteins and their roles in gene expression
All mature eukaryotic mRNAs can be divided into three functional parts based on
their sequence: the 5’-untranslated region (UTR), the coding sequence and the 3’-UTR
(Dreyfuss et al., 2002). UTRs contain regulatory cis-elements, such as AU-rich elements
(ARE) and internal ribosome entry sites (IRES), which are critical for post-transcriptional
control (Dassi and Quattrone, 2012). Trans-acting factors, which encompass RNA-binding
proteins (RBPs) or non-coding regulatory RNAs (ncRNAs), interact with these cis-elements
and influence the mRNA’s lifecycle (Dassi and Quattrone, 2012). In recent years, many
proteins have been found to bind mRNA and control the efficiency of transcription,
processing, nuclear export, translation, localization and degradation of mRNA. To illustrate
the many roles that RNA-binding proteins play in the control of gene expression, I will
describe the lifecycle of mRNA and highlight various important RNA-protein interactions.
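The three-part organisation of a mature mRNA described above (5'-UTR, coding sequence, 3'-UTR) can be illustrated with a small data structure. This is a minimal sketch under simplifying assumptions that are not taken from the thesis: the coding sequence is taken to start at the first AUG and end at the first in-frame stop codon, and the class name, function name and example sequence are all hypothetical.

```python
from dataclasses import dataclass

STOP_CODONS = {"UAA", "UAG", "UGA"}


@dataclass
class MessengerRNA:
    five_prime_utr: str
    coding_sequence: str
    three_prime_utr: str


def split_mrna(sequence: str) -> MessengerRNA:
    """Split an mRNA at the first AUG and the first in-frame stop codon after it.

    Simplification: real start-codon selection depends on sequence context
    and on trans-acting factors, which this sketch ignores.
    """
    seq = sequence.upper().replace("T", "U")
    start = seq.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon found")
    # Walk codon by codon in the reading frame set by the start codon.
    for pos in range(start + 3, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            end = pos + 3  # the stop codon is kept as part of the coding sequence
            return MessengerRNA(seq[:start], seq[start:end], seq[end:])
    raise ValueError("no in-frame stop codon found")


# Toy sequence invented for illustration.
example = split_mrna("GGCACCAUGGCUUUCUAAGCAUUUAUUUA")
print(example)
```

Regulatory cis-elements such as AU-rich elements would then be searched for within the `five_prime_utr` and `three_prime_utr` fields rather than within the coding sequence.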
1.2.1 Co-transcriptional recruitment of RNA-binding proteins during mRNA processing
Upon gene activation, the recruitment of the first RNA-binding protein, RNA
polymerase II (RNAPII), occurs. RNAPII bends DNA and unwinds it to reveal a single strand
that is used as a template for mRNA synthesis (Coulombe and Burton, 1999). As soon as
the newly synthesized RNA exits RNAPII, it is bound by an array of proteins that modify
the pre-mRNA molecule into a mature functional mRNA so that it will be ready for export to
the cytoplasm. Processing typically involves attaching a 7-methyl guanine “cap” at the 5'
end of the transcript, removal of intronic sequences via splicing, and polyadenylation of
the 3' terminus (Lodish et al., 2000). RNAPII recruits a host of proteins involved in these
processes via the carboxyl terminal domain (CTD) of its largest subunit (Steinmetz, 1997).
Several studies have shown that the CTD can function as both an assembly platform and a
regulator of transcription and pre-mRNA processing factors (Maniatis and Reed, 2002).
The active phosphorylation and dephosphorylation of the CTD’s repeats is critical to its
function. The CTD contains many phosphorylation sites and is the substrate for several kinases
and at least one phosphatase (Cho et al., 2001; Trigon et al., 1998; Rodriguez et al.,
2000; Hirose and Ohkuma, 2007; Sikorski and Buratowski, 2009). Recent work has shown
that the CTD provides a rallying point for proteins that bind mRNA and influence its life
cycle. Interestingly, structural studies predict that the CTD is positioned where RNA exits
the RNAPII transcription complex in the hyperphosphorylated state (Cramer et al., 2001;
2004) but undergoes a conformational change that takes it away from the exit tunnel
when dephosphorylated (Dahmus, 1995; Richard and Manley, 2009; Kuehner et al.,
2011). Many proteins that associate with RNAPII can only bind to one phosphorylation
state of the CTD. When the CTD is positioned adjacent to the exit tunnel, it has the
potential to allow the transfer of RNA-binding proteins immediately to the nascent mRNA.
Insulin-like growth factor-II mRNA-binding protein 1 (IMP1) and polypyrimidine-tract-
binding protein (PTB) are two examples of RNA-binding proteins that associate with the
mRNAs they regulate while they are transcribed (Oleynikov and Singer, 2003; Urban et al.,
2000; Oberstrass et al., 2005).
1.2.2 Capping
The first processing step that a newly synthesized pre-mRNA undergoes is capping.
Capping is important because it protects the pre-mRNA from degradation, ensuring that it
is adequately stable to complete synthesis, processing and export (Walther et al., 1998;
Grudzien et al., 2006). In mammals, the bi-functional mRNA-capping enzyme catalytic
subunit (MceI) protein performs the removal of a phosphate group from the 5' end of the
transcript and the attachment of a guanosine; a 7-methyltransferase then modifies this
MceI product (Bisaillon and Lemay, 1997). The quality of cap methylation is monitored by
the Rat1/Rai1 complex; pre-mRNAs with improperly methylated caps are degraded before
export to the cytoplasm (Jiao et al., 2010). Interestingly, the capping reaction occurs after
25-30 bases of pre-mRNA have been synthesized, the same point at which the CTD
becomes hyperphosphorylated (McCracken et al., 1997a). The cap promotes the
subsequent pre-mRNA processing steps of splicing and 3' end processing,
assists translation and blocks 5'-3' degradation upon transfer to the cytoplasm (Parker and
Song, 2004). In the nucleus, the cap-binding protein (CBP) 80 - CBP20 heterodimer binds
to the cap and its interaction with other proteins coordinates the succeeding steps in pre-
mRNA processing (Schoenberg and Maquat, 2012).
1.2.3 Splicing
The next processing step required for the maturation of pre-mRNA involves the
removal of intronic sequences. Splicing is an extremely complex reaction that requires
over 300 proteins and five non-coding small nuclear RNAs (snRNAs) to assemble into a macromolecular
complex known as the spliceosome (Rappsilber et al., 2002; Wahl et al., 2009). Splice site
sequences generally do not have sufficient information to clearly specify exon–intron
boundaries; various sequence motifs recognized by RNA-binding proteins therefore help to
define and regulate splice site selection (Graveley, 2000; Smith and Valcárcel, 2000; Wang
and Burge, 2008). Regions to be removed are recognized within the pre-mRNA via short
sequence motifs located within and bordering the intron, and occasionally by more distant
exonic enhancers and suppressors (McCullough and Schuler, 1997; Zheng et al., 1998).
Five small nuclear ribonucleoprotein particles (snRNPs) combined with many other non-
snRNP factors come together to create a catalytically active spliceosome which removes
looped out intronic sequences in a two-step reaction (Patel and Bellini, 2008;
Zaphiropoulos, 1998; Will and Lührmann, 2011). While studying the transcriptional cycle
of the human dystrophin gene, it was noted that splicing occurred co-transcriptionally
(Tennyson et al., 1995) and soon afterwards a physical link between these processes was
obtained when CTD immunoprecipitates were found to contain splicing intermediates and
spliceosomal components (Vincent et al., 1996). Recently, based on the analysis of whole-
cell, total RNA sequencing, Ameur et al. (2011) suggested that co-transcriptional splicing
may be widespread in the human brain.
1.2.4 Polyadenylation
Polyadenylation, transcription termination and the release of the RNA from the
site of transcription are the final processing steps an mRNA must undergo prior to export
from the nucleus (Colgan and Manley, 1997; Yonaha and Proudfoot, 2000). In normal
mammalian cells, two major processes are required for polyadenylation. The first is the
cleavage step, which requires the cleavage-polyadenylation specificity factor (CPSF),
cleavage stimulation factor (CstF), cleavage factors I and II (CF I and II), RNAPII and poly(A)
polymerase (PAP; Zhao et al., 1999; Mandel et al., 2008). The site of poly(A) addition is
specified by the attachment of proteins to specific sequences in the 3'-UTRs of pre-mRNA.
The CPSF binds to the hexanucleotide polyadenylation signal sequence (AAUAAA) and the
CstF binds to a GU-rich downstream motif (Bentley, 2005; Takagaki and Manley, 1997; Coll
et al., 2010). Additionally, both CFI and CFII are required for cleavage of the pre-mRNA to
release the excess 3' sequences (Yang and Doublie, 2011). Following cleavage, a dedicated
poly(A) polymerase (PAP) associates with CPSF and uses approximately 20 nucleotides of
pre-mRNA as a primer for poly(A) addition. Shortly after PAP begins RNA synthesis,
poly(A)-binding protein II (PABII) engages the polyadenylation complex and stimulates
processive synthesis of a poly(A) tail between 200 and 300 nt in length (Bienroth et al.,
1993; Wahle, 1995).
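The sequence logic described above, in which CPSF recognises the AAUAAA hexanucleotide and CstF recognises a GU-rich motif downstream, can be sketched as a simple sequence scan. This is an illustrative toy rather than a validated predictor: the window length, the GU-content threshold, the function names and the example sequences are all assumptions introduced here.

```python
from typing import List, Tuple

POLY_A_SIGNAL = "AAUAAA"  # hexanucleotide bound by CPSF, as stated in the text


def gu_fraction(segment: str) -> float:
    """Fraction of G and U bases in a segment (0.0 for an empty segment)."""
    if not segment:
        return 0.0
    return sum(base in "GU" for base in segment) / len(segment)


def find_candidate_poly_a_sites(rna: str, window: int = 20, min_gu: float = 0.7) -> List[Tuple[int, bool]]:
    """Return (position, has_gu_rich_downstream) for every AAUAAA in the RNA.

    `window` and `min_gu` are arbitrary illustrative values, not measured ones.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    pos = rna.find(POLY_A_SIGNAL)
    while pos != -1:
        signal_end = pos + len(POLY_A_SIGNAL)
        downstream = rna[signal_end:signal_end + window]
        hits.append((pos, gu_fraction(downstream) >= min_gu))
        pos = rna.find(POLY_A_SIGNAL, pos + 1)
    return hits


# Toy 3'-UTR fragments invented for illustration.
with_gu_motif = "CCAUCAAUAAAGUGUUGGUUGUGUUUGUGGUCCC"
without_gu_motif = "CCAUCAAUAAACCCCCCCCCCACACACACAC"

print(find_candidate_poly_a_sites(with_gu_motif))
print(find_candidate_poly_a_sites(without_gu_motif))
```

In this sketch only the first fragment satisfies both conditions, mirroring the requirement that the signal and the downstream motif together specify the site of poly(A) addition.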
Given RNAPII’s intimate association with the previous processing steps, it is not
surprising that it has been found to play a role in polyadenylation. Both CPSF and CstF co-
purify in RNAPII preparations and CTD-affinity chromatography (McCracken et al., 1997b).
Evidence suggests that CPSF is recruited by transcription factor TFIID and transferred to
RNAPII at the time of transcription initiation (Dantonel et al., 1997; Hirose and Manley,
1998). This interaction is likely to be essential for efficient 3' end processing and
polyadenylation as CTD truncations disrupt these processes in vivo (Hirose and Manley,
1998; Licatalosi et al., 2002). The reciprocal is also true: without poly(A) site recognition,
transcription termination by RNAPII is impaired (Dichtl et al., 2002). Once the mRNA has been transferred to
the cytoplasm, a specific poly(A)-binding protein (PABP) attaches to the poly(A) tail and
plays a number of roles in the remainder of the mRNA's life cycle (Tarun et al., 1997; Fabian et al.,
2009; Kahvejian et al., 2005).
1.3 Mitochondrial gene expression
The mammalian mitochondrial DNA (mtDNA) is circular, double stranded and
relatively small, only 16.5 kb in size (Smeitink et al., 2001). The genome is compact,
encoding two rRNAs, 22 tRNAs and 13 proteins that are translated from 9
monocistronic and 2 dicistronic mRNAs (Smeitink et al., 2001). Both dicistronic mRNAs
contain overlapping reading frames (Anderson et al., 1981, 1982). The condensed nature
of the mammalian mitochondrial genome has given rise to transcripts with many distinct
features and imposed post-transcriptional control of its gene expression that is
evolutionarily divergent from other eukaryotes (reviewed by Rorbach and Minczuk,
2012). The human mtDNA is transcribed by specialized machinery which includes the
nuclear-encoded mitochondrial RNA polymerase (POLRMT), the mitochondrial
transcription factor A (TFAM) and one of the two mitochondrial transcription factor B
paralogues, TFB1M or TFB2M (Falkenberg et al., 2002; Kanki et al., 2004). Transcription
initiates at one of two divergently oriented promoters, the heavy (H)-strand promoter
(HSP) and light (L)-strand promoter (LSP), located in the D-loop regulatory region to
generate two long polycistronic, precursor transcripts that span the heavy and light
strands of the entire mtDNA (Ojala et al., 1981; Aloni and Attardi, 1971; Murphy et al.,
1975; Montoya et al., 1982). A third transcript covering the start of the heavy strand and
the two rRNA genes is also produced (Christianson and Clayton, 1988).
Splicing does not occur in mammalian mitochondria; instead, the polycistronic
precursor RNAs are processed to produce the individual tRNA and mRNA molecules by
mitochondrial RNase P (at the 5' ends of tRNAs; Holzmann et al., 2008) and the
mitochondrial RNase Z (at the 3' ends of tRNAs; Brzezniak et al., 2011; Lopez Sanchez et
al., 2011). This mode of RNA processing is called the ‘tRNA punctuation model’, whereby all
of the protein and rRNA genes are immediately flanked by at least one tRNA gene (Ojala et
al., 1981). Mammalian mitochondrial mRNAs do not contain introns and lack conventional 5'
and 3' untranslated regions (UTRs), Shine-Dalgarno sequences, 5' 7-methylguanosine
caps and base modifications (Montoya et al., 1981).
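The tRNA punctuation model can be illustrated with a minimal sketch. It assumes a toy precursor whose gene names and coordinates are entirely hypothetical (they are not taken from the human mitochondrial genome); the function name `punctuation_fragments` is likewise an illustrative choice. The precursor is cut at the 5' and 3' end of every tRNA, and any genes that are not separated by a tRNA remain together in one fragment, as would be the case for a dicistronic mRNA.

```python
from typing import List, Tuple

# (name, kind, start, end): 0-based, end-exclusive, purely illustrative coordinates
Feature = Tuple[str, str, int, int]


def punctuation_fragments(features: List[Feature], length: int) -> List[List[str]]:
    """Cut a precursor at both ends of every tRNA and list the genes in each fragment."""
    cuts = {0, length}
    for _name, kind, start, end in features:
        if kind == "tRNA":
            cuts.add(start)  # 5' end of the tRNA (RNase P site)
            cuts.add(end)    # 3' end of the tRNA (RNase Z site)
    bounds = sorted(cuts)
    fragments = []
    for left, right in zip(bounds, bounds[1:]):
        # A gene belongs to a fragment if it lies entirely between two cut sites
        inside = [name for name, _kind, start, end in features
                  if start >= left and end <= right]
        fragments.append(inside)
    return fragments


# Hypothetical precursor: two overlapping mRNAs with no tRNA between them
toy_precursor = [
    ("tRNA-1", "tRNA", 0, 70),
    ("rRNA-1", "rRNA", 70, 1000),
    ("tRNA-2", "tRNA", 1000, 1070),
    ("mRNA-A", "mRNA", 1070, 1300),
    ("mRNA-B", "mRNA", 1260, 1900),
    ("tRNA-3", "tRNA", 1900, 1970),
]
fragments = punctuation_fragments(toy_precursor, length=1970)
```

In this toy case the rRNA is released on its own, whereas mRNA-A and mRNA-B stay on a single dicistronic fragment because no tRNA separates them.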
Following RNA processing, mitochondrial RNAs undergo a maturation process that
involves the addition of a CCA triplet to the 3′ ends of tRNAs (Nagaike et al., 2001), as well as
modification of specific bases within both tRNAs and rRNAs (Nagaike et al., 2005; Sharma
et al., 2003), while mRNAs are generally polyadenylated at their 3′ ends, with the
exception of the MTND6 mRNA (Bobrowicz et al., 2008; Mercer et al., 2011; Temperley et
al., 2010; Slomovic et al., 2005). The addition of CCA is required for amino acid
attachment and to enable interactions with both the aminoacyl-tRNA synthetases and
elongation factor Tu (Levinger et al., 2004; Cusack, 1997). Translation is accomplished by
the mitochondrial ribosome, which is composed of a large 39S and a small 28S subunit
that associate to form the 55S particle (O'Brien, 1971).
Despite their common polycistronic origin, wide variation between the levels of
individual mRNAs, tRNAs and rRNAs has been observed, indicating that post-transcriptional
processing and degradation are significant in regulating mitochondrial gene expression
(Mercer et al., 2011). Like the majority of mitochondrial components, the proteins
necessary for replication, repair, transcription and translation are not encoded within the
organelle itself, but rather are encoded in the nucleus. In addition, regulation of the
processing of mitochondrial tRNAs can have vast effects on mitochondrial gene
expression, by affecting the levels of mature RNA species, the final processing of the
different RNAs, and the overall level of translation and mitochondrial function (Lopez-
Sanchez et al., 2011). RNA-binding proteins play essential roles in controlling the
mitochondrial transcriptome from its synthesis to its destruction and have evolved unique
features to complement the unusual features of mitochondrial RNAs.
PPR-containing proteins are RNA-binding proteins that enable site-specific
interactions between RNAs and the enzymes that act on them; a few PPR-containing
proteins have been functionally characterized and implicated in RNA-processing
events in mitochondria (Mili and Pinol-Roma, 2003). For example,
MRPP3, one of the three protein components of mitochondrial RNase P, is composed of 5 PPR
domains and a putative metallonuclease domain (Lopez Sanchez et al., 2011; Holzmann
and Rossmanith, 2009; Rossmanith and Holzmann, 2009). Studies have shown that
knockdown of RNase P subunits led to an increase in the abundance of mitochondrial
precursor transcripts (Holzmann et al., 2008) and a decrease in the levels of mRNAs and
tRNAs that consequently decreased mitochondrial translation, ribosome stability and
respiration (Lopez Sanchez et al., 2011). Mitochondrial RNase Z, on the other hand,
has been found to associate with PTCD1, another PPR protein, which has been shown to
affect 3' processing of mitochondrial tRNAs and to act as a negative regulator of leucine tRNAs
(Lopez Sanchez et al., 2011; Rackham et al., 2009), although the precise
role of PTCD1 in mitochondrial tRNA metabolism is still not clear. The N-terminal region of mammalian
POLRMT contains two putative pentatricopeptide repeat (PPR) motifs and it has been
hypothesized that these PPR domains could be involved in binding nascent mitochondrial
RNA transcripts to stabilize them during their synthesis (Rodeheffer et al., 2001; Ringel et
al., 2011). For many of these mitochondrial PPR proteins, the binding targets remain to be
elucidated; their identification should shed some light on their role in binding of
mitochondrial RNAs and transcription.
In summary, the opinion that transcriptional regulation is the principal regulatory
mechanism has been challenged by the discovery of ever-increasing examples of post-
transcriptional mechanisms for regulating gene expression. It has been shown that RNA-
binding proteins are key players in the post-transcriptional regulation of gene expression.
Over the past decade, a plethora of new RNA-binding proteins that possess potential for
re-engineering for biotechnology purposes have been discovered.
1.4 Candidate RNA-binding proteins for biotechnology applications
Considerable advances have been made in the field of DNA-binding proteins, which
can be custom designed to recognize a specific double-stranded DNA target and are
now commercially available. Designer DNA-binding proteins are based on classical
zinc-finger (ZF) domains, and they have been developed to activate, silence or aid in the
modification of specific genes in vivo (Sera, 2009; Cathomen and Joung, 2008; Camenisch et
al., 2008). Xanthomonas transcription activator-like effectors (TALEs) possess another
DNA-binding domain that has been engineered for biotechnological purposes. These are
proteins with tandemly arranged, nearly identical repeats of about 34 amino acids,
with nucleic acid specificity almost completely defined by residues 12 and 13, which have
been referred to as repeat variable di-residues (RVDs).
In developing TALEs as biotechnological tools, the simplicity of the TALE code,
which specifies that one TALE repeat binds to one DNA base pair, has enabled the
construction of artificial transcription factors to turn on the expression of specific genes
(Moscou and Bogdanove, 2009; Boch et al., 2009; Zhang et al., 2011; Miller et al., 2010;
Morbitzer et al., 2010). Some repeats are highly specific for
a particular DNA base, whereas others are able to recognize more than one base
(Figure 1.1). The TALE code has also led to the development of TALE nucleases (TALENs), which are
artificial enzymes with programmable specificity, composed of a TALE DNA-binding
domain fused to the non-specific DNA cleavage domain from FokI (Mussolino et al., 2011;
Wood et al., 2011; Hockemeyer et al., 2011; Cermak et al., 2011; Li et al., 2011).
Figure 1.1: The recognition code of TALE repeats for DNA bases. Asparagine and isoleucine at positions 12 and 13 (NI) recognize adenine; histidine and aspartate (HD) recognize cytosine; asparagine and glycine (NG) recognize thymine; asparagine and lysine (NK) recognize guanine; repeats with asparagine at both positions 12 and 13 (NN) recognize both guanine and adenine; asparagine and serine (NS) are able to bind all four bases. (Figure adapted from Filipovska and Rackham, 2011)
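The one-repeat-to-one-base logic of the TALE code can be sketched as a simple lookup. The table below contains only the six RVDs listed in Figure 1.1; the function names `design_rvds` and `recognizes`, and the rule of preferring the most specific RVD for each base, are illustrative assumptions rather than an established design procedure.

```python
# RVD (residues 12 and 13) -> DNA bases recognized, as listed in Figure 1.1
TALE_CODE = {
    "NI": {"A"},
    "HD": {"C"},
    "NG": {"T"},
    "NK": {"G"},
    "NN": {"G", "A"},
    "NS": {"A", "C", "G", "T"},
}


def design_rvds(target):
    """Pick one RVD per DNA base, preferring the RVD that recognizes the fewest bases."""
    rvds = []
    for base in target.upper():
        candidates = [rvd for rvd, bases in TALE_CODE.items() if base in bases]
        if not candidates:
            raise ValueError(f"no RVD in the table recognizes {base!r}")
        rvds.append(min(candidates, key=lambda rvd: len(TALE_CODE[rvd])))
    return rvds


def recognizes(rvds, dna):
    """Check whether an array of RVDs is compatible with a DNA sequence, repeat by repeat."""
    dna = dna.upper()
    return len(rvds) == len(dna) and all(
        base in TALE_CODE[rvd] for rvd, base in zip(rvds, dna)
    )
```

For example, `design_rvds("ACGT")` selects NI, HD, NK and NG in that order, while an array containing NN or NS is compatible with more than one target sequence.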
Given the increasing appreciation of the importance of post-transcriptional gene
regulation and because some aspects of gene expression can only be controlled at the
RNA level, it would be highly desirable to engineer proteins that can recognize RNA with
customized specificity. The potential of using RNA-binding proteins is similar to that of
short interfering RNAs (siRNAs) and microRNAs, which can be regarded as the leading
technologies in the field of RNA regulation at the moment (Liu and Paroo, 2010; Vaishnaw
et al., 2010; Perrimon et al., 2010). However, the use of these short RNA duplexes to
target RNAs is generally limited to lowering their abundance or expression in the
cytoplasm, and this depends on RNA interference pathways. Designer RBPs fused
to a mitochondrial targeting sequence or any desired effector domain would thus offer
considerable flexibility for controlling RNA function and are set to become a valuable tool in
medical research.
Initially, designing RBPs was hindered by the lack of structural knowledge and
guidelines determining RNA-protein recognition. However, this situation is changing
rapidly given that several dozen structures of RNA-protein complexes from various
structural classes have been elucidated (Auweter et al., 2006). The diversity of RNA
structure also requires different strategies to be taken when designing these proteins
given that the RNA targets can be single- or double-stranded, as well as more complex
tertiary structures. Here I discuss a few prospective RBP candidates that have
qualities suitable for engineering.
1.4.1 Zinc finger proteins that bind RNA
More than two decades ago, the classical C2H2 zinc finger (ZF) proteins were first
identified as a modular nucleic acid recognition element (Miller et al., 1985). The
possibility that ZFs could bind nucleic acids traces back to the work of
Picard and Wegnez (1979) on transcription factor IIIA (TFIIIA), a
protein constituent that associates with 5S rRNA within a 7S particle in Xenopus oocytes.
The DNA-binding properties of C2H2 ZFs were first proposed when the expression of the 5S rRNA
gene was shown to be regulated by TFIIIA (Pelham and Brown, 1980). TFIIIA contains nine
C2H2 ZFs, which are used to recognize both DNA and RNA targets, the 5S rRNA gene and
5S rRNA, respectively (Miller et al., 1985). The C2H2 ZF is a module of about 30 amino acid
residues, and each module constitutes an independent domain stabilized by a zinc ion
ligated to two cysteines and two histidines and by an inner structural hydrophobic core. Its
crystal structure shows that it folds into a small domain comprising two β strands followed
by an α helix (Pavletich and Pabo, 1991).
For DNA binding to the 5S rRNA gene, TFIIIA binds to three elements within the
gene’s internal control region. These are the 11 base pair ‘box A’ sequence, a 3 base pair
‘intermediate element’ sequence and a 10 base pair ‘box C’ sequence (Pieler et al., 1987).
The crystal structure of TFIIIA revealed that binding is antiparallel: the
first three ZFs bind to the box C sequence, wrapping around the major groove of the
DNA, the fifth ZF binds to the intermediate element and ZFs 7–9 interact with the box A
element (Nolte et al., 1998; Lu et al., 2003). Interestingly, ZFs 4 and 6 do
not interact with the DNA; they act as non-binding spacers. It was later discovered that
ZFs 4 and 6 are the most important for binding to the 5S rRNA, as they both bind to
elements in loop regions of the 5S rRNA using the N-terminal ends of their respective α
helices (Lu et al., 2003). ZFs are now known for their ability to recognize DNA and both
ssRNA and dsRNA, with structural information emerging for all these types of interactions.
More recently, two other ZF proteins have been found to bind RNA targets. These
are the CCCH class of ZF proteins and the RanBP2-type ZF domains. Firstly, the CCCH-type
ZFs, which were discovered in regulatory proteins such as muscleblind and Tis11d, possess
the ability to bind ssRNA and are involved in mRNA processing (Taylor et al., 1996). Each
of the CCCH ZF modules binds to the sequence UAUU. A study by Hudson et al. (2004)
revealed that the CCCH-RNA interaction is largely driven by hydrogen bonds mediated by
the protein backbone, while few side chain–mediated interactions define the specificity of
RNA recognition. On the other hand, Ran-binding domain-containing protein 2 (RanBP2)-type
ZF domains, named after the nuclear pore protein complex where eight of these
domains were found, are able to form base-specific contacts with the GGU sequence in
ssRNA (Loughlin et al., 2009). Loughlin et al. (2009) also found that the binding is
mediated predominantly by hydrogen bonds formed between (i) two arginine side
chains and the two guanines in the GGU recognition site, and (ii) two
asparagine side chains and the uracil, in the ZRANB2 ZF.
Although ZFs appear to be promising candidates for engineering, there are several
drawbacks posed by this technology. For example, the reliance on backbone-mediated
hydrogen bonding, specifically by the CCCH-ZF, for RNA recognition places limitations on
the range of sequences that could be recognized and also makes it less readily engineered
because the RNA-binding specificity would be highly sensitive to small variations in amino
acid sequences. It also remains to be shown if the CCCH-ZFs are able to bind ssRNA
(Mackay et al., 2011). Another limitation to the system can be exemplified by the fact that
RanBP2-type ZFs can only recognize 3 nucleotides, which means that a total of 64 variants
would be required to recognize all possible triplets in potential target RNAs (Mackay et al.,
2011). This evidently makes RanBP2-type ZFs less practical to use than
PUF proteins, which require only four variants because each PUF
repeat recognizes a single nucleotide (see Section 1.4.6).
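The difference between 64 and four variants follows directly from the number of nucleotides each module reads. As a minimal sketch (the function name `variants_needed` is an illustrative assumption), a module that reads n nucleotides needs 4^n variants to cover every possible target:

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases


def variants_needed(nucleotides_per_module):
    """Number of module variants required to cover every possible target sequence."""
    return len(BASES) ** nucleotides_per_module


# Enumerate every triplet that a three-nucleotide module would have to cover
triplets = ["".join(combo) for combo in product(BASES, repeat=3)]

triplet_module_variants = variants_needed(3)  # module reading three nucleotides
single_base_variants = variants_needed(1)     # module reading one nucleotide
```

The enumeration makes the scaling explicit: each additional nucleotide read by a single module multiplies the number of required variants by four.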
1.4.2 PUF proteins
The PUF family proteins, named after the founding members Drosophila
melanogaster PUMILIO and Caenorhabditis elegans fem-3 binding factor (FBF),
are an evolutionarily conserved family of RNA-binding proteins found in most eukaryotes
(Wickens et al., 2002; Murata and Wharton, 1995; Zhang et al., 1997). They are typically
involved in the regulation of gene expression at the mRNA level by binding to sequences
located in the 3’ UTR and promoting translational activation or repression by affecting the
stability of the transcripts (Wharton and Aggarwal, 2006; Wickens et al., 2002; Quenault et
al., 2011). They have been shown to be involved in the regulation of embryogenesis,
development and differentiation (Sonoda and Wharton, 1999; Gamberi et al., 2002; Cho
et al., 2006; Prinz et al., 2007; Murata and Wharton, 1995; Wreden et al., 1997), neuronal
function (Dubnau et al., 2003; Menon et al., 2004; Mee et al., 2004; Ye et al., 2004; Vessey
et al., 2006; Muraro et al., 2008) and mitochondrial biogenesis (García-Rodríguez et al.,
2007; Saint-Georges et al., 2008; Eliyahu et al., 2010).
1.4.3 PUF family of proteins
In different eukaryotes, the number of genes that encode PUF proteins differs
vastly. While Dictyostelium, Anopheles and Drosophila species have only one PUF gene,
others such as Saccharomyces cerevisiae and C. elegans have six and eleven PUF genes
encoded in their genomes, respectively (Wickens et al., 2002). Xenopus, zebrafish, mouse
and humans have two PUF genes (Spassov and Jurecic, 2002; 2003; Wickens et al., 2002).
Recent studies have revealed the existence of one, two and ten PUF homologs
in Planaria, Plasmodium and Trypanosoma, respectively (Salvetti et al., 2005; Cui et al.,
2002; Caro et al., 2006). In Arabidopsis thaliana, Francischini and Quaggio (2009)
confirmed the identity of 25 Arabidopsis Pumilio (APUM) proteins, of which 12 (APUM-1 to
APUM-12) have a PUF domain with 50-75% similarity to the Drosophila PUF domain. To
date, no PUF protein has been found in archaeal or eubacterial species (Wickens et al.,
2002). Within a species, closely related subfamilies of PUF proteins can be found; for
example, in C. elegans there are the FBF-1 and FBF-2 homologs.
1.4.4 Functions of PUF proteins
PUF proteins have the ability to regulate diverse processes (Table 1.1). Various
studies have revealed that PUFs in each organism display both unique functions as well as
redundancy because each PUF family member not only has its own unique subset of
mRNAs that it binds to, but is also capable of sharing mRNA targets (Gerber et al., 2004;
Hook et al., 2007; Ulbricht and Olivas, 2008). An example of unique transcript targeting
by a PUF is S. cerevisiae Puf5, which inhibits yeast cell differentiation to the
filamentous form by repressing the protein levels of the Ste7 MAP-kinase and the Tec1
transcriptional activator (Prinz et al., 2007). Additionally, Drosophila Pumilio is involved
in regulating anterior/posterior patterning in early embryos by repressing the translation
of hunchback mRNA via the recruitment of CCR4-NOT (CNOT) complexes which contain
several enzymes that catalyze mRNA deadenylation (Barker et al., 1992; Murata et al.,
1995; Van Etten et al., 2012). On the other hand, both S. cerevisiae Puf4 and Puf5 regulate
HO endonuclease by co-occupying the mRNA and destabilizing it (Hook et al., 2007).
Table 1.1: Various PUF proteins and their related functions
Organism | PUF protein | Biological function | Target mRNA | Refs
Drosophila | Pumilio | Regulate presynaptic growth of neurons | eIF-4E | Menon et al. (2009)
Drosophila | Pumilio | Posterior patterning | hunchback | Barker et al. (1992)
Drosophila | Pumilio | Anterior patterning | bicoid | Wharton et al. (1991)
Drosophila | Pumilio | Germline development and differentiation | cyclin B | Lin et al. (1997); Forbes et al. (1998)
C. elegans | FBF-1, FBF-2 | Spermatogenesis/oogenesis switch | fem-3 | Lamont et al. (2004)
C. elegans | FBF-1, FBF-2 | MAP kinase phosphatase | lip-1 | Lee et al. (2007)
C. elegans | FBF-1, FBF-2 | Olfactory neuron adaptation | egl-4 | Kaye et al. (2009)
C. elegans | PUF-5 – PUF-7 | Oocyte maturation and differentiation | glp-1 | Lublin et al. (2006)
C. elegans | PUF-8 | Spermatogenesis | ? | Subramaniam et al. (2003)
C. elegans | PUF-9 | Differentiation of hypodermal stem cells | hbl-1 | Nolde et al. (2007)
Xenopus | Pum1 | Oocyte maturation | Cyclin B1 | Nakahata et al. (2001)
Xenopus | Pum2 | Oocyte maturation | RINGO/Spy | Padmanabhan et al. (2006)
T. brucei | PUF9 | Cell cycle replication | LIGKA | Archer et al. (2009)
T. brucei | PUF9 | Regulation of organelle copy number | PNT1, PNT2 | Archer et al. (2009)
S. cerevisiae | Puf3p | Mitochondria biogenesis | PET123, COX17 | Eliyahu et al. (2010); Jackson et al. (2004)
S. cerevisiae | Puf4p | Mating-type switching | HO | Hook et al. (2007)
S. cerevisiae | Puf5p | Represses differentiation of filamentous-form | ? | Prinz et al. (2007)
S. cerevisiae | Puf5p | Localization of peroxisome protein | PEX14 | Zipor et al. (2009)
S. cerevisiae | Puf5p | Cell wall integrity | LRG1 | Kennedy et al. (1995; 1997)
S. cerevisiae | Puf5p | Mating-type switching | HO | Hook et al. (2007)
Human | PUM1 | Cell cycle regulation | Cyclin B1, Cyclin E2 | Morris et al. (2008)
Human | PUM1 | Histone mRNA binding protein | SLBP | Morris et al. (2008)
Human | PUM1 | Ribosomal subunit export to cytoplasm | SDAD1 | Galgano et al. (2008)
Human | PUM1 | Notch signaling pathway | DII1 | Galgano et al. (2008)
Human | PUM2 | Translation initiation factor | eIF4E | Vessey et al. (2010)
Human | PUM2 | Vascular endothelial growth factor | VEGF-A | Galgano et al. (2008)
Human | PUM2 | MAP kinase | ERK2, p38α | Lee et al. (2007)
Human | PUM2 | Cdc42 effector | CEP3 | Spik et al. (2006)
PUF proteins function as repressors of translation; in addition, new evidence
suggests that they can also contribute to the activation of mRNA expression via binding to
the 3’UTR of mRNAs (Pique et al., 2008; Suh et al., 2009; Kaye et al., 2009; Archer et al.,
2009) and assist in subcellular targeting of mRNAs (Gu et al., 2004; Deng et al., 2008;
Vessey et al., 2006 and 2010; Eliyahu et al., 2010). Drosophila Pum, an example of a
translational repressor, is involved in the regulation of posterior segmentation and
abdomen formation in the early fly embryo. Pum is required for establishing the
anterior-posterior gradient of the transcription factor hunchback (hb), whose absence at the
posterior end of the fly embryo enables the formation of eight abdominal segments
(Lehmann and Nusslein-Volhard, 1991). Pum binds to the Nanos response element (NRE) within the 3’UTR of hb mRNA
and causes a translational arrest; this repression occurs only at the posterior end of the
embryo as it requires association with the zinc finger protein Nanos (Nos) (Murata et al.,
1995; Wreden et al., 1997; Sonoda and Wharton, 1999; Asaoka-Taguchi et al., 1999). Nos
and another protein, Brat (Brain Tumor), are simultaneously recruited by Pum-NRE to form
a ribonucleoprotein complex that promotes deadenylation and destabilization of hb mRNA
(Wreden et al., 1997; Sonoda and Wharton, 1999).
On the other hand, the exact mechanism by which PUFs activate mRNAs
is yet to be determined, but studies propose that the effects are likely
to be direct (Quenault et al., 2011). The direct pathway suggests that the PUFs bind to the
mRNAs they activate, leading to an upregulation of the transcript levels; this requires PUF
binding in the 3’UTR of the mRNA (Pique et al., 2008; Suh et al., 2009; Kaye et al., 2009;
Archer et al., 2009). A study by Archer et al. (2009) examining the regulation of gene
expression in the cell cycle of Trypanosoma brucei found that T. brucei PUF9 stabilizes
certain mRNA transcripts during the S-phase of the cell cycle. They also noted that the
levels of PUF9-regulated transcripts were cell cycle dependent, peaking between mid- and
late S-phase. Knocking down PUF9 resulted in a reduction of PUF9 target transcripts
as well as an increased presence of extra nuclei and kinetoplasts, an indication of
decoupling of their biogenesis from cell division (Archer et al., 2009).
As previously mentioned, PUFs themselves have the ability to assist in the
localization of mRNA by acting as targeting factors. A prime example of this is
S. cerevisiae Puf3p, which is a multifunctional PUF that localizes to mitochondria and
contributes to mitochondrial protein synthesis, organization, respiration and biogenesis
(García-Rodríguez et al., 2007). Previous studies have discovered that Puf3p preferentially
binds to mRNAs of nuclear-encoded mitochondrial proteins (Puf3p binds to 87% of the
154 mRNAs encoding proteins that localize to mitochondria). García-Rodríguez et al.
(2007) found that Puf3p works by localizing to the cytosolic face of the mitochondrial
outer membrane and that overexpression of the protein resulted in reduced
mitochondrial respiratory activity and reduced levels of Pet123p (a protein encoded by a
Puf3p-bound mRNA). Puf3p is also known to bind to a consensus motif in the 3'UTR of
many mRNAs encoding mitochondrial proteins (Gerber et al., 2004; Jackson Jr et al.,
2004). There is also evidence that deletion of PUF3 leads to a decrease in mRNA
deadenylation and a doubling of the half-life of COX17 mRNA (Olivas and Parker, 2000).
Therefore, Puf3p probably controls the transport and stability of a specific set of
mRNAs involved in mitochondrial biogenesis.
1.4.5 Human PUF proteins
There are only two Pumilio-related genes present in humans, PUM1 and PUM2.
The nomenclature coincides with the chromosomal localization of these genes on human
chromosomes 1 and 2. PUM1 is found on chromosome 1p35.2, spanning approximately
150 kb with 22 exons, whereas PUM2 is found on chromosome 2p23–24, spanning at least
80 kb and composed of 20 exons (Figure 1.2; Spassov and Jurecic, 2002). PUM1 and PUM2
encode 127 and 114 kDa proteins with evolutionarily highly conserved PUF RNA-binding
domains (86 and 88% identity with the fly Pum protein). The two proteins share 75% overall
similarity, with their highly conserved RNA-binding domain, called PUM-HD, being 91%
identical (Spassov and Jurecic, 2002). PUM-HD spans eight exons, and is encoded by exons
15–22 in PUM1 and exons 13–20 in PUM2. In addition, the sizes of these exons are
identical in PUM1 and PUM2, reflecting the conservation of gene structure. Hence, it is
not surprising that the C-terminal ends of both proteins are highly homologous to that of
Drosophila Pum (78% identity for PUM1 and 79% identity for PUM2), whereas the
N-terminal ends of PUM1 and PUM2 show a low degree of sequence conservation and
vary in size. This is caused mainly by the truncation of the N-terminal part in both
human proteins: the fly Pum protein is 1534 aa long, whereas PUM1 is 1186 aa and
PUM2 is only 1064 aa long. The PUM1 gene also contains two additional exons that encode an
extra 128 amino acids at the N-terminus. Some of the exons that encode the N-terminal
part of human Pum proteins (exons 1–14 in PUM1 and 1–12 in PUM2) have slightly
different sizes stemming from small in-frame insertions or deletions.
Figure 1.2: Human PUM1 and PUM2. The PUM1 gene consists of 22 exons, whereas the PUM2 gene consists of 20 exons. PUM-HD is encoded by exons 15–22 in the PUM1 gene, and exons 13–20 in PUM2. (Figure adapted from Spassov and Jurecic, 2002)
Both PUM1 and PUM2 show relatively widespread and mostly overlapping
expression in human tissues, with the only difference being that PUM1 does not seem to
be transcribed in the cerebellum, amygdala, corpus callosum, caudate nucleus, medulla
oblongata, hippocampus and putamen (Spassov and Jurecic, 2002).
1.4.6 Features of PUF proteins
The RNA-binding domain of PUF proteins is generally composed of eight
consecutive 36 amino acid repeats that are very similar, flanked by a degenerate capping
repeat at each end (Figure 1.3; Zamore et al., 1997). The crystal structures of both
Drosophila Pumilio and its human homolog, PUM1, showed that the repeats together
form an extended arc (Wang et al., 2002; Edwards et al., 2001). Each of the PUF repeats
consists of three α-helices, and the repeats stack onto each other to form the arc, whilst the
degenerate capping repeats have only one or two helices that approximate the shape of
the canonical repeats (Zamore et al., 1997; Wang et al., 2001; Edwards et al., 2001;
Quenault et al., 2011).
Figure 1.3: PUF protein crystal structure and binding motif. (a) The crystal structure of the human PUM1 PUF domain bound to its native RNA, NRE. (b) Schematic representation of the recognition of RNA bases in the NRE RNA by the PUF repeats of PUM1. (c) Recognition of adenine (top), uracil (middle), and guanine (bottom) by PUF repeats 5, 6 and 7 in the crystal structure of PUM1, respectively.
PUF proteins recognize specific sequences, known as Nanos response
elements (NREs), present in the 3’UTR of the target mRNA (Murata and Wharton, 1995).
The RNA binds to the inner concave surface of the protein, where each repeat binds to a
single base of the RNA. Amino acids at positions 12 and 16 of the PUF repeat bind each
RNA base via hydrogen bonding or van der Waals contacts with the Watson-Crick edge,
while the amino acid at position 13 makes a stacking interaction. The recognition of RNA
by naturally occurring PUF domains is base-specific (Wang et al., 2002), such that
asparagine and glutamine bind uracil; cysteine and glutamine bind adenine; and serine
and glutamate bind guanine. The RNA runs antiparallel to the protein, such that
nucleotides 1-8 are recognized by PUF repeats 8-1, respectively (Wang et al., 2002). The
modular nature of the interactions has enabled the sequence specificity of the PUF to be
altered by simply mutating the residues that make contacts with the Watson-Crick edge of
the base (Cheong and Hall, 2006). This has paved the way for PUF domains to be
engineered to recognize endogenous RNAs composed of adenine, guanine or uracil (Wang
et al., 2002; Wang et al., 2009; Tilsner et al., 2009). Despite the potential usefulness of
PUF domains as tools, they have not been widely adopted because naturally occurring
residues that recognize cytosine have not been found. This limits the potential target sites
for engineered PUFs; for small RNAs or defined regions of larger RNAs that contain cytosine,
it may be impossible to engineer a PUF protein to bind them.
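The PUF recognition code and its antiparallel register can be summarized in a short sketch. It uses only the three residue pairs described above for positions 12 and 16 (given as one-letter amino acid codes); the function name `design_puf_repeats` and the 8-nucleotide example target are illustrative assumptions. The absence of a cytosine entry in the table is what makes cytosine-containing targets fail.

```python
# Residues at positions 12 and 16 of a PUF repeat -> RNA base recognized
PUF_CODE = {
    "U": ("N", "Q"),  # asparagine and glutamine bind uracil
    "A": ("C", "Q"),  # cysteine and glutamine bind adenine
    "G": ("S", "E"),  # serine and glutamate bind guanine
}


def design_puf_repeats(rna):
    """Map an 8-nt RNA target onto repeats 1-8 (nucleotide 1 pairs with repeat 8)."""
    rna = rna.upper()
    if len(rna) != 8:
        raise ValueError("an eight-repeat PUF domain reads exactly 8 nucleotides")
    design = {}
    for position, base in enumerate(rna, start=1):
        if base not in PUF_CODE:
            # No naturally occurring residue pair for cytosine is listed
            raise ValueError(f"no known residue pair recognizes {base!r}")
        repeat = 9 - position  # antiparallel: nucleotides 1-8 -> repeats 8-1
        design[repeat] = PUF_CODE[base]
    return design


# Hypothetical 8-nt target chosen only for illustration
example = design_puf_repeats("UGUAUAUA")
```

In this example, repeat 8 carries the uracil-binding pair and repeat 7 the guanine-binding pair, while a target such as UGUACAUA raises an error because it contains cytosine.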
1.5 Pentatricopeptide repeat (PPR) proteins
Pentatricopeptide repeat proteins, or PPRs, were first discovered by Small and
Peeters (2000) when they examined the genome of Arabidopsis thaliana whilst searching
for genes encoding proteins that were predicted to be imported into chloroplasts and
mitochondria. They initially found 200 genes belonging to a previously unreported gene
family. Further analysis uncovered almost 450 independent genes belonging to this family
that could be separated into two subfamilies with four subclasses (Small and Peeters,
2000; Lurin et al., 2004). PPRs were found to be involved in regulating many aspects of
mitochondrial and chloroplast function; these include mRNA processing, stability, splicing,
editing and translation (Aubourg et al., 2000; Small and Peeters, 2000; Shikanai, 2006).
1.5.1 Members of the PPR family and their functions
PPR proteins are more common in plants than they are in fungi and vertebrates.
Plant genomes encode nearly 500 PPR proteins, with more than 400 predicted in
Arabidopsis thaliana (Schmitz-Linneweber and Small, 2008). In yeast, nearly 200 potential
PPRs have been identified (Lipinski et al., 2011) and recently some of them have been
studied in Schizosaccharomyces pombe (Kuhl et al., 2011). With the exception of
trypanosomes, which have 28 PPRs (Mingler et al., 2006), animal genomes generally
encode few to several dozen PPRs. In mammals, only seven mitochondrial PPR
proteins have been identified. These are the leucine-rich PPR cassette (LRPPRC)
protein, the mitochondrial RNA polymerase (POLRMT), PPR domain containing proteins
(PTCD) 1, 2, and 3, the mitochondrial ribosomal protein of the small subunit 27 (MRPS27)
and mitochondrial RNase P protein 3 (MRPP3; Holzmann et al., 2008; Lightowlers and
Chrzanowska-Lightowlers, 2008).
Computational comparison of PPRs from different organisms conducted by Lipinski
et al. (2008) showed that methods that predict PPRs in plants are suboptimal in other
eukaryotes. This suggests that there is significant divergence of these motifs, and this
necessitates thorough analyses and prediction programs for the identification of PPRs in
eukaryotic genomes. Comparison of orthologous PPR proteins has indicated that this
family of proteins has undergone accelerated divergent evolution (Lipinski et al., 2011;
O’Toole et al., 2008). The accelerated divergent evolution could be attributed to the
coevolution of the proteins along with their RNA targets as a result of intragenic genetic
interactions or nucleo-organellar genetic buffering that are the result of cooperative
functional interactions between PPR motifs (Fujii et al., 2011; Lipinski et al., 2011; O’Toole
et al., 2008). The important point is that the presence of PPRs in all of these proteins along
with other eukaryotic PPR proteins has aided the prediction of their common role in RNA
binding.
Many genetic studies found that PPR proteins have essential roles in diverse plant
phenomena, such as embryogenesis (Cushing et al., 2005), fertility restoration of
cytoplasmic male sterility (Chase, 2007), maintenance of chloroplasts and mitochondria
(Schmitz-Linneweber and Small, 2008), abiotic stress response (Zsigmond et al., 2008),
organelle-to-nuclear signaling (Koussevitzky et al., 2007) and metabolite biosynthesis
(Kobayashi et al., 2007). Research on individual mammalian PPR proteins has
shown that they mostly have different and unrelated functions in organelle RNA
metabolism (Rackham and Filipovska, 2011; Davies et al., 2011; Davies et al., 2009; Gohil
et al., 2010; Lightowlers and Chrzanowska-Lightowlers, 2008; Mili and Pinol-Roma, 2003;
Rackham et al., 2009; Mootha et al., 2003; Sasarman et al., 2010; Sondheimer et al., 2010;
Sterky et al., 2010; Xu et al., 2008; Ruzzenente et al., 2011).
Several PPR proteins have been shown to interact with RNA by in vitro studies
(Nakamura et al., 2003; Okuda et al., 2007; Hammani et al., 2011; Prikryl et al., 2011), or
by co-immunoprecipitation (Schmitz-Linneweber et al., 2005; Beick et al., 2008). These
PPR proteins interact with either a single specific RNA or a small subset of RNA molecules
and affect numerous aspects of RNA metabolism, including RNA editing (Kotera et al.,
2005), cleavage (Gobert et al., 2010), RNA stability (Pfalz et al., 2009), splicing (Schmitz-
Linneweber et al., 2006), translation or a combination of these functions (Yamazaki et al.,
2004). It has been suggested that the PPR motif itself does not catalyze any RNA
processing; instead it is proposed that PPR proteins act as adapters, with the tandem array
of PPR motifs facilitating binding to nucleic acids in a sequence-specific manner. Of
interest, studies have found that the majority of PPR proteins function either by recruiting
accessory proteins that possess RNA-degrading or RNA-modifying functions, or by blocking these
proteins from their target RNAs (Schmitz-Linneweber and Small, 2008).
1.5.2 PPR motif and structure
PPR proteins are made up of 2-26 copies of a degenerate motif of approximately 35
amino acids, organized as a tandem array (Schmitz-Linneweber and Small, 2008; Small and
Peeters, 2008). All known PPR proteins have been found to be nuclear encoded, with most
predicted to localize to the chloroplast or mitochondria (Lurin et al., 2004). They share
similarities in primary structure with the tetratricopeptide repeat (TPR); whereas
TPR motifs mediate protein-protein interactions, PPR domains are mainly
involved in RNA-protein interactions (Small and Peeters, 2000). In vitro and in vivo studies
have confirmed that PPR domains interact with RNA (Nakamura et al., 2003; Okuda et al.,
2007; Hammani et al., 2011; Prikryl et al., 2011), and it is suggested that PPR proteins
function as adapters, facilitating binding to nucleic acids in a sequence-specific manner.
Figure 1.4: PPR proteins. (a) Schematic representation of a typical PPR protein, human PTCD3 (Davies et al., 2009). Locations of predicted PPRs and an N-terminal mitochondrial targeting sequence (MTS) are shown. (b) Crystal structure of the human POLRMT protein containing two tandem PPRs solved by Ringel et al. (2011) (adapted from Filipovska and Rackham, 2012). The C-terminal PPR is highlighted in purple and the N-terminal PPR is highlighted in orange.
Ringel et al. (2011) solved the structure of a mitochondrial RNA polymerase that
contained two PPR motifs (Figure 1.4b). The crystal structure showed that the 35 amino
acid PPR motif is composed of two anti-parallel α-helices. Initially, the helices were
predicted to be arranged in tandem arrays to form a superhelix structure with a central
hydrophilic cavity where the RNA phosphate backbone could potentially interact (Small
and Peeters, 2000; Tavares-Carreon et al., 2008). The helical-hairpin model has been
confirmed experimentally via both circular dichroism spectrum analysis and analytical
ultracentrifugation using maize PPR5 (Williams-Carrier et al., 2008). Structural prediction
proposed that helix A of the PPR motif is located at the concave surface. The inner surface
of the protein is positively charged enabling interactions with the negatively charged
backbones of nucleic acids to occur (Delannoy et al., 2007).
Recently, Howard et al. (2012) solved the crystal structure of a protein-only RNase
P (PRORP) from Arabidopsis thaliana. This protein differs from any other RNase P in
that it does not possess a catalytic RNA component. PRORP1 is one of three PRORP
enzymes encoded by A. thaliana and it localizes to mitochondria and chloroplasts (Gobert
et al., 2010). The crystal structure revealed that PRORP1 is composed of three discrete
domains, one of which is a PPR domain composed of 11 α-helices forming 5.5 consecutive
PPR repeats, each consisting of a helix-turn-helix hairpin (Howard et al., 2012). The
domain arrangement resembled that seen in tetratricopeptide repeat (TPR) motifs
whereby the tandem helical repeats associate to form a right-handed superhelical
structure (Howard et al., 2012). Howard et al. (2012) also found that PRORP1 has an
overall neutral electrostatic surface potential at the concave surface facing the putative
active site, suggesting that the PPR–nucleic acid interaction is not mainly electrostatic.
1.5.3 RNA recognition code of PPR proteins
Using co-variation analysis to determine phylogenetically conserved amino acids,
clues to the RNA recognition code of these proteins have been deciphered. Recently,
Kobayashi et al. (2012) used truncations of the Arabidopsis HCF152 protein, which is
composed of two adjacent PPRs, to perform widespread mutagenesis in order to identify
amino acids that are important for RNA-binding and specificity. This study identified five
residues, at positions 1, 4, 8, 12 and 34, that are essential for high-affinity
RNA interaction (Kobayashi et al., 2012). These five residues are aligned such that
they are exposed on the solvent surface of the PPR protein, although the structure does
not necessarily imply the mechanism for PPR-RNA binding (Ringel et al., 2011). They also
highlighted that residue 4 appears to be particularly important for PPR function as
substitutions at that position resulted in drastically reduced RNA binding affinity, and that
the 4th residue possesses inter- and intra-connections with all adjoining residues
(Kobayashi et al., 2012).
In another study, Barkan et al. (2012) used computational methods to deduce a
code for nucleotide recognition by PPR proteins, using the maize protein PPR10, which
consists of 19 PPR motifs, as a model. They found strong correlations between the RNA
base and the amino acids at positions 6 and 1′ (corresponding to positions 4 and 34 in the
Kobayashi et al. (2012) study), which were suggested to be specificity-determining
positions based on their patterns of evolutionary selection (Barkan et al., 2012; Fujii et al.,
2011). Using mobility shift assays to test whether the amino acid identities at those
PPR positions correlate with RNA-binding specificity, they found that
PPRs recognize RNA in a modular manner, in a parallel orientation, with the amino acids at
positions 6 and 1′ in each repeat determining base preference (Barkan et al., 2012). In the
context of two adjacent PPR motifs, other amino acid positions appear not to affect
nucleotide specificity, as amino acid changes at positions 6 and 1′ were sufficient to change
the nucleotide preference (Barkan et al., 2012). Although conceptually comparable to
PUF/RNA recognition, PPR/RNA complexes involve distinct amino acid combinations and
have opposite polarity (Barkan et al., 2012). Their results define a combinatorial two-
amino acid code that can specify binding of a PPR motif to either A, G, U>C, C>U, or U = C
(Table 1.2).
Table 1.2: Amino acid code of PPR proteins for RNA binding (Barkan et al., 2012)
Amino acid at position 6    Amino acid at position 1′    Nucleotide preference
Threonine (T)               Aspartic acid (D)            G >>> A, C, U
Threonine (T)               Asparagine (N)               A >>> G, C, U
Asparagine (N)              Aspartic acid (D)            U > C >>> A, G
Asparagine (N)              Asparagine (N)               C = U >>> A, G
Asparagine (N)              Serine (S)                   C > U >>> A, G
Barkan et al. (2012) highlighted that predicting the natural binding sites of
PPR proteins and estimating off-target binding by synthetic PPR proteins would prove
challenging because the RNA-binding code is degenerate, fewer than two-thirds of
naturally occurring combinations can be deciphered, and there is still a lack of
understanding of the energetic requirements for establishing a physiologically relevant
PPR/RNA interaction.
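As an illustration of how the two-amino acid code in Table 1.2 might be applied, the following minimal Python sketch maps the residues at positions 6 and 1′ of each repeat to the reported nucleotide preference. The function and variable names are hypothetical, only the five combinations listed in Table 1.2 are included, and any other combination is reported as unknown rather than guessed, in keeping with the degeneracy noted by Barkan et al. (2012).

```python
# Two-amino acid PPR code from Table 1.2 (Barkan et al., 2012).
# Keys: (residue at position 6, residue at position 1'); values: the preferred
# nucleotide(s). All names in this sketch are illustrative assumptions.
PPR_CODE = {
    ("T", "D"): "G",
    ("T", "N"): "A",
    ("N", "D"): "U>C",
    ("N", "N"): "C=U",
    ("N", "S"): "C>U",
}


def predict_preferences(repeats):
    """Return the nucleotide preference predicted for each PPR repeat.

    `repeats` is a list of (position 6, position 1') one-letter residue pairs,
    ordered from the first repeat to the last. Pairs that are absent from
    Table 1.2 are returned as "?".
    """
    return [PPR_CODE.get(pair, "?") for pair in repeats]


# Hypothetical four-repeat protein; the last pair is not covered by the table.
example = [("T", "N"), ("T", "D"), ("N", "S"), ("V", "D")]
print(predict_preferences(example))
```

A lookup of this kind only covers combinations whose specificity has been tested, so it cannot by itself predict the natural targets of most PPR proteins.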
1.5.4 Human PPRs
As previously mentioned, there are only seven identified mitochondrial PPR
domain proteins in mammals to date. These are the mitochondrial RNA polymerase
(POLRMT), the leucine-rich PPR cassette (LRPPRC) protein, PPR domain containing
proteins (PTCD) 1, 2, and 3, mitochondrial RNase P protein 3 (MRPP3) and mitochondrial
ribosomal protein of the small subunit 27 (MRPS27) (Rackham and Filipovska, 2012;
Rackham et al., 2012; Figure 1.5). There remains some discrepancy about the number of
PPR domains that each of these mammalian PPR proteins possesses, owing to the scarcity of
structural and functional data on PPR domains and how they bind RNA. This makes it
challenging to precisely define functional PPR domains. Experimental and structural
evidence is necessary to explain the mechanism of their interaction with RNA. Studies are
on-going investigating the association of RNA with PPR proteins; it is worth noting that
these proteins are highly insoluble, causing delays in determining their atomic structure.
Figure 1.5: Mammalian mitochondrial PPR proteins. Schematic representation of the seven mammalian mitochondrial PPR domain proteins. (adapted from Rackham and Filipovska, 2011)
A brief summary of each human PPR protein follows. The leucine-rich
pentatricopeptide repeat cassette (LRPPRC) protein is primarily a mitochondrial matrix
protein that is 130 kDa in size and contains 22 predicted PPR domains (Sterky et al., 2010;
Mili and Pinol-Roma, 2003; Mootha et al., 2003; Xu et al., 2004). It is still unclear which
aspect of gene expression and mitochondrial RNA metabolism is affected by LRPPRC
reduction, but it has been shown that LRPPRC may be responsible for mitochondrial mRNA
stability and co-ordinated translation (Ruzzenente et al., 2011). MRPP3 is a 67 kDa
mitochondria-targeted protein that is composed of 3 PPR domains and a putative
metallonuclease domain. MRPP3 has recently been identified as one of the three essential
components of the mitochondria-targeted RNase P (Holzmann et al., 2008). Recently, it
has been shown that MRPP3 is necessary for the processing of mitochondrial tRNAs
(Holzmann et al., 2008; Lopez-Sanchez et al., 2011; Holzmann et al., 2009; Rossmanith and
Holzmann, 2009). The mechanism of MRPP3 is still unclear and it has been speculated that
the PPR domains in MRPP3 recognize and bind the substrate tRNA, while the
metallonuclease domain carries out the cleavage of the tRNA from the precursor
transcript.
MRPS27 is a PPR domain protein that has six putative PPR domains located in
tandem towards the N-terminus of the protein and it has been shown to associate with
the small subunit of the mitochondrial ribosome and to be required for mitochondrial
translation (Davies et al., 2012). The human POLRMT is a single polypeptide subunit that is
139 kDa in size prior to import into the mitochondria, and is essential for the transcription
of the mitochondrial genome (Falkenberg et al., 2007). POLRMT transcription of the
mitochondrial genome requires the presence of both the mitochondrial transcription
factor A (TFAM) and one of the two mitochondrial transcription factor B paralogues
(TFB1M and TFB2M) (Falkenberg et al., 2002; Kanki et al., 2004). PTCD1 is a mitochondrial
matrix protein that contains eight PPR domains and is predominantly found in muscle and
heart (Rackham et al., 2009). PTCD1 is thought to be involved in negatively regulating
leucine tRNA levels and consequently affects the abundance of mitochondria-encoded
proteins (Lopez-Sanchez et al., 2011; Rackham et al., 2009). PTCD2, a 44 kDa mitochondria-targeted
protein, contains 5 PPR domains and a study by Xu et al. (2008) indicated that
PTCD2 may regulate Cyt b RNA processing in mice. PTCD3 is a 79 kDa protein that contains
15 PPR domains and has an important role in the translation of mitochondrial proteins by
association with the small subunit of mitochondrial ribosomes and the 12S rRNA (Davies
et al., 2009).
Overall, human PPRs have diverse roles in mitochondrial gene expression. Their
modularity has enabled them to have numerous RNA regulatory functions; however, their
RNA-binding target sequences have yet to be determined. Identification of their target
RNAs would allow a better understanding of their functions and modes of regulating the
expression of mitochondrial mRNAs. To date, PPR proteins have not been used in
biotechnological applications. Although key aspects of the code by which PPRs recognize
RNA have been discovered, the base specificity of only a few amino acid combinations
has been experimentally verified and the structural and mechanistic details of how they
bind RNA are unknown. Furthermore, applications of PPR proteins have been severely
limited by their inherent insolubility when expressed recombinantly. Elucidating the RNA
recognition code of PPR proteins would not only allow us to predict their RNA targets but
perhaps to engineer them as tools to selectively and specifically manipulate mammalian
mitochondrial gene expression.
1.6 Purview of the thesis
There are many potential applications of designer RNA-binding proteins for
biotechnological and medical use, given the importance of posttranscriptional regulation.
However, before we are able to engineer them for these purposes, we must first obtain
the complete modular amino acid code in order to be able to predict their binding
specificity. Here I describe the use of directed evolution to expand PUF repeat recognition
beyond adenine, guanine and uracil in order to specifically bind cytosine. One of the
factors considered prior to selecting PUFs for protein engineering was that the residues
to randomize in PUF proteins are well defined and few in number, because full
randomization can be achieved by mutating the two amino acids that make contact with
the RNA bases. PUF proteins are also amenable to combinatorial selection,
which entails the capacity to randomize the RNA-binding residues to enable
programmable protein specificity. It is not surprising that many studies have successfully
engineered PUFs to bind their specific target of interest (Lu et al., 2009; Ozawa et al.,
2007; Wang et al., 2009). I have also demonstrated that PUF repeats can be
engineered to selectively bind targets beyond their native eight-repeat RNA sequence and
that binding can be achieved with RNA targets of diverse structure.
Additionally, I describe how the RNA recognition code of PPR proteins was
successfully deciphered using a consensus protein design composed of the most common
amino acid residues at each position of the PPR motif. Our success with the PUF proteins
led to the work on PPR proteins because they too are able to undergo combinatorial
selection, although it was unclear at the start of the study as to which amino acid residue
positions were responsible for base recognition. We also took into consideration the fact
that both PUF and PPR proteins have contiguous recognition. This is important because
theoretically, better specificity would be achieved if the spaces (or lack thereof) between
each RNA-binding domain were fixed (Wang et al., 2002). Double ZFs are an example of
RBPs that possess a spacer between each repeat unit. In order to maximize
specificity, one would have to optimize not only the linker length but also its composition
(Handel et al., 2009; Shimizu et al., 2009). With the newly deciphered RNA-binding PPR
code, we were able to engineer a synthetic PPR protein that was able to target the poly(A)
tail of mitochondrial mRNAs, highlighting the ability of using designer RBPs to target
transcripts not accessible to RNAi technologies. Overall, not only does this study better
our understanding of RBP-RNA recognition, it also provides a glimpse into their potential
application to better understand the complex patterns of gene expression in diseases.
CHAPTER 2
Materials and Methods
2.1 Materials
All chemicals and materials used in this study were of analytical grade and were
sourced from Amresco Inc., DIFCO Laboratories or Sigma Chemical Company, unless
indicated below.
BD (Becton, Dickinson and Company): Bacto Yeast Extract, YPD Broth, Bacto Agar, Bacto
Tryptone
Beckman: 50 ml and 500 ml Ultracentrifuge tubes
Bio-Rad Laboratories Inc.: Mini-PROTEAN 1D-Electrophoresis system, PowerPac Universal
power supply
Fermentas: DreamTaq DNA Polymerase, 10X DreamTaq Green Buffer, high-fidelity
restriction enzymes and associated reaction buffers, GeneJET Plasmid Miniprep Kit,
GeneJET PCR Purification Kit and O’GeneRuler 1kb DNA Ladder Plus, 6x Orange DNA
loading dye
GE Lifesciences: 0.45 μm Hybond-N+ nitrocellulose membrane
Greiner Bio-One: 15 ml and 50 ml Falcon tubes, Cellstar 96-well plate, Cellstar 10 and 25
ml pipette tips
Invitrogen Corporation: DH10B E. coli, pcDNA3 plasmid, Dulbecco’s modified Eagle’s
medium (DMEM), Lipofectamine 2000, RNaseOUT, AcTEV protease, OptiMEM media
Ito, T. (University of Tokyo): pGAD-RC plasmid
New England Biolabs: All restriction endonucleases, T4 ligase, Phusion DNA polymerase,
DNA polymerase I Large (Klenow) fragment and associated reaction buffers,
deoxyribonucleotides (dNTPs), ER2566 E. coli, chitin beads, pTYB3 plasmid
Novagen: BL21(DE3) E. coli, Rosetta E. coli
Perkin-Elmer: Expres35S Protein Labeling Mix (35S)
Promega: Beta-Glo Assay System
Qiagen: RNeasy Mini kit, QuantiTect Reverse Transcription Kit, miRNeasy Mini kit
Roche: Complete protease inhibitors, Fugene HD
Sarstedt: 90 mm x 14 mm petri dishes
Scientific Specialties Inc.: 1.7 ml graduated microtubes, Ultraflux 200 µL flat top PCR
tubes
Sigma: rabbit-IgG agarose
Wickens, M. (University of Wisconsin at Madison): pIIIA/MS2-2 plasmid, RNA expression
plasmid
Thermo Scientific: Top Vision agarose, PageRuler Plus Prestained Protein Ladder, Slide-A-Lyzer
mini dialysis units (3,500 MWCO), FastAP Thermosensitive Alkaline Phosphatase
2.2 Methods
Unless otherwise indicated, all methods were performed according to Current
Protocols in Molecular Biology (Ausubel, 1987), Current Protocols in Cell Biology
(Bonifacino, 1998), or manufacturers’ instructions. Specifically, restriction digestion and
ligation reactions were performed according to protocols provided by New England
Biolabs. PCR using Phusion DNA polymerase or DreamTaq DNA polymerase used protocols
provided by New England Biolabs and Fermentas, respectively, with “touch-down” cycling
according to Current Protocols in Molecular Biology (Ausubel, 1987). Gel extraction, PCR
purification and plasmid minipreps were performed according to instructions from
Fermentas. Annealing of oligonucleotides was performed according to Current Protocols
in Molecular Biology (Ausubel, 1987). RNA extraction and cDNA synthesis were performed
according to instructions from Qiagen. Basic cell culture techniques were carried out
according to Current Protocols in Cell Biology (Bonifacino, 1998).
2.2.1 Plasmid construction
To produce a Gal4p activation domain fused to a PUF domain, a synthetic gene
encoding amino acids 828 to 1176 of the human PUM1 protein (GenBank accession no.
NP_001018494, GENEART) was subcloned into pJC72 (Rackham and Chin, 2005). The
synthetic PUM1 gene was cut with NcoI and XhoI and ligated into a FastAP-treated,
NcoI/XhoI-cut pJC72 plasmid. This plasmid was used as a template for library construction
by enzymatic inverse PCR (Rackham and Chin, 2005) using primers where the codons
corresponding to amino acids 1043 and 1047 were encoded by mixtures of trimer
phosphoramidites encoding all 20 amino acids (GeneWorks). This library of mutant PUF
domains was subcloned into the yeast expression plasmid pGAD-RC that had been cut
with NcoI and XhoI (Ito et al., 2000). Individual PUF domain mutants were also made by
enzymatic inverse PCR where two pairs of primers specifying cysteine or serine at amino
acid 1043 and glutamine or glutamate at amino acid 1047 were designed for mutating
repeat 6 of human PUM1 protein for yeast three-hybrid sensitivity testing. To make a
16-repeat PUF protein (PUFx2), repeats 1-8 of the human PUM1 cDNA were amplified using
primers that incorporated flanking SacI sites, digested with SacI and cloned into an
engineered SacI site that encodes amino acids 1030 and 1031 of the synthetic gene
encoding the PUM1 PUF domain.
RNA expression plasmids were made by altering the multiple cloning site of pIIIA/MS2-2
(Stumpf et al., 2008) according to Cassiday and Maher (2001) and sub-cloning pairs of
annealed oligonucleotides corresponding to the following RNA sequences (PUF
recognition sequences in bold, site specific mutations underlined):
NRE: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG-3'; NREU1C: 5'-CCGGCUAGCAAUCGUAUAUAUUAAUUUAAUAAAGCAUG-3'; NREG2C: 5'-CCGGCUAGCAAUUCUAUAUAUUAAUUUAAUAAAGCAUG-3'; NREU3A: 5'-CCGGCUAGCAAUUGAAUAUAUUAAUUUAAUAAAGCAUG-3'; NREU3C: 5'-CCGGCUAGCAAUUGCAUAUAUUAAUUUAAUAAAGCAUG-3'; NREU3G: 5'-CCGGCUAGCAAUUGGAUAUAUUAAUUUAAUAAAGCAUG-3'; NREA4C: 5'-CCGGCUAGCAAUUGUCUAUAUUAAUUUAAUAAAGCAUG-3'; NREU5C: 5'-CCGGCUAGCAAUUGUACAUAUUAAUUUAAUAAAGCAUG-3'; NREA6C: 5'-CCGGCUAGCAAUUGUAUCUAUUAAUUUAAUAAAGCAUG-3'; NREU7C: 5'-CCGGCUAGCAAUUGUAUACAUUAAUUUAAUAAAGCAUG-3'; NREA8C: 5'-CCGGCUAGCAAUUGUAUAUCUUAAUUUAAUAAAGCAUG-3'; NREstem5: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAAUAAAGCAUG-3'; NREstem6: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUUAAAGCAUG-3'; NREstem7: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUAAAAGCAUG-3'; NREstem8: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUACAAGCAUG-3'; NREx2: 5'-CCGGCUAGCAAUUGUUGUAUAUAAUAUAUUAAUUUAAUAAAGCAUG-3'; NREx2mut1: 5'- CCGGCUAGCAAUCCCUGUAUAUAAUAUAUUAAUUUAAUAAAGCAUG-3'; NREx2mut2: 5'- CCGGCUAGCAAUUGUCCCCUAUAAUAUAUUAAUUUAAUAAAGCAUG-3'.
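The names of the single-site variants above appear to follow a simple convention: original base, position within the eight-nucleotide site, replacement base (for example, U3C replaces the U at position 3 with C). The short Python sketch below applies such a label to the NRE sequence. The convention and the identity of the site as UGUAUAUA are inferred here from comparing the listed sequences, and the function name is hypothetical.

```python
import re

NRE = "CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG"
SITE = "UGUAUAUA"  # assumed eight-nucleotide PUF recognition sequence within NRE


def apply_point_mutation(rna, label, site=SITE):
    """Apply a label such as 'U3C' (U at site position 3 changed to C)."""
    match = re.fullmatch(r"([ACGU])([1-8])([ACGU])", label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label}")
    old, pos, new = match.group(1), int(match.group(2)), match.group(3)
    start = rna.find(site)  # first occurrence of the recognition site
    if start < 0:
        raise ValueError("recognition site not found in RNA")
    index = start + pos - 1  # labels are 1-based, Python indices are 0-based
    if rna[index] != old:
        raise ValueError(f"expected {old} at site position {pos}, found {rna[index]}")
    return rna[:index] + new + rna[index + 1:]


print(apply_point_mutation(NRE, "U3C"))
```

Checking the original base before substituting guards against mislabelled variants, which is useful when many similar oligonucleotides are ordered at once.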
All of the synthetic genes cPPRcaps poly(A), cPPRcaps poly(U), cPPRcaps poly(C)
[NT], cPPRcaps poly(C) [NS], cPPRcaps poly(G) [SD], cPPRcaps poly(G) [GD] and cPPRcaps
NRE were assembled from synthetic oligonucleotides and/or PCR products by GeneArt
(Life Technologies). The fragments were provided pre-cloned into pMK-RQ using SfiI
cloning sites. For the expression and purification of PUF and cPPR proteins, both the PUF
and cPPR genes were cloned into the pTYB3 vector. pTYB3 is a 7,477 bp E. coli expression
vector used in the IMPACT Kit (NEB #E6901; Chong et al., 1997). This C-terminal fusion
plasmid was designed for the insertion of a target gene into a polylinker upstream of an
intein tag (the Sce VMA intein/chitin binding domain, 55 kDa) (Chong et al., 1997;
Watanabe et al., 1994). Upon cloning, the C-terminal end of the target protein would be
fused to the N-terminus of the intein tag, both under the control of an IPTG-inducible T7
promoter (Dubendorff and Studier, 1991). Single column purification of the target protein
is achieved via thiol-induced self-cleavage of the intein, which releases the target protein
from the chitin-bound intein tag (Chong et al., 1996).
The synthetic cPPR fragments were cut with NcoI and SapI from the pMK-RQ
plasmid and ligated with a FastAP-treated, NcoI- and SapI-cut pTYB3 plasmid. The PUF and
mutant derivative genes were cloned into the NcoI/XhoI-cut pTYB3 plasmid.
Additionally, the PUF gene and its mutant derivatives were also cloned into the
NcoI/XhoI-cut pETM30 vector, another protein expression vector that has both
a glutathione S-transferase (GST) tag and a hexahistidine (His6) tag located at the N-terminus.
Purification of the target protein via this GST gene fusion system is achieved by
TEV cleavage of the His6 after the GST-proteins have been captured by glutathione-agarose
beads. For expression of cPPR proteins in tissue culture, cPPRcaps poly(A) and
cPPRcaps NRE were initially subcloned into the NcoI/XhoI-cut pJC72-OTC backbone. Ornithine
transcarbamylase (OTC) is a 36 kDa protein that facilitates post-translational import of
proteins into mitochondria (Brusilow and Horwich, 1996; Mori et al., 1981).
Subsequently, the fused OTC-cPPR genes were cut with KpnI/XhoI, gel purified and cloned
into the pcDNA3 expression vector (Invitrogen). All plasmids were tested for expression by
transfection and immunoblotting.
2.2.2 E. coli competent cell preparation
DH10B/ER2566/BL21 cells were grown in 10 ml lysogeny broth (LB) medium
overnight at 37 °C with shaking (180 rpm). The starter culture was diluted in 500 ml LB
media and grown at 37 °C with shaking until OD600 was between 0.4 and 0.6. Cells were
pelleted at 3000 rpm for 5 min and resuspended in 150 ml ice-cold 100 mM CaCl2/10%
glycerol. Cells were pelleted once again, resuspended in 20 ml ice-cold 100 mM
CaCl2/10% glycerol and incubated for 25 min. Aliquots of 250 µL were dispensed into sterile 1.7 ml
microcentrifuge tubes. Transformation efficiency tests and contamination checks were
performed after each preparation, and frozen stocks were stored at –80 °C for subsequent
use.
2.2.3 E. coli transformation
For transformation, 10 µL or 50 µL of E. coli competent cells were added to 1 µL
(whole plasmid) or 10 µL (ligation mix), respectively, and incubated on ice for 30 min.
Cells were subsequently heat-shocked for 30 s at 42 °C in a water bath. Cells were then
chilled on ice for 5 min and 1 ml of SOC was added. SOC was made by adding 20 mM glucose to 1 L
of SOB medium (2% (w/v) tryptone peptone, 0.5% (w/v) yeast extract, 10 mM NaCl, 2.5
mM KCl, 10 mM MgCl2, 10 mM MgSO4; prepared without Mg2+ and autoclaved, with Mg2+
added from a 2 M filter-sterilized stock [1 M MgCl2.6H2O and 1 M MgSO4.7H2O]). Cells
were left to shake (180 rpm) at 37 °C for 40 min. Cells were pelleted and plated on LB agar
plates (with antibiotics) and grown overnight at 37 °C.
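The volume of Mg2+ stock needed to reach the stated final concentrations follows from C1V1 = C2V2. The sketch below works through that arithmetic for 1 L of SOB; the function name is hypothetical, and the small change in total volume caused by adding the stock is ignored as a simplifying assumption.

```python
def stock_volume_ml(final_mM, final_volume_ml, stock_mM):
    """Volume of stock (ml) that gives `final_mM` in `final_volume_ml` (C1V1 = C2V2)."""
    if stock_mM <= 0 or final_mM > stock_mM:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_mM * final_volume_ml / stock_mM


# The 2 M Mg2+ stock contains 1 M MgCl2 and 1 M MgSO4, so each salt is at 1000 mM.
# One addition therefore delivers both salts at the same final concentration.
mg_stock_ml = stock_volume_ml(final_mM=10, final_volume_ml=1000, stock_mM=1000)
print(f"Add {mg_stock_ml:.1f} ml of Mg2+ stock per litre of SOB")
```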
2.2.4 Bacterial colony screening
As many plasmids were constructed in the course of this and other projects, a
quick and reliable method for screening for recombinant clones was required. Colonies
were initially picked and resuspended in 6 µL of LB. Of this suspension, 3 µL was added to
8 µL of 0.5% Tween-20 and mixed. The remaining 3 µL were used later for inoculation of
an overnight culture when screening results were positive. The Tween-20 suspension was
heated to 100°C in a thermal cycler for 30 s. For PCR based screening, an aliquot of 1 µL
was used from the denatured Tween-20 suspension and amplified with appropriate
primers and DreamTaq DNA polymerase.
For screening by restriction enzyme digest, 2 µL of digest mix (containing 1 µL of 10x buffer and 0.2
µL of each restriction enzyme, made up to the final volume with sterile deionized
water) was added to the denatured Tween-20 suspension. After 30 min at the
appropriate temperature, reaction mixes were resolved by agarose gel electrophoresis
and analyzed by ethidium bromide staining.
2.2.5 Plasmid preparation and analysis
Plasmid DNA was isolated from 10 ml of overnight E. coli DH10B culture using the
Fermentas GeneJET Plasmid Miniprep Kit. The concentration and purity of the plasmid
DNA were determined using a NanoDrop ND-1000 Spectrophotometer (NanoDrop
Technologies, Inc). The identity of each plasmid was confirmed by restriction
endonuclease digestion and Sanger sequencing of inserts using primers flanking ligation
junctions (Australian Genome Research Facility, Perth).
2.2.6 Yeast transformations
The lithium acetate method for transforming yeast was adapted from Gietz and
Woods (2001). S. cerevisiae YBZ1 cells (MATa, ura3-52, leu2-3, 112, his3-200, trp1-1, ade2,
LYS2 :: (LexAop)-HIS3, ura3 :: (lexA-op)-lacZ, LexA-MS2 coat (N55K))(Hook et al., 2005)
were inoculated in 10 ml YPAD (1% w/v yeast extract, 2% w/v peptone, 2% w/v glucose,
0.01% w/v adenine hemisulphate) and grown overnight at 30 °C with shaking (180 rpm).
Cells were pelleted at 3200 rpm for 2 min and resuspended in 50 ml YPAD in a baffled
flask. Cells were returned to grow for 3 hours at 30 °C with shaking. Cells were pelleted,
washed in 40 ml TE (10 mM Tris-Cl, pH 7.5; 1 mM EDTA), pelleted again and incubated at
room temperature for 10 min in 2 ml 100 mM lithium acetate/0.5x TE. 100 µL of cells
was added to 1 µg plasmid DNA with 100 µg denatured salmon sperm DNA and mixed
gently. For yeast three-hybrid transformations, both the RNA expression and yeast expression
plasmids were added together. 700 µL of 100 mM lithium acetate/40% PEG-3350/1x TE was
added to the cell-DNA mix and incubated at 30 °C for 30 min. 88 µL of DMSO was added and
the mixture was heat-shocked for 7 min in a 42 °C water bath. Cells were pelleted for 30 s
at 10,000 rpm, the liquid was removed and the cells were washed with 1 ml TE. Cells were pelleted once again
and resuspended in 100 µL TE before plating on SC media plates lacking the appropriate
amino acids and allowed to grow at 30 °C. SC media consisted of 0.67% w/v yeast nitrogen base (without
amino acids, with ammonium sulphate) and 2% w/v glucose, pH 5.6, prepared without amino
acids and dropout mix and autoclaved; 100 ml of 10x dropout mix (0.03% w/v arginine HCl,
0.03% w/v isoleucine, 0.03% w/v lysine HCl, 0.03% w/v methionine, 0.05% w/v
phenylalanine, 0.03% w/v serine, 0.03% w/v threonine, 0.03% w/v tyrosine, 0.15% w/v
valine) and 10 ml of individual 100x amino acids were added as required. SC agar was
made by adding 2.1% w/v Bacto agar to SC media.
2.2.7 PUF protein expression and purification
PUF domains were subcloned into pTYB3 and expressed as fusions to an intein and
chitin-binding domain in E. coli ER2566 cells. Cells were lysed by sonication in 20 mM
sodium phosphate (pH 8.0), 1 M NaCl, and 0.1 mM PMSF. Lysates were clarified by
centrifugation and incubated for 40 min with chitin beads. Beads were washed twice with
20 mM sodium phosphate (pH 8.0), 1 M NaCl, and 0.1 mM PMSF, once with 20 mM
sodium phosphate (pH 8.0), 0.5 M NaCl, and 0.1 mM PMSF, and once with 20 mM sodium
phosphate (pH 8.0), 0.15 M NaCl, and 0.1 mM PMSF. DTT was added to the beads to 50
mM final concentration and the tube was purged with nitrogen gas before incubation at
room temperature with gentle rocking for three days. Cleaved PUF domain protein, free
from the intein and chitin-binding domain, was collected, transferred into 10 mM Tris-HCl
(pH 7.4), 150 mM NaCl, 5 mM β-mercaptoethanol and further purified on an ÄKTA-Explorer
system (GE) using a Superdex 200 10/300 column (GE) with a total bed volume of
120 ml. Pure fractions were pooled and concentrated using Microsep 10K Omega
centrifugal devices (PALL). Protein concentration was determined by the bicinchoninic acid
(BCA) assay using bovine serum albumin (BSA) as a standard.
2.2.8 cPPR protein expression and purification
Like the PUF proteins, cPPR domains were subcloned into pTYB3 and expressed as
a fusion to an intein and chitin-binding domain in Escherichia coli ER2566 cells (New
England Biolabs). Cells were lysed by sonication in 20 mM Trizma base (pH 8.0), 1 M NaCl,
10% glycerol and 0.1 mM PMSF. Lysates were clarified by centrifugation and incubated for
40 min with chitin beads (New England Biolabs). Beads were washed five times with 20
mM Trizma base (pH 8.0), 1 M NaCl, 10% glycerol and 0.1 mM PMSF. DTT was added to
the beads to 50 mM final concentration and the tube was purged with nitrogen gas before
incubation at room temperature with gentle rocking for three days. Cleaved cPPR domain
protein, free from the intein and chitin-binding domain, was collected, transferred into
a Slide-A-Lyzer mini dialysis unit and dialyzed overnight in 20 mM Trizma base (pH 8.0), 1
M NaCl, 10% glycerol to remove DTT. Protein concentration was determined visually on a
10% SDS-PAGE gel using dilutions of 1 mg/ml bovine serum albumin (BSA) as a standard.
2.2.9 Bicinchoninic acid (BCA) protein assay
In order to determine the protein concentration of samples, BCA protein assays
were conducted as per Smith et al. (1985; Analytical Biochemistry 150, pp. 76-85) in a 96-well
plate using bovine serum albumin (BSA) as a standard. Triplicate 20 µL aliquots of each
sample were pipetted into successive wells. Dilutions were prepared in 1% v/v Triton X-100
in water. Fifty parts of BCA reagent A (1% BCA [4,4’-dicarboxy-2,2’-biquinoline], 2% Na2CO3,
0.16% Na2 tartrate, 0.4% NaOH, 0.95% NaHCO3, pH 11.25) were mixed with one part BCA
reagent B (4% CuSO4.5H2O). 200 µL of the prepared reagent was added to each sample
and allowed to incubate at 37 °C for one hour. The plate was read at OD550
on a VICTOR3™ Multilabel Counter model 1420 (PerkinElmer).
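Converting the plate readings into protein concentrations requires a standard curve built from the BSA dilutions. The sketch below shows one way this could be done in Python, using an ordinary least-squares line and the mean of triplicate readings. The absorbance values are invented for illustration, the function names are hypothetical, and a linear fit is an assumption that only holds over a limited range of the BCA response.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def concentration_from_absorbance(replicates, slope, intercept):
    """Average the replicate absorbances and invert the standard curve."""
    return (mean(replicates) - intercept) / slope


# Hypothetical BSA standards (mg/ml) and OD550 readings, for illustration only.
bsa_mg_per_ml = [0.0, 0.25, 0.5, 1.0, 2.0]
absorbance = [0.10, 0.20, 0.30, 0.50, 0.90]

slope, intercept = fit_line(bsa_mg_per_ml, absorbance)
sample = concentration_from_absorbance([0.39, 0.40, 0.41], slope, intercept)
print(f"Estimated protein concentration: {sample:.2f} mg/ml")
```

Samples whose absorbance falls outside the range of the standards should be diluted and re-assayed rather than extrapolated.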
2.2.10 SDS-PAGE gel
Cell samples were denatured in loading buffer (50 mM Tris, 4% SDS, 12% glycerol,
2% 2-mercaptoethanol, 0.01% Coomassie brilliant blue) for 7 min at 95°C and separated
on a 10% Tris-glycine gel (0.375 M Tris-HCl, 0.1% SDS, pH6.8) using the BioRad Mini
Protean system. Gels were then stained with Coomassie blue stain (40% methanol, 10%
acetic acid, 0.1% Coomassie Brilliant Blue) for 1 hr and destained (20% methanol, 7.5%
acetic acid).
2.2.11 RNA electrophoretic mobility shift assays
Purified PUF/cPPR domains were incubated at room temperature for 30 min with
fluorescein labeled RNA oligonucleotides (Dharmacon) in 10 mM HEPES (pH 8.0), 1 mM
EDTA, 50 mM KCl, 2 mM DTT, 0.1 mg/ml fatty acid-free BSA, and 0.02% Tween-20.
Reactions were analyzed by 10% PAGE in TAE and fluorescence was detected using a
Typhoon TRIO scanner (GE).
List of probes:
NRE: 5'-(Fl)AUUGUAUAUA-3'
NREU3C: 5'-(Fl)AUUGCAUAUA-3'
Poly G: 5'-(Fl)AAGGGGGGGG-3'
Poly C: 5'-(Fl)CCCCCCCCCC-3'
Poly U: 5'-(Fl)UUUUUUUUUU-3'
Poly A: 5'-(Fl)AAAAAAAAAA-3'
2.2.12 PUF library selections
YBZ1 cells containing the NREU3C RNA expression plasmid were transformed with
the PUF domain library in pGAD-RC using the lithium acetate method according to Gietz
and Woods (2002) as described in section 2.2.4, yielding 6 × 10^5 primary transformants.
Cells were inoculated and amplified by overnight growth in SC media lacking leucine and
uracil at 30 °C with shaking (180 rpm). The cells were subsequently pelleted and washed in
50 ml TE and 1 × 10^7 CFU were plated on SC agar lacking leucine, uracil and histidine,
supplemented with 0.5 mM 3-amino triazole. Colonies were picked after three days and
the plasmids were isolated and transformed into DH10B. Transformants were screened by
PCR to identify the PUF-encoding plasmid, which was sequenced and transformed into
YBZ1 to analyze the specificity of the mutant PUF domains, as described below.
2.2.13 Yeast three-hybrid growth assays
YBZ1 transformants containing PUF domain and RNA expression plasmids were
grown overnight in SC media lacking leucine and uracil at 30 °C with shaking (180 rpm).
Cells were pelleted and washed in SC media without amino acids, diluted to OD600 of 0.1
and replica spotted (5 µL) onto SC media lacking leucine and uracil (to test for cell health
and plasmid maintenance) and SC agar lacking leucine, uracil and histidine, supplemented
with 0.5 mM 3-amino triazole (to test for RNA-protein interactions).
2.2.14 β-galactosidase assays
YBZ1 transformants containing PUF domain and RNA expression plasmids were
grown overnight in SC media lacking leucine and uracil at 30°C with shaking (180 rpm). The
culture was diluted to OD600 of 0.1 and mixed with an equal volume of Beta-Glo reagent
(Promega), incubated for 1 h at room temperature and luminescence was detected using
a FLUOstar OPTIMA (BMG Labtech).
2.2.15 Cell culture
143B osteosarcoma cells were cultured at 37 °C under humidified 95% air/5% CO2
in Dulbecco’s modified Eagle’s medium (DMEM, Invitrogen) containing glucose (4.5 g l−1),
1 mM pyruvate, 2 mM glutamine, penicillin (100 U ml−1), streptomycin sulfate
(100 μg ml−1) and 10% fetal bovine serum (FBS).
2.2.16 Transfections
143B cells were plated at 60% confluence in six-well plates or 10 cm dishes and
transfected with mammalian expression plasmids in OptiMEM media (Invitrogen). 125 nM
(for 6-well plates) or 145 nM (for 10 cm dishes) of cPPRcaps poly(A)/NRE or control EYFP,
were transfected using Lipofectamine 2000 (Invitrogen). 158 ng/cm2 of cPPRcaps poly
(A)/NRE or control EYFP plasmid DNA was transfected using Fugene HD (Roche). Cell
incubations were carried out for 3 days following transfection. For 9-day transfections, cells
were reseeded and re-transfected every 3 days.
2.2.17 Northern blotting
RNA was isolated from 143B cells using the Qiagen miRNeasy kit according to the
manufacturer’s instructions. RNA (5 μg) was resolved on 1.2% agarose formaldehyde gels,
then transferred to 0.45 μm Hybond-N+ nitrocellulose membrane (GE Lifesciences) and
hybridized with biotinylated oligonucleotide probes specific to mitochondrial mRNAs and
rRNAs. The hybridizations were carried out overnight at 50 °C in 5× SSC, 20 mM Na2HPO4,
7% SDS and 100 μg ml−1 heparin, followed by washing. The signal was detected using
either a streptavidin-linked horseradish peroxidase or streptavidin-linked infrared
antibody (diluted 1:2000 in 3× SSC, 5% SDS, 25 mM Na2HPO4, pH 7.5) by enhanced
chemiluminescence (GE Lifesciences) or using an Odyssey Infrared Imaging System.
2.2.18 Mitochondrial protein synthesis
143B cells were grown in six-well plates until 60% confluent, transfected and 3
days later de novo protein synthesis was analyzed. For a 6 day transfection, the initially
transfected cells were re-transfected at the end of the 3 day time point and allowed to grow
for an additional 3 days. The growth medium was replaced with methionine- and cysteine-
free medium containing 10% dialysed FBS for 30 min before addition of 100 μg ml−1
emetine for 5 min. Next, 200 μCi Expres35S Protein Labeling Mix [35S] (14 mCi, Perkin–
Elmer) was added and incubated at 37 °C for 1 h, then washed in PBS and centrifuged. The
cells were suspended in PBS and 20 μg of proteins were separated on 12.5% SDS–PAGE
and the radiolabeled proteins were visualized on film.
2.3 Graphic maps of Plasmids
The following graphic maps are representations of plasmids used in the project. Not
all plasmids were included in this section due to redundancy.
2.3.1 pIIIA/MS2-2 plasmid
pIIIA/MS2-2 is an RNA expression plasmid used in yeast three-hybrid experiments, where
annealed oligonucleotides were cloned between the SmaI and SphI sites.
2.3.2 pTYB3-EYFP plasmid
Other inserts cloned between the NcoI and XhoI restriction sites in pTYB3 were PUF1,
PUM1 and the C-binding mutants.
2.3.3 pETM30-EYFP plasmid
Other inserts cloned between the NcoI and XhoI restriction sites in pETM30 were PUF1,
PUM1 and the C-binding mutants.
2.3.4 pTYB3-cPPRcaps poly(A) plasmid
Other inserts cloned between the NcoI and SapI restriction sites in pTYB3 were all the
cPPRcaps - poly [A, G, U/C, C(NT), C(NS), G(GD), G(SD), NRE].
2.3.5 pcDNA3-OTC cPPRcaps poly(A)-CTAP plasmid
Other inserts cloned between the KpnI and XhoI restriction sites in pcDNA3 were the
cPPRcaps-NRE and EYFP.
CHAPTER 3
Engineering Cytosine-binding PUF repeats
Designer DNA-binding proteins that can silence, activate or modify a target gene
have already been developed based on various classical zinc finger (ZF) domains (Sera,
2009; Cathomen and Joung, 2008; Carroll, 2008; Camenisch et al., 2008). Some of them
are currently being tested in clinical trials as potential therapeutic agents (Tebas and Stein,
2009). Present methods for altering gene expression via RNA mostly rely on RNA
interference (RNAi) (Liu and Paroo, 2010; Perrimon et al., 2010; Vaishnaw et al.,
2010). However, engineered RNA-binding proteins (RBPs) have also been
used to modulate mRNA function. For example, (i) fusion proteins containing a green
fluorescent protein and the RNA-binding MS2 coat protein have been used to monitor the
localization of ASH1 mRNA (Bertrand et al., 1998), and (ii) a fusion of the RNA-binding domains
(RBDs) of iron regulatory protein with the eukaryotic translation initiation factor eIF4G
can enhance translation of a reporter gene (De Gregorio et al., 1999). However, these
proteins could not act on endogenous mRNAs because they relied on
RNA-binding proteins with well-characterized recognition sites that had to be
incorporated into the target mRNAs of interest (Mackay et al., 2011). Therefore, the ability to
engineer designer RBPs would offer considerable flexibility for controlling RNA function
and would enable endogenous RNAs to be targeted.
One of the best candidates for engineering is the PUF (Pumilio and FBF
homology) protein family. Initially, the design of custom PUF proteins was hindered by the lack of
structural knowledge and a poor understanding of the rules governing their RNA-protein
recognition. The crystal structures of PUF proteins have shown that they are generally
composed of eight 36 amino acid repeats, with each repeat binding to a single nucleotide
in their RNA targets (Wang et al., 2002; Edwards et al., 2001; Zamore et al., 1997; Lu and
Hall, 2011; Wang et al., 2009; Zhu et al., 2009). Amino acids at positions 12 and 16 of the
PUF repeat bind each RNA base via hydrogen bonding or van der Waals contacts with the
Watson-Crick edge, while the amino acid at position 13 makes stacking interactions.
PUFs are also good designer RBP candidates because their RNA
recognition is base-specific (Wang et al., 2002): cysteine and glutamine bind
adenine; asparagine and glutamine bind uracil; and serine and glutamate bind guanine.
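The base-specific code described above can be written as a simple lookup table. The sketch below is illustrative only: the function names are hypothetical, residues are given as one-letter codes in the order (position 12, position 16), and the ordering of repeats relative to the RNA is deliberately not modelled. The eight-nucleotide core UGUAUAUA is assumed here to be the PUF recognition sequence within the NRE probe listed in Section 2.2.11.

```python
# Residue pairs (position 12, position 16) taken from the text above.
PUF_CODE = {
    "A": ("C", "Q"),  # cysteine + glutamine bind adenine
    "U": ("N", "Q"),  # asparagine + glutamine bind uracil
    "G": ("S", "E"),  # serine + glutamate bind guanine
}


def residues_for_target(rna):
    """Return one (pos12, pos16) pair per nucleotide, or None where the
    naturally occurring code offers no pair (for example cytosine)."""
    return [PUF_CODE.get(base) for base in rna.upper()]


def unrecognised_positions(rna):
    """1-based positions in the target that the natural code cannot cover."""
    return [i + 1 for i, pair in enumerate(residues_for_target(rna)) if pair is None]


print(unrecognised_positions("UGUAUAUA"))  # assumed NRE core
print(unrecognised_positions("UGCAUAUA"))  # same core with U3 changed to C
```

A target containing cytosine returns a non-empty list, which is the gap in the natural code that motivates the work in this chapter.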
A major attribute of designer RBPs is the ability to fuse the protein to useful
effector domains. It is not surprising that this has been done with the PUF proteins. Ozawa
et al. (2007) successfully created two PUF fusion proteins that contained either the N- or
C-terminal portion of a fluorescent protein. The two PUM1 PUF domains were engineered
to recognize specific sequences in the mitochondrial ND6 transcript by introducing either
three or seven point mutations (Ozawa et al., 2007). The aim was that,
when mammalian cells were transfected with the two plasmids, fluorescence would occur
only when both fusion proteins bound to their target sequences in the mitochondrial ND6
mRNA, allowing the fluorescent protein to be reconstituted (Ozawa et al., 2007). Using
this approach, both the diffusion and localization properties of the ND6 mRNA transcript
were ascertained under normal and stress conditions.
In a different application, Wang et al. (2009) fused engineered PUF domains to the
glycine-rich domain of human heterogeneous nuclear ribonucleoprotein (hnRNP) A1 or
the arginine- and serine-rich domain of ASF (also called SF2 or SRSF1) to create targetable
RNA-splicing repressors and enhancers, respectively. This was achieved by introducing five
point mutations in the PUM1 PUF domain so as to recognize an 8 nucleotide sequence in
an exon extension region of the BCL2L1 RNA (also known as BCLX; Wang et al., 2009). The
long transcripts of BCL2L1 act to inhibit apoptosis in cancer cells. When several cancer cell
lines were treated with the engineered splicing factor, splicing shifted so that the
shortened BCLX transcripts, which are pro-apoptotic, became the predominant splice form and
the cells became sensitive to anticancer drugs. These studies provide an exciting
preview of applications for engineered RBPs.
However, the use of PUF domains as tools has been hampered because naturally
occurring residues that recognize cytosine have not been found. This limits the potential
target sites for engineered PUFs because, even in RNAs encoded by guanine- and cytosine-
poor genomes, the majority of octamer sequences will contain at least one cytosine. If
one were to target a small RNA or a defined region of a larger RNA, it can be impossible to
engineer a PUF protein to bind these RNAs. Therefore, the identification of a combination
of amino acid side chains in a PUM1 repeat that can recognize a cytosine is necessary to
expand the use of designer PUF domains directed toward any RNA sequence. To
overcome this limitation, we used directed evolution to select for PUF repeat variants that
are able to specifically recognize cytosine. This process entails generating a library of
mutants and testing it, by screening or selection, for mutants
possessing the desired property, with the aim of eliminating the original and non-functional
variants and enriching mutants with the new function.
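The statement that most octamers contain at least one cytosine can be checked with a short calculation. The sketch below assumes that nucleotides occur independently at each position, which real transcripts only approximate; the function names and the 10% cytosine frequency used for a cytosine-poor case are illustrative assumptions.

```python
from itertools import product


def fraction_with_cytosine(length=8, f_c=0.25):
    """P(at least one C) = 1 - (1 - f_C)^length under independence."""
    return 1.0 - (1.0 - f_c) ** length


def count_with_cytosine(length=8):
    """Brute-force count of all sequences of this length containing a C."""
    return sum(1 for kmer in product("ACGU", repeat=length) if "C" in kmer)


total = 4 ** 8
with_c = count_with_cytosine(8)
print(with_c, total, with_c / total)

# Hypothetical cytosine-poor composition (10% C), for illustration only.
print(fraction_with_cytosine(8, f_c=0.10))
```

With equal base frequencies about 90% of octamers contain a cytosine, and under the assumed 10% cytosine frequency the fraction is still above one half, consistent with the argument above.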
3.1 Methods to study RNA-Protein interactions
In general, the most common methods used to study RNA-protein interactions
involve cell extracts or purified proteins and in vitro transcribed RNA. Specific binding can
be studied by mobility shifts of labeled or fluorescently-probed RNA in agarose or
polyacrylamide gels (Thomson et al., 1999), filter binding (Merrick and Sonenberg, 1997),
or chromatography with matrix-bound RNAs (Allerson et al., 2003). In a similar method
known as a “north-western”, proteins are separated on polyacrylamide gels, transferred
to a membrane and probed with labeled RNA sequences of interest (Monshausen et al.,
2001). As purified proteins and RNAs can be used in these methods, they can discriminate
between direct and indirect binding. These assays all suffer from the same major
drawback in that the conditions of association vary in stringency depending on the
composition of the reaction buffers. Immunoprecipitation of RNA-binding proteins
provides a more physiological way to study RNA-protein interactions. Subsequent
northern (Takizawa and Vale, 2000) or microarray analysis (Brown et al., 2001) of
antibody-bound fractions can determine the RNAs that were associated with a given
protein prior to cell lysis. As complexes are allowed to form in cells before analysis, fewer
false interactions are likely to occur. Appending an epitope or affinity tag by genetic
manipulation enables the collection of RNA-protein complexes containing proteins for
which antibodies are not available (Takizawa and Vale, 2000). However, it is still possible
for non-specific interactions to occur after cell lysis and these methods cannot
discriminate between direct and indirect binding. In a complementary approach,
identification of proteins associated with an RNA of interest can be achieved by tagging the
RNA itself. RNAs are manipulated to contain a short aptamer sequence that binds directly
to an affinity matrix (Srisawat and Engelke, 2001; Srisawat et al., 2001) or a heterologous
RNA-binding protein that is itself tagged or can be immunoprecipitated (Watkins et al.,
2000). Co-purifying proteins can be identified by immunoblotting or mass spectrometry. In
addition to post-lysis artifacts, these methods require huge numbers of cells as starting
material and hence any observations represent only the average of a population.
To date very few methods for studying RNA-protein interactions in living cells have
been reported. The yeast three-hybrid system (Putz et al., 1996; SenGupta et al., 1996) is
the most widely used of these approaches and has been especially powerful in the
isolation of cDNAs for proteins that bind RNA sequences of interest (Jan et al., 1999;
Martin et al., 1997; Park et al., 1999; Zhang et al., 1997). In this system, a DNA-binding
domain is tethered to an RNA sequence of interest by the bacteriophage MS2 coat protein
via a specific aptamer; while a transcriptional activator is fused to an RNA-binding protein
of interest (SenGupta et al., 1996). If an interaction between the two molecules of interest
occurs, transcription of reporter genes is activated. The reporter genes enable screening
of yeast on selective growth media and semi-quantitative measurement of interactions.
The yeast three-hybrid system enables screening of cDNA libraries to identify new RNA-
binding proteins (Zhang et al., 1997), screening of RNA sequences to determine unknown
targets for a given RNA-binding protein (SenGupta et al., 1999), and delineation of
sequences important for known RNA-protein interactions (Lee et al., 1999). Because the
yeast three-hybrid system is an in vivo system which is particularly well suited for library
screening, I sought to adapt it to expand the RNA recognition code of PUFs.
3.2 Genetic selection of PUF library in yeast three-hybrid system
To identify PUF repeat variants that are able to specifically recognize cytosine, the
yeast three-hybrid system was used to link the interaction between a PUF domain and its
RNA target to a life-or-death selection in S. cerevisiae (Figure 3.1; SenGupta et al., 1996). In
this system, a fusion between the LexA DNA-binding domain and the MS2 coat protein
effectively tethers a hybrid RNA that contains MS2 recognition sites and a PUF RNA target
of interest upstream of HIS3 and lacZ reporter genes in the yeast genome. If a PUF-RNA
complex is formed, transcription of the HIS3 gene is activated to allow survival on media
lacking histidine. In addition, the RNA-protein interaction can be quantified by measuring
the activity of β-galactosidase expressed from the lacZ reporter gene (SenGupta et al.,
1996).
Figure 3.1: The yeast three-hybrid system. Diagrammatic representation of the yeast three-hybrid system used in this study. The LexA DNA operator (LexAop), LexA DNA-binding domain (LexA), bacteriophage MS2 coat protein (MS2), Nanos response element RNA (NRE), PUF protein (PUF), transcription activation domain of the yeast Gal4 transcription factor (AD), HIS3 gene (HIS3) and the lacZ reporter gene (lacZ) are shown.
3.3 Yeast three-hybrid sensitivity testing for engineering individual PUF repeats
We investigated repeat 6 of the human PUF protein, PUM1, which inherently binds
uracil in its RNA target (the nanos-response element, NRE) with high affinity, as
observed in its crystal structure (Wang et al., 2001). To test the sensitivity of the
yeast three-hybrid system, a preliminary experiment was conducted in which
the amino acids at positions 12 and 16 of repeat 6 were mutated by PCR, using primers specifying
cysteine or serine at position 12 and glutamine or glutamate at position 16,
with the purpose of altering the specificity of repeat 6 to bind adenine or guanine,
respectively (Figure 3.2).
Figure 3.2: Amino acids that determine PUF repeat binding. Amino acids of repeats 5, 6 and 7 of PUM1 bind adenine, uracil and guanine, respectively. Cysteine (C) and glutamine (Q) bind adenine; asparagine (N) and glutamine (Q) bind uracil; and serine (S) and glutamate (E) bind guanine. Amino acids at positions 12 and 16 of repeat 6 were mutated via PCR to resemble adenine- and guanine-binding repeats (exemplified by repeats 5 and 7).
The RNA expression plasmids used for this experiment were constructed by sub-
cloning pairs of annealed oligonucleotides into the multiple cloning site of the pIIIA/MS2-2
plasmid matching the following RNA sequences. Nanos response element (NRE) is the RNA
target for PUM1 (PUF recognition sequences in bold, site specific mutations underlined):
NRE: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG-3'
NREU3A: 5'-CCGGCUAGCAAUUGAAUAUAUUAAUUUAAUAAAGCAUG-3'
NREU3C: 5'-CCGGCUAGCAAUUGCAUAUAUUAAUUUAAUAAAGCAUG-3'
NREU3G: 5'-CCGGCUAGCAAUUGGAUAUAUUAAUUUAAUAAAGCAUG-3'
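Because the bold and underline formatting of these sequences does not survive in plain text, the site-specific mutation in each variant can be recovered by comparing it with the wild-type NRE construct. The following is a minimal sketch; the function and variable names are hypothetical, and the sequences are copied from the list above.

```python
NRE = "CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG"

VARIANTS = {
    "NREU3A": "CCGGCUAGCAAUUGAAUAUAUUAAUUUAAUAAAGCAUG",
    "NREU3C": "CCGGCUAGCAAUUGCAUAUAUUAAUUUAAUAAAGCAUG",
    "NREU3G": "CCGGCUAGCAAUUGGAUAUAUUAAUUUAAUAAAGCAUG",
}


def differences(reference, variant):
    """List (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, r, v)
        for i, (r, v) in enumerate(zip(reference, variant))
        if r != v
    ]


for name, seq in VARIANTS.items():
    print(name, differences(NRE, seq))
```

Each variant differs from the NRE construct at a single position, the third nucleotide of the UGUAUAUA stretch, in agreement with the U3 naming of the constructs.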
In order to generate a PUF domain that was fused to a Gal4p activation domain, a
synthetic gene encoding amino acids 828 to 1176 of the human PUM1 protein was sub-
cloned into pJC72 (Rackham and Chin, 2005). This plasmid was later used as a template for
library construction by enzymatic inverse PCR (Section 3.5; Rackham and Chin, 2005). The
yeast expression plasmid was constructed by sub-cloning the wild-type PUF gene and the
two repeat 6 variants from pJC72 into the yeast expression plasmid pGAD-RC (Ito et al.,
2000). YBZ1 was transformed with each combination of the four RNA expression plasmids and
the three pGAD-RC constructs using the lithium acetate method according to Gietz and
Woods (2002). Enhanced yellow fluorescent protein (EYFP) was used as a negative control.
The transformants were plated on SC agar lacking leucine, uracil and histidine,
supplemented with 0.5 mM 3-amino triazole.
3-Aminotriazole (3-AT) is a competitive inhibitor of the product of the HIS3 gene,
His3p. The level of resistance to 3-AT is used as a measure of the strength of RNA–protein
interaction because cells containing more His3p can survive at higher concentrations of 3-
AT. YBZ1 transformants containing both the PUF domain and RNA expression plasmids
were grown overnight in SC media lacking leucine and uracil, washed in SC media without
amino acids, diluted to OD600 of 0.1 and replica spotted onto SC media lacking leucine and
uracil (to test for cell health and plasmid maintenance) and SC agar lacking leucine, uracil
and histidine, supplemented with 0.5 mM 3-amino triazole (to test for RNA-protein
interactions).
3.3.1 Yeast three-hybrid system is sensitive for engineering PUFs
Results of the sensitivity test showed that when the amino acid at position 12 was
mutated to cysteine to resemble the adenine-binding repeat five of PUM1 (CQ), cells
survived on selective media only when the target RNA had an adenine at the position in
the RNA bound by repeat 6 (Figure 3.3). Furthermore, when we transplanted the guanine-
recognizing amino acids, serine and glutamate, from repeat seven (SE) into repeat six, the
cells survived on selective media when the target RNA had a guanine at the
position in the RNA bound by repeat 6. Results from this experiment indicate that the
system provides sufficient sensitivity to enable the engineering of individual PUF repeats.
Figure 3.3: Yeast three-hybrid sensitivity test for engineering individual PUF repeats. Specificity of the selected clones was determined by survival on medium lacking histidine and containing 0.5 mM 3-aminotriazole.
3.4 Library screening for cytosine-binding PUF
Given that the system was sensitive enough for engineering individual PUF repeats,
we proceeded to search for amino acids that were able to specifically recognize cytosine in
the context of PUFs. A library based on the PUM1 PUF domain was synthesized where
positions 12 and 16 of repeat six were randomized to encode all possible amino acids. This
was achieved using primers where the codons corresponding to amino acids 1043 and
1047 (located in repeat 6 of PUM1) were encoded by mixtures of trimer phosphoramidites
encoding all 20 amino acids. This library of mutant PUF domains was then sub-cloned from
the pJC72 plasmid into the yeast expression plasmid pGAD-RC (Ito et al., 2000). YBZ1
containing the NREU3C RNA expression plasmid was transformed with the PUF domain
library in pGAD-RC and plated on SC agar lacking leucine, uracil and histidine,
supplemented with 0.5 mM 3-amino triazole.
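The size of this library and how thoroughly the transformation samples it can be estimated as follows. The sketch assumes that all amino acid combinations are equally represented (plausible for trimer phosphoramidite mixtures, but not verified here) and uses the 6 × 10^5 primary transformants reported in Section 2.2.12. The function names and the Poisson approximation are illustrative choices.

```python
import math


def library_size(n_positions, n_amino_acids=20):
    """Number of distinct protein variants when each position is fully randomised."""
    return n_amino_acids ** n_positions


def oversampling(size, n_transformants):
    """Average number of transformants per variant."""
    return n_transformants / size


def expected_fraction_covered(size, n_transformants):
    """Poisson estimate of the fraction of variants sampled at least once."""
    return 1.0 - math.exp(-n_transformants / size)


size = library_size(2)        # positions 12 and 16 of repeat 6
transformants = 600_000       # 6 x 10^5 primary transformants (Section 2.2.12)

print(size)
print(oversampling(size, transformants))
print(expected_fraction_covered(size, transformants))
```

Under these assumptions the 400 possible variants are oversampled roughly 1500-fold, so essentially every amino acid combination should have been present in the selection.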
Unlike in the sensitivity testing, colonies that survived on media lacking histidine after
three days were picked and their plasmids were isolated and transformed into DH10B.
Transformants were screened by PCR to identify the PUF-encoding plasmid, which was sequenced and
transformed into YBZ1 to analyze the specificity of the mutant PUF domains. YBZ1
transformants containing both the PUF domain and RNA expression plasmids were grown
overnight in SC media lacking leucine and uracil, washed in SC media without amino acids,
diluted to OD600 of 0.1 and replica spotted onto both SC media lacking leucine and uracil
and SC agar lacking leucine, uracil and histidine, supplemented with 0.5 mM 3-amino
triazole.
3.4.1 Five PUF mutants interact with cytosine
Five unique PUF mutants that selectively interacted with RNAs containing a
cytosine but not adenine, guanine or uracil were identified (Figure 3.4). Notably,
all five variants had an arginine at position 16. At position 12,
four of the variants had amino acids with polar uncharged side chains (glycine, serine,
threonine and cysteine), while the fifth had
alanine, which is classified as a non-polar hydrophobic amino acid. What these residues have
in common is a small or nucleophilic side chain.
Figure 3.4: Selection of PUF repeats that can specifically recognize cytosine. (Left) Sequences of the RNA-binding regions of PUM1 repeats. The key hydrogen-bonding residues at positions 12 and 16 were randomised and combinations that could recognize cytosine were selected from the library using the yeast three-hybrid system. Specificity of the selected clones was determined by survival on media lacking histidine and containing 0.5 mM 3-aminotriazole (a His3p competitive inhibitor). (Right) Cytosine-binding clones as determined by survival on medium lacking leucine, uracil and histidine, and containing 0.5 mM 3-aminotriazole.
In addition, β-galactosidase activity expressed from the lacZ reporter gene was
assayed to quantify the strength of the RNA-protein interaction (Figure 3.5). We confirmed
that the A, G and U repeats bind their expected bases and that all five of the newly found
C-binding mutants specifically interacted with a target RNA containing cytosine. The fold
increase in β-galactosidase activity for the selected mutants in the presence of cytosine-
containing RNA was not as high as that for the wild type PUF and its cognate NRE RNA;
however, some of the cytosine-binding mutants in combination with their target RNAs
generated activities similar to the guanine-binding mutant and higher than the adenine-
binding mutant, in the presence of their cognate RNAs.
Figure 3.5: Characterization of PUF repeats that can specifically recognize cytosine. Specificity of the selected clones was quantified using β-galactosidase assays to examine activation of a lacZ reporter gene. We used the enhanced yellow fluorescent protein (EYFP) as a control to show that the activation of transcription was dependent on specific RNA-protein interactions in our experiments. Data are mean ± SEM from six independent experiments. *p < 0.01 of cognate to non-cognate RNA-protein complexes by a 2-tailed Student's t test.
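The summary statistics named in the caption of Figure 3.5 (mean ± SEM and a two-sample Student's t statistic) can be sketched as follows. All values below are hypothetical placeholders and are not measurements from this work. The function names are illustrative, and the sketch stops at the t statistic and degrees of freedom, leaving the p-value to a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev, variance


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def student_t(a, b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical luminescence readings from six experiments (arbitrary units).
cognate = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0]
non_cognate = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]

m, sem = mean_sem(cognate)
fold = m / mean(non_cognate)
t, df = student_t(cognate, non_cognate)
print(f"mean = {m:.2f}, SEM = {sem:.2f}, fold = {fold:.2f}, t = {t:.2f}, df = {df}")
```

With six experiments per group the test has ten degrees of freedom; the t statistic would then be compared against the two-tailed critical value for the chosen significance level.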
3.5 In vitro analysis of PUF-NRE interaction
To further confirm the observations in the previous section, we investigated
the specificity of the interactions between wild type and mutant PUF proteins in vitro
using RNA electrophoretic mobility shift assays (RNA EMSA). To do this, I
overexpressed the wild type PUF and two cytosine-recognizing PUF mutants (GR, with glycine
at position 12 and arginine at position 16, and CR, with cysteine at position 12 and
arginine at position 16) in E. coli and purified them to homogeneity.
3.5.1 Purifying PUF proteins
The PUF domains were first sub-cloned from the pGAD-RC yeast expression plasmid
into the pETM30 protein expression plasmid. This system expresses the PUF as a fusion to
GST and a His6 tag in cells. An initial small scale test induction was conducted to
determine the optimum conditions for soluble PUM1 protein production. pETM30-PUM1
was transformed into Rosetta 2 and BL21 cells and overnight cultures were grown in
LB in the presence of kanamycin (BL21 and Rosetta 2) and chloramphenicol (Rosetta 2
only). A five hour induction at 37 °C and an overnight induction at room temperature with
1 mM IPTG were conducted.
Figure 3.6: Test induction of pETM30-PUM1 plasmid in BL21. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate (O/N). Lane 3: Induced whole cell lysate (5 hrs). Lane 4: Induced whole cell lysate (O/N). Lane 5: Uninduced insoluble lysate (O/N). Lane 6: Induced insoluble lysate (5 hrs). Lane 7: Induced insoluble lysate (O/N). Lane 8: Uninduced soluble lysate (O/N). Lane 9: Induced soluble lysate (5 hrs). Lane 10: Induced soluble lysate (O/N).
Figure 3.7: Test induction of pETM30-PUM1 plasmid in Rosetta 2. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate (O/N). Lane 3: Induced whole cell lysate (5 hrs). Lane 4: Induced whole cell lysate (O/N). Lane 5: Uninduced insoluble lysate (O/N). Lane 6: Induced insoluble lysate (5 hrs). Lane 7: Induced insoluble lysate (O/N). Lane 8: Uninduced soluble lysate (O/N). Lane 9: Induced soluble lysate (5 hrs). Lane 10: Induced soluble lysate (O/N).
The first set of inductions showed that the PUM1 protein was successfully
produced in both conditions, with more protein being produced following overnight
induction (Figures 3.6 and 3.7). However, the proteins produced were highly insoluble
under those conditions. To determine whether the 1 mM IPTG concentration was too
high and placed the cells under too much stress, I repeated the inductions using
0.5 mM and 0.05 mM IPTG. A five hour induction at
37 °C and an overnight induction at room temperature with 1 mM IPTG were conducted.
Figure 3.8: Test induction of pETM30-PUM1 plasmid with 0.5 mM IPTG. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate (O/N). Lane 3: Induced soluble lysate (3 hrs). Lane 4: Induced soluble lysate (O/N). Lane 5: Uninduced insoluble lysate (O/N). Lane 6: Induced insoluble lysate (3 hrs). Lane 7: Induced insoluble lysate (O/N).
Figure 3.9: Test induction of pETM30-PUM1 plasmid with 0.05 mM IPTG. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate (O/N). Lane 3: Induced soluble lysate (3 hrs). Lane 4: Induced soluble lysate (O/N). Lane 5: Uninduced insoluble lysate (O/N). Lane 6: Induced insoluble lysate (3 hrs). Lane 7: Induced insoluble lysate (O/N).
Similar results to the induction at 1 mM IPTG were obtained when the cells were
induced with 0.5 mM and 0.05 mM IPTG (Figures 3.8 and 3.9). PUM1 production was
successful; however, the proteins were insoluble. I decided to use a different expression
plasmid, pTYB3, as this was the plasmid used by Wang et al. (2001) when they
successfully purified the PUM1 protein for crystallography. The PUF domains were
sub-cloned from the pGAD-RC yeast expression plasmid into the pTYB3 protein expression
plasmid. This system expresses the PUF as a fusion to an intein and chitin-binding domain.
pTYB3-PUM1 was transformed into Rosetta 2 cells and overnight cultures were grown
in LB in the presence of ampicillin and kanamycin. A five hour induction at room
temperature with 1 mM IPTG was conducted.
Figure 3.10: Test induction of pTYB3-PUM1 plasmid in Rosetta 2. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Uninduced soluble lysate. Lane 5: Induced soluble lysate. Lane 6: Uninduced insoluble lysate. Lane 7: Induced insoluble lysate.
Although the induction successfully led to the production of PUM1, all the proteins
produced were insoluble (Figure 3.10). As a final attempt, a test induction was conducted
with ER2566 cells, the host strain recommended for the expression of genes
cloned into the pTYB3 vector. pTYB3-PUM1 was transformed into ER2566 cells and
overnight cultures were grown in LB in the presence of ampicillin. A five hour and an
overnight induction at room temperature with 1 mM IPTG were conducted.
Figure 3.11: Test induction of pTYB3-PUM1 plasmid in ER2566. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate (O/N). Lane 3: Induced whole cell lysate (5 hrs). Lane 4: Induced whole cell lysate (O/N). Lane 5: Uninduced soluble lysate (O/N). Lane 6: Induced soluble lysate (5 hrs). Lane 7: Induced soluble lysate (O/N). Lane 8: Uninduced insoluble lysate (O/N). Lane 9: Induced insoluble lysate (5 hrs). Lane 10: Induced insoluble lysate (O/N).
The inductions showed that although the majority of the PUM1 protein produced
was insoluble, a small portion was soluble (lanes 6 and 7; Figure 3.11), with a
five-hour induction producing the same amount of soluble
protein as an overnight induction. Note that PUM1 protein was visible in both lanes 2
and 8 because, with the pET system, the absence of glucose in the media may lead to partial
induction as catabolite repression is not achieved (Novy and Morris, 2001). Therefore, it
was decided that future inductions and purifications with the pTYB3 plasmid were to be
conducted at room temperature for five hours with 1 mM IPTG in ER2566. I proceeded with
the large-scale (2 L) purifications of PUM1 and two cytosine-binding mutants to enable in
vitro examination via RNA electrophoretic mobility shift assays (RNA EMSAs). The proteins
were purified as per Section 2.2.7. In all cases, soluble protein was produced, free of the
intein and chitin-binding domain (Figures 3.12-3.14).
Figure 3.12: Purification of PUM1 protein. Samples were loaded on a 10% Tris-glycine gel and run for 15 mins at 80 V, then 1 hr at 130 V. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Soluble lysate after sonication. Lane 5: Insoluble protein after 3 day cleavage. Lane 6: Purified PUM1 protein. Lane 7: Chitin beads.
Figure 3.13: Purification of C-binding (GR) PUM1 protein. Samples were loaded on a 10% Tris-glycine gel and run for 15 mins at 80 V, then 1 hr at 130 V. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Soluble lysate after sonication. Lane 5: Insoluble protein after 3 day cleavage. Lane 6: Purified C-binding (GR) protein. Lane 7: Chitin beads.
Figure 3.14: Purification of C-binding (CR) PUM1 protein. Samples were loaded on a 10% Tris-glycine gel and run for 15 mins at 80 V, then 1 hr at 130 V. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Soluble lysate after sonication. Lane 5: Insoluble protein after 3 day cleavage. Lane 6: Purified C-binding (CR) protein. Lane 7: Chitin beads.
The cleaved PUF proteins were collected and further purified by an ÄKTA-Explorer
system (GE) using a Superdex 200 10/300 column (GE). The final purified proteins were
separated by SDS-PAGE and visualized by Coomassie Brilliant Blue staining (Figure 3.15).
Figure 3.15: Purified PUM1 and the two C-binding PUF proteins. (Left) Samples were loaded on a 10% Tris-glycine gel and run for 15 mins at 80 V, then 1 hr at 130 V. (Right) Gel filtration profile of cleaved PUM1 proteins purified on the Superdex 200 10/300 column by an ÄKTA-Explorer system. Purified PUM1 was collected from the highest peak (region between the dashed lines).
3.5.2 RNA electrophoretic mobility shift assay of PUF proteins
In order to confirm the binding specificity of the newly found cytosine-binding PUFs
as well as to determine the strength of their interactions with the cytosine-containing RNA
target, an in vitro assay known as the RNA electrophoretic mobility shift assay (RNA EMSA)
was conducted. The purified PUF proteins were incubated with
fluorescein-labeled RNA oligonucleotides in binding buffer for an hour; the reactions were
subsequently analyzed by 10% PAGE in TAE and fluorescence was detected using a
Typhoon TRIO scanner. The two fluorescein-labeled RNA oligonucleotides were NRE
[5'-(Fl)AUUGUAUAUA-3'] and NREU3C [5'-(Fl)AUUGCAUAUA-3']; the former is the native RNA
target of the PUM1 protein, while the latter is the target for the cytosine-binding PUFs.
Figure 3.16. Specific recognition of cytosine in vitro. (a) Selected PUF domains are specific for cytosine containing RNAs, determined by RNA electrophoretic mobility shift assays. Wild type (NQ) and mutant (CR and GR) PUF proteins were tested against uracil (NRE) or cytosine (U3C) containing RNA probes. (b) The percentage of RNA bound by varying concentrations of each protein.
The RNA EMSAs showed a striking shift in specificity between the mutant PUF domains
with their cytosine-containing target RNA and the wild-type PUF with its cognate NRE target
(Fig. 3.16). The first RNA EMSA shows wild-type PUM1 binding strongly to its NRE
target, with binding observed at 0.01 µM of PUM1, but markedly less well to the
cytosine-containing NREU3C RNA. In general, increased shifting was observed with higher
protein concentration. Comparing the RNA EMSAs of PUM1 and the C-binding CR
mutant with both RNA targets, the affinity of PUM1 for the NRE was
comparable to the affinity of the CR mutant for its NREU3C target, as binding was achieved
at 0.01 µM of CR protein. The GR mutant, on the other hand, bound its
cognate cytosine-containing RNA, albeit with lower affinity. The CR
protein also showed a much higher level of non-specific binding to the NRE probe than the
GR mutant, although it still preferentially bound NREU3C. This confirms previous
observations that engineered PUF proteins do not always bind with the same
affinities as the wild-type proteins (Cheong and Hall, 2006), indicating that future
applications will depend not only on binding preference but also on affinity for the
target RNAs.
3.6 Summary
We have successfully identified five unique PUF mutants that can selectively
interact with RNAs containing a cytosine but not adenine, guanine or uracil. This was
achieved by randomizing the amino acids at positions 12 and 16 to encode all
20 possible amino acids and screening for mutants on selection plates. All five variants had an
arginine at position 16 and either glycine, alanine, serine, threonine or
cysteine at position 12. Prior to this, we had shown that the yeast three-hybrid system
has sufficient sensitivity to enable the engineering of individual PUF repeats. Further
testing using β-galactosidase assays confirmed that the A, G and U repeats bind their
expected bases and that the five newly discovered C-binding mutants specifically
interacted with target RNA containing cytosine. Although the β-galactosidase activity for
the C-binding mutants bound to a cytosine-containing RNA was not as high as that for the
wild type PUF, some of the cytosine-binding mutants in combination with
their target RNAs generated activities similar to or better than those of the guanine-binding and
adenine-binding mutants, respectively, in the presence of their cognate RNAs.
We also confirmed previous observations showing that engineered PUF proteins
do not always bind with the same affinities as wild type proteins. In the RNA EMSAs
(Section 3.5.2), the CR mutant PUF bound its cognate cytosine-containing RNA with an
affinity comparable to that of the wild type PUF for its cognate NRE RNA, whereas the
GR mutant bound its cognate cytosine-containing RNA with lower affinity. The
next step forward is to determine if the newly found code retains its modularity and to
explore other applications of PUF proteins.
CHAPTER 4
Exploring Features of PUF-RNA Interactions
Having discovered the code for cytosine-binding PUF repeats, it would be of
interest to determine if the code has general applicability in order to ensure that the
binding properties of newly designed PUF domains are predictable. It has already been
shown that the modular nature of the interactions enables the sequence specificity of
PUFs to be altered by simply mutating the residues that make contacts with the Watson-
Crick edge of the base (Cheong and Hall, 2006). This has paved the way for PUF domains
to be engineered to recognize endogenous RNAs composed of adenine, guanine or uracil
(Wang et al., 2002; Wang et al., 2009; Tilsner et al., 2009). Here we examine if the same
modularity can be achieved with cytosine.
4.1 General applicability of cytosine-binding code
The RNA expression plasmids used for this experiment were constructed by sub-
cloning pairs of annealed oligonucleotides into the multiple cloning site of the pIIIA/MS2-2
plasmid.
NRE: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG-3';
NREU1C: 5'-CCGGCUAGCAAUCGUAUAUAUUAAUUUAAUAAAGCAUG-3';
NREG2C: 5'-CCGGCUAGCAAUUCUAUAUAUUAAUUUAAUAAAGCAUG-3';
NREU3C: 5'-CCGGCUAGCAAUUGCAUAUAUUAAUUUAAUAAAGCAUG-3';
NREU3G: 5'-CCGGCUAGCAAUUGGAUAUAUUAAUUUAAUAAAGCAUG-3';
NREA4C: 5'-CCGGCUAGCAAUUGUCUAUAUUAAUUUAAUAAAGCAUG-3';
NREU5C: 5'-CCGGCUAGCAAUUGUACAUAUUAAUUUAAUAAAGCAUG-3';
NREA6C: 5'-CCGGCUAGCAAUUGUAUCUAUUAAUUUAAUAAAGCAUG-3';
NREU7C: 5'-CCGGCUAGCAAUUGUAUACAUUAAUUUAAUAAAGCAUG-3';
NREA8C: 5'-CCGGCUAGCAAUUGUAUAUCUUAAUUUAAUAAAGCAUG-3'
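The naming of these probes can be checked programmatically. The following is a minimal Python sketch, not part of the original workflow, that compares each variant with the NRE and rebuilds its label (original base, position within the eight-base UGUAUAUA element, new base). The helper names `point_mutations` and `paired_repeat` are hypothetical. The repeat numbering assumes antiparallel recognition (base n contacted by repeat 9 - n), an assumption taken from the statement in Section 4.1.1 that the cytosine in the A4C target corresponds to repeat 5.

```python
# Reporter RNA sequences copied from the list above.
NRE = "CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG"
VARIANTS = {
    "NREU1C": "CCGGCUAGCAAUCGUAUAUAUUAAUUUAAUAAAGCAUG",
    "NREG2C": "CCGGCUAGCAAUUCUAUAUAUUAAUUUAAUAAAGCAUG",
    "NREU3C": "CCGGCUAGCAAUUGCAUAUAUUAAUUUAAUAAAGCAUG",
    "NREU3G": "CCGGCUAGCAAUUGGAUAUAUUAAUUUAAUAAAGCAUG",
    "NREA4C": "CCGGCUAGCAAUUGUCUAUAUUAAUUUAAUAAAGCAUG",
    "NREU5C": "CCGGCUAGCAAUUGUACAUAUUAAUUUAAUAAAGCAUG",
    "NREA6C": "CCGGCUAGCAAUUGUAUCUAUUAAUUUAAUAAAGCAUG",
    "NREU7C": "CCGGCUAGCAAUUGUAUACAUUAAUUUAAUAAAGCAUG",
    "NREA8C": "CCGGCUAGCAAUUGUAUAUCUUAAUUUAAUAAAGCAUG",
}

CORE = "UGUAUAUA"  # eight-base PUM1 recognition element within the NRE


def point_mutations(reference, variant, core=CORE):
    """Return labels such as 'U3C' for every base that differs from the reference.

    Positions are numbered from 1 at the first base of the core element.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    start = reference.index(core)  # 0-based offset of the core in the reference
    return [
        f"{ref_base}{i - start + 1}{var_base}"
        for i, (ref_base, var_base) in enumerate(zip(reference, variant))
        if ref_base != var_base
    ]


def paired_repeat(core_position, n_repeats=8):
    """Repeat assumed to contact a core base (assumption: antiparallel binding)."""
    return n_repeats + 1 - core_position


for name, seq in VARIANTS.items():
    (label,) = point_mutations(NRE, seq)  # each variant carries one substitution
    print(name, label, "repeat", paired_repeat(int(label[1:-1])))
```

Each variant differs from the NRE at exactly one base, and the rebuilt label matches the suffix of the probe name.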
A set of eight PUM1 mutants, in which each repeat in turn was modified to have a
glycine at position 12 and an arginine at position 16 (GR; Figure 4.1), was made and sub-
cloned from the pJC72 backbone into the yeast expression plasmid pGAD-RC. S. cerevisiae
YBZ1 was transformed with combinations of the nine RNA expression plasmids and the
set of eight pGAD-PUM1 mutant constructs. Enhanced yellow fluorescent protein (EYFP)
was used as a negative control. The transformants were plated on SC agar lacking leucine,
uracil and histidine, supplemented with 0.5 mM 3-aminotriazole.
Figure 4.1: Applicability of the C-binding code. Converting successive wild type PUM1 repeats to the cytosine-binding (GR) repeats. Green and red arrows indicate the sequential change in position of the cytosine in the probe and the position of the C-binding repeat unit, respectively.
4.1.1 The cytosine-binding code is modular
The results showed that all of these engineered PUF domains bound RNA targets
with a cytosine at the position corresponding to the mutated repeat with
higher affinity than the wild type, non-cytosine-containing RNA target (Figure 4.2). From
the β-galactosidase assay, it was observed that the magnitude of the specificity shift
varied from repeat to repeat. Binding of the A4C mutant to its RNA target, with the cytosine
corresponding to repeat 5, was the strongest, while the U1C mutant bound the weakest to its
cognate mutant RNA target, only 2-fold better than it bound the NRE. There were also
moderate interactions observed between the NRE RNA and two cytosine-mutated PUFs
(A8C and U5C), although both were approximately 3-fold weaker than the interaction
observed with the cognate mutant RNA target. This experiment confirms previous reports
that not all repeat-base interactions contribute equally to the binding energy of the RNA-
protein complex (Zamore et al., 1997; Cheong and Hall, 2006); however, all mutants
preferentially bound cytosine-containing RNAs.
Figure 4.2: C-binding code possesses modularity. Engineered PUF repeats can selectively bind cytosine at all eight positions within the RNA target. Data are mean ± SEM from six independent experiments. * p < 0.01 of cognate to non-cognate RNA-protein complexes by a 2-tailed Student’s t test.
To date, all RNA targets of PUF proteins analyzed contain a UGU trinucleotide that
is critical for binding (Zhang et al., 1997; Zamore et al., 1997; Gerber et al., 2004; Souza et
al., 1999; Lamont et al., 2004; Tadauchi et al., 2001; Crittenden et al., 2002; Wang et al.,
2002; Nakahata et al., 2001; Eckmann et al., 2004; Olivas and Parker, 2000; Jackson et al.,
2004). The binding of UGU by repeats 6-8 is highly conserved given the atypical binding mode
of repeat 7, where the asparagine side chain is not long enough to form a stacking
interaction with a base (Cheong and Hall, 2006). Cheong and Hall (2006) found that when
repeats 6 and 7 were mutated such that they recognize UUG instead of the UGU triplet typically found in
RNA targets of PUF proteins, the mutant PUF bound its cognate mutant RNA 34-fold more
tightly than wild-type RNA, whereas wild-type protein bound wild-type RNA 3,300-fold
more tightly than the mutant RNA. Zamore et al. (1997) found
that a deletion of repeats 7 and 8 essentially eliminated RNA binding. In neither study
were the authors able to fully recover the original binding affinity with the mutated PUFs.
With this knowledge, it can be deduced that re-engineering at these positions would be
difficult.
4.2 PUF-RNA interaction with increasing structure
The binding mode of PUF domains observed in different crystal structures (Wang
et al., 2002; Wang et al., 2009; Gupta et al., 2008; Zhu et al., 2009) indicates that their
RNA targets are exclusively single stranded in the RNA-protein complexes; however,
whether the RNA targets must be single stranded prior to PUF domain binding is not
known. To test this, a series of RNA variants was generated in which the
sequence downstream of the NRE was modified sequentially to place the NRE in
increasingly base-paired structures.
The RNA expression plasmids used for this experiment were constructed by sub-
cloning pairs of annealed oligonucleotides into the pIIIA/MS2-2 plasmid. Base changes
were made to promote base-pairing within the RNA target to generate an increasingly
strong hairpin structure:
NRE: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG-3';
NREstem5: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAAUAAAGCAUG-3';
NREstem6: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUUAAAGCAUG-3';
NREstem7: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUAAAAGCAUG-3';
NREstem8: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUACAAGCAUG-3'
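How the base changes increase pairing can be illustrated with a short Python sketch. It rests on an assumption that is not stated in the text: that the hairpin folds with the four bases directly 3' of the UGUAUAUA element (UUAA) as the loop, so that the eight bases after the loop face the recognition element in antiparallel register. Only Watson-Crick pairs are counted, and the function name `paired_core_bases` is hypothetical. This is a counting exercise under that assumed register, not a structure prediction.

```python
# Reporter RNA sequences copied from the list above.
RNAS = {
    "NRE": "CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG",
    "NREstem5": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAAUAAAGCAUG",
    "NREstem6": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAUUAAAGCAUG",
    "NREstem7": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAUAAAAGCAUG",
    "NREstem8": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAUACAAGCAUG",
}

CORE = "UGUAUAUA"   # PUM1 recognition element
LOOP_LENGTH = 4     # assumption: the UUAA directly 3' of the core forms the loop
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_core_bases(seq, core=CORE, loop_length=LOOP_LENGTH):
    """Count core bases that can form a Watson-Crick pair across the assumed loop."""
    start = seq.index(core)
    end = start + len(core)
    # The 3' strand of the putative stem: same length as the core, after the loop.
    partner = seq[end + loop_length:end + loop_length + len(core)]
    # Antiparallel register: the last core base faces the first partner base.
    return sum(
        (core_base, partner_base) in WATSON_CRICK
        for core_base, partner_base in zip(reversed(seq[start:end]), partner)
    )


for name, seq in RNAS.items():
    print(name, paired_core_bases(seq))
```

Under this assumed register the counts for stem5 to stem8 come out as 5 to 8, in agreement with the variant names.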
S. cerevisiae YBZ1 was transformed with the combination of the five RNA
expression plasmids with the original pGAD-PUM1 or with EYFP as a negative control. The
transformants were plated on SC agar lacking leucine, uracil and histidine, supplemented
with 0.5 mM 3-amino triazole (Figure 4.3).
Figure 4.3: PUF proteins can bind to highly structured RNA targets. The PUF domain is able to bind to RNA targets that are located within substantially double stranded structures. The number of bases from the PUF recognition site that were paired within a stem structure was increased stepwise from five (stem5) to eight (stem8). Data are mean ± SEM from six independent experiments. *p < 0.01 of cognate to non-cognate RNA-protein complexes by a 2-tailed Student’s t test.
The results revealed that the PUF protein was able to bind all of the RNA targets. It
was interesting to observe that when the RNA target was changed from a 5 base
pair-stem to a 6 base pair-stem secondary structure, the binding affinity of the PUF for its
target actually improved. The introduction of the 7 base pair-stem structure led to
reduced affinity compared to wild-type NRE. With the final 8 base pair-stem structure, in which
every base was paired in a stem, binding was still observed, albeit less efficiently. This
experiment indicates that PUF proteins are able to invade structured RNAs to bind their
target sequences. It is unclear how the PUF protein achieves this, as there are
no crystal structures of these interactions, but it can be hypothesized that invasion
occurs during the dynamic rearrangements that are intrinsic to RNA structures (Kedde et
al., 2010). This is relevant not only to the rational engineering of PUF domains but also to
naturally occurring PUF domain proteins.
4.3 Extending the PUF domain beyond eight repeats
As previously mentioned, naturally occurring PUF proteins typically contain eight
RNA-binding repeats. Even though this is adequate for them to selectively regulate
particular developmental processes in cells, they often achieve this by binding to several
different RNAs (Gerber et al., 2004). For various applications in medicine, biotechnology
and synthetic biology, it would be extremely desirable to be able to target only one
species of RNA within an entire transcriptome. To achieve such levels of sequence
discrimination, we engineered an extended PUF protein with 16 RNA-binding repeats and
assessed its binding abilities to an equally extended RNA target.
To engineer a 16 RNA-binding repeat PUF (PUFx2), we inserted sequences
encoding only the RNA-binding PUF repeats, without flanking regions, from the human
PUM1 cDNA between repeats five and six of a synthetic gene that encodes the same
protein sequence as the PUM1 cDNA but is only 78% similar at the DNA level, to avoid
potential instability of the recombinant DNA (Figure 4.4). The core C. elegans FBF
recognition sequence begins with a UGU triplet, as do all validated PUF target sequences.
The optimal binding site is 5′-UGURNNAUA-3′ (R, purine; N, any base; Bernstein et
al., 2005). Because the C. elegans FBF-1 and FBF-2 PUF proteins contain a short insertion
close to the end of repeat five, we reasoned that this region might tolerate the insertion
of extra PUF repeats. Repeats 1-8 of the human PUM1 cDNA were amplified using primers
that incorporated flanking SacI sites, digested with SacI and cloned into an engineered
SacI site that encodes amino acids 1030 and 1031 of the synthetic gene encoding the
PUM1 PUF domain.
The RNA expression plasmids used for this experiment were constructed by sub-
cloning pairs of annealed oligonucleotides into the pIIIA/MS2-2 plasmid. Two mutant
probes were made by the introduction of three cytosine bases. The NREx2 mut1 RNA has
the UGU triplet of the newly added target region mutated to CCC, whereas the NREx2 mut2 RNA
has the UGU triplet of the native NRE mutated to CCC:
NREx2: 5'-CCGGCUAGCAAUUGUUGUAUAUAAUAUAUUAAUUUAAUAAAGCAUG-3';
NREx2mut1: 5'-CCGGCUAGCAAUCCCUGUAUAUAAUAUAUUAAUUUAAUAAAGCAUG-3';
NREx2mut2: 5'-CCGGCUAGCAAUUGUCCCCUAUAAUAUAUUAAUUUAAUAAAGCAUG-3'
Figure 4.4: 16 RNA-binding repeat PUF. A PUF domain consisting of 16 RNA-binding repeats provides additional binding specificity and selectivity. The structure of the engineered 16 repeat PUF and its cognate RNA target are shown schematically. The NREx2 mut1 RNA has the UGU triplet of the newly added target region mutated to CCC, whereas the NREx2 mut2 RNA has the UGU triplet of the native NRE mutated to CCC.
S. cerevisiae YBZ1 was transformed with the combination of the five RNA
expression plasmids with the original pGAD-PUM1 with EYFP as a negative control. The
transformants were plated on SC agar lacking leucine, uracil and histidine, supplemented
with 0.5 mM 3-amino triazole.
Figure 4.5: Extended PUF binds its extended RNA target. Survival on media with 0.5 mM 3-aminotriazole and lacking histidine and β-galactosidase assays were used to determine the interaction of PUF domains and their RNA targets. Data are mean ± SEM from six independent experiments. *, p < 0.01 of cognate to non-cognate RNA-protein complexes by a 2-tailed Student’s t test.
The experiment revealed that the extended PUF not only bound
its cognate extended RNA target in yeast but also activated transcription of the β-
galactosidase reporter gene more efficiently than the eight-repeat PUF with its cognate
RNA (Figure 4.5). The wild type PUF protein bound the NREx2 and
the NREx2 mut1 RNAs equally well because the NREx2 mut1 RNA has the UGU triplet of the
newly added target region mutated to CCC, unlike the NREx2 mut2 RNA, which has the CCC mutation
introduced at the native PUF target site. Both the inserted and the flanking PUF repeats
contributed to binding affinity and selectivity, as separately mutating the UGU triplets
recognized by the two sets of repeats significantly decreased β-galactosidase activity and
growth on selective media. It would be of interest to obtain the crystal structure of the 16
repeat PUF to visualize how it binds to its extended RNA target. This
experiment shows that engineered PUF domain proteins with 16 RNA-binding
repeats can provide the means to selectively bind RNAs in higher eukaryotes that have
more complex transcriptomes.
4.4 Summary
We have successfully shown that the newly discovered cytosine-binding PUF
repeat can be used in a modular manner. This is significant as binding specificity can be
transferred to different positions in the RNA recognition sequence, enabling the design of
engineered PUFs with predictable binding. We also observed that the extent of the
specificity shift varied from repeat to repeat, consistent with other studies
reporting that not all repeat-base interactions contribute equally to the binding energy of the
RNA-protein complex (Zamore et al., 1997; Cheong and Hall, 2006).
We found that the PUF protein was able to bind to highly structured RNA targets,
which was not previously known, as other studies have only reported interactions with
RNA targets that were exclusively single stranded in the RNA-protein complexes. Finally,
we engineered an extended PUF protein with 16 RNA-binding repeats and successfully
showed that it had the ability to bind to its extended RNA target in yeast and also
activated transcription of the β-galactosidase reporter gene more efficiently than the
eight-repeat wild type PUF. Having the ability to target a specific RNA species within the
plethora of transcripts would be highly advantageous for various biotechnology and
medical applications. Engineered PUF proteins would provide unique prospects for fine-
tuning the expression of endogenous genes or transgenes as the post-transcriptional
control of gene expression is rapid and more precise (Isaacs et al., 2004; Zenklusen et al.,
2008). Additionally, the ability to regulate events such as nuclear retention or cytoplasmic
localization of mRNAs can only be achieved at the level of RNA, for instance the mRNAs
(Johnston, 2005). This new toolkit may aid us in better understanding the complex
patterns of gene expression in living cells (Filipovska and Rackham, 2008; Isaacs et al.,
2006).
CHAPTER 5
Engineering Consensus PPRs
The pentatricopeptide repeat (PPR) proteins are made up of 2-26 copies of a degenerate 35
amino acid motif that forms two anti-parallel α-helices (Schmitz-
Linneweber and Small, 2008; Small and Peeters, 2000; Ringel et al., 2011). Although PPRs
are similar in structure to tetratricopeptide repeats (TPRs), TPR proteins are responsible
for mediating protein-protein interactions, whereas PPR domains are mainly involved in RNA-
protein interactions (Small and Peeters, 2000). PPR proteins are of interest because little is
known about their modular architecture, folding and binding specificities. What
is known is that they are present in a large number of proteins that are associated with an
extensive range of biological functions involving RNA. Their repetitive nature also
indicates that they might have potential as modular RNA-binding proteins similar to PUFs.
Engineering repeat proteins composed of identical or near-identical repeats is an
appealing approach to elucidate their fundamental attributes. Such scaffolds would
consist of short and simple repeated modules, and each repeat would have identical intra-
and inter-repeat interactions (Main et al., 2003). In addition, perfect designed repeat
proteins could be more symmetrical structurally than naturally occurring repeat proteins
and adding or removing whole repeats would simply extend or shorten the protein
without disrupting its tertiary structure (Main et al., 2003). The ability to easily adjust or
modify them enables a more flexible approach to investigate repeat proteins. The
abundance of sequences accessible makes a statistical design approach a fitting strategy
as multiple sequence alignments can be conducted to aid in the identification of
functionally and structurally important residues of repeat proteins. The profound interest
in repeat proteins has already led to the successful design of consensus TPR (Main et al.,
2003), ankyrin (Kohl et al., 2003; Mosavi et al., 2002), and leucine-rich repeat (LRR)
proteins (Stumpp et al., 2003). Our aim was to use a consensus-based PPR design to
create an array of PPR proteins to decipher the RNA-binding code of these proteins and to
create a robust RNA-binding scaffold for biotechnology applications. Although during the
course of this study Barkan et al. (2012) used computational methods to deduce a code for
nucleotide recognition by PPR proteins, using the maize protein PPR10 as a model, there
is an abundance of amino acid combinations at positions 4 and 34 (6 and 1’ according to
Barkan et al.) in nature that cannot be predicted computationally. The highly insoluble
nature of naturally occurring PPRs makes them extremely challenging to study. In
addition, the auxiliary sequences in PPR10 make it difficult to determine which PPRs are
important for RNA recognition or to rationally adjust the number of RNA-binding repeats
in this protein.
5.1 Designing consensus PPR
Consensus design is defined as the engineering of a protein composed of the most
common residues at each position determined from a multiple sequence alignment
(Desjarlais and Berg, 1993). More often than not, designed consensus proteins have
been substantially more stable than the natural proteins used in the multiple sequence
alignment. The first step in constructing a consensus PPR (cPPR) was to create a PPR
multiple sequence alignment by searching and retrieving all PPR sequences (Figure 5.1).
An initial PPR consensus was made to maximize the inclusiveness of the subsequent PSI-
BLAST search used to build the final consensus. The 100 most diverse PPR sequences were obtained from
Pfam PPR record TIGR00756 and narrowed down to canonical 35 amino acid PPRs. PSI-
BLAST was used to generate the multiple sequence alignments, resulting in a position-
specific scoring matrix (PSSM; Altschul and Koonin, 1998). The NCBI Protein Reference
Sequences database was searched, as it is curated and non-redundant. A PSSM is a scoring
matrix that provides amino acid substitution scores for each position in a protein
multiple sequence alignment individually (National Center for Biotechnology Information,
2011). Positive scores specify that a particular amino acid substitution occurs more
frequently in the alignment than expected, while negative scores specify that the
substitution occurs less frequently. Most of the time, large positive scores indicate critical
functional residues (e.g. active site residues; National Center for Biotechnology
Information, 2011). Four iterations were performed, at which point no new significant
matches were identified. Iteration refers to the process whereby a protein profile is run
against a database, after which new similar sequences can be detected (Altschul and
Koonin, 1998). A new multiple alignment that includes the sequences from the previous
iteration can then be constructed, a new profile abstracted from it and a new
database search performed. The procedure can be iterated as many times as needed or
until convergence, when no new statistically significant sequences can be detected
(Altschul and Koonin, 1998). The consensus PPR PSSM was derived from 6286
sequences. A global propensity was then calculated for each amino acid at each position in the
PPR motif, defined as the ratio of the percentage occurrence of an amino
acid at a given position to its percentage occurrence in the protein database (Andrade et
al., 2001; Kajander et al., 2006). The final designed consensus PPR sequence was taken as
those residues with the highest global propensity at each position of the PPR motif.
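The global propensity calculation can be sketched in a few lines of Python. This is an illustration of the ratio described above and not the script used in this work: the function names, the four-residue toy alignment and the background frequencies are all hypothetical values invented for the example.

```python
from collections import Counter


def global_propensity(column, background):
    """Frequency of each amino acid in one alignment column divided by its
    frequency in the reference protein database (the background)."""
    counts = Counter(column)
    total = len(column)
    return {aa: (n / total) / background[aa] for aa, n in counts.items()}


def consensus(alignment, background):
    """Pick, for every column, the residue with the highest global propensity."""
    picked = []
    for column in zip(*alignment):  # iterate over alignment columns
        scores = global_propensity(column, background)
        picked.append(max(scores, key=scores.get))
    return "".join(picked)


# Hypothetical toy alignment (four sequences, four positions) and hypothetical
# background frequencies; neither is taken from the PPR data set.
toy_alignment = ["VTYN", "VSYN", "ITFN", "VTYD"]
toy_background = {
    "V": 0.07, "I": 0.06, "T": 0.05, "S": 0.07,
    "Y": 0.03, "F": 0.04, "N": 0.04, "D": 0.05,
}

print(consensus(toy_alignment, toy_background))
```

Because each frequency is divided by the background frequency, a residue that is rare in the database but enriched at a position can outscore a more common residue that occurs equally often in the column.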
Figure 5.1: Pentatricopeptide repeats. A sequence logo illustrating the characteristic amino acid composition of PPR sequences. The PPR profile was constructed from 14,466 PPRs found in the PROSITE PPR entry (PDOC51375; WebLogo). These sequences were derived from the following taxonomic groups: 86% plants, 5.7% fungi, 4.3% animals, 1.8% algae, 1% trypanosomes and 1.2% others. Amino acids are colour coded according to the physicochemical properties of their side chains: small (A, G) in black, nucleophilic (C, S, T) in blue, hydrophobic (I, L, V, M, P) in green, aromatic (F, W, Y) in red, acidic (D, E) in purple, amides (Q, N) in pink, and basic (H, K, R) in orange. Regions of alpha helical structure are shown below. Amino acids are numbered based on the Pfam model, which functions as a minimal unit. Residue 34 is also defined as ii according to Kobayashi et al. (2012) while the numbering scheme used by Fujii et al. (2012) is shifted to the N-terminus by two amino acids such that amino acids 1, 4 and 34 in the Pfam model are annotated as 3, 6 and 1, respectively.
The one exception was position 11, where cysteine was replaced by glycine to
exclude the possibility of undesirable oxidation/reduction reactions that may interfere
with folding. The final cPPR protein was designed with 8 repeats because (i) it is of
a manageable size, (ii) it builds on previous experience working with PUFs, (iii) it strikes a
balance between effective binding and non-specific association, and (iv) it is predicted to be
able to bind a contiguous RNA. In addition to the repeated consensus units, two extra
features were added to the consensus proteins: nucleating N-terminal (Met-Gly-Asn-
Ser) residues and a C-terminal solvating helix, as used by Main et al. (2003). The
design was modeled on the consensus TPR design used by Main et al. (2003) to engineer
novel proteins by arraying various numbers of an idealized TPR motif. They found that the
proteins were stable, possessed native-like properties (such as reversible thermal
denaturation transitions) and formed the desired TPR fold.
cPPRcaps amino acid sequence
Feature: Sequence
N-cap: GAPMGNS
PPR repeat (x8): VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV
Half repeat*: VTYNTLISGLGKAG
*Half repeat refers to the solvating helix sequence.
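The layout of the designed protein can be made concrete with a small Python sketch that joins the segments listed above. It assumes that the N-cap, the eight identical repeats and the half repeat are joined directly in that order with no linker residues, which the table implies but does not state; the function name `build_cppr` is hypothetical.

```python
# Sequence segments copied from the cPPRcaps table above.
N_CAP = "GAPMGNS"
PPR_REPEAT = "VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV"
HALF_REPEAT = "VTYNTLISGLGKAG"  # C-terminal solvating helix


def build_cppr(n_repeats=8):
    """Join N-cap, n identical consensus repeats and the half repeat.

    Assumption: the segments are joined directly, without linker residues.
    """
    return N_CAP + PPR_REPEAT * n_repeats + HALF_REPEAT


protein = build_cppr()
print(len(PPR_REPEAT), len(protein))

# Residues at positions 4, 11 and 34 of the repeat (1-based numbering).
print(PPR_REPEAT[3], PPR_REPEAT[10], PPR_REPEAT[33])
```

With these segments the repeat is 35 residues long, position 11 is the glycine that replaced cysteine, and the half repeat is identical to the first 14 residues of the repeat. Changing `n_repeats` lengthens or shortens the array without altering the caps.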
The Met-Gly-Asn-Ser N-terminal cap was used because statistically, Gly, Asn, and
Ser have the highest propensities to occur at the N″, N′, and N cap positions in α helices
(Kumar and Bansal, 1998; Aurora and Rose, 1998; Dasgupta and Bell, 1998; Richardson
and Richardson, 1993). The C-terminal solvating helix (referred to as the half repeat) was added
after the final consensus repeat with the rationale of increasing the solubility of the protein (Main et al.,
2003).
The binding preferences of the cPPRcaps protein were tested in vitro via RNA
electrophoretic mobility shift assays (RNA EMSA) between the consensus PPR protein and
four RNA probes [poly(A), poly(G), poly(C) and poly(U)]. To do this, I
overexpressed the cPPRcaps poly(U) protein in E. coli and purified it to homogeneity.
5.1.1 Purifying cPPRcaps protein
The synthetic cPPRcaps domain was first sub-cloned from the pMK-RQ plasmid into
the pTYB3 protein expression plasmid. This system expresses the consensus PPR proteins
as a fusion to an intein and chitin-binding domain in E. coli ER2566 cells. The proteins were
purified as per Section 2.2.7.
Figure 5.2: Purification of cPPRcaps protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Soluble lysate after sonication. Lane 6: Unbound soluble protein after chitin bead binding. Lane 7: Purified cPPRcaps protein. Lane 8: Chitin beads.
5.1.2 RNA Electrophoretic Mobility Shift Assay of the cPPRcaps protein
In order to determine the binding specificity of the purified cPPRcaps protein, an in
vitro assay known as the RNA-electrophoretic mobility shift assay (RNA EMSA) was
conducted. This assay was conducted by incubating the purified cPPR proteins with
fluorescein-labeled RNA oligonucleotides in binding buffer for an hour; the reactions were
subsequently analyzed by 10% PAGE in TAE and fluorescence was detected using a
Typhoon TRIO scanner. The four fluorescein-labeled RNA oligonucleotides were poly(G) [5’-
(Fl)AAGGGGGGGG-3’], poly(C) [5’-(Fl)CCCCCCCCCC-3’], poly(U) [5’-(Fl)UUUUUUUUUU-3’]
and poly(A) [5’-(Fl)AAAAAAAAAA-3’]. In vitro assays with poly(G) probes are
challenging because G tracts tend to form stable quadruplex structures
(Kobayashi et al., 2011). Indeed, no fluorescence could be detected from a fluorescein-labeled
RNA consisting of 10 Gs. To bypass this limitation, we used a probe in which a run of 8 Gs
was linked to the fluorescein label via two adenine residues. This probe had readily
detectable fluorescence and was used in all RNA EMSAs as the “polyG” probe.
Figure 5.3: RNA probe recognition of cPPRcaps in vitro. (a) cPPRcaps is specific for uracil-containing RNAs, as determined by RNA EMSA. cPPRcaps protein was tested against adenine-, uracil-, cytosine- and guanine-containing RNA probes.
The RNA EMSAs showed specific binding between the cPPRcaps protein and the
poly(U) target RNA compared to the other three RNA probes, poly(A), poly(G) and poly(C)
(Figure 5.3). The topmost RNA EMSA, with the poly(U) probe, shows the cPPRcaps poly(U)
protein binding strongly to the RNA target, with RNA-protein complexes forming at 0.08 µM
protein. This shows that the combinatorial amino acid code for uracil recognition is
contained within the consensus PPR protein we designed. After I obtained this result, the
study by Barkan et al. (2012) was published, showing that residues 4 and 34 of PPRs are
responsible for their binding specificity. According to the code elucidated by Barkan et al.
(2012), the consensus repeat, which has Asn (N) at position 4 and Asp (D) at position 34
(N4D34), will bind U. Therefore, given that all 8 repeats are identical, the purified cPPR
would be predicted to bind a poly(U) RNA target. Here I confirmed that
prediction and provided validation for their proposed code. Additionally, the cPPR design
could be used as a scaffold to find other combinations of amino acids that
specifically bind other bases.
5.2 In vitro analysis of other cPPR interactions based on Barkan et al. (2012)
The next set of cPPR repeats was designed according to the code described by
Barkan et al. (2012). This code specifies that (i) Thr (T) at position 4 and Asp (D) at position
34 (T4D34) will bind guanine, (ii) Thr (T) at position 4 and Asn (N) at position 34 (T4N34) will
bind adenine, and (iii) Asn (N) at position 4 and Asn (N) at position 34 (N4N34) will bind
cytosine or uracil. The binding preferences of the cPPRcaps poly(A), poly(G) and poly(U/C)
proteins were tested in vitro via RNA EMSA between the consensus PPR proteins and
four RNA probes. The cPPR proteins were overexpressed in E. coli and purified to
homogeneity.
cPPRcaps amino acid sequences
cPPRcaps     N-Cap     PPR repeat (x8)                        Half repeat*
poly(A)      GAPMGNS   VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPNV    VTYTTLISGLGKAG
poly(G)      GAPMGNS   VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPDV    VTYTTLISGLGKAG
poly(U/C)    GAPMGNS   VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPNV    VTYNTLISGLGKAG
*half repeat refers to the solvating helix sequence.
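As a minimal sketch of how the code quoted above from Barkan et al. (2012) is read off a repeat, the following Python snippet looks up the residues at positions 4 and 34 (Pfam numbering) and returns the predicted base. The names BARKAN_CODE and predict_base are hypothetical, and the table contains only the four combinations stated in this chapter.

```python
# (residue at position 4, residue at position 34) -> predicted base(s),
# restricted to the combinations quoted in the text from Barkan et al. (2012).
BARKAN_CODE = {
    ("T", "D"): "G",
    ("T", "N"): "A",
    ("N", "D"): "U",
    ("N", "N"): "C/U",
}


def predict_base(repeat):
    """Return the predicted base for one 35-residue repeat, or None if unlisted."""
    key = (repeat[3], repeat[33])  # positions 4 and 34 in 1-based numbering
    return BARKAN_CODE.get(key)


# Repeats copied from the sequence tables in this chapter.
repeats = {
    "cPPRcaps": "VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV",
    "poly(A)": "VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPNV",
    "poly(G)": "VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPDV",
    "poly(U/C)": "VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPNV",
}

for name, seq in repeats.items():
    print(name, predict_base(seq))
```

The lookup gives only the prediction from the published code; whether a given protein actually binds its predicted probe is what the RNA EMSAs in the following sections test.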
5.2.1 Purification of cPPRcaps poly(A) protein
The synthetic cPPR poly(A) domain was first sub-cloned from the pMK-RQ plasmid
into the pTYB3 protein expression plasmid. This system expresses the consensus PPR
proteins as a fusion to an intein and chitin-binding domain in E. coli ER2566 cells. The
proteins were purified as per Section 2.2.7 (Figure 5.4).
Figure 5.4: Purification of cPPRcaps poly(A) protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(A) protein. Lane 7: Chitin beads.
5.2.2 RNA Electrophoretic Mobility Shift Assay of the cPPRcaps poly(A) protein
In order to confirm the binding specificity of the purified cPPRcaps poly(A)
protein, an RNA EMSA was conducted (as described above in Section 5.1.2).
Figure 5.5: RNA probe recognition of cPPRcaps poly(A) in vitro. (a) cPPRcaps poly(A) is specific for adenine-containing RNAs, as determined by RNA EMSA. cPPRcaps poly(A) protein was tested against adenine-, uracil-, cytosine- and guanine-containing RNA probes.
The RNA EMSAs showed highly specific binding between the cPPRcaps poly(A)
protein and its poly(A) target RNA compared to the other three RNA probes, poly(U),
poly(G) and poly(C) (Figure 5.5). The third RNA EMSA from the top, with the poly(A) probe,
shows the cPPRcaps poly(A) protein binding strongly to its target, with RNA-protein
complexes forming at 0.01 µM protein. Hence, the combinatorial amino acid code for
adenine recognition is T4N34.
5.2.3 Purification of cPPRcaps poly(G) protein
The synthetic cPPRcaps poly(G) domain was first sub-cloned from the pMK-RQ plasmid
into the pTYB3 protein expression plasmid. The proteins were purified as per Section 2.2.7
(Figure 5.6).
Figure 5.6: Purification of cPPRcaps poly(G) protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(G) protein. Lane 7: Chitin beads.
5.2.4 RNA Electrophoretic Mobility Shift Assay of the cPPRcaps poly(G) protein
In order to confirm the binding specificity of the purified cPPRcaps poly(G)
protein, an RNA EMSA was conducted (as described above in Section 5.1.2).
Figure 5.7: RNA probe recognition of cPPRcaps poly(G) in vitro. (a) cPPRcaps poly(G) is not specific for any nucleotide probes, determined by RNA EMSA. cPPRcaps poly(G) protein was tested against adenine, uracil, cytosine and guanine containing RNA probes.
The RNA EMSAs showed no shifts between the cPPRcaps poly(G) protein and its
poly(G) target RNA (Figure 5.7). No shifts were observed between the cPPRcaps poly(G)
protein and the other three RNA probes either.
5.2.5 Purification of cPPRcaps poly(U/C) protein
The synthetic cPPRcaps poly(U/C) domain was first sub-cloned from the pMK-RQ
plasmid into the pTYB3 protein expression plasmid. The proteins were purified as per
Section 2.2.7.
Figure 5.8: Purification of cPPRcaps poly(U/C) protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(U/C) protein. Lane 7: Chitin beads.
The cPPRcaps poly(U/C) protein is highly insoluble and was not successfully
purified (Figure 5.8).
5.3 In vitro analysis of other consensus PPR combinations
In order to discover combinations of amino acids at positions 4 and 34 that might
specifically bind G or C, we tested whether either Gly (G) or Ser (S) at position 4 with Asp (D)
at position 34 (G4D34 or S4D34) would bind guanine, and whether Asn (N) at position 4 with
either Ser (S) or Thr (T) at position 34 (N4S34 or N4T34) would bind cytosine. The binding
preferences of the cPPRcaps poly(G) [GD/SD] and cPPRcaps poly(C) [NS/NT] proteins were
tested in vitro via RNA EMSA between the consensus PPR proteins and four RNA probes.
The cPPR proteins were overexpressed in E. coli and purified to homogeneity.
cPPRcaps amino acid sequences
cPPRcaps      N-Cap     PPR repeat (x8)                        Half repeat*
PolyC (NT)    GAPMGNS   VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPTV    VTYNTLISGLGKAG
PolyC (NS)    GAPMGNS   VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPSV    VTYNTLISGLGKAG
PolyG (GD)    GAPMGNS   VTYGTLISGLGKAGRLEEALELFEEMKEKGIVPDV    VTYGTLISGLGKAG
PolyG (SD)    GAPMGNS   VTYSTLISGLGKAGRLEEALELFEEMKEKGIVPDV    VTYGTLISGLGKAG
*half repeat refers to the solvating helix sequence.
5.3.1 Purification of cPPRcaps poly(C) [NS/NT] protein
The synthetic cPPRcaps poly(C) [NS/NT] domains were first sub-cloned from the
pMK-RQ plasmid into the pTYB3 protein expression plasmid. The proteins were purified as
per Section 2.2.7 (Figures 5.9 and 5.10).
Figure 5.9: Purification of cPPRcaps poly(C) [NS] protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(C) [NS] protein. Lane 7: Chitin beads.
Figure 5.10: Purification of cPPRcaps poly(C) [NT] protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(C) [NT] protein. Lane 7: Chitin beads.
5.3.2 RNA Electrophoretic Mobility Shift Assay of the cPPRcaps poly(C) [NS/NT] proteins
In order to confirm the binding specificity of the purified cPPRcaps poly(C) [NS/NT]
proteins, an RNA EMSA was conducted (as described above in Section 5.1.2).
Figure 5.11: RNA probe recognition of cPPRcaps poly(C) [NS] in vitro. (a) cPPRcaps poly(C) [NS] is specific for cytosine-containing RNAs, as determined by RNA EMSA. cPPRcaps poly(C) [NS] protein was tested against adenine-, uracil-, cytosine- and guanine-containing RNA probes.
Figure 5.12: RNA probe recognition of cPPRcaps poly(C) [NT] in vitro. (a) cPPRcaps poly(C) [NT] is specific for cytosine-containing RNAs, as determined by RNA EMSA. cPPRcaps poly(C) [NT] protein was tested against adenine-, uracil-, cytosine- and guanine-containing RNA probes.
Both RNA EMSAs showed specific binding between the cPPRcaps poly(C) [NS] and
[NT] variants and their poly(C) target RNA compared to the other three RNA probes, poly(A),
poly(G) and poly(U) (Figures 5.11 and 5.12). cPPRcaps poly(C) [NS] appears to have a slightly
stronger binding affinity than cPPRcaps poly(C) [NT], as binding was detected at 0.08
µM protein compared to 0.16 µM for the latter. This experiment shows that the
amino acid combinations that enable cytosine recognition are N4T34 and N4S34.
5.3.3 Purification of cPPRcaps poly(G) [GD/SD] proteins
The synthetic cPPRcaps poly(G) [GD/SD] domains were first sub-cloned from the
pMK-RQ plasmid into the pTYB3 protein expression plasmid. The proteins were purified as
per Section 2.2.7 (Figures 5.13 and 5.14).
Figure 5.13: Purification of cPPRcaps poly(G) [GD] protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(G) [GD] protein. Lane 7: Chitin beads.
Figure 5.14: Purification of cPPRcaps poly(G) [SD] protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(G) [SD] protein. Lane 7: Chitin beads.
5.3.4 RNA Electrophoretic Mobility Shift Assay of the cPPRcaps poly(G) [GD/SD] proteins
In order to confirm the binding specificity of the purified cPPRcaps poly(G) [GD/SD]
proteins, an RNA EMSA was conducted (as described above in Section 5.1.2).
Figure 5.15: RNA probe recognition of cPPRcaps poly(G) [SD] in vitro. Neither cPPRcaps poly(G) [SD] nor cPPRcaps poly(G) [GD] (not shown) is specific for any RNA, as determined by RNA EMSA. cPPRcaps poly(G) [SD] protein was tested against guanine and three other RNA probes (not shown: adenine, uracil, cytosine).
Neither cPPRcaps poly(G) [SD/GD] protein bound the poly(G) RNA probe
(Figure 5.15). No binding was observed between the cPPRcaps poly(G) variants and the
other three RNA probes either (data not shown). It was speculated that the poly(G) RNA
probe may have been compromised, as G homopolymers in RNA are known to form highly
stable quadruplex structures (Pochon and Michelson, 1965; Simonsson, 2001).
5.4 Summary
We constructed a stable and soluble consensus PPR architecture and used it to
uncover the amino acid code for RNA binding by PPR proteins. We validated the results of
two previous studies, Kobayashi et al. (2012) and Barkan et al. (2012), and showed that
amino acids at positions 4 and 34 are important for nucleotide recognition. The code,
similar to that described by Barkan et al. (2012), specifies that N4D34 binds uracil, T4N34
binds adenine and N4S34 binds cytosine. Furthermore, I showed for the first time that
repeats with N4T34 specifically bind cytosine.
Figure 5.16: The recognition code of PPRs for RNA bases. On the left is the code described by Barkan et al. (2012). Highlighted in blue is our newly found N4T34 code that specifies cytosine binding.
My designed consensus PPRs have revealed that the code for RNA recognition functions
in a highly reduced protein consisting almost entirely of only 8 PPRs. We also showed that
PPRs do not all bind their target RNAs with the same affinity, as seen with cPPRcaps
poly(A) compared to cPPRcaps poly(U). The differences in RNA binding affinity of the
proteins may indicate that there are PPR motifs of low and high RNA binding affinities
(Kobayashi et al., 2012), with each nucleotide recognition having different contributions to
the overall binding affinity. Kobayashi et al. (2012) found that having several PPR motifs
led to higher binding affinities compared to proteins with a single PPR motif, suggesting
that there may be cooperative effects between motifs during RNA binding,
rather than a simple sum of individual motif affinities. These interactions may be similar to
those observed with RRMs, whereby the combination of two or more RNA binding
domains significantly increases the affinity for the target (Clery et al., 2008). The next step
is to determine whether the newly found code retains its modularity and to investigate
other potential applications of PPR proteins.
CHAPTER 6
Engineering Designer PPR proteins
Here I investigated the modularity of the PPR code and sought to determine whether the
consensus PPR can be recoded to bind a specific target, akin to the predictable modularity
of the PUF proteins (Wang et al., 2002; Wang et al., 2009; Tilsner et al., 2009). I
investigated the possibility of engineering PPR proteins to recognize endogenous RNAs
and whether they can be used to modulate gene expression. A PPR protein was designed to
bind the NRE, the native target of the PUM1 protein, which enabled me to compare
and contrast the binding characteristics of PUF and PPR proteins. Next I used a PPR to
target the poly(A) tails of mRNAs encoded by the mitochondrial genome in human cells.
As little is known about the role of the poly(A) tails of mammalian mitochondrial mRNAs
(Nagaike et al., 2005), this strategy would be very useful for studying their role in gene
expression.
6.1 Design of a consensus PPR protein that binds the NRE RNA
The cPPRcaps NRE was designed according to the Pfam model
(http://Pfam.sanger.ac.uk/, PF01535), which defines the 1st amino acid of the PPR motif as
the start of helix A, a valine. Kobayashi et al. (2012) also showed that
proteins based on the Pfam model displayed definite RNA binding activities based on the
apparent KD, with position 34 being important for PPR function. To determine whether the
binding specificity of PPRs can be engineered using the predicted PPR code, I designed a PPR
protein that should bind the NRE, the native RNA target of PUM1. The cPPRcaps NRE
(GeneArt) was cloned into the E. coli expression vector pTYB3. The binding specificity of
the cPPRcaps NRE protein was tested in the presence of two RNA probes (NRE and
NREU3C) in vitro using an RNA electrophoretic mobility shift assay (RNA EMSA). Initially, I
overexpressed the cPPRcaps NRE protein in E. coli and purified it to homogeneity.
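The design step described above, choosing the residues at positions 4 and 34 of each repeat from the base it should recognise, can be sketched as follows. This is an illustrative reconstruction rather than the procedure actually used. The names CODE, TEMPLATE, design_repeats and design_protein are hypothetical. The target string UGUAUAUA is read from the predicted-base column of the cPPRcaps NRE table in this section. The pair used for cytosine (N4S34) is one of the two combinations reported in Chapter 5 and is included only for completeness.

```python
# Base -> (residue at position 4, residue at position 34), from Chapter 5.
CODE = {"U": ("N", "D"), "G": ("T", "D"), "A": ("T", "N"), "C": ("N", "S")}

N_CAP = "GAPMGNS"
HALF_REPEAT = "VTYNTLISGLGKAG"
# Consensus repeat with placeholders at positions 4 and 34.
TEMPLATE = "VTY{p4}TLISGLGKAGRLEEALELFEEMKEKGIVP{p34}V"


def design_repeats(target_rna):
    """Return one 35-residue repeat per base of target_rna, in the same order."""
    repeats = []
    for base in target_rna.upper():
        p4, p34 = CODE[base]  # raises KeyError for anything other than A, C, G, U
        repeats.append(TEMPLATE.format(p4=p4, p34=p34))
    return repeats


def design_protein(target_rna):
    """Join the N-cap, the designed repeats and the half repeat."""
    return N_CAP + "".join(design_repeats(target_rna)) + HALF_REPEAT


nre_repeats = design_repeats("UGUAUAUA")
for number, repeat in enumerate(nre_repeats, start=1):
    print(f"Repeat{number}", repeat)
```

Because each repeat is chosen independently of its neighbours, the sketch assumes the code is fully modular, which is the property the experiments in this chapter set out to test.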
cPPRcaps NRE
Feature         Amino acid sequence                    Predicted base
N-Cap           GAPMGNS
Repeat1         VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV    U
Repeat2         VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPDV    G
Repeat3         VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV    U
Repeat4         VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPNV    A
Repeat5         VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV    U
Repeat6         VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPNV    A
Repeat7         VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV    U
Repeat8         VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPNV    A
Half repeat*    VTYNTLISGLGKAG
*half repeat refers to the solvating helix sequence.
6.1.1 Purifying cPPRcaps NRE proteins
The synthetic cPPRcaps NRE domain was first sub-cloned from the pMK-RQ plasmid
into the pTYB3 protein expression plasmid. This system expresses the consensus PPR
proteins as a fusion to an intein and chitin-binding domain in E. coli ER2566 cells. The
proteins were purified as per Section 2.2.7. A large fraction of cPPRcaps NRE was soluble
and could be purified effectively (Figure 6.1).
Figure 6.1: Purification of cPPRcaps NRE protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps NRE protein. Lane 7: Chitin beads.
6.1.2 RNA electrophoretic mobility shift assay of the cPPRcaps NRE protein
To confirm the binding specificity of the purified cPPRcaps NRE protein, an RNA
EMSA was conducted with the fluorescein-labeled RNA oligonucleotides NRE and NREU3C,
in parallel with the PUM1 protein as a comparison of binding specificity.
Figure 6.2: Comparison between PUM1 and cPPRcaps NRE in vitro. Both proteins were tested against the wild type NRE and cytosine-containing RNA probes.
The RNA EMSA revealed a clear difference in specificity: the cPPRcaps NRE
bound its cognate NRE RNA with an affinity very similar to that of the wild type PUF for the
NRE RNA, while it bound the cytosine-containing RNA with lower
affinity. Unlike the cPPRcaps NRE, the PUM1 protein has a higher binding stringency, and
no NREU3C-PUM1 complex could be visualized in this experiment.
6.2 Mammalian mitochondrial RNA metabolism
The mammalian mitochondrial DNA (mtDNA) is 16.5 kb in size, circular and double
stranded, and encodes 13 proteins (Smeitink et al., 2001). These proteins are all
subunits of the mitochondrial ATP synthase or of respiratory complexes I, III
and IV, which together with Complex II form the electron transport chain in the
mitochondrial inner membrane (Smeitink et al., 2001). The electron transport chain (ETC) and
the ATP synthase are responsible for producing energy via oxidative phosphorylation.
Nuclear and mitochondrial gene expression requires cooperative regulation in response to
energy production. This is because the mitochondrial respiratory complexes are mainly
composed of nuclear encoded polypeptides that are imported into mitochondria
post-translationally, with their proper assembly and function dependent on the expression of
the mitochondria encoded polypeptides (Smeitink et al., 2001). The mitochondrial
genome is also dependent on nuclear encoded proteins for replication, repair,
transcription, and translation (Shoubridge, 2001). Recently, a study by Mercer et al. (2011)
showed that mitochondrial gene expression is significantly correlated with nuclear gene
expression, reinforcing the close coordination between both genomes in relation to the
energy needs of different tissues.
Mammalian mitochondrial mRNAs differ from nuclear encoded mRNAs in that they (i)
lack both 5’ and 3’ untranslated regions (UTRs), (ii) have no Shine-Dalgarno
sequences, (iii) lack 5’ 7-methylguanosine caps, and (iv) do not include introns or base
modifications (Montoya et al., 1981). Other interesting features are that mitochondrial
mRNAs use a non-universal genetic code in which AUA codes for methionine instead of
isoleucine, AGA and AGG are used as termination codons instead of coding for arginine,
and UGA encodes tryptophan instead of a stop codon (Barrell et al., 1979). In a
majority of mature mammalian mitochondrial mRNAs the start codon begins at the first
nucleotide at the 5’ end; nine open reading frames (ORFs) use AUG as their start codons,
three ORFs use AUA, and one ORF uses AUU to encode the first methionine in mammalian
mitochondrial encoded proteins (Mercer et al., 2011). The three ORFs that are the
exception to this rule encode MT-ATP8, MT-ND1 and MT-CO1, which have 1, 2 and
3 nucleotides preceding their start codons, respectively (Mercer et al., 2011). The second
ORFs of the two bicistronic transcripts MT-ND4L/ND4 and MT-ATP8/ATP6 have a prominent
5’ leader sequence compared to the remaining mRNAs. Although all these features result in a
more compact mitochondrial genome, their significance remains to be determined. In
addition, the mechanism by which these mRNAs are translated and recognized by
mitochondrial ribosomes is still unknown (Rackham et al., 2012).
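The codon reassignments listed above can be captured in a small lookup table. The sketch below covers only the four codons named in the text and is not a complete genetic code. The names UNIVERSAL, MITOCHONDRIAL and reassignment are hypothetical.

```python
# Codons whose meaning differs between the universal code and the mammalian
# mitochondrial code, as listed in the text above.
UNIVERSAL = {"AUA": "Ile", "AGA": "Arg", "AGG": "Arg", "UGA": "Stop"}
MITOCHONDRIAL = {"AUA": "Met", "AGA": "Stop", "AGG": "Stop", "UGA": "Trp"}


def reassignment(codon):
    """Return (universal, mitochondrial) meanings, or None if not reassigned."""
    codon = codon.upper().replace("T", "U")  # accept DNA-style input as well
    if codon not in MITOCHONDRIAL:
        return None
    return UNIVERSAL[codon], MITOCHONDRIAL[codon]


for codon in ("AUA", "AGA", "AGG", "UGA", "AUG"):
    print(codon, reassignment(codon))
```

Codons that are read identically in both codes, such as AUG, return None here because they are outside the scope of the table.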
One feature shared with nuclear mRNAs is
polyadenylation. With the exception of MT-ND6 mRNA, all mitochondrial mRNAs are
polyadenylated at the 3’ end (Temperley et al., 2010; Slomovic et al., 2005; Mercer et al.,
2011). 3’ polyadenylation is crucial for the maturation of mitochondrial mRNAs because
seven of the 11 mRNAs use it to complete their termination codons (Temperley et
al., 2010; Bobrowicz et al., 2009; Borowski et al., 2010; Gagliardi et al., 2004; Nagaike et
al., 2008). Approximately 45 nucleotides are added onto the 10 polyadenylated
mammalian mRNAs, but the length may vary depending on the specific mRNA and,
for the same mRNA, between different cell types
(Temperley et al., 2010). For mRNAs such as MT-CO1 and MT-ND6, it has been proposed
that the 3’ UTRs facilitate the proper termination of protein synthesis by establishing
strong secondary structures (Temperley et al., 2010). Recently, Rackham et al. (2011)
found that not only was the 3’ UTR of MT-ND5 significantly longer than those of MT-CO1
and MT-CO2 (which are not as extensively polyadenylated as the other nine mRNAs), but the
3’ UTR of this mRNA also acts as a stable long non-coding RNA, although its function is not
known.
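The way polyadenylation can complete a termination codon is easy to show on a toy sequence. In the sketch below, an open reading frame that ends in an incomplete codon (U or UA) gains a full UAA stop codon once the poly(A) tail is added. The function name complete_stop_codon and the example sequences are hypothetical and are not real mitochondrial transcripts; the default tail length of 45 follows the approximate figure quoted above.

```python
def complete_stop_codon(orf_rna, tail_length=45):
    """Append a poly(A) tail and return (tailed RNA, codon formed at the ORF end).

    The codon is None when the ORF length is already a multiple of three.
    """
    remainder = len(orf_rna) % 3  # leftover nucleotides after the last full codon
    tailed = orf_rna + "A" * tail_length
    if remainder == 0:
        return tailed, None
    start = len(orf_rna) - remainder
    return tailed, tailed[start:start + 3]


# Toy ORFs ending in an incomplete codon.
for orf in ("AUGCCAU", "AUGCCAUA", "AUGCCAUAA"):
    tailed, codon = complete_stop_codon(orf)
    print(orf, codon)
```

In the first two toy cases the last one or two nucleotides of the transcript are combined with adenines from the tail to give UAA, while the third already carries a complete codon and needs no contribution from the tail.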
A study by Slomovic et al. (2005) found variations in RNA
polyadenylation frequencies that do not correlate with RNA abundance, indicating that
some mitochondrial RNAs may exist in a non-polyadenylated form. mRNAs such as
MT-ND4L/ND4, which require polyadenylation to complete an in-frame stop codon (Ojala et
al., 1981), were found to be diminished in fractions depleted of polyadenylated transcripts,
while others like MT-CO1 mRNA were found to be enriched in these fractions. This
indicates that mRNAs which require polyadenylation to complete an in-frame stop codon
are stabilized by polyadenylation of their 3’ end. Wydro et al. (2010) imported a
cytoplasmic poly(A)-specific 3’ → 5’ exoribonuclease (PARN) into
mitochondria in order to remove the poly(A) tails. They found that this
stabilized some mRNAs and destabilized others, while a third group appeared to be
unaffected. This shows that the role of mitochondrial mRNA polyadenylation in
mammals is not uniform and appears to vary between specific mRNAs. Likewise in
cells, reduction of the mitochondrial poly(A) polymerase (PAPD1) destabilizes MT-CO1,
MT-CO2, MT-CO3 and MT-ATP8/ATP6 mRNAs but has no effect on MT-ND3
mRNA (Nagaike et al., 2005), whereas a decrease in the length of the poly(A) tail of MT-ND1
mRNA by 2’-phosphodiesterase (PDE12) has been shown to increase its abundance
(Rorbach et al., 2011). The significance of polyadenylated and non-polyadenylated mRNAs
remains unclear, but these findings suggest that there are indeed two individual isoform
pools.
Polyadenylation has been shown to affect mitochondrial translation, but its exact
role in mRNA stability and translation in mammalian mitochondria needs to be
investigated further (Wydro et al.,
2005; Rorbach et al., 2011; Ruzzenente et al., 2011). As an initial step towards
manipulating the mitochondrial transcriptome at the polyadenylation level, I designed a
mitochondrially targeted, poly(A)-recognizing PPR protein to bind mitochondrial mRNAs in
cells.
6.2.1 cPPRcaps poly(A) reduces the translation of mitochondrially encoded proteins
In Chapter 5, the RNA EMSA showed that cPPRcaps poly(A) has a
very strong affinity for its cognate poly(A) RNA (Figure 5.5). I fused this cPPRcaps poly(A)
protein to a mitochondrial targeting signal derived from ornithine transcarbamylase (OTC;
Mori et al., 1982; Horwich et al., 1986) so that I could use it to bind the poly(A) tails of mature
mitochondrial mRNAs in cells. OTC is a nuclear-encoded mitochondrial matrix enzyme
whose leader peptide directs mitochondrial localization, both in vitro and in intact cells
(Horwich et al., 1986). The precursor is recognized by mitochondria and translocated in an
energy-dependent fashion across both mitochondrial membranes (Mori et al., 1982;
Kolansky et al., 1982). cPPRcaps poly(A), cPPRcaps NRE or EYFP proteins were expressed
in 143B osteosarcoma cells, and mitochondrial protein synthesis was measured in the
presence of cycloheximide, to inhibit cytoplasmic translation, and 35S-labelled cysteine and
methionine, so that only mitochondrially encoded proteins were labelled.
Figure 6.3: cPPRcaps poly(A) affects mitochondrial protein synthesis. (Left) cPPRcaps poly(A) expression lowers mitochondrial translation in cells. 143B cells were transfected with pOTC-cPPRcaps poly(A), pOTC-cPPRcaps NRE and pEYFP-TAP and protein synthesis was measured by pulse incorporation of 35S-labelled methionine and cysteine. Equal amounts of cell lysate protein were separated by SDS–PAGE and visualised by autoradiography. (Right) The gels were stained with Coomassie Brilliant Blue to confirm equal loading.
A general decrease in mitochondrially encoded proteins was observed on expression of
the cPPR protein designed to target mitochondrial mRNAs, compared to controls, and was
particularly apparent after 6 days of expression (Figure 6.3). A
second cPPRcaps poly(A) transfection at day 4 appears to have resulted in a more
pronounced effect on protein levels at day 6 compared to a single transfection. This finding
is significant because, to date, it has been very challenging to manipulate the mitochondrial
transcriptome, which is contained within the mitochondrial matrix. The two mitochondrial
membranes (inner and outer) that surround the matrix, one of which is tightly coupled and
impermeable to anything but ions (Alberts et al., 2002), provide a barrier that
prevents the import of macromolecules, particularly if they are negatively charged.
For this reason, manipulation of mitochondrial gene expression using antisense agents,
RNAi or other technologies has not been possible to date. Further investigation would be
required to determine the maximal level of overall protein knockdown that can be
achieved with prolonged PPR protein expression, possibly through the establishment
of stable cell lines.
6.3 cPPRcaps poly(A) does not affect mitochondrial RNA stability
To investigate whether binding to the poly(A) tails of mitochondrial RNAs affects the
stability of the mRNAs or interferes with their translation on mitochondrial ribosomes,
I isolated RNA from cells that were treated with cPPRcaps poly(A), cPPRcaps NRE
or control treatments and carried out northern blotting. I investigated the effects of
these proteins on several different mitochondrial mRNAs that have varying poly(A) tail
lengths, the MT-ND6 mRNA that is known to lack a poly(A) tail (Temperley et al., 2010;
Mercer et al., 2011), and the 12S and 16S rRNAs that are also known to lack a poly(A) tail.
Figure 6.4: cPPRcaps poly(A) lowers CytB and ND1 transcripts. RNA isolated from mitochondria was analysed by northern blotting after expression of cPPRcaps poly(A) and cPPRcaps NRE. The blot was stripped and re-probed with probes against mitochondrial mRNAs and ribosomal RNAs (RNR1 and RNR2).
The northern blots indicate that the levels of mitochondrial transcripts are not
affected by the PPR proteins, suggesting that the binding of the proteins does not
compromise RNA stability, with the exception of ND1 and cytB by cPPRcaps-NRE after 3
days. This result is intriguing and requires further examination. Overall, however, the general
decrease in synthesis of mitochondrially encoded proteins indicates that the
poly(A)-binding PPR protein acts to decrease the efficiency of translation.
6.4 Summary
We have shown that PPR proteins possess modularity, similar to that observed in
PUF proteins. This means that the binding specificity of PPR repeats can be transferred to
various positions in the RNA recognition sequence to enable the design of engineered
PPRs with predictable binding. It has already been shown by numerous studies that
naturally occurring PPRs have diverse roles in mitochondrial gene expression. The modular
nature of these proteins may allow them to have versatile RNA regulatory functions.
Elucidation of the RNA recognition code of PPRs should enable the prediction of their RNA
targets and their engineering as tools to specifically and selectively manipulate
mammalian mitochondrial gene expression. In this study, we have successfully engineered
a PPR protein that can target the poly(A) tail of mitochondrial mRNAs, providing evidence
that designer PPRs have the potential to be used to investigate mitochondrial mRNA
transcripts. The next step would be to fuse PPRs to effector domains to make new
discoveries about the post-transcriptional regulation of the mitochondrial genome.
Thus far, analysis of the human mitochondrial transcriptome has uncovered many
novel and important mechanisms for regulating the expression of its genes and the energy
metabolism in cells (Mercer et al., 2011). The unique features of mitochondrial transcripts
and the need for post-transcriptional regulation of its expression demand further
investigation. Engineered RBPs can be exploited to further understand the regulation of
mitochondrial gene expression and how these processes are coordinated in response to
environmental changes. Understanding the links between energy metabolism and
mitochondrial RNA control would provide significant insight into human diseases caused
by mutations in genes encoding mitochondrial proteins.
CHAPTER 7
Discussion
In all life forms binding of RNA by proteins is a vital function, given that all aspects
of gene expression and regulation require RNA-binding proteins. Here we have
successfully identified the amino acid code responsible for nucleotide recognition by PPR
proteins as well as expanded the base recognition scope of PUF proteins by engineering
them to specifically recognize cytosine. The availability of a code that enables the design
of proteins with predictable RNA targets will have many potential applications in
biotechnology and medicine. We have also provided evidence that PUF proteins can be
designed to bind any RNA of interest by engineering a PUF protein containing 16
RNA-binding repeats, and that PPR proteins can be exploited as potential tools to study
mitochondrial gene expression.
Given that transcription and translation in eukaryotes are uncoupled, the former
taking place in the nucleus and the latter in the cytoplasm, this provides extensive
opportunities for the use of designer RBPs to regulate gene expression post-
transcriptionally. As the control of gene expression is faster and more precise at the post-
transcriptional level, engineered PUF and PPR proteins can be exploited for fine-tuning the
expression of endogenous genes or transgenes, not to mention that some aspects of gene
expression can only be controlled at the level of RNA (e.g. the cytoplasmic localization or
nuclear retention of mRNAs; Isaacs et al., 2004; Zenklusen et al., 2008; Johnston et al.,
2005). Numerous studies have shown that mutations or alteration in expression of either
RBPs or their binding sites in target transcripts are the cause of several human diseases
such as muscular atrophies, neurological disorders and cancer. Therefore, designer RBPs
could help elucidate the mechanisms of these disorders and even provide potential
therapies in the future (Lukong et al., 2008; Musunuru et al., 2003; Kim et al., 2009).
This study has led to the successful identification of five unique PUF mutants that
can selectively interact with RNAs containing a cytosine, achieved via the randomization of
amino acids at positions 12 and 16 to encode all 20 possible amino acids and screening
for mutants using a yeast genetic system. All five variants had an arginine at position 16,
while the amino acids at position 12 were alanine, glycine, serine, threonine and cysteine.
A subsequent study by Dong et al. (2011) confirmed the specificity of the serine/arginine
pair that we found for cytosine residues and revealed that this specific interaction was
achieved via hydrogen bonding between O2 and N3 of the cytosine base with the side
chain of arginine in a mode comparable to the recognition of uracil by asparagine (Figure
7.1). In contrast to uracil recognition (where the glutamine at position 16 contacts
the base), the serine at position 12 serves to position the arginine and does not contact
the base directly (Dong et al., 2011). This may explain the ability of cysteine, alanine,
threonine and glycine to replace serine to a certain degree.
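The position 12/16 pairs described above can be written down as a simple lookup table. The following is a minimal sketch rather than the method used in this study: the cytosine pairs (arginine at position 16 with serine, alanine, glycine, threonine or cysteine at position 12) and the asparagine/glutamine pair for uracil come from the text, whereas the cysteine/glutamine pair for adenine and the serine/glutamate pair for guanine are assumptions taken from the wider PUF literature. The function names are hypothetical, and the repeats are assumed to be listed in the order of the bases they contact, 5' to 3'.

```python
# Illustrative lookup: residues at PUF repeat positions 12 and 16 -> recognised base.
PUF_CODE = {
    ("N", "Q"): "U",  # asparagine/glutamine: uracil (mentioned in the text)
    ("C", "Q"): "A",  # cysteine/glutamine: adenine (assumption, wider literature)
    ("S", "E"): "G",  # serine/glutamate: guanine (assumption, wider literature)
}

# Cytosine: arginine at position 16 with any of the five position-12 residues
# recovered in the screen (serine, alanine, glycine, threonine, cysteine).
for aa12 in "SAGTC":
    PUF_CODE[(aa12, "R")] = "C"


def predict_puf_base(aa12, aa16):
    """Return the base predicted for one repeat, or '?' if the pair is not tabulated."""
    return PUF_CODE.get((aa12.upper(), aa16.upper()), "?")


def predict_puf_target(repeats_in_rna_order):
    """Predict an RNA target from (aa12, aa16) pairs listed in 5'->3' base order."""
    return "".join(predict_puf_base(a, b) for a, b in repeats_in_rna_order)


if __name__ == "__main__":
    example = [("N", "Q"), ("S", "R"), ("C", "Q"), ("S", "E")]
    print(predict_puf_target(example))
```

A table of this kind says nothing about affinity: as noted in this chapter, engineered repeats may not bind as tightly as wild-type repeats, so any prediction would still need experimental confirmation.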
We demonstrated that engineered PUF proteins may not always bind with the
same affinities as wild type proteins, which implies that one would have to consider which
PUF repeat combinations would result in the ideal binding affinity for target RNAs of
interest. We also illustrated that engineering a 16-repeat PUF is possible and that it
possessed enhanced specificity, given that it bound to its cognate extended RNA target
more efficiently than the wild-type eight-repeat PUF bound its cognate RNA. A key piece of
data that would complete this study would be the crystal structure of the 16-repeat
PUF, which would ultimately show whether every PUF repeat unit binds one RNA base
in its extended RNA target.
Figure 7.1: Recognition of adenine, cytosine, guanine, and uracil by PUF repeats in the crystal structure of a mutant PUM1 (Filipovska and Rackham, 2011). Highlighted in red is the recognition of cytosine by PUF repeat with a serine at amino acid position 12 and an arginine at position 16.
On the other hand, the successful identification of the amino acid code for
nucleotide binding by PPR proteins, which had remained speculative and vague largely due
to the highly insoluble nature of PPR proteins, is revolutionary as it should finally enable
the engineering of customizable PPR proteins. We confirmed that the amino acids at positions 4
and 34 are responsible for nucleotide recognition, with N4D34 binding uracil, T4N34 binding
adenine and N4S34/N4T34 binding cytosine. We are currently in the midst of deciphering
the guanine recognition code. The amino acid code described by Barkan et al. (2012) in
maize protein PPR10 was shown to hold true in a highly reduced, consensus PPR
architecture, illustrating that the code is likely to be general to most, if not all, naturally
occurring PPR proteins. Furthermore, the highly soluble and stable consensus PPR should
enable the binding preferences of other amino acid combinations that cannot be
predicted computationally to be readily deciphered. This will provide the means to
discover the binding sites of naturally occurring PPR proteins and to understand their roles
in cells. It was previously hypothesized that, like that of PUF proteins, PPR binding is
modular; we have now shown that this is indeed true, as we were successful in designing a PPR
protein that could bind to the NRE RNA. We seek to ascertain the structure of the PPR-
RNA complex in order to provide further understanding of the fundamental principles of
its interaction and also to determine if any other residues in the PPR protein may further
enhance or stabilize binding to RNA, similar to amino acid 13 in PUF proteins.
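The PPR code reported in this chapter lends itself to the same kind of tabulation, and modularity means the table can be read in both directions: from repeats to a predicted target, and from a desired target to a set of repeats. The sketch below is illustrative only. The four pairs are those stated above (N4D34 for uracil, T4N34 for adenine, N4S34/N4T34 for cytosine); guanine is deliberately left unassigned because its code had not yet been deciphered. The function names, the choice of N4S34 as the default cytosine pair, and the assumption that repeats are listed in the order of the bases they contact are all hypothetical.

```python
# Illustrative lookup: residues at PPR repeat positions 4 and 34 -> recognised base.
PPR_CODE = {
    ("N", "D"): "U",
    ("T", "N"): "A",
    ("N", "S"): "C",
    ("N", "T"): "C",
}

# One pair per base for design purposes (N4S34 chosen arbitrarily for cytosine).
PREFERRED_PAIR = {"U": ("N", "D"), "A": ("T", "N"), "C": ("N", "S")}


def predict_ppr_target(repeats):
    """Predict an RNA sequence from (aa4, aa34) pairs; '?' marks untabulated pairs."""
    return "".join(PPR_CODE.get((a.upper(), b.upper()), "?") for a, b in repeats)


def design_ppr_repeats(rna):
    """Return one (aa4, aa34) pair per base of the requested RNA target."""
    pairs = []
    for base in rna.upper():
        if base not in PREFERRED_PAIR:
            # Guanine (or any other symbol) has no assigned pair in this sketch.
            raise ValueError(f"no PPR pair assigned for base {base!r}")
        pairs.append(PREFERRED_PAIR[base])
    return pairs


if __name__ == "__main__":
    poly_a = design_ppr_repeats("AAAAAAAA")  # a poly(A)-directed array
    print(poly_a)
    print(predict_ppr_target(poly_a))
```

Designing against a poly(A) target simply yields a run of T4N34 repeats, which mirrors the logic of the poly(A)-binding protein described in Chapter 6.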
There are many similarities and differences between PUF and PPR proteins that are
worth highlighting. Firstly, although classic PUF repeats are 36 amino acids long while
PPR repeats are generally 35 amino acids long, the length distributions of the two repeat
types actually overlap because shorter and longer repeats of both can be found in naturally
occurring proteins (Wickens et al., 2002; Small et al., 2000; Filipovska and Rackham, 2011).
The overall number of repeats in individual proteins is highly variable in PPR proteins but
is much more constrained in PUF proteins, with almost all PUF repeat arrays consisting
of eight RNA-binding repeats (Wickens et al., 2002). This may reflect the
evolutionary constraints placed on PUFs rather than biophysical limitations, as we were
able to engineer a 16-repeat PUF that specifically recognized an extended RNA target.
Structurally, repeats of PPR and PUF proteins are predominantly alpha helical; however,
the PUF repeat unit consists of three distinct alpha helices (Wang et al., 2001; Edwards et
al., 2001), while PPRs are predicted to have two alpha helices organized in a helix-turn-
helix structure (Howards et al., 2012; Small and Peeters, 2000). Detailed crystal structures
have revealed that PUF domains interact in a one-to-one stoichiometry with their RNA
targets; however, the mode by which PPRs bind their targets is yet to be verified in
crystal form.
The specific recognition of nucleic acids by PUFs is achieved via base-binding and
stacking amino acids found in the second helix of each PUF repeat, while we and Barkan et
al. (2012) have shown that the RNA recognition code of PPR proteins is determined by
the amino acids in positions four and 34. Co-variation analysis of fertility restorer genes
containing PPRs and their RNA targets in plant mitochondria by Fujii et al. (2011)
compellingly proposes that the amino acids in positions one, four and 34 are likely to be
responsible for RNA recognition, with all of these amino acids falling within or adjacent to
helix one of the PPR (Filipovska and Rackham, 2011). The alignment of helix one of PPRs
with helix two of PUF repeats, based on the similarity of their primary sequences, places
residues 12, 13 and 16 of the PUF repeat in similar positions to residues with high co-
variation in the PPR repeat. As previously described, amino acids of PUFs at positions 12
and 16 interact with the RNA nucleotides via hydrogen bonding, while stacking
interactions occur at amino acid position 13. Although the specificities of PUF repeats can
be reassigned without altering the amino acid at position 13 (Cheong and Hall, 2006), a
recent study by Koh et al. (2011) found that the choice of residue at this position can
influence the affinity and specificity of each repeat by modulating the stacking
interactions within the RNA-protein complex. In future studies, it will be of great interest
to determine if residues surrounding positions 4 and 34 of PPRs contribute to the affinity
or specificity of PPR-RNA association.
With regards to biotechnological applications, the availability of designer RBPs may
provide new tools to generate synthetic networks that are controlled at the level of RNA
and to enable better understanding of the complex patterns of gene expression in living
cells (Isaacs et al., 2006; Filipovska and Rackham et al., 2008). To date, the
biotechnological applications of PUFs have involved fusing them to effector domains with
well-characterized functions. The first designer PUF domains were engineered for the
tracking of endogenous mitochondria-encoded RNAs in living cells (Ozawa et al., 2007).
This was achieved by fusing two fragments of a fluorescent protein to adjacent PUF
domains that targeted ND6 mRNA, where a functional fluorophore was regenerated upon
folding when both segments were brought into close proximity (Ozawa et al., 2007).
Through this approach, Ozawa et al. (2007) observed that oxidative stress led to the
increased mobility of ND6 within mitochondria and stimulated its degradation. The main
advantage of using engineered PUF proteins is that endogenous RNAs can be tracked
without the necessity of heterologous overexpression or genetic manipulation of their
genes (Filipovska and Rackham, 2011).
More recently, Wang et al. (2009) engineered a PUF domain fused to a splicing regulatory
domain that targeted a site upstream of a splice acceptor site in the Bcl-X mRNA. Bcl-X
mRNA encodes a mitochondrial outer membrane protein that is involved in
programmed cell death, or apoptosis. Alternative splicing of this transcript enables the
selective production of two distinct isoforms of Bcl-X, Bcl-XL and Bcl-XS, which act as an
apoptotic inhibitor and an apoptotic activator, respectively (Boise et al., 1993; Chipuk et al.,
2010). By fusing the designed PUF domain to a splicing repressor domain derived from
hnRNP A1, they were able to increase the production of Bcl-XS and induce apoptosis in
cancer cells (Wang et al., 2009). Our discovery of a universal code for RNA recognition by
PUF proteins should enable various types of applications of designer RBPs to manipulate
numerous aspects of RNA metabolism. Recently, Dong et al. (2011) used the cytosine-
recognizing PUF repeats to engineer splicing factors that bound cytosine-containing splice
sites in VEGF-A pre-mRNA to increase anti-angiogenic vascular endothelial growth factor
isoform production in cultured mammalian cells. The discovery of genome editing has also
generated an assortment of new potential methods that can be used to interrogate
biological systems, with the aim of enhancing our understanding of basic biology that can
possibly lead to new methods for treating human disease. For example, fusing a PUF or
PPR protein to other effector domains can potentially facilitate novel studies in RNA
biology, similar to applications seen with Zinc Finger Nuclease (ZFN) and Transcription
Activator-Like Effector Nuclease (TALEN) in DNA biology (reviewed in Klug, 2010).
The manipulation of RNA has thus far been limited to antisense and RNA
interference technologies (Hebert et al., 2008). Although short interfering RNAs (siRNAs)
have proven to be very potent inhibitors of gene expression and have allowed for the
elucidation and better understanding of gene functions in many different cell lines and
organisms, there are several limitations to siRNA-knockdown technology. These
approaches are limited to lowering the abundance or expression of their target RNAs and,
in the case of siRNAs and miRNAs, their actions are confined to the cytoplasm as they
depend on endogenous RNA interference pathways (Carthew and Sontheimer, 2009). We
have shown that a poly(A)-targeted PPR protein, when fused to a mitochondrial targeting
sequence, could be imported into the mitochondria where it was able to reduce the
translation of mitochondrial-encoded proteins. One limitation of this study was that we
did not determine the maximum level of overall protein knockdown that could be
achieved. This could possibly be investigated through the establishment of stable cell lines
transfected with the cPPR proteins, prolonging PPR protein expression within the cells.
The use of designer RBPs to interrogate mitochondrial gene expression will enable
significant advances towards understanding mitochondrial gene expression and its
involvement in human disease. We aim to engineer PPR proteins targeting specific
mitochondrial RNA transcripts in order to study the regulation of mitochondrial gene
expression at the level of RNA processing, translation and degradation and to understand
how these processes are coordinated. Ultimately, this will provide vital insight into human
diseases caused by mutations in genes encoding mitochondrial proteins.
On the other hand, RNAi mediated by the introduction of long dsRNA has been
used to investigate gene functions in various organisms including plants, Drosophila and
mouse oocytes (Baulcombe, 1999; Kennerdell and Carthew, 1998; Misquitta and Paterson,
1999; Svoboda et al., 2000; Wianny and Zernicka-Goetz, 2000). Although long dsRNA
enables the effective silencing of gene expression by presenting various siRNA sequences
to the target mRNA, the applicability of this approach is limited in mammals due to the
sequence non-specific interferon response caused by the introduction of dsRNA longer
than 30 nt (Elbashir et al., 2001). Not only does interferon trigger the degradation of
mRNA through the induction of 2'-5' oligoadenylate synthetase, it also leads to the activation
of protein kinase R (PKR), which phosphorylates the translation initiation factor eIF2, leading
to a global inhibition of mRNA translation (Stark et al., 1998). We have shown that the number
of repeats within a single PUF domain can be adjusted from eight to 16; this would enable
one to target either a limited set of RNAs or only one species of RNA within an entire
transcriptome without some of the drawbacks of RNAi technologies.
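The gain in specificity from extending the array from eight to 16 repeats can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each repeat reads exactly one base, that the four bases are equally frequent and independent, and that the transcriptome contains a hypothetical 10^8 nucleotides of unique sequence; none of these figures come from this study, and real transcriptomes are neither random nor uniform in composition.

```python
def distinct_sites(site_length):
    """Number of distinct RNA sequences of a given length (four bases per position)."""
    return 4 ** site_length


def expected_random_matches(site_length, transcriptome_nt):
    """Expected chance occurrences of one specific site in random sequence.

    Assumes equal, independent base frequencies and ignores edge effects.
    """
    return transcriptome_nt / distinct_sites(site_length)


if __name__ == "__main__":
    transcriptome_nt = 1e8  # hypothetical round number, for illustration only
    for n in (8, 16):
        print(n, distinct_sites(n), expected_random_matches(n, transcriptome_nt))
```

Under these assumptions an eight-base site is expected to recur by chance many times across a transcriptome of that size, whereas a 16-base site is expected less than once, which is consistent with the idea that an eight-repeat PUF addresses a set of RNAs while a 16-repeat PUF could address a single species.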
The distinctive properties of PUFs can provide novel opportunities to selectively
bind and regulate specific RNAs. This can be exemplified by the ability of PUF proteins to
invade highly structured RNAs to bind their target sequences when we systematically
varied the target RNA structure to place it in increasingly base-paired structures. For RNAi
technologies, this may be a challenge as Kedde et al. (2010) showed that a miRNA
targeting the 3'UTR of p27 mRNA was ineffective without prior binding of PUM1 to that
target site. The p27 tumour suppressor is a cyclin-dependent kinase inhibitor that associates
with cyclin-dependent kinase 2 (CDK2) and cyclin E complexes to negatively regulate cell
cycle progression. Kedde et al. (2010) discovered that binding of PUM1 to the 3'UTR opens
up the local RNA structure, allowing access of miR-221 and miR-222. This study is an
example of how an RBP-induced structural change modulates the ability of miRNA to
regulate gene expression.
Designer RBPs can also be used to complement RNAi technologies so that the
consequences of knocking down or overexpressing endogenous genes can be used to
elucidate the functions of a specific gene. There is increasing evidence in support of the
cross-communication between naturally occurring miRNAs and RBPs (Janga, 2012).
TRIM71, a tripartite motif protein which possesses ubiquitin ligase activity, is a target of
the miRNA let-7 and it is responsible for AGO protein degradation through
ubiquitinylation, directly interfering with miRNA function (Rybak et al., 2009). This study is
one of many that suggest that RBPs and microRNAs may target overlapping regions of
RNA transcripts, describing a mechanism by which miRNAs may modulate the post-
transcriptional pathway by revealing or masking the regulatory targets of RBPs (Janga,
2012). In addition, RBPs near miRNA target sites can possibly regulate miRNA function by
either directly affecting miRNA-binding or indirectly by altering the RNA secondary
structure (van Kouwenhove et al., 2011). The interaction between RBPs and miRNAs can
provide the means of controlling the expression of target genes in a combinatorial
fashion.
In conclusion, since the discovery of the first consensus motifs in RBPs more than
twenty years ago, the number of known RBPs and the variety of functions in which they
participate have increased vastly. Given the rapid progress in understanding the
molecular mechanisms underlying gene expression, as well as the assortment of
uncharacterized protein repeat domains, we have presumably only begun to see the potential
that engineered RBP scaffolds possess. Indeed, there is still a large gap in the knowledge of
the structures of RBPs as well as their mode of interaction with RNAs and the organization of
these proteins in complex structures. It is reasonable to expect that designer RBPs will
lead to a shift in the way we manipulate and study complex transcriptomes and their
cellular functions. Several challenges still remain with this technology as there is ample
work to be done to further improve both the specificity of engineered RBPs as well as
methods employed to monitor off-target binding. However, the potential of engineering
RBP fusions to alter the splicing, translation, localization or degradation of chosen mRNAs
or other RNA species in order to further understand the mechanism of diseases or for
biotechnological applications is extremely stimulating.
BIBLIOGRAPHY
1. Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P. (2002) Molecular Biology of the Cell. 4th edition. New York: Garland Science; The Transport of Proteins into Mitochondria and Chloroplasts. Available from: http://www.ncbi.nlm.nih.gov/books/NBK26828/
2. Allerson, C. R., Martinez, A., Yikilmaz, E., and Rouault, T. A. (2003). A high-capacity RNA affinity column for the purification of human IRP1 and IRP2 overexpressed in Pichia pastoris. RNA 9, 364-374.
3. Aloni, Y., and Attardi, G. (1971) Symmetrical in vivo transcription of mitochondrial DNA in HeLa cells. Proc Natl Acad Sci USA 68, 1757–1761.
4. Altschul, S. F., Koonin, E. V. (1998) Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci. 23(11), 444-447.
5. Ameur, A., Zaghlool, A., Halvardson, J., Wetterbom, A., Gyllensten, U., Cavelier, L., Feuk, L. (2011) Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain. Nat Struct Mol Biol. 18(12), 1435-1440.
6. Anderson, S., Bankier, A. T., Barrell, B. G., de Bruijn, M. H., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier, P. H., Smith, A. J., Staden, R., Young, I. G. (1981) Sequence and organization of the human mitochondrial genome. Nature. 290(5806), 457-465.
7. Anderson, S., de Bruijn, M. H., Coulson, A. R., Eperon, I. C., Sanger, F., Young, I. G. (1982). Complete sequence of bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial genome. J Mol Biol. 156(4), 683-717.
8. Andrade, M. A., Perez-Iratxeta, C., and Ponting, C. P. (2001) Protein repeats: structures, functions and evolution. J. Struct. Biol. 134 (2-3), 117-131
9. Archer, S. K., Luu, V. D., de Queiroz, R. A., Brems, S., Clayton, C. (2009). Trypanosoma brucei PUF9 regulates mRNAs for proteins involved in replicative processes over the cell cycle. PLoS Pathog. 5(8), e1000565.
10. Asaoka-Taguchi, M., Yamada, M., Nakamura, A., Hanyu, K., Kobayashi, S. (1999). Maternal Pumilio acts together with Nanos in germline development in Drosophila embryos. Nat Cell Biol 1, 431–437.
11. Aubourg, S., Boudet, N., Kreis, M., Lecharny, A. (2000). In Arabidopsis thaliana, 1% of the genome codes for a novel protein family unique to plants. Plant Mol Biol. 42(4), 603-613
12. Aurora, R., Rose, G. D. (1998) Helix capping. Protein Sci. 7, 21–38
13. Ausubel, F. M. (1987). Current Protocols in Molecular Biology. (New York, Greene Pub. Associates and Wiley-Interscience)
14. Auweter, S. D., Oberstrass, F. C., Allain, F. H. (2006) Sequence-specific binding of single-stranded RNA: is there a code for recognition? Nucleic Acids Res. 34(17), 4943-4959.
15. Barkan, A., Rojas, M., Fujii, S., Yap, A., Chong, Y. S., Bond, C. S., Small, I. (2012). A combinatorial amino acid code for RNA recognition by pentatricopeptide repeat proteins. PLoS Genet. 8(8), e1002910.
16. Barker, D. D., Wang, C., Moore, J., Dickinson, L. K., Lehmann, R. (1992). Pumilio is essential for function but not for distribution of the Drosophila abdominal determinant Nanos. Genes Dev. 6, 2312–2326.
17. Barrell, B. G., Bankier, A. T., Drouin, J. (1979). A different genetic code in human mitochondria. Nature 282, 189–194.
18. Baulcombe, D. C. (1999). Gene silencing: RNA makes RNA makes no protein. Curr. Biol. 9, R599–R601
19. Beick, S., Schmitz-Linneweber, C., Williams-Carrier, R., Jensen, B., Barkan, A. (2008). The pentatricopeptide repeat protein PPR5 stabilizes a specific tRNA precursor in maize chloroplasts. Mol Cell Biol. 28(17), 5337-5347.
20. Bentley, D. L. (2005). Rules of engagement: co-transcriptional recruitment of pre-mRNA processing factors. Curr. Opin. Cell Biol., 17, 251–256
21. Bernstein, D., Hook, B., Hajarnavis, A., Opperman, L., Wickens, M. (2005) Binding specificity and mRNA targets of a C. elegans PUF protein, FBF-1. RNA. 11(4), 447-458.
22. Bertrand, E., Chartrand, P., Schaefer, M., Shenoy, S. M., Singer, R. H., Long, R. M. (1998). Localization of ASH1 mRNA particles in living yeast. Mol Cell. 2, 437-445.
23. Beyer, K., Dandekar, T., Keller, W. (1997). RNA ligands selected by cleavage stimulation factor contain distinct sequence motifs that function as downstream elements in 3'-end processing of pre-mRNA. J Biol Chem 272, 26769-26779.
24. Bienroth, S., Keller, W., Wahle, E. (1993). Assembly of a processive messenger RNA polyadenylation complex. EMBO J. 12, 585- 594.
25. Bisaillon, M., and Lemay, G. (1997). Viral and cellular enzymes involved in synthesis of mRNA cap structure. Virology 236, 1-7.
26. Bobrowicz, A. J., Lightowlers, R. N., Chrzanowska-Lightowlers, Z. (2008) Polyadenylation and degradation of mRNA in mammalian mitochondria: a missing link? Biochem Soc Trans 36(3), 517–519.
27. Boch, J., Scholze, H., Schornack, S., Landgraf, A., Hahn, S., Kay, S., Lahaye, T., Nickstadt, A., Bonas, U. (2009). Breaking the code of DNA binding specificity of TAL-type III effectors. Science. 326, 1509-1512.
28. Boise, L. H., González-García, M., Postema, C. E., Ding, L., Lindsten, T., Turka, L. A., Mao, X., Nuñez, G., Thompson, C. B. (1993). bcl-x, a bcl-2-related gene that functions as a dominant regulator of apoptotic cell death. Cell. 74, 597–608.
29. Bonifacino, J. S. (1998). Current Protocols in Cell Biology. (New York, John Wiley).
30. Borowski, L. S., Szczesny, R. J., Brzezniak, L. K., Stepien, P. P. (2010) RNA turnover in human mitochondria: more questions than answers? Biochim Biophys Acta 1797, 1066–1070.
31. Brown, V., Jin, P., Ceman, S., Darnell, J. C., O'Donnell, W. T., Tenenbaum, S. A., Jin, X., Feng, Y., Wilkinson, K. D., Keene, J. D., Darnell, R. B., Warren, S. T. (2001). Microarray identification of FMRP-associated brain mRNAs and altered mRNA translational profiles in fragile X syndrome. Cell 107, 477-487.
32. Brusilow, S. W., and Horwich, A. L. (1996) Urea cycle enzymes. In: The metabolic and molecular bases of inherited disease, 7th ed, Scriver CR, Beaudet AL, Sly WS, Valle D (Eds), McGraw-Hill, New York, 1187-1232.
33. Brzezniak, L. K., Bijata, M., Szczesny, R. J., Stepien, P. P. (2011). Involvement of human ELAC2 gene product in 3' end processing of mitochondrial tRNAs. RNA Biol. 8(4), 616-626.
34. Camenisch, T. D., Brilliant, M. H., Segal, D. J. (2008). Critical parameters for genome editing using zinc finger nucleases. Med Chem. 8(7), 669-676.
35. Caro, F., Bercovich, N., Atorrasagasti, C., Levin, M. J., Vazquez, M. P. (2006). Trypanosoma cruzi: analysis of the complete PUF RNA-binding protein family. Exp Parasitol 113, 112–124.
36. Carroll, D. (2008). Progress and prospects: Zinc-finger nucleases as gene therapy agents. Gene Therapy. 15, 1463–1468.
37. Carthew, R. W., Sontheimer, E. J. (2009). Origins and Mechanisms of miRNAs and siRNAs. Cell. 136(4), 642-655.
38. Cassiday, L. A., and Maher, L. J. (2001) In vivo recognition of an RNA aptamer by its transcription factor target. Biochemistry 40, 2433–2438
39. Cathomen, T., Joung, J. K. (2008). Zinc-finger nucleases: the next generation emerges. Mol Ther. 16(7), 1200-1207.
40. Cermak, T., Doyle, E. L., Christian, M., Wang, L., Zhang, Y., Schmidt, C., Baller, J. A., Somia, N. V., Bogdanove, A. J., Voytas, D. F. (2011). Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res. 39(12), e82
41. Chase, C. D. (2007). Cytoplasmic male sterility: a window to the world of plant mitochondrial-nuclear interactions. Trends Genet. 23(2), 81-90.
42. Cheong, C. G., Hall, T. M. (2006) Engineering RNA sequence specificity of Pumilio repeats. Proc Natl Acad Sci USA. 103(37), 13635-13639.
43. Chipuk, J. E., Moldoveanu, T., Llambi, F., Parsons, M. J., Green, D. R. (2010). The BCL-2 family reunion. Mol Cell. 37(3), 299-310.
44. Cho, E-J., Kobor, M. S., Kim, M., Greenblatt, J., Buratowski, S. (2001). Opposing effects of Ctk1 kinase and Fcp1 phosphatase at Ser 2 of the RNA polymerase II C-terminal domain. Genes & Dev. 15, 3319-3329
45. Cho, P. F., Gamberi, C., Cho-Park, Y. A., Cho-Park, I. B., Lasko, P., Sonenberg, N. (2006) Cap-dependent translational inhibition establishes two opposing morphogen gradients in drosophila embryos. Curr Biol. 16, 2035–2041.
46. Chong, S., Mersha, F. B., Comb, D. G., Scott, M. E., Landry, D., Vence, L. M., Perler, F. B., Benner, J., Kucera, R. B., Hirvonen, C. A., Pelletier, J. J., Paulus, H., Xu, M.-Q. (1997) Single-column purification of free recombinant proteins using a self-cleavable affinity tag derived from a protein splicing element. Gene 192, 271-281.
47. Chong, S., Shao, Y., Paulus, H., Benner, J., Perler, F. B., Xu, M.-Q. (1996) Protein splicing involving the Saccharomyces cerevisiae VMA intein: the steps in the splicing pathway, side reactions leading to protein cleavage, and establishment of an in vitro splicing system. J. Biol. Chem. 271, 22159-22168.
48. Christianson, T. W., Clayton, D. A. (1988) A tridecamer DNA sequence supports human mitochondrial RNA 3’-end formation in vitro. Mol Cell Biol, 8, 4502–4509.
49. Cléry, A., Blatter, M., Allain, F. H. (2008). RNA recognition motifs: boring? Not quite. Curr Opin Struct Biol. 18(3), 290-298.
50. Colgan, D. F., and Manley, J. L. (1997). Mechanism and regulation of mRNA polyadenylation. Genes Dev 11, 2755-2766.
51. Coll, O., Villalba, A., Bussotti, G., Notredame, C., Gebauer, F. (2010). A novel, noncanonical mechanism of cytoplasmic polyadenylation operates in Drosophila embryogenesis. Genes Dev. 24(2), 129-134.
52. Coulombe, B., Burton, Z. F. (1999). DNA bending and wrapping around RNA polymerase: a "revolutionary" model describing transcriptional mechanisms. Microbiol Mol Biol Rev. 63(2), 457-478
53. Cramer, P. (2004). RNA polymerase II structure: From core to functional complexes. Curr Opin Genet Dev 14, 218–226.
54. Cramer, P., Bushnell, D. A., Kornberg, R. D. (2001). Structural basis of transcription: RNA polymerase II at 2.8 angstrom resolution. Science 292, 1863-1876.
55. Crittenden, S. L., Bernstein, D. S., Bachorik, J. L., Thompson, B. E., Gallegos, M., Petcherski, A. G., Moulder, G., Barstead, R., Wickens, M., and Kimble, J. (2002). A conserved RNA-binding protein controls germline stem cells in Caenorhabditis elegans. Nature 417, 660–663.
56. Cui, L., Fan, Q., Li, J. (2002). The malaria parasite Plasmodium falciparum encodes members of the Puf RNA-binding protein family with conserved RNA binding activity. Nucleic Acids Res. 30(21), 4607-4617.
57. Cusack, S. (1997) Aminoacyl-tRNA synthetases. Curr Opin Struct Biol 7, 881–889.
58. Cushing, D. A., Forsthoefel, N. R., Gestaut, D. R., Vernon, D. M. (2005). Arabidopsis emb175 and other ppr knockout mutants reveal essential roles for pentatricopeptide repeat (PPR) proteins in plant embryogenesis. Planta. 221(3), 424-436.
59. Dahmus, M. E. (1995). Phosphorylation of the C-terminal domain of RNA polymerase II. Biochim Biophys Acta 1261, 171-182.
60. Dantonel, J. C., Murthy, K. G., Manley, J. L., Tora, L. (1997). Transcription factor TFIID recruits factor CPSF for formation of 3' end of mRNA. Nature. 389(6649), 399-402.
61. Dasgupta, S., Bell, J. A. (1993) Design of helix ends. Amino acid preferences, hydrogen bonding and electrostatic interactions. Int. J. Pept. Protein Res 41, 499–511
62. Dassi, E., Quattrone, A. (2012). Tuning the engine: An introduction to resources on post-transcriptional regulation of gene expression. RNA Biol. 9(10)
63. Davies, S. M., Lopez Sanchez, M. I., Narsai, R., Shearwood, A. M., Razif, M. F., Small, I. D., Whelan, J., Rackham, O., Filipovska, A. (2012). MRPS27 is a pentatricopeptide repeat domain protein required for the translation of mitochondrially encoded proteins. FEBS Lett. 586(20), 3555-3561.
64. Davies, S. M., Rackham, O., Shearwood, A. M., Hamilton, K. L., Narsai, R., Whelan, J., Filipovska, A. (2009). Pentatricopeptide repeat domain protein 3 associates with the mitochondrial small ribosomal subunit and regulates translation. FEBS Lett. 583(12), 1853-1858.
65. De Gregorio, E., Preiss, T., Hentze, M. W. (1999) Translation driven by an eIF4G core domain in vivo. EMBO J 18, 4865-4874.
66. Delannoy, E., Stanley, W. A., Bond, C. S., Small, I. D. (2007). Pentatricopeptide repeat (PPR) proteins as sequence-specificity factors in post-transcriptional processes in organelles. Biochem Soc Trans. 35(Pt 6), 1643-1647
67. Deng, Y., Singer, R. H., Gu, W. (2008). Translation of ASH1 mRNA is repressed by Puf6p-Fun12p/eIF5B interaction and released by CK2 phosphorylation. Genes Dev. 22, 1037–1050.
68. Desjarlais, J. R., and Berg, J. M. (1993) Use of a zinc-finger consensus sequence framework and specificity rules to design specific DNA binding proteins. Proc Natl Acad Sci USA. 90(6), 2256-2260.
69. Dichtl, B., Blank, D., Sadowski, M., Hübner, W., Weiser, S., Keller, W. (2002). Yhh1p/Cft1p directly links poly(A) site recognition and RNA polymerase II transcription termination. EMBO J 21(15), 4125-4135.
70. Dong, S., Wang, Y., Cassidy-Amstutz, C., Lu, G., Bigler, R., Jezyk, M. R., Li, C., Hall, T. M., Wang, Z. (2011). Specific and modular binding code for cytosine recognition in Pumilio/FBF (PUF) RNA-binding domains. J Biol Chem. 286(30), 26732-26742.
71. Dreyfuss, G., Kim, V. N., and Kataoka, N. (2002). Messenger-RNA-binding proteins and the messages they carry. Nat Rev Mol Cell Biol 3, 195-205.
72. Dubendorff, J. W., and Studier, F. W. (1991) Controlling basal expression in an inducible T7 expression system by blocking the target T7 promoter with lac repressor. J. Mol. Biol. 219, 45-59.
73. Dubnau, J., Chiang, A. S., Grady, L., Barditch, J., Gossweiler, S., McNeil, J., Smith, P., Buldoc, F., Scott, R., Certa, U., Broger, C., Tully, T. (2003). The staufen/pumilio pathway is involved in Drosophila long-term memory. Curr Biol. 13(4), 286-296.
74. Eckmann, C. R., Kraemer, B., Wickens, M., and Kimble, J. (2002). GLD-3, a bicaudal-C homolog that inhibits FBF to control germline sex determination in C. elegans. Dev. Cell 3, 697–710
75. Edwards, T. A., Pyle, S. E., Wharton, R. P., Aggarwal, A. K. (2001). Structure of Pumilio reveals similarity between RNA and peptide binding motifs. Cell. 105(2), 281-289.
76. Elbashir, S. M., Harborth, J., Lendeckel, W., Yalcin, A., Weber, K., Tuschl, T. (2001). Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature. 411(6836), 494-498
77. Eliyahu, E., Pnueli, L., Melamed, D., Scherrer, T., Gerber, A. P., Pines, O., Rapaport, D., Arava, Y. (2010) Tom20 mediates localization of mRNAs to mitochondria in a translation-dependent manner. Mol Cell Biol. 30, 284–294.
78. Fabian, M. R., Mathonnet, G., Sundermeier, T., Mathys, H., Zipprich, J. T., Svitkin, Y. V., Rivas, F., Jinek, M., Wohlschlegel, J., Doudna, J. A., Chen, C. Y., Shyu, A. B., Yates, J. R. 3rd, Hannon, G. J., Filipowicz, W., Duchaine, T. F., Sonenberg, N. (2009). Mammalian miRNA RISC recruits CAF1 and PABP to affect PABP-dependent deadenylation. Mol Cell. 35(6), 868-880.
79. Falkenberg, M., Gaspari, M., Rantanen, A., Trifunovic, A., Larsson, N. G., Gustafsson, C. M. (2002). Mitochondrial transcription factors B1 and B2 activate transcription of human mtDNA. Nat Genet. 31, 289–294.
80. Falkenberg, M., Larsson, N. G., Gustafsson, C. M. (2007). DNA replication and transcription in mammalian mitochondria. Annu Rev Biochem. 76, 679–699.
81. Filipovska, A., Rackham, O. (2008). Building a Parallel Metabolism within the Cell. ACS Chem Biol. 3(1), 51-63.
82. Filipovska, A., Rackham, O. (2011) Designer RNA-binding proteins: New tools for manipulating the transcriptome. RNA Biol. 8(6), 978-983.
83. Forbes, A., Lehmann, R. (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development. 125(4), 679-690.
84. Francischini, C. W., Quaggio, R. B. (2009). Molecular characterization of Arabidopsis thaliana PUF proteins-binding specificity and target candidates. FEBS J. 276, 5456–5470
85. Fujii, S., Bond, C. S., Small, I. D. (2011). Selection patterns on restorer-like genes reveal a conflict between nuclear and mitochondrial genomes throughout angiosperm evolution. Proc Natl Acad Sci USA. 108(4), 1723-1728.
86. Gagliardi, D., Stepien, P. P., Temperley, R. J., Lightowlers, R. N., Chrzanowska-Lightowlers, Z. M. (2004) Messenger RNA stability in mitochondria: different means to an end. Trends Genet. 20, 260–267.
87. Galgano, A., Forrer, M., Jaskiewicz, L., Kanitz, A., Zavolan, M., Gerber, A. P. (2008) Comparative analysis of mRNA targets for human PUF-family proteins suggests extensive interaction with the miRNA regulatory system. PLoS One 3, e3164
88. Gamberi, C., Peterson, D. S., He, L., Gottlieb, E. (2002). An anterior function for the Drosophila posterior determinant Pumilio. Development. 129, 2699–2710.
89. García-Rodríguez, L. J., Gay, A. C., Pon, L. A. (2007) Puf3p, a Pumilio family RNA binding protein, localizes to mitochondria and regulates mitochondrial biogenesis and motility in budding yeast. J Cell Biol. 176(2), 197-207.
90. Gerber, A. P., Herschlag, D., and Brown, P. O. (2004). Extensive association of functionally and cytotopically related mRNAs with Puf family RNA-binding proteins in yeast. PLoS Biol 2, E79
91. Gietz, R. D., Woods, R. A. (2002). Transformation of yeast by lithium acetate/single-stranded carrier DNA/polyethylene glycol method. Methods Enzymol. 350, 87-96
92. Glisovic, T., Bachorik, J. L., Yong, J., Dreyfuss, G. (2008) RNA-binding proteins and post-transcriptional gene regulation. FEBS Lett. 582(14), 1977-1986.
93. Gobert, A., Gutmann, B., Taschner, A., Gössringer, M., Holzmann, J., Hartmann, R. K., Rossmanith, W., Giegé, P. (2010). A single Arabidopsis organellar protein has RNase P activity. Nat Struct Mol Biol. 17(6), 740-744.
94. Graveley, B. R. (2000) Sorting out the complexity of SR protein functions. RNA. 6(9), 1197-1211.
95. Grudzien, E., Kalek, M., Jemielity, J., Darzynkiewicz, E., Rhoads, R. E. (2006). Differential inhibition of mRNA degradation pathways by novel cap analogs. J Biol Chem. 281(4), 1857-1867.
96. Gu, W., Deng, Y., Zenklusen, D., Singer, R. H. (2004) A new yeast PUF family protein, Puf6p, represses ASH1 mRNA translation and is required for its localization. Genes Dev. 18, 1452–1465.
97. Gupta, Y. K., Nair, D. T., Wharton, R. P., Aggarwal, A. K. (2008). Structures of human Pumilio with noncognate RNAs reveal molecular mechanisms for binding promiscuity. Structure. 16(4), 549-557
98. Hammani, K., des Francs-Small, C. C., Takenaka, M., Tanz, S. K., Okuda, K., Shikanai, T., Brennicke, A., Small, I. (2011). The pentatricopeptide repeat protein OTP87 is essential for RNA editing of nad7 and atp1 transcripts in Arabidopsis mitochondria. J Biol Chem. 286(24), 21361-21371.
99. Händel, E. M., Alwin, S., Cathomen, T. (2009) Expanding or restricting the target site repertoire of zinc-finger nucleases: the inter-domain linker as a major determinant of target site selectivity. Mol. Ther. 17(1), 104–111.
100. Hebert, C. G., Valdes, J. J., Bentley, W. E. (2008). Beyond silencing - engineering applications of RNA interference and antisense technology for altering cellular phenotype. Curr Opin Biotechnol 19, 500-505.
101. Hesselberth, J. R., and Ellington, A. D. (2002). A (ribo) switch in the paradigms of genetic regulation. Nat Struct Biol 9, 891-893.
102. Hieronymus, H., Silver, P. A. (2004) A systems view of mRNP biology. Genes Dev. 18(23), 2845-2860.
103. Hirose, Y., Manley, J. L. (1998). RNA polymerase II is an essential mRNA polyadenylation factor. Nature 395, 93-96.
104. Hirose, Y., Ohkuma, Y. (2007). Phosphorylation of the C-terminal domain of RNA polymerase II plays central roles in the integrated events of eucaryotic gene expression. J Biochem 141, 601–608
105. Hockemeyer, D., Wang, H., Kiani, S., Lai, C. S., Gao, Q., Cassady, J. P., Cost, G. J., Zhang, L., Santiago, Y., Miller, J. C., Zeitler, B., Cherone, J. M., Meng, X., Hinkley, S. J., Rebar, E. J., Gregory, P. D., Urnov, F. D., Jaenisch, R. (2011). Genetic engineering of human pluripotent cells using TALE nucleases. Nat Biotechnol. 29(8), 731-734.
106. Holzmann, J., Frank, P., Loffler, E., Bennett, K. L., Gerner, C., Rossmanith, W. (2008). RNase P without RNA: identification and functional reconstitution of the human mitochondrial tRNA processing enzyme. Cell. 135, 462–474.
107. Holzmann, J., Rossmanith, W. (2009). tRNA recognition, processing, and disease: hypotheses around an unorthodox type of RNase P in human mitochondria. Mitochondrion. 9, 284–288.
108. Hook, B. A., Goldstrohm, A. C., Seay, D. J., Wickens, M. (2007) Two yeast PUF proteins negatively regulate a single mRNA. J Biol Chem. 282, 15430–15438.
109. Hook, B., Bernstein, D., Zhang, B., Wickens, M. (2005). RNA-protein interactions in the yeast three-hybrid system: affinity, sensitivity, and enhanced library screening. RNA. 11(2), 227-233.
110. Horwich, A. L., Kalousek, F., Fenton, W. A., Pollock, R. A., Rosenberg, L. E. (1986) Targeting of pre-ornithine transcarbamylase to mitochondria: definition of critical regions and residues in the leader peptide. Cell. 44(3), 451-459.
111. Howard, M. J., Lim, W. H., Fierke, C. A., Koutmos, M. (2012). Mitochondrial ribonuclease P structure provides insight into the evolution of catalytic strategies for precursor-tRNA 5' processing. Proc Natl Acad Sci USA. 109(40), 16149-16154.
112. Hudson, B. P., Martinez-Yamout, M. A., Dyson, H. J., Wright, P. E. (2004). Recognition of the mRNA AU-rich element by the zinc finger domain of TIS11d. Nat Struct Mol Biol. 11, 257–264.
113. Isaacs, F. J., Dwyer, D. J., Collins, J. J. (2006) RNA synthetic biology. Nat Biotechnol. 24(5), 545-554
114. Isaacs, F. J., Dwyer, D. J., Ding, C., Pervouchine, D. D., Cantor, C. R., Collins, J. J. (2004) Engineered riboregulators enable post-transcriptional control of gene expression. Nat Biotechnol. 22(7), 841-847.
115. Ito, T., Tashiro, K., Muta, S., Ozawa, R., Chiba, T., Nishizawa, M., Yamamoto, K., Kuhara, S., Sakaki, Y. (2000) Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad Sci USA. 97(3), 1143-1147.
116. Jackson Jr., J. S., Houshmandi, S. S., Lopez Leban, F., and Olivas, W. M. (2004). Recruitment of the Puf3 protein to its mRNA target for regulation of mRNA decay in yeast. RNA 10, 1625–1636
117. Jan, E., Motzny, C. K., Graves, L. E., and Goodwin, E. B. (1999). The STAR protein, GLD-1, is a translational regulator of sexual identity in Caenorhabditis elegans. EMBO J 18, 258-269
118. Janga, S. C. (2012). From specific to global analysis of posttranscriptional regulation in eukaryotes: posttranscriptional regulatory networks. Brief Funct Genomics. [Epub ahead of print]
119. Jiao, X., Xiang, S., Oh, C., Martin, C. E., Tong, L., Kiledjian, M. (2010) Identification of a quality-control mechanism for mRNA 5'-end capping. Nature. 467(7315), 608-611.
120. Kahvejian, A., Svitkin, Y. V., Sukarieh, R., M’Boutchou, M. N., Sonenberg, N. (2005). Mammalian poly(A)-binding protein is a eukaryotic translation initiation factor, which acts via multiple mechanisms. Genes Dev. 19, 104–113
121. Kanki, T., Ohgaki, K., Gaspari, M., Gustafsson, C. M., Fukuoh, A., Sasaki, N., Hamasaki, N., Kang, D. (2004) Architectural role of mitochondrial transcription factor A in maintenance of human mitochondrial DNA. Mol Cell Biol. 24, 9823–9834.
122. Kaye, J. A., Rose, N. C., Goldsworthy, B., Goga, A., L'Etoile, N. D. (2009). A 3′UTR pumilio-binding element directs translational activation in olfactory sensory neurons. Neuron. 61, 57–70.
123. Kedde, M., van Kouwenhove, M., Zwart, W., Oude Vrielink, J. A., Elkon, R., Agami, R. (2010). A Pumilio-induced RNA structure switch in p27-3' UTR controls miR-221 and miR-222 accessibility. Nat Cell Biol. 12(10), 1014-1020
124. Keene, J. D., and Tenenbaum, S. A. (2002). Eukaryotic mRNPs may represent posttranscriptional operons. Mol Cell 9, 1161-1167.
125. Kennedy, B. K., Austriaco, N. R. Jr, Zhang, J., Guarente, L. (1995) Mutation in the silencing gene SIR4 can delay aging in S. cerevisiae. Cell 80, 485–496.
126. Kennedy, B. K., Gotta, M., Sinclair, D. A., Mills, K., McNabb, D. S., Murthy, M., Pak, S. M., Laroche, T., Gasser, S. M., Guarente, L. (1997) Redistribution of silencing proteins from telomeres to the nucleolus is associated with extension of life span in S. cerevisiae. Cell 89, 381–391
127. Kennerdell, J. R., Carthew, R. W. (1998). Use of dsRNA-mediated genetic interference to demonstrate that frizzled and frizzled 2 act in the wingless pathway. Cell 95, 1017–1026
128. Kim, M. Y., Hur, J., Jeong, S. (2009). Emerging roles of RNA and RNA-binding protein network in cancer cells. BMB Rep 42, 125–130.
129. Kishore, S., Luber, S., Zavolan, M. (2010) Deciphering the role of RNA-binding proteins in the post-transcriptional control of gene expression. Brief Funct Genomics. 9(5-6), 391-404.
130. Kloc, M., Bilinski, S., Chan, A. P., Allen, L. H., Zearfoss, N. R., Etkin, L. D. (2001). RNA localization and germ cell determination in Xenopus. Int Rev Cytol 203, 63-91.
131. Kobayashi, K., Kawabata, M., Hisano, K., Kazama, T., Matsuoka, K., Sugita, M., Nakamura, T. (2012). Identification and characterization of the RNA binding surface of the pentatricopeptide repeat protein. Nucleic Acids Res. 40(6), 2712-2723.
132. Kobayashi, K., Suzuki, M., Tang, J., Nagata, N., Ohyama, K., Seki, H., Kiuchi, R., Kaneko, Y., Nakazawa, M., Matsui, M., Matsumoto, S., Yoshida, S., Muranaka, T. (2007) Lovastatin insensitive 1, a novel pentatricopeptide repeat protein, is a potential regulatory factor of isoprenoid biosynthesis in Arabidopsis. Plant Cell Physiol. 48(2), 322-331.
133. Kobe, B. and Kajava, A. V. (2000) When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem. Sci. 25 (10), 509-515
134. Koc, E. C., Burkhart, W., Blackburn, K., Moyer, M. B., Schlatzer, D. M., Moseley, A., Spremulli, L. L. (2001). The large subunit of the mammalian mitochondrial ribosome. Analysis of the complement of ribosomal proteins present. J Biol Chem. 276, 43958–43969.
135. Koh, Y. Y., Wang, Y., Qiu, C., Opperman, L., Gross, L., Tanaka Hall, T. M., Wickens, M. (2011). Stacking interactions in PUF-RNA complexes. RNA. 17(4), 718-727.
136. Kohl, A., Binz, H. K., Forrer, P., Stumpp, M. T., Plückthun, A., Grütter, M. G. (2003) Designed to be stable: crystal structure of a consensus ankyrin repeat protein. Proc Natl Acad Sci USA. 100(4), 1700-1705.
137. Kolansky, D. M., Conboy, J. G., Fenton, W. A., Rosenberg, L. E. (1982). Energy-dependent translocation of the precursor of ornithine transcarbamylase by isolated rat liver mitochondria. J Biol Chem. 257(14), 8467-8471.
138. Kotera, E., Tasaka, M., Shikanai, T. (2005). A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433(7023), 326-330.
139. Koussevitzky, S., Nott, A., Mockler, T. C., Hong, F., Sachetto-Martins, G., Surpin, M., Lim, J., Mittler, R., Chory, J. (2007). Signals from chloroplasts converge to regulate nuclear gene expression. Science. 316(5825), 715-719
140. Kuehner, J. N., Pearson, E. L., Moore, C. (2011). Unravelling the means to an end: RNA polymerase II transcription termination. Nat Rev Mol Cell Biol 12, 283–294
141. Kühl, I., Dujeancourt, L., Gaisne, M., Herbert, C. J., Bonnefoy, N. (2011). A genome wide study in fission yeast reveals nine PPR proteins that regulate mitochondrial gene expression. Nucleic Acids Res. 39(18), 8029-8041.
142. Kumar, S., Bansal, M. (1998) Dissecting alpha-helices: position-specific analysis of alpha-helices in globular proteins. Proteins. 31, 460–476
143. Lamont, L. B., Crittenden, S. L., Bernstein, D., Wickens, M., Kimble, J. (2004) FBF-1 and FBF-2 regulate the size of the mitotic region in the C. elegans germline. Dev Cell 7, 697–707.
144. Latham, V. M., Jr., Kislauskis, E. H., Singer, R. H., Ross, A. F. (1994). Beta-actin mRNA localization is regulated by signal transduction mechanisms. J Cell Biol 126, 1211-1219.
145. Lee, E., Yeo, A., Kraemer, B., Wickens, M., and Linial, M. L. (1999). The gag domains required for avian retroviral RNA encapsidation determined by using two independent assays. J Virol 73, 6282-6292.
146. Lee, M. H., Hook, B., Pan, G., Kershner, A. M., Merritt, C., Seydoux, G., Thomson, J. A., Wickens, M., Kimble, J. (2007). Conserved regulation of MAP kinase expression by PUF RNA-binding proteins. PLoS Genet. 3(12), e233.
147. Lehmann, R., Nusslein-Volhard, C. (1987). Involvement of the pumilio gene in the transport of an abdominal signal in the Drosophila embryo. Nature 329, 167–170.
148. Levadoux, M., Mahon, C., Beattie, J. H., Wallace, H. M., and Hesketh, J. E. (1999). Nuclear import of metallothionein requires its mRNA to be associated with the perinuclear cytoskeleton. J Biol Chem 274, 34961-34966.
149. Levinger, L., Morl, M., Florentz, C. (2004) Mitochondrial tRNA 3’ end metabolism and human disease. Nucleic Acids Res 32, 5430–5441.
150. Li, T., Huang, S., Zhao, X., Wright, D. A., Carpenter, S., Spalding, M. H., Weeks, D. P., Yang, B. (2011). Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes. Nucleic Acids Res. 39(14), 6315-6325.
151. Licatalosi, D. D., Geiger, G., Minet, M., Schroeder, S., Cilli, K., McNeil, J. B., Bentley, D. L. (2002). Functional interaction of yeast pre-mRNA 3′ end processing factors with RNA polymerase II. Mol. Cell. 9, 1101–1111
152. Lightowlers, R. N., Chrzanowska-Lightowlers, Z. M. (2008). PPR (pentatricopeptide repeat) proteins in mammals: important aids to mitochondrial gene expression. Biochem J. 416(1), e5-6
153. Lin, H., Spradling, A. C. (1997) A novel group of pumilio mutations affects the asymmetric division of germline stem cells in the Drosophila ovary. Development 124, 2463–2476.
154. Lipinski, K. A., Puchta, O., Surendranath, V., Kudla, M., Golik, P. (2011). Revisiting the yeast PPR proteins--application of an Iterative Hidden Markov Model algorithm reveals new members of the rapidly evolving family. Mol Biol Evol. 28(10), 2935-2948.
155. Liu, Q., Paroo, Z. (2010). Biochemical principles of small RNA pathways. Annu Rev Biochem. 79, 295-319
156. Lodish, H., Berk, A., Zipursky, S. L., et al. (2000). Molecular Cell Biology. 4th edition. New York: W. H. Freeman. Section 11.2, Processing of Eukaryotic mRNA. Available from: http://www.ncbi.nlm.nih.gov/books/NBK21563/
157. Long, R. M., Singer, R. H., Meng, X., Gonzalez, I., Nasmyth, K., Jansen, R. P. (1997). Mating type switching in yeast controlled by asymmetric localization of ASH1 mRNA. Science 277, 383-387.
158. Lopez Sanchez, M. I. G., Mercer, T. R., Davies, S. M., Shearwood, A-M. J., Nygard, K. K. A., Richman, T. R., Mattick, J. S., Rackham, O., Filipovska, A. (2011) RNA processing in human mitochondria. Cell Cycle 10, 1–13
159. Loughlin, F. E., Mansfield, R. E., Vaz, P. M., McGrath, A. P., Setiyaputra, S., Gamsjaeger, R., Chen, E. S., Morris, B. J., Guss, J. M., Mackay, J. P. (2009) The zinc fingers of the SR-like protein ZRANB2 are single stranded RNA-binding domains that recognize 5’ splice site-like sequences. Proc Natl Acad Sci USA. 106, 5581–5586.
160. Lu, D., Searles, M. A., Klug, A. (2003). Crystal structure of a zinc-finger-RNA complex reveals two modes of molecular recognition. Nature. 426, 96–100.
161. Lublin, A. L., Evans, T. C. (2007) The RNA-binding proteins PUF-5, PUF-6, and PUF-7 reveal multiple systems for maternal mRNA regulation during C. elegans oogenesis. Dev Biol 303, 635–649.
162. Lukong, K. E., Chang, K. W., Khandjian, E. W., Richard, S. (2008). RNA-binding proteins in human genetic disease. Trends Genet 24, 416–425.
163. Lurin, C., Andrés, C., Aubourg, S., Bellaoui, M., Bitton, F., Bruyère, C., Caboche, M., Debast, C., Gualberto, J., Hoffmann, B., Lecharny, A., Le Ret, M., Martin-Magniette, M. L., Mireau, H., Peeters, N., Renou, J. P., Szurek, B., Taconnat, L., Small, I. (2004). Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16(8), 2089-2103.
164. Mackay, J. P., Font, J., Segal, D. J. (2011). The prospects for designer single-stranded RNA-binding proteins. Nat Struct Mol Biol. 18, 256-261.
165. Main, E. R., Jackson, S. E., Regan, L. (2003). The folding and design of repeat proteins: reaching a consensus. Curr Opin Struct Biol. 13(4), 482-489.
166. Mandel, C. R., Bai, Y., Tong, L. (2008) Protein factors in pre-mRNA 3'-end processing. Cell Mol Life Sci 65, 1099-1122
167. Maniatis, T., and Reed, R. (2002). An extensive network of coupling among gene expression machines. Nature 416, 499-506
168. Margeot, A., Blugeon, C., Sylvestre, J., Vialette, S., Jacq, C., Corral-Debrinski, M. (2002). In Saccharomyces cerevisiae, ATP2 mRNA sorting to the vicinity of mitochondria is essential for respiratory function. EMBO J 21, 6893-6904.
169. Martin, F., Schaller, A., Eglite, S., Schumperli, D., and Muller, B. (1997). The gene for histone RNA hairpin binding protein is located on human chromosome 4 and encodes a novel type of RNA binding protein. EMBO J 16, 769-778.
170. McCracken, S., Fong, N., Rosonina, E., Yankulov, K., Brothers, G., Siderovski, D., Hessel, A., Foster, S., Shuman, S., Bentley, D. L. (1997a). 5'-Capping enzymes are targeted to pre-mRNA by binding to the phosphorylated carboxy-terminal domain of RNA polymerase II. Genes Dev 11, 3306-3318.
171. McCracken, S., Fong, N., Yankulov, K., Ballantyne, S., Pan, G., Greenblatt, J., Patterson, S. D., Wickens, M., Bentley, D. L. (1997b). The C-terminal domain of RNA polymerase II couples mRNA processing to transcription. Nature 385, 357-361.
172. McCullough, A. J., and Schuler, M. A. (1997). Intronic and exonic sequences modulate 5' splice site selection in plant nuclei. Nucleic Acids Res 25, 1071-1077.
173. Mee, C. J., Pym, E. C., Moffat, K. G., Baines, R. A. (2004). Regulation of neuronal excitability through pumilio-dependent control of a sodium channel gene. J Neurosci 24, 8695–8703
174. Menon, K. P., Andrews, S., Murthy, M., Gavis, E. R., Zinn, K. (2009). The translational repressors Nanos and Pumilio have divergent effects on presynaptic terminal growth and postsynaptic glutamate receptor subunit composition. J Neurosci. 29, 5558–5572.
175. Menon, K. P., Sanyal, S., Habara, Y., Sanchez, R., Wharton, R. P., Ramaswami, M., Zinn, K. (2004) The translational repressor Pumilio regulates presynaptic morphology and controls postsynaptic accumulation of translation factor eIF-4E. Neuron. 44, 663–676.
176. Mercer, T. R., Neph, S., Dinger, M. E., Crawford, J., Smith, M. A., Shearwood, A. M., Haugen, E., Bracken, C. P., Rackham, O., Stamatoyannopoulos, J. A., Filipovska, A., Mattick, J. S. (2011) The human mitochondrial transcriptome. Cell 146, 645–658.
177. Merrick, W. C., and Sonenberg, N. (1997). Assays for eukaryotic translation factors that bind mRNA. Methods 11, 333-342.
178. Mili, S., Piñol-Roma, S. (2003). LRP130, a pentatricopeptide motif protein with a noncanonical RNA-binding domain, is bound in vivo to mitochondrial and nuclear RNAs. Mol Cell Biol. 23(14), 4972-4982.
179. Miller, J. C., Tan, S., Qiao, G., Barlow, K. A., Wang, J., Xia, D. F., Meng, X., Paschon, D. E., Leung, E., Hinkley, S. J., Dulay, G. P., Hua, K. L., Ankoudinova, I., Cost, G. J., Urnov, F. D., Zhang, H. S., Holmes, M. C., Zhang, L., Gregory, P. D., Rebar, E. J. (2010). A TALE nuclease architecture for efficient genome editing. Nat Biotechnol. 29(2), 143-148.
180. Miller, J., McLachlan, A. D., Klug, A. (1985). Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J. 4, 1609–1614.
181. Miller, S., Yasuda, M., Coats, J. K., Jones, Y., Martone, M. E., Mayford, M. (2002). Disruption of dendritic translation of CaMKIIalpha impairs stabilization of synaptic plasticity and memory consolidation. Neuron 36, 507-519.
182. Mingler, M. K., Hingst, A. M., Clement, S. L., Yu, L. E., Reifur, L., Koslowsky, D. J. (2006). Identification of pentatricopeptide repeat proteins in Trypanosoma brucei. Mol Biochem Parasitol. 150(1), 37-45.
183. Misquitta, L., Paterson, B. M. (1999) Targeted disruption of gene function in Drosophila by RNA interference (RNA-i): a role for nautilus in embryonic somatic muscle formation. Proc. Natl Acad. Sci. USA 96, 1451–1456.
184. Monshausen, M., Putz, U., Rehbein, M., Schweizer, M., DesGroseillers, L., Kuhl, D., Richter, D., and Kindler, S. (2001). Two rat brain staufen isoforms differentially bind RNA. J Neurochem 76, 155-165.
185. Montoya, J., Christianson, T., Levens, D., Rabinowitz, M., Attardi, G. (1982) Identification of initiation sites for heavy-strand and light-strand transcription in human mitochondrial DNA. Proc Natl Acad Sci USA 79, 7195–7199.
186. Montoya, J., Ojala, D., Attardi, G. (1981) Distinctive features of the 5’-terminal sequences of the human mitochondrial mRNAs. Nature 290, 465–470
187. Mootha, V. K., Lepage, P., Miller, K., Bunkenborg, J., Reich, M., Hjerrild, M., Delmonte, T., Villeneuve, A., Sladek, R., Xu, F., Mitchell, G. A., Morin, C., Mann, M., Hudson, T. J., Robinson, B., Rioux, J. D., Lander, E. S. (2003). Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proc Natl Acad Sci USA. 100, 605–610.
188. Morbitzer, R., Römer, P., Boch, J., Lahaye, T. (2010). Regulation of selected genome loci using de novo-engineered transcription activator-like effector (TALE)-type transcription factors. Proc Natl Acad Sci USA. 107(50), 21617-21622.
189. Mori, M., Miura, S., Morita, T., Takiguchi, M., Tatibana, M. (1982) Ornithine transcarbamylase in liver mitochondria. Mol Cell Biochem. 49(2), 97-111.
190. Mori, M., Morita, T., Ikeda, F., Amaya, Y., Tatibana, M., Cohen, P. P. (1981) Synthesis, intracellular transport, and processing of the precursors for mitochondrial ornithine transcarbamylase and carbamoyl-phosphate synthetase I in isolated hepatocytes. Proc Natl Acad Sci USA. 78(10), 6056-6060.
191. Morris, A. R., Mukherjee, N., Keene, J. D. (2008) Ribonomic analysis of human Pum1 reveals cis-trans conservation across species despite evolution of diverse mRNA target sets. Mol Cell Biol. 28, 4093–4103.
192. Mosavi, L. K., Minor, D. L. Jr., Peng, Z. Y. (2002) Consensus-derived structural determinants of the ankyrin repeat motif. Proc Natl Acad Sci USA. 99(25), 16029-16034.
193. Moscou, M. J., Bogdanove, A. J. (2009). A simple cipher governs DNA recognition by TAL effectors. Science. 326, 1501.
194. Muraro, N. I., Weston, A. J., Gerber, A. P., Luschnig, S., Moffat, K. G., Baines, R. A. (2008). Pumilio binds para mRNA and requires Nanos and Brat to regulate sodium current in Drosophila motoneurons. J Neurosci. 28, 2099–2109.
195. Murata, Y., Wharton, R. P. (1995). Binding of pumilio to maternal hunchback mRNA is required for posterior patterning in Drosophila embryos. Cell. 80(5), 747-756.
196. Murphy, W. I., Attardi, B., Tu, C., Attardi, G. (1975) Evidence for complete symmetrical transcription in vivo of mitochondrial DNA in HeLa cells. J Mol Biol 99, 809–814.
197. Mussolino, C., Morbitzer, R., Lütge, F., Dannemann, N., Lahaye, T., Cathomen, T. (2011). A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity. Nucleic Acids Res. 39(21), 9283-9293.
198. Musunuru, K. (2003) Cell-specific RNA-binding proteins in human disease. Trends Cardiovasc Med 13, 188–195.
199. Nagaike, T., Suzuki, T., Katoh, T., Ueda, T. (2005) Human mitochondrial mRNAs are stabilized with polyadenylation regulated by mitochondria-specific poly(A) polymerase and polynucleotide phosphorylase. J Biol Chem 280, 19721–19727.
200. Nagaike, T., Suzuki, T., Tomari, Y., Takemoto-Hori, C., Negayama, F., Watanabe, K., Ueda, T. (2001) Identification and characterization of mammalian mitochondrial tRNA nucleotidyltransferases. J Biol Chem 276, 40041–40049
201. Nagaike, T., Suzuki, T., Ueda, T. (2008) Polyadenylation in mammalian mitochondria: insights from recent studies. Biochim Biophys Acta 1779, 266–269
202. Nakahata, S., Katsu, Y., Mita, K., Inoue, K., Nagahama, Y., Yamashita, M. (2001). Biochemical identification of Xenopus Pumilio as a sequence-specific cyclin B1 mRNA-binding protein that physically interacts with a Nanos homolog, Xcat-2, and a cytoplasmic polyadenylation element-binding protein. J. Biol. Chem. 276, 20945–20953
203. Nakahata, S., Kotani, T., Mita, K., Kawasaki, T., Katsu, Y., Nagahama, Y., Yamashita, M. (2003). Involvement of Xenopus Pumilio in the translational regulation that is specific to cyclin B1 mRNA during oocyte maturation. Mech Dev 120, 865–880.
204. Nakamura, T., Meierhoff, K., Westhoff, P., Schuster, G. (2003). RNA-binding properties of HCF152, an Arabidopsis PPR protein involved in the processing of chloroplast RNA. Eur J Biochem. 270(20), 4070-4081.
205. National Center for Biotechnology Information, PSSM Viewer, Accessed 15th July 2011, http://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi.
206. Nolde, M. J., Saka, N., Reinert, K. L., Slack, F. J. (2007) The Caenorhabditis elegans pumilio homolog, puf-9, is required for the 3′UTR-mediated repression of the let-7 microRNA target gene, hbl-1. Dev Biol 305(2), 551-563.
207. Nolte, R. T., Conlin, R. M., Harrison, S. C., Brown, R. S. (1998). Differing roles for zinc fingers in DNA recognition: structure of a six-finger transcription factor IIIA complex. Proc Natl Acad Sci USA. 95, 2938–2943.
208. Novy, R., Morris, B. (2001) Use of glucose to control basal expression in the pET System [Article], inNovations, 13, 8-10, retrieved from http://wolfson.huji.ac.il/expression/procedures/bacterial/Glucose%20supression.pdf
209. O’Brien, T. W. (1971) The general occurrence of 55 S ribosomes in mammalian liver mitochondria. J Biol Chem 246, 3409–3417.
210. Oberstrass, F. C., Auweter, S. D., Erat, M., Hargous, Y., Henning, A., Wenter, P., Reymond, L., Amir-Ahmady, B., Pitsch, S., Black, D. L., Allain, F. H. (2005). Structure of PTB bound to RNA: specific binding and implications for splicing regulation. Science. 309(5743), 2054-2057.
211. Ojala, D., Montoya, J., Attardi, G. (1981) tRNA punctuation model of RNA processing in human mitochondria. Nature 290, 470–474.
212. Okuda, K., Myouga, F., Motohashi, R., Shinozaki, K., Shikanai, T. (2007). Conserved domain structure of pentatricopeptide repeat proteins involved in chloroplast RNA editing. Proc Natl Acad Sci USA. 104(19), 8178-8183.
213. Oleynikov, Y., and Singer, R. H. (2003). Real-Time Visualization of ZBP1 Association with beta-Actin mRNA during Transcription and Localization. Curr Biol 13, 199-207.
214. Olivas, W., Parker, R. (2000). The Puf3 protein is a transcript-specific regulator of mRNA degradation in yeast. EMBO J. 19(23), 6602-6611
215. O'Toole, N., Hattori, M., Andres, C., Iida, K., Lurin, C., Schmitz-Linneweber, C., Sugita, M., Small, I. (2008). On the expansion of the pentatricopeptide repeat gene family in plants. Mol Biol Evol. 25(6):1120-1128
216. Ozawa, T., Natori, Y., Sato, M., Umezawa, Y. (2007). Imaging dynamics of endogenous mitochondrial RNA in single living cells. Nat Methods. 4, 413-419.
217. Padmanabhan, K., Richter, J. D. (2006) Regulated Pumilio-2 binding controls RINGO/Spy mRNA translation and CPEB activation. Genes Dev 20, 199–209.
218. Park, Y. W., Wilusz, J., and Katze, M. G. (1999). Regulation of eukaryotic protein synthesis: selective influenza viral mRNA translation is mediated by the cellular RNA-binding protein GRSF-1. Proc Natl Acad Sci U S A 96, 6694-6699.
219. Parker, R., Song, H. (2004). The enzymes and control of eukaryotic mRNA turnover. Nat Struct Mol Biol. 11(2), 121-127
220. Patel, S. B., Bellini, M. (2008). The assembly of a spliceosomal small nuclear ribonucleoprotein particle. Nucleic Acids Res. 36(20), 6482-6493.
221. Pavletich, N.P., Pabo, C.O. (1991). Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 A. Science. 252, 809–817.
222. Pelham, H. R., Brown, D. D. (1980). A specific transcription factor that can bind either the 5S RNA gene or 5S RNA. Proc Natl Acad Sci USA. 77(7), 4170-4174
223. Perrimon, N., Ni, J. Q., Perkins, L. (2010). In vivo RNAi: today and tomorrow. Cold Spring Harb Perspect Biol. 2(8), a003640.
224. Pfalz, J., Bayraktar, O. A., Prikryl, J., Barkan, A. (2009). Site-specific binding of a PPR protein defines and stabilizes 5' and 3' mRNA termini in chloroplasts. EMBO J. 28(14), 2042-2052.
225. Picard, B., Wegnez, M. (1979). Isolation of a 7S particle from Xenopus laevis oocytes: a 5S RNA-protein complex. Proc Natl Acad Sci USA. 76, 241-245.
226. Pieler, T., Hamm, J., Roeder, R. G. (1987). The 5S gene internal control region is composed of three distinct sequence elements, organized as two functional domains with variable spacing. Cell. 48, 91-100.
227. Pique, M., Lopez, J. M., Foissac, S., Guigo, R., Mendez, R. (2008). A combinatorial code for CPE-mediated translational control. Cell. 132, 434–448.
228. Pradet-Balade, B., Boulme, F., Beug, H., Mullner, E. W., Garcia-Sanz, J. A. (2001). Translation control: bridging the gap between genomics and proteomics? Trends Biochem Sci 26, 225-229.
229. Prikryl, J., Rojas, M., Schuster, G., Barkan, A. (2011). Mechanism of RNA stabilization and translational activation by a pentatricopeptide repeat protein. Proc Natl Acad Sci USA. 108(1), 415-420.
230. Prinz, S., Aldridge, C., Ramsey, S. A., Taylor, R. J., Galitski, T. (2007). Control of signaling in a MAP-kinase pathway by an RNA-binding protein. PLoS One. 2, e249.
231. Prostko, C. R., Brostrom, M. A., Brostrom, C. O. (1993). Reversible phosphorylation of eukaryotic initiation factor 2 alpha in response to endoplasmic reticular signaling. Mol Cell Biochem 127-128, 255-265.
232. Putz, U., Skehel, P., and Kuhl, D. (1996). A tri-hybrid system for the analysis and detection of RNA--protein interactions. Nucleic Acids Res 24, 4838-4840.
233. Quenault, T., Lithgow, T., Traven, A. (2011) PUF proteins: repression, activation and mRNA localization. Trends Cell Biol. 21(2), 104-112
234. Rackham, O., Chin, J. W. (2005) A network of orthogonal ribosome x mRNA pairs. Nat Chem Biol. 1(3), 159-166.
235. Rackham, O., Davies, S. M., Shearwood, A. M., Hamilton, K. L., Whelan, J., Filipovska, A. (2009). Pentatricopeptide repeat domain protein 1 lowers the levels of mitochondrial leucine tRNAs in cells. Nucleic Acids Res. 37(17), 5859-5867.
236. Rackham, O., Filipovska, A. (2012). The role of mammalian PPR domain proteins in the regulation of mitochondrial gene expression. Biochim Biophys Acta. 1819(9-10),1008-1016.
237. Rackham, O., Mercer, T. R., Filipovska, A. (2012). The human mitochondrial transcriptome and the RNA-binding proteins that regulate its expression. Wiley Interdiscip Rev RNA. 3(5), 675-695.
238. Rackham, O., Shearwood, A. M., Mercer, T. R., Davies, S. M., Mattick, J. S., Filipovska, A. (2011) Long noncoding RNAs are generated from the mitochondrial genome and regulated by nuclear-encoded proteins. RNA 17, 2085–2093
239. Rappsilber, J., Ryder, U., Lamond, A. I., Mann, M. (2002). Large-scale proteomic analysis of the human spliceosome. Genome Res 12, 1231-1245.
240. Richard, P., Manley, J. L. (2009). Transcription termination by nuclear RNA polymerases. Genes Dev, 23, 1247–1269.
241. Richardson, J. S., Richardson, D. C. (1988) Amino acid preferences for specific locations at the ends of alpha helices. Science 240, 1648–1652
242. Ringel, R., Sologub, M., Morozov, Y. I., Litonin, D., Cramer, P., Temiakov, D. (2011) Structure of human mitochondrial RNA polymerase. Nature. 478(7368), 269-273.
243. Rodeheffer, M. S., Boone, B. E., Bryan, A. C., Shadel, G. S. (2001). Nam1p, a protein involved in RNA processing and translation, is coupled to transcription through an interaction with yeast mitochondrial RNA polymerase. J Biol Chem. 276(11), 8616-8622.
244. Rodeheffer, M. S., Shadel, G. S. (2003) Multiple interactions involving the amino-terminal domain of yeast mtRNA polymerase determine the efficiency of mitochondrial protein synthesis. J Biol Chem 278, 18695–18701.
245. Rodriguez, C. R., Cho, E. J., Keogh, M. C., Moore, C. L., Greenleaf, A. L., Buratowski, S. (2000). Kin28, the TFIIH-associated carboxy-terminal domain kinase, facilitates the recruitment of mRNA processing machinery to RNA polymerase II. Mol. Cell. Biol. 20, 104–112.
246. Rorbach, J., Minczuk, M. (2012). The post-transcriptional life of mammalian mitochondrial RNA. Biochem. J. 444, 357–373
247. Rorbach, J., Nicholls, T. J., Minczuk, M. (2011) PDE12 removes mitochondrial RNA poly(A) tails and controls translation in human mitochondria. Nucleic Acids Res. 39, 7750–7763.
248. Rossmanith, W., Holzmann, J. (2009). Processing mitochondrial (t)RNAs: new enzyme, old job. Cell Cycle 8, 1650–1653.
249. Ruzzenente, B., Metodiev, M. D., Wredenberg, A., Bratic, A., Park, C. B., Camara, Y., Milenkovic, D., Zickermann, V., Wibom, R., Hultenby, K., Erdjument-Bromage, H., Tempst, P., Brandt, U., Stewart, J. B., Gustafsson, C. M., Larsson, N. G. (2011). LRPPRC is necessary for polyadenylation and coordination of translation of mitochondrial mRNAs. EMBO J 31, 443–456.
250. Rybak, A., Fuchs, H., Hadian, K., Smirnova, L., Wulczyn, E. A., Michel, G., Nitsch, R., Krappmann, D., Wulczyn, F. G. (2009). The let-7 target gene mouse lin-41 is a stem cell specific E3 ubiquitin ligase for the miRNA pathway protein Ago2. Nat Cell Biol 11(12), 1411-1420
251. Saint-Georges, Y., Garcia, M., Delaveau, T., Jourdren, L., Le Crom, S., Lemoine, S., Tanty, V., Devaux, F., Jacq, C. (2008) Yeast mitochondrial biogenesis: a role for the PUF RNA-binding protein Puf3p in mRNA localization. PLoS One 3, e2293.
252. Salvetti, A., Rossi, L., Lena, A., Batistoni, R., Deri, P., Rainaldi, G., Locci, M. T., Evangelista, M., Gremigni, V. (2005) DjPum, a homologue of Drosophila Pumilio, is essential to planarian stem cell maintenance. Development 132, 1863–1874.
253. Sasarman, F., Brunel-Guitton, C., Antonicka, H., Wai, T., Shoubridge, E. A.; LSFC Consortium. (2010) LRPPRC and SLIRP interact in a ribonucleoprotein complex that regulates posttranscriptional gene expression in mitochondria. Mol Biol Cell. 21(8), 1315-1323.
254. Scheper, G. C., Thomas, A. A., van Wijk, R. (1998). Inactivation of eukaryotic initiation factor 2B in vitro by heat shock. Biochem J 334, 463-467.
255. Schmitz-Linneweber, C., and Small, I. (2008) Pentatricopeptide repeat proteins: a socket set for organelle gene expression. Trends Plant Sci. 13(12):663-670.
256. Schmitz-Linneweber, C., Williams-Carrier, R. E., Williams-Voelker, P. M., Kroeger, T. S., Vichas, A., Barkan, A. (2006). A pentatricopeptide repeat protein facilitates the trans-splicing of the maize chloroplast rps12 pre-mRNA. Plant Cell. 18(10), 2650-2663.
257. Schoenberg, D. R., and Maquat, L. E. (2012) Regulation of cytoplasmic mRNA decay. Nat Rev Genet. 13(4), 246-259.
258. Sengupta, D. J., Wickens, M., and Fields, S. (1999). Identification of RNAs that bind to a specific protein using the yeast three-hybrid system. RNA 5, 596-601.
259. SenGupta, D. J., Zhang, B., Kraemer, B., Pochart, P., Fields, S., Wickens, M. (1996) A three-hybrid system to detect RNA-protein interactions in vivo. Proc Natl Acad Sci U S A 93, 8496-8501.
260. Sera, T. (2009) Zinc-finger-based artificial transcription factors and their applications. Adv Drug Deliv Rev. 61(7-8):513-26.
261. Sharma, M. R., Koc, E. C., Datta, P. P., Booth, T.M., Spremulli, L.L., Agrawal, R. K. (2003) Structure of the mammalian mitochondrial ribosome reveals an expanded functional role for its component proteins. Cell 115, 97–108.
262. Shestakova, E. A., Wyckoff, J., Jones, J., Singer, R. H., Condeelis, J. (1999). Correlation of beta-actin messenger RNA localization with metastatic potential in rat adenocarcinoma cell lines. Cancer Res 59, 1202-1205.
263. Shikanai, T. (2006). RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63(6), 698-708.
264. Shimizu, Y., Bhakta, M. S., Segal, D. J. (2009) Restricted spacer tolerance of a zinc finger nuclease with a six amino acid linker. Bioorg. Med. Chem. Lett. 19 (14), 3970–3972.
265. Shoubridge, E. A. (2001). Nuclear genetic defects of oxidative phosphorylation. Hum Mol Genet 10, 2277–2284.
266. Shyu, A. B., Wilkinson, M. F. (2000). The double lives of shuttling mRNA binding proteins. Cell. 102(2), 135-138.
267. Sikorski, T. W., Buratowski, S. (2009). The basal initiation machinery: Beyond the general transcription factors. Curr Opin Cell Biol 21, 344–351
268. Slomovic, S., Laufer, D., Geiger, D., Schuster, G. (2005) Polyadenylation and degradation of human mitochondrial RNA: the prokaryotic past leaves its mark. Mol Cell Biol 25, 6427–6435.
269. Small, I. D., Peeters, N. (2000) The PPR motif - a TPR-related motif prevalent in plant organellar proteins. Trends Biochem Sci. 25(2), 46-47
270. Smeitink, J., van den Heuvel, L., DiMauro, S. (2001). The genetics and pathology of oxidative phosphorylation. Nat Rev Genet 2, 342–352
271. Smith, C.W., Valcárcel, J. (2000) Alternative pre-mRNA splicing: the logic of combinatorial control. Trends Biochem Sci. 25(8), 381-388.
272. Smith, P. K., Krohn, R. I., Hermanson, G. T., Mallia, A. K., Gartner, F. H., Provenzano, M. D., Fujimoto, E. K., Goeke, N. M., Olson, B. J., Klenk, D. C. (1985). Measurement of protein using bicinchoninic acid. Anal Biochem. 150(1), 76-85.
273. Smits, P., Smeitink, J. A., van den Heuvel, L. P., Huynen, M. A., Ettema, T. J. (2007). Reconstructing the evolution of the mitochondrial ribosomal proteome. Nucleic Acids Res. 35, 4686–4703.
274. Sondheimer, N., Fang, J. K., Polyak, E., Falk, M. J., Avadhani, N. G. (2010). Leucine-rich pentatricopeptide-repeat containing protein regulates mitochondrial transcription. Biochemistry. 49(35), 7467-7473
275. Sonnhammer, E. L., Eddy, S. R., Durbin, R. (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins. 28(3), 405-420.
276. Sonoda, J., Wharton, R. P. (1999). Recruitment of Nanos to hunchback mRNA by Pumilio. Genes Dev. 13(20), 2704-2712.
277. Souza, G. M., da Silva, A. M., Kuspa, A. (1999). Starvation promotes Dictyostelium development by relieving PufA inhibition of PKA translation through the YakA kinase pathway. Development 126, 3263–3274
278. Spassov, D. S., Jurecic, R. (2002). Cloning and comparative sequence analysis of PUM1 and PUM2 genes, human members of the Pumilio family of RNA-binding proteins. Gene. 299(1-2), 195-204
279. Spik, A., Oczkowski, S., Olszak, A., Formanowicz, P., Blazewicz, J., Jaruzelska, J. (2006) Human fertility protein PUMILIO2 interacts in vitro with testis mRNA encoding Cdc42 effector 3 (CEP3). Reprod Biol 6(2), 103-113.
280. Srisawat, C., Engelke, D. R. (2001). Streptavidin aptamers: affinity tags for the study of RNAs and ribonucleoproteins. RNA 7, 632-641.
281. Srisawat, C., Goldstein, I. J., Engelke, D. R. (2001). Sephadex-binding RNA ligands: rapid affinity purification of RNA from complex RNA mixtures. Nucleic Acids Res 29, E4.
282. St Johnston, D. (2005) Moving messages: the intracellular localization of mRNAs. Nat Rev Mol Cell Biol. 6(5), 363-375.
283. Stark, G. R., Kerr, I. M., Williams, B. R., Silverman, R. H., Schreiber, R. D. (1998). How cells respond to interferons. Annu Rev Biochem. 67, 227-264.
284. Steinmetz, E. J. (1997). Pre-mRNA processing and the CTD of RNA polymerase II: the tail that wags the dog? Cell 89, 491-494.
285. Sterky, F. H., Ruzzenente, B., Gustafsson, C. M., Samuelsson, T., Larsson, N. G. (2010). LRPPRC is a mitochondrial matrix protein that is conserved in metazoans. Biochem Biophys Res Commun. 398, 759–764.
286. Stern, B., Olsen, L. C., Tröße, C., Ravneberg, H., Pryme, I. F. (2007). Improving mammalian cell factories: The selection of signal peptide has a major impact on recombinant protein synthesis and secretion in mammalian cells. Trends Cell Mol. Biol. 2, 1-1
287. Stumpf, C. R., Opperman, L., Wickens, M. (2008) Chapter 14. Analysis of RNA-protein interactions using a yeast three-hybrid system. Methods Enzymol. 449, 295-315.
288. Stumpp, M. T., Forrer, P., Binz, H. K., Plückthun, A. (2003) Designing repeat proteins: modular leucine-rich repeat protein libraries based on the mammalian ribonuclease inhibitor family. J Mol Biol. 332(2), 471-487.
289. Subramaniam, K., Seydoux, G. (2003). Dedifferentiation of primary spermatocytes into germ cell tumors in C. elegans lacking the pumilio-like protein PUF-8. Curr Biol. 13, 134–139.
290. Suh, N., Crittenden, S. L., Goldstrohm, A., Hook, B., Thompson, B., Wickens, M., Kimble, J. (2009) FBF and its dual control of gld‐1 expression in the Caenorhabditis elegans germline. Genetics 181, 1249–1260.
291. Svoboda, P., Stein, P., Hayashi, H., Schultz, R. M. (2000) Selective reduction of dormant maternal mRNAs in mouse oocytes by RNA interference. Development 127, 4147–4156.
292. Tadauchi, T., Matsumoto, K., Herskowitz, I., and Irie, K. (2001). Post-transcriptional regulation through the HO 3′-UTR by Mpt5, a yeast homolog of Pumilio and FBF. EMBO J. 20, 552–561.
293. Takagaki, Y., and Manley, J. L. (1997). RNA recognition by the human polyadenylation factor CstF. Mol Cell Biol 17, 3907-3914.
294. Takizawa, P. A., and Vale, R. D. (2000). The myosin motor, Myo4p, binds Ash1 mRNA via the adapter protein, She3p. Proc Natl Acad Sci U S A 97, 5273-5278.
295. Takizawa, P. A., Sil, A., Swedlow, J. R., Herskowitz, I., Vale, R. D. (1997). Actin-dependent localization of an RNA encoding a cell-fate determinant in yeast. Nature 389, 90-93.
296. Tarun, S. Z. Jr, Wells, S. E., Deardorff, J. A., Sachs, A. B. (1997). Translation initiation factor eIF4G mediates in vitro poly(A) tail-dependent translation. Proc. Natl. Acad. Sci. USA 94, 9046–9051
297. Tavares-Carreón, F., Camacho-Villasana, Y., Zamudio-Ochoa, A., Shingú-Vázquez, M., Torres-Larios, A., Pérez-Martínez, X. (2008). The pentatricopeptide repeats present in Pet309 are necessary for translation but not for stability of the mitochondrial COX1 mRNA in yeast. J Biol Chem. 283(3), 1472-1479.
298. Taylor, G. A., Carballo, E., Lee, D. M., Lai, W. S., Thompson, M. J., Patel, D. D., Schenkman, D. I., Gilkeson, G. S., Broxmeyer, H. E., Haynes, B. F., and Blackshear, P. J. (1996). A pathogenetic role for TNF alpha in the syndrome of cachexia, arthritis, and autoimmunity resulting from tristetraprolin (TTP) deficiency. Immunity. 4, 445–454.
299. Tebas, P., Stein, D. (2009). Autologous T-Cells Genetically Modified at the CCR5 Gene by Zinc Finger Nucleases SB-728 for HIV. ClinicalTrials.gov
300. Temperley, R. J., Wydro, M., Lightowlers, R. N., Chrzanowska-Lightowlers, Z. M. (2010) Human mitochondrial mRNAs-like members of all families, similar but different. Biochim Biophys Acta 1797, 1081–1085.
301. Tennyson, C. N., Klamut, H. J., Worton, R. G. (1995). The human dystrophin gene requires 16 hours to be transcribed and is cotranscriptionally spliced. Nat Genet 9, 184-190.
302. Thomson, A. M., Rogers, J. T., Walker, C. E., Staton, J. M., and Leedman, P. J. (1999). Optimized RNA gel-shift and UV cross-linking assays for characterization of cytoplasmic RNA-protein interactions. Biotechniques 27, 1032-1039, 1042.
303. Tilsner, J., Linnik, O., Christensen, N. M., Bell, K., Roberts, I. M., Lacomme, C., Oparka, K. J. (2009) Live-cell imaging of viral RNA genomes using a Pumilio-based reporter. Plant J. 57(4), 758-770.
304. Trigon, S., Serizawa, H., Conaway, J. W., Conaway, R. C., Jackson, S. P., Morange, M. (1998). Characterization of the residues phosphorylated in vitro by different C-terminal domain kinases. J Biol Chem. 273(12), 6769-6775.
305. Ulbricht, R. J., Olivas, W. M. (2008). Puf1p acts in combination with other yeast Puf proteins to control mRNA stability. RNA 14, 246–262.
306. Urban, R. J., Bodenburg, Y., Kurosky, A., Wood, T. G., and Gasic, S. (2000). Polypyrimidine tract-binding protein-associated splicing factor is a negative regulator of transcriptional activity of the porcine p450scc insulin-like growth factor response element. Mol Endocrinol 14, 774-782.
307. Vaishnaw, A. K., Gollob, J., Gamba-Vitalo, C., Hutabarat, R., Sah, D., Meyers, R., de Fougerolles, T., Maraganore, J. (2010). A status report on RNAi therapeutics. Silence. 1(1), 14.
308. van Eeden, F., St Johnston, D. (1999). The polarisation of the anterior-posterior and dorsal-ventral axes during Drosophila oogenesis. Curr Opin Genet Dev 9, 396-404
309. Van Etten, J., Schagat, T. L., Hrit, J., Weidmann, C. A., Brumbaugh, J., Coon, J. J., Goldstrohm, A. C. (2012). Human Pumilio Proteins Recruit Multiple Deadenylases to Efficiently Repress Messenger RNAs. J Biol Chem. 287(43), 36370-36383
310. van Kouwenhove, M., Kedde, M., Agami, R. (2011). MicroRNA regulation by RNA-binding proteins and its implications for cancer. Nat Rev Cancer. 11, 644–656.
311. Vessey, J. P., Schoderboeck, L., Gingi, E., Luzi, E., Riefler, J., Di Leva, F., Karra, D., Thomas, S., Kiebler, M. A., Macchi, P. (2010). Mammalian Pumilio 2 regulates dendrite morphogenesis and synaptic function. Proc Natl Acad Sci USA 107, 3223–3227.
312. Vessey, J. P., Vaccani, A., Xie, Y., Dahm, R., Karra, D., Kiebler, M. A., Macchi, P. (2006). Dendritic localization of the translational repressor Pumilio 2 and its contribution to dendritic stress granules. J Neurosci 26, 6496–6508.
313. Vincent, M., Lauriault, P., Dubois, M. F., Lavoie, S., Bensaude, O., and Chabot, B. (1996). The nuclear matrix protein p255 is a highly phosphorylated form of RNA polymerase II largest subunit which associates with spliceosomes. Nucleic Acids Res 24, 4649-4652.
314. Wahl, M. C., Will, C. L., Lührmann, R. (2009). The spliceosome: design principles of a dynamic RNP machine. Cell. 136(4), 701-718
315. Wahle, E. (1995). Poly(A) tail length control is caused by termination of processive synthesis. J Biol Chem. 270, 2800-2808.
316. Walther, T. N., Wittop Koning, T. H., Schümperli, D., Müller, B. A. (1998). 5'-3' exonuclease activity involved in forming the 3' products of histone pre-mRNA processing in vitro. RNA. 4(9), 1034-1046
317. Wang, X., McLachlan, J., Zamore, P. D., Hall, T. M. (2002) Modular recognition of RNA by a human pumilio-homology domain. Cell. 110(4), 501-512.
318. Wang, Y., Cheong, C. G., Hall, T. M., Wang, Z. (2009) Engineering splicing factors with designed specificities. Nat Methods. 6(11), 825-830.
319. Wang, Z., and Burge, C. B. (2008) Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA. 14(5), 802-813.
320. Watanabe, T., Ito, Y., Yamada, T., Hashimoto, M., Sekine, S., Tanaka, H. (1994) The role of the C-terminal domain and type III domains of chitinase A1 from Bacillus circulans WL-12 in chitin degradation. J. Bacteriol. 176, 4465-4472.
321. Watkins, N. J., Segault, V., Charpentier, B., Nottrott, S., Fabrizio, P., Bachi, A., Wilm, M., Rosbash, M., Branlant, C., and Lührmann, R. (2000). A common core RNP structure shared between the small nucleolar box C/D RNPs and the spliceosomal U4 snRNP. Cell 103, 457-466.
322. Wharton, R. P., Aggarwal, A. K. (2006). mRNA regulation by Puf domain proteins. Sci STKE. 2006(354), pe37.
323. Wharton, R. P., and Struhl, G. (1991) RNA regulatory elements mediate control of Drosophila body pattern by the posterior morphogen nanos. Cell 67, 955–967.
324. Wianny, F., Zernicka-Goetz, M. (2000) Specific interference with gene functions by double-stranded RNA in early mouse development. Nature Cell Biol. 2, 70–75.
325. Wickens, M., Bernstein, D. S., Kimble, J., Parker, R. (2002) A PUF family portrait: 3'UTR regulation as a way of life. Trends Genet 18, 150-157.
326. Will, C. L., Lührmann, R. (2011) Spliceosome structure and function. Cold Spring Harb Perspect Biol. 3(7) Review.
327. Williams-Carrier, R., Kroeger, T., Barkan, A. (2008). Sequence-specific binding of a chloroplast pentatricopeptide repeat protein to its native group II intron ligand. RNA. 14(9), 1930-1941.
328. Wong, G. K., Passey, D. A., Yu, J. (2001). Most of the human genome is transcribed. Genome Res 11, 1975-1977.
329. Wood, A. J., Lo, T. W., Zeitler, B., Pickle, C. S., Ralston, E. J., Lee, A. H., Amora, R., Miller, J. C., Leung, E., Meng, X., Zhang, L., Rebar, E. J., Gregory, P. D., Urnov, F. D., Meyer, B. J. (2011). Targeted genome editing across species using ZFNs and TALENs. Science. 333(6040), 307.
330. Wreden, C., Verrotti, A. C., Schisa, J. A., Lieberfarb, M. E., Strickland, S. (1997). Nanos and pumilio establish embryonic polarity in Drosophila by promoting posterior deadenylation of hunchback mRNA. Development. 124, 3015–3023
331. Wydro, M., Bobrowicz, A., Temperley, R. J., Lightowlers, R. N., Chrzanowska-Lightowlers, Z. M. (2010) Targeting of the cytosolic poly(A) binding protein PABPC1 to mitochondria causes mitochondrial translation inhibition. Nucleic Acids Res 38, 3732–3742.
332. Xu, F., Ackerley, C., Maj, M. C., Addis, J. B., Levandovskiy, V., Lee, J., Mackay, N., Cameron, J. M., Robinson, B. H. (2008). Disruption of a mitochondrial RNA-binding protein gene results in decreased cytochrome b expression and a marked reduction in ubiquinol-cytochrome c reductase activity in mouse heart mitochondria. Biochem J. 416, 15–26.
333. Xu, F., Morin, C., Mitchell, G., Ackerley, C., Robinson, B. H. (2004). The role of the LRPPRC (leucine-rich pentatricopeptide repeat cassette) gene in cytochrome oxidase assembly: mutation causes lowered levels of COX (cytochrome c oxidase) I and COX III mRNA. Biochem J. 382(Pt 1), 331-336
334. Yamazaki, H., Tasaka, M., Shikanai, T. (2004). PPR motifs of the nucleus-encoded factor, PGR3, function in the selective and distinct steps of chloroplast gene expression in Arabidopsis. Plant J. 38(1), 152-163.
335. Yang, Q., Doublié, S. (2011). Structural biology of poly(A) site definition. Wiley Interdiscip Rev RNA. 2(5), 732-747.
336. Ye, B., Petritsch, C., Clark, I. E., Gavis, E. R., Jan, L. Y., Jan, Y. N. (2004). Nanos and Pumilio are essential for dendrite morphogenesis in Drosophila peripheral neurons. Curr Biol. 14, 314–321.
337. Yonaha, M., Proudfoot, N. J. (2000). Transcriptional termination and coupled polyadenylation in vitro. EMBO J. 19(14), 3770-3777.
338. Zamore, P. D., Williamson, J. R., Lehmann, R. (1997) The Pumilio protein binds RNA through a conserved domain that defines a new class of RNA-binding proteins. RNA. 3(12), 1421-1433.
339. Zaphiropoulos, P. G. (1998). Mechanisms of pre-mRNA splicing - classical versus non-classical pathways. Histology & Histopathology 13, 585-589.
340. Zenklusen, D., Larson, D. R., Singer, R. H. (2008). Single-RNA counting reveals alternative modes of gene expression in yeast. Nat Struct Mol Biol. 15(12), 1263-1271.
341. Zhang, B., Gallegos, M., Puoti, A., Durkin, E., Fields, S., Kimble, J., Wickens, M. P. (1997). A conserved RNA-binding protein that regulates sexual fates in the C. elegans hermaphrodite germ line. Nature. 390(6659), 477-484.
342. Zhang, F., Cong, L., Lodato, S., Kosuri, S., Church, G. M., Arlotta, P. (2011). Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription. Nat Biotechnol. 29(2), 149-153.
343. Zhao, J., Hyman, L., Moore, C. (1999) Formation of mRNA 3' ends in eukaryotes: Mechanism, regulation and interrelationships with other steps in mRNA synthesis. Microbiol Mol Biol Rev 63, 405-45
344. Zheng, Z. M., Huynen, M., Baker, C. C. (1998). A pyrimidine-rich exonic splicing suppressor binds multiple RNA splicing factors and inhibits spliceosome assembly. Proc Natl Acad Sci U S A 95, 14088-14093.
345. Zhu, D., Stumpf, C. R., Krahn, J. M., Wickens, M., Hall, T. M. (2009). A 5' cytosine binding pocket in Puf3p specifies regulation of mitochondrial mRNAs. Proc Natl Acad Sci USA. 106(48), 20192-20197
346. Zipor, G., Haim-Vilmovsky, L., Gelin-Licht, R., Gadir, N., Brocard, C., Gerst, J. E. (2009) Localization of mRNAs coding for peroxisomal proteins in the yeast, Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 106(47), 19848-19853.
347. Zsigmond, L., Rigó, G., Szarka, A., Székely, G., Otvös, K., Darula, Z., Medzihradszky, K. F., Koncz, C., Koncz, Z., Szabados, L. (2008). PPR40 connects abiotic stress responses to mitochondrial electron transport. Plant Physiol. 146(4), 1721-1737