
Engineering RNA-binding proteins: Unravelling the code

Muhammad Fazril MOHAMAD RAZIF B.Forensics and B.Sc (Hons)

This thesis is presented for the degree of Doctor of Philosophy of Medicine and Pharmacology of The University of Western Australia

School of Medicine and Pharmacology

DECLARATION FOR THESES CONTAINING PUBLISHED WORK AND/OR WORK PREPARED FOR PUBLICATION

The examination of the thesis is an examination of the work of the student. The work must have been substantially conducted by the student during enrolment in the degree.

Where the thesis includes work to which others have contributed, the thesis must include a statement that makes the student’s contribution clear to the examiners. This may be in the form of a description of the precise contribution of the student to the work presented for examination and/or a statement of the percentage of the work that was done by the student.

In addition, in the case of co-authored publications included in the thesis, each author must give their signed permission for the work to be included. If signatures from all the authors cannot be obtained, the statement detailing the student’s contribution to the work must be signed by the coordinating supervisor.

Please sign one of the statements below.

1. This thesis does not contain work that I have published, nor work under review for publication. Student Signature ........................................................................................................................

2. This thesis contains only sole-authored work, some of which has been published and/or prepared for publication under sole authorship. The bibliographical details of the work and where it appears in the thesis are outlined below. Student Signature ........................................................................................................................

3. This thesis contains published work and/or work prepared for publication, some of which has been co-authored. The bibliographical details of the work and where it appears in the thesis are outlined below. The student must attach to this declaration a statement for each publication that clarifies the contribution of the student to the work. This may be in the form of a description of the precise contributions of the student to the published work and/or a statement of percent contribution by the student. This statement must be signed by all authors. If signatures from all the authors cannot be obtained, the statement detailing the student’s contribution to the published work must be signed by the coordinating supervisor.

Filipovska, A., Razif, M. F., Nygard, K. K., Rackham, O. (2011) A universal code for RNA

recognition by PUF proteins. Nat Chem Biol. 7(7), 425-427

Student Signature:……………………………………………………………………………………..

Coordinating Supervisor Signature:…………………………………………………………………

DECLARATION

I, Muhammad Fazril MOHAMAD RAZIF, declare that the PhD thesis entitled “Engineering

RNA-binding proteins: Unravelling the code” contains published work within Chapters 3

and 4. The work presented in Chapters 3 and 4 has been published in the following

publication:

Filipovska, A., Razif, MF., Nygard, KK., Rackham, O., 2011, A universal code for RNA

recognition by PUF proteins, Nat Chem Biol., 7(7):425-7 (IMPACT FACTOR: 14.69)

For this publication, I performed the construction of plasmids, protein purifications and

the associated troubleshooting for appropriate induction conditions, and established and

optimized the yeast three-hybrid assays. The work described in Chapters 5 and 6 was

performed entirely by me, with the exception of the northern blot.

Student Signature: Date: Co-ordinating Supervisor Signature Date:

ABSTRACT

RNA-protein interactions have key roles in the regulation of gene expression and

are vital for many cellular processes and complex developmental programs in eukaryotes.

Proteins that have the ability to bind RNAs tend to do so in various modes that are often

difficult to predict, limiting the ability to engineer these RNA-binding proteins for medical

and biotechnological use. Hence, engineering proteins that can bind a specific RNA

sequence has many potential applications, analogous to that of siRNAs and miRNAs, with

even more potential benefits. The ability to fuse RNA-binding proteins to any desired

effector domain, in turn enabling the manipulation of any aspect of the target RNA’s

metabolism, makes engineering these proteins highly appealing. Here, the recognition

repertoire of PUF repeats has been extended beyond adenine, guanine and uracil through directed

evolution, enabling them to specifically bind cytosine. With this code, PUF domains

capable of selectively binding RNA targets of diverse sequence and structure can be

designed. Unlike the PUFs, the basis for nucleotide RNA recognition by pentatricopeptide

repeat (PPR) proteins, another family of RNA-binding proteins, remains ambiguous. Here,

computational methods have been used to create a stable, highly reduced PPR

architecture for the study of RNA-binding specificity and the design of specific tools to

manipulate RNA metabolism. We used these synthetic PPRs to examine the amino acid

codes for nucleotide recognition by PPRs, which also revealed that PPRs have a modular

recognition mechanism similar to that of PUFs. These findings provide a significant step

towards predicting native binding sites of the vast number of PPR proteins found in

nature. They also highlight the possibility of engineering a PPR scaffold for new

functions and sequence specificities.

ACKNOWLEDGEMENTS

The decision to pursue this PhD has definitely been a life-changing experience for

me and the past four years would not have been possible without the remarkable amount

of support and guidance that I received. First of all, I would like to say a very big thank you

to my supervisors, Dr. Oliver Rackham and Dr. Aleksandra Filipovska, for their never-

ending encouragement and support throughout this journey. Their passion for science and

incredible focus motivated me to come into the lab every day; in my eyes, they are truly

inspirational. They never cease to amaze me and I hope that I too can become the

accomplished scientist that they are (although I may require some genetic engineering).

Many thanks also to Dr. Stefan Davies and Anne-Marie for guiding me not only

with experiments, but also for all the entertaining discussions and invaluable advice. To all

the other students in Lab 555, Isabel, Tara, Ross, Louis and Tiong-Sun, you have all

been by my side throughout this PhD, living every single minute of it in the lab with me,

encouraging one another to strive for the best and for that I am grateful. Many thanks to

Kristina and Karoline for readily assisting me whenever I needed help. I am indebted to my

closest friends, Karina, Ellen, Kailin, Szymon, Aaron, Michael S. and Mike J., who shared

their lives with me. I cannot thank you enough for being the most awesome group of

friends one could ever wish for. Without all of you, I think I would have gone cray cray. I

would also like to say a heartfelt thank you to my Mum and Dad for always believing in me

and encouraging me to follow my dreams. Many thanks also to my grandmother and my

siblings, Nur, Lin, Nina and Ariq, for helping in whatever way they could during this

challenging period. I love you all very much.

Finally, I gratefully acknowledge the funding received towards my PhD from the

UWA Scholarships for International Research Fees (SIRF) and University International

Stipend (UIS). Thanks to the Graduate Research School for both the travel grant and

the GRST award. These have enabled me to present my work at Lorne as well as provided

funding for additional experiments.

TABLE OF CONTENTS

DECLARATION
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
ABBREVIATIONS
CHAPTER 1: Introduction
1.1 Gene Expression
1.1.1 The Importance of post-transcriptional control
1.2 RNA-binding proteins and their roles in gene expression
1.2.1 Co-transcriptional recruitment of RNA-binding proteins during mRNA processing
1.2.2 Capping
1.2.3 Splicing
1.2.4 Polyadenylation
1.3 Mitochondrial gene expression
1.4 Candidate RNA-binding proteins for biotechnology applications
1.4.1 Zinc finger proteins that bind RNA
1.4.2 PUF proteins
1.4.3 PUF family of proteins
1.4.4 Functions of PUF proteins
1.4.5 Human PUF proteins
1.4.6 Features of PUF proteins
1.5 Pentatricopeptide repeat (PPR) proteins
1.5.1 Members of the PPR family and its functions
1.5.2 PPR motif and structure
1.5.3 RNA recognition code of PPR proteins
1.5.4 Human PPRs
1.6 Purview of the thesis
CHAPTER 2: Materials and Methods
2.1 Materials
2.2 Methods
2.2.1 Plasmid construction
2.2.2 E. coli competent cell preparation
2.2.3 E. coli transformation
2.2.4 Bacterial colony screening
2.2.5 Plasmid preparation and analysis
2.2.6 Yeast transformation
2.2.7 PUF protein expression and purification
2.2.8 cPPR protein expression and purification
2.2.9 Bicinchoninic acid (BCA) protein assay
2.2.10 SDS-PAGE gel
2.2.11 RNA electrophoretic mobility shift assays
2.2.12 PUF library selections
2.2.13 Yeast three-hybrid assays
2.2.14 β-galactosidase assays
2.2.15 Cell culture
2.2.16 Transfection
2.2.17 Northern blotting
2.2.18 Mitochondrial protein synthesis
2.3 Graphic maps of plasmids
2.3.1 pIIIA/MS2-2 plasmid
2.3.2 pTYB3-EYFP plasmid
2.3.3 pETM30-EYFP plasmid
2.3.4 pTYB3 cPPRcaps poly(A) plasmid
2.3.5 pcDNA3-OTC-cPPRcaps poly(A)-CTAP plasmid
CHAPTER 3: Engineering Cytosine-binding PUF repeats
3.1 Methods to study RNA protein interactions
3.2 Genetic selection of PUF library in yeast three-hybrid system
3.3 Yeast three-hybrid sensitivity testing for engineering individual PUF repeats
3.3.1 Yeast three-hybrid system is sensitive for engineering PUFs
3.4 Library screening for cytosine-binding PUF
3.4.1 Five PUF mutants interact with cytosine
3.5 In vitro analysis of PUF-NRE interaction
3.5.1 Purifying PUF proteins
3.5.2 RNA electrophoretic mobility shift assay of PUF proteins
3.6 Summary
CHAPTER 4: Exploring Features of PUF-RNA Interactions
4.1 General applicability of cytosine-binding code
4.1.1 The cytosine code is modular
4.2 PUF-RNA interaction with increasing structure
4.3 Extending the PUF domain beyond eight repeats
4.4 Summary
CHAPTER 5: Engineering Consensus PPRs
5.1 Designing consensus PPR
5.1.1 Purifying cPPRcaps protein
5.1.2 RNA electrophoretic mobility shift assay of the cPPRcaps protein
5.2 In vitro analysis of other consensus PPR interactions based on Barkan et al. (2012)
5.2.1 Purifying cPPRcaps poly(A) proteins
5.2.2 RNA electrophoretic mobility shift assay of the cPPRcaps poly(A) protein
5.2.3 Purifying cPPRcaps poly(G) proteins
5.2.4 RNA electrophoretic mobility shift assay of the cPPRcaps poly(G) protein
5.2.5 Purifying cPPRcaps poly(U/C) proteins
5.3 In vitro analysis of other consensus PPR combinations
5.3.1 Purifying cPPRcaps poly(C) [NS/NT] proteins
5.3.2 RNA electrophoretic mobility shift assay of the cPPRcaps poly(C) [NS/NT] protein
5.3.3 Purifying cPPRcaps poly(G) [GD/SD] proteins
5.3.4 RNA electrophoretic mobility shift assay of the cPPRcaps poly(G) [GD/SD] protein
5.4 Summary
CHAPTER 6: Engineering Designer PPR proteins
6.1 Design of consensus PPR protein that binds the NRE RNA
6.1.1 Purifying cPPRcaps NRE proteins
6.1.2 RNA Electrophoretic Mobility Shift Assay of cPPRcaps NRE proteins
6.2 Mammalian mitochondrial RNA metabolism
6.2.1 cPPRcaps poly(A) reduces the translation of mitochondrially encoded proteins
6.3 cPPRcaps poly(A) does not affect mitochondrial RNA stability
6.4 Summary
CHAPTER 7: Discussion
BIBLIOGRAPHY

ABBREVIATIONS

3-AT        3-amino triazole
A           adenine
APUM        Arabidopsis Pumilio
ARE         AU-rich elements
ASF         arginine- and serine-rich domains
Ash1        a histone-lysine N-methyltransferase enzyme
Asn         asparagine
ATP         adenosine triphosphate
C           cytosine
CDK2        Cyclin-dependent kinase 2
cDNA        complementary DNA
CF          cleavage factors
COXI        cytochrome c oxidase subunit I
cPPR        consensus pentatricopeptide protein
CPSF        cleavage/polyadenylation specificity factor
CSTF        cleavage stimulatory factor
CTD         carboxyl terminal domain
CYTB        cytochrome B
dsRNA       double stranded RNA
eIF2α       eukaryotic initiation factor 2-alpha
eIF4G       eukaryotic translation initiation factor 4 gamma
EMSA        electrophoretic mobility shift assay
ETC         electron transport chain
EYFP        enhanced yellow fluorescent protein
FBF         fem-3 binding factor
G           guanine
Gly         glycine
GST         glutathione-S-transferase tag
hb          hunchback
His6        hexa histidine-tag
hnRNP A1    Heterogeneous nuclear ribonucleoprotein A1
HSP         heavy (H)-strand promoter
IMP1        insulin-like growth factor II mRNA binding protein 1
IPTG        isopropylthio-β-galactoside
IRES        internal ribosome entry site
LB          lysogeny broth
LRPPRC      leucine-rich pentatricopeptide repeat cassette
LRR         leucine-rich repeats
LSP         light (L)-strand promoter
MCEI        mRNA-capping enzyme catalytic subunit
Met         methionine
miRNA       microRNA
mRNA        messenger RNA
MRPP3       mitochondrial RNase P protein 3
MRPS27      mitochondrial ribosomal protein of the small subunit 27
mtDNA       mitochondrial DNA
ncRNA       non-coding RNA
ND1         NADH dehydrogenase 1
ND6         NADH dehydrogenase 6
NRE         Nanos response element
Nt          nucleotide
O/N         overnight
ORF         open reading frame
OTC         ornithine transcarbamylase
PABII       poly(A)-binding protein II
PABP        poly(A)-binding protein
PAP         poly(A) polymerase
PAPD1       PAP associated domain containing 1
PARN        poly(A)-specific ribonuclease
PBS         phosphate buffered saline
PCR         polymerase chain reaction
PDE12       phosphodiesterase 12
PFAM        a database of protein families
POLRMT      mitochondrial RNA polymerase
poly(A)     polyadenylate
poly(C)     polycytosine
poly(G)     polyguanine
poly(U)     polyuracil
PPR         Pentatricopeptide repeat
PSSM        position-specific scoring matrix
PTB         polypyrimidine tract binding proteins
PTCD        PPR domain containing protein
PUF         PUMILIO and fem-3 binding factor
PUM1        Pumilio homolog 1
RBP         RNA-binding protein
RNAi        RNA interference
RNAPII      RNA polymerase II
RNR1        RNA, ribosomal 1
RNR2        RNA, ribosomal 2
rRNA        ribosomal RNA
RVDs        repeat variable di-residues
SC media    Synthetic Complete (SC) Media
Ser         serine
siRNA       short interfering RNA
snRNA       small nuclear RNA
snRNPs      small nuclear ribonucleoproteins
T           Thymine
TALE        TAL effectors
TEV         tobacco etch virus
TFAM        mitochondrial transcription factor A
TFB1M       mitochondrial transcription factor B1
TPR         tetratricopeptide repeat protein
TRIM71      tripartite motif containing 71
U           uracil
UTR         untranslated region
VEGF-A      Vascular endothelial growth factor
ZF          zinc finger

CHAPTER 1

Introduction

1.1 Gene Expression

Eukaryotic gene expression is a complex stepwise process. A simplified view of the

process begins with the initiation of transcription, followed by the elongation of the

messenger RNA (mRNA) transcript and its termination. During transcription, the pre-

mRNA undergoes several structural changes which include capping at the 5’ end, the

splicing of introns, and the polyadenylation of the 3’ end. The mature mRNA is then

released and exported to the cytoplasm for translation. The view that transcriptional

regulation is the predominant regulatory mechanism has been challenged over the past

few decades with the discovery of post-transcriptional mechanisms for regulating gene

expression (reviewed in Kishore et al., 2010; Glisovic et al., 2008).

1.1.1 The Importance of post-transcriptional control

Given that DNA is stably maintained at one or two copies as a permanent source of

genetic information in most cells, there are not many opportunities for cells to respond

rapidly to environmental changes or stresses. As most eukaryotic mRNAs are transcribed

as much larger precursors with a great deal of intronic material to be removed, their rate

of synthesis presents a hurdle against rapid changes in gene expression. RNA polymerase

II (RNAPII) transcribes genes of an average length of 60 kb in human cells (Wong et al.,

2001). Since the elongation rate of RNAPII is approximately 30 nucleotides per second,

this results in an average transcription time of over 30 minutes for an average precursor

mRNA (excluding the often slow step of transcription initiation). With some larger

transcripts taking much longer than this (e.g. the human dystrophin locus requires 16

hours to transcribe; Tennyson et al., 1995), it would be extremely advantageous for cells

to be able to quickly change their protein complement post-transcriptionally. In addition

to providing a rapid means to respond, post-transcriptional control is particularly useful

because it is readily reversible. For instance, phosphorylation of initiation factor eIF2α

inactivates its ability to exchange GDP for GTP, shutting down translation globally

(Scheper et al., 1998). Simple dephosphorylation can reactivate translation rapidly and in

an energy efficient manner; this could not occur at the transcriptional level (Prostko et al.,

1993).
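The transcription-time estimate given earlier in this section can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a constant elongation rate with no pausing and no initiation delay, which is a simplification of the real process.

```python
def transcription_time_minutes(gene_length_nt: float, rate_nt_per_s: float) -> float:
    """Minutes for one polymerase to traverse a gene at a constant rate."""
    return gene_length_nt / rate_nt_per_s / 60.0


def nucleotides_transcribed(hours: float, rate_nt_per_s: float) -> float:
    """Nucleotides synthesized in a given time at a constant rate."""
    return hours * 3600.0 * rate_nt_per_s


# Values quoted in the text: 60 kb average gene, about 30 nt/s for RNAPII.
average_gene_minutes = transcription_time_minutes(60_000, 30.0)
print(f"{average_gene_minutes:.1f} min")  # prints "33.3 min"

# Length implied by 16 hours of transcription at the same assumed rate.
implied_length_nt = nucleotides_transcribed(16, 30.0)
print(f"{implied_length_nt / 1e6:.2f} Mb")  # prints "1.73 Mb"
```

At the quoted rate, 16 hours of elongation corresponds to roughly 1.7 Mb of template. This figure follows only from the numbers cited above and should not be read as a measured length of the dystrophin locus.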

As DNA resides in the nucleus, a different compartment from the cytoplasm where its

expression is realized, the only way spatial

information can be integrated into gene expression is at the post-transcriptional level. This

can be achieved by either incorporating this information into the final protein product or

into the mRNA so that its expression can be directed to a particular zone in the cell. As the

average mRNA is translated into thousands of proteins, it is much more energy efficient to

transport the mRNA rather than the protein to the required location (Stern et al., 2007;

Pradet-Balade et al., 2001). The asymmetric partitioning of mRNA within the cytoplasm

plays an important role in determining cell polarity (Latham et al., 1994), differential cell

division (Long et al., 1997; Takizawa et al., 1997), organelle function (Margeot et al.,

2002), nuclear import (Levadoux et al., 1999), synaptic remodeling (Miller et al., 2002),

cancer metastasis (Shestakova et al., 1999), specification of germ cells (Kloc et al., 2001)

and embryonic axes (van Eeden and St Johnston, 1999). All these roles necessitate post-

transcriptional control.

In summary, post-transcriptional control is of considerable use as mRNAs can be

stored at high copy number, rapidly removed if necessary, moved to the exact location

where their product is required, grouped and regrouped into different packages, and

activated or silenced as required. This considerable power and flexibility in terms of

combinatorial assortment led Keene to propose that dynamic clusters of mRNAs from

groups of genes represent the eukaryotic equivalent of bacterial operons (Keene and

Tenenbaum, 2002). However, unlike their prokaryotic equivalents, these non-covalent

groupings can be reconfigured whenever the expression of their constituents needs to be

changed.

Although the regulation of mRNAs from synthesis to destruction is occasionally

controlled directly by the mRNA sequence itself (Hesselberth and Ellington, 2002), the

roles of mRNAs are far more often dictated by an array of proteins

(Shyu and Wilkinson, 2000; Dreyfuss et al., 2002). Proteins control the efficiency of

transcription, processing, nuclear export, translation, localization and degradation of

mRNA (Hieronymus and Silver, 2004). In recent years many proteins have been found to

bind mRNA and influence its lifecycle. However, the exact functions of many of these

proteins remain elusive. In this introduction, I will describe some of the roles RNA-binding

proteins play in the control of gene expression and RNA-binding proteins that have the

potential for use in biotechnology applications.

1.2 RNA-binding proteins and their roles in gene expression

All mature eukaryotic mRNAs can be divided into three functional parts based on

their sequence: the 5’-untranslated region (UTR), the coding sequence and the 3’-UTR

(Dreyfuss et al., 2002). UTRs contain regulatory cis-elements, such as AU-rich elements

(ARE) and internal ribosome entry sites (IRES), which are critical for post-transcriptional

control (Dassi and Quattrone, 2012). Trans-acting factors, which encompass RNA-binding

proteins (RBPs) and non-coding regulatory RNAs (ncRNAs), interact with these cis-elements

and influence the mRNA’s lifecycle (Dassi and Quattrone, 2012). In recent years, many

proteins have been found to bind mRNA and control the efficiency of transcription,

processing, nuclear export, translation, localization and degradation of mRNA. To illustrate

the many roles that RNA-binding proteins play in the control of gene expression, I will

describe the lifecycle of mRNA and highlight various important RNA-protein interactions.

1.2.1 Co-transcriptional recruitment of RNA-binding proteins during mRNA processing

Upon gene activation, the recruitment of the first RNA-binding protein, RNA

polymerase II (RNAPII), occurs. RNAPII bends DNA and unwinds it to reveal a single strand

that is used as a template for mRNA synthesis (Coulombe and Burton, 1999). As soon as

the newly synthesized RNA exits RNAPII, it is bound by an array of proteins that modify

the pre-mRNA molecule into a mature functional mRNA that is ready for export to

the cytoplasm. Processing typically involves attaching a 7-methyl guanine “cap” at the 5'

end of the transcript, removal of intronic sequences via splicing, and polyadenylation of

the 3' terminus (Lodish et al., 2000). RNAPII recruits a host of proteins involved in these

processes via the carboxyl terminal domain (CTD) of its largest subunit (Steinmetz, 1997).

Several studies have shown that the CTD can function as both an assembly platform and a

regulator of transcription and pre-mRNA processing factors (Maniatis and Reed, 2002).

The active phosphorylation and dephosphorylation of the CTD’s repeats are critical to its

function. The CTD contains many phosphorylation sites and is the substrate for several kinases

and at least one phosphatase (Cho et al., 2001; Trigon et al., 1998; Rodriguez et al.,

2000; Hirose and Ohkuma, 2007; Sikorski and Buratowski, 2009). Recent work has shown

that the CTD provides a rallying point for proteins that bind mRNA and influence its life

cycle. Interestingly, structural studies predict that the CTD is positioned where RNA exits

the RNAPII transcription complex in the hyperphosphorylated state (Cramer et al., 2001;

2004) but undergoes a conformational change that takes it away from the exit tunnel

when dephosphorylated (Dahmus, 1995; Richard and Manley, 2009; Kuehner et al.,

2011). Many proteins that associate with RNAPII can only bind to one phosphorylation

state of the CTD. When the CTD is positioned adjacent to the exit tunnel, it has the

potential to allow the transfer of RNA-binding proteins immediately to the nascent mRNA.

Insulin-like growth factor-II mRNA-binding protein 1 (IMP1) and polypyrimidine-tract-

binding protein (PTB) are two examples of RNA-binding proteins that associate with the

mRNAs they regulate while they are transcribed (Oleynikov and Singer, 2003; Urban et al.,

2000; Oberstrass et al., 2005).

1.2.2 Capping

The first processing step that a newly synthesized pre-mRNA undergoes is capping.

Capping is important because it protects the pre-mRNA from degradation, ensuring that it

is adequately stable to complete synthesis, processing and export (Walther et al., 1998;

Grudzien et al., 2006). In mammals, the bi-functional mRNA-capping enzyme catalytic

subunit (MceI) protein removes a phosphate group from the 5' end of the

transcript and attaches a guanosine; a 7-methyltransferase then modifies this

MceI product (Bisaillon and Lemay, 1997). The quality of cap methylation is monitored by

the Rat1/Rai1 complex; pre-mRNAs with improperly methylated caps are degraded before

export to the cytoplasm (Jiao et al., 2010). Interestingly, the capping reaction occurs after

25-30 bases of pre-mRNA have been synthesized, the same point at which the CTD

becomes hyperphosphorylated (McCracken et al., 1997a). The cap promotes the

subsequent pre-mRNA processing steps of splicing and 3' end processing,

assists translation and blocks 5'-3' degradation upon transfer to the cytoplasm (Parker and

Song, 2004). In the nucleus, the cap-binding protein (CBP) 80 - CBP20 heterodimer binds

to the cap and its interaction with other proteins coordinates the succeeding steps in pre-

mRNA processing (Schoenberg and Maquat, 2012).

1.2.3 Splicing

The next processing step required for the maturation of pre-mRNA involves the

removal of intronic sequences. Splicing is an extremely complex reaction that requires

over 300 proteins and five non-coding small nuclear RNAs (snRNAs) to assemble into a macromolecular

complex known as the spliceosome (Rappsilber et al., 2002; Wahl et al., 2009). Splice site

sequences generally do not have sufficient information to clearly specify exon–intron

boundaries; therefore, various sequence motifs recognized by RNA-binding proteins help to

define and regulate splice site selection (Graveley, 2000; Smith and Valcárcel, 2000; Wang

and Burge, 2008). Regions to be removed are recognized within the pre-mRNA via short

sequence motifs located within and bordering the intron, and occasionally by more distant

exonic enhancers and suppressors (McCullough and Schuler, 1997; Zheng et al., 1998).

Five small nuclear ribonucleoprotein particles (snRNPs) combined with many other non-

snRNP factors come together to create a catalytically active spliceosome which removes

looped out intronic sequences in a two-step reaction (Patel and Bellini, 2008;

Zaphiropoulos, 1998; Will and Lührmann, 2011). A study of the transcriptional cycle

of the human dystrophin gene showed that splicing occurred co-transcriptionally

(Tennyson et al., 1995), and soon afterwards a physical link between these processes was

established when CTD immunoprecipitates were found to contain splicing intermediates and

spliceosomal components (Vincent et al., 1996). Recently, based on the analysis of whole-

cell, total RNA sequencing, Ameur et al. (2011) suggested that co-transcriptional splicing

may be widespread in the human brain.

1.2.4 Polyadenylation

Polyadenylation, transcription termination and the release of the RNA from the

site of transcription are the final processing steps an mRNA must undergo prior to export

from the nucleus (Colgan and Manley, 1997; Yonaha and Proudfoot, 2000). In normal

mammalian cells, two major processes are required for polyadenylation. The first is the

cleavage step, which requires the cleavage-polyadenylation specificity factor (CPSF),

cleavage stimulation factor (CstF), cleavage factors I and II (CF I and II), RNAPII and poly(A)

polymerase (PAP; Zhao et al., 1999; Mandel et al., 2008). The site of poly(A) addition is

specified by the attachment of proteins to specific sequences in the 3'-UTRs of pre-mRNA.

The CPSF binds to the hexanucleotide polyadenylation signal sequence (AAUAAA) and the

CstF binds to a GU-rich downstream motif (Bentley, 2005; Takagaki and Manley, 1997; Coll

et al., 2010). Additionally, both CFI and CFII are required for cleavage of the pre-mRNA to

release the excess 3' sequences (Yang and Doublie, 2011). Following cleavage, a dedicated

poly(A) polymerase (PAP) associates with CPSF and uses approximately 20 nucleotides of

pre-mRNA as a primer for poly(A) addition. Shortly after PAP begins RNA synthesis,

poly(A)-binding protein II (PABII) engages the polyadenylation complex and stimulates

processive synthesis of a poly(A) tail between 200 and 300 nt in length (Bienroth et al.,

1993; Wahle, 1995).
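The sequence-level part of poly(A) site recognition described above (an AAUAAA hexamer with a GU-rich motif downstream) can be illustrated with a simple scan. This is a rough sketch rather than a validated predictor: the function names, the 30-nt window, the use of GU fraction as a proxy for "GU-rich" and the example sequence are all assumptions made for illustration.

```python
import re

POLYA_SIGNAL = "AAUAAA"  # hexanucleotide signal bound by CPSF (from the text)


def to_rna(sequence: str) -> str:
    """Upper-case a sequence and convert DNA-style T to U."""
    return sequence.upper().replace("T", "U")


def find_polya_signals(sequence: str) -> list[int]:
    """Return 0-based start positions of every AAUAAA hexamer (overlaps allowed)."""
    rna = to_rna(sequence)
    return [m.start() for m in re.finditer(f"(?={POLYA_SIGNAL})", rna)]


def gu_fraction(sequence: str, start: int, window: int = 30) -> float:
    """Fraction of G or U in a window starting at `start` (window size is an assumption)."""
    segment = to_rna(sequence)[start:start + window]
    if not segment:
        return 0.0
    return sum(base in "GU" for base in segment) / len(segment)


# Hypothetical 3'-UTR fragment, invented for illustration only.
utr = "CCACAAUAAAGCACCAUCCAGUGUGUUGGUGUCUGUUGUGCC"
for position in find_polya_signals(utr):
    downstream = position + len(POLYA_SIGNAL)
    print(position, round(gu_fraction(utr, downstream), 2))
```

A scan of this kind only flags candidate sites. As the text notes, recognition in the cell depends on protein factors such as CPSF, CstF, CF I and CF II rather than on sequence alone.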

Given RNAPII’s intimate association with the previous processing steps, it is not

surprising that it has been found to play a role in polyadenylation. Both CPSF and CstF co-

purify in RNAPII preparations and CTD-affinity chromatography (McCracken et al., 1997b).

Evidence suggests that CPSF is recruited by transcription factor TFIID and transferred to

RNAPII at the time of transcription initiation (Dantonel et al., 1997; Hirose and Manley,

1998). This interaction is likely to be essential for efficient 3' end processing and

polyadenylation as CTD truncations disrupt these processes in vivo (Hirose and Manley,

1998; Licatalosi et al., 2002). The reciprocal is also true, as without poly(A) site recognition,

transcription termination by RNAPII is impaired (Dichtl et al., 2002). Once the mRNA is transferred to

the cytoplasm, a specific poly(A)-binding protein (PABP) attaches to the poly(A) tail and

plays a number of roles in the remainder of the mRNA's life cycle (Tarun et al., 1997; Fabian et al.,

2009; Kahvejian et al., 2005).

1.3 Mitochondrial gene expression

The mammalian mitochondrial DNA (mtDNA) is circular, double stranded and

relatively small, only 16.5 kb in size (Smeitink et al., 2001). The genome is compact,

encoding two rRNAs, 22 tRNAs and 13 proteins that are translated from 9

monocistronic and 2 dicistronic mRNAs (Smeitink et al., 2001). Both dicistronic mRNAs

contain overlapping reading frames (Anderson et al., 1981, 1982). The condensed nature

of the mammalian mitochondrial genome has given rise to transcripts with many distinct

features and imposed post-transcriptional control of its gene expression that is

evolutionarily divergent from that of other eukaryotes (reviewed by Rorbach and Minczuk,

2012). The human mtDNA is transcribed by specialized machinery which includes the

nuclear-encoded mitochondrial RNA polymerase (POLRMT), the mitochondrial

transcription factor A (TFAM) and one of the two mitochondrial transcription factor B

paralogues, TFB1M or TFB2M (Falkenberg et al., 2002; Kanki et al., 2004). Transcription

initiates at one of two divergently oriented promoters, the heavy (H)-strand promoter

(HSP) and light (L)-strand promoter (LSP), located in the D-loop regulatory region to

generate two long polycistronic, precursor transcripts that span the heavy and light

strands of the entire mtDNA (Ojala et al., 1981; Aloni and Attardi, 1971; Murphy et al.,

1975; Montoya et al., 1982). A third transcript covering the start of the heavy strand and

the two rRNA genes is also produced (Christianson and Clayton, 1988).

Splicing does not occur in mammalian mitochondria; instead, the polycistronic

precursor RNAs are processed to produce the individual tRNA and mRNA molecules by

mitochondrial RNase P (at the 5' end of tRNAs; Holzmann et al., 2008) and the

mitochondrial RNase Z (at the 3' ends of tRNAs; Brzezniak et al., 2011; Lopez Sanchez et

al., 2011). This mode of RNA processing is called the ‘tRNA punctuation model’, whereby all

of the protein and rRNA genes are immediately flanked by at least one tRNA gene (Ojala et

al., 1981). Mammalian mitochondrial mRNAs contain no introns and lack conventional 5'

and 3' untranslated regions (UTRs), Shine-Dalgarno sequences, 5' 7-methylguanosine

caps and base modifications (Montoya et al., 1981).
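The logic of the tRNA punctuation model can be sketched as an interval calculation: cutting a precursor at the 5' end (RNase P) and the 3' end (RNase Z) of each tRNA releases whatever lies between the tRNAs. The sketch below is a toy illustration only. The `Feature` type, the `punctuate` function and all coordinates are hypothetical and do not correspond to real positions in the mitochondrial genome.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def punctuate(transcript_length: int, trnas: list[Feature]) -> list[tuple[int, int]]:
    """Return the intervals released when every tRNA is excised.

    Each tRNA start models an RNase P cut (5' end) and each tRNA end
    models an RNase Z cut (3' end).
    """
    released = []
    cursor = 0
    for trna in sorted(trnas, key=lambda feature: feature.start):
        if trna.start > cursor:
            released.append((cursor, trna.start))
        cursor = max(cursor, trna.end)
    if cursor < transcript_length:
        released.append((cursor, transcript_length))
    return released


# Hypothetical precursor of 2500 nt carrying three tRNAs.
toy_trnas = [
    Feature("tRNA-1", 0, 70),
    Feature("tRNA-2", 1000, 1070),
    Feature("tRNA-3", 1900, 1970),
]
print(punctuate(2500, toy_trnas))
# prints [(70, 1000), (1070, 1900), (1970, 2500)]
```

In this toy case each released interval stands for an mRNA or rRNA flanked by tRNAs. Adjacent tRNAs with nothing between them release no intervening fragment.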

Following RNA processing, mitochondrial RNAs undergo a maturation process that

involves a CCA triplet being added to the 3′ ends of tRNAs (Nagaike et al., 2001), as well as

modification of specific bases within both tRNAs and rRNAs (Nagaike et al., 2005; Sharma

et al., 2003) while mRNAs are generally polyadenylated at their 3′ ends, with the

exception of the MTND6 mRNA (Bobrowicz et al., 2008; Mercer et al., 2011; Temperley et

al., 2010; Slomovic et al., 2005). The addition of CCA is required for amino acid

attachment and enables interactions with both the aminoacyl-tRNA synthetases and

elongation factor Tu (Levinger et al., 2004; Cusack, 1997). Translation is accomplished by

the mitochondrial ribosome, which is composed of a large 39S and a small 28S subunit

that associate to form the 55S particle (O'Brien, 1971).

Despite their common polycistronic origin, wide variation between the levels of

individual mRNAs, tRNAs and rRNAs has been observed, indicating that post-transcriptional

processing and degradation are significant in regulating mitochondrial gene expression

(Mercer et al., 2011). Like the majority of mitochondrial components, the proteins

necessary for replication, repair, transcription and translation are not encoded within the

organelle itself, but rather are encoded in the nucleus. In addition, regulation of the

processing of mitochondrial tRNAs can have vast effects on mitochondrial gene

expression, by affecting the levels of mature RNA species, the final processing of the

different RNAs, and the overall level of translation and mitochondrial function (Lopez-

Sanchez et al., 2011). RNA-binding proteins play essential roles in controlling the

mitochondrial transcriptome from its synthesis to its destruction and have evolved unique

features to complement the unusual features of mitochondrial RNAs.

PPR-containing proteins are RNA-binding proteins that enable interactions

between RNA and the enzymes that act on it in a site-specific manner; a few PPR-

containing proteins have been functionally characterized and implicated in RNA-

processing events in mitochondria (Mili and Pinol-Roma, 2003). For example,

MRPP3, one of the three protein components of mt-RNase P, is composed of five PPR

domains and a putative metallonuclease domain (Lopez Sanchez et al., 2011; Holzmann

and Rossmanith, 2009; Rossmanith and Holzmann, 2009). Studies have shown that

knockdown of RNase P subunits led to an increase in the abundance of mitochondrial

precursor transcripts (Holzmann et al., 2008) and a decrease in the levels of mRNAs and

tRNAs that consequently decreased mitochondrial translation, ribosome stability and

respiration (Lopez Sanchez et al., 2011). On the other hand, the mitochondrial RNase Z

has been found to associate with PTCD1, another PPR protein, which has been shown to

affect 3' processing of mitochondrial tRNAs and to act as a negative regulator of leucine tRNAs

(Lopez Sanchez et al., 2011; Rackham et al., 2009), although the precise

role of PTCD1 in mitochondrial tRNA metabolism is still not clear. The N-terminal region of mammalian

POLRMT contains two putative pentatricopeptide repeat (PPR) motifs and it has been

hypothesized that these PPR domains could be involved in binding nascent mitochondrial

RNA transcripts to stabilize them during their synthesis (Rodeheffer et al., 2001; Ringel et

al., 2011). For many of these mitochondrial PPR proteins, the binding targets remain to be

elucidated; their identification should shed some light on their role in binding of

mitochondrial RNAs and transcription.

In summary, the opinion that transcriptional regulation is the principal regulatory

mechanism has been challenged by the discovery of ever-increasing examples of post-

transcriptional mechanisms for regulating gene expression. It has been shown that RNA-

binding proteins are key players in the post-transcriptional regulation of gene expression.

Over the past decade, a plethora of new RNA-binding proteins that possess potential for

re-engineering for biotechnology purposes have been discovered.

1.4 Candidate RNA-binding proteins for biotechnology applications

Remarkable advances have been made in the field of DNA-binding proteins, as they

can be custom designed for recognition of a specific target double-stranded DNA and are

now commercially available. Designer DNA-binding proteins are based on classical zinc-

finger (ZF) domains, and they have been developed to activate, silence or aid in the

modification of specific genes in vivo (Sera, 2009; Cathomen and Joung, 2008; Camenisch et

al, 2008). Xanthomonas transcription activator-like effectors (TALE) possess another DNA-

binding domain that has been engineered for biotechnological purposes. These are

proteins with tandemly arranged, nearly identical repeats, each about 34 amino acids long,

with nucleic acid specificity almost completely defined by residues 12 and 13, which have

been referred to as repeat variable di-residues (RVDs).

In developing TALEs as biotechnological tools, the simplicity of the TALE code,

which specifies that one TALE repeat binds to one DNA base pair, has enabled the

construction of artificial transcription factors to turn on the expression of specific genes

(Moscou and Bogdanove., 2009; Boch et al., 2009; Zhang et al., 2011; Miller et al., 2010;

Morbitzer et al., 2010). There are instances where some repeats show more specificity

towards a particular DNA base, whereas others are able to recognize more than one base

(Figure 1.1). This has also led to the development of TALE nucleases (TALENs), which are

artificial enzymes with programmable specificity, composed of a TALE DNA-binding

domain fused to the non-specific DNA cleavage domain from FokI (Mussolino et al., 2011;

Wood et al., 2011; Hockemeyer et al., 2011; Cermak et al., 2011; Li et al., 2011).

Figure 1.1: The recognition code of TALE repeats for DNA bases. Asparagine and isoleucine at positions 12 and 13 (NI) recognize adenine; histidine and aspartate (HD) recognize cytosine; asparagine and glycine (NG) recognize thymine; asparagine and lysine (NK) recognize guanine; repeats with asparagine at both positions 12 and 13 (NN) recognize both guanine and adenine; asparagine and serine (NS) are able to bind all four bases. (Figure adapted from Filipovska and Rackham, 2011)
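The one-repeat-to-one-base logic of the TALE code can be sketched as a simple lookup. The example below is illustrative only: the RVD letters and base preferences are those listed for Figure 1.1, while the function names and the choice of a single "preferred" RVD per base are assumptions made for the example and do not describe any published design tool.

```python
# RVD (residues 12 and 13, one-letter codes) -> DNA bases it can recognize.
RVD_CODE = {
    "NI": {"A"},
    "HD": {"C"},
    "NG": {"T"},
    "NK": {"G"},
    "NN": {"G", "A"},            # recognizes both guanine and adenine
    "NS": {"A", "C", "G", "T"},  # binds all four bases
}

# Assumption for this sketch: use the most specific RVD for each base.
PREFERRED_RVD = {"A": "NI", "C": "HD", "T": "NG", "G": "NK"}


def rvds_for_target(dna):
    """Return one RVD per base of the DNA target, in order."""
    return [PREFERRED_RVD[base] for base in dna.upper()]


def tale_matches(rvds, dna):
    """Check whether each repeat's RVD can recognize the aligned base."""
    dna = dna.upper()
    return len(rvds) == len(dna) and all(
        base in RVD_CODE[rvd] for rvd, base in zip(rvds, dna)
    )


design = rvds_for_target("TACG")
print(design)
print(tale_matches(["NN", "NS"], "AC"))
```

Because each repeat is chosen independently of its neighbours, the length of the repeat array simply equals the length of the target site, which is what makes the TALE code straightforward to program.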

Given the increasing appreciation of the importance of post-transcriptional gene

regulation and because some aspects of gene expression can only be controlled at the

RNA level, it would be highly desirable to engineer proteins that can recognize RNA with

customized specificity. The potential of using RNA-binding proteins is similar to that of

short interfering RNAs (siRNAs) and microRNAs, which can be regarded as the leading

technologies in the field of RNA regulation at the moment (Liu and Paroo, 2010; Vaishnaw

et al., 2010; Perrimon et al., 2010). However, the use of these short RNA duplexes to

target RNAs is generally limited to lowering their abundance or expression in the

cytoplasm, and this depends on RNA interference pathways. Having designer RBPs fused

to a mitochondrial targeting sequence or any desired effector domain would thus offer

considerable flexibility for controlling RNA function and is set to become a valuable tool in

medical research.

Initially, designing RBPs was hindered by the lack of structural knowledge and

guidelines determining RNA-protein recognition. However, this situation is changing

rapidly given that several dozen structures of RNA-protein complexes from various

structural classes have been elucidated (Auweter et al., 2006). The diversity of RNA

structure also requires different strategies to be taken when designing these proteins

given that the RNA targets can be single- or double-stranded, as well as more complex

tertiary structures. Here I shall discuss a few prospective RBP candidates that have

qualities suitable for engineering.

1.4.1 Zinc finger proteins that bind RNA

More than two decades ago, the classical zinc finger (ZF) proteins, C2H2, were first

identified as a modular nucleic acid recognition element (Miller et al., 1985). The first indication

that ZFs might be able to bind nucleic acids came when

Picard and Wegnez (1979) found C2H2 ZF motifs in transcription factor IIIA (TFIIIA), a

protein constituent that associates with 5S rRNA within a 7S particle in Xenopus oocytes.

The C2H2 DNA-binding properties were first proposed when the expression of the 5S rRNA

gene was shown to be regulated by TFIIIA (Pelham and Brown, 1980). TFIIIA contains nine

C2H2 ZFs, which are used to recognize both DNA and RNA targets, the 5S rRNA gene and

5S rRNA, respectively (Miller et al., 1985). The C2H2 ZF is a module of about 30 amino acid

residues, and each module constitutes an independent domain stabilized by a zinc ion

ligated to two cysteines and two histidines with an inner structural hydrophobic core. Its

crystal structure shows that it folds into a small domain comprising two β strands followed

by an α helix (Pavletich and Pabo, 1991).

For DNA binding to the 5S rRNA gene, TFIIIA binds to three elements within the

gene’s internal control region. These are the 11 base pair ‘box A’ sequence, a 3 base pair

‘intermediate element’ sequence and a 10 base pair ‘box C’ sequence (Pieler et al., 1987).

The crystal structure of TFIIIA revealed that binding occurs in an antiparallel orientation, whereby

the first three ZFs bind to the box C sequence, wrapping around the major groove of the

DNA, the fifth ZF binds to the intermediate element and ZFs 7–9 interact with the box A

element (Nolte et al., 1998; Lu et al., 2003). Notably, ZFs 4 and 6 do

not interact with the DNA; they act as non-binding spacers. It was later discovered that

ZFs 4 and 6 are the most important for RNA binding to the 5S rRNA as they both bind to

elements in loop regions of the 5S rRNA using the N-terminal ends of their respective α

helices (Lu et al., 2003). ZFs are now known for their ability to recognize DNA and both

ssRNA and dsRNA, with structural information emerging for all these types of interactions.

More recently, two other ZF proteins have been found to bind RNA targets. These

are the CCCH class of ZF proteins and the RanBP2-type ZF domains. Firstly, the CCCH-type

ZFs, which were discovered in regulatory proteins such as muscleblind and Tis11d, possess

the ability to bind ssRNA and are involved in mRNA processing (Taylor et al., 1996). Each

of the CCCH ZF modules binds to the sequence UAUU. A study by Hudson et al. (2004)

revealed that the CCCH-RNA interaction is largely driven by hydrogen bonds mediated by

the protein backbone, while few side chain–mediated interactions define the specificity of

RNA recognition. On the other hand, Ran-binding domain-containing protein 2 (RanBP2) -

type ZF domains, named after the nuclear pore protein complex where eight of these

domains were found, are able to form base-specific contacts with the GGU sequence in

ssRNA (Loughlin et al., 2009). Loughlin et al. (2009) also found that the binding is

mediated predominantly by hydrogen bonding formed between (i) two arginine side

chains to each of the two guanines in the GGU recognition site, and (ii) uracil with two

asparagine side chains in the ZRANB2 ZF.

Although ZFs appear to be promising candidates for engineering, there are several

drawbacks posed by this technology. For example, the reliance on backbone-mediated

hydrogen bonding, specifically by the CCCH-ZF, for RNA recognition places limitations on

the range of sequences that could be recognized and also makes it less readily engineered

because the RNA-binding specificity would be highly sensitive to small variations in amino

acid sequences. It also remains to be shown if the CCCH-ZFs are able to bind ssRNA

(Mackay et al., 2011). Another limitation to the system can be exemplified by the fact that

each RanBP2-type ZF can only recognize three nucleotides, which means that a total of 64 variants

would be required to recognize all possible triplets in potential target RNAs (Mackay et al.,

2011). This makes RanBP2-type ZFs less practical to engineer than

PUF proteins, which require only four variants because each PUF

repeat recognizes a single nucleotide (see Section 1.4.6).
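The arithmetic behind this comparison is a simple count: a module that reads k nucleotides needs 4^k variants to cover every possible target. The sketch below enumerates the triplets explicitly; the function and variable names are chosen for illustration only.

```python
from itertools import product

BASES = "ACGU"

# Every possible RNA triplet that a three-nucleotide module would have to cover.
triplets = ["".join(combo) for combo in product(BASES, repeat=3)]


def variants_needed(bases_per_module, alphabet_size=4):
    """Number of module variants required to cover all possible targets."""
    return alphabet_size ** bases_per_module


print(len(triplets))
print(variants_needed(3), variants_needed(1))
```

The count grows exponentially with the number of nucleotides read per module, which is why a one-nucleotide-per-repeat scaffold keeps the required library of variants small.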

1.4.2 PUF proteins

The PUF family of proteins, named after the founding members Drosophila

melanogaster PUMILIO and Caenorhabditis elegans fem-3 binding factor (FBF),

is an evolutionarily conserved family of RNA-binding proteins found in most eukaryotes

(Wickens et al, 2002; Murata and Wharton., 1995; Zhang et al., 1997). They are typically

involved in the regulation of gene expression at the mRNA level by binding to sequences

located in the 3’ UTR and promoting translational activation or repression by affecting the

stability of the transcripts (Wharton and Aggarwal., 2006; Wickens et al., 2002; Quenalt et

al., 2011). They have been shown to be involved in the regulation of embryogenesis,

development and differentiation (Sonoda and Wharton, 1999; Gamberi et al., 2002; Cho

et al., 2006; Prinz et al., 2007; Murata and Wharton, 1995; Wreden et al., 1997), neuronal

function (Dubnau et al., 2003; Menon et al., 2004; Mee et al., 2004; Ye et al., 2004; Vessey

et al., 2006; Muraro et al., 2008) and mitochondrial biogenesis (Garcıa-Rodrıguez et al.,

2007; Saint-Georges et al., 2008; Eliyahu et al., 2010).

1.4.3 PUF family of proteins

In different eukaryotes, the numbers of genes that encode PUF proteins differ

vastly. While Dictyostelium, Anopheles and Drosophila species have only one PUF gene,

others such as Saccharomyces cerevisiae and C. elegans have six and eleven PUF genes

encoded in their genomes, respectively (Wickens et al., 2002). Xenopus, zebrafish, mouse

and humans have two PUF genes (Spassov and Jurecic, 2002; 2003; Wickens et al., 2002).

Recent studies have revealed the existence of one, two and ten PUF homologs

in Planaria, Plasmodium and Trypanosoma, respectively (Salvetti et al., 2005; Cui et al.,

2002; Caro et al. 2006). In Arabidopsis thaliana, Francischinni and Quaggio (2009)

confirmed the identity of 25 Arabidopsis Pumilio (APUM) proteins of which 12 (APUM-1 to

APUM-12) have a PUF domain with 50-75% similarity to the Drosophila PUF domain. To

date, no PUF protein has been found in archaea or eubacteria (Wickens et al.,

2002). Within a species, closely related subfamilies of PUF proteins can be found; for

example in C. elegans, there are the FBF-1 and FBF-2 homologs.

1.4.4 Functions of PUF proteins

PUF proteins have the ability to regulate diverse processes (Table 1.1). Various

studies have revealed that PUFs in each organism display both unique functions as well as

redundancy because each PUF family member not only has its own unique subset of

mRNAs that it binds to, but is also capable of sharing mRNA targets (Gerber et al., 2004;

Hook et al., 2007; Ulbricht and Olivas., 2008). An example of unique transcript targeting

by a PUF is S. cerevisiae Puf5 that is involved in inhibiting yeast cell differentiation to the

filamentous form by repressing the protein levels of the Ste7 MAP-kinase and the Tec1

transcriptional activator (Prinz et al., 2007). Additionally, Drosophila’s Pumilio is involved

in regulating anterior/posterior patterning in early embryos by repressing the translation

of hunchback mRNA via the recruitment of CCR4-NOT (CNOT) complexes which contain

several enzymes that catalyze mRNA deadenylation (Barker et al., 1992; Murata et al.,

1995; Van Etten et al., 2012). On the other hand, both S. cerevisiae Puf4 and Puf5 regulate

HO endonuclease by co-occupying the mRNA and destabilizing it (Hook et al., 2007).

Table 1.1: Various PUF proteins and their related functions

Organism | PUF protein | Biological function | Target mRNA | Refs
Drosophila | Pumilio | Regulate presynaptic growth of neurons | eiF-4E | Menon et al. (2009)
Drosophila | Pumilio | Posterior patterning | hunchback | Barker et al. (1992)
Drosophila | Pumilio | Anterior patterning | bicoid | Wharton et al. (1991)
Drosophila | Pumilio | Germline development and differentiation | cyclin B | Lin et al. (1997); Forbes et al. (1998)
C. elegans | FBF-1, FBF-2 | Spermatogenesis/oogenesis switch | fem-3 | Lamont et al. (2004)
C. elegans | FBF-1, FBF-2 | MAP kinase phosphatase | lip-1 | Lee et al. (2007)
C. elegans | FBF-1, FBF-2 | Olfactory neuron adaptation | egl-4 | Kaye et al. (2009)
C. elegans | PUF-5 – PUF-7 | Oocyte maturation and differentiation | glp-1 | Lublin et al. (2006)
C. elegans | PUF-8 | Spermatogenesis | ? | Subramaniam et al. (2003)
C. elegans | PUF-9 | Differentiation of hypodermal stem cells | hbl-1 | Nolde et al. (2007)
Xenopus | Pum1 | Oocyte maturation | Cyclin B1 | Nakahata et al. (2001)
Xenopus | Pum2 | Oocyte maturation | RINGO/Spy | Padmanabhan et al. (2006)
T. brucei | PUF9 | Cell cycle replication | LIGKA | Archer et al. (2009)
T. brucei | PUF9 | Regulation of organelle copy number | PNT1, PNT2 | Archer et al. (2009)
S. cerevisiae | Puf3p | Mitochondria biogenesis | PET123, COX17 | Eliyahu et al. (2010); Jackson et al. (2004)
S. cerevisiae | Puf4p | Mating-type switching | HO | Hook et al. (2007)
S. cerevisiae | Puf5p | Represses differentiation of filamentous-form | ? | Prinz et al. (2007)
S. cerevisiae | Puf5p | Localization of peroxisome protein | PEX14 | Zipor et al. (2009)
S. cerevisiae | Puf5p | Cell wall integrity | LRG1 | Kennedy et al. (1995; 1997)
S. cerevisiae | Puf5p | Mating-type switching | HO | Hook et al. (2007)
Human | PUM1 | Cell cycle regulation | Cyclin B1, Cyclin E2 | Morris et al. (2008)
Human | PUM1 | Histone mRNA binding protein | SLBP | Morris et al. (2008)
Human | PUM1 | Ribosomal subunit export to cytoplasm | SDAD1 | Galgano et al. (2008)
Human | PUM1 | Notch signaling pathway | DII1 | Galgano et al. (2008)
Human | PUM2 | Translation initiation factor | eIF4E | Vessey et al. (2010)
Human | PUM2 | Vascular endothelial growth factor | VEGF-A | Galgano et al. (2008)
Human | PUM2 | MAP kinase | ERK2, p38α | Lee et al. (2007)
Human | PUM2 | Cdc42 effector | CEP3 | Spik et al. (2006)

PUF proteins function as repressors of translation, although newer evidence

suggests that they can also contribute to the activation of mRNA expression by binding to

the 3’UTR of mRNAs (Pique et al., 2008; Suh et al., 2009; Kaye et al., 2009; Archer et al.,

2009) and assist in subcellular targeting of mRNAs (Gu et al., 2004; Deng et al., 2008;

Vessey et al., 2006 and 2010; Eliyahu et al., 2010). Drosophila Pum, an example of a

translational repressor, is involved in the regulation of posterior segmentation and

abdomen formation in the early fly embryo. Pum is required for establishing the anterior-

posterior gradient of the transcription factor hunchback (hb), whose absence at the

posterior end of the fly embryo enables the formation of eight abdominal segments

(Lehmann and Nusslein-Volhard, 1991). Pum binds to the NRE within the 3’UTR of hb mRNA

and causes a translational arrest; a repression that occurs only at the posterior end of the

embryo as it requires association with the zinc finger protein, Nanos (Nos) (Murata et al.,

1995; Wreden et al., 1997; Sonoda and Wharton, 1999; Asaoka-Taguchi et al., 1999). Nos

and another protein Brat (Brain Tumor) are simultaneously recruited by Pum-NRE to form

a ribonucleoprotein complex that promotes deadenylation and destabilization of hb mRNA

(Wreden et al., 1997; Sonoda and Wharton, 1999).

On the other hand, the exact mechanism by which PUFs contribute to mRNA

activation is yet to be determined, but some studies propose that their effects are likely

to be direct (Quenalt et al., 2011). The direct pathway suggests that the PUFs bind to the

mRNAs they activate, leading to an upregulation of the transcript levels; this requires PUF

binding in the 3’UTR of the mRNA (Pique et al., 2008; Suh et al., 2009; Kaye et al., 2009;

Archer et al., 2009). A study by Archer et al. (2009) examining the regulation of gene

expression in the cell cycle of Trypanosoma brucei found that the T. brucei PUF9 stabilizes

certain mRNA transcripts during the S-phase of the cell cycle. They also noted that the

levels of PUF9-regulated transcripts were cell cycle dependent, peaking in mid- to

late S-phase. Knocking down PUF9 reduced the levels of PUF9 target transcripts

and increased the presence of extra nuclei and kinetoplasts, an indication of the

decoupling of their biogenesis from cell division (Archer et al., 2009).

As previously mentioned, PUFs themselves have the ability to assist in the

localization of mRNA by acting as targeting factors. A prime example of this is S.

cerevisiae Puf3p, which is a multifunctional PUF that localizes to mitochondria and

contributes to mitochondrial protein synthesis, organization, respiration and biogenesis

(García-Rodríguez et al., 2007). Previous studies have discovered that Puf3p preferentially

binds to mRNAs of nuclear-encoded mitochondrial proteins (Puf3p binds to 87% of the

154 mRNAs encoding proteins that localize to mitochondria). García-Rodríguez et al.

(2007) found that Puf3p works by localizing to the cytosolic face of the mitochondrial

outer membrane and that overexpression of the protein resulted in reduced

mitochondrial respiratory activity and reduced levels of Pet123p (a protein encoded by a

Puf3p-bound mRNA). Puf3p is also known to bind to a consensus motif in the 3'UTR of

many mRNAs encoding mitochondrial proteins (Gerber et al., 2004; Jackson Jr et al.,

2004). There is also evidence that deletion of PUF3 leads to a decrease in mRNA

deadenylation and a doubling of the half-life of COX17 mRNA (Olivas and Parker, 2000).

Therefore, Puf3p proteins probably control the transport and stability of a specific set of

mRNAs involved in mitochondrial biogenesis.

1.4.5 Human PUF proteins

There are only two Pumilio-related genes present in humans, PUM1 and PUM2.

The nomenclature coincides with the chromosomal localization of these genes on human

chromosomes 1 and 2. PUM1 is found on chromosome 1p35.2, spanning approximately

150kb with 22 exons, whereas PUM2 is found on chromosome 2p23–24, spanning at least

80kb and composed of 20 exons (Figure 1.2; Spassov and Jurecic, 2002). PUM1 and PUM2

encode 127 and 114 kDa proteins with evolutionarily highly conserved PUF RNA-binding

domains (86 and 88% identity with the fly Pum protein). The two proteins share 75% overall

similarity, with their highly conserved RNA-binding domain, called PUM-HD, being 91%

identical (Spassov and Jurecic, 2002). PUM-HD spans eight exons, and is encoded by exons

15–22 in PUM1 and exons 13–20 in PUM2. In addition, the sizes of these exons are

identical in PUM1 and PUM2, reflecting the conservation of gene structure. Hence, it is

not surprising that the C-terminal ends of both proteins are highly homologous to that of

Drosophila Pum (78% identity for PUM1 and 79% identity for PUM2), whereas the N-

terminal ends of PUM1 and PUM2 show a low degree of sequence conservation and

variation in size. This is caused mainly by the truncation of the N-terminal part in both

human proteins as the fly Pum protein is 1534 aa long, whereas PUM1 is 1186 aa and

PUM2 is only 1064 aa long. The PUM1 gene also contains two additional exons that encode an

extra 128 amino acids at the N-terminus. Some of the exons that encode the N-terminal

part of human Pum proteins (exons 1–14 in PUM1 and 1–12 in PUM2) have slightly

different sizes stemming from small in-frame insertions or deletions.

Figure 1.2: Human PUM1 and PUM2. The PUM1 gene consists of 22 exons, whereas the PUM2 gene consists of 20 exons. PUM-HD is encoded by exons 15–22 in the PUM1 gene and exons 13–20 in the PUM2 gene. (Figure adapted from Spassov and Jurecic, 2002)

Both PUM1 and PUM2 show relatively widespread and mostly overlapping

expression in human tissues, with the only difference being that PUM1 does not seem to

be transcribed in the cerebellum, amygdala, corpus callosum, caudate nucleus, medulla

oblongata, hippocampus and putamen (Spassov and Jurecic, 2002).

1.4.6 Features of PUF proteins

The RNA-binding domain of PUF proteins is generally composed of eight

consecutive 36 amino acid repeats that are very similar, flanked by a degenerate capping

repeat at each end (Figure 1.3; Zamore et al., 1997). The crystal structures of both

Drosophila Pumilio and its human homolog, PUM1, showed that the repeats together

form an extended arc (Wang et al., 2002; Edwards et al., 2001). Each of the PUF repeats

consists of a bundle of three α-helices, and the repeats stack onto each other to form the arc, whilst the

degenerate capping repeats have only one or two helices that approximate the shape of

the canonical repeats (Zamore et al., 1997; Wang et al., 2001; Edwards et al., 2001;

Quenalt et al., 2011).

Figure 1.3: PUF protein crystal structure and binding motif. (a) The crystal structure of the human PUM1 PUF domain bound to its native RNA, NRE. (b) Schematic representation of the recognition of RNA bases in the NRE RNA by the PUF repeats of PUM1. (c) Recognition of adenine (top), uracil (middle), and guanine (bottom) by PUF repeats 5, 6 and 7 in the crystal structure of PUM1, respectively.

PUF proteins recognize specific sequences, known as Nanos response

elements (NREs), present in the 3’UTR of the target mRNA (Murata and Wharton, 1995).

The RNA binds to the inner concave surface of the protein, where each repeat binds to a

single base of the RNA. Amino acids at positions 12 and 16 of the PUF repeat bind each

RNA base via hydrogen bonding or van der Waals contacts with the Watson-Crick edge,

while the amino acid at position 13 makes a stacking interaction. The recognition of RNA

by naturally occurring PUF domains is base-specific (Wang et al., 2002), such that

asparagine and glutamine bind uracil; cysteine and glutamine bind adenine; and serine

and glutamate bind guanine. The RNA runs antiparallel to the protein whereby

nucleotides 1–8 are recognized by PUF repeats 8–1, respectively (Wang et al., 2002). The

modular nature of the interactions has enabled the sequence specificity of the PUF to be

altered by simply mutating the residues that make contacts with the Watson-Crick edge of

the base (Cheong and Hall, 2006). This has paved the way for PUF domains to be

engineered to recognize endogenous RNAs composed of adenine, guanine or uracil (Wang

et al., 2002; Wang et al., 2009; Tilsner et al., 2009). Despite the potential usefulness of

PUF domains as tools, they have not been widely adopted because naturally occurring

residues that recognize cytosine have not been found. This limits the potential target sites

for engineered PUFs and, for small RNAs or defined regions of larger RNAs, can make it

impossible to engineer a PUF protein to bind them.
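The PUF recognition rules described above lend themselves to a short sketch. The residue pairs and the antiparallel pairing of nucleotides 1–8 with repeats 8–1 are taken from the text; the function name, the assumption that each pair is listed in the order (position 12, position 16), and the 8-nucleotide example target are illustrative choices only. Cytosine is rejected because no naturally occurring residue combination that recognizes it has been described.

```python
# Base -> residues assumed to sit at positions 12 and 16 of the PUF repeat.
PUF_CODE = {
    "U": ("N", "Q"),  # asparagine and glutamine bind uracil
    "A": ("C", "Q"),  # cysteine and glutamine bind adenine
    "G": ("S", "E"),  # serine and glutamate bind guanine
}


def design_puf_repeats(rna):
    """Map an 8-nt RNA target onto repeats 1-8 of a PUF domain.

    The RNA runs antiparallel to the protein, so nucleotide 1 is read by
    repeat 8 and nucleotide 8 by repeat 1.
    """
    rna = rna.upper()
    if len(rna) != 8:
        raise ValueError("this sketch assumes an 8-nucleotide target")
    design = {}
    for position, base in enumerate(rna, start=1):
        if base not in PUF_CODE:
            raise ValueError(f"no natural PUF code for base {base!r}")
        design[9 - position] = PUF_CODE[base]
    return design


# Illustrative target chosen for the example, not taken from the text.
design = design_puf_repeats("UGUAUAUA")
for repeat in sorted(design):
    print(repeat, design[repeat])
```

Any target containing cytosine raises an error in this sketch, which mirrors the limitation on target-site choice discussed above.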

1.5 Pentatricopeptide repeat (PPR) proteins

Pentatricopeptide repeat proteins or PPRs were first discovered by Small and

Peeters (2000) when they examined the genome of Arabidopsis thaliana whilst searching

for genes encoding proteins that were predicted to be imported into chloroplasts and

mitochondria. They initially found 200 genes belonging to a unique unreported gene

family. Further analysis uncovered almost 450 independent genes belonging to this family

that could be separated into two subfamilies with four subclasses (Small and Peeters,

2000; Lurin et al., 2004). PPRs were found to be involved in regulating many aspects of

mitochondria and chloroplast function; these include mRNA processing, stability, splicing,

editing and translation (Aubourg et al., 2000; Small and Peeters, 2000; Shikanai, 2006).

1.5.1 Members of the PPR family and their functions

PPR proteins are more common in plants than they are in fungi and vertebrates.

Plant genomes encode nearly 500 PPR proteins, with more than 400 predicted in

Arabidopsis thaliana (Schmitz-Linneweber and Small., 2008). In yeast, nearly 200 potential

PPRs have been identified (Lipinski et al., 2011) and recently some of them have been

studied in Schizosaccharomyces pombe (Kuhl et al., 2011). With the exception of

Trypanosomes, which have 28 PPRs (Mingler et al., 2006), animal genomes generally

encode few to several dozen PPRs. In mammals, there are only seven mitochondrial PPR

proteins that have been identified. These are the leucine-rich PPR cassette (LRPPRC)

protein, the mitochondrial RNA polymerase (POLRMT), PPR domain containing proteins

(PTCD) 1, 2, and 3, the mitochondrial ribosomal protein of the small subunit 27 (MRPS27)

and mitochondrial RNase P protein 3 (MRPP3; Holzmann et al., 2008; Lightowlers and

Chrzanowska-Lightowlers., 2008).

Computational comparison of PPRs from different organisms conducted by Lipinski

et al. (2008) showed that methods that predict PPRs in plants are suboptimal in other

eukaryotes. This suggests that there is significant divergence of these motifs, and this

necessitates thorough analyses and prediction programs for the identification of PPRs in

eukaryotic genomes. Comparison of orthologous PPR proteins has indicated that this

family of proteins has undergone accelerated divergent evolution (Lipinski et al., 2011;

O’Toole et al., 2008). The accelerated divergent evolution could be attributed to the

coevolution of the proteins along with their RNA targets as a result of intragenic genetic

interactions or nucleo-organellar genetic buffering that are the result of cooperative

functional interactions between PPR motifs (Fujii et al., 2011; Lipinski et al., 2011; O’Toole

et al., 2008). The important point is that the presence of PPRs in all of these proteins along

with other eukaryotic PPR proteins has aided the prediction of their common role in RNA

binding.

Many genetic studies found that PPR proteins have essential roles in diverse plant

phenomena, such as embryogenesis (Cushing et al., 2005), fertility restoration of

cytoplasmic male sterility (Chase, 2007), maintenance of chloroplasts and mitochondria

(Schmitz-Linneweber and Small, 2008), abiotic stress response (Zsigmond et al., 2008),

organelle-to-nuclear signaling (Koussevitzky et al., 2007) and metabolite biosynthesis

(Kobayashi et al., 2007). Research on individual mammalian PPR proteins has

shown that they mostly have different and unrelated functions in organelle RNA

metabolism (Rackham and Filipovska., 2011; Davies et al., 2011; Davies et al., 2009; Gohill

et al., 2010; Lightowlers and Chrzanowska-Lightowlers., 2008; Mili and Pinol Roma, 2003;

Rackham et al., 2009; Mootha et al., 2003; Sasarman et al., 2010; Sondheimer et al., 2010;

Sterky et al., 2010; Xu et al., 2008; Ruzzenente et al., 2011).

Several PPR proteins have been shown to interact with RNA by in vitro studies

(Nakamura et al., 2003; Okuda et al., 2007; Hammani et al., 2011; Prikryl et al., 2011), or

by co-immunoprecipitation (Schmitz-Linneweber et al., 2005; Beick et al., 2008). These

PPR proteins interact with either a single specific RNA or a small subset of RNA molecules

and affect numerous aspects of RNA metabolism, including RNA editing (Kotera et al.,

2005), cleavage (Gobert et al., 2010), RNA stability (Pfalz et al., 2009), splicing (Schmitz-

Linneweber et al., 2006), translation or a combination of these functions (Yamazaki et al.,

2004). It has been suggested that the PPR motif itself does not catalyze any RNA

processing; instead it is proposed that PPR proteins act as adapters, with the tandem array

of PPR motifs facilitating binding to nucleic acids in a sequence-specific manner. Of

interest, studies have found that the majority of PPR proteins function by recruiting

accessory proteins that possess RNA-degrading or RNA-modifying functions, or by blocking these

proteins from their target RNAs (Schmitz-Linneweber and Small, 2008).

1.5.2 PPR motif and structure

PPR proteins are made up of 2–26 copies of a degenerate motif of approximately 35

amino acids, organized as a tandem array (Schmitz-Linneweber and Small, 2008; Small and

Peeters, 2008). All known PPR proteins have been found to be nuclear encoded, with most

predicted to localize to the chloroplast or mitochondria (Lurin et al., 2004). They share

similarities in primary structure to the tetratricopeptide repeat (TPR); while the function

of the TPR motifs is to mediate protein-protein interactions, PPR domains are mainly

involved in RNA-protein interactions (Small and Peeters, 2000). In vitro and in vivo studies

have confirmed that PPR domains interact with RNA (Nakamura et al., 2003; Okuda et al.,

2007; Hammani et al., 2011; Prikryl et al., 2011) and it is suggested that PPR proteins

function as adapters, facilitating binding to nucleic acid in a sequence specific manner.

Figure 1.4: PPR proteins. (a) Schematic representation of a typical PPR protein, human PTCD3 (Davies et al., 2009). Locations of predicted PPRs and an N-terminal mitochondrial targeting sequence (MTS) are shown. (b) Crystal structure of the human POLRMT protein containing two tandem PPRs solved by Ringel et al. (2011) (adapted from Filipovska and Rackham, 2012). The C-terminal PPR is highlighted in purple and the N-terminal PPR is highlighted in orange.

Ringel et al. (2011) solved the structure of a mitochondrial RNA polymerase that

contained two PPR motifs (Figure 1.4b). The crystal structure showed that the 35 amino

acid PPR motif comprised two anti-parallel α-helices. Initially, the helices were

predicted to be arranged in tandem arrays to form a superhelix structure with a central

hydrophilic cavity where the RNA phosphate backbone could potentially interact (Small

and Peeters, 2000; Tavares-Carreon et al., 2008). The helical-hairpin model has been

confirmed experimentally via both circular dichroism spectrum analysis and analytical

ultracentrifugation using maize PPR5 (Williams-Carrier et al., 2008). Structural prediction

proposed that helix A of the PPR motif is located at the concave surface. The inner surface

of the protein is positively charged enabling interactions with the negatively charged

backbones of nucleic acids to occur (Delannoy et al., 2007).

Recently, Howard et al. (2012) solved the crystal structure of a protein-only RNase

P (PRORP) from Arabidopsis thaliana. This protein is different to any other RNase P in

that it does not possess a catalytic RNA component. PRORP1 is one of three PRORP

enzymes encoded by A. thaliana and it localizes to mitochondria and chloroplasts (Gobert

et al., 2010). The crystal structure revealed that PRORP1 is composed of three discrete

domains, one of which is a PPR domain composed of 11 α-helices forming 5.5 consecutive

PPR repeats, each consisting of a helix-turn-helix hairpin (Howard et al., 2012). The

domain arrangement resembled that seen in tetratricopeptide repeat (TPR) motifs

whereby the tandem helical repeats associate to form a right-handed superhelical

structure (Howard et al., 2012). Howard et al. (2012) also found that PRORP1 has an

overall neutral electrostatic surface potential at the concave surface facing the putative

active site, suggesting that the PPR–nucleic acid interaction is not mainly electrostatic.

1.5.3 RNA recognition code of PPR proteins

Using co-variation analysis to determine phylogenetically conserved amino acids,

clues to the RNA recognition code of these proteins have been deciphered. Recently,

Kobayashi et al. (2012) used truncations of the Arabidopsis HCF152 protein, which is

composed of two adjacent PPRs, to perform widespread mutagenesis in order to identify

amino acids that are important for RNA-binding and specificity. This study identified five

residues at positions 1, 4, 8, 12 and 34 as ones that are imperative for high

affinity RNA-interaction (Kobayashi et al., 2012). These five residues are aligned such that

they are exposed on the solvent surface of the PPR protein, although the structure does

not necessarily imply the mechanism for PPR-RNA binding (Ringel et al., 2011). They also

highlighted that residue 4 appears to be particularly important for PPR function as

substitutions at that position resulted in drastically reduced RNA binding affinity, and that

the 4th residue possesses inter- and intra-connections with all adjoining residues

(Kobayashi et al. 2012).

In another study, Barkan et al. (2012) used computational methods to deduce a

code for nucleotide recognition of PPR proteins by using the maize protein PPR10, which

consists of 19 PPR motifs, as a model. They found strong correlations between the RNA

base and the amino acids at positions 6 and 1’ (corresponding to positions 4 and 34 in the

Kobayashi et al. (2012) study), which were suggested to be specificity-determining

positions based on their patterns of evolutionary selection (Barkan et al., 2012; Fujii et al.,

2011). Using mobility shift assays to test whether there are correlations between the

amino acid identities at those PPR positions to RNA-binding specificity, they found that

PPRs recognize RNA in a modular manner, in a parallel orientation, with the amino acid at

positions 6 and 1′ in each repeat determining base preference (Barkan et al., 2012). In the

context of two adjacent PPR motifs, other amino acid positions appear to not affect

nucleotide specificity as amino acid changes at positions 6 and 1’ were sufficient to change

the nucleotide preference (Barkan et al., 2012). Although conceptually comparable to

PUF/RNA recognition, PPR/RNA complexes involve distinct amino acid combinations and

have opposite polarity (Barkan et al., 2012). Their results define a combinatorial two-

amino acid code that can specify binding of a PPR motif to either A, G, U>C, C>U, or U = C

(Table 1.2).

Table 1.2: Amino acid code of PPR proteins for RNA binding (Barkan et al., 2012)

Amino acid at     Amino acid at        Nucleotide
Position 6        Position 1’          preference
Threonine (T)     Aspartic Acid (D)    G >>> A,C,U
Threonine (T)     Asparagine (N)       A >>> G,C,U
Asparagine (N)    Aspartic Acid (D)    U > C >>> A,G
Asparagine (N)    Asparagine (N)       C = U >>> A,G
Asparagine (N)    Serine (S)           C > U >>> A,G

Barkan et al. (2012) highlighted that the prediction of the natural binding sites of

PPR proteins and off-target binding estimation by synthetic PPR proteins would prove to

be challenging because the RNA-binding code is degenerate, with fewer than two-thirds of

naturally occurring combinations able to be deciphered, and because there is still a lack of

understanding of the energetic requirements in establishing a physiologically relevant

PPR/RNA interaction.
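
As an illustration only, the two-amino-acid code summarized in Table 1.2 can be expressed as a lookup table. The sketch below assumes one-letter amino acid codes and uses a hypothetical function name, predict_site; combinations that are not listed in Table 1.2 are reported as unknown rather than inferred, reflecting the degeneracy noted by Barkan et al. (2012).

```python
# Two-amino-acid PPR code as listed in Table 1.2 (Barkan et al., 2012).
# Keys are (position 6, position 1') in one-letter code; values give the
# nucleotide preference of a single PPR motif.
PPR_CODE = {
    ("T", "D"): "G",
    ("T", "N"): "A",
    ("N", "D"): "U>C",
    ("N", "N"): "C=U",
    ("N", "S"): "C>U",
}


def predict_site(repeats):
    """Return the predicted base preference for each repeat, in order.

    `repeats` is a list of (position 6, position 1') pairs, one per PPR
    motif. Pairs absent from Table 1.2 are returned as "?" (hypothetical
    placeholder) instead of being guessed.
    """
    return [PPR_CODE.get(pair, "?") for pair in repeats]


# Example: four motifs, the last carrying an uncharacterized combination.
print(predict_site([("T", "D"), ("N", "S"), ("T", "N"), ("A", "K")]))
```

Because each motif is looked up independently, the sketch mirrors the modular, one-repeat-per-base recognition described above; it does not model affinity or the contribution of neighbouring motifs.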

1.5.4 Human PPRs

As previously mentioned, there are only seven identified mitochondrial PPR

domain proteins in mammals to date. These are the mitochondrial RNA polymerase

(POLRMT), the leucine-rich PPR cassette (LRPPRC) protein, PPR domain containing

proteins (PTCD) 1, 2, and 3, mitochondrial RNase P protein 3 (MRPP3) and mitochondrial

ribosomal protein of the small subunit 27 (MRPS27) (Rackham and Filipovska, 2012;

Rackham et al., 2012; Figure 1.5). There remains some discrepancy about the number of

PPR domains that each of these mammalian PPR proteins possess due to the scarcity of

structural and functional data on PPR domains and how they bind RNA. This makes it

challenging to precisely define functional PPR domains. Experimental and structural

evidence is necessary to explain the mechanism of their interaction with RNA. Studies are

on-going investigating the association of RNA with PPR proteins; it is worth noting that

these proteins are highly insoluble, causing delays in determining their atomic structure.

Figure 1.5: Mammalian mitochondrial PPR proteins. Schematic representation of the seven mammalian mitochondrial PPR domain proteins. (adapted from Rackham and Filipovska, 2011)

A brief summary of each human PPR protein is as follows. The leucine-rich

pentatricopeptide repeat cassette (LRPPRC) protein is primarily a mitochondrial matrix

protein that is 130 kDa in size and contains 22 predicted PPR domains (Sterky et al., 2010;

Mili and Pinol-Roma, 2003; Mootha et al., 2003; Xu et al., 2004). It is still unclear which

aspect of gene expression and mitochondrial RNA metabolism is affected by LRPPRC

reduction but it has been shown that LRPPRC may be responsible for mitochondrial mRNA

stability and co-ordinated translation (Ruzzenente et al., 2011). MRPP3 is a 67 kDa

mitochondria targeted protein that is composed of 3 PPR domains and a putative

metallonuclease domain. MRPP3 has recently been identified as one of the three essential

components of the mitochondria targeted RNase P (Holzmann et al., 2008). Recently, it

has been shown that MRPP3 is necessary for the processing of mitochondrial tRNAs

(Holzmann et al., 2008; Lopez-Sanchez et al., 2011; Holzmann et al., 2009; Rossmanith and

Holzmann, 2009). The mechanism of MRPP3 is still unclear and it has been speculated that

the PPR domains in MRPP3 recognize and bind the substrate tRNA, while the

metallonuclease domain carries out the cleavage of the tRNA from the precursor

transcript.

MRPS27 is a PPR domain protein that has six putative PPR domains located in

tandem towards the N-terminus of the protein and it has been shown to associate with

the small subunit of the mitochondrial ribosome and to be required for mitochondrial

translation (Davies et al., 2012). The human POLRMT is a single polypeptide subunit that is

139 kDa in size prior to import into the mitochondria, and is essential for the transcription

of the mitochondrial genome (Falkenberg et al., 2007). POLRMT transcription of the

mitochondrial genome requires the presence of both the mitochondrial transcription

factor A (TFAM) and one of the two mitochondrial transcription factor B paralogues

(TFB1M and TFB2M) (Falkenberg et al., 2002; Kanki et al., 2004). PTCD1 is a mitochondrial

matrix protein that contains eight PPR domains and is predominantly found in muscle and

heart (Rackham et al., 2009). PTCD1 is thought to be involved in negatively regulating

leucine tRNA levels and consequently affects the abundance of mitochondria encoded

proteins (Lopez Sanchez et al., 2011; Rackham et al., 2009). PTCD2, a 44 kDa mitochondria

targeted protein, contains 5 PPR domains and a study by Xu et al. (2008) indicated that

PTCD2 may regulate Cyt b RNA processing in mice. PTCD3 is a 79 kDa protein that contains

15 PPR domains and has an important role in the translation of mitochondrial proteins by

association with the small subunit of mitochondrial ribosomes and the 12S rRNA (Davies

et al., 2009).

Overall, human PPRs have diverse roles in mitochondrial gene expression. Their

modularity has enabled them to have numerous RNA regulatory functions; however, their

RNA-binding target sequences have yet to be determined. Identification of their target

RNAs would allow better understanding of their functions and modes of regulating the

expression of mitochondrial mRNAs. To date, PPR proteins have not been used in

biotechnological applications. Although key aspects of the code by which PPRs recognize

RNA have been discovered, the base specificity of only a few amino acid combinations

has been experimentally verified and the structural and mechanistic details of how they

bind RNA are unknown. Furthermore, applications of PPR proteins have been severely

limited by their inherent insolubility when expressed recombinantly. Elucidating the RNA

recognition code of PPR proteins would not only allow us to predict their RNA targets but

perhaps to engineer them as tools to selectively and specifically manipulate mammalian

mitochondrial gene expression.

1.6 Purview of the thesis

There are many potential applications of designer RNA-binding proteins for

biotechnological and medical use, given the importance of posttranscriptional regulation.

However, before we are able to engineer them for these purposes, we must first obtain

the complete modular amino acid code in order to be able to predict their binding

specificity. Here I describe the use of directed evolution to expand PUF repeat recognition

beyond adenine, guanine and uracil in order to specifically bind cytosine. One of the

factors considered prior to selecting PUFs for protein engineering was that the residues to

randomize in PUF proteins are readily apparent and few in number, because full

randomization can be achieved by mutating the two amino acids that make contact with

the RNA bases. PUF proteins also have the ability to undergo combinatorial selection,

which entails the capacity to randomize the RNA-binding residues to enable

programmable protein specificity. It is not surprising that many studies have successfully

engineered PUFs to bind their specific target of interest (Lu et al., 2009; Ozawa et al.,

2007; Wang et al., 2009). I have also demonstrated that these PUF repeats can be

engineered to selectively bind targets beyond their native eight-repeat RNA sequence and

that binding can be achieved with RNA targets of diverse structure.

Additionally, I describe how the RNA recognition code of PPR proteins was

successfully deciphered using a consensus protein design composed of the most common

amino acid residues at each position of the PPR motif. Our success with the PUF proteins

led to the work on PPR proteins because they too are able to undergo combinatorial

selection, although it was unclear at the start of the study as to which amino acid residue

positions were responsible for base recognition. We also took into consideration the fact

that both PUF and PPR proteins have contiguous recognition. This is important because

theoretically, better specificity would be achieved if the spaces (or lack thereof) between

each RNA-binding domain were fixed (Wang et al., 2002). Double ZFs are an example of

RBPs that possess a spacer between each repeat unit. In order to maximize

specificity, one would have to optimize not only the linker length but also its composition

(Handel et al., 2009; Shimizu et al., 2009). With the newly deciphered RNA-binding PPR

code, we were able to engineer a synthetic PPR protein that was able to target the poly(A)

tail of mitochondrial mRNAs, highlighting the ability of using designer RBPs to target

transcripts not accessible to RNAi technologies. Overall, not only does this study better

our understanding of RBP-RNA recognition, it also provides a glimpse into their potential

application to better understand the complex patterns of gene expression in diseases.

CHAPTER 2

Materials and Methods

2.1 Materials

All chemicals and materials used in this study were of analytical grade and were

sourced from Amresco Inc., DIFCO Laboratories or Sigma Chemical Company, unless

indicated below.

BD (Becton, Dickinson and Company): Bacto Yeast Extract, YPD Broth, Bacto Agar, Bacto

Tryptone

Beckman: 50 ml and 500 ml Ultracentrifuge tubes

Bio-Rad Laboratories Inc.: Mini-PROTEAN 1D-Electrophoresis system, PowerPac Universal

power supply

Fermentas: DreamTaq DNA Polymerase, 10X DreamTaq Green Buffer, high-fidelity

restriction enzymes and associated reaction buffers, GeneJET Plasmid Miniprep Kit,

GeneJET PCR Purification Kit and O’GeneRuler 1kb DNA Ladder Plus, 6x Orange DNA

loading dye

GE Lifesciences: 0.45 μm Hybond-N+ nitrocellulose membrane

Greiner Bio-One: 15 ml and 50 ml Falcon tubes, Cellstar 96-well plate, Cellstar 10 and 25

ml pipette tip

Invitrogen Corporation: DH10B E. coli, pcDNA3 plasmid, Dulbecco’s modified Eagle’s

medium, DMEM, Lipofectamine 2000, RNaseOUT, AcTEV protease, OptiMEM media

Ito, T. (University of Tokyo): pGAD-RC plasmid

New England Biolabs: All restriction endonucleases, T4 ligase, Phusion DNA polymerase,

DNA polymerase I Large (Klenow) fragment and associated reaction buffers,

deoxyribonucleotides (dNTPs), ER2566 E. coli, chitin beads, pTYB3 plasmid

Novagen: BL21(DE3) E. coli, Rosetta E. coli

Perkin-Elmer: Expres35S Protein Labeling Mix (35S)

Promega: Beta-Glo Assay System

Qiagen: RNeasy Mini kit, QuantiTect Reverse Transcription Kit, miRNeasy Mini kit

Roche: Complete protease inhibitors, Fugene HD

Sarstedt: 90 mm x 14 mm petri dishes

Scientific Specialties Inc.: 1.7 ml graduated microtubes, Ultraflux 200 µL flat top PCR

Sigma: rabbit-IgG agarose

Wickens, M. (University of Wisconsin at Madison): pIIIA/MS2-2 plasmid, RNA expression

plasmid

Thermo Scientific: Top Vision agarose, PageRuler Plus Prestained Protein Ladder, Slide-A-

Lyzer mini dialysis units (3,500 MWCO), FastAP Thermosensitive Alkaline Phosphatase

2.2 Methods

Unless otherwise indicated, all methods were performed according to Current

Protocols in Molecular Biology (Ausubel, 1987), Current Protocols in Cell Biology

(Bonifacino, 1998), or manufacturers’ instructions. Specifically, restriction digestion and

ligation reactions were performed according to protocols provided by New England

Biolabs. PCR using Phusion DNA polymerase or DreamTaq DNA polymerase used protocols

provided by New England Biolabs and Fermentas, respectively, with “touch-down” cycling

according to Current Protocols in Molecular Biology (Ausubel, 1987). Gel extraction, PCR

purification and plasmid minipreps were performed according to instructions from

Fermentas. Annealing of oligonucleotides was performed according to Current Protocols

in Molecular Biology (Ausubel, 1987). RNA extraction and cDNA synthesis were performed

according to instructions from Qiagen. Basic cell culture techniques were carried out

according to Current Protocols in Cell Biology (Bonifacino, 1998).

2.2.1 Plasmid construction

To produce a Gal4p activation domain fused to a PUF domain, a synthetic gene

encoding amino acids 828 to 1176 of the human PUM1 protein (GenBank accession no.

NP_001018494, GENEART) was subcloned into pJC72 (Rackham and Chin, 2005). The

synthetic PUM1 gene was cut with NcoI and XhoI and ligated into a FastAP treated

NcoI/XhoI cut pJC72 plasmid. This plasmid was used as a template for library construction

by enzymatic inverse PCR (Rackham and Chin, 2005) using primers where the codons

corresponding to amino acids 1043 and 1047 were encoded by mixtures of trimer

phosphoramidites encoding all 20 amino acids (GeneWorks). This library of mutant PUF

domains was subcloned into the yeast expression plasmid pGAD-RC that had been cut

with NcoI and XhoI (Ito et al., 2000). Individual PUF domain mutants were also made by

enzymatic inverse PCR where two pairs of primers specifying cysteine or serine at amino

acid 1043 and glutamine or glutamate at amino acid 1047 were designed for mutating

repeat 6 of human PUM1 protein for yeast three-hybrid sensitivity testing. To make a

16-repeat PUF protein (PUFx2), repeats 1-8 of the human PUM1 cDNA were amplified using

primers that incorporated flanking SacI sites, digested with SacI and cloned into an

engineered SacI site that encodes amino acids 1030 and 1031 of the synthetic gene

encoding the PUM1 PUF domain.

RNA expression plasmids were made by altering the multiple cloning site of pIIIA/MS2-2

(Stumpf et al., 2008) according to Cassiday and Maher (2001) and sub-cloning pairs of

annealed oligonucleotides corresponding to the following RNA sequences (PUF

recognition sequences in bold, site specific mutations underlined):

NRE: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG-3'; NREU1C: 5'-CCGGCUAGCAAUCGUAUAUAUUAAUUUAAUAAAGCAUG-3'; NREG2C: 5'-CCGGCUAGCAAUUCUAUAUAUUAAUUUAAUAAAGCAUG-3'; NREU3A: 5'-CCGGCUAGCAAUUGAAUAUAUUAAUUUAAUAAAGCAUG-3'; NREU3C: 5'-CCGGCUAGCAAUUGCAUAUAUUAAUUUAAUAAAGCAUG-3'; NREU3G: 5'-CCGGCUAGCAAUUGGAUAUAUUAAUUUAAUAAAGCAUG-3'; NREA4C: 5'-CCGGCUAGCAAUUGUCUAUAUUAAUUUAAUAAAGCAUG-3'; NREU5C: 5'-CCGGCUAGCAAUUGUACAUAUUAAUUUAAUAAAGCAUG-3'; NREA6C: 5'-CCGGCUAGCAAUUGUAUCUAUUAAUUUAAUAAAGCAUG-3'; NREU7C: 5'-CCGGCUAGCAAUUGUAUACAUUAAUUUAAUAAAGCAUG-3'; NREA8C: 5'-CCGGCUAGCAAUUGUAUAUCUUAAUUUAAUAAAGCAUG-3'; NREstem5: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAAUAAAGCAUG-3'; NREstem6: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUUAAAGCAUG-3'; NREstem7: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUAAAAGCAUG-3'; NREstem8: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUACAAGCAUG-3'; NREx2: 5'-CCGGCUAGCAAUUGUUGUAUAUAAUAUAUUAAUUUAAUAAAGCAUG-3'; NREx2mut1: 5'- CCGGCUAGCAAUCCCUGUAUAUAAUAUAUUAAUUUAAUAAAGCAUG-3'; NREx2mut2: 5'- CCGGCUAGCAAUUGUCCCCUAUAAUAUAUUAAUUUAAUAAAGCAUG-3'.
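
Since the bold and underlined formatting of the sequences above may not survive in every copy of this text, the point mutations can be recovered by comparing each variant with the NRE sequence. The following is a minimal sketch of such a check; the function names are hypothetical, and the eight-nucleotide recognition sequence UGUAUAUA is an assumption inferred from the mutant names (U1, G2, U3 ... A8) rather than stated explicitly in this section.

```python
# Sequences copied from the list above (written 5' to 3').
NRE = "CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG"
NREU3C = "CCGGCUAGCAAUUGCAUAUAUUAAUUUAAUAAAGCAUG"

# Assumed PUF recognition sequence, inferred from the mutant naming scheme.
PUF_SITE = "UGUAUAUA"


def substitutions(reference, variant):
    """List (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, ref, var)
        for i, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


def site_position(reference, position, site=PUF_SITE):
    """Convert a 1-based position in the RNA to a 1-based position in the site."""
    start = reference.find(site)  # 0-based index of the first site base
    if start == -1:
        raise ValueError("recognition site not found in reference")
    return position - start


for pos, ref, var in substitutions(NRE, NREU3C):
    # Prints the mutation in the same style as the construct names.
    print(f"{ref}{site_position(NRE, pos)}{var}")
```

For the pair shown, the single mismatch falls on the third base of the assumed recognition sequence, which agrees with the construct name NREU3C.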

All of the synthetic genes cPPRcaps poly(A), cPPRcaps poly(U), cPPRcaps poly(C)

[NT], cPPRcaps poly(C) [NS], cPPRcaps poly(G) [SD], cPPRcaps poly(G) [GD] and cPPRcaps

NRE were assembled from synthetic oligonucleotides and/or PCR products by GeneArt

(Life Technologies). The fragments were provided pre-cloned into pMK-RQ using SfiI

cloning sites. For the expression and purification of PUF and cPPR proteins, both the PUF

and cPPR genes were cloned into the pTYB3 vector. pTYB3 is a 7,477 bp E. coli expression

vector used in the IMPACT Kit (NEB #E6901; Chong et al., 1997). This C-terminal fusion

plasmid was designed for the insertion of a target gene into a polylinker upstream of an

intein tag (the Sce VMA intein/chitin binding domain, 55 kDa) (Chong et al., 1997;

Watanabe et al., 1994). Upon cloning, the C-terminal end of the target protein would be

fused to the N-terminus of the intein tag, both under the control of an IPTG-inducible T7

promoter (Dubendorff and Studier, 1991). Single column purification of the target protein

is achieved via thiol-induced self-cleavage of the intein, which releases the target protein

from the chitin-bound intein tag (Chong et al., 1996).

The synthetic cPPR fragments were cut with NcoI and SapI from the pMK-RQ

plasmid and ligated with a FastAP treated, NcoI and SapI cut pTYB3 plasmid. The PUF and

mutant derivative genes were cloned into the NcoI/XhoI cut pTYB3 plasmid.

Additionally, the PUF gene and its mutant derivatives were also cloned into

NcoI/XhoI cut pETM30 vector, another protein expression vector that has both

a glutathione S-transferase (GST) tag and a HexaHis (His6) tag located at the N-

terminus. Purification of the target protein via this GST gene fusion system is achieved by

TEV cleavage of the His6 after the GST-proteins have been captured by glutathione-

agarose beads. For expression of cPPR proteins in tissue culture, cPPRcaps poly(A) and

cPPRcaps NRE were initially subcloned into an NcoI/XhoI cut pJC72-OTC backbone. Ornithine

transcarbamylase (OTC) is a 36 kDa protein that facilitates post-translational import of

proteins into mitochondria (Brusilow and Horwich, 1996; Mori et al., 1981).

Subsequently, the fused OTC-cPPR genes were cut with KpnI/XhoI, gel purified and cloned

into the pcDNA3 expression vector (Invitrogen). All plasmids were tested for expression by

transfection and immunoblotting.

2.2.2 E. coli competent cell preparation

DH10B/ER2566/BL21 cells were grown in 10 ml lysogeny broth (LB) medium

overnight at 37 °C with shaking (180 rpm). The starter culture was diluted in 500 ml LB

media and grown at 37 °C with shaking until OD600 was between 0.4 and 0.6. Cells were

pelleted at 3000 rpm for 5 min and resuspended in 150 ml ice cold 100 mM CaCl2/10%

glycerol. Cells were pelleted once again and resuspended in 20 ml ice cold 100 mM

CaCl2/10% glycerol and incubated for 25 min. Aliquots of 250 µL were dispensed in sterile 1.7 ml

microcentrifuge tubes. Transformation efficiency and checks for contamination were

performed after each preparation, and frozen stocks were stored at –80°C for subsequent use.

2.2.3 E. coli transformation

For transformation, 10 µL or 50 µL of E. coli competent cells were added to 1 µL

(whole plasmid) or 10 µL (ligation mix), respectively and incubated on ice for 30 mins.

Cells were subsequently heat-shocked for 30 s at 42 °C in a water bath. Cells were then

chilled on ice for 5 min and 1 ml of SOC was added (made by adding 20 mM glucose to 1 L

of SOB medium [2% (w/v) tryptone peptone, 0.5% (w/v) yeast extract, 10 mM NaCl, 2.5

mM KCl, 10 mM MgCl2, 10 mM MgSO4; prepared without Mg2+ and autoclaved, with Mg2+

added from a 2 M filter-sterilized stock (1 M MgCl2.6H2O and 1 M MgSO4.7H2O)]). Cells

were left to shake (180 rpm) at 37°C for 40 mins. Cells were pelleted and plated on LB agar

plates (with antibiotics) and grown overnight at 37°C.
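
For convenience, the SOC recipe above can be converted into weighed amounts per litre. The sketch below is illustrative only: the molar masses are standard values for NaCl, KCl and anhydrous glucose (not taken from this thesis), the helper names are hypothetical, and the small volume contributed by the Mg2+ stock is ignored.

```python
# Molar masses in g/mol (standard values; glucose assumed anhydrous).
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "glucose": 180.16}


def grams_for_percent_wv(percent, volume_l):
    """Mass in grams for a % (w/v) component (1% w/v = 10 g per litre)."""
    return percent * 10.0 * volume_l


def grams_for_millimolar(compound, millimolar, volume_l):
    """Mass in grams needed to reach a millimolar concentration."""
    return MOLAR_MASS[compound] * millimolar / 1000.0 * volume_l


def stock_volume_ml(final_mm, stock_mm, volume_l):
    """Volume of stock in ml for a final concentration (C1V1 = C2V2)."""
    return final_mm / stock_mm * volume_l * 1000.0


volume_l = 1.0
recipe = {
    "tryptone (g)": grams_for_percent_wv(2.0, volume_l),
    "yeast extract (g)": grams_for_percent_wv(0.5, volume_l),
    "NaCl (g)": grams_for_millimolar("NaCl", 10, volume_l),
    "KCl (g)": grams_for_millimolar("KCl", 2.5, volume_l),
    "glucose (g)": grams_for_millimolar("glucose", 20, volume_l),
    # 10 mM MgCl2 + 10 mM MgSO4 = 20 mM Mg2+, drawn from a stock that is
    # 2 M in Mg2+ (1 M of each salt).
    "2 M Mg2+ stock (ml)": stock_volume_ml(20, 2000, volume_l),
}

for component, amount in recipe.items():
    print(f"{component}: {amount:.2f}")
```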

2.2.4 Bacterial colony screening

As many plasmids were constructed in the course of this and other projects, a

quick and reliable method for screening for recombinant clones was required. Colonies

were initially picked and resuspended in 6 µL of LB. Of this suspension, 3 µL was added to

8 µL of 0.5% Tween-20 and mixed. The remaining 3 µL were used later for inoculation of

an overnight culture when screening results were positive. The Tween-20 suspension was

heated to 100°C in a thermal cycler for 30 s. For PCR based screening, an aliquot of 1 µL

was used from the denatured Tween-20 suspension and amplified with appropriate

primers and DreamTaq DNA polymerase.

For screening by restriction enzyme digest, 2 µL of digestion mix (containing 1 µL of 10x buffer, 0.2

µL of each restriction enzyme and made up to the final volume with sterile deionized

water) was added to the denatured Tween-20 suspension. After 30 min at the

appropriate temperature, reaction mixes were resolved by agarose gel electrophoresis

and analyzed by ethidium bromide staining.

2.2.5 Plasmid preparation and analysis

Plasmid DNA was isolated from 10 ml of overnight E. coli DH10B culture using the

Fermentas GeneJET Plasmid Miniprep Kit. The concentration and purity of the plasmid

DNA was determined using a NanoDrop ND-1000 Spectrophotometer (Nanodrop

Technologies, Inc). The identity of each plasmid was confirmed by restriction

endonuclease digestion and Sanger sequencing of inserts using primers flanking ligation

junctions (Australian Genome Research Facility, Perth).

2.2.6 Yeast transformations

The lithium acetate method for transforming yeast was adapted from Gietz and

Woods (2001). S. cerevisiae YBZ1 cells (MATa, ura3-52, leu2-3, 112, his3-200, trp1-1, ade2,

LYS2 :: (LexAop)-HIS3, ura3 :: (lexA-op)-lacZ, LexA-MS2 coat (N55K))(Hook et al., 2005)

were inoculated in 10 ml YPAD (1% w/v Yeast extract, 2% w/v peptone, 2% w/v glucose,

0.01% w/v adenine hemisulphate) and grown overnight at 30 °C with shaking (180 rpm).

Cells were pelleted at 3200 rpm for 2 mins and resuspended in 50 ml YPAD in a baffled

flask. Cells were returned to grow for 3 hours at 30 °C with shaking. Cells were pelleted,

washed in 40 ml TE (10 mM Tris-Cl, pH 7.5; 1 mM EDTA), pelleted again and incubated at

room temperature for 10 mins in 2 ml 100 mM Lithium acetate/ 0.5x TE. 100 µL of cells

was added to 1 µg plasmid DNA with 100 µg denatured salmon sperm DNA and mixed

gently. For yeast three-hybrid transformations, both RNA expression and yeast expression

plasmids were added together. 700 µL of 100 mM lithium acetate/40% PEG-3350/1x TE was

added to cell-DNA mix and incubated at 30°C for 30 mins. 88 µL of DMSO was added and

the mixture was heat shocked for 7 mins in a 42°C water bath. Cells were pelleted for 30 s

at 10,000 rpm, liquid removed and washed with 1 ml TE. Cells were pelleted once again

and resuspended in 100 µL TE before plating on SC media plates lacking the appropriate

amino acids and allowed to grow at 30°C. SC media contained 0.67% w/v yeast nitrogen base (without

amino acids, with ammonium sulphate) and 2% w/v glucose, pH 5.6, and was prepared without amino

acids and dropout mix before autoclaving; 100 ml of 10x dropout mix (0.03% w/v arginine HCl,

0.03% w/v isoleucine, 0.03% w/v lysine HCl, 0.03% w/v methionine, 0.05% w/v

phenylalanine, 0.03% w/v serine, 0.03% w/v threonine, 0.03% w/v tyrosine, 0.15% w/v

valine) and 10 ml of individual 100x amino acids were added as required. SC agar was

made by adding 2.1% w/v Bacto agar to SC media.

2.2.7 PUF protein expression and purification

PUF domains were subcloned into pTYB3 and expressed as fusions to an intein and

chitin-binding domain in E. coli ER2566 cells. Cells were lysed by sonication in 20 mM

sodium phosphate (pH 8.0), 1 M NaCl, and 0.1 mM PMSF. Lysates were clarified by

centrifugation and incubated for 40 min with chitin beads. Beads were washed twice with

20 mM sodium phosphate (pH 8.0), 1 M NaCl, and 0.1 mM PMSF, once with 20 mM

sodium phosphate (pH 8.0), 0.5 M NaCl, and 0.1 mM PMSF, and once with 20 mM sodium

phosphate (pH 8.0), 0.15 M NaCl, and 0.1 mM PMSF. DTT was added to the beads to 50

mM final concentration and the tube was purged with nitrogen gas before incubation at

room temperature with gentle rocking for three days. Cleaved PUF domain protein, free

from the intein and chitin-binding domain was collected, transferred into 10 mM Tris-HCl

(pH 7.4), 150 mM NaCl, 5 mM β-mercaptoethanol and further purified by an ÄKTA-

Explorer system (GE) using a Superdex 200 10/300 column (GE) with a total bed volume of

120 ml. Pure fractions were pooled and concentrated using Microsep 10K Omega

centrifugal devices (PALL). Protein concentration was determined by the bicinchoninic acid

(BCA) assay using bovine serum albumin (BSA) as a standard.

2.2.8 cPPR protein expression and purification

Like the PUF proteins, cPPR domains were subcloned into pTYB3 and expressed as

a fusion to an intein and chitin-binding domain in Escherichia coli ER2566 cells (New

England Biolabs). Cells were lysed by sonication in 20 mM Trizma base (pH 8.0), 1 M NaCl,

10% glycerol and 0.1 mM PMSF. Lysates were clarified by centrifugation and incubated for

40 min with chitin beads (New England Biolabs). Beads were washed five times with 20

mM Trizma base (pH 8.0), 1 M NaCl, 10% glycerol and 0.1 mM PMSF. DTT was added to

the beads to 50 mM final concentration and the tube was purged with nitrogen gas before

incubation at room temperature with gentle rocking for three days. Cleaved cPPR domain

protein, free from the intein and chitin-binding domain was collected and transferred into

a Slide-A-Lyzer mini dialysis unit and dialyzed overnight in 20 mM Trizma Base (pH 8.0), 1

M NaCl, 10% glycerol to remove DTT. Protein concentration was determined visually on a

10% SDS-PAGE gel using dilutions of 1 mg/ml bovine serum albumin (BSA) as a standard.

2.2.9 Bicinchoninic acid (BCA) protein assay

In order to determine the protein concentration of samples, BCA protein assays

were conducted as per Smith et al. (1985) Analytical Biochemistry 150 (pg 76-85) in a 96-

well plate using bovine serum albumin (BSA) as a standard. 20 µL triplicates of each

sample were pipetted in successive wells. Dilutions were prepared in 1% v/v Triton-X-100

in water. 50 parts of BCA reagent A (1% BCA [4,4’-dicarboxy-2,2’-biquinoline], 2% Na2CO3,

0.16% Na2tartrate, 0.4% NaOH, 0.95% NaHCO3, pH 11.25) was mixed with one part BCA

reagent B (4% CuSO4.5H2O). 200 µL of the prepared reagent was added to each sample

and was allowed to incubate at 37 °C for one hour. The plate was read at OD550

on VICTOR3™ Multilabel Counter model 1420 (PerkinElmer).
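
The conversion from plate readings to protein concentration can be sketched as a straight-line fit to the BSA standards followed by interpolation of the sample mean. The example below is an assumption-laden illustration rather than the analysis actually used: the absorbance values are hypothetical, the function names are invented for this sketch, and a linear response over the standard range is assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(absorbance, slope, intercept, dilution_factor=1.0):
    """Interpolate a concentration from the standard curve."""
    return (absorbance - intercept) / slope * dilution_factor


# Hypothetical BSA standards (mg/ml) and their plate readings.
bsa_mg_per_ml = [0.0, 0.25, 0.5, 1.0, 2.0]
standard_od = [0.10, 0.20, 0.30, 0.50, 0.90]
slope, intercept = fit_line(bsa_mg_per_ml, standard_od)

# Hypothetical triplicate readings for one sample, averaged before use.
sample_triplicate = [0.41, 0.42, 0.43]
mean_od = sum(sample_triplicate) / len(sample_triplicate)
sample_conc = concentration(mean_od, slope, intercept)

print(f"slope={slope:.3f}, intercept={intercept:.3f}, sample={sample_conc:.2f} mg/ml")
```

Samples reading above the highest standard would need to be diluted and re-assayed, with the dilution passed as dilution_factor.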

2.2.10 SDS-PAGE gel

Cell samples were denatured in loading buffer (50 mM Tris, 4% SDS, 12% glycerol,

2% 2-mercaptoethanol, 0.01% Coomassie brilliant blue) for 7 min at 95°C and separated

on a 10% Tris-glycine gel (0.375 M Tris-HCl, 0.1% SDS, pH 6.8) using the BioRad Mini

Protean system. Gels were then stained with Coomassie blue stain (40% methanol, 10%

acetic acid, 0.1% Coomassie Brilliant Blue) for 1 hr and destained (20% methanol, 7.5%

acetic acid).

2.2.11 RNA electrophoretic mobility shift assays

Purified PUF/cPPR domains were incubated at room temperature for 30 min with

fluorescein labeled RNA oligonucleotides (Dharmacon) in 10 mM HEPES (pH 8.0), 1 mM

EDTA, 50 mM KCl, 2 mM DTT, 0.1 mg/ml fatty acid-free BSA, and 0.02% Tween-20.

Reactions were analyzed by 10% PAGE in TAE and fluorescence was detected using a

Typhoon TRIO scanner (GE).

List of probes:
NRE: 5'-(Fl)AUUGUAUAUA-3'
NREU3C: 5'-(Fl)AUUGCAUAUA-3'
Poly G: 5'-(Fl)AAGGGGGGGG-3'
Poly C: 5'-(Fl)CCCCCCCCCC-3'
Poly U: 5'-(Fl)UUUUUUUUUU-3'
Poly A: 5'-(Fl)AAAAAAAAAA-3'

2.2.12 PUF library selections

YBZ1 cells containing the NREU3C RNA expression plasmid were transformed with

the PUF domain library in pGAD-RC using the lithium acetate method according to Gietz

and Woods (2002) as described in section 2.2.6, yielding 6 x 10^5 primary transformants.

Cells were inoculated and amplified by overnight growth in SC media lacking leucine and

uracil at 30 °C with shaking (180 rpm). The cells were subsequently pelleted and washed in

50 ml TE and 1 x 10^7 CFU were plated on SC agar lacking leucine, uracil and histidine,

supplemented with 0.5 mM 3-amino triazole. Colonies were picked after three days and

the plasmids were isolated, transformed into DH10B and screened by PCR to identify the

PUF-encoding plasmid, which was sequenced and transformed into YBZ1 to analyze the

specificity of the mutant PUF domains, as described below.
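
As a rough check on library coverage, the number of primary transformants can be compared with the theoretical diversity of the library. The sketch below assumes that the two randomized codons (amino acids 1043 and 1047, section 2.2.1) each encode all 20 amino acids with equal frequency, which is an idealization; the function names are hypothetical.

```python
import math


def library_size(randomized_positions, residues_per_position=20):
    """Number of distinct protein variants in a fully randomized library."""
    return residues_per_position ** randomized_positions


def expected_coverage(transformants, variants):
    """Expected fraction of variants sampled at least once.

    Uses the Poisson approximation 1 - exp(-N / V), assuming every variant
    is equally represented in the library.
    """
    return -math.expm1(-transformants / variants)


variants = library_size(2)                  # two randomized positions
transformants = 6e5                         # primary transformants reported above
oversampling = transformants / variants     # fold coverage of the library
coverage = expected_coverage(transformants, variants)

print(f"{variants} variants, {oversampling:.0f}-fold oversampling, "
      f"expected coverage {coverage:.6f}")
```

Under these assumptions the transformant count exceeds the library diversity by several orders of magnitude, so essentially every variant is expected to be represented; any bias in the trimer phosphoramidite mixtures would reduce coverage of under-represented codons.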

2.2.13 Yeast three-hybrid growth assays

YBZ1 transformants containing PUF domain and RNA expression plasmids were

grown overnight in SC media lacking leucine and uracil at 30 °C with shaking (180 rpm).

Cells were pelleted and washed in SC media without amino acids, diluted to OD600 of 0.1

and replica spotted (5 µL) onto SC media lacking leucine and uracil (to test for cell health

and plasmid maintenance) and SC agar lacking leucine, uracil and histidine, supplemented

with 0.5 mM 3-amino triazole (to test for RNA-protein interactions).
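
The dilution to a fixed OD600 before spotting is a C1V1 = C2V2 calculation. A minimal Python sketch follows, assuming that OD600 scales linearly with cell density over the range measured. The function name and the 1 ml final volume are illustrative and are not part of the protocol.

```python
def dilution_volumes(od_culture, od_target=0.1, final_volume_ml=1.0):
    """Return (culture_ml, diluent_ml) needed to reach od_target.

    Assumes OD600 is proportional to cell density (C1 * V1 = C2 * V2).
    """
    if od_culture < od_target:
        raise ValueError("culture is already more dilute than the target OD600")
    culture_ml = final_volume_ml * od_target / od_culture
    diluent_ml = final_volume_ml - culture_ml
    return culture_ml, diluent_ml


# Example: an overnight culture washed and read at OD600 = 2.0 (hypothetical value)
culture_ml, diluent_ml = dilution_volumes(2.0)
print(f"{culture_ml * 1000:.0f} µl culture + {diluent_ml * 1000:.0f} µl medium")
```

Dense overnight cultures usually fall outside the linear range of the spectrophotometer, so in practice the culture would be read after a pre-dilution and the reading corrected accordingly.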

2.2.14 β-galactosidase assays

YBZ1 transformants containing PUF domain and RNA expression plasmids were

grown overnight in SC media lacking leucine and uracil at 30°C with shaking (180 rpm). The

culture was diluted to OD600 of 0.1 and mixed with an equal volume of Beta-Glo reagent

(Promega), incubated for 1 h at room temperature and luminescence was detected using

a FLUOstar OPTIMA (BMG Labtech).

2.2.15 Cell culture

143B osteosarcoma cells were cultured at 37 °C under humidified 95% air/5% CO2

in Dulbecco’s modified Eagle’s medium (DMEM, Invitrogen) containing glucose (4.5 g l−1),

1 mM pyruvate, 2 mM glutamine, penicillin (100 U ml−1), streptomycin sulfate

(100 μg ml−1) and 10% fetal bovine serum (FBS).

2.2.16 Transfections

143B cells were plated at 60% confluence in six-well plates or 10 cm dishes and

transfected with mammalian expression plasmids in OptiMEM media (Invitrogen). 125 nM

(for 6-well plates) or 145 nM (for 10 cm dishes) of cPPRcaps poly(A)/NRE or control EYFP,

were transfected using Lipofectamine 2000 (Invitrogen). 158 ng/cm2 of cPPRcaps poly

(A)/NRE or control EYFP plasmid DNA was transfected using Fugene HD (Roche). Cell

incubations were carried out for 3 days following transfection. Transfections for 9 days

were reseeded and re-transfected every 3 days.

2.2.17 Northern blotting

RNA was isolated from 143B cells using the Qiagen miRNeasy kit according to the

manufacturer’s instructions. RNA (5 μg) was resolved on 1.2% agarose formaldehyde gels,

then transferred to 0.45 μm Hybond-N+ nitrocellulose membrane (GE Lifesciences) and

hybridized with biotinylated oligonucleotide probes specific to mitochondrial mRNAs and

rRNAs. The hybridizations were carried out overnight at 50 °C in 5× SSC, 20 mM Na2HPO4,

7% SDS and 100 μg ml−1 heparin, followed by washing. The signal was detected using

either a streptavidin-linked horseradish peroxidase or streptavidin-linked infrared

antibody (diluted 1:2000 in 3× SSC, 5% SDS, 25 mM Na2HPO4, pH 7.5) by enhanced

chemiluminescence (GE Lifesciences) or using an Odyssey Infrared Imaging System.

2.2.18 Mitochondrial protein synthesis

143B cells were grown in six-well plates until 60% confluent, transfected and 3

days later de novo protein synthesis was analyzed. For a 6 day transfection, the initially

transfected cells were re-transfected at the end of the 3 day time point and allowed to grow

for an additional 3 days. The growth medium was replaced with methionine and cysteine

free medium containing 10% dialysed FBS for 30 min before addition of 100 μg ml−1

emetine for 5 min. Next, 200 μCi Expres35S Protein Labeling Mix [35S] (14 mCi, Perkin–

Elmer) was added and incubated at 37 °C for 1 h, then washed in PBS and centrifuged. The

cells were suspended in PBS and 20 μg of proteins were separated on 12.5% SDS–PAGE

and the radiolabeled proteins were visualized on film.

2.3 Graphic maps of Plasmids

The following graphic maps are representations of plasmids used in the project. Not

all plasmids were included in this section due to redundancy.

2.3.1 pIIIA/MS2-2 plasmid

pIIIA/MS2-2 is an RNA expression plasmid used in yeast three-hybrid experiments, where

annealed oligonucleotides were cloned between the SmaI and SphI sites.

2.3.2 pTYB3-EYFP plasmid

Other inserts cloned between the NcoI and XhoI restriction sites in pTYB3 were PUF1,

PUM1 and the C-binding mutants.

2.3.3 pETM30-EYFP plasmid

Other inserts cloned between the NcoI and XhoI restriction sites in pETM30 were PUF1,

PUM1 and the C-binding mutants.

2.3.4 pTYB3-cPPRcaps poly(A) plasmid

Other inserts cloned between the NcoI and SapI restriction sites in pTYB3 were all the

cPPRcaps - poly [A, G, U/C, C(NT), C(NS), G(GD), G(SD), NRE].

2.3.5 pcDNA3-OTC cPPRcaps poly(A)-CTAP plasmid

Other inserts cloned between the KpnI and XhoI restriction sites in pcDNA3 were the

cPPRcaps-NRE and EYFP.

CHAPTER 3

Engineering Cytosine-binding PUF repeats

Designer DNA-binding proteins that can silence, activate or modify a target gene

have already been developed based on various classical zinc finger (ZF) domains (Sera,

2009; Cathomen and Joung, 2008; Carroll, 2008; Camenisch et al., 2008). Some of them

are currently being tested in clinical trials as potential therapeutic agents (Tebas and Stein,

2009). Present methods for altering gene expression via RNA mostly rely on RNA

interference (RNAi) methods (Liu and Paroo, 2010; Perrimon et al., 2010; Vaishnaw et al.,

2010). However, there are also engineered RNA-binding proteins (RBPs) that have been

used to modulate mRNA function. Examples include (i) fusion proteins containing a green

fluorescent protein and the RNA-binding MS2 coat protein, which have been used to monitor the

localization of ASH1 mRNA (Bertrand et al., 1998), and (ii) fusions of the RNA-binding domains

(RBDs) of iron regulatory protein with the eukaryotic translation initiation factor, eIF4G,

which can enhance translation of a reporter gene (De Gregorio et al., 1999). However, these

proteins have not been able to function on endogenous mRNAs because the reporters

were linked to RNA-binding proteins with well-characterized recognition sites that were

incorporated into target mRNAs of interest (Mackay et al., 2011). Therefore, the ability to

engineer designer RBPs would offer considerable flexibility for controlling RNA function

and would enable endogenous RNAs to be targeted.

One of the best candidates for engineering is the PUF (Pumilio and FBF

homology) protein family. Initially, designing custom PUF proteins was hindered by the lack of

structural knowledge and poor understanding of the rules governing their RNA-protein

recognition. The crystal structures of PUF proteins have shown that they are generally

composed of eight 36 amino acid repeats, with each repeat binding to a single nucleotide

in their RNA targets (Wang et al., 2002; Edwards et al., 2001; Zamore et al., 1997; Lu and

Hall, 2011; Wang et al., 2009; Zhu et al., 2009). Amino acids at positions 12 and 16 of the

PUF repeat bind each RNA base via hydrogen bonding or van der Waals contacts with the

Watson-Crick edge, while the amino acid at position 13 makes stacking interactions. PUFs

are also good designer RBP candidates because their RNA

recognition is base-specific (Wang et al., 2002): cysteine and glutamine bind

adenine; asparagine and glutamine bind uracil; and serine and glutamate bind guanine.
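
The recognition code described above can be written as a small lookup table. The following Python sketch is illustrative only: the dictionary holds just the three residue pairs named in the text, and the function names are hypothetical.

```python
# Residues at positions 12 and 16 of a PUF repeat -> RNA base recognized,
# as listed in the text above (Wang et al., 2002).
PUF_CODE = {
    ("C", "Q"): "A",  # cysteine + glutamine bind adenine
    ("N", "Q"): "U",  # asparagine + glutamine bind uracil
    ("S", "E"): "G",  # serine + glutamate bind guanine
}


def base_for_repeat(pos12, pos16):
    """Return the base recognized by a repeat, or None if the pair is not in the table."""
    return PUF_CODE.get((pos12.upper(), pos16.upper()))


def residues_for_base(base):
    """Return the (position 12, position 16) residue pair for a base, or None."""
    for pair, recognized in PUF_CODE.items():
        if recognized == base.upper():
            return pair
    return None


print(base_for_repeat("N", "Q"))  # U
print(residues_for_base("G"))     # ('S', 'E')
print(residues_for_base("C"))     # None: no cytosine pair in this table
```

The `None` returned for cytosine reflects the gap in the natural code that this chapter sets out to fill.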

A major attribute of designer RBPs is the ability to fuse the protein to useful

effector domains. It is not surprising that this has been done with the PUF proteins. Ozawa

et al. (2007) successfully created two PUF fusion proteins that contained either the N- or

C-terminal portion of a fluorescent protein. The two PUM1 PUF domains were engineered

to recognize specific sequences in the mitochondrial ND6 transcript by introducing either

three or seven point mutations (Ozawa et al., 2007). The aim was that,

when mammalian cells were transfected with the two plasmids, fluorescence would occur

only when both fusion proteins bound to their target sequences in the mitochondrial ND6

mRNA, allowing the fluorescent protein to be reconstituted (Ozawa et al., 2007). Using

this approach, both the diffusion and localization properties of the ND6 mRNA transcript

were ascertained under normal and stress conditions.

In a different application, Wang et al. (2009) fused engineered PUF domains to the

glycine-rich domain of human heterogeneous nuclear ribonucleoprotein (hnRNP) A1 or

the arginine- and serine-rich domain of ASF (also called SF2 or SRSF1) to create targetable

RNA-splicing repressors and enhancers, respectively. This was achieved by introducing five

point mutations in the PUM1 PUF domain so as to recognize an 8 nucleotide sequence in

an exon extension region of the BCL2L1 RNA (also known as BCLX; Wang et al., 2009). The

long transcripts of BCL2L1 act to inhibit apoptosis in cancer cells. When several cancer cell

lines were treated with the engineered splicing factor, the predominant splice form shifted to the

shortened BCLX transcript, which is pro-apoptotic, and

the cells became sensitive to anticancer drugs. These studies provide an exciting

preview of applications for engineered RBPs.

However, the use of PUF domains as tools has been hampered because naturally

occurring residues that recognize cytosine have not been found. This limits the potential

target sites for engineered PUFs because even in RNAs encoded by guanine- and

cytosine-poor genomes, the majority of octamer sequences will contain at least one cytosine. If

one were to target a small RNA or a defined region of a larger RNA, it could be impossible to

engineer a PUF protein to bind these RNAs. Therefore, the identification of a combination

of amino acid side chains in a PUM1 repeat that can recognize a cytosine is necessary to

expand the use of designer PUF domains directed toward any RNA sequence. To

overcome this limitation, we used directed evolution to select for PUF repeat variants that

are able to specifically recognize cytosine. This process entails generating a library of

mutants and testing it, via screening or selection, for the presence of mutants

possessing the desired property, with the aim of eliminating the original and non-functional

variants and identifying and enriching mutants with the new desired function.
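
The claim that most octamers contain at least one cytosine can be checked with a short calculation. The sketch below assumes, as a simplification, that each position in an RNA is drawn independently with a fixed cytosine frequency. The function name and the example frequencies are illustrative and are not taken from any particular genome.

```python
def fraction_with_cytosine(p_c, length=8):
    """Expected fraction of sequences of the given length with at least one C.

    Assumes independent positions, each being C with probability p_c.
    """
    if not 0.0 <= p_c <= 1.0:
        raise ValueError("p_c must be between 0 and 1")
    return 1.0 - (1.0 - p_c) ** length


for p_c in (0.25, 0.15, 0.10):
    print(f"C frequency {p_c:.2f}: {fraction_with_cytosine(p_c):.1%} of octamers contain a C")
```

Under this model about 90% of octamers contain a cytosine when the four bases are equally frequent, and more than half still do when cytosine makes up only 10% of the sequence.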

3.1 Methods to study RNA-Protein interactions

In general, the most common methods used to study RNA-protein interactions

involve cell extracts or purified proteins and in vitro transcribed RNA. Specific binding can

be studied by mobility shifts of labeled or fluorescently-probed RNA in agarose or

polyacrylamide gels (Thomson et al., 1999), filter binding (Merrick and Sonenberg, 1997),

or chromatography with matrix-bound RNAs (Allerson et al., 2003). In a similar method

known as a “north-western”, proteins are separated on polyacrylamide gels, transferred

to a membrane and probed with labeled RNA sequences of interest (Monshausen et al.,

2001). As purified proteins and RNAs can be used in these methods, they can discriminate

between direct and indirect binding. These assays all suffer from the same major

drawback in that the conditions of association vary in stringency depending on the

composition of the reaction buffers. Immunoprecipitation of RNA-binding proteins

provides a more physiological way to study RNA-protein interactions. Subsequent

northern (Takizawa and Vale, 2000) or microarray analysis (Brown et al., 2001) of

antibody-bound fractions can determine the RNAs that were associated with a given

protein prior to cell lysis. As complexes are allowed to form in cells before analysis, fewer

false interactions are likely to occur. Appending an epitope or affinity tag by genetic

manipulation enables the collection of RNA-protein complexes containing proteins for

which antibodies are not available (Takizawa and Vale, 2000). However, it is still possible

for non-specific interactions to occur after cell lysis and these methods cannot

discriminate between direct and indirect binding. In a complementary approach,

identification of proteins associated with RNA of interest can be achieved by tagging the

RNA itself. RNAs are manipulated to contain a short aptamer sequence that binds directly

to an affinity matrix (Srisawat and Engelke, 2001; Srisawat et al., 2001) or a heterologous

RNA-binding protein that is itself tagged or can be immunoprecipitated (Watkins et al.,

2000). Co-purifying proteins can be identified by immunoblotting or mass spectrometry. In

addition to post-lysis artifacts, these methods require huge numbers of cells as starting

material and hence any observations represent only the average of a population.

To date very few methods for studying RNA-protein interactions in living cells have

been reported. The yeast three-hybrid system (Putz et al., 1996; SenGupta et al., 1996) is

the most widely used of these approaches and has been especially powerful in the

isolation of cDNAs for proteins that bind RNA sequences of interest (Jan et al., 1999;

Martin et al., 1997; Park et al., 1999; Zhang et al., 1997). In this system, a DNA-binding

domain is tethered to an RNA sequence of interest by the bacteriophage MS2 coat protein

via a specific aptamer; while a transcriptional activator is fused to an RNA-binding protein

of interest (SenGupta et al., 1996). If an interaction between the two molecules of interest

occurs, transcription of reporter genes is activated. The reporter genes enable screening

of yeast on selective growth media and semi-quantitative measurement of interactions.

The yeast three-hybrid system enables screening of cDNA libraries to identify new RNA-

binding proteins (Zhang et al., 1997), screening of RNA sequences to determine unknown

targets for a given RNA-binding protein (Sengupta et al., 1999), and delineation of

sequences important for known RNA-protein interactions (Lee et al., 1999). Because the

yeast three-hybrid system is an in vivo system which is particularly well suited for library

screening, I sought to adapt it to expand the RNA recognition code of PUFs.

3.2 Genetic selection of PUF library in yeast three-hybrid system

To identify PUF repeat variants that are able to specifically recognize cytosine, the

yeast three-hybrid system was used to link the interaction between a PUF domain and its

RNA target to a life-death selection in S. cerevisiae (Figure 3.1; SenGupta et al., 1996). In

this system, a fusion between the lexA DNA-binding domain and the MS2 coat protein

effectively tethers a hybrid RNA that contains MS2 recognition sites and a PUF RNA target

of interest upstream of HIS3 and lacZ reporter genes in the yeast genome. If a PUF-RNA

complex is formed, transcription of the HIS3 gene is activated to allow survival on media

lacking histidine. In addition, the RNA-protein interaction can be quantified by measuring

the activity of β-galactosidase expressed from the lacZ reporter gene (SenGupta et al.,

1996).

Figure 3.1: The yeast three-hybrid system. Diagrammatic representation of the yeast three-hybrid system used in this study. The LexA DNA operator (LexAop), LexA DNA-binding domain (LexA), bacteriophage MS2 coat protein (MS2), Nanos response element RNA (NRE), PUF protein (PUF), transcription activation domain of the yeast Gal4 transcription factor (AD), HIS3 gene (HIS3) and the lacZ reporter gene (lacZ) are shown.

3.3 Yeast three-hybrid sensitivity testing for engineering individual PUF repeats

We investigated repeat 6 of the human PUF protein, PUM1, which inherently binds

uracil in its RNA target (the nanos-response element, NRE) with high affinity, as

observed in its crystal structure (Wang et al., 2001). In order to test the sensitivity of the

yeast three-hybrid system, a preliminary experiment was conducted. This entailed

mutating the amino acids at positions 12 and 16 of repeat 6 by PCR with primers specifying

cysteine or serine at position 12 and glutamine or glutamate at position 16,

with the aim of altering the specificity of repeat 6 to bind adenine or guanine,

respectively (Figure 3.2).

Figure 3.2: Amino acids that determine PUF repeat binding. Amino acids of repeats 5, 6 and 7 of PUM1 bind adenine, uracil and guanine, respectively. Cysteine (C) and glutamine (Q) bind adenine; asparagine (N) and glutamine (Q) bind uracil; and serine (S) and glutamate (E) bind guanine. Amino acids at positions 12 and 16 of repeat 6 were mutated via PCR to resemble adenine- and guanine-binding repeats (exemplified by repeats 5 and 7).

The RNA expression plasmids used for this experiment were constructed by sub-

cloning pairs of annealed oligonucleotides into the multiple cloning site of the pIIIA/MS2-2

plasmid matching the following RNA sequences. Nanos response element (NRE) is the RNA

target for PUM1 (PUF recognition sequences in bold, site specific mutations underlined):

NRE: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG-3'

NREU3A: 5'-CCGGCUAGCAAUUGAAUAUAUUAAUUUAAUAAAGCAUG-3'

NREU3C: 5'-CCGGCUAGCAAUUGCAUAUAUUAAUUUAAUAAAGCAUG-3'

NREU3G: 5'-CCGGCUAGCAAUUGGAUAUAUUAAUUUAAUAAAGCAUG-3'
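
The relationship between the NRE and its three variants can be reproduced programmatically. The sketch below assumes that the 8-nucleotide PUF recognition site is UGUAUAUA and that "U3" refers to the third nucleotide of that site. Both assumptions are consistent with the sequences listed above, and the function name is hypothetical.

```python
NRE = "CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG"
CORE_SITE = "UGUAUAUA"  # assumed 8-nt PUF recognition site within the NRE


def mutate_core_position(rna, position, new_base, core=CORE_SITE):
    """Replace the nucleotide at a 1-based position of the core site within rna."""
    start = rna.find(core)
    if start < 0:
        raise ValueError("core site not found in RNA")
    if not 1 <= position <= len(core):
        raise ValueError("position lies outside the core site")
    index = start + position - 1
    return rna[:index] + new_base + rna[index + 1:]


# Build the three U3 variants used as non-cognate and cytosine targets
variants = {f"NREU3{base}": mutate_core_position(NRE, 3, base) for base in "ACG"}
for name, sequence in variants.items():
    print(f"{name}: 5'-{sequence}-3'")
```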

In order to generate a PUF domain that was fused to a Gal4p activation domain, a

synthetic gene encoding amino acids 828 to 1176 of the human PUM1 protein was sub-

cloned into pJC72 (Rackham and Chin, 2005). This plasmid was later used as a template for

library construction by enzymatic inverse PCR (Section 3.5; Rackham and Chin, 2005). The

yeast expression plasmid was constructed by sub-cloning the wild-type PUF gene and the

two repeat 6 variants from pJC72 into the yeast expression plasmid pGAD-RC (Ito et al.,

2000). YBZ1 was transformed with the combination of four RNA expression plasmids and

the three pGAD-RC constructs using the lithium acetate method according to Gietz and

Woods (2002). Enhanced yellow fluorescent protein (EYFP) was used as a negative control.

The transformants were plated on SC agar lacking leucine, uracil and histidine,

supplemented with 0.5 mM 3-amino triazole.

3-Aminotriazole (3-AT) is a competitive inhibitor of the product of the HIS3 gene,

His3p. The level of resistance to 3-AT is used as a measure of the strength of RNA–protein

interaction because cells containing more His3p can survive at higher concentrations of 3-

AT. YBZ1 transformants containing both the PUF domain and RNA expression plasmids

were grown overnight in SC media lacking leucine and uracil, washed in SC media without

amino acids, diluted to OD600 of 0.1 and replica spotted onto SC media lacking leucine and

uracil (to test for cell health and plasmid maintenance) and SC agar lacking leucine, uracil

and histidine, supplemented with 0.5 mM 3-amino triazole (to test for RNA-protein

interactions).

3.3.1 Yeast three-hybrid system is sensitive for engineering PUFs

Results of the sensitivity test showed that when the amino acid at position 12 was

mutated to cysteine to resemble the adenine-binding repeat five of PUM1 (CQ), cells

survived on selective media only when the target RNA had an adenine at the position in

the RNA bound by repeat 6 (Figure 3.3). Furthermore, when we transplanted the guanine-

recognizing amino acids, serine and glutamate, from repeat seven (SE) into repeat six, the

cells survived on selective media when the target RNA had a guanine at the

position in the RNA bound by repeat 6. Results from this experiment indicate that the

system provides sufficient sensitivity to enable the engineering of individual PUF repeats.

Figure 3.3: Yeast three-hybrid sensitivity test for engineering individual PUF repeats. Specificity of the selected clones was determined by survival on medium lacking histidine and containing 0.5 mM 3-aminotriazole.

3.4 Library screening for cytosine-binding PUF

Given that the system was sensitive enough for engineering individual PUF repeats,

we proceeded to search for amino acids that could specifically recognize cytosine in

the context of PUFs. A library based on the PUM1 PUF domain was synthesized where

positions 12 and 16 of repeat six were randomized to encode all possible amino acids. This

was achieved using primers where the codons corresponding to amino acids 1043 and

1047 (located in repeat 6 of PUM1) were encoded by mixtures of trimer phosphoramidites

encoding all 20 amino acids. This library of mutant PUF domains was then sub-cloned from

the pJC72 plasmid into the yeast expression plasmid pGAD-RC (Ito et al., 2000). YBZ1

containing the NREU3C RNA expression plasmid was transformed with the PUF domain

library in pGAD-RC and plated on SC agar lacking leucine, uracil and histidine,

supplemented with 0.5 mM 3-amino triazole.
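
The size of this library and how thoroughly it was sampled can be estimated with a short calculation. The Python sketch below assumes that every amino acid combination is equally represented and that each transformant carries one randomly drawn variant. The transformant number is the 6 × 10⁵ primary transformants reported in Section 2.2.12, and the function names are illustrative.

```python
import math


def library_size(n_positions=2, n_amino_acids=20):
    """Number of distinct protein variants when each position can be any amino acid."""
    return n_amino_acids ** n_positions


def expected_coverage(n_variants, n_transformants):
    """Expected fraction of variants sampled at least once under uniform sampling."""
    # Probability that a given variant is missed by every transformant
    p_missing = math.exp(n_transformants * math.log1p(-1.0 / n_variants))
    return 1.0 - p_missing


variants = library_size()      # positions 12 and 16, 20 amino acids each
transformants = 600_000        # primary transformants (Section 2.2.12)
print(variants)                                    # 400
print(transformants / variants)                    # 1500.0-fold oversampling
print(expected_coverage(variants, transformants))  # effectively 1.0
```

Under these assumptions every one of the 400 combinations is expected to be present many times over, so a combination that fails to appear among the survivors is more likely to have been non-functional than unsampled.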

Unlike in the sensitivity testing, colonies that survived on media lacking histidine after

three days were picked and the plasmids were isolated and transformed into DH10B.

Transformants were screened by PCR to identify the PUF-encoding plasmid, which was sequenced and

transformed into YBZ1 to analyze the specificity of the mutant PUF domains. YBZ1

transformants containing both the PUF domain and RNA expression plasmids were grown

overnight in SC media lacking leucine and uracil, washed in SC media without amino acids,

diluted to OD600 of 0.1 and replica spotted onto both SC media lacking leucine and uracil

and SC agar lacking leucine, uracil and histidine, supplemented with 0.5 mM 3-amino

triazole.

3.4.1 Five PUF mutants interact with cytosine

Five unique PUF mutants that selectively interacted with RNAs containing a

cytosine but not adenine, guanine or uracil were identified (Figure 3.4). Notably,

all five variants had an arginine at position 16. At

position 12, four of the variants had amino acids with polar uncharged side chains

(glycine, serine, threonine and cysteine), while the fifth had

alanine, which is classified as a non-polar hydrophobic amino acid. What these residues

have in common is a small or nucleophilic side chain.

Figure 3.4: Selection of PUF repeats that can specifically recognize cytosine. (Left) Sequences of the RNA-binding regions of PUM1 repeats. The key hydrogen-bonding residues at positions 12 and 16 were randomised and combinations that could recognize cytosine were selected from the library using the yeast three-hybrid system. Specificity of the selected clones was determined by survival on media lacking histidine and containing 0.5 mM 3-aminotriazole (a His3p competitive inhibitor). (Right) Cytosine-binding clones as determined by survival on medium lacking leucine, uracil and histidine, and containing 0.5 mM 3-aminotriazole.

In addition, β-galactosidase activity expressed from the lacZ reporter gene was

assayed in order to quantify the strength of the RNA-protein interaction. We confirmed

that the A, G and U repeats bind their expected bases and that all five of the newly found

C-binding mutants specifically interacted with a target RNA containing cytosine. The fold

increase in β-galactosidase activity for the selected mutants in the presence of

cytosine-containing RNA was not as high as that for the wild type PUF and its cognate NRE RNA;

however, some of the cytosine-binding mutants in combination with their target RNAs

generated activities similar to the guanine-binding mutant and higher than the adenine-

binding mutant, in the presence of their cognate RNAs.

Figure 3.5: Characterization of PUF repeats that can specifically recognize cytosine. Specificity of the selected clones was quantified using β-galactosidase assays to examine activation of a lacZ reporter gene. We used the enhanced yellow fluorescent protein (EYFP) as a control to show that the activation of transcription was dependent on specific RNA-protein interactions in our experiments. Data are mean ± SEM from six independent experiments. *p < 0.01 of cognate to non-cognate RNA-protein complexes by a 2-tailed Student’s t test.
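
The summary statistics named in the legend (mean ± SEM, fold increase and a two-sample Student's t statistic) can be computed as in the sketch below. The luminescence readings are invented placeholders and are not data from this study. The pooled-variance form of the t statistic is an assumption, since the legend does not state which variant was used. Converting the statistic to a p-value requires the t distribution and is left to a statistics package.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def fold_change(cognate, non_cognate):
    """Ratio of mean reporter activity with cognate versus non-cognate RNA."""
    return statistics.mean(cognate) / statistics.mean(non_cognate)


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Placeholder readings from six hypothetical experiments (arbitrary units)
cognate = [10, 12, 11, 13, 12, 14]
non_cognate = [1, 2, 1, 2, 1, 2]

print(mean_sem(cognate))
print(fold_change(cognate, non_cognate))
print(student_t(cognate, non_cognate))
```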

3.5 In vitro analysis of PUF-NRE interaction

To further confirm the observations from the previous section, we investigated

the specificity of the interactions between wild type and mutant PUF proteins in vitro

using RNA electrophoretic mobility shift assays (RNA EMSA). To do this, I

overexpressed the wild type PUF and two cytosine-recognizing PUF mutants (GR, with glycine

at position 12 and arginine at position 16, and CR, with cysteine at position 12 and

arginine at position 16) in E. coli and purified them to homogeneity.

3.5.1 Purifying PUF proteins

The PUF domains were first sub-cloned from the pGAD-RC yeast expression plasmid

into the pETM30 protein expression plasmid. This system expresses the PUF as a fusion to

GST and a His6 tag in cells. An initial small scale test induction was conducted to

determine the optimum conditions for soluble PUM1 protein production. pETM30-PUM1

was transformed into Rosetta 2 and BL21 cells and overnight cultures were grown in

LB in the presence of kanamycin (BL21 and Rosetta 2) and chloramphenicol (Rosetta 2

only). A five hour induction at 37 °C and an overnight induction at room temperature with

1 mM IPTG were conducted.

Figure 3.6: Test induction of pETM30-PUM1 plasmid in BL21. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate (O/N). Lane 3: Induced whole cell lysate (5 hrs). Lane 4: Induced whole cell lysate (O/N). Lane 5: Uninduced insoluble lysate (O/N). Lane 6: Induced insoluble lysate (5 hrs). Lane 7: Induced insoluble lysate (O/N). Lane 8: Uninduced soluble lysate (O/N). Lane 9: Induced soluble lysate (5 hrs). Lane 10: Induced soluble lysate (O/N).

Figure 3.7: Test induction of pETM30-PUM1 plasmid in Rosetta 2. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate (O/N). Lane 3: Induced whole cell lysate (5 hrs). Lane 4: Induced whole cell lysate (O/N). Lane 5: Uninduced insoluble lysate (O/N). Lane 6: Induced insoluble lysate (5 hrs). Lane 7: Induced insoluble lysate (O/N). Lane 8: Uninduced soluble lysate (O/N). Lane 9: Induced soluble lysate (5 hrs). Lane 10: Induced soluble lysate (O/N).

The first set of inductions showed that the PUM1 protein was successfully

produced in both conditions, with more protein being produced following overnight

induction (Figures 3.6 and 3.7). However, the proteins produced were highly insoluble

under those conditions. I proceeded to determine whether the 1 mM IPTG concentration was too

high by repeating the inductions using 0.5 mM and 0.05 mM IPTG, given that at high IPTG

concentrations the cells may have been under too much stress. A five hour induction at

37 °C and an overnight induction at room temperature with 1 mM IPTG were conducted.

Figure 3.8: Test induction of pETM30-PUM1 plasmid with 0.5 mM IPTG. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate (O/N). Lane 3: Induced soluble lysate (3 hrs). Lane 4: Induced soluble lysate (O/N). Lane 5: Uninduced insoluble lysate (O/N). Lane 6: Induced insoluble lysate (3 hrs). Lane 7: Induced insoluble lysate (O/N).

Figure 3.9: Test induction of pETM30-PUM1 plasmid with 0.05 mM IPTG. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate (O/N). Lane 3: Induced soluble lysate (3 hrs). Lane 4: Induced soluble lysate (O/N). Lane 5: Uninduced insoluble lysate (O/N). Lane 6: Induced insoluble lysate (3 hrs). Lane 7: Induced insoluble lysate (O/N).

Similar results to the induction at 1 mM IPTG were obtained when the cells were

induced with 0.5 mM and 0.05 mM IPTG (Figures 3.8 and 3.9). PUM1 production was

successful; however, the proteins were insoluble. I decided to use a different expression

plasmid, pTYB3, as this was the same plasmid used by Wang et al. (2001) when they

successfully purified the PUM1 protein for crystallography. The PUF domains were first

sub-cloned from the pGAD-RC yeast expression plasmid into the pTYB3 protein expression

plasmid. This system expresses the PUF as a fusion to an intein and chitin-binding domain.

pTYB3-PUM1 was transformed into Rosetta 2 cells and overnight cultures were grown

in LB in the presence of ampicillin and kanamycin. A five hour induction at room

temperature with 1 mM IPTG was conducted.

Figure 3.10: Test induction of pTYB3-PUM1 plasmid in Rosetta 2. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Uninduced soluble lysate. Lane 5: Induced soluble lysate. Lane 6: Uninduced insoluble lysate. Lane 7: Induced insoluble lysate.

Although the induction successfully led to the production of PUM1, all the proteins

produced were insoluble (Figure 3.10). As a final attempt, a test induction was conducted

with ER2566 cells, which is the host strain recommended for the expression of genes

cloned into the pTYB3 vector. pTYB3-PUM1 was transformed into ER2566 cells and

overnight cultures were grown in LB in the presence of ampicillin. A five hour and an

overnight induction at room temperature with 1 mM IPTG were conducted.

Figure 3.11: Test induction of pTYB3-PUM1 plasmid in ER2566. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate (O/N). Lane 3: Induced whole cell lysate (5 hrs). Lane 4: Induced whole cell lysate (O/N). Lane 5: Uninduced soluble lysate (O/N). Lane 6: Induced soluble lysate (5 hrs). Lane 7: Induced soluble lysate (O/N). Lane 8: Uninduced insoluble lysate (O/N). Lane 9: Induced insoluble lysate (3 hrs). Lane 10: Induced insoluble lysate (O/N).

The inductions showed that although the majority of PUM1 proteins produced

were insoluble, there was still a small portion of soluble PUM1 protein (as seen in lanes 6

and 7; Figure 3.11), with a five hour induction producing the same amount of soluble

protein as an overnight induction. Note that PUM1 protein was visible in both lanes 2

and 8 because with the pET system, the absence of glucose in media may lead to partial

induction as catabolite repression is not achieved (Novy and Morris, 2001). Therefore, it

was decided that future inductions and purifications with the pTYB3 plasmid were to be

conducted at room temperature for five hours with 1 mM IPTG in ER2566. I proceeded with

the large scale (2 L) purifications of PUM1 and two cytosine-binding mutants to enable in

vitro examination via RNA-electrophoretic mobility shift assays (RNA EMSAs). The proteins

were purified as per Section 2.2.7. In all cases, soluble protein was produced, free of the

intein and chitin-binding domain (Figures 3.12-3.14).

Figure 3.12: Purification of PUM1 protein. Samples were loaded on a 10% Tris-glycine gel and run for 15 min at 80 V, then 1 hr at 130 V. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Soluble lysate after sonication. Lane 5: Insoluble protein after 3 day cleavage. Lane 6: Purified PUM1 protein. Lane 7: Chitin beads.

Figure 3.13: Purification of C-binding (GR) PUM1 protein. Samples were loaded on a 10% Tris-glycine gel and run for 15 min at 80 V, then 1 hr at 130 V. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Soluble lysate after sonication. Lane 5: Insoluble protein after 3 day cleavage. Lane 6: Purified C-binding (GR) protein. Lane 7: Chitin beads.

Figure 3.14: Purification of C-binding (CR) PUM1 protein. Samples were loaded on a 10% Tris-glycine gel and run for 15 min at 80 V, then 1 hr at 130 V. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Soluble lysate after sonication. Lane 5: Insoluble protein after 3 day cleavage. Lane 6: Purified C-binding (CR) protein. Lane 7: Chitin beads.

The cleaved PUF proteins were collected and further purified by an ÄKTA-Explorer

system (GE) using a Superdex 200 10/300 column (GE). The final purified proteins were

separated by SDS-PAGE and visualized by Coomassie Brilliant Blue staining (Figure 3.15).

Figure 3.15: Purified PUM1 and the two C-binding PUF proteins. (Left) Samples were loaded on a 10% Tris-glycine gel and run for 15 min at 80 V, then 1 hr at 130 V. (Right) Gel filtration profile of cleaved PUM1 proteins purified on the Superdex 200 10/300 column by an ÄKTA-Explorer system. Purified PUM1 was collected from the highest peak (region between the dashed lines).

3.5.2 RNA electrophoretic mobility shift assay of PUF proteins

To confirm the binding specificity of the newly identified cytosine-binding PUF proteins and to determine the strength of their interactions with the cytosine-containing RNA target, an in vitro RNA electrophoretic mobility shift assay (RNA EMSA) was conducted. The purified PUF proteins were incubated with fluorescein-labeled RNA oligonucleotides in binding buffer for one hour; the reactions were subsequently analyzed by 10% PAGE in TAE and fluorescence was detected using a Typhoon TRIO scanner. The two fluorescein-labeled RNA oligonucleotides were NRE [5'-(Fl) AUUGUAUAUA-3'] and NREU3C [5'-(Fl) AUUGCAUAUA-3']. The former is the native RNA target of the PUM1 protein, while the latter is the target for the cytosine-binding PUFs.

Figure 3.16. Specific recognition of cytosine in vitro. (a) Selected PUF domains are specific for cytosine containing RNAs, determined by RNA electrophoretic mobility shift assays. Wild type (NQ) and mutant (CR and GR) PUF proteins were tested against uracil (NRE) or cytosine (U3C) containing RNA probes. (b) The percentage of RNA bound by varying concentrations of each protein.

The RNA EMSAs showed a striking shift in specificity between the mutant PUF domains with their cytosine-containing target RNA and the wild-type PUF with its cognate NRE target (Figure 3.16). The first RNA EMSA shows wild-type PUM1 binding strongly to its NRE target, with binding observed at 0.01 µM PUM1, and markedly less strongly to the cytosine-containing NREU3C RNA. In general, increased shifting was observed at higher protein concentrations. Comparing the RNA EMSAs of PUM1 and the C-binding CR mutant with both RNA targets, the affinity of PUM1 for the NRE was comparable to the affinity of the CR mutant for its NREU3C target, as binding was achieved at 0.01 µM CR protein. The GR mutant also bound its cognate cytosine-containing RNA, albeit with lower affinity. The CR protein showed a much higher level of non-specific binding to the NRE probe than the GR mutant; however, it still preferentially bound NREU3C. This confirms previous observations that engineered PUF proteins do not always bind with the same affinities as the wild-type proteins (Cheong and Hall, 2006), indicating that future applications will depend not only on binding preference but also on affinity for the target RNAs.
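The kind of quantification summarized in Figure 3.16b can be illustrated with a minimal Python sketch. It assumes that the percentage of RNA bound is calculated from background-corrected band intensities as bound / (bound + free), and it estimates the protein concentration giving half-maximal binding by interpolating on a log concentration scale. The function names and the band intensities are hypothetical illustration values, not measurements from this work, and the interpolation is a simple stand-in for a proper binding-curve fit.

```python
import math


def percent_bound(bound, free):
    """Percentage of probe shifted, from background-corrected band intensities."""
    total = bound + free
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * bound / total


def apparent_half_max(concs_um, percents):
    """Concentration (uM) at which 50% of the probe is bound.

    Uses linear interpolation on log10(concentration) between the two
    titration points that bracket 50%. Returns None if 50% is never reached.
    """
    points = list(zip(concs_um, percents))
    for (c0, p0), (c1, p1) in zip(points, points[1:]):
        if p0 < 50.0 <= p1:
            t = (50.0 - p0) / (p1 - p0)
            log_c = math.log10(c0) + t * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    return None


# Hypothetical titration: protein concentrations in uM and (bound, free) intensities.
example_concs = [0.01, 0.1, 1.0]
example_bands = [(10, 90), (40, 60), (80, 20)]
example_percents = [percent_bound(b, f) for b, f in example_bands]
example_half_max = apparent_half_max(example_concs, example_percents)
```

Comparing such half-maximal concentrations between cognate and non-cognate probes gives one rough way to express the specificity shift described above.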

3.6 Summary

We have successfully identified five unique PUF mutants that selectively interact with RNAs containing a cytosine but not adenine, guanine or uracil. This was achieved by randomizing positions 12 and 16 to encode all 20 possible amino acids and screening for mutants on selection plates. All five variants had an arginine at position 16 and either glycine, alanine, serine, threonine or cysteine at position 12. Prior to this, we had shown that the yeast three-hybrid system has sufficient sensitivity to enable the engineering of individual PUF repeats. Further testing using β-galactosidase assays confirmed that the A, G and U repeats bind their expected bases and that the five newly discovered C-binding mutants specifically interacted with target RNA containing cytosine. Although the β-galactosidase activity for the C-binding mutants bound to a cytosine-containing RNA was not as high as that for the wild-type PUF, some of the cytosine-binding mutants in combination with their target RNAs generated activities similar to or better than those of the guanine-binding and adenine-binding mutants, respectively, in the presence of their cognate RNAs.

We also confirmed previous observations showing that engineered PUF proteins do not always bind with the same affinities as wild-type proteins. In the RNA EMSAs, the CR mutant PUF bound its cognate cytosine-containing RNA with an affinity comparable to that of the wild-type PUF for its cognate NRE RNA, whereas the GR mutant bound its cognate cytosine-containing RNA with lower affinity. The next step forward is to determine if the newly found code retains its modularity and to explore other applications of PUF proteins.

CHAPTER 4

Exploring Features of PUF-RNA Interactions

Having discovered the code for cytosine-binding PUF repeats, it would be of

interest to determine if the code has general applicability in order to ensure that the

binding properties of newly designed PUF domains are predictable. It has already been

shown that the modular nature of the interactions enables the sequence specificity of

PUFs to be altered by simply mutating the residues that make contacts with the Watson-

Crick edge of the base (Cheong and Hall, 2006). This has paved the way for PUF domains

to be engineered to recognize endogenous RNAs composed of adenine, guanine or uracil

(Wang et al., 2002; Wang et al., 2009; Tilsner et al., 2009). Here we examine if the same

modularity can be achieved with cytosine.

4.1 General applicability of cytosine-binding code

The RNA expression plasmids were constructed by cloning pairs of annealed oligonucleotides into the multiple cloning site of the pIIIA/MS2-2 plasmid.

NRE: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG-3';

NREU1C: 5'-CCGGCUAGCAAUCGUAUAUAUUAAUUUAAUAAAGCAUG-3';

NREG2C: 5'-CCGGCUAGCAAUUCUAUAUAUUAAUUUAAUAAAGCAUG-3';

NREU3C: 5'-CCGGCUAGCAAUUGCAUAUAUUAAUUUAAUAAAGCAUG-3';

NREU3G: 5'-CCGGCUAGCAAUUGGAUAUAUUAAUUUAAUAAAGCAUG-3';

NREA4C: 5'-CCGGCUAGCAAUUGUCUAUAUUAAUUUAAUAAAGCAUG-3';

NREU5C: 5'-CCGGCUAGCAAUUGUACAUAUUAAUUUAAUAAAGCAUG-3';

NREA6C: 5'-CCGGCUAGCAAUUGUAUCUAUUAAUUUAAUAAAGCAUG-3';

NREU7C: 5'-CCGGCUAGCAAUUGUAUACAUUAAUUUAAUAAAGCAUG-3';

NREA8C: 5'-CCGGCUAGCAAUUGUAUAUCUUAAUUUAAUAAAGCAUG-3'
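The naming of these probes follows a simple rule that can be sketched in Python: a label such as U3C replaces the wild-type base at that position of the UGUAUAUA core with the new base. The sketch below is illustrative only. The core offset is inferred from the sequences listed above, and the helper names are hypothetical. The repeat numbering helper assumes the antiparallel arrangement implied by the text, in which the cytosine at position 4 corresponds to repeat 5.

```python
NRE = "CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG"
CORE_START = 12  # 0-based index of U1 of the UGUAUAUA core (inferred from the listed probes)


def point_mutant(label, wild_type=NRE, core_start=CORE_START):
    """Return the probe sequence for a label such as 'U3C' (old base, position, new base)."""
    old_base, position, new_base = label[0], int(label[1:-1]), label[-1]
    index = core_start + position - 1
    if wild_type[index] != old_base:
        raise ValueError(f"expected {old_base} at core position {position}")
    return wild_type[:index] + new_base + wild_type[index + 1:]


def repeat_for_base(position, n_repeats=8):
    """PUF repeat assumed to contact a given core position (antiparallel binding)."""
    return n_repeats + 1 - position


# Regenerate the cytosine-scanning probes named in the text.
labels = ["U1C", "G2C", "U3C", "A4C", "U5C", "A6C", "U7C", "A8C"]
probes = {"NRE" + label: point_mutant(label) for label in labels}
```

Generating the probes from a single wild-type string in this way makes it easy to check that each listed sequence differs from the NRE at exactly one position.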

A set of eight PUM1 mutants, in which each repeat was sequentially modified to have a

glycine at position 12 and an arginine at position 16 (GR; Figure 4.1), was made and

sub-cloned from the pJC72 backbone into the yeast expression plasmid pGAD-RC. S. cerevisiae

YBZ1 was transformed with the combination of the nine RNA expression plasmids with the

set of eight pGAD-PUM1 mutant constructs. Enhanced yellow fluorescent protein (EYFP)

was used as a negative control. The transformants were plated on SC agar lacking leucine,

uracil and histidine, supplemented with 0.5 mM 3-amino triazole.

Figure 4.1: Applicability of the C-binding code. Converting successive wild type PUM1 repeats to the cytosine-binding (GR) repeats. Green and red arrows indicate the sequential change in position of the cytosine in the probe and the position of the C-binding repeat unit, respectively.

4.1.1 The cytosine-binding code is modular

The results showed that all of these engineered PUF domains bound RNA targets carrying a cytosine at the position corresponding to the mutated repeat with higher affinity than the wild-type, non-cytosine-containing RNA target (Figure 4.2). The β-galactosidase assays showed that the magnitude of the specificity shift varied from repeat to repeat. Binding of the A4C mutant to its RNA target, in which the cytosine corresponds to repeat 5, was the strongest, while the U1C mutant bound its cognate mutant RNA target most weakly, only 2-fold better than it bound the NRE. Moderate interactions were also observed between the NRE RNA and two of the cytosine-mutated PUFs (A8C and U5C), although both were approximately 3-fold weaker than the interactions observed with their cognate mutant RNA targets. This experiment confirms previous reports that not all repeat-base interactions contribute equally to the binding energy of the RNA-protein complex (Zamore et al., 1997; Cheong and Hall, 2006); however, all mutants preferentially bound cytosine-containing RNAs.

Figure 4.2: C-binding code possesses modularity. Engineered PUF repeats can selectively bind cytosine at all eight positions within the RNA target. Data are mean ± SEM from six independent experiments. * p < 0.01 of cognate to non-cognate RNA-protein complexes by a 2-tailed Student’s t test.

To date, all RNA targets of PUF proteins analyzed contain a UGU trinucleotide that is critical for binding (Zhang et al., 1997; Zamore et al., 1997; Gerber et al., 2004; Souza et al., 1999; Lamont et al., 2004; Tadauchi et al., 2001; Crittenden et al., 2002; Wang et al., 2002; Nakahata et al., 2001; Eckmann et al., 2004; Olivas and Parker 2000; Jackson et al., 2004). The binding of UGU by repeats 6-8 is highly conserved given the atypical binding mode of repeat 7, where the asparagine side chain is not long enough to form a stacking interaction with a base (Cheong and Hall, 2006). Cheong and Hall (2006) found that when repeats 6 and 7 were mutated to recognize UUG instead of the UGU triplet typically found in RNA targets of PUF proteins, the mutant PUF bound its cognate mutant RNA 34-fold more tightly than wild-type RNA, whereas the wild-type protein bound wild-type RNA 3,300-fold more tightly than the mutant RNA. Zamore et al. (1997) found that deletion of repeats 7 and 8 essentially eliminated RNA binding. In neither study could the original binding affinity be fully recovered in the mutated PUFs. It can therefore be deduced that re-engineering at these positions would be difficult.

4.2 PUF-RNA interaction with increasing structure

The binding mode of PUF domains observed in different crystal structures (Wang

et al., 2002; Wang et al., 2009; Gupta et al., 2008; Zhu et al., 2009) indicates that their

RNA targets are exclusively single stranded in the RNA-protein complexes; however,

whether their RNA targets must be single stranded prior to PUF domain binding is not

known. To test this, a series of RNA variants was generated whereby the

sequence downstream of the NRE was modified sequentially to place the NRE in

increasingly base-paired structures.

The RNA expression plasmids were constructed by cloning pairs of annealed oligonucleotides into the pIIIA/MS2-2 plasmid. Base changes were made to promote base-pairing within the RNA target to generate an increasingly strong hairpin structure:

NRE: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG-3';

NREstem5: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAAUAAAGCAUG-3';

NREstem6: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUUAAAGCAUG-3';

NREstem7: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUAAAAGCAUG-3';

NREstem8: 5'-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUACAAGCAUG-3'
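The design logic of these variants can be checked with a short Python sketch that counts how many bases of the UGUAUAUA core could form Watson-Crick pairs with the downstream sequence. The sketch makes strong simplifying assumptions that are not taken from the text: the hairpin is assumed to have a four-nucleotide UUAA loop, the eight nucleotides after the loop are assumed to pair antiparallel with the core, and only A-U and G-C pairs are counted. It is not an RNA folding prediction, and the function name and index positions are illustrative.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

STEM_VARIANTS = {
    "NRE": "CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG",
    "NREstem5": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAAUAAAGCAUG",
    "NREstem6": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAUUAAAGCAUG",
    "NREstem7": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAUAAAAGCAUG",
    "NREstem8": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAUACAAGCAUG",
}


def paired_core_bases(seq, core=slice(12, 20), partner=slice(24, 32)):
    """Count core bases that can pair with the downstream arm of the assumed hairpin.

    The last core base is matched with the first base after the loop,
    the second-last core base with the next one, and so on.
    """
    core_seq, partner_seq = seq[core], seq[partner]
    return sum((a, b) in WATSON_CRICK for a, b in zip(reversed(core_seq), partner_seq))


pair_counts = {name: paired_core_bases(seq) for name, seq in STEM_VARIANTS.items()}
```

Under these assumptions the counts for stem5 to stem8 come out as five to eight, matching the variant names.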

S. cerevisiae YBZ1 was transformed with the combination of the five RNA

expression plasmids with the original pGAD-PUM1 or with EYFP as a negative control. The

transformants were plated on SC agar lacking leucine, uracil and histidine, supplemented

with 0.5 mM 3-amino triazole (Figure 4.3).

Figure 4.3: PUF proteins can bind to highly structured RNA targets. The PUF domain is able to bind to RNA targets that are located within substantially double stranded structures. The number of bases from the PUF recognition site that were paired within a stem structure was increased stepwise from five (stem5) to eight (stem8). Data are mean ± SEM from six independent experiments. *p < 0.01 of cognate to non-cognate RNA-protein complexes by a 2-tailed Student’s t test.

The results revealed that the PUF protein was able to bind all of the RNA targets. Interestingly, when the RNA target was changed from a 5 base pair stem to a 6 base pair stem, the binding affinity of the PUF for its target actually improved. The introduction of the 7 base pair stem led to reduced affinity compared to the wild-type NRE. With the final 8 base pair stem, in which every base of the target was paired, binding was still observed, albeit less efficiently. This experiment indicates that PUF proteins are able to invade structured RNAs to bind their target sequences. It is unclear how the PUF protein achieves this, as there are no crystal structures of these interactions, but it may occur during the dynamic rearrangements that are intrinsic to RNA structures (Kedde et al., 2010). This is relevant not only to the rational engineering of PUF domains but also to naturally occurring PUF domain proteins.

4.3 Extending the PUF domain beyond eight repeats

As previously mentioned, naturally occurring PUF proteins typically contain eight

RNA-binding repeats. Even though this is adequate for them to selectively regulate

particular developmental processes in cells, they often achieve this by binding to several

different RNAs (Gerber et al., 2004). For various applications in medicine, biotechnology

and synthetic biology, it would be extremely desirable to be able to target only one

species of RNA within an entire transcriptome. To achieve such levels of sequence

discrimination, we engineered an extended PUF protein with 16 RNA-binding repeats and

assessed its binding abilities to an equally extended RNA target.
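The motivation for doubling the number of repeats can be made concrete with a rough estimate, sketched here in Python. Assuming, purely for illustration, that transcript sequences are random with equal base frequencies, a given n-nucleotide site is expected to occur about L / 4^n times in L nucleotides of sequence. The transcriptome size used below is a hypothetical round number, and real transcriptomes are neither random nor uniform in base composition.

```python
def expected_random_sites(site_length, transcriptome_nt):
    """Expected chance occurrences of one specific site in random sequence."""
    return transcriptome_nt / 4 ** site_length


HYPOTHETICAL_TRANSCRIPTOME_NT = 1e8  # illustrative value only

# An 8-nt site is expected on the order of 1,500 times by chance,
# whereas a 16-nt site is expected far less than once.
sites_8 = expected_random_sites(8, HYPOTHETICAL_TRANSCRIPTOME_NT)
sites_16 = expected_random_sites(16, HYPOTHETICAL_TRANSCRIPTOME_NT)
```

Each additional recognized base reduces the expected number of chance matches four-fold, so eight extra repeats reduce it by a factor of 4^8 = 65,536 under this model.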

To engineer a 16 RNA-binding repeat PUF (PUFx2), we inserted sequences

encoding only the RNA-binding PUF repeats, without flanking regions, from the human

PUM1 cDNA between repeats five and six of a synthetic gene that encodes the same

protein sequence as the PUM1 cDNA but is only 78% similar at the DNA level, to avoid

potential instability of the recombinant DNA (Figure 4.4). The core C. elegans FBF

recognition sequence begins with a UGU triplet, as do all validated PUF target sequences.

The optimal binding site is 5′-UGURNNAUA-3′ (R, purine; N, any base; Bernstein et

al., 2005). Because the C. elegans FBF-1 and FBF-2 PUF proteins contain a short insertion

close to the end of repeat five, we reasoned that this region might tolerate the insertion

of extra PUF repeats. Repeats 1-8 of the human PUM1 cDNA were amplified using primers

that incorporated flanking SacI sites, digested with SacI and cloned into an engineered

SacI site that encodes amino acids 1030 and 1031 of the synthetic gene encoding the

PUM1 PUF domain.

The RNA expression plasmids were constructed by cloning pairs of annealed oligonucleotides into the pIIIA/MS2-2 plasmid. Two mutant probes were made by the introduction of three cytosine bases: the NREx2 mut1 RNA has the UGU triplet of the newly added target region mutated to CCC, whereas the NREx2 mut2 RNA has the UGU triplet of the native NRE mutated to CCC:

NREx2: 5'-CCGGCUAGCAAUUGUUGUAUAUAAUAUAUUAAUUUAAUAAAGCAUG-3';

NREx2mut1: 5'-CCGGCUAGCAAUCCCUGUAUAUAAUAUAUUAAUUUAAUAAAGCAUG-3';

NREx2mut2: 5'-CCGGCUAGCAAUUGUCCCCUAUAAUAUAUUAAUUUAAUAAAGCAUG-3'

Figure 4.4: 16 RNA-binding repeat PUF. A PUF domain consisting of 16 RNA-binding repeats provides additional binding specificity and selectivity. The structure of the engineered 16 repeat PUF and its cognate RNA target are shown schematically. NREx2 mut1 RNA has the UGU triplet of the newly added target region mutated to CCC, the NREx2 mut2 RNA has the UGU triplet of the native NRE mutated to CCC.

S. cerevisiae YBZ1 was transformed with the combination of the five RNA

expression plasmids with the original pGAD-PUM1 with EYFP as a negative control. The

transformants were plated on SC agar lacking leucine, uracil and histidine, supplemented

with 0.5 mM 3-amino triazole.

Figure 4.5: Extended PUF binds its extended RNA target. Survival on media with 0.5 mM 3-aminotriazole and lacking histidine and β-galactosidase assays were used to determine the interaction of PUF domains and their RNA targets. Data are mean ± SEM from six independent experiments. *, p < 0.01 of cognate to non-cognate RNA-protein complexes by a 2-tailed Student’s t test.

The experiment revealed that the extended PUF was not only able to successfully

bind to its cognate extended RNA target in yeast, it also activated transcription of the β-

galactosidase reporter gene more efficiently than the eight repeat PUF with its cognate

RNA (Figure 4.5). The wild type PUF protein was able to bind just as well to the NREx2 and

the NREx2 mut1 because the NREx2 mut1 RNA has the UGU triplet of the newly added

target region mutated to CCC, unlike the NREx2 mut2 which has the CCC mutation

introduced at the native PUF target site. The inserted and flanking PUF repeats

contributed to the binding affinity and selectivity as separately mutating the UGU triplets

recognized by both sets of repeats significantly decreased β-galactosidase activity and

growth on selective media. It would be of interest to obtain the crystal structure of the 16

repeat PUF to enable the visualization of how it binds to its extended RNA target. This

experiment shows that engineered PUF domain proteins which have 16 RNA-binding

repeats can provide the means to selectively bind RNAs in higher eukaryotes that have

more complex transcriptomes.

4.4 Summary

We have successfully shown that the newly discovered cytosine-binding PUF

repeat can be used in a modular manner. This is significant as binding specificity can be

transferred to different positions in the RNA recognition sequence, enabling the design of

engineered PUFs with predictable binding. We also observed that the extent of the

specificity shift varied from repeat to repeat, consistent with other studies

reporting that not all repeat-base interactions contribute equally to the binding energy of the

RNA-protein complex (Zamore et al., 1997; Cheong and Hall, 2006).

We found that the PUF protein was able to bind to highly structured RNA targets,

which was not previously known, as other studies have only reported interactions with

RNA targets that were exclusively single stranded in the RNA-protein complexes. Finally,

we engineered an extended PUF protein with 16 RNA-binding repeats and successfully

showed that it had the ability to bind to its extended RNA target in yeast and also

activated transcription of the β-galactosidase reporter gene more efficiently than the

eight-repeat wild type PUF. Having the ability to target a specific RNA species within the

plethora of transcripts would be highly advantageous for various biotechnology and

medical applications. Engineered PUF proteins would provide unique prospects for fine-

tuning the expression of endogenous genes or transgenes as the post-transcriptional

control of gene expression is rapid and more precise (Isaacs et al., 2004; Zenklusen et al.,

2008). Additionally, the ability to regulate events such as nuclear retention or cytoplasmic

localization of mRNAs can only be achieved at the level of RNA, for instance the mRNAs

(Johnston, 2005). This new toolkit may aid us in better understanding the complex

patterns of gene expression in living cells (Filipovska and Rackham, 2008; Isaacs et al.,

2006).

CHAPTER 5

Engineering Consensus PPRs

The pentatricopeptide repeat (PPR) proteins are made up of 2-26 copies of a 35

amino acid degenerate motif comprising two anti-parallel α-helices (Schmitz-Linneweber

and Small, 2008; Small and Peeters, 2000; Ringel et al., 2011). Although they

are similar in structure to the tetratricopeptide repeat (TPR), TPR proteins are responsible

for mediating protein-protein interactions while PPR domains are mainly involved in RNA-

protein interactions (Small and Peeters, 2000). PPR proteins are of interest as not much is

known with respect to their modular architecture, folding and binding specificities. What

is known is that they are present in a large number of proteins that are associated with an

extensive range of biological functions involving RNA. Their repetitive nature also

indicates that they might have potential as modular RNA-binding proteins similar to PUFs.

Engineering repeat proteins composed of identical or near-identical repeats is an

appealing approach to elucidate their fundamental attributes. Such scaffolds would

consist of short and simple repeated modules, and each repeat would have identical intra-

and inter-repeat interactions (Main et al., 2003). In addition, perfect designed repeat

proteins could be more symmetrical structurally than naturally occurring repeat proteins

and adding or removing whole repeats would simply extend or shorten the protein

without disrupting its tertiary structure (Main et al., 2003). The ability to easily adjust or

modify them enables a more flexible approach to investigate repeat proteins. The

abundance of sequences accessible makes a statistical design approach a fitting strategy

as multiple sequence alignments can be conducted to aid in the identification of

functionally and structurally important residues of repeat proteins. The profound interest

in repeat proteins has already led to the successful design of consensus TPR (Main et al.,

2003), ankyrin (Kohl et al., 2003; Mosavi et al., 2002), and leucine-rich repeat (LRR)

proteins (Stumpp et al., 2003). Our aim was to use a consensus-based PPR design to

create an array of PPR proteins to decipher the RNA-binding code of these proteins and to

create a robust RNA-binding scaffold for biotechnology applications. Although, during the

course of this study, Barkan et al. (2012) used computational methods to deduce a code for

nucleotide recognition by PPR proteins using the maize protein PPR10 as a model, there

is an abundance of amino acid combinations at positions 4 and 34 (6 and 1’ according to

Barkan et al.) in nature that cannot be predicted computationally. The highly insoluble

nature of naturally occurring PPRs makes them extremely challenging to study. In

addition, the auxiliary sequences in PPR10 make it difficult to determine which PPRs are

important for RNA recognition or to rationally adjust the number of RNA-binding repeats

in this protein.

5.1 Designing consensus PPR

Consensus design is defined as the engineering of a protein composed of the most

common residues at each position determined from a multiple sequence alignment

(Desjarlais and Berg, 1993). More often than not, these designed consensus proteins have

been substantially more stable than natural proteins used in the multiple sequence

alignment. The first step in constructing a consensus PPR (cPPR) was to create a PPR

multiple sequence alignment by searching and retrieving all PPR sequences (Figure 5.1).

An initial PPR consensus was made to maximize the inclusiveness of the subsequent PSI-

BLAST to build a final consensus. The 100 most diverse PPR sequences were obtained from

Pfam PPR record TIGR00756 and narrowed down to canonical 35 amino acid PPRs. PSI-

BLAST was used to generate the multiple sequence alignments, resulting in a position-

specific scoring matrix (PSSM; Altschul and Koonin, 1998). The NCBI Protein Reference

Sequences database was searched, as it is curated and non-redundant. A PSSM is a scoring

matrix that provides amino acid substitution scores for each individual position in a protein

multiple sequence alignment (National Center for Biotechnology Information,

2011). Positive scores specify that a particular amino acid substitution occurs more

frequently in the alignment than expected, while negative scores specify that the

substitution occurs less frequently. Large positive scores often indicate critical

functional residues (e.g. active site residues; National Center for Biotechnology

Information, 2011). Four iterations were performed, at which point no new significant

matches were identified. Iteration refers to the process whereby a protein profile is run

against a database, after which new similar sequences can be detected (Altschul and

Koonin, 1998). A new multiple alignment that includes the sequences from the previous

iteration can then be constructed, a new profile abstracted from it, and a new

database search performed. The procedure can be repeated as often as needed or

until convergence, when no new statistically significant sequences can be detected

(Altschul and Koonin, 1998). The consensus PPR PSSM was derived from 6286

sequences. A global propensity was calculated for each position in the PPR motif,

defined as the ratio of the percentage occurrence of an amino

acid at a given position to its percentage occurrence in the protein database (Andrade et

al., 2001; Kajender et al., 2006). The final designed consensus PPR sequence was taken as

those residues with the highest global propensity at each position of the PPR motif.
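The global propensity calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in this work: the repeats are assumed to be already aligned, gap-free and of equal length, and the toy alignment, the background frequencies and the function names are all hypothetical.

```python
from collections import Counter


def global_propensities(aligned_repeats, background):
    """Per position, map each observed amino acid to
    (frequency in the alignment column) / (frequency in the background database)."""
    length = len(aligned_repeats[0])
    if any(len(rep) != length for rep in aligned_repeats):
        raise ValueError("all repeats must have the same length (gaps are not handled)")
    table = []
    for i in range(length):
        column = Counter(rep[i] for rep in aligned_repeats)
        total = sum(column.values())
        table.append({aa: (n / total) / background[aa] for aa, n in column.items()})
    return table


def consensus_from_propensity(aligned_repeats, background):
    """Choose the residue with the highest global propensity at each position."""
    table = global_propensities(aligned_repeats, background)
    return "".join(max(column, key=column.get) for column in table)


# Hypothetical toy data: four three-residue "repeats" and made-up background frequencies.
TOY_REPEATS = ["VTY", "VSY", "ATY", "VTF"]
TOY_BACKGROUND = {"V": 0.07, "A": 0.08, "T": 0.05, "S": 0.07,
                  "Y": 0.03, "F": 0.04, "L": 0.10, "W": 0.01}
toy_consensus = consensus_from_propensity(TOY_REPEATS, TOY_BACKGROUND)
```

Because the column frequency is divided by the background frequency, a residue that is rare in proteins overall can outrank a more frequent residue at a given position, which is the main difference from a simple majority-rule consensus.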

Figure 5.1: Pentatricopeptide repeats. A sequence logo illustrating the characteristic amino acid composition of PPR sequences. The PPR profile was constructed from 14,466 PPRs found in the PROSITE PPR entry (PDOC51375; WebLogo). These sequences were derived from the following taxonomic groups: 86% plants, 5.7% fungi, 4.3% animals, 1.8% algae, 1% trypanosomes and 1.2% others. Amino acids are colour coded according to the physicochemical properties of their side chains: small (A, G) in black, nucleophilic (C, S, T) in blue, hydrophobic (I, L, V, M, P) in green, aromatic (F, W, Y) in red, acidic (D, E) in purple, amides (Q, N) in pink, and basic (H, K, R) in orange. Regions of alpha helical structure are shown below. Amino acids are numbered based on the Pfam model, which functions as a minimal unit. Residue 34 is also defined as ii according to Kobayashi et al. (2012) while the numbering scheme used by Fujii et al. (2012) is shifted to the N-terminus by two amino acids such that amino acids 1, 4 and 34 in the Pfam model are annotated as 3, 6 and 1, respectively.

The one exception was position 11, where cysteine was replaced by glycine to

exclude the possibility of undesirable oxidation/reduction reactions that may interfere

with folding. The final cPPR protein was designed with 8 repeats because (i) it is of

a manageable size, (ii) it builds on previous experience working with PUFs, (iii) it strikes a

balance between effective binding and non-specific association, and (iv) it is predicted to be

able to bind a contiguous RNA. In addition to the repeated consensus units, two extra

features were inserted into the consensus proteins: nucleating N-terminal (Met-Gly-Asn-

Ser) residues and a C-terminal solvating helix similarly utilized by Main et al. (2003). The

design was modeled after a consensus TPR design used by Main et al. (2003) to engineer

novel proteins by arraying various numbers of an idealized TPR motif. They found that the

proteins were stable, possessed native-like properties (such as reversible thermal

denaturation transitions) and formed the desired TPR fold.

cPPRcaps amino acid sequence

Feature            Sequence
N-Cap              GAPMGNS
PPR repeat (x8)    VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV
Half repeat*       VTYNTLISGLGKAG

*Half repeat refers to the solvating helix sequence.

The Met-Gly-Asn-Ser N-terminal cap was used because statistically, Gly, Asn, and

Ser have the highest propensities to occur at the N″, N′, and N cap positions in α helices

(Kumar and Mansal, 1998; Aurora and Rose, 1998; Dasgupta and Bell, 1998; Richardson

and Richardson, 1993). The C-terminal solvating helix (referred to as the half repeat) was added

after the final consensus repeat with the rationale of increasing solubility (Main et al.,

2003).
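The modular layout of the designed protein can be expressed as a short Python sketch that assembles the N-cap, a chosen number of consensus repeats and the half repeat from the sequences tabulated above. The function name is hypothetical, and the sketch covers only the designed domain as listed, not the intein and chitin-binding domain fusion used for expression.

```python
N_CAP = "GAPMGNS"
CONSENSUS_REPEAT = "VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV"  # 35-residue consensus PPR
HALF_REPEAT = "VTYNTLISGLGKAG"  # C-terminal solvating helix


def assemble_cppr(n_repeats=8):
    """Concatenate the N-cap, n identical consensus repeats and the half repeat."""
    if n_repeats < 1:
        raise ValueError("at least one repeat is required")
    return N_CAP + CONSENSUS_REPEAT * n_repeats + HALF_REPEAT


cpprcaps = assemble_cppr()  # the eight-repeat design described in the text
```

Because every repeat is identical, changing n_repeats lengthens or shortens the array without altering any individual repeat, which is the property that makes a consensus scaffold convenient to adjust.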

The binding preferences of the cPPRcaps protein were tested in vitro via RNA

electrophoretic mobility shift assays (RNA EMSA) between the consensus PPR protein and

four RNA probes [poly(A), poly(G), poly(C) and poly(U)]. To do this, I

overexpressed the cPPRcaps poly(U) protein in E. coli and purified it to homogeneity.

5.1.1 Purifying cPPRcaps protein

The synthetic cPPRcaps domain was first sub-cloned from the pMK-RQ plasmid into

the pTYB3 protein expression plasmid. This system expresses the consensus PPR proteins

as a fusion to an intein and chitin-binding domain in E. coli ER2566 cells. The proteins were

purified as per Section 2.2.7.

Figure 5.2: Purification of cPPRcaps protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Soluble lysate after sonication. Lane 6: Unbound soluble protein after chitin bead binding. Lane 7: Purified cPPRcaps protein. Lane 8: Chitin beads.

Figure 5.2 band labels: cPPRcaps-intein-CBD; intein-CBD; cPPRcaps.

5.1.2 RNA Electrophoretic Mobility Shift Assay of the cPPRcaps protein

To determine the binding specificity of the purified cPPRcaps protein, an in vitro RNA electrophoretic mobility shift assay (RNA EMSA) was conducted. The purified cPPR proteins were incubated with fluorescein-labeled RNA oligonucleotides in binding buffer for one hour; the reactions were subsequently analyzed by 10% PAGE in TAE and fluorescence was detected using a Typhoon TRIO scanner. The four fluorescein-labeled RNA oligonucleotides were poly(G) [5'-(Fl)AAGGGGGGGG-3'], poly(C) [5'-(Fl)CCCCCCCCCC-3'], poly(U) [5'-(Fl)UUUUUUUUUU-3'] and poly(A) [5'-(Fl)AAAAAAAAAA-3']. In vitro assays with poly(G) probes are challenging due to the propensity of G tracts to form stable quadruplex structures (Kobayashi et al., 2011). Indeed, no fluorescence could be detected from a fluorescein-labeled RNA consisting of 10 Gs. To bypass this limitation, we used a probe in which a run of 8 Gs was linked to the fluorescein label via two adenine residues. This probe had readily detectable fluorescence and was used in all RNA EMSAs as the “polyG” probe.

Figure 5.3: RNA probe recognition of cPPRcaps in vitro. (a) cPPRcaps is specific for uracil containing RNAs, determined by RNA EMSA. cPPRcaps protein was tested against adenine, uracil, cytosine and guanine containing RNA probes.

The RNA EMSAs showed specific binding between the cPPRcaps protein and the

poly(U) target RNA compared to the other three poly(A), poly(G) and poly(C) RNA probes

(Figure 5.3). The topmost RNA EMSA with the poly(U) probe shows the cPPRcaps poly(U)

binding strongly to the RNA target, with RNA-protein complexes forming at 0.08 µM

protein. This shows that the combinatorial amino acid code for uracil recognition is

contained within the consensus PPR protein we designed. After I obtained this result, the

study by Barkan et al. (2012) was published showing that residues 4 and 34 of PPRs are

responsible for their binding specificity. According to the code elucidated by Barkan et al.

(2012), the consensus, which has residues Asn (N) and Asp (D) at positions 4 and 34

(N4D34) will bind U. Therefore, given that all 8 repeats are identical, it would be predicted

that the purified cPPR would bind to a poly(U) RNA target. Here I confirmed that

prediction and provide validation for their proposed code. Additionally, the cPPR design

could be utilized as a scaffold to find other combinations of amino acids that could

specifically bind other bases.

5.2 In vitro analysis of other cPPR interaction based on Barkan et al. (2012)

The next set of cPPR repeats were designed according to the code described by

Barkan et al. (2012). This code specifies that (i) Thr (T) at position 4 and Asp (D) at position

34 (T4D34) will bind guanine, (ii) Thr (T) at position 4 and Asn (N) at position 34 (T4N34) will

bind adenine, and (iii) Asn (N) at position 4 and Asn (N) at position 34 (N4N34) will bind

cytosine or uracil. The binding preferences of the cPPRcaps poly(A), poly(G) and poly(U/C)

proteins were tested in vitro via RNA EMSAs between the consensus PPR proteins and

four RNA probes. The cPPR proteins were over-expressed in E. coli and purified to

homogeneity.

cPPRcaps amino acid sequences

Protein      N-Cap     PPR repeat (x8)                        Half repeat*
poly(A)      GAPMGNS   VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPNV    VTYTTLISGLGKAG
poly(G)      GAPMGNS   VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPDV    VTYTTLISGLGKAG
poly(U/C)    GAPMGNS   VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPNV    VTYNTLISGLGKAG
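The relationship between the consensus repeat and these variants can be summarized in a brief Python sketch. It encodes the four residue combinations at positions 4 and 34 quoted in this chapter from Barkan et al. (2012) as a lookup table and derives each variant repeat by substituting only those two positions in the consensus. The dictionary and function names are hypothetical, and the lookup is a restatement of the published code, not a predictor for other residue combinations.

```python
CONSENSUS_REPEAT = "VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV"  # N4, D34

# (residue at position 4, residue at position 34) -> base, as quoted in the text.
PPR_CODE = {
    ("N", "D"): "U",
    ("T", "D"): "G",
    ("T", "N"): "A",
    ("N", "N"): "C or U",
}


def set_code(repeat, pos4, pos34):
    """Return the repeat with positions 4 and 34 (1-based) replaced."""
    residues = list(repeat)
    residues[3] = pos4
    residues[33] = pos34
    return "".join(residues)


def predicted_base(repeat):
    """Look up the base predicted for a repeat; 'unknown' if the pair is not in the table."""
    return PPR_CODE.get((repeat[3], repeat[33]), "unknown")


poly_a_repeat = set_code(CONSENSUS_REPEAT, "T", "N")
poly_g_repeat = set_code(CONSENSUS_REPEAT, "T", "D")
poly_uc_repeat = set_code(CONSENSUS_REPEAT, "N", "N")
```

The three derived repeats reproduce the repeat sequences tabulated above, which confirms that the variants differ from the consensus only at the two specificity-determining positions.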

5.2.1 Purification of cPPRcaps poly(A) protein

The synthetic cPPR poly(A) domain was first sub-cloned from the pMK-RQ plasmid

into the pTYB3 protein expression plasmid. This system expresses the consensus PPR

proteins as a fusion to an intein and chitin-binding domain in E. coli ER2566 cells. The

proteins were purified as per Section 2.2.7 (Figure 5.4).

Figure 5.4: Purification of cPPRcaps poly(A) protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(A) protein. Lane 7: Chitin beads. Band labels: cPPRcaps poly(A)-intein-CBD; intein-CBD; cPPRcaps poly(A) protein.

5.2.2 RNA Electrophoretic Mobility Shift Assay of the cPPRcaps poly(A) proteins

In order to confirm the binding specificity of the purified cPPRcaps poly(A)

proteins, an RNA EMSA was conducted (as described above in Section 5.1.2).

Figure 5.5: RNA probe recognition of cPPRcaps poly(A) in vitro. (a) cPPRcaps poly(A) is specific for adenine-containing RNAs, as determined by RNA EMSA. cPPRcaps poly(A) protein was tested against adenine-, uracil-, cytosine- and guanine-containing RNA probes.

The RNA EMSAs showed highly specific binding between the cPPRcaps poly(A)

protein and its poly(A) target RNA compared to the other three poly(U), poly(G) and

poly(C) RNA probes. The third RNA EMSA from the top with the poly(A) probe shows the

cPPRcaps poly(A) binding strongly to its target, with RNA-protein complexes forming at

0.01 µM protein. Hence, the combinatorial amino acid code for adenine recognition is

T4N34.

5.2.3 Purification of cPPRcaps poly(G) protein

The synthetic cPPRcaps poly(G) domain was first sub-cloned from the pMK-RQ plasmid

into the pTYB3 protein expression plasmid. The proteins were purified as per Section 2.2.7

(Figure 5.6).

Figure 5.6: Purification of cPPRcaps poly(G) protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(G) protein. Lane 7: Chitin beads. Band labels: cPPRcaps poly(G)-intein-CBD; intein-CBD; cPPRcaps poly(G) protein.

5.2.4 RNA Electrophoretic Mobility Shift Assay of the cPPRcaps poly(G) proteins

In order to confirm the binding specificity of the purified cPPRcaps poly(G)

proteins, an RNA EMSA was conducted (as described above in Section 5.1.2).

Figure 5.7: RNA probe recognition of cPPRcaps poly(G) in vitro. (a) cPPRcaps poly(G) did not bind any of the RNA probes, as determined by RNA EMSA. cPPRcaps poly(G) protein was tested against adenine-, uracil-, cytosine- and guanine-containing RNA probes.

The RNA EMSAs showed no shifts between the cPPRcaps poly(G) protein and its

poly(G) target RNA (Figure 5.7). No shifts were observed between the cPPRcaps poly(G)

and the other three RNA probes either.

5.2.5 Purification of cPPRcaps poly(U/C) protein

The synthetic cPPRcaps poly(U/C) domain was first sub-cloned from the pMK-RQ

plasmid into the pTYB3 protein expression plasmid. The proteins were purified as per

Section 2.2.7.

Figure 5.8: Purification of cPPRcaps poly(U/C) protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(U/C) protein. Lane 7: Chitin beads.

The cPPRcaps poly(U/C) protein is highly insoluble and was not successfully

purified (Figure 5.8).

Band label (Figure 5.8): cPPRcaps poly(U/C)-intein-CBD.

5.3 In vitro analysis of other consensus PPR combinations

In order to discover combinations of amino acids at positions 4 and 34 that might

specifically bind G or C, we tested whether either Gly (G) or Ser (S) at position 4 and Asp (D) at

position 34 (G4D34 or S4D34) would bind guanine, and whether Asn (N) at position 4 and either Ser (S)

or Thr (T) at position 34 (N4S34 or N4T34) would bind cytosine. The binding preferences of the

cPPRcaps poly(G) [GD/SD] and cPPRcaps poly(C) [NS/NT] proteins were tested in vitro via RNA EMSA

between the consensus PPR proteins and four RNA probes. The cPPR proteins were over-

expressed in E. coli and purified to homogeneity.

cPPRcaps       N-Cap     PPR repeat (x8)                       Half repeat*
poly(C) [NT]   GAPMGNS   VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPTV   VTYNTLISGLGKAG
poly(C) [NS]   GAPMGNS   VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPSV   VTYNTLISGLGKAG
poly(G) [GD]   GAPMGNS   VTYGTLISGLGKAGRLEEALELFEEMKEKGIVPDV   VTYGTLISGLGKAG
poly(G) [SD]   GAPMGNS   VTYSTLISGLGKAGRLEEALELFEEMKEKGIVPDV   VTYGTLISGLGKAG

5.3.1 Purification of cPPRcaps poly(C) [NS/NT] protein

The synthetic cPPRcaps poly(C) [NS/NT] domains were first sub-cloned from the

pMK-RQ plasmid into the pTYB3 protein expression plasmid. The proteins were purified as

per Section 2.2.7 (Figures 5.9 and 5.10).

Figure 5.9: Purification of cPPRcaps poly(C) [NS] protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(C) [NS] protein. Lane 7: Chitin beads. Band labels: cPPRcaps poly(C) [NS]-intein-CBD; intein-CBD; cPPRcaps poly(C) [NS].

Figure 5.10: Purification of cPPRcaps poly(C) [NT] protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(C) [NT] protein. Lane 7: Chitin beads. Band labels: cPPRcaps poly(C) [NT]-intein-CBD; intein-CBD; cPPRcaps poly(C) [NT].

5.3.2 RNA Electrophoretic Mobility Shift Assay of the cPPRcaps poly(C) [NS/NT]

proteins

In order to confirm the binding specificity of the purified cPPRcaps poly(C) proteins

[NS/NT], an RNA EMSA was conducted (as described above in Section 5.1.2).

Figure 5.11: RNA probe recognition of cPPRcaps poly(C) [NS] in vitro. (a) cPPRcaps poly(C) [NS] is specific for cytosine-containing RNAs, as determined by RNA EMSA. cPPRcaps poly(C) [NS] protein was tested against adenine-, uracil-, cytosine- and guanine-containing RNA probes.

Figure 5.12: RNA probe recognition of cPPRcaps poly(C) [NT] in vitro. (a) cPPRcaps poly(C) [NT] is specific for cytosine-containing RNAs, as determined by RNA EMSA. cPPRcaps poly(C) [NT] protein was tested against adenine-, uracil-, cytosine- and guanine-containing RNA probes.

Both RNA EMSAs showed specific binding between the cPPRcaps poly(C) [NS] and

[NT] variants and their poly(C) target RNA compared to the other three poly(A), poly(G) and

poly(U) RNA probes (Figures 5.11 and 5.12). cPPRcaps poly(C) [NS] has a slightly stronger

binding affinity compared to cPPRcaps poly(C) [NT] because binding was achieved at 0.08

µM protein compared to 0.16 µM for the latter. This experiment shows that the

combinatorial amino acids that enable cytosine recognition are N4T34 and N4S34.

5.3.3 Purification of cPPRcaps poly(G) [GD/SD] proteins

The synthetic cPPRcaps poly(G) [GD/SD] domains were first sub-cloned from the

pMK-RQ plasmid into the pTYB3 protein expression plasmid. The proteins were purified as

per Section 2.2.7 (Figures 5.13 and 5.14).

Figure 5.13: Purification of cPPRcaps poly(G) [GD] protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(G) [GD] protein. Lane 7: Chitin beads. Band labels: cPPRcaps poly(G) [GD]-intein-CBD; intein-CBD; cPPRcaps poly(G) [GD].

Figure 5.14: Purification of cPPRcaps poly(G) [SD] protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps poly(G) [SD] protein. Lane 7: Chitin beads. Band labels: cPPRcaps poly(G) [SD]-intein-CBD; intein-CBD; cPPRcaps poly(G) [SD].

5.3.4 RNA Electrophoretic Mobility Shift Assay of the cPPRcaps poly(G) [GD/SD]

proteins

In order to confirm the binding specificity of the purified cPPRcaps poly(G) proteins

[GD/SD], an RNA EMSA was conducted (as described above in Section 5.1.2).

Figure 5.15: RNA probe recognition of cPPRcaps poly(G) [SD] in vitro. Neither cPPRcaps poly(G) [SD] nor cPPRcaps poly(G) [GD; not shown] bound any of the RNA probes, as determined by RNA EMSA. cPPRcaps poly(G) [SD] protein was tested against guanine and three other RNA probes (not shown: adenine, uracil, cytosine).

Neither of the cPPRcaps poly(G) [SD/GD] proteins bound to the poly(G) RNA probe

(Figure 5.15). No binding was observed between either cPPRcaps poly(G) variant and the

other three RNA probes either (data not shown). It was speculated that the poly(G) RNA

probe may have been compromised as G homopolymers in RNA are known to form highly

stable quadruplex structures (Pochon and Michelson, 1965; Simonsson, 2001).

5.4 Summary

We constructed a stable and soluble consensus PPR architecture and used it to

uncover the amino acid code for binding RNA by PPR proteins. We validated the results of

two previous studies, Kobayashi et al. (2012) and Barkan et al. (2012), and showed that

amino acids at positions 4 and 34 are important for nucleotide recognition. The code,

similar to that described by Barkan et al. (2012), specifies that N4D34 binds uracil, T4N34

binds adenine and N4S34 binds cytosine. Furthermore, I showed for the first time that

repeats with N4T34 specifically bind cytosine.

Figure 5.16: The recognition code of PPRs for RNA bases. On the left is the code described by Barkan et al. (2012). Highlighted in blue is our newly found N4T34 code that specifies cytosine binding.

My designed consensus PPRs have revealed that the code for RNA recognition functions

in a highly reduced protein consisting almost entirely of 8 PPRs. We also showed that

PPRs do not all bind with the same affinities for their target RNA, as seen with cPPRcaps

poly(A) compared to cPPRcaps poly(U). The differences in RNA binding affinity of the

proteins may indicate that there are PPR motifs of low and high RNA binding affinities

(Kobayashi et al., 2012), with each nucleotide recognition having different contributions to

the overall binding affinity. Kobayashi et al. (2012) found that having several PPR motifs

led to higher binding affinities compared to proteins with a single PPR motif, suggesting

that there may be cooperative effects between motifs observed during RNA-binding,

instead of a simple sum of individual motif affinities. These interactions may be similar to

those observed with RRMs, whereby the combination of two or more RNA binding

domains significantly increases the affinity for the target (Clery et al., 2008). The next step

forward is to determine if the newly found code retains its modularity and to investigate

other potential applications of PPR proteins.

CHAPTER 6

Engineering Designer PPR proteins

Here I investigated the modularity of the PPR code and sought to determine if the

consensus PPR can be recoded to bind a specific target akin to the predictable modularity

of the PUF proteins (Wang et al., 2002; Wang et al., 2009; Tilsner et al., 2009). I

investigated the possibility of engineering PPR proteins to recognize endogenous RNAs

and if they can be used to modulate gene expression. A PPR protein was designed to

target the native target of the PUM1 protein, the NRE, that would enable me to compare

and contrast the binding characteristics of PUF and PPR proteins. Next I used a PPR to

target the poly(A) tails of mRNAs encoded by the mitochondrial genome in human cells.

As there is little known about the role of poly(A) tails of mammalian mitochondrial mRNAs

(Nagaike et al., 2005), this strategy would be very useful to study their role in gene

expression.

6.1 Design of a consensus PPR protein that binds the NRE RNA

The cPPRcaps NRE was designed according to the Pfam model. The Pfam model

defines the 1st amino acid of the PPR motif as the start of helix A, which is a valine

(http://Pfam.sanger.ac.uk/, PF01535). Kobayashi et al. (2012) also showed that the

proteins of the Pfam model displayed definite RNA binding activities based on the

apparent KD with position 34 being important for PPR function. To determine if binding

specificity of PPRs can be engineered, using the predicted PPR code, I designed a PPR

protein that should bind the native RNA target of PUM1, NRE. The cPPRcaps NRE

(GeneArt) was cloned into the E. coli expression vector pTYB3. The binding specificity of

the cPPRcaps NRE protein was tested in the presence of two RNA probes (NRE and

NREU3C) in vitro using an RNA electrophoretic mobility shift assay (RNA EMSA). Initially, I

over-expressed the cPPRcaps NRE protein in E. coli and purified it to homogeneity.
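The design step described above amounts to placing the residue pair for each target base at positions 4 and 34 of the consensus repeat, one repeat per base. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, the scaffold is the consensus repeat used throughout this thesis, and the target string is read from the predicted-base column of the cPPRcaps NRE table in this section. It does not model the N-Cap or the half repeat.

```python
# Residue pairs (position 4, position 34) used for each base in the
# cPPRcaps NRE design, following the predicted code described in the text.
PAIR_FOR_BASE = {"U": ("N", "D"), "G": ("T", "D"), "A": ("T", "N")}


def design_repeat(base: str) -> str:
    """Build one 35-residue consensus repeat predicted to bind `base`."""
    pos4, pos34 = PAIR_FOR_BASE[base]
    # Consensus scaffold with positions 4 and 34 left variable.
    return "VTY" + pos4 + "TLISGLGKAGRLEEALELFEEMKEKGIVP" + pos34 + "V"


def design_repeats(target: str) -> list:
    """Return one repeat per base of the target RNA, in order."""
    return [design_repeat(base) for base in target]


# Target read from the predicted-base column of the cPPRcaps NRE table.
for number, repeat in enumerate(design_repeats("UGUAUAUA"), start=1):
    print(f"Repeat{number} {repeat}")
```

Only three pairings are needed here because the target contains no cytosine; a base outside the dictionary raises a `KeyError` rather than guessing a pairing.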

cPPRcaps NRE

Feature        Amino acid sequence                   Predicted base
N-Cap          GAPMGNS
Repeat1        VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV   U
Repeat2        VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPDV   G
Repeat3        VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV   U
Repeat4        VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPNV   A
Repeat5        VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV   U
Repeat6        VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPNV   A
Repeat7        VTYNTLISGLGKAGRLEEALELFEEMKEKGIVPDV   U
Repeat8        VTYTTLISGLGKAGRLEEALELFEEMKEKGIVPNV   A
Half repeat*   VTYNTLISGLGKAG

*half repeat refers to the solvating helix sequence

6.1.1 Purifying cPPRcaps NRE proteins

The synthetic cPPRcaps NRE domain was first sub-cloned from the pMK-RQ plasmid

into the pTYB3 protein expression plasmid. This system expresses the consensus PPR

proteins as a fusion to an intein and chitin-binding domain in E. coli ER2566 cells. The

proteins were purified as per Section 2.2.7. A large fraction of cPPRcaps NRE was soluble

and could be purified effectively (Figure 6.1).

Figure 6.1: Purification of cPPRcaps NRE protein. Proteins were resolved on a 10% SDS-Tris-glycine gel. Lane 1: PageRuler Protein Ladder. Lane 2: Uninduced whole cell lysate. Lane 3: Induced whole cell lysate. Lane 4: Insoluble lysate after sonication. Lane 5: Unbound soluble protein after chitin bead binding. Lane 6: Purified cPPRcaps NRE protein. Lane 7: Chitin beads. Band labels: cPPRcaps NRE-intein-CBD; intein-CBD; cPPRcaps NRE.

6.1.2 RNA electrophoretic mobility shift assay of the cPPRcaps NRE protein

To confirm the binding specificity of the purified cPPRcaps NRE proteins, an RNA

EMSA was conducted with fluorescein-labeled RNA oligonucleotides NRE and NREU3C

concurrently with the PUM1 protein as a comparison of binding specificity.

Figure 6.2: Comparison between PUM1 and cPPRcaps NRE in vitro. Both proteins were tested against the wild type NRE and cytosine-containing RNA probes.

The RNA EMSA revealed a prominent specificity shift whereby the cPPRcaps NRE

bound to its cognate NRE RNA with an affinity very similar to that of the wild-type PUF for the

NRE RNA, while the cPPRcaps NRE bound to the cytosine-containing RNA with lower

affinity. Unlike the cPPRcaps NRE, the higher binding stringency of the PUM1 protein did

not enable the visualization of a NREU3C-PUM1 complex in this experiment.

6.2 Mammalian mitochondrial RNA metabolism

The mammalian mitochondrial DNA (mtDNA) is 16.5 kb in size, circular and double

stranded, encoding 13 proteins (Smeitink et al., 2001). These proteins are subunits of

the mitochondrial ATP synthase and of respiratory complexes I, III,

and IV, which form the electron transport chain in the mitochondrial inner membrane

together with Complex II (Smeitink et al., 2001). The electron transport chain (ETC) and

the ATP synthase are responsible for producing energy via oxidative phosphorylation.

Nuclear and mitochondrial gene expression requires cooperative regulation in response to

energy production. This is because the mitochondrial respiratory complexes are mainly

composed of nuclear encoded polypeptides that are imported into mitochondria post-

translationally, with their proper assembly and function dependent on the expression of

the mitochondria encoded polypeptides (Smeitink et al., 2001). The mitochondrial

genome is also dependent on nuclear encoded proteins for replication, repair,

transcription, and translation (Shoubridge, 2001). Recently, a study by Mercer et al. (2011)

showed that mitochondrial gene expression is significantly correlated with nuclear gene

expression, reinforcing the close coordination between both genomes in relation to the

energy needs of different tissues.

Mammalian mitochondrial mRNAs differ from nuclear-encoded mRNAs in that they (i)

lack both 5’ and 3’ untranslated regions (UTRs), (ii) have no Shine-Dalgarno

sequences, (iii) lack 5’ 7-methylguanosine caps, and (iv) do not include introns or base

modifications (Montoya et al., 1981). Other interesting features are that mitochondrial

mRNAs use a non-universal genetic code where AUA codes for a methionine instead of

isoleucine, AGA and AGG are used as termination codons instead of coding for arginine,

and that UGA encodes tryptophan instead of a stop codon (Barrell et al., 1979). A

majority of mature mammalian mitochondrial mRNAs start at the first nucleotide

positioned at the 5’ end; nine open reading frames (ORFs) use AUG as their start codons,

three ORFs use AUA, and one ORF uses AUU to encode the first methionine in mammalian

mitochondrial encoded proteins (Mercer et al., 2011). The three ORFs that are the

exception to this rule encode MT-ATP8, MT-ND1 and MT-CO1 because they have 1, 2 and

3 nucleotides preceding their start codons (Mercer et al., 2011). The second ORFs of the

two bicistronic transcripts MT-ND4L/ND4 and MT-ATP8/ATP6 have a prominent 5’ leader

sequence compared to the remaining mRNAs. Although all these features result in a

more compact mitochondrial genome, their significance remains to be determined. In

addition, the translation of all these mRNAs and the mechanism by which mitochondrial

ribosomes recognize them are still unknown (Rackham et al., 2012).
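The codon reassignments listed above can be made concrete by overriding the standard genetic code with the mitochondrial deviations named in the text. This is a minimal sketch: the names `STANDARD`, `MITO` and `translate` are illustrative, the example sequence is a toy string rather than a real transcript, and the use of AUU as an initiator methionine is not modelled.

```python
# Standard genetic code laid out in UCAG order (first, second, third base).
BASES = "UCAG"
STANDARD_AA = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Deviations described in the text for mammalian mitochondrial mRNAs:
# AUA -> Met, UGA -> Trp, AGA and AGG -> termination ("*").
MITO = dict(STANDARD, AUA="M", UGA="W", AGA="*", AGG="*")


def translate(rna: str, table: dict) -> str:
    """Translate complete codons of `rna`; trailing 1-2 bases are ignored."""
    usable = len(rna) - len(rna) % 3
    return "".join(table[rna[i:i + 3]] for i in range(0, usable, 3))


toy = "AUAUGAAGAAGG"  # hypothetical sequence covering the four reassigned codons
print(translate(toy, STANDARD))  # I*RR
print(translate(toy, MITO))      # MW**
```

The same four codons give a different reading in each table, which is why sequences from the mitochondrial genome cannot be interpreted with the standard code.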

One feature that remains identical to that found in nuclear mRNAs is

polyadenylation. With the exception of MT-ND6 mRNA, all mitochondrial mRNAs are

polyadenylated at the 3’ end (Temperley et al., 2010; Slomovic et al., 2005; Mercer et al.,

2011). 3’ polyadenylation is crucial for the maturation of mitochondrial mRNAs because

seven of the 11 mRNAs use it to complete their termination codons (Temperley et

al., 2010; Bobrowicz et al., 2009; Borowski et al., 2010; Gagliardi et al., 2004; Nagaike et

al., 2008). Approximately 45 nucleotides are added onto the 10 polyadenylated

mammalian mRNAs, but tail length may vary between mRNAs and, for the same mRNA,

between different cell types

(Temperley et al., 2010). For mRNAs such as MT-CO1 and MT-ND6, it has been proposed

that the 3’ UTRs facilitate the proper termination of protein synthesis by establishing

strong secondary structures (Temperley et al., 2010). Recently, Rackham et al. (2011)

found that not only was the 3’ UTR of MT-ND5 significantly longer than those of MT-CO1

and MT-CO2 (which are not as extensively polyadenylated as the other nine mRNAs), but the

3’ UTR of this mRNA also acts as a stable long non-coding RNA, although its function is not

known.
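The role of polyadenylation in completing termination codons, described above, can be illustrated with a small check: an open reading frame that ends in an incomplete codon of U or UA becomes UAA once adenosines are appended. The sketch below is illustrative only; the function name and the toy sequences are hypothetical, and the default tail length simply reuses the approximate figure of 45 nucleotides given in the text.

```python
def tail_completes_stop(orf: str, tail_length: int = 45) -> bool:
    """Return True if adding a poly(A) tail turns a partial final codon into UAA.

    `orf` is an in-frame RNA coding sequence read from its first codon.
    """
    leftover = len(orf) % 3
    if leftover == 0:
        # The sequence already ends on a complete codon; nothing to complete.
        return False
    tailed = orf + "A" * tail_length
    start = len(orf) - leftover  # start of the incomplete final codon
    return tailed[start:start + 3] == "UAA"


# Toy sequences (not real mitochondrial transcripts).
print(tail_completes_stop("AUGCCCU"))    # partial codon "U"  -> UAA
print(tail_completes_stop("AUGCCCUA"))   # partial codon "UA" -> UAA
print(tail_completes_stop("AUGCCCG"))    # partial codon "G"  -> GAA, not a stop
print(tail_completes_stop("AUGCCCUAA"))  # stop codon already complete
```

For transcripts of this kind, removing the tail also removes the stop codon, which is one reason polyadenylation and translation are hard to separate experimentally.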

A study by Slomovic et al. (2005) found that there were variations in RNA

polyadenylation frequencies that do not correlate with RNA abundance, indicating that

some mitochondrial RNAs may exist in a non-polyadenylated form. mRNAs such as MT-

ND4L/ND4, which require polyadenylation to complete an in-frame stop codon (Ojala et

al., 1981), were found to be diminished in fractions depleted of polyadenylated transcripts

while others like MT-CO1 mRNA were found to be enriched in these fractions. This

indicates that mRNAs which require polyadenylation to complete an in-frame stop codon

are stabilized by polyadenylation of their 3’ end. Wydro et al. (2010) conducted a study

that involved importing a cytoplasmic poly(A) specific 3’ → 5’ exoribonuclease (PARN) into

mitochondria with the purpose of removing the poly(A) tails. They found that this

stabilized some mRNAs, destabilized others and had no apparent effect on the

remainder. This shows that the role of mitochondrial mRNA polyadenylation in

mammals is not always consistent and appears to vary for specific mRNAs. Likewise in

cells, reduction of the mitochondrial poly(A) polymerase (PAPD1) destabilizes MT-CO1,

MT-CO2, MT-CO3, and MT-ATP8/ATP6 mRNAs but has no effect on MT-ND3

mRNA (Nagaike et al., 2005), whereas a decrease in the length of the poly(A) tail of MT-ND1

mRNA by 2’-phosphodiesterase (PDE12) has been shown to increase its abundance

(Rorbach et al., 2011). The significance of polyadenylated and non-polyadenylated mRNAs

remains unclear but these findings suggest that there are indeed two individual isoform

pools.

Polyadenylation has been shown to affect mitochondrial translation but the exact

role that it plays in mRNA stability and translation needs to be investigated further to help

understand the role of mRNA polyadenylation in mammalian mitochondria (Wydro et al.,

2005; Rorbach et al., 2011; Ruzzenente et al., 2011). As an initial step towards

manipulating the mitochondrial transcriptome at the polyadenylation level, I designed a

mitochondrially targeted poly(A) recognizing PPR protein to bind mitochondrial mRNAs in

cells.

6.2.1 cPPRcaps poly(A) reduces the translation of mitochondrially encoded proteins

In Chapter 5, the RNA EMSA showed that the cPPRcaps poly(A) (Figure 5.5) has a

very strong affinity for its cognate poly(A) RNA. I used this cPPRcaps poly(A) protein and

fused it to a mitochondrial targeting signal derived from ornithine transcarbamylase (OTC;

Mori et al., 1982; Horwich et al., 1986) so that I could use it to bind poly(A) tails of mature

mitochondrial mRNAs in cells. OTC is a nuclear-encoded mitochondrial matrix enzyme

whose leader peptide directs mitochondrial localization, both in vitro and in intact cells

(Horwich et al., 1986). The precursor is recognized by mitochondria and translocated in an

energy-dependent fashion, across both mitochondrial membranes (Mori et al., 1982;

Kolansky et al., 1982). Mitochondrial protein synthesis was measured in the presence of

cycloheximide to inhibit cytoplasmic translation and 35S-labelled cysteine and methionine to label

mitochondrially encoded proteins only, when cPPRcaps poly(A), cPPRcaps NRE or EYFP

proteins were expressed in 143B osteosarcoma cells.

Figure 6.3: cPPRcaps poly(A) affects mitochondrial protein synthesis. (Left) cPPRcaps poly(A) expression lowers mitochondrial translation in cells. 143B cells were transfected with pOTC- cPPRcaps poly(A), pOTC-cPPRcaps NRE and pEYFP-TAP and protein synthesis was measured by pulse incorporation of 35S-labelled methionine and cysteine. Equal amounts of cell lysate protein were separated by SDS–PAGE and visualised by autoradiography. (Right) The gels were stained with Coomassie Brilliant Blue to confirm equal loading.

A general decrease in mitochondrially encoded proteins was observed upon expression of the cPPR

protein designed to target mitochondrial mRNAs compared to controls, and this was

particularly apparent after a 6-day expression of this protein (Figure 6.3). It appears that a

second cPPRcaps poly(A) transfection at day 4 resulted in a more pronounced effect on

protein levels at day 6 compared to a single transfection. This finding is very significant

since to date, it has been very challenging to manipulate the mitochondrial transcriptome

that is contained within the mitochondrial matrix. The two mitochondrial membranes

(inner and outer), one of which is tightly coupled and impermeable to anything but ions

(Alberts et al., 2002), that surround the mitochondrial matrix provide a barrier that

prevents the import of any macromolecules, particularly if they are negatively charged.

For this reason, manipulation of mitochondrial gene expression using antisense agents,

RNAi or other technologies has not been possible to date. Further investigation would be

required to determine the maximal level of overall protein knockdown that can be

achieved with a prolonged PPR protein expression cycle, possibly through the establishment

of stable cell lines.

6.3 cPPRcaps poly(A) does not affect mitochondrial RNA stability

To investigate if the binding of the poly(A) tails of mitochondrial RNAs affects the

stability of the mRNAs or if it interferes with their translation on mitochondrial ribosomes,

I isolated RNA from cells that were treated with the cPPRcaps poly(A) or cPPRcaps NRE

and control treatments and carried out northern blotting. I investigated the effects of

these proteins on several different mitochondrial mRNAs that have varying poly(A) tail

lengths, the MT-ND6 mRNA that is known to lack a poly(A) tail (Temperley et al., 2010;

Mercer et al., 2011), and the 12S and 16S rRNAs that are also known to lack a poly(A) tail.

Figure 6.4: cPPRcaps poly(A) lowers CytB and ND1 transcripts. RNA isolated from mitochondria was analysed by northern blotting after expression of cPPRcaps poly(A) and cPPRcaps NRE. The blot was stripped and re-probed with probes against mitochondrial mRNAs and ribosomal RNAs (RNR1 and RNR2).

The northern blots indicate that the levels of mitochondrial transcripts are not

affected by the PPR proteins, suggesting that the binding of the proteins does not

compromise RNA stability, with the exception of ND1 and cytB by cPPRcaps-NRE after 3

days. This result is very intriguing and requires further examination. Overall, however, the general

decrease in protein synthesis from the mitochondrial transcripts indicates that the

poly(A)-binding PPR protein acts to decrease the efficiency of translation.

6.4 Summary

We have shown that PPR proteins possess modularity, similar to that observed in

PUF proteins. This means that the binding specificity of PPR repeats can be transferred to

various positions in the RNA recognition sequence to enable the design of engineered

PPRs with predictable binding. It has already been shown by numerous studies that

naturally occurring PPRs have diverse roles in mitochondrial gene expression. The modular

nature of these proteins may allow them to have versatile RNA regulatory functions.

Elucidation of the RNA recognition code of PPRs should enable the prediction of their RNA

targets and also their engineering as tools to specifically and selectively manipulate

mammalian mitochondrial gene expression. In this study, we have successfully engineered

a PPR protein that can target the poly(A) tail of mitochondrial mRNAs, providing evidence

that designer PPRs have the potential to be used to investigate mitochondrial mRNA

transcripts. The next step would be to fuse PPRs to effector domains to make new

discoveries about the post-transcriptional regulation of the mitochondrial genome.

Thus far, analysis of the human mitochondrial transcriptome has uncovered many

novel and important mechanisms for regulating the expression of its genes and the energy

metabolism in cells (Mercer et al., 2011). The unique features of mitochondrial transcripts

and the need for post-transcriptional regulation of its expression demand further

investigation. Engineered RBPs can be exploited to further understand the regulation of

mitochondrial gene expression and how these processes are coordinated in response to

environmental changes. Understanding the links between energy metabolism and

mitochondrial RNA control would provide significant insight into human diseases caused

by mutations in genes encoding mitochondrial proteins.

CHAPTER 7

Discussion

In all life forms binding of RNA by proteins is a vital function, given that all aspects

of gene expression and regulation require RNA-binding proteins. Here we have

successfully identified the amino acid code responsible for nucleotide recognition by PPR

proteins as well as expanded the base recognition scope of PUF proteins by engineering

them to specifically recognize cytosine. The availability of a code that enables the design

of proteins with predictable RNA targets will have many potential applications in

biotechnology and medicine. We have also provided evidence that PUF proteins can be

designed to bind any RNA of interest by designing a PUF protein containing 16 RNA-

binding repeats and that PPR proteins can be exploited as potential tools to study

mitochondrial gene expression.

Given that transcription and translation in eukaryotes are uncoupled, the former

taking place in the nucleus and the latter in the cytoplasm, this provides extensive

opportunities for the use of designer RBPs to regulate gene expression post-

transcriptionally. As the control of gene expression is faster and more precise at the post-

transcriptional level, engineered PUF and PPR proteins can be exploited for fine-tuning the

expression of endogenous genes or transgenes, not to mention that some aspects of gene

expression can only be controlled at the level of RNA (e.g. the cytoplasmic localization or

nuclear retention of mRNAs; Isaacs et al., 2004; Zenklusen et al., 2008; Johnston et al.,

2005). Numerous studies have shown that mutations or alteration in expression of either

RBPs or their binding sites in target transcripts are the cause of several human diseases

such as muscular atrophies, neurological disorders and cancer. Therefore, designer RBPs

could help elucidate the mechanism of these disorders and even provide potential

therapies in the future (Lukong et al., 2008; Musunuru et al., 2003; Kim et al., 2009).

This study has led to the successful identification of five unique PUF mutants that

can selectively interact with RNAs containing a cytosine, achieved via the randomization of

amino acids at positions 12 and 16 to encode all 20 possible amino acids and screening

for mutants using a yeast genetic system. All five variants had an arginine at position 16,

while the amino acids at position 12 were alanine, glycine, serine, threonine and cysteine.

A subsequent study by Dong et al. (2011) confirmed the specificity of the serine/arginine

pair that we found for cytosine residues and revealed that this specific interaction was

achieved via hydrogen bonding between O2 and N3 of the cytosine base with the side

chain of arginine in a mode comparable to the recognition of uracil by asparagine (Figure

7.1). Albeit different from uracil recognition (where the glutamine at position 16 contacts

the base), the serine at position 12 serves to position the arginine and does not contact

the base directly (Dong et al., 2011). This may explain the ability of cysteine, alanine,

threonine and glycine to replace serine to a certain degree.

We demonstrated that engineered PUF proteins may not always bind with the

same affinities as wild type proteins, which implies that one would have to consider which

PUF repeat combinations would result in the ideal binding affinity for target RNAs of

interest. We also illustrated that engineering a 16-repeat PUF is possible and that it

possessed enhanced specificity given that it bound to its cognate extended RNA target

more efficiently than the wild-type eight repeat PUF with its cognate RNA. A key piece of

material that would complete this study would be the crystal structure of the 16 repeat

PUF which will ultimately inform us if every single PUF repeat unit binds to each RNA base

in its extended RNA target.

Figure 7.1: Recognition of adenine, cytosine, guanine, and uracil by PUF repeats in the crystal structure of a mutant PUM1 (Filipovska and Rackham, 2011). Highlighted in red is the recognition of cytosine by PUF repeat with a serine at amino acid position 12 and an arginine at position 16.

The amino acid code for nucleotide binding by PPR proteins, by contrast, had long

remained speculative, largely because of the highly insoluble nature of PPR proteins. Its

successful identification is a major advance, as it should finally enable the

engineering of customizable PPR proteins. We confirmed that the amino acids at positions 4

and 34 are responsible for nucleotide recognition, with N4D34 binding uracil, T4N34 binding

adenine and N4S34/N4T34 binding cytosine. We are currently in the midst of deciphering

the guanine recognition code. The amino acid code described by Barkan et al. (2012) in

maize protein PPR10 was shown to hold true in a highly reduced, consensus PPR

architecture, illustrating that the code is likely to be general to most, if not all, naturally

occurring PPR proteins. Furthermore, the highly soluble and stable consensus PPR should

enable the binding preferences of other amino acid combinations that cannot be

predicted computationally to be readily deciphered. This will provide the means to

discover the binding sites of naturally occurring PPR proteins and to understand their roles

in cells. It was previously hypothesized that, like that of PUF proteins, PPR binding is modular,

and we have now shown that this is indeed the case, as we successfully designed a PPR

protein that could bind to the NRE RNA. We seek to ascertain the structure of the PPR-

RNA complex in order to provide further understanding of the fundamental principles of

its interaction and also to determine if any other residues in the PPR protein may further

enhance or stabilize binding to RNA, similar to amino acid 13 in PUF proteins.
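
The position 4 and 34 assignments described above lend themselves to a simple lookup. The following is a minimal Python sketch, not the design procedure used in this work. The names PPR_CODE, design_ppr_repeats and predict_bases are hypothetical, the choice of the first listed pair for cytosine is arbitrary, and repeats are assumed to be listed in the same order as the bases they contact. Guanine is left out because its recognition code has not yet been established here.

```python
# Residue pairs (position 4, position 34) stated in the text for each base.
# Guanine is deliberately absent: its code is still being deciphered.
PPR_CODE = {
    "U": [("N", "D")],
    "A": [("T", "N")],
    "C": [("N", "S"), ("N", "T")],
}


def design_ppr_repeats(target_rna):
    """Return one (base, residue4, residue34) tuple per base of the target."""
    design = []
    for base in target_rna.upper():
        if base not in PPR_CODE:
            raise ValueError(f"no residue pair listed for base {base!r}")
        # Assumption: when several pairs are listed, take the first one.
        res4, res34 = PPR_CODE[base][0]
        design.append((base, res4, res34))
    return design


def predict_bases(residue_pairs):
    """Reverse lookup: map (residue4, residue34) pairs to bases, '?' if unknown."""
    reverse = {pair: base for base, pairs in PPR_CODE.items() for pair in pairs}
    return "".join(reverse.get(pair, "?") for pair in residue_pairs)


if __name__ == "__main__":
    print(design_ppr_repeats("UAC"))
    print(predict_bases([("N", "D"), ("T", "N"), ("N", "T")]))
```

The reverse lookup mirrors the use suggested above for naturally occurring PPR proteins: residue pairs outside the table are reported as unknown rather than guessed.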

There are many similarities and differences between PUF and PPR proteins that are

worth highlighting. Firstly, although classic PUF repeats are 36 amino acids long and

PPR repeats are generally 35 amino acids long, the length distributions of the two repeat types

overlap, because shorter and longer repeats of both can be found in naturally

occurring proteins (Wickens et al., 2002; Small et al., 2000; Filipovska and Rackham, 2011).

The number of repeats in individual proteins is highly variable in PPR proteins but

is much more constrained in PUF proteins, with almost all PUF repeat arrays consisting

of eight RNA-binding repeats (Wickens et al., 2002). This may be a reflection of the

evolutionary constraints placed on PUFs rather than biophysical limitations, as we were

able to engineer a 16 repeat PUF that specifically recognized an extended RNA target.

Structurally, repeats of PPR and PUF proteins are predominantly alpha-helical; however,

the PUF repeat unit consists of three distinct alpha helices (Wang et al., 2001; Edwards et

al., 2001), while PPRs are predicted to have two alpha helices organized in a

helix-turn-helix structure (Howard et al., 2012; Small and Peeters, 2000). Detailed crystal structures

have revealed that PUF domains interact in a one-to-one stoichiometry with their RNA

targets; however, the mode by which PPRs bind their targets is yet to be verified in

crystal form.

The specific recognition of nucleic acids by PUFs is achieved via base-binding and

stacking amino acids found in the second helix of each PUF repeat, while we and Barkan et

al. (2012) have shown that the RNA recognition code of PPR proteins is determined by

the amino acids in positions four and 34. Co-variation analysis of fertility restorer genes

containing PPRs and their RNA targets in plant mitochondria by Fujii et al. (2011)

suggests that the amino acids in positions one, four and 34 are likely to be

responsible for RNA recognition, with all of these amino acids falling within or adjacent to

helix one of the PPR (Filipovska and Rackham, 2011). The alignment of helix one of PPRs

with helix two of PUF repeats, based on the similarity of their primary sequences, places

residues 12, 13 and 16 of the PUF repeat in similar positions to residues with high co-

variation in the PPR repeat. As previously described, amino acids of PUFs at positions 12

and 16 interact with the RNA nucleotides via hydrogen bonding, while stacking

interactions occur at amino acid position 13. Although the specificities of PUF repeats can

be reassigned without altering the amino acid at position 13 (Cheong and Hall, 2006), a

recent study by Koh et al. (2011) found that the choice of residue at this position can

influence the affinity and specificity of each repeat by modulating the stacking

interactions within the RNA-protein complex. In future studies, it will be of great interest

to determine if residues surrounding positions 4 and 34 of PPRs contribute to the affinity

or specificity of PPR-RNA association.

With regards to biotechnological applications, the availability of designer RBPs may

provide new tools to generate synthetic networks that are controlled at the level of RNA

and to enable better understanding of the complex patterns of gene expression in living

cells (Isaacs et al., 2006; Filipovska and Rackham, 2008). To date, the

biotechnological applications of PUFs have involved fusing them to effector domains with

well-characterized functions. The first designer PUF domains were engineered for the

tracking of endogenous mitochondria-encoded RNAs in living cells (Ozawa et al., 2007).

This was achieved by fusing two fragments of a fluorescent protein to adjacent PUF

domains that targeted ND6 mRNA, where a functional fluorophore was regenerated upon

folding when both segments were brought into close proximity (Ozawa et al., 2007).

Through this approach, Ozawa et al. (2007) observed that oxidative stress led to the

increased mobility of ND6 within mitochondria and stimulated its degradation. The main

advantage of using engineered PUF proteins is that endogenous RNAs can be tracked

without the necessity of heterologous overexpression or genetic manipulation of their

genes (Filipovska and Rackham, 2011).

More recently, Wang et al. (2009) engineered a PUF domain fused to a splicing regulatory

domain that targeted a site upstream of a splice acceptor site in the Bcl-X mRNA. Bcl-X

mRNA encodes a mitochondrial outer membrane protein that is involved in

programmed cell death, or apoptosis. Alternative splicing of this transcript allows the

selective production of two distinct isoforms of Bcl-X, Bcl-XL and Bcl-XS, which act as an

apoptotic inhibitor or an apoptotic activator, respectively (Boise et al., 1993; Chipuk et al.,

2010). By fusing the designed PUF domain to a splicing repressor domain derived from

hnRNP A1, they were able to increase the production of Bcl-XS and induce apoptosis in

cancer cells (Wang et al., 2009). Our discovery of a universal code for RNA recognition by

PUF proteins should enable various types of applications of designer RBPs to manipulate

numerous aspects of RNA metabolism. Recently, Dong et al. (2011) used the cytosine-

recognizing PUF repeats to engineer splicing factors that bound cytosine-containing splice

sites in VEGF-A pre-mRNA to increase anti-angiogenic vascular endothelial growth factor

isoform production in cultured mammalian cells. The discovery of genome editing has also

generated an assortment of new potential methods that can be used to interrogate

biological systems, with the aim of enhancing our understanding of basic biology that can

possibly lead to new methods for treating human disease. For example, fusing a PUF or

PPR protein to other effector domains can potentially facilitate novel studies in RNA

biology, similar to applications seen with Zinc Finger Nuclease (ZFN) and Transcription

Activator-Like Effector Nuclease (TALEN) in DNA biology (reviewed in Klug, 2010).

The manipulation of RNA has thus far been limited to antisense and RNA

interference technologies (Hebert et al., 2008). Although short interfering RNAs (siRNAs)

have proven to be very potent inhibitors of gene expression and have allowed for the

elucidation and better understanding of gene functions in many different cell lines and

organisms, there are several limitations to siRNA-knockdown technology. These

approaches are limited to lowering the abundance or expression of their target RNAs and,

in the case of siRNAs and miRNAs, their actions are confined to the cytoplasm as they

depend on endogenous RNA interference pathways (Carthew and Sontheimer, 2009). We

have shown that a poly(A)-targeted PPR protein, when fused to a mitochondrial targeting

sequence, could be imported into the mitochondria where it was able to reduce the

translation of mitochondrial-encoded proteins. One limitation of this study was that we

did not determine the maximum level of overall protein knockdown that could be

achieved. This could possibly be investigated through the establishment of stable cell lines

transfected with the cPPR proteins, prolonging PPR protein expression within the cells.

The use of designer RBPs to interrogate mitochondrial gene expression will enable

significant advances towards understanding mitochondrial gene expression and its

involvement in human disease. We aim to engineer PPR proteins targeting specific

mitochondrial RNA transcripts in order to study the regulation of mitochondrial gene

expression at the level of RNA processing, translation and degradation and to understand

how these processes are coordinated. Ultimately, this will provide vital insight into human

diseases caused by mutations in genes encoding mitochondrial proteins.

On the other hand, RNAi mediated by the introduction of long dsRNA has been

used to investigate gene functions in various organisms including plants, Drosophila and

mouse oocytes (Baulcombe, 1999; Kennerdell and Carthew, 1998; Misquitta and Paterson,

1999; Svoboda et al., 2000; Wianny and Zernicka-Goetz, 2000). Although long dsRNA

enables the effective silencing of gene expression by presenting various siRNA sequences

to the target mRNA, the applicability of this approach is limited in mammals due to the

sequence non-specific interferon response caused by the introduction of dsRNA longer

than 30 nt (Elbashir et al., 2001). Not only does interferon trigger the degradation of

mRNA through the induction of 2'-5' oligoadenylate synthetase, it also leads to the activation

of protein kinase R (PKR), which phosphorylates the translation initiation factor eIF2, leading to a

global inhibition of mRNA translation (Stark et al., 1998). We have shown that the number

of repeats within a single PUF domain can be adjusted from eight to 16; this would enable

one to target either a limited set of RNAs or only one species of RNA within an entire

transcriptome without some of the drawbacks of RNAi technologies.
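
The gain in selectivity from extending the repeat array can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that each repeat specifies exactly one base, that the four bases occur independently and with equal frequency, and that the transcriptome contains a hypothetical 10^8 nucleotides. None of these figures comes from the experiments described here, and the function name is likewise hypothetical.

```python
def expected_random_sites(n_repeats, transcriptome_nt):
    """Expected chance occurrences of one specific n-base site.

    Assumes uniform, independent base composition and approximates the
    number of possible start positions by the transcriptome length.
    """
    return transcriptome_nt / 4 ** n_repeats


# Hypothetical transcriptome size, chosen only to make the comparison concrete.
TRANSCRIPTOME_NT = 10 ** 8

if __name__ == "__main__":
    for n in (8, 16):
        distinct_sites = 4 ** n  # number of different n-base sequences
        chance_hits = expected_random_sites(n, TRANSCRIPTOME_NT)
        print(f"{n} repeats: {distinct_sites} possible sites, "
              f"{chance_hits:.3g} expected chance matches")
```

Under these assumptions an eight-base site is expected to occur by chance on the order of a thousand times, whereas a 16-base site is expected less than once, which is the sense in which a 16-repeat PUF could single out one RNA species.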

The distinctive properties of PUFs can provide novel opportunities to selectively

bind and regulate specific RNAs. This is exemplified by the ability of PUF proteins to

invade highly structured RNAs: they still bound their target sequences when we systematically

placed the target in increasingly base-paired RNA structures. For RNAi

technologies, this may be a challenge as Kedde et al. (2010) showed that a miRNA

targeting the 3'UTR of p27 mRNA was ineffective without prior binding of PUM1 to that

target site. p27 tumour suppressor is a cyclin-dependent kinase inhibitor that associates

with cyclin-dependent kinase 2 (CDK2) and cyclin E complexes to negatively regulate cell

cycle progression. Kedde et al. (2010) discovered that binding of PUM1 to the 3’UTR opens

up the local RNA structure, allowing access of miR-221 and miR-222. This study is an

example of how an RBP-induced structural change modulates the ability of miRNA to

regulate gene expression.

Designer RBPs can also be used to complement RNAi technologies so that the

consequences of knocking down or overexpressing endogenous genes can be used to

elucidate the functions of a specific gene. There is increasing evidence in support of the

cross-communication between naturally occurring miRNAs and RBPs (Janga, 2012).

TRIM71, a tripartite motif protein which possesses ubiquitin ligase activity, is a target of

the miRNA let-7 and it is responsible for AGO protein degradation through

ubiquitinylation, directly interfering with miRNA function (Rybak et al., 2009). This study is

one of many that suggest that RBPs and microRNAs may target overlapping regions of

RNA transcripts, describing a mechanism by which miRNAs may modulate the

post-transcriptional pathway by revealing or masking the regulatory targets of RBPs (Janga,

2012). In addition, RBPs near miRNA target sites can possibly regulate miRNA function by

either directly affecting miRNA-binding or indirectly by altering the RNA secondary

structure (van Kouwenhove et al., 2011). The interaction between RBPs and miRNAs can

provide the means of controlling the expression of target genes in a combinatorial

fashion.

In conclusion, since the discovery of the first consensus motifs in RBPs more than

twenty years ago, the number of known RBPs and the range of functions in which they participate

have increased vastly. It can be presumed that, with the rapid progress in understanding

molecular mechanisms underlying gene expression as well as the assortment of

uncharacterized protein repeat domains, we have only begun to see the potential that

engineered RBP scaffolds possess. Indeed there is still a large gap in the knowledge of the

structures of RBPs as well as their mode of interaction with RNAs and the organization of

these proteins in complex structures. It is reasonable to expect that designer RBPs will

lead to a shift in the way we manipulate and study complex transcriptomes and their

cellular functions. Several challenges still remain with this technology as there is ample

work to be done to further improve both the specificity of engineered RBPs as well as

methods employed to monitor off-target binding. However, the potential of engineering

RBP fusions to alter the splicing, translation, localization or degradation of chosen mRNAs

or other RNA species in order to further understand the mechanism of diseases or for

biotechnological applications is highly promising.

BIBLIOGRAPHY

1. Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P. (2002) Molecular Biology of the Cell. 4th edition. New York: Garland Science; The Transport of Proteins into Mitochondria and Chloroplasts. Available from: http://www.ncbi.nlm.nih.gov/books/NBK26828/

2. Allerson, C. R., Martinez, A., Yikilmaz, E., and Rouault, T. A. (2003). A high-capacity RNA affinity column for the purification of human IRP1 and IRP2 overexpressed in Pichia pastoris. RNA 9, 364-374.

3. Aloni, Y., and Attardi, G. (1971) Symmetrical in vivo transcription of mitochondrial DNA in HeLa cells. Proc Natl Acad Sci USA 68, 1757–1761.

4. Altschul, S. F., Koonin, E. V. (1998) Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci. 23(11), 444-447

5. Ameur, A., Zaghlool, A., Halvardson, J., Wetterbom, A., Gyllensten, U., Cavelier, L., Feuk, L. (2011) Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain. Nat Struct Mol Biol. 18(12), 1435-1440.

6. Anderson, S., Bankier, A. T., Barrell, B. G., de Bruijn, M. H., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier, P. H., Smith, A. J., Staden, R., Young, I. G. (1981) Sequence and organization of the human mitochondrial genome. Nature. 290(5806), 457-465.

7. Anderson, S., de Bruijn, M. H., Coulson, A. R., Eperon, I. C., Sanger, F., Young, I. G. (1982). Complete sequence of bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial genome. J Mol Biol. 156(4), 683-717.

8. Andrade, M. A., Perez-Iratxeta, C., and Ponting, C. P. (2001) Protein repeats: structures, functions and evolution. J. Struct. Biol. 134 (2-3), 117-131

9. Archer, S. K., Luu, V. D., de Queiroz, R. A., Brems, S., Clayton, C. (2009). Trypanosoma brucei PUF9 regulates mRNAs for proteins involved in replicative processes over the cell cycle. PLoS Pathog. 5(8), e1000565.

10. Asaoka-Taguchi, M., Yamada, M., Nakamura, A., Hanyu, K., Kobayashi, S. (1999). Maternal Pumilio acts together with Nanos in germline development in Drosophila embryos. Nat Cell Biol 1, 431–437.

11. Aubourg, S., Boudet, N., Kreis, M., Lecharny, A. (2000). In Arabidopsis thaliana, 1% of the genome codes for a novel protein family unique to plants. Plant Mol Biol. 42(4), 603-613

12. Aurora, R., Rose, G. D. (1998) Helix capping. Protein Sci. 7, 21–38

13. Ausubel, F. M. (1987). Current Protocols in Molecular Biology. (New York, Greene Pub. Associates and Wiley-Interscience)

14. Auweter, S. D., Oberstrass, F. C., Allain, F. H. (2006) Sequence-specific binding of single-stranded RNA: is there a code for recognition? Nucleic Acids Res. 34(17), 4943-4959.

15. Barkan, A., Rojas, M., Fujii, S., Yap, A., Chong, Y. S., Bond, C. S., Small, I. (2012). A combinatorial amino acid code for RNA recognition by pentatricopeptide repeat proteins. PLoS Genet. 8(8), e1002910.

16. Barker, D. D., Wang, C., Moore, J., Dickinson, L. K., Lehmann, R. (1992). Pumilio is essential for function but not for distribution of the Drosophila abdominal determinant Nanos. Genes Dev. 6, 2312–2326.

17. Barrell, B. G., Bankier, A. T., Drouin, J. (1979). A different genetic code in human mitochondria. Nature 282, 189–194.

18. Baulcombe, D. C. (1999). Gene silencing: RNA makes RNA makes no protein. Curr. Biol. 9, R599–R601

19. Beick, S., Schmitz-Linneweber, C., Williams-Carrier, R., Jensen, B., Barkan, A. (2008). The pentatricopeptide repeat protein PPR5 stabilizes a specific tRNA precursor in maize chloroplasts. Mol Cell Biol. 28(17), 5337-5347.

20. Bentley, D. L. (2005). Rules of engagement: co-transcriptional recruitment of pre-mRNA processing factors. Curr. Opin. Cell Biol., 17, 251–256

21. Bernstein, D., Hook, B., Hajarnavis, A., Opperman, L., Wickens, M. (2005) Binding specificity and mRNA targets of a C. elegans PUF protein, FBF-1. RNA. 11(4), 447-458.

22. Bertrand, E., Chartrand, P., Schaefer, M., Shenoy, S. M., Singer, R. H., Long, R. M. (1998). Localization of ASH1 mRNA particles in living yeast. Mol Cell. 2, 437-445.

23. Beyer, K., Dandekar, T., Keller, W. (1997). RNA ligands selected by cleavage stimulation factor contain distinct sequence motifs that function as downstream elements in 3'-end processing of pre-mRNA. J Biol Chem 272, 26769-26779.

24. Bienroth, S., Keller, W., Wahle, E. (1993). Assembly of a processive messenger RNA polyadenylation complex. EMBO J. 12, 585-594.

25. Bisaillon, M., and Lemay, G. (1997). Viral and cellular enzymes involved in synthesis of mRNA cap structure. Virology 236, 1-7.

26. Bobrowicz, A. J., Lightowlers, R. N., Chrzanowska-Lightowlers, Z. (2008) Polyadenylation and degradation of mRNA in mammalian mitochondria: a missing link? Biochem Soc Trans 36(3), 517–519.

27. Boch, J., Scholze, H., Schornack, S., Landgraf, A., Hahn, S., Kay, S., Lahaye, T., Nickstadt, A., Bonas, U. (2009). Breaking the code of DNA binding specificity of TAL-type III effectors. Science. 326, 1509-1512.

28. Boise, L. H., González-García, M., Postema, C. E., Ding, L., Lindsten, T., Turka, L. A., Mao, X., Nuñez, G., Thompson, C. B. (1993). bcl-x, a bcl-2-related gene that functions as a dominant regulator of apoptotic cell death. Cell. 74, 597–608.

29. Bonifacino, J. S. (1998). Current Protocols in Cell Biology. (New York, John Wiley).

30. Borowski, L. S., Szczesny, R. J., Brzezniak, L. K., Stepien, P. P. (2010) RNA turnover in human mitochondria: more questions than answers? Biochim Biophys Acta 1797, 1066–1070.

31. Brown, V., Jin, P., Ceman, S., Darnell, J. C., O'Donnell, W. T., Tenenbaum, S. A., Jin, X., Feng, Y., Wilkinson, K. D., Keene, J. D., Darnell, R. B., Warren, S. T. (2001). Microarray identification of FMRP-associated brain mRNAs and altered mRNA translational profiles in fragile X syndrome. Cell 107, 477-487.

32. Brusilow, S. W., and Horwich, A. L. (1996) Urea cycle enzymes. In: The metabolic and molecular bases of inherited disease, 7th ed, Scriver CR, Beaudet AL, Sly WS, Valle D (Eds), McGraw-Hill, New York, 1187-1232.

33. Brzezniak, L. K., Bijata, M., Szczesny, R. J., Stepien, P. P. (2011). Involvement of human ELAC2 gene product in 3' end processing of mitochondrial tRNAs. RNA Biol. 8(4), 616-626.

34. Camenisch, T. D., Brilliant, M. H., Segal, D. J. (2008). Critical parameters for genome editing using zinc finger nucleases. Med Chem. 8(7), 669-676.

35. Caro, F., Bercovich, N., Atorrasagasti, C., Levin, M. J., Vazquez, M. P. (2006). Trypanosoma cruzi: analysis of the complete PUF RNA-binding protein family. Exp Parasitol 113, 112–124.

36. Carroll, D. (2008). Progress and prospects: Zinc-finger nucleases as gene therapy agents Gene Therapy. 15, 1463–1468.

37. Carthew, R. W., Sontheimer, E. J. (2009). Origins and Mechanisms of miRNAs and siRNAs. Cell. 136(4), 642-655.

38. Cassiday, L. A., and Maher, L. J. (2001) In vivo recognition of an RNA aptamer by its transcription factor target. Biochemistry 40, 2433–2438

39. Cathomen, T., Joung, J. K. (2008). Zinc-finger nucleases: the next generation emerges. Mol Ther. 16(7), 1200-1207.

40. Cermak, T., Doyle, E. L., Christian, M., Wang, L., Zhang, Y., Schmidt, C., Baller, J. A., Somia, N. V., Bogdanove, A. J., Voytas, D. F. (2011). Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res. 39(12), e82

41. Chase, C. D. (2007). Cytoplasmic male sterility: a window to the world of plant mitochondrial-nuclear interactions. Trends Genet. 23(2), 81-90.

42. Cheong, C. G., Hall, T. M. (2006) Engineering RNA sequence specificity of Pumilio repeats. Proc Natl Acad Sci USA. 103(37), 13635-13639.

43. Chipuk, J. E., Moldoveanu, T., Llambi, F., Parsons, M. J., Green, D. R. (2010). The BCL-2 family reunion. Mol Cell. 37(3), 299-310.

44. Cho, E-J., Kobor, M. S., Kim, M., Greenblatt, J., Buratowski, S. (2001). Opposing effects of Ctk1 kinase and Fcp1 phosphatase at Ser 2 of the RNA polymerase II C-terminal domain. Genes & Dev. 15, 3319-3329

45. Cho, P. F., Gamberi, C., Cho-Park, Y. A., Cho-Park, I. B., Lasko, P., Sonenberg, N. (2006) Cap-dependent translational inhibition establishes two opposing morphogen gradients in drosophila embryos. Curr Biol. 16, 2035–2041.

46. Chong, S., Mersha, F. B., Comb, D. G., Scott, M. E., Landry, D., Vence, L. M., Perler, F. B., Benner, J., Kucera, R. B., Hirvonen, C. A., Pelletier, J. J., Paulus, H., Xu, M.-Q. (1997) Single-column purification of free recombinant proteins using a self-cleavable affinity tag derived from a protein splicing element. Gene 192, 271-281.

47. Chong, S., Shao, Y., Paulus, H., Benner, J., Perler, F. B., Xu, M.-Q. (1996) Protein splicing involving the Saccharomyces cerevisiae VMA intein: the steps in the splicing pathway, side reactions leading to protein cleavage, and establishment of an in vitro splicing system. J. Biol. Chem. 271, 22159-22168.

48. Christianson, T. W., Clayton, D. A. (1988) A tridecamer DNA sequence supports human mitochondrial RNA 3’-end formation in vitro. Mol Cell Biol, 8, 4502–4509.

49. Cléry, A., Blatter, M., Allain, F. H. (2008). RNA recognition motifs: boring? Not quite. Curr Opin Struct Biol. 18(3), 290-298.

50. Colgan, D. F., and Manley, J. L. (1997). Mechanism and regulation of mRNA polyadenylation. Genes Dev 11, 2755-2766.

51. Coll, O., Villalba, A., Bussotti, G., Notredame, C., Gebauer, F. (2010). A novel, noncanonical mechanism of cytoplasmic polyadenylation operates in Drosophila embryogenesis. Genes Dev. 24(2), 129-134

52. Coulombe, B., Burton, Z. F. (1999). DNA bending and wrapping around RNA polymerase: a "revolutionary" model describing transcriptional mechanisms. Microbiol Mol Biol Rev. 63(2), 457-478

53. Cramer, P. (2004). RNA polymerase II structure: From core to functional complexes. Curr Opin Genet Dev 14, 218–226.

54. Cramer, P., Bushnell, D. A., Kornberg, R. D. (2001). Structural basis of transcription: RNA polymerase II at 2.8 angstrom resolution. Science 292, 1863-1876.

55. Crittenden, S. L., Bernstein, D. S., Bachorik, J. L., Thompson, B. E., Gallegos, M., Petcherski, A. G., Moulder, G., Barstead, R., Wickens, M., and Kimble, J. (2002). A conserved RNA-binding protein controls germline stem cells in Caenorhabditis elegans. Nature 417, 660–663.

56. Cui, L., Fan, Q., Li, J. (2002). The malaria parasite Plasmodium falciparum encodes members of the Puf RNA-binding protein family with conserved RNA binding activity. Nucleic Acids Res. 30(21), 4607-4617.

57. Cusack, S. (1997) Aminoacyl-tRNA synthetases. Curr Opin Struct Biol 7, 881–889.

58. Cushing, D. A., Forsthoefel, N. R., Gestaut, D. R., Vernon, D. M. (2005). Arabidopsis emb175 and other ppr knockout mutants reveal essential roles for pentatricopeptide repeat (PPR) proteins in plant embryogenesis. Planta. 221(3), 424-436.

59. Dahmus, M. E. (1995). Phosphorylation of the C-terminal domain of RNA polymerase II. Biochim Biophys Acta 1261, 171-182.

60. Dantonel, J. C., Murthy, K. G., Manley, J. L., Tora, L. (1997). Transcription factor TFIID recruits factor CPSF for formation of 3' end of mRNA. Nature. 389(6649), 399-402.

61. Dasgupta, S., Bell, J. A. (1993) Design of helix ends. Amino acid preferences, hydrogen bonding and electrostatic interactions. Int. J. Pept. Protein Res 41, 499–511

62. Dassi, E., Quattrone, A. (2012). Tuning the engine: An introduction to resources on post-transcriptional regulation of gene expression. RNA Biol. 9(10)

63. Davies, S. M., Lopez Sanchez, M. I., Narsai, R., Shearwood, A. M., Razif, M. F., Small, I. D., Whelan, J., Rackham, O., Filipovska, A. (2012). MRPS27 is a pentatricopeptide repeat domain protein required for the translation of mitochondrially encoded proteins. FEBS Lett. 586(20), 3555-3561.

64. Davies, S. M., Rackham, O., Shearwood, A. M., Hamilton, K. L., Narsai, R., Whelan, J., Filipovska, A. (2009). Pentatricopeptide repeat domain protein 3 associates with the mitochondrial small ribosomal subunit and regulates translation. FEBS Lett. 583(12), 1853-1858.

65. De Gregorio, E., Preiss, T., Hentze, M. W. (1999) Translation driven by an eIF4G core domain in vivo. EMBO J 18, 4865-4874.

66. Delannoy, E., Stanley, W. A., Bond, C. S., Small, I. D. (2007). Pentatricopeptide repeat (PPR) proteins as sequence-specificity factors in post-transcriptional processes in organelles. Biochem Soc Trans. 35(Pt 6), 1643-1647

67. Deng, Y., Singer, R. H., Gu, W. (2008). Translation of ASH1 mRNA is repressed by Puf6p-Fun12p/eIF5B interaction and released by CK2 phosphorylation. Genes Dev. 22, 1037–1050.

68. Desjarlais, J. R., and Berg, J. M. (1993) Use of a zinc-finger consensus sequence framework and specificity rules to design specific DNA binding proteins. Proc Natl Acad Sci USA. 90(6), 2256-2260.

69. Dichtl, B., Blank, D., Sadowski, M., Hübner, W., Weiser, S., Keller, W. (2002). Yhh1p/Cft1p directly links poly(A) site recognition and RNA polymerase II transcription termination. EMBO J 21(15), 4125-4135.

70. Dong, S., Wang, Y., Cassidy-Amstutz, C., Lu, G., Bigler, R., Jezyk, M. R., Li, C., Hall, T. M., Wang, Z. (2011). Specific and modular binding code for cytosine recognition in Pumilio/FBF (PUF) RNA-binding domains. J Biol Chem. 286(30), 26732-26742.

71. Dreyfuss, G., Kim, V. N., and Kataoka, N. (2002). Messenger-RNA-binding proteins and the messages they carry. Nat Rev Mol Cell Biol 3, 195-205.

72. Dubendorff, J. W., and Studier, F. W. (1991) Controlling basal expression in an inducible T7 expression system by blocking the target T7 promoter with lac repressor. J. Mol. Biol. 219, 45-59.

73. Dubnau, J., Chiang, A. S., Grady, L., Barditch, J., Gossweiler, S., McNeil, J., Smith, P., Buldoc, F., Scott, R., Certa, U., Broger, C., Tully, T. (2003). The staufen/pumilio pathway is involved in Drosophila long-term memory. Curr Biol. 13(4), 286-96.

74. Eckmann, C. R., Kraemer, B., Wickens, M., and Kimble, J. (2002). GLD-3, a bicaudal-C homolog that inhibits FBF to control germline sex determination in C. elegans. Dev. Cell 3, 697–710

75. Edwards, T. A., Pyle, S. E., Wharton, R. P., Aggarwal, A. K. (2001). Structure of Pumilio reveals similarity between RNA and peptide binding motifs. Cell. 105(2), 281-289.

76. Elbashir, S. M., Harborth, J., Lendeckel, W., Yalcin, A., Weber, K., Tuschl, T. (2001). Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature. 411(6836), 494-498

77. Eliyahu, E., Pnueli, L., Melamed, D., Scherrer, T., Gerber, A. P., Pines, O., Rapaport, D., Arava, Y. (2010) Tom20 mediates localization of mRNAs to mitochondria in a translation-dependent manner. Mol Cell Biol. 30, 284–294.

78. Fabian, M. R., Mathonnet, G., Sundermeier, T., Mathys, H., Zipprich, J. T., Svitkin, Y. V., Rivas, F., Jinek, M., Wohlschlegel, J., Doudna, J. A., Chen, C. Y., Shyu, A. B., Yates, J. R. 3rd, Hannon, G. J., Filipowicz, W., Duchaine, T. F., Sonenberg, N. (2009). Mammalian miRNA RISC recruits CAF1 and PABP to affect PABP-dependent deadenylation. Mol Cell. 35(6), 868-880.

79. Falkenberg, M., Gaspari, M., Rantanen, A., Trifunovic, A., Larsson, N. G., Gustafsson, C. M. (2002). Mitochondrial transcription factors B1 and B2 activate transcription of human mtDNA. Nat Genet. 31, 289–294.

80. Falkenberg, M., Larsson, N. G., Gustafsson, C. M. (2007). DNA replication and transcription in mammalian mitochondria. Annu Rev Biochem. 76, 679–699.

81. Filipovska, A., Rackham, O. (2008). Building a Parallel Metabolism within the Cell. ACS Chem Biol. 3(1), 51-63.

82. Filipovska, A., Rackham, O. (2011) Designer RNA-binding proteins: New tools for manipulating the transcriptome. RNA Biol. 8(6), 978-983.

83. Forbes, A., Lehmann, R. (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development, 125(4), 679-690.

84. Francischini, C. W., Quaggio, R. B. (2009). Molecular characterization of Arabidopsis thaliana PUF proteins-binding specificity and target candidates. Febs J. 276, 5456–5470

85. Fujii, S., Bond, C. S., Small, I. D. (2011). Selection patterns on restorer-like genes reveal a conflict between nuclear and mitochondrial genomes throughout angiosperm evolution. Proc Natl Acad Sci USA. 108(4), 1723-1728.

86. Gagliardi, D., Stepien, P. P., Temperley, R. J., Lightowlers, R. N., Chrzanowska-Lightowlers, Z. M. (2004) Messenger RNA stability in mitochondria: different means to an end. Trends Genet. 20, 260–267.

87. Galgano, A., Forrer, M., Jaskiewicz, L., Kanitz, A., Zavolan, M., Gerber, A. P. (2008) Comparative analysis of mRNA targets for human PUF-family proteins suggests extensive interaction with the miRNA regulatory system. PLoS One 3, e3164

88. Gamberi, C., Peterson, D. S., He, L., Gottlieb, E. (2002). An anterior function for the Drosophila posterior determinant Pumilio. Development. 129, 2699–2710.

89. García-Rodríguez, L. J., Gay, A. C., Pon, L. A. (2007) Puf3p, a Pumilio family RNA binding protein, localizes to mitochondria and regulates mitochondrial biogenesis and motility in budding yeast. J Cell Biol. 176(2), 197-207.

90. Gerber, A. P., Herschlag, D., and Brown, P. O. (2004). Extensive association of functionally and cytotopically related mRNAs with Puf family RNA-binding proteins in yeast. PLoS Biol 2, E79

91. Gietz, R. D., Woods, R. A. (2002). Transformation of yeast by lithium acetate/single-stranded carrier DNA/polyethylene glycol method. Methods Enzymol. 350, 87-96

92. Glisovic, T., Bachorik, J. L., Yong, J., Dreyfuss, G. (2008) RNA-binding proteins and post-transcriptional gene regulation. FEBS Lett. 582(14), 1977-1986.

93. Gobert, A., Gutmann, B., Taschner, A., Gössringer, M., Holzmann, J., Hartmann, R. K., Rossmanith, W., Giegé, P. (2010). A single Arabidopsis organellar protein has RNase P activity. Nat Struct Mol Biol. 17(6), 740-744.

94. Graveley, B. R. (2000) Sorting out the complexity of SR protein functions. RNA. 6(9), 1197-1211.

95. Grudzien, E., Kalek, M., Jemielity, J., Darzynkiewicz, E., Rhoads, R. E. (2006). Differential inhibition of mRNA degradation pathways by novel cap analogs. J Biol Chem. 281(4), 1857-1867.

96. Gu, W., Deng, Y., Zenklusen, D., Singer, R. H. (2004) A new yeast PUF family protein, Puf6p, represses ASH1 mRNA translation and is required for its localization. Genes Dev. 18, 1452–1465.

97. Gupta, Y. K., Nair, D. T., Wharton, R. P., Aggarwal, A. K. (2008). Structures of human Pumilio with noncognate RNAs reveal molecular mechanisms for binding promiscuity. Structure. 16(4), 549-557

98. Hammani, K., des Francs-Small, C. C., Takenaka, M., Tanz, S. K., Okuda, K., Shikanai, T., Brennicke, A., Small, I. (2011). The pentatricopeptide repeat protein OTP87 is essential for RNA editing of nad7 and atp1 transcripts in Arabidopsis mitochondria. J Biol Chem. 286(24), 21361-21371.

99. Händel, E. M., Alwin, S., Cathomen, T. (2009) Expanding or restricting the target site repertoire of zinc-finger nucleases: the inter-domain linker as a major determinant of target site selectivity. Mol. Ther. 17(1), 104–111.

100. Hebert, C. G., Valdes, J. J., Bentley, W. E. (2008). Beyond silencing - engineering applications of RNA interference and antisense technology for altering cellular phenotype. Curr Opin Biotechnol 19, 500-505.

101. Hesselberth, J. R., and Ellington, A. D. (2002). A (ribo) switch in the paradigms of genetic regulation. Nat Struct Biol 9, 891-893.

102. Hieronymus, H., Silver, P. A. (2004) A systems view of mRNP biology. Genes Dev. 18(23), 2845-2860.

103. Hirose, Y., Manley, J. L. (1998). RNA polymerase II is an essential mRNA polyadenylation factor. Nature 395, 93-96.

104. Hirose, Y., Ohkuma, Y. (2007). Phosphorylation of the C-terminal domain of RNA polymerase II plays central roles in the integrated events of eucaryotic gene expression. J Biochem 141, 601–608

105. Hockemeyer, D., Wang, H., Kiani, S., Lai, C. S., Gao, Q., Cassady, J. P., Cost, G. J., Zhang, L., Santiago, Y., Miller, J. C., Zeitler, B., Cherone, J. M., Meng, X., Hinkley, S. J., Rebar, E. J., Gregory, P. D., Urnov, F. D., Jaenisch, R. (2011). Genetic engineering of human pluripotent cells using TALE nucleases. Nat Biotechnol. 29(8), 731-734.

106. Holzmann, J., Frank, P., Loffler, E., Bennett, K. L., Gerner, C., Rossmanith, W. (2008). RNase P without RNA: identification and functional reconstitution of the human mitochondrial tRNA processing enzyme. Cell. 135, 462–474.

107. Holzmann, J., Rossmanith, W. (2009). tRNA recognition, processing, and disease: hypotheses around an unorthodox type of RNase P in human mitochondria. Mitochondrion. 9, 284–288.

108. Hook, B. A., Goldstrohm, A. C., Seay, D. J., Wickens, M. (2007) Two yeast PUF proteins negatively regulate a single mRNA. J Biol Chem. 282, 15430–15438.

109. Hook, B., Bernstein, D., Zhang, B., Wickens, M. (2005). RNA-protein interactions in the yeast three-hybrid system: affinity, sensitivity, and enhanced library screening. RNA. 11(2), 227-233.

110. Horwich, A. L., Kalousek, F., Fenton, W. A., Pollock, R. A., Rosenberg, L. E. (1986) Targeting of pre-ornithine transcarbamylase to mitochondria: definition of critical regions and residues in the leader peptide. Cell. 44(3), 451-459.

111. Howard, M. J., Lim, W. H., Fierke, C. A., Koutmos, M. (2012). Mitochondrial ribonuclease P structure provides insight into the evolution of catalytic strategies for precursor-tRNA 5' processing. Proc Natl Acad Sci USA. 109(40), 16149-16154.

112. Hudson, B. P., Martinez-Yamout, M. A., Dyson, H. J., Wright, P. E. (2004). Recognition of the mRNA AU-rich element by the zinc finger domain of TIS11d. Nat Struct Mol Biol. 11, 257–264.

113. Isaacs, F. J., Dwyer, D. J., Collins, J. J. (2006) RNA synthetic biology. Nat Biotechnol. 24(5), 545-554

114. Isaacs, F. J., Dwyer, D. J., Ding, C., Pervouchine, D. D., Cantor, C. R., Collins, J. J. (2004) Engineered riboregulators enable post-transcriptional control of gene expression. Nat Biotechnol. 22(7), 841-847.

115. Ito, T., Tashiro, K., Muta, S., Ozawa, R., Chiba, T., Nishizawa, M., Yamamoto, K., Kuhara, S., Sakaki, Y. (2000) Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc Natl Acad Sci USA. 97(3), 1143-1147.

116. Jackson Jr., J. S., Houshmandi, S. S., Lopez Leban, F., and Olivas, W. M. (2004). Recruitment of the Puf3 protein to its mRNA target for regulation of mRNA decay in yeast. RNA 10, 1625–1636

117. Jan, E., Motzny, C. K., Graves, L. E., and Goodwin, E. B. (1999). The STAR protein, GLD-1, is a translational regulator of sexual identity in Caenorhabditis elegans. EMBO J 18, 258-269

118. Janga, S. C. (2012). From specific to global analysis of posttranscriptional regulation in eukaryotes: posttranscriptional regulatory networks. Brief Funct Genomics. [Epub ahead of print]

119. Jiao, X., Xiang, S., Oh, C., Martin, C. E., Tong, L., Kiledjian, M. (2010) Identification of a quality-control mechanism for mRNA 5'-end capping. Nature. 467(7315), 608-611.

120. Kahvejian, A., Svitkin, Y. V., Sukarieh, R., M’Boutchou, M. N., Sonenberg, N. (2005). Mammalian poly(A)-binding protein is a eukaryotic translation initiation factor, which acts via multiple mechanisms. Genes Dev. 19, 104–113

121. Kanki, T., Ohgaki, K., Gaspari, M., Gustafsson, C. M., Fukuoh, A., Sasaki, N., Hamasaki, N., Kang, D. (2004) Architectural role of mitochondrial transcription factor A in maintenance of human mitochondrial DNA. Mol Cell Biol. 24, 9823–9834.

122. Kaye, J. A., Rose, N. C., Goldsworthy, B., Goga, A., L'Etoile, N. D. (2009). A 3′UTR pumilio-binding element directs translational activation in olfactory sensory neurons. Neuron. 61, 57–70.

123. Kedde, M., van Kouwenhove, M., Zwart, W., Oude Vrielink, J. A., Elkon, R., Agami, R. (2010). A Pumilio-induced RNA structure switch in p27-3' UTR controls miR-221 and miR-222 accessibility. Nat Cell Biol. 12(10), 1014-1020

124. Keene, J. D., and Tenenbaum, S. A. (2002). Eukaryotic mRNPs may represent posttranscriptional operons. Mol Cell 9, 1161-1167.

125. Kennedy, B. K., Austriaco, N. R. Jr, Zhang, J., Guarente, L. (1995) Mutation in the silencing gene SIR4 can delay aging in S. cerevisiae. Cell 80, 485–496.

126. Kennedy, B. K., Gotta, M., Sinclair, D. A., Mills, K., McNabb, D. S., Murthy, M., Pak, S. M., Laroche, T., Gasser, S. M., Guarente, L. (1997) Redistribution of silencing proteins from telomeres to the nucleolus is associated with extension of life span in S. cerevisiae. Cell 89, 381–391.

127. Kennerdell, J. R., Carthew, R. W. (1998). Use of dsRNA-mediated genetic interference to demonstrate that frizzled and frizzled 2 act in the wingless pathway. Cell 95, 1017–1026.

128. Kim, M. Y., Hur, J., Jeong, S. (2009). Emerging roles of RNA and RNA-binding protein network in cancer cells. BMB Rep 42, 125–130.

129. Kishore, S., Luber, S., Zavolan, M. (2010) Deciphering the role of RNA-binding proteins in the post-transcriptional control of gene expression. Brief Funct Genomics. 9(5-6), 391-404.

130. Kloc, M., Bilinski, S., Chan, A. P., Allen, L. H., Zearfoss, N. R., Etkin, L. D. (2001). RNA localization and germ cell determination in Xenopus. Int Rev Cytol 203, 63-91.

131. Kobayashi, K., Kawabata, M., Hisano, K., Kazama, T., Matsuoka, K., Sugita, M., Nakamura, T. (2012). Identification and characterization of the RNA binding surface of the pentatricopeptide repeat protein. Nucleic Acids Res. 40(6), 2712-2723.

132. Kobayashi, K., Suzuki, M., Tang, J., Nagata, N., Ohyama, K., Seki, H., Kiuchi, R., Kaneko, Y., Nakazawa, M., Matsui, M., Matsumoto, S., Yoshida, S., Muranaka, T. (2007) Lovastatin insensitive 1, a Novel pentatricopeptide repeat protein, is a potential regulatory factor of isoprenoid biosynthesis in Arabidopsis. Plant Cell Physiol. 48(2), 322-331.

133. Kobe, B. and Kajava, A. V. (2000) When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem. Sci. 25 (10), 509-515

134. Koc, E. C., Burkhart, W., Blackburn, K., Moyer, M. B., Schlatzer, D. M., Moseley, A., Spremulli, L. L. (2001). The large subunit of the mammalian mitochondrial ribosome. Analysis of the complement of ribosomal proteins present. J Biol Chem. 276, 43958–43969.

135. Koh, Y. Y., Wang, Y., Qiu, C., Opperman, L., Gross, L., Tanaka Hall, T. M., Wickens, M. (2011). Stacking interactions in PUF-RNA complexes. RNA. 17(4), 718-727.

136. Kohl, A., Binz, H. K., Forrer, P., Stumpp, M. T., Plückthun, A., Grütter, M. G. (2003) Designed to be stable: crystal structure of a consensus ankyrin repeat protein. Proc Natl Acad Sci USA. 100(4), 1700-1705.

137. Kolansky, D. M., Conboy, J. G., Fenton, W. A., Rosenberg, L. E. (1982). Energy-dependent translocation of the precursor of ornithine transcarbamylase by isolated rat liver mitochondria. J Biol Chem. 257(14), 8467-8471.

138. Kotera, E., Tasaka, M., Shikanai, T. (2005). A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature. 433(7023), 326-330.

139. Koussevitzky, S., Nott, A., Mockler, T. C., Hong, F., Sachetto-Martins, G., Surpin, M., Lim, J., Mittler, R., Chory, J. (2007). Signals from chloroplasts converge to regulate nuclear gene expression. Science. 316(5825), 715-719

140. Kuehner, J. N., Pearson, E. L., Moore, C. (2011). Unravelling the means to an end: RNA polymerase II transcription termination. Nat Rev Mol Cell Biol 12, 283–294

141. Kühl, I., Dujeancourt, L., Gaisne, M., Herbert, C. J., Bonnefoy, N. (2011). A genome wide study in fission yeast reveals nine PPR proteins that regulate mitochondrial gene expression. Nucleic Acids Res. 39(18), 8029-8041.

142. Kumar, S., Bansal, M. (1998) Dissecting alpha-helices: position-specific analysis of alpha-helices in globular proteins. Proteins. 31, 460–476.

143. Lamont, L. B., Crittenden, S. L., Bernstein, D., Wickens, M., Kimble, J. (2004) FBF-1 and FBF-2 regulate the size of the mitotic region in the C. elegans germline. Dev Cell 7, 697–707.

144. Latham, V. M., Jr., Kislauskis, E. H., Singer, R. H., Ross, A. F. (1994). Beta-actin mRNA localization is regulated by signal transduction mechanisms. J Cell Biol 126, 1211-1219.

145. Lee, E., Yeo, A., Kraemer, B., Wickens, M., and Linial, M. L. (1999). The gag domains required for avian retroviral RNA encapsidation determined by using two independent assays. J Virol 73, 6282-6292.

146. Lee, M. H., Hook, B., Pan, G., Kershner, A. M., Merritt, C., Seydoux, G., Thomson, J. A., Wickens, M., Kimble, J. (2007). Conserved regulation of MAP kinase expression by PUF RNA-binding proteins. PLoS Genet, 3(12), e233.

147. Lehmann, R., Nusslein-Volhard, C. (1987). Involvement of the pumilio gene in the transport of an abdominal signal in the Drosophila embryo. Nature 329, 167–170.

148. Levadoux, M., Mahon, C., Beattie, J. H., Wallace, H. M., and Hesketh, J. E. (1999). Nuclear import of metallothionein requires its mRNA to be associated with the perinuclear cytoskeleton. J Biol Chem 274, 34961-34966.

149. Levinger, L., Morl, M., Florentz, C. (2004) Mitochondrial tRNA 3’ end metabolism and human disease. Nucleic Acids Res 32, 5430–5441.

150. Li, T., Huang, S., Zhao, X., Wright, D. A., Carpenter, S., Spalding, M. H., Weeks, D. P., Yang, B. (2011). Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes. Nucleic Acids Res. 39(14), 6315-6325.

151. Licatalosi, D. D., Geiger, G., Minet, M., Schroeder, S., Cilli, K., McNeil, J. B., Bentley, D.L. (2002). Functional interaction of yeast pre-mRNA 3′ end processing factors with RNA polymerase II. Mol. Cell. 9, 1101–1111

152. Lightowlers, R. N., Chrzanowska-Lightowlers, Z. M. (2008). PPR (pentatricopeptide repeat) proteins in mammals: important aids to mitochondrial gene expression. Biochem J. 416(1), e5-6

153. Lin, H., Spradling, A. C. (1997) A novel group of pumilio mutations affects the asymmetric division of germline stem cells in the Drosophila ovary. Development 124, 2463–2476.

154. Lipinski, K. A., Puchta, O., Surendranath, V., Kudla, M., Golik, P. (2011). Revisiting the yeast PPR proteins--application of an Iterative Hidden Markov Model algorithm reveals new members of the rapidly evolving family. Mol Biol Evol. 28(10), 2935-2948.

155. Liu, Q., Paroo, Z. (2010). Biochemical principles of small RNA pathways. Annu Rev Biochem. 79, 295-319

156. Lodish, H., Berk, A., Zipursky, S. L., et al. (2000). Molecular Cell Biology. 4th edition. New York: W. H. Freeman. Section 11.2, Processing of Eukaryotic mRNA. Available from: http://www.ncbi.nlm.nih.gov/books/NBK21563/

157. Long, R. M., Singer, R. H., Meng, X., Gonzalez, I., Nasmyth, K., Jansen, R. P. (1997). Mating type switching in yeast controlled by asymmetric localization of ASH1 mRNA. Science 277, 383-387.

158. Lopez Sanchez, M. I. G., Mercer, T. R., Davies, S. M., Shearwood, A-M. J., Nygard, K. K. A., Richman, T. R., Mattick, J. S., Rackham, O., Filipovska, A. (2011) RNA processing in human mitochondria. Cell Cycle 10, 1–13

159. Loughlin, F. E., Mansfield, R. E., Vaz, P. M., McGrath, A. P., Setiyaputra, S., Gamsjaeger, R., Chen, E. S., Morris, B. J., Guss, J. M., Mackay, J. P. (2009) The zinc fingers of the SR-like protein ZRANB2 are single stranded RNA-binding domains that recognize 5’ splice site-like sequences. Proc Natl Acad Sci USA. 106, 5581–5586.

160. Lu, D., Searles, M. A., Klug, A. (2003). Crystal structure of a zinc-finger-RNA complex reveals two modes of molecular recognition. Nature. 426, 96–100.

161. Lublin, A. L., Evans, T. C. (2007) The RNA-binding proteins PUF-5, PUF-6, and PUF-7 reveal multiple systems for maternal mRNA regulation during C. elegans oogenesis. Dev Biol 303, 635–649.

162. Lukong, K. E., Chang, K. W., Khandjian, E. W., Richard, S. (2008). RNA-binding proteins in human genetic disease. Trends Genet 24, 416–425.

163. Lurin, C., Andrés, C., Aubourg, S., Bellaoui, M., Bitton, F., Bruyère, C., Caboche, M., Debast, C., Gualberto, J., Hoffmann, B., Lecharny, A., Le Ret, M., Martin-Magniette, M. L., Mireau, H., Peeters, N., Renou, J. P., Szurek, B., Taconnat, L., Small, I. (2004). Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell. 16(8), 2089-2103.

164. Mackay, J. P., Font, J., Segal, D. J. (2011). The prospects for designer single-stranded RNA-binding proteins. Nat Struct Mol Biol. 18, 256-261.

165. Main, E. R., Jackson, S. E., Regan, L. (2003). The folding and design of repeat proteins: reaching a consensus. Curr Opin Struct Biol. 13(4), 482-489.

166. Mandel, C. R., Bai, Y., Tong, L. (2008) Protein factors in pre-mRNA 3'-end processing. Cell Mol Life Sci 65, 1099-1122.

167. Maniatis, T., and Reed, R. (2002). An extensive network of coupling among gene expression machines. Nature 416, 499-506.

168. Margeot, A., Blugeon, C., Sylvestre, J., Vialette, S., Jacq, C., Corral-Debrinski, M. (2002). In Saccharomyces cerevisiae, ATP2 mRNA sorting to the vicinity of mitochondria is essential for respiratory function. EMBO J 21, 6893-6904.

169. Martin, F., Schaller, A., Eglite, S., Schumperli, D., and Muller, B. (1997). The gene for histone RNA hairpin binding protein is located on human chromosome 4 and encodes a novel type of RNA binding protein. EMBO J 16, 769-778.

170. McCracken, S., Fong, N., Rosonina, E., Yankulov, K., Brothers, G., Siderovski, D., Hessel, A., Foster, S., Shuman, S., Bentley, D. L. (1997a). 5'-Capping enzymes are targeted to pre-mRNA by binding to the phosphorylated carboxy-terminal domain of RNA polymerase II. Genes Dev 11, 3306-3318.

171. McCracken, S., Fong, N., Yankulov, K., Ballantyne, S., Pan, G., Greenblatt, J., Patterson, S. D., Wickens, M., Bentley, D. L. (1997b). The C-terminal domain of RNA polymerase II couples mRNA processing to transcription. Nature 385, 357-361.

172. McCullough, A. J., and Schuler, M. A. (1997). Intronic and exonic sequences modulate 5' splice site selection in plant nuclei. Nucleic Acids Res 25, 1071-1077.

173. Mee, C. J., Pym, E. C., Moffat, K. G., Baines, R. A. (2004). Regulation of neuronal excitability through pumilio-dependent control of a sodium channel gene. J Neurosci 24, 8695–8703

174. Menon, K. P., Andrews, S., Murthy, M., Gavis, E. R., Zinn, K. (2009). The translational repressors Nanos and Pumilio have divergent effects on presynaptic terminal growth and postsynaptic glutamate receptor subunit composition. J Neurosci. 29, 5558–5572.

175. Menon, K. P., Sanyal, S., Habara, Y., Sanchez, R., Wharton, R. P., Ramaswami, M., Zinn, K. (2004) The translational repressor Pumilio regulates presynaptic morphology and controls postsynaptic accumulation of translation factor eIF-4E. Neuron. 44, 663–676.

176. Mercer, T. R., Neph, S., Dinger, M. E., Crawford, J., Smith, M. A., Shearwood, A. M., Haugen, E., Bracken, C. P., Rackham, O., Stamatoyannopoulos, J. A., Filipovska, A., Mattick, J. S. (2011) The human mitochondrial transcriptome. Cell 146, 645–658.

177. Merrick, W. C., and Sonenberg, N. (1997). Assays for eukaryotic translation factors that bind mRNA. Methods 11, 333-342.

178. Mili, S., Piñol-Roma, S. (2003). LRP130, a pentatricopeptide motif protein with a noncanonical RNA-binding domain, is bound in vivo to mitochondrial and nuclear RNAs. Mol Cell Biol. 23(14), 4972-4982.

179. Miller, J. C., Tan, S., Qiao, G., Barlow, K. A., Wang, J., Xia, D. F., Meng, X., Paschon, D. E., Leung, E., Hinkley, S. J., Dulay, G. P., Hua, K. L., Ankoudinova, I., Cost, G. J., Urnov, F. D., Zhang, H. S., Holmes, M. C., Zhang, L., Gregory, P. D., Rebar, E. J. (2010). A TALE nuclease architecture for efficient genome editing. Nat Biotechnol. 29(2):143-148.

180. Miller, J., McLachlan, A. D., Klug, A. (1985). Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J. 4, 1609–1614.

181. Miller, S., Yasuda, M., Coats, J. K., Jones, Y., Martone, M. E., Mayford, M. (2002). Disruption of dendritic translation of CaMKIIalpha impairs stabilization of synaptic plasticity and memory consolidation. Neuron 36, 507-519.

182. Mingler, M. K., Hingst, A. M., Clement, S. L., Yu, L. E., Reifur, L., Koslowsky, D. J. (2006). Identification of pentatricopeptide repeat proteins in Trypanosoma brucei. Mol Biochem Parasitol. 150(1), 37-45.

183. Misquitta, L., Paterson, B. M. (1999) Targeted disruption of gene function in Drosophila by RNA interference (RNA-i): a role for nautilus in embryonic somatic muscle formation. Proc. Natl Acad. Sci. USA 96, 1451–1456.

184. Monshausen, M., Putz, U., Rehbein, M., Schweizer, M., DesGroseillers, L., Kuhl, D., Richter, D., and Kindler, S. (2001). Two rat brain staufen isoforms differentially bind RNA. J Neurochem 76, 155-165.

185. Montoya, J., Christianson, T., Levens, D., Rabinowitz, M., Attardi, G. (1982) Identification of initiation sites for heavy-strand and light-strand transcription in human mitochondrial DNA. Proc Natl Acad Sci USA 79, 7195–7199.

186. Montoya, J., Ojala, D., Attardi, G. (1981) Distinctive features of the 5’-terminal sequences of the human mitochondrial mRNAs. Nature 290, 465–470

187. Mootha, V. K., Lepage, P., Miller, K., Bunkenborg, J., Reich, M., Hjerrild, M., Delmonte, T., Villeneuve, A., Sladek, R., Xu, F., Mitchell, G. A., Morin, C., Mann, M., Hudson, T. J., Robinson, B., Rioux, J. D., Lander, E. S. (2003). Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics. Proc Natl Acad Sci USA. 100, 605–610.

188. Morbitzer, R., Römer, P., Boch, J., Lahaye, T. (2010). Regulation of selected genome loci using de novo-engineered transcription activator-like effector (TALE)-type transcription factors. Proc Natl Acad Sci USA. 107(50), 21617-21622.

189. Mori, M., Miura, S., Morita, T., Takiguchi, M., Tatibana, M. (1982) Ornithine transcarbamylase in liver mitochondria. Mol Cell Biochem. 49(2):97-111.

190. Mori, M., Morita, T., Ikeda, F., Amaya, Y., Tatibana, M., Cohen, P. P. (1981) Synthesis, intracellular transport, and processing of the precursors for mitochondrial ornithine transcarbamylase and carbamoyl-phosphate synthetase I in isolated hepatocytes. Proc Natl Acad Sci USA. 78(10), 6056-6060.

191. Morris, A. R., Mukherjee, N., Keene, J. D. (2008) Ribonomic analysis of human Pum1 reveals cis-trans conservation across species despite evolution of diverse mRNA target sets. Mol Cell Biol, 28, 4093–4103.

192. Mosavi, L. K., Minor, D. L. Jr., Peng, Z. Y. (2002) Consensus-derived structural determinants of the ankyrin repeat motif. Proc Natl Acad Sci USA. 99(25), 16029-16034.

193. Moscou, M. J., Bogdanove, A. J. (2009). A simple cipher governs DNA recognition by TAL effectors. Science. 326, 1501.

194. Muraro, N. I., Weston, A. J., Gerber, A. P., Luschnig, S., Moffat, K. G., Baines, R. A. (2008). Pumilio binds para mRNA and requires Nanos and Brat to regulate sodium current in Drosophila motoneurons. J Neurosci. 28, 2099–2109.

195. Murata, Y., Wharton, R. P. (1995). Binding of pumilio to maternal hunchback mRNA is required for posterior patterning in Drosophila embryos. Cell. 80(5), 747-756.

196. Murphy, W. I., Attardi, B., Tu, C., Attardi, G. (1975) Evidence for complete symmetrical transcription in vivo of mitochondrial DNA in HeLa cells. J Mol Biol 99, 809–814.

197. Mussolino, C., Morbitzer, R., Lütge, F., Dannemann, N., Lahaye, T., Cathomen, T. (2011). A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity. Nucleic Acids Res. 39(21), 9283-9293.

198. Musunuru, K. (2003) Cell-specific RNA-binding proteins in human disease. Trends Cardiovasc Med 13, 188–195.

199. Nagaike, T., Suzuki, T., Katoh, T., Ueda, T. (2005) Human mitochondrial mRNAs are stabilized with polyadenylation regulated by mitochondria-specific poly(A) polymerase and polynucleotide phosphorylase. J Biol Chem 280, 19721–19727.

200. Nagaike, T., Suzuki, T., Tomari, Y., Takemoto-Hori, C., Negayama, F., Watanabe, K., Ueda, T. (2001) Identification and characterization of mammalian mitochondrial tRNA nucleotidyltransferases. J Biol Chem 276, 40041–40049

201. Nagaike, T., Suzuki, T., Ueda, T. (2008) Polyadenylation in mammalian mitochondria: insights from recent studies. Biochim Biophys Acta 1777, 266–269.

202. Nakahata, S., Katsu, Y., Mita, K., Inoue, K., Nagahama, Y., Yamashita, M. (2001). Biochemical identification of Xenopus Pumilio as a sequence-specific cyclin B1 mRNA-binding protein that physically interacts with a Nanos homolog, Xcat-2, and a cytoplasmic polyadenylation element-binding protein. J. Biol. Chem. 276, 20945–20953

203. Nakahata, S., Kotani, T., Mita, K., Kawasaki, T., Katsu, Y., Nagahama, Y., Yamashita, M. (2003). Involvement of Xenopus Pumilio in the translational regulation that is specific to cyclin B1 mRNA during oocyte maturation. Mech Dev 120, 865–880.

204. Nakamura, T., Meierhoff, K., Westhoff, P., Schuster, G. (2003). RNA-binding properties of HCF152, an Arabidopsis PPR protein involved in the processing of chloroplast RNA. Eur J Biochem. 270(20), 4070-4081.

205. National Center for Biotechnology Information, PSSM Viewer, Accessed 15th July 2011, http://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi.

206. Nolde, M. J., Saka, N., Reinert, K. L., Slack, F. J. (2007) The Caenorhabditis elegans pumilio homolog, puf‐9, is required for the 3′UTR‐mediated repression of the let‐7 microRNA target gene, hbl‐1. Dev Biol 305(2), 551-563.

207. Nolte, R.T., Conlin, R.M., Harrison, S.C., Brown, R.S. (1998). Differing roles for zinc fingers in DNA recognition: structure of a six-finger transcription factor IIIA complex. Proc Natl Acad Sci USA. 95, 2938–2943.

208. Novy, R., Morris, B. (2001) Use of glucose to control basal expression in the pET System [Article], inNovations, 13, 8-10, retrieved from http://wolfson.huji.ac.il/expression/procedures/bacterial/Glucose%20supression.pdf

209. O’Brien, T. W. (1971) The general occurrence of 55 S ribosomes in mammalian liver mitochondria. J Biol Chem 246, 3409–3417.

210. Oberstrass, F. C., Auweter, S. D., Erat, M., Hargous, Y., Henning, A., Wenter, P., Reymond, L., Amir-Ahmady, B., Pitsch, S., Black, D. L., Allain, F. H. (2005). Structure of PTB bound to RNA: specific binding and implications for splicing regulation. Science. 309(5743), 2054-2057.

211. Ojala, D., Montoya, J., Attardi, G. (1981) tRNA punctuation model of RNA processing in human mitochondria. Nature 290, 470–474.

212. Okuda, K., Myouga, F., Motohashi, R., Shinozaki, K., Shikanai, T. (2007). Conserved domain structure of pentatricopeptide repeat proteins involved in chloroplast RNA editing. Proc Natl Acad Sci USA. 104(19), 8178-8183.

213. Oleynikov, Y., and Singer, R. H. (2003). Real-Time Visualization of ZBP1 Association with beta-Actin mRNA during Transcription and Localization. Curr Biol 13, 199-207.

214. Olivas, W., Parker, R. (2000). The Puf3 protein is a transcript-specific regulator of mRNA degradation in yeast. EMBO J. 19(23), 6602-6611

215. O'Toole, N., Hattori, M., Andres, C., Iida, K., Lurin, C., Schmitz-Linneweber, C., Sugita, M., Small, I. (2008). On the expansion of the pentatricopeptide repeat gene family in plants. Mol Biol Evol. 25(6):1120-1128

216. Ozawa, T., Natori, Y., Sato, M., Umezawa, Y. (2007). Imaging dynamics of endogenous mitochondrial RNA in single living cells. Nat Methods. 4, 413-419.

217. Padmanabhan, K., Richter, J. D. (2006) Regulated Pumilio-2 binding controls RINGO/Spy mRNA translation and CPEB activation. Genes Dev 20, 199–209.

218. Park, Y. W., Wilusz, J., and Katze, M. G. (1999). Regulation of eukaryotic protein synthesis: selective influenza viral mRNA translation is mediated by the cellular RNA-binding protein GRSF-1. Proc Natl Acad Sci U S A 96, 6694-6699.

219. Parker, R., Song, H. (2004). The enzymes and control of eukaryotic mRNA turnover. Nat Struct Mol Biol. 11(2), 121-127

220. Patel, S. B., Bellini, M. (2008). The assembly of a spliceosomal small nuclear ribonucleoprotein particle. Nucleic Acids Res. 36(20), 6482-6493.

221. Pavletich, N. P., Pabo, C. O. (1991). Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 Å. Science. 252, 809–817.

222. Pelham, H. R., Brown, D. D. (1980). A specific transcription factor that can bind either the 5S RNA gene or 5S RNA. Proc Natl Acad Sci USA. 77(7), 4170-4174.

223. Perrimon, N., Ni, J. Q., Perkins, L. (2010). In vivo RNAi: today and tomorrow. Cold Spring Harb Perspect Biol. 2(8), a003640.

224. Pfalz, J., Bayraktar, O. A., Prikryl, J., Barkan, A. (2009). Site-specific binding of a PPR protein defines and stabilizes 5' and 3' mRNA termini in chloroplasts. EMBO J. 28(14), 2042-2052.

225. Picard, B., Wegnez, M. (1979). Isolation of a 7S particle from Xenopus laevis oocytes: a 5S RNA-protein complex. Proc Natl Acad Sci USA. 76, 241-245.

226. Pieler, T., Hamm, J., Roeder, R. G. (1987). The 5S gene internal control region is composed of three distinct sequence elements, organized as two functional domains with variable spacing. Cell. 48, 91-100.

227. Pique, M., Lopez, J. M., Foissac, S., Guigo, R., Mendez, R. (2008). A combinatorial code for CPE-mediated translational control. Cell. 132, 434–448.

228. Pradet-Balade, B., Boulme, F., Beug, H., Mullner, E. W., Garcia-Sanz, J. A. (2001). Translation control: bridging the gap between genomics and proteomics? Trends Biochem Sci 26, 225-229.

229. Prikryl, J., Rojas, M., Schuster, G., Barkan, A. (2011). Mechanism of RNA stabilization and translational activation by a pentatricopeptide repeat protein. Proc Natl Acad Sci USA. 108(1), 415-420.

230. Prinz, S., Aldridge, C., Ramsey, S. A., Taylor, R. J., Galitski, T. (2007). Control of signaling in a MAP-kinase pathway by an RNA-binding protein. PLoS One. 2, e249.

231. Prostko, C. R., Brostrom, M. A., Brostrom, C. O. (1993). Reversible phosphorylation of eukaryotic initiation factor 2 alpha in response to endoplasmic reticular signaling. Mol Cell Biochem 127-128, 255-265.

232. Putz, U., Skehel, P., and Kuhl, D. (1996). A tri-hybrid system for the analysis and detection of RNA--protein interactions. Nucleic Acids Res 24, 4838-4840.

233. Quenault, T., Lithgow, T., Traven, A. (2011) PUF proteins: repression, activation and mRNA localization. Trends Cell Biol. 21(2), 104-112

234. Rackham, O., Chin, J. W. (2005) A network of orthogonal ribosome x mRNA pairs. Nat Chem Biol. 1(3), 159-166.

235. Rackham, O., Davies, S. M., Shearwood, A. M., Hamilton, K. L., Whelan, J., Filipovska, A. (2009). Pentatricopeptide repeat domain protein 1 lowers the levels of mitochondrial leucine tRNAs in cells. Nucleic Acids Res. 37(17), 5859-5867.

236. Rackham, O., Filipovska, A. (2012). The role of mammalian PPR domain proteins in the regulation of mitochondrial gene expression. Biochim Biophys Acta. 1819(9-10),1008-1016.

237. Rackham, O., Mercer, T. R., Filipovska, A. (2012). The human mitochondrial transcriptome and the RNA-binding proteins that regulate its expression. Wiley Interdiscip Rev RNA. 3(5), 675-695.

238. Rackham, O., Shearwood, A. M., Mercer, T. R., Davies, S. M., Mattick, J. S., Filipovska, A. (2011) Long noncoding RNAs are generated from the mitochondrial genome and regulated by nuclear-encoded proteins. RNA 17, 2085–2093

239. Rappsilber, J., Ryder, U., Lamond, A. I., Mann, M. (2002). Large-scale proteomic analysis of the human spliceosome. Genome Res 12, 1231-1245.

240. Richard, P., Manley, J. L. (2009). Transcription termination by nuclear RNA polymerases. Genes Dev, 23, 1247–1269.

241. Richardson, J. S., Richardson, D. C. (1988) Amino acid preferences for specific locations at the ends of alpha helices. Science 240, 1648–1652

242. Ringel, R., Sologub, M., Morozov, Y. I., Litonin, D., Cramer, P., Temiakov, D. (2011) Structure of human mitochondrial RNA polymerase. Nature. 478(7368), 269-273.

243. Rodeheffer, M. S., Boone, B. E., Bryan, A. C., Shadel, G. S. (2001). Nam1p, a protein involved in RNA processing and translation, is coupled to transcription through an interaction with yeast mitochondrial RNA polymerase. J Biol Chem. 276(11), 8616-8622.

244. Rodeheffer, M. S., Shadel, G. S. (2003) Multiple interactions involving the amino-terminal domain of yeast mtRNA polymerase determine the efficiency of mitochondrial protein synthesis. J Biol Chem 278, 18695–18701.

245. Rodriguez, C. R., Cho, E. J., Keogh, M. C., Moore, C. L., Greenleaf, A. L., Buratowski, S. (2000). Kin28, the TFIIH-associated carboxy-terminal domain kinase, facilitates the recruitment of mRNA processing machinery to RNA polymerase II. Mol. Cell. Biol. 20, 104–112.

246. Rorbach, J., Minczuk, M. (2012). The post-transcriptional life of mammalian mitochondrial RNA. Biochem. J. 444, 357–373.

247. Rorbach, J., Nicholls, T. J., Minczuk, M. (2011) PDE12 removes mitochondrial RNA poly(A) tails and controls translation in human mitochondria. Nucleic Acids Res. 39, 7750–7763.

248. Rossmanith, W., Holzmann, J. (2009). Processing mitochondrial (t)RNAs: new enzyme, old job. Cell Cycle 8, 1650–1653.

249. Ruzzenente, B., Metodiev, M. D., Wredenberg, A., Bratic, A., Park, C. B., Camara, Y., Milenkovic, D., Zickermann, V., Wibom, R., Hultenby, K., Erdjument-Bromage, H., Tempst, P., Brandt, U., Stewart, J. B., Gustafsson, C. M., Larsson, N. G. (2011). LRPPRC is necessary for polyadenylation and coordination of translation of mitochondrial mRNAs. EMBO J 31, 443–456.

250. Rybak, A., Fuchs, H., Hadian, K., Smirnova, L., Wulczyn, E. A., Michel, G., Nitsch, R., Krappmann, D., Wulczyn, F. G. (2009). The let-7 target gene mouse lin-41 is a stem cell specific E3 ubiquitin ligase for the miRNA pathway protein Ago2. Nat Cell Biol 11(12), 1411-1420

251. Saint-Georges, Y., Garcia, M., Delaveau, T., Jourdren, L., Le Crom, S., Lemoine, S., Tanty, V., Devaux, F., Jacq, C. (2008) Yeast mitochondrial biogenesis: a role for the PUF RNA-binding protein Puf3p in mRNA localization. PLoS One 3, e2293.

252. Salvetti, A., Rossi, L., Lena, A., Batistoni, R., Deri, P., Rainaldi, G., Locci, M. T., Evangelista, M., Gremigni, V. (2005) DjPum, a homologue of Drosophila Pumilio, is essential to planarian stem cell maintenance. Development 132, 1863–1874.

253. Sasarman, F., Brunel-Guitton, C., Antonicka, H., Wai, T., Shoubridge, E. A.; LSFC Consortium. (2010) LRPPRC and SLIRP interact in a ribonucleoprotein complex that regulates posttranscriptional gene expression in mitochondria. Mol Biol Cell. 21(8), 1315-1323.

254. Scheper, G. C., Thomas, A. A., van Wijk, R. (1998). Inactivation of eukaryotic initiation factor 2B in vitro by heat shock. Biochem J 334, 463-467.

255. Schmitz-Linneweber, C., and Small, I. (2008) Pentatricopeptide repeat proteins: a socket set for organelle gene expression. Trends Plant Sci. 13(12):663-670.

256. Schmitz-Linneweber, C., Williams-Carrier, R. E., Williams-Voelker, P. M., Kroeger, T. S., Vichas, A., Barkan, A. (2006). A pentatricopeptide repeat protein facilitates the trans-splicing of the maize chloroplast rps12 pre-mRNA. Plant Cell. 18(10), 2650-2663.

257. Schoenberg, D. R., and Maquat, L. E. (2012) Regulation of cytoplasmic mRNA decay. Nat Rev Genet. 13(4), 246-259.

258. Sengupta, D. J., Wickens, M., and Fields, S. (1999). Identification of RNAs that bind to a specific protein using the yeast three-hybrid system. RNA 5, 596-601.

259. SenGupta, D. J., Zhang, B., Kraemer, B., Pochart, P., Fields, S., Wickens, M. (1996) A three-hybrid system to detect RNA-protein interactions in vivo. Proc Natl Acad Sci U S A 93, 8496-8501.

260. Sera, T. (2009) Zinc-finger-based artificial transcription factors and their applications. Adv Drug Deliv Rev. 61(7-8):513-26.

261. Sharma, M. R., Koc, E. C., Datta, P. P., Booth, T. M., Spremulli, L. L., Agrawal, R. K. (2003) Structure of the mammalian mitochondrial ribosome reveals an expanded functional role for its component proteins. Cell 115, 97–108.

262. Shestakova, E. A., Wyckoff, J., Jones, J., Singer, R. H., Condeelis, J. (1999). Correlation of beta-actin messenger RNA localization with metastatic potential in rat adenocarcinoma cell lines. Cancer Res 59, 1202-1205.

263. Shikanai, T. (2006). RNA editing in plant organelles: machinery, physiological function and evolution. Cell Mol Life Sci. 63(6), 698-708.

264. Shimizu, Y., Bhakta, M. S., Segal, D. J. (2009) Restricted spacer tolerance of a zinc finger nuclease with a six amino acid linker. Bioorg. Med. Chem. Lett. 19 (14), 3970–3972.

265. Shoubridge, E. A. (2001). Nuclear genetic defects of oxidative phosphorylation. Hum Mol Genet 10, 2277–2284.

266. Shyu, A. B., Wilkinson, M. F. (2000). The double lives of shuttling mRNA binding proteins. Cell. 102(2), 135-138.

267. Sikorski, T. W., Buratowski, S. (2009). The basal initiation machinery: Beyond the general transcription factors. Curr Opin Cell Biol 21, 344–351

268. Slomovic, S., Laufer, D., Geiger, D., Schuster, G. (2005) Polyadenylation and degradation of human mitochondrial RNA: the prokaryotic past leaves its mark. Mol Cell Biol 25, 6427–6435.

269. Small, I. D., Peeters, N. (2000) The PPR motif - a TPR-related motif prevalent in plant organellar proteins. Trends Biochem Sci. 25(2), 46-47

270. Smeitink, J., van den Heuvel, L., DiMauro, S. (2001). The genetics and pathology of oxidative phosphorylation. Nat Rev Genet 2, 342–352.

271. Smith, C. W., Valcárcel, J. (2000) Alternative pre-mRNA splicing: the logic of combinatorial control. Trends Biochem Sci. 25(8), 381-388.

272. Smith, P. K., Krohn, R. I., Hermanson, G. T., Mallia, A. K., Gartner, F. H., Provenzano, M. D., Fujimoto, E. K., Goeke, N. M., Olson, B. J., Klenk, D. C. (1985). Measurement of protein using bicinchoninic acid. Anal Biochem. 150(1), 76-85.

273. Smits, P., Smeitink, J. A., van den Heuvel, L. P., Huynen, M. A., Ettema, T. J. (2007). Reconstructing the evolution of the mitochondrial ribosomal proteome. Nucleic Acids Res. 35, 4686–4703.

274. Sondheimer, N., Fang, J. K., Polyak, E., Falk, M. J., Avadhani, N. G. (2010). Leucine-rich pentatricopeptide-repeat containing protein regulates mitochondrial transcription. Biochemistry. 49(35), 7467-7473

275. Sonnhammer, E. L., Eddy, S. R., Durbin, R. (1997) Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins. 28(3), 405-420.

276. Sonoda, J., Wharton, R. P. (1999). Recruitment of Nanos to hunchback mRNA by Pumilio. Genes Dev. 13(20), 2704-2712.

277. Souza, G. M., da Silva, A. M., Kuspa, A. (1999). Starvation promotes Dictyostelium development by relieving PufA inhibition of PKA translation through the YakA kinase pathway. Development 126, 3263–3274

278. Spassov, D. S., Jurecic, R. (2002). Cloning and comparative sequence analysis of PUM1 and PUM2 genes, human members of the Pumilio family of RNA-binding proteins. Gene. 299(1-2), 195-204.

279. Spik, A., Oczkowski, S., Olszak, A., Formanowicz, P., Blazewicz, J., Jaruzelska, J. (2006) Human fertility protein PUMILIO2 interacts in vitro with testis mRNA encoding Cdc42 effector 3 (CEP3). Reprod Biol 6(2), 103-113.

280. Srisawat, C., Engelke, D. R. (2001). Streptavidin aptamers: affinity tags for the study of RNAs and ribonucleoproteins. RNA 7, 632-641.

281. Srisawat, C., Goldstein, I. J., Engelke, D. R. (2001). Sephadex-binding RNA ligands: rapid affinity purification of RNA from complex RNA mixtures. Nucleic Acids Res 29, E4.

282. St Johnston, D. (2005) Moving messages: the intracellular localization of mRNAs. Nat Rev Mol Cell Biol. 6(5), 363-375.

283. Stark, G. R., Kerr, I. M., Williams, B. R., Silverman, R. H., Schreiber, R. D. (1998). How cells respond to interferons. Annu Rev Biochem. 67, 227-264.

284. Steinmetz, E. J. (1997). Pre-mRNA processing and the CTD of RNA polymerase II: the tail that wags the dog? Cell 89, 491-494.

285. Sterky, F. H., Ruzzenente, B., Gustafsson, C. M., Samuelsson, T., Larsson, N. G. (2010). LRPPRC is a mitochondrial matrix protein that is conserved in metazoans. Biochem Biophys Res Commun. 398, 759–764.

286. Stern, B., Olsen, L. C., Tröße, C., Ravneberg, H., Pryme, I. F. (2007). Improving mammalian cell factories: The selection of signal peptide has a major impact on recombinant protein synthesis and secretion in mammalian cells. Trends Cell Mol. Biol. 2, 1-1

287. Stumpf, C. R., Opperman, L., Wickens, M. (2008) Chapter 14. Analysis of RNA-protein interactions using a yeast three-hybrid system. Methods Enzymol. 449, 295-315.

288. Stumpp, M. T., Forrer, P., Binz, H. K., Plückthun, A. (2003) Designing repeat proteins: modular leucine-rich repeat protein libraries based on the mammalian ribonuclease inhibitor family. J Mol Biol. 332(2), 471-487.

289. Subramaniam, K., Seydoux, G. (2003). Dedifferentiation of primary spermatocytes into germ cell tumors in C. elegans lacking the pumilio-like protein PUF-8. Curr Biol. 13, 134–139.

290. Suh, N., Crittenden, S. L., Goldstrohm, A., Hook, B., Thompson, B., Wickens, M., Kimble, J. (2009) FBF and its dual control of gld‐1 expression in the Caenorhabditis elegans germline. Genetics 181, 1249–1260.

291. Svoboda, P., Stein, P., Hayashi, H., Schultz, R. M. (2000) Selective reduction of dormant maternal mRNAs in mouse oocytes by RNA interference. Development 127, 4147–4156.

292. Tadauchi, T., Matsumoto, K., Herskowitz, I., and Irie, K. (2001). Post-transcriptional regulation through the HO 3′-UTR by Mpt5, a yeast homolog of Pumilio and FBF. EMBO J. 20, 552–561.

293. Takagaki, Y., and Manley, J. L. (1997). RNA recognition by the human polyadenylation factor CstF. Mol Cell Biol 17, 3907-3914.

294. Takizawa, P. A., and Vale, R. D. (2000). The myosin motor, Myo4p, binds Ash1 mRNA via the adapter protein, She3p. Proc Natl Acad Sci U S A 97, 5273-5278.

295. Takizawa, P. A., Sil, A., Swedlow, J. R., Herskowitz, I., Vale, R. D. (1997). Actin-dependent localization of an RNA encoding a cell-fate determinant in yeast. Nature 389, 90-93.

296. Tarun, S. Z. Jr, Wells, S. E., Deardorff, J. A., Sachs, A. B. (1997). Translation initiation factor eIF4G mediates in vitro poly(A) tail-dependent translation. Proc. Natl. Acad. Sci. USA 94, 9046–9051

297. Tavares-Carreón, F., Camacho-Villasana, Y., Zamudio-Ochoa, A., Shingú-Vázquez, M., Torres-Larios, A., Pérez-Martínez, X. (2008). The pentatricopeptide repeats present in Pet309 are necessary for translation but not for stability of the mitochondrial COX1 mRNA in yeast. J Biol Chem. 283(3), 1472-1479.

298. Taylor, G. A., Carballo, E., Lee, D. M., Lai, W. S., Thompson, M. J., Patel, D. D., Schenkman, D. I., Gilkeson, G. S., Broxmeyer, H. E., Haynes, B. F., and Blackshear, P. J. (1996). A pathogenetic role for TNF alpha in the syndrome of cachexia, arthritis, and autoimmunity resulting from tristetraprolin (TTP) deficiency. Immunity. 4, 445–454.

299. Tebas, P., Stein, D. (2009). Autologous T-Cells Genetically Modified at the CCR5 Gene by Zinc Finger Nucleases SB-728 for HIV. ClinicalTrials.gov

300. Temperley, R. J., Wydro, M., Lightowlers, R. N., Chrzanowska-Lightowlers, Z. M. (2010) Human mitochondrial mRNAs-like members of all families, similar but different. Biochim Biophys Acta 1797, 1081–1085.

301. Tennyson, C. N., Klamut, H. J., Worton, R. G. (1995). The human dystrophin gene requires 16 hours to be transcribed and is cotranscriptionally spliced. Nat Genet 9, 184-190.

302. Thomson, A. M., Rogers, J. T., Walker, C. E., Staton, J. M., and Leedman, P. J. (1999). Optimized RNA gel-shift and UV cross-linking assays for characterization of cytoplasmic RNA-protein interactions. Biotechniques 27, 1032-1039, 1042.

303. Tilsner, J., Linnik, O., Christensen, N. M., Bell, K., Roberts, I. M., Lacomme, C., Oparka, K. J. (2009) Live-cell imaging of viral RNA genomes using a Pumilio-based reporter. Plant J. 57(4), 758-770.

304. Trigon, S., Serizawa, H., Conaway, J. W., Conaway, R. C., Jackson, S. P., Morange, M. (1998). Characterization of the residues phosphorylated in vitro by different C-terminal domain kinases. J Biol Chem. 273(12), 6769-6775.

305. Ulbricht, R. J., Olivas, W. M. (2008). Puf1p acts in combination with other yeast Puf proteins to control mRNA stability. RNA 14, 246–262.

306. Urban, R. J., Bodenburg, Y., Kurosky, A., Wood, T. G., and Gasic, S. (2000). Polypyrimidine tract-binding protein-associated splicing factor is a negative regulator of transcriptional activity of the porcine p450scc insulin-like growth factor response element. Mol Endocrinol 14, 774-782.

307. Vaishnaw, A. K., Gollob, J., Gamba-Vitalo, C., Hutabarat, R., Sah, D., Meyers, R., de Fougerolles, T., Maraganore, J. (2010). A status report on RNAi therapeutics. Silence. 1(1), 14.

308. van Eeden, F., St Johnston, D. (1999). The polarisation of the anterior-posterior and dorsal-ventral axes during Drosophila oogenesis. Curr Opin Genet Dev 9, 396-404.

309. Van Etten, J., Schagat, T. L., Hrit, J., Weidmann, C. A., Brumbaugh, J., Coon, J. J., Goldstrohm, A. C. (2012). Human Pumilio Proteins Recruit Multiple Deadenylases to Efficiently Repress Messenger RNAs. J Biol Chem. 287(43), 36370-36383

310. van Kouwenhove, M., Kedde, M., Agami, R. (2011). MicroRNA regulation by RNA-binding proteins and its implications for cancer. Nat Rev Cancer. 11, 644–656.

311. Vessey, J. P., Schoderboeck, L., Gingi, E., Luzi, E., Riefler, J., Di Leva, F., Karra, D., Thomas, S., Kiebler, M. A., Macchi, P. (2010). Mammalian Pumilio 2 regulates dendrite morphogenesis and synaptic function. Proc Natl Acad Sci USA 107, 3223–3227.

312. Vessey, J. P., Vaccani, A., Xie, Y., Dahm, R., Karra, D., Kiebler, M. A., Macchi, P. (2006). Dendritic localization of the translational repressor Pumilio 2 and its contribution to dendritic stress granules. J Neurosci 26, 6496–6508.

313. Vincent, M., Lauriault, P., Dubois, M. F., Lavoie, S., Bensaude, O., and Chabot, B. (1996). The nuclear matrix protein p255 is a highly phosphorylated form of RNA polymerase II largest subunit which associates with spliceosomes. Nucleic Acids Res 24, 4649-4652.

314. Wahl, M. C., Will, C. L., Lührmann, R. (2009). The spliceosome: design principles of a dynamic RNP machine. Cell. 136(4), 701-718.

315. Wahle, E. (1995). Poly(A) tail length control is caused by termination of processive synthesis. J Biol Chem. 270, 2800-2808.

316. Walther, T. N., Wittop Koning, T. H., Schümperli, D., Müller, B. A. (1998). 5'-3' exonuclease activity involved in forming the 3' products of histone pre-mRNA processing in vitro. RNA. 4(9), 1034-1046

317. Wang, X., McLachlan, J., Zamore, P. D., Hall, T. M. (2002) Modular recognition of RNA by a human pumilio-homology domain. Cell. 110(4), 501-512.

318. Wang, Y., Cheong, C. G., Hall, T. M., Wang, Z. (2009) Engineering splicing factors with designed specificities. Nat Methods. 6(11), 825-830.

319. Wang, Z., and Burge, C. B. (2008) Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA. 14(5), 802-813.

320. Watanabe, T., Ito, Y., Yamada, T., Hashimoto, M., Sekine, S., Tanaka, H. (1994) The role of the C-terminal domain and type III domains of chitinase A1 from Bacillus circulans WL-12 in chitin degradation. J. Bacteriol. 176, 4465-4472.

321. Watkins, N. J., Segault, V., Charpentier, B., Nottrott, S., Fabrizio, P., Bachi, A., Wilm, M., Rosbash, M., Branlant, C., and Luhrmann, R. (2000). A common core RNP structure shared between the small nucleolar box C/D RNPs and the spliceosomal U4 snRNP. Cell 103, 457-466.

322. Wharton, R. P., Aggarwal, A. K. (2006). mRNA regulation by Puf domain proteins. Sci STKE. 2006(354), pe37.

323. Wharton, R. P., and Struhl, G. (1991) RNA regulatory elements mediate control of Drosophila body pattern by the posterior morphogen nanos. Cell 67, 955–967.

324. Wianny, F., Zernicka-Goetz, M. (2000) Specific interference with gene functions by double-stranded RNA in early mouse development. Nature Cell Biol. 2, 70–75.

325. Wickens, M., Bernstein, D. S., Kimble, J., Parker, R. (2002) A PUF family portrait: 3'UTR regulation as a way of life. Trends Genet 18, 150-157.

326. Will, C. L., Lührmann, R. (2011) Spliceosome structure and function. Cold Spring Harb Perspect Biol. 3(7) Review.

327. Williams-Carrier, R., Kroeger, T., Barkan, A. (2008). Sequence-specific binding of a chloroplast pentatricopeptide repeat protein to its native group II intron ligand. RNA. 14(9), 1930-1941.

328. Wong, G. K., Passey, D. A., Yu, J. (2001). Most of the human genome is transcribed. Genome Res 11, 1975-1977.

329. Wood, A. J., Lo, T. W., Zeitler, B., Pickle, C. S., Ralston, E. J., Lee, A. H., Amora, R., Miller, J. C., Leung, E., Meng, X., Zhang, L., Rebar, E. J., Gregory, P. D., Urnov, F. D., Meyer, B. J. (2011). Targeted genome editing across species using ZFNs and TALENs. Science. 333(6040), 307.

330. Wreden, C., Verrotti, A. C., Schisa, J. A., Lieberfarb, M. E., Strickland, S. (1997). Nanos and pumilio establish embryonic polarity in Drosophila by promoting posterior deadenylation of hunchback mRNA. Development. 124, 3015–3023

331. Wydro, M., Bobrowicz, A., Temperley, R. J., Lightowlers, R. N., Chrzanowska-Lightowlers, Z. M. (2010) Targeting of the cytosolic poly(A) binding protein PABPC1 to mitochondria causes mitochondrial translation inhibition. Nucleic Acids Res 38, 3732–3742.

332. Xu, F., Ackerley, C., Maj, M. C., Addis, J. B., Levandovskiy, V., Lee, J., Mackay, N., Cameron, J. M., Robinson, B. H. (2008). Disruption of a mitochondrial RNA-binding protein gene results in decreased cytochrome b expression and a marked reduction in ubiquinol-cytochrome c reductase activity in mouse heart mitochondria. Biochem J. 416, 15–26.

333. Xu, F., Morin, C., Mitchell, G., Ackerley, C., Robinson, B. H. (2004). The role of the LRPPRC (leucine-rich pentatricopeptide repeat cassette) gene in cytochrome oxidase assembly: mutation causes lowered levels of COX (cytochrome c oxidase) I and COX III mRNA. Biochem J. 382(Pt 1), 331-336.

334. Yamazaki, H., Tasaka, M., Shikanai, T. (2004). PPR motifs of the nucleus-encoded factor, PGR3, function in the selective and distinct steps of chloroplast gene expression in Arabidopsis. Plant J. 38(1), 152-163.

335. Yang, Q., Doublié, S. (2011). Structural biology of poly(A) site definition. Wiley Interdiscip Rev RNA. 2(5), 732-747.

336. Ye, B., Petritsch, C., Clark, I. E., Gavis, E. R., Jan, L. Y., Jan, Y. N. (2004). Nanos and Pumilio are essential for dendrite morphogenesis in Drosophila peripheral neurons. Curr Biol. 14, 314–321.

337. Yonaha, M., Proudfoot, N. J. (2000). Transcriptional termination and coupled polyadenylation in vitro. EMBO J. 19(14), 3770-3777.

338. Zamore, P. D., Williamson, J. R., Lehmann, R. (1997) The Pumilio protein binds RNA through a conserved domain that defines a new class of RNA-binding proteins. RNA. 3(12), 1421-1433.

339. Zaphiropoulos, P. G. (1998). Mechanisms of pre-mRNA splicing - classical versus non-classical pathways. Histology & Histopathology 13, 585-589.

340. Zenklusen, D., Larson, D. R., Singer, R. H. (2008). Single-RNA counting reveals alternative modes of gene expression in yeast. Nat Struct Mol Biol. 15(12), 1263-1271.

341. Zhang, B., Gallegos, M., Puoti, A., Durkin, E., Fields, S., Kimble, J., Wickens, M. P. (1997). A conserved RNA-binding protein that regulates sexual fates in the C. elegans hermaphrodite germ line. Nature. 390(6659), 477-484.

342. Zhang, F., Cong, L., Lodato, S., Kosuri, S., Church, G. M., Arlotta, P. (2011). Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription. Nat Biotechnol. 29(2), 149-153.

343. Zhao, J., Hyman, L., Moore, C. (1999) Formation of mRNA 3' ends in eukaryotes: Mechanism, regulation and interrelationships with other steps in mRNA synthesis. Microbiol Mol Biol Rev 63, 405-45

344. Zheng, Z. M., Huynen, M., Baker, C. C. (1998). A pyrimidine-rich exonic splicing suppressor binds multiple RNA splicing factors and inhibits spliceosome assembly. Proc Natl Acad Sci U S A 95, 14088-14093.

345. Zhu, D., Stumpf, C. R., Krahn, J. M., Wickens, M., Hall, T. M. (2009). A 5' cytosine binding pocket in Puf3p specifies regulation of mitochondrial mRNAs. Proc Natl Acad Sci USA. 106(48), 20192-20197

346. Zipor, G., Haim-Vilmovsky, L., Gelin-Licht, R., Gadir, N., Brocard, C., Gerst, J. E. (2009) Localization of mRNAs coding for peroxisomal proteins in the yeast, Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 106(47), 19848-19853.

347. Zsigmond, L., Rigó, G., Szarka, A., Székely, G., Otvös, K., Darula, Z., Medzihradszky, K. F., Koncz, C., Koncz, Z., Szabados, L. (2008). PPR40 connects abiotic stress responses to mitochondrial electron transport. Plant Physiol. 146(4), 1721-1737
