mlc-seq: de novo sequencing of full-length trnas and

MLC-Seq: de novo Sequencing of Full-Length tRNAsand Quantitative Mapping of Multiple RNAModi�cationsShenglong Zhang ( [email protected] )

New York Institute of Technology https://orcid.org/0000-0001-8003-7484Xiaohong Yuan

New York Institute of Technology https://orcid.org/0000-0002-9165-5197Yue Su

New York Institute of TechnologyXudong Zhang

University of California Riverside https://orcid.org/0000-0002-5964-0067Spencer Turkel

New York Institute of TechnologyShundi Shi

Columbia UniversityXuanting Wang

Columbia UniversityEun-Jin Choi

University of Texas Medical BranchWenzhe Wu

University of Texas Medical BranchHaichuan Liu

University of California San FranciscoRosa Viner

Thermo Fisher Scienti�c (United States) https://orcid.org/0000-0003-0550-5545James Russo

Columbia UniversityWenjia Li

New York Institute of Technology https://orcid.org/0000-0001-6059-6422Xiaoyong Bao

University of Texas Medical BranchQi Chen

University of California, Riverside https://orcid.org/0000-0001-6353-9589

https://doi.org/10.21203/rs.3.rs-1090754/v1

mailto:[email protected]

https://orcid.org/0000-0001-8003-7484

https://orcid.org/0000-0002-9165-5197

https://orcid.org/0000-0002-5964-0067

https://orcid.org/0000-0003-0550-5545

https://orcid.org/0000-0001-6059-6422

https://orcid.org/0000-0001-6353-9589

Brief Communication

Keywords:

Posted Date: December 9th, 2021

DOI: https://doi.org/10.21203/rs.3.rs-1090754/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License

https://doi.org/10.21203/rs.3.rs-1090754/v1

https://creativecommons.org/licenses/by/4.0/

MLC-Seq: de novo Sequencing of Full-Length tRNAs and Quantitative Mapping of 1

Multiple RNA Modifications 2

3

4

Xiaohong Yuan1a, Yue Su1a, Xudong Zhang2, Spencer J. Turkel1a, Shundi Shi3, Xuanting Wang3, 5

Eun-Jin Choi4a, Wenzhe Wu4a, Haichuan Liu5, Rosa Viner5, James J. Russo3, Wenjia Li1b, 6

Xiaoyong Bao4a, 4b,4c,4d, Qi Chen2, Shenglong Zhang1a* 7

8

9 1aDepartment of Biological and Chemical Sciences, 1bDepartment of Computer Science, New 10

York Institute of Technology, New York, NY 10023, USA 11 2Division of Biomedical Sciences, School of Medicine, University of California, Riverside, 12

Riverside, CA 92521, USA 13 3Department of Chemical Engineering, Columbia University, New York, NY 10027, USA 14 4aDepartment of Pediatrics, 4bSealy Center for Molecular Medicine, 4cthe Institute of 15

Translational Science, 4dthe Institute for Human Infections &Immunity, University of Texas 16

Medical Branch, Galveston, TX 77555, USA 17 5Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, CA 95134, USA 18

19

20

21

22

Abstract 23

24

Despite the extensive use of next-generation sequencing of RNA, simultaneous sequencing and 25

quantitative mapping of multiple RNA modifications remain challenging. Herein, we develop 26

MLC-Seq, a mass spectrometry-based direct sequencing method allowing for simultaneously 27

unravelling the RNA sequences and quantitatively mapping different tRNA nucleotide 28

modifications site-specifically. Importantly, MLC-Seq reveals the stoichiometric changes of 29

tRNA modifications upon treatment with the dealkylating enzyme AlkB, and led to the discovery 30

of new nucleotide modifications. 31

32

33

Despite wide application of high-throughput next generation sequencing (NGS), the true 34

sequence of a RNA, i.e., identity and location of each and every nucleotide building block 35

(canonical or modified) within a full-length RNA, remains a mystery1, mainly because of the 36

lack of a general method to directly sequence any nucleotide (including unknown ones) at single-37

nucleotide resolution. tRNAs, where >100 different modifications have been discovered, cannot 38

be directly sequenced efficiently. NGS-based methods, which require cDNA synthesis, are 39

generally unable to sequence modifications, and can only be “tailored” to sequence a specific 40

RNA modification type2. Direct nanopore-based RNA sequencing has been used for sequencing 41

tRNAs3, but currently suffers from a high error rate and cannot identify or pinpoint diverse tRNA 42

modifications. Also, not all tRNA modifications are modified 100% of the time, so that site-43

specific quantification of their stoichiometries is challenging4. 44

45

Mass spectrometry (MS), in particular combined liquid chromatography and tandem mass 46

spectrometry (LC-MS/MS), is recognized as the ‘gold standard’ for RNA modification analysis, 47

because it does not require a cDNA synthesis step and is not limited to specific known 48

modification types. However, LC-MS/MS analysis of RNA modifications is often restrained to 49

the ribonucleoside level. For example, RNA samples may be completely digested, enzymatically 50

or chemically, to single nucleosides for MS analysis5-7, but location information for the RNA 51

modifications is lost. To obtain RNA sequence information, LC-MS/MS-based mapping methods 52

rely on other well-established complementary methods (e.g., previous analyses or NGS)8, and 53

can only analyze short RNA oligonucleotides (~17 nt)1, due to spectral complexity and resulting 54

difficulty in data interpretation6. Therefore, they are rarely used for de novo (without prior 55

sequence information) sequencing of RNA. 56

57

LC-MS-based de novo sequencing methods have nevertheless been developed that 58

simultaneously sequence and quantify nucleotide modifications9-12. These methods rely on a 59

complete set of MS ladders that are produced not by MS/MS, but by exactly one random and 60

unbiased cut on each RNA strand via controlled enzymatic or chemical degradation9,13-15. The 61

resulting MS ladder fragments retains the nucleotide modifications from the original parental 62

RNAs, allowing identification, quantification and location of each nucleotide modification in a 63

sample. However, each MS ladder must be perfect, i.e., without any missing fragments, in order 64

to read all nucleotides in an RNA strand9,11,16. These rigorous sample preparation requirements, 65

together with the need for higher sample loading (~1000 pmol)12 for a tRNA, short read length 66

(<35 nt per run)11, and low throughput (a single RNA strand or just a few)9,11, have restricted 67

MS-based de novo sequencing applications to less complex RNA samples, making it difficult to 68

sequence and quantify nucleotide modifications from limited amounts of tRNAs enriched from 69

cells. 70

71

To systematically address the drawbacks of previous MS-based methods, we developed a de 72

novo MS ladder complementation sequencing approach (MLC-Seq) (Fig. 1a) that circumvents 73

the perfect ladder requirement for direct MS sequencing, thereby allowing de novo MS 74

sequencing of full-length cellular tRNAs, together with all nucleotide modifications, at single-75

nucleotide stoichiometric precision. To test the power of the ladder complementation strategy for 76

sequencing of modification-rich tRNAs, we first sequenced a yeast tRNA-Phe sample containing 77

five isoforms (Fig. 1b-c), which have similar sequences but differ in as little as one nucleotide or 78

modification. None of the five isoforms yielded a perfect MS ladder (Fig. 1b) (likely due to 79

inefficient LC-separation or MS measurement of minor ladder fragments from low-abundance 80

tRNA isoforms), and thus could not be sequenced by previously developed MS sequencing 81

methods. However, after ladder complementation using our MLC-Seq method, the missed MS 82

ladder fragments were corrected to generate a perfect MS ladder, allowing for direct sequencing 83

of all five tRNA isoforms and pinpointing each nucleotide or modification difference 84

quantitatively (Fig. 1c and Fig. S1) (see Methods for details). 85

86

MLC-Seq achieves this goal by incorporating a series of systematic and innovative technical 87

advances over previous de novo MS sequencing methods (Extended Data Fig. 1). First, 88

homology search identifies RNA isoforms related to a specific tRNA in a given sample 89

(Extended Data Fig. 2). Second, MassSum, a novel MS-based algorithm, takes advantage of the 90

unique mass of each RNA molecule and site-specific cleavage of RNA phosphodiester bonds 91

under controlled acid degradation conditions15, allowing identification and computational 92

isolation of ladder fragments of the RNA in a mixed RNA sample without end-labeling 93

(Extended Data Fig. 3), which was previously required to MS sequence mixed RNA samples9,11. 94

Third, MS ladder complementation allows sequencing of low-abundance RNA samples that 95

cannot provide a perfect MS ladder and thus cannot be sequenced by previous MS sequencing 96

methods (Fig. 1b). Fourth, with the latest Orbitrap MS instrument, MLC-Seq increases read 97

length of de novo MS-based RNA sequencing from ~35 nt to ~80 nt per run, allowing 98

sequencing of the full-length tRNA without T1 digestion, which was required by previous MS 99

sequencing methods, but complicated the sequencing and data analysis11. Finally, lowering 100

sample loading ~1000-fold down to ng scale (~20 ng or ~1 pmol for a tRNA), allows sequencing 101

of cellular tRNAs. 102

103

MLC-Seq allows direct sequencing of full-length tRNAs without a cDNA intermediate and 104

preserves the sample diversity and modification information in a given tRNA sample that 105

indirect cDNA-based RNA sequencing methods cannot (Fig. 1a). After verifying the sequencing 106

method on yeast tRNA-Phe, we used MLC-Seq to sequence other tRNAs enriched from mouse 107

liver. The identity, position, and stoichiometry of each modification in wild-type tRNA-Gln and 108

tRNA-Glu is shown in Fig. 2a and Extended Data Fig. 4b, respectively. As such, MLC-Seq 109

allows direct and de novo sequencing of full-length tRNAs as well as quantitative mapping of all 110

nucleotide modifications of each specific tRNA enriched from yeast or mice at single-nucleotide 111

resolution, for which no current high-throughput sequencing methods are capable. 112

113

Importantly, our method led to the discovery of new RNA modifications, a 3,4-dihydrocytidine 114

(C′) at position 16 of tRNA-Gln (Fig. 2b) and a 2-Ox-G (G′) at position 17 of tRNA-Glu 115

(Extended Data Fig. 4c), which have not been reported before. MLC-Seq showed that when 116

reading the 5′ ladder of the tRNA-Gln in the two dimensional (2D) mass-retention time (tR) plot, 117

a new ladder started to branch out at position 16, and the mass differences of ladder fragments 118

between position 16 and 15 are 307.0276 Da and 308.0340 Da, indicating a C′ co-existing with 119

D (dihydrouridine) at position 16 (D/C′: 79% vs. 21%). We propose a structure for this newly 120

discovered nucleotide C′ (Fig. 2bII), which contains two more hydrogens in positions 3 and 4 of 121

C, based on reported structural stability analysis of the isomeric C′17. Similarly, a new sequence 122

branch indicates a G′ co-existing with G at position 17 of tRNA-Glu (G/G′: 69% vs. 31%, AlkB-123

treated) (Extended Data Fig. 4c). 124

125

Notably, to differentiate isomeric methylations such as m1A and m6A, we leveraged the 126

demethylation specificity of AlkB on its target methylation types (e.g., m1A, m1G, and m3C)18,19 127

and further successfully distinguished these methylations according to their sensitivities to AlkB 128

(Fig. 2 and Extended Data Fig. 4); this experiment has also in turn, demonstrated the accuracy of 129

MLC-seq to track the stoichiometric changes of AlkB-sensitive methylated nucleotides in 130

various tRNA samples such as tRNA-Gln and tRNA-Glu extracted from mouse liver. MLC-Seq 131

results pinpoint stoichiometric changes of methylations site-specifically in tRNA-Gln after AlkB 132

treatment. In wild type tRNA-Gln, m1A at position 57 of tRNA-Gln occurred at a frequency of 133

100% (no canonical A co-existing). However, 99% of the m1A at position 57 was demethylated 134

into A after AlkB-treatment. Reading the 3′ ladder of the wild-type tRNA in the 2D mass-tR plot 135

indicates an m1A (100%) at position 57 of the tRNA (Fig. 2c). However, after AlkB-treatment, 136

MLC-Seq showed that when reading the 3′-ladder, a ladder started to branch out at position 57, 137

and that 99% A was observed at this position with only 1% m1A remaining. In wild type tRNA-138

Glu, m1A at position 57 occurred at a frequency of 68% (vs 32% A), but all m1A demethylated 139

into A after AlkB treatment (Fig. S2-3). Similarly, in wild type tRNA-Gln, only m1G was found 140

at position 9 (no co-existing canonical G). However, only 45% of the m1G was demethylated 141

into G after AlkB-treatment (Fig. S4-5). 142

143

The full potential of the method’s sequencing read length and throughput remains to be explored, 144

especially when used with state-of-the-art LC-MS instruments and their built-in capacity for 145

automation. MLC-Seq, especially MassSum, increases the throughput of de novo MS-based 146

RNA sequencing, allowing sequencing potentially of an unlimited number of RNA sequences in 147

complex RNA samples, to the extent permitted by MS instruments, and paves the way for large 148

scale de novo MS sequencing of biological samples. It is worth noting that the methods described 149

here could be used to sequence other modified longer RNAs (e.g., rRNA, snoRNA, snRNA, Y 150

RNA, and vault RNA) and small non-coding RNAs (e.g., miRNA, piRNA, tsRNA, rsRNA, and 151

ysRNAs), and as an orthogonal approach to verify RNA sequences or modifications determined 152

by high-throughput sequencing methods. Thus, MLC-Seq will provide a general sequencing tool 153

for quantitatively studying RNA modifications, which is urgently needed more than ever, 154

considering for example that >40 unidentified nucleotide modifications have been reported in 155

SARS-CoV-2 RNA20, though their identities and functions remain unknown. 156

157

158

Data and Code Availability 159

All MS sequence datasets used in the manuscript are publicly available through the 160

corresponding repository on Github (https://github.com/rnamodifications/MLC-Seq). 161

162

Source codes implemented in Python of all the algorithms described in the manuscript (including 163

homology search, identifying acid-labile nucleotides, MassSum data separation, GapFill, ladder 164

complementing) are freely available on Github (https://github.com/rnamodifications/MLC-Seq) 165

and are distributed under the MIT open source license. 166

167

Acknowledgments 168

The authors acknowledge grant support from the USA National Institutes of Health R21 169

HG009576, R56 HG011099, and RM1 HG011563 (Co-I) to S.Z., R01 HD092431 and R01 170

ES032024 to Q.C., and R01 AI116812 and R21 AI113771to X.B. 171

172

Conflict of Interest 173

The authors have filed a patent related to the technology discussed in this manuscript. 174

H.L. and R.V. are employees of Thermo Fisher Scientific, San Jose, CA 95134, USA 175

176

References: 177

178

179

180

1 Alfonzo, J. D. et al. A call for direct sequencing of full-length RNAs to identify all 181

modifications. Nat Genet 53, 1113-1116, doi:10.1038/s41588-021-00903-1 (2021). 182

2 Helm, M. & Motorin, Y. Detecting RNA modifications in the epitranscriptome: predict 183

and validate. Nat Rev Genet 18, 275-291, doi:10.1038/nrg.2016.169 (2017). 184

3 Thomas, N. K. et al. Direct Nanopore Sequencing of Individual Full Length tRNA 185

Strands. ACS Nano 15, 16642-16653, doi:10.1021/acsnano.1c06488 (2021). 186

4 Suzuki, T. The expanding world of tRNA modifications and their disease relevance. Nat 187

Rev Mol Cell Biol 22, 375-392, doi:10.1038/s41580-021-00342-0 (2021). 188

5 Su, D. et al. Quantitative analysis of ribonucleoside modifications in tRNA by HPLC-189

coupled mass spectrometry. Nat Protoc 9, 828-841, doi:10.1038/nprot.2014.047 (2014). 190

6 Lauman, R. & Garcia, B. A. Unraveling the RNA modification code with mass 191

spectrometry. Mol Omics 16, 305-315, doi:10.1039/c8mo00247a (2020). 192

7 Wetzel, C. & Limbach, P. A. Mass spectrometry of modified RNAs: recent 193

developments. Analyst 141, 16-23, doi:10.1039/c5an01797a (2016). 194

8 Kimura, S., Dedon, P. C. & Waldor, M. K. Comparative tRNA sequencing and RNA 195

mass spectrometry for surveying tRNA modifications. Nat Chem Biol 16, 964-972, 196

doi:10.1038/s41589-020-0558-1 (2020). 197

9 Zhang, N. et al. A general LC-MS-based RNA sequencing method for direct analysis of 198

multiple-base modifications in RNA mixtures. Nucleic Acids Res 47, e125, 199

doi:10.1093/nar/gkz731 (2019). 200

10 Zhang, N. et al. A General LC-MS-Based Method for Direct and De Novo Sequencing of 201

RNA Mixtures Containing both Canonical and Modified Nucleotides. Methods Mol Biol 202

2298, 261-277, doi:10.1007/978-1-0716-1374-0_17 (2021). 203

11 Zhang, N. et al. Direct Sequencing of tRNA by 2D-HELS-AA MS Seq Reveals Its 204

Different Isoforms and Dynamic Base Modifications. ACS Chem Biol 15, 1464-1472, 205

doi:10.1021/acschembio.0c00119 (2020). 206

12 Zhang, N., Shi, S., Yoo, B., Yuan, X., Li, W., Zhang, S. . 2D-HELS MS Seq: a General 207

LC-MS Based Method for Direct and de novo Sequencing of RNA Mixtures with 208

Different Nucleotide Modifications. J. Vis. Exp. e61281 (2020). 209

13 Thomas, B. & Akoulitchev, A. V. Mass spectrometry of RNA. Trends in biochemical 210

sciences 31, 173-181 (2006). 211

14 Yoluc, Y. et al. Instrumental analysis of RNA modifications. Crit Rev Biochem Mol Biol 212

56, 178-204, doi:10.1080/10409238.2021.1887807 (2021). 213

15 Bjorkbom, A. et al. Bidirectional Direct Sequencing of Noncanonical RNA by Two-214

Dimensional Analysis of Mass Chromatograms. J Am Chem Soc 137, 14430-14438, 215

doi:10.1021/jacs.5b09438 (2015). 216

16 Bahr, U., Aygun, H. & Karas, M. Sequencing of single and double stranded RNA 217

oligonucleotides by acid hydrolysis and MALDI mass spectrometry. Anal Chem 81, 218

3173-3179, doi:10.1021/ac900100x (2009). 219

17 Snider, M. J., Lazarevic, D. & Wolfenden, R. Catalysis by entropic effects: the action of 220

cytidine deaminase on 5,6-dihydrocytidine. Biochemistry 41, 3925-3930, 221

doi:10.1021/bi011696r (2002). 222

18 Shi, J. et al. PANDORA-seq expands the repertoire of regulatory small RNAs by 223

overcoming RNA modifications. Nat Cell Biol 23, 424-436, doi:10.1038/s41556-021-224

00652-7 (2021). 225

19 Cozen, A. E. et al. ARM-seq: AlkB-facilitated RNA methylation sequencing reveals a 226

complex landscape of modified tRNA fragments. Nat Methods 12, 879-884 (2015). 227

20 Kim, D. et al. The Architecture of SARS-CoV-2 Transcriptome. Cell 181, 914-921 e910, 228

doi:10.1016/j.cell.2020.04.011 (2020). 229

230

231

232

233

Fig. 1 | MLC-Seq allows direct sequencing and quantitative mapping of all tRNA 234

modifications within their full-length sequence context at single-nucleotide resolution. 235

a, The general workflow of MLC-Seq of cellular tRNA samples. A given tRNA sample is first 236

divided into two portions; one, serving as a control, is injected directly into LC-MS to provide 237

mass and relative abundance information of all the intact tRNAs (I), while the rest is subjected to 238

the acid-degradation procedure before LC-MS to provide information including mass and related 239

retention time (tR) for tRNA ladder fragments (II). Partial nucleotide modifications or editing can 240

be first observed at the intact level and then pinpointed site-specifically at the ladder fragment 241

level. For example, at the intact level from the control sample, partial RNA modification/editing 242

such as a methylation (14 Da difference) and A-G editing (16 Da difference) can be observed 243

directly between the intact masses of the related tRNA isoforms (I). After acid degradation and 244

LC-MS, 5′- and 3′-ladders consisting of ladder fragments with various lengths corresponding to 245

nucleotides from the first to the last position on an RNA molecule, are produced and form 246

sigmoidal curves in a Mass-tR plot at the ladder fragment level (II). At this level, the specific site 247

b

a

RNA Modification/Editing (100%)

Controlled

Acid

Degradation5´-

Ladder 3

´-Ladder

Pi

OH

Pi OH

LC-MS

Data

Analysis

C C A

58

ACGCUUAA

G A C A C C U AG

CΨTGUGUm5C

C

m7GG

A

GUm5CΨ

AY

AAGmUCm

AGACC

m22GCGAGAGG

GD

D G AC U Cm2G

AU

UAGGCG

G

U

U

67

IF1

IF3

IF2

IF4

Rela

tive A

bundance

Mass (Around 24000 Da)

CH2

CH2

Ox Ox

14 Da16 Da

IF=Isoform

LC-MS Sample Diversity and

Modification Information Preserved

Partial Modification/Editing (<100%)

RNA Modification/Editing (100%)Partial Modification/Editing (<100%)

(I) Intact RNA Level

Rete

ntion T

ime (

min

)

Mass (Da)

5867

m1A

ACH

2

G

A Ox

0 25000

IF4

IF3

IF2

IF1

(II) Ladder Fragment Level

Partial Nucleotide Modification/Editing Observed

c

Imperfect: Missing

Ladder Components

Perfect: Fixed

Ladder Components

25000|

0|

20000|

15000|

10000|

5000|

Mass (Da)

PheIF1 - 5´-Ladder

PheIF2 - 5´-Ladder

PheIF4 - 5´-Ladder

PheIF3 - 5´-Ladder

PheIF5 - 5´-Ladder

PheIF2 - 5´-Ladder Converted From 3´-Ladder





ACi

CCUUAAGACACCUGCUTGUGUCUGAGGUUAGACCCGAGAGGGDDGACUCAUUUAGG Y´ G AG+C Cm+U

mAmCmGmCmG C

A

m22G CGm+A

A

* *

Position

8

4

0

De

pth

10

6

2

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75

GCGGAUUUAm2GCUCAGDDGGGAGAGCm22GCCAGACmUGmAAY AΨmCUGGAGm7GUCm5CUGUGTΨCGm1AUCCACAGAAUUCGCACCAtRNA-Phe(Ref):

GCGGAUUUAmG CUCAGDDGGGAGAGCm22GCCAGACmUGmAAYÁUmCUGGAGmG UCmC UGUGTUCGmA UCCACAGAAUUCGCACCAPheIF2:

GCGGAUUUAmG CUCAGDDGGGAGAGCm22GCCAGACmUGmAAYÁUmCUGGAGmG UCmC UGUGTUCGmA UCCACAGAAUUCGCACCPheIF1:

GCGGAUUUAmG CUCAGDDGGGAGAGCm22GCCAGACmUGmAAYÁUmCUGGAGmG UCmC UGUGTUCGmA UCCACAGAAUUCGCACPheIF5:

GCGGAUUUAmG CUCAGDDGGGAGAGCm22GCCAGACmUGmAAYÁUmCUGGAGmG UCmC UGUGTUCGmA UCCACAGAGUUCGCACCAPheIF4:

GCGGAUUUAmG CUCAGDDGGGAGAGCm22GCCAGACmUGmAAYÁUmCUGGAGmG UCmC UGUGTUCGmA UCCACAGAGUUCGCACCPheIF3:

IF1

IF2

IF3IF4

G, 30%

A, 70%

A, 20%

m1A, 80%m1A, 80% G, 30%

De novo

Sequencing

Quantitative

Mapping

67 58

5858

6767

IF1

A, 70%

A, 20%5867

of the above-mentioned modification or editing can be further pinpointed at single-base 248

resolution. For example, at position 48 and 67, a second branch appears with a mass difference 249

of 14 Da and 16 Da between two ladder fragments at the same position (1st nucleotide after the 250

first branch site in the 2D Mas-tR plot), indicating the partial modification being A/mA and the 251

partial editing A/G occurring at the respective locations. Stoichiometric quantification of partial 252

RNA modification/editing can also be calculated based on the relative intensities of the 253

corresponding ladder fragments in their branches. Finally, combining with LC-MS datasets of 254

the controlled and acid-degraded portions, novel algorithms have been developed to identify 255

each tRNA species or isoform, and computationally separate its MS ladders from the LC-MS 256

data of a given tRNA sample for MLC-Seq (Extended Data Figs. 1-2 and Method Section). This 257

panel is for illustrative purposes only and does not represent any real tRNA sequence position 258

and stoichiometry of partial modification/editing. b, Ladder complementation of various related 259

tRNA isoforms allows direct sequencing that previous MS sequencing methods could not. For 260

example, a yeast tRNA-Phe containing five isoforms, but none of them has a perfect ladder 261

required by previous MS-based sequencing methods. However, after ladder complementation 262

using our MLC-Seq method, the missed MS ladder fragments were corrected to generate a 263

perfect MS ladder, allowing for direct sequencing of all five tRNA isoforms. All 3′- ladders were 264

converted to 5′- to keep the positions consistent. Depth on the bottom illustrates the number of 265

times each nucleotide was read from tRNA isoform ladders (3´and 5´-ladders). c, Sequences of 266

five tRNA-Phe isoforms, showing 100% coverage and accuracy as compared to the reported 267

tRNA-Phe reference sequence. For isomeric nucleotides or modifications such as pseudouridine 268

(ψ) versus U and methylation at different base positions, an extra step is required to differentiate 269

them following our MS sequencing approach as described previously (Ref. 11). 270

*The missing ladder fragments at positions 32 and 34 are due to methylations on the 2′-hydroxyl 271

group of Cm and Gm that block acid degradation. 272

273 Fig. 2 | a, MLC-Seq of tRNA-Gln extracted from mouse liver. I, MLC-Seq results for wildtype 274

tRNA-Gln, showing all the RNA modifications within their full-length tRNA sequence context 275

as well as the stoichiometry of each modification site-specifically. II, Identity, position, and 276

stoichiometry of each modification in wild-type tRNA-Gln. mU: a methylated U, but not Um (2′-277

O-methyluridine). III, Stoichiometric changes of 2 methylated nucleotides after AlkB-treatment. 278

The percentage of m1G at position 9 was reduced from 100% to 55% (with 45% G co-existing at 279

this position), and the percentage of m1A at position 57 was reduced from 100% to 1% (with 280

99% A co-existing at this position), respectively. b, MLC-Seq leading to the discovery of a novel 281

nucleotide of C′ (3,4-dihydrocytidine) at position 16 of tRNA-Gln. I, Structure and unit mass of 282

nucleotide D in MLC-Seq of RNA (exact unit mass: 308.0410 Da). II, Proposed structure of a 283

newly discovered nucleotide C′ (exact unit mass: 307.0569 Da) at position 16, which contains 284

two more hydrogens at positions 3 and 4 of nucleotide C. III, MLC-Seq showing that when 285

Position Base Percentage (%)

16

9

D : C´

m1G

79 : 21

100

20 D 100

19 D 100

48

32

m5C

Cm

100

100

49 C : m5C : mU 57 : 38 : 5

57 m1A 100

17 Gm 100

G : m1G

45% : 55%

A : m1A

99% : 1%

957

m1G

100%

m1A

100%

957

a I IIIII

AlkB-

Treatment

CC A

57

UCCA

GGU

G G C U C U A

A

CGAGm5C

A

ACCU

AA

GCU

CmAGGUC

UCACG

ADGGm

D

A AU G U G

U

CCUUGG

A9

75

CG

U

U

G

UU

C / m5C / mU

m1Gm1A16

20

32

49

D / C´

A / G68

68 A : G 97 : 3

tRNA-Gln

Wild-Type

4500...

b I IIIII

Proposed C´

307.0569 Da

I

12

14

16G

AG

UUC

Am1AA

U

Rete

ntion T

ime (

min

)

m1A1.0x103

2.5x103

6.3x103

1.6x104

4.0x104

1.0x105

Inte

nsity

5000 6000 7000 8000Mass (Da)

......

60 58 56 54 52 50

Position

......

3´-Ladder

12

14

16

GA

GUUC

AAA

U

Rete

ntion T

ime (

min

)

1.0x103

2.5x103

6.3x103

1.6x104

4.0x104

1.0x105

Inte

nsity

m1A

Mass (Da)... ...5000 6000 7000 8000

Position

60 58 56 54 52 50... ...

3´-Ladder

14.0106 Da, CH2

A

m1A

Wild-Type

5500 6500 7500

12

14

16

Rete

ntion T

ime (

min

)Mass (Da)

D

C´1.0x103

2.5x103

6.3x103

1.6x104

4.0x104

1.0x105

Inte

nsity

A

A Gm+G

DD

C

307.0276 Da, C´

14 16 18 20 22Position

... ...

...

A G

308.0340 Da, D

5´-Ladder

AlkB-

Treatment

II

m1A

343.0682 Da

A

329.0525 Da

c

NH

NH2

ON

O

OHO

PO

O

OH

N

NN

N

NH

O

OHO

PO

O

OH

N

NN

N

NH2

O

OHO

PO

O

OH

NH

O

ON

O

OHO

PO

O

OH

D

308.0410 Da

5´

3´

3´

5´

3´

5´

III

99%

reading the 5′ ladder of the tRNA in the 2D mass-tR plot, a new ladder started to branch out at 286

position 16, and the mass differences of ladder fragments between position 16 and 15 are 287

307.0276 Da and 308.0340 Da, indicating a C′ 3,4-dihydrocytidine co-existing with D 288

(dihydrouridine) at position 16. c, MLC-Seq results pinpoint stoichiometric changes of m1A site-289

specifically at position 57 of tRNA-Gln after AlkB treatment. I, Structure and unit mass of m1A 290

and canonical A in MLC-Seq of RNA. Their exact masses are 343.0682 Da and 329.0525 Da, 291

respectively. II-III, 99% m1A at position 57 was demethylated into A after AlkB-treatment. 292

Reading the 3′ ladder of the wild-type tRNA in the 2D mass-tR plot indicates an m1A (100%) at 293

position 57 of the tRNA (II). However, after AlkB-treatment, MLC-Seq shows that when 294

reading the 3′-ladder of the tRNA in the 2D mass-tR plot, a ladder started to branch out at 295

position 57, and that 99% A was observed at this position with only 1% m1A remaining (III). 296

297

298

299

300

301

302

303

304

305

306

307

Extended Data 308

309

310 311

Extended Data Fig. 1 | Detailed illustration for “Data Analysis” procedure in Fig. 1. a, 312

Homology Search before acid degradation. From the intact RNA mass level, the sample diversity 313

and sequence information are obtained, showing the number of RNA species in a given sample 314

and if and how they related to a specific tRNA. I-II, Homology Search allows related tRNAs to 315

be identified. After acid degradation, in order to sequence tRNA, we developed b, A complete 316

set of stepwise innovative computational tools/algorithms to synergize MS information at both 317

ladder fragment level and intact RNA level, including: I-II, Homology Search, in this step, 318

isoform1 with masses of 24610.4911 Da before acid degradation, for example, shifted to 319

24252.3692 Da after acid degradation with a mass difference of 358.1219 Da, indicating that 320

there are acid-labile nucleotide modifications, which was further pinpointed to be a wybutosine 321

(Y) at position 37 of tRNA-Phe (Extended Data Fig. 2). III, For each isoform identified, 322

MassSum algorithm was applied to isolate all its ladder fragments (pairing one in 3′-ladder and 323

one in 5′-ladder) from the complex MS data of the RNA sample (Extended Data Fig. 3). IV, 324

Results using GapFill to find non-paired ladder fragments that are missed by MassSum. After 325

ladder separation, 3′ and 5′-ladder (V) are obtained separately. VI, Ladder Complementation to 326

perfect MS ladders, allowing direct sequencing of full-length tRNA-Phe. (See methods) This 327

figure is for illustrative purposes only. 328

b

(3)

Gap

Fill

(4)

5´-

Lad

der

(5)

Lad

der

Co

mp

lem

en

tio

n

(2) MassSum (If M(S)

>0)

I III

IV V VI

(4)

3´-

Lad

der

Mass (Da)

Re

ten

tio

n T

ime

(m

in)

13 -

14 -

15 -

|

11000

|

12000

|

13000

|

0... ...

|

25000

5´-Ladder

3´-Ladder

Isoform3

Isoform2

Isoform1

Mass (Da)

Re

ten

tio

n T

ime

(m

in)

13 -

14 -

15 -

|

11000

|

12000

|

13000

|

0... ... |

25000

5´-Ladder

3´-Ladder

Isoform3

Isoform2

Isoform1 1 -

2 -

3 -

2 -

1 -

3 -

...

|

39

|

40

|

38

|

37

|

35

|

36

|

34

|

33

|

32

|

31

Position

|

1

|

76...

Iso

form

0|

13000|

11000|

12000|

Mass (Da)... ... 25000

|

...

|

39

|

40

|

38

|

37

|

35

|

36

|

34

|

33

|

32

|

31

|

1

|

76...

5´-Ladder

3´-Ladder

Position

A UAY´AAGmU mCCm

Complemented Ladder

... ...

0|

13000|

11000|

12000|

Mass (Da)... ... 25000

|

0 5000 10000 15000 20000 25000

0

5

10

15

20

Rete

ntion T

ime (

min

)

Mass (Da)

1.0x103

5.5x103

3.0x104

1.7x105

9.1x105

5.0x106

Inte

nsity (

co

un

ts)

a

(1.1

) H

om

olo

gy

Searc

h

I II

329.0525 Da

A

305.0413 Da

C

3.5x108

3.0x108

2.5x108

Inte

nsity (

co

un

ts)

2.0x108

14.0157 Da

CH2

1.5x188

1.0x108

Mass (Da)24300 24600 24900

IF2

IF1

IF3

IF4

IF2

M(IF2)

=24610.4911

IF4

M(IF4)

=24305.4032

IF1

M(IF1)

=24939.5478

Mass (Da)

3.5x108

3.0x108

2.5x108

Inte

nsity (

co

un

ts)

2.0x108

IF3

M(IF3)

=24925.5321

1.5x188

1.0x108

24300 24600 24900

IF2´

M(IF2´)

=24252.3692

(M(IF2)

-M(S)

)

IF1´

M(IF1´)

=24581.3166

(M(IF1)

-M(S)

)A

IF3´

M(IF3´)

=24567.3347

(M(IF3)

-M(S)

)

CH2

24200 24500Mass (Da)

S=Mass Shift

Acid-Labile Nucleotide Identified if Exists

Intact RNA Level

Ladder Fragment Level

II

II

(2) MassSum(If M

(S)=0)

IF=Isoform

(1.2

) H

om

olo

gy

Searc

h

329 330

Extended Data Fig. 2 | MLC-Seq homology search for identification of related RNA 331

isoforms and RNAs with acid labile nucleotide modifications. a, The identification of acid-332

labile nucleotide Y. At intact mass region (around 24kDa), masses before and after acid 333

degradation were compared, revealing a systematic mass shift of 358.1599 Da. This is consistent 334

with a change of Y to ribose Y′ under acidic conditions. b, Homology Search result before acid 335

degradation. Here C represents that the difference between two related tRNA isoforms is a 336

nucleotide C. A represents that the difference between two related tRNA isoforms is a nucleotide 337

A. Ox represents that the difference between two related tRNA isoforms is an oxygen atom. 338

Their mass differences, 305.0413, 329.0525 and 15.9949, correspond to a unit mass of 339

nucleotide C and A, and an oxygen atom. c, Proposed structural change of Y to Y′ under the 340

acid-degradation conditions. Acid-Deg=Acid-Degradation. 341

342

343

Wybutosine (Y) Ribose(Y´)

N

N

N

O

N

O

OH

HH

HHO

PO

O

OH

H

N

NHCOOCH3O

O

OH H

HCOOHO

OH

HH

HHO

PO

O

OH

OH

Isoform

Name

Mass

(Before Acid-Deg)

PheIF1

PheIF2

PheIF3

24610.4911

24939.5478

24626.4639

24252.2692

24581.3166

24268.3038

Mass

(After Acid-Deg)

PheIF4 24955.5224 24597.3533

Isoforms

PheIF5 24305.4032 23947.2566

358.2219

358.1691

358.2312

358.1601

Mass Diff

(Before - After)

358.1466

Relative Abundance

(Before Acid-Deg)

1.000

0.076

0.598

0.147

0.017

a

b c

24300 24600 24900Mass (Da)

2.5x106

4.2x106

7.2x106

1.2x107

2.1x107

3.5x107

Inte

nsity (

co

un

ts)

A

A

C

PheIF1

PheIF5

PheIF2

PheIF3PheIF4

Ox

Ox

Homology Search

(Before Acid-Deg)

344

345 346

Extended Data Fig. 3 | MassSum strategy and MassSum-based computational data 347

separation. a, A purified or mixed RNA starting material is partially acid hydrolyzed under 348

controlled conditions that predominantly generates single-cut fragments. Taking a 9 nt RNA 349

strand as an illustrative example, two ladder fragments are generated as a result of acid-mediated 350

cleavage of the phosphodiester bond between the 1st and 2nd nucleotide of the 9 nt RNA strand. 351

One of them carries the original 5´-end of the RNA strand and has a newly-formed 352

ribonucleotide 3´ (2´)-monophosphate at its 3´-end (denoted F1). The other one carries the 353

original 3´-end of the RNA strand and has a newly-formed hydroxyl at its 5´-end (denoted T8). 354

b, The mass sum of any one-cut fragment pair, e.g., mass sum of F2 and T7 or of F1 and T8, is 355

constant and is equal to the mass of the 9 nt RNA plus the mass of a water molecule. Since the 356

mass sum is unique to each RNA sequence/strand, it can be used to computationally separate all 357

paired fragments of the RNA sequence/strand that were inextricable previously in the complex 358

MS datasets. c, Computational isolation of MS data for all ladder fragments derived/degraded 359

from the most abundant tRNA-Phe isoform (monoisotopic mass: 24252.2692 Da) in both the 5´- 360

and 3´-ladders out of the complex MS data of mixed samples with multiple distinct RNA strands. 361

It is then identified as the 75 nt PheIF1 with MLC-Seq. 362

363 Extended Data Fig. 4 | Ladder complementation and MLC-Seq result for tRNA-Glu 364

extracted from mouse liver. a, Ladder complementation of 6 related tRNA isoforms allows 365

direct sequencing of a wild-type tRNA-Glu sample enriched from mouse liver. Before ladder 366

complementation, each of the six isoforms of tRNA-Glu could not provide a perfect MS ladder 367

and thus could not be sequenced by previous MS methods. After ladder complementation, the 368

missing MS ladder fragments were filled in, allowing for direct sequencing of all 6 tRNA 369

isoforms (Fig. S3). All 3′-ladders were converted to 5′- to keep the positions consistent. b, MLC-370

Seq of tRNA-Glu extracted from mouse liver. I, MLC-Seq results of wild-type tRNA-Glu, 371

showing all the RNA modifications within their full-length tRNA sequence context as well as the 372

stoichiometry of each modification site-specifically. II, Identity, position, and stoichiometry of 373

each modification in wild-type tRNA-Glu. III, Stoichiometric changes of m1A after AlkB-374

treatment. There was 68% m1A at position 57 (with 32% A) in the wild-type tRNA-Glu. All the 375

b

G : G´

69% : 31%

A

100%

1757

A : m1A

32% : 68%

1757

C C A

53

75

57

AAGGGACU

G G C C CU U

G

CUm5Um

GGGm5C

GG

C

CCGCCA

CUCUC

GCGGC

UUAGGADDG

U GA

U C U GG

U

m2GUCCCU

G

G

48

20

6

17

tRNA-Glu

Wild-Type

G / G´

A / m1A

C

m5C / m2C

Proposed G´

346.0314 Da

c

G

345.0474 Da

III

AlkB-

Treatment

Position Base

6 m2G 100

48 m5C : m2C

49 m5C 100

53 m5Um 100

57 A : m1A 32 : 68

19 D 100

20 D 100

Percentage (%)

17 G : G´ 44 : 56#

83 : 17#

Wild-TypeI IIIII

14 16 18 20 22 24

12

14

16

GU

345.0515 Da, G

G DD

A GG

A

Position

Rete

ntion T

ime (

min

)

Mass (Da)

G

G´

1.0x103

2.5x103

6.3x103

1.6x104

4.0x104

1.0x105

Inte

nsity

4800 5800 6800 7800... ...

... ...

346.0589 Da, G´

5´-Ladder

NH

NH

N

O

ON

O

OHO

PO

O

OH

NH

N

N

O

NH2N

O

OHO

PO

O

OH

I II

5´

3´

U+C+C C U G U

G+G

UC U A G U D D AGG A U UC GG CG U CU C AC C GCC GC G G C GG G

m5Um+U

C G U C C G G UC A GG GA+AU CCA

m2G

G G C

m5C

m5C*

m1A

C

Position

25000|

0|

20000|

15000|

10000|

5000|

GluW1 - 5´-Ladder

GluW2 - 5´-Ladder

GluW6 - 5´-Ladder Converted From 3´-Ladder


GluW3 - 5´-Ladder

GluW5 - 5´-Ladder



GluW4 - 5´-Ladder


tRNA-Glu Wild-Type

5´ 3´

a

GluW6 - 5´-Ladder


0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75

Mass (Da)

#

m1A is demethylated to become A at position 57. c, MLC-Seq leading to the discovery of a novel 376

G′ nucleotide at position 16 of tRNA-Glu. I, Structure and unit mass of nucleotide G in MLC-377

Seq of RNA (exact unit mass: 345.0474 Da). II, Proposed structure of the newly discovered 378

nucleotide G′ (exact unit mass: 346.0314 Da) at position 16, where an -NH is substituted by -O 379

(see partial structure in red). III, MLC-Seq showing that when reading the 5′ ladder of the tRNA 380

in the 2D mass-tR plot, a new ladder started to branch out at position 16, and the mass differences 381

of ladder fragments between position 16 and 17 are 345.0515 Da and 346.0589 Da, indicating a 382

G co-existing with G′ at position 16. 383

*The missing ladder fragment at position 53 is due to the methylation on the 2′-hydroxyl group 384

of m5Um at position 53 that blocks acid degradation (Fig. S2a). 385

386

#At position 17, the ratio presented is based on the dataset that provides the most continuous 387

ladder fragments. However, this ratio is not uniformly supported by all the datasets. m2C (a 388

dimethylcytidine) at position 48 was determined from the AlkB-treated sample, not the wild-type 389

sample due to the low abundance of ladder fragments for the latter (Fig. S2b). 390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406

407

408

409

410

411

412

413

414

415

416

417

418

419

420

421

Methods 422

Reagents and Chemicals 423

All chemicals were purchased from commercial sources and used without further purification. 424

tRNA (phenylalanine specific from brewer’s yeast) was obtained from Sigma-Aldrich (St. Louis, 425

Missouri, USA). Formic acid (98–100%) was purchased from Merck (Darmstadt, 426

Germany). 1,1,1,3,3,3-hexafluoro-2-propanol (HFIP, 99%) and N,N-diisopropylethylamine 427

(DIPEA) (99%) were purchased from Thermo Fisher Scientific (Waltham, MA, USA). All other 428

chemicals were obtained from Sigma-Aldrich unless indicated otherwise. 429

430

Mouse liver tRNA pulldown 431

The tRNAs were purified by an affinity pulldown assay combined with gel recovery, with 432

modified protocols from the previous report1. The total RNA of mouse liver was harvested by 433

TRIzolTM reagent (InvitrogenTM 15596026) as the manufacturer instructed. The concentration 434

of total RNAs solution was adjusted to 2mg/ml by RNase-free water. Small RNAs fraction (<200 435

nt) were separated by the buffer containing 50%(w/v) PEG 8000 and 0.5M NaCl solution via 436

centrifuging at 12000rpm and 4 °C for 20mins. The supernatant was collected followed by 437

adding 1/10 volume NaAc solution (InvitrogenTM). 1 ml supernatant was added with 3 ml 438

Ethanol, and 5 ul Linear Acrylamide (InvitrogenTM) to precipitate small RNAs (<200 nt) with -439

20 °C overnight incubation followed by centrifugation at 12000rpm at 4 °C for 20 min. Small 440

RNA (<200 nt) solution was adjusted to 1mg/ml, 1 ml small RNA solution with 6 ul biotinylated 441

probe(100 uM), 26 ul 20× SSC solution (InvitrogenTM) and 15 ul RNase inhibitor (NEB) were 442

incubated at 50°C overnight. 200 ul Streptavidin Sepharose (Cytiva 17511301) was added to the 443

hybridization solution to enrich the biotin labeled probe that captured with the targeted tRNA. 444

After incubation at room temperature for 30mins, the Streptavidin Sepharose was transferred to 445

the 1.5 ml Ultrafree-MC tube (Millipore) and washed by 0.5× saline-sodium citrate (SSC) buffer, 446

the washing step was repeated five times. 500 ul nuclease-free water was added to the MC tube 447

and incubated at 70°C for 15 min followed by centrifugation at 2500g at room temperature for 1 448

min to elute the RNAs that are complementary to the biotinylated probe. The eluent was 449

collected followed by adding 1/10 volume NaAc solution (InvitrogenTM). 1 ml eluent was added 450

with 3 ml Ethanol, and 5 ul Linear Acrylamide (InvitrogenTM) to precipitate RNAs with -20 °C 451

overnight incubation followed by centrifugation at 12000rpm at 4 °C for 20 min. Nuclease-free 452

water was added to dissolve the RNAs pallets. RNAs were loaded into the 7M Urea-PAGE gel to 453

run electrophoresis, the main tRNA band was recovered from the PAGE gel as previously 454

described2 to obtain enriched tRNAs for MLC-Seq. The DNA probes for pull down experiments 455

were synthesized by IDT and the sequences were listed below: 456

tRNA-Glu pulldown probe: 5´-Biotin-CTAACCACTAGACCACCAGGGA. 457

tRNA-Gln pulldown probe: 5´-Biotin-TGGAGGTTCCACCGAGATTTGA. 458

459

Northern Blot 460

A Northern Blot was performed as previously described3 to validate capture of tRNAs (as 461

described above). RNA was separated by 10% Urea-PAGE gel stained with SYBR Gold, and 462

immediately imaged, then transferred to positively charged Nylon membranes (Roche), and UV 463

crosslinked with an energy of 0.12 J. Membranes were pre-hybridized with DIG Easy Hyb 464

solution (Roche) for 1h at 42 °C. To detect tRNAs, membranes were incubated overnight (12–465

16h) at 42 °C with DIG-labelled oligonucleotides probes synthesized by Integrated DNA 466

Technologies (IDT). The membranes were washed twice with low stringent buffer (2× SSC with 467

0.1% (wt/vol) SDS) at 42 °C for 15 min each, rinsed twice with high stringent buffer (0.1× SSC 468

with 0.1% (wt/vol) SDS) for 5 min each, and rinsed in washing buffer (1× SSC) for 10 min. 469

Following the washes, the membranes were transferred into 1× blocking buffer (Roche) and 470

incubated at room temperature for 3 h, after which the Anti-Digoxigenin-AP Fab fragments 471

(Roche,) was added into the blocking buffer at a ratio of 1:10,000 and incubated for an additional 472

30mins at room temperature. The membranes were washed four times with DIG washing buffer 473

(1× maleic acid buffer, 0.3% Tween-20) for 15 min each, followed by incubation in DIG 474

detection buffer (0.1 M TrisHCl, 0.1 M NaCl, pH 9.5) for 5 min, and then coated with CSPD 475

ready-to-use reagent (Roche), incubated in the dark for 30 min at 37 °C before imaging with 476

ChemiDoc™ MP Imaging System (BIO-RAD). Digoxigenin labeled Northern blot probe for 477

tRNA detection were synthesized by IDT and the sequence was listed below: 478

tRNA-Glu Norther blot probe: 5´-DIG-CTAACCACTAGACCACCA. 479

tRNA-Gln Northern blot probe: 5´-DIG-TGGAGGTTCCACCGAGATTT. 480

481

Treatment of tRNA with AlkB 482

200 ng tRNA was incubated in a 50 μl reaction mixture containing 50 mM Na-HEPES, pH 8.0 483

(Alfa Aesar), 75 μM ferrous ammonium sulfate (pH 5.0), 1 mM α-ketoglutaric acid (Sigma 484

Aldrich), 2 mM sodium ascorbate, 50 μg/ml BSA (Sigma-Aldrich), 2.5 μl RNase inhibitor 485

(NEB), and 200 ng AlkB enzyme at 37 °C for 30 min (the recommended mass ratio of AlkB 486

enzyme to RNA is 1:1). The mixture was added to 500 μl TRIzolTM reagent to perform the RNA 487

isolation procedure as per manufacturer instructions. 488

489

Workflow of de novo sequencing of tRNA isoform mixtures 490

A given tRNA sample is firstly divided into two portions; 1/10 is injected directly into LC-MS 491

while the remaining 9/10 is subjected to acid-degradation before LC-MS injection (Fig. 1). After 492

outputting LC-MS datasets of the control portion and acid-degraded portions, novel algorithms 493

we developed (uploaded to Github (https://github.com/rnamodifications/MLC-Seq)) were used 494

to identify each tRNA species or isoform, and computationally separate its MS ladder from the 495

LC-MS data of a given tRNA sample for MLC-Seq. 496

497

Since MS-based sequencing techniques rely on a unique mass value for identifying and locating 498

each nucleotide, in the case where modifications have isomers with identical masses but different 499

chemical structures such as ψ versus U and methylation at different base positions, an extra step 500

is required to differentiate these isomeric nucleotide modifications following our MS sequencing 501

approach as described previously when sequencing a tRNA-Phe sample4. To differentiate 502

different methylations in the tRNA-Gln and tRNA-Glu, we included an additional step in the 503

MLC Seq workflow by treating the tRNA-Gln and tRNA-Glu samples with AlkB, which is 504

known to demethylate tRNA-rich modifications such as m3C, m1A, and m1G2. 505

506

Controlled Acid Degradation of tRNA Samples 507

Formic acid was applied to degrade tRNA samples, including tRNA-Phe (Sigma-Aldrich, St. 508

Louis, Missouri, USA), tRNA-Gln and tRNA-Glu from mouse live (see Sections of tRNA 509

sample preparation), to produce mass ladders according to reported experimental protocols4-7. In 510

brief, we divided each RNA sample solution into three equal aliquots of 5 μl (each in a RNase-511

free, thin walled 0.2 ml PCR tube) for formic acid degradation, each using 50% (v/v) formic acid 512

at 40 °C in a PCR machine, with one reaction running for 2 min, one for 5 min, and one for 15 513

min. The reaction mixtures were each immediately frozen on dry ice at the specified times, 514

followed by centrifugal vacuum concentration (Labconco Co., Kansas City, MO, USA) to 515

dryness, which was typically completed within 30 min. The dried samples of three different time 516

points for each specific tRNA sample were then combined and suspended in 20 μl nuclease-free, 517

deionized water for LC-MS measurement. 518

519

LC-MS Measurement of Intact tRNA and Resulting Acid-Degraded tRNA Samples 520

Each combined acid-hydrolyzed tRNA sample was individually analyzed on an Orbitrap 521

Exploris 240 mass spectrometer (ThermoFisher Scientific, Bremen, Germany) coupled to a 522

Vanquish Horizon UHPLC using a DNAPac reversed-phase (RP) column (2.1 mm x 50 mm, 523

ThermoFisher Scientific, Sunnyvale, California, USA) with 2% HFIP and 0.1% DIPEA as eluent 524

A, and methanol, 0.075% HFIP, and 0.0375% DIPEA as eluent B. A gradient of 20% to 80% B 525

over 6.7 minutes was used for the analysis of intact RNA samples, and from 15% to 35% B over 526

20 min for acid-degraded samples. The flow rate was 0.2 or 0.4 ml/min, and all separations were 527

performed with the column temperature maintained at 70 °C. Injection volumes were 5–25 μl 528

and sample amounts were 20–200 pmol of tRNA-Phe and 1-10 pmol (or ~20 ng) of tRNA-Glu 529

and tRNA-Gln. tRNA samples were analyzed in negative ion full MS mode from 410 m/z to 530

3200 m/z with a scan rate of 2 spectra/s at 120k resolution at m/z 200. The data were processed 531

using the Thermo BioPharma Finder 4.0 (ThermoFisher Scientific), and a compound detection 532

workflow with a deconvolution algorithm was used to extract relevant spectral and 533

chromatographic information from the LC-MS experiments as described previously4-7. 534

535

Homology Search 536

Once LC-MS data is displayed as a two dimensional (2D) mass-retention time (tR) plot, we start 537

a homology search of intact tRNAs in the monoisotopic mass range of >~24k Dalton using an in-538

house developed algorithm in Python (see Github) to first identify related tRNA isoforms that 539

may share the same ancestral precursor tRNA, but are different in absolute sequence, e.g., in 540

posttranscriptional profiles of nucleotide modifications, editing, and truncations. Mass 541

differences between two intact tRNA isoforms are calculated and matched to the known mass of 542

each nucleotide or nucleotide modification in the database5. For example, known mass 543

differences between intact tRNAs of 14.0157 Da and 329.0525 Da (with parts per million (ppm) 544

difference <10 ppm)8 can be assigned to a methylation (Me/-CH2-) event and an additional A 545

nucleotide, respectively. Therefore, these intact tRNAs are assigned to the same tRNA group and 546

considered as homologous isoforms of a specific tRNA for sequencing together. 547

548

The homology search is a non-target pre-selection to group possible related tRNA isoforms 549

together for sequencing. However, only one monoisotopic mass difference of two intact masses 550

is used to identify the tRNA isoforms differing, e.g., by RNA editing/modifications and/or 3′-551

CCA truncations. Thus, there may be errors when grouping a tRNA isoform that does not belong 552

to this specific tRNA group or vice versa, missing a tRNA isoform when cataloging a group. 553

These errors can be fixed later when sequencing each group of tRNA isoforms, and sequencing 554

results can further verify the inter-connection between isoforms. 555

556

Detection of all acid-labile nucleotide modifications 557

Acid-labile nucleotides are identified using another algorithm in Python (see Github). The 558

algorithm analyzes the connections between the compounds (with a monoisotopic mass >24 K 559

Dalton for tRNAs) measured by LC-MS before acid degradation and the LC-MS-detected 560

compounds after acid degradation. For each such a compound pair, if the monoisotopic mass 561

difference can be matched to a known mass difference (calculated from the possible structural 562

change to a specific nucleotide modification during acid hydrolysis), or be matched to the mass 563

difference sum of a subset of different acid-labile nucleotide modifications, the compound pair 564

will be selected and further considered as potentially containing acid-labile nucleotide 565

modifications. In general, if the intact mass of an RNA species does not change after acid 566

degradation, this intact mass will be used for MassSum data separation (See below). Otherwise, 567

an acid-labile nucleotide may be identified by matching to the observed mass difference with the 568

theoretical mass difference caused by acid-mediated structural changes of the nucleotide (See 569

Extended Data Fig. 2). 570

571

5´- and 3´-ladder separation 572

The tR differences can be used to further computationally separate these two ladders (5´- and 3´-573

ladders), breaking two adjacent sigmoidal curves into two isolated curves: one for the 3´- and the 574

other for the 5´-ladder. Due to the large number of RNA/fragment compounds, the dividing line 575

between two subsets of 5´- and 3´-ladder fragments is not visually decisive in the 2D mass-tR 576

plot. Thus, we developed a computational tool (see Github) to separate the 5´- and 3´-fragments. 577

We roughly divide all compounds in each LC-MS data pool into two subgroup areas by circling 578

compounds in the top collective curve of the 2D mass-tR plot and marking the compounds as 5´-579

ladder fragment compounds, while the compounds in the bottom curve are marked as 3´-ladder 580

fragment compounds. The purpose of selecting the top area is to include as many 5´-fragment 581

compounds as possible, while minimizing the 3´-fragments in this area. Similarly, the purpose 582

of selecting the bottom area is to include as many 3´-fragment compounds as possible, while 583

minimizing the 5´-fragments in this area. Overlap between two selected ladder subgroups is 584

inevitable, due to limited tR differences between these two subgroups. The aim of the manual 585

selection step is not to separate the 5´- and 3´-fragments from each other completely, but to serve 586

as two input ladder fragments for MassSum algorithm (See below) to output 5´- and 3´-ladder 587

fragments separately for each tRNA isoform/species. 588

589

MassSum data separation 590

MassSum is an algorithm in Python (see Github) developed based upon the acid degradation 591

principle presented in Extended Data Fig. 3. Taking advantage of the fact that each fragmented 592

pair from two ladder groups (5´- and 3´-ladders) sums to a constant mass value that is unique to 593

each specific tRNA isoform/species, MassSum can isolate ladder compounds corresponding to a 594

specific tRNA isoform. MassSum simplifies the dataset by grouping MS ladder components into 595

subsets for each tRNA isoform/species based on its unique intact mass. Since the well-controlled 596

acid degradation reaction cleaves RNA oligonucleotides at one specific site of the 597

phosphodiester bond (on average, one cut per RNA5) the masses of two RNA fragments (Mass 3´-598

portion and Mass 5´-portion) from the same strand sum to a constant value (Masssum). 599

𝑀𝑎𝑠𝑠!!"#$%&'$( +𝑀𝑎𝑠𝑠)!"#$%&'$( = 𝑀𝑎𝑠𝑠'(&*+& +𝑀𝑎𝑠𝑠,"- = 𝑀𝑎𝑠𝑠./0 (1) 600

Taking advantage of this unique mass sum between the paired 3´- and 5´-ladder fragments 601

(Equation 1), the algorithm chooses two random compounds from the acid-degraded LC-MS 602

dataset and adds their mass values together, one pair at a time. If the sum of the two selected 603

compounds is equal to a specific Masssum, these two compounds will be selected into the pool for 604

each RNA accordingly. The process repeats until all compound pairs have been inspected. In the 605

end, MassSum will cluster the dataset into different groups; each group is a subset that contains 606

3´- and 5´-ladders of one specific RNA sequence. 607

608

GapFill 609

GapFill is another Python-based algorithm (see Github) developed as a complement to 610

MassSum. Since MassSum handles compounds in pairs, in the case that one ladder fragment is 611

missing, e.g., in the 5´-ladder, the corresponding single-cut ladder fragment, even if it exists in 612

the 3´-ladder, will not be separated/called by the MassSum algorithm. In order to extract all 613

ladder fragments from the complex MS data, a GapFill algorithm was designed to “rescue” any 614

ladder fragments missed by MassSum separation. GapFill identifies a gap that has ladder 615

fragments missing between ladder fragment compounds, e.g., two adjacent compounds with 616

Mass values Mass5´-i and Mass5´-j in the 5´-ladder which were found by MassSum algorithm. 617

Within the gap, there exist many ladder compounds in the degraded LC-MS dataset, but 618

presumably none were selected by the MassSum algorithm during data separation. GapFill 619

iterates over each potential compound in the gap in the original LC-MS dataset, and examines 620

the mass differences of this compound and the two ending compounds, Mass5´-i and Mass5´-j. If 621

the mass difference is equal to the sum of one or more nucleotides or modifications in the RNA 622

modification database5, we define it as a connection. If the compound in the gap has connections 623

with both ending compounds, this compound would be selected into a candidate pool for the later 624

sequencing process. After iteration, GapFill calculates connections of the compounds in the 625

candidate pool and assigns weights to them based on the frequency of each connection. The 626

compounds that contain the highest weights would be the ones chosen to fill in the gap. 627

628

Ladder complementation and generation of RNA sequences 629

After MassSum and GapFill, each tRNA isoform has its own set of separate 5´-and 3´-ladders 630

(not combined). Each ladder (5´- or 3´-) consists of a ladder sequence, and we can determine if 631

these ladders are perfect (without missing any ladder fragments), which would allow reading of 632

the full RNA sequence (from the first to the last nucleotide in the sequence). If not, we can 633

complement ladders from other related isoforms in order to obtain a more complete ladder for 634

sequencing (ideally no missing ladder fragment or as complete as possible). A Python-based 635

computational algorithm (see Github) was designed to align ladders from related isoforms based 636

on the position of the ladder fragment in the 5´à3´ direction. For example, we lay out each 637

tRNA-Phe isoform’s full 5´-ladder, e.g., the 5´-ladder in Fig. 1b, on top of each other vertically; 638

horizontally, we align the 5´- ladder of each isoform according to the position of each 639

corresponding ladder fragment, ranging from nucleotide position 1 to nucleotide position 76 for 640

tRNA-Phe. Ladder complementation can be performed separately on 5´- or 3´-ladders separately 641

(but not mixed ladders), resulting in one final 5´-ladder or one final 3´-ladder. Additionally, all 642

3´-ladder fragments can be converted to their corresponding 5´-ladder fragments for each tRNA 643

isoform based on the MassSum principle. As such, each tRNA isoform could have two 5´-ladder 644

fragments in each position on the 5´-ladder: one original 5´-ladder fragment, and a second ladder 645

fragment that was converted from its corresponding 3´-ladder fragment, for reaffirmation and/or 646

complementation. 647

648

Stoichiometric quantification of partial nucleotide modifications/editing 649

Stoichiometries/percentages of partial nucleotide modifications/editing were quantified 650

according to reported protocols4,6,7. In brief, to accurately determine the ratio of partial 651

modification/editing, from datasets of multiple experimental trials (n ≥ 3) of a given tRNA 652

sample, 3 pairs of ladder fragments (one in the original ladder, and the other in the branched 653

ladder) were taken among the partial modification position (first nucleotide after the branch point 654

where the partial modification was observed) or otherwise its immediate next ladder fragment in 655

the same ladder in case the ladder fragment at the partial modification position does not exist, 656

e.g., due to methylation on the 2′-hydroxyl group of Cm that blocks acid degradation. Mean ratio 657

and standard deviation were calculated. 658

659

660

661

662

663

References: 664

665

666

667

1 Drino, A. et al. Production and purification of endogenously modified tRNA-derived 668

small RNAs. RNA Biol 17, 1104-1115, doi:10.1080/15476286.2020.1733798 (2020). 669

2 Shi, J. et al. PANDORA-seq expands the repertoire of regulatory small RNAs by 670

overcoming RNA modifications. Nat Cell Biol 23, 424-436, doi:10.1038/s41556-021-671

00652-7 (2021). 672

3 Zhang, Y. et al. Dnmt2 mediates intergenerational transmission of paternally acquired 673

metabolic disorders through sperm small non-coding RNAs. Nat Cell Biol 20, 535-540, 674

doi:10.1038/s41556-018-0087-2 (2018). 675

4 Zhang, N. et al. Direct Sequencing of tRNA by 2D-HELS-AA MS Seq Reveals Its 676

Different Isoforms and Dynamic Base Modifications. ACS Chem Biol 15, 1464-1472, 677

doi:10.1021/acschembio.0c00119 (2020). 678

5 Bjorkbom, A. et al. Bidirectional direct sequencing of noncanonical RNA by two-679

dimensional analysis of mass chromatograms. J Am Chem Soc 137, 14430-14438, 680

doi:10.1021/jacs.5b09438 (2015). 681

6 Zhang, N. et al. A general LC-MS-based RNA sequencing method for direct analysis of 682

multiple-base modifications in RNA mixtures. Nucleic Acids Res 47, e125, 683

doi:10.1093/nar/gkz731 (2019). 684

7 Zhang, N., Shi, S., Yoo, B., Yuan, X., Li, W., Zhang, S. . 2D-HELS MS Seq: a General 685

LC-MS Based Method for Direct and de novo Sequencing of RNA Mixtures with 686

Different Nucleotide Modifications. J. Vis. Exp. e61281 (2020). 687

8 Brenton, A. G. & Godfrey, A. R. Accurate mass measurement: terminology and treatment 688

of data. J Am Soc Mass Spectrom 21, 1821-1835, doi:10.1016/j.jasms.2010.06.006 689

(2010). 690

691

692

Supplementary Files

This is a list of supplementary �les associated with this preprint. Click to download.

SupportingInformation�nal.docx

zhangepc�at.pdf

https://assets.researchsquare.com/files/rs-1090754/v1/c1f274e12a3a19eae4912660.docx

https://assets.researchsquare.com/files/rs-1090754/v1/b562b1a11a080769419c41b7.pdf

mlc-seq: de novo sequencing of full-length trnas and

Documents