    Automation of Web Applications and Iterative Searching for Post-Translational Modifications

    Simon Chiang & Kirk C. Hansen

    Biomolecular Structure Program, University of Colorado Denver

    Introduction

    The past several years have seen a proliferation of analytical tools for proteomics data. Several major search engines exist, and proteomics toolkits are now available in many programming languages. The next challenge for the proteomics community will be finding ways to use these tools more effectively.

    Studies have shown, for instance, that searching MS/MS proteomics data using multiple search engines increases the number and quality of peptide/protein identifications[1]. Variations of this approach, including iterative searching[2], also hold promise for improving search results but, as a practical matter, these techniques require automation. A significant barrier to automation is simply interfacing with the various analytical tools, the vast majority of which are online.

    Web applications provide a fairly standard interface to humans, the web form, but typically they do not provide a programmatic interface. Moreover, analytical applications can be quite complex; most require a large number of configurations whose allowable values are hard to predict. As a result, web applications can be difficult to automate. Even with a program that emulates a web form, the work required to generate configurations is prohibitive.

    Tap-Mechanize facilitates the automation of web applications by providing a simple and robust way to capture configurations directly from web forms. Once captured, the configurations may be resubmitted programmatically, or used as a template to run the application in a batched fashion. Using Tap-Mechanize, most web applications are easy to incorporate into automated workflows.
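
    The idea of using a captured configuration as a template for batched runs can be sketched with Ruby's standard library alone. This is a minimal illustration, not the Tap-Mechanize API: the endpoint URL, the field names (database, enzyme, file) and the helper build_requests are all hypothetical, and the requests are only constructed, never sent.

```ruby
require "net/http"
require "uri"

# Build one POST request per set of overrides, starting from a shared template.
# The template stands in for a configuration captured from a web form.
def build_requests(endpoint, template, overrides_list)
  uri = URI(endpoint)
  overrides_list.map do |overrides|
    request = Net::HTTP::Post.new(uri)
    # merge returns a new hash, so the template itself is left unchanged
    request.set_form_data(template.merge(overrides))
    request
  end
end

# Hypothetical captured configuration and input files
template = { "database" => "SwissProt", "enzyme" => "Trypsin" }
inputs   = [{ "file" => "run_01.mgf" }, { "file" => "run_02.mgf" }]

requests = build_requests("http://search.example.org/submit", template, inputs)
requests.each { |request| puts request.body }
```

    Each request carries the full template plus the per-run overrides, so only the values that differ between runs need to be listed.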

    We have used Tap-Mechanize to automate several workflows related to data preparation and processing, and to experiment with iterative searching. Iterative searching can take many forms. The most basic type simply partitions spectra by the strength of their identifications, then re-searches the weak or unidentified spectra using additional techniques. This type of searching is thought to be useful when analyzing proteins with numerous post-translational modifications (PTMs).
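
    The partitioning step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the hit records, their field names and the peptide sequences are invented for the example, and the default threshold assumes that the "exp > 0.05" noted on the workflow figure refers to an expectation value above which an identification counts as weak.

```ruby
# Split hits into strong and weak groups by expectation value.
# A missing expectation value means the spectrum was not identified at all,
# so it is grouped with the weak identifications for re-searching.
def partition_spectra(hits, threshold: 0.05)
  weak, strong = hits.partition do |hit|
    hit[:expect].nil? || hit[:expect] > threshold
  end
  { strong: strong, weak: weak }
end

# Hypothetical best hits from a primary (non-PTM) search
hits = [
  { spectrum: "scan_1", peptide: "GPAGPQGPR", expect: 0.001 },
  { spectrum: "scan_2", peptide: "GEAGPSGPAGPTGAR", expect: 0.2 },
  { spectrum: "scan_3", peptide: nil, expect: nil }
]

result = partition_spectra(hits)
puts "strong: #{result[:strong].map { |hit| hit[:spectrum] }.join(', ')}"
puts "weak:   #{result[:weak].map { |hit| hit[:spectrum] }.join(', ')}"
```

    Only the weak group would be passed on to the secondary search.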

    One such protein is collagen. Collagen consists primarily of a GXY repeat where X and Y can be any amino acid. Normally X is proline and Y is hydroxyproline, meaning collagen is modified at approximately every third residue. As a consequence, hydroxylation of proline must be specified as a PTM to identify the majority of collagen peptides. These peptides are physiologically very relevant; hydroxyproline allows collagen molecules to wrap into a tight triple helix and ultimately stabilizes collagen fibrils. In the absence of hydroxylation, collagen degrades easily and the disease scurvy results.

    Using rat tail collagen as a sample, we explored the consequences of using iterative searching to identify PTMs, in particular the effects of partitioning spectra on the false discovery rate (FDR).

    Abstract

    Data-intensive fields like proteomics require researchers to interact with a wide variety of software that is increasingly web-based. Web applications pose special challenges to programmers seeking to automate their execution. Although web applications provide a relatively standard interface to users, the programmatic interfaces span numerous protocols and frequently do not exist at all.

    We present Tap-Mechanize, a system to easily capture the output of web forms for resubmission using a standard programmatic interface. By capturing web forms into a standard format, Tap-Mechanize enables many web applications to be used in automated workflows. Such workflows drastically reduce the time required to analyze large datasets, facilitate reproducibility, and enable more complicated techniques to be used during analysis.

    We have used Tap-Mechanize to implement iterative searching of MS/MS proteomics data. Iterative searching uses a quick, general search to filter out spectra of unmodified peptides, and then performs more time-consuming searches on the remaining spectra. Using iterative searching we are able to identify peptides with post-translational modifications (PTMs) that normally are missed. These peptides are of particular interest because PTMs frequently regulate the function of proteins, and are implicated in many disease states.

    References

    1. Searle, B.C., Turner, M. & Nesvizhskii, A.I. Improving sensitivity by probabilistically combining results from multiple MS/MS search methodologies. J Proteome Res 7, 245-53 (2008).

    2. Nesvizhskii, A.I. et al. Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides. Mol Cell Proteomics 5, 652-70 (2006).

    3. Tap Website

    4. Mechanize

    5. Ubiquity

    Implementation

    Tap-Mechanize is written in the programming language Ruby and uses two distinct libraries, Tap and Mechanize. Tap (Task Application[3]) is a software framework that we developed to standardize our interaction with diverse software tools, and to facilitate the construction of automated workflows. Mechanize[4] is a library that emulates human interactions with web forms and, although we use the Ruby version, originates from the Perl open-source community.

    Tap-Mechanize captures configurations by redirecting the HTTP output of a web form to a local server that parses the request into a configuration file. The redirection occurs via JavaScript that rewrites the action of the form upon submission. Multiple page requests, requests across HTTPS, and page requests using links may all be captured using this method.
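
    As a rough illustration of the parsing step, the sketch below turns a URL-encoded form submission into a YAML configuration using only Ruby's standard library. It is not the Tap-Mechanize implementation: the helper capture_config, the form field names and the YAML layout are assumptions made for the example, and a real capture would also need to record the form's original action URL and handle multipart uploads.

```ruby
require "uri"
require "yaml"

# Parse a URL-encoded form body into a hash.
# Fields that appear more than once (e.g. multi-select lists) become arrays.
def capture_config(form_body)
  URI.decode_www_form(form_body).each_with_object({}) do |(key, value), config|
    if config.key?(key)
      config[key] = Array(config[key]) << value
    else
      config[key] = value
    end
  end
end

# Hypothetical body of a redirected search-form submission
body = "database=SwissProt&enzyme=Trypsin" \
       "&mods=Oxidation+%28M%29&mods=Hydroxylation+%28P%29"

config = capture_config(body)
yaml   = config.to_yaml
puts yaml
```

    Storing the result as YAML keeps the captured configuration human-readable, so individual values can be edited before resubmission.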

    The redirection script is injected into the DOM using the Firefox plugin Ubiquity[5].

    Redirection from other browsers is currently not supported.

    Discussion

    Our experiments using Tap-Mechanize to iteratively search a collagen sample for hydroxylation of proline illustrate that the partitioning of search results can inflate false discovery rates (FDRs). The effect is purely mathematical in nature. During the non-PTM search, the modified peptides are unidentified and therefore absent from the denominator of the FDR equation; as a result the decoy hits have a disproportionately high effect and the FDR increases. During the subsequent PTM search, the unmodified peptides are now absent from the denominator and, again, the FDR increases.
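
    The arithmetic behind this effect can be reproduced from the hit counts in the poster's results table, using its definition FDR = Decoy Hits / Peptide Hits. The short sketch below is only a worked calculation; the helper name fdr_percent is ours, and the pooled figure at the end is a simple illustration that is not a value reported on the poster.

```ruby
# FDR as a percentage: decoy hits divided by peptide hits.
def fdr_percent(decoy_hits, peptide_hits)
  (100.0 * decoy_hits / peptide_hits).round(2)
end

# Hit counts taken from the results table
searches = {
  "Primary (non-PTM)"    => { peptide_hits: 49,  decoy_hits: 1 },
  "Secondary (PTM)"      => { peptide_hits: 326, decoy_hits: 2 },
  "Non-Iterative Search" => { peptide_hits: 373, decoy_hits: 2 }
}

rates = searches.transform_values do |counts|
  fdr_percent(counts[:decoy_hits], counts[:peptide_hits])
end
rates.each { |name, rate| puts format("%-22s %.2f%%", name, rate) }

# Illustration only: pooling the two iterative searches into one denominator
pooled = fdr_percent(1 + 2, 49 + 326)
puts format("%-22s %.2f%%", "Pooled iterative", pooled)
```

    A single decoy hit among 49 peptide hits already gives about 2%, whereas two decoy hits among several hundred peptide hits stay well under 1%, which is the denominator effect described above.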

    In this example, the lowest FDRs were observed when searching for the modified and unmodified peptides together, without iterative searching. However, the combined number of identifications from the non-PTM and PTM searches was greater than the total without iterative searching.

    Collagen has an unusually high rate of modification and the observed effect should be less severe for most proteins. Moreover, this experiment does not prove or disprove the utility of iterative searching. Mostly it illustrates that the partitioning step used to select spectra for secondary searching must be executed carefully, and that there is great utility in exploring search results under many conditions.

    Without automation it is difficult to pursue studies such as this. Tap-Mechanize helps researchers to automate web applications by capturing configurations from web forms and resubmitting them within workflows. This technique preserves the functionality built into the web interface. Moreover, it allows web applications to be used as-is, without requiring developers to provide a separate programmatic interface.

    At the most basic level, automation allows researchers to be more productive. More significantly, automation gives researchers an opportunity to examine how their tools work. Analytical software is complex; each configuration is meaningful, even though the exact consequences of a configuration are often unclear. The same can be said of the many numeric results produced during analysis. It is, as always, through trial and error that we enrich our appreciation of what our results mean.

    [Diagram label: Tap-Mechanize + Redirect]

    Automated Analytical Workflow

    0. Input data
    1, 2. Search using Mascot, export results
    3, 4. Search using GPM, export results
    5. Generate intersection of results
    6. Map result accession numbers using PIR
    7. Generate graphic using GoGetter

    [Figure: GoGetter chart. Gene Ontology, GO Slims: Biological Process - Weighted (Dataset 1 name)]

    Biological process (go:0008150) (16.35%)
    Cellular process (go:0009987) (16.18%)
    Macromolecule metabolic process (go:0043170) (15.98%)
    Metabolic process (go:0008152) (15.70%)
    Nucleobase, nucleoside, nucleotide and nucleic aci..
    Cell communication (go:0007154) (8.73%)
    Regulation of biological process (go:0050789) (6.46%)
    Transport (go:0006810) (4.14%)
    Response to stimulus (go:0050896) (2.48%)
    Multicellular organismal development (go:0007275) (2
    Biosynthetic process (go:0009058) (0.67%)
    Cell differentiation (go:0030154) (0.56%)
    Cell death (go:0008219) (0.48%)
    Electron transport (go:0006118) (0.48%)
    Secretion (go:0046903) (0.33%)
    Membrane fusion (go:0006944) (0.33%)

    Iterative Search Workflow

    0. Input data
    1, 2. Search without PTMs, export results
    3. Partition spectra by identification (strong / weak)
    4, 5. Search weak/unidentified spectra for PTMs
    6. Collate results

    Web applications are in green. Partition threshold: exp > 0.05

    [Figure: CO1A2_RAT sequence coverage. 2% after primary search without PTMs; 52% after secondary search for PTMs]

    Search                  N Spectra   Peptide Hits   Decoy Hits   FDR (%)   FDR = Decoy Hits / Peptide Hits
    Primary (non-PTM)       1293        49             1            2.04      1/49
    Secondary (PTM)         1293        326            2            0.61      2/326
    Non-Iterative Search    1244        373            2            0.54      2/373