iLAP: a workflow-driven software for experimental protocol development, data acquisition and analysis

Download iLAP: a workflow-driven software for experimental protocol development, data acquisition and analysis

Post on 30-Sep-2016

212 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

<ul><li><p>ral</p><p>ssBioMed CentBMC Bioinformatics</p><p>Open AcceSoftwareiLAP: a workflow-driven software for experimental protocol development, data acquisition and analysisGernot Stocker1, Maria Fischer1, Dietmar Rieder1, Gabriela Bindea1, Simon Kainz1, Michael Oberstolz1, James G McNally2 and Zlatko Trajanoski*1</p><p>Address: 1Institute for Genomics and Bioinformatics, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria and 2Center for Cancer Research Core Imaging Facility, Laboratory of Receptor Biology and Gene Expression, National Cancer Institute, National Institutes of Health, 41 Library Drive, Bethesda, MD 20892, USA</p><p>Email: Gernot Stocker - gernot.stocker@tugraz.at; Maria Fischer - maria.fischer@tugraz.at; Dietmar Rieder - dietmar.rieder@tugraz.at; Gabriela Bindea - gabriela.bindea@tugraz.at; Simon Kainz - simon.kainz@tugraz.at; Michael Oberstolz - michael.oberstolz@student.tugraz.at; James G McNally - mcnallyj@dce41.nci.nih.gov; Zlatko Trajanoski* - zlatko.trajanoski@tugraz.at</p><p>* Corresponding author </p><p>AbstractBackground: In recent years, the genome biology community has expended considerable effortto confront the challenges of managing heterogeneous data in a structured and organized way anddeveloped laboratory information management systems (LIMS) for both raw and processed data.On the other hand, electronic notebooks were developed to record and manage scientific data,and facilitate data-sharing. Software which enables both, management of large datasets and digitalrecording of laboratory procedures would serve a real need in laboratories using medium and high-throughput techniques.</p><p>Results: We have developed iLAP (Laboratory data management, Analysis, and Protocoldevelopment), a workflow-driven information management system specifically designed to createand manage experimental protocols, and to analyze and share laboratory data. The systemcombines experimental protocol development, wizard-based data acquisition, and high-throughputdata analysis into a single, integrated system. We demonstrate the power and the flexibility of theplatform using a microscopy case study based on a combinatorial multiple fluorescence in situhybridization (m-FISH) protocol and 3D-image reconstruction. iLAP is freely available under theopen source license AGPL from http://genome.tugraz.at/iLAP/.</p><p>Conclusion: iLAP is a flexible and versatile information management system, which has thepotential to close the gap between electronic notebooks and LIMS and can therefore be of greatvalue for a broad scientific community.</p><p>Background ments. Genome biology experiments do not only generate</p><p>Published: 26 November 2009</p><p>BMC Bioinformatics 2009, 10:390 doi:10.1186/1471-2105-10-390</p><p>Received: 1 June 2009Accepted: 26 November 2009</p><p>This article is available from: http://www.biomedcentral.com/1471-2105/10/390</p><p> 2009 Stocker et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Page 1 of 12(page number not for citation purposes)</p><p>The development of novel large-scale technologies hasconsiderably changed the way biologists perform experi-</p><p>a wealth of data, but they often rely on sophisticated lab-oratory protocols comprising hundreds of individual</p></li><li><p>BMC Bioinformatics 2009, 10:390 http://www.biomedcentral.com/1471-2105/10/390</p><p>steps. For example, the protocol for chromatin immuno-precipitation on a microarray (Chip-chip) has 90 steps,uses over 30 reagents and 10 different devices [1]. Evenadopting an established protocol for large-scale studiesrepresents a daunting challenge for the majority of thelabs. The development of novel laboratory protocols and/or the optimization of existing ones is still more distress-ing, since this requires systematic changes of many param-eters, conditions, and reagents. Such changes arebecoming increasingly difficult to trace using paper labbooks. A further complication for most protocols is thatmany laboratory instruments are used, which generateelectronic data stored in an unstructured way at disparatelocations. Therefore, protocol data files are seldom ornever linked to notes in lab books and can be barelyshared within or across labs. Finally, once the experimen-tal large-scale data have been generated, they must be ana-lyzed using various software tools, then stored and madeavailable for other users. Thus, it is apparent that softwaresupport for current biological research - be it genomic orperformed in a more traditional way - is urgently neededand inevitable.</p><p>In recent years, the genome biology community hasexpended considerable effort to confront the challenges ofmanaging heterogeneous data in a structured and organ-ized way and as a result developed information manage-ment systems for both raw and processed data. Laboratoryinformation management systems (LIMS) have beenimplemented for handling data entry from robotic sys-tems and tracking samples [2,3] as well as data manage-ment systems for processed data including microarrays[4,5], proteomics data [6-8], and microscopy data [9]. Thelatter systems support community standards likeFUGE[10,11], MIAME [12], MIAPE [13], or MISFISHIE[14] and have proven invaluable in a state-of-the-art lab-oratory. In general, these sophisticated systems are able tomanage and analyze data generated for only a single typeor a limited number of instruments, and were designedfor only a specific type of molecule.</p><p>On the other hand, commercial as well as open sourceelectronic notebooks [15-19] were developed to recordand manage scientific data, and facilitate data-sharing.The influences encouraging the use of electronic note-books are twofold [16]. First, much of the data that needsto be recorded in a laboratory notebook is generated elec-tronically. Transcribing data manually into a paper note-book is error-prone, and in many cases, for example,analytical data (spectra, chromatograms, photographs,etc.), transcription of the data is not possible. Second, theincorporation of high-throughput technologies into theresearch process has resulted in an increased volume of</p><p>user interfaces with standard report formats, electronicnotebooks contain unstructured data and have flexibleuser interfaces.</p><p>Software which enables both, management of large data-sets and recording of laboratory procedures, would servea real need in laboratories using medium and high-throughput techniques. To the best of our knowledge,there is no software system available, which supports tedi-ous protocol development in an intuitive way, links theplethora of generated files to the appropriate laboratorysteps and integrates further analysis tools. We have there-fore developed iLAP, a workflow-driven informationmanagement system for protocol development and datamanagement. The system combines experimental proto-col development, wizard-based data acquisition, andhigh-throughput data analysis into a single, integratedsystem. We demonstrate the power and the flexibility ofthe platform using a microscopy case study based on com-binatorial multiple fluorescence in situ hybridization (m-FISH) protocol and 3D-image reconstruction.</p><p>ImplementationWorkflow-driven software designThe design of a software platform that supports the devel-opment of protocols and data management in an experi-mental context has to be based on and directed by thelaboratory workflow. The laboratory workflow can bedivided into four principal steps: 1) project definitionphase, 2) experimental design and data acquisition phase,3) data analysis and processing phase and 4) data retrievalphase (Figure 1).</p><p>Project definition phaseA scientific project starts with a hypothesis and the choiceof methods required to address a specific biological ques-tion. Already during this initial phase it is crucial to definethe question as specifically as possible and to capture theinformation in a digital form. Documents collected dur-ing the literature research should be collated with theevolving project definition for later review or for sharingwith other researchers. All files collected in this periodshould be attached to the defined projects and experi-ments in the software.</p><p>Experimental design and data acquisitionFollowing the establishment of a hypothesis and based onpreliminary experiments, the detailed design of the bio-logical experiments is then initiated. Usually, the experi-mental work follows already established standardoperating procedures, which have to be modified andoptimized for the specific biological experiment. Theseprotocols are defined as a sequence of protocol steps.Page 2 of 12(page number not for citation purposes)</p><p>electronic data that need to be transcribed. As opposed toLIMS, which captures highly structured data through rigid</p><p>However, well-established protocols must be kept flexiblein a way that particular conditions can be changed. The</p></li><li><p>BMC Bioinformatics 2009, 10:390 http://www.biomedcentral.com/1471-2105/10/390</p><p>typically changing parameters of standard protocol steps(e.g. fixation times, temperature changes etc.) are impor-tant to record as they are used to improve the experimen-tal reproducibility.</p><p>Equipped with a collection of standard operating proce-dures, an experiment can be initiated and the data gener-ated. In general, data acquisition comprises not only filesbut also observations of interest, which might be relevantfor the interpretation of the results. Most often theseobservations disappear in paper notebooks and are notaccessible in a digital form. Hence, these experimentalnotes should be stored and attached to the originatingprotocol step, experiment or project.</p><p>essed data for subsequent analysis. In order to extractinformation and to combine it in a statistically meaning-ful manner, multiple data sets have to be acquired. Thesoftware workflow should enable also the inclusion ofexternal analytical steps, so that files resulting from exter-nal analysis software can be assigned to their original rawdata files. Finally, the data files generated at the analysisstage should be connected to the raw data, allowing con-nection of the data files with the originating experimentalcontext.</p><p>Data retrievalBy following the experimental workflow, all experimentaldata e.g. different files, protocols, notes etc. should beorganized in a chronological and project-oriented way</p><p>Mapping of the laboratory workflow onto iLAP featuresFigure 1Mapping of the laboratory workflow onto iLAP features. The software design of iLAP is inspired by a typical laboratory workflow in life sciences and offers software assistance during the process. The figure illustrates on the left panel the scientific workflow separated into four phases: project definition, data acquisition and analysis, and data retrieval. The right panel shows the main functionalities offered by iLAP.Page 3 of 12(page number not for citation purposes)</p><p>Data analysis and processingAfter storing the raw result files, additional analysis andpost-processing steps must be performed to obtain proc-</p><p>and continuously registered during their acquisition. Anadditional advantage should be the ability to search andretrieve the data. Researchers frequently have to search</p></li><li><p>BMC Bioinformatics 2009, 10:390 http://www.biomedcentral.com/1471-2105/10/390</p><p>through notebooks to find previously uninterpretableobservations. Subsequently, as the project develops, theresearchers gain a different perspective and recognize thatprior observations could lead to new discoveries. There-fore, the software should offer easy to use interfaces thatallow searches through observation notes, projects- andexperiment descriptions.</p><p>Software ArchitectureiLAP is a multi-tier client-server application and can besubdivided into different functional modules which inter-act as self-contained units according to their definedresponsibilities (see Figure 2).</p><p>Presentation tierThe presentation tier within iLAP is formed by a Webinterface, using Tapestry [20] as the model view controller</p><p>and an Axis Web service [21], which allows programmingaccess to parts of the application logic. Thus, on the clientside, a user requires an Internet connection and a recentWeb browser with Java Applet support, available foralmost every platform. In order to provide a simple, con-sistent but also attractive Web interface, iLAP follows usa-bility guidelines described in [22,23] and uses Web 2.0technologies for dynamic content generation.</p><p>Business tier and runtime environmentThe business tier is realized as view-independent applica-tion logic, which stores and retrieves datasets by commu-nicating with the persistence layer. The internalmanagement of files is also handled from a central servicecomponent, which persists the meta-information foracquired files to the database, and stores the file contentin a file-system-based data hierarchy. The business layer</p><p>Software ArchitectureFigure 2Software Architecture. iLAP features a typical three-tier architecture and can hence be divided into a presentation tier, business tier and a persistence tier (from left to right). The presentation tier is formed by a graphical user interface, accessed using a web browser. The following business layer is protected by a security layer, which enforces user authentication and authorization. After access is granted, the security layer passes the user requests to the business layer, which is mainly respon-sible for guiding the user through the laboratory workflow. This layer also coordinates all background tasks like automatic sur-veying of analysis jobs on a computing cluster or synchronizing/exchanging data with further downstream applications. (e.g. Page 4 of 12(page number not for citation purposes)</p><p>OMERO (open microscopy environment) image server). Finally, the persistence layer interacts with the relational database.</p></li><li><p>BMC Bioinformatics 2009, 10:390 http://www.biomedcentral.com/1471-2105/10/390</p><p>also holds asynchronous services for application-internalJMS messaging and for integration of external computingresources like high-performance computing clusters. Allservices of this layer are implemented as Spring [24]beans, for which the Spring-internal interceptor classesprovide transactional integrity.</p><p>The business tier and the persistence tier are bound by theSpring J2EE lightweight container, which manages thecomponent-object life cycle. Furthermore, the Spring con-text is transparently integrated into the Servlet context ofTapestry using the HiveMind [25] container backend. Thisis realized by using the automatic dependency injectionfunctionality of HiveMind which avoids integrative gluecode for lookups into the Spring container. Since iLAPuses Spring instead of EJB related components, thedeployment of the application only requires a standardconformed Servlet container. Therefore, the Servlet con-tainer Tomcat [26] is used, which offers not only Servletfunctionality bu...</p></li></ul>