Processing data from GS FLX Instrument using UNICORE workflow system

M. Borcz1,2 R. Kluszczyński2 K. Skonieczna3,4 T. Grzybowski3 Piotr Bała1,2

1Faculty of Mathematics and Computer Science, UMK, Toruń

2ICM University of Warsaw

3Collegium Medicum, UMK, Bydgoszcz

4Postgraduate School, Medical University of Warsaw

MOTIVATION

PROCESSING TIME

STORAGE

TECHNICAL SUPPORT

AUTOMATION

FLEXIBILITY

SECURITY

PTBI2012 M. Borcz

PL-GRID

„The goal of the PL-Grid project (Polish Infrastructure for Supporting Computational Science in the European Research Space) is to provide the Polish scientific community with an IT platform based on Grid computer clusters, enabling e-science research in various fields.

PL-Grid aims at significantly extending the amount of computing resources provided to the Polish scientific community (by approximately 215 TFlops of computing power and 2500 TB of storage capacity) and constructing a Grid system that will facilitate effective and innovative use of the available resources.”

www.plgrid.pl

UNICORE

UNICORE (Uniform Interface to Computing Resources) is middleware that enables seamless and secure access to Grid resources. UNICORE is part of the Unified Middleware Distribution developed by the EMI project.

www.unicore.eu

www.eu-emi.eu

UNICORE Rich Client (URC)

UNICORE Commandline Client (UCC)

High-Level API (HiLA)

UNICORE in PL-Grid

EXPERIMENT

Determination of the 18 complete mitochondrial genome sequences of tumor and matched non-tumor tissues obtained from 9 patients diagnosed with colorectal cancer

mtDNA sequences comparison with the reference sequence

mtDNA mutation identification

Ultra high speed processing of mtDNA sequence data.

High-throughput GS FLX Instrument (Roche Diagnostics)

Up to 1 million reads, each approximately 500 bp long, in a single experiment

WORKFLOW

GSRunProcessor: data from GS FLX Instrument (Roche Diagnostics), SFF and CWF files

GSReferenceMapper: SFF files

GSReporter: CWF files

GSAssembler: SFF files, FASTA file

BLAST: FASTA file
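
The step ordering implied by the file types above can be sketched as a small dependency graph. This is only an illustration, not the actual UNICORE workflow definition: it assumes that GSRunProcessor produces the SFF and CWF files, that GSReferenceMapper, GSReporter and GSAssembler consume them, and that BLAST runs on the FASTA file written by GSAssembler. The `dependencies` mapping and the `execution_stages` helper are hypothetical names introduced for this sketch.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed dependencies, read off the file types listed on the slide:
# each step maps to the set of steps whose output it needs.
dependencies = {
    "GSRunProcessor": set(),                # raw instrument data -> SFF, CWF
    "GSReferenceMapper": {"GSRunProcessor"},  # needs SFF files
    "GSReporter": {"GSRunProcessor"},         # needs CWF files
    "GSAssembler": {"GSRunProcessor"},        # needs SFF files, writes FASTA
    "BLAST": {"GSAssembler"},                 # needs the FASTA file
}


def execution_stages(deps):
    """Group steps into stages; steps within one stage can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(dependencies)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

Under these assumptions the three tools that depend only on GSRunProcessor fall into the same stage, which is the kind of parallelism a workflow engine can exploit.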

DATA PROCESSING

High-throughput GS FLX Instrument (Roche Diagnostics)

UNICORE Commandline Client (UFTP)

Target System Storage (PL-Grid)

UNICORE Rich Client

Batch System (PL-Grid): GS Run Processor, GS Reporter, GS Reference Mapper, GS Assembler, BLAST

STORAGE

UNICORE RICH CLIENT

GridBeans are plug-ins that make it possible to run an application on the grid. They generate the job description and give the user a graphical interface for entering input data and viewing results.

WORKFLOW EDITOR

GridBeans can be used to build simple jobs or can be treated as building blocks for workflows consisting of various tasks and operations.

DETAILS

Data: 17 Gb

Images: 834 files

File size: 33Mb

Transfer: 3s / file
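
As a rough check on the transfer figures above, the total upload time can be estimated from the file count and the per-file time. The sketch assumes the files are sent one after another and that 3 s is a representative per-file time; neither assumption is stated on the slide.

```python
# Figures taken from the slide.
n_files = 834
seconds_per_file = 3

# Assumption: sequential transfers, each taking the quoted 3 s.
total_seconds = n_files * seconds_per_file
total_minutes = total_seconds / 60

print(f"estimated transfer time: {total_seconds} s (about {total_minutes:.0f} min)")
```

Under these assumptions the whole image set moves to grid storage in well under an hour.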

GSRunAnalysisPipe:

Interlagos: AMD Opteron(TM) Processor 6272 @ 2.10GHz

AMD: AMD Opteron(tm) Processor 6174 @ 2.20GHz

Intel: Intel(R) Xeon(R) CPU, X5660 @ 2.80GHz (InfiniBand)

1 cpu: 70.0h

8x8 cpu (Intel, MPI): 2.5h
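
The two timings above translate into a speedup and a parallel efficiency. The sketch below reads "8x8 cpu" as 64 cores in total, which is an assumption; the `speedup` and `efficiency` helpers are illustrative names, not part of any tool mentioned here.

```python
def speedup(serial_hours, parallel_hours):
    """Ratio of single-CPU wall time to parallel wall time."""
    return serial_hours / parallel_hours


def efficiency(serial_hours, parallel_hours, cores):
    """Speedup divided by the number of cores used."""
    return speedup(serial_hours, parallel_hours) / cores


serial_hours = 70.0    # 1 cpu, from the slide
parallel_hours = 2.5   # 8x8 cpu (Intel, MPI), from the slide
cores = 8 * 8          # assumption: "8x8" means 64 cores in total

s = speedup(serial_hours, parallel_hours)
e = efficiency(serial_hours, parallel_hours, cores)
print(f"speedup: {s:.1f}x on {cores} cores, efficiency: {e:.1%}")
```

With these inputs the run is 28 times faster than on a single CPU, which corresponds to an efficiency of a little under 44% if all 64 cores are counted.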

SHORT DEMONSTRATION

REFERENCES

www.unicore.eu

www.plgrid.pl

www.eu-emi.eu

www.454.com

„Building a National Distributed e-Infrastructure - PL-Grid” Lecture Notes in Computer Science, Vol 7136, in the subseries: Information Systems and Applications, incl. Internet / Web, and HCI.