mohamed abouelhoda: next generation workflow systems on the cloud: the tavaxy system
DESCRIPTION
Mohamed Abouelhoda's talk at ISCB-Asia on Next Generation Workflow Systems on the Cloud: The Tavaxy System. December 19th 2012TRANSCRIPT
![Page 1: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/1.jpg)
Tavaxy Integrating Taverna and Galaxy with Cloud
Computing Support
Mohamed Abouelhoda
Nile University
Egypt
1
![Page 2: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/2.jpg)
Workflows in Bioinformatics
Get Protein Sequence
Clustalw
T-Coffee
Finding Homologous Sequences
BLASTp Get Id &
sequences
2
![Page 3: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/3.jpg)
Implementing Scientific Workflows
Method 1: Write Python/Perl/Shell script
Advantages • Reliable and efficient • Comprehensive programming capabilities (conditionals, loops, etc..)
Disadvantages • Requires programming skills , especially with HPC resources • Scripts are workflow-specific • Costly to create, debug and modify • Requires installing and managing tools
3
![Page 4: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/4.jpg)
Implementing Scientific Workflows
Methods 2: Use of Workflow Systems
http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems
Most popular in Bioinformatics
4
• Kepler • Triana • WildFire • GenePattern • Pegsus • Taverna • Galaxy • and many more
![Page 5: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/5.jpg)
Composing a Workflow using Galaxy
Workflow Environment
Tool Library
Canvas for workflow editing
5
![Page 6: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/6.jpg)
Composing a Workflow using Galaxy
Create new workflow and give it a name
6
![Page 7: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/7.jpg)
Composing a Workflow using Galaxy
Click to create input node
7
![Page 8: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/8.jpg)
Composing a Workflow using Galaxy
Input node created, will read input from
user account
8
![Page 9: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/9.jpg)
Composing a Workflow using Galaxy
Choose BLAST tool and click to create it
BLAST node is custom
one added to the system
9
![Page 10: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/10.jpg)
Composing a Workflow using Galaxy
1. BLAST node created
2. Window opened to set BLAST parameters
10
![Page 11: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/11.jpg)
Composing a Workflow using Galaxy
Connect output of input node as input to BLAST
11
![Page 12: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/12.jpg)
Composing a Workflow using Galaxy
Complete composing workflow
12
![Page 13: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/13.jpg)
Composing a Workflow using Galaxy
Start running the workflow
13
![Page 14: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/14.jpg)
Composing a Workflow using Galaxy
Screen showing parts of the output files and provides links to them
14
![Page 15: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/15.jpg)
Benefits of Workflow Systems
• Intuitive abstract means for describing computational experiments
• Requires no programming expertise
• Easy to modify
• Hide execution details: invocation and scheduling
• Direct use of parallel architectures
Accelerates computation
15
![Page 16: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/16.jpg)
Benefits of Workflow Systems
• Come with library of tools or access directories Tool accessibility
• Store experiment details (used tools, their parameters, and used data) Reproducibility
• Share workflows with analysis history including intermediate results Transparency
Formalize computation
16
Mesirov. Science, 2010 B. Giardine, et al. Genome Research, 2005.
![Page 17: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/17.jpg)
Implementing Scientific Workflows
Methods 2: Use of Workflow Systems
http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems
Most popular in Bioinformatics
17
• Kepler • Triana • WildFire • GenePattern • Pegsus • Taverna • Galaxy • and many more
![Page 18: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/18.jpg)
Composing a Workflow using Taverna
Example: Homology Workflow
Read Protein Sequence
Clustalw
T-Coffee
BLASTp Get Id &
sequences
18
![Page 19: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/19.jpg)
Composing a Workflow using Taverna
Open the workflow editor, it is desktop
application
19
![Page 20: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/20.jpg)
Composing a Workflow using Taverna
•Choose a web-service and copy its URL if not in the service directory, •Here we will use BLAST at EBI
20
![Page 21: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/21.jpg)
Composing a Workflow using Taverna
Paste service URL so that Taverna collects
information about it (i.e., methods and parameters)
21
![Page 22: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/22.jpg)
Composing a Workflow using Taverna
Service information
retrieved
22
![Page 23: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/23.jpg)
Composing a Workflow using Taverna
Create node for the BLAST service
23
![Page 24: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/24.jpg)
24
Composing a Workflow using Taverna
Create node to receive input
24
![Page 25: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/25.jpg)
Composing a Workflow using Taverna
Create nodes to prepare input and parameters in XML
25
![Page 26: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/26.jpg)
Composing a Workflow using Taverna
Create node to receive Job ID from
service provider
26
![Page 27: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/27.jpg)
Composing a Workflow using Taverna
Input data and parameters are merged in
XML file
27
![Page 28: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/28.jpg)
Composing a Workflow using Taverna
Create sub-workflow to check status of the asynchronous
service
28
![Page 29: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/29.jpg)
Composing a Workflow using Taverna
Rename it Poll status
29
![Page 30: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/30.jpg)
Composing a Workflow using Taverna
Put in place after job submission
30
![Page 31: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/31.jpg)
Composing a Workflow using Taverna
Use get result method of the web-service to retrieve data
when ready
31
![Page 32: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/32.jpg)
Composing a Workflow using Taverna
Get result are created
32
![Page 33: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/33.jpg)
Composing a Workflow using Taverna
Start putting Get_Result in place in the workflow
33
![Page 34: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/34.jpg)
Composing a Workflow using Taverna
Specify output type (BLAST xml)
34
![Page 35: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/35.jpg)
Composing a Workflow using Taverna
Start retrieval when check status is done
Short cut to sub-workflow
35
![Page 36: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/36.jpg)
Composing a Workflow using Taverna
The rest of the workflow
Read Protein Sequence
Clustalw
T-Coffee
BLASTpGet Id &
sequences
36
![Page 37: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/37.jpg)
Taverna vs. Galaxy
D. Hull, et al. Nucleic Acids Research, 2006. B. Giardine, et al. Genome Research, 2005.
T. Oinn, et al. Bioinformatics, 2004. 37
Taverna Galaxy
Purpose General Bioinformatics
Interface Desktop Web-interface
Usability More difficult Easy
Engine Control flow Data flow
Control constructs Yes No
Jobs Web-service Local invocation
Programs Service directory Library of tools
Use of Local HPC Threads only Threads/Cluster
Taverna Galaxy
![Page 38: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/38.jpg)
Communities
Taverna MyExperiment Galaxy Pages
PIC
38
What if we can make use of both repositories and what if we can have
advantages of both systems??
![Page 39: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/39.jpg)
Taverna
39
Galaxy
Integrating Taverna and Galaxy Workflows
![Page 40: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/40.jpg)
Tavaxy
40
Integrating Taverna and Galaxy Workflows
![Page 41: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/41.jpg)
Tavaxy
• A standalone workflow system based on workflow patterns
• Integrates both Taverna and Galaxy workflows and improve their
performance
• Easy to use and includes a large library of software tools
• Exploits high performance computing resources (parallel
infrastructure) with all details being hidden
• Runs on local or cloud computing infrastructure
• Optimized to handle large datasets
41
![Page 42: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/42.jpg)
Tavaxy Environment
42
![Page 43: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/43.jpg)
Input node
Pattern nodes
Sub-workflow
nodes
Tool node
Tavaxy Nodes (Processing Units)
• If-else conditional • Iteration • Data Merge • Data Select
• Pattern nodes, Sub-workflow nodes • Remote execution
New in Tavaxy and not in Galaxy
43
• Easier to use interface • Direct use of local HPC
New in Tavaxy and not in Taverna
![Page 44: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/44.jpg)
• EMBOSS: Package of sequence analysis tools
• SAMtools: Package of scirpts and programs to handle NGS data
• Fastx: Package for manipulating fasta files
• Galaxy tools: A set of data utilities and tools developed by the galaxy team
• Taverna tools: A set of tools based on web-services based on Taverna. collection This section includes as well a set of data manipulation utilities developed by the taverna team.
• Tavaxy tools: This section includes extra utilities and tools for sequence analysis and genome comparison developed/added by the Tavaxy team.
• Cloud utilities: A set of data manipulation and configuration of computing infrastructure on the Amazon cloud.
220 Tools organized in the following categories
Tools in Tavaxy
44
![Page 45: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/45.jpg)
NGS Mapping Tools and Databases in Tavaxy
• MAQ
• BFAST
• BWA
• Bowtie
• …
Tools
• Reference human genome hg36
• Indexed genome for MAQ, Bowtie
• NCBI nucleotide/protein DBs
Databases
45
![Page 46: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/46.jpg)
Data Patterns of Tavaxy
To facilitate execution of data-intensive tasks in cloud computing cluster
46
![Page 47: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/47.jpg)
Domenstration 1 Importing Taverna Workflow
47
![Page 48: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/48.jpg)
The Tavaxy Project
Taverna or Galaxy workflows can be o imported , open and re-draw in Tavaxy environment o re-designed, delete or re-order workflow nodes o enhanced, add new nodes and sub-workflows o optimized,
Tavaxy concstructs can be used to exploit parallelization remote Taverna calls can be replaced with local tools
o and executed in Tavaxy on local (HPC) infrastructure on cloud
Single environment for integrating Galaxy and Taverna workflows:
48
![Page 49: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/49.jpg)
Importing Taverna Workflow
49
Taverna Workflow
![Page 50: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/50.jpg)
50
Taverna Workflow
![Page 51: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/51.jpg)
51
Imported Taverna Workflow
![Page 52: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/52.jpg)
53
Optimized Imported Taverna Workflow
![Page 53: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/53.jpg)
Idea of Integration
Workflow Patterns
54
- Taverna is used as a secondary engine to execute Taverna sub-workflows - The use of patterns as special nodes in the data-flow oriented engine - Simulating control constructs over the data flow digraph representing the workflow - Use of sub-workflow to enable iteration pattern
Bag of techniques
![Page 54: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/54.jpg)
High Performance Cloud Computing Support
55
![Page 55: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/55.jpg)
• Whole system instantiation
• Delegating execution of a sub-workflow to the cloud
• Delegating execution of one tool to the cloud
Three Modes for Supporting Cloud
56
![Page 56: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/56.jpg)
User case scenario
57
1
Cloud Platform
Create master node
Master node creates worker nodes
1
4
3
2
Instance disks are mounted
Connection to S3 is established
Whole System Instantiation
2
S3 Storage
4 3
(Instance Disks) EBS
5 Submit data, command line (and executable) and execute
5
Include Tavaxy tools and DBs
![Page 57: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/57.jpg)
User case scenario
58
Create master node
Master node creates worker nodes
1
4
3
2
Instance disks are mounted
Connection to S3 is established
Sub-workflow and tool delegation
5 Submit data, command line (and executable) and execute
1
Cloud Platform
2
S3 Storage
4 3
(Instance Disks) EBS
5
![Page 58: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/58.jpg)
Whole System Instantiation
59
![Page 59: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/59.jpg)
Whole System Instantiation
60
![Page 60: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/60.jpg)
Sub-workflow/Tool Instantiation
61
![Page 61: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/61.jpg)
Exploiting Cloud Computing
Two steps: 1- Enter AWS credentials 2- Define your cluster
62
![Page 62: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/62.jpg)
Supporting Task Parallelism
• Branching in the DAG implies independent execution • Independent jobs can run in parallel
I. Parallelism due to branching
![Page 63: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/63.jpg)
Tavaxy Data Patterns
[x1,.., xn]
=[A(x1),..,A(xn)] a
[x1,.., xn]
=[A((x1, y1)),..,A((xn, yn))] a
A
[y1,.., yn] A
B
=[B(A(x1)),..,B(A(xn))] b
Supporting Data Intensive Tasks
• For an input as a list, node A can process the list items in parallel
Data parallelism
Single List Multiple List
![Page 64: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/64.jpg)
Personalized Medicine Workflow
65
![Page 65: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/65.jpg)
Personalized Medicine Workflow
Individual Whole Genome Mapping and Variant Annotation Tonellato-Wall
66
![Page 66: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/66.jpg)
Workflow Sktech
Alignment
67
Variant analysis
![Page 67: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/67.jpg)
Personalized Medicine Workflow on Tavaxy
Read Mapping
Variant Calling
Formatting
SNP/DiseaseDatabase Queries
68
![Page 68: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/68.jpg)
Exploiting Parallelization and Locality of Data
Use of list data pattern
Fork Pattern
Data base are hosted locally
Reads are defined as list which implies
parallel processing of blocks of reads
Parallel Task Execution
(Parallel DBs Queries)
69
![Page 69: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/69.jpg)
Read-Mapping with Crossbow
• Crossbow (based on EMR/Hadoop) is used to map set of human reads to human genome.
• With this cluster, we mounted EBS volumes including the reference human genome files (each including one chromosome)
• Read Datasets: – illumnia reads of around 13 Gbp (47 GB) from from the African
genome,
– the human genome version hg18, build 36
70
![Page 70: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/70.jpg)
Read-Mapping with Crossbow
71
![Page 71: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/71.jpg)
Domenstration 3 Metagenomics Workflow
72
![Page 72: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/72.jpg)
NGS-based Metagenomics Study
73
Galaxy Workflow Optimized Tavaxy Workflow
![Page 73: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/73.jpg)
NGS-based Metagenomics Study
74
Mega-blast run on cloud
![Page 74: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/74.jpg)
NGS-based Metagenomics Study
• The MegaBLAST program is used to annotate a set of sequences coming from a metagenomics experiment.
• With this cluster, we mounted EBS volumes including the NCBI NT database
• Datasets: Windshield dataset composed of two collections of 454 FLX reads: – Trip A: 138575 (25.3 Mbp) Mb104283 (18.8 Mbp)
– Trip B: ) 151000 (30.2 Mbp) 79460 (12.7 Mbp)
75
![Page 75: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/75.jpg)
Bioinformatics Experiment (2)
• We used the windshield dataset which is composed of two collections of 454 FLX reads.
• These reads came from the DNA of the organic matter on the windshield of a moving vehicle that visited two geographic locations (trips A and B).
• For each trip A or B, there are two subsets for the left and right part of the windshield.
• The number of reads are 66575 (12.3 Mbp) 71000 (14.2 Mbp) 104283 (18.8 Mbp) 79460 (12.7 Mbp) for trips A Left, B Left, A Right, and B Right, respectively.
• The queries were queued in parallel, fetch strategy was used
76
![Page 76: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/76.jpg)
• Use of workflow systems provides flexibility and efficiency
• High performance and cloud computing resources are exploited with technical details being hidden
• Future work include – Further performance optimization for execution on local and cloud
infrastructures.
– Supporting multiple cloud providers
– Handling larger data sizes in multi-use environment
Conclusions and Future Work
77
![Page 77: Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System](https://reader034.vdocuments.mx/reader034/viewer/2022051819/54c8ae1e4a7959da358b456b/html5/thumbnails/77.jpg)
Thanks for attention
78