![Page 1: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/1.jpg)
Introduction to and
INF-BIO5121/9121
Sveinung Gundersen
ELIXIR.NO / Dept. of Informatics, UiO
Oct 7, 2014
![Page 2: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/2.jpg)
Credit
• Some of this presentation (most figures) is taken from the presentation “Introduction to Lifeportal” given by Karin Lagesen, provided under the CC BY license (http://creativecommons.org/licenses/by/4.0/). Modifications have been made.
![Page 3: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/3.jpg)
• We are doing science, also on the computer!
• 4-5-6 is typically done on the computer anyway
• But the methods/software used in bioinformatics often give very varied results
• We should really think of computer analysis as part of the experiment, aiming for the same level of rigor and reproducibility!
Figure by Tiffany Ard, Nerdy Baby artwork, https://www.facebook.com/NerdyBabyLLC
![Page 4: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/4.jpg)
Galaxy
• Developed at Penn State and Emory Universities for over 10 years, by a large development team
• Aims to be a framework for “supporting accessible, reproducible, and transparent computational research in the life sciences” (Goecks et al., Genome Biology 2010)
![Page 5: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/5.jpg)
Accessible
• Users do not need to learn the command line
• Web-based solution, point-and-click
• Consistent look and feel
• Easy to upload your own datasets, or import datasets from established data warehouses
![Page 6: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/6.jpg)
Reproducible
• Bioinformaticians get surprised every time they need to redo/modify previous analyses
• But bench biologists already know the importance of reproducibility!
• You also know that even with a detailed lab journal, reproduction is a challenge
• The question is then how this manifests itself when doing analysis on a computer
![Page 7: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/7.jpg)
What is in silico reproducibility?
• Basically the same issues as at the bench:
• Materials -> Data sources
• Experiment conditions -> Analysis parameters
• Equipment (and models) -> Programs (and versions)
• And the same challenges:
• Are all relevant conditions described accurately?
• Will the same materials and equipment be available?
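The mapping above can be sketched as a small record that pins down all three items for one analysis step. This is only an illustration under stated assumptions: the class `AnalysisStep`, the helper names, and the program name `example-mapper` are hypothetical and not part of Galaxy.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AnalysisStep:
    """One analysis step, described by the three items from the slide."""
    data_sha256: str   # "materials": checksum identifying the exact input data
    parameters: dict   # "experiment conditions": analysis parameters
    program: str       # "equipment": program name (hypothetical here)
    version: str       # "model": program version


def describe_step(data: bytes, parameters: dict, program: str, version: str) -> AnalysisStep:
    """Build a record that pins down the data, parameters, and program version."""
    return AnalysisStep(hashlib.sha256(data).hexdigest(), dict(parameters), program, version)


def to_json(step: AnalysisStep) -> str:
    """Serialize with sorted keys so identical steps give identical text."""
    return json.dumps(asdict(step), sort_keys=True)


step_a = describe_step(b"ACGT\n", {"min_quality": 20}, "example-mapper", "1.0")
step_b = describe_step(b"ACGT\n", {"min_quality": 20}, "example-mapper", "1.1")
print(to_json(step_a) == to_json(step_b))  # False: only the version differs
```

Two runs on the same data with the same parameters still differ if the program version changed, which is exactly the kind of detail a written methods section tends to omit.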
![Page 8: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/8.jpg)
What is the current status of reproducibility?
• Less than half of selected microarray experiments published in Nature Genetics could be reproduced (Ioannidis et al., Nat Genet 2009)
• More than half [of surveyed papers] do not provide primary data and list neither the version nor the parameters used [for read mapping] (Nekrutenko and Taylor, Nat Rev Genet 2012)
![Page 9: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/9.jpg)
Why should you care? (about making your analyses reproducible)
• Because it’s the right thing to do!
• ... and the one struggling to reproduce it is often the future you
• Journals are becoming aware of the issues
• Reviewers may value it
• Anyway, it’s the same as at the bench..
![Page 10: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/10.jpg)
Galaxy supports reproducibility
• Automatically tracks metadata at every step
• Which are the datasets?
• What are the parameters?
• Which tools, and which version of the tool?
• What are the outputs?
• Users can annotate the steps to capture the intent of the analysis!
![Page 11: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/11.jpg)
Galaxy supports reproducibility
• All jobs can be rerun later, by independent scientists
• Workflows capture common analysis sequences, i.e. typical experimental setups. They can be reused for other datasets and experiments
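The idea of a reusable workflow can be sketched in a few lines. This is a toy model, not Galaxy's workflow format: the step functions `drop_short` and `to_upper` and the function `run_workflow` are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple

# A step is (label, function, parameters); all names here are illustrative.
Step = Tuple[str, Callable[..., List[str]], dict]


def drop_short(reads: List[str], min_length: int) -> List[str]:
    """Keep only reads with at least min_length bases."""
    return [r for r in reads if len(r) >= min_length]


def to_upper(reads: List[str]) -> List[str]:
    """Normalize reads to upper case."""
    return [r.upper() for r in reads]


def run_workflow(workflow: List[Step], reads: List[str]):
    """Apply each step in order and log what was run with which parameters."""
    log = []
    for label, func, params in workflow:
        reads = func(reads, **params)
        log.append((label, params, len(reads)))
    return reads, log


workflow: List[Step] = [
    ("filter", drop_short, {"min_length": 4}),
    ("normalize", to_upper, {}),
]

# The same workflow, unchanged, is reused on two different datasets.
result_1, log_1 = run_workflow(workflow, ["acgt", "ac", "ggcat"])
result_2, log_2 = run_workflow(workflow, ["ttt", "acgtacgt"])
print(result_1)  # ['ACGT', 'GGCAT']
print(result_2)  # ['ACGTACGT']
```

Because the steps and their parameters live in one data structure, rerunning the analysis on new data cannot silently change the settings.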
![Page 12: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/12.jpg)
Transparent
• “Enabling users to share and communicate their experimental results and outputs in a meaningful way” (Goecks et al., Genome Biology 2010)
• Everything can be shared: Datasets, histories (i.e. experimental logbook), tools, workflows
• Provides public repositories
• Galaxy Pages are web-based documents for publishing results. Every level of detail can be accessed by readers
![Page 13: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/13.jpg)
• Galaxy installation at UiO, running on the Abel cluster
• Contains hundreds of tools, from phylogeny tools to High Throughput Sequencing analysis
• Available for all Feide users (all university users and several colleges)
![Page 14: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/14.jpg)
lifeportal.uio.no
Select Feide login, press Academic Login
![Page 15: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/15.jpg)
Select your institution
Select University of Oslo, then continue
![Page 16: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/16.jpg)
Use UiO username/password
Your UiO username and password
![Page 17: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/17.jpg)
Verify login information
Click User, verify that your email address is shown
![Page 18: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/18.jpg)
Page orientation
Navigation bar, with workflows, shared data etc.
History panel - shows all the datasets you have analyzed and produced
Tool panel with many analysis programs
Detail panel - where the results are shown
![Page 19: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/19.jpg)
Create a new history
When starting on a "new" thing, start with a clean history, and name it!
![Page 20: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/20.jpg)
Getting data: uploading
Click on Upload File, then Upload File again
Select fastqsanger as sequence format
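For orientation, fastqsanger denotes FASTQ files whose quality line uses the Sanger encoding, where each quality score is the character's ASCII code minus 33. The sketch below reads such records; it assumes exactly four lines per record (no wrapped sequences), and the function name `parse_fastq` is illustrative only.

```python
from typing import Iterator, List, Tuple

PHRED_OFFSET = 33  # Sanger-style FASTQ: quality score = ASCII code - 33


def parse_fastq(text: str) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read name, sequence, quality scores) for each 4-line record."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected exactly four lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield header[1:], sequence, [ord(ch) - PHRED_OFFSET for ch in quality]


example = "@read1\nACGT\n+\nII5!\n"
records = list(parse_fastq(example))
print(records)  # [('read1', 'ACGT', [40, 40, 20, 0])]
```

Choosing the wrong format at upload time would make downstream tools misread these scores, which is why the slide asks you to select the format explicitly.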
![Page 21: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/21.jpg)
Uploading data
Select input file here
![Page 22: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/22.jpg)
Uploaded data
Uploading data - not quite done
![Page 23: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/23.jpg)
Look at data - eye symbol
![Page 24: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/24.jpg)
Data annotation - pen symbol
Can add information about the data set here
Good for tracking data
![Page 25: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/25.jpg)
Removing data set - X
NOTE: removed data sets are not gone, just not shown in your history
Need to do more to actually delete it
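The difference between removing and actually deleting can be pictured as a two-stage model. The sketch below is a toy illustration, not Galaxy's implementation: the classes `Dataset` and `History` and the method name `purge` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    name: str
    deleted: bool = False  # hidden from the history, but still stored


@dataclass
class History:
    datasets: Dict[str, Dataset] = field(default_factory=dict)

    def add(self, name: str) -> None:
        self.datasets[name] = Dataset(name)

    def remove(self, name: str) -> None:
        """Clicking X: mark as deleted; the data itself stays."""
        self.datasets[name].deleted = True

    def purge(self, name: str) -> None:
        """The extra step: actually delete an already removed data set."""
        if not self.datasets[name].deleted:
            raise ValueError("remove the data set before purging it")
        del self.datasets[name]

    def visible(self) -> List[str]:
        return [d.name for d in self.datasets.values() if not d.deleted]


history = History()
history.add("reads.fastq")
history.remove("reads.fastq")
print(history.visible())                   # []
print("reads.fastq" in history.datasets)   # True: still stored
history.purge("reads.fastq")
print("reads.fastq" in history.datasets)   # False
```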
![Page 26: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/26.jpg)
Analyzing data
Select program in left bar
Select input file here
![Page 27: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/27.jpg)
15.08.2014 [email protected]
The Abel computer cluster
• Lifeportal runs on the Abel computer cluster
• > 10 000 cores!
• > 40 TB memory!
• Lifeportal submits jobs to the Abel cluster
• Can use several cores for a single job
![Page 29: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/29.jpg)
Job options
• # tasks = # cores you want to use
• # tasks per node:
  – One node has 16 cores, sometimes programs run faster if all cores are in the same node
• Wall time: guesstimated runtime.
  – Note: jobs exceeding that will be killed!
• Memory per CPU: each CPU has 4 GB of memory - just leave this option
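How these options interact can be sketched with a little arithmetic. The class `JobOptions` and its method names are hypothetical; only the numbers (16 cores per node, 4 GB per CPU) come from the slide, and the real scheduler may behave differently.

```python
import math
from dataclasses import dataclass

CORES_PER_NODE = 16      # from the slide: one node has 16 cores
MEMORY_PER_CPU_GB = 4    # from the slide: each CPU has 4 GB of memory


@dataclass
class JobOptions:
    tasks: int              # number of cores you want to use
    tasks_per_node: int     # how many of those cores sit on each node
    wall_time_hours: float  # estimated runtime; longer jobs are killed

    def __post_init__(self) -> None:
        if self.tasks < 1 or self.wall_time_hours <= 0:
            raise ValueError("tasks and wall time must be positive")
        if not 1 <= self.tasks_per_node <= CORES_PER_NODE:
            raise ValueError(f"tasks per node must be between 1 and {CORES_PER_NODE}")

    def nodes_needed(self) -> int:
        """Number of nodes the tasks are spread over."""
        return math.ceil(self.tasks / self.tasks_per_node)

    def total_memory_gb(self) -> int:
        """Memory available to the job with the default per-CPU setting."""
        return self.tasks * MEMORY_PER_CPU_GB

    def would_be_killed(self, actual_hours: float) -> bool:
        """A job running past its wall time is killed."""
        return actual_hours > self.wall_time_hours


single_node = JobOptions(tasks=16, tasks_per_node=16, wall_time_hours=2)
spread_out = JobOptions(tasks=16, tasks_per_node=4, wall_time_hours=2)
print(single_node.nodes_needed(), single_node.total_memory_gb())  # 1 64
print(spread_out.nodes_needed())                                  # 4
print(single_node.would_be_killed(actual_hours=2.5))              # True
```

Setting tasks per node equal to the number of tasks (up to 16) keeps all cores on one node, which is the case the slide says is sometimes faster.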
![Page 30: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/30.jpg)
CPU quotas
• Quotas calculated as # CPU hours
• All have 200 hrs to use
• Big projects should apply for their own quotas
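A rough way to budget against the quota is sketched below. It assumes that a job is charged cores times wall-time hours; the slide does not say whether the requested or the actually used time is charged, so treat this as an estimate. The function names are illustrative.

```python
from typing import Iterable, Tuple

DEFAULT_QUOTA_CPU_HOURS = 200  # from the slide: all users have 200 hrs


def cpu_hours(cores: int, wall_time_hours: float) -> float:
    """CPU hours for one job, assuming charge = cores * hours."""
    return cores * wall_time_hours


def remaining_quota(jobs: Iterable[Tuple[int, float]],
                    quota: float = DEFAULT_QUOTA_CPU_HOURS) -> float:
    """Quota left after a list of (cores, hours) jobs."""
    return quota - sum(cpu_hours(cores, hours) for cores, hours in jobs)


def fits(cores: int, wall_time_hours: float, remaining: float) -> bool:
    """Whether a further job fits within the remaining quota."""
    return cpu_hours(cores, wall_time_hours) <= remaining


jobs = [(16, 5), (4, 10)]        # 80 + 40 CPU hours
left = remaining_quota(jobs)
print(left)                      # 80
print(fits(16, 8, left))         # False: 128 CPU hours needed
print(fits(8, 8, left))          # True: 64 CPU hours needed
```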
![Page 31: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/31.jpg)
Running job status
• Colors show the status of the job
• Purple: data uploading
• Gray: analysis queued
• Yellow: running
• Green: done
• Red: error has occurred
Queued
Running
Done
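The color legend above maps directly onto a small lookup. The enum `JobState` and the helper names are hypothetical and only mirror the slide; they are not part of any Galaxy API.

```python
from enum import Enum


class JobState(Enum):
    """Job states, keyed by the color shown in the history panel."""
    UPLOADING = "purple"
    QUEUED = "gray"
    RUNNING = "yellow"
    DONE = "green"
    ERROR = "red"


def state_from_color(color: str) -> JobState:
    """Look up the state for a color name; raises ValueError if unknown."""
    return JobState(color.strip().lower())


def is_finished(state: JobState) -> bool:
    """A job is finished once it is done or has failed."""
    return state in (JobState.DONE, JobState.ERROR)


print(state_from_color("Yellow").name)           # RUNNING
print(is_finished(state_from_color("green")))    # True
print(is_finished(state_from_color("gray")))     # False
```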
![Page 32: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/32.jpg)
Results show up as new data set!
Results from job show up as a new data set in history!
Basic statistics appear here
FastQC quality plot
![Page 37: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/37.jpg)
Share or publish histories
Can share via link or publish for all to see
![Page 38: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/38.jpg)
Published histories open to all
NOTE: others can not only look at published histories, they can also copy data sets from them!
Practical way to share data!
![Page 40: Introduction to and - Wiki.uio.no · 2014-10-07 · Introduction to and INF-BIO5121/9121 Sveinung Gundersen ELIXIR.NO / Dept. of Informatics, UiO Oct 7, 2014](https://reader030.vdocuments.mx/reader030/viewer/2022041019/5ecde0135140d13f9829afdd/html5/thumbnails/40.jpg)
Galaxy: other tutorials
• For more tutorials and exercises, check out:
http://wiki.g2.bx.psu.edu/Learn
• Article with step-by-step examples/protocols making use of Galaxy in different scenarios:
Blankenberg, D., et al., Galaxy: a web-based genome analysis tool for experimentalists. Current Protocols in Molecular Biology, Jan 2010, Chapter 19.