Presentation of the CRG Bioinformatics Core Facility, Jean-François Taly
TRANSCRIPT
![Page 1: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/1.jpg)
Presentation of the
CRG Bioinformatics Core facility
Jean-François Taly
![Page 2: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/2.jpg)
People in the BioCore
Jean-François
• @CRG 2009
• @BioCore 2012
• Acting head
• Structural bioinformatics
• MSA
• NGS analyst
• Galaxy server
• Training
Luca
• @BioCore 2010
• NGS analyst
• Small ncRNA prediction
• Motif analysis
• Training
Toni
• @BioCore 2009
• Wikis
• Web/DB development
• DB mirrors
• Structural bioinformatics
• Training
Sarah
• @BioCore 2014
• Micro-arrays
• NGS analyst
• Galaxy
• Training
![Page 3: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/3.jpg)
Our mission
• Expertise in bioinformatics
  • Service
  • Consultation
• Training
  • Internal and external
• Infrastructure support
  • In collaboration with the SIT and TIC
• Part of the CRG bioinformaticians network
  • 83 @ bioinformatics retreat
  • Many more in PRBB/CNAG
![Page 4: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/4.jpg)
Our services
Analysis: Microarray, ChIP-seq, RNA-seq DE and assembly, Genome assembly, Variant calling
Informatics support: Wiki, WEB, Server, API
Training: Galaxy, Perl, Linux, advanced bioinformatics
![Page 5: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/5.jpg)
Fee per service
| Item | PRBB fees | Public fees (without VAT) |
|---|---|---|
| Manual data analysis | 13.12 €/hour | 39.36 €/hour |
| Automated data analysis (CPU time) | 2.38 €/hour | 7.16 €/hour |
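As a rough illustration of how these rates add up, here is a minimal Python sketch. It assumes that both items are billed linearly per hour with no minimum charge or rounding rule, which the slide does not state; the function name `estimate_cost` and the `"prbb"` / `"public"` keys are hypothetical.

```python
from decimal import Decimal

# Hourly rates copied from the fee table (EUR per hour; public fees exclude VAT).
RATES = {
    "manual": {"prbb": Decimal("13.12"), "public": Decimal("39.36")},
    "cpu": {"prbb": Decimal("2.38"), "public": Decimal("7.16")},
}


def estimate_cost(manual_hours, cpu_hours, customer="prbb"):
    """Estimate the cost in EUR for a mix of manual and CPU hours.

    Assumption: purely linear billing, no minimum fee, VAT not included.
    """
    if customer not in ("prbb", "public"):
        raise ValueError(f"unknown customer type: {customer}")
    total = (
        Decimal(str(manual_hours)) * RATES["manual"][customer]
        + Decimal(str(cpu_hours)) * RATES["cpu"][customer]
    )
    return total.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 10 hours of manual analysis plus 100 CPU hours for a PRBB group.
    print(estimate_cost(10, 100, "prbb"))
```

`Decimal` is used instead of floats so that the cents are not affected by binary rounding.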
![Page 6: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/6.jpg)
Our contribution to projects
Project conception
Bioinfo exp. design
Bioinfo exp. realization
Bioinfo output interpretation
Project conclusions
![Page 7: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/7.jpg)
Our contribution to projects
Project conception
Bioinfo exp. design
Bioinfo exp. realization
Bioinfo output interpretation
Project conclusions
Apply defined procedures
![Page 8: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/8.jpg)
Our contribution to projects
Project conception
Bioinfo exp. design
Bioinfo exp. realization
Bioinfo output interpretation
Project conclusions
Customized analysis
![Page 9: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/9.jpg)
CRG bioinformatics community
Big Data WG
• EGA initiative
• Data Engineering
• NoSQL
• HPC
NGS Tech. Sem.
• RNA-seq
• Genome assembly
• Variant annotation
• Metagenomics
Other topics
• Integrated -omics
• Good practice in code dev.
• Galaxy dev.
• …
![Page 10: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/10.jpg)
Micro-arrays
Gene expression array data analysis:
• Background correction and normalization
• Differential expression analysis
• Gene Ontology and pathway analysis
• Various graphics / plots
Additional array-based technologies the Bioinformatics unit supports include:
• qPCR arrays
• Comparative Genomic Hybridization (CGH) arrays
Main tools are based on the R / Bioconductor environment
Image source: Creative Commons, Wikipedia
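To make the normalization step more concrete, here is a small sketch of quantile normalization, one common way to make arrays comparable. The slide does not say which method the unit applies, and its tools are R / Bioconductor; this pure-Python version only illustrates the idea, and the function name `quantile_normalize` is hypothetical. Ties are broken by probe position, which is a simplification.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of samples (one list of intensities per array).

    Every sample ends up with the same distribution: the k-th smallest value
    of each sample is replaced by the mean of the k-th smallest values
    across all samples.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")

    # Mean intensity at each rank, computed across samples.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]

    normalized = []
    for col in columns:
        # Probe indices ordered by intensity (ties broken by position).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    sample_a = [2.0, 4.0, 6.0]
    sample_b = [9.0, 1.0, 5.0]
    print(quantile_normalize([sample_a, sample_b]))
```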
![Page 11: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/11.jpg)
RNA-seq
![Page 12: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/12.jpg)
RNA-seq
![Page 13: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/13.jpg)
DNA-seq
![Page 14: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/14.jpg)
DNA-seq
Pevzner P A et al. PNAS 2001;98:9748-9753
![Page 15: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/15.jpg)
ChIP-seq
![Page 16: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/16.jpg)
ChIP-seq
![Page 17: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/17.jpg)
Growing to the next level
From gene DE to transcript DE
• Users now have access to longer reads and deeper coverage
Metagenomics
• 16S ribosomal amplicon sequencing with MiSeq
Data integration framework
• Combining different data types into one single analysis
• RNA-seq DE, histone marks, metabolomics data, proteomics
Data analysis workflows on Galaxy
• Leave the basic processing to users and focus on advanced analysis
![Page 18: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/18.jpg)
Database mirroring
Biological file sources
• Ensembl
• UCSC
• NCBI BLAST DBs
• UniProt
• PDB
• iGenomes (Illumina; only human for now, the rest is upcoming)
All indexed and formatted for
• NCBI BLAST+ (makeblastdb for proteins and nucleic acids)
• Bowtie & Bowtie2
• BWA
• fastaindex (Exonerate)
• GEM
• faToTwoBit
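For readers who have never built such indexes, the sketch below lists typical command lines for a few of the tools named above. It only assembles the commands and does not run them. The FASTA path, the output layout and the function name `index_commands` are hypothetical and do not describe how the mirrors are actually organized.

```python
from pathlib import PurePosixPath


def index_commands(fasta, out_dir, molecule="nucl"):
    """Return the command lines (as argument lists) to index one FASTA file.

    molecule is "nucl" or "prot", the two values makeblastdb accepts for -dbtype.
    The directory layout under out_dir is an assumption made for illustration.
    """
    if molecule not in ("nucl", "prot"):
        raise ValueError("molecule must be 'nucl' or 'prot'")
    fasta = PurePosixPath(fasta)
    out_dir = PurePosixPath(out_dir)
    stem = fasta.stem

    commands = [
        ["makeblastdb", "-in", str(fasta), "-dbtype", molecule,
         "-out", str(out_dir / "blast" / stem)],
    ]
    if molecule == "nucl":
        # Read aligners and the 2bit format only apply to nucleotide sequences.
        commands += [
            ["bowtie2-build", str(fasta), str(out_dir / "bowtie2" / stem)],
            ["bwa", "index", "-p", str(out_dir / "bwa" / stem), str(fasta)],
            ["faToTwoBit", str(fasta), str(out_dir / "2bit" / (stem + ".2bit"))],
        ]
    return commands


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in index_commands("/data/genomes/example.fa", "/data/indexes"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the corresponding tool is installed and the output directories exist.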
![Page 19: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/19.jpg)
Where are they stored?
In CRG common storage: /db
More information: http://biocore.crg.cat/wiki/Category:Mirrors
IMPORTANT: /db/seq (former /seq) IS DEPRECATED
![Page 20: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/20.jpg)
WEB and Database services
Applications
• Data and project management
• Platforms for big data analysis and complex information querying
• Promotion and publication of scientific results
![Page 21: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/21.jpg)
WEB and Database services Example
SuperFly for Yogi Jaeger
Visual catalogue of gene expression during embryonic development in different fly species.
![Page 22: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/22.jpg)
WEB and Database services Example
PRGDB with Walter Sanseverino
Wiki-based database of plant resistance genes.
![Page 23: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/23.jpg)
Activity per category in 2014
![Page 24: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/24.jpg)
Presentation of the Galaxy platform
Jean-François Taly
Bioinformatics Core Facility, CRG (Barcelona, Catalonia, Spain)
September 18th 2014
EMBO Global Exchange Course
Pasteur Institute of Tunis, Tunisia
![Page 25: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/25.jpg)
Why Should I Use Galaxy?
Biologists:
• Linux-free data analysis with a graphical interface
Bioinformaticians:
• Ensure reproducibility when sharing analyses and workflows
• Teach their knowledge to a broad audience
• Get access to workflows for topics they are not familiar with
Software developers:
• Distribute their tools on a standardized platform
![Page 26: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/26.jpg)
The Galaxy Team
Galaxy is developed by:
• The Nekrutenko lab in the Center for Comparative Genomics and Bioinformatics at Penn State University
• The Taylor lab at Johns Hopkins University
• The community
https://wiki.galaxyproject.org/GalaxyTeam
![Page 27: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/27.jpg)
Rationale behind Galaxy
From Goecks et al. Genome Biol. 2010.
“Computation has become an essential tool in life science research. This is exemplified in genomics, where first microarrays and now massively parallel DNA sequencing have enabled a variety of genome-wide functional assays, such as ChIP-seq and RNA-seq (and many others), that require increasingly complex analysis tools. However, sudden reliance on computation has created an 'informatics crisis' for life science researchers: computational resources can be difficult to use, and ensuring that computational experiments are communicated well and hence reproducible is challenging. Galaxy helps to address this crisis by providing an open, web-based platform for performing accessible, reproducible, and transparent genomic science.”
![Page 28: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/28.jpg)
Why Should I Use Galaxy?
Biologists:
• Linux-free data analysis with a graphical interface
Bioinformaticians:
• Ensure reproducibility when sharing analyses and workflows
• Teach their knowledge to a broad audience
• Get access to workflows for topics they are not familiar with
Software developers:
• Distribute their tools on a standardized platform
![Page 29: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/29.jpg)
Makes bioinformatics accessible
![Page 30: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/30.jpg)
From a command line …
![Page 31: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/31.jpg)
… to a graphical interface
![Page 32: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/32.jpg)
One step
![Page 33: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/33.jpg)
Multi-step protocol (steps 1 to 5)
![Page 34: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/34.jpg)
Workflow
![Page 35: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/35.jpg)
Galaxy Tutorials
https://usegalaxy.org/u/jeremy/p/galaxy-rna-seq-analysis-exercise
https://wiki.galaxyproject.org/Learn
![Page 36: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/36.jpg)
NGS on a laptop
• MinION brings NGS to your laptop
• http://youtu.be/UtXlr19xTh8
![Page 37: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/37.jpg)
Why Should I Use Galaxy?
Biologists:
• Linux-free data analysis with a graphical interface
Bioinformaticians:
• Ensure reproducibility when sharing analyses and workflows
• Teach their knowledge to a broad audience
• Get access to workflows for topics they are not familiar with
Software developers:
• Distribute their tools on a standardized platform
![Page 38: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/38.jpg)
Reproducibility
Bioinformaticians suffer from this too!
• Results can change depending on
  • Library and software versions
  • Genome annotations
• Results are published without the code
Want to share your findings with everybody?
• Freeze an environment in a virtual machine
• Use an application container (Docker)
• Prepare a Galaxy workflow
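Even before freezing a whole environment, a lighter habit is to save the versions an analysis depended on next to its results. The sketch below is one possible way to do that in Python; the function name `environment_snapshot`, the `annotation` field and the package names in the usage example are hypothetical and should be adapted to the project.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(packages, annotation=None):
    """Collect interpreter, OS and package versions as a JSON-friendly dict.

    packages: names of installed distributions to record.
    annotation: free-text label for the genome annotation used (assumption:
    the analyst knows which release was used and passes it in by hand).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly instead of failing.
            versions[name] = None
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "annotation": annotation,
    }


if __name__ == "__main__":
    snapshot = environment_snapshot(["numpy", "pandas"], annotation="release used")
    print(json.dumps(snapshot, indent=2))
```

Storing this file alongside the output does not replace a virtual machine, a container or a Galaxy workflow, but it documents what would need to be rebuilt.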
![Page 39: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/39.jpg)
Improve the visibility of a paper
Why not include this as well?
“A Galaxy workflow and the corresponding wrappers are available to download at https://mylab.com. A virtual machine containing a preconfigured server can be downloaded at the same address.”
![Page 40: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/40.jpg)
Galaxy Workflows
![Page 41: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/41.jpg)
Why Should I Use Galaxy?
Biologists:
• Linux-free data analysis with a graphical interface
Bioinformaticians:
• Ensure reproducibility when sharing analyses and workflows
• Teach their knowledge to a broad audience
• Get access to workflows for topics they are not familiar with
Software developers:
• Distribute their tools on a standardized platform
![Page 42: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/42.jpg)
Wrapping software
Software
The wrapper prepares the command line
XML file
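The idea of a wrapper can be sketched in a few lines: read a description of the tool from XML, then fill a command template with the values the user chose. The example below is a simplified stand-in and not Galaxy's implementation. Galaxy fills its command templates with the Cheetah engine, whereas this sketch swaps in Python's `string.Template`; the tool description `TOOL_XML` is a made-up minimal example and the function name `build_command` is hypothetical.

```python
import shlex
import xml.etree.ElementTree as ET
from string import Template

# Made-up, minimal tool description loosely modelled on a tool XML file.
TOOL_XML = """
<tool id="line_count" name="Count lines">
  <command>wc -l $input &gt; $output</command>
  <inputs>
    <param name="input" type="data" label="File to count"/>
  </inputs>
  <outputs>
    <data name="output" format="txt"/>
  </outputs>
</tool>
"""


def build_command(tool_xml, values):
    """Build the shell command for a tool from its XML and the chosen values."""
    tool = ET.fromstring(tool_xml.strip())

    # Names the template is allowed to use: declared inputs and outputs.
    declared = [p.get("name") for p in tool.findall("./inputs/param")]
    declared += [d.get("name") for d in tool.findall("./outputs/data")]

    missing = [name for name in declared if name not in values]
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")

    # Quote every value so that spaces or shell metacharacters stay harmless.
    quoted = {name: shlex.quote(str(values[name])) for name in declared}
    template = Template(tool.findtext("command").strip())
    return template.substitute(quoted)


if __name__ == "__main__":
    print(build_command(TOOL_XML, {"input": "reads.txt", "output": "count.txt"}))
```

The separation matters: the XML declares what the user may set, and only the wrapper decides how those values become a command line.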
![Page 43: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/43.jpg)
Simple wrapper example
![Page 44: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/44.jpg)
Wrappers can launch scripts
venn_diagram.sh
![Page 45: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/45.jpg)
TopHat wrapper (1)
XML file describing TopHat parameters
![Page 46: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/46.jpg)
TopHat wrapper (2)
XML file describing TopHat parameters
![Page 47: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/47.jpg)
Community Tools/Wrappers
![Page 48: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/48.jpg)
Should I install Galaxy?
Galaxy public servers
Good points
• Free
• No IT tasks
• Come with reference genomes and workflows
Bad points
• Offer limited resources (disk/CPUs)
• Data transfer may take a long time
• Give access only to the tools they choose
• Data security may not be guaranteed
![Page 49: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/49.jpg)
Galaxy Public Servers
https://wiki.galaxyproject.org/PublicGalaxyServers
![Page 50: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/50.jpg)
Should I install Galaxy?
Galaxy local server
Good points
• Total control over data and tools
• Your own disk and CPU limits
• Some companies sell a ready-to-use infrastructure
• The Tool Shed helps to install wrappers and software
Bad points
• Cost of installation and maintenance
• IT support is needed for an advanced multi-user setup
![Page 51: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/51.jpg)
Get Galaxy
https://wiki.galaxyproject.org/Admin/GetGalaxy
Can only be installed on Linux or Mac
![Page 52: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/52.jpg)
[Diagram: Galaxy server architecture. Elements: User, FTP, Galaxy server, HPC, NFS:/software, NFS:/db, /scratch. Labels: Tools, Data, Software, Sequences, Indexes, Files, Back-up, tmp, "30 days max.", "Files > 2 GB".]
![Page 53: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/53.jpg)
Database engine
• The Galaxy team recommends PostgreSQL, but it can also be MySQL
• Stores user details and data information
Tools = wrappers
• File describing all possible parameters of a piece of software
• Script preparing the correct command line
Apache server
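Switching the database engine usually comes down to changing one connection URL. The sketch below builds such a URL in the SQLAlchemy style (`engine://user:password@host:port/database`); it assumes Galaxy reads this string from its `database_connection` setting, which should be checked against the Galaxy administration documentation. The function name `database_url` and all credentials shown are hypothetical.

```python
from urllib.parse import quote

# Engines mentioned on the slide, mapped to their URL scheme.
SCHEMES = {"postgresql": "postgresql", "mysql": "mysql"}


def database_url(engine, user, password, host, name, port=None):
    """Build a connection URL of the form engine://user:password@host:port/name."""
    if engine not in SCHEMES:
        raise ValueError(f"unsupported engine: {engine}")
    # Percent-encode credentials so characters such as '@' or '/' stay valid.
    location = f"{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    if port is not None:
        location += f":{port}"
    return f"{SCHEMES[engine]}://{location}/{name}"


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    print(database_url("postgresql", "galaxy", "p@ss/word", "localhost", "galaxy_db", 5432))
```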
![Page 54: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/54.jpg)
![Page 55: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/55.jpg)
Shared file system
• NFS (2 PB)
• 10 €/TB/group/month
Access to the shared biological resources
• Ensembl, UCSC
• Genomes and indexes
• UniProt, Pfam, SMART, PDB
Access to the shared software repository
High Performance Computing
• 7 cores, 8 CPUs each (56 in total)
• 47 GB memory
![Page 56: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/56.jpg)
![Page 57: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/57.jpg)
FTP server
• ProFTPD on the server side
• I recommend FileZilla for the client (multiplatform)
Upload from Galaxy
• Files are moved to the shared file system
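The architecture diagram carries the label "Files > 2Gb" next to the FTP element. Assuming this means that files above that size should go through FTP instead of the browser upload, the choice can be sketched as follows; reading the limit as 2 GiB, as well as the names `upload_route` and `BROWSER_LIMIT_BYTES`, are assumptions.

```python
# "Files > 2Gb" on the diagram, read here as 2 GiB (assumption).
BROWSER_LIMIT_BYTES = 2 * 1024 ** 3


def upload_route(size_bytes):
    """Return "ftp" for files above the limit, "browser" otherwise."""
    if size_bytes < 0:
        raise ValueError("file size cannot be negative")
    return "ftp" if size_bytes > BROWSER_LIMIT_BYTES else "browser"


if __name__ == "__main__":
    for size in (500 * 1024 ** 2, 5 * 1024 ** 3):
        print(size, upload_route(size))
```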
![Page 58: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/58.jpg)
Summary
Galaxy is an open, web-based platform for computational biomedical research.
• Accessible: Users without programming experience can run tools and workflows
• Reproducible: Galaxy captures analysis details
• Transparent: Users can share and publish analyses
WIKI: https://wiki.galaxyproject.org/FrontPage
![Page 60: Presentation of the CRG Bioinformatics Core facility Jean-François Taly](https://reader030.vdocuments.mx/reader030/viewer/2022013004/56649da15503460f94a8d279/html5/thumbnails/60.jpg)