Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique LF-2003.01 Introduction to Bioinformatics

Upload: sheila-small

Post on 04-Jan-2016

SIB and EMBnet: bioinformatics resources for biomedical scientists

The Swiss Institute of Bioinformatics

Founded in March 1998

Collaborative structure: Lausanne - Geneva - Basel

Groups at ISREC, the Ludwig Institute, Unil, HUG, UniGe, recently UniBas and soon EPFL

Several roles: teaching, services, research

Currently: ~130 employees

Projects at SIB

Databases: SWISS-PROT, PROSITE, EPD, World-2DPAGE, SWISS-MODEL, TrEST, TrGEN (predicted proteins), tromer (transcriptome)

Software: Melanie, Deep View, proteomic tools, ESTScan, pftools, Java applets

Services: web servers (ExPASy, EMBnet), teaching and helpdesk

Research: mostly sequence and expression analysis, 3D structure, and proteomics

Teaching

DEA (master's degree) in Bioinformatics: 1 year full time, the first diploma shared by UniGe and Unil

EMBnet courses: two 1-week courses per year in Lausanne, to be extended to Basel

Undergraduate courses at the Universities of Geneva, Fribourg and Lausanne

Other courses at CHUV and EPFL

Courses in other countries: Colombia, Cambodia, Peru, …

Research

New algorithms (faster alignments…)

New technology (GRID or cluster computing)

New tools (protein analysis, microarrays, confocal microscopy)

New databases (microarrays, transcriptome, proteome)

Collaborations with lab researchers!

Three levels of services

Simple web access to software and databases: easy to use for occasional, basic research with a few sequences; potentially insecure

Command-line access with a local Unix account: more powerful (automation) and more secure; requires an understanding of the Unix system and frequent practice

Collaboration with SIB: access to experts in the field (helpdesk); for projects requiring extensive programming or special hardware resources

SIB’s important sites

Home: www.isb-sib.ch

ExPASy - Expert Protein Analysis System: www.expasy.org

Hits database and tools: hits.isb-sib.ch

EMBnet Switzerland: www.ch.embnet.org

Geneva Bioinformatics: www.genebio.ch

SIB home

Expert Protein Analysis System

Swiss node http://www.ch.embnet.org

EMBnet organisation

Started in Europe in 1988, now spread worldwide: 32 country nodes, 8 special nodes

Roles: training and education (EMBER); software development (EMBOSS, SRS); computing resources (databases, websites, services); helpdesk and technical support; publications (EMBnet.news, Briefings in Bioinformatics)

Access: www.embnet.org; each node is at “www.xx.embnet.org”, where xx is the country code (e.g., ch for Switzerland)

EMBnet home

European Molecular Biology Open Software Suite

Free and open source (for most Unix platforms)

GCG successor (compatible with the GCG file format)

More than 200 programs

Easy to install locally, but no interface: requires local databases, Unix command line only

Interfaces: Jemboss, www2gcg, w2h, wemboss … (with account); Pise, EMBOSS-GUI (no account)

Access: www.emboss.org

Other important sites

ExPASy - Expert Protein Analysis System www.expasy.org

EBI - European Bioinformatics Institute www.ebi.ac.uk

NCBI - National Center for Biotechnology Information www.ncbi.nlm.nih.gov

Sanger - The Sanger Institute www.sanger.ac.uk

Bioinformatics: definition

Any application of computer science to biology: sequence analysis, image analysis, sample management, population modelling, …

Analysis of data coming from large-scale biological projects: genomes, transcriptomes, proteomes, metabolomes, etc.

The new biology

Traditional biology: a small team working on a specialized topic; well-defined experiments to answer precise questions

New « high-throughput » biology: large international teams using cutting-edge technology define the project; results are released raw to the scientific community, without any underlying hypothesis

Examples of « high-throughput » biology

Complete genome sequencing

Large-scale sampling of the transcriptome (ESTs)

Simultaneous expression analysis of thousands of genes (DNA microarrays, SAGE)

Large-scale sampling of the proteome

Large-scale protein-protein interaction analysis by 2-hybrid screens (yeast, worm)

Large-scale 3D structure production (yeast)

Metabolism modelling

Simulations

Biodiversity

Role of bioinformatics

Control and management of the data

Analysis of primary data, e.g. base calling from chromatograms, mass spectra analysis, DNA microarray image analysis

Statistics

Database storage and access

Analysis of results in a biological context

First information: a sequence?

Nucleotide: RNA (or cDNA), or genomic (intron-exon)? Complete or incomplete (mRNA with 5’ and 3’ UTR regions, entire chromosome)?

Protein: pre/pro form or functional protein? Function prediction? Post-translational modifications? Holy Grail: 3D structure?
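A first practical question is simply whether a raw sequence is nucleotide or protein. A minimal Python sketch, assuming a simple composition heuristic (the 90% threshold and the function name `guess_sequence_type` are illustrative choices, not from the lecture): it counts the fraction of letters that are A, C, G, T, U or N. Because A, C, G and T are also valid amino-acid codes, a short or unusual protein can fool it.

```python
def guess_sequence_type(seq: str, threshold: float = 0.9) -> str:
    """Guess whether a raw sequence is nucleotide or protein.

    Heuristic (an assumption, not a standard): if at least `threshold` of the
    letters are A, C, G, T, U or N, call it nucleotide; otherwise protein.
    """
    letters = [c for c in seq.upper() if c.isalpha()]  # ignore spaces, digits, gaps
    if not letters:
        raise ValueError("sequence contains no letters")
    nucleotide_letters = set("ACGTUN")
    fraction = sum(c in nucleotide_letters for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


print(guess_sequence_type("ATGGCGTACGTTAGC"))
print(guess_sequence_type("MKTAYIAKQRQISFVKSHF"))
```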

Genomes in numbers

Sizes: virus: 10³ to 10⁵ nt; bacteria: 10⁵ to 10⁷ nt; yeast: 1.35 x 10⁷ nt; mammals: 10⁸ to 10¹⁰ nt; plants: 10¹⁰ to 10¹¹ nt

Gene number: virus: 3 to 100; bacteria: ~1000; yeast: ~7000; mammals: ~30’000; plants: 30’000-50’000?
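Dividing genome size by gene number shows how sparse genes become in larger genomes. A small Python sketch of that arithmetic, using the yeast figures above and the human figures from the human genome slide (3 x 10⁹ nt haploid, ~30’000 genes); the helper name `nt_per_gene` is hypothetical:

```python
# Figures taken from the slides: yeast 1.35e7 nt with ~7000 genes,
# human (haploid) 3e9 nt with ~30'000 genes.
genomes = {
    "yeast": (1.35e7, 7_000),
    "human": (3e9, 30_000),
}


def nt_per_gene(genome_size_nt: float, gene_count: int) -> float:
    """Average genome length per gene (genome size divided by gene number)."""
    return genome_size_nt / gene_count


for name, (size, genes) in genomes.items():
    print(f"{name}: ~{nt_per_gene(size, genes):,.0f} nt per gene")
```

With these figures, yeast has roughly one gene per 1.9 kb of genome and human roughly one per 100 kb, which helps explain why locating genes is harder in mammalian genomes.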

Sequencing projects

« Small » genomes (<10⁷ nt): bacteria, viruses

Many already sequenced (excluding those sequenced by industry); more than 100 microbial genomes already in the public domain

More to come! (one new every two weeks…)

« Large » genomes (10⁷-10¹⁰ nt): eukaryotes

15 finished (S. cerevisiae, S. pombe, E. cuniculi, G. theta, C. elegans, D. melanogaster, A. gambiae, P. falciparum, P. yoelii, D. rerio, F. rubripes, A. thaliana, O. sativa (2x), M. musculus, Homo sapiens)

Many more to come: rat, pig, cow, maize (and other plants), insects, fish, many pathogenic parasites (Leishmania…)

EST sequencing: partial mRNA sequences, ~15 x 10⁶ sequences in the public domain

Human genome

Size: 3 x 10⁹ nt for a haploid genome

Highly repetitive sequences: 25%; moderately repetitive sequences: 25-30%

Size of a gene: from 900 to >2’000’000 bases (introns included)

Proportion of the genome coding for proteins: 5-7%

Number of chromosomes: 22 autosomes, 1 sex chromosome

Size of a chromosome: 5 x 10⁷ to 5 x 10⁸ bases

Figure: schematic chromosome showing the centromere, telomere, exons of a gene, regulatory elements, repetitive sequences and the locus control region

How to sequence the human genome?

« International » consortium approach:

Generate genetic maps (meiotic recombination) and pseudogenetic maps (chromosome hybrids) for indicator sequences

Generate a physical map based on large clones (BAC or PAC)

Sequence enough large clones to cover the genome

« Commercial » approach (Celera):

Generate random libraries of fixed-length genomic clones (2 kb and 10 kb)

Sequence both ends of enough clones to obtain 10x coverage

Use computational techniques to reconstruct the chromosomal sequences, and check them against the public project's physical map
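The « 10x coverage » target of the shotgun approach translates into a number of reads through coverage = reads x read length / genome size. The sketch below assumes a read length of 500 nt (not stated in the slides) and uses the Lander-Waterman (Poisson) approximation, under which the expected fraction of bases never sequenced is e^(-coverage). Function names are hypothetical.

```python
import math


def reads_needed(genome_size: float, read_length: float, coverage: float) -> int:
    """Reads required so that total sequenced bases = coverage * genome size."""
    return math.ceil(coverage * genome_size / read_length)


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman / Poisson estimate of the fraction of bases never sequenced."""
    return math.exp(-coverage)


GENOME = 3e9     # haploid human genome size, from the slides
READ_LEN = 500   # ASSUMPTION: read length in nt, not given in the slides
COVERAGE = 10    # "10x coverage" target of the Celera approach

reads = reads_needed(GENOME, READ_LEN, COVERAGE)
clones = math.ceil(reads / 2)  # both ends of each clone are sequenced
print(f"reads: {reads:,}  clones: {clones:,}")
print(f"expected fraction of bases not covered: {fraction_uncovered(COVERAGE):.1e}")
```

Under these assumptions, about 6 x 10⁷ reads (3 x 10⁷ clones) are needed, and only about 0.005% of the genome is expected to remain unsequenced; in practice, repeats and unclonable regions leave larger gaps.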

Sequencing progression

Interpretation of the human draft

Still many gaps and small unordered pieces (except for chromosomes 6, 7, 13, 14, 20, 21, 22 and Y)

Even a genomic sequence does not tell you where the genes are encoded: the genome is far from being « decoded »

Genome and transcriptome data must be combined to get a better idea of where the genes are

Last freeze: NCBI30, June 24, 2002

The transcriptome

The set of all functional RNAs (tRNA, rRNA, mRNA etc…) that can potentially be transcribed from the genome

The documentation of the localization (cell type) and conditions under which these RNAs are expressed

The documentation of the biological function(s) of each RNA species

Public draft transcriptome

Information about the expression specificity and the function of mRNAs:

« full » cDNA sequences of known function

« full » cDNA sequences, but « anonymous » (e.g., KIAA or DKFZ collections)

EST sequences: cDNA libraries derived from many different tissues, with rapid random sequencing of the ends of all clones; ORESTES sequences

Growing set of expression data (microarrays, SAGE, etc.)

Increasing evidence for multiple alternative splicing and polyadenylation

Example mapping of ESTs and mRNAs

Figure: ESTs and mRNAs mapped to the same genomic region, compared with a computer gene prediction

The proteome

Set of proteins present in a particular cell type under particular conditions

Set of proteins potentially expressed from the genome

Information about the specific expression and function of the proteins

Information on the proteome

Separation of a complex mixture of proteins: 2D PAGE (IEF + SDS-PAGE), capillary chromatography

Individual characterisation of proteins: tryptic peptide signature (MS), sequencing by chemistry or MS/MS

All post-translational modifications (PTMs)!
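The tryptic peptide signature relies on trypsin's cleavage specificity. A minimal Python sketch of an in-silico digest, assuming the common simplified rule (cut after K or R, but not before P) and ignoring missed cleavages; computing the peptide masses that would be compared with a mass spectrum is left out, and `trypsin_digest` is a hypothetical name:

```python
def trypsin_digest(protein: str) -> list[str]:
    """Cut after K or R, except when the next residue is P (simplified rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])  # peptide ends with K or R
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


print(trypsin_digest("MAKPLRSTKGER"))
```

Here the K at position 3 is not cut because it is followed by P, so the digest yields MAKPLR, STK and GER.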

Three-dimensional structures

Methods to determine structures: X-ray crystallography, NMR

Data format: atom coordinates (except H) in a Cartesian space

Databases: for proteins and nucleic acids (RCSB, formerly PDB); independent databases for sugars and small organic molecules
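Because the coordinates are stored in fixed-width columns, they are easy to read programmatically. A sketch in Python, assuming the standard PDB ATOM/HETATM column layout (x, y and z in columns 31-38, 39-46 and 47-54); the two records are made-up example coordinates, and the helper names are hypothetical. The code computes an interatomic distance in angstroms.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the PDB fixed-column layout."""
    return Atom(
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance, in the units of the coordinates (angstroms for PDB)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Made-up example records (backbone N and CA of a methionine).
EXAMPLE = [
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  MET A   1      11.639   6.071  -5.147",
]
atoms = [parse_atom_line(l) for l in EXAMPLE if l.startswith(("ATOM", "HETATM"))]
print(f"{atoms[0].name}-{atoms[1].name}: {distance(atoms[0], atoms[1]):.3f} A")
```

Splitting on whitespace would be simpler, but fixed columns are the safer choice because adjacent numeric fields can run together in real files.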

Visualisation of the structures

Secondary structure elements: alpha helices, beta sheets, other

Software: various representations (atoms, bonds, secondary…); wide choice of commercial and free software (e.g., DeepView)

Sequence information, and so what?

How to store and organise it? Databases (next lecture)

How to access, search, compare it?

Pairwise alignments, dot plots (Tuesday)

BLAST searches in databases (Tuesday)

EST clustering (Wednesday)

Multiple alignments (Wednesday)

Patterns, PSI-BLAST, profiles and HMMs (Thursday)

Gene prediction (Thursday)

Protein function prediction (Friday)

Users' problems (Friday)
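Dot plots can be sketched in a few lines: a cell (i, j) is marked when the word of length `window` starting at position i of one sequence equals the word starting at position j of the other. A minimal Python illustration (function names are hypothetical; real dot-plot programs typically also accept a mismatch threshold within the window):

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1) -> list[list[bool]]:
    """matrix[i][j] is True when the `window`-long words starting at
    seq_a[i] and seq_b[j] are identical."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [[seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
            for i in range(rows)]


def render(matrix: list[list[bool]]) -> str:
    """Text rendering: '*' for a match, '.' otherwise; one line per row of seq_a."""
    return "\n".join("".join("*" if hit else "." for hit in row) for row in matrix)


print(render(dot_plot("GATTACA", "GATTACA", window=2)))
```

Diagonal runs of `*` indicate similar regions; increasing `window` removes random single-letter matches.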

Thank you