introduction to bioinformatics

Page 1: Introduction to Bioinformatics

Introduction to Bioinformatics

Lecturer: Prof. Yael Mandel-Gutfreund

Teaching Assistants:

Idit Kosti

Inbal Tal

Edward Vitkin

Course web site: http://webcourse.cs.technion.ac.il/236523

Page 2: Introduction to Bioinformatics

What is Bioinformatics?

Page 3: Introduction to Bioinformatics

Course Objectives

• To introduce the bioinformatics discipline

• To make the students familiar with the major biological questions which can be addressed by bioinformatics tools

• To introduce the major tools used for sequence and structure analysis and explain in general how they work (limitations, etc.)

Page 4: Introduction to Bioinformatics

Course Structure and Requirements

1. Class structure
• 2-hour lecture
• 1-hour tutorial

2. Homework
• Homework assignments will be given every second week
• The homework will be done in pairs
• 5/5 homework assignments will be submitted

3. A final project will be conducted in pairs
• The project will be presented as a poster on poster day (20.3)

Page 5: Introduction to Bioinformatics

Grading

• 20 % Homework assignments

• 80 % final project

(10% proposal,

20% supervisor evaluation

70% poster presentation)

Page 6: Introduction to Bioinformatics

Literature list

• Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.

• Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.

• Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.

Advanced Reading

• Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

Page 7: Introduction to Bioinformatics

What is Bioinformatics?

Page 8: Introduction to Bioinformatics

What is Bioinformatics?

“The field of science in which biology, computer science, and information technology merge to form a single discipline”

Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.

Page 9: Introduction to Bioinformatics

Central Paradigm in Molecular Biology

Gene (DNA) → mRNA → Protein

21st century:

Genome → Transcriptome → Proteome

Page 10: Introduction to Bioinformatics

From DNA to Genome

[Timeline figure, 1955 to 1985, starting with the Watson and Crick DNA model]

Page 11: Introduction to Bioinformatics

[Timeline figure, 1990 to 2000: first genome (Haemophilus influenzae), yeast genome, first human genome draft (2000)]

Page 12: Introduction to Bioinformatics

Complete Genomes

              2010   2005
Total         1379    294
Eukaryotes     133     39
Bacteria      1152    235
Archaea         94     23

Page 13: Introduction to Bioinformatics

1,000 Genomes Project: Expanding the Map of Human Genetics

Researchers hope the effort will speed up the discovery of many diseases' genetic roots

Page 14: Introduction to Bioinformatics

25000 genomes… What's Next?

The “post-genomics” era

Main Goal: To understand the living cell

Annotation
Comparative genomics
Functional genomics
Systems Biology

Page 15: Introduction to Bioinformatics

From… 25000 genomes

To… understanding living cells

Page 16: Introduction to Bioinformatics

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG

CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA

CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC

AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA

AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA

TAT GGA CAA TTG GTT TCT TCT CTG AAT ......

.............. TGAAAAACGTA

Annotation

Page 17: Introduction to Bioinformatics

Annotation

Identify the genes within a given sequence of DNA

Identify the sites which regulate the gene

Predict the function

Page 18: Introduction to Bioinformatics

How do we identify a gene in a genome?

A gene is characterized by several features (promoter, ORF…); some are easier and some harder to detect…
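As one illustration of an easier-to-detect feature, the sketch below scans the three forward reading frames of a DNA string for open reading frames (ORFs): a start codon (ATG) followed in frame by a stop codon. The function name `find_orfs`, the `min_codons` threshold, and the toy sequence are assumptions made for this example; real gene finders also consider the reverse strand, promoters, and splicing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons is the minimum number of codons before the stop.
    """
    orfs = []
    for frame in range(3):
        start = None  # index of the current candidate start codon
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical toy sequence, not taken from the slides.
toy = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

In a bacterial genome, long ORFs found this way are good gene candidates; in organisms with introns, a simple in-frame scan like this misses most genes.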

Page 19: Introduction to Bioinformatics

Using Bioinformatics approaches for Gene hunting

Relatively easy in simple organisms (e.g. bacteria)

VERY HARD for higher organisms (e.g. humans)

Page 20: Introduction to Bioinformatics

Comparative genomics

Page 21: Introduction to Bioinformatics

How human are chimps?

Comparison between the full drafts of the human and chimp genomes revealed that they differ only by 1.23%

Perhaps not surprising!!!

Page 22: Introduction to Bioinformatics

So where are we different ??

Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGG--GGATGCGGGCCCTATACCC
Mouse ATAGCG---GGATGCGGCGC-TATACC-A

Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGGGGATGCGGGCCCTATACCC
Mouse ATAGCGGGATGCGGCGCTATACCA
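A figure such as "differ by 1.23%" comes from counting matching columns in an alignment. The sketch below computes percent identity for the aligned human and chimp rows on this slide. The function name `percent_identity` and the `ignore_gaps` option are assumptions made for this example, and whether gapped columns count toward the total is a convention that varies between tools.

```python
def percent_identity(a, b, ignore_gaps=False):
    """Percent of alignment columns where both rows hold the same base.

    With ignore_gaps=True, columns containing '-' in either row are
    left out of the total instead of being counted as differences.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(a, b))
    if ignore_gaps:
        columns = [(x, y) for x, y in columns if x != "-" and y != "-"]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Aligned rows as they appear on the slide.
human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"

print(round(percent_identity(human, chimp), 1))
print(round(percent_identity(human, chimp, ignore_gaps=True), 1))
```

The two calls differ only in how the two gapped columns are treated, which is why reported similarity figures should always state the convention used.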

Page 23: Introduction to Bioinformatics

And where are we similar ???

VERY SIMILAR: conserved between many organisms

VERY DIFFERENT

Page 24: Introduction to Bioinformatics

Functional genomics

Page 25: Introduction to Bioinformatics

TO BE IS NOT ENOUGH

At any time point a gene can be functional or not

Page 26: Introduction to Bioinformatics

From the gene expression pattern we can learn:

• What does the gene do?
• When is it needed?
• What other genes or proteins interact with it?
• …

What's wrong??

Page 27: Introduction to Bioinformatics

Systems Biology

Page 28: Introduction to Bioinformatics

Jeong et al. Nature 411, 41 - 42 (2001)

Biological networks

Page 29: Introduction to Bioinformatics

What can we learn from a network?

Page 30: Introduction to Bioinformatics

What can we learn from Biological Networks

What can we learn about this protein?

• Is the protein essential for the organism?
• Is it a good drug target?
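One simple quantity that can be read off an interaction network is a protein's degree, the number of partners it interacts with. Highly connected proteins ("hubs") are often treated as candidates for essential proteins. The sketch below counts degrees in a small network. The protein names P1 to P6 and the edge list are hypothetical, and degree alone is only a rough indicator.

```python
from collections import defaultdict


def degrees(edges):
    """Map each protein to its number of distinct interaction partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


# Hypothetical toy interaction network (undirected edges).
EDGES = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P1", "P5"), ("P2", "P3"), ("P5", "P6"),
]

deg = degrees(EDGES)
hub = max(deg, key=deg.get)  # protein with the most partners
print(hub, deg[hub])
```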

Page 31: Introduction to Bioinformatics

What of all this will we learn in the course?

The course will concentrate on the bioinformatics tools and databases which are used to:

• Annotate genes
• Compare genes and genomes
• Infer the function of the genes and proteins
• Analyze the interactions between genes and proteins
• Etc.

Page 32: Introduction to Bioinformatics

Biological Databases

The different types of data are collected in databases

– Sequence databases
– Structural databases
– Databases of Experimental Results

All databases are connected

Page 33: Introduction to Bioinformatics

Sequence databases

• Gene database

• Genome database

• Disease related mutation database

• ………….

Page 34: Introduction to Bioinformatics

Genome Browsers

Easy “walk” through the genome

UCSC Genome Browser http://genome.ucsc.edu/

Page 35: Introduction to Bioinformatics

Disease related database

Page 36: Introduction to Bioinformatics

Sickle Cell Anemia

• Due to a single nucleotide substitution (an A replaced by a T), which causes valine to be incorporated instead of glutamic acid in the beta chain of hemoglobin

Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/

Page 37: Introduction to Bioinformatics

Healthy Individual

>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA

GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]

MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG

AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

Page 38: Introduction to Bioinformatics

Diseased Individual

>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT

GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]

MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG

AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
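The link between the nucleotide change and the amino acid change can be checked with a few lines of code. The sketch below takes the first 80 nucleotides of the healthy HBB mRNA shown above, introduces the sickle-cell substitution (A to T in the first GAG codon after CCT, the codon for the glutamic acid at position 7 counting the initial methionine), and translates both versions with the standard genetic code. The names `translate`, `HEALTHY`, and `SICKLE` are assumptions made for this example. T is used in place of U, as in the database records.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna, start):
    """Translate from index start until a stop codon or the sequence end."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 80 nucleotides of the healthy HBB mRNA from the slide.
HEALTHY = ("ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACC"
           "ATGGTGCATCTGACTCCTGAGGAGAAGTCT")
# Sickle-cell variant: A -> T at index 69 (0-based), i.e. GAG -> GTG.
SICKLE = HEALTHY[:69] + "T" + HEALTHY[70:]

start = HEALTHY.find("ATG")  # first start codon
diffs = [i for i in range(len(HEALTHY)) if HEALTHY[i] != SICKLE[i]]

print(diffs)
print(translate(HEALTHY, start))
print(translate(SICKLE, start))
```

The two translated fragments differ at a single residue, matching the first residues of the healthy and diseased protein sequences listed above.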

Page 39: Introduction to Bioinformatics

Structure Databases

• 3-dimensional structures of proteins, nucleic acids, molecular complexes etc

• 3-d data is available due to techniques such as NMR and X-Ray crystallography

Page 40: Introduction to Bioinformatics

Page 41: Introduction to Bioinformatics

Databases of Experimental Results

• Data such as experimental microarray images- gene expression data

• Proteomic data- protein expression data

• Metabolic pathways, protein-protein interaction data, regulatory networks

• ETC………….

Page 42: Introduction to Bioinformatics

Literature Databases

PubMed

Service of the National Library of Medicine

http://www.ncbi.nlm.nih.gov/pubmed/

Page 43: Introduction to Bioinformatics

Putting it all Together

• Each database contains specific information

• Like biological systems, these databases are interrelated

Page 44: Introduction to Bioinformatics

GENOMIC DATA: GenBank, DDBJ, EMBL

ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR

PROTEIN: PIR, SWISS-PROT

STRUCTURE: PDB, MMDB, SCOP

LITERATURE: PubMed

PATHWAY: KEGG, COG

DISEASE: LocusLink, OMIM, OMIA

GENES: RefSeq, AllGenes, GDB

SNPs: dbSNP

ESTs: dbEST, UniGene

MOTIFS: BLOCKS, Pfam, Prosite

GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress