Page 1: COMP 691R Bioinformatics Algorithms - Concordia Universityusers.encs.concordia.ca/~gregb/home/PDF/comp691R-lecture... · 2003. 5. 16. · Final Examination Focus is on material in

COMP 691R

Bioinformatics Algorithms

Lecture 1a

Course Outline

Classes

Fridays 17:45 to 20:15 in H-605

Instructor

Professor Greg Butler

Room: ER-603-53

Tel: 848 3031

[email protected]

http://www.cs.concordia.ca/~faculty/gregb/

Office Hours and Discussion

In class (preferable).

By appointment (email me with suggested time)

Lab

H-968

Bioinformatics Algorithms

— to cover the major algorithms used in bioinformatics

— emphasize algorithmic principles

Important

— algorithms use available databanks!

— practical performance of algorithms

— use on “farms” or “clusters”

Information Sources

See web site: http://www.cs.concordia.ca/~faculty/gregb/home/c691R.html

Books in Webster Library Reserve.

Selected journal and conference articles. Web sites.

Evaluation

four programming assignments (40%)

two data analysis assignments (20%)

three-hour final examination (40%)

Bioinformatics Algorithms — Real Objectives

Cover the necessary background on bioinformatics so that you can understand the following three papers:

L. Duret and S. Abdeddaim, Multiple alignments for structural, functional, or phylogenetic analyses of homologous sequences. In Bioinformatics: Sequence, Structure and Databanks, edited by D. Higgins and W. Taylor, Oxford University Press, 2000.

C. Notredame, Recent progresses in multiple sequence alignment: a survey, Pharmacogenomics 3(1) (2002) 131–144.

N. Kaminski and N. Friedman, Practical approaches to analyzing results of microarray experiments, American Journal of Respir. Cell Mol. Biol. 27 (2002) 125–132.

Lecture Schedule — Very Tentative

Week 1 (16 May): Course Outline. Biology and Genomics Introduction.

Weeks 2–5 (23 May – 13 June): Basic Algorithms on Sequences.
• Sequence properties, scanning.
• Alignment, multiple alignment, phylogeny.

Weeks 6–9 (20 June – 11 July): Sequence Analysis.
• Basics of sequencing projects.
• Sequence base calling, trimming, assembly.
• Sequence analysis: Blast, InterProScan, PSORT II.
• Secondary structure prediction.

Weeks 10–13 (18 July – 15 August): Microarray Data Analysis.
• Basics of microarray experiments.
• Normalization, differential expression.
• Clustering.
• Bayesian approaches.

Assignments

All course work is individual work.

Four programming assignments:

— must use C++

— basic algorithms from scratch using STL libraries

— advanced work may use existing libraries: EMBOSS, R, Bioconductor libraries

— submit written report for each
    — describe algorithm, design, testing, results
    — source code listing as appendix to report

Two data analysis assignments

— use available tools and libraries

— given datasets

— submit 5–10 page written report for each

Assignments due about every 2 weeks.

— precise schedule to come

Final Examination

Focus is on material in 3 papers and the required background.

Must know basic concepts of genomics.

Must know major algorithms

— purpose, how it works, complexity, limitations

— data structures, heuristics

Should be able to compare major algorithms

Should be able to propose a new algorithm for a data analysis problem using the major existing algorithms and information resources

COMP 691R

Bioinformatics Algorithms

Lecture 1b

Background on Biology and Genomics

Life

Taxonomic Classification of Man

Superkingdom: Eukaryota
Kingdom: Metazoa
Phylum: Chordata
Class: Mammalia
Order: Primates
Family: Hominidae
Genus: Homo
Species: sapiens

Prokaryotes: unicellular organisms whose cells lack membrane-bound nuclei

— includes bacteria and archaea

Eukaryotes: organisms with membrane-bound nuclei in their cells

— includes plants, animals, fungi

Archaea: are prokaryotes

— have some genes resembling eukaryotes

— often live in extreme environments

The Structure of Organisms

Organs

Tissues

Cells

Cellular Compartments

The Gene Ontology has a category for Cellular Component which includes subcellular structures, locations, and macromolecular complexes:

GO:0005575 : cellular_component
    GO:0005623 : cell
        GO:0005627 : ascus
        GO:0030424 : axon
        GO:0005933 : bud
        GO:0000267 : cell fraction
        GO:0009986 : cell surface
        GO:0030425 : dendrite
        GO:0030312 : external encapsulating structure
        GO:0019861 : flagellum
        GO:0042601 : forespore
        GO:0005622 : intracellular
        GO:0016020 : membrane
        GO:0030496 : midbody
        GO:0042597 : periplasmic space
        GO:0030428 : septum
        GO:0005936 : shmoo
        GO:0030427 : site of polarized growth

The Cell

Each bacterium is enclosed by a rigid cell wall composed of a protein-sugar molecule. The wall gives the cell its shape and surrounds the cytoplasmic membrane, protecting it from the environment.

The cytoplasm, or protoplasm, of bacterial cells is where the functions for cell growth, metabolism, and replication are carried out.

The nucleoid is a region of cytoplasm where the chromosomal DNA is located. It is not a membrane-bound nucleus, but simply an area of the cytoplasm where the strands of DNA are found. Most bacteria have a single, circular chromosome.

Ribosomes are microscopic “factories” found in all cells. They translate the genetic code from the molecular language of nucleic acid to that of amino acids, the building blocks of proteins.

Plasmids are small, extrachromosomal genetic structures. Like the chromosome, plasmids are made of a circular piece of DNA. Unlike the chromosome, they are not involved in reproduction.

Pili (singular, pilus) are small hairlike projections. They assist the bacteria in attaching to other cells and surfaces, such as teeth, intestines, and rocks.

Flagella (singular, flagellum) are hairlike structures that provide a means of locomotion.

The Cell

The nucleus is a highly specialized organelle that serves as the information and administrative center of the cell.

Mitochondria are oblong-shaped organelles that are found in the cytoplasm of nearly every eukaryotic cell. In the animal cell, they are the main power generators, converting oxygen and nutrients into energy.

The endoplasmic reticulum is a network of sacs that manufactures, processes, and transports chemical compounds for use inside and outside of the cell. It is connected to the double-layered nuclear envelope, providing a connection between the nucleus and the cytoplasm.

The Golgi apparatus is the distribution and shipping department for the cell’s chemical products. It modifies proteins and fats built in the endoplasmic reticulum and prepares them for export to the outside of the cell.

The Cell

The most important characteristic of plants is their ability to photosynthesize, i.e. make their own food by converting light energy into chemical energy.

This process is carried out in specialized organelles called chloroplasts.

The Cell — Viruses

DNA — Deoxyribonucleic acid

The Genes and Proteins

Genes are segments of DNA encoding information that ultimately directs the production of RNA molecules that serve a variety of functions, including:

— dictating the synthesis of proteins that perform a wide variety of functions in the body

— regulating (turning on or turning off) the expression of other genes

— forming structures in the cell (ribosomes) that are critical for the manufacture of proteins

— transporting amino acids to the ribosomes for the creation of proteins

DNA forms a double helix, with the backbone of each strand consisting of a repeating ...sugar-phosphate-sugar-phosphate... polymer.

– the sugar is deoxyribose

Attached to the sugar ring is one of four nitrogen-containing bases: adenine (A), guanine (G), cytosine (C), thymine (T).

The Genes and Proteins

The combination of one of these nitrogenous bases, a sugar molecule, and a phosphate molecule is called a nucleotide, the basic building block of the DNA molecule.

The double helix is held together by weak hydrogen bonds between each thymine and adenine base, as well as between each guanine and cytosine base.

Each of these pairs is called a base pair, or “bp” for short.

The two strands of DNA are complementary.
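The pairing rule above (A with T, G with C) is enough to compute one strand from the other. The following is a minimal Python sketch, not part of the original slides; the function names are illustrative, and the reversal step relies on the standard fact that the two strands run in opposite directions, which the slide does not state.

```python
# Watson-Crick pairing from the slide: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in its own 5'->3' direction.

    Assumption (not on the slide): the two strands are antiparallel,
    so the complement is reversed before it is reported.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```

Applying `reverse_complement` twice returns the original strand, which is a convenient sanity check when testing.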

The Genes and Proteins

Transcription is the synthesis of a messenger RNA (mRNA) copied from a DNA template.

Translation is the synthesis of proteins by the ribosomes, which interpret the genetic code on the mRNA.

Genetic information is encoded in sequences of three nucleotides termed codons.

Each codon represents one of 20 amino acids or a STOP signal.

A protein is a polypeptide consisting of amino acids.
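As a rough illustration of the two steps above, the sketch below transcribes a DNA string into mRNA and splits it into codons. It is a simplification under a stated assumption: the input is taken to be the coding (sense) strand, so transcription reduces to replacing T with U. The function names are hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand.

    Assumption: the input is the coding (sense) strand, so the mRNA has
    the same sequence with thymine (T) replaced by uracil (U).
    """
    return coding_strand.upper().replace("T", "U")


def codons(mrna: str) -> list:
    """Split an mRNA into consecutive three-nucleotide codons.

    Any incomplete codon at the end (one or two bases) is dropped.
    """
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna)          # AUGGCCUAA
    print(codons(mrna))  # ['AUG', 'GCC', 'UAA']
```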

IUPAC Standard Codes

IUPAC nucleotide codes

nucleotide code    Base
---------------    -------
A                  Adenine
C                  Cytosine
G                  Guanine
T (or U)           Thymine (or Uracil)
R                  A or G
Y                  C or T
S                  G or C
W                  A or T
K                  G or T
M                  A or C
B                  C or G or T
D                  A or G or T
H                  A or C or T
V                  A or C or G
N                  any base
. or -             gap
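The ambiguity codes in this table can be used directly for pattern matching. Here is a small Python sketch, assuming the pattern and the sequence have equal length and that the sequence contains only the concrete bases A, C, G, T. The gap symbols are left out for simplicity, and the names `IUPAC`, `matches`, and `expansions` are illustrative.

```python
import math

# Each IUPAC code maps to the set of concrete bases it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if every base of sequence is allowed by the code at that position."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC[code]
               for code, base in zip(pattern.upper(), sequence.upper()))


def expansions(pattern: str) -> int:
    """Count the concrete DNA sequences that an ambiguous pattern can represent."""
    return math.prod(len(IUPAC[code]) for code in pattern.upper())


if __name__ == "__main__":
    print(matches("RYN", "ACG"))  # True
    print(matches("RYN", "CCG"))  # False
    print(expansions("RYN"))      # 16
```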

IUPAC Standard Codes

IUPAC amino acid codes

IUPAC amino    Three letter
acid code      code            Amino acid
------------   ------------    -------------
A              Ala             Alanine
C              Cys             Cysteine
D              Asp             Aspartic Acid
E              Glu             Glutamic Acid
F              Phe             Phenylalanine
G              Gly             Glycine
H              His             Histidine
I              Ile             Isoleucine
K              Lys             Lysine
L              Leu             Leucine
M              Met             Methionine
N              Asn             Asparagine
P              Pro             Proline
Q              Gln             Glutamine
R              Arg             Arginine
S              Ser             Serine
T              Thr             Threonine
V              Val             Valine
W              Trp             Tryptophan
Y              Tyr             Tyrosine
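Converting between the one-letter and three-letter codes is a simple table lookup. The sketch below is illustrative only. The hyphen-separated output format and the function names are assumptions, not a convention taken from the slides.

```python
# One-letter to three-letter codes, copied from the table above.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

# Inverse mapping, built once from the table.
ONE_LETTER = {three: one for one, three in THREE_LETTER.items()}


def to_three_letter(protein: str) -> str:
    """Convert a one-letter protein string to hyphen-separated three-letter codes."""
    return "-".join(THREE_LETTER[residue] for residue in protein.upper())


def to_one_letter(protein: str) -> str:
    """Convert hyphen-separated three-letter codes back to a one-letter string."""
    return "".join(ONE_LETTER[name.capitalize()] for name in protein.split("-"))


if __name__ == "__main__":
    print(to_three_letter("MKV"))        # Met-Lys-Val
    print(to_one_letter("Met-Lys-Val"))  # MKV
```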

Genetic Code

First               Second Position               Third
Position      U       C       A       G           Position
--------------------------------------------------------
   U          Phe     Ser     Tyr     Cys            U
   U          Phe     Ser     Tyr     Cys            C
   U          Leu     Ser     STOP    STOP           A
   U          Leu     Ser     STOP    Trp            G
--------------------------------------------------------
   C          Leu     Pro     His     Arg            U
   C          Leu     Pro     His     Arg            C
   C          Leu     Pro     Gln     Arg            A
   C          Leu     Pro     Gln     Arg            G
--------------------------------------------------------
   A          Ile     Thr     Asn     Ser            U
   A          Ile     Thr     Asn     Ser            C
   A          Ile     Thr     Lys     Arg            A
   A          MET     Thr     Lys     Arg            G
--------------------------------------------------------
   G          Val     Ala     Asp     Gly            U
   G          Val     Ala     Asp     Gly            C
   G          Val     Ala     Glu     Gly            A
   G          Val     Ala     Glu     Gly            G
--------------------------------------------------------

MET (Methionine), encoded by AUG, is the START codon
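The table can be turned into a lookup structure and used to translate an mRNA. The following Python sketch is one possible encoding, not the course's reference implementation. It builds the 64-entry table from a compact string in U, C, A, G order (first, then second, then third position), starts at the first AUG, and stops at the first STOP codon. Using `*` for STOP and the names `CODON_TABLE` and `translate` are assumptions made for illustration.

```python
from itertools import product

BASES = "UCAG"

# One-letter amino acids for all 64 codons, ordered by first, second, then
# third position over U, C, A, G (the same order as the table above).
# "*" marks a STOP codon.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")

CODON_TABLE = {"".join(codon): amino
               for codon, amino in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first STOP codon.

    Returns an empty string if the mRNA contains no AUG.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGGCCUGGUAAGG"))  # MAW
```

This sketch takes the first AUG it finds and reads only that frame. Real gene finding has to consider all reading frames and both strands.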

Transcription and Translation

Cell Processes

The activity of organisms is determined by the mechanisms within cells, between cells, and with the environment.

These include

— cell life cycle of reproduction

— repair and maintenance of cell structures

— development of the organism

— transcription, translation, and its regulation

— biosynthesis of compounds for the cell

— metabolism of matter from the environment

— transport of matter

— recovering of energy and matter (catabolism)

Gene Ontology category of biological process has

GO:0008150 : biological_process
    GO:0007610 : behavior
    GO:0000004 : biological_process unknown
    GO:0009987 : cellular process
    GO:0007275 : development
    GO:0008371 : obsolete
    GO:0007582 : physiological processes
    GO:0016032 : viral life cycle

Cell Processes

GO:0009987 : cellular process
    GO:0007154 : cell communication
    GO:0008219 : cell death
    GO:0030154 : cell differentiation
    GO:0008151 : cell growth and/or maintenance
    GO:0006928 : cell motility
    GO:0006944 : membrane fusion

Cell Processes

GO:0007275 : development
    GO:0009838 : abscission
    GO:0007568 : aging
    GO:0030154 : cell differentiation
    GO:0007349 : cellularization
    GO:0009790 : embryonic development
    GO:0009908 : flowering
    GO:0009292 : genetic transfer
    GO:0040007 : growth
    GO:0007320 : insemination
    GO:0002164 : larval development
    GO:0009933 : meristem organization
    GO:0009653 : morphogenesis
    GO:0007389 : pattern specification
    GO:0009791 : post-embryonic development
    GO:0040029 : regulation of gene expression, epigenetic
    GO:0000003 : reproduction
    GO:0009835 : ripening
    GO:0007530 : sex determination
    GO:0007548 : sex differentiation
    GO:0019827 : stem cell maintenance
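In these listings, cell differentiation (GO:0030154) appears under both cellular process and development, so the ontology is a directed acyclic graph and not a tree. The Python sketch below stores a handful of the parent links shown on the slides and collects all ancestors of a term. The identifiers are those of the 2003 snapshot quoted here and may differ in current Gene Ontology releases. The names `PARENTS` and `ancestors` are hypothetical.

```python
# Parent links read off the slide listings (child -> set of parents).
# Only a few terms are included, for illustration.
PARENTS = {
    "GO:0009987": {"GO:0008150"},                # cellular process -> biological_process
    "GO:0007275": {"GO:0008150"},                # development -> biological_process
    "GO:0007154": {"GO:0009987"},                # cell communication -> cellular process
    "GO:0030154": {"GO:0009987", "GO:0007275"},  # cell differentiation has two parents
}


def ancestors(term: str, parents: dict = PARENTS) -> set:
    """Return every term reachable from term by following parent links upward."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


if __name__ == "__main__":
    # Cell differentiation reaches the root through two different parents.
    print(sorted(ancestors("GO:0030154")))
    # ['GO:0007275', 'GO:0008150', 'GO:0009987']
```

Tracking visited terms in `found` keeps the traversal from revisiting the shared root when a term has more than one parent.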

Cell Processes — physiological processes

GO:0046717 : acid secretion
GO:0008218 : bioluminescence
GO:0046849 : bone remodeling
GO:0009758 : carbohydrate utilization
GO:0008151 : cell growth and/or maintenance
GO:0008015 : circulation
GO:0010031 : circumnutation
GO:0000746 : conjugation
GO:0016265 : death
GO:0009900 : dehiscence
GO:0007586 : digestion
GO:0030146 : diuresis
GO:0007588 : excretion
GO:0030198 : extracellular matrix organization and biogenesis
GO:0009844 : germination
GO:0007599 : hemostasis
GO:0042592 : homeostasis
GO:0020021 : host cell immortalization
GO:0020012 : immune evasion
GO:0007595 : lactation
GO:0008152 : metabolism
GO:0042303 : molting cycle
GO:0009877 : nodulation
GO:0009935 : nutrient uptake
GO:0007584 : nutritional response pathway
GO:0018987 : osmoregulation
GO:0007567 : parturition
GO:0009405 : pathogenesis
GO:0030432 : peristalsis
GO:0015979 : photosynthesis
GO:0009856 : post-pollination
GO:0007565 : pregnancy
GO:0007585 : respiratory gaseous exchange
GO:0009719 : response to endogenous stimulus
GO:0009605 : response to external stimulus
GO:0006950 : response to stress
GO:0030431 : sleep
GO:0001659 : thermoregulation

Metabolism — Food to Waste in Animals

Metabolism — Glycolysis

Catabolism of Proteins and Amino Acids

Regulation

Gene Regulatory Networks

Cell Differentiation into Tissues

Protein Structure

Central Dogma

Amino acid sequence

determines protein structure

determines enzyme function

An active site is where the interaction takes place.

Protein Structure

Proteins are polymers of amino acids.

Amino acids are primary amines that contain an alpha carbon that is connected to an amino (NH2) group, a carboxyl group (COOH), and a variable side group (R).

Polymers of amino acids are created by linking an amino group to a carboxyl group on another amino acid.

This is termed a peptide bond.

The 20 Amino Acids

Post-genomics Studies