Doug Raiford
Lesson 1
Biologists and Computer Scientists
Note the word “Scientists”
04/21/23 Introduction 2
Wikipedia
  Computational biology encompasses bioinformatics
  Bioinformatics applies algorithms and statistical techniques to the interpretation, classification and understanding of biological datasets
NCBI
  Bioinformatics: Research, development or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze or visualize such data.
  Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems
For the purposes of this course we are treating the terms as synonymous
It’s All About the Data
Virtually every biological experiment requires a processor and software
Human genetic material comprises about 3 billion base pairs
The sheer volume of data requires computational and storage techniques in order to analyze it
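To give a rough sense of what "3 billion base pairs" means for storage, here is a back-of-the-envelope sketch. It assumes only the four letters A, C, G and T, so each base fits in 2 bits; the function name `genome_bytes` is made up for this illustration, and real file formats carry extra overhead that is ignored here.

```python
# Back-of-the-envelope storage estimate for one genome.
BASE_PAIRS = 3_000_000_000  # approximate figure quoted on the slide
BITS_PER_BASE = 2           # assumption: 4 possible letters -> 2 bits each


def genome_bytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store `base_pairs` bases, rounded up to a whole byte."""
    return (base_pairs * bits_per_base + 7) // 8


packed = genome_bytes(BASE_PAIRS)                    # 2 bits per base
as_text = genome_bytes(BASE_PAIRS, bits_per_base=8)  # one ASCII character per base

print(f"2-bit packed: {packed / 1e6:.0f} MB")
print(f"plain text:   {as_text / 1e9:.0f} GB")
```

Under these assumptions a single genome is about 750 MB packed, or about 3 GB as plain text, before any experimental data is layered on top.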
Can now identify which genes are affected by a disease or treatment
Thousands of genes per experiment
Multiple experiments per time-point
Multiple time-points
Data growing exponentially
Thousands of complete genomes
Each genome results in thousands of experiments
Vast amounts of data
More data coming in daily
Sophisticated computational techniques required
  Clustering
  Searches
  Optimizations
  Data mining
  Pattern recognition
  Classification
A little about me
  Work
  School
Moodle is the primary page
  Weekly schedule
    ▪ When homeworks are due
    ▪ When projects are due
    ▪ Links to quizzes, projects, and homeworks
Instructor website
  Syllabus
  Slides
Bioinformatics: Sequence and Genome Analysis
Beginning Perl for Bioinformatics
3:30 to 5:00 Tuesday and Thursday
Or by appointment
Social Science 412
Phone 406-243-5605
Email
A little about myself
Try to get assignments in on time
  Letter grade for each day late
Component Undergrad Graduate
Homework 10% 8%
Quizzes 25% 21%
Exams (3 of them) 30% 25%
Projects 35% 29%
Grad Project NA 17%
Score     Grade
90 - 100  A
87 - 89   B+
80 - 86   B
77 - 79   C+
70 - 76   C
67 - 69   D+
60 - 66   D
00 - 59   F
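As a worked example of how the weights and the letter scale above combine, here is a minimal sketch. The component scores are hypothetical, and treating the lower bound of each band as the cutoff for fractional scores is an assumption, since the slides do not say how rounding is handled. Weights are kept as whole percentages to avoid floating-point drift.

```python
# Weights from the grading table, as whole percentages.
WEIGHTS = {
    "undergrad": {"homework": 10, "quizzes": 25, "exams": 30, "projects": 35},
    "graduate": {"homework": 8, "quizzes": 21, "exams": 25, "projects": 29,
                 "grad_project": 17},
}

# Lower bound of each band in the letter scale (assumed to be the cutoff).
SCALE = [(90, "A"), (87, "B+"), (80, "B"), (77, "C+"),
         (70, "C"), (67, "D+"), (60, "D"), (0, "F")]


def final_score(track: str, scores: dict) -> float:
    """Weighted average of component scores (each on a 0-100 scale)."""
    weights = WEIGHTS[track]
    return sum(weights[name] * scores[name] for name in weights) / 100


def letter(score: float) -> str:
    """Map a numeric score to the first band whose lower bound it reaches."""
    for lower, grade in SCALE:
        if score >= lower:
            return grade
    return "F"


# Hypothetical student scores, for illustration only.
undergrad = {"homework": 95, "quizzes": 80, "exams": 85, "projects": 90}
graduate = dict(undergrad, grad_project=90)

print(final_score("undergrad", undergrad), letter(final_score("undergrad", undergrad)))
print(final_score("graduate", graduate), letter(final_score("graduate", graduate)))
```

With these made-up scores the undergraduate total is 86.5 (a B) and the graduate total is 87.05 (a B+), which shows how the extra graduate project shifts the weighting.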
Your work in this class needs to be your own
Overly similar work (to that of your classmates or to content from the web) will be considered to be the result of copying
  First offense will result in a zero on the assignment
  Second will be referred to the Dean of Academic Affairs
Student Conduct Code
  http://life.umt.edu/vpsa/student_conduct.php
Let me know of any special needs during this first week
  Letter from Disability Services for Students (DSS)
  Religious observances
  Officially sanctioned, scheduled University extracurricular activity
Opportunity to make up class assignments or other graded assignments
Improve the computer scientist’s understanding of biological systems and problems
Improve the biologist’s understanding of the science of computing and provide the beginnings of a CS skill-set
Four Distinct Audiences
Computer scientists
  All about the algorithms, implementations, programming languages, design, etc.
Biologists
  Mostly just want an introduction to programming
Undergrads
  High-level overview
Graduate Students
  Specific tools and skills that will aid them in research
Computer Scientists    Biologists etc.
Undergrad   Grad       Undergrad   Grad
Undergraduates
  Some algorithms (even implement some)
  New languages: Perl and R
  Introduce programming concepts
  Lots of practice programming (8 projects)
  Lots of guidance from me
Graduate students
  Practice writing a grant (a draft and a final version)
  Practice writing a paper (a draft and a final version)
  Practice using several actual Bio Tools
All
  Team projects
Computer science wise
  Not really anything new
  More of an application of existing techniques
    Dynamic programming techniques
    Hidden Markov Models
    Exploratory data analysis
      ▪ Clustering
      ▪ Multivariate analysis
      ▪ Principal components analysis
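To make "dynamic programming techniques" concrete, here is a minimal sketch of one well-known use: scoring the best global alignment of two sequences, in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, gap -1) and the function name `alignment_score` are assumptions chosen for illustration, not something the course prescribes.

```python
def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of `a` and `b`.

    Each table cell is built from three already-solved subproblems,
    which is the defining pattern of dynamic programming.
    """
    # prev[j]: best score aligning the rows of `a` processed so far with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# ACGT vs AGT: three matches and one gap (A-GT), so the score is 3 - 1 = 2.
print(alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept at a time, which is enough to get the score; recovering the alignment itself would need the full table or a traceback structure.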
Research
  Ph.D. generating publications
Employee in a company
  Drug company
  Genomics lab
Bioinformatician www.simplyhired.com
Techniques that are successful in bioinformatics are the same that are successful in other data-intensive fields
Hunger, need for clean water
Global warming
Disease
Genetically engineered crops
  Disease resistant
  Greater yields
Water treatment
  Genetically engineered microbes
    ▪ Sewage treatment and purification
    ▪ Clean oil spills
Plants consume CO2 and release O2
But the carbon is released back into the atmosphere over a period of time
Genetically engineered plants could convert it into a stable form
Genetically enhanced microbes convert CO2 back to fuel
  Methanococcus jannaschii
  Takes CO2 and converts it to methane
Test for increased risk of certain cancers
Personalize medicine
  Leukemia
    ▪ Genetic profile resistant to certain chemotherapy
  Increased risk of drug reactions
Many drugs bind to protein active sites
Computational techniques for predicting drug performance
Actually alter our genetic code to treat a genetic disorder
Or simply add a disembodied gene to our complement
What does it have to do with informatics?
Where do computer scientists fit in this picture?
Role of computers and computer scientists
Why biologists would attend
CS types good at the data analysis
Must understand what the data means
Don’t know what to look for—what questions to ask
Don’t speak the lingo
Haploid, Hypertonic, Hypotonic, Erythematous, Cilia, Cell membrane, Nucleus, Lytic cycle, Gene, Biotic factors, Nulliparity, Hyperosmotic, Natural selection, Fluid mosaic model, Solute, Homologous chromosome, Ribosome, Mitochondria, Diffusion, Leucocytes, Photosynthesis, Genetic variation, Organism, Plasma membrane, Cytoplasm, Wagners disease, Meiosis, Habitat, Diploid, Cell, Youpon, Concentration gradient, Ecosystem, Homeostasis, Mitosis, Osmosis, Allele, Enzyme, Autotrophic, Egestion, Mitochondrion, Gamete, Organisms, Nucleotide, Amino-acyl, Gene expression, Point mutation, Duplication event
Biologists understand the data
Don’t know how to formulate the problem in CS terms
Don’t know what magic the CS types can bring to the table
Don’t speak the lingo
Acyclic graph, Heap sort, Huffman coding, Adjacency-matrix, Admissible vertex, Abstract data type, Algorithm, All pairs shortest path, Euclidean distance, Hash, Tree, Linked list, Heap, Complexity analysis, Recursion, Dynamic programming, Graph, Hamiltonian path, Heuristic, Hidden Markov Model, Principal components analysis, Isomorphic, Simplex algorithm, Mahalanobis distance, Discrete event simulation, NP-complete, Big O, Optimization problem, Polymorphism, Polynomial time, Clustering, Classifying, Stack, Queue, Stochastic modeling, Tail recursion, Binary tree, Self organizing map, Shortest common string, Minimum spanning tree, Singular matrix, Trie, Vertex cover
Won’t be a full-fledged bioinformatician
Will be able to contribute given close guidance, practice, and continued training
Determine problem to be solved given data
Determine which tool to utilize
Manually format data for input to tool
  Might involve data retrieval if utilizing repository data
Run tool
Analyze results
Biologists perform all steps
Determine problem to be solved given data
Develop algorithmic approach
Implement algorithm (write code)
Format data for input to algorithm
  Might involve data retrieval if utilizing repository data
Run code
Analyze results
Computer Scientist
Biologist
Biologist
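The "format data, run code, analyze results" steps above can be sketched as a tiny pipeline. Everything specific here is an assumption made for illustration: the input is FASTA-style text, the "algorithm" is a simple GC-content calculation, and the gene names and sequences are invented.

```python
def format_input(raw: str) -> dict[str, str]:
    """Format data: parse FASTA-style text into {name: sequence}."""
    records: dict[str, str] = {}
    name = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word after '>' is the name
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def run(records: dict[str, str]) -> dict[str, float]:
    """Run code: fraction of G and C bases in each non-empty sequence."""
    return {name: (seq.count("G") + seq.count("C")) / len(seq)
            for name, seq in records.items() if seq}


def analyze(results: dict[str, float]) -> list[str]:
    """Analyze results: rank sequence names by GC fraction, highest first."""
    return sorted(results, key=results.get, reverse=True)


# Invented sample input, for illustration only.
RAW = """>geneA hypothetical
ATGCGC
GGCC
>geneB hypothetical
ATATAT
AT
"""

records = format_input(RAW)
gc = run(records)
ranking = analyze(gc)
print(ranking, gc)
```

Keeping each step in its own function mirrors the division of labor on the slide: the formatting and analysis steps can change with the data source or the biological question without touching the algorithm in the middle.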
CS types
  Provide beginnings of a biology background
  Introduce some existing tools, sources of data, and analysis techniques
Biologists
  Introduce some existing tools, sources of data, and analysis techniques
  Provide some programming essentials