Bioinformatics in the Department of Computer Science
Lenwood S. Heath
Department of Computer Science
Blacksburg, VA 24061
College of Engineering
Northern Virginia Engineering Showcase
March 5, 2004
Bioinformatics Faculty
Cliff Shaffer
Adrian Sandu
Alexey Onufriev
Lenny Heath
T. M. Murali
Naren Ramakrishnan
Eunice Santos
Layne Watson
Roger Ehrich
Chris North
Joao Setubal, CS and VBI
3/5/2004 Bioinformatics in Computer Science
Relevant Expertise
• Algorithms: Heath, Santos, Setubal, Shaffer, Watson
• Computational structural biology: Onufriev, Sandu
• Computational systems biology: Murali
• Data mining: Ramakrishnan
• Genomics: Heath, Murali, Ramakrishnan
• Human-computer interaction, visualization: North
• Image processing: Ehrich, Watson
• High performance computing: Sandu, Santos, Watson
• Numerical analysis: Onufriev, Watson
• Optimization: Watson
• Problem solving environments: Ramakrishnan, Shaffer
Selected Collaborations
• Virginia Tech: Biochemistry, Biology, Fralin Biotechnology Center, Plant Physiology, Veterinary Medicine, Virginia Bioinformatics Institute (VBI), Wood Science
• North Carolina State University: Forest Biotechnology Center
• Duke: Biology
• University of Illinois: Plant Biology
Selected Funding
• NSF IBN 0219322: ITR: Understanding Stress Resistance Mechanisms in Plants: Multimodal Models Integrating Experimental Data, Databases, and the Literature. L. S. Heath; R. Grene, B. I. Chevone, N. Ramakrishnan, L. T. Watson. $499,973.
• NSF EIA-01903660: A Microarray Experiment Management System. N. Ramakrishnan, L. S. Heath, L. T. Watson, R. Grene, J. W. Weller (VBI). $600,000.
• DARPA N00014-01-1-0852: Dryophile Genes to Engineer Stasis-Recovery of Human Cells. M. Potts, L. S. Heath, R. F. Helm, N. Ramakrishnan, T. O. Sitz, F. Bloom, P. Price (Life Technologies), J. Battista (LSU). $4,532,622.
• NSF MCB-0083315: Biocomplexity - Incubation Activity: A Collaborative Problem Solving Environment for Computational Modeling of Eukaryotic Cell Cycle Controls. J. J. Tyson, L. T. Watson, N. Ramakrishnan, C. A. Shaffer, J. C. Sible. $99,965.
• NIH 1 R01 GM64339-01: Problem Solving Environment for Modeling the Cell Cycle. J. J. Tyson, J. Sible, K. Chen, L. T. Watson, C. A. Shaffer, N. Ramakrishnan, P. Mendes (VBI). $211,038.
• Air Force Research Laboratory F30602-01-2-0572: The Eukaryotic Cell Cycle as a Test Case for Modeling Cellular Regulation in a Collaborative Problem Solving Environment. J. J. Tyson, J. C. Sible, K. C. Chen, L. T. Watson, C. A. Shaffer, N. Ramakrishnan. $1,650,000.
Research Resources
System X
• Third fastest computer on the planet
Laboratory for Advanced Scientific Computing & Applications (LASCA)
• Parallel algorithms & math software
• Anantham Cluster
• Grid computing
Bioinformatics Research LAN
• Linux, Mac OS X, Windows
• Bioinformatics databases and analysis
JigCell: A PSE for Eukaryotic Cell Cycle Controls
Marc Vass, Nick Allen, Jason Zwolak, Dan Moisa,
Clifford A. Shaffer, Layne T. Watson,
Naren Ramakrishnan, and John J. Tyson
Departments of Computer Science and Biology
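Computational Molecular Biology
[Figure: levels of description, from DNA to cell physiology]
• DNA: …TACCCGATGGCGAAATGC…
• mRNA: …AUGGGCUACCGCUUUACG…
• Protein: …Met - Gly - Tyr - Arg - Phe - Thr…
• Enzyme: ATP, ADP, -P (phosphorylation)
• Reaction network: X, Y, Z; enzymes E1, E2, E3, E4
• Cell physiology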
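The DNA, mRNA, and protein strings on this slide are consistent with one another, which a few lines of Python can confirm. This is a minimal sketch under stated assumptions: the DNA string is treated as the template strand and read left to right exactly as displayed (strand orientation is ignored), the function names `transcribe` and `translate` are hypothetical, and the codon table holds only the six codons this example needs, taken from the standard genetic code.

```python
# Base pairing used when an mRNA is built from a DNA template strand.
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Subset of the standard genetic code: only the codons in the slide's example.
CODON_TABLE = {
    "AUG": "Met",
    "GGC": "Gly",
    "UAC": "Tyr",
    "CGC": "Arg",
    "UUU": "Phe",
    "ACG": "Thr",
}


def transcribe(template_dna):
    """Return the mRNA complementary to a DNA template, position by position."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template_dna)


def translate(mrna):
    """Split an mRNA into codons (dropping any trailing partial codon) and look each one up."""
    usable = len(mrna) - len(mrna) % 3
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[codon] for codon in codons]


dna = "TACCCGATGGCGAAATGC"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)
print(" - ".join(protein))
```

A full implementation would carry all 64 codons and handle stop codons; the reduced table keeps the sketch short.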
Cell Cycle of Budding Yeast
[Figure: wiring diagram of cell cycle regulation in budding yeast. Components shown: Cln2, Cln3, Bck2, Clb2, Clb5, Sic1, SCF, Swi5, Mcm1, MBF, SBF, Cdc20, Cdh1, APC, PPX, Esp1, Pds1, Net1, Net1P, Cdc14, RENT, Cdc15, Tem1, Bub2, Lte1, Mad2, CDKs. Labeled events: growth, budding, DNA synthesis, unaligned chromosomes, sister chromatid separation.]
JigCell Problem-Solving Environment
• Experimental Database
• Wiring Diagram
• Differential Equations
• Parameter Values
• Analysis
• Simulation
• Visualization
• Automatic Parameter Estimation
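The step from differential equations and parameter values to simulation can be illustrated with a toy model. This is a minimal sketch and not JigCell code: the one-variable synthesis and degradation equation dX/dt = k_syn - k_deg * X, the parameter names `k_syn` and `k_deg`, and the forward Euler integrator are all assumptions chosen for illustration.

```python
def simulate(k_syn, k_deg, x0, dt, steps):
    """Integrate dX/dt = k_syn - k_deg * X with forward Euler; return the trajectory."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * (k_syn - k_deg * x)
        trajectory.append(x)
    return trajectory


# Hypothetical parameter values; the analytic steady state is k_syn / k_deg.
k_syn, k_deg = 2.0, 0.5
trajectory = simulate(k_syn, k_deg, x0=0.0, dt=0.01, steps=5000)
steady_state = k_syn / k_deg
print(f"simulated: {trajectory[-1]:.4f}, analytic steady state: {steady_state:.4f}")
```

Comparing a simulated trajectory with a known or measured value is the same kind of check that parameter estimation automates, except that there the parameters are adjusted until simulation and experiment agree.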
Why do these calculations?
• Is the model “yeast-shaped”?
• Bioinformatics role: the model organizes experimental information.
• New science: prediction, insight
JigCell is part of the DARPA BioSPICE suite of software tools for computational cell biology.
Expresso: A Next Generation Software System for Microarray Experiment Management and Data Analysis
Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis
• Integration of design, experimentation, and analysis
• Data mining; inductive logic programming (ILP)
• Closing the loop
• Drought stress experiments with pine trees and Arabidopsis
Scenarios for Effects of Abiotic Stress on Gene Expression in Plants
Data Mining with ILP
• ILP (inductive logic programming) is a data mining technique for inferring relationships or rules.
• ILP groups related data and prefers relationships that have short descriptions.
• ILP can also flexibly incorporate a priori biological knowledge (e.g., categories and alternate classifications).
• Hybrid reasoning and information integration: “Is there a relationship between genes in a given functional category and genes in a particular expression cluster?” ILP mines this information in a single step.
Rule Inference in ILP
• Infers rules relating gene expression levels to categories, both within a probe pair and across probe pairs, without explicit direction
• Example Rule:
[Rule 142] [Pos cover = 69 Neg cover = 3]
level(A,moist_vs_severe,not positive) :- level(A,moist_vs_mild,positive).
• Interpretation:
“If the moist versus mild stress comparison was positive for some clone named A, it was negative or unchanged in the moist versus severe comparison for A, with a confidence of 95.8%.”
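The 95.8% figure follows from the cover counts reported for Rule 142: 69 examples where the rule holds out of 69 + 3 examples where its condition applies. A minimal sketch of that arithmetic, where the function name `rule_confidence` is hypothetical and confidence is assumed to mean positive cover divided by total cover:

```python
def rule_confidence(pos_cover, neg_cover):
    """Fraction of covered examples for which the rule's conclusion holds."""
    total = pos_cover + neg_cover
    if total == 0:
        raise ValueError("rule covers no examples")
    return pos_cover / total


# Cover counts reported for Rule 142 on the slide.
confidence = rule_confidence(pos_cover=69, neg_cover=3)
print(f"{confidence:.1%}")  # prints 95.8%
```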
ILP in the Expresso Pipeline
Expresso is a next-generation software system for microarray experiments that provides a database interface to ILP functionality.
Status of Expresso
• Capabilities
– Data capture and storage
– Statistical analysis
– Data mining by ILP
– Microarray experiment design (GeneSieve)
– Expresso-assisted experiment composition
– Closing the experimental loop
• Successful microarray experiment analysis
– Pine, Norway spruce, yeast, Deinococcus radiodurans (an extremophile microorganism), human cell lines
• Planned microarray experiment analysis
– Potato, Arabidopsis thaliana, tomato, rice, corn
Networks in Bioinformatics
• Mathematical Model(s) for Biological Networks
• Representation: What biological entities and parameters to represent and at what level of granularity?
• Operations and Computations: What manipulations and transformations are supported?
• Presentation: How can biologists visualize and explore networks?
Reconciling Networks
Munnik and Meijer, FEBS Letters, 2001
Shinozaki and Yamaguchi-Shinozaki, Current Opinion in Plant Biology, 2000
Multimodal Networks
• Nodes and edges have flexible semantics to represent:
- Time
- Uncertainty
- Cellular decision making; process regulation
- Cell topology and compartmentalization
- Rate constants
- Phylogeny
• Hierarchical
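One way to picture nodes and edges with flexible semantics is a graph whose nodes and edges each carry an open-ended attribute dictionary, with child nodes providing the hierarchy. This is a hedged sketch only, not the project's actual data model: the class names, the method names, and the example node and attribute names (`drought_response`, `rate_constant`, `confidence`, `delay_minutes`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    name: str
    attrs: Dict[str, Any] = field(default_factory=dict)    # e.g. compartment, phylogeny
    children: List["Node"] = field(default_factory=list)   # hierarchical nesting


@dataclass
class Edge:
    source: str
    target: str
    attrs: Dict[str, Any] = field(default_factory=dict)    # e.g. time, uncertainty, rate


@dataclass
class MultimodalNetwork:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, name: str, **attrs: Any) -> Node:
        node = Node(name, dict(attrs))
        self.nodes[name] = node
        return node

    def add_edge(self, source: str, target: str, **attrs: Any) -> Edge:
        # Refuse edges whose endpoints have not been declared as nodes.
        for endpoint in (source, target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        edge = Edge(source, target, dict(attrs))
        self.edges.append(edge)
        return edge

    def edges_with(self, key: str) -> List[Edge]:
        """Return the edges that carry a given kind of annotation."""
        return [edge for edge in self.edges if key in edge.attrs]


net = MultimodalNetwork()
pathway = net.add_node("drought_response", compartment="cell")
sensor = net.add_node("sensor", compartment="membrane")
gene = net.add_node("target_gene", compartment="nucleus")
pathway.children.extend([sensor, gene])
net.add_edge("sensor", "target_gene", kind="activates",
             rate_constant=0.3, confidence=0.8, delay_minutes=5)
print(len(net.edges_with("rate_constant")))
```

Keeping the semantics in attribute dictionaries means a new kind of annotation can be attached without changing the classes, at the cost of no static checking of attribute names.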
Using Multimodal Networks
• Help biologists find new biological knowledge
• Visualize and explore
• Generate hypotheses and experiments
• Predict regulatory phenomena
• Predict responses to stress
• Incorporate into Expresso as part of closing the loop
Conclusions
• Engaged faculty with the right expertise
• Numerous life science collaborations
• Federal research funding
• First-class computational resources
• A variety of cutting-edge bioinformatics research projects
Bioinformatics Education
• Courses in Computer Science
• Courses in the Life Sciences
• Bioinformatics Option
• Doctoral Program in Genetics, Bioinformatics, and Computational Biology
Doctoral Program in Genetics, Bioinformatics, and Computational Biology
Multidisciplinary: biology, biochemistry, crop science, plant physiology, computer science, mathematics, statistics, veterinary medicine
Anantham Cluster
Previous cluster specs
• 200 AMD 1 GHz processors
• 1 GB RAM per processor
• 2 TB disk space
• 2.56 Gb/s Myrinet network
[Photo: previous 200-processor cluster]