Ju Han Kim, M.D., Ph.D., M.S.
SNU Biomedical Informatics
Seoul Nat'l Univ. School of Medicine
http://www.snubi.org/
Databasing Expression with Integrative Biochip Informatics
Databasing Gene Expression
• Bio-databases
• Microarray basics
• Do we really need databases for expression?
• An example for pharmacogenomics approach
• Public repositories for expression
• Relational vs. Object Oriented Models
• OM-MAGEML
• The reality
• Integrative biochip informatics, coming soon…
Bio-databases
• PIR: bio-sequences in the 60’s by M. Dayhoff
• NAR review: major 500 bio-databases
• Primary – Secondary -….
• Standardization
• Integration
• Intelligence
Paradigm Shift - Clinical Knowledge Management -
[Figure: clinician-directed versus resource-directed knowledge management. Labels: Dr. Abraham, Dr. Elson, Dr. Faughnan, Dr. Dandy, Dr. Connelly, Dr. Belsky, Informatics.]
The Central Dogma of Life
DNA → RNA → Protein
Bio-databases: what are the problems?
• Heterogeneity: data types and forms
• Complexity
• Only loosely connected
• Noise & quality
• Level of granularity
• Curation
• Is modelling even possible?
• In-silico experiments are only possible with models
Clinical knowledge engineering: 3-D Visualization of Gene Expression
Streicher J, et al., Nature Genetics 2000;25:147-52
Clinical knowledge engineering: Future queries
Database Structure
Streicher J, et al., Nature Genetics 2000;25:147-52
Biochip basics
Bioinformatics pipeline
Biochip, Core competency
• They are the genes!
• We have the map!
• Natural measure of quantification.
• Literally, INFINITE # of states
• Dynamic series on time & space
• Don't need to extract bio-molecules.
• Now systemic perturbations!
Put an abstraction barrier
Streamlining & automation of the process
Do biology in silico! Say NO to lab bench!
A Functional Genomics Strategy
• Interesting patients, interesting animals, interesting cell lines
• Appropriate tissue, appropriate conditions
• Extract RNA
• Make biochip
• Hybridize biochip
• Scan biochip
• Data pre-processing
• Assess significance
• Functional clustering
• Post-cluster analysis & integration
• Biological validation; informatical validation?
Biochip informatics: clustering
[Expression matrix; axis label: time]
A11 A21 A31 A41 A51 A61 A71 A81 A91
A12 A22 A32 A42 A52 A62 A72 A82 A92
A13 A23 A33 A43 A53 A63 A73 A83 A93
A14 A24 A34 A44 A54 A64 A74 A84 A94
A15 A25 A35 A45 A55 A65 A75 A85 A95
A16 A26 A36 A46 A56 A66 A76 A86 A96
Hierarchical & Partitional Clustering
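
As an illustration of the hierarchical side only, here is a minimal sketch of average-linkage agglomerative clustering over expression profiles, using 1 minus the Pearson correlation as the distance. The gene names and values are made up for the example, and the choice of linkage and distance is an assumption, not something the slides specify.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0  # flat profile: treat it as uncorrelated
    return 1.0 - cov / (sd_x * sd_y)


def hierarchical_clusters(profiles, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    names = list(profiles)
    dist = {
        (a, b): pearson_distance(profiles[a], profiles[b])
        for a in names
        for b in names
        if a != b
    }

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical time-series profiles: two rising genes, two falling genes.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.1],
}
clusters = hierarchical_clusters(profiles, 2)
print(clusters)
```

A partitional method such as k-means would instead fix the number of clusters up front and reassign profiles iteratively; it is not shown here.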
Clinical relevance of Biochip informatics
Dx. (diagnosis), Px. (prognosis), Tx. (treatment), Discovery
Do we really need databases for expression?
• Standard practice for gene sequences is..
• How can we verify data integrity without the data?
• Are authors' interpretations sufficient?
• It's a whole new type of data.
• Observational vs. experimental
• Enormous potential for the public good
• Where to house the data?
• In what format?
• and.. what's next?
With a single format for gene expression data, databases
should be able to 'talk' to one another and exchange data.
The existence of a standard language should also spur
development of software tools to query the databases,
and to manage and display gene expression data.
An example for pharmacogenomics
Expression data repository projects
• Public repositories in the making:
  • GEO: NCBI
  • GeneX: NCGR
  • ArrayExpress: EBI
• In-house databases: Stanford, MIT, U. Pennsylvania
• Organism-specific databases: mouse (Jackson)
Database Model: Relational vs. Object-Oriented (or Frame-based)
MIAME Model
RAD Model
Stanford Microarray Database (SMD): Relational
• Ontology for samples
• Gene index for genes
• Annotation for Exp.
ArrayDB
MGED participants including
• Affymetrix
• Berkeley
• DDBJ
• DKFZ
• EMBL
• Gene Logic
• Incyte
• Max Planck Institute
• NCBI
• NCGR
• NHGRI
• Sanger Centre
• Stanford
• Uni Pennsylvania
• Uni Washington
• Whitehead Institute
Reporting a Microarray Experiment
• Purpose of Study
• Experimental Details
• Experimental Description
• Experimental Data
• Image Files
• Data Files
A Standard Is Needed to Describe a Microarray Experiment
Goals of a Microarray Data Standard
• Upload and Retrieval from Public Repositories
• Data Exchange Between Portals
• Standard Data Access and Exchange Format
• Encapsulate Data and Experiment Description
• Rationale for Experimental Study
• Experimental Details
• Experimental Data
• Widespread Industry Support
Microarray Standards
• MIAME
  • Minimum Information About a Microarray Experiment
  • Experimental Design, Array Design, Hybridization, Samples, Measurements and Normalization
• MAGE-ML
  • XML Implementation of the MIAME Standard
  • Formed Via Merge of MAML and GEML Standards
  • De Facto Widespread Industry Support
Conceptual view of gene expression data.
Three parts of gene expression DB
• Gene annotation – may be given as links to gene sequence databases
• Sample annotation – there currently are no public external databases (except the species taxonomy)
• Gene expression matrix – each position contains information characterizing the expression of a particular gene in a particular sample. What are the measurement units for gene expression levels?
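
One way to picture these three parts together is a small container type. The following is a minimal sketch, not a published schema: the class name, field names, identifiers, and the "log2 ratio" unit are all assumptions made for illustration. The `unit` field exists only to make the open question about measurement units explicit.

```python
from dataclasses import dataclass


@dataclass
class ExpressionDataset:
    """Hypothetical container for the three parts of an expression database."""

    gene_ids: list           # row labels of the matrix
    sample_ids: list         # column labels of the matrix
    matrix: list             # matrix[i][j]: expression of gene i in sample j
    gene_annotation: dict    # gene id -> link to a sequence database record
    sample_annotation: dict  # sample id -> free-text description
    unit: str                # measurement unit, stated explicitly

    def __post_init__(self):
        if len(self.matrix) != len(self.gene_ids):
            raise ValueError("one matrix row is needed per gene")
        if any(len(row) != len(self.sample_ids) for row in self.matrix):
            raise ValueError("one matrix column is needed per sample")

    def value(self, gene_id, sample_id):
        """Look up one cell of the gene expression matrix."""
        row = self.gene_ids.index(gene_id)
        col = self.sample_ids.index(sample_id)
        return self.matrix[row][col]


# All identifiers and values below are invented for the example.
dataset = ExpressionDataset(
    gene_ids=["g1", "g2"],
    sample_ids=["s1", "s2"],
    matrix=[[1.5, 0.2], [-0.4, 2.1]],
    gene_annotation={"g1": "seqdb:EXAMPLE_0001", "g2": "seqdb:EXAMPLE_0002"},
    sample_annotation={"s1": "untreated control", "s2": "treated, 24 h"},
    unit="log2 ratio",
)
print(dataset.value("g2", "s1"), dataset.unit)
```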
General principles of MIAME design
• The recorded information about each experiment should be sufficient to interpret the experiment, and detailed enough to enable comparisons with similar experiments and to permit replication
• The information should be structured in a way that enables useful querying as well as automated data analysis and mining
MIAME structure
I. Array design
II. Experiment design
1. Experimental design
2. Samples used, extract preparation and labeling
3. Hybridization procedures and parameters
4. Measurement data and specifications of data processing
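
To make the structure above concrete, here is a minimal sketch of a completeness check over a submission record. The section keys mirror the outline on this slide, but the key names and the record layout are assumptions for illustration; they are not taken from the MIAME document itself.

```python
# Section keys follow the slide's outline; the names are illustrative only.
MIAME_SECTIONS = (
    "array_design",
    "experimental_design",
    "samples",
    "hybridization",
    "measurements",
)


def missing_sections(record):
    """Return the sections that are absent or empty in a submission record."""
    return [section for section in MIAME_SECTIONS if not record.get(section)]


# A hypothetical, partially filled submission.
record = {
    "array_design": "description of the array layout",
    "experimental_design": "time course, 4 time points",
    "samples": "extract preparation and labeling notes",
    "hybridization": "",
}
missing = missing_sections(record)
print(missing)
```

A repository could run a check of this kind at upload time and reject or flag records that leave a section empty.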
Microarray Information to be Captured
Six components of microarray experiment.
Three levels of microarray gene expression data processing.
MAGE-OM
• Framework for Developing MAGE-ML
• OMG specifications are developed in UML
• MAGE-OM represents a data-driven model of microarray experiments
• This model can be used to automatically generate an XML DTD

UML (Unified Modeling Language)
• Standard object-oriented design language
• Methods for showing relationships between data objects
• Objects are boxes (things)
• Associations between objects are indicated by lines
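
The idea that an object model can drive DTD generation can be sketched with a toy example. This is not the actual MAGE-OM to MAGE-ML mapping: the model dictionary, its attribute names, and the generation rules below are simplifying assumptions chosen only to show the principle of deriving element and attribute declarations from classes and their associations.

```python
def generate_dtd(model):
    """Emit DTD declarations from a toy object model.

    Each class becomes an element; each association to another class becomes
    a repeatable child element; each attribute becomes an optional CDATA
    attribute. These rules are illustrative, not the MAGE-ML rules.
    """
    lines = []
    for class_name, spec in model.items():
        children = spec.get("children", [])
        if children:
            content = "(" + ", ".join(child + "*" for child in children) + ")"
        else:
            content = "EMPTY"
        lines.append(f"<!ELEMENT {class_name} {content}>")
        for attribute in spec.get("attributes", []):
            lines.append(f"<!ATTLIST {class_name} {attribute} CDATA #IMPLIED>")
    return "\n".join(lines)


# Hypothetical two-class model; attribute names are made up for the example.
toy_model = {
    "ArrayDesign": {"attributes": ["identifier", "name"], "children": ["DesignElement"]},
    "DesignElement": {"attributes": ["identifier"], "children": []},
}
dtd = generate_dtd(toy_model)
print(dtd)
```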
Main Packages
• Biosequence
• Quantitation Type
• ArrayDesign
• DesignElement
• Array
• BioMaterial
• BioAssay
• BioAssayData
• Experiment
• HigherLevelAnalysis
• Protocol
• Description
• Audit and Security
• Measurement
• BioEvent
Mapping from MAGE-OM to MAGE-ML
Generated DTD
Class Diagram of ArrayDesign
Information Available Online
• Current DTD (Document Type Definition)
• A few sample data sets

MAGE-ML: Microarray Data Exchange Formats
• Platform for moving data between data generators and shared databases
• International format for communicating data from databases to third-party applications
• Supports MIAME-compliant data
MAML(Microarray Markup Language)
• standard / vocabulary
• communication / msg
• ontology
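RDB implementation of MAGE-ML
[Figure: a client on the Internet exchanges XML with a web server, which is backed by an RDBMS.]
• Data in the relational database is converted to XML before being connected to clients.
• The XML object serves as the standard technology for exchanging data between the database and application programs.
→ Fewer parts depend directly on the database, so the burden of updates is reduced.
→ XML resources can be used worldwide without being constrained by platform or programming language.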
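
The conversion from relational rows to XML can be sketched as follows. This is a minimal illustration under stated assumptions: Python's built-in `sqlite3` stands in for the RDBMS, and the table layout and element names (`ExpressionData`, `Measurement`) are hypothetical and are not MAGE-ML.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn):
    """Serialize every row of the (hypothetical) expression table as XML."""
    root = ET.Element("ExpressionData")
    query = "SELECT gene, sample, value FROM expression ORDER BY gene, sample"
    for gene, sample, value in conn.execute(query):
        measurement = ET.SubElement(root, "Measurement", gene=gene, sample=sample)
        measurement.text = str(value)
    return ET.tostring(root, encoding="unicode")


# In-memory database with two made-up measurements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (gene TEXT, sample TEXT, value REAL)")
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?)",
    [("g1", "s1", 1.5), ("g2", "s1", -0.4)],
)
xml_text = rows_to_xml(conn)
print(xml_text)
```

Because clients only ever see the XML, the table layout can change without breaking them, as long as the serializer keeps producing the same document shape.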
<Overview of data transfer structure>
Middleware Implementation in Database
[Figure: Client → (http) → Middleware: JSP, a Servlet that processes the DOM, XML file → (JDBC) → Database (MySQL)]
Transforming the cancer center
Goals of National Center for Toxicogenomics
Minimum information to be recorded about toxicogenomics experiments
• Experimental design parameters, animal husbandry information or cell line and culture information, exposure parameters, dosing regimen, dose groups, and in-life observations.
• Microarray data, specifying the number and details of replicate array bioassays associated with particular samples, and including PCR transcript analysis if available.
• Numerical biological endpoint data, including necropsy weights or cell counts and doubling times, clinical chemistry and enzyme assays, hematology, urinalysis, other.
• Textual endpoint information such as gross observations, pathology and microscopy findings.
ArrayTrack
[Figure: ArrayTrack components. Microarray Database; libraries (GeneLib, ProteinLib, PathwayLib, ToxicantLib); Tools (public, commercial, in-house); functional components; Interface; User.]
Development of a Toxicogenomic Supportive Database: dbZach
Lyle D. Burgoon*
Department of Pharmacology & Toxicology
Institute For Environmental Toxicology and The National Food Safety & Toxicology Center
Michigan State University
tel: (517) 353-1944
fax: (517) 353-9334
e-mail: [email protected]
dbzach.fst.msu.edu
Research Program Supported by: National Institutes of Health P42 ES 04911-12, *T32 ES07255
IBM/Mayo Clinic Collaboration
Applied Genomics Data Analysis
[Figure: Phase I. Labels: genomic data (DNA), GeneChip array data (RNA), protein data; clinical data (signs, symptoms, laboratory, radiology, etc.); databases (genome, proteome, disease, tumors, drugs); optimized, individualized healthcare.]
PsyBase 1.0
PsyBase 1.0, Korea's first electronic medical record, came into use in 1995 at the Department of Neuropsychiatry, Seoul National University Hospital.
34
http://www.snubi.org/
http://www.snubi.org/
35
Streamlining the process
[Figure: a data management layer holding clone data, slide data, cell data, hyb. data, exp. data, scan data, in-house data, and outside data, serving array fabrication, image analysis, cluster analysis, data mining, and pathway/network analysis.]
• Miniaturization & streamlining the process
Integrated biochip informatics
[Figure: the data management layer (clone, slide, cell, hyb., exp., scan, in-house, and outside data) with array fabrication, image analysis, cluster analysis, data mining, and pathway/network analysis, extended with: CIS; communication; ontology; literature and factual database mining; Rosetta transcriptomics; Ideker, 2001, Science; chemoinformatics.]
Xperanto: Expressionist’s Esperanto in XML
[Figure: Xperanto architecture. Labels: MAGE-OM; DTD or Schema (generated from MAGE-OM); data entry by HTML; XML validation; MAGE-ML; translation to/from XML; MAGEstk (Java API); MGED Ontology; relational DB; RDB implementation; Internet; XML expression repository.]
1. Input control
2. R-DB construction
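
The two steps named here, input control followed by relational database construction, can be sketched in a few lines. This is an illustration under stated assumptions, not the Xperanto implementation: the element and attribute names are hypothetical, `sqlite3` stands in for the relational database, and a simple required-attribute check stands in for validation against a DTD or schema, which Python's standard library does not provide.

```python
import sqlite3
import xml.etree.ElementTree as ET

REQUIRED_ATTRIBUTES = ("gene", "sample")  # hypothetical input-control rule


def load_xml(xml_text, conn):
    """Check an XML document, then store its measurements in a table."""
    root = ET.fromstring(xml_text)
    rows = []
    for measurement in root.iter("Measurement"):
        # Step 1, input control: refuse records with missing attributes.
        missing = [a for a in REQUIRED_ATTRIBUTES if a not in measurement.attrib]
        if missing:
            raise ValueError(f"Measurement is missing attributes: {missing}")
        rows.append(
            (measurement.get("gene"), measurement.get("sample"), float(measurement.text))
        )
    # Step 2, R-DB construction: create the table and insert the rows.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS expression (gene TEXT, sample TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)
    return len(rows)


document = (
    "<ExpressionData>"
    '<Measurement gene="g1" sample="s1">1.5</Measurement>'
    '<Measurement gene="g2" sample="s1">-0.4</Measurement>'
    "</ExpressionData>"
)
conn = sqlite3.connect(":memory:")
loaded = load_xml(document, conn)
print(loaded, conn.execute("SELECT COUNT(*) FROM expression").fetchone()[0])
```

Rejecting a malformed document before any insert keeps the repository consistent: either every measurement in a submission is stored, or none is.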
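[Figure: Xperanto architecture with modeling layers. Labels: MAGE-OM; DTD or Schema (generated from MAGE-OM); data entry by HTML; XML validation; MAGE-ML; MAGEstk (Java API); MGED Ontology; relational DB; RDB implementation; Internet; XML expression repository; sequence information modeling; expression information modeling; clinical information modeling.]
Open Source & XML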
Thank you!