Ju Han Kim, M.D., Ph.D., M.S.
SNU Biomedical Informatics
Seoul National University School of Medicine
http://www.snubi.org/

Databasing Expression with Integrative Biochip Informatics

Databasing Gene Expression

• Bio-databases
• Microarray basics
• Do we really need databases for expression?
• An example of a pharmacogenomics approach
• Public repositories for expression
• Relational vs. object-oriented models
• OM-MAGEML
• The reality
• Integrative biochip informatics, coming soon…

Bio-databases

• PIR: protein sequences collected from the 1960s by M. Dayhoff

• NAR database review: 500 major bio-databases

• Primary, secondary, … databases

• Standardization

• Integration

• Intelligence

Paradigm Shift: Clinical Knowledge Management

Figure: clinician-directed vs. resource-directed knowledge management; the diagram names Dr. Abraham, Dr. Elson, Dr. Faughnan, Dr. Dandy, Dr. Connelly and Dr. Belsky, plus Informatics.

The Central Dogma of Life: DNA → RNA → Protein

Bio-databases: what are the problems?

• Heterogeneity of data types and forms
• Complexity
• Only loosely connected
• Noise & quality
• Level of granularity
• Curation
• Is modelling even possible?
• In-silico experiments are only possible with models

Clinical knowledge engineering: 3-D Visualization of Gene Expression

Streicher J, et al., Nature Genetics 2000;25:147-52

Clinical knowledge engineering: Future queries

Database Structure

Streicher J, et al., Nature Genetics 2000;25:147-52

Biochip basics

Bioinformatics pipeline

Biochip: core competency
• They are the genes!
• We have the map!
• Natural measure of quantification
• Literally an infinite number of states
• Dynamic series over time & space
• No need to extract bio-molecules
• Now systemic perturbations!

Put up an abstraction barrier: streamlining & automation of the process

Do biology in silico! Say NO to the lab bench!

A Functional Genomics Strategy
1. Interesting patients, interesting animals or interesting cell lines; appropriate tissue and appropriate conditions
2. Extract RNA
3. Make biochip
4. Hybridize biochip
5. Scan biochip
6. Data pre-processing
7. Assess significance
8. Functional clustering
9. Post-cluster analysis & integration
10. Biological validation; informatical validation (?)
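The data pre-processing step is not specified on the slide. As one common illustration, assuming a two-channel (red/green) spotted array with background-subtracted intensities, the sketch below turns each spot into a log2 ratio and then applies global median normalization, on the assumption that most genes are unchanged between the two channels. The function name and input format are hypothetical.

```python
import math
from statistics import median


def log_ratio_normalize(red, green):
    """Return median-centred log2(red/green) ratios for one two-channel array.

    red, green: background-subtracted spot intensities (positive numbers),
    one value per spot, in the same spot order (hypothetical input format).
    """
    if len(red) != len(green):
        raise ValueError("red and green must list the same spots")
    if any(v <= 0 for v in list(red) + list(green)):
        raise ValueError("intensities must be positive to take a log ratio")
    # log2 ratios make a 2-fold increase (+1) and a 2-fold decrease (-1) symmetric
    ratios = [math.log2(r / g) for r, g in zip(red, green)]
    # global median normalization: shift so the array-wide median ratio is 0
    centre = median(ratios)
    return [x - centre for x in ratios]


if __name__ == "__main__":
    red = [200.0, 800.0, 400.0, 100.0]
    green = [100.0, 200.0, 200.0, 100.0]
    print(log_ratio_normalize(red, green))  # [0.0, 1.0, 0.0, -1.0]
```

Median centring is only one choice; intensity-dependent methods exist, but the global median keeps the example short.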

Biochip informatics: clustering

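Expression matrix A (one axis labelled "time"):

$$
\begin{pmatrix}
A_{11} & A_{21} & A_{31} & A_{41} & A_{51} & A_{61} & A_{71} & A_{81} & A_{91} \\
A_{12} & A_{22} & A_{32} & A_{42} & A_{52} & A_{62} & A_{72} & A_{82} & A_{92} \\
A_{13} & A_{23} & A_{33} & A_{43} & A_{53} & A_{63} & A_{73} & A_{83} & A_{93} \\
A_{14} & A_{24} & A_{34} & A_{44} & A_{54} & A_{64} & A_{74} & A_{84} & A_{94} \\
A_{15} & A_{25} & A_{35} & A_{45} & A_{55} & A_{65} & A_{75} & A_{85} & A_{95} \\
A_{16} & A_{26} & A_{36} & A_{46} & A_{56} & A_{66} & A_{76} & A_{86} & A_{96}
\end{pmatrix}
$$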

Biochip informatics: clustering

Figure: clustering
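The slides do not say which clustering algorithm or distance is used. A minimal sketch, assuming each gene is one row of the expression matrix (one value per time point), is agglomerative hierarchical clustering with a Pearson-correlation distance and average linkage, a common choice for expression profiles. The function names are hypothetical, and the exhaustive pair search is only suitable for toy data.

```python
from statistics import mean, pstdev


def pearson_distance(x, y):
    """1 - Pearson correlation between two expression profiles (range 0..2)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 1.0  # flat profile: treated as uncorrelated (a choice, not a rule)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    profiles: dict gene_id -> list of expression values.
    Returns a list of clusters, each a sorted list of gene ids.
    """
    clusters = [[g] for g in profiles]  # start with one cluster per gene

    def dist(c1, c2):
        # average linkage: mean pairwise distance between the two clusters
        return mean(pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    profiles = {
        "g1": [1.0, 2.0, 3.0, 4.0],
        "g2": [2.0, 4.0, 6.0, 8.0],
        "g3": [4.0, 3.0, 2.0, 1.0],
        "g4": [8.0, 6.0, 4.0, 2.0],
    }
    print(average_linkage(profiles, 2))
```

A correlation distance groups genes by the shape of their profiles rather than their absolute level, which is why g1 and g2 end up together despite a 2-fold difference.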

Hierarchical & Partitional Clustering
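Partitional clustering can be illustrated with k-means (Lloyd's algorithm), which fixes the number of clusters k in advance instead of building a tree. This sketch assumes Euclidean distance on expression profiles and initial centroids drawn at random from the data; k, the seed and the function name are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """k-means (Lloyd's algorithm) on expression profiles.

    points: list of equal-length lists of floats, one profile per gene.
    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # initial centroids: k profiles picked at random without replacement
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # assignment step: each profile goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    labels, _ = kmeans(profiles, k=2)
    print(labels)
```

The result depends on the random start, so in practice k-means is usually run several times with different seeds.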

Clinical relevance of Biochip informatics

• Dx. (diagnosis)
• Discovery
• Px. (prognosis)
• Tx. (treatment)

Do we really need databases for expression?

• Standard practice for gene sequences is…
• How can we check data integrity without access to the data?
• Are the authors’ interpretations sufficient?

• It’s a whole new type of data
• Observational vs. experimental
• Enormous potential for the public good

• Where to house the data?
• In what format?
• And what’s next?

With a single format for gene expression data, databases should be able to 'talk' to one another and exchange data. The existence of a standard language should also spur development of software tools to query the databases, and to manage and display gene expression data.

An example for pharmacogenomics

An example for pharmacogenomics

Expression data repository projects

• Public repositories in the making:

• GEO : NCBI

• GeneX : NCGR

• ArrayExpress : EBI

• In-house databases : Stanford, MIT, U. Pennsylvania

• Organism-specific databases: mouse at the Jackson Laboratory

Database Model: Relational vs. Object-Oriented (or Frame-based)

MIAME Model

RAD Model

Stanford Microarray Database (SMD): relational

• Ontology for samples
• Gene index for genes
• Annotation for experiments
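To make the relational option concrete, here is a toy schema in the spirit of the SMD bullets: a gene index table, a sample table carrying an ontology term, an experiment table carrying free-text annotation, and a measurement table keyed by all three. It uses Python's built-in sqlite3 module; every table, column and value is a hypothetical stand-in, not SMD's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT UNIQUE);      -- gene index
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, ontology_term TEXT);  -- sample ontology
CREATE TABLE experiment (exp_id INTEGER PRIMARY KEY, annotation TEXT);    -- experiment annotation
CREATE TABLE measurement (
    exp_id    INTEGER REFERENCES experiment(exp_id),
    sample_id INTEGER REFERENCES sample(sample_id),
    gene_id   INTEGER REFERENCES gene(gene_id),
    log_ratio REAL,
    PRIMARY KEY (exp_id, sample_id, gene_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with toy values."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO gene VALUES (?, ?)", [(1, "geneA"), (2, "geneB")])
    con.executemany("INSERT INTO sample VALUES (?, ?)", [(1, "liver"), (2, "kidney")])
    con.execute("INSERT INTO experiment VALUES (1, 'toy experiment')")
    # rows are (sample_id, gene_id, log_ratio) for experiment 1
    con.executemany(
        "INSERT INTO measurement VALUES (1, ?, ?, ?)",
        [(1, 1, 0.5), (2, 1, -1.2), (1, 2, 2.0), (2, 2, 0.1)],
    )
    return con


def gene_profile(con, symbol):
    """Return [(sample ontology term, log ratio), ...] for one gene."""
    rows = con.execute(
        """SELECT s.ontology_term, m.log_ratio
           FROM measurement AS m
           JOIN gene AS g ON g.gene_id = m.gene_id
           JOIN sample AS s ON s.sample_id = m.sample_id
           WHERE g.symbol = ?
           ORDER BY s.sample_id""",
        (symbol,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(gene_profile(build_demo_db(), "geneA"))  # [('liver', 0.5), ('kidney', -1.2)]
```

Each expression value is one row joined to its gene, sample and experiment, which is what makes ad hoc SQL queries easy and deeply nested experiment descriptions awkward in a purely relational model.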

ArrayDB

MGED participants include:

• Affymetrix
• Berkeley
• DDBJ
• DKFZ
• EMBL
• Gene Logic
• Incyte
• Max Planck Institute
• NCBI
• NCGR
• NHGRI
• Sanger Centre
• Stanford
• Uni Pennsylvania
• Uni Washington
• Whitehead Institute

Reporting a Microarray Experiment

• Purpose of study
• Experimental details
  • Experimental description
• Experimental data
  • Image files
  • Data files

A standard is needed to describe microarray experiments.

Goals of a Microarray Data Standard

• Upload and Retrieval from Public Repositories
• Data Exchange Between Portals
• Standard Data Access and Exchange Format
• Encapsulate Data and Experiment Description
  • Rationale for Experimental Study
  • Experimental Details
  • Experimental Data
• Widespread Industry Support

Microarray Standards

• MAGE-ML
  • XML Implementation of the MIAME Standard
  • Formed via a Merge of the MAML and GEML Standards
  • De Facto Widespread Industry Support
• MIAME
  • Minimum Information About a Microarray Experiment
  • Experimental Design, Array Design, Hybridization, Samples, Measurements and Normalization

Conceptual view of gene expression data.

Three parts of a gene expression database

• Gene annotation – may be given as links to gene sequence databases

• Sample annotation – there currently are no public external databases (except the species taxonomy)

• Gene expression matrix – each position contains information characterizing the expression of a particular gene in a particular sample. What are the measurement units for gene expression levels?
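The three parts can be sketched as plain data classes. This is an illustrative in-memory model under stated assumptions, not a published schema: gene annotation keeps only a link to an external sequence database, sample annotation keeps a species name because that is the one externally standardized field mentioned, and the matrix records its measurement unit explicitly, since the slide leaves that question open. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GeneAnnotation:
    gene_id: str
    sequence_db_link: str  # pointer into an external sequence database


@dataclass
class SampleAnnotation:
    sample_id: str
    species: str           # species taxonomy: the slide's one external vocabulary
    description: str = ""  # everything else is free text here


@dataclass
class ExpressionMatrix:
    genes: List[GeneAnnotation]
    samples: List[SampleAnnotation]
    unit: str  # the measurement unit must be stated, e.g. "log2 ratio"
    values: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def put(self, gene_id: str, sample_id: str, value: float) -> None:
        """Store the expression of one gene in one sample (one matrix cell)."""
        if gene_id not in {g.gene_id for g in self.genes}:
            raise KeyError(f"unknown gene {gene_id!r}")
        if sample_id not in {s.sample_id for s in self.samples}:
            raise KeyError(f"unknown sample {sample_id!r}")
        self.values[(gene_id, sample_id)] = value

    def get(self, gene_id: str, sample_id: str) -> float:
        return self.values[(gene_id, sample_id)]


if __name__ == "__main__":
    m = ExpressionMatrix(
        genes=[GeneAnnotation("g1", "hypothetical-accession-1")],
        samples=[SampleAnnotation("s1", "Homo sapiens")],
        unit="log2 ratio",
    )
    m.put("g1", "s1", 1.5)
    print(m.get("g1", "s1"), m.unit)
```

Rejecting cells for unannotated genes or samples keeps the matrix consistent with the two annotation parts.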

General principles of MIAME design

• The recorded information about each experiment should be sufficient to interpret the experiment and should be detailed enough to enable comparisons to similar experiments and permit replication of experiments

• The information should be structured in a way that enables useful querying as well as automated data analysis and mining

MIAME structure

I. Array design

II. Experiment design

1. Experimental design

2. Samples used, extract preparation and labeling

3. Hybridization procedures and parameters

4. Measurement data and specifications of data processing
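One practical use of this structure is an input check before a submission is accepted. The sketch below, with hypothetical dictionary keys for the MIAME sections listed above, reports which sections are missing or empty. It only checks presence; real MIAME compliance also depends on the content of each section.

```python
# Hypothetical keys for the MIAME sections listed on the slide.
MIAME_SECTIONS = {
    "array_design": "I. Array design",
    "experimental_design": "II.1 Experimental design",
    "samples": "II.2 Samples used, extract preparation and labeling",
    "hybridization": "II.3 Hybridization procedures and parameters",
    "measurements": "II.4 Measurement data and specifications of data processing",
}


def missing_miame_sections(record):
    """Return the titles of MIAME sections that are absent or empty in record.

    record: dict mapping section keys to any description (text, dict, list).
    Only presence is checked, not whether the content is adequate.
    """
    return [title for key, title in MIAME_SECTIONS.items() if not record.get(key)]


if __name__ == "__main__":
    submission = {
        "array_design": "toy array description",
        "experimental_design": "toy design description",
        "samples": "",  # present but empty: still reported as missing
    }
    for title in missing_miame_sections(submission):
        print("missing:", title)
```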

Microarray Information to be Captured

Six components of a microarray experiment.

Three levels of microarray gene expression data processing.

MAGE-OM
• Framework for developing MAGE-ML
• OMG specifications are developed in UML
• MAGE-OM represents a data-driven model of microarray experiments
• This model can be used to automatically generate an XML DTD

UML (Unified Modeling Language)
• Standard object-oriented design language
• Methods for showing relationships between data objects
• Objects are drawn as boxes (things)
• Associations between objects are indicated by lines
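The idea that an object model can generate a DTD can be shown with a deliberately simplified mapping, which is not the actual MAGE-OM to MAGE-ML generator: each class becomes an element, its attributes become optional CDATA XML attributes, and each association becomes child elements whose DTD occurrence indicator follows the UML multiplicity. The class and association names in the usage lines are illustrative.

```python
# UML multiplicity -> DTD occurrence indicator (simplified mapping).
OCCURRENCE = {"1": "", "0..1": "?", "0..*": "*", "1..*": "+"}


def class_to_dtd(name, attributes, associations):
    """Emit DTD declarations for one UML-style class.

    attributes:   list of attribute names, declared as optional CDATA.
    associations: list of (target_class, multiplicity) pairs, declared
                  as a sequence of child elements.
    """
    if associations:
        children = ", ".join(t + OCCURRENCE[m] for t, m in associations)
        content = f"({children})"
    else:
        content = "EMPTY"  # no associations: element has no children
    lines = [f"<!ELEMENT {name} {content}>"]
    if attributes:
        attr_lines = "\n".join(f"  {a} CDATA #IMPLIED" for a in attributes)
        lines.append(f"<!ATTLIST {name}\n{attr_lines}>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(class_to_dtd("Experiment", ["identifier", "name"], [("BioAssay", "0..*")]))
    print(class_to_dtd("BioAssay", ["identifier"], []))
```

Because the DTD is derived from the model, a change to the UML propagates to the exchange format without hand-editing the DTD.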

Main Packages

• Biosequence
• Quantitation Type
• ArrayDesign
• DesignElement
• Array
• BioMaterial
• BioAssay
• BioAssayData
• Experiment
• HigherLevelAnalysis
• Protocol
• Description
• Audit and Security
• Measurement
• BioEvent

Mapping from MAGE-OM to MAGE-ML

Generated DTD
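Going the other way, an instance of the object model can be written out as an XML document. This sketch uses Python dataclasses and the standard xml.etree.ElementTree module with a simplified convention (scalar fields become attributes, lists of associated objects become nested elements); the real MAGE-ML mapping has its own, more elaborate conventions, and the two classes here are toy stand-ins.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class BioAssay:
    identifier: str
    name: str


@dataclass
class Experiment:
    identifier: str
    name: str
    bioassays: List[BioAssay] = field(default_factory=list)


def to_element(obj):
    """Map a dataclass instance to an XML element named after its class."""
    el = ET.Element(type(obj).__name__)
    for f in fields(obj):
        value = getattr(obj, f.name)
        if isinstance(value, list):
            for item in value:  # association: one nested element per object
                el.append(to_element(item))
        else:  # scalar field: XML attribute
            el.set(f.name, str(value))
    return el


if __name__ == "__main__":
    exp = Experiment("E1", "toy experiment", [BioAssay("B1", "hyb 1"), BioAssay("B2", "hyb 2")])
    print(ET.tostring(to_element(exp), encoding="unicode"))
```

Walking the object graph generically, rather than writing one serializer per class, mirrors the idea of generating the format from the model.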

Class Diagram of ArrayDesign

MAGE-ML: Microarray Data Exchange Formats

• Platform for moving data between data generators and shared databases
• International format to communicate data from a DB to third-party applications
• Supports MIAME-compliant data

Information available online:
• Current DTD (Document Type Definition)
• A few sample data sets

MAML (Microarray Markup Language)

• Standard / vocabulary
• Communication / messaging
• Ontology

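RDB implementation of MAGE-ML

Figure: client, Internet, web server and RDBMS, with XML passed between them.

• Data in the relational database is converted to XML before being connected to other systems.
• XML serves as the standard technology for exchanging data between the database and application programs.
→ Fewer components depend directly on the database, so updates are less burdensome.
→ XML resources can be used worldwide without being constrained by platform or programming language.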
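The first bullet, converting relational data to XML before handing it on, can be sketched with Python's sqlite3 standing in for the RDBMS: each row of a table becomes an element and each column a child element. Table and column names are hypothetical, and the table name is interpolated into the SQL, so it must come from trusted code, never from user input.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(con, table, row_tag):
    """Serialize every row of `table` as <row_tag><column>value</column>...</row_tag>."""
    # The table name cannot be a bound parameter, so it must be trusted.
    cur = con.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]  # column names from the cursor
    root = ET.Element(table)
    for row in cur:
        row_el = ET.SubElement(root, row_tag)
        for col, val in zip(columns, row):
            ET.SubElement(row_el, col).text = "" if val is None else str(val)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE measurement (gene TEXT, sample TEXT, log_ratio REAL)")
    con.executemany(
        "INSERT INTO measurement VALUES (?, ?, ?)",
        [("geneA", "s1", 0.5), ("geneB", "s1", -1.25)],
    )
    print(table_to_xml(con, "measurement", "row"))
```

The consumer of this XML never sees the database, which is the decoupling the slide's first arrow describes.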

Overview of data transfer structure: Client → Middleware → Database

Middleware Implementation in Database: the client connects over HTTP to a JSP/servlet that processes the DOM of an XML file, and the servlet accesses the database (MySQL) through JDBC.
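On the middleware side, the servlet parses an incoming XML file into a DOM and writes its contents to the database through JDBC. The same flow is sketched here in Python, with xml.dom.minidom as the DOM parser and sqlite3 standing in for JDBC and MySQL; the element and attribute names are hypothetical.

```python
import sqlite3
from xml.dom import minidom

XML_DOC = """<measurements>
  <measurement gene="geneA" sample="s1" log_ratio="0.5"/>
  <measurement gene="geneB" sample="s1" log_ratio="-1.25"/>
</measurements>"""


def load_xml_into_db(xml_text, con):
    """Parse xml_text into a DOM and insert one table row per <measurement>."""
    dom = minidom.parseString(xml_text)
    rows = [
        (el.getAttribute("gene"), el.getAttribute("sample"),
         float(el.getAttribute("log_ratio")))
        for el in dom.getElementsByTagName("measurement")
    ]
    # parameterized INSERT, the DB-API analogue of a JDBC PreparedStatement
    con.executemany(
        "INSERT INTO measurement (gene, sample, log_ratio) VALUES (?, ?, ?)", rows
    )
    con.commit()
    return len(rows)


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE measurement (gene TEXT, sample TEXT, log_ratio REAL)")
    print(load_xml_into_db(XML_DOC, con), "rows loaded")
```

A DOM holds the whole document in memory, which is convenient for small submissions; very large data files are usually streamed instead.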

Transforming the cancer center

Goals of the National Center for Toxicogenomics

Minimum information to be recorded about toxicogenomics experiments

• Experimental design parameters, animal husbandry information or cell line and culture information, exposure parameters, dosing regimen, dose groups, and in-life observations.

• Microarray data, specifying the number and details of replicate array bioassays associated with particular samples, and including PCR transcript analysis if available.

• Numerical biological endpoint data, including necropsy weights or cell counts and doubling times, clinical chemistry and enzyme assays, hematology, urinalysis, and others.

• Textual endpoint information such as gross observations, pathology and microscopy findings.

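ArrayTrack

Figure: ArrayTrack architecture. The user works through an interface that connects a microarray database, libraries (GeneLib, ProteinLib, PathwayLib, ToxicantLib), tools (public, commercial, in-house) and functional components.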

Development of a Toxicogenomic Supportive Database: dbZach

Lyle D. Burgoon*
Department of Pharmacology & Toxicology
Institute for Environmental Toxicology and The National Food Safety & Toxicology Center
Michigan State University

tel: (517) 353-1944
fax: (517) 353-9334
http://dbzach.fst.msu.edu

Research program supported by: National Institutes of Health P42 ES 04911-12, *T32 ES07255

IBM/Mayo Clinic Collaboration: Applied Genomics Data Analysis

Figure (Phase I): genomic data (DNA), GeneChip array data (RNA) and protein data are combined with clinical data (signs, symptoms, laboratory, radiology, etc.) and databases (genome, proteome, disease, tumors, drugs) toward optimized, individualized healthcare.

PsyBase 1.0

PsyBase 1.0, the first electronic medical record in Korea, first put into use in 1995 at the Department of Neuropsychiatry, Seoul National University Hospital.

Streamlining the process

Figure: a data management layer holding in-house data (clone, slide, cell, hybridization, experiment and scan data) and outside data, connected to array fabrication, image analysis, cluster analysis, data mining and pathway/network analysis.

• Miniaturization & streamlining of the process

Integrated biochip informatics

Figure: the data management layer (in-house data: clone, slide, cell, hybridization, experiment and scan data; outside data) and the analysis modules (array fabrication, image analysis, cluster analysis, data mining, pathway/network analysis), extended with CIS, communication, ontology, literature and factual database mining, Rosetta transcriptomics (Ideker, 2001, Science) and chemoinformatics.

Xperanto: Expressionist’s Esperanto in XML

Figure: Xperanto architecture. Components: MAGE-OM; DTD or schema (generated from MAGE-OM); data entry by HTML; XML validation; MAGE-ML; translation to/from XML with MAGEstk (Java API); MGED Ontology; relational DB (RDB implementation); Internet; XML expression repository.

1. Input control
2. R-DB construction

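Figure: Xperanto architecture (MAGE-OM, DTD or schema generated from MAGE-OM, data entry by HTML, XML validation, MAGE-ML, MAGEstk Java API, MGED Ontology, relational DB implementation, Internet, XML expression repository), extended with sequence information modeling, expression information modeling and clinical information modeling.

Open Source & XML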

Thank you!