
Post on 27-Sep-2020


Ju Han Kim, M.D., Ph.D., M.S.
SNU Biomedical Informatics
Seoul Nat’l Univ. School of Medicine
http://www.snubi.org/

Databasing Expression with Integrative Biochip Informatics

Databasing Gene Expression

• Bio-databases
• Microarray basics
• Do we really need databases for expression?
• An example for pharmacogenomics approach
• Public repositories for expression
• Relational vs. Object Oriented Models
• OM-MAGEML
• The reality
• Integrative biochip informatics, coming soon…


Bio-databases

• PIR: bio-sequences in the 60’s by M. Dayhoff

• NAR review: major 500 bio-databases

• Primary – Secondary -….

• Standardization

• Integration

• Intelligence


Paradigm Shift - Clinical Knowledge Management -

[Figure: clinician-directed vs. resource-directed knowledge management. Labels: Dr. Abraham, Dr. Elson, Dr. Faughnan, Dr. Dandy, Dr. Connelly, Dr. Belsky, Informatics]


The Central Dogma of Life

DNA → RNA → Protein


Bio-databases, what are the problems

• Heterogeneity: data types and forms
• Complexity
• Only loosely connected
• Noise & quality
• Level of granularity
• Curation
• Is modelling even possible?
• In-silico experiment is only possible with models

Clinical knowledge engineering: 3-D Visualization of Gene Expression

Streicher J, et al., Nature Genetics 2000;25:147-52


Clinical knowledge engineering: Future queries

Database Structure

Streicher J, et al., Nature Genetics 2000;25:147-52

Biochip basics

Bioinformatics pipeline


Biochip, Core competency

• They are the genes!
• We have the map!
• Natural measure of quantification.
• Literally, INFINITE # of states
• Dynamic series on time & space
• Don’t need to extract bio-molecules.
• Now systemic perturbations!

Put abstraction barrier
Streamlining & automation of the process

Do biology in silico! Say NO to lab bench!

A Functional Genomics Strategy

[Figure: flowchart with boxes labeled Interesting Patients, Interesting Animals, Interesting Cell Lines, Appropriate Tissue, Appropriate Conditions, Extract RNA, Make Biochip, Hybridize Biochip, Scan Biochip, Data Pre-processing, Assess Significance, Functional Clustering, Post-cluster Analysis & Integration, Biological Validation, Informatical Validation?]


Biochip informatics: clustering

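[Figure: expression data matrix with a "time" axis label]

A11 A21 A31 A41 A51 A61 A71 A81 A91
A12 A22 A32 A42 A52 A62 A72 A82 A92
A13 A23 A33 A43 A53 A63 A73 A83 A93
A14 A24 A34 A44 A54 A64 A74 A84 A94
A15 A25 A35 A45 A55 A65 A75 A85 A95
A16 A26 A36 A46 A56 A66 A76 A86 A96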


Hierarchical & Partitional Clustering
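
The two clustering families named in this slide title can be illustrated with a minimal sketch. It assumes a tiny genes × time-points matrix with made-up values; the function names (`kmeans`, `single_linkage`) and the choice of Euclidean distance and single linkage are illustrative assumptions, not taken from the slides.

```python
import math

def dist(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(profiles, centers, iterations=10):
    """Partitional clustering: assign each profile to its nearest center, then re-average."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                  for p in profiles]
        for k in range(len(centers)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def single_linkage(profiles, n_clusters):
    """Hierarchical (agglomerative) clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical data: 4 genes measured at 3 time points.
expression = [
    [1.0, 2.0, 3.0],  # gene 0: rising
    [1.1, 2.1, 2.9],  # gene 1: rising
    [3.0, 2.0, 1.0],  # gene 2: falling
    [2.9, 2.2, 0.9],  # gene 3: falling
]
kmeans_labels = kmeans(expression, centers=[list(expression[0]), list(expression[2])])
hier_clusters = single_linkage(expression, n_clusters=2)
```

Stopping the hierarchical merge at different cluster counts yields the nested groupings of a dendrogram, whereas the partitional method needs the number of clusters fixed in advance.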


Clinical relevance of Biochip informatics

[Figure labels: Dx. (diagnosis), Px. (prognosis), Tx. (treatment), Discovery]


Do we really need databases for expression?

• Standard practice for gene sequences is...
• How can we verify data integrity without the data?
• Are authors’ interpretations sufficient?

• It’s a whole new type of data.
• Observational vs. experimental
• Enormous potential for the public good

• Where to house the data?
• In what format?
• And what’s next?


With a single format for gene expression data, databases should be able to 'talk' to one another and exchange data. The existence of a standard language should also spur development of software tools to query the databases, and to manage and display gene expression data.


An example for pharmacogenomics


Expression data repository projects

• Public repositories in the making:

• GEO : NCBI

• GeneX : NCGR

• ArrayExpress : EBI

• In-house databases : Stanford, MIT, U. Pennsylvania

• Organism specific databases: Mouse in Jackson


Database Model: Relational vs. Object-Oriented (or Frame-based)

MIAME Model

RAD Model

Stanford Microarray Database (SMD): Relational

• Ontology for samples
• Gene index for genes
• Annotation for Exp.


ArrayDB


MGED participants including

• Affymetrix
• Berkeley
• DDBJ
• DKFZ
• EMBL
• Gene Logic
• Incyte
• Max Planck Institute
• NCBI
• NCGR
• NHGRI
• Sanger Centre
• Stanford
• Uni Pennsylvania
• Uni Washington
• Whitehead Institute


Reporting a Microarray Experiment

• Purpose of Study
• Experimental Details
• Experimental Description
• Experimental Data
• Image Files
• Data Files

Standard Needed to Describe Microarray Experiment


Goals of a Microarray Data Standard

• Upload and Retrieval from Public Repositories
• Data Exchange Between Portals
• Standard Data Access and Exchange Format
• Encapsulate Data and Experiment Description
• Rationale for Experimental Study
• Experimental Details
• Experimental Data
• Widespread Industry Support


Microarray Standards

• MIAME
  - Minimum Information About a Microarray Experiment
  - Experimental Design, Array Design, Hybridization, Samples, Measurements and Normalization

• MAGE-ML
  - XML Implementation of the MIAME Standard
  - Formed Via Merge of MAML and GEML Standards
  - De Facto Widespread Industry Support


Conceptual view of gene expression data.


Three parts of gene expression DB

• Gene annotation – may be given as links to gene sequence databases

• Sample annotation – there currently are no public external databases (except the species taxonomy)

• Gene expression matrix – each position contains information characterizing the expression of a particular gene in a particular sample. What are the measurement units for gene expression levels?
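
The three-part structure above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `ExpressionDataset`, its field names, and all identifiers and values are hypothetical, and the measurement unit is deliberately left unspecified, since the slide itself raises it as an open question.

```python
from dataclasses import dataclass, field

@dataclass
class ExpressionDataset:
    """Three linked parts: gene annotation, sample annotation, expression matrix."""
    gene_annotation: dict = field(default_factory=dict)    # gene id -> link to a sequence database
    sample_annotation: dict = field(default_factory=dict)  # sample id -> free-form description
    matrix: dict = field(default_factory=dict)             # (gene id, sample id) -> measurement

    def add_measurement(self, gene_id, sample_id, value):
        # Refuse values whose row (gene) or column (sample) has no annotation.
        if gene_id not in self.gene_annotation:
            raise KeyError(f"unannotated gene: {gene_id}")
        if sample_id not in self.sample_annotation:
            raise KeyError(f"unannotated sample: {sample_id}")
        self.matrix[(gene_id, sample_id)] = value

    def profile(self, gene_id):
        """Return one gene's expression across all samples, in sample insertion order."""
        return [self.matrix.get((gene_id, s)) for s in self.sample_annotation]

# Hypothetical identifiers and values, for illustration only.
ds = ExpressionDataset()
ds.gene_annotation["geneA"] = "link-to-sequence-db/geneA"
ds.sample_annotation["s1"] = {"species": "Homo sapiens", "tissue": "liver"}
ds.sample_annotation["s2"] = {"species": "Homo sapiens", "tissue": "kidney"}
ds.add_measurement("geneA", "s1", 2.5)
gene_a_profile = ds.profile("geneA")
```

Keeping the annotations separate from the matrix mirrors the slide: gene annotation can be a link out to an existing sequence database, while sample annotation has to be stored locally because no public external database exists for it.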


General principles of MIAME design

• The recorded information about each experiment should be sufficient to interpret the experiment and should be detailed enough to enable comparisons to similar experiments and permit replication of experiments

• The information should be structured in a way that enables useful querying as well as automated data analysis and mining


MIAME structure

I. Array design

II. Experiment design

1. Experimental design

2. Samples used, extract preparation and labeling

3. Hybridization procedures and parameters

4. Measurement data and specifications of data processing
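
One way to use the outline above is as a completeness check on a submission. The sketch below is a hedged illustration, not the official MIAME checklist: the section keys, the dictionary layout, and the example record are all assumptions made for this example.

```python
# Section names follow the outline above; the key spellings are hypothetical.
MIAME_SECTIONS = [
    "array_design",
    "experimental_design",
    "samples_extract_labeling",
    "hybridization",
    "measurement_and_processing",
]

def missing_miame_sections(record):
    """Return the sections that are absent or empty in a submission record."""
    return [name for name in MIAME_SECTIONS if not record.get(name)]

# Hypothetical submission with one empty and one absent section.
submission = {
    "array_design": "hypothetical cDNA array description",
    "experimental_design": "time course, 6 time points",
    "samples_extract_labeling": "",
    "measurement_and_processing": "log ratios, median-normalized",
}
missing = missing_miame_sections(submission)
```

A repository could reject or flag a record until `missing` is empty, which is one simple reading of "minimum information".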


Microarray Information to be Captured


Six components of microarray experiment.


Three levels of microarray gene expression data processing.


MAGE-OM

• Framework for Developing MAGE-ML
• OMG specifications are developed in UML
• MAGE-OM represents a data-driven model of microarray experiments
• This model can be used to automatically generate an XML DTD

UML (Unified Modeling Language)

• Standard object-oriented design language
• Methods for showing relationships between data objects
• Objects are boxes (things)
• Associations between objects are indicated by lines
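
The idea of generating a DTD automatically from an object model can be shown with a toy generator. This is a sketch under loud assumptions: the dictionary-based model format and the mapping rules (one element per class, associations as repeatable child elements, attributes as required CDATA) are invented for illustration and are not the actual MAGE-OM to MAGE-ML mapping.

```python
def model_to_dtd(model):
    """Emit one <!ELEMENT> (plus <!ATTLIST> lines) per class in a toy object model."""
    lines = []
    for class_name, spec in model.items():
        children = spec.get("associations", [])
        if children:
            # Each association becomes a child element that may repeat.
            content = "(" + ", ".join(child + "*" for child in children) + ")"
        else:
            content = "EMPTY"
        lines.append(f"<!ELEMENT {class_name} {content}>")
        for attribute in spec.get("attributes", []):
            lines.append(f"<!ATTLIST {class_name} {attribute} CDATA #REQUIRED>")
    return "\n".join(lines)

# Hypothetical two-class model; the class names are borrowed from MAGE package names.
toy_model = {
    "ArrayDesign": {"attributes": ["identifier"], "associations": ["DesignElement"]},
    "DesignElement": {"attributes": ["identifier"], "associations": []},
}
dtd_text = model_to_dtd(toy_model)
```

Because the DTD is derived mechanically, a change to the model only requires regenerating the DTD rather than editing it by hand.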


Main Packages

• Biosequence
• Quantitation Type
• ArrayDesign
• DesignElement
• Array
• BioMaterial
• BioAssay
• BioAssayData
• Experiment
• HigherLevelAnalysis
• Protocol
• Description
• Audit and Security
• Measurement
• BioEvent


Mapping from MAGE-OM to MAGE-ML

Generated DTD


Class Diagram of ArrayDesign


MAGE-ML: Microarray Data Exchange Format

• Platform for moving data between data generators and shared databases
• International format to communicate data from databases to third-party applications
• Supports MIAME-compliant data

Information available online

• Current DTD (Document Type Definition)
• A few sample data sets


MAML (Microarray Markup Language)

• standard / vocabulary
• communication / msg
• ontology


RDB implementation of MAGEML

[Figure: Client, Internet, Web Server, and RDBMS, with XML labels on the connections]

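• Data in the relational database is converted to XML before being connected.

• The XML object serves as a standard technology for exchanging data between the database and application programs.

→ Fewer parts depend directly on the database, so the burden of updates is reduced.

→ XML resources can be used worldwide without being constrained by platform or programming language.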
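
The relational-to-XML conversion step can be sketched in a few lines. This is an illustration only: it swaps in Python's built-in `sqlite3` for the MySQL database named in the slides so that it runs without a server, and the table, column, and tag names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def table_to_xml(connection, table, root_tag, row_tag):
    """Serialize every row of a table as an XML element whose children are the columns."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_element = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(row_element, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical in-memory table standing in for the expression database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE measurement (gene TEXT, sample TEXT, value REAL)")
connection.execute("INSERT INTO measurement VALUES ('geneA', 's1', 2.5)")
xml_text = table_to_xml(connection, "measurement", "measurements", "measurement")
```

Client applications then consume only the XML, which is the decoupling the bullets describe: the schema can change behind `table_to_xml` without touching the clients.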


<Overview of data transfer structure>

Client Middleware Database

Middleware Implementation in Database

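[Figure: the client connects over http to the middleware (JSP and a servlet that processes the DOM), which connects over JDBC to the DATABASE (MySQL); an XML File is also shown]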


Transforming the cancer center


Goals of National Center for Toxicogenomics


Minimum information to be recorded about toxicogenomics experiments

• Experimental design parameters, animal husbandry information or cell line and culture information, exposure parameters, dosing regimen, dose groups, and in-life observations.

• Microarray data, specifying the number and details of replicate array bioassays associated with particular samples, and including PCR transcript analysis if available.

• Numerical biological endpoint data, including necropsy weights or cell counts and doubling times, clinical chemistry and enzyme assays, hematology, urinalysis, other.

• Textual endpoint information such as gross observations, pathology and microscopy findings.


ArrayTrack

[Figure: ArrayTrack functional components. Labels: Microarray Database; GeneLib, ProteinLib, PathwayLib, ToxicantLib; Tools (Public, Commercial, In-house); INTERFACE; User]


Development of a Toxicogenomic Supportive Database: dbZach

Lyle D. Burgoon*
Department of Pharmacology & Toxicology
Institute For Environmental Toxicology and The National Food Safety & Toxicology Center
Michigan State University

tel: (517) 353-1944
fax: (517) 353-9334
e-mail: burgoonl@msu.edu
http://dbzach.fst.msu.edu

Research Program Supported by: National Institutes of Health P42 ES 04911-12, *T32 ES07255


IBM/Mayo Clinic Collaboration: Applied Genomics Data Analysis

[Figure, Phase I. Labels: Genomic data (DNA), GeneChip array data (RNA), Protein data; Clinical Data (Signs, Symptoms, Laboratory, Radiology, etc.); Databases (Genome, Proteome, Disease, Tumors, Drugs); Optimized, individualized healthcare]


PsyBase 1.0

PsyBase 1.0, Korea's first electronic medical record, came into use in 1995 at the Department of Neuropsychiatry, Seoul National University Hospital.

34

http://www.snubi.org/

http://www.snubi.org/

35

Streamlining the process

[Figure: a data management layer. Data labels: clone data, outside data, slide data, cell data, Hyb. data, Exp. data, scan data, in-house data. Process labels: array fabrication, image analysis, cluster analysis, data mining, pathway/network analysis]

• Miniaturization & Streamlining the process


Integrated biochip informatics

[Figure: a data management layer. Data labels: clone data, outside data, slide data, cell data, Hyb. data, Exp. data, scan data, in-house data. Process labels: array fabrication, image analysis, cluster analysis, data mining, pathway/network analysis. Additional labels: CIS; Communication; Ontology; Literature and Factual Database mining; Rosetta transcriptomics; Ideker, 2001, Science; Chemoinformatics]


Xperanto: Expressionist’s Esperanto in XML


[Figure: Xperanto architecture. Labels: MAGE-OM; DTD or Schema (generated from MAGE-OM); XML Validation; Data Entry by HTML; MAGE-ML; Translation to/from XML; MAGEstk (Java API); MGED Ontology; Relational DB (RDB implementation); Internet; XML Expression Repository]

1. Input control
2. R-DB construction


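[Figure: Xperanto architecture with modeling layers. Labels: MAGE-OM; DTD or Schema (generated from MAGE-OM); XML Validation; Data Entry by HTML; MAGE-ML; MAGEstk (Java API); MGED Ontology; Relational DB (RDB implementation); Internet; XML Expression Repository; sequence information modeling; expression information modeling; clinical information modeling]

Open Source & XML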


Thank you!
