

Page 1

Data integration in the MicrobeDB.jp using Semantic Web technology

Hiroshi Mori¹, Ikuo Uchiyama², Yasukazu Nakamura³, Hideaki Sugawara³, Ken Kurokawa¹, and MicrobeDB.jp Project Team¹,²,³

ICCC13, September 26, 2013, Beijing, China
WDCM and CODATA Joint Workshop

1) Tokyo Institute of Technology, 2) National Institute for Basic Biology, 3) National Institute of Genetics, Japan

Page 2

Many microbial databases (DBs) exist …

(Ortholog, Taxonomy, Pathogen, Gene Function, Metagenome, Genome, Culture Collection)

Which DBs should we use?

Page 3

Source: National Research Council (USA)

Microbes live almost everywhere on Earth and interact with their environments.

Knowledge of microbes has high potential for scientific and commercial applications.

Page 4

Promoting the Integrated Use of Life Science Databases in Japan

・ FY 2007–2010: "Integrated Database Project" → Database Center for Life Science (DBCLS)

・ FY 2011– → National Bioscience Database Center (NBDC)

About NBDC
・ Established in April 2011
・ Part of the Japan Science and Technology Agency (JST), a funding agency supported by MEXT

URL: http://biosciencedbc.jp/?lng=en

Page 5

Activities by NBDC
1. Formulation of strategies related to the coordination and integration of DBs, and international cooperation
2. Creation and management of a portal website for existing life science DBs: http://biosciencedbc.jp/?lng=en
3. Funding of R&D of new technology necessary for organizing and linking life science DBs
4. Funding of R&D that coordinates existing and emerging DBs in specific research fields
   Includes microbes (PI: Ken KUROKAWA)

Aim of MicrobeDB.jp: to integrate several kinds of microbial data (including omics, taxonomy/cultures, and habitats) using Semantic Web technology

Page 6

MicrobeDB.jp integrates a wide range of data related to microbes.

In particular, we integrate microbial data that can be linked to genomes.

Ortholog: MBGD

Genome: GTPS/RefSeq

Annotation: TogoAnnotation

Culture Collection: NBRC/JCM

Metadata: INSDC SRA

Metagenome: INSDC SRA

Taxonomy: NCBI Taxonomy

http://microbedb.jp/

Gene Taxon Environment

Red color indicates our collaborators.

Other data

How can we simplify the integration of data from other domains?

Page 7

RDF is a standard data model of Semantic Web technology.

RDF (Resource Description Framework): a data model that uses triples (Subject – Predicate – Object).

Example triples (S P O):
Gene1 hasFunction GO:0003700
Genome1 organism Escherichia coli
Organism1 hasGenome Genome1
Organism1 inhabit Lake

Components shown: RDF, Ontology, Triple store, SPARQL (search)

Form of a triple: <URI> <URI> <URI>/Literal
Examples from the slide: gtps:Gene1 rdfs:label "16S rRNA gene"; KO:03043

A URI node can be linked to other nodes: the object of one triple can be the subject of another (S P O / S P O). A literal object cannot be linked further (S P O ×).

When data are prepared in RDF, the database management system automatically recognizes identical resources (resources with the same URI).
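To make the triple model concrete, here is a minimal sketch of an in-memory triple store using only the Python standard library; it is not a real RDF library or the MicrobeDB.jp implementation. The names `Literal`, `match`, `can_link` and `organisms_in` are hypothetical, and treating "Escherichia coli" as a literal rather than a URI is an assumption. It shows the two points above: a URI object such as Genome1 can start further triples, and a query joins triple patterns on a shared URI, much as a SPARQL basic graph pattern does.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A literal value (e.g. a label); unlike a URI it cannot start a new triple."""
    value: str


Term = Union[str, Literal]  # a plain str stands for a URI or prefixed name
Triple = Tuple[str, str, Term]

# Triples from the slide; treating "Escherichia coli" as a literal is an assumption.
TRIPLES: Set[Triple] = {
    ("Gene1", "hasFunction", "GO:0003700"),
    ("Genome1", "organism", Literal("Escherichia coli")),
    ("Organism1", "hasGenome", "Genome1"),
    ("Organism1", "inhabit", "Lake"),
    ("gtps:Gene1", "rdfs:label", Literal("16S rRNA gene")),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[Term] = None) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard (like a SPARQL variable)."""
    for t in TRIPLES:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def can_link(term: Term) -> bool:
    """Only URI nodes can be the subject of further triples; literals are dead ends."""
    return not isinstance(term, Literal)


def organisms_in(habitat: str) -> List[Tuple[str, Term]]:
    """Join two patterns on ?org:  ?org inhabit <habitat> . ?org hasGenome ?genome"""
    return sorted(
        (org, genome)
        for org, _, _ in match(p="inhabit", o=habitat)
        for _, _, genome in match(s=org, p="hasGenome")
    )
```

For example, `organisms_in("Lake")` follows Organism1 from the habitat triple to its genome, and because Genome1 is a URI it can be followed again with `match(s="Genome1")`.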

Page 8

Figure: data from two different DBs

DB 1:
Gene1 hasFunction GO:0003700
Genome1 organism Escherichia coli
Genome1 organism Organism 1
Organism1 hasGenome Genome1
Organism1 inhabit Lake

DB 2:
Gene1 hasFunction GO:0003700
Genome1 organism Escherichia coli
Organism1 hasGenome Genome1
Organism 1 canProduce Enzyme 1
Enzyme 1 canUse Compound 1
Organism 1 canGrow Medium 1

The two DBs are connected by an owl:sameAs link.

How can we integrate the data from two different DBs?

1. When two DBs use the same URI, their data are already integrated.
2. If not, you can integrate the two DBs' data by adding one triple (db1:A owl:sameAs db2:B).

How can we determine whether resources in two DBs are the same?

You don't need to place all of these data in one DB management system.
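A minimal sketch of point 2, assuming hypothetical `db1:`/`db2:` URIs: the two DBs stay separate sets of triples, and one owl:sameAs triple is enough to join them. One simple way to honour such a link in code is to rewrite every URI in a sameAs group to a single representative (found here with union-find) before taking the union; a real triple store may instead apply sameAs at query time. The function names are illustrative, not part of any library.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example: DB 2 describes the same organism under a different URI.
DB1: Set[Triple] = {
    ("db1:Organism1", "hasGenome", "db1:Genome1"),
    ("db1:Organism1", "inhabit", "db1:Lake"),
}
DB2: Set[Triple] = {
    ("db2:OrgA", "canProduce", "db2:Enzyme1"),
    ("db2:Enzyme1", "canUse", "db2:Compound1"),
}
# The one extra triple from the slide: db1:A owl:sameAs db2:B.
LINKS: Set[Triple] = {("db1:Organism1", "owl:sameAs", "db2:OrgA")}


def canonical_map(links: Iterable[Triple]) -> Dict[str, str]:
    """Map each URI in an owl:sameAs group to one representative (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in links:
        if p == "owl:sameAs":
            rs, ro = find(s), find(o)
            if rs != ro:
                parent[max(rs, ro)] = min(rs, ro)  # deterministic representative
    return {x: find(x) for x in list(parent)}


def integrate(*dbs: Set[Triple], links: Iterable[Triple] = ()) -> Set[Triple]:
    """Union the DBs, rewriting sameAs-linked URIs so their triples share one node."""
    canon = canonical_map(links)
    return {
        (canon.get(s, s), p, canon.get(o, o))
        for db in dbs
        for s, p, o in db
    }
```

After `integrate(DB1, DB2, links=LINKS)`, the organism that inhabits the lake in DB 1 also carries DB 2's `canProduce` triple. Point 1 falls out of the same code: when both DBs already use the same URI, the plain set union needs no links.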

Page 9

You should describe your resources using ontologies.

An ontology is a structured controlled vocabulary for describing the properties and types of resources.

MEO (Microbes Environmental Ontology)

PDO (Pathogenic Disease Ontology)

MCCV (Microbial Culture Collection Vocabulary)

MSV (Metagenome Sample Vocabulary)

MPO (Microbial Phenotype Ontology)

MBGD Ortholog Ontology

Most of them can be obtained from

For example, an ontology helps answer questions such as: What is soil? What is the relationship between soil and sand?
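To illustrate how an is_a hierarchy lets software answer "what is the relationship between two terms?", here is a minimal sketch. The hierarchy is a toy assumption with hypothetical `ex:` terms; it is not MEO's actual structure, and real ontologies use richer relations than is_a alone. The relationship is read off the transitive closure of is_a: either one term is a subclass of the other, or they share a most specific common ancestor.

```python
from collections import deque
from typing import Dict, List, Set

# Toy is_a hierarchy for illustration only: the ex: terms and their
# structure are assumptions, NOT taken from MEO.
IS_A: Dict[str, List[str]] = {
    "ex:soil": ["ex:terrestrial"],
    "ex:sand": ["ex:terrestrial"],
    "ex:terrestrial": ["ex:environment"],
    "ex:lake": ["ex:aquatic"],
    "ex:aquatic": ["ex:environment"],
}


def ancestors(term: str) -> Set[str]:
    """All terms reachable from `term` through is_a edges (transitive closure, BFS)."""
    seen: Set[str] = set()
    queue = deque(IS_A.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, []))
    return seen


def relationship(a: str, b: str) -> str:
    """Describe how two terms relate according to the hierarchy."""
    if b in ancestors(a):
        return f"{a} is_a {b}"
    if a in ancestors(b):
        return f"{b} is_a {a}"
    shared = ancestors(a) & ancestors(b)
    # Keep only the most specific shared ancestors (those not above another shared one).
    nearest = {t for t in shared if not any(t in ancestors(u) for u in shared)}
    if not nearest:
        return f"{a} and {b} are unrelated"
    return f"{a} and {b} are both kinds of {', '.join(sorted(nearest))}"
```

With this toy data, `relationship("ex:sand", "ex:soil")` reports that both are kinds of `ex:terrestrial`, while `relationship("ex:lake", "ex:environment")` reports a direct subclass relation.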

Page 10

Ortholog: MBGD

Genome: GTPS/RefSeq

Annotation: TogoAnnotation

Culture Collection: NBRC/JCM

Metadata: INSDC SRA

Metagenome: INSDC SRA

Taxonomy: NCBI Taxonomy

http://microbedb.jp/

Gene Taxon Environment

Red color indicates our collaborators.

We have converted most of our data to RDF, developed many ontologies, and built an RDFized microbial DB.

More than 1 billion triples!

Page 11

JCM/NBRC Culture Collection data
1. Strain_Number
2. Other_Collection_Numbers
3. Name
4. Organism_Type
5. History_of_Deposit
6. Date_of_Isolation
7. Isolated_from
8. Geographic_Origin
9. Status
10. Optimum_Temperature_for_Growth
11. Maximum_Temperature_for_Growth
12. Minimum_Temperature_for_Growth
13. Medium
14. Application
15. Literature

RDF conversion example
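A minimal sketch of converting one row of such a table into RDF, serialized as N-Triples. The namespaces (`http://example.org/...`) and the column-to-predicate mapping are hypothetical placeholders, not the real MCCV terms, and only four of the fifteen columns are mapped. The values come from the NBRC example in this deck, with the pairing of values to fields being an assumption. Numbers and booleans become XSD-typed literals, as in the example data that follows ("28"^^xsd:integer, "false"^^xsd:boolean).

```python
from typing import Dict, List, Union

XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical namespaces; the real MicrobeDB.jp / MCCV URIs are not reproduced here.
STRAIN_NS = "http://example.org/strain/"
PROP_NS = "http://example.org/mccv#"

# Hypothetical column-to-predicate mapping for a few of the 15 fields above.
FIELD_TO_PREDICATE: Dict[str, str] = {
    "Strain_Number": "strainNumber",
    "Name": "name",
    "Isolated_from": "isolatedFrom",
    "Optimum_Temperature_for_Growth": "optimumTemperature",
}

Value = Union[str, int, bool]


def to_literal(value: Value) -> str:
    """Serialize a Python value as an N-Triples literal (typed for bool/int)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(strain_id: str, record: Dict[str, Value]) -> List[str]:
    """One N-Triples line per mapped, non-empty column of a culture-collection row."""
    subject = f"<{STRAIN_NS}{strain_id}>"
    lines = []
    for field, value in record.items():
        predicate = FIELD_TO_PREDICATE.get(field)
        if predicate is None or value == "":
            continue  # unmapped or empty column
        lines.append(f"{subject} <{PROP_NS}{predicate}> {to_literal(value)} .")
    return lines


# Values from the NBRC example in this deck; the field pairing is an assumption.
EXAMPLE = {
    "Strain_Number": "NBRC 12841",
    "Name": "Streptomyces griseus subsp. griseus",
    "Isolated_from": "Soil",
    "Optimum_Temperature_for_Growth": 28,
    "Medium": "",  # present in the table but not mapped here
}
```

Calling `record_to_ntriples("NBRC_12841", EXAMPLE)` yields four lines; the temperature is emitted as `"28"^^<http://www.w3.org/2001/XMLSchema#integer>` so that numeric comparisons in SPARQL behave as expected.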

Page 12

Example of NBRC Culture Collection RDF data (figure; node and edge labels recovered from the graph)

Resource: nbrc:NBRC_12841, rdf:type :MCCV_000001 (Culture)

MCCV terms with labels shown:
:MCCV_000014 "Optimal growth temperature"
:MCCV_00018 "Strain Number"
:MCCV_000017 "Type Strain"
:MCCV_000027 "History of deposit"
:MCCV_000028 "Isolated from"
:MCCV_000033 "Application"
Other properties shown: :MCCV_000012, :MCCV_00022, :MCCV_00023, :MCCV_000025, :MCCV_000026, rdfs:label, dc:identifier

Values shown:
"Streptomyces griseus subsp. griseus (Krainsky 1914) Waksman and Henrici 1948"
"28"^^<http://www.w3.org/2001/XMLSchema#integer>
"false"^^xsd:boolean
"Soil" and meo:MEO_0000007
"DSM 40226" and <http://www.dsmz.de/catalogues/details/culture/DSM-40226.html>
nbrcmedium:NBRC_227
"Thienamycins production ; Vitamin B12 (Cyanocobalamine) production ; Steroid conversion"
"IFO 12841 <-- SAJ <-- OWU (ISP 5226) <-- Squibb & Sons (F. Arnow, MD 2428, ETH 24234, NIHJ 501)"
Taxonomy links: <http://identifiers.org/taxonomy/67263>, <http://www.ncbi.nlm.nih.gov/taxonomy/67263>, <http://purl.uniprot.org/taxonomy/67263>, and the same three URI forms for 67274
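The example links each taxon to the same NCBI Taxonomy ID under three URI forms (identifiers.org, NCBI, UniProt). Since RDF data integrate automatically only where URIs are identical, emitting every common form lets the strain record join with other datasets whichever form they use. A minimal sketch of such a helper follows; the subject and predicate URIs in the usage are hypothetical, and only the three URI patterns come from the figure.

```python
from typing import List

# The three URI patterns shown for the same NCBI Taxonomy ID in the figure.
TAXONOMY_URI_TEMPLATES = (
    "http://identifiers.org/taxonomy/{}",
    "http://www.ncbi.nlm.nih.gov/taxonomy/{}",
    "http://purl.uniprot.org/taxonomy/{}",
)


def taxonomy_uris(taxid: int) -> List[str]:
    """Return every URI form of one NCBI Taxonomy ID."""
    if taxid <= 0:
        raise ValueError(f"invalid taxonomy ID: {taxid}")
    return [template.format(taxid) for template in TAXONOMY_URI_TEMPLATES]


def taxon_link_triples(subject: str, predicate: str, taxid: int) -> List[str]:
    """N-Triples lines linking `subject` to each URI form of the taxon."""
    return [f"<{subject}> <{predicate}> <{uri}> ." for uri in taxonomy_uris(taxid)]
```

For instance, `taxon_link_triples("http://example.org/strain/NBRC_12841", "http://example.org/mccv#taxon", 67274)` produces three triples, one per URI form.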

Page 13

Overall data structure of MicrobeDB.jp

Page 14

http://microbedb.jp/

Page 15

Keyword example: lake

・ Taxonomic composition of 16S amplicon sequencing data sampled from lake
・ Metagenome samples obtained from lake
・ MEO hierarchical structure
・ Abundant orthologs in metagenome samples obtained from lake
・ JCM/NBRC strains isolated from lake
・ Genome-sequenced strains isolated from lake

Example of ontology use: meo:pond is_a meo:lake and Strain_A mccv:isolation_source meo:pond, so a search for "lake" also retrieves Strain_A.

MicrobeDB.jp will facilitate the exploration of existing, scattered information on microbes.
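The pond/lake example can be sketched as query expansion: a search for a habitat term also matches every term that reaches it through is_a. The two triples below are from the slide; the Strain_B triple is hypothetical and only serves as a contrast. This is an illustration in plain Python, not the MicrobeDB.jp search code; in SPARQL 1.1 the same expansion can be written with a zero-or-more property path (`*`) over the is_a relation.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Triples from the slide, plus one hypothetical triple for contrast.
TRIPLES: List[Triple] = [
    ("meo:pond", "is_a", "meo:lake"),
    ("Strain_A", "mccv:isolation_source", "meo:pond"),
    ("Strain_B", "mccv:isolation_source", "meo:soil"),  # hypothetical
]


def subclasses(term: str, triples: List[Triple]) -> Set[str]:
    """The term itself plus every class that reaches it through is_a (transitively)."""
    result = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "is_a" and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def strains_isolated_from(term: str, triples: List[Triple]) -> List[str]:
    """Strains whose isolation source is the term or any of its subclasses."""
    targets = subclasses(term, triples)
    return sorted(s for s, p, o in triples
                  if p == "mccv:isolation_source" and o in targets)
```

A literal string match on "lake" would miss Strain_A, whose isolation source is recorded as meo:pond; expanding through is_a finds it.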

Page 16

Acknowledgements

・ Ken Kurokawa (Tokyo Institute of Technology): Junichi Takehara, Koji Yoshino, Nozomi Yamamoto, Takuji Yamada, Fumikazu Konishi

・ Yasukazu Nakamura (National Institute of Genetics, DDBJ): Takatomo Fujisawa, Eri Kaminuma, Hideaki Sugawara

・ Ikuo Uchiyama (National Institute for Basic Biology): Hirokazu Chiba, Hiroyo Nishide

Advisors (Database Center for Life Science): Shinobu Okamoto, Shuichi Kawashima, Toshiaki Katayama, Yasunori Yamamoto, Shoko Kawamoto

NBRC Culture Collection data: Ken'ichiro Suzuki, Masami Ichihara, Natsuko Ichikawa

JCM Culture Collection data: Moriya Ohkuma, Takuji Kudo

Funding

http://microbedb.jp/