biosemantics semantic support technology for on-line knowledge tracking and discovery second order...

25
BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and t umor necro s is factor a lpha, t he cells acquire a mature phenotype of dendritic cells t hat is characterized by up-r egulation of human l eukocy t e a ntigen (C D80, C D86, CD40 and C D54 and appearance of CD83 . These

Upload: eugene-dennis

Post on 16-Jan-2016

217 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

BIOSEMANTICSSemantic Support Technology

For on-line Knowledge Tracking and Discovery

Second Order Semantic Enrichment

CD40 ligand and tumor necrosis factor alpha, the cells acquire a mature phenotype of dendritic cells that is characterized by up-regulation of human leukocy

te antigen (CD80, CD86, CD40and CD54 and appearance of CD83. These

Page 2: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

Too much to read: major trends foreseen:

• From Reading to Consulting

• From Reading to Meta Analysis

• From Writing to Knowledge Representations

• To Central AND Distributed Annotation

Page 3: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

textminingWriting =ambiguity

Future (hope)

Papyrust

Page 4: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

What do we do ?

• Disambiguate Text and tag/link concepts– Pre-done for ‘own’ content– On the fly for selected web environments

• Meta-analyse at concept level

• Provide meta-analysed information

• Support Information Based Knowledge Discovery (especially new associations)

Page 5: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

Ambiguity 1: Synonyms

• Facilitating networks of information. van Mulligen EM, Diwersy M, Schmidt M, Buurman H, Mons BProceedings of AMIA Symposium 2000, 868-72

Page 6: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

Ambiguity 2: Homonyms

PSAProstate Specific AntigenPSoriatic Arthritisalpha-2,8-PolySialic AcidPolySubstance AbusePicryl Sulfonic AcidPolymeric Silicic AcidPartial Sensory AgnosiaPoultry Science Association

• Distribution of information in biomedical abstracts and full-text publications, Schuemie MJ, Weeber M, Schijvenaars BJ, van Mulligen EM, van der Eijk CC, Jelier R, Mons B, Kors JA, Bioinformatics 2004 Nov 1, 20:2597-604

Page 7: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

But…we have nomenclature committees now……

DEFB4DEFB4defensin, beta 4defensin, beta 4SAP1SAP1, HBD-2, DEFB-2, DEFB102, DEFB2, HBD-2, DEFB-2, DEFB102, DEFB2

ELK4ELK4ELK4, ETS-domain protein (SRF accessory protein 1)ELK4, ETS-domain protein (SRF accessory protein 1) SAP1SAP1

PSAPPSAPproposin (variant Gaucher disease and variant proposin (variant Gaucher disease and variant metachromatic leukodystrophy)metachromatic leukodystrophy)SAP1SAP1, GLBA, GLBA

PRESENT

Page 8: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

• Contextual annotation of web pages for interactive browsing, van Mulligen E, Diwersy M, Schijvenaars B, Weeber M, van der Eijk CC, Jelier R, Schuemie M, Kors J, Mons B, Medinfo 2004, 11:94-8• Which gene did you mean?, Mons B, BMC Bioinformatics 2005 Jun 7, 6:142

First order semantic enrichment

The Knowlet

2nd order S.E.

Page 9: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

Second Order Semantic Enrichment1: Creating Reference Knowlets

PSA Prostate Specific Antigen

PSA Psoriatic Arthritis

ReferenceKnowlet

ReferenceKnowlet

Page 10: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

2. Context matching

PSA ??

Prostate Specific Antigen

Psoriatic Arthritis

ReferenceKnowlet

ReferenceKnowlet

New text

93 % correct in ‘Worst Case Scenario’98 % overall….

• Thesaurus-based disambiguation of gene symbols. Schijvenaars BJ, Mons B, Weeber M, Schuemie MJ, van Mulligen EM, Wain HM, Kors JABMC Bioinformatics 2005 Jun 16, 6:149•Word sense disambiguation in the biomedical domain: an overview. Schuemie MJ, Kors JA, Mons B, Journal of Computational Biology 2005 Jun, 12:554-65

x

Page 11: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

Text (free or structured)

Resolving ambiguities (contextual reference concepts)

Concept Tagging and inserting appropriate links

Basic methodology:Concept Tagging, creation and systematic aggregation of Knowlets

Text Knowlet

Object Knowlets (people, diseases, drugs, genes)

Collection Knowlets ( category, pathway, Micro-array-gene-set )

Aggregation of Object Knowlets

Aggregation of Text Knowlets

Page 12: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

person organisation Object 1

gene

Object 2

disease

Object 3

drug

> 15 million Knowlets from PubMed etc.

3. Building an association matrix of large data sources

Page 13: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

 

1

0.16 1

0.30 0.03 1

0.28 0.35 0.20 1

0.188

0.004 0.15 0.13 1

A matrix of associative distances

meta-analysis

HierarchicalClusteringACSMDSEtc.

Page 14: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

4. Meta-analysis method 1: ACS

• Constructing an Associative Concept Space for Literature-based Discovery, van der Eijk CC, van Mulligen EM, Kors JA, Mons B, van den Berg JJournal of the American Society for Information Science and Technology 2004, 55(5): 436-444•Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes. Jelier R, Jenster G, Dorssers LC, van der Eijk CC, •van Mulligen EM, Mons B, Kors JA Bioinformatics 2005 May 1, 21:2049-58

Page 15: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

Function unknow nChaperonesChromatin structureFibrous proteinsmRNA metabolismOthersRibosomal proteinsRibosome biogenesisTranslation

SRPPARN

l

• Assignment of protein function and discovery of new nucleolar proteins based on automatic analysis of MEDLINE. Martijn Schuemie, Christine Chichester, Frederique Lisaceck, Yohann Coute, Peter-Jan Roes, Jean Charles Sanchez, Barend MonsTo be Submitted

Page 16: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

Fingerprints

Knowlet 

Association Matrix Meta-analysis

Expert Challenge

WikiZ/PExpert comments

Peer to Peer Review

Final Approval

U.W. Fingerprint

Update

Literature

Protein A

Page 17: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

0.1

0.4

0.9

New publications or annotations

Solid

Liquid

Gas

1st order Semantic enrichment

ReductionFalse Positives

DiscussionVoting in Wiki

Meta-analysisProximity measures

Proposals to Data bases ?

Central Annotation

Page 18: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

• REGISTRATION (1X)

• Unique Author ID• E-mail Adress• PHP/userpage• People Knowlets

• Unique concept ID• Language variants• Homonyms• Definitions (brief)• Object Knowlets

Science Wiki’s• UID from WiktionaryZ• Research information• Talk-page• Liquid Threads• Object Knowlets

• UID from WiktionaryZ• Articles about UID’s• Encyclopaedic/ NPOV• Anonymous allowed

Page 19: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

Dr. Johan den Dunnen

Wiki-Authors

OMIM

NPOV

DMD (Hs)

MEI

Wiki-Proteins

DMD (Hs)

AOI

Page 20: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis

Freely provided by

• Original Source • WZ definition (choice)• Related concepts (Knowlet)• Experts (Wiki-Authors)• More about (Wiki-proteins)• Vote (Wiki-Proteins)• Wikipedia article• Browse (Knowlet Browser)• Google (e-vamp)

Page 21: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis
Page 22: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis
Page 23: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis
Page 24: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis
Page 25: BIOSEMANTICS Semantic Support Technology For on-line Knowledge Tracking and Discovery Second Order Semantic Enrichment CD40 ligand and tumor necro sis