MedOnto: Medical Ontology Learning System (Work in Progress) Syed Farrukh Mehdi Reza Fathzadeh S. M. Faisal Abbas (Presenter) {fmehdi,reza,fabbas}@cs.dal.ca 1

MedOnto: Medical Ontology Learning System

(Work in Progress)

Syed Farrukh Mehdi
Reza Fathzadeh
S. M. Faisal Abbas (Presenter)
{fmehdi,reza,fabbas}@cs.dal.ca


Introduction

Ontology
  ◦ Machine-readable information
Text
  ◦ Human-readable information; most current information is text
Ontology Learning
  ◦ (Semi-)automatic extraction of relevant concepts and relations
Medical Domain


Methodology

Syntax-based concept learning augmented with domain-specific subject corpora
  ◦ Domain-specific knowledge base
  ◦ Syntax-based extraction


Domain Specific Corpus

Medical Domain Terminology
  ◦ OpenGALEN project
  ◦ GALEN Terminology Server
For other domains, a domain-specific terminology corpus should be used.


Syntax Based Extraction Levels

Paul Buitelaar


Term Extraction

Parsing
  ◦ Linguistic method: uses production rules specified by linguists
  ◦ Statistical method: uses statistical models derived from written text
We used the Stanford NLP Parser, which is a statistical parser
Dependency trees are used instead of parse trees


Synonym Extraction

Domain-specific terminology corpus
  ◦ GRAIL Terminology Server for the medical domain
Language corpus for general concepts
  ◦ WordNet for the English language
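A minimal sketch of the two-corpus lookup follows. It assumes the domain-specific corpus is consulted before the general language corpus, which the slide does not state. The dictionaries and the name `find_synonyms` are hypothetical stand-ins; a real system would query the terminology server and WordNet instead.

```python
# Hypothetical in-memory stand-ins for the two corpora (illustrative only).
DOMAIN_SYNONYMS = {
    "myocardial infarction": {"heart attack"},
}
GENERAL_SYNONYMS = {
    "doctor": {"physician"},
}


def find_synonyms(term):
    """Return (source, synonyms), trying the domain corpus first.

    Assumption: domain-specific entries take priority over general ones.
    """
    key = term.lower().strip()
    if key in DOMAIN_SYNONYMS:
        return ("domain", sorted(DOMAIN_SYNONYMS[key]))
    if key in GENERAL_SYNONYMS:
        return ("general", sorted(GENERAL_SYNONYMS[key]))
    return ("unknown", [])
```

Returning the source alongside the synonyms keeps it visible which corpus answered, which may help when the two corpora disagree.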


Concept Extraction

Intension
  ◦ Formal and informal definition of terms
Extension
  ◦ Deriving concepts
Linguistic Realization
  ◦ Concept coverage


Terminal and Compound Concepts

Terminal Concepts
  ◦ Nouns, noun phrases
Compound Concepts
  ◦ Defined rules
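As an illustration of terminal concepts, the sketch below collects maximal runs of noun-tagged tokens from a POS-tagged sentence. The input is hand-tagged with Penn Treebank tags, and the function name `terminal_concepts` is hypothetical; in the described system the tags would come from the parser.

```python
def terminal_concepts(tagged):
    """Return maximal runs of noun tokens (tags starting with NN) as phrases."""
    concepts, run = [], []
    for word, tag in tagged:
        if tag.startswith("NN"):
            run.append(word)
        else:
            # A non-noun token closes the current noun run, if any.
            if run:
                concepts.append(" ".join(run))
            run = []
    if run:
        concepts.append(" ".join(run))
    return concepts


# Hand-tagged example sentence (an assumption, not parser output).
sentence = [("blood", "NN"), ("pressure", "NN"), ("rises", "VBZ"), ("quickly", "RB")]
print(terminal_concepts(sentence))
```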


Relation Extraction

Concepts are related
Defined rules


Rules (IN)

IN: subordinating conjunction (FUNC_WORD) or preposition (PREP)
  ◦ “of”
Candidate for taxonomy
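One way the IN rule could be sketched: scan a POS-tagged sentence for the preposition "of" with a noun on each side and report the pair as a taxonomy candidate. The direction of the eventual taxonomic link is left open because the slide does not specify it. The function name `of_candidates` and the hand-tagged input are assumptions.

```python
def of_candidates(tagged):
    """Return (left_noun, right_noun) pairs joined by the preposition 'of'."""
    pairs = []
    for i in range(1, len(tagged) - 1):
        word, tag = tagged[i]
        if tag == "IN" and word.lower() == "of":
            left_word, left_tag = tagged[i - 1]
            right_word, right_tag = tagged[i + 1]
            # Only noun-of-noun patterns are treated as candidates here.
            if left_tag.startswith("NN") and right_tag.startswith("NN"):
                pairs.append((left_word, right_word))
    return pairs


# Hand-tagged example phrase (an assumption, not parser output).
phrase = [("inflammation", "NN"), ("of", "IN"), ("pancreas", "NN")]
print(of_candidates(phrase))
```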


Rules (CC)

CC: coordinating conjunction
  ◦ “and”, “or”, etc.
  ◦ Compound concepts, broken into terminal concepts
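The CC rule can be pictured as splitting a compound concept at each coordinating conjunction. The sketch below works on a flat, hand-tagged token list rather than a dependency tree, which is a simplification; `split_on_cc` is a hypothetical name.

```python
def split_on_cc(tagged):
    """Break a compound concept into the phrases between CC tokens."""
    parts, current = [], []
    for word, tag in tagged:
        if tag == "CC":
            # The conjunction itself is dropped; it only marks the boundary.
            if current:
                parts.append(" ".join(current))
            current = []
        else:
            current.append(word)
    if current:
        parts.append(" ".join(current))
    return parts


# Hand-tagged example (an assumption, not parser output).
compound = [("chest", "NN"), ("pain", "NN"), ("or", "CC"), ("fever", "NN")]
print(split_on_cc(compound))
```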


Rules (RB, DT, PDT)

RB: adverb and adverbial phrase
DT: determiner/demonstrative pronoun
Ignored in our work so far


Rule (VB)

The verb is used as the relation between subject and object
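A minimal sketch of the VB rule, assuming the dependency tree is available as (head, label, dependent) triples and that subjects and direct objects carry the labels `nsubj` and `dobj`, as in Stanford typed dependencies. The edges below are hand-written and the function name `verb_relations` is hypothetical.

```python
def verb_relations(edges):
    """Return (subject, verb, object) triples from dependency edges."""
    subjects, objects = {}, {}
    for head, label, dependent in edges:
        if label == "nsubj":
            subjects[head] = dependent
        elif label == "dobj":
            objects[head] = dependent
    # A verb yields a relation only when it has both a subject and an object.
    return [(subjects[v], v, objects[v]) for v in subjects if v in objects]


# Hand-written edges for "Aspirin reduces fever" (an assumption).
edges = [("reduces", "nsubj", "Aspirin"), ("reduces", "dobj", "fever")]
print(verb_relations(edges))
```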


Rule (JJ+NN -> NP)

JJ: adjective
NN: common noun
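The JJ+NN rule can be sketched as a single pass that merges an adjective and the common noun immediately after it into one noun-phrase token. The hand-tagged input and the name `merge_jj_nn` are assumptions for illustration.

```python
def merge_jj_nn(tagged):
    """Merge each adjacent (JJ, NN) pair into a single token tagged NP."""
    merged = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        next_is_noun = i + 1 < len(tagged) and tagged[i + 1][1] == "NN"
        if tag == "JJ" and next_is_noun:
            merged.append((word + " " + tagged[i + 1][0], "NP"))
            i += 2  # both tokens were consumed by the merge
        else:
            merged.append((word, tag))
            i += 1
    return merged


# Hand-tagged example (an assumption, not parser output).
print(merge_jj_nn([("severe", "JJ"), ("headache", "NN"), ("persists", "VBZ")]))
```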


Algorithm

Recursive, until the dependency tree is exhausted
Create compound concepts and relate them with the rule, then apply the rules to the sub-phrases
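The recursion could look roughly like the sketch below. It is a simplification under stated assumptions: it works on a flat, hand-tagged token list instead of a dependency tree, it splits only at IN and CC tokens, and it always splits at the first such token. The names `extract`, `phrase_text`, and `SPLIT_TAGS` are hypothetical.

```python
SPLIT_TAGS = ("CC", "IN")  # assumption: only these two rules split a phrase


def phrase_text(tagged):
    """Join the words of a tagged phrase into one string."""
    return " ".join(word for word, _ in tagged)


def extract(tagged, concepts=None, relations=None):
    """Record the phrase as a concept, relate it to its parts, then recurse."""
    if concepts is None:
        concepts, relations = [], []
    phrase = phrase_text(tagged)
    concepts.append(phrase)
    for i, (_, tag) in enumerate(tagged):
        if tag in SPLIT_TAGS and 0 < i < len(tagged) - 1:
            left, right = tagged[:i], tagged[i + 1:]
            # Relate the compound concept to each sub-phrase via the rule tag.
            relations.append((phrase, tag, phrase_text(left)))
            relations.append((phrase, tag, phrase_text(right)))
            # Apply the same rules to the sub-phrases.
            extract(left, concepts, relations)
            extract(right, concepts, relations)
            break
    return concepts, relations


# Hand-tagged example phrase (an assumption, not parser output).
tokens = [("pain", "NN"), ("of", "IN"), ("chest", "NN"), ("and", "CC"), ("arm", "NN")]
concepts, relations = extract(tokens)
print(concepts)
print(relations)
```

A phrase with no IN or CC token is recorded as a concept and not split further, which ends the recursion.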


Other Work

Framework   Institution                         Reference
ASIUM       INRIA, Jouy-en-Josas                Faure and Nedellec 1999
TextToOnto  AIFB, University of Karlsruhe       Madche and Volz 2001
HASTI       Amir Kabir University, Teheran      Shamsfard and Barforoush 2004
OntoLT      DFKI, Saarbrucken                   Buitelaar et al. 2004
DOODLE      Shizuoka University                 Morita et al. 2004
Text2Onto   AIFB, University of Karlsruhe       Cimiano and Volker 2005
OntoLearn   University of Rome                  Velardi et al. 2005
OLE         Brno University of Technology       Novacek and Smrz 2005
OntoGen     Institute Jozef Stefan, Ljubljana   Fortuna et al. 2007
GALeOn      Technical University of Madrid      Manzano-Macho et al. 2008
DINO        DERI, Galway                        Novacek et al. 2008
OntoLancs   Lancaster University                Gacitua et al. 2008
RELExO      AIFB, University of Karlsruhe       Volker and Rudolph 2008
OntoComp    University of Dresden               Sertkaya 2008


References

[Buitelaar05] Paul Buitelaar et al. Ontology Learning from Text. October 3rd, 2005.
[Kim09] Jin-Dong Kim et al. Overview of BioNLP’09 Shared Task on Event Extraction.
[Stuck] Heiner Stuckenschmidt, Johanna Völker. Semantic Technologies: Ontology Learning.
[Biemann] Chris Biemann. Ontology Learning from Text: A Survey of Methods.
[StanParser] http://nlp.stanford.edu/software/lex-parser.shtml
[WordNet] http://wordnet.princeton.edu/
[OpenGALEN] http://www.opengalen.org/


Please provide us with comments and directions.

Thank you.