Post on 16-Dec-2015


Page 1: Semantic Browser LSDIS (Large Scale Distributed Information Systems) Lab. Bilal Gonen M.Sc. in Computer Science University of Georgia gonen@uga.edu

Semantic Browser

• LSDIS (Large Scale Distributed Information Systems) Lab.

Bilal Gonen

M.Sc. in Computer Science

University of Georgia

gonen@uga.edu

Page 2

• The Semantic Browser tool lets users traverse semantically connected documents easily. The documents are connected by relationships such as “causes”, “adjacent to”, and “produces”.

Semantic Browser

Page 3

• The PubMed dataset is used (48,252 documents).

• MESH terms in these documents are annotated with the UMLS ontology.

• The ontology we generated from the UMLS has:

• 135 classes and 49 relationships at the schema level.

• 21,945 entity instances at the instance level.

Dataset
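The two levels described on this slide can be pictured with a small data structure. The sketch below is a hedged illustration in Python, not the format the lab actually used (the slides only say the ontology is stored as RDF). The class names `Phenomenon` and `Disease` and the `Ontology` helper are hypothetical and chosen only to make the example readable.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy two-level ontology: schema (classes, relationships) and instances."""

    # Schema level: class names and (domain class, relationship, range class).
    classes: set = field(default_factory=set)
    relationships: set = field(default_factory=set)
    # Instance level: entity term -> set of classes it is an instance of.
    instances: dict = field(default_factory=dict)

    def add_relationship(self, domain, name, range_):
        self.classes.update({domain, range_})
        self.relationships.add((domain, name, range_))

    def add_instance(self, term, cls):
        self.classes.add(cls)
        self.instances.setdefault(term, set()).add(cls)

    def summary(self):
        # Count distinct relationship names, as the slide does ("49 relationships").
        names = {name for _, name, _ in self.relationships}
        return {
            "classes": len(self.classes),
            "relationships": len(names),
            "instances": len(self.instances),
        }


# Hypothetical miniature ontology (the real one has 135 / 49 / 21,945).
onto = Ontology()
onto.add_relationship("Phenomenon", "causes", "Disease")
onto.add_instance("ultraviolet rays", "Phenomenon")
onto.add_instance("melanoma", "Disease")
print(onto.summary())
```

For the toy data, `summary()` reports 2 classes, 1 relationship, and 2 instances, which mirrors the kind of counts given on the slide.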

Page 4

[Figure: example knowledge-base graph. Node labels: Lymphocytosis, Cancers, diseases, Gonads, Nafenopin, ultraviolet rays, non-melanoma, melanoma, Skin cancer, Magnetics, Flu, blood cancer, Khellin, Hyptis. Edge label: causes.]

• How can we use this knowledge base to let the user traverse among the documents?

Page 5

ultraviolet rays → causes → melanoma

Doc-to-Doc

Using the fact “ultraviolet rays causes melanoma”, the user can traverse from a document that mentions “ultraviolet rays” to other documents that contain the term “melanoma”.
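The doc-to-doc idea can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the document ids, abstracts, and the `traverse` and `docs_mentioning` helpers are hypothetical, and plain substring matching stands in for the Lucene lookup used in the real system.

```python
# Hypothetical toy data: document id -> abstract text.
DOCS = {
    "doc1": "Exposure to ultraviolet rays was measured in outdoor workers.",
    "doc2": "Melanoma incidence rose over the study period.",
    "doc3": "Khellin was evaluated as a treatment.",
}

# One fact from the knowledge base: (subject, relationship, object).
TRIPLES = [("ultraviolet rays", "causes", "melanoma")]


def docs_mentioning(term, docs):
    """Return the sorted ids of documents whose text contains the term."""
    term = term.lower()
    return sorted(d for d, text in docs.items() if term in text.lower())


def traverse(current_doc, docs, triples):
    """For each triple whose subject occurs in the current document,
    list the other documents that mention the triple's object."""
    text = docs[current_doc].lower()
    links = {}
    for subject, relation, obj in triples:
        if subject.lower() in text:
            targets = [d for d in docs_mentioning(obj, docs) if d != current_doc]
            links[(subject, relation, obj)] = targets
    return links


print(traverse("doc1", DOCS, TRIPLES))
# {('ultraviolet rays', 'causes', 'melanoma'): ['doc2']}
```

Starting from `doc1`, the only outgoing link follows “causes” to `doc2`; a document that mentions no triple subject, such as `doc3`, yields no links.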

Page 6

JSP (JavaServer Pages)

JavaScript

AJAX

user

Lucene index for documents

PubMed dataset

Ontology

SemDis API

Lucene indexing is used to index the documents by the 21,945 MESH terms that occur in them.
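The effect of this indexing step is a mapping from each vocabulary term to the documents in which it occurs. The sketch below shows that mapping with a plain Python dictionary; it does not use Lucene and is not how Lucene works internally. The function name `build_index`, the document ids, and the sample texts are assumptions made for illustration.

```python
import re


def build_index(docs, terms):
    """Map each vocabulary term to the set of document ids containing it.

    Matching is case-insensitive and respects word boundaries, so the
    term "flu" does not match inside "influenza".
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        for term in terms
    }
    index = {}
    for doc_id, text in docs.items():
        lowered = text.lower()
        for term, pattern in patterns.items():
            if pattern.search(lowered):
                index.setdefault(term, set()).add(doc_id)
    return index


# Hypothetical miniature corpus and vocabulary.
docs = {
    "d1": "Ultraviolet rays and skin cancer.",
    "d2": "Skin cancer and melanoma.",
    "d3": "Flu season.",
}
terms = ["skin cancer", "melanoma", "flu", "magnetics"]

index = build_index(docs, terms)
print(sorted(index["skin cancer"]))  # ['d1', 'd2']
```

Only terms from the controlled vocabulary become index keys, so a term that never occurs (here `magnetics`) has no entry at all.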

User Interface (HTML page)

Because the request is for the documents in which the requested term appears, the JSP calls the Lucene methods in the Java class.

The advantage of AJAX is that only the needed information is sent and received between the client and the server. Instead of reloading the whole page, only the response received from the server is embedded into the HTML page.

To begin traversing, the user types any MESH term or one of its synonyms into the user interface.

Next, the user hovers over any MESH term in the document's abstract.

Because the request is for the types of the MESH term, the JSP calls the corresponding SemDis method in the Java class.

The SemDis API gets the types of the instance term from the ontology.

request

response

keyword

related documents

Built in the LSDIS Lab. This API is used to process the triples in the ontology.

Contains 135 classes and 49 relationships at the schema level, and 21,945 entity instances at the instance level.

Contains 48,252 documents

A list of the documents is returned from the Lucene index.

The user clicks on one of the file names, and the abstract of that file is embedded into the HTML page.

The MESH term is sent to the server as a request to get its types from the ontology through the SemDis API.

By hovering the mouse over a class name returned from the server, the user makes another request to the SemDis API to get the relationships outgoing from that class. The process continues in this way until the user gets the list of file names to traverse to.
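The chain of requests described above (term, then its types, then the relationships outgoing from a type, and finally a list of documents) can be sketched as a sequence of lookups. Everything in this Python example is a stand-in: the functions `types_of`, `outgoing_relationships`, `related_terms`, and `documents_for` are hypothetical and are not the SemDis or Lucene APIs, and the intermediate step that resolves a relationship to related terms is an assumption about how the chain ends in file names.

```python
# Hypothetical toy data standing in for the ontology and the document index.
TYPES = {"ultraviolet rays": ["Phenomenon"], "melanoma": ["Disease"]}
SCHEMA = [("Phenomenon", "causes", "Disease")]          # class-level relationships
FACTS = [("ultraviolet rays", "causes", "melanoma")]    # instance-level triples
INDEX = {"ultraviolet rays": ["doc1"], "melanoma": ["doc2", "doc7"]}


def types_of(term):
    """First hover: classes that the term is an instance of."""
    return TYPES.get(term, [])


def outgoing_relationships(cls):
    """Second hover: relationship names whose domain is the given class."""
    return sorted({rel for domain, rel, _ in SCHEMA if domain == cls})


def related_terms(term, rel):
    """Terms reachable from `term` through the chosen relationship."""
    return sorted(o for s, r, o in FACTS if s == term and r == rel)


def documents_for(term):
    """Final step: documents indexed under the related term."""
    return INDEX.get(term, [])


def browse(term):
    """Walk term -> types -> relationships -> related terms -> documents."""
    steps = []
    for cls in types_of(term):
        for rel in outgoing_relationships(cls):
            for target in related_terms(term, rel):
                steps.append((cls, rel, target, documents_for(target)))
    return steps


print(browse("ultraviolet rays"))
# [('Phenomenon', 'causes', 'melanoma', ['doc2', 'doc7'])]
```

In the browser each of these lookups would be a separate AJAX request triggered by a hover or click; here they are composed in one function only to show the order of the steps.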

Because we also used the synonyms of the 21,945 MESH terms, about 104,000 terms in total were used to index the documents.
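One way to picture the synonym expansion is a lookup table that maps every surface form, preferred term or synonym, back to its preferred term. The Python sketch below is an assumption about the general technique, not the lab's code; the synonym lists and the names `build_synonym_lookup` and `canonical` are hypothetical.

```python
def build_synonym_lookup(synonyms):
    """Map every surface form (preferred term or synonym) to its preferred term."""
    lookup = {}
    for term, alternatives in synonyms.items():
        lookup[term.lower()] = term
        for alt in alternatives:
            lookup[alt.lower()] = term
    return lookup


# Hypothetical synonym lists for two terms.
SYNONYMS = {
    "melanoma": ["malignant melanoma"],
    "flu": ["influenza", "grippe"],
}

lookup = build_synonym_lookup(SYNONYMS)


def canonical(query):
    """Resolve what the user typed to a preferred term, or None if unknown."""
    return lookup.get(query.strip().lower())


print(len(lookup))             # 5 surface forms for 2 preferred terms
print(canonical("Influenza"))  # flu
```

The same ratio shows up in the slide's figures: roughly 104,000 surface forms for 21,945 terms is a little under five forms per term on average.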

Page 7

• User Interface (HTML page)

• SemDis API (to process the ontology)

• Ant & Tomcat (to deploy to the servlet container)

• AJAX (Asynchronous JavaScript And XML)

• JSP (JavaServer Pages)

• JavaScript (several XMLDOM and DHTML functions are used)

• Ontologies in RDF format

• Lucene indexing

• Eclipse 3.1.2 for Java coding

Technology used

Page 8

Thank you.