Self-Adaptive Based Natural Language Interface for Disambiguation of Semantic Search

NURFADHLINA MOHD SHAREF (nurfadhlina@upm.edu.my)
MOHAMMAD YASSER SHAFAZAND (79.zand@gmail.com)

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, UNIVERSITI PUTRA MALAYSIA
SERDANG, SELANGOR, MALAYSIA

Post on 08-Jul-2015

DESCRIPTION

Semantic technology enhances big data advancements by allowing sophisticated analysis of texts. Through Linked Data technology, a tremendous amount of information can be connected. However, this connected information inherits ambiguity when it is manipulated for purposes such as natural language interfaces, semantic search and question answering. Few works address ambiguity in semantic search. This paper introduces a technique based on self-adaptive disambiguation which utilizes the possible concept annotations of terms in natural language queries. This allows users to compose queries in natural language and receive accurate answers without having to master the formal syntax of the semantic query language.

TRANSCRIPT

Page 1: Self adaptive based natural language interface for disambiguation of

Self-Adaptive Based Natural Language Interface for Disambiguation of Semantic Search

NURFADHLINA MOHD SHAREF (nurfadhlina@upm.edu.my)
MOHAMMAD YASSER SHAFAZAND (79.zand@gmail.com)

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, UNIVERSITI PUTRA MALAYSIA
SERDANG, SELANGOR, MALAYSIA

"Big Data" refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze (McKinsey).

"Linked Data" stands for semantically well-structured, interconnected, syntactically interoperable datasets that are distributed among several repositories, either inside or outside organisations (http://www.semantic-web.at/big-data-linked-data).

Utilizing Linked Data and Big Data for organisational and enterprise purposes will be one of the next big challenges in the evolution of the web.

Big Data takes account of the fact that new techniques and technologies are needed for the sustainable and socially balanced exploitation of huge data pools. The Linked Data paradigm is one approach to cope with Big Data, as it advances the hypertext principle from a web of documents to a web of rich data.

Semantic Web: a webby way to link data

Open Data meets the Semantic Web: Linked Open Data

http://www.semantic-web-journal.net/system/files/swj488.pdf

One of the key challenges in making use of Big Data lies in finding ways of dealing with heterogeneity, diversity, and complexity of the data, while its volume and velocity forbid solutions available for smaller datasets as based, e.g., on manual curation or manual integration of data. Semantic Web Technologies are meant to deal with these issues, and indeed since the advent of Linked Data a few years ago, they have become central to mainstream Semantic Web research and development.

We can easily understand Linked Data as being a part of the greater Big Data landscape, as many of the challenges are the same. The linking component of Linked Data, however, puts an additional focus on the integration and conflation of data across multiple sources.

BIG DATA

Volume, Velocity, Variety, Value and Veracity

Supercomputing

Internet of Things

Semantic Web

Social Science

Smart Data

Smart data makes sense out of Big Data (http://amitsheth.blogspot.com/2013/06/transforming-big-data-into-smart-data.html).

It provides value by harnessing the challenges posed by the volume, velocity, variety and veracity of big data, in turn providing actionable information and improving decision making.

It uses background knowledge, experiences, and advanced and contextualized reasoning, and is often highly personalized.

It focuses on the actionable value in the data creation, processing and consumption phases, for improving the human experience.

5 Steps to Turn Big Data into Smart Data

http://tdwi.org/Articles/2014/07/15/Turning-Big-Data-into-Smart-Data-2.aspx?Page=1

1. Add meaning

2. Add context

3. Embrace Graphs

4. Iterate

5. Adopt standards

Natural Language Query and Generated SPARQL

Query: What is the lowest point in kansas?

    SELECT ?c0
    WHERE {
      ?c0 ?p0 ?i0 .
      ?c0 a geo:LoPoint .
      filter (?i0 = geo:kansas) .
      filter (?p0 = geo:isLowestPointOf) .
    }

Query: What is the area of idaho?

    SELECT ?i0
    WHERE {
      ?c0 ?p0 ?i0 .
      filter (?c0 = geo:idaho) .
      filter (?p0 = geo:stateArea) .
    }

Query: what states border oklahoma?

    SELECT ?i0
    WHERE {
      ?c0 ?p0 ?i0 .
      ?i0 a geo:State .
      filter (?c0 = geo:oklahoma) .
      filter (?p0 = geo:borders) .
    }

Query: what is the population of oregon?

    SELECT ?i0
    WHERE {
      ?c0 ?p0 ?i0 .
      filter (?c0 = geo:oregon) .
      filter (?p0 = geo:statePopulation) .
    }
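The four generated queries share one template: a generic triple pattern, an optional type constraint on the answer variable, and two filters. The sketch below shows one way such a template could be filled in Python. The function name `build_select_query` and its parameters are hypothetical and are not taken from the slides. The slides also do not show the declaration of the `geo:` prefix, so the generated text would still need a matching PREFIX line before it could be run against a SPARQL endpoint.

```python
def build_select_query(instance, prop, instance_is_subject, answer_class=None):
    """Fill the shared query template for one annotated instance and property.

    instance_is_subject=True binds the known instance to ?c0 and selects ?i0;
    False binds it to ?i0 and selects ?c0. answer_class, if given, constrains
    the type of the selected variable. (Hypothetical helper, not from the slides.)
    """
    known, answer = ("?c0", "?i0") if instance_is_subject else ("?i0", "?c0")
    lines = [f"SELECT {answer}", "WHERE {", "  ?c0 ?p0 ?i0 ."]
    if answer_class is not None:
        lines.append(f"  {answer} a {answer_class} .")
    lines.append(f"  FILTER ({known} = {instance}) .")
    lines.append(f"  FILTER (?p0 = {prop}) .")
    lines.append("}")
    return "\n".join(lines)


# Rebuild the query for "what states border oklahoma?"
query = build_select_query("geo:oklahoma", "geo:borders", True, "geo:State")
print(query)
```

Calling the same function with `instance_is_subject=False` and `answer_class="geo:LoPoint"` reproduces the shape of the "lowest point in kansas" query, where the known instance sits in the object position.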

Ambiguities in Querying Big Data

Ambiguity arises:

when there is more than one possible concept annotation for a word in the NL input;

when a word in the NL input cannot be matched with any KB concept;

when there is more than one possible SPARQL pattern while constructing the query.
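The first two kinds of ambiguity can be detected with a simple lookup of each word against the knowledge base vocabulary. The following is a minimal sketch under stated assumptions: the lexicon contents and the function `classify_tokens` are hypothetical, and `geo:HiPoint` is an assumed concept name added only to make "point" ambiguous. The third kind of ambiguity arises later, during query construction, and is not covered here.

```python
# Hypothetical lexicon: word -> candidate knowledge-base concepts.
LEXICON = {
    "point": ["geo:LoPoint", "geo:HiPoint"],  # geo:HiPoint is an assumed name
    "kansas": ["geo:kansas"],
    "lowest": ["geo:isLowestPointOf"],
}


def classify_tokens(tokens, lexicon=LEXICON):
    """Label each token by how many concept annotations it has."""
    report = {}
    for token in tokens:
        candidates = lexicon.get(token, [])
        if len(candidates) > 1:
            report[token] = "ambiguous"   # more than one possible annotation
        elif not candidates:
            report[token] = "unmatched"   # no KB concept matches the word
        else:
            report[token] = "resolved"    # exactly one annotation
    return report


report = classify_tokens(["lowest", "point", "kansas", "elevation"])
print(report)
```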

Self Adaptive Model for Semantic Data Search in Big Data

Input: NL query

Output: Answer

Process:

1. Load the ontology and build a matrix of the object properties, classes and instances, and their connections.

2. Let T be the tokenized and stemmed NL query.

3. For each t ∈ T, let A be the set of annotations based on relevant concepts.

4. For each a ∈ A:

   a. Create and add possible triplets, filters and options statements using the dictionary and reasoner (using bottom-up reasoning rules).

   b. Create new SPARQL syntax using 4(a).

   c. Run the SPARQL and send the statements and results to the reasoner.

5. Return the last created SPARQL syntax which has results.
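The process above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: an in-memory list of triples stands in for the ontology and the SPARQL engine, the reasoner is omitted, the stemmer is deliberately crude, and all function names, the lexicon, and the instance `geo:oklahomaCity` are hypothetical. It also assumes that candidates are formed by pairing each instance annotation with each property annotation, which the slides do not specify.

```python
from itertools import product

# Step 1 stand-in: a toy set of triples instead of a loaded ontology.
TRIPLES = [
    ("geo:oklahoma", "geo:borders", "geo:kansas"),
    ("geo:oklahoma", "geo:borders", "geo:texas"),
    ("geo:kansas", "geo:borders", "geo:oklahoma"),
]

# Hypothetical dictionary: stemmed word -> candidate (kind, concept) annotations.
# geo:oklahomaCity is made up to create an ambiguous annotation.
LEXICON = {
    "border": [("property", "geo:borders")],
    "oklahoma": [("instance", "geo:oklahomaCity"), ("instance", "geo:oklahoma")],
}


def tokenize_and_stem(query):
    """Step 2: lowercase, drop '?', and strip a trailing 's' (a crude stemmer)."""
    words = query.lower().replace("?", " ").split()
    return [w[:-1] if w.endswith("s") else w for w in words]


def annotate(tokens):
    """Step 3: collect candidate annotations for every token the lexicon knows."""
    return [LEXICON[t] for t in tokens if t in LEXICON]


def run_pattern(subject, prop):
    """Stand-in for running SPARQL: objects matching (subject, prop, ?)."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def answer(query):
    """Steps 4-5: try each candidate pattern, keep the last one with results."""
    candidates = [c for group in annotate(tokenize_and_stem(query)) for c in group]
    instances = [c for kind, c in candidates if kind == "instance"]
    properties = [c for kind, c in candidates if kind == "property"]
    best = None
    for instance, prop in product(instances, properties):
        results = run_pattern(instance, prop)
        if results:  # candidates that return nothing are discarded
            best = ((instance, prop), results)
    return best


print(answer("what states border oklahoma?"))
```

In this toy run the annotation `geo:oklahomaCity` produces a pattern with no results, so the candidate built from `geo:oklahoma` is the one returned. Executing every candidate and keeping only those with results is what lets the data itself resolve the ambiguity.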

Results

SANLI was tested on two different datasets, namely Mooney's Geography ontology and a Quran structure ontology.

SANLI is able to correctly answer all questions in the geography ontology for which one of the patterns <s, p, o>, <o, p, s>, <p, o>, <o, p> or <o> is identified.

Rules for other patterns have not yet been implemented. For example, <o, p, o> patterns mostly yield a true/false result, as in "Does Texas border Oklahoma?", which we have not implemented yet.
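For illustration only, a true/false question of this kind could in principle be expressed with SPARQL's ASK query form, which returns a boolean instead of variable bindings. The sketch below is not part of SANLI. The function `build_ask_query` is hypothetical, and `geo:texas` is an assumed identifier that follows the naming of `geo:oklahoma` in the earlier examples.

```python
def build_ask_query(subject, prop, obj):
    """Build a SPARQL ASK query that checks whether one triple exists."""
    return f"ASK {{ {subject} {prop} {obj} . }}"


# "Does Texas border Oklahoma?" (geo:texas is an assumed identifier)
ask = build_ask_query("geo:texas", "geo:borders", "geo:oklahoma")
print(ask)
```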

Conclusion

The Semantic Web can leverage sophisticated analytics with Big Data.

Big Data and Linked Data will be an integral part of the future web infrastructure, where massive amounts of data are available, connected and identifiable via Uniform Resource Identifiers.

More personalized applications are needed to exploit smart data to its maximum potential.