Semantic Search Facilitator: Concept and Current State of Development

InBCT Tekes PROJECT Chapter 3.1.3: “Industrial Ontologies and Semantic Web” (year 2004)

Upload: kailani-james

Post on 30-Dec-2015


DESCRIPTION

Semantic Search Facilitator: Concept and Current State of Development. InBCT Tekes PROJECT Chapter 3.1.3: “Industrial Ontologies and Semantic Web” (year 2004). Industrial Ontologies Group. Researchers: Vagan Terziyan, Oleksandr Kononenko, Andriy Zharko, Oleksiy Khriyenko, Olena Kaykova

TRANSCRIPT

Page 1: Semantic Search Facilitator: Concept and Current State of Development

Semantic Search Facilitator: Concept and Current State of Development

InBCT Tekes PROJECT Chapter 3.1.3: “Industrial Ontologies and Semantic Web” (year 2004)

Page 2: Semantic Search Facilitator: Concept and Current State of Development

Industrial Ontologies Group

• Researchers: Vagan Terziyan, Oleksandr Kononenko, Andriy Zharko, Oleksiy Khriyenko, Olena Kaykova, Olga Klochko, Andriy Taranov

• Contact: e-mail: [email protected]; phone: +358 14 260 4618; URL: http://www.cs.jyu.fi/ai/OntoGroup

Page 3: Semantic Search Facilitator: Concept and Current State of Development

Resources

Resources used from InBCT Project in 2004: 12 000 EURO in salaries for 5 months

Page 4: Semantic Search Facilitator: Concept and Current State of Development

Semantic-based Enhancement of the Information Retrieval

Motivation from Industrial Ontologies Group:

“While there is currently a lack of annotated resources in the Web, which makes metadata-based search useless, we should develop an enhanced Web search tool based on Google and the WordNet ontology and provide a semantic search user interface”

Page 5: Semantic Search Facilitator: Concept and Current State of Development

Semantic Web and Information Retrieval

The Semantic Web promises many advantages and benefits, but:
• We are only in “transition” towards the Semantic Web
• Resources are not yet annotated semantically
• Not enough metadata is available in the Web for smarter search

Semantic search of non-semantic data? Yes, why not? We need a Semantic Facilitator!

Page 6: Semantic Search Facilitator: Concept and Current State of Development

Semantic Facilitator Concept

What is it?
• A search service that uses other services
• Utilizes other search engines as Web services and makes their performance better through smart query generation algorithms
• Supports search within heterogeneous resources (Web pages, Web databases, local file system, etc.)
• Filters returned results based on user preferences
• An intelligent “semantic query”-based tool that really “understands” what users want to find

What is it not?
• A search engine, indexing tool, registry, etc.
• A data storage, database browser, etc.

Page 7: Semantic Search Facilitator: Concept and Current State of Development

Web search - What’s the Problem?

• Search in the Web is not always convenient:
  - Polysemy of words gives irrelevant results
  - Synonymy is not supported by search engines => loss of relevant results

• There is a need to capture semantics from the search query

[Figure: the query word “Mouse” with a question mark]

Page 8: Semantic Search Facilitator: Concept and Current State of Development

Semantic Search Assistant: light version of Semantic Facilitator

• “Semantic Search Assistant” (SSA) is software that:
  - helps the user obtain more relevant results while using a standard search engine (Google) by interacting with the WordNet ontology
  - finds possible contexts for the words in a search query
  - can broaden or narrow the search query with other relevant words and phrases to improve the results
  - works with non-annotated documents
  - is not restricted to any specific domain

Page 9: Semantic Search Facilitator: Concept and Current State of Development

Sense Determination

• WordNet is an open source ontology, which contains information about the different meanings of a term, its synonyms, antonyms and other lexical and semantic relations

• Given several words in a search query, we can determine the context (sense) in which each of them is used with the help of WordNet:
  - by comparing the words' synsets
  - by comparing the words' textual descriptions and examples
  - by finding common roots going up the WordNet hierarchy tree for each word
  - by asking the user
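
One way to read the strategy of comparing textual descriptions is a simple gloss-overlap score, similar in spirit to the Lesk approach. The sketch below is only an illustration and not the SSA implementation: `SENSES` is a hand-copied table of the WordNet 2.0 glosses for "driver" quoted later in this presentation, and `pick_sense`, `tokens` and the stop-word list are assumptions made for the example.

```python
import re

# Hand-copied WordNet 2.0 glosses for the noun "driver" (illustrative
# data only; a real tool would query WordNet itself).
SENSES = {
    "driver": [
        "the operator of a motor vehicle",
        "someone who drives animals that pull a vehicle",
        "a golfer who hits the golf ball with a driver",
        "(computer science) a program that determines how a computer "
        "will communicate with a peripheral device",
        "a golf club (a wood) with a near vertical face that is used "
        "for hitting long shots from the tee",
    ]
}

# Assumed stop-word list, just large enough for this example.
STOP_WORDS = {"a", "an", "the", "of", "that", "who", "with", "is", "for",
              "how", "will", "from", "to"}


def tokens(text):
    """Lower-case word tokens of `text` with stop words removed."""
    return {t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS}


def pick_sense(word, context_words):
    """Return (index, gloss) of the sense of `word` whose gloss shares the
    most tokens with the other query words; ties go to the earliest sense."""
    context = set()
    for w in context_words:
        context |= tokens(w)
    glosses = SENSES[word]
    scores = [len(tokens(g) & context) for g in glosses]
    best = max(range(len(glosses)), key=lambda i: scores[i])
    return best, glosses[best]
```

For the query "driver computer", `pick_sense("driver", ["computer"])` selects the device-driver gloss, because it is the only one that mentions "computer". When no gloss overlaps with the context, the function falls back to the first sense, which is the point where asking the user becomes the more reliable option.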

Page 10: Semantic Search Facilitator: Concept and Current State of Development

How does it work?

1. Gets a keyword query
2. Translates the original query into a series of queries to Google, taking into account the semantics of the keywords
3. Combines the returned results

Page 11: Semantic Search Facilitator: Concept and Current State of Development

Ontology Personalization

Ontology personalization is a mechanism which allows users to have their own conceptual view and to use it for semantic querying of search facilities.

[Figure: several views of the concept “Driver”, a common ontology, and search]

Page 12: Semantic Search Facilitator: Concept and Current State of Development

Semantic Search Enhancement

[Figure labels: Query ( XX XX XX XX ), Common (linguistic) ontology, Domain ontology, Semantic Filtering, Result]

Semantic Search Facilitator uses ontologically (WordNet) defined knowledge about words and embedded support for advanced Google search query features in order to construct more efficient queries from a formal textual description of the information sought. Semantic Search Facilitator hides the complexity of the query language of a specific search engine from users and performs the routine actions that most users carry out in order to achieve better performance and get more relevant results.

Page 13: Semantic Search Facilitator: Concept and Current State of Development

Capturing Semantics from Search Phrases

Motivation according to our Ukrainian colleague Vadim Ermolayev:

“Google query should be transformed based on domain ontology”

Page 14: Semantic Search Facilitator: Concept and Current State of Development
Page 15: Semantic Search Facilitator: Concept and Current State of Development
Page 16: Semantic Search Facilitator: Concept and Current State of Development
Page 17: Semantic Search Facilitator: Concept and Current State of Development
Page 18: Semantic Search Facilitator: Concept and Current State of Development
Page 19: Semantic Search Facilitator: Concept and Current State of Development
Page 20: Semantic Search Facilitator: Concept and Current State of Development

Semantic Search Assistant

Page 21: Semantic Search Facilitator: Concept and Current State of Development

Semantic Search Assistant

Page 22: Semantic Search Facilitator: Concept and Current State of Development

Algorithm for the New Query Generation

[Figure: tree with Word(i) at the root, its senses Sense(i1) … Sense(ij) … Sense(ip) below it, and the synonyms Syn(ij1) … Syn(ijk) … Syn(ij$m_{ij}$) of each sense at the leaves]

Notation:
• $i = 1, \dots, n$, where $n$ is the number of the words in the query
• $j = 1, \dots, p$, where $p$ is the number of the word's senses
• $k = 1, \dots, m_{ij}$, where $m_{ij}$ is the number of the word's synonyms in sense $j$
• $R_{ij} \in [-1, 1]$ is the relevance of the word's sense
• $R_i \in [0, 1]$ is the significance of the word in the query
• $N_{ijk}$ is the number of the synonym's senses

Page 23: Semantic Search Facilitator: Concept and Current State of Development

Algorithm for the New Query Generation

Synonym Quality:

$Q_{ijk}$ is computed from the sum $\sum_{j=1}^{p} R_{ij}$, the count $L$ and the count $N_{ijk}$, where $L$ is the number of the synsets which contain $\mathrm{Syn}_{ijk}$, and $R_{ij}$ enters the sum if $\mathrm{Syn}_{ijk}$ is a member of synset $j$.

[Figure: Word(i) with its synonyms Syn, each labeled with its quality $Q_{ijk}$]

Reduction of the synonym quality absolute value
• if $Q_{ijk} \ge 0$, then the synonym will be used via “OR” in a query
• if $Q_{ijk} < 0$, then the synonym will be used via “AND NOT”

Page 24: Semantic Search Facilitator: Concept and Current State of Development

Algorithm for the New Query Generation

Algorithm 1:

[Figure: the Query is built from Word(1) … Word(i) … Word(n); each word is joined with its synonyms (Syn) via OR (AND NOT), and the word groups are joined via AND]
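
A minimal sketch of how a query string might be assembled along the lines of Algorithm 1, assuming the candidate terms for each query word have already been scored. The function name `build_query`, the input layout and the numeric scores are hypothetical; only the sign rule (non-negative quality means OR, negative quality means AND NOT) and the output format, which follows the generated queries shown in the examples of this presentation, come from the slides.

```python
def build_query(groups):
    """Assemble a Google-style query string.

    `groups` holds one dict per query word, mapping each candidate term
    (the word or one of its synonyms) to its quality score. Terms with a
    score >= 0 are joined with OR; terms with a score < 0 are excluded
    with a leading minus sign (AND NOT). Groups are AND-ed implicitly by
    writing them next to each other.
    """
    parts = []
    for terms in groups:
        include = [t for t, q in terms.items() if q >= 0]
        exclude = [t for t, q in terms.items() if q < 0]
        if include:
            parts.append("(" + " OR ".join(f'"{t}"' for t in include) + ")")
        if exclude:
            parts.append("(" + " ".join(f'-"{t}"' for t in exclude) + ")")
    return " ".join(parts)


# Scores below are made up for illustration; only their signs matter.
example = [
    {"hotel": 1.0},
    {"booking": 0.8, "reserve": 0.5, "qualification": -0.6},
    {"bureau": 0.7, "agency": 1.0, "means": -0.4},
]
print(build_query(example))
```

With these illustrative scores the function prints `("hotel") ("booking" OR "reserve") (-"qualification") ("bureau" OR "agency") (-"means")`, the same shape as the generated query in the hotel reservation example.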

Page 25: Semantic Search Facilitator: Concept and Current State of Development

Algorithm for the New Query Generation

Algorithm 2:

[Figure: the Query is built from Word(1) … Word(i) … Word(n); each word is joined with its synonyms (Syn) via OR (AND NOT), and the word groups are joined via AND. Synonyms are additionally filtered based on the significance of the word $R_i$ (condition on Syn: |Q|>)]

Page 26: Semantic Search Facilitator: Concept and Current State of Development

Google API: Adaptation to search engine

Page 27: Semantic Search Facilitator: Concept and Current State of Development

We use Google because...

• Developers can write software that connects remotely to the Google Web APIs service and accesses Google's index of more than 4 billion web pages

• Google Web APIs support the same search syntax as the Google.com site

• Communication is performed via the Simple Object Access Protocol (SOAP), an XML-based mechanism for exchanging typed information

...but it could be virtually any existing search engine

Page 28: Semantic Search Facilitator: Concept and Current State of Development

WordNet

(online access: http://www.cogsci.princeton.edu/cgi-bin/webwn)

Page 29: Semantic Search Facilitator: Concept and Current State of Development

WordNet 2.0 Search Example

• Search word: "driver"

The noun "driver" has 5 senses in WordNet.

1. driver -- (the operator of a motor vehicle)
2. driver -- (someone who drives animals that pull a vehicle)
3. driver -- (a golfer who hits the golf ball with a driver)
4. driver, device driver -- ((computer science) a program that determines how a computer will communicate with a peripheral device)
5. driver, number one wood -- (a golf club (a wood) with a near vertical face that is used for hitting long shots from the tee)

• Sense 1

driver -- (the operator of a motor vehicle)
       => busman, bus driver -- (someone who drives a bus)
       => chauffeur -- (a man paid to drive a privately owned car)
       => designated driver -- (the member of a party who is designated to refrain from alcohol and so is sober when it is time to drive home)
       => honker -- (a driver who causes his car's horn to make a loud honking sound; "the honker was fined for disturbing the peace")
       => motorist, automobilist -- (someone who drives (or travels in) an automobile)
       => owner-driver -- (a motorist who owns the car that he/she drives)
       => racer, race driver, automobile driver -- (someone who drives racing cars at high speeds)
       …

Page 30: Semantic Search Facilitator: Concept and Current State of Development

WordNet – Basic Terminology

• Syntactic category – part of speech {noun, verb, adjective, adverb}
• Synonymic set (synset) – list of synonymic words or collocations
• Every word can have several senses
• Every sense of a word is associated with the synonyms (synset) of the word in that specific sense
• Synsets are organized in hierarchies interlinked with semantic relations

Page 31: Semantic Search Facilitator: Concept and Current State of Development

WordNet – Organization

Building Blocks:
• Word forms – common word orthography
• Word meanings – by synsets

Relations:
• Lexical – between word forms
• Semantic – between word meanings

=> Pointers:
• Lexical – pertain only to a specific word
• Semantic – pertain to all of the words in a semantic set
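
The building blocks above can be pictured as a small data model. This is an illustrative sketch only: the class and field names (`Synset`, `WordForm`, `hypernyms`, `antonyms`) and the helper `senses` are assumptions for the example and do not mirror the WordNet file format or any WordNet API. The sample data is taken from the "driver" example in this presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A word meaning: a set of synonymous word forms plus a gloss."""
    lemmas: list
    gloss: str
    # Semantic pointer: links this meaning to another meaning, so it
    # holds for every word form in the synset.
    hypernyms: list = field(default_factory=list)


@dataclass
class WordForm:
    """A word form (orthography) used in one particular sense."""
    text: str
    synset: Synset
    # Lexical pointer: links this specific word form to another word
    # form rather than to a whole synset.
    antonyms: list = field(default_factory=list)


def senses(word, synsets):
    """All synsets (senses) in which `word` appears as a lemma."""
    return [s for s in synsets if word in s.lemmas]


driver_vehicle = Synset(["driver"], "the operator of a motor vehicle")
device_driver = Synset(
    ["driver", "device driver"],
    "(computer science) a program that determines how a computer "
    "will communicate with a peripheral device",
)
motorist = Synset(
    ["motorist", "automobilist"],
    "someone who drives (or travels in) an automobile",
    hypernyms=[driver_vehicle],  # semantic pointer shared by both lemmas
)
```

Here `senses("driver", [driver_vehicle, device_driver, motorist])` returns the two synsets that list "driver" as a lemma, while the hypernym link of `motorist` applies equally to "motorist" and "automobilist".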

Page 32: Semantic Search Facilitator: Concept and Current State of Development

Semantic Search Assistant prototype

Page 33: Semantic Search Facilitator: Concept and Current State of Development

Features of SSA

• Platform independent (written in Java)

• Works in 2 modes:
  - common mode, which implements almost all of Google's functionality;
  - extended mode, which extends the common mode: it makes several requests with the same semantic sense and returns compound results.

• Keeps results in XML format

Page 34: Semantic Search Facilitator: Concept and Current State of Development

Common mode

• SSA has a clear and simple interface, which helps the user make an advanced Google search without special knowledge

• SSA transforms the values of the fields into a Google request according to the special format which Google provides for advanced search
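
As an illustration of such a field-to-query transformation, the sketch below maps a few form fields to commonly used Google query operators (quoted phrases, `OR`, a leading `-` for exclusion, and `site:`). The field names and the function `to_google_query` are hypothetical and are not taken from SSA.

```python
def to_google_query(all_words="", exact_phrase="", any_words="",
                    without_words="", site=""):
    """Turn advanced-search form fields into a single query string."""
    parts = []
    # Words that must all appear are simply listed.
    parts.extend(all_words.split())
    # An exact phrase is wrapped in double quotes.
    if exact_phrase.strip():
        parts.append(f'"{exact_phrase.strip()}"')
    # Alternatives are joined with OR (grouped when there are several).
    any_terms = any_words.split()
    if len(any_terms) == 1:
        parts.append(any_terms[0])
    elif any_terms:
        parts.append("(" + " OR ".join(any_terms) + ")")
    # Unwanted words get a leading minus sign.
    parts.extend("-" + w for w in without_words.split())
    # Restrict the search to one site or domain.
    if site.strip():
        parts.append("site:" + site.strip())
    return " ".join(parts)
```

For example, `to_google_query(all_words="hotel", exact_phrase="reservation agency", any_words="booking reserve", without_words="means", site="fi")` yields `hotel "reservation agency" (booking OR reserve) -means site:fi`, so the user never has to type the operators directly.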

Page 35: Semantic Search Facilitator: Concept and Current State of Development

Extended mode

• More powerful mode than the common one
• SSA takes the user's request and tries to choose the most suitable sense with the user's help
• Makes a set of requests, which extend the user's request with synonyms and exclude unsuitable words

Page 36: Semantic Search Facilitator: Concept and Current State of Development

Generating the set of requests

• The WordNet API and dictionaries are used for generating the set of requests

• When the user enters the original request, SSA switches to a panel where the different senses of the typed word are presented

Page 37: Semantic Search Facilitator: Concept and Current State of Development

Generating the set of requests (2)

• For every sense presented on this panel the user can see a description (and possibly an example) extracted from the WordNet dictionary

• The user can also set a rate of correspondence for every sense in the range [-1, 1]

Page 38: Semantic Search Facilitator: Concept and Current State of Development

Making compound result

• SSA sends the generated requests to Google one by one

• It keeps the results obtained for each request separately

• Finally the user gets an integrated result, which is generated according to special rules

Page 39: Semantic Search Facilitator: Concept and Current State of Development

Integrated results: generating rules

• The unique identifier of each result is its URL

• SSA counts the number of appearances of each URL in the returned results and uses this number as the index of that URL

• Results with a bigger index are shown first

• If indexes are equal, results are shown in the order in which Google returned them
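
The rules above can be sketched as follows. This is a minimal illustration, not SSA code: the function name `integrate` is hypothetical, a URL repeated within a single request is assumed to count once, and "the order in which Google returned them" is assumed to mean the order of first appearance across the requests as they were sent.

```python
from collections import Counter


def integrate(result_lists):
    """Merge per-request result lists into one ranked list of URLs.

    `result_lists` holds one list of URLs per request, in the order the
    requests were sent and in the order the engine returned the results.
    """
    counts = Counter()   # URL -> number of requests that returned it
    first_seen = {}      # URL -> rank of its first appearance overall
    for results in result_lists:
        # dict.fromkeys drops repeats within one request, keeping order.
        for url in dict.fromkeys(results):
            counts[url] += 1
            first_seen.setdefault(url, len(first_seen))
    # Higher count first; ties keep the original order of appearance.
    return sorted(counts, key=lambda u: (-counts[u], first_seen[u]))


merged = integrate([
    ["a", "b", "c"],
    ["c", "d", "a"],
    ["c", "e"],
])
print(merged)  # ['c', 'a', 'b', 'd', 'e']
```

In the usage example "c" is returned by all three requests and "a" by two, so they lead the list; "b", "d" and "e" each appear once and keep their original relative order.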

Page 40: Semantic Search Facilitator: Concept and Current State of Development

Results analysis

• After making all requests, SSA shows the final results

• All results are also kept in files in XML format for further analysis

• The user can highlight the results of a specific request, if there was more than one request
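
A minimal sketch of keeping per-request results in XML, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`results`, `request`, `result`, `query`, `rank`, `url`) and the function `results_to_xml` are hypothetical; the slides do not describe the actual XML schema used by SSA.

```python
import xml.etree.ElementTree as ET


def results_to_xml(results_by_query):
    """Serialize {query string: [urls in returned order]} to an XML string."""
    root = ET.Element("results")
    for query, urls in results_by_query.items():
        # One <request> element per query sent to the search engine.
        request = ET.SubElement(root, "request", {"query": query})
        for rank, url in enumerate(urls, start=1):
            ET.SubElement(request, "result",
                          {"rank": str(rank), "url": url})
    return ET.tostring(root, encoding="unicode")


xml_text = results_to_xml({
    "zodiac fish": ["http://example.org/1", "http://example.org/2"],
    '("zodiac") ("pisces" OR "fish")': ["http://example.org/2"],
})
```

Storing each request under its own element keeps the link between a result and the request that produced it, which is what highlighting the results of a specific request relies on.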

Page 41: Semantic Search Facilitator: Concept and Current State of Development

Results

• Methods for automatic sense determination using the WordNet Lexical Database were studied and the corresponding algorithms were implemented

• An algorithm for new query generation was implemented and embedded into the software

• A user interface for advanced search (with Google integration) was developed with Semantic Search Assistant functionality

Page 42: Semantic Search Facilitator: Concept and Current State of Development

Example

• Initial query: hotel reservation agency (1, 7 and 5 senses respectively)

• Of the first 5 results only 3 are relevant (results containing the whole sequence of query words do not even appear in the first three pages)

• Generated query: ("hotel") ("booking" OR "reserve") (-"qualification") ("bureau" OR "agency") (-"means")

• Of the first 5 results all are relevant (using the synonym “booking” along with “reservation” was helpful)

Page 43: Semantic Search Facilitator: Concept and Current State of Development

Example

[Figure: results of the initial query and results of the generated query]

Page 44: Semantic Search Facilitator: Concept and Current State of Development

More Examples

Page 45: Semantic Search Facilitator: Concept and Current State of Development

Test 1:

Initial query: cork mousepad

Page 46: Semantic Search Facilitator: Concept and Current State of Development

Test 1:

Initial query: cork mousepad

Enhanced query: ("phellem" OR "bobfloat" OR "bobber" OR "cork" OR "bob") ("mousepad" OR "mouse mat")

Page 47: Semantic Search Facilitator: Concept and Current State of Development

Test 2:

Initial query: flowers present shop

Page 48: Semantic Search Facilitator: Concept and Current State of Development

Test 2:

Initial query: flowers present shop

Enhanced query: ("flower") (-"heyday" -"prime" -"efflorescence") ("present") (-"nowadays" -"present tense") ("store" OR "shop") (-"workshop")

Page 49: Semantic Search Facilitator: Concept and Current State of Development

Test 3:

Initial query: hotel reservation agency

Page 50: Semantic Search Facilitator: Concept and Current State of Development

Test 3:

Initial query: hotel reservation agency

Enhanced query: ("hotel") ("booking" OR "reserve") (-"qualification") ("bureau" OR "agency") (-"means")

Page 51: Semantic Search Facilitator: Concept and Current State of Development

Test 4:

Initial query: zodiac fish

Page 52: Semantic Search Facilitator: Concept and Current State of Development

Test 4:

Initial query: zodiac fish

Enhanced query: ("zodiac") ("pisces" OR "fish" OR "pisces the fishes")

Page 53: Semantic Search Facilitator: Concept and Current State of Development

Drawbacks

• Lack of highly specialized terminology for narrow domains in WordNet => difficult to get better results with SSA in such cases

• Frequent absence of a sense relation between the words of a whole phrase => the algorithms used have difficulty determining the context

• Presence of several very close senses for many terms in WordNet => a word does not clearly belong to one sense

• Possible wrong determination of the part of speech of a word in the query => improper synonyms and antonyms are used for making the query

Page 54: Semantic Search Facilitator: Concept and Current State of Development

Possible Improvements and further work

• Additional adaptive learning (for personalized context definition)

• Creating a Global Sense Ontology on the basis of the WordNet database

• Improving the algorithms for automatic computing of relevance indexes

• Adding algorithms for smart cutting off of generated queries

• Using fuzzy logic for determination of the query context

• Adding other lexical databases to support search in specific domains (like programming, medicine)

• Multilingual support

Page 55: Semantic Search Facilitator: Concept and Current State of Development

Current status

• During Jan-May 2004 the main efforts for the InBCT “Semantic Search Facilitator” project were put into the research and design of the basic features of SSA and the implementation of the ontology-based search method.

• The development of the prototype Semantic Search Assistant software has been started and a pilot version is ready.

• Starting 1.06.2004, the kernel part of the Industrial Ontologies Group starts working on the TEKES project “SmartResource”: Proactive Self-Maintained Resources in Semantic Web at Agora Center, University of Jyväskylä

• Further development (from the point of view of stability and usability) of SSA will be continued during Jul-Sep 2004