University of Kansas. Contextual Information Retrieval Using Ontology-Based User Profiles. Vishnu Kanth Reddy Challam, Master’s Thesis Defense. Date: Jan 22nd, 2004. Committee: Dr. Susan Gauch (Chair), Dr. David Andrews, Dr. Jerzy W. Grzymala-Busse.


University of Kansas

Contextual Information Retrieval Using Ontology-Based User Profiles

Vishnu Kanth Reddy Challam
Master’s Thesis Defense
Date: Jan 22nd, 2004

Committee
Dr. Susan Gauch (Chair)
Dr. David Andrews
Dr. Jerzy W. Grzymala-Busse

Presentation Outline

Search Engines Today
Search Engine Personalization
Contributions
Our Approach for Contextual IR
Experiments and Evaluation
Conclusions and Future Work

Search Engines Today

Return results based on simple keyword matches.
No regard for conceptual information.
For example, if the query is “SALSA”, is it…

Search Engines Today Contd.

What is the user looking for?
No personalization mechanism to understand the information needs of the user.

Search Engine Personalization: How?

Collect and represent information about the user.
Use this information either to filter or re-rank the results returned by the initial retrieval process, or directly in the search process itself.

Search Engine Personalization: Challenges

How can accurate information about the user’s interests be collected and represented?
How can we use this information to deliver personalized search results?

Contributions

We present a novel approach to personalizing search engines using ontology-based contextual user profiles.
We studied the effect of conceptual ranking versus the original keyword-based ranking.
We studied the use of multiple sources of information to build the user’s contextual profile.

Related Work

Semantic Web
Explicitly state the meaning of content using knowledge representation languages
Domain-specific efforts
Web is democratic!

Design Criteria

Monitor and store user information on the client machine or the server.
Short term vs. long term
With server-side profiling, privacy is an issue.
Instantaneous information needs are hard to satisfy.

Contextual Search

No long-term user profiles
Build contextual profiles that capture the information needs of the user at the time they conduct a search… TASK ORIENTED
Upload the contextual profile to the server.
Privacy

How to Build Contextual Profiles?

Monitor the activity of the user on his/her Windows machine.
Capture content from Word documents, Web pages, chat transcripts, etc.
Classify the captured content to build a contextual profile.

Monitoring the User Activity

A Windows application that runs in the background.
Captures text from open Word, IE, and MSN Chat windows.
Stores the captured content in a special folder on the client’s machine.
Content is assigned a time-stamp.
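The storage step described above can be sketched as follows. This is only an illustrative sketch: the slides do not describe the file layout, so the file-naming scheme and the helper names `store_capture` and `captures_since` are assumptions, and the actual capture from Word, IE, or MSN Chat windows is not shown.

```python
from datetime import datetime
from pathlib import Path

STAMP_FORMAT = "%Y%m%dT%H%M%S"  # assumed naming scheme, not taken from the slides


def store_capture(folder, source, text, captured_at=None):
    """Write captured text to the profile folder, time-stamped in the file name."""
    captured_at = captured_at or datetime.now()
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{captured_at.strftime(STAMP_FORMAT)}_{source}.txt"
    path.write_text(text, encoding="utf-8")
    return path


def captures_since(folder, cutoff):
    """Return (timestamp, text) pairs for captures made at or after `cutoff`."""
    recent = []
    for path in sorted(Path(folder).glob("*.txt")):
        stamp = datetime.strptime(path.name.split("_", 1)[0], STAMP_FORMAT)
        if stamp >= cutoff:
            recent.append((stamp, path.read_text(encoding="utf-8")))
    return recent
```

Keeping the time-stamp in the file name makes it cheap to select only the content that falls inside a given time window.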

Text Classification

Classifier works in 2 phases: training and classification.
Training phase:
Classifier is given a series of documents classified manually.
Learns about the features (vocabulary) of the various categories into which the text might be classified.

Text Classification Contd.

Classification phase:
The classifier assigns the input text to a particular category based on the similarity between the features of the input text and those extracted from the training data.

Text Classification: Our Approach

Vector-space model (tf-idf model).
Training data are the documents manually assigned to categories of the Standard Tree, which is our reference ontology.
For each category, the classifier creates a vector of vocabulary terms and associated weights, stored in an inverted file.

Standard Tree

Text Classification: Our Approach Contd.

During the classification phase, a vector is created for the input document.
The degree of similarity between the training vectors and the input document vector is calculated using the dot product of the vectors.
The best matches are the concepts to which the input document is assigned.
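The two phases can be illustrated with a small sketch. It is not the thesis implementation: the whitespace tokenizer, the smoothed idf formula, the length normalization, and the use of plain dictionaries instead of an inverted file are all assumptions made to keep the example short, and the category names are only sample labels.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Build one tf-idf vector per category from (category, text) pairs."""
    term_counts = defaultdict(Counter)
    for category, text in labeled_docs:
        term_counts[category].update(tokenize(text))

    n_categories = len(term_counts)
    category_freq = Counter()  # number of categories containing each term
    for counts in term_counts.values():
        category_freq.update(counts.keys())

    vectors = {}
    for category, counts in term_counts.items():
        # Assumed weighting: tf * log(1 + N / df), then unit-length normalization.
        vec = {t: tf * math.log(1 + n_categories / category_freq[t])
               for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[category] = {t: w / norm for t, w in vec.items()}
    return vectors


def classify(text, vectors, top_n=3):
    """Score the input against every category vector using a dot product."""
    doc_vec = Counter(tokenize(text))
    scores = {
        category: sum(tf * vec.get(term, 0.0) for term, tf in doc_vec.items())
        for category, vec in vectors.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(category, score) for category, score in ranked[:top_n] if score > 0]


vectors = train([
    ("Top/Science/Environment/Water_Resources", "water river pollution water"),
    ("Top/Shopping/Vehicles", "car dealer price car"),
])
best_matches = classify("river water quality", vectors)
```

Here `best_matches` contains only the water-resources category, because the input shares no terms with the other category.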

Building Contextual User Profile

Content created/viewed within a specific time window is classified.
The classifier represents the user’s contextual profile for the time window as a weighted ontology.
The weight of a concept in the ontology represents the amount of information recently viewed/created that was classified into that concept.
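One way to picture the weighted ontology is as a mapping from concept id to accumulated classifier weight. The sketch below assumes a one-hour window and a `classify` callable that returns `(category_id, weight)` pairs; both are illustrative choices, and the slides do not state how weights are accumulated.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_contextual_profile(captured, classify, now, window=timedelta(hours=1)):
    """Sum classifier weights per concept for content captured inside the window.

    captured : list of (timestamp, text) pairs
    classify : callable returning [(category_id, weight), ...] for a text
    """
    profile = defaultdict(float)
    for timestamp, text in captured:
        if timedelta(0) <= now - timestamp <= window:
            for category_id, weight in classify(text):
                profile[category_id] += weight
    return dict(profile)


# Hypothetical stand-in for the real classifier, for illustration only.
def stub_classifier(text):
    return [(26878, 1.0)] if "water" in text else [(1, 0.5)]


now = datetime(2004, 1, 22, 12, 0)
captured = [
    (now - timedelta(minutes=5), "water report"),
    (now - timedelta(minutes=30), "water quality"),
    (now - timedelta(hours=3), "car prices"),  # outside the window, ignored
]
profile = build_contextual_profile(captured, stub_classifier, now)
```

With these inputs only the two recent captures contribute, so the profile holds a single concept (26878) with weight 2.0.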

Sample Contextual User Profile

Category-id, Weight
Category-id is used to identify the concept in the Standard Tree.
26878 is Top/Science/Environment/Water_Resources

Personalizing Search Results Using Contextual User Profiles

Results are re-ranked using a combination of their original rank and their conceptual rank.
Similarity of the documents to the contextual profile is used to calculate the conceptual rank.

Conceptual Rank

Document’s title and summary are classified to create the document profile.
Document profile is compared to the contextual profile to calculate the conceptual similarity between the document and the user’s context.

$$sim(context_i, doc_j) = \sum_{k=1}^{N} wt_{ik} \ast wt_{jk}$$

where
$wt_{ik}$ = weight of concept $k$ in context $i$
$wt_{jk}$ = weight of concept $k$ in document $j$
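The similarity above is a dot product over concept weights. A minimal sketch, assuming both profiles are dictionaries mapping concept id to weight (concepts missing from a profile count as weight 0), and assuming the conceptual rank is the position of a document when results are sorted by decreasing similarity; the concept ids 101 and 202 are made up for the example:

```python
def conceptual_similarity(context_profile, document_profile):
    """Dot product of two concept-weight profiles: sum over k of wt_ik * wt_jk."""
    return sum(
        weight * document_profile.get(concept, 0.0)
        for concept, weight in context_profile.items()
    )


def conceptual_ranks(context_profile, document_profiles):
    """Map each document id to its rank (1 = most similar to the context)."""
    ordered = sorted(
        document_profiles,
        key=lambda doc: conceptual_similarity(context_profile, document_profiles[doc]),
        reverse=True,
    )
    return {doc: position for position, doc in enumerate(ordered, start=1)}


context = {26878: 0.6, 101: 0.4}
docs = {
    "doc_a": {202: 1.0},               # shares no concept with the context
    "doc_b": {26878: 0.5, 202: 0.5},   # shares concept 26878
}
ranks = conceptual_ranks(context, docs)
```

In this example `doc_b` has similarity 0.6 * 0.5 = 0.3 and `doc_a` has similarity 0, so `doc_b` receives conceptual rank 1.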

Final Rank

$$\text{Final Rank} = \alpha \ast \text{Conceptual Rank} + (1 - \alpha) \ast \text{Keyword Rank}$$

α has a value between 0 and 1.
By varying the value of α between 0 and 1, the conceptual and keyword ranks can be weighted differently.
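A short sketch of the blending step follows. It assumes that both inputs are rank positions (1 is best), that results are then ordered by increasing blended value, and that ties are broken by the keyword rank; the tie-breaking rule is an assumption, since the slides do not mention one.

```python
def final_ranking(keyword_rank, conceptual_rank, alpha):
    """Order result ids by alpha * conceptual rank + (1 - alpha) * keyword rank."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    blended = {
        doc: alpha * conceptual_rank[doc] + (1 - alpha) * keyword_rank[doc]
        for doc in keyword_rank
    }
    # Lower blended value is better; ties fall back to the keyword rank.
    return sorted(blended, key=lambda doc: (blended[doc], keyword_rank[doc]))


keyword_rank = {"a": 1, "b": 2, "c": 3}
conceptual_rank = {"a": 3, "b": 1, "c": 2}
keyword_only = final_ranking(keyword_rank, conceptual_rank, alpha=0.0)
conceptual_only = final_ranking(keyword_rank, conceptual_rank, alpha=1.0)
blended = final_ranking(keyword_rank, conceptual_rank, alpha=0.4)
```

With α = 0 the original keyword order is reproduced, with α = 1 the order is purely conceptual, and intermediate values trade the two off.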

Experiments and Evaluation

Wrapper around Google built using the Google API.
Google Wrapper builds a log of:

1. Queries given by user
2. Results & ranks returned by Google
3. Result clicked by the user
4. Title & Summaries

Randomizes the results returned by Google before displaying them to the user.
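The logging and randomization steps might look roughly like the sketch below. It does not call any Google API; the `QueryLogEntry` structure and its field names are hypothetical, chosen only to mirror the four logged items listed above.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QueryLogEntry:
    """One logged query: results are (google_rank, url, title, summary) tuples."""
    query: str
    results: List[Tuple[int, str, str, str]]
    clicked_url: Optional[str] = None

    def clicked_google_rank(self):
        """Return Google's original rank of the clicked result, if any."""
        for rank, url, _title, _summary in self.results:
            if url == self.clicked_url:
                return rank
        return None


def present_results(entry, rng=None):
    """Return a shuffled copy of the results; the logged order is left untouched."""
    rng = rng or random.Random()
    shown = list(entry.results)
    rng.shuffle(shown)
    return shown
```

Because the shuffle works on a copy, the original Google rank of whichever result the user clicks can still be looked up from the log afterwards.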

Google Wrapper

Experiments

5 users were asked to write essays on topics ranging from car buying and labs at ITTC to jewelry.
The Windows application monitored their activity.
Queries were issued to the Google Wrapper.
The result clicked by the user was used as a form of implicit user relevance judgment for analysis.

Experiments Contd.

Log of 50 queries. 6 had to be filtered out. 44 queries analyzed.
Evaluated the number of concepts for the user’s contextual profile, the number of concepts for the document profile, and the value of α for blending original and conceptual ranks.
Analysis based on the average rank of the result clicked by the user in our conceptual search engine and in the baseline system, Google.
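The evaluation measure, the average rank of the clicked result, can be sketched as below. The log format (a dictionary with `results` and `clicked` keys) and the function names are assumptions made for illustration; lower averages mean the clicked result appeared nearer the top.

```python
def average_clicked_rank(log, order_results):
    """Average 1-based position of the clicked result under a given ordering.

    log           : list of dicts with "results" (ids in Google order) and "clicked"
    order_results : callable mapping a log entry to a list of result ids, best first
    """
    positions = []
    for entry in log:
        ordering = order_results(entry)
        if entry["clicked"] in ordering:
            positions.append(ordering.index(entry["clicked"]) + 1)
    if not positions:
        raise ValueError("no clicked results found in the log")
    return sum(positions) / len(positions)


# Toy log, made up for the example (not data from the experiments).
log = [
    {"results": ["a", "b", "c"], "clicked": "b"},
    {"results": ["x", "y", "z", "w"], "clicked": "w"},
]
baseline = average_clicked_rank(log, lambda entry: entry["results"])
```

Passing a different `order_results` function, such as one that applies the blended re-ranking, yields the figure to compare against the baseline.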

Evaluation

Profile built from content of Word documents alone
32 queries analyzed
Varied the number of concepts for the user profile and the document profile.
Average Google rank is 4.84
Best average conceptual rank is 4.68

[Figure: Average Rank for Google and for document profiles using the top 2, top 5, top 7, and all concepts; one series each for the top 10, top 20, top 30, and all concepts from the user profile.]

Evaluation Contd.

Final Rank calculated using the formula

FR = α*CR + (1-α)*KR

Best final rank of 4.59 when α = 0.4.
5.16 percent improvement over Google’s rank of 4.84
Contextual information from Word documents can be used to improve web queries.

[Figure: Final Rank versus Alpha Value (0 to 1).]
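As a quick check on the reported improvement for the Word-document profile, the relative gain can be recomputed from the two average ranks given on the slides (KR denotes Google's average rank and FR the best final rank):

```latex
\[
\frac{KR - FR}{KR} \;=\; \frac{4.84 - 4.59}{4.84} \;=\; \frac{0.25}{4.84} \;\approx\; 0.052
\]
```

This is about 5.2 percent, in line with the reported 5.16 percent; the small difference presumably comes from rounding of the averages shown on the slides.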

Evaluation Contd.

Profile built from content of Web pages alone
31 queries analyzed
Varied the number of concepts for the user profile and the document profile.
Average Google rank is 4.58
Best average conceptual rank is 4.74 (30 concepts for the contextual profile and all concepts for the document profile)

[Figure: Average Rank for Google and for document profiles using the top 2, top 5, top 7, and all concepts; one series each for the top 10, top 20, top 30, and all concepts from the user profile.]

Evaluation Contd.

Final Rank calculated using the formula

FR = α*CR + (1-α)*KR

Best final rank of 4.22 when α = 0.4.
7.86 percent improvement over Google’s average rank of 4.58
Contextual information from Web pages can be used to improve web queries.

[Figure: Average Rank versus Alpha Values (0 to 1).]
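The 7.86 percent figure for the Web-page profile can be reproduced from the Google average of 4.58 reported for these 31 queries and the best final rank of 4.22:

```latex
\[
\frac{4.58 - 4.22}{4.58} \;=\; \frac{0.36}{4.58} \;\approx\; 0.0786
\]
```

The same calculation with 4.74, the best conceptual-only rank, would give roughly 11 percent, so the reported improvement appears to be measured against the Google baseline of 4.58.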

Evaluation Contd.

Profile built by combining content of Web pages and Word documents.
Final Profile = β * Word Profile + (1 - β) * Web Profile
β has values between 0 and 1
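The profile combination is a per-concept weighted sum. A minimal sketch, assuming each profile is a dictionary from concept id to weight and that a concept absent from one profile contributes weight 0 there (concept id 101 is made up for the example):

```python
def combine_profiles(word_profile, web_profile, beta):
    """Final Profile = beta * Word Profile + (1 - beta) * Web Profile, per concept."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    concepts = set(word_profile) | set(web_profile)
    return {
        concept: beta * word_profile.get(concept, 0.0)
        + (1 - beta) * web_profile.get(concept, 0.0)
        for concept in concepts
    }


word_profile = {26878: 1.0}
web_profile = {26878: 0.5, 101: 2.0}
final_profile = combine_profiles(word_profile, web_profile, beta=0.1)
```

With β = 0.1 the Web profile dominates: concept 26878 receives 0.1 * 1.0 + 0.9 * 0.5 = 0.55 and concept 101 receives 0.9 * 2.0 = 1.8.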

Evaluation Contd.

Effect of α and β
22 queries analyzed
Best conceptual rank of 4.36 when α is 0.8 and β is 0.1
15% improvement over Google’s rank!

[Figure: Average Rank versus Alpha Values, one curve per Beta value from 0 to 1 in steps of 0.1.]
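Finding the best pair of α and β amounts to a grid search over both parameters. The sketch below assumes a step of 0.1 and an `evaluate` callable that returns the average clicked rank for a given (α, β); the toy function used in the usage lines is synthetic and is not the experimental data.

```python
from itertools import product


def sweep_alpha_beta(evaluate, step=0.1):
    """Return (best_average_rank, alpha, beta) over a grid on [0, 1] x [0, 1]."""
    n_steps = int(round(1 / step))
    grid = [round(i * step, 10) for i in range(n_steps + 1)]
    # Lower average rank is better, so take the minimum over the grid.
    return min((evaluate(alpha, beta), alpha, beta) for alpha, beta in product(grid, grid))


# Synthetic stand-in for the real evaluation, with its minimum placed at (0.8, 0.1).
def toy_evaluate(alpha, beta):
    return 4.36 + (alpha - 0.8) ** 2 + (beta - 0.1) ** 2


best_rank, best_alpha, best_beta = sweep_alpha_beta(toy_evaluate)
```

In practice `evaluate` would rebuild the combined profile for each β, re-rank every logged query with each α, and return the resulting average rank of the clicked results.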

Evaluation Contd.

Effect of α on final rank
A high value of α indicates that the conceptual rank should be given more importance.
When re-ranking among the top 10 results, all of them match the user’s query equally well.
The primary distinguishing factor is conceptual similarity to the contextual profile.

[Figure: Average Rank versus Alpha Values.]

Evaluation Contd.

Effect of β on final rank
β values between 0.1 and 0.5 produce roughly comparable results.
The increased importance of Web content may be because the Word documents were short.
If more content had been available in the Word documents, a higher value of β might have been observed.

[Figure: Average Rank versus Beta values (0 to 1).]

Conclusions

Contextual profiles improve Web searches.
15% improvement over Google when the profile is built by combining content from Word documents and Web pages.
Within the top 10 results of Google, re-ranking should give more weight to the conceptual similarity between documents and the contextual profile.

Conclusions Contd.

All users were expert search engine users.
Query length was long. Longer queries tend to disambiguate themselves.
System performs better for shorter queries, which are more common on the Web as a whole.

Future Work

Determine the best time window within which captured documents should be included in the contextual profile.
Analyze content from other sources such as chat transcripts, Excel spreadsheets, PowerPoint slides, etc.
Combine the user’s current context with long-term and short-term interests.

Questions or Comments

?? or !!

Thank You!