Page 1

Personalization in Digital Library:

An Intelligent Service based on

Semantic User Profiles

Department of Computer Science, University of Bari - Italy

3rd Italian Research Conference on

Digital Library Systems, Padova, Italy, 29-30 January 2007

Giovanni Semeraro Pasquale Lops

Marco Degemmis Pierpaolo Basile

Annalisa Gentile

Page 2

Outline

- Motivation
  - Information overload in a scientific congress scenario
  - Conference Participant Advisor Service
- Profile-driven paper recommending
  - User Profiles as Bayesian Text Classifiers
  - User Profiles learned from documents semantically indexed through a WSD procedure [*]
- Empirical Evaluation
- Conclusions and Future Work

[*] Combining Learning and Word Sense Disambiguation for Intelligent User Profiling - IJCAI 2007

Page 3

Motivation

Information overload in the scientific congress scenario

Page 4

Motivation

Information overload in the scientific congress scenario

Page 5

Web Personalization

- Personalized systems adapt their behavior to individual users by learning user profiles
  - Structured model of the user interests
  - Exploitable for providing personalized content and services
- Personalization is usually done automatically, based on the user profile and possibly on the profiles of other users with similar interests (collaborative approach)

How can personalization be used in the scientific congress scenario?

Page 6

Web Personalization in the scientific congress scenario

- Learn the research interests of participants from the papers they rated
- Store the research interests in personal profiles, used to build personalized programs delivered to participants

Page 7

Learning User Profiles as a Text Categorization problem

OUR STRATEGY: content-based recommendations by learning from TEXT and USER FEEDBACK on items

Page 8

Keyword-based profiles: problems

doc1: AI is a branch of computer science

doc2: the 2007 International Joint Conference on Artificial Intelligence will be held in India

doc3: apple launches a new product…

USER PROFILE
artificial     0.02
intelligence   0.01
apple          0.13
AI             0.15

Problem: MULTI-WORD CONCEPTS

Page 9

Keyword-based profiles: problems

doc1: AI is a branch of computer science

doc2: the 2007 International Joint Conference on Artificial Intelligence will be held in India

doc3: apple launches a new product…

USER PROFILE
artificial     0.02
intelligence   0.01
apple          0.13
AI             0.15

Problem: SYNONYMY

Page 10

Keyword-based profiles: problems

doc1: AI is a branch of computer science

doc2: the 2007 International Joint Conference on Artificial Intelligence will be held in India

doc3: apple launches a new product…

USER PROFILE
artificial     0.02
intelligence   0.01
apple          0.13
AI             0.15

Problem: POLYSEMY
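To make the three problems concrete, the sketch below scores the three example documents against the keyword profile shown above. The scoring rule (sum of the weights of the matching lowercase tokens) is a hypothetical assumption for illustration, not necessarily the rule used by ITR. The weights are the ones on the slide, with "AI" lowercased to match the tokenizer, and doc3 is kept truncated as it appears on the slide.

```python
import re
from typing import Dict

# Keyword-based profile from the slide ("AI" lowercased to match the tokenizer).
PROFILE = {"artificial": 0.02, "intelligence": 0.01, "apple": 0.13, "ai": 0.15}

DOCS = {
    "doc1": "AI is a branch of computer science",
    "doc2": "the 2007 International Joint Conference on Artificial Intelligence will be held in India",
    "doc3": "apple launches a new product",  # truncated on the slide
}

def keyword_score(text: str, profile: Dict[str, float]) -> float:
    """Hypothetical scoring rule: sum the profile weights of the matching tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(profile.get(tok, 0.0) for tok in tokens)

for name, text in DOCS.items():
    print(name, round(keyword_score(text, PROFILE), 2))
```

Under this rule doc1 scores 0.15, doc2 scores 0.03 and doc3 scores 0.13. doc2 is about artificial intelligence, but it scores low for two reasons: the multi-word concept is split into two weak keywords, and the text does not match the synonym "AI". doc3 scores high on "apple" whatever sense of the word is meant.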

Page 11

ITem Recommender (ITR)

- Advanced NLP techniques are used to represent documents
- Naïve Bayes text classification assigns a score (level of interest) to items according to the user preferences
- Result: a semantic user profile, i.e. a binary text classifier (user-likes vs. user-dislikes) containing the probabilistic model of user preferences

Page 12

ITem Recommender (ITR)

Page 13

Word Sense Disambiguation (WSD)

- The process of deciding which sense of a word is used in a specific context
- WordNet as sense inventory: nouns, verbs, adverbs and adjectives organized into SYNonym SETs (synsets), each one representing an underlying lexical concept
- Change of text representation from vectors (bags) of words (BOW) into vectors (bags) of synsets (BOS)

Page 14

JIGSAW WSD algorithm

- Three different strategies, one for nouns, one for verbs, and one for adjectives and adverbs: the effectiveness of WSD is strongly influenced by the POS tag of the target word
- Input: d = {w1, w2, …, wh}, a document
- Output: X = {s1, s2, …, sk} (k ≤ h)
  - Each si is obtained by disambiguating wi based on its context
  - k ≤ h because some words are not recognized by WordNet, and groups of words are recognized as a single concept

Page 15

JIGSAW nouns: the idea

- Adaptation of the Resnik algorithm
- Semantic similarity between synsets is inversely proportional to their distance in the WordNet IS-A hierarchy
- Path-length similarity between synsets is used to assign scores to the candidate synsets of a polysemous word

Page 16

Synset Semantic Similarity

Leacock-Chodorow similarity

$$SinSim(cat, mouse) = -\log\frac{5}{32} = 0.806$$

[Figure: path in the WordNet IS-A hierarchy from Cat (feline mammal) through Feline, felid; Carnivore; Placental mammal; and Rodent to Mouse (rodent), with path length 5]
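The value on the slide is consistent with the Leacock-Chodorow measure SinSim(a, b) = -log(N_p / 2D), where N_p is the length of the path between the two synsets and D is the maximum depth of the hierarchy. The minimal sketch below assumes a base-10 logarithm and D = 16, so that 2D = 32. Both are assumptions that reproduce the 0.806 shown on the slide, not values the slide states.

```python
import math

def leacock_chodorow(path_length: int, max_depth: int = 16) -> float:
    """SinSim = -log10(path_length / (2 * max_depth)).

    Assumptions: base-10 logarithm and max_depth = 16, which together
    reproduce the 0.806 on the slide (2 * 16 = 32).
    """
    if path_length < 1:
        raise ValueError("path_length must be >= 1")
    return -math.log10(path_length / (2 * max_depth))

# cat -> feline -> carnivore -> placental mammal -> rodent -> mouse: 5 steps
print(round(leacock_chodorow(5), 3))  # 0.806
```

Shorter paths give larger values, which matches the statement on Page 15 that similarity is inversely related to the distance in the IS-A hierarchy.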

Page 17

JIGSAW nouns

“The white cat is hunting the mouse”

w = cat, C = {mouse}

Wcat = {02037721, 00847815}: candidate synsets of cat
- 02037721: feline mammal…
- 00847815: computerized axial tomography…

T = {02244530, 03651364}: synsets of the context noun mouse
- 02244530: any of numerous small rodents…
- 03651364: a hand-operated electronic device …

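Page 18

JIGSAW nouns

“The white cat is hunting the mouse”

w = cat, C = {mouse}

Wcat = {02037721, 00847815}: candidate synsets of cat
- 02037721: feline mammal…
- 00847815: computerized axial tomography…

T = {02244530, 03651364}: synsets of the context noun mouse
- 02244530: any of numerous small rodents…
- 03651364: a hand-operated electronic device …

The slide shows four similarity scores between the candidate synsets of cat and the synsets in T: 0.806, 0.107, 0.0 and 0.0. The highest, 0.806, is SinSim between 02037721 (feline mammal) and 02244530 (rodent), computed on Page 16, so the feline-mammal sense of cat gets the highest score.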
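The selection step can be sketched in simplified form. Each candidate synset of the target noun gets a score: for every context noun, take the candidate's best similarity to any sense of that noun, then sum these values. The highest-scoring candidate wins. The actual JIGSAW nouns procedure, an adaptation of Resnik's algorithm, includes details not shown here. The function names are hypothetical. Only the 0.806 similarity comes from the slides; all other pairs default to 0.0 purely for illustration.

```python
from typing import Callable, Dict, List

def disambiguate_noun(
    candidates: List[str],
    context_senses: Dict[str, List[str]],
    sim: Callable[[str, str], float],
) -> str:
    """Pick the candidate synset with the highest summed best-match similarity."""
    def score(candidate: str) -> float:
        # For each context noun, keep the best similarity over its senses, then sum.
        return sum(
            max((sim(candidate, s) for s in senses), default=0.0)
            for senses in context_senses.values()
        )
    return max(candidates, key=score)

# Toy similarity table: only the 0.806 value comes from the slides;
# every other pair defaults to 0.0 here purely for illustration.
TOY_SIM = {("02037721", "02244530"): 0.806}

def toy_sim(a: str, b: str) -> float:
    return TOY_SIM.get((a, b), 0.0)

w_cat = ["02037721", "00847815"]               # feline mammal, computerized axial tomography
context = {"mouse": ["02244530", "03651364"]}  # rodent, hand-operated electronic device
print(disambiguate_noun(w_cat, context, toy_sim))  # 02037721
```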

Page 19

JIGSAW verbs: synset description

Description of synset si = gloss + example phrases in WordNet for si

Glosses

Page 20

JIGSAW verbs: synset description

Description of synset si = gloss + example phrases in WordNet for si

Example phrases

Page 21

JIGSAW verbs: the idea

- It tries to establish a relation between verbs and nouns, which are not directly linked in WordNet
- Verb w is disambiguated using:
  - the nouns in the context of w
  - the nouns in the description of each candidate synset for w

Page 22

JIGSAW verbs: Example (1/4)

I play basketball and soccer

w = play, N = {basketball, soccer}

1. (70) play -- (participate in games or sport; "We played hockey all afternoon"; "play cards"; "Pele played for the Brazilian teams in many important matches")

2. (29) play -- (play on an instrument; "The band played all night long")

3. …

nouns(play,1): game, sport, hockey, afternoon, card, team, match

nouns(play,2): instrument, band, night

…

nouns(play,35): …

Page 23

JIGSAW verbs: Example (2/4)

w = play, N = {basketball, soccer}

nouns(play,1): game, sport, hockey, afternoon, card, team, match

Each noun wi in nouns(play,1) is compared through all of its senses (game1, game2, …, gamek; sport1, sport2, …, sportk; …) with all the senses of basketball (basketball1, …, basketballh):

$$MAX_{basketball} = \max_{w_i \in nouns(play,1)} SinSim(w_i, basketball)$$

Page 24

JIGSAW verbs: Example (3/4)

w = play, N = {basketball, soccer}

nouns(play,1): game, sport, hockey, afternoon, card, team, match

Each noun wi in nouns(play,1) is compared through all of its senses (game1, game2, …, gamek; sport1, sport2, …, sportk; …) with all the senses of soccer (soccer1, …, soccerh):

$$MAX_{soccer} = \max_{w_i \in nouns(play,1)} SinSim(w_i, soccer)$$

Page 25

JIGSAW verbs: Example (4/4)

nouns(play,1) → MAX_basketball, MAX_soccer → Φ(play,1) = weighted average of the MAX values, taking into account the position of each word in the context with respect to the verb

...

nouns(play,i) → Φ(play,i)

$$\text{synset assigned to } play = \arg\max_i \Phi(play, i)$$
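The scoring of the verb senses can be sketched as follows. Two assumptions are not taken from the slides. First, the position weight of a context noun is 1/distance from the verb. Second, the word-level similarity values in TOY_SIM are made-up numbers used only to exercise the code; in JIGSAW they would come from SinSim. The function names are hypothetical.

```python
from typing import Callable, Dict, List

Sim = Callable[[str, str], float]

def phi(sense_nouns: List[str], context: Dict[str, int], sim: Sim) -> float:
    """Phi(verb, i): position-weighted average of the MAX values for one sense.

    context maps each context noun to its distance (in words) from the verb.
    Assumption: the weight of a noun is 1 / distance, so closer nouns count more.
    """
    weighted_sum, weight_total = 0.0, 0.0
    for noun, distance in context.items():
        # MAX_noun: best similarity between the noun and any noun in the sense description
        best = max((sim(w, noun) for w in sense_nouns), default=0.0)
        weight = 1.0 / distance
        weighted_sum += weight * best
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0

def disambiguate_verb(senses: Dict[int, List[str]], context: Dict[str, int], sim: Sim) -> int:
    """Return the sense number i that maximizes Phi(verb, i)."""
    return max(senses, key=lambda i: phi(senses[i], context, sim))

# Made-up similarity values, for illustration only (not from the slides).
TOY_SIM = {("sport", "basketball"): 0.7, ("sport", "soccer"): 0.7,
           ("game", "basketball"): 0.6, ("game", "soccer"): 0.6}

def toy_sim(a: str, b: str) -> float:
    return TOY_SIM.get((a, b), 0.0)

senses = {
    1: ["game", "sport", "hockey", "afternoon", "card", "team", "match"],
    2: ["instrument", "band", "night"],
}
context = {"basketball": 1, "soccer": 3}  # "I play basketball and soccer"
print(disambiguate_verb(senses, context, toy_sim))  # 1
```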

Page 26

JIGSAW others

- Based on the Lesk algorithm
- Similarity between the glosses of each candidate sense of the target word and the glosses of the words in the context

Page 27

JIGSAW others: Example (1/5)

w = aged, N = {bottle, wine}: “I bought a bottle of aged wine”

Candidate synsets for the target word

1. {01703749} aged, elderly, older, senior -- (advanced in years; "aged members of the society"; "elderly residents could remember the construction of the first skyscraper"; "senior citizen")

2. {01546830} aged, ripened - (of wines, fruit, cheeses; having reached a desired or final condition; "mature well-aged cheeses")

Page 28

JIGSAW others: Example (2/5)

w = aged, N = {bottle, wine}: “I bought a bottle of aged wine”

Keep glosses of candidate synsets

1. {01703749} aged, elderly, older, senior -- (advanced in years; "aged members of the society"; "elderly residents could remember the construction of the first skyscraper"; "senior citizen")

2. {01546830} aged, ripened - (of wines, fruit, cheeses; having reached a desired or final condition; "mature well-aged cheeses")

Page 29

JIGSAW others: Example (2/5)

w = aged, N = {bottle, wine}: “I bought a bottle of aged wine”

Keep glosses of each word in the context

1. {02848798} bottle -- (a glass or plastic vessel used for storing drinks or other liquids; typically cylindrical without handles and with a narrow neck that can be plugged or capped)

2. {13584548} bottle, bottleful -- (the quantity contained in a bottle) …

Page 30

JIGSAW others: Example (2/5)

w = aged, N = {bottle, wine}: “I bought a bottle of aged wine”

1. {02848798} bottle -- (a glass or plastic vessel used for storing drinks or other liquids; typically cylindrical without handles and with a narrow neck that can be plugged or capped)

2. {13584548} bottle, bottleful -- (the quantity contained in a bottle) …

1. {07784932} wine, vino -- (fermented juice (of grapes especially))

2. {04907195} wine, wine-colored -- (a red as dark as red wine)

Page 31

JIGSAW others: Example (3/5)

w = aged, N = {bottle, wine}: “I bought a bottle of aged wine”

1. {02848798} bottle -- (a glass or plastic vessel used for storing drinks or other liquids; typically cylindrical without handles and with a narrow neck that can be plugged or capped)

2. {13584548} bottle, bottleful -- (the quantity contained in a bottle) …

+

1. {07784932} wine, vino -- (fermented juice (of grapes especially))

2. {04907195} wine, wine-colored -- (a red as dark as red wine)

= Gloss of the whole context:

a glass or plastic vessel used for storing drinks or other liquids typically cylindrical without handles and with a narrow neck that can be plugged or capped the quantity contained in a bottle fermented juice (of grapes especially) a red as dark as red wine

Page 32

JIGSAW others: Example (4/5)

w = aged, N = {bottle, wine}: “I bought a bottle of aged wine”

1. {01703749} aged, elderly, older, senior -- (advanced in years; "aged members of the society"; "elderly residents could remember the construction of the first skyscraper"; "senior citizen")

2. {01546830} aged, ripened - (of wines, fruit, cheeses; having reached a desired or final condition; "mature well-aged cheeses")

Gloss of the whole context: a glass or plastic vessel used for storing drinks or other liquids typically cylindrical without handles and with a narrow neck that can be plugged or capped the quantity contained in a bottle fermented juice (of grapes especially) a red as dark as red wine

Overlap between glosses

No overlap

Page 33

JIGSAW others: Example (4/5)

w = aged, N = {bottle, wine}: “I bought a bottle of aged wine”

1. {01703749} aged, elderly, older, senior -- (advanced in years; "aged members of the society"; "elderly residents could remember the construction of the first skyscraper"; "senior citizen")

2. {01546830} aged, ripened - (of wines, fruit, cheeses; having reached a desired or final condition; "mature well-aged cheeses")

Gloss of the whole context: a glass or plastic vessel used for storing drinks or other liquids typically cylindrical without handles and with a narrow neck that can be plugged or capped the quantity contained in a bottle fermented juice (of grapes especially) a red as dark as red wine

Overlap

Page 34

JIGSAW others: Example (5/5)

w = aged, N = {bottle, wine}: “I bought a bottle of aged wine”

1. {01703749} aged, elderly, older, senior -- (advanced in years; "aged members of the society"; "elderly residents could remember the construction of the first skyscraper"; "senior citizen")

2. {01546830} aged, ripened - (of wines, fruit, cheeses; having reached a desired or final condition; "mature well-aged cheeses")

Gloss of the whole context: a glass or plastic vessel used for storing drinks or other liquids typically cylindrical without handles and with a narrow neck that can be plugged or capped the quantity contained in a bottle fermented juice (of grapes especially) a red as dark as red wine

Selected synset: 01546830
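The gloss-overlap step can be sketched with a simplified Lesk procedure. The stopword list and the crude plural stripping are assumptions made to keep the example self-contained; they are not taken from JIGSAW. With these assumptions, only the second candidate shares a content word ("wine", via "wines") with the gloss of the whole context. This agrees with the synset selected on the slide.

```python
import re
from typing import Dict, Set

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "as", "be", "can", "for", "in", "of", "or",
             "other", "that", "the", "with"}

def content_words(text: str) -> Set[str]:
    """Lowercase, tokenize, drop stopwords and crudely strip a plural 's'."""
    words = set()
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOPWORDS:
            continue
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]  # naive stemming: wines -> wine, cheeses -> cheese
        words.add(tok)
    return words

def overlap(description: str, context_gloss: str) -> int:
    """Number of distinct content words shared by the two texts."""
    return len(content_words(description) & content_words(context_gloss))

def lesk_choose(candidates: Dict[str, str], context_gloss: str) -> str:
    """Pick the candidate synset whose description overlaps most with the context gloss."""
    return max(candidates, key=lambda sid: overlap(candidates[sid], context_gloss))

candidates = {
    "01703749": "advanced in years; aged members of the society; elderly residents "
                "could remember the construction of the first skyscraper; senior citizen",
    "01546830": "of wines, fruit, cheeses; having reached a desired or final condition; "
                "mature well-aged cheeses",
}
context_gloss = (
    "a glass or plastic vessel used for storing drinks or other liquids typically "
    "cylindrical without handles and with a narrow neck that can be plugged or capped "
    "the quantity contained in a bottle fermented juice (of grapes especially) "
    "a red as dark as red wine"
)
print(lesk_choose(candidates, context_gloss))  # 01546830
```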

Page 35

Paper Recommending

Instance (paper): slots Title, Authors, Abstract

- Tokenization + stopword removal + stemming → keyword-based representation (BOW)
- Tokenization + stopword removal + POS tagging + disambiguation → sense-based representation (BOS)

Content-based recommendations by learning from TEXT and USER RATINGS (1-5) on papers

Page 36

An example of BOS-generated Profile

Page 37

Conference Participant Advisor: Login

Conference Participant Advisor service

Page 38

Conference Participant Advisor: Selecting Papers to train the system

Page 39

Conference Participant Advisor: Query disambiguation

Page 40

Conference Participant Advisor: Rating Retrieved Papers

Page 41

Conference Participant Advisor: Getting the Personalized Program

Page 42

Personalized Program delivered by mail

1 - personalized conference program

2 - details about recommended papers

Page 43

Conference Participant Advisor: Personalized Program + Paper details

Page 44

Experimental Evaluation

- Experiments: BOW-generated profiles vs. BOS-generated profiles
- ISWC dataset: 100 papers accepted at ISWC 02-03, with 288 ratings collected by 11 users
- 5-fold stratified cross-validation; metrics: Precision, Recall, F-measure, NDPM
  - A paper is relevant if its rating is > 3
  - A paper is classified as liked if the probability of class “likes” is > 0.5
- Wilcoxon signed rank test: the classification for each user is a trial (low number of independent trials); significance level p < 0.05
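The two thresholds above can be turned into per-user metrics as sketched below. The function names and the toy ratings and probabilities are hypothetical. NDPM is computed with its usual definition over item pairs, (2C⁻ + Cᵘ) / (2Cⁱ):

- C⁻ counts the pairs the system orders opposite to the user.
- Cᵘ counts the pairs the user orders but the system ties.
- Cⁱ counts the pairs the user orders.

NDPM is 0 for full agreement and 1 for full disagreement. Whether the original experiments used exactly this variant is an assumption.

```python
from itertools import combinations
from typing import List, Tuple

def precision_recall_f1(ratings: List[int], p_likes: List[float]) -> Tuple[float, float, float]:
    """A paper is relevant if rating > 3 and predicted relevant if P(likes) > 0.5."""
    relevant = [r > 3 for r in ratings]
    predicted = [p > 0.5 for p in p_likes]
    tp = sum(1 for rel, pred in zip(relevant, predicted) if rel and pred)
    n_pred, n_rel = sum(predicted), sum(relevant)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_rel if n_rel else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def ndpm(ratings: List[int], scores: List[float]) -> float:
    """(2*C_minus + C_u) / (2*C_i) over all pairs of items; lower is better."""
    c_minus = c_u = c_i = 0
    for (r1, s1), (r2, s2) in combinations(zip(ratings, scores), 2):
        if r1 == r2:
            continue  # the user expresses no preference between these two items
        c_i += 1
        if s1 == s2:
            c_u += 1  # the system ties a pair the user orders
        elif (r1 - r2) * (s1 - s2) < 0:
            c_minus += 1  # the system contradicts the user's order
    return (2 * c_minus + c_u) / (2 * c_i) if c_i else 0.0

# Hypothetical toy data for one user (not from the experiments).
ratings = [5, 4, 2, 1]
p_likes = [0.9, 0.4, 0.6, 0.1]
print(precision_recall_f1(ratings, p_likes))  # (0.5, 0.5, 0.5)
print(ndpm(ratings, p_likes))                 # one contradicted pair out of six
```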

Page 45

Results of Semantic Profiles Evaluation

UserId   Precision     Recall        F1            NDPM
         BOW   BOS     BOW   BOS     BOW   BOS     BOW   BOS
1        0.57  0.55    0.47  0.50    0.51  0.53    0.60  0.56
2        0.73  0.55    0.70  0.83    0.72  0.67    0.43  0.46
3        0.60  0.57    0.35  0.35    0.44  0.43    0.55  0.59
4        0.60  0.53    0.30  0.43    0.40  0.48    0.47  0.47
5        0.58  0.67    0.65  0.53    0.61  0.59    0.39  0.59
6        0.93  0.96    0.83  0.83    0.88  0.89    0.46  0.36
7        0.55  0.90    0.60  0.60    0.58  0.72    0.45  0.48
8        0.74  0.65    0.63  0.62    0.68  0.63    0.37  0.33
9        0.60  0.54    0.63  0.73    0.62  0.62    0.31  0.27
10       0.50  0.70    0.37  0.50    0.42  0.58    0.51  0.48
11       0.55  0.45    0.83  0.70    0.67  0.55    0.38  0.33
Mean     0.63  0.64    0.58  0.60    0.59  0.61    0.45  0.45
BOS vs. BOW    +1%           +2%           +2%           =

Page 46

Conclusions & Future Work

- Conference Participant Advisor: an intelligent service relying on concept-based profiles, with WSD based on a linguistic ontology
- As future work, integration of:
  - domain-specific ontologies in the process of semantic representation and indexing of documents
  - social networks of conference participants as an additional source of information

Page 47

Service details

Service deployed in the VIKEF project at: http://193.204.187.223:8080/iswc_rebuild/

Page 48

Page 49

Backup slides

Page 50

Bag of Words vs. Bag of Synsets

Bag of Words
Doc_id   Word form      Occurrences
31       artificial     1
31       intelligence   1
…        …              …
1134     WWW            3
1134     web            2
…        …              …

Bag of Synsets
Doc_id   Word form                 Synset_id   Occurrences
31       artificial intelligence   6712568     1
…        …                         …           …
1134     roll                      2051720     3
1134     wheel                     2051720     2
…        …                         …           …
1134     WWW, web                  04425517    5

- Reduction of features
- Recognition of bigrams
- Synonyms represented by the same synsets
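Once each token has been disambiguated, building the BOS amounts to counting occurrences per synset instead of per word form. The sketch below assumes that the (word form, synset id) pairs have already been produced by the WSD step, including the recognition of the bigram "artificial intelligence" as one concept. It reuses the document 1134 and document 31 rows from the table. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

def bag_of_words(forms: Iterable[str]) -> Counter:
    """Occurrences per word form (BOW)."""
    return Counter(forms)

def bag_of_synsets(tagged: Iterable[Tuple[str, str]]) -> Counter:
    """Occurrences per synset id (BOS), from (word form, synset id) pairs."""
    return Counter(synset_id for _form, synset_id in tagged)

# Document 1134: "WWW" occurs 3 times and "web" 2 times; both map to 04425517.
doc_1134 = [("WWW", "04425517")] * 3 + [("web", "04425517")] * 2
print(bag_of_words(form for form, _sid in doc_1134))  # two features: WWW, web
print(bag_of_synsets(doc_1134))                       # one feature: 04425517

# Document 31: the bigram is assumed to be recognized as one concept by the WSD step.
doc_31 = [("artificial intelligence", "6712568")]
print(bag_of_synsets(doc_31))
```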

Page 51

Classification Phase

- Each document is represented as a vector of BOS, one for each slot
- Each slot is independent of the others

$$P(c_j \mid d_i) = \frac{P(c_j)}{P(d_i)} \prod_{m=1}^{|S|} \prod_{k=1}^{|b_{im}|} P(t_k \mid c_j, s_m)^{n_{kim}}$$

S = {s1, s2, …, s|S|} is the set of slots

tk is the kth token (occurring nkim times in BOS bim)

bim is the BOS in slot sm of instance di
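The classification step can be sketched in log space, which avoids numerical underflow from the long product over slots and tokens. P(d_i) is the same for every class, so the sketch obtains the posteriors by normalizing over the classes. Several elements are assumptions for illustration: the data structures, the class labels, the made-up probabilities and the small `unseen` fallback. In the training phase, unseen tokens are handled by smoothing instead.

```python
import math
from typing import Dict

Doc = Dict[str, Dict[str, int]]  # slot s_m -> {token t_k: n_kim}

def classify(
    doc: Doc,
    priors: Dict[str, float],                      # class -> P(c_j)
    cond: Dict[str, Dict[str, Dict[str, float]]],  # class -> slot -> {token: P(t_k | c_j, s_m)}
    unseen: float = 1e-6,                          # hypothetical fallback for missing estimates
) -> Dict[str, float]:
    """Return P(c_j | d_i) for every class, normalized so the values sum to 1."""
    log_scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for slot, bag in doc.items():
            for token, n in bag.items():
                p = cond[c].get(slot, {}).get(token, unseen)
                score += n * math.log(p)  # log of P(t_k | c_j, s_m) ** n_kim
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())        # plays the role of P(d_i)
    return {c: v / z for c, v in unnorm.items()}

# Made-up probabilities, for illustration only.
priors = {"likes": 0.5, "dislikes": 0.5}
cond = {
    "likes": {"title": {"04425517": 0.3}},
    "dislikes": {"title": {"04425517": 0.1}},
}
doc = {"title": {"04425517": 2}}
posterior = classify(doc, priors, cond)
print(posterior["likes"] > 0.5)  # True: the paper would be classified as liked
```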

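Page 52

Training Phase

C = {c+, c-}: c+ = likes (ratings 4-5), c- = dislikes (ratings 1-2); a rating of 3 is neutral

User ratings r_i are turned into weighted instances w_i:

$$w_i^{+} = \frac{r_i - 1}{MAX - 1}, \qquad w_i^{-} = 1 - w_i^{+}$$

$$\hat{P}(c_j) = \frac{\sum_{i=1}^{|TR|} w_i^{j} + 1}{|TR| + 2}$$

$$N(t_k, c_j, s_m) = \sum_{i=1}^{|TR|} w_i^{j} \, n_{kim}$$

$$\hat{P}(t_k \mid c_j, s_m) = \begin{cases} \dfrac{N(t_k, c_j, s_m)}{|V_{c_j}| + \sum_{h=1}^{|V|} N(t_h, c_j, s_m)} & \text{if } N(t_k, c_j, s_m) \neq 0 \\[2ex] \dfrac{|V_{c_j}|}{|V_{c_j}| + \sum_{h=1}^{|V|} N(t_h, c_j, s_m)} \cdot \dfrac{1}{|V| - |V_{c_j}|} & \text{otherwise} \end{cases}$$

where TR is the training set, MAX is the maximum rating, w_i^j is the weight of instance d_i for class c_j, |V| is the vocabulary size and |V_{c_j}| is the number of distinct tokens observed in class c_j.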
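The training estimates can be sketched under a few assumptions: ratings on a 1-5 scale (MAX = 5), and documents given as slot-to-token-count maps. Each rating contributes to both classes with weights w+ = (r - 1)/(MAX - 1) and w- = 1 - w+. The priors use add-one smoothing over the two classes. Token probabilities use a Witten-Bell-style estimate: a seen token gets N / (V_c + total). Unseen tokens share the reserved mass V_c / (V_c + total) equally, where V_c is the number of distinct tokens seen in the class and total is the summed weighted count. The `max(..., 1)` guard and the fallback for a class with no evidence are additions of this sketch, not part of the slide.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Doc = Dict[str, Dict[str, int]]  # slot s_m -> {token t_k: n_kim}
CLASSES = ("likes", "dislikes")

def instance_weights(rating: int, max_rating: int = 5) -> Dict[str, float]:
    """w_i^+ = (r_i - 1) / (MAX - 1) and w_i^- = 1 - w_i^+."""
    w_plus = (rating - 1) / (max_rating - 1)
    return {"likes": w_plus, "dislikes": 1.0 - w_plus}

def train(training_set: List[Tuple[Doc, int]]):
    """Estimate class priors and a smoothed P(t_k | c_j, s_m) from (doc, rating) pairs."""
    weight_sums = {c: 0.0 for c in CLASSES}
    # counts[c][slot][token] = N(t_k, c_j, s_m) = sum_i w_i^j * n_kim
    counts = {c: defaultdict(lambda: defaultdict(float)) for c in CLASSES}
    vocab: Dict[str, Set[str]] = defaultdict(set)  # slot -> distinct tokens seen in training
    for doc, rating in training_set:
        w = instance_weights(rating)
        for c in CLASSES:
            weight_sums[c] += w[c]
            for slot, bag in doc.items():
                for token, n in bag.items():
                    counts[c][slot][token] += w[c] * n
                    vocab[slot].add(token)

    # Smoothed prior: (sum_i w_i^j + 1) / (|TR| + 2)
    priors = {c: (weight_sums[c] + 1) / (len(training_set) + 2) for c in CLASSES}

    def cond_prob(token: str, c: str, slot: str) -> float:
        seen = counts[c][slot]
        v_c = sum(1 for x in seen.values() if x > 0)  # distinct tokens with N > 0
        total = sum(seen.values())                     # sum_h N(t_h, c_j, s_m)
        n = seen.get(token, 0.0)
        if n > 0:
            return n / (v_c + total)
        if v_c == 0:
            return 1.0 / max(len(vocab[slot]), 1)  # no evidence for this class and slot
        unseen_types = max(len(vocab[slot]) - v_c, 1)
        return v_c / (v_c + total) / unseen_types

    return priors, cond_prob

priors, cond_prob = train([({"title": {"a": 2}}, 5), ({"title": {"b": 1}}, 2)])
print(priors)
print(cond_prob("a", "likes", "title"), cond_prob("a", "dislikes", "title"))
```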

Page 53

Evaluation

- JIGSAW evaluated on the SENSEVAL-3 English Lexical Sample task: 37.6% Precision
- JIGSAW evaluated on the SENSEVAL-3 English All Words task: 52% Precision

SENSEVAL-3 English Lexical Sample task

Algorithm                 Precision
Lesk-based (nouns)        0.246
Lesk-based (verbs)        0.295
Lesk-based (adjectives)   0.403
JIGSAW nouns              0.319
JIGSAW verbs              0.405
JIGSAW others             0.403