Personalization in Digital Library: An Intelligent Service based on Semantic User Profiles
Department of Computer Science University of Bari - Italy
3rd Italian Research Conference on Digital Library Systems, Padova, Italy, 29-30 January 2007
Giovanni Semeraro Pasquale Lops
Marco Degemmis Pierpaolo Basile
Annalisa Gentile
Outline
Motivation
  Information overload in a scientific congress scenario
  Conference Participant Advisor Service
Profile-driven paper recommending
  User Profiles as Bayesian Text Classifiers
  User Profiles learned from documents semantically indexed through a WSD procedure [*]
Empirical Evaluation
Conclusions and Future Work
[*] Combining Learning and Word Sense Disambiguation for Intelligent User Profiling - IJCAI 2007
Motivation
Information overload in the scientific congress scenario
Web Personalization
Personalized systems adapt their behavior to individual users by learning user profiles: structured models of the user's interests that can be used to provide personalized content and services
Personalization is usually done automatically, based on the user profile and possibly on the profiles of other users with similar interests (collaborative approach)
How can personalization be used in the scientific congress scenario?
Web Personalization in the scientific congress scenario
Learn the research interests of participants from the papers they rated
Store research interests in personal profiles, which are used to build personalized programs delivered to participants
Learning User Profiles as a Text Categorization problem
OUR STRATEGY: content-based recommendations by learning from TEXT and USER FEEDBACK on items
Keyword-based profiles: problems
doc1: AI is a branch of computer science
doc2: the 2007 International Joint Conference on Artificial Intelligence will be held in India
doc3: apple launches a new product…
USER PROFILE (keyword, weight)
artificial 0.02
intelligence 0.01
apple 0.13
AI 0.15
…
MULTI-WORD CONCEPTS: "artificial" and "intelligence" are weighted as two separate keywords, although doc2 refers to the single concept Artificial Intelligence
SYNONYMY: "AI" (doc1) and "Artificial Intelligence" (doc2) denote the same concept, yet the keyword profile weights them as unrelated features
POLYSEMY: "apple" in doc3 refers to the company, but the same keyword also denotes the fruit, and the keyword profile cannot tell the two senses apart
ITem Recommender (ITR)
Advanced NLP techniques are used to represent documents
Naïve Bayes text classification assigns a score (level of interest) to items according to the user's preferences
Result: a semantic user profile, i.e. a binary text classifier (user-likes vs. user-dislikes) containing the probabilistic model of the user's preferences
Word Sense Disambiguation (WSD)
The process of deciding which sense of a word is used in a specific context
WordNet as sense inventory: nouns, verbs, adverbs and adjectives are organized into SYNonym SETs (synsets), each one representing an underlying lexical concept
Change of text representation from vectors (bags) of words (BOW) into vectors (bags) of synsets (BOS)
JIGSAW WSD algorithm
Three different strategies: one for nouns, one for verbs, one for adjectives and adverbs, because the effectiveness of WSD is strongly influenced by the POS tag of the target word
Input: d = {w1, w2, …, wh} document
Output: X = {s1, s2, …, sk} (k ≤ h)
Each si is obtained by disambiguating wi based on its context
k ≤ h because some words are not recognized by WordNet and some groups of words are recognized as a single concept
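The input/output contract above can be sketched as follows. This is a minimal illustration that assumes a hypothetical sense inventory (`lexicon`) keyed by (lemma, POS) and one placeholder strategy for every POS. `jigsaw_like`, `group_multiwords`, `first_sense` and the synset ids are invented names, not JIGSAW's actual code. The example shows why k ≤ h: a two-word concept yields a single synset, and words missing from the inventory yield none.

```python
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, str]            # (lemma, POS tag)
Lexicon = Dict[Token, List[str]]   # hypothetical inventory: (lemma, POS) -> synset ids
Strategy = Callable[[str, List[str], List[Token]], str]


def group_multiwords(tokens: List[Token], lexicon: Lexicon) -> List[Token]:
    """Merge two adjacent tokens when their joined lemma is a known concept.

    The POS of the second token is used for the joined lemma (a simplification).
    """
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            joined = (tokens[i][0] + "_" + tokens[i + 1][0], tokens[i + 1][1])
            if joined in lexicon:
                out.append(joined)
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return out


def jigsaw_like(tokens: List[Token], lexicon: Lexicon,
                strategies: Dict[str, Strategy]) -> List[str]:
    """Map d = {w1..wh} to X = {s1..sk}, k <= h, choosing a strategy by POS."""
    words = group_multiwords(tokens, lexicon)
    synsets: List[str] = []
    for lemma, pos in words:
        candidates = lexicon.get((lemma, pos))
        if not candidates:  # not in the sense inventory: no synset is produced
            continue
        strategy = strategies.get(pos, strategies["OTHER"])  # adjectives/adverbs
        synsets.append(strategy(lemma, candidates, words))
    return synsets


def first_sense(lemma: str, candidates: List[str], context: List[Token]) -> str:
    """Placeholder; the noun, verb and 'others' algorithms would plug in here."""
    return candidates[0]


lexicon: Lexicon = {  # hypothetical synset ids
    ("artificial_intelligence", "NOUN"): ["S_AI"],
    ("branch", "NOUN"): ["S_BRANCH_1", "S_BRANCH_2"],
    ("computer_science", "NOUN"): ["S_CS"],
}
doc = [("artificial", "ADJ"), ("intelligence", "NOUN"), ("be", "VERB"),
       ("a", "DET"), ("branch", "NOUN"), ("of", "ADP"),
       ("computer", "NOUN"), ("science", "NOUN")]
strategies = {"NOUN": first_sense, "VERB": first_sense, "OTHER": first_sense}
print(jigsaw_like(doc, lexicon, strategies))  # ['S_AI', 'S_BRANCH_1', 'S_CS']
```

Dispatching on the POS tag keeps each strategy independent, which matches the slide's point that different parts of speech need different disambiguation methods.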
JIGSAW nouns: the idea
Adaptation of the Resnik algorithm
Semantic similarity between synsets is inversely proportional to their distance in the WordNet IS-A hierarchy
Path-length similarity between synsets is used to assign scores to the candidate synsets of a polysemous word
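Synset Semantic Similarity
Leacock-Chodorow similarity:
$$\mathrm{SinSim}(a, b) = -\log_{10} \frac{N_p}{2D}$$
N_p: length of the IS-A path between synsets a and b; D: maximum depth of the WordNet hierarchy (2D = 32 here)
IS-A path between cat and mouse: Cat (feline mammal) → Feline, felid → Carnivore → Placental mammal ← Rodent ← Mouse (rodent), so N_p = 5
$$\mathrm{SinSim}(cat, mouse) = -\log_{10} \frac{5}{32} \approx 0.806$$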
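The arithmetic above can be checked with a few lines of code. This is a sketch, and it makes two assumptions: a base-10 logarithm and D = 16, which are the values that reproduce the 5/32 and 0.806 on the slide. `sinsim` is a hypothetical helper name.

```python
import math


def sinsim(path_length: int, max_depth: int = 16) -> float:
    """Leacock-Chodorow similarity: -log10(path_length / (2 * max_depth)).

    path_length: number of IS-A edges between the two synsets (5 for cat/mouse).
    max_depth: depth D of the taxonomy; D = 16 gives the 32 used on the slide.
    """
    if path_length <= 0:
        raise ValueError("path_length must be positive")
    return -math.log10(path_length / (2 * max_depth))


print(round(sinsim(5), 3))  # 0.806
```

Shorter paths give larger values, so a similarity that is "inversely proportional to distance" comes out of the log ratio. Libraries differ on whether they count path length in edges or in nodes, so absolute values may not match other implementations.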
JIGSAW nouns
"The white cat is hunting the mouse"
w = cat, C = {mouse}
Candidate synsets of cat: Wcat = {02037721, 00847815}
02037721: feline mammal…
00847815: computerized axial tomography…
Synsets of the context noun mouse: T = {02244530, 03651364}
02244530: any of numerous small rodents…
03651364: a hand-operated electronic device …
JIGSAW nouns (continued)
SinSim is computed between each candidate synset in Wcat and each synset in T. The highest score, 0.806, links 02037721 (feline mammal) and 02244530 (rodent), the cat/mouse pair of the Leacock-Chodorow example; the remaining pairs score 0.107, 0.0 and 0.0.
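Below is a simplified sketch of the noun step. For every context noun, each candidate synset of the target gets the best similarity it reaches with one of that noun's synsets, and the candidate with the highest total wins. It is not JIGSAW's exact scoring, which the slides do not fully specify. `score_candidates` and `SIM` are hypothetical. The slides give the 0.806 pairing; which pairs receive 0.107 and the two 0.0 values is our assumption.

```python
from typing import Dict, List, Tuple

# Similarities between synset pairs; only the 0.806 pairing comes from the slides.
SIM: Dict[Tuple[str, str], float] = {
    ("02037721", "02244530"): 0.806,  # cat (feline mammal) vs. mouse (rodent)
    ("02037721", "03651364"): 0.0,    # cat (feline mammal) vs. mouse (device)
    ("00847815", "02244530"): 0.0,    # CAT scan vs. mouse (rodent)
    ("00847815", "03651364"): 0.107,  # CAT scan vs. mouse (device), assumed
}


def score_candidates(candidates: List[str],
                     context_senses: Dict[str, List[str]]) -> Dict[str, float]:
    """Sum, over context nouns, of each candidate's best similarity to that noun's synsets."""
    return {
        cand: sum(max(SIM.get((cand, t), 0.0) for t in senses)
                  for senses in context_senses.values())
        for cand in candidates
    }


w_cat = ["02037721", "00847815"]
context = {"mouse": ["02244530", "03651364"]}  # T, the synsets of the context noun
scores = score_candidates(w_cat, context)
best = max(scores, key=scores.get)
print(best, scores)  # 02037721 {'02037721': 0.806, '00847815': 0.107}
```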
JIGSAW verbs: synset description
Description of synset si = gloss + example phrases in WordNet for si
JIGSAW verbs: the idea
Establish a relation between verbs and nouns, which are not directly linked in WordNet
Verb w is disambiguated using:
the nouns in the context of w
the nouns in the description of each candidate synset for w
JIGSAW verbs: Example (1/4)
"I play basketball and soccer"
w = play, N = {basketball, soccer}
1. (70) play -- (participate in games or sport; "We played hockey all afternoon"; "play cards"; "Pele played for the Brazilian teams in many important matches")
2. (29) play -- (play on an instrument; "The band played all night long")
3. …
nouns(play,1): game, sport, hockey, afternoon, card, team, match
nouns(play,2): instrument, band, night
…
nouns(play,35): …
JIGSAW verbs: Example (2/4)
w = play, N = {basketball, soccer}
nouns(play,1): game, sport, hockey, afternoon, card, team, match
The synsets of each noun wi in nouns(play,1) (game_1 … game_k, sport_1 … sport_k, …) are compared with the synsets of basketball (basketball_1 … basketball_h), and the best match is kept:
$$\mathrm{MAX}_{basketball} = \max_{w_i \in nouns(play,1)} \mathrm{SinSim}(w_i, basketball)$$
JIGSAW verbs: Example (3/4)
The same computation is done for the context noun soccer (synsets soccer_1 … soccer_h):
$$\mathrm{MAX}_{soccer} = \max_{w_i \in nouns(play,1)} \mathrm{SinSim}(w_i, soccer)$$
JIGSAW verbs: Example (4/4)
Φ(play,1) = weighted average of MAX_basketball and MAX_soccer, taking into account the position of each word in the context with respect to the verb
The same is done for every sense i: nouns(play,i) → Φ(play,i)
$$\text{synset assigned to } play = \arg\max_i \Phi(play, i)$$
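The verb procedure can be sketched like this. The similarity values are invented for illustration and already take the maximum over each word's synsets. The position weighting is also an assumption: the slide only says that position relative to the verb matters, and here we use 1 / distance. `phi`, `toy_sim` and `TOY_SIM` are hypothetical names.

```python
from typing import Callable, Dict, List

# Invented word-level similarities, for illustration only.
TOY_SIM = {
    ("sport", "basketball"): 0.9, ("game", "basketball"): 0.7,
    ("sport", "soccer"): 0.9, ("game", "soccer"): 0.7,
    ("instrument", "basketball"): 0.1, ("band", "soccer"): 0.1,
}


def toy_sim(a: str, b: str) -> float:
    """Symmetric lookup; unknown pairs score 0."""
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))


def phi(sense_nouns: List[str], context: Dict[str, int], verb_pos: int,
        sim: Callable[[str, str], float]) -> float:
    """Weighted average over context nouns c of MAX_c = max_{w in sense_nouns} sim(w, c).

    The weight 1 / |distance to the verb| is an assumed stand-in for the
    position-based weighting mentioned on the slide.
    """
    num = den = 0.0
    for noun, pos in context.items():
        max_c = max((sim(w, noun) for w in sense_nouns), default=0.0)
        weight = 1.0 / abs(pos - verb_pos)
        num += weight * max_c
        den += weight
    return num / den if den else 0.0


nouns_of_sense = {
    1: ["game", "sport", "hockey", "afternoon", "card", "team", "match"],
    2: ["instrument", "band", "night"],
}
context = {"basketball": 2, "soccer": 4}  # token positions in "I play basketball and soccer"
scores = {i: phi(ns, context, 1, toy_sim) for i, ns in nouns_of_sense.items()}
best = max(scores, key=scores.get)
print(best, scores)  # sense 1 (games or sport) wins
```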
JIGSAW others
Based on the Lesk algorithm: similarity between the glosses of each candidate sense of the target word and the glosses of the words in the context
JIGSAW others: Example (1/5)
"I bought a bottle of aged wine"
w = aged, N = {bottle, wine}
Candidate synsets for the target word:
1. {01703749} aged, elderly, older, senior -- (advanced in years; "aged members of the society"; "elderly residents could remember the construction of the first skyscraper"; "senior citizen")
2. {01546830} aged, ripened -- (of wines, fruit, cheeses; having reached a desired or final condition; "mature well-aged cheeses")
…
JIGSAW others: Example (2/5)
Keep the glosses of the candidate synsets
Keep the glosses of each word in the context
1. {02848798} bottle -- (a glass or plastic vessel used for storing drinks or other liquids; typically cylindrical without handles and with a narrow neck that can be plugged or capped)
2. {13584548} bottle, bottleful -- (the quantity contained in a bottle) …
1. {07784932} wine, vino -- (fermented juice (of grapes especially))
2. {04907195} wine, wine-colored -- (a red as dark as red wine)
JIGSAW others: Example (3/5)
Gloss of the whole context = glosses of bottle + glosses of wine:
a glass or plastic vessel used for storing drinks or other liquids typically cylindrical without handles and with a narrow neck that can be plugged or capped the quantity contained in a bottle fermented juice (of grapes especially) a red as dark as red wine
JIGSAW others: Example (4/5)
Overlap between the gloss of each candidate synset of aged and the gloss of the whole context:
1. {01703749} aged, elderly, older, senior: no overlap
2. {01546830} aged, ripened: overlap
JIGSAW others: Example (5/5)
Selected synset: 01546830 (aged, ripened)
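The gloss-overlap step can be reproduced with a small Lesk-style sketch. It uses only the glosses, without the example phrases, a small stopword list and a crude plural-stripping rule. These preprocessing choices are assumptions and need not match JIGSAW's. `bag`, `lesk_pick`, `CANDIDATES` and `CONTEXT_GLOSS` are hypothetical names; the gloss texts are copied from the example.

```python
import re
from typing import Dict, Set

STOPWORDS = {"a", "an", "the", "of", "in", "or", "and", "as", "for", "to",
             "with", "without", "that", "can", "be", "other"}


def bag(text: str) -> Set[str]:
    """Lowercased content words; a trailing plural 's' is stripped (crude stemming)."""
    out = set()
    for w in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
        if w in STOPWORDS:
            continue
        if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
            w = w[:-1]
        out.add(w)
    return out


def lesk_pick(candidates: Dict[str, str], context_gloss: str) -> str:
    """Return the candidate synset whose gloss shares the most words with the context."""
    ctx = bag(context_gloss)
    overlap = {sid: len(bag(gloss) & ctx) for sid, gloss in candidates.items()}
    return max(overlap, key=overlap.get)


CANDIDATES = {
    "01703749": "advanced in years",
    "01546830": "of wines, fruit, cheeses; having reached a desired or final condition",
}
CONTEXT_GLOSS = (
    "a glass or plastic vessel used for storing drinks or other liquids typically "
    "cylindrical without handles and with a narrow neck that can be plugged or capped "
    "the quantity contained in a bottle fermented juice (of grapes especially) "
    "a red as dark as red wine"
)
print(lesk_pick(CANDIDATES, CONTEXT_GLOSS))  # 01546830
```

In this sketch the only shared word is "wine" (from "wines"). Without the plural stripping the overlap would be empty, which shows how much gloss-overlap methods depend on normalization.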
Paper Recommending
Each instance (paper) is split into three slots: Title, Authors, Abstract
Keyword-based representation (BOW): tokenization + stopword removal + stemming
Sense-based representation (BOS): tokenization + stopword removal + POS tagging + disambiguation
Content-based recommendations by learning from TEXT and USER RATINGS (1-5) on papers
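The BOW branch of this pipeline can be sketched per slot as follows. The suffix-stripping `stem` is a crude stand-in for a real stemmer such as Porter's. The stopword list, the slot keys and the example paper are all hypothetical, as are the function names.

```python
import re
from collections import Counter
from typing import Dict

STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or", "to",
             "is", "are", "from", "with", "by"}


def stem(word: str) -> str:
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if (word.endswith(suffix) and not word.endswith("ss")
                and len(word) - len(suffix) >= 4):
            return word[: -len(suffix)]
    return word


def slot_bow(text: str) -> Counter:
    """Tokenization + stopword removal + stemming for one slot."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(stem(t) for t in tokens if t not in STOPWORDS)


def paper_to_bow(paper: Dict[str, str]) -> Dict[str, Counter]:
    """One bag of words per slot (title, authors, abstract)."""
    return {slot: slot_bow(text) for slot, text in paper.items()}


paper = {  # hypothetical paper, for illustration only
    "title": "Learning Semantic User Profiles",
    "authors": "Jane Doe",
    "abstract": "User profiles are learned from rated papers",
}
for slot, bow in paper_to_bow(paper).items():
    print(slot, dict(bow))
```

Keeping a separate bag per slot lets the classifier treat a word in the title differently from the same word in the abstract. The BOS branch would replace `stem` with POS tagging and disambiguation.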
An example of a BOS-generated profile
Conference Participant Advisor service: Login
Conference Participant Advisor: Selecting papers to train the system
Conference Participant Advisor: Query disambiguation
Conference Participant Advisor: Rating retrieved papers
Conference Participant Advisor: Getting the personalized program
Personalized program delivered by mail
1 - personalized conference program
2 - details about recommended papers
Conference Participant Advisor: Personalized program + paper details
Experimental Evaluation
Experiments: BOW-generated profiles vs. BOS-generated profiles
ISWC dataset: 100 papers accepted at ISWC 02-03, 288 ratings collected from 11 users
5-fold stratified cross-validation; metrics: Precision, Recall, F-measure, NDPM
A paper is relevant if its rating is > 3; a paper is classified as liked if the probability of class "likes" is > 0.5
Wilcoxon signed-rank test: the classification for each user is a trial (low number of independent trials); significance level p < 0.05
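The evaluation rules above can be sketched as follows. Precision, recall and F1 use the stated thresholds (relevant if rating > 3, predicted "likes" if P(likes) > 0.5). NDPM is assumed to follow Yao's definition, (2C⁻ + Cᵘ) / (2Cⁱ), with the system ranking given by P(likes). The function names and sample numbers are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def prf(ratings: Sequence[int], p_likes: Sequence[float]) -> Tuple[float, float, float]:
    """Precision, recall and F1 with relevance = rating > 3, prediction = P(likes) > 0.5."""
    tp = fp = fn = 0
    for r, p in zip(ratings, p_likes):
        relevant, predicted = r > 3, p > 0.5
        tp += relevant and predicted
        fp += predicted and not relevant
        fn += relevant and not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def ndpm(user: Sequence[float], system: Sequence[float]) -> float:
    """NDPM = (2*C_minus + C_u) / (2*C_i): 0 = same order, 1 = reversed order."""
    c_minus = c_u = c_i = 0
    for a, b in combinations(range(len(user)), 2):
        du, ds = user[a] - user[b], system[a] - system[b]
        if du == 0:
            continue              # pairs the user ties are ignored
        c_i += 1                  # pair ordered by the user
        if ds == 0:
            c_u += 1              # system ties a pair the user orders
        elif (du > 0) != (ds > 0):
            c_minus += 1          # system contradicts the user
    return (2 * c_minus + c_u) / (2 * c_i) if c_i else 0.0


ratings = [5, 4, 2, 1]           # hypothetical ratings of four test papers
p_likes = [0.9, 0.4, 0.6, 0.1]   # hypothetical classifier outputs
print(prf(ratings, p_likes))     # (0.5, 0.5, 0.5)
print(ndpm(ratings, p_likes))    # one contradictory pair out of six: 1/6
```

Lower NDPM is better. This is why identical mean NDPM values (0.45) appear as "=" in the results rather than as a gain or loss.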
Results of Semantic Profiles Evaluation
UserId  Precision     Recall        F1            NDPM
        BOW   BOS     BOW   BOS     BOW   BOS     BOW   BOS
1       0.57  0.55    0.47  0.50    0.51  0.53    0.60  0.56
2       0.73  0.55    0.70  0.83    0.72  0.67    0.43  0.46
3       0.60  0.57    0.35  0.35    0.44  0.43    0.55  0.59
4       0.60  0.53    0.30  0.43    0.40  0.48    0.47  0.47
5       0.58  0.67    0.65  0.53    0.61  0.59    0.39  0.59
6       0.93  0.96    0.83  0.83    0.88  0.89    0.46  0.36
7       0.55  0.90    0.60  0.60    0.58  0.72    0.45  0.48
8       0.74  0.65    0.63  0.62    0.68  0.63    0.37  0.33
9       0.60  0.54    0.63  0.73    0.62  0.62    0.31  0.27
10      0.50  0.70    0.37  0.50    0.42  0.58    0.51  0.48
11      0.55  0.45    0.83  0.70    0.67  0.55    0.38  0.33
Mean    0.63  0.64    0.58  0.60    0.59  0.61    0.45  0.45
Change (BOS vs. BOW): Precision +1%, Recall +2%, F1 +2%, NDPM =
Conclusions & Future Work
Conference Participant Advisor: an intelligent service relying on concept-based profiles, with WSD based on a linguistic ontology
Future work: integration of
domain-specific ontologies in the process of semantic representation and indexing of documents
social networks of conference participants as an additional source of information
Service details
Service deployed in the VIKEF project at: http://193.204.187.223:8080/iswc_rebuild/
Backup slides
Bag of Words vs. Bag of Synsets
Bag of Words
Doc_id  Word form                Occurr.
31      artificial               1
31      intelligence             1
…       …                        …
1134    WWW                      3
1134    web                      2
…       …                        …
Bag of Synsets: reduction of features, recognition of bigrams, synonyms represented by the same synset
Doc_id  Word form                Synset_id  Occurr.
31      artificial intelligence  6712568    1
…       …                        …          …
1134    roll                     2051720    3
1134    wheel                    2051720    2
…       …                        …          …
1134    WWW, web                 04425517   5
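The three effects listed for the Bag of Synsets can be reproduced on the table's own rows with a short sketch: fewer features, a recognized bigram, and synonyms merged into one synset. `MULTIWORD` and `SENSE_OF` are hypothetical hard-coded stand-ins for the WSD step; the synset ids are copied from the table above.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the WSD step; synset ids copied from the table above.
MULTIWORD: Dict[Tuple[str, str], str] = {("artificial", "intelligence"): "artificial intelligence"}
SENSE_OF: Dict[str, str] = {
    "artificial intelligence": "6712568",
    "WWW": "04425517",
    "web": "04425517",
}


def to_bos(tokens: List[str]) -> Counter:
    """Recognize known bigrams, then count synsets instead of word forms."""
    bos: Counter = Counter()
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in MULTIWORD:
            form, i = MULTIWORD[pair], i + 2
        else:
            form, i = tokens[i], i + 1
        if form in SENSE_OF:  # forms outside this toy inventory are dropped
            bos[SENSE_OF[form]] += 1
    return bos


doc_31 = ["artificial", "intelligence"]
doc_1134 = ["WWW"] * 3 + ["web"] * 2
print(Counter(doc_31), to_bos(doc_31))      # 2 word features -> 1 synset feature
print(Counter(doc_1134), to_bos(doc_1134))  # WWW (3) + web (2) -> 04425517 (5)
```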
Classification Phase
Each document is represented as a vector of BOSs, one for each slot
Each slot is independent of the others
$$P(c_j \mid d_i) = \frac{P(c_j)}{P(d_i)} \prod_{m=1}^{|S|} \prod_{k=1}^{|b_{im}|} P(t_k \mid c_j, s_m)^{n_{kim}}$$
S = {s1, s2, …, s|S|} is the set of slots
t_k is the k-th token (occurring n_kim times in BOS b_im)
b_im is the BOS in slot s_m of instance d_i
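The classification formula can be evaluated as in the sketch below. It works in log space to avoid underflow. P(d_i) is the same for both classes, so normalizing over the classes removes it. The probability tables, the synset ids and the `unseen` floor for missing tokens are invented placeholders, and `posterior` is a hypothetical name.

```python
import math
from typing import Dict

Bag = Dict[str, int]                            # token -> n_kim within one slot
Model = Dict[str, Dict[str, Dict[str, float]]]  # class -> slot -> token -> P(t|c,s)


def posterior(doc: Dict[str, Bag], prior: Dict[str, float], cond: Model,
              unseen: float = 1e-6) -> Dict[str, float]:
    """P(c_j | d_i) from the slot-wise naive Bayes product, computed in log space.

    P(d_i) is never computed: normalising over the classes divides it out.
    `unseen` is a placeholder floor for tokens missing from `cond`.
    """
    log_score = {}
    for c, p_c in prior.items():
        s = math.log(p_c)
        for slot, bag in doc.items():
            for token, n in bag.items():
                s += n * math.log(cond[c][slot].get(token, unseen))
        log_score[c] = s
    top = max(log_score.values())  # subtract the max before exp to avoid underflow
    unnorm = {c: math.exp(v - top) for c, v in log_score.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


prior = {"likes": 0.5, "dislikes": 0.5}
cond: Model = {  # invented probabilities over hypothetical synset ids
    "likes": {"title": {"S_AI": 0.3, "S_WEB": 0.1},
              "abstract": {"S_AI": 0.2, "S_WEB": 0.1}},
    "dislikes": {"title": {"S_AI": 0.05, "S_WEB": 0.3},
                 "abstract": {"S_AI": 0.05, "S_WEB": 0.2}},
}
doc = {"title": {"S_AI": 1}, "abstract": {"S_AI": 2, "S_WEB": 1}}
post = posterior(doc, prior, cond)
print(post)  # P(likes) is about 0.98, above the 0.5 threshold
```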
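Training Phase
C = {c+, c-}: c+ = likes (ratings 4-5), c- = dislikes (ratings 1-2); 3 is neutral
User ratings r_i are turned into weighted instances:
$$w_+^i = \frac{r_i - 1}{MAX - 1}, \qquad w_-^i = 1 - w_+^i$$
$$\hat{P}(c_j) = \frac{1 + \sum_{i=1}^{|TR|} w_j^i}{|TR| + 2}$$
$$N(t_k, c_j, s_m) = \sum_{i=1}^{|TR|} w_j^i \, n_{kim}$$
$$\hat{P}(t_k \mid c_j, s_m) = \begin{cases} \dfrac{N(t_k, c_j, s_m)}{V_{c_j} + \sum_{h} N(t_h, c_j, s_m)} & \text{if } N(t_k, c_j, s_m) \neq 0 \\[2ex] \dfrac{V_{c_j}}{V_{c_j} + \sum_{h} N(t_h, c_j, s_m)} \cdot \dfrac{1}{V - V_{c_j}} & \text{otherwise} \end{cases}$$
TR: training set; MAX: highest rating (5); V: number of distinct tokens in the vocabulary; V_{c_j}: number of distinct tokens observed in class c_j (Witten-Bell smoothing)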
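The training equations can be sketched as below: instance weights from ratings, the smoothed prior, the weighted counts N(t_k, c_j, s_m) and the Witten-Bell estimate. Several points are assumptions. V_{c_j} is read per slot, i.e. as the number of tokens with a non-zero count in (c_j, s_m). The two guards for empty or out-of-vocabulary cases are ours. `WeightedNB`, `class_weights` and the "+"/"-" labels are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Doc = Dict[str, Dict[str, int]]  # slot -> {token: occurrences n_kim}
CLASSES = ("+", "-")             # c+ = likes, c- = dislikes


def class_weights(rating: int, max_rating: int = 5) -> Dict[str, float]:
    """w_+ = (r - 1) / (MAX - 1) and w_- = 1 - w_+."""
    w_pos = (rating - 1) / (max_rating - 1)
    return {"+": w_pos, "-": 1.0 - w_pos}


class WeightedNB:
    """Hypothetical trainer for the slot-wise naive Bayes model with weighted instances."""

    def __init__(self, training: List[Tuple[Doc, int]]):
        self.n = {c: defaultdict(lambda: defaultdict(float)) for c in CLASSES}
        self.vocab = set()
        weight_sum = {c: 0.0 for c in CLASSES}
        for doc, rating in training:
            w = class_weights(rating)
            for c in CLASSES:
                weight_sum[c] += w[c]
                for slot, bag in doc.items():
                    for token, n_kim in bag.items():
                        self.n[c][slot][token] += w[c] * n_kim  # builds N(t_k, c_j, s_m)
                        self.vocab.add(token)
        # Prior with add-one smoothing over the two classes.
        self.prior = {c: (1 + weight_sum[c]) / (len(training) + 2) for c in CLASSES}

    def cond(self, token: str, c: str, slot: str) -> float:
        """Witten-Bell estimate of P(t_k | c_j, s_m)."""
        counts = self.n[c][slot]
        v_c = sum(1 for x in counts.values() if x > 0)  # V_cj, counted per slot (assumption)
        total = sum(counts.values())                     # sum_h N(t_h, c_j, s_m)
        if v_c + total == 0:
            return 1.0 / len(self.vocab)                 # no evidence: uniform (assumption)
        n_k = counts.get(token, 0.0)
        if n_k > 0:
            return n_k / (v_c + total)
        unseen_types = len(self.vocab) - v_c
        if unseen_types == 0:
            return 0.0                                   # token outside the training vocabulary
        return v_c / (v_c + total) / unseen_types


training = [  # hypothetical (document, rating) pairs with a single "title" slot
    ({"title": {"ai": 2, "web": 1}}, 5),
    ({"title": {"web": 2, "wine": 1}}, 1),
    ({"title": {"ai": 1}}, 4),
]
model = WeightedNB(training)
print(model.prior)  # {'+': 0.55, '-': 0.45}
print({t: model.cond(t, "+", "title") for t in sorted(model.vocab)})
```

Because a rating of 4 contributes 0.75 to c+ and 0.25 to c-, every rated paper adds evidence to both classes. Plain naive Bayes with hard labels would discard that nuance.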
Evaluation
JIGSAW evaluated on the SENSEVAL-3 English Lexical Sample task: 37.6% precision
JIGSAW evaluated on the SENSEVAL-3 English All Words task: 52% precision
SENSEVAL-3 English Lexical Sample task:
Algorithm                 Precision
Lesk-based (nouns)        0.246
Lesk-based (verbs)        0.295
Lesk-based (adjectives)   0.403
JIGSAW nouns              0.319
JIGSAW verbs              0.405
JIGSAW others             0.403