확률모델
DESCRIPTION
정보검색시스템 강의노트 강승식교수님TRANSCRIPT
![Page 1: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/1.jpg)
1
2.5.4 Probabilistic Model
Motivation
– Attempt to capture the IR problem within a probabilistic framework
Assumption (Probabilistic Principle)
– Probability of relevance depends on the query and the document representations only
– There is a subset of all documents which the user prefers as the answer set for the query q
– Such an ideal answer set is labeled R and should maximize the overall probability of relevance to the user
– Documents in the set R are predicted to be relevant to the query
– Documents not in this set are predicted to be non-relevant R
![Page 2: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/2.jpg)
23
Probabilistic Model
Definition
qddRP
R
R
RdP
RdP
RPRdP
RPRdP
dRP
dRPqdsim
ww
jj
j
j
j
j
j
jj
iqij
query theorelevant t is document y that theProbabilit :)|(
relevant-non be known to documents ofSet :
relevant be known to documents ofSet :
)|(
)|(~
)()|(
)()|(
)|(
)|(),(
binary all are riables weight vaindex term : }1,0{},1,0{
Using Bayes’ rule documents theall
same theare )(),( RPRP
)(
)()|(
)(
)()|(:'
bP
aPabP
bP
baPbaPruleBayes
Rd jj RdP set thefrom document theselectingrandomly ofy Probabilit:)|(
relevant. is collection entire thefrom selectedrandomly document ay that Probabilit:)(RP
![Page 3: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/3.jpg)
23
Probabilistic Model
Definition (Cont.)
t
i i
i
i
iijiqj
dg idg i
dg idg i
j
RkP
RkP
RkP
RkPwwqsim(d
RkPRkP
RkPRkPqdsim
jiji
jiji
1
0)(1)(
0)(1)(
)|(
)|(1log
)|(1
)|(log~),
)|()|(
)|()|(~),(
Assuming independence of
index terms
Taking logarithms, ignoring constants,
1)|()|( RkPRkP ii
Rii kRkP
set thefrom selectedrandomly document ain present is index term y that theprobabilit the:)|(
Rii kRkP
set thefrom selectedrandomly document ain present not is index term y that theprobabilit the:)|(
![Page 4: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/4.jpg)
23
Probabilistic Model
Initial Probability
Improving Probability
iii
i
i
knN
nRkP
RkP
index term econtain th which documents ofnumber : )|(
5.0)|(
ii
iii
iiiii
ii
iii
kVV
VVNN
nVn
VN
Vn
VN
VnRkP
VN
nV
V
V
V
VRkP
index term econtain th which ofsubset :
retrievedinitially documents ofsubset :11
5.0)|(
1
1
5.0 )|(
Adjustment the
problems for small values of V and Vi
![Page 5: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/5.jpg)
23
Probabilistic Model
Advantage
– Documents are ranked in decreasing order of their probability of being relevant
Disadvantage
– Guess the initial separation of documents into relevant and non-relevant sets
– Binary weights
• Does not take into account the frequency with which an index term occurs inside a document
– Independence assumption for index terms
![Page 6: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/6.jpg)
23
2.5.5 Brief Comparison of Classic Models
Boolean Model– Weakest classic method– Inability to recognize partial matches
Vector Model– Popular retrieval model
Vector Model v.s Probabilistic Model– Croft
• Probabilistic model provides a better retrieval performance
– Salton, Buckley• Vector model is expected to outperform the
probabilistic model with general collections
![Page 7: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/7.jpg)
23
참고 : Probabilistic ModelExample of Probabilistic Retrieval Model
Query : q(k1, k2) 5 documents
Assume d2 and d4 are relevant
d1 d2 d3 d4 d5
k1 1 0 1 1 0
k2 0 1 0 1 1
2
1)|( 1 RkP
3
2)|( 1 RkP 1)|( 2 RkP
3
1)|( 2 RkP
![Page 8: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/8.jpg)
23
Probabilistic Model Example of Probabilistic Model
Q: “gold silver truck”
D1: “Shipment of gold damaged in a fire.”
D2: “Delivery of silver arrived in a silver truck.”
D3: “Shipment of gold arrived in a truck.”
– Assume that documents D2 and D3 are relevant to the query
N= number of documents in the collection
R= number of relevant documents for a given query Q
n= number of documents that contain term t
r= number of relevant documents that contain term t
variable Gold Silver Truck
N 3 3 3
R 2 2 2
n 2 1 2
r 1 1 2
![Page 9: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/9.jpg)
23
Probabilistic Model Example of Probabilistic Model (Cont.)
Computing Term Relevance Weight
Similarity Coefficient for each document
– D1: -0.477, D2: 1.653, D3: 0.699
)5.0)()(
5.0
5.0
5.0log(
rRnN
rn
rR
rtrj
477.03
1log)
5.01223
5.012
5.012
5.01log(:
gold
477.0333.0
1log)
5.01213
5.011
5.012
5.01log(:
silver
176.1333.0
5log)
5.02223
5.022
5.022
5.02log(:
truck
![Page 10: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/10.jpg)
23
참고 : Review of Probability Theory Try to predict whether or not a baseball team (e.g. LA
Dodgers) will win one of its games
–
– Let’s assume the independence of evidences
– By Bayes’ Theorem
–
,75.0)|( sunnywinP 6.0)|( chanhowinP,5.0)( winP
?),|( chanhosunnywinP
,),|( cswP ,)|( swP )|( cwP
),(
)()|,(
),(
),,(),|(
csP
wPwcsP
csP
cswPcswP
)()|,(
)()|,(
),(
)()|,(
),(
)()|,(
1 lPlcsP
wPwcsP
csP
lPlcsP
csP
wPwcsP
We can’t calculate it, so…
Probability
to lose
![Page 11: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/11.jpg)
23
Probabilistic ModelReview of Probability Theory (Cont.)
Independent Assumption
Making Substitutions
)|()|()|,(
)|()|()|,(
lcPlsPlcsP
wcPwsPwcsP
)()|()|(
)()|()|(
)()|,(
)()|,(
1 lPlcPlsP
wPwcPwsP
lPlcsP
wPwcsP
)()|(
)()|(
1 ,
)(
)()|()|(
lPlsP
wPwsP
sP
wPwsPswP
)()|(
)()|(
1 ,
)(
)()|()|(
lPlcP
wPwcP
cP
wPwcPcwP
![Page 12: 확률모델](https://reader036.vdocuments.mx/reader036/viewer/2022082702/554f54f0b4c905b9508b4ffb/html5/thumbnails/12.jpg)
Probabilistic ModelReview of Probability Theory (Cont.)
Making Substitutions (Cont.)
5.45.0
5.0
4.0
6.0
25.0
75.0)(
)(
11
)(
)(
)(
)(
1)(
)(
1
)()|()|(
)()|()|(
1
wP
lPlP
wP
wP
lP
wP
lPlPlcPlsP
wPwcPwsP
818.011
9 The probability of LA Dodgers will win its game
on sunny days when Chanho plays