information retrieval (chapter 2: modeling)


Upload: imani-goodman

Post on 03-Jan-2016

Page 1: Information Retrieval (Chapter 2: Modeling)

Sogang University: Dept of Computer Science

Information Retrieval (Chapter 2: Modeling)

Prof. 서정연 (Seo Jeong-yeon)
Office: Engineering Building (공학관) 816
Tel: 705-8488
Email: [email protected]

Page 2: Information Retrieval (Chapter 2: Modeling)

2.1 Introduction: Terminology

• Collection: a set of documents
• Document: a sequence of natural-language sentences
  – In information retrieval, a document is treated as a set of index terms.
• Index term: a keyword, or group of keywords, that carries meaning
  – It summarizes the content of a document.
  – Most index terms are nouns.
• The set of index terms is regarded as the semantic representation of a user's information need or of a document.
• Problems:
  – Oversimplification: part of the information need or of the document is lost.
  – Inexact matching: the documents that answer the user's need cannot be found exactly.
• Word, document, collection
• Query: the user's request for finding documents

Page 3: Information Retrieval (Chapter 2: Modeling)

Introduction: Terminology (cont.)

The problems of information retrieval
• Retrieval model: how do we decide whether a document and a query, each represented as a point in index-term space, are related?
• Ranking algorithm: how do we decide which retrieved documents fit the query best?
  – Reorder the retrieved documents appropriately.
  – Where possible, sort them by how well they fit the user's need (the query).

Page 4: Information Retrieval (Chapter 2: Modeling)

2.2 A Taxonomy of IR Models

User task
• Retrieval: ad hoc, filtering
  – Classic models: Boolean, vector, probabilistic
    · Set-theoretic models: fuzzy set, extended Boolean
    · Algebraic models: generalized vector, latent semantic indexing, neural network
    · Probabilistic models: inference network, belief network
  – Structured models: non-overlapping lists, proximal nodes
• Browsing: flat, structure guided, hypertext

Page 5: Information Retrieval (Chapter 2: Modeling)

Classification of Retrieval Models

Rows are user tasks; columns are the logical view of documents.

User task    Index terms      Full text        Full text + structure
Retrieval    Classic          Classic          Structured
             Set theoretic    Set theoretic
             Algebraic        Algebraic
             Probabilistic    Probabilistic
Browsing     Flat             Flat             Structure guided
                              Hypertext        Hypertext

Page 6: Information Retrieval (Chapter 2: Modeling)

2.3 Types of Retrieval

• Ad hoc retrieval: the documents in the collection stay fixed while users submit whatever queries they need at the moment.
  – This is the most common type in ordinary information retrieval.
• Filtering: new documents keep arriving at the retrieval system while the queries stay fixed.
  – User profile: a description of the information each user wants.
  – Filtering simply delivers the documents judged to be relevant.
  – Routing: a form of filtering that computes a ranking of the filtered documents and delivers them in that order.

Page 7: Information Retrieval (Chapter 2: Modeling)

User Profile

• Static user profile: the user chooses keywords for the information they want and enters them to build their own profile.
• Dynamic user profile
  – The user enters a few keywords at first.
  – When the user gives feedback on the documents the filter delivers, the system analyzes it automatically and changes the keywords in the profile.
  – This relevance feedback cycle keeps repeating.

Page 8: Information Retrieval (Chapter 2: Modeling)

2.4 A Formal Characterization of IR Models

An IR model is a quadruple $[D, Q, F, R(q_i, d_j)]$:

• $D$: the set of logical views (representations) of the documents in the collection
• $Q$: the set of logical views (representations) of the users' information needs, that is, the queries
• $F$: a framework for modeling document representations, queries, and their relationships
  – Boolean model: sets of documents and standard set operations
  – Vector model: a $t$-dimensional vector space and standard linear algebra operations
  – Probabilistic model: sets, standard probability operations, and Bayes' theorem
• $R(q_i, d_j)$: a ranking function that determines the degree of association between query $q_i$ and document $d_j$

Page 9: Information Retrieval (Chapter 2: Modeling)

2.5 Classic Information Retrieval Models

• Boolean model: a set model
  – Documents and queries are represented as sets of index terms.
  – It is built from sets and the standard set operators.
• Vector model: an algebraic model
  – Documents and queries are represented as vectors in a $t$-dimensional space.
  – It is built from the standard linear algebra operators on vectors.
• Probabilistic model
  – Document and query representations are grounded in probability theory.
  – It is built from sets, probability operations, and Bayes' theorem.

Page 10: Information Retrieval (Chapter 2: Modeling)

Basic Concepts

• Index term: a meaningful word that expresses the topic of a document
  – Mainly nouns, because a noun carries meaning on its own.
• Weight: expresses how useful an index term is for describing a document; this usefulness differs from term to term.
• Definitions
  – $K = \{k_1, \ldots, k_t\}$: the set of index terms
  – $d_j = (w_{1j}, w_{2j}, \ldots, w_{tj})$: a document
  – $w_{ij}$: the weight of index term $k_i$ in document $d_j$
  – $g_i$: the function that returns the weight of index term $k_i$ in a $t$-dimensional vector, i.e. $g_i(d_j) = w_{ij}$

Page 11: Information Retrieval (Chapter 2: Modeling)

Basic Concepts (cont.)

• Mutual independence assumption for index terms
  – The weight $w_{ij}$ of $(k_i, d_j)$ is assumed to be unrelated to the weight $w_{(i+1)j}$ of $(k_{i+1}, d_j)$.
  – This simplifies the computation of index term weights and allows fast ranking.
• Correlation between index terms
  – In real documents, the occurrences of index terms are correlated.
  – Example: in the computer networking field, the words "computer" and "network" are correlated, which affects their weights.
  – In practice, exploiting term correlation has not yet produced results that clearly help ranking.
  – The independence assumption therefore remains valid until a model appears in which correlation clearly helps.

Page 12: Information Retrieval (Chapter 2: Modeling)

Boolean Model

• A simple model based on set theory and Boolean algebra.
• Index term weights are binary: $w_{i,j} \in \{0, 1\}$.
  – The model only predicts whether a document is relevant or not, so documents cannot be ranked.
• Query
  – It is not easy for users to express their needs precisely as a Boolean expression.
  – Even so, the Boolean model is the representative early retrieval model and has been in use the longest.
• Operators: not, or, and
• Definition: see the textbook.

Page 13: Information Retrieval (Chapter 2: Modeling)

Boolean Model (cont.)

Example 1)
• Index terms (stored in what is called an inverted file): each term maps to the set of documents that contain it.
  – curve: {12, 25, 36, 89, 125, 128, 215}
  – fitting: {11, 12, 17, 36, 78, 136, 215}
  – interpolation: {11, 18, 36, 125, 132}
• Query: ((curve and fitting) or interpolation)
  1. (curve and fitting) = {12, 36, 215}
  2. ((curve and fitting) or interpolation)
     = {12, 36, 215} or {11, 18, 36, 125, 132}
     = {11, 12, 18, 36, 125, 132, 215}
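The two steps of this example map directly onto set intersection and union. The following is a minimal sketch in Python, assuming the inverted file is held in memory as a dictionary of sets; the variable names are illustrative and not part of the slides.

```python
# Inverted file from the example: each index term maps to the set of
# document IDs that contain it.
inverted_file = {
    "curve": {12, 25, 36, 89, 125, 128, 215},
    "fitting": {11, 12, 17, 36, 78, 136, 215},
    "interpolation": {11, 18, 36, 125, 132},
}

# Step 1: (curve and fitting) is a set intersection.
step1 = inverted_file["curve"] & inverted_file["fitting"]

# Step 2: (... or interpolation) is a set union.
result = step1 | inverted_file["interpolation"]

print(sorted(step1))
print(sorted(result))
```

A "not" operator would correspond to a set difference against the full set of document IDs, which this sketch does not model.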

Page 14: Information Retrieval (Chapter 2: Modeling)

Boolean Model (cont.)

Example)

\[ q = k_a \wedge (k_b \vee \neg k_c) \]
\[ \vec{q}_{dnf} = (1,1,1) \vee (1,1,0) \vee (1,0,0) \]

Each triple lists the binary weights of $(k_a, k_b, k_c)$ in one conjunctive component of the disjunctive normal form.

[Figure: Venn diagram of the term sets $k_a$, $k_b$, and $k_c$]

Page 15: Information Retrieval (Chapter 2: Modeling)

Boolean Model (cont.)

\[ q = \text{parallel} \wedge (\text{program} \vee \text{system}) \]
\[ \vec{q}_{dnf} = (1,1,1) \vee (1,1,0) \vee (1,0,1) \]

Index terms and similarity per document:

Document   parallel   program   system   …   Similarity
001        1          0         1        …   1
002        0          0         1        …   0
003        0          1         1        …   0
004        1          1         0        …   1
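The similarity column can be reproduced by checking whether a document's binary weights for the query terms equal one of the conjunctive components of the DNF. This is a small sketch under the assumption that only the three query terms (parallel, program, system) matter; the elided "…" columns are ignored, and the function name is hypothetical.

```python
# Conjunctive components of the query in disjunctive normal form,
# ordered as (parallel, program, system).
Q_DNF = {(1, 1, 1), (1, 1, 0), (1, 0, 1)}

# Binary weights of the three query terms in each document.
docs = {
    "001": (1, 0, 1),
    "002": (0, 0, 1),
    "003": (0, 1, 1),
    "004": (1, 1, 0),
}

def boolean_sim(doc_vector, q_dnf):
    """Return 1 if the document matches any conjunctive component, else 0."""
    return 1 if doc_vector in q_dnf else 0

similarities = {doc_id: boolean_sim(vec, Q_DNF) for doc_id, vec in docs.items()}
print(similarities)
```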

Page 16: Information Retrieval (Chapter 2: Modeling)

Boolean Model (cont.)

• Advantages
  – Intuitive and easy to understand.
  – The meaning of a query, which states the user's need, is clear.
• Disadvantages
  – Ranking is hard to apply.
  – It is not easy to express a user's need precisely as a Boolean expression.
  – Partial matching is impossible (all-or-nothing systems).
    · If the user's need is (A and B and C and D), a document with (A, B, and C but not D) is not retrieved.
    · Are all the terms in the user's request really equally important?
  – The size of the result cannot be controlled (too much or too little).

Page 17: Information Retrieval (Chapter 2: Modeling)

Vector Model

Motivation
• Binary weights {0, 1} are too restrictive; use real-valued (floating-point) weights instead.
• Allow partial matching.
• Go beyond deciding whether a retrieved document is relevant: rank the retrieved documents by their similarity to the query.
  – Example: cosine similarity

Page 18: Information Retrieval (Chapter 2: Modeling)

Vector Model (cont.)

Example 2)
D1 = 2T1 + 3T2 + 5T3
D2 = 3T1 + 7T2 + T3
Q = 0T1 + 0T2 + 2T3

[Figure: D1, D2, and Q drawn as vectors in the space spanned by the axes T1, T2, and T3]

• Which of D1 and D2 is more similar to Q?
• How should similarity be measured (distance, angle, and so on)?

Page 19: Information Retrieval (Chapter 2: Modeling)

Vector Model (cont.)

• Queries and documents are represented in $t$ dimensions.
  – Document: $d_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$
  – Query: $q = (w_{1,q}, w_{2,q}, \ldots, w_{t,q})$
  – Weight $0 \le w_{i,j} \le 1$: the importance of the index term
• Retrieved documents are ranked by the query/document similarity $sim(d_j, q)$:

\[ sim(d_j, q) = \frac{\vec{d_j} \cdot \vec{q}}{|\vec{d_j}|\,|\vec{q}|} = \frac{\sum_{i=1}^{t} w_{i,j}\, w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,j}^2}\;\sqrt{\sum_{i=1}^{t} w_{i,q}^2}} \]

• Documents that match the query only partially can also be retrieved, by keeping those whose $sim(d_j, q)$ exceeds a threshold.
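To make the cosine formula concrete, here is a minimal sketch that applies it to the vectors D1, D2, and Q from Example 2. The function name and the zero-vector guard are choices made for this illustration, not something the slides specify.

```python
import math

def cosine(d, q):
    """Cosine of the angle between two weight vectors of equal length."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_d * norm_q)

D1 = (2, 3, 5)
D2 = (3, 7, 1)
Q = (0, 0, 2)

sim_d1 = cosine(D1, Q)  # 10 / (sqrt(38) * 2)
sim_d2 = cosine(D2, Q)  # 2 / (sqrt(59) * 2)
print(sim_d1, sim_d2)
```

Under this measure D1 is closer to Q than D2, because D1 puts most of its weight on T3, the only term in the query.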

Page 20: Information Retrieval (Chapter 2: Modeling)

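Vector Model: Similarity Measures

Each measure is given first for binary term sets $X$, $Y$ and then for weighted vectors $x$, $y$.

• Inner product:
\[ |X \cap Y|, \qquad \sum_{i=1}^{t} x_i y_i \]
• Dice coefficient:
\[ \frac{2\,|X \cap Y|}{|X| + |Y|}, \qquad \frac{2 \sum_{i=1}^{t} x_i y_i}{\sum_{i=1}^{t} x_i^2 + \sum_{i=1}^{t} y_i^2} \]
• Cosine coefficient:
\[ \frac{|X \cap Y|}{|X|^{1/2}\,|Y|^{1/2}}, \qquad \frac{\sum_{i=1}^{t} x_i y_i}{\sqrt{\sum_{i=1}^{t} x_i^2 \; \sum_{i=1}^{t} y_i^2}} \]
• Jaccard coefficient:
\[ \frac{|X \cap Y|}{|X| + |Y| - |X \cap Y|}, \qquad \frac{\sum_{i=1}^{t} x_i y_i}{\sum_{i=1}^{t} x_i^2 + \sum_{i=1}^{t} y_i^2 - \sum_{i=1}^{t} x_i y_i} \]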
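The set forms of these four measures are easy to compare side by side. The sketch below assumes non-empty Python sets of terms; the two example sets are made up for illustration and do not come from the slides.

```python
import math

def inner(x, y):
    return len(x & y)

def dice(x, y):
    return 2 * len(x & y) / (len(x) + len(y))

def cosine_binary(x, y):
    return len(x & y) / math.sqrt(len(x) * len(y))

def jaccard(x, y):
    # |X| + |Y| - |X ∩ Y| is the size of the union.
    return len(x & y) / len(x | y)

# Hypothetical term sets: they share two of their three terms.
X = {"gold", "silver", "truck"}
Y = {"shipment", "gold", "truck"}

print(inner(X, Y), dice(X, Y), cosine_binary(X, Y), jaccard(X, Y))
```

All three normalized coefficients lie between 0 and 1, while the raw inner product grows with the size of the sets.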

Page 21: Information Retrieval (Chapter 2: Modeling)

Vector Model (cont.)

Term-document matrix

k_i   1     2     …   17    …   456   …   693   …   5072
d1    0     0.3       0         0.5       0         0
d2    0.2   0.6       0.3       0         0.8       0.3
...
dn    0     0.2       0         0         0.6       0
q     0.3   0.7       0         0         0.7       0

When the similarity is the inner product:

sim(d1, q) = 0.3*0 + 0.7*0.3 + 0.7*0 = 0.21
sim(d2, q) = 0.3*0.2 + 0.7*0.6 + 0.7*0.8 = 1.04
sim(dn, q) = 0.3*0 + 0.7*0.2 + 0.7*0.6 = 0.56

Retrieved documents (if the threshold is 0.5): d2, dn
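The three inner products and the threshold cut can be checked with a few lines of code. This sketch assumes the six values in each row belong to terms 1, 2, 17, 456, 693, and 5072, which is how the flattened table reads; the names `scores` and `THRESHOLD` are illustrative.

```python
# Rows of the term-document matrix, restricted to the six visible terms
# (assumed to be terms 1, 2, 17, 456, 693, 5072, in that order).
docs = {
    "d1": (0, 0.3, 0, 0.5, 0, 0),
    "d2": (0.2, 0.6, 0.3, 0, 0.8, 0.3),
    "dn": (0, 0.2, 0, 0, 0.6, 0),
}
q = (0.3, 0.7, 0, 0, 0.7, 0)

def inner_product(d, query):
    return sum(wd * wq for wd, wq in zip(d, query))

scores = {name: inner_product(vec, q) for name, vec in docs.items()}

THRESHOLD = 0.5
# Keep documents above the threshold, best score first.
retrieved = sorted(
    (name for name, s in scores.items() if s > THRESHOLD),
    key=scores.get,
    reverse=True,
)
print(scores, retrieved)
```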

Page 22: Information Retrieval (Chapter 2: Modeling)

Vector Model: Term Weights

• Clustering problem
  – Intra-cluster similarity: which features describe an object well?
  – Inter-cluster dissimilarity: which features distinguish an object from the others?
• Information retrieval problem
  – Intra-cluster similarity: term frequency (tf), $freq_{i,j}$, the raw frequency of term $k_i$ in document $d_j$
  – Inter-cluster dissimilarity: inverse document frequency (idf), the inverse of the frequency of term $k_i$ across the document collection

Page 23: Information Retrieval (Chapter 2: Modeling)

Vector Model (cont.)

Weighting schemes
• Term frequency (tf): the more often a term occurs in a document, the more strongly it is associated with that document.

\[ f_{ij} = \frac{freq_{ij}}{\max_l freq_{lj}} \]

  – $freq_{ij}$: raw frequency of the term $k_i$ in the document $d_j$
• Inverse document frequency (idf): a term that occurs in many documents is of little use for telling relevant documents from non-relevant ones.

\[ idf_i = \log \frac{N}{n_i} \]

  – $N$: total number of documents
  – $n_i$: number of documents in which the index term $k_i$ appears

Page 24: Information Retrieval (Chapter 2: Modeling)

Vector Model (cont.)

• A well-known index term weighting scheme balances tf and idf (the tf-idf scheme):

\[ w_{ij} = f_{ij} \times \log \frac{N}{n_i} = f_{ij} \times idf_i \]

• Term weighting in the query:

\[ w_{iq} = \left(0.5 + \frac{0.5\, freq_{iq}}{\max_l freq_{lq}}\right) \times \log \frac{N}{n_i} = (0.5 + 0.5\, f_{iq}) \times idf_i \]

Page 25: Information Retrieval (Chapter 2: Modeling)

Vector Model (cont.)

$D_1$: "Shipment of gold damaged in a fire"
$D_2$: "Delivery of silver arrived in a silver truck"
$D_3$: "Shipment of gold arrived in a truck"
$Q$: "gold silver truck"

\[ idf_i = \log \frac{N}{n_i} \]

Term   a   arrived   damaged   delivery   fire   gold   in   of   silver   shipment   truck
idf    0   .176      .477      .477       .477   .176   0    0    .477     .176       .176

\[ w_{ij} = f_{ij} \times idf_i, \qquad w_{iq} = f_{iq} \times idf_i \]

The idf values use base-10 logarithms with $N = 3$, for example $\log_{10}(3/2) \approx .176$ and $\log_{10}(3/1) \approx .477$.

Page 26: Information Retrieval (Chapter 2: Modeling)

Vector Model (cont.)

Document vectors (not normalized)

     t1   t2     t3     t4     t5     t6     t7   t8   t9     t10    t11
D1   0    0      .477   0      .477   .176   0    0    0      .176   0
D2   0    .176   0      .477   0      0      0    0    .954   0      .176
D3   0    .176   0      0      0      .176   0    0    0      .176   .176
Q    0    0      0      0      0      .176   0    0    .477   0      .176

\[ SC(Q, D_j) = \sum_{i=1}^{t} w_{iq} \times w_{ij} \]

\[ SC(Q, D_1) = (0)(0) + (0)(0) + (0)(0.477) + (0)(0) + (0)(0.477) + (0.176)(0.176) + (0)(0) + (0)(0) + (0.477)(0) + (0)(0.176) + (0.176)(0) = (0.176)^2 \approx 0.031 \]
\[ SC(Q, D_2) = (0.954)(0.477) + (0.176)^2 \approx 0.486 \]
\[ SC(Q, D_3) = (0.176)^2 + (0.176)^2 \approx 0.062 \]

Hence, the ranking would be D2, D3, D1.
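The whole worked example, from idf values to the final ranking, fits in a short script. This is a sketch under the assumptions the numbers imply: raw term counts as tf, base-10 logarithms for idf, and an unnormalized inner product as the score. The helper names are hypothetical.

```python
import math
from collections import Counter

DOCS = {
    "D1": "shipment of gold damaged in a fire",
    "D2": "delivery of silver arrived in a silver truck",
    "D3": "shipment of gold arrived in a truck",
}
QUERY = "gold silver truck"

def idf_table(docs):
    """idf_i = log10(N / n_i), where n_i counts documents containing term i."""
    n_docs = len(docs)
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(text.split()))
    return {term: math.log10(n_docs / n) for term, n in doc_freq.items()}

def weights(text, idf):
    """w_i = raw term frequency * idf_i (terms unknown to idf get weight 0)."""
    tf = Counter(text.split())
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}

def score(query_w, doc_w):
    """Unnormalized inner product over the query terms."""
    return sum(w * doc_w.get(term, 0.0) for term, w in query_w.items())

idf = idf_table(DOCS)
query_w = weights(QUERY, idf)
scores = {name: score(query_w, weights(text, idf)) for name, text in DOCS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Because the vectors are not normalized, D2 benefits from containing "silver" twice; a cosine-normalized score could change the gaps between documents.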

Page 27: Information Retrieval (Chapter 2: Modeling)

Vector Model (cont.)

• Advantages
  – Term weighting improves retrieval performance.
  – Partial matching is possible.
  – Retrieved documents can be ranked.
• Disadvantages
  – Correlations between index terms are not considered.
    · The assumption that terms are mutually independent contradicts real text: dependencies between terms are ignored.
  – Performance is hard to improve without query expansion through relevance feedback.

Page 28: Information Retrieval (Chapter 2: Modeling)

Probabilistic Model: Basics

• The racehorse Baekdusan (백두산) has run 100 races in total.
• It won 20 of them: P(Baekdusan = Win) = 20/100 = .2
• It rained in 30 of the races and was clear in the rest: P(Weather = Rain) = 30/100 = .3
• Baekdusan won 15 of the rainy races.
  – Conditional probability: P(Baekdusan = Win | Weather = Rain) = 15/30 = .5
  – P(Win | Rain) = P(Win, Rain) / P(Rain) = 0.15/0.3 = .5
• P(Rain | Win) = ? Use Bayes' theorem:

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \]
\[ P(R \mid W) = \frac{P(W \mid R)\, P(R)}{P(W)} = \frac{0.5 \times 0.3}{0.2} = 0.75 \]
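The racehorse numbers can be checked both through Bayes' theorem and by counting directly. The sketch below uses only the counts given on the slide; the variable names are illustrative.

```python
races = 100
wins = 20
rainy = 30
wins_in_rain = 15

p_win = wins / races                    # P(W)
p_rain = rainy / races                  # P(R)
p_win_given_rain = wins_in_rain / rainy  # P(W | R)

# Bayes' theorem: P(R | W) = P(W | R) * P(R) / P(W)
p_rain_given_win = p_win_given_rain * p_rain / p_win

# Cross-check by counting: of the 20 wins, 15 happened in the rain.
p_rain_given_win_direct = wins_in_rain / wins

print(p_rain_given_win, p_rain_given_win_direct)
```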

Page 29: Information Retrieval (Chapter 2: Modeling)

Probabilistic Model

• Motivation
  – Interpret the IR problem probabilistically.
  – Proposed in 1976 by Robertson and Sparck Jones [677].
• Assumption (probabilistic principle)
  – The probability of relevance depends only on the document and query representations.
  – Assume an ideal answer set $R$ for query $q$.
  – Only the documents in $R$ are relevant to $q$; all other documents are not.

Page 30: Information Retrieval (Chapter 2: Modeling)

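Probabilistic Model (cont.)

Definition
• $w_{ij} \in \{0,1\}$, $w_{iq} \in \{0,1\}$: index term weight variables are all binary
• $R$: set of documents known to be relevant
• $\bar{R}$: set of documents known to be non-relevant
• $P(R \mid \vec{d_j})$: probability that the document $d_j$ is relevant to the query $q$

\[ sim(d_j, q) = \frac{P(R \mid \vec{d_j})}{P(\bar{R} \mid \vec{d_j})} = \frac{P(\vec{d_j} \mid R)\, P(R)}{P(\vec{d_j} \mid \bar{R})\, P(\bar{R})} \]
(Bayes' rule)

\[ sim(d_j, q) \sim \frac{P(\vec{d_j} \mid R)}{P(\vec{d_j} \mid \bar{R})} \]
($P(R)$ and $P(\bar{R})$ are the same for every document)

\[ sim(d_j, q) \sim \frac{\prod_{g_i(\vec{d_j})=1} P(k_i \mid R) \; \prod_{g_i(\vec{d_j})=0} P(\bar{k_i} \mid R)}{\prod_{g_i(\vec{d_j})=1} P(k_i \mid \bar{R}) \; \prod_{g_i(\vec{d_j})=0} P(\bar{k_i} \mid \bar{R})} \]
(index term independence assumption)

\[ sim(d_j, q) \sim \sum_{i=1}^{t} w_{iq}\, w_{ij} \left( \log \frac{P(k_i \mid R)}{1 - P(k_i \mid R)} + \log \frac{1 - P(k_i \mid \bar{R})}{P(k_i \mid \bar{R})} \right) \]
(take logarithms, ignore constants, and use $P(k_i \mid R) + P(\bar{k_i} \mid R) = 1$)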

Page 31: Information Retrieval (Chapter 2: Modeling)

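Probabilistic Model (cont.)

Initial probabilities

\[ P(k_i \mid R) = 0.5, \qquad P(k_i \mid \bar{R}) = \frac{n_i}{N} \]

• $n_i$: number of documents which contain the index term $k_i$

Improving the probabilities

\[ P(k_i \mid R) = \frac{V_i}{V}, \qquad P(k_i \mid \bar{R}) = \frac{n_i - V_i}{N - V} \]

• $V$: subset of documents initially retrieved
• $V_i$: subset of $V$ which contain the index term $k_i$

When $V$ and $V_i$ are too small, an adjustment factor is added:

\[ P(k_i \mid R) = \frac{V_i + 0.5}{V + 1}, \qquad P(k_i \mid \bar{R}) = \frac{n_i - V_i + 0.5}{N - V + 1} \]

or, using $n_i / N$ as the adjustment factor:

\[ P(k_i \mid R) = \frac{V_i + \frac{n_i}{N}}{V + 1}, \qquad P(k_i \mid \bar{R}) = \frac{n_i - V_i + \frac{n_i}{N}}{N - V + 1} \]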
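To show how these estimates feed the ranking formula, the sketch below computes the per-term weight $\log\frac{P(k_i|R)}{1-P(k_i|R)} + \log\frac{1-P(k_i|\bar{R})}{P(k_i|\bar{R})}$ from the initial estimates and from the 0.5-adjusted ones. The collection size and document frequencies are made-up numbers for illustration, and the function names are hypothetical.

```python
import math

def term_weight(p_rel, p_nonrel):
    """Contribution of one term shared by the query and the document."""
    return math.log(p_rel / (1 - p_rel)) + math.log((1 - p_nonrel) / p_nonrel)

def initial_estimates(n_i, n_docs):
    """P(k_i|R) = 0.5 and P(k_i|R-bar) = n_i / N."""
    return 0.5, n_i / n_docs

def adjusted_estimates(v_i, v, n_i, n_docs):
    """Estimates from the top-V retrieved documents, with the 0.5 adjustment."""
    return (v_i + 0.5) / (v + 1), (n_i - v_i + 0.5) / (n_docs - v + 1)

# Hypothetical collection: N = 1000 documents, a rare and a common term.
N = 1000
w_rare = term_weight(*initial_estimates(100, N))    # term in 10% of documents
w_common = term_weight(*initial_estimates(500, N))  # term in 50% of documents

# After an initial retrieval of V = 10 documents, 8 of which contain the rare term.
w_rare_updated = term_weight(*adjusted_estimates(8, 10, 100, N))

print(w_rare, w_common, w_rare_updated)
```

With the initial estimates the weight reduces to $\log\frac{N - n_i}{n_i}$, so rare terms count for more, much like idf; feedback from the retrieved set then sharpens the estimate.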

Page 32: Information Retrieval (Chapter 2: Modeling)

Probabilistic Model (cont.)

• Advantage
  – Documents can be ranked by their probability of relevance.
• Disadvantages
  – It has to assume an initial separation of the documents into relevant and non-relevant sets.
  – It does not consider how often an index term occurs within a document.
  – It assumes that index terms are independent.
    · However, it is still unknown whether the independence assumption is actually a problem.

Page 33: Information Retrieval (Chapter 2: Modeling)

Comparison of the Classic Models

• Boolean model
  – The simplest model.
  – It cannot recognize partial matches, which leads to poor performance.
• Vector model
  – A widely used retrieval model.
• Vector model versus probabilistic model
  – Croft: the probabilistic model provides better retrieval performance.
  – Salton and Buckley: the vector model performs better on general collections.

Page 34: Information Retrieval (Chapter 2: Modeling)

Fuzzy Information Retrieval Model

• Motivation
  – Documents and queries are generally represented as sets of keywords.
  – These sets only partially represent the actual meaning of the documents and queries.
  – Matching a document against a query is therefore approximate, or vague.
• Each term defines a fuzzy set.
• Each document has a degree of membership in the fuzzy set of each term.

Page 35: Information Retrieval (Chapter 2: Modeling)

Introduction to Fuzzy Sets

• The degree of membership of $x$ in a fuzzy set $A$ is $\mu_A(x)$:
  – $\mu_A : X \to [0, 1]$
  – $X$: the universal set
  – $[0, 1]$: the real numbers between 0 and 1
• Example
  – Universal set U = {4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8}
  – Fuzzy set TALL = {0/4.5, 0.2/5, .5/5.5, .7/6, 1/6.5, 1/7, 1/7.5, 1/8}

[Figure: membership function of TALL; horizontal axis: height in feet (4.5 to 6.5), vertical axis: degree of membership (0 to 1.0)]

Page 36: Information Retrieval (Chapter 2: Modeling)

Fuzzy Set Operations

Fuzzy set operations can be defined in many different ways; one example is shown here.

• Membership function of the intersection $A \cap B$:
  $\mu_{A \cap B}(x) = \min\{\mu_A(x), \mu_B(x)\}$ or $\mu_{A \cap B}(x) = \mu_A(x)\,\mu_B(x)$ for all $x \in X$
• Membership function of the union $A \cup B$:
  $\mu_{A \cup B}(x) = \max\{\mu_A(x), \mu_B(x)\}$ or $\mu_{A \cup B}(x) = \mu_A(x) + \mu_B(x) - \mu_A(x)\,\mu_B(x)$
• Membership function of the complement $A'$:
  $\mu_{A'}(x) = 1 - \mu_A(x)$

Page 37: Information Retrieval (Chapter 2: Modeling)

Fuzzy Information Retrieval Model (cont.)

• Representation of a document $D$: a weight vector $(w_1, \ldots, w_t)$ with $w_i = \mu_{T_i}(D)$
  – $w_i$ is the degree of membership of document $D$ in the fuzzy set of term $T_i$.
  – Example: POLITICS = {$\mu_{politics}(D_1)/D_1$, $\mu_{politics}(D_2)/D_2$, …, $\mu_{politics}(D_N)/D_N$}
• Degree of relevance to a query, computed from the membership degrees of document $D$:
  – ($T_i$ AND $T_j$): min($w_i$, $w_j$)
  – ($T_i$ OR $T_j$): max($w_i$, $w_j$)
  – (NOT $T_i$): $1 - w_i$

Page 38: Information Retrieval (Chapter 2: Modeling)

Fuzzy Information Retrieval Model: Examples

Example 1) AND
• D1: elephant/1 + Asia/0.2 + ...
• D2: elephant/0.2 + Asia/0.2 + ...
• Q2 = elephants AND Asia
• D1: min(1, 0.2) = 0.2; D2: min(0.2, 0.2) = 0.2
• Intuitively D1 is the better match, yet min gives both documents the same score.

Example 2) OR
• D1: elephant/0.8 + hunting/0.1 + ...
• D2: elephant/0.7 + hunting/0.7 + ...
• Q3 = elephants OR hunting
• D1: max(0.8, 0.1) = 0.8; D2: max(0.7, 0.7) = 0.7
• Intuitively D2 is the better match, yet max ranks D1 higher.

Example 3) NOT
• D1: mammals/0.5 + Asia/0.2 + ...
• D2: mammals/0.51 + Asia/0.49 + ...
• Q4 = (mammals AND NOT Asia)
• D1: min(0.5, 1 - 0.2) = 0.5; D2: min(0.51, 1 - 0.49) = 0.51
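The three examples use only min, max, and complement, so they can be replayed with three small helper functions. This is a minimal sketch; the helper names are hypothetical and the weights are the ones listed in the examples.

```python
def fuzzy_and(*weights):
    return min(weights)

def fuzzy_or(*weights):
    return max(weights)

def fuzzy_not(weight):
    return 1 - weight

# Example 1: Q2 = elephants AND Asia
q2 = {"D1": fuzzy_and(1.0, 0.2), "D2": fuzzy_and(0.2, 0.2)}

# Example 2: Q3 = elephants OR hunting
q3 = {"D1": fuzzy_or(0.8, 0.1), "D2": fuzzy_or(0.7, 0.7)}

# Example 3: Q4 = mammals AND NOT Asia
q4 = {
    "D1": fuzzy_and(0.5, fuzzy_not(0.2)),
    "D2": fuzzy_and(0.51, fuzzy_not(0.49)),
}

print(q2, q3, q4)
```

The tie in Q2 and the ordering in Q3 show a known weakness of min and max: each score depends on a single weight and ignores the others.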

Page 39: Information Retrieval (Chapter 2: Modeling)

Fuzzy Information Retrieval Model (cont.)

How is the membership function computed?
• Use a term-term correlation matrix $C$.
• The correlation $c_{il}$ between two terms $k_i$ and $k_l$:

\[ c_{il} = \frac{n_{il}}{n_i + n_l - n_{il}} \]

  – $n_i$: number of documents which contain the term $k_i$
  – $n_{il}$: number of documents which contain both $k_i$ and $k_l$

      k1     k2     …
k1    c11    c12
k2    c12    c22
…

• Degree of membership of document $d_j$ in the fuzzy set associated with term $k_i$:

\[ \mu_{ij} = 1 - \prod_{k_l \in d_j} (1 - c_{il}) \]
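The correlation and membership formulas can be tried on a toy collection. The sketch below assumes each document is given as a set of terms; the three documents are invented for illustration, and the function names are hypothetical.

```python
def term_correlations(docs):
    """c[i, l] = n_il / (n_i + n_l - n_il) for every pair of terms."""
    terms = sorted(set().union(*docs.values()))
    postings = {t: {d for d, ts in docs.items() if t in ts} for t in terms}
    corr = {}
    for a in terms:
        for b in terms:
            both = len(postings[a] & postings[b])
            corr[a, b] = both / (len(postings[a]) + len(postings[b]) - both)
    return corr

def membership(term, doc_terms, corr):
    """mu = 1 - product over the document's terms k_l of (1 - c[term, k_l])."""
    product = 1.0
    for other in doc_terms:
        product *= 1 - corr[term, other]
    return 1 - product

# Hypothetical toy collection.
docs = {
    "d1": {"elephant", "asia"},
    "d2": {"elephant", "hunting"},
    "d3": {"asia"},
}
corr = term_correlations(docs)

mu_elephant_d1 = membership("elephant", docs["d1"], corr)
mu_hunting_d1 = membership("hunting", docs["d1"], corr)
mu_hunting_d3 = membership("hunting", docs["d3"], corr)
print(mu_elephant_d1, mu_hunting_d1, mu_hunting_d3)
```

Note that d1 gets a non-zero membership in the "hunting" set even though it never mentions hunting, because "elephant" co-occurs with "hunting" elsewhere in the collection.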

Page 40: Information Retrieval (Chapter 2: Modeling)

Extended Boolean Model

Motivation
• Boolean model
  – Simple.
  – Term weights cannot be used.
  – Retrieved documents cannot be ranked.
  – The number of retrieved documents is too large or too small.
• Vector model
  – Simple and fast.
  – Good retrieval performance.
• Apply the query form of the Boolean model to the vector model, so that partial matching and term weights can be used.
  – The result is the extended Boolean model.

Page 41: Information Retrieval (Chapter 2: Modeling)

Extended Boolean Model (cont.)

Problems with the assumptions of Boolean logic
• Boolean sum of query terms: $q = k_x \vee k_y$
  – A document that contains both $k_x$ and $k_y$ is treated as no more relevant to $q$ than a document that contains only one of them.
• Boolean product of query terms: $q = k_x \wedge k_y$
  – A document that contains only one of $k_x$ and $k_y$ is treated as just as irrelevant to $q$ as a document that contains neither.

Page 42: Information Retrieval (Chapter 2: Modeling)

Extended Boolean Model (cont.)

Example)
• Consider two terms $k_x$ and $k_y$.
• Weights (normalized tf-idf factors):

\[ w_{xj} = f_{xj} \times \frac{idf_x}{\max_i idf_i} \qquad (0 \le w_{xj} \le 1) \]
\[ w_{yj} = f_{yj} \times \frac{idf_y}{\max_i idf_i} \qquad (0 \le w_{yj} \le 1) \]

• Similarity between the query and a document in the two-dimensional space:

[Figure: two plots of the unit square with axes $k_x$ and $k_y$, corners (0,0), (1,0), (0,1), (1,1), and a document $d$ at the point $(x, y)$. For the disjunctive query, (0,0) is the least desirable point; for the conjunctive query, (1,1) is the most desirable point.]

Page 43: Information Retrieval (Chapter 2: Modeling)

Extended Boolean Model (cont.)

• Disjunctive query: $q_{or} = k_x \vee k_y$
  – Similarity: the normalized distance from the point (0,0).

\[ sim(q_{or}, d) = \sqrt{\frac{x^2 + y^2}{2}} \]

• Conjunctive query: $q_{and} = k_x \wedge k_y$
  – Similarity: the complement of the normalized distance from the point (1,1).

\[ sim(q_{and}, d) = 1 - \sqrt{\frac{(1-x)^2 + (1-y)^2}{2}} \]

Page 44: Information Retrieval (Chapter 2: Modeling)

Extended Boolean Model (cont.)

P-norm model
• Generalizing the notion of distance: the model uses not only the Euclidean distance but the more general p-distance.
• The value of $p$ is given at query time.
• Generalized disjunctive query:

\[ q_{or} = k_1 \vee^p k_2 \vee^p \cdots \vee^p k_m \]

• Generalized conjunctive query:

\[ q_{and} = k_1 \wedge^p k_2 \wedge^p \cdots \wedge^p k_m \]

Page 45: Information Retrieval (Chapter 2: Modeling)

Extended Boolean Model (cont.)

Query-document similarity in the p-norm model

\[ sim(q_{or}, d_j) = \left( \frac{x_1^p + x_2^p + \cdots + x_m^p}{m} \right)^{1/p} \]
\[ sim(q_{and}, d_j) = 1 - \left( \frac{(1-x_1)^p + (1-x_2)^p + \cdots + (1-x_m)^p}{m} \right)^{1/p} \]

Example)

\[ q = (k_1 \wedge^p k_2) \vee^p k_3 \]
\[ sim(q, d_j) = \left( \frac{\left( 1 - \left( \frac{(1-x_1)^p + (1-x_2)^p}{2} \right)^{1/p} \right)^p + x_3^p}{2} \right)^{1/p} \]
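The p-norm similarities are short enough to evaluate directly, which also makes the limiting behavior visible. The following sketch assumes term weights in [0, 1] and a finite p of at least 1; the function names and the sample weights are illustrative.

```python
def sim_or(xs, p):
    """Generalized disjunctive similarity for weights xs and a finite p >= 1."""
    return (sum(x ** p for x in xs) / len(xs)) ** (1 / p)

def sim_and(xs, p):
    """Generalized conjunctive similarity for weights xs and a finite p >= 1."""
    return 1 - (sum((1 - x) ** p for x in xs) / len(xs)) ** (1 / p)

def sim_nested(x1, x2, x3, p):
    """q = (k1 AND_p k2) OR_p k3: the inner result is treated as one weight."""
    return sim_or([sim_and([x1, x2], p), x3], p)

weights = [0.2, 0.8]  # hypothetical document weights for two query terms

# p = 1: both operators reduce to the plain average of the weights.
avg_or, avg_and = sim_or(weights, 1), sim_and(weights, 1)

# Large p: OR approaches max(weights), AND approaches min(weights).
big_or, big_and = sim_or(weights, 100), sim_and(weights, 100)

print(avg_or, avg_and, big_or, big_and, sim_nested(0.2, 0.8, 0.5, 2))
```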

Page 46: Information Retrieval (Chapter 2: Modeling)

Extended Boolean Model (cont.)

Properties of the p-norm model
• $p = 1$: similarity behaves as in the vector model.

\[ sim(q_{or}, d_j) = sim(q_{and}, d_j) = \frac{x_1 + \cdots + x_m}{m} \]

• $p = \infty$: similarity behaves as in the fuzzy set model.

\[ sim(q_{or}, d_j) = \max_i(x_i), \qquad sim(q_{and}, d_j) = \min_i(x_i) \]

• $1 < p < \infty$: varying $p$ yields a retrieval model whose behavior lies between the vector model and the fuzzy set model. This flexibility is the model's advantage.
• The extended Boolean model was introduced in 1983 but is not widely used in practice. It has several theoretical advantages, however, and may well be used as a retrieval model in the future.

Page 47: Information Retrieval (Chapter 2: Modeling)

Generalized Vector Model: Taking Term Dependencies into Account

• $T_i$: the vector that represents index term $i$
• $d_{ri}$: the weight of index term $i$ in document $D_r$
• $q_{si}$: the weight of index term $i$ in query $Q_s$
• Document $D_r$ and query $Q_s$ are linear combinations of the vectors $T_1, \ldots, T_t$:

\[ D_r = \sum_{i=1}^{t} d_{ri}\, T_i, \qquad Q_s = \sum_{j=1}^{t} q_{sj}\, T_j \]

• When the inner product is used as the query-document similarity, we get:

\[ D_r \cdot Q_s = \left( \sum_{i=1}^{t} d_{ri}\, T_i \right) \cdot \left( \sum_{j=1}^{t} q_{sj}\, T_j \right) = \sum_{i,j=1}^{t} d_{ri}\, q_{sj}\, T_i \cdot T_j \]

Page 48: Information Retrieval (Chapter 2: Modeling)

Generalized Vector Model: Example

D1 = 2T1 + 3T2 + 5T3
D2 = 3T1 + 7T2 + 1T3
Q = 0T1 + 0T2 + 2T3

Term-term correlations $T_i \cdot T_j$:

      T1    T2    T3
T1    1     .5    0
T2    .5    1     -.2
T3    0     -.2   1

sim(D1, Q) = (2T1 + 3T2 + 5T3) * (0T1 + 0T2 + 2T3)
           = 4T1T3 + 6T2T3 + 10T3T3
           = 4*0 - 6*0.2 + 10*1 = 8.8

sim(D2, Q) = (3T1 + 7T2 + 1T3) * (0T1 + 0T2 + 2T3)
           = 6T1T3 + 14T2T3 + 2T3T3
           = 6*0 - 14*0.2 + 2*1 = -.8

Retrieved documents (if the threshold is 0.5): D1
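The double sum over term pairs is straightforward to evaluate once the term-term correlations are stored as a matrix. This sketch reuses the numbers from the example; the function name `gvm_sim` is hypothetical.

```python
# Term-term correlations T_i . T_j from the example.
T = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, -0.2],
    [0.0, -0.2, 1.0],
]

def gvm_sim(d, q, t):
    """Sum over all term pairs (i, j) of d_i * q_j * (T_i . T_j)."""
    return sum(
        d[i] * q[j] * t[i][j]
        for i in range(len(d))
        for j in range(len(q))
    )

D1 = (2, 3, 5)
D2 = (3, 7, 1)
Q = (0, 0, 2)

sim_d1 = gvm_sim(D1, Q, T)
sim_d2 = gvm_sim(D2, Q, T)

THRESHOLD = 0.5
retrieved = [name for name, s in (("D1", sim_d1), ("D2", sim_d2)) if s > THRESHOLD]
print(sim_d1, sim_d2, retrieved)
```

With an identity matrix in place of `T` the function reduces to the ordinary inner product, which is the classic vector model under the independence assumption.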

Page 49: Information Retrieval (Chapter 2: Modeling)

Generalized Vector Model: Term-Term Matrix

• Term-document matrix: $M_{N \times M}$
  – $N$: number of documents; $M$: number of terms (index terms)
• Term-term matrix: $T_{M \times M} = M^{t} M$
• Example (in actual computation, normalized weights are used):

      t1   t2   t3
d1    2    3    5
d2    3    7    1

      t1   t2   t3
t1    13   27   13
t2    27   58   22
t3    13   22   26

• There is no evidence yet that correlations between terms improve retrieval performance, and the computation is expensive, so the generalized vector model is not preferred over the classic vector model.
• From a theoretical point of view, it serves as a foundation for extended ideas.
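The term-term matrix in the example is the product of the transposed term-document matrix with itself. A minimal sketch in plain Python, assuming rows are documents and columns are terms; the function name is illustrative.

```python
# Rows are documents d1, d2; columns are terms t1, t2, t3.
M = [
    [2, 3, 5],
    [3, 7, 1],
]

def term_term_matrix(m):
    """Compute M^t M: entry (i, j) sums w_i * w_j over all documents."""
    n_terms = len(m[0])
    return [
        [sum(row[i] * row[j] for row in m) for j in range(n_terms)]
        for i in range(n_terms)
    ]

T = term_term_matrix(M)
for row in T:
    print(row)
```

The result is symmetric, and its size grows with the square of the vocabulary, which is one reason the computation becomes expensive on real collections.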

Page 50: Information Retrieval (Chapter 2: Modeling)

Latent Semantic Indexing Model

• Motivation: problems with lexical matching
  – A concept can be expressed in many different ways (synonymy): relevant documents that are not indexed by the query terms are not retrieved.
  – Most words have several meanings (polysemy): non-relevant documents are included in the result.
• Basic idea
  – Match queries and documents by concept (concept matching) instead of by index term.
  – Map the dimensions of the document and query vectors onto concept vectors.
  – The dimension of the concept vectors is generally smaller than that of the index term vectors, because one concept covers several index terms.


Latent Semantic Indexing Model (continued)

Term-document matrix: $M_{t \times N} = (M_{ij})$

$M_{ij} = w_{i,j}$: degree of association between document $d_j$ and term $k_i$

Using singular value decomposition (SVD), $M$ is decomposed as $M = K S D^{t}$.

$K$: matrix of eigenvectors derived from the term-to-term correlation matrix $M M^{t}$

$D^{t}$: matrix of eigenvectors derived from the transpose of the document-document matrix $M^{t} M$

$S$: $r \times r$ diagonal matrix of singular values, where $r = \min(t, N)$ is the rank of $M$ (assuming $M$ has full rank)

Find the matrix $M_s$ that is closest to the original matrix $M$.

Rank of $M_s$: $s < r$

$M_s = K_s S_s D_s^{t}$
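The slide does not say in what sense $M_s$ is "closest" to $M$. Assuming the usual least-squares (Frobenius-norm) reading, the Eckart-Young theorem makes the statement precise. The symbols $\sigma_i$ (the diagonal entries of $S$ in decreasing order), $k_i$ and $d_i$ (the $i$-th columns of $K$ and $D$), and $\|\cdot\|_F$ are notation introduced here, not taken from the slide:

```latex
M_s = K_s S_s D_s^{t} = \sum_{i=1}^{s} \sigma_i \, k_i \, d_i^{t},
\qquad
\| M - M_s \|_F^{2}
  = \min_{\operatorname{rank}(X) \le s} \| M - X \|_F^{2}
  = \sum_{i=s+1}^{r} \sigma_i^{2}.
```

Under this reading, keeping the $s$ largest singular values discards exactly the part of $M$ that contributes least to its squared norm.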


Latent Semantic Indexing Model (continued)

Singular Value Decomposition

$$A = U \Sigma V^{T}$$

$$U^{T} U = V^{T} V = I_n \quad \text{(orthogonal)}$$

$U$: left singular vectors

$V$: right singular vectors

$$\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \quad \text{(singular values)}$$

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_n = 0$$

Dimensions: $A$ ($m \times n$) $=$ $U$ ($m \times n$) $\cdot$ $\Sigma$ ($n \times n$) $\cdot$ $V^{T}$ ($n \times n$)
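These properties can be checked numerically. The sketch below assumes NumPy is available and uses `numpy.linalg.svd` with `full_matrices=False`, which returns factors with the m x n, n, and n x n shapes listed above. The matrix values are arbitrary and chosen only for illustration.

```python
import numpy as np

# Small illustrative matrix with m = 3 rows and n = 2 columns (arbitrary values).
A = np.array([[2.0, 3.0],
              [3.0, 7.0],
              [5.0, 1.0]])
n = A.shape[1]

# Thin SVD: U is m x n, sigma holds the n singular values, Vt is V^T (n x n).
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(sigma)

reconstructs = np.allclose(A, U @ Sigma @ Vt)           # A = U Sigma V^T
u_orthonormal = np.allclose(U.T @ U, np.eye(n))         # U^T U = I_n
v_orthonormal = np.allclose(Vt @ Vt.T, np.eye(n))       # V^T V = I_n
descending = bool(np.all(sigma[:-1] >= sigma[1:]))      # sigma_1 >= sigma_2 >= ...

print("shapes:", U.shape, Sigma.shape, Vt.shape)
print(reconstructs, u_orthonormal, v_orthonormal, descending)
```

Note that with an m x n matrix $U$ the identity $U^{T}U = I_n$ means the columns are orthonormal; $U U^{T}$ is in general not the identity.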


Latent Semantic Indexing Model (continued)

Latent semantic indexing

Term-document matrix: apply the tf-idf weighting scheme.


Latent Semantic Indexing Model (continued)

Latent semantic indexing

$k$: dimension of the reduced concept space

– It must be large enough to capture the correlations between terms and documents.

– It must be small enough to filter out the noise caused by variation in word usage.

Retrieval

Query: $\hat{q} = q^{T} U_k \Sigma_k^{-1}$

Similarity: cosine similarity

Document ranking: model the query as the first document $D_0$. The first row of $M_s^{t} M_s$ then gives the ranks of all documents with respect to the query.
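The query mapping and cosine ranking can be sketched as follows, assuming NumPy. Everything concrete here is an assumption made for illustration: the toy term-document matrix, the choice k = 2, the helper names `fold_in` and `cosine`, and the convention of representing document j by row j of $V_k$.

```python
import numpy as np

# Toy term-document matrix (rows: terms, columns: documents); values are made up.
A = np.array([
    [1.0, 1.0, 0.0, 0.0],  # term 0
    [1.0, 0.0, 1.0, 0.0],  # term 1
    [0.0, 1.0, 1.0, 0.0],  # term 2
    [0.0, 0.0, 0.0, 1.0],  # term 3
    [0.0, 0.0, 1.0, 1.0],  # term 4
])
k = 2  # dimension of the reduced concept space (assumed)

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]          # first k left singular vectors
sigma_k = sigma[:k]     # k largest singular values
docs_k = Vt[:k, :].T    # row j: document j in the k-dimensional concept space


def fold_in(q):
    """Map a term-space vector into concept space: q_hat = q^T U_k Sigma_k^{-1}."""
    return (q @ U_k) / sigma_k


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# A query that uses exactly the terms of document 0.
q = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
q_hat = fold_in(q)
scores = [cosine(q_hat, d) for d in docs_k]
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

Because the query here equals the column of document 0, folding it in lands exactly on that document's concept-space vector, so its cosine score is 1. Queries that share no term with a document can still receive a nonzero score, which is the point of matching in concept space.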


Latent Semantic Indexing Model (continued)

Advantages

It is an efficient, concept-based indexing model.

It reduces noise in the index terms and the dimension of the index-term vectors.

Documents can be retrieved even when no index term matches.

The polysemy problem is partially addressed.

– If a word has several meanings, the word is represented as a weighted average of those meanings.

Disadvantages

Retrieval is slow.

– There is no inverted index file.
– The query vector must be multiplied with every document vector.

SVD computation is slow and requires a large amount of memory.

Because of polysemy, a single word may be represented by several vectors.


Neural Network Model

Motivation

An information retrieval system ranks documents by computing the similarity between the document vectors and the query vector.

The index terms in the documents and the query must be matched and must carry appropriate weights before ranking is possible.

A neural network is a pattern matcher that performs this kind of task well.

Neural network model: composed of three layers

– query terms, document terms, documents


Neural Network Model (continued)

Data representation: local representation

– Input layer: one node per index term
– Output layer: one node per document

Training: all weights are determined by the learning algorithm.

Training data: term vectors

d_j    [k1, k2, ..., kt]
1      [1, 0, ..., 1]

Execution:

– Input: query vector
– Output: documents are ranked by the activation value of each output-layer node for the query vector.

[Figure: network diagram with an input layer and an output layer]
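A minimal sketch of the execution step under the local representation, in plain Python. The slide does not say which learning algorithm sets the weights, so this sketch simply copies each document's binary term vector into the connection weights of its output node; that shortcut, the four-term vocabulary, the three documents, and the function names are all assumptions made for illustration.

```python
# Hypothetical index terms k1..k4, one input node each.
terms = ["cat", "dog", "mouse", "rat"]

# Binary term vectors, one per document (one output node each).
term_vectors = {
    "d1": [1, 1, 0, 0],
    "d2": [0, 1, 1, 0],
    "d3": [1, 0, 1, 1],
}

# ASSUMPTION: no learning algorithm is run; the weights into each output node
# are set directly to that document's term vector.
weights = {doc: list(vec) for doc, vec in term_vectors.items()}


def activations(query_vector):
    """Activation of each output node: weighted sum of the input-node values."""
    return {
        doc: sum(w * x for w, x in zip(w_row, query_vector))
        for doc, w_row in weights.items()
    }


def rank(query_vector):
    """Order documents by decreasing output-node activation."""
    acts = activations(query_vector)
    return sorted(acts, key=lambda doc: acts[doc], reverse=True)


query = [1, 0, 1, 0]  # the query mentions "cat" and "mouse"
print(activations(query))
print(rank(query))
```

Here `d3` contains both query terms and therefore receives the highest activation; the other two documents each match one term and tie.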


Neural Network Model (continued)

Example

D1 Cats and dogs eat.

D2 The dog has a mouse

D3 Mice eat anything

D4 Cats play with mice and rats

D5 Cats play with rats

Query: Do cats play with mice?


Neural Network Model (continued)

Query term nodes

– Weight (activation level) of each query term node: 1.0
– They send signals to the document term nodes through the following connection weights:

$$\bar{w}_{iq} = \frac{w_{iq}}{\sqrt{\sum_{i=1}^{t} w_{iq}^{2}}}, \qquad w_{iq} = \left(0.5 + \frac{0.5\,\mathrm{freq}_{iq}}{\max_{l} \mathrm{freq}_{lq}}\right) \times \log\frac{N}{n_i}$$

Document term nodes

– They send signals to the document nodes through the following connection weights:

$$\bar{w}_{ij} = \frac{w_{ij}}{\sqrt{\sum_{i=1}^{t} w_{ij}^{2}}}, \qquad w_{ij} = f_{ij} \times \log\frac{N}{n_i}$$


Neural Network Model (continued)

Document nodes

– They sum the incoming signals as follows:

$$\sum_{i=1}^{t} \bar{w}_{iq}\,\bar{w}_{ij} = \frac{\sum_{i=1}^{t} w_{iq}\, w_{ij}}{\sqrt{\sum_{i=1}^{t} w_{iq}^{2}} \times \sqrt{\sum_{i=1}^{t} w_{ij}^{2}}}$$

This is the cosine measure.
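The signal that reaches each document node can be traced on the five example documents (D1 to D5) and the query "Do cats play with mice?". The following plain-Python sketch is not the course's implementation. The word-to-index-term map `INDEX_TERM` is a hypothetical stand-in for stemming and stop-word removal, and reading $f_{ij}$ as the term frequency divided by the largest term frequency in the document is an assumption.

```python
import math

# Hypothetical preprocessing: surface word -> index term; other words are dropped.
INDEX_TERM = {
    "cat": "cat", "cats": "cat",
    "dog": "dog", "dogs": "dog",
    "mouse": "mouse", "mice": "mouse",
    "rat": "rat", "rats": "rat",
    "eat": "eat", "play": "play",
}

DOCS = {
    "D1": "Cats and dogs eat.",
    "D2": "The dog has a mouse",
    "D3": "Mice eat anything",
    "D4": "Cats play with mice and rats",
    "D5": "Cats play with rats",
}
QUERY = "Do cats play with mice?"


def term_freqs(text):
    """Count index-term occurrences in a text."""
    freqs = {}
    for word in text.lower().replace(".", " ").replace("?", " ").split():
        term = INDEX_TERM.get(word)
        if term is not None:
            freqs[term] = freqs.get(term, 0) + 1
    return freqs


doc_freqs = {name: term_freqs(text) for name, text in DOCS.items()}
N = len(DOCS)
n = {}  # n[i]: number of documents that contain term i
for freqs in doc_freqs.values():
    for term in freqs:
        n[term] = n.get(term, 0) + 1
idf = {term: math.log(N / count) for term, count in n.items()}


def doc_weights(freqs):
    """w_ij = f_ij * log(N / n_i), with f_ij = freq / max freq (assumption)."""
    max_f = max(freqs.values())
    return {t: (f / max_f) * idf[t] for t, f in freqs.items()}


def query_weights(freqs):
    """w_iq = (0.5 + 0.5 * freq_iq / max_l freq_lq) * log(N / n_i), query terms only."""
    max_f = max(freqs.values())
    return {t: (0.5 + 0.5 * f / max_f) * idf[t] for t, f in freqs.items() if t in idf}


def normalize(weights):
    """Divide each weight by the Euclidean length of the weight vector."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()}


q_bar = normalize(query_weights(term_freqs(QUERY)))
scores = {}
for name, freqs in doc_freqs.items():
    d_bar = normalize(doc_weights(freqs))
    # Signal summed at the document node: sum over i of w_bar_iq * w_bar_ij.
    scores[name] = sum(q_bar[t] * d_bar.get(t, 0.0) for t in q_bar)

ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With this preprocessing D4 ranks first and D5 second, since they share the most query terms, while D1 ranks last because it shares only the low-weight term "cat". The summed signal is exactly the cosine between the query and document weight vectors, so this first propagation step reproduces the classic vector-model ranking.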