Wang Hua 王化, Department of Information Science, fourth-year student
Measuring Closeness of Search Engines: Identification of Outliers and Visualization of Closeness
Motivation
Too many search engines: more than 20 major general-purpose engines, plus even more specific-purpose engines.
Simple aggregation of rankings is popular.
We address the need to quantify and visualize the closeness between search engines.
Too Many Search Engines with Different Policies
Major search engines: Yahoo, AltaVista, Google, Lycos, etc.
Distinct ranking policies: directory type, robot type, and PageRank type (based on hyperlinks)
Outline of Methods
Ranking
List distance measure
Distance between search engines
Ranking
Partial lists: for WWW sites, each engine's top-100 result list is used
List of results from search engines
Footrule Distance among Ranking Lists
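$\sigma, \tau$: ranking lists, where $\sigma(i)$ is the position of item $i$
$F(\sigma, \tau) = \sum_i |\sigma(i) - \tau(i)|$
Example: $\sigma = [a,b,c,d,e]$, $\tau = [a,d,e,c,b]$
Per item a, b, c, d, e: $0 + 3 + 1 + 2 + 2 = 8$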
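The footrule example can be checked with a short sketch. It assumes both lists rank exactly the same items (full permutations). The function name `footrule` is illustrative only.

```python
def footrule(sigma, tau):
    """Spearman footrule: sum over items of |position in sigma - position in tau|.

    Assumes sigma and tau are permutations of the same items.
    """
    if sorted(sigma) != sorted(tau):
        raise ValueError("lists must rank the same items")
    pos_tau = {item: idx for idx, item in enumerate(tau)}
    return sum(abs(idx - pos_tau[item]) for idx, item in enumerate(sigma))


print(footrule(list("abcde"), list("adecb")))  # 8
```

The function runs in O(n) time after building the position dictionary.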
Kendall-tau Distance
Definition [Dwork, WWW10, 2001]: counts the number of pairwise disagreements between two lists
$K(\sigma, \tau) = |\{(i, j) : \sigma(i) < \sigma(j) \text{ but } \tau(i) > \tau(j)\}|$
Example: $\sigma = [a,b,c,d]$, $\tau = [a,d,c,b]$
6 pairs: (a,b) (a,c) (a,d) (b,c) (b,d) (c,d)
Disagreements: $0 + 0 + 0 + 1 + 1 + 1 = 3$
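The pair-by-pair count above translates directly into a brute-force O(n²) sketch. It again assumes both lists rank the same items. The helper names are illustrative only.

```python
from itertools import combinations


def disagreeing_pairs(sigma, tau):
    """Pairs of items ordered one way in sigma and the other way in tau."""
    pos_s = {item: i for i, item in enumerate(sigma)}
    pos_t = {item: i for i, item in enumerate(tau)}
    return [
        (x, y)
        for x, y in combinations(sigma, 2)
        # a negative product means the two lists order x and y differently
        if (pos_s[x] - pos_s[y]) * (pos_t[x] - pos_t[y]) < 0
    ]


def kendall_tau(sigma, tau):
    return len(disagreeing_pairs(sigma, tau))


print(disagreeing_pairs(list("abcd"), list("adcb")))
# [('b', 'c'), ('b', 'd'), ('c', 'd')]
```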
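Properties of the Distances
Kendall-tau can be computed in O(n log n) time
Both distances satisfy the triangle inequality (they are metrics); the footrule is the L1-norm distance between the two position vectors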
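The O(n log n) bound comes from a standard reduction: write down the σ-positions of the items in τ's order. Each discordant pair then becomes one inversion, and inversions can be counted during a merge sort. This is a sketch of that textbook approach. The slides do not say which method was actually used.

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions) via merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left,
            # so it forms an inversion with each of them
            inversions += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau_fast(sigma, tau):
    """Kendall-tau distance between two permutations in O(n log n) time."""
    pos_sigma = {item: i for i, item in enumerate(sigma)}
    # Positions in sigma, read in tau's order: each discordant pair is one inversion.
    seq = [pos_sigma[item] for item in tau]
    return count_inversions(seq)[1]


print(kendall_tau_fast(list("abcd"), list("adcb")))  # 3
```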
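Distance Matrix
Keyword = “university”
The matrix is symmetric, so only the upper triangle is shown.

| Engines | Dmos | Alta | Yahoo | Overture | Excite | Lycos | Aol | Sprinks | Galaxy |
|---|---|---|---|---|---|---|---|---|---|
| Dmos | | 441 | 100 | 132 | 121 | 190 | 213 | 211 | 42 |
| Alta | | | 490 | 737 | 574 | 895 | 915 | 100 | 720 |
| Yahoo | | | | 2324 | 2123 | 1349 | 879 | 1221 | 1766 |
| Overture | | | | | 7162 | 7113 | 6254 | 945 | 312 |
| Excite | | | | | | 8927 | 9699 | 282 | 192 |
| Lycos | | | | | | | 8712 | 462 | 354 |
| Aol | | | | | | | | 461 | 365 |
| Sprinks | | | | | | | | | 123 |
| Galaxy | | | | | | | | | |

Table 4.2: The Closeness of Search Engines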
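The slides do not say how the Table 4.2 values were computed from partial (top-100) lists. As an illustration only, the sketch below fills a symmetric matrix using a footrule variant for top-k lists. In that variant, every item missing from a list is treated as if it sat at position k + 1.

The following are assumptions, not the author's method:
- the missing-item convention;
- the function names;
- the toy engine names and URLs, which are hypothetical and not real search results.

```python
from itertools import combinations


def topk_footrule(a, b):
    """Footrule between two top-k lists (hypothetical helper).

    Items absent from a list are assumed to sit at position k + 1.
    """
    k = max(len(a), len(b))
    pos_a = {item: i + 1 for i, item in enumerate(a)}
    pos_b = {item: i + 1 for i, item in enumerate(b)}
    items = set(a) | set(b)
    return sum(abs(pos_a.get(x, k + 1) - pos_b.get(x, k + 1)) for x in items)


def distance_matrix(results):
    """results: dict mapping engine name -> ranked list of URLs (assumed input)."""
    names = list(results)
    matrix = {name: {} for name in names}
    for x, y in combinations(names, 2):
        d = topk_footrule(results[x], results[y])
        matrix[x][y] = matrix[y][x] = d  # distance is symmetric
    return matrix


toy = {
    "EngineA": ["u1", "u2", "u3"],
    "EngineB": ["u1", "u3", "u2"],
    "EngineC": ["u4", "u5", "u6"],
}
m = distance_matrix(toy)
print(m["EngineA"]["EngineB"], m["EngineA"]["EngineC"])  # 2 12
```

Under this convention, engines that share no results get the largest possible value for their list length.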
Visualization
Kernighan-Lin Algorithm
Kamada Spring Model
Comparison of the 2 methods
Kernighan-Lin Method
Brief explanation
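The slides give no details of how Kernighan-Lin was applied. The classic Kernighan-Lin heuristic splits a weighted graph into two halves while keeping the total weight of edges crossing the cut small. Each pass tentatively swaps pairs of nodes and then keeps the most profitable prefix of those swaps.

This minimal sketch assumes the edge weights express similarity between engines (for example, derived from the closeness values), so that similar engines tend to land in the same half. Under the color-coding view, each half would get one color. The weight semantics and the function name are assumptions.

```python
def kernighan_lin(w, max_passes=10):
    """Kernighan-Lin bisection of nodes 0..n-1.

    w is a symmetric n x n matrix of non-negative edge weights. Returns two
    sets of sizes n // 2 and n - n // 2 whose crossing edges have (locally)
    minimal total weight.
    """
    n = len(w)
    a, b = set(range(n // 2)), set(range(n // 2, n))
    for _ in range(max_passes):
        # d[v] = external cost - internal cost of node v
        d = {}
        for v in range(n):
            own, other = (a, b) if v in a else (b, a)
            d[v] = sum(w[v][u] for u in other) - sum(w[v][u] for u in own if u != v)
        free_a, free_b = set(a), set(b)
        swaps, gains = [], []
        while free_a and free_b:
            # pick the unlocked pair whose swap reduces the cut the most
            gain, x, y = max(
                (d[p] + d[q] - 2 * w[p][q], p, q) for p in free_a for q in free_b
            )
            swaps.append((x, y))
            gains.append(gain)
            free_a.remove(x)
            free_b.remove(y)
            # update d values as if x and y had been swapped
            for u in free_a:
                d[u] += 2 * w[u][x] - 2 * w[u][y]
            for u in free_b:
                d[u] += 2 * w[u][y] - 2 * w[u][x]
        # keep only the prefix of tentative swaps with the largest total gain
        best_k, best_total, running = 0, 0, 0
        for k, gain in enumerate(gains, start=1):
            running += gain
            if running > best_total:
                best_k, best_total = k, running
        if best_k == 0:
            break  # no improving prefix: local optimum reached
        for x, y in swaps[:best_k]:
            a.remove(x)
            b.remove(y)
            a.add(y)
            b.add(x)
    return a, b


w = [[0, 0, 10, 1], [0, 0, 1, 10], [10, 1, 0, 0], [1, 10, 0, 0]]
print(kernighan_lin(w))  # ({0, 2}, {1, 3})
```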
Kernighan-Lin by Color Coding: Keyword1 = “Totti”, Keyword2 = “Nakata”
Kernighan-Lin by Color Coding: Keyword1 = “Gucci”, Keyword2 = “Hermes”
Kamada Spring Model
Brief explanation
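The slides name the Kamada spring model but give no formulas. In Kamada and Kawai's model:
- every pair of nodes is joined by a spring whose natural length is proportional to the target distance d_ij;
- each spring's stiffness is proportional to 1/d_ij²;
- the layout minimizes the total spring energy.

A minimal sketch of that energy follows. It assumes the target distances come straight from a matrix such as Table 4.2, with all off-diagonal entries positive, rather than from graph shortest paths. The optimizer is swapped in: instead of Kamada and Kawai's Newton-Raphson step, it uses a localized stress-majorization update, which never increases the energy. The function names are illustrative.

```python
import math


def stress(pos, dist):
    """Kamada-Kawai-style energy (up to a constant): sum of (r_ij - d_ij)^2 / d_ij^2."""
    n = len(pos)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(pos[i], pos[j])
            total += (r - dist[i][j]) ** 2 / dist[i][j] ** 2
    return total


def spring_layout(dist, iterations=200):
    """2-D positions whose pairwise distances approximate dist (off-diagonal > 0)."""
    n = len(dist)
    scale = max(max(row) for row in dist)
    # start with the nodes evenly spaced on a circle
    pos = [
        [scale * math.cos(2 * math.pi * i / n), scale * math.sin(2 * math.pi * i / n)]
        for i in range(n)
    ]
    for _ in range(iterations):
        for i in range(n):
            num_x = num_y = den = 0.0
            for j in range(n):
                if j == i:
                    continue
                w = 1.0 / dist[i][j] ** 2  # spring stiffness ~ 1 / d_ij^2
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                r = math.hypot(dx, dy) or 1e-12
                # target point: d_ij away from node j, in node i's current direction
                num_x += w * (pos[j][0] + dist[i][j] * dx / r)
                num_y += w * (pos[j][1] + dist[i][j] * dy / r)
                den += w
            # minimizing the quadratic majorizer in p_i: stress cannot increase
            pos[i] = [num_x / den, num_y / den]
    return pos


dist = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
start, final = spring_layout(dist, iterations=0), spring_layout(dist)
print(stress(start, dist), stress(final, dist))
```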
An example
Kamada Spring Model: Keyword1 = “Totti”, Keyword2 = “Nakata”
Comparison of the 2 methods
Results
Distances between search engines vary.
Different fields (query topics) have different characteristics.
Some search engines, such as Sprinks, are far from the others.
Excite and Aol are close to each other in most cases.
Conclusion
Address the need to quantify and visualize the closeness between search engines.
Provide users with a GUI for viewing the closeness of search engines.
Help users select appropriate search engines.
Help users see the features of each search engine in various fields.
Future Work
Use more search engines
Use both general-purpose and special-purpose search engines
Use hyperlinks to measure resemblance between search engines
Apply this idea to other fields