The Google Similarity Distance
Intelligent Database Systems Lab
National Yunlin University of Science and Technology (國立雲林科技大學)
Presenter: Chien-Hsing Chen
Authors: Rudi L. Cilibrasi, Paul M.B. Vitanyi
The Google Similarity Distance
IEEE TKDE, 2007
N.Y.U.S.T. I. M.

Outline
Motivation, Objective, NGD, Experiments, Conclusions, Personal Opinion

Motivation
Designing structures capable of manipulating knowledge is very costly.
So is having knowledgeable human experts enter high-quality content into these structures.
Such efforts are long-running and large-scale.

Objective
The authors develop a method that uses only the names of objects to obtain knowledge about their similarity.
A regular FCA (formal concept analysis), as used in ontology building, acquires the similarity between objects and attributes.

The Google Similarity Distance
Kolmogorov complexity
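The formulas on this slide did not survive in the transcript. For orientation only, the NGD definition given in the paper can be written as below, where f(x) is the number of pages containing x, f(x, y) the number of pages containing both x and y, and N the total number of indexed pages. In the paper this form is motivated by replacing the uncomputable Kolmogorov complexity with code lengths derived from page counts.

```latex
\mathrm{NGD}(x, y) =
  \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}
       {\log N - \min\{\log f(x), \log f(y)\}}
```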

The Google Similarity Distance
NGD(horse, rider) = 0.443
"horse": 46,700,000 pages
"rider": 12,200,000 pages
"horse, rider": 2,630,000 pages
N = 8,058,044,651 indexed pages
NGD(pensi, cola) = 0.797
NGD(賓拉登, 攻擊) = 0.64 (query terms: "Bin Laden", "attack")
NGD(horse, rider) = 0.898
NGD(book, drink) = 0.694
NGD(web, network) = 0.2768
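The horse/rider figure can be checked from the page counts quoted on the slide. The snippet below is a minimal sketch, assuming the NGD definition from the paper; the function name `ngd` and its parameter names are illustrative and not taken from the slides.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from raw page counts.

    fx, fy: pages containing each term; fxy: pages containing both;
    n: total number of indexed pages. The base of the logarithm
    cancels out in the ratio, so natural log is used here.
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Page counts quoted on the slide for "horse" and "rider".
d = ngd(46_700_000, 12_200_000, 2_630_000, 8_058_044_651)
print(round(d, 3))  # 0.443
```

Only the first horse/rider value (0.443) can be reproduced this way; the other values listed on the slide come without page counts.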

Applications and Experiments
Hierarchical Clustering
Given a set of objects in a space provided with a distance measure, the distance matrix has as entries the pairwise distances between the objects.
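A pairwise distance matrix of this kind can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the distance is a toy absolute difference rather than real NGD values, and single-linkage merging is used only as a simple stand-in, not as the clustering method of the paper.

```python
from itertools import combinations

def pairwise_matrix(items, dist):
    """Return a symmetric matrix m with m[i][j] = dist(items[i], items[j])."""
    n = len(items)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = dist(items[i], items[j])
    return m

def single_linkage(items, m):
    """Repeatedly merge the two closest clusters; return the merge order."""
    clusters = [(i,) for i in range(len(items))]
    merges = []
    while len(clusters) > 1:
        # Cluster distance = smallest distance between any two members.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(m[i][j] for i in pair[0] for j in pair[1]),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append(([items[i] for i in a], [items[i] for i in b]))
    return merges

# Toy stand-in data and distance, NOT real NGD values.
points = [1.0, 1.5, 9.0, 10.0]
matrix = pairwise_matrix(points, lambda x, y: abs(x - y))
merge_order = single_linkage(points, matrix)
```

With real data, `dist` would be a function returning the NGD of two search terms.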

Applications and Experiments
Hierarchical Clustering
Dataset: 17th-century painters

Applications and Experiments
SVM-NGD Learning
The authors use six anchor words $a_1, \dots, a_6$ to convert each of the 40 training words $w_1, \dots, w_{40}$ into a 6-dimensional training vector $v_1, \dots, v_{40}$.
The entry $v_{j,i}$ of $v_j = (v_{j,1}, \dots, v_{j,6})$ is defined as $v_{j,i} = \mathrm{NGD}(w_j, a_i)$ ($1 \le j \le 40$, $1 \le i \le 6$).
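The conversion from words to training vectors can be sketched as below. Everything named here is an assumption for illustration: `ngd_vector` is a hypothetical helper, the anchors and words are placeholders, and `fake_ngd` is a deterministic toy function standing in for a real NGD lookup. SVM training itself is not shown.

```python
def ngd_vector(word, anchors, ngd):
    """Return [NGD(word, a_1), ..., NGD(word, a_k)] for the given anchors."""
    return [ngd(word, a) for a in anchors]

# Placeholder anchors a1..a6 (hypothetical names, not from the slides).
anchors = ["a1", "a2", "a3", "a4", "a5", "a6"]

def fake_ngd(w, a):
    # Toy stand-in returning a value in [0, 1); NOT a real NGD.
    return (len(w) + int(a[1:])) % 10 / 10

# Placeholder training words w1..w40.
training_words = ["w%d" % j for j in range(1, 41)]
vectors = [ngd_vector(w, anchors, fake_ngd) for w in training_words]
```

Each of the 40 resulting vectors has six entries, one per anchor, and would be passed with its label to the SVM.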

NGD Translation

Comparison to WordNet semantics
The authors randomly selected 100 semantic categories from the WordNet database.
For each category, an SVM is trained on 50 labeled training samples. Positive examples are drawn from WordNet; the others are drawn from a dictionary.
Each experiment uses six anchors in total: three from WordNet and three from the dictionary.
The test set consists of 20 new examples.
In all, 100 experiments were run.
The authors ignore false negatives.

Conclusion
This knowledge base was created over the course of decades by paid human experts.
Google has already indexed more than 8 billion pages and shows no signs of slowing down.
The figure of 8 billion indexed pages is an estimate from 2004.

Opinion
Advantage: using the Google search engine as the basis for a similarity measure has recently gained recognition.
Drawback: how the anchors are determined; the accuracy measure ignores false negatives.
NGD is not particularly novel; it is a straightforward demonstration.
Application