Semantic Analysis in Language Technology http://stp.lingfil.uu.se/~santinim/sais/2014/sais_2014.htm
Semantic Word Clouds
Marina Santini [email protected]
Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
Autumn 2014
Lect 10: Semantic Word Clouds
Acknowledgements
• Some slides borrowed from Sergey Pupyrev.
Outline
• Word Clouds
• 3 early algorithms
• 3 new algorithms
• Metrics & Quantitative Evaluation
Word Clouds
• Word clouds have become a standard tool for abstracting, visualizing and comparing texts…
• We could apply the same or similar techniques to the huge amounts of tags produced by users interacting in social networks
Comparison & Conceptualization Tool
• Word clouds as a tool for "conceptualizing" documents. Cf. ontologies
• Ex: 2008, comparison of speeches: Obama vs McCain
Word Clouds and Tag Clouds…
• … are often used to represent importance among terms (e.g., band popularity) or serve as a navigation tool (e.g., Google search results).
The Problem…
• How to compute semantic-preserving word clouds in which semantically related words are close to each other.
Wordle http://www.wordle.net
• Practical tools, like Wordle, make word cloud visualization easy.
• Shortcoming: they do not capture the relationships between words in any way
Many word clouds are arranged randomly (look also at the scattered colours)
Semantic Patterns
• Humans instinctively tend to pick up patterns
• Instinctively, one could say that two words that are close to each other in a word cloud are semantically related.
So, it makes sense to place such related words close to each other (look also at the color distribution)
In linguistics and in LT…
• … if a pair of words often appears together in a sentence, then we can assume that this pair of words is semantically related.
Semantic word clouds have higher user satisfaction compared to other layouts…
All recent word cloud visualization tools aim to incorporate semantics in the layout…
… but none of them provides any guarantee about the quality of the layout in terms of semantics
Early algorithms: Force-Directed Graph
• Most of the existing algorithms are based on force-directed graph layout.
• Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way
– Attractive forces between pairs to reduce empty space
– Repulsive forces ensure that words do not overlap
– A final force preserves semantic relations between words.
Force-directed graph drawing algorithms assign forces among the set of edges and the set of nodes of a graph drawing. Typically, spring-like attractive forces based on Hooke's law are used to attract pairs of endpoints of the graph's edges towards each other, while simultaneously repulsive forces like those of electrically charged particles based on Coulomb's law are used to separate all pairs of nodes.
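A minimal sketch of one iteration of such a scheme, assuming nodes are 2-D points and edges are unweighted. The function name `force_step` and the constants `k_spring`, `k_repulse`, `rest_len` and `step` are illustrative choices made here, not values from the slides, and this is not the layout code of any particular word cloud tool.

```python
import math

def force_step(pos, edges, k_spring=0.1, k_repulse=1.0, rest_len=1.0, step=0.1):
    """Move every node once: spring attraction on edges, repulsion on all pairs."""
    forces = {v: [0.0, 0.0] for v in pos}
    nodes = list(pos)

    # Coulomb-like repulsion between every pair of nodes (magnitude ~ 1 / d^2).
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = k_repulse / (d * d)
            forces[u][0] += f * dx / d
            forces[u][1] += f * dy / d
            forces[v][0] -= f * dx / d
            forces[v][1] -= f * dy / d

    # Hooke-like attraction along edges (magnitude ~ extension beyond rest_len).
    for u, v in edges:
        dx = pos[v][0] - pos[u][0]
        dy = pos[v][1] - pos[u][1]
        d = math.hypot(dx, dy) or 1e-9
        f = k_spring * (d - rest_len)
        forces[u][0] += f * dx / d
        forces[u][1] += f * dy / d
        forces[v][0] -= f * dx / d
        forces[v][1] -= f * dy / d

    # Take a small step in the direction of the net force.
    return {v: (pos[v][0] + step * forces[v][0],
                pos[v][1] + step * forces[v][1]) for v in pos}
```

Calling `force_step` repeatedly until the positions stop changing gives a layout; in a word cloud setting the edge set would come from the semantic relations between words.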
Newer Algorithms: rectangle representation of graphs
• Vertex-weighted and edge-weighted graph:
– The vertices of the graph are the words
  • Their weights correspond to some measure of importance (e.g. word frequencies)
– The edges capture the semantic relatedness of pairs of words (e.g. co-occurrence)
  • Their weights correspond to the strength of the relation
– Each vertex can be drawn as a box (rectangle) with dimensions determined by its weight
– The realized adjacency is the sum of the edge weights for all pairs of touching boxes.
– The goal is to maximize the realized adjacencies.
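The realized-adjacency objective can be sketched as follows. This is an illustration under assumptions made here: boxes are axis-aligned tuples `(x, y, width, height)`, two boxes "touch" when they share a boundary segment of positive length, and the names `touches` and `realized_adjacency` are hypothetical.

```python
def touches(a, b, eps=1e-9):
    """True if two axis-aligned boxes (x, y, w, h) share a boundary segment."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap of the projections on each axis (a negative value means a gap).
    ox = min(ax + aw, bx + bw) - max(ax, bx)
    oy = min(ay + ah, by + bh) - max(ay, by)
    side_by_side = abs(ox) <= eps and oy > eps
    stacked = abs(oy) <= eps and ox > eps
    return side_by_side or stacked

def realized_adjacency(boxes, edge_weights):
    """Sum the weights of all edges whose two word boxes touch."""
    return sum(w for (u, v), w in edge_weights.items()
               if touches(boxes[u], boxes[v]))

# Toy layout: "word" and "cloud" touch, "graph" is placed far away.
boxes = {"word": (0, 0, 4, 1), "cloud": (4, 0, 5, 1), "graph": (20, 0, 5, 1)}
weights = {("word", "cloud"): 0.8, ("word", "graph"): 0.5, ("cloud", "graph"): 0.3}
score = realized_adjacency(boxes, weights)
```

In this toy layout only the `("word", "cloud")` edge is realized, so only its weight counts towards the score.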
Experimental Setup:
1) Term Extraction
2) Ranking
3) Similarity Computation
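A rough sketch of how these three steps could fit together. Everything specific here is an assumption made for illustration: the toy stop-word list, regex tokenization, ranking by raw frequency, and the Jaccard coefficient over sentence co-occurrence as the similarity measure. The slides do not say which choices the experiments used.

```python
import re
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are"}  # toy list

def extract_terms(text):
    """Step 1: split into sentences, tokenize, drop stop words."""
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]
    return [[w for w in re.findall(r"[a-z]+", s) if w not in STOP_WORDS]
            for s in sentences]

def rank_terms(sentences, top_n):
    """Step 2: rank terms by frequency and keep the top_n."""
    counts = Counter(w for s in sentences for w in s)
    return [w for w, _ in counts.most_common(top_n)]

def jaccard_similarity(sentences, terms):
    """Step 3: similarity of two terms = Jaccard overlap of the sentences they occur in."""
    occurs = {t: {i for i, s in enumerate(sentences) if t in s} for t in terms}
    sim = {}
    for u, v in combinations(terms, 2):
        union = occurs[u] | occurs[v]
        sim[(u, v)] = len(occurs[u] & occurs[v]) / len(union) if union else 0.0
    return sim

text = ("Word clouds show words. Semantic clouds group related words. "
        "Graphs are different.")
sentences = extract_terms(text)
terms = rank_terms(sentences, top_n=2)
similarity = jaccard_similarity(sentences, terms)
```

The resulting term weights and pairwise similarities are the kind of input a vertex-weighted, edge-weighted graph layout would consume.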
Early Algorithms
1. Wordle (Random)
2. Context-Preserving Word Cloud Visualization (CPWCV)
3. Seam Carving
Wordle → Random
• The Wordle algorithm places one word at a time in a greedy fashion, aiming to use space as efficiently as possible.
• First the words are sorted by weight in decreasing order.
• Then for each word in the order, a position is picked at random.
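The steps above can be sketched as a small greedy loop. This is not Wordle's actual code: the rectangular word boxes, the fixed canvas, and the rule "retry a new random position until the box overlaps nothing already placed" are assumptions added here to make the sketch complete.

```python
import random

def overlaps(a, b):
    """True if two axis-aligned boxes (x, y, w, h) overlap in their interiors."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def random_layout(words, canvas=(100.0, 100.0), max_tries=1000, seed=0):
    """words maps word -> (weight, width, height); returns word -> box."""
    rng = random.Random(seed)
    placed = {}
    # Heaviest words first, as in the description above.
    for word, (_, w, h) in sorted(words.items(), key=lambda kv: -kv[1][0]):
        for _ in range(max_tries):
            box = (rng.uniform(0, canvas[0] - w), rng.uniform(0, canvas[1] - h), w, h)
            if not any(overlaps(box, other) for other in placed.values()):
                placed[word] = box
                break
        # A word that never finds a free spot is skipped (assumption of this sketch).
    return placed

layout = random_layout({"cloud": (10, 30, 10), "word": (7, 20, 8), "tag": (3, 10, 5)})
```

Because positions depend only on free space, related words end up near each other only by chance, which is the shortcoming the semantic layouts try to address.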
1: Random
2: Random
3: Random
4: Random
5: Random
6: Random
Context-Preserving Word Cloud Visualization (CPWCV)
• First, a dissimilarity matrix is computed and Multidimensional Scaling (MDS) is performed
• Second, an effort is made to create a compact layout
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset.
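To make the MDS step concrete, here is a minimal sketch of classical (Torgerson) MDS with NumPy. Using the classical variant is an assumption made here; the slides do not say which MDS variant CPWCV uses, and the function name `classical_mds` is hypothetical.

```python
import numpy as np

def classical_mds(D, dims=2):
    """Embed n items in `dims` dimensions from an n x n dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared dissimilarities.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh applies; keep the largest `dims` eigenpairs.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Dissimilarities of three "words" that happen to form a 3-4-5 right triangle.
D = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])
coords = classical_mds(D)
```

Each row of `coords` is a 2-D position for one word; dissimilar words land far apart, which gives the initial placement that a later compaction step can tighten.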
1: Context-Preserving
2: Context-Preserving: repulsive force
3: Context-Preserving: attractive force
Seam Carving
• Seam carving is a content-aware image resizing technique
• Basically, an algorithm for image resizing
• It was invented at Mitsubishi’s
1: Seam Carving
2: Seam Carving : space is divided into regions
3: Seam Carving: empty paths trimmed out iteratively
4: Seam Carving
5: Seam Carving
6: Seam Carving: space divided into regions
7: Seam Carving
3 New Algorithms
1. Inflate and Push
2. Star Forest
3. Cycle Cover
Inflate-and-Push
• Simple heuristic method for word layout, which aims to preserve semantic relations between pairs of words.
1: Inflate
2: Inflate : scaling down
3: Inflate: semantically related words are placed close to each other
4: Inflate : repulsive force to resolve overlaps
5: Inflate
Star Forest
• A star is a tree in which at most one vertex (the center) has degree greater than one, and a star forest is a forest whose connected components are all stars.
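The definition can be checked mechanically. The sketch below relies on a standard observation: a simple graph is a star forest exactly when every edge has at least one endpoint of degree 1 (a leaf). The function name `is_star_forest` and the edge-list input format are choices made here for illustration.

```python
from collections import Counter

def is_star_forest(edges):
    """edges: iterable of (u, v) pairs of a simple undirected graph."""
    edges = list(edges)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # In a star every edge joins the center to a leaf, so one endpoint has degree 1.
    # A cycle or a path with three or more edges contains an edge whose
    # endpoints both have degree >= 2.
    return all(degree[u] == 1 or degree[v] == 1 for u, v in edges)

two_stars = [("cloud", "word"), ("cloud", "tag"), ("graph", "edge")]
path = [("a", "b"), ("b", "c"), ("c", "d")]
```

Here `two_stars` is a star forest with centers "cloud" and "graph" (or "edge"), while `path` is not, because the middle edge `("b", "c")` has no leaf endpoint.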
Star Forest: star = graph
• Dissimilarity matrix → disjoint stars = star forest
• Attractive force to get a compact layout
Cycle Cover
• This algorithm is based on a similarity matrix.
• First, a similarity path (= cycle) is created
• Then, the optimal level of compactness is computed
Quantitative Metrics
Criteria
1. Realized Adjacencies – how close are similar words to each other?
2. Distortion – how distant are dissimilar words?
3. Compactness – how well utilized is the drawing area?
4. Uniform Area Utilization – uniformity of the distribution (overpopulated vs sparse areas in the word cloud)
5. Aspect Ratio – width and height of the bounding box
6. Running Time – execution time
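Two of these criteria are easy to illustrate on a set of word boxes. The formulas below are assumptions made for this sketch, since the slides give no exact definitions: compactness is taken as the fraction of the bounding box covered by (non-overlapping) word boxes, and aspect ratio as bounding-box width divided by height.

```python
def bounding_box(boxes):
    """Smallest axis-aligned rectangle (x, y, w, h) containing all boxes."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return (x0, y0, x1 - x0, y1 - y0)

def compactness(boxes):
    """Used area divided by bounding-box area (assumes boxes do not overlap)."""
    _, _, bw, bh = bounding_box(boxes)
    used = sum(w * h for _, _, w, h in boxes)
    return used / (bw * bh)

def aspect_ratio(boxes):
    """Width of the bounding box divided by its height."""
    _, _, bw, bh = bounding_box(boxes)
    return bw / bh

tight = [(0, 0, 4, 2), (4, 0, 4, 2), (0, 2, 8, 2)]   # fills its bounding box
loose = [(0, 0, 2, 2), (2, 2, 2, 2)]                 # half of the box is empty
```

With these definitions the `tight` layout scores a compactness of 1.0 and the `loose` one 0.5, so higher values mean better use of the drawing area.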
2 datasets
(1) WIKI, a set of 112 plain-text articles extracted from the English Wikipedia, each consisting of at least 200 distinct words
(2) PAPERS, a set of 56 research papers published in conferences on experimental algorithms (SEA and ALENEX) in 2011-2012.
Cycle Cover wins
Seam Carving wins
Random wins
Inflate wins
Random and Seam Carving win
All ok except Seam Carving
Demo
Final Words
The end