Glimpses Through the Clouds: collocates in a new light, David Beavan, University of Glasgow, DH2008

Glimpses through the clouds: Collocates in a new light
David Beavan, Department of English Language


DESCRIPTION

Talk given at Digital Humanities 2008 (DH2008) in Oulu, Finland on 27 June 2008.

Web site: http://www.scottishcorpus.ac.uk/corpus/bnc/
Abstract: http://www.ekl.oulu.fi/dh2008/Digital%20Humanities%202008%20Book%20of%20Abstracts.pdf

This paper demonstrates a web-based, interactive data visualisation that lets users quickly inspect and browse the collocational relationships present in a corpus. The software is inspired by tag clouds, first popularised by the online photo-sharing website Flickr (www.flickr.com). A paper based on a prototype of this Collocate Cloud visualisation was given at Digital Resources for the Humanities and Arts 2007. The software has since matured, offering new ways of navigating and inspecting the source data. It has also been expanded to analyse additional corpora, such as the British National Corpus (http://www.natcorp.ox.ac.uk/), which will be the focus of this talk.

TRANSCRIPT

Page 1: Glimpses Through the Clouds: collocates in a new light, David Beavan, University of Glasgow, DH2008

Glimpses through the clouds: Collocates in a new light
David Beavan, Department of English Language

What are clouds?


Cloud properties

Alphabetical listing of items
  • Good for navigation
  • Quickly locate or discount a known item
  • Limited number of items (Flickr tag cloud = 150)

Font size shows popularity
  • Good for browsing
  • Often used tags ‘jump out at you’
  • Limited usefulness if less popular terms are sought
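The two properties above (a capped, alphabetical listing and popularity shown by font size) can be sketched in a few lines of Python. This is an illustration only, not the code behind Flickr or the talk's software: the function name `build_cloud`, the point-size range, and the logarithmic scaling are all assumptions. Only the 150-item cap comes from the slide.

```python
import math

def build_cloud(counts, max_items=150, min_pt=10, max_pt=36):
    """Return alphabetically ordered (item, font_size) pairs for a simple cloud.

    `counts` maps each item to a positive usage count.
    """
    # Keep only the most popular items (the slide cites 150 for Flickr).
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:max_items]
    if not top:
        return []
    lo = min(count for _, count in top)
    hi = max(count for _, count in top)
    span = math.log(hi) - math.log(lo)
    cloud = []
    for item, count in top:
        # Assumption: log scaling, so a few very popular items do not
        # shrink everything else to the minimum size.
        weight = (math.log(count) - math.log(lo)) / span if span else 1.0
        cloud.append((item, round(min_pt + weight * (max_pt - min_pt))))
    # Alphabetical order supports navigation; font size carries popularity.
    return sorted(cloud, key=lambda pair: pair[0].lower())
```

Lowering `max_items` makes the trade-off on the slide visible: less popular terms drop out of the cloud entirely.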


Word frequency cloud

Shares properties with tag clouds
  • Words listed alphabetically: good for navigation
  • Font size shows frequency of word: good for browsing

Restricted view
  • Summarises the document as a whole
  • Does not give insight into the usage or context of each word; for this we need to look at co-occurrences/collocates

Our corpus

[Diagram: row of dots]

Collocates of ‘blue’

[Diagram: row of dots, repeated across four consecutive slides]


Co-occurrences as a cloud

Using the British National Corpus (BNC)
  • Popular and well known
  • 100 million word corpus
  • British English
  • Compiled in early 1990s
  • Wide range of genres
  • Written and spoken data
  • 2007 XML edition


Co-occurrence clouds
  • 100 most frequent co-occurring word pairs
  • Rendered as a cloud
  • Inherit cloud benefits of navigation and exploration
  • Allow user to create new clouds from visible words

What’s missing
  • KWIC concordance of word pairs
  • Measure of collocation strength
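Counting co-occurring words around a chosen node word might look like the sketch below. The talk does not state how the BNC data was tokenised or how wide the co-occurrence window was, so the regular-expression tokeniser, the `window` of four words either side, and the function name `cooccurrences` are assumptions made for illustration. Only the default of 100 pairs comes from the slide.

```python
import re
from collections import Counter

def cooccurrences(text, node, window=4, top_n=100):
    """Return the `top_n` most frequent words found near `node`.

    A word counts as co-occurring if it falls within `window` tokens
    to the left or right of an occurrence of `node`.
    """
    # Assumption: a crude lowercase tokeniser; a real corpus such as the
    # BNC ships with its own word segmentation.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbours)
    return counts.most_common(top_n)
```

Calling the function again with any returned word as the new `node` mirrors the idea of creating new clouds from visible words.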

Our corpus

[Diagram: row of dots]

Collocates of ‘brown’

[Diagram: row of dots]


Collocate clouds
  • 100 most frequent co-occurring word pairs
  • Rendered as a cloud
  • Inherit cloud benefits of navigation and exploration
  • Allow user to create new clouds from visible words
  • KWIC concordance of word pairs
  • Measure of collocation strength
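The two additions that turn a co-occurrence cloud into a collocate cloud can be sketched as follows. The slides do not name the strength measure used, so pointwise mutual information (PMI) is swapped in here as one common choice, in a simplified form that ignores window size. The tokeniser, the function names, and the `window` and `width` defaults are likewise assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: crude lowercase tokeniser, for illustration only.
    return re.findall(r"[a-z']+", text.lower())

def near(tokens, i, window):
    """Tokens within `window` positions either side of index `i`."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]

def kwic(tokens, node, collocate, window=4, width=4):
    """Key Word In Context lines for `node` where `collocate` is nearby."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node and collocate in near(tokens, i, window):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def pmi(tokens, node, collocate, window=4):
    """Simplified PMI: log2 of observed over expected co-occurrence."""
    freq = Counter(tokens)
    joint = sum(
        1
        for i, tok in enumerate(tokens)
        if tok == node and collocate in near(tokens, i, window)
    )
    if joint == 0:
        return float("-inf")
    return math.log2(joint * len(tokens) / (freq[node] * freq[collocate]))
```

Raw frequency rewards common words such as ‘the’; a strength measure of this kind instead favours pairs that occur together more often than their individual frequencies would predict.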


Future

Advantages
  • Easy to interpret and use
  • Lowers the barrier to corpus analysis
  • Iterative nature promotes browsing and investigation

Improvements
  • Allow use of stopwords / filter words
  • Configure ‘size’ of cloud
  • Show POS
  • Group words under their headword
  • Make your own?
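The first two proposed improvements, stopword filtering and a configurable cloud size, could be prototyped as a small post-processing step. This is a hypothetical sketch, not a description of how the software implements or plans to implement them; the function name `filter_collocates` and its parameters are invented for illustration.

```python
def filter_collocates(pairs, stopwords=frozenset(), size=100):
    """Drop stopwords from (word, count) pairs and keep the `size` most frequent."""
    # Compare in lowercase so 'The' and 'the' are filtered alike.
    kept = [(word, count) for word, count in pairs if word.lower() not in stopwords]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:size]
```

For example, `filter_collocates(pairs, stopwords={"the", "of"}, size=50)` would yield a smaller cloud without those two function words.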


Glimpses through the clouds: Collocates in a new light
David Beavan
[email protected]