Kim Ebensgaard Jensen, Aalborg University
Constructions in Wonderland: Exploring the functionality of constructions through N-grams
Functionality of Language | Sprogets Funktionalitet, Aalborg University
10 December, 2014
Yoshikata Shibuya, Kyoto University of Foreign Studies
Introduction
• Is there a way to (semi-)automatically identify constructions in texts, discourses, and corpora?
• Could N-gram analysis and N-gram based network analysis be ways to do that?
• How can (semi-)automatic identification of constructions help us learn about the functional contributions of constructions in discourse?
• To this end, we will analyze two literary classics and four political speeches.
Outline
• Theoretical framework
  • Positioning of paper
  • (Very) basic principles of construction grammar
  • Functionality of constructions
• Method
  • Data collection and methods
  • Wordclouds
  • N-grams
  • Network analysis
• Analyses
  • Wordclouds
  • N-grams
  • Networks
Theoretical grounding
Theoretical and methodological positioning of this paper
• Theoretical positioning
• Construction grammar (e.g. Goldberg 1995; Croft 2001)
• Cognitive poetics/stylistics (e.g. Stockwell 2002)
• Cognitive discourse analysis (e.g. Hart 2013)
• Methodological positioning
• Corpus stylistics (e.g. Semino & Short 2004)
• Corpus-aided discourse studies (e.g. Baker 2012)
• Text-mining (e.g. Miner et al. 2012)
Constructions
In construction grammar, a construction is a functional unit that pairs form and semantic and/or discourse-pragmatic function (Goldberg 1995, 2006; Croft 2001, 2005; Hilpert 2014).
Examples from English - [form]/[function] (cf. Langacker 1987):
• [S V IO DO]/[TRANSFER OF POSSESSION] (Goldberg 1995)
• [X BE so Y that Z]/[SCALAR CAUSATION] (Bergen & Binsted 2004)
• [you don't want me to V]/[THREATENING SPEECH ACT] (Martínez 2013)
• [to begin with]/[INTRODUCTION OF LIST OF ITEMS] (Lipka & Schmid 1994)
• [PROacc CLinf (or NP)]/[DISBELIEF TOWARDS PROPOSITION] (Lambrecht 1990)
• 'What, me worry?!', 'Him a doctor?!' etc.
Constructions
• Constructions may be schematic, substantive (fixed), or something in-between (Fillmore et al. 1988).
• Constructions may be atomic/simple, complex, or something in-between and form a lexicon-syntax continuum (e.g. Goldberg 1995, Croft 2001).
• Language competence is an inventory of constructions (a.k.a. the construct-i-con) of varying degrees of abstraction which are instantiated in language use (e.g. Goldberg 1995).
• In most contemporary incarnations of construction grammar, the construct-i-con is usage-based (e.g. Croft 2001).
• Constructions are subject to general human cognitive processes and principles, such that language is not a separate, autonomous cognitive faculty; thus, construction grammar is part of the overall endeavor of cognitive linguistics.
The discursive and stylistic functionality of constructions
• If constructions are functional units (pairings of form and meaning/function), then they logically must contribute to discourse as part of a speaker's linguistic repertoire. Here are two examples:
• Writers of fiction may use constructions in...
  • descriptions of actions and happenings;
  • characterizations (Culpeper 2009) and mind styles (Fowler 1977) by having characters use certain constructions in their dialog and narrative, or by using certain constructions in the descriptions of characters or of their actions;
  • setting up the text-world and specifying temporal relations in the narrative;
  • foregrounding, deviation, parallelism etc. (e.g. Short & Leech 2007).
• In political speeches, speakers may use constructions in...
  • framing of issues and other ideologically based representations;
  • organization of topics/issues;
  • rhetorical strategies (including parallelisms etc.).
Data collection and methods
• Text data
  • Alice's Adventures in Wonderland by Lewis Carroll (1865), obtained via Project Gutenberg = AW
  • Adventures of Huckleberry Finn by Mark Twain (1884), obtained via Project Gutenberg = HF
  • Inaugural speeches by US Presidents, obtained via Bartleby
• Text mining and corpus-linguistic methods
  • Wordclouds (R package ‘wordcloud’)
  • N-grams (R package ‘tau’, AntConc)
  • Network analysis (R package ‘igraph’)
  • Concordances (AntConc)
Wordclouds
• Wordclouds are graphical representations of the lexical texture of a text, based on frequencies.
• They are visual versions of frequency lists.
• ... except that they do not show the actual frequency counts.
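A wordcloud is drawn from a plain frequency list, so it can help to see what that list looks like. The following is a minimal sketch in Python (not the R package ‘wordcloud’ named above); the tokenizer and the sample string are assumptions made for illustration and are not corpus data.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase the text and keep runs of letters,
    # allowing internal apostrophes (e.g. "alice's").
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def frequency_list(text):
    # Word types ranked by descending token frequency.
    return Counter(tokenize(text)).most_common()

# Made-up sample, for illustration only.
sample = "said the cat and said the king and said alice"
freqs = frequency_list(sample)
```

A wordcloud would scale each word by the count in `freqs`; the list itself keeps the exact numbers that the picture leaves out.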
N-grams
• An N-gram is a contiguous string of N words; N-gram analysis focuses on those strings that recur frequently in a data set (such as a corpus or a text).
• N-grams are specified in accordance with the number of words in the string in question (N = number).
• Monogram (1-gram) = one word, bigram (2-gram) = two words, trigram (3-gram) = three words, four-gram (4-gram) = four words, five-gram (5-gram) = five words etc.
• Examples = 'damn you' (2-gram), 'what the hell' (3-gram), 'pick up the phone' (4-gram)
(e.g. Stubbs 2009)
N-grams
• N-gram retrieval is a text mining technique used in the identification of N-grams in a data set, text, or corpus.
• How it works (in a nutshell): ask a computer to find strings of N words, and it returns a list of N-grams ranked in terms of frequency.
• An example: 4-grams in all of Shakespeare's plays
• Find all instances of word + word + word + word combinations in the collective body of Shakespeare's plays.
• Calculate frequency of word + word + word + word combinations in the collective body of Shakespeare's plays.
• List the word + word + word + word combinations in terms of frequency in the collective body of Shakespeare's plays.
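The three steps above (find, count, rank) can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure of the R package ‘tau’ or of AntConc: the tokenizer is an assumption, sentence boundaries are ignored, and the sample string is made up.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, letters with internal apostrophes.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def ngrams(tokens, n):
    # Step 1: slide a window of n words over the token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ranked_ngrams(text, n):
    # Steps 2 and 3: count each n-word string and rank by frequency.
    return Counter(ngrams(tokenize(text), n)).most_common()

# Made-up sample, for illustration only.
sample = "said the march hare said the mock turtle said the march hare"
top_4grams = ranked_ngrams(sample, 4)
top_2grams = ranked_ngrams(sample, 2)
```

Changing the second argument of `ranked_ngrams` gives the list for any other N.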
N-grams
• N-gram analysis can be used for a wide range of things:
  • Identification of "aboutness" of a text or discourse.
  • Phraseological analysis.
  • Probabilistic language modeling.
  • Analysis of aspects of style, genre and register.
  • Identification of anonymous writers (e.g. authorship attribution).
  • Identification of constructions and their functionality in a text or discourse.
  • Etc.
Network analysis
• Network analysis sets up data points as nodes in a network and calculates the strengths of association between them.
• As a text mining method, it represents word types (as opposed to tokens) in a text as nodes and, based on N-gram relations, sets up relations between the nodes, or words.
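As a rough illustration of this idea, the sketch below builds a directed network in which word types are nodes and each 2-gram is an edge. It assumes that the strength of association is simply the raw 2-gram frequency, which is only one possible choice, and it uses plain Python dictionaries rather than the R package ‘igraph’. The function name, the `min_freq` parameter, and the sample tokens are hypothetical.

```python
from collections import Counter, defaultdict

def bigram_network(tokens, min_freq=1):
    # Nodes are word types; a directed edge w1 -> w2 is weighted by the
    # frequency of the 2-gram "w1 w2" (assumed measure of association).
    counts = Counter(zip(tokens, tokens[1:]))
    graph = defaultdict(dict)
    for (w1, w2), freq in counts.items():
        if freq >= min_freq:
            graph[w1][w2] = freq
    return dict(graph)

# Made-up sample, for illustration only.
tokens = "said the king and said the cat and said alice".split()
network = bigram_network(tokens)
```

Here `network["said"]` lists every word that follows 'said' together with the edge weight. Words that never occur as the first element of a 2-gram appear only as targets.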
Analysis
Wordclouds: a first look
Wordclouds
[Figures: Wordcloud AW; Wordcloud HF]
Visually attractive
Informative to some extent
No details (e.g. frequency)
N-grams in AW and HF
N-grams in AW
[Table: 2-grams, 3-grams, 4-grams]
speech / dialog
definite NP
'said'
[DIALOG said NPdef/unique]/[TOPICALIZATION OF DIALOG]
N-grams in HF
[Table: 2-grams, 3-grams, 4-grams, 5-grams]
productive
less productive
Two different constructions:
• [it BEn't no X]
• [there BEn't no X]
Event-relating constructions used to temporally structure events in the narrative:
• [X and then Y]/[EVENT1 FOLLOWED BY EVENT2]
• [by and by X]/[EVENT HAPPENING AFTER SOME TIME]
N-gram analysis: pros and cons 1
Pro:
• Simple N-gram analysis can help us identify and address constructions and their functionality in one text or discourse, as it identifies frequent combinations of words in the text or discourse in question.
Con: Problem #1:
• What simple N-gram analysis does not tell us is whether those frequent combinations of words can also be found in other texts.
• Our N-gram analyses of AW and HF do not tell us whether the findings are just general patterns in English or whether they actually characterize the texts.
Solution to problem #1:
• To obtain a list of N-grams that really delineate a given text (so that we can identify what constructions are characteristically associated with the text), a comparative analysis can be useful.
N-grams: AW vs. HF
• Note that here we focus on 2-grams
• Procedures
1. 2-gram analysis of AW and HF
2. Normalization of frequencies for AW and HF: per 10,000 words
3. Comparison between 2-grams of AW and those of HF with Fisher’s exact test
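Steps 2 and 3 can be sketched as follows. This is a hedged illustration, not the authors' R code: it assumes that normalization is used only for descriptive comparison, while Fisher's exact test is run on the raw counts arranged in a 2x2 table (target 2-gram vs. all other 2-grams, in each text). The function names and all numbers are made up for the example.

```python
from math import comb

def per_n_words(freq, corpus_size, base=10_000):
    # Step 2: relative frequency per `base` words.
    return freq / corpus_size * base

def fisher_exact_two_sided(a, b, c, d):
    # Step 3: two-sided Fisher's exact test on the 2x2 table
    #   [[a, b],   a = target 2-gram in text 1, b = all other 2-grams in text 1
    #    [c, d]]   c = target 2-gram in text 2, d = all other 2-grams in text 2
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x target tokens in text 1,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Made-up counts, for illustration only (not the AW/HF figures).
text1_freq, text1_total = 8, 10
text2_freq, text2_total = 1, 6
p_value = fisher_exact_two_sided(
    text1_freq, text1_total - text1_freq,
    text2_freq, text2_total - text2_freq,
)
```

A small p-value suggests that the 2-gram is distributed differently across the two texts, which is what makes it a candidate for delineating one of them.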
[Figures: N-grams, AW vs. HF]
Other texts: a quick look at inaugural speeches
What about other texts?
Political texts: inaugural speeches by US Presidents
• Procedures
1. 2-gram analysis of 4 inaugural speeches by George Bush (GB), Bill Clinton (BC), George W. Bush (GWB), and Barack Obama (BO).
2. Normalization: per 1,000 words
3. Fisher’s exact test
Results
[Figures]
N-gram analysis: pros and cons 2
• Pros
  • N-grams can help us identify and address constructions and their functionality in one text or discourse.
  • Comparative N-gram analysis can help us find N-grams that delineate texts or discourses.
  • So far, we have found 2-grams that delineate AW and HF, and also some political speeches.
• Cons: Problem #2:
  • Shorter N-grams are embedded in longer N-grams (2-grams can be found inside some of the 3-grams and 4-grams).
  • Consequently, our N-gram lists contain some redundancy.
  • Our comparative N-gram analyses focused on 2-grams, but texts contain longer strings of words (3-grams, 4-grams, etc.) and also shorter N-grams (1-grams).
  • From the perspective of construction grammar, we should also talk about those longer/shorter N-grams, like we did in our isolated N-gram analyses.
N-gram analysis: pros and cons 2
• Solution to problem #2
• We can apply the methods we used for 2-grams to other N-grams, but is there a simpler way to find both short and long N-grams?
• That is, can we find 1-grams, 2-grams, 3-grams, 4-grams, etc. all at one time?
• Is there a descriptively more efficient way to analyze frequently co-occurring combinations of words (constructions)?
• Our suggestion here is to use network analysis based on 2-grams.
• Longer N-grams emerge in the network as paths through chained 2-grams, so a single network covers N-grams of several lengths.
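To make the last point concrete, the sketch below walks along the edges of a 2-gram network and collects each path as a candidate for a longer N-gram. It is an illustration under stated assumptions: only 2-grams at or above a hypothetical `min_freq` threshold become edges, paths are treated as candidates that would still need checking against the text, and the sample tokens are made up.

```python
from collections import Counter

def frequent_bigram_graph(tokens, min_freq):
    # Keep only 2-grams that occur at least min_freq times as edges.
    counts = Counter(zip(tokens, tokens[1:]))
    graph = {}
    for (w1, w2), freq in counts.items():
        if freq >= min_freq:
            graph.setdefault(w1, []).append(w2)
    return graph

def paths_from(graph, start, max_len):
    # Depth-first walk along edges; every path of two or more words
    # is a candidate N-gram.
    found = []

    def walk(path):
        if len(path) >= 2:
            found.append(tuple(path))
        if len(path) == max_len:
            return
        for nxt in graph.get(path[-1], []):
            # Skip words already on the path to avoid cycles (this also
            # excludes N-grams with repeated words, such as 'by and by').
            if nxt not in path:
                walk(path + [nxt])

    walk([start])
    return found

# Made-up sample, for illustration only.
tokens = ("said the march hare and said the mock turtle and "
          "said the march hare and said alice").split()
graph = frequent_bigram_graph(tokens, min_freq=2)
candidates = paths_from(graph, "said", max_len=4)
```

Starting from 'said', the walk yields a 2-gram, a 3-gram, and a 4-gram from the same set of edges, which is the sense in which longer strings emerge from a network built on 2-grams only.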
Network analysis of AW and HF
Network analysis of AW
[Figure: network, top 96 2-grams]
A closer look at some N-grams:
• 'said the march hare' (4-gram)
• 'said the mock turtle' (4-gram)
• 'said the king' (3-gram)
• 'said the caterpillar' (3-gram)
• 'said the cat' (3-gram)
• 'said alice' (2-gram)
[the N]/[DEFINITE NOMINAL REFERENCE]
[DIALOG said NPdef/unique]/[TOPICALIZATION OF DIALOG]
Network analysis of HF
[Figure: network, top 99 2-grams]
Negation constructions
Including double negation
Two interesting 2-grams of 'i'
• 'i reckon'
• 'says i'
• 'i says'
Three N-grams of 'and':
• 'and then'
• 'by and by'
• 'and so'
[X and so Y]/[EVENT1 AMOUNTS TO EVENT2]
Network analysis
• At the end of the day, short and long N-grams can be found automatically.
• A list of N-grams can be used to compute p-values with Fisher’s exact test, for instance.
• Then, what’s good about using network analysis?
  • Firstly, it allows us to capture several N-gram types simultaneously without too much redundancy.
  • Secondly, the real advantage of network analysis is not just its visual effect: it also tells us a lot about the internal functionality of the network.
  • For instance, it is possible to compute the centrality of nodes in a network. Centrality measures estimate how important each node is to the structure of the network.
  • A simple N-gram analysis does not provide answers to these issues.
Centrality
• Betweenness centrality: Dehmer & Basak (2012: 70-1)
... the betweenness centrality is based on the shortest path between nodes.
... The betweenness centrality focuses on the number of visits through the shortest paths. If a walker moves from one node to another node via the shortest path, then the nodes with a large number of visits by the walker may have high centrality.
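The definition quoted above can be turned into a short computation. The sketch below uses Brandes' algorithm, a standard way to count how many shortest paths pass through each node; it is written in plain Python for a directed, unweighted network and returns unnormalized scores. It is not a claim about how the R package ‘igraph’ was called for this study, and the small word network is made up.

```python
from collections import deque

def betweenness_centrality(graph):
    # graph: dict mapping each node to an iterable of successor nodes
    # (directed, unweighted). Returns unnormalized betweenness scores.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    centrality = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search that counts shortest paths from source.
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths
        dist = dict.fromkeys(nodes, -1)   # -1 marks "not reached yet"
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    return centrality

# Made-up 2-gram network, for illustration only.
word_graph = {"and": ["said"], "said": ["the", "alice"], "the": ["king", "cat"]}
scores = betweenness_centrality(word_graph)
```

In this toy network 'said' and 'the' get the highest scores because every shortest path from 'and' to a character name has to pass through them.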
Centrality
• Computing betweenness centrality: Top 15
Concluding remarks
• Can N-grams be used as means of identifying constructions?
  • Yes, partially.
• Is network analysis useful in the identification of constructions?
  • Yes, partially.
• What about functionality?
  • Neither N-grams nor N-gram based networks tell us much about functionality, as they show us purely formal relations.
  • However, they guide us in terms of connections between words that are salient in a given text and may be indicative of constructions as functional units.
  • We can then look at the discursive behavior of such N-grams and extrapolate constructions and their functionality in the text or discourse (and, depending on the corpus, in general).
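Looking at the discursive behavior of an N-gram usually means reading it in context, for example in a concordance. The following is a minimal keyword-in-context sketch in Python; it is not AntConc's implementation, and the function name, the `width` parameter, and the sample tokens are assumptions made for illustration.

```python
def concordance(tokens, ngram, width=4):
    # Return (left context, match, right context) for every occurrence
    # of the N-gram in the token sequence.
    n = len(ngram)
    target = list(ngram)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(target), right))
    return lines

# Made-up sample, for illustration only.
tokens = "well i reckon it was so and by and by i reckon he left".split()
hits = concordance(tokens, ("i", "reckon"), width=3)
```

Each line in `hits` shows what precedes and follows the N-gram, which is the kind of evidence needed before assigning it a function.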
Bibliography
Baker, P. (2012). Acceptable bias? Using corpus linguistic methods with critical discourse analysis. Critical Discourse Studies, 9(3): 247-256.
Bergen, B. & K. Binsted (2004). The cognitive linguistics of scalar humor. In M. Achard & S. Kemmer (eds.). Language, Culture, and Mind. Stanford, CA: CSLI. 79-91.
Croft, W. A. (2001). Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford: Oxford University Press.
Culpeper, J. (2009). In G. Brône & J. Vandaele (eds.). Cognitive Poetics: Goals, Gains and Gaps. Berlin: Mouton de Gruyter.
Dehmer, M. & S. C. Basak (2012). Statistical and Machine Learning Approaches for Network Analysis. Chichester: Wiley-Blackwell.
Fillmore, C., P. Kay & M. C. O'Connor (1988). Regularity and idiomaticity in grammatical constructions: The case of let alone. Language, 64: 501-38.
Fowler, R. (1977). Linguistics and the Novel. London: Methuen.
Goldberg, A. (1995). Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Hart, C. (2013). Constructing contexts through grammar: Cognitive models and conceptualisation in British newspaper reports on political protests. In J. Flowerdew (ed.), Discourse in Context. London: Bloomsbury. 159-183.
Lambrecht, K. (1990). "What, me worry?": Mad Magazine sentences revisited. BLS, 16: 215-228.
Langacker, R. (1987). Foundations of Cognitive Grammar. Vol. 1: Theoretical Prerequisites. Stanford, CA: Stanford University Press.
Lipka, L. & H.-J. Schmid (1994). To begin with: Degrees of idiomaticity, textual functions and pragmatic exploitations of a fixed expression. ZAA, 42: 6-15.
Martínez, N. D. C. (2013). Illocutionary Constructions in English: Cognitive Motivation and Linguistic Realization. Bern: Peter Lang.
Miner, G., J. Elder, T. Hill, R. Nisbet, D. Delen & A. Fast (2012). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications. Elsevier Academic Press.
Semino, E. & M. Short (2004). Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing. London: Routledge.
Short, M. & G. Leech (2007). A Linguistic Introduction to English Fictional Prose (2nd ed.). Harlow: Pearson Longman.
Stockwell, P. (2002). Cognitive Poetics. London: Routledge.
Stubbs, M. (2009). Technology and phraseology. In U. Römer & R. Schulze (eds.), Exploring the Lexis-Grammar Interface. Amsterdam: John Benjamins. 15-31.