
Constructions in Wonderland: Exploring the functionality of constructions through N-grams

Kim Ebensgaard Jensen, Aalborg University

Yoshikata Shibuya, Kyoto University of Foreign Studies

Functionality of Language | Sprogets Funktionalitet, Aalborg University

10 December, 2014

Introduction

• Is there a way to (semi-)automatically identify constructions in texts, discourses, and corpora?

• Could N-gram analysis and N-gram based network analysis be ways to do that?

• How can (semi-)automatic identification of constructions help us learn about the functional contributions of constructions in discourse?

• To this end, we will analyze two literary classics and four political speeches.

Outline

• Theoretical framework

• Positioning of paper

• (Very) basic principles of construction grammar

• Functionality of constructions

• Method

• Data collection and methods

• Wordclouds

• N-grams

• Network analysis

• Analyses

• Wordclouds

• N-grams

• Networks


Theoretical grounding

Theoretical and methodological positioning of this paper

• Theoretical positioning

• Construction grammar (e.g. Goldberg 1995; Croft 2001)

• Cognitive poetics/stylistics (e.g. Stockwell 2002)

• Cognitive discourse analysis (e.g. Hart 2013)

• Methodological positioning

• Corpus stylistics (e.g. Semino & Short 2004)

• Corpus-aided discourse studies (e.g. Baker 2012)

• Text-mining (e.g. Miner et al. 2012)

Constructions

In construction grammar, a construction is a functional unit that pairs form and semantic and/or discourse-pragmatic function (Goldberg 1995, 2006; Croft 2001, 2005; Hilpert 2014).

Examples from English - [form]/[function] (cf. Langacker 1987):

• [S V IO DO]/[TRANSFER OF POSSESSION] (Goldberg 1995)

• [X BE so Y that Z]/[SCALAR CAUSATION] (Bergen & Binsted 2004)

• [you don't want me to V]/[THREATENING SPEECH ACT] (Martínez 2013)

• [to begin with]/[INTRODUCTION OF LIST OF ITEMS] (Lipka & Schmid 1994)

• [PROacc CLinf (or NP)]/[DISBELIEF TOWARDS PROPOSITION] (Lambrecht 1990)

• 'What, me worry?!', 'Him a doctor?!' etc.


• Constructions may be schematic, substantive (fixed), or something in-between (Fillmore et al. 1988).

• Constructions may be atomic/simple, complex, or something in-between and form a lexicon-syntax continuum (e.g. Goldberg 1995, Croft 2001).

• Language competence is an inventory of constructions (a.k.a. the construct-i-con) of varying degrees of abstraction, which are instantiated in language use (e.g. Goldberg 1995).

• In most contemporary incarnations of construction grammar, the construct-i-con is usage-based (e.g. Croft 2001).

• Constructions are subject to general human cognitive processes and principles, such that language is not a separate, autonomous cognitive faculty; thus, construction grammar is part of the overall endeavor of cognitive linguistics.

The discursive and stylistic functionality of constructions

• If constructions are functional units (pairings of form and meaning/function), then they logically must contribute to discourse as part of a speaker's linguistic repertoire. Here are two examples:

• Writers of fiction may use constructions in...

• descriptions of actions and happenings;

• characterizations (Culpeper 2009) and mind styles (Fowler 1977), by having characters use certain constructions in their dialog and narrative, or by using certain constructions in the descriptions of characters or of their actions;

• setting up the text-world and specifying temporal relations in the narrative;

• foregrounding, deviation, parallelism etc. (e.g. Short & Leech 2007).

• In political speeches, speakers may use constructions in...

• framing of issues and other ideologically based representations;

• organization of topics/issues;

• rhetorical strategies (including parallelisms etc.).


Method

Data collection and methods

• Text data

• Alice's Adventures in Wonderland by Lewis Carroll (1865), obtained via Project Gutenberg = AW

• Adventures of Huckleberry Finn by Mark Twain (1884), obtained via Project Gutenberg = HF

• Inaugural speeches by US Presidents, obtained via Bartleby

• Text mining and corpus-linguistic methods

• Wordclouds (R package ‘wordcloud’)

• N-grams (R package ‘tau’, AntConc)

• Network analysis (R package ‘igraph’)

• Concordances (AntConc)

Wordclouds

• Wordclouds are graphical representations of the lexical texture of a text, based on frequencies.

• They are visual versions of frequency lists.

• ... except they do not provide any information on frequencies.
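To make the link between wordclouds and frequency lists concrete, here is a minimal Python sketch of the kind of frequency list a wordcloud visualizes. The talk itself used the R package ‘wordcloud’; the tokenizer and the `tokenize` and `frequency_list` helpers below are illustrative assumptions, not the authors' code.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(text, top=None):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common(top)

# The opening words of AW serve as a tiny sample text.
sample = ("Alice was beginning to get very tired of sitting by her sister "
          "on the bank, and of having nothing to do")
for word, freq in frequency_list(sample, top=3):
    print(word, freq)
```

A wordcloud scales each word by the count in this list but hides the count itself, which is the limitation the last bullet points to.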

N-grams

• An N-gram is a contiguous string of N words; the N-grams of interest here are those that recur frequently in a data set (such as a corpus or a text).

• N-grams are specified in accordance with the number of words in the string in question (N = number)

• Monogram (1-gram) = one word, bigram (2-gram) = two words, trigram (3-gram) = three words, four-gram (4-gram) = four words, five-gram (5-gram) = five words etc.

• Examples = 'damn you' (2-gram), 'what the hell' (3-gram), 'pick up the phone' (4-gram)

(e.g. Stubbs 2009)


• N-gram retrieval is a text mining technique used in the identification of N-grams in a data set, text, or corpus.

• How it works (in a nutshell): ask a computer to find strings of N words, and it returns a list of N-grams ranked in terms of frequency.

• An example: 4-grams in all of Shakespeare's plays

• Find all instances of word + word + word + word combinations in the collective body of Shakespeare's plays.

• Calculate frequency of word + word + word + word combinations in the collective body of Shakespeare's plays.

• List the word + word + word + word combinations in terms of frequency in the collective body of Shakespeare's plays.
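The three steps above (find, count, rank) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the analyses in this talk were run with the R package ‘tau’ and AntConc, and the regex tokenizer and the function names `ngrams` and `ranked_ngrams` are hypothetical. Note that this tokenizer ignores punctuation, so N-grams may span sentence boundaries.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Step 1: collect every contiguous run of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ranked_ngrams(text, n, top=None):
    """Steps 2 and 3: count each N-gram, then rank by frequency."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(ngrams(tokens, n)).most_common(top)

sample = "To be, or not to be, that is the question: to be or not to be"
for gram, freq in ranked_ngrams(sample, 4, top=3):
    print(" ".join(gram), freq)
```

Changing the second argument of `ranked_ngrams` gives 2-grams, 3-grams, and so on from the same text.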


• N-gram analysis can be used for a wide range of things:

• Identification of "aboutness" of a text or discourse.

• Phraseological analysis.

• Probabilistic language modeling.

• Analysis of aspects of style, genre and register.

• Identification of various types of anonymous writers.

• Identification of constructions and their functionality in a text or discourse.

• Etc.

Network analysis

• Network analysis sets up data points as nodes in a network and calculates the strengths of association between them.

• As a text mining method, it represents word types (as opposed to tokens) in a text as nodes and, based on N-gram relations, sets up relations between the nodes, or words.
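As a rough illustration of how word types become nodes and 2-grams become relations, the Python sketch below builds a small network from a token list. It assumes directed edges weighted by raw 2-gram frequency; the talk used the R package ‘igraph’ and does not specify how association strength was calculated, so both the weighting and the `bigram_network` helper are assumptions.

```python
from collections import Counter, defaultdict

def bigram_network(tokens):
    """Map each word type to its successors, weighted by 2-gram frequency."""
    edges = Counter(zip(tokens, tokens[1:]))
    network = defaultdict(dict)
    for (first, second), freq in edges.items():
        network[first][second] = freq
    return dict(network)

# A made-up token sequence, for illustration only.
tokens = "said the king and said the queen and said alice".split()
net = bigram_network(tokens)
print(net["said"])  # successors of 'said' with their 2-gram frequencies
```

Because nodes are types rather than tokens, every occurrence of 'said' collapses into one node, and its outgoing edges summarize what tends to follow it.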


Analysis


Wordclouds: a first look

Wordclouds

[Figure: wordclouds of AW and HF]

• Visually attractive

• Informative to some extent

• No details (e.g. frequency)

N-grams in AW and HF

N-grams in AW

[Tables: 2-grams, 3-grams, and 4-grams in AW]

Annotations:

• speech / dialog

• definite NP

• 'said'

• [DIALOG said NPdef/unique]/[TOPICALIZATION OF DIALOG]

N-grams in HF

[Tables: 2-grams, 3-grams, 4-grams, and 5-grams in HF]

Annotations:

• productive

• less productive

• Two different constructions: [it BEn't no X] and [there BEn't no X]

• Event-relating constructions used to temporally structure events in the narrative:

• [X and then Y]/[EVENT1 FOLLOWED BY EVENT2]

• [by and by X]/[EVENT HAPPENING AFTER SOME TIME]

N-gram analysis: pros and cons 1

Pro:

• Simple N-gram analysis can help us identify and address constructions and their functionality in one text or discourse, as it identifies frequent combinations of words in the text or discourse in question.

Con: Problem #1:

• What simple N-gram analysis does not tell us is whether or not those frequent combinations of words can also be found in other texts.

• Our N-gram analyses of AW and HF do not tell us if the findings are actually just general patterns in English or if they are actually good delineations of the texts.

Solution to problem #1:

• To obtain a list of N-grams that really delineate a given text (so that we can identify what constructions are characteristically associated with the text), a comparative analysis can be useful.

N-grams: AW vs. HF

• Note that here we focus on 2-grams

• Procedures

1. 2-gram analysis of AW and HF

2. Normalization of frequencies for AW and HF: per 10,000 words

3. Comparison between 2-grams of AW and those of HF with Fisher’s exact test
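Steps 2 and 3 can be sketched as follows. This is a hedged illustration rather than the authors' R workflow: it assumes that normalization is used for reporting while Fisher's exact test is run on raw counts in a 2x2 table (the 2-gram vs. all other 2-grams, in text A vs. text B), and it uses the common two-sided definition that sums all tables no more probable than the observed one. The helper names and the counts in the example are hypothetical.

```python
from math import comb

def per_10k(freq, total_words):
    """Normalize a raw frequency to occurrences per 10,000 words."""
    return freq * 10_000 / total_words

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as probable as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

# Made-up counts: a 2-gram seen 30 times among 1,000 2-grams in text A
# and 10 times among 2,000 2-grams in text B.
print(per_10k(30, 1_000), per_10k(10, 2_000))
print(fisher_exact_two_sided(30, 1_000 - 30, 10, 2_000 - 10))
```

A small p-value for a 2-gram suggests that its frequency differs between the two texts by more than chance would explain, which is what makes it a candidate for delineating one of them.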

[Figures: 2-gram comparison of AW and HF]

Other texts: a quick look at inaugural speeches

What about other texts?

Political texts: inaugural speeches by US Presidents

• Procedures

1. 2-gram analysis of 4 inaugural speeches by George Bush (GB), Bill Clinton (BC), George W. Bush (GWB), and Barack Obama (BO).

2. Normalization: per 1,000 words

3. Fisher’s exact test

Results

[Figures: 2-gram comparison of the inaugural speeches]

N-gram analysis: pros and cons 2

• Pros

• N-grams can help us identify and address constructions and their functionality in one text or discourse.

• Comparative N-gram analysis can help us find N-grams that delineate texts or discourses.

• So far, we have found 2-grams that delineate AW and HF, and also some political speeches.

• Cons: Problem #2:

• Shorter N-grams are embedded in longer N-grams (2-grams can be found inside some of the 3-grams and 4-grams).

• Consequently, our N-gram lists contain some redundancy.

• Our comparative N-gram analyses focused on 2-grams, but texts contain longer strings of words (3-grams, 4-grams, etc.) and also shorter N-grams (1-grams).

• From the perspective of construction grammar, we should also talk about those longer/shorter N-grams, like we did in our isolated N-gram analyses.


• Solution to problem #2

• We can apply the methods we used for 2-grams to other N-grams, but is there a simpler way to find both short and long N-grams?

• That is, can we find 1-grams, 2-grams, 3-grams, 4-grams, etc. all at one time?

• Is there a descriptively more efficient way to analyze frequently co-occurring combinations of words (constructions)?

• Our suggestion here is to use network analysis based on 2-grams.

• Other N-grams emerge in the network as chains of linked 2-grams.
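One way to picture how longer N-grams surface in a 2-gram network is to follow chains of frequent 2-grams from a starting word. The Python sketch below does this on a made-up token sequence; the frequency threshold, the `max_len` cut-off, and the helper names are assumptions for illustration. A chain found this way is only a candidate N-gram and would still need to be checked against the text, for example in a concordance.

```python
from collections import Counter

def frequent_bigrams(tokens, min_freq):
    """Keep only the 2-grams that occur at least min_freq times."""
    counts = Counter(zip(tokens, tokens[1:]))
    return {pair for pair, freq in counts.items() if freq >= min_freq}

def paths_from(start, bigrams, max_len):
    """All chains of linked 2-grams beginning at `start` (candidate N-grams)."""
    successors = {}
    for first, second in bigrams:
        successors.setdefault(first, []).append(second)
    results, stack = [], [(start,)]
    while stack:
        path = stack.pop()
        if len(path) > 1:
            results.append(path)
        if len(path) < max_len:
            for nxt in successors.get(path[-1], []):
                if nxt not in path:  # do not walk in circles
                    stack.append(path + (nxt,))
    return sorted(results)

# A made-up token sequence, for illustration only.
tokens = ("said the mock turtle and said the king and said the mock turtle "
          "and said the king and said alice and said alice").split()
for path in paths_from("said", frequent_bigrams(tokens, 2), max_len=4):
    print(" ".join(path))
```

The 2-gram 'said alice', the 3-gram 'said the king', and the 4-gram 'said the mock turtle' all fall out of the same network, without separate 3-gram and 4-gram lists.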


Network analysis of AW and HF

Network analysis of AW

[Figure: network of the top 96 2-grams]

A closer look at some N-grams:

• 'said the march hare' (4-gram)

• 'said the mock turtle' (4-gram)

• 'said the king' (3-gram)

• 'said the caterpillar' (3-gram)

• 'said the cat' (3-gram)

• 'said alice' (2-gram)

[the N]/[DEFINITE NOMINAL REFERENCE]

[DIALOG said NPdef/unique]/[TOPICALIZATION OF DIALOG]

Network analysis of HF

[Figure: network of the top 99 2-grams]

Negation constructions

Including double negation

Two interesting 2-grams of 'i'

• 'i reckon'

• 'says i'

• 'i says'

Three N-grams of 'and':

• 'and then'

• 'by and by'

• 'and so'

[X and so Y]/[EVENT1 AMOUNTS TO EVENT2]

Network analysis

• At the end of the day, short and long N-grams can be found automatically.

• A list of N-grams can be used to compute p-values with Fisher’s exact test, for instance.

• Then, what’s good about using network analysis?

• Firstly, it allows us to capture several N-gram types simultaneously without too much redundancy.

• Secondly, the real advantage of network analysis is not just its visual appeal: it also tells us a lot about the internal functionality of the network.

• For instance, it is possible to compute the centrality of nodes in a network. Centrality measures estimate how important individual nodes are to the functioning of the network as a whole.

• A simple N-gram analysis does not provide answers to these issues.

Centrality

• Betweenness centrality: Dehmer & Basak (2012: 70-1)

... the betweenness centrality is based on the shortest path between nodes.

... The betweenness centrality focuses on the number of visits through the shortest paths. If a walker moves from one node to another node via the shortest path, then the nodes with a large number of visits by the walker may have high centrality.
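The idea in the quotation can be sketched with Brandes' algorithm, which counts, for every node, the share of shortest paths between other nodes that pass through it. The Python version below is an illustration only: it assumes a directed, unweighted network given as a dictionary of successor sets and returns unnormalized scores. The talk used the R package ‘igraph’, and whether its networks were directed or weighted is not stated, so these choices and the toy network are assumptions.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness; graph maps each node to a set of successors."""
    nodes = set(graph) | {w for nbrs in graph.values() for w in nbrs}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# A toy 2-gram network, for illustration only.
toy = {"and": {"said"}, "said": {"the", "alice"}, "the": {"king", "queen"}}
for word, value in sorted(betweenness(toy).items(), key=lambda kv: -kv[1]):
    print(word, value)
```

In this toy network 'said' scores highest because every shortest path from 'and' to the other words runs through it, which matches the walker image in the quotation.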


• Computing betweenness centrality: Top 15

Concluding remarks

• Can N-grams be used as a means of identifying constructions?

• Yes, partially.

• Is network analysis useful in the identification of constructions?

• Yes, partially.

• What about functionality?

• Neither N-grams nor N-gram based networks tell us much about functionality, as they show us purely formal relations.

• However, they guide us in terms of connections between words that are salient in a given text and may be indicative of constructions as functional units.

• We can then look at the discursive behavior of such N-grams and extrapolate constructions and their functionality in the text or discourse (and, depending on the corpus, in general).

Bibliography

Baker, P. (2012). Acceptable bias? Using corpus linguistic methods with critical discourse analysis. Discourse Studies, 9(3): 247-256.

Bergen, B. & K. Binsted (2004). The cognitive linguistics of scalar humor. In M. Achard & S. Kemmer (eds.), Language, Culture, and Mind. Stanford, CA: CSLI. 79-91.

Croft, W. A. (2001). Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford: Oxford University Press.

Culpeper, J. (2009). In G. Brône & J. Vandaele (eds.), Cognitive Poetics: Goals, Gains and Gaps. Berlin: Mouton de Gruyter.

Dehmer, M. & S. C. Basak (2012). Statistical and Machine Learning Approaches for Network Analysis. Chichester: Wiley-Blackwell.

Fillmore, C., P. Kay & M. C. O'Connor (1988). Regularity and idiomaticity in grammatical constructions: The case of let alone. Language, 64: 501–38.

Fowler, R. (1977). Linguistics and the Novel. London: Methuen.

Goldberg, A. (1995). Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.

Hart, C. (2013). Constructing contexts through grammar: Cognitive models and conceptualisation in British newspaper reports on political protests. In J. Flowerdew (ed.), Discourse in Context. London: Bloomsbury. 159-183.

Lambrecht, K. (1990). "What, me worry?": Mad Magazine sentences revisited. BLS, 16: 215-228.

Langacker, R. (1987). Foundations of Cognitive Grammar. Vol. 1: Theoretical Prerequisites. Stanford, CA: Stanford University Press.

Lipka, L. & H.-J. Schmid (1994). To begin with: Degrees of idiomaticity, textual functions and pragmatic exploitations of a fixed expression. ZAA, 42: 6-15.

Martínez, N. D. C. (2013). Illocutionary Constructions in English: Cognitive Motivation and Linguistic Realization. Bern: Peter Lang.

Miner, G., J. Elder, T. Hill, R. Nisbet, D. Delen & A. Fast (2012). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications. Elsevier Academic Press.

Semino, E. & M. Short (2004). Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing. London: Routledge.

Short, M. & G. Leech (2007). A Linguistic Introduction to English Fictional Prose (2nd ed.). Harlow: Pearson Longman.

Stockwell, P. (2002). Cognitive Poetics. London: Routledge.

Stubbs, M. (2009). Technology and phraseology. In U. Römer & R. Schulze (eds.), Exploring the Lexis-Grammar Interface. Amsterdam: John Benjamins. 15-31.