Key Cluster Patterns in Shakespeare
2009 Aston Symposium, 22 May 2009
Mike Scott
Abstract
Key words (KWs) in Shakespeare plays have been shown to belong to certain category-types, such as theme-related KWs and character-related KWs.
Other KWs, generally the more interesting ones, seem to be pointers to other patterns indicative of quite specific features of the language, or of the status of characters or of individual sub-themes.
It may be that there is a tension between global KWs and much more localised, "bursty" ones in this regard.
The presentation turns attention now to key word clusters, that is n-grams which are shown to occur distinctively in each individual play, or in the speeches of an individual character. The diverse types of patterns are what will be explored here.
Are n-grams a mere coincidence of relatively frequent words co-occurring frequently so that they are but sound and fury signifying nothing?
Alas poor Yorick!
Double, double toil and trouble
And thereby hangs a tale
Friends, Romans, countrymen, lend me your ears
A blinking idiot
Beggar'd all description
Aims
• take previous key word (KW) analysis of Shakespeare plays up one level
• by examining KW clusters
… a proviso
no claim to illuminate understanding of the plays,
the objective being to understand more about keyness and key words
Clusters
sequences of consecutive words repeatedly found in corpora
Biber's "bundles"
n-grams
no guarantee they are "phrases"
In WordSmith, n is between 2 and 8
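The idea of a cluster as a repeated run of consecutive words can be illustrated with a minimal sketch. It is not WordSmith's implementation: the function name `clusters`, the sample line and the minimum frequency of 2 are all assumptions made for illustration, and only the 2 to 8 range for n comes from the slide.

```python
from collections import Counter

def clusters(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    if not 2 <= n <= 8:
        raise ValueError("n is expected to be between 2 and 8")
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

# Invented sample line, upper-cased like the wordlists in the slides.
tokens = "give me your hand and give me your word".upper().split()

two = clusters(tokens, 2)
three = clusters(tokens, 3)

# A cluster has to be found repeatedly, so keep only runs seen at least twice
# (the threshold of 2 is an assumption).
repeated = {cluster: freq for cluster, freq in two.items() if freq >= 2}
print(repeated)
```

Nothing here checks whether a run is a "phrase" in any linguistic sense, which is the point of the proviso above.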
Why bother?
(increasing awareness that words don't act alone…
and anyway some inconsistencies in the single word as a unit, e.g. "behind" v. "in front of"; "France" v. "Saudi Arabia" v. "United Arab Emirates"
…but hang about in gangs)
Keyness
A word is said to be "key" if
a) it occurs in the text at least as many times as the user has specified as a Minimum Frequency
b) its frequency in the text when compared with its frequency in a reference corpus is such that the statistical probability as computed by an appropriate procedure is smaller than or equal to a p value specified by the user.
(WordSmith manual)
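The two conditions can be sketched in code. The definition leaves the "appropriate procedure" open, so this sketch assumes a log-likelihood (G2) test on a 2×2 table with one degree of freedom. The function names and the default values for `min_freq` and `max_p` are hypothetical, not WordSmith settings.

```python
import math

def log_likelihood(freq_text, freq_ref, size_text, size_ref):
    """G2 for a 2x2 table: the item v. all other tokens, text v. reference."""
    other_text = size_text - freq_text
    other_ref = size_ref - freq_ref
    total = size_text + size_ref
    g2 = 0.0
    for observed, row_total, col_total in (
        (freq_text, size_text, freq_text + freq_ref),
        (freq_ref, size_ref, freq_text + freq_ref),
        (other_text, size_text, other_text + other_ref),
        (other_ref, size_ref, other_text + other_ref),
    ):
        if observed > 0:
            expected = row_total * col_total / total
            g2 += observed * math.log(observed / expected)
    # Guard against a tiny negative value caused by floating-point rounding.
    return max(2.0 * g2, 0.0)

def is_key(freq_text, size_text, freq_ref, size_ref, min_freq=3, max_p=1e-6):
    """Apply conditions (a) and (b); the defaults are assumed values."""
    if freq_text < min_freq:          # condition (a): minimum frequency
        return False
    g2 = log_likelihood(freq_text, freq_ref, size_text, size_ref)
    # Upper-tail probability of chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(g2 / 2.0))
    return p <= max_p                 # condition (b): p value threshold

print(is_key(50, 1000, 50, 100000))
```

The test is two-sided, so an item that is unusually rare in the text could also pass condition (b) if it still meets the minimum frequency.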
KW Clusters
re-interpreting "word" to include "cluster"
so the questions are
1. How much overlap is there between KWs and KW clusters?
2. What (if anything) do key clusters show that KWs don't?
Procedures
with the 1916 OUP Shakespeare corpus at my site
build one overall "index" which knows the positions and neighbours of each word in all 37 plays
compute 2-word clusters using the index
build one individual index for each of the plays
compute 2-word clusters for each play using its index
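A rough sketch of the indexing step follows. It assumes an index is a mapping from each word to the places where it occurs, so that a word's right-hand neighbour can be looked up. The names `build_index` and `two_word_clusters` and the two toy "plays" are invented for illustration. WordSmith's own index format is not described in the slides.

```python
from collections import Counter, defaultdict

def build_index(plays):
    """Map each word to a list of (play name, position) pairs."""
    index = defaultdict(list)
    for name, tokens in plays.items():
        for position, word in enumerate(tokens):
            index[word].append((name, position))
    return index

def two_word_clusters(index, plays):
    """Count 2-word clusters by looking up each word's right-hand neighbour."""
    counts = Counter()
    for word, hits in index.items():
        for name, position in hits:
            tokens = plays[name]
            if position + 1 < len(tokens):   # never run past the end of a play
                counts[word + " " + tokens[position + 1]] += 1
    return counts

# Two invented mini-texts standing in for the 37 plays.
plays = {
    "PLAY_A": "my lord I am here my lord".upper().split(),
    "PLAY_B": "I am glad to see you my lord".upper().split(),
}

# One overall index, plus one individual index per play.
overall = two_word_clusters(build_index(plays), plays)
per_play = {
    name: two_word_clusters(build_index({name: tokens}), {name: tokens})
    for name, tokens in plays.items()
}
print(overall.most_common(2))
```

Because positions are recorded per play, no cluster is counted across the boundary between one play and the next.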
Procedures (cont.)
repeat previous steps for all lengths of cluster 2 to 5
result = 38 indexes
37 × 4 = 148 individual play cluster wordlists
4 cluster wordlists for the set of 37 plays
(152 cluster wordlists in all)
single-word list (all the plays)
N Word Freq. % Texts %
1 THE 26,831 3.29 37 100.00
2 AND 24,110 2.95 37 100.00
3 I 20,536 2.51 37 100.00
4 TO 19,155 2.35 37 100.00
5 OF 15,997 1.96 37 100.00
6 A 13,980 1.71 37 100.00
7 YOU 13,855 1.70 37 100.00
8 MY 12,283 1.50 37 100.00
9 THAT 10,760 1.32 37 100.00
10 IN 10,569 1.29 37 100.00
pure grammar
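The columns of such a wordlist can be approximated with a short sketch. It reads the first % as a share of all running words and the second % as a share of the texts, which is consistent with 36 of 37 plays giving 97.30. That reading, the function name `wordlist` and the toy data are assumptions.

```python
from collections import Counter

def wordlist(plays):
    """Return rows of (word, freq, % of tokens, texts, % of texts)."""
    freq = Counter()
    texts = Counter()
    for tokens in plays.values():
        freq.update(tokens)
        texts.update(set(tokens))     # count each play once per word
    total = sum(freq.values())
    return [
        (
            word,
            count,
            round(100 * count / total, 2),
            texts[word],
            round(100 * texts[word] / len(plays), 2),
        )
        for word, count in freq.most_common()
    ]

# Invented mini-texts, not taken from the corpus.
plays = {
    "A": "THE KING AND THE QUEEN".split(),
    "B": "THE FOOL".split(),
}
rows = wordlist(plays)
for n, row in enumerate(rows, start=1):
    print(n, *row)
```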
2-word clusters
N Word Freq. % Texts %
1 I AM 1,858 0.23 37 100.00
2 MY LORD 1,685 0.21 36 97.30
3 I HAVE 1,628 0.20 37 100.00
4 I WILL 1,582 0.19 37 100.00
5 IN THE 1,582 0.19 37 100.00
6 TO THE 1,518 0.19 37 100.00
7 OF THE 1,376 0.17 37 100.00
8 IT IS 1,079 0.13 37 100.00
9 TO BE 971 0.12 37 100.00
10 THAT I 914 0.11 37 100.00
I + AUX
incomplete prepositional phrases
3-word clusters
N Word Freq. % Texts %
1 I PRAY YOU 250 0.03 34 91.89
2 I WILL NOT 214 0.03 36 97.30
3 I KNOW NOT 162 0.02 36 97.30
4 I DO NOT 160 0.02 33 89.19
5 I AM A 141 0.02 35 94.59
6 I AM NOT 139 0.02 34 91.89
7 MY GOOD LORD 132 0.02 29 78.38
8 AND I WILL 129 0.02 34 91.89
9 I WOULD NOT 126 0.02 34 91.89
10 THIS IS THE 122 0.01 36 97.30
negatives
4-word clusters
N Word Freq. Texts %
1 WITH ALL MY HEART 47 21 56.76
2 I KNOW NOT WHAT 39 20 54.05
3 GIVE ME YOUR HAND 34 19 51.35
4 I DO BESEECH YOU 33 17 45.95
5 GIVE ME THY HAND 31 22 59.46
6 I DO NOT KNOW 29 17 45.95
7 I WOULD NOT HAVE 26 18 48.65
8 AY MY GOOD LORD 25 13 35.14
9 WHAT IS THE MATTER 25 13 35.14
10 GIVE ME LEAVE TO 24 18 48.65
requesting etc., social interactions
5-word clusters
N Word Freq. Texts %
1 I AM GLAD TO SEE 16 9 24.32
2 I THANK YOU FOR YOUR 12 11 29.73
3 FOR MINE OWN PART I 10 8 21.62
4 I HAD RATHER BE A 9 8 21.62
5 WITH ALL MY HEART AND 9 8 21.62
6 AM GLAD TO SEE YOU 8 5 13.51
7 AS I AM A GENTLEMAN 8 6 16.22
8 I PRAY YOU TELL ME 8 7 18.92
9 KNOW NOT WHAT TO SAY 8 8 21.62
10 SO I TAKE MY LEAVE 8 7 18.92
social formulae
Procedures (cont.)
compare the 2-cluster wordlists of each play with the 2-cluster wordlist of all the plays
repeat for 3-, 4- and 5-word clusters
37 × 4 = 148 key cluster lists
just a title
N Concordance
1 night. Have you not spoken 'gainst the Duke of Cornwall? He's coming hither,
2 father, and given him notice that the Duke of Cornwall and Regan his duchess
3 and foolish. Holds it true, sir, that the Duke of Cornwall was so slain? Most
4 Gloucester, I'd speak with the Duke of Cornwall and his wife. Well, my
repetition!
When we are born, we cry that we are come
To this great stage of fools. This' a good block!
It were a delicate stratagem to shoe
A troop of horse with felt; I'll put it in proof,
And when I have stol'n upon these sons-in-law,
Then, kill, kill, kill, kill, kill, kill! (Lear)
more repetition!
And my poor fool is hang'd! No, no, no life!
Why should a dog, a horse, a rat, have life,
And thou no breath at all? Thou'lt come no more,
Never, never, never, never, never!
Pray you, undo this button: thank you, sir.
Do you see this? Look on her, look, her lips,
Look there, look there!
</LEAR><STAGE DIR><Dies.></STAGE DIR>
speech-specific, rhythmic
Have more than thou showest,
Speak less than thou knowest,
Lend less than thou owest,
Ride more than thou goest,
Learn more than thou trowest,
Set less than thou throwest;
Leave thy drink and thy whore,
And keep in-a-door,
And thou shalt have more
Than two tens to a score
RQ 1 (How much overlap is there between KWs and KW clusters?)
Procedure
For selected plays (Hamlet, Romeo, Henry IV part 1, As You Like It):
1. Save the column of single word KWs as a plain text file
2. Save the column of 2-cluster KWs as a separate file too
3. Save the columns of 3-, 4- and 5-cluster KWs likewise
4. Make wordlists of these "texts"
5. Compute "detailed consistency" of these wordlists
6. Use "Set" function to classify items which appear in various listings
7. Identify the percentage of words which appear in the KW-cluster lists but not in the single word KW listings & vice-versa
8. Identify items which appear in numerous listings.
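Steps 4 to 8 amount to set comparisons over the word types in each listing, which can be sketched outside WordSmith. The function `compare_listings` and the sample key words below are invented. The sketch also assumes that the percentage is taken over all distinct word types found in any listing, and that "detailed consistency" counts the listings each type appears in.

```python
def compare_listings(single_kws, cluster_kws_by_n):
    """Compare single-word KWs with the word types inside KW clusters."""
    # Steps 1-4: treat each cluster listing as a "text" and take its word types.
    cluster_words = {
        n: {word for cluster in listing for word in cluster.split()}
        for n, listing in cluster_kws_by_n.items()
    }
    all_cluster_words = set().union(*cluster_words.values())

    # Steps 6-7: set differences in both directions, plus the percentage.
    only_in_clusters = all_cluster_words - single_kws
    only_in_single = single_kws - all_cluster_words
    all_types = single_kws | all_cluster_words
    share = 100 * len(only_in_clusters) / len(all_types)

    # Steps 5 and 8: in how many listings does each type appear?
    listings = {1: single_kws, **cluster_words}
    consistency = {
        word: sum(word in types for types in listings.values())
        for word in all_types
    }
    return only_in_clusters, only_in_single, share, consistency

# Invented data for illustration only.
single = {"ROMEO", "JULIET", "THOU", "NURSE"}
clusters = {2: ["THOU ART", "THE NURSE"], 3: ["O ROMEO ROMEO", "THOU ART A"]}

extra, missing, share, consistency = compare_listings(single, clusters)
print(sorted(extra), sorted(missing), share)
```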
Romeo and Juliet
43% of the KWs (207-117 = 90) come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.
2s not found in the single KW list include high frequency grammar items (THE, MY, AT, TO etc.)
2s which are not found elsewhere in any cluster include SHALL
3s not found elsewhere include TELL, WHERE
4s not found elsewhere include COMMEND
types in KW list but not in KW clusters (A-C)
AH, ALACK, AN, APOTHECARY, BED, BENVOLIO, CAPULET, CLOUDS, CORDS, CORSE
Common to 4 or 5 KW listings
HER, O, SILVER, A, ART, BOTH, JULE, LADY, PLAGUE, SOUND, THOU, THY, WITH, YOUR
As You Like It
48% of the KWs (190-98 = 92) come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.
2s not found in the single KW list include high freq. grammar items (THE, OF, FOR, AND)
2s which are not found elsewhere include HIM, WHO
3s not found elsewhere include AT, WOULD
types in KW list but not in KW clusters (A-C)
ADAM, ALIENA, AMBLES, AUDREY, BEARDS, CELIA, CHARLES, CLOWN, COUNTERFEITED, COURTIER'S, COVERED, COZ, CURED
Henry IV part 1
43% of the KWs (204-117 = 87) come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.
2s not found in the single KW list include high frequency grammar items (IN, TO, YOU) but also SIR, TRUE
2s which are not found elsewhere include TWO, FEAR, FIRE, CUDGEL
3s not found elsewhere include WELL, WHY, FATHER
4s not found elsewhere include GIVE, ARE, DOOR, LET
types in KW list but not in KW clusters (A-C)
AFOOT, BANISH, BARDOLPH, CLIFTON, COMPULSION, COUNTERFEIT, COWARD
Hamlet
44% of the KWs (140-79 = 61) come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.
2s not found in the single KW list include high freq. grammar items (MY, AND, OF) but also GOOD
2s which are not found elsewhere include FROM, O, OUR, IS, IN
3s not found elsewhere include HOW, LIFE, EXCEPT, YOUR, REVENGE, NOT, OWN
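The percentages quoted for the four plays can be checked with a few lines of arithmetic. The sketch reads the first figure in each subtraction as the total number of KW types across all listings and the second as those present in the single KW list. That reading is an assumption, though it reproduces every percentage given.

```python
# (total KW types, types in the single KW list), as quoted for each play.
figures = {
    "Romeo and Juliet": (207, 117),
    "As You Like It": (190, 98),
    "Henry IV part 1": (204, 117),
    "Hamlet": (140, 79),
}

def cluster_only_share(total, single):
    """Return the cluster-only count and its rounded share of the total."""
    extra = total - single
    return extra, round(100 * extra / total)

for play, (total, single) in figures.items():
    extra, pct = cluster_only_share(total, single)
    print(f"{play}: {extra} of {total} = {pct}%")
```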
RQ 1: How much overlap is there between KWs and KW clusters?
More than 50% of the single-word KWs are in the clusters
but the clusters add some 40% or more extra words
not all additions are grammatical
Key clusters tail off at 4 or 5
at 4 KWs, which play is this?
Midsummer Night's Dream
All's Well That Ends Well
Antony & Cleopatra
"bursty" keyness?
Conclusions
1. How much overlap is there between KWs and KW clusters?
Only a moderate amount; they highlight different aspects of the play
2. What (if anything) do key clusters show that KWs don't?
At the extremes they may highlight songs and very localised bursts in the play but by no means always or only this
<SHALLOW> It is well said, in faith, sir; and it is well said indeed too. 'Better accommodated!' it is good; yea indeed, is it: good phrases are surely and ever were, very commendable. Accommodated! it comes of accommodo: very good; a good phrase.
</SHALLOW>
<BARDOLPH> Pardon me, sir; I have heard the word. 'Phrase,' call you it? By this good day, I know not the phrase; but I will maintain the word with my sword to be a soldier-like word, and a word of exceeding good command, by heaven. Accommodated; that is, when a man is, as they say, accommodated; or, when a man is, being, whereby, a' may be thought to be accommodated, which is an excellent thing.
</BARDOLPH>