An investigation into corpus-based learning about language in the primary school: CLLIP
Corpus evidence of the features of children’s literature
The CLLIP Project: Background
CLLIP: Corpus-based Learning about Language In the Primary-school
ESRC-funded project exploring the potential for using corpus evidence with primary-school children (9- to 11-year-olds) for learning about language (L1)
Linguistic analysis of CLLIP corpus
The CLLIP corpus is the collection of texts in the British National Corpus (BNC) that were written for a child audience
The corpus contains imaginative fiction, factual prose and other texts
Linguistic analysis was conducted on the imaginative fiction texts only
Project research question 1
1. Does linguistic analysis of the corpus data confirm, extend or challenge the descriptions of English lexis and syntax which are identified as teaching targets in the National Curriculum and the National Literacy Strategy?
1a. Does any such analysis suggest a need for further research on the basis of a larger dedicated corpus of writing for children?
Corpora: CLLIP and comparison
CLLIP corpus: imaginative fiction written for a child audience, from the BNC (31 texts)
Comparison corpus (hereafter ‘Comp’): imaginative fiction written for an adult audience, from the BNC (315 texts)
Newspaper corpus (hereafter ‘News’): newspaper texts from the BNC (114 texts)
Purpose of the linguistic analysis
To determine the characteristic features of the language of imaginative fiction written for children
To compare and contrast the language of these texts with the language of imaginative fiction written for adults, and also the language of newspapers
Questions
What is distinctive about the discourse of the CLLIP corpus?
What similarities and differences are there in the overall word frequencies and POSgram frequencies of the three corpora?
Is there a difference in the uses of certain lexical items between the child and adult fiction corpora?
A POSgram is a sequence of parts of speech, such as an article followed by an adjective, another adjective and then a noun (e.g. a bright red car; the last chocolate biscuit). In this study, we look at 6-POSgrams (sequences of six parts of speech).
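Counting POSgrams amounts to sliding a six-slot window along a sequence of part-of-speech tags. The sketch below is a minimal illustration, not the project's own tooling. It assumes the text has already been tagged (the BNC is distributed with CLAWS C5 tags, but the short labels used here, such as `prep` and `art`, are hypothetical simplifications), and it counts windows within sentences only, which is also an assumption.

```python
from collections import Counter


def count_posgrams(sentences, n=6):
    """Count every run of n consecutive POS tags (a POSgram).

    `sentences` is a list of tag sequences, one per sentence, so that no
    POSgram spans a sentence boundary (an assumption of this sketch).
    """
    counts = Counter()
    for tags in sentences:
        # A sentence with fewer than n tags yields no windows at all.
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


# Hypothetical hand-tagged sentence with simplified tag labels;
# 'of' gets its own tag because the sequences discussed here treat it separately.
sentence = [("in", "prep"), ("the", "art"), ("middle", "noun"),
            ("of", "of"), ("the", "art"), ("night", "noun"),
            ("she", "pron"), ("woke", "verb")]
tags = [tag for _, tag in sentence]

counts = count_posgrams([tags], n=6)
for gram, freq in counts.most_common():
    print(" + ".join(gram), freq)
```

An eight-word sentence yields three overlapping 6-POSgrams, the first being preposition + article + noun + of + article + noun.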
Frequency of Parts of Speech
Comparison of POS categories for 3 corpora (expressed in percentages)

Part of speech   CLLIP   Comparison   Newspaper
Article           7.53     7.82         9.73
Adjective         5.29     5.91         7.96
Adverb            7.95     7.68         4.45
Conjunction       5.62     5.72         4.83
Possessive        2.29     2.69         1.28
Determiner        2.63     2.74         2.28
Noun             15.29    16.60        23.15
Proper noun       4.53     3.89         7.57
Preposition       6.95     7.51         8.92
Pronoun           4.53     3.89         3.49
Infinitive to     1.59     1.68         1.81
Verb 'be'         4.23     4.23         3.64
Verb 'do'         1.03     0.88         0.27
Verb 'have'       1.56     1.87         1.32
Modal verb        1.74     1.65         1.35
Lexical verb     14.51    13.74        10.44
For each part of speech there are three columns: CLLIP (left), Comp (middle) and Newspaper (right). What is remarkable is how similar the two fiction corpora are for most parts of speech. Proportionally, the Newspaper corpus contains many more nouns (23.15%, against 15.29% in CLLIP and 16.60% in Comp), while the fiction corpora contain more lexical verbs (14.51% and 13.74%, against 10.44%).
Frequency data
CLLIP – 22.0%; Comp – 22.4%; News – 23.5%
The ten most frequent tokens in the CLLIP and Comp corpora are remarkably similar, particularly the top four. Note the greater frequency of ‘of’ in the News corpus, which reflects its higher proportion of nouns, in expressions such as ‘the resignation of’. The figures above show the percentage of all tokens in each corpus accounted for by the top ten.
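The coverage figures are simple arithmetic: sum the frequencies of the k most frequent items and divide by the total number of tokens. The sketch below assumes whitespace-split, case-folded tokens; the study's actual tokenisation is not described here, and the toy sentence is invented.

```python
from collections import Counter


def top_k_coverage(tokens, k=10):
    """Return the k most frequent tokens and the percentage of all tokens they cover."""
    if not tokens:
        raise ValueError("token list is empty")
    # Case-folding is an assumption of this sketch.
    counts = Counter(tok.lower() for tok in tokens)
    top = counts.most_common(k)
    covered = sum(freq for _, freq in top)
    return top, 100 * covered / len(tokens)


# Hypothetical toy text; a real run would use every token in a corpus.
tokens = "The cat sat on the mat and the dog sat too".split()
top, pct = top_k_coverage(tokens, k=2)
print(top, round(pct, 1))  # [('the', 3), ('sat', 2)] 45.5
```

Passing in only the adjective (or noun) tokens of a corpus, with k=11 for adjectives, would give coverage figures of the kind reported for adjectives and nouns.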
Frequency - adjectives
CLLIP – 14.6%; Comp – 11.3%; News – 11.9%
Once again, the 11 most frequent adjectives in the two fiction corpora are remarkably similar, whereas many of the top adjectives in the Newspaper corpus refer to social attributes. The figures above show that the top 11 adjectives account for a larger proportion of all adjective occurrences in the CLLIP corpus than in the other two corpora: they do more of the ‘work’.
Frequency - nouns
CLLIP – 8.3%; Comp – 7.8%; News – 6.7%
POSgram information
This table shows the most frequent 6-POSgrams for each corpus. In every corpus, the most common sequence is preposition + article + noun + of + article + noun; in the two fiction corpora it is followed by preposition + article + noun + preposition [not ‘of’] + article + noun.
Prep + art + [ ] + of + art + noun
This slide shows the nouns that most frequently fill the third slot in the preposition + article + noun + of + article + noun sequence. In the fiction corpora the sequence most commonly expresses spatial or temporal relations, whereas in the Newspaper corpus it can also express causal relations. The top six nouns in the CLLIP corpus account for 51% of all instances of this sequence.
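Extracting the nouns that fill one slot of a fixed POS pattern, and the share taken by the most frequent fillers, can be sketched as follows. This assumes sentences stored as lists of (word, tag) pairs with hypothetical simplified tag labels (`prep`, `art`, `noun`, plus a separate `of` tag); the example phrases are invented and are not corpus data.

```python
from collections import Counter

# Hypothetical simplified tags for preposition + article + noun + of + article + noun.
PATTERN = ("prep", "art", "noun", "of", "art", "noun")


def slot_fillers(sentences, pattern=PATTERN, slot=2):
    """Count the words in position `slot` (0-based; 2 = third slot) of each pattern match."""
    fillers = Counter()
    n = len(pattern)
    for sent in sentences:
        tags = [tag for _, tag in sent]
        for i in range(len(sent) - n + 1):
            if tuple(tags[i:i + n]) == pattern:
                fillers[sent[i + slot][0].lower()] += 1
    return fillers


def top_share(fillers, k=6):
    """Percentage of all matches accounted for by the k most frequent fillers."""
    total = sum(fillers.values())
    if total == 0:
        return 0.0
    return 100 * sum(c for _, c in fillers.most_common(k)) / total


def tag(phrase):
    # Tags an invented phrase of the form prep + art + noun + of + art + noun.
    return list(zip(phrase.split(), PATTERN))


sentences = [tag("at the top of the stairs"), tag("in the middle of the night"),
             tag("at the end of the road"), tag("at the top of a hill")]
fillers = slot_fillers(sentences)
print(fillers.most_common(), top_share(fillers, k=1))
```

With k=6 on a full corpus, `top_share` gives the kind of figure quoted above for the CLLIP corpus.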
Body parts: NECK
Do nouns in the CLLIP corpus refer to physical entities in the world more often than the same nouns do in the Comp corpus? The two right-hand columns show the percentage of uses of ‘neck’ that refer to part of a garment and the percentage that are idiomatic. The adult corpus contains only a marginally higher percentage of idiomatic uses.
Neck
CLLIP: ‘stick your neck out’; little physical contact; intimacy with animals; neck as site of pain
Comp: ‘breathing down your neck’; lots of physical contact; intimacy between humans; neck as site of desire and tenderness, and as a place for ornamentation
Finger
CLLIP: figurative uses 13%; verbs: jab, prod, lay, run, put; accusing, admonishing; used for drawing, for indicating the need for silence and for pulling triggers
Comp: figurative uses 19%; verbs: put, raise, point, run, jab, wag; furtive, tentative, negligent; used for communicating, for feeling (contours and textures) and for wearing rings
in time – CLLIP
We looked at uses of ‘in time’ in the CLLIP corpus. The dominant sense is immediate: characters are concerned to accomplish something before an implied, externally imposed deadline expires. A child’s perspective often seems to involve staying on the right side of trouble or sanction.
in time – Comp
‘In time’ in the Comp corpus is used in several senses:
i: ‘in the fullness of time’, time on a large scale, which the speaker can perceive from a distance
ii: ‘within an appropriate period of time’
iii: others, as in the last line, where ‘in’ and ‘time’ have more separate meanings than is usual in the phrase