The 3rd International Symposium on CSL Teaching and Learning 第三屆華語二語教學國際研討會 Yale-China Chinese Language Centre, Chinese University of Hong Kong March 22, 2013 Word Frequency Lists for Chinese Teaching and Learning: With Reference to Language Use 語言運用與漢語教學用詞頻表的研制 Chengzhi Chu 儲誠志 University of California, Davis


Page 1: Word Frequency Lists for Chinese Teaching and Learning ...The correlation between the rank and accumulated frequency of vocabulary items forms a “plateau-climbing phenomenon” in

The 3rd International Symposium on CSL Teaching and Learning

第三屆華語二語教學國際研討會

Yale-China Chinese Language Centre, Chinese University of Hong Kong March 22, 2013

Word Frequency Lists for Chinese Teaching and Learning:

With Reference to Language Use

語言運用與漢語教學用詞頻表的研制

Chengzhi Chu 儲誠志

University of California, Davis

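Page 2: Research on frequency dictionaries and basic teaching word lists -- existing results

• Early representative results:
《现代汉语频率词典》 (Modern Chinese Frequency Dictionary) and the HSK词汇等级大纲 (HSK graded vocabulary outline)

• Word frequency studies since the 1990s, especially in the information-processing community: Beihang University, Beijing Language and Culture University, Peking University, Tsinghua University, Shanxi University, Academia Sinica (Taiwan), Leeds U., Lancaster U., C. Chu, the State Language Commission of China, Hong Kong, National University of Singapore, etc.

• Recent developments:
-- 《汉语风》 (Chinese Breeze) graded word list (2007-)
-- State Language Commission of China, 《现代汉语常用词表》 (list of common words in modern Chinese) (Commercial Press, 2008)
-- National Taiwan Normal University, basic vocabulary for Chinese teaching (2010)
-- New HSK graded word lists
-- Liu Yinglin et al., 《汉语国际教育用音节汉字词汇等级划分》 (graded syllables, characters, and words for international Chinese education) (2011)
-- etc.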

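Page 3: Research on frequency dictionaries and basic teaching word lists -- importance for vocabulary teaching

• Gives the selection and sequencing of vocabulary in textbooks a basis for reference

• Provides a benchmark for judging whether a textbook's vocabulary selection and distribution control are reasonable

• Makes horizontal linkage and vertical continuity in vocabulary teaching possible

• The discovery of the "plateau-climbing phenomenon" (登原现象) in word use is highly significant for improving vocabulary teaching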

Page 4: Chu (2003) Chinese Vocabulary Frequency List

• Corpus
-- Large size: 54,072,905 words; 82,328,955 characters
-- Balanced samples:
   By region: Mainland 58.5%; Taiwan 35.3%; Hong Kong 6.2%
   By text type: Literature 33.31%; Newspaper 32.10%; Journals (humanities, science, and technology) 15.31%; K-12 textbooks 6.19%; Other 13.09%
-- Contemporary data: mainly 1980s to 2001

• List: 133,414 vocabulary items

Page 5: The Plateau-Climbing Phenomenon (词语使用的“登原现象”)

The correlation between the rank and accumulated frequency of vocabulary items forms a “plateau-climbing phenomenon” in a statistics chart, which figuratively represents the ‘core’ and ‘peripheral’ vocabulary distinction. (Chu, 2003)

[Figure: Accumulated Chinese Word Frequency, ranks 1-133,414; accumulated frequency (Acc_Freq) plotted against rank.]
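The accumulated-frequency curve described on this slide can be sketched in a few lines. This is only an illustration of the rank and accumulated-frequency computation, not the procedure used for Chu (2003); the function name `accumulated_frequency` and the toy counts are assumptions made up for the example.

```python
from collections import Counter
from itertools import accumulate

def accumulated_frequency(counts):
    """Return (rank, word, count, freq, acc_freq) rows, most frequent word first."""
    total = sum(counts.values())
    ranked = counts.most_common()                 # sorted by descending count
    running = accumulate(c for _, c in ranked)    # running sum of counts by rank
    return [
        (rank, word, count, count / total, cum / total)
        for rank, ((word, count), cum) in enumerate(zip(ranked, running), start=1)
    ]

# Toy counts invented for illustration only (not taken from the corpus).
toy_counts = Counter({"的": 60, "是": 20, "在": 10, "朋友": 6, "時尚": 3, "扉頁": 1})
rows = accumulated_frequency(toy_counts)
for rank, word, count, freq, acc in rows:
    print(f"{rank:>2} {word} {count:>3} {freq:7.3%} {acc:8.3%}")
```

Even in this tiny example the first few ranks account for most of the running total, while the remaining words add very little, which is the shape the chart shows at corpus scale.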

Page 6: Closer Look: Ranks 1-500

Rank  Word  Count    Freq    Acc_Freq
1     的    3285067  6.075%  6.075%
50    很    109173   0.202%  28.559%
100   因爲  57518    0.106%  35.868%
150   打    35647    0.066%  39.918%
200   一定  29311    0.054%  42.905%
250   這些  24723    0.046%  45.378%
300   我國  21273    0.039%  47.482%
350   朋友  18512    0.034%  49.316%
400   得到  16816    0.031%  50.958%
450   事情  15151    0.028%  52.428%
500   辦    13743    0.025%  53.754%

[Figure: Accumulated Chinese Word Frequency, ranks 1-500; accumulated frequency (Acc_Freq) plotted against rank.]
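As a quick check on how to read the Freq column, the sketch below divides a few counts from the table by the corpus size of 54,072,905 words given in the corpus description. That Freq is computed exactly this way is an assumption; the slides do not state it, but the rounded values agree for the rows tried here. The helper name `relative_frequency` is hypothetical.

```python
CORPUS_WORDS = 54_072_905  # corpus size in words, from the corpus description

# (rank, word, count) rows copied from the ranks 1-500 table
table_rows = [(1, "的", 3285067), (50, "很", 109173), (100, "因爲", 57518), (500, "辦", 13743)]

def relative_frequency(count, total=CORPUS_WORDS):
    """Share of all running words in the corpus taken up by one word."""
    return count / total

for rank, word, count in table_rows:
    print(f"{rank:>3} {word} {relative_frequency(count):.3%}")
```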

Page 7: Closer Look: Ranks 1000-10000

Rank   Word    Count  Freq    Acc_Freq
1000   緊      7293   0.013%  62.742%
2000   旁邊    3649   0.007%  72.112%
3000   付出    2266   0.004%  77.408%
4000   甜      1573   0.003%  80.917%
5000   受不了  1182   0.002%  83.449%
6000   强制    911    0.002%  85.381%
7000   添加    732    0.001%  86.904%
8000   繁華    605    0.001%  88.136%
9000   運送    506    0.001%  89.162%
10000  時尚    439    0.001%  90.035%

[Figure: Accumulated Chinese Word Frequency, ranks 1000-10000; accumulated frequency (Acc_Freq) plotted against rank.]
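One way to see the flattening of the curve is to compute how much accumulated frequency each further block of 1,000 ranks adds. The sketch below uses only the Acc_Freq values from the ranks 1000-10000 table; the function name `marginal_gains` is an assumption introduced for the example.

```python
# (rank, accumulated frequency in percent) copied from the ranks 1000-10000 table
points = [(1000, 62.742), (2000, 72.112), (3000, 77.408), (4000, 80.917),
          (5000, 83.449), (6000, 85.381), (7000, 86.904), (8000, 88.136),
          (9000, 89.162), (10000, 90.035)]

def marginal_gains(points):
    """Percentage points of coverage gained between consecutive table rows."""
    return [(lo, hi, round(acc_hi - acc_lo, 3))
            for (lo, acc_lo), (hi, acc_hi) in zip(points, points[1:])]

gains = marginal_gains(points)
for lo, hi, gain in gains:
    print(f"ranks {lo + 1}-{hi}: +{gain:.3f} points")
```

The gain shrinks with every block, from more than nine points for ranks 1001-2000 to under one point for ranks 9001-10000.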

Page 8: Closer Look: Ranks 10000-133414

Rank    Word              Count  Freq    Acc_Freq
10000   時尚              439    0.001%  90.035%
20000   臺灣大學          147    0.000%  94.759%
30000   扉頁              68     0.000%  96.669%
40000   忠言              40     0.000%  97.676%
50000   南極洲            30     0.000%  98.285%
60000   宿命論            16     0.000%  98.694%
70000   不遠萬里          17     0.000%  99.029%
80000   映射              4      0.000%  99.189%
90000   跳棒棒            3      0.000%  99.343%
100000  霹靂神兵          1      0.000%  99.496%
110000  精實行銷公司      1      0.000%  99.655%
120000  氟化氫            1      0.000%  99.810%
130000  被髮纓冠          1      0.000%  99.963%
133414  阿·德米特裏延科  1      0.000%  100.000%

[Figure: Accumulated Chinese Word Frequency, ranks 10000-133414; accumulated frequency (Acc_Freq) plotted against rank.]

Page 9: Pedagogical Significance (implications of the plateau-climbing phenomenon)

-- The division between "core" and "non-core" vocabulary, and the teaching strategies that follow from it

-- An explanation of related phenomena in vocabulary acquisition

Page 10: Frequency dictionaries and basic teaching word lists -- examples

• Example of a word frequency list:

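Page 11: Research on frequency dictionaries and basic teaching word lists -- some limitations

1. Frequency statistics computed on a single plane, together with the simple "high frequency first" idea, fundamentally limit quantitative research on Chinese vocabulary for second language teaching.
-- Establishing "core words" or "basic words" requires a new theoretical foundation and systematic, large-scale corpus analysis.

2. Defining what counts as a word, and the practice of word segmentation, raise many problems:
-- segmentation for information engineering
-- segmentation for linguistic research
-- segmentation for second language teaching

3. The span of each level is too large to suit the needs of Chinese teaching overseas.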

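Page 12: Research on frequency dictionaries and basic teaching word lists -- some limitations

4. Research on "sense frequency" and the development of "sense frequency lists" have not yet begun.
-- Homographs/homophones: 会来 (will come); 开会 (hold a meeting); 会客 (receive a guest)
-- Polysemous words: 打人 (hit someone); 打球 (play ball); 打电话 (make a phone call); 打主意 (hatch a plan)

5. The organization of word entries lacks systematic theoretical research and practical schemes, and there is no notion of linking and merging "word groups" (词群): 妈妈 vs 妈, 一点儿/一点/点儿/点, 图书 vs 图书馆, …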
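The idea of merging related entries into "word groups" (limitation 5) can be sketched as a simple mapping from variant forms to a head word, with counts summed per group. The grouping table, the function name `merge_word_groups`, and all counts below are hypothetical and invented for illustration; the slides do not specify a merging scheme.

```python
from collections import Counter

# Hypothetical variant groups: each head word lists the surface forms merged into it.
WORD_GROUPS = {
    "妈妈": ["妈妈", "妈"],
    "一点儿": ["一点儿", "一点", "点儿", "点"],
}

def merge_word_groups(counts, groups):
    """Sum the counts of all variants under their group's head word."""
    variant_to_head = {v: head for head, variants in groups.items() for v in variants}
    merged = Counter()
    for word, count in counts.items():
        # Words that belong to no group keep their own entry.
        merged[variant_to_head.get(word, word)] += count
    return merged

# Invented counts for illustration only.
raw = Counter({"妈妈": 40, "妈": 25, "一点儿": 12, "一点": 30, "点儿": 5, "点": 18, "图书馆": 9})
merged = merge_word_groups(raw, WORD_GROUPS)
print(merged.most_common())
```

A purely form-based merge like this one is too crude in practice: 点, for example, has other senses that should not be folded into 一点儿, which is where the sense-level counts of limitation 4 would be needed.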

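Page 13: Single-plane frequency statistics and the simple "high frequency first" idea

• Single-plane frequency statistics: although every corpus is built by sampling, most word-list statistics do not distinguish registers or contexts of use.
-- Result: words such as 你好 (hello), 谢谢 (thank you), 妈妈 (mom), 玩 (play), and 长 (long) rank far below 经济 (economy), 美国 (the United States), 领导 (leader), 既 (both/since), and 组织 (organization), even though most speakers feel that the former are more common and more useful.

• The simple "high frequency first" idea: frequency is not the only criterion.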
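The effect of pooling registers can be shown with two tiny samples. The sketch below ranks words within a spoken sample, within a news sample, and within the pooled data; the samples, their labels, and the helper `rank_table` are invented for illustration and assume the text is already segmented into words.

```python
from collections import Counter

def rank_table(tokens):
    """Map each word to its frequency rank (1 = most frequent) within one sample."""
    ranked = Counter(tokens).most_common()
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

# Invented, already-segmented toy samples; a real study would use annotated corpora.
spoken = ["你好", "谢谢", "你好", "妈妈", "谢谢", "你好", "经济"]
news = ["经济", "美国", "经济", "组织", "经济", "美国", "谢谢"]

spoken_ranks = rank_table(spoken)
news_ranks = rank_table(news)
pooled_ranks = rank_table(spoken + news)

for word in ["你好", "经济"]:
    print(word, spoken_ranks.get(word), news_ranks.get(word), pooled_ranks[word])
```

In the pooled ranking 经济 comes out on top although 你好 leads the spoken sample and never occurs in the news sample, which is the kind of distortion a single-plane list hides.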

Page 14: Ways to improve -- overall guideline

"Learner-centered"

This serves the basic teaching goal of "helping learners acquire as much language ability as possible in as short a time as possible, with what they learn being as useful as possible."

Page 15: Development principle

Within certain practical limits, present earlier those core vocabulary items that have
a) higher frequency,
b) broader coverage, and
c) higher availability, and that meet
d) students’ immediate needs.
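The four criteria could, for instance, be combined into a single priority score for ordering candidate words. The slides do not say how the criteria are weighed against each other, so the weights, the 0-to-1 scales, the example scores, and the names `Candidate` and `priority` below are all assumptions made for the sake of a minimal sketch.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    word: str
    frequency: float       # normalized corpus frequency, 0..1 (assumed scale)
    coverage: float        # share of registers/text types containing the word, 0..1
    availability: float    # how readily speakers produce the word for a topic, 0..1
    immediate_need: float  # rating of learners' immediate need, 0..1

# Hypothetical weights; the slides do not specify how the four criteria are combined.
WEIGHTS = {"frequency": 0.4, "coverage": 0.2, "availability": 0.2, "immediate_need": 0.2}

def priority(c: Candidate, weights=WEIGHTS) -> float:
    """Weighted sum of the four criteria; higher means 'present earlier'."""
    return sum(getattr(c, name) * w for name, w in weights.items())

# Invented scores for illustration only.
candidates = [
    Candidate("经济", 0.9, 0.5, 0.2, 0.1),
    Candidate("谢谢", 0.4, 0.9, 0.9, 1.0),
]
ordered = sorted(candidates, key=priority, reverse=True)
print([(c.word, round(priority(c), 2)) for c in ordered])
```

With these made-up numbers 谢谢 is placed ahead of 经济 despite its lower raw frequency, because coverage, availability, and immediate need also count.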

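Page 16: Ways to improve -- operational plan

Developing a "learner-centered" word list requires considering:

Four communicative levels: basic survival, independent daily life, popular culture, and specialized or encyclopedic knowledge. These are the four stages in the typical progression of a modern learner's language learning and use.

Two communicative styles: spoken and written, the two basic styles of language, within which three communicative registers must also be distinguished: everyday and natural, intimate and colloquial, and formal and elegant.

Three reference dimensions:
-- The first is how native speakers actually use the language, especially how frequently different items are used and how widely they are used across registers.
-- The second is how earlier learners actually acquired each item, usually observed through statistical analysis of interlanguage corpora.
-- The third is the long-accumulated experience of language teachers, which is concentrated in representative language textbooks.

Diverse learning environments: the differing Chinese-language contexts and curricula found around the world.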

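Page 17: Ways to improve -- operational plan

In actual research, it is necessary to

-- build large native-speaker corpora, interlanguage corpora, and corpora of representative textbooks (all of which can be adapted and supplemented from existing corpus resources),

-- annotate the material in each corpus in detail for the subtypes of communicative level and communicative style and for other sociolinguistic attributes,

-- then process and annotate information at the levels of character, word (sense), construction, text, cultural point, communicative scenario, and so on,

-- and finally compute stratified and combined statistics for each corpus, integrating the results according to a rigorous set of criteria and procedures that can be verified and operationalized,

so as to derive sequenced lists and reference gradings of the language and communication items that different types of learners should learn at different learning stages in different learning environments.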
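The annotate-then-stratify workflow in these steps might look roughly like the following. This is a sketch under stated assumptions: the `Sample` fields, the attribute values, and the toy samples are hypothetical, and the integration step here is a plain sum, whereas the slides call for a more elaborate, verifiable procedure that they do not spell out.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    corpus: str    # "native", "interlanguage", or "textbook"
    level: str     # communicative level, e.g. "survival" or "professional"
    style: str     # "spoken" or "written"
    tokens: tuple  # already-segmented words

def stratified_counts(samples):
    """Count words separately for each (corpus, level, style) stratum."""
    strata = defaultdict(Counter)
    for s in samples:
        strata[(s.corpus, s.level, s.style)].update(s.tokens)
    return strata

def combined_counts(strata):
    """Pool all strata into one overall count (the simplest possible integration)."""
    total = Counter()
    for counter in strata.values():
        total.update(counter)
    return total

# Invented samples for illustration only.
samples = [
    Sample("native", "survival", "spoken", ("你好", "谢谢", "你好")),
    Sample("native", "professional", "written", ("经济", "组织", "经济")),
    Sample("textbook", "survival", "spoken", ("你好", "朋友")),
]
strata = stratified_counts(samples)
combined = combined_counts(strata)
print(strata[("native", "survival", "spoken")].most_common(1), combined.most_common(2))
```

Keeping the per-stratum counters around, rather than only the pooled total, is what allows a list to be regraded for a particular learner group or learning environment.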

Page 18: Ways to improve -- operational plan

Clearly, such sequenced lists and reference gradings are neither the result of single-plane frequency statistics nor a fixed scheme that is identical across teaching environments. They form a learner-centered reference scheme that is highly general, moderately localized, and open to dynamic grading.

Page 19: The practice of Chinese Breeze graded readers (《汉语风》分级系列读物的编写实践)

《汉语风》 (Chinese Breeze) is a large series of graded readers for extensive reading in Chinese. Its guiding idea is that learners "improve fluency, develop a feel for the language, and strengthen their interest and confidence in learning Chinese through relaxed, extensive reading". The series is divided by difficulty into 8 levels, with 8-10 volumes per level and more than 60 volumes in total, each 8,000 to 30,000 characters long. It is intended for university and secondary school students of Chinese from beginning to advanced level (roughly 300 to 4,500 common words mastered), as well as other learners of Chinese at levels in between.

Hànyǔ Fēng (Chinese Breeze) is a large and innovative Chinese graded reader series which offers over 60 titles of enjoyable stories at eight language levels. It is designed for college and secondary school Chinese language learners, offering them a new opportunity to read for pleasure while simultaneously developing real fluency, building confidence, and increasing motivation for Chinese learning.

Page 20: 300 word level:

Page 22: Software Tools

ICTCLAS 9.0 (Zhang 2008)
WS-Proofreader 1.0 (Chu 2007)
VocProfiler 1.0 (Chu 2008, 2011)
WordComparator 1.0 (Chu 2007)
ChineseTA 1.1 (2005)
etc.

Page 23: Illustrations: ChineseTA

Page 24: Illustrations: WS-Proofreader

Page 25: Illustrations: VocProfiler

1. One base word list and three texts for processing are selected:

Page 26: Illustrations: VocProfiler

2. Report “General Statistics” of the processed texts:

Page 27: Illustrations: VocProfiler

3. Report statistics of text lengths, sentence lengths, word tokens and types, type-token ratio, and standardized type-token ratio.
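For readers unfamiliar with the two ratios, here is a minimal sketch. The type-token ratio divides the number of distinct words by the number of running words; the standardized variant shown here averages that ratio over consecutive fixed-size chunks, which is one common definition. How VocProfiler itself computes it is not stated on the slide, and the function names and the chunk size of 1,000 are assumptions.

```python
def type_token_ratio(tokens):
    """Distinct words (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of `chunk_size` tokens.

    Falls back to the plain TTR when the text is shorter than one chunk.
    """
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        return type_token_ratio(tokens)
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)

# Invented, already-segmented toy text.
tokens = ["我", "喜欢", "看", "书", "我", "喜欢", "茶", "他", "看", "书"]
print(type_token_ratio(tokens))
print(standardized_ttr(tokens, chunk_size=5))
```

Averaging over equal-sized chunks makes texts of different lengths comparable, since the plain ratio tends to fall as a text gets longer.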

Page 28: Illustrations: VocProfiler

4. Analyze vocabulary coverage of each text by the base word list:
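Coverage of a text by a base word list can be computed at the token level (share of running words found in the list) and at the type level (share of distinct words found in it). The sketch below illustrates both; it is not VocProfiler's implementation, and the function name, base list, and texts are invented for the example. It assumes each text is non-empty and already segmented.

```python
def coverage(tokens, base_words):
    """Return (token coverage, type coverage) of a non-empty text by the base list."""
    base = set(base_words)
    types = set(tokens)
    token_cov = sum(1 for t in tokens if t in base) / len(tokens)
    type_cov = len(types & base) / len(types)
    return token_cov, type_cov

# Invented base list and texts for illustration only.
base_list = ["我", "你", "喜欢", "看", "书"]
texts = {
    "text_a": ["我", "喜欢", "看", "书"],
    "text_b": ["我", "喜欢", "时尚", "扉页", "我"],
}
for name, tokens in texts.items():
    token_cov, type_cov = coverage(tokens, base_list)
    print(f"{name}: tokens {token_cov:.0%}, types {type_cov:.0%}")
```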

Page 29: Illustrations: VocProfiler

5. Produce a word list with the frequency, distribution range, occurrence indexes, uniqueness against the base list, etc. of each word:
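A word list of this kind can be sketched as one record per word holding its total frequency, its range (the number of texts it appears in), and a flag for words absent from the base list. The record layout and the name `build_word_list` are assumptions, occurrence indexes are left out for brevity, and the data is invented.

```python
from collections import Counter

def build_word_list(texts, base_words):
    """One row per word: total frequency, range across texts, and absence from the base list."""
    base = set(base_words)
    freq = Counter()
    text_range = Counter()
    for tokens in texts.values():
        freq.update(tokens)
        text_range.update(set(tokens))  # each text counts a word at most once
    return [
        {"word": w, "frequency": n, "range": text_range[w], "not_in_base": w not in base}
        for w, n in freq.most_common()
    ]

# Invented base list and texts for illustration only.
base_list = ["我", "看", "书"]
texts = {"text_a": ["我", "看", "书", "我"], "text_b": ["我", "时尚"]}
word_list = build_word_list(texts, base_list)
for row in word_list:
    print(row)
```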

Page 30: Thank you!