
CHAPTER FOUR

RESULTS

This chapter reports the results of quantitative and qualitative analyses of examinees' performances and their attitudes toward testing speaking in the computer mode and the face-to-face mode. To answer the research questions posed in this study, data were gathered from multiple sources: (1) examinees' scores on the two tests, (2) examinees' speech samples, and (3) post-exam questionnaire results.

The analyses focused primarily on the differences in test scores, speech samples, and examinee attitudes between the computer mode and the face-to-face mode. Quantitatively, t-test and factor analysis results are examined to determine the relationships between test scores across modes. The results of ANOVAs are discussed to explore the relationships between delivery mode and speech samples. Moreover, t-test and chi-square test results from the analysis of questionnaire items are reported. Qualitatively, examinees' comments on open-ended questions are discussed.

This chapter is composed of four main sections. The first section presents the results of analyzing the comparability of raw scores across the two modes. The second section reports the comparability of the underlying constructs measured across modes. Examinees' speech samples are examined in the third section, which reports results regarding the effect of computer delivery mode on speech samples and the relationship between delivery mode and examinees' proficiency. The final section deals with questionnaire results. Specifically, examinees' responses to questionnaire items regarding the two modes are analyzed. The comments from the examinees on


東京外国語大学博士学位論文 Doctoral thesis (Tokyo University of Foreign Studies)

comparisons of the two modes are categorized and illustrated in detail.

4.1 Comparing the magnitude of raw scores

This section first reports the rater reliability for ratings assigned to the monologic tasks delivered in the computer mode and the face-to-face mode. It then examines whether there was an order effect, which is a concern when using a counterbalanced design. Finally, it describes the results of comparing the mean scores of ratings across modes.

4.1.1 Inter-rater reliability

When speaking test scores are obtained from two raters, a typical source of error is inconsistency between the ratings. Thus, inter-rater reliabilities were calculated with two types of indexes to measure the consistency between the two ratings awarded to each rating element for each task. First, Pearson product-moment correlation coefficients were computed between the ratings. In addition, given that a high correlation coefficient could be obtained even when the two raters awarded relatively different ratings, the inter-rater agreement percentage was also calculated. Exact agreement indicates that the two raters assigned the same score; adjacent agreement means that the rating difference between the two raters was one.
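The two indexes described above can be sketched in a few lines. This is only an illustration under stated assumptions: the ratings are hypothetical, and the function names `pearson_r` and `agreement_percentages` are not taken from the study. The example is chosen to show why the agreement percentage was reported alongside the correlation: when one rater is consistently one point more lenient, the correlation is perfect even though the raters never assign the same score.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def agreement_percentages(ratings_a, ratings_b):
    """Return (exact %, adjacent %, total %) for two raters' scores."""
    n = len(ratings_a)
    gaps = [abs(a - b) for a, b in zip(ratings_a, ratings_b)]
    exact = 100 * sum(g == 0 for g in gaps) / n      # same score
    adjacent = 100 * sum(g == 1 for g in gaps) / n   # scores differ by one
    return exact, adjacent, exact + adjacent


# Hypothetical ratings: rater B is always one point more lenient than rater A.
rater_a = [1, 2, 3, 2, 1, 3, 2, 1]
rater_b = [2, 3, 4, 3, 2, 4, 3, 2]

r = pearson_r(rater_a, rater_b)
exact, adjacent, total = agreement_percentages(rater_a, rater_b)
print(f"r = {r:.2f}, exact = {exact:.1f}%, adjacent = {adjacent:.1f}%, total = {total:.1f}%")
```

With these made-up ratings the correlation is perfect while exact agreement is zero, which is the situation the agreement index is meant to expose.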

As can be seen in Table 4.1, the inter-rater reliability estimates (Pearson correlation coefficients) for the computer mode ranged from .52 to .75. Except for vocabulary (r = .52), fluency (r = .53), and pronunciation (r = .54) for the opinion task, these estimates are sufficiently high. Moreover, the agreement between the ratings awarded by the two raters showed satisfactory results in total agreement, ranging from 96.2% to 100% for all cases, with moderately high exact agreement percentages (49.5%–72.2%) and adjacent agreement percentages (26.6%–45.6%).

Table 4.1 Inter-rater reliabilities for the ratings in the computer mode

| Rating element | Task | r^a | Exact agreement % | Adjacent agreement % | Total^b agreement % |
|---|---|---|---|---|---|
| Grammar | Narrative | .75 | 72.2 | 26.6 | 98.8 |
| Grammar | Opinion | .64 | 58.2 | 38.0 | 96.2 |
| Vocabulary | Narrative | .68 | 57.0 | 40.5 | 97.5 |
| Vocabulary | Opinion | .52 | 49.4 | 36.7 | 86.1 |
| Fluency | Narrative | .79 | 70.9 | 29.1 | 100 |
| Fluency | Opinion | .53 | 59.5 | 30.4 | 89.9 |
| Pronunciation | Narrative | .59 | 62.0 | 38.0 | 100 |
| Pronunciation | Opinion | .54 | 55.7 | 40.5 | 96.2 |

Note. N = 79. ^a Pearson correlation coefficient between the two ratings. ^b Total agreement percent is the sum of the exact and adjacent agreement percent.

Table 4.2 displays the inter-rater reliabilities for the face-to-face mode. Pearson correlation coefficients ranged from .60 to .74. Except for the pronunciation figure for the opinion task (r = .60), which is slightly low, the figures are sufficiently high. Similar to the findings for the computer mode, rater agreement turned out to be satisfactory for all cases in terms of exact agreement (54.4%–68.4%), adjacent agreement (29.1%–45.6%), and total agreement (96.2%–100%).

The results of rater agreement indicate that the two ratings assigned in both modes were almost all within one score point of each other. Taking the results of both types of inter-rater reliability indexes into account, the consistency of the ratings in both modes was considered to be acceptable for this study. Thus, the two ratings awarded to each rating element were averaged for each task. These averages are called element scores in this study and are used in the following quantitative analyses.



Table 4.2 Inter-rater reliabilities for the ratings in the face-to-face mode

| Rating element | Task | r^a | Exact agreement % | Adjacent agreement % | Total^b agreement % |
|---|---|---|---|---|---|
| Grammar | Narrative | .72 | 60.8 | 39.2 | 100 |
| Grammar | Opinion | .73 | 65.8 | 31.6 | 97.4 |
| Vocabulary | Narrative | .70 | 54.4 | 45.6 | 100 |
| Vocabulary | Opinion | .74 | 60.8 | 39.2 | 100 |
| Fluency | Narrative | .74 | 68.4 | 31.6 | 100 |
| Fluency | Opinion | .65 | 67.1 | 29.1 | 96.2 |
| Pronunciation | Narrative | .74 | 65.8 | 34.2 | 100 |
| Pronunciation | Opinion | .60 | 55.7 | 41.8 | 97.5 |

Note. N = 79. ^a Pearson correlation coefficient between the two ratings. ^b Total agreement percent is the sum of the exact and adjacent agreement percent.

4.1.2 Order effect

Prior to comparing the raw scores awarded in the two modes, the order effect was examined to address the concern about the counterbalanced research design adopted in this study. Table 4.3 presents the means and standard deviations for the average of element scores across the two tasks and for total scores by mode and order. The means in Table 4.3 show that the two groups assigned to different test orders seemed to perform differently across the two modes. The different trends for the two groups were most evident in the total scores. That is, for the group who took the computer mode first, the total score was much higher in the face-to-face mode (M = 9.05, SD = 2.44) than in the computer mode (M = 7.68, SD = 2.58). However, total scores for the other group indicate only a small discrepancy between the computer mode (M = 7.55, SD = 2.30) and the face-to-face mode (M = 7.59, SD = 2.55). It seems that a practice effect occurred for the group who took the computer mode first.



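Table 4.3 Descriptive statistics of scores by mode and order

| Rating element | Order | Computer M | Computer SD | Face-to-face M | Face-to-face SD |
|---|---|---|---|---|---|
| Grammar | C→F | 1.90 | 0.72 | 2.19 | 0.67 |
| Grammar | F→C | 1.83 | 0.60 | 1.86 | 0.70 |
| Vocabulary | C→F | 2.06 | 0.70 | 2.48 | 0.73 |
| Vocabulary | F→C | 1.98 | 0.63 | 2.04 | 0.71 |
| Fluency | C→F | 1.79 | 0.62 | 1.99 | 0.67 |
| Fluency | F→C | 1.89 | 0.70 | 1.70 | 0.60 |
| Pronunciation | C→F | 1.94 | 0.67 | 2.39 | 0.55 |
| Pronunciation | F→C | 1.84 | 0.54 | 1.99 | 0.69 |
| Total score | C→F | 7.68 | 2.58 | 9.05 | 2.44 |
| Total score | F→C | 7.55 | 2.30 | 7.59 | 2.55 |

Note. C→F: Computer test first/face-to-face test second; F→C: Face-to-face test first/computer test second.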

To test whether the practice effect was statistically significant, repeated measures ANOVAs were carried out on the element scores across tasks and on the total score. Table 4.4 shows that the mode-by-order interactions were statistically significant for all types of scores. There were also significant main effects of mode on the total score and on all element scores except fluency. However, in this case, the interpretation of the interactions between order and mode should take precedence over the main effects.

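Table 4.4 ANOVA results of scores by mode and order

| Rating element | Mode df | Mode F | Mode p | Order F | Order p | Mode*order F | Mode*order p |
|---|---|---|---|---|---|---|---|
| Grammar | 1 | 14.62 | .00 | 1.83 | .18 | 9.31 | .00 |
| Vocabulary | 1 | 21.22 | .00 | 3.13 | .08 | 12.04 | .00 |
| Fluency | 1 | 0.01 | .92 | 0.47 | .49 | 16.71 | .00 |
| Pronunciation | 1 | 41.95 | .00 | 3.67 | .06 | 11.09 | .00 |
| Total score | 1 | 25.04 | .00 | 2.22 | .14 | 22.31 | .00 |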



4.1.3 Comparing test scores

Given the significant interaction between delivery mode and test order, it was decided not to combine the data from the two administrations of the tests in different orders. Instead, to compare the magnitude of test scores across modes, independent t-tests were conducted separately on the scores of the first and the second test administered to the examinees.

Table 4.5 presents the results of the t-tests on the first test. As shown in Table 4.5, for the narrative task, the means of all the element scores in the computer mode were slightly higher than those in the face-to-face mode. On the other hand, the opinion task showed the opposite trend; that is, the means of all the element scores were higher in the face-to-face mode. For the total score, the mean was higher in the computer mode (M = 7.68) than in the face-to-face mode (M = 7.59). Further, all the differences in the means of element scores and total scores between the two modes were small, being no greater than 0.21.

The t-test results confirmed that none of the differences in the means of element scores and total scores between the two modes were significant. This indicates that, in the data from the first test administered, delivery mode did not make a difference to the magnitude of examinees' test scores in terms of either element scores or total score.
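As a rough check on how such a comparison works, the sketch below recomputes an independent-samples t statistic from summary statistics alone. It assumes the pooled-variance (Student) form of the test, which the text does not specify, and uses the total-score means, standard deviations, and group sizes reported for the first test (computer: M = 7.68, SD = 2.58, n = 41; face-to-face: M = 7.59, SD = 2.55, n = 38). The function name `pooled_t` is illustrative only.

```python
from math import sqrt


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t for two independent groups, from summary statistics.

    Assumes equal population variances (pooled estimate).
    Returns the t statistic and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df


# Total score on the first test administered, by delivery mode.
t, df = pooled_t(7.68, 2.58, 41, 7.59, 2.55, 38)
print(f"t({df}) = {t:.2f}")
```

Because the inputs are rounded to two decimals, the result can differ from the published t value in the second decimal place; either way it is far below any conventional critical value for 77 degrees of freedom.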



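Table 4.5 Independent t-test results of scores of the first test administered across modes

| Rating element | Task | Computer (n = 41) M | Computer SD | Face-to-face (n = 38) M | Face-to-face SD | t | p |
|---|---|---|---|---|---|---|---|
| Grammar | Narrative | 2.05 | 0.81 | 1.86 | 0.76 | 1.09 | .28 |
| Grammar | Opinion | 1.74 | 0.71 | 1.87 | 0.72 | -0.77 | .44 |
| Vocabulary | Narrative | 2.21 | 0.77 | 2.05 | 0.72 | 0.92 | .36 |
| Vocabulary | Opinion | 1.91 | 0.77 | 2.03 | 0.75 | -0.65 | .52 |
| Fluency | Narrative | 2.01 | 0.79 | 1.80 | 0.63 | 1.29 | .20 |
| Fluency | Opinion | 1.56 | 0.54 | 1.59 | 0.62 | -0.24 | .81 |
| Pronunciation | Narrative | 2.05 | 0.68 | 2.05 | 0.75 | -0.02 | .98 |
| Pronunciation | Opinion | 1.83 | 0.71 | 1.92 | 0.69 | -0.58 | .56 |
| Total score | | 7.68 | 2.58 | 7.59 | 2.55 | 0.17 | .87 |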

On the other hand, the results of the t-tests based on the data from the second test administered revealed a different pattern from that of the first test. Table 4.6 shows that there were significant differences across modes in the scores for grammar on the opinion task and in those for vocabulary and pronunciation on both the narrative task and the opinion task. The total score was also significantly different across modes. These results were considerably different from those of the first test, where no significant difference was found in any type of score across modes. The disparity between the results seems to support the concern about the interaction between delivery mode and test order. Thus, to evaluate the effects of the delivery mode on the magnitude of the test score, it was decided to use only the results from the analysis of the first test, which are considered to be more valid because they are not contaminated by the order effect.



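Table 4.6 Independent t-test results of scores of the second test administered across modes

| Rating element | Task | Computer (n = 38) M | Computer SD | Face-to-face (n = 41) M | Face-to-face SD | t | p |
|---|---|---|---|---|---|---|---|
| Grammar | Narrative | 1.91 | 0.62 | 2.15 | 0.74 | 1.55 | .13 |
| Grammar | Opinion | 1.75 | 0.70 | 2.23 | 0.73 | 2.97 | .00 |
| Vocabulary | Narrative | 2.12 | 0.70 | 2.51 | 0.79 | 2.34 | .02 |
| Vocabulary | Opinion | 1.84 | 0.71 | 2.45 | 0.76 | 3.66 | .00 |
| Fluency | Narrative | 2.08 | 0.81 | 2.10 | 0.78 | 0.10 | .92 |
| Fluency | Opinion | 1.71 | 0.74 | 1.89 | 0.70 | 1.11 | .27 |
| Pronunciation | Narrative | 1.97 | 0.52 | 2.40 | 0.57 | 3.48 | .00 |
| Pronunciation | Opinion | 1.71 | 0.64 | 2.38 | 0.62 | 4.69 | .00 |
| Total score | | 7.55 | 2.30 | 9.05 | 2.44 | 2.82 | .01 |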

4.2 Comparing psychometric constructs

One of the purposes of the present study was to investigate the effect of computer delivery mode on the underlying constructs in comparison with the face-to-face mode. To this end, a series of exploratory factor analyses was performed to explore statistically whether there are components shared by the monologic tasks delivered in the two modes. In this section, the results of checking the assumptions of the exploratory factor analysis are presented first, followed by the results of the exploratory factor analyses.

Given that the analysis in the previous section revealed a practice effect, the original data were transformed to deal with this problem by subtracting them from the mean scores on each variable for each group assigned to the different testing orders. For reporting purposes, all eight variables for each mode used in the following analyses were assigned labels as shown in Table 4.7. For example, CGN refers to the grammar score for the narrative task of the computer mode, and FVO represents the vocabulary score for the opinion task of the face-to-face mode.
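The transformation described above amounts to centering each variable within its test-order group, so that group-level differences such as a practice effect no longer contribute to the correlations among variables. A minimal sketch follows, with hypothetical scores and a hypothetical helper name `center_within_groups`. It computes score minus group mean; the opposite sign convention (mean minus score) flips every variable in the same way and leaves the correlations unchanged.

```python
from collections import defaultdict


def center_within_groups(scores, groups):
    """Subtract each group's mean from the scores belonging to that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, group in zip(scores, groups):
        totals[group] += score
        counts[group] += 1
    means = {group: totals[group] / counts[group] for group in totals}
    return [score - means[group] for score, group in zip(scores, groups)]


# Hypothetical scores on one variable for six examinees in the two test orders.
scores = [2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
orders = ["C→F", "C→F", "C→F", "F→C", "F→C", "F→C"]

centered = center_within_groups(scores, orders)
print(centered)
```

After centering, both groups have a mean of zero on the variable, so the one-point gap between the two hypothetical groups disappears.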

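Table 4.7 Measured variables used in factor analysis

| Mode | Measured variable | Task | Label |
|---|---|---|---|
| Computer mode | Grammar scores in | Narrative | CGN |
| Computer mode | Grammar scores in | Opinion | CGO |
| Computer mode | Vocabulary scores in | Narrative | CVN |
| Computer mode | Vocabulary scores in | Opinion | CVO |
| Computer mode | Fluency scores in | Narrative | CFN |
| Computer mode | Fluency scores in | Opinion | CFO |
| Computer mode | Pronunciation scores in | Narrative | CPN |
| Computer mode | Pronunciation scores in | Opinion | CPO |
| Face-to-face mode | Grammar scores in | Narrative | FGN |
| Face-to-face mode | Grammar scores in | Opinion | FGO |
| Face-to-face mode | Vocabulary scores in | Narrative | FVN |
| Face-to-face mode | Vocabulary scores in | Opinion | FVO |
| Face-to-face mode | Fluency scores in | Narrative | FFN |
| Face-to-face mode | Fluency scores in | Opinion | FFO |
| Face-to-face mode | Pronunciation scores in | Narrative | FPN |
| Face-to-face mode | Pronunciation scores in | Opinion | FPO |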

4.2.1 Preliminary data analyses

Table 4.8 presents descriptive statistics of all the variables. They were computed to check the assumptions of the exploratory factor analysis.

Univariate normality of the 16 observed variables was assessed through examination of the skewness and kurtosis for each variable. As seen in Table 4.8, none of the observed variables deviated from normality, with values for the skewness and kurtosis within an acceptable range of -2 to 2. The Pearson product-moment correlations among all the variables were calculated (see Appendix I). The correlation figures ranged from .56 to .84, indicating that all the variables correlated fairly well with each other and none of the correlation coefficients were particularly large. This suggests that multicollinearity is not a problem for the present data. Univariate outliers were also examined; no subject was found to be a univariate outlier (z < -3 or z > 3) on any of the variables.
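The screening steps just described can be illustrated as follows. The sketch uses simple moment-based skewness and excess kurtosis; the statistical package used in the study may apply bias-corrected formulas, so this choice is an assumption, as are the data and the function names `shape_statistics` and `univariate_outliers`.

```python
from math import sqrt


def shape_statistics(values):
    """Moment-based skewness and excess kurtosis of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis


def univariate_outliers(values, cutoff=3.0):
    """Return the values whose z score lies beyond +/- cutoff."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [v for v in values if abs((v - mean) / sd) > cutoff]


# Hypothetical, symmetric set of scores on one variable.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]

skewness, kurtosis = shape_statistics(scores)
within_range = -2 <= skewness <= 2 and -2 <= kurtosis <= 2
outliers = univariate_outliers(scores)
print(skewness, kurtosis, within_range, outliers)
```

For these made-up scores both shape statistics fall inside the -2 to 2 band and no value exceeds the z cutoff of 3, which is the pattern the text reports for all 16 variables.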

Table 4.8 Descriptive statistics for all variables

Columns: Variable, Min, Max, M, SD, Skewness, Kurtosis

Variables: CGN, CGO, CVN, CVO, CFN, CFO, CPN, CPO, FGN, FGO, FVN, FVO, FFN, FFO, FPN, FPO

[Cell values are not recoverable in this copy.]

Note. N = 79.

4.2.2 Exploratory factor analyses

First, an exploratory factor analysis, using the principal axis method (principal factor analysis) and a varimax rotation, was carried out to explore the number of factors underlying the eight observed variables for the computer mode. The solution revealed only one factor. The factor was extracted on the basis of eigenvalues greater than 1.0, as shown in Table 4.9. The scree plot (see Figure 4.1) confirmed the meaningfulness of the factor. The factor accounted for about 75% of the total variance. As seen in Table 4.10, the magnitudes of all the factor loadings were substantial, ranging from .82 to .90. This suggests that all eight variables, which represent four rating elements in each task, are reasonably good indicators of this factor.
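The retention rule and the variance figures can be reproduced from the eigenvalues alone. The sketch below takes the eight eigenvalues listed in Table 4.9 and assumes, as is standard when a correlation matrix is analyzed, that the total variance equals the number of variables. The names `KAISER_THRESHOLD` and `summarize_eigenvalues` are illustrative.

```python
KAISER_THRESHOLD = 1.0  # retain factors whose eigenvalue exceeds 1.0


def summarize_eigenvalues(eigenvalues):
    """Return (number of retained factors, % of variance, cumulative %)."""
    n_variables = len(eigenvalues)  # total variance of a correlation matrix
    percent = [100 * value / n_variables for value in eigenvalues]
    cumulative = []
    running = 0.0
    for share in percent:
        running += share
        cumulative.append(running)
    retained = sum(value > KAISER_THRESHOLD for value in eigenvalues)
    return retained, percent, cumulative


# Eigenvalues for the computer mode as listed in Table 4.9.
computer_eigenvalues = [6.04, 0.74, 0.41, 0.30, 0.17, 0.17, 0.10, 0.08]

retained, percent, cumulative = summarize_eigenvalues(computer_eigenvalues)
print(retained, round(percent[0], 2))
```

Only the first eigenvalue exceeds 1.0, and it corresponds to roughly three quarters of the total variance. Small differences from the tabled percentages arise because the eigenvalues are rounded to two decimals.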

Table 4.9 Exploratory factor analysis of data from the computer mode

| Factor | Eigenvalue | Percentage of variance | Cumulative percentage |
|---|---|---|---|
| 1 | 6.04 | 75.45 | 75.45 |
| 2 | 0.74 | 9.23 | 84.68 |
| 3 | 0.41 | 5.14 | 89.82 |
| 4 | 0.30 | 3.73 | 93.55 |
| 5 | 0.17 | 2.12 | 95.67 |
| 6 | 0.17 | 2.09 | 97.76 |
| 7 | 0.10 | 1.29 | 99.05 |
| 8 | 0.08 | 0.95 | 100.00 |

[Figure 4.1 Scree plot for data from the computer mode (x-axis: factor number; y-axis: eigenvalue)]

Table 4.10 Factor loading for the data from the computer mode

| Variable | Factor loading |
|---|---|
| CGN | .87 |
| CVN | .87 |
| CFN | .84 |
| CPN | .88 |
| CGO | .89 |
| CVO | .89 |
| CFO | .82 |
| CPO | .90 |

A principal factor analysis with varimax rotation was also conducted for the face-to-face mode, and the results obtained were similar to those for the computer mode. As indicated in Table 4.11, the results showed that only one factor with an eigenvalue greater than 1.0 was extracted. The scree test also suggested one factor (see Figure 4.2). Table 4.11 shows that about 77% of the variance was explained by this factor. Table 4.12 presents the factor loadings of the variables, which demonstrate that the factor was well defined by the variables, since the factor loadings were high, ranging from .83 to .91.

Table 4.11 Exploratory factor analysis of data from the face-to-face mode

| Factor | Eigenvalue | Percentage of variance | Cumulative percentage |
|---|---|---|---|
| 1 | 6.12 | 76.54 | 76.54 |
| 2 | 0.62 | 7.70 | 84.24 |
| 3 | 0.43 | 5.36 | 89.60 |
| 4 | 0.25 | 3.18 | 92.77 |
| 5 | 0.19 | 2.42 | 95.19 |
| 6 | 0.16 | 1.96 | 97.16 |
| 7 | 0.13 | 1.66 | 98.81 |
| 8 | 0.09 | 1.19 | 100.00 |

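[Figure 4.2 Scree plot for data from the face-to-face mode (x-axis: factor number; y-axis: eigenvalue)]

Table 4.12 Factor loading for the data from the face-to-face mode

| Variable | Factor loading |
|---|---|
| FGN | .87 |
| FVN | .90 |
| FFN | .86 |
| FPN | .85 |
| FGO | .90 |
| FVO | .91 |
| FFO | .83 |
| FPO | .86 |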

The analyses to this point revealed that both modes seemed to measure only one factor. Further, given that a factor loading reflects the portion of the total variance that each variable contributes to the factor, a comparison of the factor loadings of the eight variables across modes shows that each pair of observed variables generally has equivalent loadings on the factor.



To support the above analyses for the two modes, another principal factor analysis with varimax rotation was performed with the 16 observed variables of both the computer mode and the face-to-face mode. Again, the solution produced only one component on the basis of eigenvalues greater than 1.0 (see Table 4.13), which was confirmed by the scree plot in Figure 4.3. This single factor accounted for most of the total variance (71%). As can be seen in Table 4.14, all the variables loaded highly on the factor, with loadings ranging from .78 to .88. Further, they seemed to contribute similarly to the major component, with similarly high factor loadings when compared across modes.

Taken together, the results described in this section indicate that monologic tasks delivered in the computer mode and the face-to-face mode seem to measure the same psychometric construct.

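Table 4.13 Exploratory factor analysis of combined data from both modes

| Factor | Percentage of variance | Cumulative percentage |
|---|---|---|
| 1 | 71.12 | 71.12 |
| 2 | 5.41 | 76.53 |
| 3 | 4.97 | 81.50 |
| 4 | 3.79 | 85.29 |
| 5 | 2.78 | 88.07 |
| 6 | 2.55 | 90.62 |
| 7 | 2.03 | 92.65 |
| 8 | 1.40 | 94.05 |
| 9 | 1.17 | 95.22 |
| 10 | 1.07 | 96.29 |
| 11 | 0.92 | 97.21 |
| 12 | 0.83 | 98.04 |
| 13 | 0.62 | 98.65 |
| 14 | 0.52 | 99.17 |
| 15 | 0.42 | 99.60 |
| 16 | 0.40 | 100.00 |

[Eigenvalue column: only a fragment ("11") of the first value survives in this copy.]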

[Figure 4.3 Scree plot for the combined data from both modes (x-axis: factor number, 1 to 16; y-axis: eigenvalue)]

Table 4.14 Factor loading for the combined data from both modes

[Variable labels and factor loadings are not recoverable in this copy.]


4.3 Comparing speech samples

In this section, the reliabilities of the codings of speech samples between two coders are reported first. The results of grouping examinees according to their scores on computer-delivered tasks are then introduced. Finally, the results of comparing speech samples between the two modes are presented.

4.3.1 Inter-coder reliability

The inter-coder reliability was examined through agreement between the codings from the two coders. Table 4.15 summarizes the inter-coder reliabilities for all the coding units. As can be seen in Table 4.15, the achieved levels were high in almost all cases, ranging from 87% to 99%.

Table 4.15 Summary of inter-coder reliability

Columns: Category, Coding units, Inter-coder agreement (%)

Categories: Fluency, Accuracy, Complexity

Coding units: Speech time; Length of pause time; Unfilled pauses; Filled pauses; Words of repetition; Words of self-correction; Words of false starts; Total words; AS-unit; Independent clause; Subordinate clause; Errors; Type; Token; Grammatical words; Lexical words; High-frequency lexical words; Low-frequency lexical words

[The agreement percentages for the individual coding units are not recoverable in this copy.]



4.3.2 Grouping of examinees based on test scores

To examine a possible interaction effect between language proficiency and delivery mode, participants were categorized into three groups based on their total score on the computer-delivered tasks (see footnote 17). As shown in Table 4.16, those who scored over 66.6% were assigned to the high proficiency group (n = 27; M = 20.63), and those who scored between 66.6% and 33.3% were assigned to the middle proficiency group (n = 26; M = 15.17). The rest were assigned to the low proficiency group (n = 26; M = 9.69). The results of a one-way ANOVA revealed a significant effect for the placement in a proficiency group, F(2, 76) = 227.82, p < .00. Post hoc analyses (Tukey) indicated that each group was significantly different in their total scores at p < .00.

Table 4.16 Descriptive statistics for proficiency groups

| Proficiency group | M | SD | Min | Max |
|---|---|---|---|---|
| Low | 9.69 | 1.50 | 8.00 | 12.00 |
| Mid | 15.17 | 1.27 | 13.00 | 17.50 |
| High | 20.63 | 2.54 | 17.50 | 26.50 |
| Total | 15.23 | 4.87 | 8.00 | 26.50 |

4.3.3 Effect of delivery mode and interaction with proficiency

To examine the effect of delivery mode on examinees' speech samples and the interaction of delivery mode with examinees' proficiency, repeated measures ANOVAs were conducted. The results are presented in terms of fluency, accuracy, and complexity.

17 Given that the GTEC for STUDENTS is a computer-delivered test and the correlation between the total scores for the computer and face-to-face versions was quite high (r = .95), it was decided to use the total scores from the computer version.



4.3.3.1 Fluency

Table 4.17 displays the means and standard deviations for fluency measures by delivery mode and proficiency level. As can be seen in Table 4.17, fluency is higher in computer-delivered monologic tasks for measures of speech rate and dysfluent words, whereas it is higher in the face-to-face mode regarding filled pauses and unfilled pauses.

Table 4.17 Descriptive statistics of fluency measures by mode and proficiency

Columns: Measure (per 60 seconds), Prof, Computer (M, SD), Face-to-face (M, SD)

Measures: No. of words; No. of unfilled pauses; No. of filled pauses; No. of dysfluent words; No. of repetition words; No. of self-correction words; No. of false start words

[Cell values are not recoverable in this copy.]

Note. Prof = Proficiency groups.



Table 4.18 summarizes the repeated measures ANOVA statistics for the fluency measures. As shown in Table 4.18, a main effect of delivery mode was found for three measures, and the results were somewhat mixed. There were significant differences in the number of dysfluent words per 60 seconds (F(1, 2) = 15.16, p < .00) and the number of repetition words per 60 seconds (F(1, 2) = 16.05, p < .00). That is, examinees used more repetition words in face-to-face monologic tasks than in those delivered by computer. This means that examinees were more fluent in the computer mode than in the face-to-face mode on these measures. A significant difference was also observed for the number of filled pauses per 60 seconds (F(1, 2) = 7.55, p < .01), but in the opposite direction. That is, examinees used more filled pauses in computer-delivered tasks, indicating that they were more fluent in the face-to-face mode on this measure. There was no significant interaction effect between delivery mode and proficiency level, suggesting that the effect of delivery mode on the fluency of examinees' speech did not differ by proficiency.

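Table 4.18 ANOVA results for fluency measures

| Measure (per 60 seconds) | Mode F | Level df | Level F | Mode*level df | Mode*level F |
|---|---|---|---|---|---|
| No. of words | 2.14 | 2 | 38.46 | 2 | 0.16 |
| No. of unfilled pauses | 0.11 | 2 | 36.33 | 2 | 0.48 |
| No. of filled pauses | 7.55 | 2 | 0.92 | 2 | 1.69 |
| No. of dysfluent words | 15.16 | 2 | 0.77 | 2 | 0.32 |
| No. of repetition words | 16.05 | 2 | 1.35 | 2 | 0.14 |
| No. of self-correction words | 2.83 | 2 | 0.27 | 2 | 0.41 |
| No. of false start words | 2.06 | 2 | 0.05 | 2 | 1.26 |

[The p values and the df for the Mode effect are not recoverable in this copy.]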



4.3.3.2 Accuracy

Accuracy was examined in terms of two measures: the ratio of error-free clauses and the ratio of error-free AS-units. Table 4.19 presents the means and standard deviations for the accuracy measures by delivery mode and proficiency level. According to Table 4.19, the face-to-face tasks yielded slightly higher accuracy on both measures. However, the repeated measures ANOVAs did not show these differences to be statistically significant (see Table 4.20). This suggests that linguistic accuracy was not affected by the delivery mode. In addition, no interaction effect between delivery mode and proficiency level was statistically significant. This indicates that the three proficiency groups did not differ in how their accuracy varied across modes.

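Table 4.19 Descriptive statistics of accuracy measures by mode and proficiency

| Measure | Prof | Computer M | Computer SD | Computer n | Face-to-face M | Face-to-face SD | Face-to-face n |
|---|---|---|---|---|---|---|---|
| Percentage of error-free clauses | Low | 0.28 | 0.21 | 26 | 0.28 | 0.24 | 26 |
| Percentage of error-free clauses | Mid | 0.47 | 0.19 | 26 | 0.54 | 0.19 | 26 |
| Percentage of error-free clauses | High | 0.65 | 0.19 | 27 | 0.63 | 0.18 | 27 |
| Percentage of error-free clauses | Total | 0.47 | 0.24 | 79 | 0.49 | 0.25 | 79 |
| Percentage of error-free AS-units | Low | 0.18 | 0.19 | 26 | 0.19 | 0.23 | 26 |
| Percentage of error-free AS-units | Mid | 0.35 | 0.21 | 26 | 0.44 | 0.19 | 26 |
| Percentage of error-free AS-units | High | 0.54 | 0.21 | 27 | 0.56 | 0.18 | 27 |
| Percentage of error-free AS-units | Total | 0.36 | 0.25 | 79 | 0.40 | 0.25 | 79 |

Note. Prof = Proficiency groups.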

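Table 4.20 ANOVA results for accuracy measures

| Measure | Mode df | Mode F | Mode p | Level df | Level F | Level p | Mode*level F |
|---|---|---|---|---|---|---|---|
| Percentage of error-free clauses | 1 | 0.51 | .48 | 2 | 30.05 | .00 | 1.50 |
| Percentage of error-free AS-units | 1 | 2.69 | .10 | 2 | 30.41 | .00 | 1.13 |

[The df and p values for the Mode*level interaction are not recoverable in this copy.]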



4.3.3.3 Complexity

Complexity was measured in terms of both syntactic complexity and lexical complexity. Table 4.21 shows descriptive statistics for the complexity measures by delivery mode and proficiency level. For the measures of syntactic complexity, the means for the computer mode were slightly higher than those for the face-to-face mode. Lexical complexity also increased in the computer mode, but only with respect to Guiraud's Index, while the two measures of lexical density went in the opposite direction.
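For readers unfamiliar with these lexical measures, the sketch below computes Guiraud's Index (word types divided by the square root of word tokens) and an unweighted lexical density. The density formula (lexical words divided by total words) is the common definition and is assumed here rather than taken from the text. The sample utterance and the small function-word list are hypothetical, and the weighted variant of lexical density is not attempted.

```python
from math import sqrt


def guiraud_index(tokens):
    """Number of distinct word types divided by the square root of tokens."""
    return len(set(tokens)) / sqrt(len(tokens))


def lexical_density(tokens, function_words):
    """Share of tokens that are lexical (content) words."""
    lexical = [token for token in tokens if token not in function_words]
    return len(lexical) / len(tokens)


# Hypothetical 16-word utterance and a hypothetical function-word list.
utterance = "i like the park and i like the lake and the park is near the lake"
tokens = utterance.split()
function_words = {"i", "the", "and", "is", "near"}

g_index = guiraud_index(tokens)
density = lexical_density(tokens, function_words)
print(g_index, density)
```

Dividing by the square root of the token count is what makes Guiraud's Index less sensitive to sample length than a plain type-token ratio, which matters when speech samples differ in length.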

Table 4.21 Descriptive statistics of complexity measures by mode and proficiency

Columns: Measure, Prof, Computer (M, SD), Face-to-face (M, SD)

Measures: Percentage of clauses; Percentage of subordinate clauses; No. of words; Guiraud's Index; Lexical density; Weighted lexical density

[Cell values are not recoverable in this copy.]

Note. Prof = Proficiency groups.



As shown in Table 4.22, the repeated measures ANOVAs revealed that these differences were not significant in terms of the main effect of delivery mode. That is, there was no significant difference in the syntactic complexity or lexical complexity of the language produced in the two modes. Again, no significant interaction effect between proficiency level and delivery mode could be established. This implies that examinees at different proficiency levels did not perform differently across modes in terms of linguistic complexity.

Table 4.22 ANOVA results for complexity measures

| Measure | Mode F | Level F | Mode*level F |
|---|---|---|---|
| Percentage of clauses | 0.62 | 35.17 | 0.53 |
| Percentage of subordinate clauses | 0.01 | 27.15 | 1.96 |
| No. of words | 1.16 | 40.33 | 0.14 |
| Guiraud's Index | 0.07 | 50.59 | 0.25 |
| Lexical density | 1.67 | 11.77 | 1.73 |
| Weighted lexical density | 1.94 | 9.42 | 1.83 |

[The df and p values are not recoverable in this copy.]

4.4 Comparing examinee attitudes

To explore the face validity of the computer-delivered speaking test from the examinees' perspective, two questionnaires, described in Section 3.2.3, were administered immediately after the two tests were completed. The questionnaires aimed to collect information in two areas: general attitudes toward speaking tests delivered in the computer and the face-to-face modes (Questionnaire 1) and a direct comparison of the two modes (Questionnaire 2) (see Appendix E and Appendix F for the questionnaires).



4.4.1 Examinee attitudes toward the two modes

Table 4.23 presents the means and standard deviations for the five statements in Questionnaire 1 on examinee attitudes and perceptions regarding the computer mode and the face-to-face mode. As shown in Table 4.23, mean scores for examinees' responses were all above 3, except for those on favorableness (Q4) for the computer mode (M = 2.95), indicating that examinees generally agreed with the statements for both modes. Specifically, examinees reported that they felt nervous on both tests. They considered both tests to be difficult but fair. They held a roughly neutral position toward the computer mode but showed favorable attitudes toward the face-to-face mode. Finally, they perceived both tests to be accurate measures of their spoken English.

Table 4.23 Comparative results on Questionnaire 1

| Statement | Computer M | Computer SD | Face-to-face M | Face-to-face SD | t |
|---|---|---|---|---|---|
| I felt nervous when I was taking the test. | 3.13 | 1.23 | 3.56 | 1.17 | 2.78** |
| I feel this test was difficult. | 3.68 | 1.07 | 3.82 | 0.99 | 1.49 |
| I feel the test was fair. | 3.57 | 0.89 | 3.76 | 0.83 | 1.97 |
| I liked the format of the test. | 2.95 | 1.05 | 3.16 | 0.98 | 2.20* |
| The test reflects accurately how well I speak English. | 3.14 | 0.96 | 3.40 | 0.92 | 2.32* |

Note. *p < .05; **p < .01.

To evaluate differences in examinees' responses to the statements regarding the two modes, dependent t-tests were carried out. The results in Table 4.23 revealed significant differences in examinees' responses to three statements. That is, examinees reported a lower level of nervousness in the computer mode (M = 3.13, SD = 1.23) than in the face-to-face mode (M = 3.56, SD = 1.17). The computer mode was viewed as less favorable (M = 2.95, SD = 1.05) than the face-to-face mode (M = 3.16, SD = 0.98) and less accurate in reflecting their English speaking level (M = 3.14, SD = 0.96) than the face-to-face mode (M = 3.40, SD = 0.92). However, the two modes were not found to be significantly different in test difficulty or test fairness.

4.4.2 Direct comparisons of the two modes

Questionnaire 2 aimed to gather information enabling a direct comparison of examinees' attitudes and perceptions concerning testing speaking in the computer mode and the face-to-face mode. Table 4.24 presents the proportion of the examinees that chose each of the three options in the six questions.

Chi-square tests were performed to test statistically the differences in the percentages of examinees that chose each option. The results in Table 4.24 showed that the observed frequencies differed from the expected frequencies at a statistically significant level for all the questions except the one on test difficulty. Specifically, more frequently than expected, examinees preferred the face-to-face mode and found it more favorable and more valid, though also more nerve-racking. As for Question 2, examinees did not find the tests in the two modes to be significantly different in difficulty. Regarding Question 3, the results revealed that significantly more examinees than expected considered the fairness of the two tests to be the same. Overall, the results of Questionnaire 2 corroborate those of Questionnaire 1.
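A chi-square goodness-of-fit test of this kind can be sketched as follows. Two assumptions are made that the text does not state explicitly: the expected frequencies are equal across the three options, and the counts for Question 1 (18, 48, and 28) are back-calculated from the percentages in Table 4.24 under an assumed total of 94 respondents. The function name `chi_square_equal_expected` is illustrative, and 5.991 is the standard critical value for two degrees of freedom at the .05 level.

```python
CRITICAL_VALUE_DF2_05 = 5.991  # chi-square critical value, df = 2, alpha = .05


def chi_square_equal_expected(observed):
    """Goodness-of-fit statistic against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)


# Assumed counts for Question 1: computer, face-to-face, both the same.
question_1_counts = [18, 48, 28]

chi_square = chi_square_equal_expected(question_1_counts)
significant = chi_square > CRITICAL_VALUE_DF2_05
print(f"chi-square = {chi_square:.2f}, significant at .05: {significant}")
```

Under these assumptions the statistic comes out at about 14.89, well above the critical value, which is consistent with the significant result reported for the nervousness question.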



Table 4.24 Comparative results on Questionnaire 2

| Question | Computer (%) | Face-to-face (%) | Both the same (%) | χ² | N |
|---|---|---|---|---|---|
| 1 Which test did you feel more nervous taking? | 19.1 | 51.1 | 29.8 | 14.89* | 94 |
| 2 Which test did you find more difficult? | 26.3 | 31.6 | 42.1 | 3.68 | 95 |
| 3 Which test did you feel was fairer? | 34.1 | 10.6 | 55.3 | 25.51* | 85 |
| 4 Which test do you like better? | 30.9 | 49.0 | 19.1 | 13.68* | 94 |
| 5 Which test do you think reflected your English level more accurately? | 11.6 | 50.5 | 37.9 | 22.51* | 95 |
| 6 Which type of test do you prefer to take in the future? | 30.2 | 61.6 | 8.2 | 37.28* | 86 |

Note. *p < .05.

In the following, responses to each open-ended question in Questionnaire 2 are
categorized by means of content analysis, and the results for each question are reported
with examples of comments from examinees.
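
The tallying step of such a content analysis can be sketched as below. This is only an illustration under stated assumptions: the comments are taken to be already hand-coded with the option each examinee chose, and the list `coded_comments` is hypothetical data constructed to match the group totals reported for Q1 (28, 3, and 37 of 68 comments). The function name `summarize` is also hypothetical.

```python
from collections import Counter

# Hypothetical coded data: one label per written comment on Q1.
coded_comments = (
    ["Computer"] * 28 + ["Both the same"] * 3 + ["Face-to-face"] * 37
)


def summarize(codes):
    # Return {label: (frequency, percentage of all comments, rounded)}.
    counts = Counter(codes)
    total = sum(counts.values())
    return {
        label: (freq, round(100 * freq / total))
        for label, freq in counts.items()
    }


summary = summarize(coded_comments)
for label, (freq, pct) in summary.items():
    print(f"{label}: {freq} ({pct}%)")
```

The same tally can be repeated within each option to obtain the percentage of comments that mention a given reason.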

Q1. Which test did you feel more nervous taking?

As can be seen in Table 4.24, a majority of the examinees (51.1%) felt more
nervous taking the face-to-face test, while a minority (19.1%) perceived the computer
mode to be more anxiety-provoking. Table 4.25 presents the percentage of detailed
comments given for each option. Of those who commented on choosing the
face-to-face mode, 75% attributed their higher level of anxiety to the presence of the
interviewer. In contrast, comments from those who chose the computer mode focused
primarily, and surprisingly, on the facilitative role of reactions from the interviewer as
opposed to the one-sided nature of the computer mode (57%). The following are some
examples of the comments from the examinees:

a. During the face-to-face test, I cared about the expression of the interviewer and
   worried about whether I was speaking well or not. But when taking the
   computer-delivered speaking test, I don't need to face an interviewer directly,
   which made me feel quite relaxed. (Face-to-face)

a. I felt assured during the face-to-face test when the interviewer gave reactions
   with eye contact and backchannels. However, when there was no reaction from
   computer, I felt somewhat tense. (Computer)

a. Rather than to the computer, I prefer speaking in front of a human being, since
   I felt somehow she could understand what I was talking about. (Computer)

Table 4.25 Summary of comments on nervousness (Q1)

Reasons                                                     Frequency   Percentage
                                                            (N = 68)    (%)
Computer mode                                               28          41
  a. no reactions from computer                                         57
  b. being the first test taken
  c. no opportunity to clarify unclear instructions
  d. distracted by the timer on the screen
  e. concerned with quality of recording with computer
Both the same                                                3           4
  a. having the same task contents
  b. no confidence in English speaking ability
Face-to-face mode                                           37          54
  a. the presence of the interviewer                                    75
  b. being the first test taken
  c. unable to concentrate on practice
  d. unable to practice loudly


Q2. Which test did you find more difficult?

According to Table 4.24, 42.1% of the examinees thought that the two tests were
not very different in difficulty. Table 4.26 indicates that among examinees who
commented on this question, those who chose "Both the same" gave as their main
reason the fact that the content and procedure of the two types of tests were the same
(86%). Here are some examples of the comments:

a. Since the format of the two types of tests and instructions from the interviewer
   and the computer were all the same, I did not feel any difference in difficulty of
   the tests. (Both the same)

a. There is no difference in task contents, so I feel the difficulty of the test itself
   does not differ. (Both the same)

Table 4.26 Summary of comments on test difficulty (Q2)

Reasons                                                     Frequency   Percentage
                                                            (N = 60)    (%)
Computer mode                                               20          33
  a. feeling nervous without any reactions from computer
  b. being the first test taken
  c. unable to ask questions
  d. unfamiliar with recording their voices on a microphone
  e. unconfident to communicate with voice only
  f. feeling unmotivated to perform better
Both the same                                               21          35
  a. having the same test contents and test procedures                  86
  b. no confidence in English speaking ability
Face-to-face mode                                           19          32
  a. more anxiety-provoking
  b. being the first test taken
  c. less control of the pace of test taking

Q3. Which test did you feel was fairer?

Table 4.24 shows that more than half of the examinees (55.3%) chose "Both the same" for
fairness of the two types of speaking test. Also, it should be noted that about three times
as many examinees felt that the computer mode was fairer as felt that the face-to-face
mode was fairer (34.1% vs. 10.6%). Table 4.27 summarizes the percentage of detailed
comments given for each option. As indicated in Table 4.27, the fact that the test contents
and procedures were the same across modes was the main reason examinees chose
"Both the same" (87%). In addition, examinees mentioned the absence of influence by
the interviewer in the computer test as the primary reason for its fairness (72%). The
detailed comments are as follows:

a. No matter which type of speaking test it is, since the preparation and response
   time, the content of the tasks, and test conditions were the same, I think the
   fairness was the same. (Both the same)

a. I think some interviewers may make the examinee feel nervous or relaxed. So,
   the atmosphere of the interviewer may well influence how the examinee
   performs in the face-to-face test. (Computer)

a. The interviewer looked quite kind, so I was pretty relaxed during the
   face-to-face test. But when I took EIKEN in the past, the interviewer at that
   time looked quite harsh and unfriendly. So I was quite scared and couldn't
   speak well at that time. (Computer)

Table 4.27 Summary of comments on test fairness (Q3)

Reasons                                                     Frequency   Percentage
                                                            (N = 48)    (%)
Computer mode                                               25          52
  a. no influence of the interviewer                                    72
  b. administration under the same conditions
  c. less anxiety-provoking
Both the same                                               16          33
  a. having the same test contents and test procedures                  87
  b. feeling no anxiety on both tests
Face-to-face mode                                            7          15
  a. having opportunity to talk to real people
  b. possible to clarify instructions with the interviewer


Q4. Which test did you like better?

Table 4.24 indicates that almost half of the students (49%) reported that they
liked the face-to-face test better than the computer test, while 30.9% favored the
computer test. As can be seen in Table 4.28, comments regarding the affective appeal of
the face-to-face test focused primarily on the following: (a) it felt natural to talk in the
presence of the interviewer (27%); (b) it was less anxiety-provoking (20%); (c) it was
similar to a real communication situation (20%); (d) it was pleasant to talk with people
(16%); (e) it was possible to clarify unclear instructions with the interviewer (11%); and
(f) they felt motivated to use facial expressions and gestures to communicate (7%).
Those who chose the computer mode gave the fact that it was less anxiety-provoking as
the main reason (54%). They also mentioned better concentration (12%) and better
control of the pace of test-taking in the computer mode (12%). Comments from examinees
shed some light on these findings:

a. When I speak in front of a person, I feel like talking to that person. So, I feel I
   could speak naturally in the face-to-face test. (Face-to-face)

a. I like the face-to-face test because I feel someone is listening to what I am
   talking about. On the contrary, I feel lonely in the computer test. (Face-to-face)

b. I feel more relaxed when I talk to someone than when I talk to a machine.
   (Face-to-face)

e. I think it is practical to test how we speak when someone is present. Personally,
   I don't like taking a test on the computer because I feel that computer is easy to
   break down. But in the face-to-face test, the interviewer can deal with problems
   that may happen during the test. So I feel it is more flexible. (Face-to-face)

e. In the face-to-face test, when there is anything I don't understand or I want to
   clarify, I could ask questions. The interviewer could help me cope with the
   situation. (Face-to-face)

f. When the interviewer is present, I feel motivated to use non-verbal means such
   as facial expression and gesture to express what I want to say. (Face-to-face)

a. I felt quite nervous in the face-to-face test. But in the computer-delivered
   speaking test, since I don't need to talk in front of someone, I could keep calm
   and speak as usual. (Computer)

b. During the computer test, there was no one around. So compared with the
   face-to-face test, I was able to concentrate on thinking and giving responses to
   the tasks. (Computer)

c. I could give answers to the questions at my own pace in the computer test.
   (Computer)

Table 4.28 Summary of comments on affective appeal of the test modes (Q4)

Reasons                                                     Frequency   Percentage
                                                            (N = 78)    (%)
Computer mode                                               26          33
  a. less anxiety-provoking                                             54
  b. better concentration on thinking and responding                    12
  c. better control of the pace of test taking                          12
  d. being a fair test
  e. being able to practice loudly
  f. getting used to working on the computer
  g. test-like
  h. having spare time during the tasks
  i. not getting shy
Both the same                                                7           9
  b. having both advantages and disadvantages
Face-to-face mode                                           45          58
  a. feeling natural to talk in front of the interviewer    12          27
  b. less anxiety-provoking                                  9          20
  c. similar to the real communication situation             9          20
  d. pleasant to talk to real people                         7          16
  e. possible to clarify instructions with the interviewer   5          11
  f. motivated to use expression and gestures                3           7


Q5. Which test do you think reflected your English level more accurately?

As shown in Table 4.24, half of the students (50.5%) considered the face-to-face
test to give a better representation of their actual English speaking level, whereas only
11.6% chose the computer mode. Table 4.29 presents the percentage of comments given
for each option. According to Table 4.29, those who commented on choosing the
face-to-face mode mainly believed that it was similar to a real communication situation
(55%) and that it was less anxiety-provoking (16%). Here are some examples of the
comments:

a. In the situation of speaking English, it usually involves conversation between
   people. So, I think the face-to-face test, which is more similar to real
   communication than the computer test, can measure my English speaking
   ability more accurately. (Face-to-face)

a. When we actually speak, there are always other people present to listen to us or
   talk to us. I think it is meaningless to talk to a computer. (Face-to-face)

Table 4.29 Summary of comments on test validity (Q5)

Reasons                                                     Frequency   Percentage
                                                            (N = 54)    (%)
Computer mode                                               11          20
  b. no influence of the interviewer
  c. test-like
Both the same                                                5           9
  b. having both advantages and disadvantages
Face-to-face mode                                           38          70
  a. similar to the real communication situation                        55
  b. less anxiety-provoking                                             16
  c. possible to clarify instructions with the interviewer
  d. feeling natural to talk in front of the interviewer
  e. nonverbal performance should also be evaluated
  f. motivated to talk more
  g. providing a good degree of pressure to concentrate


Q6. Which type of test would you prefer to take in the future?

Table 4.24 reveals that more than 60% of the examinees (61.6%) would prefer the
face-to-face test when given a choice, while only 30.2% of the students chose the
computer test. As indicated in Table 4.30, those who preferred the face-to-face test gave
the following reasons: (a) they felt less nervous (31%); (b) it was similar to real
communication (29%); and (c) it was pleasant to talk to real people (17%). Interestingly,
those who chose the computer test also mentioned feeling less anxious as the primary
reason (77%). Specific comments included the following:

a. Since I felt quite calm in the face-to-face test, I think this type of speaking test
   suits me well. (Face-to-face)

b. Although I felt a little nervous in the face-to-face test, I think it is similar to the
   actual communication situation where we need to talk to native speakers. Also,
   taking the test seems to be good practice. That's why I prefer the face-to-face
   test. (Face-to-face)

b. In daily life, we have a chance to speak English in front of someone, but we
   seldom need to talk to a computer. So the face-to-face test seems to be the
   more natural one. (Face-to-face)

a. If the only purpose of taking the test is to pass it, I would prefer the
   computer-delivered speaking test because I didn't feel very nervous during the
   test. (Computer)

a. Since I felt very nervous during the face-to-face test, I prefer the computer test
   in which I could perform to the best as I usually do. (Computer)


Table 4.30 Summary of comments on preference of the test modes (Q6)

Reasons                                                     Frequency   Percentage
                                                            (N = 72)    (%)
Computer mode                                               22          31
  a. less anxiety-provoking                                             77
  b. being a fairer test without the interviewer's influence
  c. because he (she) is good at it
  d. better concentration
Both the same                                                2           3
  a. having both advantages and disadvantages                          100
Face-to-face mode                                           48          67
  a. less anxiety-provoking                                             31
  b. similar to the real communication situation                        29
  c. pleasant to talk to real people                                    17
  d. motivated to strive for better performance
  e. possible to clarify instructions with the interviewer
  f. concerned with quality of recording with computer
  g. getting used to the face-to-face test format

