Are We All On the Same Page? An Exploratory Study of OPI Ratings Across NATO Countries Using the NATO STANAG 6001 Scale*

Julie J. Dubeau, Canadian Defence Academy
BILC Conference, San Antonio, Texas, May 20-24, 2007

*This research was conducted as an MA thesis, Carleton University, September 2006


Page 1:

Are We All On the Same Page?

An Exploratory Study of OPI Ratings Across NATO Countries

Using the NATO STANAG 6001 Scale*

Julie J. Dubeau, Canadian Defence Academy

BILC Conference, San Antonio, Texas, May 20-24, 2007

*This research was conducted as an MA thesis, Carleton University, September 2006

Page 2:

Presentation Outline

Context
Research Questions
Literature Review
Methodology
Results: ratings, raters, scale use
Conclusion

Page 3:

NATO Language Testing Context

Standardized Language Profile (SLP) based on the NATO Standardization Agreement (STANAG) 6001 Language Proficiency Levels

26 NATO countries, 20 Partnership for Peace (PfP) countries

Interoperability is essential

Page 4:

Research Questions

The overarching research question was:

How comparable or consistent are ratings across NATO raters and countries?

Page 5:

Research Questions

Research questions pertaining to the ratings (RQ1)

Research questions pertaining to raters' training and background (RQ2)

Research questions pertaining to the rating process and to the scale (RQ3)

Page 6:

Literature Review

Testing Constructs What are we testing?

Rater Variance How do raters vary?

Page 7:

Methodology

Design of study: exploratory survey

Participants: recruited at the BILC Conference, Sofia, 2005

103 raters from 18 countries and 2 NATO units

Control group

Page 8:

Methodology: Instrumentation, Procedure & Analysis

Rater data questionnaire

2 Oral Proficiency Interviews (OPIs) A & B

Questionnaire accompanying each sample OPI

Page 9:

Methodology: Analysis

Rating comparisons: original ratings, 'plus' ratings

Rater comparisons: training, background

Page 10:

Methodology

Country-to-country comparisons

Within-country dispersion

Rating process; rating factors

Rater/scale interaction; scale user-friendliness

Page 11:

Results: RQ1 Summary

Ratings: to compare OPI ratings across NATO countries, and to explore the efficacy of 'plus' levels (plus ratings).

Some rater-to-rater differences

‘Plus’ levels brought ratings closer to the mean

Some country-to-country differences

Greater ‘within-country’ dispersion

Low correlation between samples A & B

Page 12:

Results: All Ratings for Sample A (level 1)

Level    Number    %
1          46     44.7
1+         14     13.6
2          40     38.8
2+          2      1.9
3           1      1.0
Total     103    100.0
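Grouping ratings by base level (a '1+' counts toward the Level 1 range) can be sketched in a few lines of Python. This is an illustration only: the naive fold below yields 60/42/1, whereas the study's adjusted figures on the next slide (70/32/1) involved more than this simple fold.

```python
from collections import Counter

def base_level(rating: str) -> int:
    """Map a STANAG rating such as '1+' to its base level (here, 1)."""
    return int(rating.rstrip("+"))

# Raw Sample A ratings from the table above: rating -> number of raters
sample_a = {"1": 46, "1+": 14, "2": 40, "2+": 2, "3": 1}

ranges = Counter()
for rating, count in sample_a.items():
    ranges[base_level(rating)] += count

total = sum(ranges.values())
for level in sorted(ranges):
    pct = 100 * ranges[level] / total
    print(f"Within Level {level} range: {ranges[level]} ({pct:.1f}%)")
```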

Page 13:

Results: All Ratings (with +) for Sample A

Range                   Number    %
Within Level 1 range      70     68.0
Within Level 2 range      32     31.1
Within Level 3 range       1      1.0
Total                    103    100.0

Page 14:

[Figure: Stacked view of OPI ratings for Sample A, adjusted scores with 'pluses' — counts within the Level 1, Level 2, and Level 3 ranges.]

Page 15:

[Figure: All countries' means for Sample A — overall country mean (y-axis 1.00 to 2.40) by country number (1 to 20).]
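Country means like those charted above can be computed by coding each rating numerically. A minimal Python sketch, assuming a plus adds 0.5 to the base level (the study's exact coding scheme is not shown in the slides):

```python
def numeric(rating: str) -> float:
    """Code a STANAG rating as a number; assume a plus adds 0.5 (illustrative)."""
    base = int(rating.rstrip("+"))
    return base + (0.5 if rating.endswith("+") else 0.0)

def country_mean(ratings):
    """Mean numeric rating across one country's raters."""
    return sum(numeric(r) for r in ratings) / len(ratings)

# Hypothetical ratings from one country's raters for Sample A
print(round(country_mean(["1", "1+", "2", "1"]), 2))  # (1 + 1.5 + 2 + 1) / 4
```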

Page 16:

Results: All Ratings for Sample B (level 2)

Level    Number    %
1           2      1.9
1+          1      1.0
2          47     45.6
2+          8      7.8
3          34     33.0
3+          2      1.9
4           2      1.9
Total      96     93.2

Page 17:

[Figure: Stacked view of OPI ratings for Sample B, adjusted '+' range — counts within the Level 1 through Level 4 ranges.]

Page 18:

[Figure: All countries' means for Sample B — country mean (y-axis 1.80 to 3.30) by country number (1 to 20).]

Page 19:

Samples A & B

Spearman rank-order correlation coefficient: ρ = .57

Pearson product-moment correlation coefficient: r = .55

These low correlations between the two sets of ratings (Samples A & B) indicate little consistency across raters.
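Both coefficients can be computed without specialist software. A self-contained Python sketch on hypothetical paired ratings (the study's raw data are not in the slides):

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation coefficient r."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank(values):
    """Average ranks, 1-based; ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))

# Hypothetical: the same raters' ratings of Sample A and Sample B
a = [1, 1, 2, 1, 2, 1, 3, 2]
b = [2, 3, 2, 2, 3, 3, 4, 2]
print(f"r = {pearson(a, b):.2f}, rho = {spearman(a, b):.2f}")
```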

Page 20:

Results: RQ2 Summary

Raters: to investigate rater training and scale training and see how (or if) they impacted the ratings, and to explore how various background characteristics impacted the ratings.

Trained raters scored within the mean, especially for Sample B

Experienced raters did not do as well as scale-trained raters

Full-time raters were closer to the mean

'New' NATO raters were closer to the mean

No difference in ratings between native-speaker (NS) and non-native-speaker (NNS) raters

Page 21:

[Figure: Tester (rater) training — substantial to lots: 63.27%; none to little: 36.73%.]

Page 22:

Rating B and Tester Training Crosstabulation

                      Little   Lots   Total
Score B correct? Yes    14      44      58
                 No     20      14      34
Missing                  2       4       6
Total                   36      62      98
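A crosstabulation like the one above can be rebuilt from per-rater records. A minimal Python sketch using the Yes/No counts from the table (the 6 'missing' cases are left out):

```python
from collections import Counter

def crosstab(rows, cols):
    """Tab-separated crosstab of (row, col) category pairs, with margins."""
    cells = Counter(zip(rows, cols))
    row_cats = sorted(set(rows))
    col_cats = sorted(set(cols))
    lines = ["\t".join(["", *col_cats, "Total"])]
    for r in row_cats:
        counts = [cells[(r, c)] for c in col_cats]
        lines.append("\t".join([r, *map(str, counts), str(sum(counts))]))
    col_totals = [sum(cells[(r, c)] for r in row_cats) for c in col_cats]
    lines.append("\t".join(["Total", *map(str, col_totals), str(len(rows))]))
    return "\n".join(lines)

# Per-rater records rebuilt from the cell counts above: (training, scored B correctly?)
records = [("Lots", "Yes")] * 44 + [("Lots", "No")] * 14 + \
          [("Little", "Yes")] * 14 + [("Little", "No")] * 20
training = [t for t, _ in records]
correct = [c for _, c in records]
print(crosstab(correct, training))
```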

Page 23:

[Figure: STANAG scale training — substantial to lots: 40.0%; none to little: 60.0%.]

Page 24:

Rating B and STANAG Training Crosstabulation

                       Little   Lots   Total
Rating B correct? Yes    28      29      57
                  No     24       8      32
Missing                   5       1       6
Total                    57      38      95

Page 25:

[Figure: Years of experience — 0 to 1 year: 14.85%; 2 to 3 years: 19.8%; 4 to 5 years: 15.84%; 5 years +: 49.5%.]

Page 26:

Rating B and 4 Yrs Experience Crosstabulation

                       3 yrs or less   4 yrs or more   Total
Rating B correct? Yes        26              34           60
                  No          6              29           35
Missing                       3               3            6
Total                        35              66          101

Page 27:

Results: Raters' Background

Work in testing full-time? Yes: 34 (33.0%), No: 67 (65.0%)

Full-time testers were more reliable

60% were non-native speakers (NNS)

53% were from 'older' NATO countries

Page 28:

'Old' & 'New' NATO Countries

                 Rating B correct?
                 Yes   No   Other/Missing   Total
New NATO? Yes     27    6         4           37
          No      27   26         2           55
Total             54   32         6           92

Page 29:

'Old' & 'New' NATO Countries

                 Summary of Tester Trg
                 Little   Lots   Total
New NATO? Yes       6      30      36
          No       23      28      51
Total              29      58      87

Page 30:

Results: RQ3 Summary

Scale: to explore the ways in which raters used the various STANAG statements and rating factors to arrive at their ratings.

Rating process did not affect ratings significantly

Rating factors were not equal everywhere

3 main 'types' of raters emerged: evidence-based, intuitive, extra-contextual

Page 31:

Results An ‘evidenced-based’ rating for Sample B (level

2):

This candidate’s performance cannot be rated as 2+. Grammatical/structural control is inadequate and does not rise above (even occasionally) into the upper level. Mispronunciation detracts from the delivery and can be problematic. No evidence of well-controlled but extended discourse. No clear evidence of the use of even some complex structures that might raise the performance to the + level. Finally, there is no evidence that the performance rises and crosses into level 3. (Rater 36)

Page 32:

Results An ‘intuitive’ rating for Sample A (level 1):

I would say that just about every single sentence in the interpretation of the level 2 speaking could be applied to this man. And because of that I would say that he is literally at the top of level 2. He is on the verge of level 3 literally. So I would automatically up him to a low 3. (Rater 1)

Page 33:

Results An ‘extra-contextual’ rating for Sample A (level 1):

I wouldn’t give him a 2 plus but I would give him a 3 minus. I have to admit that I am basing that decision on the fact that by demonstrating he is a high 2 in every single aspect of the description of a level 2, I would give him a sort of vote of confidence that in any job abroad he might have a hard time at first but I think he could handle really working in the language. (Rater 1)

Page 34:

Results: An 'extra-contextual' rating for Sample A (level 1):

Yes! I would be happy to give him a 1+. Since we do not use ‘plus levels’ I am afraid that rating him as a clear 1 would disadvantage him and, for this reason, I would rather give him a very low 2. (Rater 20)

Page 35:

Results An ‘extra-contextual’ rating for Sample A (level 1):

I got to question 7 and re-read the STANAG document and now I think ‘2’ is more appropriate. (Rater 95)

***Level 3 is the basic level needed for officers in (my country). I think the candidate could perform the tasks required of him. He could easily be bulldozed by native speakers in a meeting, but would hold his own with non-native speakers. He makes mistakes that very rarely distort meaning and are rarely disturbing. (Rater 95)

Page 36:

Results: Control group

Ratings comparable to those of the less-trained group of participants

Evidence-based ratings

Page 37:

Implications

Plus levels beneficial

Training uneven

Frequent re-training

Different grids

Institutional perspectives

Page 38:

Limitations & Future Research

OPIs new to some participants

Future research could: get participants to test, investigate rating grids, look at other skills

Page 39:

Conclusion: So, are we all on the same page?

YES! BUT…

Plus levels were instrumental in bridging the gap

Training was found to be key to reliability

More in-country norming should be the first step toward international benchmarking

Page 40:

Thank You! Questions?

Are We All On the Same Page?

An Exploratory Study of OPI Ratings Across NATO Countries

Using the NATO STANAG 6001 Scale

Julie J. Dubeau
[email protected]

The full thesis is available on the CDA website:
http://cda.mil.ca/dpd/engraph/services/lang/lang_e.asp
(A condensed article is also forthcoming)