RELIABILITY Prepared by Marina Gvozdeva, Elena Onoprienko, Yulia Polshina, Nadezhda Shablikova

Post on 14-Jan-2016

Outline

1. Defining reliability
2. How to measure reliability
3. Reliability coefficient
4. Observed score and true score
5. SEM
6. Item analyses

Tests as measuring tools

‘A test is something (as a series of questions or exercises) for measuring the skill, knowledge, intelligence, capacities, or aptitudes of an individual or group’

(Merriam Webster Dictionary Online, 2013)

Tests as measuring tools

‘…a language test is a procedure for gathering evidence of general or specific language abilities from performance on tasks designed to provide a basis for predictions about an individual’s use of those abilities in real world contexts.’

(McNamara, 2000:11)

A reliable test

A perfectly reliable test is ‘one which would give precisely the same results for a particular set of candidates regardless of when it happened to be administered.’

(Hughes, 1989:31)

An unreliable test

A completely unreliable test is one ‘which would give sets of results unconnected with each other.’

(Hughes, 1989:32)

Strategies to estimate reliability

We can use statistics to estimate how reliable a test is:
• test-retest reliability;
• equivalent (parallel) forms reliability;
• internal consistency reliability.

Test-retest reliability

‘calculating a reliability estimate by administering a test on two occasions and calculating the correlation between the two sets of scores’

(Brown, 2002)
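
The correlation step in this definition can be sketched in a few lines. The snippet below is a minimal illustration, assuming the correlation meant is the Pearson product-moment coefficient; the function name `pearson_r` and the two score lists are hypothetical and not taken from the slides. The same calculation would apply to two equivalent forms instead of two occasions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (2+ candidates)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five candidates on two occasions.
first_sitting = [42, 55, 61, 48, 70]
second_sitting = [44, 53, 63, 47, 72]

test_retest = pearson_r(first_sitting, second_sitting)
print(f"test-retest reliability estimate: {test_retest:.2f}")
```

A value close to 1.0 suggests that candidates were ranked in much the same way on both occasions.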

Equivalent (parallel/alternative) forms reliability

‘calculating a reliability estimate by administering two forms of a test and calculating the correlation between the two sets of scores’

(Brown, 2002)

Internal consistency reliability

‘calculating a reliability estimate based on a single form of a test administered on a single occasion using internal consistency equations’

(Brown, 2002)

Internal consistency reliability:

• calculating reliability from a single administration of a test;

• some commonly reported figures (reliability coefficients) are:
  - split-half;
  - Cronbach’s alpha;

• calculated automatically by many statistical software packages.
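
For readers who want to see what such a package does, here is a minimal sketch of Cronbach’s alpha using the standard formula (number of items, sum of item variances, variance of total scores). The function name `cronbach_alpha` and the small response matrix are hypothetical; sample variances are assumed throughout.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one row per candidate, one score per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across candidates.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of candidates' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong (1/0) responses: 5 candidates x 6 items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

With real data the matrix would hold every candidate’s score on every item from one sitting of the test.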

Split-half reliability:

• the test is split in half (e.g. odd / even) creating “equivalent forms”;

• the two “forms” are correlated with each other;

• the correlation coefficient is adjusted to reflect the entire test length.
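
The three steps above can be sketched as follows. This is an illustration only: it assumes an odd/even split, a Pearson correlation between the half scores, and the Spearman-Brown formula as the length adjustment (the slide does not name the adjustment). The function names and the response matrix are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """item_scores: one row per candidate, one score per item."""
    # Step 1: split the test into odd- and even-numbered items.
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    # Step 2: correlate the two half-test scores.
    r_half = pearson_r(odd_totals, even_totals)
    # Step 3: Spearman-Brown adjustment to the full test length.
    return 2 * r_half / (1 + r_half)


# Hypothetical right/wrong (1/0) responses: 5 candidates x 6 items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
]

reliability = split_half_reliability(responses)
print(f"split-half reliability (adjusted): {reliability:.2f}")
```

The adjustment matters because each half is only half as long as the real test, and shorter tests tend to produce lower correlations.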

Reliability coefficient:

• range: -1.0 (inverse relationship) to 0.0 (totally unreliable test) to 1.0 (perfectly reliable test);

• reliability coefficients estimate the proportion of variance in the test scores that is systematic rather than measurement error;

• lower reliability coefficient = greater measurement error in the test score.

How high should reliability be?

(Pope n.d.)

Standard error of measurement (SEM):

• The SEM lets us take the score that the test taker actually got on the test (the observed score) and estimate what their true level of ability might be. We can never know the true score exactly, so our estimate of it must be a range of scores rather than a single number.

• Observed score.
• True score.

Standard error of measurement (SEM):

Maria’s scores:

50 50 49 52 50 51 49 48 50

True score = observed score ± error

Standard error of measurement (SEM):

We would expect the student to score near the centre of the distribution most of the time.

Standard error of measurement (SEM):

The standard error of measurement (SEM) is the standard deviation of all those scores averaged across persons and test administrations.

(Brown, 2002)
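
To make the idea of a spread of scores concrete, the centre and standard deviation of Maria’s nine scores can be computed directly. This is only an illustration for one person; it is not the SEM itself, which is defined across persons and administrations. The variable names are hypothetical, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev

# Maria's scores from repeated administrations, as listed in the slides.
maria_scores = [50, 50, 49, 52, 50, 51, 49, 48, 50]

centre = mean(maria_scores)    # where her scores cluster
spread = pstdev(maria_scores)  # how far they typically fall from the centre

print(f"centre = {centre:.2f}, spread = {spread:.2f}")
```

Her scores cluster just below 50 with a spread of about one score point, which is the kind of variation the SEM is meant to summarise.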

Standard error of measurement (SEM):

$\mathrm{SEM} = S_x \sqrt{1 - r_{xx'}}$

$S_x$ – standard deviation of raw scores

$r_{xx'}$ – reliability coefficient
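
The formula translates directly into code. The sketch below assumes the reliability coefficient lies between 0 and 1; the function name and the input values (a standard deviation of 6 and a reliability of 0.75) are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from math import sqrt


def standard_error_of_measurement(sd_raw, reliability):
    """SEM = S_x * sqrt(1 - r_xx')."""
    if sd_raw < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0 <= reliability <= 1:
        raise ValueError("this sketch assumes a reliability between 0 and 1")
    return sd_raw * sqrt(1 - reliability)


# Hypothetical test: raw-score SD of 6, reliability coefficient of 0.75.
sem = standard_error_of_measurement(6, 0.75)
print(f"SEM = {sem:.1f}")
```

Note the two extremes: a perfectly reliable test (coefficient 1.0) gives an SEM of 0, while a coefficient of 0.0 gives an SEM equal to the standard deviation of the raw scores.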

Standard error of measurement (SEM):

1 SEM = 68% confidence
2 SEM = 95% confidence
3 SEM = 99.7% confidence

Standard error of measurement (SEM):

Observed score = 50
SEM = 3

68%: from 47 to 53
95%: from 44 to 56
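
The bands in this example are simply the observed score plus and minus one or two SEMs. A short sketch of that arithmetic follows; the function name `score_band` is hypothetical, and the confidence labels assume normally distributed measurement error.

```python
def score_band(observed, sem, n_sem):
    """Range of scores within n_sem standard errors of the observed score."""
    margin = n_sem * sem
    return (observed - margin, observed + margin)


observed_score = 50
sem = 3

band_68 = score_band(observed_score, sem, 1)
band_95 = score_band(observed_score, sem, 2)

print(f"68%: from {band_68[0]} to {band_68[1]}")
print(f"95%: from {band_95[0]} to {band_95[1]}")
```

Calling `score_band(50, 3, 3)` would give the 99.7% band in the same way.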