I wish I could believe you: the frustrating unreliability of some assessment research
Tim Hunt & Sally Jordan, The Open University
@tim_hunt @SallyJordan9
Are these two things related?
Trick question (of course)
From a great web site: http://www.tylervigen.com/spurious-correlations
Correlation & Causation
Sly (1999)
614 students P01 S01 S02
Practice tests as formative assessment improve student performance on computer-managed learning assessment
A computerised assessment was quite exciting in itself back in 1999!
Questions picked at random from a bank.
P01 & S01 used the same test bank. S02 was different, with no practice.
Sly (1999)
Practice tests as formative assessment improve student performance on computer-managed learning assessment

| Students (614 total) | P01 | S01 | S02 | Students (609 total) |
| --- | --- | --- | --- | --- |
| 417 | 62.18% | 72.72% | 66.88% | 415 |
| 197 | – | 67.56% | 62.24% | 194 |

All standard deviations 15–17%
Differences: +5.38% (67.56% − 62.18%), +5.16% (72.72% − 67.56%), +4.64% (66.88% − 62.24%)
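The three differences can be recomputed from the mean scores on the slide. The sketch below is illustrative only: the group labels `practice` and `no_practice` are assumed names for the 417 students with a P01 score and the 197 without one. One reading of the numbers is that the gap on S02, where no practice test was available, is almost as large as the gap on S01, so the S01 gap alone may say little about the effect of practising.

```python
# Mean scores copied from the Sly (1999) slide.
# The group labels are assumptions made for this sketch, not terms from the paper.
scores = {
    "practice": {"P01": 62.18, "S01": 72.72, "S02": 66.88},  # 417 students with a P01 score
    "no_practice": {"S01": 67.56, "S02": 62.24},             # 197 students without one
}


def gap(test):
    """Difference in mean score between the two groups on one test."""
    return round(scores["practice"][test] - scores["no_practice"][test], 2)


s01_gap = gap("S01")  # test with practice available to one group
s02_gap = gap("S02")  # test with no practice for anyone

# Non-practising group's S01 mean compared with the practising group's P01 mean.
first_attempt_gap = round(scores["no_practice"]["S01"] - scores["practice"]["P01"], 2)

print(f"S01 gap: {s01_gap:+.2f}")
print(f"S02 gap: {s02_gap:+.2f}")
print(f"S01 (no practice) minus P01: {first_attempt_gap:+.2f}")
```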
OU level 3 physics (SM358)
An investigation into factors affecting physics’ students engagement with online assessment (Bolton & Jordan)
OU level 3 physics (SM358)
The assessment strategy
Grid: 0 to 4 TMAs (columns) by 0 to 6 iCMAs (rows)
OU level 3 physics (SM358)
Proportion of students
0 TMAs 1 TMA 2 TMAs 3 TMAs 4 TMAs
0 iCMAs 11.6% 3.4% 1.5% 0.5% 0.5%
1 iCMA 1.5% 1.0%
2 iCMAs 1.5% 2.4% 1.5%
3 iCMAs 1.5%
4 iCMAs 5.3% 2.4%
5 iCMAs 0.5% 3.9% 5.8% 8.2%
6 iCMAs 0.5% 0.5% 5.8% 5.8% 34.3%
OU level 3 physics (SM358)
Exam mark
0 TMAs 1 TMA 2 TMAs 3 TMAs 4 TMAs
0 iCMAs 6.0
1 iCMA
2 iCMAs 17.0 24.0
3 iCMAs 60.0
4 iCMAs 43.7 62.0
5 iCMAs 23.0 46.0 62.6 69.5
6 iCMAs 35.3 60.8 77.5
OU level 3 physics (SM358)
Exam mark compared to predictive model
0 TMAs 1 TMA 2 TMAs 3 TMAs 4 TMAs
0 iCMAs −20.8
1 iCMA
2 iCMAs −43.9 −27.5
3 iCMAs −9.0
4 iCMAs −15.6 +1.8
5 iCMAs −3.8 −11.1 +1.4 +2.4
6 iCMAs −17.1 +3.4 +4.6
Confounding variables
Berkeley gender bias case (1973)
| | Men: applicants | Men: admitted | Women: applicants | Women: admitted |
| --- | --- | --- | --- | --- |
| Total | 8442 | 44% | 4321 | 35% |
https://en.wikipedia.org/wiki/Simpson%27s_paradox#Berkeley_gender_bias_case
Berkeley gender bias case (1973)
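| Department | Men: applicants | Men: admitted | Women: applicants | Women: admitted |
| --- | --- | --- | --- | --- |
| A | 825 | 62% | 108 | 82% |
| B | 560 | 63% | 25 | 68% |
| C | 325 | 37% | 593 | 34% |
| D | 417 | 33% | 375 | 35% |
| E | 191 | 28% | 393 | 24% |
| F | 272 | 6% | 341 | 7% |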
https://en.wikipedia.org/wiki/Simpson%27s_paradox#Berkeley_gender_bias_case
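A short sketch of how the per-department and pooled figures can point in opposite directions. It uses only the six departments listed above and their rounded percentages, so the pooled rates are approximate and will not match the 44% and 35% totals, which cover all applicants. The variable and function names are illustrative.

```python
# (department, men applicants, men admitted %, women applicants, women admitted %)
# Figures copied from the slide; percentages are already rounded there.
DEPARTMENTS = [
    ("A", 825, 62, 108, 82),
    ("B", 560, 63, 25, 68),
    ("C", 325, 37, 593, 34),
    ("D", 417, 33, 375, 35),
    ("E", 191, 28, 393, 24),
    ("F", 272, 6, 341, 7),
]


def pooled_rate(rows):
    """Admission rate (%) over all rows, weighting each rate by its applicants."""
    applicants = sum(n for n, _ in rows)
    admitted = sum(n * pct / 100 for n, pct in rows)
    return 100 * admitted / applicants


men_pooled = pooled_rate([(m_n, m_pct) for _, m_n, m_pct, _, _ in DEPARTMENTS])
women_pooled = pooled_rate([(w_n, w_pct) for _, _, _, w_n, w_pct in DEPARTMENTS])

# Departments where women were admitted at a higher rate than men.
women_ahead = [dept for dept, _, m_pct, _, w_pct in DEPARTMENTS if w_pct > m_pct]

print(f"Pooled over six departments: men {men_pooled:.1f}%, women {women_pooled:.1f}%")
print(f"Departments where women had the higher rate: {women_ahead}")
```

Women have the higher rate in most of the listed departments, yet the lower pooled rate, because far more women applied to the departments that admit few applicants. Department choice is the confounding variable here.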
Real experiments
What is an experiment?
Split participants into two equal groups.
Split randomly, so if there are confounding variables, they are probably equally split between groups.
Give different ‘treatments’ to each group, trying to keep everything else the same.
Blind the treatment, if possible, to reduce all sorts of biases.
But, blinding is not normally possible in education. (You probably know if you just sat an exam!)
[Pick your favourite research methods book]
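The random split described above can be sketched in a few lines. This is a minimal illustration, not a full experimental protocol: the function name `assign_groups`, the student identifiers and the seed are all made up for the example.

```python
import random


def assign_groups(participants, seed=None):
    """Shuffle the participants and split them into two groups of (near) equal size."""
    rng = random.Random(seed)      # a seeded generator makes the split reproducible
    shuffled = list(participants)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical class of 20 students.
students = [f"student_{i:02d}" for i in range(20)]
treatment, control = assign_groups(students, seed=2015)

print("Treatment group:", sorted(treatment))
print("Control group:  ", sorted(control))
```

Because the assignment does not depend on anything about the students, any confounding variable (prior attainment, motivation, time available) should on average be balanced between the groups.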
Karpicke & Blunt (2011) + many more
Retrieval practice produces more learning than elaborative studying with concept mapping
… but! Wooldridge et al. (2014)
The testing effect with authentic educational materials: A cautionary note
“Based on [the testing effect], … some textbooks are now accompanied by quizzing ancillaries … The quizzes are designed with the assumption that answering factual and application questions will promote a more integrated mental model that incorporates the target knowledge.”
Typically, the quizzes and test banks sample items from similar sub-sections in the textbook but not necessarily the same information.
How reliable is student opinion?
Background
Our own work with interactive computer-marked assignments (iCMAs)
Findings from a questionnaire
| Statement | Definitely agree or mostly agree | Neutral | Mostly or definitely disagree |
| --- | --- | --- | --- |
| Answering iCMA questions helps me to learn | 129 (85%) | 7 (5%) | 12 (8%) |
| If I get the answer to an iCMA question wrong, the computer-generated feedback is useful | 128 (85%) | 11 (7%) | 8 (5%) |

Responses received from 151 students (response rate 20%) (Jordan, 2011)
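As a quick arithmetic check, the percentages can be recomputed from the counts, assuming each one is taken out of all 151 respondents. The sketch also shows that the three answer categories do not quite add up to 151 for either statement; the slide does not say how the remaining respondents answered, so the name `unaccounted` is just a label for that difference.

```python
RESPONDENTS = 151  # number of questionnaire responses reported on the slide

# Counts per statement: (agree, neutral, disagree), copied from the slide.
responses = {
    "Answering iCMA questions helps me to learn": (129, 7, 12),
    "Computer-generated feedback is useful": (128, 11, 8),
}

# Percentage of all respondents in each category, rounded to whole numbers.
percentages = {
    statement: tuple(round(100 * count / RESPONDENTS) for count in counts)
    for statement, counts in responses.items()
}

# Respondents not covered by any of the three categories.
unaccounted = {
    statement: RESPONDENTS - sum(counts)
    for statement, counts in responses.items()
}

for statement in responses:
    print(statement, percentages[statement], "unaccounted:", unaccounted[statement])
```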
Watching students in a usability lab
Six students observed answering questions (Jordan, 2009)
Data analysis
Much more data presented in Jordan (2014)
Reflection
Weaver (2006, p. 386) reports that 90% of students agreed with the statement “Positive comments have boosted my confidence.”
Marriott (2009, p. 243) reports that 93% of students agreed with the statement “I find the immediate reporting of my test result valuable.”
It is almost certainly the case that more students report that they find feedback useful than actually make good use of it. This is in line with the bias in self-reported behaviour that is observed in medicine and business (Jordan, 2014, p. 69).
But: student opinion is important (Dermo, 2009).
We need to consider student opinion, but we also need to consider students’ actual actions.
Ethics
Is it ethical to only give a helpful intervention to half the class?
Are we allowed to do experiments in Education?
Look at evidence-based medicine
How do you know it’s effective if you have not done the experiment?
If you don’t know whether it is effective, is it ethical to use it?
(They have been doing this for a while)
Diagram: NICE, academic researchers, drug companies, doctors, meta-analysis, the literature, medical schools
The end
References
Bolton, J., Jordan, R. & Jordan, S. (2015). An investigation into factors affecting physics' students engagement with online assessment. Manuscript in preparation.
Cohen, L., Manion, L. & Morrison, K. (2011). Research methods in education, 7th Edition, Routledge.
Dermo, J. (2009). e-Assessment and the student learning experience: A survey of student perceptions of e-assessment. British Journal of Educational Technology, 40(2), 203–214.
Goldacre, B. (2008). Bad Science, Fourth Estate.
Goldacre, B. (2012). Bad Pharma, Fourth Estate.
Jordan, S. (2009). Assessment for learning: pushing the boundaries of computer-based assessment. Practitioner Research in Higher Education, 3(1), 11–19.
Jordan, S. (2011). Using interactive computer-based assessment to support beginning distance learners of science. Open Learning, 26(2), 147–164.
Jordan, S. (2014). E-assessment for learning? Exploring the potential of computer-marked assessment and computer-generated feedback, from short-answer questions to assessment analytics. PhD thesis. The Open University. At http://oro.open.ac.uk/41115/.
Karpicke, J. & Blunt, J. (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331(6018), 772–775.
Marriott, P. (2009). Students' evaluation of the use of online summative assessment on an undergraduate financial accounting module. British Journal of Educational Technology, 40(2), 237–254.
Sly, L. (1999). Practice tests as formative assessment improve student performance on computer-managed learning assessments. Assessment & Evaluation in Higher Education, 24(3), 339–343.
Vigen, T. (2014). Spurious Correlations, at http://www.tylervigen.com/spurious-correlations.
Weaver, M. R. (2006). Do students value feedback? Student perceptions of tutors’ written responses. Assessment & Evaluation in Higher Education, 31(3), 379–394.
Wikipedia (2015). Simpson's paradox, at https://en.wikipedia.org/wiki/Simpson%27s_paradox.
Wooldridge, C., Bugg, J., McDaniel, M. & Liu, Y. (2014). The testing effect with authentic educational materials: A cautionary note. Journal of Applied Research in Memory and Cognition, 3(3), 214–221.
Summary
Correlation vs causation
Confounding variables
Experiments – designed to minimise confounding variables
Don't abstract your experiment so much that the results aren't relevant
Student opinion and attitudes are important but different from actions or effectiveness
Ethical issues are real, but should be overcome
@tim_hunt @SallyJordan9