How to remove an outlier tester
Lucjan Janowski
Faculty of Electrical Engineering, Automatics, Computer Science and Electronics
Department of Telecommunications
2
Agenda
• Can a tester be an outlier?
• The detecting philosophy
• Latent variables
• Rasch model
• WinSteps
• The final decision
• Conclusion
2008 I 05-07
3
Can a tester be an outlier?
4
What would we like to model?
• Why do we use testers?
• A tester represents human perception, which is difficult to model
• People are different, and so are our users/clients. Our goal is to take such differences into account
• Some of us are critical and others are uncritical
• A tester can be tired or not focused enough, and his/her answers can therefore be random
5
A tired tester problem
• A user can be tired too. Should we remove all tired testers?
• Can a tester score randomly? What are the consequences?
• Note that a tester who scores a picture differently from the average score is not necessarily a random tester
• We have to be very careful when removing testers, since our goal is to build a model of the average user, not of the proper user
6
Why are some scores different?
• Different effects (e.g. motion intensity, color, etc.) can influence each tester’s judgement differently
• Testers have different experience (e.g. watching mainly YouTube or films on a DVD set)
• Each of us is more or less critical of anything that he/she judges
• The words describing the opinion scale can be understood differently (in Poland "OK" means good; in England "OK" means fair)
7
What can we do?
• We have to detect random scores
• A tester who often scores randomly should be removed from the model building
• An answer that differs from the average score is not necessarily a random one; we therefore have to consider the average score corrected for the tester’s individuality
• We need a mathematical model of user behavior that takes those properties into account
8
Latent variable
[Diagram labels: OS (opinion score); "This is what a tester sees"; "Any distortion that influences QoE"]
9
Latent variable
[Diagram labels: OS (opinion score); "Latent variable"; "This is what a tester sees"; "Any distortion that influences QoE"]
10
Latent variable manifestation
[Figure: four opinion-score scales, each marked 5 4 3 2 1]
11
An example
Tester ID  Video ID (increasing distortion)
            0   1   2   3   4   5   6   7   8   9  10
147        10   9  10   7   4   2   5   4   2   1   1
148        10   9   8   5   4   3   1   3   2   1   1
149         8  10   9   2   7   4   3   1   1   0   1
150         9   9   9   5   6   5   3   2   5   2   2
151         8   7   8   7   6   6   5   2   5   3   2
152        10   9   7   8   7   4   3   3   2   1   1
153         3   6   4   3   3   3   3   3   3   2   1
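As a rough illustration of how these scores can be inspected before any model is fitted, the sketch below loads the table into plain Python and prints each tester's mean score, score range, and mean absolute deviation from the per-video average. The names `scores`, `summarize` and `video_means` are illustrative assumptions and do not come from the slides.

```python
from statistics import mean

# Scores from the example table: tester ID -> scores for videos 0..10
scores = {
    147: [10, 9, 10, 7, 4, 2, 5, 4, 2, 1, 1],
    148: [10, 9, 8, 5, 4, 3, 1, 3, 2, 1, 1],
    149: [8, 10, 9, 2, 7, 4, 3, 1, 1, 0, 1],
    150: [9, 9, 9, 5, 6, 5, 3, 2, 5, 2, 2],
    151: [8, 7, 8, 7, 6, 6, 5, 2, 5, 3, 2],
    152: [10, 9, 7, 8, 7, 4, 3, 3, 2, 1, 1],
    153: [3, 6, 4, 3, 3, 3, 3, 3, 3, 2, 1],
}


def summarize(tester_scores):
    """Return (mean, min, max) for one tester's list of scores."""
    return mean(tester_scores), min(tester_scores), max(tester_scores)


# Per-video average over all testers (the "average score" of the slides)
video_means = [mean(column) for column in zip(*scores.values())]

for tester, row in scores.items():
    avg, low, high = summarize(row)
    # Mean absolute deviation of this tester from the per-video average
    deviation = mean(abs(s - m) for s, m in zip(row, video_means))
    print(f"{tester}: mean={avg:.2f} range={low}-{high} deviation={deviation:.2f}")
```

Such summaries only describe the data; on their own they cannot separate a consistently critical tester from a randomly scoring one.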
12
Testers who avoid extreme values
13
Wide range for 10 and 1
14
Critical tester
15
Are the answers random?
16
Rasch model
• We assume that a latent variable is the variable that is really scored by testers
• We assume that the opinion score probability is a logistic function of the model parameters (i.e. its logit is linear in them)
• The function has parameters describing:
  – a tester “criticism” factor
  – a film/picture/… quality
  – an average threshold value for a particular score
17
Rasch model equation
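• n the tester number
• i the object number (what is scored)
• x the opinion score value (1-5, 0-10, …)

\[ P_{nix} = \frac{e^{(\beta_n - \delta_i - \tau_x)}}{1 + e^{(\beta_n - \delta_i - \tau_x)}} \]

(The Greek symbols and signs follow standard Rasch rating-scale notation and are assumed here: \beta_n for the tester, \delta_i for the object, \tau_x for the threshold of score x.)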
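A minimal sketch of the logistic form on this slide, assuming the standard Rasch convention in which the exponent is the tester parameter minus the object parameter minus the score threshold. The argument names `beta_n`, `delta_i` and `tau_x` and the numeric values are assumptions made for illustration; they are not taken from the slides or from WinSteps.

```python
import math


def rasch_probability(beta_n, delta_i, tau_x):
    """Logistic response probability e^z / (1 + e^z), z = beta_n - delta_i - tau_x."""
    z = beta_n - delta_i - tau_x
    return math.exp(z) / (1.0 + math.exp(z))


# Hypothetical parameter values, chosen only for illustration
print(rasch_probability(beta_n=0.5, delta_i=0.0, tau_x=0.5))  # exponent is 0
print(rasch_probability(beta_n=2.0, delta_i=0.5, tau_x=0.0))  # exponent is 1.5
```

When the three parameters cancel out the probability is exactly one half, and it grows towards 1 as the tester parameter increases relative to the other two.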
18
[Table: the example scores from slide 11, annotated with the tester index n]
20
Rasch model
• We assume that the Rasch model is correct and that the data that do not fit this model are incorrect [sic]
• Note that without any assumption we are not able to detect randomly scoring testers
[Diagram labels: Data; Model values; Observed values]

\[ E_{ni} = \sum_{x=1}^{5} x \, P_{nix} \]
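The expected (model) score for one tester and one object is the probability-weighted sum of the score values. The sketch below assumes the category probabilities are already available as a mapping from score to probability; the function name `expected_score` and the probabilities are hypothetical.

```python
def expected_score(category_probs):
    """Expected opinion score: sum over x of x * P(x).

    category_probs maps each score value x to its model probability.
    """
    total = sum(category_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return sum(x * p for x, p in category_probs.items())


# Hypothetical probabilities for one tester/video pair on a 1-5 scale
probs = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.25}
print(expected_score(probs))
```

The observed score can then be compared with this expected value to obtain a residual for each tester/object pair.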
21
OMS (Outfit Mean Square)
• Knowing the model probability and the tester’s answer, we can estimate how far a tester is from the model
• A tester’s accuracy or quality is assessed with the OMS (Outfit Mean Square)
• The Rasch model can be computed with the WinSteps software (http://www.winsteps.com/)
• The OMS can be interpreted on the basis of heuristically obtained ranges
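The slides do not spell out how the OMS is calculated. The sketch below uses the usual definition of an outfit mean square, namely the mean over objects of the squared standardized residual (observed minus expected score, squared, divided by the model variance). That definition, the function name `outfit_mean_square` and the probabilities are assumptions for illustration and may differ in detail from what WinSteps reports.

```python
def outfit_mean_square(observed, category_probs):
    """Outfit mean square for one tester.

    observed: the tester's scores, one per object.
    category_probs: one dict per object, mapping score x -> model P(x).
    Assumes every object has non-zero model variance.
    """
    squared_residuals = []
    for x_obs, probs in zip(observed, category_probs):
        expected = sum(x * p for x, p in probs.items())
        variance = sum((x - expected) ** 2 * p for x, p in probs.items())
        squared_residuals.append((x_obs - expected) ** 2 / variance)
    return sum(squared_residuals) / len(squared_residuals)


# Hypothetical model probabilities for two objects on a 1-3 scale
model = [
    {1: 0.25, 2: 0.50, 3: 0.25},
    {1: 0.25, 2: 0.50, 3: 0.25},
]
print(outfit_mean_square([2, 2], model))  # answers equal to the expectation
print(outfit_mean_square([1, 3], model))  # answers one step away from it
```

Answers that match the model expectation give a value near 0, while answers far from it push the value above 1, which is what the heuristic ranges interpret.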
22
Results interpretation
• OMS > 2: the tester is not relevant and he/she should be removed
• 1.5 < OMS < 2: we should be suspicious
• 0.5 < OMS < 1.5: correct tester
• OMS < 0.5: the tester fits the model too well
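The ranges on this slide can be written as a small lookup. The slide uses strict inequalities and does not say how values exactly equal to 0.5, 1.5 or 2 are treated, so the boundary handling below is an assumption, as are the function name `classify_oms` and the label strings.

```python
def classify_oms(oms):
    """Map an OMS value to the heuristic ranges listed on the slide.

    Boundary values (exactly 0.5, 1.5 or 2) are assigned to the milder
    category; the slide leaves them unspecified.
    """
    if oms > 2.0:
        return "remove"
    if oms > 1.5:
        return "suspicious"
    if oms >= 0.5:
        return "correct"
    return "fits too well"


# Sample values chosen to hit each range
for value in (2.81, 1.78, 0.90, 0.40):
    print(value, classify_oms(value))
```

The result is a flag for manual review rather than an automatic verdict, since the final decision still has to be checked by hand.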
23
Example results
Tester ID  Video ID (increasing distortion)             OMS
            0   1   2   3   4   5   6   7   8   9  10
147        10   9  10   7   4   2   5   4   2   1   1   1.78
148        10   9   8   5   4   3   1   3   2   1   1   1.23
149         8  10   9   2   7   4   3   1   1   0   1   2.81
150         9   9   9   5   6   5   3   2   5   2   2   0.90
151         8   7   8   7   6   6   5   2   5   3   2   0.76
152        10   9   7   8   7   4   3   3   2   1   1   1.36
153         3   6   4   3   3   3   3   3   3   2   1   0.67
24
Rasch model disadvantages
• It becomes more accurate with more data, but it is difficult to collect many results since the tests are expensive
• Not all types of correct tester behavior can be modeled
• The algorithms are not implemented in Matlab, so it is difficult to include them in an automatic analysis done in Matlab
25
Conclusion
• A tester’s answers make it possible to model human perception, but not all of his/her answers are correct
• Outliers should be removed
• The Rasch model helps to detect testers who are not relevant
• The final decision should be checked, since not all correct behaviors can be modeled by the Rasch model