How to remove an outlier tester

Lucjan Janowski

Faculty of Electrical Engineering, Automatics, Computer Science and Electronics

Department of Telecommunications

Agenda

• Can a tester be an outlier?

• The detection philosophy

• Latent variables

• Rasch model

• WinSteps

• The final decision

• Conclusion

2008 I 05-07

Can a tester be an outlier?

What would we like to model?

• Why do we use testers?

• A tester represents human perception, which is difficult to model

• People are different, and so are our users/clients. Our goal is to take such differences into account

• Some of us are critical and others are uncritical

• A tester can be tired or not focused enough, and therefore his/her answers can be random

A tired tester problem

• A user can be tired too. Should we remove all tired testers?

• Can a tester score randomly? What are the consequences?

• Note that a tester who scores a picture differently from the average is not necessarily a random tester

• We have to be very careful with tester removal, since our goal is to build a model of the average user, not of the "proper" user

Why are some scores different?

• Different effects can influence each tester's judgement differently (e.g. motion intensity, color, etc.)

• Testers have different experience (e.g. watching mainly YouTube or films on a DVD set)

• Each of us is more or less critical of anything that he/she judges

• The words describing the opinion scale can be understood differently (in Poland "OK" means good, in England "OK" means fair)

What can we do?

• We have to detect random scores

• A tester who often scores randomly should be removed from the model building

• An answer that differs from the average score is not necessarily a random one; therefore we have to consider the average score corrected by the tester's individualism

• We need a mathematical model of user behavior that takes these properties into account

Latent variable

[Diagram with the labels "Any distortion that influences QoE", "This is what a tester sees", "Latent variable" and "OS" (opinion score)]

Latent variable manifestation

[Figure: four opinion-score scales, each marked 5 4 3 2 1]

An example

Scores by tester ID (rows) and video ID (columns, increasing distortion):

Tester ID    0   1   2   3   4   5   6   7   8   9  10
147         10   9  10   7   4   2   5   4   2   1   1
148         10   9   8   5   4   3   1   3   2   1   1
149          8  10   9   2   7   4   3   1   1   0   1
150          9   9   9   5   6   5   3   2   5   2   2
151          8   7   8   7   6   6   5   2   5   3   2
152         10   9   7   8   7   4   3   3   2   1   1
153          3   6   4   3   3   3   3   3   3   2   1

Testers who avoid extreme values

[Score table repeated from "An example"]

Wide range for 10 and 1

[Score table repeated from "An example"]

Critical tester

[Score table repeated from "An example"]

Are the answers random?

[Score table repeated from "An example"]

Rasch model

• We assume that a latent variable is the variable that is really scored by testers

• We assume that the opinion score probability is a logistic (inverse-logit) function of the model parameters

• The function has parameters describing:

– a tester's "criticism" factor

– the quality of a film/picture/…

– an average threshold value for a particular score

Rasch model equation

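• n: the tester number

• i: the object number (what is scored)

• x: the opinion score value (1-5, 0-10, …)

$$P_{nix} = \frac{e^{(\beta_n - \delta_i - \tau_x)}}{1 + e^{(\beta_n - \delta_i - \tau_x)}}$$

where \beta_n, \delta_i and \tau_x are the tester, object and score-threshold parameters.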
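A logistic form like this can be evaluated in a few lines. The sketch below assumes the rating-scale variant of the Rasch model, in which the logistic expression is the probability of score x relative to the adjacent score x-1; the slides do not say which variant is used, and the parameter names `beta` (tester), `delta` (object) and `taus` (thresholds) are illustrative.

```python
import math


def logistic(z):
    """Logistic function e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(beta, delta, taus):
    """Probability of each score category 0..len(taus) for one tester and one object.

    Assumes the rating-scale form: the log-odds of category x versus
    category x-1 equal beta - delta - taus[x-1].
    """
    # Cumulative log-weights; the lowest category has weight exp(0).
    log_weights = [0.0]
    for tau in taus:
        log_weights.append(log_weights[-1] + (beta - delta - tau))
    peak = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameter values, chosen only to exercise the functions.
probs = category_probabilities(beta=0.5, delta=-0.2, taus=[-1.0, 0.0, 1.0, 2.0])
```

With a single threshold the result reduces to the plain logistic expression, which is a quick way to check the implementation.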

[Score table repeated from "An example", annotated with the tester index n]

Rasch model

• We assume that the Rasch model is correct and that data which do not fit this model are incorrect [sic]

• Note that without any assumption we are not able to detect randomly scoring testers

[Diagram with the labels "Data", "Model values" and "Observed values"]

$$E_{ni} = \sum_{x=1}^{5} x \, P_{nix}$$
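The model value for one tester and one object is the probability-weighted average of the possible scores, and it can be compared directly with the observed score. A minimal sketch, assuming the model probabilities are already available; the probability vector below is made up for illustration and does not come from the presentation.

```python
def expected_score(probabilities, score_values):
    """Model value: sum over all scores x of x * P(x)."""
    return sum(x * p for x, p in zip(score_values, probabilities))


def residual(observed, probabilities, score_values):
    """Observed value minus model value for one tester and one object."""
    return observed - expected_score(probabilities, score_values)


# Hypothetical model probabilities for the scores 1..5.
score_values = [1, 2, 3, 4, 5]
probabilities = [0.1, 0.2, 0.4, 0.2, 0.1]

model_value = expected_score(probabilities, score_values)
difference = residual(5, probabilities, score_values)
```

A large residual on a single object is not alarming by itself; it becomes informative once it is aggregated over all objects scored by the same tester.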

OMS (Outfit Mean Square)

• Knowing the model probability and the user's answer, we can estimate how far a tester is from the model

• A tester's accuracy or quality is judged on the basis of the OMS (Outfit Mean Square)

• The Rasch model can be fitted with the WinSteps software (http://www.winsteps.com/)

• The OMS can be interpreted on the basis of heuristically obtained ranges
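The outfit mean square is conventionally defined as the mean of the squared standardized residuals of one tester over all scored objects. The sketch below follows that conventional definition and assumes the model probabilities have already been estimated (for example by WinSteps); the slides do not spell out the formula, and the function name and input layout are illustrative.

```python
def outfit_mean_square(observed, probabilities, score_values):
    """Outfit mean square of one tester.

    observed      -- the tester's score for each object
    probabilities -- for each object, the model probability of every score value
    score_values  -- the possible scores, in the same order as the probabilities
    """
    squared_z = []
    for x_obs, probs in zip(observed, probabilities):
        expected = sum(x * p for x, p in zip(score_values, probs))
        variance = sum((x - expected) ** 2 * p for x, p in zip(score_values, probs))
        # Squared standardized residual for this object.
        squared_z.append((x_obs - expected) ** 2 / variance)
    return sum(squared_z) / len(squared_z)


# Hypothetical three-point scale where the model expects a score of 2 everywhere.
scale = [1, 2, 3]
model = [[0.25, 0.5, 0.25]] * 4

typical = outfit_mean_square([1, 2, 3, 2], model, scale)
erratic = outfit_mean_square([1, 3, 1, 3], model, scale)
```

Values near 1 mean the tester is about as noisy as the model predicts; larger values mean the answers are more surprising than the model allows.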

Results interpretation

OMS range          Interpretation
OMS > 2            The tester is not relevant and should be removed
1.5 < OMS < 2      We should be suspicious
0.5 < OMS < 1.5    Correct tester
OMS < 0.5          The tester fits the model too well

Example results

Tester ID    0   1   2   3   4   5   6   7   8   9  10    OMS
147         10   9  10   7   4   2   5   4   2   1   1   1.78
148         10   9   8   5   4   3   1   3   2   1   1   1.23
149          8  10   9   2   7   4   3   1   1   0   1   2.81
150          9   9   9   5   6   5   3   2   5   2   2   0.90
151          8   7   8   7   6   6   5   2   5   3   2   0.76
152         10   9   7   8   7   4   3   3   2   1   1   1.36
153          3   6   4   3   3   3   3   3   3   2   1   0.67

Rasch model disadvantages

• The model becomes more accurate with more data, but collecting many results is difficult because the tests are expensive

• Not all types of correct tester behavior can be modeled

• The algorithms are not implemented in Matlab, so it is difficult to include them in an automatic analysis done in Matlab

Conclusion

• A tester's answers make it possible to model human perception, but not all of his/her answers are correct

• Outliers should be removed

• The Rasch model helps to detect testers who are not relevant

• The final decision should be checked, since not all correct behaviors can be modeled by the Rasch model
