

Evaluation in Human-Computer Interaction: Beyond Lab Studies

CHI 2021 Tutorial

Albrecht Schmidt, LMU Munich

Florian Alt, Bundeswehr University Munich

Ville Mäkelä, LMU Munich / Bundeswehr University Munich

Organizers

Albrecht Schmidt, LMU Munich

Florian Alt, Bundeswehr University Munich

Ville Mäkelä, LMU Munich / Bundeswehr University Munich

Course Outline

● Introductions

● Motivation

● Overview of Approaches to Out-of-Lab Studies

● Discussion

● Deep Dive

● Conclusion


Introductions - Breakout Groups

● Breakout groups of 5 people

● 5 minutes

● Each of you discusses two of your own studies:

○ One you ran within the last year

○ One you wanted to run, but didn’t/couldn’t

● The discussion does not need to be fully resolved

… go and talk to each other in the break after the tutorial :-)

Motivation

● Does research need to be validated?

● Why do we evaluate?

● What is a good evaluation?

● Why is it difficult to change to an alternative evaluation?

OVERVIEW OF RESEARCH APPROACHES

Challenges and Opportunities

● Reproducibility: yet we do not value replication (hardly any replication papers are published)

● Internal vs. external validity

● Long-term effects

Freely add your answers, ideas, and comments:

https://docs.google.com/document/d/18rbJORJnEepVF4CwyWaq6ZX-ocqjW_g3oY7SiH2R3d4/edit

Using Existing Stuff

● Use existing data sets

○ Example: [Khamis et al., 2018]

○ Search engine: https://datasetsearch.research.google.com/

● Use (previously collected) log data

○ Examples: [Henze et al., 2013], [Alt et al., 2020]

● Mine existing information

○ Example: [Mäkelä and Schmidt, 2020]

● Meta studies, synthesis of research results, and literature research

○ Examples: [Buschek et al., 2018], [Katsini et al., 2020]

[Alt et al., 2020] Alt, Buschek, Heuss & Müller. 2021. Orbuculum - Predicting When Users Intend to Leave Large Public Displays. In Proc. IMWUT.

[Buschek et al., 2018] Buschek, Hassib & Alt. 2018. Personal Mobile Messaging in Context: Chat Augmentations for Expressiveness and Awareness. In ToCHI.

[Henze et al., 2013] Henze, Sahami, Schmidt, Pielot, Michahelles. 2013. Empirical Research through Ubiquitous Data Collection. In IEEE Computer.

[Mäkelä and Schmidt, 2020] Mäkelä & Schmidt. 2020. I Don’t Care as Long as It’s Good: Player Preferences for Real-Time and Turn-Based Combat Systems in Computer RPGs. In Proc. CHI PLAY ’20.

[Katsini et al., 2020] Katsini, Abdrabou, Raptidis, Khamis, Alt. 2020. The Role of Eye Gaze in Security and Privacy Applications: Survey and Future HCI Research Directions. In Proc. CHI ’20.

[Khamis et al., 2018] Khamis, Baier, Henze, Alt & Bulling. 2018. Understanding Face and Eye Visibility in Front-Facing Cameras of Smartphones used in the Wild. In Proc. CHI ’18.

Web and App Usage

● Create applications / web pages that implement your study / evaluation

○ Example: [von Zezschwitz et al., 2016]

○ Use “interactive” survey tools (e.g., SoSci)

○ Distribution and recruiting channels

■ Crowdsourcing (e.g., MTurk, Prolific, etc.)

■ App stores [Schneegass et al., 2014]

● Piggyback your experiment onto an app / web page

○ Example: ResearchIME [Buschek et al., 2018]

○ How to: [Henze et al., 2013]

[Buschek et al., 2018] Buschek, Bisinger & Alt. 2018. ResearchIME: A Mobile Keyboard Application for Studying Free Typing Behaviour in the Wild. In Proc. CHI ’18.

[Henze et al., 2013] Henze, Sahami, Schmidt, Pielot, Michahelles. 2013. Empirical Research through Ubiquitous Data Collection. In IEEE Computer.

[Schneegass et al., 2014] Schneegass, Steimle, Bulling, Alt & Schmidt. 2014. SmudgeSafe: geometric image transformations for smudge-resistant user authentication. In Proc. UbiComp ’14.

[von Zezschwitz et al., 2016] von Zezschwitz, Eiband, Buschek, Oberhuber, De Luca, Alt, & Hussmann. 2016. On quantifying the effective password space of grid-based unlock gestures. In Proc. MUM ’16.

Users at Home

● Engage with users through remote communication

○ [Fröhlich et al., 2021]

○ [Rivu et al., 2021, case study 2]

● Create prototypes that can be experienced remotely

○ [Rivu et al., 2021, case study 1]

● Supply study equipment to your users at home

○ [Prange et al., 2019]

○ [Bramley et al., 2018]

[Fröhlich et al., 2021] Fröhlich, Wagenhaus, Schmidt, Alt. 2021. Don’t Stop Me Now! Exploring Challenges of First-Time Cryptocurrency Users. In Proc. DIS ’21 (to appear).

[Prange et al., 2019] Prange, Tiefenau, von Zezschwitz, Alt. 2019. Towards Understanding User Interaction in Future Smart Homes. In Proc. CHI EA ’19.

[Rivu et al., 2021] Rivu, Mäkelä, Prange, Delgado Rodriguez, Piening, Zhou, Köhle, Pfeuffer, Abdelrahman, Hoppe, Schmidt, & Alt. 2021. Remote VR Studies: A Framework for Running Virtual Reality Studies Remotely Via Participant-Owned HMDs. arXiv preprint.

[Bramley et al., 2018] Bramley, Goode, Anderson, & Mary. 2018. Researching in-store, at home: Using virtual reality within quantitative surveys. International Journal of Market Research 60, 4 (2018), 344–351.

New Approaches to Evaluation

● Run studies in virtual reality

● Use analytic methods (e.g., the Keystroke-Level Model, KLM)

● Do computational evaluation: prove that a design is better with respect to a defined objective (e.g., keyboard optimization, menu layout)

Mäkelä et al. 2020. Virtual Field Studies: Conducting Studies on Public Displays in Virtual Reality. In Proc. CHI ‘20.

Schneegaß et al. 2011. Support for modeling interaction with automotive user interfaces. In Proc. AutomotiveUI ‘11.
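As an illustration of an analytic method, the sketch below applies the Keystroke-Level Model (KLM) to compare two hypothetical ways of triggering the same command. It uses the commonly cited operator durations from Card, Moran and Newell (K = 0.2 s assumes an average skilled typist and, in the original KLM, also covers mouse-button presses). The function name `klm_estimate` and both operator sequences are illustrative assumptions, not taken from the cited papers.

```python
# Operator durations (seconds) as commonly cited for the Keystroke-Level Model.
# K = 0.20 s assumes an average skilled typist; in the original KLM a
# mouse-button press is also counted as a K operator.
KLM_TIMES = {
    "K": 0.20,  # keystroke or mouse-button press
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # home the hand between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_estimate(sequence: str, times: dict = KLM_TIMES) -> float:
    """Predict the execution time (s) of a task written as a string of KLM operators."""
    unknown = set(sequence) - set(times)
    if unknown:
        raise ValueError(f"Unknown KLM operators: {sorted(unknown)}")
    return sum(times[op] for op in sequence)


# Hypothetical comparison: the same command via a menu vs. a two-key shortcut.
menu = klm_estimate("MHPKPK")   # think, reach for mouse, point+click menu, point+click item
shortcut = klm_estimate("MKK")  # think, press modifier and key
print(f"menu: {menu:.2f} s, shortcut: {shortcut:.2f} s")  # menu: 4.35 s, shortcut: 1.75 s
```

Such a prediction needs no participants at all, which is what makes it usable when lab studies are not possible; its validity rests on the operator sequence being a realistic description of expert, error-free use.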

Appropriate Your Research Questions

● Study phenomena that happen online

○ e.g., study Facebook behavior or fake news

● Study HCI in the home and in the home office using remote methods

● Switch to technical evaluation rather than working with users

https://thomaskosch.com/wp-content/papercite-data/pdf/hoppe2021odins.pdf

Matthias Hoppe, Daria Oskina, Albrecht Schmidt, and Thomas Kosch. 2021. Odin’s Helmet: A Head-Worn Haptic Feedback Device to Simulate G-Forces on the Human Body in Virtual Reality. Proc. ACM Hum.-Comput. Interact. 5, EICS, Article 212 (June 2021), 15 pages.

https://doi.org/10.1145/3461734

DISCUSSION

Breakout Groups

● Topic per group: a specific approach (the name of your breakout group)

○ What are the positive and negative aspects in contrast to lab studies? (report this in 30 secs)

○ Do you have examples: how was this used? how could it be used? where should it not be used?

● Topics:

○ Using Existing Stuff (don’t generate the data yourself; use data that is out there, incl. literature)

○ Web and App Usage (piggyback your experiment onto a smartphone app / website)

○ Users at Home (send equipment to people or use what they have at home)

○ New Approaches to Evaluation (just invent an evaluation method that works for your setting)

○ Appropriate Your Research Questions (change your research to make it fit what you can do)

● Google Doc for collecting research examples: https://docs.google.com/document/d/18rbJORJnEepVF4CwyWaq6ZX-ocqjW_g3oY7SiH2R3d4/edit

Discussion

● Summarize your discussion in 30 seconds

○ What are the positive and negative aspects in contrast to lab studies?

● Do you have examples: how was this used? how could it be used? where should it not be used? Add them here:

○ https://docs.google.com/document/d/18rbJORJnEepVF4CwyWaq6ZX-ocqjW_g3oY7SiH2R3d4/edit

Breakout Groups

● Discuss in groups: How will reviewing need to change if we do more out-of-the-lab evaluations?

○ What would we have to report that we don’t report now?

○ What criteria should be added for judging papers?

○ How to ensure reproducibility?

○ What different insights would papers begin to generate?

○ What do we want to keep, and what do we want to throw away after COVID?

● Google Doc for collecting research examples: https://docs.google.com/document/d/18rbJORJnEepVF4CwyWaq6ZX-ocqjW_g3oY7SiH2R3d4/edit

Discussion

● Summarize your points for each question in 30 seconds

○ What would we have to report that we don’t report now?

○ What criteria should be added for judging papers?

○ How to ensure reproducibility?

○ What different insights would papers begin to generate?

○ What do we want to keep, and what do we want to throw away after COVID?

DEEP DIVE

Deep Dive

● VR studies

○ Simulation studies

○ Remote studies

● Use data that’s “out there”

● Large-scale piggyback

● Technical evaluation (speed, forces)

VR Studies

● Simulation studies

○ Studies where we use VR as a research testbed: we study phenomena that exist outside of VR, but we study them in VR

● Remote studies

○ Studies involving VR technologies, for whatever purpose, can be conducted remotely, without having users come to the lab

○ The easiest way is to recruit people who already own the necessary hardware, such as a VR head-mounted display (HMD), and have them participate using their own setup

Simulation of Studies in VR

● Can we simulate user studies in virtual reality?

- Yes (sort of)

● Promising results from several studies in which a real-world study and an identical VR study produced comparable results

Mäkelä et al. 2020. Virtual Field Studies: Conducting Studies on Public Displays in Virtual Reality. In Proc. CHI ’20.

Rivu et al. 2021. Exploring Emotions and Emotion Elicitation Techniques in Virtual Reality. In Proc. INTERACT ’21 (to appear).

Remote VR Studies

● Can we recruit people who own the necessary VR hardware, and have them participate from home?

● Yes! And there are many ways to do it

Rivu et al. 2021. When Friends become Strangers: Understanding the Influence of Avatar Gender On Interpersonal Distance Between Friends in Virtual Reality. In Proc. INTERACT ‘21 (to appear).

Rivu et al. 2021. Remote VR Studies: A Framework for Running Virtual Reality Studies Remotely Via Participant-Owned HMDs. arXiv preprint.


Remote VR Studies


Saffo et al. 2021. Remote and Collaborative Virtual Reality Experiments via Social VR Platforms. In Proc. CHI ‘21.

Rivu et al. 2021. Remote VR Studies: A Framework for Running Virtual Reality Studies Remotely Via Participant-Owned HMDs. arXiv preprint.

Use Data That’s Out There

● Example: analysis of online discussions about video-game-related preferences

○ We gathered relevant discussion threads from several websites, resulting in 546 posts from 17 discussion threads on eight different websites.

○ Thematic analysis over multiple rounds

Mäkelä & Schmidt. 2020. I Don’t Care as Long as It’s Good: Player Preferences for Real-Time and Turn-Based Combat Systems in Computer RPGs. In Proc. CHI PLAY ‘20.
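When working with posts collected from the web, it can help to store the source of every post together with the codes assigned during thematic analysis, so that corpus statistics (posts, threads, websites) and theme frequencies can be recomputed after each coding round. The following is a minimal sketch under that assumption; the `Post` structure, the example posts, and the theme labels are hypothetical and not taken from the cited study.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Post:
    website: str
    thread_id: str   # thread identifier as used by the source website
    text: str
    codes: set = field(default_factory=set)  # themes assigned during coding rounds


def corpus_summary(posts):
    """Count posts, distinct threads, and distinct websites in the corpus."""
    return {
        "posts": len(posts),
        # Thread IDs are only unique per website, so pair them with the site.
        "threads": len({(p.website, p.thread_id) for p in posts}),
        "websites": len({p.website for p in posts}),
    }


def theme_frequencies(posts):
    """Number of posts in which each theme occurs (a post may carry several themes)."""
    return Counter(code for p in posts for code in p.codes)


# Hypothetical mini-corpus after one coding round.
posts = [
    Post("forum-a.example", "42", "example post text", {"tactics"}),
    Post("forum-a.example", "42", "example post text", {"immersion"}),
    Post("forum-b.example", "42", "example post text", {"narrative", "tactics"}),
]
print(corpus_summary(posts))                     # {'posts': 3, 'threads': 2, 'websites': 2}
print(theme_frequencies(posts).most_common(1))   # [('tactics', 2)]
```

Keeping codes as a set per post, rather than one label, reflects that a single forum post often touches several themes at once.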

Use Data That’s Out There

Online discussions are:

● Authentic

● Insightful in unexpected ways

● Messy and unstructured

● Short on background information: little to none is available

Mäkelä & Schmidt. 2020. I Don’t Care as Long as It’s Good: Player Preferences for Real-Time and Turn-Based Combat Systems in Computer RPGs. In Proc. CHI PLAY ‘20.

Large-Scale Piggyback - Example

● Investigating typing behavior on smartphones

● Applications: adaptive user interfaces, security

● Approach

○ Android keyboard app

○ Logging filter

○ 3-week field study

○ 6 million events

[Buschek et al., 2018] Buschek, Bisinger & Alt. 2018. ResearchIME: A Mobile Keyboard Application for Studying Free Typing Behaviour in the Wild. In Proc. CHI ’18.
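The slide lists a logging filter as part of the approach but does not specify it. As a hypothetical sketch only (not ResearchIME's actual implementation), such a filter could drop events from sensitive input fields entirely and replace typed characters by coarse classes, while keeping the touch coordinates and timing that typing-behavior analyses rely on. The `KeyEvent` fields and the `SENSITIVE_FIELDS` policy are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed policy: field types whose input must never be logged.
SENSITIVE_FIELDS = {"password", "email", "phone"}


@dataclass(frozen=True)
class KeyEvent:
    timestamp_ms: int
    x: float          # touch position on the keyboard, in pixels
    y: float
    char: str         # character produced by the touch
    field_type: str   # hypothetical category of the focused input field


def char_class(char: str) -> str:
    """Replace the typed character by a coarse class so no text is stored."""
    if char.isalpha():
        return "letter"
    if char.isdigit():
        return "digit"
    if char.isspace():
        return "space"
    return "other"


def filter_event(event: KeyEvent) -> Optional[dict]:
    """Return a privacy-reduced record, or None if the event must not be logged."""
    if event.field_type in SENSITIVE_FIELDS:
        return None
    return {"t": event.timestamp_ms, "x": event.x, "y": event.y,
            "class": char_class(event.char)}


def filter_log(events: List[KeyEvent]) -> List[dict]:
    """Apply the filter on the device, before anything is stored or uploaded."""
    return [rec for rec in map(filter_event, events) if rec is not None]
```

Filtering on the device, rather than on the server, means that sensitive input never leaves the participant's phone.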

Large-Scale Piggyback - How to

● Identify research goals

● Select study method (relational or experimental)

● Devise an incentive mechanism

● Choose target platform

● Design and develop app

● Prepare data collection

● Implement scheme to obtain informed consent

● Distribute and promote app

● Monitor data collection

● Filter and analyze data to answer research question

[Henze et al., 2013] Henze, Sahami, Schmidt, Pielot, Michahelles. 2013. Empirical Research through Ubiquitous Data Collection. In IEEE Computer.
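Several of the steps above (informed consent, monitoring, filtering) meet in the logging component of the app. The following is a minimal sketch, assuming a simple in-memory store and a study policy in which withdrawing consent also deletes a participant's data; `StudyLogger`, its methods, and the `min_events` threshold are hypothetical and not taken from [Henze et al., 2013].

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Record = Tuple[str, str]  # (participant_id, event)


@dataclass
class StudyLogger:
    """Hypothetical in-app logger that stores nothing until informed consent is given."""
    consented: Set[str] = field(default_factory=set)
    records: List[Record] = field(default_factory=list)

    def give_consent(self, participant_id: str) -> None:
        self.consented.add(participant_id)

    def withdraw_consent(self, participant_id: str) -> None:
        # Assumed policy: withdrawal also deletes everything collected so far.
        self.consented.discard(participant_id)
        self.records = [r for r in self.records if r[0] != participant_id]

    def log(self, participant_id: str, event: str) -> bool:
        """Store an event only for consenting participants; report whether it was stored."""
        if participant_id not in self.consented:
            return False
        self.records.append((participant_id, event))
        return True

    def events_per_participant(self) -> Counter:
        """Monitoring: how much data each participant has contributed so far."""
        return Counter(pid for pid, _ in self.records)

    def filter_min_events(self, min_events: int) -> List[Record]:
        """Filtering: keep only participants who contributed enough data for analysis."""
        counts = self.events_per_participant()
        return [r for r in self.records if counts[r[0]] >= min_events]
```

In a deployed app the records would be persisted and uploaded, but the gating logic, logging nothing before consent, stays the same.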

Technical Evaluation

● Perform technical measurements

● Measure parameters that do not directly depend on a person using the system, device, or application, e.g., bandwidth requirements, presentation delay, or forces experienced

● Compare the technical features of your work with previously published work and show that your solution is “better” with regard to specific parameters

https://thomaskosch.com/wp-content/papercite-data/pdf/hoppe2021odins.pdf

Matthias Hoppe, Daria Oskina, Albrecht Schmidt, and Thomas Kosch. 2021. Odin’s Helmet: A Head-Worn Haptic Feedback Device to Simulate G-Forces on the Human Body in Virtual Reality. Proc. ACM Hum.-Comput. Interact. 5, EICS, Article 212 (June 2021), 15 pages. https://doi.org/10.1145/3461734
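For a parameter such as presentation delay, one way to report a technical measurement is to repeat it many times and summarize the distribution rather than give a single value, which also makes comparisons with published figures more meaningful. The sketch below assumes two hypothetical hooks into the system under test, `trigger` and `wait_until_presented`. Timing in software only captures delays visible to the host computer; end-to-end display or haptic latency usually needs an external sensor.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_delays(trigger: Callable[[], None],
                   wait_until_presented: Callable[[], None],
                   repetitions: int = 50) -> List[float]:
    """Time (ms) from triggering an update until the hypothetical hook reports it presented."""
    delays = []
    for _ in range(repetitions):
        start = time.perf_counter()
        trigger()
        wait_until_presented()
        delays.append((time.perf_counter() - start) * 1000.0)
    return delays


def summarize(delays_ms: List[float]) -> Dict[str, float]:
    """Descriptive statistics to report; needs at least two measurements."""
    return {
        "n": len(delays_ms),
        "mean": statistics.mean(delays_ms),
        "median": statistics.median(delays_ms),
        "sd": statistics.stdev(delays_ms),
        # Last of 19 cut points = 95th percentile, interpolated within the observed range.
        "p95": statistics.quantiles(delays_ms, n=20, method="inclusive")[-1],
    }


# Usage with stand-in hooks: nothing to trigger, "presentation" takes about 5 ms.
delays = measure_delays(lambda: None, lambda: time.sleep(0.005), repetitions=10)
print(summarize(delays))
```

Reporting the median and a high percentile alongside the mean makes occasional long delays visible, which a mean alone would hide.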


Mathematical Modelling

● Modelling Users and Interaction

Fischer, Florian; Fleig, Arthur; Klar, Markus; Grüne, Lars; Müller, Jörg. An Optimal Control Model of Mouse Pointing Using the LQR. Bayreuth, 2020. https://arxiv.org/pdf/2002.11596.pdf

Seinfeld, Sofia; Feuchtner, Tiare; Maselli, Antonella; Müller, Jörg. User Representations in Human-Computer Interaction. Human–Computer Interaction (2020), pp. 1-39. doi:10.1080/07370024.2020.1724790 ...
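The cited work models pointing with optimal control (LQR), which is beyond a short example. As a much simpler, classic instance of modelling interaction, Fitts' law in its Shannon formulation predicts movement time $\mathrm{MT}$ from target distance $D$ and target width $W$, with coefficients $a$ and $b$ fitted to data. The coefficients and target sizes in the worked example are hypothetical.

```latex
\mathrm{MT} = a + b \log_2\!\left(\frac{D}{W} + 1\right)

% Worked example with hypothetical values a = 0.2 s, b = 0.1 s/bit, D = 256 px, W = 32 px:
\mathrm{ID} = \log_2\!\left(\frac{256}{32} + 1\right) = \log_2 9 \approx 3.17~\text{bits},
\qquad
\mathrm{MT} \approx 0.2 + 0.1 \cdot 3.17 \approx 0.52~\text{s}
```

Once fitted, such a model predicts performance for layouts nobody has tested yet, which is what allows evaluation without new user studies.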

Computational Optimization

● Using optimization techniques to improve systems

● Prove that your approach/design is better with regard to an objective function

● Example: optimize key assignment, adaptive and predictive keyboards

Daniel Buschek. Intelligent Text Entry - Adaptive and predictive keyboards. Lecture in the Intelligent User Interfaces course. 2021. https://iui-lecture.org/
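To make "prove that a design is better with regard to an objective function" concrete, the sketch below optimizes the key assignment of a toy four-key, single-row keyboard. The objective is the frequency-weighted mean movement time between consecutive keys, with movement time predicted by Fitts' law. Because all 24 assignments are enumerated, the result is provably optimal with respect to that objective. Key positions, key width, Fitts coefficients, and bigram counts are all hypothetical; realistic keyboards have far too many assignments for exhaustive search and usually rely on heuristic search (e.g., simulated annealing).

```python
import itertools
import math

# Hypothetical single-row keyboard: key centres in mm, every key 6 mm wide.
SLOTS = [(0.0, 0.0), (6.0, 0.0), (12.0, 0.0), (18.0, 0.0)]
KEY_WIDTH = 6.0
A, B = 0.1, 0.1  # illustrative Fitts' law coefficients (s, s/bit), not fitted to data


def fitts_time(p, q, width=KEY_WIDTH):
    """Predicted movement time (s) between two key centres (Shannon formulation)."""
    return A + B * math.log2(math.dist(p, q) / width + 1)


def expected_time(layout, bigram_freq):
    """Objective: frequency-weighted mean movement time; layout maps letter -> slot index."""
    total = sum(bigram_freq.values())
    cost = sum(freq * fitts_time(SLOTS[layout[first]], SLOTS[layout[second]])
               for (first, second), freq in bigram_freq.items())
    return cost / total


def optimize_layout(letters, bigram_freq):
    """Exhaustive search over all assignments of letters to slots."""
    best_layout, best_cost = None, math.inf
    for slots in itertools.permutations(range(len(SLOTS)), len(letters)):
        layout = dict(zip(letters, slots))
        cost = expected_time(layout, bigram_freq)
        if cost < best_cost:
            best_layout, best_cost = layout, cost
    return best_layout, best_cost


# Hypothetical bigram counts for a four-letter alphabet.
freq = {("a", "b"): 50, ("b", "a"): 40, ("c", "d"): 5, ("a", "c"): 5}
layout, cost = optimize_layout("abcd", freq)
print(layout, round(cost, 3))
```

The same objective can score any proposed layout, so a claimed improvement over a baseline becomes a number that others can recompute.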


DISCUSSION AND CONCLUSION