
Page 1: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Real-time Human Interaction with Supervised Learning Algorithms for Music Composition and Performance
Rebecca Fiebrink
Perry Cook, Advisor
FPO, 12/13/2010

Page 2: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Source: googleisagiantrobot.com

Page 3: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Page 4: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Page 5: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


    function [x flag hist dt] = pagerank(A, optionsu)
    [m n] = size(A);
    if (m ~= n)
        error('pagerank:invalidParameter', 'the matrix A must be square');
    end;
    options = struct('tol', 1e-7, 'maxiter', 500, 'v', ones(n,1)./n, ...
        'c', 0.85, 'verbose', 0, 'alg', 'arnoldi', ...
        'linsys_solver', @(f,v,tol,its) bicgstab(f,v,tol,its), ...
        'arnoldi_k', 8, 'approx_bp', 1e-3, 'approx_boundary', inf, ...
        'approx_subiter', 5);
    if (nargin > 1)
        options = merge_structs(optionsu, options);
    end;
    if (size(options.v) ~= size(A,1))
        error('pagerank:invalidParameter', ...
            'the vector v must have the same size as A');
    end;
    if (~issparse(A))
        A = sparse(A);
    end;
    % normalize the matrix
    P = normout(A);
    switch (options.alg)
        case 'dense'
            [x flag hist dt] = pagerank_dense(P, options);
        case 'linsys'
            [x flag hist dt] = pagerank_linsys(P, options);
        case 'gs'
            [x flag hist dt] = pagerank_gs(P, options);
        case 'power'
            [x flag hist dt] = pagerank_power(P, options);
        case 'arnoldi'
            [x flag hist dt] = pagerank_arnoldi(P, options);
        case 'approx'
            [x flag hist dt] = pagerank_approx(P, options);
        case 'eval'
            [x flag hist dt] = pagerank_eval(P, options);
        otherwise
            error('pagerank:invalidParameter', ...
                'invalid computation mode specified.');
    end;

Page 6: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Page 7: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Page 8: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Page 9: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

useful

usable: effective, efficient, satisfying

Page 10: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Machine learning algorithms?

Page 11: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Outline

• Overviews of interactive computer music and machine learning
• The Wekinator software
• Live demo and video
• User studies
• Findings and Discussion
• Contributions, Future Work, and Conclusions

Page 12: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

interactive computer music

Page 13: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Interactive computer music

[Diagram: a human (with microphone, sensors, control interface, etc.) produces a sensed action; the computer interprets it and generates a response (music, visuals, etc.) through audio synthesis or processing, visuals, etc.]

Page 14: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Example 1: Gesture recognition

[Diagram: sensed action → identification → response; the computer identifies a bass drum gesture as “Gesture 1”.]

Page 15: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Example 1: Gesture recognition

[Diagram: sensed action → identification → response; a hi-hat gesture is identified as “Gesture 2”.]

Page 16: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Model of sensed action to meaning

[Diagram: sensed action → model → meaning → response, all within the computer.]

Page 17: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Example 2: Continuous gesture-to-sound mappings

Page 18: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Example 2: Continuous gesture-to-sound mappings

[Diagram: human + control interface → sensed action → mapping (interpretation) → sound generation, within the computer.]

Page 19: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

A composed system

[Diagram: sensed action → mapping/model/interpretation → response.]

Page 20: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

supervised learning

Page 21: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Supervised learning

[Diagram: Training: training data → algorithm → model. Running: inputs → model → outputs.]

Page 22: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Supervised learning

[Diagram: Training: labeled examples of “Gesture 1”, “Gesture 2”, and “Gesture 3” → algorithm → model. Running: a new input → model → “Gesture 1”.]
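To make the training/running split concrete, here is a minimal sketch in Java against the Weka API, which the Wekinator builds on (see Page 27). It is illustrative only: the feature names, class labels, example values, and the choice of the J48 tree are assumptions, not the Wekinator’s actual code, and it assumes the Weka 3.7+ API (DenseInstance).

    import java.util.ArrayList;
    import weka.classifiers.trees.J48;
    import weka.core.Attribute;
    import weka.core.DenseInstance;
    import weka.core.Instance;
    import weka.core.Instances;

    public class GestureTrainingSketch {
        public static void main(String[] args) throws Exception {
            // Two numeric features per example plus a nominal class label.
            ArrayList<Attribute> attrs = new ArrayList<>();
            attrs.add(new Attribute("joystick_x"));
            attrs.add(new Attribute("joystick_y"));
            ArrayList<String> labels = new ArrayList<>();
            labels.add("Gesture1"); labels.add("Gesture2"); labels.add("Gesture3");
            attrs.add(new Attribute("gesture", labels));

            Instances train = new Instances("gestures", attrs, 0);
            train.setClassIndex(train.numAttributes() - 1);

            // Training: supply (inputs, outputs) pairs as labeled examples.
            addExample(train, 0.1, 0.9, "Gesture1");
            addExample(train, 0.8, 0.2, "Gesture2");
            addExample(train, 0.5, 0.5, "Gesture3");
            J48 model = new J48();        // a decision tree; one of several options
            model.buildClassifier(train); // training data + algorithm -> model

            // Running: map a new input vector to an output label.
            Instance input = new DenseInstance(train.numAttributes());
            input.setDataset(train);
            input.setValue(0, 0.15);
            input.setValue(1, 0.85);
            int predicted = (int) model.classifyInstance(input);
            System.out.println(train.classAttribute().value(predicted));
        }

        private static void addExample(Instances data, double x, double y, String label) {
            Instance inst = new DenseInstance(data.numAttributes());
            inst.setDataset(data);
            inst.setValue(0, x);
            inst.setValue(1, y);
            inst.setClassValue(label);
            data.add(inst);
        }
    }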

Page 23: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Supervised learning is useful

• Models capture complex relationships from the data and generalize to new inputs. (accurate)
• Supervised learning circumvents the need to explicitly define mapping functions or models. (efficient)

So why isn’t it used more often in computer music? Useful but not necessarily usable.

Page 24: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Criteria for a usable supervised learning tool for composers

1. General-purpose: many algorithms & applications
2. Runs on real-time signals
3. Compatible with other compositional tools
4. Supports interaction via a GUI
5. Supports appropriate end-user interactions with the supervised learning process

[Table: Weka and similar general-purpose tools vs. existing computer music tools, scored ✓/✗ against criteria 1–5; neither class of tools satisfies all five.]

Page 25: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Interactive machine learning (IML)

• Fails and Olsen (2003): training set editing for computer vision systems
– Also: handwriting analysis, web image classification, document classification (Shilman et al. 2006; Fogarty et al. 2008; Amershi et al. 2009; Baker et al. 2009)
– Other research investigates other interactions, e.g., classifier ensemble manipulation (Talbot et al. 2009) and confusion matrix manipulation (Kapoor et al. 2010)
• Relevant research questions explored in my work:
– Which interactions are possible and useful?
– What are the essential differences between IML and conventional ML?
– How can IML be used in real-time and creative contexts?

Page 26: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Outline

• Overviews of interactive computer music and machine learning
• The Wekinator software
• Live demo and video
• User studies
• Findings and Discussion
• Contributions, Future Work, and Conclusions

Page 27: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

The Wekinator

• Built on Weka API
• Downloadable at http://code.google.com/p/wekinator/

(Fiebrink, Cook, and Trueman 2009; Fiebrink, Trueman, and Cook 2009; Fiebrink et al. 2010)

1. General-purpose: many algorithms & applications ✓
2. Runs on real-time signals ✓
3. Compatible with other compositional tools ✓
4. Supports interaction via a GUI ✓
5. Supports appropriate end-user interactions with the supervised learning process

Page 28: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

28

A tool for running models in real-time

model(s)

.01, .59, .03, ....01, .59, .03, ....01, .59, .03, ....01, .59, .03, ...

5, .01, 22.7, …5, .01, 22.7, …5, .01, 22.7, …5, .01, 22.7, …

time

time

Feature extractor(s)

Parameterizable process
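A hypothetical sketch of this run loop (the interface names, single-output models, and 100 Hz control rate are assumptions for illustration, not the Wekinator’s actual implementation):

    public class RunLoopSketch {
        interface FeatureExtractor { double[] currentFeatures(); }
        interface Model { double run(double[] features); }   // trained classifier or regressor
        interface SoundProcess { void setParameters(double[] params); }

        static void runLoop(FeatureExtractor extractor, Model[] models, SoundProcess synth)
                throws InterruptedException {
            while (true) {
                double[] features = extractor.currentFeatures(); // e.g., 5, .01, 22.7, ...
                double[] params = new double[models.length];     // one output per model
                for (int i = 0; i < models.length; i++) {
                    params[i] = models[i].run(features);
                }
                synth.setParameters(params);                     // e.g., .01, .59, .03, ...
                Thread.sleep(10);                                // ~100 Hz control rate
            }
        }
    }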

Page 29: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


A tool for real-time, interactive design

Wekinator supports user interaction with all stages of the model creation process.

Page 30: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Under the hood

[Diagram: inputs such as joystick_x, joystick_y, webcam_1, pitch, and volume feed Feature1, Feature2, Feature3 … FeatureN; Model1, Model2 … ModelM map the features to Parameter1, Parameter2 … ParameterM, producing outputs such as 3.3098 or Class24.]

Page 31: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Under the hood

[Same diagram as Page 30: Feature1 … FeatureN → Model1 … ModelM → Parameter1 … ParameterM.]

Learning algorithms:
Classification: AdaBoost.M1, J48 decision tree, support vector machine, k-nearest neighbor
Regression: MultilayerPerceptron
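In Weka terms, these correspond to standard classifier classes; a sketch of such an algorithm menu follows. The use of SMO for the support vector machine and IBk for k-nearest neighbor is an assumption, since the slide does not name Weka’s exact implementations.

    import weka.classifiers.Classifier;
    import weka.classifiers.functions.MultilayerPerceptron; // also handles regression
    import weka.classifiers.functions.SMO;                  // Weka's SVM trainer
    import weka.classifiers.lazy.IBk;                       // k-nearest neighbor
    import weka.classifiers.meta.AdaBoostM1;
    import weka.classifiers.trees.J48;

    public class AlgorithmMenu {
        // Return a fresh, untrained classifier for each algorithm named on the slide.
        public static Classifier forName(String name) {
            switch (name) {
                case "adaboost":  return new AdaBoostM1();
                case "j48":       return new J48();
                case "svm":       return new SMO();
                case "knn":       return new IBk(1);        // k = 1 here, arbitrarily
                case "neuralnet": return new MultilayerPerceptron();
                default: throw new IllegalArgumentException("unknown algorithm: " + name);
            }
        }
    }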

Page 32: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Tailored but not limited to music

The Wekinator:
• Built-in feature extractors for music & gesture
• ChucK API for feature extractors and synthesis classes

[Diagram: the Wekinator exchanges control messages via Open Sound Control (UDP) with other feature extraction modules and with other modules for sound synthesis, animation, …]
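For example, one control message over OSC/UDP might be sent as below, assuming the illposed JavaOSC library; the port number, address pattern, and argument values are all hypothetical:

    import com.illposed.osc.OSCMessage;
    import com.illposed.osc.OSCPortOut;
    import java.net.InetAddress;
    import java.util.Arrays;

    public class OscSendSketch {
        public static void main(String[] args) throws Exception {
            // Connect to a synthesis module listening on localhost UDP port 6448.
            OSCPortOut out = new OSCPortOut(InetAddress.getLoopbackAddress(), 6448);
            // One float argument per model output parameter.
            OSCMessage msg = new OSCMessage("/wekinator/control",
                    Arrays.<Object>asList(0.01f, 0.59f, 0.03f));
            out.send(msg);
            out.close();
        }
    }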

Page 33: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Outline

• Overviews of interactive computer music and machine learning
• The Wekinator software
• Live demo and video
• User studies
• Findings and Discussion
• Contributions, Future Work, and Conclusions

Page 34: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Wekinator in performance

Page 35: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Outline

• Overviews of interactive computer music and machine learning
• The Wekinator software
• Live demo and video
• User studies
• Findings and Discussion
• Contributions, Future Work, and Conclusions

Page 36: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Study 1: Participatory design process with composers

• Method:
– 7 composers
– 10 weeks, 3 hours / week
– Group discussion, experimentation, and evaluation
– Iterative design
– Final questionnaire
• Outcomes:
– A focus on instrument-building
– Much-improved software and lots of feedback

(Fiebrink, Trueman, et al., 2010)

Page 37: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Study 2: Teaching interactive systems building in an undergraduate course

• COS/MUS 314 Spring 2010 (PLOrk)
• Focus on building interactive music systems
• Method:
– Wekinator midterm assignment
• 1 continuous + 1 discrete system
• Limited choice of controller + synthesis, balanced task order
• Logging + questionnaire data
– Observations of midterm and final performances
• Outcomes:
– Successful project completion
– Logs from 21 students

Page 38: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Study 3: Bow gesture recognition

• Work with a composer/cellist to build gesture classifiers for a commercial sensor bow, the “K-Bow”

Page 39: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Study 3: Bow gesture recognition

• Work with a composer/cellist to build gesture classifiers for a commercial sensor bow
• Method:
– Work with cellist to build 8 classifiers for standard bow gestures: up/down bow, on/off string, grip engaged/disengaged, bow rolled/not rolled, horizontal position, vertical position, speed, articulation
• First session: all 8 classification tasks
• Supplemented with second session: 5 “hard” tasks repeated
– Logging (data + actions), observations, final questionnaire
– Cellist assigned each iteration’s model a rating of quality (1 to 10)
• Outcomes:
– Successful models created for all 8 tasks (rated “9” or “10”)

Page 40: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Study 4: Composer case studies

• CMMV by Dan Trueman, faculty

Page 41: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Study 4: Composer case studies

• CMMV by Dan Trueman, faculty
• The Gentle Senses / MARtLET by Michelle Nagai, graduate student

Page 42: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Study 4: Composer case studies

• CMMV by Dan Trueman, faculty
• The Gentle Senses / MARtLET by Michelle Nagai, graduate student
• G by Raymond Weitekamp, undergraduate

Page 43: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Outline

• Overviews of interactive computer music and machine learning
• The Wekinator software
• Live demo and video
• User studies
• Findings and Discussion
• Contributions, Future Work, and Conclusions

Page 44: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Discussion of Findings

1. Users’ interactions with the Wekinator (i.e., reasons why interaction is useful)
2. Users’ goals for the trained models
3. The Wekinator’s influence on users’ actions and goals
4. Usability and usefulness of the Wekinator
5. The Wekinator as a tool for supporting creativity and embodiment

Page 45: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

[interaction summary]

Page 46: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Users’ interactions with the Wekinator

• Users in all studies iteratively modified the learning problem, retrained, and re-evaluated.

• Editing the training dataset was used more frequently than changing the learning algorithm or features.

Page 47: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


PLOrk students’ actions (Study 2)

Page 48: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


K-Bow actions (Study 3)

Page 49: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Composers’ actions (Study 1)

Page 50: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Key questions

• How and why did users edit the training data?
• What methods and criteria did users employ to evaluate the trained models?

Page 51: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

[interaction with the training set]

Page 52: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Interaction and the training dataset

• Training data is an interface for:
– defining the learning problem
– clarifying the learning problem to fix errors
– communicating changes in the problem over time
• Model made incrementally more complex
• User changes his mind about how a model should work
– providing a “sketch” that the computer fills in

Page 53: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Thoughts

• The data is the most effective and appropriate interface for accomplishing these tasks.
– Appropriate when:
• User has knowledge of learning problem
• User is capable of creating examples (efficiently)
• Training process is short
– PLOrk: TODO
– K-Bow: TODO
• The interfaces for creating and editing the training data affect the user’s ability to accomplish these tasks

Page 54: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Playalong data creation supports fine-grained control and embodied interaction

• Some users found it highly useful
– 2 participatory design composers, 2 composers in case studies
– 8 of 21 PLOrk students used playalong (and used it often)
• Allowed training data to represent more fine-grained information
• Enabled some composers to engage their musical and physical expertise
– Allows practice and attention to “feel”
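A minimal sketch of the play-along idea, under assumed names and a fixed control-rate tick: while a score of target parameter values plays back, the performer plays along, and each tick pairs the current gesture features with the current target as a training example.

    import java.util.ArrayList;
    import java.util.List;

    public class PlayalongSketch {
        interface FeatureSource { double[] current(); }

        static class Example {
            final double[] features; final double target;
            Example(double[] f, double t) { features = f; target = t; }
        }

        // Record one example per tick while the target score plays back.
        static List<Example> record(FeatureSource gestures, double[] targetScore,
                                    long tickMs) throws InterruptedException {
            List<Example> training = new ArrayList<>();
            for (double target : targetScore) {          // playback drives the target
                training.add(new Example(gestures.current(), target));
                Thread.sleep(tickMs);                    // wait for the next tick
            }
            return training;
        }
    }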

Page 55: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

[interaction and model evaluation]

Page 56: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

“Conventional” model evaluation

[Diagram: the available data is split into a training set and an evaluation set; the model is trained on the training set and evaluated on the held-out evaluation set.]
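In Weka, this conventional procedure can be run as in the sketch below, with 10-fold cross-validation standing in for a single train/evaluate split (the dataset file name is hypothetical):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ConventionalEval {
        public static void main(String[] args) throws Exception {
            // Load a labeled dataset; the last attribute is the class.
            Instances data = DataSource.read("gestures.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // 10-fold cross-validation: repeatedly hold out part of the data,
            // train on the rest, and evaluate on the held-out portion.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));
            System.out.printf("CV accuracy: %.1f%%%n", eval.pctCorrect());
        }
    }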

Page 57: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

“Direct” evaluation in Wekinator

[Diagram: the model is trained on the full training set, then evaluated directly by running it on new, user-generated inputs.]

Page 58: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Direct evaluation used more frequently than conventional metrics

• Composers in study 1 and case studies: only direct evaluation
• PLOrk per-student averages:

Task       | # times CV accuracy | # times training accuracy | # times direct eval. | Total time in direct eval. (min.) | Total time (min.)
Continuous | 0.9                 | 1.7                       | 4.2**                | 8.5                               | 27.1
Discrete   | 1.0                 | 1.6                       | 5.3**                | 2.7                               | 16.1

• K-Bow per-task averages:

Session | # times CV accuracy | # times training accuracy | # times direct eval. | Total time in direct eval. (min.) | Total time (min.)
1       | 1.8                 | 0                         | 5.4                  | 32.4                              | 204.9
2       | 0.0                 | 0                         | 2.6                  | 5.9                               | 44.4

Page 59: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Roles of cross-validation and training accuracy

• Useful as quick, objective evaluation techniques
• K-Bow:
– Used for “stubborn” problems
– Quickly assess changes to features or algorithms
• PLOrk:
– Treated as evidence a model was performing well
– Used to validate the user’s ability

Page 60: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Roles of direct evaluation

• Assess behavior of the model against subjective criteria

• Obtain feedback that shapes the users’ future interactions with the system

Page 61: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Discussion of Findings

1. Users’ interactions with the Wekinator
2. Users’ goals for the trained models (i.e., people care about more than accuracy!)
3. The Wekinator’s influence on users’ actions and goals
4. Usability and usefulness of the Wekinator
5. The Wekinator as a tool for supporting creativity and embodiment

Page 62: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Users’ goals for the trained models

• What criteria were users employing when they evaluated models?

Page 63: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Accuracy

• Does the model behavior match the learning concept definition?
– OR the user’s expectations?
• Important even for creative, open-ended tasks, especially on inputs like the training examples

Page 64: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Cost

• What are the consequences and locations of model errors?
– Is the mistake similar to one a human would make?
– Is the mistake avoidable in performance?

Page 65: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Shape of decision boundary

• TODO: Add boundaries

Page 66: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Complexity and unexpectedness

• TODO: Add quotes

Page 67: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


“Feel”

• TODO: Add quote

Page 68: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Subjective evaluation criteria & CV

• K-Bow:
– Cross-validation sometimes correlates with subjective quality, but sometimes it doesn’t!

Pearson’s correlation for tasks with > 3 iterations:

Task | Horizontal Position | Vertical Position | Bow Direction | On/Off String | Speed | Articulation
R    | -0.59               | -0.44             | -0.74         | -0.50         | 0.65  | 0.93
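For reference, a self-contained sketch of the Pearson computation behind these R values; the sample pairs in main are made up, not the study’s data:

    public class PearsonSketch {
        // Pearson's r between paired samples x and y.
        static double pearson(double[] x, double[] y) {
            int n = x.length;
            double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
            for (int i = 0; i < n; i++) {
                sx += x[i]; sy += y[i];
                sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
            }
            double cov = sxy - sx * sy / n;                    // n * covariance
            double vx = sxx - sx * sx / n, vy = syy - sy * sy / n; // n * variances
            return cov / Math.sqrt(vx * vy);                   // the n's cancel
        }

        public static void main(String[] args) {
            // Hypothetical per-iteration (subjective rating, CV accuracy) pairs.
            double[] rating = {4, 6, 7, 9, 10};
            double[] cvAcc  = {88.0, 84.5, 86.0, 83.0, 87.5};
            System.out.printf("r = %.2f%n", pearson(rating, cvAcc));
        }
    }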

Page 69: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Thoughts

• Is generalization accuracy important?
– Yes! Human and environmental variations are inevitable
– BUT it may not be the only or most important factor
– and generalization estimated from the training set (e.g., using cross-validation) is not always informative
• ML systems designed for human use should be evaluated by human use.

Page 70: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

More thoughts

• Most algorithms are designed to produce models with good generalization accuracy as estimated from the training data
• BUT the user is employing the training data as an interface to influence the model’s behavior…
– Training accuracy may be more important
– Training set may not be representative of future inputs
• Better algorithms might
– Privilege training accuracy (e.g., k-nearest neighbor)
– Provide for optimization against other subjective criteria (e.g., using a regularization parameter for boundary smoothness)

Page 71: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Discussion of Findings

1. Users’ interactions with the Wekinator
2. Users’ goals for the trained models
3. The Wekinator’s influence on users’ actions and goals (i.e., interaction is bi-directional!)
4. Usability and usefulness of the Wekinator
5. The Wekinator as a tool for supporting creativity and embodiment

Page 72: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Interaction is bi-directional

[Diagram: the user sends control to the machine learning algorithms and receives feedback from them; feedback comes chiefly from running the models (“direct evaluation”) and from cross-validation and training accuracy.]

Page 73: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Running models informs future actions

• For example:
– locate errors → add correctly-labeled examples
– detect total failure → delete all the data
– find that they like the model → stop iterating

Page 74: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Running models trains users to be more effective supervised learning practitioners

• Users especially learned to create better training datasets
– Minimize noise
– Balance the number of examples in each class (see the sketch below)
– Vary examples along all the dimensions that might vary in performance
• Important for novice users
• Insufficient feedback is frustrating
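As a sketch of the kind of dataset check this implies, the hypothetical helper below counts examples per class so imbalance is easy to spot (not a Wekinator feature; assumes the Weka 3.7+ API, where Instances is iterable):

    import java.util.HashMap;
    import java.util.Map;
    import weka.core.Instance;
    import weka.core.Instances;

    public class DatasetBalanceSketch {
        // Count training examples per class label.
        static Map<String, Integer> classCounts(Instances data) {
            Map<String, Integer> counts = new HashMap<>();
            for (Instance inst : data) {
                String label = inst.stringValue(data.classIndex());
                counts.merge(label, 1, Integer::sum);
            }
            return counts;
        }
    }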

Page 75: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Running models informs users’ goals for machine learning

• Some PLOrkers and composers relied on the Wekinator to help formulate the problem definition
• Users learned what was most easily accomplished… and exploited flexibilities in the learning concept definition to create a model that most easily met their most important goals
• PLOrk: Define classes based on what’s easy to classify
• K-Bow: Add more classes when a model performed well (e.g., Bow Speed)

Page 76: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Running models informs users about themselves and their work

K-Bow cellist: the model’s confusion of spiccato and ricochet → realization that her spiccato was too much like ricochet → improved technique

Page 77: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Thoughts – Impact in other domains

• Direct evaluation on user-generated inputs could be useful in these same ways in other applied machine learning domains
– Appropriate when the user is qualified to create inputs and evaluate results
– Requires an appropriate interface
• Interactive machine learning lets users create more useful and accurate models by interactively redefining the learning problem into one that is both useful and achievable

Page 78: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Discussion of Findings

1. Users’ interactions with the Wekinator
2. Users’ goals for the trained models
3. The Wekinator’s influence on users’ actions and goals
4. Usability and usefulness of the Wekinator
5. The Wekinator as a tool for supporting creativity and embodiment

Page 79: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Usability and usefulness: Study 1 composers

Statement | 5-point Likert mean (std. dev.)
“The Wekinator allows me to create more expressive mappings than other techniques.” | 4.5 (.8)
“The Wekinator allows me to create mappings more easily than other techniques.” | 4.7 (.5)

Page 80: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Composers’ values – Studies 1 and 4

• Speed and ease of creating and exploring mappings
– especially complex mappings
• Privileging the gesture-sound relationship
• Access to surprise and discovery
• Balancing surprise and complexity with predictability and control
• Playfulness

Page 81: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Usability and usefulness: PLOrk students

Statement | 5-point Likert mean (std. dev.)
“I can reliably predict what sound my model will make for a given input gesture.” | 4.5 (.7)
“Wekinator eventually learned what I wanted it to.” | 4.3 (.9)
“My model provides reliable gesture classifications” (discrete task) | 4.9 (.2)
“My model is musically expressive” (continuous task) | 4.1 (.7)

Page 82: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Usability and usefulness: PLOrk students

• Students enjoyed the Wekinator and found it useful
– “Learning by experimentation was a lot of fun!”
– “It’s so cool, the Wekinator rocks.”
• Model-building was fast
– 27.1 minutes to build a continuous mapping
– 16.1 minutes to build a discrete classifier

Page 83: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Usability and usefulness: K-Bow

• Models successfully created for all 8 tasks:

Task                | Rating (1 to 10) | CV Accuracy (%)
Direction           | 10               | 87.3
On/Off String       | 10               | 83.5
Grip                | 10               | 100.0
Roll                | 10               | 98.2
Horizontal Position | 10               | 89.3
Vertical Position   | 10               | 90.0
Speed               | 9                | 87.5
Articulation        | 9                | 98.8

Page 84: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Usability and usefulness: K-Bow

Statement | 5-point Likert response
“The Wekinator was able to create accurate bow stroke classifiers in our work so far” | 4
“The Wekinator was able to create bow stroke classifiers more easily than other approaches” | “10 (so 5)”

Page 85: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Barriers to usability

• Long training times (for a few users)
• Algorithms’ inability to model the desired concept
• Difficulty in debugging
– Especially in choosing a better algorithm, or algorithm parameters
– Especially for novice users

Page 86: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

“Well, I had basically lost interest in the whole process of digital controller-based instrument building, so the Wekinator's very existence has enabled and inspired me to get back into the game... The Wekinator enables you to focus on what your primary sonic and physical concerns are, and takes away the need to address so many details, and it does so in such a way that even if you DID spend all the time on building the mappings manually, you would *never* come up with what the Wekinator comes up with. So, the process becomes more focused, more musical, more creative, more playful. I actually *want* to do it.” (Composer)

“I love Wekinator!” (PLOrk)

Page 87: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Discussion of Findings

1. Users’ interactions with the Wekinator
2. Users’ goals for the trained models
3. The Wekinator’s influence on users’ actions and goals
4. Usability and usefulness of the Wekinator
5. The Wekinator as a tool for supporting creativity and embodiment

Page 88: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Users valued…

• Rapid prototyping (even when ML not necessary)
– K-Bow too
• Exploring possibilities
• Physicality and abstraction
– K-Bow too
• Discovery and surprise
– not K-Bow
• Balancing with control & constraint
• Flexibility to tailor to a performer/space
• etc.

Page 89: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Interactive machine learning and creativity support

• Creativity support criteria (Resnick et al. AAA)
– FILL IN
• Performative, interactive creativity
– IML also supports an embodied approach to design
– Say a little more about this

Page 90: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Outline

• Overviews of interactive computer music and machine learning
• The Wekinator software
• Live demo and video
• User studies
• Findings and Discussion
• Contributions, Future Work, and Conclusions

Page 91: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Contributions

1. The Wekinator software and “playalong” interaction for training data creation

Page 92: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Contributions

2. A demonstration of the important roles that interaction can play in the development of supervised learning systems…

…and a greater understanding of the differences between interactive and conventional machine learning contexts.

Roles of interaction: communicating evolving problem definitions, evaluating models against subjective criteria, becoming a better machine learning user, practice, embodied design, …

Dimensions of difference: emphasis on generalization accuracy, training dataset size, training time, iterative sketching and refinement, …

Page 93: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Contributions

3. A better understanding of the requirements and challenges in the analysis and design of algorithms and interfaces for interactive supervised learning in real-time and creative problem domains.

enabling inspiration, educating novice users, providing access to complexity, enabling evaluation against subjective criteria, supporting embodiment, rapid prototyping, …

Page 94: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Contributions

4. A clearer characterization of composers’ goals and priorities for interacting with computers in music composition and instrument design and a demonstration that interactive supervised learning is useful in supporting composers in their work.

Page 95: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Contributions

5. A demonstration of the usefulness of interactive supervised learning as a creativity support tool.

Page 96: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

An HCI view on algorithms

• Algorithms afford certain possible interactions, control, and feedback
– i.e., they have an innate potential to be useful
• User interfaces can hide or expose these affordances
– And can expose them in more or less usable ways
• The Wekinator exploits the fact that supervised learning models can be manipulated through the training dataset
• Algorithms can be made more useful and usable
– through more appropriate interfaces
– through affording more appropriate interactions, control, and feedback

Page 97: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Future work

• Improve the Wekinator as a compositional tool and a research platform
– Collaborate with musicians, researchers, and other users in participatory processes
• Investigate new algorithms, interactions, and interfaces for music performance and beyond
– Further explore how interactive machine learning can be made more useful and usable by more people applying it to more problems

Page 98: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010


Final Conclusions

• It is important to consider the interface and context of algorithms in practice

• Supervised learning (even with “old” algorithms) still has unexplored uses

• Computer music is a fantastic area to be doing research
– Musicians make great collaborators

Page 99: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Thanks!

• Perry Cook
• Dan Trueman
• Dan Morris
• Ken Steiglitz
• Adam Finkelstein
• Szymon Rusinkiewicz
• Michelle Nagai
• Cameron Britt
• Konrad Kaczmarek
• Michael Early
• MR Daniel
• Anne Hege
• Raymond Weitekamp
• All the PLOrk students
• Meg Schedel
• Andrew McPherson
• Barry Threw
• Keith McMillen Instruments
• Ge Wang
• Jeff Snyder
• Xiaojuan Ma
• Sonya Nikolova
• Matt Hoffmann
• Merrie Morris
• Sumit Basu
• Ichiro Fujinaga

• National Science Foundation GRFP

• Francis Lathrop Upton Fellowship

• National Science Foundation grants 0101247 and 0509447

• The Kimberly and Frank H. Moss '71 Research Innovation Fund

• The David A. Gardner '69 Magic Project

• The John D. and Catherine T. MacArthur Foundation

• Everyone else I’m forgetting

Page 100: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Related publications

• Fiebrink, R. 2006. An exploration of feature selection as an optimization tool for musical genre classification. Master’s thesis, McGill University.
• Fiebrink, R., P. R. Cook, and D. Trueman. 2009. “Play-along mapping of musical controllers.” Proc. International Computer Music Conference.
• Fiebrink, R., M. Schedel, and B. Threw. 2010. “Constructing a personalizable gesture-recognizer infrastructure for the K-Bow.” International Conference on Music and Gesture (MG3).
• Fiebrink, R., D. Trueman, C. Britt, M. Nagai, K. Kaczmarek, M. Early, M.R. Daniel, A. Hege, and P. R. Cook. 2010. “Toward understanding human-computer interactions in composing the instrument.” Proc. International Computer Music Conference.
• Fiebrink, R., D. Trueman, and P. R. Cook. 2009. “A meta-instrument for interactive, on-the-fly learning.” Proc. New Interfaces for Musical Expression.
• Fiebrink, R., G. Wang, and P. R. Cook. 2007. “Don’t forget the laptop: Using native input capabilities for expressive musical control.” Proc. International Conference on New Interfaces for Musical Expression.
• Fiebrink, R., G. Wang, and P. R. Cook. 2008. “Support for MIR prototyping and real-time applications in the ChucK programming language.” Proc. International Conference on Music Information Retrieval.
• Wang, G., R. Fiebrink, and P. R. Cook. 2007. “Combining analysis and synthesis in the ChucK programming language.” Proc. International Computer Music Conference.

Page 101: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

References

• Amershi, S., J. Fogarty, A. Kapoor, and D. Tan. 2010. “Examining multiple potential models in end-user interactive concept learning.” Proc. CHI 2010.
• Baker, K., A. Bhandari, and R. Thotakura. 2009. “Designing an interactive automatic document classification system.” Proc. HCIR 2009, pp. 30–33.
• Fails, J., and D. Olsen. 2003. “Interactive machine learning.” Proc. IUI, pp. 39–45.
• Fels, S. S., and G. E. Hinton. 1993. “Glove-Talk: A neural network interface between a data-glove and a speech synthesizer.” IEEE Trans. on Neural Networks, vol. 4.
• Lee, M., A. Freed, and D. Wessel. 1992. “Neural networks for simultaneous classification and parameter estimation in musical instrument control.” Adaptive and Learning Systems, vol. 1706, pp. 244–55.
• Raphael, C. 2001. “A probabilistic expert system for automatic musical accompaniment.” Journal of Computational and Graphical Statistics, vol. 10, no. 3, pp. 487–512.
• Shneiderman, B. 2000. “Creating creativity: User interfaces for supporting innovation.” ACM Trans. CHI, vol. 7, no. 1, pp. 114–138.
• Shneiderman, B. 2007. “Creativity support tools: Accelerating discovery and innovation.” Comm. ACM, vol. 50, no. 12, pp. 20–32.
• Witten, I., and E. Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. San Francisco: Morgan Kaufmann.

Page 102: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Training set size – why so small?

• Learning concepts were “easier”? (i.e., lower sample complexity)
• Users learned to provide the most useful training examples for representing the problem?
– like active learning, but the user is in charge
• Users adjusted the learning concept definition to negotiate the tradeoffs between what they wanted and what was possible in the time available to create training data and train the algorithms?

Page 103: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Running models enables users to practice employing them more effectively

• Through practice, they learn to use models more effectively
• Users accepted or expected the need to adapt their behaviors

Page 104: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Example 2: Audio analysis

[Diagram: microphone and/or sensors → sensed action → interpretation (“Key of F”) → sound generation, within the computer.]

Page 105: Rebecca Fiebrink Perry Cook, Advisor FPO, 12/13/2010

Example 2: Audio analysis

[Diagram: microphone and/or sensors → sensed action → interpretation (“samba!”) → sound generation, within the computer.]