TablaNet: a Real-Time Online Musical Collaboration System
for Indian Percussion
Mihir Sarkar
Thesis Proposal for the Degree of Master of Science at the
Massachusetts Institute of Technology
Fall 2006
Thesis Advisor: Barry L. Vercoe
Professor of Media Arts and Sciences
Massachusetts Institute of Technology
Thesis Reader: Tod Machover
Professor of Music and Media
Massachusetts Institute of Technology
Thesis Reader: Miller S. Puckette
Professor, Music
Associate Director, Center for Research in Computing and the Arts
University of California, San Diego
Abstract
Distance education in music stands to benefit from real-time interaction over the
Internet. For instance, one can imagine an instructor living in a city teaching music
to children in villages so as to enhance or help maintain their local traditions. At
the same time, online music performance systems rely on real-time communication
platforms over fast and robust data networks. In this context, I propose to develop
TablaNet, a real-time online musical collaboration system for the tabla, a pair of North
Indian hand drums. I selected the tabla not only because of my familiarity with it,
but also because of its "intermediate complexity" as a percussion instrument: although
tabla patterns are purely rhythmic compositions without melodic or harmonic structure,
different strokes produce more than 10 distinct pitched and unpitched sounds, called
bols, which contribute to the tabla’s expressive potential. Unlike other networked music
performance projects, which attempt to optimize the audio stream in order to minimize
network latency, I plan to transmit symbolic information over the network. The system
will listen to individual drum sounds and automatically recognize them at the near-end;
at the far-end, it will use the events received so far to predict and synthesize rhythmic
phrases with the appropriate pitch and tempo. The system will be evaluated on
quantitative grounds, such as its latency tolerance and audio quality, as well as in terms
of its "playability" by tabla players of various levels.
Table of Contents
1 Introduction
2 Background
2.1 Motivation
2.2 Related work
3 Proposed Approach
3.1 Research Methodology
3.2 Initial Study
3.3 System Design
3.4 Hardware Implementation
3.5 Software Implementation
4 Evaluation
4.1 Expected Contributions
4.2 Quantitative Results
4.3 Qualitative Results
5 Planning
5.1 Deliverables
5.2 Schedule
5.3 Resources
References
Outside Reader Biography
Miller S. Puckette
1 Introduction
Hand drums are essential to Indian music; they are used not only for rhythmic accompaniment
but also in call-and-response "duels" and solo performances. However, it is sometimes
difficult to find instruction for these instruments in areas with different musical traditions
(e.g. between the North and the South of India, or between rural areas, where classical
instruments may be difficult to come by, and cities, which may have limited access to folk
culture). Moreover, with people increasingly mobile and "connected", communication
services (in particular over data networks) are becoming ever more relevant, both socially
(e.g. through "social networking") and culturally, as a possible means to sustain indigenous
artistic traditions. In this context, I propose to develop TablaNet, a real-time online musical
collaboration system for Indian percussion involving machine listening.
The main challenge in this application is to overcome network latency. Musicians need to
be perceptually synchronized with one another while data travels on the network. I plan
to solve this problem by writing software that (i) recognizes individual drum strokes and
extracts higher-level rhythmic features from the input signal, (ii) transmits symbolic events
over the network instead of an audio stream, and (iii) synthesizes rhythmic phrases at the
output by using previous events to predict current patterns.
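Step (ii) also shrinks the data sent per stroke by several orders of magnitude. The arithmetic below is a minimal sketch under assumed figures (CD-quality mono audio, 16 strokes per second, 8 bytes per event; none of these numbers come from this proposal). It does not remove propagation delay, which is why step (iii) is still needed.

```python
# Compare the bit rate of an uncompressed audio stream with that of a
# symbolic event stream. Event size and stroke rate are assumptions.

def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def event_bitrate(strokes_per_second: float, bytes_per_event: int) -> float:
    """Bit rate of a stream of fixed-size symbolic events, in bits per second."""
    return strokes_per_second * bytes_per_event * 8


audio_bps = pcm_bitrate(44_100, 16, 1)   # CD-quality mono audio
events_bps = event_bitrate(16, 8)        # assumed: 16 strokes/s, 8-byte events

print(f"audio stream: {audio_bps} bit/s")
print(f"event stream: {events_bps:.0f} bit/s")
print(f"reduction:    {audio_bps / events_bps:.0f}x")
```

Under these assumptions the event stream needs about 1 kbit/s, against roughly 706 kbit/s for the raw audio.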
In this project, I will focus on the tabla, the most popular percussion instrument in North
India. I expect my results to generalize to other percussion instruments of a similar nature.
This work will result in a playable prototype, a simulation environment for testing and
demonstration, a video presentation, and my master’s thesis, which will document this study.
After introducing the background to this project and reviewing previous work in this area,
I will outline my approach to solving this problem. I will then present the evaluation
criteria, and define the project plan and requirements.
2 Background
2.1 Motivation
While growing up in France, I missed being able to play with my musician friends in India
and the US. To work around this, we would mail each other multitrack cassettes on which
we had recorded one or more tracks. The Internet made this process faster, if not easier, but
we were still far from being able to "jam" together. This inspired me to devise a system that
enables musicians to play together in real time over the Internet.
2.2 Related work
2.2.1 Tabla Analysis & Synthesis
I shall not describe the tabla in this document; background information and references
will be provided in my master’s thesis. Probably because it is one of the most popular
Indian instruments, and possibly because of its timbral quality (its ability to produce both
pitched and unpitched sounds), several researchers have investigated modeling and
simulating the tabla. There have been a number of attempts to recognize tabla strokes
using statistical pattern classification (from [Gillet and Richard, 2003] and [Chatwani, 2003]
to [Samudravijaya et al., 2004] and [Chordia, 2005]). However, all these methods analyze
recorded performances, and are not necessarily applicable to live performances, which may
be affected by varying environmental conditions and captured with sensors other than
microphones. There have been different types of electronic tabla controllers (see [Hun Roh
and Wilcox, 1995] and [Kapur et al., 2003a]), some of which use tabla sounds generated
with physical models [Kapur et al., 2004]. Moreover, substantial progress has been made in
representing complex rhythms with a linguistic model [Kippen and Bel, 1992] that has been
implemented in the Bol Processor [Kippen and Bel, 1994]. However, to my knowledge, there
has been no work on phrase prediction for percussion instruments (see [Chafe, 1997] on the
prediction of solo piano performance).
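Statistical pattern classification of strokes can be illustrated with one of its simplest forms, a nearest-centroid classifier. The sketch below is not the method of any cited paper: the two features (spectral centroid in kHz and decay time in seconds) and all training values are hypothetical, and a practical recognizer would normalize its features and use many more of them.

```python
import math

# Hypothetical training data: (spectral centroid in kHz, decay time in s)
# for a few bols. The values are invented for illustration only.
TRAINING = {
    "Na":  [(2.1, 0.30), (2.3, 0.28), (2.0, 0.33)],
    "Tin": [(1.6, 0.45), (1.5, 0.50), (1.7, 0.42)],
    "Ge":  [(0.3, 0.60), (0.4, 0.55), (0.35, 0.65)],
    "Ka":  [(0.8, 0.05), (0.9, 0.04), (0.7, 0.06)],
}


def compute_centroids(training):
    """Return the mean feature vector of each stroke class."""
    centroids = {}
    for label, vectors in training.items():
        n = len(vectors)
        # zip(*vectors) groups the values of each feature dimension together
        centroids[label] = tuple(sum(values) / n for values in zip(*vectors))
    return centroids


def classify(features, centroids):
    """Label a stroke with the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


centroids = compute_centroids(TRAINING)
print(classify((2.2, 0.31), centroids))
print(classify((0.35, 0.58), centroids))
```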
2.2.2 Networked Musical Performance
Since the advent of the Internet, musicians have been looking at online music collaboration as
the next "killer app". In fact, the network music performance space has been, and continues to
be, the source of several commercial endeavors (from the defunct Rocket Network to Ninjam,
Audio Fabric, and Lightspeed Audio Labs, a new startup still in stealth mode). However,
these efforts (e.g. [Cooperstock and Spackman, 2001], [Kapur et al., 2003b], [Sawchuk et al.,
2003] and [Weinberg, 2005b]) are restricted by a hard theoretical constraint: the inherent
latency of computer networks. This delay, whose minimum is bounded by the speed of
light, is detrimental to music traveling over long distances. In spite of that, most current
projects still attempt to minimize latency, either by sending MIDI commands or by
optimizing the trade-off between audio stream compression and algorithmic complexity (e.g.
[Lazzaro and Wawrzynek, 2001], [Chatwani and Koren, 2004], [Gu et al., 2004]). Some
projects even rely on improved and faster networks, such as the experimental Internet2
[Bargar et al., 1998]. More recently, studies have been conducted on the effects of time delay
on musician synchronization (see [Chafe et al., 2004], [Chew et al., 2004] and [Maki-Patola,
2005]). Some researchers, notably Chris Chafe of CCRMA at Stanford University, have
also found creative ways to turn network latency to their advantage by converting delays
into reverberation [Chafe, 2003]. Thus, despite meeting with limited success, researchers are
finding new ways to interact musically over the Internet, and several roadmaps have been
proposed for networked musical performance (for instance [Weinberg, 2005a] and [Kapur
et al., 2005]).
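The speed-of-light bound mentioned above is easy to quantify. The sketch below assumes a 10,000 km path and a refractive index of about 1.47 for optical fiber (both illustrative values); routing, queuing, and audio buffering add further delay on top of this lower bound.

```python
# Lower bound on one-way delay from signal propagation alone.
# Assumed: a 10,000 km path; refractive index of about 1.47 for optical fiber.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def propagation_delay_ms(distance_km: float, refractive_index: float = 1.0) -> float:
    """Time in milliseconds for a signal to cover distance_km in a medium."""
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S / refractive_index
    return distance_km / speed_km_per_s * 1000.0


vacuum_ms = propagation_delay_ms(10_000)
fiber_ms = propagation_delay_ms(10_000, refractive_index=1.47)
print(f"vacuum: {vacuum_ms:.1f} ms one way")
print(f"fiber:  {fiber_ms:.1f} ms one way, {2 * fiber_ms:.1f} ms round trip")
```

For this path, propagation alone costs about 33 ms one way in vacuum and about 49 ms one way (98 ms round trip) in fiber.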
3 Proposed Approach
3.1 Research Methodology
I propose to develop a computer system to enable real-time online musical collaboration
between two tabla players. The principles of this application, although specific to Indian
percussion, can be extended and generalized to other instruments and cultures. This system
will be evaluated with human tabla players using the system in a live setting conducive to
musical exchange. We shall also discuss the importance of interaction through other modalities
such as speech or vision, which carry instructions, appreciative sounds and gestures, and
an "excitement factor" among the musicians and the audiences at both ends. An initial
assessment may be conducted on the importance of visual contact between musicians (in
particular tabla players) playing together, in order to evaluate the relevance of a networked
music performance system offering audio as its only communication channel.
Several risks could impede progress on this project, for instance technical difficulties such
as a subsystem not attaining the expected quality (e.g. a low tabla stroke recognition rate).
However, risk is inherent to research, and although I shall mitigate these risks as much as
possible in the course of this study, I shall not detail them further in this document.
3.2 Initial Study
In preliminary work, I demonstrated the concept presented in this document by sensing
vibrations on the tabla drumhead, analyzing stroke onsets, and transmitting the tempo and
quantized onset events over the connectionless, non-guaranteed UDP (User Datagram
Protocol) transport layer. The receiver triggered sampled tabla sounds on reception of
the events. This application was prototyped in the Max/MSP environment.
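The sending side of such a prototype can be sketched in Python. The wire format (a sequence number, the tempo, and the onset position on a sixteenth-note grid, packed into 12 bytes) and the helper names are hypothetical, not those of the Max/MSP prototype. Because UDP guarantees neither delivery nor ordering, the sequence number lets a receiver detect lost or reordered events.

```python
import socket
import struct

# Hypothetical wire format: sequence number (uint32), tempo in BPM (float32),
# onset position in grid ticks (uint32), network byte order: 12 bytes total.
EVENT_FORMAT = "!IfI"
TICKS_PER_BEAT = 4  # assumed quantization grid: sixteenth notes


def quantize_onset(onset_s: float, tempo_bpm: float,
                   ticks_per_beat: int = TICKS_PER_BEAT) -> int:
    """Round an onset time (seconds since the start) to the nearest grid tick."""
    seconds_per_tick = 60.0 / tempo_bpm / ticks_per_beat
    return round(onset_s / seconds_per_tick)


def encode_event(seq: int, tempo_bpm: float, tick: int) -> bytes:
    return struct.pack(EVENT_FORMAT, seq, tempo_bpm, tick)


def decode_event(payload: bytes) -> tuple:
    return struct.unpack(EVENT_FORMAT, payload)


def send_onsets(onsets_s, tempo_bpm, address):
    """Send each quantized onset as its own UDP datagram (best effort, no retries)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq, onset in enumerate(onsets_s):
            tick = quantize_onset(onset, tempo_bpm)
            sock.sendto(encode_event(seq, tempo_bpm, tick), address)


if __name__ == "__main__":
    # Loopback demo: a real receiver would trigger a tabla sample per event.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))  # let the OS choose a free port
        receiver.settimeout(1.0)
        send_onsets([0.0, 0.26, 0.5], tempo_bpm=120.0, address=receiver.getsockname())
        for _ in range(3):
            print(decode_event(receiver.recv(64)))
```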
3.3 System Design
The proposed system architecture is shown in the TablaNet system diagram (fig. 1). This
proposal does not go into further detail about the network infrastructure, the computer
system (a standard configuration, probably running Linux), or the audio speakers.
Figure 1: The TablaNet System Diagram (each of the two ends comprises a Tabla, Sensors, a Mixer / amplifier, a Speaker, and a Computer system; the two ends are connected through the Network)
3.4 Hardware Implementation
The TablaNet system, although mostly software-based, relies on important pieces of hardware.
To prevent the speakers, which play the audio signal received from the far-end, from feeding
back into a microphone (and thus triggering false stroke detections), I plan to use vibration
sensors (most probably piezo-electric films) placed directly on the tabla drumheads. The
outputs of these sensors will be fed into a pre-amplified mixer, chosen with the frequency
range of tabla sounds in mind, and finally into the computer’s analog-to-digital (A/D) converter.
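Once the sensor signal has been digitized, stroke onsets must be located before any classification can take place. A minimal sketch, assuming a simple envelope-follower-and-threshold detector (one of many possible onset detection methods) with illustrative parameter values; a synthetic two-stroke signal stands in for real sensor data.

```python
import math


def detect_onsets(samples, sample_rate, threshold=0.2,
                  release_s=0.010, min_gap_s=0.050):
    """Return onset times (s) at which the signal envelope rises past threshold.

    The envelope follower has an instant attack and an exponential release.
    After an onset, new onsets are ignored for min_gap_s so that one stroke
    is not reported twice.
    """
    release_coeff = math.exp(-1.0 / (release_s * sample_rate))
    envelope = 0.0
    above = False
    last_onset = -math.inf
    onsets = []
    for n, x in enumerate(samples):
        envelope = max(abs(x), release_coeff * envelope)
        t = n / sample_rate
        rising = envelope >= threshold and not above
        if rising and t - last_onset >= min_gap_s:
            onsets.append(t)
            last_onset = t
        above = envelope >= threshold
    return onsets


def synthetic_strokes(onset_times, sample_rate=8000, duration_s=0.6,
                      freq_hz=300.0, decay_s=0.02):
    """Sum of exponentially decaying sinusoids: a stand-in for a sensor signal."""
    signal = [0.0] * int(duration_s * sample_rate)
    for onset in onset_times:
        start = int(onset * sample_rate)
        for n in range(start, len(signal)):
            t = (n - start) / sample_rate
            signal[n] += math.exp(-t / decay_s) * math.sin(2 * math.pi * freq_hz * t)
    return signal


print(detect_onsets(synthetic_strokes([0.1, 0.4]), sample_rate=8000))
```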
3.5 Software Implementation
The computer program at the near-end will contain code to extract features from the audio
input, classify incoming tabla strokes based on those features, and perform higher-level
operations, such as extracting the tempo. The application will then transmit the data to the
far-end computer over the Internet. The receiver will reassemble the packets and generate a
tabla phrase in real time based on the events received so far. A major part of the work will
be designing the tabla phrase prediction algorithm. The target software environment (i.e.
language and IDE) has not yet been decided. Tabla sound synthesis at the far-end will be
based either on a physical model, which offers maximum control over the sound (e.g. pitch
slides), or on sample playback (e.g. wavetable synthesis or SoundFonts, which may offer only
limited control over some sound parameters), which would limit the effort spent on designing
a tabla sound synthesizer.
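The prediction algorithm is still to be designed. As a point of reference only, a fixed-order Markov model over bols, a common baseline for symbolic sequence prediction, can be sketched as follows; it is a swapped-in illustration, not the proposed algorithm. Here it is trained on one cycle of the teentaal theka.

```python
from collections import Counter, defaultdict


class BolPredictor:
    """Fixed-order Markov model: predicts the next bol from the last `order` bols."""

    def __init__(self, order: int = 2):
        self.order = order
        self.transitions = defaultdict(Counter)

    def observe(self, bols):
        """Count which bol follows each context of `order` consecutive bols."""
        for i in range(len(bols) - self.order):
            context = tuple(bols[i:i + self.order])
            self.transitions[context][bols[i + self.order]] += 1

    def predict(self, recent_bols):
        """Most frequent successor of the latest context, or None if unseen."""
        context = tuple(recent_bols[-self.order:])
        successors = self.transitions.get(context)
        if not successors:
            return None
        return successors.most_common(1)[0][0]


# One cycle of the teentaal theka (16 beats).
THEKA = ("Dha Dhin Dhin Dha Dha Dhin Dhin Dha "
         "Dha Tin Tin Ta Ta Dhin Dhin Dha").split()

predictor = BolPredictor(order=3)
predictor.observe(THEKA)
print(predictor.predict(["Dha", "Tin", "Tin"]))
print(predictor.predict(["Tin", "Ta", "Ta"]))
```

Contexts never seen in training return None; a real system would then need a fallback, for example retrying with a shorter context.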
4 Evaluation
4.1 Expected Contributions
I expect that my research work will result in the following contributions:
• Design a networked tabla performance system
• Develop an extensible tabla phrase prediction engine
• Implement a real-time continuous tabla stroke recognizer
• Realize a sensor interface for percussion with no audio feedback, based on an array of piezo-electric sensors placed on each tabla head and an appropriate amplifier interface
• Create a real-world musical interaction between two tabla musicians over a computer network
4.2 Quantitative Results
The system will be evaluated on the following criteria:
• Tabla stroke recognition rate, and comparison with existing systems
• One-way and round-trip time delay (network latency), and comparison with the allowable perceptual maximum
• Tabla phrase prediction error rate
• Output audio quality as rated by listeners (non-performers), based on a statistical perceptual assessment
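The first and third criteria reduce to simple counts once each recognized (or predicted) bol is paired with its reference label. A minimal sketch, assuming the two sequences are already aligned stroke by stroke; a continuous recognizer that inserts or drops strokes would first need an alignment step, such as edit-distance alignment. The example labels are hypothetical.

```python
from collections import Counter


def recognition_rate(reference, recognized):
    """Fraction of strokes whose recognized bol matches the reference label."""
    if len(reference) != len(recognized) or not reference:
        raise ValueError("expected two non-empty, stroke-aligned sequences")
    correct = sum(ref == hyp for ref, hyp in zip(reference, recognized))
    return correct / len(reference)


def confusion_counts(reference, recognized):
    """Count (reference, recognized) pairs; pairs with different bols are errors."""
    return Counter(zip(reference, recognized))


reference = ["Na", "Tin", "Ge", "Ka", "Na"]
recognized = ["Na", "Na", "Ge", "Ka", "Na"]
print(recognition_rate(reference, recognized))
print(confusion_counts(reference, recognized)[("Tin", "Na")])
```

The same function gives a prediction error rate as `1 - recognition_rate(actual_bols, predicted_bols)`.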
4.3 Qualitative Results
In addition to the quantitative assessment, we will examine the system’s "playability" by
tabla players of various levels (beginner: less than 1 year of experience; intermediate: 1 to
3 years of experience; expert: more than 3 years of experience). Experiments will involve
activities in the following areas:
• Distance learning
• Rhythmic accompaniment
• Call and response (called Jugalbandi)
Network latency will be simulated using median and worst-case figures. After playing on the
system for various periods of time, the tabla players at both ends, as well as the audience, will
be asked whether the system meets their expectations in terms of how "natural" the rhythmic
patterns (variety, quantization, etc.) and the audio output sound. Results will be collected in
the form of a survey and evaluated with a formal quantitative coding system for qualitative
data. I hope that the prototype will give musicians the impression of playing with a fellow
musician, rather than just playing with (or against) a machine. Questionnaire responses will
be included as an appendix to my master’s thesis.
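The latency simulation could be sketched as a link that holds each event until its send time plus a fixed one-way delay, run once with the median and once with the worst-case figure. The class and function names and the RTT samples are hypothetical, and deriving one-way delay as half the round-trip time assumes a symmetric path.

```python
import heapq
import itertools
import statistics


def one_way_figures(rtt_samples_ms):
    """Median and worst-case one-way delay, assuming one-way = RTT / 2."""
    one_way = [rtt / 2 for rtt in rtt_samples_ms]
    return statistics.median(one_way), max(one_way)


class SimulatedLink:
    """Delivers each event a fixed one-way delay after it was sent."""

    def __init__(self, one_way_delay_ms: float):
        self.delay_ms = one_way_delay_ms
        self._queue = []                  # heap of (arrival_ms, order, payload)
        self._order = itertools.count()   # tie-breaker: payloads are never compared

    def send(self, now_ms: float, payload) -> None:
        heapq.heappush(self._queue, (now_ms + self.delay_ms, next(self._order), payload))

    def receive(self, now_ms: float) -> list:
        """Pop every payload whose arrival time is at or before now_ms."""
        delivered = []
        while self._queue and self._queue[0][0] <= now_ms:
            delivered.append(heapq.heappop(self._queue)[2])
        return delivered


median_ms, worst_ms = one_way_figures([80, 100, 120, 300])  # hypothetical RTTs
for label, delay in (("median", median_ms), ("worst case", worst_ms)):
    link = SimulatedLink(delay)
    link.send(0, "Dha")
    link.send(125, "Dhin")
    print(label, delay, link.receive(100), link.receive(400))
```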
5 Planning
5.1 Deliverables
The deliverables for this project fall under two categories: a working prototype suitable
for live demonstration and simulation (i.e. one tabla player versus the computer), and a
technical description of my work in the form of my master’s thesis, which will document the
design choices, implementation details, and results of this study. In addition, I intend to
present the results of this research at appropriate venues (e.g. the Conference on Human
Factors in Computing Systems (CHI), the Audio Engineering Society (AES) Convention, the
International Conference on New Interfaces for Musical Expression (NIME), or the Sound
and Music Computing (SMC) Conference). I will also produce a short audio/video segment
to illustrate various usage scenarios of the system in action (e.g. rhythmic accompaniment,
call and response).
5.2 Schedule
January
• Background research
• Preliminary tabla strokes dataset collection
• Discrete tabla strokes identification (offline simulation)
• COUHES¹ application for data gathering and system testing
February
• Sensor interface design and development
• Complete tabla strokes dataset collection
• Continuous tabla strokes identification (real-time processing)
• Article on TablaNet system architecture
March
• User interface and system prototyping
• Networked musical collaboration environment
• Tabla sound synthesis (sample playback)
• Master’s thesis first draft
April
• Learning and prediction of tabla performance
• Tabla sound synthesis (physical model)
• System testing and evaluation
• Master’s thesis review and final draft
May
• Video footage and production
• Prototype demonstration
• Master’s thesis submission
• Article on tabla strokes identification and phrase prediction
¹ MIT Committee on the Use of Humans as Experimental Subjects
5.3 Resources
The resources required to carry out this project are:
• A tabla set (available from Prof. Barry Vercoe)
• Microphone, pre-amplifier, audio cables (available)
• 2 audio speakers (to be procured through an internal channel)
• 2 computers for demonstration (to be procured through an internal channel)
• Development platforms (Mac OS X and Windows XP available, Linux to be installed)
• Audio software and development environment (partially available)
• Vibration (piezo) sensors (partially available)
• Electronic parts for sensor interface and pre-amplifier (to be purchased)
• Participation incentives (gift coupons or the like) for dataset gathering and system testing
As far as recording tabla strokes and testing the system are concerned, I have access to a
relatively large number of tabla players of various levels at the Media Lab, and through
Sangam (the MIT Indian students association) and the music school of Sangeet, a Harvard
University student-run organization dedicated to South Asian music. In addition, several
Media Lab students can help me with recording and editing the video footage.
References
R. Bargar, S. Church, A. Fukuda, J. Grunke, D. Keislar, B. Moses, B. Novak, B. Pennycook, Z. Settel, J. Strawn, et al. AES white paper: Networking audio and music using Internet2 and next-generation Internet capabilities. Technical report, AES: Audio Engineering Society, 1998.
C. Chafe. Statistical Pattern Recognition for Prediction of Solo Piano Performance. In Proc. ICMC, Thessaloniki, 1997.
C. Chafe. Distributed Internet Reverberation for Audio Collaboration. In AES (Audio Engineering Society) 24th Int’l Conf. on Multichannel Audio, 2003.
C. Chafe, M. Gurevich, G. Leslie, and S. Tyan. Effect of Time Delay on Ensemble Accuracy. In Proceedings of the International Symposium on Musical Acoustics, 2004.
A. Chatwani and A. Koren. Optimization of Audio Streaming for Wireless Networks. Technical report, Princeton University, 2004.
A.A. Chatwani. Real-Time Recognition of Tabla Bols. Senior Thesis, Princeton University, May 2003.
E. Chew, R. Zimmermann, A.A. Sawchuk, C. Kyriakakis, C. Papadopoulos, A.R.J. François, G. Kim, A. Rizzo, and A. Volk. Musical Interaction at a Distance: Distributed Immersive Performance. In Proceedings of the MusicNetwork Fourth Open Workshop on Integration of Music in Multimedia Applications, September, pages 15–16, 2004.
P. Chordia. Segmentation and Recognition of Tabla Strokes. In Proc. of ISMIR (International Conference on Music Information Retrieval), 2005.
J.R. Cooperstock and S.P. Spackman. The Recording Studio that Spanned a Continent. In Proc. of IEEE International Conference on Web Delivering of Music (WEDELMUSIC), 2001.
O.K. Gillet and G. Richard. Automatic Labelling of Tabla Signals. In Proc. of the 4th ISMIR Conf., 2003.
X. Gu, M. Dick, U. Noyer, and L. Wolf. NMP - a new networked music performance system. In Global Telecommunications Conference Workshops, IEEE, pages 176–185, 2004.
J. Hun Roh and L. Wilcox. Exploring Tabla Drumming Using Rhythmic Input. In CHI’95 proceedings, 1995.
A. Kapur, G. Essl, P. Davidson, and P.R. Cook. The Electronic Tabla Controller. Journal of New Music Research, 32(4):351–359, 2003a.
A. Kapur, G. Wang, P. Davidson, P.R. Cook, D. Trueman, T.H. Park, and M. Bhargava. The Gigapop Ritual: A Live Networked Performance Piece for Two Electronic Dholaks, Digital Spoon, DigitalDoo, 6 String Electric Violin, Rbow, Sitar, Tabla, and Bass Guitar. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), Montreal, 2003b.
A. Kapur, P. Davidson, P.R. Cook, P. Driessen, and A. Schloss. Digitizing North Indian Performance. In Proceedings of the International Computer Music Conference, 2004.
A. Kapur, G.E. Wang, P. Davidson, and P.R. Cook. Interactive Network Performance: a dream worth dreaming? Organised Sound, 10(03):209–219, 2005.
J. Kippen and B. Bel. Modelling Music with Grammars: Formal Language Representation in the Bol Processor. Computer Representations and Models in Music, Ac. Press ltd, pages 207–232, 1992.
J. Kippen and B. Bel. Computers, Composition and the Challenge of "New Music" in Modern India. Leonardo Music Journal, 4:79–84, 1994.
J. Lazzaro and J. Wawrzynek. A case for network musical performance. In Proceedings of the 11th international workshop on Network and operating systems support for digital audio and video, pages 157–166. ACM Press, New York, NY, USA, 2001.
T. Maki-Patola. Musical Effects of Latency. Suomen Musiikintutkijoiden, 9:82–85, 2005.
K. Samudravijaya, S. Shah, and P. Pandya. Computer Recognition of Tabla Bols. Technical report, Tata Institute of Fundamental Research, 2004.
A.A. Sawchuk, E. Chew, R. Zimmermann, C. Papadopoulos, and C. Kyriakakis. From remote media immersion to Distributed Immersive Performance. In Proceedings of the 2003 ACM SIGMM workshop on Experiential telepresence, pages 110–120. ACM Press, New York, NY, USA, 2003.
G. Weinberg. Interconnected Musical Networks: Toward a Theoretical Framework. Computer Music Journal, 29(2):23–39, 2005a.
G. Weinberg. Local Performance Networks: musical interdependency through gestures and controllers. Organised Sound, 10(03):255–265, 2005b.
Outside Reader Biography
Miller S. Puckette
Miller Puckette obtained a B.S. in Mathematics from MIT (1980) and Ph.D. in Mathematics from
Harvard (1986). Puckette was a member of MIT’s Media Lab from its inception until 1987, and then
a researcher at IRCAM (Institut de Recherche et de Coordination Acoustique/Musique, founded by
composer and conductor Pierre Boulez). There he wrote the Max program for Macintosh computers,
which was first distributed commercially by Opcode Systems in 1990 and is now available from
Cycling ’74. In 1989 Puckette joined IRCAM’s "musical workstation" team and put together an
enhanced version of Max, called Max/FTS, for the ISPW system, which was commercialized by
Ariel, Inc. This system became a widely used platform in computer music research and production
facilities. The IRCAM real-time development team has since reimplemented and extended this
software under the name jMax, which is distributed free with source code.
Puckette joined the Music department of the University of California, San Diego in 1994, and is now
Associate Director of the Center for Research in Computing and the Arts (CRCA). He is currently
working on a new real-time software system for live musical and multimedia performances called
Pure Data ("Pd"), in collaboration with many other artists/researchers/programmers worldwide.
Pd is free and runs on Linux, IRIX, and Windows systems. Since 1997 Puckette has also been part
of the Global Visual Music project with Mark Danks, Rand Steiger, and Vibeke Sorensen, which
has been generously supported by a grant from the Intel Research Council.