[studies in computational intelligence] perception-based data processing in acoustics volume 3 ||...
1 INTRODUCTION
Over the last decade, a series of publications has established new
research areas related to music and has intensified research that cuts
across several disciplines typically dealt with separately. The explosion
of collaboration and competition was triggered by the Internet revolution.
Research achievements published on the Internet, along with audio and
video available through it, have made research more efficient. This
creates enormous possibilities and synergy. Standards are also more easily
defined and implemented. On the other hand, searching the content of
Internet resources calls for new solutions, most probably in the form of
new standards and technology.
Among the newly emerging areas are Music Information Retrieval (MIR),
Semantic Audio Analysis (SAA), music ontology, and many others. Music
Information Retrieval refers to data extraction and retrieval from musical
databases found on the Internet. The strategic plans of MIR have been
defined and redefined many times. Strong collaboration, and at the same
time strong competition, have yielded solutions to many problems defined
within the scope of MIR and have overcome some of the largest obstacles in
this field. In addition, these problems have been addressed by technology,
so no research plans have been immune to the demands of an increasingly
competitive technology environment.
There exist several definitions of semantic audio analysis. According to
one of them, SAA means the extraction of features from audio (live or
recorded) that either have some relevance to humans (e.g. rhythm, notes,
phrases) or have some physical correlate (e.g. musical instruments). This
may be treated as complementary to human-entered metadata. To
differentiate it from human-entered metadata, semantic data can be
regarded as a form of ‘technical metadata’ which can accompany a recording
or broadcast. Thus metadata are important elements of SAA, and should
cover both the extraction of features and their semantic representation.
This book will highlight examples where SAA can supplement interactions
with music and audio.
Human communication includes the capability of recognition. This is
particularly true of auditory communication. Information retrieval can be
investigated with cognitive systems engineering methodologies. Music
information retrieval turns out to be particularly challenging, since many
problems remain unsolved to this day.
Bozena Kostek: Perception-Based Data Processing in Acoustics, Studies in Computational Intelligence (SCI) 3, 1–5 (2005) www.springerlink.com © Springer-Verlag Berlin Heidelberg 2005
Topics that should be included within the scope of the aforementioned
areas are: automatic classification of musical instrument sounds and
musical phrases/styles, music representation and indexing, estimating
musical similarity using both perceptual and musicological criteria,
recognizing music using audio and/or semantic description, building up
musical databases, evaluation of MIR systems, intellectual property rights
issues, user interfaces, issues related to musical styles and genres,
language modeling for music, user needs and expectations, auditory scene
analysis, gesture control over musical works, and many others. Some topics
contained within the notion of MIR are covered by the MPEG-7 standard,
which provides a description of multimedia content in order to support
better interpretation of information.
It should be stressed that solving these problems requires human
assistance. Many features of multimedia content description are based on
perceptual phenomena and cognition. The description format, both numerical
and categorical, is prepared on the basis of an understanding of the
problem area. Information retrieval systems are expected to return
documents that exactly match the cues given in the user query. However,
the operations behind the query do not always produce responses that serve
the user’s interest. This means that retrieving multimedia content on the
basis of descriptors would also require human assistance. Decision systems
may produce numerous rules generated in the mining process, which makes
post-processing of the generated rules necessary. Another problem which
needs attention is the processing of unknown or missing attribute values,
or incomplete data, when acquiring knowledge from databases. To improve
information retrieval quality, various strategies have been proposed and
used, such as probabilistic, clustering and intelligent retrieval. The
latter technique often uses concept analysis requiring semantic
calculations.
The MPEG-7 standard refers to metadata information contained in Internet
archives. This notion is often applied to the value-added information
created to describe and track objects, and to allow access to those
information objects. In this context, well-defined descriptors provide the
means for better computing, improved user interfacing and data management.
It can easily be observed that these low-level descriptors are more data-
than human-oriented. This is because the idea behind this standard is to
have data defined and linked in such a way that it can be used for more
effective automatic discovery, integration, and re-use in various
applications. The most ambitious task, however, is to give seamless
meaning to low- and high-level descriptors. In this way data can be
processed and shared by both systems and people.
There seems to exist a way to change primitives into higher abstraction
levels, namely semantics. Among the most interesting concepts are the
so-called ‘computing with words’ introduced by Zadeh, and perception-based
data processing, which refer to the fact that humans employ words in
computing and reasoning, arriving at conclusions expressed as words from
premises formulated in a natural language. Computing with words can be a
necessity when the available information is too imprecise to justify the
use of numbers, or can be the right solution when it is in better rapport
with reality. It seems that this paradigm of computing can be used with
success in music information retrieval, as it offers better processing of
subjective descriptors of musical instrument sounds and enables the
analysis of data that results in a new way of describing musical
instrument sounds. An example of such processing was recently introduced
by the author. It was proposed that categorical notions would be
quantities partitioned by using fuzzy logic. Lately, Zadeh presented an
overview of fuzzy logic defined in a computational rather than a logical
sense. In his overview he suggested that fuzzy logic has four principal
aspects. The first one refers to fuzzy logic understood in the narrow
sense, that is, as the logic of approximate reasoning. The second aspect
is related to classes that have unsharp boundaries. The third one is
concerned with linguistic variables, which appear in fuzzy rules intended
for control applications and decision analysis. The
fourth aspect, a so-called epistemic facet, is related to knowledge
processing, meaning and linguistics. Applications related to the
last-mentioned aspect are based on the concept of granularity, which
reflects the ability of human sensory organs and brain to process
perception-based information. Existing theories, especially probability
theory, do not have the capability to operate on such information, thus
the development of the methodology of computing with words is considered
by Zadeh to be an important event in the evolution of fuzzy logic.
It may be observed that musical object classification using learning
algorithms mimics human reasoning. These algorithms constitute a way to
handle uncertainties in musical data, so they are especially valuable in
domains in which there is a problem of imprecision and a need for
knowledge mining. Such algorithms often need human supervisory control,
thus user modeling is also necessary for retrieval processes. This remark
refers to both rule-based systems and neural networks in which an expert
controls the algorithm settings and the choice of feature vectors.
The research studies, introduced and examined in this book, often
represent hybrids of various disciplines. They apply soft computing
methods to selected problems in musical acoustics and psychophysiology.
These are discussed on the basis of the research carried out in the MIR
community,
as well as on the results of experiments performed at the Multimedia
Systems Department of Gdańsk University of Technology. The topics
presented in this work include automatic recognition of musical
instruments and audio signals, separation of duets, and processing of
musical data in search of correlations between subjective terms and
objective measures. The classification process is shown as a three-layer
process consisting of pitch extraction, parametrization and pattern
recognition. Artificial Neural Networks (ANNs) and a rough set-based
system are employed as decision systems, and they are trained with a set
of feature vectors (FVs) extracted from musical sounds recorded at the
Multimedia Systems Department and from others available in the MIR
community. Genetic algorithms were also applied in musical sound
classification.
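The three-layer classification process named above (pitch extraction, parametrization, pattern recognition) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the book: the tones are synthesized rather than recorded, pitch is estimated by a plain autocorrelation peak, the feature vector holds the relative amplitudes of the first four harmonics, and a nearest-prototype rule stands in for the ANN and rough set decision systems. The names `flute_like` and `clarinet_like` are hypothetical labels.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed value for this toy example


def synth_tone(f0, harmonic_amps, n_samples=2048):
    """Synthesize a harmonic tone; a stand-in for a recorded instrument sound."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * n / SAMPLE_RATE)
            for k, a in enumerate(harmonic_amps))
        for n in range(n_samples)
    ]


def estimate_pitch(signal, fmin=80.0, fmax=1000.0):
    """Layer 1: pitch extraction via the lag of the autocorrelation maximum."""
    min_lag = int(SAMPLE_RATE / fmax)
    max_lag = int(SAMPLE_RATE / fmin)

    def autocorr(lag):
        return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return SAMPLE_RATE / best_lag


def harmonic_features(signal, f0, n_harmonics=4):
    """Layer 2: parametrization as relative amplitudes of the first harmonics."""
    amps = []
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k * f0 / SAMPLE_RATE
        re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
        im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
        amps.append(math.hypot(re, im))
    total = sum(amps) or 1.0
    return [a / total for a in amps]


def classify(fv, prototypes):
    """Layer 3: pattern recognition; nearest prototype (not an ANN or rough sets)."""
    return min(prototypes, key=lambda label: math.dist(fv, prototypes[label]))


def extract_fv(signal):
    """Run layers 1 and 2 and return (estimated pitch, feature vector)."""
    f0 = estimate_pitch(signal)
    return f0, harmonic_features(signal, f0)


# Hypothetical harmonic profiles used only to generate toy training data.
PROFILES = {
    "flute_like": [1.0, 0.2, 0.05, 0.02],
    "clarinet_like": [1.0, 0.05, 0.6, 0.05],
}

# "Training": one prototype feature vector per class, from tones at 200 Hz.
prototypes = {label: extract_fv(synth_tone(200.0, amps))[1]
              for label, amps in PROFILES.items()}

# "Testing": a tone at a different pitch should still be recognized.
test_f0, test_fv = extract_fv(synth_tone(250.0, PROFILES["clarinet_like"]))
predicted = classify(test_fv, prototypes)
print(round(test_f0, 1), predicted)
```

Normalizing the harmonic amplitudes makes the feature vector largely independent of pitch and loudness, which is why a prototype built at one pitch can be matched against a tone at another. A real system would need far richer features and a trained decision system.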
This book starts with a chapter that focuses on the perceptual bases of
hearing and music perception. The next chapter reviews some selected soft
computing methods along with the application of these intelligent
computational techniques to various problems within MIR, beginning with
neural networks and rough set theory, and including evolutionary
computation and some other techniques. In addition, a review of the
discretization methods which are used in rough set algorithms is given.
The discretization process is aimed at replacing specific data values with
the numbers of the intervals to which they belong. Within this chapter,
methods of sound parametrization are also discussed. This chapter aims at
presenting only the main concepts of the methods mentioned, since the
details are extensively covered in a vast selection of literature. The
next chapter deals with musical signal separation; its second part
introduces musical phrase analysis, while the third one is focused on
metadata analysis. The Frequency Envelope Distribution (FED) algorithm,
which was introduced for the purpose of musical duet separation, is
presented. The effectiveness of the FED algorithm is checked by means of
neural networks (NNs), which are tested on feature vectors (FVs) derived
from musical sounds after the separation process has been performed. The
experimental results are shown and discussed.
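The discretization step mentioned above, in which a data value is replaced by the number of the interval it falls into, can be sketched as follows. This is only an illustration under stated assumptions: equal-width cut points are just one simple way of choosing intervals and are not claimed to be the method reviewed in the book, and the attribute values (imagined here as spectral centroids in Hz) are made up.

```python
from bisect import bisect_right


def equal_width_cuts(values, n_intervals):
    """Return the interior cut points that split [min, max] into equal intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    return [lo + i * width for i in range(1, n_intervals)]


def discretize(value, cuts):
    """Replace a value with the index of the interval to which it belongs."""
    return bisect_right(cuts, value)


# Hypothetical attribute values, e.g. spectral centroids in Hz.
centroids = [310.0, 450.0, 700.0, 980.0, 1210.0]

cuts = equal_width_cuts(centroids, 3)            # -> [610.0, 910.0]
codes = [discretize(v, cuts) for v in centroids]  # -> [0, 0, 1, 2, 2]
print(cuts, codes)
```

After this step a rule-based system works with a handful of interval numbers per attribute instead of arbitrary real values, which keeps the number of distinct attribute values, and hence of candidate rules, manageable.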
The next chapter deals with the applications of hybrid intelligent
techniques to acoustics, and introduces research based on a cognitive
approach to acoustic signal analysis. This chapter starts with a short
review of fuzzy set theory. It is followed by a presentation of the
acquisition of subjective test results and their processing in the context
of perception. Evaluation of hearing impairment based on a fuzzy-rough
approach is presented within this chapter. An overview of the experiments
is included, with more detailed descriptions available in some of the
cited papers by the author and her team. In addition, the topic of
processing acoustic signals based on beamforming techniques and neural
networks is presented using the cognitive bases of binaural hearing.
Another topic, related to audio-visual correlation, is the subject of the
following chapter. Once again, a hybrid approach is introduced to process
audio-visual signals.
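The fuzzy set theory reviewed in that chapter, and the earlier remark that categorical notions can be treated as quantities partitioned by fuzzy logic, can be made concrete with a small sketch. The descriptor ("brightness" on a normalized 0 to 1 scale), the three labels and the breakpoints 0.2, 0.5 and 0.8 are hypothetical choices for illustration only, not values from the book.

```python
def left_shoulder(x, full, zero):
    """Membership 1 up to `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def right_shoulder(x, zero, full):
    """Membership 0 up to `zero`, rising linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def triangular(x, a, b, c):
    """Membership rising from a to a peak at b, then falling to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy partition of a normalized "brightness" descriptor.
PARTITION = {
    "dark": lambda x: left_shoulder(x, 0.2, 0.5),
    "neutral": lambda x: triangular(x, 0.2, 0.5, 0.8),
    "bright": lambda x: right_shoulder(x, 0.5, 0.8),
}


def fuzzify(x):
    """Map a numeric value to its degree of membership in each label."""
    return {label: mu(x) for label, mu in PARTITION.items()}


def best_label(x):
    """Pick the linguistic label with the highest membership degree."""
    degrees = fuzzify(x)
    return max(degrees, key=degrees.get)


print(fuzzify(0.35))
print(best_label(0.7))
```

Unlike a crisp discretization, a value near a boundary belongs partly to two neighbouring labels, which is one way to represent the unsharp class boundaries that subjective descriptors of sound tend to have.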
The last chapter outlines the concluding remarks which may be derived
from the research studies carried out by the team of researchers and
students of the Multimedia Systems Department, Gdańsk University of
Technology. An integral part of each chapter is a list of references,
which provides additional details related to the problems presented in the
consecutive book sections.