Perception-Based Data Processing in Acoustics (Studies in Computational Intelligence, Volume 3)

1 INTRODUCTION

Over the last decade, a series of publications has established new research areas related to music and intensified research that crosses several disciplinary boundaries, which are typically dealt with separately. This explosion of collaboration and competition was triggered by the Internet revolution. Research results published on the Internet, along with audio and video available through it, have made research more efficient, creating enormous possibilities and synergy. Standards are also more easily defined and implemented. On the other hand, searching the content of Internet resources calls for new solutions in response, most probably in the form of new standards and technology.

Among the newly emerging areas are Music Information Retrieval (MIR), Semantic Audio Analysis (SAA), music ontology, and many others. Music Information Retrieval refers to the extraction and retrieval of data from musical databases found on the Internet. The strategic plans of MIR have been defined and redefined many times. Strong collaboration, together with strong competition, has produced solutions to many problems defined within the scope of MIR and has overcome some of the largest obstacles in this field. These problems have also been addressed by technology, so no research plan has been immune to the demands of an increasingly competitive technology environment.

There are several definitions of semantic audio analysis. According to one of them, SAA means the extraction of features from audio (live or recorded) that have either some relevance to humans (e.g. rhythm, notes, phrases) or some physical correlate (e.g. musical instruments). Such features may be treated as complementary to human-entered metadata. To differentiate it from human-entered metadata, semantic data can be regarded as a form of ‘technical metadata’, which can accompany a recording or broadcast. Thus metadata are important elements of SAA and should cover both the extraction of features and their semantic representation. This book highlights examples where SAA can supplement interactions with music and audio.
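The distinction between human-entered and technical metadata can be made concrete with a small sketch. The Python code below is a minimal illustration, assuming a mono signal stored as a list of floats; it derives two simple physically grounded features (RMS level and zero-crossing rate) and stores them next to human-entered fields. The `Recording` record and its field names are hypothetical and do not follow any particular metadata standard.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Recording:
    """Hypothetical record pairing human-entered and technical metadata."""
    human: dict                                    # typed in by a person (title, genre, ...)
    technical: dict = field(default_factory=dict)  # derived automatically from the audio


def rms(samples):
    """Root-mean-square level of the signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def annotate(recording, samples):
    """Attach extracted features to the recording as technical metadata."""
    recording.technical["rms"] = rms(samples)
    recording.technical["zero_crossing_rate"] = zero_crossing_rate(samples)
    return recording


if __name__ == "__main__":
    sr = 8000  # sampling rate in Hz
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s of A4
    rec = annotate(Recording(human={"title": "A4 test tone", "genre": "test"}), tone)
    print(rec.human, rec.technical)
```

The human-entered fields are never touched by `annotate`, which keeps the two kinds of metadata separable.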

Human communication includes the capability of recognition, and this is particularly true of auditory communication. Information retrieval can be investigated with cognitive systems engineering methodologies. Music information retrieval turns out to be particularly challenging, since many of its problems remain unsolved to this day.

Bożena Kostek: Perception-Based Data Processing in Acoustics, Studies in Computational Intelligence (SCI) 3, 1–5 (2005). www.springerlink.com © Springer-Verlag Berlin Heidelberg 2005

Topics that fall within the scope of the aforementioned areas include automatic classification of musical instrument sounds and musical phrases/styles, music representation and indexing, estimation of musical similarity using both perceptual and musicological criteria, recognition of music using audio and/or semantic descriptions, building musical databases, evaluation of MIR systems, intellectual property rights, user interfaces, musical styles and genres, language modeling for music, user needs and expectations, auditory scene analysis, gesture control over musical works, and many others. Some topics contained within the notion of MIR are covered by the MPEG-7 standard, which provides descriptions of multimedia content in order to support better interpretation of information.

It should be stressed that solving these problems requires human assistance. Many features of multimedia content description are based on perceptual phenomena and cognition, and the format description, both numerical and categorical, is prepared on the basis of an understanding of the problem area. Information retrieval systems are expected to return documents that exactly match the cues contained in the user query. However, the operations behind the query do not always respond well to the user's actual interest. This means that retrieving multimedia content on the basis of descriptors also requires human assistance. Moreover, decision systems may produce numerous rules in the mining process, so the generated rules must undergo post-processing.
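One simple form of rule post-processing is to discard rules whose support or confidence falls below chosen thresholds and to rank the remainder. The sketch below assumes each mined rule is kept as a (condition, decision, support, confidence) tuple; this representation, the `prune_rules` helper and its default thresholds are hypothetical and only illustrate the idea.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule representation produced by a mining step."""
    condition: str     # e.g. "brightness=high AND attack=fast"
    decision: str      # e.g. "instrument=trumpet"
    support: int       # number of training objects matching the condition
    confidence: float  # fraction of those objects that have this decision


def prune_rules(rules, min_support=5, min_confidence=0.8):
    """Drop weak rules, then sort the rest from strongest to weakest."""
    kept = [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]
    return sorted(kept, key=lambda r: (r.confidence, r.support), reverse=True)


if __name__ == "__main__":
    mined = [
        Rule("brightness=high AND attack=fast", "trumpet", 12, 0.92),
        Rule("brightness=low", "clarinet", 3, 1.00),   # too rare
        Rule("attack=slow", "violin", 30, 0.55),       # too unreliable
        Rule("vibrato=yes AND attack=slow", "violin", 18, 0.97),
    ]
    for rule in prune_rules(mined):
        print(rule.condition, "->", rule.decision)
```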

Another problem that needs attention is the processing of unknown or missing attribute values, or of incomplete data, when acquiring knowledge from databases. To improve information retrieval quality, various strategies have been proposed and used, such as probabilistic, clustering and intelligent retrieval. The latter technique often uses concept analysis, which requires semantic calculations.
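As an illustration of handling missing attribute values, the sketch below fills each unknown value (marked `None`) with the most frequent value of that attribute among objects of the same decision class. This is only one simple, widely used strategy chosen here for brevity; it is an assumption, not the method developed in this book, and rough set-based approaches may treat missing values quite differently.

```python
from collections import Counter


def impute_by_class(rows, decision_key="class"):
    """Fill None attribute values with the most frequent value of that
    attribute among objects of the same decision class.

    rows: list of dicts (attribute -> value); the decision attribute is
    stored under decision_key. The input rows are left unchanged.
    """
    # Count observed (non-missing) values per (class, attribute) pair.
    counts = {}
    for row in rows:
        cls = row[decision_key]
        for attr, value in row.items():
            if attr != decision_key and value is not None:
                counts.setdefault((cls, attr), Counter())[value] += 1

    filled = []
    for row in rows:
        new_row = dict(row)
        for attr, value in row.items():
            if attr == decision_key or value is not None:
                continue
            counter = counts.get((row[decision_key], attr))
            if counter:  # leave the gap if the class never shows this attribute
                new_row[attr] = counter.most_common(1)[0][0]
        filled.append(new_row)
    return filled


if __name__ == "__main__":
    table = [
        {"class": "violin", "brightness": "high", "attack": "slow"},
        {"class": "violin", "brightness": "high", "attack": None},
        {"class": "trumpet", "brightness": None, "attack": "fast"},
        {"class": "trumpet", "brightness": "high", "attack": "fast"},
    ]
    for row in impute_by_class(table):
        print(row)
```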

The MPEG-7 standard refers to metadata information contained in Internet archives. This notion is often applied to value-added information created to describe and track objects and to allow access to those information objects. In this context, well-defined descriptors provide the means for better computing, improved user interfaces and better data management. It can easily be observed that the low-level descriptors are more data-oriented than human-oriented. This is because the idea behind the standard is to have data defined and linked in such a way that they can be used for more effective automatic discovery, integration and re-use in various applications. The most ambitious task, however, is to provide seamless meaning to both low- and high-level descriptors, so that data can be processed and shared by both systems and people.
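A typical low-level, data-oriented audio descriptor is the spectral centroid, the magnitude-weighted mean frequency of a signal frame. The Python sketch below computes it with a naive DFT, assuming a frame given as a list of floats. MPEG-7 defines a related AudioSpectrumCentroid descriptor on a logarithmic frequency scale with its own normative details; the linear-frequency version here is a simplified stand-in, not the standard's definition.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, adequate for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz (0.0 for silence)."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, mags)) / total


if __name__ == "__main__":
    sr, n = 8000, 256
    frame = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(n)]
    print(round(spectral_centroid(frame, sr), 3))
```

The naive DFT costs O(N²) operations; a practical implementation would use an FFT instead.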

There seems to be a way to turn primitives into higher levels of abstraction, namely semantics. Among the most interesting concepts are ‘computing with words’, introduced by Zadeh, and perception-based data processing, both of which refer to the fact that humans employ words in computing and reasoning, arriving at conclusions expressed in words from premises formulated in a natural language. Computing with words can be a necessity when the available information is too imprecise to justify the use of numbers, or it can be the right solution when it is in better rapport with reality. This paradigm of computing seems applicable with success to music information retrieval, as it offers better processing of subjective descriptors of musical instrument sounds and enables the analysis of data that results in a new way of describing these sounds. An example of such processing was recently introduced by the author, who proposed that categorical notions be treated as quantities partitioned using fuzzy logic. More recently, Zadeh presented an overview of fuzzy logic defined in a computational rather than a logical sense. In this overview he suggested that fuzzy logic has four principal aspects. The first is fuzzy logic in the narrow sense, i.e. the logic of approximate reasoning. The second is related to classes with unsharp boundaries. The third is concerned with linguistic variables, which appear in fuzzy rules designed for control applications and decision analysis. The fourth, the so-called epistemic facet, is related to knowledge processing, meaning and linguistics. Applications related to this last aspect are based on the concept of granularity, which reflects the ability of the human sensory organs and brain to process perception-based information. Existing theories, especially probability theory, lack the capability to operate on such information; therefore, Zadeh considers the development of the methodology of computing with words to be an important event in the evolution of fuzzy logic.

It may be observed that musical object classification using learning algorithms mimics human reasoning. These algorithms offer a way to handle uncertainties in musical data, so they are especially valuable in domains where data are imprecise and knowledge mining is needed. Such algorithms often require human supervisory control, so user modeling is also necessary for retrieval processes. This remark applies to both rule-based systems and neural networks, in which an expert controls the algorithm settings and the choice of feature vectors.

The research studies introduced and examined in this book often represent hybrids of various disciplines. They apply soft computing methods to selected problems in musical acoustics and psychophysiology. These are discussed on the basis of research carried out in the MIR community, as well as the results of experiments performed at the Multimedia Systems Department of Gdańsk University of Technology. The topics presented in this work include automatic recognition of musical instruments and audio signals, separation of duets, and processing of musical data in search of correlations between subjective terms and objective measures. The classification process is shown as a three-layer process consisting of pitch extraction, parametrization and pattern recognition. Artificial Neural Networks (ANNs) and a rough set-based system are employed as decision systems; they are trained with sets of feature vectors (FVs) extracted from musical sounds recorded at the Multimedia Systems Department, as well as from others available in the MIR community. Genetic algorithms were also applied to musical sound classification.
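The three-layer classification process (pitch extraction, parametrization, pattern recognition) can be sketched as a toy pipeline. Everything here is an assumption made for illustration: pitch is taken from the autocorrelation peak, the feature vector holds two hypothetical parameters (pitch in octaves relative to A4 and the crest factor), and a nearest-neighbour rule stands in for the ANN and rough set-based decision systems used in the book.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Layer 1: pitch from the lag that maximizes the (unnormalized) autocorrelation."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)

    def autocorr(lag):
        return sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag


def parametrize(samples, sample_rate):
    """Layer 2: hypothetical two-element feature vector.

    (pitch in octaves relative to A4 = 440 Hz, crest factor = peak / RMS)
    """
    pitch = estimate_pitch(samples, sample_rate)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crest = max(abs(s) for s in samples) / rms
    return (math.log2(pitch / 440.0), crest)


def classify(vector, references):
    """Layer 3: nearest-neighbour decision; references are (label, vector) pairs."""
    label, _ = min(references, key=lambda item: math.dist(vector, item[1]))
    return label


if __name__ == "__main__":
    sr, n = 8000, 1024
    sine = [math.sin(2 * math.pi * 200 * t / sr) for t in range(n)]
    square = [1.0 if (t // 20) % 2 == 0 else -1.0 for t in range(n)]  # 200 Hz at 8 kHz
    refs = [
        ("sine-like", (math.log2(200 / 440), math.sqrt(2))),
        ("square-like", (math.log2(200 / 440), 1.0)),
    ]
    print(classify(parametrize(sine, sr), refs), classify(parametrize(square, sr), refs))
```

The unnormalized autocorrelation is biased towards shorter lags, so on harmonic-rich sounds it may report octave errors; it is only adequate for clean test signals like these.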

This book starts with a chapter that focuses on the perceptual bases of hearing and music perception. The next chapter reviews selected soft computing methods along with the application of these intelligent computational techniques to various problems within MIR, beginning with neural networks and rough set theory, and including evolutionary computation and some other techniques. In addition, a review of the discretization methods used in rough set algorithms is given. The discretization process aims at replacing specific data values with the numbers of the intervals to which they belong. Methods of sound parametrization are also discussed within this chapter. The chapter presents only the main concepts of these methods, since their details are extensively covered in a vast body of literature. The following chapter deals first with musical signal separation; its second part introduces musical phrase analysis, while the third focuses on metadata analysis. The Frequency Envelope Distribution (FED) algorithm, introduced for the purpose of musical duet separation, is presented. The effectiveness of the FED algorithm is checked using neural networks (NNs), which are tested on feature vectors (FVs) derived from musical sounds after the separation process has been performed. The experimental results are shown and discussed.
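The discretization step can be illustrated with the simplest scheme, equal-width binning: the range of an attribute is split into k intervals and every value is replaced by the number of its interval. Equal-width binning is chosen here only for brevity (an assumption); the discretization methods reviewed for rough set algorithms may place cut points quite differently, for example by taking the decision attribute into account.

```python
import bisect


def equal_width_cuts(values, k):
    """Cut points splitting [min(values), max(values)] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def discretize(values, cuts):
    """Replace each value with the number (0-based) of the interval it belongs to.

    Interval 0 lies below cuts[0], interval j covers [cuts[j-1], cuts[j]),
    and the last interval is open-ended above cuts[-1].
    """
    return [bisect.bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    centroids = [420.0, 610.0, 980.0, 1500.0, 2210.0, 3050.0]  # made-up attribute values
    cuts = equal_width_cuts(centroids, 3)
    print(cuts, discretize(centroids, cuts))
```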

The next chapter deals with applications of hybrid intelligent techniques to acoustics and introduces research based on a cognitive approach to acoustic signal analysis. It starts with a short review of fuzzy set theory, followed by a presentation of the acquisition of subjective test results and their processing in the context of perception. Evaluation of hearing impairment based on a fuzzy-rough approach is also presented in this chapter. An overview of the experiments is included, with more detailed descriptions available in some of the cited papers by the author and her team. In addition, the processing of acoustic signals based on beamforming techniques and neural networks is presented, using the cognitive bases of binaural hearing. Audio-visual correlation is the subject of the subsequent chapter, where, once again, a hybrid approach is introduced to process audio-visual signals.
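The fuzzy partitioning of categorical notions mentioned earlier in this introduction can be illustrated with a linguistic variable whose terms are trapezoidal membership functions. In the sketch below, a crisp parameter value is mapped to degrees of membership in the words ‘dull’, ‘medium’ and ‘bright’. The variable, its terms and all breakpoints (a spectral centroid in Hz over 0 to 20 000 Hz) are hypothetical and are not taken from the subjective tests described in the book.

```python
def trapezoid(a, b, c, d):
    """Membership function: 0 outside (a, d), 1 on [b, c], linear in between."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


# Hypothetical linguistic variable "brightness" over a spectral centroid in Hz.
BRIGHTNESS = {
    "dull": trapezoid(0, 0, 500, 1500),
    "medium": trapezoid(500, 1500, 1500, 3000),
    "bright": trapezoid(1500, 3000, 20000, 20000),
}


def fuzzify(x, terms):
    """Degree of membership of the crisp value x in every linguistic term."""
    return {label: mu(x) for label, mu in terms.items()}


def best_term(x, terms):
    """Word with the highest membership degree (a simple way to answer in words)."""
    return max(terms, key=lambda label: terms[label](x))


if __name__ == "__main__":
    for centroid in (300, 1000, 2250, 5000):
        print(centroid, fuzzify(centroid, BRIGHTNESS), best_term(centroid, BRIGHTNESS))
```

Neighbouring terms overlap, so a value such as 1000 Hz belongs partly to ‘dull’ and partly to ‘medium’, which is exactly the unsharp class boundary that distinguishes a fuzzy partition from crisp intervals.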

The last chapter outlines the concluding remarks that may be derived from the research studies carried out by the team of researchers and students of the Multimedia Systems Department, Gdańsk University of Technology. An integral part of each chapter is a list of references, which provide additional details related to the problems presented in the respective sections of the book.