new methodologies towards an automatic optical …jsc/students/2008anarebelo/2008... · new...

Ana Maria Rebelo

New methodologies towards an automatic

optical recognition of handwritten musical

scores

Dissertação de Mestrado

Universidade do Porto

Outubro 2008

Universidade do Porto

Departamento de Matemática Aplicada

Tese submetida à Faculdade de Ciências da

Universidade do Porto para obtenção do grau de

Mestre em Engenharia Matemática

New methodologies towards an automatic

optical recognition of handwritten musical

scores

Ana Maria Rebelo

([email protected])

Dissertação realizada sob a supervisão do

Professor Doutor Jaime S. Cardoso (INESC Porto, FEUP)

e sob a co-orientação do

Professor Doutor Joaquim F. Pinto da Costa (DMA, FCUP).

Porto, Outubro 2008

A painter paints pictures on canvas. But musicians paint their pictures on silence.

Leopold Stokowski

Abstract

Many music works produced in the past are currently available only as original manuscripts

or as photocopies. Their preservation has entailed a vast amount of research in the last years.

The digitalization has been commonly used as a possible preservation method which comprises

setting up a digital copy. Despite the fact of an easy accessibility in a machine-readable format,

which encourages browsing, retrieval, search and analysis and, most importantly, the preser-

vation of endangered works, while providing a generalized access to digital content, actually

it only keeps a portion of the information. Several systems have been presented in order to

fulll the lack of a tool able to analyse and perform semantic search operations. Carrying

this task manually is very time consuming and error prone. While optical music recognition

systems usually perform well on printed scores, the processing of handwritten musical scores

by computers remains far from ideal.

One of the fundamental stages in the optical music recognition is the detection and subsequent

removal of sta lines. In this project we investigate a general-purpose, knowledge-free method

for the automatic detection of music sta lines based on stable paths approach. Lines aected

by curvature, discontinuities, and inclination are robustly detected. A sta removal algorithm is

also developed by adapting an existing line removal approach to use the stable paths algorithm

at the detection stage. Experimental results show that the proposed technique consistently

outperforms well-established algorithms. The developed approach will be integrated, as future

work, in a web based system providing seamless access to browsing, retrieval, search and anal-

ysis of submitted scores.

Symbol detection is also crucial for a good performance in an optical music recognition system.

In this work we propose a segmentation method based on the hierarchical decomposition of

music sheets. The symbols are split into four dierent types to facilitate their extraction. Since

we are working with handwritten scores many situations may occur. The work degradation,

highly handwritten dierent types as well as other possible undesirable discontinuities are

situations to take into account. One way to diminish these situations in the classication

process is to create a database where several distortions are simulated. So, a new methodology

is presented where elastic matching is used in conjunction with several classiers, for instance

neural networks and hidden Markov models. A profound comparative study is made about

classiers with and without the elastic matching applied to handwritten scores. It is expected

that the obtained results outperform the actual state of the art.

I

Resumo

Muitos dos trabalhos musicais produzidos no passado estão actualmente disponíveis apenas

como manuscritos originais ou fotocópias. A sua preservação tem causado um vasto leque

de pesquisas nos últimos anos. A digitalização tem sido usualmente utilizada como possível

método de preservação em que se resume à criação de uma cópia digital. Apesar de constituir

uma acessibilidade fácil num formato legível por computador, que permite a consulta, a recu-

peração, a procura e a análise, e mais importante, a preservação das obras em risco, enquanto

promove um acesso generalizado ao material digital, na verdade apenas guarda uma parte da

informação. Vários sistemas têm sido apresentados para preencher a falta de uma ferramenta

que seja capaz de realizar análises e executar operações de pesquisa semântica. Executar esta

tarefa manualmente é deveras dispendioso e susceptível a erros. Enquanto que os sistemas de

reconhecimento óptico musical apresentam, geralmente, um bom desempenho para partituras

impressas, para partituras manuscritas o processo recorrendo ao computador continua longe do

ideal.

Uma das etapas fundamentais em reconhecimento óptico é a detecção e subsequente remoção

das linhas de pauta. Neste projecto é investigado um método de uso geral e conhecimento livre

para a detecção automática das linhas de música baseado na aproximação do caminho estável.

Linhas afectadas por curvatura, discontinuidades e inclinação são detectadas robustamente.

Também é desenvolvido um algoritmo de remoção de linhas adaptando um processo já exis-

tente de remoção para usar o algoritmo do caminho estável na fase de detecção. Os resultados

experimentais obtidos mostram que a técnica proposta supera consistentemente os algoritmos

bem estabelecidos. A aproximação desenvolvida será agora integrada, num trabalho futuro,

num sistema web, proporcionando um acesso à navegação, recuperação, pesquisa e análise das

partituras submetidas.

A detecção dos símbolos é também crucial para um bom desempenho de um sistema de re-

conhecimento óptico musical. Neste trabalho é proposto um método de segmentação baseado

numa decomposição hierárquica da folha de música. Os símbolos são divididos em quatro tipos

diferentes para facilitar a sua extracção. Como estamos a lidar com partituras manuscritas,

diversas situações são susceptíveis de ocorrer. A degradação do trabalho, diferentes tipos de

manuscritos, assim como outras possíveis descontinuidades não desejáveis são situações a ter em

conta. Uma forma de diminuir estes problemas, no processo de classicação, é criar uma base

de dados onde se simulam várias distorções. Neste sentido, uma nova metodologia é apresen-

tada, onde se recorre à modelação elástica em conjunção com vários classicadores, como por

exemplo, redes neuronais e cadeias de Markov escondidas. É apresentado um profundo estudo

comparativo sobre estes classicadores com e sem modelação elástica aplicada às partituras

manuscritas. É esperado que os resultados obtidos superem o estado actual da arte.

Acknowledgements

Was this project successfully accomplished? The period was long and sometimes dicult; how-

ever I believe that the answer is yes. I am thankful to a set of persons who made all this work

possible and fun to perform.

I came to INESC Porto to nish my undergraduate in Mathematical Applied to Technology

and I ended up staying for another year for the realization of my Master course in Mathemat-

ical Engineering. I am sincerely grateful to Prof. Dr. Jaime S. Cardoso, my thesis supervisor,

for giving me the opportunity to continue to work with him, by the trust in my work, by

the availability and patience always demonstrated, for his guidance given with high rigor and

professionalism and by his wise knowledge. I also thank my thesis co-supervisor, Prof. Dr.

Joaquim Pinto da Costa, for the availability to help and guide with all the rigor and erudition.

It was a great pleasure to work in the INESC Porto with a team as fabulous as the UTM that

done everything to integrate me in the group as soon as possible. To all my colleagues who

accompanied me on this journey and in one way or another supported me and encouraged me,

my sincere thanks. I also do a special thank to Artur Capela, my colleague in the OMR project,

for all the patience and support given, and to Prof. Dr. Carlos Guedes, professor from ESMAE

associated with this Project.

I thank INESC Porto for providing the right environment for high-quality research and FCT

(Fundação para a Ciência e Tecnologia) for nancial support.

To my parents who made a huge eort to tolerate and help the realization of my dreams and

aspirations. To my twin sister my greatest appreciation for all that she represents for me, for

her support and the countless hours of complicity available. To D. Lurdes, a very special friend

who made me face the diculties with courage and determination. Last but not least, I thank

a person very special to me, accompanied me on this journey with great patience and spirit

winner, helping me in this work with his critical and rigorous eye. Thank you for your presence

at all times.

I would like to nish these acknowledgments stressing that with determination and humbleness

our dreams can take place.

Ana Maria Silva Rebelo,

October, 2008

III

Contents

Contents IV

List of Figures VI

List of Tables VIII

Glossary X

I Introduction 1

1 Introduction 3

1.1 OMR Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.3 Contributions of this Project and Related Publications . . . . . . . . . . . . . . 7

1.4 Dissertation Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Related Works 10

2.1 State of the Art in Sta Lines Detection . . . . . . . . . . . . . . . . . . . . . . 10

2.2 State of the Art in Musical Symbol Extraction and Classication . . . . . . . . 12

2.3 Background Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

II Detection 21

3 Sta Lines Detection and Removal 23

3.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2 Underlying Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.3 Algorithm Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.4 Stable Paths on a Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.5 Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.6 Design of the Weight Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.7 Sta Line Removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.8 Database of Music Scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.9 Evaluation Metrics and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

IV

IIISegmentation and Classication 47

4 Segmentation and Classication Process 49

4.1 Musical Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.2 Segmentation Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.3 Classication Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

IVConclusions and Future Work 71

5 Conclusion 73

5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

References 76

V Appendix 81

A Fundamentals 83

A.1 Primal Problem vs Dual Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 83

A.2 Error Backpropagation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 85

A.3 Dalitz's Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

A.4 Matching in Bipartite Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

A.5 Otsu Threshold Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

A.6 Sta Line Removal Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

A.7 Baum-Welch Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

A.8 Viterbi Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

B Table of confusion 97

B.1 Results Obtained Without Elastic Matching for the Handwritten Music Symbols 97

B.2 Results Obtained Without Elastic Matching for the Printed Music Symbols . . 99

B.3 Results Obtained With Elastic Matching for the Handwritten Music Symbols . 101

B.4 Results Obtained With Elastic Matching for the Printed Music Symbols . . . . 103

C Articles Submited to Conferences 105

List of Figures

1.1 Generic system architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 OMR architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.1 A directed acyclic graphs and its linearization. . . . . . . . . . . . . . . . . . . . . 14

3.1 An exemplicative example of the methodology. . . . . . . . . . . . . . . . . . . . . 26

3.2 Stable paths on a toy example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.3 Exemplication of stable paths for Figure 3.1(a). . . . . . . . . . . . . . . . . . . . 28

3.4 Length of a chord through a skeleton point at some angle ϕ. . . . . . . . . . . . . . 32

3.5 Example (from [DDCF08]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.6 Two examples of music scores from the test set used on the experimental evaluation. 34

3.7 Some examples of applied deformations from the original image: a) Original; b)

Curvature c) Degradation after Kanungo; d) Sta line thickness variation; e) Sta

line y-variation; f) Typeset emulation; g) Rotation; h) White Speckles; i) Sta line

Interruptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.8 Generation of the deformed images and the respective Ground-Truth. . . . . . . . . 38

3.9 Examples of the results obtained using the sta line removal algorithm in our test set. 45

4.1 Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.2 Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.3 Example: a) Tie above noteheads; b) Tie under noteheads. . . . . . . . . . . . . . 54

4.4 An exemplicative example of the variability in the music symbols. . . . . . . . . . 57

4.5 Segmentation process I. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4.6 Segmentation process II. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4.7 An exemplicative example of two dierent connected components in the same

bounding box. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.8 An example of the existence of inconsistency in the beam thickness and in the link

with the stems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.9 Beam segmentation process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.10 Notes segmentation process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

4.11 An example of sharp detection on a real score. . . . . . . . . . . . . . . . . . . . . 60

4.12 An example of sharp detection on a ideal score. . . . . . . . . . . . . . . . . . . . . 61

4.13 An example of how the music symbols that are in the ideal scores were manually

splited. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

4.14 Results of the error metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

VI

4.15 Results of the error metrics for the rotation deformation. . . . . . . . . . . . . . . . 63

4.16 Results of the error metrics for the curvature deformation. . . . . . . . . . . . . . . 63

4.17 Neural network topology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

A.1 Neural network architecture with three layers, two inputs and one output. . . . . . 87

A.2 Forward propagation of the signal in the neural network. . . . . . . . . . . . . . . . 88

A.3 Comparison between the signal output and the target. . . . . . . . . . . . . . . . . 88

A.4 Propagation of the error signal d in backward mode in the neural network. . . . . . 89

A.5 The calculation of weights in the neural network. . . . . . . . . . . . . . . . . . . . 90

List of Tables

3.1 Deformations in the images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.2 Ranges of deformation parameters used in the tests: min:step:max. . . . . . . . . 37

3.3 Eect of dierent deformations on the overall sta detection error rates in percent-

age: average (standard deviation) of the false detection rate and miss detection rate.

See [DDCF08] for parameter details. . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.4 Detection performance on real music scores in percentage: average (standard devi-

ation). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.5 Sta segment extraction errors based on the number of segments in an equivalence

class r. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.6 Removal performance on real music scores (in percentage): average (standard devi-

ation). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.7 Eect of dierent deformations on the overall sta removal error rates in percentage:

average (standard deviation) Individual pixels error. . . . . . . . . . . . . . . . . 44


average (standard deviation) Sta line interruption error. . . . . . . . . . . . . . 44


average (standard deviation) Segmentation region level. . . . . . . . . . . . . . . 45

4.1 Clefs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2 Notes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.3 Flags and beams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.4 Rests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.5 Ties and Slurs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.6 Accents. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.7 Time signatures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.8 Number of neurons in the hidden layer (not)using elastic matching method (EM) in

the dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4.9 The number of states in the hidden Markov model with and without the elastic

matching method (EM) in the dataset. . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.10 The values of the parameters C and γ in the support vector machines with and

without the elastic matching method (EM) in the dataset. . . . . . . . . . . . . . . 66

4.11 Classes list of handwritten and printed music symbols. . . . . . . . . . . . . . . . . 67

4.12 Results obtained with the test set in the classication process for the handwritten

music symbols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

VIII

4.13 Results obtained with the test set in the classication process for the printed music

symbols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4.14 Results obtained with the test set in the classication process for the handwritten

music symbols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.15 Results obtained with the test set in the classication process for the printed music

symbols. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

4.16 The performed of the natural and sharp symbols. . . . . . . . . . . . . . . . . . . . 70

B.1 Table of confusion of the nearest neighbor classier. . . . . . . . . . . . . . . . . . 97

B.2 Table of confusion of the neural network. . . . . . . . . . . . . . . . . . . . . . . . . 97

B.3 Table of confusion of the support vector machines. . . . . . . . . . . . . . . . . . . 98

B.4 Table of confusion of the hidden Markov model. . . . . . . . . . . . . . . . . . . . . 98




B.8 Table of confusion of the hidden Markov Model. . . . . . . . . . . . . . . . . . . . . 100









Glossary

OMR Optical Music Recognition

SVM Support Vector Machine

HMM Hidden Markov Model

EM Elastic Matching

LTH Line Track Height

X

Part I

Introduction

1

Chapter1

Introduction

Music, from Greek µυσικη (τ εχνη) musike (techne), which means the art of the muses, can

be dened as an organized sequence of sounds and silences so as to produce aesthetic plea-

sure in the listener. There are evidences, by Pictographs, that music is known and practiced

since prehistory. Over the years, the music expanded in many several music styles and for

many dierent purposes, like educational or therapy. All known cultures have their own mu-

sical practice. Music is a pivotal part in the cultural heritage of any society. In this way, its

preservation, in all of its forms, must be pursued. On the face of it, the Universal Declaration

on Cultural Diversity adopted by the General Conference of UNESCO on 2001 asserts that

cultural diversity is as necessary for humankind as biodiversity is for nature, and that policies

to promote and protect cultural diversity thus are an integral part of sustainable development1.

Portugal has a notorious lack in music publishing from virtually all eras of its musical history.

In spite of most of the original manuscripts of music known before twentieth century being

kept in the national library in Lisbon, there is not any repository of musical information from

the last century. Although there are recent eorts to catalogue and to preserve in digital form

the Portuguese music from the twentieth century notably the Music Information Center2

and the section on musical heritage from the Institute of the Arts website3 most of the mu-

sic pre-dating computer notation software was never published and still exists in the form of

manuscripts or photocopies spread out all over the country in discreet places. The risk of irre-

versibly losing this rich cultural heritage is a reality that must be taken seriously and dealt with

accordingly. Digitization has been commonly used as a possible tool for preservation, oering

easy duplications, distribution, and digital processing. However, transforming the paper-based

music scores and manuscripts into a machine-readable symbolic format (facilitating operations

such as search, retrieval and analysis), requires an Optical Music Recognition (OMR) system.

Unfortunately, the actual state-of-the-art of handwritten music recognition is far from provid-

ing a satisfactory solution.

The project Automatic recognition of handwritten music scores initiated in 2007 by Instituto

de Engenharia de Sistemas e Computadores do Porto (INESC Porto) and Escola Superior de

Música e das Artes do Espectáculo (ESMAE) was the starting point for creating an OMR

1http://www.unesco.org/bpi/eng/unescopress/2001/01-112e.shtml2http://www.mic.pt3http://patrimonio.dgartes.pt/?lang=pt

3

4 Chapter 1. Introduction

system that addresses some of the identied problems which will be described in detail in the

following sections.

It is the aim of this project to overcome the problem of musical symbol recognition in hand-

written scores through the research and application of the most recent techniques of machine

learning and articial intelligence. Moreover, there is also the intention of creating a web-based

system providing generalized access to a wide corpus of handwritten unpublished music en-

coded in digital format. The database will not only centralize as much information as possible

but will also serve to preserve the musical heritage in an innovative way with a wide range of

possibilities [CCRG08a, CRCG08]. The architecture for the proposed system is represented in

Figure 1.1.

Figure 1.1: Generic system architecture.

The system is composed by three dierent entities: repository, web server and web browser.

Briey, the Repository module stores the original scanned score, the digital counterpart in

MusicXML and all the descriptive metadata inserted by the user. All the remaining system

contents, such as the user information, are also stored in this entity. The Web Server is the

user access point to the system as well as to all of its processing modules run on the server,

encompassing the search engine and the optical recognition engine for the musical scores. The

Web Server interacts with the Repository and with the Web Browser, which establishes the

interface between the user and the system. The user interface on a Web Browser allows the

complete management of the musical scores and associated metadata, as well as carrying out

the system administration.

The project presented here is innovative for several reasons:

1. It has the ambition to develop a formalism to model, in a consistency way, the heteroge-

neous knowledge about the language and musical notation.

2. Aims to explore the wealth and the potential from the most recent techniques of machine

learning and articial intelligence, not only for representing and merging knowledge, but

also to make decisions.

1.1. OMR Architecture 5

3. Intends to digitalize and preserve a wide corpus of handwritten scores in an unprecedented

way.

4. Will include an OMR engine integrated with an archiving system and a user-friendly

interface for searching, browsing and edition. The digitized scores will be stored in Mu-

sicXML, a recent and expanding music interchange format designed for notation, analysis,

retrieval, and performance applications.

5. Intends to make accessible online the repository of handwritten scores for enjoyment,

educational and musicological purposes.

The work of this thesis, as carried out in this project, aims to develop new OMR algorithms

for the web-based sytem.

1.1 OMR Architecture

The principal aims of an OMR application are the recognition, the representation and the stor-

age of musical scores in a machine-readable format. An OMR program should thus be able

to recognize the musical content and make the semantic analysis of each musical symbol of a

musical work. In the end, all the musical information should be saved in an output format that

is easily readable by a computer.

In fact, the architecture of an OMR system is dependent on the methods used in the segmen-

tation and recognition steps. Generally, the OMR process can be divided into three principal

modules (see Figure 1.2):

1. Recognition of musical symbols from a music sheet.

2. Reconstruction of the musical information to build a logical description of musical nota-

tion.

3. Construction of a musical notation model for its representation as a symbolic description

of the musical sheet.

It is important to note that this is a sequential architecture. In other words, the results obtained

in one phase are the input to the next phase. The rst module is typically further divided into

three stages:

Ê Image pre-processing: it consists in the application of several techniques (e.g. binariza-

tion, noise removal, blurring, deskewing, among others) to make the recognition process

more robust and ecient.

Ë Sta lines detection and removal to obtain an image containing only the musical symbols.

Ì Objects segmentation and basic symbol recognition.

The second and third modules (Musical Notation Reconstruction and Final Representation

Construction) are intrinsically intertwined. In the second module of musical notation recon-

struction, the symbols primitives are merged to form musical symbols. Usually, in this step


graphical and syntactic rules are used. This is important for the introduction of context infor-

mation to validate and solve ambiguities from the last module of music symbol recognition. It

is also produced a document where detected symbols are interpreted to assign them a musical

meaning. In the third module of nal representation construction this document is converted

into a format of musical description, such as MusicXML, that allows for storage.

In this thesis I focused in the step of music symbol recognition. The system is composed by

a preliminary stage of segmentation and a stage of classication of musical symbols. More

specically, the segmentation phase is composed by sta line detection and removal, followed

by the stage of musical symbol extraction. In the classication step four distinct classiers are

used and an elastic matching method see Figure 1.2. This method is applied to all classiers

to simulate the possibility of deformations in the musical symbols. Moreover, it also permits

the enhancing of the test set.

Figure 1.2: OMR architecture.

1.2 Objectives

In order to overcome the limitations from the actual methods in musical symbol recognition

on handwritten musical scores and to organize them hierarchically, I researched the application

of the most recent techniques of machine learning and articial intelligence. I also developed

algorithms for the musical symbol detection and recognition. The musical notation is one of

the languages most widely known. It has had a continuous development over time, with re-

quirements of consistency and precision. For this reason, the proposed methodology should be

1.3. Contributions of this Project and Related Publications 7

naturally adaptable to handwritten scores and to dierent standard musical notations.

In general, the rst step in the process of handwritten musical scores recognition is sta lines

detection. This operation is one of the most important phases in musical symbol recognition,

because it determines the results for subsequent proceedings. With this purpose in mind we

had the following objectives:

Ê To test thoroughly the proposed algorithm for the sta lines detection using an appropri-

ate database and error metrics, and compare this algorithm with others from the state of

the art.

Ë To create a new database of ideal musical scores from an existing one, where several con-

troled deformations are applied to the ideal musical scores to simulate feasible problems

of handwritten musical scores.

Ì To use the proposed algorithm for the sta lines detection as a rst step in some state

of the art sta removal algorithms and conduct a series of experiments using appropriate

error metrics.

The musical symbol extraction is the next step before the classication. The segmentation

process is based in existing algorithms [DDCF08]. In this work, the symbols we want to

recognize were split into four dierent types: the symbols that were featured by a vertical

segment, the symbols that link the notes, the remaining symbols connected to sta lines and

the symbols above and under sta lines. Since we are working with handwritten scores many

situations may occur. The score degradation and highly dierent handwritten styles are possible

undesirable situations that may occur. The objectives for this phase were:

¬ Research and development of an algorithm to extract the handwritten musical symbols.

Simulate the variability with an elastic matching method in the music symbols, with the

aim of preparing the classier for several dierent situations that may occur in handwrit-

ten music symbols.

® Compare and study dierent classiers for the musical symbols recognition.

1.3 Contributions of this Project and Related Publications

This dissertation presents the following contributions for the preservation and the general access

to musical and cultural heritage:

1. The introduction to the music analysis community of the algorithm of the sta lines

detection based in the Stable Paths approach.

2. The creation of a database of real scores with its references: detection and removal

references.

3. New algorithms for the automatic detection and classication of musical symbols. In this

way, analysis and search of the score by the user will be more easily.


4. Integration of the new algorithms in the web-based system being developed under the

project.

The work related with the Project where the dissertation belongs, already resulted in the

publication of the followings papers:

A Connected Path Approach for Sta Detection on a Music Score, Jaime S. Cardoso,

Artur Capela, Ana Rebelo, Carlos Guedes, in the IEEE International Conference on

Image Processing (ICIP 2008).

Integrated Recognition System for Music Scores, Artur Capela, Jaime S. Cardoso, Ana

Rebelo, Carlos Guedes, in the International Computer Music Conference (ICMC 2008).

Sta Line Detection and Removal with Stable Paths, Artur Capela, Ana Rebelo, Jaime

S. Cardoso, Carlos Guedes, in Proceedings of the International Conference on Signal

Processing and Multimedia Applications (SIGMAP 2008), pages 263270, 2008.

And it waits for the result of the submission of the:

Optical Recognition of Music Symbols: a comparative study, Ana Rebelo, Artur Capela

and Jaime S. Cardoso, in International Journal of Document Analysis and Recognition

(IJDAR 2008).

Sta Detection with Stable Paths, Jaime S. Cardoso, Artur Capela, Ana Rebelo, Carlos

Guedes, Joaquim Pinto da Costa, in IEEE Transaction on Pattern Analysis and Machine

Intelligence (TPAMI 2008).

During the second semester of the academic year 06/07 one more paper was written. The rst

version of the proposed algorithm has already been presented and published:

A Shortest Path Approach for Sta Line Detection, Ana Rebelo, Artur Capela, Joaquim

F. Pinto da Costa, Carlos Guedes, Eurico Carrapatoso, Jaime S. Cardoso, in Proceedings

of the International Conference on Automated Production of Cross Media Content for

Multi-channel Distribution (AxMedis 2007), pages 7985, Nov. 2007.

1.4 Dissertation Structure

This dissertation is organized in 5 chapters that describe the work developed in the last year.

It also has a set of appendices with complementary information that will help the description of

the work done. After this introductory Chapter, the description of the works related with this

project will be made in Chapter 2. It is constituted by three states: a presentation of the state

of the art of the algorithms in the sta lines detection area and the already existing algorithms

in the eld of sta lines removal; a presentation of the state of the art of the algorithms in the

musical symbols classication area; and a background presentation of the several methods used

in the classication phase. In Chapter 3, the proposed stable paths algorithm to detect the

sta lines is described. The database used, the error metrics applied in the several experiments

1.4. Dissertation Structure 9

done to test the stable paths algorithm, and the results obtained are also presented. In chapter

4, the segmentation and classication phase is presented. A brief description of the musical

symbols that the algorithm tries to recognize is also described. Finally, conclusions are drawn

and future work is suggested in chapter 5.

Chapter2

Related Works

The investigation in the OMR eld began with Pruslin [Pru66] and Prerau [Pre70]. However,

it was only in the decade of 1980's, when the equipment of digitalization became accessible,

that work in this area has expanded [Car89, Ng95, Coü96, Bai97]. Over the years, there

have been appearing several commercial OMR software, but no one has a satisfactory perfor-

mance in terms of precision and robustness. The complexity of the OMR task caused by the

bidimensional structure of the musical notation and also by the existence of several combined

symbols organized around the note heads have been conditioning the progress in this area. Until

now, even the most developed recognition systems (PIANOSCAN, NOTESCAN in Nightingale,

MIDISCAN in Finale, PHOTOSCORE in Sibelius, SMARTSCORE, SHARPEYE, etc) can not

identify all music notations. Besides that, classic OMR is more focused in regular printed music

sheets; so, a good eciency is only obtained when this type of scores are processed.

In this thesis, the recognition of standard handwritten music is the goal. With these manuscripts

emerge new and dierent additional problems comparatively with printed music. Handwritten

musical scores tend to be rather irregular and determined by the authors' own writing style.

Moreover, if we consider that most of these works are old, the quality of paper in which it is

written might have degraded throughout the years, making it a lot harder to correctly identify

its contents. Handwritten musical scores are likely to have changes of size, shape and intensity

of handwritten symbols by the same author into the same score. As a result, the detection

and recognition process is even more complicated. Furthermore, problems exist not only in

this level but also in the detection of the handwritten sta lines. These are rarely straight and

horizontal, and are not parallel to each other. For example, some staves may be tilted one

way or another on the same page or they may be curved. Depending on the level of the paper

degradation there may also exist discontinuities in the sta lines and in the remaining symbols.

In general, these scores have many years of existence, and therefore there is a sharp decay in

the quality of the paper and ink. Many works are also photocopies from photocopies which

adds noise through the degree of copy.

2.1 State of the Art in Sta Lines Detection

One important stage in OMR is sta line detection. The reason for that is the possibility to

isolate the musical symbols present in the score enabling their extraction. Sta lines detection

10

2.1. State of the Art in Sta Lines Detection 11

is, consequently, one of the fundamental steps in all OMR processes, being the following pro-

ceedings very dependently on the performance and results obtained in this initial phase. In

doing so, any OMR algorithm begins with this operation.

The problem of sta line detection is often considered simultaneously with the goal of their

removal, although exceptions exist [Mat85, MO07, Szw05, Pug06]. The simplest approach

consists in nding local maxima in the horizontal projection of the black pixels of the image

[KI90, Fuj04, RCF+93, MN96, BBN01, TSM06]. These local maxima represent line positions,

assuming straight and horizontal lines. Several horizontal projections can be made with dif-

ferent image rotation angles, keeping the image in which the local maxima are bigger. This

eliminates the assumption that the lines are always horizontal. An alternative strategy for

identifying sta lines is to use vertical scan lines [Car89]. This process is based on a Line

Adjacency Graph (LAG). LAG searches for potential sections of lines: sections that satisfy

criteria related to aspect ratio, connectedness and curvature. More recent works present a

nearly sophisticated use of a combination of projection techniques in order to improve the basic

approach [Bai97, BBN01, RB05].

Fujinaga [Fuj04] incorporates a set of image processing techniques in the algorithm, including

run-length coding (RLC), connected-component analysis, and projections. After applying the

RLC to nd the thickness of sta lines and the space between the sta lines, any vertical black

run that is more than twice the sta line height is removed from the original. Then, the con-

nected components are scanned in order to eliminate any component whose width is less than

the sta space height. After a global deskewing, taller components, such as slurs and dynamic

wedges are removed.

Other techniques for nding sta lines include the grouping of vertical columns based on

their spacing, thickness and vertical position on the image [RP96], rule-based classication

of thin horizontal line segments [Mah82], and line tracing [Pre70, RT88]. The methods pro-

posed in [MO07, Szw05] operate on a set of sta segments, with methods for linking two

segments horizontally and vertically and merging two segments with overlapping position into

one. Dalitz [DDCF08] is an improvement on these methods.

In spite of the variety of methods available, they all suer from some limitations. In partic-

ular, lines with some curvature or discontinuities are inadequately resolved. The dash detec-

tor [LCL93] is one of a few works that try to handle discontinuities. The dash detector is an

algorithm that searches the image, pixel by pixel, nding black pixel regions that it classies

as stains or dashes. Then, it tries to unite the dashes to construct lines.

A common problem to all the above-mentioned techniques is that they try to build sta lines

from local information, without properly incorporating global information in the detection

process. To our knowledge, none of the proposed methods in the literature tries to dene a

reasonable process from the intrinsic properties of sta lines, namelly the fact that they are the

only extensive black objects on the music score. Usually, we argue that the most interesting

12 Chapter 2. Related Works

techniques arise when one denes the detection process as the result of optimizing some global

function. In this thesis, we suggest a graph-theoretic framework where sta lines are the

solutions of a global optimization process.

2.2 State of the Art in Musical Symbol Extraction and

Classication

The segmentation task has been the object of study by some authors [Mah82, RT88, Car89,

RB05, TSM06]. This process deserves some careful attention because it interferes with the

classication stage. Major problems result from the diculty to obtain individual meaningful

objects. This is due to the printing and digitalization as well as to the paper degradation and

the lack of a standard notation. In addition there are distortions caused by sta lines, broken

and touching symbols as well as high density of symbols, with dierent sizes and shapes. Be-

sides, few research works have been done in handwritten scores [FLS05].

The most usual approach consists in extracting elementary graphic symbols: note heads, rests,

dots, etc, that can be composed to build music notation. Usually, the primitive segmentation

step is made along with the classication task [Mah82, RT88, Car89, KI90, RB05, TSM06],

however exceptions exist [Fuj04, BBN01, Pug06]. In [Mah82] and [Car89] it is applied the same

technique used in sta lines detection to the detection of musical symbols. Mahoney [Mah82]

builds a set of candidates to one or more symbols types and then use descriptors to select

the matching candidates. This technique is not feasible for handwritten scores, because the

manuscript symbols do not have stable shapes and sizes. Carter [Car89], in his turn, uses

the LAG to extract symbols. The objects resulting from this are classied according to the

bounding box size, to the number and organization of their constituent sections. Once again

this process of classication is not promissing for handwritten symbols, for the reasons pre-

sented above. Other authors [Fuj04, BBN01] have also chosen to apply projections to detect

symbols primitives. The recognition is done using features extracted from the projection pro-

les. In [Fuj04] the k-nearest neighbor rule is used in the classication phase, whether neural

networks is the classier used in [BBN01].

Randriamahefa [RCF+93] proposed a structural method based in construction of graphs for

each symbol. These are isolated by using a growing region method and thinning. Template

matching is adopted in [Mat85, MN96, Ros02, RB05, TSM06]. In [Ros02, RB05] a fuzzy model

that depends on a robust symbol detection and template matching results was developed. This

method aims to deal with uncertainty, exibility and fuzziness at a symbol level. In fact in this

approach the segmentation process has two steps: individual analysis of musical symbols and

fuzzy model construction. In the rst step, the vertical segments are detected by region growing

method and template matching. Then, beams are detected by a region growing algorithm and

a modied Hough transform. The remaining symbols are extracted again by template match-

ing. From this rst step results three recognition hypotheses; each assigning the pattern to a

possible class. The fuzzy model is used to make a consistent decision. The proposed process

2.3. Background Knowledge 13

incorporates graphical and syntactic rules. Besides, it allows the possibility of applying learning

procedures when potential errors occur, in an eort to gain robustness.

Other techniques for extracting and classifying musical symbols include rule-based systems to

represent all the musical information [RT88], a collection of processing modules that communi-

cate by a common working memory [KI90] and pixel tracking with template matching [TSM06].

Toyama [TSM06] checks for coherency in the primitive symbols detected by estimating touching

positions. This evaluation is done by music writing rules. Coüasnon [CC93, Coü96] proposed

a recognition process entirely controlled by grammar which formalizes the musical knowledge.

In [RP96] the segmentation process involves three stages: line and curves detection by LAG,

accidentals, rests and clefs detection by a character prole method and note heads recognition

by template matching. The contextual recognition is done by graph grammars. In [Pug06] the

segmentation task is based in Hidden Markov Models. This process performs segmentation and

classication simultaneously. However, this technique results only for very simple scores, that

is, scores without slurs or more that one symbol in the same column and sta.

We can conclude, based on this brief description of the state of the art, that although the OMR

systems already contribute to the analysis of musical scores by several methods, it is still far

from solved the problems imposed by handwritten scores. Therefore, new research and new

algorithms are important and necessary. By all the works already presented, only the fuzzy

model proposed by [RB05] lead us to deduce that maybe it is a good method to take into

account in a future work. It will be enriching for this project to do a comparative study of this

method with elastic matching and also with a segmentation process proposed in this thesis.

2.3 Background Knowledge

In this Section we review some concepts necessary for a better understanding of the work

presented in the course of this thesis.

2.3.1 Graphs

A graph is a pair G = (V,E) of sets such that E ⊆ [V ]21 the elements of E are 2-element

subsets of V . G is composed of two sets V and E. V is the set of vertices (or nodes), and E is

the set of edges (or lines) (p, q), p, q ∈ V [Die05].

A graph with vertex set V is called a graph on V . V (G) is the vertex set of a graph G and E(G)its edge set2. The number of vertices of a graph G is its order, |G|, and ||G|| is the number

of edges. A vertex v is incident with an edge e if v ∈ e; then e is an edge at v. Endvertices

or ends are two vertices incident with an edge, and an edge joins its ends. If x ∈ X and

y ∈ Y , then xy is an X − Y edge. E(X,Y ) is the set of all X − Y edges in a set E. The

set of all the edges in E at a vertex v is denoted by E(v). Two vertices x, y of a graph G

1[V ]k ≡ the set of all k-element of V .2Note that these conventions are independent of any names, for instance if we have a graph H = (W, F ) the

vertex set w is still referred to as V (H), not as W (H).


are adjacent, if xy is an edge of G. Two edges e 6= f are adjacent if they have an end in common.

The graph is weighted if a weight w(p, q) is associated to each edge, and it is called a digraph if

the edges are directed, i.e., (p, q) 6= (q, p). In other words, a digraph is a pair (V,E) of disjointsets of vertices and edges together with two maps init : E → V and ter : E → V assigning to

every edge e an initial vertex init(e) and a terminal vertex ter(e).

A path is a (non-empty) graph P = (V,E) of the form

V = x0, x1, . . . , xk E = x0x1, x1x2, . . . , xk−1xk ,

where the xi are all distinct. The end vertices x0 and xk are linked by P and the vertices

x1,. . .,xk−1 are the inner vertices of P . The path cost is the sum of each arc weight in the

path. The length of a path is the number of its edges, and the path of length k is denoted by

P k. A non-empty connected graph G is a graph where two of its vertices are linked by a path

in G. A direct graph D is oriented with an orientation of a graph G if V (D) = V (G) and

E(D) = E(G), and if init(e), ter(e) = (x, y) for every edge e = xy.

In graph theory, the shortest-path problem seeks the shortest path connecting two nodes; e-

cient methods are available to solve this problem, such as the dynamic programming algorithms.

Dynamic Programming

Before presenting the operations of dynamic programming techniques let us take a look to the

following example [SDV06]. Figure 2.1 represents a directed acyclic graphs and its linearization

(topological ordering the nodes are arranged on a line so that all edges go from left to right).

This linearization process is important for the shortest path.

Estart

A

B

C

D

1

2

53

6

1

4

Estart B A C D

1

2 5

36

14

Figure 2.1: A directed acyclic graphs and its linearization.

Suppose we want to gure out distances from node E to the other nodes. For concreteness,

let's focus on node C. The only way to get to it is through its predecessors, B or A; so to nd


the shortest path to C, we need only compare these two routes:

dist(C) = min dist(B) + 3, dist(A) + 4 .

A similar relation can be written for every node. If we compute these dist values in the left-

to-right order of Figure 2.1, we can certainly get to a node v, and then we already have the

information needed to compute dist(v). We are therefore able to compute all distances in a

single pass:

initialize all dist(·) values to ∞dist(s) = 0

for each v ∈ V s, in linearized order:

dist(v) = min(u,v)∈E dist(u) + l(u, v)

In face of it, we can note that this algorithm is solving a collection of subproblems, dist(u) : u ∈ V :it starts with the smallest of the distances, dist(e), since it immediately knows its answer to

be 0; then, it proceeds with progressively larger subproblems distances to vertices that are

further and further along in the linearization where it is thinking of a subproblem as large if

it needs to have solved a lot of other subproblems before it can get to it. This is a very general

technique. At each node, the algorithm computes some function of the values of the node's

predecessors. In this case, the particular function is a minimum of sums.

This is dynamic programming. This is a powerful algorithmic paradigm for eciently solving a

wide range of search and optimization problems which exhibit the characteristics of overlapping

subproblems (the problem can be broken down into subproblems which are reused several times)

and optimal substructure (optimal solutions of subproblems can be used to nd the optimal

solutions of the overall problem).

Now, it is crucial to know what the subproblems are when we are solving a problem by dynamic

programming. Each node will represent a subproblem, and each edge will represent a precedence

constraint, of the form (i − 1, j) → (i, j), (i, j − 1) → (i, j), and (i − 1, j − 1) → (i, j), on the

order in which the subproblems are tackled. We can also put weights on the edges.

2.3.2 Classication Methods

Dierent approaches were evaluated in this work for the classication problem: hidden markov

models, support vector machine, neural networks and k-nearest neighbor. Besides that, con-

cerning the fact that we are trying to classify handwritten musical symbols, we state one main

problem: the enormous variations in symbols. Therefore, the classication process needs to take

into account the variability in the writing style. An Elastic Matching method [Nis95, JZ97]

was used to expand our database to achieve a better approach for these several variances. In

the next subsections a brief explanation of these methods will be provided.


Hidden Markov Models

Hidden Markov Models (HMMs) have rarely been used in OMR except in experiences made

by [KPM96, MMM04, Pug06]. The application of this technique in musical symbol classica-

tion had its origins on optical character recognition. One of the reasons for the use of HMM

lies in its capability to perform segmentation and recognition at the same time. Limited by

time, for now, HMM were only used to recognize the symbols.

A HMM is a doubly stochastic process, that generates symbols sequence, with an underlying

stochastic process that is hidden and can only be detected through another process whose real-

izations are observable [BS00]. The hidden process consists of a set of states connected to each

other by transition probability. Left-right HMMs were used here. Transitions probabilities from

a state i to another state j are given by A = aij, where aij = P [qt+1 = Sj |qt = Si] , 1 ≤i, j ≤ N . The observed process consists of a set of outputs or observations. Each observation

is contained in a state with some probability density function. The set of observations proba-

bilities is given by B = bj(k), where bj(k) = P [ot = xk|qt = Sj ], 1 ≤ k ≤ M , j = 1, 2, ..., N .

bj(k) represents the probability of the observation xk in state Sj , oj denotes the observation

in time t and qt represents the state in time t. HMM can now be formulated as λ = (A;B;π),where π is a set of initial probabilities of states [WFJC01].

Support Vector Machines

Support Vector Machines (SVMs), pioneered by Vapnik [Vap98], deal with the problem of

classication as a problem of quadratic optimization. This technique has as main idea the con-

struction of a hyperplane as the decision surface in such a way that the margin of separation

between positive and negative examples is maximized. Support vector machines classify the

data using support vectors [Hay98].

Without loose of generalization, SVMs try to maximize the margin of the optimal hyperplane

which separates the data. This is typically done in a much higher dimension than that of the

original feature space. Formally, given the training set xi, yiNi=1 with input data xi ∈ RP

and corresponding binary class labels di ∈ −1, 1, the linear separable optimal hyperplane is

dened by g(x) = wtϕ(x) + b where ϕ(x) denotes a xed-feature space transformation and b a

bias parameter. x is assigned to class 1 if g(x) > 0 or to −1 if g(x) < 0. This is equivalent tohave di[wtϕ(x) + b] ≥ 1, i = 1, . . . , N . Summarizing, maximizing the margin is equivalent to

solving

minw,b

12wtw

s.t di[wtϕ(x) + b] ≥ 1, i = 1, . . . , N(2.1)

If the training classes are not linearly separable, it means that above conditions and problem

formulation can not be sustained. For this reason, slack variables ξi, i = 1, . . . , N were added.

These allowed to have a penalty for the data points wrongly classied. Finally, the objective


is to minimize the error. That is,

minw,b,C,ξi

12wtw + C

N∑

i=1

ξi

s.tdi[wtϕ(x) + b] ≥ 1− ξi, i = 1, . . . , Nξi ≥ 0

(2.2)

where the parameter C > 0 controls the trade-o between the slack variables and the margin.

In the feature space it is easier to solve the dual problem, and sometimes it is the only way to

train the support vector machines. It is possible to formulate the dual problem3 for a sample

of training xi, yiNi=1 not separable as follows:

maxα

N∑

i=1

αi −12

N∑

i=1

N∑

j=1

αiαjdidjk(xi, xj)

s.t

∑Ni=1 αidi = 0

0 ≤ αi ≤ C i = 1, 2, ..., N

(2.3)

where k(xi, xj) = ϕT (xi)ϕ(xj) =∑m1

l=0 ϕl(xi)ϕl(xj), i = 1, 2, ..., N and j = 1, 2, ..., N . ϕl(xi)is the l component in the application ϕ(xi) of xi; m1 is the dimension of the feature space.

Above a binary classier was described whereas in this work a multiclass problem is presented.

In fact, several works have been suggested as an extension to the binary problem classication

that originally was proposed [eeFc02, HL02, MA98]. Usually, there are two types of approach for

the multiclass classication problems. One is to build and to combine several binary classiers

one-against-one and one-against-all - and another approach is to consider directly a resolution of

the quadratic problem. The one-against-one methodology was used in this work. This method

consists of training the jth and kth classes in the following binary problem

minwjk,bjk,C,ξjk

i

12

(wjk)twjk + C

N∑

i=1

ξjki (wjk)t

s.tyi[(wjk)tϕ(x) + b] ≥ 1− ξjki , i = 1, . . . , Nξjki ≥ 0

(2.4)

with l training data (x1, y1), . . . , (xl, yl) where xi ∈ <n, i = 1, . . . , l, and yi ∈ 1, . . . , k is theclass of xi; the training data xi are mapped to an high dimensional space by the funtion ϕ.

The three most common types of inner-product kernels for SVMs are: polynomial learning

machine, radial-basis function network and tangent hyperbolic. In this work a radial-basis

function network was used, given by:

k(x,xi) = exp(−γ||x− xi||2), γ ≥ 0 (2.5)

3See Appendix A.1 to see the demonstration.


Neural Networks

The term neural network was initially studied with the aim to represent the information pro-

cessing in biological systems [MP88, WH88, RHW88]. The attempts were to produce intelligent

perception and cognition machines by simulating the physical structure of human brains. In

our days, the principles and algorithms of neural networks have found several applications in

diverse elds including pattern recognition and signal processing.

Neural networks are composed by interconnected neurons. These are the information processing

units. The neural model is form by three basic elements: a set of connecting links (each one is

multiplied by a weight), an adder (for summing the input signals) and an activation function

(for limiting the amplitude of the output of a neuron) [Hay98]. Formally, we can dene the

output of a neuron by

yk = ϕ(m∑

j=1

ωkjxj + bk) (2.6)

where x1,x2,. . .,xm are the input layers, ω are the weights of neuron k, bk is the bias and ϕ(.)is the activation function. There are three common types of activation functions: threshold

function, piecewise-linear function and sigmoid function. In this work, the last one was used

, whose graph is s-shaped. This way, the function accepts input values between −∞ and ∞,

and return values between 0 and 1. Hence, it is dened by

f(x) =1

1 + exp(−ax),where a> 0 represents the slope (2.7)

A multilayer feedforward neural network was used. Typically, this network consists of input

and output layers of neurons and one or more hidden layers that are not part of the input or

output of the network. The input signal propagates through the network in a forward direction.

The learning algorithm used to train this network was the backpropagation algorithm4. This

algorithm is based on the gradient descending technique to minimize the cost function. In a

simple explanation, the error backpropagation learning consists of two passes through the layers

of the network: a forward pass and a backward pass. In the forward pass the link weights of

the neurons are all xed. The input vector is applied to the sensory nodes of the networks

and it is produced a set of outputs. In the backward pass the link weights are all adjusted

in accordance with an error-correction rule. Basically, the output values of the network are

subtracted from a desired response (targets) to produce an error signal. This error signal is

then propagated backward through the network to all neurons. A network with K outputs

was used, one corresponding to each class, and target values of 1 for the correct class and 0

otherwise.

K-nearest Neighbour

K-nearest neighbour is a supervised learning method for classifying objects based on training

examples in the feature space [Fuk90]. This algorithm belongs to a set of techniques called

4See Appendix A.2 for more details.


Instance-based Learning. The K-nearest neighbour algorithm is very simple and basically it

does not have train. It starts by extending the local region around a data point x until the

kth nearest neighbour is found. The most represented class in the k-closest samples denes

the predicted class. Data training lies only in the estimation of the best k. In this work the

Euclidean distance was used :

deuclidean(a, b) =√∑d

i=1 (ai − bi)2

Elastic Matching

Several research methods exist in deformable template eld applied on handwritten digits and

handprinted characters recognition (e.g. [LS88, Wak94, Nis95, JZ97]). Lam [LS88], one of

the rst works in this area, proposed a method of recognition in two-stages. The images are

rst recognized by a tree classier. Those which cannot be satisfactory assigned to a class

are passed to a matching algorithm, which deforms the image to match with a template. In

Nishida [Nis95] a grammar-like model for applying deformations in primitive strokes was de-

veloped. A new and robust shape-matching approach to recognize numbers manuscripts was

proposed by Wakahara [Wak94]. The method uses successive local ane transformation (LAT)

operations to gradually deform the image. The aim is to yield the best match to an input

binary image. LAT on each point at one location is optimized using locations of other points

by means of least-squares data tting using Gaussian window functions.

The deformation and matching technique used in this work to classify the musical symbol is

based in [JZL96, JZ97]. In this approach, the image is mapped on a unit square S = [0, 1]×[0, 1].The points in this square are mapped by the function (x, y) → (x, y) + D(x, y). The space ofdisplacement functions is given by

exmn(x, y) = (2 sin(πnx) cos(πmy), 0) (2.8)

eymn(x, y) = (0, 2 sin(πny) cos(πmx)) (2.9)

Specically, the deformation function is chosen as follows:

D(x, y) =M∑

m=1

N∑

n=1

ξxmnexmn + ξymne

ymn

λmn(2.10)

where ξ = (ξxmn, ξymn),m, n = 1, 2, . . . are the projections of the deformation function on the

orthogonal basis. Because D(x, y) can represent complex deformations by choosing dierent

coecients of ξmn and dierent values of M and N, it is important to impose a probability

density on D(x, y). Therefore, the ξmn's are assumed to be independent of each other, inde-

pendent along the x and y directions and identically Gaussian distributed with zero mean and

variance σ2.

Part II

Detection

21

Chapter3

Staff Lines Detection and Removal∗

Sta line detection and removal are the rst fundamental stages in many systems of optical

music recognition, with subsequent processes relying heavily on their performance. The reasons

for detecting and removing the sta lines lie on the need to isolate the musical symbols for a

more ecient and correct detection of each symbol present on the score. For the sta lines

to be properly removed and the symbols correctly detected it is necessary to make a correct

detection of the sta lines. When we are dealing with printed and regular scores the process is

much simpler. However, in the case of handwritten music sheets, which is the case of study of

this dissertation, several vicissitudes, already listed in Chapter 2, may occur. For this reason,

despite the multitude of attempts to treat the problem of sta lines detection, when we are

working with old and handwritten music scores, the results are not completely satisfactory yet.

In this dissertation a new conceptualization to detect the sta lines is presented. The proposed

paradigm begins with the work done in [RCC+07]. The main idea is to consider the sta lines

as the result of the shortest path between the two margins of the music sheet, giving preference

to black pixels. The initial version only implemented a heuristic of this basic idea, without a

principle properly reasoned and tested. The proposed new learning methodology is extended

and explored in various directions in this thesis. Firstly, the concept of stable paths is intro-

duced in order to improve the computational performance of the method. Secondly, the design

of the weights on the graph resulting from the music score is generalized to dierentiate black

pixels belonging to the sta lines from black pixels resulting from the music symbols. Finally,

the post processing is rened, improving the overall performance. A further development is the

study of new sta removal algorithms, by incorporating the proposed sta line detection on

standard sta removal algorithms.

It is important to state that despite this Project being in the eld of recognition of standard

musical notation it is not necessary, for the sta line detection and removal phase, to be con-

ned only to this type of notation. Hence, a database was built. This database contains not

only several symbols with dissimilar shapes and intensities, but also a varying number of lines

per sta.

∗Some portions of this chapter appears in [CCR+08, CCRG08b].

23

24 Chapter 3. Sta Lines Detection and Removal

3.1 Basic Concepts

First of all, it is necessary to indicate that the paradigm described considers that a sta line

can be seen as a connected path from the left side of the music score to the right side. As sta

lines are almost the only extensive black objects on the music score, the path we are looking

for is the shortest path between the two margins if paths (almost) entirely through black pixels

are favoured.

Consequently, the image grid the music sheet is considered as a graph with pixels as nodes

(vertices) and edges connecting neighbouring pixels. The weight w(p, q) of each arc is a functionof the pixels values and their relative positions. A path from vertex (pixel) v1 to vertex (pixel)

vn is a list of unique vertices v1, v2, . . . , vn, with vi and vi+1 corresponding to neighbour pixels.

The total cost of a path is the sum of each arc weight in the path and is given by

n∑

i=2

w(vi−1, vi). (3.1)

A path from a source vertex v to a target vertex u is said to be a shortest path if its total cost is

minimum among all v-to-u paths. The distance between a source vertex v and a target vertex

u on a graph, d(v, u), is the total cost of a shortest path between v and u.

A path from a source vertex v to a sub-graph Ω is said to be a shortest path between v and Ωif its total cost is minimum among all v-to-u paths, where u ∈ Ω. The distance from a node

v to a sub-graph Ω, d(v,Ω), is the total cost of a shortest path between v and Ω and it is

represented by:

d(v,Ω) = minu∈Ω

d(v, u). (3.2)

A path from a sub-graph Ω1 to a sub-graph Ω2 is said to be a shortest path between Ω1 and

Ω2 if its total cost is minimum among all v-to-u paths, where u ∈ Ω1 and v ∈ Ω2. The distance

from a sub-graph Ω1 to a sub-graph Ω2, d(Ω1,Ω2), is the total cost of a shortest path between

Ω1 and Ω2, and is given by:

d(Ω1,Ω2) = minv∈Ω1,u∈Ω2

d(v, u). (3.3)

3.2 Underlying Principle

As mentioned before, sta lines can be considered as the only extensive objects made from

black pixels in the music score, connected paths of black pixels from the left side to the right

side of the music score. Assuming that paths through black pixels are preferred over paths

through white pixels, sta lines can then be found among the shortest paths from the left to

the right margin of the music score. Sta lines can be modelled as paths between two regions

Ω1 and Ω2; these represent, respectively, the left and right margins of the score. Besides that,

one may assume that sta lines do not zigzag back and forth, left and right. Therefore, one

may restrict the search among connected paths containing one, and only one, pixel in each

3.3. Algorithm Outline 25

column of the image2. Formally, let I be an N1×N2 image and dene an admissible sta to be

s = (x, y(x))N1x=1 , s.t. ∀x |y(x)− y(x− 1)| ≤ 1,

where y is a mapping y : [1, · · · , N1]→ [1, · · · , N2]. That is, a sta line is an 8-connected path

of pixels in the image from left to right, containing one, and only one, pixel in each column of

the image.

Given the weight function w(p, q), the cost of a sta can be dened as

C(s) =N1∑

i=2

w(vi−1, vi) (3.4)

The optimal sta line that minimizes this cost can be found using dynamic programming. The

rst step is to traverse the image from the second column to the last column and compute the

cumulative minimum cost C for all possible connected sta lines for each entry (i, j):

C(i, j) = min

C(i− 1, j − 1) + w(pi−1,j−1; pi,j)

C(i− 1, j) + w(pi−1,j ; pi,j)

C(i− 1, j + 1) + w(pi−1,j+1; pi,j)

,

where w(pi,j ; pl,m) represents the weight of the edge incident with pixels at positions (i, j) and(l,m). At the end of this process,

minj∈1,··· ,N2

C(N1, j)

indicates the end of the minimal connected sta. Hence, in the second step, one backtrack from

this minimum entry on C to nd the path of the optimal sta.

3.3 Algorithm Outline

Assume one wants to nd all sta lines present in a score. This can be approached by succes-

sively nding and erasing the shortest path from the left to the right margin of the score. The

erase operation is crucial to ensure that a sta is not detected more than once3.

Consider the music score presented in Figure 3.1(a); in Figure 3.1(b) the rst eleven shortest

paths are traced. From this example it is possible to conclude that music symbols placed on

top of sta lines do not interfere with the detection of the sta lines. Moreover, the example

also makes clear that slightly skewed scores do not pose any problem to the proposed approach.

Nonetheless, two main issues need to be properly addressed. On the one hand, a criterion

is needed to stop the iterative detection of the shortest paths, that is, the sta lines. On the

2These assumptions, 8-connectivity and one pixel per column, impose a maximum detectable 45 rotationdegree. However, a higher rotation degree, in a real case, is unlikely to happen, but may be previously corrected.

3We implemented the erase operation by setting to white the pixels on the detected sta; image resizingcould be a valid alternative [AS07].


(a) Skewed sta lines with music symbols. (b) The rst 11 shortest paths between left andright margins.

Figure 3.1: An exemplicative example of the methodology.

other hand, the initial and the nal parts of a path should also be trimmed, since these (almost)

completely white pixels do not belong to the sta line, despite belonging to the shortest path

of an opposite margin. Another subject deserving attention is the computational complexity.

The sequential computation of the shortest path may be prohibitive for some applications. It

would be interesting to be able to compute several 'shortest paths' simultaneously. Thus, before

presenting the complete algorithm, we introduce the concept of a stable path in a graph, which

will allow computing multiple sta lines in a single iteration, instead of sequentially computing

them one at a time.

3.4 Stable Paths on a Graph

Before presenting the formal denition of a stable path on a graph, it is necessary to consider

a hypothetical example of a simplied music score to motivate this concept. Therefore, lets

consider Figure 3.2 where a sta with only four sta lines (rows 1, 4, 6, and 8) in a 8× 9 image

is presented. The sta lines are the black elements, as in the real case. The presence of noise

on some of the sta lines is simulated by discontinuities. The design of the weight function

will be considered in the Section 3.5; for now, it suces to know that the presented graph was

constructed to favour paths through black pixels.

In fact, we can see in Figure 3.2 that the shortest path between the left and right margins

between the sub-graphs Ω1 and Ω2 is the path corresponding to the rst row, entirely through

black pixels. By following the strategy just delineated, one could nd the four sta lines in four

iterations, sequentially. Nonetheless, although only one sta line corresponds to the shortest

path, they all constitute a sort of (almost) optimal paths. The stable paths concept provides a

means to nd all of such paths simultaneously.

Denition 1. A path Ps,t is a stable path between regions Ω1 and Ω2 if Ps,t is the shortest

path between s ∈ Ω1 and the whole region Ω2, and Ps,t is the shortest path between t ∈ Ω2 and

the whole region Ω1.

The naming of stable paths has its roots in dynamical systems, as it resembles stable xed

points. If one considers the function FΩ1→Ω2 (), mapping a node s ∈ Ω1 to a node t ∈ Ω2 by

3.4. Stable Paths on a Graph 27

Figure 3.2: Stable paths on a toy example.

nding the shortest path Ps,t between s ∈ Ω1 and Ω2, with t = FΩ1→Ω2 (s) as the end node

of such shortest path, then

GΩ1→Ω1 (s) = FΩ2→Ω1 (FΩ1→Ω2 (s)) = s

if and only if Ps,t is a stable path. Note that the concept of stable path is valid for any graph

and any two sub-graphs in general. The computation of the stable paths on the toy example

of Figure 3.2 provides the three paths yellow-highlighted in the gure.

As a second example, in Figure 3.3(a) the shortest paths between each point on the left margin

and the whole right margin are traced for the score in Figure 3.1(a). As seen, the paths got

attracted by the staines. Likewise, Figure 3.3(b) shows the shortest paths between each point

on the right margin and the whole left margin. The set of stable paths between both margins

result as the set of paths present in both gures.

With the concept of Stable Paths, the computation of all the stable paths in the graph derived

from an image has only roughly twice the complexity of the shortest path computation. Notic-

ing that the procedure delineated in Section 3.3 actually gives the shortest path between the

whole left margin Ω1 and each point on the right margin Ω2. The rst step on the computation

of the stable paths corresponds verbatim to the computation of the shortest path presented on

Section 3.3. In a second step one repeats the same procedure, traversing now the graph from

the right column to the left. At the end of this process, if the two endpoints of a direct and

reverse path coincide, we are in the presence of a stable path.


(a) Shortest paths from each pixel in the left columnand the whole right column, superimposed on the orig-inal image.

(b) Shortest paths from each pixel in the right columnand the whole left column, superimposed on the origi-nal image.

Figure 3.3: Exemplication of stable paths for Figure 3.1(a).

On the toy example of Figure 3.2, the computation of the stable paths provides only three

out of the four sta lines, with the last stable path following partially through a segment of

the third sta line and a segment of the fourth sta line4. As a consequence, it appears that

the computation of the stable paths does not guarantee the discovery of all paths of interest.

Applying, in a second time, the stable paths procedure, after erasing the paths found on the

rst iteration, we get a new set of paths, including a path joining the remaining segments of the

3rd and 4th sta lines. A search based on stable paths in preference to a sequential search by

the shortest path is related with the number of stable paths found simultaneously. For instance,

while a score with 60 sta lines would require 60 iterations of the shortest path algorithm, it

requires a few (typically between 4 to 6) iterations with the stable paths method. Next, the

complete proposed algorithm for sta line detection is detailed.

3.5 Proposed Algorithm

The proposed algorithm can be implemented as a sequence of a few high-level operations see

Listing 1 that will be described in this Section.

Preprocessing With the objective of detecting the sta lines, the proposed algorithm starts

by estimating the sta space height, staspaceheight, and sta line height, staineheight.

These lengths will both be used as reference measures for the subsequent operations.

There are already, in common use, robust estimators for its calculation. Usually, the

most used technique starts by computing the vertical run-lengths representation of the

image. After doing that to a bit-mapped page of music, the most common black-runs

represents the sta line height and the most common white-runs represents the sta space

height [Fuj04]. After estimating the reference lengths, the edges' weights are determined

as explained in Section 3.6.

4Note that the sequential search of the shortest path would suer from the same limitations.

3.5. Proposed Algorithm 29

BEGIN PreProcessing

compute staffspaceheight and stafflineheight

compute weights of the graph

END PreProcessing

CYCLE

compute stable paths

validate paths with blackness and shape

remove valid paths from image

add valid paths to list of stafflines

END OF CYCLE if no valid path is found

BEGIN PostProcessing

uncross stafflines

organize stafflines in staves

smooth and trim stafflines

END PostProcessing

Listing 1: Main operations of the proposed method.

Main Cycle The preprocessing is followed by the main cycle of the methodology that incorpo-

rates successively nding the stable paths between the left and right margins. Moreover,

it is in this step that the paths found are added to the list of sta lines and then erased

from the image. The erase operation sets to white the pixels on a vertical strip of height

empirically xed at staspaceheight, centred on the detected sta line. As explained previ-

ously, the erase operation is necessary to ensure that a line is not detected multiple times,

even if its height is higher than one pixel.

Stopping Rule To stop the iterative sta line search, a sequence of (arguably) sensible rules

is used to validate the stable paths found; if none of them passes the checking, the

iterative search is stopped. Two validation rules were applied, both assessing features

with respect to the median values obtained during the rst iteration. If a path does not

have a percentage of black pixels above a xed threshold, it is discarded. The median

percentage of blackness of all lines found in the rst iteration of the main cycle provides

the necessary reference: a threshold of 80% of the median value was empirically selected.

Likewise, if the shape of the detected path diers too much from the shape of the line with

median blackness, it is discarded. A dissimilarity measured as the average y− distancebetween both paths, after removing the means above shapedi = 4×staspaceheight wasselected as threshold.

PostProcessing After the main search step, valid detected sta lines are post-processed.

Although true sta lines never intersect, the above algorithm may occasionally create

intersecting lines, detected on dierent iterations. That may be due to a local low quality

of a line, leading a stable path to jump between consecutive lines. In doing so, this local

discontinuities can be the reason for possible zigzag back and forth between consecutive

lines, because the detected path is likely to follow and connect the remaining segments,

and consequently intersect with the previous detected path. To preclude such nal, unde-

sired state, lines are post-processed to remove intersections. That is easily and eciently

accomplished by, for each image column, sorting on y the pixels of the detected lines and


assigning the i-pixel to the i-line. After this simple process, lines may touch but they do

not intersect.

After removing the intersections it is now possible to eliminate spurious sta lines and

to cluster them in staves. Since the lines are ordered, these operations require only it-

erating through the list of lines and starting a new sta whenever the distance between

two consecutive lines is above a xed threshold. The value considered for this thresh-

old was = 2× staspaceheight. Subsequently, spurious staves are eliminated by simply

discarding those with only single sta line. However more robust rules can be created, be-

cause staves with only one sta line exist (e.g. percussion), although they are uncommon.

Finally, each retained line is trimmed at the beginning and at the ending and smoothed.

As visible in the example of Figure 3.1(b), before meeting with a sta line, a path travels

through a sequence of white pixels. Likewise, after the end of the sta line, the path goes

again through a sequence of white pixels until it meets the right margin of the image. In

order to ignore all of these white pixels (undesirable segments), the trimming operation

works per sta. Thus, for each sta, a sequence of median colours is computed as follows:

for each column, the median of the colours of the lines is added to the sequence. Next,

the trimming points are found on this sequence: starting on the centre, we traverse

the sequence to the left and to the right until a run of whiterun = 2×staspaceheightwhite pixels is found (value obtained experimentally). The pixels between the left and

right runs are kept in the sta lines. At the end, lines are smoothed with a standard

average low-pass lter. Considering a sta line as a sequence y(x) of y-positions, a one-

dimensional averaging lter is applied. A window size of 2×staspaceheight was selectedon the experiments.

3.6 Design of the Weight Function

An immediate approach is to support the design of the weight function solely on the values

of the incident nodes, that is, if any of the corresponding pixels are black then a low cost is

assigned to the edge, otherwise the edge assumes a high cost. We call this the baseWeight

in Listing 2. Nevertheless, the weight function can be generalized taking into account other

factors. It is very useful to consider the prior knowledge about a music score when we want

to nd the sta lines. In order to incorporate this idea in the shortest path process, we

consider dierent alternatives to modify the weight function of the graph. In doing so, this

weight function will be modied by a term that codies that information about the music sheet.

Furthermore, two more attributes were considered in each pixel with the aim to inuence the

main contribution to the edge's weight resulting from the values of the incident pixels. The

prime intention of these additional features is to discriminate black pixels in the sta lines

from black pixels in the music symbols, penalizing the latter and favouring the former. With

this in mind, if a black pixel is part of a short vertical run of black pixels, then it is more

3.7. Sta Line Removal 31

likely to be part of a sta line rather than of a symbol. Therefore, a term beneting such

edges is included in the weight function. The other attribute is that a sta line is likely to

have another sta line at roughly staspaceheight pixels, assuming that staves have at least

two lines. Hence, if the nearest vertical run of black pixels on the same column is excessively

far from the vertical run of black pixels containing the current black pixel, then this pixel is

more likely to belong to a symbol (for instance, it may belong to a ligature) rather than to

sta line. Consequently, a penalising term is incorporated in the weight function for these cases.

The pseudo-code for the weight function is provided in Listing 2.

WeightFunction(pixelValue1, pixelValue2, vRun1,

vRun2, nearestVRun1, nearestVRun2,

NeighbourhoodType)

value = min(pixelValue1, pixelValue2);

weight = baseWeight(value, NeighbourhoodType);

if( (vRun1<=STAFFLINEHEIGHT)

OR(vRun2<=STAFFLINEHEIGHT))

weight = weight - delta;

if( (nearestVRun1>=STAFFSPACEHEIGHT+STAFFLINEHEIGHT)

OR(nearestVRun2>=STAFFSPACEHEIGHT+STAFFLINEHEIGHT))

weight = weight + delta;

return weight;

Listing 2: Pseudo-code for the weight Function. The base weight was set to 4 on black pixelsand 8 on white pixels for 4-neighbourhoods and to 6 and 12 on for 8-neighbourhoods. Thedelta penalizing term in the weight function was set to 1. For eciency, weights were designedwith integer values.

3.7 Sta Line Removal

The sta lines removal algorithms used in this work were adopted from [DDCF08]. Line Track

Height, Line Track Chord, Roach/Tatem and Skeleton were the algorithms chosen for the

experiments. A brief description of these algorithms will be provided now:

Line Track Height The algorithm tracks the sta lines and checks when a vertical black run

is longer than a threshold (experimentally set at 2×staineheight).

Line Track Height Modied The version modied of the Line Track Height algorithm also

track the sta lines positions obtained by a detection algorithm and removes vertical run

sequences of black pixels that have a value lower than a specied threshold (chosen exper-

imentally as 2×staineheight). In this version, a carefully attention to the deformations

sta lines may have discontinuities, be curved or inclined that may occur in the music

scores are given. These problems will inuence the success to achieve a correct detection

of lines contained on the score. The positions of the sta lines obtained by a sta line

detection algorithm may pass slightly above or under the real sta lines positions. There-

fore, if we are in presence of a white pixel when the sta lines are tracked, we search


vertically for the closest black pixel. If that distance is lower than a specied tolerance

experimentally chosen as 1+ceil(staineheight/3.0) we move the reference position of

the sta line to the position of the black pixel found5.

Line Track Chord This algorithm computes, for a xed angle resolution of three degrees,

the chord length through the skeleton point (see Figure 3.4). This results in a function

chordlenght(ϕ), where ϕ is the chord angle, for each skeleton point. When the sta line

pixel also belongs to a crossing music symbol, the function should have a second distinct

peak. To detect this peak the following thresholds are used:

1. There is a local maximum when chordlength is greater than 5×staineheight at an

angle below 30 degrees and another local maximum when chordlength is greater

than 1.75×staineheight× sin(ϕ) at an angle ϕ > 30 degrees.

2. The valley between two maxima must have a depth greater than 1.5×staineheight.

Concluding, this algorithm removes the sta line through the angles peaks of the chord

lengths. There are two distinct peaks depending if the pixels belong to a sta line or a

music symbol.

Figure 3.4: Length of a chord through a skeleton point at some angle ϕ.

Roach/Tatem This algorithm uses a labelling scheme based on the angle information and

pixel adjacency to identify the sta line pixels [RT88]. The chord length and the angle

function described in the Line Track Chord are computed for every pixel. Consequently,

the original image can be transformed into two-dimensional vector eld by picking the

angle and length of the longest chord for each black pixel. This will assign pixels on sta

lines a high length value and an angle value of zero. To avoid the removal of symbol pixels

on the sta lines, some horizontal line pixels are iteratively relabelled as non-horizontal

pixels, depending on the labels of their neighboring pixels.

Skeleton This method consists of the following steps:

1. The skeleton is split at branching point and corner points with an angle below

135 degrees. Around each spliting point a number of pixels are removed see

Figure 3.5(a).

5See Appendix A.6 for more details.

3.8. Database of Music Scores 33

2. Sta line segment candidates are picked as skeleton segments if the orientation an-

gle (least square tted line) is below 25 degrees, the segment is wider than tall

and the straightness (mean square deviation from least square tted line) is bellow

staineheigth2/2.

3. Apply a sta-nding algorithm to the sta segment candidates. Two sta segments

are horizontally linked when their extrapolations from the end points with the least

square tted angle come closer than staineheigth/2.

4. Remove false positives: from each ovelapping sta segment group on the same line

the one that is closest to its least square tted neighborhood is picked and the oth-

ers are discarded; non-sta segments that have the same branching point as a sta

segment are extrapollated by a parametric parabola: if this parabola is approxi-

mately tangential to the sta segment, the latter is considered a false positive see

Figure 3.5(b).

5. Remove sta lines: all vertical black runs around the detected sta skeleton are

removed.

(a) Pixels within the distance transformradius around each splitting point are re-moved.

(b) A falsely detected sta segment that canbe identied as belonging to a music symbolbecause it is appoximately tangential to an ex-trapolated parabola from a non-sta segment.

Figure 3.5: Example (from [DDCF08]).

3.8 Database of Music Scores

For the purpose of testing the proposed algorithm, a database with a sucient number of scores

to produce signicant results on which we can draw conclusions and test improvements, was

created. Although the assessment of a new sta detection algorithm may be done by visually

inspecting the output on a set of scores as adopted on [RCC+07] , here we support the

comparison with two test sets adopted for the qualitative evaluation of the proposed method:

the test set of synthetic scores from [DDCF08] and a new set of real score. The synthetic

database consists of 32 ideal scores see Figure 3.6(a) with dierent musical notation to

which several known deformations were applied. In total 2688 images were generated. The

simulation of distortions in a controlled manner made possible to build many feasible problems

that can be found in the handwritten music scores. Despite this we also test the performance of


the stable path algorithm with a test set of real scores in total 50 images from the Portuguese

composer Fernando Lapa with other old handwritten scores (see Figure 3.6(b)).

(a) Music score from the test set of 32 ideal scores. (b) Music score from the test set of 50 real scores.

Figure 3.6: Two examples of music scores from the test set used on the experimental evaluation.

3.8.1 Deformations in the Synthetic Scores6

The Gamera framework7 (Generalized Algorithms and Methods for Enhancement and Restora-

tion of Archives) was the base platform used to create the deformation database with intention

to development and test the proposed algorithm. In a brief description, this framework is a

toolkit for building document image recognition systems. It has a programming library with a

set of graphics tools for experimentation and training. Gamera provides a wealth of tools for

image processing. In this work, in order to complete the Gamera framework, for the concrete

problem of building a syntactic database, the MusicStaves plug-in was included. This toolkit

appears to be ideal for the problem, because

¶ It is a specic platform for the development and test of the detection and removal sta

lines algorithms.

· It is an open source toolkit.

¸ It allows using handwritten and printed scores with the algorithms already implemented

in Gamera.

The distortions adopted from [DDCF08] applied to the test set of ideal scores can be classied

in two categories:

1. Deterministic deformations, which depend of certain parameters, as for instance the ro-

tation.

2. Random defects, which use various parameters about the deformation and a pseudo-

random number generator.

In both cases it is necessary to apply the deformation in parallel with the original score and

the ground-truth sta image.

Despite some deformations being easy to understand by the name, there are others where this

is not true (the defects resolution, rotation and line interruption are self explanatory). A brief

6Consult [DDCF08] for more details.7See http://ldp.library.jhu.edu/vhost-base/gamera and [Cap08] for more details.


Deformation Type Parameter description

Rotation Deterministic Rotation angle

Curvature Deterministic Height ratio: width of sine curve

Typeset Emulation Both Range width, maximal heightand variance of vertical shift

Sta Line Interruptions Random Interruption frequency, maximal widthand variance of range width

Sta Line Thickness Variations Random Markov chain stacionarydistribution and inertia factor

Sta Line y-variation Random Markov chain stacionarydistribution and inertia factor

Degradation After Kanungo Random (η, α0, α, β0, β, κ), see [KHB+00]

White speckles Random Speckles frequency, randow walklength and smoothing factor

Table 3.1: Deformations in the images.

explanation of these distortions will be done. In Table 3.1 the deformations considered, which

are available in the MusicStaves toolkit, are listed. In Figure 3.7 it is possible to see the eects

caused in the images.

(a) (b) (c)

(d) (e) (f)

(g) (h) (i)

Figure 3.7: Some examples of applied deformations from the original image: a) Original; b)Curvature c) Degradation after Kanungo; d) Sta line thickness variation; e) Sta line y-variation; f) Typeset emulation; g) Rotation; h) White Speckles; i) Sta line Interruptions.

Curvature The curvature is obtained through a half sine wave over the entire score area. The

intensity of the resulting curvature can be measure as the ratio of amplitude (height) and

the width of the wave.


Typeset emulation This deformation tries to reproduce the 16th-century prints, which are

similar, in some ways, to typewriters, causing, therefore, interruptions in the lines between

the symbols and a random vertical shift in each portion containing a symbol.

Sta line thickness variation and sta line y-variation Sta line thickness variation and

sta line y-variation are obtained by a Markov chain describing the evolution of the sta

line thickness from left to right. These deformations are achieved by that process because,

usually, the thickness at a particular x-position depends on the thickness at the previous

x-position. The parameter is the transition probability matrix P with pij := probability

of transition from thickness or y-deviation i to thickness or y-deviation j. The thickness

or y-deviation can be of n dierent values (states). To the stationary distribution of the

individual states a symmetric binomial distribution is assumed, that is:

πi =

(n− 1i− 1

)1

2n−1

The mean value (n− 1)/2 of this distribution is associated with the original value in the

image without the deformation (staine_height for the thickness or zero for the devia-

tion from the original y-position). The Markov chain is generated with the Metropolis-

Hastings algorithm [Has70]. The transition probability matrix Q, to obtain candidate

transition points, is chosen to be:

qij =

c for j = i

1− c/2 for j = i± 10 otherwise

where the probability c can be considered as an inertia factor that allows smooth transi-

tions: the closer c is to one, the slower is the state variation.

Degradation after Kanungo This deformation tries to imitate the local distortions caused

during printing and scanning. The model has six parameters (η, α0, α, β0, β, κ) with

dierent meanings:

Each black pixel in the original image is ipped with probability α0 exp−αd2

+η,where d is the distance to the closest background pixel.

Each background pixel is ipped with probability β0 exp−βd2

+η, where d is the

distance to the closest foreground pixel.

κ is the diameter of a disk of a morphological closing operation.

White speckles This degradation model has three parameters (p, n, k) with the following

meaning:

Each black pixel in the original image is taken with probability p as a starting point

for a random walk of length n.

k is a rectangle of a morphological closing operation that will smooth an image

containing the random walk.


Deformation Parameter range

Rotation angle = −5 : 2.5 : 5Curvature amplitude/stawidth = 0.02 : 0.02 : 0.10Typeset Emulation n_gap = 1 : 3 : 13, n_shift = 1 : 3 : 13, p_gap = 0.5Sta Line Interruptions frequency α = 0.01 : 0.02 : 0.10,

binomial parameter for width: n = 6, p = 0.5Sta Line Thickness Variation inertia c = 0.8, maximum deviation = 2 : 1 : 6Sta Line y-variation inertia c = 0.8, maximum deviation = 2 : 1 : 6Degradation After Kanungo η = 0, α0 = 0.5, α = 0.25 : 0.3 : 1.5,

β0 = 0.5, β = 0.25 : 0.3 : 1.5, κ = 2White Speckles smoothing factor k = 2,

random walk length n = 10,speckle frequency p = 0.03 : 0.02 : 0.11

Table 3.2: Ranges of deformation parameters used in the tests: min:step:max.

The image with the random walks is subtracted from the original: an image with

white speckles at the random walk positions is obtained.

Therefore, p can be interpreted as the speckle frequency, n as a measure for the speckle

and k as a smoothing factor.

The range of deformations parameters in our test set is given in Table 3.2. The values were

restricted with the aim of obtaining realistic deformations. Even so, when real scores are

treated, generally, the deformations do not occur in its pure form; a combination from several

deformations is more prone to happen. However, because the evaluation of the detection

algorithms is our objective it is more adequate to research the performance in each isolated

deformation.

3.8.2 Ground-truth Information

In order to establish a qualitative comparison between the various sta lines detection and

removal algorithms a ground-truth information is necessary. Hence the black pixels of the

images need to be labelled as being sta lines or otherwise. The manual process has two

disadvantages: it is very time consuming and it is prone to occur errors (e.g. the existence

of dubious pixels that may belong to the line or to the symbol that crosses the sta line).

Therefore, ground-truth information of the syntactic images released by the authors of the

MusicStaves toolkit was used. In Figure 3.8 the deformation process and the ground-truth

generation are displayed. In the code of the postscript images (scores) the macros of drawing

lines were removed and the resulting images were converted to one-bit raster images. In this

manner, real musical scores without lines can be obtained. The distortions are applied in the

bitmap images.

When we are dealing with handwritten music sheets we can only achieve the ground-truth

information manually, that is, we need to delete the symbols present in the score in order to


Figure 3.8: Generation of the deformed images and the respective Ground-Truth.

retain just the sta lines segments.

In order to evaluate the performance of the sta lines detection algorithm the scores with

only lines will be used as references, while the sta-less images will be used to test the sta

lines removal algorithms. Note that it is possible to directly store the same references for the

deformations synthetics images through skeleton generated in each distortion. However, once

again, in the handwritten music scores this process must be manual.

3.9 Evaluation Metrics and Results

This section presents the error metrics used to evaluate the performance of the proposed al-

gorithm for the detection and removal of sta lines in comparison with the shortest path al-

gorithm8 and the algorithms present in [DDCF08]. However, since the Dalitz's algorithm9 has

a signicantly better performance than the other two proposed in [DDCF08], the comparison

results presented here will be only for the Daltiz's algorithm.

3.9.1 Detection Error Metrics

With the purpose of evaluating the performance of the stable path algorithm and to study their

behaviour with respect to the several deformations applied to the images, two dierent error

metrics were considered: the percentage of false positive sta lines and the percentage of sta

lines missed to detect.

The process starts by computing the average Euclidian distance between each reference sta

line given by the ground-truth information; see Subsection 3.8.2 and each sta line actually

detected. Next, the matching problem on the resulting bipartite graph is solved by minimizing

the assignment cost, that is, the distance10. Only pairs with average error-distance below

staineheight were assumed correctly matched. The remaining pairs were assumed to originate

from a false positive sta line being matched to an undetected true sta line and were therefore

unmatched. In the end, the two metrics result as the number of unmatched detected sta lines,

8Description of the algorithm in [CCRG08b].9See Appendix A.3 for a brief description.

10See Appendix A.4 for a brief description of the method.

3.9. Evaluation Metrics and Results 39

that is, false positive, and unmatched reference sta lines, that is, false negative. It should be

noted that these metrics only measure whether sta lines are found, not how good the match

is.

Results

From the results presented in Table 3.3 we can conclude that the stable path based approach

(and the shortest path approach) outperforms the Dalitz algorithm11, for both error metrics

dened. In some deformations a visible/considerable dierence between stable path and Dalitz

algorithms does not exist. However, there are not cases where the proposed algorithm is in

disadvantage with respect to Dalitz algorithm.

Besides that, the performance of the stable path approach is almost independent of intensity of

the deformation, for the range of values considered. This performance gain is even more note-

worthy as the Dalitz algorithm is receiving as input the correct number of lines per sta. Had

this not been the case, the dierence between both would have been much larger. The current

implementation of the stable path algorithm run as fast as the Dalitz algorithm (and about ve

times faster than the shortest path version), as available at the Sta Removal Toolkit [DDCF07].

By generalizing the design of the weight function and enhancing the postprocessing, we were

able to improve the detection performance on the initial results in [CCRG08b]; the use of the

stable paths allowed to improve the detection speed.

Since all the parameters of the method are scaled by one of these two values estimates staine-

height and staspaceheight, a simple analysis of the sensitivity of the approach to these param-

eters was done. We re-run the experiments with values for staineheight and staspaceheight

one pixel above and below the true estimated values. In both cases, the stable path method

presented a good performance and continued to yield the best results (the strongest degradation

occurred for the rotation degradation, with angle=5°, where both error rates increased to 1.7%).

In a second similar experiment we evaluated the sta line detection methods on a set of 50

real music scores, for which reference sta lines were manually outlined as referenced in Sub-

section 3.8.2. Images were previously binarized with the Otsu threshold algorithm12, as im-

plemented in the Gamera project13. The evaluation of both detection algorithms yielded the

results presented in Table 3.4. Again, and although the correct number of sta lines per sta

was inputted into the Dalitz' algorithm, the stable path approach obtained the best perfor-

mance.

11For the deformations not shown, the stable path is not signicantly better than Dalitz.12See Appendix A.5 for a brief description of the method.13http://gamera.sourceforge.net

http://gamera.sourceforge.net


rotation

Angle

-5-2.5

02.5

5Runtime

Error

Stablepath

0.7

(3.5);0.7

(3.5)

0.7

(3.5);0.7

(3.5)

0.6

(3.5);0.6

(3.5)

0.7

(3.5);0.7

(3.5)

1.2

(4.0);1.2

(4.0)

858sec.

SortestPath

0.7

(3.5);0.7

(3.5)

0.7

(3.5);0.7

(3.5)

0.6

(3.5);0.6

(3.5)

0.7

(3.5);0.7

(3.5)

1.2

(4.0);1.2

(4.0)

6006sec.

Dalitz

17.8

(22.0);51.7

(38.1)

8.6

(14.0);15.5

(28.7)

0.0

(0.0);0.0

(0.0)

4.2

(19.6);9.8

(29.0)

5.5

(9.3);37.5

(41.9)

612sec.

curvature

Amplitude/stawidth

0.02

0.04

0.06

0.08

0.10

Runtime

Error

Stablepath

0.7

(3.5);0.7

(3.5)

0.8

(3.6);0.8

(3.6)

0.7

(3.5);0.7

(3.5)

0.8

(3.6);0.8

(3.6)

1.2

(4.0);1.2

(4.0)

822sec.

SortestPath

0.7

(3.5);0.7

(3.5)

0.8

(3.6);0.8

(3.6)

0.7

(3.5);0.7

(3.5)

0.8

(3.6);0.8

(3.6)

1.2

(4.0);1.2

(4.0)

5554sec.

Dalitz

12.7

(29.0);12.6

(28.7)

53.8

(44.5);55.7

(45.2)

83.3

(30.6);83.9

(30.4)

95.8

(17.5);100.0

(0.0)

96.0

(17.5);100.0

(0.0)

639sec.

whitesp

eckle

Rate

whitened

pixels

0.03

0.05

0.07

0.09

0.11

Runtime

Error

Stablepath

0.7

(3.5);0.7

(3.5)

0.8

(3.6);0.8

(3.6)

0.9

(3.7);0.9

(3.7)

1.2

(3.8);1.2

(3.8)

2.1

(4.6);2.3

(4.8)

809sec.

SortestPath

0.7

(3.5);0.7

(3.5)

0.8

(3.6);0.8

(3.6)

0.9

(3.7);0.9

(3.7)

1.7

(4.0);1.9

(4.3)

5.3

(7.4);7.0

(9.6)

5122sec.

Dalitz

0.0

(0.0);0.0

(0.0)

0.3

(1.4);0.3

(1.4)

26.7

(25.3);29.9

(27.2)

89.3

(54.6);86.9

(25.6)

54.5

(55.9);95.2

(17.0)

872sec.

liney-variation

Maxdeviation,n

23

45

6Runtime

Error

Stablepath

0.7

(3.5);0.7

(3.5)

0.7

(3.5);0.7

(3.5)

0.7

(3.5);0.7

(3.5)

0.8

(3.6);0.8

(3.6)

1.1

(3.8);1.1

(3.8)

767sec.

SortestPath

0.7

(3.5);0.7

(3.5)

0.7

(3.5);0.7

(3.5)

0.7

(3.5);0.7

(3.5)

0.8

(3.6);0.8

(3.6)

1.1

(3.8);1.1

(3.8)

5122sec.

Dalitz

31.0

(50.7);31.7

(46.1)

22.1

(38.9);32.6

(45.6)

15.7

(27.2);33.7

(45.0)

13.0

(20.1);33.7

(45.0)

12.8

(18.6);34.2

(44.7)

768sec.

typesetemulationI

Maxgapwidth,ngap

14

710

13

Runtime

Error

Stablepath

0.6

(3.5);0.6

(3.5)

0.6

(3.5);0.6

(3.5)

0.6

(3.5);0.6

(3.5)

0.6

(3.5);0.6

(3.5)

0.6

(3.5);0.6

(3.5)

739sec.

SortestPath

0.6

(3.5);0.6

(3.5)

0.6

(3.5);0.6

(3.5)

0.6

(3.5);0.6

(3.5)

0.6

(3.5);0.6

(3.5)

0.6

(3.5);0.6

(3.5)

5085sec.

Dalitz

19.7

(34.1);13.7

(21.6)

21.4

(27.4);15.4

(17.3)

22.3

(30.0);17.4

(19.0)

24.2

(38.9);16.7

(22.0)

31.4

(42.3);19.2

(20.3)

703sec.

typesetemulationII

Maxverticalshift,

nshift

14

710

13

Runtime

Error

Stablepath

0.6

(3.5);0.6

(3.5)

0.7

(3.8);0.7

(3.8)

0.7

(3.8);0.7

(3.8)

0.7

(3.8);0.7

(3.8)

0.7

(3.8);0.7

(3.8)

886sec.

SortestPath

0.6

(3.5);0.6

(3.5)

0.7

(3.8);0.7

(3.8)

0.7

(3.8);0.7

(3.8)

0.7

(3.8);0.7

(3.8)

0.7

(3.8);0.7

(3.8)

6106sec.

Dalitz

3.3

(10.4);2.3

(7.3)

15.9

(27.1);12.0

(17.1)

39.0

(48.7);24.6

(27.3)

48.7

(60.8);29.3

(30.7)

58.7

(54.9);37.3

(29.0)

842sec.

Table3.3:

Eectof

dierentdeform

ations

ontheoverallsta

detectionerrorratesin

percentage:

average(stand

arddeviation)

ofthefalsedetection

rate

andmissdetectionrate.See[DDCF08]forparameter

details.


False detection rate miss detection rate RuntimeDalitz 5.2% (10.4) 5.9% (11.3) 112 sec.

Shortest path 1.4% (3.5) 2.5% (7.3) 612 sec.Stable path 1.3% (5.7) 1.4% (6.4) 115 sec.

Table 3.4: Detection performance on real music scores in percentage: average (standard devi-ation).

3.9.2 Removal Error Metrics

Sta line detection algorithms can be used as a rst step in many sta removal algorithms.

To understand the potential of our algorithm to leverage the performance of existing sta re-

moval algorithms, we conducted a series of experiments, comparing existing versions of sta

line removal algorithms with modied versions of them, making use of the stable path algo-

rithm at the sta line detection step. The quantitative comparison of the dierent algorithms

is totally in line with the comparison presented in [DDCF08]. Adopting the naming convention

from [DDCF08], the following algorithms were adapted: LineTrack Height, LineTrack Chord,

Roach/Tatem, as already described in Section 3.7.

The original version of the algorithms was considered as available in [DDCF07], making use of

the Dalitz algorithm in the detection phase. The modied versions use instead the stable path

for detecting lines. We also adopted the same error metrics (individual pixels, sta-segment

regions and sta interruption location) presented in [DDCF08] and conducted the comparative

study on the same test set see Section 3.8.

Individual Pixels This metric considers the sta line removal as a two-class classication

problem at the pixel level, that is, one pixel can belong to a sta line or not. Therefore,

a natural performance measure is the error rate for this classication, given by

Pixel error rate =x+ y

z

x =Number of misclassied sta pixels

y =Number of misclassied non sta pixels

z =Number of all black pixels

However, this metric has one problem: little information is given about how well the sta

removal algorithm separates symbols that are otherwise connected by sta lines; this error

only indicates how badly the symbols are distorted when compared to the ideal sta-less

images.

Segmentation Region Level This error metric has as base the regions of the line segments.

The sta line removal can be considered as a segmentation problem where the sta lines

segments need to be separated from the symbol segments.


In an OMR application the sta lines segments are considered background and the re-

maining symbols are taken as being the segments of interest. Nevertheless, when we

are trying to evaluate the quality of the sta line removal the situation is reversed: our

interest lies on the sta segments and the rest constitutes background.

Following the notation given in [TMD99], we have two segmentations for the set of black

pixels present in the test image:

1. The ground-truth segmentation G = Gobj ∪ gnoise with G = g1, . . . , gM2. The segmentation detected from the algorithm S = Sobj ∪ Snoise with S =s1, . . . , sN, where each gi and sj contains the black pixels of a contiguous sta

segment respectively, and gnoise and snoise contain the remaining background black

pixels, respectively.

In the set of all sta line segments from both segmentations an equivalence classes of

overlapping segments is built(two segments are considered equivalent a ' b when a se-

quence c1, c2, . . . , cn exists with c1 = a, cn = b and ci ∩ ci+1 6= 0). For each equivalence

class r we count the number of the contained numbers of segments G and S and thus

detect recognition errors. All possible cases are listed in Table 3.5. The formula is given

by

Segmentation error rate =x− yz

x =Number of all classes r

y =Number of classes representing a correct recognition

z =Number of all classes r

Classes number Segments from Gobj Segments from Sobj Error description

n1 1 1 Correct

n2 1 0 Missed segment

n3 0 1 Falsely detected segment

n4 1 >1 Segment split

n5 >1 1 Segments merged

n6 >1 >1 Both splitting and merging occurred

Table 3.5: Sta segment extraction errors based on the number of segments in an equivalenceclass r.

Sta Line Interruption To obtain this metric a comparison between the ground-truth im-

ages, containing only the segments removed from the ideal case, with the image that

contains exactly the pixels that were in fact removed by the removal algorithm under

evaluation, is made. In doing so, each sta line is followed, from left to right, in the

images containing only the removed sta segments; and interruptions in the sta line are

looked at. Each interruption represents a detected music symbol that crosses the sta

line. It follows two sets of intervals: the interrupting intervals G = g1, . . . , gM in the

ground-truth data and those in the algorithm output S = s1, . . . , sN.


With the purpose to establish an error metric, a bipartite graph is created by adding links

between intervals gi and sj that overlap. In this manner, two types of error are revealed:

intervals from G to S without a link and intervals with more than one link. In order to

count the number of errors of the second type, the maximum cardinality matching in this

graph is computed [Gal86].This cardinality matching also removes the minimal number

of links leading to the second type error. As a result error rate we have

Sta line interruption error rate =min n3, n1 + n2

n3

n1 =Number of interruptions without link

n2 =Number of removed links

n3 =Number of ground-truth interruptions

Results

From the results obtained we can conclude that the qualitative eects of the deformations

and the insertion of the stable path as the detection algorithm are similar for all three error

metrics. In Tables 3.6, 3.7, 3.8 and 3.9, we present the results for all three error metrics

(individual pixels, sta-segment regions and sta interruption location) for the Line Track

Height algorithm (modied) (LTH), plus the Skeleton algorithm, which exhibited a competitive

performance in [DDCF08]. This algorithm does not clearly separate the sta lines detection

from the sta lines removal, doing both tasks simultaneously. So, this algorithm was used

alone, without providing the detection performed by stable paths algorithm.

Stable path + LTH Dalitz + LTH SkeletonPixel Error Rate 2.8 (1.2) 3.8 (2.6) 6.5 (8.2)

Segmentation Error Rate 0.3 (0.1) 0.3 (0.1) 0.3 (0.1)Sta Line Interruption Error Rate 0.3 (0.1) 0.3 (0.1) 0.3 (0.1)

Table 3.6: Removal performance on real music scores (in percentage): average (standard devi-ation).

A rst observation is that, overall, the replacement of the Dalitz method by the stable path

approach as the sta detection step improved the results in the algorithms under comparison.

Additionally, the LineTrack Height algorithm with the stable path consistently outperformed

the other algorithms. Nevertheless, the skeleton method [DDCF08], which does not have a clear

line detection step, continues to present a competitive performance. It is worth to nalize by

noticing that the skeleton algorithm is about two times slower than the modied Line Tracking

Height algorithm.

In Figure 3.9 it is possible to see some results obtained when we apply the LineTrack Height

algorithm with stable path approach to our test set.


rotation

Angle -5 -2.5 0 2.5 5Stable path + LTH 1.7 (0.7) 1.5 (0.7) 1.4 (0.7) 1.4 (0.7) 1.6 (0.7)

Dalitz + LTH 19.4 (18.4) 5.2 (8.7) 1.4 (0.8) 4.4 (8.8) 17.5 (18.9)Skeleton 1.9 (0.9) 1.7 (0.8) 1.5 (0.7) 1.6 (0.7) 1.7 (0.8)

curvature

Amplitude/stawidth 0.02 0.04 0.06 0.08 0.10Stable path + LTH 1.4 (0.7) 1.4 (0.7) 1.4 (0.7) 1.5 (0.7) 1.6 (0.7)


white speckle

Rate whitened pixels 0.03 0.05 0.07 0.09 0.11Stable path + LTH 11.9 (3.1) 17.2 (4.9) 21.1 (5.9) 24.0 (6.7) 26.1 (7.2)


line y-variation

Max deviation, n 2 3 4 5 6Stable path + LTH 1.2 (0.7) 1.3 (0.7) 1.3 (0.6) 1.4 (0.6) 1.4 (0.6)


typeset emulation I

Max gap width, ngap 1 4 7 10 13Stable path + LTH 1.4 (0.7) 1.4 (0.7) 1.4 (0.7) 1.4 (0.7) 1.4 (0.7)


typeset emulation II

Max vert. shift, nshift 1 4 7 10 13Stable path + LTH 1.4 (0.7) 1.4 (0.7) 1.4 (0.7) 1.5 (0.7) 1.6 (0.7)


Table 3.7: Eect of dierent deformations on the overall sta removal error rates in percentage:average (standard deviation) Individual pixels error.

rotation



curvature



white speckle



line y-variation



typeset emulation I






Table 3.8: Eect of dierent deformations on the overall sta removal error rates in percentage:average (standard deviation) Sta line interruption error.


rotation



curvature



white speckle



line y-variation



typeset emulation I






Table 3.9: Eect of dierent deformations on the overall sta removal error rates in percentage:average (standard deviation) Segmentation region level.

Figure 3.9: Examples of the results obtained using the sta line removal algorithm in our testset.

Part III

Segmentation and Classication

47

Chapter4

Segmentation and Classification Process

This chapter begins with a description of the dierent music symbols that can be found in a

music score. Then proceeds with the description of the segmentation and classication steps.

Music notation, like any other written language, did not appear as the invention of one person.

It emerged from the combined and prolonged eorts of hundreds of musicians. They all hoped

to express by written symbols the essence of their musical ideas [Rea69]. Improvement in those

symbols came about because it was necessary. Hence, the musical notation is very extensive if

we consider all the existing possibilities and their variations over time. The result notation is

a kind of alphabet, shaped by a general consensus of opinion to serve as a general expressive

technique. Music notation is, thus, the visual manifestation of the interrelated properties of

musical sound (as pitch, intensity, time, timbre and pace). Symbols indicating the choice of

tones, their duration and their manner of performance are important because they form this

written language that we call music notation.

Several symbols can be confused with each other because of their shape similarity. Besides

that, in some cases, complex symbols can appear in the score, as for instance the guitar sym-

bols (diagram with the positions of the notes in the instrument). And if we consider that we

are working with handwritten music scores several additional problems come about. One with

more evidence is the variation in the symbols notation caused by each person. In this work we

decided to study and recognize the standard music notation. In this manner, variants of types

of specic notation will not be recognized, as drums notation. Furthermore, since tablature

is something specic to certain instruments, it will also not be recognized. In the following

Section the symbols that we treat in this thesis will be described.

4.1 Musical Symbols

Clefs The term Clef is derived, logically enough, from the Latin word clavis, meaning key.

The three clef signs in common use today had their origins in pitch letters: the earliest

form was the Latin letter F related with the tone f (fa), when a second line was latter

drawn above the text over the F the letter C was axed upon it, representing the pitch

of c' (do), the last letter was G which represents the tone g' (sol). The F clef became

the bass clef because of its lower register; the C clef became either tenor or alto clef ; and

49

50 Chapter 4. Segmentation and Classication Process

the G clef became the soprano or treble clef. The clef symbols are the rst symbols that

appears at the beginning of every music sta. It is very important because they tell us

which note is found on each line or space.

Table 4.1: Clefs.

Treble clef

Alto clef

Bass clef

The modern bass clef is used for all voices and instruments of essentially a bass register;

for instance baritone voices, keyboard and orchestra instruments as piano and organ.

The alto clef is used in English horn, trombones and violas. It is important to stress

that none of these last instruments uses the alto clef exclusively; only the viola employs

it with consistent regularity. The treble clef is the highest of all the clef signs, beginning

below the sta and extending above it [Rea69]. All instruments and voices of high range

employ this symbol, as for instance lyric soprano, orchestra instruments as piccolo and

ute, English horn, saxophone, among others.

Noteheads and stems A musical note indicates two important aspects. On the one hand,

the symbol indicates, by its position on the sta and by the clef used, a pitch to be played

or sung. On the other hand, the note establishes, by the exact appearance of its three

integral parts (notehead, stem and ag), the relative time duration of this musical sound.

Table 4.2: Notes.

Minim

Crochet

The notehead is somewhat oval in shape, and is either open Minim or closed Crochet.

4.1. Musical Symbols 51

The stem is the thin vertical line joined to the side of the note head (closed or open).

When the stem is to go down, it is placed on the left side of the notehead; when the stem

is to go up, it is put on the right side of the notehead. This stem direction is measured

as follows: when the notehead is above the centre, line of the sta, the stem goes down

see Figure 4.1(a); when the note is below the middle line of the sta, the stem goes up

see Figure 4.1(b); and when the note is centred on the sta, the stem may go in either

direction see Figure 4.1(c), although it is more common practice to draw it down. The

stem length is proportional, that is, it is measure interms of the note position on the sta.

(a) Stem down. (b) Stem up. (c) Note positioned in the centre ofsta the stem may go in either di-rection.

Figure 4.1: Example.

Flags and beams Flags are employed to indicate the relative time values of the notes with

black (closed) noteheads. These symbols are featured by a thin curved stroke that joins

to the end of the stem, and always goes to the right of this symbol, regardless of whether

the stem goes up or down. When a dot follows the notehead, the end of the curved line is

often shortened so as not to conict with the dot. On a downward stem the curved line

barely touches the bottom of the notehead.

Table 4.3: Flags and beams.

Quaver

Semiquaver

Demisemiquaver

Hemidemisemiquaver

Beams are compound ags used to connect notes in note-groups. The number of beams

always equals the number of ags appropriate for each note. These symbols demonstrate

the metrical and the rhythmic divisions within the measure and also the note-groupings


in oppositions to the normal patterns of the measure. Beams are considered in two

categories: primary (beams that link an entire group; without breaks) and secondary

(beams that are interrupted or partially broken) see Figure 4.2. The position of the

beams is solved by the same general principles that govern stem direction. That is, if the

majority of stems normally go up in the group to be beamed, then all the stems will go

up and the beams will be placed above the notes; on the other hand, if most of the notes

require a downward stem (most of the noteheads lie above the sta centre), then the

stems of all the notes must be written down and the beams placed below the noteheads.

The direction of all beams should approximately parallel to the main body of the notes

they connect.

Figure 4.2: Example.

Rests The rests were invented to indicate the exact duration of silence in the music. Therefore,

each note value has its corresponding rest sign. The written position of a rest between

two barlines is determined by its location in the meter, that is, a rest always occupies

the same position in a measure that an equivalent note value would ll, except the whole

rest, which is always centred.

The whole rest is solid, oblong symbol and placed just beneath the fourth line of the

sta (regardless of the clef used). It is incorrect to place it elsewhere unless two voices

or instruments share common sta. When this happens, the whole rest is usually writ-

ten beneath the top line for the upper part, and beneath the bottom line for the lower part.

The half rest is an oblong mark of the same length as the whole rest, located on the third

sta line. Like the whole rest it is incorrect to place this symbol on any other line unless

two voices or instrumental parts share the sta. In this case, the half rest goes above the

top line for the upper part, and above the bottom line for the lower part.

The quarter rest is the most dicult symbol to write. The sign is centred on the sta,

with the top tail ending in the fourth space and the bottom hook resting in the rst space.

When two parts share a common sta, this rest goes somewhat above the sta centre for

the upper part, and just below it for the lower part.


Table 4.4: Rests.

Rest Note

Whole rest

Half rest

Quarter rest

Eighth rest

Sixteenth rest

thirty-second rest

Sixty-fourth rest

The eighth rest has half of the value of the quarter rest. This symbol is written as a

slanting stem with a single hook. The stems rest occupies the second sta line and the

hook occupies the third space.

The sixteenth rest has half of the value of the eighth rest. This symbol has another hook

to the slanting stem. This second hook occupies the second sta space and the stem must

be extended to touch the bottom line.

The thirty-second rest has half of the value of the sixteenth rest. It consists of three

hooks and a further extended stem. The three hooks occupy the second, third, and fourth

spaces, with the stem resting on the bottom line.

The sixty-fourth rest has half of the values of the thirty-second rest. It is a elaborate

structure with its four hooks. Each hook must occupy one of the four sta spaces, and

the slating stem must accordingly be extended until the distance of one space below the


sta.

Ties and slurs Ties are a notational device used to prolong the time value of a written note

into the following beat. This symbol is a curved line that connects two successive note-

heads indicating, together, the total time value desired. The tie appears to be identical

to slur, however, while tie almost touches the notehead centre, the slur is set somewhat

above or below the notehead. Besides that, ties are normally employed to join the time

value of two notes of identical pitch, while slurs have a totally dierent function. The tie

always loops in the direction opposite that of the stem. In other words, when the tie notes

have their stems going up, the tie mark loops below the noteheads see Figure 4.3(b),

when the two stems go down, the tie is curved above the noteheads see Figure 4.3(a).

Slurs aect note-groups as entities indicating that the two notes are to be played in one

physical stroke, without break between them. When the slur is used in conjunction with

tied notes, it may be placed over or under the noteheads tied together, but the tie always

loop according to the note position on the sta.

Table 4.5: Ties and Slurs.

Tie

Slur

(a) (b)

Figure 4.3: Example: a) Tie above noteheads; b) Tie under noteheads.

Accidentals The signs that are placed before the note to designate changes in sounding pitch

are called accidentals. These signs are ve in number: the natural, the at, the sharp,

the double at and the double sharp. However, in this work, only the rst three symbols

will be treated.

The at [ is literally the rounded small letter b found in Guido's soft hexachord on

f. In modern practice, the shape of the at is somewhat more slanted upward than the


ordinary small letter b. when the at is placed on a line, the top of the stem extends to

the second line above. When the symbol is notated in a space, the stem extends to the

center of the second space above.

The form of the sign natural \ has its origin in the square b found in Guido's hard

hexachord. This sign represented the raised form of the pitch b. By adding a short tail

to the square b, and slightly slanting the horizontal strokes, we get the natural sing. The

two tails of this symbol are to be extended quite as far on the sta as the stem of the

at. When the sign is on a line, the upper tail of the natural extends only to the next

higher space and the lower tail to the next lower space. If the sign is in a space, the tails

go respectively to the adjacent lines above and below.

The sharp ] was written in a variety of ways, beginning with what looks like and continuing

with modied forms of the St. Andrew's cross. In our days, it resembles more closely an

elaborated natural sign with an extension of both vertical lines and the horizontal dashes.

Accents Accents marks are symbols for special or exaggerated stress upon any beat, or portion

of a beat. These indications are sometimes called ancillary signs because they are not

such basic symbols as the sta, clefs and notes. The accents are divided into categories:

those for percussive attack (higher dynamic levels) and those for pressure attack (lower

levels). The two principal signs for percussive accent are 4.6 Accent and 4.6 Marcato.

The rst symbol is made identically whether it is placed above or below the notehead.

The second sign is inverted when it is placed below any notehead, stem or beam. Choice

between the two is governed by the degree of force and intensity desired. While the sign 4.6

Marcato is for a stronger attack, the sign 4.6 Accent is for a moderately sharp attack.

The principal symbol for pressure accent is a short, heavy dash placed over or below the

notehead - see Figure 4.6 Tenuto. In our work only the Accent and Staccatissimo accents

was used to facilitate the classication process.

Table 4.6: Accents.

Percussive accents

Accent Marcato

Pressure accents

Staccatissimo Tenuto


Time signatures Meter is a recurring pattern of stress, and an established arrangement of

strong and weak pulsations. These pulsations are also known as beats. Meter is inextri-

cably related to time signatures. This symbol is the vertical arrangement of two gures

placed on the sta following the clef sign and any key signature. When placed on the

sta, in the normal position, the numerator, which indicates the number of beats or the

number of inner pulsations, occupies the two top spaces and the denominator, which

indicates the unit of time, occupies the two bottom spaces. Despite of these terms, a

time signature is not to be regarded as a fraction. There are several possible symbols for

the representation of time signatures. Some are fairly common in practice, some are very

rare. In this work we chose the common time and the cut time.

Table 4.7: Time signatures.

Common time

Cut time

4.2 Segmentation Process

The segmentation process used in this work is based on already existent algorithms [Fuj04,

BBN01]. This step is a hard one because of several reasons. One of the main problems deals

with the variability of the symbols. This is obvious when we consider handwritten music scores

see Figure 4.4(a). However this variability is also found in printed scores, when we have scores

from dierent editors see Figure 4.4(b). This variability will cause troubles of inconsistency

in the size and shape of each object.

Besides that, the number of possible arrangements of symbols primitives is very high. These will

bring complexity and ambiguity in the segmentation process, because we do not know all the

probable groups of symbols. Another problem is related with the sta line removal approach.

Sta lines connect most of the objects and when we eliminate them we may cause breaks in

the music symbols. These will bring undesirable imperfections that will dicult the segmenta-

tion process. Finally, a symbolic high density in the music score will also not help in this phase.

With this brief description of the several problems that we can nd in this step we may conclude

that it is clear that not all problems can be solved at this level, and the imperfections of the

segmentation have to be taken into account during the next processing steps.

The segmentation process architecture is presented in Figure 4.5 and an example of the process

on a real music score is presented in Figure 4.6. This process consists in localizing and isolating

4.2. Segmentation Process 57

(a) Variability in the real scores.

(b) Variability between dierent publishings in printed scores.

Figure 4.4: An exemplicative example of the variability in the music symbols.

the symbols in order to identify them. In this work, the symbols we want to recognize can

be split into four dierent types: the symbols that are featured by a vertical segment with

height greater than a threshold (notes, notes with ags and open notes), the symbols that link

the notes (beams), the remaining symbols connected to sta lines (clefs, rests, accidentals and

time signature), and the symbols above and under sta lines (notes, ties, slurs and accents).

As a result, the segmentation method proposed was based on a hierarchical decomposition

of the music image. This music sheet was therefore analyzed and split by stas, after the

sta lines removal, as being the rst step. Subsequently, the connected components1

1If you note by the Figure 4.6 there are symbols that are not linked together but connected. This is duebecause in our connected component method we use a threshold of 5 pixels for the distance between objects.This is necessary because some music symbols became separate in the sta line removal phase.


Figure 4.5: Segmentation process I.

Figure 4.6: Segmentation process II.

were identied. To extract the symbols with appropriate size, a selection of the connected

components detected in the previous step was done. The thresholds used for the height and

width of the symbols were experimentally chosen as staspaceheight and 16×staspaceheightfor the height and staspaceheight for the width. These values took into account the features

of the music symbols. In each bounding box of the connected components repeated objects

(e.g. when we have a group of notes and a sharp; in this case we have two distinct connected

components: notes and sharp; however when we extract the bounding box of the notes the

sharp is also included see Figure 4.7) can exist. Thus, it was necessary to remove them to

avoid multiple extractions of the same object. In the end, we are ready to nd and extract all

the music symbols. Next we will present the methods used to extract the symbols.

Beam Detection

It is reasonable to think that beams are one of the symbols with harder detection process.

The vicissitude inherent in their action in the musical score is the principle reason for this

diculty. Their shape and size, and the way they connect to each other and to other symbols

are combined in a huge number of dierent ways. They are also prone to present inconsistency

in the thickness and in the link with the stems see Figure 4.8. Thus, we propose a solution

that just checks the presence of a segment of adequate height which connects the extremities

of notes see Figure 4.9. Firstly, in the connected component detected, the stems are found


(a) Original image. (b) Image with two con-nect components: dier-ents shades of gray.

Figure 4.7: An exemplicative example of two dierent connected components in the samebounding box.

and removed to separate the beams from the notes. When we nd all the remaining objects

we select those who have a width bigger than 2×staspaceheight-staineheight and a height

lower than 4×staspaceheight. These values were selected experimentally and in accordance

with the geometric features of the object in cause. The height value needs to take into account

the inclination of the beams that may have. In the end, a check of the presence of a stem is

made to validate the objects selected. It is important to state that the beams are grouped if

the distance between them is lower than a half staspaceheight.

Figure 4.8: An example of the existence of inconsistency in the beam thickness and in the linkwith the stems.

Figure 4.9: Beam segmentation process.

Notes, Notes with Flags and Notes Open Detection

In this work, we decided to consider stems and note heads as one. In this way, we dened the

geometric features of the notes we want to extract as the objects with a height bigger than a

threshold, experimentally selected as twice staspaceheight, and a width limited by two values,


also experimentally chosen as half staspaceheight and 3×staspaceheight. To simplify this task

the beams detected were removed before applying this algorithm see Figure 4.10.

Figure 4.10: Notes segmentation process.

Accidentals, Rests, Accents and Time Signature Detection

Generally, these symbols have similar values for width and height. The procedure used to ex-

tract them was based on the combination of X-Y projection proles technique2 see Figures 4.11

and 4.12. The heuristics used were experimentally chosen and these values are arguably sensi-

ble. On the one hand, we have symbols that have vertical sequence of black pixels, for instance

sharps, naturals and rests. On the other hand, we need to take into account the symbols

topological position, because in this case we are trying to detect accents and time signature.

Figure 4.11: An example of sharp detection on a real score.

2Horizontal (Y) projection has as result a vector which the component i is the sum of all black pixels fromrow i in the image; Vertical (X) projection has as result a vector which the component j is the sum of all blackpixels from column j in the image.


Figure 4.12: An example of sharp detection on a ideal score.

Clefs, Ties and Slurs Detection

These symbols have their own attributes, like a large width for the ties and slurs and a bigger

height for the clefs. In both of them we have no presence of stems. With these in mind,

the projection proles procedure was used with specic heuristics. For clefs a staspaceheight

< width < 4× staspaceheight and 2× staspaceheight < height < 2× numberstaspace ×staspaceheight + numberstaine × staineheight were used in the experiments. These values

take into account the fact that clef symbols are the largest of all the signs, beginning below

the sta and extending above it, as we said in the description 4.1. On the other hand, for the

relations symbols (ties and slurs) the rules for extracting them were based in a large width.

4.2.1 Results

With the purpose of evaluating the performance of the segmentation process and to study

its behaviour with respect to some deformations curvature and rotation applied to the

images, two dierent error metrics were considered: the percentage of false positive symbols

and the percentage of symbols missed to detect. A false positive (failed symbol) happens

when the algorithm falsely identies a musical symbol which is not one; and a false negative

(missed symbol) is when the algorithm fails to detect a music symbol present in the score.

These percentages are computed using the symbols position reference and the symbols position

obtained by the segmentation algorithm. The references for the symbols position were partially

done manually, that is, the music symbols that are in the ideal scores were manually splited

in order to separate the composed existing symbols and the junction of the beams signs

see Figure 4.13(b). In the resulting images the deformations described in Subsection 3.8.1 are

applied. In the end, the music symbols positions are extracted using a Matlab© function that

gives their bounding box. It is important to stress that this experiment was only made with

ideal scores where deformations were applied, because it was our intention to see the behaviour


of the method with respect to some deformations and besides that, this was the only quickest

method of obtaining a database of reasonable number with the symbols position references in

a semi-automatic way3.

(a) Music score before split the composed symbols. (b) Music score after split the composed symbols.

Figure 4.13: An example of how the music symbols that are in the ideal scores were manuallysplited.

An overview from the results from the Figure 4.14 we can conclude that the segmentation

process has a good performance, since the percentage of correct symbols are largely over 80%.

The high percentage of failed symbols that sometimes we nd are related with extra symbols or

noise; that is, in the score that we are segmenting we have more symbols than in the reference

score. For example numbers or symbols that are not our objective to classify. Besides this result,

the performance of segmentation process is almost independent of intensity of the deformation,

for the range of values considered see Figure 4.16.

Figure 4.14: Results of the error metrics.

3We have only split the composed symbols in 18 ideal scores with standard music notation; the deformationswas obtained from these. In total we have 306 images.

4.3. Classication Process 63

Figure 4.15: Results of the error metrics for the rotation deformation.

Figure 4.16: Results of the error metrics for the curvature deformation.

4.3 Classication Process

In this Section several results obtained with the dierent classiers adopted are presented.

These classiers were already presented in Section 2.3. For the neural network, k-nearest

neighbour and support vector machines methods, each image of a symbol was initially resized

to 20 × 20 pixels and then converted to a vector of 400 binary values. This process is sig-


nicantly dierent for the case of the hidden markov model. In the latter, the images were

normalised in a sta height and width of 150 and 30 pixels, respectively. The segmentation-free

approach was used to extract features from images [Pug06]. For this reason, a 2-pixel slidingwindow mechanism was used to produce a sequence of observations. In doing so, dependent

observations are replaced by observations depending on the horizontal position of the window.

We extract the same features that in [Pug06]: n distinct connected components of black pixels,

the largest black element (a(ni)/S, where a(ni) is the area of the largest black element and

S = height× width), the area of the smallest white pixel (a(nj)/S, where a(nj) is the area of

the smallest white element and S = height× width) and the gravity centres.

To assign classes to the images of music symbols the process was as follows: for the case of

neural network, for each of the images converted in to vectors they were assigned to another

one of length 14 this value correspond to the 14 classes of symbols with value 1 in the

correct class and value 0 otherwise. For the other three classiers the images were assigned to

a vector numbering, between 1 and 14, depending on the class to which they belong.

In the training phase, a generalization control capability on the classier was made. Putting

dierently, the available data was divided into training, validation and test set, towards the as-

sessment of the network topology with better performance. By that, through a cross validation

we selected the number of neurons in the hidden layer. In other words, the cross validation

consists in optimizing the model by changing the number of neurons in the hidden layer of

the network using the training data; for a further evaluation of performance the validation set

was tuned. The network model with better result was used to compute the expected error in

the music symbols classication. Thus, the training data was added to the validation data to

conduct a nal training for this network and therefore obtaining the nal model. The expected

error was obtained computing the dierence between the target values and the values obtained

by the network, using the test set. This process is slightly modied when we apply elastic

matching to our database. The elastic matching method is adopted in the training data and

training data with validation set. We increased the dataset by causing variability in our images

in order to simulate distortions that can happen in real scores. Our objective is to prepare the

classier for several possible situations that may occur in handwritten music symbols.

The neural network structure used to classify a real score without elastic matching is illustrated

in Figure 4.17. The input layer is composed by 400 units which output values are the vector

values that represent the image of the music symbol. The output layer has 14 units that cor-

respond to the 14 classes of the symbols. When a pattern belonging to the class i is presented,

the target value for the unit i in the output layer is the maximum value. The range for the

number of hidden neurons chosen in this work was [1, 2, . . . , 9]. The number of neurons in the

hidden layer, for this case, was 9 neurons see Table 4.8. This value was estimated during the

training of the neural network. To each neuron is associated a log-sigmoid activation function.

Model discriminant HMM was adopted to construct a model for each class [AYV01]. The

training of the HMM to learn the parameters of the models (λ = (A;B;π)) was done by


Figure 4.17: Neural network topology.

Handwritten symbols printed symbolsWith EM 9 8

Without EM 9 9

Table 4.8: Number of neurons in the hidden layer (not)using elastic matching method (EM) inthe dataset.

the Baum-Welch algorithm4. The goal of classication is to decide which class the unknown

sequence belongs to, based on the model obtained in the training phase. These symbols are

classied on the basis of the maximum likelihood ratio which is obtained by Viterbi algorithm5.

In Table 4.9 the number of states obtained by cross validation are represented. The range was

4See Appendix A.7 for more details.5See Appendix A.8 for more details.


from 3 to 8.

Handwritten symbols Printed symbols

With EM 8 7Without EM 8 8

Table 4.9: The number of states in the hidden Markov model with and without the elasticmatching method (EM) in the dataset.

As in the previous methods, the value of k, in the k-nearest neigbor classier, and the values

of C and γ, in the support vector machines, were also obtained by cross validation and the

expected error was computed with the test set. The k value needs to be suciently big to

minimize the probability of error occurrence in the classication and suciently tiny for the

closest k points becoming closer to the point to classify. In doing so, for a k = 1, 2, . . . , 10, weobtained a k = 1 for a handwritten and printed dataset with and without the application of the

elastic matching. The parameters for the support vector machines are presented in Table 4.10.

The range was 2, 21.25, 21.5,. . ., 24.75, 25 for C value estimation, and 2−7, 2−6.75, 2−6.5,. . ., 22.75,

23 for γ value estimation.

Handwritten symbols Printed symbols

C γ C γ

With EM C = 23.5 = 11.3137 γ = 2−7 = 0.0078 C = 21.25 = 2.3784 γ = 2−5.75 = 0.0186

Without EM C = 24.5 = 22.6274 γ = 2−6 = 0.0156 C = 21.25 = 2.3784 γ = 2−5.75 = 0.0186

Table 4.10: The values of the parameters C and γ in the support vector machines with andwithout the elastic matching method (EM) in the dataset.

4.3.1 Results

Towards musical handwritten symbol classication, several sets of symbols were extracted from

dierent musical scores to train the classiers. In Table 4.11 the number of handwritten printed

music symbols per class used in the training phase of the classication structures is represented.

The symbols were grouped according to their shape. In doing so, the rests signs were divided

in two groups RestI and RestII; see Table 4.11. Besides that we include the unknown class

to classify those symbols that do not t into any of the other classes. In total we have 3222

handwritten music symbols and 2521 printed music symbols.


Handwritten

Symbol Class Total number

Printed

Symbol Class Total number

Music MusicSymbols Symbols

Accent 189 AltoClef 201

BassClef 26 Beam 291

Beam 438 Flat 155

Flat 230 Natural 127

Natural 317 Note 304

Note 466 NoteFlag 120

NoteFlag 122 NoteOpen 309

NoteOpen 208 TieSlur 67

RestI 135 RestI 63

RestII 401 RestII 321

Sharp 345 Sharp 13

Staccatissimo 21 Time 122

TrebleClef 99 TrebleClef 305

Unknown 404 Unknown 404

Table 4.11: Classes list of handwritten and printed music symbols.


Results obtained Without Elastic Matching‖

Neural network Nearest neighbor Support vector Hidden Markov

machines model

Accent 89,36% 100% 89,36% 95,83%

BassClef 0% 100% 100% 55,56%

Beam 86,24% 98,17% 96,33% 91,89%

Flat 78,95% 100% 100% 81,36%

Natural 83,54% 100% 100% 97,50%

Note 84,48% 97,41% 95,69% 69,49%

NoteFlag 70% 83,33% 93,33% 62,50%

NoteOpen 0% 66,67% 50% 22,22%

RestI 78,79% 100% 96,97% 100%

RestII 99% 100% 100% 93,07%

Sharp 79.07% 96,51% 98,84% 85,23%

Staccatissimo 60% 100% 100% 100%

TrebleClef 58.33% 100% 83,33% 96,30%

Unknown 58.42% 76,24% 93,01% 40,59%

99% CI for the Expected

performed in percentage: [78.85 (1.21); 82.88 (4.47)] [93.41 (0.34); 94.53 (1.24)] [94.59 (0.07); 96.90 (0.27)] [76.58 (0.57); 81.04 (2.09)]

average (standard deviation)

Table 4.12: Results obtained with the test set in the classication process for the handwrittenmusic symbols.


machines model

AltoClef 94% 100% 100% 78%

Beam 93,15% 100% 100% 97,26%

Flat 100% 100% 97,37% 92,68%

Natural 100% 100% 100% 100%

Note 88,16% 98,68% 100% 93,42%

NoteFlag 96% 93,33% 90% 63,33%

NoteOpen 90,67% 100% 100% 88,31%

TieSlur 63.33% 93,33% 86,67% 83,33%

RestI 100% 100% 100% 100%

RestII 60% 100% 100% 86,67%

Sharp 98.75% 100% 100% 100%

Time 0% 100% 100% 33,33%

TrebleClef 93.33% 100% 100% 36,67%

Unknown 67,11% 82,89% 97,37% 73,68%




Table 4.13: Results obtained with the test set in the classication process for the printed musicsymbols.

From the results obtained for the handwritten music symbols we can conclude that the classier

with better performance was the support vector machines with a 99% condence interval for

the expected performance [94.59%; 96.90%]. Besides that, Hidden Markov Model performed

better than neural network since the latter had an higher standard deviation than the other

and it also had two classes (BassClef and NoteOpen) of symbols with 0%.

‖See the tables of confusion in the Appendix B.


The results obtained for the printed music symbols show us that the support vector machine

is again the better classier compared with the others with a 99% condence interval for the

expected performance [97.30%; 98.29%].

Results obtained With Elastic Matching∗∗

The database during the cross validation process was divided into train, validation and test

sets. The deformations caused by the deformation function 2.10 of the elastic matching with

M = 1, 2, 3 and N = 1, 2, 3 were applied in the training data and training data with validation

set.


machines model

Accent 91,50% 100% 100% 91,66%

BassClef 0% 100% 100% 66,67%

Beam 86,24% 95,41% 100% 96,40%

Flat 71,93% 100% 100% 71,19%

Natural 89,87% 100% 100% 92,50%

Note 74,14% 96,55% 100% 77,97%

NoteFlag 40% 86,67% 100% 28,13%

NoteOpen 0% 66,67% 100% 22,22%

RestI 45,45% 100% 100% 100%

RestII 93% 100% 100% 85,15%

Sharp 88,37% 98,84% 100% 81,82%

Staccatissimo 20% 100% 100% 100%

TrebleClef 58.33% 100% 100% 77,78%

Unknown 44,55% 63,37% 100% 30,69%




Table 4.14: Results obtained with the test set in the classication process for the handwrittenmusic symbols.

The results lead us to conclude that the application of the elastic matching to the music sym-

bols does not outperform the performance of the classiers without elastic matching. However,

in handwritten music symbols with higher similarities in their shapes, the process had a better

behaviour than the classiers without elastic matching see Table 4.16. This feature lead us

to suggest a study, in a future work, where the application of the deformation function used

only in these types of symbols to determine if the expected performed is augmented.

The same does not happen with printed music symbols. Our explanation resides on the fact

that these signs do not have high distortions in their shapes. Thus the elastic matching method,

in presence of such music symbols, makes the performance of the classiers slightly lower than

with handwritten music symbols.

The best classier for the test set of real scores was the support vector machines with a 99%condence interval for the expected performance [87.79%; 96.00%]. However, the standard

deviation was very high [1.95%; 8.82%]. Consequently, I sugest the nearest neighbor classier as

∗∗See the tables of confusion in the Appendix B.



machines model

AltoClef 86% 98% 90% 78%

Beam 97,26% 100% 90,41% 100%

Flat 94,74% 100% 94,74% 97,37%

Natural 87.10% 100% 96,77% 100%

Note 84.21% 100% 97,37% 93,33%

NoteFlag 40% 90% 93,55% 50%

NoteOpen 92.21% 98,70% 90,79% 80,52%

TieSlur 13.33% 90% 86,67% 70%

RestI 100% 100% 93,75% 100%

RestII 0% 100% 93,33% 86,67%

Sharp 97.5% 98,77% 97,5% 100%

Time 0% 100% 100% 33,33%

TrebleClef 93,33% 100% 100% 100%

Unknown 56,57% 78,95% 100% 43,42%


performed in percentage: [77.66 (1.34); 82.11 (4.93)] [95.34 (0.22); 96.06 (0.80)] [93.13 (0.26); 94.23 (1.18)] [73.00 (1.00) 88.71 (3.67)]


Table 4.15: Results obtained with the test set in the classication process for the printed musicsymbols.

Neural network Nearest neighbor Support vector machines

With EM Without EM With EM Without EM With EM Without EM

Natural 89.87% 83.54% 100% 100% 100% 100%

Sharp 88.37% 79.07% 98.84% 96.51% 100% 98.84%

Table 4.16: The performed of the natural and sharp symbols.

the best classier with a 99% condence interval for the expected performance [92.24%; 93.29%],

because with a lower standard deviation ([0.32%; 1.17%]) the results will not largely span in

the classier performance. The best classier for the synthetic scores was the nearest neighbor

classier with a 99% condence interval for the expected performance [95.34%; 96.06%].

Part IV

Conclusions and Future Work

71

Chapter5

Conclusion

This dissertation had the aim to overcome the predicament of musical symbol recognition in

handwritten scores through research and application of recent techniques of machine learning

and articial intelligence to this problem. In fact, despite advances made in the area of research

in optical musical recognition algorithms, as we have seen in the brief description of state of

the art, several open problems still exist with handwritten music sheets. On the one hand, the

scores tend to be rather irregular and conditioned by the authors' own writing style, and the

quality of the paper in which it is written might have degraded throughout the years, making

it a lot harder to correctly identify its contents. On the other hand, they are likely to have

changes of size, shape and intensity of handwritten symbols by the same author into the same

score and the sta lines may be tilted one way or another on the same page. They also may

be curved and may have discontinuities. As a result, the detection and recognition process in

the handwritten music scores becomes more complicated than in printed music sheets.

The rst challenge faced by an OMR system was at sta lines detection and removal. These

operations were one of the most important phases of musical symbol recognition, because they

determine the results for subsequent proceedings. It was seen that the reasons for detecting and

removing the sta lines lie on the need to isolate the musical symbols for a more ecient and

correct detection of each symbol present on the score. The work of this dissertation resulted

in a robust algorithm for the automatic detection of sta lines in music scores based in stable

paths. The proposed method used a very simple but fundamental principle to assist detection

and avoids the diculties typically asserted by symbols superimposed on sta lines. The main

idea was to consider the sta lines as the result of the shortest path between the two margins

of the music sheet, giving preference to black pixels. This approach for the sta line detection

algorithm was adapted to a wide range of image conditions, thanks to its intrinsic robustness to

skewed images, discontinuities, and curved sta lines. The proposed method was also robust to

discontinuities in sta lines (due to low-quality digitalization or low-quality originals) or sta

lines as thin as one pixel. In order to take full advantage of the method, existing sta line

removal algorithms were enhanced by using the stable paths method as its rst processing step.

Several tests that enabled improvements in this proposed approach that induced better results

were also performed. The encouraging results obtained led us to consider that investigation in

the detection of music symbols would be beneted from the improved sta line detection and

removal. In doing so, the next step was the application of already existent algorithms for the

73

74 Chapter 5. Conclusion

segmentation process using in the previous step the stable paths approach for the detection of

the sta lines.

Since in this work we dealt with handwritten musical scores the segmentation and classication

steps were hard. On one hand, we had a huge variability in the music symbols that caused

inconsistency problems in the size and shape of each object. On the other hand, the sta

lines removal may cause breaks in the music symbols that can insert imperfections into them.

Besides that, we also had complexity and ambiguity problems related with the number of pos-

sible arrangements of music primitives. The segmentation method was based on a hierarchical

decomposition of the music image and the symbols that we wanted to recognize were split into

four dierent types: the symbols that are featured by a vertical segment, the symbols that link

the notes, the remaining symbols connected to sta lines and the symbols above and under sta

lines. In the music symbol classication a profound comparative study with some classiers,

as the hidden Markov model and support vector machine, was presented. A new methodology

based on elastic matching was used in our dataset with conjunction with others classiers. Our

aim was to simulate controlled distortions, in the database, that can happen in real scores to

prepare the classier for several undesirable situations that may occur in handwritten music

sheets. With the study done, we concluded that the application of the elastic matching to

the music symbols did not outperform the performance of the classiers without this method.

However, we also saw that the results for classiers with elastic matching presented a better

behaviour for the handwritten music symbols with higher similarities in their shapes. The

best classier for handwritten music symbols was support vector machines with a performance

higher than 94.59% for the test set without elastic matching and 93.10% for the test set with

elastic matching. The best classier for printed music symbols without elastic matching was

support vector machines with a performance higher than 97.30% and nearest neighbor classier

for the test set with elastic macthing with a performance higher than 95.34%.

5.1 Future Work

The various approaches in OMR to musical symbols segmentation and classication are still

below the expectations for handwritten musical scores. It is intention of this project, in a

future work, that the proposed methodology incorporates in a natural way the prior knowledge

of the musical rules in the recognition of symbols, in order to overcome the limitations of

the current approaches that are incapable of dealing, in a robust form, with specicities of

the handwritten music. The new proposed methodology should also be naturally adaptable

to manuscript images and to dierent music notations. Specically, we intend to explore our

initial work in several directions:

Continue to investigate new methods of optical music recognition for handwritten musical

scores and for dierent music notations. This line of research involves a study and a

profound understanding of the latest techniques of pattern recognition, machine learning

and inductive logic programming. The merger of rules and techniques from dierent areas

should help overcome the problems of the existing algorithms.

5.1. Future Work 75

Integration of the algorithms developed in an OMR system with remote access via the

internet to a wide corpus of unpublished handwritten music encoded in a adequate format;

The creation of a system like this will not only centralize as much information as possible

but will also serve to preserve this corpus in a way that is easily accessible for browsing,

analysing and downloading, while keeping the scores in their original format along with

their digital counterpart. The availability of a system with these features will contribute

for the preservation of the musical heritage.

Specically, one of the objectives for a future work, is the investigation of new methods for

the segmentation phase. It is our intention to incorporate Hidden Markov Models in this

step. It will be necessary to perform an intense study and investigation in this eld. Besides

that, a simple detection of the music symbols is not sucient. It is also necessary to perform

their interpretation and to understand the connections between them to have a recognition

of the analyzed score. Therefore, the denition of these relations are needed before they are

implemented in a system. In this manner, the integration of syntactic music rules is required

in the recognition process to acquire semantic information. Thus, this project has as another

goal the research and application of inductive logic programming techniques.

References

[AS07] Shai Avidan and Ariel Shamir. Seam carving for content-aware image resizing.

ACM Trans. Graph., 26(3):10, 2007.

[AYV01] N. Arica and F.T. Yarman-Vural. An overview of character recognition focused

on o-line handwriting. Systems, Man, and Cybernetics, Part C: Applications and

Reviews, IEEE Transactions on, 31(2):216233, May 2001.

[Bai97] D. Bainbridge. Extensible Optical Music Recognition. PhD thesis, Department of

Computer Science, University of Canterbury, Christchurch, NZ, 1997.

[BBN01] P. Bellini, I. Bruno, and P. Nesi. Optical music sheet segmentation. Web Delivering

of Music, 2001. Proceedings. First International Conference on, pages 183190,

Nov. 2001.

[BS00] Marija Bojovic and Milan D. Savic. Training of hidden markov models for cursive

handwritten word recognition. In ICPR '00: Proceedings of the International

Conference on Pattern Recognition, page 1973, Washington, DC, USA, 2000. IEEE

Computer Society.

[Cap08] Artur Capela. Reconhecimento de símbolos musicais manuscritos na framework

gamera. Master's thesis, Faculdade de Engenharia da Universidade do Porto, 2008.

[Car89] N. P. Carter. Automatic Recognition of Printed Music in the Context of Electronic

Publishing. PhD thesis, Departments of Physics and Music, 1989.

[CC93] B. Coüasnon and J. Camillerapp. Using grammars to segment and recognize music

scores. In Proc. of DAS-94: International Association for Pattern Recognition

Workshop on Document Analysis Systems, pages 1527, Kaiserslautern, 1993.

[CCR+08] Jaime S. Cardoso, Artur Capela, Ana Rebelo, Carlos Guedes, and Joaquim Pinto

da Costa. Sta detection with stable paths, 2008. IEEE Transaction on Pattern

Analysis and Machine Intelligence (TPAMI), Submited.

[CCRG08a] Artur Capela, Jaime S. Cardoso, Ana Rebelo, and Carlos Guedes. Integrated

recognition system for music scores, 2008. International Computer Music Confer-

ence (ICMC), Accepted.

[CCRG08b] Jaime S. Cardoso, Artur Capela, Ana Rebelo, and Carlos Guedes. A connected

path approach for sta detection on a music score, 2008. IEEE International

Conference on Image Processing (ICIP), Accepted.

76

References 77

[Coü96] B. Coüasnon. Segmentation et reconnaissance de documents guidées par la con-

naissance a priori: application aux partitions musicales. PhD thesis, Université de

Rennes, 1996.

[CRCG08] Artur Capela, Ana Rebelo, Jaime S. Cardoso, and Carlos Guedes. Sta line de-

tection and removal with stable paths. In Proceedings of the International Con-

ference on Signal Processing and Multimedia Applications (SIGMAP 2008), pages

263270, 2008.

[DDCF07] Christoph Dalitz, Michael Droettboom, Bastian Czerwinski, and Ichiro Fuji-

gana. Sta removal toolkit for gamera, 2005-2007. http://music-staves.

sourceforge.net.

[DDCF08] Christoph Dalitz, Michael Droettboom, Bastian Czerwinski, and Ichiro Fujigana.

A comparative study of sta removal algorithms. IEEE Transactions on Pattern

Analysis and Machine Intelligence, 30:753766, 2008.

[Die05] Reinhard Diestel. Graph Theory. Graduate Texts in Mathematics. Springer-Verlag,

third edition, 2005.

[eeFc02] Vojt ech ech Franc and Václav Hlavá c. Multi-class support vector machine. Tech-

nical report, 2002.

[FLS05] Alicia Fornés, Josep Lladós, and Gemma Sánchez. Primitive segmentation in old

handwritten music scores. In Wenyin Liu and Josep Lladós, editors, GREC, volume

3926 of Lecture Notes in Computer Science, pages 279290. Springer, 2005.

[Fuj04] Ichiro Fujinaga. Sta detection and removal. In Susan George, editor, Visual

Perception of Music Notation: On-Line and O-Line Recognition, pages 139.

Idea Group Inc., 2004.

[Fuk90] Keinosuke Fukunaga. Introduction to statistical pattern recognition (2nd ed.). Aca-

demic Press Professional, Inc., San Diego, CA, USA, 1990.

[Gal86] Zvi Galil. Ecient algorithms for nding maximum matching in graphs. ACM

Comput. Surv., 18(1):2338, 1986.

[GWE04] Rafael C. Gonzalez, Richard E. Woods, and Steven L. Eddins. Digital Im-

age processing using MATLAB, pages 405407. Upper Saddle River, NJ :

Pearson/Prentice-Hall, 2004.

[Has70] W. K. Hastings. Monte carlo sampling methods using markov chains and their

applications. Biometrika, 57(1):97109, April 1970.

[Hay98] Simon Haykin. Neural Networks: A Comprehensive Foundation (2nd Edition).

Prentice Hall, July 1998.

[HL02] Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass support

vector machines. IEEE Transactions on Neural Networks, 13(2):415425, 2002.

http://music-staves.sourceforge.net

http://music-staves.sourceforge.net

78 References

[JZ97] Anil K. Jain and Douglas Zongker. Representation and recognition of handwritten

digits using deformable templates. IEEE Transactions on Pattern Analysis and

Machine Intelligence, 19(12):13861391, 1997.

[JZL96] Anil K. Jain, Yu Zhong, and Sridhar Lakshmanan. Object matching using de-

formable templates. IEEE Transactions on Pattern Analysis and Machine Intelli-

gence, 18(3):267278, 1996.

[KHB+00] Tapas Kanungo, Robert M. Haralick, Henry S. Baird, Werner Stuezle, and David

Madigan. A statistical, nonparametric methodology for document degradation

model validation. IEEE Transactions on Pattern Analysis and Machine Intelli-

gence, 22(11):12091223, 2000.

[KI90] H. Kato and S. Inokuchi. A recognition system for printed piano music using

musical knowledge and constraints. In Proceedings of the International Association

for Pattern Recognition Workshop on Syntactic and Structural Pattern Recognition,

pages 231248, 1990.

[KPM96] Gary E. Kopec, Phil A. Chouxerox Parc, and David A. Maltzcarnegie. Markov

source model for printed music decoding. Journal of Electronic Imaging, pages

714, 1996.

[LCL93] I. Leplumey, J. Camillerapp, and G. Lorette. A robust detector for music staves.

In Proceedings of the International Conference on Document Analysis and Recog-

nition, pages 902905, 1993.

[LS88] L. Lam and Ching Y. Suen. Structural classication and relaxation matching of

totally unconstrained handwritten zip-code numbers. Pattern Recogn., 21(1):19

32, 1988.

[MA98] E. Mayoraz and E. Alpaydim. Support vector machines for multiclass classication.

Technical report, 1998.

[Mah82] J. V. Mahoney. Automatic analysis of music score images. B.Sc thesis, 1982.

[Mat85] T. Matsushima. Automated high speed recognition of printed music (wabot2

vision system). Advanced Robotics 1985. ICAR 1985. International Conference

on, pages 477 482, 1985.

[MMM04] Youichi Mitobe, Hidetoshi Miyao, and Minoru Maruyama. A fast hmm algo-

rithm based on stroke lengths for on-line recognition of handwritten music scores.

In IWFHR '04: Proceedings of the Ninth International Workshop on Frontiers

in Handwriting Recognition, pages 521526, Washington, DC, USA, 2004. IEEE

Computer Society.

[MN96] H. Miyao and Y. Nakano. Note symbol extraction for printed piano scores using

neural networks. IEICE TRANSACTIONS on Information and Systems, E79-

D:548554, 1996.

References 79

[MO07] Hidetoshi Miyao and Masayuki Okamoto. Stave extraction for printed music scores

using DP matching. Journal of Advanced Computational Intelligence and Intelli-

gent Informatics, 8:208215, 2007.

[MP88] Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent

in nervous activity. pages 1527, 1988.

[Ng95] Kia Ng. Automated computer recognition of music score. PhD thesis, University

of Leeds, 1995.

[Nis95] Hirobumi Nishida. A structural model of shape deformation. Pattern Recognition,

28(10):16111620, 1995.

[Pre70] D. Prerau. Computer pattern recognition of standard engraved music notation.

PhD thesis, Massachusetts Institute of Technology, 1970.

[Pru66] D. Pruslin. Automatic recognition of sheet music. PhD thesis, Massachusetts

Institute of Technology, 1966.

[Pug06] Laurent Pugin. Optical music recognition of early typographic prints using hidden

markov models. In ISMIR, pages 5356, 2006.

[RB05] F. Rossant and I. Bloch. Optical music recognition based on a fuzzy modeling of

symbol classes and music writing rules. Image Processing, 2005. ICIP 2005. IEEE

International Conference on, 2:II53841, Sept. 2005.

[RCC+07] Ana Rebelo, Artur Capela, Joaquim F. Pinto da Costa, Carlos Guedes, Eurico Car-

rapatoso, and Jaime S. Cardoso. A shortest path approach for sta line detection.

Automated Production of Cross Media Content for Multi-Channel Distribution,

2007. AXMEDIS '07. Third International Conference on, pages 7985, Nov. 2007.

[RCF+93] R. Randriamahefa, J.P. Cocquerez, C. Fluhr, F. Pepin, and S. Philipp. Printed

music recognition. Document Analysis and Recognition, 1993., Proceedings of the

Second International Conference on, pages 898901, Oct 1993.

[Rea69] Gardner Read. Music Notation: A Manual of Modern Practice (2nd ed.). Ta-

plinger, New York, 1969.

[RHW88] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal represen-

tations by error propagation. pages 673695, 1988.

[Ros02] Florence Rossant. A global method for music symbol recognition in typeset music

sheets. Pattern Recognition Letters, 23(10):11291141, 2002.

[RP96] K.T. Reed and J.R. Parker. Automatic computer recognition of printed music.

Pattern Recognition, 1996., Proceedings of the 13th International Conference on,

3:803807 vol.3, Aug 1996.

80 References

[RT88] J. W. Roach and J. E. Tatem. Using domain knowledge in low-level visual process-

ing to interpret handwritten music: an experiment. Pattern Recognition, 21(1):33

44, 1988.

[SDV06] C.H. Papadimitriou S. Dasgupta and U.V. Vazirani. Algorithms, pages 169179.

McGraw-Hill Higher Education, 2006.

[Szw05] Mariusz Szwoch. A robust detector for distorted music staves. In Computer Anal-

ysis of Images and Patterns, pages 701708. Springer-Verlag, Heidelberg, 2005.

[TK03] Sergios Theodoridis and Konstantinos Koutroumbas. Pattern recognition, chapter

4.6. Academic Press, second edition, 2003.

[TMD99] Michael Thulke, Volker Märgner, and Andreas Dengel. A general approach to

quality evaluation of document segmentation results. In DAS '98: Selected Papers

from the Third IAPR Workshop on Document Analysis Systems, pages 4357,

London, UK, 1999. Springer-Verlag.

[TSM06] Fubito Toyama, Kenji Shoji, and Juichi Miyamichi. Symbol recognition of printed

piano scores with touching symbols. pages 480483, 2006.

[Vap98] Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, September

1998.

[Wak94] T. Wakahara. Shape matching using lat and its application to handwritten numeral

recognition. IEEE Trans. Pattern Anal. Mach. Intell., 16(6):618629, 1994.

[WFJC01] Yuan-Kai Wang, Kuo-Chin Fan, Yau-Tarng Juang, and Tai-Hong Chen. Using

hidden markov model for chinese business card recognition. In ICIP (1), pages

11061109, 2001.

[WH88] Bernard Widrow and Marcian E. Ho. Adaptive switching circuits. pages 123134,

1988.

Part V

Appendix

81

AppendixA

Fundamentals

A.1 Primal Problem vs Dual Problem

The primal problem for the case of nonseparable training classes (xi, di)Ni=1 in the feature

space is given by

minw,b,ξi

12wTw + C

N∑

i=1

ξi

s.a di(wTxi + b) ≥ 1− ξi i = 1, 2, ..., N

(A.1)

where the non negatives slack variables, ξiNi=1, measure the deviation of a data point from

the ideal condition of pattern separability and the parameter C controls the relation between

the complexity of the classier and the number of nonseparable points. It is possible to nd the

solution of the quadratic problem by determining the saddle point of the Lagrangian function

that is given by:

J(w, b, ξ, α, µ) =1

2wTw + C

N∑

i=1

ξi −N∑

i=1

αi

[di(w

T xi + b)− (1− ξi)]−

N∑

i=1

µiξi (A.2)

which has to be minimized with respect to w, b and ξi; it also has to be maximized with

respect to the nonnegative multipliers αi e µi. The parameters that minimize the function

J(w, b, ξ, α, µ) meet the following conditions:

∂J

∂w= w −

N∑

i=1

αidixi = 0 (A.3)

∂J

∂b= −

N∑

i=1

αidi = 0 (A.4)

∂J

∂ξi= C − µi − αi = 0 i = 1, 2, . . . , N (A.5)

From these conditions we have

w =N∑

i=1

αidixi (A.6)

83

84 Appendix A. Fundamentals

N∑

i=1

αidi = 0 (A.7)

µi + αi = C i = 1, 2, . . . , N (A.8)

If we expand the equation A.2, term by term, we have:

J(w, b, ξ, α, µ) =1

2wTw + C

N∑

i=1

ξi −N∑

i=1

αidiwT xi + b

N∑

i=1

αidi +

N∑

i=1

αi −N∑

i=1

αiξi −N∑

i=1

µiξi (A.9)

The fourth term in the right-hand side is zero by the optimality condition A.7 obtained above.

Besides that, from the condition A.6 we have

wTw =N∑

i=1

αidiwTxi =

12

N∑

i=1

N∑

j=1

αiαjdidjxTi xj

Replacing the condition A.8 and the result obtained above in the equation A.9, we obtain the

following dual objective function Lagrangiana:

JD =N∑

i=1

αi −12

N∑

i=1

N∑

j=1

αiαjdidjxTi xj (A.10)

which gives the inferior limit of the function objective A.1 to any admissible point. In doing

so, it is possible to formulate the dual problem for a non separable training sample(xi, di)Ni=1

as follows:

maxα

N∑

i=1

αi −12

N∑

i=1

N∑

j=1

αiαjdidjxTi xj

s.a

w =∑N

i=1 αidixi∑Ni=1 αidi = 0

C − µi − αi = 0 i = 1, 2, ..., Nαi ≥ 0, µi ≥ 0 i = 1, 2, ..., N

(A.11)

Adding the complementary conditions of the Karush-Kuhn-Tucker

αi[di(wTxi + b)− (1− ξi)

]= 0 (A.12)

µiξi = 0 (A.13)

αi[di(wTxi + b)− (1− ξi)

]≥ 0 (A.14)

to the conditions A.6A.8 for i = 1, . . . , N it is possible reformulate the dual problem A.11 as

follows:

A.2. Error Backpropagation Algorithm 85

maxα

N∑

i=1

αi −12

N∑

i=1

N∑

j=1

αiαjdidjxTi xj

s.a

∑Ni=1 αidi = 0

0 ≤ αi ≤ C i = 1, 2, ..., N

(A.15)

A.2 Error Backpropagation Algorithm

The error backpropagation method is a neural network train without feedback. This tech-

nique requires a dierentiable transfer function to be possible the minimization of the cost

function [TK03, Hay98]. Since the multilayer perceptron architectures apply the activation

function degree:

f(x) =

1, se x > 0

0, se x < 0(A.16)

which is discontinues in the point 0, this algorithm uses the family of continuous dierentiable

functions which is the family of sigmoid function:

f(x) =1

1 + exp(−ax),where a> 0 is the slope parameter (A.17)

A.2.1 Mathematic explanation

The learning method of using the error backpropagation uses the gradient descending technique

to minimize the cost function. This function expresses the mean square error dened as follows:

E(i) =N∑

i=1

12

kL∑

m=1

(ym(i)− ym(i))2 (A.18)

where ym(i) and ym(i) represent respectively the output obtained and the desired output for

the output neuron m. kL is the total number of the outputs neurons and N is the total number

of training samples available (y(i), x(i)). The updating of the weights of the edges is made

from

wrj (new) = wrj (old)− µ∂E(i)∂wrj

(A.19)

where wrj (old) is the current estimate of the unknow weights, µ is the learning coecients and∂E(i)∂wr

jis the gradient of error function.

Let υrj be the sum of the input weights in the jth neuron in the rth layer and yrj the corresponding

output after the activation function. If yr−1k is the output of the kth neuron, k = 1, 2, ..., kr−1,

in the r − 1th layer for the ith training pair and wrjk is the current estimate of the corresponding

weight in the jth neuron in the rth layer,j = 1, 2, ..., kr, then the argument of the activation

function f(.) of the latter neuron is


υrj (i) =kr−1∑

k=1

wrjkyr−1k (i) + wrj0 =

kr−1∑

k=0

wrjkyr−1k (i) (A.20)

where yr0(i) = 1,∀r, i. For the output layer, we have r = L, yrk(i) = yk(i), with k = 1, 2, ..., kLand for the network input we have r = 1, yrk(i) = xk(i), with k = 1, 2, ..., k0. By the chain rule

in dierentiation, we have

∂E(i)∂wrj

=∂E(i)∂υrj (i)

∂υrj (i)∂wrj

(A.21)

from A.20 we obtain

∂

∂wrjυrj (i) ≡

∂∂wr

j0υrj (i)...

∂∂wr

jkr−1

υrj (i)

= yr−1(i) (A.22)

Let us dene,

∂E(i)∂υrj (i)

= δrj (i) (A.23)

then A.19 becomes

wrj (new) = wrj (old)− µN∑

i=1

δrj (i)yr−1(i) (A.24)

Summarizing the error backpropagation algorithm:

1. Inicialization: Initialize all the weights with small random values.

2. Forward computations: For each of the training feature vectors x(i), i = 1, 2, ..., N com-

pute υrj (i) and yrj (i) = f(υrj (i)), for j = 1, 2, ..., kr, r = 1, 2, ..., L from the sigmoid

function f . Compute the cost function for the current estimate of weights from E(i) =∑Ni=1

12

∑kLm=1 (ym(i)− ym(i))2 andE(i) ≡ 1

2

∑kLm=1 e

2m(i) ≡ 1

2

∑kLm=1 (f(vLm(i))− ym(i))2.

3. Backward computations: For each i = 1, 2, ..., N and j = 1, 2, ..., kr compute δLj (i) from

δLj (i) = ej(i)f ′(υLj (i)) and in the end compute δr−1j (i) from δr−1

j (i) =∑kr

k=1 δrj (i)w

rkjf′(υr−1

j (i)),for r = L,L− 1, ..., 2 and j = 1, 2, ..., kr; f ′(.) represents the dierentiation with respect

to argument.

4. Update the weights: For r = 1, 2, ..., L and j = 1, 2, ..., krwrj (new) = wrj (old)− µ∑N

i=1 δrj (i)y

r−1(i).

A.2.2 Graphic explanation

To facilitate the layout of this method we will only consider a neural network with three layers

see Figure A.1. Each neuron is composed by the sum of products of the weights with the

input signals and the sigmoid activation function. The signal e represents the output signal of


Figure A.1: Neural network architecture with three layers, two inputs and one output.

the sum and y = f(e) represents the output signal of the neuron.

To prepare the network for the classication it is necessary to train a data set constituted

by the input signals (x1 and x2) associated with their targets z. This training is an interac-

tive process, because for each interaction the node weights are modied using a new training

dataset, in order to minimize the cost function. This modication is computed using the error

backpropagation algorithm. In doing so, each learning step starts by forcing both input signals

of the training set to determine the values of the output signals for each neuron in each network

layer. Figure A.2 shows how the signal propagates through the network. The symbols w(xm)n

are the weights of the edges between the network input xm and the neuron n in the input layer;

the symbols yn represent the output signal in the neuron n.

In the next step, the output signal y is compared with the value of the desired output (z)

see Figure A.3. The dierence is called error signal d of the neuron output layer. Since the

output values of the neurons in the hidden layers are unknown, it is possible to compute the

error signal. In this manner, the aim is to propagate the error signal d in backward mode to all

the neurons - see Figure A.4. The value of the weights wmn is equal to the value of the weights

to calculate the output value. Only the direction of the sense of the data is changed, that is,

the signals propagate itself, for all layers of network, from the output to the input, one behind

the others.

When the error signal for each neuron has been computed, the weights coecients for each

input neuron are modied see Figura A.51.

1 df(e)de

represents the derivative of the activation function of the neuron whose weights are modied.


(a) Forward propagation of the signal in the inputlayer.

(b) Forward propagation of the signal in the inputlayer.

(c) Forward propagation of the signal in the inputlayer.

[(d) Forward propagation of the signal in the hiddenlayer.

(e) Forward propagation of the signal in the hiddenlayer.

(f) Forward propagation of the signal in the outputlayer.

Figure A.2: Forward propagation of the signal in the neural network.

Figure A.3: Comparison between the signal output and the target.


(a) Backward propagation of the signal in the outputlayer.

(b) Backward propagation of the signal in the outputlayer.

(c) Backward propagation of the signal in the hiddenlayer.

(d) Backward propagation of the signal in the hiddenlayer.

(e) Backward propagation of the signal in the hiddenlayer.

Figure A.4: Propagation of the error signal d in backward mode in the neural network.


(a) The calculation of weight in the input layer. (b) The calculation of weight in the input layer.

(c) The calculation of weight in the input layer. (d) The calculation of weight in the hidden layer.

(e) The calculation of weight in the hidden layer. (f) The calculation of weight in the output layer.

Figure A.5: The calculation of weights in the neural network.

A.3. Dalitz's Algorithm 91

A.3 Dalitz's Algorithm

Dalitz algorithms is a generalization of the method described in [MO07]. It stars by estimating

the values for the sta line thickness and the sta space height. These values are estimated by

the technique used in Fujinaga [Fuj04]: the most frequent black-runs represents the sta line

height (staine_height) and the most common white-runs represents the vertical line distance

within the same sta (staspace_height). This technique starts by computing the vertical run-

lengths representation of the image.

The process of Dalitz for nding the sta lines operates on a set of stasegments and requires

methods not only for linking two of those stasegments horizontally and vertically, but also for

merging two segments with overlapping positions into one. In the following steps the algorithm

will be described:

1. Add vertical links between stasegments with a vertical distance around

staine_height+staspace_height.

2. Add horizontal links between adjacent stasegments possibly belonging to the same

sta line.

3. Partition the resulting graph into connected subgraphs; each subgraph that is wide and

high enough corresponds to a sta.

4. All stasegments within a system are labelled as belonging to a certain sta line. Seg-

ments of the same line at the same horizontal position are merged into one segment.

5. Due to ledger lines, ties and beams, some subgraphs will contain too many sta lines. To

reduce them to a predened number of lines per sta (typically ve for modern notation,

four for chant and six for tablature), the outer sta lines of each sta are subsequently

removed until the predened number of sta lines remains.

Dalitz developed the following algorithm to nd the stasegments:

1. Extract horizontal runs with more than 60 percent black pixels within a window of width

staspace_height.

2. The resulting laments are vertically thinned by replacing each vertical black run with

its middle pixel. For black runs higher than 2*staine_height, more than one skeleton

point is extracted.

3. The resulting skeleton segments wider than 2*staine_height are the stasegments.

A.4 Matching in Bipartite Graphs

Before moving to the denition of the Kuhn-Munkres Algorithm it is necessary to give some

denitions. Let G = (X ∪ Y,X × Y ) be a weighted complete bipartite graph, where edge xy

has weight w(xy). X and Y have the same size, n, and can be written as X = x1, x2, . . . , xnand Y = y1, y2, . . . , yn.

Denition 1: A feasible vertex labelling for N is a function l : V (N) → Z such that l(x) +l(y) = w(xy) for all x ∈ X and y ∈ Y . We dene the size of l, by size(l) =

∑v∈V (N) l(v).


Denition 2: Let l be a feasible vertex labelling of N . Then the equality subgraph Gl for l in

N is the spanning subgraph of N containing all edges xy for which l(x) + l(y) = w(xy).

Suppose P is a network obtained fromG = (X∪Y,X×Y ) by giving each edge e an integer weightw(e). The algorithm iteratively constructs a sequence of feasible vertex labelling l1, l2, . . . for

P such that size(li+1) < size(li), and a sequence of matchings Mi such that Mi is a maximum

matching in the equality subgraph G(li), for all i ≥ 1. It stops when it nds a feasible vertex

labelling li for which Mi is a perfect matching in G(li).

The Kuhn-Munkres Algorithm (or Hungarian method) Start with an arbitrary feasi-

ble vertex labeling l, determine Gl, and choose an arbitrary matching M in Gl.

1. If M is complete for G, then M is optimal. Stop. Otherwise, there is some un-

matched x ∈ X. Set S = x and T = ∅.

2. If JGl(S) 6= T , go to step 3. Otherwise, JGl

(S) = T . Find

αl = minx∈S,y∈T c

l(x) + l(y)− w(xy

where T c denotes the complement of T in Y , and construct a new labeling l′ by

l′(v) =

l(v)− αl for v ∈ Sl(v) + αl for v ∈ T

l(v) otherwise

Note that αl > 0 and JG′l(S) 6= T . Replace l by l′ and Gl by Gl′ .

3. Choose a vertex y in JGl(S), not in T . If y is matched inM , say with z ∈ X, replace

S by S ∪ z and T by T ∪ y, and go to step 2. Otherwise, there will be an M

alternating path from x to y, and we may use this path to nd a larger matching

M ′ in Gl. Replace M by M ′ and go to step 1.

A.5 Otsu Threshold Algorithm

The segmentation process divides the image into sub- homogeneous regions. Homogeneity is

the existence of compliance of some features as intensity of colour or levels of gray. The Otsu

algorithm is used to nd the parameter of intensity T that divides the initial image into two

parts - classes of white pixels and classes of black pixels. This method considers that any point

that presents an intensity equal or greater than T belongs to the region of interest, and all

the others are considered background of the image. In this manner, a new image g(x, y) is

generated as follows:

g(x, y) =

1 se f(x, y) ≥ T0 se f(x, y) < T

A.5. Otsu Threshold Algorithm 93

where f(x, y) is the original image. After the segmentation, the matrix of the new image will

be constituted by 0's black pixels and 1's white pixels [GWE04].

Otsu considers the image normalized histogram as a function of density of discrete probability,

as follows:

pr(rq) =nqn, q = 0, 1, 2, . . . , L− 1

where n is the total number of pixels in the image, nq is the number of pixels that have intensity

rq and L is the total number of possible levels of intensity in the image. The Otsu method

chooses the threshold k, such that k is the intensity level where C0 = [0, 1, . . . , k − 1] andC1 = [k, k + 1, . . . , L − 1], that maximize the variance between the classes σ2

B, that is dened

as

σ2B = ω0(µ0 − µT )2 + ω1(µ1 − µT )2

where,

ω0 =∑k−1

q=0 pq(rq)

ω1 =∑L−1

q=k pq(rq)

µ0 =∑k−1

q=0 qpq(rq)/ω0

µ1 =∑L−1

q=k qpq(rq)/ω1

µT =∑L−1

q=0 qpq(rq)


A.6 Sta Line Removal Algorithm

staffLineRemoval(IMAGE,STAVES)

threshold = 2*staffHeight;

tolerance = 1+ceil(staffHeight/3.0);

IMAGE\_REMOVE = copy(IMAGE);

For(nvalid = 0 to STAVES size)

Point2D staff = validStaves[nvalid];

For (i = 0 to staff size)

col = staff[i].x;

refRow = staff[i].y;

row = refRow;

pel = valuePixel(IMAGE, IMAGE\_REMOVE);

decrement/increase the reference row until one pixel

different from white pixel (dist1/dist2) is found;

If ( dist1 <= max(1, min (dist2, tolerance)) )

refRow-=dist1;

Else

If ( dist2 <= max(1, min (dist1, tolerance)) )

refRow+=dist2;

Else

continue;

Count the number of decrements/increase on the reference row

until the black pixel changes to white pixel (run);

If ( run >= threshold )

continue;

remove the vertical black sequences on the IMAGE;

Listing 3: Sta lines removal algorithm.

A.7 Baum-Welch Algorithm

The Baum-Welch algorithm or the Forward-Backward procedure consider the forward variable

αt(i) dened as

αt(i) = P (o1o2 . . . t, qt = Si|λ)

that is, the probability of the partial observation sequence, o1o2 . . . t, and state Si at time t,

given the model λ and the backward variable βt(i) dened as

βt(i) = P (ot+1ot+2 . . . T , qt = Si, λ)

that is, the probability of the partial observation sequence from t+ 1 to the end, given state Siat time t and the model λ. Besides these variables the Baum-Welch algorithm also needs two

more auxiliary variables that can be expressed in terms of the forward and backward variables:

ξ(i, j) = P (qt = Si, qt+1 = Sj |o, λ)

A.7. Baum-Welch Algorithm 95

that is, the probability of being in state Si at time t, and state Sj at time t+1, given the model

and the observation sequence. This is the same as,

ξ(i, j) =P (qt = Si, qt+1 = Sj , o|λ)

P (o|λ)

Using forward and backward variables this can be expressed as,

ξ(i, j) =αt(i)aijbj(ot+1)βt+1(j)

∑Ni=1

∑Nj=1 αtiaij(i)bj(ot+1)βt+1(j)

(A.25)

The second variable is the a posteriori probability,

γt(i) =N∑

j=1

ξt(i, j) (A.26)

In forward and backward variables this can be expressed by,

γt(i) =αt(i)βt(i)∑Ni=1 αt(i)βt(i)

Assuming a starting model λ = (A;B;π), this algorithm starts by computing the α's and β's

as follows. The initialization of the forward probabilities as the joint probability of state Siand initial observation o1:

α1(i) = πibi(o1), 1 ≤ i ≤ N

the next step consists of the product over all the N possible states Si, 1 ≤ i ≤ N at time t:

αt+1(j) =N∑

i=1

[αt(j)aij ] bj(ot+1), 1 ≤ t ≤ T − 1, 1 ≤ j ≤ N

This result in the probability of Si at time t + 1 with all the accompanying previous partial

observations. The αt+1(j) is obtained by accounting for observation ot+1 in state j, that is, by

multiplying the summed quantity by the probability bj(ot+1).The backward procedure starts

by the initialization of the βT (i):

βT (i) = 1, 1 ≤ i ≤ N

and the calculation of

βt(j) =N∑

j=1

aijbj(ot+1)βt+1(j), 1 ≤ i ≤ N, t = T − 1, T − 2, . . . , 1.

Then the algorithm computes the ξ's and γ's using Equations A.25 and A.26, respectively. The

next step is to update the HMM parameters according to

πi = γ1(i), 1 ≤ i ≤ N (A.27)

aij =∑T−1

t=1 ξt(i, j)∑T−1t=1 γt(i)

, 1 ≤ i ≤ N, 1 ≤ j ≤ N (A.28)

bj(k) =

∑Tt=1

s.tot=vk

γt(j)∑T

t=1 γt(j), 1 ≤ j ≤ N, 1 ≤ k ≤M (A.29)


A.8 Viterbi Algorithm

Let δt(i) = maxq1,q2,...,qt−1 P [q1, q2, . . . , qt = i, o1o2 . . .t |λ] the highest probability along a single

path, at time t, which accounts for the rst t observations and ends in state Si. By induction

we have

δt+1(j) =[maxiδt(i)aij

]bj(Ot+1)

The array ψt(i) has the track of δt+1(j) for each t and j. The procedure is described below:

1. Initialization:

δ1(i) = πibi(o1), 1 ≤ i ≤ N

ψ1(i) = 0

2. Recursion:

δt(j) = max1≤i≤N

[δt−1(i)aij ] bj(ot), 2 ≤ t ≤ T, 1 ≤ j ≤ N

ψt(j) = arg max1≤i≤N

[δt−1(i)aij ] , 2 ≤ t ≤ T, 1 ≤ j ≤ N

3. Termination:

P ∗ = max1≤i≤N

[δT (i)]

q∗T = arg max1≤i≤N

[δT (i)]

4. Path backtracking:

q∗T = ψt+1(qt+1)∗, t = T − 1, T − 2, . . . , 1

AppendixB

Table of confusion

B.1 Results Obtained Without Elastic Matching for the

Handwritten Music Symbols

Accent BassClef Beam Flat Natural Note NoteFlag NoteOpen RestI RestII Sharp Staccatissimo TrebleClef Unknown

Accent 47 0 0 0 0 0 0 0 0 0 0 0 0 0BassClef 0 6 0 0 0 0 0 0 0 0 0 0 0 0Beam 1 0 107 0 0 0 0 0 0 0 0 0 0 1Flat 0 0 0 57 0 0 0 0 0 0 0 0 0 0Natural 0 0 0 0 79 0 0 0 0 0 0 0 0 0Note 0 0 0 1 0 113 2 0 0 0 0 0 0 0NoteFlag 0 0 0 1 0 4 25 0 0 0 0 0 0 0NoteOpen 0 0 0 0 0 1 0 4 0 0 0 0 0 1RestI 0 0 0 0 0 0 0 0 33 0 0 0 0 0RestII 0 0 0 0 0 0 0 0 0 100 0 0 0 0Sharp 0 0 0 1 0 1 0 0 0 0 83 0 0 1Staccatissimo 0 0 0 0 0 0 0 0 0 0 0 5 0 0TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 24 0Unknown 0 0 4 3 3 6 4 1 0 2 1 0 0 77

Table B.1: Table of confusion of the nearest neighbor classier.



Table B.2: Table of confusion of the neural network.

97

98 Appendix B. Table of confusion



Table B.3: Table of confusion of the support vector machines.



Table B.4: Table of confusion of the hidden Markov model.

B.2. Results Obtained Without Elastic Matching for the Printed Music Symbols 99

B.2 Results Obtained Without Elastic Matching for the

Printed Music Symbols

AltoClef Beam Flat Natural Note NoteFlag NoteOpen TieSlur RestI RestII Sharp Time TrebleClef Unknown

AltoClef 50 0 0 0 0 0 0 0 0 0 0 0 0 0Beam 0 73 0 0 0 0 0 0 0 0 0 0 0 0Flat 0 0 38 0 0 0 0 0 0 0 0 0 0 0Natural 0 0 0 31 0 0 0 0 0 0 0 0 0 0Note 0 0 0 0 75 1 0 0 0 0 0 0 0 0NoteFlag 0 0 0 0 1 28 0 1 0 0 0 0 0 0NoteOpen 0 0 0 0 0 0 77 0 0 0 0 0 0 0TieSlur 0 0 0 0 0 0 0 28 0 0 0 0 0 2RestI 0 0 0 0 0 0 0 0 16 0 0 0 0 0RestII 0 0 0 0 0 0 0 0 0 15 0 0 0 0Sharp 0 0 0 0 0 0 0 0 0 0 80 0 0 0Time 0 0 0 0 0 0 0 0 0 0 0 3 0 0TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 30 0Unknown 0 0 0 0 1 0 3 7 0 2 0 0 0 63











Table B.8: Table of confusion of the hidden Markov Model.

B.3. Results Obtained With Elastic Matching for the Handwritten Music Symbols 101

B.3 Results Obtained With Elastic Matching for the

Handwritten Music Symbols










B.4. Results Obtained With Elastic Matching for the Printed Music Symbols 103

B.4 Results Obtained With Elastic Matching for the Printed

Music Symbols










AppendixC

Articles Submited to Conferences

105

A Shortest Path Approach for Staff Line Detection

Ana Rebelo

FCUP and INESC Porto

Portugal

[email protected]

Artur Capela

FEUP and INESC Porto

Portugal

[email protected]

Joaquim F. Pinto da Costa

FCUP

Portugal

[email protected]

Carlos Guedes

ESMAE

Portugal

[email protected]

Eurico Carrapatoso


Portugal

[email protected]

Jaime S. Cardoso


Portugal

[email protected]

Abstract

Many music works produced in the past still exist only as

original manuscripts or as photocopies. Preserving them

entails their digitalization and consequent accessibility in

a digital format easy-to-manage. The manual process to

carry out this task is very time consuming and error prone.

Optical music recognition (OMR) is a form of structured

document image analysis where music symbols are isolated

and identified so that the music can be conveniently pro-

cessed. While OMR systems perform well on printed scores,

current methods for reading handwritten musical scores by

computers remain far from ideal. One of the fundamen-

tal stages of this process is the staff line detection. In this

paper a new method for the automatic detection of mu-

sic stave lines based on a shortest path approach is pre-

sented. Lines with some curvature, discontinuities, and in-

clination are robustly detected. The proposed algorithm be-

haves favourably when compared experimentally with well-

established algorithms.

1. Introduction

The impact of music in our lives can hardly be overes-

timated. Music is a pivotal part of our cultural heritage

and its preservation, in all of its forms, must be pursued.

Frequently, the preservation of many music works entails

the digitalization of these works and consequent accessi-

bility in a format that encourages browsing, analysis and

retrieval. In fact, many music works produced during the

last centuries still exist only as original manuscripts or as

photocopies. The digitalization of these works is therefore

a highly desirable goal. Unfortunately, the ambitious goal

of providing generalized access to handwritten scores that

were never published has been severely hampered by the ac-

tual state-of-the-art of handwritten music recognition. The

manual process required to recognize handwritten musical

symbols in scores and to put them in relationship with the

spine structure is very time consuming.

Despite the fact that OMR systems dealing with ma-

chine printed scores exhibit good performance, handwrit-

ten music recognition introduces several additional difficul-

ties. Outstanding problems include notation varying from

writer to writer, and possibly varying in the same score:

symbols and staff lines written with different sizes, shapes

or intensity. Despite the continued research on OMR, with

the availability of several commercial OMR systems, we are

still lacking a satisfactory performance in terms of precision

and reliability. Most of the existing work provides a real ef-

ficiency only when quite regular, printed music sheets are

processed. This condition is exacerbated with handwritten

music scores. This justifies the research around the defini-

tion of reliable OMR algorithms.

Staff line detection is one of the fundamental stages of

the OMR process, with subsequent processes relying heav-

ily on its performance. The reasons for detecting and re-

moving the staff lines lie on the need to isolate the musical

symbols for a more efficient and correct detection of each

symbol presented on the score.

The detection of staves is complicated by a variety of

reasons. The handwritten staff lines are rarely straight and

horizontal, and are not parallel to each other. For exam-

ple, some staves may be tilted one way or another on the

same page or they may be curved. This is especially true

for handwritten scores. Since these scores tend to be rather

irregular and determined by a person’s own writing style,

the staff lines might be twisted, being curved and not re-

ally horizontal at all. It depends on how regular the author

writes the symbols in his scores. There might also be big-

ger or smaller gaps along a staff line. And if we consider

that most of these works are old, the quality of the paper

in which it is written might have degraded throughout the

years, making it a lot harder to correctly identify its con-

tents.

In this paper a method for the automatic detection of staff

lines based on a shortest path approach is presented. The

proposed paradigm uses the image as a graph, where the

staff lines result as the shortest path between the two mar-

gins of the image.

This introduction is concluded with a brief review of the

work done in this area. In section 2 the proposed algorithm

is described. In section 3, the proposed algorithm is exper-

imentally evaluated using real music scores. Finally, con-

clusions are drawn and future work is outlined in section 4.

1.1. Related Works

Different methods for staff line detection have been re-

searched. The simplest approach consists on finding local

maxima in the horizontal projection of the black pixels of

the image [2]. These local maxima represent line positions.

This method assumes straight and horizontal lines. Several

horizontal projections can be made with different image ro-

tation angles, keeping the image in which the local maxima

are bigger. This eliminates the assumption that the lines are

always horizontal. An alternative strategy for identifying

staff lines is to use vertical scan lines [3, 7, 8, 13]. More

recent works present a more or less sophisticated use of a

combination of projection techniques to improve on the ba-

sic approach [1].

Fujinaga [5] incorporates a set of image processing tech-

niques in the algorithm, including run-length coding (RLC),

connected-component analysis, and projections. After ap-

plying the RLC to find the thickness of staff lines and the

space between the staff lines, any vertical black runs that are

more than twice the staff line height are removed from the

original. Then, the connected components are scanned in

order to eliminate any component whose width is less than

the staffspace height. After a global deskewing, taller com-

ponents, such as slurs and dynamic wedges are removed.

Other techniques for finding stave lines include the appli-

cation of mathematical morphology algorithms [11, 15, 6],

rule-based classification of thin horizontal line segments

[10], and line tracing [12, 14].

In spite of the variety of methods available, they all suffer

from some limitations. In particular, lines with some curva-

ture or discontinuities are inadequately resolved. The dash

detector [9] is one of few works that try to handle discon-

tinuities. The dash detector is an algorithm that searches

the image, pixel by pixel, finding black pixel regions that

it classifies as stains or dashes. Then, it tries to unite the

dashes to construct lines.

2. A Shortest Path Approach for Staff Line De-

tection

A staff line can be considered as a path from the left

side of the music score to the right side. As staff lines are

almost the only extensive black objects on the music score,

the path we are looking for is the shortest path between the

two margins if paths (almost) entirely through black pixels

are favoured.

In the work to be detailed, the image grid is considered

as a graph with pixels as nodes and edges connecting neigh-

bouring pixels. Therefore, some graph concepts are in or-

der.

2.1. Definitions and Notation

A graph G = (V,A) is composed of two sets V and A.V is the set of nodes, and A the set of arcs (p, q), p, q ∈ V .The graph is weighted if a weight w(p, q) is associated toeach arc, and it is called a digraph if the arcs are directed,

i.e., (p, q) 6= (q, p). A path from p1 to pn is a list of unique

nodes p1, p2, . . . , pn, (pi, pi+1) ∈ A. The path cost is thesum of each arc weight in the path.

In graph theory, the shortest-path problem seeks the

shortest path connecting two nodes; efficient algorithms are

available to solve this problem, such as the well-known Di-

jkstra algorithm [4].

2.2. General Framework Description

As mentioned before, a staff line corresponds to a path

from (almost) the left margin of the image to (almost) the

right side of the image, (almost) always through black pix-

els.

Starting by modelling the edge image as a graph, match

a node to each pixel. Connect two nodes with an arc on

the graph iff the corresponding pixels are neighbours (8-

connected neighbourhoods) on the image. The weight of

each arc is a function of pixels values and pixels relative

positions (see Figure 1):

w5w1

w3

w7

w2 w4

w8 w6

Figure 1. Arc weight between two pixels.

wi =

f(p, qi) if qi ∈ 4-connected neighbourhood of ph(p, qi) if qi 6∈ 4-connected neighbourhood of p

In this work we set h(., .) =√

2f(., .).The objective is then to design the weights wi such that

the weights are low for the pixels of interest (black) and

high otherwise (this will lead to small weighted distances

from left to right for paths through staff lines and large for

the rest). Therefore this setting will favour paths through

black pixels, as required. We set

f(p, q) =

c1 if p or q are black pixels

c2 otherwise

with c2 > c1. In this work c1 and c2 were experimentally

determined as 2 and 6, respectively. Note that c1 must be set

greater than zero in order to also favour the smallest path,

when more than one exists through black pixels only. Fi-

nally, the solution to the shortest path problem will yield

the intended staff line.

2.3. Illustrative Examples

In order to get a better intuition of the general result, it is

instructive to first explore some basic examples. Consider

a music score with a single staff line as represented in Fig-

ure 2(a). In Figure 2(b) the shortest paths between starting

(a) A single staff line.

s5 e5

s1 e1

(b) The shortest path between

some pairs of points.

Figure 2. A first exemplificative example.

points si on the left margin and ending points ei on the right

margin, at the same row, are traced. All paths get attracted

by the staff line. A similar condition is verified when we

have more than one staff line (see Figure 3). Now paths

get attracted to the nearest staff line. Half of the paths in-

between two consecutive staff lines goes along the top staff

line; the other half follows the bottom staff line.

The last example, in Figure 4, shows that music symbols

placed on top of staff lines do not interfere with the detec-

tion of the staff lines. Moreover, the example also makes

(a) A pair of staff lines.

s1 e1

s8 e8



Figure 3. A second exemplificative example.

(a) Skewed staff lines with music

symbols.



Figure 4. A third exemplificative example.

clear that slight skewed scores do not pose any problem to

the proposed approach.

Nonetheless, some issues are visible and need to be con-

veniently addressed. Due to the skew of the staff lines, some

of the shortest paths jump between consecutive staff lines.

The text on the top of the score also constitutes a false low

weight path between the two margins, inducing some paths

to go through it. Finally, even when a staff line is correctly

followed by the path, the initial and the final parts of the

path should be ignored.

2.4. Proposed Algorithm

To detect the staff lines, the proposed overall algorithm

starts by estimating the staffspace height. This length will

be used as a reference length for the subsequent operations.

Robust estimators of the staffspace height are already in

common use. The technique starts by computing the verti-

cal run-lengths representation of the image. If a bit-mapped

page of music is converted to vertical run-lengths coding,

the most common black-runs represents the staff line height

and the most common white-runs represents the staffspace

height [5].

After the estimation of the staffspace height, the pro-

posed approach applies the main step of the framework:

for each row of the image, the shortest path between

the leftmost pixel and the rightmost pixel is found, us-

ing the Dijkstra algorithm [4]. Instead of consider-

ing the whole image when computing the shortest path,

only a strip centred on the row of interest is used—

see Figure 5. This allows to constrain the complex-

ity of the algorithm. On the experiments we have set

STRIP HEIGHT=STAFFSPACE HEIGHT.

2 x STRIP HEIGHT

Figure 5. Vertical strip around the row of in-

terest.

Now, because the search for the shortest path is con-

strained to stay in a vertical strip (or because the nearest

staff line is far enough from the current row), the shortest

path may not follow a staff line. Therefore, the main step is

followed by a sequence of (arguably) sensible rules, aimed

at discarding false staff lines. First, all paths without a per-

centage of black pixels above a threshold BLACK PERC

are discarded (a threshold of BLACK PERC = 0.75 wasused on the experiments).

Next, each retained path is trimmed at the beginning and

at the end. As visible in the previous examples (refer to

Figure 4), before meeting with a staff line, a path travels

through a sequence of white pixels. Likewise, after the end

of the staff line, the path goes again through a sequence of

white pixels until it meets the right margin of the image. In

order to ignore all these white pixels, the initial pixels of

the path are discarded until a run of at least BLACK RUN

black pixels are found in the path. In the same way, all

pixels of the path after the last occurrence of a run of at

least BLACK RUN black pixels are discarded. A threshold

of BLACK RUN = 2× STAFFSPACE HEIGHT was usedon the experiments.

Finally, the proposed overall algorithm ends with the val-

idation of the preserved paths. Because the staff lines are

expected to be straight lines, the linear correlation coeffi-

cient of the x and y components of the pixels of the pathis used to reject paths that do not meet the linearity crite-

rion, with a correlation coefficient bellow a threshold COR-

RCOEF. The threshold used here is conservative, since for

manuscript lines the (perfect) linearity is not guaranteed.

This precaution is needed because staves on a page are often

distorted in different ways.

2.4.1 Summation

The shortest path approach has some advantages over stan-

dard algorithms presented in the literature for the detection

of staff lines:

• While almost all current approaches provide only iso-lated pieces of a staff line, the approach proposed

here outputs a complete line, with starting and ending

points.

• The staff lines are rarely straight and horizontal, andare not parallel to each other. For example, some staves

may be tilted one way or another on the same page or

they may be curved. While current approaches apply

a chain of heuristics to correct these undesired imper-

fections, the shortest path algorithm is naturally robust

to these challenging conditions.

• The proposed approach is robust to broken staff lines(due to low-quality digitalization or low-quality origi-

nals) or staff lines as thin as one pixel. Missing pieces

are automatically ’completed’ by the algorithm.

For reference, the overall algorithm is summarized in

Listing 1 with pseudo-code.

StaffLine_Detection(IMAGE, CORRCOEF)

STAFFSPACE_HEIGHT = computeStaffSpaceHeight();

BLACK_PERC = 0.75;

BLACK_RUN = STAFFSPACE_HEIGHT;

STRIP_HEIGHT = STAFFSPACE_HEIGHT;

for (int row = 0; row < ImageHeight; row++)

Point2D start(0, row);

Point2D end(ImageWidth-1, row);

Path path = findShortestPath (start, end);

if(blackPercentage(path) < BLACK_PERC)

continue;

path = trimPath(path);

if(corrcoef(path) < CORRCOEF)

continue;

addPathToSetOfStaffLines(path);

Listing 1: Shortest Path StaffLine Detection.

3. Results

We now present additional, complete, examples of the

proposed framework for automatic staff lines detection. The

code for all the examples in this paper has been written in

C++1. For comparison purposes, the method by Fujinaga

[5] was also implemented. Three types of scores were used

1The source code is available upon request to the authors.

in the experiments: machine printed scores—see Figure 6;

handwritten music scores sitting on regular staff lines—see

Figure 7 and Figure 8; and irregular handwritten scores—

see Figure 9. Before applying the staff line detection algo-

rithms, each image was binarized.

Analysing the results for Fujinaga’s method, we observe

that some information is lost, information that could be im-

portant for the recognition tasks which follow. The places

where musical symbols are present create gaps on the de-

tected staff lines; in some places the gaps are larger than the

space occupied by those symbols—see Figure 7(c). This

is mainly due to the low quality of the lines. The shortest

path approach is able to overcome these conditions because

it follows a continuous path connecting both line ends. The

robustness of the proposed approach to possible defects in

staff lines (curvature, discontinuities) is apparent in the ex-

ample of Figure 7(c).

Occasionally, the shortest path algorithm retains paths

that do not go completely through a staff line. This prob-

lem is prone to happen when there’s a large beamed note

density—see Figure 9(b)—making it go through the beams

for this set of notes (e.g. quavers and semiquavers) or when

the curvature of the lines is such that the path jumps be-

tween consecutive lines. This condition can be overcome

by automatically learning the best threshold on the linear-

ity test of the path for each image being processed, or by

the incorporating additional rules to validate the path. Both

approaches are currently being investigated. Nevertheless,

considering handwritten scores, we can see that the pro-

posed algorithm has good and promising results.

4. Conclusion

The first challenge faced by an OMR system is staff line

detection. This first task dictates the possibility of success

for the recognition of the music score. And when it comes

to handwritten music scores, existing solutions are far from

presenting satisfactory results.

In this paper, a new algorithm for the automatic detec-

tion of staff lines in music scores was proposed. The short-

est path approach for staff line detection algorithm brings a

new and promising approach to the staff line detection task.

The technique is adaptable to a wide range of image condi-

tions, thanks to its intrinsic robustness to (slightly) skewed

images, discontinuities, and curved staff lines.

There are several directions of research to pursue with

the framework introduced in this paper. We are currently in-

vestigating the computational feasibility of, instead of find-

ing the shortest path between a starting point on the left mar-

gin and an ending point on the right margin at the same row,

finding the shortest path between a starting point on the left

margin and the whole right margin. That will allow coping

with severely skewed scores. Another line of investigation

is the automatic learning of the parameters of the algorithm.

Although most of the parameters values are scaled by the

staff height space, the threshold for the test of linearity is

still manually set. This improvement could make the algo-

rithm work with a wide range of types of scores, without re-

quiring any parameter tuning by the user. Finally, more than

just having spatial continuity of the path— enforced by the

shortest path algorithm—, we are also pursuing the continu-

ity of the direction of the path. As staff lines do not suffer

abrupt changes of direction, even when manually drawn, it

seems sensible to impose this additional constraint.

Acknowledgments

This work was partially funded by Fundacao para a

Ciencia e Tecnologia (FCT) - Portugal through project

PTDC/EIA/71225/2006.

References

[1] D. Bainbridge. Extensible Optical Music Recognition. PhD

thesis, Department of Computer Science, University of Can-

terbury, Christchurch, NZ, 1997.

[2] D. Blostein and H. S. Baird. A critical survey of music image

analysis. In Baird, Bunke, and Y. (Eds.), editors, Structured

Document Image Analysis, pages 405–434. Springer-Verlag,

Heidelberg, 1992.

[3] N. P. Carter. Automatic Recognition of Printed Music in the

Context of Electronic Publishing. PhD thesis, Departments

of Physics and Music, University of Surrey, 1989.

[4] E. W. Dijkstra. A note on two problems in connexion with

graphs. Numerische Mathematik, 1:269–271, 1959.

[5] I. Fujinaga. Staff detection and removal. In S. George, edi-

tor, Visual Perception of Music Notation: On-Line and Off-

Line Recognition, pages 1–39. Idea Group Inc., 2004.

[6] I. Gawedzki. Optical music scores recognition. Technical

report, 2002.

[7] S. Glass. Optical music recognition. B.Sc thesis, 1989.

[8] H. Kato and S. Inokuchi. A recognition system for printed

piano music using musical knowledge and constraints. In

Proceedings of the International Association for Pattern

Recognition Workshop on Syntactic and Structural Pattern

Recognition, pages 231–248, 1990.

[9] I. Leplumey, J. Camillerapp, and G. Lorette. A robust de-

tector for music staves. In Proceedings of the International

Conference on Document Analysis and Recognition, pages

902–905, 1993.

[10] J. V. Mahoney. Automatic analysis of music score images.

B.Sc thesis, 1982.

[11] B. R. Modayur, V. Ramesh, R. M. Haralick, and L. G.

Shapiro. Muser: A prototype musical score recognition sys-

tem using mathematical morphology. Machine Vision and

Applications, 6:140–150, 1993.

[12] D. Prerau. Computer pattern recognition of standard en-

graved music notation. PhD thesis, Department of Computer

Science and Engineering, MIT, 1970.

(a) Original score. (b) Staff lines detected by our algorithm. (c) Staff lines detected by Fujinaga’s method [5].

Figure 6. Results for a machine-printed score.


Figure 7. Results for a handwritten score.

[13] T. Reed. Optical music recognition. Master’s thesis, Depart-

ment of Computer Science, University of Calgary, Canada,

1995.

[14] J. W. Roach and J. E. Tatem. Using domain knowledge in

low-level visual processing to interpret handwritten music:

an experiment. Pattern Recognition, 21(1):33–44, 1988.

[15] M. Roth. OMR-optical music recognition. Master’s thesis,

Swiss Federal Institute of Tecnology, 1992.

(a) Original skewed score. (b) Staff lines detected by our algorithm. (c) Staff lines detected by Fujinaga’s method [5].

Figure 8. Results for a skewed handwritten score.


Figure 9. Results for a handwritten score.

STAFF LINE DETECTION AND REMOVAL WITH STABLE PATHS

Artur Capela, Ana RebeloINESC Porto, Campus da FEUP, Rua Dr. Roberto Frias 378, 4200-465 Porto, Portugal

[email protected], [email protected]

Jaime S. CardosoINESC Porto, Faculdade de Engenharia, Universidade do Porto, Portugal

[email protected]

Carlos GuedesINESC Porto, Escola Superior de Musica e Artes do Espectaculo, Portugal

[email protected]

Keywords: Music, optical character recognition, document image processing, image analysis.

Abstract: Many music works produced in the past are currently available only as original manuscripts or as photocopies.Preserving them entails their digitalization and consequent accessibility in a machine-readable format, whichencourages browsing, retrieval, search and analysis while providing a generalized access to the digital material.Carrying this task manually is very time consuming and error prone. While optical music recognition (OMR)systems usually perform well on printed scores, the processing of handwritten music by computers remainsbelow the expectations. One of the fundamental stages to carry out this task is the detection and subsequentremoval of staff lines. In this paper we integrate a general-purpose, knowledge-free method for the automaticdetection of staff lines based on stable paths, into a recently developed staff line removal toolkit. Linesaffected by curvature, discontinuities, and inclination are robustly detected. We have also developed a staffremoval algorithm adapting an existing line removal approach to use the stable path algorithm at the detectionstage. Experimental results show that the proposed technique outperforms well-established algorithms. Thedeveloped algorithm will now be integrated in a web based system providing seamless access to browsing,retrieval, search and analysis of submitted scores.

1 INTRODUCTION

The Universal Declaration on Cultural Diversityadopted by the General Conference of UNESCO on2001 asserts that cultural diversity is as necessary forhumankind as biodiversity is for nature, and that poli-cies to promote and protect cultural diversity thus arean integral part of sustainable development. Beingmusic a pivotal part of our cultural heritage, its preser-vation, in all of its forms, must be pursued. Fre-quently, the preservation of many music works en-tails their digitalization and consequent accessibilityin a format that encourages browsing, analysis and re-trieval.

There is a vast amount of invaluable paper-basedheritage, including printed and handwritten musicscores, which are deteriorating over time due to natu-ral decaying of paper and chemical reaction (i.e., be-tween written ink and paper). Various efforts havebeen focused on this issue in order to preserve therecord of our heritage. Digitisation has been com-

monly used as a possible tool for preservation. Al-though a digital copy may not conserve the originaldocument, it can preserve the most important part:its data. It has also the advantages of easy dupli-cations, distribution, and digital processing. Never-theless, the output of the digitalization process is notamenable for further analysis or semantic search op-erations. Thus, an Optical Music Recognition (OMR)process is needed. However, the manual process re-quired to recognize handwritten musical symbols inscores and to put them in relationship with the spinestructure of the score is very time consuming. Thisjustifies the research around reliable automatic OMRalgorithms as current solutions are still below the ex-pectations.

As a concrete example, Portugal has a notoriouslack in music publishing from virtually all eras ofits musical history. However, whereas most of theknown original music manuscripts before the twen-tieth century are kept at the National Library Archivein Lisbon, there is virtually no national repository for

the Portuguese music from the twentieth century. Al-though there are recent efforts in order to catalogueand preserve in digital form the Portuguese musicfrom the late twentieth century—notably the MusicInformation Center (MIC, 2008) and the section onmusical heritage from the Institute of the Arts web-site (IOA, 2008)—most of the music pre-dating com-puter notation software was never published and stillexists as manuscripts or photocopies spread out allover the country in inconspicuous places. The riskof irreversibly losing this rich cultural heritage is thusa reality.

1.1 OMR System

The project “Optical recognition system for handwrit-ten music scores” initiated in 2007 by INESC Portoand ESMAE is the point of departure for creatinga web-based system of music manuscripts of Por-tuguese composers from the twentieth century. Thisdatabase will provide generalized access to a widecorpus of unpublished handwritten music encoded inMusicXML, which can be accessed remotely via theInternet. The database will not only centralize asmuch information as possible but will also serve topreserve this corpus in a way that is easily accessi-ble for browsing, analysis, and ultimately, for per-forming this repertoire, therefore helping to keep thePortuguese music alive (Capela et al., 2008). Al-though the aim of this project is Portuguese music,it is equally valid for all printed and handwritten mu-sic scores that need to be preserved from all aroundthe world.

The ambitious goal of providing generalized ac-cess to handwritten scores that have never been pub-lished has been severely hampered by the actual state-of-the-art of handwritten music recognition. Thereare currently various commercial OMR software so-lutions (Capella software1, SharpEye Music Reader2,OMeR3) and a few open source solutions (AOMR24,OpenOMR5, Audiveris6), but they are all offline stan-dalone applications. The existing online archives ofmusic scores (Lester Levi Collection, 2008; ClassicalSheet Music Collection, 2008; Mutopia Collection,2008) usually provide them in inadequate formats—

1http://www.capella-software.com/capscan.htm

2http://www.visiv.co.uk3http://www.myriad-online.com/en/products/

omer.htm4http://www.bzzt.net/˜arnouten/wiki/index.

php/Gamera#AOMR2:_omr_toolkit5http://sourceforge.net/projects/openomr6http://audiveris.dev.java.net

usually only as the scanned score image—for retrievalor automatic analysis. These online archives are merestandard websites, without facilities for optical recog-nition, editing and searching through the scores mu-sical content. The creation of an OMR system, in-tegrating optical recognition, storage, search, brows-ing and downloading capabilities, while keeping thescores in their original format along with their digitalcounterpart, would therefore be extremely beneficial.An integrated score editor would be provided in orderto view and edit the submitted music scores.

In our previous work on this project we havepresented a complete OMR System solution—OMRSYS (Capela et al., 2008)—comprising adatabase driven web application with one or moreOMR applications integrated in the proposed system.The proposed architecture successfully attends thestated objectives. At the end of our project we planon developing a fully functional system according tothe specified architecture and integrating a completeOMR package.

1.2 Detection and Removal of StaffLines

Staff line detection and removal are the first funda-mental stages on the OMR process, with subsequentprocesses relying heavily on their performance. Thereasons for detecting and removing the staff lines lieon the need to isolate the musical symbols for a moreefficient and correct detection of each symbol presenton the score. Although their primary application is asa preprocessing step in the recognition of music no-tation, the line detection problem also occurs in dif-ferent contexts (e.g., the recognition of bank transferforms).

The detection of staves is complicated due to avariety of reasons. The handwritten staff lines arerarely straight and horizontal, and are not parallel toeach other. For example, some staves may be tiltedone way or another on the same page or they may becurved. These scores tend to be rather irregular anddetermined by a person’s own writing style. More-over, if we consider that most of these works are old,the quality of the paper in which it is written mighthave degraded throughout the years, making it a lotharder to correctly identify its contents.

In (Cardoso et al., 2008a; Cardoso et al., 2008b)we presented a new and robust staff line detection al-gorithm based on a stable paths approach. The pro-posed paradigm uses the image as a graph, where thestaff lines result as connected paths between the twolateral margins of the image. A staff line can be con-sidered as a connected path from the left side to the

right side of the music score. As staff lines are almostthe only extensive black objects on the music score,the path to look for is the shortest path between thetwo margins if paths (almost) entirely through blackpixels are favoured.

In this paper we present our recent work and re-sults focusing on the implementation of the StablePaths algorithm as a C++ plugin for the MusicStavesToolkit (Dalitz et al., 2008; MusicStaves Toolkit,2008) (based on the Gamera Framework (MacMillanet al., 2002; Gamera Framework, 2008)), as well ason the removal stage by using our detection algorithmin the first stage. We have also adapted a removal al-gorithm based on the LineTrack Height approach pro-posed in (Dalitz et al., 2008), which we will present insection 3. In section 2 the Gamera and MusicStavesToolkit are presented, together with our C++ plugin.In section 4, both our proposed detection and removalalgorithms are evaluated experimentally using a well-known dataset of music scores. Finally, conclusionsare drawn and future work is outlined in section 5.

2 STABLE PATHS INTEGRATION

In this section we present the platform in which ourdetection algorithm was integrated for testing and val-idation. The platform is comprised by its core—theGamera Framework (MacMillan et al., 2002; GameraFramework, 2008)—and by a toolkit for the evalua-tion of detection and removal algorithms—the Mu-sicStaves Toolkit (Dalitz et al., 2008; MusicStavesToolkit, 2008). This toolkit is a set of Gamera plu-gins aiming to support the development and test ofstaff line detection and removal algorithms for musicscores, by extending the Gamera functionality.

After we describe each part constituting this plat-form, we present the implementation of our staff linedetection algorithm as a C++ plugin on the Music-Staves Toolkit. Finally, we present the integration ofour detection algorithm on some removal algorithmsin the MusicStaves toolkit. We have integrated ouralgorithm in those removal algorithms where the de-tection is processed separately from the removal op-eration, replacing the Dalitz algorithm (Dalitz et al.,2008) as the detection stage.

2.1 Gamera Framework

Gamera (MacMillan et al., 2002; Gamera Framework,2008) is a portable and open source framework to cre-ate structured documents analysis applications by do-main experts. The name Gamera is an acronym to

“Generalized Algorithms and Methods for Enhance-ment and Restoration of Archives”. It combines a pro-gramming library with graphical tools for an interac-tive training and development of recognition systems.This framework tries to be a tool to create custom ap-plications through the domain experts knowledge in-stead of responding to various requirements with amonolithic application. It aims at providing an ef-ficient test and refinement development cycle. Theprogramming language in its basis is the high-levellanguage Python, although it has many extensionswritten in C++ to carry out low-level image process-ing. Nevertheless, due to its nature, Python turns thecode writing process more agile and facilitates the useof scripting, which makes Gamera an interactive andbatchable framework. Besides the large number of ex-tensions deployed with this framework, it is also pos-sible to customize and extend with plugins or toolkits,written in Python and C++.

The Gamera framework is modular and organizedin a series of horizontal layers, which can be seen atFigure 1.

Figure 1: Gamera Architecture.Gamera follows a modular plugin architecture. It

is made of modules (plugins), both written in Pythonand C++, integrated in a high-level scripting environ-ment. Each module executes a task on the recognitionprocess. The framework maintains a toolbox designapproach, i.e., a user has access to a large set of toolsfor the optical recognition stages.

2.2 MusicStaves Toolkit

MusicStaves (Dalitz et al., 2008; MusicStavesToolkit, 2008) is a Gamera toolkit specific for the de-velopment and test of staff line detection and removalalgorithms. This Phyton toolkit also integrates facili-ties to create a test set of music scores and to evaluateresults with established metrics. As with Gamera, this

toolkit is portable and its source code is freely avail-able. In order to use MusicStaves, it must be importedto Gamera, either in the GUI or in a programmaticmanner.

MusicStaves is structured as seen in Figure 2. Thetoolkit is composed by a set of main classes where thestaff line detection and removal algorithms are imple-mented, and by a set of plugins in Python and C++.Some plugins are used by the implemented algorithmswhile others are tools to aid the algorithms testing. Itis an extendable toolkit in which one can integrate anew staff line detection and/or removal algorithm. Itsplugin system follows the Gamera framework ratio-nale and as such the plugins may be written in Pythonand C++. In order to write new staff line detectionor removal algorithms the toolkit provides two inter-faces: StaffFinder (detection) and MusicStaves (re-moval).

Figure 2: MusicStaves Architecture.

A test set of 32 synthetic music scores is also pro-vided by the authors of this toolkit. Moreover, thetoolkit allows applying a set of deformations (e.g., ro-tation, curvature, typeset emulation, white speckles)commonly found in the real world to these perfectscores—see Figure 3. The purpose is to be able tomeasure the performance of the removal algorithmscontained in MusicStaves by using three defined er-ror metrics (Dalitz et al., 2008): Pixel Level, Seg-mentation Region Level and Staffline Interruptions.However, the same set may be used to evaluate staffline detection algorithms alone by defining adequateerror metrics (Cardoso et al., 2008a; Cardoso et al.,2008b). We have restricted the range of the param-eters controlling the intensity of the deformations tovalues considered realistic.

2.3 Integration

We have integrated our recently proposed algorithmfor staff line detection (Cardoso et al., 2008a; Car-doso et al., 2008b) in MusicStaves as a C++ plugin.

Figure 3: Some examples of applied deformations: a) Cur-vature; b) Rotation; c) White Speckles; d) Staffline Inter-ruptions; e) Staffline y-variation; f) Typeset Emulation. See(Dalitz et al., 2008) for details.

The plugin encompasses the main StaffFinder class inthe toolkit root, the respective Python plugin in thePython plugins folder and the algorithm implementa-tion in C++ in the C++ plugins folder. In Figure 4we present a diagram of the algorithm as integrated inMusicStaves.

The algorithm processing starts with the call to themethod find staves, which receives a binarized imageas input. That image is then passed to the functionin C++ by the plugin Python code. In the C++ im-plementation, after the respective staff line detectionfunction is called, the image is converted to the formatused internally by our algorithm. After the whole de-tection task is complete it returns the staff lines skele-ton in the MusicStaves format. After receiving theskeleton list on its Python class, it fills the structureself.linelist with the obtained values.

Figure 4: Overall view.

Besides integrating our Stable Paths Approach al-gorithm into the MusicStaves toolkit as a C++ plu-gin, we have also integrated the algorithm with staffremoval algorithms in order to evaluate the improve-ments over the original results. However, this is nota standard integration as the toolkit does not providethe means for this kind of integration. This integra-tion was coded in the staff removal algorithms by

adding the possibility to choose the staff line detec-tion algorithm through a parameter value. A diagramillustrating this integration can be seen in Figure 5.Some removal algorithms present in MusicStaves de-tect the staff lines along with the removal process.From those, we have used the one with best resultsin general (Dalitz et al., 2008)—Skeleton—for com-parison purposes.

Figure 5: Stable Paths integration with staff removal algo-rithms.

3 STAFF REMOVALALGORITHMS

In the process of recognizing music scores the staff re-moval algorithm is processed after the staff line detec-tion stage takes place. In our current work the removalalgorithm is based on the LineTrack Height algorithmpresented on (Randriamahefa et al., 1993). The goalon this method is to track the staff lines positions ob-tained by a detection algorithm and remove verticalrun sequences of black pixels that have a value lowerthan a specified threshold, which was experimentallyset at 2*staffHeight.

As music scores may suffer from deformations,the staff lines may have descontinuities, be curved orinclined. These problems will influence the successto achieve a correct detection of lines contained onthe score to recognize. However, due to the above-mentioned problems, the positions of the staff linesobtained by a staff line detection algorithm may passslightly above or under the real staff lines positions.That way, if we are in presence of a white pixelwhen the staff lines are tracked, we search verticallyfor the closest black pixel. If that distance is lowerthan a specified tolerance—experimentally chosen as1+ceil(staffHeight/3.0)—we move the reference posi-tion of the staff line to the position of the black pixelfound.

On (Dalitz et al., 2008) a new method ispresented—Skeleton—which uses the skeleton infor-mation, but performs the staff removal on the original

Algorithm 1 Staff Removal Algorithm.procedure STAFFLINEREMOVAL(IMAGE,STAV ES)threshold = 2∗staffHeight;

tolerance = 1+ ceil(staffHeight/3.0);IMAGE REMOV E = copy(IMAGE);

for nvalid = 0 to STAVES size doPoint2D staff = validStaves[nvalid];for i = 0 to staff size do

col =staff[i].x;re f Row =staff[i].y;row = re f Row;pel = valuePixel(IMAGE, IMAGE REMOV E)decrement/increase the reference row until

one pixel different from white pixel (dist1/dist2) isfound;

if dist1≤max(1,min(dist2, tolerance)) thenre f Row−= dist1;

elseif dist2 ≤ max(1,min(dist1, tolerance))

thenre f Row+ = dist2;

elsecontinue;

end ifend ifCount the number of decrements/increase on

the reference row until the black pixel changes to whitepixel (run);

if run≥ threshold thencontinue;

end ifremove the vertical black sequences on the

IMAGE;end for

end forend procedure

image instead of the skeleton. The method relies onthe fact that symbols on the stafflines lead to junctionpoints or corner points in the skeleton.

4 RESULTS

Although the evaluation of new staff detection algo-rithms may be done by visually inspecting the out-put on a set of scores—as adopted on (Rebelo et al.,2007)—, our current comparison is supported onquantitative measures. The test set adopted for thequalitative evaluation of the proposed method is theone presented in (Dalitz et al., 2008) and already de-scribed. It consists of ideal synthetic scores to whicha set of known deformations have been applied—see(Dalitz et al., 2008) for more details. In total we havegenerated 2688 deformed images originated from 32perfect scores. In order to conveniently measure theperformance of staff line removal algorithms we have

adopted two error metrics from (Dalitz et al., 2008):Pixel Level and Segmentation Region Level.

Staff line detection algorithms can be used as afirst step in many staff removal algorithms. To un-derstand the potential of our algorithm to leveragethe performance of existing staff removal algorithms,we conducted a series of experiments, comparing theoriginal version of a staff removal algorithm with themodified version of it, making use of the Stable Pathsalgorithm at the staff line detection step. The quanti-tative comparison of the different algorithms is totallyin line with the comparison presented in (Dalitz et al.,2008).

With respect to the considered distortions, regard-ing the detection stage, the Stable Paths based ap-proach outperforms the Dalitz algorithm. In Figure 6we present our results for the removal algorithms:LineTrack Height (with Dalitz and Stable Paths),Skeleton and LineTrack Height Modified (with StablePaths). We chose the methods that present the best re-sults in (Dalitz et al., 2008), implementing our ownremoval algorithm with LineTrack Height as a basis.In general we verify that the replacement of the Dalitzmethod by our Stable Paths Approach algorithm asthe staff detection step has improved the final staffline removal results7. Additionally, the LineTrackingHeight Modified algorithm presents an overall betterperformance than the original LineTrack Height algo-rithm from (Dalitz et al., 2008). Our staff line de-tection and removal approaches also outperform theSkeleton method, although it continues to present acompetitive performance. We have not integrated theStable Paths algorithm with the Skeleton algorithmas the second performs the lines detection along withtheir removal instead of using two separate stages. Allthe parameters, both on the Stable Paths detection al-gorithm and LineTrack Height Modified, were pre-liminary tuned over an independent set of images.

This performance gain is even more noteworthy asthe MusicStaves algorithms are receiving as input thecorrect number of lines per staff. Had not this been thecase, the differential between both would have beenmuch larger. In summary, these experiments show thestrength of the algorithms presented here. Despite be-ing based on simple and intuitive underlying princi-ples, the performance of the proposed algorithms isquite competitive.

The analysed results have covered the detectionand removal accuracy but a brief word on speed is alsoin order. Comparing different algorithms for speedis notoriously difficult; we are simultaneously judg-ing mathematical properties and specific implementa-

7For the deformations not shown, the stable path is notsignificantly better than Dalitz.

tions. In the experimental study, the current imple-mentation of the Stable Paths algorithm run almostas fast as the Dalitz algorithm (20% slower). In re-spect to the removal algorithms our LineTrack Heightversion with Stable Paths is significantly faster thanthe Skeleton algorithm (two times faster). Compar-ing to the original LineTrack Height algorithm withthe Dalitz detection algorithm the runtime differenceis not significant. The algorithms were evaluated asavailable at the Staff Removal Toolkit (Dalitz et al.,2008).

5 CONCLUSIONS

This paper presented the integration of our robust Sta-ble Paths Approach algorithm (Cardoso et al., 2008a;Cardoso et al., 2008b) in the MusicStaves Toolkit(Dalitz et al., 2008) as a C++ plugin, an improvedversion of an existing staff line removal algorithm—LineTrack Height (Dalitz et al., 2008), and the re-sults we have obtained in our staff line removal tests.We have integrated our detection algorithm with ex-isting staff line removal algorithms. Our approachsuccessfully deals with the difficulties posed by thesymbols superimposed on the staff lines as well as awide range of image conditions (e.g., discontinuities,curved lines), frequently found on handwritten scores.

The encouraging results lead us now to considerinvestigating the detection of music symbols bene-fiting from the improved staff line detection and re-moval, creating a complete OMR application in or-der to integrate it on our proposed complete OMRsolution—OMRSYS (Capela et al., 2008). Thus, ourproposed system offers a complete solution for thepreservation of our musical heritage. It includes anoptical recognition engine integrated with an archiv-ing system and a user-friendly interface for search-ing, browsing and edition. The digitized scores arestored in MusicXML, a recent and expanding musicinterchange format designed for notation, analysis, re-trieval, and performance applications.

Our proposed algorithms and complete OMR sys-tem promote the creation of a full corpus of mu-sic documents, promoting its preservation and study.This project will culminate in the creation of a repos-itory of handwritten scores, accessible online. Thedatabase will be available for enjoyment, educationaland musicological purposes, thus preserving this cor-pus of music in an unprecedented way.

(a) Curvature. (b) Curvature. (c) White Speckles.

(d) Rotation. (e) Staffline Y-Variation. (f) Staffline Y-Variation.

(g) Staffline Interruptions. (h) Typeset Emulation Part 1. (i) Typeset Emulation Part 2.

Figure 6: Effect of different deformations on the overall staff removal error rates. See (Dalitz et al., 2008) for parameterdetails.

ACKNOWLEDGEMENTS

This work was partially funded by Fundacao paraa Ciencia e a Tecnologia (FCT) - Portugal throughproject PTDC/EIA/71225/2006.

REFERENCESCapela, A., Cardoso, J. S., Rebelo, A., and Guedes, C.

(2008). Integrated recognition system for musicscores. In International Computer Music Conference(ICMC) (Accepted).

Cardoso, J. S., Capela, A., Rebelo, A., and Guedes, C.(2008a). A connected path approach for staff detec-tion on a music score. In International Conference onImage Processing (ICIP 2008) (Accepted).

Cardoso, J. S., Capela, A., Rebelo, A., and Guedes, C.(2008b). Staff detection with stable paths. Trans-actions on Pattern Analysis and Machine Intelligence(TPAMI) (Submitted).

Classical Sheet Music Collection (2008). Classical SheetMusic and MIDI Files, http://www.music-scores.com.

Dalitz, C., Droettboom, M., Pranzas, B., and Fujinaga, I.(2008). A comparative study of staff removal algo-rithms. IEEE Transactions on Pattern Analysis andMachine Intelligence, 30(5):753–766.

Gamera Framework (2008). Gamera Framework, http://ldp.library.jhu.edu/projects/gamera/.

IOA (2008). Institute of the Arts, http://patrimonio.dgartes.pt/?lang=pt.

Lester Levi Collection (2008). The Lester S. Levi Collec-tion of Sheet Music, http://levysheetmusic.mse.jhu.edu/.

MacMillan, K., Droettboom, M., and Fujinaga, I. (2002).Gamera: Optical music recognition in a new shell. InInternational Computer Music Conference (ICMC),pages 482–485.

MIC (2008). Music Information Center, http://www.mic.pt.

MusicStaves Toolkit (2008). MusicStaves Toolkit,http://lionel.kr.hsnr.de/˜dalitz/data/projekte/stafflines/.

Mutopia Collection (2008). The Mutopia Project, http://sca.uwaterloo.ca/Mutopia.

Randriamahefa, R., Cocquerez, J. P., Fluhr, C., Ppin, F., andPhilipp, S. (1993). Printed music recognition. In Pro-ceedings of the 2nd International Conference on Doc-ument Analysis and Recognition (ICDAR’93), pages898–901.

Rebelo, A., Capela, A., da Costa, J. F. P., Guedes, C., Carra-patoso, E., and Cardoso, J. S. (2007). A shortest pathapproach for staff line detection. In AXMEDIS ’07:Proceedings of the Third International Conference onAutomated Production of Cross Media Content forMulti-Channel Distribution, pages 79–85, Washing-ton, DC, USA. IEEE Computer Society.

INTEGRATED RECOGNITION SYSTEM FOR MUSIC SCORES

Artur Capela, Jaime S. CardosoFEUP and INESC Porto

[email protected]

[email protected]

Ana RebeloFCUP and INESC Porto

[email protected]

Carlos GuedesESMAE and INESC Porto

[email protected]

ABSTRACT

Many music works produced in the last century still existonly as original manuscripts or as photocopies. Preservingthem entails their digitalization and consequent accessibil-ity in a digital format easy-to-manage which encouragesbrowsing, retrieval, search and analysis while providinga generalized access to the digital material. The manualprocess to carry out this task is very time consuming anderror prone. Automatic optical music recognition (OMR)has emerged as a partial solution to this problem. How-ever, the full potential of this process only reveals itselfwhen integrated in a system that provides seamless accessto browsing, retrieval, search and analysis. We addressthis demand by proposing a modular, flexible and scal-able framework that fully integrates the abovementionedfunctionalities. A web based system to carry out the auto-matic recognition process, allowing the creation and man-agement of a musiccorpus, while providing generalizedaccess to it, is a unique and innovative approach to theproblem. A prototype has been implemented and is beingused as a test platform for OMR algorithms.

1. INTRODUCTION

The impact of music in our lives can hardly be overesti-mated. Music is a pivotal part of our cultural heritage andits preservation, in all of its forms, must be pursued.

Portugal has a notorious lack in music publishing fromvirtually all eras of its musical history. However, whereasmost of the known original music manuscripts before thetwentieth century are kept at the National Library Archivein Lisbon, there is virtually no national repository for thePortuguese music from the twentieth century. Althoughthere are recent efforts in order to catalogue and preservein digital form the Portuguese music from the late twen-tieth century—notably the Music Information Center [10]and the section on musical heritage from the Institute ofthe Arts website [8]—most of the music pre-dating com-puter notation software was never published and still ex-ists as manuscripts or photocopies spread out all over thecountry in inconspicuous places.

For example, all the music composed by Jorge Peixi-nho (1940-1995), an internationally-renowned composerwho epitomized the Portugueseavant-gardein the 1960sand 1970s, was never published in Portugal (few of his

scores were published abroad), and almost his entireoeu-vre consists of manuscript paper [6]. Almost fourteenyears past his death, his music is already catalogued, al-though not published. Unfortunately, this case is not unique,and this situation is common with other great Portuguesecomposers from the twentieth century. The risk of irre-versibly losing this rich cultural heritage is thus a reality.

The project “Optical recognition system for handwrit-ten music scores” initiated in 2007 by INESC Porto andESMAE is the point of departure for creating a web-basedsystem of music manuscripts of Portuguese composers fromthe twentieth century. This database will provide gen-eralized access of a widecorpusof handwritten unpub-lished music encoded in MusicXML that can be accessedremotely via the Internet. The database will not only cen-tralize as much information as possible but will also serveto preserve thiscorpusin a way that is easily accessiblefor browsing, analysis, and ultimately, for performing thisrepertoire, therefore help keeping the Portuguese musicalive.

The ambitious goal of providing generalized access tohandwritten scores that have never been published has beenseverely hampered by the current state-of-the-art of hand-written music recognition. There are currently variouscommercial OMR software solutions [4, 16, 14] and a fewopen source solutions [1, 15, 2], but they are all offlinestandalone applications. The existing online archives ofmusic scores [9, 5, 13] usually provide them in inadequateformats—usually only as the scanned score image—forretrieval or automatic analysis. These online archives aremere standard websites, without facilities for optical recog-nition, editing and searching through the scores’ musicalcontent. The creation of an OMR system, integrating opti-cal recognition, storage, search, browsing and download-ing capabilities, while keeping the scores in their originalformat along with their digital counterpart, would there-fore be extremely beneficial.

Using this background, we present the specification andimplementation of a system integrating all the requiredfeatures. It uniquely combines OMR technology in a sys-tem, easing the conversion of scores to the Music Ex-tended Markup Language format [12]—MusicXML—asit is being widely adopted and meets our needs. In Sec-tion 2 the proposed system is described. We continue inSection 3 by presenting a usage scenario. Finally, conclu-sions are drawn in Section 4.

2. SYSTEM ARCHITECTURE ANDIMPLEMENTATION

The system that we propose on this paper comprises thecreation of a database of music scores and a web applica-tion mainly featuring:

• Addition of music scores to the system, performingtheir recognition and conversion to MusicXML inan integrated manner, allowing the user to confirmand correct the conversion results at the last stage ofthis process.

• Complete maintenance of a fully navigable musicscores archive, including both the original versionand the digital version obtained from the opticalrecognition.

• Browsing and searching the database, as well as theMusicXML contents. Visualization, downloadingand edition of the selected music scores.

• Complete system management.

The architecture for the proposed system is based on aclient-server model. The system is intended to be accessi-ble through the Internet. There are three different entitiespresent in this system, as it can be seen in Figure 1.

LAN

Internet

Web BrowserClient 1

Web BrowserClient 2

Web BrowserClient N

Repository 1Repository 2

Repository N

Web Server

OMR engine 1OMR engine 2OMR engine N

Search engine

Figure 1. Generic system architecture

TheRepositorymodule stores the original scanned score,the digital counterpart in MusicXML and all the descrip-tive metadata inserted by the user, as detailed latter. Allthe remaining system contents, such as the user informa-tion, are also stored in this entity.

TheWeb Serveris the user access point to the system aswell as to all of its processing modules run on the server,encompassing the search engine and the optical recogni-tion engine for the music scores. There is support for theinclusion of several OMR Engines, aiming to provide theability to meet different needs (e.g. different music no-tation systems). The most adequate OMR Engine can bechosen manually by the user or automatically by the sys-tem, by detecting the scores notation and type (i.e. hand-written or printed). Our Search Engine allows not onlygeneric searches throughout all of the system contents,but it also provides the capability of searching throughoutthe music scores MusicXML information in an innovativemanner. The Web Server interacts with the Repository and

with the Web Browser, which establishes the interface be-tween the user and the system.

The user interface on aWeb Browserallows the com-plete management of the music scores and associated meta-data, as well as carrying out the system administration.Generaly speaking, the user interface provides the user theability to execute all the necessary tasks to fully use theproposed system. On the administration side it is possibleto manage the users, as well as the whole system contentsand validate new ones.

There are four user types which can access the system:General User, Registered User, Privileged User and theAdministrator. The General User represents a visitor andmay only consult and download contents. The remain-ing types are registered users and according to their levelthey may add/edit/remove certain contents with or withoutrestrictions. The Privileged User is similar to the Admin-istrator and has full access to all functionalities, thoughit cannot manage the users from its own level. The con-tents added by Registered Users have to be validated byPrivileged Users or the Administrator, although the latertwo are able to add any contents without the need to bevalidated, they are considered to be trustful users.

For content management, functionalities include the ad-dition of music scores to the Repository, their automaticrecognition, visualization and edition, searching, and brows-ing. It is also possible to insert and browse informationrelated to the music scores—name, authors, instruments,musical genres—providing the user with a Repository con-taining all the necessary information to keep a completemusiccorpus. This metadata can then be used by the useron search queries or for a more flexible browsing expe-rience. Finally, a music work is organized into sections,where each section is a music score which represents apart from the whole music work. This flexible structureallows accommodating either simple or complex works onthe system. Each score can be visualized and its represen-tation in MusicXML edited, side-by-side with the originalscore directly on the Web Browser. Both the visualizationand the edition of the music scores are done in a graphi-cal easy-to-use editor available through the Web Browser.Figure 2 illustrates the information recorded in the systemassociated with a music work.

Figure 2. Content stored in the Repository associatedwith a music work

2.1. Prototype Implementation – OMRSYS

We developed a prototype—OMRSYS—taking the sys-tem architecture shown in Figure 1 as a basis. Currently,the prototype supports only a single Repository collocatedwith the Web Server and a single OMR Engine was usedto prove this concept.

The Repository is implemented as a PostgreSQL1 data-base, an open source Database Management System (DBMS).The main reasons for choosing PostgreSQL were its na-tive XML support, needed for creating a search enginethat allows searching on the scores’ MusicXML counter-parts stored on the database. This is an important aspectas it is a major feature for the proposed system. It isalso a very mature and well documented popular DBMS.Another important feature is the great ease of integrationon the framework we have selected to develop the pro-totype, which we will describe next. MySQL2 is the de-fault DBMS used on the framework chosen for developingthe Web Application, but it lacks on XML support. OtherDBMSs were also considered but PostgreSQL was the onethat best fitted our needs.

The development of the Web Application was supportedon Ruby on Rails3 . Rails is an open source, full-stackframework for developing dynamic database-backed webapplications according to the Model-View-Control (MVC)pattern. It is an almost complete platform which requiresonly a DBMS and a server. These features present us witha suitable choice for supporting the development of ourprototype. Ruby is the programming language at its core,a flexible and powerful Object Oriented language, with avast array of powerful characteristics. Another strong ad-vantage is the database manipulation, as it is greatly sim-plified, which is ideal to develop a system of this kind.Other frameworks [17, 7] were considered but fell shortcompared to the features of Ruby on Rails.

The Web Server selected for the prototype was the Apa-che HTTP Server4 with Mongrel5 to execute the WebApplication. This solution is suggested by Ruby on Railsand was found suitable, being both efficient and open source.

The Search Engine developed for this prototype allowsthe usual queries on the database contents, although thegroundbreaking search through the MusicXML contentsis still not possible at this stage.

The OMR Engine on the Web Server is the moduleresponsible for the automatic recognition of the submit-ted music scores. We initially adapted it from the opensource OpenOMR project [15]. The OMR Engine per-forms an automatic conversion of a submitted music scoreto a digital easy-to-use representation, the MusicXML for-mat. This digital format allows representing sheet musicby its musically-relevant parts, sections, phrases and mo-tives, thus easing the access to the relevant portions of thescore while browsing that score in a computer monitor.

1 http://www.postgresql.org2 http://www.mysql.com3 http://www.rubyonrails.org4 http://httpd.apache.org5 http://mongrel.rubyforge.org

Figure 3. The OMRSYS user interface

Simultaneously, MusicXML enables the retrieval of rel-evant musical information for analysis, thus facilitatingcertain types of computational analysis to be performedon acorpusof scores. Nevertheless, it also provides anadequate way to restore old sheet music, preventing themfrom oblivion. The other OMR applications we have anal-ysed were left aside because they were either less com-plete than OpenOMR at the time or they were commercialsolutions. Our main goal at this stage was to prove theconcept.

The User Interface has a great impact in the user ex-perience and is divided into several sections. There is theauthentication, title and quick search sections on the upperportion of the screen, followed by the middle and largestportion which includes the main menu and the contentsarea. The main menu works as a two-level expansiblemenu and allows access to all the system’s functionalitiesby grouping them in a logical manner. Similar functionali-ties follow similar designs to keep the interface consistent,intuitive and easy to learn. Some of the functionalities arethe common Create/Read/Update/Delete (CRUD) and thelisting of the database contents based on a chosen criteria,all following a familiar behaviour. The main differencesand most unique aspects rely on the submission and theupdate of music scores, which is discussed in Section 3.As an interface example, Figure 3 shows the GraphicalUser Interface (GUI) being used for browsing the musicscores available in the Repository. Both the PrivilegedUsers and the Administrator have additional options in themain menu for validation purposes.

The MusicXML Editor for this prototype is still imple-mented as a plain text editor and viewer, but already show-ing side-by-side the original score with its MusicXMLcounterpart. However, the development of a fully inte-grated graphical MusicXML editor is being pursued. Sucheditor would allow a higher level and intuitive edition andcould be developed for example in Flash, in the likes ofMusicRain [11], an online interactive sheet music viewer.The main purpose for the editor in this initial prototypewas to give the end-user the possibility to at least viewand edit the music scores in MusicXML. The existing edi-tors and visualizers [11] usually have a high maturity levelbut they are offline applications.

Figure 4. Score submission scenario

The Digital Rights Management control is done in twoways: the acceptance of a license agreement at the reg-istration process and the validation of submitted musicscores by a Privileged User before they become availableon the system.

3. USAGE SCENARIO: SCORE SUBMISSION

When the insertion of a music score in the system is re-quested, the user inserts the metadata associated with themusic score, of which some is optional—name, year, de-scription, etc—and associates it with one or more authorsand a musical genre. Each section of a music work hasto be associated with the instruments present on the musicscore. In the last submission step, after the insertion of therequested metadata, the user submits the various pages foreach section, as illustrated in Figure 4.

After validating the inserted data, the user then triggersthe automatic recognition process by calling a suitableOMR engine. Afterward, an overview of the submittedscore is shown by listing its contents allowing the user toview the result of the automatic process on the built-in ed-itor side-by-side with the original scanned image, offeringthe user the possibility to manually correct the automaticresults. After confirming the results and making the nec-essary corrections, the user then finalizes the music scoresubmission by accepting it. If the score was submittedby the Administrator or a Privileged User it then becomesimmediately available on the system; if it was submittedby a standard Registered User it is kept on queue for val-idation and will only become available once a user withadministration privileges validates it.

4. CONCLUSION

The proposed system offers a complete solution for thepreservation of our musical heritage. It includes an opti-cal recognition engine integrated with an archiving systemand a user-friendly interface for searching, browsing andedition. The digitized scores are stored in MusicXML, arecent and expanding music interchange format designedfor notation, analysis, retrieval, and performance applica-

tions. An additional benefit of the automatic conversion ofthe music score to MusicXML is the possibility of encod-ing the manuscript score in MX format, an XML-base,multi-layered format for music representation [3]. MXsynchronizes several layers belonging to the descriptionof a piece of music, e.g. an audio recording and score ofthe same piece.

A system of this kind promotes the creation of a fullcorpusof music documents, promoting its preservationand study. This project will culminate in the creation ofa repository of the handwritten scores, accessible online.The database will be available for enjoyment, educationaland musicological purposes, thus preserving thiscorpusof music in an unprecedented way.

Acknowledgments

This work was partially funded by Fundacao para a Cienciae a Tecnologia (FCT) - Portugal through projectPTDC/EIA/71225/2006.

5. REFERENCES

[1] AOMR2. [Online]. Available: http://www.bzzt.net/∼arnouten/wiki/index.php/Gamera#AOMR2:omr toolkit

[2] Audiveris. [Online]. Available: http://audiveris.dev.java.net

[3] A. Barate, G. Haus, and L. Ludovico, “An XML-based for-mat for advanced music fruition,” inProceedings of theThird Sound and Music Computing Conference, 2006, pp.141–147.

[4] Capella-scan. [Online]. Available: http://www.capella-software.com/capscan.htm

[5] Classical Sheet Music and MIDI Files. [Online]. Available:http://www.music-scores.com

[6] Delgado, Cristina, Machado, Jorge, and J. Machado,Catalogos da obra de Jorge Peixinho, ser. Jose Machado(ed.) Jorge Peixinho: In Memoriam. Lisbon: Caminho,2002.

[7] Google Web Toolkit. [Online]. Available: http://code.google.com/webtoolkit

[8] Institute of the Arts. [Online]. Available: http://patrimonio.dgartes.pt/?lang=pt

[9] The Lester S. Levi Collection of Sheet Music. [Online].Available: http://levysheetmusic.mse.jhu.edu/

[10] Music Information Center. [Online]. Available: http://www.mic.pt

[11] MusicRain. [Online]. Available: http://musicrain.us

[12] The MusicXML Format. [Online]. Available: http://www.musicxml.org

[13] The Mutopia Project. [Online]. Available: http://sca.uwaterloo.ca/Mutopia

[14] OMeR. [Online]. Available: http://www.myriad-online.com/en/products/omer.htm

[15] OpenOMR. [Online]. Available: http://sourceforge.net/projects/openomr

[16] SharpEye Music Reader. [Online]. Available: http://www.visiv.co.uk

[17] Tacos. [Online]. Available: http://tacos.sourceforge.net

A CONNECTED PATH APPROACH FOR STAFF DETECTION ON A MUSIC SCORE

Jaime S. Cardoso∗, Artur Capela†, Ana Rebelo‡, Carlos Guedes§

ABSTRACT

The preservation of many music works produced in the past en-tails their digitalization and consequent accessibility in an easy-to-manage digital format. Carrying this task manually is very time con-suming and error prone. While optical music recognition systemsusually perform well on printed scores, the processing of handwrit-ten musical scores by computers remain far from ideal. One of thefundamental stages to carry out this task is the staff line detection. Inthis paper a new method for the automatic detection of music stafflines based on a connected path approach is presented. Lines af-fected by curvature, discontinuities, and inclination are robustly de-tected. Experimental results show that the proposed technique con-sistently outperforms well-established algorithms.

Index Terms— Music, optical character recognition, documentimage processing, image analysis

1. INTRODUCTION

The Universal Declaration on Cultural Diversity adopted by the Gen-eral Conference of UNESCO on 2001 asserts that cultural diversityis as necessary for humankind as biodiversity is for nature, and thatpolicies to promote and protect cultural diversity thus are an inte-gral part of sustainable development. Being music a pivotal partof our cultural heritage, its preservation, in all of its forms, mustbe pursued. Frequently, the preservation of many music works en-tails their digitalization and consequent accessibility in a format thatencourages browsing, analysis and retrieval. In fact, many musicworks produced during the last centuries still exist only as originalmanuscripts or as photocopies. The digitalization of these worksis therefore a highly desirable goal. Unfortunately, the ambitiousgoal of providing generalized access to handwritten scores that werenever published has been severely hampered by the actual state-of-the-art of handwritten music recognition. The manual process re-quired to recognize handwritten musical symbols in scores and toput them in relationship with the spine structure of the score is verytime consuming. This justifies the research around the definition ofreliable optical music recognition (OMR) algorithms.

Staff line detection is one of the fundamental stages of the OMRprocess, with subsequent processes relying heavily on its perfor-mance. The reasons for detecting and removing the staff lines lieon the need to isolate the musical symbols for a more efficient andcorrect detection of each symbol presented on the score.

The detection of staves is complicated due to a variety of rea-sons. The handwritten staff lines are rarely straight and horizontal,

∗INESC Porto, Faculdade de Engenharia, Universidade do Porto, Portu-gal,email: [email protected]

†INESC Porto, Faculdade de Engenharia, Universidade do Porto, Portu-gal,email: [email protected]

‡INESC Porto, Faculdade de Ciencias, Universidade do Porto, Portugal,email: [email protected]

§INESC Porto, Escola Superior de Musica e Artes do Espectaculo, Portu-gal,email: [email protected]

and are not parallel to each other. For example, some staves may betilted one way or another on the same page or they may be curved.These scores tend to be rather irregular and determined by a per-son’s own writing style. Moreover, if we consider that most of theseworks are old, the quality of the paper in which it is written mighthave degraded throughout the years, making it a lot harder to cor-rectly identify its contents.

In this paper a method for the automatic detection of staff linesbased on a connected path approach is presented. The proposedparadigm uses the image as a graph, where the staff lines result as theconnected path between the two margins of the image. Our previouswork [1] was a first effort to explore this concept. We now presenta complete, principled solution, with a strong experimental valida-tion of the proposed approach. This introduction is concluded with abrief review of the work done in this area. In section 2 the proposedalgorithm is described. In section 3, the proposed algorithm is eval-uated experimentally using a well-known dataset of music scores.Finally, conclusions are drawn and future work is outlined in sec-tion 4.

1.1. Related Works

The problem of staff line detection is often considered simultane-ously with the goal of their removal, although exceptions exist [2, 3].The simplest approach consists on finding local maxima in the hor-izontal projection of the black pixels of the image [4]. These localmaxima represent line positions, assuming straight and horizontallines. Several horizontal projections can be made with different im-age rotation angles, keeping the image in which the local maximaare bigger. This eliminates the assumption that the lines are alwayshorizontal. An alternative strategy for identifying staff lines is to usevertical scan lines [5]. More recent works present a more or less so-phisticated use of a combination of projection techniques to improveon the basic approach [6].

Fujinaga [7] incorporates a set of image processing techniquesin the algorithm, including run-length coding (RLC), connected-component analysis, and projections. After applying the RLC tofind the thickness of staff lines and the space between the staff lines,any vertical black runs that are more than twice the staff line heightare removed from the original. Then, the connected componentsare scanned in order to eliminate any component whose width isless than the staff space height. After a global deskewing, tallercomponents, such as slurs and dynamic wedges are removed.

Other techniques for finding staff lines include the application ofmathematical morphology algorithms [8], rule-based classificationof thin horizontal line segments [9], and line tracing [10, 11]. Themethods proposed in [2, 3] operate on a set of ‘staff segments’, withmethods for linking two segments horizontally and vertically andmerging two segments with overlapping position into one.

In spite of the variety of methods available, they all suffer fromsome limitations. In particular, lines with some curvature or discon-tinuities are inadequately resolved. The dash detector [12] is one offew works that try to handle discontinuities. The dash detector is an

algorithm that searches the image, pixel by pixel, finding black pixelregions that it classifies as stains or dashes. Then, it tries to unite thedashes to construct lines.

2. A CONNECTED PATH APPROACH FOR STAFF LINEDETECTION

A staff line can be considered as a connected path from the left sideof the music score to the right side. As staff lines are almost the onlyextensive black objects on the music score, the path we are lookingfor is the shortest path between the two marginsif paths (almost)entirely through black pixels are favoured. More formally, lets andtbe two pixels of the image andPs,t a path over the image connectingthem. We are interested in finding the pathP that optimizes somepredefined distanced(s, t). This criterion should embed the need tofavour black pixels.

In the work to be detailed, the image grid is considered as agraph with pixels as nodes and edges connecting neighbouring pix-els. The weight of each arc,w(p, q), is a function of pixels valuesand pixels relative positions. A path from vertex (pixel)v1 to vertex(pixel) vn is a list of unique verticesv1, v2, . . . , vn, with vi−1 andvi corresponding to neighbour pixels. Thepath cost is the sum ofeach arc weight in the path

Pni=2 w(vi−1, vi).

As mentioned before, a staff line corresponds to a path from (al-most) the left margin of the image to (almost) the right side of theimage, (almost) always through black pixels. If the weight assignedto an edge captures the intensity of the path of the adjacent pixels,finding the best path between a points on the left margin and a pointt on the right margin translates into computing the minimum accu-mulated weight along all possible connected curves connectings andt:

d(s, t) = minPs,t

Xw(p, q). (1)

Staff lines are best modelled as paths between two regionsΩ1 andΩ2, the left and right margins of the score. The shortest path betweentwo regionsΩ1 andΩ2 is defined as a pathPs,t, with s ∈ Ω1 andt ∈ Ω2 and cost equal to

d(Ω1, Ω2) = mins∈Ω1,t∈Ω2

d(s, t). (2)

One may assume that staff lines do not zigzag back and forth,left and right. Therefore, one may restrict the search among con-nected paths containing one, and only one, pixel in each column ofthe image1. Formally, let I be an×m image and define an admissi-ble staff to be

s = (j, y(j))nj=1 , s.t.∀j |y(j)− y(j − 1)| ≤ 1,

wherey is a mappingy : [1, · · · , n] → [1, · · · , m]. That is, a staffline is an 8-connected path of pixels in the image from left to right,containing one, and only one, pixel in each column of the image.

Given the weight functionw(p, q), one can define the cost ofa staff asC(s) =

Pni=2 w(vi−1, vi). The optimal staff line that

minimizes this cost can be found using dynamic programming. Thefirst step is to traverse the image from the second column to the lastcolumn and compute the cumulative minimum cost C for all possibleconnected staff lines for each entry(i, j):

C(i, j) = min

8><>: C(i− 1, j − 1) + w(pi−1,j−1; pi,j)

C(i− 1, j) + w(pi−1,j ; pi,j)

C(i− 1, j + 1) + w(pi−1,j+1; pi,j)

1This assumption imposes a maximum detectable 45 rotation degree.

At the end of this process, the minimum value of the last column inC will indicate the end of the minimal connected staff. Hence, in thesecond step one backtrack from this minimum entry on C to find thepath of the optimal staff.

2.1. Algorithm outline

Assume one wants to find all staff lines present in a score. Thiscan be approached by successively finding and removing the shortestpath from the left to the right margin of the score. The removaloperation is required to ensure that a staff is not detected twice2.

Consider the music score presented in Figure 1(a). In Fig-ure 1(b) the first 11 shortest paths are traced. This example showsthat music symbols placed on top of staff lines do not interferewith the detection of the staff lines. Moreover, the example alsomakes clear that slight skewed scores do not pose any problem tothe proposed approach.

(a) Skewed staff lines with musicsymbols.

(b) The first shortest paths betweenleft and right margins.

Fig. 1. An exemplificative example.

Our first naive effort to apply the shortest path foundation to staffline detection in [1] did it by computing the shortest path betweentwo pixels at the same height on the left and right margin. Thatapproach was not robust enough to tilted scores, leading the detectedpaths to jump between consecutive staff lines.

Nonetheless, two main issues are still visible with the currentmethodology and need to be conveniently addressed. A criterion isneeded to stop the iterative detection of the shortest paths (staff lines)and the initial and the final parts of a path should be trimmed.

2.2. Proposed Algorithm

To detect the staff lines, the proposed overall algorithm starts by esti-mating the staff space height,staffspaceheight, and staff lineheight,stafflineheight. These lengths are used as referencelengths on subsequent operations. Robust estimators are already incommon use: the technique starts by computing the vertical run-lengths representation of the image. If a bit-mapped page of musicis converted to vertical run-lengths coding, the most common black-runs represents the staff line height and the most common white-runsrepresents the staff space height [7].

After estimating the reference lengths, the proposed approachapplies the main step of the framework, by successively findingthe shortest path between the left and right margin, adding thepath found to the list of staff lines and removing it from the im-age. The weightw(p, q) was experimentally set tow(p, q) =2 + (Ip + Iq)/255, with Ip, Iq ∈ 0, 255, for pixels in a 4-neighbourhood or

√2 times that value for 8-neighbours. The

removal operation sets to white the pixels on a vertical strip ofheight= 2×stafflineheight, centred on the detected staff

2We implemented the removal operation by setting to white the pixels onthe detected staff; image resizing could be a valid alternative [13].

line. To stop the iterative staff line search, a sequence of (arguably)sensible rules is used to validate the last found path; if it does notpass the checking, the iterative search is broken. Two validationrules were applied, both assessing features with respect to the firstdetected staff line (assumed to be the most perfect one). If the lastpath does not have a percentage of black pixels above a threshold, thesearch is broken (a threshold ofblackperc = 0.8 of the percentageof black pixels in the first staff line was used in the experiments).Likewise, if the shape of the last detected path differs too muchfrom the shape of the first detected path (measured as the average y-distance between both paths, after removing the means), the iterationis broken. A thresholdshapediff = 4×staffspaceheightwas experimentally selected.

After the main search step, detected staff lines are post-processed.Although true staff lines never intersect, the above algorithm mayoccasionally create intersecting lines. That may be due to a locallow quality of a line, leading the shortest path to jump betweenconsecutive lines; the next iteration will then follow the remainingsegments, intersecting with the previous detected line. To precludesuch final, undesired state, lines are post-processed to remove in-tersections. That is easily and efficiently accomplished by, for eachimage column, sorting ony the pixels of the detected lines and as-signing thei-pixel to thei-line. After this simple process, lines maytouch but they do not intersect. Each retained line is then trimmed atthe beginning and at the ending. As visible in the previous example(refer to Figure 1(b)), before meeting with a staff line, a path travelsthrough a sequence of white pixels. Likewise, after the end of thestaff line, the path goes again through a sequence of white pixelsuntil it meets the right margin of the image. In order to ignore all ofthese white pixels, the initial pixels of the path are discarded untila run of at leastblackrun black pixels are found in the path. Inthe same way, all pixels of the path after the last occurrence of a runof at leastblackrun black pixels are discarded. A threshold ofblackrun = 2×staffspaceheight was used on the experi-ments. Finally, lines are smoothed with a standard average low-passfilter. A window size of2×staffspaceheight was selected onthe experiments.

3. RESULTS

This section provides experimental results obtained on a set ofscores. Although the assessment of a new staff detection algorithmmay be done by visually inspecting the output on a set of scores—asadopted on [1]—, here we support the comparison on quantitativemeasures. The test set adopted for the qualitative evaluation of theproposed method is the one presented in [14]. The test set consistsof ideal scores to which known deformations can be applied. Thedistortions range from rotation and curvature to typeset emulationand staff line thickness variation—see [14, 15] for more details. Intotal, 630 images were generated from the originally perfect scores.To conveniently measure the performance of a staff line detectionalgorithm, we considered two different error metrics: the number offalse positive staff lines and missed to detect staff lines.

To evaluate these metrics, we start by computing the average Eu-clidian distance between each reference staff line and each actuallydetected staff line; then we solve the matching problem on the result-ing bipartite graph by minimizing the assignment cost (= distance).Only pairs with average error-distance bellow the staff line heightwere considered correctly matched (the other pairs were assumed tooriginate from a false positive staff line being matched to an unde-tected true staff line and were therefore unmatched). Now the twometrics result as the number of unmatched detected staff lines (false

positive) and unmatched reference staff lines (missed to detect).The proposed algorithm was compared with the three methods

considered in [14] for staff line detection3. As Dalitz’s algorithmperforms significantly better than the two other algorithms evaluatedin [14], we have only included Dalitz results in subsequent figures.It is important to state that the comparison reports only to staff linedetection algorithms, not to the removal phase. That explains theneed to introduce the aforementioned metrics, while not adoptingthe metrics introduced in [14] for assessing staff line removal.

The effects of the different deformations over the respective pa-rameter ranges are shown in Figure 2. With respect to the distortionsconsidered, our connected path based approach is the most robustand clearly outperforms the Dalitz algorithm. In fact, the perfor-mance of our approach is almost independent of intensity of the de-formation, for the range of values considered. This performance gainis even more noteworthy as the Dalitz algorithm is receiving as inputthe correct number of lines per staff, while the proposed approachdoes not rely on that information. Had not this been the case, thedifferential between both would have been much larger.

In summary, these experiments show the strengths of the pre-sented algorithm. Despite being based on a simple and intuitiveunderlying principle, the performance of the proposed algorithm isquite competitive. Moreover, the results are prone to be improvedeven further by elaborating the stopping criterion of the iterativesearch or the post-processing rules, while leaving intact the mainprinciple of the method.

4. CONCLUSION

The first challenge faced by an OMR system is staff line detection.This first task dictates the possibility of success for the recognitionof the music score. In the case of handwritten music scores, theexisting solutions are far from presenting satisfactory results.

In this paper, a new algorithm for the automatic detection ofstaff lines in music scores was proposed. The connected path ap-proach for staff line detection algorithm is adaptable to a wide rangeof image conditions, thanks to its intrinsic robustness to skewed im-ages, discontinuities, and curved staff lines. The handwritten stafflines are rarely straight and horizontal, and are not parallel to eachother. Some staves may be tilted one way or another on the samepage or they may be curved. While current approaches apply a chainof heuristics to correct these undesired imperfections, the connectedpath algorithm is naturally robust to these challenging conditions.The proposed approach is robust to broken staff lines (due to low-quality digitalization or low-quality originals) or staff lines as thinas one pixel. Missing pieces are automatically ‘completed’ by thealgorithm.

Acknowledgments

This work was partially funded by Fundacao para a Ciencia e a Tec-nologia (FCT) - Portugal through project PTDC/EIA/71225/2006.

5. REFERENCES

[1] Ana Rebelo, Artur Capela, Joaquim F. Pinto da Costa, CarlosGuedes, Eurico Carrapatoso, and Jaime S. Cardoso, “A shortestpath approach for staff line detection,” inThird InternationalConference on Automated Production of Cross Media Contentfor Multi-channel Distribution (AXMEDIS 2007), 2007.

3The source code is available upon request to the authors.

−5 0 50

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Angle (degrees)

erro

r

false staves (connected path)missed staves (connected path)false staves (Dalitz)missed staves (Dalitz)

(a) Rotation.

0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Amplitude/staffwidth

erro

r


(b) Curvature.

0.02 0.04 0.06 0.08 0.10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Rate of whitened pixels

erro

r


(c) White speckle.

2 2.5 3 3.5 4 4.5 5 5.5 60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Maximum deviation n

erro

r


(d) Line y-variation. (e) Shortest path (left) and Dalitz (right) results for rotation(angle=-4).

2 4 6 8 10 120

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Maximum vertical shift nshift

(for ngap

= 10)

erro

r


(f) Typeset emulation.

Fig. 2. Effect of different deformations on the overall error rates. See [14] for parameter details.

[2] Hidetoshi Miyao and Masayuki Okamoto, “Stave extractionfor printed music scores using dp matching,”Journal of Ad-vanced Computational Intelligence and Intelligent Informatics,vol. 8, pp. 208–215, 2007.

[3] Mariusz Szwoch, “A robust detector for distorted musicstaves,” inComputer Analysis of Images and Patterns, pp. 701–708. Springer-Verlag, Heidelberg, 2005.

[4] Dorothea Blostein and Henry S. Baird, “A critical survey ofmusic image analysis,” inStructured Document Image Anal-ysis, Baird, Bunke, and Yamamoto (Eds.), Eds., pp. 405–434.Springer-Verlag, Heidelberg, 1992.

[5] N. P. Carter, Automatic Recognition of Printed Music in theContext of Electronic Publishing, Ph.D. thesis, Departments ofPhysics and Music, University of Surrey, 1989.

[6] D. Bainbridge, Extensible Optical Music Recognition, Ph.D.thesis, Department of Computer Science, University of Canter-bury, Christchurch, NZ, 1997.

[7] Ichiro Fujinaga, “Staff detection and removal,” inVisual Per-ception of Music Notation: On-Line and Off-Line Recognition,Susan George, Ed., pp. 1–39. Idea Group Inc., 2004.

[8] Ignacy Gawedzki, “Optical music scores recognition,” Tech.Rep., 2002.

[9] J. V. Mahoney, “Automatic analysis of music score images.B.Sc thesis,” 1982.

[10] D. Prerau,Computer pattern recognition of standard engravedmusic notation, Ph.D. thesis, Department of Computer Scienceand Engineering, MIT, 1970.

[11] J. W. Roach and J. E. Tatem, “Using domain knowledge inlow-level visual processing to interpret handwritten music: an

experiment,” Pattern Recognition, vol. 21, no. 1, pp. 33–44,1988.

[12] I. Leplumey, J. Camillerapp, and G. Lorette, “A robust de-tector for music staves,” inProceedings of the InternationalConference on Document Analysis and Recognition, 1993, pp.902–905.

[13] Shai Avidan and Ariel Shamir, “Seam carving for content-aware image resizing,” inACM Transactions on Graphics(SIGGRAPH 2007), 2007, vol. 26.

[14] Christoph Dalitz, Michael Droettboom, Bastian Czerwinski,and Ichiro Fujigana, “A comparative study of staff removalalgorithms,” IEEE Transactions on Pattern Analysis and Ma-chine Intelligence, 2004.

[15] Christoph Dalitz, Michael Droettboom, Bastian Czerwinski,and Ichiro Fujigana, “Staff removal toolkit for gamera(2005-2007,” .

new methodologies towards an automatic optical …jsc/students/2008anarebelo/2008... · new...

Documents