Ana Maria Rebelo
New methodologies towards an automatic
optical recognition of handwritten musical
scores
Master's Dissertation
Universidade do Porto
October 2008
Universidade do Porto
Departamento de Matemática Aplicada
Thesis submitted to the Faculdade de Ciências da
Universidade do Porto for the degree of
Master in Mathematical Engineering
New methodologies towards an automatic
optical recognition of handwritten musical
scores
Ana Maria Rebelo
Dissertation carried out under the supervision of
Prof. Dr. Jaime S. Cardoso (INESC Porto, FEUP)
and the co-supervision of
Prof. Dr. Joaquim F. Pinto da Costa (DMA, FCUP).
Porto, October 2008
A painter paints pictures on canvas. But musicians paint their pictures on silence.
Leopold Stokowski
Abstract
Many musical works produced in the past are currently available only as original manuscripts or as photocopies. Their preservation has motivated a considerable amount of research in recent years. Digitization is commonly used as a preservation method that consists of creating a digital copy. A digital copy is easily accessible in a machine-readable format, which encourages browsing, retrieval, search and analysis, provides generalized access to digital content and, most importantly, preserves endangered works. However, it keeps only a portion of the information. Several systems have been presented to address the lack of a tool able to analyse scores and perform semantic search operations. Carrying out this task manually is very time-consuming and error-prone. While optical music recognition systems usually perform well on printed scores, the processing of handwritten musical scores by computers remains far from ideal.
One of the fundamental stages in optical music recognition is the detection and subsequent removal of staff lines. In this project we investigate a general-purpose, knowledge-free method for the automatic detection of music staff lines based on the stable paths approach. Lines affected by curvature, discontinuities and inclination are robustly detected. A staff removal algorithm is also developed by adapting an existing line removal approach to use the stable paths algorithm at the detection stage. Experimental results show that the proposed technique consistently outperforms well-established algorithms. As future work, the developed approach will be integrated in a web-based system providing seamless access to browsing, retrieval, search and analysis of submitted scores.
Symbol detection is also crucial for good performance in an optical music recognition system. In this work we propose a segmentation method based on the hierarchical decomposition of music sheets. The symbols are split into four different types to facilitate their extraction. Since we are working with handwritten scores, many difficult situations may occur: degradation of the score, widely varying handwriting styles and other undesirable discontinuities must all be taken into account. One way to reduce their impact on the classification process is to create a database in which several distortions are simulated. A new methodology is therefore presented in which elastic matching is used in conjunction with several classifiers, for instance neural networks and hidden Markov models. A thorough comparative study is made of the classifiers with and without elastic matching applied to handwritten scores. The results obtained are expected to outperform the current state of the art.
Summary (Resumo)
Many of the musical works produced in the past are currently available only as original manuscripts or photocopies. Their preservation has prompted a wide range of research in recent years. Digitization has commonly been used as a possible preservation method, which amounts to creating a digital copy. Although it offers easy access in a computer-readable format, allowing consultation, retrieval, search and analysis and, more importantly, the preservation of works at risk, while promoting generalized access to digital material, it in fact keeps only part of the information. Several systems have been presented to fill the lack of a tool able to carry out analyses and perform semantic search operations. Performing this task manually is very costly and error-prone. While optical music recognition systems generally perform well on printed scores, the computer processing of handwritten scores remains far from ideal.
One of the fundamental stages in optical recognition is the detection and subsequent removal of the staff lines. This project investigates a general-purpose, knowledge-free method for the automatic detection of staff lines based on the stable paths approach. Lines affected by curvature, discontinuities and inclination are robustly detected. A line removal algorithm is also developed by adapting an existing removal process to use the stable paths algorithm in the detection phase. The experimental results show that the proposed technique consistently outperforms well-established algorithms. As future work, the developed approach will be integrated in a web system providing access to browsing, retrieval, search and analysis of the submitted scores.
Symbol detection is also crucial for the good performance of an optical music recognition system. This work proposes a segmentation method based on a hierarchical decomposition of the music sheet. The symbols are divided into four different types to facilitate their extraction. Since we are dealing with handwritten scores, several situations are likely to occur. Degradation of the work, different handwriting styles and other undesirable discontinuities are situations to take into account. One way to reduce these problems in the classification process is to create a database in which several distortions are simulated. Accordingly, a new methodology is presented that uses elastic matching in conjunction with several classifiers, for example neural networks and hidden Markov models. A thorough comparative study of these classifiers, with and without elastic matching applied to handwritten scores, is presented. The results obtained are expected to surpass the current state of the art.
Acknowledgements
Was this project successfully accomplished? The period was long and sometimes difficult; however, I believe that the answer is yes. I am thankful to a number of people who made all this work possible and fun to perform.
I came to INESC Porto to finish my undergraduate degree in Mathematics Applied to Technology and I ended up staying for another year to complete my Master's course in Mathematical Engineering. I am sincerely grateful to Prof. Dr. Jaime S. Cardoso, my thesis supervisor, for giving me the opportunity to continue working with him, for his trust in my work, for the availability and patience he always showed, for his guidance, given with great rigor and professionalism, and for his wise knowledge. I also thank my thesis co-supervisor, Prof. Dr. Joaquim Pinto da Costa, for his availability to help and guide me with all his rigor and erudition.
It was a great pleasure to work at INESC Porto with a team as fabulous as the UTM, which did everything to integrate me into the group as quickly as possible. To all my colleagues who accompanied me on this journey and in one way or another supported and encouraged me, my sincere thanks. I also give special thanks to Artur Capela, my colleague in the OMR project, for all the patience and support given, and to Prof. Dr. Carlos Guedes, professor at ESMAE associated with this project.
I thank INESC Porto for providing the right environment for high-quality research and FCT (Fundação para a Ciência e Tecnologia) for financial support.
To my parents, who made a huge effort to support and help me realize my dreams and aspirations. To my twin sister, my greatest appreciation for all that she represents for me, for her support and for the countless hours of complicity. To D. Lurdes, a very special friend who made me face the difficulties with courage and determination. Last but not least, I thank a person very special to me, who accompanied me on this journey with great patience and a winning spirit, helping me in this work with his critical and rigorous eye. Thank you for your presence at all times.
I would like to finish these acknowledgements by stressing that with determination and humility our dreams can come true.
Ana Maria Silva Rebelo,
October, 2008
Contents
Contents
List of Figures
List of Tables
Glossary
I Introduction
1 Introduction
1.1 OMR Architecture
1.2 Objectives
1.3 Contributions of this Project and Related Publications
1.4 Dissertation Structure
2 Related Works
2.1 State of the Art in Staff Lines Detection
2.2 State of the Art in Musical Symbol Extraction and Classification
2.3 Background Knowledge
II Detection
3 Staff Lines Detection and Removal
3.1 Basic Concepts
3.2 Underlying Principle
3.3 Algorithm Outline
3.4 Stable Paths on a Graph
3.5 Proposed Algorithm
3.6 Design of the Weight Function
3.7 Staff Line Removal
3.8 Database of Music Scores
3.9 Evaluation Metrics and Results
III Segmentation and Classification
4 Segmentation and Classification Process
4.1 Musical Symbols
4.2 Segmentation Process
4.3 Classification Process
IV Conclusions and Future Work
5 Conclusion
5.1 Future Work
References
V Appendix
A Fundamentals
A.1 Primal Problem vs Dual Problem
A.2 Error Backpropagation Algorithm
A.3 Dalitz's Algorithm
A.4 Matching in Bipartite Graphs
A.5 Otsu Threshold Algorithm
A.6 Staff Line Removal Algorithm
A.7 Baum-Welch Algorithm
A.8 Viterbi Algorithm
B Table of confusion
B.1 Results Obtained Without Elastic Matching for the Handwritten Music Symbols
B.2 Results Obtained Without Elastic Matching for the Printed Music Symbols
B.3 Results Obtained With Elastic Matching for the Handwritten Music Symbols
B.4 Results Obtained With Elastic Matching for the Printed Music Symbols
C Articles Submitted to Conferences
List of Figures
1.1 Generic system architecture.
1.2 OMR architecture.
2.1 A directed acyclic graph and its linearization.
3.1 An illustrative example of the methodology.
3.2 Stable paths on a toy example.
3.3 Illustration of stable paths for Figure 3.1(a).
3.4 Length of a chord through a skeleton point at some angle ϕ.
3.5 Example (from [DDCF08]).
3.6 Two examples of music scores from the test set used in the experimental evaluation.
3.7 Some examples of deformations applied to the original image: a) Original; b) Curvature; c) Degradation after Kanungo; d) Staff line thickness variation; e) Staff line y-variation; f) Typeset emulation; g) Rotation; h) White speckles; i) Staff line interruptions.
3.8 Generation of the deformed images and the respective ground truth.
3.9 Examples of the results obtained using the staff line removal algorithm on our test set.
4.1 Example.
4.2 Example.
4.3 Example: a) Tie above noteheads; b) Tie under noteheads.
4.4 An illustrative example of the variability in the music symbols.
4.5 Segmentation process I.
4.6 Segmentation process II.
4.7 An illustrative example of two different connected components in the same bounding box.
4.8 An example of inconsistency in the beam thickness and in the link with the stems.
4.9 Beam segmentation process.
4.10 Notes segmentation process.
4.11 An example of sharp detection on a real score.
4.12 An example of sharp detection on an ideal score.
4.13 An example of how the music symbols in the ideal scores were manually split.
4.14 Results of the error metrics.
4.15 Results of the error metrics for the rotation deformation.
4.16 Results of the error metrics for the curvature deformation.
4.17 Neural network topology.
A.1 Neural network architecture with three layers, two inputs and one output.
A.2 Forward propagation of the signal in the neural network.
A.3 Comparison between the signal output and the target.
A.4 Propagation of the error signal d in backward mode in the neural network.
A.5 The calculation of weights in the neural network.
List of Tables
3.1 Deformations in the images.
3.2 Ranges of deformation parameters used in the tests: min:step:max.
3.3 Effect of different deformations on the overall staff detection error rates in percentage: average (standard deviation) of the false detection rate and miss detection rate. See [DDCF08] for parameter details.
3.4 Detection performance on real music scores in percentage: average (standard deviation).
3.5 Staff segment extraction errors based on the number of segments in an equivalence class r.
3.6 Removal performance on real music scores (in percentage): average (standard deviation).
3.7 Effect of different deformations on the overall staff removal error rates in percentage: average (standard deviation) - Individual pixels error.
3.8 Effect of different deformations on the overall staff removal error rates in percentage: average (standard deviation) - Staff line interruption error.
3.9 Effect of different deformations on the overall staff removal error rates in percentage: average (standard deviation) - Segmentation region level.
4.1 Clefs.
4.2 Notes.
4.3 Flags and beams.
4.4 Rests.
4.5 Ties and Slurs.
4.6 Accents.
4.7 Time signatures.
4.8 Number of neurons in the hidden layer with and without the elastic matching method (EM) in the dataset.
4.9 The number of states in the hidden Markov model with and without the elastic matching method (EM) in the dataset.
4.10 The values of the parameters C and γ in the support vector machines with and without the elastic matching method (EM) in the dataset.
4.11 Classes list of handwritten and printed music symbols.
4.12 Results obtained with the test set in the classification process for the handwritten music symbols.
4.13 Results obtained with the test set in the classification process for the printed music symbols.
4.14 Results obtained with the test set in the classification process for the handwritten music symbols.
4.15 Results obtained with the test set in the classification process for the printed music symbols.
4.16 The performance of the natural and sharp symbols.
B.1 Table of confusion of the nearest neighbor classifier.
B.2 Table of confusion of the neural network.
B.3 Table of confusion of the support vector machines.
B.4 Table of confusion of the hidden Markov model.
B.5 Table of confusion of the nearest neighbor classifier.
B.6 Table of confusion of the neural network.
B.7 Table of confusion of the support vector machines.
B.8 Table of confusion of the hidden Markov model.
B.9 Table of confusion of the nearest neighbor classifier.
B.10 Table of confusion of the neural network.
B.11 Table of confusion of the support vector machines.
B.12 Table of confusion of the hidden Markov model.
B.13 Table of confusion of the nearest neighbor classifier.
B.14 Table of confusion of the neural network.
B.15 Table of confusion of the support vector machines.
B.16 Table of confusion of the hidden Markov model.
Glossary
OMR Optical Music Recognition
SVM Support Vector Machine
HMM Hidden Markov Model
EM Elastic Matching
LTH Line Track Height
Part I
Introduction
Chapter 1
Introduction
Music, from the Greek μουσική (τέχνη), musike (techne), meaning the art of the muses, can be defined as an organized sequence of sounds and silences intended to produce aesthetic pleasure in the listener. Pictographs provide evidence that music has been known and practised since prehistory. Over the years, music has expanded into many styles and has served many different purposes, such as education or therapy. All known cultures have their own musical practice. Music is a pivotal part of the cultural heritage of any society, and its preservation, in all of its forms, must therefore be pursued. In this regard, the Universal Declaration on Cultural Diversity, adopted by the General Conference of UNESCO in 2001, asserts that cultural diversity is as necessary for humankind as biodiversity is for nature, and that policies to promote and protect cultural diversity are thus an integral part of sustainable development¹.
Portugal has a notorious lack of music publishing from virtually all eras of its musical history. Although most of the known original music manuscripts from before the twentieth century are kept in the national library in Lisbon, there is no repository of musical information from the last century. Although there are recent efforts to catalogue and preserve in digital form the Portuguese music of the twentieth century (notably the Music Information Center² and the section on musical heritage of the Institute of the Arts website³), most of the music pre-dating computer notation software was never published and still exists in the form of manuscripts or photocopies spread all over the country in discreet places. The risk of irreversibly losing this rich cultural heritage is a reality that must be taken seriously and dealt with accordingly. Digitization has been commonly used as a possible tool for preservation, offering easy duplication, distribution and digital processing. However, transforming paper-based music scores and manuscripts into a machine-readable symbolic format (facilitating operations such as search, retrieval and analysis) requires an Optical Music Recognition (OMR) system. Unfortunately, the current state of the art in handwritten music recognition is far from providing a satisfactory solution.
The project "Automatic recognition of handwritten music scores", initiated in 2007 by Instituto de Engenharia de Sistemas e Computadores do Porto (INESC Porto) and Escola Superior de Música e das Artes do Espectáculo (ESMAE), was the starting point for creating an OMR system that addresses some of the identified problems, which will be described in detail in the following sections.
¹ http://www.unesco.org/bpi/eng/unescopress/2001/01-112e.shtml
² http://www.mic.pt
³ http://patrimonio.dgartes.pt/?lang=pt
The aim of this project is to overcome the problem of musical symbol recognition in handwritten scores through the research and application of the most recent techniques of machine learning and artificial intelligence. There is also the intention of creating a web-based system providing generalized access to a wide corpus of unpublished handwritten music encoded in digital format. The database will not only centralize as much information as possible but will also serve to preserve the musical heritage in an innovative way with a wide range of possibilities [CCRG08a, CRCG08]. The architecture of the proposed system is represented in Figure 1.1.
Figure 1.1: Generic system architecture.
The system is composed of three different entities: repository, web server and web browser. Briefly, the Repository module stores the original scanned score, its digital counterpart in MusicXML and all the descriptive metadata inserted by the user. All the remaining system contents, such as the user information, are also stored in this entity. The Web Server is the user's access point to the system and to all the processing modules that run on the server, which include the search engine and the optical recognition engine for the musical scores. The Web Server interacts with the Repository and with the Web Browser, which provides the interface between the user and the system. The user interface in the Web Browser allows the complete management of the musical scores and associated metadata, as well as the administration of the system.
The project presented here is innovative for several reasons:
1. It has the ambition to develop a formalism to model, in a consistent way, the heterogeneous knowledge about the language and musical notation.
2. It aims to explore the wealth and potential of the most recent techniques of machine learning and artificial intelligence, not only for representing and merging knowledge, but also for making decisions.
3. It intends to digitize and preserve a wide corpus of handwritten scores in an unprecedented way.
4. It will include an OMR engine integrated with an archiving system and a user-friendly interface for searching, browsing and editing. The digitized scores will be stored in MusicXML, a recent and expanding music interchange format designed for notation, analysis, retrieval, and performance applications.
5. It intends to make the repository of handwritten scores accessible online for enjoyment, educational and musicological purposes.
The work of this thesis, carried out within this project, aims to develop new OMR algorithms for the web-based system.
1.1 OMR Architecture
The principal aims of an OMR application are the recognition, representation and storage of musical scores in a machine-readable format. An OMR program should thus be able to recognize the musical content and perform the semantic analysis of each musical symbol of a musical work. In the end, all the musical information should be saved in an output format that is easily readable by a computer.
The architecture of an OMR system depends on the methods used in the segmentation and recognition steps. Generally, the OMR process can be divided into three principal modules (see Figure 1.2):
1. Recognition of musical symbols from a music sheet.
2. Reconstruction of the musical information to build a logical description of musical notation.
3. Construction of a musical notation model for its representation as a symbolic description of the musical sheet.
It is important to note that this is a sequential architecture: the results obtained in one phase are the input to the next phase. The first module is typically further divided into three stages:
a) Image pre-processing: the application of several techniques (e.g. binarization, noise removal, blurring and deskewing) to make the recognition process more robust and efficient.
b) Staff line detection and removal, to obtain an image containing only the musical symbols.
c) Object segmentation and basic symbol recognition.
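The sequential structure of the first module can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions: the function names (`binarize`, `find_staff_rows`, `remove_staff_rows`, `recognition_front_end`), the fixed threshold and the row-projection staff detector are hypothetical stand-ins chosen for brevity, not the stable paths method studied in this thesis, and the symbol segmentation stage is left out.

```python
from typing import List, Tuple

# Assumed convention: a grayscale image is a list of rows, 0 = black, 255 = white.
Image = List[List[int]]


def binarize(image: Image, threshold: int = 128) -> Image:
    """Pre-processing stage: 1 marks ink, 0 marks background."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def find_staff_rows(binary: Image, min_fill: float = 0.8) -> List[int]:
    """Naive detection: rows whose fraction of ink pixels is at least min_fill."""
    return [y for y, row in enumerate(binary)
            if row and sum(row) / len(row) >= min_fill]


def remove_staff_rows(binary: Image, rows: List[int]) -> Image:
    """Naive removal: erase a staff pixel unless ink continues directly
    above or below it in a non-staff row (i.e. a symbol crosses the line)."""
    staff = set(rows)
    height = len(binary)
    cleaned = [row[:] for row in binary]
    for y in rows:
        for x in range(len(binary[y])):
            ink_above = y > 0 and (y - 1) not in staff and binary[y - 1][x] == 1
            ink_below = (y + 1 < height and (y + 1) not in staff
                         and binary[y + 1][x] == 1)
            if not (ink_above or ink_below):
                cleaned[y][x] = 0
    return cleaned


def recognition_front_end(image: Image) -> Tuple[Image, List[int]]:
    """Run the stages in sequence: each stage consumes the previous output."""
    binary = binarize(image)
    rows = find_staff_rows(binary)
    return remove_staff_rows(binary, rows), rows
```

Each function takes the output of the previous one, which mirrors the sequential architecture described above; a weak result in an early stage therefore propagates to every later stage.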
The second and third modules (Musical Notation Reconstruction and Final Representation Construction) are intrinsically intertwined. In the second module, musical notation reconstruction, the symbol primitives are merged to form musical symbols. Graphical and syntactic rules are usually applied in this step; they introduce the context information needed to validate the output of the preceding music symbol recognition module and to resolve its ambiguities. A document is also produced in which the detected symbols are interpreted and assigned a musical meaning. In the third module, final representation construction, this document is converted into a musical description format, such as MusicXML, that allows for storage.
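As a rough illustration of the final representation step, the sketch below serializes a list of already interpreted notes as a minimal MusicXML fragment using Python's standard `xml.etree.ElementTree`. The function name `notes_to_musicxml`, the single-measure layout and the fixed quarter-note duration are assumptions made for brevity; a real exporter would also have to write key and time signatures, rests and other symbols, and would need to be validated against the MusicXML schema.

```python
import xml.etree.ElementTree as ET


def notes_to_musicxml(notes):
    """Serialize (step, octave) pairs as quarter notes in one measure.

    Hypothetical helper for illustration only; it is not the exporter
    used in the project.
    """
    score = ET.Element("score-partwise")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Music"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    # One division per quarter note, treble (G) clef on the second line.
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = "1"
    clef = ET.SubElement(attributes, "clef")
    ET.SubElement(clef, "sign").text = "G"
    ET.SubElement(clef, "line").text = "2"

    for step, octave in notes:
        note = ET.SubElement(measure, "note")
        pitch = ET.SubElement(note, "pitch")
        ET.SubElement(pitch, "step").text = step
        ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = "1"
        ET.SubElement(note, "type").text = "quarter"

    return ET.tostring(score, encoding="unicode")


if __name__ == "__main__":
    print(notes_to_musicxml([("C", 4), ("E", 4), ("G", 4)]))
```

The point of the sketch is that, once symbols carry a musical meaning (pitch and duration), writing the storage format is a mechanical traversal; the difficult part of the problem lies in the earlier recognition and reconstruction modules.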
In this thesis I focused on the music symbol recognition step. The system is composed of a preliminary segmentation stage and a musical symbol classification stage. More specifically, the segmentation phase consists of staff line detection and removal, followed by musical symbol extraction. In the classification step, four distinct classifiers are used together with an elastic matching method (see Figure 1.2). This method is applied with all classifiers to simulate possible deformations in the musical symbols. Moreover, it also makes it possible to enhance the test set.
Figure 1.2: OMR architecture.
1.2 Objectives
In order to overcome the limitations of current methods for musical symbol recognition in handwritten musical scores and to organize them hierarchically, I researched the application of the most recent techniques of machine learning and artificial intelligence. I also developed algorithms for musical symbol detection and recognition. Musical notation is one of the most widely known languages. It has developed continuously over time, with requirements of consistency and precision. For this reason, the proposed methodology should be naturally adaptable to handwritten scores and to different standard musical notations.
In general, the first step in the recognition of handwritten musical scores is staff line detection. This operation is one of the most important phases in musical symbol recognition, because it conditions the results of the subsequent steps. With this purpose in mind we had the following objectives:
a) To test thoroughly the proposed algorithm for staff line detection using an appropriate database and error metrics, and to compare this algorithm with others from the state of the art.
b) To create a new database of ideal musical scores from an existing one, in which several controlled deformations are applied to the ideal musical scores to simulate plausible problems of handwritten musical scores.
c) To use the proposed staff line detection algorithm as a first step in some state-of-the-art staff removal algorithms and to conduct a series of experiments using appropriate error metrics.
Musical symbol extraction is the next step, before classification. The segmentation process is based on existing algorithms [DDCF08]. In this work, the symbols we want to recognize were split into four different types: symbols characterized by a vertical segment, symbols that link the notes, the remaining symbols connected to staff lines, and symbols above and under the staff lines. Since we are working with handwritten scores, undesirable situations such as score degradation and widely varying handwriting styles may occur. The objectives for this phase were:
a) To research and develop an algorithm to extract the handwritten musical symbols.
b) To simulate the variability in the music symbols with an elastic matching method, with the aim of preparing the classifier for the different situations that may occur in handwritten music symbols.
c) To compare and study different classifiers for musical symbol recognition.
1.3 Contributions of this Project and Related Publications
This dissertation presents the following contributions for the preservation and the general access
to musical and cultural heritage:
1. The introduction to the music analysis community of the algorithm for staff line
detection based on the Stable Paths approach.
2. The creation of a database of real scores with its references: detection and removal
references.
3. New algorithms for the automatic detection and classification of musical symbols. In this
way, analysis and search of the score by the user will be easier.
4. Integration of the new algorithms in the web-based system being developed under the
project.
The work related to the Project to which this dissertation belongs has already resulted in the
publication of the following papers:
A Connected Path Approach for Staff Detection on a Music Score, Jaime S. Cardoso,
Artur Capela, Ana Rebelo, Carlos Guedes, in the IEEE International Conference on
Image Processing (ICIP 2008).
Integrated Recognition System for Music Scores, Artur Capela, Jaime S. Cardoso, Ana
Rebelo, Carlos Guedes, in the International Computer Music Conference (ICMC 2008).
Staff Line Detection and Removal with Stable Paths, Artur Capela, Ana Rebelo, Jaime
S. Cardoso, Carlos Guedes, in Proceedings of the International Conference on Signal
Processing and Multimedia Applications (SIGMAP 2008), pages 263–270, 2008.
The following papers have been submitted and await a decision:
Optical Recognition of Music Symbols: a comparative study, Ana Rebelo, Artur Capela
and Jaime S. Cardoso, in International Journal of Document Analysis and Recognition
(IJDAR 2008).
Staff Detection with Stable Paths, Jaime S. Cardoso, Artur Capela, Ana Rebelo, Carlos
Guedes, Joaquim Pinto da Costa, in IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI 2008).
During the second semester of the academic year 06/07 one more paper was written. The first
version of the proposed algorithm has already been presented and published:
A Shortest Path Approach for Staff Line Detection, Ana Rebelo, Artur Capela, Joaquim
F. Pinto da Costa, Carlos Guedes, Eurico Carrapatoso, Jaime S. Cardoso, in Proceedings
of the International Conference on Automated Production of Cross Media Content for
Multi-channel Distribution (AxMedis 2007), pages 79–85, Nov. 2007.
1.4 Dissertation Structure
This dissertation is organized in 5 chapters that describe the work developed in the last year.
It also has a set of appendices with complementary information that supports the description of
the work done. After this introductory Chapter, the work related to this
project is described in Chapter 2. It consists of three parts: a presentation of the state
of the art of the algorithms for staff line detection and of the existing algorithms
for staff line removal; a presentation of the state of the art of the algorithms for
musical symbol classification; and a background presentation of the methods used
in the classification phase. In Chapter 3, the proposed stable paths algorithm to detect the
staff lines is described. The database used, the error metrics applied in the experiments
done to test the stable paths algorithm, and the results obtained are also presented. In Chapter
4, the segmentation and classification phase is presented. A brief description of the musical
symbols that the algorithm tries to recognize is also given. Finally, conclusions are drawn
and future work is suggested in Chapter 5.
Chapter 2
Related Works
Research in the OMR field began with Pruslin [Pru66] and Prerau [Pre70]. However,
it was only in the 1980s, when digitization equipment became accessible,
that work in this area expanded [Car89, Ng95, Coü96, Bai97]. Over the years, several
commercial OMR software packages have appeared, but none performs satisfactorily
in terms of precision and robustness. Progress in this area has been held back by the complexity
of the OMR task, which is caused by the two-dimensional structure of musical notation and by
the existence of several combined symbols organized around the note heads. Until
now, even the most developed recognition systems (PIANOSCAN, NOTESCAN in Nightingale,
MIDISCAN in Finale, PHOTOSCORE in Sibelius, SMARTSCORE, SHARPEYE, etc.) cannot
identify all music notations. Besides that, classic OMR is more focused on regular printed music
sheets, so good performance is only obtained when this type of score is processed.
In this thesis, the goal is the recognition of standard handwritten music. These manuscripts
raise additional problems compared with printed music. Handwritten
musical scores tend to be rather irregular and are shaped by each author's own writing style.
Moreover, since most of these works are old, the paper on which they are
written may have degraded over the years, making it much harder to identify
their contents correctly. Within a single score by a single author, the handwritten symbols are
likely to vary in size, shape and intensity. As a result, the detection
and recognition process is even more complicated. Furthermore, problems exist not only at
this level but also in the detection of the handwritten staff lines. These are rarely straight and
horizontal, and are not parallel to each other. For example, some staves may be tilted one
way or another on the same page, or they may be curved. Depending on the level of paper
degradation, there may also be discontinuities in the staff lines and in the remaining symbols.
In general, these scores are many years old, and the quality of both paper and ink has
therefore decayed sharply. Many works are also photocopies of photocopies, which
adds noise with each generation of copying.
2.1 State of the Art in Staff Lines Detection
One important stage in OMR is staff line detection, because it makes it possible to
isolate the musical symbols present in the score and thus enables their extraction. Staff line
detection is, consequently, one of the fundamental steps in all OMR processes, and the
subsequent steps depend heavily on the performance and results obtained in this initial phase.
For this reason, OMR algorithms generally begin with this operation.
The problem of staff line detection is often considered simultaneously with the goal of their
removal, although exceptions exist [Mat85, MO07, Szw05, Pug06]. The simplest approach
consists in finding local maxima in the horizontal projection of the black pixels of the image
[KI90, Fuj04, RCF+93, MN96, BBN01, TSM06]. These local maxima represent line positions,
assuming straight and horizontal lines. Several horizontal projections can be made with
different image rotation angles, keeping the image in which the local maxima are largest. This
eliminates the assumption that the lines are always horizontal. An alternative strategy for
identifying staff lines is to use vertical scan lines [Car89]. This process is based on a Line
Adjacency Graph (LAG). The LAG is searched for potential sections of lines: sections that satisfy
criteria related to aspect ratio, connectedness and curvature. More recent works present a
more sophisticated use of a combination of projection techniques in order to improve the basic
approach [Bai97, BBN01, RB05].
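The projection-based approach described above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the function names, the binary image convention (1 = black) and the `min_fraction` threshold are assumptions made only for this illustration, and the lines are assumed straight and horizontal.

```python
def horizontal_projection(image):
    """Count the black pixels (value 1) in every row of a binary image."""
    return [sum(row) for row in image]


def staff_line_rows(image, min_fraction=0.5):
    """Return the rows that are local maxima of the horizontal projection.

    A row is kept only if at least `min_fraction` of its pixels are black
    (an assumed threshold, not a value from the literature).
    """
    profile = horizontal_projection(image)
    width = len(image[0])
    rows = []
    for y, count in enumerate(profile):
        above = profile[y - 1] if y > 0 else 0
        below = profile[y + 1] if y + 1 < len(profile) else 0
        if count >= min_fraction * width and count >= above and count > below:
            rows.append(y)
    return rows


# Toy score: two full-width lines (rows 1 and 4) crossed by a small symbol.
toy_image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(staff_line_rows(toy_image))
```

On curved or tilted lines the black pixels of one staff line spread over several rows, which is why the text mentions repeating the projection at several rotation angles.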
Fujinaga [Fuj04] incorporates a set of image processing techniques in the algorithm, including
run-length coding (RLC), connected-component analysis, and projections. After applying
RLC to find the thickness of the staff lines and the space between them, any vertical black
run that is more than twice the staff line height is removed from the original. Then, the
connected components are scanned in order to eliminate any component whose width is less than
the staff space height. After a global deskewing, taller components, such as slurs and dynamic
wedges, are removed.
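As a rough illustration of the run-length step, the sketch below estimates the staff line height and the staff space height as the most frequent black and white vertical run lengths. Taking the most frequent run length is a common convention and is assumed here; it is not a detail stated in the text. The function names are hypothetical, and the image is assumed to be binary (1 = black) and to contain at least one black and one white run.

```python
from collections import Counter


def vertical_runs(image):
    """Yield (colour, length) for every vertical run of a binary image."""
    height = len(image)
    for x in range(len(image[0])):
        run_colour, run_length = image[0][x], 1
        for y in range(1, height):
            if image[y][x] == run_colour:
                run_length += 1
            else:
                yield run_colour, run_length
                run_colour, run_length = image[y][x], 1
        yield run_colour, run_length  # last run of the column


def estimate_staff_metrics(image):
    """Return (staff line height, staff space height) as modal run lengths."""
    black, white = Counter(), Counter()
    for colour, length in vertical_runs(image):
        (black if colour == 1 else white)[length] += 1
    return black.most_common(1)[0][0], white.most_common(1)[0][0]


# Synthetic staff: five lines of thickness 1 separated by 3 white rows.
synthetic = [[1 if y % 4 == 2 else 0 for _ in range(10)] for y in range(21)]
line_height, space_height = estimate_staff_metrics(synthetic)
print(line_height, space_height)
```

With these two estimates, the removal rule quoted above ("more than twice the staff line height") becomes a simple comparison on each vertical black run.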
Other techniques for finding staff lines include the grouping of vertical columns based on
their spacing, thickness and vertical position on the image [RP96], rule-based classification
of thin horizontal line segments [Mah82], and line tracing [Pre70, RT88]. The methods
proposed in [MO07, Szw05] operate on a set of staff segments, with methods for linking two
segments horizontally and vertically and merging two segments with overlapping position into
one. The method of Dalitz et al. [DDCF08] is an improvement on these methods.
In spite of the variety of methods available, they all suffer from some limitations. In
particular, lines with some curvature or discontinuities are inadequately resolved. The dash
detector [LCL93] is one of the few works that try to handle discontinuities. The dash detector is an
algorithm that searches the image, pixel by pixel, finding black pixel regions that it classifies
as stains or dashes. Then, it tries to unite the dashes to construct lines.
A problem common to all the above-mentioned techniques is that they try to build staff lines
from local information, without properly incorporating global information in the detection
process. To our knowledge, none of the methods proposed in the literature tries to define a
reasonable process from the intrinsic properties of staff lines, namely the fact that they are the
only extensive black objects on the music score. We argue that the most interesting
techniques usually arise when the detection process is defined as the result of optimizing some
global function. In this thesis, we suggest a graph-theoretic framework where staff lines are the
solutions of a global optimization process.
2.2 State of the Art in Musical Symbol Extraction and
Classification
The segmentation task has been studied by several authors [Mah82, RT88, Car89,
RB05, TSM06]. This process deserves careful attention because it affects the
classification stage. Major problems result from the difficulty of obtaining individual meaningful
objects. This is due to printing and digitization as well as to paper degradation and
the lack of a standard notation. In addition, there are distortions caused by staff lines, broken
and touching symbols, and a high density of symbols with different sizes and shapes.
Besides, little research has been done on handwritten scores [FLS05].
The most usual approach consists in extracting elementary graphic symbols (note heads, rests,
dots, etc.) that can be composed to build music notation. Usually, the primitive segmentation
step is performed together with the classification task [Mah82, RT88, Car89, KI90, RB05, TSM06];
however, exceptions exist [Fuj04, BBN01, Pug06]. In [Mah82] and [Car89] the same
technique used for staff line detection is applied to the detection of musical symbols. Mahoney [Mah82]
builds a set of candidates for one or more symbol types and then uses descriptors to select
the matching candidates. This technique is not feasible for handwritten scores, because
manuscript symbols do not have stable shapes and sizes. Carter [Car89], in turn, uses
the LAG to extract symbols. The resulting objects are classified according to the
bounding box size and to the number and organization of their constituent sections. Once again,
this classification process is not promising for handwritten symbols, for the reasons
presented above. Other authors [Fuj04, BBN01] have chosen to apply projections to detect
symbol primitives. The recognition is done using features extracted from the projection
profiles. In [Fuj04] the k-nearest neighbour rule is used in the classification phase, whereas neural
networks are the classifier used in [BBN01].
Randriamahefa [RCF+93] proposed a structural method based on the construction of graphs for
each symbol. The symbols are isolated by using a region growing method and thinning. Template
matching is adopted in [Mat85, MN96, Ros02, RB05, TSM06]. In [Ros02, RB05] a fuzzy model
that depends on robust symbol detection and template matching results was developed. This
method aims to deal with uncertainty, flexibility and fuzziness at the symbol level. In this
approach the segmentation process has two steps: individual analysis of musical symbols and
fuzzy model construction. In the first step, the vertical segments are detected by a region growing
method and template matching. Then, beams are detected by a region growing algorithm and
a modified Hough transform. The remaining symbols are extracted again by template
matching. This first step yields three recognition hypotheses, each assigning the pattern to a
possible class. The fuzzy model is used to make a consistent decision. The proposed process
incorporates graphical and syntactic rules. Besides, it allows learning procedures to be applied
when potential errors occur, in an effort to gain robustness.
Other techniques for extracting and classifying musical symbols include rule-based systems to
represent all the musical information [RT88], a collection of processing modules that
communicate through a common working memory [KI90] and pixel tracking with template matching [TSM06].
Toyama [TSM06] checks for coherency in the primitive symbols detected by estimating touching
positions. This evaluation is done using music writing rules. Coüasnon [CC93, Coü96] proposed
a recognition process entirely controlled by a grammar which formalizes the musical knowledge.
In [RP96] the segmentation process involves three stages: line and curve detection by LAG,
detection of accidentals, rests and clefs by a character profile method, and note head recognition
by template matching. The contextual recognition is done by graph grammars. In [Pug06] the
segmentation task is based on Hidden Markov Models. This process performs segmentation and
classification simultaneously. However, this technique works only for very simple scores, that
is, scores without slurs and without more than one symbol in the same column and staff.
We can conclude, based on this brief description of the state of the art, that although OMR
systems already contribute to the analysis of musical scores through several methods, the
problems posed by handwritten scores are still far from solved. Therefore, new research and new
algorithms are important and necessary. Of all the works presented, only the fuzzy
model proposed in [RB05] appears to be a good method to take into
account in future work. It will be enriching for this project to carry out a comparative study of this
method with elastic matching and also with the segmentation process proposed in this thesis.
2.3 Background Knowledge
In this Section we review some concepts necessary for a better understanding of the work
presented in the course of this thesis.
2.3.1 Graphs
A graph is a pair $G = (V, E)$ of sets such that $E \subseteq [V]^2$ (footnote 1); thus, the elements of $E$ are
2-element subsets of $V$. $V$ is the set of vertices (or nodes), and $E$ is
the set of edges (or lines) $(p, q)$, $p, q \in V$ [Die05].
A graph with vertex set $V$ is called a graph on $V$. $V(G)$ is the vertex set of a graph $G$ and
$E(G)$ its edge set (footnote 2). The number of vertices of a graph $G$ is its order, $|G|$, and $\|G\|$ is the number
of edges. A vertex $v$ is incident with an edge $e$ if $v \in e$; then $e$ is an edge at $v$. The two
vertices incident with an edge are its endvertices or ends, and an edge joins its ends. If $x \in X$ and
$y \in Y$, then $xy$ is an $X$-$Y$ edge. $E(X, Y)$ is the set of all $X$-$Y$ edges in a set $E$. The
set of all the edges in $E$ at a vertex $v$ is denoted by $E(v)$. Two vertices $x, y$ of a graph $G$
are adjacent if $xy$ is an edge of $G$. Two edges $e \neq f$ are adjacent if they have an end in common.
1 $[V]^k$ denotes the set of all $k$-element subsets of $V$.
2 Note that these conventions are independent of any names: for instance, if we have a graph $H = (W, F)$, the
vertex set $W$ is still referred to as $V(H)$, not as $W(H)$.
The graph is weighted if a weight $w(p, q)$ is associated with each edge, and it is called a digraph if
the edges are directed, i.e., $(p, q) \neq (q, p)$. In other words, a digraph is a pair $(V, E)$ of disjoint
sets of vertices and edges together with two maps $\mathrm{init} : E \to V$ and $\mathrm{ter} : E \to V$ assigning to
every edge $e$ an initial vertex $\mathrm{init}(e)$ and a terminal vertex $\mathrm{ter}(e)$.
A path is a (non-empty) graph $P = (V, E)$ of the form
$$V = \{x_0, x_1, \ldots, x_k\}, \qquad E = \{x_0x_1, x_1x_2, \ldots, x_{k-1}x_k\},$$
where the $x_i$ are all distinct. The end vertices $x_0$ and $x_k$ are linked by $P$ and the vertices
$x_1, \ldots, x_{k-1}$ are the inner vertices of $P$. The path cost is the sum of the weights of the arcs in the
path. The length of a path is the number of its edges, and the path of length $k$ is denoted by
$P^k$. A non-empty graph $G$ is connected if any two of its vertices are linked by a path
in $G$. A directed graph $D$ is an orientation of a graph $G$ if $V(D) = V(G)$ and
$E(D) = E(G)$, and if $\{\mathrm{init}(e), \mathrm{ter}(e)\} = \{x, y\}$ for every edge $e = xy$.
In graph theory, the shortest-path problem seeks the shortest path connecting two nodes;
efficient methods are available to solve this problem, such as dynamic programming algorithms.
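To make the notions of weighted digraph, path cost and path length concrete, a small sketch follows. The dictionary representation and the vertex names are assumptions chosen for this illustration only.

```python
def path_cost(weights, path):
    """Sum of the arc weights w(p, q) over consecutive vertices of `path`."""
    return sum(weights[(p, q)] for p, q in zip(path, path[1:]))


def path_length(path):
    """Number of edges of the path x0, x1, ..., xk (that is, k)."""
    return len(path) - 1


# Hypothetical weighted digraph: keys are directed edges (p, q).
weights = {("a", "b"): 2.0, ("b", "c"): 0.5, ("a", "c"): 4.0}

print(path_cost(weights, ["a", "b", "c"]), path_length(["a", "b", "c"]))
print(path_cost(weights, ["a", "c"]), path_length(["a", "c"]))
```

Here the path with more edges has the lower cost, which is the distinction between length and cost drawn in the definitions above. Because the graph is directed, `("b", "a")` is not an edge even though `("a", "b")` is.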
Dynamic Programming
Before presenting how dynamic programming techniques work, let us look at the
following example [SDV06]. Figure 2.1 represents a directed acyclic graph and its linearization
(a topological ordering in which the nodes are arranged on a line so that all edges go from left to right).
This linearization is important for computing shortest paths.
Figure 2.1: A directed acyclic graph and its linearization (start node E; linearized order E, B, A, C, D).
Suppose we want to figure out distances from node E to the other nodes. For concreteness,
let's focus on node C. The only way to get to it is through its predecessors, B or A; so to find
the shortest path to C, we need only compare these two routes:
$$\mathrm{dist}(C) = \min\{\mathrm{dist}(B) + 3,\ \mathrm{dist}(A) + 4\}.$$
A similar relation can be written for every node. If we compute these dist values in the
left-to-right order of Figure 2.1, then by the time we get to a node v we already have all the
information needed to compute dist(v). We are therefore able to compute all distances in a
single pass:
    initialize all dist(·) values to ∞
    dist(s) = 0
    for each v ∈ V \ {s}, in linearized order:
        dist(v) = min_{(u,v) ∈ E} { dist(u) + l(u, v) }
Note that this algorithm is solving a collection of subproblems, {dist(u) : u ∈ V}:
it starts with the smallest of them, dist(s) for the start node s, since it immediately knows its answer to
be 0; then it proceeds with progressively larger subproblems (distances to vertices that are
further and further along in the linearization), where a subproblem is considered large if
a lot of other subproblems must be solved before it can be reached. This is a very general
technique. At each node, the algorithm computes some function of the values of the node's
predecessors. In this case, the particular function is a minimum of sums.
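The single-pass procedure can be sketched in Python as follows. The function name and the data layout are assumptions made for this illustration. In the example graph, only the two edges into C (B→C with length 3 and A→C with length 4) and the linearized order come from the text; the remaining edges and lengths are hypothetical, since the figure is not reproduced here.

```python
import math


def dag_shortest_paths(order, edges, source):
    """Shortest distances from `source` in a DAG.

    `order` lists the nodes in linearized (topological) order and
    `edges` maps a directed edge (u, v) to its length l(u, v).
    """
    dist = {v: math.inf for v in order}
    dist[source] = 0
    # Group the incoming edges of every node: v -> [(u, l(u, v)), ...]
    incoming = {v: [] for v in order}
    for (u, v), length in edges.items():
        incoming[v].append((u, length))
    # One pass in linearized order: all predecessors are already solved.
    for v in order:
        if v == source:
            continue
        candidates = [dist[u] + length for u, length in incoming[v]]
        if candidates:
            dist[v] = min(candidates)
    return dist


order = ["E", "B", "A", "C", "D"]
edges = {
    ("B", "C"): 3, ("A", "C"): 4,                 # from the text
    ("E", "B"): 1, ("E", "A"): 2, ("B", "A"): 5,  # assumed for illustration
    ("A", "D"): 6, ("C", "D"): 1,                 # assumed for illustration
}
print(dag_shortest_paths(order, edges, "E"))
```

Each node is visited once and each edge is examined once, so the pass is linear in the size of the graph.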
This is dynamic programming: a powerful algorithmic paradigm for efficiently solving a
wide range of search and optimization problems which exhibit the characteristics of overlapping
subproblems (the problem can be broken down into subproblems which are reused several times)
and optimal substructure (optimal solutions of subproblems can be used to find the optimal
solution of the overall problem).
When solving a problem by dynamic programming, it is crucial to know what the subproblems
are. Each node will represent a subproblem, and each edge will represent a precedence
constraint, of the form (i − 1, j) → (i, j), (i, j − 1) → (i, j), and (i − 1, j − 1) → (i, j), on the
order in which the subproblems are tackled. We can also put weights on the edges.
2.3.2 Classification Methods
Different approaches were evaluated in this work for the classification problem: hidden Markov
models, support vector machines, neural networks and k-nearest neighbour. Since we are
trying to classify handwritten musical symbols, one main
problem arises: the enormous variation in the symbols. Therefore, the classification process needs to take
into account the variability in the writing style. An Elastic Matching method [Nis95, JZ97]
was used to expand our database so that it better covers this variability. In
the next subsections a brief explanation of these methods is provided.
Hidden Markov Models
Hidden Markov Models (HMMs) have rarely been used in OMR, except in the experiments
reported in [KPM96, MMM04, Pug06]. The application of this technique to musical symbol
classification has its origins in optical character recognition. One of the reasons for the use of HMMs
lies in their capability to perform segmentation and recognition at the same time. Owing to
time constraints, HMMs were used here only to recognize the symbols.
An HMM is a doubly stochastic process that generates a sequence of symbols, with an underlying
stochastic process that is hidden and can only be detected through another process whose
realizations are observable [BS00]. The hidden process consists of a set of states connected to each
other by transition probabilities. Left-right HMMs were used here. The transition probabilities from
a state $S_i$ to another state $S_j$ are given by $A = \{a_{ij}\}$, where
$$a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 1 \le i, j \le N.$$
The observed process consists of a set of outputs or observations. Each observation
is emitted by a state according to some probability density function. The set of observation
probabilities is given by $B = \{b_j(k)\}$, where
$$b_j(k) = P[o_t = x_k \mid q_t = S_j], \quad 1 \le k \le M, \quad j = 1, 2, \ldots, N.$$
$b_j(k)$ represents the probability of the observation $x_k$ in state $S_j$, $o_t$ denotes the observation
at time $t$ and $q_t$ represents the state at time $t$. An HMM can now be formulated as $\lambda = (A, B, \pi)$,
where $\pi$ is the set of initial state probabilities [WFJC01].
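To show how the triple λ = (A, B, π) is used in practice, the sketch below evaluates the likelihood of an observation sequence with the standard forward algorithm. The forward algorithm is not described in the text and is introduced here only as an illustration; the two-state left-right model and all its probabilities are hypothetical values, and observations are assumed to be discrete indices.

```python
def forward_likelihood(A, B, pi, observations):
    """Return P(O | lambda) for a discrete HMM lambda = (A, B, pi).

    A[i][j] = P[q_{t+1} = S_j | q_t = S_i]
    B[j][k] = P[o_t = x_k | q_t = S_j]
    pi[j]   = P[q_1 = S_j]
    """
    n_states = len(pi)
    # Initialisation: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [pi[j] * B[j][observations[0]] for j in range(n_states)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical left-right model: state 0 may stay or move on to state 1,
# and state 1 can never go back (a_10 = 0).
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [1.0, 0.0]
print(forward_likelihood(A, B, pi, [0, 1]))
```

In a classification setting one would typically train one such model per symbol class and assign a sample to the class whose model gives the highest likelihood; that usage is stated here as a common convention, not as the procedure of this work.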
Support Vector Machines
Support Vector Machines (SVMs), pioneered by Vapnik [Vap98], treat the problem of
classification as a problem of quadratic optimization. The main idea of this technique is the
construction of a hyperplane as the decision surface in such a way that the margin of separation
between positive and negative examples is maximized. Support vector machines classify the
data using support vectors [Hay98].
Without loss of generality, SVMs try to maximize the margin of the optimal hyperplane
which separates the data. This is typically done in a space of much higher dimension than the
original feature space. Formally, given the training set $\{(x_i, d_i)\}_{i=1}^{N}$ with input data $x_i \in \mathbb{R}^P$
and corresponding binary class labels $d_i \in \{-1, 1\}$, the optimal hyperplane for linearly separable data is
defined by $g(x) = w^T \varphi(x) + b$, where $\varphi(x)$ denotes a fixed feature-space transformation and $b$ a
bias parameter. $x$ is assigned to class $1$ if $g(x) > 0$ or to class $-1$ if $g(x) < 0$. With a suitable
rescaling of $w$ and $b$, this is equivalent to requiring $d_i [w^T \varphi(x_i) + b] \ge 1$, $i = 1, \ldots, N$.
In summary, maximizing the margin is equivalent to solving
$$\min_{w, b} \ \frac{1}{2} w^T w \quad \text{s.t.} \quad d_i [w^T \varphi(x_i) + b] \ge 1, \quad i = 1, \ldots, N \tag{2.1}$$
If the training classes are not linearly separable, the above conditions and problem
formulation cannot be satisfied. For this reason, slack variables $\xi_i$, $i = 1, \ldots, N$, are added.
These allow a penalty to be applied to the data points that are wrongly classified. Finally, the objective
is to minimize the error. That is,
$$\min_{w, b, \xi} \ \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad d_i [w^T \varphi(x_i) + b] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, N \tag{2.2}$$
where the parameter $C > 0$ controls the trade-off between the slack variables and the margin.
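The role of the slack variables in (2.2) can be checked numerically. The sketch below does not train an SVM: for a given, hypothetical hyperplane (w, b) it computes the smallest slacks that satisfy the constraints and the resulting value of the objective. The feature map is assumed to be the identity, φ(x) = x, and all numbers are illustrative.

```python
def soft_margin_objective(w, b, C, samples, labels):
    """Evaluate (1/2) w^T w + C * sum_i xi_i for a fixed hyperplane.

    Each slack is the smallest xi_i >= 0 with d_i (w^T x_i + b) >= 1 - xi_i,
    that is xi_i = max(0, 1 - d_i (w^T x_i + b)).
    """
    slacks = []
    for x, d in zip(samples, labels):
        g = sum(wi * xi for wi, xi in zip(w, x)) + b
        slacks.append(max(0.0, 1.0 - d * g))
    margin_term = 0.5 * sum(wi * wi for wi in w)
    return margin_term + C * sum(slacks), slacks


# Hypothetical data: two points per class, hyperplane x_1 = 0.
samples = [[2.0, 0.0], [0.5, 1.0], [-1.0, 0.0], [0.5, 0.0]]
labels = [1, 1, -1, -1]
value, slacks = soft_margin_objective([1.0, 0.0], 0.0, 2.0, samples, labels)
print(value, slacks)
```

A point with zero slack lies on or outside the margin, a slack between 0 and 1 marks a correctly classified point inside the margin, and a slack above 1 marks a misclassified point. Increasing C makes such violations more expensive relative to the margin term.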
In the feature space it is easier to solve the dual problem, and sometimes it is the only way to
train the support vector machines. It is possible to formulate the dual problem (footnote 3) for a
non-separable training sample $\{(x_i, d_i)\}_{i=1}^{N}$ as follows:
$$\max_{\alpha} \ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j k(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i d_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, N \tag{2.3}$$
where $k(x_i, x_j) = \varphi^T(x_i) \varphi(x_j) = \sum_{l=0}^{m_1} \varphi_l(x_i) \varphi_l(x_j)$, $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, N$. $\varphi_l(x_i)$
is the $l$-th component of the image $\varphi(x_i)$ of $x_i$, and $m_1$ is the dimension of the feature space.
A binary classifier was described above, whereas this work deals with a multiclass problem.
Several extensions of the originally proposed binary classification problem have been
suggested [eeFc02, HL02, MA98]. Usually, there are two types of approach for
multiclass classification problems. One is to build and combine several binary classifiers
(one-against-one and one-against-all); the other is to consider all classes directly in a single
quadratic problem. The one-against-one methodology was used in this work. For each pair of
classes $j$ and $k$, this method trains a classifier by solving the following binary problem:
$$\min_{w^{jk}, b^{jk}, \xi^{jk}} \ \frac{1}{2} (w^{jk})^T w^{jk} + C \sum_{i=1}^{N} \xi_i^{jk} \quad \text{s.t.} \quad y_i [(w^{jk})^T \varphi(x_i) + b^{jk}] \ge 1 - \xi_i^{jk}, \quad \xi_i^{jk} \ge 0, \quad i = 1, \ldots, N \tag{2.4}$$
where the $N$ points are here the training points belonging to classes $j$ and $k$, with their labels
recoded as $+1$ (class $j$) and $-1$ (class $k$) for this subproblem. In general, the training data are
$(x_1, y_1), \ldots, (x_l, y_l)$, where $x_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, and $y_i \in \{1, \ldots, K\}$ is the
class of $x_i$; the training data $x_i$ are mapped to a high-dimensional space by the function $\varphi$.
The three most common types of inner-product kernels for SVMs are: polynomial learning
machine, radial-basis function network and hyperbolic tangent. In this work a radial-basis
function network was used, given by:
$$k(x, x_i) = \exp(-\gamma \|x - x_i\|^2), \quad \gamma \ge 0 \tag{2.5}$$
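For reference, the kernel in (2.5) can be evaluated directly. The sketch below is a plain-Python illustration; the function name and the sample vectors are assumptions.

```python
import math


def rbf_kernel(x, xi, gamma):
    """Radial-basis kernel k(x, x_i) = exp(-gamma * ||x - x_i||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * squared_distance)


print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # identical points
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))  # squared distance 2
```

The kernel equals 1 for identical points and decays towards 0 as the points move apart; a larger γ makes the decay faster, so each support vector influences a smaller neighbourhood.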
3 See Appendix A.1 for the derivation.
Neural Networks
Neural networks were initially studied with the aim of representing information
processing in biological systems [MP88, WH88, RHW88]. The attempts were to produce intelligent
perception and cognition machines by simulating the physical structure of the human brain.
Nowadays, the principles and algorithms of neural networks have found several applications in
diverse fields including pattern recognition and signal processing.
Neural networks are composed of interconnected neurons. These are the information processing
units. The neural model is formed by three basic elements: a set of connecting links (each one
multiplied by a weight), an adder (for summing the input signals) and an activation function
(for limiting the amplitude of the output of a neuron) [Hay98]. Formally, we can define the
output of a neuron by
$$y_k = \varphi\Big(\sum_{j=1}^{m} \omega_{kj} x_j + b_k\Big) \tag{2.6}$$
where $x_1, x_2, \ldots, x_m$ are the input signals, $\omega_{k1}, \ldots, \omega_{km}$ are the weights of neuron $k$, $b_k$ is the bias and $\varphi(\cdot)$
is the activation function. There are three common types of activation functions: threshold
function, piecewise-linear function and sigmoid function. In this work, the last one was used,
whose graph is S-shaped. This function accepts input values between $-\infty$ and $\infty$,
and returns values between 0 and 1. It is defined by
$$f(x) = \frac{1}{1 + \exp(-ax)}, \quad \text{where } a > 0 \text{ represents the slope} \tag{2.7}$$
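Equations (2.6) and (2.7) translate almost directly into code. The following sketch computes the output of a single neuron; the function names and the numeric values are assumptions used only for illustration.

```python
import math


def sigmoid(v, a=1.0):
    """Sigmoid activation f(v) = 1 / (1 + exp(-a v)) with slope a > 0."""
    return 1.0 / (1.0 + math.exp(-a * v))


def neuron_output(inputs, weights, bias, a=1.0):
    """Output y_k = f(sum_j w_kj x_j + b_k) of one neuron."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(v, a)


# The weighted sum is 0.5 * 1.0 - 0.25 * 2.0 + 0.0 = 0.
print(neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.0))
print(neuron_output([1.0, 2.0], [0.5, 0.25], bias=0.0, a=4.0))
```

A weighted sum of zero maps to the midpoint of the output range, and a larger slope `a` pushes the output for a positive sum closer to 1.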
A multilayer feedforward neural network was used. Typically, this network consists of input
and output layers of neurons and one or more hidden layers that are not part of the input or
output of the network. The input signal propagates through the network in a forward direction.
The learning algorithm used to train this network was the backpropagation algorithm (footnote 4). This
algorithm is based on the gradient descent technique to minimize the cost function. In
simple terms, error backpropagation learning consists of two passes through the layers
of the network: a forward pass and a backward pass. In the forward pass the link weights of
the neurons are all fixed. The input vector is applied to the sensory nodes of the network
and a set of outputs is produced. In the backward pass the link weights are all adjusted
in accordance with an error-correction rule. Basically, the output values of the network are
subtracted from a desired response (targets) to produce an error signal. This error signal is
then propagated backward through the network to all neurons. A network with K outputs
was used, one corresponding to each class, with target values of 1 for the correct class and 0
otherwise.
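The two passes can be sketched for a network with a single hidden layer. This is a minimal illustration under stated assumptions, not the configuration used in this work: the cost is taken to be the squared error E = ½ Σ_k (y_k − t_k)², the sigmoid slope is 1, one training sample is used, and the network size, initial weights and learning rate are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, W1, b1, W2, b2):
    """Forward pass with fixed weights: returns hidden and output activations."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, y


def squared_error(y, target):
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


def backprop_step(x, target, W1, b1, W2, b2, lr=0.5):
    """One gradient-descent update of all weights for a single sample."""
    h, y = forward(x, W1, b1, W2, b2)
    # Error signal at the outputs: dE/dv_k = (y_k - t_k) * y_k * (1 - y_k)
    delta_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
    # Error signal propagated backward to the hidden neurons through W2.
    delta_hid = [
        hj * (1 - hj) * sum(delta_out[k] * W2[k][j] for k in range(len(W2)))
        for j, hj in enumerate(h)
    ]
    new_W2 = [[w - lr * dk * hj for w, hj in zip(row, h)] for row, dk in zip(W2, delta_out)]
    new_b2 = [b - lr * dk for b, dk in zip(b2, delta_out)]
    new_W1 = [[w - lr * dj * xi for w, xi in zip(row, x)] for row, dj in zip(W1, delta_hid)]
    new_b1 = [b - lr * dj for b, dj in zip(b1, delta_hid)]
    return new_W1, new_b1, new_W2, new_b2


# Two inputs, two hidden neurons, K = 2 outputs with 1-of-K targets.
x, target = [1.0, 0.0], [1.0, 0.0]
params = ([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0], [[0.2, -0.1], [-0.3, 0.1]], [0.0, 0.0])
error_before = squared_error(forward(x, *params)[1], target)
for _ in range(20):
    params = backprop_step(x, target, *params)
error_after = squared_error(forward(x, *params)[1], target)
print(error_before, error_after)
```

The target vector follows the 1-of-K coding described above: 1 for the correct class and 0 for the others.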
4 See Appendix A.2 for more details.
K-nearest Neighbour
K-nearest neighbour is a supervised learning method for classifying objects based on training
examples in the feature space [Fuk90]. This algorithm belongs to a set of techniques called
Instance-based Learning. The K-nearest neighbour algorithm is very simple and essentially
has no training phase. It starts by extending the local region around a data point x until the
kth nearest neighbour is found. The most represented class among the k closest samples defines
the predicted class. Training consists only of estimating the best k. In this work the
Euclidean distance was used:
$$d_{\mathrm{euclidean}}(a, b) = \sqrt{\sum_{i=1}^{d} (a_i - b_i)^2}$$
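A minimal sketch of the rule with the Euclidean distance follows. The feature vectors and the class names are hypothetical, and ties between classes are resolved by whichever class `Counter` happens to report first, which is an arbitrary choice of this illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two d-dimensional feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_points, train_labels, query, k):
    """Return the most represented class among the k closest samples."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional features for two symbol classes.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["notehead", "notehead", "notehead", "rest", "rest"]
print(knn_predict(points, labels, (0.5, 0.5), k=3))
print(knn_predict(points, labels, (5.0, 6.0), k=3))
```

All the work happens at prediction time, which is why choosing k is the only quantity left to estimate from the data.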
Elastic Matching
Several methods exist in the deformable template field applied to the recognition of handwritten
digits and handprinted characters (e.g. [LS88, Wak94, Nis95, JZ97]). Lam [LS88], in one of
the first works in this area, proposed a two-stage recognition method. The images are
first recognized by a tree classifier. Those which cannot be satisfactorily assigned to a class
are passed to a matching algorithm, which deforms the image to match a template. In
Nishida [Nis95] a grammar-like model for applying deformations to primitive strokes was
developed. A new and robust shape-matching approach to recognize handwritten numerals was
proposed by Wakahara [Wak94]. The method uses successive local affine transformation (LAT)
operations to gradually deform the image. The aim is to yield the best match to an input
binary image. The LAT at each point is optimized using the locations of other points
by means of least-squares data fitting using Gaussian window functions.
The deformation and matching technique used in this work to classify the musical symbols is
based on [JZL96, JZ97]. In this approach, the image is mapped onto the unit square $S = [0, 1] \times [0, 1]$.
The points in this square are mapped by the function $(x, y) \to (x, y) + D(x, y)$. The space of
displacement functions is spanned by the basis functions
$$e^x_{mn}(x, y) = (2 \sin(\pi n x) \cos(\pi m y),\ 0) \tag{2.8}$$
$$e^y_{mn}(x, y) = (0,\ 2 \sin(\pi n y) \cos(\pi m x)) \tag{2.9}$$
Specifically, the deformation function is chosen as follows:
$$D(x, y) = \sum_{m=1}^{M} \sum_{n=1}^{N} \frac{\xi^x_{mn} e^x_{mn} + \xi^y_{mn} e^y_{mn}}{\lambda_{mn}} \tag{2.10}$$
where $\xi = \{(\xi^x_{mn}, \xi^y_{mn}),\ m, n = 1, 2, \ldots\}$ are the projections of the deformation function on the
orthogonal basis. Because $D(x, y)$ can represent complex deformations by choosing different
coefficients $\xi_{mn}$ and different values of $M$ and $N$, it is important to impose a probability
density on $D(x, y)$. Therefore, the $\xi_{mn}$'s are assumed to be independent of each other,
independent along the $x$ and $y$ directions and identically Gaussian distributed with zero mean and
variance $\sigma^2$.
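The deformation in (2.8) to (2.10) can be sketched as follows. The text does not define the normalizing constants λ_mn; the sketch assumes λ_mn = απ²(n² + m²), which is the form used in the deformable template literature this section cites, and that choice, the parameter α and all function names should be read as assumptions of this illustration.

```python
import math
import random


def displacement(x, y, xi_x, xi_y, alpha=1.0):
    """Evaluate D(x, y) for coefficient tables xi_x[m-1][n-1] and xi_y[m-1][n-1]."""
    dx = dy = 0.0
    for m in range(1, len(xi_x) + 1):
        for n in range(1, len(xi_x[0]) + 1):
            lam = alpha * math.pi ** 2 * (n * n + m * m)  # assumed form of lambda_mn
            # Horizontal basis function e^x_mn and vertical basis function e^y_mn.
            ex = 2 * math.sin(math.pi * n * x) * math.cos(math.pi * m * y)
            ey = 2 * math.sin(math.pi * n * y) * math.cos(math.pi * m * x)
            dx += xi_x[m - 1][n - 1] * ex / lam
            dy += xi_y[m - 1][n - 1] * ey / lam
    return dx, dy


def sample_coefficients(M, N, sigma, rng):
    """Draw independent zero-mean Gaussian coefficients with std. deviation sigma."""
    return [[rng.gauss(0.0, sigma) for _ in range(N)] for _ in range(M)]


def deform_point(x, y, xi_x, xi_y, alpha=1.0):
    """Map (x, y) in the unit square to (x, y) + D(x, y)."""
    dx, dy = displacement(x, y, xi_x, xi_y, alpha)
    return x + dx, y + dy


rng = random.Random(0)
xi_x = sample_coefficients(2, 2, sigma=1.0, rng=rng)
xi_y = sample_coefficients(2, 2, sigma=1.0, rng=rng)
print(deform_point(0.5, 0.5, xi_x, xi_y))
```

Applying `deform_point` to every pixel coordinate of a symbol image, with freshly sampled coefficients each time, gives one way of generating deformed variants of that symbol. Low-order terms (small m and n) produce smooth global distortions, and the division by λ_mn damps the higher-frequency ones.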
Part II
Detection
Chapter 3
Staff Lines Detection and Removal∗
Staff line detection and removal are the first fundamental stages in many optical
music recognition systems, with subsequent processes relying heavily on their performance. The reason
for detecting and removing the staff lines is the need to isolate the musical symbols for a
more efficient and correct detection of each symbol present on the score. For the staff lines
to be properly removed and the symbols correctly detected, the staff lines must first be
detected correctly. When we are dealing with printed and regular scores the process is
much simpler. However, in the case of handwritten music sheets, which is the subject of
this dissertation, several difficulties, already listed in Chapter 2, may occur. For this reason,
despite the multitude of attempts to treat the problem of staff line detection, the results for
old and handwritten music scores are not yet completely satisfactory.
In this dissertation a new conceptualization to detect the staff lines is presented. The proposed
paradigm builds on the work done in [RCC+07]. The main idea is to consider the staff lines
as the result of the shortest path between the two margins of the music sheet, giving preference
to black pixels. The initial version only implemented a heuristic of this basic idea, without a
properly reasoned and tested principle. In this thesis the proposed methodology is extended
and explored in various directions. Firstly, the concept of stable paths is introduced
in order to improve the computational performance of the method. Secondly, the design
of the weights on the graph resulting from the music score is generalized to differentiate black
pixels belonging to the staff lines from black pixels resulting from the music symbols. Finally,
the post-processing is refined, improving the overall performance. A further development is the
study of new staff removal algorithms, obtained by incorporating the proposed staff line detection into
standard staff removal algorithms.
It is important to state that, although this project is in the field of recognition of standard musical notation, the staff line detection and removal phase need not be confined to this type of notation. Hence, a database was built. This database contains not only several symbols with dissimilar shapes and intensities, but also a varying number of lines per staff.
∗ Some portions of this chapter appear in [CCR+08, CCRG08b].
3.1 Basic Concepts
First of all, the paradigm described considers that a staff line can be seen as a connected path from the left side of the music score to the right side. As staff lines are almost the only extensive black objects on the music score, the path we are looking for is the shortest path between the two margins, provided that paths (almost) entirely through black pixels are favoured.
Consequently, the image grid of the music sheet is considered as a graph with pixels as nodes (vertices) and edges connecting neighbouring pixels. The weight w(p, q) of each arc is a function of the pixel values and their relative positions. A path from vertex (pixel) v1 to vertex (pixel) vn is a list of unique vertices v1, v2, . . . , vn, with vi and vi+1 corresponding to neighbouring pixels. The total cost of a path is the sum of the arc weights in the path and is given by

$$\sum_{i=2}^{n} w(v_{i-1}, v_i). \tag{3.1}$$

A path from a source vertex v to a target vertex u is said to be a shortest path if its total cost is minimum among all v-to-u paths. The distance between a source vertex v and a target vertex u on a graph, d(v, u), is the total cost of a shortest path between v and u.

A path from a source vertex v to a sub-graph Ω is said to be a shortest path between v and Ω if its total cost is minimum among all v-to-u paths, where u ∈ Ω. The distance from a node v to a sub-graph Ω, d(v, Ω), is the total cost of a shortest path between v and Ω and is given by:

$$d(v, \Omega) = \min_{u \in \Omega} d(v, u). \tag{3.2}$$

A path from a sub-graph Ω1 to a sub-graph Ω2 is said to be a shortest path between Ω1 and Ω2 if its total cost is minimum among all v-to-u paths, where v ∈ Ω1 and u ∈ Ω2. The distance from a sub-graph Ω1 to a sub-graph Ω2, d(Ω1, Ω2), is the total cost of a shortest path between Ω1 and Ω2, and is given by:

$$d(\Omega_1, \Omega_2) = \min_{v \in \Omega_1,\, u \in \Omega_2} d(v, u). \tag{3.3}$$
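The distance between two sub-graphs in equation (3.3) can be illustrated with a multi-source variant of Dijkstra's algorithm. This is only a sketch on a generic graph, not the procedure used in this work: the adjacency-list format and the function name `distance_between_sets` are assumptions, and arc weights are assumed to be non-negative.

```python
import heapq

def distance_between_sets(graph, omega1, omega2):
    """Return d(omega1, omega2) as in Eq. (3.3).

    graph maps each node to a list of (neighbour, weight) pairs;
    all weights are assumed non-negative (required by Dijkstra).
    """
    targets = set(omega2)
    dist = {v: 0 for v in omega1}          # every source starts at cost 0
    heap = [(0, v) for v in omega1]
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue                       # stale heap entry
        if v in targets:
            return d                       # first target popped is the closest
        for u, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return float("inf")                    # no path between the two sets

# Toy graph: d({a}, {e}) = 6 and d({b}, {e}) = 2, so d({a, b}, {e}) = 2.
toy = {
    "a": [("c", 5), ("d", 2)],
    "b": [("c", 1)],
    "c": [("e", 1)],
    "d": [("e", 4)],
    "e": [],
}
print(distance_between_sets(toy, ["a", "b"], ["e"]))
```

Passing a single node as `omega1` gives the vertex-to-sub-graph distance of equation (3.2).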
3.2 Underlying Principle
As mentioned before, staff lines can be considered as the only extensive objects made of black pixels in the music score: connected paths of black pixels from the left side to the right side of the music score. Assuming that paths through black pixels are preferred over paths through white pixels, staff lines can then be found among the shortest paths from the left to the right margin of the music score. Staff lines can be modelled as paths between two regions Ω1 and Ω2; these represent, respectively, the left and right margins of the score. Besides that, one may assume that staff lines do not zigzag back and forth, left and right. Therefore, one may restrict the search to connected paths containing one, and only one, pixel in each column of the image.² Formally, let I be an N1 × N2 image and define an admissible staff to be

$$s = \{(x, y(x))\}_{x=1}^{N_1}, \quad \text{s.t. } \forall x \;\; |y(x) - y(x-1)| \le 1,$$

where y is a mapping y : [1, . . . , N1] → [1, . . . , N2]. That is, a staff line is an 8-connected path of pixels in the image from left to right, containing one, and only one, pixel in each column of the image.
Given the weight function w(p, q), the cost of a staff can be defined as

$$C(s) = \sum_{i=2}^{N_1} w(v_{i-1}, v_i) \tag{3.4}$$

The optimal staff line that minimizes this cost can be found using dynamic programming. The first step is to traverse the image from the second column to the last column and compute the cumulative minimum cost C for all possible connected staff lines for each entry (i, j):

$$C(i, j) = \min \left\{ \begin{array}{l} C(i-1, j-1) + w(p_{i-1,j-1};\, p_{i,j}) \\ C(i-1, j) + w(p_{i-1,j};\, p_{i,j}) \\ C(i-1, j+1) + w(p_{i-1,j+1};\, p_{i,j}) \end{array} \right.,$$

where w(p_{i,j}; p_{l,m}) represents the weight of the edge incident with pixels at positions (i, j) and (l, m). At the end of this process,

$$\min_{j \in \{1, \dots, N_2\}} C(N_1, j)$$

indicates the end of the minimal connected staff. Hence, in the second step, one backtracks from this minimum entry on C to find the path of the optimal staff.
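The two steps above (cumulative cost, then backtracking) can be sketched as follows. This is a minimal illustration and not the thesis implementation: the image is assumed to be a boolean NumPy array indexed as `img[row, column]` with `True` for black, and the weights are the base values quoted in the caption of Listing 2 (4/8 for horizontal arcs, 6/12 for diagonal arcs), without the additional terms of the full weight function.

```python
import numpy as np

def edge_weight(img, r0, c0, r1, c1):
    """Base weight only: cheap if either incident pixel is black."""
    black = bool(img[r0, c0] or img[r1, c1])
    if r0 != r1:                      # diagonal arc
        return 6 if black else 12
    return 4 if black else 8          # horizontal arc

def shortest_path_left_to_right(img):
    """Return (rows, cost): one row index per column, and the total cost."""
    rows, cols = img.shape
    cost = np.full((rows, cols), np.inf)
    prev = np.zeros((rows, cols), dtype=int)
    cost[:, 0] = 0.0                  # any pixel of the left margin may start a path
    # Step 1: cumulative minimum cost, column by column.
    for c in range(1, cols):
        for r in range(rows):
            for pr in (r - 1, r, r + 1):
                if 0 <= pr < rows:
                    cand = cost[pr, c - 1] + edge_weight(img, pr, c - 1, r, c)
                    if cand < cost[r, c]:
                        cost[r, c] = cand
                        prev[r, c] = pr
    # Step 2: backtrack from the cheapest entry of the last column.
    r = int(np.argmin(cost[:, -1]))
    total = float(cost[r, -1])
    path = [r]
    for c in range(cols - 1, 0, -1):
        r = int(prev[r, c])
        path.append(r)
    path.reverse()
    return path, total

# Toy image: a single black line on row 2 of a 5 x 6 image.
img = np.zeros((5, 6), dtype=bool)
img[2, :] = True
path, total = shortest_path_left_to_right(img)
```

The run time is linear in the number of pixels, since each pixel inspects at most three predecessors.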
3.3 Algorithm Outline
Assume one wants to find all staff lines present in a score. This can be approached by successively finding and erasing the shortest path from the left to the right margin of the score. The erase operation is crucial to ensure that a staff is not detected more than once.³

Consider the music score presented in Figure 3.1(a); in Figure 3.1(b) the first eleven shortest paths are traced. From this example it is possible to conclude that music symbols placed on top of staff lines do not interfere with the detection of the staff lines. Moreover, the example also makes clear that slightly skewed scores do not pose any problem to the proposed approach.
Nonetheless, two main issues need to be properly addressed. On the one hand, a criterion is needed to stop the iterative detection of the shortest paths, that is, the staff lines. On the other hand, the initial and the final parts of a path should also be trimmed, since these (almost) completely white pixels do not belong to the staff line, despite belonging to the shortest path between the opposite margins. Another subject deserving attention is the computational complexity. The sequential computation of the shortest path may be prohibitive for some applications. It would be interesting to be able to compute several 'shortest paths' simultaneously. Thus, before presenting the complete algorithm, we introduce the concept of a stable path in a graph, which will allow computing multiple staff lines in a single iteration, instead of sequentially computing them one at a time.

(a) Skewed staff lines with music symbols. (b) The first 11 shortest paths between the left and right margins.
Figure 3.1: An illustrative example of the methodology.

² These assumptions, 8-connectivity and one pixel per column, impose a maximum detectable rotation of 45 degrees. A higher rotation is unlikely to happen in a real case, and may be corrected beforehand.
³ We implemented the erase operation by setting to white the pixels on the detected staff; image resizing could be a valid alternative [AS07].
3.4 Stable Paths on a Graph
Before presenting the formal definition of a stable path on a graph, it is useful to consider a hypothetical example of a simplified music score to motivate this concept. Consider Figure 3.2, where a staff with only four staff lines (rows 1, 4, 6, and 8) in an 8 × 9 image is presented. The staff lines are the black elements, as in the real case. The presence of noise on some of the staff lines is simulated by discontinuities. The design of the weight function will be considered in Section 3.6; for now, it suffices to know that the presented graph was constructed to favour paths through black pixels.
In fact, we can see in Figure 3.2 that the shortest path between the left and right margins, that is, between the sub-graphs Ω1 and Ω2, is the path corresponding to the first row, entirely through black pixels. By following the strategy just delineated, one could find the four staff lines in four iterations, sequentially. Nonetheless, although only one staff line corresponds to the shortest path, they all constitute a sort of (almost) optimal paths. The stable paths concept provides a means to find all such paths simultaneously.

Definition 1. A path P_{s,t} is a stable path between regions Ω1 and Ω2 if P_{s,t} is the shortest path between s ∈ Ω1 and the whole region Ω2, and P_{s,t} is the shortest path between t ∈ Ω2 and the whole region Ω1.
The naming of stable paths has its roots in dynamical systems, as it resembles stable fixed points. If one considers the function F_{Ω1→Ω2}(·), mapping a node s ∈ Ω1 to a node t ∈ Ω2 by finding the shortest path P_{s,t} between s ∈ Ω1 and Ω2, with t = F_{Ω1→Ω2}(s) as the end node of such shortest path, then

$$G_{\Omega_1 \to \Omega_1}(s) = F_{\Omega_2 \to \Omega_1}\left(F_{\Omega_1 \to \Omega_2}(s)\right) = s$$

if and only if P_{s,t} is a stable path. Note that the concept of stable path is valid for any graph and any two sub-graphs in general. The computation of the stable paths on the toy example of Figure 3.2 provides the three paths highlighted in yellow in the figure.

Figure 3.2: Stable paths on a toy example.
As a second example, in Figure 3.3(a) the shortest paths between each point on the left margin and the whole right margin are traced for the score in Figure 3.1(a). As seen, the paths are attracted by the staff lines. Likewise, Figure 3.3(b) shows the shortest paths between each point on the right margin and the whole left margin. The set of stable paths between both margins is the set of paths present in both figures.

With the concept of stable paths, the computation of all the stable paths in the graph derived from an image has only roughly twice the complexity of the shortest path computation. Notice that the procedure delineated in Section 3.2 actually gives the shortest path between the whole left margin Ω1 and each point on the right margin Ω2. The first step of the computation of the stable paths therefore corresponds verbatim to the computation of the shortest path presented in Section 3.2. In a second step one repeats the same procedure, now traversing the graph from the right column to the left. At the end of this process, if the two endpoints of a direct and a reverse path coincide, we are in the presence of a stable path.
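The two-pass procedure just described can be sketched as below. It is a simplified illustration under stated assumptions: the image is a boolean NumPy array `img[row, column]` with `True` for black, only the base weights from the caption of Listing 2 are used, and only the endpoints of each path are tracked (the helper names `_start_rows` and `stable_endpoints` are hypothetical). When several paths have equal cost, matching endpoints do not strictly guarantee identical paths.

```python
import numpy as np

def _start_rows(img):
    """One dynamic-programming pass from the first to the last column.

    Returns an array whose entry t is the row, in the first column, where
    the cheapest path ending at row t of the last column starts.
    """
    rows, cols = img.shape
    cost = np.zeros(rows)
    start = np.arange(rows)
    for c in range(1, cols):
        new_cost = np.full(rows, np.inf)
        new_start = np.zeros(rows, dtype=int)
        for r in range(rows):
            for pr in (r - 1, r, r + 1):
                if 0 <= pr < rows:
                    black = bool(img[pr, c - 1] or img[r, c])
                    if pr == r:
                        w = 4 if black else 8      # horizontal arc
                    else:
                        w = 6 if black else 12     # diagonal arc
                    if cost[pr] + w < new_cost[r]:
                        new_cost[r] = cost[pr] + w
                        new_start[r] = start[pr]
        cost, start = new_cost, new_start
    return start

def stable_endpoints(img):
    """Return (s, t) pairs: s on the left margin, t on the right margin."""
    fwd = _start_rows(img)           # direct pass, left to right
    bwd = _start_rows(img[:, ::-1])  # reverse pass, right to left
    # A path is stable when the direct and reverse endpoints coincide.
    return [(int(s), t) for t, s in enumerate(fwd) if bwd[s] == t]

# Toy image with two black lines (rows 1 and 4) in a 7 x 8 image.
img = np.zeros((7, 8), dtype=bool)
img[1, :] = True
img[4, :] = True
print(stable_endpoints(img))
```

On this toy image both lines are recovered in a single iteration, whereas the sequential shortest-path search would need two.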
(a) Shortest paths from each pixel in the left column to the whole right column, superimposed on the original image.
(b) Shortest paths from each pixel in the right column to the whole left column, superimposed on the original image.
Figure 3.3: Illustration of stable paths for Figure 3.1(a).
On the toy example of Figure 3.2, the computation of the stable paths provides only three out of the four staff lines, with the last stable path following partially through a segment of the third staff line and a segment of the fourth staff line.⁴ As a consequence, the computation of the stable paths does not guarantee the discovery of all paths of interest. Applying the stable paths procedure a second time, after erasing the paths found in the first iteration, we get a new set of paths, including a path joining the remaining segments of the 3rd and 4th staff lines. The advantage of a search based on stable paths over a sequential search by the shortest path lies in the number of stable paths found simultaneously. For instance, while a score with 60 staff lines would require 60 iterations of the shortest path algorithm, it requires only a few (typically between 4 and 6) iterations with the stable paths method. Next, the complete proposed algorithm for staff line detection is detailed.
3.5 Proposed Algorithm
The proposed algorithm can be implemented as a sequence of a few high-level operations (see Listing 1) that will be described in this Section.
Preprocessing With the objective of detecting the staff lines, the proposed algorithm starts by estimating the staff space height, staffspaceheight, and the staff line height, stafflineheight. These lengths will both be used as reference measures for the subsequent operations. Robust estimators for these quantities are already in common use. The most used technique starts by computing the vertical run-length representation of the image. For a bit-mapped page of music, the most common black run represents the staff line height and the most common white run represents the staff space height [Fuj04]. After estimating the reference lengths, the edges' weights are determined as explained in Section 3.6.
⁴ Note that the sequential search of the shortest path would suffer from the same limitations.
BEGIN PreProcessing
compute staffspaceheight and stafflineheight
compute weights of the graph
END PreProcessing
CYCLE
compute stable paths
validate paths with blackness and shape
remove valid paths from image
add valid paths to list of stafflines
END OF CYCLE if no valid path is found
BEGIN PostProcessing
uncross stafflines
organize stafflines in staves
smooth and trim stafflines
END PostProcessing
Listing 1: Main operations of the proposed method.
Main Cycle The preprocessing is followed by the main cycle of the methodology, which successively finds the stable paths between the left and right margins. Moreover, it is in this step that the paths found are added to the list of staff lines and then erased from the image. The erase operation sets to white the pixels on a vertical strip of height empirically fixed at staffspaceheight, centred on the detected staff line. As explained previously, the erase operation is necessary to ensure that a line is not detected multiple times, even if its height is higher than one pixel.
Stopping Rule To stop the iterative staff line search, a sequence of (arguably) sensible rules is used to validate the stable paths found; if none of them passes the checks, the iterative search is stopped. Two validation rules were applied, both assessing features with respect to the median values obtained during the first iteration. If a path does not have a percentage of black pixels above a fixed threshold, it is discarded. The median percentage of blackness of all lines found in the first iteration of the main cycle provides the necessary reference: a threshold of 80% of the median value was empirically selected. Likewise, if the shape of the detected path differs too much from the shape of the line with median blackness, it is discarded. The dissimilarity is measured as the average y-distance between both paths after removing their means; a value of shapediff = 4 × staffspaceheight was selected as the threshold above which a path is discarded.
PostProcessing After the main search step, valid detected staff lines are post-processed. Although true staff lines never intersect, the above algorithm may occasionally create intersecting lines, detected on different iterations. That may be due to a local low quality of a line, leading a stable path to jump between consecutive lines. Such local discontinuities can make a path zigzag back and forth between consecutive lines, because a path detected later is likely to follow and connect the remaining segments, and consequently intersect the previously detected path. To preclude such a final, undesired state, lines are post-processed to remove intersections. That is easily and efficiently accomplished by, for each image column, sorting on y the pixels of the detected lines and assigning the i-th pixel to the i-th line. After this simple process, lines may touch but they do not intersect.
After removing the intersections it is now possible to eliminate spurious staff lines and to cluster the lines into staves. Since the lines are ordered, these operations require only iterating through the list of lines and starting a new staff whenever the distance between two consecutive lines is above a fixed threshold. The value considered for this threshold was 2 × staffspaceheight. Subsequently, spurious staves are eliminated by simply discarding those with only a single staff line. More robust rules could be created, however, because staves with only one staff line do exist (e.g. percussion), although they are uncommon.
Finally, each retained line is trimmed at the beginning and at the end, and smoothed. As visible in the example of Figure 3.1(b), before meeting a staff line, a path travels through a sequence of white pixels. Likewise, after the end of the staff line, the path goes again through a sequence of white pixels until it meets the right margin of the image. In order to ignore all of these white pixels (undesirable segments), the trimming operation works per staff. Thus, for each staff, a sequence of median colours is computed as follows: for each column, the median of the colours of the lines is added to the sequence. Next, the trimming points are found on this sequence: starting at the centre, we traverse the sequence to the left and to the right until a run of whiterun = 2 × staffspaceheight white pixels is found (value obtained experimentally). The pixels between the left and right runs are kept in the staff lines. At the end, lines are smoothed with a standard average low-pass filter. Considering a staff line as a sequence y(x) of y-positions, a one-dimensional averaging filter is applied. A window size of 2 × staffspaceheight was selected in the experiments.
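Two of the simpler operations of this section can be sketched in a few lines: the run-length estimate of the reference lengths (preprocessing) and the removal of intersections by per-column sorting (post-processing). The code below is only an illustration under assumptions: the image is a boolean NumPy array with `True` for black, detected lines are stored as rows of y-positions, and the function names are hypothetical.

```python
from collections import Counter

import numpy as np

def reference_lengths(img):
    """Return (stafflineheight, staffspaceheight) as the most common
    vertical black and white run lengths over all columns."""
    black_runs, white_runs = Counter(), Counter()
    for col in img.T:                       # iterate over image columns
        run_value, run_len = bool(col[0]), 1
        for v in col[1:]:
            if bool(v) == run_value:
                run_len += 1
            else:
                (black_runs if run_value else white_runs)[run_len] += 1
                run_value, run_len = bool(v), 1
        (black_runs if run_value else white_runs)[run_len] += 1
    return black_runs.most_common(1)[0][0], white_runs.most_common(1)[0][0]

def uncross(lines):
    """lines has shape (n_lines, n_columns) and holds y-positions.
    Sorting each column assigns the i-th pixel to the i-th line."""
    return np.sort(np.asarray(lines), axis=0)

# Toy page: four lines of height 2 separated by spaces of height 4.
page = np.zeros((25, 10), dtype=bool)
for top in (2, 8, 14, 20):
    page[top:top + 2, :] = True
line_h, space_h = reference_lengths(page)

# Two detected paths that swap halfway across the page.
crossing = [[5, 5, 9, 9], [8, 8, 6, 6]]
fixed = uncross(crossing)
```

After `uncross`, the first row follows the upper line in every column and the second row follows the lower one, so the lines may touch but no longer intersect.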
3.6 Design of the Weight Function
An immediate approach is to base the design of the weight function solely on the values of the incident nodes, that is, if any of the corresponding pixels is black then a low cost is assigned to the edge, otherwise the edge assumes a high cost. We call this the baseWeight in Listing 2. Nevertheless, the weight function can be generalized by taking other factors into account. It is very useful to consider the prior knowledge about a music score when we want to find the staff lines. In order to incorporate this idea in the shortest path process, we consider different alternatives to modify the weight function of the graph. In doing so, the weight function is modified by a term that encodes this information about the music sheet.
Two more attributes were therefore considered for each pixel, with the aim of adjusting the main contribution to the edge's weight, which results from the values of the incident pixels. The prime intention of these additional features is to discriminate black pixels in the staff lines from black pixels in the music symbols, penalizing the latter and favouring the former. With this in mind, if a black pixel is part of a short vertical run of black pixels, then it is more likely to be part of a staff line than of a symbol. Therefore, a term benefiting such edges is included in the weight function. The other attribute is that a staff line is likely to have another staff line at roughly staffspaceheight pixels, assuming that staves have at least two lines. Hence, if the nearest vertical run of black pixels on the same column is excessively far from the vertical run of black pixels containing the current black pixel, then this pixel is more likely to belong to a symbol (for instance, it may belong to a ligature) than to a staff line. Consequently, a penalising term is incorporated in the weight function for these cases. The pseudo-code for the weight function is provided in Listing 2.
WeightFunction(pixelValue1, pixelValue2, vRun1,
vRun2, nearestVRun1, nearestVRun2,
NeighbourhoodType)
value = min(pixelValue1, pixelValue2);
weight = baseWeight(value, NeighbourhoodType);
if( (vRun1<=STAFFLINEHEIGHT)
OR(vRun2<=STAFFLINEHEIGHT))
weight = weight - delta;
if( (nearestVRun1>=STAFFSPACEHEIGHT+STAFFLINEHEIGHT)
OR(nearestVRun2>=STAFFSPACEHEIGHT+STAFFLINEHEIGHT))
weight = weight + delta;
return weight;
Listing 2: Pseudo-code for the weight function. The base weight was set to 4 on black pixels and 8 on white pixels for 4-neighbourhoods, and to 6 and 12 for 8-neighbourhoods. The delta penalizing term in the weight function was set to 1. For efficiency, weights were designed with integer values.
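For readers who prefer runnable code, Listing 2 can be transcribed into Python roughly as follows. Several details are assumptions of this sketch and are not stated in the listing: pixel values are taken as 0 for black and 1 for white (so that `min` yields black when either pixel is black), `neighbourhood` is 4 for horizontal or vertical arcs and 8 for diagonal arcs, and the reference lengths are illustrative constants.

```python
# Illustrative reference lengths (assumed values, normally estimated
# from the image during preprocessing).
STAFFLINEHEIGHT = 2
STAFFSPACEHEIGHT = 8
DELTA = 1  # penalising / benefiting term, set to 1 as in Listing 2

# Base weights from the caption of Listing 2:
# (neighbourhood type, is_black) -> weight
BASE_WEIGHT = {
    (4, True): 4, (4, False): 8,
    (8, True): 6, (8, False): 12,
}

def weight_function(pixel1, pixel2, v_run1, v_run2,
                    nearest_v_run1, nearest_v_run2, neighbourhood):
    """Weight of the arc between two neighbouring pixels.

    pixel1, pixel2: 0 = black, 1 = white (assumption of this sketch).
    v_run*: length of the vertical black run containing each pixel.
    nearest_v_run*: distance to the nearest other vertical black run
    in the same column.
    """
    value = min(pixel1, pixel2)
    weight = BASE_WEIGHT[(neighbourhood, value == 0)]
    # Short vertical runs are more likely to be staff lines: reward.
    if v_run1 <= STAFFLINEHEIGHT or v_run2 <= STAFFLINEHEIGHT:
        weight -= DELTA
    # No other run at roughly one staff space: more likely a symbol.
    limit = STAFFSPACEHEIGHT + STAFFLINEHEIGHT
    if nearest_v_run1 >= limit or nearest_v_run2 >= limit:
        weight += DELTA
    return weight

# A thin black run with a neighbouring line one staff space away.
print(weight_function(0, 1, 2, 30, 8, 8, 4))
```

All weights stay integer, in line with the efficiency remark in the caption of Listing 2.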
3.7 Staff Line Removal
The staff line removal algorithms used in this work were adopted from [DDCF08]. Line Track Height, Line Track Chord, Roach/Tatem and Skeleton were the algorithms chosen for the experiments. A brief description of these algorithms follows:
Line Track Height The algorithm tracks the staff lines and checks when a vertical black run is longer than a threshold (experimentally set at 2 × stafflineheight).

Line Track Height Modified The modified version of the Line Track Height algorithm also tracks the staff line positions obtained by a detection algorithm and removes vertical runs of black pixels whose length is lower than a specified threshold (chosen experimentally as 2 × stafflineheight). In this version, careful attention is given to the deformations that may occur in the music scores (staff lines may have discontinuities, be curved or inclined). These problems influence how successfully the lines contained in the score are detected. The positions of the staff lines obtained by a staff line detection algorithm may pass slightly above or under the real staff line positions. Therefore, if we are in the presence of a white pixel when the staff lines are tracked, we search vertically for the closest black pixel. If that distance is lower than a specified tolerance (experimentally chosen as 1 + ceil(stafflineheight/3.0)), we move the reference position of the staff line to the position of the black pixel found.⁵
Line Track Chord This algorithm computes, for a fixed angle resolution of three degrees, the chord length through the skeleton point (see Figure 3.4). This results in a function chordlength(ϕ), where ϕ is the chord angle, for each skeleton point. When the staff line pixel also belongs to a crossing music symbol, the function should have a second distinct peak. To detect this peak the following thresholds are used:
1. There is a local maximum when chordlength is greater than 5 × stafflineheight at an angle below 30 degrees and another local maximum when chordlength is greater than 1.75 × stafflineheight × sin(ϕ) at an angle ϕ > 30 degrees.
2. The valley between two maxima must have a depth greater than 1.5 × stafflineheight.
In conclusion, this algorithm removes the staff line based on the peaks of the chord length as a function of the angle. There are two distinct peaks depending on whether the pixels belong to a staff line or to a music symbol.
Figure 3.4: Length of a chord through a skeleton point at some angle ϕ.
Roach/Tatem This algorithm uses a labelling scheme based on the angle information and pixel adjacency to identify the staff line pixels [RT88]. The chord length and the angle function described in the Line Track Chord algorithm are computed for every pixel. Consequently, the original image can be transformed into a two-dimensional vector field by picking the angle and length of the longest chord for each black pixel. This will assign pixels on staff lines a high length value and an angle value of zero. To avoid the removal of symbol pixels on the staff lines, some horizontal line pixels are iteratively relabelled as non-horizontal pixels, depending on the labels of their neighbouring pixels.
Skeleton This method consists of the following steps:
1. The skeleton is split at branching points and at corner points with an angle below 135 degrees. Around each splitting point a number of pixels are removed (see Figure 3.5(a)).
2. Staff line segment candidates are picked as skeleton segments if the orientation angle (of the least-squares fitted line) is below 25 degrees, the segment is wider than tall and the straightness (mean square deviation from the least-squares fitted line) is below stafflineheight²/2.
3. Apply a staff-finding algorithm to the staff segment candidates. Two staff segments are horizontally linked when their extrapolations from the end points with the least-squares fitted angle come closer than stafflineheight/2.
4. Remove false positives: from each overlapping staff segment group on the same line, the one that is closest to its least-squares fitted neighbourhood is picked and the others are discarded; non-staff segments that have the same branching point as a staff segment are extrapolated by a parametric parabola: if this parabola is approximately tangential to the staff segment, the latter is considered a false positive (see Figure 3.5(b)).
5. Remove staff lines: all vertical black runs around the detected staff skeleton are removed.

⁵ See Appendix A.6 for more details.
(a) Pixels within the distance transform radius around each splitting point are removed.
(b) A falsely detected staff segment that can be identified as belonging to a music symbol because it is approximately tangential to an extrapolated parabola from a non-staff segment.
Figure 3.5: Example (from [DDCF08]).
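To make the Line Track Height Modified description more concrete, the sketch below removes one detected staff line from a binary image. It is a simplified reading of the text and not the implementation of [DDCF08] or of Appendix A.6: the image is assumed to be a boolean NumPy array `img[row, column]` with `True` for black, the detected line is given as one y-position per column, and the reference position is corrected independently in each column. The function name `remove_staff_line` is hypothetical.

```python
import math

import numpy as np

def remove_staff_line(img, line_y, stafflineheight):
    """Return a copy of img with short vertical runs on the line removed."""
    out = img.copy()
    rows, cols = img.shape
    threshold = 2 * stafflineheight                    # run-length threshold
    tolerance = 1 + math.ceil(stafflineheight / 3.0)   # vertical search range
    for x in range(cols):
        y = int(line_y[x])
        if not img[y, x]:
            # White pixel on the tracked position: look for the closest
            # black pixel at a distance lower than the tolerance.
            found = None
            for d in range(1, tolerance):
                for cand in (y - d, y + d):
                    if 0 <= cand < rows and img[cand, x]:
                        found = cand
                        break
                if found is not None:
                    break
            if found is None:
                continue                               # gap in the line
            y = found
        # Extent of the vertical black run through (y, x).
        top = y
        while top > 0 and img[top - 1, x]:
            top -= 1
        bottom = y
        while bottom < rows - 1 and img[bottom + 1, x]:
            bottom += 1
        # Short runs are staff line only; long runs cross a symbol.
        if bottom - top + 1 < threshold:
            out[top:bottom + 1, x] = False
    return out

# Toy image: a staff line of height 2 crossed by a vertical stem.
img = np.zeros((12, 10), dtype=bool)
img[5:7, :] = True        # staff line on rows 5 and 6
img[1:11, 4] = True       # stem on column 4
cleaned = remove_staff_line(img, [4] * 10, 2)  # detection is one pixel off
```

In the toy example the staff line disappears everywhere except where the stem crosses it, so the symbol is left intact.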
3.8 Database of Music Scores
For the purpose of testing the proposed algorithm, a database was created with a sufficient number of scores to produce significant results from which we can draw conclusions and test improvements. Although the assessment of a new staff detection algorithm may be done by visually inspecting the output on a set of scores (as adopted in [RCC+07]), here we support the comparison with two test sets adopted for the qualitative evaluation of the proposed method: the test set of synthetic scores from [DDCF08] and a new set of real scores. The synthetic database consists of 32 ideal scores (see Figure 3.6(a)) with different musical notations, to which several known deformations were applied. In total 2688 images were generated. The simulation of distortions in a controlled manner made it possible to reproduce many of the problems that can be found in handwritten music scores. In addition, we also test the performance of the stable path algorithm with a test set of real scores (50 images in total) from the Portuguese composer Fernando Lapa, together with other old handwritten scores (see Figure 3.6(b)).
(a) Music score from the test set of 32 ideal scores. (b) Music score from the test set of 50 real scores.
Figure 3.6: Two examples of music scores from the test set used on the experimental evaluation.
3.8.1 Deformations in the Synthetic Scores⁶
The Gamera framework⁷ (Generalized Algorithms and Methods for Enhancement and Restoration of Archives) was the base platform used to create the deformation database, with the intention of developing and testing the proposed algorithm. Briefly, this framework is a toolkit for building document image recognition systems. It has a programming library with a set of graphics tools for experimentation and training. Gamera provides a wealth of tools for image processing. In this work, in order to complement the Gamera framework for the concrete problem of building a synthetic database, the MusicStaves plug-in was included. This toolkit appears to be ideal for the problem, because
1. It is a specific platform for the development and testing of staff line detection and removal algorithms.
2. It is an open source toolkit.
3. It allows using handwritten and printed scores with the algorithms already implemented in Gamera.
The distortions adopted from [DDCF08] and applied to the test set of ideal scores can be classified in two categories:
1. Deterministic deformations, which depend on certain parameters, as for instance the rotation.
2. Random defects, which use various parameters about the deformation and a pseudo-random number generator.
In both cases it is necessary to apply the deformation in parallel to the original score and to the ground-truth staff image.
Although some deformations are self-explanatory from their names (resolution, rotation and line interruption), others are not. A brief explanation of the latter is given below. Table 3.1 lists the deformations considered, which are available in the MusicStaves toolkit, and Figure 3.7 shows the effects they cause in the images.

Deformation                       Type            Parameter description
Rotation                          Deterministic   Rotation angle
Curvature                         Deterministic   Height:width ratio of sine curve
Typeset Emulation                 Both            Range width, maximal height and variance of vertical shift
Staff Line Interruptions          Random          Interruption frequency, maximal width and variance of range width
Staff Line Thickness Variations   Random          Markov chain stationary distribution and inertia factor
Staff Line y-variation            Random          Markov chain stationary distribution and inertia factor
Degradation After Kanungo         Random          (η, α0, α, β0, β, κ), see [KHB+00]
White Speckles                    Random          Speckle frequency, random walk length and smoothing factor
Table 3.1: Deformations in the images.

⁶ Consult [DDCF08] for more details.
⁷ See http://ldp.library.jhu.edu/vhost-base/gamera and [Cap08] for more details.
Figure 3.7: Some examples of applied deformations from the original image: a) Original; b) Curvature; c) Degradation after Kanungo; d) Staff line thickness variation; e) Staff line y-variation; f) Typeset emulation; g) Rotation; h) White speckles; i) Staff line interruptions.
Curvature The curvature is obtained through a half sine wave over the entire score area. The intensity of the resulting curvature can be measured as the ratio of the amplitude (height) to the width of the wave.

Typeset emulation This deformation tries to reproduce 16th-century prints, which are similar, in some ways, to typewriter output, and therefore show interruptions in the lines between the symbols and a random vertical shift in each portion containing a symbol.
Staff line thickness variation and staff line y-variation Staff line thickness variation and staff line y-variation are obtained by a Markov chain describing the evolution of the staff line thickness (or y-position) from left to right. These deformations are modelled by such a process because, usually, the thickness at a particular x-position depends on the thickness at the previous x-position. The parameter is the transition probability matrix P with p_ij := probability of transition from thickness or y-deviation i to thickness or y-deviation j. The thickness or y-deviation can take n different values (states). For the stationary distribution of the individual states a symmetric binomial distribution is assumed, that is:

$$\pi_i = \binom{n-1}{i-1} \frac{1}{2^{\,n-1}}$$

The mean value (n − 1)/2 of this distribution is associated with the original value in the image without the deformation (staffline_height for the thickness or zero for the deviation from the original y-position). The Markov chain is generated with the Metropolis-Hastings algorithm [Has70]. The transition probability matrix Q, used to obtain candidate transition points, is chosen to be:

$$q_{ij} = \begin{cases} c & \text{for } j = i \\ (1 - c)/2 & \text{for } j = i \pm 1 \\ 0 & \text{otherwise} \end{cases}$$

where the probability c can be considered as an inertia factor that allows smooth transitions: the closer c is to one, the slower the state variation.
Degradation after Kanungo This deformation tries to imitate the local distortions caused during printing and scanning. The model has six parameters (η, α0, α, β0, β, κ) with the following meanings:
• Each black pixel in the original image is flipped with probability α0 exp(−α d²) + η, where d is the distance to the closest background pixel.
• Each background pixel is flipped with probability β0 exp(−β d²) + η, where d is the distance to the closest foreground pixel.
• κ is the diameter of the disk of a morphological closing operation.
White speckles This degradation model has three parameters (p, n, k) with the following
meaning:
Each black pixel in the original image is taken with probability p as a starting point
for a random walk of length n.
k is a rectangle of a morphological closing operation that will smooth an image
containing the random walk.
3.8. Database of Music Scores 37
Deformation | Parameter range
Rotation | angle = −5 : 2.5 : 5
Curvature | amplitude/staffwidth = 0.02 : 0.02 : 0.10
Typeset Emulation | n_gap = 1 : 3 : 13, n_shift = 1 : 3 : 13, p_gap = 0.5
Staff Line Interruptions | frequency α = 0.01 : 0.02 : 0.10, binomial parameter for width: n = 6, p = 0.5
Staff Line Thickness Variation | inertia c = 0.8, maximum deviation = 2 : 1 : 6
Staff Line y-variation | inertia c = 0.8, maximum deviation = 2 : 1 : 6
Degradation After Kanungo | η = 0, α0 = 0.5, α = 0.25 : 0.3 : 1.5, β0 = 0.5, β = 0.25 : 0.3 : 1.5, κ = 2
White Speckles | smoothing factor k = 2, random walk length n = 10, speckle frequency p = 0.03 : 0.02 : 0.11
Table 3.2: Ranges of deformation parameters used in the tests: min:step:max.
The image with the random walks is subtracted from the original: an image with
white speckles at the random walk positions is obtained.
Therefore, p can be interpreted as the speckle frequency, n as a measure for the speckle size,
and k as a smoothing factor.
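The random-walk step of the white speckles model can be sketched as follows. This is a simplified illustration under several assumptions: images are represented as sets of (row, column) black pixels, the walk moves in the four axis directions, it is clamped at the image border, and the morphological closing with k is left out entirely.

```python
import random

# Assumed neighbourhood for the walk: right, left, down, up.
STEPS = ((0, 1), (0, -1), (1, 0), (-1, 0))


def random_walk_mask(black, height, width, p, n, rng):
    """Pixels visited by walks of n positions started at black pixels with probability p."""
    visited = set()
    for r, c in sorted(black):
        if rng.random() >= p:
            continue  # this black pixel is not a starting point
        for _ in range(n):
            visited.add((r, c))
            dr, dc = rng.choice(STEPS)
            r = min(height - 1, max(0, r + dr))
            c = min(width - 1, max(0, c + dc))
    return visited


def add_white_speckles(black, height, width, p, n, rng):
    """Subtract the random-walk image from the original set of black pixels."""
    return black - random_walk_mask(black, height, width, p, n, rng)
```

In the full model the random-walk image would first be smoothed by the closing operation, so real speckles are more compact blobs than the thin paths removed here.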
The range of deformation parameters in our test set is given in Table 3.2. The values were
restricted with the aim of obtaining realistic deformations. Even so, in real scores the
deformations generally do not occur in their pure form; a combination of several
deformations is more likely. However, because our objective is the evaluation of the detection
algorithms, it is more appropriate to study the performance under each isolated
deformation.
3.8.2 Ground-truth Information
In order to establish a qualitative comparison between the various staff line detection and
removal algorithms, ground-truth information is necessary. Hence the black pixels of the
images need to be labelled as belonging to staff lines or not. The manual process has two
disadvantages: it is very time consuming and it is error prone (e.g. the existence
of dubious pixels that may belong to the line or to the symbol that crosses the staff line).
Therefore, the ground-truth information of the synthetic images released by the authors of the
MusicStaves toolkit was used. Figure 3.8 displays the deformation process and the ground-truth
generation. In the code of the PostScript images (scores) the line-drawing macros
were removed and the resulting images were converted to one-bit raster images. In this
manner, real musical scores without lines can be obtained. The distortions are applied to the
bitmap images.
When we are dealing with handwritten music sheets we can only obtain the ground-truth
information manually, that is, we need to delete the symbols present in the score in order to
retain just the staff line segments.
Figure 3.8: Generation of the deformed images and the respective Ground-Truth.
In order to evaluate the performance of the staff line detection algorithm the scores with
only lines will be used as references, while the staff-less images will be used to test the staff
line removal algorithms. Note that it is possible to directly store the same references for the
deformed synthetic images through the skeleton generated in each distortion. However, once
again, for handwritten music scores this process must be manual.
3.9 Evaluation Metrics and Results
This section presents the error metrics used to evaluate the performance of the proposed
algorithm for the detection and removal of staff lines in comparison with the shortest path
algorithm^8 and the algorithms presented in [DDCF08]. However, since Dalitz's algorithm^9
performs significantly better than the other two proposed in [DDCF08], the comparison
results presented here are only for Dalitz's algorithm.
3.9.1 Detection Error Metrics
With the purpose of evaluating the performance of the stable path algorithm and studying its
behaviour with respect to the several deformations applied to the images, two different error
metrics were considered: the percentage of false positive staff lines and the percentage of
missed staff lines.
The process starts by computing the average Euclidean distance between each reference staff
line (given by the ground-truth information; see Subsection 3.8.2) and each staff line actually
detected. Next, the matching problem on the resulting bipartite graph is solved by minimizing
the assignment cost, that is, the distance^10. Only pairs with an average error distance below
stafflineheight were assumed to be correctly matched. The remaining pairs were assumed to
originate from a false positive staff line being matched to an undetected true staff line and were
therefore unmatched. In the end, the two metrics are the number of unmatched detected staff
lines, that is, false positives, and the number of unmatched reference staff lines, that is, false
negatives. It should be noted that these metrics only measure whether staff lines are found, not
how good the match is.
^8 Description of the algorithm in [CCRG08b].
^9 See Appendix A.3 for a brief description.
^10 See Appendix A.4 for a brief description of the method.
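The matching procedure can be illustrated with a small Python sketch. It is not the method of Appendix A.4: to stay self-contained it finds the minimum-cost assignment by brute force over permutations, which is only practical for a handful of lines. The function name, the argument layout and the return of raw counts (rather than percentages) are assumptions of this sketch.

```python
from itertools import permutations


def match_staff_lines(dist, n_ref, n_det, threshold):
    """Count unmatched lines after a minimum-cost one-to-one assignment.

    dist[i][j] is the average distance between reference line i and detected
    line j. Returns (false_positives, false_negatives) as counts.
    """
    if n_ref <= n_det:
        # Every reference line is assigned a distinct detected line.
        best = min(permutations(range(n_det), n_ref),
                   key=lambda p: sum(dist[i][p[i]] for i in range(n_ref)))
        pairs = [(i, best[i]) for i in range(n_ref)]
    else:
        # Every detected line is assigned a distinct reference line.
        best = min(permutations(range(n_ref), n_det),
                   key=lambda p: sum(dist[p[j]][j] for j in range(n_det)))
        pairs = [(best[j], j) for j in range(n_det)]
    # Pairs at or above the threshold (stafflineheight) are unmatched again.
    matched = [(i, j) for i, j in pairs if dist[i][j] < threshold]
    return n_det - len(matched), n_ref - len(matched)


# Two reference lines, three detected lines, threshold of 2 pixels.
example = match_staff_lines([[0.5, 9.0, 20.0], [8.0, 1.0, 12.0]], 2, 3, 2.0)
```

In this example both reference lines find a close partner, so the only error is the third detected line, which is counted as a false positive.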
Results
From the results presented in Table 3.3 we can conclude that the stable path based approach
(and the shortest path approach) outperforms the Dalitz algorithm^11 for both error metrics
defined. For some deformations there is no considerable difference between the stable path and
Dalitz algorithms. However, there are no cases where the proposed algorithm is at a
disadvantage with respect to the Dalitz algorithm.
Besides that, the performance of the stable path approach is almost independent of the intensity
of the deformation, for the range of values considered. This performance gain is even more
noteworthy as the Dalitz algorithm receives as input the correct number of lines per staff. Had
this not been the case, the difference between both would have been much larger. The current
implementation of the stable path algorithm runs as fast as the Dalitz algorithm (and about five
times faster than the shortest path version), as available in the Staff Removal Toolkit [DDCF07].
By generalizing the design of the weight function and enhancing the postprocessing, we were
able to improve the detection performance over the initial results in [CCRG08b]; the use of
stable paths improved the detection speed.
Since all the parameters of the method are scaled by one of two estimated values, stafflineheight
and staffspaceheight, a simple analysis of the sensitivity of the approach to these parameters
was done. We re-ran the experiments with values for stafflineheight and staffspaceheight
one pixel above and below the estimated values. In both cases, the stable path method
performed well and continued to yield the best results (the strongest degradation
occurred for the rotation deformation, with angle = 5°, where both error rates increased to 1.7%).
In a second, similar experiment we evaluated the staff line detection methods on a set of 50
real music scores, for which reference staff lines were manually outlined as described in
Subsection 3.8.2. Images were previously binarized with the Otsu threshold algorithm^12, as
implemented in the Gamera project^13. The evaluation of the detection algorithms yielded the
results presented in Table 3.4. Again, although the correct number of staff lines per staff
was provided to Dalitz's algorithm, the stable path approach obtained the best performance.
^11 For the deformations not shown, the stable path is not significantly better than Dalitz.
^12 See Appendix A.5 for a brief description of the method.
^13 http://gamera.sourceforge.net
rotation
Angle | -5 | -2.5 | 0 | 2.5 | 5 | Runtime
Stable path | 0.7 (3.5); 0.7 (3.5) | 0.7 (3.5); 0.7 (3.5) | 0.6 (3.5); 0.6 (3.5) | 0.7 (3.5); 0.7 (3.5) | 1.2 (4.0); 1.2 (4.0) | 858 sec.
Shortest path | 0.7 (3.5); 0.7 (3.5) | 0.7 (3.5); 0.7 (3.5) | 0.6 (3.5); 0.6 (3.5) | 0.7 (3.5); 0.7 (3.5) | 1.2 (4.0); 1.2 (4.0) | 6006 sec.
Dalitz | 17.8 (22.0); 51.7 (38.1) | 8.6 (14.0); 15.5 (28.7) | 0.0 (0.0); 0.0 (0.0) | 4.2 (19.6); 9.8 (29.0) | 5.5 (9.3); 37.5 (41.9) | 612 sec.
curvature
Amplitude/staffwidth | 0.02 | 0.04 | 0.06 | 0.08 | 0.10 | Runtime
Stable path | 0.7 (3.5); 0.7 (3.5) | 0.8 (3.6); 0.8 (3.6) | 0.7 (3.5); 0.7 (3.5) | 0.8 (3.6); 0.8 (3.6) | 1.2 (4.0); 1.2 (4.0) | 822 sec.
Shortest path | 0.7 (3.5); 0.7 (3.5) | 0.8 (3.6); 0.8 (3.6) | 0.7 (3.5); 0.7 (3.5) | 0.8 (3.6); 0.8 (3.6) | 1.2 (4.0); 1.2 (4.0) | 5554 sec.
Dalitz | 12.7 (29.0); 12.6 (28.7) | 53.8 (44.5); 55.7 (45.2) | 83.3 (30.6); 83.9 (30.4) | 95.8 (17.5); 100.0 (0.0) | 96.0 (17.5); 100.0 (0.0) | 639 sec.
white speckle
Rate whitened pixels | 0.03 | 0.05 | 0.07 | 0.09 | 0.11 | Runtime
Stable path | 0.7 (3.5); 0.7 (3.5) | 0.8 (3.6); 0.8 (3.6) | 0.9 (3.7); 0.9 (3.7) | 1.2 (3.8); 1.2 (3.8) | 2.1 (4.6); 2.3 (4.8) | 809 sec.
Shortest path | 0.7 (3.5); 0.7 (3.5) | 0.8 (3.6); 0.8 (3.6) | 0.9 (3.7); 0.9 (3.7) | 1.7 (4.0); 1.9 (4.3) | 5.3 (7.4); 7.0 (9.6) | 5122 sec.
Dalitz | 0.0 (0.0); 0.0 (0.0) | 0.3 (1.4); 0.3 (1.4) | 26.7 (25.3); 29.9 (27.2) | 89.3 (54.6); 86.9 (25.6) | 54.5 (55.9); 95.2 (17.0) | 872 sec.
line y-variation
Max deviation, n | 2 | 3 | 4 | 5 | 6 | Runtime
Stable path | 0.7 (3.5); 0.7 (3.5) | 0.7 (3.5); 0.7 (3.5) | 0.7 (3.5); 0.7 (3.5) | 0.8 (3.6); 0.8 (3.6) | 1.1 (3.8); 1.1 (3.8) | 767 sec.
Shortest path | 0.7 (3.5); 0.7 (3.5) | 0.7 (3.5); 0.7 (3.5) | 0.7 (3.5); 0.7 (3.5) | 0.8 (3.6); 0.8 (3.6) | 1.1 (3.8); 1.1 (3.8) | 5122 sec.
Dalitz | 31.0 (50.7); 31.7 (46.1) | 22.1 (38.9); 32.6 (45.6) | 15.7 (27.2); 33.7 (45.0) | 13.0 (20.1); 33.7 (45.0) | 12.8 (18.6); 34.2 (44.7) | 768 sec.
typeset emulation I
Max gap width, n_gap | 1 | 4 | 7 | 10 | 13 | Runtime
Stable path | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 739 sec.
Shortest path | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 0.6 (3.5); 0.6 (3.5) | 5085 sec.
Dalitz | 19.7 (34.1); 13.7 (21.6) | 21.4 (27.4); 15.4 (17.3) | 22.3 (30.0); 17.4 (19.0) | 24.2 (38.9); 16.7 (22.0) | 31.4 (42.3); 19.2 (20.3) | 703 sec.
typeset emulation II
Max vertical shift, n_shift | 1 | 4 | 7 | 10 | 13 | Runtime
Stable path | 0.6 (3.5); 0.6 (3.5) | 0.7 (3.8); 0.7 (3.8) | 0.7 (3.8); 0.7 (3.8) | 0.7 (3.8); 0.7 (3.8) | 0.7 (3.8); 0.7 (3.8) | 886 sec.
Shortest path | 0.6 (3.5); 0.6 (3.5) | 0.7 (3.8); 0.7 (3.8) | 0.7 (3.8); 0.7 (3.8) | 0.7 (3.8); 0.7 (3.8) | 0.7 (3.8); 0.7 (3.8) | 6106 sec.
Dalitz | 3.3 (10.4); 2.3 (7.3) | 15.9 (27.1); 12.0 (17.1) | 39.0 (48.7); 24.6 (27.3) | 48.7 (60.8); 29.3 (30.7) | 58.7 (54.9); 37.3 (29.0) | 842 sec.
Table 3.3: Effect of different deformations on the overall staff detection error rates in percentage: average (standard deviation) of the false detection rate and miss detection rate. See [DDCF08] for parameter details.
Algorithm | False detection rate | Miss detection rate | Runtime
Dalitz | 5.2% (10.4) | 5.9% (11.3) | 112 sec.
Shortest path | 1.4% (3.5) | 2.5% (7.3) | 612 sec.
Stable path | 1.3% (5.7) | 1.4% (6.4) | 115 sec.
Table 3.4: Detection performance on real music scores in percentage: average (standard deviation).
3.9.2 Removal Error Metrics
Staff line detection algorithms can be used as a first step in many staff removal algorithms.
To understand the potential of our algorithm to improve the performance of existing staff
removal algorithms, we conducted a series of experiments, comparing existing versions of staff
line removal algorithms with modified versions of them that use the stable path
algorithm at the staff line detection step. The quantitative comparison of the different algorithms
follows the comparison presented in [DDCF08]. Adopting the naming convention
from [DDCF08], the following algorithms were adapted: LineTrack Height, LineTrack Chord,
Roach/Tatem, as already described in Section 3.7.
The original version of the algorithms was considered as available in [DDCF07], making use of
the Dalitz algorithm in the detection phase. The modified versions use the stable path instead
for detecting lines. We also adopted the same error metrics (individual pixels, staff-segment
regions and staff interruption location) presented in [DDCF08] and conducted the comparative
study on the same test set (see Section 3.8).
Individual Pixels This metric considers the staff line removal as a two-class classification
problem at the pixel level, that is, one pixel can belong to a staff line or not. Therefore,
a natural performance measure is the error rate for this classification, given by
\[ \text{Pixel error rate} = \frac{x + y}{z} \]
x = Number of misclassified staff pixels
y = Number of misclassified non-staff pixels
z = Number of all black pixels
However, this metric has one problem: little information is given about how well the staff
removal algorithm separates symbols that are otherwise connected by staff lines; this error
only indicates how badly the symbols are distorted when compared to the ideal staff-less
images.
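A minimal Python sketch of the pixel error rate is given below. The representation of images as sets of (row, column) coordinates and the argument names are assumptions made for illustration; the sketch also assumes that the ground-truth staff pixels are a subset of the black pixels and that the image contains at least one black pixel.

```python
def pixel_error_rate(input_black, truth_staff, removed):
    """Pixel-level error rate (x + y) / z for one image.

    input_black: black pixels of the test image
    truth_staff: black pixels labelled as staff line in the ground truth
    removed:     pixels that the algorithm under test removed as staff line
    """
    x = len(truth_staff - removed)                  # staff pixels left in place
    y = len((removed & input_black) - truth_staff)  # symbol pixels wrongly removed
    z = len(input_black)                            # all black pixels
    return (x + y) / z


black = {(0, 0), (0, 1), (0, 2), (1, 1)}
staff = {(0, 0), (0, 1), (0, 2)}
rate = pixel_error_rate(black, staff, removed={(0, 0), (0, 1), (1, 1)})
```

Here one staff pixel is missed and one symbol pixel is removed by mistake, so two of the four black pixels are misclassified.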
Segmentation Region Level This error metric is based on the regions of the line segments.
The staff line removal can be considered as a segmentation problem where the staff line
segments need to be separated from the symbol segments.
In an OMR application the staff line segments are considered background and the
remaining symbols are taken as being the segments of interest. Nevertheless, when we
are trying to evaluate the quality of the staff line removal the situation is reversed: our
interest lies in the staff segments and the rest constitutes background.
Following the notation given in [TMD99], we have two segmentations for the set of black
pixels present in the test image:
1. The ground-truth segmentation $G = G_{obj} \cup \{g_{noise}\}$ with $G_{obj} = \{g_1, \ldots, g_M\}$.
2. The segmentation detected by the algorithm $S = S_{obj} \cup \{s_{noise}\}$ with $S_{obj} = \{s_1, \ldots, s_N\}$,
where each g_i and s_j contains the black pixels of a contiguous staff
segment, and g_noise and s_noise contain the remaining background black
pixels, respectively.
In the set of all staff line segments from both segmentations, equivalence classes of
overlapping segments are built (two segments are considered equivalent, $a \simeq b$, when a
sequence $c_1, c_2, \ldots, c_n$ exists with $c_1 = a$, $c_n = b$ and $c_i \cap c_{i+1} \neq \emptyset$). For each
equivalence class r we count the number of contained segments from G and from S and thus
detect recognition errors. All possible cases are listed in Table 3.5. The formula is given
by
\[ \text{Segmentation error rate} = \frac{x - y}{z} \]
x = Number of all classes r
y = Number of classes representing a correct recognition
z = Number of all classes r
Classes number | Segments from G_obj | Segments from S_obj | Error description
n1 | 1 | 1 | Correct
n2 | 1 | 0 | Missed segment
n3 | 0 | 1 | Falsely detected segment
n4 | 1 | >1 | Segment split
n5 | >1 | 1 | Segments merged
n6 | >1 | >1 | Both splitting and merging occurred
Table 3.5: Staff segment extraction errors based on the number of segments in an equivalence class r.
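The construction of the equivalence classes amounts to finding connected components of the overlap relation, which can be sketched with a small union-find structure. The code below is only an illustration under assumptions: segments are given as sets of (row, column) pixels, the noise segments are ignored, at least one segment exists, and the function name is hypothetical.

```python
def segmentation_error_rate(truth_segments, found_segments):
    """Fraction of equivalence classes that are not a one-to-one match."""
    segments = ([("G", s) for s in truth_segments]
                + [("S", s) for s in found_segments])
    parent = list(range(len(segments)))

    def find(i):
        # Follow parent links to the class representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Overlapping segments are linked; the transitive closure gives the classes.
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if segments[i][1] & segments[j][1]:
                parent[find(i)] = find(j)

    # Count, per class, how many segments come from G and how many from S.
    counts = {}
    for i, (origin, _) in enumerate(segments):
        g, s = counts.get(find(i), (0, 0))
        counts[find(i)] = (g + 1, s) if origin == "G" else (g, s + 1)

    classes = len(counts)
    correct = sum(1 for c in counts.values() if c == (1, 1))
    return (classes - correct) / classes


truth = [{(0, 0), (0, 1)}, {(0, 5), (0, 6)}, {(2, 0)}]
found = [{(0, 1)}, {(0, 5)}, {(0, 6)}, {(9, 9)}]
error = segmentation_error_rate(truth, found)
```

The example yields four classes: one correct match, one split segment, one missed segment and one falsely detected segment, so three of the four classes count as errors.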
Staff Line Interruption To obtain this metric, the ground-truth image,
containing only the segments removed in the ideal case, is compared with the image that
contains exactly the pixels that were in fact removed by the removal algorithm under
evaluation. To do so, each staff line is followed, from left to right, in the
images containing only the removed staff segments, and interruptions in the staff line are
recorded. Each interruption represents a detected music symbol that crosses the staff
line. This yields two sets of intervals: the interrupting intervals $G = \{g_1, \ldots, g_M\}$ in the
ground-truth data and those in the algorithm output, $S = \{s_1, \ldots, s_N\}$.
To establish an error metric, a bipartite graph is created by adding links
between intervals g_i and s_j that overlap. In this manner, two types of error are revealed:
intervals from G or S without a link and intervals with more than one link. In order to
count the number of errors of the second type, the maximum cardinality matching in this
graph is computed [Gal86]. This matching also removes the minimal number
of links leading to the second type of error. The resulting error rate is
\[ \text{Staff line interruption error rate} = \frac{\min\{n_3,\; n_1 + n_2\}}{n_3} \]
n1 = Number of interruptions without link
n2 = Number of removed links
n3 = Number of ground-truth interruptions
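The computation can be sketched in Python with a simple augmenting-path matching. This is an illustration under stated assumptions: intervals are (start, end) pairs with inclusive ends, there is at least one ground-truth interruption, and the number of removed links n2 is read as the number of links that are not part of the maximum cardinality matching. The function name is hypothetical.

```python
def interruption_error_rate(truth, found):
    """Staff line interruption error rate for one staff line."""
    # Links of the bipartite graph: ground-truth interval i overlaps output interval j.
    links = {i: [j for j, (s0, s1) in enumerate(found) if g0 <= s1 and s0 <= g1]
             for i, (g0, g1) in enumerate(truth)}
    match_of_found = {}

    def augment(i, seen):
        # Try to match ground-truth interval i, re-routing earlier matches if needed.
        for j in links[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in match_of_found or augment(match_of_found[j], seen):
                match_of_found[j] = i
                return True
        return False

    matching = sum(augment(i, set()) for i in links)
    total_links = sum(len(js) for js in links.values())
    linked_found = {j for js in links.values() for j in js}

    n1 = sum(1 for js in links.values() if not js) + len(found) - len(linked_found)
    n2 = total_links - matching
    n3 = len(truth)
    return min(n3, n1 + n2) / n3


truth = [(10, 12), (30, 35), (50, 52), (60, 61)]
found = [(11, 13), (29, 31), (33, 36), (50, 52), (60, 62)]
error = interruption_error_rate(truth, found)
```

In the example the second ground-truth interruption overlaps two output intervals, so one link has to be removed and the error rate is one in four.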
Results
From the results obtained we can conclude that the qualitative effects of the deformations
and of the insertion of the stable path as the detection algorithm are similar for all three error
metrics. In Tables 3.6, 3.7, 3.8 and 3.9, we present the results for all three error metrics
(individual pixels, staff-segment regions and staff interruption location) for the modified
LineTrack Height algorithm (LTH), plus the Skeleton algorithm, which exhibited a competitive
performance in [DDCF08]. This algorithm does not clearly separate the staff line detection
from the staff line removal, doing both tasks simultaneously. So, this algorithm was used
alone, without the detection performed by the stable path algorithm.
Metric | Stable path + LTH | Dalitz + LTH | Skeleton
Pixel Error Rate | 2.8 (1.2) | 3.8 (2.6) | 6.5 (8.2)
Segmentation Error Rate | 0.3 (0.1) | 0.3 (0.1) | 0.3 (0.1)
Staff Line Interruption Error Rate | 0.3 (0.1) | 0.3 (0.1) | 0.3 (0.1)
Table 3.6: Removal performance on real music scores (in percentage): average (standard deviation).
A first observation is that, overall, replacing the Dalitz method with the stable path
approach as the staff detection step improved the results of the algorithms under comparison.
Additionally, the LineTrack Height algorithm with the stable path consistently outperformed
the other algorithms. Nevertheless, the skeleton method [DDCF08], which does not have a clear
line detection step, continues to present a competitive performance. Finally, it is worth
noting that the skeleton algorithm is about two times slower than the modified LineTrack
Height algorithm.
Figure 3.9 shows some results obtained when we apply the LineTrack Height
algorithm with the stable path approach to our test set.
rotation
Angle | -5 | -2.5 | 0 | 2.5 | 5
Stable path + LTH | 1.7 (0.7) | 1.5 (0.7) | 1.4 (0.7) | 1.4 (0.7) | 1.6 (0.7)
Dalitz + LTH | 19.4 (18.4) | 5.2 (8.7) | 1.4 (0.8) | 4.4 (8.8) | 17.5 (18.9)
Skeleton | 1.9 (0.9) | 1.7 (0.8) | 1.5 (0.7) | 1.6 (0.7) | 1.7 (0.8)
curvature
Amplitude/staffwidth | 0.02 | 0.04 | 0.06 | 0.08 | 0.10
Stable path + LTH | 1.4 (0.7) | 1.4 (0.7) | 1.4 (0.7) | 1.5 (0.7) | 1.6 (0.7)
Dalitz + LTH | 3.8 (5.8) | 14.0 (12.2) | 22.8 (13.7) | 31.1 (11.0) | 35.0 (10.6)
Skeleton | 2.6 (2.4) | 5.2 (5.1) | 8.1 (7.2) | 11.9 (8.6) | 15.4 (10.4)
white speckle
Rate whitened pixels | 0.03 | 0.05 | 0.07 | 0.09 | 0.11
Stable path + LTH | 11.9 (3.1) | 17.2 (4.9) | 21.1 (5.9) | 24.0 (6.7) | 26.1 (7.2)
Dalitz + LTH | 11.5 (3.2) | 16.8 (4.9) | 26.7 (8.0) | 53.3 (14.9) | 73.3 (14.6)
Skeleton | 14.6 (3.2) | 21.5 (4.6) | 27.1 (5.6) | 35.2 (12.8) | 46.9 (18.7)
line y-variation
Max deviation, n | 2 | 3 | 4 | 5 | 6
Stable path + LTH | 1.2 (0.7) | 1.3 (0.7) | 1.3 (0.6) | 1.4 (0.6) | 1.4 (0.6)
Dalitz + LTH | 9.0 (13.2) | 10.4 (14.1) | 10.9 (14.5) | 10.9 (14.5) | 11.0 (14.6)
Skeleton | 1.5 (0.8) | 1.7 (0.8) | 2.2 (0.9) | 3.7 (1.7) | 5.2 (2.2)
typeset emulation I
Max gap width, n_gap | 1 | 4 | 7 | 10 | 13
Stable path + LTH | 1.4 (0.7) | 1.4 (0.7) | 1.4 (0.7) | 1.4 (0.7) | 1.4 (0.7)
Dalitz + LTH | 2.6 (1.8) | 2.9 (2.0) | 3.2 (1.7) | 2.9 (1.7) | 3.0 (1.8)
Skeleton | 26.4 (9.8) | 27.3 (10.1) | 27.2 (11.3) | 25.5 (9.8) | 26.4 (10.3)
typeset emulation II
Max vert. shift, n_shift | 1 | 4 | 7 | 10 | 13
Stable path + LTH | 1.4 (0.7) | 1.4 (0.7) | 1.4 (0.7) | 1.5 (0.7) | 1.6 (0.7)
Dalitz + LTH | 1.5 (0.8) | 2.8 (1.6) | 3.3 (2.5) | 3.8 (2.4) | 4.7 (3.7)
Skeleton | 7.9 (8.9) | 24.1 (9.1) | 26.7 (11.0) | 26.1 (9.6) | 29.1 (10.7)
Table 3.7: Effect of different deformations on the overall staff removal error rates in percentage: average (standard deviation). Individual pixels error.
rotation
Angle | -5 | -2.5 | 0 | 2.5 | 5
Stable path + LTH | 0.3 (0.2) | 0.2 (0.2) | 0.2 (0.2) | 0.2 (0.2) | 0.3 (0.2)
Dalitz + LTH | 0.6 (0.3) | 0.3 (0.3) | 0.2 (0.2) | 0.3 (0.3) | 0.5 (0.3)
Skeleton | 0.3 (0.2) | 0.2 (0.2) | 0.2 (0.2) | 0.2 (0.2) | 0.3 (0.2)
curvature
Amplitude/staffwidth | 0.02 | 0.04 | 0.06 | 0.08 | 0.10
Stable path + LTH | 0.2 (0.2) | 0.2 (0.2) | 0.2 (0.2) | 0.3 (0.2) | 0.3 (0.2)
Dalitz + LTH | 0.3 (0.2) | 0.4 (0.2) | 0.5 (0.2) | 0.6 (0.2) | 0.7 (0.1)
Skeleton | 0.6 (0.2) | 0.3 (0.2) | 0.3 (0.2) | 0.3 (0.2) | 0.4 (0.2)
white speckle
Rate whitened pixels | 0.03 | 0.05 | 0.07 | 0.09 | 0.11
Stable path + LTH | 0.2 (0.1) | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.04) | 0.1 (0.04)
Dalitz + LTH | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.6 (0.2) | 0.9 (0.2)
Skeleton | 0.3 (0.1) | 0.3 (0.1) | 0.3 (0.1) | 0.3 (0.1) | 0.4 (0.2)
line y-variation
Max deviation, n | 2 | 3 | 4 | 5 | 6
Stable path + LTH | 0.2 (0.2) | 0.3 (0.2) | 0.3 (0.2) | 0.3 (0.2) | 0.4 (0.2)
Dalitz + LTH | 0.5 (0.4) | 0.5 (0.4) | 0.5 (0.4) | 0.5 (0.3) | 0.5 (0.3)
Skeleton | 0.2 (0.2) | 0.3 (0.2) | 0.3 (0.2) | 0.3 (0.2) | 0.3 (0.2)
typeset emulation I
Max gap width, n_gap | 1 | 4 | 7 | 10 | 13
Stable path + LTH | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1)
Dalitz + LTH | 0.1 (0.1) | 0.1 (0.1) | 0.2 (0.1) | 0.1 (0.1) | 0.2 (0.1)
Skeleton | 0.6 (0.1) | 0.6 (0.1) | 0.6 (0.1) | 0.6 (0.1) | 0.6 (0.1)
typeset emulation II
Max vert. shift, n_shift | 1 | 4 | 7 | 10 | 13
Stable path + LTH | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1)
Dalitz + LTH | 0.1 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1)
Skeleton | 0.3 (0.2) | 0.6 (0.1) | 0.6 (0.1) | 0.6 (0.1) | 0.7 (0.1)
Table 3.8: Effect of different deformations on the overall staff removal error rates in percentage: average (standard deviation). Staff line interruption error.
rotation
Angle | -5 | -2.5 | 0 | 2.5 | 5
Stable path + LTH | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1)
Dalitz + LTH | 0.6 (0.3) | 0.3 (0.3) | 0.2 (0.2) | 0.3 (0.3) | 0.5 (0.3)
Skeleton | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1)
curvature
Amplitude/staffwidth | 0.02 | 0.04 | 0.06 | 0.08 | 0.10
Stable path + LTH | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.3 (0.1) | 0.3 (0.1)
Dalitz + LTH | 0.3 (0.2) | 0.5 (0.2) | 0.6 (0.2) | 0.8 (0.1) | 0.8 (0.1)
Skeleton | 0.2 (0.1) | 0.3 (0.1) | 0.3 (0.1) | 0.4 (0.2) | 0.5 (0.2)
white speckle
Rate whitened pixels | 0.03 | 0.05 | 0.07 | 0.09 | 0.11
Stable path + LTH | 0.6 (0.1) | 0.5 (0.1) | 0.5 (0.1) | 0.5 (0.1) | 0.4 (0.1)
Dalitz + LTH | 0.6 (0.1) | 0.5 (0.1) | 0.6 (0.1) | 0.8 (0.1) | 0.9 (0.1)
Skeleton | 0.6 (0.1) | 0.6 (0.1) | 0.6 (0.1) | 0.6 (0.1) | 0.7 (0.2)
line y-variation
Max deviation, n | 2 | 3 | 4 | 5 | 6
Stable path + LTH | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.3 (0.1)
Dalitz + LTH | 0.5 (0.3) | 0.5 (0.3) | 0.5 (0.3) | 0.5 (0.3) | 0.5 (0.3)
Skeleton | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.3 (0.1) | 0.3 (0.1)
typeset emulation I
Max gap width, n_gap | 1 | 4 | 7 | 10 | 13
Stable path + LTH | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1)
Dalitz + LTH | 0.1 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1)
Skeleton | 0.6 (0.1) | 0.6 (0.1) | 0.6 (0.1) | 0.6 (0.1) | 0.6 (0.1)
typeset emulation II
Max vert. shift, n_shift | 1 | 4 | 7 | 10 | 13
Stable path + LTH | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1) | 0.1 (0.1)
Dalitz + LTH | 0.1 (0.1) | 0.1 (0.1) | 0.2 (0.1) | 0.2 (0.1) | 0.2 (0.1)
Skeleton | 0.3 (0.2) | 0.6 (0.1) | 0.6 (0.1) | 0.6 (0.1) | 0.7 (0.1)
Table 3.9: Effect of different deformations on the overall staff removal error rates in percentage: average (standard deviation). Segmentation region level.
Figure 3.9: Examples of the results obtained using the staff line removal algorithm in our test set.
Part III
Segmentation and Classification
Chapter 4
Segmentation and Classification Process
This chapter begins with a description of the different music symbols that can be found in a
music score. It then proceeds with the description of the segmentation and classification steps.
Music notation, like any other written language, did not appear as the invention of one person.
It emerged from the combined and prolonged efforts of hundreds of musicians. They all hoped
to express by written symbols the essence of their musical ideas [Rea69]. Improvement in those
symbols came about because it was necessary. Hence, musical notation is very extensive if
we consider all the existing possibilities and their variations over time. The resulting notation is
a kind of alphabet, shaped by a general consensus of opinion to serve as a general expressive
technique. Music notation is, thus, the visual manifestation of the interrelated properties of
musical sound (such as pitch, intensity, time, timbre and pace). Symbols indicating the choice of
tones, their duration and their manner of performance are important because they form this
written language that we call music notation.
Several symbols can be confused with each other because of their similar shapes. Besides
that, in some cases, complex symbols can appear in the score, for instance the guitar
symbols (diagrams with the positions of the notes on the instrument). Working with
handwritten music scores brings several additional problems. The most evident is the
variation in how each person writes the symbols. In this work we
decided to study and recognize standard music notation. Consequently, specific notation
variants, such as drum notation, will not be recognized. Furthermore, since tablature
is specific to certain instruments, it will also not be recognized. The following
section describes the symbols that we treat in this thesis.
4.1 Musical Symbols
Clefs The term clef is derived, logically enough, from the Latin word clavis, meaning key.
The three clef signs in common use today had their origins in pitch letters: the earliest
form was the Latin letter F, related to the tone f (fa); when a second line was later
drawn above the text over the F, the letter C was affixed upon it, representing the pitch
of c' (do); the last letter was G, which represents the tone g' (sol). The F clef became
the bass clef because of its lower register; the C clef became either the tenor or the alto clef; and
the G clef became the soprano or treble clef. The clef symbols are the first symbols that
appear at the beginning of every music staff. They are very important because they tell us
which note is found on each line or space.
Table 4.1: Clefs.
Treble clef
Alto clef
Bass clef
The modern bass clef is used for all voices and instruments of essentially a bass register;
for instance baritone voices, keyboard and orchestra instruments such as piano and organ.
The alto clef is used for English horn, trombones and violas. It is important to stress
that none of these instruments uses the alto clef exclusively; only the viola employs
it with consistent regularity. The treble clef is the highest of all the clef signs, beginning
below the staff and extending above it [Rea69]. All instruments and voices of high range
employ this symbol, for instance lyric soprano, orchestra instruments such as piccolo and
flute, English horn, saxophone, among others.
Noteheads and stems A musical note indicates two important aspects. On the one hand,
the symbol indicates, by its position on the staff and by the clef used, a pitch to be played
or sung. On the other hand, the note establishes, by the exact appearance of its three
integral parts (notehead, stem and flag), the relative time duration of this musical sound.
Table 4.2: Notes.
Minim
Crotchet
The notehead is somewhat oval in shape, and is either open (minim) or closed (crotchet).
The stem is the thin vertical line joined to the side of the notehead (closed or open).
When the stem goes down, it is placed on the left side of the notehead; when the stem
goes up, it is put on the right side of the notehead. The stem direction is determined
as follows: when the notehead is above the centre line of the staff, the stem goes down
(see Figure 4.1(a)); when the note is below the middle line of the staff, the stem goes up
(see Figure 4.1(b)); and when the note is centred on the staff, the stem may go in either
direction (see Figure 4.1(c)), although it is more common practice to draw it down. The
stem length is proportional, that is, it is measured in terms of the note position on the staff.
(a) Stem down. (b) Stem up. (c) Note positioned in the centre of the staff: the stem may go in either direction.
Figure 4.1: Example.
Flags and beams Flags are employed to indicate the relative time values of the notes with
black (closed) noteheads. These symbols consist of a thin curved stroke that joins
the end of the stem and always goes to the right of it, regardless of whether
the stem goes up or down. When a dot follows the notehead, the end of the curved line is
often shortened so as not to conflict with the dot. On a downward stem the curved line
barely touches the bottom of the notehead.
Table 4.3: Flags and beams.
Quaver
Semiquaver
Demisemiquaver
Hemidemisemiquaver
Beams are compound flags used to connect notes in note-groups. The number of beams
always equals the number of flags appropriate for each note. These symbols show
the metrical and the rhythmic divisions within the measure and also the note-groupings
in opposition to the normal patterns of the measure. Beams are considered in two
categories: primary (beams that link an entire group, without breaks) and secondary
(beams that are interrupted or partially broken); see Figure 4.2. The position of the
beams follows the same general principles that govern stem direction. That is, if the
majority of stems normally go up in the group to be beamed, then all the stems will go
up and the beams will be placed above the notes; on the other hand, if most of the notes
require a downward stem (most of the noteheads lie above the staff centre), then the
stems of all the notes must be written down and the beams placed below the noteheads.
The direction of all beams should be approximately parallel to the main body of the notes
they connect.
Figure 4.2: Example.
Rests Rests were invented to indicate the exact duration of silence in the music. Therefore,
each note value has its corresponding rest sign. The written position of a rest between
two barlines is determined by its location in the meter, that is, a rest always occupies
the same position in a measure that an equivalent note value would fill, except the whole
rest, which is always centred.
The whole rest is a solid, oblong symbol placed just beneath the fourth line of the
staff (regardless of the clef used). It is incorrect to place it elsewhere unless two voices
or instruments share a common staff. When this happens, the whole rest is usually written
beneath the top line for the upper part, and beneath the bottom line for the lower part.
The half rest is an oblong mark of the same length as the whole rest, located on the third
staff line. As with the whole rest, it is incorrect to place this symbol on any other line unless
two voices or instrumental parts share the staff. In this case, the half rest goes above the
top line for the upper part, and above the bottom line for the lower part.
The quarter rest is the most difficult symbol to write. The sign is centred on the staff,
with the top tail ending in the fourth space and the bottom hook resting in the first space.
When two parts share a common staff, this rest goes somewhat above the staff centre for
the upper part, and just below it for the lower part.
Table 4.4: Rests.
Rest Note
Whole rest
Half rest
Quarter rest
Eighth rest
Sixteenth rest
Thirty-second rest
Sixty-fourth rest
The eighth rest has half of the value of the quarter rest. This symbol is written as a
slanting stem with a single hook. The stem rests on the second staff line and the
hook occupies the third space.
The sixteenth rest has half of the value of the eighth rest. This symbol adds another hook
to the slanting stem. This second hook occupies the second staff space and the stem must
be extended to touch the bottom line.
The thirty-second rest has half of the value of the sixteenth rest. It consists of three
hooks and a further extended stem. The three hooks occupy the second, third, and fourth
spaces, with the stem resting on the bottom line.
The sixty-fourth rest has half of the value of the thirty-second rest. It is an elaborate
structure with its four hooks. Each hook must occupy one of the four staff spaces, and
the slanting stem must accordingly be extended to a distance of one space below the
staff.
Ties and slurs Ties are a notational device used to prolong the time value of a written note
into the following beat. This symbol is a curved line that connects two successive
noteheads indicating, together, the total time value desired. The tie appears to be identical
to the slur; however, while the tie almost touches the notehead centre, the slur is set somewhat
above or below the notehead. Besides that, ties are normally employed to join the time
value of two notes of identical pitch, while slurs have a totally different function. The tie
always loops in the direction opposite to that of the stem. In other words, when the tied notes
have their stems going up, the tie loops below the noteheads (see Figure 4.3(b));
when the two stems go down, the tie is curved above the noteheads (see Figure 4.3(a)).
Slurs affect note-groups as entities, indicating that the two notes are to be played in one
physical stroke, without a break between them. When the slur is used in conjunction with
tied notes, it may be placed over or under the noteheads tied together, but the tie always
loops according to the note position on the staff.
Table 4.5: Ties and Slurs.
Tie
Slur
(a) (b)
Figure 4.3: Example: a) Tie above noteheads; b) Tie under noteheads.
Accidentals The signs that are placed before the note to designate changes in sounding pitch
are called accidentals. These signs are five in number: the natural, the flat, the sharp,
the double flat and the double sharp. However, in this work, only the first three symbols
will be treated.
The flat ♭ is literally the rounded small letter b found in Guido's soft hexachord on
f. In modern practice, the shape of the flat is somewhat more slanted upward than the
ordinary small letter b. When the flat is placed on a line, the top of the stem extends to
the second line above. When the symbol is notated in a space, the stem extends to the
centre of the second space above.
The form of the natural sign ♮ has its origin in the square b found in Guido's hard
hexachord. This sign represented the raised form of the pitch b. By adding a short tail
to the square b, and slightly slanting the horizontal strokes, we get the natural sign. The
two tails of this symbol are to be extended quite as far on the staff as the stem of the
flat. When the sign is on a line, the upper tail of the natural extends only to the next
higher space and the lower tail to the next lower space. If the sign is in a space, the tails
go respectively to the adjacent lines above and below.
The sharp ♯ was written in a variety of ways, beginning with what looks like and continuing
with modified forms of the St. Andrew's cross. Nowadays, it resembles more closely an
elaborated natural sign with an extension of both vertical lines and the horizontal dashes.
Accents Accent marks are symbols for special or exaggerated stress upon any beat, or portion
of a beat. These indications are sometimes called ancillary signs because they are not
such basic symbols as the staff, clefs and notes. The accents are divided into two categories:
those for percussive attack (higher dynamic levels) and those for pressure attack (lower
levels). The two principal signs for percussive accent are the Accent and the Marcato (see Table 4.6).
The first symbol is made identically whether it is placed above or below the notehead.
The second sign is inverted when it is placed below any notehead, stem or beam. Choice
between the two is governed by the degree of force and intensity desired. While the Marcato
sign is for a stronger attack, the Accent sign is for a moderately sharp attack.
The principal symbol for pressure accent is a short, heavy dash placed over or below the
notehead (see Table 4.6, Tenuto). In our work only the Accent and Staccatissimo accents
were used, to facilitate the classification process.
Table 4.6: Accents.
Percussive accents
Accent Marcato
Pressure accents
Staccatissimo Tenuto
Time signatures Meter is a recurring pattern of stress, and an established arrangement of
strong and weak pulsations. These pulsations are also known as beats. Meter is inextricably
related to time signatures. A time signature is the vertical arrangement of two figures
placed on the staff following the clef sign and any key signature. When placed on the
staff, in the normal position, the numerator, which indicates the number of beats or the
number of inner pulsations, occupies the two top spaces and the denominator, which
indicates the unit of time, occupies the two bottom spaces. Despite these terms, a
time signature is not to be regarded as a fraction. There are several possible symbols for
the representation of time signatures. Some are fairly common in practice, some are very
rare. In this work we chose the common time and the cut time.
Table 4.7: Time signatures.
Common time
Cut time
4.2 Segmentation Process
The segmentation process used in this work is based on existing algorithms [Fuj04,
BBN01]. This step is hard for several reasons. One of the main problems is
the variability of the symbols. This is obvious when we consider handwritten music scores
(see Figure 4.4(a)). However, this variability is also found in printed scores, when we have scores
from different editors (see Figure 4.4(b)). This variability causes inconsistency
in the size and shape of each object.
Besides that, the number of possible arrangements of symbol primitives is very high. This
brings complexity and ambiguity to the segmentation process, because we do not know all the
probable groups of symbols. Another problem is related to the staff line removal approach.
Staff lines connect most of the objects, and when we eliminate them we may cause breaks in
the music symbols. These breaks introduce undesirable imperfections that make the segmentation
process more difficult. Finally, a high density of symbols in the music score also does not help in this phase.
With this brief description of the several problems that we can find in this step, we may conclude
that not all problems can be solved at this level, and the imperfections of the
segmentation have to be taken into account during the next processing steps.
(a) Variability in the real scores.
(b) Variability between different publishers in printed scores.
Figure 4.4: An illustrative example of the variability in the music symbols.
The segmentation process architecture is presented in Figure 4.5 and an example of the process
on a real music score is presented in Figure 4.6. This process consists in localizing and isolating
the symbols in order to identify them. In this work, the symbols we want to recognize can
be split into four different types: the symbols that are featured by a vertical segment with
height greater than a threshold (notes, notes with flags and open notes), the symbols that link
the notes (beams), the remaining symbols connected to staff lines (clefs, rests, accidentals and
time signature), and the symbols above and under staff lines (notes, ties, slurs and accents).
Figure 4.5: Segmentation process I.
Figure 4.6: Segmentation process II.
As a result, the segmentation method proposed was based on a hierarchical decomposition
of the music image. The music sheet was therefore analyzed and split by staffs, after the
staff line removal, as the first step. Subsequently, the connected components1
were identified. To extract the symbols with appropriate size, a selection of the connected
components detected in the previous step was done. The thresholds used for the height and
width of the symbols were experimentally chosen as staffspaceheight and 16×staffspaceheight
for the height and staffspaceheight for the width. These values took into account the features
of the music symbols. The bounding box of a connected component can contain repeated objects:
for example, when we have a group of notes and a sharp, there are two distinct connected
components (notes and sharp), but the bounding box extracted for the notes also
includes the sharp (see Figure 4.7). Thus, it was necessary to remove them to
avoid multiple extractions of the same object. In the end, we are ready to find and extract all
the music symbols. Next we will present the methods used to extract the symbols.
1 As Figure 4.6 shows, some symbols that are not physically linked are treated as connected. This is because our connected component method uses a threshold of 5 pixels for the distance between objects. This is necessary because some music symbols became separated in the staff line removal phase.
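The connected component step with a distance tolerance can be sketched as follows. This is only an illustration, not the implementation used in this work: the function names, the representation of the image as a list of rows of 0/1 values, and the choice of measuring the 5-pixel threshold as a Chebyshev distance between pixel coordinates are all assumptions.

```python
from collections import deque


def label_components(image, gap=5):
    """Group black pixels (value 1) into components.

    Two black pixels belong to the same component when their row and
    column indices differ by at most `gap` (assumed distance measure).
    With gap=1 this reduces to ordinary 8-connectivity.
    """
    black = {(r, c) for r, row in enumerate(image)
             for c, value in enumerate(row) if value}
    components = []
    while black:
        seed = black.pop()
        queue = deque([seed])
        component = [seed]
        while queue:
            r, c = queue.popleft()
            for dr in range(-gap, gap + 1):
                for dc in range(-gap, gap + 1):
                    p = (r + dr, c + dc)
                    if p in black:
                        black.remove(p)
                        queue.append(p)
                        component.append(p)
        components.append(component)
    return components


def bounding_box(component):
    """Return (top, left, bottom, right) of a list of (row, col) pixels."""
    rows = [r for r, _ in component]
    cols = [c for _, c in component]
    return min(rows), min(cols), max(rows), max(cols)
```

With a tolerance of one pixel, two fragments separated by a few white columns are reported as two components; with the larger tolerance they are merged, which is the behaviour needed for symbols broken by staff line removal.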
Beam Detection
(a) Original image. (b) Image with two connected components, in different shades of gray.
Figure 4.7: An illustrative example of two different connected components in the same bounding box.
It is reasonable to think that beams are among the symbols that are hardest to detect.
The variability inherent in their use in the musical score is the principal reason for this
difficulty. Their shape and size, and the way they connect to each other and to other symbols,
are combined in a huge number of different ways. They are also prone to inconsistency
in thickness and in the link with the stems (see Figure 4.8). Thus, we propose a solution
that just checks the presence of a segment of adequate height which connects the extremities
of notes (see Figure 4.9). Firstly, in the connected component detected, the stems are found
and removed to separate the beams from the notes. Among the remaining objects
we select those that have a width greater than 2×staffspaceheight−stafflineheight and a height
lower than 4×staffspaceheight. These values were selected experimentally and in accordance
with the geometric features of the object in question. The height value needs to take into account
the inclination that the beams may have. In the end, a check of the presence of a stem is
made to validate the objects selected. It is important to state that the beams are grouped if
the distance between them is lower than half a staffspaceheight.
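The size rule and the grouping rule for beams can be sketched as below. This is a hedged illustration only: the function names and the (x0, y0, x1, y1) box format are hypothetical, the width bound is read as (2 × staffspaceheight) − stafflineheight, and the "distance between beams" is assumed to be the larger of the horizontal and vertical gaps between bounding boxes, which the text does not specify.

```python
def is_beam_candidate(width, height, staffspaceheight, stafflineheight):
    """Size test for a stem-free object to be kept as a beam candidate."""
    min_width = 2 * staffspaceheight - stafflineheight
    max_height = 4 * staffspaceheight
    return width > min_width and height < max_height


def box_gap(a, b):
    """Gap between two boxes (x0, y0, x1, y1); 0 when they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def group_beams(boxes, staffspaceheight):
    """Group beam boxes closer than half a staff space to each other."""
    limit = staffspaceheight / 2
    groups = []
    for box in boxes:
        merged, rest = [box], []
        for group in groups:
            if any(box_gap(box, member) < limit for member in group):
                merged.extend(group)   # box joins this group
            else:
                rest.append(group)     # group is left untouched
        groups = rest + [merged]
    return groups
```

For a staff space of 20 pixels and a staff line of 3 pixels, an object must be wider than 37 pixels and lower than 80 pixels to be kept, and two beams less than 10 pixels apart are treated as one group.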
Figure 4.8: An example of inconsistency in the beam thickness and in the link with the stems.
Figure 4.9: Beam segmentation process.
Notes, Notes with Flags and Open Notes Detection
In this work, we decided to consider stems and note heads as one. In this way, we defined the
geometric features of the notes we want to extract as the objects with a height greater than a
threshold, experimentally selected as twice staffspaceheight, and a width limited by two values,
also experimentally chosen as half staffspaceheight and 3×staffspaceheight. To simplify this task,
the beams detected were removed before applying this algorithm (see Figure 4.10).
Figure 4.10: Notes segmentation process.
Accidentals, Rests, Accents and Time Signature Detection
Generally, these symbols have similar values for width and height. The procedure used to extract
them was based on a combination of X-Y projection profile techniques2 (see Figures 4.11
and 4.12). The heuristics used were chosen experimentally and their values are arguably sensible.
On the one hand, we have symbols that have a vertical sequence of black pixels, for instance
sharps, naturals and rests. On the other hand, we need to take into account the symbols'
topological position, because in this case we are trying to detect accents and time signatures.
Figure 4.11: An example of sharp detection on a real score.
2 The horizontal (Y) projection is a vector whose component i is the sum of all black pixels in row i of the image; the vertical (X) projection is a vector whose component j is the sum of all black pixels in column j of the image.
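The two projections defined in the footnote can be computed directly. The sketch below assumes a binary image stored as a list of rows in which black pixels are 1; the function name is hypothetical.

```python
def projections(image):
    """Return (horizontal, vertical) projection profiles of a 0/1 image.

    horizontal[i] is the number of black pixels in row i (Y projection);
    vertical[j] is the number of black pixels in column j (X projection).
    """
    horizontal = [sum(row) for row in image]
    vertical = [sum(column) for column in zip(*image)]
    return horizontal, vertical


if __name__ == "__main__":
    symbol = [[0, 1, 1],
              [0, 1, 0]]
    print(projections(symbol))  # ([2, 1], [0, 2, 1])
```

A column whose vertical projection is close to the symbol height indicates a long vertical run of black pixels, which is the kind of cue that separates sharps, naturals and rests from accents.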
Figure 4.12: An example of sharp detection on an ideal score.
Clefs, Ties and Slurs Detection
These symbols have their own attributes, like a large width for the ties and slurs and a greater
height for the clefs. Neither of them has stems. With this in mind,
the projection profiles procedure was used with specific heuristics. For clefs, the constraints
staffspaceheight < width < 4×staffspaceheight and
2×staffspaceheight < height < 2×numberstaffspace×staffspaceheight + numberstaffline×stafflineheight
were used in the experiments. These values
take into account the fact that clef symbols are the largest of all the signs, beginning below
the staff and extending above it, as we said in Section 4.1. On the other hand, for the
relation symbols (ties and slurs), the rules for extracting them were based on a large width.
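As a worked illustration of the clef constraints, the sketch below evaluates both inequalities for a given bounding box. The function name is hypothetical, and the default of 4 spaces and 5 lines assumes a standard five-line staff.

```python
def clef_size_ok(width, height, staffspaceheight, stafflineheight,
                 numberstaffspace=4, numberstaffline=5):
    """Check the width and height constraints stated for clef candidates."""
    max_height = (2 * numberstaffspace * staffspaceheight
                  + numberstaffline * stafflineheight)
    width_ok = staffspaceheight < width < 4 * staffspaceheight
    height_ok = 2 * staffspaceheight < height < max_height
    return width_ok and height_ok
```

With staffspaceheight = 20 and stafflineheight = 3, the admissible width range is 20 to 80 pixels and the admissible height range is 40 to 2×4×20 + 5×3 = 175 pixels, so a 50 × 140 object passes while a 50 × 30 object does not.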
4.2.1 Results
With the purpose of evaluating the performance of the segmentation process and studying
its behaviour with respect to some deformations (curvature and rotation) applied to the
images, two different error metrics were considered: the percentage of false positive symbols
and the percentage of missed symbols. A false positive (failed symbol) happens
when the algorithm identifies as a musical symbol something that is not one; a false negative
(missed symbol) happens when the algorithm fails to detect a music symbol present in the score.
These percentages are computed using the reference symbol positions and the symbol positions
obtained by the segmentation algorithm. The reference symbol positions were partially
obtained manually; that is, the music symbols in the ideal scores were manually split
in order to separate the composed symbols and the junctions of the beams
(see Figure 4.13(b)). The deformations described in Subsection 3.8.1 are then
applied to the resulting images. In the end, the music symbol positions are extracted using a Matlab© function that
gives their bounding box. It is important to stress that this experiment was only made with
ideal scores where deformations were applied, because it was our intention to see the behaviour
of the method with respect to some deformations and, besides that, this was the quickest
method of obtaining a reasonably sized database with the reference symbol positions in
a semi-automatic way3.
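One possible way of computing the two error metrics from reference and detected bounding boxes is sketched below. The matching criterion is not given in the text, so this sketch assumes that a detection matches a reference symbol when their intersection-over-union is at least 0.5, that each detection is matched at most once, and that the false positive percentage is taken over the detections while the missed percentage is taken over the references. All names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def segmentation_errors(reference, detected, min_iou=0.5):
    """Return (% false positives, % missed symbols) under the assumed matching."""
    unmatched = list(detected)
    missed = 0
    for ref in reference:
        best = max(unmatched, key=lambda d: iou(ref, d), default=None)
        if best is not None and iou(ref, best) >= min_iou:
            unmatched.remove(best)      # detection explains this reference
        else:
            missed += 1                 # no detection for this reference
    pct_false = 100.0 * len(unmatched) / len(detected) if detected else 0.0
    pct_missed = 100.0 * missed / len(reference) if reference else 0.0
    return pct_false, pct_missed
```

Under these assumptions, extra objects such as numbers in the segmented score increase only the false positive percentage, which is consistent with the explanation given for the high values observed in some scores.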
(a) Music score before splitting the composed symbols. (b) Music score after splitting the composed symbols.
Figure 4.13: An example of how the music symbols in the ideal scores were manually split.
From an overview of the results in Figure 4.14 we can conclude that the segmentation
process performs well, since the percentage of correct symbols is well over 80%.
The high percentage of failed symbols that we sometimes find is related to extra symbols or
noise; that is, the score that we are segmenting has more symbols than the reference
score, for example numbers or symbols that we do not aim to classify. Besides this result,
the performance of the segmentation process is almost independent of the intensity of the deformation,
for the range of values considered (see Figure 4.16).
Figure 4.14: Results of the error metrics.
3 We have only split the composed symbols in 18 ideal scores with standard music notation; the deformations were obtained from these. In total we have 306 images.
Figure 4.15: Results of the error metrics for the rotation deformation.
Figure 4.16: Results of the error metrics for the curvature deformation.
4.3 Classification Process
In this section, the results obtained with the different classifiers adopted are presented.
These classifiers were already presented in Section 2.3. For the neural network, k-nearest
neighbour and support vector machine methods, each image of a symbol was initially resized
to 20 × 20 pixels and then converted to a vector of 400 binary values. This process is significantly
different for the hidden Markov model. In the latter, the images were
normalised to a staff height and width of 150 and 30 pixels, respectively. The segmentation-free
approach was used to extract features from images [Pug06]. For this reason, a 2-pixel sliding
window mechanism was used to produce a sequence of observations. In doing so, dependent
observations are replaced by observations depending on the horizontal position of the window.
We extract the same features as in [Pug06]: the number n of distinct connected components of black pixels,
the largest black element (a(n_i)/S, where a(n_i) is the area of the largest black element and
S = height × width), the smallest white element (a(n_j)/S, where a(n_j) is the area of
the smallest white element and S = height × width) and the gravity centres.
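The conversion of a symbol image into a vector of 400 binary values can be sketched as follows. The resizing method is not stated in the text; this sketch assumes nearest-neighbour sampling, a binary image stored as a list of rows, and a hypothetical function name.

```python
def to_feature_vector(image, size=20):
    """Resize a 0/1 image to size x size and flatten it row by row.

    Nearest-neighbour sampling is an assumption; each target pixel
    copies the source pixel at the proportionally scaled position.
    """
    rows, cols = len(image), len(image[0])
    return [1 if image[r * rows // size][c * cols // size] else 0
            for r in range(size)
            for c in range(size)]
```

For the default size the result always has 20 × 20 = 400 entries, regardless of the original size of the symbol's bounding box.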
To assign classes to the images of music symbols the process was as follows. For the
neural network, each image vector was assigned a target vector
of length 14 (this value corresponds to the 14 classes of symbols), with value 1 in the
correct class and value 0 otherwise. For the other three classifiers, each image was assigned
a class number between 1 and 14, depending on the class to which it belongs.
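The two target encodings can be illustrated with the handwritten class names of Table 4.11. The ordering of the classes and the function names are assumptions made only for this sketch.

```python
# Handwritten symbol classes as listed in Table 4.11 (ordering assumed).
CLASSES = ["Accent", "BassClef", "Beam", "Flat", "Natural", "Note",
           "NoteFlag", "NoteOpen", "RestI", "RestII", "Sharp",
           "Staccatissimo", "TrebleClef", "Unknown"]


def one_hot(label):
    """Target for the neural network: 1 in the correct class, 0 elsewhere."""
    target = [0] * len(CLASSES)
    target[CLASSES.index(label)] = 1
    return target


def class_number(label):
    """Target for the other classifiers: an integer between 1 and 14."""
    return CLASSES.index(label) + 1
```

With this ordering, a beam is encoded as a vector whose third entry is 1 for the neural network and as the number 3 for the remaining classifiers.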
In the training phase, the generalization capability of the classifier was controlled. Put
differently, the available data was divided into training, validation and test sets, to assess
which network topology performs best. Through cross validation
we selected the number of neurons in the hidden layer. In other words, the cross validation
consists in optimizing the model by changing the number of neurons in the hidden layer of
the network using the training data; the performance of each candidate was then evaluated
on the validation set. The network model with the best result was used to compute the expected error in
the music symbol classification. To this end, the training data was added to the validation data to
conduct a final training of this network, thereby obtaining the final model. The expected
error was obtained by computing the difference between the target values and the values obtained
by the network, using the test set. This process is slightly modified when we apply elastic
matching to our database. The elastic matching method is applied to the training data and to the
training data with validation set. We increased the dataset by introducing variability in our images
in order to simulate distortions that can happen in real scores. Our objective is to prepare the
classifier for several possible situations that may occur in handwritten music symbols.
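The selection protocol described above (choose the model setting on the validation set, retrain on training plus validation data, estimate the expected error on the test set) can be sketched independently of the classifier. To keep the example self-contained, a small k-nearest-neighbour classifier stands in for the neural network, with k playing the role of the number of hidden neurons; this substitution and all names are assumptions of the sketch.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training samples closest to x.

    `train` is a list of (vector, label) pairs; ties are broken in
    favour of the label of the nearest sample.
    """
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def accuracy(train, samples, k):
    """Fraction of `samples` classified correctly using `train`."""
    hits = sum(knn_predict(train, x, k) == y for x, y in samples)
    return hits / len(samples)


def select_and_evaluate(train, val, test, candidates):
    """Pick the best setting on `val`, retrain on train+val, score on `test`."""
    best = max(candidates, key=lambda k: accuracy(train, val, k))
    return best, accuracy(train + val, test, best)
```

The test set is used only once, after the setting has been fixed, so the reported figure is not biased by the selection step.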
The neural network structure used to classify a real score without elastic matching is illustrated
in Figure 4.17. The input layer is composed of 400 units whose output values are the vector
values that represent the image of the music symbol. The output layer has 14 units that correspond
to the 14 classes of symbols. When a pattern belonging to class i is presented,
the target value for unit i in the output layer is the maximum value. The range for the
number of hidden neurons considered in this work was 1, 2, . . . , 9. The number of neurons in the
hidden layer, for this case, was 9 (see Table 4.8). This value was estimated during the
training of the neural network. Each neuron has a log-sigmoid activation function.
Figure 4.17: Neural network topology.
             Handwritten symbols   Printed symbols
With EM      9                     8
Without EM   9                     9
Table 4.8: Number of neurons in the hidden layer with and without the elastic matching method (EM) in the dataset.
A model-discriminant HMM was adopted, constructing a model for each class [AYV01]. The
training of the HMM to learn the parameters of the models (λ = (A; B; π)) was done by
the Baum-Welch algorithm4. The goal of classification is to decide which class the unknown
sequence belongs to, based on the models obtained in the training phase. The symbols are
classified on the basis of the maximum likelihood ratio, which is obtained by the Viterbi algorithm5.
Table 4.9 shows the number of states obtained by cross validation. The range was
from 3 to 8.
4 See Appendix A.7 for more details.
5 See Appendix A.8 for more details.
             Handwritten symbols   Printed symbols
With EM      8                     7
Without EM   8                     8
Table 4.9: The number of states in the hidden Markov model with and without the elastic matching method (EM) in the dataset.
As in the previous methods, the value of k, in the k-nearest neighbor classifier, and the values
of C and γ, in the support vector machines, were also obtained by cross validation, and the
expected error was computed with the test set. The value of k needs to be sufficiently large to
minimize the probability of error in the classification and sufficiently small for the
k closest points to remain close to the point to classify. For k = 1, 2, . . . , 10, we
obtained k = 1 for the handwritten and printed datasets, with and without the application of
elastic matching. The parameters for the support vector machines are presented in Table 4.10.
The range was 2, 2^1.25, 2^1.5, . . ., 2^4.75, 2^5 for the estimation of C, and 2^-7, 2^-6.75, 2^-6.5, . . ., 2^2.75,
2^3 for the estimation of γ.
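The two search ranges are powers of two with exponents spaced by 0.25. The short sketch below (variable names are hypothetical) enumerates them and reproduces the decimal values of the selected parameters.

```python
# Exponents 1, 1.25, ..., 5 for C and -7, -6.75, ..., 3 for gamma.
c_exponents = [1 + 0.25 * i for i in range(17)]
gamma_exponents = [-7 + 0.25 * i for i in range(41)]

c_grid = [2 ** e for e in c_exponents]
gamma_grid = [2 ** e for e in gamma_exponents]

if __name__ == "__main__":
    print(len(c_grid), len(gamma_grid))             # 17 41
    print(round(2 ** 3.5, 4), round(2 ** 4.5, 4))   # 11.3137 22.6274
    print(round(2 ** 1.25, 4), round(2 ** -5.75, 4))  # 2.3784 0.0186
```

The grid therefore contains 17 × 41 = 697 candidate (C, γ) pairs, each of which is evaluated during cross validation.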
             Handwritten symbols                  Printed symbols
             C                 γ                  C                  γ
With EM      2^3.5 = 11.3137   2^-7 = 0.0078      2^1.25 = 2.3784    2^-5.75 = 0.0186
Without EM   2^4.5 = 22.6274   2^-6 = 0.0156      2^1.25 = 2.3784    2^-5.75 = 0.0186
Table 4.10: The values of the parameters C and γ in the support vector machines with and without the elastic matching method (EM) in the dataset.
4.3.1 Results
For handwritten musical symbol classification, several sets of symbols were extracted from
different musical scores to train the classifiers. Table 4.11 shows the number of handwritten and printed
music symbols per class used in the training phase of the classification structures.
The symbols were grouped according to their shape. In doing so, the rest signs were divided
into two groups, RestI and RestII; see Table 4.11. Besides that, we include the Unknown class
to classify those symbols that do not fit into any of the other classes. In total we have 3222
handwritten music symbols and 2521 printed music symbols.
Handwritten music symbols         Printed music symbols
Symbol class     Total number     Symbol class     Total number
Accent           189              AltoClef         201
BassClef         26               Beam             291
Beam             438              Flat             155
Flat             230              Natural          127
Natural          317              Note             304
Note             466              NoteFlag         120
NoteFlag         122              NoteOpen         309
NoteOpen         208              TieSlur          67
RestI            135              RestI            63
RestII           401              RestII           321
Sharp            345              Sharp            13
Staccatissimo    21               Time             122
TrebleClef       99               TrebleClef       305
Unknown          404              Unknown          404
Table 4.11: Classes list of handwritten and printed music symbols.
Results obtained Without Elastic Matching‖
Symbol class     Neural network   Nearest neighbor   Support vector machines   Hidden Markov model
Accent           89.36%           100%               89.36%                    95.83%
BassClef         0%               100%               100%                      55.56%
Beam             86.24%           98.17%             96.33%                    91.89%
Flat             78.95%           100%               100%                      81.36%
Natural          83.54%           100%               100%                      97.50%
Note             84.48%           97.41%             95.69%                    69.49%
NoteFlag         70%              83.33%             93.33%                    62.50%
NoteOpen         0%               66.67%             50%                       22.22%
RestI            78.79%           100%               96.97%                    100%
RestII           99%              100%               100%                      93.07%
Sharp            79.07%           96.51%             98.84%                    85.23%
Staccatissimo    60%              100%               100%                      100%
TrebleClef       58.33%           100%               83.33%                    96.30%
Unknown          58.42%           76.24%             93.01%                    40.59%
99% CI for the expected performance in percentage, as average (standard deviation):
Neural network            [78.85 (1.21); 82.88 (4.47)]
Nearest neighbor          [93.41 (0.34); 94.53 (1.24)]
Support vector machines   [94.59 (0.07); 96.90 (0.27)]
Hidden Markov model       [76.58 (0.57); 81.04 (2.09)]
Table 4.12: Results obtained with the test set in the classification process for the handwritten music symbols.
Symbol class     Neural network   Nearest neighbor   Support vector machines   Hidden Markov model
AltoClef         94%              100%               100%                      78%
Beam             93.15%           100%               100%                      97.26%
Flat             100%             100%               97.37%                    92.68%
Natural          100%             100%               100%                      100%
Note             88.16%           98.68%             100%                      93.42%
NoteFlag         96%              93.33%             90%                       63.33%
NoteOpen         90.67%           100%               100%                      88.31%
TieSlur          63.33%           93.33%             86.67%                    83.33%
RestI            100%             100%               100%                      100%
RestII           60%              100%               100%                      86.67%
Sharp            98.75%           100%               100%                      100%
Time             0%               100%               100%                      33.33%
TrebleClef       93.33%           100%               100%                      36.67%
Unknown          67.11%           82.89%             97.37%                    73.68%
99% CI for the expected performance in percentage, as average (standard deviation):
Neural network            [87.60 (0.41); 88.97 (1.53)]
Nearest neighbor          [95.93 (0.24); 96.74 (0.90)]
Support vector machines   [97.30 (0.30); 98.29 (1.10)]
Hidden Markov model       [84.17 (0.32); 86.71 (1.19)]
Table 4.13: Results obtained with the test set in the classification process for the printed music symbols.
From the results obtained for the handwritten music symbols we can conclude that the classifier
with the best performance was the support vector machine, with a 99% confidence interval for
the expected performance of [94.59%; 96.90%]. Besides that, the hidden Markov model performed
better than the neural network, since the latter had a higher standard deviation
and it also had two classes of symbols (BassClef and NoteOpen) with 0%.
The results obtained for the printed music symbols show that the support vector machine
is again the best classifier, with a 99% confidence interval for the
expected performance of [97.30%; 98.29%].
‖ See the tables of confusion in Appendix B.
Results obtained With Elastic Matching∗∗
During the cross validation process, the database was divided into training, validation and test
sets. The deformations produced by the deformation function 2.10 of the elastic matching, with
M = 1, 2, 3 and N = 1, 2, 3, were applied to the training data and to the training data with validation
set.
Symbol class     Neural network   Nearest neighbor   Support vector machines   Hidden Markov model
Accent           91.50%           100%               100%                      91.66%
BassClef         0%               100%               100%                      66.67%
Beam             86.24%           95.41%             100%                      96.40%
Flat             71.93%           100%               100%                      71.19%
Natural          89.87%           100%               100%                      92.50%
Note             74.14%           96.55%             100%                      77.97%
NoteFlag         40%              86.67%             100%                      28.13%
NoteOpen         0%               66.67%             100%                      22.22%
RestI            45.45%           100%               100%                      100%
RestII           93%              100%               100%                      85.15%
Sharp            88.37%           98.84%             100%                      81.82%
Staccatissimo    20%              100%               100%                      100%
TrebleClef       58.33%           100%               100%                      77.78%
Unknown          44.55%           63.37%             100%                      30.69%
99% CI for the expected performance in percentage, as average (standard deviation):
Neural network            [74.88 (0.84); 77.67 (3.09)]
Nearest neighbor          [92.24 (0.32); 93.29 (1.17)]
Support vector machines   [87.79 (1.95); 96.00 (8.82)]
Hidden Markov model       [70.89 (0.67); 76.17 (2.47)]
Table 4.14: Results obtained with the test set in the classification process for the handwritten music symbols.
The results lead us to conclude that the application of elastic matching to the music symbols
does not improve the performance of the classifiers relative to training without elastic matching. However,
for handwritten music symbols with higher similarities in their shapes, the process behaved
better than the classifiers without elastic matching (see Table 4.16). This leads us
to suggest, as future work, a study in which the deformation function is applied
only to these types of symbols, to determine whether the expected performance increases.
The same does not happen with printed music symbols. Our explanation is
that these signs do not have high distortions in their shapes. Thus the elastic matching method,
in the presence of such music symbols, makes the performance of the classifiers slightly lower than
with handwritten music symbols.
The best classifier for the test set of real scores was the support vector machine, with a 99%
confidence interval for the expected performance of [87.79%; 96.00%]. However, the standard
deviation was very high ([1.95%; 8.82%]). Consequently, I suggest the nearest neighbor classifier as
the best classifier, with a 99% confidence interval for the expected performance of [92.24%; 93.29%],
because with a lower standard deviation ([0.32%; 1.17%]) the classifier performance
will vary less. The best classifier for the synthetic scores was the nearest neighbor
classifier, with a 99% confidence interval for the expected performance of [95.34%; 96.06%].
∗∗ See the tables of confusion in Appendix B.
Symbol class     Neural network   Nearest neighbor   Support vector machines   Hidden Markov model
AltoClef         86%              98%                90%                       78%
Beam             97.26%           100%               90.41%                    100%
Flat             94.74%           100%               94.74%                    97.37%
Natural          87.10%           100%               96.77%                    100%
Note             84.21%           100%               97.37%                    93.33%
NoteFlag         40%              90%                93.55%                    50%
NoteOpen         92.21%           98.70%             90.79%                    80.52%
TieSlur          13.33%           90%                86.67%                    70%
RestI            100%             100%               93.75%                    100%
RestII           0%               100%               93.33%                    86.67%
Sharp            97.5%            98.77%             97.5%                     100%
Time             0%               100%               100%                      33.33%
TrebleClef       93.33%           100%               100%                      100%
Unknown          56.57%           78.95%             100%                      43.42%
99% CI for the expected performance in percentage, as average (standard deviation):
Neural network            [77.66 (1.34); 82.11 (4.93)]
Nearest neighbor          [95.34 (0.22); 96.06 (0.80)]
Support vector machines   [93.13 (0.26); 94.23 (1.18)]
Hidden Markov model       [73.00 (1.00); 88.71 (3.67)]
Table 4.15: Results obtained with the test set in the classification process for the printed music symbols.
           Neural network          Nearest neighbor        Support vector machines
           With EM   Without EM    With EM   Without EM    With EM   Without EM
Natural    89.87%    83.54%        100%      100%          100%      100%
Sharp      88.37%    79.07%        98.84%    96.51%        100%      98.84%
Table 4.16: Performance on the natural and sharp symbols.
Part IV
Conclusions and Future Work
Chapter 5
Conclusion
This dissertation aimed to address the problem of musical symbol recognition in
handwritten scores through research and application of recent techniques of machine learning
and artificial intelligence. In fact, despite advances made in research
on optical music recognition algorithms, as we have seen in the brief description of the state of
the art, several open problems still exist with handwritten music sheets. On the one hand, the
scores tend to be rather irregular and conditioned by the authors' own writing style, and the
quality of the paper on which they are written might have degraded over the years, making
it a lot harder to correctly identify their contents. On the other hand, the size, shape and
intensity of handwritten symbols are likely to vary for the same author within the same
score, and the staff lines may be tilted one way or another on the same page. They may also
be curved and may have discontinuities. As a result, the detection and recognition process in
handwritten music scores becomes more complicated than in printed music sheets.
The first challenge faced by an OMR system is staff line detection and removal. These
operations are among the most important phases of musical symbol recognition, because they
determine the results of the subsequent steps. It was seen that the reasons for detecting and
removing the staff lines lie in the need to isolate the musical symbols for a more efficient and
correct detection of each symbol present in the score. The work of this dissertation resulted
in a robust algorithm for the automatic detection of staff lines in music scores based on stable
paths. The proposed method uses a very simple but fundamental principle to assist detection
and avoids the difficulties typically posed by symbols superimposed on staff lines. The main
idea was to consider the staff lines as the result of the shortest path between the two margins
of the music sheet, giving preference to black pixels. This approach to staff line detection
adapts to a wide range of image conditions, thanks to its intrinsic robustness to
skewed images, discontinuities, and curved staff lines. The proposed method was also robust to
discontinuities in staff lines (due to low-quality digitalization or low-quality originals) and to staff
lines as thin as one pixel. In order to take full advantage of the method, existing staff line
removal algorithms were enhanced by using the stable paths method as their first processing step.
Several tests were also performed that enabled improvements to the proposed approach and led to better results.
The encouraging results obtained led us to consider that investigation into
the detection of music symbols would benefit from the improved staff line detection and
removal. Accordingly, the next step was the application of existing algorithms to the
segmentation process, using the stable paths approach for the detection of
the staff lines in the previous step.
Since in this work we dealt with handwritten musical scores, the segmentation and classification
steps were hard. On the one hand, we had a huge variability in the music symbols that caused
inconsistency problems in the size and shape of each object. On the other hand, the staff
line removal may cause breaks in the music symbols, introducing imperfections into them.
Besides that, we also had complexity and ambiguity problems related to the number of possible
arrangements of music primitives. The segmentation method was based on a hierarchical
decomposition of the music image and the symbols that we wanted to recognize were split into
four different types: the symbols that are featured by a vertical segment, the symbols that link
the notes, the remaining symbols connected to staff lines and the symbols above and under staff
lines. For music symbol classification, a thorough comparative study of several classifiers,
such as the hidden Markov model and the support vector machine, was presented. A new methodology
based on elastic matching was applied to our dataset in conjunction with the other classifiers. Our
aim was to simulate, in the database, controlled distortions that can happen in real scores, to
prepare the classifier for several undesirable situations that may occur in handwritten music
sheets. With the study done, we concluded that the application of elastic matching to
the music symbols did not improve the performance of the classifiers.
However, we also saw that the classifiers with elastic matching behaved better
for the handwritten music symbols with higher similarities in their shapes. The
best classifier for handwritten music symbols was the support vector machine, with a performance
higher than 94.59% for the test set without elastic matching and 93.10% for the test set with
elastic matching. The best classifier for printed music symbols without elastic matching was
the support vector machine, with a performance higher than 97.30%, and, for the test set with
elastic matching, the nearest neighbor classifier, with a performance higher than 95.34%.
5.1 Future Work
The various approaches in OMR to musical symbol segmentation and classification are still
below expectations for handwritten musical scores. In future work, we intend the proposed
methodology to incorporate, in a natural way, prior knowledge
of the musical rules in the recognition of symbols, in order to overcome the limitations of
the current approaches, which are incapable of dealing robustly with the specificities of
handwritten music. The new methodology should also be naturally adaptable
to manuscript images and to different music notations. Specifically, we intend to extend our
initial work in several directions:
• Continue to investigate new methods of optical music recognition for handwritten musical
  scores and for different music notations. This line of research involves a study and a
  profound understanding of the latest techniques of pattern recognition, machine learning
  and inductive logic programming. The merger of rules and techniques from different areas
  should help overcome the problems of the existing algorithms.
• Integration of the algorithms developed into an OMR system with remote access via the
  internet to a wide corpus of unpublished handwritten music encoded in an adequate format.
  The creation of such a system will not only centralize as much information as possible
  but will also serve to preserve this corpus in a way that is easily accessible for browsing,
  analysing and downloading, while keeping the scores in their original format along with
  their digital counterpart. The availability of a system with these features will contribute
  to the preservation of the musical heritage.
Specifically, one of the objectives for future work is the investigation of new methods for
the segmentation phase. It is our intention to incorporate hidden Markov models in this
step, which will require an intensive study of this field. Besides
that, a simple detection of the music symbols is not sufficient. It is also necessary to
interpret them and to understand the connections between them in order to recognize
the analyzed score. Therefore, these relations need to be defined before they are
implemented in a system. In this manner, the integration of syntactic music rules is required
in the recognition process to acquire semantic information. Thus, another goal of this project
is the research and application of inductive logic programming techniques.
References
[AS07] Shai Avidan and Ariel Shamir. Seam carving for content-aware image resizing.
ACM Trans. Graph., 26(3):10, 2007.
[AYV01] N. Arica and F.T. Yarman-Vural. An overview of character recognition focused
on off-line handwriting. Systems, Man, and Cybernetics, Part C: Applications and
Reviews, IEEE Transactions on, 31(2):216–233, May 2001.
[Bai97] D. Bainbridge. Extensible Optical Music Recognition. PhD thesis, Department of
Computer Science, University of Canterbury, Christchurch, NZ, 1997.
[BBN01] P. Bellini, I. Bruno, and P. Nesi. Optical music sheet segmentation. Web Delivering
of Music, 2001. Proceedings. First International Conference on, pages 183–190,
Nov. 2001.
[BS00] Marija Bojovic and Milan D. Savic. Training of hidden markov models for cursive
handwritten word recognition. In ICPR '00: Proceedings of the International
Conference on Pattern Recognition, page 1973, Washington, DC, USA, 2000. IEEE
Computer Society.
[Cap08] Artur Capela. Reconhecimento de símbolos musicais manuscritos na framework
gamera. Master's thesis, Faculdade de Engenharia da Universidade do Porto, 2008.
[Car89] N. P. Carter. Automatic Recognition of Printed Music in the Context of Electronic
Publishing. PhD thesis, Departments of Physics and Music, 1989.
[CC93] B. Coüasnon and J. Camillerapp. Using grammars to segment and recognize music
scores. In Proc. of DAS-94: International Association for Pattern Recognition
Workshop on Document Analysis Systems, pages 15–27, Kaiserslautern, 1993.
[CCR+08] Jaime S. Cardoso, Artur Capela, Ana Rebelo, Carlos Guedes, and Joaquim Pinto
da Costa. Staff detection with stable paths, 2008. IEEE Transactions on Pattern
Analysis and Machine Intelligence (TPAMI), Submitted.
[CCRG08a] Artur Capela, Jaime S. Cardoso, Ana Rebelo, and Carlos Guedes. Integrated
recognition system for music scores, 2008. International Computer Music Conference
(ICMC), Accepted.
[CCRG08b] Jaime S. Cardoso, Artur Capela, Ana Rebelo, and Carlos Guedes. A connected
path approach for staff detection on a music score, 2008. IEEE International
Conference on Image Processing (ICIP), Accepted.
[Coü96] B. Coüasnon. Segmentation et reconnaissance de documents guidées par la con-
naissance a priori: application aux partitions musicales. PhD thesis, Université de
Rennes, 1996.
[CRCG08] Artur Capela, Ana Rebelo, Jaime S. Cardoso, and Carlos Guedes. Staff line
detection and removal with stable paths. In Proceedings of the International
Conference on Signal Processing and Multimedia Applications (SIGMAP 2008), pages
263–270, 2008.
[DDCF07] Christoph Dalitz, Michael Droettboom, Bastian Czerwinski, and Ichiro Fujinaga.
Staff removal toolkit for gamera, 2005-2007. http://music-staves.sourceforge.net.
[DDCF08] Christoph Dalitz, Michael Droettboom, Bastian Czerwinski, and Ichiro Fujinaga.
A comparative study of staff removal algorithms. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 30:753–766, 2008.
[Die05] Reinhard Diestel. Graph Theory. Graduate Texts in Mathematics. Springer-Verlag,
third edition, 2005.
[eeFc02] Vojtěch Franc and Václav Hlaváč. Multi-class support vector machine.
Technical report, 2002.
[FLS05] Alicia Fornés, Josep Lladós, and Gemma Sánchez. Primitive segmentation in old
handwritten music scores. In Wenyin Liu and Josep Lladós, editors, GREC, volume
3926 of Lecture Notes in Computer Science, pages 279–290. Springer, 2005.
[Fuj04] Ichiro Fujinaga. Staff detection and removal. In Susan George, editor, Visual
Perception of Music Notation: On-Line and Off-Line Recognition, pages 1–39.
Idea Group Inc., 2004.
[Fuk90] Keinosuke Fukunaga. Introduction to statistical pattern recognition (2nd ed.).
Academic Press Professional, Inc., San Diego, CA, USA, 1990.
[Gal86] Zvi Galil. Efficient algorithms for finding maximum matching in graphs. ACM
Comput. Surv., 18(1):23–38, 1986.
[GWE04] Rafael C. Gonzalez, Richard E. Woods, and Steven L. Eddins. Digital Image
processing using MATLAB, pages 405–407. Upper Saddle River, NJ :
Pearson/Prentice-Hall, 2004.
[Has70] W. K. Hastings. Monte carlo sampling methods using markov chains and their
applications. Biometrika, 57(1):97–109, April 1970.
[Hay98] Simon Haykin. Neural Networks: A Comprehensive Foundation (2nd Edition).
Prentice Hall, July 1998.
[HL02] Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass support
vector machines. IEEE Transactions on Neural Networks, 13(2):415–425, 2002.
[JZ97] Anil K. Jain and Douglas Zongker. Representation and recognition of handwritten
digits using deformable templates. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 19(12):1386–1391, 1997.
[JZL96] Anil K. Jain, Yu Zhong, and Sridhar Lakshmanan. Object matching using
deformable templates. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 18(3):267–278, 1996.
[KHB+00] Tapas Kanungo, Robert M. Haralick, Henry S. Baird, Werner Stuezle, and David
Madigan. A statistical, nonparametric methodology for document degradation
model validation. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 22(11):1209–1223, 2000.
[KI90] H. Kato and S. Inokuchi. A recognition system for printed piano music using
musical knowledge and constraints. In Proceedings of the International Association
for Pattern Recognition Workshop on Syntactic and Structural Pattern Recognition,
pages 231–248, 1990.
[KPM96] Gary E. Kopec, Phil A. Chou, and David A. Maltz. Markov
source model for printed music decoding. Journal of Electronic Imaging, pages
7–14, 1996.
[LCL93] I. Leplumey, J. Camillerapp, and G. Lorette. A robust detector for music staves.
In Proceedings of the International Conference on Document Analysis and
Recognition, pages 902–905, 1993.
[LS88] L. Lam and Ching Y. Suen. Structural classification and relaxation matching of
totally unconstrained handwritten zip-code numbers. Pattern Recogn., 21(1):19–32,
1988.
[MA98] E. Mayoraz and E. Alpaydin. Support vector machines for multiclass classification.
Technical report, 1998.
[Mah82] J. V. Mahoney. Automatic analysis of music score images. B.Sc thesis, 1982.
[Mat85] T. Matsushima. Automated high speed recognition of printed music (wabot2
vision system). Advanced Robotics 1985. ICAR 1985. International Conference
on, pages 477–482, 1985.
[MMM04] Youichi Mitobe, Hidetoshi Miyao, and Minoru Maruyama. A fast hmm
algorithm based on stroke lengths for on-line recognition of handwritten music scores.
In IWFHR '04: Proceedings of the Ninth International Workshop on Frontiers
in Handwriting Recognition, pages 521–526, Washington, DC, USA, 2004. IEEE
Computer Society.
[MN96] H. Miyao and Y. Nakano. Note symbol extraction for printed piano scores using
neural networks. IEICE TRANSACTIONS on Information and Systems,
E79-D:548–554, 1996.
[MO07] Hidetoshi Miyao and Masayuki Okamoto. Stave extraction for printed music scores
using DP matching. Journal of Advanced Computational Intelligence and
Intelligent Informatics, 8:208–215, 2007.
[MP88] Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent
in nervous activity. pages 15–27, 1988.
[Ng95] Kia Ng. Automated computer recognition of music score. PhD thesis, University
of Leeds, 1995.
[Nis95] Hirobumi Nishida. A structural model of shape deformation. Pattern Recognition,
28(10):1611–1620, 1995.
[Pre70] D. Prerau. Computer pattern recognition of standard engraved music notation.
PhD thesis, Massachusetts Institute of Technology, 1970.
[Pru66] D. Pruslin. Automatic recognition of sheet music. PhD thesis, Massachusetts
Institute of Technology, 1966.
[Pug06] Laurent Pugin. Optical music recognition of early typographic prints using hidden
markov models. In ISMIR, pages 53–56, 2006.
[RB05] F. Rossant and I. Bloch. Optical music recognition based on a fuzzy modeling of
symbol classes and music writing rules. Image Processing, 2005. ICIP 2005. IEEE
International Conference on, 2:II-538–41, Sept. 2005.
[RCC+07] Ana Rebelo, Artur Capela, Joaquim F. Pinto da Costa, Carlos Guedes, Eurico
Carrapatoso, and Jaime S. Cardoso. A shortest path approach for staff line detection.
Automated Production of Cross Media Content for Multi-Channel Distribution,
2007. AXMEDIS '07. Third International Conference on, pages 79–85, Nov. 2007.
[RCF+93] R. Randriamahefa, J.P. Cocquerez, C. Fluhr, F. Pepin, and S. Philipp. Printed
music recognition. Document Analysis and Recognition, 1993., Proceedings of the
Second International Conference on, pages 898–901, Oct 1993.
[Rea69] Gardner Read. Music Notation: A Manual of Modern Practice (2nd ed.).
Taplinger, New York, 1969.
[RHW88] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal
representations by error propagation. pages 673–695, 1988.
[Ros02] Florence Rossant. A global method for music symbol recognition in typeset music
sheets. Pattern Recognition Letters, 23(10):1129–1141, 2002.
[RP96] K.T. Reed and J.R. Parker. Automatic computer recognition of printed music.
Pattern Recognition, 1996., Proceedings of the 13th International Conference on,
3:803–807 vol.3, Aug 1996.
[RT88] J. W. Roach and J. E. Tatem. Using domain knowledge in low-level visual
processing to interpret handwritten music: an experiment. Pattern Recognition,
21(1):33–44, 1988.
[SDV06] S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani. Algorithms, pages 169–179.
McGraw-Hill Higher Education, 2006.
[Szw05] Mariusz Szwoch. A robust detector for distorted music staves. In Computer
Analysis of Images and Patterns, pages 701–708. Springer-Verlag, Heidelberg, 2005.
[TK03] Sergios Theodoridis and Konstantinos Koutroumbas. Pattern recognition, chapter
4.6. Academic Press, second edition, 2003.
[TMD99] Michael Thulke, Volker Märgner, and Andreas Dengel. A general approach to
quality evaluation of document segmentation results. In DAS '98: Selected Papers
from the Third IAPR Workshop on Document Analysis Systems, pages 43–57,
London, UK, 1999. Springer-Verlag.
[TSM06] Fubito Toyama, Kenji Shoji, and Juichi Miyamichi. Symbol recognition of printed
piano scores with touching symbols. pages 480–483, 2006.
[Vap98] Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, September
1998.
[Wak94] T. Wakahara. Shape matching using LAT and its application to handwritten numeral
recognition. IEEE Trans. Pattern Anal. Mach. Intell., 16(6):618–629, 1994.
[WFJC01] Yuan-Kai Wang, Kuo-Chin Fan, Yau-Tarng Juang, and Tai-Hong Chen. Using
hidden markov model for chinese business card recognition. In ICIP (1), pages
1106–1109, 2001.
[WH88] Bernard Widrow and Marcian E. Hoff. Adaptive switching circuits. pages 123–134,
1988.
Part V
Appendix
Appendix A
Fundamentals
A.1 Primal Problem vs Dual Problem
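The primal problem for the case of nonseparable training classes $\{(x_i, d_i)\}_{i=1}^{N}$ in the
feature space is given by
\[
\begin{aligned}
\min_{w,\,b,\,\xi_i} \quad & \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i \\
\text{s.t.} \quad & d_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, N
\end{aligned}
\tag{A.1}
\]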
where the nonnegative slack variables $\{\xi_i\}_{i=1}^{N}$ measure the deviation of a data point from
the ideal condition of pattern separability and the parameter $C$ controls the trade-off between
the complexity of the classifier and the number of nonseparable points. It is possible to find the
solution of the quadratic problem by determining the saddle point of the Lagrangian function
that is given by:
\[
J(w, b, \xi, \alpha, \mu) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i
- \sum_{i=1}^{N} \alpha_i \left[ d_i (w^T x_i + b) - (1 - \xi_i) \right]
- \sum_{i=1}^{N} \mu_i \xi_i
\tag{A.2}
\]
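which has to be minimized with respect to $w$, $b$ and $\xi_i$; it also has to be maximized with
respect to the nonnegative multipliers $\alpha_i$ and $\mu_i$. The parameters that minimize the function
$J(w, b, \xi, \alpha, \mu)$ meet the following conditions:
\[ \frac{\partial J}{\partial w} = w - \sum_{i=1}^{N} \alpha_i d_i x_i = 0 \tag{A.3} \]
\[ \frac{\partial J}{\partial b} = - \sum_{i=1}^{N} \alpha_i d_i = 0 \tag{A.4} \]
\[ \frac{\partial J}{\partial \xi_i} = C - \mu_i - \alpha_i = 0, \quad i = 1, 2, \ldots, N \tag{A.5} \]
From these conditions we have
\[ w = \sum_{i=1}^{N} \alpha_i d_i x_i \tag{A.6} \]
\[ \sum_{i=1}^{N} \alpha_i d_i = 0 \tag{A.7} \]
\[ \mu_i + \alpha_i = C, \quad i = 1, 2, \ldots, N \tag{A.8} \]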
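If we expand equation A.2 term by term, we have:
\[
J(w, b, \xi, \alpha, \mu) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i
- \sum_{i=1}^{N} \alpha_i d_i w^T x_i - b \sum_{i=1}^{N} \alpha_i d_i
+ \sum_{i=1}^{N} \alpha_i - \sum_{i=1}^{N} \alpha_i \xi_i - \sum_{i=1}^{N} \mu_i \xi_i
\tag{A.9}
\]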
The fourth term on the right-hand side is zero by the optimality condition A.7 obtained above.
Besides that, from condition A.6 we have
\[
w^T w = \sum_{i=1}^{N} \alpha_i d_i w^T x_i
= \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j x_i^T x_j
\]
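Substituting condition A.8 and the result obtained above into equation A.9, we obtain the
following dual objective function (the dual Lagrangian):
\[
J_D = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j x_i^T x_j
\tag{A.10}
\]
which gives a lower bound on the objective function of A.1 at any admissible point. It is
therefore possible to formulate the dual problem for a nonseparable training sample
$\{(x_i, d_i)\}_{i=1}^{N}$ as follows:
\[
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j x_i^T x_j \\
\text{s.t.} \quad & w = \sum_{i=1}^{N} \alpha_i d_i x_i \\
& \sum_{i=1}^{N} \alpha_i d_i = 0 \\
& C - \mu_i - \alpha_i = 0, \quad i = 1, 2, \ldots, N \\
& \alpha_i \ge 0, \ \mu_i \ge 0, \quad i = 1, 2, \ldots, N
\end{aligned}
\tag{A.11}
\]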
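Adding the Karush-Kuhn-Tucker complementarity conditions
\[ \alpha_i \left[ d_i (w^T x_i + b) - (1 - \xi_i) \right] = 0 \tag{A.12} \]
\[ \mu_i \xi_i = 0 \tag{A.13} \]
\[ \alpha_i \left[ d_i (w^T x_i + b) - (1 - \xi_i) \right] \ge 0 \tag{A.14} \]
to the conditions A.6–A.8 for $i = 1, \ldots, N$, it is possible to reformulate the dual problem A.11 as
follows:
\[
\begin{aligned}
\max_{\alpha} \quad & \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j x_i^T x_j \\
\text{s.t.} \quad & \sum_{i=1}^{N} \alpha_i d_i = 0 \\
& 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, N
\end{aligned}
\tag{A.15}
\]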
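As a small numerical illustration of the dual objective A.10 and of condition A.6, the following sketch evaluates both on a two-point toy problem. The function names `dual_objective` and `weights_from_alpha`, the data and the value of `C` are assumptions made for this example only; the multipliers are chosen by hand rather than obtained from a quadratic programming solver.

```python
def dot(a, b):
    """Inner product x_i^T x_j of two feature vectors."""
    return sum(p * q for p, q in zip(a, b))

def dual_objective(alpha, X, d):
    """J_D = sum_i alpha_i - 1/2 sum_i sum_j alpha_i alpha_j d_i d_j x_i^T x_j  (A.10)."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * d[i] * d[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

def weights_from_alpha(alpha, X, d):
    """w = sum_i alpha_i d_i x_i  (A.6)."""
    return [sum(a * di * x[k] for a, di, x in zip(alpha, d, X))
            for k in range(len(X[0]))]

# Toy data (hypothetical): one point per class, symmetric about the origin.
X = [(1.0, 0.0), (-1.0, 0.0)]
d = [1, -1]
C = 10.0
alpha = [0.5, 0.5]          # hand-picked multipliers, feasible for A.15

feasible = abs(sum(a * di for a, di in zip(alpha, d))) < 1e-12 and all(0 <= a <= C for a in alpha)
w = weights_from_alpha(alpha, X, d)
primal_value = 0.5 * dot(w, w)      # all slacks are zero here (both margins equal 1 with b = 0)
dual_value = dual_objective(alpha, X, d)
print(feasible, w, primal_value, dual_value)
```

For this toy problem the dual value coincides with the primal value, which is what the lower-bound property of A.10 leads one to expect at the optimum.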
A.2 Error Backpropagation Algorithm
The error backpropagation method is a training procedure for neural networks without feedback.
This technique requires a differentiable transfer function so that the cost function can be
minimized [TK03, Hay98]. Since the multilayer perceptron architectures apply the step
activation function:
\[
f(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x < 0 \end{cases}
\tag{A.16}
\]
which is discontinuous at the point 0, this algorithm uses instead a family of continuous
differentiable functions, the family of sigmoid functions:
\[
f(x) = \frac{1}{1 + \exp(-ax)}, \quad \text{where } a > 0 \text{ is the slope parameter}
\tag{A.17}
\]
A.2.1 Mathematical explanation
The learning method based on error backpropagation uses the gradient descent technique
to minimize the cost function. This function expresses the squared error, summed over all the
training samples, and is defined as follows:
\[
E = \sum_{i=1}^{N} E(i), \qquad E(i) = \frac{1}{2} \sum_{m=1}^{k_L} \left( \hat{y}_m(i) - y_m(i) \right)^2
\tag{A.18}
\]
where $\hat{y}_m(i)$ and $y_m(i)$ represent respectively the output obtained and the desired output for
the output neuron $m$, $k_L$ is the total number of output neurons and $N$ is the total number
of training samples available $(y(i), x(i))$. The weights of the edges are updated according to
\[
w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \frac{\partial E}{\partial w_j^r}
\tag{A.19}
\]
where $w_j^r(\text{old})$ is the current estimate of the unknown weights, $\mu$ is the learning coefficient and
$\partial E / \partial w_j^r$ is the gradient of the error function.
Let $\upsilon_j^r$ be the weighted sum of the inputs of the $j$th neuron in the $r$th layer and $y_j^r$ the
corresponding output after the activation function. If $y_k^{r-1}$ is the output of the $k$th neuron,
$k = 1, 2, \ldots, k_{r-1}$, in the $(r-1)$th layer for the $i$th training pair and $w_{jk}^r$ is the current
estimate of the corresponding weight in the $j$th neuron in the $r$th layer, $j = 1, 2, \ldots, k_r$, then
the argument of the activation function $f(\cdot)$ of the latter neuron is
\[
\upsilon_j^r(i) = \sum_{k=1}^{k_{r-1}} w_{jk}^r y_k^{r-1}(i) + w_{j0}^r = \sum_{k=0}^{k_{r-1}} w_{jk}^r y_k^{r-1}(i)
\tag{A.20}
\]
where $y_0^r(i) = 1$, $\forall r, i$. For the output layer, we have $r = L$, $y_k^r(i) = \hat{y}_k(i)$, with
$k = 1, 2, \ldots, k_L$, and for the network input we have $r = 1$, $y_k^{r-1}(i) = x_k(i)$, with
$k = 1, 2, \ldots, k_0$. By the chain rule of differentiation, we have
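\[
\frac{\partial E(i)}{\partial w_j^r} = \frac{\partial E(i)}{\partial \upsilon_j^r(i)} \, \frac{\partial \upsilon_j^r(i)}{\partial w_j^r}
\tag{A.21}
\]
From A.20 we obtain
\[
\frac{\partial}{\partial w_j^r} \upsilon_j^r(i) \equiv
\begin{bmatrix}
\dfrac{\partial}{\partial w_{j0}^r} \upsilon_j^r(i) \\
\vdots \\
\dfrac{\partial}{\partial w_{j k_{r-1}}^r} \upsilon_j^r(i)
\end{bmatrix}
= y^{r-1}(i)
\tag{A.22}
\]
Let us define
\[
\frac{\partial E(i)}{\partial \upsilon_j^r(i)} = \delta_j^r(i)
\tag{A.23}
\]
Then, summing the per-sample gradients over the $N$ training samples, A.19 becomes
\[
w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \sum_{i=1}^{N} \delta_j^r(i) \, y^{r-1}(i)
\tag{A.24}
\]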
Summarizing the error backpropagation algorithm:
1. Initialization: Initialize all the weights with small random values.
2. Forward computations: For each of the training feature vectors $x(i)$, $i = 1, 2, \ldots, N$,
compute $\upsilon_j^r(i)$ and $y_j^r(i) = f(\upsilon_j^r(i))$, for $j = 1, 2, \ldots, k_r$, $r = 1, 2, \ldots, L$, using the
sigmoid function $f$. Compute the cost function for the current estimate of the weights from
$E = \sum_{i=1}^{N} E(i)$, with
$E(i) \equiv \frac{1}{2} \sum_{m=1}^{k_L} e_m^2(i) \equiv \frac{1}{2} \sum_{m=1}^{k_L} \left( f(\upsilon_m^L(i)) - y_m(i) \right)^2$.
3. Backward computations: For each $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, k_L$ compute $\delta_j^L(i)$ from
$\delta_j^L(i) = e_j(i) f'(\upsilon_j^L(i))$, and then compute $\delta_j^{r-1}(i)$ from
$\delta_j^{r-1}(i) = \sum_{k=1}^{k_r} \delta_k^r(i) \, w_{kj}^r \, f'(\upsilon_j^{r-1}(i))$,
for $r = L, L-1, \ldots, 2$ and $j = 1, 2, \ldots, k_{r-1}$; $f'(\cdot)$ represents differentiation with respect
to the argument.
4. Update the weights: For $r = 1, 2, \ldots, L$ and $j = 1, 2, \ldots, k_r$,
$w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \sum_{i=1}^{N} \delta_j^r(i) \, y^{r-1}(i)$.
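The four steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the network size (2-2-1), the XOR-like training set, the slope a = 1, the learning coefficient and all function names (`forward`, `cost`, `gradients`, `update`) are choices made for the example, not the implementation used in this work. The analytic gradient produced by the backward pass is compared with a central finite difference as a sanity check.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))      # sigmoid A.17 with slope a = 1

def forward(W, x):
    """Outputs of every layer; each list starts with the bias input y_0 = 1."""
    ys = [[1.0] + list(x)]
    for layer in W:                          # layer[j][k] plays the role of w^r_{jk}
        out = [sigmoid(sum(w * y for w, y in zip(row, ys[-1]))) for row in layer]
        ys.append([1.0] + out)
    return ys

def cost(W, data):
    """Sum over the samples of 1/2 * sum_m (obtained - desired)^2."""
    return sum(0.5 * sum((yh - y) ** 2 for yh, y in zip(forward(W, x)[-1][1:], t))
               for x, t in data)

def gradients(W, data):
    """Accumulate sum_i delta_j^r(i) * y^{r-1}(i) for every weight."""
    grads = [[[0.0] * len(row) for row in layer] for layer in W]
    for x, t in data:
        ys = forward(W, x)
        # Output layer: delta = e * f'(v), and f'(v) = y (1 - y) for the sigmoid.
        delta = [(yh - y) * yh * (1.0 - yh) for yh, y in zip(ys[-1][1:], t)]
        for r in range(len(W) - 1, -1, -1):
            for j, dj in enumerate(delta):
                for k, yk in enumerate(ys[r]):
                    grads[r][j][k] += dj * yk
            if r > 0:                        # propagate the deltas one layer back
                prev = ys[r][1:]
                delta = [sum(dj * W[r][j][k + 1] for j, dj in enumerate(delta))
                         * prev[k] * (1.0 - prev[k]) for k in range(len(prev))]
    return grads

def update(W, grads, mu):
    return [[[w - mu * g for w, g in zip(row, grow)] for row, grow in zip(layer, glayer)]
            for layer, glayer in zip(W, grads)]

random.seed(0)
sizes = [2, 2, 1]                            # hypothetical architecture
W = [[[random.uniform(-0.5, 0.5) for _ in range(sizes[r] + 1)] for _ in range(sizes[r + 1])]
     for r in range(len(sizes) - 1)]
data = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]

# Gradient check on one hidden-layer weight.
analytic = gradients(W, data)[0][1][2]
eps = 1e-6
W_plus = [[row[:] for row in layer] for layer in W]
W_minus = [[row[:] for row in layer] for layer in W]
W_plus[0][1][2] += eps
W_minus[0][1][2] -= eps
numeric = (cost(W_plus, data) - cost(W_minus, data)) / (2 * eps)

W_next = update(W, gradients(W, data), mu=0.5)   # one weight update
print(analytic, numeric)
```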
A.2.2 Graphical explanation
To simplify the presentation of this method we will only consider a neural network with three
layers (see Figure A.1). Each neuron computes the sum of the products of the weights with the
input signals, followed by the sigmoid activation function. The signal e represents the output
signal of the sum and y = f(e) represents the output signal of the neuron.
Figure A.1: Neural network architecture with three layers, two inputs and one output.
To prepare the network for classification it is necessary to train it on a data set constituted
by the input signals ($x_1$ and $x_2$) associated with their targets $z$. This training is an
iterative process: in each iteration the node weights are modified using a new training
sample, in order to minimize the cost function. This modification is computed using the error
backpropagation algorithm. Each learning step starts by presenting both input signals
of the training sample to determine the values of the output signals for each neuron in each network
layer. Figure A.2 shows how the signal propagates through the network. The symbols $w_{(x_m)n}$
are the weights of the edges between the network input $x_m$ and the neuron $n$ in the input layer;
the symbols $y_n$ represent the output signal of the neuron $n$.
In the next step, the output signal $y$ is compared with the value of the desired output $z$
(see Figure A.3). The difference is called the error signal $d$ of the output layer neuron. Since
the desired output values of the neurons in the hidden layers are unknown, it is not possible to
compute their error signals directly. The aim is therefore to propagate the error signal $d$
backwards to all the neurons (see Figure A.4). The weights $w_{mn}$ used are the same as the
weights used to calculate the output value. Only the direction of the data flow is changed, that is,
the signals propagate through all the layers of the network from the output to the input, one
after the other.
When the error signal for each neuron has been computed, the weight coefficients of each
input neuron are modified (see Figure A.5 ¹).
¹ $df(e)/de$ represents the derivative of the activation function of the neuron whose weights are modified.
Figure A.2: Forward propagation of the signal in the neural network: (a)–(c) input layer; (d)–(e) hidden layer; (f) output layer.
Figure A.3: Comparison between the signal output and the target.
Figure A.4: Propagation of the error signal d in backward mode in the neural network: (a)–(b) output layer; (c)–(e) hidden layer.
Figure A.5: The calculation of weights in the neural network: (a)–(c) input layer; (d)–(e) hidden layer; (f) output layer.
A.3 Dalitz's Algorithm
Dalitz's algorithm is a generalization of the method described in [MO07]. It starts by estimating
the values of the staff line thickness and the staff space height. These values are estimated by
the technique used in Fujinaga [Fuj04]: the most frequent black run represents the staff line
height (staffline_height) and the most frequent white run represents the vertical line distance
within the same staff (staffspace_height). This technique starts by computing the vertical
run-length representation of the image.
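The run-length estimate described above can be sketched as follows. This is an illustrative sketch only: the function name `estimate_staff_metrics`, the image encoding (a list of rows with 1 for black and 0 for white) and the synthetic test image are assumptions made for the example, not part of the Gamera toolkit.

```python
from collections import Counter

def estimate_staff_metrics(image):
    """Return (staffline_height, staffspace_height) as the most frequent
    vertical black and white run lengths. image[row][col] is 1 (black) or 0 (white)."""
    black, white = Counter(), Counter()
    height, width = len(image), len(image[0])
    for col in range(width):
        run_value, run_length = image[0][col], 0
        for row in range(height):
            pixel = image[row][col]
            if pixel == run_value:
                run_length += 1
            else:
                # The run ended: record its length in the histogram of its colour.
                (black if run_value else white)[run_length] += 1
                run_value, run_length = pixel, 1
        (black if run_value else white)[run_length] += 1   # last run of the column
    return black.most_common(1)[0][0], white.most_common(1)[0][0]

# Synthetic staff (hypothetical): 5 lines, 2 pixels thick, 6 pixels apart.
pattern = [0] * 3
for line in range(5):
    pattern += [1] * 2
    pattern += [0] * 6 if line < 4 else [0] * 5
image = [[value] * 20 for value in pattern]

staffline_height, staffspace_height = estimate_staff_metrics(image)
print(staffline_height, staffspace_height)
```

The sketch assumes the image contains at least one black and one white run; an all-white image would need an explicit guard.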
Dalitz's process for finding the staff lines operates on a set of staffsegments and requires
methods not only for linking two of those staffsegments horizontally and vertically, but also for
merging two segments with overlapping positions into one. The algorithm consists of the
following steps:
1. Add vertical links between staffsegments with a vertical distance around
staffline_height + staffspace_height.
2. Add horizontal links between adjacent staffsegments possibly belonging to the same
staff line.
3. Partition the resulting graph into connected subgraphs; each subgraph that is wide and
high enough corresponds to a staff.
4. All staffsegments within a system are labelled as belonging to a certain staff line.
Segments of the same line at the same horizontal position are merged into one segment.
5. Due to ledger lines, ties and beams, some subgraphs will contain too many staff lines. To
reduce them to a predefined number of lines per staff (typically five for modern notation,
four for chant and six for tablature), the outer staff lines of each staff are successively
removed until the predefined number of staff lines remains.
Dalitz developed the following algorithm to find the staffsegments:
1. Extract horizontal runs with more than 60 percent black pixels within a window of width
staffspace_height.
2. The resulting filaments are vertically thinned by replacing each vertical black run with
its middle pixel. For black runs higher than 2*staffline_height, more than one skeleton
point is extracted.
3. The resulting skeleton segments wider than 2*staffline_height are the staffsegments.
A.4 Matching in Bipartite Graphs
Before moving to the definition of the Kuhn-Munkres algorithm it is necessary to give some
definitions. Let $G = (X \cup Y, X \times Y)$ be a weighted complete bipartite graph, where edge $xy$
has weight $w(xy)$. $X$ and $Y$ have the same size, $n$, and can be written as
$X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$.
Definition 1: A feasible vertex labelling for $G$ is a function $l : V(G) \to \mathbb{Z}$ such that
$l(x) + l(y) \ge w(xy)$ for all $x \in X$ and $y \in Y$. We define the size of $l$ by
$\mathrm{size}(l) = \sum_{v \in V(G)} l(v)$.
Definition 2: Let $l$ be a feasible vertex labelling of $G$. Then the equality subgraph $G_l$ for $l$ is
the spanning subgraph of $G$ containing all edges $xy$ for which $l(x) + l(y) = w(xy)$.
Suppose that each edge $e$ of $G$ has an integer weight $w(e)$. The algorithm iteratively constructs
a sequence of feasible vertex labellings $l_1, l_2, \ldots$ for $G$ such that
$\mathrm{size}(l_{i+1}) < \mathrm{size}(l_i)$, and a sequence of matchings $M_i$ such that $M_i$ is a maximum
matching in the equality subgraph $G_{l_i}$, for all $i \ge 1$. It stops when it finds a feasible vertex
labelling $l_i$ for which $M_i$ is a perfect matching in $G_{l_i}$.
The Kuhn-Munkres Algorithm (or Hungarian method) Start with an arbitrary feasible
vertex labelling $l$, determine $G_l$, and choose an arbitrary matching $M$ in $G_l$. In what
follows, $J_{G_l}(S)$ denotes the set of neighbours of $S$ in $G_l$.
1. If $M$ is complete for $G$, then $M$ is optimal. Stop. Otherwise, there is some unmatched
$x \in X$. Set $S = \{x\}$ and $T = \emptyset$.
2. If $J_{G_l}(S) \ne T$, go to step 3. Otherwise, $J_{G_l}(S) = T$. Find
\[
\alpha_l = \min_{x \in S,\, y \in T^c} \{ l(x) + l(y) - w(xy) \}
\]
where $T^c$ denotes the complement of $T$ in $Y$, and construct a new labelling $l'$ by
\[
l'(v) = \begin{cases}
l(v) - \alpha_l & \text{for } v \in S \\
l(v) + \alpha_l & \text{for } v \in T \\
l(v) & \text{otherwise}
\end{cases}
\]
Note that $\alpha_l > 0$ and $J_{G_{l'}}(S) \ne T$. Replace $l$ by $l'$ and $G_l$ by $G_{l'}$.
3. Choose a vertex $y$ in $J_{G_l}(S)$, not in $T$. If $y$ is matched in $M$, say with $z \in X$, replace
$S$ by $S \cup \{z\}$ and $T$ by $T \cup \{y\}$, and go to step 2. Otherwise, there will be an
$M$-alternating path from $x$ to $y$, and we may use this path to find a larger matching
$M'$ in $G_l$. Replace $M$ by $M'$ and go to step 1.
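The following sketch illustrates the maximum-weight assignment that the Kuhn-Munkres algorithm computes. It is not a line-by-line transcription of the steps above: it is a commonly used compact variant that maintains vertex potentials (the counterpart of the feasible labelling) and augments one row at a time, applied to negated weights so that minimization yields a maximum-weight matching. The function names and the 3 × 3 weight matrix are assumptions made for the example; a brute-force search over all permutations is included as an independent check.

```python
from itertools import permutations

def max_weight_assignment(weights):
    """Return match, where match[i] is the column assigned to row i, maximizing
    the total weight of a perfect matching in a square weight matrix."""
    n = len(weights)
    cost = [[-w for w in row] for row in weights]     # maximize w == minimize -w
    INF = float("inf")
    u = [0] * (n + 1)        # potentials (labels) of the rows, 1-based
    v = [0] * (n + 1)        # potentials (labels) of the columns, 1-based
    p = [0] * (n + 1)        # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)      # predecessor columns on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):                    # relabel, as in step 2
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:                            # reached an unmatched column
                break
        while j0:                                     # augment along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match

def brute_force_best(weights):
    n = len(weights)
    return max(sum(weights[i][perm[i]] for i in range(n)) for perm in permutations(range(n)))

weights = [[3, 5, 2], [4, 1, 6], [2, 7, 3]]           # hypothetical edge weights w(x_i y_j)
match = max_weight_assignment(weights)
total = sum(weights[i][match[i]] for i in range(len(weights)))
print(match, total)
```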
A.5 Otsu Threshold Algorithm
The segmentation process divides the image into homogeneous sub-regions. Homogeneity means
that some feature, such as colour intensity or gray level, is uniform within a region. The Otsu
algorithm is used to find the intensity parameter $T$ that divides the initial image into two
parts: the class of white pixels and the class of black pixels. This method considers that any point
with an intensity equal to or greater than $T$ belongs to the region of interest, and all
the others are considered background of the image. In this manner, a new image $g(x, y)$ is
generated as follows:
\[
g(x, y) = \begin{cases} 1 & \text{if } f(x, y) \ge T \\ 0 & \text{if } f(x, y) < T \end{cases}
\]
where $f(x, y)$ is the original image. After the segmentation, the matrix of the new image will
consist of 0s (black pixels) and 1s (white pixels) [GWE04].
Otsu considers the normalized histogram of the image as a discrete probability density function,
as follows:
\[
p_r(r_q) = \frac{n_q}{n}, \quad q = 0, 1, 2, \ldots, L - 1
\]
where $n$ is the total number of pixels in the image, $n_q$ is the number of pixels that have intensity
$r_q$ and $L$ is the total number of possible levels of intensity in the image. The Otsu method
chooses the threshold $k$, such that $k$ is the intensity level where $C_0 = [0, 1, \ldots, k - 1]$ and
$C_1 = [k, k + 1, \ldots, L - 1]$, that maximizes the between-class variance $\sigma_B^2$, which is defined
as
\[
\sigma_B^2 = \omega_0 (\mu_0 - \mu_T)^2 + \omega_1 (\mu_1 - \mu_T)^2
\]
where
\[
\omega_0 = \sum_{q=0}^{k-1} p_q(r_q), \qquad
\omega_1 = \sum_{q=k}^{L-1} p_q(r_q),
\]
\[
\mu_0 = \sum_{q=0}^{k-1} q \, p_q(r_q) / \omega_0, \qquad
\mu_1 = \sum_{q=k}^{L-1} q \, p_q(r_q) / \omega_1, \qquad
\mu_T = \sum_{q=0}^{L-1} q \, p_q(r_q)
\]
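The threshold selection above can be sketched directly from a histogram. This is a minimal illustration under stated assumptions: the function names `otsu_threshold` and `binarize`, the 8-level histogram and the tiny image are hypothetical, and ties between equally good thresholds are resolved in favour of the smallest k.

```python
def otsu_threshold(hist):
    """Return the level k that maximizes the between-class variance, where
    C0 = [0, ..., k-1] and C1 = [k, ..., L-1]. hist[q] is the pixel count n_q."""
    n = sum(hist)
    levels = len(hist)
    p = [h / n for h in hist]                       # normalized histogram
    mu_t = sum(q * p[q] for q in range(levels))     # global mean
    best_k, best_var = 0, -1.0
    w0 = 0.0        # omega_0, accumulated over C0
    m0 = 0.0        # sum of q * p over C0 (equals omega_0 * mu_0)
    for k in range(1, levels):
        w0 += p[k - 1]
        m0 += (k - 1) * p[k - 1]
        w1 = 1.0 - w0
        if w0 <= 0.0 or w1 <= 0.0:
            continue                                # one class is empty
        mu0 = m0 / w0
        mu1 = (mu_t - m0) / w1
        var_b = w0 * (mu0 - mu_t) ** 2 + w1 * (mu1 - mu_t) ** 2
        if var_b > best_var:
            best_var, best_k = var_b, k
    return best_k

def binarize(image, threshold):
    """g(x, y) = 1 if f(x, y) >= T, and 0 otherwise."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]

hist = [10, 10, 0, 0, 0, 0, 10, 10]                 # hypothetical bimodal histogram, L = 8
k = otsu_threshold(hist)
binary = binarize([[0, 1, 6], [7, 1, 6]], k)
print(k, binary)
```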
A.6 Staff Line Removal Algorithm
staffLineRemoval(IMAGE, STAVES)
    threshold = 2*staffHeight;
    tolerance = 1 + ceil(staffHeight/3.0);
    IMAGE_REMOVE = copy(IMAGE);
    For (nvalid = 0 to STAVES size)
        Point2D staff = validStaves[nvalid];
        For (i = 0 to staff size)
            col = staff[i].x;
            refRow = staff[i].y;
            row = refRow;
            pel = valuePixel(IMAGE, IMAGE_REMOVE);
            decrement/increase the reference row until one pixel
            different from white pixel (dist1/dist2) is found;
            If ( dist1 <= max(1, min(dist2, tolerance)) )
                refRow -= dist1;
            Else
                If ( dist2 <= max(1, min(dist1, tolerance)) )
                    refRow += dist2;
                Else
                    continue;
            Count the number of decrements/increases on the reference row
            until the black pixel changes to white pixel (run);
            If ( run >= threshold )
                continue;
            remove the vertical black sequences on the IMAGE;
Listing 3: Staff line removal algorithm.
A.7 Baum-Welch Algorithm
The Baum-Welch algorithm, or Forward-Backward procedure, considers the forward variable
$\alpha_t(i)$ defined as
\[ \alpha_t(i) = P(o_1 o_2 \ldots o_t, q_t = S_i \mid \lambda) \]
that is, the probability of the partial observation sequence $o_1 o_2 \ldots o_t$ and state $S_i$ at time $t$,
given the model $\lambda$, and the backward variable $\beta_t(i)$ defined as
\[ \beta_t(i) = P(o_{t+1} o_{t+2} \ldots o_T \mid q_t = S_i, \lambda) \]
that is, the probability of the partial observation sequence from $t + 1$ to the end, given state $S_i$
at time $t$ and the model $\lambda$. Besides these variables the Baum-Welch algorithm also needs two
more auxiliary variables that can be expressed in terms of the forward and backward variables. The
first is
\[ \xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda) \]
that is, the probability of being in state $S_i$ at time $t$ and state $S_j$ at time $t + 1$, given the model
and the observation sequence $O$. This is the same as
\[ \xi_t(i, j) = \frac{P(q_t = S_i, q_{t+1} = S_j, O \mid \lambda)}{P(O \mid \lambda)} \]
Using forward and backward variables this can be expressed as
\[
\xi_t(i, j) = \frac{\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}
{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)}
\tag{A.25}
\]
The second variable is the a posteriori probability
\[ \gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j) \tag{A.26} \]
In terms of the forward and backward variables this can be expressed by
\[ \gamma_t(i) = \frac{\alpha_t(i) \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i) \beta_t(i)} \]
Assuming a starting model $\lambda = (A, B, \pi)$, this algorithm starts by computing the $\alpha$'s and $\beta$'s
as follows. The forward probabilities are initialized as the joint probability of state $S_i$
and the initial observation $o_1$:

$$\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$$

The next step consists of the sum over all the $N$ possible states $S_i$, $1 \le i \le N$, at time $t$:

$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1}), \quad 1 \le t \le T-1, \quad 1 \le j \le N$$

This results in the probability of $S_j$ at time $t+1$ with all the accompanying previous partial
observations. The $\alpha_{t+1}(j)$ is obtained by accounting for observation $o_{t+1}$ in state $j$, that is, by
multiplying the summed quantity by the probability $b_j(o_{t+1})$. The backward procedure starts
with the initialization of the $\beta_T(i)$:

$$\beta_T(i) = 1, \quad 1 \le i \le N$$

and proceeds with the calculation of

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N, \quad t = T-1, T-2, \ldots, 1.$$
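The two recursions above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the `Hmm` struct, the function names and the 0-based indexing of states, symbols and time are all hypothetical, and no scaling is applied, so the products may underflow for long observation sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical container for a discrete HMM lambda = (A, B, pi).
struct Hmm {
    Matrix a;                // a[i][j]: transition probability S_i -> S_j
    Matrix b;                // b[j][k]: probability of symbol v_k in state S_j
    std::vector<double> pi;  // pi[i]: initial probability of state S_i
};

// Forward pass: alpha[t][i] = P(o_1 ... o_t, q_t = S_i | lambda), 0-based t.
Matrix forward(const Hmm& m, const std::vector<int>& obs) {
    assert(!obs.empty());
    const std::size_t n = m.pi.size(), T = obs.size();
    Matrix alpha(T, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        alpha[0][i] = m.pi[i] * m.b[i][obs[0]];        // initialization
    for (std::size_t t = 0; t + 1 < T; ++t)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t i = 0; i < n; ++i)
                sum += alpha[t][i] * m.a[i][j];        // sum over predecessors
            alpha[t + 1][j] = sum * m.b[j][obs[t + 1]];
        }
    return alpha;
}

// Backward pass: beta[t][i] = P(o_{t+1} ... o_T | q_t = S_i, lambda).
Matrix backward(const Hmm& m, const std::vector<int>& obs) {
    assert(!obs.empty());
    const std::size_t n = m.pi.size(), T = obs.size();
    Matrix beta(T, std::vector<double>(n, 1.0));       // beta_T(i) = 1
    for (std::size_t t = T - 1; t-- > 0;)              // t = T-2, ..., 0
        for (std::size_t i = 0; i < n; ++i) {
            double sum = 0.0;
            for (std::size_t j = 0; j < n; ++j)
                sum += m.a[i][j] * m.b[j][obs[t + 1]] * beta[t + 1][j];
            beta[t][i] = sum;
        }
    return beta;
}

// P(o | lambda) obtained by summing the last forward column.
double likelihood(const Matrix& alpha) {
    double p = 0.0;
    for (double v : alpha.back()) p += v;
    return p;
}
```

A useful sanity check is that the sum over i of alpha_t(i) * beta_t(i) is the same for every t and equals P(o | lambda), which is exactly the denominator of the expression for gamma_t(i).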
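Then the algorithm computes the $\xi$'s and $\gamma$'s using Equations A.25 and A.26, respectively. The
next step is to update the HMM parameters according to

$$\pi_i = \gamma_1(i), \quad 1 \le i \le N \qquad \text{(A.27)}$$

$$a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \quad 1 \le i \le N, \quad 1 \le j \le N \qquad \text{(A.28)}$$

$$b_j(k) = \frac{\sum_{t=1,\ \text{s.t. } o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad 1 \le j \le N, \quad 1 \le k \le M \qquad \text{(A.29)}$$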
A.8 Viterbi Algorithm
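Let $\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P[q_1 q_2 \ldots q_{t-1},\ q_t = i,\ o_1 o_2 \ldots o_t \mid \lambda]$ be the highest probability along a single
path, at time $t$, which accounts for the first $t$ observations and ends in state $S_i$. By induction
we have

$$\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$$

The array $\psi_t(j)$ keeps track, for each $t$ and $j$, of the state $i$ that maximizes $\delta_{t-1}(i)\, a_{ij}$. The procedure is described below:

1. Initialization:

$$\delta_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$$
$$\psi_1(i) = 0$$

2. Recursion:

$$\delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t), \quad 2 \le t \le T, \quad 1 \le j \le N$$
$$\psi_t(j) = \arg\max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right], \quad 2 \le t \le T, \quad 1 \le j \le N$$

3. Termination:

$$P^* = \max_{1 \le i \le N} \left[ \delta_T(i) \right]$$
$$q_T^* = \arg\max_{1 \le i \le N} \left[ \delta_T(i) \right]$$

4. Path backtracking:

$$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1$$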
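The four steps of the procedure can be sketched in C++ as below. This is only an illustration under stated assumptions: the function name `viterbi`, the argument layout and the 0-based state indices are hypothetical, and the probabilities are multiplied directly, whereas a practical implementation would usually work with log-probabilities to avoid underflow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns the most probable state sequence q* (0-based state indices).
// a[i][j]: transition S_i -> S_j, b[j][k]: emission of symbol k in S_j,
// pi[i]: initial probability, obs: observed symbol indices.
std::vector<int> viterbi(const Matrix& a, const Matrix& b,
                         const std::vector<double>& pi,
                         const std::vector<int>& obs) {
    assert(!obs.empty());
    const std::size_t n = pi.size(), T = obs.size();
    Matrix delta(T, std::vector<double>(n, 0.0));
    std::vector<std::vector<int>> psi(T, std::vector<int>(n, 0));

    // 1. Initialization
    for (std::size_t i = 0; i < n; ++i)
        delta[0][i] = pi[i] * b[i][obs[0]];

    // 2. Recursion: keep the best predecessor of every state at every time
    for (std::size_t t = 1; t < T; ++t)
        for (std::size_t j = 0; j < n; ++j) {
            std::size_t best = 0;
            for (std::size_t i = 1; i < n; ++i)
                if (delta[t - 1][i] * a[i][j] > delta[t - 1][best] * a[best][j])
                    best = i;
            delta[t][j] = delta[t - 1][best] * a[best][j] * b[j][obs[t]];
            psi[t][j] = static_cast<int>(best);
        }

    // 3. Termination: most probable final state
    std::size_t last = 0;
    for (std::size_t i = 1; i < n; ++i)
        if (delta[T - 1][i] > delta[T - 1][last]) last = i;

    // 4. Path backtracking
    std::vector<int> path(T);
    path[T - 1] = static_cast<int>(last);
    for (std::size_t t = T - 1; t-- > 0;)
        path[t] = psi[t + 1][path[t + 1]];
    return path;
}
```

With uniform transitions and emissions that strongly favour one symbol per state, the decoded path simply follows the observed symbols, which makes a convenient first test.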
Appendix B
Table of confusion
B.1 Results Obtained Without Elastic Matching for the
Handwritten Music Symbols
Accent BassClef Beam Flat Natural Note NoteFlag NoteOpen RestI RestII Sharp Staccatissimo TrebleClef Unknown
Accent 47 0 0 0 0 0 0 0 0 0 0 0 0 0
BassClef 0 6 0 0 0 0 0 0 0 0 0 0 0 0
Beam 1 0 107 0 0 0 0 0 0 0 0 0 0 1
Flat 0 0 0 57 0 0 0 0 0 0 0 0 0 0
Natural 0 0 0 0 79 0 0 0 0 0 0 0 0 0
Note 0 0 0 1 0 113 2 0 0 0 0 0 0 0
NoteFlag 0 0 0 1 0 4 25 0 0 0 0 0 0 0
NoteOpen 0 0 0 0 0 1 0 4 0 0 0 0 0 1
RestI 0 0 0 0 0 0 0 0 33 0 0 0 0 0
RestII 0 0 0 0 0 0 0 0 0 100 0 0 0 0
Sharp 0 0 0 1 0 1 0 0 0 0 83 0 0 1
Staccatissimo 0 0 0 0 0 0 0 0 0 0 0 5 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 24 0
Unknown 0 0 4 3 3 6 4 1 0 2 1 0 0 77
Table B.1: Table of confusion of the nearest neighbor classifier.
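As a reading aid for the tables in this appendix, the sketch below computes the overall accuracy of a table of confusion as the sum of its diagonal divided by the sum of all entries. The function name `overallAccuracy` is hypothetical, and the measure is shown only as one common summary; it is not claimed to be the figure of merit reported elsewhere in this work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Overall accuracy of a square table of confusion: correctly classified
// samples (the diagonal) divided by the total number of samples.
double overallAccuracy(const std::vector<std::vector<int>>& confusion) {
    long long correct = 0, total = 0;
    for (std::size_t r = 0; r < confusion.size(); ++r)
        for (std::size_t c = 0; c < confusion[r].size(); ++c) {
            total += confusion[r][c];
            if (r == c) correct += confusion[r][c];
        }
    assert(total > 0);  // an empty table has no defined accuracy
    return static_cast<double>(correct) / static_cast<double>(total);
}
```

For Table B.1, direct summation of the entries gives 760 samples on the diagonal out of 799 in total, that is, about 95.1%.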
Accent BassClef Beam Flat Natural Note NoteFlag NoteOpen RestI RestII Sharp Staccatissimo TrebleClef Unknown
Accent 42 0 0 1 0 0 2 0 0 0 0 0 0 2
BassClef 0 0 0 0 0 3 0 0 1 0 0 0 0 2
Beam 1 0 94 0 1 1 1 0 0 0 1 0 0 10
Flat 2 0 0 45 0 0 3 0 0 0 0 0 0 7
Natural 1 0 0 0 66 3 0 0 0 0 0 0 0 9
Note 0 0 0 1 0 98 3 0 1 2 1 0 1 9
NoteFlag 0 0 0 2 0 4 21 0 1 0 0 0 0 2
NoteOpen 0 0 0 0 0 5 0 0 0 0 0 0 0 1
RestI 0 0 0 0 0 1 0 0 26 2 2 0 0 2
RestII 0 0 0 0 0 1 0 0 0 99 0 0 0 0
Sharp 0 0 0 2 0 0 4 0 3 0 68 0 0 9
Staccatissimo 0 0 0 0 0 0 0 0 0 0 2 3 0 0
TrebleClef 0 0 1 0 0 3 0 0 0 0 1 0 14 5
Unknown 0 0 4 2 0 17 9 0 4 3 3 0 0 59
Table B.2: Table of confusion of the neural network.
Accent BassClef Beam Flat Natural Note NoteFlag NoteOpen RestI RestII Sharp Staccatissimo TrebleClef Unknown
Accent 42 0 0 0 0 2 0 0 0 0 0 0 0 3
BassClef 0 6 0 0 0 0 0 0 0 0 0 0 0 0
Beam 0 0 105 0 0 0 0 0 0 0 0 0 0 4
Flat 0 0 0 57 0 0 0 0 0 0 0 0 0 0
Natural 0 0 0 0 79 0 0 0 0 0 0 0 0 0
Note 0 0 0 0 0 111 0 0 0 0 0 0 0 5
NoteFlag 0 0 0 0 0 0 28 0 0 0 0 0 0 2
NoteOpen 0 0 0 0 0 1 0 3 0 0 0 0 0 2
RestI 0 0 0 0 0 0 0 0 32 0 0 0 0 1
RestII 0 0 0 0 0 0 0 0 0 100 0 0 0 0
Sharp 0 0 0 0 0 0 0 0 0 0 85 0 0 1
Staccatissimo 0 0 0 0 0 0 0 0 0 0 0 5 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 20 4
Unknown 0 0 2 2 1 1 1 0 0 0 0 0 0 94
Table B.3: Table of confusion of the support vector machines.
Accent BassClef Beam Flat Natural Note NoteFlag NoteOpen RestI RestII Sharp Staccatissimo TrebleClef Unknown
Accent 46 0 0 0 0 1 0 0 0 0 0 0 0 1
BassClef 0 5 0 0 0 2 0 0 0 0 0 0 0 2
Beam 3 0 102 0 0 3 0 0 0 0 0 0 0 3
Flat 0 0 0 48 11 0 0 0 0 0 0 0 0 0
Natural 0 0 0 1 78 1 0 0 0 0 0 0 0 0
Note 0 0 4 1 12 82 0 0 1 4 3 0 0 11
NoteFlag 3 0 0 0 2 2 20 0 0 1 1 0 0 3
NoteOpen 0 0 0 0 3 2 2 2 0 0 0 0 0 0
RestI 0 0 0 0 0 0 0 0 36 0 0 0 0 0
RestII 2 0 0 2 2 0 0 0 0 94 0 0 0 1
Sharp 0 0 0 0 10 1 0 0 1 1 75 0 0 0
Staccatissimo 0 0 0 0 0 0 0 0 0 0 0 6 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 26 1
Unknown 2 0 17 2 7 23 1 0 1 0 3 0 4 41
Table B.4: Table of confusion of the hidden Markov model.
B.2 Results Obtained Without Elastic Matching for the
Printed Music Symbols
AltoClef Beam Flat Natural Note NoteFlag NoteOpen TieSlur RestI RestII Sharp Time TrebleClef Unknown
AltoClef 50 0 0 0 0 0 0 0 0 0 0 0 0 0
Beam 0 73 0 0 0 0 0 0 0 0 0 0 0 0
Flat 0 0 38 0 0 0 0 0 0 0 0 0 0 0
Natural 0 0 0 31 0 0 0 0 0 0 0 0 0 0
Note 0 0 0 0 75 1 0 0 0 0 0 0 0 0
NoteFlag 0 0 0 0 1 28 0 1 0 0 0 0 0 0
NoteOpen 0 0 0 0 0 0 77 0 0 0 0 0 0 0
TieSlur 0 0 0 0 0 0 0 28 0 0 0 0 0 2
RestI 0 0 0 0 0 0 0 0 16 0 0 0 0 0
RestII 0 0 0 0 0 0 0 0 0 15 0 0 0 0
Sharp 0 0 0 0 0 0 0 0 0 0 80 0 0 0
Time 0 0 0 0 0 0 0 0 0 0 0 3 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 30 0
Unknown 0 0 0 0 1 0 3 7 0 2 0 0 0 63
Table B.5: Table of confusion of the nearest neighbor classifier.
AltoClef Beam Flat Natural Note NoteFlag NoteOpen TieSlur RestI RestII Sharp Time TrebleClef Unknown
AltoClef 47 3 0 0 0 0 0 0 0 0 0 0 0 0
Beam 1 68 0 0 0 0 2 0 0 0 0 0 0 2
Flat 0 0 38 0 0 0 0 0 0 0 0 0 0 0
Natural 0 0 0 31 0 0 0 0 0 0 0 0 0 0
Note 0 3 0 0 67 1 1 0 0 0 0 0 0 4
NoteFlag 0 0 0 0 0 24 0 0 0 0 0 0 0 1
NoteOpen 0 0 0 0 4 1 68 1 0 0 0 0 0 1
TieSlur 0 0 0 0 4 1 0 19 0 0 0 0 0 6
RestI 0 0 0 0 0 0 0 0 16 0 0 0 0 0
RestII 0 0 0 0 1 0 0 1 0 9 0 2 0 2
Sharp 0 0 0 0 0 0 0 0 0 0 79 0 0 1
Time 0 0 0 0 0 0 0 1 0 0 0 0 0 2
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 28 2
Unknown 3 0 0 0 4 4 5 7 0 1 0 1 0 51
Table B.6: Table of confusion of the neural network.
AltoClef Beam Flat Natural Note NoteFlag NoteOpen TieSlur RestI RestII Sharp Time TrebleClef Unknown
AltoClef 50 0 0 0 0 0 0 0 0 0 0 0 0 0
Beam 0 73 0 0 0 0 0 0 0 0 0 0 0 0
Flat 0 0 37 0 0 0 0 0 0 0 0 0 0 1
Natural 0 0 0 31 0 0 0 0 0 0 0 0 0 0
Note 0 0 0 0 76 0 0 0 0 0 0 0 0 0
NoteFlag 0 0 0 0 0 27 0 0 0 0 0 0 0 3
NoteOpen 0 0 0 0 0 0 77 0 0 0 0 0 0 0
TieSlur 0 0 0 0 0 0 0 26 0 0 0 0 0 4
RestI 0 0 0 0 0 0 0 0 16 0 0 0 0 0
RestII 0 0 0 0 0 0 0 0 0 15 0 0 0 0
Sharp 0 0 0 0 0 0 0 0 0 0 80 0 0 0
Time 0 0 0 0 0 0 0 0 0 0 0 3 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 30 0
Unknown 0 0 0 0 0 0 1 1 0 0 0 0 0 74
Table B.7: Table of confusion of the support vector machines.
AltoClef Beam Flat Natural Note NoteFlag NoteOpen TieSlur RestI RestII Sharp Time TrebleClef Unknown
AltoClef 39 0 0 0 0 0 0 0 0 0 0 0 0 11
Beam 0 71 0 0 0 0 0 0 0 0 0 0 0 2
Flat 3 0 38 0 0 0 0 0 0 0 0 0 0 0
Natural 0 0 0 31 0 0 0 0 0 0 0 0 0 0
Note 0 0 0 0 71 0 0 0 0 0 0 0 0 5
NoteFlag 0 0 0 0 1 19 2 0 0 1 2 0 0 5
NoteOpen 2 0 0 0 0 1 68 0 0 0 0 0 0 6
TieSlur 0 0 1 0 0 0 0 25 0 0 0 0 0 4
RestI 0 0 0 0 0 0 0 0 16 0 0 0 0 0
RestII 0 0 0 0 0 0 0 0 0 13 0 0 0 2
Sharp 0 0 0 0 0 0 0 0 0 0 80 0 0 0
Time 0 0 0 0 0 0 0 0 0 0 0 1 0 2
TrebleClef 0 0 0 0 0 4 0 0 0 0 0 0 11 15
Unknown 1 4 0 0 6 0 3 6 0 0 0 0 0 56
Table B.8: Table of confusion of the hidden Markov model.
B.3 Results Obtained With Elastic Matching for the
Handwritten Music Symbols
Accent BassClef Beam Flat Natural Note NoteFlag NoteOpen RestI RestII Sharp Staccatissimo TrebleClef Unknown
Accent 47 0 0 0 0 0 0 0 0 0 0 0 0 0
BassClef 0 6 0 0 0 0 0 0 0 0 0 0 0 0
Beam 2 0 104 0 0 0 0 0 1 0 0 0 0 2
Flat 0 0 0 57 0 0 0 0 0 0 0 0 0 0
Natural 0 0 0 0 79 0 0 0 0 0 0 0 0 0
Note 0 0 0 0 0 112 1 1 0 0 0 0 0 2
NoteFlag 0 0 0 0 0 4 26 0 0 0 0 0 0 0
NoteOpen 0 0 0 0 0 2 0 4 0 0 0 0 0 0
RestI 0 0 0 0 0 0 0 0 33 0 0 0 0 0
RestII 0 0 0 0 0 0 0 0 0 100 0 0 0 0
Sharp 0 0 0 0 1 0 0 0 0 0 85 0 0 0
Staccatissimo 0 0 0 0 0 0 0 0 0 0 0 5 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 24 0
Unknown 1 0 7 14 0 7 2 0 1 2 3 0 0 64
Table B.9: Table of confusion of the nearest neighbor classifier.
Accent BassClef Beam Flat Natural Note NoteFlag NoteOpen RestI RestII Sharp Staccatissimo TrebleClef Unknown
Accent 43 0 0 1 1 0 0 0 1 0 0 0 1 0
BassClef 0 0 0 0 0 3 1 0 0 0 0 0 0 2
Beam 0 0 94 0 0 3 0 0 0 0 7 0 0 5
Flat 1 0 0 41 6 1 0 0 3 0 3 0 0 2
Natural 2 0 1 3 71 0 0 0 0 0 1 0 0 1
Note 0 0 2 0 1 86 13 0 1 3 1 0 2 7
NoteFlag 0 0 1 0 0 8 12 0 0 0 1 0 1 7
NoteOpen 0 0 0 0 0 3 1 0 0 2 0 0 0 0
RestI 6 0 0 0 3 0 0 0 15 0 7 0 0 2
RestII 0 0 1 0 0 0 0 0 0 93 4 0 0 2
Sharp 2 0 0 2 1 2 0 0 0 1 76 0 1 1
Staccatissimo 0 0 0 0 0 0 0 0 0 2 2 1 0 0
TrebleClef 0 0 0 0 0 2 0 0 1 1 0 0 14 6
Unknown 3 0 8 3 6 17 10 0 0 4 3 0 2 45
Table B.10: Table of confusion of the neural network.
Accent BassClef Beam Flat Natural Note NoteFlag NoteOpen RestI RestII Sharp Staccatissimo TrebleClef Unknown
Accent 47 0 0 0 0 0 0 0 0 0 0 0 0 0
BassClef 0 6 0 0 0 0 0 0 0 0 0 0 0 0
Beam 0 0 109 0 0 0 0 0 0 0 0 0 0 0
Flat 0 0 0 57 0 0 0 0 0 0 0 0 0 0
Natural 0 0 0 0 79 0 0 0 0 0 0 0 0 0
Note 0 0 0 0 0 116 0 0 0 0 0 0 0 0
NoteFlag 0 0 0 0 0 0 30 0 0 0 0 0 0 0
NoteOpen 0 0 0 0 0 0 0 6 0 0 0 0 0 0
RestI 0 0 0 0 0 0 0 0 33 0 0 0 0 0
RestII 0 0 0 0 0 0 0 0 0 100 0 0 0 0
Sharp 0 0 0 0 0 0 0 0 0 0 86 0 0 0
Staccatissimo 0 0 0 0 0 0 0 0 0 0 0 5 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 24 0
Unknown 0 0 0 0 0 0 0 0 0 0 0 0 0 101
Table B.11: Table of confusion of the support vector machines.
Accent BassClef Beam Flat Natural Note NoteFlag NoteOpen RestI RestII Sharp Staccatissimo TrebleClef Unknown
Accent 44 0 0 0 1 0 0 0 1 1 0 0 0 1
BassClef 0 6 1 0 2 0 0 0 0 0 0 0 0 0
Beam 0 0 107 0 0 3 0 0 0 0 1 0 0 0
Flat 0 0 2 42 11 0 0 1 3 0 0 0 0 0
Natural 0 0 0 2 74 0 0 0 3 0 0 0 0 1
Note 0 0 7 2 4 92 2 0 4 6 0 1 0 0
NoteFlag 1 0 1 0 3 4 9 1 1 4 6 0 0 2
NoteOpen 0 0 0 0 4 2 1 2 0 0 0 0 0 0
RestI 0 0 0 0 0 0 0 0 36 0 0 0 0 0
RestII 4 0 0 0 3 1 0 1 6 86 0 0 0 0
Sharp 0 0 2 0 8 3 0 0 0 3 72 0 0 0
Staccatissimo 0 0 0 0 0 0 0 0 0 0 0 6 0 0
TrebleClef 0 1 0 0 1 0 0 0 0 0 4 0 21 0
Unknown 3 1 16 8 8 18 4 0 2 0 7 2 1 31
Table B.12: Table of confusion of the hidden Markov model.
B.4 Results Obtained With Elastic Matching for the Printed
Music Symbols
AltoClef Beam Flat Natural Note NoteFlag NoteOpen TieSlur RestI RestII Sharp Time TrebleClef Unknown
AltoClef 49 0 0 0 0 0 0 0 0 0 0 0 0 1
Beam 0 73 0 0 0 0 0 0 0 0 0 0 0 0
Flat 0 0 38 0 0 0 0 0 0 0 0 0 0 0
Natural 0 0 0 31 0 0 0 0 0 0 0 0 0 0
Note 0 0 0 0 76 0 0 0 0 0 0 0 0 0
NoteFlag 0 0 0 0 3 27 0 0 0 0 0 0 0 0
NoteOpen 0 0 0 0 1 0 76 0 0 0 0 0 0 0
TieSlur 0 0 0 0 0 0 2 27 0 0 0 0 0 1
RestI 0 0 0 0 0 0 0 0 16 0 0 0 0 0
RestII 0 0 0 0 0 0 0 0 0 15 0 0 0 0
Sharp 0 0 0 0 1 0 0 0 0 0 80 0 0 0
Time 0 0 0 0 0 0 0 0 0 0 0 3 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 30 0
Unknown 1 1 2 0 3 0 3 4 0 0 1 0 1 60
Table B.13: Table of confusion of the nearest neighbor classifier.
AltoClef Beam Flat Natural Note NoteFlag NoteOpen TieSlur RestI RestII Sharp Time TrebleClef Unknown
AltoClef 43 0 0 0 0 0 0 0 0 0 0 0 0 7
Beam 0 71 0 0 0 0 1 0 0 0 1 0 0 0
Flat 0 0 36 0 0 0 0 0 0 0 0 0 0 2
Natural 0 1 0 27 0 0 0 2 0 0 0 0 0 1
Note 0 0 0 0 64 0 7 2 0 0 1 0 2 0
NoteFlag 0 0 0 0 3 12 4 0 0 0 2 0 3 6
NoteOpen 0 0 1 0 1 0 71 0 0 0 0 0 2 2
TieSlur 2 1 0 0 0 0 5 4 0 0 1 0 3 14
RestI 0 0 0 0 0 0 0 0 16 0 0 0 0 0
RestII 0 0 0 0 2 2 0 3 0 0 3 0 1 4
Sharp 1 0 0 1 0 0 0 0 0 0 78 0 0 0
Time 0 0 0 0 0 0 0 2 0 0 1 0 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 2 0 28 0
Unknown 3 1 2 0 7 3 8 3 0 0 5 0 1 43
Table B.14: Table of confusion of the neural network.
AltoClef Beam Flat Natural Note NoteFlag NoteOpen TieSlur RestI RestII Sharp Time TrebleClef Unknown
AltoClef 45 0 0 0 0 0 0 0 0 0 0 0 0 5
Beam 0 66 0 0 0 0 0 0 0 0 0 0 0 7
Flat 0 0 37 0 0 0 0 0 0 0 0 0 0 1
Natural 0 0 0 29 0 0 0 0 0 0 0 0 0 2
Note 0 0 0 0 69 0 0 0 0 0 0 0 0 7
NoteFlag 0 0 0 0 0 29 0 0 0 0 0 0 0 1
NoteOpen 0 0 0 0 0 0 72 0 0 0 0 0 0 5
TieSlur 0 0 0 0 0 0 0 26 0 0 0 0 0 4
RestI 0 0 0 0 0 0 0 0 15 0 0 0 0 1
RestII 0 0 0 0 0 0 0 0 0 14 0 0 0 1
Sharp 0 0 0 0 0 0 0 0 0 0 78 0 0 2
Time 0 0 0 0 0 0 0 0 0 0 0 3 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 30 0
Unknown 0 0 0 0 0 0 0 0 0 0 0 0 0 76
Table B.15: Table of confusion of the support vector machines.
AltoClef Beam Flat Natural Note NoteFlag NoteOpen TieSlur RestI RestII Sharp Time TrebleClef Unknown
AltoClef 39 0 0 0 0 1 0 0 0 0 0 0 0 10
Beam 0 73 0 0 0 0 0 0 0 0 0 0 0 0
Flat 0 0 37 1 0 0 0 0 0 0 0 0 0 0
Natural 0 0 0 31 0 0 0 0 0 0 0 0 0 0
Note 0 0 1 0 70 0 1 3 0 0 0 0 0 0
NoteFlag 0 0 0 0 11 15 0 1 0 0 2 0 0 1
NoteOpen 0 0 0 5 0 8 62 0 0 0 0 0 0 2
TieSlur 0 0 1 0 6 0 0 21 0 0 0 0 0 2
RestI 0 0 0 0 0 0 0 0 16 0 0 0 0 0
RestII 0 0 0 0 0 0 0 0 0 13 2 0 0 0
Sharp 0 0 0 0 0 0 0 0 0 0 80 0 0 0
Time 0 0 0 0 0 0 2 0 0 0 0 1 0 0
TrebleClef 0 0 0 0 0 0 0 0 0 0 0 0 30 0
Unknown 2 6 1 0 9 7 5 3 6 3 1 0 0 33
Table B.16: Table of confusion of the hidden Markov model.
Appendix C
Articles Submitted to Conferences
A Shortest Path Approach for Staff Line Detection
Ana Rebelo
FCUP and INESC Porto
Portugal
Artur Capela
FEUP and INESC Porto
Portugal
Joaquim F. Pinto da Costa
FCUP
Portugal
Carlos Guedes
ESMAE
Portugal
Eurico Carrapatoso
FEUP and INESC Porto
Portugal
Jaime S. Cardoso
FEUP and INESC Porto
Portugal
Abstract
Many music works produced in the past still exist only as
original manuscripts or as photocopies. Preserving them
entails their digitalization and consequent accessibility in
an easy-to-manage digital format. Carrying out this task
manually is very time consuming and error prone.
Optical music recognition (OMR) is a form of structured
document image analysis where music symbols are isolated
and identified so that the music can be conveniently pro-
cessed. While OMR systems perform well on printed scores,
current methods for reading handwritten musical scores by
computers remain far from ideal. One of the fundamen-
tal stages of this process is the staff line detection. In this
paper a new method for the automatic detection of mu-
sic stave lines based on a shortest path approach is pre-
sented. Lines with some curvature, discontinuities, and in-
clination are robustly detected. The proposed algorithm be-
haves favourably when compared experimentally with well-
established algorithms.
1. Introduction
The impact of music in our lives can hardly be overes-
timated. Music is a pivotal part of our cultural heritage
and its preservation, in all of its forms, must be pursued.
Frequently, the preservation of many music works entails
the digitalization of these works and consequent accessi-
bility in a format that encourages browsing, analysis and
retrieval. In fact, many music works produced during the
last centuries still exist only as original manuscripts or as
photocopies. The digitalization of these works is therefore
a highly desirable goal. Unfortunately, the ambitious goal
of providing generalized access to handwritten scores that
were never published has been severely hampered by the
current state of the art of handwritten music recognition.
The manual process required to recognize handwritten musical
symbols in scores and to put them in relationship with the
spine structure is very time consuming.
Despite the fact that OMR systems dealing with ma-
chine printed scores exhibit good performance, handwrit-
ten music recognition introduces several additional difficul-
ties. Outstanding problems include notation varying from
writer to writer, and possibly varying in the same score:
symbols and staff lines written with different sizes, shapes
or intensity. Despite the continued research on OMR, with
the availability of several commercial OMR systems, we are
still lacking a satisfactory performance in terms of precision
and reliability. Most of the existing work provides a real ef-
ficiency only when quite regular, printed music sheets are
processed. This condition is exacerbated with handwritten
music scores. This justifies the research around the defini-
tion of reliable OMR algorithms.
Staff line detection is one of the fundamental stages of
the OMR process, with subsequent processes relying heavily
on its performance. The reason for detecting and removing
the staff lines lies in the need to isolate the musical
symbols for a more efficient and correct detection of each
symbol present in the score.
The detection of staves is complicated by a variety of
reasons. Handwritten staff lines are rarely straight and
horizontal, and are not parallel to each other: some staves
may be tilted one way or another on the same page, or they
may be curved. Since handwritten scores tend to be rather
irregular and determined by a person's own writing style,
how far the lines depart from the horizontal depends on how
regularly the author writes. There might also be bigger or
smaller gaps along a staff line. And if we consider that
most of these works are old, the quality of the paper on
which they are written might have degraded throughout the
years, making it a lot harder to correctly identify their
contents.
In this paper a method for the automatic detection of staff
lines based on a shortest path approach is presented. The
proposed paradigm uses the image as a graph, where the
staff lines result as the shortest path between the two mar-
gins of the image.
This introduction is concluded with a brief review of the
work done in this area. In section 2 the proposed algorithm
is described. In section 3, the proposed algorithm is exper-
imentally evaluated using real music scores. Finally, con-
clusions are drawn and future work is outlined in section 4.
1.1. Related Works
Different methods for staff line detection have been
researched. The simplest approach consists in finding local
maxima in the horizontal projection of the black pixels of
the image [2]. These local maxima represent line positions.
This method assumes straight and horizontal lines. Several
horizontal projections can be made with different image
rotation angles, keeping the image for which the local
maxima are largest. This eliminates the assumption that the
lines are always horizontal. An alternative strategy for
identifying staff lines is to use vertical scan lines
[3, 7, 8, 13]. More recent works present a more or less
sophisticated use of a combination of projection techniques
to improve on the basic approach [1].
Fujinaga [5] incorporates a set of image processing tech-
niques in the algorithm, including run-length coding (RLC),
connected-component analysis, and projections. After ap-
plying the RLC to find the thickness of staff lines and the
space between the staff lines, any vertical black runs that are
more than twice the staff line height are removed from the
original. Then, the connected components are scanned in
order to eliminate any component whose width is less than
the staffspace height. After a global deskewing, taller com-
ponents, such as slurs and dynamic wedges are removed.
Other techniques for finding stave lines include the appli-
cation of mathematical morphology algorithms [11, 15, 6],
rule-based classification of thin horizontal line segments
[10], and line tracing [12, 14].
In spite of the variety of methods available, they all suffer
from some limitations. In particular, lines with some curva-
ture or discontinuities are inadequately resolved. The dash
detector [9] is one of few works that try to handle discon-
tinuities. The dash detector is an algorithm that searches
the image, pixel by pixel, finding black pixel regions that
it classifies as stains or dashes. Then, it tries to unite the
dashes to construct lines.
2. A Shortest Path Approach for Staff Line De-
tection
A staff line can be considered as a path from the left
side of the music score to the right side. As staff lines are
almost the only extensive black objects on the music score,
the path we are looking for is the shortest path between the
two margins if paths (almost) entirely through black pixels
are favoured.
In the work to be detailed, the image grid is considered
as a graph with pixels as nodes and edges connecting neigh-
bouring pixels. Therefore, some graph concepts are in or-
der.
2.1. Definitions and Notation
A graph $G = (V, A)$ is composed of two sets $V$ and $A$. $V$ is the set of nodes, and $A$ the set of arcs $(p, q)$, $p, q \in V$. The graph is weighted if a weight $w(p, q)$ is associated to each arc, and it is called a digraph if the arcs are directed,
i.e., $(p, q) \neq (q, p)$. A path from $p_1$ to $p_n$ is a list of unique
nodes $p_1, p_2, \ldots, p_n$, with $(p_i, p_{i+1}) \in A$. The path cost is the sum of each arc weight in the path.
In graph theory, the shortest-path problem seeks the
shortest path connecting two nodes; efficient algorithms are
available to solve this problem, such as the well-known Di-
jkstra algorithm [4].
2.2. General Framework Description
As mentioned before, a staff line corresponds to a path
from (almost) the left margin of the image to (almost) the
right side of the image, (almost) always through black pix-
els.
Starting by modelling the edge image as a graph, a node is
matched to each pixel. Two nodes are connected with an arc
on the graph iff the corresponding pixels are neighbours
(8-connected neighbourhoods) on the image. The weight of
each arc is a function of the pixel values and the pixels'
relative positions (see Figure 1):

$$w_i = \begin{cases} f(p, q_i) & \text{if } q_i \in \text{4-connected neighbourhood of } p \\ h(p, q_i) & \text{if } q_i \notin \text{4-connected neighbourhood of } p \end{cases}$$

Figure 1. Arc weight between two pixels.

In this work we set $h(\cdot, \cdot) = \sqrt{2}\, f(\cdot, \cdot)$. The objective is then to design the weights $w_i$ such that
the weights are low for the pixels of interest (black) and
high otherwise (this will lead to small weighted distances
from left to right for paths through staff lines and large for
the rest). Therefore this setting will favour paths through
black pixels, as required. We set

$$f(p, q) = \begin{cases} c_1 & \text{if } p \text{ or } q \text{ are black pixels} \\ c_2 & \text{otherwise} \end{cases}$$
with c2 > c1. In this work c1 and c2 were experimentally
determined as 2 and 6, respectively. Note that c1 must be set
greater than zero in order to also favour the smallest path,
when more than one exists through black pixels only. Fi-
nally, the solution to the shortest path problem will yield
the intended staff line.
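The weight definition above can be written as a small C++ function. This is a sketch under stated assumptions: the name `arcWeight`, the constants `kC1` and `kC2`, and the encoding of the neighbour as a row and column offset are hypothetical; only the values 2 and 6 and the factor of the square root of 2 for diagonal arcs come from the text.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

constexpr double kC1 = 2.0;  // cost when p or q is black (value from the text)
constexpr double kC2 = 6.0;  // cost when both pixels are white

// Weight of the arc between pixel p and its 8-connected neighbour q,
// where (dRow, dCol) is the offset of q relative to p.
double arcWeight(bool pBlack, bool qBlack, int dRow, int dCol) {
    assert(std::abs(dRow) <= 1 && std::abs(dCol) <= 1);
    assert(dRow != 0 || dCol != 0);  // q must differ from p
    const double f = (pBlack || qBlack) ? kC1 : kC2;
    const bool diagonal = (dRow != 0 && dCol != 0);  // outside the 4-neighbourhood
    return diagonal ? std::sqrt(2.0) * f : f;        // h = sqrt(2) * f
}
```

Because c1 is strictly positive, a path running entirely through black pixels still pays for every arc it uses, so among all-black paths the one with fewer and straighter steps is preferred.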
2.3. Illustrative Examples
In order to get a better intuition of the general result, it is
instructive to first explore some basic examples. Consider
a music score with a single staff line as represented in
Figure 2(a). In Figure 2(b) the shortest paths between starting
points $s_i$ on the left margin and ending points $e_i$ on the right
margin, at the same row, are traced. All paths get attracted
by the staff line. A similar condition is verified when we
have more than one staff line (see Figure 3). Now paths
get attracted to the nearest staff line. Half of the paths
in-between two consecutive staff lines go along the top staff
line; the other half follow the bottom staff line.

Figure 2. A first illustrative example. (a) A single staff line. (b) The shortest path between some pairs of points.
The last example, in Figure 4, shows that music symbols
placed on top of staff lines do not interfere with the
detection of the staff lines. Moreover, the example also makes
clear that slightly skewed scores do not pose any problem to
the proposed approach.

Figure 3. A second illustrative example. (a) A pair of staff lines. (b) The shortest path between some pairs of points.

Figure 4. A third illustrative example. (a) Skewed staff lines with music symbols. (b) The shortest path between some pairs of points.
Nonetheless, some issues are visible and need to be con-
veniently addressed. Due to the skew of the staff lines, some
of the shortest paths jump between consecutive staff lines.
The text on the top of the score also constitutes a false low
weight path between the two margins, inducing some paths
to go through it. Finally, even when a staff line is correctly
followed by the path, the initial and the final parts of the
path should be ignored.
2.4. Proposed Algorithm
To detect the staff lines, the proposed overall algorithm
starts by estimating the staffspace height. This length will
be used as a reference length for the subsequent operations.
Robust estimators of the staffspace height are already in
common use. The technique starts by computing the vertical
run-length representation of the image. If a bit-mapped
page of music is converted to vertical run-length coding,
the most common black run represents the staff line height
and the most common white run represents the staffspace
height [5].
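The run-length estimate described above can be sketched as follows. This is an illustration only, assuming a binarized image stored as rows of booleans with `true` meaning black; the `Image` alias and the name `estimateStaffMetrics` are hypothetical and do not come from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Image = std::vector<std::vector<bool>>;  // image[row][col], true = black

// Returns {staff line height, staffspace height}, estimated as the most
// common black and white vertical run lengths over all columns.
std::pair<int, int> estimateStaffMetrics(const Image& img) {
    assert(!img.empty() && !img[0].empty());
    std::map<int, int> blackHist, whiteHist;  // run length -> occurrences
    const std::size_t rows = img.size(), cols = img[0].size();
    for (std::size_t c = 0; c < cols; ++c) {
        int run = 1;
        for (std::size_t r = 1; r <= rows; ++r) {
            if (r < rows && img[r][c] == img[r - 1][c]) {
                ++run;  // the current run continues
                continue;
            }
            // The run ended at row r - 1: record it under its colour.
            (img[r - 1][c] ? blackHist : whiteHist)[run]++;
            run = 1;
        }
    }
    auto mode = [](const std::map<int, int>& hist) {
        int best = 0, count = 0;
        for (const auto& entry : hist)
            if (entry.second > count) {
                best = entry.first;
                count = entry.second;
            }
        return best;  // 0 when no run of that colour exists
    };
    return std::make_pair(mode(blackHist), mode(whiteHist));
}
```

Taking the mode rather than the mean keeps the estimate stable in the presence of note heads, beams and page margins, which produce runs of many different lengths but rarely the same one repeatedly.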
After the estimation of the staffspace height, the
proposed approach applies the main step of the framework:
for each row of the image, the shortest path between
the leftmost pixel and the rightmost pixel is found, using
the Dijkstra algorithm [4]. Instead of considering
the whole image when computing the shortest path,
only a strip centred on the row of interest is used
(see Figure 5). This constrains the complexity of the
algorithm. In the experiments we have set
STRIP_HEIGHT = STAFFSPACE_HEIGHT.

Figure 5. Vertical strip around the row of interest (total strip height: 2 x STRIP_HEIGHT).
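The main step can be sketched with a textbook Dijkstra search on the pixel grid, restricted to the strip. The code below is a minimal illustration, not the authors' implementation: the `Image` and `Pixel` types, the signature of `findShortestPath` and the reading of the strip as the rows within `stripHeight` of the row of interest are assumptions; the arc weights 2, 6 and the diagonal factor follow Section 2.2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Image = std::vector<std::vector<bool>>;  // image[row][col], true = black

struct Pixel { int row; int col; };

// Shortest path from the leftmost to the rightmost pixel of `row`, searching
// only rows in [row - stripHeight, row + stripHeight] (assumed strip layout).
std::vector<Pixel> findShortestPath(const Image& img, int row, int stripHeight) {
    assert(!img.empty() && !img[0].empty());
    const int rows = static_cast<int>(img.size());
    const int cols = static_cast<int>(img[0].size());
    assert(row >= 0 && row < rows && stripHeight >= 0);
    const int top = std::max(0, row - stripHeight);
    const int bottom = std::min(rows - 1, row + stripHeight);
    const int height = bottom - top + 1;
    auto id = [top, cols](int r, int c) { return (r - top) * cols + c; };

    std::vector<double> dist(height * cols, std::numeric_limits<double>::infinity());
    std::vector<int> prev(height * cols, -1);
    using Entry = std::pair<double, int>;  // (distance, node id)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;

    const int source = id(row, 0);
    const int target = id(row, cols - 1);
    dist[source] = 0.0;
    open.push(Entry(0.0, source));
    while (!open.empty()) {
        const Entry cur = open.top();
        open.pop();
        const int u = cur.second;
        if (cur.first > dist[u]) continue;  // stale queue entry
        if (u == target) break;             // target settled
        const int r = u / cols + top, c = u % cols;
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                const int nr = r + dr, nc = c + dc;
                if ((dr == 0 && dc == 0) || nr < top || nr > bottom ||
                    nc < 0 || nc >= cols)
                    continue;
                const double f = (img[r][c] || img[nr][nc]) ? 2.0 : 6.0;
                const double w = (dr != 0 && dc != 0) ? std::sqrt(2.0) * f : f;
                const int v = id(nr, nc);
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    prev[v] = u;
                    open.push(Entry(dist[v], v));
                }
            }
    }
    std::vector<Pixel> path;  // rebuilt from target back to source
    for (int v = target; v != -1; v = prev[v])
        path.push_back(Pixel{v / cols + top, v % cols});
    std::reverse(path.begin(), path.end());
    return path;
}
```

On a synthetic image with a single horizontal black line, a query on the line's own row returns the line itself, and a query one row above is pulled onto the line after the first column, which reproduces the attraction effect of Figure 2.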
Now, because the search for the shortest path is
constrained to stay in a vertical strip (or because the nearest
staff line is far enough from the current row), the shortest
path may not follow a staff line. Therefore, the main step is
followed by a sequence of (arguably) sensible rules, aimed
at discarding false staff lines. First, all paths without a
percentage of black pixels above a threshold BLACK_PERC
are discarded (a threshold of BLACK_PERC = 0.75 was used in the experiments).
Next, each retained path is trimmed at the beginning and
at the end. As visible in the previous examples (refer to
Figure 4), before meeting with a staff line, a path travels
through a sequence of white pixels. Likewise, after the end
of the staff line, the path goes again through a sequence of
white pixels until it meets the right margin of the image. In
order to ignore all these white pixels, the initial pixels of
the path are discarded until a run of at least BLACK_RUN
black pixels is found in the path. In the same way, all
pixels of the path after the last occurrence of a run of at
least BLACK_RUN black pixels are discarded. A threshold
of BLACK_RUN = 2 × STAFFSPACE_HEIGHT was used in the experiments.
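The black-percentage test and the trimming rule can be sketched as below. The representation of a path as the sequence of colours of its pixels, and the names `blackPercentage` and `trimRange`, are assumptions made for the illustration; a full implementation would apply the returned index range to the list of path pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Fraction of black pixels along a path; colours[i] is true when the
// i-th pixel of the path is black.
double blackPercentage(const std::vector<bool>& colours) {
    assert(!colours.empty());
    std::size_t black = 0;
    for (bool isBlack : colours) black += isBlack ? 1 : 0;
    return static_cast<double>(black) / static_cast<double>(colours.size());
}

// Half-open index range [first, last) of the path kept after trimming:
// it starts at the first run of at least blackRun black pixels and ends
// with the last such run. Returns {0, 0} when no such run exists.
std::pair<std::size_t, std::size_t> trimRange(const std::vector<bool>& colours,
                                              std::size_t blackRun) {
    assert(blackRun >= 1);
    std::size_t first = 0, last = 0, run = 0;
    bool found = false;
    for (std::size_t i = 0; i < colours.size(); ++i) {
        run = colours[i] ? run + 1 : 0;
        if (run >= blackRun) {
            if (!found) {
                first = i + 1 - run;  // start of the first long run
                found = true;
            }
            last = i + 1;             // one past the end of the latest long run
        }
    }
    return std::make_pair(first, last);
}
```

Short black runs inside the kept range, such as those left by a broken staff line, are preserved; only the leading and trailing stretches before the first and after the last long run are dropped.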
Finally, the proposed overall algorithm ends with the
validation of the preserved paths. Because the staff lines are
expected to be straight lines, the linear correlation
coefficient of the x and y components of the pixels of the path
is used to reject paths that do not meet the linearity
criterion, that is, paths with a correlation coefficient below
a threshold CORRCOEF. The threshold used here is conservative,
since for manuscript lines the (perfect) linearity is not
guaranteed. This precaution is needed because staves on a page
are often distorted in different ways.
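A possible form of the linearity test is sketched below using the Pearson correlation coefficient of the column (x) and row (y) coordinates of the path. Two choices here are assumptions that the text does not state: the absolute value is taken, so that lines tilted in either direction are treated alike, and a path with zero variance in one coordinate (for instance a perfectly horizontal line, for which the coefficient is undefined) is treated as perfectly linear. The `Pixel` type and the function name are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pixel { int row; int col; };

// Absolute Pearson correlation between the x (col) and y (row) coordinates
// of a path. Returns 1.0 for degenerate paths with constant x or constant y
// (an assumption of this sketch, see the lead-in).
double corrcoef(const std::vector<Pixel>& path) {
    assert(path.size() >= 2);
    const double n = static_cast<double>(path.size());
    double meanX = 0.0, meanY = 0.0;
    for (const Pixel& p : path) {
        meanX += p.col;
        meanY += p.row;
    }
    meanX /= n;
    meanY /= n;
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (const Pixel& p : path) {
        const double dx = p.col - meanX, dy = p.row - meanY;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return 1.0;  // constant coordinate
    return std::fabs(sxy) / std::sqrt(sxx * syy);
}
```

A path that climbs and then falls back symmetrically has a coefficient of zero, so it would be rejected by any positive threshold, which is the behaviour wanted for paths that jump between staff lines and return.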
2.4.1 Summary
The shortest path approach has some advantages over stan-
dard algorithms presented in the literature for the detection
of staff lines:
• While almost all current approaches provide only isolated
pieces of a staff line, the approach proposed
here outputs a complete line, with starting and ending
points.
• The staff lines are rarely straight and horizontal, and
are not parallel to each other. For example, some staves
may be tilted one way or another on the same page or
they may be curved. While current approaches apply
a chain of heuristics to correct these undesired
imperfections, the shortest path algorithm is naturally robust
to these challenging conditions.
• The proposed approach is robust to broken staff lines
(due to low-quality digitalization or low-quality
originals) or staff lines as thin as one pixel. Missing pieces
are automatically 'completed' by the algorithm.
For reference, the overall algorithm is summarized in
Listing 1 with pseudo-code.
StaffLine_Detection(IMAGE, CORRCOEF)
{
    STAFFSPACE_HEIGHT = computeStaffSpaceHeight();
    BLACK_PERC = 0.75;
    BLACK_RUN = STAFFSPACE_HEIGHT;
    STRIP_HEIGHT = STAFFSPACE_HEIGHT;
    for (int row = 0; row < ImageHeight; row++)
    {
        Point2D start(0, row);
        Point2D end(ImageWidth-1, row);
        Path path = findShortestPath(start, end);
        if (blackPercentage(path) < BLACK_PERC)
            continue;  // too few black pixels
        path = trimPath(path);
        if (corrcoef(path) < CORRCOEF)
            continue;  // fails the linearity test
        addPathToSetOfStaffLines(path);
    }
}
Listing 1: Shortest Path StaffLine Detection.
3. Results
We now present additional, complete examples of the
proposed framework for automatic staff line detection. The
code for all the examples in this paper has been written in
C++ (the source code is available upon request to the
authors). For comparison purposes, the method by Fujinaga
[5] was also implemented. Three types of scores were used
in the experiments: machine printed scores (see Figure 6);
handwritten music scores sitting on regular staff lines (see
Figure 7 and Figure 8); and irregular handwritten scores
(see Figure 9). Before applying the staff line detection
algorithms, each image was binarized.
Analysing the results for Fujinaga’s method, we observe
that some information is lost, information that could be im-
portant for the recognition tasks which follow. The places
where musical symbols are present create gaps on the de-
tected staff lines; in some places the gaps are larger than the
space occupied by those symbols—see Figure 7(c). This
is mainly due to the low quality of the lines. The shortest
path approach is able to overcome these conditions because
it follows a continuous path connecting both line ends. The
robustness of the proposed approach to possible defects in
staff lines (curvature, discontinuities) is apparent in the ex-
ample of Figure 7(c).
Occasionally, the shortest path algorithm retains paths
that do not go completely through a staff line. This
problem is prone to happen when there is a high density of
beamed notes (e.g. quavers and semiquavers; see Figure 9(b)),
which makes the path go through the beams, or when
the curvature of the lines is such that the path jumps
between consecutive lines. This condition can be overcome
by automatically learning the best threshold on the
linearity test of the path for each image being processed, or by
incorporating additional rules to validate the path. Both
approaches are currently being investigated. Nevertheless,
considering handwritten scores, we can see that the
proposed algorithm has good and promising results.
4. Conclusion
The first challenge faced by an OMR system is staff line
detection. This first task dictates the possibility of success
for the recognition of the music score. And when it comes
to handwritten music scores, existing solutions are far from
presenting satisfactory results.
In this paper, a new algorithm for the automatic
detection of staff lines in music scores was proposed. The
shortest path formulation brings a new and promising
approach to the staff line detection task.
The technique is adaptable to a wide range of image
conditions, thanks to its intrinsic robustness to (slightly) skewed
images, discontinuities, and curved staff lines.
There are several directions of research to pursue with
the framework introduced in this paper. We are currently in-
vestigating the computational feasibility of, instead of find-
ing the shortest path between a starting point on the left mar-
gin and an ending point on the right margin at the same row,
finding the shortest path between a starting point on the left
margin and the whole right margin. That will allow coping
with severely skewed scores. Another line of investigation
is the automatic learning of the parameters of the algorithm.
Although most of the parameter values are scaled by the
staffspace height, the threshold for the test of linearity is
still manually set. This improvement could make the
algorithm work with a wide range of types of scores, without
requiring any parameter tuning by the user. Finally, beyond
the spatial continuity of the path, which is enforced by the
shortest path algorithm, we are also pursuing the continuity
of the direction of the path. As staff lines do not suffer
abrupt changes of direction, even when manually drawn, it
seems sensible to impose this additional constraint.
Acknowledgments
This work was partially funded by Fundação para a
Ciência e a Tecnologia (FCT) - Portugal through project
PTDC/EIA/71225/2006.
References
[1] D. Bainbridge. Extensible Optical Music Recognition. PhD
thesis, Department of Computer Science, University of Can-
terbury, Christchurch, NZ, 1997.
[2] D. Blostein and H. S. Baird. A critical survey of music image
analysis. In H. S. Baird, H. Bunke, and K. Yamamoto, editors, Structured
Document Image Analysis, pages 405–434. Springer-Verlag,
Heidelberg, 1992.
[3] N. P. Carter. Automatic Recognition of Printed Music in the
Context of Electronic Publishing. PhD thesis, Departments
of Physics and Music, University of Surrey, 1989.
[4] E. W. Dijkstra. A note on two problems in connexion with
graphs. Numerische Mathematik, 1:269–271, 1959.
[5] I. Fujinaga. Staff detection and removal. In S. George, edi-
tor, Visual Perception of Music Notation: On-Line and Off-
Line Recognition, pages 1–39. Idea Group Inc., 2004.
[6] I. Gawedzki. Optical music scores recognition. Technical
report, 2002.
[7] S. Glass. Optical music recognition. B.Sc thesis, 1989.
[8] H. Kato and S. Inokuchi. A recognition system for printed
piano music using musical knowledge and constraints. In
Proceedings of the International Association for Pattern
Recognition Workshop on Syntactic and Structural Pattern
Recognition, pages 231–248, 1990.
[9] I. Leplumey, J. Camillerapp, and G. Lorette. A robust de-
tector for music staves. In Proceedings of the International
Conference on Document Analysis and Recognition, pages
902–905, 1993.
[10] J. V. Mahoney. Automatic analysis of music score images.
B.Sc thesis, 1982.
[11] B. R. Modayur, V. Ramesh, R. M. Haralick, and L. G.
Shapiro. Muser: A prototype musical score recognition sys-
tem using mathematical morphology. Machine Vision and
Applications, 6:140–150, 1993.
[12] D. Prerau. Computer pattern recognition of standard en-
graved music notation. PhD thesis, Department of Computer
Science and Engineering, MIT, 1970.
(a) Original score. (b) Staff lines detected by our algorithm. (c) Staff lines detected by Fujinaga’s method [5].
Figure 6. Results for a machine-printed score.
(a) Original score. (b) Staff lines detected by our algorithm. (c) Staff lines detected by Fujinaga’s method [5].
Figure 7. Results for a handwritten score.
[13] T. Reed. Optical music recognition. Master’s thesis, Depart-
ment of Computer Science, University of Calgary, Canada,
1995.
[14] J. W. Roach and J. E. Tatem. Using domain knowledge in
low-level visual processing to interpret handwritten music:
an experiment. Pattern Recognition, 21(1):33–44, 1988.
[15] M. Roth. OMR-optical music recognition. Master’s thesis,
Swiss Federal Institute of Technology, 1992.
(a) Original skewed score. (b) Staff lines detected by our algorithm. (c) Staff lines detected by Fujinaga’s method [5].
Figure 8. Results for a skewed handwritten score.
(a) Original score. (b) Staff lines detected by our algorithm. (c) Staff lines detected by Fujinaga’s method [5].
Figure 9. Results for a handwritten score.
STAFF LINE DETECTION AND REMOVAL WITH STABLE PATHS
Artur Capela, Ana Rebelo
INESC Porto, Campus da FEUP, Rua Dr. Roberto Frias 378, 4200-465 Porto, Portugal
Jaime S. Cardoso
INESC Porto, Faculdade de Engenharia, Universidade do Porto, Portugal
Carlos Guedes
INESC Porto, Escola Superior de Música e Artes do Espectáculo, Portugal
Keywords: Music, optical character recognition, document image processing, image analysis.
Abstract: Many music works produced in the past are currently available only as original manuscripts or as photocopies. Preserving them entails their digitalization and consequent accessibility in a machine-readable format, which encourages browsing, retrieval, search and analysis while providing generalized access to the digital material. Carrying out this task manually is very time consuming and error prone. While optical music recognition (OMR) systems usually perform well on printed scores, the processing of handwritten music by computers remains below expectations. One of the fundamental stages in carrying out this task is the detection and subsequent removal of staff lines. In this paper we integrate a general-purpose, knowledge-free method for the automatic detection of staff lines, based on stable paths, into a recently developed staff line removal toolkit. Lines affected by curvature, discontinuities, and inclination are robustly detected. We have also developed a staff removal algorithm by adapting an existing line removal approach to use the stable path algorithm at the detection stage. Experimental results show that the proposed technique outperforms well-established algorithms. The developed algorithm will now be integrated in a web-based system providing seamless access to browsing, retrieval, search and analysis of submitted scores.
1 INTRODUCTION
The Universal Declaration on Cultural Diversity adopted by the General Conference of UNESCO in 2001 asserts that cultural diversity is as necessary for humankind as biodiversity is for nature, and that policies to promote and protect cultural diversity thus are an integral part of sustainable development. Music being a pivotal part of our cultural heritage, its preservation, in all of its forms, must be pursued. Frequently, the preservation of many music works entails their digitalization and consequent accessibility in a format that encourages browsing, analysis and retrieval.
There is a vast amount of invaluable paper-based heritage, including printed and handwritten music scores, which is deteriorating over time due to the natural decay of paper and chemical reactions (i.e., between written ink and paper). Various efforts have been focused on this issue in order to preserve the record of our heritage. Digitisation has been commonly used as a possible tool for preservation. Although a digital copy may not conserve the original document, it can preserve the most important part: its data. It also has the advantages of easy duplication, distribution, and digital processing. Nevertheless, the output of the digitalization process is not amenable to further analysis or semantic search operations. Thus, an Optical Music Recognition (OMR) process is needed. However, the manual process required to recognize handwritten musical symbols in scores and to put them in relationship with the spine structure of the score is very time consuming. This justifies the research around reliable automatic OMR algorithms, as current solutions are still below expectations.
As a concrete example, Portugal has a notorious lack of music publishing from virtually all eras of its musical history. However, whereas most of the known original music manuscripts from before the twentieth century are kept at the National Library Archive in Lisbon, there is virtually no national repository for the Portuguese music from the twentieth century. Although there are recent efforts to catalogue and preserve in digital form the Portuguese music from the late twentieth century, notably the Music Information Center (MIC, 2008) and the section on musical heritage of the Institute of the Arts website (IOA, 2008), most of the music pre-dating computer notation software was never published and still exists as manuscripts or photocopies spread out all over the country in inconspicuous places. The risk of irreversibly losing this rich cultural heritage is thus a reality.
1.1 OMR System
The project “Optical recognition system for handwritten music scores” initiated in 2007 by INESC Porto and ESMAE is the point of departure for creating a web-based system of music manuscripts of Portuguese composers from the twentieth century. This database will provide generalized access to a wide corpus of unpublished handwritten music encoded in MusicXML, which can be accessed remotely via the Internet. The database will not only centralize as much information as possible but will also serve to preserve this corpus in a way that is easily accessible for browsing, analysis, and ultimately, for performing this repertoire, therefore helping to keep the Portuguese music alive (Capela et al., 2008). Although the aim of this project is Portuguese music, it is equally valid for all printed and handwritten music scores that need to be preserved from all around the world.
The ambitious goal of providing generalized access to handwritten scores that have never been published has been severely hampered by the current state of the art of handwritten music recognition. There are currently various commercial OMR software solutions (Capella software¹, SharpEye Music Reader², OMeR³) and a few open source solutions (AOMR2⁴, OpenOMR⁵, Audiveris⁶), but they are all offline standalone applications. The existing online archives of music scores (Lester Levi Collection, 2008; Classical Sheet Music Collection, 2008; Mutopia Collection, 2008) usually provide them in formats that are inadequate for retrieval or automatic analysis, usually only as the scanned score image. These online archives are mere standard websites, without facilities for optical recognition, editing and searching through the scores' musical content. The creation of an OMR system, integrating optical recognition, storage, search, browsing and downloading capabilities, while keeping the scores in their original format along with their digital counterpart, would therefore be extremely beneficial. An integrated score editor would be provided in order to view and edit the submitted music scores.
¹ http://www.capella-software.com/capscan.htm
² http://www.visiv.co.uk
³ http://www.myriad-online.com/en/products/omer.htm
⁴ http://www.bzzt.net/~arnouten/wiki/index.php/Gamera#AOMR2:_omr_toolkit
⁵ http://sourceforge.net/projects/openomr
⁶ http://audiveris.dev.java.net
In our previous work on this project we have presented a complete OMR system solution, OMRSYS (Capela et al., 2008), comprising a database-driven web application with one or more OMR applications integrated in the proposed system. The proposed architecture successfully meets the stated objectives. At the end of our project we plan on developing a fully functional system according to the specified architecture and integrating a complete OMR package.
1.2 Detection and Removal of Staff Lines
Staff line detection and removal are the first fundamental stages in the OMR process, with subsequent processes relying heavily on their performance. The reasons for detecting and removing the staff lines lie in the need to isolate the musical symbols for a more efficient and correct detection of each symbol present in the score. Although their primary application is as a preprocessing step in the recognition of music notation, the line detection problem also occurs in different contexts (e.g., the recognition of bank transfer forms).
The detection of staves is complicated for a variety of reasons. Handwritten staff lines are rarely straight and horizontal, and are not parallel to each other. For example, some staves may be tilted one way or another on the same page, or they may be curved. These scores tend to be rather irregular and determined by a person's own writing style. Moreover, if we consider that most of these works are old, the quality of the paper on which they are written might have degraded throughout the years, making it a lot harder to correctly identify their contents.
In (Cardoso et al., 2008a; Cardoso et al., 2008b) we presented a new and robust staff line detection algorithm based on a stable paths approach. The proposed paradigm uses the image as a graph, where the staff lines result as connected paths between the two lateral margins of the image. A staff line can be considered as a connected path from the left side to the right side of the music score. As staff lines are almost the only extensive black objects on the music score, the path to look for is the shortest path between the two margins if paths (almost) entirely through black pixels are favoured.
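To make the graph view concrete, the following is a minimal sketch and not the implementation from (Cardoso et al., 2008a; Cardoso et al., 2008b). It assumes a binary image given as a list of rows (1 for black, 0 for white), paths that advance one column per step to the same or an adjacent row, and illustrative pixel costs (black_cost, white_cost) that make black pixels cheap. The function name and the cost values are hypothetical choices for illustration only.

```python
def shortest_path_across(image, start_row, black_cost=1, white_cost=10):
    """Cheapest left-to-right path starting at (start_row, column 0).

    image: list of rows, 1 = black pixel, 0 = white pixel.
    Returns (total_cost, rows), where rows[c] is the path row at column c.
    The costs are illustrative assumptions, not values from the paper.
    """
    n_rows, n_cols = len(image), len(image[0])
    inf = float("inf")
    cost = [[inf] * n_cols for _ in range(n_rows)]
    back = [[None] * n_cols for _ in range(n_rows)]

    def pixel_cost(r, c):
        return black_cost if image[r][c] else white_cost

    cost[start_row][0] = pixel_cost(start_row, 0)
    # Column-by-column dynamic programming: each pixel is reached from
    # one of its three left neighbours (up-left, left, down-left).
    for c in range(1, n_cols):
        for r in range(n_rows):
            for prev in (r - 1, r, r + 1):
                if 0 <= prev < n_rows:
                    candidate = cost[prev][c - 1] + pixel_cost(r, c)
                    if candidate < cost[r][c]:
                        cost[r][c] = candidate
                        back[r][c] = prev

    # The path may end on any row of the right margin.
    end_row = min(range(n_rows), key=lambda r: cost[r][n_cols - 1])
    rows = [end_row]
    for c in range(n_cols - 1, 0, -1):
        rows.append(back[rows[-1]][c])
    rows.reverse()
    return cost[end_row][n_cols - 1], rows
```

With these costs, a path that follows a staff line through a short gap or a one-pixel vertical drift stays much cheaper than any path crossing white background, which is the property the paragraph above relies on.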
In this paper we present our recent work and results focusing on the implementation of the Stable Paths algorithm as a C++ plugin for the MusicStaves Toolkit (Dalitz et al., 2008; MusicStaves Toolkit, 2008) (based on the Gamera Framework (MacMillan et al., 2002; Gamera Framework, 2008)), as well as on the removal stage by using our detection algorithm in the first stage. We have also adapted a removal algorithm based on the LineTrack Height approach proposed in (Dalitz et al., 2008), which we will present in section 3. In section 2 the Gamera and MusicStaves Toolkit are presented, together with our C++ plugin. In section 4, both our proposed detection and removal algorithms are evaluated experimentally using a well-known dataset of music scores. Finally, conclusions are drawn and future work is outlined in section 5.
2 STABLE PATHS INTEGRATION
In this section we present the platform in which our detection algorithm was integrated for testing and validation. The platform comprises its core, the Gamera Framework (MacMillan et al., 2002; Gamera Framework, 2008), and a toolkit for the evaluation of detection and removal algorithms, the MusicStaves Toolkit (Dalitz et al., 2008; MusicStaves Toolkit, 2008). This toolkit is a set of Gamera plugins aiming to support the development and testing of staff line detection and removal algorithms for music scores, by extending the Gamera functionality.
After we describe each part constituting this platform, we present the implementation of our staff line detection algorithm as a C++ plugin on the MusicStaves Toolkit. Finally, we present the integration of our detection algorithm on some removal algorithms in the MusicStaves toolkit. We have integrated our algorithm in those removal algorithms where the detection is processed separately from the removal operation, replacing the Dalitz algorithm (Dalitz et al., 2008) as the detection stage.
2.1 Gamera Framework
Gamera (MacMillan et al., 2002; Gamera Framework, 2008) is a portable and open source framework that lets domain experts create structured document analysis applications. The name Gamera is an acronym for “Generalized Algorithms and Methods for Enhancement and Restoration of Archives”. It combines a programming library with graphical tools for the interactive training and development of recognition systems. Instead of responding to various requirements with a monolithic application, the framework aims to be a tool for creating custom applications from the domain expert's knowledge. It aims at providing an efficient test and refinement development cycle. The programming language at its basis is the high-level language Python, although it has many extensions written in C++ to carry out low-level image processing. Due to its nature, Python makes the code writing process more agile and facilitates the use of scripting, which makes Gamera an interactive and batchable framework. Besides the large number of extensions deployed with this framework, it is also possible to customize and extend it with plugins or toolkits, written in Python and C++.
The Gamera framework is modular and organized in a series of horizontal layers, which can be seen in Figure 1.
Figure 1: Gamera Architecture.
Gamera follows a modular plugin architecture. It is made of modules (plugins), written in both Python and C++, integrated in a high-level scripting environment. Each module executes a task in the recognition process. The framework maintains a toolbox design approach, i.e., a user has access to a large set of tools for the optical recognition stages.
2.2 MusicStaves Toolkit
MusicStaves (Dalitz et al., 2008; MusicStaves Toolkit, 2008) is a Gamera toolkit specific to the development and testing of staff line detection and removal algorithms. This Python toolkit also integrates facilities to create a test set of music scores and to evaluate results with established metrics. As with Gamera, this toolkit is portable and its source code is freely available. In order to use MusicStaves, it must be imported into Gamera, either in the GUI or programmatically.
MusicStaves is structured as seen in Figure 2. The toolkit is composed of a set of main classes where the staff line detection and removal algorithms are implemented, and of a set of plugins in Python and C++. Some plugins are used by the implemented algorithms while others are tools to aid the testing of the algorithms. It is an extendable toolkit in which one can integrate a new staff line detection and/or removal algorithm. Its plugin system follows the Gamera framework rationale and as such the plugins may be written in Python and C++. In order to write new staff line detection or removal algorithms the toolkit provides two interfaces: StaffFinder (detection) and MusicStaves (removal).
Figure 2: MusicStaves Architecture.
A test set of 32 synthetic music scores is also provided by the authors of this toolkit. Moreover, the toolkit allows applying a set of deformations (e.g., rotation, curvature, typeset emulation, white speckles) commonly found in the real world to these perfect scores (see Figure 3). The purpose is to be able to measure the performance of the removal algorithms contained in MusicStaves by using three defined error metrics (Dalitz et al., 2008): Pixel Level, Segmentation Region Level and Staffline Interruptions. However, the same set may be used to evaluate staff line detection algorithms alone by defining adequate error metrics (Cardoso et al., 2008a; Cardoso et al., 2008b). We have restricted the range of the parameters controlling the intensity of the deformations to values considered realistic.
2.3 Integration
We have integrated our recently proposed algorithmfor staff line detection (Cardoso et al., 2008a; Car-doso et al., 2008b) in MusicStaves as a C++ plugin.
Figure 3: Some examples of applied deformations: a) Curvature; b) Rotation; c) White Speckles; d) Staffline Interruptions; e) Staffline y-variation; f) Typeset Emulation. See (Dalitz et al., 2008) for details.
The plugin encompasses the main StaffFinder class in the toolkit root, the respective Python plugin in the Python plugins folder and the algorithm implementation in C++ in the C++ plugins folder. In Figure 4 we present a diagram of the algorithm as integrated in MusicStaves.
The algorithm processing starts with the call to the method find_staves, which receives a binarized image as input. That image is then passed to the C++ function by the plugin's Python code. In the C++ implementation, after the respective staff line detection function is called, the image is converted to the format used internally by our algorithm. After the whole detection task is complete, it returns the staff line skeletons in the MusicStaves format. After receiving the skeleton list, the Python class fills the structure self.linelist with the obtained values.
Figure 4: Overall view.
Besides integrating our Stable Paths Approach algorithm into the MusicStaves toolkit as a C++ plugin, we have also integrated the algorithm with staff removal algorithms in order to evaluate the improvements over the original results. However, this is not a standard integration, as the toolkit does not provide the means for this kind of integration. It was coded in the staff removal algorithms by adding the possibility to choose the staff line detection algorithm through a parameter value. A diagram illustrating this integration can be seen in Figure 5. Some removal algorithms present in MusicStaves detect the staff lines along with the removal process. From those, we have used the one with the best results in general (Dalitz et al., 2008), Skeleton, for comparison purposes.
Figure 5: Stable Paths integration with staff removal algorithms.
3 STAFF REMOVAL ALGORITHMS
In the process of recognizing music scores, the staff removal algorithm runs after the staff line detection stage takes place. In our current work the removal algorithm is based on the LineTrack Height algorithm presented in (Randriamahefa et al., 1993). The goal of this method is to track the staff line positions obtained by a detection algorithm and remove vertical runs of black pixels whose length is lower than a specified threshold, which was experimentally set at 2*staffHeight.
As music scores may suffer from deformations, the staff lines may have discontinuities, or be curved or inclined. These problems influence the correct detection of the lines contained in the score to be recognized. In particular, the positions of the staff lines obtained by a staff line detection algorithm may pass slightly above or below the real staff line positions. Therefore, if we find a white pixel while tracking a staff line, we search vertically for the closest black pixel. If that distance is lower than a specified tolerance (experimentally chosen as 1+ceil(staffHeight/3.0)), we move the reference position of the staff line to the position of the black pixel found.
In (Dalitz et al., 2008) a new method, Skeleton, is presented, which uses the skeleton information but performs the staff removal on the original
Algorithm 1 Staff Removal Algorithm.
procedure STAFFLINEREMOVAL(IMAGE, STAVES)
    threshold = 2 * staffHeight;
    tolerance = 1 + ceil(staffHeight/3.0);
    IMAGE_REMOVE = copy(IMAGE);
    for nvalid = 0 to STAVES size do
        Point2D staff = validStaves[nvalid];
        for i = 0 to staff size do
            col = staff[i].x;
            refRow = staff[i].y;
            row = refRow;
            pel = valuePixel(IMAGE, IMAGE_REMOVE)
            decrement/increase the reference row until one pixel different from white pixel (dist1/dist2) is found;
            if dist1 ≤ max(1, min(dist2, tolerance)) then
                refRow −= dist1;
            else
                if dist2 ≤ max(1, min(dist1, tolerance)) then
                    refRow += dist2;
                else
                    continue;
                end if
            end if
            Count the number of decrements/increase on the reference row until the black pixel changes to white pixel (run);
            if run ≥ threshold then
                continue;
            end if
            remove the vertical black sequences on the IMAGE;
        end for
    end for
end procedure
image instead of the skeleton. The method relies on the fact that symbols on the staff lines lead to junction points or corner points in the skeleton.
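As an illustration of Algorithm 1, the following Python sketch processes one detected staff line. It is an interpretation under stated assumptions, not the C++ plugin code: the image is a list of rows with 1 for black and 0 for white, a staff line is a list of (col, row) points, dist1 and dist2 are read as the upward and downward distances to the nearest black pixel (zero when the reference pixel is already black), and the function names are hypothetical.

```python
import math


def _distance_to_black(image, col, row, step):
    """Distance from (row, col) to the nearest black pixel moving by step
    (-1 = up, +1 = down); math.inf if the image border is reached first."""
    dist = 0
    while 0 <= row < len(image):
        if image[row][col]:
            return dist
        row += step
        dist += 1
    return math.inf


def remove_staff_line(image, staff, staff_height):
    """Return a copy of image with short vertical runs along staff removed.

    image: list of rows (1 = black, 0 = white); staff: list of (col, row)
    points from a detection stage; staff_height: staff line thickness.
    """
    threshold = 2 * staff_height                     # runs this long are kept
    tolerance = 1 + math.ceil(staff_height / 3.0)    # max vertical correction
    n_rows = len(image)
    cleaned = [row[:] for row in image]

    for col, ref_row in staff:
        dist_up = _distance_to_black(image, col, ref_row, -1)
        dist_down = _distance_to_black(image, col, ref_row, +1)
        # Snap the reference row to the closest black pixel within tolerance.
        if dist_up <= max(1, min(dist_down, tolerance)):
            ref_row -= dist_up
        elif dist_down <= max(1, min(dist_up, tolerance)):
            ref_row += dist_down
        else:
            continue  # no staff pixel near the detected position

        # Measure the vertical black run passing through the reference row.
        top = ref_row
        while top > 0 and image[top - 1][col]:
            top -= 1
        bottom = ref_row
        while bottom < n_rows - 1 and image[bottom + 1][col]:
            bottom += 1
        if bottom - top + 1 >= threshold:
            continue  # long run: a symbol crosses the line here, keep it

        for r in range(top, bottom + 1):
            cleaned[r][col] = 0
    return cleaned
```

For example, with staff_height = 2 the threshold is 4 and the tolerance is 2, so a two-pixel-thick line is erased even where the detected position is one row off, while a note stem crossing the line (a run longer than 4 pixels) is left untouched.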
4 RESULTS
Although the evaluation of new staff detection algorithms may be done by visually inspecting the output on a set of scores, as adopted in (Rebelo et al., 2007), our current comparison is supported by quantitative measures. The test set adopted for the quantitative evaluation of the proposed method is the one presented in (Dalitz et al., 2008) and already described. It consists of ideal synthetic scores to which a set of known deformations has been applied; see (Dalitz et al., 2008) for more details. In total we generated 2688 deformed images from 32 perfect scores. In order to conveniently measure the performance of staff line removal algorithms we have adopted two error metrics from (Dalitz et al., 2008): Pixel Level and Segmentation Region Level.
Staff line detection algorithms can be used as a first step in many staff removal algorithms. To understand the potential of our algorithm to leverage the performance of existing staff removal algorithms, we conducted a series of experiments, comparing the original version of a staff removal algorithm with a modified version of it that makes use of the Stable Paths algorithm at the staff line detection step. The quantitative comparison of the different algorithms is totally in line with the comparison presented in (Dalitz et al., 2008).
With respect to the considered distortions, regarding the detection stage, the Stable Paths based approach outperforms the Dalitz algorithm. In Figure 6 we present our results for the removal algorithms: LineTrack Height (with Dalitz and Stable Paths), Skeleton and LineTrack Height Modified (with Stable Paths). We chose the methods that present the best results in (Dalitz et al., 2008), implementing our own removal algorithm with LineTrack Height as a basis. In general we verify that the replacement of the Dalitz method by our Stable Paths Approach algorithm as the staff detection step has improved the final staff line removal results⁷. Additionally, the LineTrack Height Modified algorithm presents an overall better performance than the original LineTrack Height algorithm from (Dalitz et al., 2008). Our staff line detection and removal approaches also outperform the Skeleton method, although it continues to present a competitive performance. We have not integrated the Stable Paths algorithm with the Skeleton algorithm, as the latter performs the line detection along with the removal instead of using two separate stages. All the parameters, both of the Stable Paths detection algorithm and of LineTrack Height Modified, were preliminarily tuned over an independent set of images.
This performance gain is even more noteworthy as the MusicStaves algorithms are receiving as input the correct number of lines per staff. Had this not been the case, the differential between both would have been much larger. In summary, these experiments show the strength of the algorithms presented here. Despite being based on simple and intuitive underlying principles, the performance of the proposed algorithms is quite competitive.
The analysed results have covered the detection and removal accuracy, but a brief word on speed is also in order. Comparing different algorithms for speed is notoriously difficult; we are simultaneously judging mathematical properties and specific implementations. In the experimental study, the current implementation of the Stable Paths algorithm runs almost as fast as the Dalitz algorithm (20% slower). With respect to the removal algorithms, our LineTrack Height version with Stable Paths is significantly faster than the Skeleton algorithm (two times faster). Compared to the original LineTrack Height algorithm with the Dalitz detection algorithm, the runtime difference is not significant. The algorithms were evaluated as available in the Staff Removal Toolkit (Dalitz et al., 2008).
⁷ For the deformations not shown, the stable path is not significantly better than Dalitz.
5 CONCLUSIONS
This paper presented the integration of our robust Stable Paths Approach algorithm (Cardoso et al., 2008a; Cardoso et al., 2008b) in the MusicStaves Toolkit (Dalitz et al., 2008) as a C++ plugin, an improved version of an existing staff line removal algorithm, LineTrack Height (Dalitz et al., 2008), and the results we have obtained in our staff line removal tests. We have integrated our detection algorithm with existing staff line removal algorithms. Our approach successfully deals with the difficulties posed by the symbols superimposed on the staff lines as well as a wide range of image conditions (e.g., discontinuities, curved lines) frequently found in handwritten scores.
The encouraging results lead us now to consider investigating the detection of music symbols, benefiting from the improved staff line detection and removal, and creating a complete OMR application in order to integrate it in our proposed complete OMR solution, OMRSYS (Capela et al., 2008). Thus, our proposed system offers a complete solution for the preservation of our musical heritage. It includes an optical recognition engine integrated with an archiving system and a user-friendly interface for searching, browsing and editing. The digitized scores are stored in MusicXML, a recent and expanding music interchange format designed for notation, analysis, retrieval, and performance applications.
Our proposed algorithms and complete OMR system promote the creation of a full corpus of music documents, promoting its preservation and study. This project will culminate in the creation of a repository of handwritten scores, accessible online. The database will be available for enjoyment, educational and musicological purposes, thus preserving this corpus of music in an unprecedented way.
(a) Curvature. (b) Curvature. (c) White Speckles.
(d) Rotation. (e) Staffline Y-Variation. (f) Staffline Y-Variation.
(g) Staffline Interruptions. (h) Typeset Emulation Part 1. (i) Typeset Emulation Part 2.
Figure 6: Effect of different deformations on the overall staff removal error rates. See (Dalitz et al., 2008) for parameter details.
ACKNOWLEDGEMENTS
This work was partially funded by Fundação para a Ciência e a Tecnologia (FCT) - Portugal through project PTDC/EIA/71225/2006.
REFERENCES
Capela, A., Cardoso, J. S., Rebelo, A., and Guedes, C. (2008). Integrated recognition system for music scores. In International Computer Music Conference (ICMC) (Accepted).
Cardoso, J. S., Capela, A., Rebelo, A., and Guedes, C. (2008a). A connected path approach for staff detection on a music score. In International Conference on Image Processing (ICIP 2008) (Accepted).
Cardoso, J. S., Capela, A., Rebelo, A., and Guedes, C. (2008b). Staff detection with stable paths. Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (Submitted).
Classical Sheet Music Collection (2008). Classical Sheet Music and MIDI Files, http://www.music-scores.com.
Dalitz, C., Droettboom, M., Pranzas, B., and Fujinaga, I. (2008). A comparative study of staff removal algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(5):753–766.
Gamera Framework (2008). Gamera Framework, http://ldp.library.jhu.edu/projects/gamera/.
IOA (2008). Institute of the Arts, http://patrimonio.dgartes.pt/?lang=pt.
Lester Levi Collection (2008). The Lester S. Levi Collection of Sheet Music, http://levysheetmusic.mse.jhu.edu/.
MacMillan, K., Droettboom, M., and Fujinaga, I. (2002). Gamera: Optical music recognition in a new shell. In International Computer Music Conference (ICMC), pages 482–485.
MIC (2008). Music Information Center, http://www.mic.pt.
MusicStaves Toolkit (2008). MusicStaves Toolkit, http://lionel.kr.hsnr.de/~dalitz/data/projekte/stafflines/.
Mutopia Collection (2008). The Mutopia Project, http://sca.uwaterloo.ca/Mutopia.
Randriamahefa, R., Cocquerez, J. P., Fluhr, C., Pépin, F., and Philipp, S. (1993). Printed music recognition. In Proceedings of the 2nd International Conference on Document Analysis and Recognition (ICDAR'93), pages 898–901.
Rebelo, A., Capela, A., da Costa, J. F. P., Guedes, C., Carrapatoso, E., and Cardoso, J. S. (2007). A shortest path approach for staff line detection. In AXMEDIS '07: Proceedings of the Third International Conference on Automated Production of Cross Media Content for Multi-Channel Distribution, pages 79–85, Washington, DC, USA. IEEE Computer Society.
INTEGRATED RECOGNITION SYSTEM FOR MUSIC SCORES
Artur Capela, Jaime S. Cardoso
FEUP and INESC Porto
Ana Rebelo
FCUP and INESC Porto
Carlos Guedes
ESMAE and INESC Porto
ABSTRACT
Many music works produced in the last century still exist only as original manuscripts or as photocopies. Preserving them entails their digitalization and consequent accessibility in an easy-to-manage digital format, which encourages browsing, retrieval, search and analysis while providing generalized access to the digital material. The manual process to carry out this task is very time consuming and error prone. Automatic optical music recognition (OMR) has emerged as a partial solution to this problem. However, the full potential of this process only reveals itself when integrated in a system that provides seamless access to browsing, retrieval, search and analysis. We address this demand by proposing a modular, flexible and scalable framework that fully integrates the abovementioned functionalities. A web-based system to carry out the automatic recognition process, allowing the creation and management of a music corpus while providing generalized access to it, is a unique and innovative approach to the problem. A prototype has been implemented and is being used as a test platform for OMR algorithms.
1. INTRODUCTION
The impact of music on our lives can hardly be overestimated. Music is a pivotal part of our cultural heritage and its preservation, in all of its forms, must be pursued.
Portugal has a notorious lack of music publishing from virtually all eras of its musical history. However, whereas most of the known original music manuscripts from before the twentieth century are kept at the National Library Archive in Lisbon, there is virtually no national repository for Portuguese music from the twentieth century. Although there are recent efforts to catalogue and preserve in digital form the Portuguese music of the late twentieth century (notably the Music Information Center [10] and the section on musical heritage of the Institute of the Arts website [8]), most of the music pre-dating computer notation software was never published and still exists as manuscripts or photocopies spread all over the country in inconspicuous places.
For example, all the music composed by Jorge Peixinho (1940-1995), an internationally renowned composer who epitomized the Portuguese avant-garde in the 1960s and 1970s, was never published in Portugal (few of his scores were published abroad), and almost his entire oeuvre consists of manuscript paper [6]. Almost fourteen years after his death, his music is already catalogued, although not published. Unfortunately, this case is not unique: the situation is common to other great Portuguese composers of the twentieth century. The risk of irreversibly losing this rich cultural heritage is thus a reality.
The project “Optical recognition system for handwritten music scores”, initiated in 2007 by INESC Porto and ESMAE, is the point of departure for creating a web-based system of music manuscripts of Portuguese composers from the twentieth century. This database will provide generalized access to a wide corpus of handwritten unpublished music encoded in MusicXML that can be accessed remotely via the Internet. The database will not only centralize as much information as possible but will also serve to preserve this corpus in a way that is easily accessible for browsing, analysis, and ultimately, for performing this repertoire, thereby helping to keep Portuguese music alive.
The ambitious goal of providing generalized access to handwritten scores that have never been published has been severely hampered by the current state of the art of handwritten music recognition. There are currently various commercial OMR software solutions [4, 16, 14] and a few open source solutions [1, 15, 2], but they are all offline standalone applications. The existing online archives of music scores [9, 5, 13] usually provide them in formats inadequate for retrieval or automatic analysis, typically only as the scanned score image. These online archives are mere standard websites, without facilities for optical recognition, editing and searching through the scores’ musical content. The creation of an OMR system integrating optical recognition, storage, search, browsing and downloading capabilities, while keeping the scores in their original format along with their digital counterpart, would therefore be extremely beneficial.
Against this background, we present the specification and implementation of a system integrating all the required features. It uniquely combines OMR technology in a system, easing the conversion of scores to the Music Extensible Markup Language format [12] (MusicXML), as it is being widely adopted and meets our needs. In Section 2 the proposed system is described. We continue in Section 3 by presenting a usage scenario. Finally, conclusions are drawn in Section 4.
2. SYSTEM ARCHITECTURE AND IMPLEMENTATION
The system that we propose in this paper comprises the creation of a database of music scores and a web application mainly featuring:
• Addition of music scores to the system, performing their recognition and conversion to MusicXML in an integrated manner, allowing the user to confirm and correct the conversion results at the last stage of this process.
• Complete maintenance of a fully navigable music score archive, including both the original version and the digital version obtained from the optical recognition.
• Browsing and searching the database, as well as the MusicXML contents. Visualization, downloading and editing of the selected music scores.
• Complete system management.
The architecture of the proposed system is based on a client-server model. The system is intended to be accessible through the Internet. There are three different entities present in this system, as can be seen in Figure 1.
[Figure 1 diagram: Web Browser clients 1 to N, the Internet, the Web Server with OMR engines 1 to N and the Search engine, a LAN, and Repositories 1 to N.]
Figure 1. Generic system architecture
The Repository module stores the original scanned score, the digital counterpart in MusicXML and all the descriptive metadata inserted by the user, as detailed later. All the remaining system contents, such as the user information, are also stored in this entity.
The Web Server is the user access point to the system as well as to all of its processing modules run on the server, encompassing the search engine and the optical recognition engine for the music scores. There is support for the inclusion of several OMR Engines, aiming to provide the ability to meet different needs (e.g. different music notation systems). The most adequate OMR Engine can be chosen manually by the user or automatically by the system, by detecting the score's notation and type (i.e. handwritten or printed). Our Search Engine not only allows generic searches throughout all of the system contents, but also provides the capability of searching throughout the music scores' MusicXML information in an innovative manner. The Web Server interacts with the Repository and with the Web Browser, which establishes the interface between the user and the system.
The user interface on a Web Browser allows the complete management of the music scores and associated metadata, as well as carrying out the system administration. Generally speaking, the user interface gives the user the ability to execute all the tasks necessary to fully use the proposed system. On the administration side it is possible to manage the users, as well as the whole system contents, and to validate new ones.
There are four user types which can access the system: General User, Registered User, Privileged User and the Administrator. The General User represents a visitor and may only consult and download contents. The remaining types are registered users and, according to their level, they may add/edit/remove certain contents with or without restrictions. The Privileged User is similar to the Administrator and has full access to all functionalities, though it cannot manage the users of its own level. The contents added by Registered Users have to be validated by Privileged Users or the Administrator; the latter two are considered trusted users and can add any content without validation.
For content management, functionalities include the addition of music scores to the Repository, their automatic recognition, visualization and editing, searching, and browsing. It is also possible to insert and browse information related to the music scores (name, authors, instruments, musical genres), providing the user with a Repository containing all the information necessary to keep a complete music corpus. This metadata can then be used in search queries or for a more flexible browsing experience. Finally, a music work is organized into sections, where each section is a music score which represents a part of the whole music work. This flexible structure accommodates both simple and complex works in the system. Each score can be visualized and its representation in MusicXML edited, side-by-side with the original score, directly in the Web Browser. Both the visualization and the editing of the music scores are done in a graphical, easy-to-use editor available through the Web Browser. Figure 2 illustrates the information recorded in the system associated with a music work.
Figure 2. Content stored in the Repository associated with a music work
2.1. Prototype Implementation – OMRSYS
We developed a prototype, OMRSYS, taking the system architecture shown in Figure 1 as a basis. Currently, the prototype supports only a single Repository collocated with the Web Server, and a single OMR Engine was used to prove the concept.
The Repository is implemented as a PostgreSQL¹ database, an open source Database Management System (DBMS). The main reason for choosing PostgreSQL was its native XML support, needed for creating a search engine that allows searching on the scores’ MusicXML counterparts stored in the database. This is an important aspect, as it is a major feature of the proposed system. It is also a very mature, well documented and popular DBMS. Another important feature is the great ease of integration with the framework we selected to develop the prototype, which we describe next. MySQL² is the default DBMS used by the framework chosen for developing the Web Application, but it lacks XML support. Other DBMSs were also considered, but PostgreSQL was the one that best fitted our needs.
The development of the Web Application was supported on Ruby on Rails³. Rails is an open source, full-stack framework for developing dynamic database-backed web applications according to the Model-View-Controller (MVC) pattern. It is an almost complete platform which requires only a DBMS and a server. These features make it a suitable choice for supporting the development of our prototype. Ruby, the programming language at its core, is a flexible and powerful object-oriented language. Another strong advantage is that database manipulation is greatly simplified, which is ideal for developing a system of this kind. Other frameworks [17, 7] were considered but fell short compared to the features of Ruby on Rails.
The Web Server selected for the prototype was the Apache HTTP Server⁴ with Mongrel⁵ to execute the Web Application. This solution is suggested by Ruby on Rails and was found suitable, being both efficient and open source.
The Search Engine developed for this prototype allows the usual queries on the database contents, although the groundbreaking search through the MusicXML contents is still not possible at this stage.
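As a rough illustration of what a search through MusicXML contents could look like, the following minimal sketch scans a score for notes of a given pitch. It is not the OMRSYS implementation (the prototype does not yet offer this search, and a production version would more likely rely on the XML support of the DBMS); the function name find_pitch and the tiny sample score are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written MusicXML fragment, used only for illustration.
SAMPLE = """<score-partwise version="3.1">
  <part-list><score-part id="P1"><part-name>Piano</part-name></score-part></part-list>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>4</duration></note>
      <note><pitch><step>E</step><octave>4</octave></pitch><duration>4</duration></note>
    </measure>
    <measure number="2">
      <note><pitch><step>C</step><octave>5</octave></pitch><duration>8</duration></note>
    </measure>
  </part>
</score-partwise>"""


def find_pitch(musicxml_text, step, octave):
    """Return the numbers of the measures containing a note with this pitch."""
    root = ET.fromstring(musicxml_text)
    hits = []
    for measure in root.iter("measure"):
        for note in measure.iter("note"):
            pitch = note.find("pitch")
            if pitch is None:  # rests carry no <pitch> element
                continue
            if (pitch.findtext("step") == step
                    and pitch.findtext("octave") == str(octave)):
                hits.append(measure.get("number"))
                break  # one hit per measure is enough
    return hits
```

Calling find_pitch(SAMPLE, "C", 4) reports measure "1" only, since the C in measure 2 is one octave higher.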
The OMR Engine on the Web Server is the module responsible for the automatic recognition of the submitted music scores. We initially adapted it from the open source OpenOMR project [15]. The OMR Engine performs an automatic conversion of a submitted music score to a digital, easy-to-use representation, the MusicXML format. This digital format allows representing sheet music by its musically relevant parts, sections, phrases and motives, thus easing access to the relevant portions of the score while browsing that score on a computer monitor.
1 http://www.postgresql.org
2 http://www.mysql.com
3 http://www.rubyonrails.org
4 http://httpd.apache.org
5 http://mongrel.rubyforge.org
Figure 3. The OMRSYS user interface
Simultaneously, MusicXML enables the retrieval of relevant musical information for analysis, thus facilitating certain types of computational analysis to be performed on a corpus of scores. It also provides an adequate way to restore old sheet music, saving it from oblivion. The other OMR applications we analysed were left aside because they were either less complete than OpenOMR at the time or were commercial solutions. Our main goal at this stage was to prove the concept.
The User Interface has a great impact on the user experience and is divided into several sections. The authentication, title and quick search sections are on the upper portion of the screen, followed by the middle and largest portion, which includes the main menu and the contents area. The main menu works as a two-level expandable menu and gives access to all the system’s functionalities by grouping them in a logical manner. Similar functionalities follow similar designs to keep the interface consistent, intuitive and easy to learn. Some of the functionalities are the common Create/Read/Update/Delete (CRUD) operations and the listing of the database contents based on a chosen criterion, all following a familiar behaviour. The main differences and most distinctive aspects lie in the submission and update of music scores, which are discussed in Section 3. As an interface example, Figure 3 shows the Graphical User Interface (GUI) being used for browsing the music scores available in the Repository. Both the Privileged Users and the Administrator have additional options in the main menu for validation purposes.
The MusicXML Editor for this prototype is still implemented as a plain text editor and viewer, but it already shows the original score side-by-side with its MusicXML counterpart. However, the development of a fully integrated graphical MusicXML editor is being pursued. Such an editor would allow higher-level and more intuitive editing and could be developed, for example, in Flash, along the lines of MusicRain [11], an online interactive sheet music viewer. The main purpose of the editor in this initial prototype was to give the end user the possibility to at least view and edit the music scores in MusicXML. The existing editors and visualizers [11] usually have a high maturity level but they are offline applications.
Figure 4. Score submission scenario
The Digital Rights Management control is done in two ways: the acceptance of a license agreement during the registration process and the validation of submitted music scores by a Privileged User before they become available on the system.
3. USAGE SCENARIO: SCORE SUBMISSION
When the insertion of a music score in the system is requested, the user inserts the metadata associated with the music score, some of which is optional (name, year, description, etc.), and associates it with one or more authors and a musical genre. Each section of a music work has to be associated with the instruments present in the music score. In the last submission step, after the insertion of the requested metadata, the user submits the various pages for each section, as illustrated in Figure 4.
After validating the inserted data, the user triggers the automatic recognition process by calling a suitable OMR engine. Afterward, an overview of the submitted score is shown by listing its contents, allowing the user to view the result of the automatic process in the built-in editor side-by-side with the original scanned image and offering the possibility to manually correct the automatic results. After confirming the results and making the necessary corrections, the user finalizes the music score submission by accepting it. If the score was submitted by the Administrator or a Privileged User, it becomes immediately available on the system; if it was submitted by a standard Registered User, it is kept in a queue for validation and will only become available once a user with administration privileges validates it.
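The validation rule at the end of the submission scenario can be summarized in a few lines. The sketch below is only an illustration of that rule under assumed names (Role, submit_score, validate_score and the state strings are hypothetical and do not come from the OMRSYS code base, which is written in Ruby on Rails).

```python
from enum import Enum


class Role(Enum):
    GENERAL = 0        # visitor: may only consult and download
    REGISTERED = 1     # submissions must be validated
    PRIVILEGED = 2     # trusted user
    ADMINISTRATOR = 3  # trusted user


TRUSTED = (Role.PRIVILEGED, Role.ADMINISTRATOR)


def submit_score(role):
    """Return the state of a score right after its submitter accepts it."""
    if role is Role.GENERAL:
        raise PermissionError("general users cannot submit scores")
    # Trusted users publish directly; others wait in the validation queue.
    return "available" if role in TRUSTED else "pending_validation"


def validate_score(state, validator_role):
    """Release a pending score once a trusted user validates it."""
    if validator_role not in TRUSTED:
        raise PermissionError("only trusted users can validate scores")
    return "available" if state == "pending_validation" else state
```

For example, a score submitted by a Registered User starts as "pending_validation" and becomes "available" only after validate_score is called by a Privileged User or the Administrator.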
4. CONCLUSION
The proposed system offers a complete solution for the preservation of our musical heritage. It includes an optical recognition engine integrated with an archiving system and a user-friendly interface for searching, browsing and editing. The digitized scores are stored in MusicXML, a recent and expanding music interchange format designed for notation, analysis, retrieval, and performance applications. An additional benefit of the automatic conversion of the music score to MusicXML is the possibility of encoding the manuscript score in MX format, an XML-based, multi-layered format for music representation [3]. MX synchronizes several layers belonging to the description of a piece of music, e.g. an audio recording and score of the same piece.
A system of this kind promotes the creation of a full corpus of music documents, promoting its preservation and study. This project will culminate in the creation of a repository of the handwritten scores, accessible online. The database will be available for enjoyment, educational and musicological purposes, thus preserving this corpus of music in an unprecedented way.
Acknowledgments
This work was partially funded by Fundação para a Ciência e a Tecnologia (FCT), Portugal, through project PTDC/EIA/71225/2006.
5. REFERENCES
[1] AOMR2. [Online]. Available: http://www.bzzt.net/~arnouten/wiki/index.php/Gamera#AOMR2:omr toolkit
[2] Audiveris. [Online]. Available: http://audiveris.dev.java.net
[3] A. Barate, G. Haus, and L. Ludovico, “An XML-based format for advanced music fruition,” in Proceedings of the Third Sound and Music Computing Conference, 2006, pp. 141–147.
[4] Capella-scan. [Online]. Available: http://www.capella-software.com/capscan.htm
[5] Classical Sheet Music and MIDI Files. [Online]. Available: http://www.music-scores.com
[6] Delgado, Cristina, Machado, Jorge, and J. Machado, Catalogos da obra de Jorge Peixinho, ser. Jose Machado (ed.) Jorge Peixinho: In Memoriam. Lisbon: Caminho, 2002.
[7] Google Web Toolkit. [Online]. Available: http://code.google.com/webtoolkit
[8] Institute of the Arts. [Online]. Available: http://patrimonio.dgartes.pt/?lang=pt
[9] The Lester S. Levy Collection of Sheet Music. [Online]. Available: http://levysheetmusic.mse.jhu.edu/
[10] Music Information Center. [Online]. Available: http://www.mic.pt
[11] MusicRain. [Online]. Available: http://musicrain.us
[12] The MusicXML Format. [Online]. Available: http://www.musicxml.org
[13] The Mutopia Project. [Online]. Available: http://sca.uwaterloo.ca/Mutopia
[14] OMeR. [Online]. Available: http://www.myriad-online.com/en/products/omer.htm
[15] OpenOMR. [Online]. Available: http://sourceforge.net/projects/openomr
[16] SharpEye Music Reader. [Online]. Available: http://www.visiv.co.uk
[17] Tacos. [Online]. Available: http://tacos.sourceforge.net
A CONNECTED PATH APPROACH FOR STAFF DETECTION ON A MUSIC SCORE
Jaime S. Cardoso∗, Artur Capela†, Ana Rebelo‡, Carlos Guedes§
ABSTRACT
The preservation of many music works produced in the past entails their digitalization and consequent accessibility in an easy-to-manage digital format. Carrying out this task manually is very time consuming and error prone. While optical music recognition systems usually perform well on printed scores, the processing of handwritten musical scores by computers remains far from ideal. One of the fundamental stages in carrying out this task is staff line detection. In this paper a new method for the automatic detection of music staff lines based on a connected path approach is presented. Lines affected by curvature, discontinuities, and inclination are robustly detected. Experimental results show that the proposed technique consistently outperforms well-established algorithms.
Index Terms— Music, optical character recognition, document image processing, image analysis
1. INTRODUCTION
The Universal Declaration on Cultural Diversity adopted by the General Conference of UNESCO in 2001 asserts that cultural diversity is as necessary for humankind as biodiversity is for nature, and that policies to promote and protect cultural diversity are thus an integral part of sustainable development. Since music is a pivotal part of our cultural heritage, its preservation, in all of its forms, must be pursued. Frequently, the preservation of music works entails their digitalization and consequent accessibility in a format that encourages browsing, analysis and retrieval. In fact, many music works produced during the last centuries still exist only as original manuscripts or as photocopies. The digitalization of these works is therefore a highly desirable goal. Unfortunately, the ambitious goal of providing generalized access to handwritten scores that were never published has been severely hampered by the current state of the art of handwritten music recognition. The manual process required to recognize handwritten musical symbols in scores and to put them in relationship with the spine structure of the score is very time consuming. This justifies the research around the definition of reliable optical music recognition (OMR) algorithms.
Staff line detection is one of the fundamental stages of the OMR process, with subsequent processes relying heavily on its performance. The reason for detecting and removing the staff lines lies in the need to isolate the musical symbols for a more efficient and correct detection of each symbol present in the score.
The detection of staves is complicated for a variety of reasons. Handwritten staff lines are rarely straight and horizontal, and are not parallel to each other. For example, some staves may be tilted one way or another on the same page, or they may be curved. These scores tend to be rather irregular and determined by a person’s own writing style. Moreover, since most of these works are old, the quality of the paper on which they are written might have degraded over the years, making it a lot harder to correctly identify their contents.
∗ INESC Porto, Faculdade de Engenharia, Universidade do Porto, Portugal, email: [email protected]
† INESC Porto, Faculdade de Engenharia, Universidade do Porto, Portugal, email: [email protected]
‡ INESC Porto, Faculdade de Ciências, Universidade do Porto, Portugal, email: [email protected]
§ INESC Porto, Escola Superior de Música e Artes do Espectáculo, Portugal, email: [email protected]
In this paper a method for the automatic detection of staff lines based on a connected path approach is presented. The proposed paradigm treats the image as a graph, where the staff lines emerge as connected paths between the two margins of the image. Our previous work [1] was a first effort to explore this concept. We now present a complete, principled solution, with a strong experimental validation of the proposed approach. This introduction is concluded with a brief review of the work done in this area. In section 2 the proposed algorithm is described. In section 3, the proposed algorithm is evaluated experimentally using a well-known dataset of music scores. Finally, conclusions are drawn and future work is outlined in section 4.
1.1. Related Works
The problem of staff line detection is often considered simultaneously with the goal of their removal, although exceptions exist [2, 3]. The simplest approach consists of finding local maxima in the horizontal projection of the black pixels of the image [4]. These local maxima represent line positions, assuming straight and horizontal lines. Several horizontal projections can be made with different image rotation angles, keeping the image in which the local maxima are largest. This eliminates the assumption that the lines are always horizontal. An alternative strategy for identifying staff lines is to use vertical scan lines [5]. More recent works present a more or less sophisticated use of a combination of projection techniques to improve on the basic approach [6].
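The basic horizontal projection idea can be sketched as follows. This is a minimal illustration, not the method of [4]: the image is assumed to be a list of rows with 1 for black and 0 for white, and the function names and the min_fraction threshold are assumptions introduced here.

```python
def horizontal_projection(image):
    """Count the black pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def staff_line_rows(image, min_fraction=0.5):
    """Rows that are local maxima of the projection and sufficiently dark.

    min_fraction is a hypothetical threshold: a row qualifies only if at
    least that fraction of its pixels is black.
    """
    proj = horizontal_projection(image)
    width = len(image[0])
    rows = []
    for r, count in enumerate(proj):
        above = proj[r - 1] if r > 0 else -1
        below = proj[r + 1] if r + 1 < len(proj) else -1
        # Strict on one side so a flat plateau is reported only once.
        if count >= min_fraction * width and count > above and count >= below:
            rows.append(r)
    return rows
```

Because each row is summed as a whole, the sketch inherits the limitation stated above: it only works when the lines are straight and horizontal.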
Fujinaga [7] incorporates a set of image processing techniques in the algorithm, including run-length coding (RLC), connected-component analysis, and projections. After applying the RLC to find the thickness of staff lines and the space between the staff lines, any vertical black runs that are more than twice the staff line height are removed from the original. Then, the connected components are scanned in order to eliminate any component whose width is less than the staff space height. After a global deskewing, taller components, such as slurs and dynamic wedges, are removed.
Other techniques for finding staff lines include the application of mathematical morphology algorithms [8], rule-based classification of thin horizontal line segments [9], and line tracing [10, 11]. The methods proposed in [2, 3] operate on a set of ‘staff segments’, with methods for linking two segments horizontally and vertically and merging two segments with overlapping position into one.
In spite of the variety of methods available, they all suffer from some limitations. In particular, lines with some curvature or discontinuities are inadequately resolved. The dash detector [12] is one of the few works that try to handle discontinuities. The dash detector is an algorithm that searches the image, pixel by pixel, finding black pixel regions that it classifies as stains or dashes. Then, it tries to unite the dashes to construct lines.
2. A CONNECTED PATH APPROACH FOR STAFF LINE DETECTION
A staff line can be considered as a connected path from the left side of the music score to the right side. As staff lines are almost the only extensive black objects on the music score, the path we are looking for is the shortest path between the two margins if paths (almost) entirely through black pixels are favoured. More formally, let $s$ and $t$ be two pixels of the image and $P_{s,t}$ a path over the image connecting them. We are interested in finding the path $P$ that optimizes some predefined distance $d(s, t)$. This criterion should embed the need to favour black pixels.
In the work to be detailed, the image grid is considered as a graph with pixels as nodes and edges connecting neighbouring pixels. The weight of each arc, $w(p, q)$, is a function of the pixel values and the pixels' relative positions. A path from vertex (pixel) $v_1$ to vertex (pixel) $v_n$ is a list of unique vertices $v_1, v_2, \ldots, v_n$, with $v_{i-1}$ and $v_i$ corresponding to neighbouring pixels. The path cost is the sum of each arc weight in the path, $\sum_{i=2}^{n} w(v_{i-1}, v_i)$.
As mentioned before, a staff line corresponds to a path from (almost) the left margin of the image to (almost) the right side of the image, (almost) always through black pixels. If the weight assigned to an edge captures the intensity of the adjacent pixels, finding the best path between a point $s$ on the left margin and a point $t$ on the right margin translates into computing the minimum accumulated weight along all possible connected curves connecting $s$ and $t$:
$$d(s, t) = \min_{P_{s,t}} \sum_{(p,q) \in P_{s,t}} w(p, q). \tag{1}$$
Staff lines are best modelled as paths between two regions $\Omega_1$ and $\Omega_2$, the left and right margins of the score. The shortest path between two regions $\Omega_1$ and $\Omega_2$ is defined as a path $P_{s,t}$, with $s \in \Omega_1$ and $t \in \Omega_2$, and cost equal to
$$d(\Omega_1, \Omega_2) = \min_{s \in \Omega_1,\, t \in \Omega_2} d(s, t). \tag{2}$$
One may assume that staff lines do not zigzag back and forth, left and right. Therefore, one may restrict the search to connected paths containing one, and only one, pixel in each column of the image¹. Formally, let $I$ be an $n \times m$ image and define an admissible staff to be
$$s = \{(j, y(j))\}_{j=1}^{n}, \quad \text{s.t. } \forall j,\ |y(j) - y(j-1)| \le 1,$$
where $y$ is a mapping $y : [1, \cdots, n] \to [1, \cdots, m]$. That is, a staff line is an 8-connected path of pixels in the image from left to right, containing one, and only one, pixel in each column of the image.
Given the weight function $w(p, q)$, one can define the cost of a staff as $C(s) = \sum_{i=2}^{n} w(v_{i-1}, v_i)$. The optimal staff line that minimizes this cost can be found using dynamic programming. The first step is to traverse the image from the second column to the last column and compute the cumulative minimum cost $C$ for all possible connected staff lines for each entry $(i, j)$:
$$C(i, j) = \min \begin{cases} C(i-1, j-1) + w(p_{i-1,j-1};\, p_{i,j}) \\ C(i-1, j) + w(p_{i-1,j};\, p_{i,j}) \\ C(i-1, j+1) + w(p_{i-1,j+1};\, p_{i,j}) \end{cases}$$
1 This assumption imposes a maximum detectable rotation of 45 degrees.
At the end of this process, the minimum value of the last column in $C$ will indicate the end of the minimal connected staff. Hence, in the second step one backtracks from this minimum entry of $C$ to find the path of the optimal staff.
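The two steps just described (forward accumulation of $C$, then backtracking) can be sketched as below. This is a minimal illustration rather than the authors' implementation: indices are 0-based, the image is assumed to be a list of rows with 0 for black and 255 for white, the function names are hypothetical, and the default weight is the one reported in Section 2.2.

```python
import math


def paper_weight(ip, iq, diagonal):
    """Weight from Section 2.2: 2 + (Ip + Iq)/255, times sqrt(2) on diagonals."""
    w = 2 + (ip + iq) / 255
    return w * math.sqrt(2) if diagonal else w


def shortest_staff_path(image, weight=paper_weight):
    """Return (cost, rows) of the cheapest left-to-right path.

    image[j][i] is the intensity at row j, column i.
    rows[i] is the row visited in column i; consecutive rows differ by <= 1.
    """
    m, n = len(image), len(image[0])
    cost = [[0.0] * m for _ in range(n)]  # cost[i][j]: best cost ending at (i, j)
    back = [[0] * m for _ in range(n)]    # back[i][j]: row used in column i - 1
    for i in range(1, n):                 # second column to last column
        for j in range(m):
            best = None
            for dj in (-1, 0, 1):         # the three admissible predecessors
                k = j + dj
                if 0 <= k < m:
                    c = cost[i - 1][k] + weight(image[k][i - 1], image[j][i], dj != 0)
                    if best is None or c < best:
                        best, back[i][j] = c, k
            cost[i][j] = best
    # Second step: backtrack from the cheapest entry of the last column.
    j = min(range(m), key=lambda r: cost[n - 1][r])
    total = cost[n - 1][j]
    rows = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        rows.append(j)
    rows.reverse()
    return total, rows
```

The run time is proportional to the number of pixels, since each entry of the cost table looks at no more than three predecessors.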
2.1. Algorithm outline
Assume one wants to find all staff lines present in a score. This can be approached by successively finding and removing the shortest path from the left to the right margin of the score. The removal operation is required to ensure that a staff is not detected twice².
Consider the music score presented in Figure 1(a). In Figure 1(b) the first 11 shortest paths are traced. This example shows that music symbols placed on top of staff lines do not interfere with the detection of the staff lines. Moreover, the example also makes clear that slightly skewed scores do not pose any problem to the proposed approach.
(a) Skewed staff lines with music symbols.
(b) The first shortest paths between left and right margins.
Fig. 1. An illustrative example.
Our first naive effort to apply the shortest path foundation to staff line detection in [1] did it by computing the shortest path between two pixels at the same height on the left and right margin. That approach was not robust enough to tilted scores, leading the detected paths to jump between consecutive staff lines.
Nonetheless, two main issues are still visible with the current methodology and need to be conveniently addressed. A criterion is needed to stop the iterative detection of the shortest paths (staff lines), and the initial and the final parts of a path should be trimmed.
2.2. Proposed Algorithm
To detect the staff lines, the proposed overall algorithm starts by estimating the staff space height, staffspaceheight, and the staff line height, stafflineheight. These lengths are used as reference lengths in subsequent operations. Robust estimators are already in common use: the technique starts by computing the vertical run-length representation of the image. If a bit-mapped page of music is converted to vertical run-length coding, the most common black run represents the staff line height and the most common white run represents the staff space height [7].
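The run-length estimator described above can be sketched in a few lines. The code is an illustrative assumption, not the authors' implementation: the image is taken to be a list of rows with 1 for black and 0 for white, and the function name is hypothetical.

```python
from collections import Counter


def estimate_reference_lengths(image):
    """Return (stafflineheight, staffspaceheight) from vertical run lengths.

    Assumes the image contains at least one black and one white pixel.
    """
    black_runs, white_runs = Counter(), Counter()
    rows, cols = len(image), len(image[0])
    for c in range(cols):
        run_value, run_length = image[0][c], 1
        for r in range(1, rows):
            if image[r][c] == run_value:
                run_length += 1
            else:
                # The run just ended: record its length under its colour.
                (black_runs if run_value else white_runs)[run_length] += 1
                run_value, run_length = image[r][c], 1
        (black_runs if run_value else white_runs)[run_length] += 1
    # The most common black run is the line height, the most common
    # white run is the space height.
    return black_runs.most_common(1)[0][0], white_runs.most_common(1)[0][0]
```

Taking the mode rather than the mean is what makes the estimate robust: symbols and margins produce runs of many different lengths, but none occurs as often as the staff line and staff space runs.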
After estimating the reference lengths, the proposed approach applies the main step of the framework, by successively finding the shortest path between the left and right margin, adding the path found to the list of staff lines and removing it from the image. The weight $w(p, q)$ was experimentally set to $w(p, q) = 2 + (I_p + I_q)/255$, with $I_p, I_q \in \{0, 255\}$, for pixels in a 4-neighbourhood, or $\sqrt{2}$ times that value for 8-neighbours. The removal operation sets to white the pixels on a vertical strip of height $2 \times$ stafflineheight, centred on the detected staff line. To stop the iterative staff line search, a sequence of (arguably) sensible rules is used to validate the last found path; if it does not pass the check, the iterative search is stopped. Two validation rules were applied, both assessing features with respect to the first detected staff line (assumed to be the most perfect one). If the last path does not have a percentage of black pixels above a threshold, the search is stopped (a threshold of blackperc = 0.8 of the percentage of black pixels in the first staff line was used in the experiments). Likewise, if the shape of the last detected path differs too much from the shape of the first detected path (measured as the average y-distance between both paths, after removing the means), the iteration is stopped. A threshold shapediff = 4 × staffspaceheight was experimentally selected.
2 We implemented the removal operation by setting to white the pixels on the detected staff; image resizing could be a valid alternative [13].
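The edge weight and the two validation rules can be written down directly from the description above. The sketch below is illustrative only: function names are hypothetical, a path is assumed to be a list holding the row visited in each column, the image uses 0 for black and 255 for white, and whether the thresholds are strict or inclusive is an assumption.

```python
import math


def edge_weight(ip, iq, diagonal):
    """w(p, q) = 2 + (Ip + Iq)/255, scaled by sqrt(2) for diagonal neighbours."""
    w = 2 + (ip + iq) / 255
    return w * math.sqrt(2) if diagonal else w


def black_fraction(image, path):
    """Fraction of the path pixels that are black (path[c] = row in column c)."""
    return sum(image[r][c] == 0 for c, r in enumerate(path)) / len(path)


def shape_difference(path, reference):
    """Average y-distance between two paths after removing their means."""
    mean_p = sum(path) / len(path)
    mean_r = sum(reference) / len(reference)
    return sum(abs((p - mean_p) - (r - mean_r))
               for p, r in zip(path, reference)) / len(path)


def accept_path(image, path, first_path, first_black_fraction,
                staffspaceheight, blackperc=0.8, shapediff_factor=4):
    """Apply both stopping rules relative to the first detected staff line.

    first_black_fraction must be measured before the first line is erased.
    """
    if black_fraction(image, path) < blackperc * first_black_fraction:
        return False
    return shape_difference(path, first_path) <= shapediff_factor * staffspaceheight
```

With this weight an edge between two black pixels costs 2 and an edge between two white pixels costs 4, so a path is rewarded for staying on ink without white stretches becoming prohibitively expensive.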
After the main search step, detected staff lines are post-processed. Although true staff lines never intersect, the above algorithm may occasionally create intersecting lines. That may be due to a locally low quality of a line, leading the shortest path to jump between consecutive lines; the next iteration will then follow the remaining segments, intersecting with the previously detected line. To preclude such a final, undesired state, lines are post-processed to remove intersections. That is easily and efficiently accomplished by, for each image column, sorting on $y$ the pixels of the detected lines and assigning the $i$-th pixel to the $i$-th line. After this simple process, lines may touch but they do not intersect. Each retained line is then trimmed at the beginning and at the end. As visible in the previous example (refer to Figure 1(b)), before meeting a staff line, a path travels through a sequence of white pixels. Likewise, after the end of the staff line, the path goes again through a sequence of white pixels until it meets the right margin of the image. In order to ignore all of these white pixels, the initial pixels of the path are discarded until a run of at least blackrun black pixels is found in the path. In the same way, all pixels of the path after the last occurrence of a run of at least blackrun black pixels are discarded. A threshold of blackrun = 2 × staffspaceheight was used in the experiments. Finally, lines are smoothed with a standard average low-pass filter. A window size of 2 × staffspaceheight was selected in the experiments.
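The three post-processing operations (intersection removal, trimming and smoothing) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: each line is a list with one row value per image column, the image uses 0 for black, function names are hypothetical, and the handling of the window at the borders is a choice made here.

```python
def remove_intersections(lines):
    """Sort rows column by column so that line k holds the k-th smallest row."""
    sorted_columns = [sorted(col) for col in zip(*lines)]
    return [list(line) for line in zip(*sorted_columns)]


def trim(image, line, blackrun):
    """Columns spanned from the first to the last run of >= blackrun black pixels.

    Returns (first_column, last_column), or None if no such run exists.
    """
    first = last = None
    run = 0
    for c, r in enumerate(line):
        run = run + 1 if image[r][c] == 0 else 0
        if run >= blackrun:
            if first is None:
                first = c - blackrun + 1  # column where this run started
            last = c
    return None if first is None else (first, last)


def smooth(line, window):
    """Moving average; the window is clipped at the borders of the line."""
    half = window // 2
    out = []
    for c in range(len(line)):
        lo, hi = max(0, c - half), min(len(line), c + half + 1)
        out.append(sum(line[lo:hi]) / (hi - lo))
    return out
```

Sorting per column works because every path has exactly one pixel in each column, so two crossing lines simply swap their row values at the crossing and the sort swaps them back.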
3. RESULTS
This section provides experimental results obtained on a set of scores. Although a new staff detection algorithm may be assessed by visually inspecting its output on a set of scores (as done in [1]), here we base the comparison on quantitative measures. The test set adopted for the quantitative evaluation of the proposed method is the one presented in [14]. It consists of ideal scores to which known deformations can be applied. The distortions range from rotation and curvature to typeset emulation and staff line thickness variation (see [14, 15] for more details). In total, 630 images were generated from the originally perfect scores. To conveniently measure the performance of a staff line detection algorithm, we considered two error metrics: the number of false positive staff lines and the number of missed staff lines.
To evaluate these metrics, we start by computing the average Euclidean distance between each reference staff line and each actually detected staff line; then we solve the matching problem on the resulting bipartite graph by minimizing the assignment cost (= distance). Only pairs with an average error distance below the staff line height were considered correctly matched (the other pairs were assumed to originate from a false positive staff line being matched to an undetected true staff line and were therefore unmatched). The two metrics are then the number of unmatched detected staff lines (false positives) and the number of unmatched reference staff lines (missed staff lines).
The proposed algorithm was compared with the three methods considered in [14] for staff line detection³. As Dalitz’s algorithm performs significantly better than the two other algorithms evaluated in [14], we have only included the Dalitz results in subsequent figures. It is important to state that the comparison refers only to staff line detection algorithms, not to the removal phase. That explains the need to introduce the aforementioned metrics, rather than adopting the metrics introduced in [14] for assessing staff line removal.
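The matching-based evaluation described above can be sketched as follows. This is an illustration under assumptions rather than the evaluation code used in the experiments: each staff line is a list of y values sampled on the same columns, so the average distance reduces to a mean absolute vertical difference; the assignment is solved with a textbook Hungarian algorithm; unequal numbers of lines are handled by padding the cost matrix with zero-cost dummy entries. All function names are hypothetical.

```python
def mean_distance(line_a, line_b):
    """Average vertical distance between two lines sampled on the same columns."""
    return sum(abs(a - b) for a, b in zip(line_a, line_b)) / len(line_a)


def min_cost_assignment(cost):
    """Hungarian algorithm on a square cost matrix; returns column of each row."""
    n = len(cost)
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # row and column potentials
    p = [0] * (n + 1)    # p[j]: 1-based row matched to column j (0 = free)
    way = [0] * (n + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def evaluate(reference, detected, staffline_height):
    """Return (matched pairs, false positives, missed staff lines)."""
    size = max(len(reference), len(detected))
    cost = [[0.0] * size for _ in range(size)]  # zero-cost padding for dummies
    for i, ref in enumerate(reference):
        for j, det in enumerate(detected):
            cost[i][j] = mean_distance(ref, det)
    matched = [(i, j) for i, j in enumerate(min_cost_assignment(cost))
               if i < len(reference) and j < len(detected)
               and cost[i][j] < staffline_height]
    return matched, len(detected) - len(matched), len(reference) - len(matched)
```

For example, with reference lines at y = 10, 20, 30 and detected lines at y = 10.5, 19.5, 60, the first two pairs are matched, while the third pair exceeds the staff line height and therefore counts as one false positive and one missed staff line.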
The effects of the different deformations over the respective parameter ranges are shown in Figure 2. With respect to the distortions considered, our connected path based approach is the most robust and clearly outperforms the Dalitz algorithm. In fact, the performance of our approach is almost independent of the intensity of the deformation, for the range of values considered. This performance gain is even more noteworthy as the Dalitz algorithm receives as input the correct number of lines per staff, while the proposed approach does not rely on that information. Had this not been the case, the difference between the two would have been much larger.
In summary, these experiments show the strengths of the presented algorithm. Despite being based on a simple and intuitive underlying principle, the performance of the proposed algorithm is quite competitive. Moreover, the results can likely be improved even further by refining the stopping criterion of the iterative search or the post-processing rules, while leaving the main principle of the method intact.
4. CONCLUSION
The first challenge faced by an OMR system is staff line detection. This first task dictates the possibility of success for the recognition of the music score. In the case of handwritten music scores, the existing solutions are far from presenting satisfactory results.
In this paper, a new algorithm for the automatic detection of staff lines in music scores was proposed. The connected path approach to staff line detection is adaptable to a wide range of image conditions, thanks to its intrinsic robustness to skewed images, discontinuities, and curved staff lines. Handwritten staff lines are rarely straight and horizontal, and are not parallel to each other. Some staves may be tilted one way or another on the same page, or they may be curved. While current approaches apply a chain of heuristics to correct these undesired imperfections, the connected path algorithm is naturally robust to these challenging conditions. The proposed approach is robust to broken staff lines (due to low-quality digitization or low-quality originals) and to staff lines as thin as one pixel. Missing pieces are automatically ‘completed’ by the algorithm.
Acknowledgments
This work was partially funded by Fundação para a Ciência e a Tecnologia (FCT) - Portugal through project PTDC/EIA/71225/2006.
5. REFERENCES
[1] Ana Rebelo, Artur Capela, Joaquim F. Pinto da Costa, Carlos Guedes, Eurico Carrapatoso, and Jaime S. Cardoso, “A shortest path approach for staff line detection,” in Third International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AXMEDIS 2007), 2007.
³ The source code is available upon request to the authors.
Fig. 2. Effect of different deformations on the overall error rates. See [14] for parameter details. [Plots not reproduced. Each plot shows the error (0 to 1) for false staves and missed staves, for both the connected path and Dalitz methods.]
(a) Rotation; x-axis: angle (degrees).
(b) Curvature; x-axis: amplitude/staffwidth.
(c) White speckle; x-axis: rate of whitened pixels.
(d) Line y-variation; x-axis: maximum deviation n.
(e) Shortest path (left) and Dalitz (right) results for rotation (angle = -4).
(f) Typeset emulation; x-axis: maximum vertical shift nshift (for ngap = 10).
[2] Hidetoshi Miyao and Masayuki Okamoto, “Stave extraction for printed music scores using DP matching,” Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 8, pp. 208–215, 2007.
[3] Mariusz Szwoch, “A robust detector for distorted music staves,” in Computer Analysis of Images and Patterns, pp. 701–708. Springer-Verlag, Heidelberg, 2005.
[4] Dorothea Blostein and Henry S. Baird, “A critical survey of music image analysis,” in Structured Document Image Analysis, Baird, Bunke, and Yamamoto, Eds., pp. 405–434. Springer-Verlag, Heidelberg, 1992.
[5] N. P. Carter, Automatic Recognition of Printed Music in the Context of Electronic Publishing, Ph.D. thesis, Departments of Physics and Music, University of Surrey, 1989.
[6] D. Bainbridge, Extensible Optical Music Recognition, Ph.D. thesis, Department of Computer Science, University of Canterbury, Christchurch, NZ, 1997.
[7] Ichiro Fujinaga, “Staff detection and removal,” in Visual Perception of Music Notation: On-Line and Off-Line Recognition, Susan George, Ed., pp. 1–39. Idea Group Inc., 2004.
[8] Ignacy Gawedzki, “Optical music scores recognition,” Tech. Rep., 2002.
[9] J. V. Mahoney, Automatic analysis of music score images, B.Sc. thesis, 1982.
[10] D. Prerau, Computer pattern recognition of standard engraved music notation, Ph.D. thesis, Department of Computer Science and Engineering, MIT, 1970.
[11] J. W. Roach and J. E. Tatem, “Using domain knowledge in low-level visual processing to interpret handwritten music: an experiment,” Pattern Recognition, vol. 21, no. 1, pp. 33–44, 1988.
[12] I. Leplumey, J. Camillerapp, and G. Lorette, “A robust detector for music staves,” in Proceedings of the International Conference on Document Analysis and Recognition, 1993, pp. 902–905.
[13] Shai Avidan and Ariel Shamir, “Seam carving for content-aware image resizing,” in ACM Transactions on Graphics (SIGGRAPH 2007), 2007, vol. 26.
[14] Christoph Dalitz, Michael Droettboom, Bastian Czerwinski, and Ichiro Fujinaga, “A comparative study of staff removal algorithms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
[15] Christoph Dalitz, Michael Droettboom, Bastian Czerwinski, and Ichiro Fujinaga, “Staff removal toolkit for Gamera,” 2005-2007.