jaymz campbell - final year ise3 report

Imperial College London

Department of Electrical and Electronic Engineering

Final Year Project Report 2006

Project Title: Music to Tablature Transcription for Electric Guitar

Student: J.T. Campbell

Course: ISE3

Project Supervisor: Dr C. Papavassiliou

Second Marker: Dr T.J.W Clarke


Chapter 1: Abstract

The guitar is a universally loved instrument. Learning to play takes time and
dedication, and many accomplished amateurs first start by playing along to
favourite songs. Often the only semi-accurate transcriptions available to these
players are tablature files downloaded from the web. This leaves the beginner at
the mercy of other people's ideas as to how best to play certain songs.

Another way people learn new songs is by simply listening and playing along
until they get ‘the sound just right’. The software outlined here automates this
time-honoured method. Using Fourier transforms, spectrograms and harmonic
analysis to look at the frequency content of a sound recording, a set of methods
is put forward to match this information to notes on the guitar. After the
matching has been done, the output is a tablature-style version of what has just
been analysed.

Starting with a look at what we can learn from the frequency content alone, this
report moves on to discuss the problems of identifying temporal information and
recognising further features of common guitar play such as chords and bends.
Looking to the future of the project, ideas are put forward for identifying
where in particular a note should be placed on the tablature, for better chord
handling, and for improving the time resolution at which notes are detected
without sacrificing the ability to work out the notes themselves quickly and
efficiently.



Chapter 2: Acknowledgements

I would like to thank my friends both in London and abroad for their support

during this academic year, which has been particularly stressful as I balanced

employment with academic commitments.

My family have also proven to be an invaluable form of support and

reassurance throughout my time here at Imperial.

Finally, I would like to thank Dr P. Naylor for his invaluable assistance these
past few years; without his help I may not have even had the chance to work on
this project.

I would like to dedicate my work to my grandfather, James Campbell Snr, who

passed away during the start of this academic year. He will be a greatly missed

member of my family and was an inspiration to all my siblings.

Jaymz Campbell



Contents

Chapter 1: Abstract
Chapter 2: Acknowledgements
Contents
Chapter 3: Introduction
  The Main Idea
  What were the goals?
  How were the goals to be achieved?
  In the end, what was accomplished?
  What is the future direction for the project?
Chapter 4: Background Theory
  A closer look at the Music
  The Guitar itself
  First things first: Good vibrations
  Making it sing: Playing the guitar
  Timing in Music & Tablature
  Looking for pitch: Frequency Analysis
  Limitations of this method: Time resolution
  Other potential methods of determining frequency
  A note on the phase information in signals
Chapter 5: Design
  The STFT & time resolution
  Spectrograms
  Harmonic Product Spectrum (HPS)
  Identifying the String a note is played on
  Identifying a chord
  Identifying a bend
  Identifying hammer-ons / pull-offs
  Identifying actual notes, not ‘time-slice notes’
  A ST-HPS rather than the ST-FT
Chapter 6: Implementation
Chapter 7: Evaluation & Conclusions
  The results obtained in general
  Tackling too complex a problem
  Transcription of varying speed of play (tempo)
  Thoughts about the work and future direction
Chapter 8: Future Work
  Use of Wavelets over the STFT
  Moving to a C++ framework
  Building a Complete chord database
  Determining the current string being played
  Adding Probability maps for each note
Chapter 9: Bibliography
Appendix 1: Guitar note names vs. Frequency
Appendix 2: Sampling Theorem
Appendix 3: Setup used to record
Appendix 4: Software used throughout
Appendix 5: Easy Tab Pro

Images, formula & tables

How notes are arranged into octaves, repeating every 12
Comparing Middle C on a guitar & Piano
Fret board map of standard tuned guitar
Relationship between change in length and change in frequency
Relationship between change in frequency and change in tension
Relationship between change in frequency and change in density
Example of a 3-note pull-off being performed
Wave forms for the pull off example
Example of a 1-step bend being performed
Wave forms for the bend example
Wave form of the G-Chord example
The DFT equation
Euler’s Formula
An example of the FFT on time domain data
Rectangular windowing effect of using a FFT size greater than the sample size
A 100 Point Hamming window
The effect of a Hamming window on a large size FFT spectrum
Definition of the ‘Hamming’ window
Comparing the usefulness of the Phase and Amplitude spectra of a complex signal
The STFT definition
The effect of the window size for frequency and time resolution for common sample rates
Resolution compared to various window sizes for a 22.05 kHz sampling rate
Definition of the spectrogram
Spectrogram of the pull-off example
Power spectrum at 0.1 seconds (first note)
Power spectrum at 0.2 seconds (second note)
Power spectrum at 0.3 seconds (third note)
Spectrogram for notes running from F# to A on the Low E string (87-110Hz)
Power spectrum for the G# (103Hz) note at 3 seconds into the spectrogram
Graphical description of the harmonic product spectrum
Detail of the down sampled and original spectra for G#
The harmonic product spectrum for G#
Spectrogram of the same note on different strings (A at 110Hz)
Spectrogram of a G chord (root note fundamental: 98Hz)
Applying the Harmonic Product Spectrum to a chord to identify its notes
Spectrogram of a one-step bend
Path cost function using a heuristic for general graph search
Example of 3x3 ‘next note’ probability matrix multiplied by a matrix of possible positions
Note name vs. Frequency
Photograph of the microphone in position near the amplifier
Screen shot of Easy Tab Pro, taken from www.lookoutsoft.net



Chapter 3: Introduction

The Main Idea

This project is designed to perform a specific task: to take the recorded sound
of a guitar being played and turn that sound into tablature. There are a number
of problems to overcome to accomplish this, and they are outlined over the next
few pages along with hints as to potential solutions. The technical details
have, for the most part, been omitted here and can be found in the chapters that
follow. The mathematics is not particularly involved and can be followed quite
easily with enough intuition. Where an idea is being described it should be
quite understandable even for those without a background in signal processing.

Many papers on the topic of music analysis focus on transcribing the full score
of any and every instrument, or on methods of psychoacoustic analysis. I chose
to focus on the guitar as it is an instrument with which I am very familiar, and
I felt I could really relate the theory to the needs of the software. For
example, in the design section, during the discussion on selecting how many time
‘slices’ should be analysed (and therefore how many notes can be resolved), I
made assumptions about the requirements of an average to proficient guitar
player based on my own experience.

The idea for this project is around six years old and came from watching my
brother work with his band mates at practices. As the guys worked on new songs,
my brother, being the lead guitarist, would come up with various rhythms, licks
and mini-solos as the night went on. Often, the only way to ‘take note’ of these
bursts of inspiration was to record the entire practice session and then listen
back whilst trying to remember the flow of the music. It was on one particular
night in Philadelphia, when he was having trouble transcribing a fairly lengthy
solo, that I thought about the idea of somehow looking at the recorded sound
file and picking out the notes.



As I had yet to discover Fourier analysis (converting a signal from its time

domain form to the frequency domain) the idea was impractical and put on the

back burner whilst I focused on other projects. Now however, with an

understanding of the relationship between time and frequency in signals, and

the methods to switch between the two, I realised I was at a point to finally

develop the idea into a viable system.

What were the goals?

The main goal was to convert an audio file of a guitar being played to its
tablature form. This obviously requires some way of recognising when a note is
played. Further enhancements were to recognise more descriptive guitar play such
as the bending of strings. A discussion of some basic music theory and guitar
basics, as well as the fundamentals of Fourier (frequency) analysis, can be
found in the Background Theory section.

How were the goals to be achieved?

I had first planned to develop a hardware interface for connecting the guitar to
the PC. Early on, however, I abandoned this idea as I found the time spent on
hardware design to be of little value to the overall goal. Remembering the
initial problem, that of my brother having to listen to hours of audio tape,
reminded me that the real reason I wanted to work on this system was the
analysis, not the fact that I could plug a guitar into a PC. Thinking in a
similar manner about the software, I decided that Matlab™ provided an excellent
environment in which to explore the signals without worrying about the
lower-level work that would be necessary in, say, C++.

Matlab provides high-level functions such as wavread (for reading in a
Microsoft™ WAV file) and the FFT (fast Fourier transform, which converts a
time-domain signal into a frequency-domain one). With just these two functions,
it should be possible to look, in a detailed way, at the frequency content, and
therefore the note content, of a particular file. In order to understand the
content of the Design and Implementation sections, it is recommended that the
reader review the material in the Background Theory section, in particular the
Frequency Analysis subtopic.
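As a rough illustration of the read-then-transform idea described above, the sketch below uses Python rather than Matlab (an assumption made purely for illustration; the project itself relied on Matlab's wavread and FFT). Instead of reading a WAV file it synthesises a short 440Hz tone, applies a naive discrete Fourier transform and reports the strongest frequency bin. The function names, the 8kHz sample rate and the 400-sample length are all hypothetical choices.

```python
import cmath
import math


def synth_tone(freq_hz, sample_rate, n_samples):
    """Return n_samples of a unit-amplitude sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns magnitudes for bins 0..N/2 only."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the largest-magnitude bin, ignoring the DC bin."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(samples)


SAMPLE_RATE = 8000   # Hz, hypothetical
N = 400              # bin spacing = 8000 / 400 = 20 Hz
tone = synth_tone(440.0, SAMPLE_RATE, N)
print(peak_frequency(tone, SAMPLE_RATE))
```

The bin spacing (sample rate divided by the number of samples) limits how finely two nearby notes can be told apart, which is why the tone here was chosen to fall exactly on a bin.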



In the end, what was accomplished?

Much of the work from this project is in the form of a set of observations from
which there is a real opportunity for further development. I have suggested some
possible algorithms on which more detailed work can build, as well as a number
of ideas to explore further. As this report is read, it will likely be
appreciated that exploring every avenue mentioned over the next few chapters
would have been impractical in the time available. Rather, I hope that by
showing the ideas I have been working with, I can demonstrate that this project
still has a very open and active future.

Some of the accomplishments include a method for determining, to quite an
accurate level, the occurrence of chords, including their individual note
make-up. I also describe a potential method for determining where a note should
be placed on the tablature solely by analysing the wave file. The only other
piece of software I have seen that does something similar requires the use of an
A/D D/A converter.

What is the future direction for the project?

After starting this project it did not take long to appreciate the depth of this
subject. The analysis of music is an emerging field, and the more time I spent
thinking about the problems I was encountering, the more I felt that, with
enough work and perhaps a user community behind it, this project has the scope
to be quite successful.

For this reason I registered the domain http://www.writemytab.com and aim to
start work building the site itself after the presentation, which will follow
soon after this report's submission.



Chapter 4: Background Theory

A closer look at the Music

Music is one of the great methods of communication for human beings. Music

can transcend class and cultural boundaries and bring together people on an

equal footing. Before people began to look at it from a purely mathematical

point of view, instruments were being created, used and enjoyed without the

deep understanding of why certain lengths, thicknesses, materials etc. produced

the sounds they did.

In order to follow the discussions that will present themselves later, it would be

useful to have an idea about how musical notes work. A good starting point is a

single note.

A note can be thought of as an atomic unit of music. In other words, a single
note is the base from which all other musical elements are made. A chord such as
F major uses 4 notes. Notes are associated with a pitch or fundamental
frequency. For many people pitch and frequency are alike; however, there are
subtle differences. Pitch can be thought of as the brain's link between the note
name and its fundamental frequency. It is a psychological correlation between
what we think of as a certain note and its true frequency. Over the course of
time various pitches have been used to define musical scales; during the 18th
century, for example, A above middle C varied from 400Hz to 450Hz. ISO standards
define A above middle C as 440Hz, and for the purposes of this project it is
important to note that this is the value used in any discussion regarding
correlating notes to frequency. This is the standard in the UK and USA, although
on the continent slightly higher pitches have become the norm.

Most modern Western music makes use of the diatonic scale. This is the familiar
scale of 7 notes, A-G, that we have grown up with. Within this scale, there are
5 whole-tone and 2 half-tone steps. If the half-tone steps are maximally
separated (i.e., spaced as far apart as they can be), this leads us to the
familiar arrangement of notes, running A, A#, B, C, C#, D, D#, E, F, F#, G, G#.
The study of musical scales can become quite complex for those not familiar with
the terminology. To understand this project, it is sufficient to know the
ordering of the notes only. With this basic knowledge of notes and the concept
of a fundamental frequency as an identifier for them, let us turn our attention
to the grouping of notes into octaves.

D# E F F# G G# A A# B C C# D D# E F F# G G# A A# B C C# D

How notes are arranged into octaves, repeating every 12

An octave is simply the interval between one note and another of half or double

the frequency. So if a pure A tone is 440Hz, an octave above would be 880Hz

and an octave below 220Hz. For the seven standard notes and the related 5

sharps/flats, the notes repeat themselves every octave. The notes within each

octave use the same names as their upper and lower octave equivalents as they

are perceptually the ‘same sound’. There is a quality to them which makes our

brains associate them as the same note only higher or lower in pitch. The diagram

above shows in blue one full octave. To the left is the lower octave and to the

right in green the next octave up. The green A could be 880Hz for example,

making the blue one 440Hz.
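The octave relationship and the twelve-note ordering can be sketched in a few lines. The example assumes equal temperament with A above middle C at 440Hz (the reference value adopted in this report); the function names are hypothetical and only the note name, not the octave, is returned.

```python
import math

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
A4_HZ = 440.0  # A above middle C, the reference used in this report


def semitones_from_a4(freq_hz):
    """Nearest whole number of equal-tempered semitones between freq_hz and A 440."""
    return round(12 * math.log2(freq_hz / A4_HZ))


def note_name(freq_hz):
    """Name of the nearest note; the octave is deliberately ignored."""
    return NOTE_NAMES[semitones_from_a4(freq_hz) % 12]


def frequency(semitones):
    """Frequency of the note `semitones` steps above (or below) A 440."""
    return A4_HZ * 2 ** (semitones / 12)


# Halving or doubling the frequency keeps the same note name.
for f in (220.0, 440.0, 880.0):
    print(f, note_name(f))

# Three semitones above A is C.
print(round(frequency(3), 2))
```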

While notes related by a power of 2 are separated by an octave, frequencies that
are integer multiples of the original are known as harmonics. Some harmonics of
the fundamental (the 2nd, 4th, 8th and so on) therefore coincide with higher
octaves of the original note. These extra components often appear with a reduced
volume in the sound, although this is not always the case. Harmonics add
richness to a musical note; unfortunately this richness adds complexity when we
wish to find the original, fundamental frequency of the note in question.
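To make the distinction between harmonics and octaves concrete, the sketch below lists the first eight harmonics of a 110Hz A and marks those that are also octaves of it. It assumes ideal harmonics at exact integer multiples, which a real string only approximates; the function names are hypothetical.

```python
import math


def harmonics(fundamental_hz, count):
    """First `count` harmonics: integer multiples of the fundamental."""
    return [fundamental_hz * k for k in range(1, count + 1)]


def is_octave_of(freq_hz, fundamental_hz):
    """True when freq_hz is the fundamental multiplied by a power of two."""
    exponent = math.log2(freq_hz / fundamental_hz)
    return abs(exponent - round(exponent)) < 1e-9


for h in harmonics(110.0, 8):
    label = "octave" if is_octave_of(h, 110.0) else "other harmonic"
    print(h, label)
```

Only the 1st, 2nd, 4th and 8th multiples are flagged as octaves; the rest fall on other notes, which is part of what makes the fundamental harder to pick out.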

You can get an idea of the difference harmonics make by comparing the sound of a
piano note to the equivalent on a guitar. The guitar sounds warmer and there is
a certain quality to the sound that tells us it is not a pure note. We will look
at this in more detail shortly; for now, take a look at the graph below.



[Figure: four waveform panels: Guitar, full waveform; Piano, full waveform; Guitar, waveform detail; Piano, waveform detail.]

Comparing Middle C on a guitar & Piano

The two waveforms have quite a similar appearance when viewed in full. You can
see the trailing off as the note becomes quieter with time. Just from the top
two graphs we can see that the guitar falls off into silence much more quickly
than the piano. This is known as its decay rate. What is more interesting,
however, is the detail in the waveforms. The piano note has a smooth, constant
appearance that is almost sine-like. The guitar note, whilst sharing the same
fundamental frequency, is noisy and full of other components. These are the
harmonics that were mentioned earlier.

Now that we have some familiarity with the way notes are arranged and how

they are formed in general it is important to see how they translate to the guitar.



The Guitar itself

The guitar can trace its early roots back to 1400 BC, in what would now be
Syria. Evidence suggests a four-string instrument with a curved body was played
by the Hittites, an early race occupying Asia Minor. The Romans and Greeks both
had guitar-like instruments, which had evolved into two distinct families by
around AD 1200. One of these families was known as the guitarra Latina (Latin
guitar) and, with its single sound hole and narrow neck, closely resembles the
modern-day acoustic guitar.

It was at the turn of the 19th century, however, that the guitar evolved into
the familiar six-string form of today; Antonio Torres Jurado is widely regarded
as having made the changes that resulted in what we would now call a guitar.

During the 1930s Rickenbacker started to produce early electric guitars using
tungsten in the pickups. The solid-body form that is common today was pioneered
by Les Paul in the early 1940s. With no resonating air spaces, the sound is
produced entirely by the strings' vibration over the pickups.

The guitar shown above is a BC Rich Warlock and, despite the body's shape, it
has all the features of a modern electric guitar. Starting from the left, the
first thing to note is the set of tuning pegs. These hold the strings tight and
can be adjusted as needed to provide tuning. Winding a peg tighter increases the
tension in the string, producing a higher-pitched note.

The neck is the most important part of any guitar. It is made from a single
piece of wood and then divided up into smaller sections by frets. Frets are
metal inserts, placed into the neck, that mark the boundaries between semitones.
Guitars normally have between 22 and 24 frets; the BC Rich Warlock shown here
has 24, allowing for a full range of four octaves (from the open low E string to
the 24th fret of the high E string). As you move down the neck towards the body,
the distance between consecutive frets decreases, although the ratio of the
distances from consecutive frets to the bridge remains constant
($\sqrt[12]{2} \approx 1.059$). This is due to the equal temperament of the
frets (i.e., the octave is divided into equal frequency ratios).
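The constant ratio described above can be checked numerically. This is a minimal sketch assuming ideal equal-tempered fret placement; the 648mm scale length is a hypothetical example value and the function names are illustrative only.

```python
SCALE_LENGTH_MM = 648.0  # nut-to-bridge distance; a hypothetical example value


def fret_to_bridge(fret, scale_length=SCALE_LENGTH_MM):
    """Distance from the given fret to the bridge under equal temperament."""
    return scale_length / 2 ** (fret / 12)


def fret_spacing(fret, scale_length=SCALE_LENGTH_MM):
    """Gap between the previous fret and this one (fret 0 is the nut)."""
    return fret_to_bridge(fret - 1, scale_length) - fret_to_bridge(fret, scale_length)


# Fret number, distance to bridge, gap from the previous fret (all in mm).
for n in (1, 12, 24):
    print(n, round(fret_to_bridge(n), 1), round(fret_spacing(n), 1))
```

Fret 12 sits at half the scale length and fret 24 at a quarter, while the gaps between neighbouring frets shrink towards the body.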

As there are only twelve note names and adjacent strings are tuned only 4 or 5
semitones apart, notes will obviously overlap on the fret board. This is one of
the main problems that needs to be solved if we are to map played notes to
tablature in a realistic way. The fret board below shows which notes are
actually equivalent for the first 12 frets. Frets 13-24 are an octave higher
than their cousins to the left.

Fret     1   2   3   4   5   6   7   8   9   10  11  12
High E   F   F#  G   G#  A   A#  B   C   C#  D   D#  E
B        C   C#  D   D#  E   F   F#  G   G#  A   A#  B
G        G#  A   A#  B   C   C#  D   D#  E   F   F#  G
D        D#  E   F   F#  G   G#  A   A#  B   C   C#  D
A        A#  B   C   C#  D   D#  E   F   F#  G   G#  A
E        F   F#  G   G#  A   A#  B   C   C#  D   D#  E

Fret board map of standard tuned guitar

Each block of colour represents a set of notes that overlap to repeat. The
variously shaded individual notes show the same note across all strings. The ‘F’
on the high E string, for example, could also be played at fret 6 on the B
string, fret 10 on the G string and so on. This redundancy is good for the
player; it makes it much easier to move around the full tonal range, since the
hand can be kept in a certain position whilst the fingers move around different
strings and nearby frets.

Ease of use for the player complicates things when we want to determine on which
string a certain note was actually fretted. If the ‘F’ from the above paragraph
was played on the B string, how will the software distinguish this from the
other possible strings?
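The size of this ambiguity can be seen by enumerating every position that sounds a given pitch. The sketch below assumes standard tuning and a 24-fret neck, and uses MIDI-style note numbers (open low E = 40) purely as a convenient integer pitch scale; that numbering and the function name are assumptions introduced here, not part of the report's method.

```python
# Open-string pitches in standard tuning as MIDI-style note numbers
# (low E = 40 ... high E = 64). The numbering is an assumption made for
# this sketch; the report itself works with note names and frequencies.
OPEN_STRINGS = [("low E", 40), ("A", 45), ("D", 50), ("G", 55), ("B", 59), ("high E", 64)]
FRET_COUNT = 24


def positions_for(pitch):
    """All (string name, fret) pairs that sound the given pitch."""
    result = []
    for name, open_pitch in OPEN_STRINGS:
        fret = pitch - open_pitch
        if 0 <= fret <= FRET_COUNT:
            result.append((name, fret))
    return result


# The F at fret 1 of the high E string is pitch 64 + 1 = 65.
for string, fret in positions_for(65):
    print(string, fret)
```

For this F the sketch finds five candidate positions, including fret 6 on the B string and fret 10 on the G string, so pitch alone cannot settle where the note belongs on the tablature.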

First things first: Good vibrations

The most important thing to realise about the guitar and indeed any instrument

is that it works on vibrations. For the guitar, it’s all about how the string

vibrates over the pickup and a good player can vary this vibration mid-note to

produce all manner of effects. It is these vibrations that produce the sounds

which we hear, and thus if we can find a way to look at the vibrations in some

detail, then perhaps we can work back to the note on the instrument that

produced it.

There are only 3 ways to change the pitch of a vibrating string, and the
relationships are well understood. The easiest way is simply to change the
length of the string. This is what happens when somebody frets the guitar at a
certain position: the vibrating length is reduced to the distance between that
fret and the bridge. If the string is made longer then it will take longer to
vibrate, therefore reducing the pitch/frequency. In words, a change in frequency
is inversely proportional to the logarithm of the length ratios.

$f - f_0 \propto \log\left(\dfrac{l_0}{l}\right)$

Relationship between change in length and change in frequency

The second method, which will be familiar to any guitar player, is changing the
tension in the string. The tuning pegs are used for this purpose: as they wind
round, the string is tensed more and more. The tenser the string becomes, the
higher the pitch. The actual relationship is that the frequency change is
proportional to the square root of the change in tension.



$\Delta f \propto \sqrt{\Delta T}$

Relationship between change in frequency and change in tension

Finally, the pitch can also be changed by varying the density of the string.

Obviously, a denser, heavier string will vibrate more slowly than a lighter one

given the same energy. The relationship is similar to that for tension only

inverted.

$\Delta f \propto \dfrac{1}{\sqrt{\Delta \rho}}$

Relationship between change in frequency and change in density

From these three equations it can be seen that as we move up the strings

towards high E and up the fret board itself towards fret 24, the gap in frequency

will increase between consecutive frets. This will present a problem when we

come to decide on how many frequencies we need to differentiate between for

accurate transcription.
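The widening gap between consecutive frets can be tabulated directly. This sketch assumes equal temperament and the usual standard-tuning values for the open low E (82.41Hz) and high E (329.63Hz) strings; the function names are hypothetical.

```python
LOW_E_HZ = 82.41    # open low E string, standard tuning
HIGH_E_HZ = 329.63  # open high E string, standard tuning


def fret_frequency(open_hz, fret):
    """Equal-tempered frequency of a string stopped at the given fret."""
    return open_hz * 2 ** (fret / 12)


def gap_to_next_fret(open_hz, fret):
    """Frequency difference (Hz) between `fret` and the fret above it."""
    return fret_frequency(open_hz, fret + 1) - fret_frequency(open_hz, fret)


# String, fret, and gap in Hz to the next fret up.
for label, open_hz in (("low E", LOW_E_HZ), ("high E", HIGH_E_HZ)):
    for fret in (0, 12, 23):
        print(label, fret, round(gap_to_next_fret(open_hz, fret), 1))
```

Under these assumptions the smallest gap, just under 5Hz at the bottom of the low E string, is the one that sets the finest frequency resolution the analysis has to achieve.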

Changing the density of the string just isn't possible while actually playing a
guitar. Instead, by fretting and changing the tension, guitar players can create
new sounds on the fly as they play and open up the tonal range. They do this
with a combination of hammer-ons, pull-offs and bends. The diagrams over the
next few pages describe how these work, along with some examples of the sound
waves produced. In the main section we will look in detail at these wave forms;
for now, consider the next few pages a quick course in basic guitar playing.

Making it sing: Playing the guitar

Aside from basic fretting, most guitarists use a combination of hammer-ons,
pull-offs and bends to create interesting music. It is important to be aware of
how each of these mechanisms works in order to understand what to look out for
in later sections. First, the pull-off.

This is quite simple, and is the opposite of a ‘hammer-on’. The fingers fret each

note to be played; then, in one smooth motion (after the string has been

plucked) each finger is snapped off the fret board in turn. This sounds each note



separately but continuously. The fact that the frequency changes are relatively

continuous (meaning without silence, not continuous in the full sense of

frequency transition) is important, since this is the hallmark of a pull-

off/hammer-on and should give a clue as to their occurrence.

In order to show a pull-off in tab, each separate note is marked on the string line

and a bracket is used to link the two. The image below shows an example of

this.

Example of a 3-note pull-off being performed

The hammer-on is the same only in reverse. So for the above example, the

player would first fret 12, then after the string has been plucked, force his finger

down on fret 14 for a moment and then a further finger would depress fret 15.

The tab is also identical with only the ordering of notes reversed. The wave

form below shows the pull-off example.

In the complete waveform it is possible to see the three notes. As the first note is

played and then pulled off to sound the next one, a drop in volume occurs. The

effect of snapping the finger off the board to sound the last note provides a boost



in volume. The three other plots show 100 samples from within the range of

each note. Whilst the change in frequency is rather difficult to appreciate here, it

is possible to see 3 different periodic waveforms, indicative of three separate

notes.

[Figure: four waveform panels: Pulloff Example, high E, fret 15-14-12; Zoomed detail, first note; Zoomed detail, second note; Zoomed detail, third note.]

Wave forms for the pull off example

With hammer-ons and pull-offs covered, that leaves only the bend. Bends are
quite easy to perform poorly, but when mastered give the music a completely new
feel. The important thing to remember with bends is that, unlike pull-offs for
example, the pitch changes continuously until the target note is sounded. Bends
also tend to vary in how long they are held: some are quick, lasting only a few
milliseconds, while others can be drawn out for perhaps 30 seconds. This will
present its own set of problems, to be discussed later.



Example of a 1-step bend being performed

[Figure: G string, fret 5, bent 1 step. Three panels of amplitude against sample number: the complete waveform, detail of the C note and detail of the D note.]

Wave forms for the bend example

This is a fairly normal bend, lasting around 4 seconds. The decay rate of the
guitar is apparent: after 1 second the signal quickly fades out. The two
subplots show zoomed regions. The first is the C note (G string fretted at 5),
which occurs when the string is first struck. At around 3 seconds the D note
(equivalent to the G string fretted at 7) can be heard. Again, much like the
pull-off example, it can be difficult to see the difference between the two
waveforms. Later, after moving to the frequency domain, the differences will be
much more apparent.

Finally, an example of a chord; here a G is played. For this particular chord
all strings are struck. There are 3 main notes that make up the sound of the G
chord. The lowest-pitched note is G, which is where the name comes from
(compare the note name to fret 3, low E string, on the fret board map shown
previously).


[Figure: G chord. Three panels of amplitude against sample number: the full waveform, a detail and a further detail.]

Wave form of the G-Chord example

The waveform is extremely complex, being made up of numerous notes (bear in
mind that all strings were struck, and open strings sound at their tuned
pitch). The number of notes and harmonics present gives this sound a warm
feeling; listening to it, it is very obvious that this is more than just a
single note. Determining chords will pose a particularly tricky problem.

Timing in Music & Tablature

Musicians tend not to think in terms of ‘play this note for x seconds’. Instead
they refer to the time signature of the piece and the particular type of note
to be played, such as a quarter note or a semibreve. A time signature of 4/4
means that each bar has 4 beats, with the bottom number 4 meaning that
‘quarter’ notes are assigned 1 beat. A signature of 6/8 would mean a bar has 6
beats, and each ‘eighth’ note has a value of 1 beat. A bar is simply one
section or measure of music. Different time signatures give a different quality
to the music. Most western music makes use of 4/4 time; things like waltzes
often use 3/4 time to create the ‘la-ta-taa, la-ta-taa’ feel.

To give an idea of the speed of a piece, it is accompanied by its beats per
minute (bpm) value. Common values for guitar pieces are in the order of
80-160bpm. The actual number of notes played per second depends on the
information in the time signature. If it says to use quarter notes and play at
120bpm, this equals a note rate of 2 per second (120/60 = 2 beats per second
and, with each quarter note lasting one beat, one note every half second).
World famous speed guitarists such as Steve Vai can play around 28 notes per
second, far beyond the realm of most ordinary guitarists. In general, a note
rate of around 4-8 per second is more likely for most players after
considerable practice.
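The tempo arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the report itself works in Matlab, and the function name `notes_per_second` and its `notes_per_beat` parameter are hypothetical names introduced here.

```python
def notes_per_second(bpm, notes_per_beat=1.0):
    """Convert a tempo in beats per minute into a note rate per second."""
    beats_per_second = bpm / 60.0
    return beats_per_second * notes_per_beat


# Quarter notes in 4/4 at 120 bpm: one note per beat.
quarter_rate = notes_per_second(120)
# Eighth notes at the same tempo: two notes per beat.
eighth_rate = notes_per_second(120, notes_per_beat=2)

print(quarter_rate, 1.0 / quarter_rate)  # notes per second, seconds per note
print(eighth_rate)
```

At 120bpm this gives 2 quarter notes per second (one every half second), and 4 per second if eighth notes are played instead.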

When it comes to tablature, timing information is often scarce, owing to the
limited nature of the medium. ‘Tab’ sheets are often no more than the list of
notes in the correct order; the player is normally expected to be playing along
with a song and therefore to know the correct timing of the notes. Professional
tablature books will normally include the actual score above the tab itself.
Without it there would be no way of knowing if a note was a quarter beat, a
half beat and so on. For the purposes of this project, the only timing
information required will be getting the notes in the correct order.

Looking for pitch: Frequency Analysis

Having looked at the guitar itself and some background regarding musical notes

it is time to start looking at ways to examine the frequency (and hence pitch)

content of a signal. Immediately, Fourier analysis comes to mind. A fairly

simplistic overview of the methods will now be given.

In 1822 Joseph Fourier published his ‘Théorie analytique de la chaleur’. In
this text he claimed that any function (continuous or discontinuous) of a
variable could be represented by a summation of sines, each of a multiple of
the original variable. Johann Dirichlet showed (under restrictions) that this
was not wholly true; however, Fourier’s real genius was to recognise that some
discontinuous functions could be represented by an infinite summation of a
series.

By discontinuous you can imagine a function that is non-zero only for a certain

period of time. It is aperiodic. The signals from the guitar obviously fit this

profile. There are 4 main families of the Fourier transform. They are:

Fourier Series

Continuous Fourier transform

Discrete Fourier transform

Discrete-time Fourier transform

The Fourier series and Continuous transform both deal with signals that are

defined for all time t. In fact, the Continuous transform is a generalization of the

Series, extending it beyond solely periodic functions over infinite time t. The

Discrete transforms deal with signals which have been quantized in time. This

means that the signal itself is only defined at certain times t. When signals are

represented within a computer they cannot be infinite in length for obvious

reasons (RAM/storage availability). For this reason, the ‘DFT’ and ‘DTFT’ are

used on computer systems. The difference between the two stems from how the

signal is treated regarding its periodicity. In the case of the DTFT it can be

thought of as applying the Continuous Fourier transform to a set of discrete

data which is aperiodic. If the non-zero part of the signal is repeated over an

infinite time and the transform is taken the DFT is the result. The DTFT will

have a continuous frequency domain representation whilst the DFT will result

in a discrete frequency representation. This also leads to the notion of the DFT

being seen as a sampled version of the DTFT.

Imagine a signal made up of a cosine wave of 500Hz, amplitude 1 unit. If this
signal is sampled at a rate of 2000Hz (i.e. every 0.5ms the amplitude of the
signal is measured) and we take, say, 100 samples, we will end up with a rough
cosine curve made up of 100 data points. The DFT is defined as follows:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-\frac{2\pi i}{N} k n}, \qquad k = 0, \ldots, N-1$$

The DFT equation

Here $X_k$ are the complex coefficients that represent the frequency content
of the signal in $x_n$. This can be related to a summation of sinusoids using
Euler’s formula:

$$e^{i\theta} = \cos(\theta) + i\sin(\theta)$$

Euler’s Formula

Using the DFT directly to calculate the $X_k$ values requires $O(N^2)$
arithmetic operations. The Fast Fourier Transform (FFT) algorithm takes
advantage of redundancy in the DFT calculations to reduce this to
$O(N \log N)$. It is not necessary to delve into the inner workings of the FFT,
but it is useful to know that when Matlab or other applications work on data,
they will make use of the FFT rather than the direct DFT. The benefit is the
reduction in the number of operations from proportional to $N^2$ to
proportional to $N \log N$, making real-time and large vector analysis possible
on modern computers.

Returning to the signal mentioned earlier, when Matlab takes the FFT of the
data a clear spike is seen at the point 0.25 on the normalized frequency axis.
Normalized frequency simply means that the frequency scale has been divided by
the sampling frequency. It is important to sample at a rate no less than twice
the highest frequency in the signal, otherwise aliasing will occur. Rather than
go into the sampling theorem here, which would detract from the discussion, see
the appendix for notes on sampling rates.


[Figure: two panels. Top: 100 samples of cos(2π·500t), sampled at 2KHz, against sample number. Bottom: spectrum of the above signal, against normalised frequency.]

An example of the FFT on time domain data

The spectrum of the signal (that is, the magnitude of the complex FFT points)
shows a mirror image about the frequency=0.5 point. This is because the signal
we are dealing with is composed of real numbers. When the FFT is taken, the
complex numbers returned either side of this point are conjugates of each
other, giving the mirror image at half the sampling rate. As long as the signal
is sampled at no less than twice its highest frequency, the points between
frequency=0 and frequency=0.5 will be the true frequency data. Converting
between normalized and actual frequencies is very easy: simply multiply by the
sampling rate. In this case the spike occurs at 0.25, hence 0.25*2000=500Hz,
which was the frequency of the time signal to start with.
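The 500Hz example can be reproduced by evaluating the DFT definition directly. The sketch below is an assumption-laden illustration in Python, not the Matlab used for the plots in this report: `dft` is a hypothetical helper that implements the O(N^2) sum term by term, which is adequate for 100 points but is not how an FFT library works internally.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


fs = 2000.0      # sampling rate in Hz
n_samples = 100  # number of samples taken
signal = [math.cos(2 * math.pi * 500.0 * n / fs) for n in range(n_samples)]

magnitude = [abs(c) for c in dft(signal)]

# Only bins up to half the sampling rate carry independent information.
half = n_samples // 2
peak_bin = max(range(half + 1), key=lambda k: magnitude[k])
normalised = peak_bin / n_samples   # position on the normalised axis
peak_hz = normalised * fs           # convert back to Hz

print(peak_bin, normalised, peak_hz)
print(magnitude[peak_bin], magnitude[n_samples - peak_bin])  # mirrored pair
```

The peak falls at normalised frequency 0.25 (500Hz), and the bin mirrored about 0.5 has the same magnitude, which is the conjugate symmetry described above.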

The real power of the FFT comes from being able to determine the frequency

content of a complex time signal, including each component’s amplitude. To

see this, consider a signal that is defined like so:

$$x(t) = 2\cos(2\pi \cdot 100t) + 5\cos(2\pi \cdot 250t) + \cos(2\pi \cdot 600t)$$

Looking at the signal in the time domain, it is extremely difficult to
determine visually what the original function was. There is an obvious nesting
of periodicity, but without examining it further it would be hard to tell that
it was a summation of 3 cosines of different frequencies.

[Figure: two panels. Top: 100 samples of 2cos(2π·100t)+5cos(2π·250t)+cos(2π·600t), sampled at 2KHz. Bottom: spectrum of the above signal, using a 1024 point FFT.]

The spectrum plot however, as calculated using quite a large FFT window size,

shows clearly 3 distinct spikes, separate from the noise. Indeed, from this it is

possible to visually work out the original function. The heights of the spikes are

related to the amplitude of the wave from the time domain. Here you can see

that the strongest component is a signal at around 250Hz (0.125*2000), which is

about 2.5 times stronger than a signal at 100Hz (0.05*2000) and 5 times

stronger than a signal at around 600Hz (0.3*2000).

The many other small components that have appeared are due to what is known as
leakage. Leakage arises because only a finite stretch of the signal (here 100
samples) has been analysed, and it becomes clearly visible when an FFT size
greater than the number of samples in the data set is chosen. When Matlab runs
the FFT on the data it first pads it out with zeros. Padding does not add any
information; you can think of the padded data as the original signal multiplied
by a rectangular ‘window’ that is equal to one for the length of the data set
and zero up to the size of the FFT. The diagram below will help make this
clearer.

[Figure: three panels. The original signal, defined for "all" t; the rectangular window, equal to 1 for our sample size of 100; the data after windowing is applied.]

Rectangular windowing effect of using a FFT size greater than the sample size

The extremely sharp cut-off is what causes the leakage in the spectrum. Sharp

corners are a hallmark of high frequency signals. Intuitively this makes sense;

sharp corners are hard edged, unlike the soft curves of low frequency waves like

ripples on a pond. The leakage in this example is not that much of a problem as

you can still clearly see the three main spikes. The effect of this leakage can be

reduced however if a windowing function other than a rectangular one is used.

The following plot shows an example of a Hamming window of length 100.


[Figure: 100 point Hamming window, amplitude (0 to 1) against sample number.]

A 100 Point Hamming window

Window functions such as the Hamming, Hann & Blackman types have

smooth cut-offs, much like a sine or cosine wave. This smooth change down to

zero of the signal lowers the effect of the leakage compared to a simple

rectangular cut-off. The FFT of the signal before and after a Hamming window

was applied is compared below.


[Figure: four panels. Original signal; 1024 point FFT of original; windowed signal; 1024 point FFT of windowed signal.]

The effect of a Hamming window on a large size FFT spectrum

The effect is rather dramatic; the spectrum of the windowed signal is much
cleaner and better defined than its non-windowed cousin. A Hamming window is a
popular choice in the signal processing community due to its simplicity and
effectiveness. It is defined as:

$$w[n] = 0.54 - 0.46\cos\left(\frac{2\pi n}{N}\right)$$

Definition of the ‘Hamming’ window

Unless otherwise stated, any reference to applying a window to any data refers

to this particular definition.
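The reduction in leakage can be illustrated numerically on the three-cosine signal. The following is a minimal Python sketch under stated assumptions, not the Matlab code behind the plots: `magnitude_at` is a hypothetical helper that evaluates the zero-padded spectrum at a single frequency, and the 700-1000Hz probe region is chosen here only because the signal has no real component there, so anything found in it is leakage.

```python
import cmath
import math

FS = 2000.0  # sampling rate in Hz
N = 100      # number of samples in the data set


def three_cosines(n):
    """Sample n of 2cos(2*pi*100t) + 5cos(2*pi*250t) + cos(2*pi*600t)."""
    t = n / FS
    return (2 * math.cos(2 * math.pi * 100 * t)
            + 5 * math.cos(2 * math.pi * 250 * t)
            + math.cos(2 * math.pi * 600 * t))


def hamming(length):
    """Hamming window, w[n] = 0.54 - 0.46*cos(2*pi*n/N)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / length)
            for n in range(length)]


def magnitude_at(samples, freq_hz):
    """Magnitude of the zero-padded spectrum at one frequency (hypothetical helper)."""
    return abs(sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS)
                   for n, s in enumerate(samples)))


raw = [three_cosines(n) for n in range(N)]
windowed = [r * w for r, w in zip(raw, hamming(N))]

# Probe a region that contains none of the three components.
probe = range(700, 1001, 5)
leak_rect = max(magnitude_at(raw, f) for f in probe)
leak_hamming = max(magnitude_at(windowed, f) for f in probe)

print(leak_rect, leak_hamming)
```

The largest spurious value in the probe region is noticeably smaller for the Hamming-windowed data than for the plain (rectangular) data. The price is that the window also scales down and broadens the genuine peaks, which is consistent with the lower vertical scale on the windowed plot.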

Limitations of this method: Time resolution

So far the discussion has put examining the amplitude spectrum in a positive
light. There is however one major drawback to this method, and that is time
resolution. Taking the FFT of a set of data will give plenty of information
regarding the frequencies which appear but absolutely nothing about the time at
which these components manifest themselves. In a constantly changing signal
(such as somebody playing a selection of notes on a guitar) this is an obvious
drawback. It is of little use to simply list each note that has been played
with no indication of order.

This is where knowledge of windowing functions becomes useful. If the data

was divided up into small chunks, each chunk being analysed individually for

frequency information, we could in theory determine what frequencies appear

at what times.

The Short-Time Fourier Transform (STFT) does exactly this, although further

discussion on this is left to the design section.

Other potential methods of determining frequency

Whilst the FFT does an excellent job of extracting the frequency information, it

is not the only way to accomplish this. A method which deals solely in the time

domain and can determine the fundamental frequency with good results is the

autocorrelation algorithm.

Autocorrelation of a signal is quite simple. Using a sampling rate at least
twice the maximum signal frequency (again, due to the sampling theorem), a copy
of the signal is shifted by a number of samples and the absolute difference
between the copy and the original is noted for each shift. (Strictly, classical
autocorrelation sums products of the signal and its shifted copy and looks for
a maximum; the difference-based variant described here looks for a minimum.)

When the signals are at their most different the absolute difference will be
high, but as the copy gets close to lining up with the original signal the
difference will rapidly approach zero. The shift at which the first minimum
occurs is equal to the period of the fundamental, so the fundamental frequency
is the sampling rate divided by that shift.

Whilst this method sounds good regarding the nature of the guitar signals
(remember, a single note will contain many harmonics as well as the
fundamental, which the note name is based on), making use of the
autocorrelation function in practice is computationally expensive. For each
block of the signal a large number of multiplications need to be done, then the
first derivative of the autocorrelation signal must be taken to determine the
minimum. For a large number of blocks, computing this much data could be
prohibitively costly.
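A shift-and-compare pitch estimator of the kind described can be sketched in a few lines of Python. This is an illustration under assumptions rather than the method used in this project: the function name `estimate_pitch` and the 80-1400Hz search range are hypothetical, and the sketch simply takes the lag with the smallest average absolute difference inside that range instead of locating the first minimum via a derivative. For higher notes, where several periods fit inside the search range, that shortcut could lock on to a multiple of the true period.

```python
import math


def estimate_pitch(samples, fs, f_min=80.0, f_max=1400.0):
    """Estimate the fundamental from the lag with the smallest mean absolute difference."""
    min_lag = int(fs / f_max)          # shortest period considered, in samples
    max_lag = int(fs / f_min)          # longest period considered, in samples
    span = len(samples) - max_lag      # samples compared at every lag
    best_lag, best_score = None, float("inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(abs(samples[n] - samples[n + lag]) for n in range(span)) / span
        if score < best_score:
            best_lag, best_score = lag, score
    return fs / best_lag


fs = 22050.0
# Synthetic 'note': a 110 Hz fundamental with two weaker harmonics.
note = [
    math.sin(2 * math.pi * 110 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 220 * n / fs)
    + 0.25 * math.sin(2 * math.pi * 330 * n / fs)
    for n in range(1024)
]

estimate = estimate_pitch(note, fs)
print(estimate)
```

Because lags are whole numbers of samples, the estimate is only accurate to within the spacing between adjacent lags. The nested loops also make the cost per block visible: every candidate lag requires a pass over the block.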



Gareth Middleton, a U.S. researcher, has developed a method which he calls

‘FAST Autocorrelation’. It makes use of temporal redundancy in frequency

content, using previous window sizes to base future calculations on. Speed

improvements of over 70% are reported possible. Despite autocorrelation being

potentially useful for determining specific notes, it will at most return one

frequency per block examined. If a chord has been struck, for example a G

chord like earlier, it would only return the root note ‘G’ and give no indication

of the other, higher, major frequency components that go into making the chord

itself.

Wavelets are another area of interest. In their current form they are a recent
development (circa the 1980s) and improve on the time-frequency resolution of
the STFT. Wavelets are quite complex in scope compared to Fourier methods. As
much of the research and thinking I have applied to this project relates to
Fourier methods, wavelets have been left as part of the potential future
direction I wish to take this project. Further details on them and their
benefits are left to the ‘Future Work’ section after the conclusions.

A note on the phase information in signals

The previous discussions on the use of Fourier methods to provide information

on the frequency content of the signal make use of only the amplitude

information from the complex result of the FFT. This means so far the phase

information in the signal has not been used.

Human beings cannot differentiate between one signal and another with

inverted phase; they are perceptually the same sound. The phase information

itself is of no use in trying to determine the pitch of a note. The amplitude

component is the main interest here as it is the one that clearly marks the

occurrence of certain notes/frequencies. Below is a diagram showing the

amplitude and phase spectra for the 3 cosine signal used earlier to demonstrate

the FFT and windowing.

Clearly, the amplitude spectrum shows the most information as regards the

nature of the signal and in a clear way compared to the phase spectrum.


[Figure: two panels against normalised frequency. Top: amplitude spectrum. Bottom: phase spectrum.]

Comparing the usefulness of the Phase and Amplitude spectra of a complex signal

Whilst there are notable changes when a frequency component makes itself

present, trying to do anything of use using the information contained in the

phase spectrum would be difficult to say the least. For this reason any reference

to ‘spectrum’ in this report refers to the amplitude (magnitude or absolute value)

or power spectrums of the signal.



Chapter 5: Design

The STFT & time resolution

As stated in the background theory chapter, the STFT provides a means to

examine frequency components at points in time throughout the signal. To do

this the signal is first cut up into blocks. Each block then has the FFT applied to

it to give the frequency information and the results are concatenated together to

give the overall time-frequency spectrum.

Since we are applying the Fourier transform to a windowed version of the signal

and then moving this window along the time axis, the STFT results in a 2-

dimensional representation of the source signal. For the discrete case, which is

what we will be dealing with in Matlab, the STFT can be defined as:

$$\mathrm{STFT}\{x[n]\} \equiv X(m, k) = \sum_{n} x[n] \, w[n-m] \, e^{-\frac{2\pi i}{N} k n}$$

The STFT definition

Where ‘w[n - m]’ refers to the window function (e.g. Hamming), centred around
zero. This equation can be thought of as moving the centre of the window to the
point we are interested in, applying a window of fixed length to the signal and
then taking the DFT of this windowed section. Each point ‘m’ in X will be
associated with its own spectrum, given by its values for k.
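The block-by-block procedure can be sketched as follows. This is a minimal Python illustration under assumptions, not the Matlab implementation of this project: `stft_magnitude` is a hypothetical name, blocks do not overlap, the DFT is evaluated directly rather than with an FFT, and each block is transformed using its own local sample index, which changes only the phase and not the magnitudes used here.

```python
import cmath
import math


def hamming(length):
    """Hamming window, w[n] = 0.54 - 0.46*cos(2*pi*n/N)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / length)
            for n in range(length)]


def stft_magnitude(x, window_len, hop):
    """Return one magnitude spectrum (bins 0..window_len/2) per block of x."""
    w = hamming(window_len)
    frames = []
    for start in range(0, len(x) - window_len + 1, hop):
        block = [x[start + n] * w[n] for n in range(window_len)]
        frames.append([
            abs(sum(block[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                    for n in range(window_len)))
            for k in range(window_len // 2 + 1)
        ])
    return frames


fs = 2000.0
window_len = 100


def tone(freq_hz, count):
    """count samples of a unit cosine at freq_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(count)]


# A 'note change': 200 Hz for 0.1 s, then 500 Hz for 0.1 s.
signal = tone(200.0, 200) + tone(500.0, 200)

frames = stft_magnitude(signal, window_len, hop=window_len)
peaks_hz = []
for frame in frames:
    peak_bin = max(range(len(frame)), key=frame.__getitem__)
    peaks_hz.append(peak_bin * fs / window_len)

print(peaks_hz)
```

Each block reports the frequency that is strongest during its own stretch of time, so the change from 200Hz to 500Hz appears between the second and third blocks, which a single FFT over the whole signal could not show.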

The size of the window determines the resolution of the STFT and for variable

frequency signals there is a trade-off between the degree of frequency resolution

and that of time. Designing a STFT for the purpose of the guitar signals will be

a delicate balance between the two.

Firstly a time and frequency resolution needs to be settled on. Consulting

Appendix 1, a list of the frequencies for each note (on a standard tuned guitar)

shows that the very minimum difference in frequency between two notes is

4.9Hz. Ideally, if we could resolve to a frequency of around half of this it would

mean we could determine each note without much difficulty.


If our window is length N and our signal has been sampled at $f_s$, the Fourier
transform will produce N coefficients. As the data is real (it will be a vector
of real numbers; audio signals do not contain complex components) the spectrum
will be a mirror image about the Nyquist frequency, $f_s/2$, therefore only
$N/2$ of the coefficients will be of any use. These $N/2$ coefficients are
associated with frequencies running from 0 to $f_s/2$.

Putting this together means that coefficients in the frequency spectrum will be
spaced by $\frac{f_s/2}{(N/2) - 1}$ Hz. As the window size increases and becomes
much larger than 1 this approximates with little error to $f_s/N$ Hz between
coefficients.

The practical effect of this is that to increase frequency resolution the
sampling rate could be decreased, meaning fewer samples per second, so a window
of the same size is applied over a greater time and time resolution is reduced.
Increasing the size of the window N obviously has the same effect.

As noted a little earlier, a frequency resolution of around 2.5Hz would be
usable across the entire range of the guitar. Assuming a sampling rate of
22050Hz, which is very common amongst audio recording equipment on any
platform, gives a value of N of 22050/2.5 = 8820 samples. Is this any good?

For an average guitar player, playing at a beat rate of 120bpm and using
quarter notes is fairly routine. This would correspond to 2 notes played every
second. The window size of 8820 samples with the same sampling rate of 22050Hz
would give a time resolution of 0.4 seconds. So if we are happy that the
signals we want to process will be 120bpm or below and use quarter notes, this
value should be good enough. For more demanding music however, such as thrash
metal with beat rates commonly above 200bpm using eighth notes, we would need a
time resolution of around 0.04 seconds. Using the same sampling rate of
22050Hz, the new window size would be 0.04*22050 = 882 samples, giving a
frequency resolution of 25Hz. Looking at the table of note frequencies in
Appendix 1, we can see that we could only start to reliably detect notes after
A above middle C (440Hz), which on the guitar would be fret 5 on the high E
string and above (obviously the higher frets on the strings below have the same
frequency notes; see the fret board map in the background theory chapter).

So for a window size of around 800 samples we should be able to work out high
frequency solos, whilst increasing it to around 8000 samples means better
results for slower, more acoustic style music.

The graphs below show the effect on frequency and time resolution, based on

the window size, for 3 of the most common sample rates used in recording (44.1

KHz, 22.05 KHz & 11.025 KHz).

The effect of the window size for frequency and time resolution for common sample rates

The graph for frequency shows a rapid drop at around a window size of 100

samples and then slow rate of further decrease as the window length progresses

beyond 1,000. The following table summarises the time & frequency resolution

for window lengths in the thousands based on using a sample rate of 22.05

KHz.

Window Length (N)    Frequency Resolution (Hz)    Time Resolution (Sec)
1000                 22.05                        0.045351
2000                 11.025                       0.090703
3000                 7.35                         0.136054
4000                 5.5125                       0.181406
5000                 4.41                         0.226757
6000                 3.675                        0.272109
7000                 3.15                         0.31746
8000                 2.75625                      0.362812
9000                 2.45                         0.408163
10000                2.205                        0.453515

Resolution compared to various window sizes for a 22.05 KHz sampling rate
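The figures in the table follow from two divisions, which the short Python sketch below reproduces. The function name `resolution` is a hypothetical choice for illustration, and the frequency spacing uses the large-window approximation (sampling rate divided by window length) given earlier.

```python
def resolution(window_len, fs):
    """Return (frequency resolution in Hz, time resolution in seconds)."""
    return fs / window_len, window_len / fs


fs = 22050.0
for window_len in (882, 1000, 8192, 8820, 10000):
    df, dt = resolution(window_len, fs)
    print(f"N={window_len:5d}  df={df:8.4f} Hz  dt={dt:.6f} s")
```

Halving one resolution always doubles the other, because their product is fixed at 1 for a given window: the frequency spacing multiplied by the window duration.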

Spectrograms

The STFT will return information on both the phase and amplitude of the

frequency components at each time point that is measured. As stated earlier in

the background theory section, the phase information is of little use in working

out the frequencies involved in a signal. Instead the amplitude spectrum was

suggested as a means to examine the content of each time slice.

Taking the magnitude squared of the STFT results in the power spectrum for

each time slice, when the result is combined into one graph it becomes known

as a spectrogram.

$$\mathrm{spectrogram}(x) = \left|\mathrm{STFT}(x)\right|^2$$

Definition of the spectrogram

When spectrograms, and indeed STFTs, are being calculated in practice on a set
of data it is normal to overlap the windows by a certain amount and average the
results.

The spectrogram for the pull-off example from the background theory section is
shown below. It was generated using Spectrogram 14 from Visualization Software
LLC. Time and frequency occupy the x and y axes respectively. To show the power
in a certain frequency band a colour is used; in this case the darker the
colour, the stronger that frequency at that particular time.

Spectrogram of the pull-off example

Examining the spectrogram of the pull-off will be a good start to seeing what
is possible with regard to determining note names for time slices, as the rapid
change and generally closely spaced pitches will stretch the limit of the STFT
resolution.

This spectrogram was calculated using a window size of 8192 points ($2^{13}$),
which gives a frequency resolution of around 2.7Hz, around the range which was
decided on earlier during the discussion on the STFT resolution. The FFT works
fastest if its length is a power of 2, hence the choice of 8192 points rather
than the 8820 calculated as ‘exact’ previously.

The two red lines mark 784 Hz and 659 Hz; these are the frequencies of the
notes at frets 15 and 12 on the high E string. The note at fret 14 sounds at
740 Hz, which is just below the top red line.
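The three marker frequencies can be checked against the equal-temperament rule that each fret raises the pitch by one semitone, a factor of 2^(1/12). The Python sketch below assumes standard tuning with the open high E string at 329.63Hz; that value and the function name `fret_frequency` are assumptions introduced for this illustration, not figures taken from Appendix 1.

```python
def fret_frequency(open_string_hz, fret):
    """Equal-temperament pitch: each fret raises the pitch by one semitone."""
    return open_string_hz * 2.0 ** (fret / 12.0)


HIGH_E_HZ = 329.63  # open high E string, assumed standard tuning

for fret in (12, 14, 15):
    print(fret, round(fret_frequency(HIGH_E_HZ, fret), 1))
```

Fret 12 is exactly one octave above the open string, so its frequency is double; frets 14 and 15 come out at roughly 740Hz and 784Hz, matching the values quoted above.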

From the spectrogram, it can be seen that there is a limited amount of banding,
which represents the change in note pitch. This can be seen at approximately
times 0.14s and 0.24s. Examining the power spectrum within each of these bands
gives encouraging results.

Power spectrum at 0.1seconds (first note)

Power spectrum at 0.2seconds (second note)



Power spectrum at 0.3seconds (third note)

The left red marker lines up quite closely to the peak of the first major spike.

The high powered, higher frequency components which appear above 1.5 KHz

are harmonics of the note. The guitar itself, if it was to play only pure notes,

would max out in frequency at around 1.4 KHz. It would be desirable to either

remove or in some way make use of these harmonics when it comes to

examining the spectra for each time slice. A method which is perfect for this is

the Harmonic Product Spectrum. Before examining this, a look at the other end of

the guitar, the lower frequency notes, should give an idea how well the STFT

method is holding up from one extreme to another given the set resolution.



Spectrogram for notes running from F# to A on the Low E string (87-110Hz)

Here the red lines are the boundaries between the lowest frequency note (F# at

87Hz) and the highest frequency note played (A at 110Hz). Interestingly the

harmonics have more power in them than the fundamental itself. It can be seen

that one note is played roughly every second with a slight pause in between.

The strong bands can be seen to move upwards as the plot moves along in time,

corresponding to the increase in pitch of the notes being played. The power

spectrum for the note played during the third second is shown below.

Whilst the fundamental is hard to see, the harmonics from 2f0 to 8f0 are well
defined. Obviously, if we were to try to determine the fundamental from this
plot alone by taking the maximum point, it would return a false result (3f0).
Using the harmonic product spectrum can increase the likelihood that we have
identified the real fundamental, and to a good degree of accuracy.

Power spectrum for the G# (103Hz) note at 3 seconds into the spectrogram

Harmonic Product Spectrum (HPS)

From the background theory it’s known that when a guitar note is struck it will

manifest itself not only as the fundamental frequency (the note’s basic pitch) but

also as a number of harmonics separated at integer multiples of this frequency.

By exploiting the sampling theorem and down sampling repeatedly, the

harmonic components can be used to actually pinpoint the fundamental with

greater precision than the original spectrum alone. I first saw this method in a

paper by Gareth Middleton, on cnx.org and then found it mentioned in other

texts related to pitch analysis.

Firstly the power (or magnitude) spectrum of the windowed block must be

calculated. This is what was done previously when the spectrogram was

obtained. The spectrum is then down sampled N times by integer amounts, with

each down sampled spectrum being stored temporarily. Finally, the spectrums

are multiplied together to give the result.



Graphical description of the harmonic product spectrum

The HPS method works well because, as the spectrum is down sampled, harmonics
at for example 3f0 (3 times the fundamental) will line up with the original
fundamental peak if they are down sampled by a factor of three. A harmonic at
nf0 will line up with the fundamental if it is down sampled by a factor of n.
The strongest point of overlap will be at the fundamental, although nearby
harmonics will also be reinforced. The first major spike will be the
fundamental however, which is of course the result we are interested in.

Down sampling can be thought of as dividing the frequency axis by a factor. To
down sample by 5, for example, you simply keep every fifth sample point from
the spectrum, discard the rest, and then pad the result out with the required
number of zeros to restore the original length.
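A compact way to see the harmonic product spectrum at work is on a made-up spectrum in which the third harmonic is the tallest peak, as it was for the low G# note. The Python sketch below is an illustration under assumptions: `harmonic_product_spectrum` is a hypothetical name, the peak heights are invented, and instead of padding each down sampled copy with zeros it only forms the product over the bins that every copy still covers, which gives the same result in that range.

```python
def harmonic_product_spectrum(magnitude, n_spectra=4):
    """Multiply a spectrum by copies of itself down sampled by 2..n_spectra."""
    usable = len(magnitude) // n_spectra      # bins present in every copy
    hps = []
    for k in range(usable):
        product = 1.0
        for factor in range(1, n_spectra + 1):
            product *= magnitude[k * factor]  # keep every factor-th bin
        hps.append(product)
    return hps


# Invented spectrum: fundamental at bin 10, harmonics at bins 20, 30 and 40,
# sitting on a small noise floor. The third harmonic is the strongest peak.
spectrum = [0.01] * 200
for bin_index, height in ((10, 1.0), (20, 3.0), (30, 5.0), (40, 2.0)):
    spectrum[bin_index] = height

hps = harmonic_product_spectrum(spectrum)

naive_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
fundamental_bin = max(range(len(hps)), key=lambda k: hps[k])
print(naive_bin, fundamental_bin)
```

Taking the maximum of the raw spectrum picks the third harmonic, whereas the maximum of the product lands on the fundamental, because only there do all four copies contribute a peak at once.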

The plot below shows the spectrum for the G# note from above and also 3 down
sampled versions (down sampled by factors of 2, 3 & 4). It is clear that the
higher harmonics from the original have shifted down to the fundamental. The
large number of harmonics present in this note has also reinforced the second &
third harmonic components somewhat.

[Figure: G# note, down sampled and original spectra (original, down sampled by 2, 3 and 4) against normalized frequency. Marked peaks: fundamental frequency (103Hz), second harmonic (206Hz), third harmonic (308Hz).]

Detail of the down sampled and original spectra for G#

If the four spectra are multiplied together the result is quite dramatic.
Comparing this to the original spectrum, the fundamental is clearly visible. To
totally remove the other harmonics would require increasing the number of down
sampled spectra that are combined. At some point however the overall power
after the spectra are multiplied will be reduced to an unusable level. Using 3
down sampled spectra and the original gives good enough results to determine
the note on its own. I found that increasing the number of harmonic components
used beyond 8 began to take quite some time and reduced the resultant spike
height so much that it would be impractical to use.


[Figure: result of the HPS using 3 down sampled versions, against normalised frequency (vertical scale ×10^8). Marked positions: fundamental frequency (103Hz), second harmonic (206Hz), third harmonic (308Hz).]

The harmonic product spectrum for G#

Making use of this plot is quite simple. One possible method to detect the
midpoint of the first peak would be this:

set a ‘lock’ variable to 0

set a threshold variable to some level of power

move through the HPS signal; when the signal level goes beyond the threshold, set the lock to 1

make a note of the current sample index

when the lock is 1 and the signal falls below the threshold, stop looping

make a note of this new sample index

sum the two indices, divide by two and round up/down; the result is the sample number for the midpoint of the first spike

This is quite a simple algorithm and is quite nice in that it does not need to go

through the entire signal, only up to the point it finds the end of the first spike.

However it will need to be modified if chords are to be taken into account.
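The steps listed above can be sketched in Python as follows. This is one possible reading of the method, not the project's actual code: the function name `first_peak_midpoint` is hypothetical, the ‘lock’ is represented by whether a start index has been recorded, the midpoint is rounded down, and the handling of a spike that runs to the end of the data is an added assumption.

```python
def first_peak_midpoint(hps, threshold):
    """Return the index midway across the first run of values above threshold."""
    start = None                         # None plays the role of lock = 0
    for index, level in enumerate(hps):
        if start is None:
            if level > threshold:
                start = index            # lock on: the first spike has begun
        elif level < threshold:
            return (start + index) // 2  # spike has ended, stop looping
    if start is not None:
        return (start + len(hps) - 1) // 2  # spike ran to the end of the data
    return None                          # threshold never exceeded


example = [0, 0, 1, 5, 9, 5, 1, 0, 7, 0]
print(first_peak_midpoint(example, threshold=4))
```

In the example the signal rises above the threshold at index 3 and falls below it at index 6, so the midpoint is index 4, where the spike peaks. The later spike at index 8 is never examined, which is the early exit noted above and also the reason the method would need changing to find the several fundamentals of a chord.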

Combining everything discussed up to this point should give a reasonably
accurate output of the note values in sequential order, the basic idea of a
transcription system. Before looking at the implementation however there are 5
things left to consider in creating a really useful transcription system. These
have already been discussed in a little detail during the background section.
They are:

Identifying the string a note is played on

Identifying if a chord has been played

Identifying a bend

Identifying a hammer-on or pull-off

Most importantly, identifying actual notes rather than ‘time slice notes’

Identifying the String a note is played on

The spectrogram below shows firstly an ‘A’ played on the low E string at fret 5,

then there is some silence followed by an open-A string being struck. The two

notes sound the same and unfortunately looking at the spectrogram there is no

immediate way to tell the two apart.

For an in-tune guitar that is exactly the result that should be expected. Although the strings have different densities, or may even be made of different materials, the fret board and tuning are designed to make the string vibrate at 110Hz regardless of whether it is coiled metal or smooth nylon.

Spectrogram of the same note on different strings (A at 110Hz)



The problem of identifying the correct string is something I have left to future work. In any case it is not a serious problem, since guitar tablature is to some extent an interpreted art. Some people prefer to jump across the fret board from left to right whilst others would rather use their fingers to move up and down. What matters to both is whether they are playing the same ‘note’, and that can be determined accurately by identifying the fundamental.

Identifying a chord

A chord is sounded when several notes are struck at the same time. This means that there will be several fundamental frequencies and a lot of harmonic components. The G-chord from the background theory section will now be analysed to see what can be learned that could help in identifying a chord during a time slice.

Spectrogram of a G chord (root note fundamental: 98Hz)

The spectrogram backs up what is already known. The strength of the components from the root note (remember, a chord is named after its root, which here is also its lowest-pitched note) is quite high across the spectrum, with no immediate way to tell which bands are actually interesting (from a note determination point of view).

This is where using the Harmonic Product Spectrum, rather than just the magnitude or power spectrum, becomes a major advantage.

All that is left within the spectrum are the fundamentals of the notes that make up the chord. The particular type of G-chord that was played made use of all six strings; the last component has disappeared after the spectra were combined. This is not that great a loss however, as the information left still allows us to identify this waveform as a G-chord.

The time slice used is quite small, and we have assumed because of this that the

player is unlikely to be performing numerous notes within one slice. This leads

to the conclusion that if the harmonic product spectrum contains multiple

spikes of frequencies that do not share a common heritage (i.e. they are not

simply harmonics of the fundamental) then a chord has been played. Identifying

the name of the chord is as simple as returning the root (fundamental) note.

Figure panels: G-Chord (windowed), time waveform; G-Chord, amplitude spectrum; downsampled and original spectra; Harmonic Product Spectrum. Labelled peaks in the HPS: root note (G, 98Hz), 2nd note (B, 123Hz), 3rd note (open D, 148Hz), 4th note (open G, 196Hz), 5th note (D, 293Hz).

Applying the Harmonic Product Spectrum to a chord to identify its notes

Incorporating this into the algorithm previously mentioned for identifying the fundamental in a slice is fairly straightforward.

Loop through as before, to determine the fundamental

When this is finished, instead of breaking the loop continue on

Identify the next note; if it is a harmonic then continue on

If the note is not a harmonic of the previously found fundamental, it is a new note. Assume that because there are numerous fundamentals a chord has been struck and set a flag to indicate this.

As an enhancement, rather than simply listing the root note of the chord, perform a lookup (from a database) to return the tablature form of the chord.
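A minimal sketch of the harmonic check might look like the following. The peak frequencies are the ones labelled on the G-chord figure; the tolerance `tol` and the names `isHarmonic`, `newNotes` and `chordFlag` are assumptions made for this illustration.

```matlab
% Peak frequencies (Hz) as labelled on the G-chord HPS figure
peaks = [98 123 148 196 293];
tol = 0.03;                      % assumed tolerance on the harmonic ratio
root = peaks(1);                 % lowest peak is taken as the fundamental
ratio = peaks / root;            % how many times the root each peak is
isHarmonic = abs(ratio - round(ratio)) < tol;   % close to an integer multiple?
newNotes = peaks(~isHarmonic);   % peaks that are not harmonics of the root
chordFlag = ~isempty(newNotes);  % any unrelated fundamental => chord
```

With these values the B (123Hz) and D (148Hz) are reported as new notes and the chord flag is set. Note that 196Hz and 293Hz are treated as harmonics of the 98Hz root, so a test of this kind cannot by itself separate an octave string from a harmonic.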

Identifying a bend

Bends are generally played well by professionals and normally rather less well by beginners and amateurs. Hitting a bend correctly requires intuitively knowing when the sound is correct. Normally a guitarist will have a good idea as to how far to bend the string for the right sound. The spectrogram below shows a bend being played on the G string at fret 5. It is bent up by one step (one fret), which makes the note sound like it is being played at fret 6.

Spectrogram of a one-step bend

The red lines represent the frequency of the two ‘notes’ that are being played.

Looking at the fundamental band it is near impossible to see that anything

much has happened. However, the high frequency harmonics show a definite

curving.



One possible way to identify a bend could be the following:

identify the first fundamental note, as normal

in the ‘bending’ time slices, there will be a fundamental which is in between the two notes. This frequency will not be in a frequency-note pair table, so it could potentially indicate a bend is occurring. Flag this in a variable and store the current note name.

in the next time slice, if the ‘bend_occuring’ variable is set and the fundamental for this window is in the frequency-note table, the bend has finished. Reset the bend_occuring flag.

output a bend based on the difference between the new note and the previous ‘good’ note. So for example if the note now is F and previously it was D#, it would be D# bend 2 steps.

The problem with this approach is that a 2 step bend will pass through the other ‘step’ (the in-between note) which, if the time slice allows, could be detected as the finishing note. Of course, on the next time slice the algorithm will flag another ‘bend_occuring’, giving an output of the form ‘D# bend 1 step, E bend 1 step’ rather than ‘D# bend 2 steps’.
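The bend logic can be sketched as below. This is an illustration under stated assumptions: the per-slice fundamentals in `f` are invented, the frequency-note table is replaced by the equal temperament relation (12 steps per octave, counted from the low E at 82.41Hz), and the tolerance `tol` is a guess.

```matlab
fRef = 82.41;                                   % low E (Hz), from Appendix 1
f = [155.56 155.56 160.0 169.5 174.61 174.61];  % hypothetical per-slice fundamentals
tol = 0.2;                                      % assumed: within 0.2 of a step counts as a table note
semis = 12 * log2(f / fRef);                    % steps above low E (not always whole numbers)
onTable = abs(semis - round(semis)) < tol;      % does the slice sit on a real note?
bend_occuring = 0; lastGood = NaN; bendSteps = [];
for k = 1:length(f)
    if onTable(k)
        if bend_occuring
            % bend has finished: record its size in steps
            bendSteps(end+1) = round(semis(k)) - lastGood;
            bend_occuring = 0;
        end
        lastGood = round(semis(k));             % remember the last 'good' note
    elseif ~isnan(lastGood)
        bend_occuring = 1;                      % in-between frequency: bend in progress
    end
end
```

Here the slices start on D# and settle on F, so a single bend of 2 steps is recorded. If one of the in-between slices had landed close to E (164.81Hz), the same code would report two 1 step bends, which is the weakness described above.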

Identifying hammer-ons / pull-offs

Looking back to the spectrogram for the pull-off example, it can be seen that the signature of one of these moves is a number of notes happening quickly and consecutively without gaps. Of course, if a guitarist is playing at pace, any number of notes played together could be interpreted as hammer-ons or pull-offs.

In general, if two or more notes that are separated by only one or two steps on

the same string have been played within a very short time, it’s very likely that a

hammer-on or pull-off has occurred.

identify the current note as normal

check if the note in the last time slice is within ±2 steps

if it was, a hammer-on / pull-off may have occurred.

Although this may sound like it will work reasonably, the problem with

determining the current string being played makes it too unreliable to be totally

trusted. A guitarist could have been simply playing two nearby notes on

separate strings in quick succession.
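As a rough sketch of the ±2 step check, assuming the notes of consecutive time slices have already been converted to step numbers (the vector `notes` is hypothetical data, counted in steps above the low E):

```matlab
notes = [5 7 7 12 10];                 % hypothetical per-slice notes, in steps above low E
d = abs(diff(notes));                  % step distance between neighbouring slices
maybeLegato = (d >= 1) & (d <= 2);     % 1 or 2 steps apart: possible hammer-on / pull-off
```

A repeated note (distance 0) and a large jump (distance 5) are both rejected. As the text notes, a positive result is only a hint, since the two notes may have been played on different strings.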



Identifying actual notes, not ‘time-slice notes’

A ‘time-slice note’ is just the result of determining the fundamental of that particular block. Of course, a note could be held over a number of time slices and it would be incorrect to simply return a list of notes full of duplicates bunched together. Instead some method is needed to determine whether a note from that time slice is new or merely the same frequency content from a note started earlier. There is a simple way to do this that does not require much computation and should return reasonably accurate ‘real’ notes. Remember, the STFT is limited in its time-frequency resolution, so there is already a trade-off between fully accurate identification of notes and usability over the guitar’s range.

1. as the notes are being decoded, a (global) variable keeps track of the previous note value.

2. it is then compared with the current one, and only if the notes are different will a new note value be output.

Obviously this falls short of ideal if the guitarist keeps playing the same note

over and over. For things like solos however, which have lots of rapidly

changing notes in a short time, it should give a reasonable description of the

original music.
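A minimal sketch of this comparison, using a hypothetical list of time-slice notes (`sliceNotes`) and a local variable in place of the global one:

```matlab
sliceNotes = {'E','E','E','G','G','A','E','E'};   % hypothetical time-slice notes
realNotes = {};
prevNote = '';                                    % no previous note yet
for k = 1:length(sliceNotes)
    if ~strcmp(sliceNotes{k}, prevNote)
        realNotes{end+1} = sliceNotes{k};         % note changed: output it
    end
    prevNote = sliceNotes{k};                     % remember for the next slice
end
```

The eight slices collapse to E, G, A, E. A note that is genuinely struck twice in a row would be merged into one, which is the shortcoming mentioned above.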

A ST-HPS rather than the ST-FT

This subsection title is rather a misnomer. When the signal is to be processed it is common to overlap consecutive positions of the STFT window on the signal and sum the results where they overlap. This helps to improve the result of the STFT, although choosing a good value for the overlap can be subjective.

Figure: a window sliding over the sample data in the direction of increasing time.

Here the light blue area represents the overlap. Simply summing the overlapping elements is acceptable: the windowing function cleans up the spectrum rather than drastically attenuating its sides, so summing only serves to reinforce any major spike already present in multiple overlapping spectra. As the exact values of the spikes are not needed, we can avoid any extra calculation such as averaging over the shaded region. In general an overlap of 25-70% is used for most applications. Depending on the value chosen, the resulting spectrogram will be either very expanded or very dense. After the harmonic product spectrum is taken, however, the spectra will be quite clean and well defined for each sample unless there is silence or distortion, so the issue of overlap is not really that critical to whether or not the fundamentals will be found.
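To make the overlap concrete, the sketch below works out where each window would start for a given overlap. The 50% overlap and the 5 second signal length are assumptions for the example; the sampling rate and window length are the ones used elsewhere in this report.

```matlab
fs = 22050;                   % sampling rate (Hz)
N = 8192;                     % window length in samples
overlap = 0.5;                % assumed fractional overlap between windows
numSamples = 5 * fs;          % hypothetical 5 second recording
hop = round(N * (1 - overlap));          % samples to advance per window
starts = 1:hop:(numSamples - N + 1);     % first sample of each full window
numWindows = length(starts);
```

With these values the window advances 4096 samples at a time and 25 full windows fit in the recording. A larger overlap gives more, denser time slices at the cost of more FFTs.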



Chapter 6: Implementation

This section details the functions needed to analyse the signals in a partly automated way. Function definitions are given here directly rather than in an appendix, as they are short enough to be included in the main body, ready for discussion. These functions are quite easy to understand. They provide a means to analyse the signals in a way that is useful for determining what is being played in the sound file.

It should be noted that when it comes to creating real ‘production’ code only the first half of each spectrum needs to be used; the rest can be discarded. This is because the data we are dealing with is real, so the second half of the spectrum is the complex conjugate of the first, hence the mirror image around the 0.5 normalized frequency point.

% returns a hamming window of length x

function y = Hamming(x)

n = [1:x];

y = 0.54 - 0.46*cos(2*pi*n/x);

% pads out the row vector x with zeros up to the length z
function y = zeropad(x, z)
y = [x zeros(1, z-length(x))];

% keeps every Nth sample of the row vector x to compress the signal,
% the result is then zero-padded back to its original length
% (note: this name shadows the Signal Processing Toolbox function downsample)
function y = downsample(x, N)
y = zeropad(x(:, 1:N:end), length(x));

% returns the harmonic product spectrum for a row vector x
% it uses a FFT of length 8192 samples as this is the
% window length I have been using throughout
function [f, y] = hps(x)
x = x.*Hamming(length(x)); % apply window (row vector times row vector)
x = abs(fft(x, 8192)); % take magnitude spectrum
x2 = downsample(x, 2); % calculate the down
x3 = downsample(x, 3); % sampled spectra
x4 = downsample(x, 4);
y = x.*x2.*x3.*x4; % combine spectra
f = [0:8191]/8192; % normalized frequency axis
% f and y are returned as the two outputs

% findPeaks – returns the list of peaks it finds in a signal
% which are above a threshold value, use to find notes
% mids: midpoint sample index of each peak, pwrs: signal value there
function [mids, pwrs] = findPeaks(signal, threshold)
lock = 0; midpoint = []; powerATpeak = [];
for i = 1:length(signal)
    if (signal(i) > threshold) && (lock == 0)
        start = i; % rising edge: a peak begins
        lock = 1;
    end
    if (signal(i) < threshold) && (lock == 1)
        finish = i; % falling edge: the peak ends
        lock = 0;
        mid = round(start + (finish-start)/2); % integer midpoint index
        midpoint = [midpoint mid];
        powerATpeak = [powerATpeak signal(mid)];
    end
end
mids = midpoint';
pwrs = powerATpeak';
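As a rough check of the idea behind these functions, the script below builds a synthetic tone and recovers its fundamental from a harmonic product spectrum. It is a self-contained sketch rather than a call into the functions above: the test tone (110Hz with three overtones of invented amplitudes), the inline window and the variable names are all assumptions made for the example.

```matlab
fs = 22050;  N = 8192;                 % sampling rate and window length used in this report
t = (0:N-1) / fs;
f0 = 110;                              % hypothetical test tone: an A with three overtones
x = sin(2*pi*f0*t) + 0.8*sin(2*pi*2*f0*t) + 0.6*sin(2*pi*3*f0*t) + 0.4*sin(2*pi*4*f0*t);
w = 0.54 - 0.46*cos(2*pi*(0:N-1)/(N-1));   % Hamming window, row vector
X = abs(fft(x .* w, N));               % magnitude spectrum
half = X(1:N/2);                       % real input: the first half is sufficient
hpsSpec = half;
for k = 2:4
    d = half(1:k:end);                 % spectrum compressed by a factor of k
    n = length(d);
    hpsSpec(1:n) = hpsSpec(1:n) .* d;  % multiply the aligned harmonics together
    hpsSpec(n+1:end) = 0;              % same effect as zero-padding the compressed copy
end
[peakVal, idx] = max(hpsSpec);         % largest spike in the product spectrum
fEst = (idx - 1) * fs / N;             % convert bin index to Hz
```

The estimate lands on the bin nearest 110Hz. The bin spacing is fs/N, about 2.7Hz, so the result can only be trusted to within a bin or so.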

Evan Ruzanski has published a .M file that is freely available for download for producing spectrogram plots. This can be easily edited to use the harmonic product spectrum instead of solely the FFT. It can be found here:

http://www.mathworks.com/matlabcentral/fileexchange/loadAuthor.do?objectType=author&objectId=1094324

With these functions and the basic outline of the algorithms it was fairly trivial to find the fundamental frequencies. I decided not to spend time creating output functions or a lookup database for storing the note names, since these are fairly easy to implement once all the features of the file have been identified. Instead I concentrated on developing a good framework from which to take the project further when there is more time to spend on the code.



Chapter 7: Evaluation & Conclusions

The results obtained in general

Despite the huge scope of what could be done in the field of music analysis and automatic scoring I am quite happy with the progress I have made with the initial idea. By using a ‘for loop’ and windowed harmonic product spectra along with the findPeaks function it is possible to obtain lists of fundamental frequencies (i.e. notes) from input waveforms. In a full application these returned frequencies would be compared to a database and the nearest value chosen as the note name. As long as the resolution is high enough, small deviations in the exact frequency would not be a problem.
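A sketch of the nearest-note step is given below. Instead of a database it uses the equal temperament relation between neighbouring notes (a factor of 2^(1/12)), counted from the low E at 82.41Hz listed in Appendix 1; that substitution, the measured frequency and the variable names are assumptions for this example.

```matlab
names = {'E','F','F#','G','G#','A','A#','B','C','C#','D','D#'};
fRef = 82.41;                          % low E (Hz), from Appendix 1
fMeasured = 110.36;                    % hypothetical frequency returned by the peak search
semis = round(12 * log2(fMeasured / fRef));   % nearest whole number of steps above low E
noteName = names{mod(semis, 12) + 1};         % wrap around the 12 note names
fNearest = fRef * 2^(semis / 12);             % frequency of that nearest note
```

For this input the nearest note is A, at very nearly 110Hz, so the 0.36Hz measurement error is absorbed by the rounding.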

Although I was disappointed that it seems extremely difficult to work out the string being played as well as the note itself, I have since thought of one potential solution, or at least workaround. This is in the next chapter under ‘Adding Probability maps for each note’.

Tackling too complex a problem

When I initially thought of this idea I was around 17. Back then I knew it

would be a fairly demanding problem to solve but I had no idea until I started

this work just how complex the field of automated music transcription is.

My initial enthusiasm to have not only a fully working, transcribing C++

application but also a hardware dongle to connect the guitar to the computer for

real time processing was too much to expect given the amount of work there

was to do outside of the project itself concurrently.

While the problem is a complex one, the progress I have made has left me in a

good standing to continue to develop the idea, which I fully intend to do.

Transcription of varying speed of play (tempo)

One of the shortcomings of the STFT is its single resolution; it can determine frequencies to high precision only at the expense of working out exactly when transitions between them occur. A transcription system for music should be able to identify a wide range of frequencies but with precise time resolution.

For guitar tablature exact timing is not required but the basics of the ideas outlined here could be extended to transcribing say piano or violin music. Creating an accurate representation of the piece itself places high demands on the accuracy of the processing method.

Using the spectrogram (via the STFT) provided a familiar way for me to explore

the signals that are generated by musical instruments and work out ways to

identify their occurrence. As I explored the signals further it became apparent

that for a truly accurate and useful system some method other than the STFT

would need to be employed. This is when I came across wavelet transforms,

which promise to offer far greater resolution in both domains and an increase in

performance. I have included a short description of the potential benefits of

moving to wavelets in the future work section following this chapter.

Thoughts about the work and future direction

Personally I really enjoyed working on this project and report. It was thoroughly satisfying to be able to put methods I have been taught and used in almost all the courses I have taken at Imperial to good use on something I love.

I would like to take the ideas I have mentioned in this chapter beyond paper

and start to get round to implementing them during my spare time between

work. Having been a user of Linux and open source software in general for

some years it would be great to give something back to the community.

Currently there is only one other program I have heard of that creates a transcription of a guitar as it is played; it is known as Easy Tab Pro. I have included a summary of it in the appendices. As it is closed source and requires the use of an A/D D/A converter, making it rather inaccessible for the amateur or cash-strapped artist, I would prefer to work on my own. The aim is to release it to the community under the GPL and thus accelerate the development of the methods I have outlined. As of June I have registered the domain name ‘http://www.writemytab.com’ as a point of reference for the progress I continue to make on this project. The first thing I plan to do is set up a wiki and transfer what I have learned throughout to a web format.



Chapter 8: Future Work

Use of Wavelets over the STFT

In the Design chapter the limitations of the STFT were exposed: the resolution was either good for fast solos but not for identifying low frequency notes, or else good at identifying all the notes played, just not if they were played very fast.

The problem is that the STFT is capable of only a fixed resolution based on the window size and sampling frequency. Wavelets offer a means of better describing rapidly changing signals and a performance boost over the use of the FFT.
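The fixed trade-off can be put into numbers. For a window of N samples at sampling rate fs, the FFT bins are fs/N apart and each window spans N/fs seconds. The sketch below evaluates this for a few window lengths; the list of lengths is chosen for illustration, and 4.9Hz is the gap between the two lowest notes in Appendix 1.

```matlab
fs = 22050;                        % sampling rate used for the recordings (Hz)
N = [1024 2048 4096 8192];         % candidate window lengths (illustrative)
df = fs ./ N;                      % frequency resolution of each window (Hz per bin)
dt = N ./ fs;                      % time span of each window (seconds)
canResolve = df < 4.9;             % fine enough to separate low E from F?
```

Only the 8192 sample window (about 2.7Hz per bin) separates the lowest notes, but it spans roughly 0.37 seconds, which is too long to follow fast playing. Whatever the window length, the product of `df` and `dt` stays at 1.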

Wavelets are seen as a new and improved version of the old Fourier methods,

moving away from simply frequency analysis and instead examining scale

analysis. It would be too much to go into wavelets at this stage in the work,

instead there are references to some works which I found useful whilst looking

for an improvement over the problems I was facing with the resolution.

Moving to a C++ framework

All of the analysis and resultant functions were created using Matlab, since this offered the ability to play around with the signals in a friendlier way than creating a full blown C++ based application. Rather than spend a substantial amount of time writing and debugging fairly routine code I decided to abandon the idea of working at this relatively low level in favour of ease of use.

My primary goal, however, is and has been for some time to create an application that can be used by guitar hobbyists freely and easily.

Building a Complete chord database

The harmonic product spectrum allows the individual fundamentals to be determined within a time slice. This led on to being able to identify the signature of numerous unrelated fundamental frequencies as chords. A complete database of chords, with their root and other notes listed, would make it easy to quickly output the correct chord and, given the root note’s pitch, its most likely positioning on the fret board. It would also be useful to have a separate database containing just the root notes as a quick lookup table for outputting the tablature.

Determining the current string being played

This is the single most frustrating problem I have encountered. Rather naively

when I began this project I thought that by looking at the various frequency and

time waveforms for ‘same-note different-string’ signals something would rather

obviously jump from the page and lend itself to string identification.

That unfortunately did not happen. Being able to identify the current string

would be a great boost to what could be properly implemented. The problem of

trying to decide if nearby notes are being hammered-on or pulled-off would be

near non-existent. Also, adding support for trills (quickly switching from one

note to another many, many times) and tapping (an advanced technique

whereby a selection of notes on the same string are played by ‘galloping’ on the

fret board with the tip of the finger) would become a real possibility. The

software would ‘know’ that these notes are to be played like this due to their

timing occurrence and the fact that they happen on the same string.

If the guitar were actually plugged into the computer at the time, however, some form of calibration would be possible in order to train the software to recognise the correct ‘version’ of each note according to string. The system I have been working towards, however, aims to decode any recorded signal and could likely be applied to many more instruments given the right ‘profile’ information.

Adding Probability maps for each note

This idea came to me in a burst of inspiration recently whilst studying heuristics

for my artificial intelligence exam. Wanting to concentrate primarily on note

identification I haven’t been able to properly explore it but I will outline my

idea here.

When looking at heuristics that help lead a robot to a goal in a maze I thought about applying the same idea to the guitar. The robot searches through a graph space looking for a goal node, all the while calculating its next move by means of a path cost function $f(x) = g(x) + h(x)$, where g(x) is the cost to get where the robot currently is, h(x) is the heuristic based cost to the goal node and f(x) is the current projected total path cost to the goal node. The heuristic can be thought of as a best guess to the goal.

In the diagram below, the green node is the starting or initial node, the light blue

node is the current position and the red node is the goal state.

Path cost function using a heuristic for general graph search

It was the idea of guessing what note will be played next that led me to think about assigning a matrix of probabilities to each ‘node’ of the guitar search space. In this case a node would be a certain fret on a certain string. When going through the harmonic product spectrum a variable would hold the value of the previous note. This value would be associated with a matrix of probabilities which would indicate where the most likely following note was to be found. The current note would multiply this matrix, element by element, with its own, constructed by setting every possible position on the fret board at which it can occur to 1 and everything else to zero. When the two are multiplied the result would be a matrix left with only the probabilities of where the current note could be relative to the last one played. It is then a case of finding the maximum in this matrix, which would then indicate where the most likely position (and therefore string) for the note to go on the tablature is.

This is one potential way of perhaps getting round the complicated issue of trying to determine the string a note was played on, primarily by exploiting some of the spatial redundancy of the hand positions most tablature uses. The matrix could for example be just 5x6 elements, since this would cover all possible positions for where the hand is on the fret board and yet not be very expensive in terms of computation time.

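$$
\begin{pmatrix} 0.4 & 0.5 & 0.4 \\ 0.3 & 0 & 0.6 \\ 0.1 & 0.2 & 0.9 \end{pmatrix}
\circ
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
=
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0.6 \\ 0.1 & 0 & 0 \end{pmatrix}
$$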

Example of 3x3 ‘next note’ probability matrix multiplied by a matrix of possible positions

The example above is not meant to relate to the guitar in any real way but simply to illustrate my idea. If the matrix on the left represents the probabilities that the next note to be played would be in each particular position, and the matrix of 1’s and 0’s represents the positions in which the current note can be played in relation to the previous note’s fret board position, then the element with the highest score after multiplying gives the ‘most likely’ position for that note to be played.
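In Matlab the whole operation is an element-by-element multiply followed by a search for the maximum. The sketch below uses illustrative 3x3 values of the kind in the example; the names `P`, `M` and `R` are assumptions and the numbers do not describe a real fret board.

```matlab
P = [0.4 0.5 0.4; 0.3 0 0.6; 0.1 0.2 0.9];   % illustrative 'next note' probabilities
M = [0 0 0; 0 0 1; 1 0 0];                   % 1 where the current note can be played
R = P .* M;                                  % element-by-element product
[best, idx] = max(R(:));                     % highest surviving probability
[row, col] = ind2sub(size(R), idx);          % its position in the matrix
```

Of the two allowed positions the one in row 2, column 3 wins with a score of 0.6. On a real fret board the row and column would translate into a string and fret relative to the previous note.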

I am quite excited about the possibilities of this method, and think it could

provide a very neat solution to creating quite accurate and playable

transcriptions. The benefit also is that it would be customizable to certain styles

of play for increased accuracy. Spanish guitar style for example uses different

scales and positioning than 12 bar blues. By compiling a comprehensive

database of note positioning and creating the required probabilities from their

statistics, it should be possible to create matrices for all manner of styles and

tunings, therefore bypassing the need to do an over complicated and most likely

time prohibitive analysis.

This is of course dependent on whether or not the problem of accurately detecting what string has been struck could be solved.



Chapter 9: Bibliography

Automatic Transcription of Music, Anssi P. Klapuri, August 2003,

http://www.cs.tut.fi/sgn/arg/klap/smac2003_klapuri.pdf

Matlab Vectorization Tips, Drea Thomas,

http://www.ee.columbia.edu/~marios/matlab/Vectorization.pdf

Short Time Fourier Transform, August 2005, Ivan Selesnick,

http://cnx.org/content/m10570/latest/

Signal Processing Methods for the Automatic Transcription of Music, A. Klapuri, March 2004,

http://www.cs.tut.fi/sgn/arg/klap/phd/klap_phd.pdf

The Discrete Wavelet Transform, Collective Authors,

http://en.wikipedia.org/wiki/Discrete_wavelet_transform

Pitch Detection Algorithms, Gareth Middleton, December 2003,

http://cnx.org/content/m11714/latest/?format=pdf

The Computer Music Tutorial, Curtis Roads, 1996, MIT Press

QB Express Issue #20, Collective Authors May 2006,

http://www.petesqbsite.com/sections/express/issue20/index.html

Mathworks Matlab Signal Processing Documentation,

http://www.mathworks.com/access/helpdesk/help/toolbox/signal/

Discussion on Notes Per Second, Various, 2006,

http://ilx.wh3rd.net/thread.php?msgid=3223935

Short Time Fourier Transform on Wikipedia, Various, 2006,

http://en.wikipedia.org/wiki/Short-time_Fourier_transform

Timing is Everything, David Hode February 2003,

http://www.guitarnoise.com/article.php?id=86

Pitch Detection Methods Review, Marina Bosi, Unknown,

http://www-ccrma.stanford.edu/~pdelac/154/m154paper.htm



Appendix 1: Guitar note names vs. Frequency

The table below shows information on the frequency of notes on the guitar over

its full range. Also listed is the difference in frequency between two consecutive

notes, useful in the discussions to do with STFT window size. The overall

average for each octave is also listed.

Note   Frequency (Hz)   Difference (Hz)
E      82.41            N/A
F      87.31            4.9
F#     92.5             5.19
G      98               5.5
G#     103.83           5.83
A      110              6.17
A#     116.54           6.54
B      123.47           6.93
C      130.81           7.34
C#     138.59           7.78
D      146.83           8.24
D#     155.56           8.73
E      164.81           9.25
F      174.61           9.8
F#     185              10.39
G      196              11
G#     207.65           11.65
A      220              12.35
A#     233.08           13.08
B      246.94           13.86
C      261.63           14.69
C#     277.18           15.55
D      293.66           16.48
D#     311.13           17.47
E      329.63           18.5
F      349.23           19.6
F#     369.99           20.76
G      392              22.01
G#     415.3            23.3
A      440              24.7
A#     466.16           26.16
B      493.88           27.72
C      523.25           29.37
C#     554.37           31.12
D      587.33           32.96
D#     622.25           34.92
E      659.26           37.01
F      698.46           39.2
F#     739.99           41.53
G      783.99           44
G#     830.61           46.62
A      880              49.39
A#     932.33           52.33
B      987.77           55.44
C      1046.5           58.73
C#     1108.73          62.23
D      1174.66          65.93
D#     1244.51          69.85
E      1318.51          74

Differences (Hz)
Overall       11.89
1st Octave    6.65
2nd Octave    12.96
3rd Octave    20.83
4th Octave    51.86

Note name vs. Frequency



Appendix 2: Sampling Theorem

The sampling theorem, put simply, states that a signal sampled at a rate greater than twice its maximum frequency is totally recoverable from its samples.

This is easily seen from the diagram below.

Here the dashed line shows what would be called an ‘aliased’ frequency. As the original signal was sampled at less than twice its own frequency, the points that result can be matched to a lower frequency sinusoid. If the signal had been sampled at more than twice its frequency then the only way for a sinusoid to fit through the points would be to exactly replicate the original signal.

This is the sampling theorem in a nutshell.
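As a small worked example, the frequency at which a sampled tone appears can be computed by folding it back into the range below half the sampling rate. The two test tones are hypothetical; the sampling rate is the one used for the recordings in this report.

```matlab
fs = 22050;                              % sampling rate (Hz)
f = [5000 15000];                        % hypothetical tones: one below fs/2, one above
fAlias = abs(f - fs * round(f / fs));    % apparent frequency after sampling
```

The 5000Hz tone is below fs/2 and is unchanged, while the 15000Hz tone appears at 7050Hz, which is the kind of lower frequency match described above.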



Appendix 3: Setup used to record

Although I had initially planned on making a hardware interface to connect the

guitar to the PC for processing, I decided against pursuing this for reasons of

time but also because I felt it was unnecessary.

Guitarists that own an electric generally tend to have an amplifier and using a

cheap microphone like that on many VoIP headsets to pick up the signal can

give surprisingly good results. In order to minimise any noise and record the

sound faithfully I fixed my headset with some tape to the top of the amplifier

and angled the microphone towards the centre cone.

Photograph of the microphone in position near the amplifier

The sound files were then recorded using Sony™ Sound Forge 7.0, although the sound recorder included in Windows™ will also suffice for capture. Headsets with microphones of reasonable quality can be picked up for under £5 nowadays, so an extra one could be bought just for this purpose without any problem.

The sounds that were used throughout this report have been included, along with all other files, on the CD attached at the end of this document; they are all Microsoft WAV™ files and were sampled at 22050 Hz.



Appendix 4: Software used throughout

All work was done under Microsoft Windows XP

Spectrograms were generated with Spectrogram 14, Visualization Software LLC

Analysis of the signals was performed in Mathworks’ Matlab, version 6.5.0.

Sound capture was done using Sony Sound Forge 7.0, sampled at 22050 Hz.

Other graphs were created using MathGV (from http://www.MathGV.com).



Appendix 5: Easy Tab Pro

Easy Tab Pro is a proprietary piece of software from VisAid Development that is now available free of charge. It was released in 1999 and took a ‘solid year of development’.

Screen shot of Easy Tab Pro, taken from www.lookoutsoft.net

From the website:

‘Easy Guitar Tabs Maker Pro allows you to write guitar tabs easily by plugging

your guitar into your computer. While you play Easy, Easy Guitar Tab Maker

Pro analyzes the pitch and tone of the signal transmitted from your guitar. It

monitors the change and combination of the chords played. Then it analyzes

the pitch and tone to determine which strings were played and where your

fingers were at. Easy Guitar Tab Maker Pro then graphs the results as

tablature.’

As it requires the use of an A/D D/A converter on the line of the computer I suspect that after some sort of calibration it can determine the difference between strings. It is interesting that it mentions tone: when you hear, for example, the low E string at fret 5 being played compared to an open A there is a tiny difference, but as far as I could conclude not enough for any determination to be made from the spectrograms.
