4. content-based search in music - dejoridejori.eu/uploads//is-kap04-music-search-en.pdf4....

19
G. Specht: Information Systems 4-1 4-1 4. Content-Based Search in Music 4-2 4.1 Audio Technology 4.1.1 Human Ear: Acoustic Sound Waves Frequency: ~ Pitch Amplitude: ~ Volume Subsonic noise 0 Hz up to 20 Hz Audible sound 20 Hz up to 20 KHz Intensive domain: 600 Hz - 6.000 Hz Ultrasound: 20 KHz up to 1 GHz In general no pure sinus waves: overlaps (harmonic, “tone”)

Upload: voquynh

Post on 18-Apr-2018

218 views

Category:

Documents


1 download

TRANSCRIPT

G. Specht: Information Systems 4-1

4-1

4. Content-Based Search in Music

4-2

4.1 Audio Technology4.1.1 Human Ear: Acoustic Sound Waves

• Frequency: ~ Pitch• Amplitude: ~ Volume• Subsonic noise 0 Hz up to 20 Hz• Audible sound 20 Hz up to 20 KHz• Intensive domain: 600 Hz - 6.000 Hz• Ultrasound: 20 KHz up to 1 GHz

• In general no pure sinus waves: overlaps (harmonic, “tone”)

G. Specht: Information Systems 4-2

4-3

On Storage: Three Possibilities

1. Analog storage: compare to vinyl records, audio tapes.• Out of interest here

2. Digital storage: e.g. audio CDs

3. Symbolic storage: MIDI-technology, e.g. synthesizer

4-4

4.1.2 Digital Audio Storage

• Sampling Theorem:– The sampling frequency has to be greater than twice the highest

frequency occurring in the signal.

informationloss

=>

no noteworthyinformation loss

G. Specht: Information Systems 4-3

4-5

Digital Audio Storage

• Example:– Audible domain: 20 Hz - 20.000 Hz

– Telephone: 3.000 Hz > 6.000 sample points / sec.

– Hifi: 22.000 Hz > 44.000 sample points / sec.(per channel!)

– CD technology: Pulse Code Modulation (PCM-Coding):16 Bit (= 2 Byte) per sampling value

4-6

Digital Audio Storage

• Memory demands:– Audio data rate =

– Compare to phone data rate:

– Goal: via compressione.g. for mobile communication

2 Bytesampling rate

x 44.000 sampling ratesec x channel

x 2 channels x 1 KByte1024 Byte

= KBytesec

8KBitsec

64

= 1,6 KBytesec

KBitsec13

~~ 172 KBytesec

G. Specht: Information Systems 4-4

4-7

Digital Audio Storage

• Example: Audio CD: 74 min. of length

– Storage requirements

– Additional storage requirement for:• Error correction, index information, file descriptors

– Today’s Audio-CDs: 700 up to 880 MByte

> 74 Min. x 60 secMin

172 KBytesecx x

1 MByte1024 kByte

= 746 MByte

4-8

4.1.3 Symbolic Storage: MIDI-Technology

• For music only (not suitable for language) • Basic concept:

– Instead of digitalization of the sound, the score itself is stored (compare to synthesizer)

– Tuple: (Instrument, begin of a note, end of a note, fundamental frequency,

volume)– MIDI enables:

• 10 octave = 128 notes (= 27 = 7 Bit)• 128 instruments (incl. noises) altogether• Simultaneously: 16 channels = 16 instruments• Simultaneous notes per channel: 3-16 (quality feature of synthesizers)

(needed e.g. for organs, not for flutes)

G. Specht: Information Systems 4-5

4-9

Symbolic Storage: MIDI-Technology

• Example: key of a piano, begin of a note, end of a note, release of the key, key = note, velocity.

– 10 min. music ~ 200 KByte =>

• Result:

– Compact description of music files, which enables lifelike playback.

KBytesec0,33 data rate

4-10

Symbolic Storage: MIDI

• I/O:– Input via keyboard (piano keyboard) or notes on the screen– Previewer: synthesizer, PC

• Advantages of MIDI:– Compact presentation– Easily editable ( sequencer: editor, converts the sheet of music into

an internal MIDI format)– Extendable by intra document anchors (and so there exist links)

• Disadvantages of MIDI:– Not suitable for language– Artistic interpretation for one play? (only limited)– No documentary footage

G. Specht: Information Systems 4-6

4-11

Language

Acoustic and phoneticanalysis

(recognition of sound patterns)

Token

Syntactic analysis

Semantic analysis

Meaning of the text + (Gr.- Analysis)

Voice Synthesizer

Language

Sound patternSpoken referencesWord models

Word lexicon Morpheme lexiconSyntax grammar rules

Semantic word lexiconSemantic grammar rules

Fast access on mediumto high amounts of data

-Content bases search possible-- or e.g. translation (all steps backwards into the target language)

4.1.4 Outlook: Speech input and output

4-12

Outlook: Voice Input and Output

• Today’s technology:

– Speaker bound recognition of up to 20.000 words (trained software)

– Not bound to the speaker: recognition of up to 500 words!(troublesome: dialect, emotional pronunciation, background noise)

– At ca. 5% erroneously recognized words i.e. a sentence with three words: probability for a correct recognition:0,95 * 0,95 * 0,95 = 0,86

G. Specht: Information Systems 4-7

4-13

4.2. Content Based Search in Music Databases

• Goals :

– Melody Piece of music– Similar pieces of music, plagiarism etc.– Assignment, classification, genre identification

4-14

4.2.1. Data Source / Preprocessing

• MIDI -> Extraction of melody tracks

R U D U R U D (DUR)

G. Specht: Information Systems 4-8

4-15

4.2.2. Suitable Indexing

• Approches– String matching (-)– DUR – QPD (Query by Pitch Dynamic)

Window |w| = 2k = 23 = 8 Vector 2k-1

Point in a 2k-1-dim. space Storage in spatial data structure

4-16

4.2.3. Query by Pitch Dynamics(QPD)

• Example7531

P-structure: 1 1 3 1 5 5 7 5

∑-tree: 2 4 10 12

6 22

G. Specht: Information Systems 4-9

4-17

QPD

• Example (continuation)P-structure: 1 1 3 1 5 5 7 5

∑-tree: 2 4 10 12

6 22

hierarchical differences:

0 2 0 2

-2 -2

-16

4-18

QPD

• Example (continuation)

Haar-Wavelet Transformation (-16, -2, -2, 2, 0, 2, 0)

1

-1

G. Specht: Information Systems 4-10

4-19

QPD

Vectors respectively points in the2k-1-dim. space

Distance metric:

dE = (v1 - v2) o (v1 - v2)

Storage e.g. in an R-tree:

Melody search: Point searchSimilarity search: Bound-box searchClassifications: Cluster search

4-20

4.3 MusicDB

• Lyrics– Easy => performing a text query

• Metadata Tags– iTunes

• caches information from the audio format's tag (ID3)• Music Genome Project

– Discover new music– collecting details on every song

• melody, harmony, instrumentation, rhythm, vocals, lyricshttp://en.wikipedia.org/wiki/List_of_Music_Genome_Project_attributes

– find songs with interesting musical similarities• cf. Pandora – a subproject of the Music Genome Project

G. Specht: Information Systems 4-11

4-21

4.3.1 Introduction

• Query by Humming-System (QbH)

– Allows user to find a song by humming the tune– Serveral issues pose significant challenges:

• No perfect queries• Capturing pitches and notes from user is difficult• Melody extraction of a pre-recorded music file is difficult

4-22

Introduction

• QbH research project at NYU– D. Shasha, Y. Zhu– Demo available at http://querybyhum.cs.nyu.edu/

• MusicDB– Massachusetts Institute of Technology, 2005– http://web.mit.edu/edmond/www/projects/musicdb/

• Musicline– Fraunhofer Institut – Demo available at http://www.musicline.de/de/melodiesuche/input

• Musipedia– Demo available at http://www.musipedia.org/query_by_humming.0.html

G. Specht: Information Systems 4-12

4-23

4.3.2 MusicDB Concepts

• Based on MIDI files– No complicated pitch extraction necessary

• Indexing possibilities– Simple string-matching– DUR (UDS) – CubyHum (Extension to UDS)– QPD

4-24

MusicDB Concepts

• Architecture of MusicDB

MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On

G. Specht: Information Systems 4-13

4-25

MusicDB Concepts

• Melody Extractor

– Conversion of MIDI files into time series

MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On

4-26

MusicDB Concepts

– Isolation of the channel containing the main melody• Melody is a time series characterized by the highest pitched

(connected) notes in a song• MIDI data of a song:

Screenshots taken from Sonar 4

G. Specht: Information Systems 4-14

4-27

MusicDB Concepts

– Remove certain tracks and isolate the channels with the top 3 averagepitch values & choose channel with most single notes

Screenshots taken from Sonar 4

4-28

MusicDB Concepts

– Note duration and start-time get quantized and overlapping notes will bedeleted (Skyline-Alg)

Screenshots taken from Sonar 4

G. Specht: Information Systems 4-15

4-29

MusicDB Concepts

MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On

4-30

MusicDB Concepts

• Loading Pipeline

MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On

G. Specht: Information Systems 4-16

4-31

Melody Time Series

Time-series Slices Normalized Time-Series

To account off-key hummingnormalizing

MusicDB Concepts

4-32

Time-Series Database

PAA –dimensionality reduction

Transformed Database

R-Tree

Insert

Insert Into R-tree with N dimensions

MusicDB Concepts

PAA: Piecewise Aggregate Approximation(Resulting transform is similar to a Haar wavelet transform)

PAA

G. Specht: Information Systems 4-17

4-33

MusicDB Concepts

MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On

4-34

MusicDB Concepts

• Query

– Same procedure for user-humming as for the MIDI files in thedatabase

– Comparison of both time series (user & R-tree) according to a distance metric

– Results get ordered

G. Specht: Information Systems 4-18

4-35

4.3.3 Conclusion (of MusicDB Concepts)

- MusicDB fuses 2 techniques together (Melody extraction1 and a QbH system2)

- System uses widely available multi-channel polyphonic MIDI files

- Utilizes an open source tool to transcribe notes (Wave to MIDI)

1 Manipulation of Music For Melody Matching, Uitdenbogerd and J. Zobel 2 Query by Humming: A Time-Series Database Approach, Y. Zhu and D. Shasha

MusicDB: A Query by Humming System,Edmond Lau, Annie Ding, Calvin On

4-36

4.3.4 Demos

• QbH research project at NYUhttp://querybyhum.cs.nyu.edu/

• Musiclinehttp://www.musicline.de/de/melodiesuche/input

• Musipediahttp://www.musipedia.org/query_by_humming.0.html

G. Specht: Information Systems 4-19

4-37

4.3.5 References

• MusicDB - A Query by Humming System– http://web.mit.edu/edmond/www/projects/musicdb/

• Demos– http://querybyhum.cs.nyu.edu/– http://www.musicline.de/de/melodiesuche/input– http://www.musipedia.org/query_by_humming.0.html

• Pandora– www.pandora.com