Mp25: Audio fingerprinting and metadata correction with Python

Audio fingerprinting and metadata correction with Python Alastair Porter November 21, 2011

Upload: montreal-python

Post on 22-Apr-2015


TRANSCRIPT

Page 1: Mp25: Audio Fingerprinting and metadata correction with Python

Audio fingerprinting and metadata correction with Python

Alastair Porter

November 21, 2011

Page 2: Mp25: Audio Fingerprinting and metadata correction with Python

Me

Background in Computer Science
Masters, McGill Music Tech
Online:

http://github.com/alastair (20/28 music; 11 in python)
http://twitter.com/alastairporter

Page 3: Mp25: Audio Fingerprinting and metadata correction with Python

Python as a go-to language

Quick for prototyping
Use the same code in a production release
Very handy for API access (thin wrapper around urllib2)
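A minimal sketch of what a "thin wrapper" for API access can look like. This is not code from the talk: it uses Python 3's `urllib.request` (the successor to `urllib2`), and the class name `ThinClient`, the base URL `https://api.example.org`, and the parameter names are all hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class ThinClient:
    """Hypothetical minimal JSON API client (illustration only)."""

    def __init__(self, base_url, user_agent="example-client/0.1"):
        self.base_url = base_url.rstrip("/")
        self.user_agent = user_agent

    def build_url(self, path, **params):
        # Sort the parameters so the generated URL is deterministic.
        query = urlencode(sorted(params.items()))
        url = f"{self.base_url}/{path.lstrip('/')}"
        return f"{url}?{query}" if query else url

    def get(self, path, **params):
        # Performs a real HTTP GET and decodes the JSON response body.
        request = Request(
            self.build_url(path, **params),
            headers={"User-Agent": self.user_agent},
        )
        with urlopen(request, timeout=10) as response:
            return json.load(response)


client = ThinClient("https://api.example.org/")
print(client.build_url("/release", query="abc", fmt="json"))
```

Keeping URL construction separate from the network call makes the wrapper easy to check without touching a real service.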

Page 4: Mp25: Audio Fingerprinting and metadata correction with Python

Music and Metadata

Page 5: Mp25: Audio Fingerprinting and metadata correction with Python

Music and Metadata

The problem:
People are really bad at naming music
Inconsistent over releases

The solution:
Crowdsourcing
Get info from as many trusted sources as possible
Make renaming take no effort

Page 6: Mp25: Audio Fingerprinting and metadata correction with Python

MusicBrainz

Page 7: Mp25: Audio Fingerprinting and metadata correction with Python

Amazon

Page 8: Mp25: Audio Fingerprinting and metadata correction with Python

Amazon (Coverart)

Page 9: Mp25: Audio Fingerprinting and metadata correction with Python

Last.fm

Page 10: Mp25: Audio Fingerprinting and metadata correction with Python

Last.fm (Genre tags)

Page 11: Mp25: Audio Fingerprinting and metadata correction with Python

MusicBrainz

Page 12: Mp25: Audio Fingerprinting and metadata correction with Python

albumidentify

http://github.com/albumidentify/albumidentify

Page 13: Mp25: Audio Fingerprinting and metadata correction with Python

MP3, FLAC, Ogg, CDs

Page 14: Mp25: Audio Fingerprinting and metadata correction with Python

Identification strategy

If there’s a CD TOC, use that (MusicBrainz lookup)
If no match, use audio fingerprinting
If no match, do a text lookup (artist/album)
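The fallback order on this slide can be sketched as a chain of lookups that stops at the first match. This is an illustration under assumptions, not albumidentify's actual code: the three lookup functions and the `KNOWN_*` tables are hypothetical stand-ins for the real MusicBrainz and fingerprint queries.

```python
# Hypothetical stand-in data; real lookups would query external services.
KNOWN_TOCS = {"1 3 150 15000 30000": "release-A"}
KNOWN_FINGERPRINTS = {"fp-123": "release-B"}
KNOWN_TEXT = {("some artist", "some album"): "release-C"}


def toc_lookup(album):
    # Only applies when the source is a CD with a table of contents.
    return KNOWN_TOCS.get(album.get("toc"))


def fingerprint_lookup(album):
    # Return the release of the first fingerprint we recognise.
    for fingerprint in album.get("fingerprints", []):
        if fingerprint in KNOWN_FINGERPRINTS:
            return KNOWN_FINGERPRINTS[fingerprint]
    return None


def text_lookup(album):
    key = (album.get("artist", "").lower(), album.get("album", "").lower())
    return KNOWN_TEXT.get(key)


STRATEGIES = [
    ("toc", toc_lookup),
    ("fingerprint", fingerprint_lookup),
    ("text", text_lookup),
]


def identify(album, strategies=STRATEGIES):
    """Try each strategy in order; return (strategy name, release) or (None, None)."""
    for name, lookup in strategies:
        release = lookup(album)
        if release is not None:
            return name, release
    return None, None


print(identify({"fingerprints": ["unknown", "fp-123"]}))
```

Ordering the list from most to least reliable source keeps the cheaper, more exact lookups ahead of the fuzzier text search.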

Page 15: Mp25: Audio Fingerprinting and metadata correction with Python

Fingerprinting

Converts an audio signal to a short sequence of numbers
Smaller to compare than an entire file
Perceptual features rather than byte comparison (works with different encodings)
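To make the "perceptual features rather than byte comparison" point concrete, here is a deliberately toy fingerprint. It is not the algorithm used by albumidentify, MusicBrainz, or AcoustID; it only records whether each frame is louder than the previous one. The function names, frame size, and synthetic test signal are all assumptions made for the illustration.

```python
import math

SAMPLE_RATE = 8000


def toy_fingerprint(samples, frame_size=256):
    """Return a list of 0/1 values: 1 where a frame has more energy than the previous one."""
    energies = [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    return [1 if cur > prev else 0 for prev, cur in zip(energies, energies[1:])]


def hamming_distance(a, b):
    # Number of positions where two equal-length fingerprints differ.
    return sum(x != y for x, y in zip(a, b))


def synthetic_signal(gain=1.0):
    # One second of a 440 Hz tone whose loudness swells three times per second.
    return [
        gain
        * (1 + 0.5 * math.sin(2 * math.pi * 3 * n / SAMPLE_RATE))
        * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)
    ]


original = synthetic_signal()
quieter = synthetic_signal(gain=0.5)  # same audio at half the volume

print(original == quieter)  # the raw samples differ
print(hamming_distance(toy_fingerprint(original), toy_fingerprint(quieter)))
```

The two sample lists are not equal, yet their fingerprints match, which is the property that lets a fingerprint survive re-encoding or volume changes. Real fingerprinters use far richer spectral features.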

Page 16: Mp25: Audio Fingerprinting and metadata correction with Python

Identification strategy

Fingerprinting gives us a set of candidate tracks
A track could be on many albums (original release, best of, mix album)
Keep a list of what tracks we have for each album
Once we fill all the slots for an album, success!
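The slot-filling idea above can be sketched as follows. This is an assumed data layout, not albumidentify's implementation: each file contributes a list of hypothetical `(album_id, track_number)` candidates, and `album_track_counts` gives the number of tracks on each candidate album.

```python
from collections import defaultdict


def find_complete_album(file_candidates, album_track_counts):
    """Return the first album whose track slots are all filled, else None.

    file_candidates: one list per file of (album_id, track_number) pairs.
    album_track_counts: mapping of album_id to its total number of tracks.
    """
    slots = defaultdict(set)
    for candidates in file_candidates:
        for album_id, track_number in candidates:
            # A set ignores the same slot being matched twice.
            slots[album_id].add(track_number)
            if len(slots[album_id]) == album_track_counts[album_id]:
                return album_id
    return None


album_track_counts = {"original": 3, "best-of": 12}
files = [
    [("original", 1), ("best-of", 4)],  # this track appears on two albums
    [("original", 2)],
    [("original", 3), ("best-of", 9)],
]
print(find_complete_album(files, album_track_counts))
```

Here the three files complete the three-track "original" album, while the "best-of" candidate never has more than two of its twelve slots filled.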

Page 17: Mp25: Audio Fingerprinting and metadata correction with Python

Metadata strategy

Text information from MusicBrainz
Genre from Last.fm
Image from Amazon (or folder.jpg)
MusicBrainz tells us where these are (don’t need to search)
Save in every file (text is cheap)

Page 18: Mp25: Audio Fingerprinting and metadata correction with Python

Writing it all out

Custom MP3/ID3 writer
Ogg meta tags
FLAC meta tags
Name files:

Artist/Artist - Year - Album/01 - Artist - Track

ReplayGain!
Be a good citizen: submit fingerprints to MusicBrainz
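A small sketch of building a path that follows the naming pattern on this slide. The function name `target_path`, the file extension argument, and the replacement of `/` inside names are assumptions added for the example; the slide only gives the pattern itself.

```python
def target_path(artist, year, album, track_number, title, extension):
    """Build 'Artist/Artist - Year - Album/NN - Artist - Track.ext'."""

    def clean(text):
        # Assumption: replace path separators so a name cannot create extra directories.
        return text.replace("/", "_").strip()

    artist, album, title = clean(artist), clean(album), clean(title)
    return (
        f"{artist}/{artist} - {year} - {album}/"
        f"{track_number:02d} - {artist} - {title}.{extension}"
    )


print(target_path("Some Artist", 2011, "Some Album", 1, "First Track", "flac"))
```

Zero-padding the track number keeps files in playing order when a directory is sorted by name.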

Page 19: Mp25: Audio Fingerprinting and metadata correction with Python

What’s next

New version of MusicBrainz
New fingerprinter
More metadata
More metadata

Page 20: Mp25: Audio Fingerprinting and metadata correction with Python

Thanks

More information:

MusicBrainz: http://musicbrainz.org
albumidentify: http://github.com/albumidentify/albumidentify

More fingerprinting: http://acoustid.org, http://echoprint.me

Last.fm