Using search engines for classification: does it still work?

Post on 06-May-2015

DESCRIPTION

My presentation at the adMIRe workshop at ISM 2009 in San Diego, about our study on using search engines to classify music genres.

TRANSCRIPT

USING SEARCH ENGINES FOR CLASSIFICATION: DOES IT STILL WORK?

Sten Govaerts, Nik Corthaut, Erik Duval

• Our problem

• Classification using search engines

• The setup

• The evaluation

• Conclusion

TUNIFY

HOW DOES IT WORK?

• manually annotated metadata

• 5 music experts at Aristo Music and different consultants

• almost 80,000 songs

• but, not enough...

PROBLEMS

• satisfying the music choice of all customers

• retail and catering differ from you and me!

• new markets

• react quickly to emerging music trends

• adding the full Belgian library catalog

GENERATE THE METADATA

• from different sources:

• the audio signal

• web sources

• the Aristo database

• attention metadata

• using our metadata generation framework: SamgI

GENRE...

• our master's thesis student looked at different ways to generate genre metadata...

ONE APPROACH...

• M. Schedl, T. Pohle, P. Knees, G. Widmer, “Assigning and Visualizing Music Genres by Web-based Co-occurrence Analysis”, Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 260-265.

• G. Geleijnse, J. Korst, “Web-based Artist Categorization”, Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 266-271.

CLASSIFICATION WITH SEARCH ENGINES

Artist + Genre + Schema

using co-occurrence

[Figure: example co-occurrence scores per genre. Genres: Rock, Blues, Country, Jazz, Pop, Metal. Values shown on the slide: 0.013, 0.009, 0.013, 0.005, 0.015, 0.009]
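The co-occurrence idea on these slides can be sketched in a few lines. This is only an illustration under stated assumptions: the score is taken to be the page count of the "artist + genre + schema" query divided by the page count of the "artist + schema" query (one common normalization, not necessarily the exact one used in the study), the schema keyword `music` is an assumption, and `fake_page_count` with its made-up counts stands in for a real search engine call.

```python
from typing import Callable, Dict, Iterable

PageCount = Callable[[str], int]


def build_query(artist: str, genre: str, schema: str = "music") -> str:
    """Combine artist, genre and schema keywords into one query string."""
    return f'"{artist}" "{genre}" {schema}'


def genre_scores(artist: str, genres: Iterable[str],
                 page_count: PageCount, schema: str = "music") -> Dict[str, float]:
    """Relative co-occurrence of the artist with each genre."""
    genres = list(genres)
    # Hits for the artist alone (with the schema keyword) act as the normalizer.
    base = page_count(f'"{artist}" {schema}')
    if base == 0:
        return {g: 0.0 for g in genres}
    return {g: page_count(build_query(artist, g, schema)) / base for g in genres}


def classify(artist: str, genres: Iterable[str],
             page_count: PageCount, schema: str = "music") -> str:
    """Pick the genre with the highest co-occurrence score."""
    scores = genre_scores(artist, genres, page_count, schema)
    return max(scores, key=scores.get)


# Hypothetical page counts, invented for this example only.
FAKE_COUNTS = {
    '"Robert Johnson" music': 1000,
    '"Robert Johnson" "Blues" music': 150,
    '"Robert Johnson" "Jazz" music': 90,
    '"Robert Johnson" "Metal" music': 10,
}


def fake_page_count(query: str) -> int:
    """Stand-in for a search engine's result count."""
    return FAKE_COUNTS.get(query, 0)


print(classify("Robert Johnson", ["Blues", "Jazz", "Metal"], fake_page_count))
```

Passing the page-count lookup in as a function keeps the scoring independent of any particular search engine, which is what makes it possible to run the same experiment against several engines.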

RESULTS

• the master's thesis student's results were much worse than the published ones

• what happened?

• did Google's search result counts change?

• does the Google Search API return different results?

• is the student's implementation correct?

HOW TO EVALUATE THIS?

• re-run the original experiment

• evaluate on the same data set: 1995 artists and 9 genres.

• different search engines: Google, Yahoo! and Live! Search.

• over time: 8 times over a period of 36 days.
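Such a re-run can be reduced to one accuracy figure per engine per run, which is then compared across engines and over time. The sketch below assumes the predictions have already been collected as plain dictionaries; the engine names `engine_a` and `engine_b` and all predicted labels are invented for illustration and are not results from the study.

```python
from statistics import mean, pstdev
from typing import Dict, List


def accuracy(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Fraction of artists whose predicted genre matches the ground truth."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    correct = sum(1 for artist, genre in truth.items()
                  if predicted.get(artist) == genre)
    return correct / len(truth)


def summarize(runs_by_engine: Dict[str, List[Dict[str, str]]],
              truth: Dict[str, str]) -> Dict[str, Dict[str, object]]:
    """Per engine: accuracy of every run, its mean, and its spread over time."""
    summary = {}
    for engine, runs in runs_by_engine.items():
        accs = [accuracy(run, truth) for run in runs]
        summary[engine] = {"runs": accs, "mean": mean(accs), "spread": pstdev(accs)}
    return summary


# Ground truth for four artists; predictions below are made up.
truth = {"Robert Johnson": "Blues", "Peter Tosh": "Reggae",
         "Art Pepper": "Jazz", "Lou Rawls": "RnB"}

runs_by_engine = {
    "engine_a": [
        dict(truth),                                        # a good day: 4 of 4
        {**truth, "Art Pepper": "Blues", "Lou Rawls": "Jazz"},  # a bad day: 2 of 4
    ],
    "engine_b": [
        {**truth, "Lou Rawls": "Jazz"},                     # 3 of 4
        {**truth, "Lou Rawls": "Jazz"},                     # 3 of 4
    ],
}

for engine, stats in summarize(runs_by_engine, truth).items():
    print(engine, stats)
```

In this toy data both engines have the same mean accuracy, but only one of them fluctuates between runs, which is the kind of difference that repeating the experiment over time is meant to expose.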

THE DATA SET

[Pie chart: share of artists per genre. Genres: Blues, Country, Electronic, Folk, Jazz, Metal, Rap, Reggae, RnB. Shares as listed on the slide: 9%, 12%, 5%, 4%, 41%, 13%, 2%, 3%, 10%]

MOTION CHART

• http://hmdb.cs.kuleuven.be/muzik/gapminder.html

MORE FINE-GRAINED...

• 18 artists

• more search engines: Google.co.uk/.fr/.be, uk/fr.search.yahoo.com

• twice a day for 53 days

• 250,000 queries!

2 Pac: Rap
Alan Lomax: Folk
Art Pepper: Jazz
Cradle of Filth: Metal
David Parsons: Electronic
Desmond Dekker: Reggae
Downpour: Metal
IceT: Rap
Jerry Butler: RnB
Joy Lynn White: Country
Louisiana Red: Blues
Lou Rawls: RnB
LTJ Bukem: Electronic
Peter Tosh: Reggae
Pinetop Smith: Jazz
Robert Johnson: Blues
Roy Rogers: Country
Steeleye Span: Folk

MAIN SEARCH ENGINE RESULTS

REGIONAL GOOGLES

WHAT TO USE?

• use Google when it is stable; otherwise rely on Yahoo!

• when is it stable? test with a small set

• some artists get classified incorrectly on bad days

• compare the accuracy achieved with the test set to the average.
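The decision rule above can be written down as a small check. This is a sketch, not the implementation used in the study: the function names are hypothetical, and the `tolerance` of 0.05 is an assumed value, since the slides do not say how far below the average counts as a bad day.

```python
def is_stable(test_accuracy: float, average_accuracy: float,
              tolerance: float = 0.05) -> bool:
    """True when today's small-test-set accuracy is close to the usual average.

    `tolerance` is an assumed margin; it would need tuning on real data.
    """
    return test_accuracy >= average_accuracy - tolerance


def choose_engine(google_test_accuracy: float, google_average_accuracy: float,
                  tolerance: float = 0.05) -> str:
    """Use Google when it looks stable today; otherwise fall back to Yahoo!."""
    if is_stable(google_test_accuracy, google_average_accuracy, tolerance):
        return "Google"
    return "Yahoo!"


# Illustrative numbers only.
print(choose_engine(0.60, 0.62))  # close to the average
print(choose_engine(0.40, 0.62))  # well below the average
```

Running the small test set before a large batch keeps the number of wasted queries low on days when the result counts fluctuate.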

CONCLUSION

• still works after 3 years

• order of preference: Google, then Yahoo!, then Live! Search

• why does Google fluctuate?

• a generic, all-purpose version of the classifier is implemented in our metadata generation framework

FUTURE WORK

• understand the performance differences of regional search engines

• use alternative search engines

• tweak the genre taxonomy depending on the search engine

Q & A.

DEMO METADATA GENERATION

• http://ariadne.cs.kuleuven.be/samgi-service/
