Beyond the QueryByExample Paradigm: New Query
Interfaces for Music Information Retrieval

George Tzanetakis, Andreye Ermolinskyi, Perry Cook
Computer Science Department, Princeton University
email: gtzan@cs.princeton.edu

Abstract
The majority of existing work in music information retrieval
for audio signals has followed the content-based
query-by-example paradigm, in which a musical
piece is used as a query and the result is a list of
other musical pieces ranked by their content similarity.
In this paper we describe algorithms and graphical
user interfaces that enable novel alternative ways of
querying and browsing large audio collections. Computer
audition algorithms are used to extract content
information from audio signals. This automatically extracted
information is used to configure the graphical
user interfaces and to generate new query audio signals
for browsing and retrieval.
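The query-by-example paradigm described above can be sketched in a few lines: each piece is reduced to a precomputed content feature vector, and retrieval ranks the collection by distance to the query's vector. The feature values and piece names below are invented for illustration, and plain Euclidean distance stands in for whatever similarity measure a real system would use.

```python
# Minimal sketch of content-based query-by-example retrieval.
# Assumes feature vectors (e.g. timbral/rhythmic statistics) have
# already been extracted per piece; the numbers here are toy values.
import math


def euclidean(a, b):
    """Distance between two content feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query_vec, collection):
    """Return piece names ordered from most to least similar to the query."""
    return sorted(collection, key=lambda name: euclidean(query_vec, collection[name]))


# Hypothetical collection: piece name -> feature vector.
collection = {
    "piece_a": [0.90, 0.10, 0.40],
    "piece_b": [0.20, 0.80, 0.50],
    "piece_c": [0.85, 0.15, 0.35],
}
query = [0.88, 0.12, 0.38]
print(rank_by_similarity(query, collection))  # most similar first
```

A real system would replace the toy vectors with automatically extracted audio features, which is exactly the role the computer audition algorithms play here.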
