This PhD project by Christian Spevak deals with music information retrieval, in particular the detection of perceptually similar sounds in an audio document (sound spotting). The idea is to select a target event and search for similar occurrences throughout the whole document (for example, a piece of music). In database environments this is called a query by example. The system investigated employs an auditory model, a self-organizing neural network, and a pattern matching technique (DP matching). The research was proposed by INA-GRM (Groupe de Recherches Musicales, Paris) with a view to analyzing and transcribing non-notated music and retrieving sounds from archives.

In the first stage, the raw audio data is preprocessed by a computational model of the human ear to extract perceptually relevant features, and divided into short frames to reduce the amount of data.
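The framing part of this stage can be illustrated with a minimal sketch. The auditory model itself is not specified here, so the example substitutes a deliberately simple stand-in feature (RMS energy per frame). The function names, frame length and hop size are all hypothetical and chosen only for illustration.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into frames of frame_len samples,
    advancing by hop samples; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def extract_features(samples, frame_len=4, hop=2):
    # ASSUMPTION: RMS energy is only a placeholder for the output of an
    # auditory model, which would yield a multi-dimensional vector per frame.
    return [[rms(f)] for f in frame_signal(samples, frame_len, hop)]
```

Each frame is reduced to one short feature vector, so the amount of data handed to the next stage depends on the hop size and not on the sampling rate.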

In the second stage, a self-organizing map quantizes the feature vectors: each frame is assigned to its ‘best-matching unit’, so that similar frames are collected in the same unit.
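The quantization step can be sketched as follows, assuming a one-dimensional map whose units are stored as a list of weight vectors (the codebook). The helper names, the learning rate and the simple "bubble" neighbourhood are illustrative assumptions, not details of the system described here.

```python
def best_matching_unit(codebook, vector):
    """Index of the unit whose weight vector is closest to `vector`
    (squared Euclidean distance; ties go to the lowest index)."""
    def dist2(weights):
        return sum((w - v) ** 2 for w, v in zip(weights, vector))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))

def som_update(codebook, vector, lr=0.5, radius=1):
    """One training step of a 1-D map: move the best-matching unit and
    its neighbours within `radius` towards `vector` (modifies codebook)."""
    bmu = best_matching_unit(codebook, vector)
    for i, weights in enumerate(codebook):
        if abs(i - bmu) <= radius:
            codebook[i] = [w + lr * (v - w) for w, v in zip(weights, vector)]
    return bmu

def quantize(codebook, frames):
    """Replace every feature vector by the index of its best-matching unit."""
    return [best_matching_unit(codebook, f) for f in frames]
```

After training, `quantize` turns the sequence of feature vectors into a sequence of unit indices, one symbol per frame.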

While the first two stages produce an index of the audio data, the third stage accomplishes the retrieval. The sequence of best-matching units is treated as a text, and an approximate string matching algorithm searches it for substrings similar to the selected target pattern.
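A minimal sketch of such a search is given below, using a standard dynamic-programming formulation of approximate string matching: edit distance in which a match may start at any position of the text at no cost. It assumes that pattern and text are sequences of integer symbols (for instance unit indices) and that every insertion, deletion and substitution costs 1. The function name and the threshold parameter `max_dist` are hypothetical, and the cost scheme actually used by the system may differ.

```python
def approximate_matches(pattern, text, max_dist):
    """Return (end_index, distance) for every text position at which some
    substring ending there is within max_dist edits of the pattern."""
    m = len(pattern)
    prev = list(range(m + 1))   # column for the empty text prefix
    hits = []
    for j, symbol in enumerate(text):
        curr = [0]              # a match may begin anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == symbol else 1
            curr.append(min(prev[i - 1] + cost,   # match or substitution
                            prev[i] + 1,          # extra symbol in the text
                            curr[i - 1] + 1))     # pattern symbol skipped
        if curr[m] <= max_dist:
            hits.append((j, curr[m]))
        prev = curr
    return hits

# Usage: find occurrences of the target [1, 2, 3] with at most one edit.
matches = approximate_matches([1, 2, 3], [0, 1, 2, 3, 0, 1, 3, 0], 1)
```

Only one column of the distance table is kept at a time, so memory use grows with the length of the pattern and not with the length of the document.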

Figure: Block diagram of Sound Spotter

