Perceptual Hashing for the Identification of Telephone Speech

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
09/26/2012 - 09/28/2012 at Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4
Language: English
Type: PDF


Authors:
Grutzek, Gary; Strobl, Julian; Mainka, Bernhard; Pörschmann, Christoph; Knospe, Heiko (Institute of Communications Engineering, Cologne University of Applied Sciences, 50679 Cologne, Germany)
Kurth, Frank (Fraunhofer Institute for Communication, Information Processing and Ergonomics FKIE, 53343 Wachtberg, Germany)

Abstract:
The hashing of audio content for the identification of specific recordings and their degraded versions has many applications; music identification in particular is well established. In this paper, the perceptual hashing of speech is investigated and applied to the content-based identification of telephone spam. Based on well-known audio fingerprinting methods, various modifications and extensions have been developed and compared. We explore index-based search methods for matching sequences of feature vectors. We investigate the influence of the hash size on the recognition rate and, in particular, on the search efficiency in a large and constantly updated fingerprint database, as found in a telephone speech scenario. It is shown that two 32-bit hashes with a unique time distance allow for an efficient identification of telephone speech within a large call database.
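
The abstract does not spell out the index structure, so the following is only a minimal sketch of one way an index-based search over pairs of 32-bit hashes could look. It assumes that each call has already been reduced to a sequence of 32-bit sub-fingerprints (one per frame), that a lookup key is formed from two hashes a fixed number of frames apart, and that matches are ranked by voting on the time offset. The names `PAIR_DISTANCE`, `pair_keys`, `build_index` and `identify`, the distance of 8 frames, and the random stand-in hashes are all hypothetical and not taken from the paper.

```python
import random
from collections import Counter, defaultdict

PAIR_DISTANCE = 8  # assumed frame distance between the two hashes of one key


def pair_keys(hashes, distance=PAIR_DISTANCE):
    """Yield (frame, key); key joins two 32-bit hashes `distance` frames apart."""
    for t in range(len(hashes) - distance):
        first = hashes[t] & 0xFFFFFFFF
        second = hashes[t + distance] & 0xFFFFFFFF
        yield t, (first << 32) | second


def build_index(calls, distance=PAIR_DISTANCE):
    """Map every 64-bit pair key to the (call_id, frame) positions where it occurs."""
    index = defaultdict(list)
    for call_id, hashes in calls.items():
        for t, key in pair_keys(hashes, distance):
            index[key].append((call_id, t))
    return index


def identify(index, query, distance=PAIR_DISTANCE):
    """Return (call_id, offset, votes) for the best-supported alignment, or None."""
    votes = Counter()
    for t, key in pair_keys(query, distance):
        for call_id, t_db in index.get(key, ()):
            # Hits from the same call with the same offset support one alignment.
            votes[(call_id, t_db - t)] += 1
    if not votes:
        return None
    (call_id, offset), count = votes.most_common(1)[0]
    return call_id, offset, count


# Stand-in data: random 32-bit values replace real perceptual hashes.
rng = random.Random(0)
calls = {
    "spam_a": [rng.getrandbits(32) for _ in range(200)],
    "spam_b": [rng.getrandbits(32) for _ in range(200)],
}
query = calls["spam_a"][50:90]  # an undistorted excerpt of a stored call
result = identify(build_index(calls), query)
print(result)
```

In this toy setup the excerpt maps back to `spam_a` at a frame offset of 50. A real telephone channel would introduce bit errors into the hashes, so an exact-match key like this one would only fire where both hashes of a pair survive intact; how the paper handles that case is not stated in the abstract.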