Perceptual Hashing for the Identification of Telephone Speech

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung (Speech Communication - 10th ITG Symposium)
26.09.2012 - 28.09.2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4, Language: English, Type: PDF


Grutzek, Gary; Strobl, Julian; Mainka, Bernhard; Pörschmann, Christoph; Knospe, Heiko (Institute of Communications Engineering, Cologne University of Applied Sciences, 50679 Cologne, Germany)
Kurth, Frank (Fraunhofer Institute for Communication, Information Processing and Ergonomics FKIE, 53343 Wachtberg, Germany)

The hashing of audio content for the identification of specific recordings and their degraded versions has many applications; music identification in particular is well established. In this paper, the perceptual hashing of speech is investigated and applied to the content-based identification of telephone spam. Based on well-known audio fingerprinting methods, various modifications and extensions have been developed and compared. We explore index-based search methods for matching sequences of feature vectors. We investigate the influence of the hash size on the recognition rate and, in particular, on the search efficiency in a large and constantly updated fingerprint database, as found in a telephone speech scenario. It is shown that two 32-bit hashes with a unique time-distance allow for an efficient identification of telephone speech within a large call database.
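The abstract does not spell out the index structure, so the following is only a minimal sketch of one way an index keyed on two 32-bit hashes at a fixed time distance could work. It assumes that each call has already been reduced to a sequence of 32-bit per-frame hashes (the feature extraction is not shown), and it reads "unique time-distance" as a fixed frame offset between the two hashes, which is an interpretation rather than the paper's stated design. The names `PAIR_DISTANCE`, `pair_keys`, `PairIndex` and `min_votes` are hypothetical, as is the offset value of 8 frames.

```python
from collections import Counter, defaultdict

# Hypothetical frame offset between the two hashes that form one index key.
PAIR_DISTANCE = 8


def pair_keys(hashes, distance=PAIR_DISTANCE):
    """Yield (position, key); each key joins two 32-bit hashes `distance` frames apart."""
    for pos in range(len(hashes) - distance):
        first = hashes[pos] & 0xFFFFFFFF
        second = hashes[pos + distance] & 0xFFFFFFFF
        yield pos, (first << 32) | second


class PairIndex:
    """Inverted index from 64-bit pair keys to (call_id, position) postings."""

    def __init__(self, distance=PAIR_DISTANCE):
        self.distance = distance
        self.table = defaultdict(list)

    def add(self, call_id, hashes):
        """Insert a call; new calls can be added at any time without rebuilding."""
        for pos, key in pair_keys(hashes, self.distance):
            self.table[key].append((call_id, pos))

    def query(self, hashes, min_votes=2):
        """Return (call_id, offset, votes) for the best alignment, or None."""
        votes = Counter()
        for qpos, key in pair_keys(hashes, self.distance):
            # .get avoids creating empty postings lists for unseen keys.
            for call_id, dpos in self.table.get(key, ()):
                # Matching pairs from the same recording share one time offset.
                votes[(call_id, dpos - qpos)] += 1
        if not votes:
            return None
        (call_id, offset), count = votes.most_common(1)[0]
        return (call_id, offset, count) if count >= min_votes else None
```

In this sketch, a query excerpt is looked up pair by pair, and every hit votes for a (call, time offset) combination, so that a genuine match accumulates many votes at one consistent offset. Exact key lookup tolerates no bit errors within a pair; how the paper handles degraded hashes is not stated in the abstract and is left out here.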