Keyword Detection for the Activation of Speech Assistants

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10.10.2018 - 12.10.2018 in Oldenburg, Germany

Proceedings: ITG-Fb. 282: Speech Communication

Pages: 5
Language: English
Type: PDF

Hirsch, Hans-Guenter; Gref, Michael (Institute for Pattern Recognition, Niederrhein University of Applied Sciences, Krefeld, Germany)

Detecting and recognizing a spoken keyword is the most convenient way to activate a speech assistant. Keyword detection normally has to run on a client system with only limited computational resources. Nevertheless, both false acceptances (detecting a keyword that was not spoken) and false rejections (missing a keyword that was spoken) should occur as rarely as possible. We present a two-stage algorithm designed with low computational requirements as the main criterion. The first stage compares the emission probabilities and the accumulated probabilities with a set of thresholds during Viterbi decoding with a keyword HMM. To lower the number of erroneous detections, a second processing stage analyzes the Mel spectrum of a detected keyword by means of a neural network. Initial recognition experiments on speech data that does not contain the keyword show that the first stage alone already yields a fairly low false acceptance rate (FAR). The second stage reduces this FAR considerably.
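
The first stage can be illustrated with a minimal sketch. The abstract does not say how the thresholds are applied, so everything below is an assumption made for illustration: a strictly left-to-right keyword HMM, one threshold on the per-frame emission log-probability (paths falling below it are pruned), and one threshold on the length-normalized accumulated log-probability of the best path. The function name detect_keyword, its parameters, and the threshold values are hypothetical and are not taken from the paper.

```python
import math

NEG_INF = float("-inf")


def detect_keyword(frame_loglik, log_self, log_next, emis_thresh, accum_thresh):
    """Viterbi pass over a left-to-right keyword HMM with two threshold checks.

    frame_loglik[t][s] is the log emission probability of frame t in state s.
    Each state may loop (log_self) or advance to the next state (log_next).
    Returns (accepted, length-normalized log score of the best path).
    All thresholding rules here are illustrative assumptions.
    """
    n_frames = len(frame_loglik)
    n_states = len(frame_loglik[0])

    # The path must start in the first state; apply the emission check there too.
    if frame_loglik[0][0] < emis_thresh:
        return False, NEG_INF
    score = [NEG_INF] * n_states
    score[0] = frame_loglik[0][0]

    for t in range(1, n_frames):
        new_score = [NEG_INF] * n_states
        for s in range(n_states):
            stay = score[s] + log_self
            move = score[s - 1] + log_next if s > 0 else NEG_INF
            best = max(stay, move)
            if best == NEG_INF:
                continue  # state not reachable at this frame
            # Assumed emission check: prune paths whose frame fits the state badly.
            if frame_loglik[t][s] < emis_thresh:
                continue
            new_score[s] = best + frame_loglik[t][s]
        score = new_score
        if all(v == NEG_INF for v in score):
            return False, NEG_INF  # every path pruned: abort early

    final = score[-1]  # the keyword must end in the last state
    if final == NEG_INF:
        return False, NEG_INF
    # Assumed accumulated check: normalize by length, then compare.
    normalized = final / n_frames
    return normalized >= accum_thresh, normalized


# Toy data: 6 frames, 3 states; a well-matching frame scores -1, a poor one -10.
GOOD, BAD = -1.0, -10.0
keyword_frames = [
    [GOOD, BAD, BAD], [GOOD, BAD, BAD],
    [BAD, GOOD, BAD], [BAD, GOOD, BAD],
    [BAD, BAD, GOOD], [BAD, BAD, GOOD],
]
accepted, score = detect_keyword(
    keyword_frames, math.log(0.5), math.log(0.5),
    emis_thresh=-5.0, accum_thresh=-3.0,
)
```

Pruning on the emission threshold lets the decoder stop as soon as no path survives, which suits the low computational budget named in the abstract. In this sketch, a candidate that passes both checks would then be handed to the second stage for verification.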