Phoneme Boundary Detection using Deep Bidirectional LSTMs

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
10/05/2016 - 10/07/2016 at Paderborn, Germany

Proceedings: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Franke, Joerg; Mueller, Markus; Stueker, Sebastian; Waibel, Alex (Karlsruhe Institute of Technology, Karlsruhe, Germany)
Hamlaoui, Fatima (Zentrum für Allgemeine Sprachwissenschaft, Berlin, Germany)

Abstract:
In this paper we investigate the automatic detection of phoneme boundaries in audio recordings using deep bidirectional LSTMs. This work is motivated by the needs of the BULB project, which aims to support linguists in documenting unwritten languages; automatic detection of phoneme boundaries in audio recordings of a new language is one of the project's technical requirements. For our first experiments with LSTMs on this task, we worked on TIMIT and BUCKEYE and measured the performance of our LSTMs using accuracy, precision, recall and F-measure. We then applied the trained networks cross-lingually to Basaa, one of the Bantu languages addressed in BULB. With the LSTMs trained for this paper, we achieve a phoneme segmentation performance on TIMIT that, to the best of our knowledge, outperforms the systems reported in the literature so far.