ITG – Informationstechnische Gesellschaft im VDE (VDE ITG) (Ed.)

ITG-Fb. 321: Speech Communication

16th ITG Conference, 24. – 26.09.2025 in Berlin, Germany

ITG-Fachberichte

2025, VII, 182 pages, 140 x 124 mm, Slimlinebox, CD-Rom
ISBN 978-3-8007-6617-8, e-book: ISBN 978-3-8007-6618-5
Personal VDE Members are entitled to a 10% discount on this title

Foreword

This volume presents selected papers from the 16th ITG Conference on Speech Communication, held in Berlin, Germany, from September 24–26, 2025.
The contributions in this volume cover a broad spectrum of research in speech, audio, and spoken language processing, from fundamental advances to new applications. They reflect the interdisciplinary nature of the field, addressing both long-standing challenges and the new opportunities emerging with large-scale models and evolving technologies.
We would like to thank all authors for their valuable contributions, the reviewers for their careful evaluations, and the Organizing and Technical Committees for their dedication and effort.
We hope that this volume provides a useful reference for researchers in the field and offers food for thought for future research.
The VDE ITG is an interdisciplinary scientific society integrated into the cross-disciplinary network of the VDE. It acts as an interface for experts in information and communication technology (ICT) in industry, administration, teaching, and research. In close international collaboration, its members pool German expertise in the field of ICT. The VDE ITG promotes research on and application of this key technology, as well as its efficient use in data and communication technology and systems, environmental protection, medicine, and transport.
With its wide-ranging international network, the VDE ITG sees itself as a platform for innovation and knowledge transfer that supports successful cooperation between industry partners and research institutions. To this end, the ITG organizes a wide range of conferences, discussion sessions, and workshops. Through its studies and recommendations, the VDE ITG contributes its expertise to politics and society and participates in funding programs.
1

Error Analysis in a Modular Meeting Transcription System

Authors:
Vieting, Peter; Berger, Simon; von Neumann, Thilo; Boeddeker, Christoph; Schlueter, Ralf; Haeb-Umbach, Reinhold

2

Building a German-centric SpeechLLM Using Limited Data

Authors:
Maurya, Manas; Dethmann, Thomas; Walter, Oliver; Schmidt, Christoph Andreas; Koehler, Joachim

3

4

Unveiling Deep Speech Embeddings: Acoustic Insights into Hatespeech Detection

Authors:
Rammohan, Rathi Adarshi; Ren, Zhao; Swiderska, Aleksandra; Kuester, Dennis; Schultz, Tanja

5

6

Extending Manifold-Based MIMO System Identification to Adaptive Crosstalk Cancellation

Authors:
Hahn, Johannes; Kabzinski, Tobias; Jax, Peter

7

Speaker vs Noise Conditioning for Adaptive Speech Enhancement

Authors:
Triantafyllopoulos, Andreas; Tsangko, Iosif; Mueller, Michael; Schroeter, Hendrik; Schuller, Bjoern

8

YOLO-based Signal Detection of Amplitude Modulated Audio Transmissions in Realistic HF Scenarios

Authors:
Henneke, Lukas; Urrigshardt, Sebastian; Fritz, Fabian; Kurth, Frank

9

Effectiveness of Acceleration Sensors on the Thorax and Abdomen for Speech Breathing Analysis

Authors:
Kazzy, Dani; Kleiner, Christian; Fuchs, Susanne; Birkholz, Peter

10

Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection

Authors:
Amiri, Mahdi; Shahreza, Hatef Otroshi; Kodrasi, Ina

11

Navigating PESQ: Up-to-Date Versions and Open Implementations

Authors:
Torcoli, Matteo; Halimeh, Mhd Modar; Habets, Emanuel A. P.

12

Mixed-Effects Models Neural Networks for Improved Speech-Based Predictions of Cognitive Decline

Authors:
Behrendt, Jordan; Zhang, Jiumeng; Boernhorst, Claudia; Schultz, Tanja

13

Personalized Speech Synthesis for Zero-Shot Keyword Spotting

Authors:
Goekgoez, Fahrettin; Cornaggia-Urrigshardt, Alessia; Wilkinghoff, Kevin

14

Target Speaker Extraction: the Importance of a Powerful Extractor and Content-Informed Embeddings

Authors:
De Souter, Elias; Kindt, Stijn; Yang, Kaixuan; Zhao, Haixin; Song, Siyuan; Song, Yanjue; Madhu, Nilesh

15

16

17

Evaluating the Impact of Crowdsourced Audio Data on Speech Quality Assessment

Authors:
Shchegelskiy, Kirill; El-Tannir, Malek; Wardah, Wafaa; Kocak Bueyuektas, Tugce Melike; Moeller, Sebastian

18

A Comparative Analysis on ASR System Combination for Attention, CTC, Factored Hybrid, and Transducer Models

Authors:
Bayoumi, Noureldin; Schmitt, Robin; Raissi, Tina; Zeyer, Albert; Schlueter, Ralf; Ney, Hermann

19

Neural Prosody Prediction for German Articulatory Speech Synthesis

Authors:
Steiner, Peter; Huang, Zihao; Fietkau, Arne-Lukas; Birkholz, Peter

20

Adapting the Frechet Audio Distance as an Objective Metric for Text-to-Speech Quality Evaluation

Authors:
Zavistanavicius, Laurynas; Zalkow, Frank; Dittmar, Christian; Stevenson, Robert L.

21

22

Room Reverberation Effectively Masks Deepfake Traces

Authors:
Hoppe, Sophie; Hacker, Anabell; Brueckl, Markus

23

24

On the Application of Diffusion Models for Simultaneous Denoising and Dereverberation

Authors:
Meise, Adrian; Cord-Landwehr, Tobias; Haeb-Umbach, Reinhold

25

Binaural Distance Estimation Using a Joint Latent Representation of Acoustic Distance and Direct Path Response

Authors:
Neudek, Daniel; Stodt, Benjamin; Getzmann, Stephan; Martin, Rainer

26

Detecting COPD Exacerbations Before Onset Using Vocal Biomarkers

Authors:
Nippert, Lars; Simons, Sami O.; Hoxha, Julia

27

Comparison of Knowledge Distillation Methods for Low-complexity Multimicrophone Speech Enhancement using the FT-JNF Architecture

Authors:
Metzger, Robert; Ohlenbusch, Mattes; Rollwage, Christian; Doclo, Simon

28

29

Low-Complexity Neural Wind Noise Reduction for Audio Recordings

Authors:
Eftekhari, Hesam; Chetupalli, Srikanth Raj; Shetu, Shrishti Saha; Habets, Emanuel A. P.; Thiergart, Oliver

30

An Improved Neural Network Architecture for Target Speech Extraction

Authors:
Joos, David; Faubel, Friedrich; Jungclaussen, Jonas; Buck, Markus; Minker, Wolfgang

31

32

Evaluating the Recognition Performance of the RehaLingo Speech Training System with Aphasic Speech

Authors:
Hirsch, Hans-Guenter; Tiggelkamp, Yannic; Neumann, Christian; Bolten, Tobias

33

Optimization of Feature and Loss Exponents for Lightweight DNN-based Binaural Speech Enhancement

Authors:
Chinaev, Aleksej; Enzner, Gerald; Thaleiser, Stefan

34

Unified Learnable 2D Convolutional Feature Extraction for ASR

Authors:
Vieting, Peter; Hilmes, Benedikt; Schlueter, Ralf; Ney, Hermann

35

Blind Estimation of Head Rotations From Binaural Recordings

Authors:
Fleischhauer, Erik; Jax, Peter

36