Rubric: Proceedings - ITG Reports

ITG – Informationstechnische Gesellschaft im VDE (VDE ITG) (Ed.)

ITG-Fb. 321: Speech Communication

16th ITG Conference, 24. – 26.09.2025 in Berlin, Germany

ITG-Fachberichte

2025, VII, 182 pages, 140 x 124 mm, Slimlinebox, CD-Rom
ISBN 978-3-8007-6617-8, e-book: ISBN 978-3-8007-6618-5
Personal VDE Members are entitled to a 10% discount on this title

CD-ROM: 118.00 €; e-book/PDF: 118.00 €

This volume presents selected papers from the 16th ITG Conference on Speech Communication, held in Berlin, Germany, from September 24–26, 2025.
The contributions in this volume cover a broad spectrum of research in speech, audio, and spoken language processing, from fundamental advances to new applications. They reflect the interdisciplinary nature of the field, addressing both long-standing challenges and the new opportunities emerging with large-scale models and evolving technologies.
We would like to thank all authors for their valuable contributions, the reviewers for their careful evaluations, and the Organizing and Technical Committees for their dedication and effort.
We hope that this volume provides a useful reference for researchers in the field and gives food for thought for future research.

The VDE ITG is an interdisciplinary scientific society embedded in the cross-disciplinary network of the VDE. It acts as an interface for experts in information and communication technology (ICT) in industry, public administration, teaching, and research. With close international ties, its members pool German expertise in the field of ICT. The VDE ITG promotes research into and application of this key technology, as well as its efficient use in data and communication technology and systems, environmental protection, medicine, and transport.
With its wide-ranging international network, the VDE ITG sees itself as a platform for innovation and knowledge transfer that supports successful cooperation between industry partners and research institutions. To this end, the ITG organizes a wide range of conferences, discussion sessions, and workshops. Through its studies and recommendations, the VDE ITG contributes its expertise to politics and society and takes part in funding programs.

These conference proceedings contain the following papers, each available for purchase as a PDF download with payment by credit card or PayPal:

Error Analysis in a Modular Meeting Transcription System

Authors:

Vieting, Peter; Berger, Simon; von Neumann, Thilo; Boeddeker, Christoph; Schlueter, Ralf; Haeb-Umbach, Reinhold

Building a German-centric SpeechLLM Using Limited Data

Authors:

Maurya, Manas; Dethmann, Thomas; Walter, Oliver; Schmidt, Christoph Andreas; Koehler, Joachim

Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement

Authors:

Li, Jiatong; Doclo, Simon

Unveiling Deep Speech Embeddings: Acoustic Insights into Hatespeech Detection

Authors:

Rammohan, Rathi Adarshi; Ren, Zhao; Swiderska, Aleksandra; Kuester, Dennis; Schultz, Tanja

The Role of Surprisal and Entropy in Spoken Intercomprehension: An Experiment on Translation of Cognates with Varied Predictability

Authors:

Xue, Wei; Steuer, Julius; Klakow, Dietrich; Moebius, Bernd

Extending Manifold-Based MIMO System Identification to Adaptive Crosstalk Cancellation

Authors:

Hahn, Johannes; Kabzinski, Tobias; Jax, Peter

Speaker vs Noise Conditioning for Adaptive Speech Enhancement

Authors:

Triantafyllopoulos, Andreas; Tsangko, Iosif; Mueller, Michael; Schroeter, Hendrik; Schuller, Bjoern

YOLO-based Signal Detection of Amplitude Modulated Audio Transmissions in Realistic HF Scenarios

Authors:

Henneke, Lukas; Urrigshardt, Sebastian; Fritz, Fabian; Kurth, Frank

Effectiveness of Acceleration Sensors on the Thorax and Abdomen for Speech Breathing Analysis

Authors:

Kazzy, Dani; Kleiner, Christian; Fuchs, Susanne; Birkholz, Peter

Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection

Authors:

Amiri, Mahdi; Shahreza, Hatef Otroshi; Kodrasi, Ina

Navigating PESQ: Up-to-Date Versions and Open Implementations

Authors:

Torcoli, Matteo; Halimeh, Mhd Modar; Habets, Emanuel A. P.

Mixed-Effects Models Neural Networks for Improved Speech-Based Predictions of Cognitive Decline

Authors:

Behrendt, Jordan; Zhang, Jiumeng; Boernhorst, Claudia; Schultz, Tanja

Personalized Speech Synthesis for Zero-Shot Keyword Spotting

Authors:

Goekgoez, Fahrettin; Cornaggia-Urrigshardt, Alessia; Wilkinghoff, Kevin

Target Speaker Extraction: the Importance of a Powerful Extractor and Content-Informed Embeddings

Authors:

De Souter, Elias; Kindt, Stijn; Yang, Kaixuan; Zhao, Haixin; Song, Siyuan; Song, Yanjue; Madhu, Nilesh

Enhancement of Neural Embeddings for Speaker Identification in Ad-hoc Acoustic Sensor Networks and Multi-Speaker Scenarios

Authors:

Intek, Philipp; Becker, Luca; Koppelmann, Timm; Martin, Rainer

Comparison of Knowledge Distillation Methods for Low-complexity Multimicrophone Speech Enhancement using the FT-JNF Architecture

Authors:

Metzger, Robert; Ohlenbusch, Mattes; Rollwage, Christian; Doclo, Simon

Acoustical characterization and perceptual comparison of four types of 3D-printed vocal tract models for the German and Japanese vowels /a,e,i,o,u/

Authors:

Kleiner, Christian; Birkholz, Peter; Schaefer, Dominik; Arai, Takayuki

A fully Zero-shot Approach to Obtaining Specialized and Compact Audio Tagging Models

Evaluating the Impact of Crowdsourced Audio Data on Speech Quality Assessment

A Comparative Analysis on ASR System Combination for Attention, CTC, Factored Hybrid, and Transducer Models

Neural Prosody Prediction for German Articulatory Speech Synthesis

Adapting the Frechet Audio Distance as an Objective Metric for Text-to-Speech Quality Evaluation

Towards Complex-Valued VAE-Based Distillation for Representation Learning in Speech Enhancement

Room Reverberation Effectively Masks Deepfake Traces

Early and Late Reflections in Acoustic Echo Control: An Experimental Study on (Neural) Kalman Filters and DNN Methods

On the Application of Diffusion Models for Simultaneous Denoising and Dereverberation

Binaural Distance Estimation Using a Joint Latent Representation of Acoustic Distance and Direct Path Response

Detecting COPD Exacerbations Before Onset Using Vocal Biomarkers

Low-Complexity Neural Wind Noise Reduction for Audio Recordings

An Improved Neural Network Architecture for Target Speech Extraction

A Very-Low Delay High-Performance Speech Vocoder Based on the Encodec Speech Decoder

Evaluating the Recognition Performance of the RehaLingo Speech Training System with Aphasic Speech

Optimization of Feature and Loss Exponents for Lightweight DNN-based Binaural Speech Enhancement

Unified Learnable 2D Convolutional Feature Extraction for ASR

Blind Estimation of Head Rotations From Binaural Recordings

ReverbFX: A Dataset of Room Impulse Responses Derived from Reverb Effect Plugins for Singing Voice Dereverberation
