Emonymous

Speaker Anonymization while Preserving the Emotional Expression Effect
(BMBF 2021-2023)

Thanks to technological advances in the field of artificial intelligence (AI), interactive and intelligent voice assistants are increasingly finding their way into everyday life. For data protection reasons, however, their use is mostly limited to applications in the private sphere. In particular, the possibility of identifying speakers from large amounts of collected data prevents the effective use of voice assistants in areas that are sensitive under data protection law, such as the healthcare sector or learning support. For many applications, however, the identity of the speaker is not relevant; only the content of what was said matters. Beyond this content, speech also carries other cues, such as emotionality and expression. In many areas of application, preserving these subtleties after anonymizing the speaker is essential for interpreting and fully understanding what was said (e.g. to correctly assess a patient’s state of health).

At the start of the project, comprehensive emotional speech data was collected and validated to create a solid basis for developing and training anonymization methods. New metrics for evaluating the emotional similarity between original and anonymized speech were designed and implemented to ensure that emotional expressions remain authentic. Resynthesis- and StarGAN-based methods for speaker anonymization were investigated; both showed promising results, with the StarGAN-based method in particular achieving a significant improvement in anonymization. This was followed by the development and application of Differentiable Digital Signal Processing (DDSP) techniques, which modify speaker-specific features while preserving the linguistic and emotional content.
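The project’s emotional-similarity metrics are not specified here. As one illustrative (hypothetical) instance of such a metric, the cosine similarity between emotion embeddings of the original and the anonymized utterance could be computed; in a real system the embeddings would come from a pretrained speech-emotion-recognition encoder, whereas the vectors below are toy values:

```python
import numpy as np

def emotion_similarity(emb_original, emb_anonymized):
    """Cosine similarity between two emotion-embedding vectors.

    Values close to 1 indicate that the anonymized utterance
    retains the emotional expression of the original.
    """
    a = np.asarray(emb_original, dtype=float)
    b = np.asarray(emb_anonymized, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example with hypothetical 3-dimensional emotion embeddings
# (e.g. activations for "happy", "sad", "angry").
original = [0.9, 0.1, 0.0]
anonymized = [0.85, 0.15, 0.0]   # emotion largely preserved
score = emotion_similarity(original, anonymized)
```

A score near 1 would then suggest that anonymization left the emotional expression largely intact, while a score near 0 would flag a loss of emotional content.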

Official project brief of the funding initiative

Deutschlandfunk radio interview with Ingo Siegert about the project (in German)

Project Partners

References

2025

  1. StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation
    Suhita Ghosh, Melanie Jouaiti, Jan-Ole Perschewski, and Sebastian Stober
    In Interspeech 2025, Aug 2025
  2. Investigating Inclusivity of Whisper for Dysfluent Speech
    Evelyn Starzew, Suhita Ghosh, and Valerie Krug
    In 12th edition of the Disfluency in Spontaneous Speech Workshop (DiSS 2025), Sep 2025

2024

  1. Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
    Suhita Ghosh, Melanie Jouaiti, Arnab Das, Yamini Sinha, Tim Polzehl, Ingo Siegert, and Sebastian Stober
    In Interspeech 2024, Sep 2024
  2. Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
    Suhita Ghosh, Tim Thiele, Frederic Lorbeer, and Sebastian Stober
    In Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation, 2024
  3. T-DVAE: A Transformer-Based Dynamical Variational Autoencoder for Speech
    Jan-Ole Perschewski and Sebastian Stober
    In Artificial Neural Networks and Machine Learning – ICANN 2024, 2024

2023

  1. Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses
    Suhita Ghosh, Yamini Sinha, Ingo Siegert, and Sebastian Stober
    In 49. Jahrestagung für Akustik DAGA 2023, Hamburg, Mar 2023
  2. Anonymization of Stuttered Speech – Removing Speaker Information while Preserving the Utterance
    Jan Hintz, Sebastian Bayerl, Yamini Sinha, Suhita Ghosh, Martha Schubert, Sebastian Stober, Korbinian Riedhammer, and Ingo Siegert
    In 3rd Symposium on Security and Privacy in Speech Communication, Aug 2023
  3. StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings
    Arnab Das, Suhita Ghosh, Tim Polzehl, Ingo Siegert, and Sebastian Stober
    In 12th ISCA Speech Synthesis Workshop (SSW2023), Aug 2023
  4. Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion
    Suhita Ghosh, Arnab Das, Yamini Sinha, Ingo Siegert, Tim Polzehl, and Sebastian Stober
    In Interspeech 2023, Aug 2023

2022

  1. Voice Privacy - Leveraging Multi-Scale Blocks with ECAPA-TDNN SE-Res2NeXt Extension for Speaker Anonymization
    Razieh Khamsehashari, Yamini Sinha, Jan Hintz, Suhita Ghosh, Tim Polzehl, Carlos Franzreb, Sebastian Stober, and Ingo Siegert
    In 2nd Symposium on Security and Privacy in Speech Communication, Sep 2022