Silvan Mertes M.Sc.
Phone: +49 (821) 598 - 2342
Email: silvan.mertes@informatik.uni-augsburg.de
Room: 2038 (N)
Address: Universitätsstraße 6a, 86159 Augsburg
Research Interests
- Deep Learning
- Adversarial Learning
- Generative Models
- Sound and Image Processing
Academic Activities
- Review activities for Transactions on Affective Computing
- Review activities for ACM Conference on Human Factors in Computing Systems (CHI)
- Review activities for IEEE Signal Processing Magazine
- Review activities for International Conference on Multimodal Interaction (ICMI)
- Review activities for Transactions on Audio, Speech and Language Processing
- Review activities for Applied Artificial Intelligence
- Review activities for XAI2023 (XAI@IJCAI)
- Review activities for European Conference on Artificial Intelligence (ECAI)
- Review activities for IEEE Robotics and Automation Letters
- Review activities for Elsevier Expert Systems With Applications
- Review activities for International Conference on Affective Computing & Intelligent Interaction (ACII)
- Review activities for Audio Mostly
- Review activities for PeerJ Computer Science
- Coordinator of Human-Centered Production Technologies in the AI Production Network Augsburg
- Program Committee member, ACM Conference on Intelligent User Interfaces (IUI) 2025
- Program Committee member, International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2025
- Scientific Committee member, Audio Mostly 2024
- Organizing Committee member, Interdisciplinary Tutorshop on Interactions with Embodied Virtual Agents at IVA 2024
- Session Chair, 5th International Conference on Deep Learning Theory and Applications (DeLTA’24)
- Program Committee member, Trustworthy Sequential Decision-making and Optimization Workshop at ECAI 2024
- Program Committee member, International Conference on Affective Computing & Intelligent Interaction (ACII) 2024
- Program Committee member, Workshop on Explainable Artificial Intelligence at IJCAI 2023
- Session Chair, 2nd International Conference on Deep Learning Theory and Applications (DeLTA’21)
- Program Committee member, International Conference on Multimodal Interaction (ICMI) 2021-2023
Awards
- International Conference on Deep Learning Theory and Applications (DeLTA 2020) - Best Paper Award
- IEEE Virtual Reality (IEEE VR 2022) - Honorable Mention
- Creativity & Cognition (C&C 2022) - Honorable Mention
- ACII A-VB Challenge 2022 "Type" Subtask - 1st Place
- ComParE Challenge 2021 "Escalation Detection" Subtask - 2nd Place
- International Conference on Deep Learning Theory and Applications (DeLTA 2024) - Best Poster Award
Projects
Supervised Theses
- Automatic Generation of Soundscapes Using Deep Learning. (Bachelor, 2024)
- Generating Personalized Counterfactual Feedback for Javelin Throw Technique Improvement. (Bachelor, 2024)
- Automatic Colorization of Manga Using Deep Learning. (Bachelor, 2024)
- Targeted Manipulation of the Environment and Appearance of Virtual Characters in Images Using Diffusion Models. (Bachelor, 2024)
- Design and Implementation of a User-Friendly Graphical Interface for Multimodal Emotion Recognition. (Bachelor, 2024)
- Development of an Interactive, Machine Learning-Supported Training System for Extreme Vocal Techniques. (Bachelor, 2024, co-supervised)
- Computer-assisted Feedback for Javelin Throw. (Bachelor, 2024, co-supervised)
- Texture Editing with Diffusion Models. (Project Module, 2024)
- Grad-CAM for Analyzing GAN Training Processes. (Bachelor, 2024)
- Using CycleGAN to Learn Image-to-Image Translation for Unpaired Facial Expression Data. (Master, 2023, co-supervised)
- Computational Generation and Adaption of Climbing Routes through Adversarial Learning. (Master, 2023, co-supervised)
- Generating Audio Triggers for an Autonomous Sensory Meridian Response with Generative Adversarial Networks. (Bachelor, 2023)
- Diffusion-based Counterfactual Explanation Generation for Facial Emotion Recognition. (Project Module, 2023)
- Using GANs for Combining Counterfactual Explanations and Feature Attribution. (Master, 2023)
- Evaluating GAN-based Alterfactual Explanation Generation. (Project Module, 2023)
- Exploring Tangible User Interfaces for Latent Space Manipulation of Generative Adversarial Networks. (Bachelor, 2022, co-supervised)
- Implementation of a Classification Model for Rhythmic Attunement in Music Therapy Sessions. (Bachelor, 2022, co-supervised)
- Generating Counterfactual Explanations for Atari Agents via Generative Adversarial Networks. (Master, 2022, co-supervised)
- Alterfactuals as a Novel Explanation Method for Image Classifiers. (Master, 2021)
- Exploring Opportunities for Musical Creativity Support in VR through Human-Computer Interfaces and Interaction Design. (Master, 2021, co-supervised)
- Reinforcement Learning Techniques as Enhancement of Frame-Level Speech Emotion Recognition. (Master, 2021, co-supervised)
- Contrasting Chatbot Personas in an Internal Business Environment: Development and Preference Analysis. (Master, 2021)
- Conditional Human Image Synthesis with Generative Adversarial Networks. (Bachelor, 2020)
Open Thesis Topics
The following topics can be varied flexibly in scope and orientation, so they can be realized as a bachelor thesis, a master thesis, or a project module. The focus of each topic can of course also be adapted to the student's interests.
I am also always happy to receive your own topic suggestions, as long as they overlap to some degree with my research focus.
Alterfactual Explanations
Alterfactual explanations are a novel approach to explaining artificial intelligence: input data are modified in such a way that only features that are irrelevant to the AI's decision change. The goal of this thesis is to apply existing GAN-based algorithms for generating alterfactual explanations to several datasets and then to evaluate the concept of alterfactuals in a user study.
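A minimal sketch of the underlying idea, assuming a pretrained PyTorch generator `G` and classifier `clf` (hypothetical placeholders, not an existing implementation): the latent vector is optimized so that the image changes as much as possible while the classifier's prediction stays the same.

```python
import torch
import torch.nn.functional as F

def alterfactual(G, clf, z_orig, steps=300, lr=0.05, lam=10.0):
    """Search a latent vector whose image looks different from G(z_orig)
    while the classifier's prediction stays (approximately) unchanged."""
    x_orig = G(z_orig).detach()
    target = F.softmax(clf(x_orig), dim=-1).detach()      # prediction to preserve
    z = z_orig.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x = G(z)
        log_pred = F.log_softmax(clf(x), dim=-1)
        # maximize visual change, penalize any drift of the prediction
        loss = -torch.mean((x - x_orig) ** 2) \
               + lam * F.kl_div(log_pred, target, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G(z).detach()
```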
Audio Diffusion Models
Diffusion models are the latest generation of generative artificial intelligence, popularized by applications such as "DALL-E 2" and "Midjourney". This thesis investigates whether diffusion models can be used to turn text descriptions into audio data, as is already well established in the field of image generation.
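For illustration, a minimal DDPM-style sampling loop over mel-spectrograms; the trained noise predictor `eps_model` and the text encoder `encode_text` are assumed to exist and are purely hypothetical here.

```python
import torch

def sample_spectrogram(eps_model, encode_text, prompt, shape=(1, 1, 80, 256), T=1000):
    """Generate a text-conditioned mel-spectrogram by reverse diffusion."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    cond = encode_text(prompt)                       # text conditioning vector
    x = torch.randn(shape)                           # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, torch.tensor([t]), cond)  # predict the added noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise      # one reverse-diffusion step
    return x  # mel-spectrogram; a vocoder would turn this into a waveform
```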
Interactive Teaching System with Diffusion Models
Diffusion models are the latest generation of generative artificial intelligence, popularized by applications such as "DALL-E 2" and "Midjourney", which can generate high-quality images from text descriptions. Diffusion models also make it possible to regenerate parts of an existing image ("inpainting"). This thesis exploits this capability to implement an interactive explanation system by combining diffusion models with techniques from the field of XAI.
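A rough sketch of how such a system could couple a saliency map with an off-the-shelf inpainting pipeline from the Hugging Face `diffusers` library; the classifier `clf`, the input image, and the checkpoint name are assumptions for illustration, and plain gradient saliency stands in for any XAI attribution method.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

def explain_by_inpainting(clf, img_tensor, img_pil, prompt, threshold=0.6):
    """Mask the classifier-relevant pixels (plain gradient saliency) and let a
    diffusion inpainting model regenerate exactly that region."""
    x = img_tensor.clone().requires_grad_(True)
    clf(x.unsqueeze(0)).max().backward()                  # gradient of the top score
    sal = x.grad.abs().sum(dim=0)                         # aggregate over channels
    mask = (sal / sal.max() > threshold).float().cpu().numpy()
    mask_img = Image.fromarray((mask * 255).astype(np.uint8)).resize(img_pil.size)

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting"       # example checkpoint only
    )
    return pipe(prompt=prompt, image=img_pil, mask_image=mask_img).images[0]
```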
Text-to-Speech with Diffusion Models
Diffusion models are the latest generation of generative artificial intelligence, popularized by applications such as "DALL-E 2" and "Midjourney". This thesis investigates whether diffusion models can be used to convert text into audio in order to obtain a high-quality text-to-speech system.
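A minimal, hypothetical training-step sketch for a diffusion-based acoustic model: the network `model` learns to predict the noise added to a mel-spectrogram, conditioned on encoded phonemes; all modules and shapes are assumptions, and a neural vocoder would later turn the generated mels into audio.

```python
import torch
import torch.nn.functional as F

def diffusion_tts_step(model, phoneme_encoder, phonemes, mel, alpha_bar):
    """One denoising-diffusion training step (noise-prediction objective)."""
    B = mel.size(0)
    t = torch.randint(0, alpha_bar.size(0), (B,))               # random timestep
    a = alpha_bar[t].view(B, 1, 1)
    noise = torch.randn_like(mel)
    noisy_mel = torch.sqrt(a) * mel + torch.sqrt(1 - a) * noise  # forward process
    cond = phoneme_encoder(phonemes)                             # text conditioning
    predicted = model(noisy_mel, t, cond)                        # predict the noise
    return F.mse_loss(predicted, noise)                          # simplified DDPM loss
```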
Audio Counterfactual Explanations
In this thesis, a system is to be developed that generates explanations for AI systems in the audio domain based on Latent Vector Evolution (LVE). LVE is a method based on evolutionary algorithms for searching the latent space of GANs. Using these algorithms, counterfactual explanations are to be generated: audio data rated by an AI are modified so that the AI's rating changes. This shows the user of the system an "alternative reality" that is intended to foster a better understanding of the AI.
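A minimal sketch of the LVE idea using a simple evolution strategy; the audio GAN `G` and the classifier `clf` are assumed, pretrained placeholders.

```python
import torch

@torch.no_grad()
def lve_counterfactual(G, clf, z_start, target_class,
                       generations=200, pop_size=32, sigma=0.1):
    """Evolve a latent vector until G(z) is classified as `target_class`."""
    z_best = z_start.clone()                              # shape (1, latent_dim)
    best_score = clf(G(z_best))[0, target_class]
    for _ in range(generations):
        # mutate the current best latent vector into a small population
        candidates = z_best + sigma * torch.randn(pop_size, z_best.size(1))
        scores = clf(G(candidates))[:, target_class]
        top = scores.argmax()
        if scores[top] > best_score:                      # keep the best mutant
            best_score, z_best = scores[top], candidates[top:top + 1]
    return G(z_best)   # audio whose classification has shifted toward target_class
```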
Video Style Conversion with Diffusion Models
Diffusion models are the latest generation of generative artificial intelligence, popularized by applications such as "DALL-E 2" and "Midjourney". Diffusion models can, for example, be used to change the style of an image (e.g. from photorealistic to comic-like). In this thesis, an existing diffusion model architecture is to be extended to change the style of videos.
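As a naive per-frame baseline (whose temporal inconsistency the thesis would address), one could run an image-to-image diffusion pipeline on every frame; the checkpoint name, the use of `imageio` (with its ffmpeg plugin) for video I/O, and all parameters are illustrative assumptions.

```python
import imageio.v2 as imageio
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"                      # example checkpoint only
)
# pipe.to("cuda") would speed this up considerably on a GPU machine
reader = imageio.get_reader("input.mp4")
writer = imageio.get_writer("stylized.mp4", fps=reader.get_meta_data()["fps"])

for frame in reader:
    gen = torch.Generator().manual_seed(0)                # identical noise for every frame
    styled = pipe(prompt="comic book style",
                  image=Image.fromarray(frame).resize((512, 512)),
                  strength=0.4,                           # keep most of the frame content
                  generator=gen).images[0]
    writer.append_data(np.array(styled))
writer.close()
```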
GUI Design for Social Signal Processing Framework
In this thesis, a functional and appealing graphical user interface is to be designed and implemented for an existing Python framework developed at our lab. Current developments and research in user interface design and user experience should inform the concept.
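Purely as an illustration of the intended separation between GUI layout and framework calls, a minimal Tkinter skeleton could look like this; the button callbacks are placeholders, since the framework's actual API is not specified here.

```python
import tkinter as tk
from tkinter import filedialog, ttk

def load_data():
    path = filedialog.askopenfilename(title="Select recording")
    status.set(f"Loaded: {path}" if path else "No file selected")
    # here the framework's data-loading routine would be called

def run_analysis():
    status.set("Running analysis ...")
    # here the framework's processing pipeline would be triggered

root = tk.Tk()
root.title("Social Signal Processing - GUI prototype")
status = tk.StringVar(value="Ready")

ttk.Button(root, text="Load data", command=load_data).pack(fill="x", padx=10, pady=4)
ttk.Button(root, text="Run analysis", command=run_analysis).pack(fill="x", padx=10, pady=4)
ttk.Label(root, textvariable=status).pack(padx=10, pady=8)

root.mainloop()
```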
Teaching
Name | Semester | Type |
---|---|---|
Seminar: Foundations of Generative Artificial Intelligence | winter semester 2024/25 | Seminar |
Seminar: Generative Artificial Intelligence | winter semester 2024/25 | Seminar |
Exercise for Generative AI for Human-Computer Interaction Lab | winter semester 2024/25 | Exercise |
Practical Course: Game Programming | winter semester 2024/25 | Practical course |
Generative AI for Human-Computer Interaction Lab | winter semester 2024/25 | Lecture |
Publications
2024 |
Johanna Holzinger, Alexander Heimerl, Ruben Schlagowski, Elisabeth André and Silvan Mertes. 2024. A machine learning-driven interactive training system for extreme vocal techniques. In Luca Andrea Ludovico, Davide Andrea Mauro (Eds.). AM '24: proceedings of the 19th International Audio Mostly Conference: Explorations in Sonic Cultures, September 18-20, 2024, Milan, Italy. ACM, New York, NY, 348-354 DOI: 10.1145/3678299.3678334 |
Fabio Hellmann, Elisabeth André, Mohamed Benouis, Benedikt Buchner and Silvan Mertes. 2024. Anonymization of faces: technical and legal perspectives. Datenschutz und Datensicherheit - DuD 48, 6, 364-367. DOI: 10.1007/s11623-024-1938-6 |
Fabio Hellmann, Silvan Mertes, Mohamed Benouis, Alexander Hustinx, Tzung-Chien Hsieh, Cristina Conati, Peter Krawitz and Elisabeth André. in press. GANonymization: a GAN-based face anonymization framework for preserving emotional expressions. ACM Transactions on Multimedia Computing, Communications, and Applications 3641107. DOI: 10.1145/3641107 |
Pol van Rijn, Silvan Mertes, Kathrin Janowski, Katharina Weitz, Nori Jacoby and Elisabeth André. 2024. Giving robots a voice: human-in-the-loop voice creation and open-ended labeling. In Florian Floyd Mueller, Penny Kyburz, Julie R. Williamson, Corina Sas, Max L. Wilson, Phoebe Toups Dugas, Irina Shklovski (Eds.). CHI '24: proceedings of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, May 11-16, 2024. Association for Computing Machinery (ACM), New York, NY, 584 DOI: 10.1145/3613904.3642038 |
Silvan Mertes, Tobias Huber, Christina Karle, Katharina Weitz, Ruben Schlagowski, Cristina Conati and Elisabeth André. in press. Relevant irrelevance: generating alterfactual explanations for image classifiers. preprint. DOI: 10.48550/arXiv.2405.05295 |
Luuk H. Boulogne, Julian Lorenz, Daniel Kienzle, Robin Schön, Katja Ludwig, Rainer Lienhart, Simon Jegou, Guang Li, Cong Chen, Qi Wang, Derik Shi, Mayug Maniparambil, Dominik Müller, Silvan Mertes, Niklas Schröter, Fabio Hellmann, Miriam Elia, Ine Dirks, Matias Nicolas Bossa, Abel Diaz Berenguer, Tanmoy Mukherjee, Jef Vandemeulebroucke, Hichem Sahli, Nikos Deligiannis, Panagiotis Gonidakis, Ngoc Dung Huynh, Imran Razzak, Reda Bouadjenek, Mario Verdicchio, Pasquale Borrelli, Marco Aiello, James A. Meakin, Alexander Lemm, Christoph Russ, Razvan Ionasec, Nikos Paragios, Bram van Ginneken and Marie-Pierre Revel Dubios. 2024. The STOIC2021 COVID-19 AI challenge: applying reusable training methodologies to private data. Medical Image Analysis 97, 103230. DOI: 10.1016/j.media.2024.103230 |
Ruben Schlagowski, Maurizio Volanti, Katharina Weitz, Silvan Mertes, Johanna Kuch and Elisabeth André. 2024. The feeling of being classified: raising empathy and awareness for AI bias through perspective-taking in VR. Frontiers in Virtual Reality 5, 1340250. DOI: 10.3389/frvir.2024.1340250 |
Ruben Schlagowski, Silvan Mertes, Dariia Nazarenko, Alexander Dauber and Elisabeth André. 2024. XR composition in the wild: the impact of user environments on creativity, UX and flow during music production in augmented reality. In Luca Andrea Ludovico, Davide Andrea Mauro (Eds.). AM '24: proceedings of the 19th International Audio Mostly Conference: Explorations in Sonic Cultures, September 18-20, 2024, Milan, Italy. ACM, New York, NY, 152-161 DOI: 10.1145/3678299.3678314 |
2023 |
Silvan Mertes, Marcel Strobl, Ruben Schlagowski and Elisabeth André. 2023. ASMRcade: interactive audio triggers for an autonomous sensory meridian response. In Elisabeth André, Mohamed Chetouani, Dominique Vaufreydaz, Gale Lucas, Tanja Schultz, Louis-Philippe Morency, Alessandro Vinciarelli (Eds.). ICMI '23: proceedings of the 25th International Conference on Multimodal Interaction, October 9-13, 2023, Paris, France. ACM, New York, NY, 70-78 DOI: 10.1145/3577190.3614155 |
Andreas Triantafyllopoulos, Bjorn W. Schuller, Gokce Iymen, Metin Sezgin, Xiangheng He, Zijiang Yang, Panagiotis Tzirakis, Shuo Liu, Silvan Mertes, Elisabeth André, Ruibo Fu and Jianhua Tao. 2023. An overview of affective speech synthesis and conversion in the deep learning era. Proceedings of the IEEE 111, 10, 1355-1381. DOI: 10.1109/jproc.2023.3250266 |
Tobias Huber, Maximilian Demmler, Silvan Mertes, Matthew Olson and Elisabeth André. 2023. GANterfactual-RL: understanding reinforcement learning agents' strategies through visual counterfactual explanations. In Noa Agmon, Bo An, Alessandro Ricci, William Yeoh (Eds.). AAMAS '23: proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 29 May - 2 June 2023, London, United Kingdom. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 1097-1106 |
Dominik Mueller, Silvan Mertes, Niklas Schroeter, Fabio Hellmann, Miriam Elia, Bernhard Bauer, Wolfgang Reif, Elisabeth André and Frank Kramer. 2023. Towards automated COVID-19 presence and severity classification. In Maria Hägglund, Madeleine Blusi, Stefano Bonacina, Lina Nilsson, Inge Cort Madsen, Sylvia Pelayo, Anne Moen, Arriel Benis, Lars Lindsköld and Parisis Gallos (Ed.). Caring is sharing – exploiting the value in data for health and innovation. IOS Press, Amsterdam (Studies in Health Technology and Informatics ; 302), 917-921. DOI: 10.3233/shti230309 |
Ruben Schlagowski, Dariia Nazarenko, Yekta Said Can, Kunal Gupta, Silvan Mertes, Mark Billinghurst and Elisabeth André. 2023. Wish you were here: mental and physiological effects of remote music collaboration in mixed reality. In Albrecht Schmidt, Kaisa Väänänen, Tesh Goyal, Per Ola Kristensson, Anicia Peters, Stefanie Mueller, Julie R. Williamson, Max L. Wilson (Eds.). Chi '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, April 23 - 28, 2023. ACM, New York, NY, 102 DOI: 10.1145/3544548.3581162 |
2022 |
Alexander Heimerl, Silvan Mertes, Tanja Schneeberger, Tobias Baur, Ailin Liu, Linda Becker, Nicolas Rohleder, Patrick Gebhard and Elisabeth André. in press. "GAN I hire you?" - A system for personalized virtual job interview training. preprint. DOI: 10.48550/arXiv.2206.03869 |
Silvan Mertes, Christina Karle, Tobias Huber, Katharina Weitz, Ruben Schlagowski and Elisabeth André. in press. Alterfactual explanations: the relevance of irrelevance for explaining AI systems. preprint. DOI: 10.48550/arXiv.2207.09374 |
Ruben Schlagowski, Fabian Wildgrube, Silvan Mertes, Ceenu George and Elisabeth André. 2022. Flow with the beat! Human-centered design of virtual environments for musical creativity support in VR. In C&C '22: Creativity and Cognition, Venice, Italy, June 20-23, 2022. ACM, New York, NY, 428-442 DOI: 10.1145/3527927.3532799 |
Silvan Mertes, Tobias Huber, Katharina Weitz, Alexander Heimerl and Elisabeth André. 2022. GANterfactual - counterfactual explanations for medical non-experts using generative adversarial learning. Frontiers in Artificial Intelligence 5, 825565. DOI: 10.3389/frai.2022.825565 |
Alexander Heimerl, Silvan Mertes, Tanja Schneeberger, Tobias Baur, Ailin Liu, Linda Becker, Nicolas Rohleder, Patrick Gebhard and Elisabeth André. 2022. Generating personalized behavioral feedback for a virtual job interview training system through adversarial learning. Lecture Notes in Computer Science 13355, 679-684. DOI: 10.1007/978-3-031-11644-5_67 |
Ruben Schlagowski, Kunal Gupta, Silvan Mertes, Mark Billinghurst, Susanne Metzner and Elisabeth André. 2022. Jamming in MR: towards real-time music collaboration in mixed reality. In 2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 12-16 March 2022, Christchurch, New Zealand (virtual event). IEEE, Piscataway, NJ, 854-855. DOI: 10.1109/vrw55335.2022.00278 |
2021 |
Alice Baird, Silvan Mertes, Manuel Milling, Lukas Stappen, Thomas Wiest, Elisabeth André and Björn W. Schuller. 2021. A prototypical network approach for evaluating generated emotional speech. In Hynek Heřmanský, Honza Černocký, Lukáš Burget, Lori Lamel, Odette Scharenborg and Petr Motlicek (Ed.). Interspeech 2021, Brno, Czechia, 30 August - 3 September 2021. ISCA, Baixas, 3161-3165. DOI: 10.21437/interspeech.2021-1123 |
Dominik Schiller, Silvan Mertes, Pol van Rijn and Elisabeth André. 2021. Analysis by synthesis: using an expressive TTS model as feature extractor for paralinguistic speech classification. In Hynek Heřmanský, Honza Černocký, Lukáš Burget, Lori Lamel, Odette Scharenborg and Petr Motlicek (Ed.). Interspeech 2021, Brno, Czechia, 30 August - 3 September 2021. ISCA, Baixas, 486-490. DOI: 10.21437/interspeech.2021-1587 |
Silvan Mertes, Florian Lingenfelser, Thomas Kiderle, Michael Dietz, Lama Diab and Elisabeth André. 2021. Continuous emotions: exploring label interpolation in conditional generative adversarial networks for face generation. In Ana Fred, Carlo Sansone and Kurosh Madani (Ed.). Proceedings of the 2nd International Conference on Deep Learning Theory and Applications, July 7-9, 2021. SciTePress, Setúbal, 132-139. DOI: 10.5220/0010549401320139 |
Tobias Huber, Silvan Mertes, Stanislava Rangelova, Simon Flutura and Elisabeth André. 2021. Dynamic difficulty adjustment in virtual reality exergames through experience-driven procedural content generation. In Keeley Crockett, Sanaz Mostaghim, Dipti Srinivasan and Anna Wilbik (Ed.). 2021 IEEE Symposium Series on Computational Intelligence (SSCI), 5-7 December 2021, Orlando, FL, USA. IEEE, Piscataway, NJ, 1-8. DOI: 10.1109/ssci50451.2021.9660086 |
Pol van Rijn, Silvan Mertes, Dominik Schiller, Peter M. C. Harrison, Pauline Larrouy-Maestri, Elisabeth André and Nori Jacoby. 2021. Exploring emotional prototypes in a high dimensional TTS latent space. In Hynek Heřmanský, Honza Černocký, Lukáš Burget, Lori Lamel, Odette Scharenborg and Petr Motlicek (Ed.). Interspeech 2021, Brno, Czechia, 30 August - 3 September 2021. ISCA, Baixas, 3870-3874. DOI: 10.21437/interspeech.2021-1538 |
Silvan Mertes, Thomas Kiderle, Ruben Schlagowski, Florian Lingenfelser and Elisabeth André. 2021. On the potential of modular voice conversion for virtual agents. In 2021 9th International Conference on Affective Computing and Intelligent Interaction, Workshops and Demos (ACIIW), 28 September – 1 October, 2021, Virtual Event, Nara, Japan. IEEE, Piscataway, NJ, 1-7 DOI: 10.1109/ACIIW52867.2021.9666349 |
Thomas Kiderle, Hannes Ritschel, Kathrin Janowski, Silvan Mertes, Florian Lingenfelser and Elisabeth André. 2021. Socially-aware personality adaptation. In 2021 9th International Conference on Affective Computing and Intelligent Interaction, Workshops and Demos (ACIIW), 28 September – 1 October, 2021, Virtual Event, Nara, Japan. IEEE, Piscataway, NJ, 1-8 DOI: 10.1109/ACIIW52867.2021.9666197 |
Ruben Schlagowski, Silvan Mertes and Elisabeth André. 2021. Taming the chaos: exploring graphical input vector manipulation user interfaces for GANs in a musical context. In AM '21: Audio Mostly 2021, virtual/Trento, Italy, September 1-3, 2021. ACM, New York, NY (International Conference Proceeding Series (ICPS)), 216-223. DOI: 10.1145/3478384.3478411 |
2020 |
Silvan Mertes, Alice Baird, Dominik Schiller, Björn Schuller and Elisabeth André. 2020. An evolutionary-based generative approach for audio data augmentation. In Atanas Gotchev, Dong Tian and Joao Ascenso (Ed.). 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), 21-24 Sept. 2020, Tampere, Finland. IEEE, Piscataway, NJ, 1-6. DOI: 10.1109/mmsp48831.2020.9287156 |
Silvan Mertes, Andreas Margraf, Christoph Kommer, Steffen Geinitz and Elisabeth André. 2020. Data augmentation for semantic segmentation in the context of carbon fiber defect detection using adversarial learning. In Ana Fred and Kurosh Madani (Ed.). Proceedings of the 1st International Conference on Deep Learning Theory and Applications - Volume 1: DeLTA, July 8-10, 2020. SciTePress, Setúbal, 59-67. DOI: 10.5220/0009823500590067 |
Dominik Schiller, Silvan Mertes and Elisabeth André. 2020. Embedded emotions - a data driven approach to learn transferable feature representations from raw speech input for emotion recognition. preprint. |
2019 |
Hannes Ritschel, Ilhan Aslan, Silvan Mertes, Andreas Seiderer and Elisabeth André. 2019. Personalized synthesis of intentional and emotional non-verbal sounds for social robots. In 8th International Conference on Affective Computing & Intelligent Interaction (ACII 2019), Cambridge, UK, 3-6 September 2019. IEEE, Piscataway, NJ, 1-7. DOI: 10.1109/ACII.2019.8925487 |