Bachelorarbeiten
Empathetic Behaviour Analysis with Deep Learning
Description
Empathic behaviour analysis is one of the most overlooked mechanisms in intelligent systems today. Empathy can be described as a complex process of "an observer reacting emotionally because he perceives that another is experiencing or about to experience an emotion". Future machines should be endowed with the ability to behave in an empathic manner, aiming at establishing and maintaining positive, long-term relationships with users.
Task
(a) Study the literature on automatic detection of empathetic behaviour in text, speech and video; (b) conduct a preliminary hands-on evaluation of a deep learning approach for detecting empathetic behaviour from speech and facial expressions.
Utilises
Python, TensorFlow or PyTorch
Requirements
Programming skills in Python
Languages
English
Supervisor
Jing Han (jing.han@informatik.uni-augsburg.de)
Cross-Culture Emotion Recognition in the Wild
Description
Automatically detecting and understanding the emotional states of humans is essential to improve the effectiveness of intelligent systems and devices by providing an affect-based, personalised user experience. Thanks to the rapid development of machine learning, innovative technologies and algorithms for handling affective information are emerging. Yet, in affective computing, comparatively little work has studied emotion recognition in cross-cultural or multi-cultural scenarios, i.e. taking the effect of culture into account.
Task
(a) Survey current techniques and related work in this emerging research field; (b) evaluate an approach on an audiovisual emotion dataset covering six cultures.
Utilises
Python, TensorFlow or PyTorch
Requirements
Preliminary knowledge of machine learning, good programming skills in Python
Languages
English
Supervisor
Jing Han (jing.han@informatik.uni-augsburg.de)
Audio-Based Depression Recognition App
Description
Depression recognition
Task
Develop an Android application using available machine learning models for depression recognition
Utilises
Android Neural Networks API (NNAPI)
Requirements
Basic programming knowledge
Languages
German, English
Supervisor
Shahin Amiriparian, M. Sc. (shahin.amiriparian@informatik.uni-augsburg.de)
Correlation Between Emotion and Deception
Description
Is deception emotional?
Task
An in-depth analysis of the correlation between emotion and deception
Utilises
-
Requirements
Basic programming knowledge
Languages
English
Supervisor
Shahin Amiriparian, M. Sc. (shahin.amiriparian@informatik.uni-augsburg.de)
Explainable AI for Health Sensing
Description
The success of machine learning research has led to an increase in potential applications, especially in the health domain. However, many contemporary systems are essentially black boxes: the internal operations determining their outputs are not transparent. Especially in the health domain, those developing machine learning systems should be able to explain their rationale and characterise their strengths and weaknesses.
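One family of such techniques is model-agnostic feature attribution. As a minimal sketch (with a toy linear "model" and synthetic data standing in for a real health classifier), permutation importance measures how much a model's error grows when a single feature's values are shuffled:

```python
import numpy as np

def permutation_importance(predict, X, y, rng):
    """Model-agnostic explanation sketch: shuffle one feature at a time and
    measure how much the model's error grows (larger growth = more important)."""
    base_err = np.mean((predict(X) - y) ** 2)
    importances = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])  # destroy feature j's information
        importances.append(np.mean((predict(Xp) - y) ** 2) - base_err)
    return np.array(importances)

# Toy data: the "model" truly uses only feature 0
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 2.0 * X[:, 0]
predict = lambda X: 2.0 * X[:, 0]
imp = permutation_importance(predict, X, y, rng)
```

Shuffling the only informative feature inflates the error, while shuffling unused features leaves it untouched; the resulting scores form a simple explanation of what the model relies on.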
Task
Explore the efficacy of different explainable AI techniques with a focus on health
Utilises
Python, potentially deep learning toolkits
Requirements
Machine learning knowledge a plus
Languages
English
Supervisor
Dr. Nicholas Cummins (nicholas.cummins@informatik.uni-augsburg.de)
Deep Learning for Health Sensing
Description
Deep learning has undoubtedly led to improvements in what is possible concerning system accuracy and performance in a range of signal analysis tasks. However, the benefits that contemporary deep learning solutions can provide in the analysis of different health states based on audio, visual and/or biosignals are yet to be fully explored.
Task
Application of deep learning to a range of different health detection tasks, such as the detection of different health states, abnormal heartbeat detection and medical image analysis.
Utilises
Python, TensorFlow/Keras
Requirements
Prior machine learning knowledge and related programming skills a plus
Languages
English
Supervisor
Dr. Nicholas Cummins (nicholas.cummins@informatik.uni-augsburg.de)
Denoising Audio Signals from in-the-wild Youtube Videos utilising Deep Learning
Description
In recent years, the use of deep learning has rapidly increased in many research areas and in industry, pushing the boundaries of automated data analysis. Large data companies (e.g. Google, Facebook) have huge amounts of data to train stable and versatile models and thus inspire many fields and architectures in deep learning. In contrast, academic research is tailored to very specific areas, such as emotion recognition, and models are trained under laboratory conditions on academic datasets to learn domain-specific, valuable features.
The use of large in-the-wild datasets is beneficial for both sides. On the one hand, from a purely research perspective, they enable specific and, at the same time, stable models. On the other hand, industry can transfer pre-trained models, architectures and feature extraction frameworks to new applications. In-the-wild data, however, contain more variability and noise than laboratory data. In order to facilitate their use in both sectors, noise and particularly deleterious training influences have to be automatically detected, extracted and removed.
The aim of this study is to adapt one or more deep learning architectures for audio denoising, enhance them for a specific domain and tune them, identifying appropriate parameters. Recently, WaveNet [1] showed promising performance on a similar task [2] and will be analysed with regard to its applicability. Audio examples [3] and a first implementation [4] are also available. The dataset that will be used in this project comprises YouTube videos capturing emotional car reviews (EmoCaR). Further data, e.g. for adding natural noise, are available from the Diverse Environments Multichannel Acoustic Noise Database (DEMAND). Typical noise patterns in the original videos are background music or car sounds.
[1] https://deepmind.com/blog/wavenet-generative-model-raw-audio/
[2] https://arxiv.org/pdf/1706.07162.pdf
[3] http://www.jordipons.me/apps/speech-denoising-wavenet/25.html
[4] https://github.com/drethage/speech-denoising-wavenet
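Supervised denoising models such as the WaveNet variant in [2] are typically trained on paired noisy/clean audio. A minimal sketch of constructing such pairs by mixing a clean signal with noise at a chosen signal-to-noise ratio (synthetic signals stand in here for EmoCaR speech and DEMAND noise):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean/noise power ratio equals `snr_db` (in dB),
    then add it to `clean` to form the noisy training input."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scaling factor that yields the desired signal-to-noise ratio
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone as "speech" stand-in
noise = rng.standard_normal(16000)                          # white noise as DEMAND stand-in
noisy = mix_at_snr(clean, noise, snr_db=5.0)                # (noisy, clean) training pair
```

Sweeping `snr_db` over a range of values during data generation is a common way to make the trained model robust to varying noise levels.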
Task
In this thesis, the student(s) will develop a state-of-the-art deep learning audio denoising technique.
Utilises
Audio processing, deep neural networks, WaveNet, encoder-decoder and CNN-based architectures
Requirements
Preliminary knowledge in deep learning and audio processing, good programming skills (e.g. Python, C++).
Languages
German or English
Supervisor
Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)
Unsupervised Topic and Aspect Detection in Spoken Narratives
Description
Extracting the relevant topics and entities of a conversation is an important part of sentiment analysis, which aims to categorise the opinions expressed towards a particular topic. Especially when designing datasets with the aim of making sentiment analysis learnable in a supervised manner, a prior extraction of the relevant topics is essential, since only frequently occurring, relevant entities and topics are worth annotating. Besides the well-known k-means clustering of word embeddings, a new form of attention-based clustering has recently emerged (https://www.aclweb.org/anthology/P17-1036); it showed good qualitative results and should be further evaluated on linguistic, spoken narrative data.
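The k-means baseline mentioned above can be sketched in pure NumPy; random vectors stand in for pre-trained word embeddings, and the centroids are initialised naively and deterministically for reproducibility:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain Lloyd's algorithm: cluster the rows of X (word embeddings)."""
    # Naive deterministic initialisation: spread centroids over the data
    centroids = X[np.linspace(0, len(X) - 1, k, dtype=int)].copy()
    for _ in range(iters):
        # Assign each embedding to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned embeddings
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

# Toy "word embeddings": two well-separated blobs standing in for two aspects
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 8)), rng.normal(5.0, 0.1, (20, 8))])
labels = kmeans(X, k=2)
```

In the actual thesis, `X` would hold embeddings of candidate aspect words, and each resulting cluster would be inspected as one candidate topic/aspect.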
Task
In this work, the student(s) will implement unsupervised topic and aspect detection utilising attention-based clustering (https://github.com/ruidan/Unsupervised-Aspect-Extraction) and compare the results to the previously common k-means clustering of word embeddings on two different sentiment databases (SEWA, EmCaR).
Utilises
Keras, TensorFlow/Theano, attention neural networks
Requirements
Advanced knowledge in machine learning and natural language processing, good programming skills (e.g. Python, C++)
Languages
German or English
Supervisor
Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)
Investigation and Optimization of Annotations for Training Neural Networks for Emotion Recognition in Videos
Description
Supervised learning, the most common form of deep learning, requires labels, i.e. annotations of the data. These serve as the prediction target and training stimulus of the neural network: the network is optimised to reduce the distance between the true target (the label/annotation) and the predicted label. For example, in object recognition and localisation, squares or polygons frame the objects to be recognised, and the network has to learn the most decisive features to predict these.
A big remaining challenge is the high cost of annotation. In affective computing, these costs are many times higher than for images because a) the data are videos and b) emotions are perceived differently by different people; therefore, each video has to be labelled by five different annotators for the same type of emotion, and these annotations subsequently have to be merged into a single gold-standard label. The continuous annotations are produced with a joystick while the video is playing.
For this reason, during the creation of our latest database, EmCaR (Emotional Car Reviews), we collected discrete metadata per annotation in addition to the continuous affective annotations.
Task
In this thesis, the student will design and implement a method suitable for analysing complex discrete and continuous emotional annotations. In addition, state-of-the-art neural networks (e.g. Transformers) will be trained and benchmarked on differently generated annotations (and gold-standard labels).
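For illustration, one simple way to fuse several annotators' continuous traces into a single gold standard is per-annotator z-normalisation followed by frame-wise averaging (a simplified stand-in for established schemes such as the Evaluator Weighted Estimator):

```python
import numpy as np

def fuse_annotations(traces):
    """Fuse several annotators' continuous emotion traces into one gold standard.

    traces: array-like of shape (n_annotators, n_frames).
    Each trace is z-normalised to remove annotator-specific offset and scale,
    then the normalised traces are averaged frame by frame.
    """
    traces = np.asarray(traces, dtype=float)
    mean = traces.mean(axis=1, keepdims=True)
    std = traces.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # guard against constant traces
    normalised = (traces - mean) / std
    return normalised.mean(axis=0)

# Three annotators rating the same clip with different offsets and scales
t = np.linspace(0, 2 * np.pi, 100)
base = np.sin(t)
traces = [base + 0.5, 2.0 * base - 1.0, 0.8 * base]
gold = fuse_annotations(traces)
```

Because the three toy traces are affine transformations of the same signal, the fused gold standard recovers the underlying trajectory despite their disagreement in offset and scale.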
Utilises
TensorFlow/PyTorch, neural networks, statistical correlation methods
Requirements
Fundamental knowledge in Machine Learning and Statistics, Good programming skills (e.g. Python).
Languages
German or English
Supervisor
Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)
Empirical Comparison of Context and Transformer Word Embeddings on Few-Shot Learning Tasks
Description
Google has recently demonstrated a new method of learning word embeddings with transformer networks (BERT – https://github.com/google-research/bert), which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. Word embeddings are the fundamental feature sets for any NLP task and are especially important to avoid elaborate representation learning. Furthermore, deep learning approaches suffer from poor sample efficiency in contrast to human perception. One- and few-shot learning attempts to learn representations from only a few samples and is often used in tasks where only little data and few targets are available. Recently, researchers have also started to use these techniques on linguistic data.
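A common few-shot baseline on top of fixed embeddings is nearest-centroid (prototypical) classification; a minimal sketch, with random vectors standing in for BERT sentence embeddings:

```python
import numpy as np

def nearest_centroid_predict(support, support_labels, queries):
    """Prototypical-style few-shot classifier: each class is represented by
    the mean (prototype) of its few support embeddings; each query is
    assigned to the class of the nearest prototype."""
    classes = sorted(set(support_labels))
    labels_arr = np.array(support_labels)
    protos = np.stack([support[labels_arr == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(queries[:, None, :] - protos[None, :, :], axis=-1)
    return [classes[i] for i in dists.argmin(axis=1)]

# Toy stand-ins for sentence embeddings: a 2-way, 3-shot episode
rng = np.random.default_rng(0)
support = np.vstack([rng.normal(0.0, 0.2, (3, 16)), rng.normal(3.0, 0.2, (3, 16))])
support_labels = [0, 0, 0, 1, 1, 1]
queries = np.vstack([rng.normal(0.0, 0.2, (2, 16)), rng.normal(3.0, 0.2, (2, 16))])
preds = nearest_centroid_predict(support, support_labels, queries)
```

Swapping different embeddings (BERT, context, or classic static embeddings) into `support`/`queries` while keeping the classifier fixed is one way the empirical comparison in this thesis could be structured.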
Task
In this work, the student(s) will bring together two novel directions in NLP by empirically comparing different word embeddings (including BERT) in the context of few-shot learning.
Utilises
NLP, Transformer Word Embeddings, Few-shot learning
Requirements
Advanced knowledge in Machine Learning and Natural Language Processing, Good programming skills (e.g. Python, C++)
Languages
German or English
Supervisor
Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)
Banner and Advertisement Detection and Localisation in YouTube Videos Utilising Pseudo-Supervised Deep Learning
Description
The use of large in-the-wild datasets is beneficial for research and industry. In-the-wild data, however, contain more variability and noise than laboratory data. In order to simplify their joint use, noise and particularly disturbing training influences have to be automatically detected, extracted and removed. Platforms such as YouTube represent a very good data source due to their public availability and extensive content. These videos, however, often include banners highlighting additional information in textual form. Such video elements are disturbing training influences that can confuse feature extraction frameworks trained with deep learning models. Removing these banners by hand would mean extra effort for the creators, reducing the chance of receiving their permission to use the videos for research purposes.
The aim of this study is to automatically detect and localise distracting elements in videos utilising state-of-the-art deep learning algorithms. For this purpose, a label generator has to be developed, which projects realistic boxes and texts into the video at random positions and in different sizes. These elements are used as pseudo labels in the subsequent training process. The developed neural network should learn to predict these elements and their positions in a video sequence (see Pixel CNNs).
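In its simplest form, such a label generator could look like the following sketch: paint a rectangle of random size at a random position into a frame and return its bounding box as the pseudo label (greyscale frames and a plain white box are simplifying assumptions; a realistic generator would render textured banners and text):

```python
import numpy as np

def add_banner(frame, rng):
    """Pseudo-label generator sketch: paint a bright rectangle ("banner")
    at a random position/size into a frame and return its box as the label."""
    h, w = frame.shape[:2]
    bh = int(rng.integers(h // 10, h // 4))   # banner height
    bw = int(rng.integers(w // 4, w // 2))    # banner width
    y = int(rng.integers(0, h - bh))          # top-left corner
    x = int(rng.integers(0, w - bw))
    banner = frame.copy()
    banner[y:y + bh, x:x + bw] = 255          # white overlay as banner stand-in
    return banner, (x, y, bw, bh)

rng = np.random.default_rng(0)
frame = np.zeros((120, 160), dtype=np.uint8)  # blank frame as video-frame stand-in
banner, (x, y, bw, bh) = add_banner(frame, rng)
```

The returned `(x, y, bw, bh)` tuple would serve as the regression target for the detection network during pseudo-supervised training.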
Task
In this thesis, the student(s) will develop a state-of-the-art data generator and deep learning method for banner detection.
Utilises
Advanced Data Augmentation, Video/Image Segmentation/Masking, R-CNN
Requirements
Preliminary knowledge in Deep Learning, Computer Vision, Good programming skills (e.g. Python, C++)
Languages
German or English
Supervisor
Lukas Stappen, M. Sc. (lukas.stappen@informatik.uni-augsburg.de)
Augmentation of Natural Soundscapes
Description
Our daily lives are surrounded by chaotic noise; methods to alter our sonic environments are urgently needed. Computational generation approaches for audio are becoming more robust and offer the chance for emotion-based conditioning of high-fidelity audio.
Task
Explore methods to extract musicality from natural sound environments in order to augment the original data source.
Provided with a dataset of emotional soundscapes (e.g. urban, natural, mechanical), evaluate meaningful methods to extract musicality, rhythm and genre from the natural data (e.g. chroma features, comb filters).
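For rhythm in particular, a comb-filter-style period estimate can be sketched without any audio library: autocorrelate an onset-strength envelope and pick the strongest non-trivial lag (the envelope here is a synthetic impulse train standing in for one extracted with librosa):

```python
import numpy as np

def estimate_period(onset_env):
    """Estimate the dominant rhythmic period of an onset-strength envelope
    via autocorrelation (the discrete analogue of a bank of comb filters)."""
    env = onset_env - onset_env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]  # lags 0..N-1
    min_lag = 2  # skip lag 0 and implausibly short lags
    return min_lag + int(np.argmax(ac[min_lag:]))

# Synthetic soundscape stand-in: one onset every 50 frames
# (e.g. 120 BPM at an envelope rate of 100 frames per second)
env = np.zeros(1000)
env[::50] = 1.0
period = estimate_period(env)
```

On real soundscapes, the onset envelope would come from a frame-level novelty function (as provided by librosa or madmom), and the recovered period could then drive rhythm-aware augmentation of the original recordings.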
Utilises
Python (librosa /madmom)
Requirements
Good programming skills (e.g. Python)
Languages
German or English
Supervisor
Alice Baird (alice.baird@informatik.uni-augsburg.de)