Cognitive neuroscience inspired techniques for eXplainable AI
(BMBF, 2019-2023)
The CogXAI project aimed to improve the explainability and transparency of deep neural networks (DNNs) by transferring methods and insights from cognitive neuroscience into AI research. The project pursued two complementary goals: the creation of post‑hoc explanation methods inspired by cognitive neuroscience, and the design of inherently interpretable neural‑network architectures. The first goal led to the introduction of Neuron Activation Profiles (NAPs). A NAP records how a network responds to distinct groups of inputs, allowing researchers to compare activation patterns across classes or conditions. Because a NAP aggregates responses over many examples, it provides a global explanation that does not rely on visualisation of individual inputs. The project also developed a visualisation technique for these profiles, inspired by brain‑activity maps. By re‑ordering neurons according to similarity of activation, the method produces topographic activation maps that display the internal state of a network in a spatially organised manner, facilitating intuitive interpretation of hidden layers.
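The NAP idea can be sketched in a few lines of NumPy: average a layer's activations over all inputs of a group and compare the resulting profiles across groups. This is an illustrative reconstruction, not the project's actual code; the function name and array shapes are assumptions.

```python
import numpy as np

def neuron_activation_profile(activations, labels, group):
    """Average a layer's activations over all inputs belonging to one group.

    activations: (n_examples, n_neurons) array of a layer's responses
    labels:      (n_examples,) group label per input
    group:       the group to profile
    """
    mask = labels == group
    return activations[mask].mean(axis=0)

# toy example: 6 inputs, 4 neurons, two groups
acts = np.arange(24, dtype=float).reshape(6, 4)
labels = np.array([0, 1, 0, 1, 0, 1])
nap0 = neuron_activation_profile(acts, labels, 0)
nap1 = neuron_activation_profile(acts, labels, 1)
# the profile difference highlights neurons that respond group-specifically
diff = nap1 - nap0
```

Because the profile aggregates over many examples, it serves as a global explanation without visualizing any single input.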
The second goal involved translating principles from predictive coding and active inference into neural‑network design. Our team produced predictive‑coding networks that employ exact error‑backpropagation without requiring a global error signal. In deeper variants, the architecture enforces a strictly local information flow, so that each layer can be inspected independently. These designs enable a new form of interpretability: local error signals and layer‑wise activations can be examined without reference to the entire network. In addition, the project explored active‑learning and planning models that adaptively adjust their internal representations during inference, further aligning network behaviour with cognitive processes.
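The locality principle described above can be illustrated with a toy predictive-coding step in which both inference and learning use only the prediction error available at a single layer. This is a minimal sketch under simplifying assumptions (one latent layer, linear generative weights), not the project's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# layer 1 predicts the activity of layer 0; every update below uses
# only quantities that are local to the layer being updated
x = rng.normal(size=3)          # observed input (layer 0 activity)
z = rng.normal(size=2)          # latent state (layer 1 activity)
W = rng.normal(size=(3, 2))     # generative weights: z -> prediction of x

lr = 0.05
for _ in range(300):
    pred = W @ z                # top-down prediction of the layer below
    err = x - pred              # local prediction error at layer 0
    z += lr * (W.T @ err)       # inference: update latent from local error
    W += lr * np.outer(err, z)  # learning: Hebbian-like local weight update

residual = np.linalg.norm(x - W @ z)
```

The prediction error shrinks over iterations even though no global loss or end-to-end error signal is ever computed; each layer could be inspected via its local error alone.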
In summary, CogXAI advanced the explainability of deep neural networks by combining cognitive‑neuroscience‑inspired analysis tools with novel, locally interpretable architectures. Beyond fundamental research, the project maintained a strong practical focus through collaborations with associated industry partners in two high-impact domains: speech assistance systems (Fraunhofer IIS) and autonomous driving (Motor Ai GmbH).
2025
Relation of Activity and Confidence When Training Deep Neural Networks
Valerie Krug, Christopher Olson, and Sebastian Stober
In Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025
Deep Neural Networks (DNNs) are successful but work as black-boxes. Elucidating their inner workings is crucial but a difficult task. In this work, we investigate how activity and confidence of a DNN relate in a simple Multi-Layer Perceptron. Further, we observe how activity, confidence and their relation develop during model training. For ease of visual comparison, we use a technique to display DNN activity as topographic maps, similar to common visualization of brain activity. Our results indicate that activity becomes stronger and distinguished both with training time and confidence.
@inbook{Krug2025,author={Krug, Valerie and Olson, Christopher and Stober, Sebastian},pages={341--351},publisher={Springer Nature Switzerland},title={Relation of Activity and Confidence When Training Deep Neural Networks},year={2025},isbn={9783031746277},booktitle={Machine Learning and Principles and Practice of Knowledge Discovery in Databases},doi={10.1007/978-3-031-74627-7_27},issn={1865-0937}}
Investigating Inclusivity of Whisper for Dysfluent Speech
Evelyn Starzew, Suhita Ghosh, and Valerie Krug
In 12th edition of the Disfluency in Spontaneous Speech Workshop (DiSS 2025), Sep 2025
Speech recognition models have gained popularity in the last couple of years and are able to achieve remarkable performance. However, the under-representation of pathological speech in the training data leads to significant performance drops for many state-of-the-art models on pathological speech. In our work, we investigate the inclusivity of the pre-trained Whisper model in its base variant using dysarthric speech as a use case. We aim to identify potential inequalities and whether they can be reduced through fine-tuning. For this, we compare embedding-based and attention-based representations of healthy and dysarthric samples and analyze the development of the layers’ representational capacities. Our key findings are that there are clear inequalities in the performance and computation of representations, which can be reduced significantly; fine-tuning to dysarthric speech reduces the automatic speech recognition WER by 73.44%.
@inproceedings{Starzew2025,author={Starzew, Evelyn and Ghosh, Suhita and Krug, Valerie},booktitle={12th edition of the Disfluency in Spontaneous Speech Workshop (DiSS 2025)},title={Investigating Inclusivity of Whisper for Dysfluent Speech},year={2025},month=sep,pages={77--81},publisher={ISCA},series={diss_2025},collection={diss_2025},doi={10.21437/diss.2025-16}}
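A crude version of the representation comparison described in the abstract can be sketched as mean pairwise cosine similarity between two groups of embeddings. The variable names and toy data are hypothetical; the actual study compares Whisper encoder representations.

```python
import numpy as np

def group_similarity(emb_a, emb_b):
    """Mean pairwise cosine similarity between two groups of embeddings
    (e.g. healthy vs. dysarthric samples at one encoder layer), a crude
    proxy for how differently the model represents the groups."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return (a @ b.T).mean()

# toy embeddings: two groups sharing a dominant direction, one that does not
rng = np.random.default_rng(0)
healthy = rng.normal(size=(20, 16)) + 5.0
dysarthric = rng.normal(size=(20, 16)) + 5.0
unrelated = rng.normal(size=(20, 16)) - 5.0
sim_close = group_similarity(healthy, dysarthric)
sim_far = group_similarity(healthy, unrelated)
```

Low cross-group similarity relative to a control would indicate the kind of representational inequality the paper investigates.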
Intersectional Bias Quantification in Facial Image Processing with Pre-Trained ImageNet Classifiers
Valerie Krug, Florian Röhrbein, and Sebastian Stober
In 2025 International Joint Conference on Neural Networks (IJCNN), Jun 2025
Deep Learning models have achieved significant success, often facilitated by transfer learning. This involves using pre-trained models as a basis for new tasks. However, this practice carries the risk of propagating biases that are present in the original training data. In this study, we examine biases related to the protected attributes of "race", "age", and "gender" in several pre-trained classifiers that were trained on the widely used ImageNet dataset. Our analysis emphasizes intersectionality, exploring how interactions between these attributes influence biases. We introduce and employ a novel, model-agnostic approach to analyze biases in the representations of pre-trained deep neural networks through activation similarity-based clustering, with a focus on intersectionality. Our results suggest that, regardless of the specific model, ImageNet classifiers’ representations strongly reflect age information, cluster certain ethnic groups, and differentiate genders in middle-aged individuals.
@inproceedings{Krug2025a,author={Krug, Valerie and Röhrbein, Florian and Stober, Sebastian},booktitle={2025 International Joint Conference on Neural Networks (IJCNN)},title={Intersectional Bias Quantification in Facial Image Processing with Pre-Trained ImageNet Classifiers},year={2025},month=jun,pages={1--8},publisher={IEEE},doi={10.1109/ijcnn64981.2025.11228524}}
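The activation similarity-based clustering idea can be sketched with a minimal 2-means over activation vectors: if clusters align with a protected attribute, the representation encodes that attribute. This is an assumption-laden toy; the paper's clustering procedure differs in detail.

```python
import numpy as np

def two_means(acts, iters=20):
    """Minimal 2-means clustering of activation vectors, initialised
    from the first and last sample for determinism."""
    centers = acts[[0, -1]].astype(float).copy()
    for _ in range(iters):
        # distance of every sample to both centers -> shape (n, 2)
        d = np.linalg.norm(acts[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in (0, 1):
            centers[j] = acts[assign == j].mean(axis=0)
    return assign

# toy activations where a (hypothetical) protected attribute shifts the representation
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 30)
acts = rng.normal(size=(60, 5)) + group[:, None] * 6.0
assign = two_means(acts)
# how well do the discovered clusters align with the attribute?
agreement = max((assign == group).mean(), (assign != group).mean())
```

High agreement between clusters and attribute labels is the model-agnostic bias signal the abstract describes.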
Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models
Valerie Krug and Sebastian Stober
Deep Learning models have achieved remarkable success. Training them is often accelerated by building on top of pre-trained models, which poses the risk of perpetuating encoded biases. Here, we investigate biases in the representations of commonly used ImageNet classifiers for facial images while considering intersections of sensitive variables age, race and gender. To assess the biases, we use linear classifier probes and visualize activations as topographic maps. We find that representations in ImageNet classifiers particularly allow differentiation between ages. Less strongly pronounced, the models appear to associate certain ethnicities and distinguish genders in middle-aged groups.
@article{Krug2025b,author={Krug, Valerie and Stober, Sebastian},journal={arXiv},title={Assessing Intersectional Bias in Representations of Pre-Trained Image Recognition Models},year={2025},copyright={Creative Commons Attribution 4.0 International},doi={10.48550/ARXIV.2506.03664},publisher={arXiv}}
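A linear classifier probe, as mentioned in the abstract, can be sketched as a closed-form least-squares classifier fit on frozen features: high decodability of an attribute suggests the representation encodes it. Function name and toy data here are illustrative, not the paper's setup.

```python
import numpy as np

def linear_probe_accuracy(feats, labels):
    """Fit a least-squares linear probe on frozen features and report
    training accuracy as a rough measure of how decodable `labels`
    (e.g. an age group) are from a layer's representations."""
    n = feats.shape[0]
    classes = np.unique(labels)
    onehot = (labels[:, None] == classes[None, :]).astype(float)
    X = np.hstack([feats, np.ones((n, 1))])         # add bias column
    W, *_ = np.linalg.lstsq(X, onehot, rcond=None)  # closed-form fit
    pred = classes[np.argmax(X @ W, axis=1)]
    return (pred == labels).mean()

# toy features where one group has a shifted mean -> easily decodable
rng = np.random.default_rng(1)
f0 = rng.normal(0.0, 1.0, size=(50, 8))
f1 = rng.normal(3.0, 1.0, size=(50, 8))
feats = np.vstack([f0, f1])
labels = np.repeat([0, 1], 50)
acc = linear_probe_accuracy(feats, labels)
```

In practice one would use held-out data and a proper classifier; the point is that probe accuracy quantifies how strongly an attribute is represented at a layer.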
2024
Exploration of Interpretability Techniques for Deep COVID-19 Classification Using Chest X-ray Images
Soumick Chatterjee, Fatima Saad, Chompunuch Sarasaen, Suhita Ghosh, Valerie Krug, Rupali Khatun, Rahul Mishra, Nirja Desai, Petia Radeva, Georg Rose, Sebastian Stober, Oliver Speck, and Andreas Nürnberger
The outbreak of COVID-19 has shocked the entire world with its fairly rapid spread, and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosing of infected patients. Medical imaging, such as X-ray and computed tomography (CT), combined with the potential of artificial intelligence (AI), plays an essential role in supporting medical personnel in the diagnosis process. Thus, in this article, five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2, and DenseNet161) and their ensemble, using majority voting, have been used to classify COVID-19, pneumonia and healthy subjects using chest X-ray images. Multilabel classification was performed to predict multiple pathologies for each patient, if present. Firstly, the interpretability of each of the networks was thoroughly studied using local interpretability methods—occlusion, saliency, input X gradient, guided backpropagation, integrated gradients, and DeepLIFT—and using a global technique—neuron activation profiles. The mean micro F1 score of the models for COVID-19 classifications ranged from 0.66 to 0.875, and was 0.89 for the ensemble of the network models. The qualitative results showed that the ResNets were the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before making a decision regarding the best performing model.
@article{Chatterjee2024,author={Chatterjee, Soumick and Saad, Fatima and Sarasaen, Chompunuch and Ghosh, Suhita and Krug, Valerie and Khatun, Rupali and Mishra, Rahul and Desai, Nirja and Radeva, Petia and Rose, Georg and Stober, Sebastian and Speck, Oliver and Nürnberger, Andreas},journal={Journal of Imaging},title={Exploration of Interpretability Techniques for Deep COVID-19 Classification Using Chest X-ray Images},year={2024},issn={2313-433X},month=feb,number={2},pages={45},volume={10},doi={10.3390/jimaging10020045},publisher={MDPI AG}}
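Majority voting over the model predictions, as used for the ensemble above, can be sketched for the multilabel case. The toy data and label semantics are assumptions for illustration.

```python
import numpy as np

def majority_vote(predictions):
    """Ensemble multilabel predictions by majority voting.

    predictions: (n_models, n_samples, n_labels) binary array,
                 1 = model predicts the pathology present.
    Returns a (n_samples, n_labels) binary array where a label is
    predicted iff more than half of the models agree."""
    votes = predictions.sum(axis=0)
    return (votes > predictions.shape[0] / 2).astype(int)

# toy example: 5 models, 2 samples, 3 labels (e.g. COVID-19, pneumonia, healthy)
preds = np.array([
    [[1, 0, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 0]],
    [[0, 0, 0], [0, 1, 0]],
    [[1, 0, 0], [1, 1, 0]],
])
ens = majority_vote(preds)
```

Voting per label keeps the multilabel structure: a patient can be assigned several pathologies if each independently wins a majority.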
2023
Visualizing Deep Neural Networks with Topographic Activation Maps
Valerie Krug, Raihan Kabir Ratul, Christopher Olson, and Sebastian Stober
In HHAI 2023: Augmenting Human Intellect, Jun 2023
@inbook{Krug2023hhai,author={Krug, Valerie and Ratul, Raihan Kabir and Olson, Christopher and Stober, Sebastian},publisher={IOS Press},title={Visualizing Deep Neural Networks with Topographic Activation Maps},year={2023},isbn={9781643683959},month=jun,booktitle={HHAI 2023: Augmenting Human Intellect},doi={10.3233/faia230080},issn={1879-8314}}
Visualizing Bias in Activations of Deep Neural Networks as Topographic Maps
Valerie Krug, Christopher Olson, and Sebastian Stober
In Proceedings of the 1st Workshop on Fairness and Bias in AI (AEQUITAS 2023) co-located with 26th European Conference on Artificial Intelligence (ECAI 2023) Kraków, Poland, 2023
Deep Neural Networks (DNNs) are successful but work as black-boxes. Elucidating their inner workings is crucial as DNNs are prone to reproducing data biases and may potentially harm underrepresented or historically discriminated demographic groups. In this work, we demonstrate an approach for visualizing DNN activations that facilitates visually detecting biases in learned representations. This approach displays activations as topographic maps, similar to common visualization of brain activity. In addition to visual inspection of activations, we evaluate different measures to quantify the quality of the topographic maps. With visualization and measurement of quality, we provide qualitative and quantitative means for investigating bias in representations and demonstrate this for activations of a pre-trained image recognition model when processing images of peoples’ faces. We find biases for different sensitive variables, particularly in deeper layers of the investigated DNN, and support the subjective evaluation with a quantitative measure of visual quality.
@inproceedings{krug2023aequitas,author={Krug, Valerie and Olson, Christopher and Stober, Sebastian},booktitle={Proceedings of the 1st Workshop on Fairness and Bias in AI (AEQUITAS 2023) co-located with 26th European Conference on Artificial Intelligence (ECAI 2023) Kraków, Poland},title={Visualizing Bias in Activations of Deep Neural Networks as Topographic Maps},year={2023},publisher={CEUR-WS},url={http://ceur-ws.org/Vol-3523/}}
Relation of Activity and Confidence when Training Deep Neural Networks
Valerie Krug, Christopher Olson, and Sebastian Stober
In Uncertainty meets Explainability, Workshop at ECML-PKDD 2023, Torino, Italy, 2023
Deep Neural Networks (DNNs) are successful but work as black-boxes. Elucidating their inner workings is crucial but a difficult task. In this work, we investigate how activity and confidence of a DNN relate in a simple Multi-Layer Perceptron. Further, we observe how activity, confidence and their relation develop during model training. For ease of visual comparison, we use a technique to display DNN activity as topographic maps, similar to common visualization of brain activity. Our results indicate that activity becomes stronger and distinguished both with training time and confidence.
@inproceedings{krug2023ecml,author={Krug, Valerie and Olson, Christopher and Stober, Sebastian},booktitle={Uncertainty meets Explainability, Workshop at ECML-PKDD 2023, Torino, Italy},title={Relation of Activity and Confidence when Training Deep Neural Networks},year={2023}}
Visualizing Deep Neural Networks with Topographic Activation Maps
Valerie Krug, Raihan Kabir Ratul, Christopher Olson, and Sebastian Stober
In VeriLearn 2023: Workshop on Verifying Learning AI Systems, co-located with 26th European Conference on Artificial Intelligence (ECAI 2023) Kraków, Poland, 2023
Machine Learning with Deep Neural Networks (DNNs) has become successful in solving tasks across various fields of application. However, the complexity of DNNs makes it difficult to understand how they solve their learned task. We research techniques to lay out neurons of DNNs in a two-dimensional space such that neurons of similar activity are in the vicinity of each other. This allows visualizing DNN activations as topographic maps similar to how brain activity is commonly displayed. Our novel visualization technique improves the transparency of DNN-based decision-making systems and is interpretable without expert knowledge in Machine Learning.
@inproceedings{krug2023verilearn,author={Krug, Valerie and Ratul, Raihan Kabir and Olson, Christopher and Stober, Sebastian},booktitle={VeriLearn 2023: Workshop on Verifying Learning AI Systems, co-located with 26th European Conference on Artificial Intelligence (ECAI 2023) Kraków, Poland},title={Visualizing Deep Neural Networks with Topographic Activation Maps},year={2023},url={https://dtai.cs.kuleuven.be/events/VeriLearn2023/papers/VeriLearn23_paper_12.pdf}}
2022
Visualizing Deep Neural Networks with Topographic Activation Maps
Andreas Krug, Raihan Kabir Ratul, and Sebastian Stober
Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. The success of DNNs is strongly connected to their high complexity in terms of the number of network layers or of neurons in each layer, which makes it severely difficult to understand how DNNs solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience because this field has a rich experience in analyzing complex and opaque systems. In this work, we draw inspiration from how neuroscience uses topographic maps to visualize the activity of the brain when it performs certain tasks. Transferring this approach to DNNs can help to visualize and understand their internal processes more intuitively, too. However, the inner structures of brains and DNNs differ substantially. Therefore, to be able to visualize activations of neurons in DNNs as topographic maps, we research techniques to lay out the neurons in a two-dimensional space in which neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare different methods to obtain a topographic layout of the neurons in a network layer. Moreover, we demonstrate how to use the resulting topographic activation maps to identify errors or encoded biases in DNNs or data sets. Our novel visualization technique improves the transparency of DNN-based algorithmic decision-making systems and is accessible to a broad audience because topographic maps are intuitive to interpret without expert knowledge in Machine Learning.
@article{Krug2022,author={Krug, Andreas and Ratul, Raihan Kabir and Stober, Sebastian},journal={arXiv preprint arXiv:2204.03528},title={Visualizing Deep Neural Networks with Topographic Activation Maps},year={2022},month=apr,archiveprefix={arXiv},eprint={2204.03528},primaryclass={cs.LG},url={http://arxiv.org/abs/2204.03528}}
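One way to realize the neuron layout described above is to treat each neuron's activation vector over many inputs as its feature description and project these vectors to 2D, here with PCA via SVD. The paper introduces and compares several layout methods, so this is just one plausible sketch with assumed names and shapes.

```python
import numpy as np

def topographic_layout(activations):
    """Place neurons in 2D so that neurons of similar activity land near
    each other: each neuron is described by its activation vector over
    many inputs, and the top two principal components of these vectors
    serve as 2D coordinates.

    activations: (n_examples, n_neurons) -> (n_neurons, 2) coordinates
    """
    profiles = activations.T                      # one row per neuron
    centered = profiles - profiles.mean(axis=0)   # center the features
    # PCA via SVD: project neuron profiles onto the top 2 components
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return U[:, :2] * S[:2]

# toy layer: neurons (0, 1) behave alike, neurons (2, 3) are their opposites
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
acts = np.hstack([base, base + 0.01 * rng.normal(size=(100, 1)),
                  -base, -base + 0.01 * rng.normal(size=(100, 1))])
coords = topographic_layout(acts)
d_same = np.linalg.norm(coords[0] - coords[1])   # similar neurons
d_diff = np.linalg.norm(coords[0] - coords[2])   # dissimilar neurons
```

Interpolating activations onto a grid over these coordinates would then yield the brain-map-style images the abstract describes.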
CogXAI ANNalyzer: Cognitive Neuroscience Inspired Techniques for eXplainable AI
Maral Ebrahimzadeh, Valerie Krug, and Sebastian Stober
In 23rd International Society for Music Information Retrieval Conference (ISMIR’22) - Late Breaking & Demo Papers, 2022
Over the past few years, deep Artificial Neural Networks (ANNs) have become more popular due to their great success in various tasks. However, their improvements made them more capable but less interpretable. To overcome this issue, some introspection techniques have been proposed. Because ANNs are inspired by human brains, we adapt techniques from cognitive neuroscience to interpret them more easily. Our approach first computes characteristic network responses for groups of input examples, for example, relating to a specific error. We then use these to compare network responses between different groups. To this end, we compute representational similarity and we visualize the activations as topographic activation maps. In this work, we present a graphical user interface called CogXAI ANNalyzer to easily apply our techniques to trained ANNs and to interpret their results. Further, we demonstrate our tool using an audio ANN for speech recognition.
@inproceedings{Ebrahimzadeh2022ismirlbd,author={Ebrahimzadeh, Maral and Krug, Valerie and Stober, Sebastian},booktitle={23rd International Society for Music Information Retrieval Conference (ISMIR'22) - Late Breaking \& Demo Papers},title={CogXAI ANNalyzer: Cognitive Neuroscience Inspired Techniques for eXplainable AI},year={2022},url={https://archives.ismir.net/ismir2022/latebreaking/000050.pdf}}
Generalized Predictive Coding: Bayesian Inference in Static and Dynamic Models
André Ofner, Beren Millidge, and Sebastian Stober
In NeurIPS 2022 Workshop on Shared Visual Representations in Human & Machine Intelligence (SVRHM’22), 2022
Predictive coding networks (PCNs) have an inherent degree of biological plausibility and can perform approximate backpropagation of error in supervised learning settings. However, it is less clear how predictive coding compares to state-of-the-art architectures, such as VAEs, in unsupervised and probabilistic settings. We propose a PCN that, inspired by generalized predictive coding in neuroscience, parameterizes hierarchical distributions of latent states under the Laplace approximation and maximises model evidence via iterative inference using locally computed error signals. Unlike its inspiration, it uses multi-layer neural networks with nonlinearities between latent distributions. We compare our model to VAE and VLAE baselines on three different image datasets and find that generalized predictive coding shows performance comparable to variational autoencoders trained with exact error backpropagation. Finally, we investigate the possibility of learning temporal dynamics via static prediction by encoding sequential observations in generalized coordinates of motion.
@inproceedings{ofner2022svrhm,author={Ofner, Andr{\'e} and Millidge, Beren and Stober, Sebastian},booktitle={NeurIPS 2022 Workshop on Shared Visual Representations in Human \& Machine Intelligence (SVRHM'22)},title={Generalized Predictive Coding: Bayesian Inference in Static and Dynamic Models},year={2022},url={https://openreview.net/forum?id=qaT_CByg1X5}}
2021
Analyzing and Visualizing Deep Neural Networks for Speech Recognition with Saliency-Adjusted Neuron Activation Profiles
Andreas Krug, Maral Ebrahimzadeh, Jost Alemann, Jens Johannsmeier, and Sebastian Stober
Deep Learning-based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, several introspection methods have been proposed. However, established introspection techniques are mostly designed for computer vision tasks and rely on the data being visually interpretable, which limits their usefulness for understanding speech recognition models. To overcome this limitation, we developed a novel neuroscience-inspired technique for visualizing and understanding ANNs, called Saliency-Adjusted Neuron Activation Profiles (SNAPs). SNAPs are a flexible framework to analyze and visualize Deep Neural Networks that does not depend on visually interpretable data. In this work, we demonstrate how to utilize SNAPs for understanding fully-convolutional ASR models. This includes visualizing acoustic concepts learned by the model and the comparative analysis of their representations in the model layers.
@article{Krug2021,author={Krug, Andreas and Ebrahimzadeh, Maral and Alemann, Jost and Johannsmeier, Jens and Stober, Sebastian},journal={Electronics},title={Analyzing and Visualizing Deep Neural Networks for Speech Recognition with Saliency-Adjusted Neuron Activation Profiles},year={2021},month=jun,number={11},pages={1350},volume={10},doi={10.3390/electronics10111350},publisher={{MDPI} {AG}}}
Predictive coding, precision and natural gradients
Andre Ofner, Raihan Kabir Ratul, Suhita Ghosh, and Sebastian Stober
There is an increasing convergence between biologically plausible computational models of inference and learning with local update rules and the global gradient-based optimization of neural network models employed in machine learning. One particularly exciting connection is the correspondence between the locally informed optimization in predictive coding networks and the error backpropagation algorithm that is used to train state-of-the-art deep artificial neural networks. Here we focus on the related, but still largely under-explored connection between precision weighting in predictive coding networks and the Natural Gradient Descent algorithm for deep neural networks. Precision-weighted predictive coding is an interesting candidate for scaling up uncertainty-aware optimization – particularly for models with large parameter spaces – due to its distributed nature of the optimization process and the underlying local approximation of the Fisher information metric, the adaptive learning rate that is central to Natural Gradient Descent. Here, we show that hierarchical predictive coding networks with learnable precision indeed are able to solve various supervised and unsupervised learning tasks with performance comparable to global backpropagation with natural gradients and outperform their classical gradient descent counterpart on tasks where high amounts of noise are embedded in data or label inputs. When applied to unsupervised auto-encoding of image inputs, the deterministic network produces hierarchically organized and disentangled embeddings, hinting at the close connections between predictive coding and hierarchical variational inference.
@article{Ofner2021,author={Ofner, Andre and Ratul, Raihan Kabir and Ghosh, Suhita and Stober, Sebastian},journal={arXiv preprint arXiv:2111.06942},title={Predictive coding, precision and natural gradients},year={2021},month=nov,archiveprefix={arXiv},eprint={2111.06942},primaryclass={cs.LG},url={http://arxiv.org/abs/2111.06942}}
PredProp: Bidirectional Stochastic Optimization with Precision Weighted Predictive Coding
André Ofner and Sebastian Stober
We present PredProp, a method for bidirectional, parallel and local optimisation of weights, activities and precision in neural networks. PredProp jointly addresses inference and learning, scales learning rates dynamically and weights gradients by the curvature of the loss function by optimizing prediction error precision. PredProp optimizes network parameters with Stochastic Gradient Descent and error forward propagation based strictly on prediction errors and variables locally available to each layer. Neighboring layers optimise shared activity variables so that prediction errors can propagate forward in the network, while predictions propagate backwards. This process minimises the negative Free Energy, or evidence lower bound of the entire network. We show that networks trained with PredProp resemble gradient based predictive coding when the number of weights between neighboring activity variables is one. In contrast to related work, PredProp generalizes towards backward connections of arbitrary depth and optimizes precision for any deep network architecture. Due to the analogy between prediction error precision and the Fisher information for each layer, PredProp implements a form of Natural Gradient Descent. When optimizing DNN models, layer-wise PredProp renders the model a bidirectional predictive coding network. Alternatively, DNNs can parameterize the weights between two activity variables. We evaluate PredProp for dense DNNs on simple inference, learning and combined tasks. We show that, without an explicit sampling step in the network, PredProp implements a form of variational inference that allows learning disentangled embeddings from small amounts of data; we leave evaluation on more complex tasks and datasets to future work.
@article{Ofner2021a,author={Ofner, André and Stober, Sebastian},journal={arXiv preprint arXiv:2111.08792},title={PredProp: Bidirectional Stochastic Optimization with Precision Weighted Predictive Coding},year={2021},month=nov,archiveprefix={arXiv},eprint={2111.08792},primaryclass={cs.LG},url={http://arxiv.org/abs/2111.08792}}
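The precision weighting at the heart of the two abstracts above can be sketched very simply: scale each prediction-error dimension by its estimated inverse variance, so unreliable (noisy) error channels contribute less to updates. This toy ignores PredProp's bidirectional optimization and learned precision dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# two error channels: one reliable (low variance), one noisy (high variance)
errors = np.vstack([rng.normal(0, 0.1, size=100),
                    rng.normal(0, 2.0, size=100)]).T  # shape (100, 2)

variance = errors.var(axis=0)
precision = 1.0 / variance        # inverse variance per error dimension
weighted = errors * precision     # precision-weighted prediction errors

# after weighting, the reliable channel dominates the update signal
influence = np.abs(weighted).mean(axis=0)
```

Because the inverse error variance locally approximates the Fisher information, this kind of weighting is what links precision-weighted predictive coding to Natural Gradient Descent.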
Differentiable Generalised Predictive Coding
André Ofner and Sebastian Stober
This paper deals with differentiable dynamical models congruent with neural process theories that cast brain function as the hierarchical refinement of an internal generative model explaining observations. Our work extends existing implementations of gradient-based predictive coding with automatic differentiation and allows integrating deep neural networks for non-linear state parameterization. Gradient-based predictive coding optimises inferred states and weights locally for each layer by optimising precision-weighted prediction errors that propagate from stimuli towards latent states. Predictions flow backwards, from latent states towards lower layers. The model suggested here optimises hierarchical and dynamical predictions of latent states. Hierarchical predictions encode expected content and hierarchical structure. Dynamical predictions capture changes in the encoded content along with higher order derivatives. Hierarchical and dynamical predictions interact and address different aspects of the same latent states. We apply the model to various perception and planning tasks on sequential data and show their mutual dependence. In particular, we demonstrate how learning sampling distances in parallel addresses meaningful locations in data sampled at discrete time steps. We discuss possibilities to relax the assumption of linear hierarchies in favor of more flexible graph structure with emergent properties. We compare the granular structure of the model with canonical microcircuits describing predictive coding in biological networks and review the connection to Markov Blankets as a tool to characterize modularity. A final section sketches out ideas for efficient perception and planning in nested spatio-temporal hierarchies.
@article{Ofner2021b,author={Ofner, André and Stober, Sebastian},journal={arXiv preprint arXiv:2112.03378},title={Differentiable Generalised Predictive Coding},year={2021},month=dec,archiveprefix={arXiv},eprint={2112.03378},primaryclass={cs.LG},url={http://arxiv.org/abs/2112.03378}}
Hierarchical Predictive Coding and Interpretable Audio Analysis-Synthesis
André Ofner, Johannes Schleiss, and Sebastian Stober
In 15th International Symposium on Computer Music Multidisciplinary Research (CMMR’21), 2021
Humans efficiently extract relevant information from complex auditory stimuli. Oftentimes, the interpretation of the signal is ambiguous and musical meaning is derived from the subjective context. Predictive processing interpretations of brain function describe subjective music experience driven by hierarchical precision-weighted expectations. There is still a lack of efficient and structurally interpretable machine learning models operating on audio featuring such biological plausibility. We therefore propose a bio-plausible predictive coding model that analyses auditory signals in comparison to a continuously updated differentiable generative model. For this, we discuss and build upon the connections between Infinite Impulse Response filters, Kalman filters, and the inference in predictive coding with local update rules. Our results show that such gradient-based predictive coding is useful for classical digital signal processing applications like audio filtering. We test the model capability on beat tracking and audio filtering tasks and conclude by showing how top-down expectations modulate the activity on lower layers during prediction.
@inproceedings{Ofner2021cmmr,author={Ofner, André and Schleiss, Johannes and Stober, Sebastian},booktitle={15th International Symposium on Computer Music Multidisciplinary Research (CMMR'21)},title={Hierarchical {Predictive} {Coding} and {Interpretable} {Audio} {Analysis}-{Synthesis}},year={2021},url={https://cmmr2021.github.io/proceedings/pdffiles/cmmr2021_25.pdf}}
2020
Gradient-Adjusted Neuron Activation Profiles for Comprehensive Introspection of Convolutional Speech Recognition Models
Andreas Krug and Sebastian Stober
Deep Learning based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, introspection methods have been proposed. Adapting such techniques from computer vision to speech recognition is not straightforward, because speech data is more complex and less interpretable than image data. In this work, we introduce Gradient-adjusted Neuron Activation Profiles (GradNAPs) as a means to interpret features and representations in Deep Neural Networks. GradNAPs are characteristic responses of ANNs to particular groups of inputs, which incorporate the relevance of neurons for prediction. We show how to utilize GradNAPs to gain insight about how data is processed in ANNs. This includes different ways of visualizing features and clustering of GradNAPs to compare embeddings of different groups of inputs in any layer of a given network. We demonstrate our proposed techniques using a fully-convolutional ASR model.
@article{Krug2020,author={Krug, Andreas and Stober, Sebastian},journal={arXiv preprint arXiv:2002.08125},title={Gradient-Adjusted Neuron Activation Profiles for Comprehensive Introspection of Convolutional Speech Recognition Models},year={2020},date={2020-02-19},eprint={2002.08125v1},eprintclass={cs.LG},eprinttype={arXiv},url={http://arxiv.org/abs/2002.08125}}
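The gradient adjustment can be sketched as weighting each neuron's activation by its gradient before group-averaging, so that neurons irrelevant to the prediction are suppressed in the profile. This is a simplified reading with assumed names and shapes; the paper's actual procedure differs in detail.

```python
import numpy as np

def grad_adjusted_profile(activations, gradients, labels, group):
    """Characteristic response of a layer for one group of inputs, with
    each activation weighted by its gradient (relevance for prediction).

    activations, gradients: (n_examples, n_neurons)
    labels: (n_examples,) group label per input
    """
    mask = labels == group
    relevance = activations[mask] * gradients[mask]  # grad x activation
    return relevance.mean(axis=0)

# toy layer: neuron 1 is active but carries no gradient -> irrelevant
acts = np.array([[1.0, 2.0], [3.0, 4.0]])
grads = np.array([[0.5, 0.0], [0.5, 0.0]])
labels = np.array([0, 0])
profile = grad_adjusted_profile(acts, grads, labels, 0)
```

Compared with a plain NAP, the strongly active but prediction-irrelevant neuron drops out of the profile entirely.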
Balancing Active Inference and Active Learning with Deep Variational Predictive Coding for EEG
André Ofner and Sebastian Stober
In IEEE International Conference on Systems, Man, and Cybernetics (SMC 2020), 2020
@inproceedings{ofner2020smc,author={Ofner, André and Stober, Sebastian},booktitle={IEEE International Conference on Systems, Man, and Cybernetics (SMC 2020)},title={Balancing Active Inference and Active Learning with Deep Variational Predictive Coding for {EEG}},year={2020},doi={10.1109/SMC42975.2020.9283147},}
PredNet and Predictive Coding: A Critical Review
Roshan Prakash Rane, Edit Szügyi, Vageesh Saxena, André Ofner, and Sebastian Stober
In Proceedings of the 2020 International Conference on Multimedia Retrieval, Dublin, Ireland, 2020
PredNet, a deep predictive coding network developed by Lotter et al., combines a biologically inspired architecture based on the propagation of prediction error with self-supervised representation learning in video. While the architecture has drawn a lot of attention and various extensions of the model exist, it has lacked a critical analysis. We fill this gap by evaluating PredNet both as an implementation of the predictive coding theory and as a self-supervised video prediction model using a challenging video action classification dataset. We design an extended model to test if conditioning future frame predictions on the action class of the video improves the model performance. We show that PredNet does not yet completely follow the principles of predictive coding. The proposed top-down conditioning leads to a performance gain on synthetic data, but does not scale up to the more complex real-world action classification dataset. Our analysis is aimed at guiding future research on similar architectures based on the predictive coding theory.
@inproceedings{rane2020icmr,author={Rane, Roshan Prakash and Sz\"{u}gyi, Edit and Saxena, Vageesh and Ofner, Andr\'{e} and Stober, Sebastian},booktitle={Proceedings of the 2020 International Conference on Multimedia Retrieval},title={PredNet and Predictive Coding: A Critical Review},year={2020},address={New York, NY, USA},pages={233--241},publisher={Association for Computing Machinery},series={ICMR '20},doi={10.1145/3372278.3390694},isbn={9781450370875},location={Dublin, Ireland},numpages={9},url={https://doi.org/10.1145/3372278.3390694}}
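The predictive-coding principle that the review measures PredNet against can be sketched in a few lines: a latent state generates a top-down prediction of the input, and only the local prediction error drives the update. This toy linear version is ours for illustration, not the PredNet architecture itself:

```python
import numpy as np

def pc_infer(x, W, steps=200, lr=0.1):
    """Infer a latent cause r for input x by descending the local
    prediction error x - W @ r (toy linear predictive coding)."""
    r = np.zeros(W.shape[1])
    for _ in range(steps):
        err = x - W @ r        # bottom-up prediction error
        r += lr * (W.T @ err)  # update driven only by the local error
    return r

# Generative weights and an input produced by a known latent cause;
# iterative error minimization should recover that cause.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_r = np.array([2.0, -1.0])
r_hat = pc_infer(W @ true_r, W)
```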
Modeling perception with hierarchical prediction: Auditory segmentation with deep predictive coding locates candidate evoked potentials in EEG
André Ofner and Sebastian Stober
In 21st International Society for Music Information Retrieval Conference (ISMIR’20), 2020
The human response to music combines low-level expectations that are driven by the perceptual characteristics of audio with high-level expectations from the context and the listener’s expertise. This paper discusses surprisal based music representation learning with a hierarchical predictive neural network. In order to inspect the cognitive validity of the network’s predictions along their time-scales, we use the network’s prediction error to segment electroencephalograms (EEG) based on the audio signal. For this, we investigate the unsupervised segmentation of audio and EEG into events using the NMED-T dataset on passive natural music listening. The conducted exploratory analysis of EEG at locations connected to peaks in prediction error in the network allowed us to visualize auditory evoked potentials connected to local and global musical structures. This indicates the potential of unsupervised predictive learning with deep neural networks as a means to retrieve musical structure from audio and as a basis to uncover the corresponding cognitive processes in the human brain.
@inproceedings{ofner2020ismir,author={Ofner, André and Stober, Sebastian},booktitle={21st International Society for Music Information Retrieval Conference (ISMIR'20)},title={Modeling perception with hierarchical prediction: Auditory segmentation with deep predictive coding locates candidate evoked potentials in {EEG}},year={2020},doi={10.5281/zenodo.4245495},url={https://program.ismir2020.net/static/final_papers/219.pdf}}
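The segmentation principle used here — placing event boundaries wherever the prediction error peaks — can be illustrated with a deliberately naive last-value predictor standing in for the deep predictive network:

```python
import numpy as np

def segment_by_prediction_error(signal, threshold):
    """Boundaries where a last-value predictor's error exceeds `threshold`;
    a toy stand-in for the deep network's prediction-error signal."""
    prediction_error = np.abs(np.diff(signal))  # error of predicting x[t] = x[t-1]
    return np.flatnonzero(prediction_error > threshold) + 1

# Piecewise-constant "audio feature": surprises occur at the two change points.
sig = np.concatenate([np.zeros(10), np.ones(10), np.full(10, 3.0)])
boundaries = segment_by_prediction_error(sig, threshold=0.5)
```

In the paper the error comes from a hierarchical network, so different layers yield boundaries at different time-scales; the thresholding step stays the same in spirit.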
2019
Visualizing Deep Neural Networks for Speech Recognition with Learned Topographic Filter Maps
Andreas Krug and Sebastian Stober
In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2019
The uninformative ordering of artificial neurons in Deep Neural Networks complicates visualizing activations in deeper layers. This is one reason why the internal structure of such models is very unintuitive. In neuroscience, activity of real brains can be visualized by highlighting active regions. Inspired by those techniques, we train a convolutional speech recognition model, where filters are arranged in a 2D grid and neighboring filters are similar to each other. We show how those topographic filter maps visualize artificial neuron activations more intuitively. Moreover, we investigate whether this causes phoneme-responsive neurons to be grouped in certain regions of the topographic map.
@inproceedings{krug2019blackboxnlp,author={Krug, Andreas and Stober, Sebastian},title={Visualizing Deep Neural Networks for Speech Recognition with Learned Topographic Filter Maps},booktitle={Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},year={2019},}
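The paper learns the 2D filter layout during training; as a rough illustration of the same idea, one can re-order trained filters post hoc so that neighbors are similar and then reshape the order into a grid. The greedy chain below is only a stand-in for that learned topography:

```python
import numpy as np

def greedy_topographic_order(filters):
    """Order filters so each is the nearest unplaced neighbor of the
    previous one; reshaping the order gives a rough topographic map.
    (The paper learns the layout during training; this post-hoc greedy
    chain is a simplified stand-in for illustration.)"""
    order, remaining = [0], set(range(1, len(filters)))
    while remaining:
        last = filters[order[-1]]
        nearest = min(remaining, key=lambda i: np.linalg.norm(filters[i] - last))
        order.append(nearest)
        remaining.remove(nearest)
    return order

rng = np.random.default_rng(1)
# 16 toy filters drawn from two tight clusters; a good topographic
# ordering keeps each cluster contiguous on the 4x4 map.
filters = np.vstack([rng.normal(0.0, 0.1, (8, 5)),
                     rng.normal(3.0, 0.1, (8, 5))])
topo_map = np.array(greedy_topographic_order(filters)).reshape(4, 4)
```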
Siri visualisiert
Andreas Krug and Sebastian Stober
In Proceedings of the 2019 NaWik Symposium Karlsruhe, 2019
@inproceedings{krug2019nawik,author={Krug, Andreas and Stober, Sebastian},title={Siri visualisiert},booktitle={Proceedings of the 2019 NaWik Symposium Karlsruhe},year={2019},pages={24--25},organization={NaWik},}
Hybrid Variational Predictive Coding as a Bridge between Human and Artificial Cognition
André Ofner and Sebastian Stober
In The 2019 Conference on Artificial Life, 2019
Predictive coding and its generalization to active inference offer a unified theory of brain function. The underlying predictive processing paradigm has gained significant attention in artificial intelligence research for its representation learning and predictive capacity. Here, we suggest that it is possible to integrate human and artificial generative models with a predictive coding network that processes sensations simultaneously with the signature of predictive coding found in human neuroimaging data. We propose a recurrent hierarchical predictive coding model that predicts low-dimensional representations of stimuli, electroencephalogram and physiological signals with variational inference. We suggest that in a shared environment, such hybrid predictive coding networks learn to incorporate the human predictive model in order to reduce prediction error. We evaluate the model on a publicly available EEG dataset of subjects watching one-minute long video excerpts. Our initial results indicate that the model can be trained to predict visual properties such as the amount, distance and motion of human subjects in videos.
@article{ofner2019alife,author={Ofner, André and Stober, Sebastian},journal={The 2019 Conference on Artificial Life},title={Hybrid Variational Predictive Coding as a Bridge between Human and Artificial Cognition},year={2019},number={31},pages={68-69},doi={10.1162/isal\_a\_00142},eprint={https://www.mitpressjournals.org/doi/pdf/10.1162/isal_a_00142},url={https://www.mitpressjournals.org/doi/abs/10.1162/isal_a_00142},}
Knowledge transfer in coupled predictive coding networks
André Ofner and Sebastian Stober
In Bernstein Conference 2019, 2019
Predictive coding offers a comprehensive explanation of human brain function through prediction error minimisation. This idea has found traction in machine learning, where deterministic and stochastic inference allow efficient representation of sensory signals. Recently, these artificial predictive coding networks have been coupled with the brain as its natural counterpart to develop co-adaptive brain-computer interfaces based on predictive coding as a shared principle.
However, it remains unclear how differences in prior knowledge affect information transfer between the coupled predictive coding networks. To address this question, this study introduces a sequential and hierarchical stochastic predictive coding model where predictions about future sensory states are conditioned on past states and a top-down predictive signal for each layer.
Using synthetic visual stimuli, we demonstrate the model's capacity to incorporate knowledge from a coupled network by comparing the generated prediction error signature with the corresponding stimulus. Our results show that information from the coupled network aids the functional differentiation and can be used to encode aspects of the stimuli that are not visible to the model itself.
@inproceedings{ofner2019bc,author={Ofner, André and Stober, Sebastian},title={Knowledge transfer in coupled predictive coding networks},booktitle={Bernstein Conference 2019},year={2019},doi={10.12751/nncn.bc2019.0073},}
Predictive Coding Based Vision For Autonomous Cars
Roshan Prakash Rane, André Ofner, Shreyas Gite, and Sebastian Stober
In Computational Cognition 2019 Workshop, 2019
In recent decades, Predictive Coding has emerged as a unifying theory of human cognition. Related theories in cognitive neuroscience, such as Active Inference and Free Energy Minimization, have demonstrated that Predictive Coding can account for many aspects of human perception and action. However, little work has been done to explore the Predictive Coding framework in practical domains like computer vision or robotics.
A popular implementation in the field of computer vision that is inspired by Predictive Coding is called ‘PredNet’. PredNet is trained on videos to perform future frame prediction. In a purely perceptual setup like this, Predictive Coding is defined as a hierarchical generative model that dynamically infers low-dimensional causes from high-dimensional perceptual stimuli. The architecture is trained at each level of its hierarchy to learn low-dimensional causal factors from temporal visual data by actively generating top-down predictions or hypotheses and testing them against bottom-up incoming frames or sensory evidence. In our recent work, we inspected the PredNet architecture and found that it fails to emulate, and therefore benefit from, many core ideas of Predictive Coding. We will highlight these conceptual limitations of PredNet and present preliminary results from our improved Predictive Coding architecture.
Even though our architecture is inspired by PredNet, it differs from it in three main ways: (1) it is designed to perform semantic segmentation, an important vision task for autonomous driving in which the pixels of an image are classified as belonging to a semantic category like drivable road, pedestrian, or car; (2) the top-down predictions represent semantic class maps rather than pixel values; and (3) it performs not just short-term but also long-term predictions along its hierarchy.
Finally, we compare our architecture’s performance against contemporary deep learning methods for the autonomous driving vision task. We assess the semantic segmentation accuracy with an emphasis on computational efficiency, including model size, the amount of training data needed, and run-time. We also inspect the ability of the model to adjust to differing visual contexts like daytime and nighttime, and different weather conditions like rain or snow.
@inproceedings{rane2019comco,author={Rane, Roshan Prakash and Ofner, André and Gite, Shreyas and Stober, Sebastian},booktitle={Computational Cognition 2019 Workshop},title={Predictive Coding Based Vision For Autonomous Cars},year={2019},url={http://www.comco2019.com/abstracts/day1_rane.pdf}}
2018
Neuron Activation Profiles for Interpreting Convolutional Speech Recognition Models
Andreas Krug, René Knaebel, and Sebastian Stober
In NeurIPS 2018 Interpretability and Robustness for Audio, Speech and Language Workshop (IRASL’18), 2018
The increasing complexity of deep Artificial Neural Networks (ANNs) makes it possible to solve complex tasks in various applications, but comes with less understanding of decision processes in ANNs. Therefore, introspection techniques have been proposed to interpret how the network accomplishes its task. Those methods mostly visualize their results in the input domain and often only process single samples. For images, highlighting important features or creating inputs which activate certain neurons is intuitively interpretable. The same introspection for speech is much harder to interpret. In this paper, we propose an alternative method which analyzes neuron activations for whole data sets. Its generality allows application to complex data like speech. We introduce time-independent Neuron Activation Profiles (NAPs) as characteristic network responses to certain groups of inputs. By clustering those time-independent NAPs, we reveal that layers are specific to certain groups. We demonstrate our method for a fully-convolutional speech recognizer. There, we investigate whether phonemes are implicitly learned as an intermediate representation for predicting graphemes. We show that our method reveals which layers encode phonemes and graphemes and that similarities between phonetic categories are reflected in the clustering of time-independent NAPs.
@inproceedings{krug2018irasl,author={Krug, Andreas and Knaebel, René and Stober, Sebastian},title={Neuron Activation Profiles for Interpreting Convolutional Speech Recognition Models},booktitle={NeurIPS 2018 Interpretability and Robustness for Audio, Speech and Language Workshop (IRASL'18)},year={2018},}
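At its core, a time-independent NAP is a per-group average of a layer's activations. The sketch below shows that core step only, omitting the paper's temporal handling of speech inputs (an assumed simplification):

```python
import numpy as np

def neuron_activation_profiles(activations, labels):
    """Mean activation per neuron for each group of inputs -- the core of
    a time-independent NAP (the paper's temporal alignment is omitted).

    activations: (n_samples, n_neurons) hidden-layer activations
    labels: (n_samples,) group label per input (e.g. phoneme class)
    """
    return {g: activations[labels == g].mean(axis=0) for g in np.unique(labels)}

rng = np.random.default_rng(0)
# Two input groups driving clearly different activation patterns.
acts = np.vstack([rng.normal(1.0, 0.1, (50, 4)),
                  rng.normal(-1.0, 0.1, (50, 4))])
labels = np.array(["a"] * 50 + ["b"] * 50)
naps = neuron_activation_profiles(acts, labels)
```

Clustering the resulting profile vectors (e.g. with any off-the-shelf hierarchical clustering) is then what reveals which groups a layer treats similarly.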
Introspection for Convolutional Automatic Speech Recognition
Andreas Krug and Sebastian Stober
In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 2018
Artificial Neural Networks (ANNs) have experienced great success in the past few years. The increasing complexity of these models leads to less understanding about their decision processes. Therefore, introspection techniques have been proposed, mostly for images as input data. Patterns or relevant regions in images can be intuitively interpreted by a human observer. This is not the case for more complex data like speech recordings. In this work, we investigate the application of common introspection techniques from computer vision to an Automatic Speech Recognition (ASR) task. To this end, we use a model similar to image classification, which predicts letters from spectrograms. We show difficulties in applying image introspection to ASR. To tackle these problems, we propose normalized averaging of aligned inputs (NAvAI): a data-driven method to reveal learned patterns for prediction of specific classes. Our method integrates information from many data examples through local introspection techniques for Convolutional Neural Networks (CNNs). We demonstrate that our method provides better interpretability of letter-specific patterns than existing methods.
@inproceedings{krug2018introspection,author={Krug, Andreas and Stober, Sebastian},title={Introspection for Convolutional Automatic Speech Recognition},booktitle={Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP},year={2018},pages={187--199},url={http://www.aclweb.org/anthology/W18-5421}}
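The alignment-then-average step at the heart of NAvAI can be sketched as follows; cropping a fixed window around the aligned frame and subtracting the global mean is an assumed simplification of the paper's normalization, not its exact procedure:

```python
import numpy as np

def navai(inputs, aligned_positions, width=3):
    """Average inputs after aligning them at the frame tied to the
    predicted class, then subtract the global mean. (The normalization
    here is an assumed simplification of the paper's method.)"""
    crops = np.stack([x[p - width:p + width + 1]
                      for x, p in zip(inputs, aligned_positions)])
    return crops.mean(axis=0) - inputs.mean()

# Each toy input carries the class-specific pattern (a bump) at a
# different time step; aligned averaging recovers the shared pattern.
positions = np.array([5, 9, 12, 7])
inputs = np.zeros((4, 20))
inputs[np.arange(4), positions] = 5.0
pattern = navai(inputs, positions)
```

Without the alignment, averaging the raw inputs would smear the bumps across all four positions; the learned pattern would be invisible.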
Hybrid Active Inference
André Ofner and Sebastian Stober
arXiv preprint arXiv:1810.02647, 2018
We describe a framework of hybrid cognition by formulating a hybrid cognitive agent that performs hierarchical active inference across a human and a machine part. We suggest that, in addition to enhancing human cognitive functions with an intelligent and adaptive interface, integrated cognitive processing could accelerate emergent properties within artificial intelligence. To establish this, a machine learning part learns to integrate into human cognition by explaining away multi-modal sensory measurements from the environment and physiology simultaneously with the brain signal. With ongoing training, the amount of predictable brain signal increases. This lends the agent the ability to self-supervise on increasingly high levels of cognitive processing in order to further minimize surprise in predicting the brain signal. Furthermore, with increasing level of integration, the access to sensory information about environment and physiology is substituted with access to their representation in the brain. While integrating into a joint embodiment of human and machine, human action and perception are treated as the machine’s own. The framework can be implemented with invasive as well as non-invasive sensors for environment, body and brain interfacing. Online and offline training with different machine learning approaches are conceivable. Building on previous research on shared representation learning, we suggest a first implementation leading towards hybrid active inference with non-invasive brain interfacing and state of the art probabilistic deep learning methods. We further discuss how the implementation might affect the meta-cognitive abilities of the described agent and suggest that with adequate implementation the machine part can continue to execute and build upon the learned cognitive processes autonomously.
@article{ofner2018hai,author={Ofner, André and Stober, Sebastian},title={Hybrid Active Inference},journal={arXiv preprint arXiv:1810.02647},year={2018},archiveprefix={arXiv},eprint={1810.02647},primaryclass={cs.AI},url={https://arxiv.org/abs/1810.02647}}
Towards Bridging Human and Artificial Cognition: Hybrid Variational Predictive Coding of the Physical World, the Body and the Brain
André Ofner and Sebastian Stober
In NeurIPS 2018 Workshop on Modeling the Physical World, 2018
Predictive coding and its generalization to active inference offer a unified theory of brain function. The underlying predictive processing paradigm has gained significant attention within the machine learning community for its representation learning and predictive capacity. Here, we suggest that it is possible to integrate human and artificial predictive models with an artificial neural network that learns to predict sensations simultaneously with their representation in the brain. Guided by the principles of active inference, we propose a recurrent hierarchical predictive coding model that jointly predicts stimuli, electroencephalogram and physiological signals under variational inference. We suggest that in a shared environment, the artificial inference process can learn to predict and exploit the human generative model. We evaluate the model on a publicly available dataset of subjects watching one-minute long video excerpts and show that the model can be trained to predict physical properties such as the amount, distance and motion of human subjects in future frames of the videos. Our results hint at the possibility of bi-directional active inference across human and machine.
@inproceedings{ofner2018hpc,author={Ofner, André and Stober, Sebastian},title={Towards Bridging Human and Artificial Cognition: Hybrid Variational Predictive Coding of the Physical World, the Body and the Brain},booktitle={NeurIPS 2018 Workshop on Modeling the Physical World},year={2018},}
2017
Transfer Learning for Speech Recognition on a Budget
Julius Kunze, Louis Kirsch, Ilia Kurenkov, Andreas Krug, Jens Johannsmeier, and Sebastian Stober
In 2nd Workshop on Representation Learning for NLP at the Annual Meeting of the Association for Computational Linguistics (ACL’17), 2017
End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less training data in order to achieve the same accuracy, thereby lowering the cost of training ASR models in other languages. Model introspection revealed that small adaptations to the network’s weights were sufficient for good performance, especially for inner layers.
@inproceedings{kunzeKKKJS2017acl,author={Kunze, Julius and Kirsch, Louis and Kurenkov, Ilia and Krug, Andreas and Johannsmeier, Jens and Stober, Sebastian},title={Transfer Learning for Speech Recognition on a Budget},booktitle={2nd Workshop on Representation Learning for NLP at the Annual Meeting of the Association for Computational Linguistics (ACL'17)},year={2017},url={https://arxiv.org/abs/1706.00290},xposter={rl4nlp2017poster.pdf}}
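Transfer by model adaptation boils down to reusing the source-language weights and updating only a subset of layers on the target language. The NumPy sketch below shows that update rule in miniature; the layer names are hypothetical and the actual work adapts a Wav2Letter network in a deep-learning framework:

```python
import numpy as np

def adapt_step(weights, grads, frozen, lr=0.1):
    """One update of transfer-by-adaptation: layers in `frozen` keep the
    source-language weights, the rest are fine-tuned on target-language
    gradients. (Minimal sketch with hypothetical layer names.)"""
    return {name: w if name in frozen else w - lr * grads[name]
            for name, w in weights.items()}

# Hypothetical two-layer "model": freeze the inner layer, adapt the output.
weights = {"inner_conv": np.ones(3), "output": np.ones(3)}
grads = {"inner_conv": np.full(3, 2.0), "output": np.full(3, 2.0)}
adapted = adapt_step(weights, grads, frozen={"inner_conv"})
```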
Adaptation of the Event-Related Potential Technique for Analyzing Artificial Neural Nets
Andreas Krug and Sebastian Stober
In Conference on Cognitive Computational Neuroscience (CCN’17), 2017
The increase in complexity of Artificial Neural Nets (ANNs) results in difficulties in understanding what they have learned and how they accomplish their goal. As their complexity approaches that of the human brain, neuroscientific techniques could facilitate their analysis. This paper investigates an adaptation of the Event-Related Potential (ERP) technique for analyzing ANNs demonstrated for a speech recognizer. Our adaptation involves deriving a large number of recordings (trials) for the same word and averaging the resulting neuron activations. This allows for a systematic analysis of neuron activations to reveal their function in detecting specific letters. We compare those observations between an English and German speech recognizer.
@inproceedings{krug2017ccn,author={Krug, Andreas and Stober, Sebastian},title={Adaptation of the Event-Related Potential Technique for Analyzing Artificial Neural Nets},booktitle={Conference on Cognitive Computational Neuroscience (CCN'17)},year={2017},xposter={ccn2017poster.pdf}}
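The ERP principle being adapted here is trial averaging: many noisy recordings of the same stimulus share an underlying response, and averaging them cancels the trial-specific noise (roughly as 1/sqrt(n_trials)). A toy demonstration:

```python
import numpy as np

rng = np.random.default_rng(2)
# One underlying "activation time course" (the same word across trials),
# each trial corrupted by independent noise -- as in ERP recordings.
signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
trials = signal + rng.normal(0.0, 1.0, (200, 100))

erp = trials.mean(axis=0)  # trial average: noise shrinks ~ 1/sqrt(200)
single_trial_err = np.abs(trials[0] - signal).mean()
erp_err = np.abs(erp - signal).mean()
```

In the paper the "trials" are network activations for many recordings of the same word rather than EEG epochs, but the averaging step is the same.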