My project “Brain-Computer Interaction through Music Imagery” was part of the ongoing effort of the Owen Lab to develop means of communicating with patients diagnosed as being in a “vegetative state” who are still able to control their imagination. A nice summary is provided by a 2012 Nature news feature.
Specifically, I explored how music imagination - i.e. imagining listening to specific music pieces - could be used as a paradigm for brain-computer interfaces. In this context, I introduced several deep learning techniques that were new to EEG analysis and to cognitive neuroscience in general. Furthermore, I started the OpenMIIR initiative and published a public-domain dataset of EEG recordings collected as part of my research.
2022
Deep Neural Networks and Auditory Imagery
André Ofner and Sebastian Stober
In Music and Mental Imagery, 2022
@inbook{Ofner2022,author={Ofner, André and Stober, Sebastian},pages={112--122},publisher={Routledge},title={Deep Neural Networks and Auditory Imagery},year={2022},isbn={9780429330070},month=nov,booktitle={Music and Mental Imagery},doi={10.4324/9780429330070-12}}
2018
Decoding Music Perception and Imagination using Deep Learning Techniques
Sebastian Stober and Avital Sternin
In Signal Processing and Machine Learning for Brain-Machine Interfaces, 2018
Deep learning is a sub-field of machine learning that has recently gained substantial popularity in various domains such as computer vision, automatic speech recognition, natural language processing, and bioinformatics. Deep learning techniques are able to learn complex feature representations from raw signals and thus also have potential to improve signal processing in the context of brain-computer interfaces (BCIs). However, they typically require large amounts of data for training – much more than what can often be provided with reasonable effort when working with brain activity recordings of any kind. In order to still leverage the power of deep learning techniques with limited available data, special care needs to be taken when designing the BCI task, defining the structure of the deep model, and choosing the training method.
This chapter presents example approaches for the specific scenario of music-based brain-computer interaction through electroencephalography (EEG) – in the hope that these will prove to be valuable in different settings as well. We explain important decisions for the design of the BCI task and their impact on the models and training techniques that can be used. Furthermore, we present and compare various pre-training techniques that aim to improve the signal-to-noise ratio. Finally, we discuss approaches to interpret the trained models.
@inbook{stober2018bcibook,chapter={Decoding Music Perception and Imagination using Deep Learning Techniques},pages={271--299},publisher={IET},year={2018},author={Stober, Sebastian and Sternin, Avital},editor={Tanaka, Toshihisa and Arvaneh, Mahnaz},booktitle={Signal Processing and Machine Learning for Brain-Machine Interfaces},doi={10.1049/PBCE114E},}
Moving Beyond ERP Components: A Selective Review of Approaches to Integrate EEG and Behavior
David A. Bridwell, James F. Cavanagh, Anne G.E. Collins, Michael D. Nunez, Ramesh Srinivasan, Sebastian Stober, and Vince D. Calhoun
In Frontiers in Human Neuroscience, 2018
Relationships between neuroimaging measures and behavior provide important clues about brain function and cognition in healthy and clinical populations. While electroencephalography (EEG) provides a portable, low cost measure of brain dynamics, it has been somewhat underrepresented in the emerging field of model-based inference. We seek to address this gap in this article by highlighting the utility of linking EEG and behavior, with an emphasis on approaches for EEG analysis that move beyond focusing on peaks or “components” derived from averaging EEG responses across trials and subjects (generating the event-related potential (ERP)). First, we review methods for deriving features from EEG in order to enhance the signal within single trials. These methods include filtering based on user-defined features (i.e. frequency decomposition, time-frequency decomposition), filtering based on data-driven properties (i.e. blind source separation (BSS)), and generating more abstract representations of data (e.g. using deep learning). We then review cognitive models which extract latent variables from experimental tasks, including the drift diffusion model (DDM) and reinforcement learning approaches. Next, we discuss ways to assess associations among these measures, including statistical models, data-driven joint models, and cognitive joint modeling using hierarchical Bayesian models (HBM). We think that these methodological tools are likely to contribute to theoretical advancements, and will help inform our understanding of brain dynamics that contribute to moment-to-moment cognitive function.
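For readers who want to experiment with the first family of approaches, the following Python sketch computes single-trial band-power features via frequency decomposition. The band definitions, sampling rate, and array shapes are illustrative assumptions, not code from the reviewed work.

```python
# Sketch: single-trial band-power features via frequency decomposition.
# Band choices, sampling rate, and shapes are hypothetical.
import numpy as np
from scipy.signal import welch

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_power_features(trials, fs=250.0):
    """trials: (n_trials, n_channels, n_samples) -> (n_trials, n_channels * n_bands)."""
    feats = []
    for trial in trials:
        freqs, psd = welch(trial, fs=fs, nperseg=min(trial.shape[-1], 512), axis=-1)
        trial_feats = []
        for lo, hi in BANDS.values():
            mask = (freqs >= lo) & (freqs < hi)
            trial_feats.append(psd[:, mask].mean(axis=-1))  # mean band power per channel
        feats.append(np.concatenate(trial_feats))
    return np.asarray(feats)

# Usage with random data standing in for real EEG:
X = np.random.randn(20, 64, 1000)   # 20 trials, 64 channels, 4 s at 250 Hz
features = band_power_features(X)   # shape (20, 192)
```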
@article{frontiers2018,author={Bridwell, David A. and Cavanagh, James F. and Collins, Anne G.E. and Nunez, Michael D. and Srinivasan, Ramesh and Stober, Sebastian and Calhoun, Vince D.},title={Moving Beyond {ERP} Components: A Selective Review of Approaches to Integrate {EEG} and Behavior},journal={Frontiers in Human Neuroscience},year={2018},volume={12},pages={106},doi={10.3389/fnhum.2018.00106},url={https://www.frontiersin.org/article/10.3389/fnhum.2018.00106},}
Shared Generative Representation of Auditory Concepts and EEG to Reconstruct Perceived and Imagined Music
André Ofner and Sebastian Stober
In 19th International Society for Music Information Retrieval Conference (ISMIR’18), 2018
Retrieving music information from brain activity is a challenging and still largely unexplored research problem. In this paper we investigate the possibility of reconstructing perceived and imagined musical stimuli from electroencephalography (EEG) recordings based on two datasets. One dataset contains multi-channel EEG of subjects listening to and imagining rhythmical patterns presented both as sine wave tones and short looped spoken utterances. These utterances leverage the well-known speech-to-song illusory transformation which results in very catchy and easy to reproduce motifs. A second dataset provides EEG recordings for the perception of 10 full-length songs. Using a multi-view deep generative model we demonstrate the feasibility of learning a shared latent representation of brain activity and auditory concepts, such as rhythmical motifs appearing across different instrumentations. Introspection of the model trained on the rhythm dataset reveals disentangled rhythmical and timbral features within and across subjects. The model allows continuous interpolation between representations of different observed variants of the presented stimuli. By decoding the learned embeddings we were able to reconstruct both perceived and imagined music. Stimulus complexity and the choice of training data show a strong effect on the reconstruction quality.
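As an illustration of the multi-view idea, here is a minimal PyTorch sketch of a two-view VAE with a shared latent space. The layer sizes and the naive averaging of the two posteriors are assumptions for illustration; the paper's actual model differs in detail.

```python
# Minimal multi-view VAE sketch: two encoders share one latent space.
# Sizes and the simple posterior averaging are illustrative assumptions.
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class MultiViewVAE(nn.Module):
    def __init__(self, eeg_dim, audio_dim, latent_dim=32):
        super().__init__()
        self.enc_eeg = ViewEncoder(eeg_dim, latent_dim)
        self.enc_audio = ViewEncoder(audio_dim, latent_dim)
        self.dec_eeg = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, eeg_dim))
        self.dec_audio = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                       nn.Linear(256, audio_dim))

    def forward(self, eeg, audio):
        mu_e, lv_e = self.enc_eeg(eeg)
        mu_a, lv_a = self.enc_audio(audio)
        mu, logvar = (mu_e + mu_a) / 2, (lv_e + lv_a) / 2  # naive posterior fusion
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec_eeg(z), self.dec_audio(z), mu, logvar

def vae_loss(eeg, audio, model):
    rec_e, rec_a, mu, logvar = model(eeg, audio)
    rec = ((rec_e - eeg) ** 2).mean() + ((rec_a - audio) ** 2).mean()
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```

Because both views map into the same latent space, decoding the EEG-side posterior through the audio decoder gives a (crude) stimulus reconstruction, which is the spirit of the approach described above.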
@inproceedings{ofner2018ismir,author={Ofner, André and Stober, Sebastian},title={Shared Generative Representation of Auditory Concepts and EEG to Reconstruct Perceived and Imagined Music},booktitle={19th International Society for Music Information Retrieval Conference (ISMIR'18)},year={2018},pages={392--399},url={http://ismir2018.ircam.fr/doc/pdfs/101_Paper.pdf}}
2017
Learning Discriminative Features from Electroencephalography Recordings by Encoding Similarity Constraints
Sebastian Stober
In Proceedings of 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’17), 2017
This paper introduces a pre-training technique for learning discriminative features from electroencephalography (EEG) recordings using deep neural networks. EEG data are generally only available in small quantities, they are high-dimensional with a poor signal-to-noise ratio, and there is considerable variability between individual subjects and recording sessions. Similarity-constraint encoders as introduced in this paper specifically address these challenges for feature learning. They learn features that allow distinguishing between classes by demanding that encodings of two trials from the same class are more similar to each other than to encoded trials from other classes. This tuple-based training approach is especially suitable for small datasets. The proposed technique is evaluated using the publicly available OpenMIIR dataset of EEG recordings taken while participants listened to and imagined music. For this dataset, a simple convolutional filter can be learned that significantly improves the signal-to-noise ratio while aggregating the 64 EEG channels into a single waveform.
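The core constraint can be sketched in a few lines of PyTorch: a single 1x1 convolution acts as a learned spatial filter that aggregates the 64 channels into one waveform, and a hinge loss enforces that an anchor trial is encoded more similarly to a same-class trial than to an other-class trial. The cosine similarity, margin, and hyperparameters here are illustrative assumptions rather than the paper's exact formulation.

```python
# Similarity-constraint encoder sketch: sim(anchor, same) > sim(anchor, other).
# Similarity measure and margin are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityConstraintEncoder(nn.Module):
    def __init__(self, n_channels=64):
        super().__init__()
        # one 1x1 convolution over channels = learned spatial filter that
        # aggregates all channels into a single waveform
        self.spatial_filter = nn.Conv1d(n_channels, 1, kernel_size=1, bias=False)

    def forward(self, x):                            # x: (batch, channels, time)
        return self.spatial_filter(x).squeeze(1)     # (batch, time)

def similarity_constraint_loss(enc_a, enc_same, enc_other, margin=1.0):
    sim_pos = F.cosine_similarity(enc_a, enc_same, dim=-1)
    sim_neg = F.cosine_similarity(enc_a, enc_other, dim=-1)
    return F.relu(margin - sim_pos + sim_neg).mean()  # hinge on the constraint

# One training step on random stand-in data:
model = SimilarityConstraintEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
a, p, n = (torch.randn(8, 64, 512) for _ in range(3))
loss = similarity_constraint_loss(model(a), model(p), model(n))
opt.zero_grad(); loss.backward(); opt.step()
```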
@inproceedings{stober2017icassp,author={Stober, Sebastian},title={Learning Discriminative Features from Electroencephalography Recordings by Encoding Similarity Constraints},booktitle={Proceedings of 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'17)},year={2017},pages={6175--6179},supplements={https://dx.doi.org/10.6084/m9.figshare.4530797},}
Towards Studying Music Cognition with Information Retrieval Techniques: Lessons Learned from the OpenMIIR Initiative
Sebastian Stober
In Frontiers in Psychology, 2017
As an emerging sub-field of music information retrieval (MIR), music imagery information retrieval (MIIR) aims to retrieve information from brain activity recorded during music cognition - such as listening to or imagining music pieces. This is a highly inter-disciplinary endeavor that requires expertise in MIR as well as cognitive neuroscience and psychology. The OpenMIIR initiative strives to foster collaborations between these fields to advance the state of the art in MIIR. As a first step, electroencephalography (EEG) recordings of music perception and imagination have been made publicly available, enabling MIR researchers to easily test and adapt their existing approaches for music analysis like fingerprinting, beat tracking or tempo estimation on this new kind of data. This paper reports on first results of MIIR experiments using these OpenMIIR datasets and points out how these findings could drive new research in cognitive neuroscience.
@article{stober2017frontiers,author={Stober, Sebastian},title={Towards {Studying} {Music} {Cognition} with {Information} {Retrieval} {Techniques}: {Lessons} {Learned} from the {OpenMIIR} {Initiative}},journal={Frontiers in Psychology},year={2017},volume={8},issn={1664-1078},doi={10.3389/fpsyg.2017.01255},language={English},shorttitle={Towards {Studying} {Music} {Cognition} with {Information} {Retrieval} {Techniques}},url={http://journal.frontiersin.org/article/10.3389/fpsyg.2017.01255/abstract},urldate={2017-07-24},}
2016
Brain Beats: Tempo Extraction from EEG Data
Sebastian Stober, Thomas Prätzlich, and Meinard Müller
In 17th International Society for Music Information Retrieval Conference (ISMIR’16), 2016
This paper addresses the question of how music information retrieval techniques originally developed to process audio recordings can be adapted for the analysis of corresponding brain activity data. In particular, we conducted a case study applying beat tracking techniques to extract the tempo from electroencephalography (EEG) recordings obtained from people listening to music stimuli. We point out similarities and differences in processing audio and EEG data and show to what extent the tempo can be successfully extracted from EEG signals. Furthermore, we demonstrate how the tempo extraction from EEG signals can be stabilized by applying different fusion approaches on the mid-level tempogram features.
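A rough Python sketch of the fusion idea, assuming librosa for the tempogram computation: each channel's rectified signal is treated as a novelty curve, the per-channel tempograms are averaged, and the dominant tempo is read off. Treating raw EEG as a novelty curve is a strong simplification for illustration only; the paper derives proper mid-level tempogram features.

```python
# Late-fusion tempo sketch: per-channel tempograms averaged across channels.
# Using rectified EEG as a novelty curve is a simplifying assumption.
import numpy as np
import librosa

def eeg_tempo(eeg, fs=64.0, hop_length=1):
    """eeg: (n_channels, n_samples), already band-pass filtered/downsampled."""
    tempograms = []
    for channel in eeg:
        novelty = np.maximum(channel, 0.0)          # crude novelty curve
        tg = librosa.feature.tempogram(onset_envelope=novelty, sr=int(fs),
                                       hop_length=hop_length, win_length=256)
        tempograms.append(tg)
    fused = np.mean(tempograms, axis=0)             # fusion across channels
    bpms = librosa.tempo_frequencies(fused.shape[0], sr=int(fs),
                                     hop_length=hop_length)
    strength = fused.mean(axis=1)                   # average over time frames
    strength[~np.isfinite(bpms)] = 0                # first bin maps to +inf BPM
    return bpms[np.argmax(strength)]
```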
@inproceedings{stober2016ismir,author={Stober, Sebastian and Pr\"{a}tzlich, Thomas and M\"{u}ller, Meinard},title={Brain Beats: Tempo Extraction from EEG Data},booktitle={17th International Society for Music Information Retrieval Conference (ISMIR'16)},year={2016},url={https://wp.nyu.edu/ismir2016/wp-content/uploads/sites/2294/2016/07/022_Paper.pdf},}
Learning Discriminative Features from Electroencephalography Recordings by Encoding Similarity Constraints
Sebastian Stober
In Bernstein Conference 2016, 2016
This work introduces a pre-training technique for learning discriminative features from electroencephalography (EEG) recordings using deep artificial neural networks. EEG data are generally only available in small quantities, they are high-dimensional with a poor signal-to-noise ratio, and there is considerable variability between individual subjects and recording sessions. Similarity-constraint encoders as introduced here specifically address these challenges for feature learning. They learn features that allow distinguishing between classes by demanding that encodings of two trials from the same class are more similar to each other than to encoded trials from other classes. This tuple-based training approach is especially suitable for small datasets. The proposed technique is evaluated using the publicly available OpenMIIR dataset of EEG recordings taken while 9 subjects listened to 12 short music pieces. For this dataset, a simple convolutional filter can be learned that is stable across subjects and significantly improves the signal-to-noise ratio while aggregating the 64 EEG channels into a single waveform. With this filter, a neural network classifier can be trained that is simple enough to allow for interpretation of the learned parameters by domain experts and facilitate findings about the cognitive processes. Further, a cross-subject classification accuracy of 27% is obtained with values above 40% for individual subjects.
@inproceedings{stober2016bc,author={Stober, Sebastian},title={Learning Discriminative Features from Electroencephalography Recordings by Encoding Similarity Constraints},booktitle={Bernstein Conference 2016},year={2016},doi={10.12751/nncn.bc2016.0223},}
2015
Tempo Estimation from the EEG Signal during Perception and Imagination of Music
Avital Sternin, Sebastian Stober, Jessica A. Grahn, and Adrian M. Owen
In 1st International Workshop on Brain-Computer Music Interfacing / 11th International Symposium on Computer Music Multidisciplinary Research (BCMI/CMMR’15), 2015
Electroencephalography (EEG) recordings taken during the perception and the imagination of music contain enough information to estimate the tempo of a musical piece. Five participants listened to and imagined 12 short clips taken from familiar musical pieces – each 7s-16s long. Basic EEG preprocessing techniques were used to remove artifacts and a dynamic beat tracker was used to estimate average tempo. Autocorrelation curves were computed to investigate the periodicity seen in the average EEG waveforms, and the peaks from these curves were found to be proportional to stimulus measure length. As the tempo at which participants imagine may vary over time, we used an aggregation technique that allowed us to estimate an accurate tempo over the course of an entire trial. We propose future directions involving convolutional neural networks (CNNs) that will allow us to apply our results to build a brain-computer interface.
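The aggregation idea can be sketched with plain NumPy: estimate a tempo from the autocorrelation peak of each sliding window and take the median across windows, so that a drifting imagined tempo still yields a single stable estimate. The window length, hop size, and 40-200 BPM search range are assumptions.

```python
# Autocorrelation-based tempo estimation with median aggregation over windows.
# Window sizes and BPM range are illustrative assumptions.
import numpy as np

def window_tempo(x, fs, bpm_lo=40, bpm_hi=200):
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # one-sided autocorrelation
    lag_lo = int(fs * 60 / bpm_hi)                      # shortest beat period
    lag_hi = min(int(fs * 60 / bpm_lo), len(ac) - 1)    # longest beat period
    best = lag_lo + np.argmax(ac[lag_lo:lag_hi])
    return 60.0 * fs / best                             # lag -> BPM

def trial_tempo(x, fs, win_s=8.0, hop_s=2.0):
    win, hop = int(win_s * fs), int(hop_s * fs)
    est = [window_tempo(x[s:s + win], fs)
           for s in range(0, len(x) - win + 1, hop)]
    return float(np.median(est))                        # robust aggregate
```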
@inproceedings{sternin2015bcmi,author={Sternin, Avital and Stober, Sebastian and Grahn, Jessica A. and Owen, Adrian M.},booktitle={1st International Workshop on Brain-Computer Music Interfacing / 11th International Symposium on Computer Music Multidisciplinary Research (BCMI/CMMR'15)},title={Tempo Estimation from the EEG Signal during Perception and Imagination of Music},year={2015},supplements={http://dx.doi.org/10.6084/m9.figshare.1213903}}
Classifying Perception and Imagination of Music from EEG
Avital Sternin, Sebastian Stober, Adrian M. Owen, and Jessica A. Grahn
In Society for Music Perception & Cognition Conference (SMPC’15), 2015
The neural processes involved in the perception of music are also involved in imagination. This overlap can be exploited by techniques that attempt to classify the contents of imagination from neural signals, such as signals recorded by EEG. Successful EEG-based classification of what an individual is imagining could pave the way for novel communication technologies, such as brain-computer interfaces. Our study explored whether we could accurately classify perceived and imagined musical stimuli from EEG data. To determine what characteristics of music resulted in the most distinct, and therefore most classifiable, EEG activity, we systematically varied properties of the music. These properties included time signature (3/4 versus 4/4), lyrics (music with lyrics versus music without), tempo (slow versus fast), and instrumentation. Our primary goal was to reliably distinguish between groups of stimuli based on these properties. We recorded EEG with a 64-channel BioSemi system while participants heard or imagined the different musical stimuli. We hypothesized that we would be able to classify which piece was being heard, or being imagined, from the EEG data.
Using principal components analysis, we identified components common to both the perception and imagination conditions. Preliminary analyses show that the time courses of these components are unique to each stimulus and may be used for classification. To investigate other features of the EEG recordings that correlate with stimuli and thus enable accurate classification, we applied a machine learning approach, using deep learning techniques including sparse auto-encoders and convolutional neural networks. This approach has shown promising initial results: we were able to classify stimuli at above chance levels based on their time signature and to estimate the tempo of perceived and imagined music from EEG data. Our findings may ultimately lead to the development of a music-based brain-computer interface.
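A hypothetical sketch of the component analysis using scikit-learn: fit PCA on the channel topographies of perception trials and project imagination trials onto the same basis, so component time courses are directly comparable across conditions. Shapes and the random stand-in data are purely illustrative.

```python
# PCA sketch: components fit on perception trials, reused for imagination.
# Random arrays stand in for real EEG; shapes are assumptions.
import numpy as np
from sklearn.decomposition import PCA

perception = np.random.randn(60, 64, 500)    # (trials, channels, samples)
imagination = np.random.randn(60, 64, 500)

# Treat every (trial, time) point as one observation of the 64-channel topography.
obs = perception.transpose(0, 2, 1).reshape(-1, 64)
pca = PCA(n_components=5).fit(obs)

def component_time_courses(trials, pca):
    flat = trials.transpose(0, 2, 1).reshape(-1, trials.shape[1])
    scores = pca.transform(flat)                          # (trials*time, 5)
    return scores.reshape(trials.shape[0], trials.shape[2], -1)

tc_perc = component_time_courses(perception, pca)         # (60, 500, 5)
tc_imag = component_time_courses(imagination, pca)        # same basis, comparable
```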
@inproceedings{sternin2015smpc,author={Sternin, Avital and Stober, Sebastian and Owen, Adrian M. and Grahn, Jessica A.},booktitle={Society for Music Perception \& Cognition Conference (SMPC'15)},title={Classifying Perception and Imagination of Music from EEG},year={2015},note={abstract/poster}}
Deep Feature Learning for EEG Recordings
Sebastian Stober, Avital Sternin, Adrian M. Owen, and Jessica A. Grahn
In arXiv preprint arXiv:1511.04306, 2015
We introduce and compare several strategies for learning discriminative features from electroencephalography (EEG) recordings using deep learning techniques. EEG data are generally only available in small quantities, they are high-dimensional with a poor signal-to-noise ratio, and there is considerable variability between individual subjects and recording sessions. Our proposed techniques specifically address these challenges for feature learning. Cross-trial encoding forces auto-encoders to focus on features that are stable across trials. Similarity-constraint encoders learn features that allow distinguishing between classes by demanding that two trials from the same class are more similar to each other than to trials from other classes. This tuple-based training approach is especially suitable for small datasets. Hydra-nets allow for separate processing pathways adapting to subsets of a dataset and thus combine the advantages of individual feature learning (better adaptation of early, low-level processing) with group model training (better generalization of higher-level processing in deeper layers). This way, models can, for instance, adapt to each subject individually to compensate for differences in spatial patterns due to anatomical differences or variance in electrode positions. The different techniques are evaluated using the publicly available OpenMIIR dataset of EEG recordings taken while participants listened to and imagined music.
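The hydra-net idea translates naturally into PyTorch: a separate early pathway per subject (here, a small per-subject spatial filter bank) feeding a shared deeper trunk. All layer shapes below are assumptions based on the description above, not the actual architecture from the paper.

```python
# Hydra-net sketch: per-subject early pathways + shared deeper trunk.
# Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class HydraNet(nn.Module):
    def __init__(self, n_subjects, n_channels=64, n_classes=12):
        super().__init__()
        # individual heads: one learned spatial filter bank per subject,
        # absorbing subject-specific spatial patterns / electrode placement
        self.heads = nn.ModuleList(
            nn.Conv1d(n_channels, 4, kernel_size=1) for _ in range(n_subjects))
        # shared trunk: temporal convolution + classifier, trained on all subjects
        self.trunk = nn.Sequential(
            nn.Conv1d(4, 8, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(16), nn.Flatten(),
            nn.Linear(8 * 16, n_classes))

    def forward(self, x, subject_ids):          # x: (batch, channels, time)
        h = torch.stack([self.heads[s](xi.unsqueeze(0)).squeeze(0)
                         for xi, s in zip(x, subject_ids)])
        return self.trunk(h)

model = HydraNet(n_subjects=10)
logits = model(torch.randn(4, 64, 512), subject_ids=[0, 3, 3, 7])
```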
@article{stober2015arXiv:1511.04306,title={Deep Feature Learning for {EEG} Recordings},author={Stober, Sebastian and Sternin, Avital and Owen, Adrian M. and Grahn, Jessica A.},journal={arXiv preprint arXiv:1511.04306},year={2015},note={submitted as conference paper for ICLR 2016},url={http://arxiv.org/abs/1511.04306}}
Towards Music Imagery Information Retrieval: Introducing the OpenMIIR Dataset of EEG Recordings from Music Perception and Imagination
Sebastian Stober, Avital Sternin, Adrian M. Owen, and Jessica A. Grahn
In 16th International Society for Music Information Retrieval Conference (ISMIR’15), 2015
Music imagery information retrieval (MIIR) systems may one day be able to recognize a song from only our thoughts. As a step towards such technology, we present a public domain dataset of electroencephalography (EEG) recordings taken during music perception and imagination. We acquired this data during an ongoing study that so far comprises 10 subjects listening to and imagining 12 short music fragments - each 7-16s long - taken from well-known pieces. These stimuli were selected from different genres and systematically vary along musical dimensions such as meter, tempo and the presence of lyrics. This way, various retrieval scenarios can be addressed and the success of classifying based on specific dimensions can be tested. The dataset aims to enable music information retrieval researchers interested in these new MIIR challenges to easily test and adapt their existing approaches for music analysis like fingerprinting, beat tracking, or tempo estimation on EEG data.
@inproceedings{stober2015ismir,author={Stober, Sebastian and Sternin, Avital and Owen, Adrian M. and Grahn, Jessica A.},booktitle={16th International Society for Music Information Retrieval Conference (ISMIR'15)},title={Towards Music Imagery Information Retrieval: Introducing the OpenMIIR Dataset of {EEG} Recordings from Music Perception and Imagination},year={2015},pages={763--769},url={http://ismir2015.uma.es/articles/224_Paper.pdf},supplements={http://dx.doi.org/10.6084/m9.figshare.1108287}}
2014
Using Deep Learning Techniques to Analyze and Classify EEG Recordings
Sebastian Stober
In Computational Neuroscience Workshop at Unconventional Computation and Natural Computation Conference (UCNC’14), 2014
@inproceedings{stober2014ucnc,author={Stober, Sebastian},booktitle={Computational Neuroscience Workshop at Unconventional Computation and Natural Computation Conference (UCNC'14)},title={Using Deep Learning Techniques to Analyze and Classify {EEG} Recordings},year={2014},note={abstract/poster}}
Does the Beat go on? – Identifying Rhythms from Brain Waves Recorded after Their Auditory Presentation
Sebastian Stober, Daniel J. Cameron, and Jessica A. Grahn
In Proceedings of the 9th Audio Mostly: A Conference on Interaction With Sound (AM’14), 2014
Music imagery information retrieval (MIIR) systems may one day be able to recognize a song just as we think of it. As one step towards such technology, we investigate whether rhythms can be identified from an electroencephalography (EEG) recording taken directly after their auditory presentation. The EEG data has been collected during a rhythm perception study in Kigali, Rwanda and comprises 12 East African and 12 Western rhythmic stimuli presented to 13 participants. Each stimulus was presented as a loop for 32 seconds followed by a break of four seconds before the next one started. Using convolutional neural networks (CNNs), we are able to recognize individual rhythms with a mean accuracy of 22.9% over all subjects by just looking at the EEG recorded during the silence between the stimuli.
@inproceedings{stober2014audiomostly,author={Stober, Sebastian and Cameron, Daniel J. and Grahn, Jessica A.},booktitle={Proceedings of the 9th Audio Mostly: A Conference on Interaction With Sound (AM'14)},title={Does the Beat go on? -- Identifying Rhythms from Brain Waves Recorded after Their Auditory Presentation},year={2014},pages={23:1--23:8},articleno={23},doi={10.1145/2636879.2636904},url={http://doi.acm.org/10.1145/2636879.2636904},}
Classifying EEG Recordings of Rhythm Perception
Sebastian Stober, Daniel J. Cameron, and Jessica A. Grahn
In 15th International Society for Music Information Retrieval Conference (ISMIR’14), 2014
Electroencephalography (EEG) recordings of rhythm perception might contain enough information to distinguish different rhythm types/genres or even identify the rhythms themselves. In this paper, we present the first classification results using deep learning techniques on EEG data recorded within a rhythm perception study in Kigali, Rwanda. We tested 13 adults, mean age 21, who performed three behavioral tasks using rhythmic tone sequences derived from either East African or Western music. For the EEG testing, 24 rhythms - half East African and half Western with identical tempo and based on a 2-bar 12/8 scheme - were each repeated for 32 seconds. During presentation, the participants’ brain waves were recorded via 14 EEG channels. We applied stacked denoising autoencoders and convolutional neural networks on the collected data to distinguish African and Western rhythms on a group and individual participant level. Furthermore, we investigated how far these techniques can be used to recognize the individual rhythms.
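As background for the first technique, here is a minimal denoising-autoencoder layer of the kind that gets stacked for pre-training: corrupt the input, train to reconstruct the clean version, and reuse the encoder as the input transform for the next layer. Sizes and noise level are illustrative assumptions, not the paper's configuration.

```python
# Denoising autoencoder layer sketch for greedy layer-wise pre-training.
# Dimensions and noise level are assumptions.
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, in_dim, hid_dim, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        self.decoder = nn.Linear(hid_dim, in_dim)

    def forward(self, x):
        corrupted = x + self.noise_std * torch.randn_like(x)  # corrupt input
        return self.decoder(self.encoder(corrupted))          # reconstruct clean x

def pretrain_layer(ae, data, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ((ae(data) - data) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    # stacking: encode the data with this layer and train the next DAE on it
    return ae.encoder
```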
@inproceedings{stober2014ismir,author={Stober, Sebastian and Cameron, Daniel J. and Grahn, Jessica A.},booktitle={15th International Society for Music Information Retrieval Conference (ISMIR'14)},title={Classifying {EEG} Recordings of Rhythm Perception},year={2014},pages={649--654},supplements={http://dx.doi.org/10.6084/m9.figshare.1108287},url={http://www.terasoft.com.tw/conf/ismir2014/proceedings/T117_317_Paper.pdf}}
Using Convolutional Neural Networks to Recognize Rhythm Stimuli from Electroencephalography Recordings
Sebastian Stober, Daniel J. Cameron, and Jessica A. Grahn
In Advances in Neural Information Processing Systems 27 (NIPS’14), 2014
Electroencephalography (EEG) recordings of rhythm perception might contain enough information to distinguish different rhythm types/genres or even identify the rhythms themselves. We apply convolutional neural networks (CNNs) to analyze and classify EEG data recorded within a rhythm perception study in Kigali, Rwanda which comprises 12 East African and 12 Western rhythmic stimuli - each presented in a loop for 32 seconds to 13 participants. We investigate the impact of the data representation and the pre-processing steps for this classification task and compare different network structures. Using CNNs, we are able to recognize individual rhythms from the EEG with a mean classification accuracy of 24.4% (chance level 4.17%) over all subjects by looking at less than three seconds from a single channel. Aggregating predictions for multiple channels, a mean accuracy of up to 50% can be achieved for individual subjects.
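The per-channel classification with cross-channel aggregation can be sketched as follows in PyTorch: a small CNN scores single-channel excerpts, and a trial-level prediction averages the class probabilities over channels. The network shape is a placeholder, not the architecture from the paper.

```python
# Single-channel CNN + aggregation of predictions across channels.
# Network shape is an illustrative assumption.
import torch
import torch.nn as nn

class ChannelCNN(nn.Module):
    def __init__(self, n_classes=24):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=9, padding=4), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(8, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten(),
            nn.Linear(16 * 8, n_classes))

    def forward(self, x):                # x: (batch, 1, time)
        return self.net(x)

def classify_trial(model, trial):
    """trial: (n_channels, time) -> class id after aggregating over channels."""
    with torch.no_grad():
        logits = model(trial.unsqueeze(1))             # (n_channels, n_classes)
        probs = torch.softmax(logits, dim=-1).mean(0)  # average over channels
    return int(probs.argmax())

model = ChannelCNN()
pred = classify_trial(model, torch.randn(14, 750))     # e.g. 14 channels, ~3 s excerpt
```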
@inproceedings{stober2014nips,author={Stober, Sebastian and Cameron, Daniel J. and Grahn, Jessica A.},booktitle={Advances in Neural Information Processing Systems 27 (NIPS'14)},title={Using Convolutional Neural Networks to Recognize Rhythm Stimuli from Electroencephalography Recordings},year={2014},pages={1449--1457},supplements={http://dx.doi.org/10.6084/m9.figshare.1213903},url={http://papers.nips.cc/paper/5272-using-convolutional-neural-networks-to-recognize-rhythm-stimuli-from-electroencephalography-recordings},}
2012
Music Imagery Information Retrieval: Bringing the Song on Your Mind back to Your Ears
Sebastian Stober and Jessica Thompson
In 13th International Conference on Music Information Retrieval (ISMIR’12) - Late-Breaking & Demo Papers, 2012
Most existing Music Information Retrieval (MIR) technologies require a user to use a query interface to search for a musical document. The mental image of the desired music is likely much richer than what the user is able to express through any query interface. This expressivity bottleneck could be circumvented if it were possible to directly read the music query from the user’s mind. To the authors’ knowledge, no such attempt has been made in the field of MIR so far. However, there have been recent advances in cognitive neuroscience that suggest such a system might be possible. Given these new insights, it seems promising to extend the focus of MIR by including music imagery - possibly forming a sub-discipline which could be called Music Imagery Information Retrieval (MIIR). As a first effort, there has been a dedicated session at the Late-Breaking & Demos event at the ISMIR 2012 conference. This paper aims to stimulate research in the field of MIIR by laying a roadmap for future work.
@inproceedings{ismir2012miir,title={Music Imagery Information Retrieval: Bringing the Song on Your Mind back to Your Ears},author={Stober, Sebastian and Thompson, Jessica},booktitle={13th International Conference on Music Information Retrieval (ISMIR'12) - Late-Breaking \& Demo Papers},year={2012}}