2025
@inproceedings{watcharasupat_uncertainty_2025,
title = {Uncertainty Estimation in the Real World: A Study on Music Emotion Recognition},
author = {Karn N. Watcharasupat and Yiwei Ding and T. Aleksandra Ma and Pavan Seshadri and Alexander Lerch},
url = {http://arxiv.org/abs/2501.11570},
doi = {10.48550/arXiv.2501.11570},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-30},
booktitle = {Proceedings of the European Conference on Information Retrieval (ECIR)},
address = {Lucca, Italy},
abstract = {Any data annotation for subjective tasks shows potential variations between individuals. This is particularly true for annotations of emotional responses to musical stimuli. While older approaches to music emotion recognition systems frequently addressed this uncertainty problem through probabilistic modeling, modern systems based on neural networks tend to ignore the variability and focus only on predicting central tendencies of human subjective responses. In this work, we explore several methods for estimating not only the central tendencies of the subjective responses to a musical stimulus, but also the uncertainty associated with these responses. In particular, we investigate probabilistic loss functions and inference-time random sampling. Experimental results indicate that while the modeling of the central tendencies is achievable, modeling of the uncertainty in subjective responses proves significantly more challenging with currently available approaches even when empirical estimates of variations in the responses are available.},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
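
Editor's note: a minimal sketch of the probabilistic-loss family of approaches the abstract mentions, assuming a Gaussian parameterization of the response distribution. The two-head linear model and all dimensions below are illustrative assumptions, not the authors' architecture (Python/PyTorch).

import torch
import torch.nn as nn

class MeanVarianceHead(nn.Module):
    """Predicts a per-dimension mean and variance from a fixed-size audio embedding."""
    def __init__(self, embed_dim=512, n_targets=2):  # e.g., valence and arousal
        super().__init__()
        self.mean = nn.Linear(embed_dim, n_targets)
        self.log_var = nn.Linear(embed_dim, n_targets)  # log-space keeps variance positive

    def forward(self, x):
        return self.mean(x), self.log_var(x).exp()

model = MeanVarianceHead()
nll = nn.GaussianNLLLoss()       # penalizes prediction error and miscalibration jointly
emb = torch.randn(8, 512)        # a batch of hypothetical audio embeddings
target = torch.rand(8, 2)        # mean annotated valence/arousal per clip
mean, var = model(emb)
loss = nll(mean, target, var)
loss.backward()
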
2024
@inproceedings{kim_towards_2024,
title = {Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation},
author = {Yonghyun Kim and Alexander Lerch},
url = {http://arxiv.org/abs/2410.14122},
doi = {10.48550/arXiv.2410.14122},
year = {2024},
date = {2024-10-01},
urldate = {2024-10-25},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {San Francisco},
abstract = {Recent advancements in Automatic Piano Transcription (APT) have significantly improved system performance, but the impact of noisy environments on the system performance remains largely unexplored. This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models and evaluates the performance of the Onsets and Frames model when trained on noise-augmented data. We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
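
Editor's note: SNR-controlled white-noise injection of the kind the study explores can be sketched as follows. The scaling rule is standard practice; the paper's exact augmentation pipeline may differ (Python/NumPy).

import numpy as np

def add_white_noise(audio, snr_db, rng=None):
    """Mix white noise into `audio` so the mixture has the requested SNR in dB."""
    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.standard_normal(audio.shape)
    signal_power = np.mean(audio ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose a scale so that 10 * log10(signal_power / (scale**2 * noise_power)) == snr_db
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return audio + scale * noise

clean = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))  # a 1-second test tone
noisy = add_white_noise(clean, snr_db=10.0)
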
@inproceedings{ma_music_2024,
title = {Music auto-tagging in the long tail: A few-shot approach},
author = {T. Aleksandra Ma and Alexander Lerch},
url = {http://arxiv.org/abs/2409.07730},
doi = {10.48550/arXiv.2409.07730},
year = {2024},
date = {2024-09-01},
urldate = {2024-09-13},
booktitle = {Proceedings of the AES Convention},
address = {New York},
abstract = {In the realm of digital music, using tags to efficiently organize and retrieve music from extensive databases is crucial for music catalog owners. Human tagging by experts is labor-intensive but mostly accurate, whereas automatic tagging through supervised learning has approached satisfying accuracy but is restricted to a predefined set of training tags. Few-shot learning offers a viable solution to expand beyond this small set of predefined tags by enabling models to learn from only a few human-provided examples to understand tag meanings and subsequently apply these tags autonomously. We propose to integrate few-shot learning methodology into multi-label music auto-tagging by using features from pre-trained models as inputs to a lightweight linear classifier, also known as a linear probe. We investigate different popular pre-trained features, as well as different few-shot parametrizations with varying numbers of classes and samples per class. Our experiments demonstrate that a simple model with pre-trained features can achieve performance close to state-of-the-art models while using significantly less training data, such as 20 samples per tag. Additionally, our linear probe performs competitively with leading models when trained on the entire training dataset. The results show that this transfer learning-based few-shot approach could effectively address the issue of automatically assigning long-tail tags with only limited labeled data.},
keywords = {Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, H.3.3},
pubstate = {published},
tppubtype = {inproceedings}
}
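
Editor's note: the few-shot recipe described in the abstract, frozen pre-trained embeddings feeding a linear probe, reduces to very little code. The sketch below uses random arrays in place of real embeddings and tags; the 20-samples-per-tag split mirrors the figure quoted in the abstract but is otherwise a stand-in (Python/scikit-learn).

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 512))    # stand-in embeddings: 20 samples x 5 tags
y_train = rng.integers(0, 2, size=(100, 5))  # multi-hot tag annotations
X_test = rng.standard_normal((10, 512))

probe = OneVsRestClassifier(LogisticRegression(max_iter=1000))  # one linear classifier per tag
probe.fit(X_train, y_train)
tag_probs = probe.predict_proba(X_test)      # per-tag probabilities for unseen clips
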
@article{han_understanding_2024,
title = {Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors},
author = {Chaeyeon Han and Pavan Seshadri and Yiwei Ding and Noah Posner and Bon Woo Koo and Animesh Agrawal and Alexander Lerch and Subhrajit Guhathakurta},
url = {https://doi.org/10.1007/s44212-024-00053-9},
doi = {10.1007/s44212-024-00053-9},
issn = {2731-6963},
year = {2024},
date = {2024-07-01},
urldate = {2024-07-10},
journal = {Urban Informatics},
volume = {3},
number = {1},
pages = {22},
abstract = {While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding pedestrian volumes and flows is essential for designing safer and more attractive pedestrian infrastructure and for controlling periodic overcrowding. This study discusses a new approach to scale up urban sensing of people with the help of novel audio-based technology. It assesses the benefits and limitations of microphone-based sensors as compared to other forms of pedestrian sensing. A large-scale dataset called ASPED is presented, which includes high-quality audio recordings along with video recordings used for labeling the pedestrian count data. The baseline analyses highlight the promise of using audio sensors for pedestrian tracking, although algorithmic and technological improvements are still needed to make the sensors practically usable. This study also demonstrates how the data can be leveraged to predict pedestrian trajectories. Finally, it discusses the use cases and scenarios where audio-based pedestrian sensing can support better urban and transportation planning.},
keywords = {Active mobility, Audio-based, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Pedestrian, Sensors},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{watcharasupat_stem-agnostic_2024,
title = {A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems},
author = {Karn N. Watcharasupat and Alexander Lerch},
url = {http://arxiv.org/abs/2406.18747},
doi = {10.48550/arXiv.2406.18747},
year = {2024},
date = {2024-06-01},
urldate = {2024-08-08},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {San Francisco},
abstract = {Despite significant recent progress across multiple subtasks of audio source separation, few music source separation systems support separation beyond the four-stem vocals, drums, bass, and other (VDBO) setup. Of the very few current systems that support source separation beyond this setup, most continue to rely on an inflexible decoder setup that can only support a fixed pre-defined set of stems. Increasing stem support in these inflexible systems correspondingly requires increasing computational complexity, rendering extensions of these systems computationally infeasible for long-tail instruments. In this work, we propose Banquet, a system that allows source separation of multiple stems using just one decoder. A bandsplit source separation model is extended to work in a query-based setup in tandem with a music instrument recognition PaSST model. On the MoisesDB dataset, Banquet, at only 24.9 M trainable parameters, approached the performance level of the significantly more complex 6-stem Hybrid Transformer Demucs on VDBO stems and outperformed it on guitar and piano. The query-based setup allows for the separation of narrow instrument classes such as clean acoustic guitars, and can be successfully applied to the extraction of less common stems such as reeds and organs. Implementation is available at https://github.com/kwatcharasupat/query-bandit.},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
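
Editor's note: the core idea, one decoder whose output stem is selected by a query embedding, can be caricatured with FiLM-style conditioning. Banquet's actual bandsplit model and PaSST query encoder are far more involved, so treat this purely as an illustration (Python/PyTorch).

import torch
import torch.nn as nn

class QueryConditionedDecoder(nn.Module):
    """A single decoder whose behavior is modulated by a query embedding."""
    def __init__(self, feat_dim=256, query_dim=128):
        super().__init__()
        self.film = nn.Linear(query_dim, 2 * feat_dim)  # maps query -> (scale, shift)
        self.decode = nn.Linear(feat_dim, feat_dim)

    def forward(self, mix_feat, query_emb):
        scale, shift = self.film(query_emb).chunk(2, dim=-1)
        return self.decode(mix_feat * (1 + scale) + shift)

decoder = QueryConditionedDecoder()
mix_feat = torch.randn(8, 256)         # features of the mixture
query = torch.randn(8, 128)            # embedding of, e.g., a guitar query example
guitar_est = decoder(mix_feat, query)  # the same weights would serve any other stem
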
@inproceedings{seshadri_asped_2024,
title = {ASPED: An Audio Dataset for Detecting Pedestrians},
author = {Pavan Seshadri and Chaeyeon Han and Bon-Woo Koo and Noah Posner and Subhrajit Guhathakurta and Alexander Lerch},
url = {http://arxiv.org/abs/2309.06531},
doi = {10.48550/arXiv.2309.06531},
year = {2024},
date = {2024-01-01},
urldate = {2023-12-14},
booktitle = {Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Seoul},
abstract = {We introduce the new audio analysis task of pedestrian detection and present a new large-scale dataset for this task. While the preliminary results prove the viability of using audio approaches for pedestrian detection, they also show that this challenging task cannot be easily solved with standard approaches.},
keywords = {Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{watcharasupat_quantifying_2024,
title = {Quantifying Spatial Audio Quality Impairment},
author = {Karn N. Watcharasupat and Alexander Lerch},
year = {2024},
date = {2024-01-01},
urldate = {2023-12-14},
booktitle = {Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Seoul},
keywords = {Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
2023
@inproceedings{ding_audio_2023,
title = {Audio Embeddings as Teachers for Music Classification},
author = {Yiwei Ding and Alexander Lerch},
url = {http://arxiv.org/abs/2306.17424},
doi = {10.48550/arXiv.2306.17424},
year = {2023},
date = {2023-06-01},
urldate = {2023-06-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Milan, Italy},
abstract = {Music classification has been one of the most popular tasks in the field of music information retrieval. With the development of deep learning models, the last decade has seen impressive improvements in a wide range of classification tasks. However, the increasing model complexity makes both training and inference computationally expensive. In this paper, we integrate the ideas of transfer learning and feature-based knowledge distillation and systematically investigate using pre-trained audio embeddings as teachers to guide the training of low-complexity student networks. By regularizing the feature space of the student networks with the pre-trained embeddings, the knowledge in the teacher embeddings can be transferred to the students. We use various pre-trained audio embeddings and test the effectiveness of the method on the tasks of musical instrument classification and music auto-tagging. Results show that our method significantly improves the results in comparison to the identical model trained without the teacher's knowledge. This technique can also be combined with classical knowledge distillation approaches to further improve the model's performance.},
keywords = {Computer Science - Information Retrieval, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
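
Editor's note: the regularization the abstract describes amounts to adding a feature-matching term to the task loss. A compact sketch follows, with the linear projection and the 0.5 weight as arbitrary assumptions rather than the paper's settings (Python/PyTorch).

import torch
import torch.nn as nn
import torch.nn.functional as F

student_feat = torch.randn(8, 128, requires_grad=True)  # student's penultimate-layer features
teacher_emb = torch.randn(8, 512)                       # frozen pre-trained teacher embeddings
logits = torch.randn(8, 10, requires_grad=True)         # student's class predictions
labels = torch.randint(0, 10, (8,))

project = nn.Linear(128, 512)                 # match student and teacher dimensions
distill_loss = F.mse_loss(project(student_feat), teacher_emb)
task_loss = F.cross_entropy(logits, labels)
loss = task_loss + 0.5 * distill_loss         # weighted sum; 0.5 is illustrative
loss.backward()
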
2022
@inproceedings{ma_representation_2022,
title = {Representation Learning for the Automatic Indexing of Sound Effects Libraries},
author = {Alison B. Ma and Alexander Lerch},
url = {http://arxiv.org/abs/2208.09096},
doi = {10.48550/arXiv.2208.09096},
year = {2022},
date = {2022-08-01},
urldate = {2022-08-22},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Bangalore, India},
abstract = {Labeling and maintaining a commercial sound effects library is a time-consuming task exacerbated by databases that continually grow in size and undergo taxonomy updates. Moreover, sound search and taxonomy creation are complicated by non-uniform metadata, an unrelenting problem even with the introduction of a new industry standard, the Universal Category System. To address these problems and overcome dataset-dependent limitations that inhibit the successful training of deep learning models, we pursue representation learning to train generalized embeddings that can be used for a wide variety of sound effects libraries and are a taxonomy-agnostic representation of sound. We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size, outperforming established representations such as OpenL3. Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.},
note = {arXiv:2208.09096 [cs, eess]},
keywords = {Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
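
Editor's note: a representative metric-learning objective of the kind the paper studies is the triplet margin loss, which pulls same-class sound effects together and pushes different classes apart. The embeddings, dimensions, and margin below are placeholders (Python/PyTorch).

import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.2)
anchor = torch.randn(16, 256, requires_grad=True)  # embedding of a "door slam"
positive = torch.randn(16, 256)                    # another "door slam"
negative = torch.randn(16, 256)                    # e.g., a "rain" effect
loss = triplet(anchor, positive, negative)
loss.backward()
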
@inproceedings{vinay_evaluating_2022,
title = {Evaluating Generative Audio Systems and their Metrics},
author = {Ashvala Vinay and Alexander Lerch},
url = {http://arxiv.org/abs/2209.00130},
doi = {10.48550/arXiv.2209.00130},
year = {2022},
date = {2022-08-01},
urldate = {2022-09-03},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Bangalore, India},
abstract = {Recent years have seen considerable advances in audio synthesis with deep generative models. However, the state-of-the-art is very difficult to quantify; different studies often use different evaluation methodologies and different metrics when reporting results, making a direct comparison to other systems difficult if not impossible. Furthermore, the perceptual relevance and meaning of the reported metrics are in most cases unknown, prohibiting any conclusive insights with respect to practical usability and audio quality. This paper presents a study that investigates state-of-the-art approaches side-by-side with (i) a set of previously proposed objective metrics for audio reconstruction, and with (ii) a listening study. The results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.},
note = {arXiv:2209.00130 [cs, eess]},
keywords = {Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@misc{hung_feature-informed_2022-1,
title = {Feature-informed Latent Space Regularization for Music Source Separation},
author = {Yun-Ning Hung and Alexander Lerch},
url = {http://arxiv.org/abs/2203.09132},
doi = {10.48550/arXiv.2203.09132},
year = {2022},
date = {2022-06-01},
urldate = {2022-09-03},
publisher = {arXiv},
abstract = {The integration of additional side information to improve music source separation has been investigated numerous times, e.g., by adding features to the input or by adding learning targets in a multi-task learning scenario. These approaches, however, require additional annotations such as musical scores, instrument labels, etc. in training and possibly during inference. The available datasets for source separation do not usually provide these additional annotations. In this work, we explore transfer learning strategies to incorporate VGGish features with a state-of-the-art source separation model; VGGish features are known to be a very condensed representation of audio content and have been successfully used in many MIR tasks. We introduce three approaches to incorporate the features, including two latent space regularization methods and one naive concatenation method. Experimental results show that our proposed approaches improve several evaluation metrics for music source separation.},
note = {arXiv:2203.09132 [eess]},
keywords = {Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {misc}
}
@inproceedings{kalbag_scream_2022,
title = {Scream Detection in Heavy Metal Music},
author = {Vedant Kalbag and Alexander Lerch},
url = {http://arxiv.org/abs/2205.05580},
doi = {10.48550/arXiv.2205.05580},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the Sound and Music Computing Conference (SMC)},
address = {Saint-Etienne},
abstract = {Harsh vocal effects such as screams or growls are far more common in heavy metal vocals than the traditionally sung vocal. This paper explores the problem of detection and classification of extreme vocal techniques in heavy metal music, specifically the identification of different scream techniques. We investigate the suitability of various feature representations, including cepstral, spectral, and temporal features as input representations for classification. The main contributions of this work are (i) a manually annotated dataset comprised of over 280 minutes of heavy metal songs of various genres with a statistical analysis of occurrences of different extreme vocal techniques in heavy metal music, and (ii) a systematic study of different input feature representations for the classification of heavy metal vocals.},
keywords = {Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
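
Editor's note: the three feature families the abstract names (cepstral, spectral, temporal) can be extracted with standard tooling. The specific features and the bundled example clip below are stand-ins for the study's exact configuration (Python/librosa).

import librosa
import numpy as np

y, sr = librosa.load(librosa.example("trumpet"))           # stand-in for a vocal track
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # cepstral
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # spectral
zcr = librosa.feature.zero_crossing_rate(y)                # temporal
clip_features = np.concatenate([mfcc.mean(axis=1),         # summarize frames per clip
                                centroid.mean(axis=1),
                                zcr.mean(axis=1)])
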
@inproceedings{hung_feature-informed_2022,
title = {Feature-informed Embedding Space Regularization for Audio Classification},
author = {Yun-Ning Hung and Alexander Lerch},
url = {http://arxiv.org/abs/2206.04850},
doi = {10.48550/arXiv.2206.04850},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the European Signal Processing Conference (EUSIPCO)},
address = {Belgrade, Serbia},
abstract = {Feature representations derived from models pre-trained on large-scale datasets have shown their generalizability on a variety of audio analysis tasks. Despite this generalizability, however, task-specific features can outperform them if sufficient training data is available, as specific task-relevant properties can be learned. Furthermore, the complex pre-trained models bring considerable computational burdens during inference. We propose to leverage both detailed task-specific features from spectrogram input and generic pre-trained features by introducing two regularization methods that integrate the information of both feature classes. The workload is kept low during inference as the pre-trained features are only necessary for training. In experiments with the pre-trained features VGGish, OpenL3, and a combination of both, we show that the proposed methods not only outperform baseline methods, but also can improve state-of-the-art models on several audio classification tasks. The results also suggest that using the mixture of features performs better than using individual features.},
keywords = {Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
2021
@inproceedings{watcharasupat_evaluation_2021,
title = {Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes},
author = {Karn N. Watcharasupat and Alexander Lerch},
url = {http://arxiv.org/abs/2110.05587},
year = {2021},
date = {2021-10-01},
urldate = {2021-11-11},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Online},
abstract = {Controllable music generation with deep generative models has become increasingly reliant on disentanglement learning techniques. However, current disentanglement metrics, such as mutual information gap (MIG), are often inadequate and misleading when used for evaluating latent representations in the presence of interdependent semantic attributes often encountered in real-world music datasets. In this work, we propose a dependency-aware information metric as a drop-in replacement for MIG that accounts for the inherent relationship between semantic attributes.},
keywords = {Computer Science - Information Retrieval, Computer Science - Information Theory, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
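
Editor's note: for context, the standard mutual information gap (MIG) that the paper critiques can be computed as below for discretized latents and attributes; the authors' dependency-aware replacement is not reproduced here (Python).

import numpy as np
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score

def mig(latents, attrs):
    """Mean over attributes of (top MI - runner-up MI) / attribute entropy, in nats."""
    gaps = []
    for k in range(attrs.shape[1]):
        a = attrs[:, k]
        mi = np.array([mutual_info_score(z, a) for z in latents.T])
        top2 = np.sort(mi)[-2:]              # runner-up and top MI across latent dims
        h = entropy(np.bincount(a) / len(a))  # attribute entropy
        gaps.append((top2[1] - top2[0]) / h)
    return float(np.mean(gaps))

rng = np.random.default_rng(0)
latents = rng.integers(0, 10, size=(1000, 4))          # discretized latent codes
attrs = np.stack([latents[:, 0],                       # attribute copied from one latent
                  rng.integers(0, 10, 1000)], axis=1)  # an independent attribute
print(mig(latents, attrs))
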