publications | Music Informatics Group

2025

Watcharasupat, Karn N.; Ding, Yiwei; Ma, T. Aleksandra; Seshadri, Pavan; Lerch, Alexander

Uncertainty Estimation in the Real World: A Study on Music Emotion Recognition Proceedings Article

In: Proceedings of the European Conference on Information Retrieval (ECIR), arXiv, Lucca, Italy, 2025.

Tags: Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

2024

Kim, Yonghyun; Lerch, Alexander

Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation Proceedings Article

In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), arXiv, San Francisco, 2024.

Ma, T. Aleksandra; Lerch, Alexander

Music auto-tagging in the long tail: A few-shot approach Proceedings Article

In: Proceedings of the AES Convention, New York, 2024.

Tags: Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, H.3.3

@inproceedings{ma_music_2024,

title = {Music auto-tagging in the long tail: A few-shot approach},

author = {T. Aleksandra Ma and Alexander Lerch},

url = {http://arxiv.org/abs/2409.07730},

doi = {10.48550/arXiv.2409.07730},

year = {2024},

date = {2024-09-01},

urldate = {2024-09-13},

booktitle = {Proceedings of the AES Convention},

address = {New York},

abstract = {In the realm of digital music, using tags to efficiently organize and retrieve music from extensive databases is crucial for music catalog owners. Human tagging by experts is labor-intensive but mostly accurate, whereas automatic tagging through supervised learning has approached satisfying accuracy but is restricted to a predefined set of training tags. Few-shot learning offers a viable solution to expand beyond this small set of predefined tags by enabling models to learn from only a few human-provided examples to understand tag meanings and subsequently apply these tags autonomously. We propose to integrate few-shot learning methodology into multi-label music auto-tagging by using features from pre-trained models as inputs to a lightweight linear classifier, also known as a linear probe. We investigate different popular pre-trained features, as well as different few-shot parametrizations with varying numbers of classes and samples per class. Our experiments demonstrate that a simple model with pre-trained features can achieve performance close to state-of-the-art models while using significantly less training data, such as 20 samples per tag. Additionally, our linear probe performs competitively with leading models when trained on the entire training dataset. The results show that this transfer learning-based few-shot approach could effectively address the issue of automatically assigning long-tail tags with only limited labeled data.},

keywords = {Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, H.3.3},

pubstate = {published},

tppubtype = {inproceedings}

}

Han, Chaeyeon; Seshadri, Pavan; Ding, Yiwei; Posner, Noah; Koo, Bon Woo; Agrawal, Animesh; Lerch, Alexander; Guhathakurta, Subhrajit

Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors Journal Article

In: Urban Informatics, vol. 3, no. 1, pp. 22, 2024, ISSN: 2731-6963.

Tags: Active mobility, Audio-based, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Pedestrian, Sensors

@article{han_understanding_2024,

title = {Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors},

author = {Chaeyeon Han and Pavan Seshadri and Yiwei Ding and Noah Posner and Bon Woo Koo and Animesh Agrawal and Alexander Lerch and Subhrajit Guhathakurta},

url = {https://doi.org/10.1007/s44212-024-00053-9},

doi = {10.1007/s44212-024-00053-9},

issn = {2731-6963},

year = {2024},

date = {2024-07-01},

urldate = {2024-07-10},

journal = {Urban Informatics},

volume = {3},

number = {1},

pages = {22},

abstract = {While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding pedestrian volumes and flows is essential for designing safer and more attractive pedestrian infrastructure and for controlling periodic overcrowding. This study discusses a new approach to scale up urban sensing of people with the help of novel audio-based technology. It assesses the benefits and limitations of microphone-based sensors as compared to other forms of pedestrian sensing. A large-scale dataset called ASPED is presented, which includes high-quality audio recordings along with video recordings used for labeling the pedestrian count data. The baseline analyses highlight the promise of using audio sensors for pedestrian tracking, although algorithmic and technological improvements to make the sensors practically usable continue. This study also demonstrates how the data can be leveraged to predict pedestrian trajectories. Finally, it discusses the use cases and scenarios where audio-based pedestrian sensing can support better urban and transportation planning.},

keywords = {Active mobility, Audio-based, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Pedestrian, Sensors},

pubstate = {published},

tppubtype = {article}

}

Watcharasupat, Karn N.; Lerch, Alexander

A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems Proceedings Article

In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), San Francisco, 2024.

@inproceedings{watcharasupat_stem-agnostic_2024,

title = {A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems},

author = {Karn N. Watcharasupat and Alexander Lerch},

url = {http://arxiv.org/abs/2406.18747},

doi = {10.48550/arXiv.2406.18747},

year = {2024},

date = {2024-06-01},

urldate = {2024-08-08},

booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},

address = {San Francisco},

abstract = {Despite significant recent progress across multiple subtasks of audio source separation, few music source separation systems support separation beyond the four-stem vocals, drums, bass, and other (VDBO) setup. Of the very few current systems that support source separation beyond this setup, most continue to rely on an inflexible decoder setup that can only support a fixed pre-defined set of stems. Increasing stem support in these inflexible systems correspondingly requires increasing computational complexity, rendering extensions of these systems computationally infeasible for long-tail instruments. In this work, we propose Banquet, a system that allows source separation of multiple stems using just one decoder. A bandsplit source separation model is extended to work in a query-based setup in tandem with a music instrument recognition PaSST model. On the MoisesDB dataset, Banquet, at only 24.9 M trainable parameters, approached the performance level of the significantly more complex 6-stem Hybrid Transformer Demucs on VDBO stems and outperformed it on guitar and piano. The query-based setup allows for the separation of narrow instrument classes such as clean acoustic guitars, and can be successfully applied to the extraction of less common stems such as reeds and organs. Implementation is available at https://github.com/kwatcharasupat/query-bandit.},

keywords = {Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},

pubstate = {published},

tppubtype = {inproceedings}

}

Ding, Yiwei; Lerch, Alexander

Embedding Compression for Teacher-to-Student Knowledge Transfer Proceedings Article

In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Satellite Workshop Deep Neural Network Model Compression, Institute of Electrical and Electronics Engineers (IEEE), Seoul, Korea, 2024, (arXiv:2402.06761 [cs]).

Tags: Computer Science - Machine Learning

2022

Ma, Alison B.; Lerch, Alexander

Representation Learning for the Automatic Indexing of Sound Effects Libraries Proceedings Article

In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Bangalore, IN, 2022, (arXiv:2208.09096 [cs, eess]).

Tags: Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Vinay, Ashvala; Lerch, Alexander

Evaluating Generative Audio Systems and their Metrics Proceedings Article

In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Bangalore, IN, 2022, (arXiv:2209.00130 [cs, eess]).

Tags: Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Kalbag, Vedant; Lerch, Alexander

Scream Detection in Heavy Metal Music Proceedings Article

In: Proceedings of the Sound and Music Computing Conference (SMC), Saint-Etienne, 2022.

Tags: Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

2021

Watcharasupat, Karn N.; Lerch, Alexander

Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes Proceedings Article

In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Online, 2021.

Tags: Computer Science - Information Retrieval, Computer Science - Information Theory, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
