Kim, Yonghyun; Lerch, Alexander: Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation. Proceedings Article In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), arXiv, San Francisco, 2024.
Ma, T. Aleksandra; Lerch, Alexander: Music auto-tagging in the long tail: A few-shot approach. Proceedings Article In: Proceedings of the AES Convention, New York, 2024.
Han, Chaeyeon; Seshadri, Pavan; Ding, Yiwei; Posner, Noah; Koo, Bon Woo; Agrawal, Animesh; Lerch, Alexander; Guhathakurta, Subhrajit: Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors. Journal Article In: Urban Informatics, vol. 3, no. 1, pp. 22, 2024, ISSN: 2731-6963.
Watcharasupat, Karn N.; Lerch, Alexander: A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems. Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), San Francisco, 2024.
Ooi, Kenneth; Goh, Jessie; Lin, Hao-Weng; Ong, Zhen-Ting; Wong, Trevor; Watcharasupat, Karn N.; Lam, Bhan; Gan, Woon-Seng: Lion City Soundscapes: Modified Partitioning around Medoids for a Perceptually Diverse Dataset of Singaporean Soundscapes. Journal Article In: JASA Express Letters, vol. 4, no. 4, pp. 047402, 2024, ISSN: 2691-1191.
Ding, Yiwei; Han, Chaeyeon; Seshadri, Pavan; Koo, Bon Woo; Posner, Noah; Guhathakurta, Subhro; Lerch, Alexander: Toward audio-based sensing for pedestrian detection. Journal Article In: The Journal of the Acoustical Society of America, vol. 155, no. 3_Supplement, pp. A282, 2024, ISSN: 0001-4966.
Ding, Yiwei; Lerch, Alexander: Embedding Compression for Teacher-to-Student Knowledge Transfer. Proceedings Article In: Proceedings of the International Conference on Acoustics Speech and Signal Processing (ICASSP) - Satellite Workshop Deep Neural Network Model Compression, Institute of Electrical and Electronics Engineers (IEEE), Seoul, Korea, 2024, (arXiv:2402.06761 [cs]).
Seshadri, Pavan; Han, Chaeyeon; Koo, Bon-Woo; Posner, Noah; Guhathakurta, Subhrajit; Lerch, Alexander: ASPED: An Audio Dataset for Detecting Pedestrians. Proceedings Article In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), Seoul, 2024.
Watcharasupat, Karn N.; Lerch, Alexander: Quantifying Spatial Audio Quality Impairment. Proceedings Article In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), Seoul, 2024.
Ooi, Kenneth; Ong, Zhen-Ting; Watcharasupat, Karn N.; Lam, Bhan; Hong, Joo Young; Gan, Woon-Seng: ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes. Journal Article In: IEEE Transactions on Affective Computing, vol. 15, no. 1, pp. 105–120, 2024, ISSN: 1949-3045.
Watcharasupat, Karn N.; Ooi, Kenneth; Lam, Bhan; Ong, Zhen-Ting; Jaratjarungkiat, Sureenate; Gan, Woon-Seng: Validating Thai Translations of Perceptual Soundscape Attributes: A Non-Procrustean Approach with a Procrustes Projection. Journal Article In: Applied Acoustics, 2024.
Liu, Shimiao; Lerch, Alexander: Enhancing Video Music Recommendation with Transformer-Driven Audio-Visual Embeddings. Proceedings Article In: Proceedings of the IEEE International Symposium on the Internet of Sounds (IS2), Erlangen, 2024.
Lam, Bhan; Chieng, Julia; Ooi, Kenneth; Ong, Zhen Ting; Watcharasupat, Karn N.; Hong, Joo Young; Gan, Woon Seng: Crossing the Linguistic Causeway: Ethnonational Differences on Soundscape Attributes in Bahasa Melayu. Journal Article In: Applied Acoustics, vol. 214, 2023, ISSN: 1872-910X.
Ding, Yiwei; Lerch, Alexander: Audio Embeddings as Teachers for Music Classification. Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Milan, Italy, 2023.
Knees, Peter; Lerch, Alexander: MILC 2023: 3rd Workshop on Intelligent Music Interfaces for Listening and Creation. Proceedings Article In: Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 185–186, Association for Computing Machinery, Sydney, 2023, ISBN: 9798400701078.
Chen, Hsin-Hung; Lerch, Alexander: Music Instrument Classification Reprogrammed. Proceedings Article In: Proceedings of the International Conference on Multimedia Modeling (MMM), Bergen, Norway, 2023.
Lerch, Alexander: An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications. Book, 2nd edition, Wiley-IEEE Press, Hoboken, N.J., 2023, ISBN: 978-1-119-89094-2.
Lerch, Alexander: Audioinhaltsanalyse. Book Section In: Weinzierl, Stefan (Ed.): Handbuch der Audiotechnik, pp. 1–20, Springer Berlin Heidelberg, Berlin, Heidelberg, 2023, ISBN: 978-3-662-60357-4.
Smith, Jason Brent; Vinay, Ashvala; Freeman, Jason: The Impact of Salient Musical Features in a Hybrid Recommendation System for a Sound Library. Proceedings Article In: Joint Proceedings of the ACM IUI Workshops (MILC), Sydney, 2023.
Hung, Yun-Ning; Yang, Chao-Han Huck; Chen, Pin-Yu; Lerch, Alexander: Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming. Proceedings Article In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), Rhodes Island, Greece, 2023.
Lerch, Alexander: Grundlagen digitaler Audiosignale. Book Section In: Weinzierl, Stefan (Ed.): Handbuch der Audiotechnik, pp. 1–13, Springer Berlin Heidelberg, Berlin, Heidelberg, 2023, ISBN: 978-3-662-60357-4.
Vinay, Ashvala; Lerch, Alexander: AQUATK: An Audio Quality Assessment Toolkit. Proceedings Article In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Milan, 2023.
Watcharasupat, Karn N.; Wu, Chih-Wei; Ding, Yiwei; Orife, Iroro; Hipple, Aaron J.; Williams, Phillip A.; Kramer, Scott; Lerch, Alexander; Wolcott, William: A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation. Journal Article In: IEEE Open Journal of Signal Processing, pp. 1–9, 2023, ISSN: 2644-1322.
Lam, Bhan; Ooi, Kenneth; Ong, Zhen-Ting; Wong, Trevor; Gan, Woon-Seng; Watcharasupat, Karn: Preliminary Investigation of the Short-Term in Situ Performance of an Automatic Masker Selection System. Proceedings Article In: Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering, 2023.
Ong, Zhen-Ting; Ooi, Kenneth; Wong, Trevor; Lam, Bhan; Gan, Woon-Seng; Watcharasupat, Karn N.: Effect of Masker Selection Schemes on the Perceived Affective Quality of Soundscapes: A Pilot Study. Proceedings Article In: Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering, 2023.
Ooi, Kenneth; Ong, Zhen-Ting; Lam, Bhan; Wong, Trevor; Gan, Woon-Seng; Watcharasupat, Karn: ARAUSv2: An Expanded Dataset and Multimodal Models of Affective Responses to Augmented Urban Soundscapes. Proceedings Article In: Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering, 2023.
Ooi, Kenneth; Watcharasupat, Karn N.; Lam, Bhan; Ong, Zhen-Ting; Gan, Woon-Seng: Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs. Proceedings Article In: Proceedings of the 2023 International Conference on Acoustics, Speech, and Signal Processing, 2023.
Hung, Yun-Ning; Wu, Chih-Wei; Orife, Iroro; Hipple, Aaron; Wolcott, William; Lerch, Alexander: A large TV dataset for speech and music activity detection. Journal Article In: EURASIP Journal on Audio, Speech, and Music Processing, vol. 2022, no. 1, pp. 21, 2022, ISSN: 1687-4722.
Ma, Alison B.; Lerch, Alexander: Representation Learning for the Automatic Indexing of Sound Effects Libraries. Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Bangalore, IN, 2022, (arXiv:2208.09096 [cs, eess]).
Vinay, Ashvala; Lerch, Alexander: Evaluating Generative Audio Systems and their Metrics. Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Bangalore, IN, 2022, (arXiv:2209.00130 [cs, eess]).
Lerch, Alexander: libACA, pyACA, and ACA-Code: Audio Content Analysis in 3 Languages. Journal Article In: Software Impacts, pp. 100349, 2022, ISSN: 2665-9638.
Hung, Yun-Ning; Lerch, Alexander: Feature-informed Latent Space Regularization for Music Source Separation. Miscellaneous, 2022, (arXiv:2203.09132 [eess]).
Wang, Ju-Chiang; Hung, Yun-Ning; Smith, Jordan B. L.: To Catch A Chorus, Verse, Intro, or Anything Else: Analyzing a Song with Structural Functions. Proceedings Article In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 416–420, 2022, (ISSN: 2379-190X).
Watcharasupat, Karn N.; Lee, Junyoung; Lerch, Alexander: Latte: Cross-framework Python Package for Evaluation of Latent-based Generative Models. Journal Article In: Software Impacts, pp. 100222, 2022, ISSN: 2665-9638.
Kalbag, Vedant; Lerch, Alexander: Scream Detection in Heavy Metal Music. Proceedings Article In: Proceedings of the Sound and Music Computing Conference (SMC), Saint-Etienne, 2022.
Hung, Yun-Ning; Lerch, Alexander: Feature-informed Embedding Space Regularization for Audio Classification. Proceedings Article In: Proceedings of the European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 2022.
Guo, Weian; Hua, Zhenyao; Kang, Zecheng; Li, Dongyang; Wang, Lei; Wu, Qidi; Lerch, Alexander: Deep Reinforcement Learning for Urban Multi-taxis Cruising Strategy. Journal Article In: Neural Computing and Applications, 2022, ISSN: 1433-3058.
Herre, Jürgen; Disch, Sascha; Lerch, Alexander: Quellcodierung. Book Section In: Weinzierl, Stefan (Ed.): Handbuch der Audiotechnik, pp. 1–23, Springer, Berlin, Heidelberg, 2022, ISBN: 978-3-662-60357-4.
Li, Dongyang; Wang, Lei; Li, Li; Guo, Weian; Wu, Qidi; Lerch, Alexander: A Large-Scale Multiobjective Particle Swarm Optimizer With Enhanced Balance of Convergence and Diversity. Journal Article In: IEEE Transactions on Cybernetics, pp. 1–12, 2022, ISSN: 2168-2275.
Watcharasupat, Karn N.; Ooi, Kenneth; Lam, Bhan; Wong, Trevor; Ong, Zhen Ting; Gan, Woon Seng: Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain. Journal Article In: IEEE Signal Processing Letters, vol. 29, pp. 1749–1753, 2022, ISSN: 1558-2361.
Watcharasupat, Karn N.; Lerch, Alexander: Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes. Proceedings Article In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Online, 2021.
Hung, Yun-Ning; Wichern, Gordon; Roux, Jonathan Le: Transcription Is All You Need: Learning To Separate Musical Mixtures With Score As Supervision. Proceedings Article In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 46–50, 2021, (ISSN: 2379-190X).
Li, Dongyang; Wang, Lei; Lerch, Alexander; Wu, Qidi: An Adaptive Particle Swarm Optimizer with Decoupled Exploration and Exploitation for Large Scale Optimization. Journal Article In: Swarm and Evolutionary Computation, vol. 60, 2021, ISSN: 2210-6502.
Vinay, Ashvala; Lerch, Alexander; Leslie, Grace: Mind the Beat: Detecting Audio Onsets from EEG Recordings of Music Listening. Proceedings Article In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), Toronto, Ontario, Canada, 2021.
Seshadri, Pavan; Lerch, Alexander: Improving Music Performance Assessment with Contrastive Learning. Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 8, Online, 2021.
Pati, Ashis; Lerch, Alexander: Is Disentanglement Enough? On Latent Representations for Controllable Music Generation. Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 8, Online, 2021.
Gururani, Siddharth; Lerch, Alexander: Semi-Supervised Audio Classification with Partially Labeled Data. Proceedings Article In: Proceedings of the IEEE International Symposium on Multimedia (ISM), Institute of Electrical and Electronics Engineers (IEEE), online, 2021.
Lerch, Alexander; Knees, Peter: Machine Learning Applied to Music/Audio Signal Processing. Journal Article In: Electronics, vol. 10, no. 24, pp. 3077, 2021.
Hung, Yun-Ning; Watcharasupat, Karn N.; Wu, Chih-Wei; Orife, Iroro; Li, Kelian; Seshadri, Pavan; Lee, Junyoung: AVASPEECH-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-occurrence. Proceedings Article In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 3, Online, 2021.
Huang, Jiawen; Hung, Yun-Ning; Pati, Ashis K.; Gururani, Siddharth; Lerch, Alexander: Score-informed Networks for Music Performance Assessment. Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Montreal, 2020.
2024
@inproceedings{kim_towards_2024,
title = {Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation},
author = {Yonghyun Kim and Alexander Lerch},
url = {http://arxiv.org/abs/2410.14122},
doi = {10.48550/arXiv.2410.14122},
year = {2024},
date = {2024-10-01},
urldate = {2024-10-25},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {arXiv},
address = {San Francisco},
abstract = {Recent advancements in Automatic Piano Transcription (APT) have significantly improved system performance, but the impact of noisy environments on transcription performance remains largely unexplored. This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models and evaluates the performance of the Onsets and Frames model when trained on noise-augmented data. We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{ma_music_2024,
title = {Music auto-tagging in the long tail: A few-shot approach},
author = {T. Aleksandra Ma and Alexander Lerch},
url = {http://arxiv.org/abs/2409.07730},
doi = {10.48550/arXiv.2409.07730},
year = {2024},
date = {2024-09-01},
urldate = {2024-09-13},
booktitle = {Proceedings of the AES Convention},
address = {New York},
abstract = {In the realm of digital music, using tags to efficiently organize and retrieve music from extensive databases is crucial for music catalog owners. Human tagging by experts is labor-intensive but mostly accurate, whereas automatic tagging through supervised learning has approached satisfying accuracy but is restricted to a predefined set of training tags. Few-shot learning offers a viable solution to expand beyond this small set of predefined tags by enabling models to learn from only a few human-provided examples to understand tag meanings and subsequently apply these tags autonomously. We propose to integrate few-shot learning methodology into multi-label music auto-tagging by using features from pre-trained models as inputs to a lightweight linear classifier, also known as a linear probe. We investigate different popular pre-trained features, as well as different few-shot parametrizations with varying numbers of classes and samples per class. Our experiments demonstrate that a simple model with pre-trained features can achieve performance close to state-of-the-art models while using significantly less training data, such as 20 samples per tag. Additionally, our linear probe performs competitively with leading models when trained on the entire training dataset. The results show that this transfer learning-based few-shot approach could effectively address the issue of automatically assigning long-tail tags with only limited labeled data.},
keywords = {Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, H.3.3},
pubstate = {published},
tppubtype = {inproceedings}
}
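The few-shot approach described in the abstract above amounts to fitting a linear probe on frozen pre-trained embeddings. The following sketch illustrates that idea only; it is not the paper's code, and the embedding dimensionality, tag count, and use of scikit-learn are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)

# Stand-ins for embeddings from a pre-trained audio model (dimensions assumed).
X_train = rng.normal(size=(100, 512))        # e.g., 20 examples per tag
Y_train = rng.integers(0, 2, size=(100, 5))  # multi-label binary tag matrix

# The "linear probe": one logistic regression per tag over frozen features.
probe = MultiOutputClassifier(LogisticRegression(max_iter=1000))
probe.fit(X_train, Y_train)

X_new = rng.normal(size=(10, 512))           # embeddings of untagged tracks
Y_pred = probe.predict(X_new)                # autonomous tag assignment

Because only the linear layer is trained, adding a new long-tail tag requires just a handful of labeled examples and seconds of compute.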
@article{han_understanding_2024,
title = {Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors},
author = {Chaeyeon Han and Pavan Seshadri and Yiwei Ding and Noah Posner and Bon Woo Koo and Animesh Agrawal and Alexander Lerch and Subhrajit Guhathakurta},
url = {https://doi.org/10.1007/s44212-024-00053-9},
doi = {10.1007/s44212-024-00053-9},
issn = {2731-6963},
year = {2024},
date = {2024-07-01},
urldate = {2024-07-10},
journal = {Urban Informatics},
volume = {3},
number = {1},
pages = {22},
abstract = {While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding pedestrian volumes and flows is essential for designing safer and more attractive pedestrian infrastructure and for controlling periodic overcrowding. This study discusses a new approach to scale up urban sensing of people with the help of novel audio-based technology. It assesses the benefits and limitations of microphone-based sensors as compared to other forms of pedestrian sensing. A large-scale dataset called ASPED is presented, which includes high-quality audio recordings along with video recordings used for labeling the pedestrian count data. The baseline analyses highlight the promise of using audio sensors for pedestrian tracking, although algorithmic and technological improvements are still needed to make the sensors practically usable. This study also demonstrates how the data can be leveraged to predict pedestrian trajectories. Finally, it discusses the use cases and scenarios where audio-based pedestrian sensing can support better urban and transportation planning.},
keywords = {Active mobility, Audio-based, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Pedestrian, Sensors},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{watcharasupat_stem-agnostic_2024,
title = {A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems},
author = {Karn N. Watcharasupat and Alexander Lerch},
url = {http://arxiv.org/abs/2406.18747},
doi = {10.48550/arXiv.2406.18747},
year = {2024},
date = {2024-06-01},
urldate = {2024-08-08},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {San Francisco},
abstract = {Despite significant recent progress across multiple subtasks of audio source separation, few music source separation systems support separation beyond the four-stem vocals, drums, bass, and other (VDBO) setup. Of the very few current systems that support source separation beyond this setup, most continue to rely on an inflexible decoder setup that can only support a fixed pre-defined set of stems. Increasing stem support in these inflexible systems correspondingly requires increasing computational complexity, rendering extensions of these systems computationally infeasible for long-tail instruments. In this work, we propose Banquet, a system that allows source separation of multiple stems using just one decoder. A bandsplit source separation model is extended to work in a query-based setup in tandem with a music instrument recognition PaSST model. On the MoisesDB dataset, Banquet, at only 24.9 M trainable parameters, approached the performance level of the significantly more complex 6-stem Hybrid Transformer Demucs on VDBO stems and outperformed it on guitar and piano. The query-based setup allows for the separation of narrow instrument classes such as clean acoustic guitars, and can be successfully applied to the extraction of less common stems such as reeds and organs. Implementation is available at https://github.com/kwatcharasupat/query-bandit.},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{Ooi2024LionCitySoundscapes,
title = {Lion City Soundscapes: Modified Partitioning around Medoids for a Perceptually Diverse Dataset of Singaporean Soundscapes},
author = {Kenneth Ooi and Jessie Goh and Hao-Weng Lin and Zhen-Ting Ong and Trevor Wong and Karn N. Watcharasupat and Bhan Lam and Woon-Seng Gan},
doi = {10.1121/10.0025830},
issn = {2691-1191},
year = {2024},
date = {2024-04-01},
journal = {JASA Express Letters},
volume = {4},
number = {4},
pages = {047402},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@article{ding_toward_2024,
title = {Toward audio-based sensing for pedestrian detection},
author = {Yiwei Ding and Chaeyeon Han and Pavan Seshadri and Bon Woo Koo and Noah Posner and Subhro Guhathakurta and Alexander Lerch},
url = {https://doi.org/10.1121/10.0027509},
doi = {10.1121/10.0027509},
issn = {0001-4966},
year = {2024},
date = {2024-03-01},
urldate = {2024-07-07},
journal = {The Journal of the Acoustical Society of America},
volume = {155},
number = {3_Supplement},
pages = {A282},
abstract = {The detection and counting of pedestrians plays a central role in the design of smart cities. Although cameras have been shown to achieve high accuracy for this task, they come at a high cost and are susceptible to challenges such as poor lighting, fog, and obstructed views. Our study investigates audio-based pedestrian detection, combining potentially low-cost sensors with advanced machine-learning-based audio analysis algorithms. With an audio sensor installed along the walkway, machine learning algorithms can tell from the audio whether there is a pedestrian or not, or how far the pedestrian is from the sensor. Results show the general feasibility of audio-based pedestrian detection but fall short of reaching the accuracy levels of video-based detection.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{ding_embedding_2024,
title = {Embedding Compression for Teacher-to-Student Knowledge Transfer},
author = {Yiwei Ding and Alexander Lerch},
url = {http://arxiv.org/abs/2402.06761},
doi = {10.48550/arXiv.2402.06761},
year = {2024},
date = {2024-02-01},
urldate = {2024-02-27},
booktitle = {Proceedings of the International Conference on Acoustics Speech and Signal Processing (ICASSP) - Satellite Workshop Deep Neural Network Model Compression},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Seoul, Korea},
abstract = {Common knowledge distillation methods require the teacher model and the student model to be trained on the same task. However, the usage of embeddings as teachers has also been proposed for different source tasks and target tasks. Prior work that uses embeddings as teachers ignores the fact that the teacher embeddings are likely to contain irrelevant knowledge for the target task. To address this problem, we propose to use an embedding compression module with a trainable teacher transformation to obtain a compact teacher embedding. Results show that adding the embedding compression module improves the classification performance, especially for unsupervised teacher embeddings. Moreover, student models trained with the guidance of embeddings show stronger generalizability.},
note = {arXiv:2402.06761 [cs]},
keywords = {Computer Science - Machine Learning},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{seshadri_asped_2024,
title = {ASPED: An Audio Dataset for Detecting Pedestrians},
author = {Pavan Seshadri and Chaeyeon Han and Bon-Woo Koo and Noah Posner and Subhrajit Guhathakurta and Alexander Lerch},
url = {http://arxiv.org/abs/2309.06531},
doi = {10.48550/arXiv.2309.06531},
year = {2024},
date = {2024-01-01},
urldate = {2023-12-14},
booktitle = {Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Seoul},
abstract = {We introduce the new audio analysis task of pedestrian detection and present a new large-scale dataset for this task. While the preliminary results prove the viability of using audio approaches for pedestrian detection, they also show that this challenging task cannot be easily solved with standard approaches.},
keywords = {Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{watcharasupat_quantifying_2024,
title = {Quantifying Spatial Audio Quality Impairment},
author = {Karn N Watcharasupat and Alexander Lerch},
url = {http://arxiv.org/abs/2309.06531},
doi = {10.48550/arXiv.2309.06531},
year = {2024},
date = {2024-01-01},
urldate = {2023-12-14},
booktitle = {Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Seoul},
keywords = {Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{Ooi2024ARAUSLargeScaleDataset,
title = {ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes},
author = {Kenneth Ooi and Zhen-Ting Ong and Karn N. Watcharasupat and Bhan Lam and Joo Young Hong and Woon-Seng Gan},
doi = {10.1109/TAFFC.2023.3247914},
issn = {1949-3045},
year = {2024},
date = {2024-01-01},
journal = {IEEE Transactions on Affective Computing},
volume = {15},
number = {1},
pages = {105--120},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@article{Watcharasupat2024ValidatingThaiTranslations,
title = {Validating Thai Translations of Perceptual Soundscape Attributes: A Non-Procrustean Approach with a Procrustes Projection},
author = {Karn N Watcharasupat and Kenneth Ooi and Bhan Lam and Zhen-Ting Ong and Sureenate Jaratjarungkiat and Woon-Seng Gan},
doi = {10.1016/j.apacoust.2024.109999},
year = {2024},
date = {2024-01-01},
journal = {Applied Acoustics},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{liu_enhancing_2024,
title = {Enhancing Video Music Recommendation with Transformer-Driven Audio-Visual Embeddings},
author = {Shimiao Liu and Alexander Lerch},
url = {https://ieeexplore.ieee.org/abstract/document/10704086},
doi = {10.1109/IS262782.2024.10704086},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of the IEEE International Symposium on the Internet of Sounds (IS2)},
address = {Erlangen},
abstract = {A fitting soundtrack can help a video better convey its content and provide a better immersive experience. This paper introduces a novel approach utilizing self-supervised learning and contrastive learning to automatically recommend audio for video content, thereby eliminating the need for manual labeling. We use a dual-branch cross-modal embedding model that maps both audio and video features into a common low-dimensional space. The fit of various audio-video pairs can then be modeled as an inverse distance measure. In addition, a comparative analysis of various temporal encoding methods is presented, emphasizing the effectiveness of transformers in managing the temporal information of audio-video matching tasks. Through multiple experiments, we demonstrate that our model TIVM, which integrates transformer encoders and uses an InfoNCE loss, significantly improves the performance of audio-video matching and surpasses traditional methods.},
keywords = {Contrastive learning, Encoding, Fitting, Immersive experience, Internet, Labeling, Manuals, multi-modal, music, music recommendation, Recommender systems, transformer, Transformers},
pubstate = {published},
tppubtype = {inproceedings}
}
2023
@article{Lam2023CrossingLinguisticCauseway,
title = {Crossing the Linguistic Causeway: Ethnonational Differences on Soundscape Attributes in Bahasa Melayu},
author = {Bhan Lam and Julia Chieng and Kenneth Ooi and Zhen Ting Ong and Karn N. Watcharasupat and Joo Young Hong and Woon Seng Gan},
doi = {10.1016/j.apacoust.2023.109675},
issn = {1872-910X},
year = {2023},
date = {2023-11-01},
journal = {Applied Acoustics},
volume = {214},
publisher = {Elsevier Ltd},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{ding_audio_2023,
title = {Audio Embeddings as Teachers for Music Classification},
author = {Yiwei Ding and Alexander Lerch},
url = {http://arxiv.org/abs/2306.17424},
doi = {10.48550/arXiv.2306.17424},
year = {2023},
date = {2023-06-01},
urldate = {2023-06-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Milan, Italy},
abstract = {Music classification has been one of the most popular tasks in the field of music information retrieval. With the development of deep learning models, the last decade has seen impressive improvements in a wide range of classification tasks. However, the increasing model complexity makes both training and inference computationally expensive. In this paper, we integrate the ideas of transfer learning and feature-based knowledge distillation and systematically investigate using pre-trained audio embeddings as teachers to guide the training of low-complexity student networks. By regularizing the feature space of the student networks with the pre-trained embeddings, the knowledge in the teacher embeddings can be transferred to the students. We use various pre-trained audio embeddings and test the effectiveness of the method on the tasks of musical instrument classification and music auto-tagging. Results show that our method significantly improves the results in comparison to the identical model trained without the teacher's knowledge. This technique can also be combined with classical knowledge distillation approaches to further improve the model's performance.},
keywords = {Computer Science - Information Retrieval, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
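As a rough illustration of the feature-space regularization described in the abstract above, the sketch below adds a distance term between a student network's internal features and a frozen teacher embedding to the classification loss. The module names, matching dimensions, and choice of MSE are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn.functional as F

def teacher_regularized_loss(student_feat, logits, labels, teacher_emb, lam=1.0):
    # Standard classification loss on the student's predictions.
    task_loss = F.cross_entropy(logits, labels)
    # Regularization pulling the student's feature space toward the
    # pre-trained teacher embedding (assumed projected to the same
    # dimensionality beforehand).
    reg_loss = F.mse_loss(student_feat, teacher_emb)
    return task_loss + lam * reg_loss

Note that the teacher embedding is needed only during training; at inference time, only the low-complexity student runs.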
@inproceedings{knees_milc_2023,
title = {MILC 2023: 3rd Workshop on Intelligent Music Interfaces for Listening and Creation},
author = {Peter Knees and Alexander Lerch},
url = {https://dl.acm.org/doi/10.1145/3581754.3584164},
doi = {10.1145/3581754.3584164},
isbn = {9798400701078},
year = {2023},
date = {2023-03-01},
urldate = {2023-03-31},
booktitle = {Companion Proceedings of the 28th International Conference on Intelligent User Interfaces},
pages = {185--186},
publisher = {Association for Computing Machinery},
address = {Sydney},
series = {IUI '23 Companion},
abstract = {The third edition of the Workshop on Intelligent Music Interfaces for Listening and Creation (MILC), held in collaboration with the 28th International Conference on Intelligent User Interfaces (IUI) features a half-day program addressing recent and future developments in human-centered music technology. The presented papers cover recommendation in sound libraries, the use of generative systems for composition in Digital Audio Workstations (DAWs), tools for richer means of interaction with music streaming platforms, and music personalization for Cochlear implant users.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{chen_music_2023,
title = {Music Instrument Classification Reprogrammed},
author = {Hsin-Hung Chen and Alexander Lerch},
url = {https://arxiv.org/abs/2211.08379},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the International Conference on Multimedia Modeling (MMM)},
address = {Bergen, Norway},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@book{lerch_introduction_2023,
title = {An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications},
author = {Alexander Lerch},
url = {https://ieeexplore.ieee.org/servlet/opac?bknumber=9965970},
isbn = {978-1-119-89094-2},
year = {2023},
date = {2023-01-01},
urldate = {2022-01-01},
publisher = {Wiley-IEEE Press},
address = {Hoboken, N.J},
edition = {2},
abstract = {An Introduction to Audio Content Analysis enables readers to understand the algorithmic analysis of musical audio signals with AI-driven approaches. It serves as a comprehensive guide on audio content analysis, explaining how signal processing and machine learning approaches can be utilized for the extraction of musical content from audio. It gives readers the algorithmic understanding to teach a computer to interpret music signals and thus allows for the design of tools for interacting with music. The work ties together topics from audio signal processing and machine learning, showing how to use audio content analysis to pick up musical characteristics automatically. A multitude of audio content analysis tasks related to the extraction of tonal, temporal, timbral, and intensity-related characteristics of the music signal are presented. Each task is introduced from both a musical and a technical perspective, detailing the algorithmic approach as well as providing practical guidance on implementation details and evaluation. To aid in reader comprehension, each task description begins with a short introduction to the most important musical and perceptual characteristics of the covered topic, followed by a detailed algorithmic model and its evaluation, and concluded with questions and exercises. For the interested reader, updated supplemental materials are provided via an accompanying website. Written by a well-known expert in the music industry, the book covers sample topics including: digital audio signals and their representation, common time-frequency transforms, and audio features; pitch and fundamental frequency detection, key, and chord; representation of dynamics in music and intensity-related features; beat histograms, onset and tempo detection, detection of structure in music, and sequence alignment; audio fingerprinting and musical genre, mood, and instrument classification. An invaluable guide for newcomers to audio signal processing and industry experts alike, An Introduction to Audio Content Analysis covers a wide range of introductory topics pertaining to music information retrieval and machine listening, allowing students and researchers to quickly gain core holistic knowledge in audio analysis and dig deeper into specific aspects of the field with the help of a large number of references.},
keywords = {analysis, audio, Audio content analysis, audio signal processing, Automatic Music Transcription, Computer sound processing, machine listening, Matlab, MIR, music analysis, music informatics, music information retrieval, Python},
pubstate = {published},
tppubtype = {book}
}
@incollection{lerch_audioinhaltsanalyse_2023,
title = {Audioinhaltsanalyse},
author = {Alexander Lerch},
editor = {Stefan Weinzierl},
url = {https://doi.org/10.1007/978-3-662-60357-4_8-1},
doi = {10.1007/978-3-662-60357-4_8-1},
isbn = {978-3-662-60357-4},
year = {2023},
date = {2023-01-01},
urldate = {2023-03-30},
booktitle = {Handbuch der Audiotechnik},
pages = {1--20},
publisher = {Springer Berlin Heidelberg},
address = {Berlin, Heidelberg},
abstract = {Audio signals contain a wealth of information that humans easily access when listening. A speech signal, for example, conveys not only information about the text but also about the speaker (e.g., gender, age, accent) and the recording environment (e.g., indoors vs. outdoors). In a music signal, we can identify the musical instruments, the musical structure, the style, the melody, harmonies and tonality, an emotional expression and other characteristics of the performance, as well as the skill of the performers. Audio Content Analysis (ACA) aims to develop and apply algorithms for the automatic extraction of this content from the (digital) audio signal; these algorithms allow us to sort, categorize, segment, and visualize the audio signal based on its content (Lerch 2012). Possible applications include content-based automatic playlist generation and music recommendation systems, computer-assisted music production and editing, and intelligent music tutoring systems that point out errors and opportunities for improvement to music students practicing an instrument.},
keywords = {Audio content analysis, Grundfrequenzerkennung, music information retrieval, Musikklassifizierung, Musiktranskription, Tonarterkennung},
pubstate = {published},
tppubtype = {incollection}
}
@inproceedings{smith_impact_2023,
title = {The Impact of Salient Musical Features in a Hybrid Recommendation System for a Sound Library},
author = {Jason Brent Smith and Ashvala Vinay and Jason Freeman},
url = {https://ceur-ws.org/Vol-3359/paper18.pdf},
year = {2023},
date = {2023-01-01},
booktitle = {Joint Proceedings of the ACM IUI Workshops (MILC)},
address = {Sydney},
abstract = {EarSketch is an online learning environment that teaches coding and music concepts through the computational manipulation of sounds selected from a large sound library. It features sound recommendations based on acoustic similarity and co-usage with a user’s current sound selection in order to encourage exploration of the library. However, students have reported that the recommended sounds do not complement their current projects in terms of two areas: musical key and rhythm. We aim to improve the relevance of these recommendations through the inclusion of these two musically related features. This paper describes the addition of key signature and beat extraction to the EarSketch sound recommendation model in order to improve the musical compatibility of the recommendations with the sounds in a user’s project. Additionally, we present an analysis of the effects of these new recommendation strategies on user exploration and usage of the recommended sounds. The results of this analysis suggest that the addition of explicitly musically-relevant attributes increases the coverage of the sound library among sound recommendations as well as the sounds selected by users. It reflects the importance of including multiple musical attributes when building recommendation systems for creative and open-ended musical systems.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{hung_low-resource_2023,
title = {Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming},
author = {Yun-Ning Hung and Chao-Han Huck Yang and Pin-Yu Chen and Alexander Lerch},
url = {https://arxiv.org/abs/2211.01317},
doi = {10.1109/ICASSP49357.2023.10096568},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Rhodes Island, Greece},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@incollection{lerch_grundlagen_2023,
title = {Grundlagen digitaler Audiosignale},
author = {Alexander Lerch},
editor = {Stefan Weinzierl},
url = {https://doi.org/10.1007/978-3-662-60357-4_31-1},
isbn = {978-3-662-60357-4},
year = {2023},
date = {2023-01-01},
booktitle = {Handbuch der Audiotechnik},
pages = {1--13},
publisher = {Springer Berlin Heidelberg},
address = {Berlin, Heidelberg},
edition = {2},
abstract = {The processing, storage, and transmission of signals is nowadays almost exclusively digital. A theoretical understanding of the digitization process and of the properties of digital audio signals therefore lays an important foundation for understanding many signal processing systems. This chapter introduces the fundamentals of signal sampling and quantization, as well as approaches to increasing signal quality such as dither and noise shaping, and closes with an overview of typical number formats for digital audio signals.},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
@inproceedings{vinay_aquatk_2023,
title = {AQUATK: An Audio Quality Assessment Toolkit},
author = {Ashvala Vinay and Alexander Lerch},
url = {https://arxiv.org/abs/2311.10113},
doi = {10.48550/arXiv.2311.10113},
year = {2023},
date = {2023-01-01},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Milan},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{watcharasupat_generalized_2023,
title = {A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation},
author = {Karn N Watcharasupat and Chih-Wei Wu and Yiwei Ding and Iroro Orife and Aaron J Hipple and Phillip A Williams and Scott Kramer and Alexander Lerch and William Wolcott},
url = {https://ieeexplore.ieee.org/document/10342812/authors#authors},
doi = {10.1109/OJSP.2023.3339428},
issn = {2644-1322},
year = {2023},
date = {2023-01-01},
urldate = {2023-12-06},
journal = {IEEE Open Journal of Signal Processing},
pages = {1--9},
abstract = {Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue, music, and effects stems from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psychoacoustically motivated frequency scales were used to inform the band definitions which are now defined with redundancy for more reliable feature extraction. A loss function motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm was proposed. We additionally exploit the information-sharing property of a common-encoder setup to reduce computational complexity during both training and inference, improve separation performance for hard-to-generalize classes of sounds, and allow flexibility during inference time with detachable decoders. Our best model sets the state of the art on the Divide and Remaster dataset with performance above the ideal ratio mask for the dialogue stem.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
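The abstract above mentions a loss motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm. The sketch below shows one plausible combination of those two ingredients; it is a hedged illustration, and the paper's exact formulation may differ.

import torch

def snr_l1_loss(estimate, target, lam=0.1, eps=1e-8):
    # SNR term: reward a high energy ratio of target to residual.
    residual = target - estimate
    snr_db = 10.0 * torch.log10(
        target.pow(2).sum(dim=-1) / (residual.pow(2).sum(dim=-1) + eps) + eps)
    # 1-norm term promoting sparse estimates (quiet where the stem is silent).
    sparsity = estimate.abs().mean()
    return -snr_db.mean() + lam * sparsity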
@inproceedings{Lam2023PreliminaryInvestigationShortterm,
title = {Preliminary Investigation of the Short-Term in Situ Performance of an Automatic Masker Selection System},
author = {Bhan Lam and Kenneth Ooi and Zhen-Ting Ong and Trevor Wong and Woon-Seng Gan and Karn Watcharasupat},
doi = {10.3397/in_2023_0805},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{Ong2023EffectMaskerSelection,
title = {Effect of Masker Selection Schemes on the Perceived Affective Quality of Soundscapes: A Pilot Study},
author = {Zhen-Ting Ong and Kenneth Ooi and Trevor Wong and Bhan Lam and Woon-Seng Gan and Karn N. Watcharasupat},
doi = {10.3397/in_2023_0791},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{Ooi2023ARAUSv2ExpandedDataset,
title = {ARAUSv2: An Expanded Dataset and Multimodal Models of Affective Responses to Augmented Urban Soundscapes},
author = {Kenneth Ooi and Zhen-Ting Ong and Bhan Lam and Trevor Wong and Woon-Seng Gan and Karn Watcharasupat},
doi = {10.3397/in_2023_0459},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{Ooi2023AutonomousSoundscapeAugmentation,
title = {Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs},
author = {Kenneth Ooi and Karn N Watcharasupat and Bhan Lam and Zhen-Ting Ong and Woon-Seng Gan},
doi = {10.1109/ICASSP49357.2023.10094866},
year = {2023},
date = {2023-01-01},
booktitle = {Proceedings of the 2023 International Conference on Acoustics, Speech, and Signal Processing},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2022
@article{hung_large_2022,
title = {A large TV dataset for speech and music activity detection},
author = {Yun-Ning Hung and Chih-Wei Wu and Iroro Orife and Aaron Hipple and William Wolcott and Alexander Lerch},
url = {https://doi.org/10.1186/s13636-022-00253-8},
doi = {10.1186/s13636-022-00253-8},
issn = {1687-4722},
year = {2022},
date = {2022-09-01},
urldate = {2022-09-03},
journal = {EURASIP Journal on Audio, Speech, and Music Processing},
volume = {2022},
number = {1},
pages = {21},
abstract = {Automatic speech and music activity detection (SMAD) is an enabling task that can help segment, index, and pre-process audio content in radio broadcast and TV programs. However, due to copyright concerns and the cost of manual annotation, the limited availability of diverse and sizeable datasets hinders the progress of state-of-the-art (SOTA) data-driven approaches. We address this challenge by presenting a large-scale dataset containing Mel spectrogram, VGGish, and MFCCs features extracted from around 1600 h of professionally produced audio tracks and their corresponding noisy labels indicating the approximate location of speech and music segments. The labels are derived from several sources, such as subtitles and cue sheets. A test set curated by human annotators is also included as a subset for evaluation. To validate the generalizability of the proposed dataset, we conduct several experiments comparing various model architectures and their variants under different conditions. The results suggest that our proposed dataset is able to serve as a reliable training resource and leads to SOTA performances on various public datasets. To the best of our knowledge, this dataset is the first large-scale, open-sourced dataset that contains features extracted from professionally produced audio tracks and their corresponding frame-level speech and music annotations.},
keywords = {Dataset, Production TV audio, Speech and music activation detection},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{ma_representation_2022,
title = {Representation Learning for the Automatic Indexing of Sound Effects Libraries},
author = {Alison B Ma and Alexander Lerch},
url = {http://arxiv.org/abs/2208.09096},
doi = {10.48550/arXiv.2208.09096},
year = {2022},
date = {2022-08-01},
urldate = {2022-08-22},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Bangalore, IN},
abstract = {Labeling and maintaining a commercial sound effects library is a time-consuming task exacerbated by databases that continually grow in size and undergo taxonomy updates. Moreover, sound search and taxonomy creation are complicated by non-uniform metadata, an unrelenting problem even with the introduction of a new industry standard, the Universal Category System. To address these problems and overcome dataset-dependent limitations that inhibit the successful training of deep learning models, we pursue representation learning to train generalized embeddings that can be used for a wide variety of sound effects libraries and are a taxonomy-agnostic representation of sound. We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size, outperforming established representations such as OpenL3. Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.},
note = {arXiv:2208.09096 [cs, eess]},
keywords = {Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{vinay_evaluating_2022,
title = {Evaluating Generative Audio Systems and their Metrics},
author = {Ashvala Vinay and Alexander Lerch},
url = {http://arxiv.org/abs/2209.00130},
doi = {10.48550/arXiv.2209.00130},
year = {2022},
date = {2022-08-01},
urldate = {2022-09-03},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Bangalore, IN},
abstract = {Recent years have seen considerable advances in audio synthesis with deep generative models. However, the state-of-the-art is very difficult to quantify; different studies often use different evaluation methodologies and different metrics when reporting results, making a direct comparison to other systems difficult if not impossible. Furthermore, the perceptual relevance and meaning of the reported metrics are in most cases unknown, prohibiting any conclusive insights with respect to practical usability and audio quality. This paper presents a study that investigates state-of-the-art approaches side-by-side with (i) a set of previously proposed objective metrics for audio reconstruction, and with (ii) a listening study. The results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.},
note = {arXiv:2209.00130 [cs, eess]},
keywords = {Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{lerch_libaca_2022-1,
title = {libACA, pyACA, and ACA-Code: Audio Content Analysis in 3 Languages},
author = {Alexander Lerch},
url = {https://www.sciencedirect.com/science/article/pii/S2665963822000677},
doi = {10.1016/j.simpa.2022.100349},
issn = {2665-9638},
year = {2022},
date = {2022-07-01},
urldate = {2022-07-04},
journal = {Software Impacts},
pages = {100349},
abstract = {The three packages libACA, pyACA, and ACA-Code provide reference implementations for basic approaches and algorithms for the analysis of musical audio signals in three different languages: C++, Python, and Matlab. All three packages cover the same algorithms, such as extraction of low-level audio features, fundamental frequency estimation, as well as simple approaches to chord recognition, musical key detection, and onset detection. In addition, implementations of more generic algorithms useful in audio content analysis, such as dynamic time warping and the Viterbi algorithm, are provided. The three packages thus provide a practical cross-language and cross-platform reference for students and engineers implementing audio analysis algorithms and enable implementation-focused learning of algorithms for audio content analysis and music information retrieval.},
keywords = {Audio content analysis, C++, Matlab, music information retrieval, Python},
pubstate = {published},
tppubtype = {article}
}
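As a short usage sketch for the packages above, pyACA exposes a computeFeature entry point for low-level feature extraction; the call below assumes that interface (check the pyACA documentation in case it has changed):

# pyACA usage sketch: extract a spectral centroid trajectory from a test tone.
import numpy as np
import pyACA

fs = 44100
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s sine at 440 Hz

v, t = pyACA.computeFeature("SpectralCentroid", x, fs)
print(v.shape, t.shape)  # feature values and their time stamps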
@misc{hung_feature-informed_2022-1,
title = {Feature-informed Latent Space Regularization for Music Source Separation},
author = {Yun-Ning Hung and Alexander Lerch},
url = {http://arxiv.org/abs/2203.09132},
doi = {10.48550/arXiv.2203.09132},
year = {2022},
date = {2022-06-01},
urldate = {2022-09-03},
publisher = {arXiv},
abstract = {The integration of additional side information to improve music source separation has been investigated numerous times, e.g., by adding features to the input or by adding learning targets in a multi-task learning scenario. These approaches, however, require additional annotations such as musical scores, instrument labels, etc. in training and possibly during inference. The available datasets for source separation do not usually provide these additional annotations. In this work, we explore transfer learning strategies to incorporate VGGish features with a state-of-the-art source separation model; VGGish features are known to be a very condensed representation of audio content and have been successfully used in many MIR tasks. We introduce three approaches to incorporate the features, including two latent space regularization methods and one naive concatenation method. Experimental results show that our proposed approaches improve several evaluation metrics for music source separation.},
note = {arXiv:2203.09132 [eess]},
keywords = {Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {misc}
}
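The regularization idea described above can be sketched as an auxiliary loss that pulls a projection of the separator's latent space toward precomputed VGGish features. A minimal PyTorch sketch with hypothetical names and shapes (not the paper's implementation):

# Feature-informed latent regularization sketch (PyTorch): the separator
# latent is projected and pulled toward precomputed VGGish embeddings via an
# auxiliary L2 term that is only needed during training.
import torch
import torch.nn as nn

latent_dim, vggish_dim = 256, 128            # hypothetical dimensions
project = nn.Linear(latent_dim, vggish_dim)

def total_loss(separation_loss, latent, vggish_feat, weight=0.1):
    reg = nn.functional.mse_loss(project(latent), vggish_feat)
    return separation_loss + weight * reg

latent = torch.randn(8, latent_dim)          # from the separator encoder
vggish_feat = torch.randn(8, vggish_dim)     # precomputed, training only
loss = total_loss(torch.tensor(0.5), latent, vggish_feat)
loss.backward()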
@inproceedings{wang_catch_2022,
title = {To Catch A Chorus, Verse, Intro, or Anything Else: Analyzing a Song with Structural Functions},
author = {Ju-Chiang Wang and Yun-Ning Hung and Jordan B. L. Smith},
url = {https://ieeexplore.ieee.org/abstract/document/9747252/authors#authors},
doi = {10.1109/ICASSP43922.2022.9747252},
year = {2022},
date = {2022-05-01},
urldate = {2024-02-08},
booktitle = {ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages = {416\textendash420},
abstract = {Conventional music structure analysis algorithms aim to divide a song into segments and to group them with abstract labels (e.g., ‘A’, ‘B’, and ‘C’). However, explicitly identifying the function of each segment (e.g., ‘verse’ or ‘chorus’) is rarely attempted, but has many applications. We introduce a multi-task deep learning framework to model these structural semantic labels directly from audio by estimating "verseness," "chorusness," and so forth, as a function of time. We propose a 7-class taxonomy (i.e., intro, verse, chorus, bridge, outro, instrumental, and silence) and provide rules to consolidate annotations from four disparate datasets. We also propose to use a spectral-temporal Transformer-based model, called SpecTNT, which can be trained with an additional connectionist temporal localization (CTL) loss. In cross-dataset evaluations using four public datasets, we demonstrate the effectiveness of the SpecTNT model and CTL loss, and obtain strong results overall: the proposed system outperforms state-of-the-art chorus-detection and boundary-detection methods at detecting choruses and boundaries, respectively.},
note = {ISSN: 2379-190X},
keywords = {Location awareness, music, Music structure, segmentation, semantic labeling, Semantics, Signal processing, Signal processing algorithms, SpecTNT, Taxonomy, Transformer, Transformers},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{watcharasupat_latte_2022,
title = {Latte: Cross-framework Python Package for Evaluation of Latent-based Generative Models},
author = {Karn N Watcharasupat and Junyoung Lee and Alexander Lerch},
url = {https://linkinghub.elsevier.com/retrieve/pii/S2665963822000033},
doi = {10.1016/j.simpa.2022.100222},
issn = {26659638},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-13},
journal = {Software Impacts},
pages = {100222},
abstract = {Latte (for LATent Tensor Evaluation) is a Python library for evaluation of latent-based generative models in the fields of disentanglement learning and controllable generation. Latte is compatible with both PyTorch and TensorFlow/Keras, and provides both functional and modular APIs that can be easily extended to support other deep learning frameworks. Using NumPy-based and framework-agnostic implementation, Latte ensures reproducible, consistent, and deterministic metric calculations regardless of the deep learning framework of choice.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{kalbag_scream_2022,
title = {Scream Detection in Heavy Metal Music},
author = {Vedant Kalbag and Alexander Lerch},
url = {http://arxiv.org/abs/2205.05580},
doi = {10.48550/arXiv.2205.05580},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the Sound and Music Computing Conference (SMC)},
address = {Saint-Etienne},
abstract = {Harsh vocal effects such as screams or growls are far more common in heavy metal vocals than the traditionally sung vocal. This paper explores the problem of detection and classification of extreme vocal techniques in heavy metal music, specifically the identification of different scream techniques. We investigate the suitability of various feature representations, including cepstral, spectral, and temporal features as input representations for classification. The main contributions of this work are (i) a manually annotated dataset comprised of over 280 minutes of heavy metal songs of various genres with a statistical analysis of occurrences of different extreme vocal techniques in heavy metal music, and (ii) a systematic study of different input feature representations for the classification of heavy metal vocals.},
keywords = {Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
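For illustration, frame-level representations of the kinds compared above (cepstral, spectral, temporal) can be extracted with librosa roughly as follows (feature choice and parameters are illustrative, not the paper's exact configuration):

# Sketch of frame-level cepstral/spectral/temporal feature extraction.
import numpy as np
import librosa

sr = 22050
y = np.random.randn(3 * sr).astype(np.float32)  # stand-in for a vocal track

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # cepstral
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # spectral
zcr = librosa.feature.zero_crossing_rate(y)               # temporal

# stack per-frame features for a downstream vocal-technique classifier
n = min(mfcc.shape[1], centroid.shape[1], zcr.shape[1])
features = np.vstack([mfcc[:, :n], centroid[:, :n], zcr[:, :n]]).T
print(features.shape)  # (n_frames, 15)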
@inproceedings{hung_feature-informed_2022,
title = {Feature-informed Embedding Space Regularization for Audio Classification},
author = {Yun-Ning Hung and Alexander Lerch},
url = {http://arxiv.org/abs/2206.04850},
doi = {10.48550/arXiv.2206.04850},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the European Signal Processing Conference (EUSIPCO)},
address = {Belgrade, Serbia},
abstract = {Feature representations derived from models pre-trained on large-scale datasets have shown their generalizability on a variety of audio analysis tasks. Despite this generalizability, however, task-specific features can outperform them if sufficient training data is available, as specific task-relevant properties can be learned. Furthermore, the complex pre-trained models bring considerable computational burdens during inference. We propose to leverage both detailed task-specific features from spectrogram input and generic pre-trained features by introducing two regularization methods that integrate the information of both feature classes. The workload is kept low during inference as the pre-trained features are only necessary for training. In experiments with the pre-trained features VGGish, OpenL3, and a combination of both, we show that the proposed methods not only outperform baseline methods, but also can improve state-of-the-art models on several audio classification tasks. The results also suggest that using the mixture of features performs better than using individual features.},
keywords = {Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{guo_deep_2022,
title = {Deep Reinforcement Learning for Urban Multi-taxis Cruising Strategy},
author = {Weian Guo and Zhenyao Hua and Zecheng Kang and Dongyang Li and Lei Wang and Qidi Wu and Alexander Lerch},
url = {https://doi.org/10.1007/s00521-022-07255-9},
doi = {10.1007/s00521-022-07255-9},
issn = {1433-3058},
year = {2022},
date = {2022-01-01},
urldate = {2022-04-27},
journal = {Neural Computing and Applications},
abstract = {Taxis play an important role in urban transportation systems. Efficient taxi cruising strategies help to alleviate urban traffic congestion, reduce pollutant emissions, attenuate greenhouse gases, and provide faster service for passengers. In real scenarios, however, taxi cruising strategies are mostly based on drivers' own experience. In unfamiliar urban areas or during off-peak hours, drivers usually lack a good sense of the optimal cruising strategy, which leads to low service efficiency and increased taxi operation costs. Considering that it is difficult to construct an analytical model for taxi scheduling and cruising, in this paper we put forward a data-driven model for multi-taxi cruising based on reinforcement learning. Furthermore, an evolutionary reinforcement learning method is proposed, which aims at improving the exploration of reinforcement learning and enhancing reinforcement learning to maximize the global reward in multi-agent tasks. In the experimental part, two other kinds of deep Q-learning methods and a roaming strategy are employed in the comparisons. The results demonstrate the superiority of our proposed algorithm.},
keywords = {Data-driven model, deep Q-learning network, Multi-taxis cruising, Urban transportation},
pubstate = {published},
tppubtype = {article}
}
@incollection{herre_quellcodierung_2022,
title = {Quellcodierung},
author = {J\"{u}rgen Herre and Sascha Disch and Alexander Lerch},
editor = {Stefan Weinzierl},
url = {https://doi.org/10.1007/978-3-662-60357-4_34-1},
doi = {10.1007/978-3-662-60357-4_34-1},
isbn = {978-3-662-60357-4},
year = {2022},
date = {2022-01-01},
urldate = {2022-10-02},
booktitle = {Handbuch der Audiotechnik},
pages = {1--23},
publisher = {Springer},
address = {Berlin, Heidelberg},
edition = {2},
abstract = {Coding methods for bit-rate reduction (data compression) serve to reduce the amount of data required for the transmission or storage of digital signals, either without any loss or with as little loss of quality as possible. They are employed either for economic reasons, such as cost savings from lower required transmission bandwidths, or for technical reasons, such as limited storage space or restricted transmission capacities. Coding methods are used in data networks such as the internet for the multimedia distribution of music and films, in streaming services and broadcasting, in movie theaters and telecommunications, but also on physical media such as the DVD (Digital Versatile Disc), for archiving large amounts of data on hard drives, and on memory cards in portable media players. This chapter conveys the technical foundations of efficient, perceptually motivated audio coding. In addition, several common standardized methods for measuring subjective audio quality are explained. Furthermore, an overview of common lossless and lossy audio coding methods and their qualitative classification is given.},
keywords = {Audio coding, audio compression, codec, MPEG, psychoacoustics, lossless},
pubstate = {published},
tppubtype = {incollection}
}
@article{li_large-scale_2022,
title = {A Large-Scale Multiobjective Particle Swarm Optimizer With Enhanced Balance of Convergence and Diversity},
author = {Dongyang Li and Lei Wang and Li Li and Weian Guo and Qidi Wu and Alexander Lerch},
doi = {10.1109/TCYB.2022.3225341},
issn = {2168-2275},
year = {2022},
date = {2022-01-01},
journal = {IEEE Transactions on Cybernetics},
pages = {1--12},
abstract = {Large-scale multiobjective optimization problems (LSMOPs) continue to be challenging for existing multiobjective evolutionary algorithms (MOEAs). The main difficulties are that 1) the diversity preservation in both the objective space and the decision space needs to be taken into account when solving LSMOPs and 2) the existing learning structures in current MOEAs usually make the learning operators only coincidentally serve convergence and diversity. Balancing these two factors in current MOEAs is therefore difficult. To address these issues, this article proposes a multiobjective particle swarm optimizer with enhanced balance of convergence and diversity (MPSO-EBCD). In MPSO-EBCD, a novel velocity update structure for multiobjective particle swarm optimization is put forward, dividing the convergence and diversity preservation operations into independent components. Following the proposed update structure, a weighted convergence factor is introduced to serve the convergence strategy, whilst a diversity preservation strategy is built to uniformly distribute the particles in the searched space based on a proposed multidimensional local sparseness degree indicator. By this means, MPSO-EBCD is able to balance convergence and diversity with specific parameters in independent operators. Experimental results on LSMOP benchmarks and a voltage transformer optimization problem demonstrate the competitiveness of the proposed algorithm compared to several state-of-the-art MOEAs.},
keywords = {Complexity theory, Convergence, Cybernetics, diversity, Estimation, large-scale multiobjective optimization, multidimensional local sparseness, Optimization, Particle swarm optimization, particle swarm optimization (PSO), Weight measurement, weighted convergence factor (WCF)},
pubstate = {published},
tppubtype = {article}
}
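A generic particle-velocity update that separates a convergence pull from a diversity push, in the spirit of the decoupling described above (conceptual sketch only; this is not the MPSO-EBCD update rule):

# Conceptual PSO sketch with independent convergence and diversity components.
import numpy as np

rng = np.random.default_rng(0)
n_particles, dim = 20, 10
x = rng.uniform(-1, 1, (n_particles, dim))
v = np.zeros((n_particles, dim))
gbest = x[0].copy()                       # convergence guide (hypothetical)

def step(x, v, gbest, w=0.7, c_conv=1.5, c_div=0.5):
    r1 = rng.random(x.shape)
    # convergence component: pull toward the guide solution
    conv = c_conv * r1 * (gbest - x)
    # diversity component: push away from the current population mean
    div = c_div * rng.standard_normal(x.shape) * (x - x.mean(axis=0))
    v_new = w * v + conv + div
    return x + v_new, v_new

x, v = step(x, v, gbest)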
@article{Watcharasupat2022AutonomousInSituSoundscape,
title = {Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain},
author = {Karn N. Watcharasupat and Kenneth Ooi and Bhan Lam and Trevor Wong and Zhen Ting Ong and Woon Seng Gan},
doi = {10.1109/lsp.2022.3194419},
issn = {15582361},
year = {2022},
date = {2022-01-01},
journal = {IEEE Signal Processing Letters},
volume = {29},
pages = {1749\textendash1753},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2021
@inproceedings{watcharasupat_evaluation_2021,
title = {Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes},
author = {Karn N Watcharasupat and Alexander Lerch},
url = {http://arxiv.org/abs/2110.05587},
year = {2021},
date = {2021-10-01},
urldate = {2021-11-11},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Online},
abstract = {Controllable music generation with deep generative models has become increasingly reliant on disentanglement learning techniques. However, current disentanglement metrics, such as mutual information gap (MIG), are often inadequate and misleading when used for evaluating latent representations in the presence of interdependent semantic attributes often encountered in real-world music datasets. In this work, we propose a dependency-aware information metric as a drop-in replacement for MIG that accounts for the inherent relationship between semantic attributes.},
keywords = {Computer Science - Information Retrieval, Computer Science - Information Theory, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
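For reference, the standard MIG criticized above takes, for each attribute, the gap between the two latent dimensions with the highest mutual information, normalized by the attribute entropy. A minimal NumPy/scikit-learn sketch with histogram-discretized latents and discrete attributes:

# Standard mutual information gap (MIG) sketch. With correlated attributes
# this gap becomes misleading, which is the failure mode the paper's
# dependency-aware replacement targets.
import numpy as np
from sklearn.metrics import mutual_info_score
from scipy.stats import entropy

def mig(z, a, bins=20):
    """z: (n_samples, n_latents) continuous; a: (n_samples, n_attrs) discrete."""
    scores = []
    for k in range(a.shape[1]):
        counts = np.bincount(a[:, k])
        h = entropy(counts[counts > 0])                 # attribute entropy
        mi = [mutual_info_score(
                  np.digitize(z[:, j], np.histogram_bin_edges(z[:, j], bins)),
                  a[:, k])
              for j in range(z.shape[1])]
        top2 = np.sort(mi)[-2:]
        scores.append((top2[1] - top2[0]) / h)
    return float(np.mean(scores))

rng = np.random.default_rng(0)
a = rng.integers(0, 4, (1000, 2))
z = np.column_stack([a[:, 0] + 0.1 * rng.standard_normal(1000),
                     rng.standard_normal(1000)])
print(mig(z, a))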
@inproceedings{hung_transcription_2021,
title = {Transcription Is All You Need: Learning To Separate Musical Mixtures With Score As Supervision},
author = {Yun-Ning Hung and Gordon Wichern and Jonathan Le Roux},
url = {https://ieeexplore.ieee.org/abstract/document/9413358/authors#authors},
doi = {10.1109/ICASSP39728.2021.9413358},
year = {2021},
date = {2021-06-01},
urldate = {2024-02-08},
booktitle = {ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages = {46\textendash50},
abstract = {Most music source separation systems require large collections of isolated sources for training, which can be difficult to obtain. In this work, we use musical scores, which are comparatively easy to obtain, as a weak label for training a source separation system. In contrast with previous score-informed separation approaches, our system does not require isolated sources, and score is used only as a training target, not required for inference. Our model consists of a separator that outputs a time-frequency mask for each instrument, and a transcriptor that acts as a critic, providing both temporal and frequency supervision to guide the learning of the separator. A harmonic mask constraint is introduced as another way of leveraging score information during training, and we propose two novel adversarial losses for additional fine-tuning of both the transcriptor and the separator. Results demonstrate that using score information outperforms temporal weak-labels, and adversarial structures lead to further improvements in both separation and transcription performance.},
note = {ISSN: 2379-190X},
keywords = {audio source separation, Conferences, Instruments, music, music transcription, Particle separators, Source separation, Time-frequency analysis, Training, weakly-labeled data, weakly-supervised separation},
pubstate = {published},
tppubtype = {inproceedings}
}
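The score-derived harmonic mask constraint mentioned above admits time-frequency bins near the harmonics of score-active pitches. A minimal per-frame sketch (bin width and number of harmonics are illustrative):

# Harmonic mask sketch: admit bins near harmonics of score-active pitches.
import numpy as np

def harmonic_mask(midi_pitches, n_bins=513, fs=16000, n_harm=8, width_hz=30):
    """Binary (n_bins,) mask for one frame given active MIDI pitch numbers."""
    freqs = np.linspace(0, fs / 2, n_bins)
    mask = np.zeros(n_bins, dtype=bool)
    for p in midi_pitches:
        f0 = 440.0 * 2 ** ((p - 69) / 12)
        for h in range(1, n_harm + 1):
            mask |= np.abs(freqs - h * f0) < width_hz
    return mask

print(harmonic_mask([60, 64, 67]).sum(), "of 513 bins admitted")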
@article{li_adaptive_2021,
title = {An Adaptive Particle Swarm Optimizer with Decoupled Exploration and Exploitation for Large Scale Optimization},
author = {Dongyang Li and Lei Wang and Alexander Lerch and Qidi Wu},
url = {https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2023/02/Li-et-al.-2020-An-Adaptive-Particle-Swarm-Optimizer-with-Decouple.pdf},
doi = {10.1016/j.swevo.2020.100789},
issn = {2210-6502},
year = {2021},
date = {2021-01-01},
urldate = {2021-01-01},
journal = {Swarm and Evolutionary Computation},
volume = {60},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{vinay_mind_2021,
title = {Mind the Beat: Detecting Audio Onsets from EEG Recordings of Music Listening},
author = {Ashvala Vinay and Alexander Lerch and Grace Leslie},
url = {https://arxiv.org/abs/2102.06393},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Toronto, Ontario, Canada},
abstract = {We propose a deep learning approach to predicting audio event onsets in electroencephalogram (EEG) recorded from users as they listen to music. We use a publicly available dataset containing ten contemporary songs and concurrently recorded EEG. We generate a sequence of onset labels for the songs in our dataset and train neural networks (a fully connected network (FCN) and a recurrent neural network (RNN)) to parse one-second windows of input EEG and predict one-second windows of onsets in the audio. We compare our RNN network to both the standard spectral-flux based novelty function and the FCN. We find that our RNN was able to produce results that reflected its ability to generalize better than the other methods.
Since there are no pre-existing works on this topic, the numbers presented in this paper may serve as useful benchmarks for future approaches to this research problem.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
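The spectral-flux novelty baseline referenced in the abstract above is the half-wave-rectified frame-to-frame increase in spectral magnitude; a minimal sketch:

# Spectral-flux novelty function sketch.
import numpy as np
from scipy.signal import stft

def spectral_flux(x, fs, nperseg=1024, noverlap=512):
    _, _, X = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    mag = np.abs(X)
    diff = np.maximum(0.0, mag[:, 1:] - mag[:, :-1])  # half-wave rectification
    return diff.sum(axis=0)

fs = 16000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 220 * t) * (t > 1.0)   # onset at t = 1 s
novelty = spectral_flux(x, fs)
print(int(np.argmax(novelty)))                 # frame index near the onset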
@inproceedings{seshadri_improving_2021,
title = {Improving Music Performance Assessment with Contrastive Learning},
author = {Pavan Seshadri and Alexander Lerch},
url = {https://arxiv.org/abs/2108.01711},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
pages = {8},
address = {Online},
abstract = {Several automatic approaches for objective music performance assessment (MPA) have been proposed in the past; however, existing systems are not yet capable of reliably predicting ratings with the same accuracy as professional judges. This study investigates contrastive learning as a potential method to improve existing MPA systems. Contrastive learning is a widely used technique in representation learning to learn a structured latent space capable of separately clustering multiple classes. It has been shown to produce state-of-the-art results for image-based classification problems. We introduce a weighted contrastive loss suitable for regression tasks applied to a convolutional neural network and show that contrastive loss results in performance gains in regression tasks for MPA. Our results show that contrastive-based methods are able to match and exceed SoTA performance for MPA regression tasks by creating better class clusters within the latent space of the neural networks.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
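The paper's exact weighted contrastive formulation is not reproduced here; one plausible form weights pairwise embedding distances by the proximity of the continuous ratings, e.g.:

# Sketch of a contrastive loss weighted for regression targets (PyTorch):
# pairs with similar ratings are pulled together, dissimilar pairs pushed
# apart, with weights derived from the label distance. Illustrative only.
import torch

def weighted_contrastive_loss(emb, y, margin=1.0, tau=0.25):
    d = torch.cdist(emb, emb)                      # pairwise embedding dists
    label_gap = torch.cdist(y.unsqueeze(1), y.unsqueeze(1))
    sim = (label_gap < tau).float()                # 1 = "similar" pair
    w = 1.0 - label_gap / label_gap.max()          # weight by label proximity
    pull = sim * w * d.pow(2)
    push = (1 - sim) * (1 - w) * torch.clamp(margin - d, min=0).pow(2)
    off_diag = ~torch.eye(emb.size(0), dtype=torch.bool)
    return (pull + push)[off_diag].mean()

emb = torch.randn(16, 64, requires_grad=True)      # network embeddings
y = torch.rand(16)                                 # continuous MPA ratings
weighted_contrastive_loss(emb, y).backward()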
@inproceedings{pati_is_2021,
title = {Is Disentanglement Enough? On Latent Representations for Controllable Music Generation},
author = {Ashis Pati and Alexander Lerch},
url = {https://arxiv.org/abs/2108.01450},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
pages = {8},
address = {Online},
abstract = {Improving controllability or the ability to manipulate one or more attributes of the generated data has become a topic of interest in the context of deep generative models of music. Recent attempts in this direction have relied on learning disentangled representations from data such that the underlying factors of variation are well separated. In this paper, we focus on the relationship between disentanglement and controllability by conducting a systematic study using different supervised disentanglement learning algorithms based on the Variational Auto-Encoder (VAE) architecture. Our experiments show that a high degree of disentanglement can be achieved by using different forms of supervision to train a strong discriminative encoder. However, in the absence of a strong generative decoder, disentanglement does not necessarily imply controllability. The structure of the latent space with respect to the VAE-decoder plays an important role in boosting the ability of a generative model to manipulate different attributes. To this end, we also propose methods and metrics to help evaluate the quality of a latent space with respect to the afforded degree of controllability.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_semi-supervised_2021,
title = {Semi-Supervised Audio Classification with Partially Labeled Data},
author = {Siddharth Gururani and Alexander Lerch},
url = {https://arxiv.org/abs/2111.12761},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the IEEE International Symposium on Multimedia (ISM)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {online},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{lerch_machine_2021,
title = {Machine Learning Applied to Music/Audio Signal Processing},
author = {Alexander Lerch and Peter Knees},
url = {https://www.mdpi.com/2079-9292/10/24/3077},
doi = {10.3390/electronics10243077},
year = {2021},
date = {2021-01-01},
urldate = {2021-12-10},
journal = {Electronics},
volume = {10},
number = {24},
pages = {3077},
abstract = {Over the past two decades, the utilization of machine learning in audio and music signal processing has dramatically increased [...]},
keywords = {n/a},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{hung_avaspeech-smad_2021,
title = {AVASPEECH-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-occurence},
author = {Yun-Ning Hung and Karn N Watcharasupat and Chih-Wei Wu and Iroro Orife and Kelian Li and Pavan Seshadri and Junyoung Lee},
year = {2021},
date = {2021-01-01},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
pages = {3},
address = {Online},
abstract = {We propose a dataset, AVASpeech-SMAD, to assist speech and music activity detection research. With frame-level music labels, the proposed dataset extends the existing AVASpeech dataset, which originally consists of 45 hours of audio and speech activity labels. To the best of our knowledge, the proposed AVASpeech-SMAD is the first open-source dataset that features strong polyphonic labels for both music and speech. The dataset was manually annotated and verified via an iterative cross-checking process. A simple automatic examination was also implemented to further improve the quality of the labels. Evaluation results from two state-of-the-art SMAD systems are also provided as a benchmark for future reference.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2020
@inproceedings{huang_score-informed_2020,
title = {Score-informed Networks for Music Performance Assessment},
author = {Jiawen Huang and Yun-Ning Hung and Ashis K Pati and Siddharth Gururani and Alexander Lerch},
url = {https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2020/08/Huang-et-al.-2020-Score-informed-Networks-for-Music-Performance-Asse.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Montreal},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}