publications | Music Informatics Group

127 entries

2025

Watcharasupat, Karn N.; Ding, Yiwei; Ma, T. Aleksandra; Seshadri, Pavan; Lerch, Alexander

Uncertainty Estimation in the Real World: A Study on Music Emotion Recognition Proceedings Article

In: Proceedings of the European Conference on Information Retrieval (ECIR), arXiv, Lucca, Italy, 2025.

Tags: Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Zölzer, Udo; Lerch, Alexander

Digitale Audio-Effekte Book Section

In: Weinzierl, Stefan (Ed.): Handbuch der Audiotechnik, pp. 1–22, Springer, Berlin, Heidelberg, 2025, ISBN: 978-3-662-60357-4.

Tags: Audio-Effekte, Chorus, Delay, Digitale Filter, FIR-Filter, Flanger, IIR-Filter, Impulsantwort, Nichtlineare Effekte, Phaser, Pitch shifting, Time compression, Tremolo, Vibrato

2024

Kim, Yonghyun; Lerch, Alexander

Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation Proceedings Article

In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), arXiv, San Francisco, 2024.

Ma, T. Aleksandra; Lerch, Alexander

Music auto-tagging in the long tail: A few-shot approach Proceedings Article

In: Proceedings of the AES Convention, New York, 2024.

Tags: Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, H.3.3

@inproceedings{ma_music_2024,

title = {Music auto-tagging in the long tail: A few-shot approach},

author = {T. Aleksandra Ma and Alexander Lerch},

url = {http://arxiv.org/abs/2409.07730},

doi = {10.48550/arXiv.2409.07730},

year  = {2024},

date = {2024-09-01},

urldate = {2024-09-13},

booktitle = {Proceedings of the AES Convention},

address = {New York},

abstract = {In the realm of digital music, using tags to efficiently organize and retrieve music from extensive databases is crucial for music catalog owners. Human tagging by experts is labor-intensive but mostly accurate, whereas automatic tagging through supervised learning has approached satisfying accuracy but is restricted to a predefined set of training tags. Few-shot learning offers a viable solution to expand beyond this small set of predefined tags by enabling models to learn from only a few human-provided examples to understand tag meanings and subsequently apply these tags autonomously. We propose to integrate few-shot learning methodology into multi-label music auto-tagging by using features from pre-trained models as inputs to a lightweight linear classifier, also known as a linear probe. We investigate different popular pre-trained features, as well as different few-shot parametrizations with varying numbers of classes and samples per class. Our experiments demonstrate that a simple model with pre-trained features can achieve performance close to state-of-the-art models while using significantly less training data, such as 20 samples per tag. Additionally, our linear probe performs competitively with leading models when trained on the entire training dataset. The results show that this transfer learning-based few-shot approach could effectively address the issue of automatically assigning long-tail tags with only limited labeled data.},

keywords = {Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, H.3.3},

pubstate = {published},

tppubtype = {inproceedings}

}

Han, Chaeyeon; Seshadri, Pavan; Ding, Yiwei; Posner, Noah; Koo, Bon Woo; Agrawal, Animesh; Lerch, Alexander; Guhathakurta, Subhrajit

Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors Journal Article

In: Urban Informatics, vol. 3, no. 1, pp. 22, 2024, ISSN: 2731-6963.

Tags: Active mobility, Audio-based, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Pedestrian, Sensors

@article{han_understanding_2024,

title = {Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors},

author = {Chaeyeon Han and Pavan Seshadri and Yiwei Ding and Noah Posner and Bon Woo Koo and Animesh Agrawal and Alexander Lerch and Subhrajit Guhathakurta},

url = {https://doi.org/10.1007/s44212-024-00053-9},

doi = {10.1007/s44212-024-00053-9},

issn = {2731-6963},

year  = {2024},

date = {2024-07-01},

urldate = {2024-07-10},

journal = {Urban Informatics},

volume = {3},

number = {1},

pages = {22},

abstract = {While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding pedestrian volumes and flows is essential for designing safer and more attractive pedestrian infrastructure and for controlling periodic overcrowding. This study discusses a new approach to scale up urban sensing of people with the help of novel audio-based technology. It assesses the benefits and limitations of microphone-based sensors as compared to other forms of pedestrian sensing. A large-scale dataset called ASPED is presented, which includes high-quality audio recordings along with video recordings used for labeling the pedestrian count data. The baseline analyses highlight the promise of using audio sensors for pedestrian tracking, although algorithmic and technological improvements to make the sensors practically usable continue. This study also demonstrates how the data can be leveraged to predict pedestrian trajectories. Finally, it discusses the use cases and scenarios where audio-based pedestrian sensing can support better urban and transportation planning.},

keywords = {Active mobility, Audio-based, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Pedestrian, Sensors},

pubstate = {published},

tppubtype = {article}

}

Watcharasupat, Karn N.; Lerch, Alexander

A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems Proceedings Article

In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), San Francisco, 2024.

@inproceedings{watcharasupat_stem-agnostic_2024,

title = {A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems},

author = {Karn N. Watcharasupat and Alexander Lerch},

url = {http://arxiv.org/abs/2406.18747},

doi = {10.48550/arXiv.2406.18747},

year  = {2024},

date = {2024-06-01},

urldate = {2024-08-08},

booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},

address = {San Francisco},

abstract = {Despite significant recent progress across multiple subtasks of audio source separation, few music source separation systems support separation beyond the four-stem vocals, drums, bass, and other (VDBO) setup. Of the very few current systems that support source separation beyond this setup, most continue to rely on an inflexible decoder setup that can only support a fixed pre-defined set of stems. Increasing stem support in these inflexible systems correspondingly requires increasing computational complexity, rendering extensions of these systems computationally infeasible for long-tail instruments. In this work, we propose Banquet, a system that allows source separation of multiple stems using just one decoder. A bandsplit source separation model is extended to work in a query-based setup in tandem with a music instrument recognition PaSST model. On the MoisesDB dataset, Banquet, at only 24.9 M trainable parameters, approached the performance level of the significantly more complex 6-stem Hybrid Transformer Demucs on VDBO stems and outperformed it on guitar and piano. The query-based setup allows for the separation of narrow instrument classes such as clean acoustic guitars, and can be successfully applied to the extraction of less common stems such as reeds and organs. Implementation is available at https://github.com/kwatcharasupat/query-bandit.},

keywords = {Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},

pubstate = {published},

tppubtype = {inproceedings}

}

Ooi, Kenneth; Goh, Jessie; Lin, Hao-Weng; Ong, Zhen-Ting; Wong, Trevor; Watcharasupat, Karn N.; Lam, Bhan; Gan, Woon-Seng

Lion City Soundscapes: Modified Partitioning around Medoids for a Perceptually Diverse Dataset of Singaporean Soundscapes Journal Article

In: JASA Express Letters, vol. 4, no. 4, pp. 047402, 2024, ISSN: 2691-1191.

Ding, Yiwei; Han, Chaeyeon; Seshadri, Pavan; Koo, Bon Woo; Posner, Noah; Guhathakurta, Subhro; Lerch, Alexander

Toward audio-based sensing for pedestrian detection Journal Article

In: The Journal of the Acoustical Society of America, vol. 155, no. 3_Supplement, pp. A282, 2024, ISSN: 0001-4966.

Ding, Yiwei; Lerch, Alexander

Embedding Compression for Teacher-to-Student Knowledge Transfer Proceedings Article

In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Satellite Workshop Deep Neural Network Model Compression, Institute of Electrical and Electronics Engineers (IEEE), Seoul, Korea, 2024, (arXiv:2402.06761 [cs]).

Tags: Computer Science - Machine Learning

Seshadri, Pavan; Han, Chaeyeon; Koo, Bon-Woo; Posner, Noah; Guhathakurta, Subhrajit; Lerch, Alexander

ASPED: An Audio Dataset for Detecting Pedestrians Proceedings Article

In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), Seoul, 2024.

Tags: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Watcharasupat, Karn N.; Lerch, Alexander

Quantifying Spatial Audio Quality Impairment Proceedings Article

In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), Seoul, 2024.

Tags: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Ooi, Kenneth; Ong, Zhen-Ting; Watcharasupat, Karn N.; Lam, Bhan; Hong, Joo Young; Gan, Woon-Seng

ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes Journal Article

In: IEEE Transactions on Affective Computing, vol. 15, no. 1, pp. 105–120, 2024, ISSN: 1949-3045.

Watcharasupat, Karn N.; Ooi, Kenneth; Lam, Bhan; Ong, Zhen-Ting; Jaratjarungkiat, Sureenate; Gan, Woon-Seng

Validating Thai Translations of Perceptual Soundscape Attributes: A Non-Procrustean Approach with a Procrustes Projection Journal Article

In: Applied Acoustics, 2024.

Liu, Shimiao; Lerch, Alexander

Enhancing Video Music Recommendation with Transformer-Driven Audio-Visual Embeddings Proceedings Article

In: Proceedings of the IEEE International Symposium on the Internet of Sounds (IS2), Erlangen, 2024.

Tags: Contrastive learning, Encoding, Fitting, Immersive experience, Internet, Labeling, Manuals, multi-modal, music, music recommendation, Recommender systems, transformer, Transformers

2023

Lam, Bhan; Chieng, Julia; Ooi, Kenneth; Ong, Zhen Ting; Watcharasupat, Karn N.; Hong, Joo Young; Gan, Woon Seng

Crossing the Linguistic Causeway: Ethnonational Differences on Soundscape Attributes in Bahasa Melayu Journal Article

In: Applied Acoustics, vol. 214, 2023, ISSN: 1872-910X.

Ding, Yiwei; Lerch, Alexander

Audio Embeddings as Teachers for Music Classification Proceedings Article

In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Milan, Italy, 2023.

Tags: Computer Science - Information Retrieval, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Knees, Peter; Lerch, Alexander

MILC 2023: 3rd Workshop on Intelligent Music Interfaces for Listening and Creation Proceedings Article

In: Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 185–186, Association for Computing Machinery, Sydney, 2023, ISBN: 9798400701078.

Chen, Hsin-Hung; Lerch, Alexander

Music Instrument Classification Reprogrammed Proceedings Article

In: Proceedings of the International Conference on Multimedia Modeling (MMM), Bergen, Norway, 2023.

Lerch, Alexander

An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications Book

2nd ed., Wiley-IEEE Press, Hoboken, NJ, 2023, ISBN: 978-1-119-89094-2.

Tags: analysis, audio, Audio content analysis, audio signal processing, Automatic Music Transcription, Computer sound processing, machine listening, Matlab, MIR, music analysis, music informatics, music information retrieval, Python

@book{lerch_introduction_2023,

title = {An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications},

author = {Alexander Lerch},

url = {https://ieeexplore.ieee.org/servlet/opac?bknumber=9965970},

isbn = {978-1-119-89094-2},

year  = {2023},

date = {2023-01-01},

urldate = {2022-01-01},

publisher = {Wiley-IEEE Press},

address = {Hoboken, NJ},

edition = {2},

abstract = {An Introduction to Audio Content Analysis enables readers to understand the algorithmic analysis of musical audio signals with AI-driven approaches. An Introduction to Audio Content Analysis serves as a comprehensive guide on audio content analysis, explaining how signal processing and machine learning approaches can be utilized for the extraction of musical content from audio. It gives readers the algorithmic understanding to teach a computer to interpret music signals and thus allows for the design of tools for interacting with music. The work ties together topics from audio signal processing and machine learning, showing how to use audio content analysis to pick up musical characteristics automatically. A multitude of audio content analysis tasks related to the extraction of tonal, temporal, timbral, and intensity-related characteristics of the music signal are presented. Each task is introduced from both a musical and a technical perspective, detailing the algorithmic approach as well as providing practical guidance on implementation details and evaluation. To aid in reader comprehension, each task description begins with a short introduction to the most important musical and perceptual characteristics of the covered topic, followed by a detailed algorithmic model and its evaluation, and concluded with questions and exercises. For the interested reader, updated supplemental materials are provided via an accompanying website.
Written by a well-known expert in the music industry, sample topics covered in Introduction to Audio Content Analysis include: digital audio signals and their representation, common time-frequency transforms, audio features; pitch and fundamental frequency detection, key and chord; representation of dynamics in music and intensity-related features; beat histograms, onset and tempo detection, detection of structure in music, and sequence alignment; audio fingerprinting, musical genre, mood, and instrument classification. An invaluable guide for newcomers to audio signal processing and industry experts alike, An Introduction to Audio Content Analysis covers a wide range of introductory topics pertaining to music information retrieval and machine listening, allowing students and researchers to quickly gain core holistic knowledge in audio analysis and dig deeper into specific aspects of the field with the help of a large number of references.},

keywords = {analysis, audio, Audio content analysis, audio signal processing, Automatic Music Transcription, Computer sound processing, machine listening, Matlab, MIR, music analysis, music informatics, music information retrieval, Python},

pubstate = {published},

tppubtype = {book}

}

Lerch, Alexander

Audioinhaltsanalyse Book Section

In: Weinzierl, Stefan (Ed.): Handbuch der Audiotechnik, pp. 1–20, Springer Berlin Heidelberg, Berlin, Heidelberg, 2023, ISBN: 978-3-662-60357-4.

Tags: Audio content analysis, Grundfrequenzerkennung, music information retrieval, Musikklassifizierung, Musiktranskription, Tonarterkennung

Smith, Jason Brent; Vinay, Ashvala; Freeman, Jason

The Impact of Salient Musical Features in a Hybrid Recommendation System for a Sound Library Proceedings Article

In: Joint Proceedings of the ACM IUI Workshops (MILC), Sydney, 2023.

Hung, Yun-Ning; Yang, Chao-Han Huck; Chen, Pin-Yu; Lerch, Alexander

Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming Proceedings Article

In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), Rhodes Island, Greece, 2023.

Lerch, Alexander

Grundlagen digitaler Audiosignale Book Section

In: Weinzierl, Stefan (Ed.): Handbuch der Audiotechnik, pp. 1–13, Springer Berlin Heidelberg, Berlin, Heidelberg, 2023, ISBN: 978-3-662-60357-4.

Vinay, Ashvala; Lerch, Alexander

AQUATK: An Audio Quality Assessment Toolkit Proceedings Article

In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Milan, 2023.

Watcharasupat, Karn N.; Wu, Chih-Wei; Ding, Yiwei; Orife, Iroro; Hipple, Aaron J.; Williams, Phillip A.; Kramer, Scott; Lerch, Alexander; Wolcott, William

A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation Journal Article

In: IEEE Open Journal of Signal Processing, pp. 1–9, 2023, ISSN: 2644-1322.

Lam, Bhan; Ooi, Kenneth; Ong, Zhen-Ting; Wong, Trevor; Gan, Woon-Seng; Watcharasupat, Karn

Preliminary Investigation of the Short-Term in Situ Performance of an Automatic Masker Selection System Proceedings Article

In: Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering, 2023.

Ong, Zhen-Ting; Ooi, Kenneth; Wong, Trevor; Lam, Bhan; Gan, Woon-Seng; Watcharasupat, Karn N.

Effect of Masker Selection Schemes on the Perceived Affective Quality of Soundscapes: A Pilot Study Proceedings Article

In: Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering, 2023.

Ooi, Kenneth; Ong, Zhen-Ting; Lam, Bhan; Wong, Trevor; Gan, Woon-Seng; Watcharasupat, Karn

ARAUSv2: An Expanded Dataset and Multimodal Models of Affective Responses to Augmented Urban Soundscapes Proceedings Article

In: Proceedings of the 52nd International Congress and Exposition on Noise Control Engineering, 2023.

Ooi, Kenneth; Watcharasupat, Karn N.; Lam, Bhan; Ong, Zhen-Ting; Gan, Woon-Seng

Autonomous Soundscape Augmentation with Multimodal Fusion of Visual and Participant-linked Inputs Proceedings Article

In: Proceedings of the 2023 International Conference on Acoustics, Speech, and Signal Processing, 2023.

2022

Hung, Yun-Ning; Wu, Chih-Wei; Orife, Iroro; Hipple, Aaron; Wolcott, William; Lerch, Alexander

A large TV dataset for speech and music activity detection Journal Article

In: EURASIP Journal on Audio, Speech, and Music Processing, vol. 2022, no. 1, pp. 21, 2022, ISSN: 1687-4722.

Tags: Dataset, Production TV audio, Speech and music activation detection

@article{hung_large_2022,

title = {A large TV dataset for speech and music activity detection},

author = {Yun-Ning Hung and Chih-Wei Wu and Iroro Orife and Aaron Hipple and William Wolcott and Alexander Lerch},

url = {https://doi.org/10.1186/s13636-022-00253-8},

doi = {10.1186/s13636-022-00253-8},

issn = {1687-4722},

year  = {2022},

date = {2022-09-01},

urldate = {2022-09-03},

journal = {EURASIP Journal on Audio, Speech, and Music Processing},

volume = {2022},

number = {1},

pages = {21},

abstract = {Automatic speech and music activity detection (SMAD) is an enabling task that can help segment, index, and pre-process audio content in radio broadcast and TV programs. However, due to copyright concerns and the cost of manual annotation, the limited availability of diverse and sizeable datasets hinders the progress of state-of-the-art (SOTA) data-driven approaches. We address this challenge by presenting a large-scale dataset containing Mel spectrogram, VGGish, and MFCCs features extracted from around 1600 h of professionally produced audio tracks and their corresponding noisy labels indicating the approximate location of speech and music segments. The labels are collected from several sources such as subtitles and cuesheet. A test set curated by human annotators is also included as a subset for evaluation. To validate the generalizability of the proposed dataset, we conduct several experiments comparing various model architectures and their variants under different conditions. The results suggest that our proposed dataset is able to serve as a reliable training resource and leads to SOTA performances on various public datasets. To the best of our knowledge, this dataset is the first large-scale, open-sourced dataset that contains features extracted from professionally produced audio tracks and their corresponding frame-level speech and music annotations.},

keywords = {Dataset, Production TV audio, Speech and music activation detection},

pubstate = {published},

tppubtype = {article}

}

Ma, Alison B.; Lerch, Alexander

Representation Learning for the Automatic Indexing of Sound Effects Libraries Proceedings Article

In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Bangalore, IN, 2022, (arXiv:2208.09096 [cs, eess]).

Tags: Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Vinay, Ashvala; Lerch, Alexander

Evaluating Generative Audio Systems and their Metrics Proceedings Article

In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Bangalore, IN, 2022, (arXiv:2209.00130 [cs, eess]).

Tags: Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Lerch, Alexander

libACA, pyACA, and ACA-Code: Audio Content Analysis in 3 Languages Journal Article

In: Software Impacts, pp. 100349, 2022, ISSN: 2665-9638.

Tags: Audio content analysis, C++, Matlab, music information retrieval, Python

Hung, Yun-Ning; Lerch, Alexander

Feature-informed Latent Space Regularization for Music Source Separation Miscellaneous

2022, (arXiv:2203.09132 [eess]).

Tags: Electrical Engineering and Systems Science - Audio and Speech Processing

Wang, Ju-Chiang; Hung, Yun-Ning; Smith, Jordan B. L.

To Catch A Chorus, Verse, Intro, or Anything Else: Analyzing a Song with Structural Functions Proceedings Article

In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 416–420, 2022, (ISSN: 2379-190X).

Tags: Location awareness, music, Music structure, segmentation, semantic labeling, Semantics, Signal processing, Signal processing algorithms, SpecTNT, Taxonomy, Transformer, Transformers

Watcharasupat, Karn N.; Lee, Junyoung; Lerch, Alexander

Latte: Cross-framework Python Package for Evaluation of Latent-based Generative Models Journal Article

In: Software Impacts, pp. 100222, 2022, ISSN: 2665-9638.

Kalbag, Vedant; Lerch, Alexander

Scream Detection in Heavy Metal Music Proceedings Article

In: Proceedings of the Sound and Music Computing Conference (SMC), Saint-Etienne, 2022.

Tags: Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Hung, Yun-Ning; Lerch, Alexander

Feature-informed Embedding Space Regularization for Audio Classification Proceedings Article

In: Proceedings of the European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 2022.

Tags: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Guo, Weian; Hua, Zhenyao; Kang, Zecheng; Li, Dongyang; Wang, Lei; Wu, Qidi; Lerch, Alexander

Deep Reinforcement Learning for Urban Multi-taxis Cruising Strategy Journal Article

In: Neural Computing and Applications, 2022, ISSN: 1433-3058.

Tags: Data-driven model, deep Q-learning network, Multi-taxis cruising, Urban transportation

Herre, Jürgen; Disch, Sascha; Lerch, Alexander

Quellcodierung Book Section

In: Weinzierl, Stefan (Ed.): Handbuch der Audiotechnik, pp. 1–23, Springer, Berlin, Heidelberg, 2022, ISBN: 978-3-662-60357-4.

Tags: Audiocodierung, Audiokomprimierung, Codec, MPEG, Psychoakustik, verlustfrei, verlustlos

@incollection{herre_quellcodierung_2022,

title = {Quellcodierung},

author = {J\"{u}rgen Herre and Sascha Disch and Alexander Lerch},

editor = {Stefan Weinzierl},

url = {https://doi.org/10.1007/978-3-662-60357-4_34-1},

doi = {10.1007/978-3-662-60357-4_34-1},

isbn = {978-3-662-60357-4},

year  = {2022},

date = {2022-01-01},

urldate = {2022-10-02},

booktitle = {Handbuch der Audiotechnik},

pages = {1--23},

publisher = {Springer},

address = {Berlin, Heidelberg},

edition = {2},

abstract = {Zur Bitratenreduktion (Datenkomprimierung) eingesetzte Codierungsverfahren haben die Aufgabe, die ben\"{o}tigte Datenmenge zur \"{U}bertragung oder Speicherung von digitalen Signalen ohne Verlust oder mit m\"{o}glichst geringem Qualit\"{a}tsverlust zu verkleinern. Sie werden entweder aus \"{o}konomischen Gr\"{u}nden wie der Kostenersparnis durch geringere erforderliche \"{U}bertragungsbandbreiten, oder aus technischen Gr\"{u}nden wie einem in der Gr\"{o}\sse beschr\"{a}nkten Speicherplatz oder eingeschr\"{a}nkten \"{U}bertragungskapazit\"{a}ten eingesetzt. Codierungsverfahren finden Anwendung in Datennetzen wie beispielsweise dem Internet beim Multimediavertrieb von Musik und Filmen, bei Streamingdiensten und im Rundfunk, in Filmtheatern und in der Telekommunikation aber auch auf physikalischen Datentr\"{a}gern wie DVD (Digital Versatile Disc), bei der Archivierung gro\sser Datenmengen auf Festplatte und auf Speicherkarten in portablen Mediaplayern. Dieses Kapitel vermittelt die technischen Grundlagen der effizienten und geh\"{o}rrichtigen Audiocodierung. Erg\"{a}nzend werden einige gebr\"{a}uchliche standardisierte Verfahren zur Messung der subjektiven Audioqualit\"{a}t erl\"{a}utert. Desweiteren wird ein \"{U}berblick \"{u}ber g\"{a}ngige verlustlose und verlustbehaftete Audiocodierverfahren und ihre qualitative Einordnung gegeben.},

keywords = {Audiocodierung, Audiokomprimierung, Codec, MPEG, Psychoakustik, verlustfrei, verlustlos},

pubstate = {published},

tppubtype = {incollection}

}

Li, Dongyang; Wang, Lei; Li, Li; Guo, Weian; Wu, Qidi; Lerch, Alexander

A Large-Scale Multiobjective Particle Swarm Optimizer With Enhanced Balance of Convergence and Diversity Journal Article

In: IEEE Transactions on Cybernetics, pp. 1–12, 2022, ISSN: 2168-2275.

Tags: Complexity theory, Convergence, Cybernetics, diversity, Estimation, large-scale multiobjective optimization, multidimensional local sparseness, Optimization, Particle swarm optimization, particle swarm optimization (PSO), Weight measurement, weighted convergence factor (WCF)

@article{li_large-scale_2022,

title = {A Large-Scale Multiobjective Particle Swarm Optimizer With Enhanced Balance of Convergence and Diversity},

author = {Dongyang Li and Lei Wang and Li Li and Weian Guo and Qidi Wu and Alexander Lerch},

doi = {10.1109/TCYB.2022.3225341},

issn = {2168-2275},

year  = {2022},

date = {2022-01-01},

journal = {IEEE Transactions on Cybernetics},

pages = {1--12},

abstract = {Large-scale multiobjective optimization problems (LSMOPs) continue to be challenging for existing multiobjective evolutionary algorithms (MOEAs). The main difficulties are that: 1) the diversity preservation in both the objective space and the decision space needs to be taken into account when solving LSMOPs and 2) the existing learning structures in current MOEAs usually make the learning operators only coincidentally serve convergence and diversity, leading to difficulties in balancing these two factors. Therefore, balancing convergence and diversity in current MOEAs is difficult. To address these issues, this article proposes a multiobjective particle swarm optimizer with enhanced balance of convergence and diversity (MPSO-EBCD). In MPSO-EBCD, a novel velocity update structure for multiobjective particle swarm optimization is put forward, dividing the convergence, and diversity preservation operations into independent components. Following the proposed update structure, a weighted convergence factor is introduced to serve the convergence strategy, whilst a diversity preservation strategy is built to uniformly distribute the particles in the searched space based on a proposed multidimensional local sparseness degree indicator. By this means, MPSO-EBCD is able to balance convergence and diversity with specific parameters in independent operators. Experimental results on LSMOP benchmarks and a voltage transformer optimization problem demonstrate the competitiveness of the proposed algorithm compared to several state-of-the-art MOEAs.},

keywords = {Complexity theory, Convergence, Cybernetics, diversity, Estimation, large-scale multiobjective optimization, multidimensional local sparseness, Optimization, Particle swarm optimization, particle swarm optimization (PSO), Weight measurement, weighted convergence factor (WCF)},

pubstate = {published},

tppubtype = {article}

}

Watcharasupat, Karn N.; Ooi, Kenneth; Lam, Bhan; Wong, Trevor; Ong, Zhen Ting; Gan, Woon Seng

Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain Journal Article

In: IEEE Signal Processing Letters, vol. 29, pp. 1749–1753, 2022, ISSN: 1558-2361.

2021

Watcharasupat, Karn N.; Lerch, Alexander

Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes Proceedings Article

In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Online, 2021.

Tags: Computer Science - Information Retrieval, Computer Science - Information Theory, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Hung, Yun-Ning; Wichern, Gordon; Le Roux, Jonathan

Transcription Is All You Need: Learning To Separate Musical Mixtures With Score As Supervision Proceedings Article

In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 46–50, 2021, (ISSN: 2379-190X).

Tags: audio source separation, Conferences, Instruments, music, music transcription, Particle separators, Source separation, Time-frequency analysis, Training, weakly-labeled data, weakly-supervised separation

Li, Dongyang; Wang, Lei; Lerch, Alexander; Wu, Qidi

An Adaptive Particle Swarm Optimizer with Decoupled Exploration and Exploitation for Large Scale Optimization Journal Article

In: Swarm and Evolutionary Computation, vol. 60, 2021, ISSN: 2210-6502.

Vinay, Ashvala; Lerch, Alexander; Leslie, Grace

Mind the Beat: Detecting Audio Onsets from EEG Recordings of Music Listening Proceedings Article

In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), Toronto, Ontario, Canada, 2021.
