Watcharasupat, Karn N; Lee, Junyoung; Lerch, Alexander: Latte: Cross-framework Python Package for Evaluation of Latent-based Generative Models. Journal Article. In: Software Impacts, pp. 100222, 2022, ISSN: 2665-9638.
Kalbag, Vedant; Lerch, Alexander: Scream Detection in Heavy Metal Music. Inproceedings. In: Proceedings of the Sound and Music Computing Conference (SMC), Saint-Etienne, 2022.
Hung, Yun-Ning; Lerch, Alexander: Feature-informed Embedding Space Regularization for Audio Classification. Inproceedings. In: Proceedings of the European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 2022.
Watcharasupat, Karn N; Lerch, Alexander: Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes. Inproceedings. In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Online, 2021.
Li, Dongyang; Wang, Lei; Lerch, Alexander; Wu, Qidi: An Adaptive Particle Swarm Optimizer with Decoupled Exploration and Exploitation for Large Scale Optimization. Journal Article. In: Swarm and Evolutionary Computation, vol. 60, 2021, ISSN: 2210-6502.
Vinay, Ashvala; Lerch, Alexander; Leslie, Grace: Mind the Beat: Detecting Audio Onsets from EEG Recordings of Music Listening. Inproceedings. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Institute of Electrical and Electronics Engineers (IEEE), Toronto, Ontario, Canada, 2021.
Seshadri, Pavan; Lerch, Alexander: Improving Music Performance Assessment with Contrastive Learning. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 8, Online, 2021.
Pati, Ashis; Lerch, Alexander: Is Disentanglement Enough? On Latent Representations for Controllable Music Generation. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 8, Online, 2021.
Gururani, Siddharth; Lerch, Alexander: Semi-Supervised Audio Classification with Partially Labeled Data. Inproceedings. In: Proceedings of the IEEE International Symposium on Multimedia (ISM), Institute of Electrical and Electronics Engineers (IEEE), online, 2021.
Lerch, Alexander; Knees, Peter: Machine Learning Applied to Music/Audio Signal Processing. Journal Article. In: Electronics, vol. 10, no. 24, pp. 3077, 2021.
Huang, Jiawen; Hung, Yun-Ning; Pati, Ashis K; Gururani, Siddharth; Lerch, Alexander: Score-informed Networks for Music Performance Assessment. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Montreal, 2020.
Pati, Ashis K; Gururani, Siddharth; Lerch, Alexander: dMelodies: A Music Dataset for Disentanglement Learning. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Montreal, 2020.
Hung, Yun-Ning; Lerch, Alexander: Multi-Task Learning for Instrument Activation Aware Music Source Separation. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Montreal, 2020.
Pati, Ashis K; Lerch, Alexander: Attribute-based Regularization for Latent Spaces of Variational Auto-Encoders. Journal Article. In: Neural Computing and Applications, 2020.
Manjunath, Tejas; Pawani, Jeet; Lerch, Alexander: Automatic Classification of Live and Studio Audio Recordings using Convolutional Neural Networks. Inproceedings. In: Proceedings of the Audio Engineering Society Convention, New York, 2020.
Yang, Li-Chia; Lerch, Alexander: Remixing Music with Visual Conditioning. Inproceedings. In: Proceedings of the IEEE International Symposium on Multimedia (ISM), Institute of Electrical and Electronics Engineers (IEEE), Naples, Italy, 2020.
Chen, Yihao; Lerch, Alexander: Melody-Conditioned Lyrics Generation with SeqGANs. Inproceedings. In: Proceedings of the IEEE International Symposium on Multimedia (ISM), Institute of Electrical and Electronics Engineers (IEEE), Naples, Italy, 2020.
Lerch, Alexander; Arthur, Claire; Pati, Ashis K; Gururani, Siddharth: An Interdisciplinary Review of Music Performance Analysis. Journal Article. In: Transactions of the International Society for Music Information Retrieval (TISMIR), vol. 3, no. 1, pp. 221–245, 2020.
Guan, Hongzhao; Lerch, Alexander: Learning Strategies for Voice Disorder Detection. Inproceedings. In: Proceedings of the International Conference on Semantic Computing (ICSC), 2019.
Swaminathan, Rupak Vignesh; Lerch, Alexander: Improving Singing Voice Separation using Attribute-Aware Deep Network. Inproceedings. In: Proceedings of the International Workshop on Multilayer Music Representation and Processing (MMRP), Milan, Italy, 2019.
Qin, Yi; Lerch, Alexander: Tuning Frequency Dependency in Music Classification. Inproceedings. In: Proceedings of the International Conference on Acoustics Speech and Signal Processing (ICASSP), Brighton, UK, 2019.
Gururani, Siddharth; Lerch, Alexander; Bretan, Mason: A Comparison of Music Input Domains for Self-Supervised Feature Learning. Inproceedings. In: ICML Machine Learning for Music Discovery Workshop (ML4MD), Extended Abstract, Long Beach, 2019.
Pati, Ashis; Lerch, Alexander: Latent Space Regularization for Explicit Control of Musical Attributes. Inproceedings. In: ICML Machine Learning for Music Discovery Workshop (ML4MD), Extended Abstract, Long Beach, 2019.
Xambo, Anna; Freeman, Jason; Lerch, Alexander: Music Information Retrieval in Live Coding: A Theoretical Framework. Journal Article. In: Computer Music Journal, vol. 42, no. 4, pp. 9–25, 2019.
Genchel, Benjamin; Pati, Ashis K; Lerch, Alexander: Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs. Inproceedings. In: Proceedings of the International Workshop on Musical Metacreation (MuMe), Charlotte, 2019.
Lerch, Alexander; Arthur, Claire; Pati, Ashis; Gururani, Siddharth: Music Performance Analysis: A Survey. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019.
Pati, Ashis; Lerch, Alexander; Hadjeres, Gaëtan: Learning to Traverse Latent Spaces for Musical Score Inpainting. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019.
Huang, Jiawen; Lerch, Alexander: Automatic Assessment of Sight-Reading Exercises. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019.
Gururani, Siddharth; Sharma, Mohit; Lerch, Alexander: An Attention Mechanism for Music Instrument Recognition. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019.
Guan, Hongzhao; Lerch, Alexander: Evaluation of Feature Learning Methods for Voice Disorder Detection. Journal Article. In: International Journal of Semantic Computing (IJSC), vol. 13, no. 4, pp. 453–470, 2019.
Wu, Chih-Wei; Lerch, Alexander: Learned Features for the Assessment of Percussive Music Performances. Inproceedings. In: Proceedings of the International Conference on Semantic Computing (ICSC), IEEE, Laguna Hills, 2018.
Lerch, Alexander: The Relation Between Music Technology and Music Industry. Incollection. In: Bader, Rolf (Ed.): Springer Handbook of Systematic Musicology, pp. 899–909, Springer, Berlin, Heidelberg, 2018, ISBN: 978-3-662-55002-1, 978-3-662-55004-5.
Pati, Kumar Ashis; Gururani, Siddharth; Lerch, Alexander: Assessment of Student Music Performances Using Deep Neural Networks. Journal Article. In: Applied Sciences, vol. 8, no. 4, pp. 507, 2018.
Wu, Chih-Wei; Dittmar, Christian; Southall, Carl; Vogl, Richard; Widmer, Gerhard; Hockman, Jason A; Muller, Meinard; Lerch, Alexander: A Review of Automatic Drum Transcription. Journal Article. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 9, pp. 1457–1483, 2018, ISSN: 2329-9290.
Xambo, Anna; Roma, Gerard; Lerch, Alexander; Barthet, Matthieu; Fazekas, Gyorgy: Live Repurposing of Sounds: MIR Explorations with Personal and Crowd-sourced Databases. Inproceedings. In: Proceedings of the Conference on New Interfaces for Musical Expression (NIME), Blacksburg, 2018.
Seipel, Fabian; Lerch, Alexander: Multi-Track Crosstalk Reduction. Inproceedings. In: Proceedings of the Audio Engineering Society Convention, Audio Engineering Society (AES), Milan, 2018.
Gururani, Siddharth; Summers, Cameron; Lerch, Alexander: Instrument Activity Detection in Polyphonic Music using Deep Neural Networks. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Paris, 2018.
Wu, Chih-Wei; Lerch, Alexander: From Labeled to Unlabeled Data -- On the Data Challenge in Automatic Drum Transcription. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Paris, 2018.
Subramanian, Vinod; Lerch, Alexander: Concert Stitch: Organization and Synchronization of Crowd-Sourced Recordings. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Paris, 2018.
Gururani, Siddharth; Pati, Kumar Ashis; Wu, Chih-Wei; Lerch, Alexander: Analysis of Objective Descriptors for Music Performance Assessment. Inproceedings. In: Proceedings of the International Conference on Music Perception and Cognition (ICMPC), Montreal, Canada, 2018.
Genchel, Benjamin; Lerch, Alexander: Lead Sheet Generation with Musically Interdependent Networks. Inproceedings. In: Late Breaking Abstract, Proceedings of Computer Simulation of Musical Creativity (CSMC), Dublin, 2018.
Wu, Chih-Wei; Lerch, Alexander: Assessment of Percussive Music Performances with Feature Learning. Journal Article. In: International Journal of Semantic Computing, vol. 12, no. 3, pp. 315–333, 2018, ISSN: 1793-351X.
Yang, Li-Chia; Lerch, Alexander: On the evaluation of generative models in music. Journal Article. In: Neural Computing and Applications, 2018, ISSN: 1433-3058.
Wu, Chih-Wei; Vinton, Mark: Blind Bandwidth Extension using K-Means and Support Vector Regression. Inproceedings. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, New Orleans, 2017.
Gururani, Siddharth; Lerch, Alexander: Automatic Sample Detection in Polyphonic Music. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Suzhou, 2017.
Gururani, Siddharth; Lerch, Alexander: Mixing Secrets: A multitrack dataset for instrument detection in polyphonic music. Inproceedings. In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Suzhou, 2017.
Pati, Kumar Ashis; Lerch, Alexander: A Dataset and Method for Electric Guitar Solo Detection in Rock Music. Inproceedings. In: Proceedings of the AES Conference on Semantic Audio, Audio Engineering Society (AES), Erlangen, 2017.
Southall, Carl; Wu, Chih-Wei; Lerch, Alexander; Hockman, Jason A: MDB Drums --- An Annotated Subset of MedleyDB for Automatic Drum Transcription. Inproceedings. In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Suzhou, 2017.
Vidwans, Amruta; Gururani, Siddharth; Wu, Chih-Wei; Subramanian, Vinod; Swaminathan, Rupak Vignesh; Lerch, Alexander: Objective descriptors for the assessment of student music performances. Inproceedings. In: Proceedings of the AES Conference on Semantic Audio, Audio Engineering Society (AES), Erlangen, 2017.
Wu, Chih-Wei; Lerch, Alexander: Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data. Inproceedings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Suzhou, 2017.
2022
@article{watcharasupat_latte_2022,
title = {Latte: Cross-framework Python Package for Evaluation of Latent-based Generative Models},
author = {Karn N Watcharasupat and Junyoung Lee and Alexander Lerch},
url = {https://linkinghub.elsevier.com/retrieve/pii/S2665963822000033},
doi = {10.1016/j.simpa.2022.100222},
issn = {2665-9638},
year = {2022},
date = {2022-01-01},
urldate = {2022-01-13},
journal = {Software Impacts},
pages = {100222},
abstract = {Latte (for LATent Tensor Evaluation) is a Python library for evaluation of latent-based generative models in the fields of disentanglement learning and controllable generation. Latte is compatible with both PyTorch and TensorFlow/Keras, and provides both functional and modular APIs that can be easily extended to support other deep learning frameworks. Using NumPy-based and framework-agnostic implementation, Latte ensures reproducible, consistent, and deterministic metric calculations regardless of the deep learning framework of choice.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
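Note: Latte computes such metrics in a framework-agnostic, NumPy-based way. As a hedged illustration of the kind of metric involved (this is not the Latte API; the function name and estimator choice are assumptions), the mutual information gap (MIG), a common disentanglement metric, can be sketched with NumPy and scikit-learn:

```python
# Illustrative sketch only, not the Latte API: mutual information gap (MIG)
# estimated with scikit-learn's mutual information estimator.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mig(z, y):
    """z: (n_samples, n_latent) latent codes; y: (n_samples, n_attrs) discrete attribute labels."""
    gaps = []
    for k in range(y.shape[1]):
        attr = y[:, k]
        mi = mutual_info_classif(z, attr, discrete_features=False)  # MI of each latent dim with the attribute
        _, counts = np.unique(attr, return_counts=True)
        p = counts / counts.sum()
        h = -(p * np.log(p)).sum()                                   # attribute entropy (nats)
        top = np.sort(mi)[::-1]
        gaps.append((top[0] - top[1]) / h)                           # gap between the two most informative dims
    return float(np.mean(gaps))
```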
@inproceedings{kalbag_scream_2022,
title = {Scream Detection in Heavy Metal Music},
author = {Vedant Kalbag and Alexander Lerch},
url = {http://arxiv.org/abs/2205.05580},
doi = {10.48550/arXiv.2205.05580},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the Sound and Music Computing Conference (SMC)},
address = {Saint-Etienne},
abstract = {Harsh vocal effects such as screams or growls are far more common in heavy metal vocals than the traditionally sung vocal. This paper explores the problem of detection and classification of extreme vocal techniques in heavy metal music, specifically the identification of different scream techniques. We investigate the suitability of various feature representations, including cepstral, spectral, and temporal features as input representations for classification. The main contributions of this work are (i) a manually annotated dataset comprised of over 280 minutes of heavy metal songs of various genres with a statistical analysis of occurrences of different extreme vocal techniques in heavy metal music, and (ii) a systematic study of different input feature representations for the classification of heavy metal vocals},
keywords = {Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{hung_feature-informed_2022,
title = {Feature-informed Embedding Space Regularization for Audio Classification},
author = {Yun-Ning Hung and Alexander Lerch},
url = {http://arxiv.org/abs/2206.04850},
doi = {10.48550/arXiv.2206.04850},
year = {2022},
date = {2022-01-01},
booktitle = {Proceedings of the European Signal Processing Conference (EUSIPCO)},
address = {Belgrade, Serbia},
abstract = {Feature representations derived from models pre-trained on large-scale datasets have shown their generalizability on a variety of audio analysis tasks. Despite this generalizability, however, task-specific features can outperform if sufficient training data is available, as specific task-relevant properties can be learned. Furthermore, the complex pre-trained models bring considerable computational burdens during inference. We propose to leverage both detailed task-specific features from spectrogram input and generic pre-trained features by introducing two regularization methods that integrate the information of both feature classes. The workload is kept low during inference as the pre-trained features are only necessary for training. In experiments with the pre-trained features VGGish, OpenL3, and a combination of both, we show that the proposed methods not only outperform baseline methods, but also can improve state-of-the-art models on several audio classification tasks. The results also suggest that using the mixture of features performs better than using individual features.},
keywords = {Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
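Note: as a hedged sketch of what such a regularizer could look like (all names, the linear projection, and the loss weighting are illustrative assumptions, not the paper's code), the embedding of the spectrogram classifier can be pulled toward a pre-trained feature such as VGGish during training only:

```python
# Illustrative feature-informed regularization term; not the paper's implementation.
import torch.nn.functional as F

def feature_informed_loss(logits, labels, embedding, pretrained_feat, projection, alpha=0.1):
    """projection: e.g. a torch.nn.Linear mapping the pre-trained feature to the embedding size."""
    cls_loss = F.cross_entropy(logits, labels)                       # task loss on the spectrogram branch
    reg_loss = F.mse_loss(embedding, projection(pretrained_feat))    # pull embedding toward pre-trained feature
    return cls_loss + alpha * reg_loss                               # pre-trained features not needed at inference
```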
2021
@inproceedings{watcharasupat_evaluation_2021,
title = {Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes},
author = {Karn N Watcharasupat and Alexander Lerch},
url = {http://arxiv.org/abs/2110.05587},
year = {2021},
date = {2021-10-01},
urldate = {2021-11-11},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Online},
abstract = {Controllable music generation with deep generative models has become increasingly reliant on disentanglement learning techniques. However, current disentanglement metrics, such as mutual information gap (MIG), are often inadequate and misleading when used for evaluating latent representations in the presence of interdependent semantic attributes often encountered in real-world music datasets. In this work, we propose a dependency-aware information metric as a drop-in replacement for MIG that accounts for the inherent relationship between semantic attributes.},
keywords = {Computer Science - Information Retrieval, Computer Science - Information Theory, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{li_adaptive_2021,
title = {An Adaptive Particle Swarm Optimizer with Decoupled Exploration and Exploitation for Large Scale Optimization},
author = {Dongyang Li and Lei Wang and Alexander Lerch and Qidi Wu},
url = {http://www.sciencedirect.com/science/article/pii/S2210650220304429},
doi = {10.1016/j.swevo.2020.100789},
issn = {2210-6502},
year = {2021},
date = {2021-01-01},
journal = {Swarm and Evolutionary Computation},
volume = {60},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{vinay_mind_2021,
title = {Mind the Beat: Detecting Audio Onsets from EEG Recordings of Music Listening},
author = {Ashvala Vinay and Alexander Lerch and Grace Leslie},
url = {https://arxiv.org/abs/2102.06393},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Toronto, Ontario, Canada},
abstract = {We propose a deep learning approach to predicting audio event onsets in electroencephalogram (EEG) recorded from users as they listen to music. We use a publicly available dataset containing ten contemporary songs and concurrently recorded EEG. We generate a sequence of onset labels for the songs in our dataset and trained neural networks (a fully connected network (FCN) and a recurrent neural network (RNN)) to parse one second windows of input EEG to predict one second windows of onsets in the audio. We compare our RNN network to both the standard spectral-flux based novelty function and the FCN. We find that our RNN was able to produce results that reflected its ability to generalize better than the other methods.
Since there are no pre-existing works on this topic, the numbers presented in this paper may serve as useful benchmarks for future approaches to this research problem.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
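Note: the spectral-flux novelty baseline mentioned in the abstract can be reproduced for comparison with librosa; a minimal sketch (default parameters are illustrative, not the paper's setup):

```python
# Minimal spectral-flux onset detection baseline using librosa.
import librosa

def audio_onsets(y, sr):
    novelty = librosa.onset.onset_strength(y=y, sr=sr)                  # spectral-flux novelty function
    frames = librosa.onset.onset_detect(onset_envelope=novelty, sr=sr)  # peak-pick onset frames
    return librosa.frames_to_time(frames, sr=sr)                        # onset times in seconds
```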
@inproceedings{seshadri_improving_2021,
title = {Improving Music Performance Assessment with Contrastive Learning},
author = {Pavan Seshadri and Alexander Lerch},
url = {https://arxiv.org/abs/2108.01711},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
pages = {8},
address = {Online},
abstract = {Several automatic approaches for objective music performance assessment (MPA) have been proposed in the past, however, existing systems are not yet capable of reliably predicting ratings with the same accuracy as professional judges. This study investigates contrastive learning as a potential method to improve existing MPA systems. Contrastive learning is a widely used technique in representation learning to learn a structured latent space capable of separately clustering multiple classes. It has been shown to produce state of the art results for image-based classification problems. We introduce a weighted contrastive loss suitable for regression tasks applied to a convolutional neural network and show that contrastive loss results in performance gains in regression tasks for MPA. Our results show that contrastive-based methods are able to match and exceed SoTA performance for MPA regression tasks by creating better class clusters within the latent space of the neural networks.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
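Note: the paper's exact loss is not reproduced here. As a hedged sketch of the idea, a contrastive loss can be weighted by the difference between continuous ratings so that performances with similar ratings are pulled together and dissimilar ones pushed apart:

```python
# Illustrative label-distance-weighted contrastive loss for regression; not the paper's formulation.
import torch
import torch.nn.functional as F

def weighted_contrastive_loss(emb, ratings, margin=1.0):
    """emb: (B, D) embeddings; ratings: (B,) performance ratings scaled to [0, 1]."""
    d_emb = torch.cdist(emb, emb)                              # pairwise embedding distances
    d_lab = torch.cdist(ratings[:, None], ratings[:, None])    # pairwise rating differences
    pull = (1.0 - d_lab) * d_emb.pow(2)                        # similar ratings: small embedding distance
    push = d_lab * F.relu(margin - d_emb).pow(2)               # dissimilar ratings: at least `margin` apart
    return (pull + push).mean()
```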
@inproceedings{pati_is_2021,
title = {Is Disentanglement Enough? On Latent Representations for Controllable Music Generation},
author = {Ashis Pati and Alexander Lerch},
url = {https://arxiv.org/abs/2108.01450},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
pages = {8},
address = {Online},
abstract = {Improving controllability or the ability to manipulate one or more attributes of the generated data has become a topic of interest in the context of deep generative models of music. Recent attempts in this direction have relied on learning disentangled representations from data such that the underlying factors of variation are well separated. In this paper, we focus on the relationship between disentanglement and controllability by conducting a systematic study using different supervised disentanglement learning algorithms based on the Variational Auto-Encoder (VAE) architecture. Our experiments show that a high degree of disentanglement can be achieved by using different forms of supervision to train a strong discriminative encoder. However, in the absence of a strong generative decoder, disentanglement does not necessarily imply controllability. The structure of the latent space with respect to the VAE-decoder plays an important role in boosting the ability of a generative model to manipulate different attributes. To this end, we also propose methods and metrics to help evaluate the quality of a latent space with respect to the afforded degree of controllability.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_semi-supervised_2021,
title = {Semi-Supervised Audio Classification with Partially Labeled Data},
author = {Siddharth Gururani and Alexander Lerch},
url = {https://arxiv.org/abs/2111.12761},
year = {2021},
date = {2021-01-01},
booktitle = {Proceedings of the IEEE International Symposium on Multimedia (ISM)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {online},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{lerch_machine_2021,
title = {Machine Learning Applied to Music/Audio Signal Processing},
author = {Alexander Lerch and Peter Knees},
url = {https://www.mdpi.com/2079-9292/10/24/3077},
doi = {10.3390/electronics10243077},
year = {2021},
date = {2021-01-01},
urldate = {2021-12-10},
journal = {Electronics},
volume = {10},
number = {24},
pages = {3077},
abstract = {Over the past two decades, the utilization of machine learning in audio and music signal processing has dramatically increased [...]},
keywords = {n/a},
pubstate = {published},
tppubtype = {article}
}
2020
@inproceedings{huang_score-informed_2020,
title = {Score-informed Networks for Music Performance Assessment},
author = {Jiawen Huang and Yun-Ning Hung and Ashis K Pati and Siddharth Gururani and Alexander Lerch},
url = {https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2020/08/Huang-et-al.-2020-Score-informed-Networks-for-Music-Performance-Asse.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Montreal},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{pati_dmelodies_2020,
title = {dMelodies: A Music Dataset for Disentanglement Learning},
author = {Ashis K Pati and Siddharth Gururani and Alexander Lerch},
url = {https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2020/08/Pati-et-al.-2020-dMelodies-A-Music-Dataset-for-Disentanglement-Lea.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Montreal},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{hung_multi-task_2020,
title = {Multi-Task Learning for Instrument Activation Aware Music Source Separation},
author = {Yun-Ning Hung and Alexander Lerch},
url = {https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2020/08/Hung-and-Lerch-2020-Multi-Task-Learning-for-Instrument-Activation-Awar.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Montreal},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{pati_attribute-based_2020,
title = {Attribute-based Regularization for Latent Spaces of Variational Auto-Encoders},
author = {Ashis K Pati and Alexander Lerch},
url = {https://arxiv.org/pdf/2004.05485},
doi = {10.1007/s00521-020-05270-2},
year = {2020},
date = {2020-01-01},
journal = {Neural Computing and Applications},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{manjunath_automatic_2020,
title = {Automatic Classification of Live and Studio Audio Recordings using Convolutional Neural Networks},
author = {Tejas Manjunath and Jeet Pawani and Alexander Lerch},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the Audio Engineering Society Convention},
address = {New York},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{yang_remixing_2020,
title = {Remixing Music with Visual Conditioning},
author = {Li-Chia Yang and Alexander Lerch},
url = {https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2020/10/Yang-and-Lerch-2020-Remixing-Music-with-Visual-Conditioning.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the IEEE International Symposium on Multimedia (ISM)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Naples, Italy},
abstract = {We propose a visually conditioned music remixing system by incorporating deep visual and audio models. The method is based on a state of the art audio-visual source separation model which performs music instrument source separation with video information. We modified the model to work with user-selected images instead of videos as visual input during inference to enable separation of audio-only content. Furthermore, we propose a remixing engine that generalizes the task of source separation into music remixing. The proposed method is able to achieve improved audio quality compared to remixing performed by the separate-and-add method with a state-of-the-art audio-visual source separation model.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{chen_melody-conditioned_2020,
title = {Melody-Conditioned Lyrics Generation with SeqGANs},
author = {Yihao Chen and Alexander Lerch},
url = {https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2020/10/Chen-and-Lerch-2020-Melody-Conditioned-Lyrics-Generation-with-SeqGANs.pdf},
year = {2020},
date = {2020-01-01},
booktitle = {Proceedings of the IEEE International Symposium on Multimedia (ISM)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {Naples, Italy},
abstract = {Automatic lyrics generation has received attention from both music and AI communities for years. Early rule-based approaches have---due to increases in computational power and evolution in data-driven models---mostly been replaced with deep-learning-based systems. Many existing approaches, however, either rely heavily on prior knowledge in music and lyrics writing or oversimplify the task by largely discarding melodic information and its relationship with the text. We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN), which generates a line of lyrics given the corresponding melody as the input. Furthermore, we investigate the performance of the generator with an additional input condition: the theme or overarching topic of the lyrics to be generated. We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{lerch_interdisciplinary_2020,
title = {An Interdisciplinary Review of Music Performance Analysis},
author = {Alexander Lerch and Claire Arthur and Ashis K Pati and Siddharth Gururani},
url = {https://transactions.ismir.net/articles/10.5334/tismir.53},
doi = {10.5334/tismir.53},
year = {2020},
date = {2020-01-01},
journal = {Transactions of the International Society for Music Information Retrieval (TISMIR)},
volume = {3},
number = {1},
pages = {221--245},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2019
@inproceedings{guan_learning_2019,
title = {Learning Strategies for Voice Disorder Detection},
author = {Hongzhao Guan and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/Guan-and-Lerch-2019-Learning-Strategies-for-Voice-Disorder-Detection.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Conference on Semantic Computing (ICSC)},
abstract = {Voice disorder is a health issue that is frequently encountered, however, many patients either cannot afford to visit a professional doctor or neglect to take good care of their voice. In order to give a patient a preliminary diagnosis without using professional medical devices, previous research has shown that the detection of voice disorders can be carried out by utilizing machine learning and acoustic features extracted from voice recordings. Considering the increasing popularity of deep learning and feature learning, this study explores the possibilities of using these methods to assign voice recordings into one of the two classes\textemdashNormal and Pathological. While the results show the general viability of deep learning and feature learning for the automatic recognition of voice disorder, they also demonstrate the shortcomings of the existing datasets for this task such as insufficient dataset size and lack of generality.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{swaminathan_improving_2019,
title = {Improving Singing Voice Separation using Attribute-Aware Deep Network},
author = {Rupak Vignesh Swaminathan and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/Swaminathan-and-Lerch-2019-Improving-Singing-Voice-Separation-using-Attribute.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Workshop on Multilayer Music Representation and Processing (MMRP)},
address = {Milan, Italy},
abstract = {Singing Voice Separation (SVS) attempts to separate the predominant singing voice from a polyphonic musical mixture. In this paper, we investigate the effect of introducing attribute-specific information, namely, the frame level vocal activity information as an augmented feature input to a Deep Neural Network performing the separation. Our study considers two types of inputs, i.e, a ground-truth based ‘oracle’ input and labels extracted by a state-of-the-art model for singing voice activity detection in polyphonic music. We show that the separation network informed of vocal activity learns to differentiate between vocal and non-vocal regions. Such a network thus reduces interference and artifacts better compared to the network agnostic to this side information. Results on the MIR1K dataset show that informing the separation network of vocal activity improves the separation results consistently across all the measures used to evaluate the separation quality.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{qin_tuning_2019,
title = {Tuning Frequency Dependency in Music Classification},
author = {Yi Qin and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/04/Qin-and-Lerch-2019-Tuning-Frequency-Dependency-in-Music-Classificatio.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Conference on Acoustics Speech and Signal Processing (ICASSP)},
address = {Brighton, UK},
abstract = {Deep architectures have become ubiquitous in Music Information Retrieval (MIR) tasks, however, concurrent studies still lack a deep understanding of the input properties being evaluated by the networks. In this study, we show by the example of a Music Genre Classification system the potential dependency on the tuning frequency, an irrelevant and confounding variable. We generate adversarial samples through pitch-shifting the audio data and investigate the classification accuracy of the output depending on the pitch shift. We find the accuracy to be periodic with a period of one semitone, indicating that the system is utilizing tuning information. We show that proper data augmentation including pitch-shifts smaller than one semitone helps minimizing this problem and point out the need for carefully designed augmentation procedures in related MIR tasks.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
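Note: the augmentation suggested above (pitch shifts smaller than one semitone) can be realized, for instance, with librosa; the detuning range and number of variants below are illustrative assumptions:

```python
# Sub-semitone pitch-shift augmentation sketch using librosa.
import numpy as np
import librosa

def augment_with_detuning(y, sr, max_cents=50, n_variants=4):
    shifts = np.random.uniform(-max_cents / 100.0, max_cents / 100.0, size=n_variants)  # in semitones
    return [librosa.effects.pitch_shift(y, sr=sr, n_steps=float(s)) for s in shifts]
```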
@inproceedings{gururani_comparison_2019,
title = {A Comparison of Music Input Domains for Self-Supervised Feature Learning},
author = {Siddharth Gururani and Alexander Lerch and Mason Bretan},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/06/Gururani-et-al.-A-Comparison-of-Music-Input-Domains-for-Self-Super.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {ICML Machine Learning for Music Discovery Workshop (ML4MD), Extended Abstract},
address = {Long Beach},
abstract = {In music using neural networks to learn effective feature spaces, or embeddings, that capture useful characteristics has been demonstrated in the symbolic and audio domains. In this work, we compare the symbolic and audio domains, attempting to identify the benefits of each, and whether incorporating both of the representations during learning has utility. We use a self-supervising siamese network to learn a low-dimensional representation of three second music clips and evaluate the learned features on their ability to perform a variety of music tasks. We use a polyphonic piano performance dataset and directly compare the performance on these tasks with embeddings derived from synthesized audio and the corresponding symbolic representations.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{pati_latent_2019,
title = {Latent Space Regularization for Explicit Control of Musical Attributes},
author = {Ashis Pati and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/06/Pati-and-Lerch-Latent-Space-Regularization-for-Explicit-Control-o.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {ICML Machine Learning for Music Discovery Workshop (ML4MD), Extended Abstract},
address = {Long Beach},
abstract = {Deep generative models for music are often restrictive since they do not allow users any meaningful control over the generated music. To address this issue, we propose a novel latent space regularization technique which is capable of structuring the latent space of a deep generative model by encoding musically meaningful attributes along specific dimensions of the latent space. This, in turn, can provide users with explicit control over these attributes during inference and thereby, help design intuitive musical interfaces to enhance creative workflows.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
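Note: a hedged sketch of such an attribute regularizer (following the general idea above; the tanh formulation and the hyperparameter are assumptions, not necessarily the authors' exact loss) encourages one chosen latent dimension to order itself like the attribute:

```python
# Illustrative attribute-distance regularizer for one latent dimension.
import torch
import torch.nn.functional as F

def attribute_regularization(z, attribute, dim, delta=10.0):
    """z: (B, D) latent codes; attribute: (B,) attribute values; dim: regularized latent dimension."""
    lat_diff = z[:, dim][:, None] - z[:, dim][None, :]       # pairwise latent differences
    attr_diff = attribute[:, None] - attribute[None, :]      # pairwise attribute differences
    return F.l1_loss(torch.tanh(delta * lat_diff), torch.sign(attr_diff))  # match the two orderings
```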
@article{xambo_music_2019,
title = {Music Information Retrieval in Live Coding: A Theoretical Framework},
author = {Anna Xambo and Jason Freeman and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/06/Xambo-et-al.-2019-Music-Information-Retrieval-in-Live-Coding-A-Theo.pdf},
doi = {10.1162/comj_a_00484},
year = {2019},
date = {2019-01-01},
journal = {Computer Music Journal},
volume = {42},
number = {4},
pages = {9--25},
abstract = {Music information retrieval (MIR) has a great potential in musical live coding because it can help the musician\textendashprogrammer to make musical decisions based on audio content analysis and explore new sonorities by means of MIR techniques. The use of real-time MIR techniques can be computationally demanding and thus they have been rarely used in live coding; when they have been used, it has been with a focus on low-level feature extraction. This article surveys and discusses the potential of MIR applied to live coding at a higher musical level. We propose a conceptual framework of three categories: (1) audio repurposing, (2) audio rewiring, and (3) audio remixing. We explored the three categories in live performance through an application programming interface library written in SuperCollider, MIRLC. We found that it is still a technical challenge to use high-level features in real time, yet using rhythmic and tonal properties (midlevel features) in combination with text-based information (e.g., tags) helps to achieve a closer perceptual level centered on pitch and rhythm when using MIR in live coding. We discuss challenges and future directions of utilizing MIR approaches in the computer music field.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{genchel_explicitly_2019,
title = {Explicitly Conditioned Melody Generation: A Case Study with Interdependent RNNs},
author = {Benjamin Genchel and Ashis K Pati and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/06/Genchel-et-al.-2019-Explicitly-Conditioned-Melody-Generation-A-Case-S.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Workshop on Musical Metacreation (MuMe)},
address = {Charlotte},
abstract = {Deep generative models for symbolic music are typically designed to model temporal dependencies in music so as to predict the next musical event given previous events. In many cases, such models are expected to learn abstract concepts such as harmony, meter, and rhythm from raw musical data without any additional information. In this study, we investigate the effects of explicitly conditioning deep generative models with musically relevant information. Specifically, we study the effects of four different conditioning inputs on the performance of a recurrent monophonic melody generation model. Several combinations of these conditioning inputs are used to train different model variants which are then evaluated using three objective evaluation paradigms across two genres of music. The results indicate musically relevant conditioning significantly improves learning and performance, and reveal how this information affects learning of musical features related to pitch and rhythm. An informal subjective evaluation suggests a corresponding improvement in the aesthetic quality of generations.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{lerch_music_2019,
title = {Music Performance Analysis: A Survey},
author = {Alexander Lerch and Claire Arthur and Ashis Pati and Siddharth Gururani},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/06/Lerch-et-al.-2019-Music-Performance-Analysis-A-Survey.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Delft},
abstract = {Music Information Retrieval (MIR) tends to focus on the analysis of audio signals. Often, a single music recording is used as representative of a "song" even though different performances of the same song may reveal different properties. A performance is distinct in many ways from a (arguably more abstract) representation of a "song," "piece," or musical score. The characteristics of the (recorded) performance -as opposed to the score or musical idea- can have a major impact on how a listener perceives music. The analysis of music performance, however, has been traditionally only a peripheral topic for the MIR research community. This paper surveys the field of Music Performance Analysis (MPA) from various perspectives, discusses its significance to the field of MIR, and points out opportunities for future research in this field.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{pati_learning_2019,
title = {Learning to Traverse Latent Spaces for Musical Score Inpainting},
author = {Ashis Pati and Alexander Lerch and Ga\"{e}tan Hadjeres},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/06/Pati-et-al.-2019-Learning-to-Traverse-Latent-Spaces-for-Musical-Sco.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Delft},
abstract = {Music Inpainting is the task of filling in missing or lost information in a piece of music. We investigate this task from an interactive music creation perspective. To this end, a novel deep learning-based approach for musical score inpainting is proposed. The designed model takes both past and future musical context into account and is capable of suggesting ways to connect them in a musically meaningful manner. To achieve this, we leverage the representational power of the latent space of a Variational Auto-Encoder and train a Recurrent Neural Network which learns to traverse this latent space conditioned on the past and future musical contexts. Consequently, the designed model is capable of generating several measures of music to connect two musical excerpts. The capabilities and performance of the model are showcased by comparison with competitive baselines using several objective and subjective evaluation methods. The results show that the model generates meaningful inpaintings and can be used in interactive music creation applications. Overall, the method demonstrates the merit of learning complex trajectories in the latent spaces of deep generative models.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{huang_automatic_2019,
title = {Automatic Assessment of Sight-Reading Exercises},
author = {Jiawen Huang and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/07/Huang-and-Lerch-2019-Automatic-Assessment-of-Sight-Reading-Exercises.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Delft},
abstract = {Sight-reading requires a musician to decode, process, and perform a musical score quasi-instantaneously and without rehearsal. Due to the complexity of this task, it is difficult to assess the proficiency of a sight-reading performance, and it is even more challenging to model its human assessment. This study aims at evaluating and identifying effective features for automatic assessment of sight-reading performance. The evaluated set of features comprises task-specific, hand-crafted, and interpretable features designed to represent various aspect of sight-reading performance covering parameters such as intonation, timing, dynamics, and score continuity. The most relevant features are identified by Principal Component Analysis and forward feature selection. For context, the same features are also applied to the assessment of rehearsed student music performances and compared across different assessment categories. The results show potential of automatic assessment models for sight-reading and the relevancy of different features as well as the contribution of different feature groups to different assessment categories.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
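Note: the forward feature selection step mentioned above can be realized with scikit-learn; a minimal sketch (the wrapped regressor and number of features are illustrative assumptions):

```python
# Forward feature selection sketch over hand-crafted descriptors with scikit-learn.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

def select_descriptors(X, y, k=10):
    sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=k, direction="forward")
    sfs.fit(X, y)
    return sfs.get_support(indices=True)   # indices of the selected descriptors
```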
@inproceedings{gururani_attention_2019,
title = {An Attention Mechanism for Music Instrument Recognition},
author = {Siddharth Gururani and Mohit Sharma and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/07/Gururani-et-al.-2019-An-Attention-Mechanism-for-Music-Instrument-Recogn.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Delft},
abstract = {While the automatic recognition of musical instruments has seen significant progress, the task is still considered hard for music featuring multiple instruments as opposed to single instrument recordings. Datasets for polyphonic instrument recognition can be categorized into roughly two categories. Some, such as MedleyDB, have strong per-frame instrument activity annotations but are usually small in size. Other, larger datasets such as OpenMIC only have weak labels, i.e., instrument presence or absence is annotated only for long snippets of a song. We explore an attention mechanism for handling weakly labeled data for multi-label instrument recognition. Attention has been found to perform well for other tasks with weakly labeled data. We compare the proposed attention model to multiple models which include a baseline binary relevance random forest, recurrent neural network, and fully connected neural networks. Our results show that incorporating attention leads to an overall improvement in classification accuracy metrics across all 20 instruments in the OpenMIC dataset. We find that attention enables models to focus on (or ‘attend to’) specific time segments in the audio relevant to each instrument label leading to interpretable results.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
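Note: a minimal sketch of attention-based pooling over time segments for weakly labeled, multi-label instrument recognition (layer sizes and the exact aggregation are illustrative, not the paper's architecture):

```python
# Softmax attention pooling over segment embeddings for multi-label tagging.
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.attn = nn.Linear(feat_dim, n_classes)   # per-class attention scores per segment
        self.clf = nn.Linear(feat_dim, n_classes)    # per-segment class predictions

    def forward(self, x):                            # x: (B, T, feat_dim) segment embeddings
        w = torch.softmax(self.attn(x), dim=1)       # attention weights over the T segments
        p = torch.sigmoid(self.clf(x))               # per-segment instrument probabilities
        return (w * p).sum(dim=1)                    # (B, n_classes) clip-level predictions
```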
@article{guan_evaluation_2018,
title = {Evaluation of Feature Learning Methods for Voice Disorder Detection},
author = {Hongzhao Guan and Alexander Lerch},
doi = {10.1142/S1793351X19400191},
year = {2019},
date = {2019-01-01},
journal = {International Journal of Semantic Computing (IJSC)},
volume = {13},
number = {4},
pages = {453--470},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2018
@inproceedings{wu_learned_2018,
title = {Learned Features for the Assessment of Percussive Music Performances},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/01/Wu_Lerch_2018_Learned-Features-for-the-Assessment-of-Percussive-Music-Performances.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the International Conference on Semantic Computing (ICSC)},
publisher = {IEEE},
address = {Laguna Hills},
keywords = {audio, feature learning, music performance analysis, percussion},
pubstate = {published},
tppubtype = {inproceedings}
}
@incollection{lerch_relation_2018,
title = {The Relation Between Music Technology and Music Industry},
author = {Alexander Lerch},
editor = {Rolf Bader},
url = {https://link.springer.com/chapter/10.1007/978-3-662-55004-5_44},
doi = {10.1007/978-3-662-55004-5_44},
isbn = {978-3-662-55002-1 978-3-662-55004-5},
year = {2018},
date = {2018-01-01},
urldate = {2018-03-26},
booktitle = {Springer Handbook of Systematic Musicology},
pages = {899--909},
publisher = {Springer, Berlin, Heidelberg},
series = {Springer Handbooks},
abstract = {The music industry has changed drastically over the last century and most of its changes and transformations have been technology-driven. Music technology \textendash encompassing musical instruments, sound generators, studio equipment and software, perceptual audio coding algorithms, and reproduction software and devices \textendash has shaped the way music is produced, performed, distributed, and consumed. The evolution of music technology enabled studios and hobbyist producers to produce music at a technical quality unthinkable decades ago and have affordable access to new effects as well as production techniques. Artists explore nontraditional ways of sound generation and sound modification to create previously unheard effects, soundscapes, or even to conceive new musical styles. The consumer has immediate access to a vast diversity of songs and styles and is able to listen to individualized playlists virtually everywhere and at any time. The most disruptive technological innovations during the past 130 years have probably been:1. The possibility to record and distribute recordings on a large scale through the gramophone. 2. The introduction of vinyl disks enabling high-quality sound reproduction. 3. The compact cassette enabling individualized playlists, music sharing with friends and mobile listening. 4. Digital audio technology enabling high quality professional-grade studio equipment at low prices. 5. Perceptual audio coding in combination with online distribution, streaming, and file sharing. This text will describe these technological innovations and their impact on artists, engineers, and listeners.},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
@article{pati_assessment_2018,
title = {Assessment of Student Music Performances Using Deep Neural Networks},
author = {Kumar Ashis Pati and Siddharth Gururani and Alexander Lerch},
url = {http://www.mdpi.com/2076-3417/8/4/507/pdf},
doi = {10.3390/app8040507},
year = {2018},
date = {2018-01-01},
urldate = {2018-03-27},
journal = {Applied Sciences},
volume = {8},
number = {4},
pages = {507},
abstract = {Music performance assessment is a highly subjective task often relying on experts to gauge both the technical and aesthetic aspects of the performance from the audio signal. This article explores the task of building computational models for music performance assessment, i.e., analyzing an audio recording of a performance and rating it along several criteria such as musicality, note accuracy, etc. Much of the earlier work in this area has been centered around using hand-crafted features intended to capture relevant aspects of a performance. However, such features are based on our limited understanding of music perception and may not be optimal. In this article, we propose using Deep Neural Networks (DNNs) for the task and compare their performance against a baseline model using standard and hand-crafted features. We show that, using input representations at different levels of abstraction, DNNs can outperform the baseline models across all assessment criteria. In addition, we use model analysis techniques to further explain the model predictions in an attempt to gain useful insights into the assessment process. The results demonstrate the potential of using supervised feature learning techniques to better characterize music performances.},
keywords = {deep learning, deep neural networks, DNN, MIR, music education, music informatics, music information retrieval, music learning, music performance assessment},
pubstate = {published},
tppubtype = {article}
}
@article{wu_review_2018,
title = {A Review of Automatic Drum Transcription},
author = {Chih-Wei Wu and Christian Dittmar and Carl Southall and Richard Vogl and Gerhard Widmer and Jason A Hockman and Meinard Muller and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/05/Wu-et-al.-2018-A-review-of-automatic-drum-transcription.pdf},
doi = {10.1109/TASLP.2018.2830113},
issn = {2329-9290},
year = {2018},
date = {2018-01-01},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume = {26},
number = {9},
pages = {1457--1483},
abstract = {In Western popular music, drums and percussion are an important means to emphasize and shape the rhythm, often defining the musical style. If computers were able to analyze the drum part in recorded music, it would enable a variety of rhythm-related music processing tasks. Especially the detection and classification of drum sound events by computational methods is considered to be an important and challenging research problem in the broader field of Music Information Retrieval. Over the last two decades, several authors have attempted to tackle this problem under the umbrella term Automatic Drum Transcription (ADT). This paper presents a comprehensive review of ADT research, including a thorough discussion of the task-specific challenges, categorization of existing techniques, and evaluation of several state-of-the-art systems. To provide more insights on the practice of ADT systems, we focus on two families of ADT techniques, namely methods based on Non-negative Matrix Factorization and Recurrent Neural Networks. We explain the methods' technical details and drum-specific variations and evaluate these approaches on publicly available datasets with a consistent experimental setup. Finally, the open issues and under-explored areas in ADT research are identified and discussed, providing future directions in this field.},
keywords = {Automatic Music Transcription, deep learning, Instruments, Machine Learning, Matrix Factorization, Rhythm, Spectrogram, Speech processing, Task analysis, Transient analysis},
pubstate = {published},
tppubtype = {article}
}
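Note: for the NMF family of ADT methods discussed in the review, the core computation is estimating drum activations against fixed templates; a hedged sketch (templates are assumed to be pre-learned from isolated drum hits):

```python
# Template-based NMF activation estimation with multiplicative (KL) updates.
import numpy as np

def nmf_activations(V, W, n_iter=100, eps=1e-10):
    """V: (n_bins, n_frames) magnitude spectrogram; W: (n_bins, n_drums) fixed drum templates."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ np.ones_like(V) + eps)  # KL multiplicative update for H
    return H  # onset candidates are peaks in each row of H
```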
@inproceedings{xambo_live_2018,
title = {Live Repurposing of Sounds: MIR Explorations with Personal and Crowd-sourced Databases},
author = {Anna Xambo and Gerard Roma and Alexander Lerch and Matthieu Barthet and Gyorgy Fazekas},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/04/Xambo-et-al.-2018-Live-Repurposing-of-Sounds-MIR-Explorations-with-.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the Conference on New Interfaces for Musical Expression (NIME)},
address = {Blacksburg},
abstract = {The recent increase in the accessibility and size of personal and crowd-sourced digital sound collections brought about a valuable resource for music creation. Finding and retrieving relevant sounds in performance leads to challenges that can be approached using music information retrieval (MIR). In this paper, we explore the use of MIR to retrieve and repurpose sounds in musical live coding. We present a live coding system built on SuperCollider enabling the use of audio content from Creative Commons (CC) sound databases such as Freesound or personal sound databases. The novelty of our approach lies in exploiting high-level MIR methods (e.g. query by pitch or rhythmic cues) using live coding techniques applied to sounds. We demonstrate its potential through the reflection of an illustrative case study and the feedback from four expert users. The users tried the system with either a personal database or a crowd-sourced database and reported its potential in facilitating tailorability of the tool to their own creative workflows. This approach to live repurposing of sounds can be applied to real-time interactive systems for performance and composition beyond live coding, as well as inform live coding and MIR research.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{seipel_multi-track_2018,
title = {Multi-Track Crosstalk Reduction},
author = {Fabian Seipel and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Seipel-and-Lerch-2018-Multi-Track-Crosstalk-Reduction.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the Audio Engineering Society Convention},
publisher = {Audio Engineering Society (AES)},
address = {Milan},
abstract = {While many music-related blind source separation methods focus on mono or stereo material, the detection and reduction of crosstalk in multi-track recordings is less researched. Crosstalk or 'bleed' of one recorded channel in another is a very common phenomenon in specific genres such as jazz and classical, where all instrumentalists are recorded simultaneously. We present an efficient algorithm that estimates the crosstalk amount in the spectral domain and applies spectral subtraction to remove it. Randomly generated artificial mixtures from various anechoic orchestral source material were employed to develop and evaluate the algorithm, which scores an average SIR-Gain result of 15.14dB on various datasets with different amounts of simulated crosstalk.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_instrument_2018,
title = {Instrument Activity Detection in Polyphonic Music using Deep Neural Networks},
author = {Siddharth Gururani and Cameron Summers and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Gururani-et-al.-Instrument-Activity-Detection-in-Polyphonic-Music-.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Paris},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{wu_labeled_2018,
title = {From Labeled to Unlabeled Data -- On the Data Challenge in Automatic Drum Transcription},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Wu-and-Lerch-From-Labeled-to-Unlabeled-Data-On-the-Data-Chal.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Paris},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{subramanian_concert_2018,
title = {Concert Stitch: Organization and Synchronization of Crowd-Sourced Recordings},
author = {Vinod Subramanian and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Subramanian-and-Lerch-Concert-Stitch-Organization-and-Synchromization-o.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Paris},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_analysis_2018,
title = {Analysis of Objective Descriptors for Music Performance Assessment},
author = {Siddharth Gururani and Kumar Ashis Pati and Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Gururani-et-al.-2018-Analysis-of-Objective-Descriptors-for-Music-Perfor.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the International Conference on Music Perception and Cognition (ICMPC)},
address = {Montreal, Canada},
abstract = {The assessment of musical performances in, e.g., student competitions or auditions, is a largely subjective evaluation of a performer's technical skills and expressivity. Objective descriptors extracted from the audio signal have been proposed for automatic performance assessment in such a context. Such descriptors represent different aspects of pitch, dynamics and timing of a performance and have been shown to be reasonably successful in modeling human assessments of student performances through regression. This study aims to identify the influence of individual descriptors on models of human assessment in 4 categories: musicality, note accuracy, rhythmic accuracy, and tone quality. To evaluate the influence of the individual descriptors, the descriptors highly correlated with the human assessments are identified. Subsequently, various subsets are chosen using different selection criteria and the adjusted R-squared metric is computed to evaluate the degree to which these subsets explain the variance in the assessments. In addition, sequential forward selection is performed to identify the most meaningful descriptors. The goal of this study is to gain insights into which objective descriptors contribute most to the human assessments as well as to identify a subset of well-performing descriptors. The results indicate that a small subset of the designed descriptors can perform at a similar accuracy as the full set of descriptors. Sequential forward selection shows how around 33% of the descriptors do not add new information to the linear regression models, pointing towards redundancy in the descriptors.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{genchel_lead_2018,
title = {Lead Sheet Generation with Musically Interdependent Networks},
author = {Benjamin Genchel and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/08/Genchel-and-Lerch-2018-Lead-Sheet-Generation-with-Musically-Interdependen.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Late Breaking Abstract, Proceedings of Computer Simulation of Musical Creativity (CSMC)},
address = {Dublin},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{wu_assessment_2018,
title = {Assessment of Percussive Music Performances with Feature Learning},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/09/ws-ijsc_cw_submission.pdf},
doi = {10.1142/S1793351X18400147},
issn = {1793-351X},
year = {2018},
date = {2018-01-01},
journal = {International Journal of Semantic Computing},
volume = {12},
number = {3},
pages = {315--333},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@article{yang_evaluation_2018,
title = {On the evaluation of generative models in music},
author = {Li-Chia Yang and Alexander Lerch},
url = {https://rdcu.be/baHuU http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/11/postprint.pdf},
doi = {10.1007/s00521-018-3849-7},
issn = {1433-3058},
year = {2018},
date = {2018-01-01},
urldate = {2018-11-04},
journal = {Neural Computing and Applications},
abstract = {The modeling of artificial, human-level creativity is becoming more and more achievable. In recent years, neural networks have been successfully applied to different tasks such as image and music generation, demonstrating their great potential in realizing computational creativity. The fuzzy definition of creativity combined with varying goals of the evaluated generative systems, however, makes subjective evaluation seem to be the only viable methodology of choice. We review the evaluation of generative music systems and discuss the inherent challenges of their evaluation. Although subjective evaluation should always be the ultimate choice for the evaluation of creative results, researchers unfamiliar with rigorous subjective experiment design and without the necessary resources for the execution of a large-scale experiment face challenges in terms of reliability, validity, and replicability of the results. In numerous studies, this leads to the report of insignificant and possibly irrelevant results and the lack of comparability with similar and previous generative systems. Therefore, we propose a set of simple musically informed objective metrics enabling an objective and reproducible way of evaluating and comparing the output of music generative systems. We demonstrate the usefulness of the proposed metrics with several experiments on real-world data.},
keywords = {Computational creativity, Music generation, Objective evaluation},
pubstate = {published},
tppubtype = {article}
}
2017
@inproceedings{wu_blind_2017,
title = {Blind Bandwidth Extension using K-Means and Support Vector Regression},
author = {Chih-Wei Wu and Mark Vinton},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/04/Wu-and-Vinton-2017-Blind-Bandwidth-Extension-using-K-Means-and-Suppor.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
publisher = {IEEE},
address = {New Orleans},
abstract = {In this paper, a blind bandwidth extension algorithm for music signals has been proposed. This method applies the K-means algorithm to first cluster audio data in the feature space, and constructs multiple envelope predictors for each cluster accordingly using Support Vector Regression (SVR). A set of well-established audio features for Music Information Retrieval (MIR) has been used to characterize the audio content. The resulting system is applied to a variety of music signals without any side information provided. The subjective listening test results show that this method can improve the perceptual quality successfully, but the minor artifacts still leave room for future improvements.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_automatic_2017,
title = {Automatic Sample Detection in Polyphonic Music},
author = {Siddharth Gururani and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/07/Gururani_Lerch_2017_Automatic-Sample-Detection-in-Polyphonic-Music.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Suzhou},
abstract = {The term `sampling' refers to the usage of snippets or loops from existing songs or sample libraries in new songs, mashups, or other music productions. The ability to automatically detect sampling in music is, for instance, beneficial for studies tracking artist influences geographically and temporally. We present a method based on Non-negative Matrix Factorization (NMF) and Dynamic Time Warping (DTW) for the automatic detection of a sample in a pool of songs. The method comprises two processing steps: first, the DTW alignment path between the NMF activations of a song and a query sample is computed. Second, features are extracted from this path and used to train a Random Forest classifier to detect the presence of the sample. The method is able to identify samples that are pitch shifted and/or time stretched with approximately 63% F-measure. We evaluate this method against a new publicly available dataset of real-world sample and song pairs.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_mixing_2017,
title = {Mixing Secrets: A multitrack dataset for instrument detection in polyphonic music},
author = {Siddharth Gururani and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/10/Gururani_Lerch_2017_Mixing-Secrets.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Suzhou},
abstract = {Instrument recognition as a task in MIR is largely data driven. This creates a need for large datasets that cater to the needs of these algorithms. Several datasets exist for the task of instrument recognition in monophonic signals. For polyphonic music, creating a finely labeled dataset for instrument recognition is a hard task, and using multi-track data eases that process. We present 250+ multi-tracks that have been labeled for instrument recognition and release the annotations to be used in the community. The process of data acquisition, cleaning, and labeling is detailed in this late-breaking demo.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{pati_dataset_2017,
title = {A Dataset and Method for Electric Guitar Solo Detection in Rock Music},
author = {Kumar Ashis Pati and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/06/Pati_Lerch_2017_A-Dataset-and-Method-for-Electric-Guitar-Solo-Detection-in-Rock-Music.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the AES Conference on Semantic Audio},
publisher = {Audio Engineering Society (AES)},
address = {Erlangen},
abstract = {This paper explores the problem of automatically detecting electric guitar solos in rock music. A baseline study using standard spectral and temporal audio features in conjunction with an SVM classifier is carried out. To improve detection rates, custom features based on predominant pitch and structural segmentation of songs are designed and investigated. The evaluation of different feature combinations suggests that the combination of all features followed by a post-processing step results in the best accuracy. A macro-accuracy of 78.6% with a solo detection precision of 63.3% is observed for the best feature combination. This publication is accompanied by release of an annotated dataset of electric guitar solos to encourage future research in this area.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{southall_mdb_2017,
title = {MDB Drums --- An Annotated Subset of MedleyDB for Automatic Drum Transcription},
author = {Carl Southall and Chih-Wei Wu and Alexander Lerch and Jason A Hockman},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/10/Wu-et-al_2017_MDB-Drums-An-Annotated-Subset-of-MedleyDB-for-Automatic-Drum-Transcription.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Suzhou},
abstract = {In this paper we present MDB Drums, a new dataset for automatic drum transcription (ADT) tasks. This dataset is built on top of the MusicDelta subset of the MedleyDB dataset, taking advantage of real-world recordings in multi-track format. The dataset is comprised of a variety of genres, providing a balanced pool for developing and evaluating ADT models with respect to various musical styles. To reduce the cost of the labor-intensive process of manual annotation, a semi-automatic process was utilised in both the annotation and quality control processes. The presented dataset consists of 23 tracks with a total of 7994 onsets. These onsets are divided into 6 classes based on drum instruments or 21 subclasses based on playing techniques. Every track consists of a drum-only track as well as multiple accompanied tracks, enabling audio files containing different combinations of instruments to be used in the ADT evaluation process.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{vidwans_objective_2017,
title = {Objective descriptors for the assessment of student music performances},
author = {Amruta Vidwans and Siddharth Gururani and Chih-Wei Wu and Vinod Subramanian and Rupak Vignesh Swaminathan and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/06/Vidwans-et-al_2017_Objective-descriptors-for-the-assessment-of-student-music-performances.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the AES Conference on Semantic Audio},
publisher = {Audio Engineering Society (AES)},
address = {Erlangen},
abstract = {Assessment of students’ music performances is a subjective task that requires the judgment of technical correctness as well as aesthetic properties. A computational model automatically evaluating music performance based on objective measurements could ensure consistent and reproducible assessments for, e.g., automatic music tutoring systems. In this study, we investigate the effectiveness of various audio descriptors for assessing performances. Specifically, three different sets of features, including a baseline set, score-independent features, and score-based features, are compared with respect to their efficiency in regression tasks. The results show that human assessments can be modeled to a certain degree, however, the generality of the model still needs further investigation.},
keywords = {computational auditory scene analysis, Computer sound processing, Content analysis (Communication), Data processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{wu_automatic_2017,
title = {Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/07/Wu_Lerch_2017_Automatic-drum-transcription-using-the-student-teacher-learning-paradigm-with.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Suzhou},
abstract = {Automatic drum transcription is a sub-task of automatic music transcription that converts drum-related audio events into musical notation. While noticeable progress has been made in the past by combining pattern recognition methods with audio signal processing techniques, the major limitation of many state-of-the-art systems still originates from the difficulty of obtaining a meaningful amount of annotated data to support the data-driven algorithms. In this work, we address the challenge of insufficiently labeled data by exploring the possibility of utilizing unlabeled music data from online resources. Specifically, a student neural network is trained using the labels generated from multiple teacher systems. The performance of the model is evaluated on a publicly available dataset. The results show the general viability of using unlabeled music data to improve the performance of drum transcription systems.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}