Pati, Ashis; Lerch, Alexander; Hadjeres, Gaëtan: Learning to Traverse Latent Spaces for Musical Score Inpainting. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019.
Huang, Jiawen; Lerch, Alexander: Automatic Assessment of Sight-Reading Exercises. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019.
Gururani, Siddharth; Sharma, Mohit; Lerch, Alexander: An Attention Mechanism for Music Instrument Recognition. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, 2019.
Guan, Hongzhao; Lerch, Alexander: Evaluation of Feature Learning Methods for Voice Disorder Detection. In: International Journal of Semantic Computing (IJSC), vol. 13, no. 4, pp. 453–470, 2019.
Wu, Chih-Wei; Lerch, Alexander: Learned Features for the Assessment of Percussive Music Performances. In: Proceedings of the International Conference on Semantic Computing (ICSC), IEEE, Laguna Hills, 2018.
Lerch, Alexander: The Relation Between Music Technology and Music Industry. In: Bader, Rolf (Ed.): Springer Handbook of Systematic Musicology, pp. 899–909, Springer, Berlin, Heidelberg, 2018, ISBN: 978-3-662-55002-1.
Pati, Kumar Ashis; Gururani, Siddharth; Lerch, Alexander: Assessment of Student Music Performances Using Deep Neural Networks. In: Applied Sciences, vol. 8, no. 4, pp. 507, 2018.
Wu, Chih-Wei; Dittmar, Christian; Southall, Carl; Vogl, Richard; Widmer, Gerhard; Hockman, Jason A; Muller, Meinard; Lerch, Alexander: A Review of Automatic Drum Transcription. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 9, pp. 1457–1483, 2018, ISSN: 2329-9290.
Xambo, Anna; Roma, Gerard; Lerch, Alexander; Barthet, Matthieu; Fazekas, Gyorgy: Live Repurposing of Sounds: MIR Explorations with Personal and Crowd-sourced Databases. In: Proceedings of the Conference on New Interfaces for Musical Expression (NIME), Blacksburg, 2018.
Seipel, Fabian; Lerch, Alexander: Multi-Track Crosstalk Reduction. In: Proceedings of the Audio Engineering Society Convention, Audio Engineering Society (AES), Milan, 2018.
Gururani, Siddharth; Summers, Cameron; Lerch, Alexander: Instrument Activity Detection in Polyphonic Music using Deep Neural Networks. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Paris, 2018.
Wu, Chih-Wei; Lerch, Alexander: From Labeled to Unlabeled Data – On the Data Challenge in Automatic Drum Transcription. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Paris, 2018.
Subramanian, Vinod; Lerch, Alexander: Concert Stitch: Organization and Synchronization of Crowd-Sourced Recordings. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Paris, 2018.
Gururani, Siddharth; Pati, Kumar Ashis; Wu, Chih-Wei; Lerch, Alexander: Analysis of Objective Descriptors for Music Performance Assessment. In: Proceedings of the International Conference on Music Perception and Cognition (ICMPC), Montreal, Canada, 2018.
Genchel, Benjamin; Lerch, Alexander: Lead Sheet Generation with Musically Interdependent Networks. In: Late Breaking Abstract, Proceedings of Computer Simulation of Musical Creativity (CSMC), Dublin, 2018.
Wu, Chih-Wei; Lerch, Alexander: Assessment of Percussive Music Performances with Feature Learning. In: International Journal of Semantic Computing, vol. 12, no. 3, pp. 315–333, 2018, ISSN: 1793-351X.
Yang, Li-Chia; Lerch, Alexander: On the evaluation of generative models in music. In: Neural Computing and Applications, 2018, ISSN: 1433-3058.
Wu, Chih-Wei; Vinton, Mark: Blind Bandwidth Extension using K-Means and Support Vector Regression. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, New Orleans, 2017.
Gururani, Siddharth; Lerch, Alexander: Automatic Sample Detection in Polyphonic Music. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, Suzhou, 2017.
Gururani, Siddharth; Lerch, Alexander: Mixing Secrets: A multitrack dataset for instrument detection in polyphonic music. In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, Suzhou, 2017.
Pati, Kumar Ashis; Lerch, Alexander: A Dataset and Method for Electric Guitar Solo Detection in Rock Music. In: Proceedings of the AES Conference on Semantic Audio, Audio Engineering Society (AES), Erlangen, 2017.
Southall, Carl; Wu, Chih-Wei; Lerch, Alexander; Hockman, Jason A: MDB Drums – An Annotated Subset of MedleyDB for Automatic Drum Transcription. In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, Suzhou, 2017.
Vidwans, Amruta; Gururani, Siddharth; Wu, Chih-Wei; Subramanian, Vinod; Swaminathan, Rupak Vignesh; Lerch, Alexander: Objective descriptors for the assessment of student music performances. In: Proceedings of the AES Conference on Semantic Audio, Audio Engineering Society (AES), Erlangen, 2017.
Wu, Chih-Wei; Lerch, Alexander: Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, Suzhou, 2017.
Chen, Zhiqian; Wu, Chih-Wei; Lu, Yen-Cheng; Lerch, Alexander; Lu, Chang-Tien: Learning to Fuse Music Genres with Generative Adversarial Dual Learning. In: Proceedings of the International Conference on Data Mining (ICDM), IEEE, New Orleans, 2017.
Freeman, Jason; Lerch, Alexander; Paradis, Matthew (Ed.): Proceedings of the 2nd Web Audio Conference (WAC-2016). Georgia Institute of Technology, Atlanta, 2016, ISBN: 978-0-692-61973-5.
Laguna, Christopher; Lerch, Alexander: An Efficient Algorithm For Clipping Detection And Declipping Audio. In: Proceedings of the 141st AES Convention, Audio Engineering Society (AES), Los Angeles, 2016.
Lu, Yen-Cheng; Wu, Chih-Wei; Lu, Chang-Tien; Lerch, Alexander: An Unsupervised Approach to Anomaly Detection in Music Datasets. In: Proceedings of the ACM SIGIR Conference (SIGIR), pp. 749–752, ACM, Pisa, 2016, ISBN: 978-1-4503-4069-4.
Lu, Yen-Cheng; Wu, Chih-Wei; Lu, Chang-Tien; Lerch, Alexander: Automatic Outlier Detection in Music Genre Datasets. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, New York, 2016.
Winters, Michael R; Gururani, Siddharth; Lerch, Alexander: Automatic Practice Logging: Introduction, Dataset & Preliminary Study. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, New York, 2016.
Wu, Chih-Wei; Gururani, Siddharth; Laguna, Christopher; Pati, Ashis; Vidwans, Amruta; Lerch, Alexander: Towards the Objective Assessment of Music Performances. In: Proceedings of the International Conference on Music Perception and Cognition (ICMPC), pp. 99–103, San Francisco, 2016, ISBN: 1-879346-65-5.
Wu, Chih-Wei; Lerch, Alexander: On Drum Playing Technique Detection in Polyphonic Mixtures. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, New York, 2016.
Xambo, Anna; Lerch, Alexander; Freeman, Jason: Learning to code through MIR. In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, New York, 2016.
Gupta, Udit; Moore II, Elliot; Lerch, Alexander: On the Perceptual Relevance of Objective Source Separation Measures for Singing Voice Separation. In: Proceedings of the Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE, New Paltz, 2015.
Lykartsis, Athanasios; Lerch, Alexander; Weinzierl, Stefan: Analysis of Speech Rhythm for Language Identification Based on Beat Histograms. In: Proceedings of the DAGA (Jahrestagung für Akustik), Nuremberg, 2015.
Lykartsis, Athanasios; Lerch, Alexander: Beat Histogram Features for Rhythm-based Musical Genre Classification Using Multiple Novelty Functions. In: Proceedings of the International Conference on Digital Audio Effects (DAFX), Trondheim, Norway, 2015.
Lykartsis, Athanasios; Wu, Chih-Wei; Lerch, Alexander: Beat Histogram Features from NMF-Based Novelty Functions for Music Classification. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, Malaga, 2015.
O'Brien, Cian; Lerch, Alexander: Genre-Specific Key Profiles. In: Proceedings of the International Computer Music Conference (ICMC), ICMA, Denton, 2015.
Wu, Chih-Wei; Lerch, Alexander: Drum Transcription using Partially Fixed Non-Negative Matrix Factorization. In: Proceedings of the European Signal Processing Conference (EUSIPCO), EURASIP, Nice, 2015.
Wu, Chih-Wei; Lerch, Alexander: Drum Transcription using Partially Fixed Non-Negative Matrix Factorization With Template Adaptation. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, Malaga, 2015.
Zhou, Xinquan; Lerch, Alexander: Chord Detection Using Deep Learning. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, Malaga, 2015.
von Coler, Henrik; Lerch, Alexander: CMMSD: A Data Set for Note-Level Segmentation of Monophonic Music. In: Proceedings of the AES 53rd International Conference on Semantic Audio, Audio Engineering Society (AES), London, UK, 2014.
Lerch, Alexander: Music Information Retrieval. In: Weinzierl, Stefan (Ed.): Akustische Grundlagen der Musik, no. 5, pp. 79–102, Laaber, 2014, ISBN: 978-3-89007-699-7.
Kraft, Sebastian; Lerch, Alexander; Zölzer, Udo: The Tonalness Spectrum: Feature-Based Estimation of Tonal Components. In: Proceedings of the 16th International Conference on Digital Audio Effects, Maynooth, 2013.
Lerch, Alexander: An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics. Wiley-IEEE Press, Hoboken, 2012, ISBN: 978-1-118-26682-3.
Kirchhoff, Holger; Lerch, Alexander: Evaluation of Features for Audio-to-Audio Alignment. In: Journal of New Music Research, vol. 40, no. 1, pp. 27–41, 2011.
Lerch, Alexander: Software-gestützte Merkmalsextraktion für die musikalische Aufführungsanalyse. In: von Loesch, Heinz; Weinzierl, Stefan (Ed.): Gemessene Interpretation – Computergestützte Aufführungsanalyse im Kreuzverhör der Disziplinen, pp. 205–212, Schott, Mainz, 2011, ISBN: 978-3-7957-0771-2.
Ness, Steven R; Lerch, Alexander; Tzanetakis, George: Strategies for Orca Call Retrieval to Support Collaborative Annotation of a Large Archive. In: Proceedings of the International Workshop on Multimedia Signal Processing (MMSP), IEEE, Hangzhou, 2011, ISBN: 978-1-4577-1434-4.
Wiesener, Constantin; Flohrer, Tim; Lerch, Alexander; Weinzierl, Stefan: Adaptive Noise Reduction for Real-time Applications. In: Proceedings of the 128th Audio Engineering Society Convention (Preprint #8048), Audio Engineering Society, London, 2010.
Lerch, Alexander: Software-Based Extraction of Objective Parameters from Music Performances. GRIN Verlag, München, 2009, ISBN: 978-3-640-29496-1.
2019
@inproceedings{pati_learning_2019,
title = {Learning to Traverse Latent Spaces for Musical Score Inpainting},
author = {Ashis Pati and Alexander Lerch and Ga\"{e}tan Hadjeres},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/06/Pati-et-al.-2019-Learning-to-Traverse-Latent-Spaces-for-Musical-Sco.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Delft},
abstract = {Music Inpainting is the task of filling in missing or lost information in a piece of music. We investigate this task from an interactive music creation perspective. To this end, a novel deep learning-based approach for musical score inpainting is proposed. The designed model takes both past and future musical context into account and is capable of suggesting ways to connect them in a musically meaningful manner. To achieve this, we leverage the representational power of the latent space of a Variational Auto-Encoder and train a Recurrent Neural Network which learns to traverse this latent space conditioned on the past and future musical contexts. Consequently, the designed model is capable of generating several measures of music to connect two musical excerpts. The capabilities and performance of the model are showcased by comparison with competitive baselines using several objective and subjective evaluation methods. The results show that the model generates meaningful inpaintings and can be used in interactive music creation applications. Overall, the method demonstrates the merit of learning complex trajectories in the latent spaces of deep generative models.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{huang_automatic_2019,
title = {Automatic Assessment of Sight-Reading Exercises},
author = {Jiawen Huang and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/07/Huang-and-Lerch-2019-Automatic-Assessment-of-Sight-Reading-Exercises.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Delft},
abstract = {Sight-reading requires a musician to decode, process, and perform a musical score quasi-instantaneously and without rehearsal. Due to the complexity of this task, it is difficult to assess the proficiency of a sight-reading performance, and it is even more challenging to model its human assessment. This study aims at evaluating and identifying effective features for the automatic assessment of sight-reading performance. The evaluated set of features comprises task-specific, hand-crafted, and interpretable features designed to represent various aspects of sight-reading performance, covering parameters such as intonation, timing, dynamics, and score continuity. The most relevant features are identified by Principal Component Analysis and forward feature selection. For context, the same features are also applied to the assessment of rehearsed student music performances and compared across different assessment categories. The results show the potential of automatic assessment models for sight-reading and the relevancy of different features as well as the contribution of different feature groups to different assessment categories.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_attention_2019,
title = {An Attention Mechanism for Music Instrument Recognition},
author = {Siddharth Gururani and Mohit Sharma and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/07/Gururani-et-al.-2019-An-Attention-Mechanism-for-Music-Instrument-Recogn.pdf},
year = {2019},
date = {2019-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Delft},
abstract = {While the automatic recognition of musical instruments has seen significant progress, the task is still considered hard for music featuring multiple instruments as opposed to single instrument recordings. Datasets for polyphonic instrument recognition can be categorized into roughly two categories. Some, such as MedleyDB, have strong per-frame instrument activity annotations but are usually small in size. Other, larger datasets such as OpenMIC only have weak labels, i.e., instrument presence or absence is annotated only for long snippets of a song. We explore an attention mechanism for handling weakly labeled data for multi-label instrument recognition. Attention has been found to perform well for other tasks with weakly labeled data. We compare the proposed attention model to multiple models which include a baseline binary relevance random forest, recurrent neural network, and fully connected neural networks. Our results show that incorporating attention leads to an overall improvement in classification accuracy metrics across all 20 instruments in the OpenMIC dataset. We find that attention enables models to focus on (or 'attend to') specific time segments in the audio relevant to each instrument label leading to interpretable results.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{guan_evaluation_2018,
title = {Evaluation of Feature Learning Methods for Voice Disorder Detection},
author = {Hongzhao Guan and Alexander Lerch},
doi = {10.1142/S1793351X19400191},
year = {2019},
date = {2019-01-01},
journal = {International Journal of Semantic Computing (IJSC)},
volume = {13},
number = {4},
pages = {453--470},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
2018
@inproceedings{wu_learned_2018,
title = {Learned Features for the Assessment of Percussive Music Performances},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/01/Wu_Lerch_2018_Learned-Features-for-the-Assessment-of-Percussive-Music-Performances.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the International Conference on Semantic Computing (ICSC)},
publisher = {IEEE},
address = {Laguna Hills},
keywords = {audio, feature learning, music performance analysis, percussion},
pubstate = {published},
tppubtype = {inproceedings}
}
@incollection{lerch_relation_2018,
title = {The Relation Between Music Technology and Music Industry},
author = {Alexander Lerch},
editor = {Rolf Bader},
url = {https://link.springer.com/chapter/10.1007/978-3-662-55004-5_44},
doi = {10.1007/978-3-662-55004-5_44},
isbn = {978-3-662-55002-1 978-3-662-55004-5},
year = {2018},
date = {2018-01-01},
urldate = {2018-03-26},
booktitle = {Springer Handbook of Systematic Musicology},
pages = {899--909},
publisher = {Springer, Berlin, Heidelberg},
series = {Springer Handbooks},
abstract = {The music industry has changed drastically over the last century and most of its changes and transformations have been technology-driven. Music technology \textendash encompassing musical instruments, sound generators, studio equipment and software, perceptual audio coding algorithms, and reproduction software and devices \textendash has shaped the way music is produced, performed, distributed, and consumed. The evolution of music technology enabled studios and hobbyist producers to produce music at a technical quality unthinkable decades ago and have affordable access to new effects as well as production techniques. Artists explore nontraditional ways of sound generation and sound modification to create previously unheard effects, soundscapes, or even to conceive new musical styles. The consumer has immediate access to a vast diversity of songs and styles and is able to listen to individualized playlists virtually everywhere and at any time. The most disruptive technological innovations during the past 130 years have probably been: 1. The possibility to record and distribute recordings on a large scale through the gramophone. 2. The introduction of vinyl disks enabling high-quality sound reproduction. 3. The compact cassette enabling individualized playlists, music sharing with friends and mobile listening. 4. Digital audio technology enabling high quality professional-grade studio equipment at low prices. 5. Perceptual audio coding in combination with online distribution, streaming, and file sharing. This text will describe these technological innovations and their impact on artists, engineers, and listeners.},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
@article{pati_assessment_2018,
title = {Assessment of Student Music Performances Using Deep Neural Networks},
author = {Kumar Ashis Pati and Siddharth Gururani and Alexander Lerch},
url = {http://www.mdpi.com/2076-3417/8/4/507/pdf},
doi = {10.3390/app8040507},
year = {2018},
date = {2018-01-01},
urldate = {2018-03-27},
journal = {Applied Sciences},
volume = {8},
number = {4},
pages = {507},
abstract = {Music performance assessment is a highly subjective task often relying on experts to gauge both the technical and aesthetic aspects of the performance from the audio signal. This article explores the task of building computational models for music performance assessment, i.e., analyzing an audio recording of a performance and rating it along several criteria such as musicality, note accuracy, etc. Much of the earlier work in this area has been centered around using hand-crafted features intended to capture relevant aspects of a performance. However, such features are based on our limited understanding of music perception and may not be optimal. In this article, we propose using Deep Neural Networks (DNNs) for the task and compare their performance against a baseline model using standard and hand-crafted features. We show that, using input representations at different levels of abstraction, DNNs can outperform the baseline models across all assessment criteria. In addition, we use model analysis techniques to further explain the model predictions in an attempt to gain useful insights into the assessment process. The results demonstrate the potential of using supervised feature learning techniques to better characterize music performances.},
keywords = {deep learning, deep neural networks, DNN, MIR, music education, music informatics, music information retrieval, music learning, music performance assessment},
pubstate = {published},
tppubtype = {article}
}
@article{wu_review_2018,
title = {A Review of Automatic Drum Transcription},
author = {Chih-Wei Wu and Christian Dittmar and Carl Southall and Richard Vogl and Gerhard Widmer and Jason A Hockman and Meinard Muller and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/05/Wu-et-al.-2018-A-review-of-automatic-drum-transcription.pdf},
doi = {10.1109/TASLP.2018.2830113},
issn = {2329-9290},
year = {2018},
date = {2018-01-01},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume = {26},
number = {9},
pages = {1457--1483},
abstract = {In Western popular music, drums and percussion are an important means to emphasize and shape the rhythm, often defining the musical style. If computers were able to analyze the drum part in recorded music, it would enable a variety of rhythm-related music processing tasks. Especially the detection and classification of drum sound events by computational methods is considered to be an important and challenging research problem in the broader field of Music Information Retrieval. Over the last two decades, several authors have attempted to tackle this problem under the umbrella term Automatic Drum Transcription (ADT). This paper presents a comprehensive review of ADT research, including a thorough discussion of the task-specific challenges, categorization of existing techniques, and evaluation of several state-of-the-art systems. To provide more insights on the practice of ADT systems, we focus on two families of ADT techniques, namely methods based on Non-negative Matrix Factorization and Recurrent Neural Networks. We explain the methods' technical details and drum-specific variations and evaluate these approaches on publicly available datasets with a consistent experimental setup. Finally, the open issues and under-explored areas in ADT research are identified and discussed, providing future directions in this field.},
keywords = {Automatic Music Transcription, deep learning, Instruments, Machine Learning, Matrix Factorization, Rhythm, Spectrogram, Speech processing, Task analysis, Transient analysis},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{xambo_live_2018,
title = {Live Repurposing of Sounds: MIR Explorations with Personal and Crowd-sourced Databases},
author = {Anna Xambo and Gerard Roma and Alexander Lerch and Matthieu Barthet and Gyorgy Fazekas},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/04/Xambo-et-al.-2018-Live-Repurposing-of-Sounds-MIR-Explorations-with-.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the Conference on New Interfaces for Musical Expression (NIME)},
address = {Blacksburg},
abstract = {The recent increase in the accessibility and size of personal and crowd-sourced digital sound collections brought about a valuable resource for music creation. Finding and retrieving relevant sounds in performance leads to challenges that can be approached using music information retrieval (MIR). In this paper, we explore the use of MIR to retrieve and repurpose sounds in musical live coding. We present a live coding system built on SuperCollider enabling the use of audio content from Creative Commons (CC) sound databases such as Freesound or personal sound databases. The novelty of our approach lies in exploiting high-level MIR methods (e.g. query by pitch or rhythmic cues) using live coding techniques applied to sounds. We demonstrate its potential through the reflection of an illustrative case study and the feedback from four expert users. The users tried the system with either a personal database or a crowd-sourced database and reported its potential in facilitating tailorability of the tool to their own creative workflows. This approach to live repurposing of sounds can be applied to real-time interactive systems for performance and composition beyond live coding, as well as inform live coding and MIR research.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{seipel_multi-track_2018,
title = {Multi-Track Crosstalk Reduction},
author = {Fabian Seipel and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Seipel-and-Lerch-2018-Multi-Track-Crosstalk-Reduction.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the Audio Engineering Society Convention},
publisher = {Audio Engineering Society (AES)},
address = {Milan},
abstract = {While many music-related blind source separation methods focus on mono or stereo material, the detection and reduction of crosstalk in multi-track recordings is less researched. Crosstalk or ’bleed’ of one recorded channel in another is a very common phenomenon in specific genres such as jazz and classical, where all instrumentalists are recorded simultaneously. We present an efficient algorithm that estimates the crosstalk amount in the spectral domain and applies spectral subtraction to remove it. Randomly generated artificial mixtures from various anechoic orchestral source material were employed to develop and evaluate the algorithm, which scores an average SIR-Gain result of 15.14dB on various datasets with different amounts of simulated crosstalk.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_instrument_2018,
title = {Instrument Activity Detection in Polyphonic Music using Deep Neural Networks},
author = {Siddharth Gururani and Cameron Summers and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Gururani-et-al.-Instrument-Activity-Detection-in-Polyphonic-Music-.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Paris},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{wu_labeled_2018,
title = {From Labeled to Unlabeled Data -- On the Data Challenge in Automatic Drum Transcription},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Wu-and-Lerch-From-Labeled-to-Unlabeled-Data-On-the-Data-Chal.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Paris},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{subramanian_concert_2018,
title = {Concert Stitch: Organization and Synchronization of Crowd-Sourced Recordings},
author = {Vinod Subramanian and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Subramanian-and-Lerch-Concert-Stitch-Organization-and-Synchromization-o.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
address = {Paris},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_analysis_2018,
title = {Analysis of Objective Descriptors for Music Performance Assessment},
author = {Siddharth Gururani and Kumar Ashis Pati and Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Gururani-et-al.-2018-Analysis-of-Objective-Descriptors-for-Music-Perfor.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Proceedings of the International Conference on Music Perception and Cognition (ICMPC)},
address = {Montreal, Canada},
abstract = {The assessment of musical performances in, e.g., student competitions or auditions, is a largely subjective evaluation of a performer's technical skills and expressivity. Objective descriptors extracted from the audio signal have been proposed for automatic performance assessment in such a context. Such descriptors represent different aspects of pitch, dynamics and timing of a performance and have been shown to be reasonably successful in modeling human assessments of student performances through regression. This study aims to identify the influence of individual descriptors on models of human assessment in 4 categories: musicality, note accuracy, rhythmic accuracy, and tone quality. To evaluate the influence of the individual descriptors, the descriptors highly correlated with the human assessments are identified. Subsequently, various subsets are chosen using different selection criteria and the adjusted R-squared metric is computed to evaluate the degree to which these subsets explain the variance in the assessments. In addition, sequential forward selection is performed to identify the most meaningful descriptors. The goal of this study is to gain insights into which objective descriptors contribute most to the human assessments as well as to identify a subset of well-performing descriptors. The results indicate that a small subset of the designed descriptors can perform at a similar accuracy as the full set of descriptors. Sequential forward selection shows how around 33% of the descriptors do not add new information to the linear regression models, pointing towards redundancy in the descriptors.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{genchel_lead_2018,
title = {Lead Sheet Generation with Musically Interdependent Networks},
author = {Benjamin Genchel and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/08/Genchel-and-Lerch-2018-Lead-Sheet-Generation-with-Musically-Interdependen.pdf},
year = {2018},
date = {2018-01-01},
booktitle = {Late Breaking Abstract, Proceedings of Computer Simulation of Musical Creativity (CSMC)},
address = {Dublin},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@article{wu_assessment_2018,
title = {Assessment of Percussive Music Performances with Feature Learning},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/09/ws-ijsc_cw_submission.pdf},
doi = {10.1142/S1793351X18400147},
issn = {1793-351X},
year = {2018},
date = {2018-01-01},
journal = {International Journal of Semantic Computing},
volume = {12},
number = {3},
pages = {315--333},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
@article{yang_evaluation_2018,
title = {On the evaluation of generative models in music},
author = {Li-Chia Yang and Alexander Lerch},
url = {https://rdcu.be/baHuU
http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/11/postprint.pdf},
doi = {10.1007/s00521-018-3849-7},
issn = {1433-3058},
year = {2018},
date = {2018-01-01},
urldate = {2018-11-04},
journal = {Neural Computing and Applications},
abstract = {The modeling of artificial, human-level creativity is becoming more and more achievable. In recent years, neural networks have been successfully applied to different tasks such as image and music generation, demonstrating their great potential in realizing computational creativity. The fuzzy definition of creativity combined with varying goals of the evaluated generative systems, however, makes subjective evaluation seem to be the only viable methodology of choice. We review the evaluation of generative music systems and discuss the inherent challenges of their evaluation. Although subjective evaluation should always be the ultimate choice for the evaluation of creative results, researchers unfamiliar with rigorous subjective experiment design and without the necessary resources for the execution of a large-scale experiment face challenges in terms of reliability, validity, and replicability of the results. In numerous studies, this leads to the report of insignificant and possibly irrelevant results and the lack of comparability with similar and previous generative systems. Therefore, we propose a set of simple musically informed objective metrics enabling an objective and reproducible way of evaluating and comparing the output of music generative systems. We demonstrate the usefulness of the proposed metrics with several experiments on real-world data.},
keywords = {Computational creativity, Music generation, Objective evaluation},
pubstate = {published},
tppubtype = {article}
}
@inproceedings{wu_blind_2017,
title = {Blind Bandwidth Extension using K-Means and Support Vector Regression},
author = {Chih-Wei Wu and Mark Vinton},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/04/Wu-and-Vinton-2017-Blind-Bandwidth-Extension-using-K-Means-and-Suppor.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
publisher = {IEEE},
address = {New Orleans},
abstract = {In this paper, a blind bandwidth extension algorithm for music signals has been proposed. This method applies the K-means algorithm to firstly cluster audio data in the feature space, and constructs multiple envelope predictors for each cluster accordingly using Support Vector Regression (SVR). A set of well-established audio features for Music Information Retrieval (MIR) has been used to characterize the audio content. The resulting system is applied to a variety of music signals without any side information provided. The subjective listening test results show that this method can improve the perceptual quality successfully, but the minor artifacts still leave room for future improvements.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_automatic_2017,
title = {Automatic Sample Detection in Polyphonic Music},
author = {Siddharth Gururani and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/07/Gururani_Lerch_2017_Automatic-Sample-Detection-in-Polyphonic-Music.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Suzhou},
abstract = {The term `sampling' refers to the usage of snippets or loops from existing songs or sample libraries in new songs, mashups, or other music productions. The ability to automatically detect sampling in music is, for instance, beneficial for studies tracking artist influences geographically and temporally. We present a method based on Non-negative Matrix Factorization (NMF) and Dynamic Time Warping (DTW) for the automatic detection of a sample in a pool of songs. The method comprises two processing steps: first, the DTW alignment path between NMF activations of a song and query sample is computed. Second, features are extracted from this path and used to train a Random Forest classifier to detect the presence of the sample. The method is able to identify samples that are pitch shifted and/or time stretched with approximately 63% F-measure. We evaluate this method against a new publicly available dataset of real-world sample and song pairs.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{gururani_mixing_2017,
title = {Mixing Secrets: A multitrack dataset for instrument detection in polyphonic music},
author = {Siddharth Gururani and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/10/Gururani_Lerch_2017_Mixing-Secrets.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Suzhou},
abstract = {Instrument recognition as a task in MIR is largely data driven. This drives a need for large datasets that cater to the need of these algorithms. Several datasets exist for the task of instrument recognition in monophonic signals. For polyphonic music, creating a finely labeled dataset for instrument recognition is a hard task and using multi-track data eases that process. We present 250+ multi-tracks that have been labeled for instrument recognition and release the annotations to be used in the community. The process of data acquisition, cleaning and labeling has been detailed in this late-breaking demo.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{pati_dataset_2017,
title = {A Dataset and Method for Electric Guitar Solo Detection in Rock Music},
author = {Kumar Ashis Pati and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/06/Pati_Lerch_2017_A-Dataset-and-Method-for-Electric-Guitar-Solo-Detection-in-Rock-Music.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the AES Conference on Semantic Audio},
publisher = {Audio Engineering Society (AES)},
address = {Erlangen},
abstract = {This paper explores the problem of automatically detecting electric guitar solos in rock music. A baseline study using standard spectral and temporal audio features in conjunction with an SVM classifier is carried out. To improve detection rates, custom features based on predominant pitch and structural segmentation of songs are designed and investigated. The evaluation of different feature combinations suggests that the combination of all features followed by a post-processing step results in the best accuracy. A macro-accuracy of 78.6% with a solo detection precision of 63.3% is observed for the best feature combination. This publication is accompanied by release of an annotated dataset of electric guitar solos to encourage future research in this area.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{southall_mdb_2017,
title = {MDB Drums --- An Annotated Subset of MedleyDB for Automatic Drum Transcription},
author = {Carl Southall and Chih-Wei Wu and Alexander Lerch and Jason A Hockman},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/10/Wu-et-al_2017_MDB-Drums-An-Annotated-Subset-of-MedleyDB-for-Automatic-Drum-Transcription.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Suzhou},
abstract = {In this paper we present MDB Drums, a new dataset for automatic drum transcription (ADT) tasks. This dataset is built on top of the MusicDelta subset of the MedleyDB dataset, taking advantage of real-world recordings in multi-track format. The dataset is comprised of a variety of genres, providing a balanced pool for developing and evaluating ADT models with respect to various musical styles. To reduce the cost of the labor-intensive process of manual annotation, a semi-automatic process was utilised in both the annotation and quality control processes. The presented dataset consists of 23 tracks with a total of 7994 onsets. These onsets are divided into 6 classes based on drum instruments or 21 subclasses based on playing techniques. Every track consists of a drum-only track as well as multiple accompanied tracks, enabling audio files containing different combinations of instruments to be used in the ADT evaluation process.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{vidwans_objective_2017,
title = {Objective descriptors for the assessment of student music performances},
author = {Amruta Vidwans and Siddharth Gururani and Chih-Wei Wu and Vinod Subramanian and Rupak Vignesh Swaminathan and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/06/Vidwans-et-al_2017_Objective-descriptors-for-the-assessment-of-student-music-performances.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the AES Conference on Semantic Audio},
publisher = {Audio Engineering Society (AES)},
address = {Erlangen},
abstract = {Assessment of students’ music performances is a subjective task that requires the judgment of technical correctness as well as aesthetic properties. A computational model automatically evaluating music performance based on objective measurements could ensure consistent and reproducible assessments for, e.g., automatic music tutoring systems. In this study, we investigate the effectiveness of various audio descriptors for assessing performances. Specifically, three different sets of features, including a baseline set, score-independent features, and score-based features, are compared with respect to their efficiency in regression tasks. The results show that human assessments can be modeled to a certain degree, however, the generality of the model still needs further investigation.},
keywords = {computational auditory scene analysis, Computer sound processing, Content analysis (Communication), Data processing},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{wu_automatic_2017,
title = {Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/07/Wu_Lerch_2017_Automatic-drum-transcription-using-the-student-teacher-learning-paradigm-with.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {International Society for Music Information Retrieval (ISMIR)},
address = {Suzhou},
abstract = {Automatic drum transcription is a sub-task of automatic music transcription that converts drum-related audio events into musical notation. While noticeable progress has been made in the past by combining pattern recognition methods with audio signal processing techniques, the major limitation of many state-of-the-art systems still originates from the difficulty of obtaining a meaningful amount of annotated data to support the data-driven algorithms. In this work, we address the challenge of insufficiently labeled data by exploring the possibility of utilizing unlabeled music data from online resources. Specifically, a student neural network is trained using the labels generated from multiple teacher systems. The performance of the model is evaluated on a publicly available dataset. The results show the general viability of using unlabeled music data to improve the performance of drum transcription systems.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{zhiqian_chen_learning_2017,
title = {Learning to Fuse Music Genres with Generative Adversarial Dual Learning},
author = {Zhiqian Chen and Chih-Wei Wu and Yen-Cheng Lu and Alexander Lerch and Chang-Tien Lu},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/11/Zhiqian-Chen-et-al_2017_Learning-to-Fuse-Music-Genres-with-Generative-Adversarial-Dual-Learning.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the International Conference on Data Mining (ICDM)},
publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
address = {New Orleans},
abstract = {FusionGAN is a novel genre fusion framework for music generation that integrates the strengths of generative adversarial networks and dual learning. In particular, the proposed method offers a dual learning extension that can effectively integrate the styles of the given domains. To efficiently quantify the difference among diverse domains and avoid the vanishing gradient issue, FusionGAN provides a Wasserstein based metric to approximate the distance between the target domain and the existing domains. Adopting the Wasserstein distance, a new domain is created by combining the patterns of the existing domains using adversarial learning. Experimental results on public music datasets demonstrated that our approach could effectively merge two genres.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@book{freeman_proceedings_2016,
title = {Proceedings of the 2nd Web Audio Conference (WAC-2016)},
editor = {Jason Freeman and Alexander Lerch and Matthew Paradis},
url = {https://smartech.gatech.edu/handle/1853/54577},
isbn = {978-0-692-61973-5},
year = {2016},
date = {2016-01-01},
publisher = {Georgia Institute of Technology},
address = {Atlanta},
keywords = {},
pubstate = {published},
tppubtype = {book}
}
@inproceedings{laguna_efficient_2016,
title = {An Efficient Algorithm For Clipping Detection And Declipping Audio},
author = {Christopher Laguna and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2016/09/Laguna_Lerch_2016_An-Efficient-Algorithm-For-Clipping-Detection-And-Declipping-Audio.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the 141st AES Convention},
publisher = {Audio Engineering Society (AES)},
address = {Los Angeles},
abstract = {We present an algorithm for end to end declipping, which includes clipping detection and the replacement of clipped samples. To detect regions of clipping, we analyze the signal’s amplitude histogram and the shape of the signal in the time-domain. The sample replacement algorithm uses a two-pass approach: short regions of clipping are replaced in the time-domain and long regions of clipping are replaced in the frequency-domain. The algorithm is robust against different types of clipping and is efficient compared to existing approaches. The algorithm has been implemented in an open source JavaScript client-side web application. Clipping detection is shown to give an f-measure of 0.92 and is robust to the clipping level.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{lu_unsupervised_2016,
title = {An Unsupervised Approach to Anomaly Detection in Music Datasets},
author = {Yen-Cheng Lu and Chih-Wei Wu and Chang-Tien Lu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2016/07/Lu-et-al_2016_An-Unsupervised-Approach-to-Anomaly-Detection-in-Music-Datasets.pdf},
doi = {10.1145/2911451.2914700},
isbn = {978-1-4503-4069-4},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the ACM SIGIR Conference (SIGIR)},
pages = {749--752},
publisher = {ACM},
address = {Pisa},
series = {SIGIR '16},
abstract = {This paper presents an unsupervised method for systematically identifying anomalies in music datasets. The model integrates categorical regression and robust estimation techniques to infer anomalous scores in music clips. When applied to a music genre recognition dataset, the new method is able to detect corrupted, distorted, or mislabeled audio samples based on commonly used features in music information retrieval. The evaluation results show that the algorithm outperforms other anomaly detection methods and is capable of finding problematic samples identified by human experts. The proposed method introduces a preliminary framework for anomaly detection in music data that can serve as a useful tool to improve data integrity in the future.},
keywords = {anomaly detection, data clean-up, music genre retrieval, music information retrieval},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{lu_automatic_2016,
title = {Automatic Outlier Detection in Music Genre Datasets},
author = {Yen-Cheng Lu and Chih-Wei Wu and Chang-Tien Lu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2016/07/Lu-et-al_2016_Automatic-Outlier-Detection-in-Music-Genre-Datasets.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {ISMIR},
address = {New York},
series = {ISMIR},
abstract = {Outlier detection, also known as anomaly detection, is an important topic that has been studied for decades. An outlier detection system is able to identify anomalies in a dataset and thus improve data integrity by removing the detected outliers. It has been successfully applied to different types of data in various fields such as cyber-security, finance, and transportation. In the field of Music Information Retrieval (MIR), however, the number of related studies is small. In this paper, we introduce different state-of-the-art outlier detection techniques and evaluate their viability in the context of music datasets. More specifically, we present a comparative study of 6 outlier detection algorithms applied to a Music Genre Recognition (MGR) dataset. It is determined how well algorithms can identify mislabeled or corrupted files, and how much the quality of the dataset can be improved. Results indicate that state-of-the-art anomaly detection systems have problems identifying anomalies in MGR datasets reliably.},
keywords = {anomaly detection, data clean-up, music genre retrieval, music information retrieval},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{winters_automatic_2016,
title = {Automatic Practice Logging: Introduction, Dataset \& Preliminary Study},
author = {Michael R Winters and Siddharth Gururani and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2016/07/Winters-et-al_2016_Automatic-Practice-Logging.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {ISMIR},
address = {New York},
abstract = {Musicians spend countless hours practicing their instruments. To document and organize this time, musicians commonly use practice charts to log their practice. However, manual techniques require time, dedication, and experience to master, are prone to fallacy and omission, and ultimately can not describe the subtle variations in each repetition. This paper presents an alternative: by analyzing and classifying the audio recorded while practicing, logging could occur automatically, with levels of detail, accuracy, and ease that would not be possible otherwise. Towards this goal, we introduce the problem of Automatic Practice Logging (APL), including a discussion of the benefits and unique challenges it raises. We then describe a new dataset of over 600 annotated recordings of solo piano practice, which can be used to design and evaluate APL systems. After framing our approach to the problem, we present an algorithm designed to align short segments of practice audio with reference recordings using pitch chroma and dynamic time warping.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{wu_towards_2016,
title = {Towards the Objective Assessment of Music Performances},
author = {Chih-Wei Wu and Siddharth Gururani and Christopher Laguna and Ashis Pati and Amruta Vidwans and Alexander Lerch},
url = {http://www.icmpc.org/icmpc14/proceedings.html},
isbn = {1-879346-65-5},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the International Conference on Music Perception and Cognition (ICMPC)},
pages = {99--103},
address = {San Francisco},
abstract = {The qualitative assessment of music performances is a task that is influenced by technical correctness, deviations from established performance standards, and aesthetic judgment. Despite its inherently subjective nature, a quantitative overall assessment is often desired, as exemplified by US all-state auditions or other competitions. A model that automatically generates assessments from the audio data would allow for objective assessments and enable musically intelligent computer-assisted practice sessions for students learning an instrument. While existing systems are already able to provide similar basic functionality, they rely on the musical score as prior knowledge. In this paper, we present a score-independent system for assessing student instrument performances based on audio recordings. This system aims to characterize the performance with both well-established and custom-designed audio features, model expert assessments of student performances, and predict the assessment of unknown audio recordings. The results imply the viability of modeling human assessment with score-independent audio features. Results could lead towards more general software music tutoring systems that do not require score information for the assessment of student music performances.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{wu_drum_2016,
title = {On Drum Playing Technique Detection in Polyphonic Mixtures},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2016/07/Wu_Lerch_2016_On-Drum-Playing-Technique-Detection-in-Polyphonic-Mixtures.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {ISMIR},
address = {New York},
abstract = {In this paper, the problem of drum playing technique detection in polyphonic mixtures of music is addressed. We focus on the identification of 4 rudimentary techniques: strike, buzz roll, flam, and drag. The specifics and the challenges of this task are being discussed, and different sets of features are compared, including various features extracted from NMF-based activation functions, as well as baseline spectral features. We investigate the capabilities and limitations of the presented system in the case of real-world recordings and polyphonic mixtures. To design and evaluate the system, two datasets are introduced: a training dataset generated from individual drum hits, and additional annotations of the well-known ENST drum dataset minus one subset as test dataset. The results demonstrate issues with the traditionally used spectral features, and indicate the potential of using NMF activation functions for playing technique detection, however, the performance of polyphonic music still leaves room for future improvement.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{xambo_learning_2016,
title = {Learning to code through MIR},
author = {Anna Xambo and Alexander Lerch and Jason Freeman},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2016/08/Xambo-et-al.-2016-Learning-to-code-through-MIR.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {ISMIR},
address = {New York},
abstract = {An approach to teaching computer science (CS) in high schools is using EarSketch, a free online tool for teaching CS concepts while making music. In this demonstration we present the potential of teaching music information retrieval (MIR) concepts using EarSketch. The aim is twofold: to discuss the benefits of introducing MIR concepts in the classroom and to shed light on how MIR concepts can be gently introduced in a CS curriculum. We conclude by identifying the advantages of teaching MIR in the classroom and pointing to future directions for research.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2015
@inproceedings{gupta_perceptual_2015,
title = {On the Perceptual Relevance of Objective Source Separation Measures for Singing Voice Separation},
author = {Udit Gupta and Elliot Moore II and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/10/Gupta-et-al_2015_On-the-Perceptual-Relevance-of-Objective-Source-Separation-Measures-for-Singing.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
publisher = {IEEE},
address = {New Paltz},
abstract = {Singing Voice Separation (SVS) is a task which uses audio source separation methods to isolate the vocal component from the background accompaniment of a song mix. This paper discusses methods of evaluating SVS algorithms and determines how the current state-of-the-art measures correlate with human perception. A modified ITU-R BS.1534 (MUSHRA) test is used to obtain human perceptual ratings for the outputs of various SVS algorithms, which are correlated with widely used objective measures of source separation quality. The results show that while the objective measures provide a moderate correlation with perceived intelligibility and isolation, they may not adequately assess the overall perceptual quality.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{lykartsis_analysis_2015,
title = {Analysis of Speech Rhythm for Language Identification Based on Beat Histograms},
author = {Athanasios Lykartsis and Alexander Lerch and Stefan Weinzierl},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/06/Lykartsis%20et%20al_2015_Analysis%20of%20Speech%20Rhythm%20for%20Language%20Identification%20Based%20on%20Beat%20Histograms.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the DAGA (Jahrestagung fur Akustik)},
address = {Nuremberg},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{lykartsis_beat_2015,
title = {Beat Histogram Features for Rhythm-based Musical Genre Classification Using Multiple Novelty Functions},
author = {Athanasios Lykartsis and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/12/DAFx-15_submission_42-1.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the International Conference on Digital Audio Effects (DAFX)},
address = {Trondheim, Norway},
abstract = {In this paper we present beat histogram features for multiple-level rhythm description and evaluate them in a musical genre classification task. Audio features pertaining to various musical content categories and their related novelty functions are extracted as a basis for the creation of beat histograms. The proposed features capture not only amplitude but also tonal and general spectral changes in the signal, aiming to represent as much rhythmic information as possible. The most and least informative features are identified through feature selection methods and are then tested using Support Vector Machines on five genre datasets with respect to classification accuracy against a baseline feature set. Results show that the presented features provide classification accuracy comparable to other genre classification approaches using periodicity histograms and display a performance close to that of much more elaborate up-to-date approaches for rhythm description. The use of bar boundary annotations for the texture frames provided an improvement for the dance-oriented Ballroom dataset. The comparably small number of descriptors and the possibility of evaluating the influence of specific signal components on the general rhythmic content encourage the further use of the method in rhythm description tasks.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
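The abstract above describes building beat histograms from novelty functions. As a rough, self-contained illustration (not the authors' implementation; the paper combines multiple novelty functions and feature selection), the sketch below derives a simple beat histogram from a spectral-flux novelty function via autocorrelation. The BPM range bounds are illustrative parameters:

```python
import numpy as np

def spectral_flux_novelty(mag_spec):
    """Half-wave rectified frame-to-frame spectral difference.

    mag_spec: magnitude spectrogram, shape (bins, frames)."""
    diff = np.diff(mag_spec, axis=1)
    return np.sum(np.maximum(diff, 0.0), axis=0)

def beat_histogram(novelty, frame_rate, min_bpm=40, max_bpm=200):
    """Autocorrelate the novelty function and map lags to BPM bins."""
    nov = novelty - np.mean(novelty)
    acf = np.correlate(nov, nov, mode="full")[len(nov) - 1:]
    bpms = np.arange(min_bpm, max_bpm + 1)
    hist = np.zeros(len(bpms))
    for i, bpm in enumerate(bpms):
        lag = int(round(60.0 * frame_rate / bpm))  # lag in frames for this tempo
        if 0 < lag < len(acf):
            hist[i] = acf[lag]
    return bpms, hist
```

The histogram bin with the largest value indicates the dominant tempo periodicity; rhythm features are then derived from the shape of the histogram.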
@inproceedings{lykartsis_beat_2015-1,
title = {Beat Histogram Features from NMF-Based Novelty Functions for Music Classification},
author = {Athanasios Lykartsis and Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/10/Lykartsis-et-al_2015_Beat-Histogram-Features-from-NMF-Based-Novelty-Functions-for-Music.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {ISMIR},
address = {Malaga},
abstract = {In this paper we present novel rhythm features derived from drum tracks extracted from polyphonic music and evaluate them in a genre classification task. Musical excerpts are analyzed using an optimized, partially fixed Non-Negative Matrix Factorization (NMF) method, and beat histogram features are calculated on the basis of the resulting activation functions for each of three extracted drum tracks (hi-hat, snare drum, and bass drum). The features are evaluated on two widely used genre datasets (GTZAN and Ballroom) using standard classification methods with respect to the achieved overall classification accuracy. Furthermore, their suitability in distinguishing between rhythmically similar genres and the performance of the features resulting from individual activation functions are discussed. Results show that the presented NMF-based beat histogram features can provide performance comparable to other classification systems while considering strictly drum patterns.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{obrien_genre-specific_2015,
title = {Genre-Specific Key Profiles},
author = {Cian O'Brien and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/09/O'Brien_Lerch_2015_Genre-Specific%20Key%20Profiles.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the International Computer Music Conference (ICMC)},
publisher = {ICMA},
address = {Denton},
abstract = {The most common approaches to the automatic recognition of musical key are template-based, i.e., an extracted pitch chroma vector is compared to a template key profile in order to identify the most similar key. General as well as domain-specific templates have been used in the past, but to the authors' best knowledge there has been no study that evaluated genre-specific key profiles extracted from the audio signal. We investigate the pitch chroma distributions for 9 different genres, their distances, and the degree to which these genres can be identified from these distributions when utilizing different strategies for achieving key-invariance.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
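The template-matching approach this abstract describes can be sketched in a few lines: correlate a 12-bin pitch chroma against all twelve rotations of a key profile. The Krumhansl-Kessler major profile below stands in for the genre-specific profiles the paper derives; minor keys and the key-invariance strategies are omitted:

```python
import numpy as np

# Krumhansl-Kessler major-key profile (C major at index 0)
KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                     2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

def detect_key(chroma):
    """Return the major-key tonic (0 = C ... 11 = B) whose rotated
    profile correlates best with the 12-bin pitch chroma vector."""
    scores = [np.corrcoef(chroma, np.roll(KK_MAJOR, tonic))[0, 1]
              for tonic in range(12)]
    return int(np.argmax(scores))
```

Swapping `KK_MAJOR` for a profile learned from a genre-specific corpus gives the template-based setup whose genre dependence the paper investigates.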
@inproceedings{wu_drum_2015,
title = {Drum Transcription using Partially Fixed Non-Negative Matrix Factorization},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/09/Wu_Lerch_2015_Drum%20Transcription%20using%20Partially%20Fixed%20Non-Negative%20Matrix%20Factorization.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the European Signal Processing Conference (EUSIPCO)},
publisher = {EURASIP},
address = {Nice},
abstract = {In this paper, a drum transcription algorithm using partially fixed non-negative matrix factorization is presented. The proposed method allows users to identify percussive events in complex mixtures with a minimal training set. The algorithm decomposes the music signal into two parts: percussive part with pre-defined drum templates and harmonic part with undefined entries. The harmonic part is able to adapt to the music content, allowing the algorithm to work in polyphonic mixtures. Drum event times can be simply picked from the percussive activation matrix with onset detection. The system is efficient and robust even with a minimal training set. The recognition rates for the ENST dataset vary from 56.7 to 78.9% for three percussive instruments extracted from polyphonic music.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
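A minimal sketch of the partially fixed NMF idea from this abstract: the magnitude spectrogram is factorized with standard multiplicative KL-divergence updates while the drum-template columns of the dictionary stay fixed and the remaining columns adapt to the harmonic content. Rank, initialization, and the onset picking on the activation matrix are simplified relative to the paper:

```python
import numpy as np

def partially_fixed_nmf(V, W_fixed, n_free, n_iter=100, eps=1e-9):
    """Factorize V ~= [W_fixed | W_free] @ H with multiplicative
    KL-divergence updates; the W_fixed columns are never updated."""
    n_bins, n_frames = V.shape
    n_fixed = W_fixed.shape[1]
    rng = np.random.default_rng(0)
    W = np.hstack([W_fixed, rng.random((n_bins, n_free)) + eps])
    H = rng.random((n_fixed + n_free, n_frames)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
        WH = W @ H + eps
        W_new = W * ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
        W[:, n_fixed:] = W_new[:, n_fixed:]  # keep the drum templates fixed
    return W, H
```

Drum event times would then be picked, via onset detection, from the rows of `H` that correspond to the fixed templates.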
@inproceedings{wu_drum_2015-1,
title = {Drum Transcription using Partially Fixed Non-Negative Matrix Factorization With Template Adaptation},
author = {Chih-Wei Wu and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/10/Wu_Lerch_2015_Drum-Transcription-using-Partially-Fixed-Non-Negative-Matrix-Factorization-With.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {ISMIR},
address = {Malaga},
abstract = {In this paper, a template adaptive drum transcription algorithm using partially fixed Non-negative Matrix Factorization (NMF) is presented. The proposed method detects percussive events in complex mixtures of music with a minimal training set. The algorithm decomposes the music signal into two dictionaries: a percussive dictionary initialized with pre-defined drum templates and a harmonic dictionary initialized with undefined entries. The harmonic dictionary is adapted to the non-percussive music content in a standard NMF procedure. The percussive dictionary is adapted to each individual signal in an iterative scheme: it is fixed during the decomposition process, and is updated based on the result of the previous convergence. Two template adaptation methods are proposed to provide more flexibility and robustness in the case of unknown data. The performance of the proposed system has been evaluated and compared to state-of-the-art systems. The results show that template adaptation improves the transcription performance, and the detection accuracy is in the same range as more complex systems.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{zhou_chord_2015,
title = {Chord Detection Using Deep Learning},
author = {Xinquan Zhou and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/10/Zhou_Lerch_2015_Chord-Detection-Using-Deep-Learning.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
publisher = {ISMIR},
address = {Malaga},
abstract = {In this paper, we utilize deep learning to learn high-level features for audio chord detection. The learned features, obtained by a deep network in a bottleneck architecture, give promising results and outperform state-of-the-art systems. We present and evaluate the results for various methods and configurations, including input pre-processing, a bottleneck architecture, and SVMs vs. HMMs for chord classification.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2014
@inproceedings{coler_cmmsd:_2014,
title = {CMMSD: A Data Set for Note-Level Segmentation of Monophonic Music},
author = {Henrik von Coler and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/04/Coler_Lerch_2014_CMMSD.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the AES 53rd International Conference on Semantic Audio},
publisher = {Audio Engineering Society (AES)},
address = {London, UK},
abstract = {A musical data set for note-level segmentation of monophonic music is presented. It contains 36 excerpts from
commercial recordings of monophonic classical western music and features the instrument groups strings,
woodwind and brass. The excerpts are self-contained phrases with a mean length of 17.97 seconds and an
average of 20 notes. All phrases are played in moderate tempo, mostly with significant amounts of expressive
articulation. A manually annotated ground truth splits each item into a sequence of the three states note,
transition and rest. The set is designed as an open source project, aiming at the development and evaluation
of algorithms for segmentation, music performance analysis and feature selection. This paper presents the
process of ground truth labeling and a detailed description of the data set and its properties.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@incollection{lerch_music_2014,
title = {Music Information Retrieval},
author = {Alexander Lerch},
editor = {Stefan Weinzierl},
isbn = {978-3-89007-699-7},
year = {2014},
date = {2014-01-01},
booktitle = {Akustische Grundlagen der Musik},
number = {5},
pages = {79--102},
publisher = {Laaber},
series = {Handbuch der Systematischen Musikwissenschaft},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
2013
@inproceedings{kraft_tonalness_2013,
title = {The Tonalness Spectrum: Feature-Based Estimation of Tonal Components},
author = {Sebastian Kraft and Alexander Lerch and Udo Z\"{o}lzer},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/04/Kraft%20et%20al_2013_The%20Tonalness%20Spectrum.pdf},
year = {2013},
date = {2013-01-01},
urldate = {2014-01-16},
booktitle = {Proceedings of the 16th International Conference on Digital Audio Effects},
address = {Maynooth},
abstract = {The tonalness spectrum shows the likelihood of a spectral bin being part of a tonal or non-tonal component. It is a non-binary measure based on a set of established spectral features. An easily extensible framework for the computation, selection, and combination of features is introduced. The results are evaluated and compared in two ways: first with a data set of synthetically generated signals, and second with real music signals in the context of a typical MIR application.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2012
@book{lerch_introduction_2012,
title = {An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics},
author = {Alexander Lerch},
url = {http://ieeexplore.ieee.org/xpl/bkabstractplus.jsp?bkn=6266785},
isbn = {978-1-118-26682-3},
year = {2012},
date = {2012-01-01},
publisher = {Wiley-IEEE Press},
address = {Hoboken},
abstract = {With the proliferation of digital audio distribution over digital media, audio content analysis is fast becoming a requirement for designers of intelligent signal-adaptive audio processing systems. Written by a well-known expert in the field, this book provides quick access to different analysis algorithms and allows comparison between different approaches to the same task, making it useful for newcomers to audio signal processing and industry experts alike. A review of relevant fundamentals in audio signal processing, psychoacoustics, and music theory, as well as downloadable MATLAB files are also included. Please visit the companion website: www.AudioContentAnalysis.org},
keywords = {analysis, audio, audio signal processing, information, listening, machine, machine listening, music, music analysis, music information retrieval, processing, retrieval, signal},
pubstate = {published},
tppubtype = {book}
}
2011
@article{kirchhoff_evaluation_2011,
title = {Evaluation of Features for Audio-to-Audio Alignment},
author = {Holger Kirchhoff and Alexander Lerch},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/04/Kirchhoff_Lerch_2011_Evaluation%20of%20Features%20for%20Audio-to-Audio%20Alignment.pdf},
doi = {10.1080/09298215.2010.529917},
year = {2011},
date = {2011-01-01},
journal = {Journal of New Music Research},
volume = {40},
number = {1},
pages = {27--41},
abstract = {Audio-to-audio alignment is the task of synchronizing two audio sequences with similar musical content in time. We investigated a large set of audio features for this task. The features were chosen to represent four different content-dependent similarity categories: the envelope, the timbre, note-onsets and the pitch. The features were subjected to two processing stages. First, a feature subset was selected by evaluating the alignment performance of each individual feature. Second, the selected features were combined and subjected to an automatic weighting algorithm.
A new method for the objective evaluation of audio-to-audio alignment systems is proposed that enables the use of arbitrary kinds of music as ground truth data. We evaluated our algorithm by this method as well as on a data set of real recordings of solo piano music. The results showed that the feature weighting algorithm could improve the alignment accuracies compared to the results of the individual features.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
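The alignment core that the evaluated features feed into can be sketched as classic dynamic time warping over feature sequences; the feature extraction, subset selection, and weighting studied in the article sit on top of something like this:

```python
import numpy as np

def dtw_path(X, Y):
    """Align two feature sequences (frames x dims) with classic DTW
    using Euclidean frame distances; returns the warping path as a
    list of (i, j) frame-index pairs."""
    n, m = len(X), len(Y)
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    D = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = dist[i - 1, j - 1] + min(D[i - 1, j - 1],
                                               D[i - 1, j],
                                               D[i, j - 1])
    # backtrack from the end of both sequences
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Replacing the raw frames with envelope, timbre, onset, or pitch features (and weighting their distances) yields the configurations the article compares.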
@incollection{lerch_software-gestutzte_2011,
title = {Software-gest\"{u}tzte Merkmalsextraktion f\"{u}r die musikalische Auff\"{u}hrungsanalyse},
author = {Alexander Lerch},
editor = {Heinz von Loesch and Stefan Weinzierl},
isbn = {978-3-7957-0771-2},
year = {2011},
date = {2011-01-01},
booktitle = {Gemessene Interpretation - Computergest\"{u}tzte Auff\"{u}hrungsanalyse im Kreuzverh\"{o}r der Disziplinen},
pages = {205--212},
publisher = {Schott},
address = {Mainz},
series = {Klang und Begriff},
keywords = {},
pubstate = {published},
tppubtype = {incollection}
}
@inproceedings{ness_strategies_2011,
title = {Strategies for Orca Call Retrieval to Support Collaborative Annotation of a Large Archive},
author = {Steven R Ness and Alexander Lerch and George Tzanetakis},
url = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6093798},
doi = {10.1109/MMSP.2011.6093798},
isbn = {978-1-4577-1434-4},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the International Workshop on Multimedia Signal Processing (MMSP)},
publisher = {IEEE},
address = {Hangzhou},
abstract = {The Orchive is a large audio archive of hydrophone recordings of Killer whale (Orcinus orca) vocalizations. Researchers and users from around the world can interact with the archive using a collaborative web-based annotation, visualization and retrieval interface. In addition a mobile client has been written in order to crowdsource Orca call annotation. In this paper we describe and compare different strategies for the retrieval of discrete Orca calls. In addition, the results of the automatic analysis are integrated in the user interface facilitating annotation as well as leveraging the existing annotations for supervised learning. The best strategy achieves a mean average precision of 0.77 with the first retrieved item being relevant 95% of the time in a dataset of 185 calls belonging to 4 types.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
2010
@inproceedings{wiesener_adaptive_2010,
title = {Adaptive Noise Reduction for Real-time Applications},
author = {Constantin Wiesener and Tim Flohrer and Alexander Lerch and Stefan Weinzierl},
url = {http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/04/Wiesener%20et%20al_2010_Adaptive%20Noise%20Reduction%20for%20Real-time%20Applications.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 128th Audio Engineering Society Convention (Preprint #8048)},
publisher = {Audio Engineering Society},
address = {London},
abstract = {We present a new algorithm for real-time noise reduction of audio signals. In order to derive the noise reduction function, the proposed method adaptively estimates the instantaneous noise spectrum from an autoregressive signal model as opposed to the widely-used approach of using a constant noise spectrum fingerprint. In conjunction with the Ephraim and Malah suppression rule a significant reduction of both stationary and non-stationary noise can be obtained. The adaptive algorithm is able to work without user interaction and is capable of real-time processing. Furthermore, quality improvements are easily possible by integration of additional processing blocks such as transient preservation.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
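For context, the fixed noise-fingerprint approach that this adaptive method improves upon can be sketched in a few lines: estimate a constant noise magnitude spectrum from noise-only frames and subtract it per frame. Windowing, overlap-add, and the Ephraim and Malah rule itself are omitted, and `floor` is an illustrative spectral-floor parameter:

```python
import numpy as np

def spectral_subtraction(frames, noise_frames, floor=0.05):
    """Per-frame magnitude spectral subtraction with a fixed noise
    fingerprint estimated from known noise-only frames."""
    # constant noise spectrum fingerprint (mean magnitude spectrum)
    noise_mag = np.mean(np.abs(np.fft.rfft(noise_frames, axis=1)), axis=0)
    out = np.empty_like(frames)
    for i, frame in enumerate(frames):
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # subtract the fingerprint, but never drop below a spectral floor
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        out[i] = np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame))
    return out
```

The paper's contribution is precisely to replace the fixed `noise_mag` above with an instantaneous estimate from an autoregressive signal model, so non-stationary noise is also attenuated.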
2009
@book{lerch_software-based_2009,
title = {Software-Based Extraction of Objective Parameters from Music Performances},
author = {Alexander Lerch},
url = {http://dx.doi.org/10.14279/depositonce-2025},
isbn = {978-3-640-29496-1},
year = {2009},
date = {2009-01-01},
publisher = {GRIN Verlag},
address = {M\"{u}nchen},
abstract = {Different music performances of the same score may significantly differ from each other. It is obvious that not only the composer’s work, the score, defines the listener’s music experience, but that the music performance itself is an integral part of this experience. Music performers use the information contained in the score, but interpret, transform or add to this information. Four parameter classes can be used to describe a performance objectively: tempo and timing, loudness, timbre and pitch. Each class contains a multitude of individual parameters that are at the performers’ disposal to generate a unique physical rendition of musical ideas. The extraction of such objective parameters is one of the difficulties in music performance research. This work presents an approach to the software-based extraction of tempo and timing, loudness and timbre parameters from audio files to provide a tool for the automatic parameter extraction from music performances. The system is applied to extract data from 21 string quartet performances and a detailed analysis of the extracted data is presented. The main contributions of this thesis are the adaptation and development of signal processing approaches to performance parameter extraction and the presentation and discussion of string quartet performances of a movement of Beethoven’s late String Quartet op. 130.},
keywords = {analysis, audio, content, information, music, performance, retrieval},
pubstate = {published},
tppubtype = {book}
}
Concert Stitch: Organization and Synchronization of Crowd-Sourced Recordings Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Paris, 2018.
Analysis of Objective Descriptors for Music Performance Assessment Proceedings Article In: Proceedings of the International Conference on Music Perception and Cognition (ICMPC), Montreal, Canada, 2018.
Lead Sheet Generation with Musically Interdependent Networks Proceedings Article In: Late Breaking Abstract, Proceedings of Computer Simulation of Musical Creativity (CSMC), Dublin, 2018.
Assessment of Percussive Music Performances with Feature Learning Journal Article In: International Journal of Semantic Computing, vol. 12, no. 3, pp. 315–333, 2018, ISSN: 1793-351X.
On the evaluation of generative models in music Journal Article In: Neural Computing and Applications, 2018, ISSN: 1433-3058.
Blind Bandwidth Extension using K-Means and Support Vector Regression Proceedings Article In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, New Orleans, 2017.
Automatic Sample Detection in Polyphonic Music Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Suzhou, 2017.
Mixing Secrets: A multitrack dataset for instrument detection in polyphonic music Proceedings Article In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Suzhou, 2017.
A Dataset and Method for Electric Guitar Solo Detection in Rock Music Proceedings Article In: Proceedings of the AES Conference on Semantic Audio, Audio Engineering Society (AES), Erlangen, 2017.
MDB Drums --- An Annotated Subset of MedleyDB for Automatic Drum Transcription Proceedings Article In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Suzhou, 2017.
Objective descriptors for the assessment of student music performances Proceedings Article In: Proceedings of the AES Conference on Semantic Audio, Audio Engineering Society (AES), Erlangen, 2017.
Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), International Society for Music Information Retrieval (ISMIR), Suzhou, 2017.
Learning to Fuse Music Genres with Generative Adversarial Dual Learning Proceedings Article In: Proceedings of the International Conference on Data Mining (ICDM), Institute of Electrical and Electronics Engineers (IEEE), New Orleans, 2017.
Proceedings of the 2nd Web Audio Conference (WAC-2016) Book Georgia Institute of Technology, Atlanta, 2016, ISBN: 978-0-692-61973-5.
An Efficient Algorithm For Clipping Detection And Declipping Audio Proceedings Article In: Proceedings of the 141st AES Convention, Audio Engineering Society (AES), Los Angeles, 2016.
An Unsupervised Approach to Anomaly Detection in Music Datasets Proceedings Article In: Proceedings of the ACM SIGIR Conference (SIGIR), pp. 749–752, ACM, Pisa, 2016, ISBN: 978-1-4503-4069-4.
Automatic Outlier Detection in Music Genre Datasets Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, New York, 2016.
Automatic Practice Logging: Introduction, Dataset & Preliminary Study Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, New York, 2016.
Towards the Objective Assessment of Music Performances Proceedings Article In: Proceedings of the International Conference on Music Perception and Cognition (ICMPC), pp. 99–103, San Francisco, 2016, ISBN: 1-879346-65-5.
On Drum Playing Technique Detection in Polyphonic Mixtures Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, New York, 2016.
Learning to code through MIR Proceedings Article In: Late Breaking Demo (Extended Abstract), Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, New York, 2016.
On the Perceptual Relevance of Objective Source Separation Measures for Singing Voice Separation Proceedings Article In: Proceedings of the Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE, New Paltz, 2015.
Analysis of Speech Rhythm for Language Identification Based on Beat Histograms Proceedings Article In: Proceedings of the DAGA (Jahrestagung für Akustik), Nuremberg, 2015.
Beat Histogram Features for Rhythm-based Musical Genre Classification Using Multiple Novelty Functions Proceedings Article In: Proceedings of the International Conference on Digital Audio Effects (DAFX), Trondheim, Norway, 2015.
Beat Histogram Features from NMF-Based Novelty Functions for Music Classification Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, Malaga, 2015.
Genre-Specific Key Profiles Proceedings Article In: Proceedings of the International Computer Music Conference (ICMC), ICMA, Denton, 2015.
Drum Transcription using Partially Fixed Non-Negative Matrix Factorization Proceedings Article In: Proceedings of the European Signal Processing Conference (EUSIPCO), EURASIP, Nice, 2015.
Drum Transcription using Partially Fixed Non-Negative Matrix Factorization With Template Adaptation Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, Malaga, 2015.
Chord Detection Using Deep Learning Proceedings Article In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), ISMIR, Malaga, 2015.
CMMSD: A Data Set for Note-Level Segmentation of Monophonic Music Proceedings Article In: Proceedings of the AES 53rd International Conference on Semantic Audio, Audio Engineering Society (AES), London, UK, 2014.
Music Information Retrieval Book Section In: Weinzierl, Stefan (Ed.): Akustische Grundlagen der Musik, no. 5, pp. 79–102, Laaber, 2014, ISBN: 978-3-89007-699-7.
The Tonalness Spectrum: Feature-Based Estimation of Tonal Components Proceedings Article In: Proceedings of the 16th International Conference on Digital Audio Effects, Maynooth, 2013.
An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics Book Wiley-IEEE Press, Hoboken, 2012, ISBN: 978-1-118-26682-3.
Evaluation of Features for Audio-to-Audio Alignment Journal Article In: Journal of New Music Research, vol. 40, no. 1, pp. 27–41, 2011.
Software-gestützte Merkmalsextraktion für die musikalische Aufführungsanalyse Book Section In: von Loesch, Heinz; Weinzierl, Stefan (Ed.): Gemessene Interpretation - Computergestützte Aufführungsanalyse im Kreuzverhör der Disziplinen, pp. 205–212, Schott, Mainz, 2011, ISBN: 978-3-7957-0771-2.
Strategies for Orca Call Retrieval to Support Collaborative Annotation of a Large Archive Proceedings Article In: Proceedings of the International Workshop on Multimedia Signal Processing (MMSP), IEEE, Hangzhou, 2011, ISBN: 978-1-4577-1434-4.
Adaptive Noise Reduction for Real-time Applications Proceedings Article In: Proceedings of the 128th Audio Engineering Society Convention (Preprint #8048), Audio Engineering Society, London, 2010.
Software-Based Extraction of Objective Parameters from Music Performances Book GRIN Verlag, München, 2009, ISBN: 978-3-640-29496-1.