<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>project | Music Informatics Group</title>
	<atom:link href="https://musicinformatics.gatech.edu/category/project/feed/" rel="self" type="application/rss+xml" />
	<link>https://musicinformatics.gatech.edu</link>
	<description>Georgia Institute of Technology</description>
	<lastBuildDate>Mon, 23 Feb 2026 21:12:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Quantifying Spatial Audio Quality Impairment</title>
		<link>https://musicinformatics.gatech.edu/project/quantifying-spatial-audio-quality-impairment/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Mon, 08 Dec 2025 01:00:22 +0000</pubDate>
				<category><![CDATA[project]]></category>
		<category><![CDATA[publication]]></category>
		<guid isPermaLink="false">https://musicinformatics.gatech.edu/?p=656</guid>

					<description><![CDATA[<p>by Karn Watcharasupat Spatial audio quality is a highly multifaceted concept (see this for a very long list of things to consider). &#8220;Geometrical&#8221; components of spatial audio quality are perhaps the least subjective aspect of spatial audio quality to quantify, yet there have been very few attempts at dealing with it since BSS Eval came out &#8230; <a href="https://musicinformatics.gatech.edu/project/quantifying-spatial-audio-quality-impairment/" class="more-link">Continue reading <span class="screen-reader-text">Quantifying Spatial Audio Quality Impairment</span></a></p>
The post <a href="https://musicinformatics.gatech.edu/project/quantifying-spatial-audio-quality-impairment/">Quantifying Spatial Audio Quality Impairment</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></description>
										<content:encoded><![CDATA[<p style="text-align: justify;"><span style="font-size: 10pt;">by Karn Watcharasupat</span></p>
<p style="text-align: justify;">Spatial audio quality is a highly multifaceted concept (see <a href="https://depositonce.tu-berlin.de/items/50b7777f-ce30-431b-b371-55b977e0f707">this</a> for a very long list of things to consider). &#8220;Geometrical&#8221; components of spatial audio quality are perhaps the least subjective aspect to quantify, yet there have been very few attempts at dealing with it since <a href="https://gitlab.inria.fr/bass-db/bss_eval">BSS Eval</a> came out almost 20(!) years ago.</p>
<p style="text-align: justify;">Even the geometrical component of spatial audio quality is not trivial to quantify. We resorted to only considering the interchannel time differences (ITD) and interchannel level differences (ILD) of the test signal relative to a reference signal. With this, it is actually possible to construct a signal model to isolate <em>some</em> of the spatial distortion. By using a combination of Wiener-style least-square optimization and good ol&#8217; correlation maximization, we propose a signal decomposition method to isolate the spatial error, in terms of interchannel gain leakages and changes in relative delays, from a processed signal. These intermediate parameters can then be used as a diagnostic tool to identify the nature of the spatial distortion and to quantify the spatial quality impairment.</p>
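<p style="text-align: justify;">The correlation-maximization idea can be sketched in a few lines of NumPy; this is a minimal illustration of estimating ITD and ILD between two channels, not the paper's actual decomposition, and all signal names and parameters below are made up for the example.</p>

```python
import numpy as np

def estimate_itd_ild(reference, test, max_lag=64):
    """Return (lag in samples, level difference in dB) of test vs. reference."""
    lags = np.arange(-max_lag, max_lag + 1)
    # correlation maximization: the best lag is the one that aligns the channels
    corrs = [np.dot(np.roll(test, -lag), reference) for lag in lags]
    itd = int(lags[int(np.argmax(corrs))])
    # interchannel level difference from the RMS energy ratio
    eps = 1e-12
    ild_db = 20.0 * np.log10((np.sqrt(np.mean(test ** 2)) + eps)
                             / (np.sqrt(np.mean(reference ** 2)) + eps))
    return itd, ild_db

# toy check: a channel delayed by 5 samples and attenuated by half (about -6 dB)
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
itd, ild = estimate_itd_ild(ref, 0.5 * np.roll(ref, 5))
```

<p style="text-align: justify;">A real implementation would of course operate per-band and per-frame rather than on whole signals.</p>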
<h4 id="Methods" style="text-align: justify;">Methods</h4>The post <a href="https://musicinformatics.gatech.edu/project/quantifying-spatial-audio-quality-impairment/">Quantifying Spatial Audio Quality Impairment</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Evaluation of Generative Models in Music</title>
		<link>https://musicinformatics.gatech.edu/project/evaluation-of-generative-models-in-music/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Fri, 26 Sep 2025 01:50:21 +0000</pubDate>
				<category><![CDATA[project]]></category>
		<category><![CDATA[publication]]></category>
		<guid isPermaLink="false">https://musicinformatics.gatech.edu/?p=680</guid>

					<description><![CDATA[<p>by Alexander Lerch Generative Artificial intelligence is increasingly capable of composing music, from short melodies to full songs. Despite the increasing number of new, &#8220;superior&#8221; models, there has been no consensus on how to measure this progress. How do we know if one model is indeed better than the other? Evaluating AI-generated music is challenging &#8230; <a href="https://musicinformatics.gatech.edu/project/evaluation-of-generative-models-in-music/" class="more-link">Continue reading <span class="screen-reader-text">Evaluation of Generative Models in Music</span></a></p>
The post <a href="https://musicinformatics.gatech.edu/project/evaluation-of-generative-models-in-music/">Evaluation of Generative Models in Music</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></description>
										<content:encoded><![CDATA[<p style="text-align: justify;"><span style="font-size: 10pt;">by Alexander Lerch</span></p>
<p style="text-align: justify;">Generative artificial intelligence is increasingly capable of composing music, from short melodies to full songs. Despite the increasing number of new, &#8220;superior&#8221; models, there has been no consensus on how to measure this progress. How do we know if one model is indeed better than another?</p>
<p style="text-align: justify;">Evaluating AI-generated music is challenging because music perception is inherently subjective. There is no single &#8220;correct&#8221; or &#8220;best&#8221; version of a song, people&#8217;s tastes vary widely, and objectively evaluating elusive properties such as aesthetics, musicality, creativity, or emotional impact is ultimately pointless. The language of music is complex and abstract, and its perception subjective.</p>
<h4 id="Evaluation Targets" style="text-align: justify;">Evaluation Targets</h4>
<p><a href="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2025/09/202509-eval.png"><img fetchpriority="high" decoding="async" class="aligncenter size-full wp-image-681" src="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2025/09/202509-eval.png" alt="" width="1100" height="620" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2025/09/202509-eval.png 1100w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2025/09/202509-eval-300x169.png 300w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2025/09/202509-eval-1024x577.png 1024w" sizes="(max-width: 1100px) 100vw, 1100px" /></a></p>
<p style="text-align: justify;">The paper breaks evaluation into two main categories:</p>
<ul>
<li>System Output: focusing on the generated output of a system and its properties</li>
<li>User Experience: focusing on how people interact with a generative system.</li>
</ul>
<p style="text-align: justify;">Researchers use both subjective and objective methods. Subjective methods include listening tests, surveys, and Turing-style tests where listeners try to guess whether a piece was composed by a human or a machine. Objective methods use mathematical metrics to compare the AI&#8217;s output to human-composed music, measuring things like pitch distribution, rhythm patterns, and audio fidelity.</p>
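<p style="text-align: justify;">As a toy illustration of the objective approach (not a metric from the paper), one can compare the pitch-class distribution of generated notes against human-composed material; the note lists below are invented for the example.</p>

```python
import numpy as np

def pitch_class_histogram(midi_pitches):
    """Normalized 12-bin histogram of pitch classes (C=0 ... B=11)."""
    counts = np.bincount(np.asarray(midi_pitches) % 12, minlength=12)
    return counts / counts.sum()

human = [60, 62, 64, 65, 67, 69, 71, 72]   # C major scale (made up)
model = [60, 61, 63, 66, 68, 70, 60, 61]   # chromatic-leaning output (made up)

# L1 distance between the two distributions: 0 = identical, 2 = fully disjoint
dist = np.abs(pitch_class_histogram(human) - pitch_class_histogram(model)).sum()
```

<p style="text-align: justify;">Such a distance says nothing about musicality by itself, which is exactly why the paper stresses the limited perceptual meaning of individual metrics.</p>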
<h4 id="Challenges and Conclusion" style="text-align: justify;">Challenges and Conclusion</h4>
<p style="text-align: justify;">There are several major challenges in evaluating generative music systems. First, the validity of existing methodologies is limited. Second, existing metrics have limited and/or unknown musical and perceptual meaning. Third, there is no standard set of metrics, which makes it hard to compare different systems. In addition, there are concerns around the topic of responsible AI.</p>
<p style="text-align: justify;">The paper calls for more consistent, interdisciplinary approaches to evaluating generative music, highlighting the need for better metrics, more transparent research practices, and deeper collaboration between computer scientists, musicians, and psychologists.</p>
<h4 id="Resources" style="text-align: justify;">Resources</h4>
<p style="text-align: justify;">Please find the <a href="https://dl.acm.org/doi/10.1145/3769106">open access survey paper</a> for more details.</p>The post <a href="https://musicinformatics.gatech.edu/project/evaluation-of-generative-models-in-music/">Evaluation of Generative Models in Music</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models</title>
		<link>https://musicinformatics.gatech.edu/project/latte-cross-framework-python-package-for-evaluation-of-latent-based-generative-models/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Fri, 25 Feb 2022 18:44:34 +0000</pubDate>
				<category><![CDATA[project]]></category>
		<category><![CDATA[publication]]></category>
		<guid isPermaLink="false">https://musicinformatics.gatech.edu/?p=610</guid>

					<description><![CDATA[<p>by Karn N. Watcharasupat and Junyoung Lee Controllable deep generative models have promising applications in various fields such as computer vision, natural language processing, or music. However, implementations of evaluation metrics for these generative models remain non-standardized. Evaluating disentanglement learning, in particular, might require implementing your own metrics, possibly entangling you more than when you &#8230; <a href="https://musicinformatics.gatech.edu/project/latte-cross-framework-python-package-for-evaluation-of-latent-based-generative-models/" class="more-link">Continue reading <span class="screen-reader-text">Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models</span></a></p>
The post <a href="https://musicinformatics.gatech.edu/project/latte-cross-framework-python-package-for-evaluation-of-latent-based-generative-models/">Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></description>
										<content:encoded><![CDATA[<p style="text-align: justify;"><span style="font-size: 10pt;">by Karn N. Watcharasupat and Junyoung Lee</span></p>
<p style="text-align: justify;">Controllable deep generative models have promising applications in various fields such as computer vision, natural language processing, or music. However, implementations of evaluation metrics for these generative models remain non-standardized. Evaluating disentanglement learning, in particular, might require implementing your own metrics, possibly entangling you more than when you started.</p>
<p style="text-align: justify;"><strong>Latte</strong> (for <strong>lat</strong>ent <strong>t</strong>ensor <strong>e</strong>valuation) is a package designed to help both you and your latent-based model to stay disentangled at the end of the day (YMMV, of course). The Python package is, by design, created to work with both <a href="https://pytorch.org/">PyTorch</a> and <a href="https://www.tensorflow.org/">TensorFlow</a>. All metrics are first implemented in <a href="https://numpy.org/">NumPy</a> with minimal dependencies (like <a href="https://scikit-learn.org">scikit-learn</a>) and then a wrapper is created to turn these NumPy functions into TorchMetrics or Keras Metric modules. This way, each metric is always calculated in the exact same way regardless of the deep learning framework being used. In addition, the functional NumPy API is also exposed, so that post-hoc evaluation or models from other frameworks can also enjoy our metric implementations.</p>
<p style="text-align: justify;">Currently, our package supports the classic disentanglement metrics: <em>Mutual Information Gap (MIG)</em>, <em>Separate Attribute Predictability (SAP)</em>, and <em>Modularity</em>. In addition, several dependency-aware variants of MIG proposed <a href="https://archives.ismir.net/ismir2021/latebreaking/000002.pdf">here</a> and <a href="https://www.researchgate.net/publication/356259963_Controllable_Music_Supervised_Learning_of_Disentangled_Representations_for_Music_Generation">here</a> are also included. These metrics are useful for situations where your semantic attributes are inherently dependent on one another, a situation where traditional metrics might penalize a latent space that has <em>correctly</em> learned the nature of the semantic attributes. Latte also implements interpolatability metrics which evaluate how smoothly or monotonically your decoder translates the latent vectors into generated samples.</p>
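<p style="text-align: justify;">For intuition, MIG can be sketched from scratch: for each ground-truth attribute, it is the gap between the two most informative latent dimensions, normalized by the attribute's entropy. The following is an illustration of the metric's definition only, not Latte's actual implementation or API; the toy latent space is made up.</p>

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mig(latents, attribute, n_bins=20):
    """Mutual Information Gap for one discrete attribute.

    latents: (n_samples, n_dims) continuous codes; attribute: (n_samples,).
    """
    mis = []
    for d in range(latents.shape[1]):
        # discretize each latent dimension before estimating mutual information
        edges = np.histogram_bin_edges(latents[:, d], bins=n_bins)
        mis.append(mutual_info_score(attribute, np.digitize(latents[:, d], edges)))
    top = np.sort(mis)[::-1]
    _, counts = np.unique(attribute, return_counts=True)
    p = counts / counts.sum()
    entropy = -np.sum(p * np.log(p))            # H(attribute), in nats
    return (top[0] - top[1]) / entropy

# toy latent space: dimension 0 encodes the attribute, dimension 1 is noise
rng = np.random.default_rng(0)
attr = rng.integers(0, 4, size=2000)
z = np.stack([attr + 0.1 * rng.standard_normal(2000),
              rng.standard_normal(2000)], axis=1)
score = mig(z, attr)    # close to 1: one dimension cleanly captures the attribute
```

<p style="text-align: justify;">A score near 1 indicates that exactly one latent dimension captures the attribute, which is the disentangled case.</p>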
<p style="text-align: justify;">To make life simpler, Latte also comes equipped with <em>metric bundles</em> which are optimized implementations of multiple metrics commonly used together. The bundles optimize away duplicate computation of identical or similar steps in the metric computation, reducing both the lines of code needed and the runtime. We are working to add more metrics and bundles into our package. The most updated list can always be found at our <a href="https://github.com/karnwatcharasupat/latte">GitHub repository</a>.</p>
<p style="text-align: justify;">Latte can be easily installed via <code>pip</code> using <code>pip install latte-metrics</code>. We have also created a few <em>Google Colab</em> notebooks demonstrating how you can use Latte to evaluate an attribute-regularized VAE for controlling MNIST digits, using <a href="https://colab.research.google.com/github/karnwatcharasupat/latte/blob/main/examples/morphomnist/morphomnist-torch.ipynb">vanilla PyTorch</a>, <a href="https://colab.research.google.com/github/karnwatcharasupat/latte/blob/main/examples/morphomnist/morphomnist-lightning.ipynb">PyTorch Lightning</a>, and <a href="https://colab.research.google.com/github/karnwatcharasupat/latte/blob/main/examples/morphomnist/morphomnist-keras.ipynb">TensorFlow</a>. The full documentation of our package can be found <a href="https://latte.readthedocs.io/en/latest">here</a>.</p>
<h4 id="Resources" style="text-align: justify;">Resources</h4>
<ul>
<li>Open-Access Paper in <em>Software Impacts</em>: <a href="https://www.sciencedirect.com/science/article/pii/S2665963822000033">https://www.sciencedirect.com/science/article/pii/S2665963822000033</a></li>
<li>GitHub repository: <a href="https://github.com/karnwatcharasupat/latte">https://github.com/karnwatcharasupat/latte</a></li>
</ul>The post <a href="https://musicinformatics.gatech.edu/project/latte-cross-framework-python-package-for-evaluation-of-latent-based-generative-models/">Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>From labeled to unlabeled data &#8211; on the data challenge in automatic drum transcription</title>
		<link>https://musicinformatics.gatech.edu/project/from-labeled-to-unlabeled-data-on-the-data-challenge-in-automatic-drum-transcription/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Fri, 18 Jan 2019 16:00:44 +0000</pubDate>
				<category><![CDATA[project]]></category>
		<category><![CDATA[publication]]></category>
		<guid isPermaLink="false">http://www.musicinformatics.gatech.edu/?p=434</guid>

					<description><![CDATA[<p>by Chih-Wei Wu Automatic Drum Transcription (ADT) is an on-going research topic that concerns the extraction of drum events from music signals. After roughly three decades of research on this topic, many methods and several datasets have been proposed to address this problem. However, similar to many other Music Information Retrieval (MIR) research topics, the &#8230; <a href="https://musicinformatics.gatech.edu/project/from-labeled-to-unlabeled-data-on-the-data-challenge-in-automatic-drum-transcription/" class="more-link">Continue reading <span class="screen-reader-text">From labeled to unlabeled data &#8211; on the data challenge in automatic drum transcription</span></a></p>
The post <a href="https://musicinformatics.gatech.edu/project/from-labeled-to-unlabeled-data-on-the-data-challenge-in-automatic-drum-transcription/">From labeled to unlabeled data – on the data challenge in automatic drum transcription</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></description>
										<content:encoded><![CDATA[<p><span style="font-size: 8pt;">by Chih-Wei Wu</span></p>
<p>Automatic Drum Transcription (ADT) is an on-going research topic that concerns the extraction of drum events from music signals. After roughly three decades of research on this topic, many methods and several datasets have been proposed to address this problem. However, similar to many other Music Information Retrieval (MIR) research topics, the availability of realistic and carefully curated datasets is one of the bottlenecks for advancing the performance of ADT systems.</p>
<p>In our <a href="http://www.musicinformatics.gatech.edu/project/automatic-drum-transcription-using-the-student-teacher-learning-paradigm-with-unlabeled-music-data/" target="_blank" rel="noopener">previous blog post</a>, we briefly discussed this challenge in the context of ADT. With a standard annotated dataset (i.e., ENST drums) and a small collection of unlabeled data, we demonstrated the possibility of harnessing unlabeled music data for improvements in the context of ADT.<br />
In this paper, we explore this idea further in the following directions:</p>
<ol>
<li>Identify the major types of ADT systems and investigate generic methods for integrating unlabeled data into these systems accordingly</li>
<li>Train the systems using a large scale unlabeled music dataset</li>
<li>Evaluate all systems using multiple labeled datasets currently available</li>
</ol>
<p>The intention is to validate the idea of using unlabeled data for ADT at a large scale. To achieve this goal, we present two approaches.</p>
<h4>Method</h4>
<p>To show that the benefit of using unlabeled data can be generalized to most ADT systems, the first step is to identify the most popular ADT approaches. To this end, we reviewed existing ADT systems (for more information, please refer to this <a href="https://ieeexplore.ieee.org/document/8350302" target="_blank" rel="noopener">recent publication</a>), and we found that the two most popular approaches can be categorized as Segment-and-classify (classification-based) and Separate-and-detect (activation-based).</p>
<p>Based on these two types of approaches, two learning paradigms for incorporating unlabeled data are evaluated. These are</p>
<ol>
<li>Feature Learning and</li>
<li>Student Teacher Learning.</li>
</ol>
<p>As shown in the figure below, both paradigms may extract information from unlabeled data and transfer them to ADT systems through different mechanisms; the feature learning paradigm learns a feature extractor that computes distinctive features from audio signals, whereas the student teacher learning paradigm focuses on generating “pseudo ground truth” (i.e., soft targets) using teacher models and passing them onto student models. Different variants of both paradigms are evaluated.</p>
<p><img decoding="async" class="aligncenter size-full wp-image-435" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/method-overview.png" alt="" width="1733" height="409" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/method-overview.png 1733w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/method-overview-300x71.png 300w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/method-overview-1024x242.png 1024w" sizes="(max-width: 1733px) 100vw, 1733px" /></p>
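<p>The student-teacher mechanism can be sketched as follows. The teacher here is a stand-in function and the student a simple linear model, both purely illustrative (the actual systems in the paper are NMF- and neural-network-based); only the flow of "pseudo ground truth" from teacher to student reflects the paradigm.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
unlabeled = rng.standard_normal((500, 16))      # unlabeled feature frames

def teacher(x):
    """Stand-in teacher model producing soft drum-activation targets."""
    w = np.linspace(-1.0, 1.0, 16)              # fixed, pretend-pretrained weights
    return 1.0 / (1.0 + np.exp(-x @ w))         # soft targets in (0, 1)

soft_targets = teacher(unlabeled)               # "pseudo ground truth"

# student: a linear probe (with bias) fit to the teacher's soft targets,
# i.e., trained without any human annotations
X = np.column_stack([unlabeled, np.ones(len(unlabeled))])
student_w, *_ = np.linalg.lstsq(X, soft_targets, rcond=None)
mse = np.mean((X @ student_w - soft_targets) ** 2)
```

<p>The interesting empirical question, addressed below, is whether such students can end up outperforming their teachers on real labeled data.</p>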
<h4>Results 1: we need more labeled data</h4>
<p>In the first part of the experiment results, the averaged performance across all evaluated systems for each labeled dataset (e.g., ENST drums, MIREX 2005, MDB drums, RBMA) is presented. As shown in the following figure, for each individual drum instrument such as Kick Drum (KD), Snare Drum (SD), and HiHat (HH), the averaged performances differ from dataset to dataset. This result not only shows the relative difficulties of these datasets, but also implies the danger of relying solely on one dataset (which is exactly the case in many prior ADT studies). This result highlights the need for more diverse labeled datasets!</p>
<p><img decoding="async" class="aligncenter size-full wp-image-436" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/cw_results1.png" alt="" width="1729" height="769" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/cw_results1.png 1729w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/cw_results1-300x133.png 300w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/cw_results1-1024x455.png 1024w" sizes="(max-width: 1729px) 100vw, 1729px" /></p>
<h4>Results 2: unlabeled data is useful</h4>
<p>In the second part of the experiment results, different systems for each learning paradigm are compared under controlled conditions (e.g., training methods, the number of unlabeled examples used, etc.). For the feature learning paradigm, as shown in the following table, both evaluated systems outperformed the baseline systems on SD in terms of averaged F-Measure. This improvement was confirmed through a statistical test. This result suggests that Segment-and-classify ADT systems can successfully benefit from unlabeled data through feature learning.</p>
<table>
<tbody>
<tr>
<td style="width: 206.883px;"><strong>Role</strong></td>
<td style="width: 151.9px;"><strong>System</strong></td>
<td style="width: 81.8667px;"><strong>HH</strong></td>
<td style="width: 73.8333px;"><strong>BD</strong></td>
<td style="width: 69.5167px;"><strong>SD</strong></td>
</tr>
<tr>
<td style="width: 206.883px;">Baseline</td>
<td style="width: 151.9px;">MFCC</td>
<td style="width: 81.8667px;">0.61</td>
<td style="width: 73.8333px;"><strong>0.62</strong></td>
<td style="width: 69.5167px;">0.40</td>
</tr>
<tr>
<td style="width: 206.883px;">Baseline</td>
<td style="width: 151.9px;">CONV-RANDOM</td>
<td style="width: 81.8667px;">0.61</td>
<td style="width: 73.8333px;">0.54</td>
<td style="width: 69.5167px;">0.39</td>
</tr>
<tr>
<td style="width: 206.883px;">Evaluated</td>
<td style="width: 151.9px;">CONV-AE</td>
<td style="width: 81.8667px;">0.61</td>
<td style="width: 73.8333px;"><strong>0.62</strong></td>
<td style="width: 69.5167px;"><strong>0.42</strong></td>
</tr>
<tr>
<td style="width: 206.883px;">Evaluated</td>
<td style="width: 151.9px;">CONV-DAE</td>
<td style="width: 81.8667px;">0.61</td>
<td style="width: 73.8333px;">0.61</td>
<td style="width: 69.5167px;"><strong>0.42</strong></td>
</tr>
</tbody>
</table>
<p>For the student-teacher learning paradigm, encouraging results can also be found (this time on HH). In the following table, it is shown that all student models performed better on HH compared to both teacher models. This result indicates the possibility of obtaining students that are better than their teachers with the help of unlabeled data. Additionally, this result confirms the finding of our previous paper at a larger scale.</p>
<table>
<tbody>
<tr>
<td style="width: 149.283px;"><strong>Role</strong></td>
<td style="width: 232.383px;"><strong>System</strong></td>
<td style="width: 66.5833px;"><strong>HH</strong></td>
<td style="width: 67.8833px;"><strong>BD</strong></td>
<td style="width: 67.8667px;"><strong>SD</strong></td>
</tr>
<tr>
<td style="width: 149.283px;">Teacher</td>
<td style="width: 232.383px;">PFNMF (SMT)</td>
<td style="width: 66.5833px;">0.47</td>
<td style="width: 67.8833px;">0.61</td>
<td style="width: 67.8667px;"><strong>0.45</strong></td>
</tr>
<tr>
<td style="width: 149.283px;">Teacher</td>
<td style="width: 232.383px;">PFNMF (200D)</td>
<td style="width: 66.5833px;">0.47</td>
<td style="width: 67.8833px;"><strong>0.67</strong></td>
<td style="width: 67.8667px;">0.40</td>
</tr>
<tr>
<td style="width: 149.283px;">Student</td>
<td style="width: 232.383px;">FC-200</td>
<td style="width: 66.5833px;"><strong>0.56</strong></td>
<td style="width: 67.8833px;">0.57</td>
<td style="width: 67.8667px;">0.44</td>
</tr>
<tr>
<td style="width: 149.283px;">Student</td>
<td style="width: 232.383px;">FC-ALL</td>
<td style="width: 66.5833px;">0.53</td>
<td style="width: 67.8833px;">0.59</td>
<td style="width: 67.8667px;">0.42</td>
</tr>
<tr>
<td style="width: 149.283px;">Student</td>
<td style="width: 232.383px;">FC-ALL (ALT)</td>
<td style="width: 66.5833px;">0.55</td>
<td style="width: 67.8833px;">0.58</td>
<td style="width: 67.8667px;">0.44</td>
</tr>
</tbody>
</table>
<h4>Conclusion</h4>
<p>According to the results, both learning paradigms can potentially improve ADT performance with the addition of unlabeled data. However, each paradigm seems to benefit different instruments. In other words, it is not easy to conclude which paradigm is “the way to go” when it comes to harnessing unlabeled resources. Simply put, unlabeled data certainly has potential for improving ADT systems, and further investigation is worthwhile.</p>
<p>If you are interested in learning more details about this work, please refer to <a href="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Wu-and-Lerch-From-Labeled-to-Unlabeled-Data-On-the-Data-Chal.pdf" target="_blank" rel="noopener">our paper</a>. The code is available on <a href="https://github.com/cwu307/ADT_with_unlabeledData" target="_blank" rel="noopener">github</a>.</p>The post <a href="https://musicinformatics.gatech.edu/project/from-labeled-to-unlabeled-data-on-the-data-challenge-in-automatic-drum-transcription/">From labeled to unlabeled data – on the data challenge in automatic drum transcription</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>On the evaluation of generative models in music</title>
		<link>https://musicinformatics.gatech.edu/project/on-the-evaluation-of-generative-models-in-music/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Wed, 09 Jan 2019 18:53:14 +0000</pubDate>
				<category><![CDATA[project]]></category>
		<category><![CDATA[publication]]></category>
		<guid isPermaLink="false">http://www.musicinformatics.gatech.edu/?p=422</guid>

					<description><![CDATA[<p>by Li-Chia Richard Yang Generative modeling has attracted research interest across a wide variety of creative tasks. Just as deep learning has reshaped the whole field of artificial intelligence, it has reinvented generative modeling in recent years, e.g., in music or painting. Regardless, however, of the research interest in generative systems, the assessment and &#8230; <a href="https://musicinformatics.gatech.edu/project/on-the-evaluation-of-generative-models-in-music/" class="more-link">Continue reading <span class="screen-reader-text">On the evaluation of generative models in music</span></a></p>
The post <a href="https://musicinformatics.gatech.edu/project/on-the-evaluation-of-generative-models-in-music/">On the evaluation of generative models in music</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></description>
										<content:encoded><![CDATA[<p><span style="font-size: 10pt;">by Li-Chia Richard Yang</span></p>



<p>Generative modeling has attracted research interest across a wide variety of creative tasks. Just as deep learning has reshaped the whole field of artificial intelligence, it has reinvented generative modeling in recent years, e.g., in <a href="https://magenta.tensorflow.org" rel="noreferrer noopener" aria-label="Music  (opens in a new tab)">music </a>or <a href="https://deepart.io" target="_blank" rel="noreferrer noopener" aria-label="Painting (opens in a new tab)">painting</a>. Regardless, however, of the research interest in generative systems, the assessment and evaluation of such systems has proven challenging.</p>



<p>In recent research on music generation, various data-driven models have shown promising results. As a quick example, here are two generated samples from two distinct systems:</p>



<figure class="wp-block-audio"><audio src="https://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/audio_content1.mp3" controls="controls"></audio>
<figcaption>Magenta (Attention RNN)</figcaption>
</figure>



<figure class="wp-block-audio"><audio src="https://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/audio_content2.mp3" controls="controls"></audio>
<figcaption>Magenta (Lookback RNN)</figcaption>
</figure>



<p>Now, how can we analyze and compare the behavior of these models?<br />As the ultimate judge of creative output is the human listener or viewer, subjective evaluation is generally preferable for generative modeling. However, subjective evaluation has well-known drawbacks: it requires considerable resources and is sensitive to experiment design. Objective evaluation, in contrast, has the advantage of providing a systematic, repeatable measurement across a significant number of generated samples.</p>



<h4 class="wp-block-heading">The proposed evaluation strategy</h4>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="624" height="285" class="wp-image-425" src="https://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/flow.jpg" alt="" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/flow.jpg 624w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/flow-300x137.jpg 300w" sizes="auto, (max-width: 624px) 100vw, 624px" /></figure>



<p>The proposed method does not aim at assessing musical pieces in the context of human-level creativity nor does it attempt to model the aesthetic perception of music.</p>



<p>It rather applies the concept of multi-criteria evaluation in order to provide metrics that assess basic technical properties of the generated music and help researchers identify issues and specific characteristics of both model and dataset. In a first step, we define two collections of samples as our datasets (in case of objective evaluation, one dataset contains the generated samples, the other contains samples from the training dataset). Then, we extract a set of features based on musical domain-knowledge for two main targets of the proposed evaluation strategy:</p>
<h5>Absolute Measurements</h5>

<p>Absolute measurements give insights into properties and characteristics of a generated or collected set of data.<br />During the model design phase of a generative system, it can be of interest to investigate absolute metrics from the output of different system iterations or of datasets, as opposed to a relative evaluation. A typical example is the comparison of the generated results from two generative systems: although the model properties cannot be determined precisely for a data-driven approach, the observation of the generated samples can justify or invalidate a system design.<br />For instance, an absolute measurement can be a statistical analysis of the note length transition histograms of the samples in a given dataset. In the following figure, we can easily observe how this feature differs between datasets from two different genres.</p>
<p><img loading="lazy" decoding="async" class="aligncenter size-full wp-image-427" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/experiment1.jpg" alt="" width="624" height="298" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/experiment1.jpg 624w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/experiment1-300x143.jpg 300w" sizes="auto, (max-width: 624px) 100vw, 624px" /></p>
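<p>As an illustration, such a note length transition histogram can be computed roughly as follows. This is a minimal sketch with made-up note lengths; the feature extraction in the mgeval toolbox operates on MIDI data and may differ in its binning details.</p>

```python
import numpy as np

def note_length_transition_hist(note_lengths, bins):
    """Histogram of transitions between consecutive note-length classes,
    normalized to sum to 1 (an absolute measurement for one sample)."""
    idx = np.digitize(note_lengths, bins)      # class index of each note
    n = len(bins) + 1
    mat = np.zeros((n, n))
    for a, b in zip(idx[:-1], idx[1:]):        # consecutive note pairs
        mat[a, b] += 1
    return mat / max(mat.sum(), 1)

# toy note lengths (in beats) for one sample of a dataset
bins = [0.25, 0.5, 1.0, 2.0, 4.0]
sample = [0.25, 0.25, 0.5, 1.0, 0.5, 0.25]
hist = note_length_transition_hist(sample, bins)
```

<p>Comparing such per-sample histograms across two datasets surfaces genre differences like the ones shown in the figure above.</p>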

<h5>Relative Measurements</h5>
<p>In order to enable the comparison of different sets of data, the relative measurements generalize the results across features of various dimensions.<br />We first perform pairwise exhaustive cross-validation for each feature and smooth the histogram results into probability density functions (PDFs) for a more generalizable representation. If the cross-validation is computed within one set of data, we refer to the results as intra-set distances; if each sample of one set is compared with all samples of the other set, we call them inter-set distances. <br />Finally, we measure the similarity between these distributions and compute two metrics between the target dataset&#8217;s intra-set PDF and the inter-set PDF: the Kullback-Leibler divergence (KLD) and the overlapped area (OA) of the two PDFs.<br />Take the following figure as an example: assume set1 is the training data, while set2 and set3 correspond to the generated results from two systems. It can easily be observed that the system that generated set2 behaves more like the training data with respect to this feature.</p>
<p><img loading="lazy" decoding="async" class="aligncenter size-full wp-image-428" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/example_pdf.jpg" alt="" width="624" height="455" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/example_pdf.jpg 624w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/example_pdf-300x219.jpg 300w" sizes="auto, (max-width: 624px) 100vw, 624px" /></p>
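<p>The intra-set/inter-set distance computation and the two similarity metrics can be sketched as follows for a one-dimensional feature. The smoothing here uses Gaussian KDE and the data is synthetic; the toolbox's exact smoothing and distance choices may differ.</p>

```python
import numpy as np
from scipy.stats import gaussian_kde, entropy

def pairwise_distances(a, b=None):
    """Exhaustive pairwise absolute distances for a scalar feature:
    intra-set distances of `a` if `b` is None, otherwise inter-set
    distances between every sample of `a` and every sample of `b`."""
    a = np.asarray(a, dtype=float)
    if b is None:
        return np.abs(a[:, None] - a[None, :])[np.triu_indices(len(a), k=1)]
    b = np.asarray(b, dtype=float)
    return np.abs(a[:, None] - b[None, :]).ravel()

def kld_and_overlap(d1, d2, n_grid=1000):
    """Smooth two distance samples into PDFs via Gaussian KDE, then
    return KL(p1 || p2) and the overlapped area of the two PDFs."""
    grid = np.linspace(min(d1.min(), d2.min()),
                       max(d1.max(), d2.max()), n_grid)
    p1, p2 = gaussian_kde(d1)(grid), gaussian_kde(d2)(grid)
    step = grid[1] - grid[0]
    kld = entropy(p1, p2)                 # KL divergence (discretized)
    oa = np.minimum(p1, p2).sum() * step  # overlapped area of the PDFs
    return kld, oa

rng = np.random.default_rng(0)
train = rng.normal(0.5, 0.10, 50)  # toy per-sample feature values
gen = rng.normal(0.6, 0.15, 50)
intra = pairwise_distances(train)          # within the training set
inter = pairwise_distances(train, gen)     # training vs. generated
kld, oa = kld_and_overlap(intra, inter)
```

<p>A small KLD and a large OA indicate that the generated set reproduces the training set's behavior in that feature.</p>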
<h4>Find out more</h4>
<p>Check out our <span class="Hyperlink0"><a href="https://rdcu.be/baHuU"><span style="color: black;">paper</span></a></span> for detailed use-case demonstrations and the released <span class="Hyperlink0"><a href="https://github.com/RichardYang40148/mgeval"><span style="color: black;">toolbox</span></a></span> for further applications.</p>The post <a href="https://musicinformatics.gatech.edu/project/on-the-evaluation-of-generative-models-in-music/">On the evaluation of generative models in music</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Instrument Activity Detection in Polyphonic Music</title>
		<link>https://musicinformatics.gatech.edu/conferences/instrument-activity-detection-in-polyphonic-music/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Thu, 20 Dec 2018 00:45:02 +0000</pubDate>
				<category><![CDATA[conferences]]></category>
		<category><![CDATA[project]]></category>
		<category><![CDATA[publication]]></category>
		<guid isPermaLink="false">http://www.musicinformatics.gatech.edu/?p=411</guid>

					<description><![CDATA[<p>by Siddharth Gururani Most forms of music are rendered as a mixture of acoustic and electronic instruments. The human ear, for the most part, is able to discern the instruments being played in a song fairly easily. However, the same is not true for computers or machines. The task of recognizing instrumentation in music is &#8230; <a href="https://musicinformatics.gatech.edu/conferences/instrument-activity-detection-in-polyphonic-music/" class="more-link">Continue reading <span class="screen-reader-text">Instrument Activity Detection in Polyphonic Music</span></a></p>
The post <a href="https://musicinformatics.gatech.edu/conferences/instrument-activity-detection-in-polyphonic-music/">Instrument Activity Detection in Polyphonic Music</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></description>
										<content:encoded><![CDATA[<p><span style="font-size: 10pt;">by Siddharth Gururani</span></p>



<p>Most forms of music are rendered as a mixture of acoustic and electronic instruments. The human ear, for the most part, is able to discern the instruments being played in a song fairly easily. However, the same is not true for computers or machines. The task of recognizing instrumentation in music is still an unsolved and active area of research in Music Information Retrieval (MIR). </p>



<p>The applications of such a technology are manifold:
</p>



<ul class="wp-block-list"><li>Metadata which includes instrumentation enables instrument-specific music discovery and recommendations. </li><li>Identifying regions of activity of specific instruments in a song allows easy browsing for users. For example, a user interested in a guitar solo or vocals in a song can easily browse to the relevant part.</li><li>Instrument activity detection may serve as a helpful pre-processing step for other MIR tasks such as automatic transcription and source separation.</li></ul>



<p>In our work, we propose a neural network-based system to detect activity for 18 different instruments in polyphonic music.</p>



<h4 class="wp-block-heading">Challenges in Instrument Activity Detection</h4>



<p>A big challenge in building algorithms for instrument activity detection is the lack of appropriate datasets. Until very recently, the IRMAS dataset was used as the benchmark for instrument recognition in polyphonic music. However, this dataset is not suitable for instrument activity detection for the following reasons:
</p>



<ul class="wp-block-list"><li>The test set contains 3 to 10 second snippets of audio that are only labeled with instruments present instead of a fine-grained instrument activity annotation. </li><li>The training clips are labeled with a single ‘predominant’ instrument even if more than one instrument is active in the clip.</li></ul>



<p>
We overcome this challenge by leveraging multi-track datasets such as MedleyDB and the Mixing Secrets dataset. These multi-track datasets contain the mixes as well as the accompanying stems. Therefore, fine-grained stem activity annotations may be obtained automatically by applying envelope tracking to the instrument stems.</p>
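<p>The envelope-tracking idea can be sketched as follows: compute a frame-wise RMS envelope of a stem and threshold it relative to the stem's peak level. This is a simplified illustration on a synthetic stem; the exact tracker, frame sizes, and threshold used in the paper may differ.</p>

```python
import numpy as np

def stem_activity(stem, frame=2048, hop=512, thresh_db=-30.0):
    """Binary activity labels for one instrument stem via a simple
    RMS envelope thresholded relative to the stem's peak level."""
    n_frames = 1 + (len(stem) - frame) // hop
    rms = np.array([
        np.sqrt(np.mean(stem[i * hop:i * hop + frame] ** 2))
        for i in range(n_frames)
    ])
    # dB relative to the loudest frame; epsilons avoid log(0)
    rms_db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    return rms_db > thresh_db  # True where the instrument is active

sr = 22050
t = np.arange(sr) / sr
stem = np.sin(2 * np.pi * 440 * t)  # toy one-second stem
stem[sr // 2:] = 0.0                # instrument stops halfway through
labels = stem_activity(stem)
```

<p>Applying this to every stem of a mix yields per-instrument activity labels aligned with the mixture audio.</p>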



<p>In addition, we identify metrics that allow easier comparison of models for instrument activity detection. Traditional metrics such as precision, recall, and F1-score are threshold-dependent and not ideal for multi-label classification scenarios. We use label-ranking average precision (LRAP) and area under the ROC curve (AUC) to compare different model architectures. Both metrics are threshold-agnostic and suitable for multi-label classification.</p>
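<p>Both metrics are available in scikit-learn. A toy multi-label example (the scores below are made up, not from the paper) illustrates how they are computed from per-clip instrument scores:</p>

```python
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score, roc_auc_score

# toy ground truth (clips x instruments) and model scores in [0, 1]
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_score = np.array([[0.9, 0.2, 0.8],
                    [0.1, 0.7, 0.3],
                    [0.8, 0.6, 0.2]])

# both metrics operate on raw scores, so no detection threshold is needed
lrap = label_ranking_average_precision_score(y_true, y_score)
auc = roc_auc_score(y_true, y_score, average='macro')
```

<p>Here the model ranks every active instrument above every inactive one, so both metrics reach their maximum of 1.0.</p>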



<div class="wp-block-image"><figure class="aligncenter"><img loading="lazy" decoding="async" width="513" height="139" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/12/IAD_flowchart.png" alt="" class="wp-image-413" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/12/IAD_flowchart.png 513w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/12/IAD_flowchart-300x81.png 300w" sizes="auto, (max-width: 513px) 100vw, 513px" /></figure></div>



<h4 class="wp-block-heading">Method and Models</h4>



<p>We propose a rather simple pipeline for our instrument activity detection system. The block diagram below shows the high-level processing steps in our approach. First, we split all the multi-tracks into artist-conditional splits, obtaining 361 training tracks and 100 testing tracks. During training, the models are fed with log-scaled mel-spectrograms of 1-second clips from the training tracks. We train these models to predict all the instruments present in a 1-second clip. We compare fully connected, convolutional (CNN), and convolutional-recurrent (CRNN) neural networks in this work.</p>



<p>During testing, a track is split into 1-second clips that are fed into the model. Once all 1-second-level predictions are obtained from the model, we evaluate the predictions at different time-scales: 1 s, 5 s, 10 s, and track-level. We aggregate over time by max-pooling the predictions and annotations for longer time-scale evaluation.</p>
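<p>The max-pooling aggregation amounts to taking, per instrument, the maximum score within each window. A minimal sketch (toy scores, with zero-padding for an incomplete last window, which may differ from the paper's handling):</p>

```python
import numpy as np

def aggregate(preds, scale):
    """Max-pool 1-second predictions (clips x instruments) into
    `scale`-second windows; zero-pads the last incomplete window."""
    n, k = preds.shape
    pad = (-n) % scale
    if pad:
        preds = np.vstack([preds, np.zeros((pad, k))])
    return preds.reshape(-1, scale, k).max(axis=1)

# toy per-clip scores for 5 one-second clips and 2 instruments
clip_preds = np.array([[0.1, 0.9],
                       [0.8, 0.2],
                       [0.3, 0.4],
                       [0.6, 0.1],
                       [0.2, 0.7]])
five_sec = aggregate(clip_preds, 5)             # 5-second windows
track = aggregate(clip_preds, len(clip_preds))  # track-level
```

<p>The same pooling is applied to the annotations so that predictions and ground truth are compared at the same time-scale.</p>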



<h4 class="wp-block-heading">Results</h4>



<p>As expected, the CNN and CRNN models outperform the fully connected architectures. The CNN and the CRNN perform very similarly, which we attribute to the choice of input time context: with only a 1-second input, there are only a few time-steps for the recurrent network to learn temporal features from, hence the insignificant change in performance over the CNN. An encouraging finding was that the models also perform well for rare instruments.</p>



<div class="wp-block-image"><figure class="aligncenter is-resized"><img loading="lazy" decoding="async" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/12/IAD_confusion.png" alt="" class="wp-image-414" width="451" height="336" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/12/IAD_confusion.png 338w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/12/IAD_confusion-300x224.png 300w" sizes="auto, (max-width: 451px) 100vw, 451px" /></figure></div>



<p>We also propose a method for visualizing confusions in a multi-label context, shown in the figure above. We visualize the distribution of false negatives for all instruments conditioned on a false positive of a particular instrument. For example, the first row in the matrix represents the distribution of false negatives of all instruments conditioned on the acoustic guitar false positives. We observe several confusions that make sense musically: for example, different guitars are confused with each other, tabla with drums, and synths with distorted guitars.</p>



<p>For more details on the various processing steps, detailed results and discussion, please check out the paper <a href="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Gururani-et-al.-Instrument-Activity-Detection-in-Polyphonic-Music-.pdf" target="_blank" rel="noopener">here</a>! Additionally, a 3 and a half minute lightning talk given at the ISMIR conference is accessible <a href="https://youtu.be/u3IJ2CYw66I?t=2008f" target="_blank" rel="noopener">here</a>.</p>The post <a href="https://musicinformatics.gatech.edu/conferences/instrument-activity-detection-in-polyphonic-music/">Instrument Activity Detection in Polyphonic Music</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Assessment of Student Music Performance using Deep Neural Networks</title>
		<link>https://musicinformatics.gatech.edu/project/assessment-of-student-music-performance-using-deep-neural-networks/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Tue, 26 Jun 2018 01:32:14 +0000</pubDate>
				<category><![CDATA[project]]></category>
		<category><![CDATA[publication]]></category>
		<guid isPermaLink="false">http://www.musicinformatics.gatech.edu/?p=372</guid>

					<description><![CDATA[<p>by Ashis Pati Moving Towards Automatic Music Performance Assessment Systems Improving one’s proficiency in performing a musical instrument often requires constructive feedback from a trained teacher regarding various aspects of a performance, e.g., its musicality, note accuracy, rhythmic accuracy, which are often hard to define and evaluate. While the positive effects of a good teacher &#8230; <a href="https://musicinformatics.gatech.edu/project/assessment-of-student-music-performance-using-deep-neural-networks/" class="more-link">Continue reading <span class="screen-reader-text">Assessment of Student Music Performance using Deep Neural Networks</span></a></p>
The post <a href="https://musicinformatics.gatech.edu/project/assessment-of-student-music-performance-using-deep-neural-networks/">Assessment of Student Music Performance using Deep Neural Networks</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></description>
										<content:encoded><![CDATA[<p><span style="font-size: 10pt;">by Ashis Pati</span></p>
<h4>Moving Towards Automatic Music Performance Assessment Systems</h4>
<p>Improving one&#8217;s proficiency in performing a musical instrument often requires constructive feedback from a trained teacher regarding various aspects of a performance, e.g., its musicality, note accuracy, and rhythmic accuracy, which are often hard to define and evaluate. While the positive effects of a good teacher on the learning process are unquestionable, it is not always practical to have a teacher during practice. This raises the question of whether we can design an autonomous system which can analyze a music performance and provide the necessary feedback to the student. Such a system would allow students without access to human teachers to learn music, effectively enabling them to get the most out of their practice sessions.</p>
<h4>What are the limitations of the current systems?</h4>
<p>Most of the previous attempts (including <a href="http://www.musicinformatics.gatech.edu/conferences/objective-descriptors-for-the-assessment-of-student-music-performances/" target="_blank" rel="noopener">our own</a>) at automatic music performance systems have relied on:</p>
<ol>
<li>Extracting standard audio features (e.g. <a href="https://en.wikipedia.org/wiki/Spectral_flux">Spectral Flux</a>, <a href="https://en.wikipedia.org/wiki/Spectral_centroid">Spectral Centroid</a>, <a href="https://www.audiocontentanalysis.org">etc</a>.) which may not contain relevant information pertaining to a musical performance</li>
<li>Designing hand-crafted features from music which are based on our (limited?) understanding of music performances and their perception.</li>
</ol>
<p>Considering these limitations, relying on standard and hand-crafted features for the music performance assessment tasks leads to sub-optimal results. Instead, feature learning techniques which have no “prejudice” and can learn relevant features from the data <a href="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/01/Wu_Lerch_2018_Learned-Features-for-the-Assessment-of-Percussive-Music-Performances.pdf">have shown promise</a> at this task.<br />
Deep Neural Networks (DNNs) form a special class of feature learning tools which are capable of learning complex relationships and functions from data. Over the last decade or so, they have emerged as the architecture-of-choice for a large number of discriminative tasks across multiple domains such as <a href="https://tryolabs.com/blog/2017/08/30/object-detection-an-overview-in-the-age-of-deep-learning/" target="_blank" rel="noopener">images</a>, <a href="https://blogs.microsoft.com/ai/historic-achievement-microsoft-researchers-reach-human-parity-conversational-speech-recognition/" target="_blank" rel="noopener">speech</a> and music. Thus, in this study, we explore the possibility of using DNNs for assessing student music performances. Specifically, we evaluate their performance with different input representations and network architectures.</p>
<h4>Input Representations and Network Architectures</h4>
<p>We chose input representations at two different levels of abstraction: a) Pitch Contour which extracts high level melodic information, and b) Mel-Spectrogram which extracts low-level information across several dimensions such as pitch, amplitude and timbre. The flow diagram for the computation of the input representations is shown in the Figure below:</p>
<p><figure id="attachment_374" aria-describedby="caption-attachment-374" style="width: 624px" class="wp-caption alignnone"><img loading="lazy" decoding="async" class="wp-image-374 size-full" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/flowchart.png" alt="" width="624" height="218" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/flowchart.png 624w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/flowchart-300x105.png 300w" sizes="auto, (max-width: 624px) 100vw, 624px" /><figcaption id="caption-attachment-374" class="wp-caption-text">Flow diagram for computation of input representations. F0: Fundamental frequency, MIDI: Musical instrument digital interface</figcaption></figure></p>
<p>Three different model architectures were used: a) a fully convolutional model with the pitch contour as input (PC-FCN), b) a convolutional-recurrent model with the mel-spectrogram as input (M-CRNN), and c) a hybrid model combining information from both input representations (PCM-CRNN). The three model architectures are shown below.</p>
<p><img loading="lazy" decoding="async" class="alignnone size-full wp-image-375" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/arch1.png" alt="" width="291" height="192" /><img loading="lazy" decoding="async" class="alignnone size-full wp-image-377" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/arch2.png" alt="" width="305" height="192" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/arch2.png 305w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/arch2-300x189.png 300w" sizes="auto, (max-width: 305px) 100vw, 305px" /><img loading="lazy" decoding="async" class="alignnone size-full wp-image-376" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/arch3.png" alt="" width="375" height="192" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/arch3.png 375w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/arch3-300x154.png 300w" sizes="auto, (max-width: 375px) 100vw, 375px" /></p>
<h4>Experiments and Results</h4>
<p>For data, we use the student performance recordings obtained from the Florida All-State auditions. Each performance is rated by experts along 4 different criteria: a) Musicality, b) Note Accuracy, c) Rhythmic Accuracy, and d) Tone Quality. Moreover, we consider two categories of students at different proficiency levels: a) Symphonic Band and, b) Middle School and design separate experiments for each category. Three instruments are considered: Alto Saxophone, Bb Clarinet and Flute.<br />
The models are trained to predict the ratings (which are normalized between 0 and 1) given by the experts. As a baseline, we use a Support Vector Regression based model (SVR-BD) which relies on standard and hand-crafted features extracted from the audio signal. More details about the baseline model can be found in our <a href="http://www.musicinformatics.gatech.edu/conferences/objective-descriptors-for-the-assessment-of-student-music-performances/" target="_blank" rel="noopener">previous blog post</a>. The performance of the models at this regression task is summarized in the plot below. The <a href="https://en.wikipedia.org/wiki/Coefficient_of_determination" target="_blank" rel="noopener">coefficient of determination</a> (R2) is used as the evaluation metric (higher is better).</p>
<p><figure id="attachment_379" aria-describedby="caption-attachment-379" style="width: 1208px" class="wp-caption alignnone"><img loading="lazy" decoding="async" class="size-full wp-image-379" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/results.png" alt="" width="1208" height="391" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/results.png 1208w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/results-300x97.png 300w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/results-1024x331.png 1024w" sizes="auto, (max-width: 1208px) 100vw, 1208px" /><figcaption id="caption-attachment-379" class="wp-caption-text">Evaluation results showing R2 metric for all assessment criteria. SVR-BD: Baseline Model, PC-FCN: Fully Convolutional Pitch Contour Model, M-CRNN: Convolutional Recurrent Model with Mel Spectrogram, PCM-CRNN: Hybrid Model Combining Mel-Spectrogram and Pitch Contour. Left: Symphonic Band, Right: Middle School</figcaption></figure></p>
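<p>For reference, the R2 metric used above is standard in scikit-learn. The numbers below are invented for illustration and are not from the study:</p>

```python
import numpy as np
from sklearn.metrics import r2_score

expert = np.array([0.80, 0.55, 0.90, 0.35, 0.60])  # normalized expert ratings
model = np.array([0.75, 0.60, 0.85, 0.40, 0.50])   # model predictions
r2 = r2_score(expert, model)  # 1.0 is perfect; 0 is a mean-only predictor
```
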
<p>The results clearly show that the DNN based models outperform the baseline model across all 4 assessment criteria. In fact, the DNN models perform the best for the Musicality criterion which is arguably the most abstract and is hard to define. In the absence of a clear definition, it is indeed difficult to design features to describe musicality. The success of the DNN models at modeling this criterion is, thus, extremely encouraging.</p>
<p>Another interesting observation is that the pitch contour based model (PC-FCN) outperforms every other model for the Symphonic Band students. This could indicate that the high-level melodic information encoded by the pitch contour is important to assess students at a higher proficiency level since one would expect that the differences between individual students would be finer. The same is not true for Middle School students where the best models use the Mel-Spectrogram as the input.</p>
<h4>Way Forward</h4>
<p>While the success of DNNs at this task is encouraging, it should be noted that the performance of the models is still not robust enough for practical applications. Some of the possible areas for future research include experimenting with other input representations (potentially raw audio), adding musical score information as input to the models, and training instrument-specific models. It is also important to develop better model analysis techniques which can allow us to understand and interpret the features learned by the model.<br />
For interested readers, the full paper published in the Applied Sciences Journal can be found <a href="http://www.mdpi.com/2076-3417/8/4/507/htm" target="_blank" rel="noopener">here</a>.</p>The post <a href="https://musicinformatics.gatech.edu/project/assessment-of-student-music-performance-using-deep-neural-networks/">Assessment of Student Music Performance using Deep Neural Networks</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Neural Style Transfer in Music</title>
		<link>https://musicinformatics.gatech.edu/project/neural-style-transfer-in-music/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Thu, 11 Jan 2018 14:52:23 +0000</pubDate>
				<category><![CDATA[project]]></category>
		<guid isPermaLink="false">http://www.musicinformatics.gatech.edu/?p=336</guid>

					<description><![CDATA[<p>Read a recent blog post by our PhD student Ashis Pati on the difficulties on style transfer in music as opposed to pictures.</p>
The post <a href="https://musicinformatics.gatech.edu/project/neural-style-transfer-in-music/">Neural Style Transfer in Music</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></description>
										<content:encoded><![CDATA[<p>Read a recent <a href="https://ashispati.github.io//style-transfer/">blog post</a> by our PhD student Ashis Pati on the difficulties on style transfer in music as opposed to pictures.</p>The post <a href="https://musicinformatics.gatech.edu/project/neural-style-transfer-in-music/">Neural Style Transfer in Music</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Guitar Solo Detection</title>
		<link>https://musicinformatics.gatech.edu/project/guitar-solo-detection/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Mon, 04 Dec 2017 15:57:47 +0000</pubDate>
				<category><![CDATA[dataset]]></category>
		<category><![CDATA[project]]></category>
		<category><![CDATA[publication]]></category>
		<guid isPermaLink="false">http://www.musicinformatics.gatech.edu/?p=320</guid>

					<description><![CDATA[<p>by Ashis Pati Over the course of the evolution of rock, electric guitar solos have developed into an important feature of any rock song. Their popularity among rock music fans is reflected by lists found online such as here and here. The ability to automatically detect guitar solos could, for example, be used by music &#8230; <a href="https://musicinformatics.gatech.edu/project/guitar-solo-detection/" class="more-link">Continue reading <span class="screen-reader-text">Guitar Solo Detection</span></a></p>
The post <a href="https://musicinformatics.gatech.edu/project/guitar-solo-detection/">Guitar Solo Detection</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></description>
										<content:encoded><![CDATA[<p><span style="font-size: 10pt;">by Ashis Pati</span></p>
<p>Over the course of the evolution of rock, electric guitar solos have developed into an important feature of any rock song. Their popularity among rock music fans is reflected by lists found online such as <a href="https://open.spotify.com/user/rdnk/playlist/6hxHl4n3sKaQCbdKc1Pzlg" target="_blank" rel="noopener">here</a> and <a href="https://www.stereogum.com/10114/rolling_stones_100_greatest_guitar_songs_of_all_ti/franchises/list/" target="_blank" rel="noopener">here</a>. The ability to automatically detect guitar solos could, for example, be used by music browsing and streaming services (like Apple Music and Spotify) to create targeted previews of rock songs. Such an algorithm would also be useful as a pre-processing step for other tasks such as guitar playing style analysis.</p>
<h4>What is a Solo?</h4>
<p>Even though most listeners can easily identify the location of a guitar solo within a song, it is not a trivial problem for a machine. From an audio signal perspective, solos can be very similar to other playing techniques such as riffs or licks.</p>
<p>Therefore, we define a guitar solo as having the following characteristics:</p>
<ul style="margin-left: 50px;">
<li>The guitar is in the foreground compared to other instruments</li>
<li>The guitar plays improvised melodic phrases which don&#8217;t repeat across measures (differentiating a solo from a riff)</li>
<li>The section is longer than a few measures (differentiating a solo from a lick)</li>
</ul>
<h4>What about Data?</h4>
<p>In the absence of any annotated dataset of guitar solos, we decided to create a pilot <a href="https://github.com/ashispati/GuitarSoloDetection/tree/master/Dataset" target="_blank" rel="noopener">dataset containing 60 full-length rock songs</a> and annotated the location of the guitar solos within the song.  Some of the songs contained in the dataset include classics like “Stairway to Heaven,” “Alive,” and “Hotel California.” The sub-genre distribution of the dataset is shown in Fig. 1.</p>
<p><img loading="lazy" decoding="async" class="size-full wp-image-321 aligncenter" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/12/gsd_genredist.png" alt="" width="287" height="180" /></p>
<h4>What Descriptors can be used to discriminate solos?</h4>
<p><img loading="lazy" decoding="async" class="size-medium wp-image-322 alignright" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/12/gsd_flowchart-159x300.png" alt="" width="159" height="300" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/12/gsd_flowchart-159x300.png 159w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/12/gsd_flowchart.png 191w" sizes="auto, (max-width: 159px) 100vw, 159px" />The widespread use of effect pedal boards and amps results in a plethora of different electric guitar “sounds,” possibly almost as large as the number of solos themselves. Hence, finding audio descriptors capable of discriminating a solo from a non-solo part is NOT a trivial task. To gauge how difficult this actually is, we implemented a Support Vector Machine (SVM) based supervised classification system (see the overall block diagram in Fig. 2).</p>
<p>In addition to the more ubiquitous spectral and temporal audio descriptors (such as Spectral Centroid, Spectral Flux, Mel-Frequency Cepstral Coefficients, etc.), we examine two specific classes of descriptors which intuitively should be better able to differentiate solo segments from non-solo segments.</p>
<ul style="margin-left: 50px;">
<li><strong>Descriptors from Fundamental Pitch Estimation</strong>:<br /> A guitar solo is primarily a melodic improvisation and hence can be expected to have a distinctive fundamental frequency component which would be different from that of another instrument (say, a bass guitar). In addition, during a solo the guitar has a stronger presence in the audio mix, which can be measured using the strength of the fundamental frequency component.</li>
<li><strong>Descriptors from Structural Segmentation</strong>:<br /> A guitar solo generally doesn&#8217;t repeat in a song and hence would not occur in repeated segments of a song (e.g., the chorus). This allows us to leverage existing structural segmentation algorithms in a novel way. A measure of the number of times a segment is repeated in a song and the normalized length of the segment can serve as useful inputs to the classifier.</li>
</ul>
<p>By using these features and post-processing to group the identified solo segments together, we obtain a detection accuracy of nearly 78%.</p>
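<p>The classification step can be sketched as follows. The feature values below are hypothetical stand-ins for the descriptors described above (the actual system uses many more spectral and temporal descriptors and frame-level processing):</p>

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# hypothetical per-segment features:
# [f0 strength, number of segment repeats, normalized segment length]
X = np.array([[0.90, 1, 0.10], [0.80, 1, 0.12],   # solo-like segments
              [0.30, 4, 0.25], [0.20, 3, 0.30]])  # non-solo segments
y = np.array([1, 1, 0, 0])                        # 1 = solo

# standardize features, then fit an RBF-kernel SVM
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
clf.fit(X, y)
pred = clf.predict([[0.85, 1, 0.11]])  # an unseen solo-like segment
```

<p>In the full system, per-segment predictions are then post-processed to group adjacent solo segments together.</p>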
<p>The main purpose of this study was to provide a framework against which more sophisticated solo detection algorithms can be examined. We use relatively simple features to perform a rather complicated task. The performance of features based on structural segmentation is encouraging and warrants further research into developing better features. For interested readers, the full paper as presented at the <a href="http://www.aes.org/conferences/2017/semantic/" target="_blank" rel="noopener">2017 AES Conference on Semantic Audio</a> can be found <a href="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/06/Pati_Lerch_2017_A-Dataset-and-Method-for-Electric-Guitar-Solo-Detection-in-Rock-Music.pdf">here</a>.</p>The post <a href="https://musicinformatics.gatech.edu/project/guitar-solo-detection/">Guitar Solo Detection</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data</title>
		<link>https://musicinformatics.gatech.edu/project/automatic-drum-transcription-using-the-student-teacher-learning-paradigm-with-unlabeled-music-data/</link>
		
		<dc:creator><![CDATA[Alexander]]></dc:creator>
		<pubDate>Tue, 07 Nov 2017 02:41:50 +0000</pubDate>
				<category><![CDATA[project]]></category>
		<category><![CDATA[publication]]></category>
		<guid isPermaLink="false">http://www.musicinformatics.gatech.edu/?p=303</guid>

					<description><![CDATA[<p>by Chih-Wei Wu Building a computer system that “listens” and “understands” music is the goal of many researchers working in the field of Music Information Retrieval (MIR). To achieve this objective, identifying effective ways of translating human domain knowledge into computer language is the key. Machine learning (ML) promises to provide methods to fulfill this &#8230; <a href="https://musicinformatics.gatech.edu/project/automatic-drum-transcription-using-the-student-teacher-learning-paradigm-with-unlabeled-music-data/" class="more-link">Continue reading <span class="screen-reader-text">Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data</span></a></p>
The post <a href="https://musicinformatics.gatech.edu/project/automatic-drum-transcription-using-the-student-teacher-learning-paradigm-with-unlabeled-music-data/">Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></description>
										<content:encoded><![CDATA[<p><span style="font-size: 10pt;">by Chih-Wei Wu</span></p>
<p>Building a computer system that “listens” and “understands” music is the goal of many researchers working in the field of Music Information Retrieval (MIR). To achieve this objective, identifying effective ways of translating human domain knowledge into computer language is the key. Machine learning (ML) promises to provide methods to fulfill this goal. In short, ML algorithms are capable of making decisions (or predictions) in a way that is similar to human experts; this is achievable by browsing and observing patterns within so-called “training data.” When large amounts of data are available (e.g., images and text), modern ML systems can perform comparably to or even outperform human experts in tasks such as object recognition in images.</p>
<p>Similarly, (openly available) data plays an essential role in training a successful ML model for MIR tasks. Useful training data usually includes both the raw data (e.g., audio files, video files) and annotations that describe the answer for a certain task (such as the music genre or the tempo of the music). With a reasonable amount of data and correct ground truth labels, an ML model can learn a function that maps the raw data to the corresponding answers.</p>
<p>One of the first questions new researchers ask is: “How much data is needed to build a good model?” The short answer is: the more the better. This answer may be a little unsatisfying, but it is often true for ML algorithms (especially the increasingly popular deep neural networks!). The human annotation of data, however, is labor-intensive and does not scale well. This situation gets worse when the target task requires highly skilled annotators and crowdsourcing is not an option. Automatic Drum Transcription (ADT), the process of extracting drum events from audio signals, is a good example of such a skill-demanding task. To date, most of the existing ADT datasets are either too small or too simple (synthetic).</p>
<p>To find a potential solution for this problem, we explore the possibility of having ML systems learn from data without labels (as shown in Fig. 1).</p>
<p><figure id="attachment_304" aria-describedby="caption-attachment-304" style="width: 455px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class=" wp-image-304" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/11/unlabeled-300x130.png" alt="unlabeled data" width="455" height="197" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/11/unlabeled-300x130.png 300w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/11/unlabeled.png 546w" sizes="auto, (max-width: 455px) 100vw, 455px" /><figcaption id="caption-attachment-304" class="wp-caption-text">The concept of learning from unlabeled data</figcaption></figure></p>
<p>Unlabeled data has the following advantages: 1) it is easily available compared to labeled data, 2) it is diverse, and 3) it is realistic.</p>
<p>We explore a fascinating way of using unlabeled data referred to as the “student-teacher” learning paradigm. In a way, it uses “machines to teach machines.” As researchers have built systems for drum transcription before, these existing systems can be utilized as teachers. Multiple teachers “transfer” their knowledge to the student, and the unlabeled data is used as the medium that carries the teachers&#8217; knowledge. The teachers make their predictions on the unlabeled data, and the student tries to mimic the teachers’ predictions, becoming better and better at the task. Of course, the teachers might be wrong, but the assumption is that multiple teachers and a large amount of data will compensate for this.</p>
<p><figure id="attachment_305" aria-describedby="caption-attachment-305" style="width: 441px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class=" wp-image-305" src="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/11/flowchart-300x136.png" alt="" width="441" height="200" srcset="https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/11/flowchart-300x136.png 300w, https://musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/11/flowchart.png 462w" sizes="auto, (max-width: 441px) 100vw, 441px" /><figcaption id="caption-attachment-305" class="wp-caption-text">System flowchart</figcaption></figure></p>
<p>Figure 2 shows the presented system consisting of a training phase and a testing phase. During the training phase, all teacher models will be used to generate their predictions on the unlabeled data. These predictions will become “soft targets” or pseudo ground truth. Next, the student model is trained on the same unlabeled data with the soft targets. In the testing phase, the trained student model will be tested against an existing labeled dataset for evaluation.</p>
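<p>The training phase above can be sketched in a few lines. This is a minimal, self-contained illustration, not the paper&#8217;s actual system: the two “teachers” below are hypothetical stand-in functions rather than real ADT models, the features are random vectors rather than audio, and the student is a single sigmoid layer fitted by gradient descent on the averaged teacher predictions (the soft targets).</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical stand-ins for two pre-trained teacher ADT systems: each maps
# a 2-dimensional feature frame to activation probabilities for 3 drum
# classes (e.g., kick, snare, hi-hat). Real teachers would be full models.
W_A = np.array([[1.0, 0.2, -0.3], [0.1, 0.9, 0.4]])
W_B = np.array([[0.8, -0.1, 0.2], [0.3, 1.1, -0.2]])

def teacher_a(x):
    return sigmoid(x @ W_A)

def teacher_b(x):
    return sigmoid(x @ W_B)

# "Unlabeled" data: 500 feature frames with no human annotations.
X = rng.normal(size=(500, 2))

# Training phase, step 1: every teacher predicts on the unlabeled data;
# averaging the predictions yields the soft targets (pseudo ground truth).
soft_targets = (teacher_a(X) + teacher_b(X)) / 2.0

# Training phase, step 2: fit the student to mimic the soft targets by
# gradient descent on a cross-entropy loss against those targets.
W_student = np.zeros((2, 3))
for _ in range(2000):
    pred = sigmoid(X @ W_student)
    grad = X.T @ (pred - soft_targets) / len(X)
    W_student -= 0.5 * grad

# In the testing phase the trained student would be evaluated against a
# labeled dataset; here we just check how closely it tracks its teachers.
student_pred = sigmoid(X @ W_student)
mae = np.abs(student_pred - soft_targets).mean()
print(f"mean absolute gap to soft targets: {mae:.4f}")
```

<p>Note that the student never sees a human label: its only supervision is the teachers&#8217; averaged output, which is exactly what makes large unlabeled collections usable.</p>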
<p>The exciting (preliminary) result of this research is that the student model is actually able to outperform the teachers! Through our evaluation, we show that it is possible to obtain a student model that outperforms the teacher models on certain drum instruments for the ADT task. This finding is encouraging and shows the potential benefits of working with unlabeled data.</p>
<p>For more information, please refer to our <a href="http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/07/Wu_Lerch_2017_Automatic-drum-transcription-using-the-student-teacher-learning-paradigm-with.pdf">full paper</a>. The unlabeled dataset can be found on <a href="https://github.com/cwu307/unlabeledDrumDataset">github</a>.</p>The post <a href="https://musicinformatics.gatech.edu/project/automatic-drum-transcription-using-the-student-teacher-learning-paradigm-with-unlabeled-music-data/">Automatic drum transcription using the student-teacher learning paradigm with unlabeled music data</a> first appeared on <a href="https://musicinformatics.gatech.edu">Music Informatics Group</a>.]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
