September, 2025 | Music Informatics Group

by Alexander Lerch

Generative artificial intelligence is increasingly capable of composing music, from short melodies to full songs. Despite the growing number of new, “superior” models, there is no consensus on how to measure this progress. How do we know whether one model is actually better than another?

Evaluating AI-generated music is challenging because music perception is inherently subjective. There is no single “correct” or “best” version of a song, and people’s tastes vary widely. Objectively evaluating elusive properties such as aesthetics, musicality, creativity, or emotional impact is therefore difficult. The language of music is also complex and abstract.

Evaluation Targets

The survey paper breaks evaluation into two main categories:
• System Output: focusing on the generated output of a system and its properties.
• User Experience: focusing on how people interact with a generative system.

Researchers use both subjective and objective methods. Subjective methods include listening tests, surveys, and Turing-style tests where listeners try to guess whether a piece was composed by a human or a machine. Objective methods use mathematical metrics to compare the AI’s output to human-composed music, measuring things like pitch distribution, rhythm patterns, and audio fidelity.
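To make the idea of an objective metric concrete, here is a minimal sketch of one possible pitch-distribution comparison. It is an illustration only, not a method taken from the survey: the function names, the toy note lists, and the choice of Jensen-Shannon divergence as the distance are all assumptions made for this example.

```python
import math
from collections import Counter


def pitch_class_histogram(midi_pitches):
    """Return a normalized 12-bin pitch-class histogram from MIDI note numbers."""
    counts = Counter(p % 12 for p in midi_pitches)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one pitch")
    return [counts.get(pc, 0) / total for pc in range(12)]


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2), ranging from 0 (identical) to 1 (disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy data: MIDI note numbers for a reference and a generated melody.
human_pitches = [60, 62, 64, 65, 67, 69, 71, 72]      # C major scale
generated_pitches = [60, 61, 63, 66, 68, 70, 60, 61]  # mostly out-of-scale notes

divergence = jensen_shannon_divergence(
    pitch_class_histogram(human_pitches),
    pitch_class_histogram(generated_pitches),
)
print(f"Pitch-class JSD: {divergence:.3f}")
```

A lower value means the generated pitch distribution is statistically closer to the reference. As the challenges below point out, a number like this has limited and partly unknown perceptual meaning, so a small divergence does not imply that the music sounds better.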

Challenges and Conclusion

There are several major challenges in evaluating generative music systems. First, the validity of existing methodologies is limited. Second, existing metrics have limited and/or unknown musical and perceptual meaning. Third, there is no standard set of metrics, which makes it hard to compare different systems. In addition, there are concerns around the topic of responsible AI.

The survey concludes that there is a need for more consistent, interdisciplinary approaches to evaluating generative music. It highlights the need for better metrics, more transparent research practices, and deeper collaboration between computer scientists, musicians, and psychologists.

Resources

For more details, please see the open access survey paper.


Evaluation of Generative Models in Music
