AI-Enabled Text-to-Music Generation: A Comprehensive Review of Methods, Frameworks, and Future Directions
2025 (English). In: Electronics, E-ISSN 2079-9292, Vol. 14, no 6, article id 1197. Article, review/survey (Refereed). Published
Abstract [en]
Text-to-music generation integrates natural language processing and music generation, enabling artificial intelligence (AI) to compose music from textual descriptions. While AI-enabled music generation has advanced, challenges in aligning text with musical structures remain underexplored. This paper systematically reviews text-to-music generation across symbolic and audio domains, covering melody composition, polyphony, instrumental synthesis, and singing voice generation. It categorizes existing methods into traditional, hybrid, and end-to-end LLM-centric frameworks according to the usage of large language models (LLMs), highlighting the growing role of LLMs in improving controllability and expressiveness. Despite progress, challenges such as data scarcity, representation limitations, and long-term coherence persist. Future work should enhance multi-modal integration, improve model generalization, and develop more user-controllable frameworks to advance AI-enabled music composition.
Place, publisher, year, edition, pages
MDPI, 2025. Vol. 14, no 6, article id 1197
Keywords [en]
artificial intelligence, large language model, music generation, text-to-music generation
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:bth-27688
DOI: 10.3390/electronics14061197
ISI: 001453821500001
Scopus ID: 2-s2.0-105001095759
OAI: oai:DiVA.org:bth-27688
DiVA, id: diva2:1950322
2025-04-07, 2025-04-07, 2025-04-07. Bibliographically approved