Digitala Vetenskapliga Arkivet

AI-Enabled Text-to-Music Generation: A Comprehensive Review of Methods, Frameworks, and Future Directions
University of Science and Technology Beijing, China.
Guangxi Tourism Development One-Click Tour Digital Cultural Tourism Industry Co., Ltd., China.
Administrative Office, Chunan Academy of Governance, China.
Jinzhong University, China.
2025 (English). In: Electronics, E-ISSN 2079-9292, Vol. 14, no 6, article id 1197. Article, review/survey (Refereed). Published.
Abstract [en]

Text-to-music generation integrates natural language processing and music generation, enabling artificial intelligence (AI) to compose music from textual descriptions. While AI-enabled music generation has advanced, challenges in aligning text with musical structures remain underexplored. This paper systematically reviews text-to-music generation across symbolic and audio domains, covering melody composition, polyphony, instrumental synthesis, and singing voice generation. It categorizes existing methods into traditional, hybrid, and end-to-end LLM-centric frameworks according to the usage of large language models (LLMs), highlighting the growing role of LLMs in improving controllability and expressiveness. Despite progress, challenges such as data scarcity, representation limitations, and long-term coherence persist. Future work should enhance multi-modal integration, improve model generalization, and develop more user-controllable frameworks to advance AI-enabled music composition. 

Place, publisher, year, edition, pages
MDPI, 2025. Vol. 14, no 6, article id 1197
Keywords [en]
artificial intelligence, large language model, music generation, text-to-music generation
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:bth-27688
DOI: 10.3390/electronics14061197
ISI: 001453821500001
Scopus ID: 2-s2.0-105001095759
OAI: oai:DiVA.org:bth-27688
DiVA, id: diva2:1950322
Available from: 2025-04-07. Created: 2025-04-07. Last updated: 2025-04-07. Bibliographically approved.

Open Access in DiVA

fulltext (6550 kB), 142 downloads
File information
File name: FULLTEXT01.pdf
File size: 6550 kB
Checksum (SHA-512): e9087b7477befb25d30a52b18ebad8cbc11741856c9a81a165a2a82cf6b353d107adf9f34ecbad3c1bd644dfef9ed26a1b4daed2fbed45c8827ef46dce9b7576
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Ding, Jianguo
By organisation
Department of Computer Science
Total: 143 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 660 hits