Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
A comparative analysis of CNN and LSTM for music genre classification
KTH, School of Electrical Engineering and Computer Science (EECS).
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesisAlternative title
En jämförande analys av CNN och LSTM för klassificering av musikgenrer (Swedish)
Abstract [en]

The music industry has seen a great influx of new channels to browse and distribute music. This does not come without drawbacks. As the data rapidly increases, manual curation becomes a much more difficult task. Audio files have a plethora of features that could be used to make parts of this process a lot easier. It is possible to extract these features, but the best way to handle these for different tasks is not always known. This thesis compares the two deep learning models, convolutional neural network (CNN) and long short-term memory (LSTM), for music genre classification when trained using mel-frequency cepstral coefficients (MFCCs) in hopes of making audio data as useful as possible for future usage. These models were tested on two different datasets, GTZAN and FMA, and the results show that the CNN had a 56.0% and 50.5% prediction accuracy, respectively. This outperformed the LSTM model that instead achieved a 42.0% and 33.5% prediction accuracy.

Abstract [sv]

Musikindustrin har sett en stor ökning i antalet sätt att hitta och distribuera musik. Det kommer däremot med sina nackdelar, då mängden data ökar fort så blir det svårare att hantera den på ett bra sätt. Ljudfiler har mängder av information man kan extrahera och därmed göra den här processen enklare. Det är möjligt att använda sig av de olika typer av information som finns i filen, men bästa sättet att hantera dessa är inte alltid känt. Den här rapporten jämför två olika djupinlärningsmetoder, convolutional neural network (CNN) och long short-term memory (LSTM), tränade med mel-frequency cepstral coefficients (MFCCs) för klassificering av musikgenre i hopp om att göra ljuddata lättare att hantera inför framtida användning. Modellerna testades på två olika dataset, GTZAN och FMA, där resultaten visade att CNN:et fick en träffsäkerhet på 56.0% och 50.5% tränat på respektive dataset. Denna utpresterade LSTM modellen som istället uppnådde en träffsäkerhet på 42.0% och 33.5%.

Place, publisher, year, edition, pages
2019. , p. 23
Series
TRITA-EECS-EX ; 2019:372
Keywords [en]
Bachelor thesis, music genre classification, GTZAN, FMA, CNN, LSTM
Keywords [sv]
Kandidatexamensarbete, klassificering av musikgenrer, GTZAN, FMA, CNN, LSTM
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-260138OAI: oai:DiVA.org:kth-260138DiVA, id: diva2:1354738
Supervisors
Examiners
Available from: 2019-10-09 Created: 2019-09-26 Last updated: 2019-10-09Bibliographically approved

Open Access in DiVA

fulltext(694 kB)25 downloads
File information
File name FULLTEXT01.pdfFile size 694 kBChecksum SHA-512
7a689cb60a1a8d7a48da1067743d23c757ef6c98613925d26bebaa1232610b46bd08b3888066a1d3507cafec5522bb5180593822170e871a614d7b967be60fd6
Type fulltextMimetype application/pdf

By organisation
School of Electrical Engineering and Computer Science (EECS)
Computer and Information Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 25 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 40 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf