Language Classification of Music Using Metadata
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control.
2019 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
Abstract [en]

The purpose of this study was to investigate how metadata from Spotify could be used to identify the language of songs in a dataset containing nine languages. Features based on song name, album name, genre, regional popularity, and vectors describing songs, playlists and users were analysed individually and in combination in different classifiers. The report also explored how different levels of prediction confidence affect performance, and how the metadata-based classifier compared to a classifier based on audio input.

A random forest classifier performed best, with an accuracy of 95.4% on the whole data set. Performance was also investigated with the model's confidence taken into account: keeping only the more confident predictions gave higher accuracy, and keeping the 70% most confident predictions gave an accuracy of 99.4%. The model also proved robust to input in languages other than those it was trained on, and managed to filter out unwanted records not matching the languages of the model. In a comparison with a classifier based on audio input, the model using metadata performed better on the training and test set used. Finally, a number of possible improvements and directions for future work were suggested.
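
The confidence-based filtering described in the abstract can be illustrated with a minimal sketch. It assumes that a prediction's confidence is the largest class probability output by the classifier; the abstract does not state how confidence was defined in the thesis, so this definition, the function names `keep_most_confident` and `accuracy`, and the toy numbers are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def keep_most_confident(
    probabilities: Sequence[Sequence[float]],
    keep_fraction: float,
) -> List[Tuple[int, int, float]]:
    """Return (row index, predicted class, confidence) for the most confident rows.

    Assumption: confidence is the highest class probability in each row.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    scored = []
    for row_index, row in enumerate(probabilities):
        predicted = max(range(len(row)), key=lambda k: row[k])
        scored.append((row_index, predicted, row[predicted]))
    # Highest confidence first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda item: item[2], reverse=True)
    n_keep = max(1, int(keep_fraction * len(scored)))
    return scored[:n_keep]


def accuracy(kept: Sequence[Tuple[int, int, float]], labels: Sequence[int]) -> float:
    """Fraction of kept predictions whose predicted class matches the true label."""
    correct = sum(1 for row_index, predicted, _ in kept if labels[row_index] == predicted)
    return correct / len(kept)


# Toy class probabilities for four songs and three languages (made-up values).
toy_probabilities = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.34, 0.33, 0.33],
]
toy_labels = [0, 1, 1, 2]

all_predictions = keep_most_confident(toy_probabilities, 1.0)
confident_half = keep_most_confident(toy_probabilities, 0.5)
print(accuracy(all_predictions, toy_labels), accuracy(confident_half, toy_labels))
```

Lowering `keep_fraction` trades coverage for accuracy: fewer songs receive a label, but the labels that remain are the ones the classifier is most certain about. Under the same assumption, songs in languages the model was not trained on would tend to fall among the discarded low-confidence predictions.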

Place, publisher, year, edition, pages
2019. p. 56
Series
UPTEC STS, ISSN 1650-8319 ; 19007
Keywords [en]
Music, Metadata, Language, Classification, Machine Learning, Spotify, Multiclass, Feature Importance
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:uu:diva-379625
OAI: oai:DiVA.org:uu-379625
DiVA, id: diva2:1297032
External cooperation
Spotify
Subject / course
Computer Systems Sciences
Educational program
Systems in Technology and Society Programme
Supervisors
Examiners
Available from: 2019-03-19 Created: 2019-03-18 Last updated: 2019-03-19 Bibliographically approved

Open Access in DiVA

language_classification_metadata (770 kB), 47 downloads
File information
File name: FULLTEXT01.pdf
File size: 770 kB
Checksum (SHA-512): c30231ef8bb48dd4925605e0286c5751073efa39b3eef883e436cd0c03d4ff4cbe5f8d2deecc60a7f46c5b0e8a1fd187a389a9f005e5daf1e1fdb51a6c5f0505
Type: fulltext
Mimetype: application/pdf
