Modeling Music Modality with a Key-Class Invariant Pitch Chroma CNN
Elowsson, Anders (KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH). ORCID iD: 0000-0002-4957-2128
Friberg, Anders (KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH). ORCID iD: 0000-0003-2926-6518
2019 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a convolutional neural network (CNN) that uses input from a polyphonic pitch estimation system to predict perceived minor/major modality in music audio. The pitch activation input is structured to allow the first CNN layer to compute two pitch chromas focused on different octaves. The following layers perform harmony analysis across chroma and time scales. Through max pooling across pitch, the CNN becomes invariant with regard to the key class (i.e., key disregarding mode) of the music. A multilayer perceptron combines the modality activation output with spectral features for the final prediction. The study uses a dataset of 203 excerpts rated by around 20 listeners each, a small and challenging data size requiring carefully designed parameter sharing. With an R² of about 0.71, the system clearly outperforms previous systems as well as individual human listeners. A final ablation study highlights the importance of using pitch activations processed across longer time scales, and of using pooling to facilitate invariance with regard to the key class.
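
The abstract describes the architecture only at a high level. Purely for illustration, the sketch below shows one plausible PyTorch realization of the main ideas: an octave-spanning chroma layer, harmony convolutions over chroma and time, max pooling over pitch-class transpositions for key-class invariance, and an MLP fusing spectral features. The class name, layer sizes, circular padding, and input dimensions (e.g. n_pitches, n_spectral_feats) are assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KeyClassInvariantModalityCNN(nn.Module):
    """Hypothetical sketch of a key-class invariant modality CNN."""

    def __init__(self, n_pitches=72, n_spectral_feats=10):
        super().__init__()
        # Layer 1: two learned "pitch chroma" filters. Each kernel spans all
        # octaves of the pitch axis, so the 12 output rows act as chroma bins;
        # the two channels can emphasize different octave ranges.
        self.chroma = nn.Conv2d(1, 2, kernel_size=(n_pitches - 11, 1))
        # Harmony analysis across chroma and time.
        self.harmony = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=(12, 5), padding=(0, 2)),
            nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=(1, 9), padding=(0, 4)),
            nn.ReLU(),
        )
        # MLP fusing the pooled modality activations with spectral features.
        self.head = nn.Sequential(
            nn.Linear(16 + n_spectral_feats, 32),
            nn.ReLU(),
            nn.Linear(32, 1),  # predicted minor/major modality rating
        )

    def forward(self, pitch_act, spectral_feats):
        # pitch_act: (batch, 1, n_pitches, n_frames)
        x = torch.relu(self.chroma(pitch_act))        # -> (batch, 2, 12, T)
        # Circularly pad the chroma axis so the 12-tall harmony kernel is
        # evaluated at every transposition (circular shift of pitch classes).
        x = F.pad(x, (0, 0, 0, 11), mode="circular")  # -> (batch, 2, 23, T)
        x = self.harmony(x)                           # -> (batch, 16, 12, T)
        # Max pooling over the 12 transpositions gives key-class invariance;
        # averaging over time summarizes the excerpt.
        x = x.amax(dim=2).mean(dim=2)                 # -> (batch, 16)
        return self.head(torch.cat([x, spectral_feats], dim=1))


# Example: 72 pitch bins (6 octaves), 200 time frames, 10 spectral features.
model = KeyClassInvariantModalityCNN()
pred = model(torch.rand(4, 1, 72, 200), torch.rand(4, 10))  # -> (4, 1)
```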

Place, publisher, year, edition, pages
2019.
Keywords [en]
Pitch chroma, invariance, modelling, audio analysis, perceptual features, CNN
National Category
Signal Processing; Other Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-262662; OAI: oai:DiVA.org:kth-262662; DiVA id: diva2:1361730
Conference
20th International Society for Music Information Retrieval Conference, Delft, Netherlands, November 4-8, 2019
Note

QC 20191018

Available from: 2019-10-16. Created: 2019-10-16. Last updated: 2019-10-18. Bibliographically approved.

Open Access in DiVA

fulltext (644 kB), 4 downloads
File information
File name: FULLTEXT01.pdf
File size: 644 kB
Checksum (SHA-512): 93e09441d32d11b941cb3a302c3e24f17038a542f379253daef5a8adcbee0b58d339848a4d083584a5a2ad4a73cbe63c1edc1bc3d0fd57568e4190d975a78568
Type: fulltext
Mimetype: application/pdf
