Predicting Perceived Dissonance of Piano Chords Using a Chord-Class Invariant CNN and Deep Layered Learning
KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH. ENSTA ParisTech.
KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-4957-2128
KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-2926-6518
2019 (English). In: Proceedings of the 16th Sound & Music Computing Conference (SMC), Malaga, Spain, 2019, p. 530-536. Conference paper, Published paper (Refereed).
Abstract [en]

This paper presents a convolutional neural network (CNN) able to predict the perceived dissonance of piano chords. Ratings of dissonance for short audio excerpts were combined from two different datasets and groups of listeners. The CNN uses two branches in a directed acyclic graph (DAG). The first branch receives input from a pitch estimation algorithm, restructured into a pitch chroma. The second branch analyses interactions between close partials, known to affect our perception of dissonance and roughness. The analysis is pitch invariant in both branches, facilitated by convolution across log-frequency and octave-wide max-pooling. Ensemble learning was used to improve the accuracy of the predictions. The coefficient of determination (R²) between ratings and predictions is close to 0.7 in a cross-validation test of the combined dataset. The system significantly outperforms recent computational models. An ablation study tested the impact of the pitch chroma and partial analysis branches separately, concluding that the deep layered learning approach with a pitch chroma was driving the high performance.
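The pitch invariance described in the abstract — convolution across log-frequency followed by octave-wide max-pooling — can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration with assumed bin counts (36 bins per octave over 4 octaves), not the paper's implementation: folding a log-frequency spectrum into a pitch chroma by taking the maximum across octaves discards absolute octave position, so a partial shifted by exactly one octave produces the same chroma.

```python
import numpy as np

# Assumed resolution for illustration only (not from the paper).
BINS_PER_OCTAVE = 36
N_OCTAVES = 4

def pitch_chroma(log_spectrum):
    """Fold a log-frequency magnitude vector into a pitch chroma by
    octave-wide max-pooling: for each within-octave bin, keep the
    maximum across all octaves."""
    folded = log_spectrum.reshape(N_OCTAVES, BINS_PER_OCTAVE)
    return folded.max(axis=0)

# A single partial placed somewhere in the second octave.
spec = np.zeros(N_OCTAVES * BINS_PER_OCTAVE)
spec[40] = 1.0

# Shifting the partial up by exactly one octave (36 bins) leaves the
# chroma unchanged -- the invariance the pooling provides.
shifted = np.roll(spec, BINS_PER_OCTAVE)
assert np.allclose(pitch_chroma(spec), pitch_chroma(shifted))
```

In the paper's DAG this pooled representation would feed further network layers; the sketch only shows why the pooling step removes octave information while preserving within-octave (chroma) structure.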

Place, publisher, year, edition, pages
2019. p. 530-536
Keywords [en]
dissonance, machine learning, CNN, invariance, audio analysis, perceptual features
National Category
Signal Processing
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-262723
ISBN: 978-84-09-08518-7 (print)
OAI: oai:DiVA.org:kth-262723
DiVA, id: diva2:1362342
Conference
16th Sound & Music Computing Conference SMC2019, Malaga, Spain
Note

QC 20191022

Available from: 2019-10-18. Created: 2019-10-18. Last updated: 2019-10-22. Bibliographically approved.

Open Access in DiVA

fulltext (702 kB), 3 downloads
File information
File name: FULLTEXT01.pdf
File size: 702 kB
Checksum: SHA-512
e1030ce38668082cfbb0993f1af036a550751d02aa5a01238bf601f8ef778bd03a6d6f50848537b9c6045740ed93faf7a00dba4d31ffda9ef3d25a9070e3e45c
Type: fulltext
Mimetype: application/pdf

Other links

Conference webpage

Search in DiVA

By author/editor
Dubois, Juliette; Elovsson, Anders; Friberg, Anders
By organisation
Speech, Music and Hearing, TMH
Signal Processing

Total: 3 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
