Unsupervised Audio Spectrogram Compression using Vector Quantized Autoencoders
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Oövervakad ljudspektrogramkompression med vektor-kvantiserande självkodande neurala nätverk (Swedish)
Abstract [en]

Despite the recent successes of neural networks in a variety of domains, musical audio modeling is still considered a hard task, with features typically spanning tens of thousands of dimensions in input space. By formulating audio data compression as an unsupervised learning task, this project investigates the applicability of vector quantized neural network autoencoders for compressing spectrograms – image-like representations of audio. Using a recently proposed gradient-based method for approximating waveforms from reconstructed (real-valued) spectrograms, the discrete pipeline produces listenable reconstructions of surprising fidelity compared to uncompressed versions, even for out-of-domain examples. The results suggest that the learned discrete quantization method compresses spectrograms roughly nine times more strongly than its continuous counterpart while producing similar reconstructions, both qualitatively and in terms of quantitative error metrics.
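The quantization step that a vector quantized autoencoder relies on can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy 2-D latent vectors, and the 4-entry codebook below are hypothetical values chosen for illustration, and a real model would learn both the encoder outputs and the codebook. The sketch only shows the core idea that each continuous latent vector is replaced by the index of its nearest codebook entry, so that only small integers need to be stored.

```python
import math


def quantize(vectors, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    indices = []
    for v in vectors:
        # Squared Euclidean distance from v to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def dequantize(indices, codebook):
    """Recover the quantized vectors by looking the indices up again."""
    return [codebook[i] for i in indices]


def bits_per_vector(codebook_size):
    """Bits needed to store one codebook index."""
    return math.ceil(math.log2(codebook_size))


# Hypothetical toy data: three 2-D latents and a 4-entry codebook.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latents = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]

codes = quantize(latents, codebook)   # discrete representation
recon = dequantize(codes, codebook)   # what a decoder would receive
```

In this toy setting each latent vector is stored as a 2-bit index instead of two floating-point numbers, which is the source of the compression gain; the actual ratio reported in the abstract depends on the thesis's own architecture and settings, which are not reproduced here.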

Abstract [sv]

Despite the recent successes of neural networks in a range of areas, musical audio modeling remains a hard task, with characteristic features spanning tens of thousands of dimensions in input space. By formulating audio data compression as an unsupervised learning task, this project investigates the usefulness of vector quantized neural network autoencoders on spectrograms – an image-like representation of audio. Using a recently described gradient-based method for approximating waveforms from reconstructed (real-valued) spectrograms, the discrete pipeline produces listenable reconstructions with surprising fidelity compared to uncompressed versions, even for out-of-domain examples. The results indicate that the learned discrete quantization method achieves roughly nine times stronger spectrogram compression than its continuous counterpart while producing similar reconstructions, both qualitatively and according to quantitative error metrics.

Place, publisher, year, edition, pages
2019. p. 75
Series
TRITA-EECS-EX ; 2019:649
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-264947
OAI: oai:DiVA.org:kth-264947
DiVA, id: diva2:1376201
External cooperation
Peltarion AB
Educational program
Master of Science in Engineering - Media Technology
Available from: 2020-01-17. Created: 2019-12-09. Last updated: 2020-01-17. Bibliographically approved.
