Sequence Models for Speech and Music Detection in Radio Broadcast
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Speech and music detection is an important metadata extraction step for radio broadcasters. It provides them with good time-stamping of the audio, including the parts where speech and music overlap. An important application of this task is royalty collection for broadcast audio, which is the use case for this study.

The study focuses on deep neural network architectures designed to process sequential data, such as recurrent neural networks and convolutional architectures for sequence learning. Several architectures that have not yet been applied to this task are evaluated and compared with a state-of-the-art architecture (Bidirectional Long Short-Term Memory). In addition, different strategies for taking advantage of both low- and high-quality datasets are evaluated.

The study shows that Temporal Convolution Network (TCN) architectures can outperform state-of-the-art architectures, and that non-causal TCNs in particular lead to a significant improvement in accuracy. The code used for this study has been made available on GitHub.

Abstract [sv]

Tal- och musikdetektion är ett viktigt steg för att extrahera metadata för radiobolag. Det ger dem en bra tidsstämpling av ljudet inklusive de delar där tal och musik överlappar varandra. Tillämpningen är viktig vid insamling av royalties för radiosändningar vilket är användningsfallet för den här studien.

Studien är inriktad på djupa neurala nätverksarkitekturer, Deep Neural Networks (DNN), gjorda för att behandla sekventiell data som Recurrent Neural Networks (RNN) eller faltningsarkitekturer för sekventiell inlärning. Olika arkitekturer som ännu inte har tillämpats för denna uppgift utvärderas och jämförs med en state-of-the-art-arkitektur (Bidirectional Long Short-Term Memory). Dessutom utvärderas olika strategier för att utnyttja både låg- och högkvalitativa dataset.

Studien visar att arkitekturerna för Temporal Convolution Network (TCN) kan överträffa state-of-the-art-arkitekturer, och att speciellt icke-kausala TCN leder till en signifikant förbättring av noggrannheten. Koden som används för denna studie finns tillgänglig på GitHub.

Place, publisher, year, edition, pages
2019. p. 14
Series
TRITA-EECS-EX ; 2019:86
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-251011
OAI: oai:DiVA.org:kth-251011
DiVA, id: diva2:1314164
External cooperation
Sveriges Radio
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2019-05-15 Created: 2019-05-07 Last updated: 2019-05-15 Bibliographically approved

Open Access in DiVA

fulltext (3007 kB), 57 downloads
File information
File name: FULLTEXT01.pdf
File size: 3007 kB
Checksum (SHA-512):
0c1fec8e61d27d7fa0a422f5fc9a8b56ce9ce852b5512eea2dfe6f98594b152f2e711c04268966faeb08094f919f5577f45bfc4f40e072459a9561c787272f5d
Type: fulltext
Mimetype: application/pdf

Total: 57 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 204 hits