Digitala Vetenskapliga Arkivet

Automated Audio Anomaly Detection in Voice Recordings
Linköping University, Department of Electrical Engineering.
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

When people interact over a communication system, it is essential that all parties can correctly interpret each other. Sometimes the listener cannot correctly grasp what the speaker is saying. This can happen for a variety of reasons, and it is important to identify the reason for the anomaly in order to avoid similar situations.

Identifying the reasons for abnormal speech can be done by manually analyzing recordings of speech interactions. This can be a very time-consuming task, and automated anomaly detection tools can be useful for speeding up the process.

During this thesis project, three different anomaly detection methods were implemented to automate the detection process: an autoencoder, a Local Outlier Factor model, and an Isolation Forest model. Short-Time Fourier Transform coefficients (STFTs) and Mel-Frequency Cepstral Coefficients (MFCCs) were extracted in order to train the models on relevant audio features. For each anomaly detection method, three instances were implemented: one based on MFCCs, one on STFTs, and one on a combination of the two features. Similarly, three instances of a K-Nearest Neighbors (KNN) model were implemented as benchmarks against which the performance of the anomaly detection methods could be compared.
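
As a rough illustration of two of the unsupervised detectors named above, the following sketch fits scikit-learn's Isolation Forest and Local Outlier Factor models to feature vectors. It is not the thesis implementation: the synthetic Gaussian data stands in for real MFCC or STFT features, and the 13-dimensional feature size, the training on mostly normal data, and all hyperparameters are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Hypothetical stand-ins for per-recording feature vectors
# (e.g. 13 time-averaged MFCCs); real features would replace these.
n_features = 13
X_train = rng.normal(0.0, 1.0, size=(300, n_features))     # assumed mostly normal speech
X_normal = rng.normal(0.0, 1.0, size=(50, n_features))      # unseen normal recordings
X_anomalous = rng.normal(6.0, 1.0, size=(20, n_features))   # synthetic, clearly shifted anomalies

# Both models are fitted without labels. novelty=True lets the
# Local Outlier Factor model score recordings it was not fitted on.
iso = IsolationForest(n_estimators=100, random_state=0).fit(X_train)
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)


def flagged_fraction(model, X):
    """Fraction of rows predicted as anomalies (scikit-learn uses -1 for outliers)."""
    return float(np.mean(model.predict(X) == -1))


for name, model in [("Isolation Forest", iso), ("Local Outlier Factor", lof)]:
    print(name,
          "normal flagged:", flagged_fraction(model, X_normal),
          "anomalous flagged:", flagged_fraction(model, X_anomalous))
```

On real recordings the separation between normal and anomalous features would be far less clean than in this synthetic case, so the flagged fractions here say nothing about the accuracies reported in the thesis.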

The results showed that the top-performing anomaly detection method was the MFCC-based autoencoder, which had an accuracy of 83 %. The supervised KNNs, with accuracies of 84, 91 and 92 %, outperformed the autoencoder. However, the autoencoder performed well enough to justify using unsupervised learning instead of spending hours manually labeling the entire dataset for the purpose of using supervised learning.

Place, publisher, year, edition, pages
2025, p. 48
Keywords [en]
machine learning, anomaly detection, audio, voice, MFCC, autoencoder, LOF, IF
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-212249
ISRN: LiTH-ISY-EX--25/5729--SE
OAI: oai:DiVA.org:liu-212249
DiVA, id: diva2:1944545
External cooperation
FOI
Subject / course
Computer Engineering
Supervisors
Examiners
Available from: 2025-03-19. Created: 2025-03-14. Last updated: 2025-03-19. Bibliographically approved

Open Access in DiVA

fulltext (1245 kB), 104 downloads
File information
File name: FULLTEXT01.pdf
File size: 1245 kB
Checksum (SHA-512): 8065681841113e8b18aa1e5bc2a4970ee1e33d61cc915df48a78138fcaeab7ba3f17e419a57071fbca40594d06da6376bc44b78c82b446d95890e3d6aec62fae
Type: fulltext
Mimetype: application/pdf

Author
de Brun Mangs, William

Total: 105 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 508 hits