Automated Audio Anomaly Detection in Voice Recordings
2025 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
When people interact over a communication system, it is essential that all parties can interpret each other correctly. Sometimes the listener cannot correctly grasp what the speaker is saying. This can happen for a variety of reasons, and it is important to identify the reason for the anomaly in order to avoid similar situations.
Identifying the reasons for abnormal speech can be done by manually analyzing recordings of speech interactions. This can be a very time-consuming task, and automated anomaly detection tools can be useful for speeding up the process.
During this thesis project, three different anomaly detection methods were implemented to automate the detection process: an autoencoder, a Local Outlier Factor model, and an Isolation Forest model. Short-Time Fourier Transform coefficients (STFTs) and Mel-Frequency Cepstral Coefficients (MFCCs) were extracted in order to train the models on relevant audio features. For each anomaly detection method, three instances were implemented: one based on MFCCs, one on STFTs, and one on a combination of the two features. Similarly, three instances of a K-Nearest Neighbors (KNN) classifier were implemented as benchmarks against which the performance of the anomaly detection methods could be compared.
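As a rough illustration of the unsupervised part of such a pipeline, the following is a minimal sketch, not the implementation used in the thesis. It assumes NumPy and scikit-learn are available, replaces real voice recordings with synthetic tones, and summarizes each clip by its mean log-magnitude STFT spectrum. The helper names `stft_features` and `make_clip`, the frame length, the hop size, and the contamination rate are all hypothetical choices made for this example. The autoencoder and the MFCC features are left out for brevity.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def stft_features(signal, frame_len=256, hop=128):
    """Mean log-magnitude spectrum of a 1-D signal (a crude STFT summary)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).mean(axis=0)  # shape: (frame_len // 2 + 1,)


def make_clip(rng, anomalous, sr=8000, duration=0.5):
    """Synthetic stand-in for a recording: a tone, optionally buried in noise."""
    t = np.arange(int(sr * duration)) / sr
    clip = np.sin(2 * np.pi * 220 * t) + 0.05 * rng.standard_normal(t.size)
    if anomalous:
        clip += 2.0 * rng.standard_normal(t.size)  # heavy broadband noise
    return clip


rng = np.random.default_rng(0)
labels = np.array([0] * 95 + [1] * 5)  # 1 = anomalous; used only for scoring
X = np.stack([stft_features(make_clip(rng, bool(y))) for y in labels])

# Both detectors are fitted without labels (assumed contamination rate: 5 %).
detectors = {
    "LOF": LocalOutlierFactor(n_neighbors=20, contamination=0.05),
    "IF": IsolationForest(contamination=0.05, random_state=0),
}
accuracy = {}
for name, model in detectors.items():
    pred = model.fit_predict(X)  # -1 = outlier, +1 = inlier
    flagged = (pred == -1).astype(int)
    accuracy[name] = float((flagged == labels).mean())
    print(f"{name}: accuracy {accuracy[name]:.2f}")
```

The labels are used only to score the detectors afterwards, which mirrors the motivation for unsupervised detection: the models themselves never see them. The accuracies printed here describe the synthetic data only and say nothing about the results reported for the thesis.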
The results showed that the top-performing anomaly detection method was the MFCC-based autoencoder, which had an accuracy of 83 %. The supervised KNNs, with accuracies of 84, 91 and 92 %, outperformed the autoencoder. However, the autoencoder performed well enough to justify using unsupervised learning instead of spending hours manually labeling the entire dataset for the purpose of using supervised learning.
Place, publisher, year, edition, pages
2025. p. 48
Keywords [en]
machine learning, anomaly detection, audio, voice, MFCC, autoencoder, LOF, IF
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-212249
ISRN: LiTH-ISY-EX--25/5729--SE
OAI: oai:DiVA.org:liu-212249
DiVA, id: diva2:1944545
External cooperation
FOI
Subject / course
Computer Engineering
Supervisors
Examiners
2025-03-19 2025-03-14 2025-03-19 Bibliographically approved