Spoken Document Classification of Broadcast News
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Electronics and Telecommunications.
2012 (English), Master's thesis (student thesis)
Abstract [en]

Two systems for spoken document classification are implemented by combining an automatic speech recognizer with two classification algorithms: naive Bayes and logistic regression. The focus is on how to handle the inherent uncertainty in the output of the speech recognizer. Features are extracted by computing expected word counts from speech recognition lattices and then removing words that, as measured by information gain, carry little or only noisy information about the topic label. The systems are evaluated with cross-validation on broadcast news stories, and classification accuracy is measured for different configurations and on recognition output with different word error rates. The results show that relatively high classification accuracy can still be obtained at word error rates around 50%, and that the benefit of extracting features from lattices instead of 1-best transcripts grows as the word error rate increases.
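The abstract does not spell out how expected word counts are obtained from a lattice. One standard way is the forward-backward algorithm: the posterior probability of an arc is the summed score of all complete paths through that arc divided by the summed score of all paths, and the expected count of a word is the sum of the posteriors of the arcs labelled with it. The Python sketch below is not the thesis's own implementation. It assumes a hypothetical lattice representation (nodes numbered in topological order with a single start node 0 and a single end node, arcs given as `(from_node, to_node, word, log_score)` tuples), assumes `log_score` already combines the acoustic and language model scores, and uses illustrative function names.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:  # every term has zero probability
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def expected_word_counts(num_nodes, arcs):
    """Return {word: expected count} for a word lattice (hypothetical format).

    Assumptions: nodes 0..num_nodes-1 are in topological order, node 0 is the
    unique start, node num_nodes-1 the unique (reachable) end, and each arc is
    (from_node, to_node, word, log_score) with log_score assumed to be the
    combined acoustic + language model log score.
    """
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for arc in arcs:
        incoming[arc[1]].append(arc)
        outgoing[arc[0]].append(arc)

    # Forward pass: alpha[v] = log of the summed score of all paths start -> v.
    alpha = [-math.inf] * num_nodes
    alpha[0] = 0.0
    for v in range(1, num_nodes):
        if incoming[v]:
            alpha[v] = logsumexp([alpha[u] + s for u, _, _, s in incoming[v]])

    # Backward pass: beta[u] = log of the summed score of all paths u -> end.
    beta = [-math.inf] * num_nodes
    beta[-1] = 0.0
    for u in range(num_nodes - 2, -1, -1):
        if outgoing[u]:
            beta[u] = logsumexp([s + beta[v] for _, v, _, s in outgoing[u]])

    total = alpha[-1]  # log of the summed score of all complete paths
    counts = defaultdict(float)
    for u, v, word, s in arcs:
        # Arc posterior = score of all complete paths through this arc / total.
        counts[word] += math.exp(alpha[u] + s + beta[v] - total)
    return dict(counts)


# Toy lattice: "the" is certain, then "cat" (0.6) competes with "hat" (0.4).
lattice = [
    (0, 1, "the", math.log(1.0)),
    (1, 2, "cat", math.log(0.6)),
    (1, 2, "hat", math.log(0.4)),
]
print(expected_word_counts(3, lattice))
```

A 1-best transcript of this toy lattice would give `cat` a count of 1 and `hat` a count of 0. The lattice instead keeps part of the probability mass on the competing hypothesis, which is how lattice features retain information the recognizer was unsure about. Fractional counts like these can be used in place of integer word counts as features for a multinomial naive Bayes or logistic regression classifier.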

Place, publisher, year, edition, pages
Institutt for elektronikk og telekommunikasjon (Department of Electronics and Telecommunications), 2012, 51 p.
Keywords
ntnudaim:7911, MTKOM Communication Technology, Signal Processing and Communication
URN: urn:nbn:no:ntnu:diva-19226
Local ID: ntnudaim:7911
OAI: diva2:566519
Available from: 2012-11-08
Created: 2012-11-08

Open Access in DiVA
Full text: FULLTEXT01.pdf (1227 kB, application/pdf)
Cover: COVER01.pdf (184 kB, application/pdf)
