Design of Detectors for Automatic Speech Recognition
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Electronics and Telecommunications. (Speech Technology)
2012 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis presents methods and results for optimizing subword detectors in continuous speech. Speech detectors are useful in areas such as detection-based ASR, pronunciation training, phonetic analysis, and word spotting. First, we propose a structure suitable for subword detection. This structure is based on the standard HMM framework, but in each detector the MFCC feature extractor and the models are trained for the specific detection problem. Our experiments on the TIMIT database validate the effectiveness of this structure for the detection of phones and articulatory features.
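
For orientation, the filterbank stage of a conventional MFCC front end can be sketched as below. This is only an illustration of the standard triangular mel filterbank that such detectors would start from, not the trained filterbanks of the thesis. The function names, the 24 filters, and the 512-point FFT are assumptions chosen for the example; the 16 kHz sampling rate matches TIMIT.

```python
import math

def hz_to_mel(f):
    # Common mel-scale approximation (assumed here; the record does not state one).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters=24, n_fft=512, sample_rate=16000):
    """Return num_filters triangular filters, each a list of n_fft // 2 + 1 weights."""
    top = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    # Centre frequency in Hz of each FFT bin up to the Nyquist frequency.
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank
```

In a detector-specific front end of the kind described above, these fixed weights would serve as the initial values that training then adjusts for each detection class.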

Second, two discriminative training techniques are proposed for detector training. The first is a modification of Minimum Classification Error (MCE) training. The second, Minimum Detection Error (MDE) training, is an adaptation of Minimum Phone Error training to the detection problem. Both methods are used to train the HMMs and the filterbanks in the detectors, either separately or jointly. MDE has the advantage that any detection performance criterion can be optimized directly. F-score and class accuracy optimization experiments show that MDE training is superior to the MCE-based method.
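
As a reminder of what an F-score criterion measures for a detector, the following minimal sketch computes it from raw detection counts. It uses the textbook definition; the exact formulation optimized in the thesis is not given in this record, and the function and argument names are hypothetical.

```python
def f_score(hits, false_alarms, misses, beta=1.0):
    """Textbook F-score of a detector from its hit, false-alarm and miss counts."""
    if hits == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = hits / (hits + false_alarms)   # fraction of detections that are correct
    recall = hits / (hits + misses)            # fraction of true events that are detected
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

With `beta=1.0` the score is the harmonic mean of precision and recall, so a detector cannot score well by trading many false alarms for few misses or the reverse.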

The optimized filterbanks reflect some acoustical properties of the detection classes. Moreover, some changes are consistent across classes with similar acoustical properties. In addition, MDE training of filterbanks results in filters that differ significantly from those in the standard filterbank. In fact, some filters extract information from different critical bands.

Finally, we propose a detection-based automatic speech recognition system. Detectors are built with the proposed HMM-based detection structure and trained discriminatively. The linguistic merger is based on an MLP/Viterbi decoder.

Place, publisher, year, edition, pages
NTNU: NTNU-trykk, 2012. 135 p.
Doctoral Theses at NTNU, ISSN 1503-8181 ; 2012:36
Keyword [en]
Speech Recognition, Detector, Filterbank
National Category
URN: urn:nbn:no:ntnu:diva-16548
ISBN: 978-82-471-3336-1 (printed ver.)
ISBN: 978-82-471-3337-8 (electronic ver.)
OAI: diva2:528775
Public defence
2012-05-25, 00:00
Available from: 2012-06-12. Created: 2012-05-28. Last updated: 2012-06-12. Bibliographically approved.
