Subspace Modeling of Discrete Features for Language Recognition
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Electronics and Telecommunications.
2014 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis addresses the language recognition problem, with a special focus on phonotactic language recognition. A full description of the different steps in a language recognition system is provided. We study state-of-the-art speech modeling techniques in language recognition, comprising phonotactic, acoustic and prosodic language modeling. A brief overview of the state-of-the-art subspace modeling technique known as the iVector model for continuous features is given. Using recent proposals on training the iVector model for continuous features, we explain our recipe for extracting iVectors for acoustic and prosodic features, which yields language recognition performance similar to the state-of-the-art results reported in the recent literature. In the next step, inspired by the intuition behind the iVector model for continuous features, we propose our iVector model for discrete features. After a general explanation of the model, we describe its adaptation to the n-gram model that is used to extract iVectors representing the language phonotactics. Finally, a regularized iVector extraction model for discrete features that is robust to overfitting is proposed. The full theoretical derivation of the proposed iVector model for discrete features is given. We also explain the use of discriminative and generative classifiers for training language models based on the different extracted iVectors. The effects of iVector normalization on the binary and multi-class formulations of these classifiers are also studied.

We report the performance of our iVector model on the NIST language recognition evaluations LRE2009 and LRE2011 and on RATS language recognition, the most recent and challenging language recognition task. Using our phonotactic iVector model, we obtain a significant improvement over our phonotactic baseline system, which was a state-of-the-art system when work on this thesis began. Our results on NIST LRE2009, NIST LRE2011 and RATS confirm the advantage of our iVector model for discrete features over the other state-of-the-art phonotactic system.

Place, publisher, year, edition, pages
NTNU: NTNU-trykk, 2014.
Dr. ingeniøravhandling, ISSN 0809-103X ; 2014:292
National Category
URN: urn:nbn:no:ntnu:diva-27267
ISBN: 978-82-326-0496-8 (printed ver.)
ISBN: 978-82-326-0497-5 (electronic ver.)
OAI: diva2:765169
Public defence
2014-11-04, 13:15
Available from: 2014-11-21 Created: 2014-11-21 Last updated: 2014-11-21 Bibliographically approved

Open Access in DiVA

Full text (1169 kB), 212 downloads
File information
File name: FULLTEXT01.pdf
File size: 1169 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
