This thesis addresses the language recognition problem, with a special focus on phonotactic language recognition. A full description of the different steps in a language recognition system is provided. We study state-of-the-art speech modeling techniques in language recognition, comprising phonotactic, acoustic and prosodic language modeling. A brief overview of the state-of-the-art subspace modeling technique known as the iVector model for continuous features is given. Building on recent proposals for training the iVector model for continuous features, we explain our recipe for extracting iVectors for acoustic and prosodic features, which achieves language recognition performance comparable to the state-of-the-art results reported in the recent literature. In the next step, inspired by the intuition behind the iVector model for continuous features, we propose an iVector model for discrete features. After a general explanation of the model, we describe its adaptation to the n-gram model used to extract iVectors representing language phonotactics. Finally, a regularized iVector extraction model for discrete features that is robust to overfitting is proposed. The full theoretical derivation of the proposed iVector model for discrete features is given. We also explain the use of discriminative and generative classifiers for training language models on the different extracted iVectors. The effects of iVector normalization on the binary and multi-class formulations of these classifiers are also studied.
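To make the discrete-feature iVector idea concrete, the following is a minimal sketch, not the thesis's actual recipe: an utterance's phone n-gram counts are modeled by a multinomial whose log-probabilities lie in a low-dimensional subspace, m + Tw, and a regularized point estimate of w serves as the iVector. The function name, the simple gradient-ascent estimator, and all parameter values are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a vector of logits.
    e = np.exp(x - x.max())
    return e / e.sum()

def extract_ivector(counts, m, T, lam=1.0, steps=200, lr=0.1):
    """Illustrative iVector extraction for discrete features.

    Maximizes the L2-regularized multinomial log-likelihood
        sum_i counts_i * log softmax(m + T w)_i  -  lam/2 * ||w||^2
    over the low-dimensional latent vector w by gradient ascent.
    counts : n-gram count vector for one utterance (length V)
    m      : global log-probability offset (length V)
    T      : subspace matrix (V x R), R << V
    """
    w = np.zeros(T.shape[1])
    n = counts.sum()
    for _ in range(steps):
        p = softmax(m + T @ w)
        # Gradient of the regularized multinomial log-likelihood.
        grad = T.T @ (counts - n * p) - lam * w
        w += lr * grad / n  # scale the step by the count total
    return w
```

The regularizer `lam` corresponds to the robustness-to-overfitting role the abstract attributes to the regularized model: with few observed n-grams, the estimate of w is shrunk toward zero rather than chasing noise in the counts.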
We report the performance of our iVector models on the NIST language recognition evaluations LRE2009 and LRE2011 and on the RATS language recognition task, among the most recent and challenging language recognition tasks. Using our phonotactic iVector model, we obtain a significant improvement over our phonotactic baseline system, which was state of the art when this thesis was started. Our results on NIST LRE2009, NIST LRE2011 and RATS confirm the advantage of our iVector model for discrete features over other state-of-the-art phonotactic systems.
NTNU: NTNU-trykk, 2014.