Using Sub-Phonemic Units for HMM Based Phone Recognition
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Electronics and Telecommunications.
2013 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

A common way to construct a large vocabulary continuous speech recogniser (LVCSR) is to use three-state HMMs to model phonemic units. The focus of this dissertation is to improve this standard phone model. To this end, three alternative phone recognition systems are proposed. Central to the first two systems is a set of Acoustic SubWord Units (ASWUs), which are used to train phone models with an extended state topology. This extended topology contains several parallel paths and allows the model to vary the number of states employed for each realisation of a phone.

In the first system this topology is fixed, with four parallel paths containing one, two, three and four states, respectively. A novel training algorithm is developed in order to train each of the states properly. In the second system the number of paths and the number of states in each of the paths are derived in a data-driven manner using an algorithm for pronunciation variation modelling (PVM). This algorithm is applied to the set of ASWUs in order to find variations for each phone, and these variations are used to decide the topologies.
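As a rough illustration of the fixed topology described above, the sketch below builds the set of allowed transitions for a phone model with four parallel left-to-right paths of one to four states. The function name, the state names, the shared non-emitting entry and exit states, and the self-loop on every emitting state are assumptions made for this example only; the record does not specify them, and the thesis may define the topology differently.

```python
def build_parallel_topology(path_lengths=(1, 2, 3, 4)):
    """Return (states, arcs) for a phone model with parallel left-to-right paths.

    states: list of state names, including non-emitting "entry" and "exit"
            states (an assumption of this sketch).
    arcs:   set of (from_state, to_state) pairs that are allowed transitions.
    """
    states = ["entry"]
    arcs = set()
    for p, length in enumerate(path_lengths):
        # Emitting states of path p, in left-to-right order.
        names = [f"p{p}_s{i}" for i in range(length)]
        states.extend(names)
        # Every path can be entered from the shared entry state.
        arcs.add(("entry", names[0]))
        for i, name in enumerate(names):
            # Self-loop (assumed) lets a state absorb more than one frame.
            arcs.add((name, name))
            # Forward arc to the next state on the path, or to the exit.
            nxt = names[i + 1] if i + 1 < length else "exit"
            arcs.add((name, nxt))
    states.append("exit")
    return states, arcs


states, arcs = build_parallel_topology()
emitting = [s for s in states if s not in ("entry", "exit")]
print(len(emitting), "emitting states,", len(arcs), "allowed transitions")
```

Under these assumptions a realisation that follows path k occupies at least k frames, so short realisations of a phone can use a short path and long realisations a long one.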

The final system is a hybrid system that employs non-negative matrix factorisation (NMF), an algorithm capable of extracting latent units in a data-driven manner, to model the acoustic observations. This hybrid was previously proposed in the literature for modelling audio mixtures. In this dissertation, modifications to this original hybrid, the non-negative HMM (N-HMM), are suggested so that it can be used for the speech recognition task. The main contribution is to make the output probability distribution functions depend on state duration. This modified structure is referred to as the non-negative durational HMM (NdHMM).
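For readers unfamiliar with NMF, the generic factorisation can be written as follows. This is the standard textbook formulation with multiplicative updates for the squared-error objective, given only as background; the symbols V, W and H are introduced here for illustration, and the N-HMM and NdHMM formulations in the thesis may differ.

```latex
% V: non-negative data matrix (e.g. magnitude spectra, one column per frame)
% W: non-negative basis matrix (latent units), H: non-negative activations
V \approx W H, \qquad V \in \mathbb{R}_{\ge 0}^{F \times T},\;
W \in \mathbb{R}_{\ge 0}^{F \times K},\; H \in \mathbb{R}_{\ge 0}^{K \times T}

% Multiplicative updates that do not increase \lVert V - WH \rVert_F^2
% (\odot and the fraction bar denote element-wise product and division)
H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H}, \qquad
W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}}
```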

Place, publisher, year, edition, pages
NTNU, 2013.
Doctoral theses at NTNU, ISSN 1503-8181 ; 2013:185
National Category
Electronics Telecommunication
URN: urn:nbn:no:ntnu:diva-21176
OAI: diva2:631569
Public defence
2013-06-21, 13:15
Available from: 2013-06-21 Created: 2013-06-21 Bibliographically approved
