Open this publication in new window or tab >>2013 (English)In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 21, no 9, p. 1777-1790Article in journal (Refereed) Published
Abstract [en]
Quantization of the linear predictive coding parameters is an important part in speech coding. Probability density function (PDF)-optimized vector quantization (VQ) has been previously shown to be more efficient than VQ based only on training data. For data with bounded support, some well-defined bounded-support distributions (e.g., the Dirichlet distribution) have been proven to outperform the conventional Gaussian mixture model (GMM), with the same number of free parameters required to describe the model. When exploiting both the boundary and the order properties of the line spectral frequency (LSF) parameters, the distribution of LSF differences (Delta LSF) can be modelled with a Dirichlet mixture model (DMM). We propose a corresponding DMM based VQ. The elements in a Dirichlet vector variable are highly mutually correlated. Motivated by the Dirichlet vector variable's neutrality property, a practical non-linear transformation scheme for the Dirichlet vector variable can be obtained. Similar to the Karhunen-Loeve transform for Gaussian variables, this non-linear transformation decomposes the Dirichlet vector variable into a set of independent beta-distributed variables. Using high rate quantization theory and by the entropy constraint, the optimal inter-and intra-component bit allocation strategies are proposed. In the implementation of scalar quantizers, we use the constrained-resolution coding to approximate the derived constrained-entropy coding. A practical coding scheme for DVQ is designed for the purpose of reducing the quantization error accumulation. The theoretical and practical quantization performance of DVQ is evaluated. Compared to the state-of-the-art GMM-based VQ and recently proposed beta mixture model (BMM) based VQ, DVQ performs better, with even fewer free parameters and lower computational cost.
Keywords
Beta distribution, bounded support distribution, Dirichlet distribution, line spectral frequency, mixture modelling, neutrality property, Speech coding, vector quantization
National Category
Electrical Engineering, Electronic Engineering, Information Engineering Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-47406 (URN)10.1109/TASL.2013.2238732 (DOI)000321906500001 ()2-s2.0-84880522911 (Scopus ID)
Note
QC 20130815. Updated from submitted to published.
2011-11-082011-11-082024-03-18Bibliographically approved