Speech Enhancement Using Nonnegative Matrix Factorization and Hidden Markov Models
KTH, School of Electrical Engineering (EES), Communication Theory.
2013 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Reducing interference noise in a noisy speech recording has been a challenging task for many years yet has a variety of applications, for example, in handsfree mobile communications, in speech recognition, and in hearing aids. Traditional single-channel noise reduction schemes, such as Wiener filtering, do not work satisfactorily in the presence of non-stationary background noise. Alternatively, supervised approaches, where the noise type is known in advance, lead to higher-quality enhanced speech signals. This dissertation proposes supervised and unsupervised single-channel noise reduction algorithms. We consider two classes of methods for this purpose: approaches based on nonnegative matrix factorization (NMF) and methods based on hidden Markov models (HMM).

The contributions of this dissertation can be divided into three main (overlapping) parts. First, we propose NMF-based enhancement approaches that use temporal dependencies of the speech signals. In a standard NMF, the important temporal correlations between consecutive short-time frames are ignored. We propose both continuous and discrete state-space nonnegative dynamical models. These approaches are used to describe the dynamics of the NMF coefficients or activations. We derive optimal minimum mean squared error (MMSE) or linear MMSE estimates of the speech signal using the probabilistic formulations of NMF. Our experiments show that using temporal dynamics in the NMF-based denoising systems improves the performance greatly. Additionally, this dissertation proposes an approach to learn the noise basis matrix online from the noisy observations. This relaxes the assumption of an a-priori specified noise type and enables us to use the NMF-based denoising method in an unsupervised manner. Our experiments show that the proposed approach with online noise basis learning considerably outperforms state-of-the-art methods in different noise conditions.

Second, this thesis proposes two methods for NMF-based separation of sources with similar dictionaries. We suggest a nonnegative HMM (NHMM) for babble noise that is derived from a speech HMM. In this approach, speech and babble signals share the same basis vectors, whereas the activations of the basis vectors differ between the two signals over time. We derive an MMSE estimator for the clean speech signal using the proposed NHMM. The objective evaluations and a subjective listening test show that the proposed babble model and the final noise reduction algorithm outperform the conventional methods noticeably. Moreover, the dissertation proposes another solution to separate a desired source from a mixture with arbitrarily low artifacts.

Third, an HMM-based algorithm to enhance the speech spectra using super-Gaussian priors is proposed. Our experiments show that speech discrete Fourier transform (DFT) coefficients have super-Gaussian rather than Gaussian distributions even if we limit the speech data to come from a specific phoneme. We derive a new MMSE estimator for the speech spectra that uses super-Gaussian priors. The results of our evaluations using the developed noise reduction algorithm support the super-Gaussianity hypothesis.
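
For orientation, the "standard NMF" that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration and not the thesis's algorithm: it assumes the generalized Kullback-Leibler cost with the well-known multiplicative updates, and the names `nmf_kl` and `kl_div` are hypothetical helpers introduced here.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative matrix V (freq x frames) as W @ H.

    Uses multiplicative updates for the generalized KL divergence.
    Frames (columns of V) are treated independently of their order,
    which is the limitation the abstract points out.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # basis vectors (dictionary)
    H = rng.random((rank, n_frames)) + eps  # activations per frame
    for _ in range(n_iter):
        R = V / (W @ H + eps)                       # elementwise ratio V / (WH)
        W *= (R @ H.T) / (H.sum(axis=1) + eps)      # update basis
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)  # update activations
    return W, H

def kl_div(V, V_hat, eps=1e-12):
    """Generalized KL divergence between V and its approximation V_hat."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))
```

Because the cost is a sum over individual frames, permuting the columns of V and H together leaves it unchanged; the dynamical models described in the abstract are meant to add exactly the temporal information that this baseline discards.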

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2013, xiv, 52 p.
Series
Trita-EE, ISSN 1653-5146 ; 2013:030
Keyword [en]
Speech enhancement, noise reduction, nonnegative matrix factorization, hidden Markov model, probabilistic latent component analysis, online dictionary learning, super-Gaussian distribution, MMSE estimator, temporal dependencies, dynamic NMF
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-124642; ISBN: 978-91-7501-833-1 (print); OAI: oai:DiVA.org:kth-124642; DiVA: diva2:637931
Public defence
2013-10-18, Lecture Room F3, Lindstedtsvägen 26, KTH, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20130916

Available from: 2013-09-16 Created: 2013-07-24 Last updated: 2013-10-09 Bibliographically approved
List of papers
1. A New Linear MMSE Filter for Single Channel Speech Enhancement Based on Nonnegative Matrix Factorization
2011 (English). In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2011, IEEE, 2011. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, a linear MMSE filter is derived for single-channel speech enhancement which is based on Nonnegative Matrix Factorization (NMF). Assuming an additive model for the noisy observation, an estimator is obtained by minimizing the mean square error between the clean speech and the estimated speech components in the frequency domain. In addition, the noise power spectral density (PSD) is estimated using NMF and the obtained noise PSD is used in a Wiener filtering framework to enhance the noisy speech. The results of both algorithms are compared to the result of the same Wiener filtering framework in which the noise PSD is estimated using a recently developed MMSE-based method. NMF-based approaches outperform the Wiener filter with the MMSE-based noise PSD tracker for different measures. Compared to the NMF-based Wiener filtering approach, Source to Distortion Ratio (SDR) is improved for the evaluated noise types for different input SNRs using the proposed linear MMSE filter.
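
The NMF-based Wiener filtering idea mentioned in this abstract can be illustrated with a generic sketch. It is not the paper's linear MMSE filter: it assumes pre-trained speech and noise basis matrices, estimates activations with KL multiplicative updates while the bases stay fixed, and applies a Wiener-type gain. The function names and the synthetic data in the test are hypothetical.

```python
import numpy as np

def activations_fixed_basis(Y, W, n_iter=100, seed=0, eps=1e-12):
    """Estimate nonnegative activations H such that Y ~ W @ H, with W held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], Y.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps
    for _ in range(n_iter):
        # KL multiplicative update for H only
        H *= (W.T @ (Y / (W @ H + eps))) / col_sums
    return H

def nmf_wiener_enhance(Y, W_speech, W_noise, n_iter=100, eps=1e-12):
    """Return (speech estimate, gain) for a noisy spectrogram Y (freq x frames)."""
    W = np.hstack([W_speech, W_noise])       # concatenated dictionary
    H = activations_fixed_basis(Y, W, n_iter)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]                     # speech spectral estimate
    N = W_noise @ H[k:]                      # noise spectral estimate
    gain = S / (S + N + eps)                 # Wiener-type gain in [0, 1]
    return gain * Y, gain
```

The gain attenuates each time-frequency bin in proportion to how much of it the noise dictionary explains, so the quality of the result depends directly on how well the two dictionaries separate speech from noise.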

Place, publisher, year, edition, pages
IEEE, 2011
Keyword
Speech enhancement, nonnegative matrix factorization, Linear MMSE filter
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-42589 (URN); 10.1109/ASPAA.2011.6082303 (DOI); 000298302900012 (); 2-s2.0-83455182002 (Scopus ID)
Conference
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. New Paltz, NY. October 16-19 2011
Projects
AUDIS
Funder
EU, European Research Council, 2008-214699
Note

QC 20111116

Available from: 2012-10-31 Created: 2011-10-11 Last updated: 2013-09-16 Bibliographically approved
2. Supervised and unsupervised speech enhancement using nonnegative matrix factorization
2013 (English). In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 21, no 10, 2140-2151 p. Article in journal (Refereed) Published
Abstract [en]

Reducing the interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared to traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMM), lead to higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that a model must be trained a priori for each noise type. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method that is based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods outperform the competing algorithms substantially.

Place, publisher, year, edition, pages
IEEE Signal Processing Society, 2013
Keyword
Nonnegative matrix factorization (NMF), speech enhancement, PLCA, HMM, Bayesian Inference
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-124353 (URN); 10.1109/TASL.2013.2270369 (DOI); 000322334900013 (); 2-s2.0-84881053943 (Scopus ID)
Note

QC 20130905

Available from: 2013-06-28 Created: 2013-06-28 Last updated: 2017-12-06 Bibliographically approved
3. Nonnegative HMM for Babble Noise Derived from Speech HMM: Application to Speech Enhancement
2013 (English). In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 21, no 5, 998-1011 p. Article in journal (Refereed) Published
Abstract [en]

Deriving a good model for multitalker babble noise can facilitate different speech processing algorithms, e.g. noise reduction, to reduce the so-called cocktail party difficulty. In the available systems, the fact that the babble waveform is generated as a sum of N different speech waveforms is not exploited explicitly. In this paper, first we develop a gamma hidden Markov model for power spectra of the speech signal, and then formulate it as a sparse nonnegative matrix factorization (NMF). Second, the sparse NMF is extended by relaxing the sparsity constraint, and a novel model for babble noise (gamma nonnegative HMM) is proposed in which the babble basis matrix is the same as the speech basis matrix, and only the activation factors (weights) of the basis vectors are different for the two signals over time. Finally, a noise reduction algorithm is proposed using the derived speech and babble models. All of the stationary model parameters are estimated using the expectation-maximization (EM) algorithm, whereas the time-varying parameters, i.e. the gain parameters of speech and babble signals, are estimated using a recursive EM algorithm. The objective and subjective listening evaluations show that the proposed babble model and the final noise reduction algorithm significantly outperform the conventional methods.
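
The reason a babble model can reuse the speech basis matrix can be seen in a toy calculation. The sketch below is an illustration under stated assumptions, not the paper's gamma nonnegative HMM: it treats the talkers' power spectra as additive (cross terms ignored), gives every talker the same random basis `W`, and lets each talker activate a single basis vector per frame to mimic HMM-style sparsity. All names and sizes are hypothetical.

```python
import numpy as np

def babble_from_talkers(W, H_talkers):
    """Sum the spectrograms W @ H_i of several talkers that share the basis W."""
    return sum(W @ H for H in H_talkers)

def active_fraction(H, threshold=1e-9):
    """Fraction of activation entries that are nonzero."""
    return float(np.mean(H > threshold))

rng = np.random.default_rng(0)
n_freq, n_basis, n_frames, n_talkers = 16, 8, 50, 6
W = rng.random((n_freq, n_basis))  # shared (speech) basis matrix

# Each talker activates exactly one basis vector per frame (sparse activations).
H_talkers = np.zeros((n_talkers, n_basis, n_frames))
for i in range(n_talkers):
    states = rng.integers(0, n_basis, size=n_frames)
    H_talkers[i, states, np.arange(n_frames)] = rng.gamma(2.0, 1.0, size=n_frames)

babble = babble_from_talkers(W, H_talkers)
H_babble = H_talkers.sum(axis=0)  # babble activations: same basis, denser weights
```

Under these assumptions the babble spectrogram equals `W @ H_babble` exactly, and `H_babble` has more active entries per frame than any single talker, which matches the abstract's step of relaxing the sparsity constraint while keeping the basis.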

Place, publisher, year, edition, pages
IEEE Signal Processing Society, 2013
Keyword
Babble noise, hidden Markov model, nonnegative matrix factorization, speech enhancement
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-116767 (URN); 10.1109/TASL.2013.2243435 (DOI); 000315287500003 (); 2-s2.0-84873897366 (Scopus ID)
Note

QC 20130219

Available from: 2013-02-19 Created: 2013-01-26 Last updated: 2017-12-06 Bibliographically approved
4. Spectral Domain Speech Enhancement Using HMM State-Dependent Super-Gaussian Priors
2013 (English). In: IEEE Signal Processing Letters, ISSN 1070-9908, E-ISSN 1558-2361, Vol. 20, no 3, 253-256 p. Article in journal (Refereed) Published
Abstract [en]

The derivation of MMSE estimators for the DFT coefficients of speech signals, given an observed noisy signal and super-Gaussian prior distributions, has received a lot of interest recently. In this letter, we look at the distribution of the periodogram coefficients of different phonemes, and show that they have a gamma distribution with shape parameters less than one. This verifies that the DFT coefficients for not only the whole speech signal but also for individual phonemes have super-Gaussian distributions. We develop a spectral domain speech enhancement algorithm, and derive hidden Markov model (HMM) based MMSE estimators for speech periodogram coefficients under this gamma assumption in both a high uniform resolution and a reduced-resolution Mel domain. The simulations show that the performance is improved using a gamma distribution compared to the exponential case. Moreover, we show that, even though beneficial in some aspects, the Mel-domain processing does not lead to better results than the algorithms in the high-resolution domain.
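
The link between a gamma-distributed periodogram with shape below one and super-Gaussian DFT coefficients can be checked numerically. The sketch below rests on an assumption that is not stated in the abstract: the phase is uniform and independent of the periodogram value. Under that assumption the kurtosis of the real part of a DFT coefficient is 1.5(1 + 1/k) for gamma shape k, which equals 3 (Gaussian) at k = 1 and exceeds 3 for k < 1. The function names are hypothetical.

```python
import numpy as np

def kurtosis_real_dft(shape):
    """Closed-form kurtosis of Re{X} when |X|^2 ~ Gamma(shape) and the phase is uniform.

    E[R^2] = E[P] / 2 and E[R^4] = 3 E[P^2] / 8, with E[P^2] / E[P]^2 = (shape + 1) / shape.
    """
    return 1.5 * (1.0 + 1.0 / shape)

def simulate_kurtosis(shape, n=200_000, seed=0):
    """Monte Carlo estimate of the same kurtosis."""
    rng = np.random.default_rng(seed)
    power = rng.gamma(shape, 1.0, size=n)            # periodogram samples
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n)    # assumed uniform phase
    real = np.sqrt(power) * np.cos(phase)            # real part of the DFT coefficient
    return float(np.mean(real**4) / np.mean(real**2) ** 2)
```

A shape parameter of one corresponds to the exponential periodogram of a complex Gaussian coefficient, so comparing `simulate_kurtosis(0.5)` with `simulate_kurtosis(1.0)` gives a quick view of how much heavier the tails become under the gamma assumption.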

Place, publisher, year, edition, pages
IEEE Signal Processing Society, 2013
Keyword
HMM, speech enhancement, super-Gaussian pdf
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-118490 (URN); 10.1109/LSP.2013.2242467 (DOI); 000314828600002 (); 2-s2.0-84873620144 (Scopus ID)
Note

QC 20130221

Available from: 2013-02-21 Created: 2013-02-19 Last updated: 2017-12-06 Bibliographically approved
5. Prediction Based Filtering and Smoothing to Exploit Temporal Dependencies in NMF
2013 (English). In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE Signal Processing Society, 2013, 873-877 p. Conference paper, Published paper (Refereed)
Abstract [en]

Nonnegative matrix factorization is an appealing technique for many audio applications. However, in its basic form, it does not use temporal structure, which is an important source of information in speech processing. In this paper, we propose NMF-based filtering and smoothing algorithms that are related to Kalman filtering and smoothing. While our prediction step is similar to that of Kalman filtering, we develop a multiplicative update step which is more convenient for nonnegative data analysis and in line with existing NMF literature. The proposed smoothing approach introduces an unavoidable processing delay, but the filtering algorithm does not and can be readily used for on-line applications. Our experiments using the proposed algorithms show a significant improvement over the baseline NMF approaches. In the case of speech denoising with factory noise at 0 dB input SNR, the smoothing algorithm outperforms NMF by 3.2 dB in SDR and around 0.5 MOS in PESQ. Likewise, source separation experiments show improved performance from taking advantage of the temporal regularities in speech.

Place, publisher, year, edition, pages
IEEE Signal Processing Society, 2013
Series
IEEE International Conference on Acoustics, Speech and Signal Processing. Proceedings, ISSN 1520-6149
Keyword
Nonnegative matrix factorization (NMF), Probabilistic latent component analysis (PLCA), Prediction, Temporal dependencies
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-119317 (URN); 10.1109/ICASSP.2013.6637773 (DOI); 000329611501008 (); 2-s2.0-84890522039 (Scopus ID); 978-147990356-6 (ISBN)
Conference
2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013; Vancouver, BC; Canada; 26 May 2013 through 31 May 2013
Note

QC 20140224

Available from: 2013-03-13 Created: 2013-03-13 Last updated: 2014-02-24 Bibliographically approved
6. Low-artifact Source Separation Using Probabilistic Latent Component Analysis
2013 (English). In: 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE Signal Processing Society, 2013, article no. 6701837. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a method based on the probabilistic latent component analysis (PLCA) in which we use exponential distributions as priors to decrease the activity level of a given basis vector. A straightforward application of this method is when we try to extract a desired source from a mixture with low artifacts. For this purpose, we propose a maximum a posteriori (MAP) approach to identify the common basis vectors between two sources. A low-artifact estimate can now be obtained by using a constraint such that the common basis vectors in the interfering signal’s dictionary tend to remain inactive. We discuss applications of this method in source separation with similar-gender speakers and in enhancing a speech signal that is contaminated with babble noise. Our simulations show that the proposed method not only reduces the artifacts but also increases the overall quality of the estimated signal.

Place, publisher, year, edition, pages
IEEE Signal Processing Society, 2013
Series
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, ISSN 1931-1168
Keyword
Source Separation, Nonnegative Matrix Factorization (NMF), PLCA, Dictionary Learning, Artifact Reduction
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-124641 (URN); 10.1109/WASPAA.2013.6701837 (DOI); 000349479800029 (); 2-s2.0-84893559774 (Scopus ID); 978-147990972-8 (ISBN)
Conference
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 20 Oct - 23 Oct 2013, New Paltz, New York, U.S.A
Note

QC 20130724

Available from: 2013-07-24 Created: 2013-07-24 Last updated: 2015-12-07 Bibliographically approved

Open Access in DiVA

Thesis (785 kB), 1386 downloads
File information
File name: FULLTEXT01.pdf; File size: 785 kB; Checksum: SHA-512
b1a0ef8609872dc82d799e5cb99c7ebf56f066abaf31fc52c09bae39d4394377946dd0b5dd6ebc400d3e4735c1d071bba50f62e08808dff34e46a07d50e8d8c2
Type: fulltext; Mimetype: application/pdf

Search in DiVA

By author/editor
Mohammadiha, Nasser
By organisation
Communication Theory
Electrical Engineering, Electronic Engineering, Information Engineering