Prediction Based Filtering and Smoothing to Exploit Temporal Dependencies in NMF
KTH, School of Electrical Engineering (EES), Communication Theory.
University of Illinois at Urbana-Champaign.
KTH, School of Electrical Engineering (EES), Communication Theory.
2013 (English). In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE Signal Processing Society, 2013, pp. 873-877. Conference paper, Published paper (Refereed)
Abstract [en]

Nonnegative matrix factorization is an appealing technique for many audio applications. However, in its basic form it does not use temporal structure, which is an important source of information in speech processing. In this paper, we propose NMF-based filtering and smoothing algorithms that are related to Kalman filtering and smoothing. While our prediction step is similar to that of Kalman filtering, we develop a multiplicative update step, which is more convenient for nonnegative data analysis and in line with the existing NMF literature. The proposed smoothing approach introduces an unavoidable processing delay, but the filtering algorithm does not and can be readily used for on-line applications. Our experiments using the proposed algorithms show a significant improvement over the baseline NMF approaches. In the case of speech denoising with factory noise at 0 dB input SNR, the smoothing algorithm outperforms NMF by 3.2 dB in SDR and around 0.5 MOS in PESQ; likewise, the source separation experiments show improved performance because the temporal regularities in speech are exploited.
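
As a rough illustration of the idea (a minimal sketch only, not the paper's exact update rules: the basis W, the activation-transition matrix A, and the prediction weight lam are assumed placeholders), one frame of prediction-based NMF filtering could be written as below. The prediction step propagates the previous activations, and a heuristic multiplicative update then fits the current frame while pulling the activations toward that prediction.

    # Illustrative sketch: online NMF "filtering" where each frame's
    # activations are regularized toward a prediction from the previous
    # frame, in the spirit of a Kalman prediction step.
    import numpy as np

    def nmf_filter_frame(v, W, h_prev, A, lam=0.5, n_iter=30, eps=1e-12):
        """Estimate activations h for one magnitude-spectrum frame v (F,),
        given a fixed basis W (F, K), previous activations h_prev (K,) and
        a linear predictor A (K, K). lam weights the prediction term."""
        h_pred = A @ h_prev              # prediction step (AR(1)-style)
        h = h_pred.copy() + eps          # initialize at the prediction
        ones = np.ones_like(v)
        for _ in range(n_iter):
            wh = W @ h + eps
            # heuristic multiplicative update for the cost
            # D_KL(v || W h) + (lam / 2) * ||h - h_pred||^2,
            # obtained by splitting the gradient into positive and
            # negative parts (keeps h nonnegative)
            num = W.T @ (v / wh) + lam * h_pred
            den = W.T @ ones + lam * h + eps
            h *= num / den
        return h

    # Example use over a magnitude spectrogram V of shape (F, T):
    # H = np.zeros((K, T)); h = np.ones(K) / K
    # for t in range(T):
    #     h = nmf_filter_frame(V[:, t], W, h, A)
    #     H[:, t] = h

Because each frame only uses past frames, a scheme of this shape incurs no processing delay; a smoothing variant would additionally revisit past activations once later frames are observed.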

Place, publisher, year, edition, pages
IEEE Signal Processing Society, 2013, pp. 873-877.
Series
IEEE International Conference on Acoustics, Speech and Signal Processing. Proceedings, ISSN 1520-6149
Keyword [en]
Nonnegative matrix factorization (NMF), Probabilistic latent component analysis (PLCA), Prediction, Temporal dependencies.
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-119317
DOI: 10.1109/ICASSP.2013.6637773
ISI: 000329611501008
Scopus ID: 2-s2.0-84890522039
ISBN: 978-147990356-6 (print)
OAI: oai:DiVA.org:kth-119317
DiVA: diva2:610994
Conference
2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013; Vancouver, BC; Canada; 26 May 2013 through 31 May 2013
Note

QC 20140224

Available from: 2013-03-13. Created: 2013-03-13. Last updated: 2014-02-24. Bibliographically approved.
In thesis
1. Speech Enhancement Using Nonnegative Matrix Factorization and Hidden Markov Models
2013 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Reducing interference noise in a noisy speech recording has been a challenging task for many years, yet it has a variety of applications, for example in hands-free mobile communications, in speech recognition, and in hearing aids. Traditional single-channel noise reduction schemes, such as Wiener filtering, do not work satisfactorily in the presence of non-stationary background noise. Alternatively, supervised approaches, where the noise type is known in advance, lead to higher-quality enhanced speech signals. This dissertation proposes supervised and unsupervised single-channel noise reduction algorithms. We consider two classes of methods for this purpose: approaches based on nonnegative matrix factorization (NMF) and methods based on hidden Markov models (HMM).

The contributions of this dissertation can be divided into three main (overlapping) parts. First, we propose NMF-based enhancement approaches that use temporal dependencies of the speech signals. In a standard NMF, the important temporal correlations between consecutive short-time frames are ignored. We propose both continuous and discrete state-space nonnegative dynamical models, which are used to describe the dynamics of the NMF coefficients or activations. We derive optimal minimum mean squared error (MMSE) or linear MMSE estimates of the speech signal using the probabilistic formulations of NMF. Our experiments show that using temporal dynamics in the NMF-based denoising systems improves the performance greatly. Additionally, this dissertation proposes an approach to learn the noise basis matrix online from the noisy observations. This relaxes the assumption of an a priori specified noise type and enables us to use the NMF-based denoising method in an unsupervised manner. Our experiments show that the proposed approach with online noise basis learning considerably outperforms state-of-the-art methods in different noise conditions.
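
For orientation, the reconstruction step shared by most NMF-based denoisers (a common baseline formulation, not the thesis's specific MMSE estimators) combines pre-trained speech and noise bases through a Wiener-like soft mask; the temporal models summarized above change how the activations H are estimated, not this final step.

    # Generic NMF-denoising reconstruction (baseline sketch):
    # speech and noise bases are concatenated, activations are estimated
    # on the noisy spectrogram, and a Wiener-like mask recovers speech.
    import numpy as np

    def nmf_wiener_denoise(V, W_speech, W_noise, H, eps=1e-12):
        """V: noisy magnitude spectrogram (F, T); W_speech (F, Ks) and
        W_noise (F, Kn): pre-trained bases; H ((Ks+Kn), T): activations
        estimated on V with the concatenated basis [W_speech, W_noise]."""
        Ks = W_speech.shape[1]
        S_hat = W_speech @ H[:Ks]             # speech part of the model
        N_hat = W_noise @ H[Ks:]              # noise part of the model
        mask = S_hat / (S_hat + N_hat + eps)  # Wiener-like soft mask
        return mask * V                       # enhanced magnitude spectrogram

In the unsupervised setting described above, W_noise would additionally be updated online from the noisy observations rather than fixed in advance.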

Second, this thesis proposes two methods for NMF-based separation of sources with similar dictionaries. We suggest a nonnegative HMM (NHMM) for babble noise that is derived from a speech HMM. In this approach, speech and babble signals share the same basis vectors, whereas the activations of the basis vectors differ between the two signals over time. We derive an MMSE estimator for the clean speech signal using the proposed NHMM. Objective evaluations and a subjective listening test show that the proposed babble model and the final noise reduction algorithm noticeably outperform the conventional methods. Moreover, the dissertation proposes another solution to separate a desired source from a mixture with arbitrarily low artifacts.

Third, an HMM-based algorithm to enhance the speech spectra using super-Gaussian priors is proposed. Our experiments show that speech discrete Fourier transform (DFT) coefficients have super-Gaussian rather than Gaussian distributions, even if we limit the speech data to come from a specific phoneme. We derive a new MMSE estimator for the speech spectra that uses super-Gaussian priors. The results of our evaluations using the developed noise reduction algorithm support the super-Gaussianity hypothesis.
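
For context only (the thesis derives its own closed-form estimators; the prior named below is merely an example), such estimators instantiate the generic Bayesian MMSE structure

\hat{S}_k = \mathrm{E}[S_k \mid Y_k] = \frac{\int s \, p(Y_k \mid s) \, p(s) \, \mathrm{d}s}{\int p(Y_k \mid s) \, p(s) \, \mathrm{d}s},

where Y_k is the noisy DFT coefficient in frequency bin k and p(s) is a super-Gaussian prior on the clean speech coefficient (for instance a Laplacian), in place of the Gaussian prior that underlies the classical Wiener filter.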

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2013. xiv, 52 p.
Series
Trita-EE, ISSN 1653-5146 ; 2013:030
Keyword
Speech enhancement, noise reduction, nonnegative matrix factorization, hidden Markov model, probabilistic latent component analysis, online dictionary learning, super-Gaussian distribution, MMSE estimator, temporal dependencies, dynamic NMF
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-124642
ISBN: 978-91-7501-833-1
Public defence
2013-10-18, Lecture Room F3, Lindstedtsvägen 26, KTH, Stockholm, 13:00 (English)
Note

QC 20130916

Available from: 2013-09-16. Created: 2013-07-24. Last updated: 2013-10-09. Bibliographically approved.

Open Access in DiVA

fulltext: FULLTEXT01.pdf, 151 kB, application/pdf

Other links

Publisher's full text | Scopus

Search in DiVA

By author/editor: Mohammadiha, Nasser; Leijon, Arne
By organisation: Communication Theory
National Category: Electrical Engineering, Electronic Engineering, Information Engineering
