Bayesian learning of Gaussian mixtures: Variational "over-pruning" revisited
KTH, School of Electrical Engineering (EES), Communication Theory.
2013 (English). Report (Other academic)
Abstract [en]

This study reconsiders two simple toy data examples proposed by MacKay (2001) to illustrate what he called “symmetry-breaking” and inappropriate “over-pruning” by the variational inference (VI) approximation in Bayesian learning of probabilistic mixture models.

The exact Bayesian solution is derived formally, including the effects of the parameter values in the prior distribution of mixture weights. The exact solution is then compared with the results of the VI approximation.

In both toy examples, both the exact solution and the VI approximation normally assigned each data cluster entirely to its own mixture component. With both methods, the number of active mixture components is normally the same as the number of data clusters. In this sense, the VI approach causes no “over-pruning”. In one extreme example, with two clusters containing only 1 and 3 samples, respectively, and very small parameter values in the prior Dirichlet distribution of mixture weights, the exact Bayesian solution assigned all samples to the same component, i.e., with “over-pruning”, whereas the VI approximation still converged to a solution using both mixture components, i.e., with no “over-pruning”. Thus, if inappropriate over-pruning occurs, it is probably caused by an inappropriate choice of prior model parameters, not by the VI approach itself.
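The role of the Dirichlet weight-prior parameter can be explored with off-the-shelf tools. As a hedged illustration (scikit-learn's BayesianGaussianMixture is not the report's own code, and the clusters here are larger than the 1- and 3-sample extreme case), the Dirichlet parameter enters VI as `weight_concentration_prior`:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Two well-separated 1-D clusters, a larger analogue of the toy data.
X = np.concatenate([rng.normal(-5.0, 0.5, 50),
                    rng.normal(+5.0, 0.5, 50)]).reshape(-1, 1)

for alpha in (1e-3, 1.0):
    vb = BayesianGaussianMixture(
        n_components=2,
        weight_concentration_prior_type="dirichlet_distribution",
        weight_concentration_prior=alpha,  # Dirichlet parameter of the weight prior
        random_state=0,
    ).fit(X)
    # With ample data per cluster, VI keeps both components active
    # regardless of alpha; pruning only bites when a cluster is tiny.
    print(f"alpha={alpha}: weights={np.round(vb.weights_, 3)}")
```

With both priors, each cluster gets its own component, consistent with the report's finding that over-pruning is driven by the prior-data balance rather than by VI as such.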

The VI approximation shows “symmetry-breaking” because it converges to one of the arbitrary and equivalent permutations of the indices of mixture components. The “symmetric” exact solution formally includes all these permutations, but this is precisely what makes the exact Bayesian solution computationally impractical. Thus, in these toy examples, we must conclude that “symmetry-breaking” is not the same thing as “over-pruning”. The VI approximation shows “symmetry-breaking” but no “over-pruning”.
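The symmetry-breaking can be made concrete by restarting VI from different initialisations. In this sketch (again using scikit-learn, which the report does not), each run converges to one arbitrary labelling of the components, but after sorting away the index permutation all runs agree on the same fitted means:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Two well-separated 1-D clusters.
X = np.concatenate([rng.normal(-5.0, 0.5, 50),
                    rng.normal(+5.0, 0.5, 50)]).reshape(-1, 1)

# Each restart converges to ONE labelling of the two components
# (symmetry-breaking); sorting removes the arbitrary permutation.
sorted_means = [
    np.sort(
        BayesianGaussianMixture(n_components=2, random_state=seed)
        .fit(X).means_.ravel()
    )
    for seed in (0, 1, 2)
]
# Up to index permutation, all runs recover the same two means.
```

This is exactly the distinction the abstract draws: the permutation a run lands in is arbitrary, but no component is pruned.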

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2013. 29 p.
Trita-EE, ISSN 1653-5146; 2013:032
Keyword [en]
Machine learning; Bayesian; Variational
URN: urn:nbn:se:kth:diva-125832
OAI: diva2:640979


Available from: 2013-08-15. Created: 2013-08-15. Last updated: 2013-08-16. Bibliographically approved.

Open Access in DiVA

Fulltext: FULLTEXT01.pdf (1749 kB, application/pdf; checksum SHA-512)

By author/editor: Leijon, Arne
By organisation: Communication Theory
