Digitala Vetenskapliga Arkivet

Analyzing the Impact of Introducing Convolutional Layers in Variational Autoencoders
KTH, School of Electrical Engineering and Computer Science (EECS).
2024 (English). Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Alternative title
Analys av Effekten av att Introducera Konvolutionella Lager i Variationsautoenkodare (Swedish)
Abstract [en]

Variational autoencoders (VAEs) are probabilistic models that can be used both for image denoising and for image generation through latent space sampling. Convolutional neural networks (CNNs) are widely used, but it is not always clear to what extent convolutional layers benefit VAEs. This thesis therefore implements and compares two non-convolutional and two convolutional VAEs to quantitatively investigate some of the effects of introducing convolutional layers into VAEs for image denoising and image generation. Manual and grid search methodologies were employed to find suitable models for the comparisons. The scope of this thesis is limited to small networks and a dataset containing very small, homogeneous images. The results of the comparison did not show significant differences in denoising performance between the non-convolutional and convolutional models: the differences were less than 0.2% on both the mean structural similarity index measure and the peak signal-to-noise ratio metrics. Furthermore, the results showed that the non-convolutional model vastly outperformed the convolutional model in image generation, scoring three times better according to the Fréchet inception distance metric. On the other hand, convolutional layers did seem to offer benefits in the form of more numerically stable training, and they also allowed for deeper networks while improving convergence. These benefits were especially notable in the denoising task, where the convolutional model converged much faster than its counterpart. In summary, introducing convolutional layers did not invariably enhance the performance of VAEs in either the image denoising or the image generation task, indicating that convolutional layers should be used with some consideration. However, they do seem to open possibilities for deeper and more stable models that could surpass the limits of non-convolutional models on more complex tasks.
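
The abstract contrasts VAEs built from fully connected layers with VAEs built from convolutional layers, evaluated on denoising of small images with PSNR and SSIM. As a rough, hypothetical illustration of that contrast (not the architectures actually used in the thesis), the PyTorch sketch below defines a minimal fully connected VAE and a minimal convolutional VAE sharing the same reparameterised latent space, plus a PSNR helper of the kind named in the abstract. The 28x28 greyscale image size, layer widths and latent dimension of 16 are assumptions made for the sketch.

# Minimal sketch, assuming small 28x28 greyscale images; not the thesis models.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseVAE(nn.Module):
    """Non-convolutional VAE: flattens the image and uses linear layers only."""
    def __init__(self, img_size=28, latent_dim=16):
        super().__init__()
        d = img_size * img_size
        self.img_size = img_size
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(d, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, d), nn.Sigmoid()
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        recon = self.dec(z).view(-1, 1, self.img_size, self.img_size)
        return recon, mu, logvar

class ConvVAE(nn.Module):
    """Convolutional VAE: strided conv encoder, transposed-conv decoder."""
    def __init__(self, img_size=28, latent_dim=16):
        super().__init__()
        self.img_size = img_size
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
            nn.Flatten(),
        )
        feat = 32 * (img_size // 4) * (img_size // 4)
        self.fc_mu = nn.Linear(feat, latent_dim)
        self.fc_logvar = nn.Linear(feat, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, feat)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),    # 7 -> 14
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),  # 14 -> 28
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        s = self.img_size // 4
        recon = self.dec(self.fc_dec(z).view(-1, 32, s, s))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar):
    """Standard VAE objective: reconstruction term plus KL divergence to the prior."""
    rec = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

def psnr(recon, x, max_val=1.0):
    """Peak signal-to-noise ratio, one of the denoising metrics named in the abstract."""
    mse = F.mse_loss(recon, x)
    return 10.0 * torch.log10(max_val ** 2 / mse)

if __name__ == "__main__":
    x = torch.rand(8, 1, 28, 28)  # stand-in batch of small images
    for model in (DenseVAE(), ConvVAE()):
        recon, mu, logvar = model(x)
        print(type(model).__name__, vae_loss(recon, x, mu, logvar).item(), psnr(recon, x).item())

Running the script on a random batch only checks that both variants produce reconstructions of the correct shape and that the loss and PSNR helpers run; it says nothing about the results reported in the thesis.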

Place, publisher, year, edition, pages
2024, p. 57
Series
TRITA-EECS-EX ; 2024:346
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-351099
OAI: oai:DiVA.org:kth-351099
DiVA, id: diva2:1886188
Available from: 2024-08-22. Created: 2024-07-30. Last updated: 2024-08-22. Bibliographically approved.

Open Access in DiVA

fulltext (1474 kB), 137 downloads
File information
File name: FULLTEXT01.pdf
File size: 1474 kB
Checksum (SHA-512): b8dc33bf67bbafecbba394b7cf324db28b723b1622240943ecf8f6c72d658f172c9e840246a20e380f3132a4f425ee4a37332a4f539c892689a38c11453a3228
Type: fulltext
Mimetype: application/pdf


