Digitala Vetenskapliga Arkivet

Advancements in Agriculture: Multimodal Deep Learning for Enhanced Plant Identification
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2024 (English). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Plant identification is a critical task in agriculture and botany, traditionally performed manually. Recently, automated plant identification methods driven by deep learning (DL) have drawn the attention of researchers due to their solid performance and ability to eliminate manual feature engineering. However, these methods often rely on a single data source, failing to capture the full diversity of plant species. Multimodal fusion addresses this limitation, but determining an optimal fusion approach remains challenging and depends on human expertise. While advancements in automated neural architecture search are promising, their application in plant classification has been limited to seeking out unimodal architectures.

This study addresses this challenge by constructing a multimodal DL model for plant classification leveraging images of four plant organs – flower, leaf, fruit, and stem. The goal is to identify the optimal fusion architecture in an automated manner utilizing the multimodal fusion architecture search algorithm (MFAS). Thus, the research question is: "How effective is a multimodal deep learning model, utilizing images of plant organs fused via a multimodal neural architecture search algorithm, in automating plant identification?"

The research is guided by the design science framework and utilizes 956 classes from the PlantCLEF2015 dataset. Initially, transfer learning with the MobileNetV3Small model is employed to train a unimodal model for each plant organ. Subsequently, the layers of these four models are fused using MFAS. The best fusion architecture undergoes end-to-end training alongside hyperparameter tuning, achieving 83.48% accuracy, surpassing state-of-the-art models. It outperforms late fusion by 11.07% and demonstrates robustness to missing modalities, underscoring the effectiveness of our approach.
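
The late-fusion baseline mentioned above can be illustrated with a minimal sketch. It assumes that late fusion means averaging the per-organ class probabilities, which is one common variant and may differ from the baseline used in the thesis. The function names, the three-class logits, and the use of `None` to mark a missing organ are hypothetical. The sketch does not implement MFAS or the MobileNetV3Small models.

```python
import numpy as np

# The four plant organs named in the abstract.
ORGANS = ("flower", "leaf", "fruit", "stem")


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def late_fusion(logits_by_organ):
    """Average class probabilities over the organs that are present.

    Assumption: a missing modality is represented by None (or by an
    absent key) and is simply left out of the average.
    """
    probs = []
    for organ in ORGANS:
        logits = logits_by_organ.get(organ)
        if logits is not None:
            probs.append(softmax(np.asarray(logits, dtype=float)))
    if not probs:
        raise ValueError("at least one organ image is required")
    return np.mean(probs, axis=0)


# Hypothetical logits for a 3-class problem; fruit and stem are missing.
example = {
    "flower": [2.0, 0.0, 0.0],
    "leaf": [0.0, 1.0, 0.0],
    "fruit": None,
}
fused = late_fusion(example)
predicted_class = int(np.argmax(fused))
```

Because the average runs only over the organs that are present, the fused output remains a valid probability distribution when some images are unavailable. This gives a simple point of comparison for robustness to missing modalities.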

Place, publisher, year, edition, pages
2024.
Keywords [en]
Plant Identification, Plant Classification, Multimodal Learning, Fusion Automation, Multimodal Fusion Architecture Search, Neural Architecture Search, Deep Learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:su:diva-242632
OAI: oai:DiVA.org:su-242632
DiVA, id: diva2:1955523
Available from: 2025-04-30 Created: 2025-04-30

Open Access in DiVA

fulltext (7899 kB), 26 downloads
File information
File name: FULLTEXT01.pdf
File size: 7899 kB
Checksum (SHA-512): 5ca3e313c0a40cd8dd3e415929839c6cafd2dd8f81d9410d7f166aaea6ca9f1c6b3e8dd696b72340c6529a9ae14c68700bd7c393b451a0f86741b4e09e365513
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Lapkovskis, Alfreds; Nefedova, Natalia
By organisation
Department of Computer and Systems Sciences
Computer Sciences

Total: 26 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 24 hits