Advancements in Agriculture: Multimodal Deep Learning for Enhanced Plant Identification
2024 (English) Independent thesis Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits
Student thesis
Abstract [en]
Plant identification is a critical task in agriculture and botany, traditionally performed manually. Recently, automated plant identification methods driven by deep learning (DL) have drawn the attention of researchers due to their solid performance and ability to eliminate manual feature engineering. However, these methods often rely on a single data source, failing to capture the full diversity of plant species. Multimodal fusion addresses this limitation, but determining an optimal fusion approach remains challenging and depends on human expertise. While advancements in automated neural architecture search are promising, their application in plant classification has been limited to seeking out unimodal architectures.
This study addresses this challenge by constructing a multimodal DL model for plant classification leveraging images of four plant organs – flower, leaf, fruit, and stem. The goal is to identify the optimal fusion architecture in an automated manner utilizing the multimodal fusion architecture search algorithm (MFAS). Thus, the research question is, “How effective is a multimodal deep learning model, utilizing images of plant organs fused via a multimodal neural architecture search algorithm, in automating plant identification?”
The research is guided by the design science framework and utilizes 956 classes from the PlantCLEF2015 dataset. Initially, transfer learning with the MobileNetV3Small model is employed to train a unimodal model for each plant organ. Subsequently, the layers of these four models are fused using MFAS. The best fusion architecture undergoes end-to-end training alongside hyperparameter tuning, achieving 83.48% accuracy, surpassing state-of-the-art models. It outperforms late fusion by 11.07% and demonstrates robustness to missing modalities, underscoring the effectiveness of our approach.
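The late-fusion baseline and the missing-modality setting mentioned above can be illustrated with a minimal sketch. It assumes one common form of late fusion, namely averaging the per-organ class probabilities; the abstract does not specify the exact baseline used in the thesis, and the names `ORGANS`, `softmax`, `late_fusion`, and `predict` are hypothetical. The logits stand in for the outputs of the four unimodal models and use three classes instead of 956 for brevity.

```python
import math

# The four plant organs named in the abstract.
ORGANS = ("flower", "leaf", "fruit", "stem")


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(logits_by_organ):
    """Average per-organ class probabilities, skipping missing organs.

    Assumption: a missing organ image is represented by None (or by an
    absent key). This is an illustrative convention, not the thesis's.
    """
    probs = []
    for organ in ORGANS:
        logits = logits_by_organ.get(organ)
        if logits is not None:
            probs.append(softmax(logits))
    if not probs:
        raise ValueError("at least one organ image is required")
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def predict(logits_by_organ):
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(logits_by_organ)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up logits for one observation; the fruit image is missing.
observation = {
    "flower": [2.0, 0.5, 0.1],
    "leaf": [0.2, 1.5, 0.3],
    "fruit": None,
    "stem": [1.0, 0.9, 0.2],
}
fused_probs = late_fusion(observation)
predicted_class = predict(observation)
```

In this scheme each unimodal model is trained and evaluated independently and only the final predictions are combined. An architecture search such as MFAS instead chooses which intermediate layers of the unimodal models to fuse.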
Place, publisher, year, edition, pages
2024.
Keywords [en]
Plant Identification, Plant Classification, Multimodal Learning, Fusion Automation, Multimodal Fusion Architecture Search, Neural Architecture Search, Deep Learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:su:diva-242632
OAI: oai:DiVA.org:su-242632
DiVA, id: diva2:1955523
2025-04-30 2025-04-30