Digitala Vetenskapliga Arkivet

Context-aware learning for adaptive vision-based systems
Malmö University, Faculty of Technology and Society (TS), Department of Computer Science and Media Technology (DVMT). ORCID iD: 0000-0002-9464-7010
2025 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis presents our investigation of scene understanding and object detection for surveillance applications, emphasizing context-aware computer vision models that enhance detection accuracy in complex environments while respecting privacy considerations. The research advances object detection by addressing key aspects such as variability across environments, contextual information, and multimodal data fusion. Through a comprehensive literature review, we examine the role of contextual information, such as spatial, scale, and temporal context, in improving detection performance. Furthermore, we introduce specialized object detection models designed for indoor and outdoor environments, demonstrating how scene-specific training enhances detection accuracy. We also explore hierarchical scene classification, analyzing how different levels contribute to scene recognition. Lastly, a multimodal fall detection method integrating video and audio is proposed, overcoming limitations of purely visual systems in obstructed or low-visibility conditions. The findings of all papers highlight the effectiveness of scene context, hierarchical classification, and multimodal fusion in developing robust, high-accuracy surveillance models suitable for real-world environments.

Place, publisher, year, edition, pages
Malmö: Malmö University Press, 2025, p. 35
Series
Studies in Computer Science ; 34
Keywords [en]
Object detection, Scene classification, Vision based systems, Multimodal learning, Context-aware learning
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:mau:diva-75404
DOI: 10.24834/isbn.9789178776238
ISBN: 978-91-7877-622-1 (print)
ISBN: 978-91-7877-623-8 (electronic)
OAI: oai:DiVA.org:mau-75404
DiVA, id: diva2:1952119
Presentation
2025-04-24, B1, Niagara, Malmö University, Malmö, 10:00 (English)
Opponent
Supervisors
Available from: 2025-04-16. Created: 2025-04-14. Last updated: 2025-04-17. Bibliographically approved.
List of papers
1. Context in object detection: a systematic literature review
2025 (English). In: Artificial Intelligence Review, ISSN 0269-2821, E-ISSN 1573-7462, Vol. 58, no. 6, article id 175. Article in journal (Refereed). Published.
Abstract [en]

Context is an important factor in computer vision as it offers valuable information to clarify and analyze visual data. Utilizing the contextual information inherent in an image or a video can improve the precision and effectiveness of object detectors. For example, where recognizing an isolated object might be challenging, context information can improve comprehension of the scene. This study explores the impact of various context-based approaches to object detection. Initially, we investigate the role of context in object detection and survey it from several perspectives. We then review and discuss the most recent context-based object detection approaches and compare them. Finally, we conclude by addressing research questions and identifying gaps for further studies. More than 265 publications are included in this survey, covering different aspects of context in different categories of object detection, including general object detection, video object detection, small object detection, camouflaged object detection, zero-shot, one-shot, and few-shot object detection. This literature review presents a comprehensive overview of the latest advancements in context-based object detection, providing valuable contributions such as a thorough understanding of contextual information and effective methods for integrating various context types into object detection, thus benefiting researchers.
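One common way contextual information is integrated into detection, surveyed in reviews like this one, is rescoring: blending a detector's raw confidence with a scene-conditioned prior over object classes. The sketch below is illustrative only; the co-occurrence table, class names, and blending weight are assumptions, not taken from the paper.

```python
# Illustrative sketch: rescoring detector confidences with scene-context
# priors. The prior table and detections are hypothetical examples.

# Assumed prior probability of seeing each object class in a scene type.
SCENE_PRIORS = {
    "kitchen": {"oven": 0.9, "cow": 0.01, "cup": 0.8},
    "pasture": {"oven": 0.01, "cow": 0.9, "cup": 0.1},
}

def rescore(detections, scene, alpha=0.5):
    """Blend raw detector scores with scene-conditioned class priors.

    alpha=0 keeps the raw score; alpha=1 trusts the prior alone.
    Unknown classes fall back to a small default prior.
    """
    priors = SCENE_PRIORS[scene]
    return [
        (label, (1 - alpha) * score + alpha * priors.get(label, 0.05))
        for label, score in detections
    ]

raw = [("cow", 0.6), ("oven", 0.55)]
print(rescore(raw, "kitchen"))  # "cow" is down-weighted in a kitchen scene
```

Even this crude blend shows the core idea the survey covers: an ambiguous detection ("cow" in a kitchen) is suppressed by context, while a plausible one ("oven") is reinforced.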

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Computer vision, Context, Contextual information, Object detection, Object recognition
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:mau:diva-75029 (URN)
10.1007/s10462-025-11186-x (DOI)
001448979900001 ()
2-s2.0-105000389895 (Scopus ID)
Available from: 2025-04-01. Created: 2025-04-01. Last updated: 2025-04-14. Bibliographically approved.
2. Specialized Indoor and Outdoor Scene-specific Object Detection Models
2024 (English). In: Sixteenth International Conference on Machine Vision (ICMV 2023) / [ed] Osten, Wolfgang, 2024. Conference paper, Published paper (Refereed).
Abstract [en]

Object detection is a critical task in computer vision with applications across various domains, ranging from autonomous driving to surveillance systems. Despite extensive research on improving the performance of object detection systems, identifying all objects in different places remains a challenge. Traditional object detection approaches focus primarily on extracting and analyzing visual features without considering contextual information about the places where objects appear. However, entities in many real-world scenarios closely relate to their surrounding environment, providing crucial contextual cues for accurate detection. This study investigates the importance and impact of the places depicted in images (indoor and outdoor) on object detection accuracy. To this end, we propose an approach that first categorizes images into two distinct categories: indoor and outdoor. We then train and evaluate three object detection models (indoor, outdoor, and general models) based on YOLOv5, using 19 classes of the PASCAL VOC dataset and 79 classes of the COCO dataset, grouped by place. The experimental evaluations show that the specialized indoor and outdoor models achieve higher mAP (mean Average Precision) when detecting objects in their specific environments compared to the general model that detects objects found both indoors and outdoors. Indeed, the network can detect objects more accurately in similar places with common characteristics due to semantic relationships between objects and their surroundings, and the network's misdetections are diminished. All results were analyzed statistically with t-tests.
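The routing structure the paper describes (classify the scene first, then dispatch to a specialized detector) can be sketched minimally as below. The scene classifier, class lists, and stand-in detectors are placeholders for illustration, not the trained YOLOv5 models from the study.

```python
# Hypothetical sketch of scene-aware routing: a frame is first labeled
# indoor or outdoor, then passed to a detector specialized for that
# environment. All components here are toy stand-ins.

def classify_scene(image):
    # Stand-in: a real system would run a trained scene classifier here.
    return "indoor" if image.get("has_ceiling") else "outdoor"

def make_detector(known_classes):
    # Stand-in detector: reports which of its known classes are present.
    def detect(image):
        return [c for c in image["objects"] if c in known_classes]
    return detect

# Illustrative place-specific class splits (not the paper's exact splits).
DETECTORS = {
    "indoor": make_detector({"chair", "sofa", "tvmonitor"}),
    "outdoor": make_detector({"car", "horse", "boat"}),
}

def detect_with_routing(image):
    scene = classify_scene(image)
    return scene, DETECTORS[scene](image)

frame = {"has_ceiling": True, "objects": ["chair", "car"]}
print(detect_with_routing(frame))  # the indoor model ignores "car"
```

The design choice this mirrors: each specialized model only has to discriminate among classes plausible for its environment, which is where the paper attributes the mAP gains over a single general model.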

Series
Proceedings of SPIE, ISSN 0277-786X, E-ISSN 1996-756X ; 13072
Keywords
object detection, YOLOv5, indoor object detection, outdoor object detection, scene classification
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:mau:diva-66441 (URN)
10.1117/12.3023479 (DOI)
001208308300024 ()
2-s2.0-85191658757 (Scopus ID)
9781510674622 (ISBN)
9781510674639 (ISBN)
Conference
International Conference on Machine Vision (ICMV 2023), Nov. 15-18, 2023, Yerevan, Armenia
Available from: 2024-03-22. Created: 2024-03-22. Last updated: 2025-04-14. Bibliographically approved.
3. Hierarchical Transfer Multi-task Learning Approach for Scene Classification
2024 (English). In: Pattern Recognition: 27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024, Proceedings, Part I, Springer, 2024, p. 231-248. Conference paper, Published paper (Refereed).
Abstract [en]

This paper presents a novel Hierarchical Transfer and Multi-task Learning (HTMTL) approach designed to substantially improve the performance of scene classification networks by leveraging the collective influence of diverse scene types. HTMTL is distinguished by its ability to capture the interaction between various scene types, recognizing how context information from one scene category can enhance the classification performance of another. Our method, when applied to the Places365 dataset, demonstrates a significant improvement in the network’s ability to accurately identify scene types. By exploiting these inter-scene interactions, HTMTL significantly enhances scene classification performance, making it a potent tool for advancing scene understanding and classification. Additionally, this study explores the contribution of individual tasks and task groupings to the performance of other tasks. To further validate the generality of HTMTL, we applied it to the Cityscapes dataset, where the results also show promise. This indicates the broad applicability and effectiveness of our approach across different datasets and scene types.
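The hierarchical labeling underlying this kind of approach can be made concrete with a tiny example: each fine-grained scene label maps up to coarser levels, so a single prediction can be evaluated at several granularities. The three-level hierarchy below is a made-up illustration, not the Places365 taxonomy used in the paper.

```python
# Minimal sketch of hierarchical scene labels. The taxonomy here is a
# hypothetical example (coarse level, mid level, fine level).

HIERARCHY = {
    "kitchen": ("indoor", "home"),
    "bedroom": ("indoor", "home"),
    "office": ("indoor", "work"),
    "beach": ("outdoor", "natural"),
    "highway": ("outdoor", "man-made"),
}

def labels_at_all_levels(fine_label):
    """Expand a fine-grained scene label into all hierarchy levels."""
    coarse, mid = HIERARCHY[fine_label]
    return {"level1": coarse, "level2": mid, "level3": fine_label}

def hierarchical_accuracy(pred, truth):
    """Fraction of hierarchy levels on which prediction and truth agree."""
    p, t = labels_at_all_levels(pred), labels_at_all_levels(truth)
    return sum(p[k] == t[k] for k in p) / len(p)

# Confusing "kitchen" with "bedroom" is still right at the two coarser
# levels (indoor, home), so it scores 2/3 rather than 0.
print(hierarchical_accuracy("kitchen", "bedroom"))
```

This is the intuition behind letting levels inform one another: related scenes share coarse labels, so signal learned for one fine category transfers to its siblings.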

Place, publisher, year, edition, pages
Springer, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 15301
Keywords
Multi-task Learning; Scene Classification; Transfer Learning
National Category
Natural Language Processing
Identifiers
urn:nbn:se:mau:diva-72852 (URN)
10.1007/978-3-031-78107-0_15 (DOI)
2-s2.0-85211958209 (Scopus ID)
978-3-031-78106-3 (ISBN)
978-3-031-78107-0 (ISBN)
Conference
27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024
Available from: 2024-12-20. Created: 2024-12-20. Last updated: 2025-04-14. Bibliographically approved.
4. Video-Audio Multimodal Fall Detection Method
2025 (English). In: PRICAI 2024: Trends in Artificial Intelligence: 21st Pacific Rim International Conference on Artificial Intelligence, PRICAI 2024, Kyoto, Japan, November 18–24, 2024, Proceedings, Part IV / [ed] Rafik Hadfi; Patricia Anthony; Alok Sharma; Takayuki Ito; Quan Bai, Springer, 2025, p. 62-75. Conference paper, Published paper (Refereed).
Abstract [en]

Falls frequently present substantial safety hazards to people who are alone, particularly the elderly. Deploying a rapid and proficient method for detecting falls is a highly effective way to address this hidden risk. The majority of existing fall detection methods rely on either visual data or wearable devices, both of which have drawbacks. This research presents a multimodal approach that integrates video and audio modalities to address the limitations of existing fall detection systems and enhance the accuracy of fall detection in challenging environmental conditions. This multimodal approach, which leverages the benefits of attention mechanisms in both video and audio streams, combines features from both modalities through feature-level fusion to detect falls in unfavorable conditions where visual systems alone are unable to do so. We assessed the performance of our multimodal fall detection model using the Le2i and UP-Fall datasets. Additionally, we compared our findings with other fall detection methods. The strong results of our multimodal model indicate its superior performance compared to single-modality fall detection models.
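Feature-level (early) fusion, as opposed to fusing decisions at the end, means the two modalities' feature vectors are concatenated before the classifier sees them. The sketch below shows only that fusion step; the feature extractors and the linear "classifier" weights are placeholders, not the attention-based networks from the paper.

```python
# Hedged sketch of feature-level fusion for fall detection: video and
# audio feature vectors are concatenated into one vector, then scored by
# a single classifier. All extractors and weights are toy stand-ins.

def extract_video_features(motion_magnitudes):
    # Stand-in video features: mean motion and its peak over a clip.
    return [sum(motion_magnitudes) / len(motion_magnitudes),
            max(motion_magnitudes)]

def extract_audio_features(loudness_samples):
    # Stand-in audio features: mean loudness and its peak (an impact).
    return [sum(loudness_samples) / len(loudness_samples),
            max(loudness_samples)]

def fuse(video_feats, audio_feats):
    # Feature-level fusion: simple concatenation of the two modalities.
    return video_feats + audio_feats

def fall_score(fused, weights=(0.2, 0.3, 0.2, 0.3)):
    # Placeholder linear classifier over the 4-dimensional fused vector.
    return sum(w * f for w, f in zip(weights, fused))

fused = fuse(extract_video_features([0.1, 0.9, 0.8]),
             extract_audio_features([0.05, 0.7, 0.1]))
print(len(fused), round(fall_score(fused), 3))
```

The point the abstract makes survives even in this toy form: when one modality is uninformative (e.g. the camera view is obstructed), the fused vector still carries the other modality's evidence to the classifier.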

Place, publisher, year, edition, pages
Springer, 2025
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 15284
Keywords
Audio classification, Fall detection, Multimodal, Video classification, Video analysis, Detection methods, Detection models, Effective approaches, Multi-modal, Multi-modal approach, Performance, Safety hazards
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:mau:diva-72628 (URN)
10.1007/978-981-96-0125-7_6 (DOI)
2-s2.0-85210317498 (Scopus ID)
978-981-96-0124-0 (ISBN)
978-981-96-0125-7 (ISBN)
Conference
21st Pacific Rim International Conference on Artificial Intelligence, PRICAI 2024, Kyoto, Japan, November 18–24, 2024
Available from: 2024-12-10. Created: 2024-12-10. Last updated: 2025-04-14. Bibliographically approved.

Open Access in DiVA

fulltext (11149 kB)
File information:
File name: FULLTEXT01.pdf
File size: 11149 kB
Checksum (SHA-512): 97b2e35cadb08fd41abaf813fb03c503c72ff79e91868bb26a015b88c008089f538ce1a60e3879861e05f69d9152752645684a8cbc6695508ee8c8cee23a1167
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text

Search in DiVA

By author/editor
Jamali, Mahtab
By organisation
Department of Computer Science and Media Technology (DVMT)
Computer graphics and computer vision
