Digitala Vetenskapliga Arkivet

Robust Visual Learning across Class Imbalance and Distributional Shift
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science and Engineering. ORCID iD: 0000-0001-9874-737X
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Computer vision aims to equip machines with perceptual understanding—detecting, recognizing, localizing, and relating visual entities to existing sources of knowledge. Machine learning provides the mechanism: models learn representations and decision rules from data and are expected to generalize beyond the training distribution. These systems already support biodiversity monitoring, autonomous driving, and geospatial mapping. In practice, however, textbook assumptions break down: the concept space is vast, data is sparse and imbalanced, many categories are rare, and high-quality annotations are costly. In addition, deployment conditions shift over time—class frequencies and visual domains evolve—biasing models toward frequent scenarios and eroding reliability.

In this work, we develop methods for training reliable visual recognition models under more realistic conditions: class imbalance, limited labeled data, and distribution shift. Our contributions span three themes: (1) debiasing strategies for imbalanced classification that remain reliable under changes in class priors; (2) semi-supervised learning techniques tailored to imbalanced data to reduce annotation cost while preserving minority-class performance; and (3) a unified multimodal retrieval approach for remote sensing (RS) that narrows the domain gap.

In Paper A, we study long-tailed image recognition, where skewed training data biases classifiers toward frequent classes. During deployment, changes in class priors can further amplify this bias. We propose an ensemble of skill-diverse experts, each trained under a distinct target prior, and aggregate their predictions to balance head and tail performance. We theoretically show that the ensemble’s prior bias equals the mean expert bias and that choosing complementary target priors cancels it, yielding an unbiased predictor that minimizes balanced error. With calibrated experts—achieved in practice via Mixup—the ensemble attains state-of-the-art accuracy and remains reliable under label shift.
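The cancellation argument above can be illustrated with a small numerical sketch. It assumes an idealized setting that is not spelled out in the abstract: each expert outputs prior-free class scores plus a residual bias of `(1 - tau) * log(train_prior)`, where `tau` is a hypothetical per-expert parameter selecting its target prior, and the ensemble averages the experts' logits. All numbers are made up for illustration.

```python
import math

def expert_logits(class_scores, train_prior, tau):
    """Idealized expert: prior-free scores plus a residual bias (1 - tau) * log(prior)."""
    return [s + (1.0 - tau) * math.log(p) for s, p in zip(class_scores, train_prior)]

def ensemble_logits(class_scores, train_prior, taus):
    """Average the experts' logits (a product of experts in probability space)."""
    per_expert = [expert_logits(class_scores, train_prior, t) for t in taus]
    return [sum(col) / len(taus) for col in zip(*per_expert)]

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

train_prior = [0.7, 0.2, 0.1]    # long-tailed training prior (illustrative)
class_scores = [0.0, 0.1, 0.3]   # hypothetical prior-free scores; class 2 scores highest
taus = [0.0, 1.0, 2.0]           # complementary choices whose mean is 1

head_biased = expert_logits(class_scores, train_prior, 0.0)
combined = ensemble_logits(class_scores, train_prior, taus)
```

In this toy setting the expert with `tau = 0` favours the head class, while the averaged logits recover the prior-free scores because the mean residual bias is zero.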

In Paper B, we investigate long-tailed recognition in the semi-supervised setting, where a small, imbalanced labeled set is paired with a large unlabeled pool. Semi-supervised learning leverages unlabeled data to reduce annotation costs, typically through pseudo-labeling, but the unlabeled class distribution is often unknown and skewed. Naïve pseudo-labeling propagates the labeled bias, reinforcing head classes and overlooking rare ones. We propose a flexible distribution-alignment framework that estimates the unlabeled class mix online and reweights pseudo-labels accordingly, guiding the model first toward the unlabeled distribution to stabilize training and then toward a balanced classifier for fair inference. The proposed approach leverages unlabeled data more effectively, improving accuracy, calibration, and robustness to unknown unlabeled priors.
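A minimal sketch of the two ingredients described above, under stated assumptions: the unlabeled class mix is tracked with an exponential moving average of the model's mean predictions, and the target prior is annealed from that estimate toward uniform as training progresses. The function names, the `momentum` value, and the tempering schedule are illustrative choices, not details taken from the thesis.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_prior_estimate(prior_est, batch_logits, momentum=0.9):
    """EMA of the mean predicted class probabilities on an unlabeled batch."""
    probs = [softmax(row) for row in batch_logits]
    batch_mean = [sum(col) / len(probs) for col in zip(*probs)]
    mixed = [momentum * p + (1.0 - momentum) * b for p, b in zip(prior_est, batch_mean)]
    total = sum(mixed)
    return [m / total for m in mixed]

def target_prior(prior_est, progress):
    """Anneal from the estimated prior (progress=0) to uniform (progress=1)."""
    tempered = [p ** (1.0 - progress) for p in prior_est]
    total = sum(tempered)
    return [t / total for t in tempered]

# Illustrative usage with made-up logits for a 3-class problem.
prior = [1.0 / 3] * 3
batch = [[2.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
prior = update_prior_estimate(prior, batch)
early_target = target_prior(prior, 0.0)
final_target = target_prior(prior, 1.0)
```

Early in training the target follows the estimated unlabeled distribution; at the end it is uniform, which corresponds to aiming for a balanced classifier.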

In Paper C, we move beyond recognition to unified multimodal retrieval for remote sensing—a domain with scarce image–text annotations and a challenging shift from natural images. Prior solutions are fragmented: RS dual encoders lack interleaved input support; universal embedders miss spatial metadata and degrade under domain shift; and RS generative assistants reason over regions but lack scalable retrieval. To overcome these limitations, we introduce VLM2GeoVec, a single-encoder, instruction-following embedder that aligns images, text, regions, and geocoordinates in a shared space. For comprehensive evaluation, we also propose RSMEB, a unified retrieval benchmark that spans conventional tasks (e.g., classification, cross-modal retrieval) and novel interleaved tasks (e.g., visual grounding, spatial localization, semantic geo-localization). In RSMEB, VLM2GeoVec narrows the domain gap relative to universal embedders and matches specialized baselines in conventional tasks in zero-shot settings. It further enables interleaved spatially-aware search, delivering several-fold gains in metadata-aware RS applications.
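Retrieval in a shared embedding space generally reduces to nearest-neighbour search over vectors. The sketch below shows only that final ranking step with cosine similarity; the embeddings are hand-written placeholders, and nothing here reproduces the VLM2GeoVec encoder or the RSMEB tasks.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, candidates, top_k=2):
    """Return the names of the top_k candidates most similar to the query."""
    ranked = sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)
    return ranked[:top_k]

# Placeholder embeddings; a real system would obtain these from an encoder.
candidates = {
    "harbor_tile": [0.9, 0.1, 0.0],
    "forest_tile": [0.1, 0.9, 0.1],
    "airport_tile": [0.2, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # e.g. the embedding of a text or region query
top = retrieve(query, candidates)
```

Because every modality is mapped to the same space, the same ranking routine can serve text, image, region, or coordinate queries.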

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2025, p. 67
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2487
National Category
Computer Vision and Learning Systems
Identifiers
URN: urn:nbn:se:liu:diva-219564
DOI: 10.3384/9789181183085
ISBN: 9789181183078 (print)
ISBN: 9789181183085 (digital)
OAI: oai:DiVA.org:liu-219564
DiVA, id: diva2:2014470
Public defence
2025-12-17, Zero, Zenit Building, Campus Valla, Linköping, 09:15 (English)
Opponent
Supervisors
Note

Funding agency: The Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation

Available from: 2025-11-18 Created: 2025-11-18 Last updated: 2026-05-22. Bibliographically approved
List of papers
1. Balanced Product of Calibrated Experts for Long-Tailed Recognition
2023 (English). In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Soc, 2023, p. 19967-19977. Conference paper, Published paper (Refereed)
Abstract [en]

Many real-world recognition problems are characterized by long-tailed label distributions. These distributions make representation learning highly challenging due to limited generalization over the tail classes. If the test distribution differs from the training distribution, e.g. uniform versus long-tailed, the problem of the distribution shift needs to be addressed. A recent line of work proposes learning multiple diverse experts to tackle this issue. Ensemble diversity is encouraged by various techniques, e.g. by specializing different experts in the head and the tail classes. In this work, we take an analytical approach and extend the notion of logit adjustment to ensembles to form a Balanced Product of Experts (BalPoE). BalPoE combines a family of experts with different test-time target distributions, generalizing several previous approaches. We show how to properly define these distributions and combine the experts in order to achieve unbiased predictions, by proving that the ensemble is Fisher-consistent for minimizing the balanced error. Our theoretical analysis shows that our balanced ensemble requires calibrated experts, which we achieve in practice using mixup. We conduct extensive experiments and our method obtains new state-of-the-art results on three long-tailed datasets: CIFAR-100-LT, ImageNet-LT, and iNaturalist-2018. Our code is available at https://github.com/emasa/BalPoE-CalibratedLT.
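The balanced error mentioned above is commonly defined as the mean of the per-class error rates, so that each class counts equally regardless of its frequency. The sketch below uses that standard definition on made-up labels; it is an illustration of the metric, not code from the BalPoE repository.

```python
def balanced_error(y_true, y_pred):
    """Mean of per-class error rates; every class has equal weight."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        per_class.append(wrong / len(idx))
    return sum(per_class) / len(per_class)

def plain_error(y_true, y_pred):
    """Fraction of misclassified samples."""
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)

# A classifier that always predicts the head class on an 8:2 imbalanced test set.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
err = plain_error(y_true, y_pred)
bal_err = balanced_error(y_true, y_pred)
```

On this toy example the plain error looks modest while the balanced error exposes that the tail class is never recognised.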

Place, publisher, year, edition, pages
IEEE Computer Soc, 2023
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919, E-ISSN 2575-7075
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-199347 (URN)
10.1109/CVPR52729.2023.01912 (DOI)
001062531304028 ()
9798350301298 (ISBN)
9798350301304 (ISBN)
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, June 17-24, 2023
Note

Funding agencies: Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725]; Knut and Alice Wallenberg Foundation at the National Supercomputer Centre

Available from: 2023-11-28 Created: 2023-11-28 Last updated: 2025-11-18
2. Flexible Distribution Alignment: Towards Long-Tailed Semi-supervised Learning with Proper Calibration
2024 (English). In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LIV / [ed] Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol, Springer Nature Switzerland, 2024, Vol. 15112, p. 307-327. Conference paper, Published paper (Refereed)
Abstract [en]

Long-tailed semi-supervised learning (LTSSL) represents a practical scenario for semi-supervised applications, challenged by skewed labeled distributions that bias classifiers. This problem is often aggravated by discrepancies between labeled and unlabeled class distributions, leading to biased pseudo-labels, neglect of rare classes, and poorly calibrated probabilities. To address these issues, we introduce Flexible Distribution Alignment (FlexDA), a novel adaptive logit-adjusted loss framework designed to dynamically estimate and align predictions with the actual distribution of unlabeled data and achieve a balanced classifier by the end of training. FlexDA is further enhanced by a distillation-based consistency loss, promoting fair data usage across classes and effectively leveraging underconfident samples. This method, encapsulated in ADELLO (Align and Distill Everything All at Once), proves robust against label shift, significantly improves model calibration in LTSSL contexts, and surpasses previous state-of-the-art approaches across multiple benchmarks, including CIFAR100-LT, STL10-LT, and ImageNet127, addressing class imbalance challenges in semi-supervised learning. Our code is available at https://github.com/emasa/ADELLO-LTSSL.
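For readers unfamiliar with logit-adjusted losses, the following sketch shows the generic form: a cross-entropy computed on logits shifted by the log-ratio of a source prior to a target prior. The names `source_prior` and `target_prior` and the example values are assumptions made for illustration; the exact loss used by FlexDA may differ and is defined in the paper and repository.

```python
import math

def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def logit_adjusted_ce(logits, label, source_prior, target_prior):
    """Cross-entropy on logits shifted by log(source_prior / target_prior)."""
    shifted = [z + math.log(s / t) for z, s, t in zip(logits, source_prior, target_prior)]
    return -log_softmax(shifted)[label]

# Illustrative values: long-tailed source prior, uniform (balanced) target prior.
source_prior = [0.7, 0.2, 0.1]
uniform = [1.0 / 3] * 3
logits = [0.0, 0.0, 0.0]

plain_loss = logit_adjusted_ce(logits, 2, uniform, uniform)
adjusted_loss = logit_adjusted_ce(logits, 2, source_prior, uniform)
```

With identical priors the shift vanishes and the loss is ordinary cross-entropy; with a long-tailed source prior and a uniform target, errors on the tail class are penalised more heavily, which pushes the raw logits toward a balanced classifier.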

Place, publisher, year, edition, pages
Springer Nature Switzerland, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 15112
National Category
Computer Systems
Identifiers
urn:nbn:se:liu:diva-209223 (URN)
10.1007/978-3-031-72949-2_18 (DOI)
001352860600018 ()
2-s2.0-85208545165 (Scopus ID)
9783031729485 (ISBN)
9783031729492 (ISBN)
Conference
18th European Conference, Milan, Italy, September 29–October 4, 2024
Note

Funding agencies: Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation; Swedish Research Council [2022-06725]; Knut and Alice Wallenberg Foundation at the National Supercomputer Centre

Available from: 2024-11-06 Created: 2024-11-06 Last updated: 2025-11-18

Open Access in DiVA

fulltext (10062 kB), 1057 downloads
File information
File name: FULLTEXT01.pdf; File size: 10062 kB; Checksum: SHA-512
57dcc38f718de7a91efc0cd845e595a2067a0bc445efa1be14fc57c3f1f2e40f23ad60e81b87fa9dfc74f57d6e5e7f006a68f177b9a96d389dad0682a0fb09bc
Type: fulltext; Mimetype: application/pdf

Author/editor
Sánchez Aimar, Emanuel