Digitala Vetenskapliga Arkivet

Why Is Multiclass Classification Hard?
Halmstad University, School of Information Technology. ORCID iD: 0000-0001-5395-5482
Halmstad University, School of Information Technology. ORCID iD: 0000-0002-7796-5201
Halmstad University, School of Information Technology. ORCID iD: 0000-0003-3272-4145
2022 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 10, pp. 80448-80462. Article in journal (Refereed). Published
Abstract [en]

In classification problems, as the number of classes increases, correctly classifying a new instance into one of them is assumed to be more challenging than making the same decision in the presence of fewer classes. The essence of the problem is that using the learning algorithm on each decision boundary individually is better than using the same learning algorithm on several of them simultaneously. However, why and when this happens is still not well understood. This work’s main contribution is to introduce the concept of heterogeneity of decision boundaries as an explanation of this phenomenon. Based on the definition of heterogeneity of decision boundaries, we analyze and explain the differences in the performance of state-of-the-art approaches to solving multi-class classification. We demonstrate that as the heterogeneity increases, the performance of all approaches, except one-vs-one, decreases. We show that by correctly encoding the knowledge of the heterogeneity of decision boundaries in a decomposition of the multi-class problem, we can obtain better results than state-of-the-art decompositions. The benefits can be an increase in classification performance or a decrease in the time it takes to train and evaluate the models. We first provide intuitions and illustrate the effects of the heterogeneity of decision boundaries using synthetic datasets and a simplistic classifier. Then, we demonstrate how a real dataset exhibits these same principles, also under realistic learning algorithms. In this setting, we devise a method to quantify the heterogeneity of different decision boundaries, and use it to decompose the multi-class problem. The results show significant improvements over state-of-the-art decompositions that do not take the heterogeneity of decision boundaries into account. © 2013 IEEE.
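
As a rough illustration of the one-vs-one decomposition mentioned in the abstract, the sketch below trains one binary model per pair of classes and combines them by voting. It is not the authors' implementation: the 1-D midpoint-threshold classifier, the function names, and the toy data are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def fit_pairwise_threshold(xs_a, xs_b):
    """Fit a toy 1-D classifier: the midpoint between the two class means.

    Returns (threshold, a_is_low), where a_is_low says whether class a
    lies below the threshold. This simplistic model is an assumption.
    """
    mu_a, mu_b = mean(xs_a), mean(xs_b)
    return (mu_a + mu_b) / 2.0, mu_a < mu_b


def fit_one_vs_one(data):
    """data maps class label -> list of 1-D samples.

    One binary model is fitted per unordered pair of classes,
    so K classes give K * (K - 1) / 2 models.
    """
    return {
        (a, b): fit_pairwise_threshold(data[a], data[b])
        for a, b in combinations(sorted(data), 2)
    }


def predict_one_vs_one(models, x):
    """Each pairwise model casts one vote; the most voted class wins."""
    votes = Counter()
    for (a, b), (threshold, a_is_low) in models.items():
        below = x < threshold
        votes[a if below == a_is_low else b] += 1
    return votes.most_common(1)[0][0]


# Hypothetical toy data: classes A and B are close together, C is far away,
# so the A/B boundary is harder to learn than the boundaries involving C.
toy_data = {
    "A": [0.0, 0.2, 0.4],
    "B": [1.0, 1.2, 1.4],
    "C": [10.0, 10.5, 11.0],
}
toy_models = fit_one_vs_one(toy_data)
```

Because every pairwise model sees only two classes, each decision boundary is learned on its own, which is the property the abstract attributes to one-vs-one.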

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2022. Vol. 10, pp. 80448-80462
Keywords [en]
Classification complexity, heterogeneity of decision boundaries, multi-class classification
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:hh:diva-48116; DOI: 10.1109/access.2022.3192514; ISI: 000838670500001; Scopus ID: 2-s2.0-85135735284; OAI: oai:DiVA.org:hh-48116; DiVA, id: diva2:1697940
Funder
KK-stiftelsen
Available from: 2022-09-22 Created: 2022-09-22 Last updated: 2025-10-01 Bibliographically approved
In thesis
1. Hierarchical Methods for Self-Monitoring Systems: Theory and Application
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Self-monitoring solutions first appeared to avoid catastrophic breakdowns in safety-critical mechanisms. The design behind these solutions relied heavily on the physical knowledge of the mechanism and its fault. They usually involved installing specialized sensors to monitor the state of the mechanism and statistical modeling of the recorded data. Mainly, these solutions focused on specific components of a machine and rarely considered more than one type of fault.

In our work, on the other hand, we focus on self-monitoring of complex machines, systems composed of multiple components performing heterogeneous tasks and interacting with each other: systems with many possible faults. Today, the data available to monitor these machines is vast but usually lacks the design and specificity to monitor each possible fault in the system accurately. Some faults will show distinctive symptoms in the data; some faults will not; more interestingly, there will be groups of faults with common symptoms in the recorded data.

The thesis of this manuscript is that we can exploit the similarities between faults to train machine learning models that significantly outperform self-monitoring solutions for complex systems that overlook these similarities. We choose to encode these similarity relationships into hierarchies of faults, which we use to train hierarchical supervised models. We use both real-life problems and standard benchmarks to demonstrate the adequacy of our approach on tasks like fault diagnosis and fault prediction.

We also demonstrate that models trained on different hierarchies result in significantly different performances. We analyze what makes a good hierarchy and identify best practices for developing methods that extract hierarchies of classes from the data. We advance the state of the art by defining the concept of heterogeneity of decision boundaries and studying how it affects the performance of different class decompositions.
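
To make the idea of a hierarchical supervised model over faults more concrete, the following sketch first assigns a sample to a group of similar faults and then picks a fault within that group. It is only an assumed illustration, not the method of the thesis: the nearest-centroid classifier, the fault names, the grouping, and the 1-D data are all hypothetical.

```python
from statistics import mean


def nearest(centroids, x):
    """Return the label whose centroid is closest to the 1-D sample x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def fit_hierarchy(data, hierarchy):
    """Fit a two-level nearest-centroid model.

    data:      fault label -> list of 1-D samples
    hierarchy: group name  -> list of fault labels in that group
    """
    # Top level: one centroid per group, pooled over all faults in the group.
    group_centroids = {
        group: mean([x for fault in faults for x in data[fault]])
        for group, faults in hierarchy.items()
    }
    # Bottom level: inside each group, one centroid per fault.
    fault_centroids = {
        group: {fault: mean(data[fault]) for fault in faults}
        for group, faults in hierarchy.items()
    }
    return group_centroids, fault_centroids


def predict_hierarchy(model, x):
    """Choose the group first, then the fault within the chosen group."""
    group_centroids, fault_centroids = model
    group = nearest(group_centroids, x)
    return group, nearest(fault_centroids[group], x)


# Hypothetical faults: the two bearing faults show similar symptoms,
# so they share a group; the sensor fault is clearly different.
fault_data = {
    "bearing_wear": [0.0, 0.2],
    "bearing_crack": [1.0, 1.2],
    "sensor_drift": [10.0, 10.4],
}
fault_hierarchy = {
    "mechanical": ["bearing_wear", "bearing_crack"],
    "electrical": ["sensor_drift"],
}
fault_model = fit_hierarchy(fault_data, fault_hierarchy)
```

A different grouping of the same faults would produce different top-level boundaries, which is one way to see why the choice of hierarchy can change performance.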

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2022. p. 66
Series
Halmstad University Dissertations ; 93
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-48138 (URN); 978-91-88749-98-7 (ISBN); 978-91-88749-97-0 (ISBN)
Public defence
2022-10-14, Wigforssalen, Hus J (Visionen), Kristian IV:s väg 3, Halmstad, 10:00 (English)
Opponent
Supervisors
Available from: 2022-09-23 Created: 2022-09-23 Last updated: 2025-10-01 Bibliographically approved

Open Access in DiVA

fulltext (1026 kB), 2246 downloads
File information
File name: FULLTEXT01.pdf; File size: 1026 kB; Checksum: SHA-512
77b65f2641643275b019c076f72dab96249dd141f753281afe46372be69ed088b1fa448f9028036d5bf874e75655baff0aacd2780baf91206a5c272b4a1d4f7c
Type: fulltext; Mimetype: application/pdf

Other links

Publisher's full text; Scopus

Search further in DiVA

By author/editor
Del Moral, Pablo; Nowaczyk, Sławomir; Pashami, Sepideh
Total: 2247 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 431 hits