Digitala Vetenskapliga Arkivet

Pitfalls of Assessing Extracted Hierarchies for Multi-Class Classification
Halmstad University, School of Information Technology. ORCID iD: 0000-0001-5395-5482
Halmstad University, School of Information Technology. ORCID iD: 0000-0002-7796-5201
Halmstad University, School of Information Technology. ORCID iD: 0000-0002-3495-2961
Halmstad University, School of Information Technology. ORCID iD: 0000-0003-3272-4145
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Using hierarchies of classes is one of the standard methods to solve multi-class classification problems. In the literature, selecting the right hierarchy is considered to play a key role in improving classification performance. Although different methods have been proposed, there is still a lack of understanding of what makes a hierarchy good and what makes a method to extract hierarchies perform better or worse.

To this effect, we analyze and compare some of the most popular approaches to extracting hierarchies. We identify some common pitfalls that may lead practitioners to draw misleading conclusions about their methods. To address some of these problems, we demonstrate that using random hierarchies is an appropriate benchmark to assess how the hierarchy's quality affects the classification performance.

In particular, we show how the hierarchy's quality can become irrelevant depending on the experimental setup: when using powerful enough classifiers, the final performance is not affected by the quality of the hierarchy. We also show how comparing the effect of the hierarchies against non-hierarchical approaches might incorrectly indicate their superiority.

Our results confirm that datasets with a high number of classes generally present complex structures in how these classes relate to each other. In these datasets, the right hierarchy can dramatically improve classification performance.
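
The random-hierarchy benchmark mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hierarchy is a binary tree whose leaves are class labels, and it builds each tree by shuffling the classes and splitting them recursively at a random point. The function names `random_binary_hierarchy` and `leaves` are hypothetical, and this sampling scheme (which is not uniform over tree shapes) is an assumption for illustration, not necessarily the procedure used in the manuscript.

```python
import random


def random_binary_hierarchy(classes, rng):
    """Return a random binary tree over `classes` as nested tuples.

    A leaf is a single class label; an internal node is a (left, right) pair.
    Labels must not themselves be tuples, or leaves() cannot tell them apart.
    """
    classes = list(classes)
    if len(classes) == 1:
        return classes[0]
    rng.shuffle(classes)
    # Choose a cut so that both children receive at least one class.
    cut = rng.randint(1, len(classes) - 1)
    return (
        random_binary_hierarchy(classes[:cut], rng),
        random_binary_hierarchy(classes[cut:], rng),
    )


def leaves(tree):
    """List the class labels found under a (sub)tree."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


# Fixed seed, chosen arbitrarily so the sketch is reproducible.
rng = random.Random(0)
# Five random hierarchies over six classes (labels 0..5) to serve as a baseline pool.
baseline = [random_binary_hierarchy(range(6), rng) for _ in range(5)]
```

In such a setup, a hierarchical classifier would be trained once per random tree, and the resulting spread of scores would serve as the reference against which an extracted hierarchy is compared.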

Keywords [en]
Hierarchical Multi-class Classification, Multi-class Classification, Class Hierarchies
National Category
Computer Science
Identifiers
URN: urn:nbn:se:hh:diva-48117
OAI: oai:DiVA.org:hh-48117
DiVA, id: diva2:1697953
Funder
KK-stiftelsen
Note

As manuscript in thesis

Available from: 2022-09-22 Created: 2022-09-22 Last updated: 2025-10-01 Bibliographically approved
In thesis
1. Hierarchical Methods for Self-Monitoring Systems: Theory and Application
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Self-monitoring solutions first appeared to avoid catastrophic breakdowns in safety-critical mechanisms. The design behind these solutions relied heavily on the physical knowledge of the mechanism and its fault. They usually involved installing specialized sensors to monitor the state of the mechanism and statistical modeling of the recorded data. Mainly, these solutions focused on specific components of a machine and rarely considered more than one type of fault.

In our work, on the other hand, we focus on self-monitoring of complex machines, systems composed of multiple components performing heterogeneous tasks and interacting with each other: systems with many possible faults. Today, the data available to monitor these machines is vast but usually lacks the design and specificity to monitor each possible fault in the system accurately. Some faults will show distinctive symptoms in the data; some faults will not; more interestingly, there will be groups of faults with common symptoms in the recorded data.

The thesis in this manuscript is that we can exploit the similarities between faults to train machine learning models that can significantly improve the performance of self-monitoring solutions for complex systems that overlook these similarities. We choose to encode these similarity relationships into hierarchies of faults, which we use to train hierarchical supervised models. We use both real-life problems and standard benchmarks to prove the adequacy of our approach on tasks like fault diagnosis and fault prediction.

We also demonstrate that models trained on different hierarchies result in significantly different performances. We analyze what makes a good hierarchy and what the best practices are for developing methods to extract hierarchies of classes from the data. We advance the state of the art by defining the concept of heterogeneity of decision boundaries and studying how it affects the performance of different class decompositions.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2022. p. 66
Series
Halmstad University Dissertations ; 93
National Category
Computer Science
Identifiers
URN: urn:nbn:se:hh:diva-48138
ISBN: 978-91-88749-98-7
ISBN: 978-91-88749-97-0
Public defence
2022-10-14, Wigforssalen, Hus J (Visionen), Kristian IV:s väg 3, Halmstad, 10:00 (English)
Opponent
Supervisors
Available from: 2022-09-23 Created: 2022-09-23 Last updated: 2025-10-01 Bibliographically approved

Open Access in DiVA

fulltext (622 kB), 240 downloads
File information
File name: FULLTEXT01.pdf
File size: 622 kB
Checksum (SHA-512): da96e42e9a29c34ade0ed5ac85028eb0d9c3fa686aa32b63d97b8731e72573c3f2622a46c24dfeddf9f9dad2f838ec0785f23a60aa1fd993c78d0467e56b7fba
Type: fulltext
MIME type: application/pdf

Search further in DiVA

By author/editor
Del Moral, Pablo; Nowaczyk, Sławomir; Sant'Anna, Anita; Pashami, Sepideh
Total: 240 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 204 hits