Digitala Vetenskapliga Arkivet

Enhancing machine learning performance through intelligent data quality assessment: An unsupervised data-centric framework
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). ORCID iD: 0009-0006-7733-8298
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). ORCID iD: 0000-0001-9051-7609
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Engineering and Chemical Sciences (from 2013).
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Engineering and Chemical Sciences (from 2013). ORCID iD: 0000-0002-7123-2066
2025 (English). In: Heliyon, E-ISSN 2405-8440, Vol. 11, no 4, article id e42777. Article in journal (Refereed). Published.
Abstract [en]

Poor data quality limits the benefits of Machine Learning (ML) and weakens otherwise high-performing ML software systems. As data grow in volume and complexity, they become more prone to quality problems. Consequently, tedious and time-consuming work goes into preparing and improving data before moving further in the ML pipeline. To address this challenge, we propose an intelligent data-centric evaluation framework that identifies high-quality data and improves the performance of an ML system. The proposed framework combines curated quality measurements with unsupervised learning to distinguish high-quality from low-quality data. It is designed to integrate flexible, general-purpose methods so that it can be deployed across domains and applications. To validate the framework, we implemented it in a real-world use case from analytical chemistry and tested it on three datasets of antisense oligonucleotides. A domain expert was consulted to identify the relevant quality measurements and to evaluate the framework's outcomes. The results show that the quality-centric data evaluation framework identifies the characteristics of high-quality data, which guide efficient laboratory experiments and consequently improve the performance of the ML system.
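The general idea of combining quality measurements with unsupervised learning can be illustrated with a minimal sketch. This is not the authors' implementation: the two quality measurements (`completeness` and `signal_quality`, both scaled to [0, 1]), the sample values, and the use of plain k-means with two clusters are all assumptions made for illustration. The abstract does not say which measurements or which clustering method the framework uses.

```python
import random
from statistics import mean


def kmeans(points, k=2, iters=50, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids


def split_by_quality(samples):
    """Cluster samples into two groups and name the better-scoring one 'high'."""
    labels, centroids = kmeans(samples, k=2)
    # Assumption: higher values mean better quality on every measurement,
    # so the cluster whose centroid has the larger mean is the high-quality one.
    high = max(range(2), key=lambda c: mean(centroids[c]))
    return ["high" if lab == high else "low" for lab in labels]


# Hypothetical [completeness, signal_quality] scores, one row per sample.
samples = [
    [0.95, 0.90], [0.92, 0.88], [0.97, 0.93],
    [0.40, 0.35], [0.35, 0.30], [0.45, 0.25],
]
quality = split_by_quality(samples)
```

In a real application the measurements would be chosen with a domain expert, as the abstract describes, and the cluster labelled "high" would be checked against expert judgment rather than inferred from the centroid alone.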

Place, publisher, year, edition, pages
Elsevier, 2025. Vol. 11, no 4, article id e42777
Keywords [en]
Automated data evaluation, Data quality, Data-centric clustering, Machine learning, Unsupervised learning
National Category
Computer Sciences; Computer Systems
Research subject
Computer Science; Chemistry
Identifiers
URN: urn:nbn:se:kau:diva-104062
DOI: 10.1016/j.heliyon.2025.e42777
Scopus ID: 2-s2.0-85218987614
OAI: oai:DiVA.org:kau-104062
DiVA, id: diva2:1954655
Funder
Knowledge Foundation, 20210021
Available from: 2025-04-25 Created: 2025-04-25 Last updated: 2025-04-25 Bibliographically approved

Open Access in DiVA

fulltext (11056 kB), 20 downloads
File information
File name: FULLTEXT01.pdf
File size: 11056 kB
Checksum (SHA-512): b0831018bdf8c50b41485cbfd823471cccae3924f3d24ad861a8916d093f0e2365d4857e29f5b490fa90417d1e74de90811320096322ab1be674868e2147ed93
Type: fulltext
Mimetype: application/pdf

By author/editor
Rahal, Manal; Ahmed, Bestoun S.; Fornstedt, Torgny; Samuelsson, Jörgen