Digitala Vetenskapliga Arkivet

Data Quality Model for Machine Learning
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
2019 (English). Independent thesis, Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Context: Machine learning is a branch of artificial intelligence, and the field is growing continuously. Most internet-related services, such as social media, email spam filtering, e-commerce sites, and search engines, now use machine learning. The quality of machine learning output depends on the input data, so good-quality input data can give a better outcome for the machine learning system. To achieve quality data, a data scientist can apply a data quality model to machine learning data. A data quality model can help data scientists monitor and control the input data of machine learning. However, little research has been done on data quality attributes and data quality models for machine learning.

Objectives: The primary objectives of this paper are to find and understand the state of the art and state of practice on data quality attributes for machine learning, and to develop a data quality model for machine learning in collaboration with data scientists.

Methods: This paper mainly consists of two studies: 1) a literature review across different databases to identify literature on data quality attributes and data quality models for machine learning; 2) an in-depth interview study to better understand and verify the data quality attributes identified in the literature review, carried out in collaboration with data scientists from multiple locations. A total of 15 interviews were performed, and based on the interviewees' perspectives we proposed a data quality model.

Result: We identified 16 data quality attributes as important, based on the perspective of the experienced data scientists interviewed in this study. With these selected attributes, we proposed a data quality model with which data scientists can monitor and improve the quality of data for machine learning. The effects of these data quality attributes on machine learning are also stated.

Conclusion: This study signifies the importance of data quality, for which we proposed a data quality model for machine learning based on the industrial experience of data scientists. Addressing this research gap benefits all machine learning practitioners and data scientists who intend to identify quality data for machine learning. To show that the data quality attributes in the data quality model are important, a further experiment can be conducted, which is proposed as future work.

Place, publisher, year, edition, pages
2019. p. 107
Keywords [en]
Machine learning, Data Quality Attributes, Data Quality Model
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-18498, OAI: oai:DiVA.org:bth-18498, DiVA id: diva2:1339248
Subject / course
PA2534 Master's Thesis (120 credits) in Software Engineering
Educational program
PAAPT Master of Science Programme in Software Engineering
Presentation
(English)
Available from: 2019-07-30. Created: 2019-07-26. Last updated: 2019-07-30. Bibliographically approved

Open Access in DiVA

BTH2019RudrarajuBoyanapally (6140 kB), 5683 downloads
File information
File name: FULLTEXT02.pdf. File size: 6140 kB. Checksum (SHA-512):
1208741f2ea7336fc36d9170a811d54b8058c3c7126c9fc06cf2c2e19cd48122813f23cbdaf751fd3caba882ee2b9a1b91bc7d351388c752534dc3090561fbbc
Type: fulltext. Mimetype: application/pdf

Search in DiVA

By author/editor
Rudraraju, Nitesh Varma; Boyanapally, Varun
By organisation
Department of Software Engineering
Software Engineering

Total: 5687 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 2051 hits