Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Identifiera känslig data inom ramen för GDPR: Med K-Nearest Neighbors
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
2018 (Swedish)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
Abstract [en]

General Data Protection Regulation, GDPR, is a regulation coming into effect on May 25th 2018. Due to this, organizations face large decisions concerning how sensitive data, stored in databases, are to be identified. Meanwhile, there is an expansion of machine learning on the software market. The goal of this project has been to develop a tool which, through machine learning, can identify sensitive data. The development of this tool has been accomplished through the use of agile methods and has included comparisions of various algorithms and the development of a prototype. This by using tools such as Spyder and XAMPP. The results show that different types of sensitive data give variating results in the developed software solution. The kNN algorithm showed strong results in such cases when the sensitive data concerned Swedish Social Security numbers of 10 digits, and phone numbers in the length of ten or eleven digits, either starting with 46-, 070, 072 or 076 and also addresses. Regular expression showed strong results concerning e-mails and IP-addresses.

Abstract [sv]

General Data Protection Regulation, GDPR, är en reglering som träder i kraft 25 maj 2018. I och med detta ställs organisationer inför stora beslut kring hur de ska finna känsliga data som är lagrad i databaser. Samtidigt expanderar maskininlärning på mjukvarumarknaden. Målet för detta projekt har varit att ta fram ett verktyg som med hjälp av maskininlärning kan identifiera känsliga data. Utvecklingen av detta verktyg har skett med hjälp av agila metoder och har innefattat jämförelser av olika algoritmer och en framtagning av en prototyp. Detta med hjälp av verktyg såsom Spyder och XAMPP. Resultatet visar på att olika typer av känsliga data ger olika starka resultat i den utvecklade programvaran. kNN-algoritmen visade starka resultat i de fall då den känsliga datan rörde svenska, tiosiffriga personnummer samt telefonnummer i tio- eller elva-siffrigt format, och antingen inleds med 46, 070, 072 eller 076 samt då den rörde adresser. Regular expression visade på starka resultat när det gällde e- mails och IP-adresser.

Place, publisher, year, edition, pages
2018. , p. 42
Keywords [en]
GDPR, Machine Learning, Regular Expression, kNN, Python
Keywords [sv]
GDPR, Maskininlärning, Regular Expression, kNN, Python
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:miun:diva-34070Local ID: DT-V18-G3-003OAI: oai:DiVA.org:miun-34070DiVA, id: diva2:1229829
Subject / course
Computer Engineering DT1
Educational program
Computer Science TDATG 180 higher education credits
Supervisors
Examiners
Available from: 2018-07-02 Created: 2018-07-02 Last updated: 2018-07-02Bibliographically approved

Open Access in DiVA

fulltext(767 kB)12 downloads
File information
File name FULLTEXT01.pdfFile size 767 kBChecksum SHA-512
c5e8045ff3c97731369a9f49eb950f4de86a59f300b83778d2fec9859e303e0c6fd70a89cf981d1044de71bf64354eb52059da2bc2bcfaa31f6a12157ba0c553
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Darborg, Alex
By organisation
Department of Information Systems and Technology
Software Engineering

Search outside of DiVA

GoogleGoogle Scholar
Total: 12 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 49 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf