Digitala Vetenskapliga Arkivet

Towards Privacy Preserving Micro-data Analysis: A machine learning based perspective under prevailing privacy regulations
University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. (Skövde Artificial Intelligence Lab (SAIL)). ORCID iD: 0000-0002-2564-0683
2021 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

Machine learning (ML) has been employed in a wide variety of domains where micro-data (i.e., personal data) are used in the training process. Recent research has shown that ML models are vulnerable to privacy attacks that exploit their observable predictions and optimization information to extract sensitive information about the underlying data subjects. Models trained on micro-data therefore pose a distinct threat to the privacy of the data subjects. To mitigate these risks, privacy-preserving machine learning (PPML) techniques have been proposed in the literature. Existing PPML techniques are mainly based on differential privacy or cryptography. However, using these techniques for privacy preservation results in either poor predictive accuracy of the derived ML models or a high computational cost. They also operate under the assumption that raw data are available for training the ML models.

Due to stringent requirements for data protection and data publishing, it is plausible that data controllers anonymize the micro-data before releasing them for analysis. When anonymized data are available for ML model training, it is vital to understand their impact on ML utility and privacy. In the literature on data privacy, anonymization and PPML are often studied as two disconnected fields. We argue, however, that a natural synergy exists between these two fields, one that yields a myriad of benefits for data controllers as well as data subjects in light of new privacy regulations, business requirements, and privacy risk factors. When anonymized data are used to train ML models, there is an intrinsic requirement to rethink the existing privacy-preserving mechanisms used in both data anonymization and PPML.

One of the main contributions of this thesis is an understanding of the opportunities and challenges presented by data anonymization in an ML setting. During this exploration, we highlight how certain provisions of the General Data Protection Regulation (GDPR) could be in direct conflict with the interests of ML utility and privacy. Inspired by these findings, we then propose a novel anonymization technique based on probabilistic k-anonymity that has characteristics amenable to ML utility and privacy. Next, we introduce a privacy-preserving technique for ML model selection based on integral privacy, which inhibits the inferences adversaries can draw about the training data or their transformations over time by selecting models with characteristics that increase the adversary’s uncertainty. Moreover, we provide a rigorous characterization of a well-known privacy attack targeting ML models (i.e., membership inference) and identify limitations of the existing methods, which can easily be manipulated to overstate or understate the privacy risk. Finally, we present a new membership inference attack model, based on anomaly detection over activation patterns, that overcomes these limitations while providing greater accuracy in identifying membership.

Together, we believe these contributions will broaden the understanding of the research community concerning not only the technical aspects of preserving privacy in ML but also its interplay with existing privacy regulations such as the GDPR. We hope these findings will shape our journey for knowledge discovery in the era of big data.

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2021, p. xxi, 193
Series
Dissertation Series ; 41
Keywords [en]
Data anonymization, data privacy, privacy-preserving machine learning, privacy preserving data publishing, personal data protection
National Category
Computer Sciences
Research subject
Skövde Artificial Intelligence Lab (SAIL)
Identifiers
URN: urn:nbn:se:his:diva-20714
ISBN: 978-91-984919-5-1 (print)
OAI: oai:DiVA.org:his-20714
DiVA, id: diva2:1613754
Public defence
2021-12-10, Insikten, Portalen, Kanikegränd 3, Skövde, 15:00 (English)
Opponent
Supervisors
Available from: 2021-11-23. Created: 2021-11-23. Last updated: 2021-11-23. Bibliographically approved

Open Access in DiVA

fulltext (1162 kB), 180 downloads
File information
File name: FULLTEXT01.pdf. File size: 1162 kB. Checksum (SHA-512):
2ce82fdf54bb19fd0c3eb7d95bf5a363cb762b90ca7b6f416bf7de1fafc572111c1833fa756792ef5718827896be2ea2bf2a13b33500c775236a9cd9c94fc3cf
Type: fulltext. Mimetype: application/pdf
fulltext (13728 kB), 198 downloads
File information
File name: FULLTEXT02.pdf. File size: 13728 kB. Checksum (SHA-512):
94e655ac4860a75825976e21caa824f1f51e8785a5c5abb928a06d2add87522badb3aa80d68a700c619ec365288a42d616f95acb446d02b9c7eb20ec4fccd644
Type: fulltext. Mimetype: application/pdf

Search in DiVA

By author/editor
Senavirathne, Navoda
By organisation
School of Informatics
Informatics Research Environment
Computer Sciences

Total: 378 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 1766 hits