Digitala Vetenskapliga Arkivet

Personalized Privacy Amplification via Importance Sampling
Fay, Dominik: Elekta and KTH Royal Institute of Technology, Stockholm, Sweden. ORCID iD: 0000-0002-5530-2714
Mair, Sebastian: Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning. Linköping University, Faculty of Science & Engineering. Uppsala University, Sweden. ORCID iD: 0000-0003-2949-8781
Sjölund, Jens: Uppsala University, Sweden. ORCID iD: 0000-0002-9099-3522
2025 (English). In: Transactions on Machine Learning Research, E-ISSN 2835-8856. Article in journal (Refereed). Published.
Abstract [en]

For scalable machine learning on large data sets, subsampling a representative subset is a common approach for efficient model training. This is often achieved through importance sampling, whereby informative data points are sampled more frequently. In this paper, we examine the privacy properties of importance sampling, focusing on an individualized privacy analysis. We find that, in importance sampling, privacy is well aligned with utility but at odds with sample size. Based on this insight, we propose two approaches for constructing sampling distributions: one that optimizes the privacy-efficiency trade-off; and one based on a utility guarantee in the form of coresets. We evaluate both approaches empirically in terms of privacy, efficiency, and accuracy on the differentially private k-means problem. We observe that both approaches yield similar outcomes and consistently outperform uniform sampling across a wide range of data sets. Our code is available on GitHub.
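
The subsampling idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not implement the paper's privacy analysis or either of its proposed sampling distributions. It only shows generic Poisson importance subsampling: each point is kept independently with its own probability and reweighted by the inverse of that probability, so weighted sums over the subsample are unbiased estimates of sums over the full data. The function names, the importance scores, and the capped-proportional rule for choosing probabilities are all assumptions made for illustration.

```python
import random


def capped_proportional_probs(scores, expected_size):
    """Hypothetical rule: inclusion probability proportional to a score, capped at 1.

    Capping can make the expected sample size smaller than `expected_size`.
    A score of zero yields probability zero, which the sampler below rejects.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [min(1.0, expected_size * s / total) for s in scores]


def importance_subsample(points, probs, rng):
    """Keep point i independently with probability probs[i] (Poisson sampling).

    Returns (point, weight) pairs with weight = 1 / probs[i].
    """
    sample = []
    for x, q in zip(points, probs):
        if not 0.0 < q <= 1.0:
            raise ValueError("each inclusion probability must lie in (0, 1]")
        if rng.random() < q:
            sample.append((x, 1.0 / q))
    return sample


def weighted_sum(sample):
    """Inverse-probability-weighted sum over a subsample."""
    return sum(w * x for x, w in sample)


if __name__ == "__main__":
    rng = random.Random(0)
    points = [1.0, 2.0, 10.0, 50.0]
    # Assumption: the magnitude of a point serves as its importance score.
    probs = capped_proportional_probs([abs(x) for x in points], expected_size=2)
    sample = importance_subsample(points, probs, rng)
    print("full sum:", sum(points))
    print("subsample:", sample)
    print("weighted estimate:", weighted_sum(sample))
```

In this sketch, points with larger scores are retained more often and carry smaller weights, while rarely sampled points carry large weights when they do appear.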

Place, publisher, year, edition, pages
2025.
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-212077
OAI: oai:DiVA.org:liu-212077
DiVA, id: diva2:1942144
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-03-04. Created: 2025-03-04. Last updated: 2025-03-04.

Open Access in DiVA

fulltext (3649 kB), 39 downloads
File information
File name: FULLTEXT01.pdf
File size: 3649 kB
Checksum (SHA-512): 48b5a0bf72e9a236e2a9bf22b872b341e752d0ffa7465e2b6209bb444458feaa11fa0bdb1b68786c2675227f8695f9aad84d57fc3ec53909e1e8384d8b3e7092
Type: fulltext
Mimetype: application/pdf

Other links

https://openreview.net/forum?id=IK2cR89z45

Search in DiVA

By author/editor
Fay, Dominik; Mair, Sebastian; Sjölund, Jens