Digitala Vetenskapliga Arkivet

Federated Word2Vec: Leveraging Federated Learning to Encourage Collaborative Representation Learning
KTH.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-0223-8907
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0003-4516-7317
RISE Research Institutes of Sweden.
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Large-scale contextual representation models have significantly advanced NLP in recent years, capturing the semantics of text to a degree never seen before. However, they need to process large amounts of data to achieve high-quality results. Joining and accessing all this data from multiple sources can be extremely challenging for privacy and regulatory reasons. Federated Learning can overcome these limitations by training models in a distributed fashion, taking advantage of the hardware of the devices that generate the data. We show the viability of training NLP models, specifically Word2Vec, with the Federated Learning protocol. In particular, we focus on a scenario in which a small number of organizations each hold a relatively large corpus. The results show that neither the quality of the results nor the convergence time of Federated Word2Vec deteriorates compared to centralised Word2Vec.
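The setup the abstract describes (each organization trains Word2Vec locally on its private corpus, and only model parameters are aggregated centrally) can be illustrated with a minimal FedAvg-style loop over embedding matrices. The sketch below is an assumption-laden illustration, not the paper's implementation: the function names (`local_train`, `federated_average`), the shared-vocabulary requirement, and all shapes and hyperparameters are hypothetical, and the paper's exact aggregation scheme may differ.

```python
import numpy as np

# Hypothetical illustration of federated training of Word2Vec embeddings.
# Each organization trains locally on its private corpus; a central server
# averages the resulting embedding matrices, weighted by corpus size.
# All names and shapes here are assumptions for illustration only.

VOCAB_SIZE = 10_000   # shared vocabulary across organizations (assumed)
EMBED_DIM = 100       # embedding dimensionality (assumed)


def local_train(weights: np.ndarray, corpus: list[str]) -> np.ndarray:
    """Placeholder for one round of local skip-gram/CBOW training.

    A real client would run Word2Vec SGD updates on its private corpus,
    starting from the global weights it received. Here we return a small
    random perturbation so the example runs end to end.
    """
    rng = np.random.default_rng(len(corpus))
    return weights + 0.01 * rng.standard_normal(weights.shape)


def federated_average(client_weights: list[np.ndarray],
                      client_sizes: list[int]) -> np.ndarray:
    """FedAvg-style aggregation: weight each client by its data share."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))


# Server initializes the global embedding matrix.
global_weights = np.random.default_rng(0).standard_normal((VOCAB_SIZE, EMBED_DIM))

# A small number of organizations, each holding a relatively large corpus.
corpora = [["..."] * 50_000, ["..."] * 80_000, ["..."] * 30_000]

for round_ in range(10):  # communication rounds
    # Each organization trains locally; raw text never leaves its owner.
    updates = [local_train(global_weights, corpus) for corpus in corpora]
    # The server aggregates only model parameters, never the data itself.
    global_weights = federated_average(updates, [len(c) for c in corpora])
```

The key privacy property sketched here is that only the embedding matrices cross organizational boundaries, which is what makes the scheme viable when the underlying corpora cannot be pooled for regulatory reasons.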

Keywords [en]
federated learning, nlp
National Category
Language Technology (Computational Linguistics); Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-294115
DOI: 10.5281/zenodo.4704840
OAI: oai:DiVA.org:kth-294115
DiVA, id: diva2:1553472
Funder
EU, Horizon 2020, 813162
Note

QC 20210512

Available from: 2021-05-10 Created: 2021-05-10 Last updated: 2022-06-25 Bibliographically approved

Open Access in DiVA

fulltext (307 kB), 181 downloads
File information
File name: FULLTEXT01.pdf
File size: 307 kB
Checksum: SHA-512
1ad023d4516fc4ff0c837957c5be8c75b1755ce1429786050e71e494a5dc34b9352ff47e2992dc6da52c3b15d5f2e7f7b156c420446902269120e5c1cd2f4f31
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text: https://arxiv.org/abs/2105.00831

Search in DiVA

By author/editor
Garcia Bernal, Daniel; Giaretta, Lodovico; Girdzijauskas, Sarunas
By organisation
KTH; Software and Computer systems, SCS
Language Technology (Computational Linguistics); Computer Sciences

