Digitala Vetenskapliga Arkivet

1 - 15 of 15
  • 1.
    Giaretta, Lodovico
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Towards Decentralized Graph Learning, 2023. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Current Machine Learning (ML) approaches typically adopt either a centralized or a federated architecture. However, these architectures cannot easily keep up with some of the challenges introduced by recent trends, such as the growth in the number of IoT devices, increasing awareness about the privacy and security implications of extensive data collection, and the rise of graph-structured data and Graph Representation Learning. Systems based on either direct data collection or Federated Learning contain centralized, privileged systems that may act as scalability bottlenecks and dangerous single points of failure, while requiring users to trust the privacy protections and security practices in place. The combination of these issues ultimately leads to data waste, as opportunities to extract insights from available data are missed and thus the full societal benefits of advanced data analytics and ML are not realized.

    In this thesis, we argue for a paradigm shift towards a completely decentralized and trustless architecture for privacy-aware Graph Representation Learning, which employs Gossip Learning and other gossip-based peer-to-peer techniques to achieve high levels of scalability and resilience while reducing the risk of privacy leaks. We then identify and pursue three key research directions necessary to achieve our vision: lifting unrealistic assumptions on Gossip Learning, identifying and developing specific use cases that are enabled or improved by gossip-based decentralization, and overcoming the obstacles to the deployment of decentralized training and inference for Graph Representation Learning models.

    Based on these key directions, our contributions are as follows. First, we analyze the robustness of Gossip Learning when several unrealistic but often assumed conditions are lifted. Then, we exploit Gossip Learning and, more generally, gossip-based peer-to-peer protocols across three use cases: the collaborative training of differentially-private Naive Bayes classifiers across organizations holding sensitive user data; the construction of decentralized, privacy-preserving data marketplaces; and the development and decentralization of early-stage IoT botnet detection systems based on Graph Representation Learning. Finally, we introduce a general framework for the fully-decentralized training of Graph Neural Networks, overcoming the typical requirement of these models to access non-local information during training and inference.

    The combination of these contributions removes major roadblocks towards decentralized graph learning, and also opens a new research direction aimed at further developing and optimizing the fully-decentralized training of Graph Representation Learning models.

  • 2.
    Giaretta, Lodovico
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Marchioro, Thomas
    Foundation for Research and Technology Hellas, Nikolaou Plastira 100, 70013 Heraklion, Greece.
    Markatos, Evangelos
    Foundation for Research and Technology Hellas, Nikolaou Plastira 100, 70013 Heraklion, Greece.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Towards a Realistic Decentralized Naive Bayes with Differential Privacy, 2023. In: E-Business and Telecommunications - 19th International Conference, ICSBT 2022, and 19th International Conference, SECRYPT 2022, Revised Selected Papers, Springer Nature, 2023, p. 98-121. Conference paper (Refereed)
    Abstract [en]

    This is an extended version of our work in [16]. In this paper, we introduce two novel algorithms to collaboratively train Naive Bayes models across multiple private data sources: Federated Naive Bayes and Gossip Naive Bayes. Instead of directly providing access to their data, the data owners compute local updates that are then aggregated to build a global model. In order to also prevent indirect privacy leaks from the updates or from the final model, our algorithms protect the exchanged information with differential privacy. We experimentally evaluate our proposed approaches, examining different scenarios and focusing on potential real-world issues, such as different data owners offering different amounts of data or requesting different levels of privacy. Our results show that both Federated and Gossip Naive Bayes achieve accuracy similar to that of a “vanilla” Naive Bayes while maintaining reasonable privacy guarantees and being extremely robust to heterogeneous data owners.

  • 3.
    Marchioro, Thomas
    et al.
    Fdn Res & Technol Hellas, Inst Comp Sci, Iraklion, Greece.
    Giaretta, Lodovico
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Markatos, Evangelos
    Fdn Res & Technol Hellas, Inst Comp Sci, Iraklion, Greece.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Federated Naive Bayes under Differential Privacy, 2022. In: Proceedings of the 19th International Conference on Security and Cryptography - SECRYPT / [ed] S. De Capitani di Vimercati, P. Samarati, SciTePress, 2022, p. 170-180. Conference paper (Refereed)
    Abstract [en]

    Growing privacy concerns regarding personal data disclosure clash with the constant need for such information in data-driven applications. To address this issue, the combination of federated learning and differential privacy is now well-established in the domain of machine learning. These techniques make it possible to train deep neural networks without collecting the data and while preventing information leakage. However, there are many scenarios where simpler and more robust machine learning models are preferable. In this paper, we present a federated and differentially-private version of the Naive Bayes algorithm for classification. Our results show that, without data collection, the same performance as a centralized solution can be achieved on any dataset with only a slight increase in the privacy budget. Furthermore, if certain conditions are met, our federated solution can outperform a centralized approach.

  • 4.
    Giaretta, Lodovico
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Marchioro, Thomas
    Markatos, Evangelos
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Towards a decentralized infrastructure for data marketplaces: narrowing the gap between academia and industry, 2022. In: DE '22: Proceedings of the 1st International Workshop on Data Economy, New York, NY, USA: Association for Computing Machinery (ACM), 2022, p. 49-56. Conference paper (Refereed)
    Abstract [en]

    One big challenge for Industry 4.0 is leveraging the large amount of data that remain unused after collection. A variety of commercial data marketplaces have emerged in recent years to tackle this task. Despite their different business models and target markets, such marketplaces share a number of common issues that slow the growth of the industry, including data discovery, transparency, data privacy and data valuation. Many academic designs have been proposed to address these issues, yet most of them remain unimplemented, due to complexity or inefficiency.

    We argue that these issues can be addressed with a combination of blockchain-based infrastructure, privacy-preserving computing and machine learning-based valuation metrics. Furthermore, we discuss key enabling technologies in each of these areas that are feasible to deploy at scale and could thus be implemented in real-world marketplaces in the near future. We select such technologies based on their current maturity and their industrial prominence.

  • 5.
    Samy, Ahmed
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Giaretta, Lodovico
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Kefato, Zekarias Tilahun
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS).
    SchemaWalk: Schema Aware Random Walks for Heterogeneous Graph Embedding, 2022. In: WWW 2022 - Companion Proceedings of the Web Conference 2022, Association for Computing Machinery (ACM), 2022, p. 1157-1166. Conference paper (Refereed)
    Abstract [en]

    Heterogeneous Information Network (HIN) embedding has been a prevalent approach to learning representations of semantically-rich heterogeneous networks. Most HIN embedding methods exploit meta-paths to retain high-order structures. Yet their performance is conditioned on the quality of the (generated or manually-defined) meta-paths and on their suitability for the specific label set. Other methods adjust random walks to harness or skip certain heterogeneous structures (e.g. node types), but in doing so the adjusted random walker may inadvertently omit other node or edge types. Our key insight is that, without domain knowledge, the random walker should hold no assumptions about the heterogeneous structure (i.e. edge types). Thus, aiming for a flexible and general method, we utilize the network schema as a unique blueprint of an HIN and propose SchemaWalk, a random walk that uniformly samples all edge types within the network schema. Moreover, we identify the starvation phenomenon, which induces random walkers on HINs to under- or over-sample certain edge types. Accordingly, we design SchemaWalkHO to skip local deficient connectivity and preserve a uniform sampling distribution. Finally, we carry out node classification experiments on four real-world HINs and provide an in-depth qualitative analysis. The results highlight the robustness of our method regardless of the graph structure, in contrast with the state-of-the-art baselines.

  • 6.
    Giaretta, Lodovico
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Marchioro, Thomas
    Foundation for Research and Technology Hellas, Heraklion, Greece.
    Markatos, Evangelos
    Foundation for Research and Technology Hellas, Heraklion, Greece.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Towards a Decentralized Infrastructure for Data Marketplaces: Narrowing the Gap between Academia and Industry, 2022. In: DE 2022: Proceedings of the 1st International Workshop on Data Economy, Part of CoNEXT 2022, Association for Computing Machinery (ACM), 2022, p. 49-56. Conference paper (Refereed)
    Abstract [en]

    One big challenge for Industry 4.0 is leveraging the large amount of data that remain unused after collection. A variety of commercial data marketplaces have emerged in recent years to tackle this task. Despite their different business models and target markets, such marketplaces share a number of common issues that slow the growth of the industry, including data discovery, transparency, data privacy and data valuation. Many academic designs have been proposed to address these issues, yet most of them remain unimplemented, due to complexity or inefficiency. We argue that these issues can be addressed with a combination of blockchain-based infrastructure, privacy-preserving computing and machine learning-based valuation metrics. Furthermore, we discuss key enabling technologies in each of these areas that are feasible to deploy at scale and could thus be implemented in real-world marketplaces in the near future. We select such technologies based on their current maturity and their industrial prominence.

  • 7.
    Alkathiri, Abdul Aziz
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Giaretta, Lodovico
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Sahlgren, Magnus
    Decentralized Word2Vec Using Gossip Learning, 2021. In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), 2021. Conference paper (Refereed)
    Abstract [en]

    Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It is therefore useful for a few large public and private organizations to pool their corpora during training. However, factors such as legislation and user emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. For this specific scenario, we investigate how gossip learning, a massively-parallel, data-private, decentralized protocol, compares to a shared-dataset solution. We find that the application of Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to a traditional centralized setting, with a reduction in ground-truth similarity scores as low as 4.3%. Furthermore, the results are up to 54.8% better than independent local training.

  • 8.
    Giaretta, Lodovico
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Savvidis, Ioannis
    University of Cyprus.
    Marchioro, Thomas
    Foundation for Research and Technology Hellas.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Pallis, George
    University of Cyprus.
    Dikaiakos, Marios
    University of Cyprus.
    Markatos, Evangelos
    Foundation for Research and Technology Hellas.
    PDS2: A user-centered decentralized marketplace for privacy preserving data processing, 2021. In: Third International Workshop on Blockchain and Data Management (BlockDM 2021), in conjunction with the 37th IEEE International Conference on Data Engineering (ICDE), April 19, 2021, Chania, Crete, Greece, 2021. Conference paper (Refereed)
    Abstract [en]

    We envision PDS2, a decentralized data marketplace in which consumers submit their tasks to be run within the platform, on the data of willing providers. The goal of PDS2 is to ensure that users maintain full control over their data and do not compromise their privacy, while being rewarded for the value that their data generates. In order to achieve this, our marketplace architecture employs blockchain technology, privacy-preserving computation and decentralized machine learning.

    We then compare different potential solutions and identify the Ethereum blockchain, trusted execution environments and gossip learning as the most suitable for the implementation of PDS2. We also discuss the main open challenges that are left to tackle and possible directions for future work.

  • 9.
    Giaretta, Lodovico
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Savvidis, Ioannis
    Univ Cyprus, Dept Comp Sci, Nicosia, Cyprus.
    Marchioro, Thomas
    Fdn Res & Technol Hellas, Inst Comp Sci, Iraklion, Greece.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Pallis, George
    Univ Cyprus, Dept Comp Sci, Nicosia, Cyprus.
    Dikaiakos, Marios D.
    Univ Cyprus, Dept Comp Sci, Nicosia, Cyprus.
    Markatos, Evangelos
    Fdn Res & Technol Hellas, Inst Comp Sci, Iraklion, Greece.
    PDS2: A user-centered decentralized marketplace for privacy preserving data processing, 2021. In: 2021 IEEE 37th International Conference on Data Engineering Workshops (ICDEW 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 92-99. Conference paper (Refereed)
    Abstract [en]

    We envision PDS2, a decentralized data marketplace in which consumers submit their tasks to be run within the platform, on the data of willing providers. The goal of PDS2 is to ensure that users maintain full control over their data and do not compromise their privacy, while being rewarded for the value that their data generates. In order to achieve this, our marketplace architecture employs blockchain technology, privacy-preserving computation and decentralized machine learning. We then compare different potential solutions and identify the Ethereum blockchain, trusted execution environments and gossip learning as the most suitable for the implementation of PDS2. We also discuss the main open challenges that are left to tackle and possible directions for future work.

  • 10.
    Giaretta, Lodovico
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Lekssays, Ahmed
    University of Insubria.
    Carminati, Barbara
    University of Insubria.
    Ferrari, Elena
    University of Insubria.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    LiMNet: Early-Stage Detection of IoT Botnets with Lightweight Memory Networks, 2021. In: Computer Security – ESORICS 2021: 26th European Symposium on Research in Computer Security, Darmstadt, Germany, October 4–8, 2021, Proceedings, Part I / [ed] Elisa Bertino, Haya Shulman, Michael Waidner, Springer Nature, 2021. Conference paper (Refereed)
    Abstract [en]

    The number of IoT devices has been growing exponentially in the last few years. This growth makes them an attractive target for attackers, due to their low computational power and limited security features. Attackers use IoT botnets as an instrument to perform DDoS attacks, which have caused major disruptions of Internet services in the last decade. While many works have tackled the task of detecting botnet attacks, only a few have considered early-stage detection of these botnets during their propagation phase.

    While previous approaches analyze each network packet individually to predict its maliciousness, we propose a novel deep learning model called LiMNet (Lightweight Memory Network), which uses an internal memory component to capture the behaviour of each IoT device over time. This memory incorporates both packet features and the behaviour of peer devices. With this information, LiMNet achieves near-maximal AUROC classification scores, between 98.8% and 99.7%, a 14% improvement over the state of the art. LiMNet is also lightweight, performing inference almost 8 times faster than previous approaches.

  • 11.
    Giaretta, Lodovico
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Gossip Learning: Off the Beaten Path, 2019. Conference paper (Refereed)
    Abstract [en]

    The growing computational demands of model training tasks and the increased privacy awareness of consumers call for the development of new techniques in the area of machine learning. Fully decentralized approaches have been proposed, but are still in early research stages. This study analyses gossip learning, one of these state-of-the-art decentralized machine learning protocols, which promises high scalability and privacy preservation, with the goal of assessing its applicability to real-world scenarios.

    Previous research on gossip learning makes strong and often unrealistic assumptions about the distribution of the data, the communication speeds of the devices and the connectivity among them. Our results show that lifting these requirements can, in certain scenarios, lead to slow convergence of the protocol or even unfair bias in the produced models. This paper identifies the conditions in which gossip learning can and cannot be applied, and introduces extensions that mitigate some of its limitations.

  • 12.
    Garcia Bernal, Daniel
    et al.
    KTH.
    Giaretta, Lodovico
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Sahlgren, Magnus
    RISE Research Institutes of Sweden.
    Federated Word2Vec: Leveraging Federated Learning to Encourage Collaborative Representation Learning. Manuscript (preprint) (Other academic)
    Abstract [en]

    Large scale contextual representation models have significantly advanced NLP in recent years, understanding the semantics of text to a degree never seen before. However, they need to process large amounts of data to achieve high-quality results. Joining and accessing all these data from multiple sources can be extremely challenging due to privacy and regulatory reasons. Federated Learning can solve these limitations by training models in a distributed fashion, taking advantage of the hardware of the devices that generate the data. We show the viability of training NLP models, specifically Word2Vec, with the Federated Learning protocol. In particular, we focus on a scenario in which a small number of organizations each hold a relatively large corpus. The results show that neither the quality of the results nor the convergence time in Federated Word2Vec deteriorates as compared to centralised Word2Vec.

  • 13.
    Giaretta, Lodovico
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Marchioro, Thomas
    Markatos, Evangelos
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Towards a Realistic Decentralized Naive Bayes with Differential Privacy. Manuscript (preprint) (Other academic)
    Abstract [en]

    This is an extended version of our work in [16]. In this paper, we introduce two novel algorithms to collaboratively train Naive Bayes models across multiple private data sources: Federated Naive Bayes and Gossip Naive Bayes. Instead of directly providing access to their data, the data owners compute local updates that are then aggregated to build a global model. In order to also prevent indirect privacy leaks from the updates or from the final model, our algorithms protect the exchanged information with differential privacy. We experimentally evaluate our proposed approaches, examining different scenarios and focusing on potential real-world issues, such as different data owners offering different amounts of data or requesting different levels of privacy. Our results show that both Federated and Gossip Naive Bayes achieve accuracy similar to that of a “vanilla” Naive Bayes while maintaining reasonable privacy guarantees and being extremely robust to heterogeneous data owners.

  • 14.
    Giaretta, Lodovico
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Lekssays, Ahmed
    University of Insubria.
    Carminati, Barbara
    Ferrari, Elena
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Metasoma: Decentralized and Collaborative Early-Stage Detection of IoT Botnets. Manuscript (preprint) (Other academic)
    Abstract [en]

    Early-stage detection of botnets during their spreading phase, before any attack, is fundamental to IoT security. Recently introduced lightweight memory networks represent the state of the art in this domain. However, they require a central system to capture and analyze all traffic in the network, which may not always be feasible in real-world scenarios.

    In this paper, we introduce a decentralized and collaborative alternative, in which the IoT devices themselves are responsible for this task without any central observer or coordinator. Our results show that the performance of this novel approach is competitive with similar centralized solutions, despite the lack of a global view of the network at any participating device. We also provide an extensive analysis of the security limitations of our fully-decentralized detection system. We identify the potential exploits that an attacker may attempt to perform, assess their impact on the IoT network, and propose and evaluate effective countermeasures.

  • 15.
    Giaretta, Lodovico
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Fully-Decentralized Training of GNNs using Layer-wise Self-Supervision. Manuscript (preprint) (Other academic)
    Abstract [en]

    In existing literature, GNN training has been performed mostly in centralized, and sometimes federated, settings. In this work, we consider a fully-decentralized data-private scenario, where each node has limited knowledge of the surrounding graph. We propose the first architecture that enables GNN training in this fully-decentralized setting, by carefully combining several techniques, including decoupled learning, self-supervision and Gossip Learning. We implement two simulation tools to experimentally evaluate our solution. The results show that the proposed technique can be effectively used in scenarios where centralized or federated approaches are infeasible or undesirable.

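Entries 2, 3 and 13 describe the following setup. Data owners compute local updates, and these are aggregated into a global Naive Bayes model. Differential privacy protects the exchanged information.

The sketch below illustrates one common way to realise the federated variant. It rests on several assumptions that are not taken from these papers:
- categorical features;
- per-owner count tables;
- the Laplace mechanism;
- add/remove-one-record neighbouring datasets.

The function names are hypothetical. This is not the published algorithm, and the Gossip Naive Bayes variant is not shown.

```python
import numpy as np

def local_counts(X, y, n_classes, n_values):
    """Sufficient statistics one data owner computes on its own records.
    X: (n_samples, n_features) ints in [0, n_values); y: ints in [0, n_classes)."""
    n_features = X.shape[1]
    class_counts = np.bincount(y, minlength=n_classes).astype(float)
    feat_counts = np.zeros((n_classes, n_features, n_values))
    for xi, yi in zip(X, y):
        # One record increments exactly one cell per feature.
        feat_counts[yi, np.arange(n_features), xi] += 1
    return class_counts, feat_counts

def privatize(class_counts, feat_counts, epsilon, rng):
    """Laplace mechanism (assumed) on all counts released by one owner.
    Adding or removing one record changes 1 class count and n_features
    feature counts by 1 each, so the L1 sensitivity is 1 + n_features."""
    scale = (1 + feat_counts.shape[1]) / epsilon
    return (class_counts + rng.laplace(0.0, scale, class_counts.shape),
            feat_counts + rng.laplace(0.0, scale, feat_counts.shape))

def aggregate(updates):
    """Sum the noisy counts received from all owners."""
    return sum(u[0] for u in updates), sum(u[1] for u in updates)

def predict(class_counts, feat_counts, X, alpha=1.0):
    """Categorical Naive Bayes with Laplace smoothing on clipped noisy counts.
    Clipping is post-processing, so it does not consume privacy budget."""
    cc = np.clip(class_counts, 0, None) + alpha
    fc = np.clip(feat_counts, 0, None) + alpha
    log_prior = np.log(cc / cc.sum())                     # shape (C,)
    log_lik = np.log(fc / fc.sum(axis=2, keepdims=True))  # shape (C, F, V)
    f_idx = np.arange(X.shape[1])
    # log_lik[:, f_idx, X] has shape (C, N, F); summing over features gives (C, N).
    scores = log_prior[:, None] + log_lik[:, f_idx, X].sum(axis=2)
    return scores.argmax(axis=0)

# Usage: three owners share 600 synthetic records; the label depends on feature 0.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(600, 2))
y = (X[:, 0] == 1).astype(int)
owners = np.array_split(np.arange(600), 3)
updates = [privatize(*local_counts(X[idx], y[idx], 2, 3), epsilon=1.0, rng=rng)
           for idx in owners]
cc, fc = aggregate(updates)
accuracy = (predict(cc, fc, X) == y).mean()
```

Only noisy counts leave each owner, so the aggregator never sees raw records. Summing counts is what makes the federated model match a centralized one up to the added noise.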
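Entry 11, and entries 7 and 15 which build on it, rely on gossip learning. In gossip learning, models travel between peers and are merged and updated on local data, with no central server.

The sketch below is a minimal round-based simulation of that idea. It assumes the following, none of which comes from these papers:
- a linear least-squares model;
- IID local data;
- uniform random peer selection;
- merging by averaging.

The function name and all parameters are illustrative. The sketch does not model the skewed data, slow links or limited connectivity that entry 11 studies.

```python
import numpy as np

def gossip_learning(datasets, rounds, lr, rng):
    """Round-based simulation of gossip learning for linear least squares.

    datasets: list of (X_k, y_k) pairs, each held privately by node k.
    In every round each node pushes a copy of its current model to one
    uniformly chosen peer; the receiver averages it with its own model and
    then takes one gradient step on its local data. No server is involved.
    """
    n_nodes = len(datasets)
    dim = datasets[0][0].shape[1]
    models = [np.zeros(dim) for _ in range(n_nodes)]
    for _ in range(rounds):
        # Snapshot what each node sends at the start of the round.
        outgoing = [m.copy() for m in models]
        for sender in range(n_nodes):
            peers = [k for k in range(n_nodes) if k != sender]
            receiver = peers[rng.integers(len(peers))]
            X, y = datasets[receiver]
            merged = (models[receiver] + outgoing[sender]) / 2  # merge step
            grad = X.T @ (X @ merged - y) / len(y)              # local update
            models[receiver] = merged - lr * grad
    return models

# Usage: 10 nodes, each with 20 private samples from the same linear model.
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
datasets = []
for _ in range(10):
    X = rng.normal(size=(20, 2))
    datasets.append((X, X @ true_w + 0.01 * rng.normal(size=20)))
models = gossip_learning(datasets, rounds=200, lr=0.1, rng=rng)
max_error = max(np.abs(m - true_w).max() for m in models)
```

You can experiment with the kind of data skew that entry 11 examines by replacing the IID split with one in which each node sees only part of the input space.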
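Entry 5 contrasts two kinds of random walker:
- walkers guided by meta-paths;
- a walker that samples all edge types of the network schema uniformly.

The sketch below shows only the second, basic idea, under a simplifying assumption. At each step the walker picks uniformly among the edge types present at the current node, then picks a uniform neighbour of that type. The function names and the toy graph are hypothetical. This is not SchemaWalk as published, and it omits the SchemaWalkHO handling of starvation.

```python
import random
from collections import defaultdict

def build_index(edges):
    """Index an undirected heterogeneous graph as node -> edge type -> neighbours."""
    adj = defaultdict(lambda: defaultdict(list))
    for u, v, edge_type in edges:
        adj[u][edge_type].append(v)
        adj[v][edge_type].append(u)
    return adj

def edge_type_uniform_walk(adj, start, length, rng):
    """Random walk that first picks an edge type uniformly among those present
    at the current node, then a neighbour uniformly within that type."""
    walk = [start]
    node = start
    while len(walk) < length:
        edge_types = sorted(adj[node])  # sorted for reproducibility
        if not edge_types:
            break  # isolated node: stop early
        edge_type = rng.choice(edge_types)
        node = rng.choice(adj[node][edge_type])
        walk.append(node)
    return walk

# Toy graph (hypothetical): one author with 50 papers and 1 affiliation.
edges = [("a0", f"p{i}", "writes") for i in range(50)]
edges.append(("a0", "o0", "affiliated_with"))
adj = build_index(edges)
rng = random.Random(0)
hits = sum(edge_type_uniform_walk(adj, "a0", 2, rng)[1] == "o0" for _ in range(2000))
org_fraction = hits / 2000
```

A walker that picks neighbours uniformly would step from the author to the organisation with probability 1/51. The edge-type-uniform walker does so about half the time, so the rare edge type is no longer drowned out by the frequent one.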