Digitala Vetenskapliga Arkivet

Toward Highly-efficient GPU-centric Networking
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer Systems, SCS. (NSLAB) ORCID iD: 0000-0002-9400-324X
2024 (English) Licentiate thesis, monograph (Other academic)
Alternative title
Mot Högeffektiva GPU-centrerade Nätverk (Swedish)
Abstract [en]

Graphics Processing Units (GPUs) are emerging as the most popular accelerator for many applications, powering the core of Machine Learning applications and many computing-intensive workloads. GPUs have typically been considered as accelerators, with Central Processing Units (CPUs) in charge of the main application logic, data movement, and network connectivity. In these architectures, the input and output data of network-based GPU-accelerated applications typically traverse the CPU and the Operating System network stack multiple times, getting copied across the system main memory. These copies increase application latency and consume expensive CPU cycles, reducing the power efficiency of systems and increasing overall response times. These inefficiencies become more significant in latency-bounded deployments, or at high throughput, where copy times could easily inflate the response time of modern GPUs.

The main contribution of this dissertation is a step towards a GPU-centric network architecture that allows GPUs to initiate network transfers without the intervention of CPUs. We focus on commodity hardware, using NVIDIA GPUs and Remote Direct Memory Access over Converged Ethernet (RoCE) to realize this architecture, removing the need for highly homogeneous clusters and ad-hoc designed network architectures, as required by many other similar approaches. By porting some rdma-core posting routines to the GPU runtime, we can saturate a 100-Gbps link without using any CPU cycles, reducing the overall system response time while increasing power efficiency and improving application throughput.

The second contribution concerns the analysis of Clockwork, a state-of-the-art inference serving system, showing the limitations imposed by controller-centric, CPU-mediated architectures. We then propose an alternative architecture for this system based on an RDMA transport, and we study some of the performance gains that such a system would introduce.

An integral task of an inference system is to account for and track user flows, and to distribute them across multiple worker nodes. Our third contribution aims to understand the challenges of Connection Tracking applications running at 100 Gbps, in the context of a Stateful Load Balancer running on commodity hardware.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2024, p. 160
Series
TRITA-EECS-AVL ; 2024:30
Keywords [en]
Low-Latency Internet Services, Packet Processing, Network Functions Virtualization, Middle Boxes, Commodity Hardware, Multi-Hundred-Gigabit-Per-Second, Low-Level Optimization, Graphics Processing Units, Inference Serving, Remote Direct Memory Access
Keywords [sv]
Internettjänster med Låg Fördröjning, Paketbearbetning, Virtualisering av Nätverksfunktioner, Mellanutrustning, Tillgänglig Datorhårdvara, Flera-Hundra-Gigabit-Per-Sekund, Lågnivå-Optimering, Grafikprocessor, Inferensserving, Remote Direct Memory Access
National subject category
Communication Systems; Computer Systems
Research subject
Computer Science; Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-344316
ISBN: 978-91-8040-877-6 (print)
OAI: oai:DiVA.org:kth-344316
DiVA, id: diva2:1844498
Presentation
2024-04-10, Zoom Webinar: https://kth-se.zoom.us/j/63581339905, Sal C (Sven-Olof Öhrvik) at Electrum, Kistagången 16, Stockholm, Sweden, 09:00 (English)
Opponent
Supervisors
Funders
EU, European Research Council, 770889
Swedish Foundation for Strategic Research (SSF), TCC
Note

QC 20240315

Available from: 2024-03-15 Created: 2024-03-14 Last updated: 2024-03-15 Bibliographically approved

Open Access in DiVA

fulltext (5120 kB), 1410 downloads
File information
File name: FULLTEXT01.pdf
File size: 5120 kB
Checksum (SHA-512):
a228723a9581719be59c2b97c0e07f4c7a265ac06e34842c8e03a165cb0e3f0e66e6df19499ee69cabe5ba44d66f154967c9d2ac446b9bcaa83fda03f3fb7b5b
Type: fulltext
Mimetype: application/pdf

Search further in DiVA

By author/editor
Girondi, Massimo
By organisation
Software and Computer Systems, SCS
Communication Systems; Computer Systems
