Protractor: Leveraging distributed tracing in service meshes for application profiling at scale
KTH, School of Electrical Engineering and Computer Science (EECS).
2018 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Large-scale Internet services are increasingly implemented as distributed systems in order to achieve fault tolerance, availability, and scalability. When requests traverse multiple services, end-to-end metrics no longer give a clear picture. Distributed tracing emerged to break down end-to-end latency on a per-service basis, but it only answers where a problem occurs, not why. From user research we found that root-cause analysis of performance problems is often still done by manually correlating information from logs, stack traces, and monitoring tools. Profilers provide fine-grained information, but we found they are rarely used in production systems because of the required changes to existing applications, the substantial storage requirements they introduce, and the difficulty of correlating profiling data with information from other sources.

The proliferation of modern low-overhead profilers opens up possibilities for online, always-on profiling in production environments. We propose Protractor as the missing link that exploits these possibilities to provide distributed profiling. It features a novel approach that leverages service meshes for application-level transparency and uses anomaly detection to selectively store relevant profiling information. Profiling information is correlated with distributed traces to provide contextual information for root-cause analysis. Protractor supports different profilers, and experimental work shows that the impact on end-to-end request latency is less than 3%. The utility of Protractor is further substantiated by a survey showing that the majority of the participants would use it frequently.

Abstract [sv]

Large-scale Internet services are increasingly implemented as distributed systems to achieve fault tolerance, availability, and scalability. When a request spans multiple services, end-to-end monitoring no longer gives a clear picture of the cause of a fault. Distributed tracing was developed to track end-to-end request latency per service and to indicate where the problem may lie, but it often does not show the cause. Through user research we found that root-cause analysis of performance problems is often still done by manually correlating information from logs, stack traces, and monitoring tools. Code profiling provides detailed information, but we found that it is rarely used in production systems because it requires changes to the existing code, because of the large storage requirements it introduces, and because it is difficult to correlate profiling data with information from other sources.

The spread of modern low-overhead code profilers opens up the possibility of running them continuously in production environments. We introduce Protractor, which combines code profiling and distributed tracing. By leveraging and building on concepts such as service meshes, we achieve application-level transparency, and we use anomaly detection to selectively store relevant profiling information. That information is correlated with distributed traces to provide context for root-cause analysis. Protractor supports different code profilers, and experiments have shown that the impact on end-to-end request latency is less than 3%. The usefulness of Protractor is further supported by a survey showing that the majority of the participants would use it often.

Place, publisher, year, edition, pages
2018. p. 80
Series
TRITA-EECS-EX ; 2018:278
Keywords [en]
Observability, distributed tracing, service mesh, distributed profiling, microservices.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-232139
OAI: oai:DiVA.org:kth-232139
DiVA, id: diva2:1232561
Subject / course
Computer Technology and Software Engineering
Educational program
Master of Science - Network Services and Systems
Supervisors
Examiners
Available from: 2018-07-12. Created: 2018-07-12. Last updated: 2018-07-12. Bibliographically approved.

Open Access in DiVA

fulltext (1166 kB), 3 downloads
File information
File name: FULLTEXT01.pdf
File size: 1166 kB
Checksum (SHA-512): 9077b9f296a6faae5be0c1c83c02fb3df27172d31b40aca4a0d5a3788184f360137432c2f62690321f4bd362f8a905f995d62c23da27c41baed610ebe87d1707
Type: fulltext
Mimetype: application/pdf
