Leveraging LLM in Kubernetes
2024 (English) Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Kubernetes (K8s) is a complex orchestration platform, and the traditional way of handling its troubleshooting is tedious and cumbersome. The growth of the K8s footprint in the IT and Telecom industries over the last decade has increased the pressure on K8s administrators, developers, and users to troubleshoot K8s effectively. With advances in Natural Language Processing (NLP) and Artificial Intelligence (AI), Large Language Models (LLMs) offer significant advantages in handling text, e.g., document generation and document summarization. LLMs can analyze K8s logs, events, and errors and propose solutions to fix them, so K8s administrators, developers, and users can leverage LLMs to make troubleshooting more effective and efficient. Despite their advantages, LLMs also have limitations, the most common being hallucination, where a pre-trained model gives an incorrect response or no useful response at all. This tendency to hallucinate hinders their use in IT/Telecom vendor production environments: for general K8s issues the LLM response is good, but for vendor-specific errors the models fail to provide correct solutions.
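As a sketch of the log-analysis idea above (the function name, field names, and sample events are illustrative assumptions, not taken from the thesis), a troubleshooting assistant might collect the events of a failing pod and format them into an LLM prompt:

```python
def build_troubleshooting_prompt(pod_name, events):
    """Format K8s events for a failing pod into a prompt for an LLM.

    `events` is a list of (reason, message) tuples, e.g. as shown by
    `kubectl get events` for the pod. Illustrative sketch only.
    """
    lines = [f"The Kubernetes pod '{pod_name}' is failing with these events:"]
    for reason, message in events:
        lines.append(f"- {reason}: {message}")
    lines.append("Explain the likely root cause and suggest a fix.")
    return "\n".join(lines)

# Example: a pod stuck pulling its container image
prompt = build_troubleshooting_prompt(
    "web-frontend",
    [("Failed", "Failed to pull image 'registry.local/web:v2': not found"),
     ("BackOff", "Back-off pulling image 'registry.local/web:v2'")],
)
print(prompt)
```

The resulting text would then be sent to the LLM of choice; the formatting step itself is model-agnostic.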
Injecting domain-specific information into LLMs tends to improve response quality and mitigate the hallucination problem. The research goal of this thesis is to assess how injecting domain-specific knowledge improves LLM responses and, in turn, the quality of K8s troubleshooting. Two setups, a lab setup and an organization tool, were used to capture LLM-suggested fixes for K8s errors in two scenarios: in the first, the response of a pre-trained LLM was captured without any awareness of the organization's product; in the second, the response was captured after injecting the organization's product troubleshooting document. Injecting organization-specific information significantly improved the models' ability to suggest fixes for K8s errors. Both scenarios, with both setups, were demonstrated to a group of the organization's K8s experts to capture their feedback. The survey results provide data on the participants' experience with K8s, how complex they consider K8s troubleshooting, how much it diverts their focus from their actual work, and how much time the manual troubleshooting approach consumes. The analysis of the survey results shows that LLM responses with injected domain-specific information clearly outperform the baseline in improving K8s troubleshooting quality in a vendor environment, and the research concludes that the experts are inclined towards injecting domain knowledge into LLMs to improve K8s troubleshooting. Since the work was conducted in a lab environment with limited resources and a single vendor, further research is recommended at a broader level, with multi-vendor K8s clusters and different methods of injecting information into LLMs.
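The knowledge-injection scenario can be sketched as a minimal retrieval-augmented generation (RAG) step: split the vendor troubleshooting document into chunks, pick the chunks that best overlap with the error text, and prepend them to the prompt. The function names, the naive word-overlap scoring, and the sample chunks are assumptions for illustration; the thesis does not prescribe this exact method.

```python
def retrieve_chunks(error_text, doc_chunks, top_k=2):
    """Score each document chunk by word overlap with the error text
    and return the top_k most relevant chunks (naive keyword retrieval)."""
    error_words = set(error_text.lower().split())
    scored = sorted(
        doc_chunks,
        key=lambda chunk: len(error_words & set(chunk.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(error_text, doc_chunks):
    """Inject the retrieved vendor-specific context ahead of the question."""
    context = "\n".join(retrieve_chunks(error_text, doc_chunks))
    return (f"Vendor troubleshooting notes:\n{context}\n\n"
            f"Kubernetes error:\n{error_text}\n\nSuggest a fix.")

# Hypothetical chunks from a vendor troubleshooting document
chunks = [
    "If the CNI plugin fails, restart the vendor network agent daemonset.",
    "License errors: renew the vendor license file under /etc/vendor/license.",
    "For storage issues, check the vendor CSI driver logs.",
]
print(build_rag_prompt("CNI plugin error: network agent not ready", chunks))
```

A production pipeline would replace the word-overlap scoring with embedding-based similarity search, but the injection pattern, retrieved context prepended to the query, is the same.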
Place, publisher, year, edition, pages
2024.
Keywords [en]
Kubernetes, K8s, Large Language Models, LLMs, troubleshooting, RAG
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:su:diva-242762
OAI: oai:DiVA.org:su-242762
DiVA, id: diva2:1955694
2025-04-30