Digitala Vetenskapliga Arkivet

Evaluating the Role of Large Language Models in Test Configuration Code Generation: An Empirical Study
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Background: Automating software development tasks is becoming increasingly relevant as software systems grow more complex, and Large Language Models (LLMs) have been gaining popularity for their ability to generate code from textual descriptions, offering potential benefits in various coding scenarios.

Problem Statement: Despite progress in using LLMs for code generation, research on XML-style test configuration code generation remains limited. This study explores how LLMs can be leveraged for that task; a sketch of what such code looks like follows.
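
To illustrate the task, a natural-language test specification is paired with XML-style configuration code. The thesis works with Ericsson-internal specifications and schemas that are not shown in this record, so the element and attribute names in the pair below are invented for this sketch:

    # Hypothetical specification/configuration training pair illustrating
    # the task; the XML vocabulary here is invented, not Ericsson's.
    example = {
        "specification": (
            "Run the throughput suite against node-A with a 60-second "
            "timeout, retrying failed cases twice."
        ),
        "configuration": (
            '<testConfiguration name="throughput-suite">\n'
            '  <target node="node-A"/>\n'
            '  <execution timeoutSeconds="60" retries="2"/>\n'
            '  <suite ref="throughput"/>\n'
            '</testConfiguration>\n'
        ),
    }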

Objectives: This study explores using LLMs to generate XML-style test configuration code from test specifications. By fine-tuning existing LLMs, the research evaluates the effectiveness of the models in generating test configuration code. The study further investigates the efficacy of using the LLM-generated code as a starting point in a real-world industrial scenario compared to manually writing the code.
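
A minimal sketch of what such fine-tuning could look like with the Hugging Face peft library, assuming the QLoRA setup the keywords indicate and Mistral-7B as the base model; the hyperparameters and target modules are illustrative choices, not the thesis's reported configuration:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_id = "mistralai/Mistral-7B-v0.1"  # one of the three models compared

    # Load the base model quantized to 4 bits (the "Q" in QLoRA).
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    model = prepare_model_for_kbit_training(model)

    # Attach small trainable low-rank adapters; the frozen 4-bit base
    # weights stay untouched, so only a fraction of parameters are trained.
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()

Training would then proceed with a standard causal-language-modelling loop (for example transformers.Trainer) over specification-to-XML pairs like the one sketched above.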

Methods: This study employs a multi-method empirical design, incorporating an experiment and a coding workshop to assess the effectiveness of LLMs at generating XML configuration code and to understand the impact of the generated code in a real-world context.

Results: The results indicate that Mistral-7B outperforms Phi-3 and Code LLaMA both in model performance and in structural similarity to the ground-truth code. The coding workshop showed that using LLM-generated code as a starting point reduced coding time by an average of 40.75 minutes for Code 1 and 4 minutes for Code 2 compared to coding from scratch, and it also yielded a lower Tree Edit Distance, though the improvements were not always consistent. Developers raised concerns about trust, reliability, and domain-specific optimization: while the LLMs could generate code quickly, the output required additional effort to comprehend and refine.
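
Tree Edit Distance measures how many node insertions, deletions, and relabelings turn one tree into another, which suits XML better than flat text diffing. A minimal sketch of such a comparison, assuming the zss implementation of the Zhang-Shasha algorithm (the thesis does not state which implementation it used):

    import xml.etree.ElementTree as ET
    from zss import Node, simple_distance  # pip install zss

    def to_zss(elem):
        """Convert an ElementTree element to a zss Node, labelled by its tag."""
        node = Node(elem.tag)
        for child in elem:
            node.addkid(to_zss(child))
        return node

    ground_truth = ET.fromstring(
        "<testConfiguration><target/><execution/><suite/></testConfiguration>"
    )
    generated = ET.fromstring(  # hypothetical model output missing <suite/>
        "<testConfiguration><target/><execution/></testConfiguration>"
    )

    # One insertion is needed, so the distance is 1; identical trees score 0.
    print(simple_distance(to_zss(ground_truth), to_zss(generated)))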

Conclusions: The study finds Mistral-7B to be the most effective of the three LLMs for XML-style test configuration code generation. While LLM-generated code may reduce the initial effort when used as a starting point, manual refinement is still needed for accuracy and domain alignment, and developers consequently do not fully trust the approach. Future research could explore more advanced LLMs, improved validation, and alternative fine-tuning methods.

Place, publisher, year, edition, pages
2025, p. 50.
Keywords [en]
Large Language Models, Generative AI, PEFT QLoRA, Test Configuration Code Generation, Test Code Automation, Fine Tuning.
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-27683
OAI: oai:DiVA.org:bth-27683
DiVA, id: diva2:1949520
External cooperation
Ericsson
Subject / course
PA2534 Master's Thesis (120 credits) in Software Engineering
Educational program
PAADA Master Qualification Plan in Software Engineering 120,0 hp
Available from: 2025-04-07. Created: 2025-04-02. Last updated: 2025-04-07. Bibliographically approved.

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 973 kB
Checksum (SHA-512): cfe42c4ba0fa54206a5ce680da5014b6c9396aa5ee14b05909b124d2825607b82b1f27e9bff8eb1f3102a13d2ab1734dd79553fb611e8328299b55907b7a82b5
Type: fulltext
Mimetype: application/pdf
