Digitala Vetenskapliga Arkivet

Evaluation of the Choice of LLM in a Multi-Agent Solution for GUI-Test Generation
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0002-4379-6614
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0001-7526-3727
Synteda, Gothenburg, Sweden.
2025 (English). In: 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025 / [ed] Fasolino A.R., Panichella S., Aleti A., Mesbah A., Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 487-497. Conference paper, Published paper (Refereed)
Abstract [en]

Automated testing, particularly for GUI-based systems, remains costly, labor-intensive, and error-prone. Despite advancements in automation, manual testing still dominates in industrial practice, resulting in delays, higher costs, and increased error rates. Large Language Models (LLMs) have shown great potential to automate tasks that traditionally require human intervention, leveraging their cognitive-like abilities for test generation and evaluation. In this study, we present PathFinder, a Multi-Agent LLM (MALLM) framework that incorporates four agents responsible for (a) perception and summarization, (b) decision-making, (c) input handling and extraction, and (d) validation, which work collaboratively to automate exploratory web-based GUI testing. The goal of this study is to assess how different LLMs, applied to different agents, affect the efficacy of automated exploratory GUI testing. We evaluate PathFinder with three models, Mistral-Nemo, Gemma2, and Llama3.1, on four e-commerce websites. This yields 27 permutations of the LLMs across three agents (excluding the validation agent), which we use to test the hypothesis that a solution with multiple agents, each using a different LLM, is more efficacious (efficient and effective) than a multi-agent solution where all agents use the same LLM. The results indicate that the choice of LLM constellation (combination of LLMs) significantly impacts efficacy, and suggest that a single LLM across agents may yield the best balance of efficacy (measured by F1-score). Hypotheses that may explain this result include, but are not limited to, improved decision-making consistency and reduced task coordination discrepancies. The contributions of this study are an architecture for MALLM-based GUI testing, empirical results on its performance, and novel insights into how LLM selection impacts the efficacy of automated testing.
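The count of 27 in the abstract follows from assigning one of three models to each of three agents (3 × 3 × 3 = 27). The sketch below enumerates those assignments and includes the standard F1 formula (the harmonic mean of precision and recall). It is an illustration only: the model names come from the abstract, while the agent labels, function names, and the way precision and recall would be obtained are assumptions of this sketch, not details taken from PathFinder.

```python
from itertools import product

# Model names are taken from the abstract. The agent labels are shorthand
# invented for this sketch; the validation agent is left out, as in the study.
MODELS = ["Mistral-Nemo", "Gemma2", "Llama3.1"]
AGENTS = ["perception", "decision", "input"]


def constellations(models=MODELS, agents=AGENTS):
    """Return every assignment of one model to each agent (with repetition)."""
    return [dict(zip(agents, combo)) for combo in product(models, repeat=len(agents))]


def is_homogeneous(constellation):
    """True when all agents in the constellation use the same model."""
    return len(set(constellation.values())) == 1


def f1_score(precision, recall):
    """Standard F1: harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    all_constellations = constellations()
    same_model = [c for c in all_constellations if is_homogeneous(c)]
    print(len(all_constellations), "constellations,", len(same_model), "use a single model")
```

Of the 27 constellations, three assign the same model to every agent; the remaining 24 mix at least two models, which is the contrast the hypothesis in the abstract is about.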

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025. p. 487-497
Series
IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW, ISSN 2159-4848
Keywords [en]
AI-Assisted Software Testing, Automated Testing, Large Language Models (LLMs), MALLM, Multi-Agent Systems, Ability testing, Autonomous agents, C (programming language), Intelligent agents, Model checking, Software testing, GUI testing, Language model, Large language model, Multi agent, Multi-agent LLM, Multiagent systems (MASs), Software testings, Test generations, Automatic test pattern generation
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-28172
DOI: 10.1109/ICST62969.2025.10989038
ISI: 001506893900043
Scopus ID: 2-s2.0-105007519090
ISBN: 9798331508142 (print)
OAI: oai:DiVA.org:bth-28172
DiVA, id: diva2:1974470
Conference
18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025
Part of project
T.A.R.G.E.T. – Testing with AI Reinforced GUI Embedded Technology, Vinnova
SERT - Software Engineering ReThought, Knowledge Foundation
Funder
Vinnova, 2024-00242
Knowledge Foundation, 20180010
Available from: 2025-06-23. Created: 2025-06-23. Last updated: 2025-09-30. Bibliographically approved.

Open Access in DiVA

fulltext (850 kB), 97 downloads
File information
File name: FULLTEXT01.pdf
File size: 850 kB
Checksum (SHA-512): d70ab41bb5c2584c1de0c61ba4bc2b658f2b430b80d3f430dbf7c9d65751ff5ec3359a1f406a50e496ed2bd86ccc46a361ffaed335144ec6e1ac1cdf1a304e1a
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Tomic, Stevan; Alégroth, Emil