2025 (English) In: 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025 / [ed] Fasolino A.R., Panichella S., Aleti A., Mesbah A., Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 487-497. Conference paper, Published paper (Refereed)
Abstract [en]
Automated testing, particularly for GUI-based systems, remains a costly, labor-intensive, and error-prone process. Despite advancements in automation, manual testing still dominates industrial practice, resulting in delays, higher costs, and increased error rates. Large Language Models (LLMs) have shown great potential to automate tasks that traditionally require human intervention, leveraging their cognitive-like abilities for test generation and evaluation. In this study, we present PathFinder, a Multi-Agent LLM (MALLM) framework that incorporates four agents responsible for (a) perception and summarization, (b) decision-making, (c) input handling and extraction, and (d) validation, which work collaboratively to automate exploratory web-based GUI testing. The goal of this study is to assess how different LLMs, applied to different agents, affect the efficacy of automated exploratory GUI testing. We evaluate PathFinder with three models, Mistral-Nemo, Gemma2, and Llama3.1, on four e-commerce websites. We thus evaluate 27 permutations of the LLMs across three agents (excluding the validation agent) to test the hypothesis that a solution with multiple agents, each using a different LLM, is more efficacious (efficient and effective) than a multi-agent solution where all agents use the same LLM. The results indicate that the choice of LLM constellation (combination of LLMs) significantly impacts efficacy, suggesting that a single LLM across agents may yield the best balance of efficacy (measured by F1-score). Hypotheses to explain this result include, but are not limited to, improved decision-making consistency and reduced task-coordination discrepancies. The contributions of this study are an architecture for MALLM-based GUI testing, empirical results on its performance, and novel insights into how LLM selection impacts the efficacy of automated testing.
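The four-agent pipeline described in the abstract can be sketched as a simple chain in which each agent transforms a shared test state. This is an illustrative sketch only: the agent names mirror the abstract's roles, but the interfaces, the `run_pipeline` helper, and the per-agent LLM assignments are hypothetical and do not reflect the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    name: str
    llm: str                     # which LLM backs this agent (a "constellation" choice)
    act: Callable[[str], str]    # transform the shared GUI-test state

def run_pipeline(agents: List[Agent], page_state: str) -> str:
    """Pass the evolving GUI-test state through each agent in turn."""
    state = page_state
    for agent in agents:
        state = agent.act(state)
    return state

# One hypothetical constellation: assigning an LLM per agent gives the
# 3^3 = 27 permutations evaluated across the first three agents.
agents = [
    Agent("perception", "Mistral-Nemo", lambda s: f"summary({s})"),
    Agent("decision",   "Gemma2",       lambda s: f"action({s})"),
    Agent("input",      "Llama3.1",     lambda s: f"filled({s})"),
    Agent("validation", "Gemma2",       lambda s: f"verdict({s})"),
]
print(run_pipeline(agents, "login-page"))  # → verdict(filled(action(summary(login-page))))
```

The chained structure makes the paper's hypothesis concrete: swapping the `llm` field per agent changes the constellation without altering the pipeline itself.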
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Series
IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW, ISSN 2159-4848
Keywords
AI-Assisted Software Testing, Automated Testing, Large Language Models (LLMs), MALLM, Multi-Agent Systems, Ability testing, Autonomous agents, C (programming language), Intelligent agents, Model checking, Software testing, GUI testing, Language model, Large language model, Multi agent, Multi-agent LLM, Multiagent systems (MASs), Software testings, Test generations, Automatic test pattern generation
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-28172 (URN); 10.1109/ICST62969.2025.10989038 (DOI); 001506893900043 (ISI); 2-s2.0-105007519090 (Scopus ID); 9798331508142 (ISBN)
Conference
18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025
Funder
Vinnova, 2024-00242; Knowledge Foundation, 20180010
Available from: 2025-06-23; Created: 2025-06-23; Last updated: 2025-09-30. Bibliographically approved