Digitala Vetenskapliga Arkivet

Evaluation of the Choice of LLM in a Multi-Agent Solution for GUI-Test Generation
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0002-4379-6614
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0001-7526-3727
Synteda, Gothenburg, Sweden.
2025 (English) In: 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025 / [ed] Fasolino A.R., Panichella S., Aleti A., Mesbah A., Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 487-497
Conference paper, Published paper (Refereed)
Abstract [en]

Automated testing, particularly for GUI-based systems, remains costly, labor-intensive, and error-prone. Despite advancements in automation, manual testing still dominates in industrial practice, resulting in delays, higher costs, and increased error rates. Large Language Models (LLMs) have shown great potential to automate tasks traditionally requiring human intervention, leveraging their cognitive-like abilities for test generation and evaluation. In this study, we present PathFinder, a Multi-Agent LLM (MALLM) framework that incorporates four agents responsible for (a) perception and summarization, (b) decision-making, (c) input handling and extraction, and (d) validation, which work collaboratively to automate exploratory web-based GUI testing. The goal of this study is to assess how different LLMs, applied to different agents, affect the efficacy of automated exploratory GUI testing. We evaluate PathFinder with three models, Mistral-Nemo, Gemma2, and Llama3.1, on four e-commerce websites. This yields 27 permutations of the LLMs across three agents (excluding the validation agent), which we use to test the hypothesis that a solution with multiple agents, each using different LLMs, is more efficacious (efficient and effective) than a multi-agent solution where all agents use the same LLM. The results indicate that the choice of LLM constellation (combination of LLMs) significantly impacts efficacy, suggesting that a single LLM across agents may yield the best balance of efficacy (measured by F1-score). Hypotheses to explain this result include, but are not limited to, improved decision-making consistency and reduced task coordination discrepancies. The contributions of this study are an architecture for MALLM-based GUI testing, empirical results on its performance, and novel insights into how LLM selection impacts the efficacy of automated testing.
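
The figure of 27 in the abstract follows from assigning one of three models to each of three agents (3 x 3 x 3). The sketch below is only an illustration of that count and of the standard F1 formula; it is not the PathFinder implementation. The model names come from the abstract, while the agent labels, function names, and the example counts are assumptions made for this sketch. How the paper defines true and false positives is not stated in the abstract.

```python
from itertools import product

# Model names are taken from the abstract. The agent labels are shorthand
# for agents (a)-(c) and are an assumption of this sketch.
MODELS = ["Mistral-Nemo", "Gemma2", "Llama3.1"]
AGENTS = ["perception", "decision", "input"]


def constellations(models=MODELS, agents=AGENTS):
    """Return every assignment of one model to each agent, as dicts."""
    return [
        dict(zip(agents, combo))
        for combo in product(models, repeat=len(agents))
    ]


def is_homogeneous(constellation):
    """True when all agents in the constellation use the same model."""
    return len(set(constellation.values())) == 1


def f1_score(tp, fp, fn):
    """Standard F1 (harmonic mean of precision and recall).

    Returns 0.0 when the score is undefined (no positives at all).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    all_constellations = constellations()
    print(len(all_constellations))                           # 27
    print(sum(is_homogeneous(c) for c in all_constellations))  # 3
```

Of the 27 constellations, 3 are homogeneous (one model for all agents) and 24 are mixed, which is the comparison the hypothesis in the abstract is about. A call such as `f1_score(tp=8, fp=2, fn=2)` uses hypothetical counts purely to show the formula.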

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025. p. 487-497
Series
IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW, ISSN 2159-4848
Keywords [en]
AI-Assisted Software Testing, Automated Testing, Large Language Models (LLMs), MALLM, Multi-Agent Systems, Ability testing, Autonomous agents, C (programming language), Intelligent agents, Model checking, Software testing, GUI testing, Language model, Large language model, Multi agent, Multi-agent LLM, Multiagent systems (MASs), Software testings, Test generations, Automatic test pattern generation
National subject category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-28172
DOI: 10.1109/ICST62969.2025.10989038
ISI: 001506893900043
Scopus ID: 2-s2.0-105007519090
ISBN: 9798331508142 (print)
OAI: oai:DiVA.org:bth-28172
DiVA, id: diva2:1974470
Conference
18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025
Part of project
T.A.R.G.E.T. – Testing with AI Reinforced GUI Embedded Technology, Vinnova
SERT - Software Engineering ReThought, KK-stiftelsen
Funder
Vinnova, 2024-00242
KK-stiftelsen, 20180010
Available from: 2025-06-23 Created: 2025-06-23 Last updated: 2025-09-30 Bibliographically approved

Open Access in DiVA

fulltext (850 kB), 121 downloads
File information
File name: FULLTEXT01.pdf. File size: 850 kB. Checksum (SHA-512):
d70ab41bb5c2584c1de0c61ba4bc2b658f2b430b80d3f430dbf7c9d65751ff5ec3359a1f406a50e496ed2bd86ccc46a361ffaed335144ec6e1ac1cdf1a304e1a
Type: fulltext. Mimetype: application/pdf

Other links

Publisher's full text
Scopus

Search further in DiVA

By author/editor
Tomic, Stevan
Alégroth, Emil
By organisation
Department of Software Engineering
Software Engineering

Total: 121 downloads
The number of downloads is the sum of the downloads of all full texts. It may include, for example, earlier versions that are no longer available.

Total: 342 hits