Digitala Vetenskapliga Arkivet

Project

Project type/Form of grant
Project grant
Title [en]
T.A.R.G.E.T. – Testing with AI Reinforced GUI Embedded Technology
Abstract [sv]
Mjukvarutestning via grafiska gränssnitt, GUI-testning, är en vanlig men kostsam metod i industrin som bidrar till den globala kostnaden, om biljoner dollar varje år, för kvalitetssäkring av mjukvara. GUI-testning krävs för att säkerställa ett systems korrekta beteende men har flera utmaningar. Projektet ska titta på AI-baserad GUI-testning.
Abstract [en]
Software testing via graphical user interfaces, GUI testing, is a common but costly practice in industry that contributes to the global cost of trillions of dollars each year for software quality assurance. GUI testing is required to ensure the correct behavior of a system but has several challenges. The project will look at AI-based GUI testing.
Publications (3 of 3)
Tomic, S., Alégroth, E. & Isaac, M. (2025). Evaluation of the Choice of LLM in a Multi-Agent Solution for GUI-Test Generation. In: Fasolino A.R., Panichella S., Aleti A., Mesbah A. (Ed.), 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025: . Paper presented at 18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025 (pp. 487-497). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025 / [ed] Fasolino A.R., Panichella S., Aleti A., Mesbah A., Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 487-497. Conference paper, Published paper (Refereed)
Abstract [en]

Automated testing, particularly for GUI-based systems, remains a costly, labor-intensive, and error-prone process. Despite advancements in automation, manual testing still dominates industrial practice, resulting in delays, higher costs, and increased error rates. Large Language Models (LLMs) have shown great potential to automate tasks traditionally requiring human intervention, leveraging their cognitive-like abilities for test generation and evaluation. In this study, we present PathFinder, a Multi-Agent LLM (MALLM) framework that incorporates four agents responsible for (a) perception and summarization, (b) decision-making, (c) input handling and extraction, and (d) validation, which work collaboratively to automate exploratory web-based GUI testing. The goal of this study is to assess how different LLMs, applied to different agents, affect the efficacy of automated exploratory GUI testing. We evaluate PathFinder with three models, Mistral-Nemo, Gemma2, and Llama3.1, on four e-commerce websites. Thus, we evaluate 27 permutations of the LLMs across three agents (excluding the validation agent) to test the hypothesis that a solution with multiple agents, each using different LLMs, is more efficacious (efficient and effective) than a multi-agent solution where all agents use the same LLM. The results indicate that the choice of LLM constellation (combination of LLMs) significantly impacts efficacy, suggesting that a single LLM across agents may yield the best balance of efficacy (measured by F1-score). Hypotheses to explain this result include, but are not limited to, improved decision-making consistency and reduced task coordination discrepancies. The contributions of this study are an architecture for MALLM-based GUI testing, empirical results on its performance, and novel insights into how LLM selection impacts the efficacy of automated testing.
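The evaluation design above — three candidate LLMs assigned independently to three of PathFinder's agents — is what yields the 27 constellations. A minimal sketch of that enumeration (role names and the homogeneity check are illustrative, not PathFinder's actual code):

```python
# Illustrative sketch: assigning one of three LLMs to each of three agent roles
# (the validation agent is held fixed, per the abstract) gives 3^3 = 27
# constellations; exactly 3 of them use a single LLM across all agents.
from itertools import product

MODELS = ["mistral-nemo", "gemma2", "llama3.1"]
AGENT_ROLES = ["perception", "decision", "input_handling"]  # validation agent excluded

def constellations():
    """Every assignment of one model to each agent role."""
    return [dict(zip(AGENT_ROLES, combo))
            for combo in product(MODELS, repeat=len(AGENT_ROLES))]

def is_homogeneous(constellation):
    """True when all agents share one LLM (the best-performing setup per the results)."""
    return len(set(constellation.values())) == 1

all_runs = constellations()
print(len(all_runs))                             # 27 constellations
print(sum(is_homogeneous(c) for c in all_runs))  # 3 single-LLM constellations
```

Each dictionary in `all_runs` would parameterize one experimental run of the framework.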

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Series
IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW, ISSN 2159-4848
Keywords
AI-Assisted Software Testing, Automated Testing, Large Language Models (LLMs), MALLM, Multi-Agent Systems, Ability testing, Autonomous agents, C (programming language), Intelligent agents, Model checking, Software testing, GUI testing, Language model, Large language model, Multi agent, Multi-agent LLM, Multiagent systems (MASs), Software testings, Test generations, Automatic test pattern generation
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-28172 (URN); 10.1109/ICST62969.2025.10989038 (DOI); 001506893900043 (); 2-s2.0-105007519090 (Scopus ID); 9798331508142 (ISBN)
Conference
18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025
Funder
Vinnova, 2024-00242; Knowledge Foundation, 20180010
Available from: 2025-06-23. Created: 2025-06-23. Last updated: 2025-09-30. Bibliographically approved.
Buarque Franzosi, D., Alégroth, E. & Isaac, M. (2025). LLM-Based Labelling of Recorded Automated GUI-Based Test Cases. In: Fasolino A.R., Panichella S., Aleti A., Mesbah A. (Ed.), 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025: . Paper presented at 18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025 (pp. 453-463). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025 / [ed] Fasolino A.R., Panichella S., Aleti A., Mesbah A., Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 453-463. Conference paper, Published paper (Refereed)
Abstract [en]

Graphical User Interface (GUI) based testing is a commonly used practice in industry. Although valuable and, in many cases, necessary, it is associated with challenges such as high cost and requirements on both technical and domain expertise. Augmented testing, a novel approach to GUI test automation, aims to mitigate these challenges by allowing users to record and render test cases and test data directly on the GUI of the system under test (SUT). In this context, Scout is an augmented testing tool that captures system states and transitions during manual interaction with the SUT, storing them in a test model that is visually represented in the form of state trees and reports. While this representation provides a basic overview of a test suite, e.g., its size and number of scenarios, it is limited in terms of analysis depth, interpretability, and reproducibility. In particular, without human state labeling, it is challenging to produce meaningful and easily understandable test reports. To address this limitation, we present a novel solution and a demonstrator, integrated into Scout, which leverages large language models (LLMs) to enrich the model-based test case representation by automatically labeling and describing states and describing transitions. We conducted two experiments to evaluate the impact of the solution. First, we compared LLM-enhanced reports with expert-generated reports using embedding distance evaluation metrics. Second, we assessed the usability and perceived value of the enhanced reports through an industrial survey. The results of the study indicate that the plugin can improve the readability, actionability, and interpretability of test reports. This work contributes to the automation of GUI testing by reducing the need for manual intervention, e.g., labeling, and technical expertise, e.g., to understand test case models.
Although the solution is studied in the context of augmented testing, we argue for the solution's generalizability to related test automation techniques. In addition, we argue that this approach enables actionable insights and lays the groundwork for further research into autonomous testing based on Generative AI. 
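The core idea — turning a recorded GUI state's raw widget content into a human-readable label via an LLM — can be sketched as follows. This is an illustrative approximation, not Scout's actual plugin API; the prompt wording, the `llm` callable, and the offline fallback are assumptions:

```python
# Hedged sketch of LLM-based state labeling: each recorded state's widget texts
# are packed into a prompt, and the model's short reply becomes the state's
# label in the test report. With no LLM supplied, a trivial heuristic (first
# widget text) keeps the sketch runnable offline.
def label_state(state_widgets, llm=None):
    """Return a short human-readable label for one recorded GUI state."""
    prompt = ("Give a short descriptive label for a GUI screen containing "
              "these elements: " + ", ".join(state_widgets))
    if llm is None:                         # offline fallback for this sketch
        return state_widgets[0][:40]
    return llm(prompt)                      # real use: call the language model

# Label every state in a (toy) recorded test model.
recorded_states = [
    ["Login", "Username", "Password"],
    ["Cart", "Checkout", "Total: $42"],
]
report = {i: label_state(widgets) for i, widgets in enumerate(recorded_states)}
```

In the real system, the same pattern would also describe transitions between states, giving the state-tree report its narrative text.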

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
High costs, Interpretability, Labelings, Language model, Model-based OPC, Systems under tests, Technical expertise, Test Automation, Test case, Test reports, Graphical user interfaces
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-28173 (URN); 10.1109/ICST62969.2025.10988984 (DOI); 001506893900040 (); 2-s2.0-105007522870 (Scopus ID); 9798331508142 (ISBN)
Conference
18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025
Funder
Knowledge Foundation, 20180010; Vinnova, 2024-00242
Available from: 2025-06-23. Created: 2025-06-23. Last updated: 2025-09-30. Bibliographically approved.
Buarque Franzosi, D., Capovski, K., Isaac, M. & Byttner, S. (2025). Recommendation System of Client-Requested Projects in the Swedish Consultancy Market with LLMs. In: Nowaczyk S., Vettoruzzo A. (Ed.), CEUR Workshop Proceedings: . Paper presented at 2025 Swedish AI Society Workshop, SAIS 2025, Halmstad, June 16-17, 2025 (pp. 79-92). Technical University of Aachen, 4037
2025 (English). In: CEUR Workshop Proceedings / [ed] Nowaczyk S., Vettoruzzo A., Technical University of Aachen, 2025, Vol. 4037, p. 79-92. Conference paper, Published paper (Refereed)
Abstract [en]

This article presents the recommendation system of Personas, a microservice-based platform designed to assist Human Resources (HR) teams in streamlining the recommendation and presentation of candidates to clients based on posted project descriptions. Personas offers functionalities for recommendation, automatic generation of tailored curricula and motivation letters, and conversational support through client- and consultant-facing chatbots. At its core, the recommendation system suggests relevant projects posted by clients to each candidate on a daily basis. It leverages both structured and unstructured textual data, including web-scraped content, user-uploaded documents, curated profiles, and texts generated by Large Language Models (LLMs). All documents are embedded into a shared semantic vector space, enabling fast similarity computations and facilitating Retrieval-Augmented Generation (RAG) workflows. The recommendation pipeline consists of a two-stage process. First, lightweight pre-selection models apply filters and semantic similarity metrics to narrow down the pool of potential assignments. Then, in-depth analyses using LLMs provide refined compatibility assessments. Notably, the LLM-based evaluations serve not only to improve ranking precision, but also as high-quality proxy labels for evaluating and improving the pre-selection models. This paper describes each stage of the pipeline (document collection, structuring, curation, pre-selection, and LLM-based analysis) and presents quantitative results demonstrating the system's effectiveness on a large-scale dataset.
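The two-stage pipeline described above — cheap embedding similarity to shortlist candidates, then a more expensive LLM pass to score the survivors — can be sketched as below. Everything here is illustrative: the bag-of-words "embedding" stands in for a real embedding model, and `llm_score` stands in for the paper's LLM-based compatibility assessment:

```python
# Hedged sketch of a two-stage recommender: stage 1 ranks projects against a
# candidate profile by cosine similarity in a shared vector space and keeps the
# top k; stage 2 re-scores the shortlist with a (stubbed) LLM judge.
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; a real system would use a sentence encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(profile, projects, k=2, llm_score=None):
    pv = embed(profile)
    # Stage 1: lightweight pre-selection by embedding similarity.
    shortlist = sorted(projects, key=lambda p: cosine(pv, embed(p)), reverse=True)[:k]
    # Stage 2: refined compatibility assessment (stubbed here).
    if llm_score is None:
        llm_score = lambda prof, proj: cosine(embed(prof), embed(proj))
    return max(shortlist, key=lambda p: llm_score(profile, p))

best = recommend("python backend developer",
                 ["java frontend role", "python backend consultant", "sales manager"])
```

The same shortlist/re-rank split is also what lets the expensive stage-2 scores double as proxy labels for tuning the cheap stage-1 model, as the abstract notes.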

Place, publisher, year, edition, pages
Technical University of Aachen, 2025
Series
CEUR Workshop Proceedings, E-ISSN 1613-0073
Keywords
Job Matching, Large Language Models, Recommendation System, Semantic Similarity, Abstracting, Information management, Information systems, Information use, Large datasets, Pipelines, Semantics, Automatic Generation, Chatbots, Language model, Large language model, Pre-selection, Selection model, Swedishs, Textual data, Recommender systems
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-28781 (URN); 2-s2.0-105017771671 (Scopus ID)
Conference
2025 Swedish AI Society Workshop, SAIS 2025, Halmstad, June 16-17, 2025
Funder
Vinnova, 2024-00242; Knowledge Foundation, 20180010
Available from: 2025-10-17. Created: 2025-10-17. Last updated: 2025-10-17. Bibliographically approved.
Principal Investigator: Alégroth, Emil
Coordinating organisation
Blekinge Institute of Technology
Funder
Vinnova, 2024-00242
Period
2024-05-01 - 2027-04-30
National Category
Software Engineering
Identifiers
DiVA, id: project:9481; Project, id: 2024-00242_Vinnova
