Digitala Vetenskapliga Arkivet

Automating the Assessment of Retrieval-augmented Generation Responses
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013).
2025 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Alternative title
Automatisera bedömning av svar från RAG-modeller (Swedish)
Abstract [en]

Today, manual categorization of test cases is used to evaluate how well CGI’s retrieval-augmented generation (RAG) model works for customers, particularly those without prior subject knowledge. Manual categorization is a time-consuming and costly process. This thesis explores the automation of this process using the frameworks Ragas and DeepEval. Framework metrics are identified, evaluated, and used to create a comprehensive automated evaluation through the RAG triad model, which considers context relevance, groundedness, and answer relevance.

The evaluation results for Ragas and DeepEval revealed the pros and cons of each framework in different areas. This insight led to the creation of combined results from both frameworks, incorporating the Response Relevance and Faithfulness metrics from Ragas and the Context Recall metric from DeepEval. The combined approach demonstrates improved accuracy and reliability by harnessing the strengths of both frameworks, providing a more robust solution for automated test-case evaluation.

Abstract [sv]

Today, manual categorization is used to evaluate how well CGI’s retrieval-augmented generation (RAG) model works for customers, particularly those without prior knowledge of the subject area. This process is both time-consuming and costly. This thesis therefore investigates the possibility of automating the process using the frameworks Ragas and DeepEval, by identifying, evaluating, and using their metrics to create a comprehensive automated evaluation based on the RAG triad, a model that considers context relevance, groundedness, and answer relevance.

The evaluation produced result tables for Ragas and DeepEval, revealing the advantages and disadvantages of each framework in different areas. This insight led to the creation of a combined result table that includes the Response Relevance and Faithfulness scores from Ragas together with the Context Recall score from DeepEval. The combined result table shows improved accuracy and reliability by exploiting the strengths of both frameworks, providing a more robust solution for automated evaluation of test cases.
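The combination described in the abstract can be illustrated with a minimal sketch: per-test-case Response Relevance and Faithfulness scores (as Ragas would produce) are merged with a Context Recall score (as DeepEval would produce) into one combined score per test case. The case names, score values, and equal-weight averaging below are illustrative assumptions, not the thesis's actual data or weighting.

```python
# Hypothetical per-test-case scores; in practice these would come from
# running Ragas and DeepEval against the RAG model's outputs.
ragas_scores = {
    "case_1": {"response_relevance": 0.91, "faithfulness": 0.88},
    "case_2": {"response_relevance": 0.74, "faithfulness": 0.69},
}
deepeval_scores = {
    "case_1": {"context_recall": 0.95},
    "case_2": {"context_recall": 0.62},
}

def combine(case: str) -> float:
    """Average the three selected metrics into one combined score
    (equal weights assumed for illustration)."""
    r = ragas_scores[case]
    d = deepeval_scores[case]
    metrics = [r["response_relevance"], r["faithfulness"], d["context_recall"]]
    return round(sum(metrics) / len(metrics), 3)

# Build the combined result table.
for case in sorted(ragas_scores):
    print(case, combine(case))
```

A real pipeline would replace the hard-coded dictionaries with framework output, but the merge step itself stays this simple: pick the trusted metric from each framework and aggregate per test case.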

Place, publisher, year, edition, pages
2025, p. 95
Keywords [en]
Retrieval-Augmented Generation, Context Recall, Response Relevance, Faithfulness, Ragas, DeepEval
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kau:diva-103038
OAI: oai:DiVA.org:kau-103038
DiVA id: diva2:1935379
External cooperation
CGI
Subject / course
Computer Science
Educational program
Engineering: Industrial Engineering and Management (300 ECTS credits)
Supervisors
Examiners
Available from: 2025-02-07. Created: 2025-02-06. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

fulltext (1905 kB), 152 downloads
File information
File name: FULLTEXT01.pdf
File size: 1905 kB
Checksum (SHA-512): 77ea764f2eefa1feb5096e7c9751f3d4ff778678e13dfa5e9536a5f8f5cb7d82e90dcfa83942665a06f04b0222b24697b674e967b7fa91439e7307356e98c706
Type: fulltext
Mimetype: application/pdf

By organisation
Department of Mathematics and Computer Science (from 2013)
Computer Sciences

Search outside of DiVA

Google
Google Scholar
Total: 152 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 306 hits