Digitala Vetenskapliga Arkivet

Applicability of Transfer Learning Techniques to Different BERT-based Models and Domain-Specific Datasets
KTH, School of Electrical Engineering and Computer Science (EECS).
2024 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Pretrained Language Models (PLMs), such as BERT and GPT, have advanced the state of the art in a vast range of Natural Language Understanding (NLU) downstream tasks. Using a PLM as a backbone and fine-tuning it on the target task has become the standard NLU framework, leading to the emergence of many PLM-focused transfer-learning techniques. Although many variations of the original architectures have become widely used, current research fails to address the impact of different training frameworks on the performance of adaptation methods. For this purpose, we investigate the performance of transfer-learning techniques across BERT-based models on standard benchmarks. We evaluate two popular methods for transfer in data-scarce scenarios: Task-Adaptive Pretraining (TAPT) and Supplementary Training on Intermediate Labelled-Data Tasks (STILTs), which are based on continued training and require minimal setup. Furthermore, we propose incorporating the Next Sentence Prediction (NSP) learning objective introduced by BERT into the second phase of TAPT pretraining. Our results show that although STILTs performs better on average, no transfer method was consistently better across models. Our proposed methods, Task-Adaptive Sentence Pretraining (TASPT) and Task-Adaptive Sentence and Masked Pretraining (TASMPT), show overall performance degradation despite improving results in specific use cases. We conclude that transfer-learning techniques based on model optimisation are not robust to variations of BERT in terms of performance impact. The results suggest that a thorough search over transfer methods can be highly effective in improving downstream performance, but no conclusions can be drawn from studies performed on different models.
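To make the transfer setting concrete, the sketch below outlines TAPT-style continued pretraining with the Hugging Face Transformers library. It is an illustrative sketch only, not the thesis's implementation: the checkpoint name, the placeholder corpus, and all hyperparameters are assumptions, and only the masked-language-modelling phase is shown.

# Minimal TAPT sketch (assumptions only, not the thesis code): continue BERT's
# masked-language-modelling objective on unlabelled text from the target task
# before the usual supervised fine-tuning.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "bert-base-uncased"  # stand-in for any BERT-based checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Placeholder corpus: in TAPT this would be the unlabelled text of the target task.
corpus = load_dataset("imdb", split="unsupervised")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=corpus.column_names)

# Dynamic masking, as in BERT's masked-language-modelling objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="tapt-checkpoint",
    per_device_train_batch_size=16,
    num_train_epochs=1,        # placeholder schedule, not the thesis's setting
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()

After this second pretraining phase, the encoder would be fine-tuned on the labelled target task as usual. STILTs instead inserts supervised training on an intermediate labelled task before the final fine-tuning, and the thesis's proposed TASPT/TASMPT variants add a next-sentence-style objective to the task-adaptive phase; neither is shown here.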

Place, publisher, year, edition, pages
2024, p. 103
Series
TRITA-EECS-EX ; 2024:891
Keywords [en]
Pretrained Language Models, Masked Language Modelling, Next Sentence Prediction, Domain Adaptation, Multi-class Classification, Natural Language Processing
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-360859
OAI: oai:DiVA.org:kth-360859
DiVA id: diva2:1942220
External cooperation
BMW Group
Available from: 2025-03-11. Created: 2025-03-04. Last updated: 2025-03-11. Bibliographically approved.

Open Access in DiVA

fulltext (1569 kB), 134 downloads
File information
File name: FULLTEXT02.pdf
File size: 1569 kB
Checksum (SHA-512): e8466465aeba7e0e831f7f6255ce28be40cf2e4faa452009cff45b0289cc419975c07c228b127889a8239d5d0b4c4417b0e4217e80249722f302ecb9fa503508
Type: fulltext
Mimetype: application/pdf

Total: 134 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
