Digitala Vetenskapliga Arkivet

Exploring Inhibitor Attention and Model Compression for Sustainable Language Models
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering.
2025 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This thesis investigates the optimization of transformer-based language models through an innovative attention mechanism called inhibitor attention, combined with model compression techniques. The thesis implements an alternative attention mechanism that replaces traditional dot-product attention with a Manhattan distance-based approach and ReLU activation, examining both theoretical efficiency gains and practical implementation challenges.

Through experiments combining this mechanism with knowledge distillation and quantization techniques, we evaluated the effectiveness of these methods on DistilBERT models. Results from the GLUE benchmark suite show that the fine-tuned inhibitor model achieves competitive performance, scoring 74.5 compared to 77.0 for the traditional dot-product model. On the IMDB sentiment analysis task, the inhibitor DistilBERT maintained a precision (92.81%) comparable to that of the standard DistilBERT (92.82%).

While theoretical analysis through gem5 simulations suggested potential energy savings with inhibitor attention, practical measurements on a CPU revealed contradictory results. The inhibitor model showed higher energy consumption (2011 J vs. 1176 J at sequence length 128) and lower throughput than traditional attention, highlighting the impact of current hardware optimizations on real-world performance. These findings demonstrate that, while inhibitor attention shows promise for developing more efficient transformer models, realizing its potential may require specialized hardware solutions and optimized implementations.
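
The abstract describes replacing the dot-product similarity and softmax of standard attention with a Manhattan (L1) distance score and a ReLU. The sketch below illustrates that general idea for a single attention head in PyTorch; the exact inhibitor equations, scaling factors, and the name inhibitor_attention_sketch are assumptions made for illustration and are not taken from the thesis.

```python
import torch
import torch.nn.functional as F


def dot_product_attention(q, k, v):
    """Standard scaled dot-product attention, shown for comparison."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # larger score = more attention
    return F.softmax(scores, dim=-1) @ v


def inhibitor_attention_sketch(q, k, v):
    """Illustrative Manhattan-distance / ReLU attention.

    NOTE: assumed, simplified formulation for illustration only; the
    thesis' actual inhibitor equations may differ. The pairwise L1
    distance acts as an "inhibition": each value vector is suppressed in
    proportion to how far its key lies from the query, and the ReLU clips
    fully inhibited contributions to zero.
    """
    d = q.size(-1)
    # Pairwise L1 distances between every query and key: shape (..., L_q, L_k)
    dist = torch.cdist(q, k, p=1) / d ** 0.5
    # Subtract the inhibition from each value component, clip at zero,
    # then average the surviving contributions over the key axis.
    return F.relu(v.unsqueeze(-3) - dist.unsqueeze(-1)).mean(dim=-2)


# Toy usage: batch of 2, sequence length 4, head dimension 8 (shapes are arbitrary)
q, k, v = (torch.randn(2, 4, 8) for _ in range(3))
print(dot_product_attention(q, k, v).shape)       # torch.Size([2, 4, 8])
print(inhibitor_attention_sketch(q, k, v).shape)  # torch.Size([2, 4, 8])
```

The motivation for such a formulation, as the abstract indicates, is that additions, absolute values, and ReLU are cheaper in theory than the multiplications and exponentials of softmax attention; the CPU measurements reported above show that this theoretical advantage does not automatically carry over to hardware optimized for dense matrix products.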

Place, publisher, year, edition, pages
2025, p. 66
Keywords [en]
AI, Machine Learning, Knowledge Distillation
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:ltu:diva-112306
OAI: oai:DiVA.org:ltu-112306
DiVA, id: diva2:1950490
External cooperation
RISE (Research Institutes of Sweden)
Educational program
Computer Science and Engineering, master's level
Presentation
2024-11-20, A2527, Luleå, 14:00 (English)
Supervisors
Examiners
Available from: 2025-04-08. Created: 2025-04-08. Last updated: 2025-04-08. Bibliographically approved.

Open Access in DiVA

fulltext (684 kB), 31 downloads
File information
File name: FULLTEXT01.pdf
File size: 684 kB
Checksum (SHA-512): 0cb0a8ba0f50ec9b15563d7fefad6a998b61502e72494b34dff4ed20dca8c69ffa6142b90d6585e6892e4e83f3decc6b9769b52b9610742fca027ec044a19b2c
Type: fulltext
Mimetype: application/pdf

By organisation
Department of Computer Science, Electrical and Space Engineering
Computer Engineering
