Digitala Vetenskapliga Arkivet

Can you speak Forklift?: Exploring usage of Large Language Models on Forklift Controller Area Network Data for Machine Activity Recognition
Linköping University, Department of Computer and Information Science.
Linköping University, Department of Computer and Information Science.
2024 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
CAN du tala Truck? : Utforskar användning av språkmodeller på data från gaffeltruckars Controller Area Network för aktivitetsigenkänning (Swedish)
Abstract [en]

Machine Activity Recognition (MAR) is gaining popularity due to its diverse applications, including improving efficiency, predicting maintenance needs, and identifying anomalies and unsafe behavior patterns. At the same time, Large Language Models (LLMs) have shown strengths in similar domains where labeled training data is hard to come by. This paper explores the application of LLMs to Controller Area Network (CAN) data from forklifts for MAR. Conducted in partnership with Toyota Material Handling Manufacturing Sweden (TMHMS), our research aims to develop an unsupervised method for predicting forklift activities by tokenizing CAN data and using a custom Generative Pre-trained Transformer (GPT) model. A key focus is identifying significant changes in discrete values in CAN data to train the model and evaluate its performance.
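As a rough illustration of tokenizing discrete CAN signals by their changes, the following minimal sketch emits one token each time a signal's value differs from its previous value. The signal names (`lift`, `drive`), the frame layout, and the `name:old->new` token format are assumptions made for this example and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical frame layout: (timestamp in seconds, {signal name: discrete value})
Frame = Tuple[float, Dict[str, int]]


def tokenize_changes(frames: List[Frame]) -> List[str]:
    """Emit one token for every change in a discrete signal value."""
    last: Dict[str, int] = {}   # most recent value seen per signal
    tokens: List[str] = []
    for _, signals in frames:
        # Sort names so the token order within a frame is deterministic.
        for name in sorted(signals):
            value = signals[name]
            if name in last and last[name] != value:
                tokens.append(f"{name}:{last[name]}->{value}")
            last[name] = value
    return tokens


frames = [
    (0.0, {"lift": 0, "drive": 0}),
    (0.1, {"lift": 1, "drive": 0}),
    (0.2, {"lift": 1, "drive": 0}),  # no change, so no token
    (0.3, {"lift": 0, "drive": 1}),
]
print(tokenize_changes(frames))
# ['lift:0->1', 'drive:0->1', 'lift:1->0']
```

Frames in which nothing changes produce no tokens, so long idle stretches collapse and the resulting sequence reflects activity rather than elapsed time.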

Our research makes several contributions, including the creation of a new language for encoding forklift activities and a method for tokenizing CAN data. We trained the GPT model with three different self-defined vocabularies and then compared it to traditional statistical models. Quantitative analysis using Bilingual Evaluation Understudy (BLEU) scores and sequence accuracy showed that the GPT model outperformed the statistical models. Qualitative analysis indicated that the generated sequences were plausible, though further exploration is needed. Our findings suggest that the GPT model can identify complex patterns in CAN data that statistical models cannot. Although the GPT model's predictions did not always align perfectly with the reference data, they were reasonable and demonstrated potential as a powerful tool if human intervention is reduced, which would provide a more predictable action chain. Additionally, plotting synthetic sequences with t-Distributed Stochastic Neighbor Embedding (t-SNE) and comparing them with training data showed that the synthetic sequences were more coherent and plausible than those generated by traditional statistical models or random processes.
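To make the two quantitative measures concrete, here is a minimal sketch of a position-wise sequence accuracy and of clipped n-gram precision, the building block of BLEU. The exact definitions used in the thesis are not given in this record, so both functions and the example token names are assumptions; full BLEU additionally combines several n-gram orders and applies a brevity penalty.

```python
from collections import Counter
from typing import Sequence


def sequence_accuracy(pred: Sequence[str], ref: Sequence[str]) -> float:
    """Fraction of positions where the tokens match, over the longer length."""
    if not pred and not ref:
        return 1.0
    matches = sum(p == r for p, r in zip(pred, ref))
    return matches / max(len(pred), len(ref))


def ngram_precision(pred: Sequence[str], ref: Sequence[str], n: int) -> float:
    """Clipped n-gram precision: predicted n-grams also found in the reference."""
    pred_ngrams = Counter(tuple(pred[i:i + n]) for i in range(len(pred) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(pred_ngrams.values())
    if total == 0:
        return 0.0
    # Clip each predicted count by how often the n-gram occurs in the reference.
    clipped = sum(min(count, ref_ngrams[gram]) for gram, count in pred_ngrams.items())
    return clipped / total


# Hypothetical activity tokens, for illustration only.
ref = ["lift_up", "drive", "lift_down", "drive"]
pred = ["lift_up", "drive", "drive", "drive"]

print(sequence_accuracy(pred, ref))    # 0.75
print(ngram_precision(pred, ref, 1))   # 0.75
print(ngram_precision(pred, ref, 2))   # one of three bigrams matches
```

Sequence accuracy penalizes any positional shift, while n-gram precision rewards locally correct fragments regardless of position, which is why the two measures are useful side by side.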

Abstract [sv]

Large language models (LLMs) are becoming increasingly popular, and many companies are developing models tailored to their specific needs. Creating effective custom training sets, however, is often a challenge. This study examines the application of LLMs to Controller Area Network (CAN) data from forklifts for machine activity recognition. In collaboration with Toyota Material Handling, our research aims to develop an unsupervised method for predicting forklift activities by tokenizing CAN data and using a custom Generative Pre-trained Transformer (GPT) model. A key focus is identifying changepoints in CAN data to train the model and evaluate its performance.

Our research contributes in several ways, such as creating a new language for encoding forklift activities and developing a method for tokenizing CAN data. We trained the GPT model with three different constructed vocabularies and then compared it with traditional statistical models. Quantitative analysis using BLEU scores and sequence accuracy showed that the GPT model performed better than the statistical models. Qualitative analysis showed that the generated sequences were plausible, although further research is needed. Our results indicate that the GPT model can identify complex patterns in CAN data that statistical models cannot. Although the GPT model's predictions did not always exactly match the reference data, they were reasonable and showed potential as a powerful tool with less human intervention. In addition, visualizing synthetic sequences with t-Distributed Stochastic Neighbor Embedding (t-SNE) and comparing them with training data showed that the synthetic sequences were more coherent and plausible than those generated by traditional statistical models or random processes.

Place, publisher, year, edition, pages
2024. , p. 67
Keywords [en]
Large language models, machine learning, unsupervised learning, CAN, controller area network, Forklift, predictive maintenance, Machine activity recognition, AI, synthetic data
Keywords [sv]
Language models, machine learning, unsupervised learning, CAN, controller area network, forklift, synthetic data, data science, AI, activity recognition
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-210933
ISRN: LIU-IDA/LITH-EX-A--24/043--SE
OAI: oai:DiVA.org:liu-210933
DiVA, id: diva2:1927384
External cooperation
Toyota Material Handling Manufacturing Sweden AB
Subject / course
Technical Physics
Presentation
2024-06-14, Alan Turing, Linköping, 15:29 (Swedish)
Supervisors
Examiners
Available from: 2025-02-14 Created: 2025-01-14 Last updated: 2025-02-14
Bibliographically approved

Open Access in DiVA

fulltext (4837 kB), 47 downloads
File information
File name: FULLTEXT01.pdf
File size: 4837 kB
Checksum SHA-512: 0dc04da05b79179aaf151a3d48477a807c7d013f0a209c102a8c0d6d4664d78092a3aca6776102e2e87f5ddcffebbf89de6dcf03a4d30dabfe219342bc75e23c
Type: fulltext
Mimetype: application/pdf

Total: 47 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 871 hits