Stuck state avoidance through PID estimation training of Q-learning agent
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.

Alternative title
Förhindrande av odefinierade tillstånd vid Q-learning träning genom PID estimering (Swedish; in English: Prevention of undefined states during Q-learning training through PID estimation)
Abstract [en]

Reinforcement learning is conceptually based on an agent learning through interaction with its environment. This trial-and-error learning method makes the process prone to situations in which the agent is stuck in a dead end, from which it cannot keep learning. This thesis studies a method to reduce the risk that a wheeled inverted pendulum (WIP) falls over during training by having a Q-learning based agent estimate a PID controller before training it on the balance problem. We show that our approach is as stable as a Q-learning agent without estimation training, while the WIP falls over less than half as many times during training. Both agents succeed in balancing the WIP for a full hour in repeated tests.

Abstract [sv] (English translation)

Reinforcement learning is based on an agent that learns by interacting with its environment. This learning method can lead the agent into situations where it gets stuck and cannot continue training. This thesis explores a method for reducing the risk that a self-balancing robot falls over during training. This is done by training a Q-learning agent to estimate a PID controller before it trains on the balancing problem. We show that our method is as stable as a Q-learning agent without estimation training. During training, the robot falls fewer than half as many times when it is controlled by our method. Both agents succeed in balancing the robot for a full hour.
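The two-phase method the abstracts describe lends itself to a short illustration. The following is a minimal Python sketch, not code from the thesis: a tabular Q-learning agent is first rewarded for matching a PID controller's action choices (the estimation-training phase), so it enters the real balancing task with a stabilizing policy already in its table. The state discretization, controller gains, reward shape, and learning rate below are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
ANGLE_BINS = np.linspace(-0.3, 0.3, 21)   # pendulum tilt angle bins (rad), assumed range
VEL_BINS = np.linspace(-2.0, 2.0, 21)     # angular velocity bins (rad/s), assumed range
ACTIONS = np.linspace(-1.0, 1.0, 5)       # hypothetical set of normalized motor torques

N_STATES = (len(ANGLE_BINS) + 1) * (len(VEL_BINS) + 1)
Q = np.zeros((N_STATES, len(ACTIONS)))

def discretize(angle, vel):
    """Map a continuous (angle, velocity) pair to one Q-table row index."""
    i = int(np.digitize(angle, ANGLE_BINS))
    j = int(np.digitize(vel, VEL_BINS))
    return i * (len(VEL_BINS) + 1) + j

def pid_action(angle, vel, kp=8.0, kd=1.0):
    """Discrete action closest to the controller's output (a PD slice of
    the PID law; the integral term is dropped for brevity)."""
    u = np.clip(-(kp * angle + kd * vel), -1.0, 1.0)
    return int(np.argmin(np.abs(ACTIONS - u)))

# Phase 1: estimation training. Sample random states and reward the agent
# for choosing the action the controller would choose, so the Q-table
# encodes a stabilizing policy before the physical WIP ever moves.
for _ in range(20_000):
    angle = rng.uniform(-0.3, 0.3)
    vel = rng.uniform(-2.0, 2.0)
    s = discretize(angle, vel)
    target = pid_action(angle, vel)
    for a in range(len(ACTIONS)):
        reward = 1.0 if a == target else 0.0
        Q[s, a] += 0.5 * (reward - Q[s, a])   # one-step imitation update

# Phase 2 would be ordinary epsilon-greedy Q-learning on the real balance
# task, starting from this pre-trained table instead of from zeros.

Because phase-2 exploration then starts from PID-like preferences rather than a blank table, the early random actions that typically topple the pendulum become rarer, which is the intuition behind the reduction in falls reported above.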

Place, publisher, year, edition, pages
2019, p. 32
Series
TRITA-EECS-EX ; 2019:385
Keywords [en]
Q-learning, QL, PID, wheeled inverted pendulum, WIP, reinforcement learning, estimation training
Keywords [sv]
Q-learning, QL, PID, självbalanserande robot, reinforcement learning, estimeringsträning
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-264562
OAI: oai:DiVA.org:kth-264562
DiVA id: diva2:1374213
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2019-11-29 Created: 2019-11-29

Open Access in DiVA

fulltext (595 kB)
File information
File name: FULLTEXT01.pdf
File size: 595 kB
Checksum (SHA-512):
86eaff7523f33757d8ff2ad53d4cf641b1806fbc50653decae672713626b951ed3e84a391d3e6fff6ba5d908e125fe885a4e539d9ae452fbe9c3c8fdc327b0ad
Type: fulltext
Mimetype: application/pdf
