Digitala Vetenskapliga Arkivet

Labyrinth Navigation Using Reinforcement Learning with a High Fidelity Simulation Environment
Linköping University, Department of Electrical Engineering, Automatic Control.
2022 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This is a master thesis on the subject of navigation and control using reinforcement learning, more specifically discrete Q-learning. The Q-learning algorithm is used to develop a steer policy by training inside a simulation environment. The problem is to navigate a steel ball through a maze made of walls and holes. This is the third thesis revolving around this problem, which allows for performance comparison with more classical control algorithms, the most successful of which is a gain-scheduled LQR used to follow a splined path. The reinforcement learning derived steer policy managed at best a 68 % success rate when navigating the ball from start to finish. Key features that had a large impact on the policy performance when implemented in the simulation environment were the response time of the physical servos and the uncertainty added to the modelled forces. Compared to the performance of the LQR, which managed a 46 % success rate, the reinforcement learning derived policy performs well. However, with high fluctuation in performance from policy to policy, the control method is not a consistent solution to the problem. Future work is needed to perfect the algorithm and the resulting policy. Interesting issues to investigate include other formulations of disturbance implementation and training online on the physical system. Training online could allow for fine-tuning of the simulation-derived policy and learning to compensate for disturbances that are difficult to model, such as bumps and warping in the labyrinth surface.
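For orientation, the sketch below shows the standard tabular Q-learning update that the abstract refers to. It is a minimal illustration only: the environment interface (`env`), the state/action discretization, the reward signal, and all hyperparameters are assumptions for the example, not the thesis's actual implementation, which trains against a high-fidelity labyrinth simulation with modelled servo response times and force uncertainty.

import numpy as np

def train_q_table(env, n_states, n_actions,
                  episodes=10_000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn a discrete steer policy by epsilon-greedy tabular Q-learning.

    `env` is a hypothetical simulator with reset() -> discrete state and
    step(action) -> (next_state, reward, done).
    """
    q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = env.reset()                      # discretized ball state
        done = False
        while not done:
            # Epsilon-greedy exploration over the discrete tilt actions.
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(q[s]))
            s_next, r, done = env.step(a)    # one simulated control step
            # Standard Q-learning bootstrap update:
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])
            s = s_next
    # The greedy policy maps each discretized state to a tilt command.
    return np.argmax(q, axis=1)

After training, the returned table is the "steer policy": at run time each discretized ball state indexes directly into it to select a servo/tilt action.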

Place, publisher, year, edition, pages
2022, p. 60
National Category
Control Engineering
Identifiers
URN: urn:nbn:se:liu:diva-186611
ISRN: LiTH-ISY-EX--22/5492--SE
OAI: oai:DiVA.org:liu-186611
DiVA, id: diva2:1678183
Subject / course
Electrical Engineering
Available from: 2022-06-29. Created: 2022-06-28. Last updated: 2022-06-29. Bibliographically approved.

Open Access in DiVA

fulltext (15042 kB), 748 downloads
File information
File name: FULLTEXT01.pdf
File size: 15042 kB
Checksum (SHA-512): 228677ed36072b643b92ae5aecae6e1e678271899711094e88fc8bea7febd98675bcf634e76d96d269fc5b96e67191f550242d0d8b63317da2e493eb3887bc05
Type: fulltext
Mimetype: application/pdf
