Impact of observation noise and reward sparseness on Deep Deterministic Policy Gradient when applied to inverted pendulum stabilization
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Alternative title
Effekten av observationsbrus och belöningsgleshet på Deep Deterministic Policy Gradient tillämpad på inverterad pendelstabilisering. (Swedish)
Abstract [en]

Deep Reinforcement Learning (RL) algorithms have been shown to solve complex problems. Deep Deterministic Policy Gradient (DDPG) is a state-of-the-art deep RL algorithm able to handle environments with continuous action spaces. This thesis evaluates how the DDPG algorithm performs, in terms of success rate and result quality, under varying observation noise and reward sparseness, using a simple environment. A threshold for how much Gaussian noise can be added to observations before algorithm performance starts to decrease was found to lie between standard deviations of 0.025 and 0.05. It was also concluded that reward sparseness leads to result inconsistency and irreproducibility, showing the importance of a well-designed reward function. Further testing is required to thoroughly evaluate the performance impact when noisy observations and sparse rewards are combined.
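The experimental setup described in the abstract (Gaussian noise added to observations, and dense versus sparse reward signals for pendulum stabilization) can be sketched roughly as follows. The noise standard deviations (0.025, 0.05) come from the abstract; the state layout and the exact reward shapes are illustrative assumptions, not the thesis's definitions.

```python
import math
import random

def add_observation_noise(obs, std, rng):
    """Corrupt each observation component with i.i.d. Gaussian noise,
    as in the thesis's noisy-observation experiments."""
    return [x + rng.gauss(0.0, std) for x in obs]

def dense_reward(theta, theta_dot):
    """A shaped (dense) reward: quadratic penalty on angle deviation
    and angular velocity. Assumed form, for illustration only."""
    return -(theta ** 2 + 0.1 * theta_dot ** 2)

def sparse_reward(theta, window=0.1):
    """A sparse reward: nonzero only when the pendulum is near upright.
    Assumed form, for illustration only."""
    return 1.0 if abs(theta) < window else 0.0

# A pendulum state commonly encoded as [cos(theta), sin(theta), theta_dot].
rng = random.Random(42)
clean = [math.cos(0.0), math.sin(0.0), 0.0]          # upright, at rest
noisy = add_observation_noise(clean, std=0.05, rng=rng)
```

The agent would receive `noisy` instead of `clean` at each step; sweeping `std` over values such as 0.025 and 0.05 reproduces the kind of threshold search the abstract reports.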

Abstract [sv]

Djupa Reinforcement Learning (RL) algoritmer har visat sig kunna lösa komplexa problem. Deep Deterministic Policy Gradient (DDPG) är en modern djup RL algoritm som kan hantera miljöer med kontinuerliga åtgärdsutrymmen. Denna studie utvärderar hur DDPG-algoritmen presterar med avseende på lösningsgrad och resultat beroende på observationsbrus och belöningsgleshet i en enkel miljö. Ett tröskelvärde för hur mycket gaussiskt brus som kan läggas på observationer innan algoritmens prestanda börjar minska hittades mellan en standardavvikelse på 0,025 och 0,05. Det drogs även slutsatsen att belöningsgleshet leder till inkonsekventa resultat och oreproducerbarhet, vilket visar vikten av en väl utformad belöningsfunktion. Ytterligare tester krävs för att grundligt utvärdera effekten av att kombinera brusiga observationer och glesa belöningssignaler.

Place, publisher, year, edition, pages
2019, p. 26
Series
TRITA-EECS-EX ; 2019:356
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-259758
OAI: oai:DiVA.org:kth-259758
DiVA id: diva2:1353407
Available from: 2019-09-24. Created: 2019-09-23. Last updated: 2019-09-24. Bibliographically approved.

Open Access in DiVA

fulltext (11137 kB), 5 downloads
File information
File name: FULLTEXT01.pdf
File size: 11137 kB
Checksum: SHA-512
804eb944125ecfbc1c16e49cc4771c96b1eddf0451f8fda99e4ca9433a18afb0bece5dfd8cd43ae5abaff1db47de88af97df58b8bb1012caf79502f2dca82710
Type: fulltext
Mimetype: application/pdf

By organisation
School of Electrical Engineering and Computer Science (EECS)
Computer and Information Sciences

Search outside of DiVA

Google, Google Scholar
Total: 5 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
