Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Learning Goal-Directed Behaviour
KTH, School of Computer Science and Communication (CSC), Robotics, perception and learning, RPL.
2017 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

Learning behaviour of artificial agents is commonly studied in the framework of Reinforcement Learning. Reinforcement Learning gained increasing popularity in the past years. This is partially due to developments that enabled the possibility to employ complex function approximators, such as deep networks, in combination with the framework. Two of the core challenges in Reinforcement Learning are the correct assignment of credits over long periods of time and dealing with sparse rewards. In this thesis we propose a framework based on the notions of goals to tackle these problems. This work implements several components required to obtain a form of goal-directed behaviour, similar to how it is observed in human reasoning. This includes the representation of a goal space, learning how to set goals and finally how to reach them. The framework itself is build upon the options model, which is a common approach for representing temporally extended actions in Reinforcement Learning. All components of the proposed method can be implemented as deep networks and the complete system can be learned in an end-to-end fashion using standard optimization techniques. We evaluate the approachon a set of continuous control problems of increasing difficulty. We show, that we are able to solve a difficult gathering task, which poses a challenge to state-of-the-art Reinforcement Learning algorithms. The presented approach is furthermore able to scale to complex kinematic agents of the MuJoCo benchmark.

Abstract [sv]

Inlärning av beteende för artificiella agenter studeras vanligen inom Reinforcement Learning.Reinforcement Learning har på senare tid fått ökad uppmärksamhet, detta berordelvis på utvecklingen som gjort det möjligt att använda komplexa funktionsapproximerare, såsom djupa nätverk, i kombination med Reinforcement Learning. Två av kärnutmaningarnainom reinforcement learning är credit assignment-problemet över långaperioder samt hantering av glesa belöningar. I denna uppsats föreslår vi ett ramverk baseratpå delvisa mål för att hantera dessa problem. Detta arbete undersöker de komponentersom krävs för att få en form av målinriktat beteende, som liknar det som observerasi mänskligt resonemang. Detta inkluderar representation av en målrymd, inlärningav målsättning, och till sist inlärning av beteende för att nå målen. Ramverket byggerpå options-modellen, som är ett gemensamt tillvägagångssätt för att representera temporaltutsträckta åtgärder inom Reinforcement Learning. Alla komponenter i den föreslagnametoden kan implementeras med djupa nätverk och det kompletta systemet kan tränasend-to-end med hjälp av vanliga optimeringstekniker. Vi utvärderar tillvägagångssättetpå en rad kontinuerliga kontrollproblem med varierande svårighetsgrad. Vi visar att vikan lösa en utmanande samlingsuppgift, som tidigare state-of-the-art algoritmer har uppvisatsvårigheter för att hitta lösningar. Den presenterade metoden kan vidare skalas upptill komplexa kinematiska agenter i MuJoCo-simuleringar.

Place, publisher, year, edition, pages
2017. , p. 57
Keywords [en]
Hierarchical Reinforcement Learning, Options, Deep Neural Networks
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-213015OAI: oai:DiVA.org:kth-213015DiVA, id: diva2:1136420
Supervisors
Examiners
Available from: 2017-09-19 Created: 2017-08-28 Last updated: 2018-01-13Bibliographically approved

Open Access in DiVA

Learning Goal-Directed Behaviour(2801 kB)349 downloads
File information
File name FULLTEXT01.pdfFile size 2801 kBChecksum SHA-512
28bed89adb6cd93d40637f2baeb8780a845ee238e70cc221a1cb0fd075c971a37be773605eb914ec3adf4e5d56680bc2c13b1439adf8db3ba93a43dda5f95511
Type fulltextMimetype application/pdf

By organisation
Robotics, perception and learning, RPL
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 349 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 6272 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf