Att spela 'Breakout' med hjälp av 'Deep Q-Learning'
KTH, School of Engineering Sciences (SCI).
2019 (Swedish)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
Learning to Play Breakout Using Deep Q-Learning Networks (English)
Abstract [sv]

I denna rapport implementerar vi en reinforcement learning-algoritm (RL) som lär sig spela Breakout i 'Atari Learning Environment'. Den datordrivna spelaren (agenten) har tillgång till samma information som en mänsklig spelare och vet inget om spelet och dess regler på förhand. Målet är att reproducera tidigare resultat genom att optimera agenten så att den överträffar den typiska mänskliga medelpoängen. För att genomföra detta formaliserar vi problemet som en 'Markov Decision Process'. Vi applicerar 'Deep Q-Learning'-algoritmen med 'action masking' för att uppnå en optimal strategi. Vi finner att vår agents genomsnittliga poäng ligger något under den mänskliga: 20 poäng i medel, ungefär 65 % av den mänskliga motsvarigheten. Vi diskuterar några möjliga implementationer och förbättringar som kan appliceras i framtida forskningsprojekt.

Abstract [en]

We cover in this report the implementation of a reinforcement learning (RL) algorithm capable of learning how to play the game 'Breakout' on the Atari Learning Environment (ALE). The non-human player (agent) is given no prior information about the game and must learn from the same sensory input that a human would typically receive when playing. The aim is to reproduce previous results by optimizing the agent-driven control of 'Breakout' so as to surpass a typical human score. To this end, the problem is formalized by modeling it as a Markov Decision Process. We apply the celebrated Deep Q-Learning algorithm with action masking to achieve an optimal strategy. We find our agent's average score to be just below the human benchmark: an average score of 20, approximately 65% of the human counterpart. We discuss a number of implementations that boosted agent performance, as well as further techniques that could lead to improvements in the future.
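The abstract's core ingredients — a Q-network estimating action values, greedy action selection restricted by an action mask, and the one-step Q-learning target — can be sketched in miniature. The dimensions, network size, and function names below are illustrative assumptions for this sketch, not the thesis's actual implementation (which operates on stacks of preprocessed ALE frames with a larger network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions for illustration: Breakout's minimal action set in
# ALE has 4 actions (NOOP, FIRE, RIGHT, LEFT); the "state" here is a
# small feature vector standing in for the preprocessed frame stack.
N_ACTIONS = 4
STATE_DIM = 8

# A tiny one-hidden-layer Q-network (stand-in for the convolutional
# network used in practice).
W1 = rng.normal(scale=0.1, size=(STATE_DIM, 16))
W2 = rng.normal(scale=0.1, size=(16, N_ACTIONS))

def q_values(state):
    """Forward pass: state features -> one Q-value per action."""
    hidden = np.maximum(0.0, state @ W1)  # ReLU activation
    return hidden @ W2

def masked_greedy_action(state, legal_mask):
    """Greedy action among the legal ones: illegal actions are set to
    -inf so argmax can never select them (action masking)."""
    q = np.where(legal_mask, q_values(state), -np.inf)
    return int(np.argmax(q))

def td_target(reward, next_state, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * float(np.max(q_values(next_state)))

state = rng.normal(size=STATE_DIM)
mask = np.array([True, True, True, False])  # e.g. LEFT masked out
action = masked_greedy_action(state, mask)
```

In training, the network weights would be updated to move `q_values(state)[action]` toward `td_target(...)`, typically with experience replay and a separate target network as in the original Deep Q-Learning setup.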

Place, publisher, year, edition, pages
2019.
Series
TRITA-SCI-GRU ; 2019:238
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-255799
OAI: oai:DiVA.org:kth-255799
DiVA, id: diva2:1341574
Supervisors
Examiners
Available from: 2019-08-09 Created: 2019-08-09 Last updated: 2019-08-09
Bibliographically approved

Open Access in DiVA

fulltext (627 kB)
File information
File name: FULLTEXT01.pdf
File size: 627 kB
Checksum: SHA-512
0b850fc124c35a7069ca28b97d0b13728f7154d2cec0460495a38287876538dcd1546d1f34352bcd33638c477df8cf52af585bb0defc0858b9631878d20dd188
Type: fulltext
Mimetype: application/pdf

