Djupinlärning på Snake
KTH, School of Engineering Sciences (SCI).
KTH, School of Engineering Sciences (SCI).
2019 (Swedish)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
Deep Reinforcement Learning for Snake (English)
Abstract [sv]

Algoritmer baserade på reinforcement learning har framgångsrikt tillämpats på många olika maskininlärningsproblem. I denna rapport presenterar vi hur vi implementerar varianter på deep Q-learning-algoritmer på det klassiska datorspelet Snake. Vi ämnar undersöka hur en sådan algoritm ska konfigureras för att lära sig spela Snake så bra som möjligt. För att göra detta studerar vi hur inlärningen beror på ett urval av parametrar, genom att variera dessa en och en och studera resultaten. Utifrån detta lyckas vi konstruera en algoritm som lär sig spela spelet så pass bra att den som högst får 66 poäng, vilket motsvarar att täcka 46 % av spelplanen, efter drygt fem timmars träning. Vidare så finner vi att den tränade algoritmen utan större svårigheter hanterar att hinder introduceras i spelet.

 

Abstract [en]

Reinforcement learning algorithms have proven successful at a variety of machine learning tasks. In this paper we implement versions of deep Q-learning for the classic video game Snake. We aim to find out how the algorithm should be configured to learn to play the game as well as possible. To do this, we study how its learning performance depends on some of the many parameters involved, by changing one parameter at a time and recording the effects. From this we are able to set up an algorithm that learns to play the game well enough to achieve a high score of 66 points, corresponding to filling 46% of the playing field, after just over 5 hours of training. Further, we find that the trained algorithm copes well with an obstacle being added to the game.
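For readers unfamiliar with Q-learning, the following is a minimal sketch of the standard one-step target that deep Q-learning variants regress towards. It is not taken from the thesis: the function names, the discount factor and the example rewards are illustrative assumptions, and the network, reward scheme and parameter values actually used are described only in the full text.

```python
from typing import Sequence


def q_learning_target(
    reward: float,
    next_q_values: Sequence[float],
    done: bool,
    gamma: float = 0.99,  # assumed discount factor, not the thesis's value
) -> float:
    """Return the one-step Q-learning target r + gamma * max_a' Q(s', a').

    next_q_values holds the estimated Q-values of every action in the
    next state. When the episode has ended (for example, the snake has
    crashed), there is no future return and the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def td_error(q_value: float, target: float) -> float:
    """Temporal-difference error that the training loss drives towards zero."""
    return target - q_value


if __name__ == "__main__":
    # Hypothetical transition: the agent receives reward 1.0 and the
    # next state has three possible actions with these Q estimates.
    target = q_learning_target(1.0, [0.0, 2.0, 1.0], done=False, gamma=0.5)
    print(target, td_error(1.5, target))
```

In a deep Q-learning setup the values in `next_q_values` would come from a neural network evaluated on the next game state rather than from a lookup table.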

Place, publisher, year, edition, pages
2019.
Series
TRITA-SCI-GRU ; 2019:249
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-255828
OAI: oai:DiVA.org:kth-255828
DiVA, id: diva2:1342302
Available from: 2019-08-13
Created: 2019-08-13
Last updated: 2019-08-13
Bibliographically approved

Open Access in DiVA

fulltext (12058 kB), 16 downloads
File information
File name: FULLTEXT01.pdf
File size: 12058 kB
Checksum (SHA-512): fd53c47f2fe3ad963f996712c14db915eda3dc9afcf463142bee4bdb968c45f3816ea7849f1532019dd1eb33cb302e20f46c559c16306ae21d732a3d21c6c56d
Type: fulltext
Mimetype: application/pdf
