Using Reinforcement Learning for Games with Nondeterministic State Transitions
Linköping University, Department of Computer and Information Science, The Division of Statistics and Machine Learning.
2019 (English). Independent thesis, Advanced level (degree of Master, Two Years), 20 credits / 30 HE credits. Student thesis.
Alternative title
Reinforcement Learning för spel med icke-deterministiska tillståndsövergångar (Swedish)
Abstract [en]

Recent advances in reinforcement learning, a subfield of machine learning, have shown that it is possible to create self-learning digital agents: agents that take actions and pursue strategies in complex environments without any prior knowledge. This thesis investigates the performance of proximal policy optimization (PPO), a state-of-the-art reinforcement learning algorithm, when trained on a task with nondeterministic state transitions. The agent's policy was constructed using a convolutional neural network, and the game Candy Crush Friends Saga, a single-player match-three tile game, was used as the environment.
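For context, the core of PPO is the clipped surrogate objective from Schulman et al. (2017); this is the standard published form of the objective, not a detail taken from the thesis itself:

    L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right],
    \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}

where \hat{A}_t is an advantage estimate at timestep t and \epsilon is the clipping parameter. Nondeterministic state transitions affect only the sampled trajectories from which \hat{A}_t is estimated, not the form of the objective.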

The purpose of this research was to evaluate whether the described agent could achieve a higher win rate than average human performance when playing Candy Crush Friends Saga. The research also analyzed the algorithm's generalization capabilities on this task. The results showed that all trained models perform better than a random-policy baseline, demonstrating that proximal policy optimization can learn tasks in an environment with nondeterministic state transitions. They also showed that, with the hyperparameters chosen, the agent was not able to exceed average human performance.
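As an illustration of the kind of setup the abstract describes, the following is a minimal sketch in PyTorch of a convolutional policy/value network and the PPO clipped loss. It is not the thesis's implementation: the observation channels, the 9x9 board shape, the action count, and the hyperparameter defaults are illustrative assumptions.

    # Minimal PPO sketch (clipped surrogate objective, Schulman et al., 2017).
    # Shapes and sizes below are illustrative assumptions, not thesis values.
    import torch
    import torch.nn as nn

    class CnnPolicy(nn.Module):
        """Shared convolutional body with separate policy and value heads."""
        def __init__(self, channels=8, board=9, n_actions=144):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.Flatten(),
            )
            flat = 64 * board * board
            self.pi = nn.Linear(flat, n_actions)  # action logits
            self.v = nn.Linear(flat, 1)           # state-value estimate

        def forward(self, obs):
            h = self.body(obs)
            return self.pi(h), self.v(h).squeeze(-1)

    def ppo_loss(policy, obs, actions, old_logp, advantages, returns,
                 clip_eps=0.2, vf_coef=0.5, ent_coef=0.01):
        """Clipped policy loss + value loss - entropy bonus, one minibatch."""
        logits, values = policy(obs)
        dist = torch.distributions.Categorical(logits=logits)
        logp = dist.log_prob(actions)
        ratio = torch.exp(logp - old_logp)  # r_t(theta) = pi_new / pi_old
        surr1 = ratio * advantages
        surr2 = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
        policy_loss = -torch.min(surr1, surr2).mean()
        value_loss = (returns - values).pow(2).mean()
        entropy = dist.entropy().mean()  # encourages exploration
        return policy_loss + vf_coef * value_loss - ent_coef * entropy

Because the environment's transitions are stochastic, the advantages passed in would be estimated from sampled trajectories (for example with generalized advantage estimation); the loss itself is unchanged by the nondeterminism.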

Place, publisher, year, edition, pages
2019, p. 69
Keywords [en]
reinforcement learning, proximal policy optimization, PPO, machine learning, artificial intelligence, deep learning, neural network, candy crush, mobile game
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-158523
ISRN: LIU-IDA/LITH-EX-A--19/055--SE
OAI: oai:DiVA.org:liu-158523
DiVA id: diva2:1334684
External cooperation
Midasplayer AB
Subject / course
Computer Engineering
Presentation
2019-06-14, John Von Neumann, Linköping, 08:00 (English)
Available from: 2019-07-11. Created: 2019-07-03. Last updated: 2019-07-11. Bibliographically approved.

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 14697 kB
Checksum (SHA-512): 90bcf5b062bc6f84a117d243a10f8fd7943cdbd988b872450164b87d78e5a77536bfdf6b0a52ad8854e13005036ce45bde01f347a9927f98a183b5a9f584e42c
Type: fulltext
Mimetype: application/pdf

