Digitala Vetenskapliga Arkivet

A study of the exploration/exploitation trade-off in reinforcement learning: Applied to autonomous driving
KTH, School of Electrical Engineering and Computer Science (EECS).
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
En studie om utforskning/utnyttjande avvägningen inom förstärkande inlärning : Applicerat på autonoma fordon (Swedish)
Abstract [en]

A global initiative has been set in motion to reduce the number of traffic accidents. Autonomous driving is a field that contributes to this initiative. This report examines the exploration/exploitation trade-off in reinforcement learning applied to decision making in autonomous driving. The approach consisted of modelling the problem as a Markov Decision Process, which was solved with Q-learning. Decision making used an exploration-greedy approach. The scenarios consisted of different kinds of intersections and were built using SUMO. The ego vehicle was controlled using TraCI. The goal was to discuss the trade-off from two perspectives, time and safety (measured in the number of collisions, among other things), in the domain of autonomous driving. The results showed that exploration prompted the ego vehicle to pass the scenarios in less time. This led to more collisions and thus decreased safety. In contrast, exploitation preferred deceleration and stopping, which increased safety but also increased the passage time and the amount of traffic. The conclusion was to exploit previous experiences when applying reinforcement learning to decision making in autonomous driving, because safety is the highest priority for both autonomous driving and the global initiative.
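The trade-off described in the abstract can be illustrated with a minimal tabular Q-learning sketch. This is not the thesis code: the epsilon-greedy selection rule, the action names, the state labels, and the values of alpha, gamma, and epsilon are all assumptions chosen for illustration, and no SUMO or TraCI calls are involved.

```python
import random
from collections import defaultdict

# Hypothetical action set for the ego vehicle (not taken from the thesis).
ACTIONS = ["accelerate", "hold", "decelerate"]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection (an assumed rule): explore with
    probability epsilon, otherwise exploit the best known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)  # exploration: random action
    return max(ACTIONS, key=lambda a: q[(state, a)])  # exploitation


def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update and return the new Q-value."""
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Q-table with all values initialised to zero.
q = defaultdict(float)
rng = random.Random(0)

# One illustrative transition with a made-up reward.
q_update(q, "approaching", "hold", 1.0, "in_intersection", alpha=0.5)
greedy = choose_action(q, "approaching", epsilon=0.0, rng=rng)
```

With epsilon close to 1 the agent mostly tries random actions (exploration); with epsilon close to 0 it mostly repeats the action with the highest learned value (exploitation), which is the setting the abstract's conclusion favours.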

Abstract [sv, translated to English]

A global initiative was launched to reduce the number of traffic accidents before 2030. Autonomous vehicles are a research field that contributes to the global initiative. This report examines the trade-off between exploration and exploitation in reinforcement learning for the decision-making process in autonomous vehicles. The approach consisted of modelling the problem as a Markov Decision Process, which was solved using Q-learning. The decision-making process made use of an exploitation-oriented approach. The scenarios consisted of different types of intersections and were programmed using SUMO. The autonomous vehicle was controlled using TraCI. The goal was to discuss the trade-off from two perspectives, time and safety, measured among other things in the number of collisions, within the research field of autonomous vehicles. The results showed that exploration prompted the autonomous vehicle to pass the scenarios in a shorter time. This led to an increased number of collisions and thus reduced safety. On the other hand, increased exploitation favoured braking, which resulted in an increased number of successful passages. This leads to increased safety but also increases the passage time and the amount of traffic. The conclusion was that exploiting previous experiences should be preferred when applying reinforcement learning to the decision-making process in autonomous vehicles. This conclusion was reached because safety has the highest priority in autonomous vehicles and in the global initiative.

Place, publisher, year, edition, pages
2019.
Series
TRITA-EECS-EX ; 2019:319
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-254938
OAI: oai:DiVA.org:kth-254938
DiVA, id: diva2:1336430
Subject / course
Computer and Systems Sciences
Supervisors
Examiners
Available from: 2019-07-29 Created: 2019-07-09 Last updated: 2022-06-26 Bibliographically approved

Open Access in DiVA

fulltext (1104 kB), 2681 downloads
File information
File name: FULLTEXT01.pdf
File size: 1104 kB
Checksum (SHA-512): 52fad6d3ddc076453e0b49ab4f83da2c5579960873cc84488b72ee8f5e6bd90f9937b866e06064907960e15c5d64fe93ad97b5ad88b9f88edf9dd7db147d5488
Type: fulltext
Mimetype: application/pdf
