Model-Based Reinforcement Learning in Continuous Environments Using Real-Time Constrained Optimization
Andersson, Olov; Heintz, Fredrik; Doherty, Patrick
Linköping University, Department of Computer and Information Science, Artificial Intelligence and Integrated Computer Systems. Linköping University, The Institute of Technology. (KPLAB - Knowledge Processing Lab)
2015 (English). In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI) / [ed] Blai Bonet and Sven Koenig, AAAI Press, 2015, pp. 2497-2503. Conference paper, published paper (refereed).
Abstract [en]

Reinforcement learning for robot control tasks in continuous environments is a challenging problem due to the dimensionality of the state and action spaces, the time and resource costs of learning with a real robot, and the constraints imposed for its safe operation. In this paper we propose a model-based reinforcement learning approach for continuous environments with constraints. The approach combines model-based reinforcement learning with recent advances in approximate optimal control. This results in a bounded-rationality agent that makes decisions in real time by efficiently solving a sequence of constrained optimization problems on learned sparse Gaussian process models. Such a combination has several advantages. No high-dimensional policy needs to be computed or stored, while the learning problem often reduces to a set of lower-dimensional models of the dynamics. In addition, hard constraints can easily be included, and objectives can also be changed in real time to allow for multiple or dynamic tasks. The efficacy of the approach is demonstrated on both an extended cart-pole domain and a challenging quadcopter navigation task using real data.
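
To make the combination concrete, the following is a minimal sketch, not the paper's implementation: it learns a toy one-dimensional dynamics model with a dense Gaussian process (standing in for the paper's sparse GPs) and re-solves a small constrained finite-horizon problem at every step with an off-the-shelf SLSQP solver (standing in for the paper's approximate optimal-control method). All dynamics, costs, bounds, and constants are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy 1-D dynamics we pretend are unknown: s' = s + 0.1*a + noise.
def true_dynamics(s, a):
    return s + 0.1 * a + 0.01 * rng.standard_normal()

# Learn a model of the state *change* from random interaction data.
X = rng.uniform(-1.0, 1.0, size=(200, 2))              # columns: state, action
y = np.array([true_dynamics(s, a) - s for s, a in X])  # observed deltas
model = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-4),
    normalize_y=True,
).fit(X, y)

def predict_next(s, a):
    # GP mean prediction of the next state (predictive variance ignored here).
    return s + model.predict(np.array([[s, a]]))[0]

H = 10       # planning horizon (illustrative)
S_MAX = 0.8  # hard state constraint |s_t| <= S_MAX (illustrative)
GOAL = 0.5   # target state (illustrative)

def rollout(actions, s0):
    # Open-loop rollout of an action sequence through the learned model.
    states, s = [], s0
    for a in actions:
        s = predict_next(s, a)
        states.append(s)
    return np.array(states)

def plan(s0):
    # One receding-horizon solve: minimize tracking cost under constraints.
    def cost(actions):
        states = rollout(actions, s0)
        return np.sum((states - GOAL) ** 2) + 1e-3 * np.sum(actions ** 2)

    def state_margin(actions):  # >= 0 iff every predicted state is feasible
        return S_MAX - np.abs(rollout(actions, s0))

    res = minimize(
        cost, x0=np.zeros(H), method="SLSQP",
        bounds=[(-1.0, 1.0)] * H,  # actuator limits
        constraints=[{"type": "ineq", "fun": state_margin}],
    )
    return res.x[0]  # execute only the first action, then re-plan

s = 0.0
for t in range(20):
    a = plan(s)
    s = true_dynamics(s, a)
    print(f"t={t:2d}  action={a:+.3f}  state={s:+.3f}")

The receding-horizon loop at the bottom is where the "no stored policy" property shows up: each decision comes from a fresh optimization over the learned model, so the objective and constraints could be swapped at run time without relearning anything.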

Place, publisher, year, edition, pages
AAAI Press, 2015. pp. 2497-2503.
Keywords [en]
Reinforcement Learning, Gaussian Processes, Optimization, Robotics
National Category
Computer Science; Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-113385
ISBN: 978-1-57735-698-1 (print)
OAI: oai:DiVA.org:liu-113385
DiVA: diva2:781572
Conference
Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI), January 25-30, 2015, Austin, Texas, USA.
Funder
Linnaeus research environment CADICS; ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications; Swedish Foundation for Strategic Research; VINNOVA; EU, FP7, Seventh Framework Programme
Available from: 2015-01-16. Created: 2015-01-16. Last updated: 2017-08-16. Bibliographically approved.
In thesis
1. Methods for Scalable and Safe Robot Learning
2017 (English). Licentiate thesis, comprehensive summary (Other academic).
Abstract [en]

Robots are increasingly expected to go beyond controlled environments in laboratories and factories and enter real-world public spaces and homes. However, robot behavior is still usually engineered for narrowly defined scenarios. Manually encoding robot behavior that works within complex real-world environments, such as busy workplaces or cluttered homes, can be a daunting task. In addition, such robots may require a high degree of autonomy to be practical, which imposes stringent requirements on safety and robustness.

The aim of this thesis is to examine methods for automatically learning safe robot behavior, lowering the cost of synthesizing behavior for complex real-world situations. To avoid task-specific assumptions, we approach this from a data-driven machine learning perspective. The strength of machine learning is its generality: given sufficient data, it can learn to approximate any task. However, being embodied agents in the real world, robots pose a number of difficulties for machine learning. These include real-time requirements with limited computational resources, the cost and effort of operating and collecting data with real robots, and safety issues for both the robot and human bystanders.

While machine learning is general by nature, overcoming these difficulties with real-world robots remains a challenge. In this thesis we look for a middle ground on robot learning, leveraging the strengths of both data-driven machine learning and engineering techniques from robotics and control. This includes combining data-driven world models with fast techniques for planning motions under safety constraints, using machine learning to generalize such techniques to problems with high uncertainty, and using machine learning to find computationally efficient approximations for use on small embedded systems.

We demonstrate such behavior synthesis techniques with real robots, solving a class of difficult dynamic collision avoidance problems under uncertainty, such as those induced by the presence of humans without prior coordination: initially using online planning offloaded to a desktop CPU, and ultimately as a deep neural network policy embedded on board a quadcopter.
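
The final step mentioned above, replacing online planning with a policy cheap enough for an embedded system, can be illustrated by a small behavior-cloning sketch. This is a hedged stand-in, not the thesis's method: the "planner" below is a clipped proportional rule rather than a constrained motion planner, and the network size, state layout, and data distribution are invented for the example.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

def expensive_planner(state):
    # Stand-in for an online constrained planner: here, just a clipped
    # proportional controller driving the state toward the origin.
    return np.clip(-1.5 * state, -1.0, 1.0)

# 1) Collect a dataset of planner decisions over the state distribution.
states = rng.uniform(-1.0, 1.0, size=(5000, 2))  # e.g. [position, velocity]
actions = np.array([expensive_planner(s) for s in states])

# 2) Fit a small MLP policy: a fixed-cost function, cheap enough for an
#    embedded flight controller, unlike the iterative planner it imitates.
policy = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                      random_state=0).fit(states, actions)

# 3) At deployment, one forward pass replaces each planning solve.
s = np.array([[0.4, -0.2]])
print("planner:", expensive_planner(s[0]))
print("policy :", policy.predict(s)[0])

The design trade-off this illustrates: the planner's compute cost is paid offline while generating training data, and the deployed network's cost is a constant, small number of matrix multiplies per decision.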

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2017. 37 p.
Series
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1780
Keywords
Symbicloud, ELLIIT, WASP
National Category
Computer and Information Science; Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-138398
DOI: 10.3384/lic.diva-138398
ISBN: 978-91-7685-490-7
Presentation
2017-09-15, Alan Turing, E-huset, Campus Valla, Linköping, 10:15 (English)
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications; Knut and Alice Wallenberg Foundation; Swedish Foundation for Strategic Research
Available from: 2017-08-17. Created: 2017-08-16. Last updated: 2017-08-18. Bibliographically approved.

Open Access in DiVA

AAAI-2015-Model-Based-Reinforcement (FULLTEXT01.pdf, fulltext, application/pdf, 399 kB). Downloads: 497.
Checksum (SHA-512): fbd8141323bf6064c3f7e17f59b6775deb0ec475cd48764d2261eb20adf4862133ddb5006880890a43224961d7aa4f9b6ac17e7397b57333cba14308e3009e10

