Digitala Vetenskapliga Arkivet

Reinforcement Learning with Constrained Action
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2025 (English). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Reinforcement Learning (RL) is an important technique in machine learning that has gained special significance in solving real-world problems. Typically, RL algorithms operate under no constraints on the state of the environment or the actions taken by the agent. To be more useful in real-world scenarios, however, RL agents need to learn to operate under constraints on states and/or actions.

RL algorithms for constraints on states, also known as safety-constrained RL, are widely researched, and many techniques have been developed for solving such problems. Most of these techniques formulate the problem as a Constrained Markov Decision Process (CMDP) and apply algorithms such as Q-learning to solve it.
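
For reference, a CMDP augments the standard MDP objective with a cost function and a budget; a common textbook formulation (generic notation, not taken from this thesis) is

\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right] \quad \text{subject to} \quad \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t) \right] \le d,

where r is the reward, c is a cost encoding the constraint, \gamma is the discount factor, and d is the allowed budget.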

RL problems with constraints on actions, on the other hand, are not widely researched. There is only a limited body of literature on training RL agents under constraints imposed on the actions the agent can take at each state. Most of the available methods use some kind of action mapping technique: a strategy that maps the action chosen by the agent, according to its current policy, to the nearest allowed action under the action constraint. The advantage of such algorithms is that they guarantee the constraints are never violated. This is particularly helpful in applications where one wrong action by the agent can have life-threatening or otherwise critical consequences, in medical applications for example.
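
As a minimal sketch of the action mapping idea (the function name and the scalar action space are illustrative assumptions, not the thesis's implementation):

```python
import numpy as np

def map_to_allowed(action, allowed_actions):
    """Map the action proposed by the current policy to the nearest allowed action."""
    allowed = np.asarray(allowed_actions, dtype=float)
    distances = np.abs(allowed - float(action))  # distance to each allowed action
    return float(allowed[np.argmin(distances)])

# The policy proposes 0.7, but only {0.0, 0.5, 1.0} are allowed:
print(map_to_allowed(0.7, [0.0, 0.5, 1.0]))  # -> 0.5
```

Because the executed action is always drawn from the allowed set, no violation can ever occur; the trade-off, as noted below, is that the agent never observes the outcome of the action it actually proposed.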

But there are applications where violations of action constraints are less critical. In such applications, action mapping might restrict the agent's ability to explore the state space effectively. Modelling the action-constrained problem as a CMDP and solving it with Q-learning can allow the agent a more relaxed exploration, resulting in a more efficient policy. This work tries to validate this hypothesis by applying Q-learning to a social media problem in which the agent tries to enhance the engagement rate by performing different actions while trying not to violate a limit imposed on the number of times it can post content during a given interval.
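
A toy sketch of this alternative: tabular Q-learning where exceeding the posting limit is penalised rather than blocked (a common way to relax a CMDP into an unconstrained problem; the environment, names, and numbers below are simplified illustrative assumptions, not the thesis's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy environment: the state is the number of posts made so
# far in the current interval; actions are 0 = wait, 1 = post. Posting
# beyond POST_LIMIT violates the action constraint and is penalised
# instead of being blocked.
POST_LIMIT = 3
INTERVAL_LEN = 10                     # steps per interval (one episode)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
PENALTY = 5.0                         # penalty per constraint violation

Q = np.zeros((INTERVAL_LEN + 1, 2))   # Q[posts_so_far, action]

def step(posts, action):
    """Return (next_state, reward); engagement gain shrinks with each post."""
    if action == 1:
        reward = 1.0 / (1 + posts)           # assumed diminishing return
        if posts + 1 > POST_LIMIT:
            reward -= PENALTY                # violation penalised, not blocked
        return posts + 1, reward
    return posts, 0.0

for _ in range(5000):
    posts = 0
    for _ in range(INTERVAL_LEN):
        # epsilon-greedy exploration over the *unrestricted* action set
        if rng.random() < EPSILON:
            a = int(rng.integers(2))
        else:
            a = int(np.argmax(Q[posts]))
        nxt, r = step(posts, a)
        Q[posts, a] += ALPHA * (r + GAMMA * np.max(Q[nxt]) - Q[posts, a])
        posts = nxt

# The greedy policy should learn to stop at the posting limit on its own.
posts = 0
for _ in range(INTERVAL_LEN):
    posts, _ = step(posts, int(np.argmax(Q[posts])))
print("posts per interval under the greedy policy:", posts)
```

Because over-posting is only discouraged, not forbidden, the agent can still visit violating state-action pairs during exploration and learn their true value, which is the relaxed exploration the hypothesis relies on.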

The experiments in this work have established that using Q-learning to solve an action-constrained social media problem formulated as a CMDP can be significantly more effective at enhancing the engagement rate. Even though the action mapping method ensures that the agent never posts more content than the allowed limit, the engagement rate it achieves is much lower, in fact lower than that achieved by a random agent. The Q-learning agent, on the other hand, achieves around a 16% better engagement rate at the cost of a small number of constraint violations.

Place, publisher, year, edition, pages
2025.
Keywords [en]
Q-learning, CMDP, Action-constraints, Social Media
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:su:diva-242802
OAI: oai:DiVA.org:su-242802
DiVA id: diva2:1955735
Available from: 2025-04-30. Created: 2025-04-30.

Open Access in DiVA

fulltext (817 kB), 13 downloads
File information
File name: FULLTEXT01.pdf
File size: 817 kB
Checksum (SHA-512): 8ac1465f45ec0527e3fc2b73b30ef431e6192b2eb0f3fdb1ca52b53b0f9bf5f8c79374791e5f8aba5f204a53366aade048902d77a021640a713543e40f8f4997
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Karthikeyan, Vaishak
By organisation
Department of Computer and Systems Sciences
