Exact and reinforcement learning approaches have shown clear limitations when applied to the vehicle routing problem (VRP) and its variants, and to date no study has applied a hierarchical reinforcement learning (HRL) approach to the VRP with stochastic service requests (VRPSSR). This thesis proposes a novel HRL framework that optimizes network routing for the VRPSSR. The proposed framework trains upper- and lower-level tabular policies that collaborate to optimize the routing.
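The thesis does not spell out the algorithm in this abstract; purely as a rough illustration, a two-level tabular scheme of the kind described might be organized as in the sketch below, where the subgoal decomposition, the SMDP-style upper-level update, and all names are assumptions made for exposition, not the thesis's exact formulation:

    # Illustrative sketch of collaborating upper- and lower-level tabular policies.
    # The state abstraction and subgoal structure are assumptions, not the thesis's design.
    import random
    from collections import defaultdict

    class HierarchicalTabularAgent:
        def __init__(self, subgoals, actions, alpha=0.1, gamma=0.95, eps=0.1):
            self.subgoals = subgoals            # upper-level options, e.g. target regions
            self.actions = actions              # lower-level moves, e.g. next customer to visit
            self.q_upper = defaultdict(float)   # Q[(abstract_state, subgoal)]
            self.q_lower = defaultdict(float)   # Q[(state, subgoal, action)]
            self.alpha, self.gamma, self.eps = alpha, gamma, eps

        def pick_subgoal(self, abstract_state):
            # epsilon-greedy selection over the upper-level table
            if random.random() < self.eps:
                return random.choice(self.subgoals)
            return max(self.subgoals, key=lambda g: self.q_upper[(abstract_state, g)])

        def pick_action(self, state, subgoal):
            # epsilon-greedy selection over the lower-level table, conditioned on the subgoal
            if random.random() < self.eps:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q_lower[(state, subgoal, a)])

        def update_lower(self, state, subgoal, action, reward, next_state):
            # one-step tabular Q-learning update for the lower level
            best_next = max(self.q_lower[(next_state, subgoal, a)] for a in self.actions)
            key = (state, subgoal, action)
            self.q_lower[key] += self.alpha * (reward + self.gamma * best_next - self.q_lower[key])

        def update_upper(self, abstract_state, subgoal, cum_reward, next_abstract_state, steps):
            # SMDP-style update: discount by the number of primitive steps the subgoal consumed
            best_next = max(self.q_upper[(next_abstract_state, g)] for g in self.subgoals)
            key = (abstract_state, subgoal)
            target = cum_reward + (self.gamma ** steps) * best_next
            self.q_upper[key] += self.alpha * (target - self.q_upper[key])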
The framework is assessed in a series of experiments based on three key metrics: output quality, success rate, and CPU time. The experiments define multiple scenarios with different levels of environment complexity by varying the number of network customers and the probability of new customer requests. Through optimization benchmarking, the results of the HRL framework are compared against two benchmark algorithms: dynamic programming (DP) and traditional reinforcement learning (RL). The analysis shows that HRL outperformed RL on all three metrics in nearly all complexity scenarios. The HRL framework also closely approximated the output quality of DP while requiring significantly less CPU time. Although subject to limitations in computational power and resources, the framework demonstrates highly promising results and has the potential to contribute to real-world combinatorial problems in which the complexity of the environment increases significantly.
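Since the scenarios are parameterized by the number of customers and the probability of new requests, a minimal sketch of how such a stochastic instance generator might look is given below; the Bernoulli arrival model, the grid layout, and all parameter names are assumptions for exposition, not the thesis's exact experimental setup:

    # Illustrative VRPSSR-style scenario generator: static customers plus
    # stochastic requests arriving over the planning horizon. The Bernoulli
    # arrival model and all names are assumptions, not the thesis's setup.
    import random

    def generate_scenario(n_customers, request_prob, horizon, grid_size=10, seed=None):
        """Sample one instance.

        n_customers  -- customers known at time 0
        request_prob -- per-period probability of a new customer request
        horizon      -- number of decision periods
        """
        rng = random.Random(seed)
        static = [(rng.randint(0, grid_size), rng.randint(0, grid_size))
                  for _ in range(n_customers)]
        dynamic = []
        for t in range(horizon):
            if rng.random() < request_prob:   # a new request arrives at period t
                dynamic.append((t, (rng.randint(0, grid_size), rng.randint(0, grid_size))))
        return static, dynamic

    # Example: a high-complexity scenario with many customers and frequent requests.
    static, dynamic = generate_scenario(n_customers=30, request_prob=0.4, horizon=100, seed=1)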