Digitala Vetenskapliga Arkivet

1 - 12 of 12
  • 1.
    Alinia, Bahram
    et al.
    Telecom SudParis, Inst Mines Telecom, F-91000 Evry, France.
    Talebi Mazraeh Shahi, Mohammad Sadegh
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).
    Hajiesmaili, Mohammad H.
    Yekkehkhany, Ali
    Crespi, Noel
    Competitive Online Scheduling Algorithms with Applications in Deadline-Constrained EV Charging. 2018. In: 2018 IEEE/ACM 26th International Symposium on Quality of Service, IWQoS 2018, IEEE, 2018, article id 8624184. Conference paper (Refereed)
    Abstract [en]

    This paper studies the classical problem of online scheduling of deadline-sensitive jobs with partial values and investigates its extension to Electric Vehicle (EV) charging scheduling by taking into account the processing rate limit of jobs and the charging station capacity constraint. The problem lies in the category of time-coupled online scheduling problems without availability of future information. This paper proposes two online algorithms, both of which are shown to be $(2-\frac{1}{U})$-competitive, where $U$ is the maximum scarcity level, a parameter that indicates the demand-to-supply ratio. The first proposed algorithm is deterministic, whereas the second is randomized and enjoys a lower computational complexity. When $U$ grows large, the performance of both algorithms approaches that of the state-of-the-art for the case where there are processing rate limits on the jobs. Nonetheless, in realistic cases, where $U$ is typically small, the proposed algorithms enjoy a much lower competitive ratio. To carry out the competitive analysis of our algorithms, we present a proof technique, which is novel to the best of our knowledge. This technique could also be used to simplify the competitive analysis of some existing algorithms, and thus could be of independent interest.
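    To illustrate how the stated bound behaves, the following minimal sketch evaluates 2 - 1/U for a few scarcity levels. The function name `competitive_ratio` and the requirement U > 0 are assumptions made for this illustration; neither is taken from the paper.

```python
def competitive_ratio(U: float) -> float:
    """Evaluate the bound 2 - 1/U quoted in the abstract.

    U is the maximum scarcity level; U > 0 is assumed here only
    to keep the division well defined.
    """
    if U <= 0:
        raise ValueError("U must be positive")
    return 2.0 - 1.0 / U


if __name__ == "__main__":
    # The bound increases with U and stays below 2.
    for U in (1, 2, 5, 100):
        print(U, competitive_ratio(U))
```

    Small values of U give a ratio well below 2, while large values push it towards 2, which matches the abstract's remark about small versus large U.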

  • 2. Alinia, Bahram
    et al.
    Yousefi, Hamed
    Talebi Mazraeh Shahi, Mohammad Sadegh
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Khonsari, Ahmad
    Maximizing Quality of Aggregation in Delay-Constrained Wireless Sensor Networks. 2013. In: IEEE Communications Letters, ISSN 1089-7798, E-ISSN 1558-2558, Vol. 17, no 11, p. 2084-2087. Article in journal (Refereed)
    Abstract [en]

    In this letter, both the number of participating nodes and spatial dispersion are incorporated to establish a bi-objective optimization problem for maximizing the quality of aggregation under interference and delay constraints in tree-based wireless sensor networks (WSNs). The formulated problem is proved to be NP-hard with respect to weighted-sum scalarization, and a distributed heuristic aggregation scheduling algorithm, named SDMAX, is proposed. Simulation results show that SDMAX not only gives a close approximation of the Pareto-optimal solution, but also outperforms what is, to our knowledge, the best existing alternative in the literature.

  • 3. Combes, R.
    et al.
    Talebi, Mohammad Sadegh
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Proutiere, Alexandre
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Lelarge, M.
    Combinatorial bandits revisited. 2015. In: Advances in Neural Information Processing Systems, Neural Information Processing Systems, 2015, p. 2116-2124. Conference paper (Refereed)
    Abstract [en]

    This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension of the decision space. We propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice. In the adversarial setting under bandit feedback, we propose COMBEXP, an algorithm with the same regret scaling as state-of-the-art algorithms, but with lower computational complexity for some combinatorial problems.

  • 4. Hajiesmaili, M. H.
    et al.
    Talebi, Mohammad Sadegh
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Khonsari, A.
    Utility-optimal dynamic rate allocation under average end-to-end delay requirements. 2016. In: Proceedings of the IEEE Conference on Decision and Control, IEEE conference proceedings, 2016, p. 4842-4847. Conference paper (Refereed)
    Abstract [en]

    QoS-aware networking applications such as real-time streaming and video surveillance systems require nearly fixed average end-to-end delay over long periods to communicate efficiently, although they may tolerate some delay variations in short periods. This variability exhibits complex dynamics that make rate control of such applications a formidable task. This paper addresses rate allocation for heterogeneous QoS-aware applications that preserves the long-term average end-to-end delay constraint while, similar to Dynamic Network Utility Maximization (DNUM), strives to achieve the maximum network utility aggregated over a fixed time interval. Since capturing temporal dynamics in QoS requirements of sources is allowed in our system model, we incorporate a novel time-coupling constraint in which delay-sensitivity of sources is considered such that a certain end-to-end average delay for each source over a pre-specified time interval is satisfied. We propose the DA-DNUM algorithm, a dual-based solution, which allocates source rates for the next time interval in a distributed fashion, given the knowledge of network parameters in advance. Through numerical experiments, we show that DA-DNUM achieves higher average link utilization and a wider range of feasible scenarios in comparison with what are, to our knowledge, the best rate control schemes that may guarantee such constraints on delay.

  • 5. Hajiesmaili, Mohammad H.
    et al.
    Talebi, Mohammad Sadegh
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Khonsari, Ahmad
    Joint multipath rate control and scheduling for SVC streams in wireless mesh networks. 2014. In: International Journal of Ad Hoc and Ubiquitous Computing, ISSN 1743-8225, E-ISSN 1743-8233, Vol. 15, no 4, p. 239-251. Article in journal (Refereed)
    Abstract [en]

    Rate adaptation of video signals for different quality-of-service scenarios through the scalable video coding (SVC) standard has been considered a key feature for multimedia transmission. This paper addresses joint multipath rate control and scheduling for SVC-encoded video transmission over wireless mesh networks (WMNs). Each video stream is assumed to use multipath routing and to possess a staircase utility function. Using the conflict graph that represents the interference-limited model, we formulate the problem as one of maximising the sum of source utilities subject to transport-layer and link-layer constraints. The multipath routing over wireless channels and the staircase utilities yield a non-convex optimisation problem. To attain a convex formulation, we adopt a multimodal sigmoid approximation and exploit the utility-proportional fairness approach. Then, employing dual decomposition, we devise a distributed algorithm for joint multipath rate control and scheduling in WMNs. Experiments validate the effectiveness of our approach to cross-layer optimisation for video transmission in WMNs.

  • 6.
    Hajiesmaili, Mohammad Hassan
    et al.
    Johns Hopkins Univ, Whiting Sch Engn, Baltimore, MD 21218 USA.
    Talebi Mazraeh Shahi, Mohammad Sadegh
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).
    Khonsari, Ahmad
    Univ Tehran, Dept Elect & Comp Engn, Tehran 1417614418, Iran; Inst Res Fundamental Sci, Sch Comp Sci, Tehran 1953833511, Iran.
    Multiperiod Network Rate Allocation With End-to-End Delay Constraints. 2018. In: IEEE Transactions on Control of Network Systems, E-ISSN 2325-5870, Vol. 5, no 3, p. 1087-1097. Article in journal (Refereed)
    Abstract [en]

    QoS-aware networking applications such as real-time streaming and video surveillance systems require nearly fixed average end-to-end delay over long periods to communicate efficiently, although they may tolerate some delay variations in short periods. This variability exhibits complex dynamics that make rate control of such applications a formidable task. This paper addresses rate allocation for heterogeneous QoS-aware applications that preserves the long-term end-to-end delay constraint while seeking the maximum network utility cumulated over a fixed time interval. To capture the temporal dynamics of sources, we incorporate a novel time-coupling constraint in which delay sensitivity of sources is considered such that a certain end-to-end average delay for each source over a prespecified time interval is satisfied. We propose an algorithm, as a dual-based solution, which allocates source rates for the next time interval in a distributed fashion, given the knowledge of network parameters in advance. We also extend the algorithm to the case where the problem data is not fully known in advance, to capture more realistic scenarios. Through numerical experiments, we show that our proposed algorithm attains higher average link utilization and a wider range of feasible scenarios in comparison with what are, to our knowledge, the best rate control schemes that may guarantee such constraints on delay.

  • 7. Lelarge, Marc
    et al.
    Proutiere, Alexandre
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Talebi Mazraeh Shahi, Mohammad Sadegh
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Spectrum Bandit Optimization. 2013. In: 2013 IEEE Information Theory Workshop, ITW 2013, IEEE conference proceedings, 2013, p. 6691221. Conference paper (Refereed)
    Abstract [en]

    We consider the problem of allocating radio channels to links in a wireless network. Links interact through interference, modelled as a conflict graph (i.e., two interfering links cannot be simultaneously active on the same channel). We aim at identifying the channel allocation maximizing the total network throughput over a finite time horizon. If the average radio conditions on each channel and on each link were known, an optimal allocation would be obtained by solving an Integer Linear Program (ILP). When radio conditions are unknown a priori, we look for a sequential channel allocation policy that converges to the optimal allocation while minimizing the throughput loss, or regret, incurred along the way by exploring suboptimal allocations. We formulate this problem as a generic linear bandit problem, and analyze it in a stochastic setting where radio conditions are driven by an i.i.d. stochastic process, and in an adversarial setting where radio conditions can evolve arbitrarily. We provide, in both settings, algorithms whose regret upper bounds outperform those of existing algorithms.
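    The conflict-graph constraint described above can be sketched as a simple feasibility check. This is an illustrative sketch only: the function name `is_feasible`, the dictionary encoding of an allocation, and the link labels are assumptions, not the paper's formulation.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple


def is_feasible(
    allocation: Dict[Hashable, Optional[int]],
    conflicts: Iterable[Tuple[Hashable, Hashable]],
) -> bool:
    """Return True if no two interfering links share a channel.

    allocation maps each link to a channel index, or to None if the
    link is inactive. conflicts lists the edges of the conflict graph.
    """
    for link_a, link_b in conflicts:
        channel_a = allocation.get(link_a)
        channel_b = allocation.get(link_b)
        # A conflict arises only when both links are active on the same channel.
        if channel_a is not None and channel_a == channel_b:
            return False
    return True


if __name__ == "__main__":
    conflicts = [("l1", "l2"), ("l2", "l3")]
    print(is_feasible({"l1": 0, "l2": 1, "l3": 0}, conflicts))
    print(is_feasible({"l1": 0, "l2": 0, "l3": 1}, conflicts))
```

    In the example, links l1 and l3 do not interfere with each other, so they may reuse channel 0, whereas putting l1 and l2 on the same channel violates the constraint.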

  • 8.
    Talebi Mazraeh Shahi, Mohammad Sadegh
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Minimizing Regret in Combinatorial Bandits and Reinforcement Learning. 2017. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    This thesis investigates sequential decision making tasks that fall in the framework of reinforcement learning (RL). These tasks involve a decision maker repeatedly interacting with an environment modeled by an unknown finite Markov decision process (MDP), who wishes to maximize a notion of reward accumulated during her experience. Her performance can be measured through the notion of regret, which compares her accumulated expected reward against that achieved by an oracle algorithm always following an optimal behavior. In order to maximize her accumulated reward, or equivalently to minimize the regret, she needs to face a trade-off between exploration and exploitation.
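    One common way to write the notion of regret described above is given below. The notation is an assumption made for illustration and is not taken from the thesis: $T$ is the time horizon, $r_t$ is the reward collected by the decision maker at step $t$, and $r_t^{\star}$ is the reward collected at step $t$ by an oracle that always follows an optimal behavior.

```latex
R(T) \;=\; \sum_{t=1}^{T} \mathbb{E}\left[ r_t^{\star} \right] \;-\; \sum_{t=1}^{T} \mathbb{E}\left[ r_t \right]
```

    Under this convention, maximizing the accumulated expected reward and minimizing $R(T)$ are equivalent, since the oracle term does not depend on the decision maker's choices.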

    The first part of this thesis investigates combinatorial multi-armed bandit (MAB) problems, which are RL problems whose state-space is a singleton. It also addresses some applications that can be cast as combinatorial MAB problems. The number of arms in such problems generically grows exponentially with the number of basic actions, but the rewards of various arms are correlated. Hence, the challenge in such problems is to exploit the underlying combinatorial structure. For these problems, we derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any admissible algorithm and investigate how these bounds scale with the dimension of the underlying combinatorial structure. We then propose several algorithms and provide finite-time analyses of their regret. The proposed algorithms efficiently exploit the structure of the problem, provide better performance guarantees than existing algorithms, and significantly outperform these algorithms in practice.

    The second part of the thesis concerns RL in an unknown and discrete MDP under the average-reward criterion. We develop some variations of the transportation lemma that could serve as novel tools for the regret analysis of RL algorithms. Revisiting existing regret lower bounds allows us to derive alternative bounds, which suggest that the local variance of the bias function of the MDP, i.e., the variance with respect to next-state transition laws, could serve as a notion of problem complexity for regret minimization in RL. Leveraging these tools also allows us to report a novel regret analysis of the KL-UCRL algorithm for ergodic MDPs. The leading term in our regret bound depends on the local variance of the bias function, thus coinciding with observations obtained from our presented lower bounds. Numerical evaluations in some benchmark MDPs indicate that the leading term of the derived bound can provide an order of magnitude improvement over previously known results for this algorithm.

  • 9.
    Talebi Mazraeh Shahi, Mohammad Sadegh
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Online Combinatorial Optimization under Bandit Feedback. 2016. Licentiate thesis, monograph (Other academic)
    Abstract [en]

    Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making problems with an exploration vs. exploitation trade-off. In such problems, the decision maker selects an arm in each round and observes a realization of the corresponding unknown reward distribution. Each decision is based on past decisions and observed rewards. The objective is to maximize the expected cumulative reward over some time horizon by balancing exploitation (arms with higher observed rewards should be selected often) and exploration (all arms should be explored to learn their average rewards). Equivalently, the performance of a decision rule or algorithm can be measured through its expected regret, defined as the gap between the expected reward achieved by the algorithm and that achieved by an oracle algorithm always selecting the best arm.

    This thesis investigates stochastic and adversarial combinatorial MAB problems, where each arm is a collection of several basic actions taken from a set of $d$ elements, in a way that the set of arms has a certain combinatorial structure. Examples of such sets include the set of fixed-size subsets, matchings, spanning trees, paths, etc. These problems are specific forms of online linear optimization, where the decision space is a subset of $d$-dimensional hypercube. Due to the combinatorial nature, the number of arms generically grows exponentially with $d$. Hence, treating arms as independent and applying classical sequential arm selection policies would yield a prohibitive regret. It may then be crucial to exploit the combinatorial structure of the problem to design efficient arm selection algorithms.

    As the first contribution of this thesis, in Chapter 3 we investigate combinatorial MABs in the stochastic setting and with Bernoulli rewards. We derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any algorithm under bandit and semi-bandit feedback. The proposed lower bounds are problem-specific and tight in the sense that there exists an algorithm that achieves these regret bounds. Our derivation leverages some theoretical results in adaptive control of Markov chains. Under semi-bandit feedback, we further discuss the scaling of the proposed lower bound with the dimension of the underlying combinatorial structure. For the case of semi-bandit feedback, we propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice.

    In the fourth chapter, we consider stochastic combinatorial MAB problems where the underlying combinatorial structure is a matroid. Specializing the results of Chapter 3 to matroids, we provide explicit regret lower bounds for this class of problems. For the case of semi-bandit feedback, we propose KL-OSM, a computationally efficient greedy-based algorithm that exploits the matroid structure. Through a finite-time analysis, we prove that the regret upper bound of KL-OSM matches the proposed lower bound, thus making it the first asymptotically optimal algorithm for this class of problems. Numerical experiments validate that KL-OSM outperforms state-of-the-art algorithms in practice, as well.

    In the fifth chapter, we investigate the online shortest-path routing problem which is an instance of combinatorial MABs with geometric rewards. We consider and compare three different types of online routing policies, depending (i) on where routing decisions are taken (at the source or at each node), and (ii) on the received feedback (semi-bandit or bandit). For each case, we derive the asymptotic regret lower bound. These bounds help us to understand the performance improvements we can expect when (i) taking routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact. For source routing under semi-bandit feedback, we then propose two algorithms with a trade-off between computational complexity and performance. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments.

    Finally, we discuss combinatorial MABs in the adversarial setting and under bandit feedback. We concentrate on the case where arms have the same number of basic actions but are otherwise arbitrary. We propose CombEXP, an algorithm that has the same regret scaling as state-of-the-art algorithms. Furthermore, we show that CombEXP admits lower computational complexity for some combinatorial problems.

  • 10.
    Talebi Mazraeh Shahi, Mohammad Sadegh
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Proutiere, Alexandre
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Learning proportionally fair allocations with low regret. 2018. In: SIGMETRICS 2018 - Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems, Association for Computing Machinery (ACM), 2018, p. 50-52. Conference paper (Refereed)
    Abstract [en]

    We address the problem of learning Proportionally Fair (PF) allocations in parallel server systems with unknown service rates. We provide the first algorithms, to our knowledge, for learning such allocations with sub-linear regret.

  • 11.
    Talebi Mazraeh Shahi, Mohammad Sadegh
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, ACCESS Linnaeus Centre.
    Zou, Zhenhua
    Ericsson Res, SE-16483 Stockholm, Sweden.
    Combes, Richard
    Cent Supelec L2S, Telecommun Dept, F-91192 Gif Sur Yvette, France.
    Proutiere, Alexandre
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, ACCESS Linnaeus Centre.
    Johansson, Mikael
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, ACCESS Linnaeus Centre.
    Stochastic Online Shortest Path Routing: The Value of Feedback. 2018. In: IEEE Transactions on Automatic Control, ISSN 0018-9286, E-ISSN 1558-2523, Vol. 63, no 4, p. 915-930. Article in journal (Refereed)
    Abstract [en]

    This paper studies online shortest path routing over multihop networks. Link costs or delays are time varying and modeled by independent and identically distributed random processes, whose parameters are initially unknown. The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays. Our aim is to find a routing policy that minimizes the regret (the cumulative difference of expected delay) between the path chosen by the policy and the unknown optimal path. We formulate the problem as a combinatorial bandit optimization problem and consider several scenarios that differ in where routing decisions are made and in the information available when making the decisions. For each scenario, we derive a tight asymptotic lower bound on the regret that has to be satisfied by any online routing policy. Three algorithms, with a tradeoff between computational complexity and performance, are proposed. The regret upper bounds of these algorithms improve over those of the existing algorithms. We also assess numerically the performance of the proposed algorithms and compare it to that of existing algorithms.

  • 12.
    Talebi, Mohammad Sadegh
    et al.
    KTH, School of Electrical Engineering (EES), Automatic Control.
    Proutiere, Alexandre
    KTH, School of Electrical Engineering (EES), Automatic Control.
    An optimal algorithm for stochastic matroid bandit optimization. 2016. In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2016, p. 548-556. Conference paper (Refereed)
    Abstract [en]

    The selection of leaders in leader-follower multi-agent systems can be naturally formulated as a matroid optimization problem. In this paper, we investigate the online and stochastic version of such a problem, where in each iteration or round, we select a set of leaders and then observe a random realization of the corresponding reward, i.e., of the system performance. This problem is referred to as a stochastic matroid bandit, a variant of combinatorial multi-armed bandit problems where the underlying combinatorial structure is a matroid. We consider semi-bandit feedback and Bernoulli rewards, and derive a tight and problem-dependent lower bound on the regret of any consistent algorithm. We propose KL-OSM, a computationally efficient algorithm that exploits the matroid structure. We derive a finite-time upper bound of the regret of KL-OSM that improves the performance guarantees of existing algorithms. This upper bound actually matches our lower bound, i.e., KL-OSM is asymptotically optimal. Numerical experiments attest that KL-OSM outperforms state-of-the-art algorithms in practice, and the difference in some cases is significant. 
