
Robustness via distributional dynamic programming

Mastane Achab (postdoc at Universitat Pompeu Fabra)
When? 15/02/2022,
from 13:00 to 14:00
Participants: Mastane Achab

Title: Robustness via distributional dynamic programming

Abstract: In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment, modeled as a Markov decision process (MDP). More generally, in distributional reinforcement learning (DRL), the focus is on the whole distribution of the return, not just its expectation. Although DRL-based methods have produced state-of-the-art performance in RL with function approximation, they involve additional quantities (compared to the non-distributional setting) that are still not well understood. As a first contribution, we introduce a new class of distributional operators, together with a practical DP algorithm for policy evaluation, that come with a robust MDP interpretation. Indeed, our approach reformulates the problem over an augmented state space in which each state is split into a worst-case substate and a best-case substate, whose values are maximized by safe and risky policies, respectively. Finally, we derive distributional operators and DP algorithms solving a new control task: how to distinguish safe from risky optimal actions in order to break ties in the space of optimal policies?
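To give a concrete feel for the distributional viewpoint the abstract builds on, here is a minimal sketch of distributional policy evaluation on a toy Markov chain. Everything below (the two-state chain, its rewards, and the discount factor) is invented for illustration and is not taken from the talk; return distributions are represented exactly as finite sets of atoms, which is only tractable in tiny examples.

```python
GAMMA = 0.9
# Tiny Markov chain under a fixed policy (illustrative values only):
# P[s] = list of (next_state, probability); R[s] = reward on leaving s.
P = {
    0: [(0, 0.5), (1, 0.5)],
    1: [(1, 1.0)],
}
R = {0: 1.0, 1: 0.0}

def bellman_dist(dists):
    """One distributional Bellman backup: each return atom z of the
    successor state's distribution is mapped to r + gamma * z, and the
    atom's probability mass is weighted by the transition probability."""
    new = {}
    for s in P:
        atoms = {}
        for s2, p in P[s]:
            for z, q in dists[s2].items():
                z_new = round(R[s] + GAMMA * z, 6)  # merge nearby atoms
                atoms[z_new] = atoms.get(z_new, 0.0) + p * q
        new[s] = atoms
    return new

# Start from the Dirac distribution at 0 in every state and iterate.
dists = {s: {0.0: 1.0} for s in P}
for _ in range(200):
    dists = bellman_dist(dists)

# The mean of each return distribution recovers the ordinary value
# function: V(1) = 0 and V(0) = 1 / (1 - GAMMA / 2).
v = {s: sum(z * q for z, q in d.items()) for s, d in dists.items()}
```

Unlike standard policy evaluation, which would only track the two means in `v`, the iteration keeps the full distribution of returns in each state, which is exactly the extra information that robustness or risk-sensitivity criteria (such as the worst-case/best-case substates mentioned in the abstract) can exploit.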

More information: https://mastane.github.io

Talk in room MGN 435 (ENS de Lyon, Monod campus, 4th floor, UMPA)