Project: Long-term dynamic and evolutionary autonomy

Acronym DELTA (Reference Number: ANR-17-CHR2-0002)
Duration 01/01/2018 - 31/12/2020
Project Topic
Many complex autonomous systems (e.g., electrical distribution networks) repeatedly select actions with the aim of achieving a given objective. Reinforcement learning (RL) offers a powerful framework for acquiring adaptive behaviour in this setting, associating a scalar reward with each action and learning from experience which action to select to maximise long-term reward. Although RL has produced impressive results recently (e.g., achieving human-level play in Atari games and beating the human world champion in the board game Go), most existing solutions only work under strong assumptions: the environment model is stationary, the objective is fixed, and trials end once the objective is met.

The aim of this project is to advance the state of the art of fundamental research in lifelong RL by developing several novel RL algorithms that relax the above assumptions. The new algorithms should be robust to environmental changes, both in terms of the observations that the system can make and the actions that the system can perform. Moreover, the algorithms should be able to operate over long periods of time while achieving different objectives.

The proposed algorithms will address three key problems related to lifelong RL: planning, exploration, and task decomposition. Planning is the problem of computing an action selection strategy given a (possibly partial) model of the task at hand. Exploration is the problem of selecting actions with the aim of mapping out the environment rather than achieving a particular objective. Task decomposition is the problem of defining different objectives and assigning a separate action selection strategy to each. The algorithms will be evaluated in two realistic scenarios: active network management for electrical distribution networks, and microgrid management. A test protocol will be developed to evaluate each individual algorithm, as well as their combinations.
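To make the standard setting concrete, the following minimal Python sketch shows tabular Q-learning: a scalar reward is associated with each action, and the agent learns from experience which action maximises long-term (discounted) reward. The toy environment, reward values, and hyperparameters are illustrative assumptions, not part of the DELTA simulators; note how the stationarity and fixed-objective assumptions discussed above are baked into the loop.

```python
import random
from collections import defaultdict

class ToyChainEnv:
    """Illustrative toy environment (not a DELTA simulator): the agent
    walks left/right along a chain and is rewarded for reaching the goal."""
    def __init__(self, size=6, goal=5):
        self.size, self.goal = size, goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action is -1 (left) or +1 (right)
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.goal
        reward = 1.0 if done else -0.01  # scalar reward for each action
        return self.state, reward, done

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn from experience which action maximises long-term (discounted)
    reward; assumes a stationary environment and a fixed objective."""
    actions = (-1, +1)
    q = defaultdict(float)  # Q(s, a): estimate of long-term reward
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy selection: mostly exploit, sometimes explore
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # move the estimate toward reward + discounted next-state value
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = next_state
    return q

q = q_learning(ToyChainEnv())
```

A lifelong variant would have to cope with the goal state or the transition dynamics changing mid-run, which is exactly what the algorithms developed in this project target.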
Project Results
(after finalisation)
The main targeted outcome is to develop an integrated autonomous system which is able to successfully interact with its environment during long periods of time without breaking down or failing to achieve its objectives. This in itself would be a considerable improvement over the state of the art, since we are aware of no realistic applications of lifelong reinforcement learning that are capable of operating for extended time periods while adapting to changes in the environment.

This targeted outcome will be tested empirically in the two simulated scenarios: active network management (ANM) and microgrid management. The simulators developed as part of the project will make it possible to test the autonomous system over long time periods while switching between objectives and introducing changes to the environment, effectively providing a rich, realistic benchmark in which to evaluate the different RL algorithms. Success will be measured using several criteria: 1) the ability of the system to achieve its objectives, including objectives that were initially unknown; 2) the ability of the system to recover and adapt after changes to the environment model, without having to relearn from scratch; 3) the ability to learn and plan faster over time by taking advantage of the estimated environment model. In RL, the ability to satisfy these criteria can be measured by recording the cumulative reward over time: a higher reward means that the system is better at achieving its objectives.

Another targeted outcome is to improve on the state of the art for ANM and microgrid management. So far, only methods from mixed-integer nonlinear programming or metaheuristics have been applied to these problems. We expect reinforcement learning to have an advantage over these approaches, since it allows the decision strategy to improve over time and since the novel RL algorithms will allow the application to adapt to changes in the environment. Because results are available for the previous approaches, it is straightforward to measure the performance of the RL algorithms on ANM and microgrid management and compare the two families of methods directly.

Several of the novel RL algorithms are directly related to the target outcomes specified in the LLIS call. The proposed extension of the UCRL algorithm will be able to actively explore its environment, with exploration guided by the current task, either as a result of pursuing a new system objective or as a result of identifying useful subtasks. The proposed method for refining the hierarchical partition will avoid the need for regression and take advantage of existing knowledge about the old environment model. The evaluation protocol described above will help measure the performance of each individual RL algorithm.

Finally, another targeted outcome is to disseminate the novel RL algorithms developed as part of the project, both by publishing scientific papers and by making the resulting software modules available in a public repository. In this way we hope to contribute to the progress of lifelong RL beyond the scope of this project.
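The cumulative-reward criterion lends itself to a simple measurement loop. The sketch below is a hypothetical protocol, not the project's actual test protocol: it assumes agent and environment objects exposing act/learn and reset/step/change methods (all of these interfaces are assumptions for illustration), and it records cumulative reward over a long horizon while the environment is perturbed at predefined steps, so recovery after a change shows up directly in the reward curve.

```python
def evaluate_lifelong(agent, env, horizon=10_000, change_points=(3_000, 7_000)):
    """Hypothetical lifelong-RL evaluation loop: record cumulative reward
    over a long horizon while the environment changes at fixed time steps."""
    trace, cumulative = [], 0.0
    state = env.reset()
    for t in range(horizon):
        if t in change_points:
            env.change()  # assumed hook that perturbs dynamics or the objective
        action = agent.act(state)
        state, reward, done = env.step(action)
        agent.learn(state, reward, done)  # assumed online-update interface
        cumulative += reward
        trace.append(cumulative)
        if done:
            state = env.reset()
    return trace  # higher, faster-recovering curves indicate better adaptation
```

Plotting the traces of different algorithms against the same change schedule makes criteria 1) to 3) directly comparable; a system that adapts without relearning from scratch shows a shorter dip after each change point.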
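The UCRL extension itself is beyond the scope of this summary, but the principle it builds on, optimism in the face of uncertainty, is easy to state in code. The fragment below is a generic UCB-style bonus, an illustration of that principle rather than the project's proposed algorithm; the coefficient c is an assumed exploration parameter.

```python
import math

def optimistic_value(mean_reward, visit_count, t, c=1.0):
    """Optimism in the face of uncertainty, the principle behind UCRL:
    inflate the empirical mean reward of a state-action pair by a
    confidence bonus that shrinks as the pair is visited more often,
    so rarely tried actions look promising and get explored."""
    bonus = c * math.sqrt(math.log(max(t, 2)) / max(visit_count, 1))
    return mean_reward + bonus
```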
Network CHIST-ERA III
Call CHIST-ERA Call 2017

Project partner

Number Name Role Country
1 Pompeu Fabra University Coordinator Spain
2 Montan University Leoben Partner Austria
3 University of Liège Partner Belgium
4 National Institute for Research in Computer Science and Automation - Lille Partner France