Submitted Proposals

DEI - FCTUC
Generated on 2024-03-28 23:35:10 (Europe/Lisbon).

Internship Title

Designing and Implementing a Multi-Agent Reinforcement Learning Algorithm for Self-Driving Cars

Areas of Specialisation

Intelligent Systems

Internship Location

DEI-FCTUC

Background

In many real-world settings, agents (human, artificial or both) interact, collaborate, cooperate, negotiate and compete. Sports games, the exploration of unknown environments by teams of agents, search-and-rescue tasks and traffic control are just a few examples of domains that involve partially observable, cooperative, multi-agent learning problems, in which a team of agents must learn to coordinate their behaviour while conditioning only on their private observations. These problems bring unique challenges, such as the non-stationarity of learning, multi-agent credit assignment, and the difficulty of representing the value of joint actions.

In the last few years, Multi-Agent Reinforcement Learning (MARL) has become a highly active area of research that offers valuable solutions for those problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains. In a similar way, MARL research has been supported by recently developed MARL platforms.

In spite of the recent advances in MARL, there are still open issues. For instance, in the majority of real-world scenarios, the extrinsic feedback is sparse or insufficient, so intrinsic reward formulations are needed to train the agent successfully. Psychological constructs such as curiosity and surprise are helpful for overcoming this problem.

Solving tasks with sparse rewards is one of the most important challenges in RL. In the single-agent setting, this challenge is addressed by introducing intrinsic rewards that motivate agents to explore unseen regions of their state spaces. However, applying these techniques naively to the multi-agent setting results in agents exploring independently, without any coordination among themselves. Exploration in cooperative multi-agent settings can be accelerated and improved if agents coordinate their exploration.

This work investigates MARL with a special focus on: (i) the involvement of intrinsic motivation related models (curiosity, surprise, attention) in its algorithms, and (ii) its application to the domain of self-driving cars.
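
The contrast drawn above between independent and coordinated exploration can be illustrated with a count-based intrinsic reward, one common way of formalising curiosity. The following is a minimal sketch and not the algorithm this proposal aims to design: the bonus formula beta / sqrt(N(s)), the class name CountBasedCuriosity, and the shared flag are all assumptions introduced here for illustration.

```python
import math
from collections import defaultdict


class CountBasedCuriosity:
    """Count-based intrinsic reward for a team of agents (illustrative only).

    With shared=False every agent keeps its own visit counts, so agents
    explore independently. With shared=True all agents update one common
    table, which is a very simple form of coordinated exploration.
    """

    def __init__(self, n_agents, beta=1.0, shared=False):
        self.beta = beta          # scale of the exploration bonus (assumed)
        self.shared = shared
        n_tables = 1 if shared else n_agents
        self.tables = [defaultdict(int) for _ in range(n_tables)]

    def reward(self, agent_id, state):
        """Record a visit and return the bonus beta / sqrt(N(state))."""
        table = self.tables[0] if self.shared else self.tables[agent_id]
        table[state] += 1
        return self.beta / math.sqrt(table[state])


if __name__ == "__main__":
    independent = CountBasedCuriosity(n_agents=2, shared=False)
    coordinated = CountBasedCuriosity(n_agents=2, shared=True)
    for curiosity in (independent, coordinated):
        first = curiosity.reward(0, (3, 4))   # agent 0 visits cell (3, 4)
        second = curiosity.reward(1, (3, 4))  # agent 1 visits the same cell
        print(curiosity.shared, round(first, 3), round(second, 3))
```

With independent tables both agents receive the full bonus for the same cell, so nothing discourages them from exploring the same region. With a shared table the second visitor's bonus drops to roughly 0.707, which nudges it towards cells that no teammate has seen yet.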

Objective

The general goal is to design, implement and analyse an algorithm that relies on intrinsic motivation models to improve MARL, and to apply it to the domain of self-driving cars. Concretely, we aim to achieve the following goals:

- Analysis of MARL algorithms

- Analysis of intrinsic motivation related models (curiosity, surprise, attention)

- Analysis and specification of intrinsic motivation models in the context of MARL

- Implementation and empirical assessment of the proposed algorithm on a MARL platform, and its application to the domain of self-driving cars.

Well-known public datasets from popular autonomous driving projects, such as those available at https://waymo.com/open/ (Waymo's dataset), https://analyticsindiamag.com/top-10-popular-datasets-for-autonomous-driving-projects/ or https://datasetsearch.research.google.com/search?query=self%20driving%20car&docid=L2cvMTFsajBxbDdqdw%3D%3D, will be analysed and considered for the implementation and testing phases. Most of them comprise not only simulated data but also large amounts of real data, which have been used by real autonomous cars operating in a few cities around the world.

Work Plan - Semester 1

1- State of the Art [Sept – Oct]

2- Design of an algorithm for MARL [Nov]

3- Development and first implementation of the proposed algorithm in a MARL platform, and building the MARL model [Dec – Jan]

4- Thesis Proposal Writing [Dec – Jan]

Work Plan - Semester 2

5- Improvement of the proposed algorithm and model [Feb – Apr]

6- Experimental Tests [Apr – May]

7- Paper Writing [May – Jun]

8- Thesis Writing [Jan – Jul]

Conditions

The eligible student will have at their disposal all the necessary computational platforms, tools and devices.

Remarks

Bibliography:

[Bazzan, 2009] Bazzan, A. L. C. (2009). Opportunities for multiagent systems and multiagent reinforcement learning in traffic control. Auton. Agents Multi Agent Syst., 18(3):342–375.
[Dubey et al., 2021] Dubey, R., Mehta, H., and Lombrozo, T. (2021). Curiosity is contagious: A social influence intervention to induce curiosity. Cogn. Sci., 45(2).
[Hameed et al., 2020] Hameed, M. S. A., Khan, M. M., and Schwung, A. (2020). Curiosity based reinforcement learning on robot manufacturing cell. CoRR, abs/2011.08743.
[Hernandez-Leal et al., 2019] Hernandez-Leal, P., Kartal, B., and Taylor, M. E. (2019). A survey and critique of multiagent deep reinforcement learning. Auton. Agents Multi Agent Syst., 33(6):750–797.
[Hester and Stone, 2017] Hester, T. and Stone, P. (2017). Intrinsically motivated model learning for developing curious robots. Artif. Intell., 247(C):170–186.
[Li et al., 2019] Li, B., Lu, T., Li, J., Lu, N., Cai, Y., and Wang, S. (2019). Curiosity-driven exploration for off-policy reinforcement learning methods. In 2019 IEEE International Conference on Robotics and Biomimetics, ROBIO 2019, Dali, China, December 6-8, 2019, pages 1109–1114. IEEE.
[Macedo, 2013] Macedo, L. (2013). Arguments for a computational model for forms of selective attention based on cognitive and affective feelings. In 5th International Conference on Affective Computing and Intelligent Interaction (ACII 2013).
[Macedo, 2020] Macedo, L. (2020). Forms of distributed curiosity in the collaborative exploration of unknown environments by artificial agents. In Denison, S., Mack, M., Xu, Y., and Armstrong, B. C., editors, Proceedings of the 42th Annual Meeting of the Cognitive Science Society - Developing a Mind: Learning in Humans, Animals, and Machines, CogSci 2020, virtual, July 29 - August 1, 2020. cognitivesciencesociety.org.
[Macedo and Cardoso, 2012] Macedo, L. and Cardoso, A. (2012). The exploration of unknown environments populated with entities by a surprise–curiosity-based agent. Cognitive Systems Research, 19-20:62–87.
[Macedo et al., 2012] Macedo, L., Reisenzein, R., and Cardoso, F. (2012). Surprise and anticipation in learning, pages 3250–3253. Springer US.
[Samvelyan et al., 2019] Samvelyan, M., Rashid, T., de Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G. J., Hung, C., Torr, P. H. S., Foerster, J. N., and Whiteson, S. (2019). The StarCraft multi-agent challenge. CoRR, abs/1902.04043.
[Thiede et al., 2020] Thiede, L. A., Krenn, M., Nigam, A., and Aspuru-Guzik, A. (2020). Curiosity in exploring chemical space: Intrinsic rewards for deep molecular reinforcement learning. CoRR, abs/2012.11293.
[Zhou et al., 2020] Zhou, M., Luo, J., Villela, J., Yang, Y., Rusu, D., Miao, J., Zhang, W., Alban, M., Fadakar, I., Chen, Z., Huang, A., Wen, Y., Hassanzadeh, K., Graves, D., Chen, D., Zhu, Z., Nguyen, N., Elsayed, M., Shao, K., and Wang, J. (2020). SMARTS: Scalable multi-agent reinforcement learning training school for autonomous driving.
[Zhu et al., 2020] Zhu, Z., Diallo, E. A. O., and Sugawara, T. (2020). Learning efficient coordination strategy for multi-step tasks in multi-agent systems using deep reinforcement learning. In Rocha, A. P., Steels, L., and van den Herik, H. J., editors, Proceedings of the 12th International Conference on Agents and Artificial Intelligence, ICAART 2020, Volume 1, Valletta, Malta, February 22-24, 2020, pages 287–294. SCITEPRESS.

Supervisor

Luís Miguel Machado Lopes Macedo
macedo@dei.uc.pt