Internship Title
A secure, dynamic, and fault-tolerant multi-agent system in distributed industrial applications
Areas of Specialization
Software Engineering
Intelligent Systems
Internship Location
Rua Cidade Poitiers, nº 155 – 1º Andar 3000-108 Coimbra
Background
Most industrial architectures still follow the automation pyramid (ISA-95), a rigid structure with strongly hierarchical dependencies between levels and a reliance on fixed resources. This rigidity makes it difficult for systems to adapt to new industry demands and to withstand catastrophic failures. As industrial systems add resources, the need for scalable and flexible Cyber-Physical Systems (CPS) has become evident, and CPS architectures must evolve to keep pace with new business requirements and rapid technological change. In this context, distributed architectures such as edge computing and service-oriented architecture help meet these demands, because they consist of multiple independent systems that communicate and coordinate their actions as a unified, coherent system. The distributed approach can improve flexibility, scalability, efficiency, reliability, and security, but existing solutions still rely on centralized management and have seen limited application in real-world industrial environments.
Multi-agent systems offer a way to move beyond centralized management. A multi-agent system is a distributed system of multiple decision-making agents that interact, coordinate, and communicate to achieve a common objective. By working together, the agents can solve complex problems that would be too difficult for a single centralized application. The multi-agent paradigm enhances system flexibility, efficiency, and resilience, enabling fully autonomous operation and seamless integration with new technologies.
The capabilities of multi-agent systems can be further enhanced with blockchain techniques, such as consensus algorithms and distributed ledgers, to make decisions secure, trustworthy, and traceable. The approach can also be combined with fault-tolerant methodologies so that the system recovers from failures autonomously, which contributes to its dependability, reliability, and decentralization. Such innovations benefit CPS by reducing human intervention and susceptibility to failure, improving performance and reliability in industrial processes. However, developing and deploying fully autonomous, fault-tolerant, and consensus-driven multi-agent systems in industrial applications remains a significant challenge.
This internship will explore multi-agent, autonomous, and fault-tolerant methodologies and implement them within an industrial software framework. The main goal is to develop a multi-agent system whose agents collaborate and make secure distributed decisions, such as dynamically allocating microservices or tasks among processing instances, without relying on centralized management. In addition, fault-tolerant strategies will be integrated to improve system resilience, focusing on detecting, mitigating, and preventing failures through collaborative, decentralized decisions.
Objectives
The learning objectives of this master's internship are:
- Multi-agent and autonomous methodologies: study reliable and secure collaboration techniques between agents or resources. The aim is to develop an autonomous decision system with minimal human intervention or central control, using the collaboration of distributed agents to guard against intrusions and malicious actors within the system;
- Distribution techniques: study additional techniques that support integrating the methods above into distributed CPSs, such as (1) blockchain technologies, like a distributed ledger, to enhance shared knowledge across the industrial system; (2) task/service allocation strategies to optimize the distribution of industrial process functionalities and improve process performance; and (3) failure detection and mitigation in distributed systems;
- Fault-tolerant techniques: study techniques to detect and mitigate failures, such as heartbeats, redundancy, parallelism, and failover, to be integrated into distributed systems and applications in accordance with the IEC 61499 standard;
- Software integration in distributed systems: understand how to integrate and implement these techniques in distributed CPSs, with a focus on modular and distributed software tailored for industrial environments. This work also builds software development skills through continuous practice with diverse techniques and communication protocols;
- Experimental tests: understand how experimental testing runs alongside software development to build a reliable system for industrial environments;
- Research design: understand how to design and execute an experimental process to address complex, open research questions.
Work Plan - Semester 1
Work Plan, 1st semester: 16 hrs/week
[01/09/2024 to 20/10/2024] Literature review
Study the concepts to be used in the internship, namely task/service allocation, consensus algorithms, self-* capabilities, fault tolerance, and distributed ledgers.
[21/10/2024 to 30/11/2024] Definition of a collaborative methodology for distributed applications
Define the multi-agent, autonomous methodology for allocating the tasks/microservices of an industrial application (process) across the instances of the distributed system (RPIs), without relying on a central resource for decision-making or management. More specifically, this means defining the secure collaborative technique used to distribute the functionalities of a specific industrial process. This includes defining the decision-making function for task distribution, based on the characteristics of each instance (available CPU, RAM, memory, among others), and specifying how distributed ledger information is used within the system.
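As an illustration of what such a decision-making function could look like, the following is a minimal C++ sketch, not the methodology to be defined in the internship. It assumes every agent holds the same set of resource reports (for example, read from the shared ledger) and applies the same deterministic rule, so all agents reach the same placement without a coordinator. The names `InstanceReport`, `TaskDemand`, `placement_score`, and `select_host`, the weights, and the reading of "memory" as storage are all assumptions made for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical resource snapshot that each instance shares with its peers.
struct InstanceReport {
    std::string id;          // unique instance name
    double cpu_free;         // available CPU as a fraction, 0.0 to 1.0
    double ram_free_mb;      // available RAM in MiB
    double storage_free_mb;  // available storage in MiB (assumed meaning of "memory")
};

// Hypothetical resource demand of one task or microservice.
struct TaskDemand {
    double cpu;
    double ram_mb;
    double storage_mb;
};

// Fraction of a resource left over after placing the task (0 if none is free).
double headroom(double free_amount, double needed) {
    return free_amount > 0.0 ? (free_amount - needed) / free_amount : 0.0;
}

// Returns a negative score if the instance cannot host the task; otherwise a
// weighted sum of the remaining headroom. The weights are illustrative only.
double placement_score(const InstanceReport& r, const TaskDemand& t) {
    if (r.cpu_free < t.cpu || r.ram_free_mb < t.ram_mb ||
        r.storage_free_mb < t.storage_mb) {
        return -1.0;
    }
    return 0.5 * (r.cpu_free - t.cpu)
         + 0.3 * headroom(r.ram_free_mb, t.ram_mb)
         + 0.2 * headroom(r.storage_free_mb, t.storage_mb);
}

// Picks the feasible instance with the highest score. Ties go to the
// lexicographically smallest id so every agent makes the same choice.
// Returns an empty string when no instance can host the task.
std::string select_host(const std::vector<InstanceReport>& reports,
                        const TaskDemand& task) {
    const InstanceReport* best = nullptr;
    double best_score = -1.0;
    for (const auto& r : reports) {
        const double s = placement_score(r, task);
        if (s < 0.0) continue;  // instance lacks a required resource
        if (!best || s > best_score || (s == best_score && r.id < best->id)) {
            best = &r;
            best_score = s;
        }
    }
    return best ? best->id : std::string();
}
```

Because the rule is deterministic and uses only shared data, agents that see the same reports agree on the host. In practice the reports may be stale or inconsistent across agents, which is where a consensus step would be needed.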
[01/12/2024 to 01/02/2025] Integration of initial methodology
The methodology defined for distributing/allocating process tasks or microservices will be integrated into Databridge, the modular industrial software developed by Oncontrol Technologies. This requires an initial integration with the Databridge software, followed by the implementation of the methodology. The software is developed in C++, and communication between instances/resources is handled with the ZeroMQ messaging library.
Work Plan - Semester 2
Work Plan, 2nd semester: 35 hrs/week
[01/02/2025 to 01/03/2025] Tests and continuous optimization of methodology
Testing begins, alongside continuous optimization of the methodology and of its integration into the Databridge software. Initial results will be collected across different industrial applications.
[01/03/2025 to 17/04/2025] Development of fault-tolerant methodologies
Fault-tolerant techniques, such as redundancy, parallelism, and failover, will be developed, implemented, and tested on the applications used in the previous phase. These techniques should follow an autonomous methodology similar to the one used for service distribution, without human intervention, but based on replicating functionalities/microservices across other resources. The system must adapt dynamically when it detects failures within the CPS, ensuring consistent performance and reliable execution of the industrial process or application. Failure detection and failover techniques will be implemented to keep the system resilient and reliable.
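To make the failure detection and failover idea concrete, the following is a minimal C++ sketch of a timeout-based heartbeat monitor combined with replica selection. It is an illustration under stated assumptions, not the design to be developed: the class `HeartbeatMonitor`, the function `pick_active_replica`, the fixed timeout, and the ordered replica list are all hypothetical. Time is passed in explicitly as milliseconds so the logic is independent of any clock or transport.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Tracks the last heartbeat seen from each peer and suspects a peer of
// failure once no heartbeat has arrived within the timeout.
class HeartbeatMonitor {
public:
    explicit HeartbeatMonitor(std::int64_t timeout_ms) : timeout_ms_(timeout_ms) {}

    // Record a heartbeat received from `peer` at time `now_ms`.
    void on_heartbeat(const std::string& peer, std::int64_t now_ms) {
        last_seen_[peer] = now_ms;
    }

    // A peer is alive if it is known and its last heartbeat is recent enough.
    bool is_alive(const std::string& peer, std::int64_t now_ms) const {
        const auto it = last_seen_.find(peer);
        return it != last_seen_.end() && now_ms - it->second <= timeout_ms_;
    }

    // All known peers whose heartbeat is overdue at time `now_ms`.
    std::vector<std::string> suspected(std::int64_t now_ms) const {
        std::vector<std::string> out;
        for (const auto& entry : last_seen_) {
            if (now_ms - entry.second > timeout_ms_) out.push_back(entry.first);
        }
        return out;
    }

private:
    std::int64_t timeout_ms_;
    std::map<std::string, std::int64_t> last_seen_;
};

// Failover rule: a service replicated on several instances is served by the
// first replica in the list that is still alive. Returns an empty string if
// every replica is suspected.
std::string pick_active_replica(const std::vector<std::string>& replicas,
                                const HeartbeatMonitor& monitor,
                                std::int64_t now_ms) {
    for (const auto& id : replicas) {
        if (monitor.is_alive(id, now_ms)) return id;
    }
    return std::string();
}
```

For example, with a 3000 ms timeout and replicas listed as rpi-a then rpi-b, the service runs on rpi-a while its heartbeats arrive and moves to rpi-b once rpi-a has been silent for more than 3000 ms. A fixed timeout can wrongly suspect a slow but healthy peer, so a real deployment would likely need agents to agree on a suspicion before acting on it.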
[18/04/2025 to 08/05/2025] Explore and compare the results of different industrial applications
Process, explore, and analyze the resulting data and the dynamic behavior of the multi-agent system with the implemented techniques. Understand how the system responds in the presence of faults and observe the dynamic allocation of functionalities (function blocks or microservices).
[09/05/2025 to 05/07/2025] Write the thesis.
Conditions
Depending on how the internship progresses, studentship funding may be available to support the work. The work will be carried out at the laboratories of Oncontrol Technologies and CEMMPRE at UC. The thesis supervisor is Tiago Cruz, Associate Professor at the Department of Informatics Engineering of the University of Coimbra.
A workplace will be provided, as well as the necessary computational resources. This internship is associated with the WOOSU project and may be supported by a research grant. Monthly grant amount: 1040.98 €. Applications are open until 2025-06-06 at https://apply.uc.pt/IT137-25-170
Remarks
For any questions, please contact jorge.alves@oncontrol-tech.com or jerome.mendes@uc.pt, or call +351 239 821 117 between 10:00 and 18:00.
Supervisor
Pedro Angelo Morais de Sousa
pedro.sousa@oncontrol-tech.com