Internship Title
A Reinforcement Learning from Human Feedback (RLHF)-based Approach for the Nexus Haulier Capacity Matching Problem of Porto de Sines
Areas of Specialization
Intelligent Systems
Internship Location
Centre for Informatics and Systems of the University of Coimbra (CISUC), at the Department of Informatics Engineering of the University of Coimbra
Background
The NEXUS Agenda consortium, led by the Port of Sines, comprises 35 partners who share a common goal of developing innovative solutions to achieve a Green and Digital Transition Agenda. The consortium represents the entire value chain, including port authorities, maritime and terminal operators, railway operators, carriers, dry ports, logistics operators, technology suppliers, importers, exporters, as well as universities and research institutes. The expertise and skills of these partners play a crucial role in realizing this pioneering Agenda.
Within the NEXUS framework, a key objective is to harness the vast pool of high-quality data available for the development of Big Data Analytics and Artificial Intelligence Solutions. This internship opportunity is part of Work Package 2 (WP2), which focuses on creating specialized multimodal network and logistics applications to tackle this challenge. All resulting products and services will be designed and demonstrated in collaboration with end users, including terminal operators, road and rail operators, dry ports, and authorities within the multimodal logistics networks.
One specific task within WP2 aims to optimize the utilization of haulier resources, leading to cost savings and faster shipments. This task involves leveraging data collected from other sources to gain insights into the current state of logistic networks. By applying artificial intelligence methods, we aim to forecast the supply and demand of haulier services across the entire logistics network. These forecasts, combined with optimization algorithms, will maximize resource utilization to their highest possible capacity. This endeavor encompasses various challenges such as developing accurate forecasting mechanisms, efficient data collection, problem characterization, modeling, and solving the optimization problem. The proposed activities stem from these challenges and seek to address them effectively.
In today's rapidly evolving world, the importance of Artificial Intelligence (AI) has become increasingly prominent. AI technologies have revolutionized various industries and sectors, driving innovation, efficiency, and improved decision-making processes. Within the realm of AI, Reinforcement Learning (RL) stands out as a powerful paradigm that enables machines to learn and make decisions through interaction with an environment. RL has been widely applied to solve complex problems and optimize outcomes in diverse domains. Furthermore, a recent advancement in RL known as Reinforcement Learning from Human Feedback (RLHF) has garnered significant attention (e.g., in the training of ChatGPT). RLHF combines the strengths of RL and human expertise, allowing human feedback to guide and accelerate the learning process of AI systems. This collaborative approach not only enhances the performance of AI algorithms but also opens up new possibilities for human-AI collaboration in solving complex real-world challenges.
The problem of haulier capacity matching involves efficiently matching available hauliers (trucks or carriers) with transportation demands or shipments. This process requires considering factors such as haulier availability, shipment characteristics, geographical constraints, delivery deadlines, and specific requirements.
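To make the problem concrete, the following minimal Python sketch shows one possible way to represent hauliers, shipments, and the hard feasibility constraints of an assignment. All class names, fields, and units are hypothetical simplifications introduced for illustration, not the project's actual data model.

```python
# A minimal, illustrative data model for the haulier capacity matching problem.
# All class names, fields, and units are hypothetical simplifications.
from dataclasses import dataclass

@dataclass(frozen=True)
class Haulier:
    haulier_id: str
    capacity_teu: int            # free capacity, e.g. in TEU
    operating_region: str
    available_from_h: float      # availability window, in hours from now
    available_until_h: float
    certifications: frozenset = frozenset()   # e.g. {"refrigerated", "hazmat"}

@dataclass(frozen=True)
class Shipment:
    shipment_id: str
    size_teu: int
    origin_region: str
    pickup_deadline_h: float     # latest acceptable pickup time, in hours from now
    requirements: frozenset = frozenset()

def is_feasible(h: Haulier, s: Shipment) -> bool:
    """Hard constraints that any haulier-to-shipment assignment must satisfy."""
    return (
        h.capacity_teu >= s.size_teu                      # enough capacity
        and h.operating_region == s.origin_region         # geographical constraint
        and h.available_from_h <= s.pickup_deadline_h     # can pick up before the deadline
        and s.requirements <= h.certifications            # special requirements covered
    )
```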
Reinforcement Learning from Human Feedback (RLHF) offers several benefits for haulier capacity matching:
Complex and Dynamic Environment: Haulier capacity matching involves a dynamic environment with varying capacities and rapidly changing shipment demands. RLHF enables the AI system to adapt and make context-aware decisions by leveraging human feedback, resulting in more informed choices.
Expertise and Domain Knowledge: Human experts, such as logistics managers or hauliers, possess valuable knowledge about optimizing capacity matching. RLHF allows the AI system to learn from their expertise and adapt its decision-making accordingly, leveraging domain knowledge for more accurate matching decisions.
Subjective Preferences and Constraints: Haulier capacity matching often involves subjective preferences and constraints, such as preferred routes or handling requirements. RLHF enables the system to learn from human feedback on these aspects, aligning recommendations with hauliers' and shippers' preferences and constraints. This personalized approach enhances customer satisfaction and operational efficiency.
Iterative Learning and Improvement: Haulier capacity matching is an iterative process due to changing demands and market conditions. RLHF facilitates an iterative feedback loop where the AI system continuously learns from human feedback, refining its matching strategies and adapting to new patterns or requirements. This iterative learning process improves matching capabilities over time, leading to more accurate and efficient haulier capacity allocations.
Objective
The aim of this project is to build an AI agent based on RL, and more precisely on RLHF, for the haulier capacity matching problem of the Port of Sines. The research objectives to accomplish this aim are:
1. Initial Policy Design: Initially, an AI system can be trained using traditional RL techniques to develop an initial policy for haulier capacity matching. This policy represents the AI system's decision-making strategy for matching hauliers with transportation demands.
2. Human Expertise and Feedback: Human experts, such as experienced logistics managers or hauliers, can provide feedback on the AI system's initial policy. They can evaluate the quality of the haulier-to-demand assignments, identify potential issues, or suggest improvements based on their domain knowledge and expertise.
3. Reward Model Design: Based on the human feedback, a reward model can be constructed to quantify the desirability of different haulier-to-demand assignments. The reward model serves as a guide for the AI system to learn from the human experts' preferences and optimize its decision-making process (see the first sketch after this list).
4. RLHF Training: The AI system is then trained using a combination of the initial policy and the reward model derived from human feedback. The system learns to adjust its haulier-to-demand assignments based on the reward signals provided by the human feedback. This collaborative learning process improves the system's capacity matching capabilities (see the second sketch at the end of this section).
5. Iterative Feedback Refinement: The AI system's performance is evaluated, and the process of collecting human feedback and refining the reward model is repeated iteratively. The system continuously incorporates the insights and expertise of human operators, allowing for ongoing improvement and adaptation to changing haulier capacity matching requirements.
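As an illustration of objective 3, the sketch below fits a simple linear reward model to pairwise human preferences between candidate assignments (a Bradley-Terry-style formulation). The feature encoding, function names, and hyperparameters are assumptions made for this sketch only, not a commitment to a specific design.

```python
# Sketch of a reward model fitted to pairwise human preferences between
# candidate assignments (a Bradley-Terry-style formulation). Each candidate
# assignment is assumed to be encoded as a numeric feature vector; the
# encoding, learning rate, and number of epochs are arbitrary choices here.
import numpy as np

def _sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def fit_reward_model(preferred: np.ndarray, rejected: np.ndarray,
                     lr: float = 0.05, epochs: int = 500) -> np.ndarray:
    """preferred[i] and rejected[i] are the feature vectors of two candidate
    assignments where the human expert preferred the first over the second.
    Returns a weight vector w such that the learned reward is r(x) = w @ x."""
    preferred = np.asarray(preferred, dtype=float)
    rejected = np.asarray(rejected, dtype=float)
    w = np.zeros(preferred.shape[1])
    for _ in range(epochs):
        # Model: P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected))
        p = _sigmoid(preferred @ w - rejected @ w)
        # Gradient of the mean negative log-likelihood of the observed preferences
        grad = ((p - 1.0)[:, None] * (preferred - rejected)).mean(axis=0)
        w -= lr * grad
    return w

def reward(w: np.ndarray, assignment_features: np.ndarray) -> float:
    """Learned desirability of a single haulier-to-demand assignment."""
    return float(w @ assignment_features)
```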
By leveraging RLHF in the haulier capacity matching problem, the AI system benefits from the domain knowledge and expertise of human operators. The system learns from human feedback, refines its decision-making process, and aligns the haulier-to-demand assignments with the preferences and requirements of both hauliers and shippers. This collaborative approach enhances the efficiency and accuracy of haulier capacity matching, leading to optimized resource utilization, improved logistics operations, and enhanced customer satisfaction.
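Finally, the second sketch illustrates how the learned reward model could drive the RLHF training step of objective 4: a softmax policy over the feasible hauliers for a shipment is updated with a REINFORCE-style gradient, using the reward model's score as the reward signal. This is one simple possibility among many; the linear policy form, feature shapes, and hyperparameters are hypothetical.

```python
# Sketch of an RLHF-style policy update for one shipment: a softmax policy over
# the feasible hauliers, scored by a linear function of assignment features, is
# updated with a REINFORCE gradient using the learned reward model as the
# reward signal. All names and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def _softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def policy_probs(theta: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """candidates: one feature vector per feasible haulier-to-shipment pair."""
    return _softmax(candidates @ theta)

def rlhf_step(theta: np.ndarray, candidates: np.ndarray,
              reward_w: np.ndarray, lr: float = 0.01) -> np.ndarray:
    """Sample one assignment, score it with the reward model, update the policy."""
    probs = policy_probs(theta, candidates)
    i = rng.choice(len(probs), p=probs)
    r = float(reward_w @ candidates[i])        # reward model learned from human feedback
    # For a softmax-linear policy, grad log pi(i) = x_i - E_pi[x]
    grad_log_pi = candidates[i] - probs @ candidates
    return theta + lr * r * grad_log_pi

# In the iterative loop of objectives 4-5, this step would be repeated over many
# shipments and episodes, with new human preferences collected periodically and
# the reward model refitted before further policy updates.
```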
Work Plan - Semester 1
1- State of the art [Sept – Oct]
2- Problem statement, research aims and objectives [Nov]
3- Design and first implementation of the AI system [Nov – Jan]
4- Thesis proposal writing [Dec – Jan]
Work Plan - Semester 2
5- Improvement of the AI system [Feb – Apr]
6- Experimental Tests [Apr – May]
7- Paper writing [May – Jun]
8- Thesis writing [Jan – Jul]
Conditions
The work will take place at the Centre for Informatics and Systems of the University of Coimbra (CISUC), at the Department of Informatics Engineering of the University of Coimbra, within the scope of the Nexus research project (https://nexuslab.pt/).*
A scholarship of €930.98 per month is foreseen for 6 months. The attribution of the scholarship is subject to a public application process.
*Project "Agenda Mobilizadora Sines Nexus". ref. No. 7113, supported by the Recovery and Resilience Plan (PRR) and by the European Funds Next Generation EU, following Notice No. 02/C05-i01/2022, Component 5 - Capitalization and Business Innovation - Mobilizing Agendas for Business Innovation.
The candidate must have a very good background in Artificial Intelligence, especially in the areas of Artificial Intelligence covered by this internship.
Remarks
Advisors:
Luís Macedo, Filipe Araújo.
Advisor
Luís Macedo
macedo@dei.uc.pt