Internship Title
Development of an Orchestration Engine for the DS4NP Platform
Areas of Specialization
Software Engineering
Internship Location
DEI
Background
The consulting firm McKinsey estimates that there will be a shortage of the data scientists that organizations need to explore the full potential of big data. While the United States may face a shortage of 140,000 to 190,000 professionals with strong analytical skills and the know-how to analyze big data, Portugal may face an even more dramatic shortage. Unlike US universities, which have offered Data Science degrees for several years (e.g., Berkeley and Carnegie Mellon), Portuguese universities are only taking their first steps in the area.
This shortage of professionals cannot be mitigated easily, since training students to become data scientists requires time and resources to teach skills from diverse knowledge areas such as Computer Science, Statistics, Business, and Data Visualization.
Hence, the objective of the FCT DataScience4NP (Data Science for Non-Programmers) project is to explore the use of visual programming paradigms to enable non-programmers to be part of the Data Science workforce. More specifically, the project aims to build Cloud Native Applications (CNA) for Data Science using microservices.
Objective
This thesis will build on the current version of DS4NP, a cloud and microservices-based machine learning platform built on technologies such as Kubernetes, Docker, and Netflix Conductor (for orchestration).
In particular, the main goal is to build an orchestration engine, based on Netflix Conductor, that overcomes Conductor's current limitations regarding the maximum number of parallel tasks and that manages and executes the complex workflows driving data science applications. The developed orchestrator will replace the current orchestrator of the DS4NP platform.
Data Science applications can be described as processes (workflows) that consist of multiple distinct, interconnected analytical microservices provided as a service. Data scientists can describe these processes and upload them to an orchestrator (workflow engine), which takes care of state management, correct execution order, parallelism, and synchronization. The use of an orchestrator facilitates the control and visualization of the interactions between the microservices. To make processes reusable and adaptable, workflow templates and variability modeling are key ingredients to explore.
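The responsibilities listed above (state management, execution order, parallelism, and synchronization) can be illustrated with a minimal sketch in Python. This is a toy, in-process orchestrator, not the DS4NP implementation and not the Netflix Conductor API. The names run_workflow, tasks, dependencies, and max_parallel are hypothetical, and plain Python callables stand in for the analytical microservices.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies, max_parallel=4):
    """Run tasks in dependency order, executing independent tasks in parallel.

    tasks: dict mapping task name -> callable(inputs) -> result, where
           inputs holds the results of the task's direct dependencies.
    dependencies: dict mapping task name -> set of task names it depends on.
    Returns (results, state); state maps each task name to its status.
    """
    # Every task becomes a node, including tasks with no dependencies.
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    unknown = {dep for deps in graph.values() for dep in deps} - set(tasks)
    if unknown:
        raise ValueError(f"dependencies on undefined tasks: {sorted(unknown)}")

    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle

    results = {}
    state = {name: "PENDING" for name in tasks}  # state management
    running = {}  # maps each in-flight future to its task name

    # max_parallel is an illustrative cap on concurrently running tasks.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        while sorter.is_active():
            # Execution order: only tasks whose dependencies are done are ready.
            for name in sorter.get_ready():
                inputs = {dep: results[dep] for dep in graph[name]}
                state[name] = "RUNNING"
                running[pool.submit(tasks[name], inputs)] = name
            # Synchronization: block until at least one running task finishes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()  # re-raises a task's exception
                state[name] = "COMPLETED"
                sorter.done(name)  # may unblock downstream tasks
    return results, state


if __name__ == "__main__":
    # Hypothetical workflow: "scale" and "stats" both depend only on "load",
    # so they may run in parallel; "report" waits for both of them.
    tasks = {
        "load": lambda inputs: [4.0, 1.0, 2.0],
        "scale": lambda inputs: [x / max(inputs["load"]) for x in inputs["load"]],
        "stats": lambda inputs: {"n": len(inputs["load"])},
        "report": lambda inputs: (inputs["stats"]["n"], sorted(inputs["scale"])),
    }
    dependencies = {
        "scale": {"load"},
        "stats": {"load"},
        "report": {"scale", "stats"},
    }
    results, state = run_workflow(tasks, dependencies)
    print(results["report"])  # (3, [0.25, 0.5, 1.0])
```

In a real deployment each callable would be replaced by a call to a microservice, and the state would be persisted rather than kept in memory. The sketch only shows how a dependency graph determines which steps may run concurrently and where they must be joined.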
Work Plan - Semester 1
- Review of the state of the art and of the technologies for orchestration engines
- Analysis of the current DS4NP platform
- Requirements analysis (including both functional and non-functional requirements)
- System architecture
- Writing of the preliminary thesis
Work Plan - Semester 2
- System development
- System testing
- Writing of the final thesis
- Writing of a scientific article
Conditions
--
Supervisors
Filipe Araújo and Rui Paiva
filipius@uc.pt