Proposal without an assigned student

DEI - FCTUC
Generated on 2024-05-07 09:46:08 (Europe/Lisbon).

Internship Title

DS4NP 2.0: Machine Learning Microservices for the Data Science for Non-Programmers Platform

Areas of Specialization

Software Engineering

Intelligent Systems

Internship Location

DEI

Background

The leading consulting company McKinsey estimates that there will be a shortage of the data scientists organizations need to exploit the full potential of big data. While the United States may face a shortage of 140,000 to 190,000 professionals with strong analytical skills and the know-how to analyze big data, Portugal may face an even more dramatic shortage. In contrast to US universities, which have offered Data Science degrees for several years (e.g., Berkeley and Carnegie Mellon), Portuguese universities are just taking their first steps in the area.

This shortage of professionals cannot be mitigated easily, since training students to become data scientists requires time and resources to teach skills from diverse knowledge areas such as Computer Science, Statistics, Business, and Data Visualization.

Hence, the objective of the FCT DataScience4NP project is to explore the use of visual programming paradigms to enable non-programmers to be part of the Data Science workforce. More specifically, the project aims to build Cloud Native Applications (CNA) for Data Science using microservices.

Objective

This thesis will build on the current version of the DS4NP cloud- and microservices-based machine learning platform (built on technologies such as Kubernetes, Docker, and Netflix Conductor for orchestration). In particular, the main goals are the following:
• To redefine the current architecture to address existing issues, namely low computational performance, particularly in data exchange.
• To adapt the platform to support Big Data algorithms, using frameworks such as Apache Spark or Google TensorFlow, and to populate it with a large set of classical algorithms covering the complete machine learning pipeline.
• To prepare the current service for future commercial exploitation in the cloud.
• To perform thorough usability testing.


Technologies to use:
• Cloud: Kubernetes, Docker, OpenStack (the DEI cloud)
• Orchestrator agent: Netflix Conductor (along with Redis and Elasticsearch)
• Machine learning libraries: Weka, scikit-learn, Apache Spark, Google TensorFlow
• Other technologies: MongoDB, Flask applications, ReactJS (frontend)

Work Plan - Semester 1

- Review of the state of the art in container technologies and machine learning algorithms
- Analysis of the current DS4NP platform
- Requirement analysis (including both functional and non-functional requirements)
- System architecture
- Writing of the preliminary thesis

Work Plan - Semester 2

- System development
- System testing
- Writing of the final thesis
- Writing of a scientific article

Conditions

Nothing to report.

Supervisors

Rui Pedro Paiva, Filipe Araújo
ruipedro@dei.uc.pt