Proposals without a student

DEI - FCTUC
Generated on 2024-12-04 19:01:04 (Europe/Lisbon).

Internship Title

Creation of an Automated Machine Learning Module for Data Scientists

Areas of Specialization

Software Engineering

Intelligent Systems

Internship Location

Coimbra

Background

The way humans interact with computer systems is changing. Computer systems are becoming more intelligent in how they communicate with people, and recent advances in Natural Language Processing have greatly improved speech recognition and conversational interaction, as seen in home devices such as Amazon Echo and Google Home. The era of personal digital assistants, such as Siri and Cortana, has arrived. However, more complex areas remain to be addressed, namely requesting data analysis through a Natural Language Interface (NLI), which can be a standard conversational channel, a search engine, or a voice-enabled device. At Critical Software, we are exploring how NLIs can help Data Scientists do their job faster and better. This is where Automated Machine Learning (AutoML) comes into play. The goal of this internship is to integrate AutoML into a Data Science module that is part of an NLI platform we are developing.

Critical Software is developing an NLI platform: a search-based platform that lets users ask analytical questions in natural language, such as “what are my top 5 customers in USA?” or “list sales product TVs in Japan”. The challenge addressed in this internship is: how can this platform be useful in a Data Scientist’s job? Can a Data Scientist request analyses in natural language? How can AutoML be used in this platform to take advantage of the user-friendliness of natural language?
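
To make the idea of an analytical question in natural language more concrete, the following is a minimal sketch of how one narrow family of questions ("top N ... in ...") might be turned into a structured query. It is purely illustrative and does not describe how Critical's platform works: the `TopNQuery` type, the `parse_top_n` function and the regular expression are all hypothetical names and choices introduced here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TopNQuery:
    """Hypothetical structured form of a 'top N <entity> in <country>' question."""
    entity: str
    limit: int
    country: Optional[str]


# Assumed pattern for illustration only: "... top <N> <entity> [in <country>][?]"
_TOP_N = re.compile(
    r"\btop\s+(?P<limit>\d+)\s+(?P<entity>\w+)"
    r"(?:\s+in\s+(?P<country>[\w ]+?))?\s*\??$",
    re.IGNORECASE,
)


def parse_top_n(question: str) -> Optional[TopNQuery]:
    """Return a TopNQuery if the question matches the pattern, else None."""
    match = _TOP_N.search(question.strip())
    if match is None:
        return None
    return TopNQuery(
        entity=match.group("entity").lower(),
        limit=int(match.group("limit")),
        country=match.group("country"),
    )


print(parse_top_n("what are my top 5 customers in USA?"))
print(parse_top_n("list sales product TVs in Japan"))
```

The second question does not fit the pattern, so the function returns `None`. A real NLI would need a far more general approach than a single regular expression.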

Objective

The main goal of this internship is to develop a module in Critical’s platform that automates several of the tasks Data Scientists perform in their daily work (an approach known as AutoML). These tasks fall into several areas:
- Data Understanding (statistical analysis, for example);
- Data Preprocessing (feature engineering, for example);
- Model Creation and Evaluation (automatically selecting the best algorithm and its parameters for several machine learning tasks, such as classification, clustering, or forecasting).
The engine must also cope with scalability and large data volumes.
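
The three areas above can be illustrated with a very small sketch in plain Python. It is not the module to be developed. It assumes a toy synthetic dataset and a single algorithm (k-nearest neighbours), and the "automatic selection" is limited to choosing `k` by cross-validation. All function names (`summarize`, `standardize`, `select_best_k`, and so on) are hypothetical.

```python
import random
import statistics
from collections import Counter


def summarize(rows):
    """Data understanding: (mean, population std) for each feature column."""
    return [(statistics.mean(c), statistics.pstdev(c)) for c in zip(*rows)]


def standardize(rows, stats):
    """Data preprocessing: rescale each feature to zero mean, unit variance."""
    return [
        [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]


def knn_predict(train_x, train_y, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], point)),
    )
    return Counter(label for _, label in nearest[:k]).most_common(1)[0][0]


def cross_val_accuracy(x, y, k, folds=5):
    """Accuracy of k-NN over interleaved folds (sample i is in fold i % folds)."""
    correct = 0
    for fold in range(folds):
        train_x = [r for i, r in enumerate(x) if i % folds != fold]
        train_y = [l for i, l in enumerate(y) if i % folds != fold]
        for i in range(fold, len(x), folds):
            correct += knn_predict(train_x, train_y, x[i], k) == y[i]
    return correct / len(x)


def select_best_k(x, y, candidates=(1, 3, 5, 7)):
    """Model selection: score each candidate k and keep the best one.
    On ties, the first candidate with the top score wins."""
    scores = {k: cross_val_accuracy(x, y, k) for k in candidates}
    return max(scores, key=scores.get), scores


def make_blobs(n_per_class=30, seed=0):
    """Synthetic two-class data whose features have very different scales."""
    rng = random.Random(seed)
    x, y = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (4.0, 40.0)]):
        for _ in range(n_per_class):
            x.append([rng.gauss(cx, 1.0), rng.gauss(cy, 10.0)])
            y.append(label)
    return x, y


x, y = make_blobs()
stats = summarize(x)
# For brevity the scaling statistics use the whole dataset; a careful
# pipeline would compute them on each training fold only.
best_k, scores = select_best_k(standardize(x, stats), y)
print("feature stats:", stats)
print("best k:", best_k, "scores:", scores)
```

A production AutoML engine would search over many algorithms and hyperparameters and would have to address the scalability concerns mentioned above, which this sketch ignores.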

Work Plan - Semester 1

The first semester comprises the following stages:
- Defining the Scope and Requirements of the Modules to be Developed [result: requirement list, M1 and M2]
- Reviewing the Literature and Writing the State of the Art [result: state of the art, M1 to M4]
- Study of the NLI platform [result: platform description, M1 to M4]
- Creating the Technical Specification [result: technical specification, M5 and M6]
- Writing the internship proposal [result: internship proposal, M5 and M6]

Work Plan - Semester 2

The second semester comprises the following stages:
- Setting up the Development Environment [result: Development Environment, M6]
- Development of the Modules [result: first prototype, M7 to M9]
- Validation and Verification [result: second prototype, M10]
- Writing the internship report [result: internship report, M10 and M11]

Conditions

A laptop and a workplace are provided.

Supervisor

Paulo Gomes
paulo.gomes@criticalsoftware.com