Internship Title
ANSA: Conversational AI for Data Scientists
Areas of Specialization
Intelligent Systems
Software Engineering
Internship Location
Coimbra
Background
The way humans interact with computer systems is changing. Computer systems are communicating with people more intelligently, and recent advances in Natural Language Processing have greatly improved speech recognition and conversational interaction, as seen in bots and home devices (Echo and Google Home). Personal assistants such as Siri and Cortana have arrived. However, more complex areas remain to be addressed, namely asking a personal assistant for data analysis.
Some of the tasks a Data Scientist performs are routine and can be automated, for instance the first pass over a dataset, which involves only statistical analysis. Critical Software wants to integrate this type of automation into the ANSA platform it is developing. ANSA is a conversational/search platform that lets the user ask analytical questions and run queries in natural language, such as “what are my top 5 customers in USA?” or “list sales product X in Japan”. Critical wants to automate the simpler parts of the Data Scientist's analysis work through a conversational interface (in this case a search box or a chatbot). Imagine typing a query such as “model the variable sales” into ANSA’s search box and receiving a machine learning model of the data, or typing “segment customers kmeans” and seeing the main customer clusters produced by the k-means algorithm. Beyond this, the platform should go a step further and automatically analyse the dataset that the Data Scientist is working on.
This internship also aims to provide the Data Scientist with interesting insights obtained through machine learning algorithms. The main goals are therefore:
- Providing the Data Scientist with a basic analysis of the dataset being studied.
- Giving the Data Scientist an easy search-based (or conversational) interface for requesting machine learning analysis.
- Giving the user insights into the dataset through machine learning analysis.
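The search-based interface described above could start from a small command parser. The following is a minimal sketch, assuming a regex-based grammar; the `Intent` class, the `PATTERNS` table and `parse_query` are hypothetical names introduced for illustration and do not describe ANSA's actual implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Structured form of a search-box query (hypothetical schema)."""
    action: str   # e.g. "model" or "segment"
    target: str   # variable or entity the action applies to
    options: dict = field(default_factory=dict)


# Hypothetical patterns, one per supported command; not ANSA's real grammar.
PATTERNS = [
    ("model", re.compile(
        r"^model\s+(?:the\s+)?(?:variable\s+)?(?P<target>\w+)$")),
    ("segment", re.compile(
        r"^segment\s+(?P<target>\w+)(?:\s+(?P<algorithm>\w+))?$")),
]


def parse_query(query):
    """Return an Intent for a recognised query, or None otherwise."""
    text = " ".join(query.lower().split())  # normalise case and whitespace
    for action, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            groups = match.groupdict()
            target = groups.pop("target")
            # Keep only the optional parts that were actually typed.
            options = {k: v for k, v in groups.items() if v is not None}
            return Intent(action, target, options)
    return None  # fall through to the general natural-language search


print(parse_query("model the variable sales"))
print(parse_query("segment customers kmeans"))
```

Queries that match no pattern return `None`, so under this assumption they could be handed to the platform's existing natural-language search rather than rejected.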
Objective
The main goal of this internship is to develop an engine that provides three main functionalities: basic statistical analysis of datasets; question/query-based machine learning analysis; and engine-generated insights based on machine learning. The engine must handle the English language and business information as a personal assistant would, answering with text, charts, tables and commands as appropriate. This goal can be subdivided into:
- Defining the Scope and Engine Main Characteristics
- Creating the Technical Specification and Selecting the Development Platform
- Developing the Solution
- Testing and Benchmarking the Solution
This internship will be part of a larger ongoing project at Critical Software (the ANSA project), for which a base platform already exists.
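The first of the three functionalities, basic statistical analysis of a dataset, can be illustrated with a short sketch. It assumes the dataset arrives as a list of dictionaries; the `describe` function and the sample rows are hypothetical and use made-up figures.

```python
import statistics


def describe(rows):
    """Summarise each numeric column of a list of dict rows."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for column in columns:
        values = [
            row[column] for row in rows
            if isinstance(row.get(column), (int, float))
            and not isinstance(row.get(column), bool)
        ]
        if not values:
            continue  # skip non-numeric columns such as names
        summary[column] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
            # Sample standard deviation needs at least two observations.
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


# Made-up illustrative data, not taken from any real dataset.
sales = [
    {"country": "USA", "sales": 120.0},
    {"country": "Japan", "sales": 80.0},
    {"country": "Portugal", "sales": 100.0},
]
report = describe(sales)
print(report)
```

A summary of this kind could be rendered as a table or chart in the engine's answer; the choice of statistics here is only one plausible starting set.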
Work Plan - Semester 1
The first semester comprises the following stages:
- Defining the Scope and Engine Main Characteristics [result: requirement list, M1 and M2]
- Reading and Writing the State of the Art [result: state of the art, M1 to M4]
- Study of the ANSA platform [result: platform description, M1 to M4]
- Creating the Technical Specification [result: technical specification, M5 and M6]
- Writing the internship proposal [result: internship proposal, M5 and M6]
Work Plan - Semester 2
The second semester comprises the following stages:
- Setting up the Development Environment [result: Development Environment, M6]
- Development [result: first prototype, M7 to M9]
- Testing and Benchmarking [result: second prototype, M10]
- Writing the internship report [result: internship report, M10 and M11]
Conditions
A computer and a workstation are provided.
Supervisor
Paulo Gomes
paulo.gomes@criticalsoftware.com