Assigned Proposals

Generated on 2025-03-13 07:16:11 (Europe/Lisbon).

Internship Title

BusinessExtract: Automatically Extracting Information from Business Reports

Areas of Specialization

Software Engineering

Intelligent Systems

Internship Location



Today, the ratio of unstructured information (documents in natural language) to structured information (databases, ontologies, etc.) is estimated at about 80% to 20%, and the share of unstructured information is expected to keep growing. Despite this clear evidence, the processing of unstructured data has not been fully mastered, and Natural Language Processing (NLP) is still an evolving area. The latest developments in NLP techniques and resources, along with new Machine Learning techniques such as Deep Learning, are moving the processing of unstructured information forward. Critical Software systems deal with several unstructured information sources and would benefit greatly from NLP techniques such as Text Mining. These techniques can extract and enrich structured information that computer systems can process and analyse, helping humans to better process unstructured information such as documents.


The main goal of this internship is to develop a system capable of automatically extracting structured information from business reports and other documents. The system has to handle the English language and must be easy to integrate with several Critical Software solutions, allowing these systems to use the extracted information to become more efficient and effective. This goal can be subdivided into:
- Defining the Scope of the Extraction System and the Business Case
- Creating the Technical Specification and Selecting the Development Platform
- Development of the Solution
- Testing and Benchmarking the Solution
The initial idea is to use a business case that can serve as a proof of concept and can later be extrapolated to other Critical Software solutions. The target language is not yet defined, but it will be English or Portuguese.

Work Plan - Semester 1

Internship stages: description, results, schedule
The first semester comprises the following stages:
- Defining the Scope of the Extraction System and the Business Case [result: requirement list and business case, September and October]
- Surveying the Literature and Writing the State of the Art [result: state of the art, September to December]
- Studying the NLP Development Platforms and Choosing One [result: platform description and comparison, September to December]
- Creating the Technical Specification [result: technical specification, January and February]
- Writing the internship proposal [result: internship proposal, January and February]

Work Plan - Semester 2

Internship stages: description, results, schedule
The second semester comprises the following stages:
- Setting up the NLP Development Environment [result: NLP Development Environment, February]
- Development [result: first prototype, March to May]
- Testing and Benchmarking [result: second prototype, June]
- Writing the internship report [result: internship report, June and July]


The Academic Supervisor and the Industrial Supervisor are responsible for supporting the Intern, ensuring that the Intern has the conditions needed to carry out the internship, including access to the facilities and materials required for that purpose. Assessment of the internship is the responsibility of the Higher Education Institution, and the Industrial Supervisor is responsible for providing any information the Institution requests for that purpose.
The internship grant offered is 450 euros.


Bruno Saraiva