Submitted Proposals

DEI - FCTUC
Generated on 2024-05-02 00:38:53 (Europe/Lisbon).

Internship Title

Requirement Extraction and Analysis using Natural Language Processing

Areas of Specialisation

Intelligent Systems

Software Engineering

Internship Location

Critical Software - Coimbra

Background

Critical Software validates and verifies critical systems in domains ranging from Space to Railway. An important part of this work is analysing documents containing hundreds (or even thousands) of requirements that have to be tested. This demands a great deal of reading and analysis that only an engineer can carry out. However, some tasks within this process can be automated to make it more efficient. One example is the lifecycle management of customer requirements and other input artefacts that Critical receives from its various customers.

Critical Software wants to explore and develop a system that helps the engineers responsible for requirement analysis work more efficiently by completing their tasks faster. Critical plans to achieve this by developing a system that can automatically extract requirements and tests from customers' documents. The envisioned approach uses Text Mining and Natural Language Processing, since the documents come in various formats, though all of them are highly structured. In summary, this internship encompasses several main challenges:

- How to extract text from a document that can come in several formats. PDF is an especially challenging format.
- How to identify, within the text, what is a requirement and what its constituent parts are.
- How to formalise the extracted requirements so that an engineer can validate them, and how to use this feedback to improve the system's accuracy (learning).
- How to use the extracted information in other ways to make the engineers more efficient, for example by helping them with change management and version tracking.
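
To make the second challenge concrete, the sketch below flags candidate requirement sentences with a simple rule-based pass. It is only a minimal illustration under stated assumptions, not the approach the proposal commits to: the identifier convention (`REQ-001` style), the list of modal verbs, and the names `Requirement` and `extract_requirements` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed identifier convention, e.g. "REQ-001" or "SYS-REQ-12" (hypothetical).
ID_PATTERN = re.compile(r"\b([A-Z]{2,}(?:-[A-Z]{2,})*-\d+)\b")
# Modal verbs that often mark a requirement statement (assumed list).
MODAL_PATTERN = re.compile(r"\b(shall|must|should)\b", re.IGNORECASE)


@dataclass
class Requirement:
    identifier: Optional[str]  # None when no identifier is found in the sentence
    modal: str                 # the modal verb that triggered the match
    text: str                  # the full sentence


def extract_requirements(text: str) -> List[Requirement]:
    """Return the sentences that contain a requirement modal verb."""
    found = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        modal = MODAL_PATTERN.search(sentence)
        if modal is None:
            continue  # no modal verb, so not treated as a requirement
        ident = ID_PATTERN.search(sentence)
        found.append(Requirement(
            identifier=ident.group(1) if ident else None,
            modal=modal.group(1).lower(),
            text=sentence.strip(),
        ))
    return found
```

A rule-based pass like this could serve as a baseline against which a learned classifier is compared; real customer documents would likely need format-specific handling before this step.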

Objective

The main goal of this internship is to develop a prototype that, given a document, is able to perform several tasks:
- Extract text from the document.
- Identify and classify each component of the text into requirements and their parts.
- Show the extracted components to an engineer (using a UI) so that they can validate or modify them.
- Store the extracted and validated requirements in a database for later use.
- Exploit the stored information so that the engineers can be more efficient.
The prototype has to deal with the English language only.
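
The storage and exploitation tasks above can be sketched as follows. This is a hedged illustration only: the proposal does not name a database or a schema, so the use of SQLite, the `requirement` table layout, and the function names `open_store`, `save_validated` and `describe_change` are assumptions made for this example. It stores each validated requirement as a new version and reports which words changed between two versions.

```python
import difflib
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the requirement store; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requirement ("
        " identifier TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " text TEXT NOT NULL,"
        " PRIMARY KEY (identifier, version))"
    )
    return conn


def save_validated(conn: sqlite3.Connection, identifier: str, text: str) -> int:
    """Store a validated requirement and return its version number.

    Unchanged text does not create a new version.
    """
    row = conn.execute(
        "SELECT version, text FROM requirement WHERE identifier = ?"
        " ORDER BY version DESC LIMIT 1",
        (identifier,),
    ).fetchone()
    if row is not None and row[1] == text:
        return row[0]
    version = 1 if row is None else row[0] + 1
    conn.execute(
        "INSERT INTO requirement (identifier, version, text) VALUES (?, ?, ?)",
        (identifier, version, text),
    )
    conn.commit()
    return version


def describe_change(conn: sqlite3.Connection, identifier: str,
                    old_version: int, new_version: int) -> List[str]:
    """Return word-level additions ('+ word') and removals ('- word')."""
    rows = dict(conn.execute(
        "SELECT version, text FROM requirement"
        " WHERE identifier = ? AND version IN (?, ?)",
        (identifier, old_version, new_version),
    ).fetchall())
    diff = difflib.ndiff(rows[old_version].split(), rows[new_version].split())
    # Keep only added and removed words; drop unchanged words and hint lines.
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

Keeping every validated version, rather than overwriting, is one simple way to support the change management and version tracking mentioned among the challenges.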

Work Plan - Semester 1

The internship has the following stages:
- Defining the Scope and Prototype Main Characteristics [result: requirement list, M1 and M2]
- Reading and Writing the State of the Art on Text Mining and NLP [result: state of the art, M1 to M4]
- Study of a specific use case from a real project [result: use case description, M1 to M4]
- Creating the Technical Specification [result: technical specification, M5 and M6]
- Writing the internship proposal [result: internship proposal, M5 and M6]

Work Plan - Semester 2

The second semester comprises the following stages:
- Setting up the Development Environment [result: Development Environment, M6]
- Development [result: first prototype, M7 to M9]
- Testing and Benchmarking [result: second prototype, M10]
- Writing the internship report [result: internship report, M10 and M11]

Conditions

A computer and a workstation are provided.

Supervisor

Paulo Gomes
paulo.gomes@criticalsoftware.com