Proposals for student selection

DEI - FCTUC
Generated on 2024-05-02 21:51:38 (Europe/Lisbon).

Internship Title

Text Mining Tool to Identify Software Vulnerabilities in Design Time

Internship Location

DEI-SSE

Background

Software vulnerabilities are a major problem in software development. When exploited, they can lead to consequences such as unauthorized access, data loss, and financial loss, among others. Current techniques detect vulnerabilities by analyzing the source code (static techniques) or by executing the software (dynamic techniques) (B. Liu, L. Shi, Z. Cai, and M. Li, “Software Vulnerability Discovery Techniques: A Survey,” in 2012 Fourth International Conference on Multimedia Information Networking and Security, Nov 2012). However, neither approach is fully effective: they cannot reveal all vulnerabilities, and the alerts they raise may be either actual vulnerabilities or false alarms (false positives).
The source code is the main source of relevant information for identifying software vulnerabilities at design time (while the code is under development and not running). Some works have applied data mining techniques to extract data from it. For example, Walden et al. predict vulnerabilities in three PHP web applications (Drupal, Moodle, and PHPMyAdmin) using the random forest machine learning algorithm (J. Walden, J. Stuckman, and R. Scandariato, “Predicting Vulnerable Components: Software Metrics vs Text Mining,” in 2014 IEEE 25th International Symposium on Software Reliability Engineering, Nov 2014). To predict the vulnerabilities in their dataset, they compared two types of static data: text mining features and software metrics. The text mining features gave the best results.
Although this technique has already been addressed in the literature, no tool allows creating a structured dataset of text mining information for different languages, especially C/C++. This internship will focus on creating a mechanism that automatically extracts information from source code, mainly from C/C++ and/or Java projects. The extracted data should then be used by machine learning algorithms to improve vulnerability detection.
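
As an illustration only, the kind of text mining extraction described above could be sketched as follows. This is a minimal sketch and not the mechanism to be developed in the internship: it assumes a simple bag-of-words representation (identifier and keyword counts per file), uses regular expressions instead of a real C/C++ lexer, and the names `term_frequencies`, `to_feature_rows`, and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: comments and string/char literals are dropped so that only
# code tokens are counted. A real tool would use a proper C/C++ lexer.
_COMMENT_OR_LITERAL = re.compile(
    r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'',
    re.DOTALL,
)
# Identifiers and keywords; operators and numeric literals are ignored here.
_IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def term_frequencies(source: str) -> Counter:
    """Count how often each identifier or keyword occurs in one source file."""
    stripped = _COMMENT_OR_LITERAL.sub(" ", source)
    return Counter(_IDENTIFIER.findall(stripped))


def to_feature_rows(files: Dict[str, str]) -> Tuple[List[str], List[List[int]]]:
    """Build a shared vocabulary and one row of term counts per file."""
    counts = {name: term_frequencies(src) for name, src in files.items()}
    vocabulary = sorted(set().union(*counts.values()))
    rows = [[counts[name][term] for term in vocabulary] for name in files]
    return vocabulary, rows


# Hypothetical input: two tiny C snippets standing in for project files.
SAMPLE = {
    "copy.c": "void copy(char *dst, char *src) { strcpy(dst, src); /* unsafe strcpy */ }",
    "safe.c": "void copy(char *dst, char *src, int n) { strncpy(dst, src, n); }",
}

if __name__ == "__main__":
    vocabulary, rows = to_feature_rows(SAMPLE)
    print(vocabulary)
    for name, row in zip(SAMPLE, rows):
        print(name, row)
```

Each row is a fixed-length vector of term counts over the shared vocabulary, which is the tabular shape most machine learning libraries expect as input; labels (vulnerable or not) would have to come from a separate source.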

Objective

The learning objectives of this master's internship are:
1) Software Vulnerabilities: understand the most frequent types of vulnerabilities and how they can affect a software system if one of them is exploited;
2) Text Mining: understand the main techniques for obtaining information from a textual document, especially the source code of an application;
3) Secure Software Development: improve coding skills for writing secure code.

Work Plan - Semester 1

[13/09/2021 to 01/10/2021] Literature review and experiment setup.
Study of the concepts to be used in the internship, namely software vulnerabilities and data mining techniques.
[04/10/2021 to 29/10/2021] Technology and target project definition.
Definition of the technology used to extract text mining information from the source code. Definition of the project that will be the target of the initial data collection.
[01/11/2021 to 26/11/2021] Definition of the data collection software architecture.
Design of the solution and of the key components for the automated collection.
[29/11/2021 to 14/01/2022] Write the Dissertation Plan.

Work Plan - Semester 2

[31/01/2022 to 29/04/2022] Development of the text mining extraction mechanism.
Implementation of the designed solution, using vulnerability data from CVE-Details.
[18/04/2022 to 17/06/2022] Experimentation with the collected text mining information.
Define experiments to validate the key properties of the solution (probably using machine learning and heuristics).
[02/05/2022 to 03/06/2022] Write a paper or a technical report.
[16/05/2022 to 29/06/2022] Write the thesis.

Conditions

A studentship may be available to support this work, depending on how the internship progresses. The work is to be carried out at the laboratories of the CISUC’s Software and Systems Engineering Group. A workplace will be provided, as well as the required computational resources.

Remarks

Professor Marco Vieira is the co-advisor of this proposal.

Advisor

José Alexandre D'Abruzzo Pereira
josep@dei.uc.pt