Internship Title
Software Vulnerability Discovery with Static Techniques
Internship Location
DEI-FCTUC
Background
Software vulnerabilities are a major concern in software development. When exploited, they can lead to unauthorized access, data loss, and financial loss, among other consequences. Current techniques detect vulnerabilities either by analyzing the source code (static techniques) or by executing the software (dynamic techniques) (B. Liu, L. Shi, Z. Cai, and M. Li, “Software Vulnerability Discovery Techniques: A Survey,” in 2012 Fourth International Conference on Multimedia Information Networking and Security, Nov 2012). However, these techniques are only partially effective, as none of them reveals all vulnerabilities. In addition, the alerts they raise may be actual vulnerabilities or false alarms (false positives).
One way to identify vulnerabilities from the source code is to use software metrics (Medeiros, N. et al., “Software metrics as indicators of security vulnerabilities”, ISSRE 2017) and combine them using techniques such as machine learning. A previous study created a dataset of C/C++ and Java projects that contains vulnerability information together with software metrics computed from the source code (https://eden.dei.uc.pt/~nmsa/metrics-dataset/). Unfortunately, this dataset was created in 2016 and is now outdated, since both vulnerabilities and attack techniques evolve. Hence the need for a mechanism that collects vulnerabilities automatically as soon as they are published on vulnerability websites.
Another effective technique to detect software vulnerabilities at design time is static code analysis. Unlike software metrics, static analysis tools (SATs) report vulnerabilities directly. However, they produce a high number of false alarms, and each SAT misses many vulnerabilities. Using more than one SAT reduces the number of missed vulnerabilities, as already addressed in the literature (Algaith et al., “Finding SQL Injection and Cross Site Scripting Vulnerabilities with Diverse Static Analysis Tools”, EDCC 2018). Nevertheless, it does not solve the problem completely.
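As an illustration of how alerts from several SATs could be combined, the following minimal sketch keeps the alerts raised by at least a given number of tools. The tool names, the alert fields, and the sample alerts are hypothetical, and the voting rule is only one simple option, not necessarily the one used in the cited study.

```python
from __future__ import annotations

from typing import NamedTuple


class Alert(NamedTuple):
    """A location flagged by a tool (all fields are illustrative)."""
    file: str
    line: int
    cwe: str


def combine(reports: dict[str, set[Alert]], min_votes: int) -> set[Alert]:
    """Keep the alerts raised by at least `min_votes` of the tools."""
    votes: dict[Alert, int] = {}
    for alerts in reports.values():
        for alert in alerts:
            votes[alert] = votes.get(alert, 0) + 1
    return {alert for alert, count in votes.items() if count >= min_votes}


# Hypothetical output of two tools on the same project.
reports = {
    "tool_a": {Alert("db.c", 42, "CWE-89"), Alert("ui.c", 7, "CWE-79")},
    "tool_b": {Alert("db.c", 42, "CWE-89"), Alert("io.c", 13, "CWE-120")},
}

# Union: every alert from any tool (more coverage, more false alarms).
any_tool = combine(reports, min_votes=1)
# Intersection: only alerts all tools agree on (fewer alarms, less coverage).
all_tools = combine(reports, min_votes=len(reports))
```

With these sample reports, the union holds three alerts and the intersection holds only the alert on `db.c`, which shows the trade-off between coverage and false alarms.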
The focus of this internship is the creation of a mechanism that automatically collects vulnerabilities of C/C++ and/or Java projects. The collected vulnerabilities should be enriched with static data about the source code (such as software metrics and static analysis alerts). The resulting data should then be used with machine learning algorithms to improve vulnerability detection.
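A minimal sketch of what one record of such a dataset might look like is given below, assuming one record per source file. The class, the field names, the metric names, and the CVE identifier are all hypothetical placeholders; the actual data model is to be defined during the internship.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """One source file with its static data (all field names are hypothetical)."""
    path: str
    metrics: dict[str, float] = field(default_factory=dict)
    alert_count: int = 0
    cve_ids: list[str] = field(default_factory=list)

    @property
    def vulnerable(self) -> bool:
        # A file is labelled vulnerable if at least one CVE is linked to it.
        return bool(self.cve_ids)


def to_row(record: FileRecord, metric_names: list[str]) -> tuple[list[float], int]:
    """Return (features, label) in a fixed column order; missing metrics become 0.0."""
    features = [record.metrics.get(name, 0.0) for name in metric_names]
    features.append(float(record.alert_count))
    return features, int(record.vulnerable)


# Placeholder values only; "CVE-0000-0000" is not a real identifier.
record = FileRecord(
    path="src/parser.c",
    metrics={"loc": 120.0, "cyclomatic": 14.0},
    alert_count=3,
    cve_ids=["CVE-0000-0000"],
)
features, label = to_row(record, ["loc", "cyclomatic"])
```

Rows in this form (a numeric feature list plus a 0/1 label) are the usual input expected by most machine learning libraries.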
Objective
The learning objectives of this master's internship are:
1) Software Vulnerabilities: understand the most frequent types of vulnerabilities and how they can affect a software system when exploited;
2) Software Metrics: understand the most common types of software metrics, such as complexity (e.g. cyclomatic complexity), volume (e.g. lines of code), coupling (e.g. coupling between objects), and cohesion (e.g. lack of cohesion);
3) Static Code Analysis: understand how static analysis tools work and learn how to use some of them;
4) Secure Software Development: improve coding skills to write secure code.
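To make the metrics in item 2 concrete, the sketch below computes two of them for a C-like snippet: lines of code and an approximation of cyclomatic complexity (one plus the number of decision points). It is a rough keyword count, assumed here for illustration only: it does not parse the code, so keywords inside comments or strings would also be counted. Dedicated metric tools should be preferred in practice.

```python
import re

# Tokens that open an extra path through C-like code (an approximation).
DECISION_POINT = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")


def lines_of_code(source: str) -> int:
    """Count non-blank lines (comments are not excluded in this sketch)."""
    return sum(1 for line in source.splitlines() if line.strip())


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as 1 + number of decision points."""
    return 1 + len(DECISION_POINT.findall(source))


SAMPLE = """
int clamp(int x, int lo, int hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}
"""

print(lines_of_code(SAMPLE), cyclomatic_complexity(SAMPLE))  # prints "5 3"
```

The sample function has two `if` statements, hence three independent paths and a complexity of 3.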
Work Plan - Semester 1
[14/09/2020 to 02/10/2020] Literature Review and Dataset Setup.
Study of the concepts to be used in the internship, namely software vulnerabilities, software metrics, and static analysis. Setup of a preliminary dataset.
[05/10/2020 to 30/10/2020] Technology and target project definition.
Definition of the technology to collect the software vulnerabilities, software metrics, and static analysis alerts. Definition of the project that will be the target of the initial data collection, as well as the static analysis tools to be used.
[02/11/2020 to 27/11/2020] Definition of the data collection software architecture and database schema.
Design of the solution and of the key components of the automated collection mechanism, as well as the database schema.
[30/11/2020 to 15/01/2021] Write the Dissertation Plan.
Work Plan - Semester 2
[01/02/2021 to 30/04/2021] Development of the vulnerability collection mechanism. Implementation of the designed solution, with data from CVE-Details. In addition to the vulnerabilities, the software metrics (at the file, function, and class levels) and the static analysis alerts should be collected from the source code.
[19/04/2021 to 18/06/2021] Experimentation with the dataset. Define experiments to validate the key properties of the solution (probably using machine learning and heuristics).
[03/05/2021 to 04/06/2021] Write a paper or a technical report.
[17/05/2021 to 30/06/2021] Write the thesis.
Conditions
A studentship may be available to support this work, depending on the progress of the internship. The work will be carried out at the laboratories of CISUC’s Software and Systems Engineering Group. A workplace and the required computational resources will be provided.
Remarks
This work will be co-supervised by Marco Vieira.
Supervisor
José Alexandre D’Abruzzo Pereira
josep@dei.uc.pt