Internship Title
Generative Adversarial Networks to Generate Failure Data
Areas of Specialization
Intelligent Systems
Software Engineering
Internship Location
CISUC-SSE
Background
Several techniques have been developed over the years to avoid or handle faults (e.g., testing, coding practices). In particular, Online Failure Prediction (OFP) attempts to predict the occurrence of failures in the near future by combining past data and the current system state [1]. Such predictions allow taking preemptive measures to avoid, or at least mitigate, their consequences. Notwithstanding the potential of OFP, it is still not widely implemented.
Failures are rare events, so failure data are often unavailable. Even if such data could be gathered from real systems, doing so would take years (due to the reliability of modern systems), and by then the data would likely be outdated. Fault injection has been accepted as a viable alternative for generating realistic failure data [2], but fault injectors are difficult to develop. Moreover, given the size and complexity of modern software, properly conducting a representative fault injection campaign requires executing thousands of time-consuming experiments.
Among Machine Learning (ML) techniques, Generative Adversarial Networks (GANs) have been widely used in various domains to generate realistic synthetic data that resemble the distribution of existing data [3]. Briefly, a GAN is composed of two neural networks: a generator that tries to produce data samples similar to the training data, and a discriminator that tries to distinguish between real and fake data samples. The focus of this internship is to explore the applicability of GANs to generate synthetic failure data that can be used to create failure predictors. This process will target an up-to-date Linux kernel, and the performance of the resulting models will be assessed with existing failure data obtained through fault injection.
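To make the generator/discriminator interplay concrete, the following is a minimal sketch of a one-dimensional GAN in plain Python. It is an illustration only, not the architecture to be developed in the internship: the class name `TinyGAN`, the linear generator, the logistic discriminator, the learning rate, and the Gaussian "real" data (standing in for a single hypothetical system metric observed before a failure) are all assumptions made for this example.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class TinyGAN:
    """Toy 1-D GAN (hypothetical): G(z) = a*z + b, D(x) = sigmoid(w*x + c)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.a, self.b = 1.0, 0.0  # generator parameters
        self.w, self.c = 0.0, 0.0  # discriminator parameters

    def generate(self, n):
        # Map Gaussian noise to synthetic samples.
        return [self.a * self.rng.gauss(0, 1) + self.b for _ in range(n)]

    def discriminate(self, x):
        # Estimated probability that x is a real sample.
        return sigmoid(self.w * x + self.c)

    def train_step(self, real, lr=0.05):
        n = len(real)
        noise = [self.rng.gauss(0, 1) for _ in range(n)]
        fake = [self.a * z + self.b for z in noise]

        # Discriminator update: minimise -[log D(real) + log(1 - D(fake))].
        # The gradient w.r.t. the logit is (D - 1) for real, D for fake.
        gw = gc = 0.0
        for x in real:
            err = self.discriminate(x) - 1.0
            gw += err * x
            gc += err
        for x in fake:
            err = self.discriminate(x)
            gw += err * x
            gc += err
        self.w -= lr * gw / n
        self.c -= lr * gc / n

        # Generator update: minimise the non-saturating loss -log D(G(z)).
        # d(loss)/dG = -(1 - D) * w, with dG/da = z and dG/db = 1.
        ga = gb = 0.0
        for z, x in zip(noise, fake):
            g = -(1.0 - self.discriminate(x)) * self.w
            ga += g * z
            gb += g
        self.a -= lr * ga / n
        self.b -= lr * gb / n


# Usage: alternate discriminator and generator updates on batches of
# "real" data, here drawn from an assumed Gaussian with mean 4.
gan = TinyGAN(seed=0)
data_rng = random.Random(1)
for _ in range(200):
    real_batch = [data_rng.gauss(4.0, 1.0) for _ in range(64)]
    gan.train_step(real_batch)
samples = gan.generate(5)
```

Each training step first improves the discriminator on a batch of real and generated samples, then moves the generator in the direction that makes its samples look more real to the updated discriminator. A toy model this small can oscillate rather than converge, which is one reason practical GANs rely on deeper networks and additional stabilisation techniques.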
1. Salfner, F., Lenk, M., & Malek, M. (2010). A survey of online failure prediction methods. ACM Computing Surveys (CSUR), 42(3), 1-42.
2. Campos, J. R., & Costa, E. (2020, October). Fault Injection to Generate Failure Data for Failure Prediction: A Case Study. In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE) (pp. 115-126). IEEE.
3. Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1), 53-65.
Objective
The learning objectives of this master’s internship are:
1) Dependability, fault tolerance: study the subject of fault tolerance and its techniques as means to improve the dependability of modern systems;
2) Online Failure Prediction: understand how OFP works and can be used to predict and mitigate incoming failures;
3) Advanced Machine Learning: understand, learn, and implement advanced Machine Learning (ML) techniques, more precisely, Generative Adversarial Networks (GANs);
4) Research Design: understand how to design and execute an experimental process to address complex and open research issues.
Work Plan - Semester 1
[12/09/2022 to 20/10/2022] Literature review
Study the concepts to be used in the internship, namely fault tolerance, online failure prediction, and Generative Adversarial Networks (GANs).
[21/10/2022 to 30/11/2022] Definition of the experimental process
Design and plan the experimental process that will be used to conduct the generative campaign. This includes planning and defining all the relevant components, such as the GAN’s architecture and the architecture of the testbed that will be used to generate and test the data.
[01/12/2022 to 15/01/2023] Write the dissertation plan
Work Plan - Semester 2
[06/02/2023 to 17/03/2023] Set up the experimental process and train the GAN models
Set up the testbed and train the GAN model. This includes implementing the target architecture and trying to develop generators that can accurately generate realistic failure data for different failure modes.
[18/03/2023 to 08/05/2023] Explore, assess, and compare the generated data
Compare the generated data with existing failure data obtained through fault injection. Develop failure predictors based on the generated data and assess their performance with the existing data. Assess the viability of the proposed approach as a means to complement fault injection to generate realistic failure data.
[09/05/2023 to 05/06/2023] Write the thesis.
Conditions
Depending on the progress of the internship, a studentship may be available to support the development of the work. The work is to be carried out at the laboratories of CISUC’s Software and Systems Engineering Group. A workplace will be provided, as well as the necessary computational resources.
Remarks
Work to be co-supervised by Marco Vieira
Supervisor
João R. Campos
jrcampos@dei.uc.pt