Submitted Proposals

DEI - FCTUC
Generated on 2024-03-28 20:27:49 (Europe/Lisbon).

Internship Title

Using Fault Injection to Support the Development of Dependable Complex Systems

Areas of Specialization

Engenharia de Software

Internship Location

CISUC-SSE

Background

Several techniques have been developed over the years to avoid or handle faults (e.g., testing, coding practices). In particular, Online Failure Prediction (OFP) attempts to predict the occurrence of failures in the near future by combining past data and the current system state [Salfner, F., Lenk, M., & Malek, M. (2010). A survey of online failure prediction methods. ACM Computing Surveys (CSUR), 42(3), 1-42.]. Such predictions make it possible to take preemptive measures to avoid failures or, at least, mitigate their consequences. Despite its potential, OFP is still not widely implemented.
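As a rough illustration of the idea of combining past data with the current system state, the sketch below raises an alarm when a monitored metric is projected to cross a threshold within a given lead time. It is only a toy example under stated assumptions: the metric (free memory), the linear-trend predictor, the function name `predict_failure`, and all numeric values are hypothetical and are not taken from the cited survey.

```python
from collections import deque


def predict_failure(history, threshold, lead_time):
    """Extrapolate the linear trend of recent samples and report whether the
    metric is expected to drop below `threshold` within `lead_time` steps.
    (Hypothetical predictor, for illustration only.)"""
    n = len(history)
    if n < 2:
        return False  # not enough past data to estimate a trend
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    # Least-squares slope of the metric over the observation window.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    projected = history[-1] + slope * lead_time
    return projected < threshold


window = deque(maxlen=5)  # past data: the last 5 samples
alarms = []
for sample in [900, 850, 800, 750, 700, 650]:  # assumed free memory (MB)
    window.append(sample)  # newest sample reflects the current state
    alarms.append(predict_failure(list(window), threshold=500, lead_time=4))
```

With these assumed values, only the last sample triggers an alarm, since that is the first point at which the projected value falls below the threshold. Real OFP approaches typically rely on many monitored variables and learned models rather than a single trend line.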

Failures are rare events and thus failure data are often not available. Even if it were possible to gather such data from real systems, doing so would take years (due to the reliability of modern systems), and by then the data would likely be outdated. Over the years, fault injection has been accepted as a viable alternative to generate realistic failure data. Still, fault injectors are difficult to develop (especially when targeting entire Operating Systems (OSs)), and thus research on OS-level failure prediction has stagnated or relies on outdated OSs. To overcome this, recent work conducted a comprehensive fault injection campaign on an up-to-date LTS Linux kernel 3.16.82 using an updated implementation of a well-known fault injection technique [Campos, J. R., & Costa, E. (2020, October). Fault Injection to Generate Failure Data for Failure Prediction: A Case Study. In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE) (pp. 115-126). IEEE.].

One of the main limitations of current related work is the representativeness of the generated data. A proper fault injection campaign tries to address this by considering several factors (e.g., fault model, location, …). Still, as related literature on OS-level OFP is almost non-existent, it is not possible to validate or compare the generated data. This is further aggravated because existing works consider only a single fault injector, and therefore it is not possible to determine whether the generated data are in fact representative of the target system or specific to the fault injector used in the study.

The focus of this internship is to explore the applicability of current fault injectors to assessing the dependability of complex systems, more precisely of modern OSs. The goal is to conduct a fault injection campaign on a recent OS. The generated data will then be studied, analyzed, and compared with existing failure data from previous studies.

Objective

The learning objectives of this master internship are:
1) Dependability, fault tolerance, fault injection: study the subject of fault tolerance, focusing on fault injection, as a means to improve the dependability of modern systems
2) Online Failure Prediction: understand the problem of OFP and how it can be used to predict and mitigate incoming failures
3) “Classical” Machine Learning: understand how to use classical ML techniques and their impact on creating accurate models
4) “Advanced” Machine Learning: study more advanced ML concepts such as concept drift and batch/incremental/online learning to deal with realistic non-stationary environments
5) Research Design: understand how to design and execute an experimental process to address complex and open research issues

Work Plan - Semester 1

[12/09/2022 to 04/10/2022] Literature review
Study the concepts to be used in the internship, namely online failure prediction, “classical” machine learning, concept drift, and online learning.
[05/10/2022 to 08/11/2022] Analysis and selection of target techniques
Identify, analyze, and select the concept drift and online learning techniques that will be studied.
[09/11/2022 to 30/11/2022] Definition of the experimental process
Design and plan the experimental process that will be used to conduct the study. This includes defining all the relevant components, from simulating and detecting concept drift to updating the predictors, as well as the architecture of the testbed that will be used.
[01/12/2022 to 15/01/2023] Write the dissertation plan
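The components named in the experimental process (simulating drift, detecting it, and updating the predictor) can be sketched roughly as follows. This is a toy example under stated assumptions, not the method to be used in the internship: the stream, the fixed-threshold predictor, the windowed error-rate check, and all names and parameters (`make_stream`, `fit`, `predict`, `run`, `window_size`, `max_error`) are hypothetical, and the detector is a naive stand-in rather than an established drift detection technique.

```python
from collections import deque


def make_stream(n=200, drift_at=100):
    """Simulated stream: the label is 1 when x > 0.5 before the drift and
    when x <= 0.5 afterwards (an abrupt, complete concept drift)."""
    for i in range(n):
        x = (i * 37 % 100) / 100  # deterministic pseudo-samples in [0, 1)
        label = int(x > 0.5) if i < drift_at else int(x <= 0.5)
        yield x, label


def fit(samples):
    """Choose the orientation of a fixed 0.5 threshold that fits best."""
    agree = sum(int(x > 0.5) == y for x, y in samples)
    return 1 if agree * 2 >= len(samples) else -1


def predict(model, x):
    return int(x > 0.5) if model == 1 else int(x <= 0.5)


def run(window_size=20, max_error=0.5):
    model = 1  # initial predictor matches the pre-drift concept
    recent = deque(maxlen=window_size)  # sliding window of recent outcomes
    drift_points, errors = [], 0
    for i, (x, y) in enumerate(make_stream()):
        wrong = predict(model, x) != y
        errors += wrong
        recent.append((x, y, wrong))
        error_rate = sum(w for _, _, w in recent) / len(recent)
        # Naive detector: flag drift when the windowed error rate is too high.
        if len(recent) == window_size and error_rate > max_error:
            drift_points.append(i)
            # Update the predictor using only the most recent samples.
            model = fit([(a, b) for a, b, _ in recent])
            recent.clear()
    return drift_points, errors


drift_points, total_errors = run()
```

In this setup the drift is injected at sample 100 and flagged at sample 110, after 11 mispredictions, which illustrates the detection delay that a window-based check introduces. Varying `window_size` and `max_error` trades detection delay against false alarms.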

Work Plan - Semester 2

[07/02/2023 to 06/03/2023] Set up the experimental testbed
Set up the testbed required to conduct the experiments. This comprises several tasks, from simulating and detecting concept drift to implementing mechanisms that allow online learning.
[07/03/2023 to 17/04/2023] Conduct the experimental campaign
Use the testbed to conduct the experimental process, considering different types and degrees of concept drift as well as different ML methods.
[18/04/2023 to 08/05/2023] Explore and assess the generated data
Process, explore, and analyze the generated data to understand the behavior of the predictors under concept drift and the effectiveness of the different techniques considered.
[09/05/2023 to 05/06/2023] Write the thesis

Conditions

Depending on the progress of the internship, a studentship may be available to support the development of the work. The work is to be carried out at the laboratories of CISUC’s Software and Systems Engineering Group. A workplace will be provided, as well as the required computational resources.

Remarks

Work to be co-supervised by Marco Vieira

Supervisor

João R. Campos
jrcampos@dei.uc.pt