Internship Title
Using Fault Injection to Support the Development of Dependable Complex Systems
Areas of Specialization
Software Engineering
Internship Location
CISUC-SSE
Background
Several techniques have been developed over the years to avoid or handle faults (e.g., testing, coding practices). In particular, Online Failure Prediction (OFP) attempts to predict the occurrence of failures in the near future by combining past data and the current system state [Salfner, F., Lenk, M., & Malek, M. (2010). A survey of online failure prediction methods. ACM Computing Surveys (CSUR), 42(3), 1-42.]. Such predictions make it possible to take preemptive measures to avoid failures or, at least, mitigate their consequences. Despite its potential, OFP is still not widely implemented.
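As a rough illustration of how OFP frames the problem, the sketch below labels monitored samples as failure-prone when a failure occurs within a window that starts some lead time after the sample. The names `Sample`, `label_samples`, `lead_time`, and `prediction_window`, as well as the metric and the timestamps, are assumptions made for this example and are not taken from the cited works.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds since the start of the run (hypothetical unit)
    features: dict    # monitored system metrics at that instant


def label_samples(samples, failure_times, lead_time, prediction_window):
    """Label a sample True if a failure occurs within
    [t + lead_time, t + lead_time + prediction_window]."""
    labels = []
    for s in samples:
        start = s.timestamp + lead_time
        end = start + prediction_window
        labels.append(any(start <= f <= end for f in failure_times))
    return labels


# Hypothetical run: one sample every 10 s and a single failure at t = 75 s.
samples = [Sample(float(t), {"free_mem_mb": 1000 - 10 * t}) for t in range(0, 100, 10)]
labels = label_samples(samples, failure_times=[75.0], lead_time=20, prediction_window=30)
```

The labeled samples could then be fed to any classifier. The lead time matters because a prediction is only useful if it leaves enough time to take a preemptive measure.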
Failures are rare events, and thus failure data are often not available. Even if it were possible to gather such data from real systems, doing so would take years (due to the reliability of modern systems), and by then the data would likely be outdated. Over the years, fault injection has been accepted as a viable alternative to generate realistic failure data. Still, fault injectors are difficult to develop (especially when targeting entire Operating Systems (OSs)), and thus research on OS-level failure prediction has stagnated or relies on outdated OSs. To overcome this, recent work conducted a comprehensive fault injection campaign on an up-to-date LTS Linux kernel (3.16.82) using an updated implementation of a well-known fault injection technique [Campos, J. R., & Costa, E. (2020, October). Fault Injection to Generate Failure Data for Failure Prediction: A Case Study. In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE) (pp. 115-126). IEEE.].
One of the main limitations of current related work is the representativeness of the generated data. A proper fault injection campaign tries to address this by considering several factors (e.g., fault model, location, …). Still, as related literature on OS-level OFP is almost non-existent, it is not possible to validate or compare the generated data. This is further aggravated because existing works consider only a single fault injector, and therefore it is not possible to determine whether the generated data are in fact representative of the target system or specific to the fault injector used in the study.
The focus of this internship is to explore the applicability of current fault injectors to assess the dependability of complex systems, more precisely modern OSs. The goal is to thoroughly identify and compare candidate fault injectors and select the one with the highest potential. The selected fault injector will then be used to conduct a fault injection campaign on a recent OS. Afterward, the generated data will be analyzed and compared with existing failure data from previous studies.
Objective
The learning objectives of this master's internship are:
1) Dependability, fault tolerance: study the subject of fault tolerance and its techniques as a means to improve the dependability of modern systems;
2) Fault Injection: understand how to use fault injection and conduct representative fault injection campaigns to assist in developing dependable software;
3) Secure Software Development: study common software bugs and how to classify them, thus improving coding skills to create more dependable solutions;
4) Online Failure Prediction: understand how OFP works and can be used to predict and mitigate incoming failures;
5) Research Design: understand how to design and execute an experimental process to address complex and open research issues.
Work Plan - Semester 1
[20/09/2021 to 04/10/2021] Literature review:
Study the concepts to be used in the internship, namely fault tolerance, fault injection, and online failure prediction.
[05/10/2021 to 08/11/2021] Analysis and comparison of candidate fault injectors:
Identification, analysis, and comparison of candidate fault injectors for complex/large systems.
[09/11/2021 to 30/11/2021] Definition of the experimental process:
Design and plan the experimental process that will be used to conduct the fault injection campaign. This includes defining all the relevant components, from fault model to workload, as well as the architecture of the testbed that will be used.
[01/12/2021 to 15/01/2022] Write the dissertation plan
Work Plan - Semester 2
[07/02/2022 to 06/03/2022] Set up the experimental testbed:
Set up the testbed required to conduct the experiments. This includes several considerations, such as preparing the target system to be completely automated, deploying monitors for the system under test, and implementing the intended failure detectors.
[07/03/2022 to 17/04/2022] Conduct the fault injection campaign:
Use the selected fault injector to conduct a fault injection campaign. Depending on the fault injector, this may also require additional tasks, such as patching and recompiling the target system or making some modifications to port it to modern systems.
[18/04/2022 to 08/05/2022] Explore, assess, and compare the generated data:
Process, explore, and analyze the generated data to understand the behavior of the target system in the presence of faults. Compare with failure data from existing works.
[09/05/2022 to 04/06/2022] Write a technical report
[17/05/2022 to 30/06/2022] Write the thesis
Conditions
Depending on the evolution of the internship, a studentship may be available to support the development of the work. The work is to be carried out at the laboratories of CISUC’s Software and Systems Engineering Group. A workplace will be provided, as well as the required computational resources.
Remarks
Work to be co-supervised by Marco Vieira
Supervisor
João R. Campos
jrcampos@dei.uc.pt