Internship Title
XWatchDog – System Diagnosis by Analysis of Exceptions in Log Messages on a Large Scale System Environment
Areas of Specialization
Software Engineering
Intelligent Systems
Internship Location
Remote / Leiria
Background
In complex IT environments composed of thousands of different applications, the data generated by their executions is an invaluable source of operational information. This is true not only from a business perspective but also, and foremost, from a technical one.
In the IT department of the multinational La Redoute, thousands of applications run every day and generate thousands of log entries; these applications include Spring batches, SOAP web services, REST APIs, Cobol, and Angular. For Java applications, exception handling is in many cases limited to logging the exception, without taking advantage of the information it provides.
This project aims to take advantage of all the data available in La Redoute’s central log server in order to understand what is happening at runtime and how the information from exceptions can help the team improve both business and development processes.
By the end of this Master’s project, the student will be familiar with a large-scale systems development environment and knowledgeable in a diverse set of state-of-the-art technologies, namely:
* Java / Spring Framework
* Elastic Stack (ELK)
* Kafka
* Docker
* Python
Objective
The goal of this project is to provide a means of improving the quality of software components in La Redoute IT. We expect to implement an analyzer tool that relies on the massive amounts of data stored in the log messages of our production system to better understand how components perform at runtime (for instance: are exceptions being raised that a component does not handle properly?).
This project can be viewed as a runtime complement to static analysis tools, which are based on code characteristics, and to the measurements that provide information about the quality of components.
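To make the idea of analyzing exceptions in log messages concrete, the following is a minimal Python sketch that counts exception types per component. It is only an illustration under stated assumptions: the log layout (`<timestamp> <LEVEL> [<component>] <message>`), the component names, and the function `count_exceptions` are hypothetical and do not describe La Redoute's actual log format or the tool to be built.

```python
import re
from collections import Counter

# ASSUMPTION: a hypothetical log layout "<timestamp> <LEVEL> [<component>] <message>".
# The real format on the central log server may differ.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \[(?P<component>[^\]]+)\] (?P<message>.*)$"
)

# A fully qualified Java class name whose simple name ends in Exception or Error.
EXCEPTION_NAME = re.compile(
    r"\b((?:[a-z_][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error))\b"
)


def count_exceptions(lines):
    """Count (component, exception class) pairs found in an iterable of log lines."""
    counts = Counter()
    component = None  # component of the most recent log header line
    for line in lines:
        header = LOG_LINE.match(line)
        if header:
            component = header.group("component")
            text = header.group("message")
        elif line.startswith("Caused by: "):
            # Nested cause inside a stack trace; attribute it to the current component.
            text = line
        else:
            continue  # stack frames ("\tat ...") and other continuation lines
        if component is None:
            continue  # a "Caused by" line seen before any header line
        for name in EXCEPTION_NAME.findall(text):
            counts[(component, name)] += 1
    return counts


# Invented sample data, for illustration only.
SAMPLE = """\
2024-01-15 10:00:01 ERROR [order-batch] Job failed: java.lang.IllegalStateException: no stock
\tat com.example.OrderJob.run(OrderJob.java:42)
Caused by: java.sql.SQLException: connection refused
2024-01-15 10:00:05 INFO [order-batch] Job finished
2024-01-15 10:01:00 ERROR [cart-api] java.lang.NullPointerException
2024-01-15 10:02:00 ERROR [cart-api] java.lang.NullPointerException
""".splitlines()

if __name__ == "__main__":
    for (component, name), n in count_exceptions(SAMPLE).most_common():
        print(component, name, n)
```

On the invented sample, the `NullPointerException` in `cart-api` is counted twice, which is the kind of recurring, possibly unhandled exception the analyzer is meant to surface. A production pipeline would more likely read from the log server or a Kafka topic than from an in-memory list.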
Work Plan - Semester 1
* 1st Semester (20 weeks, 16 hours per week)
1. Overview of the state of the art and state of practice
a. To review the techniques used to parse logs generated by the different components in the La Redoute Java stack, focusing specifically on logs generated by web services (Tomcat, GlassFish) and Spring batches (jar applications triggered by schedulers).
b. To review the techniques, practices and patterns that can be applied specifically to logs generated by exception handling structures in Java (exception handling, thrown exceptions, …)
2. Ecosystem Overview and Definition of the Data Pipeline
a. Analysis of the log patterns found in La Redoute's logs
b. Definition of what, where and how data will be collected; identification of the best data pipeline type (batch, real-time, …)
3. Analyzer conception and design
a. Definition of how the defined data pipeline can be integrated into La Redoute’s current infrastructure (e.g., Kubernetes cluster, Kafka streaming, …). What are the infrastructure and technical requirements for implementing the solution?
b. Definition and identification of the best patterns for analyzing the collected data.
>> Expected deliverables for 1st Semester:
- Internship intermediate report
- Scientific article reporting on the state of the art (optional)
Work Plan - Semester 2
* 2nd Semester (20 weeks, 40 hours per week)
4. Implementation
a. To implement the technical solution that supports extraction, data/feature engineering, and storage of the “clean” data.
b. To implement one or more techniques to analyze the collected data.
c. To discuss the results obtained, namely whether the information provided by the exceptions proves valuable.
d. To create dashboards based on the collected data (e.g., with Grafana)
5. Integration and Production
a. To integrate the developed solution into the Development/Operations ecosystem at La Redoute; this includes the deployment of the components in Production (e.g., Kubernetes cluster) and the definition of Service Level Objectives (SLOs), monitoring rules and recovery actions (e.g., what to do when the component is down).
6. Documentation
a. To create the project documentation
b. To disseminate the solution among the DevOps teams.
>> Expected deliverables for 2nd Semester:
- Internship Final Report
- Technical solution
- Scientific Article (optional)
Conditions
* Internship remuneration: 750€/month
* Meal allowance: 7.5€ per day worked
* Possibility of remote work
* Access to an e-learning platform
* On successful completion of the project, a certification worth up to $200
Remarks
Competencies required / to develop
* Requirements Analysis
* Conception
* Software Development
* Integration
* Automation
* Data Engineering Techniques
Why you should choose this project
* A relevant problem
The project will actually be applied in the large-scale environment of a major multinational company.
* Close support
Our team has deep expertise in the technology stack and has a real need for the analyzer, so the student will receive all the support required to reach a successful outcome.
* Professional relevance
The technology stack to be used is very diverse, which promotes the acquisition of competences in many different state-of-the-art domains.
* Future ready
The student will gain or consolidate his/her competences in software analysis, development and visualization techniques.
Supervisor
Ana Filipa Nogueira
anogueira@redoute.pt