Submitted Proposals

DEI - FCTUC
Generated on 2024-05-02 20:35:01 (Europe/Lisbon).

Internship Title

Microservices Observability

Areas of Specialization

Engenharia de Software

Internship Location

DEI

Background

As cloud and micro-service architectures become more complex, more highly distributed, and composed of more moving parts, they require deeper levels of monitoring. Companies face increasing difficulty in gaining visibility into the health and performance of their systems. Developing state-of-the-art monitorable systems requires understanding how recent technological developments and tools fit together as end-to-end systems, and which metrics and components are the most important to monitor.

Recently, the term observability was coined to describe a new paradigm that integrates and analyzes service logs, infrastructure and application metrics, and traces in order to build new tools for troubleshooting and analyzing complex micro-service systems. Well-known companies such as Google, Twitter, and Facebook have built internal tools to address the real need felt by the early adopters of cloud-native applications. For example, Facebook developed Canopy, an end-to-end performance tracing infrastructure, which records and processes over 1 billion traces per day for performance analysis.

Objective

This internship aims to explore the use of emerging open source monitoring and introspection tools to correctly pinpoint performance bottlenecks, identify anomalous behavior, and diagnose the root cause of incidents in micro-service architectures. The main objectives are:

- Define a Use Case with Apache Kafka, a distributed streaming platform, as its central component. The Use Case will be used throughout the project and to build a Proof of Concept (PoC).

- Identify which logs, metrics, and request traces should be collected from Kafka, and which of them bring the most value to observability.

- Use Google OpenCensus and Etsy StatsD to automatically collect the traces and metrics previously identified from Apache Kafka.

- Send the collected traces and metrics to Prometheus and Google Stackdriver.

- Use the data collected from the three sources (logs, metrics, and traces) to define actionable alerts and derive insightful analyses using Prometheus and Google Stackdriver.
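To make the objectives above more concrete, the following Python sketch shows one metric that such a PoC might collect from Kafka: consumer lag per partition. It deliberately uses no client library. The offsets are made-up values, and every name in it (`PartitionOffsets`, `consumer_lag`, `statsd_gauge`, `prometheus_sample`, `lag_alerts`, the metric names, and the threshold) is a hypothetical choice for illustration, not part of the proposal. It only shows how a value could be rendered as a StatsD gauge line and as a Prometheus text-format sample, and how a simple threshold alert could be derived from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionOffsets:
    """Offsets for one partition (hypothetical container, values are made up)."""
    topic: str
    partition: int
    log_end_offset: int    # offset of the newest record written to the partition
    committed_offset: int  # last offset committed by the consumer group


def consumer_lag(p: PartitionOffsets) -> int:
    """Number of records the consumer group still has to read (never negative)."""
    return max(p.log_end_offset - p.committed_offset, 0)


def statsd_gauge(name: str, value: int) -> str:
    """Render a gauge in the StatsD line format: <name>:<value>|g"""
    return f"{name}:{value}|g"


def prometheus_sample(name: str, labels: dict, value: int) -> str:
    """Render one sample in the Prometheus text format: name{k="v",...} value"""
    label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_text}}} {value}"


def lag_alerts(partitions: list, threshold: int) -> list:
    """Return one message per partition whose lag exceeds the threshold."""
    return [
        f"ALERT {p.topic}[{p.partition}] lag={consumer_lag(p)} exceeds {threshold}"
        for p in partitions
        if consumer_lag(p) > threshold
    ]


if __name__ == "__main__":
    # Assumed sample data; a real PoC would read these offsets from Kafka.
    offsets = [
        PartitionOffsets("orders", 0, log_end_offset=1500, committed_offset=1480),
        PartitionOffsets("orders", 1, log_end_offset=9000, committed_offset=7200),
    ]
    for p in offsets:
        lag = consumer_lag(p)
        print(statsd_gauge(f"kafka.{p.topic}.{p.partition}.consumer_lag", lag))
        print(prometheus_sample(
            "kafka_consumer_lag",
            {"topic": p.topic, "partition": p.partition},
            lag,
        ))
    for message in lag_alerts(offsets, threshold=1000):
        print(message)
```

In an actual deployment the threshold check would more likely live in the monitoring backend's alerting rules than in application code; it is inlined here only to keep the sketch self-contained.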

The final system will serve as an example of a state-of-the-art monitoring system. A study evaluating the performance of the individual components and of the overall solution will therefore be conducted.
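As an illustration of what such a performance evaluation might compute, the sketch below summarizes request latencies as percentiles and compares an uninstrumented baseline against an instrumented run. The proposal does not prescribe any evaluation method; the nearest-rank percentile definition, the chosen percentiles, and the names `percentile`, `summarize`, and `overhead` are all assumptions, and the latency samples are synthetic.

```python
import math


def percentile(samples: list, p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of the data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples: list) -> dict:
    """Median and tail latencies for one run."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


def overhead(baseline: list, instrumented: list) -> dict:
    """Per-percentile latency difference attributed to the instrumentation."""
    base = summarize(baseline)
    inst = summarize(instrumented)
    return {key: inst[key] - base[key] for key in base}


if __name__ == "__main__":
    # Synthetic latencies in milliseconds (assumed data, not measurements).
    baseline_ms = list(range(1, 101))
    instrumented_ms = [value + 2 for value in baseline_ms]
    print(summarize(baseline_ms))
    print(overhead(baseline_ms, instrumented_ms))
```

Reporting tail percentiles alongside the median is a common choice because monitoring overhead often shows up in the slowest requests first.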

Work Plan - Semester 1

- Study the state of the art, namely existing tools and related technologies such as logging, metrics collection, and tracing (2 months).
- Define a Use Case and derive requirements for advanced observability (1 month).
- Write the intermediate report (1 month).

Work Plan - Semester 2

- Deploy the base components (Kafka, StatsD, OpenCensus, Prometheus, etc.) (1 month).
- Integrate the components (1 month).
- Test and evaluate the results (2 months).
- Write the final report (1 month).

Conditions

This work should take place in the context of a research project funded by FCT. A 6-month scholarship of 745 euros per month is foreseen for this work.

Supervisor

Filipe Araújo, Prof. Rui Paiva and Prof. António Jorge Cardoso (Huawei/Univ. de Coimbra)
filipius@uc.pt