Submitted Proposals

Generated on 2024-07-17 09:19:09 (Europe/Lisbon).

Internship Title

Real-time querying of the Data Platform

Areas of Specialization

Software Engineering

Internship Location



Data Platform (DP) is a set of infrastructure tooling for ingesting, transforming and accessing data. The current architecture does not guarantee performant real-time querying of that data, yet most client-facing products built on top of DP will require it. The goal of the internship is to explore centralizing that responsibility (partially or fully, TBD) in DP instead of offloading it to Data Product teams.


Work Plan - Semester 1

Project stages:
1. Literature review - research existing data migration tools, strategies and best practices; research "real-time databases" (database technologies that fit the requirements of the use case).
2. Requirements gathering - define with the Data Platform team a specific Data Product to be used as a benchmark for the whole solution, as well as a set of queries on the migrated data and the KPIs that should be met (query response times). By the end of this step there should also be documentation of the proposed architecture and implementation of the solution.

Expected results:
Literature review document
Requirements and architecture proposal
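
As an illustration of how a query response time KPI from stage 2 could be written down and checked, the following is a minimal sketch, not part of the proposal itself. The helper names (`measure_latencies_ms`, `percentile`, `meets_kpi`), the nearest-rank percentile and the example threshold are all assumptions made for illustration; the actual KPIs are still to be defined with the Data Platform team.

```python
import math
import time
from typing import Callable


def measure_latencies_ms(run_query: Callable[[], object], repetitions: int = 50) -> list[float]:
    """Run the query callable repeatedly and return wall-clock latencies in milliseconds."""
    latencies = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()  # hypothetical: would issue one benchmark query against the database
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in the range (0, 100]."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def meets_kpi(latencies_ms: list[float], pct: float, threshold_ms: float) -> bool:
    """True when the chosen latency percentile is within the agreed threshold."""
    return percentile(latencies_ms, pct) <= threshold_ms
```

A KPI could then be stated as, for example, "the 95th percentile of query X stays under 100 ms" (an illustrative figure only) and checked with `meets_kpi(samples, 95, 100.0)`.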

Work Plan - Semester 2

Project stages:
3. Infrastructure setup - implement a process for provisioning and configuring a database that will be the destination of the data extracted from the Data Platform; this should be implemented with self-serviceability in mind.
4. Data extraction - develop a process for extracting the selected data and writing it to the provisioned database instance; this process should have strong consistency guarantees and should account for both the initial "one-off" data load and incremental updates; data retention should also be a concern here.
5. Benchmarking and testing - develop a benchmarking suite to evaluate the performance of the solution against the agreed SLAs.
6. Optimization and tuning - iteratively test and fine-tune the database and the data migration process to improve the performance of the solution.
7. Documentation and reporting - document the entire solution and implementation; write the necessary project reports/essays.

Expected results:
A fully functional implementation of a system that extracts data from the Data Platform and makes it available to external systems in a database prepared to handle real-time workloads.
A benchmark attesting the validity of the solution
All related documentation




João Alves Pereira