Submitted Proposals

DEI - FCTUC
Generated on 2025-07-17 14:19:45 (Europe/Lisbon).

Internship Title

Evaluation of Data Lakehouse solutions

Areas of Specialisation

Software Engineering

Communications, Services and Infrastructures

Internship Location

DEI-FCTUC / CISUC

Background

An increasing number of organisations across industries rely on data to make informed decisions and to power real-time applications, predictive models, and operational intelligence. Traditional data architectures, including data warehouses and data lakes, often fail to meet the performance and agility requirements of current data-driven applications, especially those involving streaming data such as that generated by IoT devices. In response to these limitations, the data lakehouse architecture has emerged as a hybrid approach that combines the governance and performance of data warehouses with the scalability and flexibility of data lakes.
This thesis is motivated by the need to evaluate modern data lakehouse platforms using controlled experiments and to determine their applicability to large-scale IoT deployments.

Objectives

-Explore the core capabilities of data lakehouses and their benefits in data management.
-Evaluate the performance of streaming ingestion pipelines across different data lakehouse platforms.

Specific objectives include:
-Design and implement realistic streaming ingestion workloads with structured, semi-structured, and unstructured data.
-Develop a benchmarking framework.
-Compare performance (e.g., throughput in rows/sec, ingestion latency, query availability delay, and system resource usage).
-Analyse the strengths, weaknesses, and trade-offs of each platform's architecture.
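
As an illustration of how the performance metrics listed above could be derived from raw benchmark measurements, the following is a minimal Python sketch, not part of the proposal itself. The `BatchRecord` type, its field names, and the `summarise` function are hypothetical. The sketch assumes that all timestamps come from a single shared clock and that query availability delay is measured from the moment a write is acknowledged to the moment its rows first appear in a query; the actual definitions are to be fixed in the methodology task.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class BatchRecord:
    """One ingested batch (hypothetical record layout).

    All timestamps are in seconds on a single shared clock (assumption).
    """
    rows: int
    produced_at: float   # when the source emitted the batch
    committed_at: float  # when the platform acknowledged the write
    queryable_at: float  # when a query first returned the batch's rows


def summarise(batches: list[BatchRecord]) -> dict[str, float]:
    """Aggregate throughput and delay metrics over one benchmark run."""
    if not batches:
        raise ValueError("no batches recorded")

    # The run spans from the first emitted batch to the last acknowledged write.
    start = min(b.produced_at for b in batches)
    end = max(b.committed_at for b in batches)
    if end <= start:
        raise ValueError("run duration must be positive")

    latencies = [b.committed_at - b.produced_at for b in batches]
    delays = [b.queryable_at - b.committed_at for b in batches]

    return {
        "throughput_rows_per_sec": sum(b.rows for b in batches) / (end - start),
        "mean_ingestion_latency_s": mean(latencies),
        "max_ingestion_latency_s": max(latencies),
        "mean_query_availability_delay_s": mean(delays),
    }


if __name__ == "__main__":
    run = [
        BatchRecord(rows=1000, produced_at=0.0, committed_at=1.0, queryable_at=1.5),
        BatchRecord(rows=1000, produced_at=1.0, committed_at=2.0, queryable_at=3.0),
    ]
    for name, value in summarise(run).items():
        print(f"{name}: {value}")
```

Keeping the per-batch records separate from the aggregation means the same summary can be applied unchanged to measurements collected from each platform under evaluation. System resource usage is not covered here, since it would have to be sampled from the host or the platform itself.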

In addition, the work is expected to result in a scientific article for publication in an international conference or journal.

Work Plan - Semester 1

T1.1 – State-of-the-art analysis of data lakehouse architectures and platforms.
T1.2 – Selection of the data lakehouse platforms to evaluate.
T1.3 – Specification of the methodology and metrics to be used in the evaluation.
T1.4 – Writing of the intermediate report.

Work Plan - Semester 2

T2.1 – Evaluation of the selected data lakehouse platforms.
T2.2 – Analysis and validation of the results obtained.
T2.3 – Recommendations for platform selection based on the performance findings.
T2.4 – Thesis writing and submission of a scientific publication.

Conditions

The student will have access to all the computing resources needed to carry out the work. The evaluation, whether by simulation or on specific hardware, can be carried out using the computing resources available at DEI/CISUC.

Remarks

Advisors:
Jorge Bernardino (jorge@isec.pt)
Vasco Pereira (vasco@dei.uc.pt)

Supervisor

Vasco Pereira
vasco@dei.uc.pt