Propostas submetidas

DEI - FCTUC
Gerado a 2024-04-20 03:48:34 (Europe/Lisbon).
Voltar

Titulo Estágio

Big Data Approximate Query Answering

Áreas de especialidade

Engenharia de Software

Sistemas de Informação

Local do Estágio

DEI-FCTUC

Enquadramento

The area of online aggregation [1] studies new algorithms that allow to return approximate results (with statistical guarantees) for analytical queries at early stages of the processing, so that the user can terminate it as soon as the accuracy is acceptable. Recent studies have shown encouraging results [2, 3, 4], but there is still much room for improvement:
(1) The existing algorithms have only used simple random sampling or sample random walks to sample from query results. More sophisticated techniques based on Markov Chain Monte Carlo might be more effective.
(2) The streaming algorithms community has developed many techniques to summarize large data sets into compact data structures while preserving properties of the data. These summarization techniques can also be useful in approximate query processing.
(3) Integrating these techniques into data processing engines is still a significant challenge.
In this internship, we intend to propose and evaluate algorithms and techniques to approximate query answering in Big Data systems.

[1] P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In SIGMOD, pages 287–298. ACM, 1999.
[2] J. Bernardino, P. Furtado, H. Madeira. Approximate Query Answering Using Data Warehouse Striping. Journal of Intelligent Information Systems (JIIS). 19(2): 145-167 (2002)
[3] J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In SIGMOD, pages 171–182. ACM, 1997.
[4] F. Li, B. Wu, K. Yi, and Z. Zhao. Wander join: Online aggregation via random walks. In International Conference on Management of Data (SIGMOD), pages 615–629. ACM, 2016.

Objetivo

In practice, the expected outcomes of this internship are:
- Evaluate the existing algorithms that use simple random sampling or sample random walks to sample from query results.
- Evaluate summarization techniques used by streaming community in approximate query processing.
- Propose new techniques specifically adapted to approximate query answering in Big Data systems.
- A research paper, to be submitted and presented at a top international conference, describing the approach and main results obtained from the experiments.

Plano de Trabalhos - Semestre 1

[Some tasks might overlap; M=Month]
T1 (M1 – M3): Knowledge transfer and state of the art literature review on approximate query answering.
T2 (M3) Design new algorithms and techniques for Big Data Systems, using the information gathered in task T1 as basis.
T3 (M3) Identification of target systems to be used in the experiments.
T4 (M3 – M4) Implementation of a proof of concept prototype.
T5 (M5): Writing the Intermediate report.

Plano de Trabalhos - Semestre 2

[Some tasks might overlap; M=Month]
T1 (M6): Integration of the intermediate defense comments and completion of approximate query answering algorithms and techniques for Big Data Systems.
T2 (M6 – M7): Implementation of the architecture and mechanisms for Big Data systems, and execution of tests (functional).
T3 (M8): Execution of experiments and analysis of results.
T4 (M9): Write a research paper and submission to a top international conference on Database area (IEEE Big Data Congress, Database Systems for Advanced Applications - DASFAA, IEEE International Conference on Data Engineering – ICDE, etc.).
T5 (M10): Writing the thesis.

Condições

The work will be carried out in the facilities of the Department of Informatics Engineering at the University of Coimbra (CISUC - Software and Systems Engineering Group), where a work place and necessary computer resources will be provided.

Orientador

Jorge Bernardino
jorge@isec.pt 📩