Submitted proposals

Generated on 2024-04-25 06:04:56 (Europe/Lisbon).

Internship Title

An approach to reducing TID lists in data cube approaches

Areas of Specialization

Software Engineering

Internship Location

DEI-FCTUC

Background

The use of main memory (RAM, Random-Access Memory) is known to be efficient for processing analytical queries; however, in the context of Big Data, where data volume, heterogeneity and data updates are critical, reducing tuple identifier (TID) lists becomes fundamental. Data cube computation approaches that adopt an inverted-index strategy, such as Frag-Cubing, H-Frag and HIC, are efficient alternatives to traditional R-OLAP approaches for Big Data; however, they still inherit the problem of the large size of the inverted indices of frequent attribute values in the relation. The inverted index often becomes too large to be stored in RAM. Very frequent attribute values can, in turn, cause storage problems in external memory when it is limited to a few gigabytes. Very large TID lists can also slow down queries, since these lists must be loaded into RAM before intersection operations can be performed. Exploring how TID lists can be reduced therefore remains important.
Exploring different representations for TID lists is thus relevant to numerous approaches based on inverted indices.
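To make the problem concrete, the sketch below builds, for a small hypothetical relation, one inverted index per dimension (attribute value mapped to a sorted TID list) and answers a point query by intersecting TID lists, in the spirit of inverted-index cubing approaches such as Frag-Cubing. The relation, the function names (`build_inverted_indices`, `intersect_sorted`, `point_query`) and the merge-based intersection are illustrative assumptions, not the implementation of Frag-Cubing, H-Frag or HIC.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical fact table: each row is a tuple of dimension values; its TID is its position.
Relation = Sequence[Tuple[str, ...]]
InvertedIndex = Dict[str, List[int]]


def build_inverted_indices(relation: Relation) -> List[InvertedIndex]:
    """Build one inverted index per dimension, mapping each value to its sorted TID list."""
    indices = [defaultdict(list) for _ in relation[0]]
    for tid, row in enumerate(relation):
        for dim, value in enumerate(row):
            indices[dim][value].append(tid)  # rows are scanned in order, so lists stay sorted
    return [dict(index) for index in indices]


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Merge-based intersection of two sorted TID lists in O(len(a) + len(b)) time."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def point_query(indices: List[InvertedIndex], conditions: Dict[int, str]) -> List[int]:
    """Return the TIDs of tuples matching every (dimension -> value) condition."""
    if not conditions:
        raise ValueError("at least one condition is required")
    # Intersect starting from the shortest list so intermediate results stay small.
    lists = sorted((indices[dim].get(value, []) for dim, value in conditions.items()), key=len)
    result = lists[0]
    for tid_list in lists[1:]:
        result = intersect_sorted(result, tid_list)
    return result


relation = [
    ("a1", "b1", "c1"),
    ("a1", "b2", "c1"),
    ("a1", "b1", "c2"),
    ("a2", "b1", "c1"),
    ("a1", "b1", "c1"),
]
indices = build_inverted_indices(relation)
print(indices[0]["a1"])                                         # [0, 1, 2, 4]
print(point_query(indices, {0: "a1", 1: "b1"}))                 # [0, 2, 4]
print(len(point_query(indices, {0: "a1", 1: "b1", 2: "c1"})))  # COUNT measure: 2
```

The list for the frequent value `a1` already covers four of the five tuples, and every query on it has to load and scan that whole list. Because each TID appears exactly once in each dimension's index, the base indices alone store N × D TIDs; with illustrative figures of 10^9 tuples, 10 dimensions and 4-byte TIDs, that is 4 × 10^10 bytes (about 40 GB) before any compression.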

Objective

In practice, the expected outcomes of this internship are:
- Methods for reducing TID lists in data cube approaches based on inverted indices;
- A research paper, to be submitted and presented at a top international conference, describing the approach and main results obtained from the experiments.
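One direction the first outcome could take is changing the physical representation of TID lists. As a minimal sketch, assuming fixed-width 4-byte TIDs as the uncompressed baseline, the code below compares two standard encodings from the inverted-index literature: gap (delta) encoding followed by variable-byte compression, and a bitmap with one bit per tuple. These are generic techniques chosen for illustration, not the approach the internship is expected to develop, and the function names are hypothetical.

```python
from typing import List


def encode_varbyte_gaps(tids: List[int]) -> bytes:
    """Gap-encode a sorted TID list, then write each gap as a variable-byte integer.

    Each byte carries 7 payload bits; the high bit (0x80) marks the last byte of a gap.
    """
    out = bytearray()
    prev = 0
    for tid in tids:
        gap = tid - prev
        prev = tid
        groups = []
        while True:
            groups.append(gap & 0x7F)
            gap >>= 7
            if gap == 0:
                break
        # Emit the most significant 7-bit group first; flag the final byte.
        for k, group in enumerate(reversed(groups)):
            out.append(group | 0x80 if k == len(groups) - 1 else group)
    return bytes(out)


def decode_varbyte_gaps(data: bytes) -> List[int]:
    """Invert encode_varbyte_gaps: rebuild gaps, then prefix-sum them back into TIDs."""
    tids: List[int] = []
    value = 0
    prev = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:  # last byte of the current gap
            prev += value
            tids.append(prev)
            value = 0
    return tids


def to_bitmap(tids: List[int], n_tuples: int) -> bytearray:
    """One bit per tuple in the relation: bit t is set iff TID t is in the list."""
    bitmap = bytearray((n_tuples + 7) // 8)
    for tid in tids:
        bitmap[tid >> 3] |= 1 << (tid & 7)
    return bitmap


n_tuples = 100_000
frequent = list(range(0, n_tuples, 2))  # hypothetical value present in every other tuple
rare = [10, 50_000, 99_999]             # hypothetical value present in only three tuples
for name, tids in (("frequent", frequent), ("rare", rare)):
    # Sizes in bytes: plain 4-byte array, gap + variable-byte, bitmap.
    print(name, 4 * len(tids), len(encode_varbyte_gaps(tids)), len(to_bitmap(tids, n_tuples)))
# frequent 200000 50000 12500
# rare 12 7 12500
```

With these illustrative inputs the bitmap is the smallest representation for the frequent value, while the gap-compressed list is the smallest for the rare one. This frequency-dependent trade-off is one reason why alternative representations for the TID lists of frequent values are worth exploring.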

Work Plan - Semester 1

[Some tasks might overlap; M=Month]
T1 (M1-M3): Knowledge transfer and state-of-the-art literature review on data cubes and on inverted indices for data cube approaches.
T2 (M3): Identification of the data set to be used in the experiments.
T3 (M3-M4): Experiments with the approaches gathered in task T1 on the data set identified in task T2.
T5 (M5): Writing the intermediate report.

Work Plan - Semester 2

[Some tasks might overlap; M=Month]
T6 (M6): Integration of the comments from the intermediate defense and completion of the study of inverted-index approaches for data cubes.
T7 (M6-M7): Implementation of the approach for reducing TID lists in data cubes, and execution of tests.
T8 (M8): Execution of experiments and analysis of results.
T9 (M9): Writing a research paper and submitting it to a top international venue in the database area (e.g., ACM Transactions on Database Systems - TODS, Database Systems for Advanced Applications - DASFAA, IEEE International Conference on Data Engineering - ICDE).
T10 (M10): Writing the thesis.

Conditions

The work will be carried out in the facilities of the Department of Informatics Engineering at the University of Coimbra (CISUC - Software and Systems Engineering Group), where a workplace and the necessary computing resources will be provided.

Remarks

A scholarship may be available (amount to be defined) for at least part of the duration of the internship.

The work will be co-supervised by Professor Jorge Bernardino.

Supervisor

Rodrigo Rocha Silva
rrochas@dei.uc.pt