Internship Title
An Approach to Reducing TID Lists in Data Cube Approaches
Areas of Specialization
Sistemas de Informação
Engenharia de Software
Internship Location
CISUC
Background
Main memory (RAM, Random-Access Memory) is known to be efficient for processing analytical queries. In the context of Big Data, however, where data volume, heterogeneity and update rate are critical, reducing tuple identifier (TID) lists becomes fundamental. Data cube computation approaches based on inverted indices, such as Frag-Cubing, H-Frag and HIC, are efficient alternatives to traditional R-OLAP approaches for Big Data, but they still inherit the problem of large inverted indices for frequent attribute values in the relation. Such inverted indices often become too large to be stored in RAM, and very frequent attribute values can consequently cause storage problems in external memory as well when it is limited to a few gigabytes. Very large TID lists also slow down query processing, since they must be loaded into RAM before intersection operations can be performed. Exploring how TID lists can be reduced therefore remains important.
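To make this cost concrete, the following is a minimal Python sketch, not an implementation of Frag-Cubing, H-Frag or HIC themselves. It assumes a hypothetical three-dimension relation (`RELATION`) in which each tuple's position is its TID, indexes it as one sorted TID list per (dimension, value) pair, and answers a point query by intersecting the relevant lists shortest-first. All names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy relation: the position of each tuple is its TID.
RELATION = [
    ("a1", "b1", "c1"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c1"),
    ("a1", "b1", "c2"),
    ("a1", "b1", "c1"),
]


def build_inverted_index(relation):
    """Map (dimension, value) -> sorted list of TIDs holding that value."""
    index = defaultdict(list)
    for tid, row in enumerate(relation):
        for dim, value in enumerate(row):
            index[(dim, value)].append(tid)  # TIDs arrive in increasing order
    return dict(index)


def intersect_sorted(a, b):
    """Merge-style intersection of two sorted TID lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def point_query(index, conditions):
    """Return the TIDs matching all (dimension, value) conditions.

    Lists are intersected shortest-first so intermediate results stay small;
    an unknown value yields an empty list and therefore an empty answer.
    """
    lists = sorted((index.get(c, []) for c in conditions), key=len)
    if not lists:
        return []
    result = lists[0]
    for tids in lists[1:]:
        result = intersect_sorted(result, tids)
        if not result:
            break
    return result
```

For example, `point_query(build_inverted_index(RELATION), [(0, "a1"), (1, "b1")])` returns `[0, 3, 4]`, so the COUNT measure of the cell (a1, b1, *) is 3. Even in this toy relation the frequent value `a1` already covers 4 of the 5 tuples; at Big Data scale, lists like this one are what must be loaded into RAM before every intersection.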
More generally, exploring alternative representations for TID lists is relevant to the many approaches based on inverted indices.
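One well-known representation, shown here only as an illustration and not as the method this internship will develop, is delta (gap) encoding followed by variable-byte compression: because TID lists are sorted, storing the differences between consecutive TIDs yields small integers that fit in fewer bytes. The functions `encode_tids` and `decode_tids` are hypothetical names; bitmap-based schemes (e.g., Roaring bitmaps) are equally plausible candidates.

```python
def encode_tids(tids):
    """Delta + variable-byte encode a sorted (non-decreasing) TID list.

    Each gap is written 7 bits at a time, low bits first; the high bit of a
    byte is set when more bytes of the same gap follow.
    """
    out = bytearray()
    prev = 0
    for tid in tids:
        gap = tid - prev
        if gap < 0:
            raise ValueError("TID list must be sorted")
        prev = tid
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            gap >>= 7
        out.append(gap)  # last byte of this gap, continuation flag clear
    return bytes(out)


def decode_tids(data):
    """Invert encode_tids: rebuild gaps from 7-bit groups, then prefix-sum."""
    tids = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more 7-bit groups follow for this gap
        else:
            prev += gap
            tids.append(prev)
            gap = 0
            shift = 0
    return tids
```

For instance, a value occurring in every other tuple of a 1,000,000-tuple relation has 500,000 TIDs, i.e., 2,000,000 bytes as 32-bit integers. Since every gap fits in 7 bits, `len(encode_tids(list(range(0, 1_000_000, 2))))` is 500,000 bytes, a four-fold reduction, at the price of decoding the list before (or while) intersecting it.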
Objectives
In practice, the expected outcomes of this internship are:
- Methods for reducing TID lists in data cube approaches based on inverted indices;
- A research paper, to be submitted to and presented at a top international conference, describing the approach and the main experimental results.
Work Plan - Semester 1
[Some tasks might overlap; M=Month]
T1 (M1 – M3): Knowledge transfer and state-of-the-art literature review on data cubes and on inverted-index-based data cube approaches.
T2 (M3): Identification of the data sets to be used in the experiments.
T3 (M3 – M4): Experiments with the approaches gathered in T1 on the data sets identified in T2.
T5 (M5): Writing the intermediate report.
Work Plan - Semester 2
[Some tasks might overlap; M=Month]
T6 (M6): Integration of the feedback from the intermediate defense and completion of the study of inverted-index-based data cube approaches.
T7 (M6 – M7): Implementation of the approach for reducing TID lists in data cubes, and execution of tests.
T8 (M8): Execution of experiments and analysis of results.
T9 (M9): Writing a research paper and submitting it to a top international venue in the database area (e.g., ACM Transactions on Database Systems - TODS, International Conference on Database Systems for Advanced Applications - DASFAA, IEEE International Conference on Data Engineering - ICDE).
T10 (M10): Writing the thesis.
Conditions
The work will be carried out in the facilities of the Department of Informatics Engineering at the University of Coimbra (CISUC - Software and Systems Engineering Group), where a workspace and the necessary computing resources will be provided.
Supervisors
Rodrigo Silva, Jorge Bernardino
rrochas@dei.uc.pt