Internship Title
Parallelization of SPARQL Queries on Triple Stores for Semantic Data
Technological Area
Software Engineering
Internship Location
SSE - CISUC
Background
The Semantic Web is considered the next step in the evolution of the Internet. The Web as we know it today was written mostly for humans to read. The Semantic Web is intended for both humans and machines to read and act upon information. Tasks that are done manually today may be automated by intelligent agents. Practical uses of the Semantic Web include recommendation engines, advanced search, and system integration, among others.
One of the main issues with applying Semantic Web technologies today is poor query performance. Compared with traditional relational databases in particular, queries to triple stores are much slower, which reduces their usefulness in services where users expect an immediate response.
In the proposed work, the student will work towards improving the performance of these systems, with two ideas as starting points:
1) A new generation of NoSQL databases is emerging (such as MongoDB, Redis, Kyoto Cabinet, and Neo4j). Recent studies, mainly from industry, suggest that these databases achieve higher throughput than traditional databases at the cost of simpler data structures, mostly based on key-value stores. These technologies could be used to improve the performance of a triple store.
2) SPARQL queries are combinations of conditions joined by AND or OR relationships. The smaller conditions of a query can be evaluated in parallel and then merged into the final query result. Furthermore, several subsets of the triple store can be processed in parallel. Existing triple stores support concurrent connections but do not parallelize individual queries. This is another opportunity to improve the performance of triple stores.
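The two starting points above can be sketched together in a few lines of Python. This is only an illustration under stated assumptions, not the design the project will adopt: a plain dictionary stands in for the key-value store, the index layout and the names build_index, match_pattern, join and evaluate_and are hypothetical, and only AND (join) merging is shown. Threads are used to show the structure of the computation; in CPython, CPU-bound matching would need processes or another runtime to gain real speed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Toy data set: (subject, predicate, object) triples. Illustrative only.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "livesIn", "coimbra"),
    ("carol", "livesIn", "lisbon"),
]

def build_index(triples):
    """Assumed key-value layout: key = (position, term), value = set of triples."""
    index = {}
    for triple in triples:
        for position, term in enumerate(triple):
            index.setdefault((position, term), set()).add(triple)
    return index

def is_var(term):
    """Variables are written SPARQL-style, with a leading question mark."""
    return term.startswith("?")

def match_pattern(index, triples, pattern):
    """Return the variable bindings (dicts) that satisfy one triple pattern."""
    candidates = None
    # Narrow the candidates with one key lookup per constant term.
    for position, term in enumerate(pattern):
        if not is_var(term):
            hits = index.get((position, term), set())
            candidates = hits if candidates is None else candidates & hits
    if candidates is None:  # pattern has only variables: full scan
        candidates = set(triples)
    bindings = []
    for triple in candidates:
        binding = {}
        # A variable repeated inside the pattern must bind to a single value.
        if all(binding.setdefault(term, value) == value
               for term, value in zip(pattern, triple) if is_var(term)):
            bindings.append(binding)
    return bindings

def join(left, right):
    """AND merge: keep pairs of bindings that agree on every shared variable."""
    return [{**a, **b} for a, b in product(left, right)
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

def evaluate_and(index, triples, patterns, workers=4):
    """Evaluate each pattern as a separate task, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(
            lambda p: match_pattern(index, triples, p), patterns))
    result = partials[0]
    for partial in partials[1:]:
        result = join(result, partial)
    return result

index = build_index(TRIPLES)
# "Whom does alice know, and where do they live?"
query = [("alice", "knows", "?x"), ("?x", "livesIn", "?city")]
result = evaluate_and(index, TRIPLES, query)
```

An OR relationship would merge the partial results by concatenation (union) instead of a join. Processing subsets of the store in parallel could follow the same shape, with one task per partition of the triples rather than one task per pattern.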
Objective
The main objective of this project is to improve the performance of triple store queries. Two main artifacts should result from this work:
1) A triple store engine that leverages both the speed and structure of NoSQL databases and the ability to parallelize queries.
2) A performance study benchmarking the proposed triple store against existing state-of-the-art triple stores (AllegroGraph, Virtuoso, 4store).
Work Plan - Semester 1
17 Sep - 31 Oct
Review of the most recent approaches to this problem in the state of the art.
1 Nov - 30 Nov
Approach - Definition of the requirements and work plan.
1 Dec - 31 Dec
Selection of a NoSQL database and implementation of the backend.
1 Jan - 28 Jan
Writing and review of the first-semester report.
Work Plan - Semester 2
15 Feb - 8 Mar
Creation of a SPARQL interface for the triple store.
9 Mar - 31 Mar
Parallelization of query clauses.
1 Apr - 30 Apr
Parallelization of graph subsets.
1 May - 31 May
Benchmark of the triple store against state-of-the-art engines.
1 Jun - 28 Jun
Writing and review of the dissertation.
Conditions
The proposed work plan will be carried out in the Software and Systems Engineering Group of CISUC, where the student will be given access to the required hardware.
A research grant may be given to the student, depending on the acceptance of projects by INOV-C Ignição.
Remarks
We are looking for students who understand the importance of the Semantic Web, are familiar with RDF and SPARQL, and have used a triple store before.
Students should also be familiar with concurrent and parallel programming.
Experience with NoSQL databases is also an advantage for this project.
Supervisor
Alcides Fonseca
amaf@dei.uc.pt