Assigned Proposals

DEI - FCTUC
Generated on 2024-12-04 19:12:30 (Europe/Lisbon).

Internship Title

Addressing data complexity in imbalanced contexts

Areas of Specialization

Software Engineering

Intelligent Systems

Internship Location

DEI

Background

Imbalanced Data (ID) is known to degrade classifier performance and is a common problem across many pattern recognition contexts. However, ID alone is not fully responsible for difficulties in the classification stage: other factors related to intrinsic data characteristics influence classification just as much, including small disjuncts, lack of density, class overlap, and noisy and missing data, among others. These difficulties have yet to be addressed satisfactorily, since they are most often studied individually; the challenge arises when they must be identified and addressed simultaneously in real scenarios.

Objective

The core of this work is addressing the data complexity associated with imbalanced scenarios. To that end, the student will define several strategies to analyse not only the disproportion between classes but also the structure and nature of the data, so that the true sources of classification difficulty are identified and overcome. Within this context, the problem of small disjuncts is particularly interesting: identifying subconcepts in real-world data would constitute a major contribution to research. The student will study appropriate clustering solutions and measures for dealing with complex shapes, taking into account the minority vs. majority patterns within clusters.
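
To make the idea of minority vs. majority patterns within clusters concrete, the following is a minimal sketch, not the proposal's method. It assumes cluster assignments were already produced by some clustering algorithm. The function names (`cluster_composition`, `candidate_small_disjuncts`), the `max_size` threshold, and the flagging rule are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def cluster_composition(cluster_ids, labels, minority_label):
    """Count minority and majority samples inside each cluster."""
    counts = defaultdict(Counter)
    for cid, y in zip(cluster_ids, labels):
        key = "minority" if y == minority_label else "majority"
        counts[cid][key] += 1
    # Counter returns 0 for a missing key, so pure clusters are handled.
    return {cid: (c["minority"], c["majority"]) for cid, c in counts.items()}


def candidate_small_disjuncts(composition, max_size=5):
    """Flag small, minority-dominated clusters (an assumed heuristic)."""
    flagged = []
    for cid, (n_min, n_maj) in composition.items():
        if n_min > n_maj and (n_min + n_maj) <= max_size:
            flagged.append(cid)
    return sorted(flagged)


# Toy data: cluster ids would normally come from a clustering algorithm.
cluster_ids = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

composition = cluster_composition(cluster_ids, labels, minority_label=1)
print(composition)
print(candidate_small_disjuncts(composition))
```

In this toy example only cluster 1 is flagged: it is small and contains only minority samples, while the lone minority sample in cluster 0 is surrounded by majority samples and would need a different analysis (for instance, as overlap or noise).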

Work Plan - Semester 1

- Learn about the fundamental aspects of dealing with imbalanced data
- Learn about data complexity factors and measures
- Learn about classification and clustering algorithms
- Select a set of synthetic and real datasets that reflect different scenarios: imbalance ratio, sample size, missing data, small disjuncts, among others
- Plan the experimental setup to follow in the second semester
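
When selecting datasets along the dimensions listed above, it can help to summarise each candidate with a few simple descriptors. The sketch below is illustrative only. It assumes the imbalance ratio is defined as the largest class count divided by the smallest, and that missing values are represented as `None`. The helper names are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio between the largest and the smallest class counts."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("at least two classes are required")
    return max(counts.values()) / min(counts.values())


def missing_rate(rows):
    """Fraction of cells equal to None (assumed missing-value marker)."""
    total = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row if value is None)
    return missing / total if total else 0.0


# Toy dataset: 90 negative and 10 positive samples.
labels = ["neg"] * 90 + ["pos"] * 10
rows = [[1.0, None], [2.0, 3.0]]

print(len(labels))              # sample size
print(imbalance_ratio(labels))  # 9.0
print(missing_rate(rows))       # 0.25
```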

Work Plan - Semester 2

- Data complexity assessment and classification experiments
- Implement different clustering strategies to apply to the detection of small disjuncts
- Evaluate the results and discuss the findings
- Write a scientific article and the dissertation
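
As one possible starting point for the data complexity assessment step in the plan above, the sketch below computes a per-feature Fisher's discriminant ratio, (mean_a - mean_b)^2 / (var_a + var_b), and takes the maximum over features. This ratio is commonly used as a feature-overlap measure in the data complexity literature. The proposal does not name specific measures, so this choice, the use of population variance, and the function names are assumptions made for illustration.

```python
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """Fisher's discriminant ratio for one feature and two classes."""
    denom = pvariance(values_a) + pvariance(values_b)
    diff_sq = (mean(values_a) - mean(values_b)) ** 2
    if denom == 0:
        # Both classes are constant on this feature.
        return float("inf") if diff_sq > 0 else 0.0
    return diff_sq / denom


def max_fisher_ratio(X, y, class_a, class_b):
    """Largest per-feature Fisher ratio; higher suggests less overlap."""
    ratios = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == class_a]
        b = [row[j] for row, label in zip(X, y) if label == class_b]
        ratios.append(fisher_ratio(a, b))
    return max(ratios), ratios


# Toy data: feature 0 separates the classes, feature 1 overlaps heavily.
X = [[1.0, 5.0], [2.0, 6.0], [3.0, 5.0],
     [7.0, 6.0], [8.0, 5.0], [9.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]

best, per_feature = max_fisher_ratio(X, y, class_a=0, class_b=1)
print(best, per_feature)
```

In the toy data the first feature yields a much larger ratio than the second, which matches the intuition that the classes overlap on the second feature only.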

Conditions

The work is not funded.

Supervisor

Pedro Manuel Henriques da Cunha Abreu
pha@dei.uc.pt