Proposals with Identified Students

DEI - FCTUC
Generated on 2024-04-23 15:29:50 (Europe/Lisbon).

Internship Title

Understanding Fairness Bias in Missing Data Imputation

Areas of Specialization

Intelligent Systems

Internship Location

DEI-CISUC

Background

Missing data refers to the absence of values in one or more features of a dataset. It occurs in most real-world domains and usually has a negative impact on the tasks performed with the data. For example, most machine learning models cannot handle missing values, and those that can may show significant performance degradation. Discarding the instances with missing values is the simplest solution, but it is usually avoided because the remaining data may become biased and valuable information is lost. A better strategy is imputation, in which plausible estimates are generated to replace the missing values. The impact of these estimates on the tasks performed with the data has been widely studied, particularly for classification and regression problems. More recently, however, another question has gained attention: how do imputation procedures affect the fairness of machine learning models trained on the imputed data? Imputation can change the characteristics of the data (e.g., it can shift the distribution of some features), so models trained on imputed data can become biased for or against specific groups. It is therefore important to understand when and how this fairness bias arises, and how to mitigate it.
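The following is a minimal illustration of how imputation can change a fairness measurement. It uses a toy dataset with two groups, A and B, in which values are missing only in group B. The sketch makes three assumptions that do not come from the proposal:

- mean imputation is the imputation method;
- a fixed threshold rule stands in for a trained model (the `threshold_classifier` helper and its threshold are hypothetical);
- demographic parity difference is the fairness metric.

```python
from statistics import mean


def mean_impute(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def positive_rate(preds):
    return sum(preds) / len(preds)


def demographic_parity_difference(preds, groups, a="A", b="B"):
    """|P(y_hat = 1 | group a) - P(y_hat = 1 | group b)|; 0 means equal positive rates."""
    rate_a = positive_rate([p for p, g in zip(preds, groups) if g == a])
    rate_b = positive_rate([p for p, g in zip(preds, groups) if g == b])
    return abs(rate_a - rate_b)


def threshold_classifier(xs, t=5.0):
    # Hypothetical fixed rule standing in for a trained model: predict 1 when x >= t.
    return [1 if x >= t else 0 for x in xs]


groups = ["A"] * 8 + ["B"] * 8
x_true = [7, 8, 9, 6, 7, 8, 9, 6,   # group A
          2, 3, 4, 5, 2, 3, 4, 5]   # group B
# Toy missingness pattern: every second value of group B is lost; group A is complete.
x_obs = x_true[:8] + [2, None, 4, None, 2, None, 4, None]

dpd_true = demographic_parity_difference(threshold_classifier(x_true), groups)
dpd_imputed = demographic_parity_difference(threshold_classifier(mean_impute(x_obs)), groups)
print(dpd_true, dpd_imputed)  # 0.75 0.5
```

The overall mean (6.0) is dominated by group A, so the imputed values in group B are pulled upward. As a result, the measured disparity moves from 0.75 to 0.5 even though the underlying data did not change. The direction and size of such shifts depend on the data, the mechanism and the imputation method, which is why they need to be studied systematically.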

Objective

In this master's thesis, the student should study how different imputation methods affect fairness in different classification and regression problems. Several factors should be taken into consideration: different missing data mechanisms (e.g., MCAR, MAR and MNAR), lower and higher levels of missingness, different types of classifiers and regressors, and various fairness metrics. Furthermore, the student should identify strategies to mitigate this fairness bias when it occurs.
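The missing data mechanisms and missingness levels mentioned in the objective can be simulated by injecting missing values into complete data. The following is a minimal sketch, assuming a single numeric feature and one simple scheme per mechanism:

- MCAR removes values independently of the data.
- MAR removes values depending on another, fully observed variable.
- MNAR removes values depending on the value itself.

The helper names, the "upper half of the driver" rule and the `income`/`age` variables are all hypothetical and only illustrate the idea. With this rule, the requested rate is only approximately met, and only for rates up to 0.5.

```python
import random
from statistics import median


def mcar_mask(n, rate, rng):
    """MCAR: every entry is missing with the same probability, independent of the data."""
    return [rng.random() < rate for _ in range(n)]


def _upper_half_mask(driver, rate, rng):
    """Make entries whose driver value is above the median missing with probability 2*rate.

    About half of the entries lie above the median, so the expected overall
    missing rate is roughly `rate` (valid for rate <= 0.5).
    """
    cut = median(driver)
    p = min(1.0, 2 * rate)
    return [d > cut and rng.random() < p for d in driver]


def mar_mask(observed_driver, rate, rng):
    # MAR: missingness depends on another variable that is always observed.
    return _upper_half_mask(observed_driver, rate, rng)


def mnar_mask(values, rate, rng):
    # MNAR: missingness depends on the value that goes missing.
    return _upper_half_mask(values, rate, rng)


def apply_mask(values, mask):
    """Replace masked entries with None."""
    return [None if m else v for v, m in zip(values, mask)]


rng = random.Random(42)
n = 10_000
income = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hypothetical feature that receives missing values
age = [rng.gauss(40.0, 10.0) for _ in range(n)]   # hypothetical, fully observed covariate

for rate in (0.1, 0.4):  # a lower and a higher missingness level
    masks = {
        "MCAR": mcar_mask(n, rate, rng),
        "MAR": mar_mask(age, rate, rng),
        "MNAR": mnar_mask(income, rate, rng),
    }
    for name, mask in masks.items():
        x_missing = apply_mask(income, mask)
        frac = sum(v is None for v in x_missing) / n
        print(f"{name}: target={rate:.1f} observed={frac:.3f}")
```

Comparing the observed fraction with the target is a quick sanity check. In an actual experiment, masks like these would be applied to the selected datasets before running each imputation method and measuring fairness.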

Work Plan - Semester 1

- Study the necessary background knowledge and state-of-the-art works related to missing data and fairness.
- Select the datasets to be considered, a comprehensive baseline of state-of-the-art imputation methods, the fairness metrics, and other aspects needed in the experiments.
- Propose the experimental setup to evaluate the impact of the imputation methods on the fairness levels of different classification and regression problems.
- Obtain initial results for a specific classification problem and use them to calibrate and adjust the experimental setup.
- Write the intermediate report.

Work Plan - Semester 2

- Conduct the experiments and evaluate the results.
- Based on the findings, identify strategies to tackle the fairness bias.
- Write the final report and a research paper.

Conditions

-

Supervisor

Pedro Manuel Henriques da Cunha Abreu
pha@dei.uc.pt