Proposals with an identified student

DEI - FCTUC
Generated on 2025-07-17 15:45:40 (Europe/Lisbon).

Internship Title

Towards Fair and Privacy-Preserving Oversampling: A Novel SMOTE Variant Handling Outliers and Skewed Distributions

Areas of specialization

Intelligent Systems

Internship Location

DEI

Background

Synthetic Minority Over-sampling Technique (SMOTE) and its variants have been widely used to address class imbalance in supervised learning. However, standard SMOTE approaches typically disregard three crucial aspects in modern data science:

-Fairness – Ensuring that synthetic data do not amplify or introduce biases with respect to sensitive attributes (e.g., gender, race).

-Privacy – Protecting individuals' data from being reverse-engineered or exposed through oversampling.

-Data Complexity – Handling real-world challenges such as outliers and skewed distributions that can compromise both the quality and utility of synthetic data.

This thesis proposes to develop a new oversampling algorithm that extends the SMOTE family to explicitly address fairness, privacy preservation, and data complexity (outliers and skewed distributions).
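For orientation, the interpolation step shared by the SMOTE family can be sketched as follows. This is a minimal illustration of classic SMOTE, not the method this thesis sets out to develop. The function name `smote_sketch`, its parameters, and the toy data are assumptions made for the example, and it assumes at least two minority samples given as numeric tuples.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample, closest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
points = smote_sketch(minority, n_new=5)
```

Every synthetic point lies on a segment between two real minority samples. An outlier in the minority class can therefore pull new points into sparse regions, and a small interpolation factor places a synthetic point very close to a real record, which is where the data-complexity and privacy concerns listed above come from.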

Objectives

1-Literature Review: Analyze state-of-the-art oversampling algorithms, especially those addressing fairness and privacy.

2-Bias and Privacy Risk Analysis: Understand how current SMOTE variants may introduce bias or leak private information.

3-Algorithm Design: Develop a new oversampling method incorporating:

3.1 Fairness constraints (e.g., equalized odds, demographic parity)

3.2 Privacy-preserving mechanisms (e.g., differential privacy or distance-based privacy filtering)

3.3 Robustness to outliers and capability to handle skewed data distributions
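The two fairness criteria named in 3.1 are usually measured as gaps between groups. The sketch below shows one common way to quantify them for binary predictions and a single sensitive attribute. The helper names and the toy data are hypothetical, and the code assumes every (group, label) combination contains at least one sample.

```python
def positive_rate(y_pred, mask):
    """Share of positive predictions among the samples selected by mask."""
    selected = [p for p, m in zip(y_pred, mask) if m]
    return sum(selected) / len(selected)


def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-prediction rate between groups."""
    rates = [
        positive_rate(y_pred, [g == a for g in group])
        for a in sorted(set(group))
    ]
    return max(rates) - min(rates)


def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group difference in TPR or FPR."""
    gaps = []
    for label in (0, 1):  # label 0 gives the FPR, label 1 gives the TPR
        rates = []
        for a in sorted(set(group)):
            mask = [g == a and t == label for g, t in zip(group, y_true)]
            rates.append(positive_rate(y_pred, mask))
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Hypothetical toy predictions for two groups "a" and "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = demographic_parity_gap(y_pred, group)      # 0.5
eo_gap = equalized_odds_gap(y_true, y_pred, group)  # 0.5
```

A gap of 0 means the criterion is met exactly. In an oversampling setting these gaps would be computed on a classifier trained with and without the synthetic data, to check whether the generated samples widen them.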

4-Experimental Validation: Evaluate the proposed method on real-world datasets from multiple domains.

5-Comparison with Baselines: Benchmark against existing SMOTE variants and fairness-aware models using metrics for performance, fairness, and privacy.

Work Plan - Semester 1

T1. Literature Survey: Study existing SMOTE variants, fairness-aware ML, and privacy-preserving data generation
T2. Problem Formalization: Define fairness and privacy metrics relevant to oversampling; specify dataset requirements
T3. Exploratory Data Analysis: Analyze selected datasets for imbalance, skew, outliers, and bias
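The checks named in T3 can be illustrated with three simple statistics: the imbalance ratio, the moment-based skewness coefficient, and the 1.5 × IQR outlier rule. These are common choices and are assumptions for this sketch. The proposal does not fix which measures will be used, and the function names and toy data are hypothetical.

```python
import statistics
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def iqr_outliers(values):
    """Values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical toy data: 9 majority labels, 1 minority label,
# and one feature with a single extreme value.
labels = [0] * 9 + [1]
feature = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

ratio = imbalance_ratio(labels)   # 9.0
skew = skewness(feature)          # positive: long right tail
outliers = iqr_outliers(feature)  # [100]
```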

Work Plan - Semester 2

T4. Algorithm Development: Design and implement the novel oversampling algorithm
T5. Privacy & Fairness Integration: Integrate privacy-preserving techniques (e.g., DP-noise) and fairness-aware generation strategies
T6. Experimental Setup: Define benchmarks, performance metrics, fairness/privacy evaluation tools
T7. Evaluation and Tuning: Run experiments and refine the method based on empirical findings
T8. Documentation & Thesis Writing: Report results, prepare thesis manuscript, and finalize documentation
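As an illustration of the DP-noise mentioned in T5, the sketch below applies the Laplace mechanism to a counting query, the textbook case in which adding or removing one record changes the result by at most 1. It illustrates the mechanism only. How noise would enter the oversampler itself is left open by the proposal, and the function names, the epsilon value, and the count are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with epsilon-differential privacy (Laplace mechanism)."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
# Hypothetical query: number of minority-class records in a dataset.
released = noisy_count(true_count=120, epsilon=0.5, rng=rng)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy, which is the trade-off the evaluation in T6 and T7 would need to measure.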

Conditions

n/a

Remarks

Supervisors:
• Pedro Abreu
• Penousal Machado

Supervisor

Pedro Henriques Abreu/Penousal Machado
pha@dei.uc.pt