Proposals with an identified student

DEI - FCTUC
Generated on 2025-07-17 15:22:26 (Europe/Lisbon).

Internship Title

Mitigating Bias and Enhancing Fairness in Large Language Models Trained on Imbalanced Data

Specialty Areas

Intelligent Systems

Internship Location

DEI

Background

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks, including text generation, summarization, and question answering. These models are typically trained on massive datasets that are, by nature, noisy, imbalanced, and potentially biased. Imbalanced data, where some classes or groups are significantly underrepresented, is a well-known issue in machine learning and often leads to models that generalize poorly or behave unfairly across demographic groups.
In the context of LLMs, such imbalances can exacerbate systemic biases, leading to outputs that disadvantage certain populations or propagate stereotypes. Fairness in AI has become a critical area of research, aiming to ensure that models treat all groups equitably, regardless of representation in the training data.
Despite growing awareness of these issues, most existing mitigation techniques are either task-specific or not well-adapted to the scale and architecture of LLMs.

Objective

The main objective of this thesis is to investigate the impact of data imbalance on fairness in LLMs and to develop methods to mitigate unfairness arising from such imbalance.

Work Plan - Semester 1

Semester 1: Literature Review, Problem Formulation, Initial Experiments
Month 1–2: Literature Review
- Review current research on imbalanced data handling, fairness in machine learning, and bias in LLMs.
- Study existing fairness metrics and mitigation strategies (e.g., oversampling, adversarial debiasing).
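As a concrete example of one fairness metric likely to appear in this review, a minimal sketch of the demographic parity difference (the largest gap in positive-prediction rate between groups; the function name and toy data are illustrative):

```python
from collections import defaultdict

def demographic_parity_difference(preds, groups):
    """Largest gap in positive-prediction rate between any two groups.

    preds:  iterable of 0/1 model predictions
    groups: iterable of group labels (e.g., a demographic attribute)
    """
    pos = defaultdict(int)
    tot = defaultdict(int)
    for p, g in zip(preds, groups):
        tot[g] += 1
        pos[g] += int(p)
    rates = [pos[g] / tot[g] for g in tot]
    return max(rates) - min(rates)

# Toy example: group "a" receives positives 3/4 of the time, group "b" 1/4.
preds  = [1, 1, 1, 0, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(preds, groups))  # 0.5
```

A value of 0 means all groups receive positive predictions at the same rate; larger values indicate a disparity that mitigation should reduce.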

Month 3–4: Dataset Analysis & Task Selection
- Identify or construct datasets with known imbalances (e.g., gender, ethnicity, topic representation).
- Select downstream NLP tasks for evaluation (e.g., classification).

Month 5–6: Baseline Evaluation
- Fine-tune and evaluate existing LLMs (e.g., GPT, T5, LLaMA) on selected datasets.
- Measure performance and fairness using standard metrics.
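One simple way to combine the performance and fairness measurements above is to report accuracy per demographic group, with the spread between the best and worst group serving as a fairness gap. A minimal sketch (function name and toy data are ours):

```python
from collections import defaultdict

def per_group_accuracy(preds, labels, groups):
    """Accuracy computed separately for each group; the spread between
    the best- and worst-served group is a simple fairness gap."""
    hit = defaultdict(int)
    tot = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        tot[g] += 1
        hit[g] += int(p == y)
    return {g: hit[g] / tot[g] for g in tot}

preds  = [1, 0, 1, 0, 0, 0]
labels = [1, 0, 0, 1, 1, 0]
groups = ["a", "a", "a", "b", "b", "b"]
acc = per_group_accuracy(preds, labels, groups)
# group "a": 2/3 correct, group "b": 1/3 correct -> gap of 1/3
print(max(acc.values()) - min(acc.values()))
```

The same per-group breakdown applies to any scalar metric (F1, calibration error), which is useful when overall accuracy hides poor performance on underrepresented groups.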


Work Plan - Semester 2

Semester 2: Method Development, Implementation, and Thesis Writing
Month 1–2: Method Design
- Develop one or more mitigation techniques: re-weighting, data augmentation, or fairness-aware loss functions.
- Define evaluation criteria for success.
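The first candidate technique listed above, re-weighting, can be illustrated with inverse-frequency class weights (a minimal sketch under our own naming; not the thesis's final method):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency,
    normalized so the weights average to 1 over the dataset."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# Toy 9:1 imbalance: the minority class gets a proportionally larger weight.
labels = ["pos"] * 9 + ["neg"] * 1
print(inverse_frequency_weights(labels))  # {'pos': 0.555..., 'neg': 5.0}
```

In practice these per-class weights would be passed into a weighted training loss (for instance, as the class-weight argument of a cross-entropy loss), so that errors on underrepresented classes contribute more to the gradient.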

Month 3–4: Implementation and Experimentation
- Implement and test proposed methods using pre-trained LLMs.
- Compare results with baselines across fairness and accuracy metrics.

Month 5: Framework and Guideline Creation
- Design a reproducible benchmarking framework for fairness in LLMs under imbalance.
- Draft practical guidelines for developers.

Conditions

n/a

Observations

n/a

Supervisor

Pedro Henriques Abreu/Miriam Seoane Santos
pha@dei.uc.pt