Internship Title
Generating Fair Synthetic Data using a Hybrid Approach
Areas of Specialization
Intelligent Systems
Internship Location
DEI
Background
In recent years, the availability and use of large volumes of data have grown rapidly, fueling advances in domains such as machine learning and artificial intelligence. However, the widespread use of data-intensive applications raises concerns about data privacy, security, and fairness.
Ensuring fairness in data-driven systems is essential, as biased data can perpetuate and amplify social inequalities and discrimination. Synthetic data generation has emerged as a promising approach to address these concerns by providing an alternative to sensitive or biased data. By creating realistic synthetic datasets, researchers and practitioners can work with data that better preserves privacy and carries less bias.
Objective
The objective of this thesis is to investigate and develop a hybrid approach for generating fair synthetic data. The approach aims to combine the strengths of several prominent techniques, such as morphing algorithms, generative modeling, and fairness-aware algorithms.
Generative modeling, specifically techniques like generative adversarial networks (GANs), has demonstrated remarkable success in capturing the underlying distribution of data and generating realistic synthetic samples. Morphing algorithms, on the other hand, transform and interpolate between data instances, which makes it possible to generate synthetic samples that transition smoothly across attribute values and, when guided by fairness constraints, can help reduce bias in the resulting data.
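To make the idea of interpolating between data instances concrete, the following is a minimal sketch that assumes morphing is implemented as plain linear interpolation between two numeric records. The proposal does not specify a morphing algorithm, so the function names `morph` and `morph_samples` and the parameter `alpha` are hypothetical and chosen only for illustration.

```python
import random


def morph(source, target, alpha):
    """Linearly interpolate between two numeric records.

    Assumption: morphing is modeled as linear interpolation. alpha = 0
    returns the source record and alpha = 1 returns the target record.
    """
    if len(source) != len(target):
        raise ValueError("records must have the same number of features")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * s + alpha * t for s, t in zip(source, target)]


def morph_samples(source, target, n_samples, seed=0):
    """Draw n_samples synthetic records on the segment between two records."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [morph(source, target, rng.random()) for _ in range(n_samples)]


if __name__ == "__main__":
    # Two hypothetical records with two numeric features each.
    record_a = [0.0, 10.0]
    record_b = [2.0, 20.0]
    for sample in morph_samples(record_a, record_b, n_samples=3):
        print(sample)
```

In a fairness setting, the two endpoints could for instance be records drawn from different demographic groups, so that the interpolated samples populate regions of the feature space that are sparsely covered in the original data. Categorical features would need a different treatment than the numeric interpolation shown here.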
Work Plan - Semester 1
1. Literature Review
○ Conduct an in-depth review of relevant literature on synthetic data generation, fairness, generative modeling techniques, and morphing algorithms.
○ Identify key concepts, methodologies, and existing approaches in the field.
2. Problem Definition and Research Questions
○ Clearly define the problem statement for generating fair synthetic data using a hybrid approach.
○ Formulate specific research questions that will guide the study.
3. Data Collection and Preprocessing
○ Identify suitable datasets for experimentation and evaluation.
○ Gather and preprocess the necessary real-world data, applying appropriate privacy and anonymization measures.
4. Hybrid Morphing Approach Design
○ Explore and design a framework that integrates generative modeling techniques and morphing algorithms to generate fair synthetic data.
○ Define the components, workflows, and interactions within the proposed hybrid approach.
5. Implementation of Hybrid Approach Prototype
○ Develop a prototype of the hybrid morphing approach using appropriate programming languages and tools.
○ Implement the generative modeling and morphing algorithms, incorporating fairness constraints.
6. Evaluation Plan Development
○ Devise a comprehensive evaluation plan to assess the fairness and effectiveness of the generated synthetic data.
○ Identify suitable fairness metrics and performance evaluation criteria.
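As an illustration of the kind of fairness metric such an evaluation plan might include, the sketch below computes the demographic parity difference (also called statistical parity difference) for a binary outcome. The proposal does not commit to any particular metric, so this choice, the function names, and the toy data are assumptions made for illustration only.

```python
def positive_rate(labels):
    """Fraction of positive (1) outcomes in a list of binary labels."""
    return sum(labels) / len(labels)


def demographic_parity_difference(labels, groups, privileged):
    """Positive rate of the unprivileged group minus that of the privileged group.

    A value of 0 means both groups receive positive outcomes at the same
    rate. Negative values mean the unprivileged group is disadvantaged.
    """
    priv = [y for y, g in zip(labels, groups) if g == privileged]
    unpriv = [y for y, g in zip(labels, groups) if g != privileged]
    if not priv or not unpriv:
        raise ValueError("both groups must be non-empty")
    return positive_rate(unpriv) - positive_rate(priv)


if __name__ == "__main__":
    # Hypothetical toy data: binary outcomes and a two-valued sensitive attribute.
    labels = [1, 1, 1, 0, 1, 0, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(demographic_parity_difference(labels, groups, privileged="a"))
```

Computing the same quantity on the original dataset and on the generated dataset would give one simple way to check whether the synthetic data narrows the gap between groups.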
Work Plan - Semester 2
1. Data Generation and Fairness Analysis
○ Generate synthetic data using the hybrid morphing approach developed in Semester 1.
○ Perform fairness analysis on the generated data, comparing it against the original real-world dataset.
2. Evaluation and Validation
○ Execute the evaluation plan devised in Semester 1 to measure the fairness and effectiveness of the generated synthetic data.
○ Validate the results and analyze the findings in terms of fairness preservation and data quality.
3. Iterative Refinement of the Hybrid Approach
○ Identify limitations or areas for improvement in the hybrid approach based on the evaluation results.
○ Make necessary modifications and refinements to enhance the fairness and quality of the synthetic data.
4. Comparative Analysis
○ Conduct a comparative analysis between the hybrid morphing approach and existing approaches in terms of fairness, realism, and efficiency.
○ Highlight the strengths and limitations of the proposed approach.
5. Results Compilation and Discussion
○ Compile the experimental results, evaluation outcomes, and comparative analysis findings.
○ Discuss the implications of the results in relation to the research questions and objectives.
6. Conclusion and Dissertation Writing
○ Summarize the key findings and contributions of the research.
○ Write the dissertation, including introduction, literature review, methodology, results, discussion, and conclusion chapters.
○ Revise and refine the dissertation, ensuring coherence, clarity, and proper referencing.
Conditions
.
Remarks
.
Supervisor
Pedro Henriques Abreu
pha@dei.uc.pt