Internship Title
Probabilistic synthetic data generation
Areas of Specialty
Software Engineering
Internship Location
Coimbra
Context
When developing solutions to complex modelling problems, such as the money mule detection and fraudulent transaction identification problems we tackle at Feedzai, it is often beneficial to start from a simpler, more controlled environment. We propose developing a probabilistic model for synthetic data generation to help data scientists design more specific features during feature engineering, test the robustness of their models, and communicate their results through simple, easy-to-parse examples.
Objective
Develop a probabilistic model that generates synthetic data in a simple, controlled environment, so that data scientists can design more specific features during feature engineering, test the robustness of their models, and communicate their results through simple, easy-to-parse examples.
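As an illustration of what such a model could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not part of the proposal: the `Transaction` record, the hypothetical `generate_transactions` and `forwarded_ratio` functions, and the generative story itself (mules receive many incoming transfers and forward roughly 90% of each within a few hours). Because the mule labels are known by construction, a candidate feature such as the forwarded-money ratio can be checked directly against the ground truth.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    src: str       # sender account id
    dst: str       # receiver account id
    amount: float  # transferred amount
    hour: float    # hours since the start of the simulated period


def generate_transactions(n_accounts=100, p_mule=0.05, seed=0):
    """Sample a toy set of transfers together with ground-truth mule labels.

    Hypothetical generative story, for illustration only:
    - each internal account is a mule with probability p_mule;
    - legitimate accounts receive 1-3 transfers from external senders;
    - mules receive 5-10 transfers and forward ~90% of each to an
      external account within 0.5-6 hours.
    """
    rng = random.Random(seed)  # seeded so the toy data is reproducible
    accounts = [f"acc{i}" for i in range(n_accounts)]
    is_mule = {acc: rng.random() < p_mule for acc in accounts}
    txs = []
    for acc in accounts:
        n_in = rng.randint(5, 10) if is_mule[acc] else rng.randint(1, 3)
        for _ in range(n_in):
            amount = round(rng.lognormvariate(4.0, 1.0), 2)  # right-skewed amounts
            hour = rng.uniform(0, 24 * 30)                   # within a 30-day window
            txs.append(Transaction(f"ext{rng.randrange(1000)}", acc, amount, hour))
            if is_mule[acc]:
                # Mules pass most of the money on shortly after receiving it.
                txs.append(Transaction(acc, f"ext{rng.randrange(1000)}",
                                       round(0.9 * amount, 2),
                                       hour + rng.uniform(0.5, 6)))
    return txs, is_mule


def forwarded_ratio(txs, acc):
    """Candidate feature: money sent out divided by money received."""
    inflow = sum(t.amount for t in txs if t.dst == acc)
    outflow = sum(t.amount for t in txs if t.src == acc)
    return outflow / inflow if inflow else 0.0


txs, labels = generate_transactions(seed=1)
for acc in list(labels)[:3]:
    print(acc, labels[acc], round(forwarded_ratio(txs, acc), 2))
```

Since the generator returns the labels alongside the data, the separation a feature achieves between mules and legitimate accounts can be measured without any labelling effort, and the seed makes the same toy dataset easy to regenerate for a presentation.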
Work Plan - Semester 1
1. Review the existing literature on synthetic data generation
2. Onboard onto the existing Python tech stack at Feedzai
3. Interview the key stakeholders who will consume the synthetic data
Expected results:
1. Literature review
2. Refined plan for the second semester, based on the review
Work Plan - Semester 2
Stages:
1. Define success criteria for synthetic data generation
2. Implement the proposed synthetic data generation algorithms, based on the literature review
3. Refine the approach based on the results
4. Document the approach and results
5. Draft the final report and presentation
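For the first stage, one possible (hypothetical) fidelity criterion is to require that each numeric column of the synthetic data, such as transaction amounts, be distributed similarly to the corresponding real column. The sketch below computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical CDFs, using only the Python standard library; the function name `ks_statistic` and the sample values are assumptions for illustration, not part of the proposal.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    stat = 0.0
    # Both empirical CDFs are step functions that only jump at sample points,
    # so the maximum gap is attained at one of those points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        stat = max(stat, abs(cdf_a - cdf_b))
    return stat


real_amounts = [12.0, 30.5, 41.2, 55.0, 80.3]
synthetic_amounts = [10.1, 28.7, 45.9, 60.2, 95.0]
print(ks_statistic(real_amounts, synthetic_amounts))
```

A criterion could then read, for example, `ks_statistic(real_amounts, synthetic_amounts) < 0.1`, where the threshold is purely illustrative and would be agreed with the stakeholders interviewed in the first semester.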
Conditions
Remunerated
Supervisor
António Luís Correia
luis.correia@feedzai.com