Proposals without an assigned student

DEI - FCTUC
Generated on 2024-05-17 08:22:03 (Europe/Lisbon).

Internship Title

Fraud Prevention with algorithms for tabular data

Internship Location

Feedzai in Coimbra

Background

Fraud prevention aims to identify abnormal customer behavior associated with the intent to obtain
resources under false pretenses. The field has proved to be a fruitful application for machine learning
(ML) algorithms. Traditionally, such ML algorithms use features extracted from structured data, i.e.,
data comprising both categorical and numerical variables. Furthermore, fraud prevention imposes
additional constraints on which ML models are acceptable. For example, due to the regulatory
environment, models must show a good degree of interpretability.
Practitioners often rely on tree-based approaches to address fraud detection, i.e., they use random
forests or gradient-boosted trees instead of deep learning (DL) models. In fact, tree-based models
have proved to outperform the latter on structured data and are better suited to the additional
constraints of fraud detection models.
But new research in DL, e.g., TabNet, aims to boost performance in structured datasets by bringi

Objective

The goal of the internship is to investigate, compare, and benchmark different ML algorithms suitable
for tabular data, and to explore how they fit into the fraud detection use case, taking into consideration
its main concerns (e.g., unbalanced data, high computational efficiency, explainability, suitability for
fairness constraints, detection performance, etc.). The objective is to present both a theoretical
perspective on why these algorithms are suitable and experimental evidence of their performance.

Work Plan - Semester 1

During the 1st semester, the student must become familiar with the field and the state of the art. For this
purpose, by the end of the first semester, e.g., 31st January, the student must have completed a thesis
proposal document, with the motivation, a review of the state of the art, and a detailed work plan for the
second semester. The student must also present the main findings from this review internally.
Work plan proposal:
• Review the state of the art for ML algorithms designed for tabular data by the 20th of
November
• Selection, theoretical analysis, and comparison of a set of algorithms better suited to
fraud detection by the 5th of December
• Proposal of experimental setups for the comparison and evaluation of algorithms by the 15th
of December
• Proposal document and presentation by the 31st of January

Work Plan - Semester 2

During the 2nd semester, the student will empirically explore the application of several approaches
for structured data in the context of fraud detection. Thus, the work plan for the 2nd semester could be:
• Algorithm implementation/setup and data preparation by the 20th of March
• Benchmarking, evaluation, and iteration by the 22nd of May
• Documentation, final report, and presentation by the 21st of July

Conditions

The software and hardware required for the internship will be provided by Feedzai. Expect to work on
Linux systems and to use big data tools, e.g., PySpark. This is a paid internship, 1000€ gross (before
taxes and social security) per month (full time).

Remarks

Candidates must have a good knowledge of Python.

Supervisor

Susana Dias Brandão
susana.brandao@feedzai.com