Proposals with students

Generated on 2025-11-20 19:39:16 (Europe/Lisbon).

Internship Title

Anonymizing Private Information: From Noise to Data

Areas of Specialization

Intelligent Systems

Software Engineering

Internship Location

CISUC

Background

People generate an immense amount of data nowadays. This enables the development of intelligent systems and services that can improve the quality and safety of day-to-day life. Usually, the more information we have, the better the services we can provide to the user. However, some of this information is private and must be treated with care, so preserving its privacy is extremely important. In an ideal scenario, it should be possible to use all the available information without compromising the privacy of users. Over the years, many researchers have investigated how to achieve this without degrading the quality of the services provided.
With the rise of Deep Learning and its success on a variety of problems, many approaches to privacy preservation have been proposed in the literature. One possible way is to develop a model that learns the underlying representation of the data and is then used to generate new examples that follow the same distribution but do not reveal any private information about the original dataset [1,2,3,4].

1 - Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1-127.
2 - Phan, N., Wang, Y., Wu, X., & Dou, D. (2016, February). Differential privacy preservation for deep auto-encoders: an application of human behavior prediction. In Thirtieth AAAI Conference on Artificial Intelligence.
3 - Malekzadeh, M., Clegg, R. G., & Haddadi, H. (2018, April). Replacement autoencoder: A privacy-preserving algorithm for sensory data analysis. In 2018 IEEE/ACM Third International Conference on Internet-of-Things Design and Implementation (IoTDI) (pp. 165-176). IEEE.
4 - Xu, L., & Veeramachaneni, K. (2018). Synthesizing tabular data using generative adversarial networks. arXiv preprint arXiv:1811.11264.

Objective

The main goal of this dissertation is to design, develop and implement a framework that, given a dataset of real data, learns the underlying distribution of that data. The learned model will then be used to generate new, artificial data for training Machine Learning models, preserving the classification and predictive value of the real data without revealing private information. Concretely, we will focus on approaches based on Generative Adversarial Networks (GANs) [4] or autoencoders [3].
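
The workflow described in this objective (learn a distribution from real data, sample artificial data, train a model on the artificial data, check that it still works on real data) can be illustrated with a minimal sketch. It is only an illustration under strong assumptions: a per-class, per-feature Gaussian stands in for the GAN or autoencoder the proposal targets, the "real" data is a toy dataset generated on the spot, and all function names are hypothetical. Sampling from a fitted distribution like this does not by itself provide any privacy guarantee.

```python
import random
import statistics


def fit_gaussian_per_class(rows, labels):
    """Stand-in generative model: per-class, per-feature mean and std."""
    params = {}
    for cls in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        columns = list(zip(*cls_rows))
        params[cls] = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    return params


def sample_synthetic(params, n_per_class, rng):
    """Draw new rows from the fitted distributions; no real row is copied."""
    rows, labels = [], []
    for cls, feats in params.items():
        for _ in range(n_per_class):
            rows.append([rng.gauss(mu, sigma) for mu, sigma in feats])
            labels.append(cls)
    return rows, labels


def train_centroids(rows, labels):
    """Nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for cls in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [statistics.mean(c) for c in zip(*cls_rows)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid is closest to the row."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda cls: dist2(centroids[cls]))


def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


rng = random.Random(0)

# Toy "real" dataset (an assumption for illustration): two classes, two features.
real_rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
real_rows += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(200)]
real_labels = [0] * 200 + [1] * 200

# 1) Learn the distribution, 2) generate artificial data,
# 3) train on the artificial data, 4) evaluate on the real data.
params = fit_gaussian_per_class(real_rows, real_labels)
synth_rows, synth_labels = sample_synthetic(params, 200, rng)
model = train_centroids(synth_rows, synth_labels)
tstr_accuracy = accuracy(model, real_rows, real_labels)
print(f"Accuracy on real data after training on synthetic data: {tstr_accuracy:.3f}")
```

Training on synthetic data and testing on real data is one simple way to check whether the generated data keeps the predictive value of the original. Assessing how much private information the generator leaks would require a separate analysis that this sketch does not attempt.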

Work Plan - Semester 1

1 - Literature review
2 - Definition of the techniques and technologies that will be used.
3 - System Architecture Design
4 - Implementation of the first version of the system
5 - Writing of the intermediate report

Work Plan - Semester 2

6 - Analysis of the first prototype and the obtained results
7 - Implementation of a second version of the prototype
8 - Validation and Refinement
9 - Writing of a scientific article with the main results
10 - Writing of the thesis

Conditions

The student will work in the context of the interdisciplinary project CAMELOT. This project is led by the company Feedzai and involves Carnegie Mellon University, Universidade de Coimbra, Faculdade de Ciências da Universidade de Lisboa, and Instituto Superior Técnico.
The work will be conducted in the Evolutionary and Complex Systems group and the Systems and Software Engineering group at CISUC.
The student may be awarded a scholarship (Bolsa de Investigação para Licenciado) for at least 6 months, renewable for an equal period by agreement between the advisor and the intern. The scholarship will follow the Fundação para a Ciência e Tecnologia (FCT) monthly stipend guidelines.

Advisors

Nuno Lourenço / Bruno Cabral / João Paulo Fernandes
naml@dei.uc.pt