Internship Title
A Confidentiality Preserving Streaming Data Classification Platform
Areas of Specialization
Software Engineering
Information Systems
Internship Location
DEI (CISUC)
Background
People generate an immense amount of data nowadays. This enables the development of intelligent systems and services that can improve the quality and safety of day-to-day life. Usually, the more information we have, the better the services we can provide to the user. However, some of this information is private and must be treated with care. Ideally, it should be possible to use all the available information without compromising the privacy of users. Over the years, many researchers have studied how to achieve this without degrading the quality of the services provided.
With the rise of Deep Learning and its success across a variety of problems, many approaches to privacy preservation have been proposed in the literature. One possible way is to train a model that learns the underlying representation of the data and then use it to generate new examples that follow the same distribution but do not reveal any private information about the original dataset [1,2,3,4].
In the “real world”, however, models alone do not suffice. They need a supporting infrastructure: a full stack of software and processes or, simply put, a well-engineered system. Thus, this internship is not about the models, but about developing an end-to-end system for taking such an approach into production.
[1] Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1), 1-127.
[2] Phan, N., Wang, Y., Wu, X., & Dou, D. (2016, February). Differential privacy preservation for deep auto-encoders: An application of human behavior prediction. In Thirtieth AAAI Conference on Artificial Intelligence.
[3] Malekzadeh, M., Clegg, R. G., & Haddadi, H. (2018, April). Replacement autoencoder: A privacy-preserving algorithm for sensory data analysis. In 2018 IEEE/ACM Third International Conference on Internet-of-Things Design and Implementation (IoTDI) (pp. 165-176). IEEE.
[4] Xu, L., & Veeramachaneni, K. (2018). Synthesizing tabular data using generative adversarial networks. arXiv preprint arXiv:1811.11264.
Objective
The main goal of this dissertation is to design, develop, and implement a system that can both train AI models for streaming data classification and take them into production without undermining data confidentiality. Such a system will have strong quality attribute requirements: interoperability, since it must integrate data from multiple sources; security and privacy; and performance, since response latency is of the utmost importance for the project goals. By the end of the internship, the student will have followed a well-defined Software Engineering process, identified architectural drivers, designed and evaluated a software architecture, and implemented and tested the system in the context of the CAMELOT project.
Work Plan - Semester 1
Step 1 - Review of the literature.
Step 2 - Architectural Drivers elicitation.
Step 3 - System Architecture design and evaluation.
Step 4 - Definition of the Backlog.
Step 5 - Writing of the intermediate report.
Work Plan - Semester 2
Step 6 - Sprint-oriented implementation, validation, and refinement of the system.
Step 7 - Writing of the final report.
Conditions
The student will work in the context of the interdisciplinary project CAMELOT. This project is led by Feedzai and involves Carnegie Mellon University, Universidade de Coimbra, Faculdade de Ciências da Universidade de Lisboa, and Instituto Superior Técnico.
The work will be conducted in the Evolutionary and Complex Systems group and the Systems and Software Engineering group at CISUC. The eligible student will have all the necessary computational platforms, tools, and devices at their disposal.
The student may be awarded a scholarship for at least 6 months, renewable for an equal period by agreement between the advisor and the intern. The scholarship will follow the Fundação para a Ciência e a Tecnologia (FCT) monthly stipend guidelines (Bolsa de Investigação para Licenciado).
Advisor
Bruno Cabral
bcabral@dei.uc.pt