Proposals without an assigned student

Generated on 2021-11-28 07:42:28 (Europe/Lisbon).

Internship Title

Autoencoders for feature representation applied to Drug Discovery

Areas of Specialization

Intelligent Systems

Software Engineering

Internship Location

Laboratory of Artificial Neural Networks (LARN)


The traditional drug discovery process may take up to 15 years from conceptualization to market, at a cost that can reach one billion dollars, with no guarantee that the identified compounds will ever reach the market. The first three stages alone, namely target identification, lead discovery, and lead optimization, may take 4 to 7 years. This is mainly a data-driven process: it starts from all human proteins that can be used as putative targets, continues with the millions of lead compounds that need to be evaluated and, for the final candidates, ends with a massive number of structural variants to be tested.
As part of the D4 project, this work aims to optimize and accelerate target identification. By reducing the dimensionality of the data, we want to retain only the relevant information and so shorten the initial stages of drug development. There is evidence that the architectures under study can identify the key attributes that allow these goals to be achieved.


The objective of this proposal is to develop an algorithm that improves the drug discovery pipeline. This algorithm can be implemented with Autoencoders, but also with Restricted Boltzmann Machines and Deep Belief Networks. Other alternatives may be explored during the project, but these are the main candidates.
The main goal is to develop a model that applies deep learning techniques to drug discovery:
(i) Construct the data set for the predictive model;
(ii) Perform data pre-processing, normalization and scaling;
(iii) Select appropriate ML algorithms for building the predictive model.
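
As an illustration of steps (ii) and (iii), the sketch below scales a feature matrix to [0, 1] and trains a small autoencoder whose hidden layer serves as the reduced representation. It is only a minimal sketch under stated assumptions, not the project's method: the data is a synthetic placeholder rather than real molecular or protein descriptors, and the function names, network size, learning rate and epoch count are all hypothetical choices made for the example.

```python
import numpy as np


def min_max_scale(X):
    """Rescale every feature column to [0, 1]; constant columns become 0."""
    low = X.min(axis=0)
    span = X.max(axis=0) - low
    span[span == 0] = 1.0  # avoid division by zero for constant features
    return (X - low) / span


def train_autoencoder(X, n_hidden, epochs=2000, lr=0.01, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    by full-batch gradient descent on the squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    params = {
        "W1": rng.normal(0.0, 0.1, (d, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, d)), "b2": np.zeros(d),
    }
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ params["W1"] + params["b1"])  # encoded features
        err = H @ params["W2"] + params["b2"] - X     # reconstruction minus input
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        g_out = 2.0 * err / n                              # gradient at the decoder output
        g_pre = (g_out @ params["W2"].T) * (1.0 - H ** 2)  # back through tanh
        grads = {"W1": X.T @ g_pre, "b1": g_pre.sum(axis=0),
                 "W2": H.T @ g_out, "b2": g_out.sum(axis=0)}
        for name in params:
            params[name] -= lr * grads[name]
    return params, losses


def encode(X, params):
    """Map scaled inputs to the low-dimensional hidden representation."""
    return np.tanh(X @ params["W1"] + params["b1"])


def make_placeholder_data(n_samples=200, n_features=20, n_latent=3, seed=1):
    """Synthetic stand-in for a descriptor matrix (NOT real drug data):
    a few latent factors mixed linearly into many features, plus noise."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, n_latent))
    mixing = rng.normal(size=(n_latent, n_features))
    return latent @ mixing + 0.05 * rng.normal(size=(n_samples, n_features))


if __name__ == "__main__":
    X = min_max_scale(make_placeholder_data())
    params, losses = train_autoencoder(X, n_hidden=3)
    Z = encode(X, params)
    print("feature matrix", X.shape, "-> reduced representation", Z.shape)
    print("reconstruction error: first epoch", losses[0], "last epoch", losses[-1])
```

In practice a deep learning framework and a held-out validation set would likely replace the hand-written gradient step; the point here is only that the encoded matrix Z, with far fewer columns than X, is what a downstream predictive model would consume.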

Work Plan - Semester 1

First semester:
•Overview of drug discovery, including target identification, lead discovery, and lead optimization;
•Overview of deep learning techniques, starting with Autoencoders and then Restricted Boltzmann Machines (RBMs), Deep Belief Networks (DBNs) and Convolutional Neural Networks (CNNs);
•Propose an initial predictive model;
•Prepare the intermediate report.

Work Plan - Semester 2

•Select and preprocess a collection of large datasets for the experiments;
•Study and select machine learning (ML) algorithms and feature selection (FS) algorithms for building the predictive model;
•Analyze the experimental results, e.g., study parameter values and compare performance on the reduced datasets against previous results;
•Prepare a research paper and the final version of the thesis.


This work will be carried out in the Laboratory of Artificial Neural Networks (LARN) of CISUC, with regular supervision and feedback from the supervisors.
Familiarity with machine learning and data mining algorithms and software tools is essential. Participating students will acquire valuable knowledge and experience in model building and data science by mining massive datasets, skills that are currently in high demand among technology employers because of their relevance to many applications.


This proposal is supported by the funded FCT project D4 (Deep Drug Discovery and Deployment).
A scholarship of 745 euros, with a duration of 3 months, will be available for this proposal. Interested students are invited to contact the supervisors.
Logistics @ Laboratory of Artificial Neural Networks (LARN)
Joel P. Arrais
Carlos Pereira


Joel P. Arrais