Submitted Proposals

DEI - FCTUC
Generated on 2024-05-03 16:48:10 (Europe/Lisbon).

Internship Title

Fragment Based Drug Design with Deep Neural Networks

Areas of Specialization

Intelligent Systems

Internship Location

Laboratory of Artificial Neural Networks (LARN)

Background

The term de novo drug design refers to a collection of techniques for producing novel chemical compounds with desired pharmaceutical properties, either through in vitro synthesis or with computer aid. Molecule generation is a challenging open problem in cheminformatics. Current deep generative approaches to this problem fall into two broad categories, which differ in how molecules are represented. One approach encodes molecular graphs as strings of text (e.g., SMILES) and learns a corresponding character-based language model. The other, more expressive, approach operates directly on the molecular graph.
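
To make the string-based representation concrete, here is a minimal Python sketch of how a character-based language model sees a molecule. This illustrates the general idea and is not the method of this proposal. A SMILES string is split into single characters and turned into (prefix, next character) training pairs. The start/end markers `^` and `$` and the helper names `char_vocab` and `next_char_pairs` are hypothetical. A naive per-character split turns the two-letter atom `Cl` into `C` and `l`, which shows why such models must maintain validity at every step.

```python
# Minimal sketch: a character-based language model treats a SMILES string
# as a plain sequence of characters. Markers and helper names are hypothetical.

START, END = "^", "$"  # hypothetical start/end-of-sequence markers


def char_vocab(smiles_list):
    """Return the sorted set of characters used, plus the two markers."""
    chars = {START, END}
    for s in smiles_list:
        chars.update(s)
    return sorted(chars)


def next_char_pairs(smiles):
    """Return (prefix, next_char) training pairs for one SMILES string."""
    seq = START + smiles + END
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]


if __name__ == "__main__":
    data = ["CCO", "c1ccccc1", "CCCl"]  # ethanol, benzene, chloroethane
    print(char_vocab(data))  # note: 'Cl' ends up split into 'C' and 'l'
    for prefix, nxt in next_char_pairs("CCO"):
        print(repr(prefix), "->", repr(nxt))
```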

In this proposal, we plan to address two limitations of the string-based approach: the generation of invalid molecules and of duplicate molecules. To improve validity rates, we plan to develop a language model over small molecular substructures called fragments, loosely inspired by the well-known paradigm of Fragment-Based Drug Design. In other words, deep learning (DL) networks generate molecules fragment by fragment instead of atom by atom.
Since fragments are chemically sound, the approach needs to ensure validity only when a new fragment is connected. In contrast, character-based DL approaches must maintain validity after each new atom is added. To improve uniqueness rates, we propose a frequency-based masking strategy that helps generate molecules containing infrequent fragments.
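
The masking idea can be sketched in Python under two assumptions. First, molecules are already decomposed into fragment tokens, written here as SMILES with a `[*]` attachment point. Second, "masking" means excluding the k most frequent fragments from the next-token distribution before sampling. The proposal does not fix the exact rule. The names `fragment_frequencies`, `masked_distribution` and `sample_fragment`, and the parameter `k`, are hypothetical, and the toy corpus is made up for illustration.

```python
import math
import random
from collections import Counter


def fragment_frequencies(corpus):
    """Count how often each fragment token occurs in a fragment-tokenized corpus."""
    return Counter(frag for molecule in corpus for frag in molecule)


def masked_distribution(logits, vocab, freqs, k):
    """Softmax over the fragment vocabulary after masking the k most frequent fragments.

    Masked fragments get probability 0; the remaining ones are renormalized.
    Assumes k is smaller than the vocabulary size.
    """
    masked = {frag for frag, _ in freqs.most_common(k)}
    scores = [-math.inf if frag in masked else z for frag, z in zip(vocab, logits)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample_fragment(vocab, probs, rng):
    """Draw the next fragment token from the (masked) distribution."""
    return rng.choices(vocab, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Toy corpus: each molecule is a list of fragment tokens ([*] = attachment point).
    corpus = [
        ["[*]C", "[*]c1ccccc1", "[*]O"],
        ["[*]C", "[*]N", "[*]O"],
        ["[*]C", "[*]c1ccccc1", "[*]Cl"],
    ]
    freqs = fragment_frequencies(corpus)
    vocab = list(freqs)
    logits = [2.0, 1.0, 1.0, 0.0, 0.0]  # stand-in for the generator's output
    probs = masked_distribution(logits, vocab, freqs, k=1)
    print(dict(zip(vocab, (round(p, 3) for p in probs))))
    print(sample_fragment(vocab, probs, random.Random(0)))
```

A softer variant would subtract a penalty proportional to each fragment's log-frequency from its logit instead of masking it outright. Which variant works better would have to be established experimentally.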

Objective

The main objective of this proposal is to develop a deep generative model based on Recurrent Neural Networks (RNNs) that can learn to generate molecules biased towards desired chemical properties. The specific goals are:

i) Construct a data set for the generator model;
ii) Perform data pre-processing, normalization and scaling;
iii) Select appropriate algorithms, such as Molvec, to convert the data set into fragments;
iv) Select appropriate machine learning algorithms for building the generative model;
v) Perform sampling and model evaluation, and validate the overall model with real data sets;
vi) Integrate the implemented components into a reusable platform.
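
Goal ii) leaves the scaling method open. One possibility is sketched below in Python, using only the standard library. It shows two common choices for numeric molecular properties, such as target values used to bias generation: min-max scaling to [0, 1] and z-score standardization. The function names are hypothetical, and the values in the example are toy numbers, not real data.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Linearly rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    prop = [1.2, 3.5, 0.4, 2.9]  # toy property values, not real data
    print(min_max_scale(prop))
    print(z_score(prop))
```

In practice, compute the scaling statistics (min/max or mean/standard deviation) on the training split only. Then reuse them for validation and test data, so that no information leaks from the evaluation sets.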

Work Plan - Semester 1

1- Overview of drug discovery, including target identification, lead discovery, and lead optimization;
2- Overview of machine learning techniques, namely Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs) and fragment-based algorithms;
3- Propose an initial deep generative model combined with a fragment generator algorithm, and prepare the first case study;
4- Prepare the intermediate report.
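
As a compact reference for the recurrent units listed in item 2, the Python sketch below implements a single GRU step in one common formulation:

- update gate: z = sigmoid(W_z x + U_z h)
- reset gate: r = sigmoid(W_r x + U_r h)
- candidate state: n = tanh(W_n x + U_n (r * h))
- new state: h' = z * h + (1 - z) * n

Biases are omitted and the weights are random, so this illustrates the recurrence rather than a trained model. Libraries such as PyTorch place the reset gate slightly differently, applying it after the hidden-to-hidden product. In a fragment-based generator, each input x would encode one fragment token; the one-hot vocabulary in the example is hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """One GRU cell without bias terms (omitted for brevity)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        # Input-to-hidden (W) and hidden-to-hidden (U) weights per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def _pre(self, gate, x, h):
        """Return W_gate x + U_gate h."""
        return [a + b for a, b in zip(matvec(self.W[gate], x), matvec(self.U[gate], h))]

    def step(self, x, h):
        z = [sigmoid(s) for s in self._pre("z", x, h)]
        r = [sigmoid(s) for s in self._pre("r", x, h)]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied to previous state
        n = [math.tanh(s) for s in self._pre("n", x, rh)]
        # Interpolate between the previous state h and the candidate n
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


if __name__ == "__main__":
    vocab = ["[*]C", "[*]O", "[*]N"]  # toy fragment vocabulary
    one_hot = {f: [1.0 if j == i else 0.0 for j in range(len(vocab))]
               for i, f in enumerate(vocab)}
    cell = GRUCell(input_size=len(vocab), hidden_size=4)
    h = [0.0] * 4
    for frag in ["[*]C", "[*]N", "[*]O"]:
        h = cell.step(one_hot[frag], h)
    print([round(v, 4) for v in h])
```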

Work Plan - Semester 2

1- Select and pre-process a collection of large datasets for experiments;
2- Study and select machine learning (ML) algorithms for building a generative model that creates valid, novel drug molecules;
3- Study and select fragment selection algorithms;
4- Analyse experimental results, e.g., study parameter values and compare performance on the reduced datasets against previous results;
5- Prepare a research paper and the final version of the thesis.
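
For analysing results, validity, uniqueness and novelty are common metrics for molecular generators. The Python sketch below assumes one frequently used set of definitions; other benchmarks define them slightly differently:

- validity = valid / generated
- uniqueness = distinct valid / valid
- novelty = distinct valid not in the training set / distinct valid

The `is_valid` argument is a placeholder. With RDKit, one would typically use `Chem.MolFromSmiles(s) is not None`. `toy_is_valid` is a hypothetical stand-in that only checks balanced parentheses, so the example runs without extra dependencies. Strings are compared verbatim, so they should be canonicalized first (e.g., with RDKit's `Chem.MolToSmiles`); otherwise, different spellings of the same molecule count as distinct.

```python
def generation_metrics(generated, training, is_valid):
    """Validity, uniqueness and novelty for a batch of generated SMILES strings.

    Strings are compared verbatim, so they should be canonicalized beforehand.
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles):
    """Hypothetical stand-in for a real chemistry parser: only checks that
    the string is non-empty and its parentheses are balanced."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


if __name__ == "__main__":
    generated = ["CCO", "CCO", "CC(C", "c1ccccc1", "CCN"]  # toy output
    training = ["CCO", "CCC"]  # toy training set
    print(generation_metrics(generated, training, toy_is_valid))
```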

Conditions

This work will be carried out in the Laboratory of Artificial Neural Networks (LARN) of CISUC, with regular supervision and feedback from the supervisors.
Familiarity with machine learning and data mining algorithms and software tools is essential. Participating students will acquire valuable knowledge and experience in model building and data science by mining massive datasets. These skills are currently in high demand among technology employers because of their relevance to a wide range of applications.

Supervisors

Maryam Abbasi, Joel P. Arrais, Bernardete Ribeiro
maryam@dei.uc.pt