Internship Title
Multi-Objective Drug Design with Recurrent Neural Networks
Areas of Specialization
Intelligent Systems
Internship Location
Laboratory of Artificial Neural Networks (LARN)
Background
The drug design process is lengthy and demands huge investment. It can be optimized through de novo drug design, a computer-aided technique that builds novel chemical molecules with desired pharmacological properties from scratch. However, the resulting molecules are often not feasible to synthesize in the laboratory. One reason is that multiple pharmaceutically relevant parameters are not correctly optimized, since only the main functional objective is taken into account. This proposal uses state-of-the-art Deep Learning methods to develop a computational model that generates novel molecules that not only have the predicted activity against a target but also satisfy multiple pharmacological objectives.
In particular, the goal is to design a multiobjective, evolutionary drug design approach. A recurrent neural network will be used to generate molecules; the best of these are selected and used to retrain the network through transfer learning. Transfer learning allows knowledge to be transferred between tasks and has proven to be an efficient way of improving the accuracy of models on narrowly defined tasks. The best of the generated molecules are selected by a novel application of a non-dominated sorting algorithm, a proven method of multiobjective optimization. The purpose is to optimize different criteria of drug candidates that originate from various drug-likeness features. The innovation of this proposal lies in combining a deep generative method with molecular selection (non-dominated sorting).
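The selection step described above can be illustrated with a minimal sketch of non-dominated sorting. It assumes every objective is to be maximized and uses a simple quadratic-time peeling of Pareto fronts rather than any particular published variant; the function names and the two example objectives (activity and drug-likeness scores) are hypothetical and not taken from the proposal.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(scores: List[Sequence[float]]) -> List[List[int]]:
    """Split candidate indices into Pareto fronts; front 0 holds the non-dominated candidates."""
    remaining = set(range(len(scores)))
    fronts: List[List[int]] = []
    while remaining:
        # A candidate belongs to the current front if nothing still remaining dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical (activity, drug-likeness) scores for five generated molecules.
scores = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
fronts = non_dominated_sort(scores)
print(fronts)  # [[0, 1, 2], [3], [4]]
```

The molecules in the first front represent different trade-offs between the objectives, none of which is strictly worse than another, which is why they are natural candidates for retraining the generator.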
Objective
The main objective of this proposal is to develop a deep generative method, a recurrent neural network in conjunction with a non-dominated sorting algorithm, to create a cycle for multiobjective de novo drug design. Initially, a long short-term memory (LSTM) recurrent neural network is trained so that it can generate new molecules with properties and diversity similar to those of the original training data. A non-dominated sorting algorithm is then applied to select the best of the generated molecules. The main goals of this proposal are:
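The generate, select and retrain cycle outlined above could be organized roughly as in the following sketch. It is only a skeleton under stated assumptions: molecules are represented as strings, and the four callables (`generate`, `evaluate`, `select`, `fine_tune`) are hypothetical placeholders for the LSTM sampler, the objective predictors, the non-dominated sorting step and the transfer learning step, none of which the proposal specifies in detail.

```python
from typing import Callable, List, Tuple

Molecule = str               # assumption: molecules are handled as strings
Scores = Tuple[float, ...]   # one value per objective


def run_design_cycle(
    generate: Callable[[int], List[Molecule]],
    evaluate: Callable[[Molecule], Scores],
    select: Callable[[List[Molecule], List[Scores]], List[Molecule]],
    fine_tune: Callable[[List[Molecule]], None],
    n_rounds: int,
    n_samples: int,
) -> List[Molecule]:
    """Run the cycle n_rounds times and return the molecules selected in the last round."""
    selected: List[Molecule] = []
    for _ in range(n_rounds):
        candidates = generate(n_samples)              # sample from the current generator
        scores = [evaluate(m) for m in candidates]    # score every objective
        selected = select(candidates, scores)         # keep the best trade-offs
        fine_tune(selected)                           # retrain the generator on them
    return selected
```

Passing the four steps as callables keeps the loop independent of the concrete model and objectives, so each component can be replaced or tested in isolation.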
1. Construct a data set for the generative model;
2. Perform data pre-processing, normalization and scaling;
3. Select appropriate ML algorithms for building the generative model;
4. Select appropriate algorithms for implementing the multiobjective model;
5. Perform sampling and model evaluation, and validate the overall model with real data sets;
6. Integrate the implemented components into a reusable platform.
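As an illustration of the pre-processing goal in the list above, the sketch below tokenizes molecules and encodes them as fixed-length integer sequences suitable for a character-level recurrent network. It assumes molecules are stored as SMILES strings, which the proposal does not state; the special tokens `<pad>`, `<start>` and `<end>` and all function names are likewise illustrative choices.

```python
import re
from typing import Dict, List

# Bracket atoms, the two-letter halogens Br and Cl, and two-digit ring closures (%NN)
# are kept as single tokens; every other character is its own token.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into model tokens."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocabulary(corpus: List[str]) -> Dict[str, int]:
    """Map every token in the corpus to an integer id, after three reserved ids."""
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2}
    for token in sorted({t for s in corpus for t in tokenize(s)}):
        vocab[token] = len(vocab)
    return vocab


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Encode one SMILES string as start + tokens + end, right-padded to max_len."""
    ids = [vocab["<start>"]] + [vocab[t] for t in tokenize(smiles)] + [vocab["<end>"]]
    if len(ids) > max_len:
        raise ValueError("sequence longer than max_len")
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


vocab = build_vocabulary(["CCO", "CCl"])
print(tokenize("CC(=O)Cl"))     # ['C', 'C', '(', '=', 'O', ')', 'Cl']
print(encode("CCl", vocab, 6))  # [1, 3, 4, 2, 0, 0]
```

Treating `Cl` and `Br` as single tokens avoids the model having to learn that a lone `l` or `r` is never valid on its own.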
Work Plan - Semester 1
• Overview of drug discovery, including target identification, lead discovery, and lead optimization;
• Overview of machine learning techniques, namely Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU) and non-dominated sorting methods;
• Propose an initial deep generative model combined with a transfer learning workflow and prepare the first case study;
• Prepare the intermediate report.
Work Plan - Semester 2
• Select and pre-process a collection of large datasets for experiments;
• Study and select Machine Learning (ML) algorithms for building the generative model to create valid novel drug molecules;
• Study and select feature selection algorithms for generating chemically feasible drug molecules;
• Analyse experimental results: e.g., study parameter values; compare performance of the reduced datasets vs. previous results, etc.;
• Apply a non-dominated sorting algorithm to select the best of the generated molecules;
• Prepare a research paper and the final version of the thesis.
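For the analysis of experimental results in the plan above, one possible starting point is to report the validity, uniqueness and novelty of the sampled molecules, three metrics commonly used for molecular generators. The sketch below is an assumption about how this could be done, not a procedure prescribed by the proposal: `is_valid` is a hypothetical callable (in practice it would wrap a cheminformatics toolkit), and molecules are compared as plain strings, which presumes they are written in a canonical form.

```python
from typing import Callable, Dict, Iterable, List


def generation_metrics(
    generated: List[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness and novelty for a batch of generated molecules."""
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        # Fraction of samples that pass the validity check.
        "validity": len(valid) / len(generated) if generated else 0.0,
        # Fraction of valid samples that are distinct.
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # Fraction of distinct valid samples not present in the training data.
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }
```

A call might look like `generation_metrics(samples, training_smiles, is_valid=my_checker)`, where `my_checker` is whatever validity test the project adopts.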
Conditions
This work will be carried out in the Laboratory of Artificial Neural Networks (LARN) of CISUC, with regular supervision and feedback from the supervisors. Familiarity with machine learning and data mining algorithms and software tools is essential. Participating students will acquire valuable knowledge and experience in model building and data science by mining massive datasets, skills that are currently in high demand among technology employers because of their relevance to many applications.
Supervisors
Maryam Abbasi, Joel P. Arrais
maryam@dei.uc.pt