Internship Title
Structure-Based De Novo Molecular Design for Drug Discovery
Internship Location
LARN
Background
Traditional drug discovery methods are often time-consuming and resource-intensive. Generative molecular design has recently shown promise in expediting this process by using computational methods to explore vast chemical spaces. However, most generative models rely primarily on small-molecule information, especially in the form of text-based representations, and neglect the critical role that ligand/protein structures and interaction surfaces play in drug design. There is a pressing need to integrate both target protein and drug structure information into generative molecular design models in order to improve the prediction of on-target binding affinity. By incorporating the three-dimensional structures of the target protein, the ligand, or both, these models can generate molecules that are more likely to bind effectively and exhibit the desired biological activity. This integration can improve the efficiency and accuracy of drug discovery and lead to more effective therapeutics, with potential direct applications in other fields such as the phytochemical industry and agriculture. This project aims to address this gap by exploring and categorizing recent approaches that incorporate protein structures into de novo molecule optimization, and by developing generative models that incorporate ligand/protein structure information to maximize the predicted on-target binding affinity, or any other property of interest, of the generated molecules.
Objectives
• Select and pre-process a collection of large datasets such as the Protein Data Bank (PDB) and ChEMBL.
• Study and select Machine Learning (ML) algorithms for building the model.
• Develop generative models that incorporate drug structure information, target structure information, or both.
• Train the models using the collected datasets.
• Evaluate the developed models using metrics such as validity, novelty, and predicted binding affinity of the generated molecules.
• Compare the developed models with traditional generative models that do not incorporate drug/target structure information.
• Prepare a research paper and the final version of the thesis.
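The validity and novelty metrics named in the objectives can be illustrated with a minimal sketch. It assumes that generated molecules are text strings already in a canonical form, so that string equality means molecular identity. The names `toy_is_valid` and `generation_metrics` are hypothetical, and the validity check is only a placeholder (balanced parentheses); a real evaluation would parse each string with a cheminformatics toolkit. Uniqueness is included because it is commonly reported alongside these metrics; predicted binding affinity needs a docking or scoring model and is outside this sketch.

```python
from typing import Callable, Dict, Iterable


def toy_is_valid(smiles: str) -> bool:
    """Placeholder validity check (hypothetical, NOT chemically meaningful).

    Accepts any non-empty string with balanced parentheses. A real pipeline
    would replace this with a proper molecule parser.
    """
    if not smiles:
        return False
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket without a matching opening one
                return False
    return depth == 0


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool] = toy_is_valid,
) -> Dict[str, float]:
    """Compute validity, uniqueness and novelty of generated molecules.

    validity   = valid molecules / all generated molecules
    uniqueness = distinct valid molecules / valid molecules
    novelty    = distinct valid molecules absent from training / distinct valid
    Assumes strings are canonical, so equality implies the same molecule.
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    sample = ["CCO", "CCO", "C(C", "c1ccccc1", "CC(=O)O"]
    print(generation_metrics(sample, training_set=["CCO"]))
```

Passing the validity check as a parameter keeps the metric definitions independent of the molecular representation, so the same function could be reused for models with and without structure information.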
Work Plan - Semester 1
• Overview of drug discovery, including target identification, lead discovery, and lead optimization.
• Overview of machine learning techniques, namely Recurrent Neural Networks (RNNs), Reinforcement Learning, and Graph Neural Networks (GNNs).
• Categorization of recent approaches into distribution learning or goal-directed optimization.
• Analysis of whether these approaches are protein structure-explicit or implicit with respect to the generative model.
• Propose an initial deep learning model workflow and prepare the first case study.
• Prepare the intermediate report.
Work Plan - Semester 2
• Select and pre-process a collection of large datasets such as the Protein Data Bank (PDB) and ChEMBL.
• Study and select Machine Learning (ML) algorithms for building the model.
• Develop generative models that incorporate protein structure information.
• Train the models using the collected datasets.
• Evaluate the developed models using metrics such as validity, novelty, and predicted binding affinity of the generated molecules.
• Compare the developed models with traditional generative models that do not incorporate protein structure information.
• Prepare a research paper and the final version of the thesis.
Conditions
This work will be carried out in the Laboratory of Neural Networks (LARN) of CISUC, with regular supervision and feedback from the supervisors.
Familiarity with machine learning and data mining algorithms and software tools is essential. Participating students will acquire valuable knowledge and experience in model building and data science by mining massive datasets, skills that are currently in high demand among technology employers because of their relevance to a wide range of applications.
Remarks
Supervisors:
Joel P. Arrais (jpa@dei.uc.pt); Maryam Abbasi (Maryam.abbasi@ipc.pt)
Supervisor
Joel P. Arrais (jpa@dei.uc.pt); Maryam Abbasi (Maryam.abbasi@ipc.pt)