Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks

Overview

Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks

Requirements

  • python 0.10+
  • rdkit 2020.03.3.0
  • biopython 1.78
  • openbabel 2.4.1
  • numpy 1.19.2
  • scipy 1.5.2
  • torchvision 0.7.0

Conda enviroment is highly recommended for this implementation

Data Preparation for classification models

Data preperation requires the ligand and protein to be in a mol format readable by rdkit .mol, .mol2, and .pdb are readily handled by rdkit .sdf is easily handled with openbabel conversion, made convenient with the pybel wrapper

Both files can then be fed into extractM2.py where the cropping window can be adjusted on line 29 The extract method will operates best if the initial protein file is in pdbqt format. For easy model integration it is best to store the m2 protein window produced by the extract script along with the original protein ex: pickle.dump((m1,m2), file)

Once cropped complexes are stored, their numpy featurization files can be created. Files for the different models are labeled in the Data_Prep directory

The scripts are designed to use keys that reference the cropped and stored pairs from the previous step. Users will need to alter scripts to include their desired directories, as well as key traversal. Once these changes have been made, the scripts can be called with

python -W ignore gnn[f/p]_data_prep.py

Data Preparation for regression models

The data needs to be in mol format as similar to classification models. We have provided some sample mol files representing protein and ligand. Here the protein is cropped at 8Å window using the extract script as mentioned previously.

The cropped protein-ligand can be used to create features in numpy format. Sample training and test keys along with the corresponding pIC50 and experimental-binding-affinity (EBA) labels are provided in keys folder. All the files are saved in pickle format with train and test keys as list and the label files as disctionary with key corresponding to the train/test key and value corresponding to the label. The prepare_eba_data.py and prepapre_pic50_data.py uses the cropped protein-ligand mol files to create the correspnding features for the model and save them in compressed numpy file format in the corresponding numpy directory.

These scripts can be called as:

python repare_pic50_data.py <path to pkl-mol directory> <path to save numpy features>
python repare_eba_data.py <path to pkl-mol directory> <path to save numpy features>

Training

Below is an example of the training command. Additional options can be added to the argument parser here (learning rate, layer amount and dimension, etc). Defaults are in place for undeclared parameters including a save directory.

Classfication models

python -W ignore -u train.py --dropout_rate=0.3 --epoch=500 --ngpu=1 --batch_size=32 --num_workers=0  --train_keys=<your_training_keys.pkl>  --test_keys=<your_test_keys.pkl>

Regression models

python -W ignore -u train.py --dropout_rate=0.3 --epoch=500 --ngpu=1 --batch_size=1 --num_workers=0 --data_dir=<path to feature-numpy folder> --train_keys=<your_training_keys.pkl>  --test_keys=<your_test_keys.pkl>

The save directory stores each epoch as a .pt allowing the best model inatance to be loaded later on Training and test metrics such as loss and ROC are stored in the same directory for each GPU used. Ex 3 GPUS: log-rank1.csv, log-rank2.csv, and log-rank3.csv

Owner
Neeraj Kumar
Computational Biology/Chemistry and Bioinformatics.
Neeraj Kumar
Python wrapper class for OpenVINO Model Server. User can submit inference request to OVMS with just a few lines of code

Python wrapper class for OpenVINO Model Server. User can submit inference request to OVMS with just a few lines of code.

Yasunori Shimura 7 Jul 27, 2022
Self-Supervised Methods for Noise-Removal

SSMNR | Self-Supervised Methods for Noise Removal Image denoising is the task of removing noise from an image, which can be formulated as the task of

1 Jan 16, 2022
Semantic Image Synthesis with SPADE

Semantic Image Synthesis with SPADE New implementation available at imaginaire repository We have a reimplementation of the SPADE method that is more

NVIDIA Research Projects 7.3k Jan 07, 2023
MusicYOLO framework uses the object detection model, YOLOx, to locate notes in the spectrogram.

MusicYOLO MusicYOLO framework uses the object detection model, YOLOX, to locate notes in the spectrogram. Its performance on the ISMIR2014 dataset, MI

Xianke Wang 2 Aug 02, 2022
Isaac Gym Reinforcement Learning Environments

Isaac Gym Reinforcement Learning Environments

NVIDIA Omniverse 714 Jan 08, 2023
Neural Ensemble Search for Performant and Calibrated Predictions

Neural Ensemble Search Introduction This repo contains the code accompanying the paper: Neural Ensemble Search for Performant and Calibrated Predictio

AutoML-Freiburg-Hannover 26 Dec 12, 2022
Code for MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks This is the code for the paper: MentorNet: Learning Data-Driven Curriculum fo

Google 302 Dec 23, 2022
Deep Learning agent of Starcraft2, similar to AlphaStar of DeepMind except size of network.

Introduction This repository is for Deep Learning agent of Starcraft2. It is very similar to AlphaStar of DeepMind except size of network. I only test

Dohyeong Kim 136 Jan 04, 2023
Applying PVT to Semantic Segmentation

Applying PVT to Semantic Segmentation Here, we take MMSegmentation v0.13.0 as an example, applying PVTv2 to SemanticFPN. For details see Pyramid Visio

35 Nov 30, 2022
[SIGGRAPH Asia 2019] Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning

AGIS-Net Introduction This is the official PyTorch implementation of the Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning. paper | suppl

Yue Gao 102 Jan 02, 2023
SSL_SLAM2: Lightweight 3-D Localization and Mapping for Solid-State LiDAR (mapping and localization separated) ICRA 2021

SSL_SLAM2 Lightweight 3-D Localization and Mapping for Solid-State LiDAR (Intel Realsense L515 as an example) This repo is an extension work of SSL_SL

Wang Han 王晗 1.3k Jan 08, 2023
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.

Conceptual 12M We introduce the Conceptual 12M (CC12M), a dataset with ~12 million image-text pairs meant to be used for vision-and-language pre-train

Google Research Datasets 226 Dec 07, 2022
PED: DETR for Crowd Pedestrian Detection

PED: DETR for Crowd Pedestrian Detection Code for PED: DETR For (Crowd) Pedestrian Detection Paper PED: DETR for Crowd Pedestrian Detection Installati

36 Sep 13, 2022
Learning Open-World Object Proposals without Learning to Classify

Learning Open-World Object Proposals without Learning to Classify Pytorch implementation for "Learning Open-World Object Proposals without Learning to

Dahun Kim 149 Dec 22, 2022
Software & Hardware to do multi color printing with Sharpies

3D Print Colorizer is a combination of 3D printed parts and a Cura plugin which allows anyone with an Ender 3 like 3D printer to produce multi colored

343 Jan 06, 2023
This is an official implementation for "Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation".

Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation This repo is the official implementation of Exploiting Temporal Con

Vegetabird 241 Jan 07, 2023
A Transformer-Based Siamese Network for Change Detection

ChangeFormer: A Transformer-Based Siamese Network for Change Detection (Under review at IGARSS-2022) Wele Gedara Chaminda Bandara, Vishal M. Patel Her

Wele Gedara Chaminda Bandara 214 Dec 29, 2022
modelvshuman is a Python library to benchmark the gap between human and machine vision

modelvshuman is a Python library to benchmark the gap between human and machine vision. Using this library, both PyTorch and TensorFlow models can be evaluated on 17 out-of-distribution datasets with

Bethge Lab 244 Jan 03, 2023
Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

The Stem Cell Hypothesis Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP

Emory NLP 5 Jul 08, 2022
A framework for attentive explainable deep learning on tabular data

🧠 kendrite A framework for attentive explainable deep learning on tabular data 💨 Quick start kedro run 🧱 Built upon Technology Description Links ke

Marnix Koops 3 Nov 06, 2021