A Python package to process & model ChEMBL data.

Overview

insilico: A Python package to process & model ChEMBL data.

PyPI version License: MIT

ChEMBL is a manually curated chemical database of bioactive molecules with drug-like properties. It is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL) based in Hinxton, UK.

insilico helps drug researchers find promising compounds for drug discovery. It preprocesses ChEMBL molecular data and outputs Lapinski's descriptors and chemical fingerprints using popular bioinformatic libraries. Additionally, this package can be used to make a decision tree model that predicts drug efficacy.

About the package name

The term in silico is a neologism used to mean pharmacology hypothesis development & testing performed via computer (silicon), and is related to the more commonly known biological terms in vivo ("within the living") and in vitro ("within the glass".)

Installation

Installation via pip:

$ pip install insilico

Installation via cloned repository:

$ git clone https://github.com/konstanzer/insilico
$ cd insilico
$ python setup.py install

Python dependencies

For preprocessing, rdkit-pypi, padelpy, and chembl_webresource_client and for modeling, sklearn and seaborn

Basic Usage

insilico offers two functions: one to search the ChEMBL database and a second to output preprocessed ChEMBL data based on the molecular ID. Using the chemical fingerprint from this output, the Model class creates a decision tree and outputs residual plots and metrics.

The function process_target_data saves the chemical fingerprint and, optionally, molecular descriptor plots to a data folder if plots=True.

When declaring the model class, you may specify a test set size and a variance threshold, which sets the minimum variance allowed for each column. This optional step may eliminate hundreds of features unhelpful for modeling. When calling the decision_tree function, optionally specify max tree depth and cost-complexity alpha, hyperparameters to control overfitting. If save=True, the model is saved to the data folder.

from insilico import target_search, process_target_data, Model

# return search results for 'P. falciparum D6'
result = target_search('P. falciparum')

# returns a dataframe of molecular data for CHEMBL2367107 (P. falciparum D6)
df = process_target_data('CHEMBL2367107')

model = Model(test_size=0.2, var_threshold=0.15)

# returns a decision tree and metrics (R^2 and MAE) & saves residual plot
tree, metrics = model.decision_tree(df, max_depth=50, ccp_alpha=0.)

# returns split data for use in other models
X_train, X_test, y_train, y_test = model.split_data()

Advanced option: Use optional 'fp' parameter to specify fingerprinter

Valid fingerprinters are "PubchemFingerprinter" (default), "ExtendedFingerprinter", "EStateFingerprinter", "GraphOnlyFingerprinter", "MACCSFingerprinter", "SubstructureFingerprinter", "SubstructureFingerprintCount", "KlekotaRothFingerprinter", "KlekotaRothFingerprintCount", "AtomPairs2DFingerprinter", and "AtomPairs2DFingerprintCount".

df = process_target_data('CHEMBL2367107', plots=False, fp='SubstructureFingerprinter')

Contributing, Reporting Issues & Support

Make a pull request if you'd like to contribute to insilico. Contributions should include tests for new features added and documentation. File an issue to report problems with the software or feature requests. Include information such as error messages, your OS/environment and Python version.

Questions may be sent to Steven Newton ([email protected]).

References

Bioinformatics Project from Scratch: Drug Discovery by Chanin Nantasenamat

Owner
Steven Newton
"Nobody can do it all but everybody can do something." -Sylvia Earle, marine biologist (Mission-Blue.org)
Steven Newton
Demystifying How Self-Supervised Features Improve Training from Noisy Labels

Demystifying How Self-Supervised Features Improve Training from Noisy Labels This code is a PyTorch implementation of the paper "[Demystifying How Sel

<a href=[email protected]"> 4 Oct 14, 2022
PyTorch implementation for SDEdit: Image Synthesis and Editing with Stochastic Differential Equations

SDEdit: Image Synthesis and Editing with Stochastic Differential Equations Project | Paper | Colab PyTorch implementation of SDEdit: Image Synthesis a

536 Jan 05, 2023
Perform zero-order Hankel Transform for an 1D array (float or real valued).

perform zero-order Hankel Transform for an 1D array (float or real valued). An discrete form of Parseval theorem is guaranteed. Suit for iterative problems.

1 Jan 17, 2022
For auto aligning, cropping, and scaling HR and LR images for training image based neural networks

ImgAlign For auto aligning, cropping, and scaling HR and LR images for training image based neural networks Usage Make sure OpenCV is installed, 'pip

15 Dec 04, 2022
PyQt6 configuration in yaml format providing the most simple script.

PyamlQt(ぴゃむるきゅーと) PyQt6 configuration in yaml format providing the most simple script. Requirements yaml PyQt6, ( PyQt5 ) Installation pip install Pya

Ar-Ray 7 Aug 15, 2022
Vector Neurons: A General Framework for SO(3)-Equivariant Networks

Vector Neurons: A General Framework for SO(3)-Equivariant Networks Created by Congyue Deng, Or Litany, Yueqi Duan, Adrien Poulenard, Andrea Tagliasacc

Congyue Deng 332 Dec 29, 2022
An implementation for the loss function proposed in Decoupled Contrastive Loss paper.

Decoupled-Contrastive-Learning This repository is an implementation for the loss function proposed in Decoupled Contrastive Loss paper. Requirements P

Ramin Nakhli 71 Dec 04, 2022
N-gram models- Unsmoothed, Laplace, Deleted Interpolation

N-gram models- Unsmoothed, Laplace, Deleted Interpolation

Ravika Nagpal 1 Jan 04, 2022
Multi-objective gym environments for reinforcement learning.

MO-Gym: Multi-Objective Reinforcement Learning Environments Gym environments for multi-objective reinforcement learning (MORL). The environments follo

Lucas Alegre 74 Jan 03, 2023
diablo2 resurrected loot filter

Only For Chinese and Traditional Chinese The filter only for Chinese and Traditional Chinese, i didn't change it for other language.Maybe you could mo

elmagnifico 249 Dec 04, 2022
Learning to Reach Goals via Iterated Supervised Learning

Vanilla GCSL This repository contains a vanilla implementation of "Learning to Reach Goals via Iterated Supervised Learning" proposed by Dibya Gosh et

Christoph Heindl 4 Aug 10, 2022
EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

EquiBind: geometric deep learning for fast predictions of the 3D structure in which a small molecule binds to a protein

Hannes Stärk 355 Jan 03, 2023
Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation

Knowledge Distillation for BERT Unsupervised Domain Adaptation Official PyTorch implementation | Paper Abstract A pre-trained language model, BERT, ha

Minho Ryu 29 Nov 30, 2022
Publication describing 3 ML examples at NSLS-II and interfacing into Bluesky

Machine learning enabling high-throughput and remote operations at large-scale user facilities. Overview This repository contains the source code and

BNL 4 Sep 24, 2022
基于Pytorch实现优秀的自然图像分割框架!(包括FCN、U-Net和Deeplab)

语义分割学习实验-基于VOC数据集 usage: 下载VOC数据集,将JPEGImages SegmentationClass两个文件夹放入到data文件夹下。 终端切换到目标目录,运行python train.py -h查看训练 (torch) Li Xiang 28 Dec 21, 2022

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression YOLOv5 with alpha-IoU losses implemented in PyTorch. Example r

Jacobi(Jiabo He) 147 Dec 05, 2022
VGGFace2-HQ - A high resolution face dataset for face editing purpose

The first open source high resolution dataset for face swapping!!! A high resolution version of VGGFace2 for academic face editing purpose

Naiyuan Liu 232 Dec 29, 2022
A library for implementing Decentralized Graph Neural Network algorithms.

decentralized-gnn A package for implementing and simulating decentralized Graph Neural Network algorithms for classification of peer-to-peer nodes. De

Multimedia Knowledge and Social Analytics Lab 5 Nov 07, 2022
This is an official pytorch implementation of Fast Fourier Convolution.

Fast Fourier Convolution (FFC) for Image Classification This is the official code of Fast Fourier Convolution for image classification on ImageNet. Ma

pkumi 199 Jan 03, 2023
Original Implementation of Prompt Tuning from Lester, et al, 2021

Prompt Tuning This is the code to reproduce the experiments from the EMNLP 2021 paper "The Power of Scale for Parameter-Efficient Prompt Tuning" (Lest

Google Research 282 Dec 28, 2022