GeneDisco is a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery.

Overview

GeneDisco: A benchmark for active learning in drug discovery

Python version Library version

In vitro cellular experimentation with genetic interventions, using for example CRISPR technologies, is an essential step in early-stage drug discovery and target validation that serves to assess initial hypotheses about causal associations between biological mechanisms and disease pathologies. With billions of potential hypotheses to test, the experimental design space for in vitro genetic experiments is extremely vast, and the available experimental capacity - even at the largest research institutions in the world - pales in relation to the size of this biological hypothesis space.

GeneDisco (published at ICLR-22) is a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery. GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source i mplementations of state-of-the-art active learning policies for experimental design and exploration.

GeneDisco ICLR-22 Challenge

Learn more about the GeneDisco challenge for experimental design for optimally exploring the vast genetic intervention space here.

Install

pip install genedisco

Use

How to Run the Full Benchmark Suite?

Experiments (all baselines, acquisition functions, input and target datasets, multiple seeds) included in GeneDisco can be executed sequentially for e.g. acquired batch size 64, 8 cycles and a bayesian_mlp model using:

run_experiments \
  --cache_directory=/path/to/genedisco_cache  \
  --output_directory=/path/to/genedisco_output  \
  --acquisition_batch_size=64  \
  --num_active_learning_cycles=8  \
  --max_num_jobs=1

Results are written to the folder at /path/to/genedisco_cache, and processed datasets will be cached at /path/to/genedisco_cache (please replace both with your desired paths) for faster startup in future invocations.

Note that due to the number of experiments being run by the above command, we recommend execution on a compute cluster.
The GeneDisco codebase also supports execution on slurm compute clusters (the slurm command must be available on the executing node) using the following command and using dependencies in a Python virtualenv available at /path/to/your/virtualenv (please replace with your own virtualenv path):

run_experiments \
  --cache_directory=/path/to/genedisco_cache  \
  --output_directory=/path/to/genedisco_output  \
  --acquisition_batch_size=64  \
  --num_active_learning_cycles=8  \
  --schedule_on_slurm \
  --schedule_children_on_slurm \
  --remote_execution_virtualenv_path=/path/to/your/virtualenv

Other scheduling systems are currently not supported by default.

How to Run A Single Isolated Experiment (One Learning Cycle)?

To run one active learning loop cycle, for example, with the "topuncertain" acquisition function, the "achilles" feature set and the "schmidt_2021_ifng" task, execute the following command:

active_learning_loop  \
    --cache_directory=/path/to/genedisco/genedisco_cache \
    --output_directory=/path/to/genedisco/genedisco_output \
    --model_name="bayesian_mlp" \
    --acquisition_function_name="topuncertain" \
    --acquisition_batch_size=64 \
    --num_active_learning_cycles=8 \
    --feature_set_name="achilles" \
    --dataset_name="schmidt_2021_ifng" 

How to Evaluate a Custom Acquisition Function?

To run a custom acquisition function, set --acquisition_function_name="custom" and --acquisition_function_path to the file path that contains your custom acquisition function.

active_learning_loop  \
    --cache_directory=/path/to/genedisco/genedisco_cache \
    --output_directory=/path/to/genedisco/genedisco_output \
    --model_name="bayesian_mlp" \
    --acquisition_function_name="custom" \
    --acquisition_function_path=/path/to/custom_acquisition_function.py \
    --acquisition_batch_size=64 \
    --num_active_learning_cycles=8 \
    --feature_set_name="achilles" \
    --dataset_name="schmidt_2021_ifng" 

...where "/path/to/custom_acquisition_function.py" contains code for your custom acquisition function corresponding to the BaseBatchAcquisitionFunction interface, e.g.:

import numpy as np
from typing import AnyStr, List
from slingpy import AbstractDataSource
from slingpy.models.abstract_base_model import AbstractBaseModel
from genedisco.active_learning_methods.acquisition_functions.base_acquisition_function import \
    BaseBatchAcquisitionFunction

class RandomBatchAcquisitionFunction(BaseBatchAcquisitionFunction):
    def __call__(self,
                 dataset_x: AbstractDataSource,
                 batch_size: int,
                 available_indices: List[AnyStr], 
                 last_selected_indices: List[AnyStr] = None, 
                 model: AbstractBaseModel = None,
                 temperature: float = 0.9,
                 ) -> List:
        selected = np.random.choice(available_indices, size=batch_size, replace=False)
        return selected

Note that the last class implementing BaseBatchAcquisitionFunction is loaded by GeneDisco if there are multiple valid acquisition functions present in the loaded file.

Citation

Please consider citing, if you reference or use our methodology, code or results in your work:

@inproceedings{mehrjou2022genedisco,
    title={{GeneDisco: A Benchmark for Experimental Design in Drug Discovery}},
    author={Mehrjou, Arash and Soleymani, Ashkan and Jesson, Andrew and Notin, Pascal and Gal, Yarin and Bauer, Stefan and Schwab, Patrick},
    booktitle={{International Conference on Learning Representations (ICLR)}},
    year={2022}
}

License

License

Authors

Patrick Schwab, GlaxoSmithKline plc
Arash Mehrjou, GlaxoSmithKline plc
Andrew Jesson, University of Oxford
Ashkan Soleymani, MIT

Acknowledgements

PS and AM are employees and shareholders of GlaxoSmithKline plc.

A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available.

Use this instead: https://github.com/facebookresearch/maskrcnn-benchmark A Pytorch Implementation of Detectron Example output of e2e_mask_rcnn-R-101-F

Roy 2.8k Dec 29, 2022
PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and Multi-Step Knowledge Distillation

PocketNet This is the official repository of the paper: PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and M

Fadi Boutros 40 Dec 22, 2022
A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.

ViTGAN: Training GANs with Vision Transformers A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers. Refer

Hong-Jia Chen 127 Dec 23, 2022
Official implementation for "QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation" (CVPR 2022)

QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation (CVPR2022) https://arxiv.org/abs/2203.08483 Unpaired image-to-image (I2I

Xueqi Hu 50 Dec 16, 2022
A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

Jun-Yan Zhu 27 Aug 08, 2022
Object recognition using Azure Custom Vision AI and Azure Functions

Step by Step on how to create an object recognition model using Custom Vision, export the model and run the model in an Azure Function

El Bruno 11 Jul 08, 2022
Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch

Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch Reference Paper URL Author: Yi Tay, Dara Bahri, Donald Metzler

Myeongjun Kim 66 Nov 30, 2022
Minecraft Hack Detection With Python

Minecraft Hack Detection An attempt to try and use crowd sourced replays to find

Kuleen Sasse 3 Mar 26, 2022
ObjectDetNet is an easy, flexible, open-source object detection framework

Getting started with the ObjectDetNet ObjectDetNet is an easy, flexible, open-source object detection framework which allows you to easily train, resu

5 Aug 25, 2020
AWS documentation corpus for zero-shot open-book question answering.

aws-documentation We present the AWS documentation corpus, an open-book QA dataset, which contains 25,175 documents along with 100 matched questions a

Sia Gholami 2 Jul 07, 2022
Bayes-Newton—A Gaussian process library in JAX, with a unifying view of approximate Bayesian inference as variants of Newton's algorithm.

Bayes-Newton Bayes-Newton is a library for approximate inference in Gaussian processes (GPs) in JAX (with objax), built and actively maintained by Wil

AaltoML 165 Nov 27, 2022
ChebLieNet, a spectral graph neural network turned equivariant by Riemannian geometry on Lie groups.

ChebLieNet: Invariant spectral graph NNs turned equivariant by Riemannian geometry on Lie groups Hugo Aguettaz, Erik J. Bekkers, Michaël Defferrard We

haguettaz 12 Dec 10, 2022
This repository is an implementation of paper : Improving the Training of Graph Neural Networks with Consistency Regularization

CRGNN Paper : Improving the Training of Graph Neural Networks with Consistency Regularization Environments Implementing environment: GeForce RTX™ 3090

THUDM 28 Dec 09, 2022
Global-Local Context Network for Person Search

Global-Local Context Network for Person Search Abstract: Person search aims to jointly localize and identify a query person from natural, uncropped im

Peng Zheng 15 Oct 17, 2022
Software Platform for solving and manipulating multiparametric programs in Python

PPOPT Python Parametric OPtimization Toolbox (PPOPT) is a software platform for solving and manipulating multiparametric programs in Python. This pack

10 Sep 13, 2022
FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks

HKBU High Performance Machine Learning Lab 6 Nov 18, 2022
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

EdiTTS: Score-based Editing for Controllable Text-to-Speech Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech. Au

Neosapience 98 Dec 25, 2022
SpinalNet: Deep Neural Network with Gradual Input

SpinalNet: Deep Neural Network with Gradual Input This repository contains scripts for training different variations of the SpinalNet and its counterp

H M Dipu Kabir 142 Dec 30, 2022
Exploring the link between uncertainty estimates obtained via "exact" Bayesian inference and out-of-distribution (OOD) detection.

Uncertainty-based OOD detection Exploring the link between uncertainty estimates obtained by "exact" Bayesian inference and out-of-distribution (OOD)

Christian Henning 1 Nov 05, 2022