DockStream: A Docking Wrapper to Enhance De Novo Molecular Design

Last update: Jan 02, 2023

Overview

`DockStream`

Description

DockStream is a docking wrapper providing access to a collection of ligand embedders and docking backends. Docking execution and post hoc analysis can be automated via the benchmarking and analysis workflow. The flexibility to specify a large variety of docking configurations allows tailored protocols for diverse end applications. DockStream can also parallelize docking across CPU cores, increasing throughput. DockStream is integrated with the de novo design platform, REINVENT, allowing one to incorporate docking into the generative process, thus providing the agent with 3D structural information.

Supported Backends

Ligand Embedders

Docking Backends

Note: The CCDC package, the OpenEye toolkit and Schrodinger's tools require you to obtain the respective software from those vendors.

Tutorials and Usage

Detailed Jupyter Notebook tutorials for all DockStream functionalities and workflows are provided in DockStreamCommunity. The DockStream repository here contains input JSON templates located in examples. The templates are organized as follows:

target_preparation: Preparing targets for docking
ligand_preparation: Generating 3D coordinates for ligands
docking: Docking ligands
integration: Combining different ligand embedders and docking backends into a single input JSON to run successively

Requirements

Two Conda environments are provided: DockStream via environment.yml and DockStreamFull via environment_full.yml. DockStream suffices for all use cases except when CCDC GOLD software is used, in which case DockStreamFull is required.

git clone <DockStream repository>
cd <DockStream directory>
conda env create -f environment.yml
conda activate DockStream

Enable use of OpenEye software (from REINVENT README)

You will need to set the environment variable OE_LICENSE to activate the oechem license. One way to do this while keeping it specific to the conda environment is to first run the following on the command line:

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

Then edit ./etc/conda/activate.d/env_vars.sh as follows:

#!/bin/sh
export OE_LICENSE='/opt/scp/software/oelicense/1.0/oe_license.seq1'

and finally, edit ./etc/conda/deactivate.d/env_vars.sh :

#!/bin/sh
unset OE_LICENSE
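
After re-activating the environment, it can be useful to confirm that the variable is actually visible to Python and points to an existing file. The following is a minimal sketch, not part of DockStream; the helper name `check_oe_license` is hypothetical, and it only inspects the variable without validating the license itself.

```python
import os
from pathlib import Path


def check_oe_license(env=None):
    """Return (ok, message) describing the state of OE_LICENSE.

    `env` defaults to os.environ; passing a dict makes the check testable.
    This hypothetical helper does not validate the license contents.
    """
    env = os.environ if env is None else env
    value = env.get("OE_LICENSE")
    if not value:
        return False, "OE_LICENSE is not set"
    if not Path(value).is_file():
        return False, f"OE_LICENSE points to a missing file: {value}"
    return True, f"OE_LICENSE is set to {value}"


if __name__ == "__main__":
    ok, message = check_oe_license()
    print(message)
```

If the message reports that the variable is not set, deactivating and re-activating the conda environment is usually needed for the activate.d script to take effect.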

Unit Tests

After cloning the DockStream repository, enable licenses, if applicable (OpenEye, CCDC, Schrodinger). Then execute the following:

python unit_tests.py

Contributors

Christian Margreitter
Jeff Guo
Alexey Voronov

Comments

Glide dockings using local machine

Hi, I am trying out DockStream with Schrodinger. Is it possible to use it on the local machine by specifying $SCHRODINGER/glide instead of the tokens procedure?

opened by Oulfin 6

Bug in Glide backend parallelization

First, thanks for contributing this nice toolbox.

This is to report a bug in the following module:

https://github.com/MolecularAI/DockStream/blob/7bdfd4a67f5c938e3222db59387e5a95e8a59e56/dockstream/core/Schrodinger/Glide_docker.py#L404

A while loop is used to process all sublists in batches. However, the number of processed sublists recorded in jobs_submitted can be off, because this variable is the cumulative sum of len(tmp_output_dirs), which can be smaller than len(cur_slice_sublists) if any of the sublists has no valid molecules to write out.

The bug may cause some of the sublists to be processed repeatedly and, in extreme cases, may result in an infinite loop.
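
The failure mode can be illustrated with a small, self-contained sketch. This is not the code from Glide_docker.py: the functions `write_sublist` and `process_in_batches` are hypothetical stand-ins, and only the variable names `jobs_submitted`, `cur_slice_sublists` and `tmp_output_dirs` are taken from the report. The sketch shows one possible fix, which is to advance the counter by the number of sublists consumed rather than by the number of output directories produced.

```python
def write_sublist(sublist):
    """Hypothetical stand-in: return an output directory name, or None
    if the sublist has no valid molecules to write out."""
    valid = [mol for mol in sublist if mol is not None]
    return f"tmp_{len(valid)}_mols" if valid else None


def process_in_batches(sublists, batch_size):
    """Process all sublists in slices of `batch_size`, visiting each once."""
    processed = []
    jobs_submitted = 0
    while jobs_submitted < len(sublists):
        cur_slice_sublists = sublists[jobs_submitted:jobs_submitted + batch_size]
        tmp_output_dirs = [d for d in map(write_sublist, cur_slice_sublists) if d]
        processed.extend(tmp_output_dirs)
        # Advance by the number of sublists consumed, not by
        # len(tmp_output_dirs): empty sublists produce no output directory,
        # so counting only outputs would revisit sublists or never terminate.
        jobs_submitted += len(cur_slice_sublists)
    return processed
```

With this counter, a slice in which every sublist is empty still moves the loop forward, so the loop terminates after exactly one pass over the input.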

I didn't check if any other backend uses similar logic to parallelize the run and may suffer from the same problem.

opened by hshany 3

Question: Is it possible to feed an sdf file of prepared ligands straight into docking?

I'm trying to work out whether it's possible to put an SDF file of prepared ligands straight into a Glide run, i.e. without specifying an input_pool in the docking_runs list (especially when using docker.py).

opened by reskyner 2

Raise LigandPreparationFailed error

For OpenEye Hybrid, LigandPreparationFailed errors were reported for both the CORINA and OMEGA backends. One example is shown below:

  File "/DockStream/dockstream/core/OpenEyeHybrid/Omega_ligand_preparator.py", line 66, in __init__
    raise LigandPreparationFailed("Cannot initialize OMEGA backend - abort.")
dockstream.utils.dockstream_exceptions.LigandPreparationFailed: Cannot initialize OMEGA backend - abort.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/DockStream/docker.py", line 132, in <module>
    raise LigandPreparationFailed
dockstream.utils.dockstream_exceptions.LigandPreparationFailed

Could you please help me with this problem? I tried both the provided receptor-ligand data files from DockStreamCommunity and my own dataset. Both reported the same LigandPreparationFailed error. Thank you in advance!

opened by fangffRS 1

ADV 1.2.0 support

For DockStream to work with the new AutoDock-Vina 1.2.0 (https://pubs.acs.org/doi/10.1021/acs.jcim.1c00203), the "log-file" specification has to go:

https://github.com/MolecularAI/DockStream/blob/efefbe52d3cecb8b6d1b72ab719aad1e4702833b/dockstream/core/AutodockVina/AutodockVina_docker.py#L275

Should be backwards-compatible.
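
One way to picture the change is a helper that only adds the log-file option for older releases. This is a hedged sketch, not the code in AutodockVina_docker.py: the function `build_vina_args` and its `version` tuple are hypothetical, how DockStream would determine the installed Vina version is not shown in the source, and the search-box options are omitted for brevity. Simply dropping the option for all versions, as the issue suggests, is the simpler alternative.

```python
def build_vina_args(receptor, ligand, out_path, log_path, version, cpu=1):
    """Build an AutoDock Vina argument list (search-box options omitted).

    `version` is a (major, minor, patch) tuple supplied by the caller;
    this hypothetical helper does not detect it.
    """
    args = [
        "--receptor", receptor,
        "--ligand", ligand,
        "--out", out_path,
        "--cpu", str(cpu),
    ]
    # Per the issue, Vina 1.2.0 no longer accepts the log-file option,
    # so it is only passed to older releases.
    if version < (1, 2, 0):
        args += ["--log", log_path]
    return args
```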

opened by CMargreitter 1

Input file of the function "parse_maestro"

First of all, thank you for your wonderful work on AI in drug development. I am using Glide through DockStream. I think the function parse_maestro in Glide_docker.py can be used to extract the docking settings (in DockStream, these settings are written in a JSON file). Is this right? If so, could you tell me the input file type for parse_maestro (e.g. maegz, mae, sdf)? I tried the function with a maegz file (output from Glide docking), but I couldn't get a result. I want to use parse_maestro to reproduce the settings that were applied in a previous docking simulation. I would be very grateful for an answer. Thanks!

opened by SejeongPark8354 0

Openbabel integration failed

I am trying to use DockStream with the Vina backend, but an exception is raised concerning the OpenBabel executable.

Traceback (most recent call last):
  File "DockStream/target_preparator.py", line 130, in <module>
    prep = AutodockVinaTargetPreparator(conf=config, target=input_pdb_path, run_number=run_number)
  File "C:\Users\Y-8874903-E.ESTUDIANT\OneDrive - URV\Escritorio\PLIP interaction\DockStream\dockstream\core\AutodockVina\AutodockVina_target_preparator.py", line 56, in __init__
    raise TargetPreparationFailed("Cannot initialize OpenBabel external library, which should be part of the environment - abort.")
dockstream.utils.dockstream_exceptions.TargetPreparationFailed: Cannot initialize OpenBabel external library, which should be part of the environment - abort.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "DockStream/target_preparator.py", line 139, in <module>
    raise TargetPreparationFailed() from e
dockstream.utils.dockstream_exceptions.TargetPreparationFailed

I followed all the necessary steps mentioned in the docs.

opened by Crispae 1

Parallelization of ADV for docking

Hello,

I am trying to run my first docking experiments together with REINVENT. I observe many ADV jobs being started with -cpu 1 (hardcoded), but a few (1 or 2) take quite long and leave all other CPUs idle until the batch has finished and a new batch has started.

This leaves quite some capacity of, e.g., a 16-core machine unused, at least that is my impression when observing the run via top or ps. In the dockstream.config, parallelization.number_cores is set to 16.

Are there more practical settings to better exploit larger machines with 16-64 CPUs?
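
The effect described here can be estimated with a small simulation that compares batch-synchronous execution with a scheme in which each core takes the next job as soon as it is free. This is only an illustrative sketch under stated assumptions: the function names and job durations are made up, and nothing in the source says whether DockStream or REINVENT can be configured to schedule jobs dynamically.

```python
import heapq


def makespan_fixed_batches(durations, cores):
    """Total time when each batch of `cores` jobs must finish before the
    next batch starts (the slowest job in a batch sets its length)."""
    return sum(max(durations[i:i + cores]) for i in range(0, len(durations), cores))


def makespan_dynamic(durations, cores):
    """Total time when a core picks up the next job as soon as it is free."""
    free_at = [0.0] * cores  # min-heap of times at which each core becomes free
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available core
        heapq.heappush(free_at, start + duration)
    return max(free_at)


if __name__ == "__main__":
    # Hypothetical durations: one slow ligand per batch of four.
    jobs = [10, 1, 1, 1, 10, 1, 1, 1]
    print(makespan_fixed_batches(jobs, cores=4))
    print(makespan_dynamic(jobs, cores=4))
```

For these made-up durations on four cores, the batch-synchronous scheme takes 20 time units while the dynamic scheme takes 11, which matches the impression of idle CPUs waiting for one or two slow jobs.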

Lars

opened by LarsAC 3

No module named 'ccdc'

I believe I successfully installed the normal (not Full) DockStream package as per your instructions on the GitHub site and then tried to run the unit tests, but this fails with a complaint that the ccdc module is missing (see below). However, I want to use Glide, so I wouldn't need (nor have) ccdc. I am doing this on Ubuntu 18.04.

Dockstream/python ./unit_tests.py
Traceback (most recent call last):
  File "./unit_tests.py", line 10, in <module>
    from tests.Gold import *
  File "/media/data/evehom/Projects/CompChem/DockStream/tests/Gold/__init__.py", line 1, in <module>
    from tests.Gold.test_Gold_target_preparation import *
  File "/media/data/evehom/Projects/CompChem/DockStream/tests/Gold/test_Gold_target_preparation.py", line 11, in <module>
    from dockstream.core.Gold.Gold_target_preparator import GoldTargetPreparator
  File "/media/data/evehom/Projects/CompChem/DockStream/dockstream/core/Gold/Gold_target_preparator.py", line 3, in <module>
    import ccdc
ModuleNotFoundError: No module named 'ccdc'
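
The traceback above comes from an unconditional import of an optional vendor package. A common way to keep a test suite runnable without such a package is to check for it first and skip the dependent tests. The following is a minimal sketch of that pattern, not the repository's actual test layout; the helper `backend_available` and the test class name are hypothetical.

```python
import importlib.util
import unittest


def backend_available(module_name):
    """Return True if the top-level module `module_name` can be found."""
    return importlib.util.find_spec(module_name) is not None


# Skip the whole test case when the CCDC Python API is not installed,
# instead of failing at import time.
@unittest.skipUnless(backend_available("ccdc"), "CCDC Python API not installed")
class TestGoldTargetPreparation(unittest.TestCase):
    def test_import(self):
        import ccdc  # only reached when the package is present

        self.assertIsNotNone(ccdc)
```

Because the vendor import happens inside the test rather than at module level, collecting the tests no longer requires ccdc to be installed.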

opened by Evert-Homan 4