Tandem Mass Spectrum Prediction with Graph Transformers

Overview

MassFormer

This is the original implementation of MassFormer, a graph transformer for small molecule MS/MS prediction. Check out the preprint on arxiv.

Setting Up Environment

We recommend using conda. Three conda yml files are provided in the env/ directory (cpu.yml, cu101.yml, cu102.yml), providing different pytorch installation options (CPU-only, CUDA 10.1, CUDA 10.2). They can be trivially modified to support other versions of CUDA.

To set up an environment, run the command conda env create -f ${CONDA_YAML}, where ${CONDA_YAML} is the path to the desired yaml file.

Downloading NIST Data

Note: this step requires a Windows System or Virtual Machine

The NIST 2020 LC-MS/MS dataset can be purchased from an authorized distributor. The spectra and associated compounds can be exported to MSP/MOL format using the included lib2nist software. There is a single MSP file which contains all of the mass spectra, and multiple MOL files which include the molecular structure information for each spectrum (linked by ID). We've included a screenshot describing the lib2nist export settings.

Alt text

There is a minor bug in the export software that sometimes results in errors when parsing the MOL files. To fix this bug, run the script python mol_fix.py ${MOL_DIR}, where ${MOL_DIR} is a path to the NIST export directory with MOL files.

Downloading Massbank Data

The MassBank of North America (MB-NA) data is in MSP format, with the chemical information provided in the form of a SMILES string (as opposed to a MOL file). It can be downloaded from the MassBank website, under the tab "LS-MS/MS Spectra".

Exporting and Preparing Data

We recommend creating a directory called data/ and placing the downloaded and uncompressed data into a folder data/raw/.

To parse both of the datasets, run parse_and_export.py. Then, to prepare the data for model training, run prepare_data.py. By default the processed data will end up in data/proc/.

Setting Up Weights and Biases

Our implementation uses Weights and Biases (W&B) for logging and visualization. For full functionality, you must set up a free W&B account.

Training Models

A default config file is provided in "config/template.yml". This trains a MassFormer model on the NIST HCD spectra. Our experiments used systems with 32GB RAM, 1 Nvidia RTX 2080 (11GB VRAM), and 6 CPU cores.

The config/ directory has a template config file template.yml and 8 files corresponding to the experiments from the paper. The template config can be modified to train models of your choosing.

To train a template model without W&B with only CPU, run python runner.py -w False -d -1

To train a template model with W&B on CUDA device 0, run python runner.py -w True -d 0

Reproducing Tables

To reproduce a model from one of the experiments in Table 2 or Table 3 from the paper, run python runner.py -w True -d 0 -c ${CONFIG_YAML} -n 5 -i ${RUN_ID}, where ${CONFIG_YAML} refers to a specific yaml file in the config/ directory and ${RUN_ID} refers to an arbitrary but unique integer ID.

Reproducing Visualizations

The explain.py script can be used to reproduce the visualizations in the paper, but requires a trained model saved on W&B (i.e. by running a script from the previous section).

To reproduce a visualization from Figures 2,3,4,5, run python explain.py ${WANDB_RUN_ID} --wandb_mode=online, where ${WANDB_RUN_ID} is the unique W&B run id of the desired model's completed training script. The figues will be uploaded as PNG files to W&B.

Reproducing Sweeps

The W&B sweep config files that were used to select model hyperparameters can be found in the sweeps/ directory. They can be initialized using wandb sweep ${PATH_TO_SWEEP}.

Owner
Röst Lab
Röst lab at U of T -- join us at https://gitter.im/Roestlab/Lobby
Röst Lab
Implementation for "Seamless Manga Inpainting with Semantics Awareness" (SIGGRAPH 2021 issue)

Seamless Manga Inpainting with Semantics Awareness [SIGGRAPH 2021](To appear) | Project Website | BibTex Introduction: Manga inpainting fills up the d

101 Jan 01, 2023
DeLiGAN - This project is an implementation of the Generative Adversarial Network

This project is an implementation of the Generative Adversarial Network proposed in our CVPR 2017 paper - DeLiGAN : Generative Adversarial Net

Video Analytics Lab -- IISc 110 Sep 13, 2022
Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models Benchmark and Efficient Evaluation

Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models Benchmark and Efficient Evaluation This reposi

First Person Vision @ Image Processing Laboratory - University of Catania 1 Aug 21, 2022
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning

Here is deepparse. Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning. Use deepparse to Use the pr

GRAAL/GRAIL 192 Dec 20, 2022
Code for the paper: On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

Non-Parametric Prior Actor-Critic (N-PPAC) This repository contains the code for On Pathologies in KL-Regularized Reinforcement Learning from Expert D

Cong Lu 5 May 13, 2022
Boundary IoU API (Beta version)

Boundary IoU API (Beta version) Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov [arXiv] [Project] [BibTeX] This API is

Bowen Cheng 177 Dec 29, 2022
Rank 1st in the public leaderboard of ScanRefer (2021-03-18)

InstanceRefer InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

63 Dec 07, 2022
TensorFlow implementation of Elastic Weight Consolidation

Elastic weight consolidation Introduction A TensorFlow implementation of elastic weight consolidation as presented in Overcoming catastrophic forgetti

James Stokes 67 Oct 11, 2022
BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training

BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training By Likun Cai, Zhi Zhang, Yi Zhu, Li Zhang, Mu Li, Xiangyang Xue. This

290 Dec 29, 2022
PyTorch implementation of DeepUME: Learning the Universal Manifold Embedding for Robust Point Cloud Registration (BMVC 2021)

DeepUME: Learning the Universal Manifold Embedding for Robust Point Cloud Registration [video] [paper] [supplementary] [data] [thesis] Introduction De

Natalie Lang 10 Dec 14, 2022
Jaxtorch (a jax nn library)

Jaxtorch (a jax nn library) This is my jax based nn library. I created this because I was annoyed by the complexity and 'magic'-ness of the popular ja

nshepperd 17 Dec 08, 2022
This is the source code for our ICLR2021 paper: Adaptive Universal Generalized PageRank Graph Neural Network.

GPRGNN This is the source code for our ICLR2021 paper: Adaptive Universal Generalized PageRank Graph Neural Network. Hidden state feature extraction i

Jianhao 92 Jan 03, 2023
Official PyTorch implementation of DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 2021), Dennis Park*, Rares Ambrus*, Vitor Guizilini, Jie Li, and Adrien Gaidon.

DD3D: "Is Pseudo-Lidar needed for Monocular 3D Object detection?" Install // Datasets // Experiments // Models // License // Reference Full video Offi

Toyota Research Institute - Machine Learning 364 Dec 27, 2022
HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset (ICCV 2021)

Code for HDR Video Reconstruction HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset (ICCV 2021) Guanying Chen, Cha

Guanying Chen 64 Nov 19, 2022
A curated list of the top 10 computer vision papers in 2021 with video demos, articles, code and paper reference.

The Top 10 Computer Vision Papers of 2021 The top 10 computer vision papers in 2021 with video demos, articles, code, and paper reference. While the w

Louis-François Bouchard 118 Dec 21, 2022
The open-source and free to use Python package miseval was developed to establish a standardized medical image segmentation evaluation procedure

miseval: a metric library for Medical Image Segmentation EVALuation The open-source and free to use Python package miseval was developed to establish

59 Dec 10, 2022
DeepProbLog is an extension of ProbLog that integrates Probabilistic Logic Programming with deep learning by introducing the neural predicate.

DeepProbLog DeepProbLog is an extension of ProbLog that integrates Probabilistic Logic Programming with deep learning by introducing the neural predic

KU Leuven Machine Learning Research Group 94 Dec 18, 2022
MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets)

MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets) Using mixup data augmentation as reguliraztion and tuning the hyper par

Bhanu 2 Jan 16, 2022
Self-Correcting Quantum Many-Body Control using Reinforcement Learning with Tensor Networks

Self-Correcting Quantum Many-Body Control using Reinforcement Learning with Tensor Networks This repository contains the code and data for the corresp

Friederike Metz 7 Apr 23, 2022