scAR (single-cell Ambient Remover) is a package for data denoising in single-cell omics.

Last update: Nov 28, 2022

Overview

scAR

scAR (single-cell Ambient Remover) is a package for denoising several kinds of single-cell omics data. It can be used for tasks such as sgRNA assignment for scCRISPRseq, identity barcode assignment for cell indexing, protein denoising for CITE-seq, and mRNA denoising for scRNAseq. It is built using probabilistic deep learning.

Installation
Usage
Dependencies
Resources
License
Reference

Installation

Clone this repository:

$ git clone https://github.com/Novartis/scAR.git

Enter the cloned directory:

$ cd scAR

To install the dependencies, create a conda environment:

Use scAR-gpu if you have an NVIDIA graphics card and the corresponding driver installed.

$ conda env create -f scAR-gpu.yml

Use scAR-cpu if you don't have a graphics card available.

$ conda env create -f scAR-cpu.yml

To activate the scAR conda environment run:

$ conda activate scAR

Usage

There are two ways to run scAR.

Use the scAR API if you are a Python user

>>> from scAR import model
>>> scarObj = model(adata.to_df(), empty_profile)  # adata.to_df() returns the count matrix as a DataFrame
>>> scarObj.train()
>>> scarObj.inference()
>>> adata.layers["X_scAR_denoised"] = scarObj.native_counts
>>> adata.obsm["X_scAR_assignment"] = scarObj.feature_assignment  # feature assignment (e.g., sgRNAs, tags); only available in 'cropseq' mode

See the tutorials

Run scAR from the command line

$ scar raw_count_matrix.pickle -t technology -e empty_profile.pickle -o output

raw_count_matrix.pickle: a pickle-formatted raw count matrix (MxN), with cells in rows and features in columns
empty_profile.pickle: pickle-formatted feature frequencies (Nx1) in empty droplets
technology: a string, one of 'scRNAseq', 'CROPseq' or 'CITEseq'

Run scar --help to see the other optional arguments and parameters.
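The two input files described above can be prepared along the following lines. This is a minimal sketch, not the authors' procedure: it assumes the pickles are pandas objects written with `DataFrame.to_pickle`, that you already have a count table restricted to empty droplets, and the names `empty_profile_from_droplets`, `save_inputs` and `ambient_frequency` are hypothetical.

```python
from pathlib import Path

import pandas as pd


def empty_profile_from_droplets(empty_counts: pd.DataFrame) -> pd.DataFrame:
    """Pool counts over empty droplets and return per-feature frequencies (N x 1)."""
    totals = empty_counts.sum(axis=0)  # total counts per feature
    grand_total = totals.sum()
    if grand_total == 0:
        raise ValueError("the empty droplets contain no counts")
    # Frequencies sum to 1 across features.
    return (totals / grand_total).to_frame(name="ambient_frequency")


def save_inputs(raw_counts: pd.DataFrame, empty_profile: pd.DataFrame, out_dir) -> None:
    """Write both inputs under the file names used in the command above."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    raw_counts.to_pickle(out_dir / "raw_count_matrix.pickle")
    empty_profile.to_pickle(out_dir / "empty_profile.pickle")


# Toy data: 2 cells x 3 features, and 2 empty droplets over the same features.
raw_counts = pd.DataFrame(
    [[5, 0, 1], [0, 7, 2]],
    index=["cell_1", "cell_2"],
    columns=["feat_a", "feat_b", "feat_c"],
)
empty_counts = pd.DataFrame(
    [[1, 0, 1], [0, 1, 1]],
    index=["empty_1", "empty_2"],
    columns=raw_counts.columns,
)
empty_profile = empty_profile_from_droplets(empty_counts)
```

Calling `save_inputs(raw_counts, empty_profile, ".")` would then write both files into the current directory. How empty droplets are selected is left open here; the tutorials are the place to check the recommended approach.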

The output folder contains four (or five) files:

output
├── denoised_counts.pickle		# denoised count matrix
├── expected_noise_ratio.pickle	# estimated noise ratio
├── BayesFactor.pickle			# Bayes factor of ambient contamination
├── expected_native_freq.pickle	# estimated native frequencies
└── assignment.pickle			# feature assignment (e.g., sgRNAs, tags); generated in 'cropseq' mode
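To inspect the results from Python, the output folder can be read back in one go. The helper below is a sketch under the assumption that each file is a pandas pickle readable with `pandas.read_pickle`; the function name `load_scar_outputs` is hypothetical and not part of the package.

```python
from pathlib import Path

import pandas as pd


def load_scar_outputs(output_dir):
    """Load every *.pickle file in output_dir into a dict keyed by file stem."""
    results = {}
    for path in sorted(Path(output_dir).glob("*.pickle")):
        # e.g. "denoised_counts.pickle" is stored under "denoised_counts"
        results[path.stem] = pd.read_pickle(path)
    return results
```

Because the helper simply globs the folder, it works whether four or five files are present; `load_scar_outputs("output").get("assignment")` returns `None` when the assignment file was not generated.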

Dependencies

Resources

Tutorials:
If you'd like to contribute, please contact Caibin ([email protected]).
Please use the issues to submit bug reports.

License

This project is licensed under the terms of License.
Copyright 2022 Novartis International AG.

Reference

If you use scAR in your research, please consider citing our manuscript:

@article {Sheng2022.01.14.476312,
	author = {Sheng, Caibin and Lopes, Rui and Li, Gang and Schuierer, Sven and Waldt, Annick and Cuttat, Rachel and Dimitrieva, Slavica and Kauffmann, Audrey and Durand, Eric and Galli, Giorgio G and Roma, Guglielmo and de Weck, Antoine},
	title = {Probabilistic modeling of ambient noise in single-cell omics data},
	elocation-id = {2022.01.14.476312},
	year = {2022},
	doi = {10.1101/2022.01.14.476312},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2022/01/14/2022.01.14.476312},
	eprint = {https://www.biorxiv.org/content/early/2022/01/14/2022.01.14.476312.full.pdf},
	journal = {bioRxiv}
}

Comments

Stochastic rounding to integers for downstream use in TotalVI/SCVI

Hi Caibin,

I tried using scar's output as input for TotalVI/SCVI. As expected, those gave an error because the input is not integer anymore. I would suggest implementing stochastic rounding to integers as done in SoupX.

Let me know if you're interested and I can find the time to implement it.

Regards, Mikhael
enhancement

opened by mdmanurung 9
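For readers unfamiliar with the stochastic rounding suggested in the issue above, here is a minimal NumPy sketch of the general idea: each value is rounded up with probability equal to its fractional part, so the result is an integer whose expectation equals the input. It is an illustration only and is not claimed to match the implementation in SoupX or in this package; the function name `stochastic_round` is hypothetical.

```python
import numpy as np


def stochastic_round(values, seed=None):
    """Round non-negative floats to integers, preserving the expected value."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    floor = np.floor(values)
    frac = values - floor  # fractional part in [0, 1)
    # Round up with probability equal to the fractional part.
    round_up = rng.random(values.shape) < frac
    return (floor + round_up).astype(np.int64)


denoised = np.array([[0.0, 2.0, 3.7], [1.2, 0.5, 10.0]])
rounded = stochastic_round(denoised, seed=0)
```

Values that are already integers are never changed, and every other entry ends up at either its floor or its ceiling.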
BiocondaBot not triggered
Hi @fgypas , I made a new release v0.4.1 but bioconda somehow is not triggered upon the new release.

In the new release, some code related to the build process has been refactored.

All information in setup.py (deleted) is integrated into setup.cfg.

An extra pyproject.toml file is added.

I am wondering whether these affect the bioconda-recipes.

Many thanks, Caibin
opened by CaibinSh 7
New release
Hi @fgypas ,

I am making a new release. There are three main changes: 1) addition of a readthedocs; 2) code reformatting via black and pylint (pylint can now score >7, so I have increased the standard in the Action test from 0.5 to 6); 3) renaming 'scAR' to 'scar'.

I have a couple of questions regarding whether these changes influence the bioconda recipe.

Will renaming the package (scAR) require modifications to the bioconda PR? The uppercase 'scAR' has been changed to lowercase 'scar' everywhere possible (including folders, the environment, etc.), but the repo name may stay 'scAR' for a while, as renaming the repo requires permission from Nick.

Should we exclude the datasets folder from the conda recipe? A folder named 'datasets', containing >100 MB of data, has been added for the tutorial.

question
opened by CaibinSh 3
Implementation in scvi-tools

Hi scAR team,

I'm reaching out to gauge interest in having a mirror implementation in scvi-tools for scAR. Given the existing infrastructure in the scvi-tools repository, I was able to create a port of scAR quite easily as an external module. Of course, the implementation will link to this repository as the original and cite the paper in the docs. On top of that, the port would allow users of scvi-tools to use the pretrained scAR encoder for doublet detection using the solo model.

Here's the pending pull request so you can check out what it would look like in the final implementation: https://github.com/scverse/scvi-tools/pull/1683

Please let me know what you think!

opened by ricomnl 2
Positive-valued denoising results for ADTs with raw 0 counts

Hi scar team!

Thank you for developing this interesting package. I had a question about the resulting denoised values for CITE-seq experiments.

I've noticed that some cells that originally have a 0 value for an ADT (as a raw count) will have a positive value (>0) for that ADT after the denoising procedure. Below, I show this case for the CD25 ADT in the 10xPBMC5k CITE-seq dataset (from the tutorial at https://scar-tutorials.readthedocs.io/en/latest/tutorials/scAR_tutorial_denoising_CITEseq.html).

I'm a bit confused about how to best interpret these values and how they are occurring. Should these be set to 0 after the denoising procedure?
question

opened by diegoalexespi 2
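The pattern described in the issue above can be located with a simple mask. The sketch below only finds entries whose raw count is 0 but whose denoised value is positive; it does not decide whether they should be reset to 0, which is the open question in the thread. The function name and the toy arrays are hypothetical.

```python
import numpy as np


def zero_raw_positive_denoised(raw, denoised):
    """Boolean mask of entries with raw count 0 and denoised value > 0."""
    raw = np.asarray(raw)
    denoised = np.asarray(denoised)
    if raw.shape != denoised.shape:
        raise ValueError("raw and denoised must have the same shape")
    return (raw == 0) & (denoised > 0)


# Toy example: 2 cells x 2 features.
raw = np.array([[0, 3], [2, 0]])
denoised = np.array([[0.4, 2.0], [1.0, 0.0]])
mask = zero_raw_positive_denoised(raw, denoised)
affected_per_feature = mask.sum(axis=0)  # number of affected cells per feature
```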
Sparsity values for mRNA decontamination?

Hello,

I was wondering what the recommended sparsity value would be for denoising mRNA, specifically if we don't know much about the data besides UMI/nGenes in the cells, etc. I noticed it's generally set to 1 for sgRNA decontamination, but what would the general recommended value be for mRNA?

Thanks, Chang
question

opened by cnk113 1
Number of training epochs + batch size
Dear scAR-Team,

thank you for developing this package. I am currently exploring it and I would like to ask you

How do you determine the number of epochs the user should use for feature_type = "mRNA"? In your tutorials you used 400 epochs, and in your paper you mention that you fixed the number of epochs to 800. I applied it with various batch sizes (up to 1000) and noticed that the model is sensitive to it.

I noticed that you use a rather small batch size. Is scAR sensitive to the batch size, or is the small value chosen because of computational limitations or for better performance?

Thank you in advance!

Best,
question
opened by KalinNonchev 1
bump to version 0.3.2

fix(*): changelog
docs: adding docstring in documentation
docs: adding Release notes in documentation
docs: adding docstring in documentation
test: adding semantic release
refactor: further refactoring codes
fix semantic release

opened by CaibinSh 1
ask for permission of Webhooks

Hi @kliatsko ,

We are currently refactoring and adding functionalities to scAR.

Could you please grant the Webhooks permission for us to automate the documentation?

Many thanks in advance. Best regards, Caibin on behalf of the scar team @fgypas @Tobias-Ternent @mr-nvs @AlexMTYZ.
help wanted

opened by CaibinSh 1
New release
Additions of readthedocs

Code refactoring

Renaming module names, e.g. changing "scAR" -> "scar"

Renaming parameter names, e.g.

changing "scRNAseq_tech" -> "feature_type"
changing "model" -> "count_model"

Black and Pylint re-formatting the code

enhancement
opened by CaibinSh 1
Black github action

Addition of black github action that runs on every push and every pull request. It shows in the stdout all the changes that need to be made (--diff), but returns exit code 0, even if errors are observed.

opened by fgypas 1

Releases(v0.4.4)

v0.4.4(Aug 9, 2022)
Documentation

Update dependency (03cf19e)

Update dependencies (9bd7f1c)

Update documentations (418996c)

Update dependencies (1bde351)

main: Add link to anndata and scanpy (8436e05)

main: Update dependencies (984df35)

main: Update documentation for .h5 file (2a309e0)

Add a link of binary installers (2faed3e)

Update documentations (e26a6e9)

Add competing methods (8564b2b)

scar: Add versionadded directives for parameter sparsity and round_to_int (33e35ca)

Update docs (a4da539)

Update introduction (a036b24)

Change readthedocs template (421e52f)

data_generator: Update docs (1f8f668)

data_generator: Re-style docs (afef9fb)

*: Re-style docs (2d550fa)

Performance

main: Command line tool supports a new input: filtered_feature_bc_matrix.h5 (73bc13e)

setup: Add an error raise statement (f4fb1a8)

v0.4.3(Jun 15, 2022)
Fix

setup: Fix a bug to allow sampling a reasonable number of droplets (ef6f7e4)

main: Fix a bug in main to set default NN number (794ff17)

Documentation

main: Add scanpy as dependency (252a492)

Performance

main: Set a separate batchsize_infer parameter for inference (8727f04)

setup: Add an option of random sampling droplets to speed up calculation (ce042dd)

setup: Enable manipulation of large-scale empty droplets (15f1840)

v0.4.2(Jun 7, 2022)
Documentation

Update dependencies (784ea63)

Update dependencies (cbf1fc6)

Change background of logo (de267ed)

Update readme (e97dbf1)

Modify scAR_logo (1f6e890)

Update logo (18b51e7)

Performance

Add a setup_anndata method (#54) (923b1e5)

Change sparsity to 1 for scCRISPR-seq and cell indexing (d4b2c3d)

v0.4.1(May 19, 2022)
What's Changed

Feature

inference: add a round_to_int parameter to round the counts (float) for easy interpretation and better integration into other methods (#47) (902a2b9) (8694239)

Build

setup: replace setup.py with setup.cfg and pyproject.toml (#51) (3dc999a)

Chore

unittest: refactor unittest (#51) (a597c5f)

main: refactor device (#51) (d807404)

Documentation

readthedocs: add scAR_logo image (#51) (c34f362)

tutorials: add ci=None to speed up plotting (#51) (902a2b9)

Contributor

@CaibinSh and @mdmanurung

Full Changelog: https://github.com/Novartis/scar/compare/v0.4.0...v0.4.1
v0.4.0(May 5, 2022)
Feature

scar.model: Addition of a sparsity parameter (#44) (0c30046)

scar.main: Introduce a sparsity parameter (cd33fdd)

Documentation

Modify Changelog.md (deb920c)

v0.3.5(May 3, 2022)
Documentation

Delete API.rst (497b080)

Update documentations (5ad9986)

Update documentations (11fa2b8)

v0.3.4(May 1, 2022)

fix a bug in setup:

importing scar modules in setup causes problems, so the version is read again with exec(open("scar/main/version.py").read())
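The pattern referred to here, reading the version by executing the version file instead of importing the package, can be sketched as follows. This is a generic illustration, not the project's actual setup code: it assumes the version file defines a variable named `__version__`, and the helper `read_version` is hypothetical.

```python
from pathlib import Path


def read_version(version_file):
    """Execute a version file in an isolated namespace and return __version__."""
    namespace = {}
    # exec runs only this one file, so the package itself (and its
    # dependencies) never has to be importable at build time.
    exec(Path(version_file).read_text(), namespace)
    return namespace["__version__"]
```

Executing into a separate dict keeps the names from the version file out of the setup script's own globals, which is a small tidiness gain over a bare `exec(open(...).read())`.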

v0.3.3(May 1, 2022)
Fix

*: Changelog (b9171a3)

*: Changelog (44a4409)

Documentation

Autodoc command line interface (0efae6c)

v0.3.2(Apr 29, 2022)

v0.3.1(Apr 29, 2022)
Fix

*: Addition of semantic releasing (6e83c3d)

v0.3.0(Apr 27, 2022)
What's Changed

Implementation of readthedocs documentation. Tutorials, installations and API are available.

Code refactoring

Renaming module names, e.g. changing "scAR" -> "scar"

Renaming parameter names, e.g.

"scRNAseq_tech" -> "feature_type"
"model" -> "count_model"
"empty_profile" -> "ambient_profile"
...

Black and Pylint re-formatting the code

New release by @CaibinSh in https://github.com/Novartis/scAR/pull/26

Contributor

@CaibinSh @fgypas @mr-nvs @Tobias-Ternent

Full Changelog: https://github.com/Novartis/scAR/compare/v0.2.3...v0.3.0
v0.2.3(Apr 20, 2022)
Add integration test

Black formatting

Bump version to 0.2.3

Contributors: @fgypas , @mr-nvs and @CaibinSh

What's Changed

Develop by @CaibinSh in https://github.com/Novartis/scAR/pull/19

Full Changelog: https://github.com/Novartis/scAR/compare/v0.2.2...v0.2.3
v0.2.2(Apr 4, 2022)
v0.2.2

Remove torchaudio

Add test data for integration tests

Bump version to 0.2.2

Contributors: @CaibinSh @fgypas

What's Changed

Remove torchaudio, add test data and bump version to 0.2.2 by @fgypas in https://github.com/Novartis/scAR/pull/15

Full Changelog: https://github.com/Novartis/scAR/compare/v0.2.1-beta...v0.2.2
v0.2.1-beta(Apr 1, 2022)
fix a typo in scAR-gpu.yml

reorganise __init__.py files

Contributor: @CaibinSh

What's Changed

Develop by @CaibinSh in https://github.com/Novartis/scAR/pull/12

Full Changelog: https://github.com/Novartis/scAR/compare/v0.2.0-beta...v0.2.1-beta
v0.2.0-beta(Apr 1, 2022)
Support for training of the model with CPUs

Addition of two yaml files for CPU/GPU installation

Refactor of setup.py and structure of the package

Addition of tests with pytest

Addition of lint checks

Automate build with github actions (install package and run lint checks and pytest)

Update documentation

Version 0.2.0

Co-authored-by: @CaibinSh @mr-nvs @Tobias-Ternent @fgypas

What's Changed

0.2.0-release by @fgypas in https://github.com/Novartis/scAR/pull/11

Full Changelog: https://github.com/Novartis/scAR/commits/v0.2.0-beta

Owner

GitHub Repository https://doi.org/10.1101/2022.01.14.476312
