Data Augmentation with Variational Autoencoders

Overview




Pyraug

This library provides a reliable way to perform data augmentation with Variational Autoencoders, even in challenging settings such as high-dimensional, low-sample-size data.

Installation

To install the library from pypi.org run the following using pip

$ pip install pyraug

or alternatively you can clone the GitHub repo to access the tests, tutorials and scripts.

$ git clone https://github.com/clementchadebec/pyraug.git

and install the library

$ cd pyraug
$ pip install .

Augmenting your Data

In Pyraug, a typical augmentation process is divided into 2 distinct parts:

  1. Train a model using Pyraug's TrainingPipeline or the provided scripts/training.py script
  2. Generate new data from a trained model using Pyraug's GenerationPipeline or the provided scripts/generation.py script

There are two straightforward ways to augment your data with Pyraug's built-in functions: the pipelines and the provided scripts.

Using Pyraug's Pipelines

Pyraug provides two pipelines that may be used to either train a model on your own data or generate new data with a pretrained model.

note: These pipelines are independent of the choice of the model and sampler. Hence, they can be used even if you want to access more advanced features such as defining your own autoencoding architecture.

Launching a model training

To launch a model training, you only need to call a TrainingPipeline instance. In its most basic version, the TrainingPipeline can be built without any arguments. By default, this trains an RHVAE model with the default autoencoding architecture and parameters.

>>> from pyraug.pipelines import TrainingPipeline
>>> pipeline = TrainingPipeline()
>>> pipeline(train_data=dataset_to_augment)

where dataset_to_augment is either a numpy.ndarray, a torch.Tensor or a path to a folder in which each file is one data sample (handled data formats are .pt, .nii, .nii.gz, .bmp, .jpg, .jpeg, .png).

More generally, you can instantiate your own model and train it with the TrainingPipeline. For instance, to instantiate a basic RHVAE, run:

>>> from pyraug.models import RHVAE
>>> from pyraug.models.rhvae import RHVAEConfig
>>> model_config = RHVAEConfig(
...    input_dim=int(input_dim)
... ) # input_dim is the size of a flattened input sample,
...   # needed if you did not provide your own architectures
>>> model = RHVAE(model_config)

If you instantiate a model yourself as shown above and do not provide all the network architectures (encoder, decoder and metric if applicable), the ModelConfig instance expects the input dimension of your data, which equals n_channels x height x width x .... This is because the networks of Pyraug's VAE models default to Multi Layer Perceptrons that automatically adapt to the input data shape.
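
As a small illustration of the product above, the flattened input dimension can be computed from the shape of a single sample. This is only a sketch: flattened_dim is a hypothetical helper that is not part of Pyraug, and the 1 x 28 x 28 shape is an assumed example.

```python
import math


def flattened_dim(sample_shape):
    """Return n_channels x height x width x ... for one sample (hypothetical helper)."""
    return math.prod(sample_shape)


# Assumed example: a single-channel 28 x 28 image.
input_dim = flattened_dim((1, 28, 28))
print(input_dim)  # 784
```

The resulting integer is the kind of value that would be passed as input_dim to RHVAEConfig.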

note: If your data samples have different sizes, Pyraug will reshape them to the minimum size min_n_channels x min_height x min_width x ...

Then the TrainingPipeline can be launched by running:

>>> from pyraug.pipelines import TrainingPipeline
>>> pipe = TrainingPipeline(model=model)
>>> pipe(train_data=dataset_to_augment)

At the end of training, the model weights (models.pt) and the model configuration (model_config.json) are saved in the folder outputs/my_model/training_YYYY-MM-DD_hh-mm-ss/final_model.
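
Because each run is written to its own time-stamped folder, it can be handy to look up the most recent one programmatically. The sketch below assumes the folder layout described above (training_YYYY-MM-DD_hh-mm-ss/final_model) and relies on the fact that such zero-padded timestamps sort chronologically by name. latest_final_model is a hypothetical helper, not a Pyraug function.

```python
from pathlib import Path


def latest_final_model(output_dir):
    """Return the final_model folder of the most recent training run, or None.

    Hypothetical helper: assumes runs are stored as
    <output_dir>/training_YYYY-MM-DD_hh-mm-ss/final_model.
    """
    runs = sorted(
        run for run in Path(output_dir).glob("training_*")
        if (run / "final_model").is_dir()
    )
    return runs[-1] / "final_model" if runs else None
```

For example, latest_final_model("outputs/my_model") would return the final_model folder of the latest run, which can then be passed to RHVAE.load_from_folder.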

Important: For high-dimensional data, we advise you to provide your own network architectures and potentially adapt the training and model parameters. See the documentation for more details.

Launching data generation

To launch the data generation process from a trained model, run the following.

>>> from pyraug.pipelines import GenerationPipeline
>>> from pyraug.models import RHVAE
>>> model = RHVAE.load_from_folder('path/to/your/trained/model') # reload the model
>>> pipe = GenerationPipeline(model=model) # define pipeline
>>> pipe(samples_number=10) # This will generate 10 data points

The generated data is saved as .pt files in dummy_output_dir/generation_YYYY-MM-DD_hh-mm-ss. By default, each file stores a batch of at most 500 samples.
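
To get a feel for how many files a given samples_number may produce, here is a rough sketch of the arithmetic. It assumes that files are filled up to the 500-sample maximum and that the last file holds the remainder. The text above only states the maximum, so the exact split used by Pyraug may differ. file_sizes is a hypothetical helper.

```python
def file_sizes(samples_number, max_per_file=500):
    """Split a sample count into per-file counts capped at max_per_file.

    Assumption: files are filled greedily and the last one holds the remainder.
    """
    full, rest = divmod(samples_number, max_per_file)
    return [max_per_file] * full + ([rest] if rest else [])


print(file_sizes(10))    # [10]
print(file_sizes(1200))  # [500, 500, 200]
```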

Retrieve generated data

The generated data can then be loaded by running:

>>> import torch
>>> data = torch.load('path/to/generated_data.pt')
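
When a generation run produces several files, they first need to be collected. A minimal sketch, assuming all generated files sit directly in the generation folder and use the .pt extension (generated_files is a hypothetical helper, not part of Pyraug):

```python
from pathlib import Path


def generated_files(generation_dir):
    """Return the .pt files found directly in a generation folder, sorted by name."""
    return sorted(Path(generation_dir).glob("*.pt"))
```

Each returned path can then be passed to torch.load as shown above.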

Using the provided scripts

Pyraug provides two scripts that let you augment your data directly from the command line.

note: To access the predefined scripts, you should first clone Pyraug's repository. The scripts are located in the scripts folder. For the time being, the provided scripts only handle RHVAE training and generation. Other models will be added as they are implemented in pyraug.models.

Launching a model training

To launch a model training, run

$ python scripts/training.py --path_to_train_data "path/to/your/data/folder" 

The data must be located in path/to/your/data/folder, where each file is one input sample. Handled file types are .pt, .nii, .nii.gz, .bmp, .jpg, .jpeg, .png. Other types will be added progressively, depending on usage.
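
Before launching a training run, it may help to check that every file in the data folder has one of the handled extensions. The following is only a sketch: unsupported_files is a hypothetical helper, and it matches extensions exactly as listed above (case-sensitive), which may be stricter or looser than what Pyraug itself accepts.

```python
from pathlib import Path

# Extensions listed above as handled by the scripts.
HANDLED_EXTENSIONS = (".pt", ".nii", ".nii.gz", ".bmp", ".jpg", ".jpeg", ".png")


def unsupported_files(data_dir):
    """Return files in data_dir whose name does not end with a handled extension."""
    return sorted(
        path for path in Path(data_dir).iterdir()
        if path.is_file() and not path.name.endswith(HANDLED_EXTENSIONS)
    )
```

An empty list means every file in the folder matches one of the listed types.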

At the end of training, the model weights (models.pt) and the model configuration (model_config.json) are saved in the folder outputs/my_model_from_script/training_YYYY-MM-DD_hh-mm-ss/final_model.

Launching data generation

Then, to launch the data generation process from a trained model, you only need to run

$ python scripts/generation.py --num_samples 10 --path_to_model_folder 'path/to/your/trained/model/folder' 

The generated data is stored in several .pt files in outputs/my_generated_data_from_script/generation_YYYY-MM-DD_hh_mm_ss. By default, each file stores a batch of 500 samples.

Important: In the simplest case, the scripts use default configurations. You can easily override them as explained in the documentation. See the tutorials for a more in-depth example.

Retrieve generated data

The generated data can then be loaded by running:

>>> import torch
>>> data = torch.load('path/to/generated_data.pt')

Getting your hands on the code

To help you understand how Pyraug works and how you can augment your data with this library, we also provide tutorials, which can be found in the examples folder.

Dealing with issues

If you experience any issues while running the code, or want to request new features, please open an issue on GitHub.

Citing

If you use this library please consider citing us:

@article{chadebec_data_2021,
	title = {Data {Augmentation} in {High} {Dimensional} {Low} {Sample} {Size} {Setting} {Using} a {Geometry}-{Based} {Variational} {Autoencoder}},
	copyright = {All rights reserved},
	journal = {arXiv preprint arXiv:2105.00026},
  	arxiv = {2105.00026},
	author = {Chadebec, Clément and Thibeau-Sutre, Elina and Burgos, Ninon and Allassonnière, Stéphanie},
	year = {2021}
}

Credits

Logo: SaulLu
