[ACM MM 2021] Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Last update: Nov 09, 2022


Installation

pip install -r requirements.txt

Dataset Preparation

Given a dataset, list the image paths in a folder named after the dataset, using the following folder structure.

    flist/dataset_name
        ├── train.flist    # paths of training images
        ├── valid.flist    # paths of validation images
        └── test.flist     # paths of testing images
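
The file lists above can be generated with a short script. The following is a minimal sketch, assuming each `.flist` file is plain text with one image path per line; the function name `write_flists`, the split fractions, and the accepted extensions are illustrative choices, not part of this repository.

```python
import random
from pathlib import Path

# Extensions treated as images (an assumption; adjust to your dataset).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def write_flists(image_dir, out_dir, valid_frac=0.05, test_frac=0.05, seed=0):
    """Split the images under image_dir into train/valid/test file lists.

    Writes train.flist, valid.flist and test.flist into out_dir, one
    absolute image path per line, and returns the size of each split.
    """
    paths = sorted(
        str(p.resolve())
        for p in Path(image_dir).rglob("*")
        if p.suffix.lower() in IMAGE_EXTS
    )
    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(paths)

    n_valid = int(len(paths) * valid_frac)
    n_test = int(len(paths) * test_frac)
    splits = {
        "valid": paths[:n_valid],
        "test": paths[n_valid:n_valid + n_test],
        "train": paths[n_valid + n_test:],
    }

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        text = "\n".join(items) + ("\n" if items else "")
        (out / f"{name}.flist").write_text(text)
    return {name: len(items) for name, items in splits.items()}
```

For example, `write_flists("/data/celebahq", "flist/celebahq")` would create the three lists under `flist/celebahq` (both paths are placeholders).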

In this work, we use CelebA-HQ (download available here), Places2 (download available here), and Paris StreetView (the authors' permission is required to download).

ImageNet K-means Cluster: The kmeans_centers.npy file is downloaded from image-gpt; it is used to quantize the low-resolution images.
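
To illustrate what quantizing with cluster centers means, here is a minimal sketch of nearest-center color quantization. It assumes `kmeans_centers.npy` holds a `(K, 3)` array of RGB centers in the same value range as the input image; the helper names `quantize` and `dequantize` are hypothetical, and the repository's own preprocessing may differ.

```python
import numpy as np


def quantize(image, centers):
    """Map each pixel of an (H, W, 3) image to the index of its nearest center."""
    pixels = image.reshape(-1, 1, 3).astype(np.float32)
    palette = centers[None, :, :].astype(np.float32)
    # Squared Euclidean distance from every pixel to every center: (H*W, K).
    dists = ((pixels - palette) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).reshape(image.shape[:2])


def dequantize(indices, centers):
    """Turn an (H, W) index map back into an (H, W, 3) image of center colors."""
    return centers[indices]


# Usage (assumes the file is in the working directory and that low_res
# is an (H, W, 3) array scaled to the same range as the centers):
# centers = np.load("kmeans_centers.npy")
# tokens = quantize(low_res, centers)
```

Each low-resolution image then becomes a grid of integer tokens, which is the kind of discrete sequence a transformer can model.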

Testing with Pre-trained Models

Download pre-trained models:

CelebA-HQ: BAT ; Upsampler
Places2: BAT ; Upsampler
Paris-StreetView: BAT ; Upsampler

Put the pre-trained model under the checkpoints folder, e.g.

    checkpoints
        ├── celebahq_bat_pretrain
            ├── latest_net_G.pth

Prepare the input images and masks to test.

python bat_sample.py --num_sample [1] --tran_model [bat name] --up_model [upsampler name] --input_dir [dir of input] --mask_dir [dir of mask] --save_dir [dir to save results]
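
When sampling over several folders, it can help to assemble the command above programmatically. The sketch below only uses the flags shown in the command; the function names and all example values are placeholders, and nothing is executed until `run_sampling` is called.

```python
import subprocess


def build_sample_command(tran_model, up_model, input_dir, mask_dir,
                         save_dir, num_sample=1):
    """Return the bat_sample.py invocation as an argument list."""
    return [
        "python", "bat_sample.py",
        "--num_sample", str(num_sample),
        "--tran_model", tran_model,
        "--up_model", up_model,
        "--input_dir", input_dir,
        "--mask_dir", mask_dir,
        "--save_dir", save_dir,
    ]


def run_sampling(**kwargs):
    """Run bat_sample.py from the repository root and fail on a non-zero exit."""
    subprocess.run(build_sample_command(**kwargs), check=True)


# Example (the upsampler name and directories are hypothetical):
# run_sampling(tran_model="celebahq_bat_pretrain", up_model="celebahq_up_pretrain",
#              input_dir="./inputs", mask_dir="./masks", save_dir="./results",
#              num_sample=3)
```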

Training New Models

Pretrained VGG model: download it from here and move it to models/. This model is used to calculate the training loss for the upsampler.

New models can be trained with the following commands.

Prepare the dataset. Use the --dataroot option to locate the directory of file lists, e.g. ./flist, and specify the dataset to train on with the --dataset_name option. Set the mask type and mask ratio using the --mask_type and --pconv_level options.
Train the transformer.

# Specify your own dataset or settings in the bash file.
bash train_bat.sh

Please note that some of the transformer settings are defined in train_bat.py instead of options/. This script uses every available GPU for training, so select the GPUs via CUDA_VISIBLE_DEVICES rather than --gpu_ids, which is used for the upsampler.

Train the upsampler.

# Specify your own dataset or settings in the bash file.
bash train_up.sh

The upsampler is typically trained on the low-resolution ground truth. We find that using some samples from the trained BAT may help improve performance, i.e. PSNR and SSIM. However, the sampling process is quite time-consuming, and training with the ground truth alone can also yield reasonable results.
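
Because the transformer training script takes every GPU it can see, restricting it means setting CUDA_VISIBLE_DEVICES before launch. The following is a minimal sketch of doing that from Python; `env_with_gpus` and `launch_bat_training` are hypothetical helper names, and the GPU indices are examples.

```python
import os
import subprocess


def env_with_gpus(gpu_ids, base_env=None):
    """Copy an environment and expose only the given GPU indices to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch_bat_training(gpu_ids):
    """Run train_bat.sh from the repository root with a restricted GPU set."""
    subprocess.run(["bash", "train_bat.sh"], env=env_with_gpus(gpu_ids), check=True)


# Example: train the transformer on GPUs 0 and 1 only.
# launch_bat_training([0, 1])
```

The shell equivalent is to prefix the command, as in `CUDA_VISIBLE_DEVICES=0,1 bash train_bat.sh`.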

Citation

If you find this code helpful for your research, please cite our paper.

@inproceedings{yu2021diverse,
  title={Diverse Image Inpainting with Bidirectional and Autoregressive Transformers},
  author={Yu, Yingchen and Zhan, Fangneng and Wu, Rongliang and Pan, Jianxiong and Cui, Kaiwen and Lu, Shijian and Ma, Feiying and Xie, Xuansong and Miao, Chunyan},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}

Acknowledgments

This code borrows heavily from SPADE and minGPT. We appreciate the authors for sharing their code.


Owner

Yingchen Yu
