High-Fidelity Pluralistic Image Completion with Transformers (ICCV 2021)

Overview

Image Completion Transformer (ICT)

Project Page | Paper (ArXiv) | Pre-trained Models | Supplemental Material

This repository is the official pytorch implementation of our ICCV 2021 paper, High-Fidelity Pluralistic Image Completion with Transformers.

Ziyu Wan1, Jingbo Zhang1, Dongdong Chen2, Jing Liao1
1City University of Hong Kong, 2Microsoft Cloud AI

🎈 Prerequisites

  • Python >=3.6
  • PyTorch >=1.6
  • NVIDIA GPU + CUDA cuDNN
pip install -r requirements.txt

To directly inference, first download the pretrained models from Dropbox, then

cd ICT
wget -O ckpts_ICT.zip https://www.dropbox.com/s/cqjgcj0serkbdxd/ckpts_ICT.zip?dl=1
unzip ckpts_ICT.zip

Some tips:

  • Masks should be binarized.
  • The extensions of images and masks should be .png.
  • The model is trained for 256x256 input resolution only.
  • Make sure that the downsampled (32x32 or 48x48) mask could cover all the regions you want to fill. If not, dilate the mask.

🌟 Pipeline

Why transformer?

Compared with traditional CNN-based methods, transformers have better capability in understanding shape and geometry.

🚀 Training

1) Transformer

cd Transformer
python main.py --name [exp_name] --ckpt_path [save_path] \
               --data_path [training_image_path] \
               --validation_path [validation_image_path] \
               --mask_path [mask_path] \
               --BERT --batch_size 64 --train_epoch 100 \
               --nodes 1 --gpus 8 --node_rank 0 \
               --n_layer [transformer_layer #] --n_embd [embedding_dimension] \
               --n_head [head #] --ImageNet --GELU_2 \
               --image_size [input_resolution]

Notes of transformer:

  • --AMP: Reduce the memory cost while training, but sometimes will lead to NAN.
  • --use_ImageFolder: Enable this option while training on ImageNet
  • --random_stroke: Generate the mask on-the-fly.
  • Our code is also ready for training on multiple machines.

2) Guided Upsampling

cd Guided_Upsample
python train.py --model 2 --checkpoints [save_path] \
                --config_file ./config_list/config_template.yml \
                --Generator 4 --use_degradation_2

Notes of guided upsampling:

  • --use_degradation_2: Bilinear downsampling. Try to match the transformer training.
  • --prior_random_degree: Stochastically deviate the sequence elements by K nearest neighbour.
  • Modify the provided config template according to your own training environments.
  • Training the upsample part won't cost many GPUs.

Inference

We provide very covenient and neat script for inference.

python run.py --input_image [test_image_folder] \
              --input_mask [test_mask_folder] \
              --sample_num 1  --save_place [save_path] \
              --ImageNet --visualize_all

Notes of inference:

  • --sample_num: How many completion results do you want?
  • --visualize_all: You could save each output result via disabling this option.
  • --ImageNet --FFHQ --Places2_Nature: You must enable one option to select corresponding ckpts.
  • Please use absolute path.

More results

FFHQ

Places2

ImageNet

To Do

  • Release training code
  • Release testing code
  • Release pre-trained models
  • Add Google Colab

📔 Citation

If you find our work useful for your research, please consider citing the following papers :)

@article{wan2021high,
  title={High-Fidelity Pluralistic Image Completion with Transformers},
  author={Wan, Ziyu and Zhang, Jingbo and Chen, Dongdong and Liao, Jing},
  journal={arXiv preprint arXiv:2103.14031},
  year={2021}
}

The real-world application of image inpainting is also ready! Try and cite our old photo restoration algorithm here.

@inproceedings{wan2020bringing,
title={Bringing Old Photos Back to Life},
author={Wan, Ziyu and Zhang, Bo and Chen, Dongdong and Zhang, Pan and Chen, Dong and Liao, Jing and Wen, Fang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={2747--2757},
year={2020}
}

💡 Acknowledgments

This repo is built upon minGPT and Edge-Connect. We also thank the provided cluster centers from OpenAI.

📨 Contact

This repo is currently maintained by Ziyu Wan (@Raywzy) and is for academic research use only. Discussions and questions are welcome via [email protected].

Owner
Ziyu Wan
Ph.D Student @ City University of Hong Kong
Ziyu Wan
“英特尔创新大师杯”深度学习挑战赛 赛道3:CCKS2021中文NLP地址相关性任务

ccks2021-track3 CCKS2021中文NLP地址相关性任务-赛道三-冠军方案 团队:我的加菲鱼- wodejiafeiyu 初赛第二/复赛第一/决赛第一 前言 19年开始,陆陆续续参加了一些比赛,拿到过一些top,比较懒一直都没分享过,这次比较幸运又拿了top1,打算分享下 分类的任务

shaochenjie 131 Dec 31, 2022
Predict multi paths to a moving person depending on his trajectory history.

Multi-future Trajectory Prediction The project is about using the Multiverse model to make possible multible-future trajectory prediction for a seen p

Said Gamal 1 Jan 18, 2022
Implementation of ViViT: A Video Vision Transformer

ViViT: A Video Vision Transformer Unofficial implementation of ViViT: A Video Vision Transformer. Notes: This is in WIP. Model 2 is implemented, Model

Rishikesh (ऋषिकेश) 297 Jan 06, 2023
This is a simple plugin for Vim that allows you to use OpenAI Codex.

🤖 Vim Codex An AI plugin that does the work for you. This is a simple plugin for Vim that will allow you to use OpenAI Codex. To use this plugin you

Tom Dörr 195 Dec 28, 2022
BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

BOVText: A Large-Scale, Bilingual Open World Dataset for Video Text Spotting Updated on December 10, 2021 (Release all dataset(2021 videos)) Updated o

weijiawu 47 Dec 26, 2022
A Python package for time series augmentation

tsaug tsaug is a Python package for time series augmentation. It offers a set of augmentation methods for time series, as well as a simple API to conn

Arundo Analytics 278 Jan 01, 2023
A multi-mode modulator for multi-domain few-shot classification (ICCV)

A multi-mode modulator for multi-domain few-shot classification (ICCV)

Yanbin Liu 8 Apr 28, 2022
(CVPR2021) ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

ClassSR (CVPR2021) ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic Paper Authors: Xiangtao Kong, Hengyuan

Xiangtao Kong 308 Jan 05, 2023
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

63 Aug 11, 2022
This codebase is the official implementation of Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization (NeurIPS2021, Spotlight)

Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization This codebase is the official implementation of Test-Time Classifier A

47 Dec 28, 2022
Pytorch implementations of the paper Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients

LSF-SAC Pytorch implementations of the paper Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy G

Hanhan 2 Aug 14, 2022
Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

Fangjinhua Wang 127 Jan 04, 2023
the official implementation of the paper "Isometric Multi-Shape Matching" (CVPR 2021)

Isometric Multi-Shape Matching (IsoMuSh) Paper-CVF | Paper-arXiv | Video | Code Citation If you find our work useful in your research, please consider

Maolin Gao 9 Jul 17, 2022
Code for the ICML 2021 paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

ViLT Code for the paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision" Install pip install -r requirements.txt pip

Wonjae Kim 922 Jan 01, 2023
Official code for 'Pixel-wise Energy-biased Abstention Learning for Anomaly Segmentationon Complex Urban Driving Scenes'

PEBAL This repo contains the Pytorch implementation of our paper: Pixel-wise Energy-biased Abstention Learning for Anomaly Segmentationon Complex Urba

Yu Tian 115 Dec 29, 2022
Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation

DynaBOA Code repositoty for the paper: Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation Shanyan Guan, Jingwei Xu, Michell

197 Jan 07, 2023
Code examples and benchmarks from the paper "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective"

Code For the Paper "Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective" Author: Robert Bamler Date: 22 D

4 Nov 02, 2022
Official pytorch implementation of Rainbow Memory (CVPR 2021)

Rainbow Memory: Continual Learning with a Memory of Diverse Samples

Clova AI Research 91 Dec 17, 2022
Repository features UNet inspired architecture used for segmenting lungs on chest X-Ray images

Lung Segmentation (2D) Repository features UNet inspired architecture used for segmenting lungs on chest X-Ray images. Demo See the application of the

163 Sep 21, 2022
This was initially the repo for the project of [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media"

Subreddit Analysis This repo includes tools for Subreddit analysis, originally developed for our class project of PSYC 626 in USC, titled "Powered by

Georgios Chochlakis 1 Dec 17, 2021