CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation

Overview

CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation (CVPR 2021, oral presentation)

teaser

CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation
CVPR 2021, oral presentation
Xingran Zhou, Bo Zhang, Ting Zhang, Pan Zhang, Jianmin Bao, Dong Chen, Zhongfei Zhang, Fang Wen

Paper | Slides

Abstract

We present the full-resolution correspondence learning for cross-domain images, which aids image translation. We adopt a hierarchical strategy that uses the correspondence from coarse level to guide the fine levels. At each hierarchy, the correspondence can be efficiently computed via PatchMatch that iteratively leverages the matchings from the neighborhood. Within each PatchMatch iteration, the ConvGRU module is employed to refine the current correspondence considering not only the matchings of larger context but also the historic estimates. The proposed CoCosNet v2, a GRU-assisted PatchMatch approach, is fully differentiable and highly efficient. When jointly trained with image translation, full-resolution semantic correspondence can be established in an unsupervised manner, which in turn facilitates the exemplar-based image translation. Experiments on diverse translation tasks show that CoCosNet v2 performs considerably better than state-of-the-art literature on producing high-resolution images.

Installation

First please install dependencies for the experiment:

pip install -r requirements.txt

We recommend to install Pytorch version after Pytorch 1.6.0 since we made use of automatic mixed precision for accelerating. (we used Pytorch 1.7.0 in our experiments)

Prepare the dataset

First download the Deepfashion dataset (high resolution version) from this link. Note the file name is img_highres.zip. Unzip the file and rename it as img.
If the password is necessary, please contact this link to access the dataset.
We use OpenPose to estimate pose of DeepFashion(HD). We offer the keypoints detection results used in our experiment in this link. Download and unzip the results file.
Since the original resolution of DeepfashionHD is 750x1101, we use a Python script to process the images to the resolution 512x512. You can find the script in data/preprocess.py. Note you need to download our train-val split lists train.txt and val.txt from this link in this step.
Download the train-val lists from this link, and the retrival pair lists from this link. Note train.txt and val.txt are our train-val lists. deepfashion_ref.txt, deepfashion_ref_test.txt and deepfashion_self_pair.txt are the paring lists used in our experiment. Download them all and move below the folder data/.
Finally create the root folder deepfashionHD, and move the folders img and pose below it. Now the the directory structure is like:

deepfashionHD
│
└─── img
│   │
│   └─── MEN
│   │   │   ...
│   │
│   └─── WOMEN
│       │   ...
│   
└─── pose
│   │
│   └─── MEN
│   │   │   ...
│   │
│   └─── WOMEN
│       │   ...

Inference Using Pretrained Model

The inference results are saved in the folder checkpoints/deepfashionHD/test. Download the pretrained model from this link.
Move the models below the folder checkpoints/deepfashionHD. Then run the following command.

python test.py --name deepfashionHD --dataset_mode deepfashionHD --dataroot dataset/deepfashionHD --PONO --PONO_C --no_flip --batchSize 8 --gpu_ids 0 --netCorr NoVGGHPM --nThreads 16 --nef 32 --amp --display_winsize 512 --iteration_count 5 --load_size 512 --crop_size 512

The inference results are saved in the folder checkpoints/deepfashionHD/test.

Training from scratch

Make sure you have prepared the DeepfashionHD dataset as the instruction.
Download the pretrained VGG model from this link, move it to vgg/ folder. We use this model to calculate training loss.

Run the following command for training from scratch.

python train.py --name deepfashionHD --dataset_mode deepfashionHD --dataroot dataset/deepfashionHD --niter 100 --niter_decay 0 --real_reference_probability 0.0 --hard_reference_probability 0.0 --which_perceptual 4_2 --weight_perceptual 0.001 --PONO --PONO_C --vgg_normal_correct --weight_fm_ratio 1.0 --no_flip --video_like --batchSize 16 --gpu_ids 0,1,2,3,4,5,6,7 --netCorr NoVGGHPM --match_kernel 1 --featEnc_kernel 3 --display_freq 500 --print_freq 50 --save_latest_freq 2500 --save_epoch_freq 5 --nThreads 16 --weight_warp_self 500.0 --lr 0.0001 --nef 32 --amp --weight_warp_cycle 1.0 --display_winsize 512 --iteration_count 5 --temperature 0.01 --continue_train --load_size 550 --crop_size 512 --which_epoch 15

Note that --dataroot parameter is your DeepFashionHD dataset root, e.g. dataset/DeepFashionHD.
We use 8 32GB Tesla V100 GPUs to train the network. You can set batchSize to 16, 8 or 4 with fewer GPUs and change gpu_ids.

Citation

If you use this code for your research, please cite our papers.

@inproceedings{zhou2021full,
  title={CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation},
  author={Zhou, Xingran and Zhang, Bo and Zhang, Ting and Zhang, Pan and Bao, Jianmin and Chen, Dong and Zhang, Zhongfei and Wen, Fang},
  booktitle={CVPR},
  year={2021}
}

Acknowledgments

This code borrows heavily from CocosNet and DeepPruner. We also thank SPADE and RAFT.

License

The codes and the pretrained model in this repository are under the MIT license as specified by the LICENSE file.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
Dense Gaussian Processes for Few-Shot Segmentation

DGPNet - Dense Gaussian Processes for Few-Shot Segmentation Welcome to the public repository for DGPNet. The paper is available at arxiv: https://arxi

37 Jan 07, 2023
Tackling Obstacle Tower Challenge using PPO & A2C combined with ICM.

Obstacle Tower Challenge using Deep Reinforcement Learning Unity Obstacle Tower is a challenging realistic 3D, third person perspective and procedural

Zhuoyu Feng 5 Feb 10, 2022
Retrieve and analysis data from SDSS (Sloan Digital Sky Survey)

Author: Behrouz Safari License: MIT sdss A python package for retrieving and analysing data from SDSS (Sloan Digital Sky Survey) Installation Install

Behrouz 3 Oct 28, 2022
PyTorch original implementation of Cross-lingual Language Model Pretraining.

XLM NEW: Added XLM-R model. PyTorch original implementation of Cross-lingual Language Model Pretraining. Includes: Monolingual language model pretrain

Facebook Research 2.7k Dec 27, 2022
Collision risk estimation using stochastic motion models

collision_risk_estimation Collision risk estimation using stochastic motion models. This is a new approach, based on stochastic models, to predict the

Unmesh 7 Jun 26, 2022
⚓ Eurybia monitor model drift over time and securize model deployment with data validation

View Demo · Documentation · Medium article 🔍 Overview Eurybia is a Python library which aims to help in : Detecting data drift and model drift Valida

MAIF 172 Dec 27, 2022
UIUCTF 2021 Public Challenge Repository

UIUCTF-2021-Public UIUCTF 2021 Public Challenge Repository Notes: every challenge folder contains a challenge.yml file in the format for ctfcli, CTFd'

SIGPwny 15 Nov 03, 2022
R-package accompanying the paper "Dynamic Factor Model for Functional Time Series: Identification, Estimation, and Prediction"

dffm The goal of dffm is to provide functionality to apply the methods developed in the paper “Dynamic Factor Model for Functional Time Series: Identi

Sven Otto 3 Dec 09, 2022
Bayesian Optimization Library for Medical Image Segmentation.

bayesmedaug: Bayesian Optimization Library for Medical Image Segmentation. bayesmedaug optimizes your data augmentation hyperparameters for medical im

Şafak Bilici 7 Feb 10, 2022
Code release for ConvNeXt model

A ConvNet for the 2020s Official PyTorch implementation of ConvNeXt, from the following paper: A ConvNet for the 2020s. arXiv 2022. Zhuang Liu, Hanzi

Meta Research 4.6k Jan 08, 2023
Reinforcement learning models in ViZDoom environment

DoomNet DoomNet is a ViZDoom agent trained by reinforcement learning. The agent is a neural network that outputs a probability of actions given only p

Andrey Kolishchak 126 Dec 09, 2022
Understanding the Effects of Datasets Characteristics on Offline Reinforcement Learning

Understanding the Effects of Datasets Characteristics on Offline Reinforcement Learning Kajetan Schweighofer1, Markus Hofmarcher1, Marius-Constantin D

Institute for Machine Learning, Johannes Kepler University Linz 17 Dec 28, 2022
Attention-driven Robot Manipulation (ARM) which includes Q-attention

Attention-driven Robotic Manipulation (ARM) This codebase is home to: Q-attention: Enabling Efficient Learning for Vision-based Robotic Manipulation I

Stephen James 84 Dec 29, 2022
Codebase for Inducing Causal Structure for Interpretable Neural Networks

Interchange Intervention Training (IIT) Codebase for Inducing Causal Structure for Interpretable Neural Networks Release Notes 12/01/2021: Code and Pa

Zen 6 Oct 10, 2022
Neural Re-rendering for Full-frame Video Stabilization

NeRViS: Neural Re-rendering for Full-frame Video Stabilization Project Page | Video | Paper | Google Colab Setup Setup environment for [Yu and Ramamoo

Yu-Lun Liu 9 Jun 17, 2022
Keras community contributions

keras-contrib : Keras community contributions Keras-contrib is deprecated. Use TensorFlow Addons. The future of Keras-contrib: We're migrating to tens

Keras 1.6k Dec 21, 2022
Code for KDD'20 "Generative Pre-Training of Graph Neural Networks"

GPT-GNN: Generative Pre-Training of Graph Neural Networks GPT-GNN is a pre-training framework to initialize GNNs by generative pre-training. It can be

Ziniu Hu 346 Dec 19, 2022
An open software package to develop BCI based brain and cognitive computing technology for recognizing user's intention using deep learning

An open software package to develop BCI based brain and cognitive computing technology for recognizing user's intention using deep learning

deepbci 272 Jan 08, 2023
Crossover Learning for Fast Online Video Instance Segmentation (ICCV 2021)

TL;DR: CrossVIS (Crossover Learning for Fast Online Video Instance Segmentation) proposes a novel crossover learning paradigm to fully leverage rich c

Hust Visual Learning Team 79 Nov 25, 2022
Code repo for "Transformer on a Diet" paper

Transformer on a Diet Reference: C Wang, Z Ye, A Zhang, Z Zhang, A Smola. "Transformer on a Diet". arXiv preprint arXiv (2020). Installation pip insta

cgraywang 31 Sep 26, 2021