PyTorch implementation of Progressive Growing of GANs for Improved Quality, Stability, and Variation.

Overview

PyTorch implementation of Progressive Growing of GANs for Improved Quality, Stability, and Variation.

Warning: the master branch might collapse. To obtain results similar to those in the README, you can fall back to this commit, but remember that some ops were not correctly implemented under that commit. Besides, you'd better use a lower learning rate; 1e-4 should be fine.

How to create CelebA-HQ dataset

I borrowed h5tool.py from the official code. To create the CelebA-HQ dataset, download the original CelebA dataset and the additional deltas files from here. After that, run

python2 h5tool.py create_celeba_hq file_name_to_save /path/to/celeba_dataset/ /path/to/celeba_hq_deltas

This is what I used on my laptop:

python2 h5tool.py create_celeba_hq /Users/yuan/Downloads/CelebA-HQ /Users/yuan/Downloads/CelebA/Original\ CelebA/ /Users/yuan/Downloads/CelebA/CelebA-HQ-Deltas

I found that the MD5 checks always failed, so I just commented out the MD5 checking part (lines 568 and 589).
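To see whether a downloaded file is really corrupt before disabling the check, its MD5 digest can be computed by hand. This is a minimal sketch using Python's standard hashlib module. md5_of_file is a hypothetical helper, not part of h5tool.py, and the value to compare against is whatever h5tool.py expects for that file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Hypothetical helper for manual checking; not taken from h5tool.py.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large image archives do not
        # have to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file on a downloaded file and comparing the result with the expected digest shows whether the download or the check itself is at fault.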

With the default settings, it took 1 day on my server. You can specify num_threads and num_tasks for acceleration.

Training from scratch

You have to create the CelebA-HQ dataset first; please follow the instructions above.

To obtain results similar to those in the samples directory, see the train_no_tanh.py or train.py script for details (with default options). Both should work well. For example, you could run

conda create -n pytorch_p36 python=3.6 h5py matplotlib
source activate pytorch_p36
conda install pytorch torchvision -c pytorch
conda install scipy
pip install tensorflow

# 0 = first GPU, 1 = second GPU, 2 = third GPU, etc.
python train.py --gpu 0,1,2 --train_kimg 600 --transition_kimg 600 --beta1 0 --beta2 0.99 --gan lsgan --first_resol 4 --target_resol 256 --no_tanh

train_kimg means that after seeing train_kimg * 1000 real images, training switches to the fade in phase; likewise, transition_kimg means that after seeing transition_kimg * 1000 real images, it switches to the stabilize phase. Currently only LSGAN and GAN with the --no_noise option are supported. Since WGAN-GP is unavailable, the --drift option does not affect the result. --no_tanh means that tanh is not used at the generator's output layer.
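The phase switching described above can be sketched as a small pure-Python function. This is only an illustration under assumptions, not code from train.py: phase_at is a hypothetical name, and the schedule is assumed to be a stabilize phase at the first resolution, followed by a fade in phase and a stabilize phase at each doubled resolution.

```python
def phase_at(images_seen, train_kimg=600, transition_kimg=600,
             first_resol=4, target_resol=256):
    """Return (resolution, phase, alpha) after `images_seen` real images.

    Assumed schedule (not taken from train.py): stabilize at first_resol,
    then, for each doubled resolution, fade in followed by stabilize.
    alpha is the fade in progress in [0, 1]; it is 1.0 when stabilizing.
    """
    stabilize_len = train_kimg * 1000       # images before switching to fade in
    fade_len = transition_kimg * 1000       # images before switching to stabilize
    resol = first_resol
    remaining = images_seen

    # The first resolution only has a stabilize phase.
    if remaining < stabilize_len:
        return resol, "stabilize", 1.0
    remaining -= stabilize_len

    while resol < target_resol:
        resol *= 2
        if remaining < fade_len:
            return resol, "fade in", remaining / float(fade_len)
        remaining -= fade_len
        if remaining < stabilize_len:
            return resol, "stabilize", 1.0
        remaining -= stabilize_len

    # Past the end of the schedule: keep stabilizing at the target resolution.
    return target_resol, "stabilize", 1.0
```

With the values from the command above (--train_kimg 600 --transition_kimg 600), phase_at(900000) falls halfway through the 8x8 fade in phase under this assumed schedule.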

If you are a Python 2 user, you'd better add this to the top of train.py, since I use print('something...', file=f) to write experiment settings to a file.

from __future__ import print_function

Tensorboard

tensorboard --logdir='./logs'

Update history

  • Update(20171213): Updated data.py. Now, when fading in, real images are a weighted combination of current-resolution images and 0.5x-resolution images. This weighting trick is similar to the one used for the Generator's outputs or the Discriminator's inputs. It helps stabilize training when fading in.

  • Update(20171129): Added restoration mode. Besides, after many attempts, I failed to combine BEGAN and PG-GAN, so it has been removed from the repository.

  • Update(20171124): Now training with the CelebA-HQ dataset. Besides, I am still failing to introduce progressive growing to BEGAN, even with many modifications.

  • Update(20171121): Introduced progressive growing to BEGAN; see the train_began.py script. However, experiments showed that it does not work at this moment. Finding bugs and tuning the network structure...

  • Update(20171119): The instability came from the resize_activation function; after replacing repeat with torch.nn.functional.upsample, the problem was solved. I now believe that both train.py and train_no_tanh.py should be stable. Restored from the 128x128 stabilize phase and continued training; currently at 256x256, phase = fade in. Temporary results (the first 2 columns on the left were generated, and the other 2 columns were taken from the dataset):

  • Update(20171118): I made a mistake in the resize_activation function (repeat is not the right operation there). Although it was wrong, it was still effective at resolution<256, but training collapsed at resolution>=256. Changing it now; the scripts will be updated tomorrow. Sorry for this mistake.

  • Update(20171117): 128x128 fade in results (the first 2 columns on the left were generated, and the other 2 columns were taken from the dataset):

  • Update(20171116): Adding noise only to RGB images might still lead to collapse. Switching to the same trick as the paper suggested. Besides, the paper used a linear activation at G's output layer, which is reasonable, as I observed in the experiments. Temporary results: 64x64, phase = fade in; the left 4 columns are generated, and the right 4 columns are real samples. (When fading in, instability might occur. For example, the following results are not so promising, but they get better as training goes on.) Higher resolutions will be available soon.

  • Update(20171115): Mode collapse happened when fading in, debugging... => It turns out that instability seems to be normal when fading in; after some more iterations, it gets better. I'm currently not using the noise-adding trick that the paper suggested. However, it has been implemented, and I will test it and plug it into the network.

  • Update(20171114): First version; it seems that the generator tends to generate white images. Debugging now. => Fixed some bugs. It now seems normal, training... => There are some unknown problems when fading in, debugging...

  • Update(20171113): Generator and Discriminator: ok, simple test passed.

  • Update(20171112): It's now under reimplementation.

  • Update(20171111): It's still under implementation. I did not design the structure carefully, and now I have to reimplement it (phase='fade in' is hard to implement under the current structure). I also fixed some bugs; since a reimplementation is needed, I do not plan to submit pull requests at this moment.
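The weighted combination of real images described in the 20171213 update can be sketched as follows. This is an illustration under assumptions, not the code in data.py: the function names are hypothetical, a single-channel image is represented as a plain 2D list of floats, and the 0.5x image is assumed to be produced by 2x2 averaging and brought back to full size by nearest-neighbour upsampling.

```python
def downsample_2x(img):
    """Average each 2x2 block of a 2D list (height and width must be even)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1]
              + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upsample_2x(img):
    """Nearest-neighbour upsampling: repeat every pixel into a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fade_in_real(img, alpha):
    """Blend a real image with its half-resolution version.

    alpha = 0 gives the 0.5x image (upsampled back to full size),
    alpha = 1 gives the unmodified current-resolution image.
    """
    low = upsample_2x(downsample_2x(img))
    return [[alpha * hi + (1.0 - alpha) * lo
             for hi, lo in zip(hi_row, lo_row)]
            for hi_row, lo_row in zip(img, low)]
```

With PyTorch tensors of shape (N, C, H, W), the same steps could be written with torch.nn.functional.avg_pool2d(x, 2) and torch.nn.functional.interpolate(x, scale_factor=2, mode='nearest'); whether data.py uses exactly these operations is an assumption.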

Reference implementation
