Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Overview

Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Woncheol Shin1, Gyubok Lee1, Jiyoung Lee1, Joonseok Lee2,3, Edward Choi1 | Paper

1KAIST, 2Google Research, 3Seoul National University

Abstract

Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we discover that current image quantizers do not satisfy translation equivariance in the quantized space due to aliasing, degrading performance in downstream text-to-image and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach to encourage translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, called 'Translation Equivariance in the Quantized Space', and propose a simple but effective way to achieve translation equivariance by regularizing orthogonality in the codebook embedding vectors. Using this method, we improve accuracy by +22% in text-to-image generation and +26% in image-to-text generation, outperforming VQGAN.
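The core idea above is to push the codebook embedding vectors toward orthogonality so that quantization behaves consistently under translation. Below is a minimal PyTorch sketch of what such an orthogonality regularizer can look like; it is not the released TE-VQGAN code, and the loss form, codebook size (1024 × 256), and regularization weight are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of an orthogonality
# regularizer on codebook embeddings, assuming a VQGAN-style codebook
# stored as a (num_codes, dim) embedding matrix.
import torch
import torch.nn.functional as F

def codebook_orthogonality_loss(codebook: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the normalized codebook Gram matrix from identity.

    codebook: (num_codes, dim) tensor of embedding vectors.
    """
    num_codes = codebook.shape[0]
    normed = F.normalize(codebook, dim=1)          # unit-norm code vectors
    gram = normed @ normed.t()                     # (num_codes, num_codes) cosine similarities
    identity = torch.eye(num_codes, device=codebook.device)
    # Frobenius-style penalty: off-diagonal pairwise similarities are pushed toward zero.
    return ((gram - identity) ** 2).sum() / (num_codes ** 2)

# Example usage: add the regularizer to the usual VQ training losses.
codebook = torch.nn.Embedding(1024, 256).weight    # e.g. 1024 codes of dimension 256
reg_weight = 10.0                                   # hypothetical weight, not from the paper
reg_loss = reg_weight * codebook_orthogonality_loss(codebook)
```

In practice this term would be added to the standard VQGAN objective (reconstruction, commitment, and adversarial losses) during Stage 1 training; see the paper for the exact formulation and weighting used.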

Requirements

TBU

Download Dataset

TBU

Training TE-VQGAN (Stage 1)

TBU

Training Bi-directional Image-Text Generator (Stage 2)

TBU

Thanks to

The implementation of 'TE-VQGAN' and the 'Bi-directional Image-Text Generator' is based on VQGAN and DALLE-pytorch. Thanks to the authors of these related works!

Citation

@misc{shin2021translationequivariant,
      title={Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation}, 
      author={Woncheol Shin and Gyubok Lee and Jiyoung Lee and Joonseok Lee and Edward Choi},
      year={2021},
      eprint={2112.00384},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}