πŸ‡°πŸ‡· Text to Image in Korean

Overview

KoDALLE

Open In Colab Wandb Log

image-20211227151557604

Utilizing pretrained language model’s token embedding layer and position embedding layer as DALLE’s text encoder.

Background

  • Training DALLE model from scratch demands large size paired dataset of images and captions. For example, OpenAI DALLE is trained with more than 250 million text-image pairs for the training.
  • If the dataset isn’t large enough or is limited to specific domains, number of vocabularies in the trained DALLE model are insufficient. For instance, 1 million text captions of K-Fashion dataset only consists of more or less than 300 tokens.
  • Therefore, inferencing from such DALLE models could be problematic if the given sentence query is unconnected to the originally trained captions’ text dataset.

KoDALLE's Result on Small Size Fashion Dataset

OpenAI’s DALLE KoDALLE of HappyFace
Train Dataset Size 250 Million Pairs 0.8 Million Pairs
#Params 12 Billion 428 Million
#Layers 64 Layers 16 Layers
Computing Resource 1024 x V100 16GB 1 x V100 32GB
Text Encoder 16384 Vocab x 512 Dim BPE 32000 Vocab x 1024 Dim klue/roberta-large
Image Encoder VQVAE VQGAN
Optimizer AdamW AdamW
Learning Rate 4.5e-5 3.0e-5
Weight Decay 4.5e-3 3.0e-3
LR Scheduler ReduceLROnPlateau -

The team constructed Text to Fashion Design DALLE model in Korean language with less than 100k text-image sampled pairs.

Caption ν•˜μ˜μ—μ„œ 색상은 μŠ€μΉ΄μ΄λΈ”λ£¨μ΄λ‹€. μƒμ˜μ—μ„œ κΈ°μž₯은 둱이닀. 색상은 ν™”μ΄νŠΈμ΄λ‹€. μΉ΄ν…Œκ³ λ¦¬λŠ” λΈ”λΌμš°μŠ€μ΄λ‹€. λ””ν…ŒμΌμ—λŠ” 셔링이닀. μ†Œλ§€κΈ°μž₯은 λ°˜νŒ”μ΄λ‹€. μ†Œμž¬μ—λŠ” 싀크이닀. ν”„λ¦°νŠΈμ—λŠ” 무지이닀. λ„₯라인은 브이λ„₯이닀. 핏은 λ…Έλ©€
Generated Image image
Caption μ•„μš°ν„°λŠ” 색상이 μΉ΄ν‚€ μ†Œμž¬κ°€ 우븐 핏이 루즈인 μ½”νŠΈμ΄λ‹€. ν•˜μ˜λŠ” 색상이 넀이비 μ†Œμž¬κ°€ λ°λ‹˜ 핏이 μŠ€ν‚€λ‹ˆμΈ 청바지이닀.
Generated Image image
Caption ν•˜μ˜μ—μ„œ κΈ°μž₯은 발λͺ©μ΄λ‹€. 색상은 블루이닀. μΉ΄ν…Œκ³ λ¦¬λŠ” μŠ€μ»€νŠΈμ΄λ‹€. μ†Œμž¬μ—λŠ” λ°λ‹˜μ΄λ‹€. 핏은 μ™€μ΄λ“œμ΄λ‹€. μƒμ˜μ—μ„œ 색상은 ν™”μ΄νŠΈμ΄λ‹€. μΉ΄ν…Œκ³ λ¦¬λŠ” λΈ”λΌμš°μŠ€μ΄λ‹€. λ””ν…ŒμΌμ—λŠ” 셔링이닀. μ†Œλ§€κΈ°μž₯은 λ°˜νŒ”μ΄λ‹€. μ†Œμž¬μ—λŠ” μš°λΈμ΄λ‹€.
Generated Image image
Caption μƒμ˜μ—μ„œ κΈ°μž₯은 노멀이닀. μƒμ˜μ—μ„œ 색상은 ν™”μ΄νŠΈμ΄λ‹€. μƒμ˜μ—μ„œ μ„œλΈŒμƒ‰μƒμ€ λΈ”λž™μ΄λ‹€. μƒμ˜μ—μ„œ μΉ΄ν…Œκ³ λ¦¬λŠ” 티셔츠이닀. μƒμ˜μ—μ„œ μ†Œλ§€κΈ°μž₯은 λ°˜νŒ”μ΄λ‹€. μƒμ˜μ—μ„œ μ†Œμž¬μ—λŠ” 저지이닀. μƒμ˜μ—μ„œ ν”„λ¦°νŠΈμ—λŠ” λ ˆν„°λ§μ΄λ‹€. μƒμ˜μ—μ„œ λ„₯라인은 λΌμš΄λ“œλ„₯이닀. μƒμ˜μ—μ„œ 핏은 λ£¨μ¦ˆμ΄λ‹€.
Generated Image image

Methodology

Experimentations were conducted with the following Korean Transformers Models’ embedding layers. The team selected klue/roberta-large as baseline in the repository considering the size of the model.

KoDALLE with klue/roberta-large's wpe and wte which is trainable on 16GB GPU Google Colab environment. Hyperparams related to the DALLE's model size are following.

'BATCH_SIZE': 32
'DEPTH': 2
'TEXT_SEQ_LEN': 128
'VOCAB_SIZE': 32000
'MODEL_DIM': 1024
'ATTN_TYPES': 'full'
'DIM_HEAD': 64
'HEADS': 8

Significance

  • Offers promising result for training from scratch on specific domains with small size dataset.
  • Introduces solution for domain specific DALLE & CLIP models to be robust on input sentence.
  • Recommends adequate text-to-image model size for given computation resource.
  • Suggests effortless method of creating DALLE & CLIP model for own languages if pretrained language model is available.

WIP

  • Add image-caption reranker(EfficientNet + Klue/roberta-large)
  • Model trained with 500k text-image pairs.
  • Modulize in python code.
  • Update Inference code.
  • Update FID and IS metrics on test and validation dataset.
You might also like...
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach This is the repo to host the dataset TextSeg and code for TexRNe

BARTScore: Evaluating Generated Text as Text Generation
BARTScore: Evaluating Generated Text as Text Generation

This is the Repo for the paper: BARTScore: Evaluating Generated Text as Text Generation Updates 2021.06.28 Release online evaluation Demo 2021.06.25 R

Code for EMNLP 2021 main conference paper
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific

a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LSTM layers

RNN-Playwrite a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LS

Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

Siamese-nn-semantic-text-similarity - A repository containing comprehensive Neural Networks based PyTorch implementations for the semantic text similarity task Automatic number plate recognition using tech:  Yolo, OCR, Scene text detection, scene text recognation, flask, torch
Automatic number plate recognition using tech: Yolo, OCR, Scene text detection, scene text recognation, flask, torch

Automatic Number Plate Recognition Automatic Number Plate Recognition (ANPR) is the process of reading the characters on the plate with various optica

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

DALL-E in Pytorch Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch. It will also contain CLIP for ranking the ge

Comments
  • Koclip apply in KoDALLE

    Koclip apply in KoDALLE

    변경사항

    add) model.py

    ν˜„μˆ˜λ‹˜μ˜ KoCLIP이 DALLE Roberta μ—μ„œ μž‘λ™ν•˜κ²Œλ” μ½”λ“œλ₯Ό μˆ˜μ •ν•œ νŒŒμΌμž…λ‹ˆλ‹€.

    dev branch에 μ‘΄μž¬ν•˜λŠ” model.py λΉ„κ΅ν•˜λ©΄μ„œ μˆ˜μ •μ΄ ν•„μš”ν•©λ‹ˆλ‹€.

    add) generate.ipynb

    KoCLIP이 μž‘λ™ν•˜λŠ”κ²ƒμ„ λ³Ό 수 μžˆλ„λ‘ λ§Œλ“  μ½”λ“œμž…λ‹ˆλ‹€.

    opened by JoonHong-Kim 1
  • add: KoCLIP codes

    add: KoCLIP codes

    변경사항:

    refactor) clipmodel.py

    • CLIPModel μ΅œμ’… λ²„μ „μœΌλ‘œ μˆ˜μ •
    • clip folder둜 이동

    add) clip/train_clip.py

    • CLIP λͺ¨λΈ ν•™μŠ΅μ— μ‚¬μš©ν•œ μ½”λ“œμž…λ‹ˆλ‹€

    add) clip/dataloader.py

    • CLIP λͺ¨λΈ ν•™μŠ΅μ— μ‚¬μš©ν•œ dataloader ν•¨μˆ˜μž…λ‹ˆλ‹€.
    opened by shawnhyeonsoo 0
  • add skip_sample in TextImageDataset

    add skip_sample in TextImageDataset

    변경사항

    modify) loader.py

    • TextImageDatasetμ—μ„œ texts, imageλ₯Ό 뢈러올 λ•Œ, dataκ°€ 없을 경우 λ°œμƒν•˜λŠ” μ—λŸ¬ 처리
    • skip_sample ν•¨μˆ˜λ₯Ό ν™œμš©ν•˜μ—¬ errorκ°€ λ°œμƒν•  경우, random ν˜Ήμ€ λ‹€μŒ index둜 λ³€ν™˜ν•˜μ—¬ skip
    • κΈ°μ‘΄ train_dalle_gpt_roberta.pyλ₯Ό λ°”νƒ•μœΌλ‘œ μˆ˜μ •
    opened by jjonhwa 0
Releases(v0.1.0-beta)
Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

[ICLR 2022] Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity by Shiwei Liu, Tianlong Chen, Zahra Atashgahi, Xiaohan Chen, Ghada Sokar, Elen

VITA 18 Dec 31, 2022
Deep metric learning methods implemented in Chainer

Deep Metric Learning Implementation of several methods for deep metric learning in Chainer v4.2.0. Proxy-NCA: No Fuss Distance Metric Learning using P

ronekko 156 Nov 28, 2022
torchsummaryDynamic: support real FLOPs calculation of dynamic network or user-custom PyTorch ops

torchsummaryDynamic Improved tool of torchsummaryX. torchsummaryDynamic support real FLOPs calculation of dynamic network or user-custom PyTorch ops.

Bohong Chen 1 Jan 07, 2022
How will electric vehicles affect traffic congestion and energy consumption: an integrated modelling approach

EV-charging-impact This repository contains the code that has been used for the Queue modelling for the paper "How will electric vehicles affect traff

7 Nov 30, 2022
Automatic tool focused on deriving metallicities of open clusters

metalcode Automatic tool focused on deriving metallicities of open clusters. Based on the method described in PΓΆhnl & Paunzen (2010, https://ui.adsabs

2 Dec 13, 2021
Code & Data for the Paper "Time Masking for Temporal Language Models", WSDM 2022

Time Masking for Temporal Language Models This repository provides a reference implementation of the paper: Time Masking for Temporal Language Models

Guy Rosin 12 Jan 06, 2023
Delving into Localization Errors for Monocular 3D Object Detection, CVPR'2021

Delving into Localization Errors for Monocular 3D Detection By Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li, Wanli Ouyang. Intr

XINZHU.MA 124 Jan 04, 2023
Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.

The Ultimate PyTorch Source-Build Template Translations: ν•œκ΅­μ–΄ TL;DR PyTorch built from source can be x4 faster than a naΓ―ve PyTorch install. This repos

Joonhyung Lee/μ΄μ€€ν˜• 651 Dec 12, 2022
GraphGT: Machine Learning Datasets for Graph Generation and Transformation

GraphGT: Machine Learning Datasets for Graph Generation and Transformation Dataset Website | Paper Installation Using pip To install the core environm

y6q9 50 Aug 18, 2022
A collection of metrics for evaluating timbre dissimilarity using the TorchMetrics API

Timbre Dissimilarity Metrics A collection of metrics for evaluating timbre dissimilarity using the TorchMetrics API Installation pip install -e . Usag

Ben Hayes 21 Jan 05, 2022
Fastquant - Backtest and optimize your trading strategies with only 3 lines of code!

fastquant πŸ€“ Bringing backtesting to the mainstream fastquant allows you to easily backtest investment strategies with as few as 3 lines of python cod

Lorenzo Ampil 1k Dec 29, 2022
A PyTorch Reimplementation of TecoGAN: Temporally Coherent GAN for Video Super-Resolution

TecoGAN-PyTorch Introduction This is a PyTorch reimplementation of TecoGAN: Temporally Coherent GAN for Video Super-Resolution (VSR). Please refer to

165 Dec 17, 2022
Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression

Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression YOLOv5 with alpha-IoU losses implemented in PyTorch. Example r

Jacobi(Jiabo He) 147 Dec 05, 2022
TensorFlow Tutorials with YouTube Videos

TensorFlow Tutorials Original repository on GitHub Original author is Magnus Erik Hvass Pedersen Introduction These tutorials are intended for beginne

9.1k Jan 02, 2023
PyTorch implementation of Histogram Layers from DeepHist: Differentiable Joint and Color Histogram Layers for Image-to-Image Translation

deep-hist PyTorch implementation of Histogram Layers from DeepHist: Differentiable Joint and Color Histogram Layers for Image-to-Image Translation PyT

Winfried LΓΆtzsch 10 Dec 06, 2022
A PyTorch library for Vision Transformers

VFormer A PyTorch library for Vision Transformers Getting Started Read the contributing guidelines in CONTRIBUTING.rst to learn how to start contribut

Society for Artificial Intelligence and Deep Learning 142 Nov 28, 2022
QTool: A Low-bit Quantization Toolbox for Deep Neural Networks in Computer Vision

This project provides abundant choices of quantization strategies (such as the quantization algorithms, training schedules and empirical tricks) for quantizing the deep neural networks into low-bit c

Monash Green AI Lab 51 Dec 10, 2022
Source code for the ACL-IJCNLP 2021 paper entitled "T-DNA: Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation" by Shizhe Diao et al.

T-DNA Source code for the ACL-IJCNLP 2021 paper entitled Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adapta

shizhediao 17 Dec 22, 2022
Fine-tuning StyleGAN2 for Cartoon Face Generation

Cartoon-StyleGAN πŸ™ƒ : Fine-tuning StyleGAN2 for Cartoon Face Generation Abstract Recent studies have shown remarkable success in the unsupervised imag

Jihye Back 520 Jan 04, 2023
Unofficial implement with paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Introduction This repository is about paper SpeakerGAN , and is unofficially implemented by Mingming Huang ( 7 Jan 03, 2023