MUGE Text To Image Generation Baseline

Requirements and Installation

For more details, see fairseq. In brief:

python == 3.6.4
pytorch == 1.7.1
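
A quick way to confirm that an environment matches the two pins above is a small version check. This is only a sketch: the helper names `parse_version` and `check_environment` are made up for illustration and are not part of this repository or of fairseq.

```python
import re
import sys

# Versions pinned in the requirements above.
REQUIRED_PYTHON = (3, 6, 4)
REQUIRED_TORCH = "1.7.1"


def parse_version(text):
    """Turn a string such as '1.7.1+cu110' into (1, 7, 1).

    Anything after '+' (a local build tag) is ignored.
    """
    core = text.split("+")[0]
    return tuple(int(number) for number in re.findall(r"\d+", core)[:3])


def check_environment(python_version, torch_version):
    """Return a list of mismatch messages (empty when both pins match)."""
    problems = []
    found_python = tuple(python_version[:3])
    if found_python != REQUIRED_PYTHON:
        problems.append(
            "python %s != 3.6.4" % ".".join(str(n) for n in found_python)
        )
    if parse_version(torch_version) != parse_version(REQUIRED_TORCH):
        problems.append("pytorch %s != %s" % (torch_version, REQUIRED_TORCH))
    return problems


if __name__ == "__main__":
    try:
        import torch  # imported here so the helpers work without PyTorch
    except ImportError:
        print("pytorch is not installed")
    else:
        for message in check_environment(sys.version_info, torch.__version__):
            print("mismatch:", message)
```

Running the file as a script prints one line per mismatch and nothing when both versions match.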

Installing fairseq and other requirements

git clone https://github.com/MUGE-2021/image-caption-baseline
cd muge_baseline/
pip install -r requirements.txt
cd fairseq/
pip install --editable .

Download the data and place it in the dataset/ directory. The file structure is:

text2image-baseline
    - dataset
        - ECommerce-T2I
            - T2I_train.img.tsv
            - T2I_train.text.tsv
            - ...
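
Before training, it can help to confirm that the files named in the layout above are in place. The sketch below checks only the two files listed explicitly; the entries behind `...` are not named in this README, so they are not checked. The function name `missing_dataset_files` is hypothetical, and the script assumes it is run from the repository root.

```python
import os

# File names taken from the layout above. The "..." entries are not listed
# in this README, so only these two files are checked.
EXPECTED_FILES = ("T2I_train.img.tsv", "T2I_train.text.tsv")


def missing_dataset_files(dataset_root="dataset"):
    """Return the expected ECommerce-T2I files that are absent under dataset_root."""
    data_dir = os.path.join(dataset_root, "ECommerce-T2I")
    return [
        os.path.join(data_dir, name)
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected training files are present.")
```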

Getting Started

The model is a BART-like model that uses VQGAN as an image tokenizer. See models/t2i_baseline.py for the detailed model structure.
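
To make "image tokenizer" concrete, here is a toy sketch of the vector-quantization step at the core of a VQGAN-style tokenizer: each feature vector is replaced by the index of its nearest codebook entry, and those indices are the discrete tokens a sequence model can predict. This is not the code in models/t2i_baseline.py. A real VQGAN uses a learned convolutional encoder, decoder, and codebook, whereas the codebook and feature values below are invented for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    tokens = []
    for vector in features:
        distances = [squared_distance(vector, entry) for entry in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


def lookup(tokens, codebook):
    """Map token indices back to their codebook vectors (input to a decoder)."""
    return [codebook[token] for token in tokens]


# Toy 2-D codebook with four entries (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Stand-in for encoder outputs at three spatial positions of an image.
features = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]

tokens = quantize(features, codebook)
print(tokens)                    # [0, 3, 2]
print(lookup(tokens, codebook))  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

In a text-to-image setup of this kind, the sequence model is typically trained to generate such token indices from the input text, and the tokenizer's decoder turns the generated indices back into pixels.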

Training

cd run_scripts/; bash train_t2i_vqgan.sh

Model training takes about 5 hours.

Inference

cd run_scripts/; bash generate_t2i_vqgan.sh

See the results in the results/ directory.

Reference

@inproceedings{M6,
  author    = {Junyang Lin and
               Rui Men and
               An Yang and
               Chang Zhou and
               Ming Ding and
               Yichang Zhang and
               Peng Wang and
               Ang Wang and
               Le Jiang and
               Xianyan Jia and
               Jie Zhang and
               Jianwei Zhang and
               Xu Zou and
               Zhikang Li and
               Xiaodong Deng and
               Jie Liu and
               Jinbao Xue and
               Huiling Zhou and
               Jianxin Ma and
               Jin Yu and
               Yong Li and
               Wei Lin and
               Jingren Zhou and
               Jie Tang and
               Hongxia Yang},
  title     = {{M6:} {A} Chinese Multimodal Pretrainer},
  year      = {2021},
  booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
  pages     = {3251--3261},
  numpages  = {11},
  location  = {Virtual Event, Singapore},
}

@article{M6-T,
  author    = {An Yang and
               Junyang Lin and
               Rui Men and
               Chang Zhou and
               Le Jiang and
               Xianyan Jia and
               Ang Wang and
               Jie Zhang and
               Jiamang Wang and
               Yong Li and
               Di Zhang and
               Wei Lin and
               Lin Qu and
               Jingren Zhou and
               Hongxia Yang},
  title     = {{M6-T:} Exploring Sparse Expert Models and Beyond},
  journal   = {CoRR},
  volume    = {abs/2105.15082},
  year      = {2021}
}
