PyTorch code for the NAACL 2021 paper "Improving Generation and Evaluation of Visual Stories via Semantic Consistency"

Related tags

Deep LearningStoryViz
Overview

Improving Generation and Evaluation of Visual Stories via Semantic Consistency

PyTorch code for the NAACL 2021 paper "Improving Generation and Evaluation of Visual Stories via Semantic Consistency". Link to arXiv paper: https://arxiv.org/abs/2105.10026

Requirements:

This code has been tested on torch==1.7.1 and torchvision==0.8.2

Prepare Repository:

Download the PororoSV dataset and associated files from here and save it as ./data. Download GloVe embeddings (glove.840B.300D) from here. The default location of the embeddings is ./data/ (see ./dcsgan/miscc/config.py).

Training DuCo-StoryGAN:

To train DuCo-StoryGAN, first train the VideoCaptioning model on the PororoSV dataset:
python train_mart.py --data_dir
Default parameters were used to train the model used in our paper.

Next, train the generative model:
python train_gan.py --cfg ./cfg/pororo_s1_duco.yml --data_dir
If training DuCo-StoryGAN on a new dataset, make sure to train the Video Captioning model (see below) before training the GAN. The vocabulary file prepared for the video-captioning model is re-used for generating common input_ids for both models. Change location of video captioning checkpoint in config file.

Unless specified, the default output root directory for all model checkpoints is ./out/

Training Evaluation Models:

  • Video Captioning Model
    The video captioning model trained for DuCo-StoryGAN (see above) is used for evaluation. python train_mart.py --data_dir

  • Hierarchical Deep Multimodal Similarity (H-DAMSM)
    python train_damsm.py --cfg ./cfg/pororo_damsm.yml --data_dir

  • Character Classifier
    python train_classifier.py --data_dir --model_name inception --save_path ./models/inception --batch_size 8 --learning_rate 1e-05

Inference from DuCo-StoryGAN:

Use the following command to infer from trained weights for DuCo-StoryGAN:
python train_gan.py --cfg ./cfg/pororo_s1_duco_eval.yml --data_dir --checkpoint --infer_dir

Download our pretrained checkpoint from here.

Evaluation:

Download the pretrained models for evaluations:
Character Classifier, Video Captioning

Use the following command to evaluate classification accuracy of generated images:
python eval_scripts/eval_classifier.py --image_path --data_dir --model_path --model_name inception --mode

Use the following command to evaluate BLEU Score of generated images:
python eval_scripts/translate.py --batch_size 50 --pred_dir --data_dir --checkpoint_file --eval_mode

Acknowledgements

The code in this repository has been adapted from the MART, StoryGAN and MirrorGAN codebases.

Owner
Adyasha Maharana
Adyasha Maharana
Multi-tool reverse engineering collaboration solution.

CollaRE v0.3 Intorduction CollareRE is a tool for collaborative reverse engineering that aims to allow teams that do need to use more then one tool du

105 Nov 27, 2022
Code for the CVPR2021 workshop paper "Noise Conditional Flow Model for Learning the Super-Resolution Space"

NCSR: Noise Conditional Flow Model for Learning the Super-Resolution Space Official NCSR training PyTorch Code for the CVPR2021 workshop paper "Noise

57 Oct 03, 2022
CSE-519---Project - Job Title Analysis (Project for CSE 519 - Data Science Fundamentals)

A Multifaceted Approach to Job Title Analysis CSE 519 - Data Science Fundamentals Project Description Project consists of three parts: Salary Predicti

Jimit Dholakia 1 Jan 04, 2022
Learning Spatio-Temporal Transformer for Visual Tracking

STARK The official implementation of the paper Learning Spatio-Temporal Transformer for Visual Tracking Hiring research interns for visual transformer

Multimedia Research 484 Dec 29, 2022
Neurons Dataset API - The official dataloader and visualization tools for Neurons Datasets.

Neurons Dataset API - The official dataloader and visualization tools for Neurons Datasets. Introduction We propose our dataloader API for loading and

1 Nov 19, 2021
Tutel MoE: An Optimized Mixture-of-Experts Implementation

Project Tutel Tutel MoE: An Optimized Mixture-of-Experts Implementation. Supported Framework: Pytorch Supported GPUs: CUDA(fp32 + fp16), ROCm(fp32) Ho

Microsoft 344 Dec 29, 2022
TensorFlow implementation of "Attention is all you need (Transformer)"

[TensorFlow 2] Attention is all you need (Transformer) TensorFlow implementation of "Attention is all you need (Transformer)" Dataset The MNIST datase

YeongHyeon Park 4 Jan 05, 2022
SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

SAT: 2D Semantics Assisted Training for 3D Visual Grounding SAT: 2D Semantics Assisted Training for 3D Visual Grounding by Zhengyuan Yang, Songyang Zh

Zhengyuan Yang 22 Nov 30, 2022
🛠️ Tools for Transformers compression using Lightning ⚡

Bert-squeeze is a repository aiming to provide code to reduce the size of Transformer-based models or decrease their latency at inference time.

Jules Belveze 66 Dec 11, 2022
NeoDTI: Neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions

NeoDTI NeoDTI: Neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions (Bioinformatics).

62 Nov 26, 2022
use tensorflow 2.0 to tell a dog and cat from a specified picture

dog_or_cat use tensorflow 2.0 to tell a dog and cat from a specified picture This is one of the classic experiments for the introduction of deep learn

你这个代码我看不懂 1 Oct 22, 2021
A python package to perform same transformation to coco-annotation as performed on the image.

coco-transform-util A python package to perform same transformation to coco-annotation as performed on the image. Installation Way 1 $ git clone https

1 Jan 14, 2022
MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research

MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research.The pipeline is based on nn-UNet an

QIMP team 30 Jan 01, 2023
Repo for CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

CReST in Tensorflow 2 Code for the paper: "CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning" by Chen Wei, Ki

Google Research 75 Nov 01, 2022
Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"

Focal Transformer This is the official implementation of our Focal Transformer -- "Focal Self-attention for Local-Global Interactions in Vision Transf

Microsoft 486 Dec 20, 2022
Segmentation models with pretrained backbones. PyTorch.

Python library with Neural Networks for Image Segmentation based on PyTorch. The main features of this library are: High level API (just two lines to

Pavel Yakubovskiy 6.6k Jan 06, 2023
Split Variational AutoEncoder

Split-VAE Split Variational AutoEncoder Introduction This repository contains and implemementation of a Split Variational AutoEncoder (SVAE). In a SVA

Andrea Asperti 2 Sep 02, 2022
An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise Weight Sharing) by Sensetime Research.

An integration of several popular automatic augmentation methods, including OHL (Online Hyper-Parameter Learning for Auto-Augmentation Strategy) and AWS (Improving Auto Augment via Augmentation Wise

45 Dec 08, 2022
Boosted CVaR Classification (NeurIPS 2021)

Boosted CVaR Classification Runtian Zhai, Chen Dan, Arun Sai Suggala, Zico Kolter, Pradeep Ravikumar NeurIPS 2021 Table of Contents Quick Start Train

Runtian Zhai 4 Feb 15, 2022