Official code repository for the EMNLP 2021 paper

Last update: Dec 19, 2022

Overview

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

PyTorch code for the EMNLP 2021 paper "Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization". The paper is available on arXiv.

Requirements:

This code has been tested with torch==1.11.0.dev20211014 and torchvision==0.12.0.dev20211014 (both nightly builds).
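Because the tested versions are nightly builds, it can help to confirm what is installed before training. The following is a minimal sketch, not part of the repository: the helper names `compare_versions` and `installed_versions` are hypothetical, and ignoring local build suffixes such as `+cu113` is an assumption made for convenience.

```python
from importlib import metadata

# Versions listed in the Requirements section of this README.
TESTED_VERSIONS = {
    "torch": "1.11.0.dev20211014",
    "torchvision": "0.12.0.dev20211014",
}


def compare_versions(tested, installed):
    """Return {package: (tested, installed)} for packages that differ."""
    mismatches = {}
    for name, want in tested.items():
        have = installed.get(name)
        # Assumption: a local build suffix ("+cu113") does not count as a mismatch.
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches


def installed_versions(names):
    """Look up installed versions without importing the packages themselves."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # package is not installed
    return found


if __name__ == "__main__":
    report = compare_versions(TESTED_VERSIONS, installed_versions(TESTED_VERSIONS))
    for name, (want, have) in report.items():
        print(f"{name}: tested with {want}, found {have}")
```

A mismatch does not necessarily mean the code will fail; it only flags a setup that differs from the one listed above.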

Prepare Repository:

Download the PororoSV dataset and associated files from here and save them in ./data. Download the GloVe embeddings (glove.840B.300D) from here. The default location of the embeddings is ./data/ (see ./dcsgan/miscc/config.py).
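To sanity-check the downloaded embeddings, the GloVe text format (one token per line followed by its space-separated vector components) can be read with a few lines of Python. This is a minimal sketch and not the repository's loader: the function name `load_glove` and the `vocab` filter are hypothetical. Splitting from the right is used because some tokens in the 840B file contain spaces.

```python
def load_glove(path, dim=300, vocab=None):
    """Read GloVe text vectors into a {token: list_of_floats} dictionary.

    vocab: optional set of tokens to keep; None keeps every token.
    """
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Split from the right so that tokens containing spaces stay intact.
            parts = line.rstrip("\n").rsplit(" ", dim)
            if len(parts) != dim + 1:
                continue  # skip malformed or truncated lines
            token = parts[0]
            if vocab is not None and token not in vocab:
                continue
            vectors[token] = [float(value) for value in parts[1:]]
    return vectors
```

Passing a `vocab` set built from the story captions keeps memory use manageable, since the full file holds roughly two million vectors. The path to pass depends on where the unzipped embedding file was placed under ./data/.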

Extract Constituency Parses:

To install the Berkeley Neural Parser with SpaCy:

pip install benepar

To extract parses for PororoSV:

python parse.py --dataset pororo --data_dir <path-to-data-directory>
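For orientation, this is roughly how benepar is used with spaCy v3 on a single caption. It is a sketch under stated assumptions rather than the contents of parse.py: it assumes the spaCy model `en_core_web_md` and the benepar model `benepar_en3` have already been downloaded, and the helper names `parse_sentences` and `leaves` are hypothetical.

```python
import re


def parse_sentences(text, spacy_model="en_core_web_md", benepar_model="benepar_en3"):
    """Return one bracketed constituency parse string per sentence in text."""
    # Imported lazily so the helper below works without the parser installed.
    import benepar  # noqa: F401  (importing registers the "benepar" spaCy component)
    import spacy

    nlp = spacy.load(spacy_model)
    nlp.add_pipe("benepar", config={"model": benepar_model})
    doc = nlp(text)
    return [sent._.parse_string for sent in doc.sents]


def leaves(parse_string):
    """Recover the tokens from a bracketed parse such as "(S (NP (NNP Pororo)) ...)"."""
    # A leaf is the innermost "(TAG token)" pair with no nested brackets.
    return [token for _tag, token in re.findall(r"\(([^\s()]+)\s+([^\s()]+)\)", parse_string)]
```

For example, `leaves("(S (NP (NNP Pororo)) (VP (VBZ smiles)))")` returns `["Pororo", "smiles"]`, which is a quick way to check that a stored parse still lines up with its caption.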

Extract Dense Captions:

We use the Dense Captioning Model implementation available here. Download the pretrained model as outlined in that repository.

To extract dense captions for PororoSV:

python describe_pororosv.py --config_json <path-to-config> --lut_path <path-to-VG-regions-dict-lite.pkl> --model_checkpoint <path-to-model-checkpoint> --img_path <path-to-data-directory> --box_per_img 10 --batch_size 1

Training VLC-StoryGAN:

To train VLC-StoryGAN for PororoSV:

python train_gan.py --cfg ./cfg/pororo_s1_vlc.yml --data_dir <path-to-data-directory> --dataset pororo

Unless specified otherwise, the default output root directory for all model checkpoints is ./out/

Evaluation Models:

Please see here for evaluation models for character classification-based scores, BLEU2/3 and R-Precision.

To evaluate Frechet Inception Distance (FID):

python eval_vfid --img_ref_dir <path-to-image-directory-original-images> --img_gen_dir <path-to-image-directory-generated-images> --mode <mode>
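As background on the metric, FID is the Frechet distance between two Gaussians fitted to image features: the squared distance between the means plus Tr(S_a + S_b - 2 (S_a S_b)^(1/2)) for covariances S_a and S_b. The sketch below is not the repository's evaluation code. It deliberately simplifies to diagonal covariances (per-dimension variances) so that it runs with the standard library alone, and the function names are hypothetical; the standard metric uses full covariance matrices of Inception features.

```python
import math


def feature_stats(features):
    """Per-dimension mean and population variance of a list of feature vectors."""
    count = len(features)
    dim = len(features[0])
    mean = [sum(vec[i] for vec in features) / count for i in range(dim)]
    var = [sum((vec[i] - mean[i]) ** 2 for vec in features) / count for i in range(dim)]
    return mean, var


def frechet_distance_diagonal(mean_a, var_a, mean_b, var_b):
    """Frechet distance between two Gaussians with diagonal covariances.

    Simplification: with diagonal covariances the trace term reduces to
    sum(var_a + var_b - 2 * sqrt(var_a * var_b)) over the dimensions.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    trace_term = sum(va + vb - 2.0 * math.sqrt(va * vb) for va, vb in zip(var_a, var_b))
    return mean_term + trace_term
```

The distance is zero when both feature sets have identical statistics and grows as the means or variances drift apart, which is why lower scores indicate generated images closer to the reference set.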

More details coming soon.

Citation:

@inproceedings{maharana2021integrating,
  title={Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization},
  author={Maharana, Adyasha and Bansal, Mohit},
  booktitle={EMNLP},
  year={2021}
}

Owner

Adyasha Maharana
