Neural Module Network for VQA in Pytorch

Overview

Neural Module Network (NMN) for VQA in Pytorch

Note: This is NOT an official repository for Neural Module Networks.

NMN is a network that is assembled dynamically by composing shallow network fragments called modules into a deeper structure. These modules are jointly trained to be freely composable. This is a PyTorch implementation of Neural Module Networks for Visual Question Answering. Most Ideas are directly taken from the following paper:

Neural Module Networks: Jacob Andreas, Marcus Rohrbach, Trevor Darrell and Dan Klein. CVPR 2016.

Please cite the above paper in case you use this code in your work. The instructions to reproduce the results can be found below, but first some results demo:

Demo:

More results can be seen with visualize_model.ipynb.

Dependencies:

Following are the main python dependencies of the project: torch, torchvision caffe, matplotlib, numpy, matplotlib and sexpdata.

You also need to have stanford parser available. Once dowloaded, make sure to set STANFORDPARSER in .bashrc so that directory $STANFORDPARSER/libexec/ has stanford-parser.jar

Download Data:

You need to download Images, Annotations and Questions from VQA website. And you need to download VGG model file used to preprocess the images. To save you some efforts of making sure downloaded files are appropriate placed in directory structure, I have prepared few download.txt's'

Run the following command in root directory find . | grep download.txt. You should be able to see the following directories containing download.txt:

./preprocessing/lib/download.txt
./raw_data/Annotations/download.txt
./raw_data/Images/download.txt
./raw_data/Questions/download.txt

Each download.txt has specific instruction with wget command that you need to run in the respective directory. Make sure files are as expected as mentioned in corresponding download.txt after downloading data.

Proprocessing:

preprocessing directory contains the scripts required to preprocess the raw_data. This preprocessed data is stored in preprocessed_data. All scripts in this repository operate on some set. When you download the data, the default sets (directory names) are train2014 and val2014. You can build a question type specific subsets like train2014-sub, val2014-sub by using pick_subset.py. You need to be sure that training / testing / validation set names are consistent in the following scripts (generally set at top of code). By default, everything would work on default sets, but if you need specific set, you need to follow the comments below. You need to run the following scripts in order:

1. python preprocessing/pick_subset.py 	[# Optional: If you want to operate on spcific question-type ]
2. python preprocessing/build_answer_vocab.py         [# Run on your Training Set only]
3. python preprocessing/build_layouts.py              [# Run on your Training Set only]
4. python preprocessing/build_module_input_vocab.py   [# Run on your Training Set only]
5. python preprocessing/extract_image_vgg_features.py [# Run on all Train/ Test / Val Sets]

ToDo: Add setting.py to make sure set-names can be globally configured for experiment.

Run Experiments:

You can start training the model with python train_cmp_nn_vqa.py. The accuracy/loss logs will be piped to logs/cmp_nn_vqa.log. Once training is done, the selected model will be automatically saved at saved_models/cmp_nn_vqa.pt

Visualize Model:

The results can be visualized by running visualize_model.ipynb and selecting model name which was saved.

Evaluate Model:

The model can be evaluated by running python evaluation/evaluate.py. A short summary report should be seen on stdout.

To Do:

  1. Add more documentation
  2. Some more code cleaning
  3. Document results of this implementation on VQA datset
  4. Short blog on implementing NMN in PyTorch

Any Issues?

Please shoot me an email at [email protected]. I will try to fix it as soon as possible.

Owner
Harsh Trivedi
I research in NLP and ML at Stony Brook University
Harsh Trivedi
Classifying cat and dog images using Kaggle dataset

PyTorch Image Classification Classifies an image as containing either a dog or a cat (using Kaggle's public dataset), but could easily be extended to

Robert Coleman 74 Nov 22, 2022
Parasite: a tool allowing you to compress and decompress files, to reduce their size

🦠 Parasite 🦠 Parasite is a tool written in Python3 allowing you to "compress" any file, reducing its size. ⭐ Features ⭐ + Fast + Good optimization,

Billy 30 Nov 25, 2022
SSD-based Object Detection in PyTorch

SSD-based Object Detection in PyTorch 서강대학교 현대모비스 SW 프로그램에서 진행한 인공지능 프로젝트입니다. Jetson nano를 이용해 pre-trained network를 fine tuning시켜 차량 및 신호등 인식을 구현하였습니다

Haneul Kim 1 Nov 16, 2021
High level network definitions with pre-trained weights in TensorFlow

TensorNets High level network definitions with pre-trained weights in TensorFlow (tested with 2.1.0 = TF = 1.4.0). Guiding principles Applicability.

Taehoon Lee 1k Dec 13, 2022
PyTorch reimplementation of Diffusion Models

PyTorch pretrained Diffusion Models A PyTorch reimplementation of Denoising Diffusion Probabilistic Models with checkpoints converted from the author'

Patrick Esser 265 Jan 01, 2023
Official code for "On the Frequency Bias of Generative Models", NeurIPS 2021

Frequency Bias of Generative Models Generator Testbed Discriminator Testbed This repository contains official code for the paper On the Frequency Bias

35 Nov 01, 2022
Background Matting: The World is Your Green Screen

Background Matting: The World is Your Green Screen By Soumyadip Sengupta, Vivek Jayaram, Brian Curless, Steve Seitz, and Ira Kemelmacher-Shlizerman Th

Soumyadip Sengupta 4.6k Jan 04, 2023
Code repo for "RBSRICNN: Raw Burst Super-Resolution through Iterative Convolutional Neural Network" (Machine Learning and the Physical Sciences workshop in NeurIPS 2021).

RBSRICNN: Raw Burst Super-Resolution through Iterative Convolutional Neural Network An official PyTorch implementation of the RBSRICNN network as desc

Rao Muhammad Umer 6 Nov 14, 2022
Long Expressive Memory (LEM)

Long Expressive Memory for Sequence Modeling This repository contains the implementation to reproduce the numerical experiments of the paper Long Expr

Konstantin Rusch 47 Dec 17, 2022
Physics-informed convolutional-recurrent neural networks for solving spatiotemporal PDEs

PhyCRNet Physics-informed convolutional-recurrent neural networks for solving spatiotemporal PDEs Paper link: [ArXiv] By: Pu Ren, Chengping Rao, Yang

Pu Ren 11 Aug 23, 2022
Parameterized Explainer for Graph Neural Network

PGExplainer This is a Tensorflow implementation of the paper: Parameterized Explainer for Graph Neural Network https://arxiv.org/abs/2011.04573 NeurIP

Dongsheng Luo 89 Dec 12, 2022
Official code of our work, AVATAR: A Parallel Corpus for Java-Python Program Translation.

AVATAR Official code of our work, AVATAR: A Parallel Corpus for Java-Python Program Translation. AVATAR stands for jAVA-pyThon progrAm tRanslation. AV

Wasi Ahmad 26 Dec 03, 2022
The repo of the preprinting paper "Labels Are Not Perfect: Inferring Spatial Uncertainty in Object Detection"

Inferring Spatial Uncertainty in Object Detection A teaser version of the code for the paper Labels Are Not Perfect: Inferring Spatial Uncertainty in

ZINING WANG 21 Mar 03, 2022
用强化学习DQN算法,训练AI模型来玩合成大西瓜游戏,提供Keras版本和PARL(paddle)版本

用强化学习玩合成大西瓜 代码地址:https://github.com/Sharpiless/play-daxigua-using-Reinforcement-Learning 用强化学习DQN算法,训练AI模型来玩合成大西瓜游戏,提供Keras版本、PARL(paddle)版本和pytorch版本

72 Dec 17, 2022
Transformers based fully on MLPs

Awesome MLP-based Transformers papers An up-to-date list of Transformers based fully on MLPs without attention! Why this repo? After transformers and

Fawaz Sammani 35 Dec 30, 2022
ivadomed is an integrated framework for medical image analysis with deep learning.

Repository on the collaborative IVADO medical imaging project between the Mila and NeuroPoly labs.

144 Dec 19, 2022
Lexical Substitution Framework

LexSubGen Lexical Substitution Framework This repository contains the code to reproduce the results from the paper: Arefyev Nikolay, Sheludko Boris, P

Samsung 37 Sep 15, 2022
AWS documentation corpus for zero-shot open-book question answering.

aws-documentation We present the AWS documentation corpus, an open-book QA dataset, which contains 25,175 documents along with 100 matched questions a

Sia Gholami 2 Jul 07, 2022
Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)

V-MPO Simple code to demonstrate Deep Reinforcement Learning by using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) in Pyt

Nugroho Dewantoro 9 Jun 06, 2022
Code for "Adversarial Attack Generation Empowered by Min-Max Optimization", NeurIPS 2021

Min-Max Adversarial Attacks [Paper] [arXiv] [Video] [Slide] Adversarial Attack Generation Empowered by Min-Max Optimization Jingkang Wang, Tianyun Zha

Jingkang Wang 12 Nov 23, 2022