Deep Multimodal Neural Architecture Search

Related tags

Deep Learningmmnas
Overview

MMNas: Deep Multimodal Neural Architecture Search

This repository corresponds to the PyTorch implementation of the MMnas for visual question answering (VQA), visual grounding (VGD), and image-text matching (ITM) tasks.

example-image

Prerequisites

Software and Hardware Requirements

You may need a machine with at least 4 GPU (>= 8GB), 50GB memory for VQA and VGD and 150GB for ITM and 50GB free disk space. We strongly recommend to use a SSD drive to guarantee high-speed I/O.

You should first install some necessary packages.

  1. Install Python >= 3.6

  2. Install Cuda >= 9.0 and cuDNN

  3. Install PyTorch >= 0.4.1 with CUDA (Pytorch 1.x is also supported).

  4. Install SpaCy and initialize the GloVe as follows:

    $ pip install -r requirements.txt
    $ wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
    $ pip install en_vectors_web_lg-2.1.0.tar.gz

Dataset Preparations

Please follow the instructions in dataset_setup.md to download the datasets and features.

Search

To search an optimal architecture for a specific task, run

$ python3 search_[vqa|vgd|vqa].py

At the end of each searching epoch, we will output the optimal architecture (choosing operators with largest architecture weight for every block) accroding to current architecture weights. When the optimal architecture doesn't change for several continuous epochs, you can kill the searching process manually.

Training

The following script will start training network with the optimal architecture that we've searched by MMNas:

$ python3 train_[vqa|vgd|itm].py --RUN='train' --ARCH_PATH='./arch/train_vqa.json'

To add:

  1. --VERSION=str, e.g.--VERSION='mmnas_vqa' to assign a name for your this model.

  2. --GPU=str, e.g.--GPU='0, 1, 2, 3' to train the model on specified GPU device.

  3. --NW=int, e.g.--NW=8 to accelerate I/O speed.

  1. --RESUME to start training with saved checkpoint parameters.

  2. --ARCH_PATH can use the different searched architectures.

If you want to evaluate an architecture that you got from seaching stage, for example, it's the output architecture at the 50-th searching epoch for vqa model, you can run

$ python3 train_vqa.py --RUN='train' --ARCH_PATH='[PATH_TO_YOUR_SEARCHING_LOG]' --ARCH_EPOCH=50

Validation and Testing

Offline Evaluation

It's convenient to modify follows args: --RUN={'val', 'test'} --CKPT_PATH=[Your Model Path] to Run val or test Split.

Example:

$ python3 train_vqa.py --RUN='test' --CKPT_PATH=[Your Model Path] --ARCH_PATH=[Searched Architecture Path]

Online Evaluation (ONLY FOR VQA)

Test Result files will stored in ./logs/ckpts/result_test/result_train_[Your Version].json

You can upload the obtained result file to Eval AI to evaluate the scores on test-dev and test-std splits.

Pretrained Models

We provide the pretrained models in pretrained_models.md to reproduce the experimental results in our paper.

Citation

If this repository is helpful for your research, we'd really appreciate it if you could cite the following paper:

@article{yu2020mmnas,
  title={Deep Multimodal Neural Architecture Search},
  author={Yu, Zhou and Cui, Yuhao and Yu, Jun and Wang, Meng and Tao, Dacheng and Tian, Qi},
  journal={Proceedings of the 28th ACM International Conference on Multimedia},
  pages = {3743--3752},
  year={2020}
}
Owner
Vision and Language Group@ MIL
Hangzhou Dianzi University
Vision and Language Group@ MIL
Minimal deep learning library written from scratch in Python, using NumPy/CuPy.

SmallPebble Project status: experimental, unstable. SmallPebble is a minimal/toy automatic differentiation/deep learning library written from scratch

Sidney Radcliffe 92 Dec 30, 2022
Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images

Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images In this paper, we present an effective Dynamic Enhancement Anchor

13 Dec 09, 2022
CAPITAL: Optimal Subgroup Identification via Constrained Policy Tree Search

CAPITAL: Optimal Subgroup Identification via Constrained Policy Tree Search This repository is the official implementation of CAPITAL: Optimal Subgrou

Hengrui Cai 0 Oct 19, 2021
CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energy Management, 2020, PikaPika team

Citylearn Challenge This is the PyTorch implementation for PikaPika team, CityLearn Challenge Multi-Agent Reinforcement Learning for Intelligent Energ

bigAIdream projects 10 Oct 10, 2022
Python Library for Signal/Image Data Analysis with Transport Methods

PyTransKit Python Transport Based Signal Processing Toolkit Website and documentation: https://pytranskit.readthedocs.io/ Installation The library cou

24 Dec 23, 2022
Local Attention - Flax module for Jax

Local Attention - Flax Autoregressive Local Attention - Flax module for Jax Install $ pip install local-attention-flax Usage from jax import random fr

Phil Wang 16 Jun 16, 2022
Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Hiroshechka Y 33 Dec 26, 2022
YKKDetector For Python

YKKDetector OpenCVを利用した機械学習データをもとに、VRChatのスクリーンショットなどからYKKさん(もとい「幽狐族のお姉様」)を検出できるソフトウェアです。 マニュアル こちらから実行環境のセットアップから解説する詳細なマニュアルをご覧いただけます。 ライセンス 本ソフトウェア

あんふぃとらいと 5 Dec 07, 2021
Sample Prior Guided Robust Model Learning to Suppress Noisy Labels

PGDF This repo is the official implementation of our paper "Sample Prior Guided Robust Model Learning to Suppress Noisy Labels ". Citation If you use

CVSM Group - email: <a href=[email protected]"> 22 Dec 23, 2022
Code corresponding to The Introspective Agent: Interdependence of Strategy, Physiology, and Sensing for Embodied Agents

The Introspective Agent: Interdependence of Strategy, Physiology, and Sensing for Embodied Agents This is the code corresponding to The Introspective

0 Jan 10, 2022
Taichi Course Homework Template

太极图形课S1-标题部分 这个作业未来或将是你的开源项目,标题的内容可以来自作业中的核心关键词,让读者一眼看出你所完成的工作/做出的好玩demo 如果暂时未想好,起名时可以参考“太极图形课S1-xxx作业” 如下是作业(项目)展开说明的方法,可以帮大家理清思路,并且也对读者非常友好,请小伙伴们多多参

TaichiCourse 30 Nov 19, 2022
PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners

Masked Autoencoders: A PyTorch Implementation This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners: @

Meta Research 4.8k Jan 04, 2023
Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

**Codebase and data are uploaded in progress. ** VOLT(-py) is a vocabulary learning codebase that allows researchers and developers to automaticaly ge

416 Jan 09, 2023
Official and maintained implementation of the paper "OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data" [BMVC 2021].

OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data Christoph Reich, Tim Prangemeier, Özdemir Cetin & Heinz Koeppl | Pr

Christoph Reich 23 Sep 21, 2022
Second-order Attention Network for Single Image Super-resolution (CVPR-2019)

Second-order Attention Network for Single Image Super-resolution (CVPR-2019) "Second-order Attention Network for Single Image Super-resolution" is pub

516 Dec 28, 2022
Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

PatatiPatata 28 Oct 18, 2022
Training neural models with structured signals.

Neural Structured Learning in TensorFlow Neural Structured Learning (NSL) is a new learning paradigm to train neural networks by leveraging structured

955 Jan 02, 2023
Reviatalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation

Reviatalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation This is the implementation of the approach describ

Taosha Fan 47 Nov 15, 2022
Pytorch implementation of Cut-Thumbnail in the paper Cut-Thumbnail:A Novel Data Augmentation for Convolutional Neural Network.

Cut-Thumbnail (Accepted at ACM MULTIMEDIA 2021) Tianshu Xie, Xuan Cheng, Xiaomin Wang, Minghui Liu, Jiali Deng, Tao Zhou, Ming Liu This is the officia

3 Apr 12, 2022
Adaptive Attention Span for Reinforcement Learning

Adaptive Transformers in RL Official implementation of Adaptive Transformers in RL In this work we replicate several results from Stabilizing Transfor

100 Nov 15, 2022