Official implementation for Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Last update: Oct 18, 2022

Overview

Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Model Pipeline

Usage

Environment Settings

We use the PyTorch framework.

Python version: 3.7.0
PyTorch version: 1.4.0
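
Before training, it can help to confirm that the interpreter and PyTorch build match the versions listed above. The following is a minimal sketch and not part of the repository; the helper names same_release and check_environment are hypothetical, and the check assumes exact version matches are wanted (newer versions may also work, but that is untested here).

```python
import sys

# Versions stated in this README; adjust if you deliberately use others.
EXPECTED_PYTHON = "3.7.0"
EXPECTED_TORCH = "1.4.0"


def same_release(found, expected):
    """Compare version strings, ignoring local build suffixes such as '+cu100'."""
    return str(found).split("+")[0] == expected


def check_environment():
    """Return a list of mismatch messages (empty when everything matches)."""
    problems = []
    python_version = "{}.{}.{}".format(*sys.version_info[:3])
    if not same_release(python_version, EXPECTED_PYTHON):
        problems.append(
            "Python {} found, {} expected".format(python_version, EXPECTED_PYTHON)
        )
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not same_release(torch.__version__, EXPECTED_TORCH):
            problems.append(
                "PyTorch {} found, {} expected".format(torch.__version__, EXPECTED_TORCH)
            )
    return problems


if __name__ == "__main__":
    for message in check_environment() or ["Environment matches the README."]:
        print(message)
```

A mismatch is reported as a message rather than raised as an error, since a different version does not necessarily prevent the code from running.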

Get Code

Clone the repository:

git clone https://github.com/zmzhang2000/MIGCN.git
cd MIGCN

Data Preparation

Charades-STA

Download the preprocessed annotations and I3D features of Charades-STA.
Save them in data/charades.

ActivityNet

Download the preprocessed annotations of ActivityNet.
Download the C3D features of ActivityNet.
Process the C3D features with process_activitynet_c3d() in data/preprocess/preprocess.py.
Save them in data/activitynet.

Pre-trained Models

Download the checkpoints for Charades-STA and ActivityNet.
Save them in checkpoints.
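
After the downloads, a quick check that the three directories named above exist and are not empty can catch a misplaced archive early. This is a sketch under assumptions: the function missing_dirs is hypothetical, it only inspects directory presence, and it does not know which file names the repository expects inside each directory.

```python
from pathlib import Path

# Directories named in this README, relative to the repository root.
EXPECTED_DIRS = ("data/charades", "data/activitynet", "checkpoints")


def missing_dirs(repo_root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that are absent or contain no entries."""
    root = Path(repo_root)
    missing = []
    for relative in expected:
        directory = root / relative
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(relative)
    return missing


if __name__ == "__main__":
    absent = missing_dirs()
    if absent:
        print("Missing or empty:", ", ".join(absent))
    else:
        print("All expected directories contain files.")
```

Run it from the repository root (the MIGCN directory created by git clone), or pass another root as the first argument to missing_dirs.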

Data Generation

We provide the full procedure for generating the MIGCN data.

The raw data sources are listed in data/raw_data/download.sh.
The preprocessing code is in data/preprocess.

Training

Train MIGCN on Charades-STA with I3D features:

python main.py --dataset charades --feature i3d

Train MIGCN on ActivityNet with C3D features:

python main.py --dataset activitynet --feature c3d

Testing

Test MIGCN on Charades-STA with I3D features:

python main.py --dataset charades --feature i3d --test --model_load_path checkpoints/$MODEL_CHECKPOINT

Test MIGCN on ActivityNet with C3D features:

python main.py --dataset activitynet --feature c3d --test --model_load_path checkpoints/$MODEL_CHECKPOINT
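
When running both datasets from a script, the training and testing commands above can be assembled programmatically. The sketch below is not part of the repository: build_command and run are hypothetical helpers, only the flags shown in this README are used, and MODEL_CHECKPOINT remains a placeholder for whichever checkpoint file you downloaded.

```python
import subprocess
import sys

# Dataset/feature pairs used in this README.
FEATURES = {"charades": "i3d", "activitynet": "c3d"}


def build_command(dataset, checkpoint=None):
    """Build the main.py argument list; passing a checkpoint switches to testing."""
    command = [
        sys.executable, "main.py",
        "--dataset", dataset,
        "--feature", FEATURES[dataset],
    ]
    if checkpoint is not None:
        command += ["--test", "--model_load_path", checkpoint]
    return command


def run(command):
    """Execute a command from the repository root and fail loudly on errors."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run(...) to actually launch them.
    for dataset in FEATURES:
        print(" ".join(build_command(dataset)))
        # Placeholder path: substitute the real checkpoint file name.
        print(" ".join(build_command(dataset, "checkpoints/MODEL_CHECKPOINT")))
```

Passing the arguments as a list avoids shell quoting issues if a checkpoint path contains spaces.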

Other Hyper-parameters

List the other hyper-parameters with:

python main.py -h

Reference

Please cite the following paper if MIGCN is helpful for your research:

@ARTICLE{9547801,
  author={Zhang, Zongmeng and Han, Xianjing and Song, Xuemeng and Yan, Yan and Nie, Liqiang},
  journal={IEEE Transactions on Image Processing}, 
  title={Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos}, 
  year={2021},
  volume={30},
  number={},
  pages={8265-8277},
  doi={10.1109/TIP.2021.3113791}}

Owner

Zongmeng Zhang
