Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)

Last update: Dec 23, 2022

Pano-AVQA

[Paper] [Poster] [Video]

Getting Started

This code is based on the following libraries:

python=3.8
pytorch=1.7.0 (with cuda 10.2)

To create a virtual environment with all necessary libraries:

conda env create -f environment.yml

By default, data should be saved under the data/feat/{audio,label,visual} directories, and logs (with cache and checkpoints) are saved under the data/{cache,ckpt,log} directories. Using a symbolic link is recommended:

ln -s {path_to_your_data_directory} data
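The layout described above can also be prepared from Python. The following is a minimal sketch, not part of the repository: the helper name prepare_data_root and its arguments are hypothetical, and only the sub-directory names come from this README.

```python
import os
import tempfile
from pathlib import Path

# Sub-directory names taken from the README; everything else is illustrative.
FEATURE_DIRS = ("audio", "label", "visual")
OUTPUT_DIRS = ("cache", "ckpt", "log")


def prepare_data_root(storage_dir, link_path):
    """Create the expected layout under storage_dir and symlink link_path to it.

    Hypothetical helper: the repository itself only documents the `ln -s` step.
    """
    storage = Path(storage_dir).resolve()
    for name in FEATURE_DIRS:
        (storage / "feat" / name).mkdir(parents=True, exist_ok=True)
    for name in OUTPUT_DIRS:
        (storage / name).mkdir(parents=True, exist_ok=True)

    link = Path(link_path)
    if not link.exists() and not link.is_symlink():
        # Same effect as: ln -s {path_to_your_data_directory} data
        os.symlink(storage, link, target_is_directory=True)
    return link


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of touching real data.
    with tempfile.TemporaryDirectory() as tmp:
        link = prepare_data_root(Path(tmp) / "storage", Path(tmp) / "data")
        print(sorted(p.name for p in link.iterdir()))
```

Creating the directories up front is optional; it only makes it easier to see where the precomputed features are expected to go.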

We use a single TITAN RTX for training, but GPUs with less memory also work with a smaller batch size (provided that precomputed features are available).

Dataset

We plan to publicly release the Pano-AVQA dataset within this year, including Q&A annotations, precomputed features, etc. Please stay tuned!

Model

Training

The default configuration is provided in code/config.py. To run with this configuration:

python cli.py

To run with custom configuration, either modify code/config.py or execute:

python cli.py with {{flags_at_your_disposal}}

Inference

Model weights are saved under the ./data/log directory. To run inference only:

python cli.py eval with ckpt_file=../data/log/{experiment}/{ckpt}.pth
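To avoid typing checkpoint paths by hand, the command above can be assembled programmatically. This is a rough sketch under assumptions: list_checkpoints and eval_command are hypothetical helpers, and the only details taken from this README are the data/log/{experiment}/{ckpt}.pth layout and the ckpt_file key.

```python
from pathlib import Path


def list_checkpoints(log_dir, experiment):
    """Return the .pth files under log_dir/experiment, sorted by file name.

    Hypothetical helper; assumes checkpoints sit directly in the experiment folder.
    """
    exp_dir = Path(log_dir) / experiment
    return sorted(exp_dir.glob("*.pth"), key=lambda p: p.name)


def eval_command(ckpt_path):
    """Build the argument list for the inference command shown in the README."""
    return ["python", "cli.py", "eval", "with", f"ckpt_file={ckpt_path}"]


if __name__ == "__main__":
    # "my_experiment" is a placeholder; substitute an actual experiment folder.
    for ckpt in list_checkpoints("../data/log", "my_experiment"):
        print(" ".join(eval_command(ckpt)))
```

The returned list can be passed to subprocess.run from the directory that contains cli.py.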

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{Yun2021PanoAVQA,
    author = {Yun, Heeseung and Yu, Youngjae and Yang, Wonsuk and Lee, Kangil and Kim, Gunhee},
    title = {Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos},
    booktitle = {ICCV},
    year = {2021}
}

Contact

If you have any inquiries, please don't hesitate to contact us via heeseung.yun at vision.snu.ac.kr.

Owner

Heeseung Yun
