Towards Long-Form Video Understanding

Last update: Dec 26, 2022

Chao-Yuan Wu, Philipp Krähenbühl, CVPR 2021

[Paper] [Project Page] [Dataset]

Citation

@inproceedings{lvu2021,
  Author    = {Chao-Yuan Wu and Philipp Kr\"{a}henb\"{u}hl},
  Title     = {{Towards Long-Form Video Understanding}},
  Booktitle = {{CVPR}},
  Year      = {2021}}

Overview

This repo implements Object Transformers for long-form video understanding.

Getting Started

Please organize data/ as follows

data
|_ ava
|_ features
|_ instance_meta
|_ lvu_1.0

ava, features, and instance_meta can be found in this Google Drive folder. lvu_1.0 can be found here.

Please also download the pre-trained weights from this Google Drive folder and put them in pretrained_models/.
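Before launching any of the scripts, it can help to confirm that the downloads landed where the layout above expects them. The following is a minimal sketch, not part of the repo: the helper name missing_paths is hypothetical, and it only checks that the directories exist, not that their contents are complete.

```python
from pathlib import Path

# Directory names are taken from the layout above; the check itself is illustrative.
EXPECTED_DATA_DIRS = ("ava", "features", "instance_meta", "lvu_1.0")


def missing_paths(repo_root="."):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    expected = [root / "data" / name for name in EXPECTED_DATA_DIRS]
    expected.append(root / "pretrained_models")
    return [str(path) for path in expected if not path.is_dir()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing directories:")
        for path in missing:
            print("  " + path)
    else:
        print("Data layout looks complete.")
```

Run it from the repository root; an empty result means all five directories are present.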

Pre-training

python3 -u run_pretrain.py

This pretrains on a small demo dataset, data/instance_meta/instance_meta_pretrain_demo.pkl, as an example. Please follow its file format if you'd like to pretrain on a larger dataset (e.g., the latest full version of MovieClips).
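The file format is not documented here, so one way to learn it is to load the demo file and look at its top-level structure. The sketch below assumes only that the file is a standard Python pickle; the describe helper is hypothetical and makes no claim about what the file actually contains. Only unpickle files from a source you trust, and note that loading may require running from the repository root if the pickle references classes defined in the repo.

```python
import pickle


def describe(obj, max_items=3):
    """Return a short, human-readable summary of a loaded Python object."""
    if isinstance(obj, dict):
        keys = list(obj)[:max_items]
        return f"dict with {len(obj)} entries; first keys: {keys}"
    if isinstance(obj, (list, tuple)):
        kinds = sorted({type(x).__name__ for x in obj[:max_items]})
        return f"{type(obj).__name__} of length {len(obj)}; element types: {kinds}"
    return f"object of type {type(obj).__name__}"


if __name__ == "__main__":
    path = "data/instance_meta/instance_meta_pretrain_demo.pkl"
    with open(path, "rb") as f:
        meta = pickle.load(f)
    print(describe(meta))
```

From there, inspecting a single entry in an interactive session should show which fields a larger dataset would need to provide.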

Training and evaluating on AVA v2.2

python3 -u run_ava.py

This should achieve 31.0 mAP.

Training and evaluating on LVU tasks

python3 -u run.py [1-9]

The argument selects a task to run on. Please see run.py for details.
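To run several tasks back to back, a small wrapper can call run.py once per task ID. This is a sketch under the assumption that run.py takes a single integer from 1 to 9, as the usage line above suggests; the wrapper and its task_command helper are hypothetical and not part of the repo.

```python
import subprocess
import sys


def task_command(task_id):
    """Build the command line for one LVU task (IDs 1 to 9, per the usage above)."""
    if not 1 <= task_id <= 9:
        raise ValueError(f"task_id must be between 1 and 9, got {task_id}")
    return ["python3", "-u", "run.py", str(task_id)]


if __name__ == "__main__":
    # Optional task IDs on the command line; default to all nine.
    task_ids = [int(arg) for arg in sys.argv[1:]] or list(range(1, 10))
    for task_id in task_ids:
        command = task_command(task_id)
        print("Running:", " ".join(command))
        # check=True stops the loop at the first task that exits with an error.
        subprocess.run(command, check=True)
```

Passing IDs as arguments (for example 1 4 7) limits the loop to those tasks.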

Acknowledgment

This implementation borrows largely from Hugging Face Transformers. Please consider citing it if you use this repo.
