Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Overview

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

Pytorch Implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Graph Generation accepted by ICCV2021. We propose a Transformer-based model STTran to generate dynamic scene graphs of the given video. STTran can detect the visual relationships in each frame.

The introduction video is available now: https://youtu.be/gKpnRU8btLg

GitHub Logo

About the code We run the code on a single RTX2080ti for both training and testing. We borrowed some code from Yang's repository and Zellers' repository.

Usage

We use python=3.6, pytorch=1.1 and torchvision=0.3 in our code. First, clone the repository:

git clone https://github.com/yrcong/STTran.git

We borrow some compiled code for bbox operations.

cd lib/draw_rectangles
python setup.py build_ext --inplace
cd ..
cd fpn/box_intersections_cpu
python setup.py build_ext --inplace

For the object detector part, please follow the compilation from https://github.com/jwyang/faster-rcnn.pytorch We provide a pretrained FasterRCNN model for Action Genome. Please download here and put it in

fasterRCNN/models/faster_rcnn_ag.pth

Dataset

We use the dataset Action Genome to train/evaluate our method. Please process the downloaded dataset with the Toolkit. The directories of the dataset should look like:

|-- action_genome
    |-- annotations   #gt annotations
    |-- frames        #sampled frames
    |-- videos        #original videos

In the experiments for SGCLS/SGDET, we only keep bounding boxes with short edges larger than 16 pixels. Please download the file object_bbox_and_relationship_filtersmall.pkl and put it in the dataloader

Train

You can train the STTran with train.py. We trained the model on a RTX 2080ti:

  • For PredCLS:
python train.py -mode predcls -datasize large -data_path $DATAPATH 
  • For SGCLS:
python train.py -mode sgcls -datasize large -data_path $DATAPATH 
  • For SGDET:
python train.py -mode sgdet -datasize large -data_path $DATAPATH 

Evaluation

You can evaluate the STTran with test.py.

python test.py -m predcls -datasize large -data_path $DATAPATH -model_path $MODELPATH
python test.py -m sgcls -datasize large -data_path $DATAPATH -model_path $MODELPATH
python test.py -m sgdet -datasize large -data_path $DATAPATH -model_path $MODELPATH

Citation

If our work is helpful for your research, please cite our publication:

@inproceedings{cong2021spatial,
  title={Spatial-Temporal Transformer for Dynamic Scene Graph Generation},
  author={Cong, Yuren and Liao, Wentong and Ackermann, Hanno and Rosenhahn, Bodo and Yang, Michael Ying},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year={2021}
  url={https://arxiv.org/abs/2107.12309}
}

Help

When you have any question/idea about the code/paper. Please comment in Github or send us Email. We will reply as soon as possible.

Owner
Yuren Cong
Yuren Cong
HyperPose is a library for building high-performance custom pose estimation applications.

HyperPose is a library for building high-performance custom pose estimation applications.

TensorLayer Community 1.2k Jan 04, 2023
Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

MTM This is the official repository of the paper: Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Cla

ICTMCG 13 Sep 17, 2022
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning By Zhenda Xie*, Yutong Lin*, Zheng Zhang, Yue Ca

Zhenda Xie 293 Dec 20, 2022
Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0

OpenGaze: Web Service for OpenFace Facial Behaviour Analysis Toolkit Overview OpenFace is a fantastic tool intended for computer vision and machine le

Sayom Shakib 4 Nov 03, 2022
Language model Prompt And Query Archive

LPAQA: Language model Prompt And Query Archive This repository contains data and code for the paper How Can We Know What Language Models Know? Install

127 Dec 20, 2022
Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight)

Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight) Abstract Due to the limited and even imbalanced dat

Hanzhe Hu 99 Dec 12, 2022
Codes for the paper Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

Contrast and Mix (CoMix) The repository contains the codes for the paper Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Backgroun

Computer Vision and Intelligence Research (CVIR) 13 Dec 10, 2022
Source code for our paper "Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures"

Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures Code for the Multiplex Molecular Graph Neural Network (M

shzhang 59 Dec 10, 2022
Pytorch implementation of the paper Progressive Growing of Points with Tree-structured Generators (BMVC 2021)

PGpoints Pytorch implementation of the paper Progressive Growing of Points with Tree-structured Generators (BMVC 2021) Hyeontae Son, Young Min Kim Pre

Hyeontae Son 9 Jun 06, 2022
"Learning and Analyzing Generation Order for Undirected Sequence Models" in Findings of EMNLP, 2021

undirected-generation-dev This repo contains the source code of the models described in the following paper "Learning and Analyzing Generation Order f

Yichen Jiang 0 Mar 25, 2022
A collection of SOTA Image Classification Models in PyTorch

A collection of SOTA Image Classification Models in PyTorch

sithu3 85 Dec 30, 2022
Supplementary code for TISMIR paper "Sliding-Window Pitch-Class Histograms as a Means of Modeling Musical Form"

Sliding-Window Pitch-Class Histograms as a Means of Modeling Musical Form This is supplementary code for the TISMIR paper Sliding-Window Pitch-Class H

1 Nov 27, 2021
The official codes for the ICCV2021 presentation "Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting"

UEPNet (ICCV2021 Poster Presentation) This repository contains codes for the official implementation in PyTorch of UEPNet as described in Uniformity i

Tencent YouTu Research 15 Dec 14, 2022
The code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention.

CrossFormer This repository is the code for our paper CrossFormer: A Versatile Vision Transformer Based on Cross-scale Attention. Introduction Existin

cheerss 238 Jan 06, 2023
LSTM Neural Networks for Spectroscopic Studies of Type Ia Supernovae

Package Description The difficulties in acquiring spectroscopic data have been a major challenge for supernova surveys. snlstm is developed to provide

7 Oct 11, 2022
A collection of easy-to-use, ready-to-use, interesting deep neural network models

Interesting and reproducible research works should be conserved. This repository wraps a collection of deep neural network models into a simple and un

Aria Ghora Prabono 16 Jun 16, 2022
Magic tool for managing internet connection in local network by @zalexdev

Megacut ✂️ A new powerful Python3 tool for managing internet on a local network Installation git clone https://github.com/stryker-project/megacut cd m

Stryker 12 Dec 15, 2022
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

DeeBERT This is the code base for the paper DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference. Code in this repository is also available

Castorini 132 Nov 14, 2022
Deep Learning Algorithms for Hedging with Frictions

Deep Learning Algorithms for Hedging with Frictions This repository contains the Forward-Backward Stochastic Differential Equation (FBSDE) solver and

Xiaofei Shi 3 Dec 22, 2022
Temporal-Relational CrossTransformers

Temporal-Relational Cross-Transformers (TRX) This repo contains code for the method introduced in the paper: Temporal-Relational CrossTransformers for

83 Dec 12, 2022