Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Overview

[AAAI2022] Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

Overall pipeline of OCN.

Paper Link: [arXiv] [AAAI official paper]

If you find our work or the codebase inspiring and useful to your research, please cite

@article{yuan2022OCN_HOI,
  title={Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics},
  author={Yuan, Hangjie and Wang, Mang and Ni, Dong and Xu, Liangpeng},
  journal={arXiv preprint arXiv:2202.00259},
  year={2022}
}

Dataset preparation

1. HICO-DET

HICO-DET dataset can be downloaded here. After finishing downloading, unpack the tarball (hico_20160224_det.tar.gz) to the data directory.

Instead of using the original annotations files, we use the annotation files provided by the PPDM authors. The annotation files can be downloaded from here. The downloaded annotation files have to be placed as follows.

qpic
 |─ data
 │   └─ hico_20160224_det
 |       |─ annotations
 |       |   |─ trainval_hico.json
 |       |   |─ test_hico.json
 |       |   └─ corre_hico.npy
 :       :

2. V-COCO

First clone the repository of V-COCO from here, and then follow the instruction to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and make directories as follows.

qpic
 |─ data
 │   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

For our implementation, the annotation file have to be converted to the HOIA format. The conversion can be conducted as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python2 can be used for this conversion because vsrl_utils.py in the v-coco repository shows a error with Python3.

V-COCO annotations with the HOIA format, corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json will be generated to annotations directory.

Dependencies and Training

To simplify the steps, we combine the installation of externel dependencies and training into one '.sh' file. You can directly run the codes after rightly preparing the dataset.

# Training on HICO-DET
bash train_hico.sh
# Training on V-COCO
bash train_vcoco.sh

Note that you can refer to the publicly available codebase for the preparation of two datasets.

Pre-trained parameters

OCN uses COCO pretrained models for fair comparisons with previous methods. The pretrained models can be downloaded from DETR repository.

For HICO-DET, you can convert the pre-trained parameters with the following command.

python convert_parameters.py \
        --load_path /PATH/TO/PRETRAIN \
        --save_path /PATH/TO/SAVE

For V-COCO, you can convert the pre-trained parameters with the following command.

python convert_parameters.py \
        --load_path /PATH/TO/PRETRAIN \
        --save_path /PATH/TO/SAVE \
        --dataset vcoco \

Evaluation

The mAP on HICO-DET under the Full set, Rare set and Non-Rare Set will be reported during the training process. Or you can evaluate the performance using commands below:

python main.py \
    --pretrained /PATH/TO/PRETRAINED_MODEL \
    --output_dir /PATH/TO/OUTPUT \
    --hoi \
    --dataset_file hico \
    --hoi_path /PATH/TO/data/hico_20160224_det \
    --num_obj_classes 80 \
    --num_verb_classes 117 \
    --backbone resnet101 \
    --num_workers 4 \
    --batch_size 4 \
    --exponential_hyper 1 \
    --exponential_loss \
    --semantic_similar_coef 1 \
    --verb_loss_type focal \
    --semantic_similar \
    --OCN \
    --eval \

The results for the official evaluation of V-COCO must be obtained by the generated pickle file of detection results.

python generate_vcoco_official.py \
        --param_path /PATH/TO/CHECKPOINT \
        --save_path /PATH/TO/SAVE/vcoco.pickle \
        --hoi_path /PATH/TO/VCOCO/data/v-coco \
        --batch_size 4 \
        --OCN \

Then you should run following codes after modifying the path to get the final performance:

python datasets/vsrl_eval.py

Results

Below we present the results and links for downloading corresponding parameters and logs: (The checkpoints can produce higher results than what are reported in the paper.) We will soon update this table.

Owner
A Ph.D. candidate and a realistic idealist.
Pytorch implementation of Learning with Opponent-Learning Awareness

Pytorch implementation of Learning with Opponent-Learning Awareness using DiCE

Alexis David Jacq 82 Sep 15, 2022
A convolutional recurrent neural network for classifying A/B phases in EEG signals recorded for sleep analysis.

CAP-Classification-CRNN A deep learning model based on Inception modules paired with gated recurrent units (GRU) for the classification of CAP phases

Apurva R. Umredkar 2 Nov 25, 2022
Deep Learning Interviews book: Hundreds of fully solved job interview questions from a wide range of key topics in AI.

This book was written for you: an aspiring data scientist with a quantitative background, facing down the gauntlet of the interview process in an increasingly competitive field. For most of you, the

4.1k Dec 28, 2022
DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates

DeepMetaHandles (CVPR2021 Oral) [paper] [animations] DeepMetaHandles is a shape deformation technique. It learns a set of meta-handles for each given

Liu Minghua 73 Dec 15, 2022
Shuwa Gesture Toolkit is a framework that detects and classifies arbitrary gestures in short videos

Shuwa Gesture Toolkit is a framework that detects and classifies arbitrary gestures in short videos

Google 89 Dec 22, 2022
MapReader: A computer vision pipeline for the semantic exploration of maps at scale

MapReader A computer vision pipeline for the semantic exploration of maps at scale MapReader is an end-to-end computer vision (CV) pipeline designed b

Living with Machines 25 Dec 26, 2022
[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision Kehong Gong*, Bingbing Li*, Jianfeng Zhang*, Ta

256 Dec 28, 2022
Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation Requirements This repository needs mmsegmentation Training To train

Adelaide Intelligent Machines (AIM) Group 7 Sep 12, 2022
E2VID_ROS - E2VID_ROS: E2VID to a real-time system

E2VID_ROS Introduce We extend E2VID to a real-time system. Because Python ROS ca

Robin Shaun 7 Apr 17, 2022
A general-purpose, flexible, and easy-to-use simulator alongside an OpenAI Gym trading environment for MetaTrader 5 trading platform (Approved by OpenAI Gym)

gym-mtsim: OpenAI Gym - MetaTrader 5 Simulator MtSim is a simulator for the MetaTrader 5 trading platform alongside an OpenAI Gym environment for rein

Mohammad Amin Haghpanah 184 Dec 31, 2022
HGCN: Harmonic Gated Compensation Network For Speech Enhancement

HGCN The official repo of "HGCN: Harmonic Gated Compensation Network For Speech Enhancement", which was accepted at ICASSP2022. How to use step1: Calc

ScorpioMiku 33 Nov 14, 2022
Additional functionality for use with fastai’s medical imaging module

fmi Adding additional functionality to fastai's medical imaging module To learn more about medical imaging using Fastai you can view my blog Install g

14 Oct 31, 2022
Scalable, event-driven, deep-learning-friendly backtesting library

...Minimizing the mean square error on future experience. - Richard S. Sutton BTGym Scalable event-driven RL-friendly backtesting library. Build on

Andrew 922 Dec 27, 2022
Implementation of Continuous Sparsification, a method for pruning and ticket search in deep networks

Continuous Sparsification Implementation of Continuous Sparsification (CS), a method based on l_0 regularization to find sparse neural networks, propo

Pedro Savarese 23 Dec 07, 2022
Commonsense Ability Tests

CATS Commonsense Ability Tests Dataset and script for paper Evaluating Commonsense in Pre-trained Language Models Use making_sense.py to run the exper

XUHUI ZHOU 28 Oct 19, 2022
A program to recognize fruits on pictures or videos using yolov5

Yolov5 Fruits Detector Requirements Either Linux or Windows. We recommend Linux for better performance. Python 3.6+ and PyTorch 1.7+. Installation To

Fateme Zamanian 30 Jan 06, 2023
The PyTorch re-implement of a 3D CNN Tracker to extract coronary artery centerlines with state-of-the-art (SOTA) performance. (paper: 'Coronary artery centerline extraction in cardiac CT angiography using a CNN-based orientation classifier')

The PyTorch re-implement of a 3D CNN Tracker to extract coronary artery centerlines with state-of-the-art (SOTA) performance. (paper: 'Coronary artery centerline extraction in cardiac CT angiography

James 135 Dec 23, 2022
A graph neural network (GNN) model to predict protein-protein interactions (PPI) with no sample features

A graph neural network (GNN) model to predict protein-protein interactions (PPI) with no sample features

2 Jul 25, 2022
Interactive dimensionality reduction for large datasets

BlosSOM 🌼 BlosSOM is a graphical environment for running semi-supervised dimensionality reduction with EmbedSOM. You can use it to explore multidimen

19 Dec 14, 2022
PyTorch implementation of the YOLO (You Only Look Once) v2

PyTorch implementation of the YOLO (You Only Look Once) v2 The YOLOv2 is one of the most popular one-stage object detector. This project adopts PyTorc

申瑞珉 (Ruimin Shen) 433 Nov 24, 2022