This repo contains the official code of our work SAM-SLR, which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

Overview

Skeleton Aware Multi-modal Sign Language Recognition

By Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li and Yun Fu.

Smile Lab @ Northeastern University


This repo contains the official code of Skeleton Aware Multi-modal Sign Language Recognition (SAM-SLR), which ranked 1st in the CVPR 2021 Challenge: Looking at People Large Scale Signer Independent Isolated Sign Language Recognition.

Our paper has been accepted to the CVPR21 Workshop. A preprint version is available on arXiv. Please cite our paper if you find this repo useful in your research.

News

[2021/04/10] Our workshop paper has been accepted. Citation info updated.

[2021/03/24] A preprint version of our paper is released on arXiv.

[2021/03/20] Our work has been verified and announced by the organizers as the 1st place winner of the challenge!

[2021/03/15] The code is released to the public on GitHub.

[2021/03/11] Our team (smilelab2021) ranked 1st on the leaderboards of both tracks.

Data Preparation

Download AUTSL Dataset.

We processed the dataset into six modalities in total: skeleton, skeleton features, RGB frames, flow color, HHA and flow depth.

  1. Please put the original train, val and test videos in the data folder as follows:
    data
    ├── train
    │   ├── signer0_sample1_color.mp4
    │   ├── signer0_sample1_depth.mp4
    │   ├── signer0_sample2_color.mp4
    │   ├── signer0_sample2_depth.mp4
    │   └── ...
    ├── val
    │   └── ...
    └── test
        └── ...
  2. Follow the data_processs/readme.md to process the data.

  3. Use TPose/data_process to extract wholebody pose features.
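Before running the processing scripts, it can help to confirm that every sample has both a color and a depth video. The following is a minimal sketch and not part of the released code. It assumes the `signerX_sampleY_color.mp4` / `signerX_sampleY_depth.mp4` naming shown in the folder tree above; `find_unpaired_samples` and `check_split` are hypothetical helper names.

```python
import os
from collections import defaultdict

# Suffixes taken from the example folder tree; adjust if your files differ.
SUFFIXES = ("_color.mp4", "_depth.mp4")


def find_unpaired_samples(filenames):
    """Return sample ids that are missing their color or depth video."""
    seen = defaultdict(set)
    for name in filenames:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                # Strip the suffix to get the sample id, e.g. signer0_sample1.
                seen[name[: -len(suffix)]].add(suffix)
    return sorted(s for s, found in seen.items() if len(found) < len(SUFFIXES))


def check_split(split_dir):
    """List unpaired samples in one split folder, e.g. data/train."""
    return find_unpaired_samples(os.listdir(split_dir))
```

For example, `check_split("data/train")` returns an empty list when every sample in the train split has both videos.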

Requirements and Docker Image

The code is written using Anaconda Python >= 3.6 and PyTorch 1.7 with OpenCV.

Detailed environment requirements can be found in requirement.txt in each code folder.

For convenience, we provide an NVIDIA Docker image to run our code.

Download Docker Image

Pretrained Models

We provide pretrained models for all modalities to reproduce our submitted results. Please download them using the link below and put them into the corresponding folders.

Download Pretrained Models

Usage

Reproducing the Results Submitted to CVPR21 Challenge

To test our pretrained models, please put them under the corresponding code folders and run the test code as instructed below. To ensemble the tested results and reproduce our final submission, please copy all the result .pkl files to ensemble/ and follow the instructions to ensemble our final outputs.

For step-by-step instructions, please see reproduce.md.
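Copying the result files by hand is error-prone when several modalities are involved. The sketch below shows one way to gather them into ensemble/ with one file per modality. It is an illustration only: `collect_results` is a hypothetical helper, the source paths are placeholders, and the file names the ensemble scripts actually expect should be taken from ensemble/readme.md.

```python
import shutil
from pathlib import Path


def collect_results(result_files, ensemble_dir="ensemble"):
    """Copy each result .pkl into ensemble_dir, renamed after its modality.

    result_files maps a modality name to the path of its result file.
    Returns the list of destination paths.
    """
    out_dir = Path(ensemble_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for modality, src in result_files.items():
        dst = out_dir / f"{modality}.pkl"
        shutil.copy(src, dst)
        copied.append(dst)
    return copied
```

A call might look like `collect_results({"rgb": "Conv3D/results/rgb_test.pkl"})`, where the source path is a placeholder for wherever your test run wrote its output.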

Skeleton Keypoints

The skeleton modality can be trained, finetuned and tested using the code in the SL-GCN/ folder. Please follow the instructions in SL-GCN/readme.md to prepare the skeleton data as four streams (joint, bone, joint_motion, bone_motion).

Basic usage:

python main.py --config /path/to/config/file

To train, finetune and test our models, please change the config path to the corresponding config file. Detailed instructions can be found in SL-GCN/readme.md.
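In multi-stream skeleton models, the four streams are commonly derived from the same keypoint sequence: bone vectors are differences between connected joints, and motion is the frame-to-frame difference. The sketch below illustrates that general idea in plain Python. It is an assumption about the usual construction, not the data generation code in SL-GCN/, and the `parents` list is a made-up toy skeleton.

```python
def bone_stream(joints, parents):
    """Bone vectors: each joint minus the joint it is connected to.

    joints: list of frames, each frame a list of (x, y) tuples per joint.
    parents[v]: index of the joint that joint v connects to (hypothetical).
    """
    return [
        [tuple(c - p for c, p in zip(frame[v], frame[parents[v]]))
         for v in range(len(frame))]
        for frame in joints
    ]


def motion_stream(seq):
    """Frame-to-frame difference; the last frame is padded with zeros."""
    out = []
    for t in range(len(seq) - 1):
        out.append([tuple(b - a for a, b in zip(cur, nxt))
                    for cur, nxt in zip(seq[t], seq[t + 1])])
    if seq:
        out.append([tuple(0 for _ in joint) for joint in seq[-1]])
    return out


# Toy example: 2 frames, 2 joints, joint 1 attached to joint 0.
joints = [[(0, 0), (1, 2)], [(1, 1), (3, 5)]]
parents = [0, 0]
bone = bone_stream(joints, parents)
joint_motion = motion_stream(joints)
bone_motion = motion_stream(bone)
```

Applying `motion_stream` to the joint and bone sequences gives the two motion streams, so all four inputs come from one set of keypoints.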

Skeleton Feature

For the skeleton feature, we propose a Separable Spatial-Temporal Convolution Network (SSTCN) to capture spatio-temporal information from those features.

Please follow the instructions in SSTCN/readme.txt to prepare the data, then train and test the model.

RGB Frames

The RGB frames modality can be trained, finetuned and tested using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_clip.py

python Sign_Isolated_Conv3D_clip_finetune.py

python Sign_Isolated_Conv3D_clip_test.py

Detailed instructions can be found in Conv3D/readme.md.

Optical Flow

The RGB optical flow modality can be trained, finetuned and tested using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_flow_clip.py

python Sign_Isolated_Conv3D_flow_clip_funtine.py

python Sign_Isolated_Conv3D_flow_clip_test.py

Detailed instructions can be found in Conv3D/readme.md.

Depth HHA

The Depth HHA modality can be trained, finetuned and tested using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_hha_clip_mask.py

python Sign_Isolated_Conv3D_hha_clip_mask_finetune.py

python Sign_Isolated_Conv3D_hha_clip_mask_test.py

Detailed instructions can be found in Conv3D/readme.md.

Depth Flow

The Depth Flow modality can be trained, finetuned and tested using the following commands in the Conv3D/ folder.

python Sign_Isolated_Conv3D_depth_flow_clip.py

python Sign_Isolated_Conv3D_depth_flow_clip_finetune.py

python Sign_Isolated_Conv3D_depth_flow_clip_test.py

Detailed instructions can be found in Conv3D/readme.md.
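The four Conv3D modalities (RGB frames, optical flow, depth HHA and depth flow) each follow the same train, finetune, test sequence. The sketch below wraps those commands in a small driver. The script names are copied verbatim from the commands listed in this README; the modality keys, `build_commands` and `run_pipeline` are hypothetical names introduced for illustration. Each script may still need its data paths and checkpoints configured first, as described in Conv3D/readme.md.

```python
import subprocess

# (train, finetune, test) scripts per modality, as listed in this README.
CONV3D_SCRIPTS = {
    "rgb": (
        "Sign_Isolated_Conv3D_clip.py",
        "Sign_Isolated_Conv3D_clip_finetune.py",
        "Sign_Isolated_Conv3D_clip_test.py",
    ),
    "flow": (
        "Sign_Isolated_Conv3D_flow_clip.py",
        "Sign_Isolated_Conv3D_flow_clip_funtine.py",
        "Sign_Isolated_Conv3D_flow_clip_test.py",
    ),
    "hha": (
        "Sign_Isolated_Conv3D_hha_clip_mask.py",
        "Sign_Isolated_Conv3D_hha_clip_mask_finetune.py",
        "Sign_Isolated_Conv3D_hha_clip_mask_test.py",
    ),
    "depth_flow": (
        "Sign_Isolated_Conv3D_depth_flow_clip.py",
        "Sign_Isolated_Conv3D_depth_flow_clip_finetune.py",
        "Sign_Isolated_Conv3D_depth_flow_clip_test.py",
    ),
}


def build_commands(modality, python="python"):
    """Return the train, finetune and test commands for one modality."""
    return [[python, script] for script in CONV3D_SCRIPTS[modality]]


def run_pipeline(modality, workdir="Conv3D"):
    """Run the three stages in order; stop at the first failing stage."""
    for cmd in build_commands(modality):
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_pipeline("hha")` from the repo root would run the three Depth HHA scripts inside Conv3D/ one after another.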

Model Ensemble

For both the RGB and RGBD tracks, the tested results of all modalities need to be ensembled together to generate the final results.

  1. For the RGB track, we use the results from skeleton, skeleton feature, rgb, and flow color modalities to ensemble the final results.

    a. Test the model using newly trained weights or provided pretrained weights.

    b. Copy all the test results to ensemble folder and rename them as their modality names.

    c. Ensemble SL-GCN results from joint, bone, joint motion, bone motion streams in gcn/.

     python ensemble_wo_val.py; python ensemble_finetune.py

    d. Copy test_gcn_w_val_finetune.pkl to ensemble/. Copy RGB, TPose and optical flow results to ensemble/. Ensemble final prediction.

     python ensemble_multimodal_rgb.py
    

    Final predictions are saved in predictions.csv

  2. For the RGBD track, we use the results from skeleton, skeleton feature, rgb, flow color, hha and flow depth modalities to ensemble the final results.

    a. Copy hha and flow depth modalities to ensemble/ folder, then

     python ensemble_multimodal_rgb.py
    

To reproduce our results in the CVPR21 Challenge, we provide .pkl files to ensemble and obtain our final submitted predictions. Detailed instructions can be found in ensemble/readme.md.
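Late fusion of this kind is often done as a weighted sum of per-class scores followed by an argmax. The sketch below shows that generic technique; it is not the logic of the ensemble scripts in this repo. It assumes each .pkl holds a dict mapping a sample name to a list of class scores, and the weights, the helper names and the two-column CSV layout are all placeholders.

```python
import csv
import pickle


def load_scores(path):
    """Load one modality's scores (assumed: {sample_name: [score per class]})."""
    with open(path, "rb") as f:
        return pickle.load(f)


def fuse(scores_by_modality, weights):
    """Weighted sum of per-class scores, then argmax per sample."""
    fused = {}
    for modality, scores in scores_by_modality.items():
        w = weights[modality]
        for name, vec in scores.items():
            acc = fused.setdefault(name, [0.0] * len(vec))
            for i, s in enumerate(vec):
                acc[i] += w * s
    return {name: max(range(len(acc)), key=acc.__getitem__)
            for name, acc in fused.items()}


def write_predictions(preds, path="predictions.csv"):
    """Write one 'sample,label' row per sample, sorted by sample name."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for name in sorted(preds):
            writer.writerow([name, preds[name]])


# Toy scores for two samples and two classes (made-up numbers).
toy = {
    "rgb": {"s1": [0.6, 0.4], "s2": [0.2, 0.8]},
    "skeleton": {"s1": [0.1, 0.9], "s2": [0.3, 0.7]},
}
equal = fuse(toy, {"rgb": 1.0, "skeleton": 1.0})
rgb_heavy = fuse(toy, {"rgb": 1.0, "skeleton": 0.1})
```

In the toy data, changing the weights flips the prediction for `s1`, which is why the per-modality weights matter for the final result.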

License

Licensed under the Creative Commons Zero v1.0 Universal license with the following exceptions:

  • The code is released for academic research use only. Commercial use is prohibited.
  • Published versions (changed or unchanged) must include a reference to the origin of the code.

Citation

If you find this project useful in your research, please cite our paper:

@inproceedings{jiang2021skeleton,
  title={Skeleton Aware Multi-modal Sign Language Recognition},
  author={Jiang, Songyao and Sun, Bin and Wang, Lichen and Bai, Yue and Li, Kunpeng and Fu, Yun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year={2021}
}

@article{jiang2021skeleton,
  title={Skeleton Aware Multi-modal Sign Language Recognition},
  author={Jiang, Songyao and Sun, Bin and Wang, Lichen and Bai, Yue and Li, Kunpeng and Fu, Yun},
  journal={arXiv preprint arXiv:2103.08833},
  year={2021}
}

Reference

https://github.com/Sun1992/SSTCN-for-SLR

https://github.com/jin-s13/COCO-WholeBody

https://github.com/open-mmlab/mmpose

https://github.com/0aqz0/SLR

https://github.com/kchengiva/DecoupleGCN-DropGraph

https://github.com/HRNet/HRNet-Human-Pose-Estimation

https://github.com/charlesCXK/Depth2HHA
