This repo holds the code for the ICCV 2021 paper: Visual Alignment Constraint for Continuous Sign Language Recognition.

Last update: Dec 19, 2022

Overview

VAC_CSLR

This repo holds the code for the paper: Visual Alignment Constraint for Continuous Sign Language Recognition (ICCV 2021). [paper]

Prerequisites

This project is implemented in PyTorch (>1.8), so please install PyTorch first.
ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.
[Optional] sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to get sclite for evaluation. After installation, create a soft link to sclite:
ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
We also provide a Python evaluation tool for convenience, but sclite provides more detailed statistics.
[Optional] SeanNaren/warp-ctc: at the beginning of this research we adopted warp-ctc for supervision, and we recently found that the PyTorch CTC implementation reaches similar results.
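For readers unfamiliar with CTC decoding, the collapse rule that any CTC decoder applies to a label path can be sketched in a few lines. This is the simpler greedy variant, not beam search and not the ctcdecode API; the function name `ctc_greedy_decode` and the choice of blank index 0 are assumptions made for illustration.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame best-label path: merge repeats, then drop blanks.

    `frame_labels` is the argmax label of each frame. The blank index (0 here)
    is an assumption of this sketch, not a value taken from the repo.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# A blank between two identical labels keeps both occurrences.
path = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_greedy_decode(path))
```

Beam search keeps several candidate paths per frame instead of only the argmax, which is why it needs a dedicated library.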

Data Preparation

Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
After downloading the dataset, extract it to ./dataset/phoenix. We suggest making a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014
The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dict and resize the image sequences.
```
cd ./preprocess
python data_preprocess.py --process-image --multiprocessing
```
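As a rough illustration of what generating a gloss dict involves, the sketch below builds a vocabulary from a few annotation strings. The function name `build_gloss_dict`, the `[index, count]` value layout, and reserving index 0 for the CTC blank are assumptions of this example; the format actually written by data_preprocess.py may differ.

```python
from collections import Counter


def build_gloss_dict(annotations):
    """Map each gloss to [index, count].

    Glosses are indexed in sorted order starting at 1, leaving index 0 free
    for a CTC blank (an assumption of this sketch).
    """
    counts = Counter(
        gloss for sentence in annotations for gloss in sentence.split()
    )
    return {
        gloss: [index, counts[gloss]]
        for index, gloss in enumerate(sorted(counts), start=1)
    }


# Hypothetical annotations, one gloss sequence per video.
annotations = ["REGEN NORD", "NORD WIND"]
print(build_gloss_dict(annotations))
```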

Inference

We provide pretrained models for inference. You can download them from:

Backbone	WER on Dev	WER on Test	Pretrained model
ResNet18	21.2%	22.3%	[Baidu] (passwd: qi83) [Dropbox]

To evaluate the pretrained model, run the command below:
python main.py --load-weights resnet18_slr_pretrained.pt --phase test
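The WER figures in the table are word error rates over gloss sequences. A minimal sketch of the metric for a single sentence, assuming whitespace-separated glosses, is shown below. It is neither the repo's evaluation tool nor sclite, and the name `word_error_rate` and the example glosses are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one gloss")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j glosses of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_gloss in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_gloss in enumerate(hyp, start=1):
            cost = 0 if ref_gloss == hyp_gloss else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference glosses.
print(word_error_rate("MORGEN REGEN NORD WIND", "MORGEN SONNE NORD"))
```

Corpus-level scores are usually computed by summing the errors and the reference lengths over all sentences before dividing, so averaging per-sentence values from this sketch will not necessarily match the reported numbers.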

Training

Configuration values are resolved with the following priority: command line > config file > argparse default values. To train the SLR model on phoenix14, run the command below:

python main.py --work-dir PATH_TO_SAVE_RESULTS --config PATH_TO_CONFIG_FILE --device AVAILABLE_GPUS
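One common way to get this priority order with argparse is to load the config file into a dict, install it as parser defaults, and then parse the command line. The sketch below assumes that approach and is not taken from main.py; the default values and the helper names `build_parser` and `resolve_args` are hypothetical, and the config is passed as an already-loaded dict to keep the example dependency-free.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="priority sketch")
    # argparse defaults form the lowest-priority layer (values are made up).
    parser.add_argument("--work-dir", default="./work_dir/default/")
    parser.add_argument("--phase", default="train")
    parser.add_argument("--device", default="0")
    return parser


def resolve_args(argv, config):
    """Merge settings so that command line > config file > argparse defaults."""
    parser = build_parser()
    # Reject config keys that the parser does not know about.
    known = set(vars(parser.parse_args([])))
    unknown = set(config) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    # Config values replace the argparse defaults...
    parser.set_defaults(**config)
    # ...and anything given explicitly on the command line replaces both.
    return parser.parse_args(argv)


config = {"work_dir": "./work_dir/exp1/", "device": "0,1"}
args = resolve_args(["--device", "1"], config)
print(args.work_dir, args.phase, args.device)
```

Here `device` comes from the command line, `work_dir` from the config dict, and `phase` falls back to the argparse default.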

Feature Extraction

We also provide a feature extraction function that extracts frame-wise features for other research purposes. Run it with:

python main.py --load-weights PATH_TO_PRETRAINED_MODEL --phase features

To Do List

Pure Python evaluation tools.
WAR and WER calculation scripts.

Citation

If you find this repo useful in your research, please consider citing:

@InProceedings{Min_2021_ICCV,
    author    = {Min, Yuecong and Hao, Aiming and Chai, Xiujuan and Chen, Xilin},
    title     = {Visual Alignment Constraint for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11542-11551}
}

Relevant paper

Self-Mutual Distillation Learning for Continuous Sign Language Recognition [paper]

@InProceedings{Hao_2021_ICCV,
    author    = {Hao, Aiming and Min, Yuecong and Chen, Xilin},
    title     = {Self-Mutual Distillation Learning for Continuous Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11303-11312}
}

Acknowledgements

We appreciate the help from Runpeng Cui, Hao Zhou (@Rhythmblue), and Xinzhe Han (@GeraldHan) :)

Owner

Yuecong Min
