American Sign Language (ASL) classification on static images using graph neural networks.

Last update: Nov 09, 2022

Overview

SignLangGNN

When GNNs 💜 MediaPipe. This is a starter project in which I take a traditional image classification problem, ASL sign classification, and recast it as a graph classification problem. The twist is that the model works on a graph generated from each hand image using MediaPipe rather than on the raw pixels. From that graph I extract the {x, y, z} coordinates of the nodes and the edge index describing their connections.
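The conversion described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function name `build_hand_graph` is hypothetical, and the `HAND_EDGES` list is transcribed from MediaPipe's 21-landmark hand model on the assumption that the project uses that topology unchanged. The sketch uses plain Python lists so it runs without any dependencies.

```python
# Hand topology as (start, end) landmark index pairs. This list is transcribed
# from MediaPipe's 21-landmark hand model and is an assumption about what the
# project uses; in practice it could be read from
# mediapipe.solutions.hands.HAND_CONNECTIONS instead.
HAND_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),          # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),          # index finger
    (5, 9), (9, 10), (10, 11), (11, 12),     # middle finger
    (9, 13), (13, 14), (14, 15), (15, 16),   # ring finger
    (13, 17), (17, 18), (18, 19), (19, 20),  # pinky
    (0, 17),                                 # palm base
]


def build_hand_graph(landmarks):
    """Turn 21 (x, y, z) landmarks into node features and a COO edge index.

    Hypothetical helper: returns (node_features, edge_index), where
    node_features is a 21 x 3 nested list and edge_index is a pair of
    lists [sources, targets].
    """
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    node_features = [[float(c) for c in point] for point in landmarks]
    # Store every undirected connection in both directions, which is the
    # usual way to represent an undirected graph in COO format.
    sources = [a for a, b in HAND_EDGES] + [b for a, b in HAND_EDGES]
    targets = [b for a, b in HAND_EDGES] + [a for a, b in HAND_EDGES]
    return node_features, [sources, targets]


# Dummy landmarks standing in for a real MediaPipe detection.
dummy_landmarks = [(i / 20, 0.5, 0.0) for i in range(21)]
node_features, edge_index = build_hand_graph(dummy_landmarks)
```

With PyTorch Geometric installed, the two outputs could then be wrapped as `Data(x=torch.tensor(node_features), edge_index=torch.tensor(edge_index, dtype=torch.long), y=label)`, giving one graph per image for the classifier.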

Project Structure

--------- Data
            |___ CSVs # coordinates extracted from each image
            |___ raw
                   |___ train.csv
                   |___ valid.csv
                   |___ test.csv 
            |___ ImageData
                   |___ asl_alphabet_test
                            |___ A/
                            |___ B/ 
                            ....
                            |___ space

                   |___ asl_alphabet_train
            |
            |___ Models # the GNN models
            |___ src
                   |__ dataset.py # pyg custom data
                   |__ train.py   # train loop
                   |__ utils.py   # different utility functions
            |
            |___ main.py # from data to train
            |___ run.py  # real time video visualization

I used PyTorch and PyTorch Geometric for the project. To view the results in detail, head over to the IPYNB folder and open the first notebook. To run the project, first clone the repo with this command:

git clone https://github.com/Anindyadeep/SignLangGNN

After that, run main.py with the command below. Everything else is handled automatically, provided all the essential libraries are installed.

python3 main.py

Initial Results

Training and validation went smoothly: a very simple base model, trained for 8 epochs, reached a training accuracy of 0.85, a validation accuracy of 0.86, and a test accuracy of 0.84. The model is still confused by some kinds of examples, which suggests it is currently vulnerable to adversarial inputs.

Improvements

These are the improvements we can make to this project:

Improved GNN models. We can build more robust and complex models to improve performance.
Adding edge features. Edge features such as the distance between two nodes and the angle between them could improve the model's performance.
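As a rough sketch of the edge-feature idea, the snippet below computes a distance and an angle for each edge. The function name `edge_features` is hypothetical, and the angle definition is only one possible reading of "the angle between two nodes": here it is taken as the direction of the edge in the image (x-y) plane, which is an assumption.

```python
import math


def edge_features(nodes, edges):
    """Return [distance, angle] for each (start, end) edge.

    distance: Euclidean distance between the two nodes in 3D.
    angle:    direction of the edge in the x-y plane, in radians
              (an assumed definition, chosen for illustration).
    """
    features = []
    for a, b in edges:
        dx, dy, dz = (nodes[b][i] - nodes[a][i] for i in range(3))
        distance = math.sqrt(dx * dx + dy * dy + dz * dz)
        angle = math.atan2(dy, dx)
        features.append([distance, angle])
    return features


# Toy example with three nodes and two edges.
nodes = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 2.0)]
edges = [(0, 1), (1, 2)]
features = edge_features(nodes, edges)
```

In PyTorch Geometric such a list would typically be passed as `edge_attr`, with one row per column of `edge_index`, so a graph that stores each connection in both directions needs the features computed for both directions as well.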

Future Work

Temporal graph neural networks could yield a more robust and accurate model for this kind of problem. That requires temporal data such as videos instead of images, so that we can generate static temporal graphs and treat the task as a dynamic graph sequence problem.
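One way to picture such a sequence, assuming "static temporal graph" means a fixed set of connections whose node coordinates change from frame to frame, is the sketch below. The function name `build_static_temporal_signal` and the returned dictionary layout are hypothetical and only illustrate the data shape; nothing like this exists in the project yet.

```python
def build_static_temporal_signal(frames, edges):
    """Bundle per-frame node coordinates with one shared edge index.

    frames: list of frames, each a list of (x, y, z) node coordinates.
    edges:  list of (start, end) pairs shared by every frame.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    num_nodes = len(frames[0])
    for t, frame in enumerate(frames):
        # The topology is fixed, so every frame must have the same nodes.
        if len(frame) != num_nodes:
            raise ValueError(f"frame {t} has {len(frame)} nodes, expected {num_nodes}")
    # Store each undirected connection in both directions.
    sources = [a for a, b in edges] + [b for a, b in edges]
    targets = [b for a, b in edges] + [a for a, b in edges]
    features = [[[float(c) for c in point] for point in frame] for frame in frames]
    return {"edge_index": [sources, targets], "features": features}


# Toy example: a 3-node chain observed over two video frames.
toy_edges = [(0, 1), (1, 2)]
toy_frames = [
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (0.1, 0.1, 0.0), (0.2, 0.1, 0.0)],
]
signal = build_static_temporal_signal(toy_frames, toy_edges)
```

Libraries such as PyTorch Geometric Temporal offer containers for this kind of data (for example `StaticGraphTemporalSignal`), which would be a natural replacement for the plain dictionary used here.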
