Video-Captioning - A machine learning project that generates captions for video frames, describing the relationships between the objects in the video

Last update: Jan 23, 2022

Related tags

Deep Learning Video-Captioning

Overview

Video-Captioning

A machine learning project that generates captions for video frames, describing the relationships between the objects in the video.

Approach

Our framework uses a sequence-to-sequence model to predict visual relationships in video: the input is a sequence of video frames and the output is a relation triplet < object1 − relationship − object2 > that describes the video. This extends the sequence-to-sequence modelling approach to inputs that are sequences of video frames.

Figure: Bidirectional LSTM layer (coloured red) encodes visual feature inputs, and the LSTM layer (coloured green) decodes the features into a sequence of words.
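
To make the output format concrete, here is a minimal sketch of how a decoded sequence of three ids could be turned back into a readable triplet. It does not reproduce the project's model code: the label maps, the function names and the assumption that the decoder emits exactly one id each for object1, relationship and object2 are all illustrative.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical label -> id maps, in the style of the project's JSON files.
OBJECT_IDS: Dict[str, int] = {"person": 0, "dog": 1, "ball": 2}
RELATIONSHIP_IDS: Dict[str, int] = {"chase": 0, "kick": 1}


def invert(label_to_id: Dict[str, int]) -> Dict[int, str]:
    """Turn a label -> id map into an id -> label map."""
    return {idx: label for label, idx in label_to_id.items()}


def decode_triplet(
    ids: Sequence[int],
    object_ids: Dict[str, int],
    relationship_ids: Dict[str, int],
) -> Tuple[str, str, str]:
    """Map (object1_id, relationship_id, object2_id) back to labels."""
    if len(ids) != 3:
        raise ValueError("expected exactly three ids")
    id_to_object = invert(object_ids)
    id_to_relationship = invert(relationship_ids)
    object1, relationship, object2 = ids
    return (
        id_to_object[object1],
        id_to_relationship[relationship],
        id_to_object[object2],
    )


triplet = decode_triplet([1, 0, 2], OBJECT_IDS, RELATIONSHIP_IDS)
print(" - ".join(triplet))  # dog - chase - ball
```

Keeping objects and relationships in separate maps mirrors the two label files described under Training, so the same id can mean an object in the first and third positions and a relationship in the second.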

Results

Python Dependencies

Pandas
Keras
TensorFlow
NumPy
albumentations
Pillow

Procedure

Training

For training the model, run the script train.py.

  python train.py

To train on your own dataset, save your data in a directory (see the data folder for the expected format) and update the following JSON files:

object1_object2.json: A dictionary of objects, with object labels as keys and ids as values.
relationship.json: A dictionary of relationships, with relationship labels as keys and ids as values.
training_annotations.json: A dictionary of the videos in the training data, with video ids as keys and a list of annotations as values.
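
Before training on custom data, it can help to check that the three files agree with each other. The sketch below assumes that each annotation is a list of three ids in the order object1, relationship, object2; the data folder remains the authority on the real format. The helper names and the demo labels and video ids are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_json(path):
    """Read one JSON file into a Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_annotations(annotations, object_ids, relationship_ids):
    """Return a list of human-readable problems (empty if none are found)."""
    valid_objects = set(object_ids.values())
    valid_relationships = set(relationship_ids.values())
    problems = []
    for video_id, items in annotations.items():
        for item in items:
            if len(item) != 3:
                problems.append(f"{video_id}: expected 3 ids, got {item}")
                continue
            object1, relationship, object2 = item
            if object1 not in valid_objects or object2 not in valid_objects:
                problems.append(f"{video_id}: unknown object id in {item}")
            if relationship not in valid_relationships:
                problems.append(f"{video_id}: unknown relationship id in {item}")
    return problems


def check_directory(directory):
    """Load the three files from a data directory and cross-check them."""
    directory = Path(directory)
    return check_annotations(
        load_json(directory / "training_annotations.json"),
        load_json(directory / "object1_object2.json"),
        load_json(directory / "relationship.json"),
    )


def write_demo_files(directory):
    """Write small, made-up example files (assumed layout, not real data)."""
    files = {
        "object1_object2.json": {"person": 0, "dog": 1, "ball": 2},
        "relationship.json": {"chase": 0, "kick": 1},
        "training_annotations.json": {
            "video_0001": [[1, 0, 2]],
            "video_0002": [[0, 1, 2], [0, 7, 1]],  # 7 is not a known relationship
        },
    }
    for name, content in files.items():
        (Path(directory) / name).write_text(json.dumps(content), encoding="utf-8")


with tempfile.TemporaryDirectory() as demo_dir:
    write_demo_files(demo_dir)
    for problem in check_directory(demo_dir):
        print(problem)  # video_0002: unknown relationship id in [0, 7, 1]
```

To check a real dataset, call check_directory with the path to your own data directory instead of the temporary demo directory.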

When running the script, provide the path to your data directory.

  python train.py --train_data

Testing

For testing the model or making predictions on your own dataset, run the script eval.py.

  python eval.py --test_data

The results are saved to the CSV file 'test_data_predictions.csv'.
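
To inspect the predictions afterwards, a short script like the following may be enough. It assumes the CSV file starts with a header row and makes no assumption about which columns eval.py writes; the column names in the demo file are invented purely so the sketch runs on its own.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def preview_predictions(path, limit=5):
    """Return the column names and the first `limit` rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, limit))
        return reader.fieldnames, rows


with tempfile.TemporaryDirectory() as demo_dir:
    # Hypothetical stand-in for 'test_data_predictions.csv'.
    demo_path = Path(demo_dir) / "test_data_predictions.csv"
    demo_path.write_text(
        "video_id,object1,relationship,object2\n"
        "video_0001,dog,chase,ball\n"
        "video_0002,person,kick,ball\n",
        encoding="utf-8",
    )
    columns, rows = preview_predictions(demo_path, limit=1)
    print(columns)
    print(rows[0])
```

With a real run, pass the path of the generated 'test_data_predictions.csv' to preview_predictions and read the column names from its first return value.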
