Automatic Video Captioning Evaluation Metric --- EMScore

Last update: Nov 28, 2022

Overview

For an illustration, EMScore can be computed as:
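
As a rough sketch of the idea, assuming the formulation described in the EMScore paper (a coarse-grained score from global video and caption embeddings, averaged with a fine-grained F-score from frame-to-word matching), the reference-free computation might look like the following. The function names are hypothetical, the embeddings are toy vectors standing in for CLIP features, and idf weighting and the reference-based variant are left out.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def emscore_sketch(frame_embs, word_embs, video_emb, caption_emb):
    """Toy reference-free score: mean of coarse and fine-grained matching."""
    frames = [l2_normalize(f) for f in frame_embs]
    words = [l2_normalize(w) for w in word_embs]

    # Coarse-grained: similarity of the global video and caption embeddings.
    coarse = dot(l2_normalize(video_emb), l2_normalize(caption_emb))

    # Fine-grained precision: each word is matched to its most similar frame.
    precision = sum(max(dot(w, f) for f in frames) for w in words) / len(words)
    # Fine-grained recall: each frame is matched to its most similar word.
    recall = sum(max(dot(f, w) for w in words) for f in frames) / len(frames)
    if precision + recall == 0:
        fine = 0.0  # avoid dividing by zero when nothing matches
    else:
        fine = 2 * precision * recall / (precision + recall)

    return {"coarse": coarse, "fine": fine, "full": (coarse + fine) / 2}


# Two frames and two words that align perfectly give scores close to 1.
scores = emscore_sketch(
    frame_embs=[[1.0, 0.0], [0.0, 1.0]],
    word_embs=[[1.0, 0.0], [0.0, 1.0]],
    video_emb=[1.0, 1.0],
    caption_emb=[1.0, 1.0],
)
print(scores)
```

The global embeddings are passed in explicitly so the sketch does not commit to a particular pooling strategy.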

Installation

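Modify the encode_text() function in CLIP/clip/model.py as follows, so that it returns per-token (local) embeddings when local=True and the usual sentence-level embedding otherwise: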

def encode_text(self, text, local=False):
    x = self.token_embedding(text).type(self.dtype)  # [batch_size, n_ctx, d_model]

    x = x + self.positional_embedding.type(self.dtype)
    x = x.permute(1, 0, 2)  # NLD -> LND
    x = self.transformer(x)
    x = x.permute(1, 0, 2)  # LND -> NLD
    x = self.ln_final(x).type(self.dtype)

    if local:
        x = x @ self.text_projection
    else:
        # x.shape = [batch_size, n_ctx, transformer.width]
        # take features from the eot embedding (eot_token is the highest number in each sequence)
        x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection
  
    return x
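
To make the two return modes concrete, here is a minimal sketch on plain Python lists rather than tensors. It relies on the fact noted in the code comment above: the end-of-text token has the largest id, so an argmax over the token ids finds its position. The helper name pool_text_features is hypothetical, the two word ids are arbitrary, and the text projection is omitted.

```python
SOT_ID, EOT_ID, PAD_ID = 49406, 49407, 0  # CLIP's start, end, and padding ids


def pool_text_features(token_ids, token_features, local=False):
    """Mimic the two return modes of the modified encode_text.

    token_ids:      [batch][n_ctx] integer token ids
    token_features: [batch][n_ctx][dim] per-token features
    """
    if local:
        # local=True: keep one feature vector per token position.
        return token_features
    pooled = []
    for ids, feats in zip(token_ids, token_features):
        # argmax over the ids gives the end-of-text position.
        eot_pos = max(range(len(ids)), key=lambda i: ids[i])
        pooled.append(feats[eot_pos])
    return pooled


ids = [[SOT_ID, 320, 1929, EOT_ID, PAD_ID, PAD_ID]]
feats = [[[float(i)] for i in range(6)]]  # feature value = token position

print(pool_text_features(ids, feats))                    # [[3.0]]
print(len(pool_text_features(ids, feats, local=True)[0]))  # 6
```

With local=False the result has one vector per caption; with local=True it keeps all n_ctx positions, which is what word-level matching needs.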

Push the modified CLIP code to a repository under your own GitHub account.

Install

$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/$Your_GitHub_name/CLIP

Replace cudatoolkit=11.0 above with the appropriate CUDA version on your machine or cpuonly when installing on a machine without a GPU.

Usage:

A general demo

python demo.py

VATEX-EVAL

Download the files from the following link and save them in a storage directory:

https://drive.google.com/drive/folders/1jAfZZKEgkMEYFF2x1mhYo39nH-TNeGm6?usp=sharing

Run the code:

python VATEX-EVAL-demo.py --storage_path $storage_path --use_n_refs 1 --use_feat_cache --use_idf
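
The --use_idf flag suggests that token matches can be weighted by inverse document frequency, so that rare, informative words count more than frequent ones. The exact formula used by the script is not shown here; the sketch below assumes the plus-one smoothed idf common in BERTScore-style metrics, and both function names are hypothetical.

```python
import math
from collections import Counter


def idf_weights(tokenized_refs):
    """Smoothed idf per token: log((N + 1) / (df + 1)) over N captions."""
    num_docs = len(tokenized_refs)
    doc_freq = Counter()
    for tokens in tokenized_refs:
        doc_freq.update(set(tokens))  # count each token once per caption
    return {
        tok: math.log((num_docs + 1) / (df + 1))
        for tok, df in doc_freq.items()
    }


def idf_weighted_mean(scores, tokens, idf, unseen_weight):
    """Average per-token match scores, weighting each token by its idf."""
    weights = [idf.get(tok, unseen_weight) for tok in tokens]
    total = sum(weights)
    if total == 0:
        return sum(scores) / len(scores)  # fall back to a plain mean
    return sum(w * s for w, s in zip(weights, scores)) / total


refs = [
    ["a", "man", "rides", "a", "horse"],
    ["a", "dog", "runs"],
    ["a", "man", "cooks"],
]
idf = idf_weights(refs)
# "a" appears in every caption, so its weight is zero and it is ignored.
print(idf_weighted_mean([0.9, 0.1], ["a", "horse"], idf, unseen_weight=math.log(4)))
```

In this toy corpus the score for "horse" dominates the weighted mean because "a" carries no weight.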

ActivityNet-FOIL

Download the files from the following link and save them in a storage directory:

https://drive.google.com/drive/folders/1oY9EJiEi_db_1GH-R33JDqfE8txffKR3?usp=sharing

Run the code:

python ActivityNet-FOIL_demo.py --storage_path $storage_path --use_references --use_idf

Others

If you want to extract the video embeddings yourself:

python extract_video_embeddings.py --videos_path $your_video_path --save_path $your_storage_path --backbone 'ViT-B/32'

Owner

Yaya Shi
