Diverse Object-Scene Compositions For Zero-Shot Action Recognition

This repository contains the source code for zero-shot action recognition using object-scene compositions.

This repository includes:

object and scene predictions for UCF-101, UCF-Sports, J-HMDB
script to retrieve object and scene predictions for Kinetics
scripts to obtain word and sentence embeddings for all datasets used and for object-scene compositions
script to obtain action predictions for any given action dataset from its object and scene predictions and its action labels

Software used

python 3.8.8
pytorch 1.7.1
numpy 1.19.2
fasttext 0.9.2
sentence-transformers 1.2.0
scikit-learn 0.24.1
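
To confirm that an environment matches the versions listed above, a small check along these lines may help. This is a sketch, not part of the repository: the function names are hypothetical, and the PyPI distribution names (for example "torch" for PyTorch) are assumptions.

```python
from importlib import metadata

# Versions listed in this README. The distribution names are assumptions
# (PyTorch, for instance, is published on PyPI as "torch").
EXPECTED = {
    "torch": "1.7.1",
    "numpy": "1.19.2",
    "fasttext": "0.9.2",
    "sentence-transformers": "1.2.0",
    "scikit-learn": "0.24.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected=EXPECTED, lookup=installed_version):
    """Map each package to (expected, found) wherever the two differ."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions().items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

The comparison is an exact string match, so a build that reports a local suffix in its version string would be flagged as a mismatch even if the base version agrees.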

Downloading the object and scene predictions for Kinetics

While the action labels and video annotations for Kinetics are already present in the repo, the object and scene predictions need to be retrieved using:

bash kineticsdownload.sh

Obtaining word and sentence embeddings for all datasets

To compute the word and sentence embeddings for all the video and image datasets, run:

python getfasttextembs.py; python getbertembs.py

This will additionally compute the embeddings for all object-scene compositions and the similarities between all action labels and object-scene compositions.
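
As a rough illustration of what such a similarity computation can look like, the sketch below builds one embedding per object-scene pair and compares it with each action label. It is not taken from getfasttextembs.py or getbertembs.py: averaging the object and scene embeddings, using cosine similarity, and the function names are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length, guarding against zero vectors."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def composition_embeddings(obj_emb, scene_emb):
    """Average every object embedding with every scene embedding.

    obj_emb: (O, D), scene_emb: (S, D). Returns (O * S, D) in object-major
    order, i.e. row o * S + s holds the pair (object o, scene s).
    Averaging is an assumption of this sketch.
    """
    pairs = (obj_emb[:, None, :] + scene_emb[None, :, :]) / 2.0
    return pairs.reshape(-1, obj_emb.shape[1])


def cosine_similarities(action_emb, comp_emb):
    """Cosine similarity between each action and each composition: (A, O * S)."""
    return l2_normalize(action_emb) @ l2_normalize(comp_emb).T


# Toy 2-D embeddings standing in for real word or sentence embeddings.
objects = np.array([[1.0, 0.0], [0.0, 1.0]])
scenes = np.array([[1.0, 0.0]])
actions = np.array([[1.0, 0.0]])

compositions = composition_embeddings(objects, scenes)
print(cosine_similarities(actions, compositions))
```

With real data, the toy arrays would be replaced by the embeddings of the object, scene, and action label names.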

Using the main script

The main script can be run using the default arguments as follows:

python zero-shot-actions.py
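
For intuition about how object and scene predictions can be turned into action predictions without any action training data, here is a minimal sketch. It does not reproduce zero-shot-actions.py: the scoring rule (similarity weighted by the product of object and scene probabilities), the optional top_k selection, and all names are assumptions for illustration.

```python
import numpy as np


def score_actions(obj_probs, scene_probs, sims, top_k=None):
    """Score each action for one video.

    obj_probs: (O,) object probabilities, scene_probs: (S,) scene
    probabilities, sims: (A, O * S) action-to-composition similarities in
    object-major order. Only the top_k most likely compositions contribute
    when top_k is given.
    """
    # Probability of each object-scene pair, treating the two as independent.
    comp_probs = np.outer(obj_probs, scene_probs).reshape(-1)
    if top_k is not None and top_k < comp_probs.size:
        keep = np.argsort(comp_probs)[-top_k:]
        mask = np.zeros_like(comp_probs)
        mask[keep] = 1.0
        comp_probs = comp_probs * mask
    return sims @ comp_probs


def predict_action(obj_probs, scene_probs, sims, labels, top_k=None):
    """Return the label with the highest score."""
    scores = score_actions(obj_probs, scene_probs, sims, top_k=top_k)
    return labels[int(np.argmax(scores))]


# Toy example: 2 objects, 2 scenes, 2 actions (labels are made up).
obj_probs = np.array([0.9, 0.1])
scene_probs = np.array([0.2, 0.8])
sims = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0]])
print(predict_action(obj_probs, scene_probs, sims, ["action_a", "action_b"]))
```

Restricting the sum to the most likely compositions keeps low-confidence pairs from diluting the score; whether and how the actual script does this is best checked through its flags.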

There are several flags that can be used. Descriptions for these can be shown by running:

python zero-shot-actions.py --help

Lastly, a helper script is available to compute results for different datasets and different flag values:

python make_results.py
