Working demo of the Multi-class and Anomaly classification model using the CLIP feature space

Last update: Jun 05, 2022

👁️ Hindsight AI: Crime Classification With Clip

About

For educational purposes only. This is a recursive neural net trained either to classify specific crime classes from the UCF-Crime dataset or to perform general anomaly detection. The model takes as input images that have been encoded into the CLIP image embedding space.

Introducing CLIP

The model used in our application, CLIP (developed by OpenAI), is a general-purpose image-text model that encodes an image into an embedding space shared with text, so that raw text strings can be matched to the contents of the image. Its design and training give it strong zero-shot performance, i.e. it can handle image classification problems it was not explicitly trained on. The following summary of the model is taken from A. Radford et al.:

While typical image classification models train an image feature extractor and a linear classifier to predict a label, CLIP trains an image encoder and text encoder to predict the correct pairings of a batch of (image, text) training examples. At test time the learned text encoder synthesizes a zero-shot linear classifier by embedding the names or descriptions of the target dataset’s classes.
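The zero-shot step described above can be illustrated with a minimal sketch. It is not the code used in this repo: the embeddings are toy three-dimensional vectors standing in for real CLIP outputs, the class prompts are invented for illustration, and the function names and the `logit_scale` value of 100 are assumptions. The idea is only that each class description is embedded as text, and an image is assigned to the class whose text embedding has the highest cosine similarity with the image embedding.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_embedding, class_text_embeddings, logit_scale=100.0):
    """Return a probability per class label from cosine similarities.

    class_text_embeddings maps a label (the text prompt) to its embedding.
    logit_scale is an assumed temperature applied before the softmax.
    """
    image = l2_normalize(image_embedding)
    logits = {}
    for label, text_embedding in class_text_embeddings.items():
        text = l2_normalize(text_embedding)
        logits[label] = logit_scale * sum(a * b for a, b in zip(image, text))

    # Numerically stable softmax over the class logits.
    max_logit = max(logits.values())
    exps = {label: math.exp(value - max_logit) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Toy embeddings (hypothetical); real CLIP embeddings have hundreds of dimensions.
class_text_embeddings = {
    "a photo of a robbery": [0.9, 0.1, 0.0],
    "a photo of a normal street": [0.1, 0.9, 0.2],
}
image_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_embedding, class_text_embeddings)
best_label = max(probs, key=probs.get)
print(best_label)
```

In a real pipeline the vectors would come from the CLIP image and text encoders; the similarity and softmax steps stay the same.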

Installation

Clone the repo and install the required packages listed in the required.txt file. Running classifier.py starts an interactive application that attempts to perform anomaly detection or multi-class classification on the videos found in the 'Videos' directory.

The scripts that were used to create the image sequence database from the video files of the UCF-Crime dataset, as well as the training scripts and models, can be found in the src directory.
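Before launching the application, it can help to confirm that the 'Videos' directory actually contains files to classify. The sketch below is a hypothetical helper, not part of the repo: the function name `find_videos` and the list of file extensions are assumptions, and classifier.py may discover its inputs differently.

```python
from pathlib import Path

# Assumed set of video extensions; adjust to match the files you actually use.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mkv"}


def find_videos(video_dir="Videos"):
    """Return a sorted list of video files directly inside video_dir."""
    root = Path(video_dir)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS
    )


videos = find_videos()
print(f"Found {len(videos)} video file(s) in 'Videos'")
```

An empty result means either the directory is missing or it holds no files with the assumed extensions.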

Owner

Miles Tweed