OMNIVORE is a single vision model for many different visual modalities

Related tags

Deep Learningomnivore
Overview

Omnivore: A Single Model for Many Visual Modalities

PWC PWC PWC PWC PWC

[paper][website]

OMNIVORE is a single vision model for many different visual modalities. It learns to construct representations that are aligned across visual modalities, without requiring training data that specifies correspondences between those modalities. Using OMNIVORE’s shared visual representation, we successfully identify nearest neighbors of left: an image (ImageNet-1K validation set) in vision datasets that contain right: depth maps (ImageNet-1K training set), single-view 3D images (ImageNet-1K training set), and videos (Kinetics-400 validation set).

This repo contains the code to run inference with a pretrained model on an image, video or RGBD image.

Usage

Setup and Installation

conda create --name omnivore python=3.8
conda activate omnivore
conda install pytorch=1.9.0 torchvision=0.10.0 torchaudio=0.9.0 cudatoolkit=11.1 -c pytorch -c nvidia
conda install -c conda-forge -c pytorch -c defaults apex
conda install pytorchvideo

To run the notebook you may also need to install the follwing:

conda install jupyter nb_conda ipykernel
python -m ipykernel install --user --name omnivore

Run Inference

Follow the inference_tutorial.ipynb tutorial locally or Open in Colab for step by step instructions on how to run inference with an image, video and RGBD image.

Model Zoo

Name IN1k Top 1 Kinetics400 Top 1 SUN RGBD Top 1 Model
Omnivore Swin T 81.2 78.9 62.3 weights
Omnivore Swin S 83.4 82.2 64.6 weights
Omnivore Swin B 84.0 83.3 65.4 weights
Omnivore Swin B (IN21k) 85.3 84.0 67.2 weights
Omnivore Swin L (IN21k) 86.0 84.1 67.1 weights

Numbers are based on Table 2. and Table 4. in the Omnivore Paper.

Torch Hub

Models can be loaded via torch hub e.g.

model = torch.hub.load("facebookresearch/omnivore", model="omnivore_swinB")

The class mappings for the datasets can be downloaded as follows:

wget https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json 
wget https://dl.fbaipublicfiles.com/pyslowfast/dataset/class_names/kinetics_classnames.json 
wget https://dl.fbaipublicfiles.com/omnivore/sunrgbd_classnames.json

Citation

If this work is helpful in your research, please consider starring us and citing:

@article{girdhar2022omnivore,
  title={{Omnivore: A Single Model for Many Visual Modalities}},
  author={Girdhar, Rohit and Singh, Mannat and Ravi, Nikhila and van der Maaten, Laurens and Joulin, Armand and Misra, Ishan},
  journal={arXiv preprint arXiv:2201.08377},
  year={2022}
}

Contributing

We welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more information.

License

Omnivore is released under the CC-BY-NC 4.0 license. See LICENSE for additional details. However the Swin Transformer implementation is additionally licensed under the Apache 2.0 license (see NOTICE for additional details).

Owner
Meta Research
Meta Research
PyTorch implementation of EGVSR: Efficcient & Generic Video Super-Resolution (VSR)

This is a PyTorch implementation of EGVSR: Efficcient & Generic Video Super-Resolution (VSR), using subpixel convolution to optimize the inference speed of TecoGAN VSR model. Please refer to the offi

789 Jan 04, 2023
An open source app to help calm you down when needed.

By: Seanpm2001, Et; Al. Top README.md Read this article in a different language Sorted by: A-Z Sorting options unavailable ( af Afrikaans Afrikaans |

Sean P. Myrick V19.1.7.2 2 Oct 24, 2022
State of the Art Neural Networks for Generative Deep Learning

pyradox-generative State of the Art Neural Networks for Generative Deep Learning Table of Contents pyradox-generative Table of Contents Installation U

Ritvik Rastogi 8 Sep 29, 2022
Generic ecosystem for feature extraction from aerial and satellite imagery

Note: Robosat is neither maintained not actively developed any longer by Mapbox. See this issue. The main developers (@daniel-j-h, @bkowshik) are no l

Mapbox 1.9k Jan 06, 2023
✔️ Visual, reactive testing library for Julia. Time machine included.

PlutoTest.jl (alpha release) Visual, reactive testing library for Julia A macro @test that you can use to verify your code's correctness. But instead

Pluto 68 Dec 20, 2022
[ICCV 2021] Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation

ADDS-DepthNet This is the official implementation of the paper Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation I

LIU_LINA 52 Nov 24, 2022
Implementation of Nalbach et al. 2017 paper.

Deep Shading Convolutional Neural Networks for Screen-Space Shading Our project is based on Nalbach et al. 2017 paper. In this project, a set of buffe

Marcel Santana 17 Sep 08, 2022
Use your Philips Hue lights as Racing Flags. Works with Assetto Corsa, Assetto Corsa Competizione and iRacing.

phue-racing-flags Use your Philips Hue lights as Racing Flags. Explore the docs » Report Bug · Request Feature Table of Contents About The Project Bui

50 Sep 03, 2022
This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

This is an official pytorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv] Overview Content Prerequisites Data Prep

268 Jan 09, 2023
An open-source outlier detection package by Getcontact Data Team

pyfbad The pyfbad library supports anomaly detection projects. An end-to-end anomaly detection application can be written using the source codes of th

Teknasyon Tech 41 Dec 27, 2022
DIT is a DTLS MitM proxy implemented in Python 3. It can intercept, manipulate and suppress datagrams between two DTLS endpoints and supports psk-based and certificate-based authentication schemes (RSA + ECC).

DIT - DTLS Interception Tool DIT is a MitM proxy tool to intercept DTLS traffic. It can intercept, manipulate and/or suppress DTLS datagrams between t

52 Nov 30, 2022
PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning This is the PyTorch implementation of our paper: FeatMatch: Feature-Based Augmentat

43 Nov 19, 2022
DIR-GNN - Discovering Invariant Rationales for Graph Neural Networks

DIR-GNN "Discovering Invariant Rationales for Graph Neural Networks" (ICLR 2022)

Ying-Xin (Shirley) Wu 70 Nov 13, 2022
Ready-to-use code and tutorial notebooks to boost your way into few-shot image classification.

Easy Few-Shot Learning Ready-to-use code and tutorial notebooks to boost your way into few-shot image classification. This repository is made for you

Sicara 399 Jan 08, 2023
Implementation of the Remixer Block from the Remixer paper, in Pytorch

Remixer - Pytorch Implementation of the Remixer Block from the Remixer paper, in Pytorch. It claims that substituting the feedforwards in transformers

Phil Wang 35 Aug 23, 2022
This repository contains demos I made with the Transformers library by HuggingFace.

Transformers-Tutorials Hi there! This repository contains demos I made with the Transformers library by 🤗 HuggingFace. Currently, all of them are imp

3.5k Jan 01, 2023
A state of the art of new lightweight YOLO model implemented by TensorFlow 2.

CSL-YOLO: A New Lightweight Object Detection System for Edge Computing This project provides a SOTA level lightweight YOLO called "Cross-Stage Lightwe

Miles Zhang 54 Dec 21, 2022
The official implementation of Equalization Loss v1 & v2 (CVPR 2020, 2021) based on MMDetection.

The Equalization Losses for Long-tailed Object Detection and Instance Segmentation This repo is official implementation CVPR 2021 paper: Equalization

Jingru Tan 129 Dec 16, 2022
A simple and useful implementation of LPIPS.

lpips-pytorch Description Developing perceptual distance metrics is a major topic in recent image processing problems. LPIPS[1] is a state-of-the-art

So Uchida 121 Dec 24, 2022
Source code of the paper "Deep Learning of Latent Variable Models for Industrial Process Monitoring".

Source code of the paper "Deep Learning of Latent Variable Models for Industrial Process Monitoring".

Xiangyin Kong 7 Nov 08, 2022