GluonMM is a library of transformer models for computer vision and multi-modality research

Last update: Dec 02, 2022

Overview

GluonMM

GluonMM is a library of transformer models for computer vision and multi-modality research. It contains reference implementations of widely adopted baseline models and also research work from Amazon Research.

Install

First, clone the repository locally:

git clone https://github.com/amazon-research/gluonmm.git

Then install the dependencies:

conda create -n gluonmm python=3.7
conda activate gluonmm
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install timm tensorboardX yacs tqdm requests pandas decord scikit-image opencv-python

# Install apex for half-precision training (optional)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

The library has been tested extensively with PyTorch 1.8.1, torchvision 0.9.1, and CUDA 10.2.
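To confirm that a local environment matches the tested configuration, a short check along the following lines may help. This is a minimal sketch and not part of GluonMM: the names `TESTED`, `installed_versions`, and `mismatches` are hypothetical helpers introduced here for illustration, and the comparison assumes that local build suffixes such as `+cu102` can be ignored.

```python
# Versions the README reports as extensively tested.
TESTED = {"torch": "1.8.1", "torchvision": "0.9.1", "cuda": "10.2"}


def installed_versions():
    """Collect installed versions; packages that cannot be imported map to None."""
    found = {"torch": None, "torchvision": None, "cuda": None}
    try:
        import torch
        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        pass
    try:
        import torchvision
        found["torchvision"] = torchvision.__version__
    except ImportError:
        pass
    return found


def mismatches(found, tested=TESTED):
    """Return {name: (found, tested)} for every version that differs."""
    diffs = {}
    for name, want in tested.items():
        have = found.get(name)
        # Drop local build suffixes such as "+cu102" before comparing.
        base = have.split("+")[0] if have else None
        if base != want:
            diffs[name] = (have, want)
    return diffs


if __name__ == "__main__":
    for name, (have, want) in mismatches(installed_versions()).items():
        print(f"{name}: found {have}, tested with {want}")
```

Running the script inside the `gluonmm` conda environment prints one line per component that differs from the tested versions and prints nothing when everything matches. A mismatch does not necessarily mean the library will fail; it only indicates a configuration other than the one reported as tested.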

Model zoo

Image classification

Video action recognition

VidTr

Usage

For detailed usage, please refer to the README file in each model family. For example, the training, evaluation, and model zoo information for the video transformer VidTr can be found in the VidTr README.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Acknowledgement

Parts of the code are heavily derived from pytorch-image-models, DeiT, Swin-transformer, vit-pytorch and vision_transformer.
