Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"

Overview

merlot_reserve

Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"

MERLOT Reserve (in submission) is a model for learning joint representations of vision, language, and sound from YouTube. The learned model can be used in a zero-shot or finetuned setting, where it does well on tasks like VCR and TVQA.

Visit our project page at rowanzellers.com/merlotreserve or read the full paper to learn more.

What's here

We are releasing the following:

  • JAX code, and model checkpoints, for the MERLOT model
  • Code for pretraining the model
  • Code for finetuning the model on VCR and TVQA
  • Code for doing zero-shot inference with the model

Environment and setup

There are two different ways to run MERLOT Reserve:

  • Pretraining on videos You'll need a TPU Pod VM for this. This step shouldn't be necessary for most people, as we have released model checkpoints.
  • Finetuning on VCR or TVQA I've done this on a TPU v3-8 VM. This should be possible on GPU(s), but I haven't tested this on such hardware.
  • Zero-shot inference I've ran this on a GPU (even an older, Titan X from 2016 works.)

Installation on a GPU Machine

Install Cuda 11.4 (I used this link) and CUDNN 8.2. You might have to add something like this to your PATH:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64

Create the environment:

conda create --name mreserve python=3.8 && conda activate mreserve
conda install -y python=3.8 tqdm numpy pyyaml scipy ipython cython typing h5py pandas matplotlib

# Install jax
pip install jax[cuda11_cudnn82] -f https://storage.googleapis.com/jax-releases/jax_releases.html
# If doing this on TPUs instead of locally...
# pip install "jax[tpu]>=0.2.18" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html

# This is needed sometimes https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp
pip uninstall numpy
pip install numpy==1.19.5

pip install -r requirements.txt

You can then try out the interactive script at demo/demo_video.py. It will handle downloading the model checkpoint for you.

Installation on a Cloud TPU VM

See the instructions in pretrain/ to set up your environment on a TPU v3-8 VM.

Checkpoints

These should get auto-downloaded if you use PretrainedMerlotReserve in mreserve/modeling.py. All are flax checkpoint files:

# pretrained checkpoints
gs://merlotreserve/ckpts/base
gs://merlotreserve/ckpts/base_resadapt
gs://merlotreserve/ckpts/large
gs://merlotreserve/ckpts/large_resadapt

# finetuned checkpoints
gs://merlotreserve/vcr_ckpts/vcr_finetune_base
gs://merlotreserve/vcr_ckpts/vcr_finetune_large

gs://merlotreserve/tvqa_ckpts/tvqa_finetune_base
gs://merlotreserve/tvqa_ckpts/tvqa_finetune_large

# TVQA Data
gs://merlotreserve/finetune_data/tvqa/

# VCR data
gs://merlotreserve/finetune_data/vcr/
Owner
Rowan Zellers
Rowan Zellers
AquaTimer - Programmable Timer for Aquariums based on ATtiny414/814/1614

AquaTimer - Programmable Timer for Aquariums based on ATtiny414/814/1614 AquaTimer is a programmable timer for 12V devices such as lighting, solenoid

Stefan Wagner 4 Jun 13, 2022
Scalable implementation of Lee / Mykland (2012) and Ait-Sahalia / Jacod (2012) Jump tests for noisy high frequency data

JumpDetectR Name of QuantLet : JumpDetectR Published in : 'To be published as "Jump dynamics in high frequency crypto markets"' Description : 'Scala

LvB 12 Jan 01, 2023
A generalist algorithm for cell and nucleus segmentation.

Cellpose | A generalist algorithm for cell and nucleus segmentation. Cellpose was written by Carsen Stringer and Marius Pachitariu. To learn about Cel

MouseLand 733 Dec 29, 2022
Deep deconfounded recommender (Deep-Deconf) for paper "Deep causal reasoning for recommendations"

Deep Causal Reasoning for Recommender Systems The codes are associated with the following paper: Deep Causal Reasoning for Recommendations, Yaochen Zh

Yaochen Zhu 22 Oct 15, 2022
Code for "Multi-Compound Transformer for Accurate Biomedical Image Segmentation"

News The code of MCTrans has been released. if you are interested in contributing to the standardization of the medical image analysis community, plea

97 Jan 05, 2023
Semantic Segmentation Architectures Implemented in PyTorch

pytorch-semseg Semantic Segmentation Algorithms Implemented in PyTorch This repository aims at mirroring popular semantic segmentation architectures i

Meet Shah 3.3k Dec 29, 2022
Torch-based tool for quantizing high-dimensional vectors using additive codebooks

Trainable multi-codebook quantization This repository implements a utility for use with PyTorch, and ideally GPUs, for training an efficient quantizer

Daniel Povey 41 Jan 07, 2023
Caffe models in TensorFlow

Caffe to TensorFlow Convert Caffe models to TensorFlow. Usage Run convert.py to convert an existing Caffe model to TensorFlow. Make sure you're using

Saumitro Dasgupta 2.8k Dec 31, 2022
Implementation of ProteinBERT in Pytorch

ProteinBERT - Pytorch (wip) Implementation of ProteinBERT in Pytorch. Original Repository Install $ pip install protein-bert-pytorch Usage import torc

Phil Wang 92 Dec 25, 2022
This project deploys a yolo fastest model in the form of tflite on raspberry 3b+. The model is from another repository of mine called -Trash-Classification-Car

Deploy-yolo-fastest-tflite-on-raspberry 觉得有用的话可以顺手点个star嗷 这个项目将垃圾分类小车中的tflite模型移植到了树莓派3b+上面。 该项目主要是为了记录在树莓派部署yolo fastest tflite的流程 (之后有时间会尝试用C++部署来提升

7 Aug 16, 2022
multimodal transformer

This repo holds the code to perform experiments with the multimodal autoregressive probabilistic model Transflower. Overview of the repo It is structu

Guillermo Valle 68 Dec 13, 2022
Official implementation for (Refine Myself by Teaching Myself : Feature Refinement via Self-Knowledge Distillation, CVPR-2021)

FRSKD Official implementation for Refine Myself by Teaching Myself : Feature Refinement via Self-Knowledge Distillation (CVPR-2021) Requirements Pytho

75 Dec 28, 2022
Think Big, Teach Small: Do Language Models Distil Occam’s Razor?

Think Big, Teach Small: Do Language Models Distil Occam’s Razor? Software related to the paper "Think Big, Teach Small: Do Language Models Distil Occa

0 Dec 07, 2021
Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective

Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective Zhengzhuo Xu, Zenghao Chai, Chun Yuan This is the PyTorch implement

Sincere 16 Dec 15, 2022
TorchOk - The toolkit for fast Deep Learning experiments in Computer Vision

TorchOk - The toolkit for fast Deep Learning experiments in Computer Vision

52 Dec 23, 2022
Optimal space decomposition based-product quantization for approximate nearest neighbor search

Optimal space decomposition based-product quantization for approximate nearest neighbor search Abstract Product quantization(PQ) is an effective neare

Mylove 1 Nov 19, 2021
The code for 'Deep Residual Fourier Transformation for Single Image Deblurring'

Deep Residual Fourier Transformation for Single Image Deblurring Xintian Mao, Yiming Liu, Wei Shen, Qingli Li and Yan Wang code will be released soon

145 Dec 13, 2022
Real-time VIBE: Frame by Frame Inference of VIBE (Video Inference for Human Body Pose and Shape Estimation)

Real-time VIBE Inference VIBE frame-by-frame. Overview This is a frame-by-frame inference fork of VIBE at [https://github.com/mkocabas/VIBE]. Usage: i

23 Jul 02, 2022
Source code for our paper "Do Not Trust Prediction Scores for Membership Inference Attacks"

Do Not Trust Prediction Scores for Membership Inference Attacks Abstract: Membership inference attacks (MIAs) aim to determine whether a specific samp

<a href=[email protected]"> 3 Oct 25, 2022
Continual Learning of Long Topic Sequences in Neural Information Retrieval

ContinualPassageRanking Repository for the paper "Continual Learning of Long Topic Sequences in Neural Information Retrieval". In this repository you

0 Apr 12, 2022