Multimodal Reinforcement Learning

JAX implementations of the following multimodal reinforcement learning approaches.

Dual-coding Episodic Memory from "Grounded Language Learning Fast and Slow"

In this setting the agent is shown several objects with made-up names, each introduced by a "This is a _____" statement, and must then carry out an instruction such as "Move the wazzle to the table." The task requires the agent to learn long-term language and vision representations that carry over between episodes, both for phrases like "This is a" and for familiar objects such as "table", while also learning one-shot representations of novel objects and their names.
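The one-shot binding of a new name to a new object can be illustrated with a small key-value memory. This is only a rough sketch and not this repository's implementation: it assumes language embeddings act as keys and vision embeddings as values, retrieved by cosine similarity. It is written in plain Python rather than JAX so it runs without dependencies. The class name DualCodingMemory, the names "dax" and "blicket", and all embedding vectors are hypothetical stand-ins.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class DualCodingMemory:
    """Hypothetical episodic memory: language embeddings are keys, vision embeddings are values."""

    def __init__(self):
        self.keys = []    # language embeddings, one per stored statement
        self.values = []  # vision embeddings seen alongside each statement

    def write(self, language_emb, vision_emb):
        """Store one (language, vision) pair, e.g. after hearing "This is a wazzle"."""
        self.keys.append(list(language_emb))
        self.values.append(list(vision_emb))

    def read(self, query_emb, k=1):
        """Return the vision embeddings of the k entries whose keys best match the query."""
        ranked = sorted(
            range(len(self.keys)),
            key=lambda i: cosine(query_emb, self.keys[i]),
            reverse=True,
        )
        return [self.values[i] for i in ranked[:k]]


# Toy episode: three naming statements, each paired with what the agent was looking at.
# All vectors are made up for illustration; a real agent would produce them with encoders.
memory = DualCodingMemory()
memory.write([1.0, 0.0, 0.0], [0.2, 0.8])  # "This is a wazzle"
memory.write([0.0, 1.0, 0.0], [0.9, 0.1])  # "This is a dax"
memory.write([0.0, 0.0, 1.0], [0.5, 0.5])  # "This is a blicket"

# Later instruction mentions "wazzle"; the query is a slightly noisy version of its key.
retrieved = memory.read([0.9, 0.1, 0.0], k=1)
print(retrieved)
```

Because entries are written once and retrieved by similarity, a name heard a single time can be linked to its object within the same episode, while slower-learned encoders (not shown) would supply the embeddings.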

Usage

Start by setting up the environment locally:

poetry install
poetry shell

The learning environment depends on Docker; on macOS, Docker Desktop must be running. Once it is, run the default environment (fast mapping with 3 objects from the paper):

python fast_slow_learning/main.py

Solving reinforcement learning tasks which require language and vision

Owner

Henry Prior
