Neural Turing Machines (NTM) - PyTorch Implementation

Overview

PyTorch Neural Turing Machine (NTM)

PyTorch implementation of Neural Turing Machines (NTM).

An NTM is a memory augumented neural network (attached to external memory) where the interactions with the external memory (address, read, write) are done using differentiable transformations. Overall, the network is end-to-end differentiable and thus trainable by a gradient based optimizer.

The NTM is processing input in sequences, much like an LSTM, but with additional benfits: (1) The external memory allows the network to learn algorithmic tasks easier (2) Having larger capacity, without increasing the network's trainable parameters.

The external memory allows the NTM to learn algorithmic tasks, that are much harder for LSTM to learn, and to maintain an internal state much longer than traditional LSTMs.

A PyTorch Implementation

This repository implements a vanilla NTM in a straight forward way. The following architecture is used:

NTM Architecture

Features

  • Batch learning support
  • Numerically stable
  • Flexible head configuration - use X read heads and Y write heads and specify the order of operation
  • copy and repeat-copy experiments agree with the paper

Copy Task

The Copy task tests the NTM's ability to store and recall a long sequence of arbitrary information. The input to the network is a random sequence of bits, ending with a delimiter. The sequence lengths are randomised between 1 to 20.

Training

Training convergence for the copy task using 4 different seeds (see the notebook for details)

NTM Convergence

The following plot shows the cost per sequence length during training. The network was trained with seed=10 and shows fast convergence. Other seeds may not perform as well but should converge in less than 30K iterations.

NTM Convergence

Evaluation

Here is an animated GIF that shows how the model generalize. The model was evaluated after every 500 training samples, using the target sequence shown in the upper part of the image. The bottom part shows the network output at any given training stage.

Copy Task

The following is the same, but with sequence length = 80. Note that the network was trained with sequences of lengths 1 to 20.

Copy Task


Repeat Copy Task

The Repeat Copy task tests whether the NTM can learn a simple nested function, and invoke it by learning to execute a for loop. The input to the network is a random sequence of bits, followed by a delimiter and a scalar value that represents the number of repetitions to output. The number of repetitions, was normalized to have zero mean and variance of one (as in the paper). Both the length of the sequence and the number of repetitions are randomised between 1 to 10.

Training

Training convergence for the repeat-copy task using 4 different seeds (see the notebook for details)

NTM Convergence

Evaluation

The following image shows the input presented to the network, a sequence of bits + delimiter + num-reps scalar. Specifically the sequence length here is eight and the number of repetitions is five.

Repeat Copy Task

And here's the output the network had predicted:

Repeat Copy Task

Here's an animated GIF that shows how the network learns to predict the targets. Specifically, the network was evaluated in each checkpoint saved during training with the same input sequence.

Repeat Copy Task

Installation

The NTM can be used as a reusable module, currently not packaged though.

  1. Clone repository
  2. Install PyTorch
  3. pip install -r requirements.txt

Usage

Execute ./train.py

usage: train.py [-h] [--seed SEED] [--task {copy,repeat-copy}] [-p PARAM]
                [--checkpoint-interval CHECKPOINT_INTERVAL]
                [--checkpoint-path CHECKPOINT_PATH]
                [--report-interval REPORT_INTERVAL]

optional arguments:
  -h, --help            show this help message and exit
  --seed SEED           Seed value for RNGs
  --task {copy,repeat-copy}
                        Choose the task to train (default: copy)
  -p PARAM, --param PARAM
                        Override model params. Example: "-pbatch_size=4
                        -pnum_heads=2"
  --checkpoint-interval CHECKPOINT_INTERVAL
                        Checkpoint interval (default: 1000). Use 0 to disable
                        checkpointing
  --checkpoint-path CHECKPOINT_PATH
                        Path for saving checkpoint data (default: './')
  --report-interval REPORT_INTERVAL
                        Reporting interval
Owner
Guy Zana
I make things, author of Curated Papers
Guy Zana
ICON: Implicit Clothed humans Obtained from Normals (CVPR 2022)

ICON: Implicit Clothed humans Obtained from Normals Yuliang Xiu · Jinlong Yang · Dimitrios Tzionas · Michael J. Black CVPR 2022 News 🚩 [2022/04/26] H

Yuliang Xiu 1.1k Jan 04, 2023
This is a tensorflow-based rotation detection benchmark, also called AlphaRotate.

AlphaRotate: A Rotation Detection Benchmark using TensorFlow Abstract AlphaRotate is maintained by Xue Yang with Shanghai Jiao Tong University supervi

yangxue 972 Jan 05, 2023
[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Feel free to visit my homepage Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DIMP) [ECCVW2020 paper] Presentation

Seokeon Choi 35 Oct 26, 2022
Code for Overinterpretation paper Overinterpretation reveals image classification model pathologies

Overinterpretation This repository contains the code for the paper: Overinterpretation reveals image classification model pathologies Authors: Brandon

Gifford Lab, MIT CSAIL 17 Dec 10, 2022
Automatic Calibration for Non-repetitive Scanning Solid-State LiDAR and Camera Systems

ACSC Automatic extrinsic calibration for non-repetitive scanning solid-state LiDAR and camera systems. System Architecture 1. Dependency Tested with U

KINO 192 Dec 13, 2022
Pytorch implementation of Feature Pyramid Network (FPN) for Object Detection

fpn.pytorch Pytorch implementation of Feature Pyramid Network (FPN) for Object Detection Introduction This project inherits the property of our pytorc

Jianwei Yang 912 Dec 21, 2022
Learning to trade under the reinforcement learning framework

Trading Using Q-Learning In this project, I will present an adaptive learning model to trade a single stock under the reinforcement learning framework

Uirá Caiado 470 Nov 28, 2022
Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps[AAAI2021]

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps Here is the code for ssbassline model. We also provide OCR results/features/mode

ZephyrZhuQi 51 Nov 18, 2022
MARE - Multi-Attribute Relation Extraction

MARE - Multi-Attribute Relation Extraction Repository for the paper submission: #TODO: insert link, when available Environment Tested with Ubuntu 18.0

0 May 11, 2021
A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL

🌟 HNSW + PostgreSQL Indexer HNSWPostgreSQLIndexer Jina is a production-ready, scalable Indexer for the Jina neural search framework. It combines the

Jina AI 25 Oct 14, 2022
This is the code for Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

This is the code for Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning It includes /bert, which is the original BERT repos

Mitchell Gordon 11 Nov 15, 2022
Sample code and notebooks for Vertex AI, the end-to-end machine learning platform on Google Cloud

Google Cloud Vertex AI Samples Welcome to the Google Cloud Vertex AI sample repository. Overview The repository contains notebooks and community conte

Google Cloud Platform 560 Dec 31, 2022
Six - a Python 2 and 3 compatibility library

Six is a Python 2 and 3 compatibility library. It provides utility functions for smoothing over the differences between the Python versions with the g

Benjamin Peterson 919 Dec 28, 2022
MLP-Numpy - A simple modular implementation of Multi Layer Perceptron in pure Numpy.

MLP-Numpy A simple modular implementation of Multi Layer Perceptron in pure Numpy. I used the Iris dataset from scikit-learn library for the experimen

Soroush Omranpour 1 Jan 01, 2022
DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight or group of weights, in order to achieve a given trade-off between model size and accuracy.

Differentiable Model Compression via Pseudo Quantization Noise DiffQ performs differentiable quantization using pseudo quantization noise. It can auto

Facebook Research 145 Dec 30, 2022
Simple-Neural-Network From Scratch in Python

Simple-Neural-Network From Scratch in Python This is a simple Neural Network created without any Machine Learning Libraries. The only dependencies are

Aum Shah 1 Dec 28, 2021
Res2Net for Instance segmentation and Object detection using MaskRCNN

Res2Net for Instance segmentation and Object detection using MaskRCNN Since the MaskRCNN-benchmark of facebook is deprecated, we suggest to use our mm

Res2Net Applications 55 Oct 30, 2022
Tianshou - An elegant PyTorch deep reinforcement learning library.

Tianshou (天授) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on

Tsinghua Machine Learning Group 5.5k Jan 05, 2023
Code for the SIGGRAPH 2022 paper "DeltaConv: Anisotropic Operators for Geometric Deep Learning on Point Clouds."

DeltaConv [Paper] [Project page] Code for the SIGGRAPH 2022 paper "DeltaConv: Anisotropic Operators for Geometric Deep Learning on Point Clouds" by Ru

98 Nov 26, 2022
Adaptive Dropblock Enhanced GenerativeAdversarial Networks for Hyperspectral Image Classification

This repo holds the codes of our paper: Adaptive Dropblock Enhanced GenerativeAdversarial Networks for Hyperspectral Image Classification, which is ac

Feng Gao 17 Dec 28, 2022