IOT: Instance-wise Layer Reordering for Transformer Structures

Related tags

Deep LearningIOT
Overview

Introduction

This repository contains the code for Instance-wise Ordered Transformer (IOT), which is introduced in the ICLR2021 paper IOT: Instance-wise Layer Reordering for Transformer Structures.

If you find this work helpful in your research, please cite as:

@inproceedings{
zhu2021iot,
title={{\{}IOT{\}}: Instance-wise Layer Reordering for Transformer Structures},
author={Jinhua Zhu and Lijun Wu and Yingce Xia and Shufang Xie and Tao Qin and Wengang Zhou and Houqiang Li and Tie-Yan Liu},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=ipUPfYxWZvM}
}

Requirements and Installation

  • PyTorch version == 1.0.0
  • Python version >= 3.5

To install IOT:

git clone https://github.com/instance-wise-ordered-transformer/IOT
cd IOT
pip install --editable .

Getting Started

Take IWSLT14 De-En translation as an example.

Data Preprocessing

cd examples/translation/
bash prepare-iwslt14.sh
cd ../..

TEXT=examples/translation/iwslt14.tokenized.de-en
python preprocess.py --source-lang de --target-lang en \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/iwslt14.tokenized.de-en --joined-dictionary

Training

Encoder order is set to be the default one without reordering (ENCODER_MAX_ORDER=1), since the paper finds that both reordering encoder and decoder is not good as reordering decoder only.

#!/bin/bash
export CUDA_VISIBLE_DEVICES=${1:-0}
nvidia-smi

ENCODER_MAX_ORDER=1
DECODER_MAX_ORDER=3
DECODER_ORDER="0 3 5"
DIVERSITY=0.1
GS_MAX=20
GS_MIN=2
GS_R=0
GS_UF=5000
KL=0.01
CLAMPVAL=0.05

DECODER_ORDER_NAME=`echo $DECODER_ORDER | sed 's/ //g'`
SAVE_DIR=checkpoints/dec_${DECODER_MAX_ORDER}_order_${DECODER_ORDER_NAME}_div_${DIVERSITY}_gsmax_${GS_MAX}_gsmin_${GS_MIN}_gsr_${GS_R}_gsuf_${GS_UF}_kl_${KL}_clampval_${CLAMPVAL}
mkdir -p ${SAVE_DIR}

python -u train.py data-bin/iwslt14.tokenized.de-en -a transformer_iwslt_de_en \
--optimizer adam --lr 0.0005 -s de -t en --label-smoothing 0.1 --dropout 0.3 --max-tokens 4000 \
--min-lr 1e-09 --lr-scheduler inverse_sqrt --weight-decay 0.0001 --criterion label_smoothed_cross_entropy \
--max-update 100000 --warmup-updates 4000 --warmup-init-lr 1e-07 --adam-betas '(0.9,0.98)' \
--save-dir $SAVE_DIR --share-all-embeddings  --gs-clamp --decoder-orders $DECODER_ORDER  \
--encoder-max-order $ENCODER_MAX_ORDER  --decoder-max-order $DECODER_MAX_ORDER  --diversity $DIVERSITY \
--gumbel-softmax-max $GS_MAX  --gumbel-softmax-min $GS_MIN --gumbel-softmax-tau-r $GS_R  --gumbel-softmax-update-freq $GS_UF \
--kl $KL --clamp-value $CLAMPVAL | tee -a ${SAVE_DIR}/train.log

Evaluation

#!/bin/bash
set -x
set -e

pip install -e . --user
export CUDA_VISIBLE_DEVICES=${1:-0}
nvidia-smi

ENCODER_MAX_ORDER=1
DECODER_MAX_ORDER=3
DECODER_ORDER="0 3 5"
DIVERSITY=0.1
GS_MAX=20
GS_MIN=2
GS_R=0
GS_UF=5000
KL=0.01
CLAMPVAL=0.05

DECODER_ORDER_NAME=`echo $DECODER_ORDER | sed 's/ //g'`
SAVE_DIR=checkpoints/dec_${DECODER_MAX_ORDER}_order_${DECODER_ORDER_NAME}_div_${DIVERSITY}_gsmax_${GS_MAX}_gsmin_${GS_MIN}_gsr_${GS_R}_gsuf_${GS_UF}_kl_${KL}_clampval_${CLAMPVAL}

python generate.py data-bin/iwslt14.tokenized.de-en \
  --path $SAVE_DIR/checkpint_best.pt \
  --batch-size 128 --beam 5 --remove-bpe --quiet --num-ckts $DECODER_MAX_ORDER 
Multi-Person Extreme Motion Prediction

Multi-Person Extreme Motion Prediction Implementation for paper Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer, Multi-Person Extre

GUO-W 38 Nov 15, 2022
Deep learning model, heat map, data prepo

deep learning model, heat map, data prepo

Pamela Dekas 1 Jan 14, 2022
Authors implementation of LieTransformer: Equivariant Self-Attention for Lie Groups

LieTransformer This repository contains the implementation of the LieTransformer used for experiments in the paper LieTransformer: Equivariant self-at

35 Oct 18, 2022
Graph InfoClust: Leveraging cluster-level node information for unsupervised graph representation learning

Graph-InfoClust-GIC [PAKDD 2021] PAKDD'21 version Graph InfoClust: Maximizing Coarse-Grain Mutual Information in Graphs Preprint version Graph InfoClu

Costas Mavromatis 21 Dec 03, 2022
Recursive Bayesian Networks

Recursive Bayesian Networks This repository contains the code to reproduce the results from the NeurIPS 2021 paper Lieck R, Rohrmeier M (2021) Recursi

Robert Lieck 11 Oct 18, 2022
Official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection

Adaptive Class Suppression Loss for Long-Tail Object Detection This repo is the official implementation for CVPR 2021 paper: Adaptive Class Suppressio

CASIA-IVA-Lab 67 Dec 04, 2022
Prediction of MBA refinance Index (Mortgage prepayment)

Prediction of MBA refinance Index (Mortgage prepayment) Deep Neural Network based Model The ability to predict mortgage prepayment is of critical use

Ruchil Barya 1 Jan 16, 2022
Full body anonymization - Realistic Full-Body Anonymization with Surface-Guided GANs

Realistic Full-Body Anonymization with Surface-Guided GANs This is the official

Håkon Hukkelås 30 Nov 18, 2022
Image restoration with neural networks but without learning.

Warning! The optimization may not converge on some GPUs. We've personally experienced issues on Tesla V100 and P40 GPUs. When running the code, make s

Dmitry Ulyanov 7.4k Jan 01, 2023
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan NOTE: This documentation describes a BETA release of PyStan 3. PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is

Stan 229 Dec 29, 2022
Mini-hmc-jax - A simple implementation of Hamiltonian Monte Carlo in JAX

mini-hmc-jax This is a simple implementation of Hamiltonian Monte Carlo in JAX t

Martin Marek 6 Mar 03, 2022
This repository contains datasets and baselines for benchmarking Chinese text recognition.

Benchmarking-Chinese-Text-Recognition This repository contains datasets and baselines for benchmarking Chinese text recognition. Please see the corres

FudanVI Lab 254 Dec 30, 2022
Storage-optimizer - Identify potintial optimizations on the cloud storage accounts

Storage Optimizer Identify potintial optimizations on the cloud storage accounts

Zaher Mousa 1 Feb 13, 2022
Discord-Protect is a simple discord bot allowing you to have some security on your discord server by ordering a captcha to the user who joins your server.

Discord-Protect Discord-Protect is a simple discord bot allowing you to have some security on your discord server by ordering a captcha to the user wh

Tir Omar 2 Oct 28, 2021
Public implementation of the Convolutional Motif Kernel Network (CMKN) architecture

CMKN Implementation of the convolutional motif kernel network (CMKN) introduced in Ditz et al., "Convolutional Motif Kernel Network", 2021. Testing Yo

1 Nov 17, 2021
Multi-Scale Geometric Consistency Guided Multi-View Stereo

ACMM [News] The code for ACMH is released!!! [News] The code for ACMP is released!!! About ACMM is a multi-scale geometric consistency guided multi-vi

Qingshan Xu 118 Jan 04, 2023
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

63 Aug 11, 2022
Minimal PyTorch implementation of Generative Latent Optimization from the paper "Optimizing the Latent Space of Generative Networks"

Minimal PyTorch implementation of Generative Latent Optimization This is a reimplementation of the paper Piotr Bojanowski, Armand Joulin, David Lopez-

Thomas Neumann 117 Nov 27, 2022
Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker This repository contai

Nikita 12 Dec 14, 2022
StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation Demo video: CVPR 2021 Oral: Single Channel Manipulation: Localized or attribu

Zongze Wu 267 Dec 30, 2022