IOT: Instance-wise Layer Reordering for Transformer Structures

Related tags

Deep LearningIOT
Overview

Introduction

This repository contains the code for Instance-wise Ordered Transformer (IOT), which is introduced in the ICLR2021 paper IOT: Instance-wise Layer Reordering for Transformer Structures.

If you find this work helpful in your research, please cite as:

@inproceedings{
zhu2021iot,
title={{\{}IOT{\}}: Instance-wise Layer Reordering for Transformer Structures},
author={Jinhua Zhu and Lijun Wu and Yingce Xia and Shufang Xie and Tao Qin and Wengang Zhou and Houqiang Li and Tie-Yan Liu},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=ipUPfYxWZvM}
}

Requirements and Installation

  • PyTorch version == 1.0.0
  • Python version >= 3.5

To install IOT:

git clone https://github.com/instance-wise-ordered-transformer/IOT
cd IOT
pip install --editable .

Getting Started

Take IWSLT14 De-En translation as an example.

Data Preprocessing

cd examples/translation/
bash prepare-iwslt14.sh
cd ../..

TEXT=examples/translation/iwslt14.tokenized.de-en
python preprocess.py --source-lang de --target-lang en \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/iwslt14.tokenized.de-en --joined-dictionary

Training

Encoder order is set to be the default one without reordering (ENCODER_MAX_ORDER=1), since the paper finds that both reordering encoder and decoder is not good as reordering decoder only.

#!/bin/bash
export CUDA_VISIBLE_DEVICES=${1:-0}
nvidia-smi

ENCODER_MAX_ORDER=1
DECODER_MAX_ORDER=3
DECODER_ORDER="0 3 5"
DIVERSITY=0.1
GS_MAX=20
GS_MIN=2
GS_R=0
GS_UF=5000
KL=0.01
CLAMPVAL=0.05

DECODER_ORDER_NAME=`echo $DECODER_ORDER | sed 's/ //g'`
SAVE_DIR=checkpoints/dec_${DECODER_MAX_ORDER}_order_${DECODER_ORDER_NAME}_div_${DIVERSITY}_gsmax_${GS_MAX}_gsmin_${GS_MIN}_gsr_${GS_R}_gsuf_${GS_UF}_kl_${KL}_clampval_${CLAMPVAL}
mkdir -p ${SAVE_DIR}

python -u train.py data-bin/iwslt14.tokenized.de-en -a transformer_iwslt_de_en \
--optimizer adam --lr 0.0005 -s de -t en --label-smoothing 0.1 --dropout 0.3 --max-tokens 4000 \
--min-lr 1e-09 --lr-scheduler inverse_sqrt --weight-decay 0.0001 --criterion label_smoothed_cross_entropy \
--max-update 100000 --warmup-updates 4000 --warmup-init-lr 1e-07 --adam-betas '(0.9,0.98)' \
--save-dir $SAVE_DIR --share-all-embeddings  --gs-clamp --decoder-orders $DECODER_ORDER  \
--encoder-max-order $ENCODER_MAX_ORDER  --decoder-max-order $DECODER_MAX_ORDER  --diversity $DIVERSITY \
--gumbel-softmax-max $GS_MAX  --gumbel-softmax-min $GS_MIN --gumbel-softmax-tau-r $GS_R  --gumbel-softmax-update-freq $GS_UF \
--kl $KL --clamp-value $CLAMPVAL | tee -a ${SAVE_DIR}/train.log

Evaluation

#!/bin/bash
set -x
set -e

pip install -e . --user
export CUDA_VISIBLE_DEVICES=${1:-0}
nvidia-smi

ENCODER_MAX_ORDER=1
DECODER_MAX_ORDER=3
DECODER_ORDER="0 3 5"
DIVERSITY=0.1
GS_MAX=20
GS_MIN=2
GS_R=0
GS_UF=5000
KL=0.01
CLAMPVAL=0.05

DECODER_ORDER_NAME=`echo $DECODER_ORDER | sed 's/ //g'`
SAVE_DIR=checkpoints/dec_${DECODER_MAX_ORDER}_order_${DECODER_ORDER_NAME}_div_${DIVERSITY}_gsmax_${GS_MAX}_gsmin_${GS_MIN}_gsr_${GS_R}_gsuf_${GS_UF}_kl_${KL}_clampval_${CLAMPVAL}

python generate.py data-bin/iwslt14.tokenized.de-en \
  --path $SAVE_DIR/checkpint_best.pt \
  --batch-size 128 --beam 5 --remove-bpe --quiet --num-ckts $DECODER_MAX_ORDER 
Exadel CompreFace is a free and open-source face recognition GitHub project

Exadel CompreFace is a leading free and open-source face recognition system Exadel CompreFace is a free and open-source face recognition service that

Exadel 2.6k Jan 04, 2023
Official Implementation of "DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization."

DialogLM Code for AAAI 2022 paper: DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization. Pre-trained Models We release two ve

Microsoft 92 Dec 19, 2022
BC3407-Group-5-Project - BC3407 Group Project With Python

BC3407-Group-5-Project As the world struggles to contain the ever-changing varia

1 Jan 26, 2022
Pywonderland - A tour in the wonderland of math with python.

A Tour in the Wonderland of Math with Python A collection of python scripts for drawing beautiful figures and animating interesting algorithms in math

Zhao Liang 4.1k Jan 03, 2023
A deep learning based semantic search platform that computes similarity scores between provided query and documents

semanticsearch This is a deep learning based semantic search platform that computes similarity scores between provided query and documents. Documents

1 Nov 30, 2021
(to be released) [NeurIPS'21] Transformers Generalize DeepSets and Can be Extended to Graphs and Hypergraphs

Higher-Order Transformers Kim J, Oh S, Hong S, Transformers Generalize DeepSets and Can be Extended to Graphs and Hypergraphs, NeurIPS 2021. [arxiv] W

Jinwoo Kim 44 Dec 28, 2022
MEDS: Enhancing Memory Error Detection for Large-Scale Applications

MEDS: Enhancing Memory Error Detection for Large-Scale Applications Prerequisites cmake and clang Build MEDS supporting compiler $ make Build Using Do

Secomp Lab at Purdue University 34 Dec 14, 2022
The dataset of tweets pulling from Twitters with keyword: Hydroxychloroquine, location: US, Time: 2020

HCQ_Tweet_Dataset: FREE to Download. Keywords: HCQ, hydroxychloroquine, tweet, twitter, COVID-19 This dataset is associated with the paper "Understand

2 Mar 16, 2022
AbelNN: Deep Learning Python module from scratch

AbelNN: Deep Learning Python module from scratch I have implemented several neural networks from scratch using only Numpy. I have designed the module

Abel 2 Apr 12, 2022
Robustness via Cross-Domain Ensembles

Robustness via Cross-Domain Ensembles [ICCV 2021, Oral] This repository contains tools for training and evaluating: Pretrained models Demo code Traini

Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology (EPFL) 27 Dec 23, 2022
Lipstick ain't enough: Beyond Color-Matching for In-the-Wild Makeup Transfer (CVPR 2021)

Table of Content Introduction Datasets Getting Started Requirements Usage Example Training & Evaluation CPM: Color-Pattern Makeup Transfer CPM is a ho

VinAI Research 248 Dec 13, 2022
MPRNet-Cloud-removal: Progressive cloud removal

MPRNet-Cloud-removal Progressive cloud removal Requirements 1.Pytorch = 1.0 2.Python 3 3.NVIDIA GPU + CUDA 9.0 4.Tensorboard Installation 1.Clone the

Semi 95 Dec 18, 2022
Tooling for converting STAC metadata to ODC data model

手语识别 0、使用到的模型 (1). openpose,作者:CMU-Perceptual-Computing-Lab https://github.com/CMU-Perceptual-Computing-Lab/openpose (2). 图像分类classification,作者:Bubbl

Open Data Cube 65 Dec 20, 2022
Deep universal probabilistic programming with Python and PyTorch

Getting Started | Documentation | Community | Contributing Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch. Notab

7.7k Dec 30, 2022
Generating Anime Images by Implementing Deep Convolutional Generative Adversarial Networks paper

AnimeGAN - Deep Convolutional Generative Adverserial Network PyTorch implementation of DCGAN introduced in the paper: Unsupervised Representation Lear

Rohit Kukreja 23 Jul 21, 2022
Pytorch implementation of SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation

SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation Efficient Self-Ensemble Framework for Semantic Segmentation by Walid Bousselham

61 Dec 26, 2022
Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation in PyTorch

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation Implementation of StyleSpace Analysis: Disentangled Controls for StyleGAN Ima

Xuanchi Ren 86 Dec 07, 2022
Degree-Quant: Quantization-Aware Training for Graph Neural Networks.

Degree-Quant This repo provides a clean re-implementation of the code associated with the paper Degree-Quant: Quantization-Aware Training for Graph Ne

35 Oct 07, 2022
Graph Neural Networks with Keras and Tensorflow 2.

Welcome to Spektral Spektral is a Python library for graph deep learning, based on the Keras API and TensorFlow 2. The main goal of this project is to

Daniele Grattarola 2.2k Jan 08, 2023
这是一个yolox-keras的源码,可以用于训练自己的模型。

YOLOX:You Only Look Once目标检测模型在Keras当中的实现 目录 性能情况 Performance 实现的内容 Achievement 所需环境 Environment 小技巧的设置 TricksSet 文件下载 Download 训练步骤 How2train 预测步骤 Ho

Bubbliiiing 64 Nov 10, 2022