Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)

Overview

Length-Adaptive Transformer

This is the official Pytorch implementation of Length-Adaptive Transformer. For detailed information about the method, please refer to our paper.

Our code is based on HuggingFace's ( πŸ€— ) Transformers library. Currently, it only supports limited transformers (BERT and DistilBERT) and downstream tasks (SQuAD 1.1 and GLUE benchmark). We will extend it one-by-one to support other transformers and tasks. You can easily apply our method to any other use cases beforehand.

Getting Started

Requirements

  • Python 3
  • PyTorch
  • πŸ€— Transformers
  • torchprofile (to measure FLOPs)

Dataset Preparation

(Standard) Finetuning pretrained transformer

For SQuAD 1.1, use run_squad.py slightly modified from πŸ€— Transformers' question-answering example.

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --do_eval \
  --evaluate_during_training \
  --save_only_best \
  --do_lower_case \
  --data_dir $SQUAD_DIR \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 5e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir $SQUAD_OUTPUT_DIR/standard

For GLUE, use run_glue.py slightly modified from πŸ€— Transformers' text-classification example.

python run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR/$TASK_NAME \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir $GLUE_OUTPUT_DIR/$TASK_NAME/standard

Training with LengthDrop

Starting from a checkpoint finetuned without Drop-and-Restore, continue finetuning for additional steps with Drop-and-Restore and LengthDrop.

python run_squad.py \
  --model_type bert \
  --model_name_or_path $SQUAD_OUTPUT_DIR/standard/checkpoint-best \
  --do_train \
  --do_eval \
  --evaluate_during_training \
  --save_only_best \
  --do_lower_case \
  --data_dir $SQUAD_DIR \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 5e-5 \
  --num_train_epochs 5.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir $SQUAD_OUTPUT_DIR/length_adaptive \
  --length_adaptive \
  --num_sandwich 2 \
  --length_drop_ratio_bound 0.2 \
  --layer_dropout_prob 0.2 \
python run_glue.py \
  --model_name_or_path $GLUE_OUTPUT_DIR/$TASK_NAME/standard/checkpoint-best \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR/$TASK_NAME \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 5.0 \
  --output_dir $GLUE_OUTPUT_DIR/$TASK_NAME/length_adaptive
  --length_adaptive \
  --num_sandwich 2 \
  --length_drop_ratio_bound 0.2 \
  --layer_dropout_prob 0.2 \

Evolutionary Search of Length Configurations

After training with LengthDrop, perform an evolutionary search to find length configurations for anytime prediction.

python run_squad.py \
  --model_type bert \
  --model_name_or_path $SQUAD_OUTPUT_DIR/length_adaptive/checkpoint-best \
  --do_search \
  --do_lower_case \
  --data_dir $SQUAD_DIR \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --per_gpu_eval_batch_size 32 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir $SQUAD_OUTPUT_DIR/evolutionary_search \
  --evo_iter 30 \
  --mutation_size 30 \
  --crossover_size 30 \
python run_glue.py \
  --model_name_or_path $GLUE_OUTPUT_DIR/$TASK_NAME/length_adaptive/checkpoint-best \
  --task_name $TASK_NAME \
  --do_search \
  --data_dir $GLUE_DIR/$TASK_NAME \
  --max_seq_length 128 \
  --per_device_eval_batch_size 32 \
  --output_dir $GLUE_OUTPUT_DIR/$TASK_NAME/evolutionary_search
  --evo_iter 30 \
  --mutation_size 30 \
  --crossover_size 30 \

License

Copyright 2020-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Owner
Clova AI Research
Open source repository of Clova AI Research, NAVER & LINE
Clova AI Research
MakeItTalk: Speaker-Aware Talking-Head Animation

MakeItTalk: Speaker-Aware Talking-Head Animation This is the code repository implementing the paper: MakeItTalk: Speaker-Aware Talking-Head Animation

Adobe Research 285 Jan 08, 2023
Rotary Transformer

[δΈ­ζ–‡|English] Rotary Transformer Rotary Transformer is an MLM pre-trained language model with rotary position embedding (RoPE). The RoPE is a relative

325 Jan 03, 2023
Losslandscapetaxonomy - Taxonomizing local versus global structure in neural network loss landscapes

Taxonomizing local versus global structure in neural network loss landscapes Int

Yaoqing Yang 8 Dec 30, 2022
Pytorch Lightning Distributed Accelerators using Ray

Distributed PyTorch Lightning Training on Ray This library adds new PyTorch Lightning accelerators for distributed training using the Ray distributed

166 Dec 27, 2022
Code for our paper: Online Variational Filtering and Parameter Learning

Variational Filtering To run phi learning on linear gaussian (Fig1a) python linear_gaussian_phi_learning.py To run phi and theta learning on linear g

16 Aug 14, 2022
Stochastic Scene-Aware Motion Prediction

Stochastic Scene-Aware Motion Prediction [Project Page] [Paper] Description This repository contains the training code for MotionNet and GoalNet of SA

Mohamed Hassan 31 Dec 09, 2022
A model that attempts to learn and benefit from data collected on card counting.

A model that attempts to learn and benefit from data collected on card counting. A decision tree like model is built to win more often than loose and increase the bet of the player appropriately to c

1 Dec 17, 2021
SphereFace: Deep Hypersphere Embedding for Face Recognition

SphereFace: Deep Hypersphere Embedding for Face Recognition By Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj and Le Song License SphereFa

Weiyang Liu 1.5k Dec 29, 2022
Implementation of gaze tracking and demo

Predicting Customer Demand by Using Gaze Detecting and Object Tracking This project is the integration of gaze detecting and object tracking. Predict

2 Oct 20, 2022
TransReID: Transformer-based Object Re-Identification

TransReID: Transformer-based Object Re-Identification [arxiv] The official repository for TransReID: Transformer-based Object Re-Identification achiev

569 Dec 30, 2022
Build Low Code Automated Tensorflow, What-IF explainable models in just 3 lines of code.

Build Low Code Automated Tensorflow explainable models in just 3 lines of code.

Hasan Rafiq 170 Dec 26, 2022
Python scripts for performing stereo depth estimation using the MobileStereoNet model in ONNX

ONNX-MobileStereoNet Python scripts for performing stereo depth estimation using the MobileStereoNet model in ONNX Stereo depth estimation on the cone

Ibai Gorordo 23 Nov 29, 2022
This is a collection of our NAS and Vision Transformer work.

AutoML - Neural Architecture Search This is a collection of our AutoML-NAS work iRPE (NEW): Rethinking and Improving Relative Position Encoding for Vi

Microsoft 832 Jan 08, 2023
The pyrelational package offers a flexible workflow to enable active learning with as little change to the models and datasets as possible

pyrelational is a python active learning library developed by Relation Therapeutics for rapidly implementing active learning pipelines from data management, model development (and Bayesian approximat

Relation Therapeutics 95 Dec 27, 2022
Crosslingual Segmental Language Model

Crosslingual Segmental Language Model This repository contains the code from Multilingual unsupervised sequence segmentation transfers to extremely lo

C.M. Downey 1 Jun 13, 2022
Simple, efficient and flexible vision toolbox for mxnet framework.

MXbox: Simple, efficient and flexible vision toolbox for mxnet framework. MXbox is a toolbox aiming to provide a general and simple interface for visi

Ligeng Zhu 31 Oct 19, 2019
Official PyTorch code for the paper: "Point-Based Modeling of Human Clothing" (ICCV 2021)

Point-Based Modeling of Human Clothing Paper | Project page | Video This is an official PyTorch code repository of the paper "Point-Based Modeling of

Visual Understanding Lab @ Samsung AI Center Moscow 64 Nov 22, 2022
Sample code from the Neural Networks from Scratch book.

Neural Networks from Scratch (NNFS) book code Code from the NNFS book (https://nnfs.io) separated by chapter.

Harrison 172 Dec 31, 2022
Generate indoor scenes with Transformers

SceneFormer: Indoor Scene Generation with Transformers Initial code release for the Sceneformer paper, contains models, train and test scripts for the

Chandan Yeshwanth 110 Dec 06, 2022
Code for Max-Margin Contrastive Learning - AAAI 2022

Max-Margin Contrastive Learning This is a pytorch implementation for the paper Max-Margin Contrastive Learning accepted to AAAI 2022. This repository

Anshul Shah 12 Oct 22, 2022