PyTorch implementation of the ACL, 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Overview

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

This repo contains the PyTorch implementation of the ACL, 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Installation

python setup.py install 

How to run the models

We provide example scripts for each model in hyperformer/scripts/ folder with their config files in hyperformer/configs. To run the models, please do cd hyperformer and:

  • To run hyperformer++ model (This model generates the task-specific adapters using a shared hypernetwork, which is shared across the tasks and layers of a transformer.):

    bash scripts/hyperformer++.sh
    
  • To run hyperformer model (This model generates the task-specific adapters using a shared hypernetwork, which is shared across the tasks, but this is specific to each layer of a transformer. This model is less efficient compared to hyperformer++.):

    bash scripts/hyperformer.sh
    
  • To run adapter\dagger model (This model share the layer normalization between adapters across the tasks, and train adapters in a multi-task setting.):

    bash scripts/adapters_dagger.sh   
    
  • To run adapter model (This model trains a single-adapter per task and trains the adapters in a single-task learning.):

    bash scripts/adapters.sh 
    
  • To run T5 finetuning model in a multi-task learning setup:

    bash scripts/finetune.sh
    
  • To run T5 finetuning model in a single-task learning setup:

    bash scripts/finetune_single_task.sh
    

We run all the models on 4 GPUs, while this is not necessary and one can run the models on 1 GPU. In case running on one GPU, in all the scripts, please remove the -m torch.distributed.launch --nproc_per_node=4 part.

Bibliography

If you find this repo useful, please cite our paper.

@inproceedings{karimi2021parameterefficient,
  title={Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks},
  author={Karimi Mahabadi, Rabeeh and Ruder, Sebastian and Dehghani, Mostafa and Henderson, James},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2021}
}

Final words

Hope this repo is useful for your research. For any questions, please create an issue or email [email protected], and I will get back to you as soon as possible.

Owner
Rabeeh Karimi Mahabadi
PhD student in NLP, working on representation learning and textual entailment
Rabeeh Karimi Mahabadi
Tidy interface to polars

tidypolars tidypolars is a data frame library built on top of the blazingly fast polars library that gives access to methods and functions familiar to

Mark Fairbanks 144 Jan 08, 2023
Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks"

Easy-To-Hard The official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks". Gett

Avi Schwarzschild 52 Sep 08, 2022
PyDeepFakeDet is an integrated and scalable tool for Deepfake detection.

PyDeepFakeDet An integrated and scalable library for Deepfake detection research. Introduction PyDeepFakeDet is an integrated and scalable Deepfake de

Junke, Wang 49 Dec 11, 2022
Benchmark library for high-dimensional HPO of black-box models based on Weighted Lasso regression

LassoBench LassoBench is a library for high-dimensional hyperparameter optimization benchmarks based on Weighted Lasso regression. Note: LassoBench is

Kenan Šehić 5 Mar 15, 2022
realsense d400 -> jpg + csv

Realsense-capture realsense d400 - jpg + csv Requirements RealSense sdk : Installation Python3 pyrealsense2 (RealSense SDK) Numpy OpenCV Tkinter Run

Ar-Ray 2 Mar 22, 2022
PyTorch implementation of CloudWalk's recent work DenseBody

densebody_pytorch PyTorch implementation of CloudWalk's recent paper DenseBody. Note: For most recent updates, please check out the dev branch. Update

Lingbo Yang 401 Nov 19, 2022
[ICCV'21] Neural Radiance Flow for 4D View Synthesis and Video Processing

NeRFlow [ICCV'21] Neural Radiance Flow for 4D View Synthesis and Video Processing Datasets The pouring dataset used for experiments can be download he

44 Dec 20, 2022
PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021)

mlp-mixer-pytorch PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021) Usage import torch from mlp_mixer

isaac 27 Jul 09, 2022
MoveNet Single Pose on DepthAI

MoveNet Single Pose tracking on DepthAI Running Google MoveNet Single Pose models on DepthAI hardware (OAK-1, OAK-D,...). A convolutional neural netwo

64 Dec 29, 2022
Categorizing comments on YouTube into different categories.

Youtube Comments Categorization This repo is for categorizing comments on a youtube video into different categories. negative (grievances, complaints,

Rhitik 5 Nov 26, 2022
Very Deep Convolutional Networks for Large-Scale Image Recognition

pytorch-vgg Some scripts to convert the VGG-16 and VGG-19 models [1] from Caffe to PyTorch. The converted models can be used with the PyTorch model zo

Justin Johnson 217 Dec 05, 2022
Api's bulid in Flask perfom to manage Todo Task.

Citymall-task Api's bulid in Flask perfom to manage Todo Task. Installation Requrements : Python: 3.10.0 MongoDB create .env file with variables DB_UR

Aisha Tayyaba 1 Dec 17, 2021
Details about the wide minima density hypothesis and metrics to compute width of a minima

wide-minima-density-hypothesis Details about the wide minima density hypothesis and metrics to compute width of a minima This repo presents the wide m

Nikhil Iyer 9 Dec 27, 2022
Code release for Local Light Field Fusion at SIGGRAPH 2019

Local Light Field Fusion Project | Video | Paper Tensorflow implementation for novel view synthesis from sparse input images. Local Light Field Fusion

1.1k Dec 27, 2022
Certified Patch Robustness via Smoothed Vision Transformers

Certified Patch Robustness via Smoothed Vision Transformers This repository contains the code for replicating the results of our paper: Certified Patc

Madry Lab 35 Dec 14, 2022
🐸STT integration examples

🐸 STT 0.9.x Examples These are various examples on how to use or integrate 🐸 STT using our packages. It is a good way to just try out 🐸 STT before

coqui 92 Dec 19, 2022
Fortuitous Forgetting in Connectionist Networks

Fortuitous Forgetting in Connectionist Networks Introduction This repository includes reference code for the paper Fortuitous Forgetting in Connection

Hattie Zhou 14 Nov 26, 2022
A PyTorch Implementation of Gated Graph Sequence Neural Networks (GGNN)

A PyTorch Implementation of GGNN This is a PyTorch implementation of the Gated Graph Sequence Neural Networks (GGNN) as described in the paper Gated G

Ching-Yao Chuang 427 Dec 13, 2022
Simple Dynamic Batching Inference

Simple Dynamic Batching Inference 解决了什么问题? 众所周知,Batch对于GPU上深度学习模型的运行效率影响很大。。。 是在Inference时。搜索、推荐等场景自带比较大的batch,问题不大。但更多场景面临的往往是稀碎的请求(比如图片服务里一次一张图)。 如果

116 Jan 01, 2023
Submanifold sparse convolutional networks

Submanifold Sparse Convolutional Networks This is the PyTorch library for training Submanifold Sparse Convolutional Networks. Spatial sparsity This li

Facebook Research 1.8k Jan 06, 2023