Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

Last update: Dec 31, 2022

T-Few

This repository contains the official code for the paper: "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning".

This method outperforms in-context learning with GPT-3 and achieves state-of-the-art results on the RAFT benchmark.

Setup

First, create a virtual environment for the project and install all the requirements. (We use conda to manage environments. Be sure to install and initialize conda first.)

Create a virtual environment with Python 3.7 by running conda create -n tfew python==3.7, then activate the environment with conda activate tfew.
Install the other dependencies with pip install -r requirements.txt -f https://download.pytorch.org/whl/cu113/torch_stable.html
If you plan to run SAID, install its dependencies with python src/intrinsic_said_setup.py develop. Otherwise, skip this step.

The steps above only need to be done once. In addition, every time you start a new session, you will need to run: . bin/start.sh

Run your first experiment

Once you have finished setting up the environment, you can try running

CUDA_VISIBLE_DEVICES=3 python -m src.pl_train -c t0.json+rte.json -k save_model=False exp_name=first_exp

The outputs of this run will be saved to ${OUTPUT_PATH}/first_exp/, which is usually /t-few/exp_out/first_exp/. Here, first_exp is the experiment name; you can run more experiments with different experiment names. The code will automatically skip finished experiments. (However, if you wish to rerun a finished experiment under the same experiment name, you will need to manually remove the corresponding files in the output directory.)
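The skip-and-rerun behaviour described above can be pictured with a small sketch. This is not the repository's code: the helper names, the "exp_out" fallback, and the marker file name finished.flag are all hypothetical and chosen only for illustration. Only the ${OUTPUT_PATH}/<exp_name>/ layout comes from the text above.

```python
import os
import shutil
from pathlib import Path


def experiment_dir(exp_name, output_path=None):
    """Return the directory where a run named `exp_name` stores its outputs."""
    # Use the explicit path, then the OUTPUT_PATH environment variable,
    # then a local "exp_out" folder (this last fallback is an assumption).
    root = output_path or os.environ.get("OUTPUT_PATH", "exp_out")
    return Path(root) / exp_name


def is_finished(exp_name, output_path=None, marker="finished.flag"):
    """True if the run directory contains the (hypothetical) marker file."""
    return (experiment_dir(exp_name, output_path) / marker).is_file()


def clear_experiment(exp_name, output_path=None):
    """Delete a run's outputs so the same experiment name can be rerun."""
    shutil.rmtree(experiment_dir(exp_name, output_path), ignore_errors=True)
```

Under these assumptions, a launcher would call is_finished("first_exp") before training and return early when it is true, and clear_experiment("first_exp") mirrors the manual cleanup needed before a rerun.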

There are two ways to control an experiment.

You can specify config files with -c. Multiple config files can be combined with +. (When there are conflicts, values from the config file further to the right take precedence.) This is convenient when you have multiple terms that form a fixed group.
You can override values with -k. This is convenient when you need to change a small number of terms.
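The precedence rules for -c and -k can be sketched as a plain dictionary merge. This is a minimal illustration, not the repository's parser: the function names are hypothetical, and it assumes each config file is a flat JSON object and that -k values are simple Python literals or strings.

```python
import ast
import json


def read_json_config(path):
    """Load one config file; assumes each config is a flat JSON object."""
    with open(path) as f:
        return json.load(f)


def parse_value(raw):
    """Turn text like 'False' or '3' into a Python value; keep other text as a string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def build_config(config_arg, overrides=(), read=read_json_config):
    """Merge 'a.json+b.json' left to right, then apply key=value overrides."""
    config = {}
    # Files are applied left to right, so keys in later files win.
    for name in config_arg.split("+"):
        config.update(read(name))
    # key=value overrides are applied last and take precedence over every file.
    for item in overrides:
        key, _, raw = item.partition("=")
        config[key] = parse_value(raw)
    return config
```

With this sketch, build_config("t0.json+rte.json", ["save_model=False", "exp_name=first_exp"]) would let rte.json override any key it shares with t0.json, then set save_model to the boolean False and exp_name to the string "first_exp".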

It is recommended to use GPUs with 40GB of memory to train T0 (3B) and 80GB to train T0.

Run an array of experiments

In this project, we often need to run a large number of experiments. Here is an example bash script, bin/few-shot-pretrained-3b-100k.sh, to fine-tune 3B pre-trained (IA)^3 on all datasets.

This should take a few hours. After that, you can use scripts/get_results_table.py to generate a CSV summary.
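For readers who prefer to drive such a sweep from Python, the sketch below builds one command per dataset in the same shape as the first-experiment command. It does not reproduce bin/few-shot-pretrained-3b-100k.sh: the helper names and the exp_prefix parameter are hypothetical, and it assumes every dataset has a config file named <dataset>.json, which the source confirms only for rte.

```python
import shlex


def build_command(dataset, base_config="t0.json", exp_prefix="sweep",
                  overrides=("save_model=False",)):
    """Return the argument list for one training run on `dataset`."""
    # Combine the base config with the dataset config; the right-hand file wins.
    config_arg = f"{base_config}+{dataset}.json"
    return [
        "python", "-m", "src.pl_train",
        "-c", config_arg,
        "-k", *overrides, f"exp_name={exp_prefix}_{dataset}",
    ]


def build_sweep(datasets, **kwargs):
    """Return one shell-ready command string per dataset."""
    return [
        " ".join(shlex.quote(arg) for arg in build_command(name, **kwargs))
        for name in datasets
    ]
```

Each returned string can be printed for inspection or passed to a job scheduler. Because every run gets its own experiment name, rerunning the sweep would rely on the skip behaviour described earlier to avoid repeating finished runs.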

Citation

If you find this repo helpful, please consider citing our work:

@article{liu2020tfew,
  title={Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning},
  author={Liu, Haokun and Tam, Derek and Muqeeth, Mohammed and Mohta, Jay and Huang, Tenghao and Bansal, Mohit and Raffel, Colin},
  journal={arXiv preprint arXiv:2205.05638},
  year={2022}
}

We use code from the following works:

@article{mahabadi2021compacter,
  title={Compacter: Efficient low-rank hypercomplex adapter layers},
  author={Mahabadi, Rabeeh Karimi and Henderson, James and Ruder, Sebastian},
  journal={arXiv preprint arXiv:2106.04647},
  year={2021}
}

@article{sung2021training,
  title={Training Neural Networks with Fixed Sparse Masks},
  author={Sung, Yi-Lin and Nair, Varun and Raffel, Colin},
  journal={arXiv preprint arXiv:2111.09839},
  year={2021}
}

@article{aghajanyan2020intrinsic,
  title={Intrinsic dimensionality explains the effectiveness of language model fine-tuning},
  author={Aghajanyan, Armen and Zettlemoyer, Luke and Gupta, Sonal},
  journal={arXiv preprint arXiv:2012.13255},
  year={2020}
}
