Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Last update: Sep 28, 2022

Related tags

Overview

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Prerequisites

This repo is built upon a local copy of transformers==2.1.1. This repo has been tested on torch==1.4.0 with python 3.7 and CUDA 10.1.

To start, create a new environment and install:

conda create -n grad2task python=3.7
conda activate grad2task
cd Grad2Task
pip install -e .

We use wandb for logging. Please set it up following this doc and specify your project name on wandb in run_meta_training.sh:

export WANDB=[YOUR PROJECT NAME]

Download the dataset and unzip it under the main folder: https://drive.google.com/file/d/1uAdgZFYv9epk6tQVQ3SwboxFpSlkC_ZW/view?usp=sharing

If need to place it somewhere else, specify its path in path.sh.

Train & Evaluation

To train/evaluate models:

bash meta_learn.sh [MODEL_NAME] [MODE] [EXP_ID]

where [MODEL_NAME] refers to model name, [MODE] is experiment model and [EXP_ID] is an optional experiment id used for mark different runs using the same model. Options for [MODEL_NAM] and MODE are listed as follow:

`[MODE]`	Description
train	Training models.
test_best	Test the model with the best validation performance.
test_latest	Test the latest checkpoint.
test	Test model without meta-training. Only applicable to the `fine-tune-baseline` model.

`[MODEL_NAME]`	Description
fine-tune-baseline	Fine-tuning BERT for each task separately.
bert-protonet-euc	ProtoNet with BERT as encoder, using Euclidean distance as distance metric.
bert-protonet-euc-bn	ProtoNet with BERT+Bottleneck Adapters as encoder, using Euclidean distance as distance metric.
bert-protonet	ProtoNet with BERT as encoder, using cosine distance as distance metric.
bert-protonet-bn	ProtoNet with BERT+Bottleneck Adapters as encoder, using cosine distance as distance metric.
bert-leopard	Leopard with pretrained BERT [1].
bert-leopard-fixlr	Leopard but with fixed learning rates.
bert-cnap-bn-euc-context-cls-shift-scale-ar	Our proposed approach using gradients as task representation.
bert-cnap-bn-euc-context-cls-shift-scale-ar-X	Our proposed approach using average input encoding as task representation.
bert-cnap-bn-euc-context-cls-shift-scale-ar-XGrad	Our proposed approach using both gradients and input encoding as task representation.
bert-cnap-bn-euc-context-cls-shift-scale-ar-XY	Our proposed approach using input and textual label encoding as task representation.
bert-cnap-bn-euc-context-shift-scale-ar	Same with our proposed approach except adapting all tokens instead of just the [CLS] token as we do.
bert-cnap-bn-pretrained-taskemb	Our proposed approach with pretrained task embedding model.
bert-cnap-bn-hyper	A hypernetwork based approach.

To run a model with different hyperparameters, first name this run by [EXP_ID] and then specify the new hyperparameters in run/meta_learn.sh. For example, if one wants to run bert-protonet-euc with a smaller learning rate, they could modify run/meta_learn.sh as:

...
elif [ $1 == "bert-protonet-bn" ]; then # ProtoNet with cosince distance
    export LEARNING_RATE=2e-5
    export CHECKPOINT_FREQ=1000
    if [ ${EXP_ID} == *"lr1e-5" ]; then
        export LEARNING_RATE=1e-5
        export CHECKPOINT_FREQ=2000
        # modify other hyperparameters here
    fi
...

and then run:

bash meta_learn.sh bert-protonet-bn train lr1e-5

Reference

[1] T. Bansal, R. Jha, and A. McCallum. Learning to few-shot learn across diverse natural language classification tasks. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5108–5123, 2020.

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Related tags

Overview

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Prerequisites

Train & Evaluation

Reference

Owner

Jixuan Wang

Wordplay, an artificial Intelligence based crossword puzzle solver.

Analyzes your GitHub Profile and presents you with a report on how likely you are to become the next MLH Fellow!

VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations

This is the code for our paper "Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text"

The description of FMFCC-A (audio track of FMFCC) dataset and Challenge resluts.

Storchastic is a PyTorch library for stochastic gradient estimation in Deep Learning

Automatic learning-rate scheduler

Multiple custom object count and detection using YOLOv3-Tiny method

Prototype python implementation of the ome-ngff table spec

Code for paper "A Critical Assessment of State-of-the-Art in Entity Alignment" (https://arxiv.org/abs/2010.16314)

Benchmark for evaluating open-ended generation

Implementation of Multistream Transformers in Pytorch

Image based Human Fall Detection

Code and data for ACL2021 paper Cross-Lingual Abstractive Summarization with Limited Parallel Resources.

Calculates carbon footprint based on fuel mix and discharge profile at the utility selected. Can create graphs and tabular output for fuel mix based on input file of series of power drawn over a period of time.

Automatically creates genre collections for your Plex media

Pre-trained BERT Models for Ancient and Medieval Greek, and associated code for LaTeCH 2021 paper titled - "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek"

Improving 3D Object Detection with Channel-wise Transformer

2020 CCF大数据与计算智能大赛-非结构化商业文本信息中隐私信息识别-第7名方案

PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility