Code for the Findings of EMNLP 2021 paper: "Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning"


Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning

This repository contains the code for the Findings of EMNLP 2021 paper "Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning". Code clean-up is still in progress.

Data

Please extract the downloaded data and place it under PROJECT_DIR/datasets. Our training data stream and few-shot datasets are curated from https://github.com/iesl/leopard and https://github.com/INK-USC/CrossFit.

The directory structure is:

  • PROJECT_DIR/datasets/crossfit_data/data/ contains the 55 classification tasks from the CrossFit link above, e.g. PROJECT_DIR/datasets/crossfit_data/data/anli
  • PROJECT_DIR/datasets/leopard/ contains the 17 tasks from the leopard link above, e.g. PROJECT_DIR/datasets/leopard/airline
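
Before launching a run, it can help to confirm that the extracted data matches this layout. The following is a minimal sketch, not part of the repository: the helper name count_task_dirs is hypothetical, and it assumes that each task is stored as its own subdirectory (as the anli and airline examples suggest).

```python
from pathlib import Path

# Expected number of tasks per collection, taken from the README.
# Assumption: every task lives in its own subdirectory of the collection.
EXPECTED_TASK_COUNTS = {
    "crossfit_data/data": 55,
    "leopard": 17,
}


def count_task_dirs(project_dir):
    """Return {collection path: number of task subdirectories found}."""
    datasets_root = Path(project_dir) / "datasets"
    counts = {}
    for rel_path in EXPECTED_TASK_COUNTS:
        collection_dir = datasets_root / rel_path
        if collection_dir.is_dir():
            counts[rel_path] = sum(1 for p in collection_dir.iterdir() if p.is_dir())
        else:
            # A missing collection directory is reported as zero tasks.
            counts[rel_path] = 0
    return counts


def describe_layout(project_dir):
    """Build one human-readable status line per collection."""
    lines = []
    for rel_path, found in count_task_dirs(project_dir).items():
        expected = EXPECTED_TASK_COUNTS[rel_path]
        status = "ok" if found == expected else "mismatch"
        lines.append(f"datasets/{rel_path}: found {found}, expected {expected} ({status})")
    return lines


# Check the current working directory, assumed here to be PROJECT_DIR.
for line in describe_layout("."):
    print(line)
```

A "mismatch" line only signals that the folder count differs from the numbers quoted above; it does not validate the contents of each task.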

Environment

Our code uses PyTorch 1.7.1. To allow fp16 training, you should also install apex.
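
A quick way to see whether the environment matches is to check which packages are importable. This is a small sketch under the assumption that the packages are importable as torch and apex; the function name check_environment is hypothetical and not part of the repository.

```python
import importlib.util


def check_environment():
    """Report the installed torch version (or None) and whether apex is importable."""
    status = {
        "torch": None,
        # find_spec returns None when a top-level package is not installed.
        "apex": importlib.util.find_spec("apex") is not None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        status["torch"] = torch.__version__
    return status


env = check_environment()
print("torch version:", env["torch"] or "not installed")
print("apex available (needed for fp16 training):", env["apex"])
```

The README targets PyTorch 1.7.1; a different reported version does not necessarily mean the code will fail, only that it has not been confirmed here.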

Running Experiments

Training on CLIF-26

reg=0.01
lr=1e-4
seed=0
python run_model.py --tasks cola sst2 mrpc stsb qqp mnli qnli rte wnli \
--output_dir runs/glue_cfew_10k_choice_hnet_hardlong_sample_reg${reg}_s64_d256_limit/${lr}/${seed} \
--do_train --eval_period 100000 --eval_at_epoch_end  --wait_step 3 --num_train_epochs 100 --seed ${seed} \
--train_batch_size 64 --gradient_accumulation_steps 2 --learning_rate ${lr} --max_output_length 8 \
--generator_hdim 32 --example_limit 100 --train_limit 10000 --cl_method hnet --h_l2reg ${reg} \
--adapter_dim 256 --adapter_dim_final 64  --hard_long_term  --limit_label_vocab_space \
--sample_batch --scale_loss --stm_size 64

Few-shot evaluation on CLIF-26

python run_model.py --task_collection leopard --k_shot 16 --max_input_length 100  \
--output_dir runs/glue_cfew_10k_choice_hnet_hardlong_sample_reg${reg}_s64_d256_limit/${lr}/${seed} \
--do_few_shot_predict --eval_period 100000 --eval_at_epoch_end  --wait_step 3 --num_train_epochs 100 \
--seed ${seed} --train_batch_size 64 --predict_batch_size 16 --few_shot_train_batch_size 16 \
--few_shot_wait_step 100000 --few_shot_num_train_epochs 800 --gradient_accumulation_steps 4 \
--scale_by_accumulation --learning_rate ${lr} --max_output_length 8  --generator_hdim 32 \
--example_limit 100 --train_limit 10000 --cl_method naive --h_l2reg ${reg} --adapter_dim 256 \
--adapter_dim_final 64 --hard_long_term --limit_label_vocab_space --no_short_term --long_term_task_emb_num 9 \
--postfix "naive_16shot"  --sample_batch --stm_size 64 --few_shot_eval_period 200
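
Both CLIF-26 commands build --output_dir from the same shell variables (reg, lr, seed). The sketch below shows how that path expands, assuming the few-shot evaluation is meant to point at the directory written by the training run. The helper name run_output_dir is hypothetical; only the directory pattern is taken from the commands above.

```python
from pathlib import PurePosixPath


def run_output_dir(reg, lr, seed, root="runs"):
    """Mirror the shell expansion of the --output_dir pattern used for CLIF-26."""
    # Values are kept as strings so that "1e-4" is not rewritten as "0.0001".
    run_name = f"glue_cfew_10k_choice_hnet_hardlong_sample_reg{reg}_s64_d256_limit"
    return str(PurePosixPath(root) / run_name / str(lr) / str(seed))


# Same settings as in the training command: reg=0.01, lr=1e-4, seed=0.
train_dir = run_output_dir("0.01", "1e-4", "0")
few_shot_dir = run_output_dir("0.01", "1e-4", "0")

print(train_dir)
# runs/glue_cfew_10k_choice_hnet_hardlong_sample_reg0.01_s64_d256_limit/1e-4/0
print(train_dir == few_shot_dir)
# True
```

Changing reg, lr, or seed between the two commands would make them resolve to different directories.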

Training and evaluation on CLIF-55

reg=0.01
lr=1e-4
seed=0
python run_model.py --task_collection crossfit_cls_train --crossfit_k_shot 16 --ssd \
--output_dir runs/crossfit_hnet_merge_space_${reg}/${lr}/${seed} \
--skip_intermediate_ckpt --add_space --merge_split --split_id ${seed} --seed ${seed} \
--do_train --eval_every_k_tasks 5 --eval_period 100 --train_batch_size 64 --wait_step 3 \
--num_train_epochs 10000000 --learning_rate ${lr} --max_output_length 64 \
--example_limit 100 --train_limit 10000 --cl_method hnet --h_l2reg ${reg} \
--adapter_dim 256 --generator_hdim 32 --adapter_dim_final 64 --sample_batch \
--hard_long_term --stm_size 64

python run_model.py --task_collection crossfit_cls_train --crossfit_k_shot 16 --ssd \
--output_dir runs/crossfit_hnet_merge_space_${reg}/${lr}/${seed} \
--skip_intermediate_ckpt --add_space --merge_split --split_id ${seed} --seed ${seed} \
--do_predict --eval_every_k_tasks 5 --eval_period 100 --train_batch_size 64 --wait_step 3 \
--num_train_epochs 10000000 --learning_rate ${lr} --max_output_length 64 \
--example_limit 100 --train_limit 10000 --cl_method hnet --h_l2reg ${reg} \
--adapter_dim 256 --generator_hdim 32 --adapter_dim_final 64 --sample_batch \
--hard_long_term --stm_size 64

for split_id in 0 1 2 3 4
do
  python run_model.py --task_collection crossfit_cls_test --crossfit_k_shot 16 --ssd \
  --postfix "split${split_id}" --long_term_task_emb_num 45 --do_few_shot_predict \
  --few_shot_eval_period 200 --few_shot_num_train_epochs 800 --few_shot_train_batch_size 64 \
  --few_shot_wait_step 100 --mtl_task_num 45 \
  --output_dir runs/crossfit_hnet_merge_space_${reg}/${lr}/${seed} \
  --add_space --limit_label_vocab_space --split_id ${split_id} --seed ${seed} \
  --eval_period 100 --train_batch_size 64 --gradient_accumulation_steps 1 --wait_step 6 \
  --num_train_epochs 10000 --learning_rate ${lr} --max_output_length 64 \
  --example_limit 100 --train_limit 10000 --cl_method naive --adapter_dim 256 \
  --generator_hdim 32 --adapter_dim_final 64 --sample_batch --hard_long_term
done

The mapping between command-line arguments and the implemented methods is as follows.

  • BART-Single without adapter: --cl_method naive --no_param_gen --skip_adapter --train_all
  • BART-Single-MTL: --cl_method naive --no_param_gen --skip_mtl --mtl --train_all
  • BiHNET-Vanilla: --cl_method naive --hard_long_term
  • BiHNET with trained task embeddings: --cl_method hnet --no_short_term --train_task_embs --hard_long_term
  • BART-Adapter-Single: --cl_method naive --no_param_gen --lr 3e-4
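
When scripting several baselines, the mapping above can be kept in one place as a lookup table. The sketch below is illustrative only: METHOD_FLAGS and build_command are hypothetical names, the flag strings are copied verbatim from the list, and any further arguments (task collection, output directory, and so on) still have to be supplied by the caller.

```python
import shlex

# Method name -> flags, copied verbatim from the list above.
METHOD_FLAGS = {
    "BART-Single without adapter": "--cl_method naive --no_param_gen --skip_adapter --train_all",
    "BART-Single-MTL": "--cl_method naive --no_param_gen --skip_mtl --mtl --train_all",
    "BiHNET-Vanilla": "--cl_method naive --hard_long_term",
    "BiHNET with trained task embeddings": "--cl_method hnet --no_short_term --train_task_embs --hard_long_term",
    "BART-Adapter-Single": "--cl_method naive --no_param_gen --lr 3e-4",
}


def build_command(method, extra_args=()):
    """Return the argv list for run_model.py with the flags of the chosen method."""
    if method not in METHOD_FLAGS:
        raise KeyError(f"Unknown method: {method!r}")
    return ["python", "run_model.py", *shlex.split(METHOD_FLAGS[method]), *extra_args]


# Example: BiHNET-Vanilla with an explicit seed appended.
command = build_command("BiHNET-Vanilla", ["--seed", "0"])
print(" ".join(command))
# python run_model.py --cl_method naive --hard_long_term --seed 0
```

The resulting list can be passed to subprocess.run once the remaining training or evaluation arguments have been added.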
Owner
INK Lab @ USC
Intelligence and Knowledge Discovery (INK) Research Lab at University of Southern California