OptiPrompt

This is the PyTorch implementation of the paper Factual Probing Is [MASK]: Learning vs. Learning to Recall.

We propose OptiPrompt, a simple and effective approach for factual probing. OptiPrompt optimizes prompts directly in the input embedding space. It outperforms previous prompting methods on the LAMA benchmark. Furthermore, to better interpret probing results, we propose control experiments based on the probing results on randomly initialized models. Please check our paper for details.
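
In notation, the idea can be sketched roughly as follows. The symbols are assumptions introduced here for illustration ($[X]$ for the subject placeholder, $[V]_i$ for the learned dense vectors, $m$ for their number, $d$ for the embedding dimension, $D_r$ for the training pairs of relation $r$); the paper gives the exact formulation.

```latex
% Prompt for relation r: subject placeholder, m continuous vectors, then the mask token
t_r = [X]\; [V]_1\; [V]_2\; \dots\; [V]_m\; [\mathrm{MASK}], \qquad [V]_i \in \mathbb{R}^d

% The vectors are trained to minimize the negative log-likelihood of the gold object o
\mathcal{L}_r = -\frac{1}{|D_r|} \sum_{(s,\,o) \in D_r} \log P\big([\mathrm{MASK}] = o \mid t_r(s)\big)
```

Under this reading, $m$ plays the role of the `--num_vectors` argument described later in this README.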

Quick links

Setup
- Install dependencies
- Download the data
Run OptiPrompt
- Train/evaluate OptiPrompt
- Run experiments on all relations
Run Fine-tuning
Evaluate LAMA/LPAQA/AutoPrompt prompts
Questions?
Citation

Setup

Install dependencies

Our code is based on Python 3.7. All experiments were run on a single GPU.

Please install all the dependency packages using the following command:

pip install -r requirements.txt

Download the data

We pack all the datasets used in our experiments here. Please download the archive and extract the files to ./data, or run the following command to automatically download and extract it.

bash scripts/download_data.sh

The datasets are structured as below.

data
├── LAMA-TREx                         # The original LAMA-TREx test set (34,039 examples)
│   ├── P17.jsonl                     # Testing file for the relation `P17`
│   └── ...
├── LAMA-TREx_UHN                     # The LAMA-TREx_UHN test set (27,102 examples)
│   ├── P17.jsonl                     # Testing file for the relation `P17`
│   └── ...
├── LAMA-TREx-easy-hard               # The easy and hard partitions of the LAMA-TREx dataset (check the paper for details)
│   ├── Easy                          # The LAMA-easy partition (10,546 examples)
│   │   ├── P17.jsonl                 # Testing file for the relation `P17`
│   │   └── ...
│   └── Hard                          # The LAMA-hard partition (23,493 examples)
│       ├── P17.jsonl                 # Testing file for the relation `P17`
│       └── ...
├── autoprompt_data                   # Training data collected by AutoPrompt
│   ├── P17                           # Train/dev/test files for the relation `P17`
│   │   ├── train.jsonl               # Training examples
│   │   ├── dev.jsonl                 # Development examples
│   │   └── test.jsonl                # Test examples (the same as LAMA-TREx test set)
│   └── ...
└── cmp_lms_data                      # Training data collected by ourselves which can be used for BERT, RoBERTa, and ALBERT (we only use this dataset in Table 6 in the paper)
    ├── P17                           # Train/dev/test files for the relation `P17`
    │   ├── train.jsonl               # Training examples
    │   ├── dev.jsonl                 # Development examples
    │   ├── test.jsonl                # Test examples (a subset of the LAMA-TREx test set, filtered using the common vocab of three models)
    └── ...
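
As a rough sketch, a relation file can be read one JSON object per line, assuming the standard JSON Lines layout implied by the .jsonl extension. The helper names below (`read_jsonl`, `count_examples`) are illustrative and not part of this repository, and no particular field names are assumed.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return the examples in a JSON Lines file as a list of Python objects."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                examples.append(json.loads(line))
    return examples


def count_examples(directory):
    """Map each relation (file stem, e.g. 'P17') to its number of examples."""
    return {
        p.stem: len(read_jsonl(p))
        for p in sorted(Path(directory).glob("*.jsonl"))
    }


if __name__ == "__main__":
    # Path taken from the directory tree above; adjust if the data lives elsewhere.
    counts = count_examples("data/LAMA-TREx")
    print(len(counts), "relations,", sum(counts.values()), "examples")
```

Summing the per-relation counts for a directory is a quick way to check a download against the example totals listed in the tree above.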

Run OptiPrompt

Train/evaluate OptiPrompt

You can use code/run_optiprompt.py to train or evaluate the prompts on a specific relation. A command template is as follows:

rel=P101
dir=outputs/${rel}
mkdir -p ${dir}

python code/run_optiprompt.py \
    --relation_profile relation_metainfo/LAMA_relations.jsonl \
    --relation ${rel} \
    --common_vocab_filename common_vocabs/common_vocab_cased.txt \
    --model_name bert-base-cased \
    --do_train \
    --train_data data/autoprompt_data/${rel}/train.jsonl \
    --dev_data data/autoprompt_data/${rel}/dev.jsonl \
    --do_eval \
    --test_data data/LAMA-TREx/${rel}.jsonl \
    --output_dir ${dir} \
    --random_init none \
    --output_predictions \
    [--init_manual_template] [--num_vectors 5 | 10]

Arguments:

relation_profile: the meta information for each relation, containing the manual templates.
relation: the relation type (e.g., P101) considered in this experiment.
common_vocab_filename: the vocabulary used to filter out facts; for a fair comparison, it should be the intersection of the vocabularies of the models being compared.
model_name: the pre-trained model used in this experiment, e.g., bert-base-cased, albert-xxlarge-v1.
do_train: whether to train the prompts on a training and development set.
do_eval: whether to test the trained prompts on a testing set.
{train|dev|test}_data: the file path of training/development/testing dataset.
random_init: how to randomly initialize the model before training; there are three settings:
- none: use the pre-trained model; no random initialization is used;
- embedding: the Rand E control setting, where we randomly initialize the embedding layer of the model;
- all: the Rand M control setting, where we randomly initialize all the parameters of the model.
init_manual_template: whether to initialize the dense vectors in OptiPrompt using the manual prompts.
num_vectors: how many dense vectors are added in OptiPrompt (this argument is valid only when init_manual_template is not set).
output_predictions: whether to output top-k predictions for each testing fact (k is specified by --k).
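
The command template and arguments above can also be assembled programmatically. The following is a minimal sketch, not part of this repository: the function `build_optiprompt_command` and its parameters are hypothetical, while the flags and paths are copied from the template above. It encodes the rule that num_vectors applies only when init_manual_template is not set.

```python
import shlex


def build_optiprompt_command(relation, output_dir, random_init="none",
                             init_manual_template=False, num_vectors=5):
    """Build the argument list for code/run_optiprompt.py (train + eval)."""
    # The three settings documented for --random_init.
    if random_init not in ("none", "embedding", "all"):
        raise ValueError("unknown random_init setting: %s" % random_init)
    cmd = [
        "python", "code/run_optiprompt.py",
        "--relation_profile", "relation_metainfo/LAMA_relations.jsonl",
        "--relation", relation,
        "--common_vocab_filename", "common_vocabs/common_vocab_cased.txt",
        "--model_name", "bert-base-cased",
        "--do_train",
        "--train_data", "data/autoprompt_data/%s/train.jsonl" % relation,
        "--dev_data", "data/autoprompt_data/%s/dev.jsonl" % relation,
        "--do_eval",
        "--test_data", "data/LAMA-TREx/%s.jsonl" % relation,
        "--output_dir", output_dir,
        "--random_init", random_init,
        "--output_predictions",
    ]
    if init_manual_template:
        # Dense vectors are initialized from the manual prompt.
        cmd.append("--init_manual_template")
    else:
        # num_vectors is only meaningful without manual initialization.
        cmd += ["--num_vectors", str(num_vectors)]
    return cmd


if __name__ == "__main__":
    cmd = build_optiprompt_command("P101", "outputs/P101")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to `subprocess.run` from the repository root; printing it first makes it easy to compare against the template.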

Run experiments on all relations

We provide an example script (scripts/run_optiprompt.sh) to run OptiPrompt on all 41 relations on the LAMA benchmark. Run the following command to use it:

bash scripts/run_optiprompt.sh

By default, this script runs OptiPrompt initialized with manual prompts on the pre-trained bert-base-cased model (no random initialization is used). The results will be stored in the outputs directory.

Please modify the shell variables (i.e., OUTPUTS_DIR, MODEL, RAND) in scripts/run_optiprompt.sh if you want to run experiments on other settings.
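
For custom loops over relations, one option is to derive the relation IDs from the test file names. This is only a sketch under that assumption: the helpers `list_relations` and `plan_runs` are hypothetical, and the provided shell script may enumerate relations differently.

```python
from pathlib import Path


def list_relations(test_dir):
    """Return relation IDs (file stems such as 'P17') found in test_dir."""
    return sorted(p.stem for p in Path(test_dir).glob("*.jsonl"))


def plan_runs(relations, outputs_dir):
    """Pair each relation with its own output directory under outputs_dir."""
    return [(rel, Path(outputs_dir) / rel) for rel in relations]


if __name__ == "__main__":
    # One output directory per relation, mirroring dir=outputs/${rel} above.
    for rel, out_dir in plan_runs(list_relations("data/LAMA-TREx"), "outputs"):
        print(rel, "->", out_dir)
```

Each pair can then be fed to a per-relation command such as the template shown earlier.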

Run Fine-tuning

We release the fine-tuning code that we used in our experiments (see Section 4 of the paper).

Fine-tuning language models on factual probing

You can use code/run_finetune.py to fine-tune a language model on a specific relation. A command template is as follows:

rel=P101
dir=outputs/${rel}
mkdir -p ${dir}

python code/run_finetune.py \
    --relation_profile relation_metainfo/LAMA_relations.jsonl \
    --relation ${rel} \
    --common_vocab_filename common_vocabs/common_vocab_cased.txt \
    --model_name bert-base-cased \
    --do_train \
    --train_data data/autoprompt_data/${rel}/train.jsonl \
    --dev_data data/autoprompt_data/${rel}/dev.jsonl \
    --do_eval \
    --test_data data/LAMA-TREx/${rel}.jsonl \
    --output_dir ${dir} \
    --random_init none \
    --output_predictions

Arguments:

relation_profile: the meta information for each relation, containing the manual templates.
relation: the relation type (e.g., P101) considered in this experiment.
common_vocab_filename: the vocabulary used to filter out facts; for a fair comparison, it should be the intersection of the vocabularies of the models being compared.
model_name: the pre-trained model used in this experiment, e.g., bert-base-cased, albert-xxlarge-v1.
do_train: whether to fine-tune the model on a training and development set.
do_eval: whether to test the fine-tuned model on a testing set.
{train|dev|test}_data: the file path of training/development/testing dataset.
random_init: how to randomly initialize the model before training; there are three settings:
- none: use the pre-trained model; no random initialization is used;
- embedding: the Rand E control setting, where we randomly initialize the embedding layer of the model;
- all: the Rand M control setting, where we randomly initialize all the parameters of the model.
output_predictions: whether to output top-k predictions for each testing fact (k is specified by --k).

Run experiments on all relations

We provide an example script (scripts/run_finetune.sh) to run fine-tuning on all 41 relations on the LAMA benchmark. Run the following command to use it:

bash scripts/run_finetune.sh

Please modify the shell variables (i.e., OUTPUTS_DIR, MODEL, RAND) in scripts/run_finetune.sh if you want to run experiments on other settings.

Evaluate LAMA/LPAQA/AutoPrompt prompts

We provide a script to evaluate prompts released in previous works (based on code/run_finetune.py with only --do_eval). Please use the following command:

bash scripts/run_eval_prompts {lama | lpaqa | autoprompt}

Questions?

If you have any questions related to the code or the paper, feel free to email Zexuan Zhong ([email protected]) or Dan Friedman ([email protected]). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker!

Citation

@inproceedings{zhong2021factual,
   title={Factual Probing Is [MASK]: Learning vs. Learning to Recall},
   author={Zhong, Zexuan and Friedman, Dan and Chen, Danqi},
   booktitle={North American Association for Computational Linguistics (NAACL)},
   year={2021}
}

NAACL'2021: Factual Probing Is [MASK]: Learning vs. Learning to Recall


Owner

Princeton Natural Language Processing
