Exploiting a Zoo of Checkpoints for Unseen Tasks

This repo includes code to reproduce all results in the above NeurIPS paper, authored by Jiaji Huang, Qiang Qiu and Kenneth Church.

Dependencies

We used Python 3.8.5, but nearby versions should also work. Install all required packages with:

pip install --upgrade pip
pip install -r requirements.txt

We used CUDA 10.2.89, but any version that meets PyTorch's requirements should also work.

Highlight of Results

We highlight some major results here, so that readers can grasp the main ideas without reading the paper. In short, the paper tries to answer the question:

"Can we use a checkpoint zoo to build something that better adapts to unseen tasks?"

To answer the question, we first need to understand the geometry of the space of tasks.

Characterize the Task Space

In the paper, we model the tasks as following a Gaussian process. Its covariance is computed by applying kernel alignment to extracted features. The features are obtained by feeding probe data into the checkpoints, each of which was trained for one task. For example, using 34 checkpoints from Hugging Face models, we can estimate the 34x34 covariance of their corresponding tasks.

To reproduce the above figure, refer to LMs/README.md.
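As a rough illustration of the covariance estimate described above, the sketch below uses linear centered kernel alignment (CKA) between probe features. The choice of a linear kernel and the names linear_cka and task_covariance are assumptions made for this example; the exact kernel and preprocessing used in the paper and in LMs/ may differ.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear centered kernel alignment between two feature matrices.

    X and Y have shape (n_probe, d_x) and (n_probe, d_y): one row per
    probe example, extracted from two different checkpoints.
    """
    # Center each feature dimension over the probe examples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)


def task_covariance(features):
    """Pairwise alignment matrix for a list of per-checkpoint features."""
    n = len(features)
    K = np.eye(n)  # each checkpoint is perfectly aligned with itself
    for i in range(n):
        for j in range(i + 1, n):
            K[i, j] = K[j, i] = linear_cka(features[i], features[j])
    return K


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for features of 4 checkpoints on 50 probe inputs.
    feats = [rng.normal(size=(50, 16)) for _ in range(4)]
    print(task_covariance(feats).round(3))
```

With 34 checkpoints, the same loop would produce a 34x34 matrix. The feature dimensions of different checkpoints do not need to match, only the number of probe examples.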

Exploit the Task Space

We hypothesize that representative tasks generalize better to new tasks. This, of course, needs a rigorous mathematical proof. Empirically, however, we find it to be true, as indicated by the experiments on NLP and vision tasks.

So, how do we identify representative tasks? They are supposed to convey the most information about the rest of the task space. We formulate the problem as a Max-Mutual-Information (MMI) objective. The solver takes the covariance as input and greedily picks representative tasks.
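The greedy selection can be sketched as follows. This is a minimal example that assumes a standard greedy mutual-information criterion for Gaussian models: at each step, pick the task whose variance given the already selected tasks is large relative to its variance given all remaining tasks. The function names and the jitter term are assumptions for illustration, and the solver shipped in this repo may differ in its details.

```python
import numpy as np


def conditional_variance(K, y, S, jitter=1e-8):
    """Variance of task y given the tasks in S, for a Gaussian with covariance K."""
    S = list(S)
    if not S:
        return K[y, y]
    # Small jitter keeps the solve stable if K is nearly singular.
    K_SS = K[np.ix_(S, S)] + jitter * np.eye(len(S))
    k_Sy = K[S, y]
    return K[y, y] - k_Sy @ np.linalg.solve(K_SS, k_Sy)


def greedy_mmi(K, k):
    """Greedily pick k task indices under a mutual-information criterion."""
    n = K.shape[0]
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for y in range(n):
            if y in selected:
                continue
            rest = [j for j in range(n) if j != y and j not in selected]
            # High score: y is poorly explained by the selected tasks,
            # but well explained by (hence informative about) the rest.
            num = conditional_variance(K, y, selected)
            den = max(conditional_variance(K, y, rest), 1e-12)
            if num / den > best_score:
                best, best_score = y, num / den
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy covariance: task 0 is a hub for tasks 1 and 2; task 3 is unrelated.
    K = np.array([
        [1.00, 0.80, 0.80, 0.0],
        [0.80, 1.00, 0.64, 0.0],
        [0.80, 0.64, 1.00, 0.0],
        [0.00, 0.00, 0.00, 1.0],
    ])
    print(greedy_mmi(K, 2))
```

In the toy example the hub task is picked first, because it carries the most information about the tasks that depend on it.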

Linguistic Tasks

Using the 34x34 covariance matrix, we find that the 5 most representative tasks are those corresponding to roberta-base, distilbert-base-uncased, t5-base, bert-base-cased and bart-large. Combining these checkpoints yields superior results on 8 new linguistic tasks. Below is an example for the chunking task.

To reproduce full results, check LMs/README.md for details.

Computer Vision Tasks

The observation holds for vision tasks too. Below is an experiment set up on CIFAR-100. MMI shows a steady gain over random selection and outperforms another baseline.

To reproduce all results, check vision/README.md for details.

Additional Comments

Note: This project requires running many small jobs, so a Slurm cluster that can launch jobs in parallel is very useful. In the job-launching scripts, you will see multiple commands like

sbatch -p $partition --gres=gpu:1 --wrap "python run.py" -o $job_log_path

If you do not have such a cluster, just use

python run.py > $job_log_path

instead.
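If you adapt the launching scripts, the choice between the two commands above could be wrapped in a small helper like the one below. This is only a sketch: build_launch_command is a hypothetical name, not part of this repo, and it simply reproduces the two command forms shown above.

```python
import shlex


def build_launch_command(script, log_path, partition=None):
    """Return the shell command for one job.

    With a partition, build the sbatch form shown above; without one,
    fall back to running the script locally and redirecting its output.
    """
    if partition:
        return shlex.join([
            "sbatch", "-p", partition, "--gres=gpu:1",
            "--wrap", f"python {script}", "-o", log_path,
        ])
    return f"python {shlex.quote(script)} > {shlex.quote(log_path)}"


if __name__ == "__main__":
    print(build_launch_command("run.py", "job.log", partition="gpu"))
    print(build_launch_command("run.py", "job.log"))
```

Without Slurm the jobs run one after another, so expect the full set of experiments to take correspondingly longer.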

Owner
Baidu Research