This is the code for Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

Last update: Nov 15, 2022

Overview

The repository includes /bert, which is the original BERT repository modified to support weight pruning. (It also uses gradient checkpointing, if you need that. Gradient checkpointing can be disabled by setting the Unix environment variable DISABLE_GRAD_CHECKPOINT=True. This only works during fine-tuning, not during pre-training.)
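As a minimal sketch of setting that variable, the commands below use only standard shell features. The variable name and value come from the text above; `printenv` is used here purely as a stand-in to confirm that child processes see the value. Whether the variable reaches jobs submitted through a scheduler depends on your submitter configuration, which this sketch does not cover.

```shell
# Set the variable for the rest of this shell session.
# Child processes started from this shell inherit it.
export DISABLE_GRAD_CHECKPOINT=True

# Confirm that a child process can see the value.
printenv DISABLE_GRAD_CHECKPOINT

# Alternatively, set it for a single command only.
# printenv is a placeholder for whatever command you actually run.
DISABLE_GRAD_CHECKPOINT=True printenv DISABLE_GRAD_CHECKPOINT
```

To restore the default behaviour in the same session, run `unset DISABLE_GRAD_CHECKPOINT`.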

I am currently in the process of converting these experiments into a ducttape workflow, so things are a little unstable right now.

Things that have not been converted to ducttape:

Anything in tables/
Anything in graphs/

If you need all the experiments from the paper, check out this commit. It's very messy, so be prepared to read the code. I will not be releasing a guide to run that code, since it will be made obsolete by the ducttape workflow.

Configuration

pip install -r requirements.txt

To pre-train, you will need a GPU with at least 12 GB of memory. I've been using Titan RTXs via Univa Grid Engine. If you don't like this setup, you will need to modify tapes/submitters.tape and/or main.tconf.

You'll also need the Wikipedia corpus and BookCorpus, which can be retrieved with scripts/download_wiki.sh and scripts/download_bookcorpus.sh, respectively. GLUE data can be retrieved by running scripts/get_glue.py.

You will need to update tapes/link_data.tape to point to dataset locations.

You will also need to update main.tconf to point to the location of your repository on disk (so ducttape knows where to find packages).
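Before editing the configuration, it can help to confirm that the files named above are where you expect them. The following is a small pre-flight sketch, meant to be run from the repository root. `check_paths` is a hypothetical helper written for this example and is not part of the repository; the file paths are the ones mentioned in this README, and main.tape is taken from the ducttape command below.

```shell
#!/bin/sh
# check_paths: hypothetical helper, not part of the repository.
# Prints whether each given path exists and returns the number missing.
check_paths() {
  missing=0
  for path in "$@"; do
    if [ -e "$path" ]; then
      echo "found:   $path"
    else
      echo "missing: $path"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Paths named in this README, relative to the repository root.
check_paths requirements.txt main.tape main.tconf \
  tapes/submitters.tape tapes/link_data.tape \
  scripts/download_wiki.sh scripts/download_bookcorpus.sh scripts/get_glue.py \
  || echo "Some files are missing; are you in the repository root?"
```

The check only tests that the paths exist. It does not verify that tapes/link_data.tape or main.tconf point to valid locations on your machine.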

AFAIK, no one besides me has used this code. If you have trouble, please open an issue and I'll do what I can to help out.

Most experiments are run using

ducttape main.tape -C main.tconf -p main


Owner

Mitchell Gordon
