Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm

Last update: Dec 05, 2022

This is the PyTorch implementation of sparse progressive distillation (SPD). For details on the motivation, techniques, and experimental results, refer to our paper.

Running

Environment Preparation (using python3)
```
pip install -r requirements.txt
```
Dataset Preparation

The original GLUE dataset can be downloaded here.

BERT_base fine-tuning on GLUE

We use a fine-tuned BERT_base as the teacher. For each GLUE task, we obtain the fine-tuned model by running the original Hugging Face Transformers code with the following script.

```
python run_glue.py \
  --model_name_or_path $INT_DIR \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR/$TASK_NAME/ \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 3e-5 \
  --num_train_epochs 4.0 \
  --output_dir $OUT_DIR \
  --evaluate_during_training \
  --overwrite_output_dir \
  --logging_steps 400 \
  --logging_dir $OUT_DIR \
  --save_steps 10000
```
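
Because a teacher is needed for every GLUE task, the command above is usually repeated once per task. The following is a minimal sketch, not part of this repository, that builds the same command line for several tasks. The helper name `build_finetune_command`, the task list, the example paths, and the per-task output sub-directory are all assumptions made for illustration. The script only prints the commands; hand each list to `subprocess.run` if you want to execute it.

```python
import shlex

# Hypothetical task list; adjust to the GLUE tasks you actually have on disk.
GLUE_TASKS = ["CoLA", "SST-2", "MRPC", "STS-B", "QQP", "MNLI", "QNLI", "RTE"]


def build_finetune_command(task_name, init_dir, glue_dir, out_dir):
    """Return the argv list for the teacher fine-tuning command shown above."""
    # Assumption: each task writes to its own sub-directory of out_dir.
    task_out = f"{out_dir}/{task_name}"
    return [
        "python", "run_glue.py",
        "--model_name_or_path", init_dir,
        "--task_name", task_name,
        "--do_train",
        "--do_eval",
        "--data_dir", f"{glue_dir}/{task_name}/",
        "--max_seq_length", "128",
        "--per_gpu_train_batch_size", "32",
        "--per_gpu_eval_batch_size", "32",
        "--learning_rate", "3e-5",
        "--num_train_epochs", "4.0",
        "--output_dir", task_out,
        "--evaluate_during_training",
        "--overwrite_output_dir",
        "--logging_steps", "400",
        "--logging_dir", task_out,
        "--save_steps", "10000",
    ]


if __name__ == "__main__":
    # Placeholder paths, chosen only for this example.
    for task in GLUE_TASKS:
        cmd = build_finetune_command(task, "path/to/initial_model", "glue_data", "teachers")
        print(shlex.join(cmd))
```

Keeping the command as a list of arguments avoids shell-quoting problems if a path contains spaces.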

Sparse Progressive Distillation

We use the run_glue.py in this repository to run sparse progressive distillation. --num_prune_epochs sets the number of pruning epochs, and --num_train_epochs sets the total number of epochs across all stages (pruning, progressive distillation, and fine-tuning).

```
python run_glue.py \
  --model_name_or_path PATH_TO_FINETUNED_MODEL \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir $GLUE_DIR/$TASK_NAME/ \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 6.4e-4 \
  --save_steps 50 \
  --num_prune_epochs 30 \
  --num_train_epochs 60 \
  --sparsity 0.9 \
  --output_dir $OUT_DIR \
  --evaluate_during_training \
  --replacing_rate 0.8 \
  --overwrite_output_dir \
  --steps_for_replacing 0 \
  --scheduler_type linear
```
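
To make the relationship between the two epoch flags concrete, here is a small sketch of the arithmetic implied by their description. The function name `epoch_budget` is hypothetical and does not appear in the repository. How the remaining epochs are divided between progressive distillation and fine-tuning is not specified here, so the sketch reports them together.

```python
def epoch_budget(num_train_epochs, num_prune_epochs):
    """Split the total epoch count into pruning epochs and the remainder.

    Assumption: --num_train_epochs counts every stage, so whatever is not
    used for pruning is left for progressive distillation and fine-tuning.
    """
    if num_prune_epochs > num_train_epochs:
        raise ValueError("num_prune_epochs cannot exceed num_train_epochs")
    return {
        "pruning": num_prune_epochs,
        "distillation_and_finetuning": num_train_epochs - num_prune_epochs,
    }


# Values taken from the command: 60 total epochs, 30 of them for pruning.
print(epoch_budget(60, 30))
```

With the values used in the command, 30 of the 60 epochs remain once pruning is finished.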

To Dos

Provide our teacher model for each task.
Provide the best-performing model checkpoint for each task.
