This repository contains the code for our EMNLP 2021 paper, "HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression".

Last update: Mar 24, 2022

Overview

HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression

Requirements

Install NVIDIA apex with its C++ and CUDA extensions:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Download checkpoints

Download the vocabulary file of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the pre-trained checkpoint of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the second general-distillation checkpoint of TinyBERT from HERE, and extract it into ./pretrained_ckpt/.

Prepare dataset

Download the GLUE dataset (which includes MNLI) using the script from HERE, and put the files into ./dataset/glue/. Download the Amazon Reviews dataset from HERE, and extract it into ./dataset/amazon_review/.
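The checkpoint and dataset steps above expect three directories relative to the repository root. The snippet below is a small convenience sketch, not part of the repository: it assumes you run it from the repository root, creates only the paths named above, and reports which of them are still empty so you can tell what remains to be downloaded.

```bash
#!/usr/bin/env bash
# Convenience sketch (not shipped with the repository): create the
# directories expected by the README and report which are still empty.
set -euo pipefail

# Paths taken from the "Download checkpoints" and "Prepare dataset" steps.
dirs=(./pretrained_ckpt ./dataset/glue ./dataset/amazon_review)

mkdir -p "${dirs[@]}"

for dir in "${dirs[@]}"; do
  # ls -A lists all entries except . and ..; empty output means nothing downloaded yet.
  if [ -z "$(ls -A "$dir")" ]; then
    echo "empty: $dir"
  else
    echo "ok: $dir"
  fi
done
```

Running it again after the downloads is harmless, since mkdir -p leaves existing directories untouched.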

Train the teacher model (BERT$_{\rm B}$-single) from single-domain

bash train_domain.sh

Distill the student model (BERT$_{\rm S}$) with TinyBERT-KD from single-domain

bash finetune_domain.sh

Train the teacher model (HRKD-teacher) from multi-domain

bash train_multi_domain.sh

Then place the resulting checkpoints in the directories specified at the beginning of finetune_multi_domain.py (see that file for details).

Distill the student model (BERT$_{\rm S}$) with our HRKD from multi-domain

bash finetune_multi_domain.sh

Reference

If you find this code helpful for your research, please cite the following paper.

@inproceedings{dong2021hrkd,
  title     = {{HRKD}: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression},
  author    = {Chenhe Dong and Yaliang Li and Ying Shen and Minghui Qiu},
  booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2021}
}

Owner

Chenhe Dong