Official PyTorch implementation of PS-KD

Overview

LGCNS AI Research pytorch

Self-Knowledge Distillation with Progressive Refinement of Targets (PS-KD)

Accepted at ICCV 2021, oral presentation

  • Official PyTorch implementation of Self-Knowledge Distillation with Progressive Refinement of Targets (PS-KD).
    [Slides] [Paper] [Video]
  • Kyungyul Kim, ByeongMoon Ji, Doyoung Yoon and Sangheum Hwang

Abstract

The generalization capability of deep neural networks has been substantially improved by applying a wide spectrum of regularization methods, e.g., restricting function space, injecting randomness during training, augmenting data, etc. In this work, we propose a simple yet effective regularization method named progressive self-knowledge distillation (PS-KD), which progressively distills a model's own knowledge to soften hard targets (i.e., one-hot vectors) during training. Hence, it can be interpreted within a framework of knowledge distillation as a student becomes a teacher itself. Specifically, targets are adjusted adaptively by combining the ground-truth and past predictions from the model itself. Please refer to the paper for more details.

Requirements

We have tested the code on the following environments:

  • Python 3.7.7 / Pytorch (>=1.6.0) / torchvision (>=0.7.0)

Datasets

Currently, only CIFAR-100, ImageNet dataset is supported.

#) To verify the effectivness of PS-KD on Detection task and Machine translation task, we used

  • For object detection: Pascal VOC
  • For machine translation: IWSLT 15 English-German / German-English, Multi30k.
  • (Please refer to the paper for more details)

How to Run

Single-node & Multi-GPU Training

To train a single model with 1 nodes & multi-GPU, run the command as follows:

$ python3 main.py --lr 0.1 \
                  --lr_decay_schedule 150 225 \
                  --PSKD \
                  --experiments_dir '<set your own path>' \
                  --classifier_type 'ResNet18' \
                  --data_path '<root your own data path>' \
                  --data_type '<cifar100 or imagenet>' \
                  --alpha_T 0.8 \
                  --rank 0 \
                  --world_size 1 \
                  --multiprocessing_distributed True

Multi-node Training

To train a single model with 2 nodes, for instance, run the commands below in sequence:

# on the node #0
$ python3 main.py --lr 0.1 \
                  --lr_decay_schedule 150 225 \
                  --PSKD \
                  --experiments_dir '<set your own path>' \
                  --classifier_type 'ResNet18' \
                  --data_path '<root your own data path>' \
                  --data_type '<cifar100 or imagenet>' \
                  --alpha_T 0.8 \
                  --rank 0 \
                  --world_size 2 \
                  --dist_url tcp://{master_ip}:{master_port} \
                  --multiprocessing_distributed
# on the node #1
$ python3 main.py --lr 0.1 \
                  --lr_decay_schedule 150 225 \
                  --PSKD \
                  --experiments_dir '<set your own path>' \
                  --classifier_type 'ResNet18' \
                  --data_path '<root your own data path>' \
                  --data_type '<cifar100 or imagenet>' \
                  --alpha_T 0.8 \
                  --rank 1 \
                  --world_size 2 \
                  --dist_url tcp://{master_ip}:{master_port} \
                  --multiprocessing_distributed

Saving & Loading Checkpoints

Saved Filenames

  • save_dir will be automatically determined(with sequential number suffixes) unless otherwise designated.
  • Model's checkpoints are saved in ./{experiments_dir}/models/checkpoint_{epoch}.pth.
  • The best checkpoints are saved in ./{experiments_dir}/models/checkpoint_best.pth.

Loading Checkpoints (resume)

  • Pass model path as a --resume argument

Experimental Results

Performance measures

  • Top-1 Error / Top-5 Error
  • Negative Log Likelihood (NLL)
  • Expected Calibration Error (ECE)
  • Area Under the Risk-coverage Curve (AURC)

Results on CIFAR-100

Model + Method Dataset Top-1 Error Top-5 Error NLL ECE AURC
PreAct ResNet-18 (baseline) CIFAR-100 24.18 6.90 1.10 11.84 67.65
PreAct ResNet-18 + Label Smoothing CIFAR-100 20.94 6.02 0.98 10.79 57.74
PreAct ResNet-18 + CS-KD [CVPR'20] CIFAR-100 21.30 5.70 0.88 6.24 56.56
PreAct ResNet-18 + TF-KD [CVPR'20] CIFAR-100 22.88 6.01 1.05 11.96 61.77
PreAct ResNet-18 + PS-KD CIFAR-100 20.82 5.10 0.76 1.77 52.10
PreAct ResNet-101 (baseline) CIFAR-100 20.75 5.28 0.89 10.02 55.45
PreAct ResNet-101 + Label Smoothing CIFAR-100 19.84 5.07 0.93 3.43 95.76
PreAct ResNet-101 + CS-KD [CVPR'20] CIFAR-100 20.76 5.62 1.02 12.18 64.44
PreAct ResNet-101 + TF-KD [CVPR'20] CIFAR-100 20.13 5.10 0.84 6.14 58.8
PreAct ResNet-101 + PS-KD CIFAR-100 19.43 4.30 0.74 6.92 49.01
DenseNet-121 (baseline) CIFAR-100 20.05 4.99 0.82 7.34 52.21
DenseNet-121 + Label Smoothing CIFAR-100 19.80 5.46 0.92 3.76 91.06
DenseNet-121 + CS-KD [CVPR'20] CIFAR-100 20.47 6.21 1.07 13.80 73.37
DenseNet-121 + TF-KD [CVPR'20] CIFAR-100 19.88 5.10 0.85 7.33 69.23
DenseNet-121 + PS-KD CIFAR-100 18.73 3.90 0.69 3.71 45.55
ResNeXt-29 (baseline) CIFAR-100 18.65 4.47 0.74 4.17 44.27
ResNeXt-29 + Label Smoothing CIFAR-100 17.60 4.23 1.05 22.14 41.92
ResNeXt-29 + CS-KD [CVPR'20] CIFAR-100 18.26 4.37 0.80 5.95 42.11
ResNeXt-29 + TF-KD [CVPR'20] CIFAR-100 17.33 3.87 0.74 6.73 40.34
ResNeXt-29 + PS-KD CIFAR-100 17.28 3.60 0.72 9.18 40.19
PyramidNet-200 (baseline) CIFAR-100 16.80 3.69 0.73 8.04 36.95
PyramidNet-200 + Label Smoothing CIFAR-100 17.82 4.72 0.89 3.46 105.02
PyramidNet-200 + CS-KD [CVPR'20] CIFAR-100 18.31 5.70 1.17 14.70 70.05
PyramidNet-200 + TF-KD [CVPR'20] CIFAR-100 16.48 3.37 0.79 10.48 37.04
PyramidNet-200 + PS-KD CIFAR-100 15.49 3.08 0.56 1.83 32.14

Results on ImageNet

Model +Method Dataset Top-1 Error Top-5 Error NLL ECE AURC
DenseNet-264* ImageNet 22.15 6.12 -- -- --
ResNet-152 ImageNet 22.19 6.19 0.88 3.84 61.79
ResNet-152 + Label Smoothing ImageNet 21.73 5.85 0.92 3.91 68.24
ResNet-152 + CS-KD [CVPR'20] ImageNet 21.61 5.92 0.90 5.79 62.12
ResNet-152 + TF-KD [CVPR'20] ImageNet 22.76 6.43 0.91 4.70 65.28
ResNet-152 + PS-KD ImageNet 21.41 5.86 0.84 2.51 61.01

* denotes results reported in the original papers

Citation

If you find this repository useful, please consider giving a star and citation PS-KD:

@InProceedings{Kim_2021_ICCV,
    author    = {Kim, Kyungyul and Ji, ByeongMoon and Yoon, Doyoung and Hwang, Sangheum},
    title     = {Self-Knowledge Distillation With Progressive Refinement of Targets},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {6567-6576}
}

Contact for Issues

License

Copyright (c) 2021-present LG CNS Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Owner
Open source repository of LG CNS AI Research (LAIR), LG
BADet: Boundary-Aware 3D Object Detection from Point Clouds (Pattern Recognition 2022)

BADet: Boundary-Aware 3D Object Detection from Point Clouds (Pattern Recognition

Rui Qian 17 Dec 12, 2022
The repo contains the code of the ACL2020 paper `Dice Loss for Data-imbalanced NLP Tasks`

Dice Loss for NLP Tasks This repository contains code for Dice Loss for Data-imbalanced NLP Tasks at ACL2020. Setup Install Package Dependencies The c

223 Dec 17, 2022
simple demo codes for Learning to Teach with Dynamic Loss Functions

Learning to Teach with Dynamic Loss Functions This repo contains the simple demo for the NeurIPS-18 paper: Learning to Teach with Dynamic Loss Functio

Lijun Wu 15 Dec 30, 2021
Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Expressive Body Capture: 3D Hands, Face, and Body from a Single Image [Project Page] [Paper] [Supp. Mat.] Table of Contents License Description Fittin

Vassilis Choutas 1.3k Jan 07, 2023
Balancing Principle for Unsupervised Domain Adaptation

Blancing Principle for Domain Adaptation NeurIPS 2021 Paper Abstract We address the unsolved algorithm design problem of choosing a justified regulari

Marius-Constantin Dinu 4 Dec 15, 2022
[peer review] An Arbitrary Scale Super-Resolution Approach for 3D MR Images using Implicit Neural Representation

ArSSR This repository is the pytorch implementation of our manuscript "An Arbitrary Scale Super-Resolution Approach for 3-Dimensional Magnetic Resonan

Qing Wu 19 Dec 12, 2022
Contrastive Feature Loss for Image Prediction

Contrastive Feature Loss for Image Prediction We provide a PyTorch implementation of our contrastive feature loss presented in: Contrastive Feature Lo

Alex Andonian 44 Oct 05, 2022
Changing the Mind of Transformers for Topically-Controllable Language Generation

We will first introduce the how to run the IPython notebook demo by downloading our pretrained models. Then, we will introduce how to run our training and evaluation code.

IESL 20 Dec 06, 2022
img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation

img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation Figure 1: We estimate the 6DoF rigid transformation of a 3D face (rendered in si

Vítor Albiero 519 Dec 29, 2022
CCPD: a diverse and well-annotated dataset for license plate detection and recognition

CCPD (Chinese City Parking Dataset, ECCV) UPdate on 10/03/2019. CCPD Dataset is now updated. We are confident that images in subsets of CCPD is much m

detectRecog 1.8k Dec 30, 2022
A font family with a great monospaced variant for programmers.

Fantasque Sans Mono A programming font, designed with functionality in mind, and with some wibbly-wobbly handwriting-like fuzziness that makes it unas

Jany Belluz 6.3k Jan 08, 2023
Multi-View Radar Semantic Segmentation

Multi-View Radar Semantic Segmentation Paper Multi-View Radar Semantic Segmentation, ICCV 2021. Arthur Ouaknine, Alasdair Newson, Patrick Pérez, Flore

valeo.ai 37 Oct 25, 2022
PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

Unsupervised Depth Completion with Calibrated Backprojection Layers PyTorch implementation of Unsupervised Depth Completion with Calibrated Backprojec

80 Dec 13, 2022
Exploration of some patients clinical variables.

Answer_ALS_clinical_data Exploration of some patients clinical variables. All the clinical / metadata data is available here: https://data.answerals.o

1 Jan 20, 2022
A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️

hf-hub-lightning A callback for pushing lightning models to the Hugging Face Hub. Note: I made this package for myself, mostly...if folks seem to be i

Nathan Raw 27 Dec 14, 2022
Binary classification for arrythmia detection with ECG datasets.

HEART DISEASE AI DATATHON 2021 [Eng] / [Kor] #English This is an AI diagnosis modeling contest that uses the heart disease echocardiography and electr

HY_Kim 3 Jul 14, 2022
Code for "Long-tailed Distribution Adaptation"

Long-tailed Distribution Adaptation (Accepted in ACM MM2021) This project is built upon BBN. Installation pip install -r requirements.txt Usage Traini

Zhiliang Peng 10 May 18, 2022
Fuzzy Overclustering (FOC)

Fuzzy Overclustering (FOC) In real-world datasets, we need consistent annotations between annotators to give a certain ground-truth label. However, in

2 Nov 08, 2022
The official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness.

This repository is the official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness. Requirements pip install -r requi

Jie Ren 17 Dec 12, 2022
Method for facial emotion recognition compitition of Xunfei and Datawhale .

人脸情绪识别挑战赛-第3名-W03KFgNOc-源代码、模型以及说明文档 队名:W03KFgNOc 排名:3 正确率: 0.75564 队员:yyMoming,xkwang,RichardoMu。 比赛链接:人脸情绪识别挑战赛 文章地址:link emotion 该项目分别训练八个模型并生成csv文

6 Oct 17, 2022