Official PyTorch implementation of "Proxy Synthesis: Learning with Synthetic Classes for Deep Metric Learning" (AAAI 2021)

Overview

Proxy Synthesis: Learning with Synthetic Classes for Deep Metric Learning

Official PyTorch implementation of "Proxy Synthesis: Learning with Synthetic Classes for Deep Metric Learning" (AAAI 2021)

Geonmo Gu*1, Byungsoo Ko*1, Han-Gyu Kim2 (* Authors contributed equally.)

1@NAVER/LINE Vision, 2@NAVER Clova Speech

Overview

Proxy Synthesis

  • Proxy Synthesis (PS) is a novel regularizer for any softmax variants and proxy-based losses in deep metric learning.

How it works?

  • Proxy Synthesis exploits synthetic classes and improves generalization by considering class relations and obtaining smooth decision boundaries.
  • Synthetic classes mimic unseen classes during training phase as described in below Figure.

Experimental results

  • Proxy Synthesis improves performance for every loss and benchmark dataset.

Getting Started

Installation

  1. Clone the repository locally
$ git clone https://github.com/navervision/proxy-synthesis
  1. Create conda virtual environment
$ conda create -n proxy_synthesis python=3.7 anaconda
$ conda activate proxy_synthesis
  1. Install pytorch
$ conda install pytorch torchvision cudatoolkit=<YOUR_CUDA_VERSION> -c pytorch
  1. Install faiss
$ conda install faiss-gpu cudatoolkit=<YOUR_CUDA_VERSION> -c pytorch
  1. Install requirements
$ pip install -r requirements.txt

Prepare Data

  • Download CARS196 dataset and unzip
$ wget http://imagenet.stanford.edu/internal/car196/car_ims.tgz
$ tar zxvf car_ims.tgz -C ./dataset
  • Rearrange CARS196 directory by following structure
# Dataset structure
/dataset/carDB/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  test/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
# Rearrange dataset structure
$ python dataset/prepare_cars.py

Train models

Norm-SoftMax loss with CARS196

# Norm-SoftMax
$ python main.py --gpu=0 \
--save_path=./logs/CARS196_norm_softmax \
--data=./dataset/carDB --data_name=cars196 \
--dim=512 --batch_size=128 --epochs=130 \
--freeze_BN --loss=Norm_SoftMax \
--decay_step=50 --decay_stop=50 --n_instance=1 \
--scale=23.0 --check_epoch=5

PS + Norm-SoftMax loss with CARS196

# PS + Norm-SoftMax
$ python main.py --gpu=0 \
--save_path=./logs/CARS196_PS_norm_softmax \
--data=./dataset/carDB --data_name=cars196 \
 --dim=512 --batch_size=128 --epochs=130 \
--freeze_BN --loss=Norm_SoftMax \
--decay_step=50 --decay_stop=50 --n_instance=1 \
--scale=23.0 --check_epoch=5 \
--ps_alpha=0.40 --ps_mu=1.0

Proxy-NCA loss with CARS196

# Proxy-NCA
$ python main.py --gpu=0 \
--save_path=./logs/CARS196_proxy_nca \
--data=./dataset/carDB --data_name=cars196 \
--dim=512 --batch_size=128 --epochs=130 \
--freeze_BN --loss=Proxy_NCA \
--decay_step=50 --decay_stop=50 --n_instance=1 \
--scale=12.0 --check_epoch=5

PS + Proxy-NCA loss with CARS196

# PS + Proxy-NCA
$ python main.py --gpu=0 \
--save_path=./logs/CARS196_PS_proxy_nca \
--data=./dataset/carDB --data_name=cars196 \
--dim=512 --batch_size=128 --epochs=130 \
--freeze_BN --loss=Proxy_NCA \
--decay_step=50 --decay_stop=50 --n_instance=1 \
--scale=12.0 --check_epoch=5 \
--ps_alpha=0.40 --ps_mu=1.0

Check Test Results

$ tensorboard --logdir=logs --port=10000

Experimental results

  • We report [email protected], RP and MAP performances of each loss, which are trained with CARS196 dataset for 8 runs.

[email protected]

Loss 1 2 3 4 5 6 7 8 Mean ± std
Norm-SoftMax 83.38 83.25 83.25 83.18 83.05 82.90 82.83 82.79 83.08 ± 0.21
PS + Norm-SoftMax 84.69 84.58 84.45 84.35 84.22 83.95 83.91 83.89 84.25 ± 0.31
Proxy-NCA 83.74 83.69 83.62 83.32 83.06 83.00 82.97 82.84 83.28 ± 0.36
PS + Proxy-NCA 84.52 84.39 84.32 84.29 84.22 84.12 83.94 83.88 84.21 ± 0.21

RP

Loss 1 2 3 4 5 6 7 8 Mean ± std
Norm-SoftMax 35.85 35.51 35.28 35.28 35.24 34.95 34.87 34.84 35.23 ± 0.34
PS + Norm-SoftMax 37.01 36.98 36.92 36.74 36.74 36.73 36.54 36.45 36.76 ± 0.20
Proxy-NCA 36.08 35.85 35.79 35.66 35.66 35.63 35.47 35.43 35.70 ± 0.21
PS + Proxy-NCA 36.97 36.84 36.72 36.64 36.63 36.60 36.43 36.41 36.66 ± 0.18

MAP

Loss 1 2 3 4 5 6 7 8 Mean ± std
Norm-SoftMax 25.56 25.56 25.00 24.93 24.90 24.59 24.57 24.56 24.92 ± 0.35
PS + Norm-SoftMax 26.71 26.67 26.65 26.56 26.53 26.52 26.30 26.17 26.51 ± 0.18
Proxy-NCA 25.66 25.52 25.37 25.36 25.33 25.26 25.22 25.04 25.35 ± 0.18
PS + Proxy-NCA 26.77 26.63 26.50 26.42 26.37 26.31 26.25 26.12 26.42 ± 0.20

Performance Graph

  • Below figure shows performance graph of test set during training.

Reference

  • Our code is based on SoftTriple repository (Arxiv, Github)

Citation

If you find Proxy Synthesis useful in your research, please consider to cite the following paper.

@inproceedings{gu2020proxy,
    title={Proxy Synthesis: Learning with Synthetic Classes for Deep Metric Learning},
    author={Geonmo Gu, Byungsoo Ko, and Han-Gyu Kim},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    year={2021}
}

License

Copyright 2021-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Owner
NAVER/LINE Vision
Open source repository of Vision, NAVER & LINE
NAVER/LINE Vision
PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Study-CSRNet-pytorch This is the PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

0 Mar 01, 2022
TensorFlow tutorials and best practices.

Effective TensorFlow 2 Table of Contents Part I: TensorFlow 2 Fundamentals TensorFlow 2 Basics Broadcasting the good and the ugly Take advantage of th

Vahid Kazemi 8.7k Dec 31, 2022
Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation

Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation By Qiang Zhou*, Zilong Huang*, Lichao Huang, Han Shen, Yon

Forest 117 Apr 01, 2022
Pytorch implementation of CVPR2021 paper "MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation"

MUST-GAN Code | paper The Pytorch implementation of our CVPR2021 paper "MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generat

TianxiangMa 46 Dec 26, 2022
Teaches a student network from the knowledge obtained via training of a larger teacher network

Distilling-the-knowledge-in-neural-network Teaches a student network from the knowledge obtained via training of a larger teacher network This is an i

Abhishek Sinha 146 Dec 11, 2022
RM Operation can equivalently convert ResNet to VGG, which is better for pruning; and can help RepVGG perform better when the depth is large.

RMNet: Equivalently Removing Residual Connection from Networks This repository is the official implementation of "RMNet: Equivalently Removing Residua

184 Jan 04, 2023
Multi-modal Vision Transformers Excel at Class-agnostic Object Detection

Multi-modal Vision Transformers Excel at Class-agnostic Object Detection

Muhammad Maaz 206 Jan 04, 2023
PyTorch implementation of SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching

SMODICE: Versatile Offline Imitation Learning via State Occupancy Matching This is the official PyTorch implementation of SMODICE: Versatile Offline I

Jason Ma 14 Aug 30, 2022
Cascaded Pyramid Network (CPN) based on Keras (Tensorflow backend)

ML2 Takehome Project Reimplementing the paper: Cascaded Pyramid Network for Multi-Person Pose Estimation Dataset The model uses the COCO dataset which

Vo Van Tu 1 Nov 22, 2021
A simple, clean TensorFlow implementation of Generative Adversarial Networks with a focus on modeling illustrations.

IllustrationGAN A simple, clean TensorFlow implementation of Generative Adversarial Networks with a focus on modeling illustrations. Generated Images

268 Nov 27, 2022
Combining Diverse Feature Priors

Combining Diverse Feature Priors This repository contains code for reproducing the results of our paper. Paper: https://arxiv.org/abs/2110.08220 Blog

Madry Lab 5 Nov 12, 2022
3D-Transformer: Molecular Representation with Transformer in 3D Space

3D-Transformer: Molecular Representation with Transformer in 3D Space

55 Dec 19, 2022
CCPD: a diverse and well-annotated dataset for license plate detection and recognition

CCPD (Chinese City Parking Dataset, ECCV) UPdate on 10/03/2019. CCPD Dataset is now updated. We are confident that images in subsets of CCPD is much m

detectRecog 1.8k Dec 30, 2022
Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)

DocFormer - PyTorch Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for t

171 Jan 06, 2023
YOLOv3 in PyTorch > ONNX > CoreML > TFLite

This repository represents Ultralytics open-source research into future object detection methods, and incorporates lessons learned and best practices

Ultralytics 9.3k Jan 07, 2023
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

Ryuichi Yamamoto 1.8k Jan 08, 2023
COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset

COPA-SSE Repository for COPA-SSE: Semi-Structured Explanations for Commonsense Reasoning. COPA-SSE contains crowdsourced explanations for the Balanced

Ana Brassard 5 Jul 31, 2022
DumpSMBShare - A script to dump files and folders remotely from a Windows SMB share

DumpSMBShare A script to dump files and folders remotely from a Windows SMB shar

Podalirius 178 Jan 06, 2023
This is the repo of the manuscript "Dual-branch Attention-In-Attention Transformer for speech enhancement"

DB-AIAT: A Dual-branch attention-in-attention transformer for single-channel SE

Guochen Yu 68 Dec 16, 2022
DLL: Direct Lidar Localization

DLL: Direct Lidar Localization Summary This package presents DLL, a direct map-based localization technique using 3D LIDAR for its application to aeri

Service Robotics Lab 127 Dec 16, 2022