PyTorch implementation of SIFT descriptor

Last update: Dec 24, 2022

Overview

This is an differentiable pytorch implementation of SIFT patch descriptor. It is very slow for describing one patch, but quite fast for batch. It can be used for descriptop-based learning shape of affine feature.

UPD 08/2019 : pytorch-sift is added to kornia and available by kornia.features.SIFTDescriptor

There are different implementations of the SIFT on the web. I tried to match Michal Perdoch implementation, which gives high quality features for image retrieval CVPR2009. However, on planar datasets, it is inferior to vlfeat implementation. The main difference is gaussian weighting window parameters, so I have made a vlfeat-like version too. MP version weights patch center much more (see image below, left) and additionally crops everything outside the circular region. Right is vlfeat version

descriptor_mp_mode = SIFTNet(patch_size = 65,
                        sigma_type= 'hesamp',
                        masktype='CircularGauss')

descriptor_vlfeat_mode = SIFTNet(patch_size = 65,
                        sigma_type= 'vlfeat',
                        masktype='Gauss')

Results:

OPENCV-SIFT - mAP 
   Easy     Hard      Tough     mean
-------  -------  ---------  -------
0.47788  0.20997  0.0967711  0.26154

VLFeat-SIFT - mAP 
    Easy      Hard      Tough      mean
--------  --------  ---------  --------
0.466584  0.203966  0.0935743  0.254708

PYTORCH-SIFT-VLFEAT-65 - mAP 
    Easy      Hard      Tough      mean
--------  --------  ---------  --------
0.472563  0.202458  0.0910371  0.255353

NUMPY-SIFT-VLFEAT-65 - mAP 
    Easy      Hard      Tough      mean
--------  --------  ---------  --------
0.449431  0.197918  0.0905395  0.245963

PYTORCH-SIFT-MP-65 - mAP 
    Easy      Hard      Tough      mean
--------  --------  ---------  --------
0.430887  0.184834  0.0832707  0.232997

NUMPY-SIFT-MP-65 - mAP 
    Easy     Hard      Tough      mean
--------  -------  ---------  --------
0.417296  0.18114  0.0820582  0.226832

Speed:

0.00246 s per 65x65 patch - numpy SIFT
0.00028 s per 65x65 patch - C++ SIFT
0.00074 s per 65x65 patch - CPU, 256 patches per batch
0.00038 s per 65x65 patch - GPU (GM940, mobile), 256 patches per batch
0.00038 s per 65x65 patch - GPU (GM940, mobile), 256 patches per batch

If you use this code for academic purposes, please cite the following paper:

@InProceedings{AffNet2018,
    title = {Repeatability Is Not Enough: Learning Affine Regions via Discriminability},
    author = {Dmytro Mishkin, Filip Radenovic, Jiri Matas},
    booktitle = {Proceedings of ECCV},
    year = 2018,
    month = sep
}

PyTorch implementation of SIFT descriptor

Related tags

Overview

Owner

Dmytro Mishkin

The implementation code for "DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruction"

Hyperbolic Hierarchical Clustering.

Official NumPy Implementation of Deep Networks from the Principle of Rate Reduction (2021)

The mini-AlphaStar (mini-AS, or mAS) - mini-scale version (non-official) of the AlphaStar (AS)

UniFormer - official implementation of UniFormer

Breaking Shortcut: Exploring Fully Convolutional Cycle-Consistency for Video Correspondence Learning

Semiconductor Machine learning project

Modeling CNN layers activity with Gaussian mixture model

Customised to detect objects automatically by a given model file(onnx)

Yolo ros - YOLO-ROS for HUAWEI ATLAS200

A supplementary code for Editable Neural Networks, an ICLR 2020 submission.

ML for NLP and Computer Vision.

GAN-based 3D human pose estimation model for 3DV'17 paper

High performance distributed framework for training deep learning recommendation models based on PyTorch.

ISNAS-DIP: Image Specific Neural Architecture Search for Deep Image Prior [CVPR 2022]

This is the repository for the NeurIPS-21 paper [Contrastive Graph Poisson Networks: Semi-Supervised Learning with Extremely Limited Labels].

Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"

Python scripts form performing stereo depth estimation using the HITNET model in Tensorflow Lite.

The code succinctly shows how our ensemble learning based on deep learning CNN is used for LAM-avulsion-diagnosis.

LF-YOLO (Lighter and Faster YOLO) is used to detect defect of X-ray weld image.