Code for 'Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning', ICCV 2021

Last update: Nov 17, 2022

CMIC-Retrieval

Code for "Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning", ICCV 2021.

Introduction

In this work, we tackle the problem of single image-based 3D shape retrieval (IBSR), where we seek the best-matching shape for a given 2D image from a shape repository. Most existing works learn to embed 2D images and 3D shapes into a common feature space and perform metric learning with a triplet loss. Inspired by the great success of recent contrastive learning methods for self-supervised representation learning, we propose a novel IBSR pipeline that leverages contrastive learning. We note that adopting such cross-modal contrastive learning between 2D images and 3D shapes for IBSR is non-trivial and challenging: contrastive learning requires very strong data augmentation when constructing positive pairs in order to learn feature invariance, whereas traditional metric learning works do not have this requirement. Moreover, object shape and appearance are entangled in 2D query images, which makes the learning task more difficult than contrasting single-modal data. To mitigate these challenges, we propose to use multi-view grayscale rendered images of the 3D shapes as the shape representation. We then introduce a strong data augmentation technique based on color transfer, which can change the appearance of the query image significantly but naturally, effectively satisfying the needs of contrastive learning. Finally, in addition to the classic instance-level contrastive loss, we incorporate a novel category-level contrastive loss that helps distinguish similar objects from different categories. Our experiments demonstrate that our approach achieves the best performance on all three popular IBSR benchmarks (Pix3D, Stanford Cars, and Comp Cars), outperforming the previous state of the art by 4% to 15% in retrieval accuracy.
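The Introduction pairs an instance-level contrastive loss with a category-level one between image and shape embeddings. Below is a minimal, dependency-free Python sketch of one plausible form of each, written under stated assumptions. It assumes L2-normalized embeddings and an InfoNCE-style softmax over the batch with a temperature `tau`. It further assumes a category term in the spirit of supervised contrastive learning, where every same-category shape in the batch counts as a positive. `shape_feats` stands in for embeddings computed from the multi-view grayscale renderings. The function names, the value of `tau`, and the exact category formulation are hypothetical illustrations, not the code shipped in this repository.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (guard against the zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _logits_and_log_denom(query, shapes, tau):
    # Cosine-similarity logits of one image against every shape in the batch,
    # plus a numerically stable log-sum-exp of those logits.
    logits = [dot(query, s) / tau for s in shapes]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits, log_denom


def instance_contrastive_loss(img_feats, shape_feats, tau=0.1):
    """Hypothetical cross-modal InfoNCE: image i's only positive is shape i."""
    imgs = [l2_normalize(v) for v in img_feats]
    shapes = [l2_normalize(v) for v in shape_feats]
    total = 0.0
    for i, q in enumerate(imgs):
        logits, log_denom = _logits_and_log_denom(q, shapes, tau)
        total += log_denom - logits[i]  # -log softmax of the matching shape
    return total / len(imgs)


def category_contrastive_loss(img_feats, shape_feats, labels, tau=0.1):
    """Hypothetical category-level loss: every shape sharing image i's label is a positive."""
    imgs = [l2_normalize(v) for v in img_feats]
    shapes = [l2_normalize(v) for v in shape_feats]
    total = 0.0
    for i, q in enumerate(imgs):
        logits, log_denom = _logits_and_log_denom(q, shapes, tau)
        # Index i always matches its own label, so this list is never empty.
        positives = [j for j, y in enumerate(labels) if y == labels[i]]
        total += sum(log_denom - logits[j] for j in positives) / len(positives)
    return total / len(imgs)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]   # shape i matches image i
    swapped = [[0.0, 1.0], [1.0, 0.0]]   # pairs deliberately mismatched
    print(instance_contrastive_loss(images, aligned))
    print(instance_contrastive_loss(images, swapped))
    print(category_contrastive_loss(images, aligned, labels=[0, 1]))
```

With distinct labels for every sample, the category term reduces to the instance term. Shared labels spread the positive mass over all same-category shapes, which pulls objects of one category together while still pushing other categories apart. A real training loop would compute both terms on batched tensors (for example with PyTorch) and sum them with weights. This sketch uses plain lists only so that it runs without dependencies.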

About this repository

This repository provides data, pre-trained models and code.

Citations

@inProceedings{lin2021cmic,
	title={Single Image 3D Shape Retrieval via Cross-Modal Instance and Category Contrastive Learning},
	author={Lin, Ming-Xian and Yang, Jie and Wang, He and Lai, Yu-Kun and Jia, Rongfei and Zhao, Binqiang and Gao, Lin},
	year={2021},
	booktitle={International Conference on Computer Vision (ICCV)}
}

Updates

[Oct 1, 2021] Preliminary version of the data and code released. More code and data are coming soon; please follow our updates.

