Rich Semantics Improve Few-Shot Learning

Paper Link

Abstract:

Human learning benefits from multi-modal inputs that often appear as rich semantics (e.g., a description of an object's attributes while learning about it). This enables us to learn generalizable concepts from very limited visual examples. However, current few-shot learning (FSL) methods use numerical class labels to denote object classes, which do not provide rich semantic meanings about the learned concepts. In this work, we show that by using 'class-level' language descriptions, which can be acquired with minimal annotation cost, we can improve FSL performance. Given a support set and queries, our main idea is to create a bottleneck visual feature (hybrid prototype), which is then used to generate language descriptions of the classes as an auxiliary task during training. We develop a Transformer-based forward and backward encoding mechanism to relate visual and semantic tokens that can encode intricate relationships between the two modalities. Forcing the prototypes to retain semantic information about the class descriptions acts as a regularizer on the visual features, improving their generalization to novel classes at inference. Furthermore, this strategy imposes a human prior on the learned representations, ensuring that the model is faithfully relating visual and semantic concepts, thereby improving model interpretability. Our experiments on four datasets and ablation studies show the benefit of effectively modeling rich semantics for FSL.
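To make the "support set and queries" setup concrete, here is a minimal sketch of a generic prototype-style few-shot episode. It is an assumption-laden illustration, not the paper's method: it uses plain mean prototypes with nearest-prototype classification rather than the hybrid prototype, and it omits the Transformer-based description generation entirely. The function names `class_prototypes` and `classify_queries` and the toy 2-D features are hypothetical.

```python
import numpy as np


def class_prototypes(support_feats, support_labels, n_classes):
    """Mean support feature per class.

    Assumption: a plain mean prototype, NOT the paper's hybrid prototype.
    """
    return np.stack(
        [support_feats[support_labels == c].mean(axis=0) for c in range(n_classes)]
    )


def classify_queries(query_feats, prototypes):
    """Assign each query to the class with the nearest prototype
    (squared Euclidean distance)."""
    diffs = query_feats[:, None, :] - prototypes[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Toy 2-way 2-shot episode with hand-made 2-D "features" (hypothetical data).
support_feats = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]])
support_labels = np.array([0, 0, 1, 1])
query_feats = np.array([[0.1, 0.1], [0.9, 1.0]])

prototypes = class_prototypes(support_feats, support_labels, n_classes=2)
predictions = classify_queries(query_feats, prototypes)
print(predictions)
```

In the approach described in the abstract, the prototype would additionally be trained to generate the class description as an auxiliary task, which is what regularizes the visual features; that part is not shown here.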

Citation

@article{rsfsl,
  title={Rich Semantics Improve Few-shot Learning},
  author={Afham, Mohamed and Khan, Salman and Khan, Muhammad Haris and Naseer, Muzammal and Khan, Fahad Shahbaz},
  journal={32nd British Machine Vision Conference},
  year={2021}
}

Class-level descriptions of the miniImageNet dataset are available in miniImageNet_descriptions. Some qualitative examples are shown below.

Official implementation of Rich Semantics Improve Few-Shot Learning (BMVC, 2021)

Code will be released soon!

Owner

Mohamed Afham
