Unimodal Face Classification with Multimodal Training

This is a PyTorch implementation of the following paper:

Unimodal Face Classification with Multimodal Training

Wenbin Teng (Boston University), Chongyang Bai (Dartmouth College)

Abstract: We propose a Multimodal Training Unimodal Test (MTUT) framework for robust face classification, which exploits the cross-modality relationship during training and uses it to complement the imperfect single-modality input during testing. Technically, during training, the framework (1) builds both intra-modality and cross-modality autoencoders with the aid of facial attributes to learn latent embeddings as multimodal descriptors, and (2) introduces a novel multimodal embedding divergence loss to align the heterogeneous features from different modalities, which also adaptively prevents a useless modality (if any) from confusing the model. This way, the learned autoencoders can generate robust embeddings for single-modality face classification at test time. We evaluate our framework on two face classification datasets and two kinds of testing input: (1) poor-condition images and (2) point clouds or 3D face meshes, when both 2D and 3D modalities are available for training.

The proposed method applies a 2D encoder and a 3D encoder to extract an embedding from each modality. The divergence between the two embeddings is minimized adaptively, with the classification loss of each modality used to measure how strongly the embeddings should be aligned. Depending on the testing modality, the corresponding decoder reconstructs the 2D and 3D inputs from the feature embeddings. An overview of the proposed network is shown in the following picture:

[Figure: overview of the proposed network]
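
The adaptive alignment described above can be illustrated with a small sketch. It is framework-free Python rather than the repository's PyTorch code, and the weighting rule (an exponential of the gap between the two classification losses, controlled by a hypothetical `beta` parameter) is an assumption chosen for illustration, not the formula from the paper. All function names are hypothetical. The idea shown is only that the modality that currently classifies worse is pulled toward the better one, while the better one is left alone.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings given as lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_weight(loss_self, loss_other, beta=2.0):
    """Weight for pulling this modality toward the other one.

    Assumption: the weight is positive only when this modality has the
    larger classification loss, and it grows with the size of the gap.
    """
    gap = loss_self - loss_other
    return math.exp(beta * gap) - 1.0 if gap > 0 else 0.0


def divergence_loss(emb_2d, emb_3d, cls_loss_2d, cls_loss_3d, beta=2.0):
    """Return the (2D, 3D) divergence terms for one sample.

    In a real training loop the other modality's embedding would be
    treated as a fixed target when computing each term.
    """
    dist = squared_distance(emb_2d, emb_3d)
    w_2d = adaptive_weight(cls_loss_2d, cls_loss_3d, beta)
    w_3d = adaptive_weight(cls_loss_3d, cls_loss_2d, beta)
    return w_2d * dist, w_3d * dist
```

For example, calling `divergence_loss([1.0, 0.0], [0.0, 0.0], 0.9, 0.4)` gives a positive term for the 2D branch and zero for the 3D branch, since the 2D classifier is the weaker one in this case.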
