A CROSS-MODAL FUSION NETWORK BASED ON SELF-ATTENTION AND RESIDUAL STRUCTURE FOR MULTIMODAL EMOTION RECOGNITION

Related tags

Deep LearningCFN-SR
Overview

CFN-SR

A CROSS-MODAL FUSION NETWORK BASED ON SELF-ATTENTION AND RESIDUAL STRUCTURE FOR MULTIMODAL EMOTION RECOGNITION

The audio-video based multimodal emotion recognition has attracted a lot of attention due to its robust performance. Most of the existing methods focus on proposing different cross-modal fusion strategies. However, these strategies introduce redundancy in the features of different modalities without fully considering the complementary properties between modal information, and these approaches do not guarantee the non-loss of original semantic information during intra- and inter-modal interactions. In this paper, we propose a novel cross-modal fusion network based on self-attention and residual structure (CFN-SR) for multimodal emotion recognition. Firstly, we perform representation learning for audio and video modalities to obtain the semantic features of the two modalities by efficient ResNeXt and 1D CNN, respectively. Secondly, we feed the features of the two modalities into the cross-modal blocks separately to ensure efficient complementarity and completeness of information through the self-attention mechanism and residual structure. Finally, we obtain the output of emotions by splicing the obtained fused representation with the original representation. To verify the effectiveness of the proposed method, we conduct experiments on the RAVDESS dataset. The experimental results show that the proposed CFN-SR achieves the state-of-the-art and obtains 75.76% accuracy with 26.30M parameters.

image-20211007154526694

Setup

Install dependencies

pip install opencv-python moviepy librosa sklearn

Download the RAVDESS dataset using the bash script

bash scripts/download_ravdess.sh <path/to/RAVDESS>

Or download the files manually

and follow the folder structure below and have .csv files in landmarks/ (do not modify file names)

RAVDESS/
    landmarks/
        .csv landmark files
    Actor_01/
    ...
    Actor_24/

Preprocess the dataset using the following

python dataset_prep.py --datadir <path/to/RAVDESS>

Generated folder structure (do not modify file names)

RAVDESS/
    landmarks/
        .csv landmark files
    Actor_01/
    ...
    Actor_24/
    preprocessed/
        Actor_01/
        ...
        Actor_24/
            01-01-01-01-01-01-24.mp4/
                frames/
                    .jpg frames
                audios/
                    .wav raw audio
                    .npy MFCC features
            ...

Download checkpoints folder from Google Drive. The following script downloads all pretrained models (unimodal and MSAF) for all 6 folds.

bash scripts/download_checkpoints.sh

Train

python main_msaf.py --datadir <path/to/RAVDESS/preprocessed> --checkpointdir checkpoints --train

All parameters

usage: main_msaf.py [-h] [--datadir DATADIR] [--k_fold K_FOLD] [--lr LR]
                    [--batch_size BATCH_SIZE] [--num_workers NUM_WORKERS]
                    [--epochs EPOCHS] [--checkpointdir CHECKPOINTDIR] [--no_verbose]
                    [--log_interval LOG_INTERVAL] [--no_save] [--train]

Result

Model Fusion Stage Accuracy #Params
Averaging Late 68.82 25.92M
Multiplicative Late 70.35 25.92M
Multiplication Late 70.56 25.92M
Concat + FC Early 71.04 26.87M
MCBP Early 71.32 51.03M
MMTM Model 73.12 31.97M
MSAF Model 74.86 25.94M
ERANNs Model 74.80
CFN-SR(Ours) Model 75.76 26.30M

Reference

  • Note that some codes references MSAF
Owner
skeleton
skeleton
Dynamic hair modeling from monocular videos using deep neural networks

Dynamic Hair Modeling The source code of the networks for our paper "Dynamic hair modeling from monocular videos using deep neural networks" (SIGGRAPH

53 Oct 18, 2022
AISTATS 2019: Confidence-based Graph Convolutional Networks for Semi-Supervised Learning

Confidence-based Graph Convolutional Networks for Semi-Supervised Learning Source code for AISTATS 2019 paper: Confidence-based Graph Convolutional Ne

MALL Lab (IISc) 56 Dec 03, 2022
Unofficial Implementation of RobustSTL: A Robust Seasonal-Trend Decomposition Algorithm for Long Time Series (AAAI 2019)

RobustSTL: A Robust Seasonal-Trend Decomposition Algorithm for Long Time Series (AAAI 2019) This repository contains python (3.5.2) implementation of

Doyup Lee 222 Dec 21, 2022
Yolo ros - YOLO-ROS for HUAWEI ATLAS200

YOLO-ROS YOLO-ROS for NVIDIA YOLO-ROS for HUAWEI ATLAS200, please checkout for b

ChrisLiu 5 Oct 18, 2022
MAVE: : A Product Dataset for Multi-source Attribute Value Extraction

MAVE: : A Product Dataset for Multi-source Attribute Value Extraction The dataset contains 3 million attribute-value annotations across 1257 unique ca

Google Research Datasets 89 Jan 08, 2023
Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning

structshot Code and data for paper "Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning", Yi Yang and Arz

ASAPP Research 47 Dec 27, 2022
Implementation of Self-supervised Graph-level Representation Learning with Local and Global Structure (ICML 2021).

Self-supervised Graph-level Representation Learning with Local and Global Structure Introduction This project is an implementation of ``Self-supervise

MilaGraph 50 Dec 09, 2022
Measuring Coding Challenge Competence With APPS

Measuring Coding Challenge Competence With APPS This is the repository for Measuring Coding Challenge Competence With APPS by Dan Hendrycks*, Steven B

Dan Hendrycks 218 Dec 27, 2022
LocUNet is a deep learning method to localize a UE based solely on the reported signal strengths from a set of BSs.

LocUNet LocUNet is a deep learning method to localize a UE based solely on the reported signal strengths from a set of BSs. The method utilizes accura

4 Oct 05, 2022
A PyTorch implementation of a Factorization Machine module in cython.

fmpytorch A library for factorization machines in pytorch. A factorization machine is like a linear model, except multiplicative interaction terms bet

Jack Hessel 167 Jul 06, 2022
QRec: A Python Framework for quick implementation of recommender systems (TensorFlow Based)

Introduction QRec is a Python framework for recommender systems (Supported by Python 3.7.4 and Tensorflow 1.14+) in which a number of influential and

Yu 1.4k Dec 30, 2022
Edge Restoration Quality Assessment

ERQA - Edge Restoration Quality Assessment ERQA - a full-reference quality metric designed to analyze how good image and video restoration methods (SR

MSU Video Group 27 Dec 17, 2022
General Virtual Sketching Framework for Vector Line Art (SIGGRAPH 2021)

General Virtual Sketching Framework for Vector Line Art - SIGGRAPH 2021 Paper | Project Page Outline Dependencies Testing with Trained Weights Trainin

Haoran MO 118 Dec 27, 2022
African language Speech Recognition - Speech-to-Text

Swahili-Speech-To-Text Table of Contents Swahili-Speech-To-Text Overview Scenario Approach Project Structure data: models: notebooks: scripts tests: l

2 Jan 05, 2023
Implementation of the 😇 Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones

HaloNet - Pytorch Implementation of the Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones. This re

Phil Wang 189 Nov 22, 2022
VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning

    VarCLR: Variable Representation Pre-training via Contrastive Learning New: Paper accepted by ICSE 2022. Preprint at arXiv! This repository contain

squaresLab 32 Oct 24, 2022
ResNEsts and DenseNEsts: Block-based DNN Models with Improved Representation Guarantees

ResNEsts and DenseNEsts: Block-based DNN Models with Improved Representation Guarantees This repository is the official implementation of the empirica

Kuan-Lin (Jason) Chen 2 Oct 02, 2022
PlenOctree Extraction algorithm

PlenOctrees_NeRF-SH This is an implementation of the Paper PlenOctrees for Real-time Rendering of Neural Radiance Fields. Not only the code provides t

49 Nov 05, 2022
Code for testing convergence rates of Lipschitz learning on graphs

📈 LipschitzLearningRates The code in this repository reproduces the experimental results on convergence rates for k-nearest neighbor graph infinity L

2 Dec 20, 2021
Photo2cartoon - 人像卡通化探索项目 (photo-to-cartoon translation project)

人像卡通化 (Photo to Cartoon) 中文版 | English Version 该项目为小视科技卡通肖像探索项目。您可使用微信扫描下方二维码或搜索“AI卡通秀”小程序体验卡通化效果。

Minivision_AI 3.5k Dec 30, 2022