Official implementation of the CVPR 2021 paper “Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting”

Last update: Dec 08, 2022

Overview

RGBT Crowd Counting

Lingbo Liu, Jiaqi Chen, Hefeng Wu, Guanbin Li, Chenglong Li, Liang Lin. "Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021. [PDF]

Download RGBT-CC Dataset & Models: [Dropbox][BaiduYun (PW: RGBT)]

Our framework can be implemented with various backbone networks. You can refer to this page for implementing BL+IADM. Moreover, the proposed framework can also be applied to RGBD crowd counting and the implementation of CSRNet+IADM is available.

If you use this code and benchmark for your research, please cite our work:

@inproceedings{liu2021cross,
  title={Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting},
  author={Liu, Lingbo and Chen, Jiaqi and Wu, Hefeng and Li, Guanbin and Li, Chenglong and Lin, Liang},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Introduction

Crowd counting is a fundamental yet challenging task that requires rich information to generate pixel-wise crowd density maps. In this work, we find that incorporating optical and thermal information can greatly help to recognize pedestrians. To promote future research in this field, we introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to facilitate multimodal crowd counting, we propose a cross-modal collaborative representation learning framework, which consists of multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) that fully captures the complementary information of different modalities. Specifically, our IADM incorporates two collaborative information transfers to dynamically enhance the modality-shared and modality-specific representations with a dual information propagation mechanism. Extensive experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting.

RGBT-CC Benchmark

To promote future research on this task, we propose a large-scale RGBT Crowd Counting (RGBT-CC) benchmark. Specifically, this benchmark consists of 2,030 pairs of 640x480 RGB-thermal images captured in various scenarios (e.g., malls, streets, playgrounds, train stations, and metro stations). Among these samples, 1,013 pairs are captured in the light and 1,017 pairs in darkness. A total of 138,389 pedestrians are marked with point annotations, an average of 68 people per image. Finally, the RGBT-CC benchmark is randomly divided into three parts: 1,030 pairs for training, 200 pairs for validation, and 800 pairs for testing. Compared with Internet-based datasets that suffer from serious bias, our RGBT-CC dataset has a crowd density distribution closer to that of realistic cities, since our images are captured in urban scenes with various densities. Therefore, our dataset has wider applications for urban crowd analysis.
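Point annotations are commonly turned into a density map by placing a small Gaussian at each annotated head, so that the map sums to the person count. The sketch below illustrates that general idea for a 640x480 image. It is not taken from this repository: the function name, the `(x, y)` point layout, and the value of `sigma` are assumptions made for illustration.

```python
import numpy as np


def density_map_from_points(points, height=480, width=640, sigma=4.0):
    """Build a density map whose sum equals the number of annotated points.

    points: iterable of (x, y) pixel coordinates (hypothetical layout).
    sigma:  Gaussian spread in pixels (illustrative value, not from the paper).
    Assumes every point lies inside the image bounds.
    """
    ys = np.arange(height, dtype=np.float64)
    xs = np.arange(width, dtype=np.float64)
    density = np.zeros((height, width), dtype=np.float64)
    for x, y in points:
        # Separable 2D Gaussian centred on the annotated point.
        gx = np.exp(-((xs - x) ** 2) / (2.0 * sigma ** 2))
        gy = np.exp(-((ys - y) ** 2) / (2.0 * sigma ** 2))
        kernel = np.outer(gy, gx)
        # Renormalise inside the image so each person contributes exactly 1,
        # even when the Gaussian is clipped by the image border.
        density += kernel / kernel.sum()
    return density


# Three made-up annotations, including one in the bottom-right corner.
points = [(100.5, 50.0), (320.0, 240.0), (639.0, 479.0)]
density = density_map_from_points(points)
estimated_count = density.sum()
```

Summing the map recovers the number of annotated points, which is why a network trained to regress such maps can report a count by integrating its output.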

Method

The proposed RGBT crowd counting framework is composed of three parallel backbones and an Information Aggregation-Distribution Module (IADM). Specifically, the top and bottom backbones are developed for modality-specific (i.e., RGB and thermal) representation learning, while the middle backbone is designed for modality-shared representation learning. To fully exploit the multimodal complementarities, our IADM dynamically transfers information between the specific and shared branches to collaboratively enhance the modality-specific and modality-shared representations. Consequently, the final modality-shared feature contains comprehensive information and facilitates generating high-quality crowd density maps.
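The two-way information flow described above can be pictured with a loose NumPy sketch: first the shared feature aggregates gated residuals from the RGB and thermal features, then the enhanced shared feature distributes gated residuals back to each modality. This is only a conceptual illustration and not the authors' IADM. The sigmoid gate, the single channel-mixing matrix per transfer (standing in for a learned 1x1 convolution), and all function and key names are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_residual(residual, weight):
    """Scale a residual feature by a gate in (0, 1) computed from the residual.

    residual: array of shape (channels, height, width).
    weight:   (channels, channels) matrix, a stand-in for a learned 1x1 conv.
    """
    gate = sigmoid(np.einsum("oc,chw->ohw", weight, residual))
    return gate * residual


def aggregation_distribution(f_rgb, f_thermal, f_shared, weights):
    # Aggregation: the shared feature absorbs what each modality has
    # that the shared feature currently lacks.
    shared = (
        f_shared
        + gated_residual(f_rgb - f_shared, weights["rgb_to_shared"])
        + gated_residual(f_thermal - f_shared, weights["thermal_to_shared"])
    )
    # Distribution: the enhanced shared feature is passed back to refine
    # each modality-specific feature.
    rgb = f_rgb + gated_residual(shared - f_rgb, weights["shared_to_rgb"])
    thermal = f_thermal + gated_residual(
        shared - f_thermal, weights["shared_to_thermal"]
    )
    return rgb, thermal, shared


rng = np.random.default_rng(0)
channels, height, width = 4, 6, 8
f_rgb = rng.normal(size=(channels, height, width))
f_thermal = rng.normal(size=(channels, height, width))
f_shared = rng.normal(size=(channels, height, width))
# Random matrices in place of trained parameters.
weights = {
    name: rng.normal(size=(channels, channels))
    for name in (
        "rgb_to_shared",
        "thermal_to_shared",
        "shared_to_rgb",
        "shared_to_thermal",
    )
}
rgb_out, thermal_out, shared_out = aggregation_distribution(
    f_rgb, f_thermal, f_shared, weights
)
```

In this sketch, when the three input features are identical every residual is zero and the features pass through unchanged, so information is transferred only where the branches disagree.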

Experiments

More References

Crowd Counting with Deep Structured Scale Integration Network, ICCV 2019 [PDF]

Crowd Counting using Deep Recurrent Spatial-Aware Network, IJCAI 2018 [PDF]

Efficient Crowd Counting via Structured Knowledge Transfer, ACM MM 2020 [PDF]
