Joint Versus Independent Multiview Hashing for Cross-View Retrieval[J] (IEEE TCYB 2021, PyTorch Code)

2021-IEEE TCYB-DCHN

Peng Hu, Xi Peng, Hongyuan Zhu, Jie Lin, Liangli Zhen, Dezhong Peng, Joint Versus Independent Multiview Hashing for Cross-View Retrieval[J]. IEEE Transactions on Cybernetics, vol. 51, no. 10, pp. 4982-4993, Oct. 2021. (PyTorch Code)

Abstract

Thanks to the low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, thus making it difficult to handle the data with increasing views or a large number of views. To overcome these difficulties, we propose a decoupled CVH network (DCHN) approach which consists of a semantic hashing autoencoder module (SHAM) and multiple multiview hashing networks (MHNs). To be specific, SHAM adopts a hashing encoder and decoder to learn a discriminative Hamming space using either a few labels or the number of classes, that is, the so-called flexible inputs. After that, MHN independently projects all samples into the discriminative Hamming space that is treated as an alternative ground truth. In brief, the Hamming space is learned from the semantic space induced from the flexible inputs, which is further used to guide view-specific hashing in an independent fashion. Thanks to such an independent/decoupled paradigm, our method could enjoy high computational efficiency and the capacity of handling the increasing number of views by only using a few labels or the number of classes. For a newly coming view, we only need to add a view-specific network into our model and avoid retraining the entire model using the new and previous views. Extensive experiments are carried out on five widely used multiview databases compared with 15 state-of-the-art approaches. The results show that the proposed independent hashing paradigm is superior to the common joint ones while enjoying high efficiency and the capacity of handling newly coming views.

Framework

DCHN

Figure 1. Framework of the proposed DCHN method. g is the output of the corresponding view (e.g., image, text, or video). o is the semantic hash code computed from the corresponding label y and the semantic hashing transformation W, which is learned by the proposed semantic hashing autoencoder module (SHAM). sgn is the elementwise sign function. ℒ_R and ℒ_H are the hash reconstruction and semantic hashing functions, respectively. In the training stage, W is first used to recast the label y as a ground-truth hash code o. The resulting hash code then guides the view-specific networks through a semantic hashing reconstruction regularizer. Because the v view-specific neural networks (one per view) are decoupled and share no trainable parameters, they can be trained separately, so DCHN scales easily to a large number of views. In the inference stage, each trained view-specific network f_k(x_k, Θ_k) computes the hash code of the sample x_k.
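The inference stage described in the caption can be illustrated with a small, dependency-free sketch: real-valued network outputs are binarized with an elementwise sign, and items from another view are ranked by Hamming distance. This is not the repository's code. The toy output vectors, the helper names, and the rule that zero maps to +1 are all assumptions made for illustration.

```python
from typing import List, Sequence


def sgn(values: Sequence[float]) -> List[int]:
    """Elementwise sign, mapping each real output to a bit in {-1, +1}.

    Zero is mapped to +1 here; that tie-breaking rule is an assumption.
    """
    return [1 if v >= 0 else -1 for v in values]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Database indices sorted from nearest to farthest (stable on ties)."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Toy real-valued outputs standing in for two trained view-specific networks.
image_outputs = [[0.9, -0.2, 0.4, -0.7], [-0.8, 0.3, -0.1, 0.6]]
text_outputs = [[-0.5, 0.1, -0.3, 0.9],
                [0.7, -0.6, 0.2, -0.4],
                [0.6, 0.5, 0.3, -0.2]]

image_codes = [sgn(g) for g in image_outputs]
text_codes = [sgn(g) for g in text_outputs]

# Image -> Text retrieval: rank all text codes for the first image query.
ranking = rank_by_hamming(image_codes[0], text_codes)
print(ranking)
```

Because every view is projected into the same Hamming space, the same ranking function serves Image → Text and Text → Image queries alike.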

SHAM

Figure 2. The proposed SHAM uses semantic information (e.g., labels or classes) to learn an encoder W and a decoder W^T by mutually converting between the semantic and Hamming spaces. SHAM is a key component of our independent hashing paradigm.
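As a rough illustration of the encoder/decoder relationship, the sketch below encodes a one-hot label into a hash code with o = sgn(W y) and maps the code back to class scores with W^T o. In SHAM the matrix W is learned; here it is hand-picked with mutually orthogonal ±1 columns purely so the round trip is easy to follow, and the training objective is not reproduced. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def sgn(values: List[float]) -> List[float]:
    """Elementwise sign; zero is mapped to +1 (an assumption)."""
    return [1.0 if v >= 0 else -1.0 for v in values]


# Hand-picked 4-bit x 3-class matrix (NOT learned): its columns are
# mutually orthogonal +/-1 vectors, one per class.
W: Matrix = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
]


def encode(label: List[float]) -> List[float]:
    """Semantic space -> Hamming space: o = sgn(W y)."""
    return sgn(matvec(W, label))


def decode(code: List[float]) -> List[float]:
    """Hamming space -> semantic space: class scores W^T o."""
    return matvec(transpose(W), code)


y = [0.0, 1.0, 0.0]          # one-hot label for class 1
o = encode(y)                # 4-bit hash code
scores = decode(o)           # largest score marks the recovered class
recovered = scores.index(max(scores))
print(o, scores, recovered)
```

With a learned W the reconstruction would only be approximate; the orthogonal columns used here make it exact for one-hot labels.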

Usage

First, to train SHAM with 64 bits on MIRFLICKR-25K, run trainSHAM.py as follows:

python trainSHAM.py --datasets mirflickr25k --output_shape 64 --gama 1 --available_num 100

Then, to train a model for the image modality with 64 bits on MIRFLICKR-25K, run main_DCHN.py as follows:

python main_DCHN.py --mode train --epochs 100 --view 0 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --gpu_id 0

For text modality:

python main_DCHN.py --mode train --epochs 100 --view 1 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --gpu_id 1
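Since each view is trained independently, the per-view training commands differ only in --view and --gpu_id. The helper below is a hypothetical convenience, not part of the repository: it assembles the same command lines shown above, using only the flags that appear in them, and assumes one GPU per view.

```python
from typing import List


def build_train_command(view: int, gpu_id: int,
                        dataset: str = "mirflickr25k", bits: int = 64,
                        epochs: int = 100, alpha: float = 0.02,
                        gama: int = 1, available_num: int = 100) -> List[str]:
    """Assemble the argument list for training one view-specific network.

    Only flags that appear in the usage commands above are emitted.
    """
    return [
        "python", "main_DCHN.py",
        "--mode", "train",
        "--epochs", str(epochs),
        "--view", str(view),
        "--datasets", dataset,
        "--output_shape", str(bits),
        "--alpha", str(alpha),
        "--gama", str(gama),
        "--available_num", str(available_num),
        "--gpu_id", str(gpu_id),
    ]


# View 0 is the image modality and view 1 the text modality, as above.
commands = [build_train_command(view=v, gpu_id=v) for v in (0, 1)]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root once SHAM has been trained for the same dataset and code length.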

To evaluate the trained models, run main_DCHN.py as follows:

python main_DCHN.py --mode eval --view -1 --datasets mirflickr25k --output_shape 64 --alpha 0.02 --gama 1 --available_num 100 --num_workers 0

Comparison with the State-of-the-Art

Table 1: Performance comparison in terms of MAP scores on the MIRFLICKR-25K and IAPR TC-12 datasets. The highest MAP score is shown in bold.

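MIRFLICKR-25K (I→T = Image → Text, T→I = Text → Image; the number is the hash code length in bits)

| Method | I→T 16 | I→T 32 | I→T 64 | I→T 128 | T→I 16 | T→I 32 | T→I 64 | T→I 128 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 0.581 | 0.520 | 0.553 | 0.573 | 0.578 | 0.544 | 0.556 | 0.579 |
| SePH [21] | 0.729 | 0.738 | 0.744 | 0.750 | 0.753 | 0.762 | 0.764 | 0.769 |
| SePHlr [12] | 0.729 | 0.746 | 0.754 | 0.763 | 0.760 | 0.780 | 0.785 | 0.793 |
| RoPH [34] | 0.733 | 0.744 | 0.749 | 0.756 | 0.757 | 0.759 | 0.768 | 0.771 |
| LSRH [22] | 0.756 | 0.780 | 0.788 | 0.800 | 0.772 | 0.786 | 0.791 | 0.802 |
| KDLFH [23] | 0.734 | 0.755 | 0.770 | 0.771 | 0.764 | 0.780 | 0.794 | 0.797 |
| DLFH [23] | 0.721 | 0.743 | 0.760 | 0.767 | 0.761 | 0.788 | 0.805 | 0.810 |
| MTFH [13] | 0.581 | 0.571 | 0.645 | 0.543 | 0.584 | 0.556 | 0.633 | 0.531 |
| DJSRH [14] | 0.620 | 0.630 | 0.645 | 0.660 | 0.620 | 0.626 | 0.645 | 0.649 |
| DCMH [9] | 0.737 | 0.754 | 0.763 | 0.771 | 0.753 | 0.760 | 0.763 | 0.770 |
| SSAH [20] | 0.797 | 0.809 | 0.810 | 0.802 | 0.782 | 0.797 | 0.799 | 0.790 |
| DCHN0 | 0.806 | **0.823** | **0.836** | **0.842** | 0.797 | **0.808** | **0.823** | 0.827 |
| DCHN100 | **0.813** | 0.816 | 0.823 | 0.840 | **0.808** | 0.803 | 0.814 | **0.830** |

IAPR TC-12 (I→T = Image → Text, T→I = Text → Image; the number is the hash code length in bits)

| Method | I→T 16 | I→T 32 | I→T 64 | I→T 128 | T→I 16 | T→I 32 | T→I 64 | T→I 128 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 0.329 | 0.292 | 0.309 | 0.298 | 0.332 | 0.295 | 0.311 | 0.304 |
| SePH [21] | 0.467 | 0.476 | 0.486 | 0.493 | 0.463 | 0.475 | 0.485 | 0.492 |
| SePHlr [12] | 0.410 | 0.434 | 0.448 | 0.463 | 0.461 | 0.495 | 0.515 | 0.525 |
| RoPH [34] | 0.457 | 0.481 | 0.493 | 0.500 | 0.451 | 0.478 | 0.488 | 0.495 |
| LSRH [22] | 0.474 | 0.490 | 0.512 | 0.522 | 0.474 | 0.492 | 0.511 | 0.526 |
| KDLFH [23] | 0.306 | 0.314 | 0.351 | 0.357 | 0.307 | 0.315 | 0.350 | 0.356 |
| DLFH [23] | 0.306 | 0.314 | 0.326 | 0.340 | 0.305 | 0.315 | 0.333 | 0.353 |
| MTFH [13] | 0.303 | 0.303 | 0.307 | 0.300 | 0.303 | 0.303 | 0.308 | 0.302 |
| DJSRH [14] | 0.368 | 0.396 | 0.419 | 0.439 | 0.370 | 0.400 | 0.423 | 0.437 |
| DCMH [9] | 0.423 | 0.439 | 0.456 | 0.463 | 0.449 | 0.464 | 0.476 | 0.481 |
| SSAH [20] | 0.501 | 0.503 | 0.496 | 0.479 | 0.504 | 0.530 | 0.554 | 0.565 |
| DCHN0 | 0.487 | 0.492 | 0.550 | 0.573 | 0.481 | 0.488 | 0.543 | 0.567 |
| DCHN100 | **0.533** | **0.558** | **0.582** | **0.596** | **0.527** | **0.557** | **0.582** | **0.595** |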

Table 2: Performance comparison in terms of MAP scores on the NUS-WIDE and MS-COCO datasets. The highest MAP score is shown in bold.

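NUS-WIDE (I→T = Image → Text, T→I = Text → Image; the number is the hash code length in bits)

| Method | I→T 16 | I→T 32 | I→T 64 | I→T 128 | T→I 16 | T→I 32 | T→I 64 | T→I 128 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 0.281 | 0.337 | 0.263 | 0.341 | 0.299 | 0.339 | 0.276 | 0.346 |
| SePH [21] | 0.644 | 0.652 | 0.661 | 0.664 | 0.654 | 0.662 | 0.670 | 0.673 |
| SePHlr [12] | 0.607 | 0.624 | 0.644 | 0.651 | 0.630 | 0.649 | 0.665 | 0.672 |
| RoPH [34] | 0.638 | 0.656 | 0.662 | 0.669 | 0.645 | 0.665 | 0.671 | 0.677 |
| LSRH [22] | 0.622 | 0.650 | 0.659 | 0.690 | 0.600 | 0.662 | 0.685 | 0.692 |
| KDLFH [23] | 0.323 | 0.367 | 0.364 | 0.403 | 0.325 | 0.365 | 0.368 | 0.408 |
| DLFH [23] | 0.316 | 0.367 | 0.381 | 0.404 | 0.319 | 0.379 | 0.386 | 0.415 |
| MTFH [13] | 0.265 | 0.473 | 0.434 | 0.445 | 0.243 | 0.418 | 0.414 | 0.485 |
| DJSRH [14] | 0.433 | 0.453 | 0.467 | 0.442 | 0.457 | 0.468 | 0.468 | 0.501 |
| DCMH [9] | 0.569 | 0.595 | 0.612 | 0.621 | 0.548 | 0.573 | 0.585 | 0.592 |
| SSAH [20] | 0.636 | 0.636 | 0.637 | 0.510 | 0.653 | 0.676 | 0.683 | 0.682 |
| DCHN0 | 0.648 | 0.660 | 0.669 | 0.683 | 0.662 | 0.677 | 0.685 | 0.697 |
| DCHN100 | **0.654** | **0.671** | **0.681** | **0.691** | **0.668** | **0.683** | **0.697** | **0.707** |

MS-COCO (I→T = Image → Text, T→I = Text → Image; the number is the hash code length in bits)

| Method | I→T 16 | I→T 32 | I→T 64 | I→T 128 | T→I 16 | T→I 32 | T→I 64 | T→I 128 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 0.362 | 0.336 | 0.332 | 0.373 | 0.348 | 0.341 | 0.347 | 0.359 |
| SePH [21] | 0.586 | 0.598 | 0.620 | 0.628 | 0.587 | 0.594 | 0.618 | 0.625 |
| SePHlr [12] | 0.527 | 0.571 | 0.592 | 0.600 | 0.555 | 0.596 | 0.618 | 0.621 |
| RoPH [34] | 0.592 | 0.634 | 0.649 | 0.657 | 0.587 | 0.628 | 0.643 | 0.652 |
| LSRH [22] | 0.580 | 0.563 | 0.561 | 0.567 | 0.580 | 0.611 | 0.615 | 0.632 |
| KDLFH [23] | 0.373 | 0.403 | 0.451 | 0.542 | 0.370 | 0.400 | 0.449 | 0.542 |
| DLFH [23] | 0.352 | 0.398 | 0.455 | 0.443 | 0.359 | 0.393 | 0.456 | 0.442 |
| MTFH [13] | 0.288 | 0.264 | 0.311 | 0.413 | 0.301 | 0.284 | 0.310 | 0.406 |
| DJSRH [14] | 0.478 | 0.520 | 0.544 | 0.566 | 0.462 | 0.525 | 0.550 | 0.567 |
| DCMH [9] | 0.548 | 0.575 | 0.607 | 0.625 | 0.568 | 0.595 | 0.643 | 0.664 |
| SSAH [20] | 0.550 | 0.577 | 0.576 | 0.581 | 0.552 | 0.578 | 0.578 | 0.669 |
| DCHN0 | 0.602 | 0.658 | 0.682 | 0.706 | 0.591 | 0.652 | 0.669 | 0.696 |
| DCHN100 | **0.662** | **0.701** | **0.703** | **0.720** | **0.650** | **0.689** | **0.693** | **0.714** |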
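
The MAP scores in Tables 1 and 2 summarize how well a Hamming-distance ranking places relevant items near the top. The following is a minimal sketch of that metric, not the repository's evaluation code. It assumes that a database item is relevant to a query when the two share at least one label, that the whole database is ranked, and that ties in distance are broken by database order; the toy codes and labels are made up.

```python
from typing import List, Sequence, Set


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def average_precision(relevant_flags: Sequence[bool]) -> float:
    """Mean of precision@k taken at every rank k that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_codes: Sequence[Sequence[int]],
                           query_labels: Sequence[Set[int]],
                           db_codes: Sequence[Sequence[int]],
                           db_labels: Sequence[Set[int]]) -> float:
    """MAP over all queries, ranking the database by Hamming distance."""
    aps: List[float] = []
    for q_code, q_label in zip(query_codes, query_labels):
        order = sorted(range(len(db_codes)),
                       key=lambda i: hamming_distance(q_code, db_codes[i]))
        # Assumed relevance rule: query and item share at least one label.
        flags = [bool(q_label & db_labels[i]) for i in order]
        aps.append(average_precision(flags))
    return sum(aps) / len(aps) if aps else 0.0


# Toy data: queries from one view, database items from another.
query_codes = [[1, -1, 1, -1], [-1, 1, -1, 1]]
query_labels = [{0}, {1}]
db_codes = [[1, -1, 1, -1], [1, 1, 1, -1], [-1, 1, -1, 1]]
db_labels = [{0}, {1}, {0, 1}]

map_score = mean_average_precision(query_codes, query_labels,
                                   db_codes, db_labels)
print(round(map_score, 4))
```

Swapping the roles of the two code sets gives the score for the opposite retrieval direction.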

Citation

If you find DCHN useful in your research, please consider citing:

@article{hu2021joint,
  author={Hu, Peng and Peng, Xi and Zhu, Hongyuan and Lin, Jie and Zhen, Liangli and Peng, Dezhong},
  journal={IEEE Transactions on Cybernetics}, 
  title={Joint Versus Independent Multiview Hashing for Cross-View Retrieval}, 
  year={2021},
  volume={51},
  number={10},
  pages={4982-4993},
  doi={10.1109/TCYB.2020.3027614}
}
Owner
https://penghu-cs.github.io/