Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.

Overview

FENSE

The metric, Fluency ENhanced Sentence-bert Evaluation (FENSE), for audio caption evaluation, proposed in the paper "Can Audio Captions Be Evaluated with Image Caption Metrics?"

The main branch contains an easy-to-use interface for fast evaluation of an audio captioning system.

Online demo avaliable at https://share.streamlit.io/blmoistawinde/fense/main/streamlit_demo/app.py .

To get the dataset (AudioCaps-Eval and Clotho-Eval) and the code to reproduce, please refer to the experiment-code branch.

Installation

Clone the repository and pip install it.

git clone https://github.com/blmoistawinde/fense.git
cd fense
pip install -e .

Usage

Single Sentence

To get the detailed scores of each component for a single sentence.

from fense.evaluator import Evaluator

print("----Using tiny models----")
evaluator = Evaluator(device='cpu', sbert_model='paraphrase-MiniLM-L6-v2', echecker_model='echecker_clotho_audiocaps_tiny')

eval_cap = "An engine in idling and a man is speaking and then"
ref_cap = "A machine makes stitching sounds while people are talking in the background"

score, error_prob, penalized_score = evaluator.sentence_score(eval_cap, [ref_cap], return_error_prob=True)

print("Cand:", eval_cap)
print("Ref:", ref_cap)
print(f"SBERT sim: {score:.4f}, Error Prob: {error_prob:.4f}, Penalized score: {penalized_score:.4f}")

System Score

To get a system's overall score on a dataset by averaging sentence-level FENSE, you can use eval_system.py, with your system outputs prepared in the format like test_data/audiocaps_cands.csv or test_data/clotho_cands.csv .

For AudioCaps test set:

python eval_system.py --device cuda --dataset audiocaps --cands_dir ./test_data/audiocaps_cands.csv

For Clotho Eval set:

python eval_system.py --device cuda --dataset clotho --cands_dir ./test_data/clotho_cands.csv

Performance Benchmark

We benchmark the performance of FENSE with different choices of SBERT model and Error Detector on the two benchmark dataset AudioCaps-Eval and Clotho-Eval. (*) is the combination reported in paper.

AudioCaps-Eval

SBERT echecker HC HI HM MM total
paraphrase-MiniLM-L6-v2 none 62.1 98.8 93.7 75.4 80.4
paraphrase-MiniLM-L6-v2 tiny 57.6 94.7 89.5 82.6 82.3
paraphrase-MiniLM-L6-v2 base 62.6 98 82.5 85.4 85.5
paraphrase-TinyBERT-L6-v2 none 64 99.2 92.5 73.6 79.6
paraphrase-TinyBERT-L6-v2 tiny 58.6 95.1 88.3 82.2 82.1
paraphrase-TinyBERT-L6-v2 base 64.5 98.4 91.6 84.6 85.3(*)
paraphrase-mpnet-base-v2 none 63.1 98.8 94.1 74.1 80.1
paraphrase-mpnet-base-v2 tiny 58.1 94.3 90 83.2 82.7
paraphrase-mpnet-base-v2 base 63.5 98 92.5 85.9 85.9

Clotho-Eval

SBERT echecker HC HI HM MM total
paraphrase-MiniLM-L6-v2 none 59.5 95.1 76.3 66.2 71.3
paraphrase-MiniLM-L6-v2 tiny 56.7 90.6 79.3 70.9 73.3
paraphrase-MiniLM-L6-v2 base 60 94.3 80.6 72.3 75.3
paraphrase-TinyBERT-L6-v2 none 60 95.5 75.9 66.9 71.8
paraphrase-TinyBERT-L6-v2 tiny 59 93 79.7 71.5 74.4
paraphrase-TinyBERT-L6-v2 base 60.5 94.7 80.2 72.8 75.7(*)
paraphrase-mpnet-base-v2 none 56.2 96.3 77.6 65.2 70.7
paraphrase-mpnet-base-v2 tiny 54.8 91.8 80.6 70.1 73
paraphrase-mpnet-base-v2 base 57.1 95.5 81.9 71.6 74.9

Reference

If you use FENSE in your research, please cite:

@misc{zhou2021audio,
      title={Can Audio Captions Be Evaluated with Image Caption Metrics?}, 
      author={Zelin Zhou and Zhiling Zhang and Xuenan Xu and Zeyu Xie and Mengyue Wu and Kenny Q. Zhu},
      year={2021},
      eprint={2110.04684},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
You might also like...
I-BERT: Integer-only BERT Quantization
I-BERT: Integer-only BERT Quantization

I-BERT: Integer-only BERT Quantization HuggingFace Implementation I-BERT is also available in the master branch of HuggingFace! Visit the following li

Source code for NAACL 2021 paper
Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"

TR-BERT Source code and dataset for "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference". The code is based on huggaface's transformers.

LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

LV-BERT Introduction In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, pleas

The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'
The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data This repository provides the implementation details for

Pure python PEMDAS expression solver without using built-in eval function

pypemdas Pure python PEMDAS expression solver without using built-in eval function. Supports nested parenthesis. Supported operators: + - * / ^ Exampl

Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

CLIP-GLaSS Repository for the paper Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search An in-browser demo is

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)

TAP: Text-Aware Pre-training TAP: Text-Aware Pre-training for Text-VQA and Text-Caption by Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Flo

Yet another video caption

Yet another video caption

Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

Faster R-CNN pretrained on VisualGenome This repository modifies maskrcnn-benchmark for object detection and attribute prediction on VisualGenome data

Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task of Visual Document Understanding (VDU)

DocFormer - PyTorch Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for t

171 Jan 06, 2023
Orange Chicken: Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation

Orange Chicken: Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation This repository contains code and data f

Zoey Liu 0 Jan 07, 2022
Fully Connected DenseNet for Image Segmentation

Fully Connected DenseNets for Semantic Segmentation Fully Connected DenseNet for Image Segmentation implementation of the paper The One Hundred Layers

Somshubra Majumdar 84 Oct 31, 2022
TeST: Temporal-Stable Thresholding for Semi-supervised Learning

TeST: Temporal-Stable Thresholding for Semi-supervised Learning TeST Illustration Semi-supervised learning (SSL) offers an effective method for large-

Xiong Weiyu 1 Jul 14, 2022
(CVPR 2022) Energy-based Latent Aligner for Incremental Learning

Energy-based Latent Aligner for Incremental Learning Accepted to CVPR 2022 We illustrate an Incremental Learning model trained on a continuum of tasks

Joseph K J 37 Jan 03, 2023
Code for ICCV 2021 paper Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes using Scene Graphs

Graph-to-3D This is the official implementation of the paper Graph-to-3d: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs | arx

Helisa Dhamo 33 Jan 06, 2023
Justmagic - Use a function as a method with this mystic script, like in Nim

justmagic Use a function as a method with this mystic script, like in Nim. Just

witer33 8 Oct 08, 2022
A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo

idn-solver Paper | Project Page This repository contains the code release of our ICCV 2021 paper: A Confidence-based Iterative Solver of Depths and Su

zhaowang 43 Nov 17, 2022
PyTorch implementation of the Value Iteration Networks (VIN) (NIPS '16 best paper)

Value Iteration Networks in PyTorch Tamar, A., Wu, Y., Thomas, G., Levine, S., and Abbeel, P. Value Iteration Networks. Neural Information Processing

LEI TAI 75 Nov 24, 2022
Code for the paper "Attention Approximates Sparse Distributed Memory"

Attention Approximates Sparse Distributed Memory - Codebase This is all of the code used to run analyses in the paper "Attention Approximates Sparse D

Trenton Bricken 14 Dec 05, 2022
Machine learning algorithms for many-body quantum systems

NetKet NetKet is an open-source project delivering cutting-edge methods for the study of many-body quantum systems with artificial neural networks and

NetKet 413 Dec 31, 2022
Benchmarking the robustness of Spatial-Temporal Models

Benchmarking the robustness of Spatial-Temporal Models This repositery contains the code for the paper Benchmarking the Robustness of Spatial-Temporal

Yi Chenyu Ian 15 Dec 16, 2022
LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection

LiDAR Distillation Paper | Model LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection Yi Wei, Zibu Wei, Yongming Rao, Jiax

Yi Wei 75 Dec 22, 2022
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

CLIP-GEN [简体中文][English] 本项目在萤火二号集群上用 PyTorch 实现了论文 《CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP》。 CLIP-GEN 是一个 Language-F

75 Dec 29, 2022
Distance-Ratio-Based Formulation for Metric Learning

Distance-Ratio-Based Formulation for Metric Learning Environment Python3 Pytorch (http://pytorch.org/) (version 1.6.0+cu101) json tqdm Preparing datas

Hyeongji Kim 1 Dec 07, 2022
TOOD: Task-aligned One-stage Object Detection, ICCV2021 Oral

One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of

264 Jan 09, 2023
PyTorch implementations of the NeRF model described in "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis"

PyTorch NeRF and pixelNeRF NeRF: Tiny NeRF: pixelNeRF: This repository contains minimal PyTorch implementations of the NeRF model described in "NeRF:

Michael A. Alcorn 178 Dec 20, 2022
An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020

UnpairedSR An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020 turn RCAN(modified) -- xmodel(xilinx

JiaKui Hu 10 Oct 28, 2022
Implementation of the master's thesis "Temporal copying and local hallucination for video inpainting".

Temporal copying and local hallucination for video inpainting This repository contains the implementation of my master's thesis "Temporal copying and

David Álvarez de la Torre 1 Dec 02, 2022
A new benchmark for Icon Question Answering (IconQA) and a large-scale icon dataset Icon645.

IconQA About IconQA is a new diverse abstract visual question answering dataset that highlights the importance of abstract diagram understanding and c

Pan Lu 24 Dec 30, 2022