Source code for the paper: Variance-Aware Machine Translation Test Sets (NeurIPS 2021 Datasets and Benchmarks Track)

Overview

Variance-Aware-MT-Test-Sets

License

See LICENSE. We follow the same data licensing plan as the WMT benchmark.

VAT Data

We release 70 lightweight and discriminative test sets for machine translation evaluation, covering 35 translation directions from the WMT16 to WMT20 competitions. See the VAT_data folder for details.

For each translation direction of a given year, both the source and the reference files are provided so that different types of evaluation metrics can be used. For example:

VAT_data/
├── wmt20
│   ├── ...
│   ├── vat_newstest2020-zhen-ref.en.txt
│   └── vat_newstest2020-zhen-src.zh.txt
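
A minimal Python sketch for loading one direction's source and reference side by side. It assumes that every year folder follows the naming pattern of the wmt20 example above (vat_newstest{year}-{src}{tgt}-src.{src}.txt and -ref.{tgt}.txt) and that each file holds one sentence per line; vat_paths and load_pair are hypothetical helpers, not part of this repository.

```python
from pathlib import Path
from typing import List, Tuple


def vat_paths(root: str, year: int, src: str, tgt: str) -> Tuple[Path, Path]:
    """Build source/reference paths (assumed to follow the wmt20 naming pattern)."""
    folder = Path(root) / f"wmt{year % 100}"  # e.g. 2020 -> "wmt20"
    stem = f"vat_newstest{year}-{src}{tgt}"
    return folder / f"{stem}-src.{src}.txt", folder / f"{stem}-ref.{tgt}.txt"


def load_pair(root: str, year: int, src: str, tgt: str) -> List[Tuple[str, str]]:
    """Return (source, reference) pairs, assuming one sentence per line."""
    src_path, ref_path = vat_paths(root, year, src, tgt)
    sources = src_path.read_text(encoding="utf-8").splitlines()
    references = ref_path.read_text(encoding="utf-8").splitlines()
    if len(sources) != len(references):
        raise ValueError(f"line count mismatch: {len(sources)} vs {len(references)}")
    return list(zip(sources, references))
```

For example, load_pair("VAT_data", 2020, "zh", "en") would read the two wmt20 zh-en files listed above.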

Meta-Information of VAT

We also provide meta-information about the retained data. Each JSON file lists, for each language pair, the IDs of the sentences retained from the original test set. For instance, wmt20/bert-r_filter-std60.json contains:

{
	...
	"en-de": [4, 6, 10, 13, 14, 15, ...],
	"de-en": [0, 3, 4, 5, 7, 9, ...],
	...
}
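
The ID lists can be used to cut any line-aligned file of the original test set (for example, a system output) down to its VAT subset. The sketch below assumes the IDs are 0-based line indices into the original newstest file; the de-en list starting at 0 suggests this, but it remains an assumption. load_retained_ids and select_retained are hypothetical helpers.

```python
import json
from typing import Dict, List


def load_retained_ids(meta_path: str) -> Dict[str, List[int]]:
    """Read a VAT meta file mapping language pair -> retained sentence IDs."""
    with open(meta_path, encoding="utf-8") as f:
        return json.load(f)


def select_retained(lines: List[str], ids: List[int]) -> List[str]:
    """Keep only lines whose 0-based index is listed (assumed ID convention)."""
    if ids and max(ids) >= len(lines):
        raise IndexError(f"ID {max(ids)} out of range for {len(lines)} lines")
    return [lines[i] for i in ids]
```

For instance, select_retained(hyp_lines, load_retained_ids("wmt20/bert-r_filter-std60.json")["en-de"]) would keep only the en-de hypotheses of retained sentences, where hyp_lines is a hypothetical list of one system's outputs.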

Reproduce & Create VAT

The results reported in the paper were produced on a single NVIDIA GeForce GTX 1080 Ti card.

We will keep updating the code and related documentation after the author response period.

Requirements

  • sacreBLEU version >= 1.4.14
  • BLEURT version >= 0.0.2
  • COMET version >= 0.1.0
  • BERTScore version >= 0.3.7 (with Hugging Face transformers==4.2.1)
  • PyTorch version >= 1.5.1
  • Python version >= 3.8
  • CUDA & cudatoolkit >= 10.1

Note: these minimum versions are required for reproducing the results.
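
A small sketch for checking installed Python packages against the minimum versions listed above. The PyPI distribution names (sacrebleu, BLEURT, unbabel-comet, bert-score, torch) are assumptions, the comparison is deliberately simplified (pre-release and local tags are ignored), and CUDA and the transformers pin are not checked; check_requirements is a hypothetical helper.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions from the Requirements list; distribution names are assumed.
MINIMUMS: Dict[str, str] = {
    "sacrebleu": "1.4.14",
    "BLEURT": "0.0.2",
    "unbabel-comet": "0.1.0",
    "bert-score": "0.3.7",
    "torch": "1.5.1",
}


def numeric_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part, e.g. '1.5.1+cu101' -> (1, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparsable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric versions component by component."""
    a, b = numeric_version(installed), numeric_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that (1, 5) compares equal to (1, 5, 0).
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_requirements() -> Dict[str, Optional[bool]]:
    """Map each package to True/False, or None if it is not installed."""
    report: Dict[str, Optional[bool]] = {}
    for dist, minimum in MINIMUMS.items():
        try:
            report[dist] = meets_minimum(metadata.version(dist), minimum)
        except metadata.PackageNotFoundError:
            report[dist] = None
    return report
```

importlib.metadata is available from Python 3.8, which matches the Python requirement above.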

Pipeline

  1. Use score_xxx.py to generate CSV files that store the sentence-level scores given by the corresponding metric. For example, to evaluate all WMT20 submissions for all language pairs with BERTScore:
    CUDA_VISIBLE_DEVICES=0 python score_bert.py -b 128 -s -r dummy -c dummy --rescale_with_baseline \
    	--hypos-dir ${WMT_DATA_PATH}/system-outputs \
    	--refs-dir ${WMT_DATA_PATH}/references \
    	--scores-dir ${WMT_DATA_PATH}/results/system-level/scores_ALL \
    	--testset-name newstest2020 --score-dump wmt20-bertscore.csv
    (Alternative option) You can also dump the metric scores with your own implementation, but the CSV header must contain:
    ,TESTSET,LP,ID,METRIC,SYS,SCORE
    
  2. Use cal_filtering.py to filter the test set based on the scores dumped in the previous step. For example:
    python cal_filtering.py --score-dump wmt20-bertscore.csv --output VAT_meta/wmt20-test/ --filter-per 60
    This produces JSON files containing the IDs of the retained sentences.
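
A toy, self-contained sketch of the two pipeline steps for readers writing their own score dumper. It writes rows under the required header, then, per language pair, drops the filter_per percent of sentences whose scores vary least across systems. This filtering rule is only our reading of the std60 file name, not the verified logic of cal_filtering.py; the sketch also assumes the CSV holds a single metric. dump_scores and filter_by_std are hypothetical names.

```python
import csv
import io
import math
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple

HEADER = ["", "TESTSET", "LP", "ID", "METRIC", "SYS", "SCORE"]

# One row: (testset, language pair, sentence ID, metric, system, score).
Row = Tuple[str, str, int, str, str, float]


def dump_scores(rows: Iterable[Row], out: TextIO) -> None:
    """Write rows under ',TESTSET,LP,ID,METRIC,SYS,SCORE' (leading index column)."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    for index, row in enumerate(rows):
        writer.writerow([index, *row])


def filter_by_std(csv_text: str, filter_per: float) -> Dict[str, List[int]]:
    """Drop the filter_per% of sentences with the lowest score std (assumed rule)."""
    scores: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for rec in csv.DictReader(io.StringIO(csv_text)):
        scores[(rec["LP"], int(rec["ID"]))].append(float(rec["SCORE"]))

    # Population standard deviation of each sentence's scores across systems.
    by_lp: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for (lp, sid), values in scores.items():
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        by_lp[lp].append((std, sid))

    retained: Dict[str, List[int]] = {}
    for lp, items in by_lp.items():
        items.sort(reverse=True)  # most discriminative (highest std) first
        keep = len(items) - int(len(items) * filter_per / 100)
        retained[lp] = sorted(sid for _, sid in items[:keep])
    return retained
```

The returned dictionary has the same shape as the meta-information JSON shown earlier, so it could be written out with json.dump.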

Statistics of VAT (References)

Benchmark  Translation Direction  # Sentences  # Words  # Vocabulary
wmt20      km-en                  928          17170    3645
wmt20      cs-en                  266          12568    3502
wmt20      en-de                  567          21336    5945
wmt20      ja-en                  397          10526    3063
wmt20      ps-en                  1088         20296    4303
wmt20      en-zh                  567          18224    5019
wmt20      en-ta                  400          7809     4028
wmt20      de-en                  314          16083    4046
wmt20      zh-en                  800          35132    6457
wmt20      en-ja                  400          12718    2969
wmt20      en-cs                  567          16579    6391
wmt20      en-pl                  400          8423     3834
wmt20      en-ru                  801          17446    6877
wmt20      pl-en                  400          7394     2399
wmt20      iu-en                  1188         23494    3876
wmt20      ru-en                  396          6966     2330
wmt20      ta-en                  399          7427     2148
wmt19      zh-en                  800          36739    6168
wmt19      en-cs                  799          15433    6111
wmt19      de-en                  800          15219    4222
wmt19      en-gu                  399          8494     3548
wmt19      fr-de                  680          12616    3698
wmt19      en-zh                  799          20230    5547
wmt19      fi-en                  798          13759    3555
wmt19      en-fi                  799          13303    6149
wmt19      kk-en                  400          9283     2584
wmt19      de-cs                  799          15080    6166
wmt19      lt-en                  400          10474    2874
wmt19      en-lt                  399          7251     3364
wmt19      ru-en                  800          14693    3817
wmt19      en-kk                  399          6411     3252
wmt19      en-ru                  799          16393    6125
wmt19      gu-en                  406          8061     2434
wmt19      de-fr                  680          16181    3517
wmt19      en-de                  799          18946    5340
wmt18      en-cs                  1193         19552    7926
wmt18      cs-en                  1193         23439    5453
wmt18      en-fi                  1200         16239    7696
wmt18      en-tr                  1200         19621    8613
wmt18      en-et                  800          13034    6001
wmt18      ru-en                  1200         26747    6045
wmt18      et-en                  800          20045    5045
wmt18      tr-en                  1200         25689    5955
wmt18      fi-en                  1200         24912    5834
wmt18      zh-en                  1592         42983    7985
wmt18      en-zh                  1592         34796    8579
wmt18      en-ru                  1200         22830    8679
wmt18      de-en                  1199         28275    6487
wmt18      en-de                  1199         25473    7130
wmt17      en-lv                  800          14453    6161
wmt17      zh-en                  800          20590    5149
wmt17      en-tr                  1203         17612    7714
wmt17      lv-en                  800          18653    4747
wmt17      en-de                  1202         22055    6463
wmt17      ru-en                  1200         24807    5790
wmt17      en-fi                  1201         17284    7763
wmt17      tr-en                  1203         23037    5387
wmt17      en-zh                  800          18001    5629
wmt17      en-ru                  1200         22251    8761
wmt17      fi-en                  1201         23791    5300
wmt17      en-cs                  1202         21278    8256
wmt17      de-en                  1202         23838    5487
wmt17      cs-en                  1202         22707    5310
wmt16      tr-en                  1200         19225    4823
wmt16      ru-en                  1199         23010    5442
wmt16      ro-en                  800          16200    3968
wmt16      de-en                  1200         22612    5511
wmt16      en-ru                  1199         20233    7872
wmt16      fi-en                  1200         20744    5176
wmt16      cs-en                  1200         23235    5324
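
A rough sketch of how per-file statistics like those in the table could be computed from a reference file. The README does not say how words and vocabulary were counted; this sketch assumes case-sensitive whitespace tokenization, which suits unsegmented languages such as Chinese or Japanese (the en-zh and en-ja references) poorly, so its numbers may not match the table. reference_stats is a hypothetical helper.

```python
from typing import Dict, Iterable


def reference_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Count sentences, whitespace tokens, and distinct tokens (assumed definitions)."""
    sentences = 0
    words = 0
    vocabulary = set()
    for line in lines:
        tokens = line.split()  # whitespace tokenization is an assumption
        sentences += 1
        words += len(tokens)
        vocabulary.update(tokens)
    return {"sentences": sentences, "words": words, "vocabulary": len(vocabulary)}
```

An opened file can be passed directly, e.g. reference_stats(open("VAT_data/wmt20/vat_newstest2020-zhen-ref.en.txt", encoding="utf-8")).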

Owner: NLP2CT Lab (Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory), University of Macau