"Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback"

Last update: Oct 21, 2022

bandit-nmt

THIS REPO DEMONSTRATES HOW TO INTEGRATE A POLICY GRADIENT METHOD INTO NMT. FOR A STATE-OF-THE-ART NMT CODEBASE, VISIT simple-nmt.

This is the code repo for our EMNLP 2017 paper "Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback", which implements the A2C algorithm on top of a neural encoder-decoder model and benchmarks the combination under simulated noisy rewards.
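To make the A2C idea concrete, here is a minimal sketch of a generic advantage actor-critic objective for one sampled translation that receives a single sentence-level reward. It is not taken from this repo: the function name `a2c_losses` and its arguments are hypothetical, and the real implementation works on PyTorch tensors rather than Python lists.

```python
def a2c_losses(log_probs, values, reward):
    """Return (actor_loss, critic_loss) for one sampled translation.

    log_probs[t]: log-probability the actor gave to the sampled token t
    values[t]:    critic's estimate of the final reward at step t
    reward:       scalar feedback for the whole sentence (assumed in [0, 1])
    """
    actor_loss = 0.0
    critic_loss = 0.0
    for logp, value in zip(log_probs, values):
        # Advantage: how much better the outcome was than the critic expected.
        # It is treated as a constant when differentiating the actor loss.
        advantage = reward - value
        actor_loss += -advantage * logp
        # The critic regresses its per-step estimate toward the observed reward.
        critic_loss += (value - reward) ** 2
    return actor_loss, critic_loss


if __name__ == "__main__":
    print(a2c_losses([-1.0, -2.0], [0.5, 0.5], 1.0))
```

Subtracting the critic's estimate from the reward leaves the expected gradient unchanged while reducing its variance, which matters when only one noisy score per translation is available.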

Requirements:

Python 3.6
PyTorch 0.2

NOTE: as of Sep 16 2017, the code became 2x slower after I upgraded to PyTorch 0.2. This is a known issue and PyTorch is fixing it.

IMPORTANT: Set home directory (otherwise scripts will not run correctly):

> export BANDIT_HOME=$PWD
> export DATA=$BANDIT_HOME/data
> export SCRIPT=$BANDIT_HOME/scripts
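Since the scripts depend on these three variables, a quick check before launching anything can save a confusing failure. The sketch below is only an illustration: `bandit_paths` and `missing_vars` are hypothetical helpers, not part of this repo.

```python
import os


def bandit_paths(home):
    """Mirror the three exports above for a given repo root."""
    return {
        "BANDIT_HOME": home,
        "DATA": os.path.join(home, "data"),
        "SCRIPT": os.path.join(home, "scripts"),
    }


def missing_vars(environ, required=("BANDIT_HOME", "DATA", "SCRIPT")):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Please export: " + ", ".join(missing))
    else:
        print("Environment looks ready.")
```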

Data extraction

Download pre-processing scripts

> cd $DATA/scripts
> bash download_scripts.sh

For German-English

> cd $DATA/en-de
> bash extract_data_de_en.sh

NOTE: train_2014 and train_2015 overlap heavily. Please be cautious when using them for other projects.

Data should be ready in $DATA/en-de/prep

TODO: Chinese-English needs segmentation

Data pre-processing

> cd $SCRIPT
> bash make_data.sh de en

Pretraining

Pretrain both actor and critic

> cd $SCRIPT
> bash pretrain.sh en-de $YOUR_LOG_DIR

See scripts/pretrain.sh for more details.

Pretrain actor only

> cd $BANDIT_HOME
> python train.py -data $YOUR_DATA -save_dir $YOUR_SAVE_DIR -end_epoch 10

Reinforcement training

> cd $BANDIT_HOME

From scratch

> python train.py -data $YOUR_DATA -save_dir $YOUR_SAVE_DIR -start_reinforce 10 -end_epoch 100 -critic_pretrain_epochs 5

From a pretrained model

> python train.py -data $YOUR_DATA -load_from $YOUR_MODEL -save_dir $YOUR_SAVE_DIR -start_reinforce -1 -end_epoch 100 -critic_pretrain_epochs 5

Perturbed rewards

For example, to use a thumbs up/thumbs down reward:

> cd $BANDIT_HOME
> python train.py -data $YOUR_DATA -load_from $YOUR_MODEL -save_dir $YOUR_SAVE_DIR -start_reinforce -1 -end_epoch 100 -critic_pretrain_epochs 5 -pert_func bin -pert_param 1

See lib/metric/PertFunction.py for more types of functions.
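As an illustration of what a binned ("thumbs up/thumbs down") reward could look like, the sketch below rounds a score in [0, 1] to the nearest of a few evenly spaced levels. This is an assumption about the general idea, not a copy of the repo's code: `bin_reward` and `n_bins` are hypothetical names, and the actual behavior of `-pert_func bin -pert_param 1` is defined in lib/metric/PertFunction.py.

```python
import math


def bin_reward(score, n_bins):
    """Round a score in [0, 1] to the nearest of n_bins + 1 evenly spaced levels.

    With n_bins = 1 the only levels are 0.0 and 1.0, i.e. a binary
    thumbs down / thumbs up signal (assumed interpretation).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    clipped = min(max(score, 0.0), 1.0)
    # floor(x + 0.5) rounds halves up, avoiding Python's banker's rounding.
    return math.floor(clipped * n_bins + 0.5) / n_bins


if __name__ == "__main__":
    for s in (0.1, 0.3, 0.7, 0.9):
        print(s, bin_reward(s, 1), bin_reward(s, 4))
```

Coarser levels discard information about translation quality, which is the kind of degradation the perturbed-reward experiments are meant to simulate.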

Evaluation

> cd $BANDIT_HOME

On heldout sets (heldout BLEU):

> python train.py -data $YOUR_DATA -load_from $YOUR_MODEL -eval -save_dir .

On bandit set (per-sentence BLEU):

> python train.py -data $YOUR_DATA -load_from $YOUR_MODEL -eval_sample -save_dir .
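The bandit-set evaluation scores each sentence on its own, unlike heldout BLEU, which is computed over a whole set. Because a single sentence often has no matching 3-grams or 4-grams, a per-sentence score needs some form of smoothing. The sketch below uses add-one smoothing on the higher-order n-gram precisions, which is one common choice; it is an assumption for illustration, and `sentence_bleu` here is a hypothetical function that may differ from the metric this repo actually uses.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Smoothed BLEU of one hypothesis against one reference (token lists)."""
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0  # no word overlap at all
            precision = matches / total
        else:
            # Add-one smoothing keeps the geometric mean from collapsing to zero.
            precision = (matches + 1) / (total + 1)
        log_precision += math.log(precision) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    print(sentence_bleu("the cat sat".split(), "the cat sat down".split()))
```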

Owner

Khanh Nguyen
