TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

Overview

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

Authors: Yixuan Su, Fangyu Liu, Zaiqiao Meng, Lei Shu, Ehsan Shareghi, and Nigel Collier

Code of our paper: TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

Introduction:

Masked language models (MLMs) such as BERT and RoBERTa have revolutionized the field of Natural Language Understanding in the past few years. However, existing pre-trained MLMs often output an anisotropic distribution of token representations that occupies a narrow subset of the entire representation space. Such token representations are not ideal, especially for tasks that demand discriminative semantic meanings of distinct tokens. In this work, we propose TaCL (Token-aware Contrastive Learning), a novel continual pre-training approach that encourages BERT to learn an isotropic and discriminative distribution of token representations. TaCL is fully unsupervised and requires no additional data. We extensively test our approach on a wide range of English and Chinese benchmarks. The results show that TaCL brings consistent and notable improvements over the original BERT model. Furthermore, we conduct detailed analysis to reveal the merits and inner-workings of our approach

Main Results:

We show the comparison between TaCL (base version) and the original BERT (base version).

(1) English benchmark results on SQuAD (Rajpurkar et al., 2018) (dev set) and GLUE (Wang et al., 2019) average score.

Model SQuAD 1.1 (EM/F1) SQuAD 2.0 (EM/F1) GLUE Average
BERT 80.8/88.5 73.4/76.8 79.6
TaCL 81.6/89.0 74.4/77.5 81.2

(2) Chinese benchmark results (test set F1) on four NER tasks (MSRA, OntoNotes, Resume, and Weibo) and three Chinese word segmentation (CWS) tasks (PKU, CityU, and AS).

Model MSRA OntoNotes Resume Weibo PKU CityU AS
BERT 94.95 80.14 95.53 68.20 96.50 97.60 96.50
TaCL 95.44 82.42 96.45 69.54 96.75 98.16 96.75

Huggingface Models:

Model Name Model Address
English (cambridgeltl/tacl-bert-base-uncased) link
Chinese (cambridgeltl/tacl-bert-base-chinese) link

Example Usage:

import torch
# initialize model
from transformers import AutoModel, AutoTokenizer
model_name = 'cambridgeltl/tacl-bert-base-uncased'
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# create input ids
text = '[CLS] clbert is awesome. [SEP]'
tokenized_token_list = tokenizer.tokenize(text)
input_ids = torch.LongTensor(tokenizer.convert_tokens_to_ids(tokenized_token_list)).view(1, -1)
# compute hidden states
representation = model(input_ids).last_hidden_state # [1, seqlen, embed_dim]

Tutorial (in Chinese language) on how to use Chinese TaCL BERT to performance Name Entity Recognition and Chinese word segmentation:

Tutorial link

Tutorial on how to reproduce the results in our paper:

1. Environment Setup:

python version: 3.8
pip3 install -r requirements.txt

2. Train TaCL:

(1) Prepare pre-training data:

Please refer to details provided in ./pretraining_data directory.

(2) Train the model:

Please refer to details provided in ./pretraining directory.

3. Experiments on English Benchmarks:

Please refer to details provided in ./english_benchmark directory.

4. Experiments on Chinese Benchmarks:

(1) Chinese Benchmark Data Preparation:

chmod +x ./download_benchmark_data.sh
./download_benchmark_data.sh

(2) Fine-tuning and Inference:

Please refer to details provided in ./chinese_benchmark directory.

5. Replicate Our Analysis Results:

We provide all essential code to replicate the results (the images below) provided in our analysis section. The related codes and instructions are located in ./analysis directory. Have fun!

Citation:

If you find our paper and resources useful, please kindly cite our paper:

@misc{su2021tacl,
      title={TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning}, 
      author={Yixuan Su and Fangyu Liu and Zaiqiao Meng and Lei Shu and Ehsan Shareghi and Nigel Collier},
      year={2021},
      eprint={2111.04198},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Contact

If you have any questions, feel free to contact me via ([email protected]).

Owner
Yixuan Su
I am a final-year PhD student at the University of Cambridge, supervised by Professor Nigel Collier.
Yixuan Su
Tightness-aware Evaluation Protocol for Scene Text Detection

TIoU-metric Release on 27/03/2019. This repository is built on the ICDAR 2015 evaluation code. If you propose a better metric and require further eval

Yuliang Liu 206 Nov 18, 2022
Pytorch Implementation for Dilated Continuous Random Field

DilatedCRF Pytorch implementation for fully-learnable DilatedCRF. If you find my work helpful, please consider our paper: @article{Mo2022dilatedcrf,

DunnoCoding_Plus 3 Nov 13, 2022
The official re-implementation of the Neurips 2021 paper, "Targeted Neural Dynamical Modeling".

Targeted Neural Dynamical Modeling Note: This is a re-implementation (in Tensorflow2) of the original TNDM model. We do not plan to further update the

6 Oct 05, 2022
2021 credit card consuming recommendation

2021 credit card consuming recommendation

Wang, Chung-Che 7 Mar 08, 2022
FB-tCNN for SSVEP Recognition

FB-tCNN for SSVEP Recognition Here are the codes of the tCNN and FB-tCNN in the paper "Filter Bank Convolutional Neural Network for Short Time-Window

Wenlong Ding 12 Dec 14, 2022
DeepProbLog is an extension of ProbLog that integrates Probabilistic Logic Programming with deep learning by introducing the neural predicate.

DeepProbLog DeepProbLog is an extension of ProbLog that integrates Probabilistic Logic Programming with deep learning by introducing the neural predic

KU Leuven Machine Learning Research Group 94 Dec 18, 2022
Fuzzer for Linux Kernel Drivers

difuze: Fuzzer for Linux Kernel Drivers This repo contains all the sources (including setup scripts), you need to get difuze up and running. Tested on

seclab 344 Dec 27, 2022
Deep Reinforcement Learning with pytorch & visdom

Deep Reinforcement Learning with pytorch & visdom Sample testings of trained agents (DQN on Breakout, A3C on Pong, DoubleDQN on CartPole, continuous A

Jingwei Zhang 783 Jan 04, 2023
Impelmentation for paper Feature Generation and Hypothesis Verification for Reliable Face Anti-Spoofing

FGHV Impelmentation for paper Feature Generation and Hypothesis Verification for Reliable Face Anti-Spoofing Requirements Python 3.6 Pytorch 1.5.0 Cud

5 Jun 02, 2022
Image-retrieval-baseline - MUGE Multimodal Retrieval Baseline

MUGE Multimodal Retrieval Baseline This repo is implemented based on the open_cl

47 Dec 16, 2022
DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates

DeepMetaHandles (CVPR2021 Oral) [paper] [animations] DeepMetaHandles is a shape deformation technique. It learns a set of meta-handles for each given

Liu Minghua 73 Dec 15, 2022
Luminaire is a python package that provides ML driven solutions for monitoring time series data.

A hands-off Anomaly Detection Library Table of contents What is Luminaire Quick Start Time Series Outlier Detection Workflow Anomaly Detection for Hig

Zillow 670 Jan 02, 2023
Custom TensorFlow2 implementations of forward and backward computation of soft-DTW algorithm in batch mode.

Batch Soft-DTW(Dynamic Time Warping) in TensorFlow2 including forward and backward computation Custom TensorFlow2 implementations of forward and backw

19 Aug 30, 2022
A pytorch implementation of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".

RE2 This is a pytorch implementation of the ACL 2019 paper "Simple and Effective Text Matching with Richer Alignment Features". The original Tensorflo

287 Dec 21, 2022
Object-Centric Learning with Slot Attention

Slot Attention This is a re-implementation of "Object-Centric Learning with Slot Attention" in PyTorch (https://arxiv.org/abs/2006.15055). Requirement

Untitled AI 72 Jan 02, 2023
TDmatch is a Python library developed to perform matching tasks in three categories:

TDmatch TDmatch is a Python library developed to perform matching tasks in three categories: Text to Data which matches tuples of a table to text docu

Naser Ahmadi 5 Aug 11, 2022
Code for the CVPR2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition"

Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition This repository contains code for the CVPR2021 paper "Patch-NetV

QVPR 368 Jan 06, 2023
Project Tugas Besar pertama Pengenalan Komputasi Institut Teknologi Bandung

Vending_Machine_(Mesin_Penjual_Minuman) Project Tugas Besar pertama Pengenalan Komputasi Institut Teknologi Bandung Raw Sketch untuk Essay Ringkasan P

QueenLy 1 Nov 08, 2021
Pytorch code for our paper "Feedback Network for Image Super-Resolution" (CVPR2019)

Feedback Network for Image Super-Resolution [arXiv] [CVF] [Poster] Update: Our proposed Gated Multiple Feedback Network (GMFN) will appear in BMVC2019

Zhen Li 539 Jan 06, 2023
A Re-implementation of the paper "A Deep Learning Framework for Character Motion Synthesis and Editing"

What is This This is a simple re-implementation of the paper "A Deep Learning Framework for Character Motion Synthesis and Editing"(1). Only Sections

102 Dec 14, 2022