Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

Last update: Dec 06, 2022

Related tags

Deep Learning LEBERT

Overview

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter

Code and checkpoints for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

Arxiv link of the paper: https://arxiv.org/abs/2105.07148

Requirement

Python 3.7.0
Transformer 3.4.0
Numpy 1.18.5
Packaging 17.1
skicit-learn 0.23.2
torch 1.16.0+cu92
tqdm 4.50.2
multiprocess 0.70.10
tensorflow 2.3.1
tensorboardX 2.1
seqeval 1.2.1

Input Format

CoNLL format (prefer BIOES tag scheme), with each character its label for one line. Sentences are splited with a null line.

美   B-LOC  
国   E-LOC  
的   O  
华   B-PER  
莱   I-PER  
士   E-PER  

我   O  
跟   O  
他   O  
谈   O  
笑   O  
风   O  
生   O

Chinese BERT，Chinese Word Embedding, and Checkpoints

Chinese BERT

Chinese BERT: https://cdn.huggingface.co/bert-base-chinese-pytorch_model.bin

Chinese word embedding:

Word Embedding: https://ai.tencent.com/ailab/nlp/en/data/Tencent_AILab_ChineseEmbedding.tar.gz

Checkpoints and Shells

Directory Structure of data

berts
- bert
  - config.json
  - vocab.txt
  - pytorch_model.bin
dataset
- NER
  - weibo
  - note4
  - msra
  - resume
- POS
  - ctb5
  - ctb6
  - ud1
  - ud2
- CWS
  - ctb6
  - msr
  - pku
vocab
- tencent_vocab.txt, the vocab of pre-trained word embedding table.
embedding
- word_embedding.txt
result
- NER
  - weibo
  - note4
  - msra
  - resume
- POS
  - ctb5
  - ctb6
  - ud1
  - ud2
- CWS
  - ctb6
  - msr
  - pku
log

Run

1.Convert .char.bmes file to .json file, python3 to_json.py
2.run the shell, sh run_ner.sh

If you want to load my checkpoints, you need to make some revisions to your transformers.

My model is trained in distribution mode so it can not be directly loaded by single-GPU mode. You can follow the below steps to revise the transformers before load my checkpoints.

Enter the source code director of Transformer, cd source/transformers-master
Find the modeling_util.py, and positioned to about 995 lines
change the code as follows:
Compile the revised source code and install. python3 setup.py install

Cite

@misc{liu2021lexicon,
      title={Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter}, 
      author={Wei Liu and Xiyan Fu and Yue Zhang and Wenming Xiao},
      year={2021},
      eprint={2105.07148},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

Related tags

Overview

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter

Requirement

Input Format

Chinese BERT，Chinese Word Embedding, and Checkpoints

Chinese BERT

Chinese word embedding:

Checkpoints and Shells

Directory Structure of data

Run

If you want to load my checkpoints, you need to make some revisions to your transformers.

Cite

Owner

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

PyTorch implementation of "Representing Shape Collections with Alignment-Aware Linear Models" paper.

Social Fabric: Tubelet Compositions for Video Relation Detection

[CVPR 2022] Thin-Plate Spline Motion Model for Image Animation.

simple_pytorch_example project is a toy example of a python script that instantiates and trains a PyTorch neural network on the FashionMNIST dataset

This's an implementation of deepmind Visual Interaction Networks paper using pytorch

In Search of Probeable Generalization Measures

TorchGRL is the source code for our paper Graph Convolution-Based Deep Reinforcement Learning for Multi-Agent Decision-Making in Mixed Traffic Environments for IV 2022.

End-To-End Optimization of LiDAR Beam Configuration

Finetune SSL models for MOS prediction

Capstone-Project-2 - A game program written in the Python language

Faster Convex Lipschitz Regression

Pytorch implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"

Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

PyTorch Code for "Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning"

A modular, open and non-proprietary toolkit for core robotic functionalities by harnessing deep learning

ICCV2021 Oral SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

Lightweight stereo matching network based on MobileNetV1 and MobileNetV2

Source code for TACL paper "KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation".

Self-attentive task GAN for space domain awareness data augmentation.