ConvBERT-Prod

Overview

ConvBERT

目录

0. 仓库结构

root:[./]
|--convbert_base_outputs
|      |--args.json
|      |--best.pdparams
|      |      |--model_config.json
|      |      |--model_state.pdparams
|      |      |--tokenizer_config.json
|      |      |--vocab.txt
|--convbert_infer
|      |--inference.pdiparams
|      |--inference.pdiparams.info
|      |--inference.pdmodel
|      |--tokenizer_config.json
|      |--vocab.txt
|--deploy
|      |--inference_python
|      |      |--infer.py
|      |      |--README.md
|      |--serving_python
|      |      |--config.yml
|      |      |--convbert_client
|      |      |      |--serving_client_conf.prototxt
|      |      |      |--serving_client_conf.stream.prototxt
|      |      |--convbert_server
|      |      |      |--inference.pdiparams
|      |      |      |--inference.pdmodel
|      |      |      |--serving_server_conf.prototxt
|      |      |      |--serving_server_conf.stream.prototxt
|      |      |--PipelineServingLogs
|      |      |      |--pipeline.log
|      |      |      |--pipeline.log.wf
|      |      |      |--pipeline.tracer
|      |      |--pipeline_http_client.py
|      |      |--ProcessInfo.json
|      |      |--README.md
|      |      |--web_service.py
|--images
|      |--convbert_framework.jpg
|      |--py_serving_client_results.jpg
|      |--py_serving_startup_visualization.jpg
|--LICENSE
|--output_inference_engine.npy
|--output_predict_engine.npy
|--paddlenlp
|--print_project_tree.py
|--README.md
|--requirements.txt
|--shell
|      |--export.sh
|      |--inference_python.sh
|      |--predict.sh
|      |--train.sh
|      |--train_dist.sh
|--test_tipc
|      |--common_func.sh
|      |--configs
|      |      |--ConvBERT
|      |      |      |--model_linux_gpu_normal_normal_serving_python_linux_gpu_cpu.txt
|      |      |      |--train_infer_python.txt
|      |--docs
|      |      |--test_serving.md
|      |      |--test_train_inference_python.md
|      |      |--tipc_guide.png
|      |      |--tipc_serving.png
|      |      |--tipc_train_inference.png
|      |--output
|      |      |--python_infer_cpu_usemkldnn_False_threads_null_precision_null_batchsize_null.log
|      |      |--python_infer_gpu_usetrt_null_precision_null_batchsize_null.log
|      |      |--results_python.log
|      |      |--results_serving.log
|      |      |--server_infer_gpu_pipeline_http_usetrt_null_precision_null_batchsize_1.log
|      |--README.md
|      |--test_serving.sh
|      |--test_train_inference_python.sh
|--tools
|      |--export_model.py
|      |--predict.py
|--train.log
|--train.py

1. 简介

论文: ConvBERT: Improving BERT with Span-based Dynamic Convolution

摘要: 像BERT及其变体这样的预训练语言模型最近在各种自然语言理解任务中取得了令人印象深刻的表现。然而,BERT严重依赖全局自注意力块,因此需要大量内存占用和计算成本。 虽然它的所有注意力头从全局角度查询整个输入序列以生成注意力图,但我们观察到一些头只需要学习局部依赖,这意味着存在计算冗余。 因此,我们提出了一种新颖的基于跨度的动态卷积来代替这些自注意力头,以直接对局部依赖性进行建模。新的卷积头与其余的自注意力头一起形成了一个新的混合注意力块,在全局和局部上下文学习中都更有效。 我们为 BERT 配备了这种混合注意力设计并构建了一个ConvBERT模型。实验表明,ConvBERT 在各种下游任务中明显优于BERT及其变体,具有更低的训练成本和更少的模型参数。 值得注意的是,ConvBERT-base 模型达到86.4GLUE分数,比ELECTRA-base高0.7,同时使用不到1/4的训练成本。

2. 数据集和复现精度

数据集为SST-2

模型 sst-2 dev acc (复现精度)
ConvBERT 0.9461

3. 准备环境与数据

3.1 准备环境

  • 下载代码
git clone https://github.com/junnyu/ConvBERT-Prod.git
  • 安装paddlepaddle
# 需要安装2.2及以上版本的Paddle,如果
# 安装GPU版本的Paddle
pip install paddlepaddle-gpu==2.2.0
# 安装CPU版本的Paddle
pip install paddlepaddle==2.2.0

更多安装方法可以参考:Paddle安装指南

  • 安装requirements
pip install -r requirements.txt

3.2 准备数据

SST-2数据已经集成在paddlenlp仓库中。

3.3 准备模型

如果您希望直接体验评估或者预测推理过程,可以直接根据第2章的内容下载提供的预训练模型,直接体验模型评估、预测、推理部署等内容。

4. 开始使用

4.1 模型训练

  • 单机单卡训练
export CUDA_VISIBLE_DEVICES=0
python -m paddle.distributed.launch --gpus "0" train.py \
    --model_type convbert \
    --model_name_or_path convbert-base \
    --task_name sst-2 \
    --max_seq_length 128 \
    --learning_rate 1e-4 \
    --num_train_epochs 3 \
    --output_dir ./convbert_base_outputs/ \
    --logging_steps 100 \
    --save_steps 400 \
    --batch_size 32   \
    --warmup_proportion 0.1

部分训练日志如下所示。

====================================================================================================
global step 2500/6315, epoch: 1, batch: 394, rank_id: 0, loss: 0.140546, lr: 0.0000671182, speed: 3.7691 step/s
global step 2600/6315, epoch: 1, batch: 494, rank_id: 0, loss: 0.062813, lr: 0.0000653589, speed: 4.1413 step/s
global step 2700/6315, epoch: 1, batch: 594, rank_id: 0, loss: 0.051268, lr: 0.0000635996, speed: 4.1867 step/s
global step 2800/6315, epoch: 1, batch: 694, rank_id: 0, loss: 0.133289, lr: 0.0000618403, speed: 4.1769 step/s
eval loss: 0.342346, acc: 0.9461009174311926,
eval done total : 1.9056718349456787 s
====================================================================================================
  • 单机多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus "0,1,2,3" train.py \
    --model_type convbert \
    --model_name_or_path convbert-base \
    --task_name sst-2 \
    --max_seq_length 128 \
    --learning_rate 1e-4 \
    --num_train_epochs 3 \
    --output_dir ./convbert_base_outputs/ \
    --logging_steps 100 \
    --save_steps 400 \
    --batch_size 32   \
    --warmup_proportion 0.1

更多配置参数可以参考train.pyget_args_parser函数。

4.2 模型评估

该项目中,训练与评估脚本同时进行,请查看训练过程中的评价指标。

4.3 模型预测

  • 使用GPU预测
python tools/predict.py --model_path=./convbert_base_outputs/best.pdparams

对于下面的文本进行预测

the problem , it is with most of these things , is the script .

最终输出结果为label_id: 0, prob: 0.9959235191345215,表示预测的标签ID是0,置信度为0.9959

  • 使用CPU预测
python tools/predict.py --model_path=./convbert_base_outputs/best.pdparams --device=cpu

对于下面的文本进行预测

the problem , it is with most of these things , is the script .

最终输出结果为label_id: 0, prob: 0.995919406414032,表示预测的标签ID是0,置信度为0.9959

5. 模型推理部署

5.1 基于Inference的推理

Inference推理教程可参考:链接

5.2 基于Serving的服务化部署

Serving部署教程可参考:链接

6. TIPC自动化测试脚本

以Linux基础训练推理测试为例,测试流程如下。

  • 运行测试命令
bash test_tipc/test_train_inference_python.sh test_tipc/configs/ConvBERT/train_infer_python.txt whole_train_whole_infer

如果运行成功,在终端中会显示下面的内容,具体的日志也会输出到test_tipc/output/文件夹中的文件中。

�[33m Run successfully with command - python train.py --save_steps 400      --max_steps=6315           !  �[0m
�[33m Run successfully with command - python tools/export_model.py --model_path=./convbert_base_outputs/best.pdparams --save_inference_dir ./convbert_infer      !  �[0m
�[33m Run successfully with command - python deploy/inference_python/infer.py --model_dir ./convbert_infer --use_gpu=True               > ./test_tipc/output/python_infer_gpu_usetrt_null_precision_null_batchsize_null.log 2>&1 !  �[0m
�[33m Run successfully with command - python deploy/inference_python/infer.py --model_dir ./convbert_infer --use_gpu=False --benchmark=False               > ./test_tipc/output/python_infer_cpu_usemkldnn_False_threads_null_precision_null_batchsize_null.log 2>&1 !  �[0m

7. 注意

为了可以使用静态图导出功能,本项目修改了paddlenlp仓库中的convbert模型,主要修改部分如下。

    1. 使用paddle.shape而不是tensor.shape获取tensor的形状。
    1. F.unfold对于静态图不怎么友好,只好采用for循环。
if self.conv_type == "sdconv":
    bs = paddle.shape(q)[0]
    seqlen = paddle.shape(q)[1]
    mixed_key_conv_attn_layer = self.key_conv_attn_layer(query)
    conv_attn_layer = mixed_key_conv_attn_layer * q

    # conv_kernel_layer
    conv_kernel_layer = self.conv_kernel_layer(conv_attn_layer)
    conv_kernel_layer = tensor.reshape(
        conv_kernel_layer, shape=[-1, self.conv_kernel_size, 1])
    conv_kernel_layer = F.softmax(conv_kernel_layer, axis=1)
    conv_out_layer = self.conv_out_layer(query)
    conv_out_layer = paddle.stack(
        [
            paddle.slice(F.pad(conv_out_layer, pad=[
                            self.padding, self.padding], data_format="NLC"), [1], starts=[i], ends=[i+seqlen])
            for i in range(self.conv_kernel_size)
        ],
        axis=-1,
    )
    conv_out_layer = tensor.reshape(
        conv_out_layer,
        shape=[-1, self.head_dim, self.conv_kernel_size])
    conv_out_layer = tensor.matmul(conv_out_layer, conv_kernel_layer)
    conv_out = tensor.reshape(
        conv_out_layer,
        shape=[bs, seqlen, self.num_heads, self.head_dim])

8. LICENSE

本项目的发布受Apache 2.0 license许可认证。

9. 参考链接与文献

TODO

Owner
yujun
Please show me your code.
yujun
Long text token classification using LongFormer

Long text token classification using LongFormer

abhishek thakur 161 Aug 07, 2022
Poetry PEP 517 Build Backend & Core Utilities

Poetry Core A PEP 517 build backend implementation developed for Poetry. This project is intended to be a light weight, fully compliant, self-containe

Poetry 293 Jan 02, 2023
code for "AttentiveNAS Improving Neural Architecture Search via Attentive Sampling"

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling This repository contains PyTorch evaluation code, training code and pretrain

Facebook Research 94 Oct 26, 2022
AIDynamicTextReader - A simple dynamic text reader based on Artificial intelligence

AI Dynamic Text Reader: This is a simple dynamic text reader based on Artificial

Md. Rakibul Islam 1 Jan 18, 2022
Implementation of "Adversarial purification with Score-based generative models", ICML 2021

Adversarial Purification with Score-based Generative Models by Jongmin Yoon, Sung Ju Hwang, Juho Lee This repository includes the official PyTorch imp

15 Dec 15, 2022
Some embedding layer implementation using ivy library

ivy-manual-embeddings Some embedding layer implementation using ivy library. Just for fun. It is based on NYCTaxiFare dataset from kaggle (cut down to

Ishtiaq Hussain 2 Feb 10, 2022
PyTorch implementation and pretrained models for XCiT models. See XCiT: Cross-Covariance Image Transformer

Cross-Covariance Image Transformer (XCiT) PyTorch implementation and pretrained models for XCiT models. See XCiT: Cross-Covariance Image Transformer L

Facebook Research 605 Jan 02, 2023
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning English | 中文 ❗ Now we provide inferencing code and pre-training models

164 Jan 02, 2023
Implementation of TF-IDF algorithm to find documents similarity with cosine similarity

NLP learning Trying to learn NLP to use in my projects! Table of Contents About The Project Built With Getting Started Requirements Run Usage License

Faraz Farangizadeh 3 Aug 25, 2022
Fixes mojibake and other glitches in Unicode text, after the fact.

ftfy: fixes text for you print(fix_encoding("(ง'⌣')ง")) (ง'⌣')ง Full documentation: https://ftfy.readthedocs.org Testimonials “My life is li

Luminoso Technologies, Inc. 3.4k Dec 29, 2022
Beyond Accuracy: Behavioral Testing of NLP models with CheckList

CheckList This repository contains code for testing NLP Models as described in the following paper: Beyond Accuracy: Behavioral Testing of NLP models

Marco Tulio Correia Ribeiro 1.8k Dec 28, 2022
A framework for cleaning Chinese dialog data

A framework for cleaning Chinese dialog data

Yida 136 Dec 20, 2022
Must-read papers on improving efficiency for pre-trained language models.

Must-read papers on improving efficiency for pre-trained language models.

Tobias Lee 89 Jan 03, 2023
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

LV-BERT Introduction In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, pleas

Weihao Yu 14 Aug 24, 2022
Open Source Neural Machine Translation in PyTorch

OpenNMT-py: Open-Source Neural Machine Translation OpenNMT-py is the PyTorch version of the OpenNMT project, an open-source (MIT) neural machine trans

OpenNMT 5.8k Jan 04, 2023
Easy-to-use CPM for Chinese text generation

CPM 项目描述 CPM(Chinese Pretrained Models)模型是北京智源人工智能研究院和清华大学发布的中文大规模预训练模型。官方发布了三种规模的模型,参数量分别为109M、334M、2.6B,用户需申请与通过审核,方可下载。 由于原项目需要考虑大模型的训练和使用,需要安装较为复杂

382 Jan 07, 2023
A spaCy wrapper of OpenTapioca for named entity linking on Wikidata

spaCyOpenTapioca A spaCy wrapper of OpenTapioca for named entity linking on Wikidata. Table of contents Installation How to use Local OpenTapioca Vizu

Universitätsbibliothek Mannheim 80 Jan 03, 2023
Samantha, A covid-19 information bot which will provide basic information about this pandemic in form of conversation.

Covid-19-BOT Samantha, A covid-19 information bot which will provide basic information about this pandemic in form of conversation. This bot uses torc

Neeraj Majhi 2 Nov 05, 2021
Tool to check whether a GCP bucket is public or not.

Tool to check publicly accessible GCP bucket. Blog https://justm0rph3u5.medium.com/gcp-inspector-auditing-publicly-exposed-gcp-bucket-ac6cad55618c Wha

DIVYANSHU SHUKLA 7 Nov 24, 2022
Pangu-Alpha for Transformers

Pangu-Alpha for Transformers Usage Download MindSpore FP32 weights for GPU from here to data/Pangu-alpha_2.6B.ckpt Activate MindSpore environment and

One 5 Oct 01, 2022