NLP: SLU tagging

Last update: Jan 14, 2022

Related tags

Text Data & NLP slu-homework

Overview

Create the environment

```
conda create -n slu python=3.6
source activate slu
pip install torch==1.7.1
```

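Run

Training: run from the repository root:

```
python scripts/slu_baseline.py
```

Testing: run from the repository root. This reads test_unlabelled.json and writes test.json to the data directory; the environment is the same as for training.

```
python scripts/slu_evaluate.py
```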

Code overview

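utils/args.py: defines all optional arguments. To change an argument, pass it on the command line when running the script:
```
python scripts/slu_baseline.py --<arg> <value>
```
where <arg> is the name of the argument to change and <value> is its new value.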
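utils/initialization.py: initializes system settings, including the random seed and the GPU/CPU device
utils/vocab.py: builds the vocabularies used to encode inputs and outputs
utils/word2vec.py: loads the word vectors
utils/example.py: loads the data
utils/batch.py: converts the data into batched model inputs
model/slu_baseline_tagging.py: the baseline model
scripts/slu_baseline.py: the main script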
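The "--<arg> <value>" override described for utils/args.py is standard command-line parsing. The sketch below assumes the script uses Python's argparse (the README does not say so), and the argument names --lr, --batch_size and --device are hypothetical placeholders; check utils/args.py for the real names and defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical arguments for illustration only; the real ones live in utils/args.py.
    parser = argparse.ArgumentParser(description="SLU baseline (illustrative sketch)")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    parser.add_argument("--batch_size", type=int, default=32, help="mini-batch size")
    parser.add_argument("--device", type=int, default=-1, help="GPU index, -1 for CPU")
    return parser


if __name__ == "__main__":
    # In a real script you would call parse_args() with no list so it reads sys.argv.
    # The explicit list mimics: python scripts/slu_baseline.py --lr 5e-4 --batch_size 64
    args = build_parser().parse_args(["--lr", "5e-4", "--batch_size", "64"])
    print(args.lr, args.batch_size, args.device)  # 0.0005 64 -1
```

Any argument not given on the command line keeps its default, so only the values you want to change need to be passed.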
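model/slu_baseline_tagging.py is described only as a tagging baseline. Assuming it predicts one BIO tag per character, with labels of the hypothetical form B-<act>-<slot>, I-<act>-<slot> or O, the predicted tags can be turned back into (act-slot, value) pairs as sketched below. The tag format, the function name decode_bio and the slot names in the example are assumptions, not taken from the repository.

```python
from typing import List, Optional, Tuple


def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-character BIO tags into (slot, value) pairs.

    Assumes tags look like "B-<slot>", "I-<slot>" or "O". An I- tag that does
    not continue a span of the same slot is treated as the start of a new span.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    pairs: List[Tuple[str, str]] = []
    slot: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last open span
        prefix, _, name = tag.partition("-")  # "B-inform-poi" -> ("B", "-", "inform-poi")
        continuing = prefix == "I" and name == slot
        if slot is not None and not continuing:
            pairs.append((slot, "".join(chars[start:i])))  # close the current span
            slot = None
        if prefix in ("B", "I") and not continuing:
            slot, start = name, i  # open a new span at this position
    return pairs


if __name__ == "__main__":
    chars = list("导航到上海")
    tags = ["B-inform-op", "I-inform-op", "O", "B-inform-poi", "I-inform-poi"]
    print(decode_bio(chars, tags))  # [('inform-op', '导航'), ('inform-poi', '上海')]
```

Treating a stray I- tag as a span start is one common convention for repairing invalid tag sequences; stricter decoders drop such spans instead.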

About pretrained language models

This code does not include any pretrained language model. If you want to use one, we recommend the pretrained models below; do not use large-sized models.

Bert: https://huggingface.co/bert-base-chinese
Bert-WWM: https://huggingface.co/hfl/chinese-bert-wwm-ext
Roberta-WWM: https://huggingface.co/hfl/chinese-roberta-wwm-ext
MacBert: https://huggingface.co/hfl/chinese-macbert-base
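If one of the pretrained models listed above is added, the per-character tags have to be re-aligned with the model input. A minimal sketch, assuming one tag per character and one token per Chinese character: bert-base-chinese's basic tokenizer does split CJK characters individually, but runs of digits or Latin letters are grouped into words and may be split into several WordPieces, which this sketch does not handle. The helper name align_for_bert is hypothetical. The [CLS] and [SEP] positions get the label -100, the default ignore_index of torch.nn.CrossEntropyLoss, so they do not contribute to the loss.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_for_bert(chars: List[str], tag_ids: List[int],
                   max_len: int = 512) -> Tuple[List[str], List[int]]:
    """Wrap a character sequence in [CLS]/[SEP] and shift its labels to match.

    Assumes exactly one token per character; special tokens get IGNORE_INDEX.
    """
    if len(chars) != len(tag_ids):
        raise ValueError("chars and tag_ids must have the same length")
    body = chars[: max_len - 2]  # leave room for the two special tokens
    tokens = ["[CLS]"] + body + ["[SEP]"]
    labels = [IGNORE_INDEX] + tag_ids[: len(body)] + [IGNORE_INDEX]
    return tokens, labels


if __name__ == "__main__":
    tokens, labels = align_for_bert(list("你好"), [1, 2])
    print(tokens)  # ['[CLS]', '你', '好', '[SEP]']
    print(labels)  # [-100, 1, 2, -100]
```

With the transformers library, the resulting tokens could then be mapped to ids with the tokenizer's convert_tokens_to_ids method before being fed to the model.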

Recommended libraries

transformers
- Library for working with pretrained language models: https://huggingface.co/
nltk
- Powerful NLP toolkit: https://www.nltk.org/
stanza
- Powerful NLP toolkit: https://stanfordnlp.github.io/stanza/
jieba
- Chinese word segmentation tool: https://github.com/fxsjy/jieba

Owner

北海若

Undergraduate at SJTU & MSRA.


Transformer training code for sequential tasks

Sequential Transformer This is a code for training Transformers on sequential tasks such as language modeling. Unlike the original Transformer archite

578 Dec 13, 2022

A program for reverse encryption and decryption of a message in Python 3.

Chiffrement Inverse En Python3 A program for reverse encryption and decryption of a message in Python 3. Explanation of reverse encryption with c

2 Mar 26, 2022

Sentence Embeddings with BERT & XLNet

Sentence Transformers: Multilingual Sentence Embeddings using BERT / RoBERTa / XLM-RoBERTa & Co. with PyTorch This framework provides an easy method t

9.1k Jan 02, 2023

Grading tools for Advanced NLP (11-711)

Grading tools for Advanced NLP (11-711) Installation You'll need docker and unzip to use this repo. For docker, visit the official guide to get starte

2 Sep 27, 2022

Medical entity extraction on the CMeEE dataset

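医学实体抽取_GlobalPointer_torch Introduction: the idea comes from Su Jianlin's GlobalPointer, whose original version was implemented in Keras. The model structure follows existing PyTorch reproduction code (thanks!) and fully reproduces the original results in torch. Dataset: the Chinese medical named-entity dataset (apply for it here; it is simple), which contains nine categories of medical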

85 Dec 28, 2022

Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET

Training COMET using seq2seq setting Use AutoModelForSeq2SeqLM in Huggingface Transformers to train COMET. The codes are modified from run_summarizati

9 Dec 17, 2022

Clone a voice in 5 seconds to generate arbitrary speech in real-time

This repository is forked from Real-Time-Voice-Cloning which only supports English. Features 🌍 Chinese supported mandarin and tested with

25.6k Jan 06, 2023

Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

Universal Adversarial Triggers for Attacking and Analyzing NLP This is the official code for the EMNLP 2019 paper, Universal Adversarial Triggers for

248 Dec 17, 2022

A cross platform OCR Library based on PaddleOCR & OnnxRuntime

A cross platform OCR Library based on PaddleOCR & OnnxRuntime

767 Jan 09, 2023

SAVI2I: Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors

SAVI2I: Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors [Paper] [Project Website] Pytorch implementation for SAVI2I. We

44 Dec 30, 2022

Search Git commits in natural language

NaLCoS - NAtural Language COmmit Search Search commit messages in your repository in natural language. NaLCoS (NAtural Language COmmit Search) is a co

50 Mar 22, 2022

GooAQ 🥑 : Google Answers to Google Questions!

This repository contains the code/data accompanying our recent work on long-form question answering.

112 Nov 06, 2022

Pipeline for chemical image-to-text competition

BMS-Molecular-Translation Introduction This is a pipeline for Bristol-Myers Squibb – Molecular Translation by Vadim Timakin and Maksim Zhdanov. We got

7 Sep 20, 2022

Easy, fast, effective, and automatic g-code compression!

Getting to the meat of g-code. Easy, fast, effective, and automatic g-code compression! MeatPack nearly doubles the effective data rate of a standard

97 Nov 21, 2022

A Fast Command Analyser based on Dict and Pydantic

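Alconna Alconna belongs to ArcletProject and is built into Cesloi. Alconna is an advanced version of Cesloi-CommandAnalysis that supports parsing message chains; in general, treat it as a simple message-chain parser/command parser. Documentation: temporary documentation Example from arclet.alcon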

19 Jan 03, 2023

This library is testing the ethics of language models by using natural adversarial texts.

prompt2slip This library is testing the ethics of language models by using natural adversarial texts. This tool allows for short and simple code and v

9 Dec 28, 2021

GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates

GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates Vibhor Agarwal, Sagar Joglekar, Anthony P. Young an

2 Jun 30, 2022

topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API

NLP Space News Topic Modeling Photos by nasa.gov (1, 2, 3, 4, 5) and extremetech.com Table of Contents Project Idea Data acquisition Primary data sour

1 Jan 03, 2022

Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration This repo contains only model Implementation of Zero-Shot Text-to-Speech for Text

33 Sep 22, 2022

Deduplication is the task to combine different representations of the same real world entity.

Deduplication is the task to combine different representations of the same real world entity. This package implements deduplication using active learning. Active learning allows for rapid training wi

63 Nov 17, 2022