NLP subtasks I studied on my own during my master's degree, shared as a learning reference

Last update: May 31, 2022

Overview

NLP_Chinese_down_stream_task

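Task 1: Short text classification

(1). Dataset: THUCNews Chinese text dataset (10 classes)

(2). Model: BERT + FC/LSTM, implemented in PyTorch

(3). Usage:

The pretrained model is Chinese BERT-WWM, available at https://github.com/ymcui/Chinese-BERT-wwm. After downloading and extracting it, place it in the [bert_pretrain] folder, then run "main.py".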
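Before running main.py it can help to confirm that the extracted model actually landed in the bert_pretrain folder. The sketch below is a minimal check, not part of the repository. The three file names are an assumption based on the PyTorch release of Chinese-BERT-wwm, and the helper name `missing_pretrained_files` is hypothetical; adjust the list if main.py expects different names.

```python
from pathlib import Path

# Assumed file names (PyTorch release of Chinese-BERT-wwm); not confirmed
# by this repository. Change them if main.py looks for other names.
EXPECTED_FILES = ("bert_config.json", "pytorch_model.bin", "vocab.txt")


def missing_pretrained_files(model_dir="bert_pretrain", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_pretrained_files()` from the project root returns an empty list when all assumed files are in place, and the names of the missing ones otherwise.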

(4). Training results:

Task 2: Named entity recognition (NER)

(1). Dataset: china-people-daily-ner-corpus (the People's Daily NER corpus)
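NER corpora of this kind are commonly stored in a CoNLL-style layout: one character and its BIO tag per line, separated by whitespace, with a blank line between sentences. Assuming that layout (it is not confirmed here), a minimal reader could look like the sketch below. The function name `read_bio_sentences` and the sample lines are illustrative only.

```python
def read_bio_sentences(lines):
    """Parse CoNLL-style lines into a list of (tokens, tags) pairs."""
    sentences, tokens, tags = [], [], []
    for raw in lines:
        parts = raw.split()
        if not parts:
            # A blank line closes the current sentence.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        # First field is the character, last field is its tag.
        tokens.append(parts[0])
        tags.append(parts[-1])
    if tokens:
        sentences.append((tokens, tags))
    return sentences


# Illustrative input, not taken from the corpus.
sample = ["在 O", "厦 B-LOC", "门 I-LOC", "", "我 O", "们 O"]
for sentence_tokens, sentence_tags in read_bio_sentences(sample):
    print("".join(sentence_tokens), sentence_tags)
```

In practice the same function can be fed an open file object, since iterating over a text file yields its lines.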

(2). Model: BiLSTM + CRF, Tensorflow_cpu >= 2.1

The model uses 100-dimensional word vectors pretrained on Chinese Wikipedia. Run main.py to train.
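In a BiLSTM+CRF tagger, the BiLSTM produces a score for every tag at every position, and the CRF layer adds tag-to-tag transition scores; the final tag sequence is usually found with Viterbi decoding. The following is a framework-free sketch of that decoding step, not the repository's TensorFlow code. The names `viterbi_decode`, `emissions` and `transitions` are chosen for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_path, best_score) for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t (e.g. BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for a path that ends in tag j here.
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    last = max(range(num_tags), key=lambda i: score[i])
    best_score = score[last]
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path, best_score


# Toy example with two tags; the large negative entry forbids moving 0 -> 1.
toy_emissions = [[1.0, 0.0], [0.0, 2.0]]
toy_transitions = [[0.0, -10.0], [0.0, 0.0]]
print(viterbi_decode(toy_emissions, toy_transitions))
```

The transition matrix is what lets a CRF rule out invalid sequences, such as an I- tag directly after O, that a per-position argmax would happily produce.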

(3). Training results:

(4). F1-Score results:
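For reference, NER F1 is usually reported at the entity level: a predicted entity counts as correct only if its type and both boundaries match the gold annotation. The sketch below computes micro-averaged precision, recall and F1 from BIO tag sequences under that assumption. It may differ from how this repository computes its score, and the function names are illustrative.

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from a BIO tag sequence; end is exclusive."""
    entities, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last span
        continues_span = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues_span:
            entities.add((kind, start, i))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return entities


def entity_f1(gold_sequences, pred_sequences):
    """Micro-averaged precision, recall and F1 over exact-match entity spans."""
    true_pos = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold_sequences, pred_sequences):
        gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
        true_pos += len(gold & pred)
        n_gold += len(gold)
        n_pred += len(pred)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Note that this strict variant ignores an I- tag that does not follow a matching B- tag; other evaluation scripts may treat such tags more leniently.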

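Task 3: Text matching (Semantic Textual Similarity)

(1). Dataset: fake-news-pair-classification-challenge (a Kaggle competition on classifying pairs of fake news headlines; each pair is labeled with one of three relations: 'unrelated', 'agreed', 'disagreed')

(2). Model: Siamese LSTM + any text similarity matching method, Tensorflow_cpu >= 2.1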
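In a Siamese setup, both headlines pass through the same LSTM encoder and the two resulting vectors are then compared. The sketch below shows two common comparison functions on plain Python lists: the exponentiated negative Manhattan distance and cosine similarity. Which method the repository actually uses is not stated, so both are illustrative assumptions, and the input vectors stand in for encoder outputs.

```python
import math


def manhattan_similarity(u, v):
    """exp(-L1 distance): 1.0 for identical vectors, approaching 0 as they diverge."""
    return math.exp(-sum(abs(a - b) for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Stand-ins for the encodings of two headlines.
left, right = [0.2, 0.5, 0.1], [0.1, 0.4, 0.3]
print(manhattan_similarity(left, right), cosine_similarity(left, right))
```

Because this dataset has three labels rather than a single similarity score, a score like these (or the pair of encodings) would typically feed a small classification layer; that wiring is not shown here.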

(3). Usage:

Run "main.py" directly.
