PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

Overview


pororo performs Natural Language Processing and Speech-related tasks.

You can easily solve various subtasks in the natural language and speech processing field by simply passing the task name.


Installation

  • pororo is based on torch==1.6 (CUDA 10.1) and python>=3.6

  • You can install the package with the command below:

pip install pororo
  • Or you can install it from source:
git clone https://github.com/kakaobrain/pororo.git
cd pororo
pip install -e .
  • To install the libraries required for specific tasks beyond the common modules, please refer to INSTALL.md

  • To use Automatic Speech Recognition, wav2letter must be installed separately. To install it, run the asr-install.sh script:

bash asr-install.sh
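  • Once installed, speech recognition is accessed through the same Pororo interface. A minimal usage sketch, assuming the asr task accepts a path to an audio file:
>>> from pororo import Pororo
>>> asr = Pororo(task="asr", lang="ko")
>>> asr("/path/to/speech.wav")  # returns the transcription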

Usage

  • pororo can be used as follows:
  • First, import Pororo by executing the following snippet:
>>> from pororo import Pororo
  • After the import, you can check the tasks currently supported by Pororo with the following commands:
>>> from pororo import Pororo
>>> Pororo.available_tasks()
"Available tasks are ['mrc', 'rc', 'qa', 'question_answering', 'machine_reading_comprehension', 'reading_comprehension', 'sentiment', 'sentiment_analysis', 'nli', 'natural_language_inference', 'inference', 'fill', 'fill_in_blank', 'fib', 'para', 'pi', 'cse', 'contextual_subword_embedding', 'similarity', 'sts', 'semantic_textual_similarity', 'sentence_similarity', 'sentvec', 'sentence_embedding', 'sentence_vector', 'se', 'inflection', 'morphological_inflection', 'g2p', 'grapheme_to_phoneme', 'grapheme_to_phoneme_conversion', 'w2v', 'wordvec', 'word2vec', 'word_vector', 'word_embedding', 'tokenize', 'tokenise', 'tokenization', 'tokenisation', 'tok', 'segmentation', 'seg', 'mt', 'machine_translation', 'translation', 'pos', 'tag', 'pos_tagging', 'tagging', 'const', 'constituency', 'constituency_parsing', 'cp', 'pg', 'collocation', 'collocate', 'col', 'word_translation', 'wt', 'summarization', 'summarisation', 'text_summarization', 'text_summarisation', 'summary', 'gec', 'review', 'review_scoring', 'lemmatization', 'lemmatisation', 'lemma', 'ner', 'named_entity_recognition', 'entity_recognition', 'zero-topic', 'dp', 'dep_parse', 'caption', 'captioning', 'asr', 'speech_recognition', 'st', 'speech_translation', 'ocr', 'srl', 'semantic_role_labeling', 'p2g', 'aes', 'essay', 'qg', 'question_generation', 'age_suitability']"
  • To check which models are supported by each task, run the following:
>>> from pororo import Pororo
>>> Pororo.available_models("collocation")
'Available models for collocation are ([lang]: ko, [model]: kollocate), ([lang]: en, [model]: collocate.en), ([lang]: ja, [model]: collocate.ja), ([lang]: zh, [model]: collocate.zh)'
  • To perform a specific task, pass the task name to the task argument and the language to the lang argument:
>>> from pororo import Pororo
>>> ner = Pororo(task="ner", lang="en")
  • Once the object is constructed, call it with the input text as follows:
>>> ner("Michael Jeffrey Jordan (born February 17, 1963) is an American businessman and former professional basketball player.")
[('Michael Jeffrey Jordan', 'PERSON'), ('(', 'O'), ('born', 'O'), ('February 17, 1963)', 'DATE'), ('is', 'O'), ('an', 'O'), ('American', 'NORP'), ('businessman', 'O'), ('and', 'O'), ('former', 'O'), ('professional', 'O'), ('basketball', 'O'), ('player', 'O'), ('.', 'O')]
  • If a task supports multiple languages, you can change the lang argument to use models trained in different languages.
>>> ner = Pororo(task="ner", lang="ko")
>>> ner("마이클 제프리 조던(영어: Michael Jeffrey Jordan, 1963년 2월 17일 ~ )은 미국의 은퇴한 농구 선수이다.")
[('마이클 제프리 조던', 'PERSON'), ('(', 'O'), ('영어', 'CIVILIZATION'), (':', 'O'), (' ', 'O'), ('Michael Jeffrey Jordan', 'PERSON'), (',', 'O'), (' ', 'O'), ('1963년 2월 17일 ~', 'DATE'), (' ', 'O'), (')은', 'O'), (' ', 'O'), ('미국', 'LOCATION'), ('의', 'O'), (' ', 'O'), ('은퇴한', 'O'), (' ', 'O'), ('농구 선수', 'CIVILIZATION'), ('이다.', 'O')]
>>> ner = Pororo(task="ner", lang="ja")
>>> ner("マイケル・ジェフリー・ジョーダンは、アメリカ合衆国の元バスケットボール選手")
[('マイケル・ジェフリー・ジョーダン', 'PERSON'), ('は', 'O'), ('、アメリカ合衆国', 'O'), ('の', 'O'), ('元', 'O'), ('バスケットボール', 'O'), ('選手', 'O')]
>>> ner = Pororo(task="ner", lang="zh")
>>> ner("麥可·傑佛瑞·喬丹是美國退役NBA職業籃球運動員,也是一名商人,現任夏洛特黃蜂董事長及主要股東")
[('麥可·傑佛瑞·喬丹', 'PERSON'), ('是', 'O'), ('美國', 'GPE'), ('退', 'O'), ('役', 'O'), ('nba', 'ORG'), ('職', 'O'), ('業', 'O'), ('籃', 'O'), ('球', 'O'), ('運', 'O'), ('動', 'O'), ('員', 'O'), (',', 'O'), ('也', 'O'), ('是', 'O'), ('一', 'O'), ('名', 'O'), ('商', 'O'), ('人', 'O'), (',', 'O'), ('現', 'O'), ('任', 'O'), ('夏洛特黃蜂', 'ORG'), ('董', 'O'), ('事', 'O'), ('長', 'O'), ('及', 'O'), ('主', 'O'), ('要', 'O'), ('股', 'O'), ('東', 'O')]
  • If the task supports multiple models, you can change the model argument to use another model.
>>> from pororo import Pororo
>>> mt = Pororo(task="mt", lang="multi", model="transformer.large.multi.mtpg")
>>> fast_mt = Pororo(task="mt", lang="multi", model="transformer.large.multi.fast.mtpg")
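  • The constructed translator can then be called with source and target languages. A usage sketch following the multilingual MT example in the project documentation (the src and tgt argument names are assumed from there):
>>> mt("케빈은 아직도 일을 하고 있다.", src="ko", tgt="en")  # returns the English translation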

Documentation

For more detailed information, see the full documentation.

If you have any questions or requests, please open an issue.


Citation

If you use this library in any project or research, please cite our code:

@misc{pororo,
  author       = {Heo, Hoon and Ko, Hyunwoong and Kim, Soohwan and
                  Han, Gunsoo and Park, Jiwoo and Park, Kyubyong},
  title        = {PORORO: Platform Of neuRal mOdels for natuRal language prOcessing},
  howpublished = {\url{https://github.com/kakaobrain/pororo}},
  year         = {2021},
}

Contributors

Hoon Heo, Hyunwoong Ko, Soohwan Kim, Gunsoo Han, Jiwoo Park and Kyubyong Park


License

The PORORO project is licensed under the terms of the Apache License 2.0.

Copyright 2021 Kakao Brain Corp. https://www.kakaobrain.com All Rights Reserved.

Comments
  • Fix typo on para_gen docstrings and html


    Title

    • fix typo on para_gen docstrings and html

    Description

    • Englosh to English

    Linked Issues

    • resolved #43

    I should have submitted this together with the MRC PR... Sorry for causing you trouble in so many ways...

    opened by SDSTony 1
  • Fix typo on machine_reading_comprehension.py and mrc.html


    Title

    • Fix typo on machine_reading_comprehension.py and mrc.html

    Description

    • Fixed typo comprehesion to comprehension found in:
    • machine_reading_comprehension.py docstring
    • mrc.html

    Linked Issues

    • resolved #41
    opened by SDSTony 1
  • Fix typo on age_suitability.html


    fix typo from nudiy to nudity

    Title

    • fix typo on age_suitability.html

    Description

    • There is a typo on the age_suitability.html page. I think the word Nudiy should be changed to Nudity. I've edited the HTML file directly in this PR. If this isn't the proper way to edit a published web document, please cancel this PR. Thank you.

    Linked Issues

    • #39
    opened by SDSTony 1
  • Improve MRC inference and change output


    Title

    • Improve MRC inference and change output

    Summary

    • Predict spans using the top-10 start & end positions
    • Add score output
    • Add logit output

    Description

    When predicting a span in MRC, the existing code used only the maximum-scoring start and end positions. For more accurate inference, the top 10 start and end positions are now used to predict the highest-scoring span, where the score is defined as the sum of the start logit and the end logit. Finally, the logits and score were added to the output for user convenience.
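
    As a rough sketch of this selection strategy (hypothetical code, not the actual Pororo implementation; the function name and arguments are made up for illustration):

    import numpy as np

    def select_best_span(start_logits, end_logits, top_k=10, max_answer_len=30):
        # Pick the (start, end) pair with the highest start_logit + end_logit
        # among the top-k candidates, keeping start <= end.
        start_idx = np.argsort(start_logits)[::-1][:top_k]
        end_idx = np.argsort(end_logits)[::-1][:top_k]
        best = None
        for s in start_idx:
            for e in end_idx:
                if s <= e <= s + max_answer_len:
                    score = start_logits[s] + end_logits[e]
                    if best is None or score > best[2]:
                        best = (int(s), int(e), float(score))
        return best  # (start, end, score)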

    Examples

    >>> mrc = Pororo(task="mrc", lang="ko")
    >>> mrc(
    >>>    "카카오브레인이 공개한 것은?",
    >>>    "카카오 인공지능(AI) 연구개발 자회사 카카오브레인이 AI 솔루션을 첫 상품화했다. 카카오는 카카오브레인 '포즈(pose·자세분석) API'를 유료 공개한다고 24일 밝혔다. 카카오브레인이 AI 기술을 유료 API를 공개하는 것은 처음이다. 공개하자마자 외부 문의가 쇄도한다. 포즈는 AI 비전(VISION, 영상·화면분석) 분야 중 하나다. 카카오브레인 포즈 API는 이미지나 영상을 분석해 사람 자세를 추출하는 기능을 제공한다."
    >>> )
    ('포즈(pose·자세분석) API',
     (33, 44),
     (5.7833147048950195, 4.649877548217773),
     10.433192253112793)
    >>> # when mecab doesn't work well for postprocess, you can set `postprocess` option as `False`
    >>> mrc("카카오브레인이 공개한 라이브러리 이름은?", "카카오브레인은 자연어 처리와 음성 관련 태스크를 쉽게 수행할 수 있도록 도와 주는 라이브러리 pororo를 공개하였습니다.", postprocess=False)
    ('pororo', (31, 35), (8.656489372253418, 8.14583683013916), 16.802326202392578)
    
    opened by skaurl 0
  • Fixed Code Quality Issues


    Title

    • Fixed Code Quality Issues

    Description

    Summary:

    • Remove unnecessary generator
    • Remove methods with an unnecessary super delegation
    • Remove redundant None
    • Add .deepsource.toml

    I ran a DeepSource Analysis on my fork of this repository. You can see all the issues raised by DeepSource here.

    DeepSource helps you automatically find and fix issues in your code during code reviews. This tool looks for anti-patterns, bug risks, and performance problems, and raises issues. There are plenty of other issues related to Bug Discovery and Anti-Patterns that you might want to take a look at.

    If you do not want to use DeepSource to continuously analyze this repo, I'll remove the .deepsource.toml from this PR and you can merge the rest of the fixes. If you want to set up DeepSource for continuous analysis, I can help you with that.

    opened by HarshCasper 0
  • Update TTS example comment


    Title

    • Update TTS example comment

    Description

    • Update TTS example comment (Cross-lingual Voice Style Transfer => Code-Switching)

    Linked Issues

    • resolved #00
    opened by sooftware 0
  • Delete unused files & Add tts example ipynb


    Title

    • Delete unused files & Add tts example ipynb

    Description

    • Delete unused files (examples/.ipynb/, examples/Untitle.ipynb)
    • Add examples/speech_synthesis.ipynb

    Linked Issues

    • resolved #00
    opened by sooftware 0
  • Update TTS


    Title

    • Denote TTS INSTALL.md & 3rd_party_model & Add tts-install.sh

    Description

    • Document TTS install requirements
    • Document 3rd_party_model (TTS)
    • Add tts-install.sh
    • Test complete
    • Update docstring examples

    Linked Issues

    • resolved #00
    opened by sooftware 0
  • Mount TTS


    Title

    • Mount TTS

    Description

    • Mount TTS (Text-To-Speech) Task
    • Update LICENSE.3rd_party_library
    • Add test file (tts)
    • demo page (Not yet completed)

    Linked Issues

    • resolved #00
    opened by sooftware 0
  • Feature/6 kwargs


    Title

    • Add kwargs to __call__ and predict

    Description

    • Add kwargs to __call__ and predict to avoid generating unnecessary custom predict functions (see the sketch below)
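
    A minimal sketch of the idea (the class and option names are hypothetical, not Pororo's actual classes):

    class TaskBase:
        # Forwarding **kwargs from __call__ to predict lets task-specific
        # options reach predict without a custom wrapper per option.
        def __call__(self, text, **kwargs):
            return self.predict(text, **kwargs)

        def predict(self, text, beam=1, postprocess=True):
            return {"text": text, "beam": beam, "postprocess": postprocess}

    print(TaskBase()("Hello", beam=5, postprocess=False))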

    Linked Issues

    • resolved #6
    opened by Huffon 0
  • fix: prevent OSError: read-only file system error


    Description

    I found that there is a chance of an OSError occurring when we try to load models into a temporary directory in a strictly managed environment, such as some containers on the cloud.

    [2022-03-23 04:07:37,080] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]     review_scoring_model = Pororo(task="review", lang="ko")
    [2022-03-23 04:07:37,080] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]   File "/usr/local/lib/python3.8/site-packages/pororo/pororo.py", line 203, in __new__
    [2022-03-23 04:07:37,080] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]     task_module = SUPPORTED_TASKS[task](
    [2022-03-23 04:07:37,080] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]   File "/usr/local/lib/python3.8/site-packages/pororo/tasks/review_scoring.py", line 86, in load
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]     model = (BrainRobertaModel.load_model(
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]   File "/usr/local/lib/python3.8/site-packages/pororo/models/brainbert/BrainRoBERTa.py", line 33, in load_model
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]     ckpt_dir = download_or_load(model_name, lang)
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]   File "/usr/local/lib/python3.8/site-packages/pororo/tasks/utils/download_utils.py", line 318, in download_or_load
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]     return download_or_load_bert(info)
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]   File "/usr/local/lib/python3.8/site-packages/pororo/tasks/utils/download_utils.py", line 104, in download_or_load_bert
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]     type_dir = download_from_url(
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]   File "/usr/local/lib/python3.8/site-packages/pororo/tasks/utils/download_utils.py", line 288, in download_from_url
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]     wget.download(url, type_dir)
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]   File "/usr/local/lib/python3.8/site-packages/wget.py", line 506, in download
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]     (fd, tmpfile) = tempfile.mkstemp(".tmp", prefix=prefix, dir=".")
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]   File "/usr/local/lib/python3.8/tempfile.py", line 331, in mkstemp
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]     return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]   File "/usr/local/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
    [2022-03-23 04:07:37,081] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000]     fd = _os.open(file, flags, 0o600)
    [2022-03-23 04:07:37,082] {ecs.py:362} INFO - [2022-03-23T04:07:12.901000] OSError: [Errno 30] Read-only file system: './brainbert.base.ko.review_rating.zip4zkvg88b.tmp'
    

    This commit prevents that from happening. The code for the new 'download' function originates from the wget library written by anatoly techtonik, with slight revisions by me.
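
    A minimal sketch of the approach (not the actual patch; it uses urllib from the standard library and assumes the destination directory is writable):

    import os
    import tempfile
    import urllib.request

    def download(url, out_dir):
        # Create the temporary file inside out_dir rather than in the current
        # working directory, which may be mounted read-only.
        os.makedirs(out_dir, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=out_dir)
        os.close(fd)
        urllib.request.urlretrieve(url, tmp_path)
        final_path = os.path.join(out_dir, os.path.basename(url))
        os.replace(tmp_path, final_path)
        return final_path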

    opened by daun-io 0
Releases(0.4.0)
  • 0.4.0(Feb 12, 2021)

  • 0.3.2(Feb 3, 2021)

  • 0.3.1(Feb 2, 2021)

    PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

    pororo performs Natural Language Processing and Speech-related tasks.

    It is easy to solve various subtasks in the natural language and speech processing field by simply passing the task name.


    Supported Tasks

    You can see more information here!


    TEXT CLASSIFICATION

    • Automated Essay Scoring
    • Age Suitability Prediction
    • Natural Language Inference
    • Paraphrase Identification
    • Review Scoring
    • Semantic Textual Similarity
    • Sentence Embedding
    • Sentiment Analysis
    • Zero-shot Topic Classification

    SEQUENCE TAGGING

    • Contextualized Embedding
    • Dependency Parsing
    • Fill-in-the-blank
    • Machine Reading Comprehension
    • Named Entity Recognition
    • Part-of-Speech Tagging
    • Semantic Role Labeling

    SEQ2SEQ

    • Constituency Parsing
    • Grammatical Error Correction
    • Grapheme-to-Phoneme
    • Phoneme-to-Grapheme
    • Machine Translation
    • Paraphrase Generation
    • Question Generation
    • Text Summarization

    MISC.

    • Automatic Speech Recognition
    • Image Captioning
    • Collocation
    • Lemmatization
    • Morphological Inflection
    • Optical Character Recognition
    • Tokenization
    • Word Translation
Owner
Kakao Brain Corp.