A fast and easy implementation of Transformer with PyTorch.

Overview

FasySeq

FasySeq is shorthand for Fast and easy sequential modeling toolkit. It aims to provide researchers and developers with a seq2seq model that can be trained efficiently and modified easily. The toolkit is currently based on the Transformer (Vaswani et al.), and more seq2seq models will be added in the future.

Dependency

PyTorch >= 1.4
NLTK
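
The dependencies can be installed with pip, for example (a minimal sketch; the exact PyTorch install command may differ depending on your CUDA setup):

    pip install "torch>=1.4" nltk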

Result

...

Structure

...

To Be Updated

  • top-k and top-p sampling
  • multi-GPU inference
  • length penalty in beam search
  • ...

Preprocess

Build Vocabulary

createVocab.py

NamedArguments Description
-f/--file The files used to build the vocabulary.
Type: List
--vocab_num The maximum size of the vocabulary; excess words will be discarded according to frequency.
Type: Int Default: -1
--min_freq The minimum token frequency; words with frequency lower than min_freq will be discarded.
Type: Int Default: 0
--lower Whether to convert all words to lowercase.
--save_path The path to save the vocabulary.
Type: str
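
For example, a vocabulary can be built from one or more training files like this (a minimal sketch; the file names train.src and vocab.src are hypothetical):

    python createVocab.py -f train.src --vocab_num 30000 --min_freq 2 --lower --save_path vocab.src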

Process Data

preprocess.py

NamedArguments Description
--source The path of the source file.
Type: str
[--target] The path of the target file.
Type: str
--src_vocab The path of the source vocabulary.
Type: str
[--tgt_vocab] The path of the target vocabulary.
Type: str
--save_path The path to save the processed data.
Type: str
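
A typical preprocessing call might look like the following (a sketch; all file names are hypothetical, and the bracketed --target/--tgt_vocab arguments can be omitted):

    python preprocess.py --source train.src --target train.tgt --src_vocab vocab.src --tgt_vocab vocab.tgt --save_path data/train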

Train

train.py

NamedArguments Description
Model -
--share_embed Source and target share the same vocabulary and word embedding. The maximum position of the embedding is max(max_src_position, max_tgt_position) if the model uses shared embeddings.
--max_src_position The maximum source position; all src-tgt pairs whose source sentences are longer than max_src_position will be truncated or discarded. If max_src_position is greater than the maximum source length, it will be set to the maximum source length.
Type: Int Default: inf
--max_tgt_position The maximum target position; all src-tgt pairs whose target sentences are longer than max_tgt_position will be truncated or discarded. If max_tgt_position is greater than the maximum target length, it will be set to the maximum target length.
Type: Int Default: inf
--position_method The method to introduce positional information.
Option: encoding/embedding
--normalize_before Apply layer normalization before each sub-layer (pre-norm) rather than after. See Xiong et al.
Checkpoint -
--checkpoint_path The path to save checkpoint file.
Type: str Default: None
--restore_file The checkpoint file to be loaded.
Type: str Default: None
--checkpoint_num Keep only the most recent checkpoint_num checkpoints.
Type: Int Default: inf
Data -
--vocab Vocabulary path. If you use shared embeddings, the vocabulary will be loaded from this path.
Type: str Default: None
--src_vocab Source vocabulary path.
Type: str Default: None
--tgt_vocab Target vocabulary path.
Type: str Default: None
--file The training data file.
Type: str
--max_tokens The maximum tokens in each batch.
Type: Int Default: 1000
--discard_invalid_data If this option is set, data whose source or target length exceeds the maximum position will be discarded; otherwise, long sentences will be truncated to the maximum position.
Train -
--cuda_num The device ID(s) of the GPU(s) to use.
Type: List
--grad_accumulate The number of gradient accumulation steps.
Type: Int Default: 1
--epoch The total number of training epochs.
Type: Int Default: inf
--batch_print_info Print training information every batch_print_info batches.
Type: Int Default: 1000
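
A training run could then be launched roughly as follows (a sketch assuming a single shared vocabulary; all file names and hyperparameter values are hypothetical):

    python train.py --share_embed --vocab vocab.share --file data/train --position_method encoding --checkpoint_path checkpoints --max_tokens 4096 --grad_accumulate 2 --cuda_num 0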

Inference

generator.py

NamedArguments Description
--cuda_num The device ID(s) of the GPU(s) to use.
Type: List
--file The inference data file which has already been processed.
Type: str
--raw_file The raw inference data file; it will be preprocessed before generation.
Type: str
--ref_file The reference file.
Type: str
--max_length
--max_alpha
--max_add_token
These three arguments jointly bound the output length: maximum generated length = min(max_length, max_alpha * max_src_len, max_add_token + max_src_token).
Type: Int Default: inf
--max_tokens The maximum tokens in each batch.
Type: Int Default: 1000
--src_vocab Source vocabulary path.
Type: str Default: None
--tgt_vocab Target vocabulary path.
Type: str Default: None
--vocab Vocabulary path. If you use shared embeddings, the vocabulary will be loaded from this path.
Type: str Default: None
--model_path The path of pre-trained model.
Type: str
--output_path The output path; the result will be saved to output_path/result.txt.
Type: str
--decode_method The decoding method.
Option: greedy/beam
--beam Beam size.
Type: Int Default: 5
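
Generation with a trained model might then look like this (a sketch; the checkpoint and file names are hypothetical):

    python generator.py --cuda_num 0 --raw_file test.src --ref_file test.tgt --vocab vocab.share --model_path checkpoints/model.pt --output_path output --decode_method beam --beam 5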

Postprocess

avg_param.py

The parameter-averaging code we use is the same as in fairseq.

License

FasySeq(-py) is released under the Apache-2.0 License. The license applies to the pre-trained models as well.
