poutyne-transformers

Train 🤗-transformers models with Poutyne.

Installation

pip install poutyne-transformers

Example

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset
from torch.utils.data import DataLoader
from torch import optim
from poutyne import Model
from poutyne_transformers import TransformerCollator, model_loss, ModelWrapper

# Load a pretrained DistilBERT with a two-class classification head.
print('Loading model & tokenizer.')
transformer = AutoModelForSequenceClassification.from_pretrained(
    'distilbert-base-cased', num_labels=2, return_dict=True
)
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-cased')

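# Tokenize the IMDB reviews, padding or truncating each one to the model's maximum length.
print('Loading & preparing dataset.')
dataset = load_dataset('imdb')
dataset = dataset.map(
    lambda entry: tokenizer(entry['text'], add_special_tokens=True, padding='max_length', truncation=True),
    batched=True,
)
dataset = dataset.remove_columns(['text'])  # drop the raw text once it is tokenized
dataset.set_format('torch')  # return PyTorch tensors instead of Python lists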

collate_fn = TransformerCollator()
train_dataloader = DataLoader(dataset['train'], batch_size=16, collate_fn=collate_fn)
test_dataloader = DataLoader(dataset['test'], batch_size=16, collate_fn=collate_fn)

print('Preparing training.')
# Wrap the 🤗 model for Poutyne and train on the GPU when one is available.
wrapped_transformer = ModelWrapper(transformer)
optimizer = optim.AdamW(wrapped_transformer.parameters(), lr=5e-5)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = Model(wrapped_transformer, optimizer, loss_function=model_loss, device=device)

print('Starting training.')
# The second argument is the validation loader; the test split serves that role here.
model.fit_generator(train_dataloader, test_dataloader, epochs=1)
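
For readers wondering why a collator and a special loss function are involved: Poutyne expects each batch as an (inputs, targets) pair, while 🤗 models take keyword inputs and can return their own loss when labels are passed in. The sketch below illustrates that adapter idea in plain Python. `split_batch` and `loss_from_outputs` are hypothetical names made up for this illustration, and the `"labels"` key is an assumption; this is not the code behind `TransformerCollator` or `model_loss`.

```python
# Conceptual sketch only: hypothetical helpers, not the poutyne-transformers implementation.

def split_batch(examples, label_key="labels"):
    """Collate a list of example dicts into an (inputs, targets) pair."""
    batch = {key: [example[key] for example in examples] for key in examples[0]}
    # Assumption: the labels stay in the inputs so the model can compute its own loss,
    # and are also returned as the target half of the pair.
    return batch, batch[label_key]


def loss_from_outputs(outputs, y_true):
    """Return the loss the model already computed; y_true is ignored."""
    return outputs["loss"]


examples = [
    {"input_ids": [1, 2], "labels": 0},
    {"input_ids": [3, 4], "labels": 1},
]
inputs, targets = split_batch(examples)
print(inputs["input_ids"], targets)
```

With real data the lists would be stacked tensors, but the shape of the contract is the same: one callable turns dataset rows into a pair, and another pulls the loss out of the model's output.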

Owner

Lennart Keller
