A high-level yet extensible library for fast language model tuning via automatic prompt search

Last update: Dec 07, 2022

Related tags

Overview

ruPrompts

ruPrompts is a high-level yet extensible library for fast language model tuning via automatic prompt search, featuring integration with HuggingFace Hub, configuration system powered by Hydra, and command line interface.

Prompt is a text instruction for language model, like

Translate English to French:
cat =>

For some tasks the prompt is obvious, but for some it isn't. With ruPrompts you can define only the prompt format, like {text}, and train it automatically for any task, if you have a training dataset.

You can currently use ruPrompts for text-to-text tasks, such as summarization, detoxification, style transfer, etc., and for styled text generation, as a special case of text-to-text.

Features

Modular structure for convenient extensibility
Integration with HF Transformers, support for all models with LM head
Integration with HF Hub for sharing and loading pretrained prompts
CLI and configuration system powered by Hydra
Pretrained prompts for ruGPT-3

Installation

ruPrompts can be installed with pip:

pip install ruprompts[hydra]

See Installation for other installation options.

Usage

Loading a pretrained prompt for styled text generation:

>> ppln_joke("Говорит кружка ложке") [{"generated_text": 'Говорит кружка ложке: "Не бойся, не утонешь!".'}]">

>>> import ruprompts
>>> from transformers import pipeline

>>> ppln_joke = pipeline("text-generation-with-prompt", prompt="konodyuk/prompt_rugpt3large_joke")
>>> ppln_joke("Говорит кружка ложке")
[{"generated_text": 'Говорит кружка ложке: "Не бойся, не утонешь!".'}]

For text2text tasks:

>> ppln_detox("Опять эти тупые дятлы все испортили, чтоб их черти взяли") [{"generated_text": 'Опять эти люди все испортили'}]">

>>> ppln_detox = pipeline("text2text-generation-with-prompt", prompt="konodyuk/prompt_rugpt3large_detox_russe")
>>> ppln_detox("Опять эти тупые дятлы все испортили, чтоб их черти взяли")
[{"generated_text": 'Опять эти люди все испортили'}]

Proceed to Quick Start for a more detailed introduction or start using ruPrompts right now with our Colab Tutorials.

License

ruPrompts is Apache 2.0 licensed. See the LICENSE file for details.

A high-level yet extensible library for fast language model tuning via automatic prompt search

Related tags

Overview

ruPrompts

Features

Installation

Usage

License

Owner

Sber AI

Chinese segmentation library

Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models.

DeepPavlov Tutorials

Almost State-of-the-art Text Generation library

PortaSpeech - PyTorch Implementation

Pretrain CPM - 大规模预训练语言模型的预训练代码

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

This is a project built for FALLABOUT2021 event under SRMMIC, This project deals with NLP poetry generation.

In this repository we have tested 3 VQA models on the ImageCLEF-2019 dataset.

A program that uses real statistics to choose the best times to bet on BloxFlip's crash gamemode

Pipelines de datos, 2021.

TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

Tools and data for measuring the popularity & growth of various programming languages.

VMD Audio/Text control with natural language

A Fast Command Analyser based on Dict and Pydantic

👑 spaCy building blocks and visualizers for Streamlit apps

SIGIR'22 paper: Axiomatically Regularized Pre-training for Ad hoc Search

Linking data between GBIF, Biodiverse, and Open Tree of Life

Every Google, Azure & IBM text to speech voice for free