NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

Overview

The NL-Augmenter is a collaborative effort intended to add transformations of datasets dealing with natural language. Transformations augment text datasets in diverse ways, including: randomizing names and numbers, changing style/syntax, paraphrasing, KB-based paraphrasing ... and whatever creative augmentation you contribute. We invite submissions of transformations to this framework by way of GitHub pull request, through August 31, 2021. All submitters of accepted transformations (and filters) will be included as co-authors on a paper announcing this framework.

The framework organizers can be contacted at [email protected].

Submission timeline

Due date                                          Description
August 31, 2021 (passed)                          Pull request must be opened to be eligible for inclusion in the framework and associated paper
September 30, 2021 (extended from September 22)   Review process for pull request above must be complete

A transformation can be revised between the pull request submission and pull request merge deadlines. We will provide reviewer feedback to help with the revisions.

The transformations which are already accepted to NL-Augmenter are summarized in the transformations folder. Transformations undergoing review can be seen as pull requests.

Colab notebook

To quickly see transformations and filters in action, run through our Colab notebook.

Some Ideas for Transformations

If you need inspiration for what transformations to implement, check out https://github.com/GEM-benchmark/NL-Augmenter/issues/75, where some ideas and previous papers are discussed. So far, contributions have focused on morphological inflections, character level changes, and random noise. The best new pull requests will be dissimilar from these existing contributions.

Installation

Requirements

  • Python 3.7

Instructions

# When creating a new transformation, replace this with your forked repository (see below)
git clone https://github.com/GEM-benchmark/NL-Augmenter.git
cd NL-Augmenter
python setup.py sdist
pip install -e .
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
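
To sanity-check the installation, you can load one of the transformations that ship with the repository from a Python shell. This is only a sketch: it assumes the butter_fingers_perturbation example (used again below) and its ButterFingersPerturbation class; adjust the module and class names to whatever is present in the transformations folder.

# Sketch: module and class names assume the bundled butter_fingers_perturbation example.
from transformations.butter_fingers_perturbation.transformation import ButterFingersPerturbation

perturbation = ButterFingersPerturbation()
# generate() should return a list of perturbed variants of the input sentence.
print(perturbation.generate("The quick brown fox jumps over the lazy dog."))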

How do I create a transformation?

Setup

First, fork the repository in GitHub! 🍴

fork button

Your fork will have its own location, which we will call PATH_TO_YOUR_FORK. Next, clone the forked repository and create a branch for your transformation, which here we will call my_awesome_transformation:

git clone $PATH_TO_YOUR_FORK
cd NL-Augmenter
git checkout -b my_awesome_transformation

We will base our transformation on an existing example. Create a new transformation directory by copying over an existing transformation. You can choose to copy from other transformation directories depending on the task you wish to create a transformation for. Check some of the existing pull requests and merged transformations first to avoid duplicating efforts or creating transformations too similar to previous ones.

cd transformations/
cp -r butter_fingers_perturbation my_awesome_transformation
cd my_awesome_transformation

Creating a transformation

  1. In the file transformation.py, rename the class ButterFingersPerturbation to MyAwesomeTransformation and choose one of the interfaces from the interfaces/ folder. See the full list of options here.
  2. Now put all your creativity into implementing the generate method (a minimal skeleton is sketched after this list). If you intend to use external libraries, add them with their version numbers in requirements.txt.
  3. Update my_awesome_transformation/README.md to describe your transformation.
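
As a rough starting point, a renamed transformation might look like the sketch below. It assumes the SentenceOperation interface used by butter_fingers_perturbation; the task types, languages, and generate body are illustrative placeholders to adapt for your own transformation and chosen interface.

# Sketch of transformation.py, assuming the SentenceOperation interface;
# task types, languages, and the generate body are illustrative placeholders.
from typing import List

from interfaces.SentenceOperation import SentenceOperation
from tasks.TaskTypes import TaskType


class MyAwesomeTransformation(SentenceOperation):
    tasks = [TaskType.TEXT_CLASSIFICATION]
    languages = ["en"]

    def __init__(self, seed=0):
        super().__init__(seed)

    def generate(self, sentence: str) -> List[str]:
        # Replace this with your actual augmentation logic.
        return [sentence.replace("good", "awesome")]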

Testing and evaluating (Optional)

Once you are done, add at least 5 example pairs as test cases in the file test.json so that no one breaks your code inadvertently.

Once the transformation is ready, test it:

pytest -s --t=my_awesome_transformation

If you would like to evaluate your transformation against a common 🤗 HuggingFace model, we encourage you to check the evaluation folder.

Code Styling

To standardize the code, we use the black code formatter, which runs as a pre-commit hook. To set it up, install pre-commit with pip install pre-commit (it should already be installed if you followed the instructions above), then run pre-commit install to install the hook. On future commits, the black formatter will run on all Python files you have staged for commit.

Submitting

Once the tests pass and you are happy with the transformation, submit it for review. First, commit and push your changes:

git add transformations/my_awesome_transformation/*
git commit -m "Added my_awesome_transformation"
git push --set-upstream origin my_awesome_transformation

Finally, submit a pull request. The last git push command prints a URL that can be copied into a browser to initiate the pull request. Alternatively, you can open it from the GitHub website.

pull request button

Congratulations, you've submitted a transformation to NL-Augmenter!

How do I create a filter?

We also accept pull requests for filters, which identify interesting subpopulations of a dataset. The process for adding a new filter is the same as above, except that filter implementations define .filter instead of .generate and live in the filters folder. Just as transformations transform examples of text, filters identify whether an example follows some pattern of text: while a transformation returns another example of the same input format, a filter simply returns True or False. For step-by-step instructions, follow these steps.
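
A minimal filter might look like the sketch below, again assuming the SentenceOperation interface; the class name and keyword check are purely illustrative.

# Sketch of a filter, assuming the SentenceOperation interface;
# the keyword logic is purely illustrative.
from interfaces.SentenceOperation import SentenceOperation
from tasks.TaskTypes import TaskType


class MyAwesomeFilter(SentenceOperation):
    tasks = [TaskType.TEXT_CLASSIFICATION]
    languages = ["en"]

    def __init__(self, keywords=None):
        super().__init__()
        self.keywords = keywords or ["awesome"]

    def filter(self, sentence: str) -> bool:
        # Keep only examples that mention at least one keyword.
        return any(keyword in sentence.lower() for keyword in self.keywords)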

BIG-Bench 🪑

If you are interested in NL-Augmenter, you may also be interested in the BIG-bench large scale collaborative benchmark for language models.

Most Creative Implementations 🏆

After all pull requests have been merged, the 3 most creative implementations will be selected and featured on this README page and on the NL-Augmenter webpage.

License

Some transformations include components released under a different (permissive, open source) license. For license details, refer to the README.md and any license files in the transformation's or filter's directory.
