GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning

Last update: Jan 05, 2023

GrammarTagger is an open-source toolkit for grammatical profiling in language learning. It analyzes English and Chinese text and reports the grammatical items found in the input, along with the input's estimated difficulty level.

Usage

GrammarTagger is written in Python (3.7+) and built on AllenNLP (2.1.0+). If you have conda installed, you can set up the environment as follows:

git clone https://github.com/octanove/grammartagger.git
cd grammartagger
conda create -n grammartagger python=3.7
conda activate grammartagger
pip install -r requirements.txt

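Next, download the pretrained models (listed under "Pretrained models" below). You can then run GrammarTagger as follows. The examples pipe the output through jq for pretty-printing, so jq needs to be installed, or you can drop that part of the command.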

English:

$ echo 'He loves to learn new languages, and last month he practiced some lessons in Spanish.' | python scripts/predict.py model-en-multi.tar.gz | jq
{
  "spans": [
    {
      "span": [0, 3],
      "tokens": ["[CLS]", "he", "loves", "to"],
      "label": "194:VP.SV.AFF"
    },
    {
      "span": [2, 2],
      "tokens": ["loves"],
      "label": "60:TA.PRESENT.does.AFF"
    },
    {
      "span": [2, 4],
      "tokens": ["loves", "to", "learn"],
      "label": "101:TO.VV_to_do"
    },
    ...
  ],
  "tokens": [
      "[CLS]", "he", "loves", "to", "learn", "new", "languages", ",",
      "and", "last", "month", "he", "practiced", "some", "lessons", "in", "spanish", ".", "[SEP]"
  ],
  "level_probs": {
    "c2": 0.008679441176354885,
    "b2": 0.005526999477297068,
    "c1": 0.05267713591456413,
    "b1": 0.06360447406768799,
    "a2": 0.06990284472703934,
    "a1": 0.7954732775688171
  }
}
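Judging from the output above, each "span" is a pair of inclusive start and end indices into the top-level "tokens" list (for example, [2, 4] covers "loves", "to", "learn"). The following is a minimal sketch of reading that structure in Python. It assumes only the JSON shape shown in this example; span_tokens is a hypothetical helper, not part of GrammarTagger.

```python
import json

# Abbreviated copy of the English output shown above (only the fields used here).
output = json.loads("""
{
  "spans": [
    {"span": [0, 3], "label": "194:VP.SV.AFF"},
    {"span": [2, 2], "label": "60:TA.PRESENT.does.AFF"},
    {"span": [2, 4], "label": "101:TO.VV_to_do"}
  ],
  "tokens": ["[CLS]", "he", "loves", "to", "learn", "new", "languages", ","]
}
""")


def span_tokens(tokens, span):
    """Return the tokens covered by an inclusive [start, end] span (hypothetical helper)."""
    start, end = span
    return tokens[start:end + 1]  # +1 because the end index is inclusive


for item in output["spans"]:
    print(item["label"], span_tokens(output["tokens"], item["span"]))
```

The recovered tokens match the "tokens" field that the tool prints inside each span entry, which is what suggests the indices are inclusive.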

Chinese:

$ echo '她住得很远，我想送她回去。' | python scripts/predict.py model-zh-multi.tar.gz | jq
{
  "spans": [
    {
      "span": [2, 5],
      "tokens": ["住", "得", "很", "远"],
      "label": "2.12.1:V 得 A:(using adverbs)"
    },
    {
      "span": [4, 4],
      "tokens": ["很"],
      "label": "1.06.2:很:very"
    },
    {
      "span": [8, 8],
      "tokens": ["想"],
      "label": "1.08.1:想:to want"
    }
  ],
  "tokens": ["[CLS]", "她", "住", "得", "很", "远", "，", "我", "想", "送", "她", "回", "去", "。", "[SEP]"],
  "level_probs": {
    "HSK 6": 9.971807230613194e-06,
    "HSK 5": 0.0011904890416190028,
    "HSK 3": 0.005279902834445238,
    "HSK 4": 0.00014815296162851155,
    "HSK 2": 0.9917035102844238,
    "HSK 1": 0.0016456041485071182
  }
}
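The "level_probs" field maps each difficulty level to a probability, and the keys are not printed in order of difficulty. A minimal sketch of picking the most likely level and ranking the rest, using the Chinese values above (the function names are hypothetical, and the sketch assumes nothing beyond the JSON shape shown here):

```python
# "level_probs" copied from the Chinese example output above.
level_probs = {
    "HSK 6": 9.971807230613194e-06,
    "HSK 5": 0.0011904890416190028,
    "HSK 3": 0.005279902834445238,
    "HSK 4": 0.00014815296162851155,
    "HSK 2": 0.9917035102844238,
    "HSK 1": 0.0016456041485071182,
}


def most_likely_level(probs):
    """Return the level name with the highest probability."""
    return max(probs, key=probs.get)


def rank_levels(probs):
    """Return (level, probability) pairs sorted from most to least likely."""
    return sorted(probs.items(), key=lambda pair: pair[1], reverse=True)


print(most_likely_level(level_probs))  # → HSK 2
for level, prob in rank_levels(level_probs):
    print(f"{level}: {prob:.4f}")
```

The same two functions work unchanged on the English output, where the keys are "a1" through "c2".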

Technical details

GrammarTagger combines a pretrained contextualizer, BERT (Devlin et al. 2019), with span classification. See the following paper for more details.

Hagiwara et al. 2021. GrammarTagger: A Multilingual, Minimally-Supervised Grammar Profiler for Language Education
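To give a rough idea of what span classification operates on, the sketch below enumerates candidate spans over a token list, using the same inclusive indexing as the example outputs. This is a generic illustration under stated assumptions, not GrammarTagger's actual implementation: the function name and the max_width limit are hypothetical, and the paper describes how spans are really selected and labeled.

```python
def enumerate_candidate_spans(tokens, max_width):
    """List every inclusive (start, end) span containing at most max_width tokens.

    Hypothetical helper for illustration; a span classifier would then
    assign a grammar label (or "no label") to each candidate.
    """
    spans = []
    for start in range(len(tokens)):
        # The end index is inclusive, so a span of width w ends at start + w - 1.
        last_end = min(start + max_width, len(tokens)) - 1
        for end in range(start, last_end + 1):
            spans.append((start, end))
    return spans


tokens = ["[CLS]", "he", "loves", "to"]
for start, end in enumerate_candidate_spans(tokens, max_width=2):
    print((start, end), tokens[start:end + 1])
```

Because spans are enumerated independently, candidates can overlap or nest, which is consistent with the example outputs where [2, 2] and [2, 4] both receive labels.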

Pretrained models

English: model-en-multi.tar.gz (387 MB)
Chinese: model-zh-multi.tar.gz (363 MB)

These pretrained models are licensed under CC BY-NC-ND 4.0 for academic and personal use. If you are interested in a commercial license, please contact Octanove Labs. We are also working on improved models with wider grammar coverage and higher accuracy.

Owner

Octanove Labs
