System Combination for Grammatical Error Correction Based on Integer Programming

Related tags

Deep Learninggec_ip
Overview

System Combination for Grammatical Error Correction Based on Integer Programming


This repository contains the code and scripts that implement the system combination approach for grammatical error correction in Lin and Ng (2021).

Reference

Ruixi Lin and Hwee Tou Ng (2021). System Combination for Grammatical Error Correction Based on Integer Programming.

Please cite:

@inproceedings{lin2021gecip,
  author    = "Lin, Ruixi and Ng, Hwee Tou",
  title     = "System Combination for Grammatical Error Correction Based on Integer Programming",
  booktitle = "Proceedings of Recent Advances in Natural Language Processing",
  year      = "2021",
  pages     = "829-834"
}

Table of contents

Prerequisites

Example

License

Prerequisites

conda create --name comb python=3.6
conda activate comb
pip install spacy
python -m spacy download en

For the nonlinear integer programming solver, we use

LINGO10.0

Note that educational institutions can obtain a free license to use the LINGO solver.

Example

Combine the 3 GEC systems listed in the paper using the IP approach. The three systems are UEdin-MS (https://aclanthology.org/W19-4427), Kakao (https://aclanthology.org/W19-4423), and Tohoku (https://aclanthology.org/D19-1119). The core functions for the IP objective are implemented in model.lg4. You can find model.lg4 under lingo/inputs.

  1. Run python prepare_data.py -dir . -list kakao uedinms tohoku to generate aggregated TP, FP, and FN counts. The counts files are stored under lingo/inputs.

  2. Load model.lg4 into the LINGO console and specify the input data path with the counts file path, select the INLP model, and run optimizations. Store the solutions to lingo/outputs/sol_kakao_uedinms_tohoku.txt.

  3. Run ./comb.sh . sol_kakao_uedinms_tohoku.txt to load LINGO solutions, merge and apply edits. The resulted blind test file can be found under submissions. It can be zipped and submitted to the BEA CodeLab website (https://competitions.codalab.org/competitions/20228) for evaluations.

The data folder provides individual GEC system output files, and .m2 files generated using ERRANT for the listed systems. For more information, please visit the ERRANT github page.

We include the IP combined .m2 files under merged_m2, and the corresponding text files under submissions.

License

The source code and models in this repository are licensed under the GNU General Public License v3.0 (see LICENSE). For further research interests and commercial use of the code and models, please contact Ruixi Lin ([email protected]) and Prof. Hwee Tou Ng ([email protected]).

Owner
NUS NLP Group
National University of Singapore
NUS NLP Group
Res2Net for Instance segmentation and Object detection using MaskRCNN

Res2Net for Instance segmentation and Object detection using MaskRCNN Since the MaskRCNN-benchmark of facebook is deprecated, we suggest to use our mm

Res2Net Applications 55 Oct 30, 2022
Gesture recognition on Event Data

Event based Gesture Recognition Gesture recognition on Event Data usually involv

2 Feb 14, 2022
Epidemiology analysis package

zEpid zEpid is an epidemiology analysis package, providing easy to use tools for epidemiologists coding in Python 3.5+. The purpose of this library is

Paul Zivich 111 Jan 08, 2023
Data Augmentation with Variational Autoencoders

Documentation Pyraug This library provides a way to perform Data Augmentation using Variational Autoencoders in a reliable way even in challenging con

112 Nov 30, 2022
Exploring Simple 3D Multi-Object Tracking for Autonomous Driving (ICCV 2021)

Exploring Simple 3D Multi-Object Tracking for Autonomous Driving Chenxu Luo, Xiaodong Yang, Alan Yuille Exploring Simple 3D Multi-Object Tracking for

QCraft 141 Nov 21, 2022
Official implementation of the paper "Topographic VAEs learn Equivariant Capsules"

Topographic Variational Autoencoder Paper: https://arxiv.org/abs/2109.01394 Getting Started Install requirements with Anaconda: conda env create -f en

T. Andy Keller 69 Dec 12, 2022
NLU Dataset Diagnostics

NLU Dataset Diagnostics This repository contains data and scripts to reproduce the results from our paper: Aarne Talman, Marianna Apidianaki, Stergios

Language Technology at the University of Helsinki 1 Jul 20, 2022
Physics-informed Neural Operator for Learning Partial Differential Equation

PINO Physics-informed Neural Operator for Learning Partial Differential Equation Abstract: Machine learning methods have recently shown promise in sol

107 Jan 02, 2023
HAR-stacked-residual-bidir-LSTMs - Deep stacked residual bidirectional LSTMs for HAR

HAR-stacked-residual-bidir-LSTM The project is based on this repository which is presented as a tutorial. It consists of Human Activity Recognition (H

Guillaume Chevalier 287 Dec 27, 2022
a baseline to practice

ccks2021_track3_baseline a baseline to practice 路径可能会有问题,自己改改 torch==1.7.1 pyhton==3.7.1 transformers==4.7.0 cuda==11.0 this is a baseline, you can fi

45 Nov 23, 2022
LVI-SAM: Tightly-coupled Lidar-Visual-Inertial Odometry via Smoothing and Mapping

LVI-SAM This repository contains code for a lidar-visual-inertial odometry and mapping system, which combines the advantages of LIO-SAM and Vins-Mono

Tixiao Shan 1.1k Dec 27, 2022
African language Speech Recognition - Speech-to-Text

Swahili-Speech-To-Text Table of Contents Swahili-Speech-To-Text Overview Scenario Approach Project Structure data: models: notebooks: scripts tests: l

2 Jan 05, 2023
PyExplainer: A Local Rule-Based Model-Agnostic Technique (Explainable AI)

PyExplainer PyExplainer is a local rule-based model-agnostic technique for generating explanations (i.e., why a commit is predicted as defective) of J

AI Wizards for Software Management (AWSM) Research Group 14 Nov 13, 2022
Analyzes your GitHub Profile and presents you with a report on how likely you are to become the next MLH Fellow!

Fellowship Prediction GitHub Profile Comparative Analysis Tool Built with BentoML Table of Contents: Features Disclaimer Technologies Used Contributin

Damir Temir 51 Dec 29, 2022
Creating Artificial Life with Reinforcement Learning

Although Evolutionary Algorithms have shown to result in interesting behavior, they focus on learning across generations whereas behavior could also be learned during ones lifetime.

Maarten Grootendorst 49 Dec 21, 2022
Simultaneous NMT/MMT framework in PyTorch

This repository includes the codes, the experiment configurations and the scripts to prepare/download data for the Simultaneous Machine Translation wi

<a href=[email protected]"> 37 Sep 29, 2022
GazeScroller - Using Facial Movements to perform Hands-free Gesture on the system

GazeScroller Using Facial Movements to perform Hands-free Gesture on the system

2 Jan 05, 2022
Pytorch port of Google Research's LEAF Audio paper

leaf-audio-pytorch Pytorch port of Google Research's LEAF Audio paper published at ICLR 2021. This port is not completely finished, but the Leaf() fro

Dennis Fedorishin 80 Oct 31, 2022
We will release the code of "ConTNet: Why not use convolution and transformer at the same time?" in this repo

ConTNet Introduction ConTNet (Convlution-Tranformer Network) is proposed mainly in response to the following two issues: (1) ConvNets lack a large rec

93 Nov 08, 2022
Python implementation of Wu et al (2018)'s registration fusion

reg-fusion Projection of a central sulcus probability map using the RF-ANTs approach (right hemisphere shown). This is a Python implementation of Wu e

Dan Gale 26 Nov 12, 2021