NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework

Related tags

Deep LearningTLM
Overview

NLP From Scratch Without Large-Scale Pretraining

This repository contains the code, pre-trained model checkpoints and curated datasets for our paper: NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework.

In our proposed framework, named TLM (task-driven language modeling), instead of training a language model over the entire general corpus and then finetuning it on task data, we first usetask data as queries to retrieve a tiny subset of the general corpus, and then perform joint learning on both the task objective and self-supervised language modeling objective.

Requirements

We implement our models and training loops based on the opensource products from HuggingFace. The core denpencies of this repository are listed in requirements.txt, which can be installed through:

pip install -r requirements.txt

All our experiments are conducted on a node with 8 A100 40GB SXM gpus. Different computational devices may result slightly different results from the reported ones.

Models and Datasets

We release the trained models on 8 tasks with 3 different scales, together with the task datasets and selected external data. Our released model checkpoints, datasets and the performance of each model for each task are listed in the following table.

AGNews Hyp. Help. IMDB ACL. SciERC Chem. RCT
Small 93.74 93.53 70.54 93.08 69.84 80.51 81.99 86.99
Medium 93.96 94.05 70.90 93.97 72.37 81.88 83.24 87.28
Large 94.36 95.16 72.49 95.77 72.19 83.29 85.12 87.50

The released models and datasets are compatible with HuggingFace's Transformers and Datasets. We provide an example script to evaluate a model checkpoints on a certain task, run

bash example_scripts/evaluate.sh

To get the evaluation results for SciERC with a small-scale model.

Training

We provide two example scripts to train a model from scratch, run

bash example_scripts/train.sh && bash example_scripts/finetune.sh

To train a small-scale model for SciERC. Here example_scripts/train.sh corresponds to the first stage training where the external data ratio and MLM weight are non-zero, and example_scripts/finetune.sh corresponds to the second training stage where no external data or self-supervised loss can be perceived by the model.

Citation

Please cite our paper if you use TLM in your work:

@misc{yao2021tlm,
title={NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework},
author={Yao, Xingcheng and Zheng, Yanan and Yang, Xiaocong and Yang, Zhilin},
year={2021}
}
Owner
Xingcheng Yao
Undergraduate student at IIIS, Tsinghua University
Xingcheng Yao
A higher performance pytorch implementation of DeepLab V3 Plus(DeepLab v3+)

A Higher Performance Pytorch Implementation of DeepLab V3 Plus Introduction This repo is an (re-)implementation of Encoder-Decoder with Atrous Separab

linhua 326 Nov 22, 2022
Code and data of the Fine-Grained R2R Dataset proposed in paper Sub-Instruction Aware Vision-and-Language Navigation

Fine-Grained R2R Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP2020 paper Sub-Instruction Aware Vision-and-Language Navigation. C

YicongHong 34 Nov 15, 2022
ACL'2021: LM-BFF: Better Few-shot Fine-tuning of Language Models

LM-BFF (Better Few-shot Fine-tuning of Language Models) This is the implementation of the paper Making Pre-trained Language Models Better Few-shot Lea

Princeton Natural Language Processing 607 Jan 07, 2023
Neural Fixed-Point Acceleration for Convex Optimization

Licensing The majority of neural-scs is licensed under the CC BY-NC 4.0 License, however, portions of the project are available under separate license

Facebook Research 27 Oct 06, 2022
7th place solution of Human Protein Atlas - Single Cell Classification on Kaggle

kaggle-hpa-2021-7th-place-solution Code for 7th place solution of Human Protein Atlas - Single Cell Classification on Kaggle. A description of the met

8 Jul 09, 2021
Graph InfoClust: Leveraging cluster-level node information for unsupervised graph representation learning

Graph-InfoClust-GIC [PAKDD 2021] PAKDD'21 version Graph InfoClust: Maximizing Coarse-Grain Mutual Information in Graphs Preprint version Graph InfoClu

Costas Mavromatis 21 Dec 03, 2022
PowerGridworld: A Framework for Multi-Agent Reinforcement Learning in Power Systems

PowerGridworld provides users with a lightweight, modular, and customizable framework for creating power-systems-focused, multi-agent Gym environments that readily integrate with existing training fr

National Renewable Energy Laboratory 37 Dec 17, 2022
[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable

Unlearnable Examples Code for ICLR2021 Spotlight Paper "Unlearnable Examples: Making Personal Data Unexploitable " by Hanxun Huang, Xingjun Ma, Sarah

Hanxun Huang 98 Dec 07, 2022
BaseCls BaseCls 是一个基于 MegEngine 的预训练模型库,帮助大家挑选或训练出更适合自己科研或者业务的模型结构

BaseCls BaseCls 是一个基于 MegEngine 的预训练模型库,帮助大家挑选或训练出更适合自己科研或者业务的模型结构。 文档地址:https://basecls.readthedocs.io 安装 安装环境 BaseCls 需要 Python = 3.6。 BaseCls 依赖 M

MEGVII Research 28 Dec 23, 2022
Using NumPy to solve the equations of fluid mechanics together with Finite Differences, explicit time stepping and Chorin's Projection methods

Computational Fluid Dynamics in Python Using NumPy to solve the equations of fluid mechanics 🌊 🌊 🌊 together with Finite Differences, explicit time

Felix Köhler 4 Nov 12, 2022
SAN for Product Attributes Prediction

SAN Heterogeneous Star Graph Attention Network for Product Attributes Prediction This repository contains the official PyTorch implementation for ADVI

Xuejiao Zhao 9 Dec 12, 2022
Synthesizing Long-Term 3D Human Motion and Interaction in 3D in CVPR2021

Long-term-Motion-in-3D-Scenes This is an implementation of the CVPR'21 paper "Synthesizing Long-Term 3D Human Motion and Interaction in 3D". Please ch

Jiashun Wang 76 Dec 13, 2022
Pansharpening by convolutional neural networks in the full resolution framework

Z-PNN: Zoom Pansharpening Neural Network Pansharpening by convolutional neural networks in the full resolution framework is a deep learning method for

20 Nov 24, 2022
Self-training with Weak Supervision (NAACL 2021)

This repo holds the code for our weak supervision framework, ASTRA, described in our NAACL 2021 paper: "Self-Training with Weak Supervision"

Microsoft 148 Nov 20, 2022
Code & Models for 3DETR - an End-to-end transformer model for 3D object detection

3DETR: An End-to-End Transformer Model for 3D Object Detection PyTorch implementation and models for 3DETR. 3DETR (3D DEtection TRansformer) is a simp

Facebook Research 487 Dec 31, 2022
A custom DeepStack model for detecting 16 human actions.

DeepStack_ActionNET This repository provides a custom DeepStack model that has been trained and can be used for creating a new object detection API fo

MOSES OLAFENWA 16 Nov 11, 2022
Reverse engineer your pytorch vision models, in style

🔍 Rover Reverse engineer your CNNs, in style Rover will help you break down your CNN and visualize the features from within the model. No need to wri

Mayukh Deb 32 Sep 24, 2022
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group 8.4k Jan 03, 2023
This is the repository for the paper "Have I done enough planning or should I plan more?"

Metacognitive Learning Tool box https://re.is.mpg.de What Is This? This repository contains two modules used to analyse metacognitive learning in huma

0 Dec 01, 2021
The Instructed Glacier Model (IGM)

The Instructed Glacier Model (IGM) Overview The Instructed Glacier Model (IGM) simulates the ice dynamics, surface mass balance, and its coupling thro

27 Dec 16, 2022