Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Last update: Dec 30, 2022

Overview

PLBART

Code pre-release of our work, Unified Pre-training for Program Understanding and Generation, accepted at NAACL 2021.

Note: Detailed documentation is coming soon.

Pre-training data

PLBART is pre-trained on Java and Python functions and natural language descriptions collected from GitHub and Stack Overflow.

Evaluation tasks

We evaluated PLBART on five tasks.

Code summarization [REF]
Code generation [REF]
Code translation [REF]
Clone detection [REF]
Vulnerability detection [REF]

Notes

We will publish the pretrained PLBART checkpoint soon.
We list all the files in this repository here.

Acknowledgement

PLBART uses Fairseq, CodeXGLUE, and TransCoder. We thank the authors of these works for their contributions.

Citation

@inproceedings{ahmad2020summarization,
    author = {Ahmad, Wasi Uddin and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
    booktitle = {Proceedings of the 2021 Conference of the North {A}merican Chapter of the Association for Computational Linguistics},
    title = {Unified Pre-training for Program Understanding and Generation},
    year = {2021}
}

Owner

Wasi Ahmad
