Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Related tags

Text Data & NLPPLBART
Overview

PLBART

Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021.

Note. A detailed documentation is coming soon.

Pre-training data

PLBART is pre-trained on Java and Python functions and natural language descriptions collected from Github and StackOverflow.

Evaluation tasks

We evaluated PLBART on five tasks.

  • Code summarization [REF]
  • Code generation [REF]
  • Code translation [REF]
  • Clone detection [REF]
  • Vulnerability REF [REF]

Notes

  • We will publish the pretrained PLBART checkpoint soon.
  • We list all the files in this repository here.

Acknowledgement

PLBART uses Fairseq, codeXglue, and TransCoder and thanks the authors of these works for their contribution.

Citation

@inproceedings{ahmad2020summarization,
    author = {Ahmad, Wasi Uddin and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
    booktitle = {Proceedings of the 2021 Conference of the North {A}merican Chapter of the Association for Computational Linguistics},
    title = {Unified Pre-training for Program Understanding and Generation},
    year = {2021}
}
Owner
Wasi Ahmad
I am a Ph.D. student in CS at UCLA.
Wasi Ahmad
Code for text augmentation method leveraging large-scale language models

HyperMix Code for our paper GPT3Mix and conducting classification experiments using GPT-3 prompt-based data augmentation. Getting Started Installing P

NAVER AI 47 Dec 20, 2022
This is a GUI program that will generate a word search puzzle image

Word Search Puzzle Generator Table of Contents About The Project Built With Getting Started Prerequisites Installation Usage Roadmap Contributing Cont

11 Feb 22, 2022
BERN2: an advanced neural biomedical namedentity recognition and normalization tool

BERN2 We present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by

DMIS Laboratory - Korea University 99 Jan 06, 2023
NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT

NeuralQA: A Usable Library for (Extractive) Question Answering on Large Datasets with BERT Still in alpha, lots of changes anticipated. View demo on n

Victor Dibia 220 Dec 11, 2022
This repository contains the code for "Generating Datasets with Pretrained Language Models".

Datasets from Instructions (DINO πŸ¦• ) This repository contains the code for Generating Datasets with Pretrained Language Models. The paper introduces

Timo Schick 154 Jan 01, 2023
Baseline code for Korean open domain question answering(ODQA)

Open-Domain Question Answering(ODQA)λŠ” λ‹€μ–‘ν•œ μ£Όμ œμ— λŒ€ν•œ λ¬Έμ„œ μ§‘ν•©μœΌλ‘œλΆ€ν„° μžμ—°μ–΄ μ§ˆμ˜μ— λŒ€ν•œ 닡변을 μ°Ύμ•„μ˜€λŠ” taskμž…λ‹ˆλ‹€. μ΄λ•Œ μ‚¬μš©μž μ§ˆμ˜μ— λ‹΅λ³€ν•˜κΈ° μœ„ν•΄ μ£Όμ–΄μ§€λŠ” 지문이 λ”°λ‘œ μ‘΄μž¬ν•˜μ§€ μ•ŠμŠ΅λ‹ˆλ‹€. λ”°λΌμ„œ 사전에 κ΅¬μΆ•λ˜μ–΄μžˆλŠ” Knowl

VUMBLEB 69 Nov 04, 2022
βœ”πŸ‘‰A Centralized WebApp to Ensure Road Safety by checking on with the activities of the driver and activating label generator using NLP.

AI-For-Road-Safety Challenge hosted by Omdena Hyderabad Chapter Original Repo Link : https://github.com/OmdenaAI/omdena-india-roadsafety Final Present

Prathima Kadari 7 Nov 29, 2022
justCTF [*] 2020 challenges sources

justCTF [*] 2020 This repo contains sources for justCTF [*] 2020 challenges hosted by justCatTheFish. TLDR: Run a challenge with ./run.sh (requires Do

justCatTheFish 25 Dec 27, 2022
Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration

Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration This is the official repository for the EMNLP 2021 long pa

70 Dec 11, 2022
customer care chatbot made with Rasa Open Source.

Customer Care Bot Customer care bot for ecomm company which can solve faq and chitchat with users, can contact directly to team. πŸ›  Features Basic E-c

Dishant Gandhi 23 Oct 27, 2022
Various capabilities for static malware analysis.

Malchive The malchive serves as a compendium for a variety of capabilities mainly pertaining to malware analysis, such as scripts supporting day to da

MITRE Cybersecurity 64 Nov 22, 2022
Score-Based Point Cloud Denoising (ICCV'21)

Score-Based Point Cloud Denoising (ICCV'21) [Paper] https://arxiv.org/abs/2107.10981 Installation Recommended Environment The code has been tested in

Shitong Luo 79 Dec 26, 2022
Dense Passage Retriever - is a set of tools and models for open domain Q&A task.

Dense Passage Retrieval Dense Passage Retrieval (DPR) - is a set of tools and models for state-of-the-art open-domain Q&A research. It is based on the

Meta Research 1.1k Jan 07, 2023
Open-World Entity Segmentation

Open-World Entity Segmentation Project Website Lu Qi*, Jason Kuen*, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia This projec

DV Lab 408 Dec 29, 2022
Officile code repository for "A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning"

CvarAdversarialRL Official code repository for "A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning". Initial setup Create a virtual

Mathieu Godbout 1 Nov 19, 2021
Perform sentiment analysis and keyword extraction on Craigslist listings

craiglist-helper synopsis Perform sentiment analysis and keyword extraction on Craigslist listings Background I love Craigslist. I've found most of my

Mark Musil 1 Nov 08, 2021
Write Alphabet, Words and Sentences with your eyes.

The-Next-Gen-AI-Eye-Writer The Eye tracking Technique has become one of the most popular techniques within the human and computer interaction era, thi

Rohan Kasabe 2 Apr 05, 2022
Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Welcome to Healthsea ✨ Create better access to health with spaCy. Healthsea is a pipeline for analyzing user reviews to supplement products by extract

Explosion 75 Dec 19, 2022