Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation of In-context Learning as Implicit Bayesian Inference"

Last update: Dec 19, 2022

Related tags

Deep Learning incontext-learning

Overview

GINC small-scale in-context learning dataset

GINC (Generative In-Context learning Dataset) is a small-scale synthetic dataset for studying in-context learning. The pretraining data is generated by a mixture of HMMs, and the in-context learning prompt examples are also generated from HMMs (either from the mixture or not). The prompts are out-of-distribution with respect to the pretraining data because each prompt concatenates independent examples separated by delimiters. We provide code to generate GINC-style datasets with varying vocabulary sizes, numbers of HMMs, and other parameters.
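To make the generative setup above concrete, here is a minimal sketch in plain Python. It is not the repository's generation code: the function names, the uniform choice over mixture components, the randomly drawn HMM parameters, and the use of a reserved token id as the delimiter are all assumptions made for illustration.

```python
import random


def random_distribution(rng, size):
    """Draw a random probability vector (normalized Gamma(1, 1) samples)."""
    weights = [rng.gammavariate(1.0, 1.0) for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def random_hmm(rng, n_states, vocab_size):
    """Hypothetical HMM: start, transition, and emission distributions."""
    start = random_distribution(rng, n_states)
    trans = [random_distribution(rng, n_states) for _ in range(n_states)]
    emit = [random_distribution(rng, vocab_size) for _ in range(n_states)]
    return start, trans, emit


def sample_sequence(rng, hmm, length):
    """Sample `length` tokens from one HMM, starting from its start distribution."""
    start, trans, emit = hmm
    n_states, vocab_size = len(start), len(emit[0])
    state = rng.choices(range(n_states), weights=start)[0]
    tokens = []
    for _ in range(length):
        tokens.append(rng.choices(range(vocab_size), weights=emit[state])[0])
        state = rng.choices(range(n_states), weights=trans[state])[0]
    return tokens


def sample_pretraining_doc(rng, hmms, length):
    """Pretraining document: pick one mixture component (uniformly here,
    which is an assumption), then sample one long sequence from it."""
    return sample_sequence(rng, rng.choice(hmms), length)


def sample_prompt(rng, hmm, n_examples, example_len, delimiter):
    """Prompt: independent examples, concatenated and separated by a delimiter.
    Each example restarts from the start distribution, which is why the prompt
    is out-of-distribution relative to a single pretraining document."""
    prompt = []
    for _ in range(n_examples):
        prompt.extend(sample_sequence(rng, hmm, example_len))
        prompt.append(delimiter)
    return prompt


rng = random.Random(0)
VOCAB_SIZE = 50
DELIMITER = VOCAB_SIZE  # assumption: a reserved id outside the vocabulary
hmms = [random_hmm(rng, n_states=4, vocab_size=VOCAB_SIZE) for _ in range(5)]
doc = sample_pretraining_doc(rng, hmms, length=20)
prompt = sample_prompt(rng, hmms[0], n_examples=3, example_len=4, delimiter=DELIMITER)
```

Passing an HMM that is not in `hmms` to `sample_prompt` corresponds to the "or not" case, where the prompt comes from outside the pretraining mixture.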

Quickstart

Create a conda environment or virtualenv using the information in conda-env.yml, then install transformers by going into the transformers/ directory and running pip install -e . (the trailing dot is part of the command). Modify consts.sh to change the default output locations and to insert the code that activates your environment of choice. Run scripts/runner.sh to submit all the experiments via sbatch.

Explore the data

The default dataset has a vocabulary size of 50, and the pretraining data is generated as a mixture of 5 HMMs. The pretraining dataset is in data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10/train.json, while the in-context prompts are in data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10/id_prompts_randomsample_*.json.
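A quick way to peek at these files is sketched below. The on-disk schema is not described here, so the loader makes no assumption about field names: it tries to parse each file as a single JSON document and falls back to JSON Lines. The helper names load_records and summarize are hypothetical and not part of the repository.

```python
import glob
import json
import os

# Default dataset directory, as given in the text above.
DATA_DIR = "data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10"


def load_records(path):
    """Parse a file as one JSON document, or as JSON Lines if that fails."""
    with open(path) as f:
        text = f.read()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(data_dir=DATA_DIR):
    """Return {file name: number of top-level entries} for the pretraining
    file and every in-context prompt file found in data_dir."""
    paths = []
    train_path = os.path.join(data_dir, "train.json")
    if os.path.exists(train_path):
        paths.append(train_path)
    pattern = os.path.join(data_dir, "id_prompts_randomsample_*.json")
    paths.extend(sorted(glob.glob(pattern)))
    return {os.path.basename(p): len(load_records(p)) for p in paths}


if __name__ == "__main__":
    for name, count in summarize().items():
        print(f"{name}: {count} top-level entries")
```

If a file turns out to be a single JSON object rather than a list, the reported count is its number of top-level keys, which is usually enough to decide what to inspect next.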

This repo contains the experiments for the paper "An Explanation of In-context Learning as Implicit Bayesian Inference". If you found this repo useful, please cite:

@article{xie2021incontext,
  author = {Sang Michael Xie and Aditi Raghunathan and Percy Liang and Tengyu Ma},
  journal = {arXiv preprint arXiv:2111.02080},
  title = {An Explanation of In-context Learning as Implicit Bayesian Inference},
  year = {2021},
}


Owner

P-Lambda
