Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Last update: Aug 09, 2022

Related tags

Deep Learning, grokking

Overview

Re-implementation of the paper 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Paper

Original paper can be found here

Datasets

I'm not sure how the paper defines division, so I am using integer division. The two division datasets are:

div: $$x \circ y = (x \mathbin{//} y) \bmod p$$, for some prime $$p$$ and $$0 \leq x, y \leq p$$
div_odd: $$x \circ y = (x \mathbin{//} y) \bmod p$$ if $$y$$ is odd, else $$(x - y) \bmod p$$, for some prime $$p$$ and $$0 \leq x, y \leq p$$
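As a minimal sketch, the two division rules above can be written as plain Python functions. The function names mirror the dataset names div and div_odd from the data args; the signatures and the choice to leave y = 0 unhandled in div are assumptions for illustration, not the code in train.py.

```python
def div(x: int, y: int, p: int) -> int:
    """Integer division followed by reduction mod p (y must be non-zero)."""
    return (x // y) % p


def div_odd(x: int, y: int, p: int) -> int:
    """Integer division mod p when y is odd, subtraction mod p otherwise."""
    if y % 2 == 1:
        return (x // y) % p
    return (x - y) % p


p = 97
print(div(7, 2, p))      # 7 // 2 = 3
print(div_odd(7, 3, p))  # y odd:  7 // 3 = 2
print(div_odd(3, 4, p))  # y even: (3 - 4) mod 97 = 96
```

Note that Python's % always returns a non-negative result for a positive modulus, which is why the subtraction case needs no extra handling.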

Hyperparameters

The default hyperparameters are taken from the paper and can be overridden on the command line when running train.py.

Running experiments

To run with default settings, simply run python train.py. The first time you train on any dataset you have to specify --force_data.

Arguments:

optimizer args

"--lr", type=float, default=1e-3
"--weight_decay", type=float, default=1
"--beta1", type=float, default=0.9
"--beta2", type=float, default=0.98
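To show how these four values interact, here is a rough single-parameter sketch of one AdamW-style update (Adam with decoupled weight decay). This assumes the flags feed an AdamW-like optimizer, as in the paper; the README does not say which optimizer train.py builds. The function adamw_step and the eps value are hypothetical and only for illustration.

```python
import math


def adamw_step(theta, grad, m, v, t,
               lr=1e-3, weight_decay=1.0, beta1=0.9, beta2=0.98, eps=1e-8):
    """One AdamW-style update for a single scalar parameter (illustrative only)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: shrink the parameter directly, not via the gradient.
    theta = theta * (1 - lr * weight_decay)
    # Adam step using the bias-corrected moments.
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


theta, m, v = adamw_step(theta=1.0, grad=1.0, m=0.0, v=0.0, t=1)
print(round(theta, 6))  # 0.998
```

With a weight decay of 1 and a learning rate of 1e-3, each step shrinks every weight by 0.1% before the gradient update is applied, which is a much stronger pull towards zero than typical weight decay settings.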

model args

"--num_heads", type=int, default=4
"--layers", type=int, default=2
"--width", type=int, default=128

data args

"--data_name", type=str, default="perm", choices=[
- "perm_xy", # permutation composition x * y
- "perm_xyx1", # permutation composition x * y * x^-1
- "perm_xyx", # permutation composition x * y * x
- "plus", # x + y
- "minus", # x - y
- "div", # x / y
- "div_odd", # x / y if y is odd else x - y
- "x2y2", # x^2 + y^2
- "x2xyy2", # x^2 + y^2 + xy
- "x2xyy2x", # x^2 + y^2 + xy + x
- "x3xy", # x^3 + xy
- "x3xy2y", # x^3 + xy^2 + y
]
"--num_elements", type=int, default=5 (choose 5 for permutation data, 97 for arithmetic data)
"--data_dir", type=str, default="./data"
"--force_data", action="store_true", help="Whether to force dataset creation."
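The sketch below illustrates how a full operation table for some of the arithmetic datasets could be built and then split into train and test parts. It is an assumption-laden illustration, not the repository's data code: the helper names build_table and split_table are hypothetical, operands are assumed to range over 0 to num_elements - 1, and the split parameters simply reuse the train_ratio and seed defaults listed under training args. Division and permutation datasets are left out.

```python
import random

# Binary operations keyed by dataset names from the list above.
# Only a subset is covered; results are reduced mod num_elements.
OPERATIONS = {
    "plus": lambda x, y: x + y,
    "minus": lambda x, y: x - y,
    "x2y2": lambda x, y: x**2 + y**2,
    "x2xyy2": lambda x, y: x**2 + y**2 + x * y,
    "x2xyy2x": lambda x, y: x**2 + y**2 + x * y + x,
    "x3xy2y": lambda x, y: x**3 + x * y**2 + y,
}


def build_table(data_name, num_elements):
    """Return every (x, y, x∘y mod num_elements) triple for the chosen operation."""
    op = OPERATIONS[data_name]
    return [
        (x, y, op(x, y) % num_elements)
        for x in range(num_elements)
        for y in range(num_elements)
    ]


def split_table(table, train_ratio=0.5, seed=42):
    """Shuffle with a fixed seed and cut the table into train and test parts."""
    rng = random.Random(seed)
    shuffled = table[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


table = build_table("plus", 97)
train_set, test_set = split_table(table)
print(len(table), len(train_set), len(test_set))  # 9409 4704 4705
```

Because the table enumerates every operand pair exactly once, the train and test parts are disjoint, so test accuracy measures generalization to equations the model has never seen.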

training args

"--batch_size", type=int, default=512
"--steps", type=int, default=10**5
"--train_ratio", type=float, default=0.5
"--seed", type=int, default=42
"--verbose", action="store_true"
"--log_freq", type=int, default=10
"--num_workers", type=int, default=4
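For reference, the argument list above can be assembled into an argparse parser roughly as follows. This is a reconstruction from the README only, so train.py may differ in details; the function name build_parser and the DATA_NAMES constant are hypothetical. Flag names, types, and defaults are copied from the list as given.

```python
import argparse

DATA_NAMES = [
    "perm_xy", "perm_xyx1", "perm_xyx", "plus", "minus", "div", "div_odd",
    "x2y2", "x2xyy2", "x2xyy2x", "x3xy", "x3xy2y",
]


def build_parser():
    parser = argparse.ArgumentParser(description="Grokking re-implementation (sketch)")
    # optimizer args
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--weight_decay", type=float, default=1)
    parser.add_argument("--beta1", type=float, default=0.9)
    parser.add_argument("--beta2", type=float, default=0.98)
    # model args
    parser.add_argument("--num_heads", type=int, default=4)
    parser.add_argument("--layers", type=int, default=2)
    parser.add_argument("--width", type=int, default=128)
    # data args (the default "perm" is copied from the README as-is,
    # although it is not one of the listed choices)
    parser.add_argument("--data_name", type=str, default="perm", choices=DATA_NAMES)
    parser.add_argument("--num_elements", type=int, default=5)
    parser.add_argument("--data_dir", type=str, default="./data")
    parser.add_argument("--force_data", action="store_true",
                        help="Whether to force dataset creation.")
    # training args
    parser.add_argument("--batch_size", type=int, default=512)
    parser.add_argument("--steps", type=int, default=10**5)
    parser.add_argument("--train_ratio", type=float, default=0.5)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--log_freq", type=int, default=10)
    parser.add_argument("--num_workers", type=int, default=4)
    return parser


# Equivalent to: python train.py --data_name plus --num_elements 97 --force_data
args = build_parser().parse_args(
    ["--data_name", "plus", "--num_elements", "97", "--force_data"]
)
print(args.data_name, args.num_elements, args.force_data, args.lr)
# plus 97 True 0.001
```

The example invocation trains on modular addition with 97 elements and forces dataset creation, which the README says is required the first time a dataset is used.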


Owner

Tom Lieberum
