This is the code used in the paper "Entity Embeddings of Categorical Variables".

Last update: Nov 29, 2022

Overview

If you want the original version of the code used for the Kaggle competition, please use the Kaggle branch.

To run the code, first download and unzip the train.csv and store.csv files from Kaggle and put them in this folder.
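Before going further, it can help to confirm that both files are where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name missing_data_files is hypothetical, and it only checks for the two file names mentioned above.

```python
from pathlib import Path

# File names are taken from the instructions above.
# The helper name is hypothetical and not part of this repository.
REQUIRED_FILES = ("train.csv", "store.csv")


def missing_data_files(folder="."):
    """Return the required CSV files that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All data files found.")
```

Run it from the repository folder; an empty result means both files are in place.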

If you use Anaconda, you can install the dependencies as in the following example:

conda create --name ee python=3.7 pip
conda activate ee
pip install scikit-learn xgboost tensorflow keras jupyter matplotlib

Please refer to the Keras documentation for more details on how to install Keras.
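To verify that the environment is set up, one option is to check that the installed packages can be located by Python. This is a rough sketch under a few assumptions: the helper name check_modules is hypothetical, scikit-learn is looked up under its import name sklearn, and jupyter is left out because it is mainly used as a command-line tool.

```python
from importlib.util import find_spec

# Import names for the libraries installed above. scikit-learn is
# imported as "sklearn". The helper name is hypothetical.
MODULES = ("sklearn", "xgboost", "tensorflow", "keras", "matplotlib")


def check_modules(names=MODULES):
    """Map each top-level module name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it is fast but does not catch a broken installation.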

Next, run the following scripts to extract the csv files and prepare the features:

python3 extract_csv_files.py
python3 prepare_features.py

To run the models:

python3 train_test_model.py

You can analyze the embeddings with plot_embeddings.ipynb. For example, the following shows the learned embedding of German states plotted in 2D, side by side with a map of Germany. Considering that the algorithm knows nothing about German geography, the remarkable resemblance between the two demonstrates the power of the algorithm for abductive reasoning. I expect entity embedding will be a very useful tool for studying the relationships among genomes, proteins, drugs, and diseases, and I would love to see its applications in biology and medicine one day.

Visualization of Entity Embedding of German States in 2D	Map of Germany
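For readers new to the idea, an entity embedding can be thought of as a table that maps each category to a short vector, which is equivalent to multiplying a one-hot vector by a weight matrix. The sketch below illustrates only that equivalence and how distances between categories are read off the table. It is not the code in this repository: the category names, the 2D values, and all function names are made up for illustration, whereas real embedding values are learned during training.

```python
import math

# Hypothetical 2D embedding table for three categories. The values are
# made up for illustration; real values are learned during training.
CATEGORIES = ["A", "B", "C"]
EMBEDDINGS = [
    [0.0, 1.0],   # A
    [0.5, 1.0],   # B
    [3.0, -1.0],  # C
]


def one_hot(category):
    """Encode a category as a one-hot vector over CATEGORIES."""
    return [1.0 if c == category else 0.0 for c in CATEGORIES]


def embed_by_matmul(category):
    """Multiply the one-hot vector by the embedding matrix."""
    x = one_hot(category)
    dim = len(EMBEDDINGS[0])
    return [sum(x[i] * EMBEDDINGS[i][j] for i in range(len(x)))
            for j in range(dim)]


def embed_by_lookup(category):
    """Pick the matching row directly; same result as the matmul."""
    return list(EMBEDDINGS[CATEGORIES.index(category)])


def distance(a, b):
    """Euclidean distance between the embeddings of two categories."""
    va, vb = embed_by_lookup(a), embed_by_lookup(b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(va, vb)))


if __name__ == "__main__":
    print(embed_by_matmul("B"), embed_by_lookup("B"))
    print(distance("A", "B"), distance("A", "C"))
```

In this toy table, A and B sit close together and C is far from both, which is the kind of structure the 2D plot of German states makes visible.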

Owner

Cheng Guo
