Official code for HH-VAEM

Overview

HH-VAEM

This repository contains the official Pytorch implementation of the Hierarchical Hamiltonian VAE for Mixed-type Data (HH-VAEM) model and the sampling-based feature acquisition technique presented in the paper Missing Data Imputation and Acquisition with Deep Hierarchical Models and Hamiltonian Monte Carlo. HH-VAEM is a Hierarchical VAE model for mixed-type incomplete data that uses Hamiltonian Monte Carlo with automatic hyper-parameter tuning for improved approximate inference. The repository contains the implementation and the experiments provided in the paper.

Please, if you use this code, cite the preprint using:

@article{peis2022missing,
  title={Missing Data Imputation and Acquisition with Deep Hierarchical Models and Hamiltonian Monte Carlo},
  author={Peis, Ignacio and Ma, Chao and Hern{\'a}ndez-Lobato, Jos{\'e} Miguel},
  journal={arXiv preprint arXiv:2202.04599},
  year={2022}
}

Instalation

The installation is straightforward using the following instruction, that creates a conda virtual environment named HH-VAEM using the provided file environment.yml:

conda env create -f environment.yml

Usage

Training

The project is developed in the recent research framework PyTorch Lightning. The HH-VAEM model is implemented as a LightningModule that is trained by means of a Trainer. A model can be trained by using:

# Example for training HH-VAEM on Boston dataset
python train.py --model HHVAEM --dataset boston --split 0

This will automatically download the boston dataset, split in 10 train/test splits and train HH-VAEM on the training split 0. Two folders will be created: data/ for storing the datasets and logs/ for model checkpoints and TensorBoard logs. The variable LOGDIR can be modified in src/configs.py to change the directory where these folders will be created (this might be useful for avoiding overloads in network file systems).

The following datasets are available:

  • A total of 10 UCI datasets: avocado, boston, energy, wine, diabetes, concrete, naval, yatch, bank or insurance.
  • The MNIST datasets: mnist or fashion_mnist.
  • More datasets can be easily added to src/datasets.py.

For each dataset, the corresponding parameter configuration must be added to src/configs.py.

The following models are also available (implemented in src/models/):

  • HHVAEM: the proposed model in the paper.
  • VAEM: the VAEM strategy presented in (Ma et al., 2020) with Gaussian encoder (without including the Partial VAE).
  • HVAEM: A Hierarchical VAEM with two layers of latent variables and a Gaussian encoder.
  • HMCVAEM: A VAEM that includes a tuned HMC sampler for the true posterior.
  • For MNIST datasets (non heterogeneous data), use HHVAE, VAE, HVAE and HMCVAE.

By default, the test stage will be executed at the end of the training stage. This can be cancelled with --test 0 for manually running the test using:

# Example for testing HH-VAEM on Boston dataset
python test.py --model HHVAEM --dataset boston --split 0

which will load the trained model to be tested on the boston test split number 0. Once all the splits are tested, the average results can be obtained using the script in the run/ folder:

# Example for obtaining the average test results with HH-VAEM on Boston dataset
python test_splits.py --model HHVAEM --dataset boston

Experiments

The experiments in the paper can be executed using:

# Example for running the SAIA experiment with HH-VAEM on Boston dataset
python active_learning.py --model HHVAEM --dataset boston --method mi --split 0

# Example for running the OoD experiment using MNIST and Fashion-MNIST as OoD:
python ood.py --model HHVAEM --dataset mnist --dataset_ood fashion_mnist --split 0

Once this is executed on all the splits, you can plot the SAIA error curves or obtain the average OoD metrics using the scripts in the run/ folder:

# Example for running the SAIA experiment with HH-VAEM on Boston dataset
python active_learning_plots.py --models VAEM HHVAEM --dataset boston

# Example for running the OoD experiment using MNIST and Fashion-MNIST as OoD:
python ood_splits.py --model HHVAEM --dataset mnist --dataset_ood fashion_mnist


Help

Use the --help option for documentation on the usage of any of the mentioned scripts.

Contributors

Ignacio Peis
Chao Ma
José Miguel Hernández-Lobato

Contact

For further information: [email protected]

Owner
Ignacio Peis
PhD student at UC3M \\ Visitor at the Machine Learning Group, CBL, University of Cambridge
Ignacio Peis
This project has Classification and Clustering done Via kNN and K-Means respectfully

This project has Classification and Clustering done Via kNN and K-Means respectfully. It later tests its efficiency via F1/accuracy/recall/precision for kNN and Davies-Bouldin Index for Clustering. T

Mohammad Ali Mustafa 0 Jan 20, 2022
Stock Price Prediction Bank Jago Using Facebook Prophet Machine Learning & Python

Stock Price Prediction Bank Jago Using Facebook Prophet Machine Learning & Python Overview Bank Jago has attracted investors' attention since the end

Najibulloh Asror 3 Feb 10, 2022
Bonsai: Gradient Boosted Trees + Bayesian Optimization

Bonsai is a wrapper for the XGBoost and Catboost model training pipelines that leverages Bayesian optimization for computationally efficient hyperparameter tuning.

24 Oct 27, 2022
An open-source library of algorithms to analyse time series in GPU and CPU.

An open-source library of algorithms to analyse time series in GPU and CPU.

Shapelets 216 Dec 30, 2022
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.

sklearn-evaluation Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking, and Jupyter notebook analysis. Suppo

Eduardo Blancas 354 Dec 31, 2022
Generate music from midi files using BPE and markov model

Generate music from midi files using BPE and markov model

Aditya Khadilkar 37 Oct 24, 2022
Machine-learning-dell - Repositório com as atividades desenvolvidas no curso de Machine Learning

📚 Descrição Neste curso da Dell aprofundamos nossos conhecimentos em Machine Learning. 🖥️ Aulas (Em curso) 1.1 - Python aplicado a Data Science 1.2

Claudia dos Anjos 1 Jan 05, 2022
Visualize classified time series data with interactive Sankey plots in Google Earth Engine

sankee Visualize changes in classified time series data with interactive Sankey plots in Google Earth Engine Contents Description Installation Using P

Aaron Zuspan 76 Dec 15, 2022
Time Series Prediction with tf.contrib.timeseries

TensorFlow-Time-Series-Examples Additional examples for TensorFlow Time Series(TFTS). Read a Time Series with TFTS From a Numpy Array: See "test_input

Zhiyuan He 476 Nov 17, 2022
Python module for performing linear regression for data with measurement errors and intrinsic scatter

Linear regression for data with measurement errors and intrinsic scatter (BCES) Python module for performing robust linear regression on (X,Y) data po

Rodrigo Nemmen 56 Sep 27, 2022
Made in collaboration with Chris George for Art + ML Spring 2019.

Deepdream Eyes Made in collaboration with Chris George for Art + ML Spring 2019.

Francisco Cabrera 1 Jan 12, 2022
Pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code

pandas-method-chaining pandas-method-chaining is a plugin for flake8 that provides method chaining linting for pandas code. It is a fork from pandas-v

Francis 5 May 14, 2022
Traingenerator 🧙 A web app to generate template code for machine learning ✨

Traingenerator 🧙 A web app to generate template code for machine learning ✨ 🎉 Traingenerator is now live! 🎉

Johannes Rieke 1.2k Jan 07, 2023
Automatic extraction of relevant features from time series:

tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis

Blue Yonder GmbH 7k Jan 06, 2023
Bottleneck a collection of fast, NaN-aware NumPy array functions written in C.

Bottleneck Bottleneck is a collection of fast, NaN-aware NumPy array functions written in C. As one example, to check if a np.array has any NaNs using

Python for Data 835 Dec 27, 2022
Decision Tree Regression algorithm implemented on Python from scratch.

Decision_Tree_Regression I implemented the decision tree regression algorithm on Python. Unlike regular linear regression, this algorithm is used when

1 Dec 22, 2021
A machine learning project that predicts the price of used cars in the UK

Car Price Prediction Image Credit: AA Cars Project Overview Scraped 3000 used cars data from AA Cars website using Python and BeautifulSoup. Cleaned t

Victor Umunna 7 Oct 13, 2022
ClearML - Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, MLOps and Data-Management

ClearML - Auto-Magical Suite of tools to streamline your ML workflow Experiment Manager, MLOps and Data-Management ClearML Formerly known as Allegro T

ClearML 4k Jan 09, 2023
Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm.

Naive-Bayes Spam Classificator Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm. Main goal is to code a

Viktoria Maksymiuk 1 Jun 27, 2022
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Master status: Development status: Package information: TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista

Epistasis Lab at UPenn 8.9k Jan 09, 2023