Ludwig Benchmarking Toolkit

Overview

The Ludwig Benchmarking Toolkit is a personalized benchmarking toolkit for running end-to-end benchmark studies across an extensible set of tasks, deep learning models, standard datasets and evaluation metrics.

Getting set up

To get started, use the following commands to set up your conda environment.

git clone https://github.com/HazyResearch/ludwig-benchmarking-toolkit.git
cd ludwig-benchmarking-toolkit
conda env create -f environments/environment-linux.yaml  # on macOS, use environments/environment-osx.yaml
conda activate lbt

Relevant files and directories

experiment-templates/task_template.yaml: Every task (e.g., text classification) has its own task template. The template specifies the model architecture (encoder and decoder structure), training parameters, and a hyperopt configuration for the task at hand. Most of the values in the template are populated from hyperopt_config.yaml and dataset_metadata.yaml at training time. The sample task template located in experiment-templates/task_template.yaml is for text classification. See sample-task-templates/ for other examples.

experiment-templates/hyperopt_config.yaml: provides ranges of values for the training parameters and hyperopt parameters that will populate the hyperopt configuration in the task template.

experiment-templates/dataset_metadata.yaml: contains a list of all available datasets (and associated metadata) that hyperparameter optimization can be performed over.

model-configs/: contains all encoder-specific yaml files. Each file specifies possible values for the relevant encoder parameters that will be optimized over. Each file in this directory adheres to the naming convention {encoder_name}_hyperopt.yaml.
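The naming convention can be illustrated with a small helper. This is a minimal sketch, not part of LBT: `encoder_config_path` and `MODEL_CONFIG_DIR` are hypothetical names, and the directory is assumed to be `model-configs/` relative to the repository root.

```python
from pathlib import Path

# Assumption: encoder configs live in "model-configs/" at the repository root.
MODEL_CONFIG_DIR = Path("model-configs")


def encoder_config_path(encoder_name: str) -> Path:
    """Return the expected config path for an encoder (hypothetical helper)."""
    return MODEL_CONFIG_DIR / f"{encoder_name}_hyperopt.yaml"


# Print the path that the convention implies for the "bert" encoder.
print(encoder_config_path("bert").as_posix())
```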

hyperopt-experiment-configs/: houses all experiment configs built from the templates specified above and used when the hyperopt experiment is called (note: this folder is populated at run-time). At a high level, each config file specifies the training and hyperopt information for a (task, dataset, architecture) combination. An example might be (text classification, SST2, BERT).
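To make the (task, dataset, architecture) idea concrete, the sketch below enumerates such combinations. It assumes that every dataset is paired with every encoder for a task; the lists are illustrative values, and `experiment_combinations` is a hypothetical helper rather than an LBT function.

```python
from itertools import product

# Illustrative values only; in LBT the real lists come from the templates
# and dataset_metadata.yaml.
tasks = ["text_classification"]
datasets = ["SST2", "agnews"]
encoders = ["bert", "rnn"]


def experiment_combinations(tasks, datasets, encoders):
    """Enumerate every (task, dataset, architecture) triple."""
    return list(product(tasks, datasets, encoders))


combos = experiment_combinations(tasks, datasets, encoders)
for task, dataset, encoder in combos:
    # Under the stated assumption, each triple corresponds to one config.
    print(task, dataset, encoder)
```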

elasticsearch_config.yaml: an optional file that only needs to be defined if experiment data will be saved to an Elasticsearch database.

USAGE

Command-Line Usage

Running your first TOY experiment:

For testing/setup purposes we have included a toy dataset called toy_agnews. This dataset contains a small set of training, test and validation samples from the original agnews dataset.

Before running a full-scale experiment, we recommend running an experiment locally on the toy dataset:

python experiment_driver.py --run_environment local --datasets toy_agnews --custom_models_list rnn

Running your first REAL experiment:

Steps for configuring + running an experiment:

  1. Declare and configure the search space of all non-model-specific training and preprocessing hyperparameters in the experiment-templates/hyperopt_config.yaml file. The parameters specified in this file will be used across all model experiments.

  2. Declare and configure the search space of model-specific hyperparameters in the {encoder}_hyperopt.yaml files in ./model-configs

    NOTE:

    • for both (1) and (2), see the Ludwig Hyperparameter Optimization guide for the training, preprocessing, and input/output feature parameters that can be used in the hyperopt search
    • if the executor type is Ray, the list of available search spaces and the input format differ slightly from the built-in Ludwig types. Please see the Ray Tune search space docs for more information.
  3. Run the following command specifying the datasets, encoders, path to elastic DB index config file, run environment and more:

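        python experiment_driver.py \
            --experiment_output_dir <path to output directory> \
            --run_environment {local, gcp} \
            --elasticsearch_config <path to elastic DB index config file> \
            --dataset_cache_dir <path to dataset cache> \
            --custom_model_list <list of encoders> \
            --datasets <list of datasets> \
            --resume_existing_exp bool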

NOTE: Please use python experiment_driver.py -h to see the list of available datasets, encoders and args.
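When experiments are launched from a script, the same command can be assembled programmatically. The sketch below is a hedged illustration: `build_experiment_command` is a hypothetical helper, and it assumes the list-valued flags accept space-separated values. It uses the `--custom_model_list` spelling from the full command; the toy command spells the flag `--custom_models_list`, so confirm the exact name with `-h` before relying on it.

```python
import shlex


def build_experiment_command(datasets, encoders, run_environment="local",
                             experiment_output_dir=None):
    """Assemble an argv list for experiment_driver.py (hypothetical helper)."""
    cmd = [
        "python", "experiment_driver.py",
        "--run_environment", run_environment,
        "--datasets", *datasets,
        # Flag spelling is an assumption; check `experiment_driver.py -h`.
        "--custom_model_list", *encoders,
    ]
    if experiment_output_dir is not None:
        cmd += ["--experiment_output_dir", experiment_output_dir]
    return cmd


cmd = build_experiment_command(["toy_agnews"], ["rnn"])
# Show the command as a single shell-safe string.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.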

API Usage

It is also possible to run and customize experiments using LBT's APIs. In the following section, we describe the three flavors of APIs included in LBT.

experiment API

This API provides an alternative method for running experiments. Note that running experiments via the API still requires populating the aforementioned configuration files.

from lbt.experiments import experiment

experiment(
    models = ['rnn', 'bert'],
    datasets = ['agnews'],
    run_environment = "local",
    elastic_search_config = None,
    resume_existing_exp = False,
)

tools API

This API provides access to two tooling integrations: TextAttack and Robustness Gym (RG). The TextAttack API can be used to generate adversarial attacks and to augment data files. The RG API lets users inspect model performance on a set of generic, pre-built slices and add more slices for their specific datasets and use cases.

from lbt.tools.robustnessgym import RG
from lbt.tools.textattack import attack, augment

# Robustness Gym API Usage
RG(
    dataset_name="AGNews",
    models=["bert", "rnn"],
    path_to_dataset="agnews.csv",
    subpopulations=["entities", "positive_words", "negative_words"],
)

# TextAttack API Usage
attack(
    dataset_name="AGNews",
    path_to_model="agnews/model/rnn_model",
    path_to_dataset="agnews.csv",
    attack_recipe=["CharSwapAugmenter"],
)

augment(
    dataset_name="AGNews",
    transformations_per_example=1,
    path_to_dataset="agnews.csv",
    augmenter=["WordNetAugmenter"],
)

visualizations API

This API provides out-of-the-box visualizations of learning behavior, model performance, and hyperparameter optimization, built from the training and evaluation statistics generated during model training.

from lbt.visualizations import compare_performance_viz, hyperopt_viz, learning_curves_viz

# compare model performance
compare_performance_viz(
    dataset_name="toy_agnews",
    model_name="rnn",
    output_feature_name="class_index",
)

# compare training and validation trajectory
learning_curves_viz(
    dataset_name="toy_agnews",
    model_name="rnn",
    output_feature_name="class_index",
)

# visualize hyperparameter optimization search
hyperopt_viz(
    dataset_name="toy_agnews",
    model_name="rnn",
    output_dir="."
)

EXPERIMENT EXTENSIBILITY

Adding new custom datasets

Adding a custom dataset requires creating a new LBTDataset class and adding it to the dataset registry. Creating an LBTDataset object requires implementing three class methods: download, process and load. Please see the ToyAGNews dataset as an example.
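The shape of such a class might look like the sketch below. It is self-contained and does not import LBT: `DATASET_REGISTRY`, `register_dataset` and `MyReviews` are hypothetical stand-ins, the methods are written as plain instance methods for simplicity, and the real LBTDataset base class and registry may use different signatures. The ToyAGNews implementation remains the authoritative reference.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical stand-in for the LBT dataset registry.
DATASET_REGISTRY = {}


def register_dataset(name):
    """Class decorator that records a dataset class under `name`."""
    def decorator(cls):
        DATASET_REGISTRY[name] = cls
        return cls
    return decorator


@register_dataset("my_reviews")
class MyReviews:
    """Toy dataset exposing the download, process and load methods."""

    def __init__(self, cache_dir="."):
        self.raw_path = Path(cache_dir) / "my_reviews_raw.csv"
        self.processed_path = Path(cache_dir) / "my_reviews.csv"

    def download(self):
        # A real dataset would fetch files here; this sketch writes two rows.
        self.raw_path.write_text("text,label\nGreat film ,1\n dull plot,0\n")

    def process(self):
        # Strip stray whitespace, cast labels, and write the processed file.
        with self.raw_path.open(newline="") as src:
            rows = [{"text": r["text"].strip(), "label": int(r["label"])}
                    for r in csv.DictReader(src)]
        with self.processed_path.open("w", newline="") as dst:
            writer = csv.DictWriter(dst, fieldnames=["text", "label"])
            writer.writeheader()
            writer.writerows(rows)

    def load(self):
        # Download and process on first use, then read the cached file.
        if not self.processed_path.exists():
            self.download()
            self.process()
        with self.processed_path.open(newline="") as f:
            return list(csv.DictReader(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache:
        print(DATASET_REGISTRY["my_reviews"](cache_dir=cache).load())
```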

Adding new metrics

Adding custom evaluation metrics requires creating a new LBTMetric class and adding it to the metrics registry. Creating an LBTMetric object requires implementing the run class method which takes as potential inputs a path to a model directory, path to a dataset, training batch size, and training statistics. Please see the pre-built LBT metrics for examples.
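A custom metric could follow the pattern below. This is a sketch under stated assumptions: `METRIC_REGISTRY`, `register_metric`, `ModelSizeMetric` and the parameter names of `run` are hypothetical, chosen only to mirror the inputs listed above, and the real LBTMetric base class may differ. The example metric reports the on-disk size of a model directory in megabytes.

```python
from pathlib import Path

# Hypothetical stand-in for the LBT metrics registry.
METRIC_REGISTRY = {}


def register_metric(name):
    """Class decorator that records a metric class under `name`."""
    def decorator(cls):
        METRIC_REGISTRY[name] = cls
        return cls
    return decorator


@register_metric("model_size_mb")
class ModelSizeMetric:
    """Reports the total size of the files in a model directory, in MB."""

    @classmethod
    def run(cls, model_path, dataset_path=None, train_batch_size=None,
            run_stats=None):
        # Only model_path is used here; the other inputs are accepted so the
        # signature mirrors the inputs described in the text (an assumption).
        total_bytes = sum(
            p.stat().st_size for p in Path(model_path).rglob("*") if p.is_file()
        )
        return total_bytes / (1024 * 1024)
```

Calling `METRIC_REGISTRY["model_size_mb"].run(model_path="path/to/model")` would then return the size as a float.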

ELASTICSEARCH RESEARCH DATABASE

To get credentials to upload experiments to the shared Elasticsearch research database, please fill out this form.

Owner
HazyResearch
We are a CS research group led by Prof. Chris Ré.