Facilitating Database Tuning with Hyper-ParameterOptimization: A Comprehensive Experimental Evaluation

Overview

A Comprehensive Experimental Evaluation for Database Configuration Tuning

This is the source code to the paper "Facilitating Database Tuning with Hyper-ParameterOptimization: A Comprehensive Experimental Evaluation". Please refer to the paper for the experimental details.

Table of Content

An Efficient Database Configuration Tuning Benchmark via Surrogate

Through the benchmark, you can evaluate the tuning optimizers' performance with minimum overhead.

Quick installation & Run

  1. Preparations: Python == 3.7

  2. Install packages and download the surrogate model

    pip install -r requirements.txt
    pip install .

The surrogate models can be found in the Google drive. To easily run the tuning benchmark, you can download the surrogate models and place them in the fold autotune/tuning_benchmark/surrogate.

  1. Run the benchmark. We use optimization over the configuration space of JOB as an example.
python run_benchmark.py --method=VBO --knobs_config=experiment/gen_knobs/JOB_shap.json --knobs_num=5 --workload=job  --lhs_log=result/job_5knobs_vbo.res
python run_benchmark.py --method=MBO   --knobs_config=experiment/gen_knobs/JOB_shap.json --knobs_num=5 --workload=job --lhs_log=result/job_5knobs_mbo.res
python run_benchmark.py --method=SMAC  --knobs_config=experiment/gen_knobs/JOB_shap.json --knobs_num=5 --workload=job   --lhs_log=result/job_5knobs_smac.res
python run_benchmark.py --method=TPE --knobs_config=experiment/gen_knobs/JOB_shap.json --knobs_num=5 --workload=job  --lhs_log=result/job_5knobs_tpe.res
python run_benchmark.py --method=TURBO --knobs_config=experiment/gen_knobs/JOB_shap.json --knobs_num=5 --workload=job --lhs_log=result/job_5knobs_turbo.res --tr_init 
python run_benchmark.py --method=GA --knobs_config=experiment/gen_knobs/JOB_shap.json --knobs_num=5 --workload=job --lhs_log=result/job_5knobs_ga.res 

Data Description

You can find all the training data for the tuning benchmark in autotune/tuning_benchmark/data.

Experimental Evaluation

Environment Installation

In our experiments, the operating system is Linux 4.9. We conduct experimets on MySQL 5.7.19.

  1. Preparations: Python == 3.7

  2. Install packages

    pip install -r requirements.txt
    pip install .
  3. Download and install MySQL 5.7.19 and boost

    wget http://sourceforge.net/projects/boost/files/boost/1.59.0/boost_1_59_0.tar.gz
    wget https://dev.mysql.com/get/Downloads/MySQL-5.7/mysql-boost-5.7.19.tar.gz
    
    sudo cmake . -DCMAKE_INSTALL_PREFIX=PATH_TO_INSTALL -DMYSQL_DATADIR=PATH_TO_DATA -DDEFAULT_CHARSET=utf8 -DDEFAULT_COLLATION=utf8_general_ci -DMYSQL_TCP_PORT=3306 -DWITH_MYISAM_STORAGE_ENGINE=1 -DWITH_INNOBASE_STORAGE_ENGINE=1 -DWITH_ARCHIVE_STORAGE_ENGINE=1 -DWITH_BLACKHOLE_STORAGE_ENGINE=1 -DWITH_MEMORY_STORAGE_ENGINE=1 -DENABLE_DOWNLOADS=1 -DDOWNLOAD_BOOST=1 -DWITH_BOOST=PATH_TO_BOOST;
    sudo make -j 16;
    sudo make install;

Workload Preparation

SYSBENCH

Download and install

git clone https://github.com/akopytov/sysbench.git
./autogen.sh
./configure
make && make install

Load data

sysbench --db-driver=mysql --mysql-host=$HOST --mysql-socket=$SOCK --mysql-port=$MYSQL_PORT --mysql-user=root --mysql-password=$PASSWD --mysql-db=sbtest --table_size=800000 --tables=150 --events=0 --threads=32 oltp_read_write prepare > sysbench_prepare.out

OLTP-Bench

We install OLTP-Bench to use the following workload: TPC-C, SEATS, Smallbank, TATP, Voter, Twitter, SIBench.

  • Download
git clone https://github.com/oltpbenchmark/oltpbench.git
  • To run oltpbenchmark outside the folder, modify the following file:

    • ./src/com/oltpbenchmark/DBWorkload.java (Line 85)

      pluginConfig = new XMLConfiguration("PATH_TO_OLTPBENCH/config/plugin.xml"); # modify this
      
    • ./oltpbenchmark

      
      #!/bin/bash
      
      java -Xmx8G -cp `$OLTPBENCH_HOME/classpath.sh bin` -Dlog4j.configuration=$OLTPBENCH_HOME/log4j.properties com.oltpbenchmark.DBWorkload $@
      
      
    • ./classpath.sh

      #!/bin/bash
      
      echo -ne "$OLTPBENCH_HOME/build"
      
      for i in `ls $OLTPBENCH_HOME/lib/*.jar`; do
      
          # IMPORTANT: Make sure that we do not include hsqldb v1
      
          if [[ $i =~ .*hsqldb-1.* ]]; then
      
              continue
      
          fi
      
          echo -ne ":$i"
      
      done
      
  • Install

    ant bootstrap
    ant resolve
    ant build

Join-Order-Benchmark (JOB)

Download IMDB Data Set from http://homepages.cwi.nl/~boncz/job/imdb.tgz.

Follow the instructions of https://github.com/winkyao/join-order-benchmark to load data into MySQL.

Environment Variables

Before running the experiments, the following environment variables require to be set.

export SYSBENCH_BIN=PATH_TO_sysbench/src/sysbench
export OLTPBENCH_BIN=PATH_TO_oltpbench/oltpbenchmark
export MYSQLD=PATH_TO_mysqlInstall/bin/mysqld
export MYSQL_SOCK=PATH_TO_mysql/base/mysql.sock
export MYCNF=PATH_TO_autotune/template/experiment_normandy.cnf
export DATADST=PATH_TO_mysql/data
export DATASRC=PATH_TO_mysql/data_copy

Experiments Design

All optimization methods are listed as follows:

Method String of ${METHOD}
Vanilla BO VBO
Mixed-Kernel BO MBO
Sequential Model-based Algorithm Configuration SMAC
Tree-structured Parzen Estimator TPE
Trust-Region BO TURBO
Deep Deterministic Policy Gradient DDPG
Genetic Algorithm GA

Exp.1: Tuning improvement over knob set generated by different important measurements.

Compared importance measurements: lasso, gini, fanova, ablation, shap.

To conduct the experiment shown in Figure 3(a), the script is as follows. Please specify ${lhs_log}.

python train.py --knobs_config=experiment/gen_knobs/JOB_lasso.json    --knobs_num=5 --method=VBO --workload=job --dbname=imdboload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_gini.json     --knobs_num=5 --method=VBO --workload=job --dbname=imdboload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_fanova.json   --knobs_num=5 --method=VBO --workload=job --dbname=imdboload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_ablation.json --knobs_num=5 --method=VBO --workload=job --dbname=imdboload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_shap.jso      --knobs_num=5 --method=VBO --workload=job --dbname=imdboload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}

python train.py --knobs_config=experiment/gen_knobs/JOB_lasso.json    --knobs_num=20 --method=VBO --workload=job --dbname=imdboload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_gini.json     --knobs_num=20 --method=VBO --workload=job --dbname=imdboload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_fanova.json   --knobs_num=20 --method=VBO --workload=job --dbname=imdboload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_ablation.json --knobs_num=20 --method=VBO --workload=job --dbname=imdboload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_shap.jso      --knobs_num=20 --method=VBO --workload=job --dbname=imdboload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}

To conduct the experiments in (b), (c), and (d), modify ${knobs_num},${method},${workload}, ${dbname}, and ${y_variable}, where

  • ${knobs_num} = 5, 20

  • ${method} = VBO, DDPG

  • ${workload} = job, sysbench

    • if ${workload} == job, then ${dbname} = imdbload, ${y_variable}=lat
    • if ${workload} == sysbench, then ${dbname} =sbtest , ${y_variable}=tps

Note${knobs_config} indicates the configuration file where knobs are ranked by importance.

  • We provide the configuration file generated on our VM: experiment/gen_knobs/${workload}_${measure}.json.
  • You can also generate new configuration file with samples in your environment.

Exp.2: Performance improvement and tuning cost when increasing the number of tuned knobs.

To conduct the experiment shown in Figure 5 (a) and 5 (b), the script is as follows.

python train.py --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --knobs_num=${knobs_num} --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=VBO --workload=sysbench --dbname=sbtest --y_variable=tps --lhs_num=10 --knobs_num=${knobs_num} --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --lhs_log=${lhs_log}

Please specify ${knobs_num} and ${lhs_log}, where

  • ${knobs_num} = 5, 10, 15, 20, 30, 50, 70, 90, 197

Exp.3: Incremental Knob Selection.

Compared methods: 5 Knobs, 20 Knobs, increase, decrease.

To conduct the experiment shown in Figure 6(a), the script is as follows. Please specify ${lhs_log}.

python train.py --method=VBO       --knobs_num=5  --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=VBO       --knobs_num=20 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=increase --knobs_num=-1 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=decrease   --knobs_num=-1 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}

To conduct the experiment shown in (b), you can

  • replace --workload=JOB --y_variable=lat with --workload=sysbench --y_variable=tps

Exp.4: Optimizer comparision on different configuration space.

Compared optimizers: VBO, MBO, SMAC, TPE, TURBO, DDPG, GA.

To conduct the experiment shown in Figure 7(a), the script is as follows. Please specify ${lhs_log}.

python train.py --method=VBO   --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=MBO   --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=SMAC  --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=TPE   --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=TURBO --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=DDPG  --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=GA    --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}

To conduct the experiment shown in (b), (c), (d), (e), (f), and (g), you can

  • replace --knobs_num=5 with--knobs_num=20 or --knobs_num=197
  • replace --workload=JOB --y_variable=lat --dbname=imdbload with --workload=sysbench --y_variable=tps --dbname=sbtest

Exp.5: Comparison experiment for knobs heterogeneity.

Compared optimizers: VBO, MBO, SMAC, DDPG.

To conduct the experiment shown in Figure 8(a) and (b), the script is as follows.

python train.py --method=${method} --knobs_num=20 --workload=job --y_variable=lat --dbname=${dbname}   --knobs_config=experiment/gen_knobs/JOB_continuous.json --lhs_log=${lhs_log} --lhs_num=10
python train.py --method=${method} --knobs_num=20 --workload=job --y_variable=lat --dbname=${dbname}   --knobs_config=experiment/gen_knobs/JOB_heterogeneous.json --lhs_log=${lhs_log} --lhs_num=10

Please specify ${method}, ${dbname} and ${lhs_log}, where

  • ${method} is one of VBO, MBO, SMAC, DDPG.

Exp.6: Algorithm overhead comparison.

Compared optimizers: MBO, SMAC, TPE, TURBO, DDPG, GA.

To conduct the experiment shown in Figure 8(a) and (b), the script is as follows.

python train.py --method=${method} --knobs_num=20 --workload=job --y_variable=lat --dbname=${dbname}   --knobs_config=experiment/gen_knobs/job_shap.json --lhs_log=${lhs_log} --lhs_num=10

Please specify ${method}, ${dbname} and ${lhs_log}, where

  • ${method} is one of MBO, SMAC, TPE, TURBO, DDPG, GA.

Note if you have already done Exp.4, you can skip running the above script and analyze log files in script/log/.

Exp.7: Transfering methods comparison.

Compared methods: RGPE-MBO, RGPE-SMAC, MAP-MBO, MAP-SMAC, FineTune-DDPG

To conduct the experiment shown in Table 9, there are two steps:

  • Pre-train on source workloads (Smallbank, SIBench, Voter, Seats, TATP);
  • Validate on target workloads (TPCC, SYSBENCH, Twitter).

Scripts for pre-trains is similar to the ones for Exp.4

To validate on target workloads, the scripts are as follows.

python train.py --method=MBO  --RGPE --source_repo=${repo}         --knobs_num=20 --workload=job --y_variable=lat --dbname=tpcc   --knobs_config=experiment/gen_knobs/oltp.json --lhs_log=${lhs_log} --lhs_num=10 
python train.py --method=SMAC --RGPE --source_repo=${repo}         --knobs_num=20 --workload=job --y_variable=lat --dbname=tpcc   --knobs_config=experiment/gen_knobs/oltp.json --lhs_log=${lhs_log} --lhs_num=10  
python train.py --method=MBO  --workload_map --source_repo=${repo} --knobs_num=20 --workload=job --y_variable=lat --dbname=tpcc   --knobs_config=experiment/gen_knobs/oltp.json --lhs_log=${lhs_log} --lhs_num=10 
python train.py --method=SMAC --workload_map --source_repo=${repo} --knobs_num=20 --workload=job --y_variable=lat --dbname=tpcc   --knobs_config=experiment/gen_knobs/oltp.json --lhs_log=${lhs_log} --lhs_num=10 
python train.py --method=DDPG --params=model_params/${ddpg_params} --knobs_num=20 --workload=job --y_variable=lat --dbname=tpcc   --knobs_config=experiment/gen_knobs/oltp.json --lhs_log=${lhs_log} --lhs_num=10 

Note that

  • for RGPE- methods, you should specify --RGPE --source_repo=${repo}
  • for MAP- methods, you should specify --workload_map --source_repo=${repo}
  • for FineTune-DDPG, you should specify --params=model_params/${ddpg_params}

Project Code Overview

  • autotune/tuner.py : the implemented optimization methods.
  • autotune/dbenv.py : the interacting functions with database.
  • script/train.py : the python script to start an experiment.
  • script/experiment/gen_knob : the knob importance ranking files generated by different methods.
Owner
DAIR Lab
Data and Intelligence Research (DAIR) Lab @ Peking University
DAIR Lab
It helps user to learn Pick-up lines and share if he has a better one

Pick-up-Lines-Generator(Open Source) It helps user to learn Pick-up lines Share and Add one or many to the DataBase Unique SQLite DataBase AI Undercon

knock_nott 0 May 04, 2022
A crossplatform menu bar application using mpv as DLNA Media Renderer.

Macast Chinese README A menu bar application using mpv as DLNA Media Renderer. Install MacOS || Windows || Debian Download link: Macast release latest

4.4k Jan 01, 2023
Source code for The Power of Many: A Physarum Swarm Steiner Tree Algorithm

Physarum-Swarm-Steiner-Algo Source code for The Power of Many: A Physarum Steiner Tree Algorithm Code implements ideas from the following papers: Sher

Sheryl Hsu 2 Mar 28, 2022
Bayesian Optimization using GPflow

Note: This package is for use with GPFlow 1. For Bayesian optimization using GPFlow 2 please see Trieste, a joint effort with Secondmind. GPflowOpt GP

GPflow 257 Dec 26, 2022
FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

FairEdit Relevent Publication FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

5 Feb 04, 2022
StyleGAN - Official TensorFlow Implementation

StyleGAN — Official TensorFlow Implementation Picture: These people are not real – they were produced by our generator that allows control over differ

NVIDIA Research Projects 13.1k Jan 09, 2023
This is a repo of basic Machine Learning!

Basic Machine Learning This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resource

Ekram Asif 53 Dec 31, 2022
Code for the Active Speakers in Context Paper (CVPR2020)

Active Speakers in Context This repo contains the official code and models for the "Active Speakers in Context" CVPR 2020 paper. Before Training The c

43 Oct 14, 2022
Optimal Camera Position for a Practical Application of Gaze Estimation on Edge Devices,

Optimal Camera Position for a Practical Application of Gaze Estimation on Edge Devices, Linh Van Ma, Tin Trung Tran, Moongu Jeon, ICAIIC 2022 (The 4th

Linh 11 Oct 10, 2022
FPSAutomaticAiming——基于YOLOV5的FPS类游戏自动瞄准AI

FPSAutomaticAiming——基于YOLOV5的FPS类游戏自动瞄准AI 声明: 本项目仅限于学习交流,不可用于非法用途,包括但不限于:用于游戏外挂等,使用本项目产生的任何后果与本人无关! 简介 本项目基于yolov5,实现了一款FPS类游戏(CF、CSGO等)的自瞄AI,本项目旨在使用现

Fabian 246 Dec 28, 2022
UNAVOIDS: Unsupervised and Nonparametric Approach for Visualizing Outliers and Invariant Detection Scoring

UNAVOIDS: Unsupervised and Nonparametric Approach for Visualizing Outliers and Invariant Detection Scoring Code Summary aggregate.py: this script aggr

1 Dec 28, 2021
Do you like Quick, Draw? Well what if you could train/predict doodles drawn inside Streamlit? Also draws lines, circles and boxes over background images for annotation.

Streamlit - Drawable Canvas Streamlit component which provides a sketching canvas using Fabric.js. Features Draw freely, lines, circles, boxes and pol

Fanilo Andrianasolo 325 Dec 28, 2022
The code written during my Bachelor Thesis "Classification of Human Whole-Body Motion using Hidden Markov Models".

This code was written during the course of my Bachelor thesis Classification of Human Whole-Body Motion using Hidden Markov Models. Some things might

Matthias Plappert 14 Dec 06, 2022
Source code for deep symbolic optimization.

Update July 10, 2021: This repository now supports an additional symbolic optimization task: learning symbolic policies for reinforcement learning. Th

Brenden Petersen 290 Dec 25, 2022
Deep generative modeling for time-stamped heterogeneous data, enabling high-fidelity models for a large variety of spatio-temporal domains.

Neural Spatio-Temporal Point Processes [arxiv] Ricky T. Q. Chen, Brandon Amos, Maximilian Nickel Abstract. We propose a new class of parameterizations

Facebook Research 75 Dec 19, 2022
CAST: Character labeling in Animation using Self-supervision by Tracking

CAST: Character labeling in Animation using Self-supervision by Tracking (Published as a conference paper at EuroGraphics 2022) Note: The CAST paper c

15 Nov 18, 2022
All supplementary material used by me while TA-ing CS3244: Machine Learning

CS3244-Tutorial-Material All supplementary material used by me while TA-ing CS3244: Machine Learning at NUS School of Computing. What is this? I teach

Rishabh Anand 18 Sep 23, 2022
Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

Kimio Kuramitsu 1 Dec 13, 2021
Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

video_lie_detector_using_xgboost a video lie detector using OpenFace and xgboost

2 Jan 11, 2022
Group-Free 3D Object Detection via Transformers

Group-Free 3D Object Detection via Transformers By Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong. This repo is the official implementation of "Group-

Ze Liu 213 Dec 07, 2022