Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.

Overview

GENDIS Build Status PyPI version Read The Docs Downloads

GENetic DIscovery of Shapelets

In the time series classification domain, shapelets are small subseries that are discriminative for a certain class. It has been shown that by projecting the original dataset to a distance space, where each axis corresponds to the distance to a certain shapelet, classifiers are able to achieve state-of-the-art results on a plethora of datasets.

This repository contains an implementation of GENDIS, an algorithm that searches for a set of shapelets in a genetic fashion. The algorithm is insensitive to its parameters (such as population size, crossover and mutation probability, ...) and can quickly extract a small set of shapelets that is able to achieve predictive performances similar (or better) to that of other shapelet techniques.

Installation

We currently support Python 3.5 & Python 3.6. For installation, there are two alternatives:

  1. Clone the repository https://github.com/IBCNServices/GENDIS.git and run (python3 -m) pip -r install requirements.txt
  2. GENDIS is hosted on PyPi. You can just run (python3 -m) pip install gendis to add gendis to your dist-packages (you can use it from everywhere).

Make sure NumPy and Cython is already installed (pip install numpy and pip install Cython), since that is required for the setup script.

Tutorial & Example

1. Loading & preprocessing the datasets

In a first step, we need to construct at least a matrix with timeseries (X_train) and a vector with labels (y_train). Additionally, test data can be loaded as well in order to evaluate the pipeline in the end.

import pandas as pd
# Read in the datafiles
train_df = pd.read_csv(<DATA_FILE>)
test_df = pd.read_csv(<DATA_FILE>)
# Split into feature matrices and label vectors
X_train = train_df.drop('target', axis=1)
y_train = train_df['target']
X_test = test_df.drop('target', axis=1)
y_test = test_df['target']

2. Creating a GeneticExtractor object

Construct the object. For a list of all possible parameters, and a description, please refer to the documentation in the code

from gendis.genetic import GeneticExtractor
genetic_extractor = GeneticExtractor(population_size=50, iterations=25, verbose=True, 
                                     mutation_prob=0.3, crossover_prob=0.3, 
                                     wait=10, max_len=len(X_train) // 2)

3. Fit the GeneticExtractor and construct distance matrix

shapelets = genetic_extractor.fit(X_train, y_train)
distances_train = genetic_extractor.transform(X_train)
distances_test = genetic_extractor.transform(X_test)

4. Fit ML classifier on constructed distance matrix

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
lr = LogisticRegression()
lr.fit(distances_train, y_train)

print('Accuracy = {}'.format(accuracy_score(y_test, lr.predict(distances_test))))

Example notebook

A simple example is provided in this notebook

Data

All datasets in this repository are downloaded from timeseriesclassification. Please refer to them appropriately when using any dataset.

Paper experiments

In order to reproduce the results from the corresponding paper, please check out this directory.

Tests

We provide a few doctests and unit tests. To run the doctests: python3 -m doctest -v <FILE>, where <FILE> is the Python file you want to run the doctests from. To run unit tests: nose2 -v

Contributing, Citing and Contact

If you have any questions, are experiencing bugs in the GENDIS implementation, or would like to contribute, please feel free to create an issue/pull request in this repository or take contact with me at gilles(dot)vandewiele(at)ugent(dot)be

If you use GENDIS in your work, please use the following citation:

@article{vandewiele2021gendis,
  title={GENDIS: Genetic Discovery of Shapelets},
  author={Vandewiele, Gilles and Ongenae, Femke and Turck, Filip De},
  journal={Sensors},
  volume={21},
  number={4},
  pages={1059},
  year={2021},
  publisher={Multidisciplinary Digital Publishing Institute}
}
Owner
IDLab Services
Internet and Data Lab research group from Ghent University
IDLab Services
Tangram makes it easy for programmers to train, deploy, and monitor machine learning models.

Tangram Website | Discord Tangram makes it easy for programmers to train, deploy, and monitor machine learning models. Run tangram train to train a mo

Tangram 1.4k Jan 05, 2023
Learning --> Numpy January 2022 - winter'22

Numerical-Python Numpy NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along

Shahzaneer Ahmed 0 Mar 12, 2022
A Python library for choreographing your machine learning research.

A Python library for choreographing your machine learning research.

AI2 270 Jan 06, 2023
A demo project to elaborate how Machine Learn Models are deployed on production using Flask API

This is a salary prediction website developed with the help of machine learning, this makes prediction of salary on basis of few parameters like interview score, experience test score.

1 Feb 10, 2022
QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

152 Jan 02, 2023
Warren - Stock Price Predictor

Web app to predict closing stock prices in real time using Facebook's Prophet time series algorithm with a multi-variate, single-step time series forecasting strategy.

Kumar Nityan Suman 153 Jan 03, 2023
Practical Time-Series Analysis, published by Packt

Practical Time-Series Analysis This is the code repository for Practical Time-Series Analysis, published by Packt. It contains all the supporting proj

Packt 325 Dec 23, 2022
Azure MLOps (v2) solution accelerators.

Azure MLOps (v2) solution accelerator Welcome to the MLOps (v2) solution accelerator repository! This project is intended to serve as the starting poi

Microsoft Azure 233 Jan 01, 2023
A collection of neat and practical data science and machine learning projects

Data Science A collection of neat and practical data science and machine learning projects Explore the docs » Report Bug · Request Feature Table of Co

Will Fong 2 Dec 10, 2021
fMRIprep Pipeline To Machine Learning

fMRIprep Pipeline To Machine Learning(Demo) 所有配置均在config.py文件下定义 前置环境(lilab) 各个节点均安装docker,并有fmripre的镜像 可以使用conda中的base环境(相应的第三份包之后更新) 1. fmriprep scr

Alien 3 Mar 08, 2022
Lseng-iseng eksplor Machine Learning dengan menggunakan library Scikit-Learn

Kalo dengar istilah ML, biasanya rada ambigu. Soalnya punya beberapa kepanjangan, seperti Mobile Legend, Makan Lontong, Ma**ng L*v* dan lain-lain. Tapi pada repo ini membahas Machine Learning :)

Alfiyanto Kondolele 1 Apr 06, 2022
icepickle is to allow a safe way to serialize and deserialize linear scikit-learn models

icepickle It's a cooler way to store simple linear models. The goal of icepickle is to allow a safe way to serialize and deserialize linear scikit-lea

vincent d warmerdam 24 Dec 09, 2022
MooGBT is a library for Multi-objective optimization in Gradient Boosted Trees.

MooGBT is a library for Multi-objective optimization in Gradient Boosted Trees. MooGBT optimizes for multiple objectives by defining constraints on sub-objective(s) along with a primary objective. Th

Swiggy 66 Dec 06, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 6.9k Jan 05, 2023
Diabetes Prediction with Logistic Regression

Diabetes Prediction with Logistic Regression Exploratory Data Analysis Data Preprocessing Model & Prediction Model Evaluation Model Validation: Holdou

AZİZE SULTAN PALALI 2 Oct 23, 2021
pandas, scikit-learn, xgboost and seaborn integration

pandas, scikit-learn and xgboost integration.

299 Dec 30, 2022
Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

Amazon Web Services - Labs 3.3k Jan 03, 2023
A Python implementation of GRAIL, a generic framework to learn compact time series representations.

GRAIL A Python implementation of GRAIL, a generic framework to learn compact time series representations. Requirements Python 3.6+ numpy scipy tslearn

3 Nov 24, 2021
Responsible AI Workshop: a series of tutorials & walkthroughs to illustrate how put responsible AI into practice

Responsible AI Workshop Responsible innovation is top of mind. As such, the tech industry as well as a growing number of organizations of all kinds in

Microsoft 9 Sep 14, 2022