A library to generate synthetic time series data by easy-to-use factors and generator

Last update: Dec 20, 2022

Overview

timeseries-generator

This repository consists of a Python package that generates synthetic time series datasets in a generic way (under /timeseries_generator) and demo notebooks showing how to generate synthetic time series data (under /examples). The goal is to have non-sensitive data available to demo solutions and to test the effectiveness of those solutions and/or algorithms. To test an algorithm, you want time series that contain different kinds of trends. The Python package should help create different kinds of time series while remaining maintainable.

`timeseries_generator` package

For this package, it is assumed that a time series is composed of a base value multiplied by many factors.

ts = base_value * factor1 * factor2 * ... * factorN + Noiser

These factors can be anything from random noise and linear trends to seasonality. The factors can affect different features. For example, some features in your time series may have a seasonal component, while others do not.

Different factors are represented by different classes, which inherit from the BaseFactor class. Factor classes are the input for the Generator class, which creates a dataframe containing the features, the base value, all the factors acting on the base value, and the final factor and value.
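The multiplicative formula above can be checked by hand on a few days of data. The following is a minimal sketch in plain Python, not the package's API: the factor values, the 5-day window and the noise scale are all assumptions chosen for illustration.

```python
import random

base_value = 100.0
n_days = 5

# Illustrative factors (assumed values, not produced by the package):
linear_trend = [1.0 + 0.1 * day for day in range(n_days)]  # intercept 1.0, slope 0.1 per day
weekend = [1.0, 1.0, 1.0, 1.5, 1.5]                        # the last two days are a weekend

# Additive noise plays the role of "Noiser" in the formula.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n_days)]

# ts = base_value * factor1 * factor2 + Noiser
factorized = [base_value * lt * wk for lt, wk in zip(linear_trend, weekend)]
ts = [value + eps for value, eps in zip(factorized, noise)]

for day in range(n_days):
    print(day, round(factorized[day], 2), round(ts[day], 2))
```

Factors multiply, so a weekend uplift of 1.5 scales whatever level the trend has already reached, while the noise is added afterwards and does not scale with the series.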

Core concept

Generator: a Python class that generates the time series. A generator holds a list of factors and a noiser. By overlaying the factors and the noiser, the generator produces a customized time series.
Factor: a Python class that generates trend, seasonality, holiday effects, etc. Factors take effect by multiplying the base value of the generator.
Noiser: a Python class that generates time series noise. A noiser takes effect by being added on top of the "factorized" time series. The formula ts = base_value * factor1 * factor2 * ... * factorN + Noiser summarizes these concepts.
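To make the division of roles concrete, here is a toy re-implementation of the three concepts in plain Python. It is only a sketch of the design described above: the class names ToyLinearTrend, ToyWhiteNoise and ToyGenerator, their parameters, and the "increase per year" meaning of coef are hypothetical and are not the package's actual classes or semantics.

```python
import random
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ToyLinearTrend:
    """Hypothetical factor: offset + coef * (years elapsed since the first date)."""
    coef: float
    offset: float

    def values(self, dates: List[date]) -> List[float]:
        start = dates[0]
        return [self.offset + self.coef * (d - start).days / 365 for d in dates]


@dataclass
class ToyWhiteNoise:
    """Hypothetical noiser: zero-mean Gaussian noise that is added, not multiplied."""
    stdev: float
    seed: int = 0

    def values(self, dates: List[date]) -> List[float]:
        rng = random.Random(self.seed)
        return [rng.gauss(0.0, self.stdev) for _ in dates]


@dataclass
class ToyGenerator:
    """Hypothetical generator: multiplies all factors onto the base value, then adds noise."""
    dates: List[date]
    factors: list = field(default_factory=list)
    noiser: Optional[ToyWhiteNoise] = None
    base_value: float = 1.0

    def generate(self) -> List[float]:
        series = [self.base_value] * len(self.dates)
        for factor in self.factors:
            series = [s * f for s, f in zip(series, factor.values(self.dates))]
        if self.noiser is not None:
            series = [s + e for s, e in zip(series, self.noiser.values(self.dates))]
        return series


dates = [date(2020, 1, 1) + timedelta(days=i) for i in range(20)]
gen = ToyGenerator(dates=dates, factors=[ToyLinearTrend(coef=2.0, offset=1.0)], base_value=10.0)
clean = gen.generate()

gen.noiser = ToyWhiteNoise(stdev=0.05)
noisy = gen.generate()
```

Keeping the noiser separate from the factor list mirrors the formula: factors compose by multiplication in any order, while noise is applied once at the end.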

Built-in Factors

LinearTrend: gives a linear trend based on the input slope and intercept
CountryYearlyTrend: gives a yearly market cap factor based on GDP per capita
EUEcoTrendComponents: gives a factor that changes monthly, based on public EU industry product data
HolidayTrendComponents: simulates the holiday sale peak. It adapts the holiday dates to each country
BlackFridaySaleComponents: simulates the Black Friday sale event
WeekendTrendComponents: more sales at weekends than on weekdays
FeatureRandFactorComponents: sets up different sale amounts for different stores and different products
ProductSeasonTrendComponents: simulates season-sensitive product sales. In this example code, we have 3 different types of product:
- winter jacket: inverse-proportional to the temperature, more sales in winter
- basketball top: proportional to the temperature, more sales in summer
- Yoga Mat: temperature insensitive
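The three product types above can be illustrated with a small season-sensitive factor. This sketch does not use the package: the cosine "temperature" proxy, the strength parameter and both function names are assumptions, and a linear decrease stands in for "inverse-proportional".

```python
import math
from datetime import date


def seasonal_temperature(day: date) -> float:
    """Toy temperature proxy in [0, 1]: 0 on 1 January, close to 1 in early July.

    Assumes a northern-hemisphere climate and a 365-day cycle.
    """
    day_of_year = day.timetuple().tm_yday
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * (day_of_year - 1) / 365.0))


def product_season_factor(product: str, day: date, strength: float = 0.5) -> float:
    """Multiplicative factor around 1.0 that depends on product type and season."""
    t = seasonal_temperature(day)
    if product == "winter jacket":
        return 1.0 + strength * (1.0 - 2.0 * t)  # highest when it is cold
    if product == "basketball top":
        return 1.0 + strength * (2.0 * t - 1.0)  # highest when it is warm
    return 1.0                                   # e.g. Yoga Mat: temperature insensitive


for product in ["winter jacket", "basketball top", "Yoga Mat"]:
    january = product_season_factor(product, date(2021, 1, 1))
    july = product_season_factor(product, date(2021, 7, 2))
    print(product, round(january, 2), round(july, 2))
```

Because the result is a factor centred on 1.0, it can be multiplied onto any base value without changing the average level over a full year.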

Installation

pip install timeseries-generator

Usage

from timeseries_generator import LinearTrend, Generator, WhiteNoise, RandomFeatureFactor
import pandas as pd

# set up a linear trend
lt = LinearTrend(coef=2.0, offset=1., col_name="my_linear_trend")
g = Generator(factors={lt}, features=None, date_range=pd.date_range(start="01-01-2020", end="01-20-2020"))
g.generate()
g.plot()

# update by adding some white noise to the generator
wn = WhiteNoise(stdev_factor=0.05)
g.update_factor(wn)
g.generate()
g.plot()

Example Notebooks

We currently have 2 example notebooks available:

generate_stationary_process: Good for introducing the basics of the timeseries_generator. Shows how to apply simple linear trends and how to introduce features and labels, as well as random noise.
use_external_factors: Goes more into detail and shows how to use the external_factors submodule. Shows how to create seasonal trends.

Web based prototyping UI

We also use Streamlit to build a web-based UI that demonstrates how to use this package to generate synthetic time series data interactively.

streamlit run examples/streamlit/app.py

License

This package is released under the Apache License, Version 2.0

Comments

Time series data augmentation

Is there a code example that shows how to increase the amount of series data by adding slightly modified copies of existing time series, or by creating new synthetic series from existing data?

opened by YAYAYru 0

KeyError: 'country'

From the following code,

import pandas as pd
from timeseries_generator import HolidayFactor, LinearTrend, Generator

lt = LinearTrend(coef=2.0, offset=1., col_name="my_linear_trend")

g: Generator = Generator(factors={lt}, features=None, date_range=pd.date_range(start="01-01-2020", end="01-01-2021"))

holiday_factor = HolidayFactor(
    country_feature_name="country",
)
g.add_factor(holiday_factor)
g.generate()

I get the error. I am not sure this is expected behavior.

File /usr/local/Caskroom/miniconda/base/envs/tf/lib/python3.9/site-packages/pandas/core/frame.py:10083, in DataFrame.merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
...
-> 1849     raise KeyError(key)
   1851 # Check for duplicates
   1852 if values.ndim > 1:

KeyError: 'country'

opened by twobitunicorn 0

[Feature request] Customizable feature combinations

Hi team, thanks for the useful library! I wonder if you'd be open to this idea:

I would like to be able to:

- Set up categorizing features (let's say, for illustration, CATEGORY=[footwear, t-shirts, socks], SIZE=[S, M, L, US-Mens-8, US-Womens-6]) and define Factors on them
- Generate time series with more restricted feature combinations than the outer product (again for illustration, "t-shirt sizes for t-shirts, shoe sizes for footwear")

Today, it seems like Generator.generate() hard-codes the assumption that time-series should be generated for the product of all provided feature values.

It'd be helpful if, instead, we could have the option of customizing this join to limit down generated combinations?

Some options I can think of:

- Leave the library as-is: users generate the full outer product and limit down what they want in post-processing.
  This seems possible already, but very RAM-intensive if your desired combinations are sparse?
- Accept an optional dataframe of factor combinations as a parameter to the generate() method.
  Gives full flexibility over which combinations are kept / ignored, without assuming any particular rigid hierarchies between features.
  ...But might need to do a bit of validation to protect against user errors? May not be super easy to use without some documented examples / functions to generate the dataframe.
- Some more complex API for feature configuration that accommodates specifying valid/invalid feature combinations.
  Might be nicer for usability, but difficult to make general: e.g. a straightforward hierarchy could be represented as a nested dict, but in practice many applications have multiple intersecting views of product category information, e.g. brand, type, target segment, etc.
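
The difference between the full outer product and an explicit list of allowed combinations can be sketched in plain Python. This is not the package's API and not a proposed implementation: the valid_sizes mapping (including the sizes assumed for socks) and the generate_rows helper are hypothetical.

```python
from itertools import product

features = {
    "category": ["footwear", "t-shirts", "socks"],
    "size": ["S", "M", "L", "US-Mens-8", "US-Womens-6"],
}

# What generating for "the product of all provided feature values" amounts to.
full_product = list(product(features["category"], features["size"]))

# Hypothetical allow-list: each category only gets the sizes that make sense for it.
valid_sizes = {
    "footwear": ["US-Mens-8", "US-Womens-6"],
    "t-shirts": ["S", "M", "L"],
    "socks": ["S", "M", "L"],  # assumption made for this example
}
allowed = [(category, size) for category, sizes in valid_sizes.items() for size in sizes]


def generate_rows(combinations, n_days, base_value=1.0):
    """Hypothetical helper: one row per (combination, day), built only for the given combinations."""
    return [
        {"day": day, "category": category, "size": size, "value": base_value}
        for category, size in combinations
        for day in range(n_days)
    ]


rows = generate_rows(allowed, n_days=3)
print(len(full_product), len(allowed), len(rows))  # 15 8 24
```

Building the allowed pairs directly, instead of filtering the full product afterwards, avoids materializing combinations that would be thrown away, which is the RAM concern raised in the first option.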
opened by athewsey 1

Generate hourly data

First of all, thank you for making this repository public! I enjoy its ease of use and the built-in factors.

Problem description

I'm currently trying to generate revenue data for a bar/restaurant on an hourly basis. As far as I can see, the timeseries-generator only supports generating one data point per day, not per hour.

I tried to generate hourly data like g = Generator(factors={lt}, features=None, date_range=pd.date_range(start='15/9/2021', end='30/9/2021', freq='h')) which didn't work.

Potential solution

Add the possibility to generate hourly data too. If this is a promising idea in your opinion, I'm willing to contribute to the implementation.
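
As a rough illustration of what hourly output could look like, the sketch below builds an hourly index by hand and applies an hour-of-day factor. It is independent of the package and says nothing about how the package would implement this: the revenue profile, the base value and the function name are assumptions, and the dates are read day-first as 15 to 30 September 2021.

```python
from datetime import datetime, timedelta

start = datetime(2021, 9, 15)
end = datetime(2021, 9, 30)

# Hourly timestamps from start to end, both inclusive.
n_hours = int((end - start) / timedelta(hours=1)) + 1
timestamps = [start + timedelta(hours=h) for h in range(n_hours)]


def hour_of_day_factor(ts: datetime) -> float:
    """Hypothetical bar/restaurant profile: busy evenings, a lunch bump, quiet nights."""
    if 18 <= ts.hour <= 23:
        return 2.0
    if 11 <= ts.hour < 14:
        return 1.3
    if ts.hour < 8:
        return 0.1
    return 1.0


base_value = 50.0  # assumed average hourly revenue
revenue = {ts: base_value * hour_of_day_factor(ts) for ts in timestamps}

print(len(timestamps), revenue[datetime(2021, 9, 15, 20)], revenue[datetime(2021, 9, 15, 4)])
```

An hour-of-day factor composes with daily factors in the same multiplicative way: a daily value such as a weekend uplift would simply be repeated for each hour of that day.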

Thank you in advance!

opened by nileger 1

Releases(v0.1.0)

v0.1.0 (Jul 20, 2021)

First release of time series generators, including:

- base factor
- linear trend factor
- sinusoidal factor
- white noise factor
- random factor
- holiday factor
- weekday factor
- country GDP factor
- EU industry index factor

Examples

- notebooks which include some simple examples
- streamlit dashboard


Owner

Nike Inc.
