Retail-Sim is python package to easily create synthetic dataset of retaile store.

Overview

Retailer's Sale Data Simulation

Retail-Sim is python package to easily create synthetic dataset of retaile store.

Simulation Model

Simulator consists of env, that generates retailer store simulated data.

Modelling PLAN

Products

Create fake products and relationship between them. Relationship between products (Cateogries, to be more precise) consists of "exchangability", "complementarity". Products have many attributes, such as

  • Base Price
  • Base Cost
  • Volume
  • Attractiveness
  • Category
  • Price elasticity
  • Relative Consumption rate
  • Loyalty

Volume implies how much satisfaction it provieds to the customer (How much of a need it subtracts). Volume is proportional to price, which can be set with vol_price_corr.

Products are discretely grouped by some category. Each category has attribute "consumption rate", "general trend", and "seasonal trend". In real life, products such as fresh food, tissues, bottled water would have high consumption rate. General trend is random linear-like trend, seasonal trend is trend of sales that has period of 1 year. In real life, product like icecream would have winter-oriented seasonal trend.

Customers

Every customer has random set of "needs". Just as real life, you might need shampoo, pair of scissors, and some spagetti souce(All of these are considered as one category) Customers will try to fill those needs. As it happens in real life, customers are encourged to buy the product that both satisfy the needs and has a high preference.

Product's Total Attractiveness

Every product comes with the Attractiveness attribute. If it has higher attractiveness, it is more likely to sell. However,

  • If the product is on discount, it will become more attractive.
  • If the product is on discount and it is advertised to be, it will become even more attractive.
  • If the product has high loyalty, it will have very high attractiveness to some customers.
  • There might be some general trend on the attractiveness.

Therefore during simulation, total attractiveness will be defined as:

$$Total = max(\text{Attractiveness} + \text{elasticity} * \text{discounted rate}, B(loyalty) * infty)$$

Customer's state transition

Customers will buy with n budget, where n is pareto distibuted among all customers. They will randomly pick a category depending on their current need distribution. After that, they will buy a product in that category, based on the products' total attractiveness. Buying that product will subtract the customer's need of that category by Volume's amount.

Owner
Corca AI
AI B2B Consulting Company
Corca AI
DenseClus is a Python module for clustering mixed type data using UMAP and HDBSCAN

DenseClus is a Python module for clustering mixed type data using UMAP and HDBSCAN. Allowing for both categorical and numerical data, DenseClus makes it possible to incorporate all features in cluste

Amazon Web Services - Labs 53 Dec 08, 2022
Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations.

Elicited Helper tools to construct probability distributions built from expert elicited data for use in monte carlo simulations. Credit to Brett Hoove

Ryan McGeehan 3 Nov 04, 2022
This is a python script to navigate and extract the FSD50K dataset

FSD50K navigator This is a script I use to navigate the sound dataset from FSK50K.

sweemeng 2 Nov 23, 2021
PyClustering is a Python, C++ data mining library.

pyclustering is a Python, C++ data mining library (clustering algorithm, oscillatory networks, neural networks). The library provides Python and C++ implementations (C++ pyclustering library) of each

Andrei Novikov 1k Jan 05, 2023
Synthetic Data Generation for tabular, relational and time series data.

An Open Source Project from the Data to AI Lab, at MIT Website: https://sdv.dev Documentation: https://sdv.dev/SDV User Guides Developer Guides Github

The Synthetic Data Vault Project 1.2k Jan 07, 2023
Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format.

Brady Law 2 Dec 01, 2021
An Indexer that works out-of-the-box when you have less than 100K stored Documents

U100KIndexer An Indexer that works out-of-the-box when you have less than 100K stored Documents. U100K means under 100K. At 100K stored Documents with

Jina AI 7 Mar 15, 2022
Fast, flexible and easy to use probabilistic modelling in Python.

Please consider citing the JMLR-MLOSS Manuscript if you've used pomegranate in your academic work! pomegranate is a package for building probabilistic

Jacob Schreiber 3k Jan 02, 2023
Binance Kline Data With Python

Binance Kline Data by seunghan(gingerthorp) reference https://github.com/binance/binance-public-data/ All intervals are supported: 1m, 3m, 5m, 15m, 30

shquant 5 Jul 13, 2022
PLStream: A Framework for Fast Polarity Labelling of Massive Data Streams

PLStream: A Framework for Fast Polarity Labelling of Massive Data Streams Motivation When dataset freshness is critical, the annotating of high speed

4 Aug 02, 2022
Pipeline and Dataset helpers for complex algorithm evaluation.

tpcp - Tiny Pipelines for Complex Problems A generic way to build object-oriented datasets and algorithm pipelines and tools to evaluate them pip inst

Machine Learning and Data Analytics Lab FAU 3 Dec 07, 2022
Package for decomposing EMG signals into motor unit firings, as used in Formento et al 2021.

EMGDecomp Package for decomposing EMG signals into motor unit firings, created for Formento et al 2021. Based heavily on Negro et al, 2016. Supports G

13 Nov 01, 2022
PATC: Introduction to Big Data Analytics. Practical Data Analytics for Solving Real World Problems

PATC: Introduction to Big Data Analytics. Practical Data Analytics for Solving Real World Problems

1 Feb 07, 2022
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms

MatrixProfile MatrixProfile is a Python 3 library, brought to you by the Matrix Profile Foundation, for mining time series data. The Matrix Profile is

Matrix Profile Foundation 302 Dec 29, 2022
ped-crash-techvol: Texas Ped Crash Tech Volume Pack

ped-crash-techvol: Texas Ped Crash Tech Volume Pack In conjunction with the Final Report "Identifying Risk Factors that Lead to Increase in Fatal Pede

Network Modeling Center; Center for Transportation Research; The University of Texas at Austin 2 Sep 28, 2022
Udacity - Data Analyst Nanodegree - Project 4 - Wrangle and Analyze Data

WeRateDogs Twitter Data from 2015 to 2017 Udacity - Data Analyst Nanodegree - Project 4 - Wrangle and Analyze Data Table of Contents Introduction Proj

Keenan Cooper 1 Jan 12, 2022
[CVPR2022] This repository contains code for the paper "Nested Collaborative Learning for Long-Tailed Visual Recognition", published at CVPR 2022

Nested Collaborative Learning for Long-Tailed Visual Recognition This repository is the official PyTorch implementation of the paper in CVPR 2022: Nes

Jun Li 65 Dec 09, 2022
Top 50 best selling books on amazon

It's a dashboard that shows the detailed information about each book in the top 50 best selling books on amazon over the last ten years

Nahla Tarek 1 Nov 18, 2021
Example Of Splunk Search Query With Python And Splunk Python SDK

SSQAuto (Splunk Search Query Automation) Example Of Splunk Search Query With Python And Splunk Python SDK installation: ➜ ~ git clone https://github.c

AmirHoseinTangsiriNET 1 Nov 14, 2021
Very basic but functional Kakuro solver written in Python.

kakuro.py Very basic but functional Kakuro solver written in Python. It uses a reduction to exact set cover and Ali Assaf's elegant implementation of

Louis Abraham 4 Jan 15, 2022