Hierarchical Time Series Forecasting using Prophet

Overview

htsprophet

Hierarchical Time Series Forecasting using Prophet

Credit to Rob J. Hyndman and research partners as much of the code was developed with the help of their work.

https://www.otexts.org/fpp

https://robjhyndman.com/publications/

Credit to Facebook and their fbprophet package.

https://facebookincubator.github.io/prophet/

It was my intention to make some of the code look similar to certain sections in the Prophet and (Hyndman's) hts packages.

Downloading

  1. pip install htsprophet

If you'd like to just skip to coding with the package, runHTS.py should help you with that, but if you like reading, the following should help you understand how I built htsprophet and how it works.

Part I: The Data

I originally used Redfin traffic data to build this package.

I pulled the data so that date was in the first column, my layers were the middle columns, and the number I wanted to forecast was in the last column.

I made a function called makeWeekly() , that rolls up your data into the weekly level. It’s not a necessary function, it was mostly just convenient for me.

So the data looked like this:

Date Platform Medium BusinessMarket Sessions
1100 B.C. Stone Tablet Land Birmingham 23234
... Car Phone Air Auburn 2342
... Sea Evanston 233
... Seattle 445
... 46362

I then ran my orderHier() function with just this dataframe as its input.

NOTE: you cannot run this function if you have more than 4 columns in the middle (in between Date and Sessions for ex.)

To run this function, you specify the data, and how you want your middle columns to be ordered.

So orderHier(data, 2, 1, 3) means you want the second column after date to be the first level of the hierarchy.

Our example would look like this:

Alt text

Date Total Land Air Sea Land_Stone tablet Land_Car Phone Air_Stone tablet
1100 B.C. 24578 23135 555 888 23000 135 550
1099 B.C. 86753 86654 44 55 2342 84312 22
... ... ... ... ... ... ... ...
*All numbers represent the number of sessions for each node in the Hierarchy

If you have more than 4 categorical columns, then you must get the data in this format on your own while also producing the list of lists called nodes

Nodes – describes the structure of the hierarchy.

Here it would equal [[3],[2,2,2],[4,4,4,4,4,4]]

There are 3 nodes in the first level: Land, Air, Sea.

There are 2 children for each of those nodes: Stone tablet, Car phone.

There are 4 business markets for each of those nodes: Tokyo, Hamburg etc.

If you use the orderHier function, nodes will be the second output of the function.

Part II: Prophet Inputs

Anything that you would specify in Prophet you can specify in hts().

It’s flexible and will allow you to input a dataframe of values for inputs like cap, capF, and changepoints.

All of these inputs are specified when you call hts, and after that you just let it run.

The following is the description of inputs and outputs for hts as well as the specified defaults:

Parameters
----------------
 y - dataframe of time-series data
           Layout:
               0th Col - Time instances
               1st Col - Total of TS
               2nd Col - One of the children of the Total TS
               3rd Col - The other child of the Total TS
               ...
               ... Rest of the 1st layer
               ...
               Xth Col - First Child of the 2nd Col
               ...
               ... All of the 2nd Col's Children
               ...
               X+Yth Col - First Child of the 3rd Col
               ...
               ..
               .   And so on...

 h - number of step ahead forecasts to make (int)

 nodes - a list or list of lists of the number of child nodes at each level
 Ex. if the hierarchy is one total with two child nodes that comprise it, the nodes input would be [2]
 
 method – (String)  the type of hierarchical forecasting method that the user wants to use. 
            Options:
            "OLS" - optimal combination using ordinary least squares (Default), 
            "WLSS" - optimal combination using structurally weighted least squares, 
            "WLSV" - optimal combination using variance weighted least squares, 
            "FP" - forcasted proportions (top-down)
            "PHA" - proportions of historical averages (top-down)
            "AHP" - average historical proportions (top-down)
            "BU" - bottom-up (simple addition)
            "CVselect" - select which method is best for you based on 3-fold Cross validation (longer run time)
 
 freq - (Time Frequency) input for the forecasting function of Prophet 
 
 include_history - (Boolean) input for the forecasting function of Prophet
 
 transform - (None or "BoxCox") Do you want to transform your data before fitting the prophet function? If yes, type "BoxCox"
            
 cap - (Dataframe or Constant) carrying capacity of the input time series.  If it is a dataframe, then
                               the number of columns must equal len(y.columns) - 1
                               
 capF - (Dataframe or Constant) carrying capacity of the future time series.  If it is a dataframe, then
                                the number of columns must equal len(y.columns) - 1
 
 changepoints - (DataFrame or List) changepoints for the model to consider fitting. If it is a dataframe, then
                                    the number of columns must equal len(y.columns) - 1
 
 n_changepoints - (constant or list) changepoints for the model to consider fitting. If it is a list, then
                                     the number of items must equal len(y.columns) - 1
 skipFitting - (Boolean) if y is already a dictionary of dataframes, set this to True, and DO NOT run with method = "cvSelect" or transform = "BoxCox"
 
 numThreads - (int) number of threads you want to use when running cvSelect. Note: 14 has shown to decrease runtime by 10 percent 
 
 All other inputs - see Prophet
 
Returns
-----------------
 ynew - a dictionary of DataFrames with predictions, seasonalities and trends that can all be plotted

Don’t forget to specify the frequency if you’re not using daily data.

All other functions should be self-explanatory.

Part III: Room For Improvement

  1. Prediction intervals
Owner
Collin Rooney
Collin Rooney
Breast-Cancer-Classification - Using SKLearn breast cancer dataset which contains 569 examples and 32 features classifying has been made with 6 different algorithms

Breast-Cancer-Classification - Using SKLearn breast cancer dataset which contains 569 examples and 32 features classifying has been made with 6 different algorithms

Mert Sezer Ardal 1 Jan 31, 2022
OptaPy is an AI constraint solver for Python to optimize planning and scheduling problems.

OptaPy is an AI constraint solver for Python to optimize the Vehicle Routing Problem, Employee Rostering, Maintenance Scheduling, Task Assignment, School Timetabling, Cloud Optimization, Conference S

OptaPy 208 Dec 27, 2022
Bayesian optimization in JAX

Bayesian optimization in JAX

Predictive Intelligence Lab 26 May 11, 2022
Machine Learning for Time-Series with Python.Published by Packt

Machine-Learning-for-Time-Series-with-Python Become proficient in deriving insights from time-series data and analyzing a model’s performance Links Am

Packt 124 Dec 28, 2022
A project based example of Data pipelines, ML workflow management, API endpoints and Monitoring.

MLOps template with examples for Data pipelines, ML workflow management, API development and Monitoring.

Utsav 33 Dec 03, 2022
A library to generate synthetic time series data by easy-to-use factors and generator

timeseries-generator This repository consists of a python packages that generates synthetic time series dataset in a generic way (under /timeseries_ge

Nike Inc. 87 Dec 20, 2022
Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

Amazon Web Services - Labs 3.3k Jan 03, 2023
A classification model capable of accurately predicting the price of secondhand cars

The purpose of this project is create a classification model capable of accurately predicting the price of secondhand cars. The data used for model building is open source and has been added to this

Akarsh Singh 2 Sep 13, 2022
Code for the TCAV ML interpretability project

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) Been Kim, Martin Wattenberg, Justin Gilmer, C

552 Dec 27, 2022
using Machine Learning Algorithm to classification AppleStore application

AppleStore-classification-with-Machine-learning-Algo- using Machine Learning Algorithm to classification AppleStore application. the first step : 1: p

Mohammed Hussien 2 May 02, 2022
LiuAlgoTrader is a scalable, multi-process ML-ready framework for effective algorithmic trading

LiuAlgoTrader is a scalable, multi-process ML-ready framework for effective algorithmic trading. The framework simplify development, testing, deployment, analysis and training algo trading strategies

Amichay Oren 458 Dec 24, 2022
NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

NCVX (NonConVeX): A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning.

SUN Group @ UMN 28 Aug 03, 2022
Management of exclusive GPU access for distributed machine learning workloads

TensorHive is an open source tool for managing computing resources used by multiple users across distributed hosts. It focuses on granting

Paweł Rościszewski 131 Dec 12, 2022
Hierarchical Time Series Forecasting using Prophet

htsprophet Hierarchical Time Series Forecasting using Prophet Credit to Rob J. Hyndman and research partners as much of the code was developed with th

Collin Rooney 131 Dec 02, 2022
🎛 Distributed machine learning made simple.

🎛 lazycluster Distributed machine learning made simple. Use your preferred distributed ML framework like a lazy engineer. Getting Started • Highlight

Machine Learning Tooling 44 Nov 27, 2022
Machine-Learning with python (jupyter)

Machine-Learning with python (jupyter) 머신러닝 야학 작심 10일과 쥬피터 노트북 기반 데이터 사이언스 시작 들어가기전 https://nbviewer.org/ 페이지를 통해서 쥬피터 노트북 내용을 볼 수 있다. 위 페이지에서 현재 레포 기

HyeonWoo Jeong 1 Jan 23, 2022
AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker

Data Science on AWS - O'Reilly Book Get the book on Amazon.com Book Outline Quick Start Workshop (4-hours) In this quick start hands-on workshop, you

Data Science on AWS 2.8k Jan 03, 2023
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Master status: Development status: Package information: TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assista

Epistasis Lab at UPenn 8.9k Jan 09, 2023
Predicting job salaries from ads - a Kaggle competition

Predicting job salaries from ads - a Kaggle competition

Zygmunt Zając 57 Oct 23, 2020
Toolss - Automatic installer of hacking tools (ONLY FOR TERMUKS!)

Tools Автоматический установщик хакерских утилит (ТОЛЬКО ДЛЯ ТЕРМУКС!) Оригиналь

14 Jan 05, 2023