4th place solution to datafactory challenge by Intermarché.

Last update: Mar 19, 2022

4th place solution to the Datafactory challenge by Intermarché. The objective of the challenge is to predict Intermarché's sales in the first quarter of 2019. We have the data from the previous year (2018) to train our model.

Data 💿

We have sales records for a set of (store, item) pairs, with one row per day of 2018 on which the pair had at least one sale. The data are structured as:

date	store	item	quantity
2018-01-01	1	12	1
2018-01-01	1	17	2
2018-01-01	1	22	3

We have additional tables available such as:

Product characteristics.
Store characteristics.
Product prices by store and by quarter.

Solution 🤖

The main difficulty of the challenge is identifying the days on which a store recorded no sales for a given product, because Intermarché does not provide records where the target variable (quantity) equals 0. I found that adding up to 5 zeros after a sale for a given (store, item) pair maximized the performance of my model and limited the overfitting of my aggregates.
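The zero-padding idea can be sketched with pandas as follows. This is an illustration under assumptions, not the code from the repository: the function name `add_zeros_after_sales` is hypothetical, "after a sale" is read as "on the unrecorded days that immediately follow a sale, up to the next recorded sale", and the last sale of each pair is left unpadded.

```python
import pandas as pd


def add_zeros_after_sales(sales, max_zeros=5):
    """Add up to `max_zeros` zero-quantity rows on the missing days after each sale.

    Assumption: `sales` has the columns date (datetime), store, item, quantity.
    """
    keys = ["store", "item"]
    sales = sales.sort_values(keys + ["date"]).reset_index(drop=True)

    # Number of unrecorded days between a sale and the next sale of the same pair.
    next_date = sales.groupby(keys)["date"].shift(-1)
    gap = (next_date - sales["date"]).dt.days - 1

    # Cap the padding at max_zeros; the last sale of a pair (NaN gap) gets none.
    n_fill = gap.fillna(0).clip(lower=0, upper=max_zeros).astype(int)

    # Repeat each sale row n_fill times, then shift the copies by 1, 2, ... days.
    zeros = sales.loc[sales.index.repeat(n_fill.to_numpy()), ["date"] + keys]
    offset = zeros.groupby(level=0).cumcount().to_numpy() + 1
    zeros = zeros.assign(
        date=zeros["date"] + pd.to_timedelta(offset, unit="D"),
        quantity=0,
    )

    out = pd.concat([sales, zeros], ignore_index=True)
    return out.sort_values(keys + ["date"]).reset_index(drop=True)
```

Capping the number of zeros keeps long gaps (for example, a product that is out of the assortment for months) from flooding the training set with zero rows.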

Features:

Aggregates by item / store (mean + std)
Aggregates on prices (mean)
Aggregates on the characteristics of the stores (mean)
Aggregates on product characteristics (mean)
Rolling medians over the last 9 weeks
Features on dates (weekend / holidays / day of the week)
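Two of the feature families above, the (store, item) aggregates and the date features, can be sketched as follows. The function names, the output column names, and the `holidays` argument (a caller-supplied list of dates) are hypothetical and chosen for illustration; the repository scripts may compute these features differently.

```python
import pandas as pd


def add_pair_aggregates(train, target):
    """Attach the mean and std of quantity per (store, item), computed on `train`."""
    aggs = (
        train.groupby(["store", "item"])["quantity"]
        .agg(pair_mean="mean", pair_std="std")
        .reset_index()
    )
    # Left merge: pairs unseen in `train` get NaN aggregates.
    return target.merge(aggs, on=["store", "item"], how="left")


def add_calendar_features(df, holidays=()):
    """Add day of week, weekend flag, and holiday flag from the date column."""
    out = df.copy()
    out["day_of_week"] = out["date"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = (out["day_of_week"] >= 5).astype(int)
    out["is_holiday"] = out["date"].isin(pd.to_datetime(list(holidays))).astype(int)
    return out
```

Computing the aggregates on the training rows only and merging them onto the rows to predict avoids using any information from the forecast period.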

I used LightGBM with a 3-fold cross-validation and bagging to make my predictions. I trained the model on a transformed target, quantity = log(1 + quantity). The Poisson loss helps a bit. I did not tune the model's hyperparameters.

Finally, I replaced all February and March predictions with the predictions from the second and third weeks of January.
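One possible reading of this step is sketched below. It is an interpretation, not the repository code: the reference period is assumed to be 2019-01-07 to 2019-01-20 (ISO weeks 2 and 3), and each February or March prediction is replaced by the mean reference prediction for the same (store, item, day of week). The function name `copy_january_weeks` and the `prediction` column are hypothetical.

```python
import pandas as pd


def copy_january_weeks(pred, ref_start="2019-01-07", ref_end="2019-01-20"):
    """Overwrite February and March predictions with January reference values.

    Assumption: `pred` has the columns date (datetime), store, item, prediction.
    """
    keys = ["store", "item", "dow"]
    out = pred.copy()
    out["dow"] = out["date"].dt.dayofweek

    # Mean prediction per (store, item, day of week) over the reference weeks.
    in_ref = out["date"].between(pd.Timestamp(ref_start), pd.Timestamp(ref_end))
    ref = (
        out[in_ref]
        .groupby(keys, as_index=False)["prediction"]
        .mean()
        .rename(columns={"prediction": "ref_prediction"})
    )

    out = out.merge(ref, on=keys, how="left")
    # Only overwrite rows that have a reference value available.
    late = out["date"].dt.month.isin([2, 3]) & out["ref_prediction"].notna()
    out.loc[late, "prediction"] = out.loc[late, "ref_prediction"]
    return out.drop(columns=["dow", "ref_prediction"])
```

A plausible motivation is that lag-based features become stale further into the forecast horizon, so predictions made close to the end of the training period are reused for the later months.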

I also set to 0 the predictions for (store, item, day of the week) triplets that do not have enough records in the training set.
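This filter can be sketched as follows. The threshold `min_records=3` is a hypothetical value, since the text does not state what counts as "enough", and the function name `zero_sparse_triplets` and the `prediction` column are chosen for illustration. The sketch assumes `train` contains only the recorded sales, not any added zero rows.

```python
import pandas as pd


def zero_sparse_triplets(pred, train, min_records=3):
    """Set predictions to 0 for (store, item, day of week) triplets rarely seen in train."""
    keys = ["store", "item", "dow"]

    # Count the training records per triplet.
    counts = (
        train.assign(dow=train["date"].dt.dayofweek)
        .groupby(keys)
        .size()
        .rename("n_records")
        .reset_index()
    )

    out = pred.assign(dow=pred["date"].dt.dayofweek).merge(counts, on=keys, how="left")
    # Triplets absent from train have NaN counts and are treated as 0 records.
    sparse = out["n_records"].fillna(0) < min_records
    out.loc[sparse, "prediction"] = 0.0
    return out.drop(columns=["dow", "n_records"])
```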

Run ♻️

To reproduce my results, download the data into the folder data/raw, then run the following scripts in order:

python scripts/prepare_raw_data.py
python scripts/features/aggs_items.py
python scripts/features/aggs_prices.py
python scripts/features/aggs_stores.py
python scripts/features/aggs.py 
python scripts/features/lags.py
python scripts/features/cal.py 
python scripts/make_train_test.py
python scripts/learn.py
python scripts/polish_sub.py
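The commands above can also be chained from a small Python driver, sketched below. It assumes that each script depends on the outputs of the previous ones, so the run stops at the first failure. The names `SCRIPTS`, `build_commands`, and `run_pipeline` are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Script paths in the order listed above.
SCRIPTS = [
    "scripts/prepare_raw_data.py",
    "scripts/features/aggs_items.py",
    "scripts/features/aggs_prices.py",
    "scripts/features/aggs_stores.py",
    "scripts/features/aggs.py",
    "scripts/features/lags.py",
    "scripts/features/cal.py",
    "scripts/make_train_test.py",
    "scripts/learn.py",
    "scripts/polish_sub.py",
]


def build_commands(python=sys.executable):
    """Return one argument list per script, using the given interpreter."""
    return [[python, script] for script in SCRIPTS]


def run_pipeline():
    """Run every script in order; check=True raises on the first failing step."""
    for command in build_commands():
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the repository root, after the data has been placed in data/raw, would execute the ten steps in sequence.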

License

This project is free and open-source software licensed under the MIT license.

Owner

Raphael Sourty
