Meli Data Challenge 2021 - First Place Solution

Overview

Meli Data Challenge 2021 - First Place Solution

My solution for the Meli Data Challenge 2021, first place in both public and private leaderboards.

The Model

My final model is an ensemble combining recurrent neural networks and XGBoost regressors. Neural networks are trained to predict the stock days probability distribution using the RPS as loss function. XGBoost regressors are trained to predict stock days using different objectives, here the intuition behind this:

  • MSE loss: the regressor trained with this loss will output values close to the expected mean.
  • Pseudo-Huber loss: an alternative for the MAE loss, this regressor outputs values close to the expected median.
  • Quantile loss: 11 regressors are trained using a quantile loss with alpha 0, 0.1, 0.2, ..., 1. This helps to build the final probability distribution.

The outputs of all these level-0 models are concatenated to train a feedforward neural network with the RPS as loss function.

diagram

The last 30 days of the train dataset are used to generate the labels and the target stock input. The remaining 29 days are used to generate the time series input.

The train/validation split is done at a sku level:

  • For level-0 models: 450000 sku's are used for training and the rest for validation.
  • For the level-1 model: the sku's used for training level-0 models are removed from the dataset and the remaining sku's are split again into train/validation.

Once all models are trained, the last 29 days of the train dataset and the provided target stock values are used as input to generate the submission.

Disclaimer: the entire solution lacks some fine tuning since I came up with this little ensemble monster towards the end of the competition. I didn't have the time to fine-tune each model (there are technically 16 models to tune if we consider each quantile regressor as an independent model).

How to run the solution

Requirements

  • TensorFlow v2.
  • Pandas.
  • Numpy.
  • Scikit-learn.

CUDA drivers and a CUDA-compatible GPU is required (I didn't have the time to test this on a CPU).

Some scripts require up to 30GB of RAM (again, I didn't have the time to implement a more memory-efficient solution).

The solution was tested on Ubuntu 20.04 with Python 3.8.10.

Downloading the dataset

Download the dataset files from https://ml-challenge.mercadolibre.com/downloads and put them into the dataset/ directory.

On linux, you can do that by running:

cd dataset && wget \
https://meli-data-challenge.s3.amazonaws.com/2021/test_data.csv \
https://meli-data-challenge.s3.amazonaws.com/2021/train_data.parquet \
https://meli-data-challenge.s3.amazonaws.com/2021/items_static_metadata_full.jl

Running the scripts

All-in-one script

A convenient script to run the entire solution is provided:

cd src
./run-solution.sh

Note: the entire process may take more than 3 hours to run.

Step by step

If you find trouble running the al-in-one script, you can run the solution step by step following the instructions bellow:

cd into the src directory:

cd src

Extract time series from the dataset:

python3 ./preprocessing/extract-time-series.py

Generate a supervised learning dataset:

python3 ./preprocessing/generate-sl-dataset.py

Train all level-0 models:

python3 ./train-all.py

Train the level-1 ensemble:

python3 ./train-ensemble.py

Generate the submission file and gzip it:

python3 ./generate-submission.py && gzip ./submission.csv

Utility scripts

The training_scripts directory contains some scripts to train each model separately, example usage:

python3 ./training_scripts/train-lstm.py
Owner
Matias Moreyra
Electronics Engineer, Software Developer.
Matias Moreyra
3D detection and tracking viewer (visualization) for kitti & waymo dataset

3D detection and tracking viewer (visualization) for kitti & waymo dataset

222 Jan 08, 2023
Official Implementation of HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation

HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation by Lukas Hoyer, Dengxin Dai, and Luc Van Gool [Arxiv] [Paper] Overview Unsup

Lukas Hoyer 149 Dec 28, 2022
Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world of scientific Python packages (numpy, scipy, matplotlib).

Crab - A Recommendation Engine library for Python Crab is a flexible, fast recommender engine for Python that integrates classic information filtering r

python-recsys 1.2k Dec 21, 2022
Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Text2Music Emotion Embedding Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings Reference Emotion Embedding Spaces for Matching

Minz Won 50 Dec 05, 2022
Accelerating BERT Inference for Sequence Labeling via Early-Exit

Sequence-Labeling-Early-Exit Code for ACL 2021 paper: Accelerating BERT Inference for Sequence Labeling via Early-Exit Requirement: Please refer to re

李孝男 23 Oct 14, 2022
CRF-RNN for Semantic Image Segmentation - PyTorch version

This repository contains the official PyTorch implementation of the "CRF-RNN" semantic image segmentation method, published in the ICCV 2015

Sadeep Jayasumana 170 Dec 13, 2022
Official repository for Natural Image Matting via Guided Contextual Attention

GCA-Matting: Natural Image Matting via Guided Contextual Attention The source codes and models of Natural Image Matting via Guided Contextual Attentio

Li Yaoyi 349 Dec 26, 2022
Emotion classification of online comments based on RNN

emotion_classification Emotion classification of online comments based on RNN, the accuracy of the model in the test set reaches 99% data: Large Movie

1 Nov 23, 2021
Implementation of trRosetta and trDesign for Pytorch, made into a convenient package

trRosetta - Pytorch (wip) Implementation of trRosetta and trDesign for Pytorch, made into a convenient package

Phil Wang 67 Dec 17, 2022
HuSpaCy: industrial-strength Hungarian natural language processing

HuSpaCy: Industrial-strength Hungarian NLP HuSpaCy is a spaCy model and a library providing industrial-strength Hungarian language processing faciliti

HuSpaCy 120 Dec 14, 2022
Implementation of Shape Generation and Completion Through Point-Voxel Diffusion

Shape Generation and Completion Through Point-Voxel Diffusion Project | Paper Implementation of Shape Generation and Completion Through Point-Voxel Di

Linqi Zhou 103 Dec 29, 2022
A curated list of neural network pruning resources.

A curated list of neural network pruning and related resources. Inspired by awesome-deep-vision, awesome-adversarial-machine-learning, awesome-deep-learning-papers and Awesome-NAS.

Yang He 1.7k Jan 09, 2023
face property detection pytorch

This is the face property train code of project face-detection-project

i am x 2 Oct 18, 2021
SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer

SlideGraph+: Whole Slide Image Level Graphs to Predict HER2 Status in Breast Cancer A novel graph neural network (GNN) based model (termed SlideGraph+

28 Dec 24, 2022
Answering Open-Domain Questions of Varying Reasoning Steps from Text

This repository contains the authors' implementation of the Iterative Retriever, Reader, and Reranker (IRRR) model in the EMNLP 2021 paper "Answering Open-Domain Questions of Varying Reasoning Steps

26 Dec 22, 2022
SIR model parameter estimation using a novel algorithm for differentiated uniformization.

TenSIR Parameter estimation on epidemic data under the SIR model using a novel algorithm for differentiated uniformization of Markov transition rate m

The Spang Lab 4 Nov 30, 2022
Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution

PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution [arXiv 2021].

Christoph Reich 122 Dec 12, 2022
Official Code for "Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning"

CMSF Official Code for "Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning" Requirements Python = 3.7.6 PyTorch

4 Nov 25, 2022
Normalization Matters in Weakly Supervised Object Localization (ICCV 2021)

Normalization Matters in Weakly Supervised Object Localization (ICCV 2021) 99% of the code in this repository originates from this link. ICCV 2021 pap

Jeesoo Kim 10 Feb 01, 2022
x-transformers-paddle 2.x version

x-transformers-paddle x-transformers-paddle 2.x version paddle 2.x版本 https://github.com/lucidrains/x-transformers 。 requirements paddlepaddle-gpu==2.2

yujun 7 Dec 08, 2022