SHIFT15M: multiobjective large-scale fashion dataset with distributional shifts

Overview

License: MIT Python GitHub code size in bytes Downloads GitHub Workflow Status PyPI version GitHub issues GitHub commit activity GitHub last commit arXiv

[arXiv]

The main motivation of the SHIFT15M project is to provide a dataset that contains natural dataset shifts collected from a web service IQON, which was actually in operation for a decade. In addition, the SHIFT15M dataset has several types of dataset shifts, allowing us to evaluate the robustness of the model to different types of shifts (e.g., covariate shift and target shift).

We provide the Datasheet for SHIFT15M. This datasheet is based on the Datasheets for Datasets [1] template.

System Python 3.6 Python 3.7 Python 3.8
Linux CPU
Linux GPU
Windows CPU / GPU Status Currently Unavailable Status Currently Unavailable Status Currently Unavailable
Mac OS CPU

SHIFT15M is a large-scale dataset based on approximately 15 million items accumulated by the fashion search service IQON.

Installation

From PyPi

$ pip install shift15m

From source

$ git clone https://github.com/st-tech/zozo-shift15m.git
$ cd zozo-shift15m
$ poetry build
$ pip install dist/shift15m-xxxx-py3-none-any.whl

Download SHIFT15M dataset

Use Dataset class

You can download SHIFT15M dataset as follows:

from shift15.datasets import NumLikesRegression

dataset = NumLikesRegression(root="./data", download=True)

Download directly by using download scripts

Please download the dataset as follows:

$ bash scripts/download_all.sh

To avoid downloading the test dataset for set matching (80GB), which is not required in training, you can use the following script.

$ bash scripts/download_all_wo_set_testdata.sh

Tasks

The following tasks are now available:

Tasks Task type Shift type # of input dim # of output dim
NumLikesRegression regression target shift (N, 25) (N, 1)
SumPricesRegression regression covariate shift, target shift (N, 1) (N, 1)
ItemPriceRegression regression target shift (N, 4096) (N, 1)
ItemCategoryClassification classification target shift (N, 4096) (N, 7)
Set2SetMatching set-to-set matching covariate shift (N, 4096)x(M, 4096) (1)

Benchmarks

As templates for numerical experiments on the SHIFT15M dataset, we have published experimental results for each task with several models.

Original Dataset Structure

The original dataset is maintained in json format, and a row consists of the following:

{
  "user":{"user_id":"xxxx", "fav_brand_ids":"xxxx,xx,..."},
  "like_num":"xx",
  "set_id":"xxx",
  "items":[
    {"price":"xxxx","item_id":"xxxxxx","category_id1":"xx","category_id2":"xxxxx"},
    ...
  ],
  "publish_date":"yyyy-mm-dd"
}

Contributing

To learn more about making a contribution to SHIFT15M, please see the following materials:

License

The dataset itself is provided under a CC BY-NC 4.0 license. On the other hand, the software in this repository is provided under the MIT license.

Dataset metadata

The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.

property value
name SHIFT15M Dataset
alternateName SHIFT15M
alternateName shift15m-dataset
url
sameAs https://github.com/st-tech/zozo-shift15m
description SHIFT15M is a multi-objective, multi-domain dataset which includes multiple dataset shifts.
provider
property value
name ZOZO Research
sameAs https://ja.wikipedia.org/wiki/ZOZO
license
property value
name CC BY-NC 4.0
url

Citation

@misc{Kimura_SHIFT15M_Multiobjective_LargeScale_2021,
author = {Kimura, Masanari and Nakamura, Takuma and Saito, Yuki},
month = {8},
title = {SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts},
year = {2021}
}

Errata

No errata are currently available.

References

  • [1] Gebru, Timnit, et al. "Datasheets for datasets." arXiv preprint arXiv:1803.09010 (2018).
Comments
Releases(v0.2.0)
  • v0.2.0(Sep 20, 2022)

    • add tags info as follows:
    {
      "user":{"user_id":"xxxx", "fav_brand_ids":"xxxx,xx,..."},
      "like_num":"xx",
      "set_id":"xxx",
      "items":[
        {"price":"xxxx","item_id":"xxxxxx","category_id1":"xx","category_id2":"xxxxx"},
        ...
      ],
      "publish_date":"yyyy-mm-dd",
      "tags": "tag_a, tag_b, tag_c, ..."
    }
    
    • add superset matching benchmark
    • fix a label creation bug on set matching with multiple splits
    Source code(tar.gz)
    Source code(zip)
  • v.0.1.2(Nov 24, 2021)

Owner
ZOZO, Inc.
ZOZO, Inc.
Fibonacci Method Gradient Descent

An implementation of the Fibonacci method for gradient descent, featuring a TKinter GUI for inputting the function / parameters to be examined and a matplotlib plot of the function and results.

Emma 1 Jan 28, 2022
*ObjDetApp* deploys a pytorch model for object detection

*ObjDetApp* deploys a pytorch model for object detection

Will Chao 1 Dec 26, 2021
FCA: Learning a 3D Full-coverage Vehicle Camouflage for Multi-view Physical Adversarial Attack

FCA: Learning a 3D Full-coverage Vehicle Camouflage for Multi-view Physical Adversarial Attack Case study of the FCA. The code can be find in FCA. Cas

IDRL 21 Dec 15, 2022
BARF: Bundle-Adjusting Neural Radiance Fields 🤮 (ICCV 2021 oral)

BARF 🤮 : Bundle-Adjusting Neural Radiance Fields Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Simon Lucey IEEE International Conference on Comp

Chen-Hsuan Lin 539 Dec 28, 2022
Exploring Visual Engagement Signals for Representation Learning

Exploring Visual Engagement Signals for Representation Learning Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie and Ser-Nam Lim C

Menglin Jia 9 Jul 23, 2022
MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

187 Dec 26, 2022
This repository contains a Ruby API for utilizing TensorFlow.

tensorflow.rb Description This repository contains a Ruby API for utilizing TensorFlow. Linux CPU Linux GPU PIP Mac OS CPU Not Configured Not Configur

somatic labs 825 Dec 26, 2022
MLPs for Vision and Langauge Modeling (Coming Soon)

MLP Architectures for Vision-and-Language Modeling: An Empirical Study MLP Architectures for Vision-and-Language Modeling: An Empirical Study (Code wi

Yixin Nie 27 May 09, 2022
Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Mozhdeh Gheini 16 Jul 16, 2022
Self-Supervised Speech Pre-training and Representation Learning Toolkit.

What's New Sep 2021: We host a challenge in AAAI workshop: The 2nd Self-supervised Learning for Audio and Speech Processing! See SUPERB official site

s3prl 1.6k Jan 08, 2023
Expand human face editing via Global Direction of StyleCLIP, especially to maintain similarity during editing.

Oh-My-Face This project is based on StyleCLIP, RIFE, and encoder4editing, which aims to expand human face editing via Global Direction of StyleCLIP, e

AiLin Huang 51 Nov 17, 2022
code for paper "Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning" by Zhongzheng Ren*, Raymond A. Yeh*, Alexander G. Schwing.

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning Overview This code is for paper: Not All Unlabeled Data are Equa

Jason Ren 22 Nov 23, 2022
Code for our WACV 2022 paper "Hyper-Convolution Networks for Biomedical Image Segmentation"

Hyper-Convolution Networks for Biomedical Image Segmentation Code for our WACV 2022 paper "Hyper-Convolution Networks for Biomedical Image Segmentatio

Tianyu Ma 17 Nov 02, 2022
Official implementation of "Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection" in CVPR 2022.

Jadena Official implementation of "Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection" in CVPR 2022. arXiv

Qing Guo 13 Nov 29, 2022
This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search"

InvariantAncestrySearch This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search

Phillip Bredahl Mogensen 0 Feb 02, 2022
Code accompanying the paper "How Tight Can PAC-Bayes be in the Small Data Regime?"

How Tight Can PAC-Bayes be in the Small Data Regime? This is the code to reproduce all experiments for the following paper: @inproceedings{Foong:2021:

5 Dec 21, 2021
Spectral Tensor Train Parameterization of Deep Learning Layers

Spectral Tensor Train Parameterization of Deep Learning Layers This repository is the official implementation of our AISTATS 2021 paper titled "Spectr

Anton Obukhov 12 Oct 23, 2022
Framework that uses artificial intelligence applied to mathematical models to make predictions

LiconIA Framework that uses artificial intelligence applied to mathematical models to make predictions Interface Overview Table of contents [TOC] 1 Ar

4 Jun 20, 2021
use tensorflow 2.0 to tell a dog and cat from a specified picture

dog_or_cat use tensorflow 2.0 to tell a dog and cat from a specified picture This is one of the classic experiments for the introduction of deep learn

你这个代码我看不懂 1 Oct 22, 2021
Non-stationary GP package written from scratch in PyTorch

NSGP-Torch Examples gpytorch model with skgpytorch # Import packages import torch from regdata import NonStat2D from gpytorch.kernels import RBFKernel

Zeel B Patel 1 Mar 06, 2022