A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

Overview

pytorch-lifestream a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events like game history events, clickstream data, purchase history or card transactions.

It supports various methods of self-supervised training, adapted for event sequences:

  • Contrastive Learning for Event Sequences (CoLES)
  • Contrastive Predictive Coding (CPC)
  • Replaced Token Detection (RTD) from ELECTRA
  • Next Sequence Prediction (NSP) from BERT
  • Sequences Order Prediction (SOP) from ALBERT

It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.

The following variants of the contrastive losses are supported:

Install from PyPi

pip install pytorch-lifestream

Install from source

# Ubuntu 20.04

sudo apt install python3.8 python3-venv
pip3 install pipenv

pipenv sync  --dev # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest

Demo notebooks

  • Self-supervided training and embeddings for downstream task notebook
  • Self-supervided embeddings in CatBoost notebook
  • Self-supervided training and fine-tuning notebook
  • PySpark and Parquet for data preprocessing notebook

Experiments on public datasets

pytorch-lifestream usage experiments on several public event datasets are available in the separate repo

Comments
  • torch.stack in def collate_feature_dict

    torch.stack in def collate_feature_dict

    ptls/data_load/utils.py

    Hello!

    If the dataloader has a feature called target. And the batchsize is not a multiple of the length of the dataset, then an error pops up on the last batch: "Sizes of tensors must match except in dimension 0". Due to the use of torch.staсk when processing a feature startwith 'target'.

    opened by Ivanich-spb 11
  • Not supported multiGPU option from pytorchlightning.Trainer

    Not supported multiGPU option from pytorchlightning.Trainer

    Try to set Trainer(gpus=[0,1]), while using PtlsDataModule as data module, get such error:

    AttributeError: Can't pickle local object 'PtlsDataModule.__init__.<locals>.train_dataloader'

    opened by mazitovs 1
  • Correct seq_len for feature dict

    Correct seq_len for feature dict

    rec = {
        'mcc': [0, 1, 2, 3],
        'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
    }
    

    How to get correct seq_len. true len: 4 possible length: 4, 6 'target_distribution' is incorrect field to get length, this is not a sequence, this is an array

    opened by ivkireev86 1
  • Save categories encodings along with model weights in demos

    Save categories encodings along with model weights in demos

    Вместе с обученной моделью необходимо сохранять обученный препроцессор и разбивку на трейн-тест. Иначе категории могут поехать и сохраненная предобученная модель станет бесполезной.

    opened by ivkireev86 1
  • Documentation index

    Documentation index

    Прототип главной страницы документации. Три секции:

    • описание моделей библиотеки
    • гайд как использовать библиотеку
    • как писать свои компоненты

    Есть краткое описание и ссылки на подробные (которые напишем потом).

    В описании модулей предложена структура библиотеки. Предполагается, что мы эти модули в ближайшее создадим и перетащим туда соответсвующие классы из библиотеки. Старые, модули, которые станут пустыми, удалим. Далее будем придерживаться схемы, описанной в этом документе.

    На ревью предлагается чекнуть предлагаемую структуру библиотеки, названия модулей ну и сам описательный текст документа.

    opened by ivkireev86 1
  • KL cyclostationarity test tools

    KL cyclostationarity test tools

    Test provides a hystogram with self-samples similarity vs. random sample similarity. Shows compatibility with CoLES.

    Think about tests for other frameworks.

    opened by ivkireev86 0
  • Repair pyspark tests

    Repair pyspark tests

    def test_dt_to_timestamp(): spark = SparkSession.builder.getOrCreate() df = spark.createDataFrame(data=[ {'dt': '1970-01-01 00:00:00'}, {'dt': '2012-01-01 12:01:16'}, {'dt': '2021-12-30 00:00:00'} ])

        df = df.withColumn('ts', dt_to_timestamp('dt'))
        ts = [rec.ts for rec in df.select('ts').collect()]
    
      assert ts == [0, 1325419276, 1640822400]
    

    E assert [-10800, 1325...6, 1640811600] == [0, 1325419276, 1640822400] E At index 0 diff: -10800 != 0 E Use -v to get more diff

    ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:16: AssertionError


    def test_datetime_to_timestamp(): t = DatetimeToTimestamp(col_name_original='dt') spark = SparkSession.builder.getOrCreate() df = spark.createDataFrame(data=[ {'dt': '1970-01-01 00:00:00', 'rn': 1}, {'dt': '2012-01-01 12:01:16', 'rn': 2}, {'dt': '2021-12-30 00:00:00', 'rn': 3} ]) df = t.fit_transform(df) et = [rec.event_time for rec in df.select('event_time').collect()]

      assert et[0] == 0
    

    E assert -10800 == 0

    ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:48: AssertionError

    opened by ikretus 0
  • docs. Development guide (for demo notebooks)

    docs. Development guide (for demo notebooks)

    • add current patterns
    • when model training start print message "model training stats, please wait. See tensorboard to track progress", use it with enable_progress=False
    documentation user feedback 
    opened by ivkireev86 0
Releases(v0.5.1)
  • v0.5.1(Dec 28, 2022)

    What's Changed

    • fixed cpc import by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/90
    • add softmaxloss and tests by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/87
    • MLM NSP Module by @mazitovs in https://github.com/dllllb/pytorch-lifestream/pull/88
    • fix test dropout error by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/91

    New Contributors

    • @ArtyomVorobev made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/90
    • @mazitovs made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/88

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.5.0...v0.5.1

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Nov 9, 2022)

    What's Changed

    • Fix metrics reset by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/72
    • Pandas preprocessing without df copy, faster preprocessing for large datasets by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/73
    • fix in supervised-sequence-to-target.ipynb by @blinovpd in https://github.com/dllllb/pytorch-lifestream/pull/74
    • ptls.nn.PBDropout by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/75
    • tanh for rnn starter by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/76
    • Auc regr metric by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/78
    • spatial dropout for NoisyEmbedding, LastMaxAvgEncoder, warning for bidir RnnEncoder by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/80
    • Hparam tuning demo. hydra, optuna, tensorboard by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/81
    • tabformer by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/83
    • Supervised Coles Module, trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/84

    New Contributors

    • @blinovpd made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/74

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.4.0...v0.5.0

    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Jul 27, 2022)

    What's Changed

    • Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29
    • regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36
    • lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34
    • Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40
    • Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42
    • feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43
    • Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37
    • Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45
    • fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52
    • Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58
    • Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

    New Contributors

    • @ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

    What's Changed

    • Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29
    • regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36
    • lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34
    • Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40
    • Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42
    • feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43
    • Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37
    • Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45
    • fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52
    • Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58
    • Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

    New Contributors

    • @ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

    What's Changed

    • Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29
    • regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36
    • lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34
    • Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40
    • Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42
    • feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43
    • Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37
    • Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45
    • fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50
    • Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52
    • Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58
    • Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62
    • doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

    New Contributors

    • @ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0

    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Jun 12, 2022)

    More Pythonic Core API: constructor arguments instead of config objects

    What's Changed

    • cpc params by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/9
    • All modules by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/15
    • Mlm pretrain by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/13
    • all encoders and get rid of get_loss by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/19
    • init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/20
    • Documentation index by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/8
    • Demos api update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/18
    • loss output correction by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/22
    • Test fixes by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/23
    • readme_demo_link by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/25
    • init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/26
    • work without logger by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/7
    • trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/28

    Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.1.2...v0.3.0

    Source code(tar.gz)
    Source code(zip)
Owner
Dmitri Babaev
Dmitri Babaev
Official Pytorch implementation for AAAI2021 paper (RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning)

RSPNet Official Pytorch implementation for AAAI2021 paper "RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning" [Suppleme

35 Jun 24, 2022
git《Pseudo-ISP: Learning Pseudo In-camera Signal Processing Pipeline from A Color Image Denoiser》(2021) GitHub: [fig5]

Pseudo-ISP: Learning Pseudo In-camera Signal Processing Pipeline from A Color Image Denoiser Abstract The success of deep denoisers on real-world colo

Yue Cao 51 Nov 22, 2022
Flexible Networks for Learning Physical Dynamics of Deformable Objects (2021)

Flexible Networks for Learning Physical Dynamics of Deformable Objects (2021) By Jinhyung Park, Dohae Lee, In-Kwon Lee from Yonsei University (Seoul,

Jinhyung Park 0 Jan 09, 2022
Attention-guided gan for synthesizing IR images

SI-AGAN Attention-guided gan for synthesizing IR images This repository contains the Tensorflow code for "Pedestrian Gender Recognition by Style Trans

1 Oct 25, 2021
Lucid Sonic Dreams syncs GAN-generated visuals to music.

Lucid Sonic Dreams Lucid Sonic Dreams syncs GAN-generated visuals to music. By default, it uses NVLabs StyleGAN2, with pre-trained models lifted from

731 Jan 02, 2023
A collection of easy-to-use, ready-to-use, interesting deep neural network models

Interesting and reproducible research works should be conserved. This repository wraps a collection of deep neural network models into a simple and un

Aria Ghora Prabono 16 Jun 16, 2022
Kaggle | 9th place single model solution for TGS Salt Identification Challenge

UNet for segmenting salt deposits from seismic images with PyTorch. General We, tugstugi and xuyuan, have participated in the Kaggle competition TGS S

Erdene-Ochir Tuguldur 276 Dec 20, 2022
PyTorch implementation of "Dataset Knowledge Transfer for Class-Incremental Learning Without Memory" (WACV2022)

Dataset Knowledge Transfer for Class-Incremental Learning Without Memory [Paper] [Slides] Summary Introduction Installation Reproducing results Citati

Habib Slim 5 Dec 05, 2022
Official repo for AutoInt: Automatic Integration for Fast Neural Volume Rendering in CVPR 2021

AutoInt: Automatic Integration for Fast Neural Volume Rendering CVPR 2021 Project Page | Video | Paper PyTorch implementation of automatic integration

Stanford Computational Imaging Lab 149 Dec 22, 2022
My course projects for the 2021 Spring Machine Learning course at the National Taiwan University (NTU)

ML2021Spring There are my projects for the 2021 Spring Machine Learning course at the National Taiwan University (NTU) Course Web : https://speech.ee.

Ding-Li Chen 15 Aug 29, 2022
Original code for "Zero-Shot Domain Adaptation with a Physics Prior"

Zero-Shot Domain Adaptation with a Physics Prior [arXiv] [sup. material] - ICCV 2021 Oral paper, by Attila Lengyel, Sourav Garg, Michael Milford and J

Attila Lengyel 40 Dec 21, 2022
Transformer Huffman coding - Complete Huffman coding through transformer

Transformer_Huffman_coding Complete Huffman coding through transformer 2022/2/19

3 May 19, 2022
Official implementation of "CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding" (CVPR, 2022)

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding (CVPR'22) Paper Link | Project Page Abstract : Manual an

Mohamed Afham 152 Dec 23, 2022
nn_builder lets you build neural networks with less boilerplate code

nn_builder lets you build neural networks with less boilerplate code. You specify the type of network you want and it builds it. Install pip install n

Petros Christodoulou 157 Nov 20, 2022
JAXMAPP: JAX-based Library for Multi-Agent Path Planning in Continuous Spaces

JAXMAPP: JAX-based Library for Multi-Agent Path Planning in Continuous Spaces JAXMAPP is a JAX-based library for multi-agent path planning (MAPP) in c

OMRON SINIC X 24 Dec 28, 2022
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation This repository contains the source code of our paper, ESPNet (acc

Sachin Mehta 515 Dec 13, 2022
Read number plates with https://platerecognizer.com/

HASS-plate-recognizer Read vehicle license plates with https://platerecognizer.com/ which offers free processing of 2500 images per month. You will ne

Robin 69 Dec 30, 2022
The deployment framework aims to provide a simple, lightweight, fast integrated, pipelined deployment framework that ensures reliability, high concurrency and scalability of services.

savior是一个能够进行快速集成算法模块并支持高性能部署的轻量开发框架。能够帮助将团队进行快速想法验证(PoC),避免重复的去github上找模型然后复现模型;能够帮助团队将功能进行流程拆解,很方便的提高分布式执行效率;能够有效减少代码冗余,减少不必要负担。

Tao Luo 125 Dec 22, 2022
Learning Open-World Object Proposals without Learning to Classify

Learning Open-World Object Proposals without Learning to Classify Pytorch implementation for "Learning Open-World Object Proposals without Learning to

Dahun Kim 149 Dec 22, 2022
The code for paper Efficiently Solve the Max-cut Problem via a Quantum Qubit Rotation Algorithm

Quantum Qubit Rotation Algorithm Single qubit rotation gates $$ U(\Theta)=\bigotimes_{i=1}^n R_x (\phi_i) $$ QQRA for the max-cut problem This code wa

SheffieldWang 0 Oct 18, 2021