A library built upon PyTorch for building embeddings on discrete event sequences using self-supervision

Last update: Dec 17, 2022

Overview

pytorch-lifestream is a library built upon PyTorch for building embeddings on discrete event sequences using self-supervision. It can process terabyte-size volumes of raw events such as game history events, clickstream data, purchase history or card transactions.

It supports various methods of self-supervised training, adapted for event sequences:

Contrastive Learning for Event Sequences (CoLES)
Contrastive Predictive Coding (CPC)
Replaced Token Detection (RTD) from ELECTRA
Next Sequence Prediction (NSP) from BERT
Sequences Order Prediction (SOP) from ALBERT

It supports several types of encoders, including Transformer and RNN. It also supports many types of self-supervised losses.

The following variants of the contrastive losses are supported:

Contrastive loss (paper)
Triplet loss (paper)
Binomial deviance loss (paper)
Histogram loss (paper)
Margin loss (paper)
VICReg loss (paper)
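
As a rough illustration of the first item in the list, the classic pairwise contrastive loss pulls embeddings of the same entity together and pushes embeddings of different entities at least a margin apart. The sketch below is plain Python and is not the library's implementation; the function names and the default margin are assumptions made for the example.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_contrastive_loss(a, b, same_entity, margin=1.0):
    """Classic pairwise contrastive loss for one pair of embeddings.

    Positive pairs are penalised by their squared distance; negative pairs
    are penalised only while they are closer than `margin`.
    (Hypothetical helper, not part of pytorch-lifestream.)
    """
    d = euclidean_distance(a, b)
    if same_entity:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Two embeddings of the same entity: the loss grows with their distance.
positive = pairwise_contrastive_loss([0.0, 0.0], [0.3, 0.4], same_entity=True)
# Embeddings of different entities closer than the margin are penalised ...
negative = pairwise_contrastive_loss([0.0, 0.0], [0.3, 0.4], same_entity=False)
# ... while pairs already farther apart than the margin contribute nothing.
far_negative = pairwise_contrastive_loss([0.0, 0.0], [3.0, 4.0], same_entity=False)
```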

Install from PyPI

pip install pytorch-lifestream

Install from source

# Ubuntu 20.04

sudo apt install python3.8 python3-venv
pip3 install pipenv

pipenv sync  --dev # install packages exactly as specified in Pipfile.lock
pipenv shell
pytest

Demo notebooks

Self-supervised training and embeddings for downstream task notebook
Self-supervised embeddings in CatBoost notebook
Self-supervised training and fine-tuning notebook
PySpark and Parquet for data preprocessing notebook

Experiments on public datasets

Experiments using pytorch-lifestream on several public event datasets are available in a separate repo.

Comments

torch.stack in def collate_feature_dict

ptls/data_load/utils.py

Hello!

If the dataloader has a feature called target and the dataset length is not a multiple of the batch size, an error is raised on the last batch: "Sizes of tensors must match except in dimension 0". It is caused by the use of torch.stack when processing a feature whose name starts with 'target'.

opened by Ivanich-spb 11
Not supported multiGPU option from pytorchlightning.Trainer

Setting Trainer(gpus=[0,1]) while using PtlsDataModule as the data module produces this error:

AttributeError: Can't pickle local object 'PtlsDataModule.__init__.<locals>.train_dataloader'

opened by mazitovs 1
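
The error message in the issue above points at a function defined inside PtlsDataModule.__init__. Python's pickle cannot serialize such local functions, and training strategies that spawn worker processes typically need to pickle the data module. The sketch below reproduces the failure mode with the standard library only; LocalFunctionModule, MethodModule and can_pickle are hypothetical stand-ins, not classes or functions from the library.

```python
import pickle

class LocalFunctionModule:
    """Hypothetical stand-in: attaches a function defined inside __init__."""

    def __init__(self):
        def train_dataloader():
            return [1, 2, 3]
        # The instance now holds a reference to a local (non-importable) function.
        self.train_dataloader = train_dataloader

class MethodModule:
    """Hypothetical alternative: the same logic written as a regular method."""

    def train_dataloader(self):
        return [1, 2, 3]

def can_pickle(obj):
    """Return True if `obj` can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False

# The instance carrying a local function cannot be pickled.
local_ok = can_pickle(LocalFunctionModule())
```

Writing the loader as a regular method, as in MethodModule, is one way to avoid the local object; whether that fits the library's design is a separate question.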
Correct seq_len for feature dict
rec = {
    'mcc': [0, 1, 2, 3],
    'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
}

How do we get the correct seq_len? The true length is 4, but the possible lengths are 4 and 6. 'target_distribution' is the wrong field to take the length from: it is not a sequence, it is an array.
opened by ivkireev86 1
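
One possible way to resolve the seq_len ambiguity described above is to skip fields that are known not to be event sequences when measuring the length. The sketch below is only an illustration: infer_seq_len is a hypothetical helper, and the rule "names starting with 'target' are not sequences" is an assumption, not the library's behaviour.

```python
def infer_seq_len(rec, non_sequence_prefixes=("target",)):
    """Return the common length of the sequence fields in a feature dict.

    Fields whose names start with one of `non_sequence_prefixes` are treated
    as per-record arrays and ignored (an assumed naming rule).
    """
    lengths = {
        name: len(values)
        for name, values in rec.items()
        if isinstance(values, (list, tuple))
        and not name.startswith(tuple(non_sequence_prefixes))
    }
    if not lengths:
        raise ValueError("no sequence fields found")
    if len(set(lengths.values())) != 1:
        raise ValueError(f"inconsistent sequence lengths: {lengths}")
    return next(iter(lengths.values()))

rec = {
    'mcc': [0, 1, 2, 3],
    'target_distribution': [0.1, 0.2, 0.4, 0.1, 0.1, 0.0],
}
# Only 'mcc' is measured, so the 6-element array does not interfere.
seq_len = infer_seq_len(rec)
```

Without the exclusion rule (an empty prefix tuple) the same record raises ValueError, because the two fields disagree on length.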
Save categories encodings along with model weights in demos

The fitted preprocessor and the train-test split must be saved together with the trained model. Otherwise the category encodings can shift and the saved pretrained model becomes useless.

opened by ivkireev86 1
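
To make the concern in the issue above concrete, the sketch below shows how refitting a category encoder on different data changes the integer codes, and how saving the fitted mapping avoids that. It uses only the standard library; fit_category_mapping, encode and the file name category_mapping.json are hypothetical and do not reflect the library's preprocessor.

```python
import json
import os
import tempfile

def fit_category_mapping(values):
    """Assign integer codes by first appearance; 0 is reserved for unseen values."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping

def encode(values, mapping):
    """Encode values with a fitted mapping; unknown categories map to 0."""
    return [mapping.get(value, 0) for value in values]

train_values = ["grocery", "fuel", "grocery", "travel"]
later_values = ["travel", "grocery", "fuel"]

mapping = fit_category_mapping(train_values)
# Refitting on other data yields the same categories with different codes.
refit_mapping = fit_category_mapping(later_values)

# Persist the fitted mapping next to the (hypothetical) model weights file.
with tempfile.TemporaryDirectory() as model_dir:
    mapping_path = os.path.join(model_dir, "category_mapping.json")
    with open(mapping_path, "w") as f:
        json.dump(mapping, f)
    with open(mapping_path) as f:
        restored_mapping = json.load(f)

codes_with_saved = encode(later_values, restored_mapping)
codes_with_refit = encode(later_values, refit_mapping)
```

A model trained on codes from the first mapping would see different inputs for the same categories if only the refit mapping were available.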
Documentation index
A prototype of the documentation main page. Three sections:

description of the library's models

a guide on how to use the library

how to write your own components

There is a brief description and links to detailed pages (which we will write later).

The module descriptions propose a structure for the library. The plan is to create these modules in the near future and move the corresponding classes from the library into them. Old modules that become empty will be removed. From then on we will follow the scheme described in this document.

For review, please check the proposed library structure, the module names, and the descriptive text of the document itself.
opened by ivkireev86 1
KL cyclostationarity test tools

The test provides a histogram of self-sample similarity vs. random-sample similarity. It shows compatibility with CoLES.

Think about tests for other frameworks.

opened by ivkireev86 0
Repair pyspark tests
    def test_dt_to_timestamp():
        spark = SparkSession.builder.getOrCreate()
        df = spark.createDataFrame(data=[
            {'dt': '1970-01-01 00:00:00'},
            {'dt': '2012-01-01 12:01:16'},
            {'dt': '2021-12-30 00:00:00'}
        ])
        df = df.withColumn('ts', dt_to_timestamp('dt'))
        ts = [rec.ts for rec in df.select('ts').collect()]

>       assert ts == [0, 1325419276, 1640822400]
E       assert [-10800, 1325...6, 1640811600] == [0, 1325419276, 1640822400]
E         At index 0 diff: -10800 != 0
E         Use -v to get more diff

ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:16: AssertionError

    def test_datetime_to_timestamp():
        t = DatetimeToTimestamp(col_name_original='dt')
        spark = SparkSession.builder.getOrCreate()
        df = spark.createDataFrame(data=[
            {'dt': '1970-01-01 00:00:00', 'rn': 1},
            {'dt': '2012-01-01 12:01:16', 'rn': 2},
            {'dt': '2021-12-30 00:00:00', 'rn': 3}
        ])
        df = t.fit_transform(df)
        et = [rec.event_time for rec in df.select('event_time').collect()]

>       assert et[0] == 0
E       assert -10800 == 0

ptls_tests/test_preprocessing/test_pyspark/test_event_time.py:48: AssertionError
opened by ikretus 0
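
The -10800 in the failure output above is exactly three hours in seconds, which would be consistent with the date strings being interpreted in a UTC+3 time zone instead of UTC. That reading is an assumption based on the numbers alone. The sketch below uses only the standard library, not PySpark; to_epoch is a hypothetical helper that shows how the same strings yield both sets of values.

```python
from datetime import datetime, timedelta, timezone

def to_epoch(dt_string, tz):
    """Parse 'YYYY-MM-DD HH:MM:SS' as wall-clock time in `tz`; return Unix seconds."""
    naive = datetime.strptime(dt_string, "%Y-%m-%d %H:%M:%S")
    return int(naive.replace(tzinfo=tz).timestamp())

dates = ['1970-01-01 00:00:00', '2012-01-01 12:01:16', '2021-12-30 00:00:00']

utc = timezone.utc
utc_plus_3 = timezone(timedelta(hours=3))  # assumed zone of the failing run

# Interpreting the strings as UTC gives the values the tests expect.
as_utc = [to_epoch(d, utc) for d in dates]
# Interpreting them as UTC+3 shifts every value by -10800 seconds.
as_utc_plus_3 = [to_epoch(d, utc_plus_3) for d in dates]
```

If this is indeed the cause, pinning the Spark session time zone to UTC in the tests (for example through the spark.sql.session.timeZone setting) would make the expected values independent of the machine's local time zone.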
docs. Development guide (for demo notebooks)
add current patterns

when model training starts, print the message "model training starts, please wait. See tensorboard to track progress"; use it with enable_progress=False

documentation user feedback
opened by ivkireev86 0

Releases(v0.5.1)

v0.5.1(Dec 28, 2022)
What's Changed

fixed cpc import by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/90

add softmaxloss and tests by @ArtyomVorobev in https://github.com/dllllb/pytorch-lifestream/pull/87

MLM NSP Module by @mazitovs in https://github.com/dllllb/pytorch-lifestream/pull/88

fix test dropout error by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/91

New Contributors

@ArtyomVorobev made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/90

@mazitovs made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/88

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.5.0...v0.5.1
v0.5.0(Nov 9, 2022)
What's Changed

Fix metrics reset by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/72

Pandas preprocessing without df copy, faster preprocessing for large datasets by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/73

fix in supervised-sequence-to-target.ipynb by @blinovpd in https://github.com/dllllb/pytorch-lifestream/pull/74

ptls.nn.PBDropout by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/75

tanh for rnn starter by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/76

Auc regr metric by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/78

spatial dropout for NoisyEmbedding, LastMaxAvgEncoder, warning for bidir RnnEncoder by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/80

Hparam tuning demo. hydra, optuna, tensorboard by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/81

tabformer by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/83

Supervised Coles Module, trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/84

New Contributors

@blinovpd made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/74

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.4.0...v0.5.0
v0.4.0(Jul 27, 2022)
What's Changed

Seq encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/29

regr.task ZILNLoss, RMSE, BucketAccuracy by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/36

lighting modules and nn layers refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/34

Demo colab by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/40

Fix drop target arrays by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/42

feature naming by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/43

Update abs_module.py by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/37

Extended inference demo by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/45

fix import path by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/46

Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/50

Experiments sync by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/52

Target dist by @ikretus in https://github.com/dllllb/pytorch-lifestream/pull/58

Data load refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/60

doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/62

doc update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/63

New Contributors

@ikretus made their first contribution in https://github.com/dllllb/pytorch-lifestream/pull/36

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.3.0...v0.4.0
v0.3.0(Jun 12, 2022)
More Pythonic Core API: constructor arguments instead of config objects

What's Changed

cpc params by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/9

All modules by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/15

Mlm pretrain by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/13

all encoders and get rid of get_loss by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/19

init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/20

Documentation index by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/8

Demos api update by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/18

loss output correction by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/22

Test fixes by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/23

readme_demo_link by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/25

init by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/26

work without logger by @justalge in https://github.com/dllllb/pytorch-lifestream/pull/7

trx_encoder refactoring by @ivkireev86 in https://github.com/dllllb/pytorch-lifestream/pull/28

Full Changelog: https://github.com/dllllb/pytorch-lifestream/compare/v0.1.2...v0.3.0

Owner

Dmitri Babaev
