🧮 Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model after All

Overview

LDA4Rec

Project generated with PyScaffold

Accompanying source code to the paper "Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model After All" by Florian Wilhelm. The preprint can be found here along with the following statement:

"© Florian Wilhelm 2021. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in RecSys '21: Fifteenth ACM Conference on Recommender Systems Proceedings, https://doi.org/10.1145/3460231.3474266."

Installation

In order to set up the necessary environment:

  1. review and uncomment what you need in environment.yml and create an environment lda4rec with the help of conda:
    conda env create -f environment.yml
    
  2. activate the new environment with:
    conda activate lda4rec
    
  3. (optionally) get a free neptune.ai account for experiment tracking and save the api token under ~/.neptune_api_token (default).

Running Experiments

First check out and adapt the default experiment config configs/default.yaml and run it with:

lda4rec -c configs/default.yaml run

A config like configs/default.yaml can also be used as a template to create an experiment set with:

lda4rec -c configs/default.yaml create -ds movielens-100k

using the Movielens-100k dataset. Check out cli.py for more details.

Cloud Setup

Commands for setting up an Ubuntu 20.10 VM with at least 20 GiB of HD on e.g. a GCP c2-standard-30 instance:

tmux
sudo apt-get install -y build-essential
curl https://sh.rustup.rs -sSf | sh
source $HOME/.cargo/env
cargo install pueue
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O
sh Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
git clone https://github.com/FlorianWilhelm/lda4rec.git
cd lda4rec
conda env create -f environment.yml
conda activate lda4rec
vim ~/.neptune_api_token # and copy it over

Then create and run all experiments for full control over parallelism with pueue:

pueued -d # only once to start the daemon
pueue parallel 10
export OMP_NUM_THREADS=4  # to limit then number of threads per model
lda4rec -c configs/default.yaml create # to create the config files
find ./configs -maxdepth 1 -name "exp_*.yaml" -exec pueue add "lda4rec -c {} run" \; -exec sleep 30 \;

Remark: -exec sleep 30 avoids race condition when reading datasets if parallelism is too high.

Dependency Management & Reproducibility

  1. Always keep your abstract (unpinned) dependencies updated in environment.yml and eventually in setup.cfg if you want to ship and install your package via pip later on.
  2. Create concrete dependencies as environment.lock.yml for the exact reproduction of your environment with:
    conda env export -n lda4rec -f environment.lock.yml
    For multi-OS development, consider using --no-builds during the export.
  3. Update your current environment with respect to a new environment.lock.yml using:
    conda env update -f environment.lock.yml --prune

Project Organization

├── AUTHORS.md              <- List of developers and maintainers.
├── CHANGELOG.md            <- Changelog to keep track of new features and fixes.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data                    <- Downloaded datasets will be stored here.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── environment.yml         <- The conda environment file for reproducibility.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── logs                    <- Generated logs are collected here.
├── results                 <- Results as exported from neptune.ai.
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- Use `python setup.py develop` to install for development or
│                              or create a distribution with `python setup.py bdist_wheel`.
├── src
│   └── lda4rec             <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `py.test`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.

How to Cite

Please cite LDA4Rec if it helps your research. You can use the following BibTeX entry:

@inproceedings{wilhelm2021lda4rec,
author = {Wilhelm, Florian},
title = {Matrix Factorization for Collaborative Filtering Is Just Solving an Adjoint Latent Dirichlet Allocation Model After All},
year = {2021},
month = sep,
isbn = {978-1-4503-8458-2/21/09},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3460231.3474266},
doi = {10.1145/3460231.3474266},
booktitle = {Fifteenth ACM Conference on Recommender Systems},
numpages = {8},
location = {Amsterdam, Netherlands},
series = {RecSys '21}
}

License

This sourcecode is AGPL-3-only licensed. If you require a more permissive licence, e.g. for commercial reasons, contact me to obtain a licence for your business.

Acknowledgement

Special thanks goes to Du Phan and Fritz Obermeyer from the (Num)Pyro project for their kind help and helpful comments on my code.

Note

This project has been set up using PyScaffold 4.0 and the dsproject extension 0.6. Some source code was taken from Spotlight (MIT-licensed) by Maciej Kula as well as lrann (MIT-licensed) by Florian Wilhelm and Marcel Kurovski.

Owner
Florian Wilhelm
Data Scientist with a mathematical background.
Florian Wilhelm
PyTorch implementation of "Optimization Planning for 3D ConvNets"

Optimization-Planning-for-3D-ConvNets Code for the ICML 2021 paper: Optimization Planning for 3D ConvNets. Authors: Zhaofan Qiu, Ting Yao, Chong-Wah N

Zhaofan Qiu 2 Jan 12, 2022
Pure python implementations of popular ML algorithms.

Minimal ML algorithms This repo includes minimal implementations of popular ML algorithms using pure python and numpy. The purpose of these notebooks

Alexis Gidiotis 3 Jan 10, 2022
Multi-Modal Machine Learning toolkit based on PyTorch.

简体中文 | English TorchMM 简介 多模态学习工具包 TorchMM 旨在于提供模态联合学习和跨模态学习算法模型库,为处理图片文本等多模态数据提供高效的解决方案,助力多模态学习应用落地。 近期更新 2022.1.5 发布 TorchMM 初始版本 v1.0 特性 丰富的任务场景:工具

njustkmg 1 Jan 05, 2022
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

Maha 490 Dec 15, 2022
Apply our monocular depth boosting to your own network!

MergeNet - Boost Your Own Depth Boost custom or edited monocular depth maps using MergeNet Input Original result After manual editing of base You can

Computational Photography Lab @ SFU 142 Dec 17, 2022
Author: Wenhao Yu ([email protected]). ACL 2022. Commonsense Reasoning on Knowledge Graph for Text Generation

Diversifying Commonsense Reasoning Generation on Knowledge Graph Introduction -- This is the pytorch implementation of our ACL 2022 paper "Diversifyin

DM2 Lab @ ND 61 Dec 30, 2022
A Pytorch implementation of "Manifold Matching via Deep Metric Learning for Generative Modeling" (ICCV 2021)

Manifold Matching via Deep Metric Learning for Generative Modeling A Pytorch implementation of "Manifold Matching via Deep Metric Learning for Generat

69 Dec 10, 2022
2021-AIAC-QQ-Browser-Hyperparameter-Optimization-Rank6

2021-AIAC-QQ-Browser-Hyperparameter-Optimization-Rank6

Aigege 8 Mar 31, 2022
Minimal PyTorch implementation of Generative Latent Optimization from the paper "Optimizing the Latent Space of Generative Networks"

Minimal PyTorch implementation of Generative Latent Optimization This is a reimplementation of the paper Piotr Bojanowski, Armand Joulin, David Lopez-

Thomas Neumann 117 Nov 27, 2022
Summary of related papers on visual attention

This repo is built for paper: Attention Mechanisms in Computer Vision: A Survey paper Vision-Attention-Papers Channel attention Spatial attention Temp

MenghaoGuo 2.1k Dec 30, 2022
I tried to apply the CAM algorithm to YOLOv4 and it worked.

YOLOV4:You Only Look Once目标检测模型在pytorch当中的实现 2021年2月7日更新: 加入letterbox_image的选项,关闭letterbox_image后网络的map得到大幅度提升。 目录 性能情况 Performance 实现的内容 Achievement

55 Dec 05, 2022
PyTorch implementation of Self-supervised Contrastive Regularization for DG (SelfReg)

SelfReg PyTorch official implementation of Self-supervised Contrastive Regularization for Domain Generalization (SelfReg, https://arxiv.org/abs/2104.0

64 Dec 16, 2022
Kaggle | 9th place (part of) solution for the Bristol-Myers Squibb – Molecular Translation challenge

Part of the 9th place solution for the Bristol-Myers Squibb – Molecular Translation challenge translating images containing chemical structures into I

Erdene-Ochir Tuguldur 22 Nov 30, 2022
Implementation of [Time in a Box: Advancing Knowledge Graph Completion with Temporal Scopes].

Time2box Implementation of [Time in a Box: Advancing Knowledge Graph Completion with Temporal Scopes].

LingCai 4 Aug 23, 2022
This repository contains the needed resources to build the HIRID-ICU-Benchmark dataset

HiRID-ICU-Benchmark This repository contains the needed resources to build the HIRID-ICU-Benchmark dataset for which the manuscript can be found here.

Biomedical Informatics at ETH Zurich 30 Dec 16, 2022
Flax is a neural network ecosystem for JAX that is designed for flexibility.

Flax: A neural network library and ecosystem for JAX designed for flexibility Overview | Quick install | What does Flax look like? | Documentation See

Google 3.9k Jan 02, 2023
免费获取http代理并生成proxifier配置文件

freeproxy 免费获取http代理并生成proxifier配置文件 公众号:台下言书 工具说明:https://mp.weixin.qq.com/s?__biz=MzIyNDkwNjQ5Ng==&mid=2247484425&idx=1&sn=56ccbe130822aa35038095317

说书人 32 Mar 25, 2022
A library for differentiable nonlinear optimization.

Theseus A library for differentiable nonlinear optimization built on PyTorch to support constructing various problems in robotics and vision as end-to

Meta Research 1.1k Dec 30, 2022
This repository includes different versions of the prescribed-time controller as Simulink blocks and MATLAB script codes for engineering applications.

Prescribed-time Control Prescribed-time control (PTC) blocks in Simulink environment, MATLAB R2020b. For more theoretical details, refer to the papers

Amir Shakouri 1 Mar 11, 2022
NeurIPS-2021: Neural Auto-Curricula in Two-Player Zero-Sum Games.

NAC Official PyTorch implementation of NAC from the paper: Neural Auto-Curricula in Two-Player Zero-Sum Games. We release code for: Gradient based ora

Xidong Feng 19 Nov 11, 2022