This is the repository of our article published on MDPI Entropy "Feature Selection for Recommender Systems with Quantum Computing".

Collaborative-driven Quantum Feature Selection

This repository was developed by Riccardo Nembrini, PhD student at Politecnico di Milano. See the websites of our quantum computing group and of our recommender systems group for more information on our teams and work.

Here we explain how to install the dependencies, set up the connection to D-Wave Leap quantum cloud services, and run the experiments included in this repository.

Installation

NOTE: This repository requires Python 3.7

We suggest installing all the required packages into a new Python environment. After checking out the repository, enter the repository folder and run the following commands to create a new environment:

If you're using virtualenv:

virtualenv -p python3 cqfs
source cqfs/bin/activate

If you're using conda:

conda create -n cqfs python=3.7 anaconda
conda activate cqfs

Remember to add this project to the PYTHONPATH environment variable if you plan to run the experiments from the terminal:

export PYTHONPATH=$PYTHONPATH:/path/to/project/folder

Then, make sure the environment is correctly activated and install all the required packages through pip:

pip install -r requirements.txt
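Before going further, it can help to confirm that the active interpreter matches the requirements above. The following is a minimal sketch and not part of the repository: the helper names `is_supported_python` and `project_on_path` are hypothetical, and `/path/to/project/folder` is the same placeholder used in the export command.

```python
import os
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.7, as this README requires."""
    return tuple(version_info[:2]) == (3, 7)


def project_on_path(project_dir, search_path=None):
    """Return True when project_dir is one of the import search directories."""
    if search_path is None:
        search_path = sys.path
    target = os.path.abspath(project_dir)
    # An empty sys.path entry stands for the current working directory.
    return any(os.path.abspath(entry or os.curdir) == target for entry in search_path)


if __name__ == "__main__":
    # Placeholder path: replace it with the folder where the repository was checked out.
    project_dir = "/path/to/project/folder"
    print("Python 3.7 active:", is_supported_python())
    print("Project on import path:", project_on_path(project_dir))
```

If either check prints False, re-activate the environment or re-export PYTHONPATH before running the experiments.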

After installing the dependencies, we suggest compiling the Cython code in the repository.

To compile, you must first have gcc and the Python 3 development headers (python3-dev) installed. On Linux, these can be installed with the following commands:

sudo apt install gcc 
sudo apt-get install python3-dev

If you are using Windows, the installation procedure is a bit more complex. You may refer to THIS guide.

Now you can compile all the Cython algorithms by running the following command. The script compiles within the currently active environment. The code has been developed for Linux and Windows platforms. You may see some warnings during compilation.

python run_compile_all_cython.py

D-Wave Setup

To use D-Wave cloud services, you must first sign up for D-Wave Leap and get your API token.

Then, you need to run the following command in the newly created Python environment:

dwave setup

This is a guided setup for the D-Wave Ocean SDK. When asked to select non-open-source packages to install, answer y and install at least D-Wave Drivers. The D-Wave Problem Inspector package is not required, but it can be useful for analysing problem solutions (only for problems solved directly on the QPU).

Then, continue the configuration by setting custom properties (or keeping the default ones, as we suggest), apart from the Authentication token field, where you should paste your API token obtained on the D-Wave Leap dashboard.

You should now be able to connect to D-Wave cloud services. In order to verify the connection, you can use the following command, which will send a test problem to D-Wave's QPU:

dwave ping

Running CQFS Experiments

First of all, you need to prepare the original files for the datasets.

For The Movies Dataset, download the dataset from Kaggle and place the compressed file in the directory recsys/Data_manager_offline_datasets/TheMoviesDataset/, making sure the file is called the-movies-dataset.zip.

For CiteULike_a you need to download the following .zip file and place it in the directory recsys/Data_manager_offline_datasets/CiteULike/, making sure the file is called CiteULike_a_t.zip.

We cannot provide data for Xing Challenge 2017, but if you have the dataset available, place the compressed file containing the dataset's original files in the directory recsys/Data_manager_offline_datasets/XingChallenge2017/, making sure the file is called xing_challenge_data_2017.zip.
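Since the split scripts depend on these exact locations and file names, a quick check can catch a misplaced archive early. This is a minimal sketch, not a script shipped with the repository: `missing_archives` is a hypothetical helper, and the paths are copied from the instructions above, relative to the repository root.

```python
from pathlib import Path

# Relative archive locations, copied from the instructions above.
EXPECTED_ARCHIVES = {
    "The Movies Dataset": "recsys/Data_manager_offline_datasets/TheMoviesDataset/the-movies-dataset.zip",
    "CiteULike_a": "recsys/Data_manager_offline_datasets/CiteULike/CiteULike_a_t.zip",
    "Xing Challenge 2017": "recsys/Data_manager_offline_datasets/XingChallenge2017/xing_challenge_data_2017.zip",
}


def missing_archives(repo_root="."):
    """Return the dataset names whose archive is not found under repo_root."""
    root = Path(repo_root)
    return [name for name, rel in EXPECTED_ARCHIVES.items() if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the current directory is the repository root.
    for name in missing_archives():
        print(f"Missing archive for {name}: {EXPECTED_ARCHIVES[name]}")
```

A missing Xing Challenge 2017 archive is expected if you do not have access to that dataset.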

After preparing the datasets, you should run the following command under the data directory:

python split_NameOfTheDataset.py

This Python script generates the data splits used in the experiments. It also preprocesses the dataset and checks for errors in the preprocessing phase. The resulting splits are saved in the recsys/Data_manager_split_datasets directory.

After splitting the dataset, you can run the experiments. All the experiment scripts are in the experiments directory, so enter this folder first. Each dataset has separate experiment scripts, which you can find in the corresponding directories. From now on, we assume that you are running the following commands in the dataset-specific folders, thus running the scripts contained there.

Collaborative models

First of all, we need to optimize the chosen collaborative models to use with CQFS. To do so, run the following command:

python CollaborativeFiltering.py

The resulting models will be saved into the results directory.

CQFS

Then, you can run the CQFS procedure. We divided the procedure into a selection phase and a recommendation phase. To perform the selection through CQFS run the following command:

python CQFS.py

This script will solve the CQFS problem on the corresponding dataset and save all the selected features in appropriate subdirectories under the results directory.
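For readers new to quantum annealers: samplers of this kind minimize a quadratic function of binary variables (a QUBO), so a feature selection problem is expressed with one binary variable per feature. The sketch below is a toy illustration only and assumes nothing about how CQFS.py builds its problem: the coefficient matrix `Q` is invented, the function names are hypothetical, and the exhaustive search stands in for the D-Wave and Simulated Annealing samplers used in the experiments.

```python
from itertools import product


def qubo_energy(Q, x):
    """Energy x^T Q x of a binary vector x for a square coefficient matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_select(Q):
    """Try every binary assignment and return (best_vector, best_energy)."""
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return list(best), qubo_energy(Q, best)


if __name__ == "__main__":
    # Invented coefficients: negative diagonal entries reward keeping a feature,
    # the positive off-diagonal entry penalizes keeping features 0 and 1 together.
    Q = [
        [-1.5, 2.0, 0.0],
        [0.0, -1.0, 0.0],
        [0.0, 0.0, 0.5],
    ]
    selection, energy = brute_force_select(Q)
    print("selected features:", [i for i, bit in enumerate(selection) if bit])
    print("energy:", energy)
```

Exhaustive search doubles in cost with every added feature, which is why problems of realistic size are handed to an annealer or a heuristic sampler instead.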

After solving the feature selection problem, you should run the following command:

python CQFSTrainer.py

This script will optimize an ItemKNN content-based recommender system for each selection corresponding to the given hyperparameters (and previously obtained through CQFS), using only the selected features. Again, all the results are saved in the corresponding subdirectories under the results directory.
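The three steps described so far must run in a fixed order, since each one reads what the previous one saved in the results directory. A minimal sketch of a wrapper that enforces that order follows. It is not part of the repository: `pipeline_commands` and `run_pipeline` are hypothetical names, and only the script names come from the instructions above.

```python
import subprocess
import sys

# Script names and their order are taken from the instructions above.
PIPELINE = ["CollaborativeFiltering.py", "CQFS.py", "CQFSTrainer.py"]


def pipeline_commands(python=sys.executable):
    """Build one command per script, using the interpreter of the active environment."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(cwd="."):
    """Run the scripts in order inside cwd and stop at the first failure."""
    for command in pipeline_commands():
        # check=True raises CalledProcessError if a script exits with an error.
        subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_pipeline()` from one of the dataset-specific folders would run the whole sequence. Keep in mind that the CQFS.py step consumes D-Wave Leap time when a D-Wave solver is used.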

NOTE: Each selection with the D-Wave Leap hybrid service on these problems takes around 8 seconds for The Movies Dataset and around 30 seconds for CiteULike_a. Therefore, running the script as it is would consume all the free time given with the developer plan on D-Wave Leap and may result in errors or invalid selections when there is no free time remaining.

We suggest reducing the number of hyperparameters passed when running the experiments or, even better, choosing a single collaborative model and performing all the experiments on it.

This is not the case when running experiments with Simulated Annealing, since it is executed locally.

For Xing Challenge 2017, experiments run directly on the D-Wave QPU. With all the hyperparameters left unchanged, the experiments should not exceed the free time of the developer plan. Pay attention when increasing the number of reads from the sampler or the annealing time.
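To decide how many hyperparameter combinations to keep for the hybrid service, a rough budget estimate can be computed from the per-selection times quoted in the note above. This is a back-of-the-envelope sketch: `max_selections` is a hypothetical helper, and the 600-second budget is an invented placeholder, so read your actual remaining time from the D-Wave Leap dashboard.

```python
def max_selections(budget_seconds, seconds_per_selection):
    """Number of whole selections that fit into the remaining solver time."""
    if seconds_per_selection <= 0:
        raise ValueError("seconds_per_selection must be positive")
    return int(budget_seconds // seconds_per_selection)


# Approximate per-selection times quoted in the note above.
SECONDS_PER_SELECTION = {"The Movies Dataset": 8, "CiteULike_a": 30}

if __name__ == "__main__":
    # Hypothetical budget: replace it with the time left on your Leap account.
    remaining_seconds = 600
    for dataset, seconds in SECONDS_PER_SELECTION.items():
        print(dataset, max_selections(remaining_seconds, seconds))
```

The estimate ignores network and queueing overhead, so it is best treated as an upper bound.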

Baselines

In order to obtain the baseline evaluations you can run the corresponding scripts with the following commands:

# ItemKNN content-based with all the features
python baseline_CBF.py

# ItemKNN content-based with features selected through TF-IDF
python baseline_TFIDF.py

# CFeCBF feature weighting baseline
python baseline_CFW.py

Acknowledgements

Software produced by Riccardo Nembrini. Recommender systems library by Maurizio Ferrari Dacrema.

Article authors: Riccardo Nembrini, Maurizio Ferrari Dacrema, Paolo Cremonesi
