PyTorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective"

Last update: Sep 14, 2022

Graph Neural Topic Model (GNTM)

This is the PyTorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective".

Requirements

Python >= 3.6
PyTorch == 1.6.0
torch-geometric == 1.7.0
torch-scatter == 2.0.6
torch-sparse == 0.6.9

Dataset

Links to the datasets are listed below:

The GloVe word embeddings can be downloaded from this link.

The datasets and word embeddings should be placed at the paths specified in settings.py.
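
For orientation, here is a minimal sketch of reading word vectors in the plain-text GloVe format, where each line holds a token followed by space-separated floats. It is not taken from this repository: the function name load_glove and the sample data are hypothetical, and the preprocessing scripts may load the embeddings differently.

```python
def load_glove(lines, dim=300):
    """Parse GloVe text-format lines into a dict mapping word -> list of floats.

    `lines` is any iterable of strings, for example an open text file.
    Rows with fewer than `dim` values are skipped.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # malformed or truncated row
        # Take the last `dim` fields as the vector; the rest is the token.
        word = " ".join(parts[:-dim])
        embeddings[word] = [float(x) for x in parts[-dim:]]
    return embeddings


# Tiny in-memory sample with 3-dimensional vectors (illustrative values only).
sample = ["topic 0.1 0.2 0.3", "graph -0.5 0.0 0.5", "broken 1.0"]
vectors = load_glove(sample, dim=3)
print(sorted(vectors))  # the truncated "broken" row is skipped
```

With a real file, the same function could be called as load_glove(open(path, encoding="utf-8"), dim=300), where path is wherever settings.py expects the embeddings and dim matches the --ni option.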

Usage

Before training GNTM, preprocess the data with the following scripts (some parameters need to be adjusted for each dataset, as described in our paper):

cd dataPrepare
python preprocess.py
python graph_data.py

Example script to train GNTM:

python main.py \
--device cuda:0 \
--dataset News20 \
--model GDGNNMODEL \
--num_topic 20 \
--num_epoch 400 \
--ni 300 \
--word \
--taskid 0 \
--nwindow 3

Here,

--dataset specifies the dataset name. Currently supported values are News20, TMN, BNC, and Reuters, for 20 Newsgroups, Tag My News, British National Corpus, and Reuters, respectively.
--device specifies the computation device, such as cpu or cuda:0.
--model specifies the model to use; GDGNNMODEL corresponds to GNTM.
--num_topic specifies the number of topics.
--num_epoch specifies the maximum number of training epochs.
--ni specifies the dimension of the word embeddings.
--taskid corresponds to the random seed.
--nwindow specifies the window size used to construct document graphs.
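
To illustrate what the window size controls, the following minimal sketch builds a word co-occurrence graph from a tokenized document. It is not the repository's graph_data.py: the function name build_document_graph, the use of undirected edges, and the convention that a window of size w spans w consecutive tokens are all assumptions made for this example.

```python
from collections import Counter


def build_document_graph(tokens, window=3):
    """Count co-occurrences of distinct words within a sliding window.

    Returns a Counter mapping an alphabetically sorted word pair to the
    number of times the two words appear within `window` consecutive tokens.
    """
    edges = Counter()
    for i, src in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for j in range(i + 1, min(i + window, len(tokens))):
            dst = tokens[j]
            if src == dst:
                continue  # no self-loops in this sketch
            edges[tuple(sorted((src, dst)))] += 1
    return edges


doc = ["graph", "neural", "topic", "model", "graph"]
graph = build_document_graph(doc, window=3)
for (a, b), weight in sorted(graph.items()):
    print(a, b, weight)
```

Under these assumptions, a larger window links words that are farther apart, so the graph becomes denser; window=1 produces no edges at all.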

Reference

If you find our methods or code helpful, please kindly cite the paper:

@inproceedings{shen2021topic,
  title={Topic Modeling Revisited: A Document Graph-based Neural Network Perspective},
  author={Shen, Dazhong and Qin, Chuan and Wang, Chao and Dong, Zheng and Zhu, Hengshu and Xiong, Hui},
  booktitle={Proceedings of Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS-2021)},
  year={2021}
}

Owner

Dazhong Shen
