Code for Discovering Topics in Long-tailed Corpora with Causal Intervention.

Last update: Dec 16, 2022

Overview

Code for the paper "Discovering Topics in Long-tailed Corpora with Causal Intervention", published in Findings of ACL-IJCNLP 2021.

Usage

0. Prepare environment

Requirements:

python==3.6
tensorflow-gpu==1.13.1
scipy==1.5.2
scikit-learn==0.23.2
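
To check an existing environment against the package pins above, a small script along these lines may help. It is a sketch and not part of this repository: the helper names `parse_pins`, `find_mismatches` and `installed_versions` are hypothetical, and the version lookup assumes `pkg_resources` (shipped with setuptools) is importable, since `importlib.metadata` is not available on Python 3.6. The Python version itself is not a pip package and would need a separate check.

```python
# Sketch only: helper names are hypothetical and not part of this repository.
PINS = """
tensorflow-gpu==1.13.1
scipy==1.5.2
scikit-learn==0.23.2
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for packages that are missing or differ."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


def installed_versions(names):
    """Look up installed versions; assumes setuptools' pkg_resources is available."""
    import pkg_resources  # imported lazily so the pure helpers work without it

    found = {}
    for name in names:
        try:
            found[name] = pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            pass  # leave missing packages out of the result
    return found
```

Passing the result of `installed_versions(pins)` to `find_mismatches` gives a dict of packages to fix; an empty dict means the pinned packages match.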

1. Prepare data

Download the preprocessed datasets from Google Drive and extract them to ./data.

2. Run the model

python main.py --data_dir ./data/{dataset} --output_dir ./output
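
To run the model on every dataset in turn, the command above can be wrapped in a short driver. This is a minimal sketch under two assumptions that the README does not confirm: each dataset is a subdirectory of ./data, and writing each run to its own subdirectory of ./output is acceptable. The function names `build_commands` and `run_all` are hypothetical; only the `--data_dir` and `--output_dir` flags come from the command above.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(data_root, output_root):
    """Build one `main.py` command per dataset directory under data_root."""
    commands = []
    for dataset_dir in sorted(Path(data_root).iterdir()):
        if not dataset_dir.is_dir():
            continue  # skip stray files such as downloaded archives
        commands.append([
            sys.executable, "main.py",
            "--data_dir", str(dataset_dir),
            # Assumption: one output subdirectory per dataset.
            "--output_dir", str(Path(output_root) / dataset_dir.name),
        ])
    return commands


def run_all(data_root="./data", output_root="./output"):
    """Run the commands one after another, stopping at the first failure."""
    for command in build_commands(data_root, output_root):
        # Create the output directory in case main.py does not do so itself.
        Path(command[-1]).mkdir(parents=True, exist_ok=True)
        subprocess.run(command, check=True)
```

Calling `run_all()` from the repository root runs the datasets in alphabetical order; `check=True` raises an exception as soon as one run exits with a non-zero status.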

3. Evaluation

topic coherence: coherence score.

topic diversity:

python utils/TU.py --data_path {path of topic word file}
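
For orientation, the following sketch shows one common way to score topic diversity. It assumes that "TU" stands for topic uniqueness: for each topic, the average over its top words of 1 / (number of times that word appears in the top-word lists of all topics), then averaged over topics. It also assumes the topic word file holds one topic per line with whitespace-separated words. Neither assumption is confirmed here, so utils/TU.py may differ in definition or file format; the function names are hypothetical.

```python
from collections import Counter


def topic_uniqueness(topics):
    """Average TU over topics; each topic is a list of its top words."""
    # Count how often each word occurs across all topics' top-word lists.
    counts = Counter(word for topic in topics for word in topic)
    per_topic = [
        sum(1.0 / counts[word] for word in topic) / len(topic)
        for topic in topics
    ]
    return sum(per_topic) / len(per_topic)


def read_topics(path):
    """Assumed format: one topic per line, top words separated by whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.split() for line in handle if line.strip()]
```

Under this definition the score is 1 when no word is shared between topics and falls toward 1/K (for K topics) as the topics become identical.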

Citation

If you are interested in our work, please cite it as:

@inproceedings{wu2021discovering,
    title = "Discovering Topics in Long-tailed Corpora with Causal Intervention",
    author = "Wu, Xiaobao  and
    Li, Chunping  and
    Miao, Yishu",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.15",
    doi = "10.18653/v1/2021.findings-acl.15",
    pages = "175--185",
}

Other related works

EMNLP2020 Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder

NLPCC2020 Learning Multilingual Topics with Neural Variational Inference

Owner

Xiaobao Wu
