Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Last update: Sep 17, 2022

Overview

MTM

This is the official repository of the paper:

Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Qiang Sheng, Juan Cao, Xueyao Zhang, Xirong Li, and Lei Zhong.

Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)

PDF / Poster / Code / Chinese Dataset / Chinese Blog 1 / Chinese Blog 2

Datasets

The experiments use two datasets: the Twitter Dataset and the newly proposed Weibo Dataset. Note that the Weibo Dataset can be downloaded only after you submit the "Application to Use the Chinese Dataset for Detecting Previously Fact-Checked Claim".

Code

Key Requirements

python==3.6.10
torch==1.6.0
torchvision==0.7.0
transformers==3.2.0
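
Before running the pipeline, it can help to confirm that the active environment matches these pins. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and the check treats a local build suffix such as `+cu101` as matching the pin.

```python
import sys

# Pins copied from the requirements list above.
REQUIRED = {
    "torch": "1.6.0",
    "torchvision": "0.7.0",
    "transformers": "3.2.0",
}
REQUIRED_PYTHON = (3, 6, 10)


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        # Python 3.6/3.7 fallback: pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(required, lookup=installed_version):
    """Map each package that deviates from its pin to (pinned, found)."""
    mismatches = {}
    for package, pinned in required.items():
        found = lookup(package)
        # Wheels may report a local suffix such as "1.6.0+cu101"; ignore it.
        if found is None or found.split("+")[0] != pinned:
            mismatches[package] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:3] != REQUIRED_PYTHON:
        print("python: expected 3.6.10, found", sys.version.split()[0])
    for name, (pinned, found) in find_mismatches(REQUIRED).items():
        print("{}: expected {}, found {}".format(name, pinned, found))
```

An empty result from `find_mismatches(REQUIRED)` means all three packages match the pinned versions.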

Usage for Weibo Dataset

After downloading the dataset (see Datasets above for how to request access), move FN_11934_filtered.json and DN_27505_filtered.json into MTM/dataset/Weibo/raw:

mkdir -p MTM/dataset/Weibo/raw
mv FN_11934_filtered.json MTM/dataset/Weibo/raw
mv DN_27505_filtered.json MTM/dataset/Weibo/raw
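
A quick check that both files ended up in the expected place can save a failed preprocessing run. This is a minimal sketch run from the directory that contains MTM; the function name `missing_raw_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Directory and file names as given in the instructions above.
RAW_DIR = Path("MTM/dataset/Weibo/raw")
EXPECTED_FILES = ("FN_11934_filtered.json", "DN_27505_filtered.json")


def missing_raw_files(raw_dir=RAW_DIR, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from raw_dir."""
    raw_dir = Path(raw_dir)
    return [name for name in expected if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing from {}: {}".format(RAW_DIR, ", ".join(missing)))
    else:
        print("Both raw files are in place.")
```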

Preparation

Tokenize

cd MTM/preprocess/tokenize
sh run_weibo.sh

ROT

cd MTM/preprocess/ROT

See run_weibo.sh, which covers three steps:

Prepare RougeBert's training data:

python prepare_for_rouge.py --dataset Weibo --pretrained_model bert-base-chinese

Training:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False \
--dataset Weibo --pretrained_model bert-base-chinese --save './ckpts/Weibo' \
--rouge_bert_encoder_layers 1 --rouge_bert_regularize 0.01 \
--fp16 True

Training then writes checkpoints to ckpts/Weibo/[EPOCH].pt.
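
The later commands need a concrete value in place of [EPOCH]. As a minimal sketch, assuming checkpoints are named by an integer epoch (for example 3.pt), which this README does not confirm, the hypothetical helper below lists the checkpoints found in a directory. Which epoch to use remains your choice; the helper only shows what is available.

```python
import re
from pathlib import Path


def list_checkpoints(ckpt_dir):
    """Return (epoch, path) pairs for files named '<integer>.pt', sorted by epoch."""
    found = []
    for path in Path(ckpt_dir).glob("*.pt"):
        match = re.fullmatch(r"(\d+)\.pt", path.name)
        if match:  # skip files such as 'best.pt' that carry no epoch number
            found.append((int(match.group(1)), path))
    return sorted(found)


if __name__ == "__main__":
    for epoch, path in list_checkpoints("ckpts/Weibo"):
        print(epoch, path)
```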

Vectorize the claims and articles (get embeddings):

CUDA_VISIBLE_DEVICES=0 python get_embeddings.py \
--dataset Weibo --pretrained_model bert-base-chinese \
--rouge_bert_model_file './ckpts/Weibo/[EPOCH].pt' \
--batch_size 1024 --embeddings_type static

PMB

cd MTM/preprocess/PMB

Prepare the clustering data:
```
mkdir data
mkdir data/Weibo
```
Then run calculate_init_thresholds.ipynb, which produces data/Weibo/clustering_training_data_[TS_SMALL] <[TS_LARGE].pkl.

K-means clustering (see run_weibo.sh):

python kmeans_clustering.py --dataset Weibo --pretrained_model bert-base-chinese \
--clustering_data_file 'data/Weibo/clustering_training_data_[TS_SMALL] <[TS_LARGE].pkl'

This step produces data/Weibo/kmeans_cluster_centers.npy.
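
For readers unfamiliar with the technique, the sketch below shows plain K-means (Lloyd's algorithm) on a matrix of vectors using NumPy. It is an illustration only: kmeans_clustering.py may use a different implementation, initialization, or distance, and the function name and defaults here are assumptions.

```python
import numpy as np


def kmeans(vectors, n_clusters, n_iters=20, seed=0):
    """Plain Lloyd's K-means. Returns (centers, labels)."""
    vectors = np.asarray(vectors, dtype=np.float64)
    rng = np.random.RandomState(seed)
    # Initialise the centers with distinct randomly chosen input vectors.
    start = rng.choice(len(vectors), size=n_clusters, replace=False)
    centers = vectors[start].copy()

    def assign(current_centers):
        # Squared Euclidean distance from every vector to every center: shape (N, K).
        diffs = vectors[:, None, :] - current_centers[None, :, :]
        return (diffs ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(centers)
        for k in range(n_clusters):
            members = vectors[labels == k]
            if len(members) > 0:  # an empty cluster keeps its previous center
                centers[k] = members.mean(axis=0)
    return centers, assign(centers)


if __name__ == "__main__":
    toy = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
    centers, labels = kmeans(toy, n_clusters=2)
    print(centers.shape, labels)
```

An array of centers like this can be written to a .npy file with `np.save` and read back with `np.load`.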

In addition, key_sentences_selection_cases_Weibo.ipynb shows some examples of key sentence selection.

Training and Inferring

cd MTM/model
mkdir data
mkdir data/Weibo

See run_weibo.sh:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False --save 'ckpts/Weibo' \
--dataset 'Weibo' --pretrained_model 'bert-base-chinese' \
--rouge_bert_model_file '../preprocess/ROT/ckpts/Weibo/[EPOCH].pt' \
--memory_init_file '../preprocess/PMB/data/Weibo/kmeans_cluster_centers.npy' \
--claim_sentence_distance_file './data/Weibo/claim_sentence_distance.pkl' \
--pattern_sentence_distance_init_file './data/Weibo/pattern_sentence_distance_init.pkl' \
--memory_updated_step 0.3 --lambdaQ 0.6 --lambdaP 0.4 \
--selected_sentences 3 \
--lr 5e-6 --epochs 10 --batch_size 32

The results and ranking reports are then saved in ckpts/Weibo.
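
The flags --lambdaQ, --lambdaP and --selected_sentences are not explained in this README. One plausible reading, based on the flag and file names, is that each sentence gets a claim-sentence score and a pattern-sentence score derived from distances, the two are mixed with weights lambdaQ and lambdaP, and the top-scoring sentences are kept. The sketch below illustrates that reading only. The min-max scaling and all function names are assumptions, not the repository's code.

```python
def scale(distances):
    """Min-max scale distances to [0, 1] and invert, so the smallest distance scores 1."""
    lo, hi = min(distances), max(distances)
    if hi == lo:
        return [1.0 for _ in distances]
    return [1.0 - (d - lo) / (hi - lo) for d in distances]


def select_key_sentences(claim_dists, pattern_dists,
                         lambda_q=0.6, lambda_p=0.4, top_k=3):
    """Return indices of the top_k sentences by the weighted combined score.

    claim_dists[i]   : distance between the claim and sentence i (assumed input)
    pattern_dists[i] : distance between sentence i and its nearest pattern
                       vector (assumed input)
    """
    scr_q = scale(claim_dists)
    scr_p = scale(pattern_dists)
    combined = [lambda_q * q + lambda_p * p for q, p in zip(scr_q, scr_p)]
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    print(select_key_sentences([0.2, 0.9, 0.5, 0.1], [0.8, 0.1, 0.4, 0.7], top_k=2))
```

Under this reading, raising lambdaQ favors sentences close to the claim, while raising lambdaP favors sentences close to a stored pattern.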

Usage for Twitter Dataset

A description of the dataset is available here.

Preparation

Tokenize

cd MTM/preprocess/tokenize
sh run_twitter.sh

ROT

cd MTM/preprocess/ROT

See run_twitter.sh, which covers three steps:

Prepare RougeBert's training data:

python prepare_for_rouge.py --dataset Twitter --pretrained_model bert-base-uncased

Training:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False \
--dataset Twitter --pretrained_model bert-base-uncased --save './ckpts/Twitter' \
--rouge_bert_encoder_layers 1 --rouge_bert_regularize 0.05 \
--fp16 True

Training then writes checkpoints to ckpts/Twitter/[EPOCH].pt.

Vectorize the claims and articles (get embeddings):

CUDA_VISIBLE_DEVICES=0 python get_embeddings.py \
--dataset Twitter --pretrained_model bert-base-uncased \
--rouge_bert_model_file './ckpts/Twitter/[EPOCH].pt' \
--batch_size 1024 --embeddings_type static

PMB

cd MTM/preprocess/PMB

Prepare the clustering data:
```
mkdir data
mkdir data/Twitter
```
Then run calculate_init_thresholds.ipynb, which produces data/Twitter/clustering_training_data_[TS_SMALL] <[TS_LARGE].pkl.

K-means clustering (see run_twitter.sh):

python kmeans_clustering.py --dataset Twitter --pretrained_model bert-base-uncased \
--clustering_data_file 'data/Twitter/clustering_training_data_[TS_SMALL] <[TS_LARGE].pkl'

This step produces data/Twitter/kmeans_cluster_centers.npy.

In addition, key_sentences_selection_cases_Twitter.ipynb shows some examples of key sentence selection.

Training and Inferring

cd MTM/model
mkdir data
mkdir data/Twitter

See run_twitter.sh:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False --save 'ckpts/Twitter' \
--dataset 'Twitter' --pretrained_model 'bert-base-uncased' \
--rouge_bert_model_file '../preprocess/ROT/ckpts/Twitter/[EPOCH].pt' \
--memory_init_file '../preprocess/PMB/data/Twitter/kmeans_cluster_centers.npy' \
--claim_sentence_distance_file './data/Twitter/claim_sentence_distance.pkl' \
--pattern_sentence_distance_init_file './data/Twitter/pattern_sentence_distance_init.pkl' \
--memory_updated_step 0.3 --lambdaQ 0.6 --lambdaP 0.4 \
--selected_sentences 5 \
--lr 1e-4 --epochs 10 --batch_size 16

The results and ranking reports are then saved in ckpts/Twitter.
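
The Weibo and Twitter training commands differ only in the dataset name (and the paths derived from it), the pretrained model, --selected_sentences, --lr, and --batch_size. The sketch below collects those differences in one table and rebuilds the argument list for either dataset. The values are copied from the two commands in this README; the names `PER_DATASET` and `build_training_args` are hypothetical and not part of the repository.

```python
import shlex

# Per-dataset values copied from the two training commands in this README.
PER_DATASET = {
    "Weibo": {"pretrained_model": "bert-base-chinese", "selected_sentences": 3,
              "lr": "5e-6", "batch_size": 32},
    "Twitter": {"pretrained_model": "bert-base-uncased", "selected_sentences": 5,
                "lr": "1e-4", "batch_size": 16},
}


def build_training_args(dataset, epoch):
    """Return the main.py argument list for `dataset`; `epoch` fills [EPOCH]."""
    cfg = PER_DATASET[dataset]
    return [
        "python", "main.py", "--debug", "False",
        "--save", "ckpts/{}".format(dataset),
        "--dataset", dataset,
        "--pretrained_model", cfg["pretrained_model"],
        "--rouge_bert_model_file",
        "../preprocess/ROT/ckpts/{}/{}.pt".format(dataset, epoch),
        "--memory_init_file",
        "../preprocess/PMB/data/{}/kmeans_cluster_centers.npy".format(dataset),
        "--claim_sentence_distance_file",
        "./data/{}/claim_sentence_distance.pkl".format(dataset),
        "--pattern_sentence_distance_init_file",
        "./data/{}/pattern_sentence_distance_init.pkl".format(dataset),
        # Shared by both datasets.
        "--memory_updated_step", "0.3", "--lambdaQ", "0.6", "--lambdaP", "0.4",
        "--selected_sentences", str(cfg["selected_sentences"]),
        "--lr", cfg["lr"], "--epochs", "10",
        "--batch_size", str(cfg["batch_size"]),
    ]


if __name__ == "__main__":
    args = build_training_args("Twitter", epoch="[EPOCH]")
    print(" ".join(shlex.quote(a) for a in args))
```

The printed command is meant to be run from MTM/model, as above, with CUDA_VISIBLE_DEVICES set as needed.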

Citation

@inproceedings{MTM,
  author    = {Qiang Sheng and
               Juan Cao and
               Xueyao Zhang and
               Xirong Li and
               Lei Zhong},
  title     = {Article Reranking by Memory-Enhanced Key Sentence Matching for Detecting
               Previously Fact-Checked Claims},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational
               Linguistics and the 11th International Joint Conference on Natural
               Language Processing, {ACL/IJCNLP} 2021},
  pages     = {5468--5481},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
  url       = {https://doi.org/10.18653/v1/2021.acl-long.425},
  doi       = {10.18653/v1/2021.acl-long.425},
}

