Resources for our AAAI 2022 paper: "LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification".

LOREN

DEMO System

Check out our demo system! Note that its results will differ slightly from those in the paper, since the demo uses an up-to-date Wikipedia as the evidence source, whereas FEVER uses a 2017 Wikipedia dump.

Dependencies

  • CUDA > 11
  • Prepare requirements: pip3 install -r requirements.txt.
    • Also works with allennlp==2.3.0, transformers==4.5.1, torch==1.8.1.
  • Set environment variable $PJ_HOME: export PJ_HOME=/YOUR_PATH/LOREN/.
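Scripts that rely on $PJ_HOME typically resolve their data and model paths against it. The snippet below is a minimal sketch of that pattern; `pj_path` is a hypothetical helper written for illustration, not a function from the LOREN code base, which may read the variable differently.

```python
import os
from pathlib import Path


def pj_path(*parts: str) -> Path:
    """Resolve a path relative to the LOREN project root stored in $PJ_HOME.

    Hypothetical helper for illustration only; it fails loudly when the
    variable is missing instead of silently using the working directory.
    """
    root = os.environ.get("PJ_HOME")
    if not root:
        raise RuntimeError("PJ_HOME is not set; run: export PJ_HOME=/YOUR_PATH/LOREN/")
    return Path(root).joinpath(*parts)
```

For example, with `PJ_HOME=/YOUR_PATH/LOREN/`, `pj_path("data", "fact_checking", "v5")` points at the veracity prediction data described below.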

Download Pre-processed Data and Checkpoints

  • Pre-processed data at Google Drive. Unzip it and put the contents under LOREN/data/.

    • Data for training a Seq2Seq MRC is at data/mrc_seq2seq_v5/.
    • Data for training veracity prediction is at data/fact_checking/v5/*.json.
      • Note: dev.json uses ground truth evidence for validation, whereas eval.json uses predicted evidence for validation. This is consistent with the settings in KGAT.
    • Evidence retrieval models are not required for training LOREN, since we directly adopt the retrieved evidence from KGAT, which is at data/fever/baked_data/ (used only during pre-processing).
    • Original data is at data/fever/ (used only during pre-processing).
  • Pre-trained checkpoints at Huggingface Models. Unzip it and put the contents under LOREN/models/.

    • Checkpoints for veracity prediction are at models/fact_checking/.
    • Checkpoint for generative MRC is at models/mrc_seq2seq/.
    • Checkpoints for KGAT evidence retrieval models are at models/evidence_retrieval/ (not used in training, displayed only for the sake of completeness).
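After unzipping both downloads, a quick way to confirm the layout is to check for the directories listed above. This is a sketch that assumes exactly those paths; the optional models/evidence_retrieval/ directory is left out because it is not used in training.

```python
from pathlib import Path
from typing import List

# Directories named in this README (evidence_retrieval omitted: not used in training).
EXPECTED_DIRS = [
    "data/mrc_seq2seq_v5",
    "data/fact_checking/v5",
    "data/fever",
    "models/fact_checking",
    "models/mrc_seq2seq",
]


def missing_dirs(project_root: str) -> List[str]:
    """Return the expected sub-directories that do not exist under project_root."""
    root = Path(project_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs("/YOUR_PATH/LOREN")` should return an empty list once everything is in place.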

Training LOREN from Scratch

For quick training and inference with pre-processed data & pre-trained models, please go to Veracity Prediction.

First, go to LOREN/src/.

1 Building Local Premises from Scratch

1) Extract claim phrases and generate questions

This step uses three external models: two AllenNLP models in parsing_client/sentence_parser.py and a T5-based question generation model in qg_client/question_generator.py. Don't worry, they are downloaded automatically.

  • Run python3 pproc_client/pproc_questions.py --roles eval train val test
  • This generates cached json files:
    • AG_PREFIX/answer.{role}.cache: extracted phrases are stored in the field answers.
    • QG_PREFIX/question.{role}.cache: generated questions are stored in the fields cloze_qs, generate_qs and questions (the latter concatenates the two types of questions).
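To inspect the question cache, something like the following may help. It is a sketch under two assumptions that are not confirmed here: that the cache is stored as one JSON object per line, and that the questions field is the list of cloze questions followed by the list of generated questions. Check a real cache file before relying on either.

```python
import json
from typing import Dict, Iterator, List


def iter_cache(path: str) -> Iterator[Dict]:
    """Yield one cache entry per non-empty line (assumes a JSON-lines layout)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def merged_questions(entry: Dict) -> List[str]:
    """Cloze questions followed by generated questions for one claim.

    Assumption: this mirrors how the questions field concatenates the
    two question types.
    """
    return list(entry["cloze_qs"]) + list(entry["generate_qs"])
```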

2) Train Seq2Seq MRC

Prepare self-supervised MRC data (only for SUPPORTED claims)
  • Run python3 pproc_client/pproc_mrc.py -o LOREN/data/mrc_seq2seq_v5.
  • This generates files for Seq2Seq training in a HuggingFace style:
    • data/mrc_seq2seq_v5/{role}.source: concatenated question and evidence text.
    • data/mrc_seq2seq_v5/{role}.target: answer (claim phrase).
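The HuggingFace-style format pairs the two files line by line: line i of {role}.source is the input and line i of {role}.target is the expected output. A minimal writer for that layout is sketched below; the separator used to join question and evidence is a hypothetical choice, not the one pproc_mrc.py necessarily uses.

```python
from pathlib import Path
from typing import Iterable, Tuple


def write_seq2seq_split(out_dir: str, role: str,
                        examples: Iterable[Tuple[str, str, str]],
                        sep: str = " </s> ") -> int:
    """Write line-aligned {role}.source / {role}.target files.

    Each example is (question, evidence, answer). The separator is a
    hypothetical choice for illustration. Returns the number of examples.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    n = 0
    with open(out / f"{role}.source", "w", encoding="utf-8") as src, \
         open(out / f"{role}.target", "w", encoding="utf-8") as tgt:
        for question, evidence, answer in examples:
            # One example per line: embedded newlines would break the alignment.
            src.write((question + sep + evidence).replace("\n", " ") + "\n")
            tgt.write(answer.replace("\n", " ") + "\n")
            n += 1
    return n
```

Keeping every example on a single line is the design constraint that matters here, since the training script reads the two files in parallel.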
Training Seq2Seq
  • Go to mrc_client/seq2seq/, which is adapted from HuggingFace's examples.
  • Follow script/train.sh.
  • The best checkpoint will be saved in $output_dir (e.g., models/mrc_seq2seq/).
    • The best checkpoint is selected by ROUGE score on the dev set.
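Checkpoint selection by ROUGE amounts to scoring each checkpoint's dev-set answers against the reference claim phrases and keeping the highest-scoring one. The sketch below uses a simplified ROUGE-1 F1 (lowercased whitespace tokens) as a stand-in; the ROUGE variant and tokenization used by script/train.sh may differ.

```python
from collections import Counter
from typing import Dict, List


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_checkpoint(dev_outputs: Dict[str, List[str]], references: List[str]) -> str:
    """Return the checkpoint whose dev predictions have the highest mean ROUGE-1 F1."""
    def mean_score(preds: List[str]) -> float:
        return sum(rouge1_f1(p, r) for p, r in zip(preds, references)) / len(references)
    return max(dev_outputs, key=lambda ckpt: mean_score(dev_outputs[ckpt]))
```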

3) Run MRC for all questions and assemble local premises

  • Run python3 pproc_client/pproc_evidential.py --roles val train eval test -m PATH_TO_MRC_MODEL/.
  • This generates files:
    • {role}.json: files for veracity prediction. Assembled local premises are stored in the field evidential_assembled.
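As an illustration of what "assembling" means, the sketch below assumes that each local premise is the claim with one extracted claim phrase replaced by the answer the MRC model reads off the evidence. This is a hedged reading of the pipeline, not the code of pproc_evidential.py; the function name is hypothetical.

```python
from typing import List


def assemble_local_premises(claim: str, phrases: List[str], answers: List[str]) -> List[str]:
    """Build one local premise per claim phrase by swapping in its MRC answer.

    Hypothetical helper: replaces the first occurrence of each phrase in the
    claim with the corresponding answer.
    """
    premises = []
    for phrase, answer in zip(phrases, answers):
        if phrase not in claim:
            raise ValueError(f"phrase {phrase!r} not found in claim")
        premises.append(claim.replace(phrase, answer, 1))
    return premises
```

For the claim "Obama was born in Kenya." with phrases ["Obama", "Kenya"] and answers ["Obama", "Hawaii"], the first premise matches the claim while the second ("Obama was born in Hawaii.") exposes the conflicting phrase.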

4) Building NLI prior

Before training veracity prediction, we'll need an NLI prior from pre-trained NLI models, such as DeBERTa.

  • Run python3 pproc_client/pproc_nli_labels.py -i PATH_TO/{role}.json -m microsoft/deberta-large-mnli.
  • Mind the order! The predicted classes [Contradict, Neutral, Entailment] correspond to [REF, NEI, SUP], respectively.
  • This adds a new field nli_labels to {role}.json.
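The class-order warning above is easy to get wrong, so it can help to make the mapping explicit. The sketch below re-keys a three-way MNLI probability vector in the stated order [Contradict, Neutral, Entailment] as REF/NEI/SUP. With HuggingFace Transformers, inspecting `model.config.id2label` is a safer way to confirm the order of a given checkpoint than hard-coding it.

```python
from typing import Dict, Sequence

# MNLI class order stated in this README, and the veracity labels they map to.
NLI_ORDER = ("CONTRADICTION", "NEUTRAL", "ENTAILMENT")
NLI_TO_FEVER = {"CONTRADICTION": "REF", "NEUTRAL": "NEI", "ENTAILMENT": "SUP"}


def nli_probs_to_fever(probs: Sequence[float]) -> Dict[str, float]:
    """Re-key a [contradiction, neutral, entailment] probability vector as REF/NEI/SUP."""
    if len(probs) != len(NLI_ORDER):
        raise ValueError("expected three class probabilities")
    return {NLI_TO_FEVER[name]: p for name, p in zip(NLI_ORDER, probs)}
```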

2 Veracity Prediction

This part is rather easy (less pipelined :P) and a good place to start if you want to skip the pre-processing above.

1) Training

  • Go to folder check_client/.
  • See what scripts/train_*.sh does.

2) Testing

  • Stay in folder check_client/
  • Run python3 fact_checker.py --params PARAMS_IN_THE_CODE
  • This generates files:
    • results/*.predictions.jsonl

3) Evaluation

  • Go to folder eval_client/

  • For Label Accuracy and FEVER score: fever_scorer.py

  • For CulpA (turn on --verbose in testing): culpa.py
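For reference, the two metrics reported by fever_scorer.py follow the standard FEVER definitions: label accuracy counts correct labels, while the FEVER score additionally requires that, for SUPPORTS and REFUTES claims, the first five predicted evidence sentences contain at least one complete gold evidence set. The sketch below is a simplified re-implementation of that definition, not the repository's scorer; the tuple-based input format is hypothetical, so entries from results/*.predictions.jsonl would need converting first.

```python
from typing import List, Sequence, Set, Tuple

Evidence = Tuple[str, int]  # (Wikipedia page title, sentence index)
Instance = Tuple[str, str, Sequence[Evidence], List[Set[Evidence]]]


def strictly_correct(pred_label: str, gold_label: str,
                     pred_evidence: Sequence[Evidence],
                     gold_evidence_sets: List[Set[Evidence]],
                     max_evidence: int = 5) -> bool:
    """Correct label and, unless NEI, one full gold evidence set among the top predictions."""
    if pred_label != gold_label:
        return False
    if gold_label == "NOT ENOUGH INFO":
        return True
    retrieved = set(pred_evidence[:max_evidence])
    return any(gold <= retrieved for gold in gold_evidence_sets)


def label_accuracy_and_fever_score(instances: List[Instance]) -> Tuple[float, float]:
    """Return (label accuracy, FEVER score) over (pred, gold, evidence, gold sets) tuples."""
    if not instances:
        raise ValueError("no instances to score")
    n = len(instances)
    accuracy = sum(pred == gold for pred, gold, _, _ in instances) / n
    fever = sum(strictly_correct(*inst) for inst in instances) / n
    return accuracy, fever
```

The gap between the two numbers shows how often the model gets the label right without retrieving sufficient evidence.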

Citation

If you find our paper or resources useful to your research, please kindly cite our paper (pre-print, official published paper coming soon).

@misc{chen2021loren,
      title={LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification}, 
      author={Jiangjie Chen and Qiaoben Bao and Changzhi Sun and Xinbo Zhang and Jiaze Chen and Hao Zhou and Yanghua Xiao and Lei Li},
      year={2021},
      eprint={2012.13577},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}