This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports"

Overview

Introduction: X-Ray Report Generation

This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports". Our work adopts x-ray (also including some history data for patients if there are any) as input, a CNN is used to learn the embedding features for x-ray, as a result, disease-state-style information (Previously, almost all work used detected disease embedding for input of text generation network which could possibly exclude the false negative diseases) is extracted and fed into the text generation network (transformer). To make sure the consistency of detected diseases and generated x-ray reports, we also create a interpreter to enforce the accuracy of the x-ray reports. For details, please refer to here.

Data we used for experiments

We use two datasets for experiments to validate our method:

Performance on two datasets

Datasets Methods BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE-L
Open-I Single-view 0.463 0.310 0.215 0.151 0.186 0.377
Multi-view 0.476 0.324 0.228 0.164 0.192 0.379
Multi-view w/ Clinical History 0.485 0.355 0.273 0.217 0.205 0.422
Full Model (w/ Interpreter) 0.515 0.378 0.293 0.235 0.219 0.436
MIMIC Single-view 0.447 0.290 0.200 0.144 0.186 0.317
Multi-view 0.451 0.292 0.201 0.144 0.185 0.320
Multi-view w/ Clinical History 0.491 0.357 0.276 0.223 0.213 0.389
Full Model (w/ Interpreter) 0.495 0.360 0.278 0.224 0.222 0.390

Environments for running codes

  • Operating System: Ubuntu 18.04

  • Hardware: tested with RTX 2080 TI (11G)

  • Software: tested with PyTorch 1.5.1, Python3.7, CUDA 10.0, tensorboardX, tqdm

  • Anaconda is strongly recommended

  • Other Libraries: Spacy, SentencePiece, nlg-eval

How to use our code for train/test

Step 0: Build your vocabulary model with SentencePiece (tools/vocab_builder.py)

  • Please make sure that you have preprocess the medical reports accurately.
  • We use the top 900 high-frequency words
  • We use 100 unigram tokens extracted from SentencePiece to avoid the out-of-vocabulary situation.
  • In total we have 1000 words and tokens. Update: You can skip step 0 and use the vocabulary files in Vocabulary/*.model

Step 1: Train the LSTM and/or Transformer models, which are just text classifiers, to obtain 14 common disease labels.

  • Use the train_text.py to train the models on your working datasets. For example, the MIMIC-CXR comes with CheXpert labels; you can use these labels as ground-truth to train a differentiable text classifier model. Here the text classifier is a binary predictor (postive/uncertain) = 1 and (negative/unmentioned) = 0.
  • Assume the trained text classifier is perfect and exactly reflects the medical reports. Although this is not the case, in practice, it gives us a good approximation of how good the generated reports are. Human evaluation is also needed to evalutate the generated reports.
  • The goals here are:
  1. Evaluate the performance of the generated reports by comparing the predicted labels and the ground-truth labels.
  2. Use the trained models to fine-tune medical reports' output.

Step 2: Test the text classifier models using the train_text.py with:

  • PHASE = 'TEST'
  • RELOAD = True --> Load the trained models for testing

Step 3: Transfer the trained model to obtain 14 common disease labels for the Open-I datasets and any dataset that doesn't have ground-truth labels.

  • Transfer the learned model to the new dataset by predicting 14 disease labels for the entire dataset by running extract_label.py on the target dataset. The output file is file2label.json
  • Split them into train, validation, and test sets (we have already done that for you, just put the file2label.json in a place where the NLMCXR dataset can see).
  • Build your own text classifier (train_text.py) based on the extracted disease labels (treat them as ground-truth labels).
  • In the end, we want the text classifiers (LSTM/Transformer) to best describe your model's output on the working dataset.

Step 4: Get additional labels using (tools/count_nounphrases.py)

  • Note that 14 disease labels are not enough to generate accurate reports. This is because for the same disease, we might have different ways to express it. For this reason, additional labels are needed to enhance the quality of medical reports.
  • The output of the coun_nounphrases.py is a json file, you can use it as input to the exising datasets such as MIMIC or NLMCXR.
  • Therefore, in total we have 14 disease labels + 100 noun-phrases = 114 disease-related topics/labels. Please check the appendix in our paper.

Step 5: Train the ClsGen model (Classifier-Generator) with train_full.py

  • PHASE = 'TRAIN'
  • RELOAD = False --> We trained our model from scratch

Step 6: Train the ClsGenInt model (Classifier-Generator-Interpreter) with train_full.py

  • PHASE = 'TRAIN'
  • RELOAD = True --> Load the ClsGen trained from the step 4, load the Interpreter model from Step 1 or 3
  • Reduce the learning rate --> Since the ClsGen has already converged, we need to reduce the learning rate to fine-tune the word representation such that it minimize the interpreter error.

Step 7: Generate the outputs

  • Use the infer function in the train_full.py to generate the outputs. This infer function ensures that no ground-truth labels and medical reports are being used in the inference phase (we used teacher forcing / ground-truth labels during training phase).
  • Also specify the threshold parameter, see the appendix of our paper on which threshold to choose from.
  • Final specify your the name of your output files.

Step 8: Evaluate the generated reports.

  • Use the trained text classifier model in step 1 to evaluate the clinical accuracy
  • Use the nlg-eval library to compute BLEU-1 to BLEU-4 scores and other metrics.

Our pretrained models

Our model is uploaded in google drive, please download the model from

Model Name Download Link
Our Model for MIMIC Google Drive
Our Model for NLMCXR Google Drive

Citation

If it is helpful to you, please cite our work:

@inproceedings{nguyen-etal-2021-automated,
    title = "Automated Generation of Accurate {\&} Fluent Medical {X}-ray Reports",
    author = "Nguyen, Hoang  and
      Nie, Dong  and
      Badamdorj, Taivanbat  and
      Liu, Yujie  and
      Zhu, Yingying  and
      Truong, Jason  and
      Cheng, Li",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.288",
    doi = "10.18653/v1/2021.emnlp-main.288",
    pages = "3552--3569",
}

Owner
no name
no name
PyTorch implementation for the ICLR 2020 paper "Understanding the Limitations of Variational Mutual Information Estimators"

Smoothed Mutual Information ``Lower Bound'' Estimator PyTorch implementation for the ICLR 2020 paper Understanding the Limitations of Variational Mutu

50 Nov 09, 2022
Explaining Deep Neural Networks - A comparison of different CAM methods based on an insect data set

Explaining Deep Neural Networks - A comparison of different CAM methods based on an insect data set This is the repository for the Deep Learning proje

Robert Krug 3 Feb 06, 2022
The source code for 'Noisy-Labeled NER with Confidence Estimation' accepted by NAACL 2021

Kun Liu*, Yao Fu*, Chuanqi Tan, Mosha Chen, Ningyu Zhang, Songfang Huang, Sheng Gao. Noisy-Labeled NER with Confidence Estimation. NAACL 2021. [arxiv]

30 Nov 12, 2022
Speckle-free Holography with Partially Coherent Light Sources and Camera-in-the-loop Calibration

Speckle-free Holography with Partially Coherent Light Sources and Camera-in-the-loop Calibration Project Page | Paper Yifan Peng*, Suyeon Choi*, Jongh

Stanford Computational Imaging Lab 19 Dec 11, 2022
image scene graph generation benchmark

Scene Graph Benchmark in PyTorch 1.7 This project is based on maskrcnn-benchmark Highlights Upgrad to pytorch 1.7 Multi-GPU training and inference Bat

Microsoft 303 Dec 27, 2022
Code release for the ICML 2021 paper "PixelTransformer: Sample Conditioned Signal Generation".

PixelTransformer Code release for the ICML 2021 paper "PixelTransformer: Sample Conditioned Signal Generation". Project Page Installation Please insta

Shubham Tulsiani 24 Dec 17, 2022
code for Multi-scale Matching Networks for Semantic Correspondence, ICCV

MMNet This repo is the official implementation of ICCV 2021 paper "Multi-scale Matching Networks for Semantic Correspondence.". Pre-requisite conda cr

joey zhao 25 Dec 12, 2022
Sparse-dense operators implementation for Paddle

Sparse-dense operators implementation for Paddle This module implements coo, csc and csr matrix formats and their inter-ops with dense matrices. Feel

北海若 3 Dec 17, 2022
Python program that works as a contact list

Lista de Contatos Programa em Python que funciona como uma lista de contatos. Features Adicionar novo contato Remover contato Atualizar contato Pesqui

Victor B. Lino 3 Dec 16, 2021
My 1st place solution at Kaggle Hotel-ID 2021

1st place solution at Kaggle Hotel-ID My 1st place solution at Kaggle Hotel-ID to Combat Human Trafficking 2021. https://www.kaggle.com/c/hotel-id-202

Kohei Ozaki 18 Aug 19, 2022
Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021) This repository is the official P

Jingyun Liang 159 Dec 30, 2022
Explicable Reward Design for Reinforcement Learning Agents [NeurIPS'21]

Explicable Reward Design for Reinforcement Learning Agents [NeurIPS'21]

3 May 12, 2022
This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search"

InvariantAncestrySearch This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search

Phillip Bredahl Mogensen 0 Feb 02, 2022
A platform to display the carbon neutralization information for researchers, decision-makers, and other participants in the community.

Welcome to Carbon Insight Carbon Insight is a platform aiming to display the carbon neutralization roadmap for researchers, decision-makers, and other

Microsoft 14 Oct 24, 2022
ML-Ensemble – high performance ensemble learning

A Python library for high performance ensemble learning ML-Ensemble combines a Scikit-learn high-level API with a low-level computational graph framew

Sebastian Flennerhag 764 Dec 31, 2022
GeDML is an easy-to-use generalized deep metric learning library

GeDML is an easy-to-use generalized deep metric learning library

Borui Zhang 32 Dec 05, 2022
Create images and texts with the First Order Generative Adversarial Networks

First Order Divergence for training GANs This repository contains code accompanying the paper First Order Generative Advesarial Netoworks The majority

Zalando Research 35 Dec 11, 2021
FinRL­-Meta: A Universe for Data­-Driven Financial Reinforcement Learning. 🔥

FinRL-Meta: A Universe of Market Environments. FinRL-Meta is a universe of market environments for data-driven financial reinforcement learning. Users

AI4Finance Foundation 543 Jan 08, 2023
Use evolutionary algorithms instead of gridsearch in scikit-learn

sklearn-deap Use evolutionary algorithms instead of gridsearch in scikit-learn. This allows you to reduce the time required to find the best parameter

rsteca 709 Jan 03, 2023
某学校选课系统GIF验证码数据集 + Baseline模型 + 上下游相关工具

elective-dataset-2021spring 某学校2021春季选课系统GIF验证码数据集(29338张) + 准确率98.4%的Baseline模型 + 上下游相关工具。 数据集采用 知识共享署名-非商业性使用 4.0 国际许可协议 进行许可。 Baseline模型和上下游相关工具采用

xmcp 27 Sep 17, 2021