Toward Model Interpretability in Medical NLP

LING380: Topics in Computational Linguistics, Final Project

James Cross ([email protected]) and Daniel Kim ([email protected]), December 2021

Code Organization

data: contains the medical report data [LINK TO THAT REPO] used in model fine-tuning and analysis, clinical stop words, and the accuracy and entropy metrics saved during evaluation

models: checkpoints of the best-performing BERT and BioBERT models after hyperparameter optimization

notebooks:

model_training.ipynb: code to train and fine-tune BERT and BioBERT

model_evaluation.ipynb: code to run various model evaluations, visualize word importances, perform post-training clinical stopword masking, and other analyses

scripts: the same functionality as the notebooks, as executable Python scripts and functions
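As an illustration of what clinical stop-word masking can look like, here is a minimal sketch. It is not the code in model_evaluation.ipynb: the function name mask_stopwords, the regex tokenization, and the two example stop words are all assumptions made for this example. The real list lives in the data directory, and the notebook may mask at the tokenizer level instead.

```python
import re


def mask_stopwords(text, stopwords, mask_token="[MASK]"):
    """Replace each whole-word stop word in `text` with `mask_token`.

    Hypothetical helper: splits on words and punctuation, compares
    case-insensitively, and rejoins tokens with single spaces.
    """
    stop = {w.lower() for w in stopwords}
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return " ".join(mask_token if t.lower() in stop else t for t in tokens)


# Hypothetical stop words, for illustration only.
clinical_stopwords = ["patient", "history"]
masked = mask_stopwords("Patient has a history of asthma.", clinical_stopwords)
print(masked)
```

"[MASK]" is the mask token used by BERT-style tokenizers, so the masked string can be passed to the same tokenizer as the unmasked report.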

Dependencies

All packages needed to run the code are available in the default Google Colab environment (see the Colab documentation for the full list), with two exceptions: Hugging Face Transformers (transformers), used for loading transformer models, and Captum (captum), which provides a variety of model interpretation tools.
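A quick way to confirm that the two extra packages are present before opening the notebooks is sketched below. The helper name missing_packages is an assumption for this example; the package names transformers and captum are the ones listed above.

```python
import importlib.util

# The two packages that are not part of the default Colab environment.
REQUIRED = ["transformers", "captum"]


def missing_packages(names):
    """Return the subset of `names` that cannot be imported here."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_packages(REQUIRED)
if missing:
    # In a notebook cell, prefix the command with "!" to run it.
    print("Install with: pip install " + " ".join(missing))
else:
    print("All extra dependencies are available.")
```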

How to run code

There are two options for running the code: on Google Colab or locally on your machine.

Option 1) Google Colab

Model training notebook: [https://colab.research.google.com/drive/1uPIi-OVchs_8A-SNcQtLfwelr0ccsz19?usp=sharing]

Model evaluation/analysis notebook: [https://colab.research.google.com/drive/1Hfy58JvyPbx55lKKhQAzzrhJIbN_Io0j?usp=sharing]

Option 2) Local Machine

Notebooks: You can run model_training.ipynb or model_evaluation.ipynb as-is, changing directory paths where needed.
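One way to keep the path changes in a single place is to derive every directory from one project root, as in the sketch below. The function name project_paths and the "." root are assumptions for this example; the data and models directory names come from the Code Organization section, and the variable names used inside the notebooks may differ.

```python
from pathlib import Path


def project_paths(root):
    """Build the data and model directories from a single project root."""
    root = Path(root).expanduser()
    return {"data": root / "data", "models": root / "models"}


# Assumed root: the directory the repository was cloned into.
paths = project_paths(".")
for name, path in paths.items():
    status = "found" if path.is_dir() else "missing"
    print(f"{name}: {path} ({status})")
```

Only the argument to project_paths then needs to change between Colab and a local machine.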
