HistoKT: Cross Knowledge Transfer in Computational Pathology

Related tags

Deep LearningHistoKT
Overview

HistoKT: Cross Knowledge Transfer in Computational Pathology

Exciting News! HistoKT has been accepted to ICASSP 2022.

HistoKT: Cross Knowledge Transfer in Computational Pathology,
Ryan Zhang, Jiadai Zhu, Stephen Yang, Mahdi S. Hosseini, Angelo Genovese, Lina Chen, Corwyn Rowsell, Savvas Damaskinos, Sonal Varma, Konstantinos N. Plataniotis
Accepted in 2022 IEEE International Conference on Acourstics, Speech, and Signal Processing (ICASSP2022)

Overview

In computational pathology, the lack of well-annotated datasets obstructs the application of deep learning techniques. Since pathologist time is expensive, dataset curation is intrinsically difficult. Thus, many CPath workflows involve transferring learned knowledge between various image domains through transfer learning. Currently, most transfer learning research follows a model-centric approach, tuning network parameters to improve transfer results over few datasets. In this paper, we take a data-centric approach to the transfer learning problem and examine the existence of generalizable knowledge between histopathological datasets. First, we create a standardization workflow for aggregating existing histopathological data. We then measure inter-domain knowledge by training ResNet18 models across multiple histopathological datasets, and cross-transferring between them to determine the quantity and quality of innate shared knowledge. Additionally, we use weight distillation to share knowledge between models without additional training. We find that hard to learn, multi-class datasets benefit most from pretraining, and a two stage learning framework incorporating a large source domain such as ImageNet allows for better utilization of smaller datasets. Furthermore, we find that weight distillation enables models trained on purely histopathological features to outperform models using external natural image data.

Results

We report our transfer learning using ResNet18 results accross various datasets, with two initialization methods (random and ImageNet initialization). Each item in the matrix represents the Top-1 test accuracy of a ResNet18 model trained on the source dataset and deep-tuned on the target dataset. Items are highlighted in a colour gradient from deep red to deep green, where green represents significant accuracy improvement after tuning, and red represents accuracy decline after tuning.

No Pretraining

ImageNet Initialization

Table of Contents

Getting Started

Dependencies

  • Requirements are specified in requirements.txt
argon2-cffi==20.1.0
async-generator==1.10
attrs==21.2.0
backcall==0.2.0
bleach==3.3.0
cffi==1.14.5
colorama==0.4.4
cycler==0.10.0
decorator==4.4.2
defusedxml==0.7.1
entrypoints==0.3
et-xmlfile==1.1.0
h5py==3.2.1
imageio==2.9.0
ipykernel==5.5.4
ipython==7.23.1
ipython-genutils==0.2.0
ipywidgets==7.6.3
jedi==0.18.0
Jinja2==3.0.0
joblib==1.0.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
kiwisolver==1.3.1
MarkupSafe==2.0.0
matplotlib==3.4.2
matplotlib-inline==0.1.2
mistune==0.8.4
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
networkx==2.5.1
notebook==6.3.0
numpy==1.20.3
openpyxl==3.0.7
packaging==20.9
pandas==1.2.4
pandocfilters==1.4.3
parso==0.8.2
pickleshare==0.7.5
Pillow==8.2.0
prometheus-client==0.10.1
prompt-toolkit==3.0.18
pyaml==20.4.0
pycparser==2.20
Pygments==2.9.0
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2021.1
PyWavelets==1.1.1
pywin32==300
pywinpty==0.5.7
PyYAML==5.4.1
pyzmq==22.0.3
qtconsole==5.1.0
QtPy==1.9.0
scikit-image==0.18.1
scikit-learn==0.24.2
scipy==1.6.3
Send2Trash==1.5.0
six==1.16.0
sklearn==0.0
terminado==0.9.5
testpath==0.4.4
threadpoolctl==2.1.0
tifffile==2021.4.8
torch==1.8.1+cu102
torchaudio==0.8.1
torchvision==0.9.1+cu102
tornado==6.1
traitlets==5.0.5
typing-extensions==3.10.0.0
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1

Running the Code

This codebase was created in collaboration with the RMSGD repository. As such, much of the training pipeline is shared.

Downloading datasets

All available datasets can be found on their respective websites. Some datasets, such as ADP, are available by request.

A list of all datasets used in this paper can be found below:

Preprocessing and Training

To prepare datasets for training, please use the functions found in dataset_processing\standardize_datasets.py after downloading all the datasets and placing them all in one folder.

cd HistoKT/dataset_processing
python standardize_datasets.py

A standardized version of each dataset will be created in the dataset folder.

To run the code for training, use the src/adas/train.py file:

cd HistoKT
python src/adas/train.py --config CONFIG --data DATA_FOLDER

Options for Training

--config CONFIG       Set configuration file path: Default = 'configAdas.yaml'
--data DATA           Set data directory path: Default = '.adas-data'
--output OUTPUT       Set output directory path: Default = '.adas-output'
--checkpoint CHECKPOINT
                    Set checkpoint directory path: Default = '.adas-checkpoint'
--resume RESUME       Set checkpoint resume path: Default = None
--pretrained_model PRETRAINED_MODEL
                    Set checkpoint pretrained model path: Default = None
--freeze_encoder FREEZE_ENCODER
                    Set if to freeze encoder for post training: Default = True
--root ROOT           Set root path of project that parents all others: Default = '.'
--save-freq SAVE_FREQ
                    Checkpoint epoch save frequency: Default = 25
--cpu                 Flag: CPU bound training: Default = False
--gpu GPU             GPU id to use: Default = 0
--multiprocessing-distributed
                    Use multi-processing distributed training to launch N processes per node, which has N GPUs. This is the fastest way to use PyTorch for either   
                    single node or multi node data parallel training: Default = False
--dist-url DIST_URL   url used to set up distributed training:Default = 'tcp://127.0.0.1:23456'
--dist-backend DIST_BACKEND
                    distributed backend: Default = 'nccl'
--world-size WORLD_SIZE
                    Number of nodes for distributed training: Default = -1
--rank RANK           Node rank for distributed training: Default = -1
--color_aug COLOR_AUG
                    override config color augmentation, can also choose "no_aug"
--norm_vals NORM_VALS
                    override normalization values, use dataset string. e.g. "BACH_transformed"

Training Output

All training output will be saved to the OUTPUT_PATH location. After a full experiment, results will be recorded in the following format:

  • OUTPUT
    • Timestamped xlsx sheet with the record of train and validation (notated as test) acc, loss, and rank metrics for each layer in the network (refer to AdaS)
  • CHECKPOINT
    • checkpoint dictionaries with a snapshot of the model's parameters at a given epoch.

Code Organization

Configs

We provide sample configuration files for ResNet18 over all used datasets in configs\NewPretrainingConfigs

These configs were used for training the model on each dataset from random initialization.

All available options can be found in the config files.

Visualization

We provide sample code to plot training curves in Plots

We provide sample code on using the statistical method t-SNE to visualize the high-dimensional features in T-sne.

We provide sample code on using the visual explanation algorithm Grad-CAM heat-maps in gradCAM.

Version History

  • 0.1
    • Initial Release
Owner
Mahdi S. Hosseini
Assistant Professor in ECE Department at University of New Brunswick. My research interests cover broad topics in Machine Learning and Computer Vision problems
Mahdi S. Hosseini
Efficient Online Bayesian Inference for Neural Bandits

Efficient Online Bayesian Inference for Neural Bandits By Gerardo Durán-Martín, Aleyna Kara, and Kevin Murphy AISTATS 2022.

Probabilistic machine learning 49 Dec 27, 2022
Code for AutoNL on ImageNet (CVPR2020)

Neural Architecture Search for Lightweight Non-Local Networks This repository contains the code for CVPR 2020 paper Neural Architecture Search for Lig

Yingwei Li 104 Aug 31, 2022
A Temporal Extension Library for PyTorch Geometric

Documentation | External Resources | Datasets PyTorch Geometric Temporal is a temporal (dynamic) extension library for PyTorch Geometric. The library

Benedek Rozemberczki 1.9k Jan 07, 2023
Malware Env for OpenAI Gym

Malware Env for OpenAI Gym Citing If you use this code in a publication please cite the following paper: Hyrum S. Anderson, Anant Kharkar, Bobby Fila

ENDGAME 563 Dec 29, 2022
Guided Internet-delivered Cognitive Behavioral Therapy Adherence Forecasting

Guided Internet-delivered Cognitive Behavioral Therapy Adherence Forecasting #Dataset The folder "Dataset" contains the dataset use in this work and m

0 Jan 08, 2022
Official implementation of UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

UTNet (Accepted at MICCAI 2021) Official implementation of UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation Introduction Transf

110 Jan 01, 2023
Static-test - A playground to play with ideas related to testing the comparability of the code

Static test playground ⚠️ The code is just an experiment. Compiles and runs on U

Igor Bogoslavskyi 4 Feb 18, 2022
IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID,

Intermediate Domain Module (IDM) This repository is the official implementation for IDM: An Intermediate Domain Module for Domain Adaptive Person Re-I

Yongxing Dai 87 Nov 22, 2022
Código de um painel de auto atendimento feito em Python.

Painel de Auto-Atendimento O intuito desse projeto era fazer em Python um programa que simulasse um painel de auto atendimento, no maior estilo Mac Do

Calebe Alves Evangelista 2 Nov 09, 2022
Dynamic View Synthesis from Dynamic Monocular Video

Dynamic View Synthesis from Dynamic Monocular Video Project Website | Video | Paper Dynamic View Synthesis from Dynamic Monocular Video Chen Gao, Ayus

Chen Gao 139 Dec 28, 2022
Semi-supervised Learning for Sentiment Analysis

Neural-Semi-supervised-Learning-for-Text-Classification-Under-Large-Scale-Pretraining Code, models and Datasets for《Neural Semi-supervised Learning fo

47 Jan 01, 2023
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

Facebook Research 338 Dec 29, 2022
Implementation of Online Label Smoothing in PyTorch

Online Label Smoothing Pytorch implementation of Online Label Smoothing (OLS) presented in Delving Deep into Label Smoothing. Introduction As the abst

83 Dec 14, 2022
Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of Tensorflow for multi-worker distributed computing

Notice: Support for Python 3.6 will be dropped in v.0.2.1, please plan accordingly! Efficient and Scalable Physics-Informed Deep Learning Collocation-

tensordiffeq 74 Dec 09, 2022
A PyTorch implementation of "Graph Wavelet Neural Network" (ICLR 2019)

Graph Wavelet Neural Network ⠀⠀ A PyTorch implementation of Graph Wavelet Neural Network (ICLR 2019). Abstract We present graph wavelet neural network

Benedek Rozemberczki 490 Dec 16, 2022
Complex Answer Generation For Conversational Search Systems.

Complex Answer Generation For Conversational Search Systems. Code for Does Structure Matter? Leveraging Data-to-Text Generation for Answering Complex

Hanane Djeddal 0 Dec 06, 2021
This is an official implementation for the WTW Dataset in "Parsing Table Structures in the Wild " on table detection and table structure recognition.

WTW-Dataset This is an official implementation for the WTW Dataset in "Parsing Table Structures in the Wild " on ICCV 2021. Here, you can download the

109 Dec 29, 2022
MNIST, but with Bezier curves instead of pixels

bezier-mnist This is a work-in-progress vector version of the MNIST dataset. Samples Here are some samples from the training set. Note that, while the

Alex Nichol 15 Jan 16, 2022
🕹️ Official Implementation of Conditional Motion In-betweening (CMIB) 🏃

Conditional Motion In-Betweening (CMIB) Official implementation of paper: Conditional Motion In-betweeening. Paper(arXiv) | Project Page | YouTube in-

Jihoon Kim 81 Dec 22, 2022
ThunderSVM: A Fast SVM Library on GPUs and CPUs

What's new We have recently released ThunderGBM, a fast GBDT and Random Forest library on GPUs. add scikit-learn interface, see here Overview The miss

Xtra Computing Group 1.4k Dec 22, 2022