A transformer-based method for Healthcare Image Captioning in Vietnamese

Last update: May 05, 2022

Overview

vieCap4H Challenge 2021: A transformer-based method for Healthcare Image Captioning in Vietnamese

This GitHub repository contains our solution for the vieCap4H Challenge 2021. We use grid features as the visual representation and pre-train a BERT-based language model, initialized from a pre-trained PhoBERT model, to obtain the language representation. We also identify a suitable training schedule that uses self-critical sequence training (SCST) to achieve the best results. In our experiments, we achieve an average BLEU of 30.3% on the public-test round and 28.9% on the private-test round, ranking 3rd and 4th, respectively.
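As a rough illustration of the SCST idea, the sketch below computes a self-critical loss for a batch of sampled captions in plain Python. It assumes the commonly used formulation in which the reward of a baseline caption (here assumed to come from greedy decoding) is subtracted from the reward of the sampled caption. The function name `scst_loss` and its arguments are hypothetical and not taken from this repository, which may use a different baseline or reward.

```python
from typing import Sequence


def scst_loss(
    sample_logprobs: Sequence[float],
    sample_rewards: Sequence[float],
    baseline_rewards: Sequence[float],
) -> float:
    """Mean self-critical loss over a batch (hypothetical helper).

    sample_logprobs  -- summed log-probability of each sampled caption
    sample_rewards   -- reward (e.g. a caption metric) of each sampled caption
    baseline_rewards -- reward of the baseline caption for the same image
                        (assumed here to come from greedy decoding)
    """
    if not (len(sample_logprobs) == len(sample_rewards) == len(baseline_rewards)):
        raise ValueError("all inputs must have the same length")
    if not sample_logprobs:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for logprob, reward, baseline in zip(
        sample_logprobs, sample_rewards, baseline_rewards
    ):
        # Positive advantage: the sample beat the baseline, so minimizing
        # the loss raises its log-probability; negative advantage lowers it.
        advantage = reward - baseline
        total += -advantage * logprob
    return total / len(sample_logprobs)


# Two sampled captions: the first beats its baseline, the second does not.
loss = scst_loss([-2.0, -1.0], [0.5, 0.2], [0.3, 0.4])
```

In a real training loop the log-probabilities would be differentiable tensors and the rewards would be treated as constants; the arithmetic stays the same.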

Figure 1. An overview of our solution based on RSTNet

1. Data preparation

The grid features of vieCap4H can be downloaded via the links below:

The dataset can be downloaded at https://aihub.vn/competitions/40. Annotations must be converted to COCO format. We have already converted them, and the converted file is available at:

viecap4h-public-train.json.
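For readers who want to convert their own annotations, the following is a minimal sketch of a conversion to the COCO captions layout (`images` and `annotations` lists). The input layout is an assumption: a JSON list of records with an `id` key (image file name) and a `captions` key (caption text). The function names `to_coco_captions` and `convert_file` are hypothetical, and this is not necessarily how the provided file was produced.

```python
import json
from typing import Any, Dict, List


def to_coco_captions(records: List[Dict[str, str]]) -> Dict[str, List[Dict[str, Any]]]:
    """Convert caption records to a COCO-style captions dictionary.

    Assumed input: [{"id": "<image file name>", "captions": "<caption>"}, ...]
    """
    images: List[Dict[str, Any]] = []
    annotations: List[Dict[str, Any]] = []
    image_ids: Dict[str, int] = {}  # file name -> integer image id

    for record in records:
        file_name = record["id"]
        # Register each image once, even if it has several captions.
        if file_name not in image_ids:
            image_ids[file_name] = len(image_ids) + 1
            images.append({"id": image_ids[file_name], "file_name": file_name})
        annotations.append(
            {
                "id": len(annotations) + 1,
                "image_id": image_ids[file_name],
                "caption": record["captions"],
            }
        )
    return {"images": images, "annotations": annotations}


def convert_file(src_path: str, dst_path: str) -> None:
    """Read raw records from src_path and write COCO-style JSON to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        records = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Vietnamese diacritics readable in the file.
        json.dump(to_coco_captions(records), f, ensure_ascii=False)
```

Integer image ids are assigned in order of first appearance, so several captions for the same image share one `image_id`.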

2. Training

Pre-train the BERT-based language model, starting from the pre-trained PhoBERT model:

python train_language.py \
--img_path <images path> \
--features_path <features path> \
--annotation_folder <annotations folder> \
--batch_size 40

The weights of the BERT-based model should appear in the saved_language_models folder.

Then train the Transformer model with the command below:

python train_transformer.py \
--img_path <images path> \
--features_path <features path> \
--annotation_folder <annotations folder> \
--batch_size 40

The weights of the Transformer-based model should appear in the saved_transformer_rstnet_models folder.

Here, <images path> is the data folder, <features path> is the path to the grid features folder, and <annotations folder> is the path to the folder that contains the file viecap4h-public-train.json.
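The two training stages can also be chained from a small Python driver. The sketch below only assembles the same command lines shown above and runs them in order. The helper names `build_command` and `run_training` are hypothetical, and the script assumes it is launched from the repository root, where train_language.py and train_transformer.py live.

```python
import subprocess
import sys
from typing import List


def build_command(
    script: str,
    img_path: str,
    features_path: str,
    annotation_folder: str,
    batch_size: int = 40,
) -> List[str]:
    """Assemble the argument list for one training script."""
    return [
        sys.executable, script,
        "--img_path", img_path,
        "--features_path", features_path,
        "--annotation_folder", annotation_folder,
        "--batch_size", str(batch_size),
    ]


def run_training(
    img_path: str,
    features_path: str,
    annotation_folder: str,
    batch_size: int = 40,
) -> None:
    """Run language-model pre-training, then Transformer training."""
    for script in ("train_language.py", "train_transformer.py"):
        cmd = build_command(
            script, img_path, features_path, annotation_folder, batch_size
        )
        # check=True stops the pipeline if the first stage fails.
        subprocess.run(cmd, check=True)
```

Calling `run_training("<images path>", "<features path>", "<annotations folder>")` with real paths would launch both stages one after the other.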

3. Inference

The results can be obtained with the command below:

python test_viecap.py

4. Pre-trained model

To reproduce our leaderboard results, two pre-trained models, one for the BERT-based model and one for the Transformer model, can be downloaded via the links below:

Updating...


Owner

Doanh B C
