Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

This repository contains the code for the ssbaseline model. We also provide OCR results, features, and models. The code is built on top of M4C, whose documentation has more detailed information.

Citation

If you use ssbaseline in your work, please cite:

@article{zhu2020simple,
  title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
  author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
  journal={arXiv preprint arXiv:2012.05153},
  year={2020}
}

Installation

First, clone and install the repo:

git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop

Getting Data

We provide SBD-Trans OCR results for the TextVQA and ST-VQA datasets. The corresponding OCR Faster R-CNN features and OCR Recog-CNN features are also released.

Datasets	ImDBs	Object Faster R-CNN Features	OCR Faster R-CNN Features	OCR Recog-CNN Features
TextVQA	TextVQA ImDB	Open Images	TextVQA SBD-Trans OCRs	TextVQA SBD-Trans OCRs
ST-VQA	ST-VQA ImDB	ST-VQA Objects	ST-VQA SBD-Trans OCRs	ST-VQA SBD-Trans OCRs

Pretrained Models

We release the following pretrained model for ssbaseline on TextVQA: ssbaseline trained with SBD-Trans OCRs and with ST-VQA as additional data (our best model).

Datasets	Config Files (under `configs/vqa/`)	Pretrained Models	Metrics	Notes
TextVQA (`m4c_textvqa`)	`m4c_textvqa/m4c_with_stvqa.yml`	`ssbaseline_with_stvqa`	val accuracy - 45.53%; test accuracy - 45.66%	SBD-Trans OCRs; ST-VQA as additional data
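
The table reports accuracy without defining it. As a rough illustration, the sketch below assumes the common VQA-style soft accuracy, where a prediction scores min(1, matches / 3) against the human answers for a question. The function names and the whitespace/lowercase normalization are hypothetical and are not taken from this repository; the official evaluator applies more answer normalization and averages over annotator subsets, so its numbers can differ slightly.

```python
from collections import Counter
from typing import List, Sequence


def normalize(answer: str) -> str:
    # Assumption: lowercase and collapse whitespace only.
    # The official evaluator normalizes answers more aggressively.
    return " ".join(answer.lower().split())


def vqa_soft_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    # Simplified VQA soft accuracy: full credit once at least
    # three annotators gave the predicted answer.
    counts = Counter(normalize(a) for a in human_answers)
    return min(1.0, counts[normalize(prediction)] / 3.0)


def mean_accuracy(predictions: Sequence[str],
                  references: Sequence[Sequence[str]]) -> float:
    # Average per-question score, reported as a percentage.
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    scores: List[float] = [
        vqa_soft_accuracy(p, r) for p, r in zip(predictions, references)
    ]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    refs = [["coca cola"] * 2 + ["pepsi"] * 8, ["stop"] * 10]
    print(mean_accuracy(["Coca Cola", "stop"], refs))
```

Under this convention a prediction that only two of ten annotators gave earns partial credit (2/3) rather than zero, which is why reported accuracies are not simple exact-match rates.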

Training and Evaluation

Please follow the M4C README for the training and evaluation of the M4C model on each dataset.

Paper: Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps (AAAI 2021)
