Detectron2 for Document Layout Analysis

Last update: Nov 21, 2022

Overview

Detectron2 trained on PubLayNet dataset

This repo contains the training configurations, code, and models trained on the PubLayNet dataset using the Detectron2 implementation.
PubLayNet is a very large dataset for document layout analysis (document segmentation). It can be used to train semantic segmentation and object detection models.

NOTE

Models are trained on a portion of the dataset (train-0.zip, train-1.zip, train-2.zip, train-3.zip)
Trained on a total of 191,832 images
Models are evaluated on dev.zip (~11,000 images)
The backbone is pretrained on the COCO dataset; the model is then trained from scratch on the PubLayNet dataset
Trained using an Nvidia GTX 1080Ti 11GB
Trained on Windows 10

Steps to test pretrained models locally (or jump to the next section for Docker deployment)

Install the latest Detectron2 from https://github.com/facebookresearch/detectron2
Copy config files (DLA_*) from this repo to the installed Detectron2
Download the relevant model from the Benchmarking section. If you downloaded the model using wget, refer to https://github.com/hpanwar08/detectron2/issues/22
Add the code below to demo/demo.py, in main, to get confidence scores along with label names

from detectron2.data import MetadataCatalog
MetadataCatalog.get("dla_val").thing_classes = ['text', 'title', 'list', 'table', 'figure']

Then run the command below to predict on a single image (change the config file to match the model)

python demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input "<path to image.jpg>" --output <path to save the predicted image> --confidence-threshold 0.5 --opts MODEL.WEIGHTS <path to model_final_trimmed.pth> MODEL.DEVICE cpu
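When predicting on many images or switching between models, it can help to assemble the command above programmatically. The following is a minimal sketch that only uses the flags shown in that command; the function name `build_demo_command` and the example paths are hypothetical and not part of this repo.

```python
import shlex


def build_demo_command(config_file, image_path, output_path, weights_path,
                       confidence_threshold=0.5, device="cpu", python="python"):
    """Return the demo/demo.py invocation as an argument list."""
    return [
        python, "demo/demo.py",
        "--config-file", config_file,
        "--input", image_path,
        "--output", output_path,
        "--confidence-threshold", str(confidence_threshold),
        # Everything after --opts is a flat list of KEY VALUE config overrides.
        "--opts",
        "MODEL.WEIGHTS", weights_path,
        "MODEL.DEVICE", device,
    ]


# Hypothetical paths, for illustration only.
cmd = build_demo_command(
    "configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml",
    "samples/page 1.jpg",
    "out/page_1_pred.jpg",
    "models/model_final_trimmed.pth",
)

# shlex.join quotes arguments that contain spaces, so the printed line
# can be pasted into a POSIX shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the Detectron2 checkout would run the prediction without any shell quoting issues.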

Docker Deployment

For a local Docker deployment for testing, use Docker DLA.

Benchmarking

Architecture	No. images	AP	AP50	AP75	AP Small	AP Medium	AP Large	Model size full	Model size trimmed
MaskRCNN Resnext101_32x8d FPN 3X	191,832	90.574	97.704	95.555	39.904	76.350	95.165	816M	410M
MaskRCNN Resnet101 FPN 3X	191,832	90.335	96.900	94.609	36.588	73.672	94.533	480M	240M
MaskRCNN Resnet50 FPN 3X	191,832	87.219	96.949	94.385	38.164	72.292	94.081		168M
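The numbers in the table above can be used to pick a model for a given size constraint. The sketch below copies the AP, AP Small, and size columns from the table and selects the highest-AP model whose trimmed checkpoint fits a budget. It assumes the "M" suffix means megabytes, which the table does not state; the names `Benchmark`, `best_within_budget`, and `trimmed_fraction` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Benchmark:
    architecture: str
    ap: float
    ap_small: float
    full_size_m: Optional[int]  # None where the table leaves the cell empty
    trimmed_size_m: int


# Values copied from the benchmarking table.
BENCHMARKS = (
    Benchmark("MaskRCNN Resnext101_32x8d FPN 3X", 90.574, 39.904, 816, 410),
    Benchmark("MaskRCNN Resnet101 FPN 3X", 90.335, 36.588, 480, 240),
    Benchmark("MaskRCNN Resnet50 FPN 3X", 87.219, 38.164, None, 168),
)


def best_within_budget(rows: Sequence[Benchmark],
                       max_trimmed_size_m: int) -> Optional[Benchmark]:
    """Return the highest-AP model whose trimmed checkpoint fits the budget."""
    candidates = [r for r in rows if r.trimmed_size_m <= max_trimmed_size_m]
    return max(candidates, key=lambda r: r.ap, default=None)


def trimmed_fraction(row: Benchmark) -> Optional[float]:
    """Trimmed size as a fraction of the full size, if the full size is known."""
    if row.full_size_m is None:
        return None
    return row.trimmed_size_m / row.full_size_m


choice = best_within_budget(BENCHMARKS, max_trimmed_size_m=300)
print(choice.architecture if choice else "no model fits")
for row in BENCHMARKS:
    print(row.architecture, trimmed_fraction(row))
```

In the two rows where both sizes are reported, the trimmed checkpoint is about half the size of the full one.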

Configuration used for training

Architecture	Config file	Training Script
MaskRCNN Resnext101_32x8d FPN 3X	configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml	./tools/train_net_dla.py
MaskRCNN Resnet101 FPN 3X	configs/DLA_mask_rcnn_R_101_FPN_3x.yaml	./tools/train_net_dla.py
MaskRCNN Resnet50 FPN 3X	configs/DLA_mask_rcnn_R_50_FPN_3x.yaml	./tools/train_net_dla.py
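As a rough illustration of how the config files and the training script in the table fit together, the sketch below builds a training command for one of the three architectures. It assumes that `tools/train_net_dla.py` accepts the same `--config-file` and `--num-gpus` flags as upstream Detectron2's `tools/train_net.py`, which this README does not confirm. The short keys (`X101`, `R101`, `R50`) and the function name are hypothetical.

```python
# Config paths and training script copied from the table above.
CONFIGS = {
    "X101": "configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml",
    "R101": "configs/DLA_mask_rcnn_R_101_FPN_3x.yaml",
    "R50": "configs/DLA_mask_rcnn_R_50_FPN_3x.yaml",
}
TRAIN_SCRIPT = "./tools/train_net_dla.py"


def build_train_command(model_key, num_gpus=1, python="python"):
    """Return the training invocation as an argument list.

    Assumption: the script takes --config-file and --num-gpus like
    upstream tools/train_net.py.
    """
    try:
        config = CONFIGS[model_key]
    except KeyError:
        raise ValueError(
            f"unknown model {model_key!r}; choose from {sorted(CONFIGS)}"
        ) from None
    return [python, TRAIN_SCRIPT,
            "--config-file", config,
            "--num-gpus", str(num_gpus)]


print(" ".join(build_train_command("R50")))
```

Keeping the mapping in one place avoids pairing a config file with weights from a different architecture.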

Some helper code and cli commands

Add the code below to demo/demo.py to get confidence scores along with label names

from detectron2.data import MetadataCatalog
MetadataCatalog.get("dla_val").thing_classes = ['text', 'title', 'list', 'table', 'figure']

Then run the command below to predict on a single image

python demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input "<path to image.jpg>" --output <path to save the predicted image> --confidence-threshold 0.5 --opts MODEL.WEIGHTS <path to model_final_trimmed.pth> MODEL.DEVICE cpu
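To show what registering `thing_classes` makes possible, here is a simplified, dependency-free sketch of how predicted class indices and scores can be turned into "label confidence" strings. It is not Detectron2's own code: it assumes class indices are zero-based and follow the order of the `thing_classes` list above, and it mimics the `--confidence-threshold` option by keeping only scores above the threshold. The function name `format_labels` is hypothetical.

```python
from typing import List, Sequence

# Same class names as registered for "dla_val" above.
THING_CLASSES = ['text', 'title', 'list', 'table', 'figure']


def format_labels(class_ids: Sequence[int],
                  scores: Sequence[float],
                  thing_classes: Sequence[str] = THING_CLASSES,
                  threshold: float = 0.5) -> List[str]:
    """Pair each class index with its name and confidence, dropping low scores."""
    if len(class_ids) != len(scores):
        raise ValueError("class_ids and scores must have the same length")
    labels = []
    for class_id, score in zip(class_ids, scores):
        if score <= threshold:
            continue  # below or at the confidence threshold, so skipped
        labels.append(f"{thing_classes[class_id]} {score * 100:.0f}%")
    return labels


# Made-up predictions for illustration: a text block, a low-confidence
# table, and a figure.
print(format_labels([0, 3, 4], [0.98, 0.42, 0.87]))
```

Without the class names, the same predictions could only be reported by numeric index.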

TODOs ⏰

Train MaskRCNN resnet50

Sample results from detectron2

Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.

What's New

It is powered by the PyTorch deep learning framework.
Includes more features such as panoptic segmentation, densepose, Cascade R-CNN, rotated bounding boxes, etc.
Can be used as a library to support different projects on top of it. We'll open source more research projects in this way.
It trains much faster.

See our blog post to see more demos and learn about detectron2.

Installation

See INSTALL.md.

Quick Start

See GETTING_STARTED.md, or the Colab Notebook.

Learn more at our documentation. And see projects/ for some projects that are built on top of detectron2.

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the Detectron2 Model Zoo.

License

Detectron2 is released under the Apache 2.0 license.

Citing Detectron2

If you use Detectron2 in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}

Owner

Himanshu
