CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

Overview

LED2-Net

This is PyTorch implementation of our CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering".

You can visit our project website and upload your own panorama to see the 3D results!

[Project Website] [Paper (arXiv)]

Prerequisite

This repo is primarily based on PyTorch. You can use the follwoing command to intall the dependencies.

pip install -r requirements.txt

Preparing Training Data

Under LED2Net/Dataset, we provide the dataloader of Matterport3D and Realtor360. The annotation formats of the two datasets follows PanoAnnotator. The detailed description of the format is explained in LayoutMP3D.

Under config/, config_mp3d.yaml and config_realtor360.yaml are the configuration file for Matterport3D and Realtor360.

Matterport3D

To train/val on Matterport3D, please modify the two items in config_mp3d.yaml.

dataset_image_path: &dataset_image_path '/path/to/image/location'
dataset_label_path: &dataset_label_path '/path/to/label/location'

The dataset_image_path and dataset_label_path follow the folder structure:

  dataset_image_path/
  |-------17DRP5sb8fy/
          |-------00ebbf3782c64d74aaf7dd39cd561175/
                  |-------color.jpg
          |-------352a92fb1f6d4b71b3aafcc74e196234/
                  |-------color.jpg
          .
          .
  |-------gTV8FGcVJC9/
          .
          .
  dataset_label_path/
  |-------mp3d_train.txt
  |-------mp3d_val.txt
  |-------mp3d_test.txt
  |-------label/
          |-------Z6MFQCViBuw_543e6efcc1e24215b18c4060255a9719_label.json
          |-------yqstnuAEVhm_f2eeae1a36f14f6cb7b934efd9becb4d_label.json
          .
          .
          .

Then run main.py and specify the config file path

python main.py --config config/config_mp3d.yaml --mode train # For training
python main.py --config config/config_mp3d.yaml --mode val # For testing

Realtor360

To train/val on Realtor360, please modify the item in config_realtor360.yaml.

dataset_path: &dataset_path '/path/to/dataset/location'

The dataset_path follows the folder structure:

  dataset_path/
  |-------train.txt
  |-------val.txt
  |-------sun360/
          |-------pano_ajxqvkaaokwnzs/
                  |-------color.png
                  |-------label.json
          .
          .
  |-------istg/
          |-------1/
                  |-------1/
                          |-------color.png
                          |-------label.json
                  |-------2/
                          |-------color.png
                          |-------label.json
                  .
                  .
          .
          .
          
  

Then run main.py and specify the config file path

python main.py --config config/config_realtor360.yaml --mode train # For training
python main.py --config config/config_realtor360.yaml --mode val # For testing

Run Inference

After finishing the training, you can use the following command to run inference on your own data (xxx.jpg or xxx.png).

python run_inference.py --config YOUR_CONFIG --src SRC_FOLDER/ --dst DST_FOLDER --ckpt XXXXX.pkl

This script will predict the layouts of all images (jpg or png) under SRC_FOLDER/ and store the results as json files under DST_FOLDER/.

Pretrained Weights

We provide the pretrained model of Realtor360 in this link.

Currently, we use DuLa-Net's post processing for inference. We will release the version using HorizonNet's post processing later.

Layout Visualization

To visualize the 3D layout, we provide the visualization tool in 360LayoutVisualizer. Please clone it and install the corresponding packages. Then, run the following command

cd 360LayoutVisualizer/
python visualizer.py --img xxxxxx.jpg --json xxxxxx.json

Citation

@misc{wang2021led2net,
      title={LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering}, 
      author={Fu-En Wang and Yu-Hsuan Yeh and Min Sun and Wei-Chen Chiu and Yi-Hsuan Tsai},
      year={2021},
      eprint={2104.00568},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
Fu-En Wang
Hi, I am a member of VSLAB in National Tsing Hua University. You can check my personal website for more research projects (https://fuenwang.ml/).
Fu-En Wang
An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come

An Agnostic Object Detection Framework IceVision is the first agnostic computer vision framework to offer a curated collection with hundreds of high-q

airctic 790 Jan 05, 2023
A Joint Video and Image Encoder for End-to-End Retrieval

Frozen️ in Time ❄️ ️️️️ ⏳ A Joint Video and Image Encoder for End-to-End Retrieval (arXiv) Repository to contain the code, models, data for end-to-end

225 Dec 25, 2022
A bot that plays TFT using OCR. Keeps track of bench, board, items, and plays the user defined team comp.

NOTES: To ensure best results, make sure you are running this on a computer that has decent specs. 1920x1080 fullscreen is required in League, game mu

francis 125 Dec 30, 2022
Use Youdao OCR API to covert your clipboard image to text.

Alfred Clipboard OCR 注:本仓库基于 oott123/alfred-clipboard-ocr 的逻辑用 Python 重写,换用了有道 AI 的 API,准确率更高,有效防止百度导致隐私泄露等问题,并且有道 AI 初始提供的 50 元体验金对于其资费而言个人用户基本可以永久使用

Junlin Liu 6 Sep 19, 2022
Code related to "Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity" paper

DataTuner You have just found the DataTuner. This repository provides tools for fine-tuning language models for a task. See LICENSE.txt for license de

81 Jan 01, 2023
OCR software for recognition of handwritten text

Handwriting OCR The project tries to create software for recognition of a handwritten text from photos (also for Czech language). It uses computer vis

Břetislav Hájek 562 Jan 03, 2023
QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021)

QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021) Yuanming Hu, Jiafeng Liu, Xuanda Yang, Mingkuan Xu, Ye Kuang, Weiwei Xu, Qiang Dai, W

Taichi Developers 119 Dec 02, 2022
Lightning Fast Language Prediction 🚀

whatthelang Lightning Fast Language Prediction 🚀 Dependencies The dependencies can be installed using the requirements.txt file: $ pip install -r req

Indix 152 Oct 16, 2022
A tool to enhance your old/damaged pictures built using python & opencv.

Breathe Life into your Old Pictures Table of Contents About The Project Getting Started Prerequisites Usage Contact Acknowledgments About The Project

Shah Anwaar Khalid 5 Dec 16, 2021
[ICCV, 2021] Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks

Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks This is an official PyTorch code repository of the paper "Cloud Transformers:

Visual Understanding Lab @ Samsung AI Center Moscow 27 Dec 15, 2022
天池2021"全球人工智能技术创新大赛"【赛道一】:医学影像报告异常检测 - 第三名解决方案

天池2021"全球人工智能技术创新大赛"【赛道一】:医学影像报告异常检测 比赛链接 个人博客记录 目录结构 ├── final------------------------------------决赛方案PPT ├── preliminary_contest--------------------

19 Aug 17, 2022
Localization of thoracic abnormalities model based on VinBigData (top 1%)

Repository contains the code for 2nd place solution of VinBigData Chest X-ray Abnormalities Detection competition. The goal of competition was to auto

33 May 24, 2022
Some bits of javascript to transcribe scanned pages using PageXML

nashi (nasḫī) Some bits of javascript to transcribe scanned pages using PageXML. Both ltr and rtl languages are supported. Try it! But wait, there's m

Andreas Büttner 15 Nov 09, 2022
Deskew is a command line tool for deskewing scanned text documents. It uses Hough transform to detect "text lines" in the image. As an output, you get an image rotated so that the lines are horizontal.

Deskew by Marek Mauder https://galfar.vevb.net/deskew https://github.com/galfar/deskew v1.30 2019-06-07 Overview Deskew is a command line tool for des

Marek Mauder 127 Dec 03, 2022
APS 6º Semestre - UNIP (2021)

UNIP - Universidade Paulista Ciência da Computação (CC) DESENVOLVIMENTO DE UM SISTEMA COMPUTACIONAL PARA ANÁLISE E CLASSIFICAÇÃO DE FORMAS Link do git

Eduardo Talarico 5 Mar 09, 2022
Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

Handwritten Line Text Recognition using Deep Learning with Tensorflow Description Use Convolutional Recurrent Neural Network to recognize the Handwrit

sushant097 224 Jan 07, 2023
OpenMMLab Text Detection, Recognition and Understanding Toolbox

Introduction English | 简体中文 MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi

OpenMMLab 3k Jan 07, 2023
A dataset handling library for computer vision datasets in LOST-fromat

A dataset handling library for computer vision datasets in LOST-fromat

8 Dec 15, 2022
Face Anonymizer - FaceAnonApp v1.0

Face Anonymizer - FaceAnonApp v1.0 Blur faces from image and video files in /data/files folder. Contents Repo of the source files for the FaceAnonApp.

6 Apr 18, 2022
a Deep Learning Framework for Text

DeLFT DeLFT (Deep Learning Framework for Text) is a Keras and TensorFlow framework for text processing, focusing on sequence labelling (e.g. named ent

Patrice Lopez 350 Dec 19, 2022