ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data

Last update: Jan 05, 2023

ARKitScenes

This repo accompanies the research paper "ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data". It contains the data, scripts to visualize and process assets, and the training code described in the paper.

Paper

ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data

If you use this data or source code, please cite:

@inproceedings{
dehghan2021arkitscenes,
title={{ARK}itScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile {RGB}-D Data},
author={Gilad Baruch and Zhuoyuan Chen and Afshin Dehghan and Tal Dimry and Yuri Feigin and Peter Fu and Thomas Gebauer and Brandon Joffe and Daniel Kurz and Arik Schwartz and Elad Shulman},
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
year={2021},
url={https://openreview.net/forum?id=tjZjv_qh_CE}
}

Overview

ARKitScenes is not only the first RGB-D dataset captured with a now widely available depth sensor, but also the largest indoor scene understanding dataset ever collected. In addition to the raw and processed data, ARKitScenes includes high-resolution depth maps captured using a stationary laser scanner, as well as manually labeled 3D oriented bounding boxes for a large taxonomy of furniture. We further provide helper scripts for two downstream tasks: 3D object detection and RGB-D guided upsampling. We hope that our dataset can help push the boundaries of existing state-of-the-art methods and introduce new challenges that better represent real-world scenarios.

Key features

• ARKitScenes is the first RGB-D dataset captured with the widely available Apple LiDAR scanner. Along with the raw data, we provide the camera pose and surface reconstruction for each scene.

• ARKitScenes is the largest indoor 3D dataset, consisting of 5,047 captures of 1,661 unique scenes.

• We provide high-quality ground truth of (a) registered RGB-D frames and (b) oriented bounding boxes of room-defining objects.
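
To make the term "oriented bounding box" concrete, here is a minimal sketch of how the eight corners of such a box can be computed from a center, a size, and a rotation. The function name `obb_corners` and its parameter layout are assumptions made for illustration; they are not the annotation format or API of this repo, which is described in the data documentation.

```python
from itertools import product


def obb_corners(center, size, rotation):
    """Return the 8 corners of an oriented 3D box (hypothetical helper).

    center:   (x, y, z) of the box center in world coordinates
    size:     full edge lengths along the box's three local axes
    rotation: 3x3 matrix (list of rows) whose columns are the box's
              local axes expressed in world coordinates
    """
    half = [s / 2.0 for s in size]
    corners = []
    # Enumerate every +/- combination of the half extents.
    for signs in product((-1.0, 1.0), repeat=3):
        local = [sign * h for sign, h in zip(signs, half)]
        # world = center + rotation @ local
        world = tuple(
            center[i] + sum(rotation[i][j] * local[j] for j in range(3))
            for i in range(3)
        )
        corners.append(world)
    return corners
```

With an identity rotation the result is an axis-aligned box; any other rotation matrix turns the same size and center into an oriented one.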

Below is an overview of RGB-D datasets and their ground truth assets compared with ARKitScenes. HR and LR represent High Resolution and Low Resolution respectively, and are available for a subset of 2,257 captures of 841 unique scenes.

Data collection

The figure below shows (a) an illustration of the iPad Pro scanning setup, (b) the mesh overlay used to assist data collection with the iPad Pro, and (c) an example of one of the scan patterns captured with the iPad Pro. The red markers show the chosen locations of the stationary laser scanner in that room.

Data download

To download the data, please follow the data documentation.

Tasks

Here we provide the two tasks mentioned in our paper, namely, 3D Object Detection (3DOD) and depth upsampling.

3DOD

Depth upsampling
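
As a point of reference for what depth upsampling has to improve on, the sketch below enlarges a low-resolution depth map by plain nearest-neighbor replication. It ignores the RGB image entirely, so it is not the RGB-D guided method from the paper and not code from this repo; the function name `upsample_nearest` and the list-of-rows depth format are assumptions made for illustration.

```python
def upsample_nearest(depth, factor):
    """Nearest-neighbor upsampling of a 2D depth map (hypothetical baseline).

    depth:  list of rows, each a list of depth values
    factor: positive integer scale applied to both height and width
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    upsampled = []
    for row in depth:
        # Repeat every value `factor` times along the width.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row `factor` times along the height.
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled
```

A guided method would instead use edges in the high-resolution color image to decide where depth discontinuities belong, which this baseline cannot do.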

License

The ARKitScenes dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/. For queries regarding a commercial license, contact [email protected]. If you have any other questions, raise an issue in the repository and contact [email protected].

Owner

Apple
