Autonomous Perception: 3D Object Detection with Complex-YOLO

Overview

Gif of 50 frames of darknet

LiDAR object detection with Complex-YOLO takes four steps:

  1. Computing LiDAR point-clouds from range images.
  2. Transforming the point-cloud to a Bird's Eye View using the Point Cloud Library (PCL).
  3. Using both Complex-YOLO (Darknet) and ResNet to predict 3D detections on the transformed LiDAR images.
  4. Evaluating the detections based on Precision and Recall.

Complex-Yolo Pipeline

Complex-Yolo is both highly accurate and highly performant in production:

Complex-Yolo Performance

Computing LiDAR Point-Clouds from Waymo Range Images

Waymo uses multiple sensors for autonomous perception, including LiDAR, cameras, and radar. Microphones are even used to help detect ambulance and police sirens.

Visualizing LiDAR Range and Intensity Channels

LiDAR visualization 1

The roof-mounted "Top" LiDAR rotates 360 degrees with a vertical field of view of ~20 degrees (-17.6 degrees to +2.4 degrees) and a 75 m range limit in the dataset.

LiDAR data is stored as a range image in the Waymo Open Dataset. Using OpenCV and NumPy, we extracted the "range" and "intensity" channels from the image and converted the float data to 8-bit unsigned integers. Below is a visualization of two video frames; in each, the top half is the range channel and the bottom half is the intensity channel:
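The float-to-8-bit conversion described above can be sketched with NumPy alone. This is a minimal illustration, not the project's code: the helper name `channel_to_uint8`, the channel order (range first, intensity second), and the clamping of negative values to zero are all assumptions.

```python
import numpy as np

def channel_to_uint8(channel, lo=None, hi=None):
    """Scale a float channel to 0..255 and cast it to uint8.

    Negative values are clamped to 0 (assumed to mark "no return").
    lo/hi are hypothetical clipping bounds; they default to the data range.
    """
    ch = np.maximum(np.asarray(channel, dtype=np.float32), 0.0)
    lo = float(ch.min()) if lo is None else lo
    hi = float(ch.max()) if hi is None else hi
    if hi <= lo:
        # A constant channel carries no contrast; return an all-black image.
        return np.zeros(ch.shape, dtype=np.uint8)
    scaled = (np.clip(ch, lo, hi) - lo) / (hi - lo)
    return (scaled * 255.0).astype(np.uint8)

# Toy range image: 2 rows x 3 columns x 2 channels (range, intensity).
range_image = np.array(
    [[[0.0, 0.1], [37.5, 0.5], [75.0, 1.0]],
     [[-1.0, -1.0], [15.0, 0.2], [60.0, 0.9]]], dtype=np.float32)

range_8bit = channel_to_uint8(range_image[..., 0])
intensity_8bit = channel_to_uint8(range_image[..., 1])

# Range on top, intensity below, matching the layout of the visualization.
stacked = np.vstack((range_8bit, intensity_8bit))
```

Passing explicit `lo` and `hi` bounds instead of the data range would keep the brightness scale consistent from frame to frame.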

LiDAR visualization 2

Visualizing the LiDAR Point-cloud

There are 64 laser beams in Waymo's top LiDAR sensor. One limitation of 360-degree LiDAR is that the space between beams (i.e., the resolution) widens with distance from the origin. The car chassis also creates blind spots, which is why perimeter LiDAR sensors are included on the sides of the vehicle.
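The widening gap between neighbouring beams can be estimated with simple trigonometry. The sketch below assumes the 64 beams are spread evenly over the ~20 degree vertical field of view; that is a simplification for illustration, since real sensors may space their beams unevenly. The function name `beam_spacing` is hypothetical.

```python
import math

def beam_spacing(distance_m, fov_deg=20.0, num_beams=64):
    """Approximate vertical gap (in metres) between adjacent beams.

    Assumes evenly spaced beams across the vertical field of view.
    """
    step_rad = math.radians(fov_deg) / (num_beams - 1)
    return distance_m * math.tan(step_rad)

# Gap between adjacent beams at a few distances, in metres.
spacings = {d: round(beam_spacing(d), 3) for d in (10, 40, 75)}
```

Under this assumption the gap grows linearly with distance, so a car at 40 m is sampled by roughly a quarter as many beams as the same car at 10 m.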

We used the Open3D library to make an interactive 3D visualization of the LiDAR point-cloud. Windshields, tires, and mirrors are commonly visible within 40 m. Beyond 40 m, cars look like slightly rounded rectangles where you might be able to make out the windshield. Distant vehicles and extremely close vehicles typically have lower resolution, as do vehicles that are obstructed by other vehicles.
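Before a point-cloud can be displayed, each range-image pixel has to be turned into an (x, y, z) point. The following is a minimal sketch of that spherical-to-Cartesian conversion. It assumes rows map linearly from +2.4 to -17.6 degrees of inclination and columns sweep a full 360 degrees; the dataset supplies its own per-beam calibration, which this sketch ignores. The function name `range_image_to_points` is hypothetical.

```python
import numpy as np

def range_image_to_points(ranges, incl_top_deg=2.4, incl_bottom_deg=-17.6):
    """Convert an (H, W) range image to an (N, 3) array of x, y, z points.

    Assumed layout: row 0 is the highest beam, and column centers sweep
    the azimuth from +pi down to -pi.
    """
    ranges = np.asarray(ranges, dtype=np.float64)
    h, w = ranges.shape
    incl = np.radians(np.linspace(incl_top_deg, incl_bottom_deg, h))
    az = np.pi - (np.arange(w) + 0.5) * (2.0 * np.pi / w)
    incl_grid, az_grid = np.meshgrid(incl, az, indexing="ij")

    x = ranges * np.cos(incl_grid) * np.cos(az_grid)
    y = ranges * np.cos(incl_grid) * np.sin(az_grid)
    z = ranges * np.sin(incl_grid)

    points = np.stack((x, y, z), axis=-1).reshape(-1, 3)
    valid = ranges.reshape(-1) > 0.0  # drop pixels without a return
    return points[valid]

toy_ranges = np.full((64, 8), 10.0)
toy_ranges[0, 0] = -1.0  # one missing return
cloud = range_image_to_points(toy_ranges)
```

The resulting array can be handed to a point-cloud viewer; every point in the toy example lies 10 m from the sensor origin.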

10 Vehicles Showing Different Types of LiDAR Interaction:

  1. Truck with trailer - most of the truck is visible at high resolution, but part of the trailer is in the 360 LiDAR's blind spot.
  2. Car partially in the blind spot; the back half isn't picked up well. This car blocks the largest area behind it from being detected by the LiDAR.
  3. Car shape is highly visible; you can even see the side mirrors and the LiDAR passing through the windshield.
  4. Car driving in the other lane. The resolution of the car is lower because the farther the 64 laser beams travel, the farther apart the points of the cloud will be. It is also obstructed from some beams by Car 2.
  5. This parked car is unobstructed, but far enough away that it's difficult to make out the mirrors or the tires.
  6. Compared to Car 3, most of the definition is either there or slightly worse, because this car is farther away.
  7. Car 7 is both far away and obstructed, so you can barely tell it's a car. It's basically a box with what is probably a windshield.
  8. Car 8 is similar to Car 6 on the right side, but obstructed by Car 6 on the left side.
  9. Car 9 is at the limit of the LiDAR range in the dataset. It's hard to tell it's a car.
  10. Car 10 is at the limit of the LiDAR's perception, and is also obstructed by Car 8.

Transforming the point-cloud to a Bird's Eye View using the Point Cloud Library

Convert sensor coordinates to Bird's-Eye View map coordinates

The birds-eye view (BEV) of a LiDAR point-cloud is based on the transformation of the x and y coordinates of the points.

BEV map properties:

  • Height:

    H_{i,j} = \max(P_{i,j} \cdot [0,0,1]^T)

  • Intensity:

    I_{i,j} = \max(I(P_{i,j}))

  • Density:

    D_{i,j} = \min(1.0,\ \frac{\log(N_{i,j}+1)}{64})

P_{i,j} is the set of points that falls into each cell, with i,j as the respective cell coordinates. N_{i,j} refers to the number of points in a cell.
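The three layers can be computed in a few lines of NumPy. This is a sketch under stated assumptions: the detection ranges, the 608-cell grid, and the function name `bev_layers` are illustrative and not taken from the project, and heights are shifted by the lower z bound so that empty cells read as zero.

```python
import numpy as np

def bev_layers(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
               z_range=(-1.0, 3.0), size=608):
    """Build height, intensity and density layers from (N, 4) points.

    Point columns are x, y, z, intensity. All ranges and the grid size
    are assumed values for illustration.
    """
    pts = np.asarray(points, dtype=np.float64)
    keep = ((pts[:, 0] >= x_range[0]) & (pts[:, 0] < x_range[1]) &
            (pts[:, 1] >= y_range[0]) & (pts[:, 1] < y_range[1]) &
            (pts[:, 2] >= z_range[0]) & (pts[:, 2] < z_range[1]))
    pts = pts[keep]

    # Cell coordinates i (along x) and j (along y), clamped to the grid.
    i = ((pts[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * size).astype(int)
    j = ((pts[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * size).astype(int)
    i = np.minimum(i, size - 1)
    j = np.minimum(j, size - 1)

    height = np.zeros((size, size))
    intensity = np.zeros((size, size))
    counts = np.zeros((size, size))

    # The .at variants accumulate correctly when a cell index repeats.
    np.maximum.at(height, (i, j), pts[:, 2] - z_range[0])  # H: max z per cell
    np.maximum.at(intensity, (i, j), pts[:, 3])            # I: max intensity
    np.add.at(counts, (i, j), 1.0)                         # N: points per cell

    density = np.minimum(1.0, np.log(counts + 1.0) / 64.0)  # D as defined above
    return height, intensity, density
```

The density line follows the formula exactly as written above. Some public implementations divide by log(64) instead of 64, which spreads the values over more of the [0, 1] range.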

Compute intensity layer of the BEV map

We created a BEV map of the "intensity" channel from the point-cloud data. For each set of points sharing the same (x,y) cell, we identified the top-most (maximum height) point and assigned its intensity value to the corresponding BEV map pixel. The data was normalized and outliers were removed until the features of interest were clearly visible.
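One way to pick the top-most point per cell is to sort the points by cell and descending height, then keep the first point of each cell. The sketch below assumes cell indices have already been computed, and it handles outliers by clipping at a cap (the 99th percentile by default). Both the clipping strategy and the name `topmost_intensity` are assumptions rather than the project's actual choices.

```python
import numpy as np

def topmost_intensity(cells_i, cells_j, z, intensity, size, cap=None):
    """Intensity of the highest point in each BEV cell, scaled to 0..1."""
    cells_i = np.asarray(cells_i, dtype=np.int64)
    cells_j = np.asarray(cells_j, dtype=np.int64)
    z = np.asarray(z, dtype=np.float64)
    intensity = np.asarray(intensity, dtype=np.float64)

    # lexsort uses the LAST key as the primary key: sort by i, then j,
    # then by descending z, so each cell's top-most point comes first.
    order = np.lexsort((-z, cells_j, cells_i))
    flat = cells_i[order] * size + cells_j[order]
    _, first = np.unique(flat, return_index=True)
    top = order[first]

    # Clip bright outliers so a few reflective points do not wash out
    # the rest of the layer (assumed strategy).
    if cap is None:
        cap = float(np.percentile(intensity, 99))
    layer = np.zeros((size, size))
    if cap > 0.0:
        layer[cells_i[top], cells_j[top]] = np.minimum(intensity[top], cap) / cap
    return layer
```

For example, with two points in cell (0, 0) at heights 0.5 and 1.5, only the intensity of the point at 1.5 ends up in the layer.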

Compute height layer of the BEV map

This is a visualization of the "height" channel BEV map. We sorted and pruned the point-cloud data, then normalized the height in each BEV map pixel by the difference between the maximum and minimum heights.
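The normalization step amounts to a min-max rescale. A minimal sketch follows; the bounds `z_min` and `z_max` and the helper name `normalize_height` are assumptions chosen for illustration.

```python
import numpy as np

def normalize_height(height_layer, z_min, z_max):
    """Scale per-cell top heights from [z_min, z_max] to [0, 1]."""
    h = np.clip(np.asarray(height_layer, dtype=np.float64), z_min, z_max)
    return (h - z_min) / (z_max - z_min)

# Toy layer with heights in metres; -1 m and 3 m are assumed bounds.
toy_layer = np.array([[-1.0, 1.0, 3.0]])
normalized = normalize_height(toy_layer, z_min=-1.0, z_max=3.0)
```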

Model-based Object Detection in BEV Image

We used YOLOv3 (Darknet) and ResNet deep-learning models for 3D object detection, based on "Complex-YOLO: Real-time 3D Object Detection on Point Clouds" and "Super Fast and Accurate 3D Object Detection based on 3D LiDAR Point Clouds".

Extract 3D bounding boxes from model response

The models take a three-channel BEV map as input and predict the class and coordinates of objects (vehicles). We then transformed these BEV coordinates back to the vehicle coordinate-space to draw the bounding boxes in both images.
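Mapping a detection from BEV pixels back to metres is the inverse of the discretization. The sketch below covers only the planar part of a box and assumes rows index x and columns index y, both starting at the lower bound of their range. The ranges, the grid size, the box layout, and the name `bev_box_to_vehicle` are hypothetical.

```python
def bev_box_to_vehicle(box, x_range=(0.0, 50.0), y_range=(-25.0, 25.0), size=608):
    """Convert (row, col, length_px, width_px, yaw) to vehicle-frame metres.

    Returns (x, y, length_m, width_m, yaw). The yaw angle is passed through
    unchanged; the axis conventions are assumed for illustration.
    """
    row, col, length_px, width_px, yaw = box
    cell_x = (x_range[1] - x_range[0]) / size  # metres per row
    cell_y = (y_range[1] - y_range[0]) / size  # metres per column
    x = x_range[0] + row * cell_x
    y = y_range[0] + col * cell_y
    return x, y, length_px * cell_x, width_px * cell_y, yaw

# A box centered in the middle of a 608 x 608 map.
vehicle_box = bev_box_to_vehicle((304, 304, 60.8, 24.32, 0.1))
```

With these assumed ranges the example box lands 25 m ahead of the sensor on the centerline, about 5 m long and 2 m wide.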

Transforming back to vehicle space

Below is a GIF of the detections in action: Results from 50 frames of ResNet detection

Performance Evaluation for Object Detection

Compute intersection-over-union between labels and detections

Using the labels in the Waymo Open Dataset, we computed the geometric overlap between the bounding boxes of labels and detected objects, expressed as a fraction of the combined area of the two boxes. The standard method in the literature for this is intersection over union (IoU).

After detections are made, we need a set of metrics to measure our progress. Common classification metrics for object detection include:

TP, FN, FP

  • TP: True Positive - Correctly predicts that a vehicle or other object is present
  • TN: True Negative - Correctly predicts that a vehicle or other object is not present
  • FP: False Positive - Detects an object where there is none
  • FN: False Negative - Fails to detect an object that is present
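In practice these counts come from matching detections to labels. The sketch below uses a simple greedy match on a precomputed overlap matrix with a threshold of 0.5. The threshold, the greedy strategy, and the name `count_matches` are assumptions for illustration, not the project's evaluation code.

```python
def count_matches(overlaps, num_labels, threshold=0.5):
    """Greedily match detections to labels and return (tp, fp, fn).

    overlaps[d][l] is the overlap score between detection d and label l.
    Each label can be matched by at most one detection.
    """
    matched = set()
    tp = 0
    for row in overlaps:
        best_label, best_score = None, threshold
        for label, score in enumerate(row):
            if label not in matched and score >= best_score:
                best_label, best_score = label, score
        if best_label is not None:
            matched.add(best_label)
            tp += 1
    fp = len(overlaps) - tp  # detections left without a label
    fn = num_labels - tp     # labels left without a detection
    return tp, fp, fn

# Three detections scored against two labels.
tp, fp, fn = count_matches([[0.8, 0.1], [0.2, 0.3], [0.7, 0.0]], num_labels=2)
```

True negatives are left out of this sketch because in object detection there is usually no fixed set of "empty" candidates to count.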

One popular method of making these determinations is to measure the geometric overlap of a predicted bounding box and a label bounding box relative to the total area the two boxes cover together, known as Intersection over Union (IoU).
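For axis-aligned boxes, IoU reduces to a few comparisons. The sketch below deliberately ignores box rotation, which a full BEV evaluation would need to handle with polygon intersection; it is a simplified illustration, and the name `iou_axis_aligned` is hypothetical.

```python
def iou_axis_aligned(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap rectangle (zero if the boxes are apart).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0

# Two 2 x 2 boxes offset by one unit overlap in a 1 x 1 square.
example_iou = iou_axis_aligned((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

Here the intersection is 1 and the union is 4 + 4 - 1 = 7, so the IoU is 1/7.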

IoU formula

IoU for Complex-Yolo

Classification Metrics Based on Precision and Recall

After all the LiDAR and Camera data has been transformed, and the detections have been predicted, we calculate the following metrics for the bounding box predictions:

Formulas

  • Precision:

    \frac{TP}{TP + FP}

  • Recall:

    \frac{TP}{TP + FN}

  • Accuracy:

    \frac{TP + TN}{TP + TN + FP + FN}

  • Mean Average Precision:

    \frac{1}{n} \sum_{Recall_{i}}Precision(Recall_{i})
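The precision, recall, and average-precision formulas translate directly into code. The sketch below evaluates interpolated precision at 11 evenly spaced recall levels, which is one common convention and an assumption here; the function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from counts, returning 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def average_precision(recalls, precisions, num_points=11):
    """Mean interpolated precision over evenly spaced recall levels.

    For each level, take the best precision achieved at that recall or
    higher; levels that are never reached contribute 0.
    """
    total = 0.0
    for k in range(num_points):
        level = k / (num_points - 1)
        reached = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reached) if reached else 0.0
    return total / num_points

# Toy counts for illustration only.
p, r = precision_recall(tp=8, fp=2, fn=2)
ap = average_precision([0.5], [1.0])
```

Averaging `average_precision` over all object classes gives the mean average precision.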

Precision and Recall Results Visualizations

Results from 50 frames: Results from 50 frames

Precision: 0.954, Recall: 0.921

Complex Yolo Paper

Owner
Thomas Dunlap
Machine Learning Engineer and Data Scientist with a focus on deep learning, computer vision, and robotics.