Detecting Vehicles and Pedestrians With 3D Object Detection on LiDAR Point Clouds
What do you see when you close your eyes and imagine a bulldozer?
The first thing that comes to mind for most people is the large, heavy machine with its massive blade at the front. But what if I told you that there was a different way of seeing bulldozers? What if I showed you a picture of one without its blade? You would probably still be able to identify it as a bulldozer, even though it looks very different! This is because your brain has the ability to create 3D perception from 2D images.
LiDAR technology works similarly: it uses lasers to create 3D representations of objects, which can then be used for tasks like object detection and mapping. I explain in more detail how a LiDAR scanner works in my article on early fusion.
This article explores how to process a point cloud from LiDAR to detect obstacles in 3D in the context of self-driving cars.
LiDAR is an essential sensor for autonomous driving because it can estimate distances accurately. Combined with other sensors such as cameras through sensor fusion, we can build more accurate perception systems for autonomous vehicles.
This article will only consider a LiDAR-based 3D object detection approach.
In the last few years, a growing body of research has explored how to accurately detect 3D objects with deep neural networks.
In this article, we will instead use unsupervised machine learning. While not as accurate as state-of-the-art neural networks, it is an excellent way to understand how to work with point clouds.
We use LiDAR point clouds from the Pandaset dataset by Hesai and Scale AI, an open-source dataset for autonomous driving recorded in the Silicon Valley area.
The scene we will analyze is made of a sequence of point clouds concatenated together.
For ease of visualization, here is what the data looks like from the front camera.
We can also visualize the exact location and plot the journey on Google Maps.
You can access the Pandaset data here and the associated pandaset-devkit here, which I used to concatenate the LiDAR frames and plot the route on Google Maps.
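To make this concrete, here is a minimal sketch of loading and concatenating the LiDAR frames with the pandaset-devkit; the dataset path and sequence ID are placeholders, and I assume each frame is a pandas DataFrame with x, y, z columns already expressed in world coordinates.

```python
import numpy as np
from pandaset import DataSet

# Placeholder path and sequence ID: adjust to your local Pandaset copy.
dataset = DataSet("/path/to/pandaset")
seq = dataset["002"]
seq.load_lidar()

# Pandaset LiDAR points are given in world coordinates, so the frames of a
# sequence can simply be stacked into one big point cloud for the scene.
points = np.vstack([seq.lidar[i][["x", "y", "z"]].to_numpy()
                    for i in range(len(seq.lidar))])
print("Total points:", len(points))
```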
Next, let’s visualize and process the LiDAR point clouds with Open3D, a convenient library for working with 3D point clouds.
LiDAR Point Clouds Visualization and Voxel Downsampling
First, let’s visualize the LiDAR frames before processing the point clouds with a technique called voxel downsampling.
LiDAR Point Clouds Visualization
First, let’s visualize the 3D point clouds from the LiDAR frames. In Pandaset, the LiDAR point cloud data are aggregated from a Rotating LiDAR on the roof and a forward-facing solid-state LiDAR.
In this case, the 3D data points are colored in blue. The intensity of the color corresponds to the timestamp associated with each point: the most recent frames appear in a more intense blue.
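Below is a sketch of that visualization with Open3D, assuming `points` and `seq` come from the previous snippet and that the devkit exposes a per-point timestamp column `t`.

```python
import numpy as np
import open3d as o3d

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)

# Shade each point in blue according to its normalized timestamp so that
# the most recent frames appear in a more intense blue.
t = np.concatenate([seq.lidar[i]["t"].to_numpy()
                    for i in range(len(seq.lidar))])
t = (t - t.min()) / (t.max() - t.min())
colors = np.zeros((len(t), 3))
colors[:, 2] = t                       # blue channel only
pcd.colors = o3d.utility.Vector3dVector(colors)

o3d.visualization.draw_geometries([pcd])
```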
LiDAR Point Cloud Preprocessing With Voxel Downsampling
Next, we need to preprocess the 3D point cloud with a standard technique called voxel downsampling.
The goal is to reduce the number of points while maintaining the object’s structure with a voxel grid.
Source: Vossman; M. W. Toews, CC BY-SA 2.5, via Wikimedia Commons
You can think of a voxel as a 3D pixel shaped like a cube.
To help you visualize better, the image below represents a voxelized Lamborghini, where each 3D point is a voxel.
Source: “voxel lambo” by Philippe Put is marked with CC BY 2.0.
Voxel downsampling is a technique to reduce the number of 3D data points by leaving only one point per cube. As a result, our object detection algorithm will be more accurate by simplifying the 3D structures and reducing data noise.
Since we keep only one 3D point per cube, the bigger the cube, the more points are removed.
```
# Voxel Hyperparameters: voxel_size = 0.2
Output:
Points before downsampling: 2738047
Points after downsampling: 389849
```
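That output comes from a single Open3D call; here is a minimal sketch, assuming `pcd` is the point cloud built earlier.

```python
# Every occupied 0.2 m cube is replaced by a single (averaged) point.
downpcd = pcd.voxel_down_sample(voxel_size=0.2)
print("Points before downsampling:", len(pcd.points))
print("Points after downsampling:", len(downpcd.points))
```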
After downsampling, we have seven times fewer points, but we can still clearly see the structure of the obstacles.
LiDAR Point Cloud Segmentation and Clustering
Next, we will use unsupervised machine learning techniques to do 3D LiDAR object detection. First, we will further reduce the number of points with an outlier detection algorithm to segment the driveable area in 3D.
Second, we will build a 3D object detection with a clustering algorithm.
RANSAC Driveable Area Segmentation in 3D
First, we perform driveable area segmentation in 3D to only keep the data that interest us. In other words, we process the 3D point clouds to separate the ground from the obstacles, such as vehicles, bicycles, and pedestrians.
Driveable area segmentation can be done with an algorithm called RANSAC, which stands for RANdom SAmple Consensus. This algorithm detects outliers and can be done in 2D or 3D, depending on the use case.
In this case, we define the ground points as inliers and the points belonging to obstacles as outliers. This works because the road contains the largest number of points.
Points within a specific distance D of an infinite plane fitted in (X, Y, Z) are called inliers, while those beyond D are considered outliers.
RANSAC is a three-step process, as follows:
- Take 3 points at random
- Fit a plane through these points
- Calculate the distance of every other point to the plane
The process is repeated iteratively until the optimum is found: the right plane is the one that fits the largest number of points.
Source: 2D RANSAC Animation — Haldir, Public domain, via Wikimedia Commons
The visualization above shows the RANSAC algorithm running in 2D; it works similarly with 3D data. The hyperparameters are the number of iterations and the distance threshold.
```
# RANSAC Hyperparameters
# distance_threshold = 0.3
# ransac_n = 3
# number_iterations = 200
```
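These hyperparameters map directly onto Open3D's built-in RANSAC plane segmentation. Here is a sketch, assuming `downpcd` is the downsampled cloud from the previous step; note that Open3D's keyword is `num_iterations`.

```python
# Fit a plane with RANSAC; `inlier_idx` indexes the ground points.
plane_model, inlier_idx = downpcd.segment_plane(distance_threshold=0.3,
                                                ransac_n=3,
                                                num_iterations=200)

road = downpcd.select_by_index(inlier_idx)
obstacles = downpcd.select_by_index(inlier_idx, invert=True)
road.paint_uniform_color([1.0, 0.0, 0.0])    # road in red, as in the figure
o3d.visualization.draw_geometries([road, obstacles])
```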
The 3D frame above shows the road in red after segmentation and the rest in blue. Next, we detect individual obstacles through clustering.
LiDAR Point Cloud Clustering With DBSCAN
Once the segmentation of the driveable area is done, we need to detect individual obstacles with a clustering algorithm.
Here, we use the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to build clusters of 3D objects.
Because K-means produces roughly spherical clusters, it is not a suitable algorithm for self-driving cars: obstacles have arbitrary shapes, which makes DBSCAN a better choice.
The hyperparameters are epsilon (ε), the radius of the neighborhood around a point, and n, the minimum number of points required to form a cluster.
This algorithm follows a three-step process, as follows:
- Take a point
- Find its neighbors within a distance epsilon (ε)
- If the point has at least n neighbors, expand the cluster to those neighbors
The expansion stops once no new point lies within ε, and the result is one cluster. We repeat the same process to find the following clusters.
```
# DBSCAN Hyperparameters: eps=0.45, min_points=10
Output:
[Open3D DEBUG] Done Compute Clusters: 1132
```
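The output above comes from Open3D's DBSCAN implementation. Here is a sketch of the call, assuming `obstacles` is the non-ground cloud from the RANSAC step.

```python
import numpy as np

# Cluster the obstacle points; the label -1 marks noise points that
# belong to no cluster.
labels = np.array(obstacles.cluster_dbscan(eps=0.45, min_points=10,
                                           print_progress=True))
print("Clusters found:", labels.max() + 1)
```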
It is important to note that this is an unsupervised algorithm: it can only differentiate obstacles based on the density of points in a particular area.
The point cloud above shows clusters identified by the algorithm. I excluded the road from the final point clouds visualization for the sake of clarity.
You can also read about a deep learning approach to 3D classification and segmentation with PointNet here.
LiDAR 3D Object Detection With Bounding Boxes
The final step is to add 3D bounding boxes around the detected LiDAR objects. For this, we group the points by cluster label and compute a bounding box for each group.
We also need to set the interval of 3D data points a cluster must contain to output a bounding box. In this case, I chose to keep only objects containing between 50 and 1,000 data points.
```python
# Bounding boxes parameters
MIN_POINTS = 50
MAX_POINTS = 1000
```
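Here is a sketch of how these parameters can be applied, assuming `obstacles` and `labels` come from the previous steps: we keep one axis-aligned bounding box per cluster whose point count falls within the interval.

```python
bounding_boxes = []
for cluster_id in range(labels.max() + 1):
    idx = np.where(labels == cluster_id)[0]
    # Discard clusters that are too small or too large to be obstacles.
    if MIN_POINTS <= len(idx) <= MAX_POINTS:
        cluster = obstacles.select_by_index(idx)
        box = cluster.get_axis_aligned_bounding_box()
        box.color = (1.0, 0.0, 0.0)          # draw the boxes in red
        bounding_boxes.append(box)

o3d.visualization.draw_geometries([obstacles, *bounding_boxes])
```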
Looking at the frame above, we notice that the result lacks precision and contains a high number of false positives. Also, the clusters are built purely from point density, without any notion of the object's class.
However, it is a quick way to build a 3D object detection algorithm and can be used as a baseline. If accuracy is not critical, this approach can still be viable and worth exploring, depending on the use case. The result can be improved with a deep learning algorithm.
Closing Thoughts on 3D Perception With Unsupervised Machine Learning
We explored how to process a 3D scene built from concatenated LiDAR point clouds and perform 3D object detection with unsupervised machine learning. As mentioned earlier, the result is far from perfect, but it is quick to prototype and can serve as an initial baseline model.
One way to improve the above result is by doing 3D object detection with a neural network. I will share more about it in a future article.
I learned the above technique through ThinkAutonomous.ai’s excellent course on 3D point clouds, which I used to prepare this article.
If you are interested in autonomous driving and computer vision, you can join my Newsletter and receive your FREE Starter-Kit here.
Also, feel free to browse my other articles here.