This article was originally published on Medium, on 8 January 2022.
Different sensors can detect the same obstacle but capture different pieces of information about it. Sensor fusion is the process of combining these results to improve detection and reduce the uncertainty of each individual sensor.
We can think of it as a union of sensors.
In a previous article, I explained the process of early sensor fusion with LiDAR and cameras. In early sensor fusion, the fusion is done on raw data before object detection. In late sensor fusion, the data are fused after object detection has been done individually on each sensor.
Late sensor fusion is a five-step process, as follows:
- Camera object detection
- LiDAR object detection
- Project the 3D LiDAR objects onto the image
- Fuse the LiDAR 3D boxes with the camera 2D boxes
- Build a fused object
Data Aggregation
We use the KITTI Vision Benchmark Suite, the same dataset as in my previous article on early sensor fusion, where you can also find more information about it.
3D and 2D Object Detection From Visual Sensors
The first step is to detect 2D objects from the camera and 3D objects from the LiDAR.
Detect 2D Obstacles From Cameras
2D YOLO Object Detection on Image — Image by author
As in early sensor fusion, we use YOLO to detect 2D objects in the camera image. I explain how it works in my article on early sensor fusion.
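As a minimal sketch of this step, here is what 2D detection on a camera image can look like. It assumes the ultralytics package and a pretrained COCO model, which are not necessarily the YOLO version used in the original article, and the image path is hypothetical.

```python
# Minimal 2D detection sketch with a pretrained YOLO model.
# Assumptions: the `ultralytics` package is installed; the weights and the
# image path below are placeholders, not the setup from the original article.
from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")                      # pretrained COCO weights
image = cv2.imread("kitti_image.png")           # hypothetical KITTI left-camera image

results = model(image)[0]                       # run inference on a single image
camera_boxes = []
for box in results.boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()       # 2D box in pixel coordinates
    label = model.names[int(box.cls[0])]        # class name, e.g. "car"
    score = float(box.conf[0])                  # detection confidence
    camera_boxes.append({"box2d": [x1, y1, x2, y2], "label": label, "score": score})
```

The output is a list of 2D boxes with class labels and confidences, which is what the fusion step needs from the camera side.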
3D Object Detection on Point Cloud
Next, we need to perform 3D Object detection on the LiDAR Point Cloud. I explain one way to do it in my article on 3D Perception and another one in my article on 3D Deep Learning.
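The details of those detectors are out of scope here, but as a rough illustration only (this is a simple clustering baseline, not the method from the referenced articles), candidate obstacles can be extracted from a raw KITTI scan with scikit-learn. The file name and the ground-removal threshold are assumptions.

```python
# Rough illustration: group LiDAR points into candidate obstacles with DBSCAN.
# This is NOT the detector from the referenced articles, just a clustering
# baseline. The velodyne file name is hypothetical, and the crude height-based
# ground removal is an assumption for the sake of the sketch.
import numpy as np
from sklearn.cluster import DBSCAN

scan = np.fromfile("000031.bin", dtype=np.float32).reshape(-1, 4)  # x, y, z, intensity
points = scan[:, :3]
points = points[points[:, 2] > -1.4]            # crude ground removal by height

labels = DBSCAN(eps=0.7, min_samples=10).fit_predict(points)
clusters = [points[labels == k] for k in set(labels) if k != -1]   # -1 = noise
print(f"{len(clusters)} candidate obstacles")
```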
Once the 3D objects have been detected in X, Y, Z space, we need to build a 3D bounding box with dimensions (W, H, L) and orient it accordingly.
3D Bounding Box in X, Y, Z plane — Image by author
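A common way to materialize such a box is to compute its eight corners from a center position, dimensions, and a yaw angle. The sketch below assumes this KITTI-style parameterization and a rotation around the vertical axis; the exact axis convention may differ from the author's code.

```python
# Compute the 8 corners of an oriented 3D bounding box.
# Assumption: the box is parameterized by its center (x, y, z), its dimensions
# (w, h, l) and a yaw rotation around the vertical axis; axis conventions vary
# between datasets, so adapt as needed.
import numpy as np

def box3d_corners(x, y, z, w, h, l, yaw):
    # Corners of an axis-aligned box centered at the origin
    # (length along x, width along y, height along z).
    x_c = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    y_c = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    z_c = np.array([-h, -h, -h, -h,  h,  h,  h,  h]) / 2.0
    corners = np.vstack([x_c, y_c, z_c])            # shape (3, 8)

    # Rotate around the vertical axis by the yaw angle, then translate to the center.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return (rot @ corners + np.array([[x], [y], [z]])).T  # shape (8, 3)
```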
The output of the two previous steps is two lists of objects: 2D boxes from the camera and 3D boxes from the LiDAR. Next, we need to project the 3D LiDAR objects onto the image.
Projection of LiDAR Objects onto The Image
Project 3D Boxes of LiDAR Objects Onto The Image
3D boxes of LiDAR Objects on Image — Image by author
Then we need to project the LiDAR points representing the bounding box onto the image. This process is similar to projecting the 3D point cloud onto the image in early sensor fusion.
However, in late sensor fusion, we only project the vertices of the 3D bounding box from LiDAR onto the image.
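With the KITTI calibration files, a hedged sketch of this projection looks like the following. It assumes the usual P2, R0_rect and Tr_velo_to_cam matrices have already been parsed from the calibration file into numpy arrays.

```python
# Project 3D box corners from LiDAR coordinates to image pixels.
# Assumptions: `corners` is an (8, 3) array in LiDAR coordinates and the KITTI
# calibration matrices P2 (3x4), R0_rect (3x3) and Tr_velo_to_cam (3x4) have
# been loaded beforehand. A real implementation should also discard boxes
# that fall behind the camera (negative depth).
import numpy as np

def project_lidar_box_to_image(corners, P2, R0_rect, Tr_velo_to_cam):
    # Homogeneous LiDAR points, shape (4, 8).
    pts = np.vstack([corners.T, np.ones((1, corners.shape[0]))])

    # LiDAR -> camera -> rectified camera coordinates.
    cam = R0_rect @ (Tr_velo_to_cam @ pts)          # shape (3, 8)

    # Rectified camera -> image plane with the camera projection matrix.
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    img = P2 @ cam_h                                # shape (3, 8)
    img = img[:2] / img[2]                          # perspective division
    return img.T                                    # (8, 2) pixel coordinates
```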
2D LiDAR Objects Onto the Image
2D boxes of LiDAR Objects on Image — Image by author
Next, we get the 2D bounding boxes from LiDAR by taking the min and the max of the projected 3D bounding box corners.
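Continuing the projection sketch above, this amounts to a simple min/max over the eight projected corners (the [x1, y1, x2, y2] box format is an assumption carried through the rest of the sketches):

```python
# Convert the 8 projected corners (in pixels) to a single 2D box [x1, y1, x2, y2].
def corners_to_2d_box(projected_corners):
    # projected_corners: (8, 2) array of pixel coordinates from the projection step.
    x1, y1 = projected_corners.min(axis=0)
    x2, y2 = projected_corners.max(axis=0)
    return [float(x1), float(y1), float(x2), float(y2)]
```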
Fusion of the 3D Bounding Boxes (LiDAR) with the 2D Bounding Boxes (Camera)
Now that object detection is done on both LiDAR and camera, we need to fuse the results together.
LiDAR and Cameras Objects Onto the Image
LiDAR and Cameras Objects on Image — Image by author
The image above shows the LiDAR and the camera boxes on the image. However, the two sets do not contain the same number of boxes. Which bounding boxes match together?
To find the correct matches, we can use the concept of Intersection Over Union (IOU) along with the Hungarian algorithm.
Find the Matching Boxes with Intersection Over Union (IOU) and the Hungarian Algorithm
We use the Intersection Over Union to find the right match between the bounding boxes.
Intersection and Union — Image by author
Above is a quick reminder of the difference between the intersection and the union of the bounding boxes. Next, we divide the intersection by the union, which will output an IOU value between 0 and 1.
Intersection Over Union — Image by author
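A minimal IOU implementation for two axis-aligned boxes is shown below; the [x1, y1, x2, y2] box format and the function name are assumptions of the sketch.

```python
# Intersection Over Union between two axis-aligned boxes given as [x1, y1, x2, y2].
def iou(box_a, box_b):
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```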
The higher the IOU value, the better the match between the bounding boxes. Computing the IOU for every camera/LiDAR box pair produces a matrix of IOU values, which feeds the Hungarian algorithm. Because each detected object can have several candidate matches with different IOU values, we need to decide which pairing to assign, as illustrated in the table below.
Examples of IOU values — Image by author
The table above contains some example IOU values to illustrate the assignment problem. The Hungarian algorithm solves this problem and finds the best matches.
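In practice, scipy provides an implementation of the Hungarian algorithm. Here is a hedged sketch of the matching step, assuming the camera and LiDAR boxes are both in the [x1, y1, x2, y2] pixel format and reusing the iou function above; the variable names and the IOU threshold are illustrative choices, not the article's exact values.

```python
# Match camera and LiDAR 2D boxes with the Hungarian algorithm on an IOU matrix.
# Assumptions: `camera_boxes` and `lidar_boxes` are lists of [x1, y1, x2, y2]
# boxes in image coordinates; `iou` is the function defined above; the 0.3
# threshold is an illustrative value.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boxes(camera_boxes, lidar_boxes, iou_threshold=0.3):
    iou_matrix = np.zeros((len(camera_boxes), len(lidar_boxes)))
    for i, cam_box in enumerate(camera_boxes):
        for j, lid_box in enumerate(lidar_boxes):
            iou_matrix[i, j] = iou(cam_box, lid_box)

    # linear_sum_assignment minimizes cost, so negate the IOU matrix.
    rows, cols = linear_sum_assignment(-iou_matrix)

    # Keep only assignments with sufficient overlap; the rest stay unmatched.
    return [(i, j) for i, j in zip(rows, cols) if iou_matrix[i, j] >= iou_threshold]
```

Each returned pair (camera index, LiDAR index) is a candidate for building a fused object in the final step.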
Finally, we build a fused object for every match, which gives the final result below. Here, we kept the LiDAR label boxes (red boxes without tags), and we notice that several obstacles are not seen by the camera, possibly because another obstacle obstructs the view. Cameras are good sensors, but they are easily affected by conditions such as visibility, lighting, shadows, or reflections.
Late Sensor Fusion Image 1 — Image by author
Late Sensor Fusion Image 2 — Image by author
Closing Thoughts on Late Sensor Fusion
In this article, I explained the process of late sensor fusion between LiDAR and cameras. We learned:
- how to detect 2D obstacles on the camera image
- how to project 3D LiDAR objects on a 2D image
- how to fuse the obstacles together
While this implementation works, there are some edge cases to be aware of; for example:
- What if the matching bounding boxes are, in fact, two different obstacles?
- What if there are two bounding boxes inside another one?
Those are only a few examples of problems that can occur during the process of sensor fusion. This article has been prepared with resources from ThinkAutonomous.ai.