Real-time Semantic Segmentation-based Depth Upsampling using Deep Learning

V. Miclea, S. Nedevschi

Proceedings of 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, 2018, pp. 300-306.

We propose a new real-time depth upsampling method based on convolutional neural networks (CNNs) that uses the local context provided by semantic information. Two solutions based on convolutional networks are introduced, modeled according to the level of sparsity given by the depth sensor. While the first CNN upsamples data from a partially dense input, the second one uses dilated convolutions as a means to cope with sparse inputs from cost-effective depth sensors. Experiments on data extracted from the KITTI dataset highlight the performance of our methods, which run in real time (11 ms for the first case and 17 ms for the second) on a regular GPU.
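As a rough illustration of the dilated-convolution idea the second network relies on (not the authors' implementation; kernel size, dilation rate and weights here are arbitrary), a single dilated convolution applied to a sparse depth map can be sketched as:

```python
import numpy as np

def dilated_conv2d(depth, kernel, dilation):
    """Naive 2D convolution with a dilation factor.

    Dilation widens the receptive field without adding weights, which
    helps aggregate information from very sparse depth samples.
    """
    kh, kw = kernel.shape
    pad = (kh // 2) * dilation
    padded = np.pad(depth, pad, mode="constant")
    out = np.zeros_like(depth, dtype=float)
    for i in range(depth.shape[0]):
        for j in range(depth.shape[1]):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += kernel[u, v] * padded[i + u * dilation, j + v * dilation]
            out[i, j] = acc
    return out

# Sparse 8x8 depth map with two valid samples; 3x3 averaging kernel, dilation 2
depth = np.zeros((8, 8))
depth[2, 2], depth[5, 6] = 10.0, 20.0
out = dilated_conv2d(depth, np.full((3, 3), 1.0 / 9.0), dilation=2)
```

With dilation 2, each output pixel averages samples up to two pixels away in every direction, so isolated depth measurements spread their influence further than a dense 3x3 kernel would allow.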


Multi-Object tracking of 3D cuboids using aggregated features

M.P. Muresan, S. Nedevschi

Proceedings of 2019 15th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 5-7, 2019, pp. 11-18.

The unknown correspondence between measurements and targets, referred to as data association, is one of the main challenges of multi-target tracking. Each new measurement could be the continuation of a previously detected target, the first detection of a new target, or a false alarm. Tracking 3D cuboids is particularly difficult due to the high amount of data, which can include erroneous or noisy sensor information leading to false measurements, detections from an unknown number of objects that may not be consistent across frames, and varying object properties such as dimension and orientation. In the self-driving car context, the target tracking module plays an important role, since the ego vehicle has to predict the position and velocity of the surrounding objects at the next time epoch, plan its actions and make the correct decisions. To tackle the above-mentioned problems and other issues arising in the self-driving car processing pipeline, we propose three original contributions: 1) a novel affinity measurement function that associates measurements and targets using multiple types of features coming from LIDAR and camera, 2) a context-aware descriptor for 3D objects that improves the data association process, and 3) a framework that includes a module for tracking the dimensions and orientation of objects. The implemented solution runs in real time, and experiments performed on real-world urban scenarios prove that the presented method is effective and robust even in highly dynamic environments.
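The abstract does not specify the affinity function's exact form, but the general pattern of combining several weighted feature similarities and then matching greedily can be sketched as follows (feature choices, weights and the threshold are illustrative, not the paper's):

```python
import math

def affinity(track, det, w_pos=0.5, w_size=0.3, w_cls=0.2):
    """Toy affinity between a tracked cuboid and a new measurement.

    Combines a position term, a dimension term and a class-agreement
    term into one score in [0, 1]; higher means a better match.
    """
    d_pos = math.dist(track["center"], det["center"])
    d_size = sum(abs(a - b) for a, b in zip(track["dims"], det["dims"]))
    s_pos = math.exp(-d_pos)    # decays with center distance
    s_size = math.exp(-d_size)  # decays with dimension mismatch
    s_cls = 1.0 if track["cls"] == det["cls"] else 0.0
    return w_pos * s_pos + w_size * s_size + w_cls * s_cls

def associate(tracks, dets, min_score=0.3):
    """Greedy best-first matching on the affinity matrix."""
    pairs = sorted(((affinity(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(dets)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score >= min_score and ti not in used_t and di not in used_d:
            used_t.add(ti)
            used_d.add(di)
            matches.append((ti, di))
    return matches

# One tracked car; a nearby car detection and a distant pedestrian detection
tracks = [{"center": (0.0, 0.0, 0.0), "dims": (4.0, 2.0, 1.5), "cls": "car"}]
dets = [{"center": (0.2, 0.0, 0.0), "dims": (4.1, 2.0, 1.5), "cls": "car"},
        {"center": (30.0, 5.0, 0.0), "dims": (0.6, 0.6, 1.8), "cls": "pedestrian"}]
matches = associate(tracks, dets)
```

The threshold keeps weak pairings unmatched, so the distant pedestrian remains a candidate new target rather than being forced onto the existing car track.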


Fusing semantic labeled camera images and 3D LiDAR data for the detection of urban curbs

S.E.C. Goga, S. Nedevschi

Proceedings of 2018 14th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2018, pp. 301-308.

This article presents a new approach for detecting curbs in urban environments. It is based on the fusion between semantically labeled images obtained using a convolutional neural network and a LiDAR point cloud. Semantic information is used to exploit context for the detection of urban curbs. Using only the semantic labels associated with 3D points, we define a set of 3D ROIs in which curbs are most likely to reside, thus reducing the search space for a curb. A traditional curb detection method for the LiDAR sensor is then used to correct the previously obtained ROIs. For this, spatial features are computed and filtered in each ROI using the LiDAR's high-accuracy measurements. The proposed solution works in real time and requires little parameter tuning. It proved independent of the type of urban road, being capable of providing good curb detection results on straight, curved and intersection-shaped roads.


Semantic information based vehicle relative orientation and taillight detection

F. Vancea, S. Nedevschi

Proceedings of 2018 14th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2018, pp. 259-264.

Vehicle taillight detection is an important topic in the fields of collision avoidance systems and autonomous vehicles. By analyzing changes in the taillights of vehicles, the intention of the driver can be understood, which can prevent possible accidents. This paper presents a convolutional neural network architecture capable of segmenting taillight pixels by first detecting vehicles and then reusing the already computed features to segment taillights. The network is composed of a Faster R-CNN that detects vehicles and classifies them based on their orientation relative to the camera, and a subnetwork that is responsible for segmenting taillight pixels on vehicles that have their rear facing the camera. Multiple Faster R-CNN configurations were trained and evaluated. This work also presents a way of adapting the ERFNet semantic segmentation architecture for the purpose of taillight extraction, object detection and classification. The networks were trained and evaluated on the KITTI object detection dataset.


Fusion Scheme for Semantic and Instance-level Segmentation

A.D. Costea, A. Petrovai, S. Nedevschi

Deep Vision Workshop, 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)

Powerful scene understanding can be achieved by combining the tasks of semantic segmentation and instance-level recognition. Considering that these tasks are complementary, we propose a multi-objective fusion scheme which leverages the capabilities of each: pixel-level semantic segmentation performs well at background classification and at delimiting foreground objects from background, while instance-level segmentation excels at recognizing and classifying objects as a whole. We use a fully convolutional residual network together with a feature pyramid network in order to achieve both semantic segmentation and Mask R-CNN based instance-level recognition. We introduce a novel fusion approach that refines the outputs of this network based on object sub-category class, with instance propagation guided by semantic segmentation for more general classes. The proposed solution achieves significant improvements in semantic object segmentation and object mask boundary refinement at low computational cost.


Real-Time Object Detection Using a Sparse 4-Layer LIDAR

M.P. Muresan, S. Nedevschi, I. Giosan

Proceedings of 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2017, pp. 317-322.

The robust detection of obstacles on a given road path by vehicles equipped with range measurement devices represents a requirement for many research fields, including autonomous driving and advanced driving assistance systems. One particular sensor system used for measurement tasks, due to its known accuracy, is the LIDAR (Light Detection and Ranging). The commercial price and computational demands of such systems generally increase with the number of scanning layers. For this reason, this paper presents a novel six-step obstacle detection approach using a 4-layer LIDAR. In the proposed pipeline we tackle the problems of data correction and temporal point cloud fusion, and we present an original method for detecting obstacles using a combination of a polar histogram and an elevation grid. The results have been validated against objects provided by other range measurement sensors.
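The polar-histogram idea can be illustrated with a minimal sketch: bin 3D points into (angle, range) cells and flag cells whose height spread exceeds a threshold. The resolutions and threshold below are illustrative placeholders, not the paper's values:

```python
import math
from collections import defaultdict

def polar_obstacle_bins(points, ang_res_deg=5.0, rad_res=1.0, h_thresh=0.3):
    """Bin 3D points into (angle, range) polar cells and flag cells whose
    height spread exceeds a threshold as containing an obstacle."""
    cells = defaultdict(list)
    for x, y, z in points:
        ang = math.degrees(math.atan2(y, x)) % 360.0
        rng = math.hypot(x, y)
        cells[(int(ang // ang_res_deg), int(rng // rad_res))].append(z)
    return {c for c, zs in cells.items() if max(zs) - min(zs) > h_thresh}

# Flat ground points along the x axis plus a vertical pole at (5, 0)
ground = [(x, 0.0, 0.0) for x in (1.0, 2.0, 3.0)]
pole = [(5.0, 0.0, z) for z in (0.0, 0.5, 1.0)]
obstacles = polar_obstacle_bins(ground + pole)
```

Flat ground produces near-zero height spread per cell, while the pole's cell accumulates a 1 m spread and is flagged, which is the basic discriminative cue such a histogram provides even with only a few scanning layers.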


An approach for segmenting 3D LiDAR data using Multi-Volume grid structures

S.E.C. Goga, S. Nedevschi

Proceedings of 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2017, pp. 309-315.

This paper proposes a novel approach for segmenting and space-partitioning data of sparse 3D LiDAR point clouds for autonomous driving tasks in urban environments. Our main focus is building a compact data representation which provides enough information for an accurate segmentation algorithm. We propose an extension of elevation maps for automotive perception tasks which is capable of dealing with both protruding and hanging objects found in urban scenes, such as bridges, hanging road barriers, traffic tunnels, tree branches over the road surface, and so on. For this we use a Multi-Volume grid representation of the environment. We apply a fast primary classifier in order to label the surface volumes as being part of the ground segment or of an object segment. Segmentation is performed on the object-labeled data, which is first connected in a spatial graph structure using a height-overlap criterion. A comparison between the proposed method and the popular connected-components-based segmentation method applied on an elevation map is performed in the end.
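The height-overlap criterion can be sketched as a predicate on the vertical extent of two volumes; the tolerance value here is an arbitrary assumption for illustration:

```python
def volumes_overlap(a, b, tol=0.2):
    """Height-overlap criterion between two vertical volumes (z_min, z_max):
    neighbouring grid cells are linked in the segmentation graph when their
    height intervals overlap (or nearly touch, within tol metres)."""
    return min(a[1], b[1]) - max(a[0], b[0]) > -tol

car_side = (0.0, 1.5)     # a car body volume
car_top = (1.5, 1.6)      # an adjacent volume at roof height
bridge_deck = (5.0, 6.0)  # a hanging structure above the road
touching = volumes_overlap(car_side, car_top)
separated = volumes_overlap(car_side, bridge_deck)
```

This is what lets a multi-volume grid keep a bridge deck and the road beneath it in separate segments, something a single-surface elevation map cannot express.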


Online Cross-Calibration of Camera and LIDAR

B.C.Z. Blaga, S. Nedevschi

Proceedings of 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2017, pp. 295-301.

In an autonomous driving system, drift can affect a sensor's position, introducing errors in the extrinsic calibration. For this reason, we have developed a method which continuously monitors two sensors, a camera and a 16-beam LIDAR, and adjusts the value of their cross-calibration. Our algorithm, starting from correct values of the extrinsic cross-calibration parameters, can detect small sensor drift during driving by overlapping the edges from the LIDAR over the edges from the image. The novelty of our method is that, in order to obtain edges, we create a range image and filter the data from the 3D point cloud, and we use the distance transform on 2D images to find edges. Another improvement we bring is applying motion correction on the laser scanner data to remove distortions that appear during vehicle motion. An optimization problem over the 6 calibration parameters is defined, from which we are able to obtain the best value of the cross-calibration and readjust it automatically. Our system performs successfully in real time, in a wide variety of scenarios, and is not affected by the speed of the car.
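The core of such an edge-alignment objective can be sketched in a few lines: compute the image's distance transform, then score a candidate calibration by summing the transform values at the projected LIDAR edge pixels (a brute-force distance transform is used here for clarity; real systems use linear-time algorithms, and the projection step is omitted):

```python
import math

def distance_transform(edges, h, w):
    """Brute-force Euclidean distance from each pixel to the nearest
    image edge pixel (fine for tiny examples only)."""
    return [[min(math.hypot(r - er, c - ec) for er, ec in edges)
             for c in range(w)] for r in range(h)]

def alignment_cost(dt, projected):
    """Sum of distance-transform values at projected LIDAR edge pixels.

    Lower cost means LIDAR edges fall closer to image edges; searching
    over the 6 extrinsic parameters amounts to minimizing this cost.
    """
    return sum(dt[r][c] for r, c in projected)

image_edges = [(2, 2), (2, 3), (2, 4)]        # a short horizontal image edge
dt = distance_transform(image_edges, 6, 6)
well_aligned = alignment_cost(dt, [(2, 2), (2, 4)])  # LIDAR edges on the image edge
drifted = alignment_cost(dt, [(4, 2), (4, 4)])       # same edges shifted 2 px down
```

Because the distance transform is smooth away from edges, the cost degrades gradually with drift, which gives the optimizer over the 6 parameters a usable gradient-like signal.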


Multi-Task Network for Panoptic Segmentation in Automated Driving

A. Petrovai, S. Nedevschi

Proceedings of 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 26-30 October 2019, pp. 2394-2401.

In this paper, we tackle the newly introduced panoptic segmentation task. Panoptic segmentation unifies semantic and instance segmentation and leverages the capabilities of these complementary tasks by providing pixel- and instance-level classification. Current state-of-the-art approaches employ either separate networks for each task or a single network for both tasks, with post-processing heuristics fusing the outputs into the final panoptic segmentation. Instead, our approach solves all three tasks, including panoptic segmentation, with an end-to-end learnable fully convolutional neural network. We build upon the Mask R-CNN framework with a shared backbone and individual network heads for each task. Our semantic segmentation head uses multi-scale information from the Feature Pyramid Network, while the panoptic head learns to fuse the semantic segmentation logits with a variable number of instance segmentation logits. Moreover, the panoptic head refines the outputs of the network, improving the semantic segmentation results. Experimental results on the challenging Cityscapes dataset demonstrate that the proposed solution achieves significant improvements for both panoptic segmentation and semantic segmentation.
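For intuition about what the panoptic output format looks like, a heuristic (non-learned) fusion of a semantic map with instance masks can be sketched as below; the paper instead learns this fusion end-to-end in the panoptic head, and the class ids here are arbitrary:

```python
import numpy as np

def fuse_panoptic(sem_labels, instance_masks):
    """Toy panoptic fusion: start from per-pixel semantic labels and
    overwrite detected instance pixels with the instance's class plus a
    unique instance id, yielding (class_id, instance_id) per pixel."""
    panoptic = np.stack([sem_labels.copy(), np.zeros_like(sem_labels)], axis=-1)
    for inst_id, (mask, cls) in enumerate(instance_masks, start=1):
        panoptic[mask, 0] = cls
        panoptic[mask, 1] = inst_id
    return panoptic

sem_labels = np.zeros((2, 3), dtype=int)   # all pixels "road" (class 0, a stuff class)
car_mask = np.zeros((2, 3), dtype=bool)
car_mask[0, :2] = True                     # one detected car instance
pan = fuse_panoptic(sem_labels, [(car_mask, 13)])  # 13 = hypothetical "car" class id
```

"Stuff" pixels keep instance id 0, while each "thing" detection receives its own id, which is exactly the per-pixel (class, instance) pairing the panoptic task requires.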


Stabilization and Validation of 3D Object Position Using Multimodal Sensor Fusion and Semantic Segmentation

M.P. Muresan, I. Giosan, S. Nedevschi

Sensors 2020, 20, 1110; doi:10.3390/s20041110, pp. 1-33.

The stabilization and validation of the measured position of objects is an important step for high-level perception functions and for the correct processing of sensory data. The goal of this process is to detect and handle inconsistencies between different sensor measurements produced by the perception system. The aggregation of detections from different sensors consists of combining the sensorial data in one common reference frame for each identified object, leading to the creation of a super-sensor. The result of the data aggregation may contain errors such as false detections, misplaced object cuboids or an incorrect number of objects in the scene. The stabilization and validation process is focused on mitigating these problems. The current paper proposes four contributions for solving the stabilization and validation task for autonomous vehicles, using the following sensors: trifocal camera, fisheye camera, long-range RADAR (Radio Detection and Ranging), and 4-layer and 16-layer LIDARs (Light Detection and Ranging). We propose two original data association methods used in the sensor fusion and tracking processes. The first data association algorithm is created for tracking LIDAR objects and combines multiple appearance and motion features in order to exploit the available information about road objects. The second novel data association algorithm is designed for trifocal camera objects and has the objective of finding measurement correspondences to sensor-fused objects such that the super-sensor data are enriched with semantic class information. The implemented trifocal object association solution uses a novel polar association scheme combined with a decision tree to find the best hypothesis-measurement correlations.
Another contribution, aimed at stabilizing the position and unpredictable behavior of road objects provided by multiple types of complementary sensors, is a fusion approach based on the Unscented Kalman Filter and a single-layer perceptron. The last novel contribution is related to the validation of the 3D object position, which is solved using a fuzzy logic technique combined with a semantic segmentation image. The proposed algorithms have real-time performance, achieving a cumulative running time of 90 ms, and have been evaluated using ground truth data extracted from a high-precision GPS (Global Positioning System) with 2 cm accuracy, obtaining an average error of 0.8 m.
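The fuzzy-logic validation step can be illustrated with a minimal sketch: membership functions turn crisp inputs (e.g. how well the projected object overlaps its semantic class in the image, and its range) into degrees of trust, combined with a fuzzy AND. The membership shapes, inputs and parameters below are illustrative assumptions, not the paper's rule base:

```python
def tri_membership(x, a, b, c):
    """Triangular fuzzy membership: 0 outside [a, c], peaking at 1 when x == b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def validate_position(overlap_ratio, range_m):
    """Toy fuzzy validation of a 3D object position.

    Combines the overlap between the projected cuboid and its semantic
    class in the image with the object's range (near objects are trusted
    more); min() acts as a fuzzy AND over the two memberships.
    """
    trust_overlap = tri_membership(overlap_ratio, 0.2, 1.0, 1.8)
    trust_range = tri_membership(range_m, -40.0, 0.0, 80.0)
    return min(trust_overlap, trust_range)

valid = validate_position(overlap_ratio=0.9, range_m=20.0)    # well-supported object
dubious = validate_position(overlap_ratio=0.1, range_m=70.0)  # weak semantic support
```

The resulting score can be thresholded to accept or reject an aggregated object position, with the graded memberships avoiding hard cutoffs on any single cue.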