Multi-Task Network for Panoptic Segmentation in Automated Driving

A. Petrovai, S. Nedevschi

Proceedings of 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 26-30 October 2019, pp. 2394-2401.

In this paper, we tackle the newly introduced panoptic segmentation task. Panoptic segmentation unifies semantic and instance segmentation and leverages the capabilities of these complementary tasks by providing pixel and instance level classification. Current state-of-the-art approaches employ either separate networks for each task or a single network for both tasks, and rely on post-processing heuristics to fuse the outputs into the final panoptic segmentation. Instead, our approach solves all three tasks, including panoptic segmentation, with an end-to-end learnable fully convolutional neural network. We build upon the Mask R-CNN framework with a shared backbone and individual network heads for each task. Our semantic segmentation head uses multi-scale information from the Feature Pyramid Network, while the panoptic head learns to fuse the semantic segmentation logits with a variable number of instance segmentation logits. Moreover, the panoptic head refines the outputs of the network, improving the semantic segmentation results. Experimental results on the challenging Cityscapes dataset demonstrate that the proposed solution achieves significant improvements for both panoptic segmentation and semantic segmentation.
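The idea of fusing semantic logits with a variable number of instance logits can be illustrated with a small sketch. This is not the learned panoptic head from the paper: it only shows a fixed, per-pixel argmax over a stack of channels whose length depends on how many instances were detected. The function name `fuse_panoptic` and the nested-list data layout are assumptions made for illustration.

```python
def fuse_panoptic(stuff_logits, instance_logits):
    """Per-pixel argmax over stuff channels plus one channel per instance.

    stuff_logits: dict mapping a class name to a 2D list of logits.
    instance_logits: list of (class_name, 2D list of logits), one per instance.
    Returns a 2D list of (class_name, instance_id); instance_id is None for stuff.
    Hypothetical layout, not the representation used in the paper.
    """
    # One channel per stuff class, followed by one channel per detected instance.
    channels = [((name, None), grid) for name, grid in stuff_logits.items()]
    channels += [((name, idx), grid)
                 for idx, (name, grid) in enumerate(instance_logits)]

    height = len(channels[0][1])
    width = len(channels[0][1][0])
    fused = []
    for row in range(height):
        fused_row = []
        for col in range(width):
            # Pick the label of the channel with the highest logit at this pixel.
            label, _ = max(channels, key=lambda ch: ch[1][row][col])
            fused_row.append(label)
        fused.append(fused_row)
    return fused
```

Because the channel list is built at call time, the same function handles images with zero, one, or many instances.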


Stabilization and Validation of 3D Object Position Using Multimodal Sensor Fusion and Semantic Segmentation

M.P. Muresan, I. Giosan, S. Nedevschi

Sensors 2020, 20, 1110; doi:10.3390/s20041110, pp. 1-33.

The stabilization and validation process of the measured position of objects is an important step for high‐level perception functions and for the correct processing of sensory data. The goal of this process is to detect and handle inconsistencies between different sensor measurements, which result from the perception system. The aggregation of the detections from different sensors consists in the combination of the sensorial data in one common reference frame for each identified object, leading to the creation of a super‐sensor. The result of the data aggregation may end up with errors such as false detections, misplaced object cuboids or an incorrect number of objects in the scene. The stabilization and validation process is focused on mitigating these problems. The current paper proposes four contributions for solving the stabilization and validation task, for autonomous vehicles, using the following sensors: trifocal camera, fisheye camera, long‐range RADAR (Radio detection and ranging), and 4‐layer and 16‐layer LIDARs (Light Detection and Ranging). We propose two original data association methods used in the sensor fusion and tracking processes. The first data association algorithm is created for tracking LIDAR objects and combines multiple appearance and motion features in order to exploit the available information for road objects. The second novel data association algorithm is designed for trifocal camera objects and has the objective of finding measurement correspondences to sensor fused objects such that the super‐sensor data are enriched by adding the semantic class information. The implemented trifocal object association solution uses a novel polar association scheme combined with a decision tree to find the best hypothesis–measurement correlations. 
Another contribution, aimed at stabilizing the object position and handling the unpredictable behavior of road objects reported by multiple types of complementary sensors, is a fusion approach based on the Unscented Kalman Filter and a single‐layer perceptron. The last novel contribution is related to the validation of the 3D object position, which is solved using a fuzzy logic technique combined with a semantic segmentation image. The proposed algorithms run in real time, achieving a cumulative running time of 90 ms, and have been evaluated using ground truth data extracted from a high‐precision GPS (global positioning system) with 2 cm accuracy, obtaining an average error of 0.8 m.


Curb Detection in Urban Traffic Scenarios Using LiDARs Point Cloud and Semantically Segmented Color Images

S.E.C. Deac, I. Giosan, S. Nedevschi

Proceedings of 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 26-30 October 2019, pp. 3433-3440.

In this paper we propose a robust curb detection method which is based on the fusion between semantically labeled camera images and a 3D point cloud coming from LiDAR sensors. The labels from the semantically enhanced cloud are used to reduce the curbs’ searching area. Several spatial cues are next computed on each candidate curb region. Based on these features, a candidate curb region is either rejected or refined for obtaining a precise positioning of the curb points found inside it. A novel local model-based outlier removal algorithm is proposed to filter out the erroneous curb points. Finally, a temporal integration of the detected curb points in multiple consecutive frames is used to densify the detection result. An objective evaluation of the proposed solution is done using a high-resolution digital map containing ground truth curb points. The proposed system has proved capable of detecting curbs of any height (from 3 cm up to 30 cm) in complex urban road scenarios (straight roads, curved roads, intersections with traffic isles and roundabouts).


Real-Time Semantic Segmentation-Based Stereo Reconstruction

V.C. Miclea, S. Nedevschi

IEEE Transactions on Intelligent Transportation Systems (Early Access), pp. 1-11, 2019, DOI: 10.1109/TITS.2019.2913883.

In this paper, we propose a novel semantic segmentation-based stereo reconstruction method that can keep up with the accuracy of the state-of-the-art approaches while running in real time. The solution follows the classic stereo pipeline, each step in the stereo workflow being enhanced by additional information from semantic segmentation. Therefore, we introduce several improvements to computation, aggregation, and optimization by adapting existing techniques to integrate additional surface information given by each semantic class. For the cost computation and optimization steps, we propose new genetic algorithms that can incrementally adjust the parameters for better solutions. Furthermore, we propose a new postprocessing edge-aware filtering technique relying on an improved convolutional neural network (CNN) architecture for disparity refinement. We obtain competitive results at 30 frames/s, including segmentation.


Efficient instance and semantic segmentation for automated driving

A. Petrovai, S. Nedevschi

Proceedings of 2019 IEEE Intelligent Vehicles Symposium (IV 2019), Paris, France, 9-12 June 2019, pp. 2575-2581.

Environment perception for automated vehicles is achieved by fusing the outputs of different sensors such as cameras, LIDARs and RADARs. Images provide a semantic understanding of the environment at object level using instance segmentation, but also at background level using semantic segmentation. We propose a fully convolutional residual network based on Mask R-CNN to achieve both semantic and instance level recognition. We aim at developing an efficient network that could run in real-time for automated driving applications without compromising accuracy. Moreover, we compare and experiment with two different backbone architectures, a classification type of network and a faster segmentation type of network based on dilated convolutions. Experiments demonstrate top results on the publicly available Cityscapes dataset.


Environment Perception Architecture using Images and 3D Data

H. Florea, R. Varga, S. Nedevschi

Proceedings of 2018 14th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2018, pp. 223-228.

This paper discusses the architecture of an environment perception system for autonomous vehicles. The modules of the system are described briefly and we focus on important changes in the architecture that enable: decoupling of data acquisition from data processing; synchronous data processing; parallel computation on GPU and multiple CPU cores; efficient data passing using pointers; and an adaptive architecture capable of working with a varying number of sensors. The experimental results compare execution times before and after the proposed optimizations. We achieve a 10 Hz frame rate for an object detection system working with 4 cameras and 4 LIDAR point clouds.
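Decoupling acquisition from processing is commonly done with a bounded producer-consumer queue. The sketch below is a generic illustration of that pattern, not the architecture described in the paper; `run_pipeline`, `frames`, `process`, and `max_pending` are hypothetical names. The queue hands over object references rather than copies, which loosely mirrors the idea of passing data by pointer.

```python
import queue
import threading


def run_pipeline(frames, process, max_pending=4):
    """Run acquisition and processing in separate threads (illustrative only).

    frames: iterable of sensor frames (must not contain None, used as sentinel).
    process: callable applied to each frame by the processing thread.
    max_pending: capacity of the buffer between the two stages.
    """
    pending = queue.Queue(maxsize=max_pending)
    results = []

    def acquire():
        for frame in frames:
            pending.put(frame)  # blocks when the buffer is full
        pending.put(None)       # sentinel: no more frames

    def consume():
        while True:
            frame = pending.get()
            if frame is None:
                break
            results.append(process(frame))

    threads = [threading.Thread(target=acquire),
               threading.Thread(target=consume)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With a single consumer thread the results keep the acquisition order; for example, `run_pipeline([1, 2, 3], lambda f: f * 2)` returns `[2, 4, 6]`.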


A Fast RANSAC Based Approach for Computing the Orientation of Obstacles in Traffic Scenes

F. Oniga, S. Nedevschi

Proceedings of 2018 14th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2018, pp. 209-214.

A low complexity approach for computing the orientation of 3D obstacles, detected from lidar data, is proposed in this paper. The proposed method takes as input obstacles represented as cuboids without orientation (aligned with the reference frame). Each cuboid contains a cluster of obstacle locations (discrete grid cells). First, for each obstacle, the boundaries that are visible for the perception system are selected. A model consisting of two perpendicular lines is fitted to the set of boundary cells, one for each presumed visible side. The main dominant line is computed with a RANSAC approach. Then, the second line is searched, using a constraint of perpendicularity on the dominant line. The existence of the second line is used to validate the orientation. Finally, additional criteria are proposed to select the best orientation based on the free area of the cuboid (on top view) that is visible to the perception system.
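The two-step fit described above (a dominant line found with RANSAC, then a search for a second line constrained to be perpendicular to it) can be sketched as follows. This is a minimal illustration under assumed parameters, not the authors' implementation: the function names, the tolerance `inlier_tol`, the iteration count, and the `min_support` threshold are all hypothetical, and the selection of visible boundary cells and the free-area criteria are omitted.

```python
import math
import random


def fit_dominant_line(points, n_iters=200, inlier_tol=0.1, seed=0):
    """RANSAC line fit. Returns (angle in [0, pi), list of inlier points)."""
    rng = random.Random(seed)
    best_angle, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        dx, dy = x2 - x1, y2 - y1
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            continue
        # Unit normal of the candidate line through the two sampled points.
        nx, ny = -dy / norm, dx / norm
        inliers = [p for p in points
                   if abs((p[0] - x1) * nx + (p[1] - y1) * ny) <= inlier_tol]
        if len(inliers) > len(best_inliers):
            best_angle = math.atan2(dy, dx) % math.pi
            best_inliers = inliers
    return best_angle, best_inliers


def has_perpendicular_side(points, angle, inliers,
                           inlier_tol=0.1, min_support=3):
    """Check whether the remaining points support a line perpendicular
    to the dominant one (used here to validate the orientation)."""
    inlier_set = set(inliers)
    rest = [p for p in points if p not in inlier_set]
    # Points on a perpendicular line share the same projection onto the
    # direction of the dominant line.
    ux, uy = math.cos(angle), math.sin(angle)
    best_support = 0
    for x0, y0 in rest:
        support = sum(1 for p in rest
                      if abs((p[0] - x0) * ux + (p[1] - y0) * uy) <= inlier_tol)
        best_support = max(best_support, support)
    return best_support >= min_support
```

Points are expected as `(x, y)` tuples in the top-view grid. The returned angle is only defined modulo pi, since a line has no direction.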


Real-Time Stereo Reconstruction Failure Detection and Correction Using Deep Learning

V.C. Miclea, S. Nedevschi, L. Miclea

Proceedings of 2018 IEEE Intelligent Transportation Systems Conference (ITSC), Maui, Hawaii, USA, November 4-7, 2018, pp. 1095-1102.

This paper introduces a stereo reconstruction method that, besides producing accurate results in real time, is capable of detecting and concealing possible failures caused by one of the cameras. A classification of stereo camera sensor faults is initially introduced, the most common types of defects being highlighted. We next present a stereo camera failure detection method in which various additional checks are introduced, with respect to the aforementioned error classification. Furthermore, we propose a novel error correction method based on CNNs (convolutional neural networks) that is capable of generating reliable disparity maps by using prior information provided by semantic segmentation in conjunction with the last available disparity. We highlight the efficiency of our approach by evaluating its performance in various driving scenarios and show that it produces accurate disparities on images from the KITTI stereo and raw datasets while running in real time on a regular GPU.


Appearance-Based Landmark Selection for Visual Localization

Mathias Bürki, Cesar Cadena, Igor Gilitschenski, Roland Siegwart and Juan Nieto

Journal of Field Robotics (JFR) 2019

Visual localization in outdoor environments is subject to varying appearance conditions, rendering it difficult to match current camera images against a previously recorded map. Although it is possible to extend the respective maps to allow precise localization across a wide range of differing appearance conditions, these maps quickly grow in size and become impractical to handle on a mobile robotic platform. To address this problem, we present a landmark selection algorithm that exploits appearance co‐observability for efficient visual localization in outdoor environments. Based on the appearance condition inferred from recently observed landmarks, a small fraction of landmarks useful under the current appearance condition is selected and used for localization. This greatly reduces the bandwidth consumption between the mobile platform and a map backend in a shared‐map scenario, and significantly lowers the demands on the computational resources on said mobile platform. We derive a landmark ranking function that exhibits high performance under vastly changing appearance conditions and is agnostic to the distribution of landmarks across the different map sessions. Furthermore, we relate and compare our proposed appearance‐based landmark ranking function to popular ranking schemes from information retrieval, and validate our results on the challenging University of Michigan North Campus long‐term vision and LIDAR data sets (NCLT), including an evaluation of the localization accuracy using ground‐truth poses. In addition, we investigate the computational and bandwidth resource demands. Our results show that by selecting 20–30% of landmarks using our proposed approach, localization performance similar to that of the baseline strategy using all landmarks is achieved.


 title = {Appearance-Based Landmark Selection for Visual Localization},
 author = {M. Buerki and C. Cadena and I. Gilitschenski and R. Siegwart and Juan Nieto},
 fullauthor ={Buerki, Mathias and Cadena, Cesar and Gilitschenski, Igor and Siegwart, Roland and Nieto, Juan},
 journal = {{Journal of Field Robotics}},
 year = {2019},
 volume = {6},
 number = {6},
 pages  = {1041--1073},

OREOS: Oriented Recognition of 3D Point Clouds in Outdoor Scenarios

Lukas Schaupp, Mathias Buerki, Renaud Dube, Roland Siegwart, and Cesar Cadena

IEEE/RSJ Int. Conference on Intelligent Robots and Systems (IROS) 2019

We introduce a novel method for oriented place recognition with 3D LiDAR scans. A Convolutional Neural Network is trained to extract compact descriptors from single 3D LiDAR scans. These can be used both to retrieve nearby place candidates from a map, and to estimate the yaw discrepancy needed for bootstrapping local registration methods. We employ a triplet loss function for training and use a hard negative mining strategy to further increase the performance of our descriptor extractor. In an evaluation on the NCLT and KITTI datasets, we demonstrate that our method outperforms related state-of-the-art approaches based on both data-driven and handcrafted data representations in challenging long-term outdoor conditions.
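For readers unfamiliar with the training objective mentioned above, the following sketch shows the commonly used margin-based triplet loss together with a simple form of hard negative mining (picking the negative closest to the anchor). The abstract does not state the distance metric, the margin, or the exact mining strategy, so the Euclidean distance, `margin=1.0`, and the function names here are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_negative(anchor, negatives):
    """Hard negative mining: the negative descriptor closest to the anchor."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is farther than the positive by at least margin."""
    return max(0.0,
               euclidean(anchor, positive)
               - euclidean(anchor, negative)
               + margin)
```

With anchor `(0, 0)`, positive `(0, 1)`, and candidate negatives `(3, 0)` and `(0, 1.5)`, the mined negative is `(0, 1.5)` and the loss is 0.5, whereas the easy negative `(3, 0)` would give a loss of 0 and contribute no gradient.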


Title = {OREOS: Oriented Recognition of 3D Point Clouds in Outdoor Scenarios},
Author = {L. Schaupp and M. Buerki and R. Dube and R. Siegwart and C. Cadena},
Fullauthor = {Lukas Schaupp and Mathias Buerki and Renaud Dube and Roland Siegwart and Cesar Cadena},
Booktitle = {{IEEE/RSJ} Int. Conference on Intelligent Robots and Systems ({IROS})},
Month = {November},
Year = {2019},