Multi-Object tracking of 3D cuboids using aggregated features

M.P. Muresan, S. Nedevschi

Proceedings of 2019 15th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 5-7, 2019, pp. 11-18.

The unknown correspondence between measurements and targets, referred to as data association, is one of the main challenges of multi-target tracking. Each new measurement received could be the continuation of some previously detected target, the first detection of a new target or a false alarm. Tracking 3D cuboids is particularly difficult due to the large amount of data, which can include erroneous or noisy sensor information that leads to false measurements, detections from an unknown number of objects that may not be consistent across frames, and varying object properties such as dimension and orientation. In the self-driving car context, the target tracking module plays an important role because the ego vehicle has to predict the position and velocity of the surrounding objects in the next time epoch, plan actions and make correct decisions. To tackle the above-mentioned problems and other issues coming from the self-driving car processing pipeline, we propose three original contributions: 1) a novel affinity measurement function that associates measurements and targets using multiple types of features coming from LIDAR and camera, 2) a context-aware descriptor for 3D objects that improves the data association process, 3) a framework that includes a module for tracking the dimensions and orientation of objects. The implemented solution runs in real time, and experiments performed on real-world urban scenarios show that the presented method is effective and robust even in a highly dynamic environment.
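The abstract does not give the form of the affinity function, so the following is only a minimal sketch of the general idea: several cues are combined into one score and measurements are matched to targets one-to-one. The weights, the exponential position term, the cosine appearance term, the greedy matching and all names (`affinity`, `associate`, `min_affinity`) are assumptions made for illustration, not the authors' method.

```python
import math


def affinity(track, det, w_pos=0.5, w_app=0.5, pos_scale=2.0):
    """Combine a position cue and an appearance cue into one score in [0, 1].

    Hypothetical form: the weights and both terms are illustrative assumptions.
    """
    dist = math.dist(track["center"], det["center"])
    pos_term = math.exp(-dist / pos_scale)  # equals 1 when the centers coincide
    # Cosine similarity of the appearance vectors, clipped to [0, 1].
    dot = sum(a * b for a, b in zip(track["feat"], det["feat"]))
    norm = math.hypot(*track["feat"]) * math.hypot(*det["feat"])
    app_term = max(0.0, dot / norm) if norm > 0 else 0.0
    return w_pos * pos_term + w_app * app_term


def associate(tracks, dets, min_affinity=0.5):
    """Greedy one-to-one matching: take the best-scoring pairs first."""
    pairs = sorted(
        ((affinity(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(dets)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, ti, di in pairs:
        if score < min_affinity:
            break  # all remaining pairs score even lower
        if ti in used_tracks or di in used_dets:
            continue
        matches[ti] = di
        used_tracks.add(ti)
        used_dets.add(di)
    # Unmatched detections are either new targets or false alarms.
    unmatched = [di for di in range(len(dets)) if di not in used_dets]
    return matches, unmatched


tracks = [
    {"center": (0.0, 0.0, 0.0), "feat": (1.0, 0.0)},
    {"center": (10.0, 0.0, 0.0), "feat": (0.0, 1.0)},
]
dets = [
    {"center": (10.5, 0.0, 0.0), "feat": (0.0, 1.0)},
    {"center": (0.2, 0.0, 0.0), "feat": (1.0, 0.0)},
    {"center": (50.0, 50.0, 0.0), "feat": (1.0, 1.0)},
]
matches, unmatched = associate(tracks, dets)
print(matches, unmatched)
```

In this toy scene each track is matched to the nearby detection with the same appearance vector, and the distant third detection is left unmatched. A globally optimal assignment (for example the Hungarian algorithm) could replace the greedy loop.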


Multi-Task Network for Panoptic Segmentation in Automated Driving

A. Petrovai, S. Nedevschi

Proceedings of 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 26-30 October, 2019, pp. 2394-2401.

In this paper, we tackle the newly introduced panoptic segmentation task. Panoptic segmentation unifies semantic and instance segmentation and leverages the capabilities of these complementary tasks by providing pixel and instance level classification. Current state-of-the-art approaches employ either separate networks for each task or a single network for both tasks, and post-processing heuristics fuse the outputs into the final panoptic segmentation. Instead, our approach solves all three tasks, including panoptic segmentation, with an end-to-end learnable fully convolutional neural network. We build upon the Mask R-CNN framework with a shared backbone and individual network heads for each task. Our semantic segmentation head uses multi-scale information from the Feature Pyramid Network, while the panoptic head learns to fuse the semantic segmentation logits with a variable number of instance segmentation logits. Moreover, the panoptic head refines the outputs of the network, improving the semantic segmentation results. Experimental results on the challenging Cityscapes dataset demonstrate that the proposed solution achieves significant improvements for both panoptic segmentation and semantic segmentation.


Stabilization and Validation of 3D Object Position Using Multimodal Sensor Fusion and Semantic Segmentation

M.P. Muresan, I. Giosan, S. Nedevschi

Sensors 2020, 20, 1110; doi:10.3390/s20041110, pp. 1-33.

The stabilization and validation process of the measured position of objects is an important step for high‐level perception functions and for the correct processing of sensory data. The goal of this process is to detect and handle inconsistencies between different sensor measurements, which result from the perception system. The aggregation of the detections from different sensors consists in the combination of the sensorial data in one common reference frame for each identified object, leading to the creation of a super‐sensor. The result of the data aggregation may end up with errors such as false detections, misplaced object cuboids or an incorrect number of objects in the scene. The stabilization and validation process is focused on mitigating these problems. The current paper proposes four contributions for solving the stabilization and validation task, for autonomous vehicles, using the following sensors: trifocal camera, fisheye camera, long‐range RADAR (Radio detection and ranging), and 4‐layer and 16‐layer LIDARs (Light Detection and Ranging). We propose two original data association methods used in the sensor fusion and tracking processes. The first data association algorithm is created for tracking LIDAR objects and combines multiple appearance and motion features in order to exploit the available information for road objects. The second novel data association algorithm is designed for trifocal camera objects and has the objective of finding measurement correspondences to sensor fused objects such that the super‐sensor data are enriched by adding the semantic class information. The implemented trifocal object association solution uses a novel polar association scheme combined with a decision tree to find the best hypothesis–measurement correlations. 
Another contribution we propose for stabilizing object position and unpredictable behavior of road objects, provided by multiple types of complementary sensors, is the use of a fusion approach based on the Unscented Kalman Filter and a single‐layer perceptron. The last novel contribution is related to the validation of the 3D object position, which is solved using a fuzzy logic technique combined with a semantic segmentation image. The proposed algorithms have a real‐time performance, achieving a cumulative running time of 90 ms, and have been evaluated using ground truth data extracted from a high‐precision GPS (global positioning system) with 2 cm accuracy, obtaining an average error of 0.8 m.


Curb Detection in Urban Traffic Scenarios Using LiDARs Point Cloud and Semantically Segmented Color Images

S.E.C. Deac, I. Giosan, S. Nedevschi

Proceedings of 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 26-30 October, 2019, pp. 3433-3440.

In this paper we propose a robust curb detection method which is based on the fusion between semantically labeled camera images and a 3D point cloud coming from LiDAR sensors. The labels from the semantically enhanced cloud are used to reduce the curbs’ searching area. Several spatial cues are next computed on each candidate curb region. Based on these features, a candidate curb region is either rejected or refined to obtain a precise positioning of the curb points found inside it. A novel local model-based outlier removal algorithm is proposed to filter out the erroneous curb points. Finally, a temporal integration of the detected curb points in multiple consecutive frames is used to densify the detection result. An objective evaluation of the proposed solution is done using a high-resolution digital map containing ground truth curb points. The proposed system has proved capable of detecting curbs of any height (from 3 cm up to 30 cm) in complex urban road scenarios (straight roads, curved roads, intersections with traffic isles and roundabouts).


Real-Time Semantic Segmentation-Based Stereo Reconstruction

V.C. Miclea, S. Nedevschi

IEEE Transactions on Intelligent Transportation Systems (Early Access), pp. 1-11, 2019, DOI: 10.1109/TITS.2019.2913883.

In this paper, we propose a novel semantic segmentation-based stereo reconstruction method that can keep up with the accuracy of the state-of-the-art approaches while running in real time. The solution follows the classic stereo pipeline, each step in the stereo workflow being enhanced by additional information from semantic segmentation. Therefore, we introduce several improvements to computation, aggregation, and optimization by adapting existing techniques to integrate additional surface information given by each semantic class. For the cost computation and optimization steps, we propose new genetic algorithms that can incrementally adjust the parameters for better solutions. Furthermore, we propose a new postprocessing edge-aware filtering technique relying on an improved convolutional neural network (CNN) architecture for disparity refinement. We obtain competitive results at 30 frames/s, including segmentation.


Efficient instance and semantic segmentation for automated driving

A. Petrovai, S. Nedevschi

Proceedings of 2019 IEEE Intelligent Vehicles Symposium (IV 2019), Paris, France, 9-12 June, 2019, pp. 2575-2581.

Environment perception for automated vehicles is achieved by fusing the outputs of different sensors such as cameras, LIDARs and RADARs. Images provide a semantic understanding of the environment at object level using instance segmentation, but also at background level using semantic segmentation. We propose a fully convolutional residual network based on Mask R-CNN to achieve both semantic and instance level recognition. We aim at developing an efficient network that could run in real-time for automated driving applications without compromising accuracy. Moreover, we compare and experiment with two different backbone architectures, a classification type of network and a faster segmentation type of network based on dilated convolutions. Experiments demonstrate top results on the publicly available Cityscapes dataset.


Appearance-Based Landmark Selection for Visual Localization

Mathias Bürki, Cesar Cadena, Igor Gilitschenski, Roland Siegwart and Juan Nieto

Journal of Field Robotics (JFR) 2019

Visual localization in outdoor environments is subject to varying appearance conditions, rendering it difficult to match current camera images against a previously recorded map. Although it is possible to extend the respective maps to allow precise localization across a wide range of differing appearance conditions, these maps quickly grow in size and become impractical to handle on a mobile robotic platform. To address this problem, we present a landmark selection algorithm that exploits appearance co‐observability for efficient visual localization in outdoor environments. Based on the appearance condition inferred from recently observed landmarks, a small fraction of landmarks useful under the current appearance condition is selected and used for localization. This greatly reduces the bandwidth consumption between the mobile platform and a map backend in a shared‐map scenario, and significantly lowers the demands on the computational resources on said mobile platform. We derive a landmark ranking function that exhibits high performance under vastly changing appearance conditions and is agnostic to the distribution of landmarks across the different map sessions. Furthermore, we relate and compare our proposed appearance‐based landmark ranking function to popular ranking schemes from information retrieval, and validate our results on the challenging University of Michigan North Campus long‐term vision and LIDAR data sets (NCLT), including an evaluation of the localization accuracy using ground‐truth poses. In addition, we investigate the computational and bandwidth resource demands. Our results show that by selecting 20–30% of landmarks using our proposed approach, a localization performance similar to that of the baseline strategy using all landmarks is achieved.
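The ranking function itself is not stated in the abstract. As a rough illustration of selecting landmarks by co-observability, the sketch below lets recently observed landmarks vote for the map sessions they were seen in, scores every landmark by the votes its sessions received, and keeps only a top fraction. The voting scheme, the data layout and the names `rank_landmarks` and `select_top` are assumptions and should not be read as the ranking function derived in the paper.

```python
from collections import Counter


def rank_landmarks(landmark_sessions, recent_ids):
    """Rank landmark ids, most promising first.

    landmark_sessions maps a landmark id to the set of map sessions in which
    it was observed. Recently observed landmarks vote for their sessions
    (a hypothetical stand-in for an appearance co-observability measure).
    """
    votes = Counter()
    for lid in recent_ids:
        votes.update(landmark_sessions[lid])
    scores = {
        lid: sum(votes[s] for s in sessions)
        for lid, sessions in landmark_sessions.items()
    }
    # Sort by descending score; ties are broken by id for determinism.
    return sorted(scores, key=lambda lid: (-scores[lid], lid))


def select_top(landmark_sessions, recent_ids, fraction=0.25):
    """Keep only the best-ranked fraction of the landmarks (at least one)."""
    ranked = rank_landmarks(landmark_sessions, recent_ids)
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]


landmark_sessions = {
    "a": {"day"},
    "b": {"day", "dusk"},
    "c": {"night"},
    "d": {"night"},
    "e": {"dusk"},
}
ranked = rank_landmarks(landmark_sessions, recent_ids=["a", "b"])
selected = select_top(landmark_sessions, recent_ids=["a", "b"], fraction=0.4)
print(ranked, selected)
```

With daytime landmarks observed recently, the night-only landmarks fall to the bottom of the ranking and are not sent to the mobile platform, which is where the bandwidth saving in such a scheme would come from.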


 title = {Appearance-Based Landmark Selection for Visual Localization},
 author = {M. Buerki and C. Cadena and I. Gilitschenski and R. Siegwart and Juan Nieto},
 fullauthor ={Buerki, Mathias and Cadena, Cesar and Gilitschenski, Igor and Siegwart, Roland and Nieto, Juan},
 journal = {{Journal of Field Robotics}},
 year = {2019},
 volume = {6},
 number = {6},
 pages  = {1041--1073},

OREOS: Oriented Recognition of 3D Point Clouds in Outdoor Scenarios

Lukas Schaupp, Mathias Buerki, Renaud Dube, Roland Siegwart, and Cesar Cadena

IEEE/RSJ Int. Conference on Intelligent Robots and Systems (IROS) 2019

We introduce a novel method for oriented place recognition with 3D LiDAR scans. A Convolutional Neural Network is trained to extract compact descriptors from single 3D LiDAR scans. These can be used both to retrieve near-by place candidates from a map, and to estimate the yaw discrepancy needed for bootstrapping local registration methods. We employ a triplet loss function for training and use a hard negative mining strategy to further increase the performance of our descriptor extractor. In an evaluation on the NCLT and KITTI datasets, we demonstrate that our method outperforms related state-of-the-art approaches based on both data-driven and handcrafted data representation in challenging long-term outdoor conditions.
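The training objective mentioned above can be illustrated with the standard triplet loss and a simple hard negative selection. This is a minimal sketch on plain Python tuples: the Euclidean distance, the margin value and the rule "hardest negative = closest negative to the anchor" are common choices assumed here, and the abstract does not say which variants the authors use.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def hardest_negative(anchor, negatives):
    """Pick the negative descriptor closest to the anchor (assumed mining rule)."""
    return min(negatives, key=lambda n: math.dist(anchor, n))


# Toy 2D descriptors: the positive comes from the same place as the anchor,
# the negatives come from different places.
anchor = (0.0, 0.0)
positive = (0.5, 0.0)
negatives = [(3.0, 0.0), (1.0, 0.0), (5.0, 5.0)]

hard = hardest_negative(anchor, negatives)
loss_hard = triplet_loss(anchor, positive, hard)
loss_easy = triplet_loss(anchor, positive, negatives[2])
print(hard, loss_hard, loss_easy)
```

The far-away negative already satisfies the margin and contributes zero loss, whereas the mined hard negative still produces a training signal, which is the motivation for mining.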


Title = {{OREOS}: Oriented Recognition of {3D} Point Clouds in Outdoor Scenarios},
Author = {L. Schaupp and M. Buerki and R. Dube and R. Siegwart and C. Cadena},
Fullauthor = {Lukas Schaupp and Mathias Buerki and Renaud Dube and Roland Siegwart and Cesar Cadena},
Booktitle = {{IEEE/RSJ} Int. Conference on Intelligent Robots and Systems ({IROS})},
Month = {November},
Year = {2019},

VIZARD: Reliable Visual Localization for Autonomous Vehicles in Urban Outdoor Environments

Mathias Buerki, Lukas Schaupp, Marcin Dymczyk, Renaud Dube, Cesar Cadena, Roland Siegwart, and Juan Nieto

IEEE Intelligent Vehicles Symposium (IV) 2019

Changes in appearance are one of the main sources of failure in visual localization systems in outdoor environments. To address this challenge, we present VIZARD, a visual localization system for urban outdoor environments. By combining a local localization algorithm with the use of multi-session maps, a high localization recall can be achieved across vastly different appearance conditions. The fusion of the visual localization constraints with wheel-odometry in a state estimation framework further guarantees smooth and accurate pose estimates. In an extensive experimental evaluation on several hundreds of driving kilometers in challenging urban outdoor environments, we analyze the recall and accuracy of our localization system, investigate its key parameters and boundary conditions, and compare different types of feature descriptors. Our results show that VIZARD is able to achieve nearly 100% recall with a localization accuracy below 0.5 m under varying outdoor appearance conditions, including at night-time.


Title = {{VIZARD}: Reliable Visual Localization for Autonomous Vehicles in Urban Outdoor Environments},
Author = {M. Buerki and L. Schaupp and M. Dymczyk and R. Dube and C. Cadena and R. Siegwart and J. Nieto},
Fullauthor = {Mathias Buerki and Lukas Schaupp and Marcin Dymczyk and Renaud Dube and Cesar Cadena and Roland Siegwart and Juan Nieto},
Booktitle = {{IEEE} Intelligent Vehicles Symposium ({IV})},
Month = {June},
Year = {2019},

Object Classification Based on Unsupervised Learned Multi-Modal Features for Overcoming Sensor Failures

Julia Nitsch, Juan Nieto, Roland Siegwart, Max Schmidt, and Cesar Cadena

IEEE International Conference on Robotics and Automation (ICRA) 2019

For autonomous driving applications it is critical to know which types of road users and roadside infrastructure are present in order to plan driving manoeuvres accordingly. Therefore, autonomous cars are equipped with different sensor modalities to robustly perceive their environment. However, for classification modules based on machine learning techniques it is challenging to overcome unseen sensor noise. This work presents an object classification module operating on unsupervised learned multi-modal features with the ability to overcome gradual or total sensor failure. A two-stage approach composed of unsupervised feature training followed by training of uni-modal and multi-modal classifiers is presented. We propose a simple but effective decision module switching between uni-modal and multi-modal classifiers based on the closeness in the feature space to the training data. Evaluations on the ModelNet 40 data set show that the proposed approach has a 14% accuracy gain compared to a late fusion approach when operating on noisy point cloud data and a 6% accuracy gain when operating on noisy image data.
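A decision module of this kind could look roughly like the sketch below, which measures how close each modality's feature lies to its training features and falls back to a uni-modal classifier when the other modality looks unfamiliar. The nearest-neighbour Euclidean distance, the threshold `max_dist`, the returned labels and the function names are all assumptions made for illustration; the abstract does not specify how closeness is measured.

```python
import math


def distance_to_training(feature, training_features):
    """Distance from a feature to its nearest training feature (assumed metric)."""
    return min(math.dist(feature, t) for t in training_features)


def choose_classifier(img_feat, pc_feat, img_train, pc_train, max_dist=1.0):
    """Select which classifier to trust for the current sample.

    A modality counts as reliable when its feature lies within max_dist of
    the training data. The threshold and the labels are hypothetical.
    """
    img_ok = distance_to_training(img_feat, img_train) <= max_dist
    pc_ok = distance_to_training(pc_feat, pc_train) <= max_dist
    if img_ok and pc_ok:
        return "multi-modal"
    if img_ok:
        return "image-only"
    if pc_ok:
        return "point-cloud-only"
    return "undecided"  # neither modality resembles the training data


# Toy 2D feature spaces for the image and point cloud modalities.
img_train = [(0.0, 0.0), (1.0, 1.0)]
pc_train = [(5.0, 5.0)]

print(choose_classifier((0.1, 0.0), (5.0, 5.2), img_train, pc_train))
print(choose_classifier((0.1, 0.0), (9.0, 9.0), img_train, pc_train))
print(choose_classifier((4.0, 4.0), (5.0, 5.0), img_train, pc_train))
```

The three calls cover the healthy case, a degraded point cloud and a degraded image, so the selected classifier changes from multi-modal to image-only to point-cloud-only.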


Title = {Object Classification Based on Unsupervised Learned Multi-Modal Features for Overcoming Sensor Failures},
Author = {J. Nitsch and J. Nieto and R. Siegwart and M. Schmidt and C. Cadena},
Fullauthor = {Julia Nitsch and Juan Nieto and Roland Siegwart and Max Schmidt and Cesar Cadena},
Booktitle = {{IEEE} International Conference on Robotics and Automation ({ICRA})},
Month = {May},
Year = {2019},