Stabilization and Validation of 3D Object Position Using Multimodal Sensor Fusion and Semantic Segmentation

M.P. Muresan, I. Giosan, S. Nedevschi

Sensors 2020, 20, 1110; doi:10.3390/s20041110, pp. 1-33.

The stabilization and validation process of the measured position of objects is an important step for high‐level perception functions and for the correct processing of sensory data. The goal of this process is to detect and handle inconsistencies between different sensor measurements, which result from the perception system. The aggregation of the detections from different sensors consists in the combination of the sensorial data in one common reference frame for each identified object, leading to the creation of a super‐sensor. The result of the data aggregation may end up with errors such as false detections, misplaced object cuboids or an incorrect number of objects in the scene. The stabilization and validation process is focused on mitigating these problems. The current paper proposes four contributions for solving the stabilization and validation task, for autonomous vehicles, using the following sensors: trifocal camera, fisheye camera, long‐range RADAR (Radio detection and ranging), and 4‐layer and 16‐layer LIDARs (Light Detection and Ranging). We propose two original data association methods used in the sensor fusion and tracking processes. The first data association algorithm is created for tracking LIDAR objects and combines multiple appearance and motion features in order to exploit the available information for road objects. The second novel data association algorithm is designed for trifocal camera objects and has the objective of finding measurement correspondences to sensor fused objects such that the super‐sensor data are enriched by adding the semantic class information. The implemented trifocal object association solution uses a novel polar association scheme combined with a decision tree to find the best hypothesis–measurement correlations. 
Another contribution, aimed at stabilizing the position of road objects and coping with their unpredictable behavior as reported by multiple types of complementary sensors, is a fusion approach based on the Unscented Kalman Filter and a single‐layer perceptron. The last contribution concerns the validation of the 3D object position, which is solved using a fuzzy logic technique combined with a semantic segmentation image. The proposed algorithms run in real time, with a cumulative running time of 90 ms, and have been evaluated using ground truth data extracted from a high‐precision GPS (global positioning system) with 2 cm accuracy, obtaining an average error of 0.8 m.


Real-Time Semantic Segmentation-Based Stereo Reconstruction

V.C. Miclea, S. Nedevschi

IEEE Transactions on Intelligent Transportation Systems (Early Access), pp. 1-11, 2019, DOI: 10.1109/TITS.2019.2913883.

In this paper, we propose a novel semantic segmentation-based stereo reconstruction method that can keep up with the accuracy of state-of-the-art approaches while running in real time. The solution follows the classic stereo pipeline, each step in the stereo workflow being enhanced by additional information from semantic segmentation. To this end, we introduce several improvements to computation, aggregation, and optimization by adapting existing techniques to integrate additional surface information given by each semantic class. For the cost computation and optimization steps, we propose new genetic algorithms that can incrementally adjust the parameters for better solutions. Furthermore, we propose a new postprocessing edge-aware filtering technique relying on an improved convolutional neural network (CNN) architecture for disparity refinement. We obtain competitive results at 30 frames/s, including segmentation.


Appearance-Based Landmark Selection for Visual Localization

Mathias Bürki, Cesar Cadena, Igor Gilitschenski, Roland Siegwart and Juan Nieto

Journal of Field Robotics (JFR) 2019

Visual localization in outdoor environments is subject to varying appearance conditions, which makes it difficult to match current camera images against a previously recorded map. Although it is possible to extend the respective maps to allow precise localization across a wide range of differing appearance conditions, these maps quickly grow in size and become impractical to handle on a mobile robotic platform. To address this problem, we present a landmark selection algorithm that exploits appearance co‐observability for efficient visual localization in outdoor environments. Based on the appearance condition inferred from recently observed landmarks, a small fraction of landmarks useful under the current appearance condition is selected and used for localization. This greatly reduces the bandwidth consumption between the mobile platform and a map backend in a shared‐map scenario, and significantly lowers the demands on the computational resources of the mobile platform. We derive a landmark ranking function that exhibits high performance under vastly changing appearance conditions and is agnostic to the distribution of landmarks across the different map sessions. Furthermore, we relate and compare our proposed appearance‐based landmark ranking function to popular ranking schemes from information retrieval, and validate our results on the challenging University of Michigan North Campus long‐term vision and LIDAR data sets (NCLT), including an evaluation of the localization accuracy using ground‐truth poses. In addition, we investigate the computational and bandwidth resource demands. Our results show that by selecting 20–30% of landmarks using our proposed approach, a localization performance similar to that of the baseline strategy using all landmarks is achieved.
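To make the selection idea concrete, here is a minimal Python sketch of co-observability-based ranking. It is not the ranking function derived in the paper: the scoring rule (sessions vote according to how many recently observed landmarks they contain), the function name `select_landmarks`, and the `landmark_sessions` data layout are all assumptions chosen for illustration.

```python
from collections import Counter


def select_landmarks(landmark_sessions, recent_ids, fraction=0.25):
    """Return the top `fraction` of landmarks ranked by a simple vote score.

    landmark_sessions: dict mapping landmark id -> set of map-session ids
                       in which that landmark was observed (hypothetical layout).
    recent_ids:        ids of landmarks matched in the most recent frames.
    """
    # Each session receives one vote per recently observed landmark it contains.
    session_votes = Counter()
    for lm in recent_ids:
        for session in landmark_sessions.get(lm, ()):
            session_votes[session] += 1

    # A landmark's score is the sum of the votes of the sessions it appears in.
    scores = {
        lm: sum(session_votes[s] for s in sessions)
        for lm, sessions in landmark_sessions.items()
    }

    # Keep the best-scoring fraction (at least one landmark); ties break by id.
    keep = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=lambda lm: (-scores[lm], lm))
    return ranked[:keep]


if __name__ == "__main__":
    demo_map = {"a": {0}, "b": {0, 1}, "c": {1}, "d": {2}}
    print(select_landmarks(demo_map, recent_ids=["a"], fraction=0.5))
```

In this toy rule, landmarks seen in many sessions accumulate more votes, so unlike the function described in the abstract it is not agnostic to how landmarks are distributed across sessions.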


 title = {Appearance-Based Landmark Selection for Visual Localization},
 author = {M. Buerki and C. Cadena and I. Gilitschenski and R. Siegwart and Juan Nieto},
 fullauthor ={Buerki, Mathias and Cadena, Cesar and Gilitschenski, Igor and Siegwart, Roland and Nieto, Juan},
 journal = {{Journal of Field Robotics}},
 year = {2019},
 volume = {36},
 number = {6},
 pages  = {1041--1073},

SegMap: Segment-based Mapping and Localization using Data-driven Descriptors

Renaud Dube, Andrei Cramariuc, Daniel Dugas, Hannes Sommer, Marcin Dymczyk, Juan Nieto, Roland Siegwart, and Cesar Cadena

International Journal of Robotics Research (IJRR) 2019

Precisely estimating a robot’s pose in a prior, global map is a fundamental capability for mobile robotics, e.g. autonomous driving or exploration in disaster zones. This task, however, remains challenging in unstructured, dynamic environments, where local features are not discriminative enough and global scene descriptors only provide coarse information. We therefore present SegMap: a map representation solution for localization and mapping based on the extraction of segments in 3D point clouds. Working at the level of segments offers increased invariance to view-point and local structural changes, and facilitates real-time processing of large-scale 3D data. SegMap exploits a single compact data-driven descriptor for performing multiple tasks: global localization, 3D dense map reconstruction, and semantic information extraction. The performance of SegMap is evaluated in multiple urban driving and search and rescue experiments. We show that the learned SegMap descriptor has superior segment retrieval capabilities compared to state-of-the-art handcrafted descriptors. As a consequence, we achieve a higher localization accuracy and a 6% increase in recall over the state of the art. These segment-based localizations allow us to reduce the open-loop odometry drift by up to 50%. SegMap is available as open source, along with easy-to-run demonstrations.
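Segment retrieval with a compact descriptor can be pictured as a nearest-neighbour search in descriptor space. The Python sketch below is an illustration only and is not taken from the SegMap code base: it assumes descriptors are fixed-length vectors compared by Euclidean distance, and the name `retrieve_segments` is hypothetical.

```python
import numpy as np


def retrieve_segments(query, database, k=3):
    """Return indices and distances of the k database descriptors closest to `query`.

    query:    1-D array of length D (descriptor of the observed segment).
    database: 2-D array of shape (N, D) (descriptors of the map segments).
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)

    # Euclidean distance from the query to every map descriptor (brute force).
    dists = np.linalg.norm(database - query, axis=1)

    # Indices of the k smallest distances, nearest first.
    order = np.argsort(dists)[:k]
    return order, dists[order]


if __name__ == "__main__":
    map_descriptors = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
    idx, dist = retrieve_segments([0.9, 0.0], map_descriptors, k=2)
    print(idx, dist)
```

A brute-force search is enough for a small example; a large map would more plausibly use a spatial index such as a k-d tree.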


 title = {{SegMap}: Segment-based Mapping and Localization using Data-driven Descriptors},
 author = {R. Dube and A. Cramariuc and D. Dugas and H. Sommer and M. Dymczyk and J. Nieto and R. Siegwart and C. Cadena},
 fullauthor ={Renaud Dube and Andrei Cramariuc and Daniel Dugas and Hannes Sommer and Marcin Dymczyk and Juan Nieto and Roland Siegwart and Cesar Cadena},
 journal = {{International Journal of Robotics Research}},
 year = {2019},
 volume = {XX},
 number = {X},
 pages  = {1--16},

Multiple Hypothesis Semantic Mapping for Robust Data Association

Lukas Bernreiter, Abel Gawel, Hannes Sommer, Juan Nieto, Roland Siegwart and Cesar Cadena

IEEE Robotics and Automation Letters, 2019

We present a semantic mapping approach with multiple hypothesis tracking for data association. As semantic information has the potential to overcome ambiguity in measurements and place recognition, it forms an eminent modality for autonomous systems. This is particularly evident in urban scenarios with several similar-looking surroundings. Nevertheless, it requires the handling of a non-Gaussian and discrete random variable coming from object detectors. Previous methods use semantic information for global localization and data association to reduce the instance ambiguity between the landmarks. However, many of these approaches do not deal with the creation of completely globally consistent representations of the environment and typically do not scale well. We utilize multiple hypothesis trees to derive a probabilistic data association for semantic measurements by means of position, instance, and class to create a semantic representation. We propose an optimized mapping method and make use of a pose graph to derive a novel semantic SLAM solution. Furthermore, we show that semantic covisibility graphs allow for precise place recognition in urban environments. We verify our approach using a real-world outdoor dataset and demonstrate an average drift reduction of 33% w.r.t. the raw odometry source. Moreover, our approach produces 55% fewer hypotheses on average than a regular multiple hypothesis approach.


title={Multiple Hypothesis Semantic Mapping for Robust Data Association}, 
author={L. {Bernreiter} and A. {Gawel} and H. {Sommer} and J. {Nieto} and R. {Siegwart} and C. {Cadena}}, 
journal={{IEEE Robotics and Automation Letters}}, 

maplab: An Open Framework for Research in Visual-inertial Mapping and Localization

Thomas Schneider, Marcin Dymczyk, Marius Fehr, Kevin Egger, Simon Lynen, Igor Gilitschenski and Roland Siegwart

IEEE Robotics and Automation Letters, 2018

Robust and accurate visual-inertial estimation is crucial to many of today’s challenges in robotics. Being able to localize against a prior map and obtain accurate and drift-free pose estimates can push the applicability of such systems even further. Most of the currently available solutions, however, either focus on a single-session use case or lack localization capabilities or an end-to-end pipeline. We believe that only a complete system, combining state-of-the-art algorithms, scalable multi-session mapping tools, and a flexible user interface, can become an efficient research platform. We therefore present maplab, an open, research-oriented visual-inertial mapping framework for processing and manipulating multi-session maps, written in C++. On the one hand, maplab can be seen as a ready-to-use visual-inertial mapping and localization system. On the other hand, maplab provides the research community with a collection of multi-session mapping tools that include map merging, visual-inertial batch optimization, and loop closure. Furthermore, it includes an online frontend that can create visual-inertial maps and also track a global drift-free pose within a localization map. In this paper, we present the system architecture, five use cases, and evaluations of the system on public datasets. The source code of maplab is freely available for the benefit of the robotics research community.


title={maplab: An Open Framework for Research in Visual-inertial Mapping and Localization}, 
author={T. Schneider and M. T. Dymczyk and M. Fehr and K. Egger and S. Lynen and I. Gilitschenski and R. Siegwart}, 
journal={{IEEE Robotics and Automation Letters}}, 

Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, Jose Neira, Ian Reid and John J. Leonard

IEEE Transactions on Robotics 32 (6) pp 1309-1332, 2016

Simultaneous Localization and Mapping (SLAM) consists in the concurrent construction of a model of the environment (the map), and the estimation of the state of the robot moving within it. The SLAM community has made astonishing progress over the last 30 years, enabling large-scale real-world applications, and witnessing a steady transition of this technology to industry. We survey the current state of SLAM. We start by presenting what is now the de-facto standard formulation for SLAM. We then review related work, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers. This paper simultaneously serves as a position paper and tutorial to those who are users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues that still deserve careful scientific investigation. The paper also contains the authors’ take on two questions that often animate discussions during robotics conferences: Do robots need SLAM? and Is SLAM solved?
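The de-facto standard formulation mentioned in the abstract is maximum a posteriori (MAP) estimation over a factor graph. As a brief sketch, with notation chosen here for illustration: let $\mathcal{X}$ be the unknown variables (robot poses and map), $Z = \{z_k\}$ the measurements, $h_k$ the measurement model for the subset $\mathcal{X}_k$ of variables involved in measurement $k$, and $\Omega_k$ the information matrix of the noise. Assuming independent measurements with zero-mean Gaussian noise, the MAP estimate reduces to a nonlinear least-squares problem:

```latex
\begin{aligned}
\mathcal{X}^{\star}
  &= \operatorname*{arg\,max}_{\mathcal{X}} \; p(\mathcal{X} \mid Z)
   = \operatorname*{arg\,max}_{\mathcal{X}} \; p(Z \mid \mathcal{X})\, p(\mathcal{X}) \\
  &= \operatorname*{arg\,min}_{\mathcal{X}} \; \sum_{k}
     \bigl\lVert h_k(\mathcal{X}_k) - z_k \bigr\rVert^{2}_{\Omega_k},
\qquad
\lVert e \rVert^{2}_{\Omega} = e^{\top} \Omega\, e .
\end{aligned}
```

Here the prior $p(\mathcal{X})$ is treated as one more term of the sum, which is a common convention when it is also Gaussian.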


 title = {Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age},
 author = {C. Cadena and L. Carlone and H. Carrillo and Y. Latif and D. Scaramuzza and J. Neira and I. Reid and J.J. Leonard},
 journal = {{IEEE Transactions on Robotics}},
 year = {2016},
 number = {6},
 pages  = {1309--1332},
 volume = {32},