Super-sensor for 360-degree Environment Perception: Point Cloud Segmentation Using Image Features

R. Varga, A.D. Costea, H. Florea, I. Giosan, S. Nedevschi

Proceedings of 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC 2017), 16-19 Oct. 2017, Yokohama, Japan, pp. 1-8

This paper describes a super-sensor that enables 360-degree environment perception for automated vehicles in urban traffic scenarios. We use four fisheye cameras, four 360-degree LIDARs and a GPS/IMU sensor mounted on an automated vehicle to build a super-sensor that offers an enhanced low-level representation of the environment by harmonizing all the available sensor measurements. Individual sensors cannot provide a robust 360-degree perception due to their limitations: field of view, range, orientation, number of scanning rays, etc. The novelty of this work consists of segmenting the 3D LIDAR point cloud by associating it with the 2D image semantic segmentation. Another contribution is the sensor configuration that enables 360-degree environment perception. The following steps are involved in the process: calibration, timestamp synchronization, fisheye image unwarping, motion correction of LIDAR points, point cloud projection onto the images and semantic segmentation of images. The enhanced low-level representation will improve high-level environment perception tasks such as object detection, classification and tracking.
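The point cloud projection step described above can be sketched as follows. This is a minimal illustration under standard pinhole assumptions; the function name and the extrinsic/intrinsic parameters `R`, `t`, `K` are placeholders, not the authors' code, and the fisheye unwarping is assumed to have already produced a rectified image:

```python
import numpy as np

def project_points(points_lidar, R, t, K):
    """Project 3D LIDAR points (N x 3) into a camera image.

    R, t: extrinsic rotation (3x3) and translation (3,) from LIDAR to camera frame.
    K:    intrinsic matrix (3x3) of the unwarped (rectified) camera image.
    Returns pixel coordinates (M x 2) and the indices of the projected points.
    """
    pts_cam = points_lidar @ R.T + t      # LIDAR frame -> camera frame
    in_front = pts_cam[:, 2] > 0.0        # keep only points in front of the image plane
    pts_cam = pts_cam[in_front]
    proj = pts_cam @ K.T                  # apply intrinsics
    pixels = proj[:, :2] / proj[:, 2:3]   # perspective divide
    return pixels, np.flatnonzero(in_front)
```

Once each LIDAR point has a pixel coordinate, its semantic label can be read from the 2D image segmentation at that pixel, which is the association the paper uses to segment the 3D point cloud.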


Semantic segmentation-based stereo reconstruction with statistically improved long range accuracy

V.C. Miclea, S. Nedevschi

Proceedings of 2017 IEEE Intelligent Vehicles Symposium (IV 17), 11-14 June 2017, Los Angeles, CA, USA, pp. 1795-1802

Lately, stereo matching has become a key aspect of autonomous driving, providing highly accurate solutions at relatively low cost. Top approaches on state-of-the-art benchmarks rely on learning mechanisms such as convolutional neural networks (ConvNets) to boost matching accuracy. We propose a new real-time stereo reconstruction method that uses a ConvNet for semantically segmenting the driving scene. In a "divide and conquer" approach, this segmentation enables us to split the large heterogeneous traffic scene into smaller regions with similar features. We use the segmentation results to enhance the Census Transform with an optimal census mask and the SGM energy optimization step with an optimal P1 penalty for each predicted class. Additionally, we improve the sub-pixel accuracy of the stereo matching by finding optimal interpolation functions for each particular segment class. In both cases we propose new stochastic optimization methods based on genetic algorithms that can incrementally adjust the parameters for better solutions. Tests performed on KITTI and real traffic scenarios show that our method outperforms the accuracy of previous solutions.
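The idea of a configurable census mask can be illustrated with a plain Census Transform that takes its comparison pattern as a parameter. This is a generic sketch, not the authors' implementation; the per-class "optimal census mask" from the paper corresponds to choosing a different `mask_offsets` set for each semantic class:

```python
import numpy as np

def census_transform(img, mask_offsets):
    """Census Transform of a grayscale image with an arbitrary comparison mask.

    mask_offsets: list of (dy, dx) neighbour offsets compared against the
    centre pixel; each comparison contributes one bit to the descriptor.
    Returns an integer descriptor per pixel (border pixels are left at 0).
    """
    h, w = img.shape
    r = max(max(abs(dy), abs(dx)) for dy, dx in mask_offsets)
    desc = np.zeros((h, w), dtype=np.uint64)
    for dy, dx in mask_offsets:
        # shifted view: value of the neighbour at offset (dy, dx) for each interior pixel
        bit = img[r + dy:h - r + dy, r + dx:w - r + dx] < img[r:h - r, r:w - r]
        desc[r:h - r, r:w - r] = (desc[r:h - r, r:w - r] << np.uint64(1)) | bit.astype(np.uint64)
    return desc
```

Matching then compares descriptors by Hamming distance; a genetic algorithm, as in the paper, can search over candidate masks by evaluating the resulting matching error per class.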


Semi-Automatic Image Annotation of Street Scenes

Andra Petrovai, Arthur D. Costea and Sergiu Nedevschi

Proceedings of 2017 IEEE Intelligent Vehicles Symposium (IV 17), 11-14 June 2017, Los Angeles, CA, USA, pp. 448-455

Scene labeling enables very sophisticated and powerful applications for autonomous driving. Training classifiers for this task would not be possible without the existence of large datasets of pixelwise labeled images. Manually annotating a large number of images is an expensive and time consuming process. In this paper, we propose a new semi-automatic annotation tool for scene labeling tailored for autonomous driving. This tool significantly reduces the effort of the annotator and also the time spent to annotate the data, while at the same time it offers the necessary features to produce precise pixel-level semantic labeling. The main contribution of our work is the development of a complex annotation framework able to generate automatic annotations for 20 classes, which the user can control and modify accordingly. Automatic annotations are obtained in two separate ways. First, we employ a pixelwise fully-connected Conditional Random Field (CRF). Second, we perform grouping of similar neighboring superpixels based on 2D appearance and 3D information using a boosted classifier. Polygons represent the manual correction mechanism for the automatic annotations.


Fast Boosting based Detection using Scale Invariant Multimodal Multiresolution Filtered Features

Arthur Daniel Costea, Robert Varga and Sergiu Nedevschi

Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 17), 21-26 July 2017, Honolulu, HI, USA, pp. 993-1002

In this paper we propose a novel boosting-based sliding window solution for object detection which can keep up with the precision of the state-of-the-art deep learning approaches, while being 10 to 100 times faster. The solution takes advantage of multisensorial perception and exploits information from color, motion and depth. We introduce multimodal multiresolution filtering of signal intensity, gradient magnitude and orientation channels, in order to capture structure at multiple scales and orientations. To achieve scale invariant classification features, we analyze the effect of scale change on features for different filter types and propose a correction scheme. To improve recognition we incorporate 2D and 3D context by generating spatial, geometric and symmetrical channels. Finally, we evaluate the proposed solution on multiple benchmarks for the detection of pedestrians, cars and bicyclists. We achieve competitive results at over 25 frames per second.


A Decentralized Trust-minimized Cloud Robotics Architecture

Alessandro Simovic, Ralf Kaestner and Martin Rufli

International Conference on Intelligent Robots and Systems (IROS) 2017 – Poster Track

We introduce a novel, decentralized architecture facilitating consensual, blockchain-secured computation and verification of data/knowledge. Through the integration of (i) a decentralized content-addressable storage system, (ii) a decentralized communication and time-stamping server, and (iii) a decentralized computation module, it enables a scalable, transparent, and semantically interoperable cloud robotics ecosystem, capable of powering the emerging internet of robots.


Map Management for Efficient Long-Term Visual Localization in Outdoor Environments

Mathias Buerki, Marcyn Dymczyk, Igor Gilitschenski, Cesar Cadena, Roland Siegwart, and Juan Nieto

IEEE Intelligent Vehicles Symposium (IV) 2018

We present a complete map management process for a visual localization system designed for multi-vehicle long-term operations in resource constrained outdoor environments. Outdoor visual localization generates large amounts of data that need to be incorporated into a lifelong visual map in order to allow localization at all times and under all appearance conditions. Processing these large quantities of data is nontrivial, as it is subject to limited computational and storage capabilities both on the vehicle and on the mapping back-end. We address this problem with a two-fold map update paradigm capable of either adding new visual cues to the map or updating co-observation statistics. The former, in combination with offline map summarization techniques, allows enhancing the appearance coverage of the lifelong map while keeping the map size limited. On the other hand, the latter is able to significantly boost the appearance-based landmark selection for efficient online localization without incurring any additional computational or storage burden. Our evaluation in challenging outdoor conditions shows that our proposed map management process allows building and maintaining maps for precise visual localization over long time spans in a tractable and scalable fashion.


@inproceedings{Buerki2018MapManagement,
  Title = {Map Management for Efficient Long-Term Visual Localization in Outdoor Environments},
  Author = {M. Buerki and M. Dymczyk and I. Gilitschenski and C. Cadena and R. Siegwart and J. Nieto},
  Fullauthor = {Mathias Buerki and Marcyn Dymczyk and Igor Gilitschenski and Cesar Cadena and Roland Siegwart and Juan Nieto},
  Booktitle = {{IEEE} Intelligent Vehicles Symposium ({IV})},
  Month = {June},
  Year = {2018},
}

maplab: An Open Framework for Research in Visual-inertial Mapping and Localization

Thomas Schneider, Marcin Dymczyk, Marius Fehr, Kevin Egger, Simon Lynen, Igor Gilitschenski and Roland Siegwart

IEEE Robotics and Automation Letters, 2018

Robust and accurate visual-inertial estimation is crucial to many of today’s challenges in robotics. Being able to localize against a prior map and obtain accurate and drift-free pose estimates can push the applicability of such systems even further. Most of the currently available solutions, however, either focus on a single session use-case, lack localization capabilities or an end-to-end pipeline. We believe that only a complete system, combining state-of-the-art algorithms, scalable multi-session mapping tools, and a flexible user interface, can become an efficient research platform. We therefore present maplab, an open, research-oriented visual-inertial mapping framework for processing and manipulating multi-session maps, written in C++. On the one hand, maplab can be seen as a ready-to-use visual-inertial mapping and localization system. On the other hand, maplab provides the research community with a collection of multi-session mapping tools that include map merging, visual-inertial batch optimization, and loop closure. Furthermore, it includes an online frontend that can create visual-inertial maps and also track a global drift-free pose within a localization map. In this paper, we present the system architecture, five use-cases, and evaluations of the system on public datasets. The source code of maplab is freely available for the benefit of the robotics research community.


@article{Schneider2018maplab,
  title = {maplab: An Open Framework for Research in Visual-inertial Mapping and Localization},
  author = {T. Schneider and M. T. Dymczyk and M. Fehr and K. Egger and S. Lynen and I. Gilitschenski and R. Siegwart},
  journal = {{IEEE Robotics and Automation Letters}},
  year = {2018},
}

Traffic Scene Segmentation based on Boosting over Multimodal Low, Intermediate and High Order Multi-range Channel Features

Arthur D. Costea and Sergiu Nedevschi

Proceedings of 2017 IEEE Intelligent Vehicles Symposium (IV) June 11-14, 2017, Redondo Beach, CA, USA, pp. 74-81

In this paper we introduce a novel multimodal boosting based solution for semantic segmentation of traffic scenarios. Local structure and context are captured from both monocular color and depth modalities in the form of image channels. We define multiple channel types at three different levels: low, intermediate and high order channels. The low order channels are computed using a multimodal multiresolution filtering scheme and capture structure and color information from lower receptive fields. For the intermediate order channels, we employ deep convolutional channels that are able to capture more complex structures, having a larger receptive field. The high order channels are scale invariant channels that consist of spatial, geometric and semantic channels. These channels are enhanced by additional pyramidal context channels, capturing context at multiple levels. The semantic segmentation is achieved by a boosting based classification scheme over superpixels using multi-range channel features and pyramidal context features. A presegmentation is used to generate semantic channels as input for a more powerful final segmentation. The final segmentation is refined using a superpixel-level dense CRF. The proposed solution is evaluated on the Cityscapes segmentation benchmark and achieves competitive results at low computational costs. It is the first boosting based solution that is able to keep up with the performance of deep learning based approaches.


Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, Jose Neira, Ian Reid and John J. Leonard

IEEE Transactions on Robotics 32 (6) pp 1309-1332, 2016

Simultaneous Localization and Mapping (SLAM) consists in the concurrent construction of a model of the environment (the map), and the estimation of the state of the robot moving within it. The SLAM community has made astonishing progress over the last 30 years, enabling large-scale real-world applications, and witnessing a steady transition of this technology to industry. We survey the current state of SLAM. We start by presenting what is now the de-facto standard formulation for SLAM. We then review related work, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers. This paper simultaneously serves as a position paper and tutorial to those who are users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues that still deserve careful scientific investigation. The paper also contains the authors’ take on two questions that often animate discussions during robotics conferences: Do robots need SLAM? and Is SLAM solved?
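The de-facto standard formulation mentioned in the abstract is maximum a posteriori estimation over a factor graph; up to notational details it can be written as:

```latex
\mathcal{X}^{\star}
  = \operatorname*{arg\,max}_{\mathcal{X}} \, p(\mathcal{X} \mid \mathcal{Z})
  = \operatorname*{arg\,min}_{\mathcal{X}} \sum_{k=1}^{m}
      \lVert h_k(\mathcal{X}_k) - z_k \rVert^{2}_{\Omega_k}
```

where $\mathcal{X}$ collects the robot trajectory and map variables, $z_k$ is the $k$-th measurement with measurement model $h_k$ involving the variable subset $\mathcal{X}_k$, and the norm is a Mahalanobis distance weighted by the information matrix $\Omega_k$. The second equality assumes Gaussian measurement noise, under which the negative log-posterior reduces to this nonlinear least-squares problem.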


@article{Cadena2016SLAM,
  title  = {Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age},
  author = {C. Cadena and L. Carlone and H. Carrillo and Y. Latif and D. Scaramuzza and J. Neira and I. Reid and J.J. Leonard},
  journal = {{IEEE Transactions on Robotics}},
  year = {2016},
  volume = {32},
  number = {6},
  pages  = {1309--1332},
}

Appearance-Based Landmark Selection for Efficient Long-Term Visual Localization

Mathias Buerki, Igor Gilitschenski, Elena Stumm, Roland Siegwart, and Juan Nieto

International Conference on Intelligent Robots and Systems (IROS) 2016

We present an online landmark selection method for efficient and accurate visual localization under changing appearance conditions. The wide range of conditions encountered during long-term visual localization by, e.g., fleets of autonomous vehicles offers the potential to exploit redundancy and reduce data usage by selecting only those visual cues which are relevant at the given time. Therefore, co-observability statistics guide landmark ranking and selection, significantly reducing the amount of information used for localization while maintaining or even improving accuracy.
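The co-observability idea can be sketched with a simple counting scheme: landmarks that were frequently observed in the same images as the landmarks already matched in the current session are likely relevant under the current appearance conditions. This is an illustrative toy ranking only; the function name and scoring are assumptions, and the paper derives its selection from the map's co-observation statistics rather than this exact rule:

```python
from collections import Counter

def rank_landmarks(observations, seed_landmarks, k):
    """Rank candidate landmarks by co-observation with a seed set.

    observations:   list of sets, each holding the landmark IDs seen together
                    in one past image (the co-observation record).
    seed_landmarks: set of landmark IDs already matched in the current session.
    Returns the k landmark IDs most strongly co-observed with the seeds.
    """
    scores = Counter()
    for obs in observations:
        overlap = len(obs & seed_landmarks)
        if overlap == 0:
            continue  # this image shares nothing with the current session
        for lm in obs - seed_landmarks:
            scores[lm] += overlap  # landmarks sharing many frames with the seeds rank higher
    return [lm for lm, _ in scores.most_common(k)]
```

Localization then restricts matching to the top-ranked landmarks, which is what reduces the amount of information used while preserving accuracy.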


@inproceedings{Buerki2016Landmark,
  Title = {Appearance-Based Landmark Selection for Efficient Long-Term Visual Localization},
  Author = {M. Buerki and I. Gilitschenski and E. Stumm and R. Siegwart and J. Nieto},
  Fullauthor = {Mathias Buerki and Igor Gilitschenski and Elena Stumm and Roland Siegwart and Juan Nieto},
  Booktitle = {{IEEE/RSJ} International Conference on Intelligent Robots and Systems ({IROS})},
  Address = {Daejeon, Korea},
  Month = {October},
  Year = {2016},
}