First development and integration cycle of lifelong mapping
This deliverable describes the lifelong mapping framework after the first development and integration cycle. All components, notably the metric and semantic map, the metric online localization, the semantic data aggregation and the map summarization, are functional and integrated on the vehicles. They fulfill their basic purposes, interact with each other in a limited fashion and deliver first evaluation results.
R. Varga, A.D. Costea, H. Florea, I. Giosan, S. Nedevschi
Proceedings of 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC 2017), 16-19 Oct. 2017, Yokohama, Japan, pp. 1-8
This paper describes a super-sensor that enables 360-degree environment perception for automated vehicles in urban traffic scenarios. We use four fisheye cameras, four 360-degree LIDARs and a GPS/IMU sensor mounted on an automated vehicle to build a super-sensor that offers an enhanced low-level representation of the environment by harmonizing all the available sensor measurements. Individual sensors cannot provide robust 360-degree perception because of their limitations: field of view, range, orientation, number of scanning rays, etc. The novelty of this work consists of segmenting the 3D LIDAR point cloud by associating it with the 2D semantic segmentation of the images. Another contribution is the sensor configuration that enables 360-degree environment perception. The process involves the following steps: calibration, timestamp synchronization, fisheye image unwarping, motion correction of LIDAR points, point cloud projection onto the images and semantic segmentation of the images. The enhanced low-level representation will improve high-level environment perception tasks such as object detection, classification and tracking.
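The association between LIDAR points and the 2D semantic segmentation can be illustrated with a minimal sketch. It assumes that the points are already expressed in the camera frame (that is, calibration, timestamp synchronization and motion correction have been applied) and that the unwarped fisheye image follows a simple pinhole model. The function name `label_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def label_points(
    points_cam: Sequence[Point3D],
    labels: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Optional[int]]:
    """Give each 3D point the class of the pixel it projects onto.

    Assumptions (not from the paper): points are in the camera frame with
    z pointing forward, and the image follows a pinhole model.
    Points behind the camera or outside the image get None.
    """
    height = len(labels)
    width = len(labels[0])
    result: List[Optional[int]] = []
    for x, y, z in points_cam:
        if z <= 0.0:
            # The point lies behind the image plane and cannot be seen.
            result.append(None)
            continue
        # Pinhole projection to pixel coordinates (column u, row v).
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            result.append(labels[v][u])
        else:
            result.append(None)
    return result


if __name__ == "__main__":
    # Toy 4x4 label image with four class regions.
    segmentation = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 3, 3],
        [2, 2, 3, 3],
    ]
    cloud = [(0.0, 0.0, 1.0), (-1.0, -1.0, 1.0), (0.0, 0.0, -1.0)]
    print(label_points(cloud, segmentation, 2.0, 2.0, 2.0, 2.0))
```

In a multi-camera setup such as the one described, a point that falls outside one image would be tested against the other cameras; that loop is omitted here.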
V.C. Miclea, S. Nedevschi
Proceedings of 2017 IEEE Intelligent Vehicles Symposium (IV 17), 11-14 June 2017, Los Angeles, CA, USA, pp. 1795-1802
Lately, stereo matching has become a key aspect of autonomous driving, providing highly accurate solutions at relatively low cost. Top approaches on state-of-the-art benchmarks rely on learning mechanisms such as convolutional neural networks (ConvNets) to boost matching accuracy. We propose a new real-time stereo reconstruction method that uses a ConvNet for semantically segmenting the driving scene. In a "divide and conquer" approach, this segmentation enables us to split the large heterogeneous traffic scene into smaller regions with similar features. We use the segmentation results to enhance the Census Transform with an optimal census mask and the SGM energy optimization step with an optimal P1 penalty for each predicted class. Additionally, we improve the sub-pixel accuracy of the stereo matching by finding optimal interpolation functions for each particular segment class. In both cases we propose new stochastic optimization methods based on genetic algorithms that can incrementally adjust the parameters for better solutions. Tests performed on KITTI and real traffic scenarios show that our method achieves higher accuracy than previous solutions.
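As a rough illustration of a masked Census Transform, the sketch below builds a census descriptor from a configurable set of neighbor offsets and compares two descriptors with the Hamming distance. It assumes that a mask is simply a list of (row, column) offsets that could be chosen per semantic class. The function names and the example mask are hypothetical, and the optimized masks and P1 penalties from the paper are not reproduced.

```python
from typing import List, Sequence, Tuple

Offset = Tuple[int, int]
Image = Sequence[Sequence[int]]


def census_descriptor(image: Image, row: int, col: int, mask: Sequence[Offset]) -> int:
    """Encode one bit per mask offset: 1 if that neighbor is darker than the center."""
    center = image[row][col]
    bits = 0
    for d_row, d_col in mask:
        darker = image[row + d_row][col + d_col] < center
        bits = (bits << 1) | (1 if darker else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two descriptors differ."""
    return bin(a ^ b).count("1")


def matching_cost(left: Image, right: Image, row: int, col: int,
                  disparity: int, mask: Sequence[Offset]) -> int:
    """Census cost between a left pixel and the right pixel shifted by `disparity`."""
    return hamming(
        census_descriptor(left, row, col, mask),
        census_descriptor(right, row, col - disparity, mask),
    )


if __name__ == "__main__":
    # Hypothetical mask: the four direct neighbors. A per-class setup could
    # keep a dictionary such as {class_id: mask} and look the mask up per pixel.
    cross_mask: List[Offset] = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    left_img = [[1, 2, 3, 4, 5], [9, 1, 7, 2, 8], [3, 6, 1, 5, 2]]
    # The right image is the left image shifted by one pixel (true disparity 1).
    right_img = [[2, 3, 4, 5, 0], [1, 7, 2, 8, 0], [6, 1, 5, 2, 0]]
    for d in (0, 1):
        print(d, matching_cost(left_img, right_img, 1, 2, d, cross_mask))
```

The cost is lowest at the true disparity. In a full pipeline these per-pixel costs would feed the SGM energy optimization, which is outside the scope of this sketch.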
Andra Petrovai, Arthur D. Costea and Sergiu Nedevschi
Proceedings of 2017 IEEE Intelligent Vehicles Symposium (IV 17), 11-14 June 2017, Los Angeles, CA, USA, pp. 448-455
Scene labeling enables very sophisticated and powerful applications for autonomous driving. Training classifiers for this task would not be possible without large datasets of pixelwise labeled images. Manually annotating a large number of images is an expensive and time-consuming process. In this paper, we propose a new semi-automatic annotation tool for scene labeling tailored for autonomous driving. This tool significantly reduces the annotator's effort and the time spent annotating the data, while still offering the features needed to produce precise pixel-level semantic labeling. The main contribution of our work is the development of a complex annotation framework able to generate automatic annotations for 20 classes, which the user can control and modify accordingly. Automatic annotations are obtained in two separate ways. First, we employ a pixelwise fully-connected Conditional Random Field (CRF). Second, we group similar neighboring superpixels based on 2D appearance and 3D information using a boosted classifier. Polygons serve as the manual correction mechanism for the automatic annotations.
Arthur Daniel Costea, Robert Varga and Sergiu Nedevschi
Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 17), 21-26 July 2017, Honolulu, HI, USA, pp. 993-1002
In this paper we propose a novel boosting-based sliding window solution for object detection which can keep up with the precision of state-of-the-art deep learning approaches, while being 10 to 100 times faster. The solution takes advantage of multisensorial perception and exploits information from color, motion and depth. We introduce multimodal multiresolution filtering of signal intensity, gradient magnitude and orientation channels, in order to capture structure at multiple scales and orientations. To achieve scale-invariant classification features, we analyze the effect of scale change on features for different filter types and propose a correction scheme. To improve recognition we incorporate 2D and 3D context by generating spatial, geometric and symmetrical channels. Finally, we evaluate the proposed solution on multiple benchmarks for the detection of pedestrians, cars and bicyclists. We achieve competitive results at over 25 frames per second.
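The gradient magnitude and orientation channels mentioned above can be sketched in a generic form. The example uses central differences and a fixed number of unsigned orientation bins, as is common in channel-feature detectors. It is an assumption for illustration only and does not reproduce the multimodal multiresolution filters or the scale correction scheme of the paper. The function name `gradient_channels` and the parameter `num_bins` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Channel = List[List[float]]


def gradient_channels(image: Sequence[Sequence[float]],
                      num_bins: int = 4) -> Tuple[Channel, List[Channel]]:
    """Return a gradient magnitude channel and `num_bins` orientation channels.

    Each orientation channel holds the gradient magnitude of the pixels whose
    unsigned gradient direction falls into that bin. Border pixels stay zero.
    """
    height, width = len(image), len(image[0])
    magnitude: Channel = [[0.0] * width for _ in range(height)]
    orientation: List[Channel] = [
        [[0.0] * width for _ in range(height)] for _ in range(num_bins)
    ]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            # Central differences in the horizontal and vertical directions.
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            mag = math.hypot(gx, gy)
            magnitude[r][c] = mag
            # Unsigned orientation folded into [0, pi).
            angle = math.atan2(gy, gx) % math.pi
            bin_index = min(int(angle / math.pi * num_bins), num_bins - 1)
            orientation[bin_index][r][c] = mag
    return magnitude, orientation


if __name__ == "__main__":
    # A vertical intensity edge: the gradient points along the x axis.
    edge = [[0, 0, 10, 10] for _ in range(4)]
    mag, ori = gradient_channels(edge)
    print(mag[1][1], [channel[1][1] for channel in ori])
```

A boosted sliding window detector would then aggregate such channels over rectangular regions of each candidate window to form its features.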
Arthur D. Costea and Sergiu Nedevschi
Proceedings of 2017 IEEE Intelligent Vehicles Symposium (IV 17), 11-14 June 2017, Redondo Beach, CA, USA, pp. 74-81
In this paper we introduce a novel multimodal boosting-based solution for semantic segmentation of traffic scenarios. Local structure and context are captured from both monocular color and depth modalities in the form of image channels. We define multiple channel types at three different levels: low, intermediate and high order channels. The low order channels are computed using a multimodal multiresolution filtering scheme and capture structure and color information from smaller receptive fields. For the intermediate order channels, we employ deep convolutional channels, which have a larger receptive field and can capture more complex structures. The high order channels are scale-invariant channels that consist of spatial, geometric and semantic channels. These channels are enhanced by additional pyramidal context channels, capturing context at multiple levels. The semantic segmentation is achieved by a boosting-based classification scheme over superpixels using multi-range channel features and pyramidal context features. A presegmentation is used to generate semantic channels as input for a more powerful final segmentation. The final segmentation is refined using a superpixel-level dense CRF. The proposed solution is evaluated on the Cityscapes segmentation benchmark and achieves competitive results at low computational cost. It is the first boosting-based solution that is able to keep up with the performance of deep learning based approaches.