Fusion Scheme for Semantic and Instance-level Segmentation

Arthur Daniel Costea, Andra Petrovai, Sergiu Nedevschi

Accepted for publication in Proceedings of 2018 IEEE 21st International Conference on Intelligent Transportation Systems (ITSC 2018), 4-7 Nov. 2018, Maui, Hawaii, USA

Powerful scene understanding can be achieved by combining the tasks of semantic segmentation and instance-level recognition. Considering that these tasks are complementary, we propose a multi-objective fusion scheme which leverages the capabilities of each task: pixel-level semantic segmentation performs well in background classification and in delimiting foreground objects from the background, while instance-level segmentation excels in recognizing and classifying objects as a whole. We use a fully convolutional residual network together with a feature pyramid network in order to achieve both semantic segmentation and Mask R-CNN based instance-level recognition. We introduce a novel heuristic fusion approach for panoptic segmentation. The instance and semantic segmentation outputs of the network are fused into a panoptic segmentation based on the object sub-category class, with instance propagation guided by the semantic segmentation for more general classes. The proposed solution achieves significant improvements in semantic object segmentation and object mask boundary refinement at low computational cost.
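To make the fusion idea more concrete, the following is a minimal sketch of one possible heuristic, not the authors' actual scheme. It assumes that the semantic map decides which pixels are foreground, that instance masks are applied in order of decreasing score, and that the instance class overrides the semantic class for the pixels it claims. The names `fuse_panoptic` and `thing_classes` and the data layout are hypothetical.

```python
from typing import List, Set, Tuple

Instance = Tuple[int, float, List[List[bool]]]  # (class_id, score, mask)


def fuse_panoptic(
    semantic: List[List[int]],
    instances: List[Instance],
    thing_classes: Set[int],
) -> List[List[Tuple[int, int]]]:
    """Fuse a semantic map and instance masks into (class_id, instance_id) pairs.

    Assumed heuristic (illustrative only):
      * every pixel starts with its semantic class and instance id 0;
      * an instance may only claim pixels that the semantic map labels as a
        foreground ("thing") class, so the semantic output delimits objects;
      * higher-scoring instances claim pixels first;
      * claimed pixels take the class predicted for the instance.
    """
    height, width = len(semantic), len(semantic[0])
    panoptic = [[(semantic[y][x], 0) for x in range(width)] for y in range(height)]

    next_id = 1
    for class_id, _score, mask in sorted(instances, key=lambda inst: -inst[1]):
        claimed_any = False
        for y in range(height):
            for x in range(width):
                is_foreground = semantic[y][x] in thing_classes
                is_free = panoptic[y][x][1] == 0
                if mask[y][x] and is_foreground and is_free:
                    panoptic[y][x] = (class_id, next_id)
                    claimed_any = True
        if claimed_any:  # only consume an id if the instance kept some pixels
            next_id += 1
    return panoptic


# Toy example: class 0 is background, class 1 is a foreground class.
semantic_map = [[0, 0, 1, 1],
                [0, 1, 1, 1]]
mask_a = [[False, True, True, False],
          [False, True, True, False]]
mask_b = [[False, False, True, True],
          [False, False, True, True]]
fused = fuse_panoptic(semantic_map, [(1, 0.5, mask_b), (1, 0.9, mask_a)], {1})
```

In this toy case the higher-scoring mask leaks onto one background pixel, which the semantic map rejects, and the overlap between the two masks goes to the higher-scoring instance.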

Super-sensor for 360-degree Environment Perception: Point Cloud Segmentation Using Image Features

R. Varga, A.D. Costea, H. Florea, I. Giosan, S. Nedevschi

Proceedings of 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC 2017), 16-19 Oct. 2017, Yokohama, Japan, pp. 1-8

This paper describes a super-sensor that enables 360-degree environment perception for automated vehicles in urban traffic scenarios. We use four fisheye cameras, four 360-degree LIDARs and a GPS/IMU sensor mounted on an automated vehicle to build a super-sensor that offers an enhanced low-level representation of the environment by harmonizing all the available sensor measurements. Individual sensors cannot provide robust 360-degree perception due to their limitations: field of view, range, orientation, number of scanning rays, etc. The novelty of this work consists of segmenting the 3D LIDAR point cloud by associating it with the 2D image semantic segmentation. Another contribution is the sensor configuration that enables 360-degree environment perception. The following steps are involved in the process: calibration, timestamp synchronization, fisheye image unwarping, motion correction of LIDAR points, point cloud projection onto the images and semantic segmentation of images. The enhanced low-level representation will improve high-level environment perception tasks such as object detection, classification and tracking.
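The association between 3D points and 2D semantic labels can be illustrated with a small sketch. It assumes an ideal pinhole camera (that is, an already unwarped image), points already expressed in the camera frame, and no motion correction or occlusion handling. The function name `label_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (X, Y, Z) in the camera frame, Z forward


def label_points(
    points_cam: Sequence[Point],
    labels: List[List[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    unlabeled: int = -1,
) -> List[int]:
    """Assign each 3D point the semantic label of the pixel it projects to.

    Points behind the camera or projecting outside the image get `unlabeled`.
    """
    height, width = len(labels), len(labels[0])
    result = []
    for x, y, z in points_cam:
        if z <= 0.0:  # behind the image plane, cannot be seen by this camera
            result.append(unlabeled)
            continue
        # Pinhole projection, then floor to an integer pixel index.
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            result.append(labels[v][u])
        else:
            result.append(unlabeled)
    return result


# Toy 2x3 label image and a few points.
label_image = [[5, 6, 7],
               [8, 9, 10]]
cloud = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (0.0, 0.0, -1.0),
         (5.0, 0.0, 1.0), (-2.0, 0.0, 2.0)]
point_labels = label_points(cloud, label_image, fx=1.0, fy=1.0, cx=1.0, cy=0.0)
```

With several cameras, the same routine could be run once per camera after transforming the points into each camera frame, keeping the first valid label per point.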


Semantic segmentation-based stereo reconstruction with statistically improved long range accuracy

V.C. Miclea, S. Nedevschi

Proceedings of 2017 IEEE Intelligent Vehicles Symposium (IV 17), 11-14 June 2017, Los Angeles, CA, USA, pp. 1795-1802

Lately, stereo matching has become a key aspect in autonomous driving, providing highly accurate solutions at relatively low cost. Top approaches on state-of-the-art benchmarks rely on learning mechanisms such as convolutional neural networks (ConvNets) to boost matching accuracy. We propose a new real-time stereo reconstruction method that uses a ConvNet for semantically segmenting the driving scene. In a "divide and conquer" approach, this segmentation enables us to split the large heterogeneous traffic scene into smaller regions with similar features. We use the segmentation results to enhance the Census Transform with an optimal census mask and the SGM energy optimization step with an optimal P1 penalty for each predicted class. Additionally, we improve the sub-pixel accuracy of the stereo matching by finding optimal interpolation functions for each particular segment class. In both cases we propose new stochastic optimization methods based on genetic algorithms that can incrementally adjust the parameters for better solutions. Tests performed on KITTI and real traffic scenarios show that our method outperforms previous solutions in accuracy.
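As an illustration of a class-dependent census mask, the sketch below computes a census descriptor over a configurable set of pixel offsets and compares two descriptors with the Hamming distance. The two masks and the class ids in `MASKS_BY_CLASS` are made up for the example; they are not the optimized masks from the paper, and border pixels are handled by simple clamping as an assumption.

```python
from typing import Dict, List, Sequence, Tuple

Offset = Tuple[int, int]  # (dy, dx) relative to the center pixel

# Hypothetical per-class masks: a sparse cross and a full 3x3 ring.
MASKS_BY_CLASS: Dict[int, List[Offset]] = {
    0: [(-1, 0), (0, -1), (0, 1), (1, 0)],
    1: [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)],
}


def census(image: List[List[int]], y: int, x: int, mask: Sequence[Offset]) -> int:
    """Census descriptor at (y, x): one bit per mask offset, first offset is
    the most significant bit, and a bit is 1 if the neighbor is darker than
    the center."""
    height, width = len(image), len(image[0])
    center = image[y][x]
    bits = 0
    for dy, dx in mask:
        ny = min(max(y + dy, 0), height - 1)  # clamp to the image border
        nx = min(max(x + dx, 0), width - 1)
        bits = (bits << 1) | (1 if image[ny][nx] < center else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Matching cost between two census descriptors."""
    return bin(a ^ b).count("1")


gray = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
descriptor = census(gray, 1, 1, MASKS_BY_CLASS[0])
cost = hamming(descriptor, 0b1010)
```

In a full pipeline, the mask would be selected from the predicted class of the pixel, and the resulting Hamming costs would feed the SGM aggregation step.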


Semi-Automatic Image Annotation of Street Scenes

Andra Petrovai, Arthur D. Costea and Sergiu Nedevschi

Proceedings of 2017 IEEE Intelligent Vehicles Symposium (IV 17), 11-14 June 2017, Los Angeles, CA, USA, pp. 448-455

Scene labeling enables very sophisticated and powerful applications for autonomous driving. Training classifiers for this task would not be possible without the existence of large datasets of pixelwise labeled images. Manually annotating a large number of images is an expensive and time-consuming process. In this paper, we propose a new semi-automatic annotation tool for scene labeling tailored for autonomous driving. This tool significantly reduces the annotator's effort and the time spent annotating the data, while offering the necessary features to produce precise pixel-level semantic labeling. The main contribution of our work is the development of a complex annotation framework able to generate automatic annotations for 20 classes, which the user can control and modify accordingly. Automatic annotations are obtained in two separate ways. First, we employ a pixelwise fully-connected Conditional Random Field (CRF). Second, we perform grouping of similar neighboring superpixels based on 2D appearance and 3D information using a boosted classifier. Polygons serve as the manual correction mechanism for the automatic annotations.


Fast Boosting based Detection using Scale Invariant Multimodal Multiresolution Filtered Features

Arthur Daniel Costea, Robert Varga and Sergiu Nedevschi

Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 17), 21-26 July 2017, Honolulu, HI, USA, pp. 993-1002

In this paper we propose a novel boosting-based sliding window solution for object detection which can keep up with the precision of state-of-the-art deep learning approaches, while being 10 to 100 times faster. The solution takes advantage of multisensorial perception and exploits information from color, motion and depth. We introduce multimodal multiresolution filtering of signal intensity, gradient magnitude and orientation channels, in order to capture structure at multiple scales and orientations. To achieve scale invariant classification features, we analyze the effect of scale change on features for different filter types and propose a correction scheme. To improve recognition we incorporate 2D and 3D context by generating spatial, geometric and symmetrical channels. Finally, we evaluate the proposed solution on multiple benchmarks for the detection of pedestrians, cars and bicyclists. We achieve competitive results at over 25 frames per second.


Traffic Scene Segmentation based on Boosting over Multimodal Low, Intermediate and High Order Multi-range Channel Features

Arthur D. Costea and Sergiu Nedevschi

Proceedings of 2017 IEEE Intelligent Vehicles Symposium (IV), June 11-14, 2017, Redondo Beach, CA, USA, pp. 74-81

In this paper we introduce a novel multimodal boosting-based solution for semantic segmentation of traffic scenarios. Local structure and context are captured from both monocular color and depth modalities in the form of image channels. We define multiple channel types at three different levels: low, intermediate and high order channels. The low order channels are computed using a multimodal multiresolution filtering scheme and capture structure and color information from smaller receptive fields. For the intermediate order channels, we employ deep convolutional channels that are able to capture more complex structures, having a larger receptive field. The high order channels are scale invariant channels that consist of spatial, geometric and semantic channels. These channels are enhanced by additional pyramidal context channels, capturing context at multiple levels. The semantic segmentation is achieved by a boosting-based classification scheme over superpixels using multi-range channel features and pyramidal context features. A presegmentation is used to generate semantic channels as input for a more powerful final segmentation. The final segmentation is refined using a superpixel-level dense CRF. The proposed solution is evaluated on the Cityscapes segmentation benchmark and achieves competitive results at low computational cost. It is the first boosting-based solution that is able to keep up with the performance of deep learning based approaches.
