Fusion Scheme for Semantic and Instance-level Segmentation

A.D. Costea, A. Petrovai, S. Nedevschi

Deep vision workshop; 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 18)

A powerful scene understanding can be achieved by combining the tasks of semantic segmentation and instance level recognition. Considering that these tasks are complementary, we propose a multi-objective fusion scheme which leverages the capabilities of each task: pixel level semantic segmentation performs well in background classification and delimiting foreground objects from background, while instance level segmentation excels in recognizing and classifying objects as a whole. We use a fully convolutional residual network together with a feature pyramid network in order to achieve both semantic segmentation and Mask R-CNN based instance level recognition. We introduce a novel fusion approach to refine the outputs of this network based on object sub-category class and instance propagation guidance by semantic segmentation for more general classes. The proposed solution achieves significant improvements in semantic object segmentation and object mask boundaries refinement at low computational costs.


Real-Time Object Detection Using a Sparse 4-Layer LIDAR

M.P. Muresan, S. Nedevschi, I. Giosan

Proceedings of 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2017, pp. 317-322.

The robust detection of obstacles, on a given road path by vehicles equipped with range measurement devices represents a requirement for many research fields including autonomous driving and advanced driving assistance systems. One particular sensor system used for measurement tasks, due to its known accuracy, is the LIDAR (Light Detection and Ranging). The commercial price and computational intensiveness of such systems generally increase with the number of scanning layers. For this reason, in this paper, a novel six step based obstacle detection approach using a 4-layer LIDAR is presented. In the proposed pipeline we tackle the problem of data correction and temporal point cloud fusion and we present an original method for detecting obstacles using a combination between a polar histogram and an elevation grid. The results have been validated by using objects provided from other range measurement sensors.


An approach for segmenting 3D LiDAR data using Multi-Volume grid structures

S.E.C. Goga, S. Nedevschi

Proceedings of 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2017, pp. 309-315.

This paper proposes a novel approach for segmenting and space partitioning data ofsparse 3D LiDAR point clouds for autonomous driving tasks in urban environments. Our main focus is building a compact data representation which provides enough information for an accurate segmentation algorithm. We propose the use of an extension of elevation maps for automotive driving perception tasks which is capable of dealing with both protruding and hanging objects found in urban scenes like bridges, hanging road barrier, traffic tunnels, tree branches over road surface, and so on. For this we use a MultiVolume grid representation of the environment. We apply a fast primary classifier in order to label the surface volumes as being part of the ground segment or of an object segment. Segmentation is performed on the object labeled data which is previously connected in a spatial graph structure using a height overlapping criterion. A comparison between the proposed method and the popular connected-components based segmentation method applied on an Elevation Map is performed in the end.


Online Cross-Calibration of Camera and LIDAR

B.C.Z. Blaga, S. Nedevschi

Proceedings of 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2017, pp. 295-301.

In an autonomous driving system, drift can affect the sensor’s position, introducing errors in the extrinsic calibration. For this reason, we have developed a method which continuously monitors two sensors, camera, and LIDAR with 16 beams, and adjusts the value of their cross-calibration. Our algorithm, starting from correct values of the extrinsic crosscalibration parameters, can detect small sensor drift during vehicle driving, by overlapping the edges from the LIDAR over the edges from the image. The novelty of our method is that in order to obtain edges, we create a range image and filter the data from the 3D point cloud, and we use distance transform on 2D images to find edges. Another improvement we bring is applying motion correction on laser scanner data to remove distortions that appear during vehicle motion. An optimization problem on the 6 calibration parameters is defined, from which we are able to obtain the best value of the cross-calibration, and readjust it automatically. Our system performs successfully in real time, in a wide variety of scenarios, and is not affected by the speed of the car.


Estimation of Absolute Scale in Monocular SLAM Using Synthetic Data

Danila Rukhovich, Daniel Mouritzen, Ralf Kaestner, Martin Rufli, Alexander Velizhev

International Conference on Computer Vision (ICCV) 2019 – Workshop on Computer Vision for Road Scene Understanding and Autonomous Driving

This paper addresses the problem of scale estimation in monocular SLAM by estimating absolute distances between camera centers of consecutive image frames. These estimates would improve the overall performance of classical(not deep) SLAM systems and allow metric feature locations to be recovered from a single monocular camera. We propose several network architectures that lead to an improvement of scale estimation accuracy over the state of the art. In addition, we exploit a possibility to train the neural network only with synthetic data derived from a computer graphics simulator. Our key insight is that, using only synthetic training inputs, we can achieve similar scale estimation accuracy as that obtained from real data. This fact indicates that fully annotated simulated data is a viable alternative to existing deep-learning-based SLAM systems trained on real (unlabeled) data. Our experiments with unsupervised domain adaptation also show that the difference in visual appearance between simulated and real data does not affect scale estimation results. Our method operates with low-resolution images (0.03 MP), which makes it practical for real-time SLAM applications with a monocular camera.

Paper (.pdf)

Multi-Task Network for Panoptic Segmentation in Automated Driving

A. Petrovai, S. Nedevschi

Proceeding of 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zeeland, 26-30 October,2019, pp. 2394-2401.

In this paper, we tackle the newly introduced panoptic segmentation task. Panoptic segmentation unifies semantic and instance segmentation and leverages the capabilities of these complementary tasks by providing pixel and instance level classification. Current state-of-the-art approaches employ either separate networks for each task or a single network for both task and post processing heuristics fuse the outputs into the final panoptic segmentation. Instead, our approach solves all three tasks including panoptic segmentation with an end-to-end learnable fully convolutional neural network. We build upon the Mask R-CNN framework with a shared backbone and individual network heads for each task. Our semantic segmentation head uses multi-scale information from the Feature Pyramid Network, while the panoptic head learns to fuse the semantic segmentation logits with variable number of instance segmentation logits. Moreover, the panoptic head refines the outputs of the network, improving the semantic segmentation results. Experimental results on the challenging Cityscapes dataset demonstrate that the proposed solution
achieves significant improvements for both panoptic segmentation and semantic segmentation.


Curb Detection in Urban Traffic Scenarios Using LiDARs Point Cloud and Semantically Segmented Color Images

S.E.C. Deac, I. Giosan, S. Nedevschi

Proceeding of 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zeeland, 26-30 October,2019, pp. 3433-3440.

In this paper we propose a robust curb detection method which is based on the fusion between semantically labeled camera images and a 3D point cloud coming from LiDAR sensors. The labels from the semantically enhanced cloud are used to reduce the curbs’ searching area. Several spatial cues are next computed on each candidate curb region. Based on these features, a candidate curb region is either rejected or refined for obtaining a precise positioning of the curb points found inside it. A novel local model-based outlier removal algorithm is proposed to filter out the erroneous curb points. Finally, a temporal integration of the detected curb points in multiple consecutive frames is used to densify the detection result. An objective evaluation of the proposed solution is done using a highresolution digital map containing ground truth curb points. The proposed system has proved capable of detecting curbs of any heights (from 3cm up to 30cm) in complex urban road scenarios (straight roads, curved roads, intersections with traffic isles and roundabouts).


Efficient instance and semantic segmentation for automated driving

A. Petrovai, S. Nedevschi

Proceeding of 2019 IEEE Intelligent Vehicles Symposium (IV 2019), Paris, France, 9 – 12 June, 2019, pp. 2575-2581.

Environment perception for automated vehicles is achieved by fusing the outputs of different sensors such as cameras, LIDARs and RADARs. Images provide a semantic understanding of the environment at object level using instance segmentation, but also at background level using semantic segmentation. We propose a fully convolutional residual network based on Mask R-CNN to achieve both semantic and instance level recognition. We aim at developing an efficient network that could run in real-time for automated driving applications without compromising accuracy. Moreover, we compare and experiment with two different backbone architectures, a classification type of network and a faster segmentation type of network based on dilated convolutions. Experiments demonstrate top results on the publicly available Cityscapes dataset.


Environment Perception Architecture using Images and 3D Data

H. Florea, R. Varga, S. Nedevschi

Proceedings of 2018 14th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2018, pp. 223-228.

This paper discusses the architecture of an environment perception system for autonomous vehicles. The modules of the system are described briefly and we focus on important changes in the architecture that enable: decoupling of data acquisition from data processing; synchronous data processing; parallel computation on GPU and multiple CPU cores; efficient data passing using pointers; adaptive architecture capable of working with different number of sensors. The experimental results compare execution times before and after the proposed optimizations. We achieve a 10 Hz frame rate for an object detection system working with 4 cameras and 4 LIDAR point clouds.


A Fast RANSAC Based Approach for Computing the Orientation of Obstacles in Traffic Scenes

F. Oniga, S. Nedevschi

Proceedings of 2018 14th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, September 7-9, 2018, pp. 209-214.

A low complexity approach for computing the orientation of 3D obstacles, detected from lidar data, is proposed in this paper. The proposed method takes as input obstacles represented as cuboids without orientation (aligned with the reference frame). Each cuboid contains a cluster of obstacle locations (discrete grid cells). First, for each obstacle, the boundaries that are visible for the perception system are selected. A model consisting of two perpendicular lines is fitted to the set of boundary cells, one for each presumed visible side. The main dominant line is computed with a RANSAC approach. Then, the second line is searched, using a constraint of perpendicularity on the dominant line. The existence of the second line is used to validate the orientation. Finally, additional criteria are proposed to select the best orientation based on the free area of the cuboid (on top view) that is visible to the perception system.