CN114022563B

CN114022563B - A dynamic obstacle detection method for autonomous driving

Info

Publication number: CN114022563B
Application number: CN202111238591.2A
Authority: CN
Inventors: 田炜; 文永琨; 邓振文; 黄禹尧; 谭大艺; 韩帅
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2021-10-25
Filing date: 2021-10-25
Publication date: 2025-06-03
Anticipated expiration: 2041-10-25
Also published as: CN114022563A

Abstract

The present invention provides a dynamic obstacle detection method for automatic driving, comprising the following steps: obtaining an environmental image stream, point cloud data and positioning data during vehicle operation through an on-board sensor device; dividing the environmental image stream into continuous frame images, sequentially performing 2D feature point detection and matching, restoration and recovery of the 3D position of the 2D feature points, identification of dynamic and static points of the 3D feature points, and construction of dynamic and static binary labels for each pair of continuous frame images; using the dynamic and static binary labels of all frame images as inputs of a neural network to train the neural network; collecting the environmental image stream during vehicle operation in real time, using point cloud data and positioning data, and inputting the trained neural network to obtain dynamic and static binary segmentation images, thereby realizing dynamic obstacle detection. Compared with the prior art, the method can realize dynamic obstacle detection by segmenting dynamic targets and static targets, and improve the accuracy of obstacle detection.

Description

Dynamic obstacle detection method for automatic driving

Technical Field

The invention relates to the field of intelligent network automobiles and the field of machine learning, in particular to a dynamic obstacle detection method for automatic driving.

Background

Autopilot is a technology that controls a vehicle to automatically travel on a road through computer equipment, and the implementation of autopilot relies on cooperation of artificial intelligence, visual computing, radar, and positioning components. Because the actual road condition is complex, not only static obstacles such as trees, green belts and the like are present, but also dynamic obstacles such as a large number of pedestrians, vehicles and the like are present, so that how to realize dynamic obstacle recognition, and further planning a driving route for avoiding the obstacles becomes a key of automatic driving.

However, in the prior art, detection of dynamic obstacles is still blank, most of processing methods depend on labels, but the labeling work is very tedious and time-consuming, and the cost is high, so that the development of the field of obstacle detection is greatly limited.

Currently, a laser radar or a camera is generally provided on a vehicle supporting automatic driving. When the laser radar is used for detecting the obstacle, the laser radar emits laser beams, and a laser point cloud is constructed according to reflected beams formed by the laser beams on the surface of the object, so that the obstacle in the environment is identified according to the laser point cloud; when the camera is used for detecting the obstacle, the camera acquires an environment image and recognizes the obstacle in the environment image through an image recognition technology. But is limited by the working principles of the laser radar and the camera, the error of obstacle distance detection is larger during recognition, and the accuracy of obstacle detection is affected.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a dynamic obstacle detection method for automatic driving, which can realize dynamic obstacle detection by dividing a dynamic target and a static target and improve the accuracy of obstacle detection.

The aim of the invention can be achieved by the following technical scheme:

the invention provides a dynamic obstacle detection method for automatic driving, which comprises the following steps:

s1, acquiring an environment image stream, point cloud data and positioning data corresponding to the environment image stream in the running process of a vehicle through vehicle-mounted sensor equipment;

S2, dividing the environment image stream into continuous frame images, sequentially detecting and matching 2D characteristic points, restoring and recovering 3D positions of the 2D characteristic points, discriminating dynamic and static points of the 3D characteristic points and constructing dynamic and static binary labels for each frame image according to point cloud data and positioning data;

s3, taking dynamic and static binary labels constructed by all continuous frame images as the input of a neural network, and training the neural network by combining a cross entropy loss function;

And S4, acquiring an environment image stream, point cloud data and positioning data corresponding to the environment image stream in the running process of the vehicle in real time, inputting a trained neural network, acquiring a dynamic and static binary segmentation image, and further realizing the detection of dynamic obstacles.

Preferably, the vehicle-mounted sensor comprises a camera for acquiring an ambient image stream, a lidar for acquiring point cloud data, and an RTK positioning device for acquiring positioning data.

Preferably, the step S2 includes the steps of:

s2.1, detecting and matching 2D feature points of each pair of continuous frame images, and acquiring a 2D feature matching point set;

S2.2, according to the point cloud data, restoring and recovering the 3D position of each feature point of each pair of matching points in the 2D feature matching point set, and further obtaining a 3D feature matching point set;

s2.3, establishing a motion constraint condition equation according to the 3D feature matching point set and the positioning data, and screening dynamic and static points of each 3D feature point in the 3D feature point set according to the residual value;

And S2.4, constructing a dynamic and static binary label according to the discrimination result of the dynamic and static points.

Preferably, the step of restoring and recovering the 3D position of each feature point in S2.2 includes the following steps:

s2.2.1 projecting the laser radar 3D point onto the frame image according to the point cloud data;

s2.2.2, defining a region of interest on the frame image by taking each characteristic point as a center, and collecting laser radar 3D points projected into the region of interest;

S2.2.3, judging whether the number of the laser radar 3D points in the region of interest is smaller than a specified threshold, if so, directly discarding the region, otherwise, carrying out S2.2.4;

s2.2.4 fitting a plane according to a laser radar 3D point in the region of interest by adopting a least square method, evaluating the quality of the plane according to the average value of plane residual errors, and reserving a plane with good quality for S2.2.5;

s2.2.5 recovering the 3D position of the point according to the intersection equation of the ray and the local plane.

Preferably, in S2.2, if the 3D position recovery of any one of the pair of matching points fails, the pair of matching points is not reserved.

Preferably, the formula for projecting the lidar 3D point onto the frame image is:

Wherein: For the laser to camera extrinsic matrix, Is an extrinsic rotation matrix of the camera,The camera is an external reference translation matrix, K is a camera internal reference matrix, P _j∈pcl_0/1,pcl₀ and pcl ₁ are both point cloud data, P _i is 2D homogeneous coordinates of a laser radar 3D point projected onto an environment image, and D is Z coordinates of the laser radar 3D point under a camera coordinate system.

Preferably, the set of lidar 3D points projected into the region of interest is:

in the formula, AndRespectively as characteristic pointsP _x and p _y are the abscissa and ordinate, respectively, of point p, which is the point in the region of interest, w _x and w _y are the lateral and longitudinal dimensions, respectively, of the ROI region, Ω _roi is the set of all points in the region of interest, Ω _pcl is the set of all laser points projected into the region of interest.

Preferably, the fitted plane equation is specifically:

in the formula, Is an extrinsic rotation matrix of the camera,Is an external parameter translation matrix of the camera, n is a normal vector, and P is a three-dimensional coordinate matrix of the laser radar 3D point.

Preferably, the motion constraint equation is specifically:

in the formula, AndThe j pairs of matching points of the k frame and the k+1 frame images respectively, wherein Ω _3D is a 3D feature matching point set, res _j is a residual, if Res _j is smaller than a threshold value, the static feature points are considered, and otherwise, the dynamic feature points are considered.

Preferably, the step S2.4 includes the steps of:

S2.4.1 traversing all dynamic feature points and voting on pixel points in the interested area;

s2.4.2 traversing all static feature points, and setting 0 for the number of tickets of pixel points in the interested area;

s2.4.3, counting the total ticket number of each pixel point in each frame of image, judging whether the total ticket number is larger than a set threshold value, if so, considering the pixel point as a dynamic point, otherwise, considering the pixel point as a static point;

s2.4.4, binarizing the total number of tickets of each pixel point to construct a dynamic and static binarization label.

Compared with the prior art, the invention has the following advantages:

1. according to the invention, plane fitting is performed by adopting a least square method, and the intersection point is calculated by using the rays of the 2D points and the plane to finish estimation, so that 3D position recovery is accurately performed on the detected and matched 2D characteristic points, and a foundation is laid for accurately discriminating dynamic and static points.

2. According to the invention, dynamic points and static points of the recovered 3D characteristic points are discriminated, binary labels are constructed to perform neural network training, and the dynamic and static objects can be segmented more efficiently without depending on labeling data, so that dynamic obstacle detection is realized.

Drawings

Fig. 1 is a flow chart of a dynamic obstacle detection method for automatic driving according to the present embodiment;

FIG. 2 is a schematic diagram of 3D restoration of 2D feature points in the embodiment shown in FIG. 1;

Fig. 3 is a schematic flow chart of S2.2 in the embodiment shown in fig. 1.

Detailed Description

The invention will now be described in detail with reference to the drawings and specific examples.

Referring to fig. 1, the present invention provides a dynamic obstacle detection method for automatic driving, comprising the steps of:

s1, acquiring an environment image stream in the running process of a vehicle and short-time accurate point cloud data and positioning data corresponding to the environment image stream through vehicle-mounted sensor equipment;

The vehicle-mounted sensor device comprises a camera for acquiring an environment image, a laser radar for acquiring point cloud data and an RTK positioning device for acquiring positioning data.

Taking a pair of continuous frame images I ₀,I₁ as an example, detecting and matching 2D feature points, and acquiring a 2D feature point matching point set omega _2D;

Where nums _i is the number of matching points for the I-th pair of previous frame image I ₀ and the subsequent frame image I ₁, Ω _2D is the 2D feature point matching point set, AndThe j-th pair of matching points of the frame image I ₀ and the frame image I ₁ are respectively, and j is more than or equal to 0 and less than or equal to nums _i.

As an alternative embodiment, the feature point detection method Superpoint based on deep learning is used to perform feature point detection on the 2D feature point.

As an alternative embodiment, the RGB three-channel image data is normalized, i.e. each pixel value of the frame image divided by 255, before the detection and matching of the 2D feature points is performed.

s2.2.1, according to the point cloud data, projecting the 3D point of the laser radar onto a frame image, wherein the projection formula is as follows:

Wherein: For the laser to camera extrinsic matrix, Is an extrinsic rotation matrix of the camera,The camera is an external reference translation matrix, K is a camera internal reference matrix, P _j∈pcl_0/1,pcl₀ and pcl ₁ are both point cloud data, P _j is 2D homogeneous coordinates of a laser radar 3D point projected onto an environment image, and D is Z coordinates of the laser radar 3D point under a camera coordinate system. The projected lidar 3D points are shown as gray points inside the circle in fig. 2.

A certain pair of matching points for the kth frame and the (k+1) th frame Their 3D positions are restored separately, and if any one of the 3D positions fails to be restored, the pair of matching points is not reserved.

By characteristic pointsTaking the recovery procedure of (a) as an exampleFor the center, a region of interest (ROI) is defined on the frame image, resulting in a set:

in the formula, AndRespectively isP _x and p _y are the horizontal and vertical coordinates, respectively, of a point p on the image, point p being a point within the region of interest, w _x and w _y being the horizontal and vertical dimensions, respectively, of the ROI region, Ω _roi being the set of all points within the region of interest.

Collecting laser 3D points projected into the region of interest:

Where Ω _pcl is the set of all lidar 3D points projected into the region of interest.

A first threshold thre _n is set, and if the number of laser points in the region of interest is less than thre _n, i.e., |Ω _pcl|＜thre_n, the region is directly discarded.

The first threshold thre _n is a super parameter, and related to the internal parameters of the camera, the manual adaptation needs to be performed according to different sensors. As an alternative embodiment, the first threshold thre _n is set to 20 pixels.

S2.2.4, fitting a plane by adopting a least square method, evaluating the quality of the plane according to the average value of plane residual errors, and reserving the plane with good quality for S2.2.5;

If the number of the 3D points of the laser radar in the region of interest is greater than or equal to thre _n, a plane is fitted by a least square method using the points in the region of interest, as shown in the lower right corner of fig. 2. The fitted plane equation is:

Wherein P is a three-dimensional coordinate matrix of the laser radar 3D point, wherein an ith column P _col[i] in the matrix is a three-dimensional coordinate of the ith laser 3D point in the region of interest, namely P _col[i]∈Ω_pcl, and n is a normal vector of a plane.

The quality of the plane can be evaluated by calculating the plane residual:

in the formula, And r is a plane residual error which is an estimated value of the normal vector n, the average value of the normal vector n is used as an evaluation index, and if the quality is too poor, namely the average value of the normal vector n exceeds a second threshold value, the plane is directly abandoned.

The second threshold is set according to different sensors, as an alternative embodiment the second threshold is 0.05 meters in size.

According to the plane equation andThe 3D position of the recovery point, as shown in fig. 2, is available from the ray intersecting the local plane:

Wherein, the pill is a coefficient for solving, and the 3D position of the intersection point is

Thus, the 2D feature point matching point set omega _2D is converted into the 3D feature point matching point set omega _3D

In training the segmentation model, continuous image streams are required, which correspond to short-time accurate positioning data and point cloud data. And for images of the front frame and the rear frame, acquiring a 3D characteristic point matching point set omega _3D by adopting the method described in S2. Due to the existence of short-time accurate positioning data, a position transformation matrix of each frame is obtained While matching points on the static object conform to the motion constraint that is used to construct the moving object tag.

Specifically, for a pair of 3D feature matching points

In the formula,AndThe j-th pair of matching points of the k-th frame and the k+1-th frame images respectively.

According to the motion constraint, there is the following relationship:

in the formula, AndThe position transformation matrices of the k-th frame and the k+1-th frame images, respectively.

However, due to errors in the actual motion, residuals will appear at both ends of the above formula, namely:

Where Res _j is a residual, if Res _j is smaller than the third threshold, that is, res _j＜threshold_dynamic, then the point is considered to be a static feature point and is classified as set D _s, otherwise, it is considered to be a dynamic feature point and is classified as set D _d.

The third threshold is a manually adjustable parameter according to the sensor, as an alternative embodiment, set to 0.07 meters.

s2.4.2 traversing all the static feature points and setting 0 for the number of tickets of the pixel points in the region of interest, wherein the purpose of traversing all the static points is to suppress noise.

S2.4.3, counting the total ticket number of each pixel point in each frame of image, judging whether the total ticket number is larger than a fourth threshold value, if so, considering the pixel point as a dynamic point, otherwise, considering the pixel point as a static point;

as an alternative embodiment, the fourth threshold is set to 5.

The pseudo code for this process is shown below, where (h, w) is the height and width of the image, and p [0] and p [1] are the x and y coordinates of the p-point, respectively:

Vote=np.zeros(h,w)

for p in Ds:

Vote[max(0,p[0]-ws):min(w,p[0]+ws),max(0,p[1]-hs):min(h,p[1]+hs)]+=1

for p in Dd:

Vote[max(0,p[0]-wS):min(w,p[0]+ws),max(0,p[1]-hs):min(h,p[1]+hs)]=0

Vote[Vote>thre]=1

Vote[Vote<=thre]=0

as an alternative embodiment, the neural network is selected from FCN network or Unet network.

The loss function (loss) of the network is set to cross entropy loss as follows:

loss=CrossEntropy(Vote,output)

where output is the output of the network and is a tensor with channel 2. By the above way, a dynamic object segmentation module can be trained unsupervised. Meanwhile, the unsupervised training mode is also suitable for the lightweight network.

S4, inputting real-time collected continuous image data, point cloud data and positioning data, and dividing a dynamic target and a static target by adopting a trained neural network to obtain a dynamic and static binary divided image so as to realize dynamic obstacle detection.

According to the method for detecting the dynamic obstacle for automatic driving, provided by the invention, plane fitting is carried out by adopting a least square method, and the intersection point is obtained by using the ray of the 2D point and the plane to finish estimation, so that 3D position recovery is accurately carried out on the detected and matched 2D characteristic point, and a foundation is laid for accurately discriminating the dynamic and static point. And then, the dynamic points and the static points of the recovered 3D characteristic points are discriminated, binary labels are constructed for neural network training, and the dynamic and static objects can be segmented more efficiently without depending on labeling data, so that the detection of dynamic obstacles is realized.

The previous description of the embodiments is provided to facilitate a person of ordinary skill in the art in order to make and use the present invention. It will be apparent to those skilled in the art that various modifications can be readily made to these embodiments and the generic principles described herein may be applied to other embodiments without the use of the inventive faculty. Therefore, the present invention is not limited to the above-described embodiments, and those skilled in the art, based on the present disclosure, should make improvements and modifications without departing from the scope of the present invention.

Claims

1. A dynamic obstacle detection method for autonomous driving, comprising the following steps:

S1: Obtain the environment image stream during vehicle operation, as well as the point cloud data and positioning data corresponding to the environment image stream through the vehicle-mounted sensor device;

S2: Divide the environment image stream into continuous frame images, and perform 2D feature point detection and matching, 3D position restoration and recovery, dynamic and static point identification of 3D feature points, and dynamic and static binary labels for each frame image in sequence according to the point cloud data and positioning data;

S3: The binary labels of motion and stillness constructed from all consecutive frame images are used as the input of the neural network, and the neural network is trained in combination with the cross entropy loss function;

S4: Real-time collection of environmental image streams during vehicle operation, as well as point cloud data and positioning data corresponding to the environmental image streams, and input into the trained neural network to obtain dynamic and static binary segmentation images, thereby realizing the detection of dynamic obstacles;

The S2 comprises the following steps:

S2.1: Detect and match 2D feature points for each pair of continuous frame images, and obtain a 2D feature matching point set;

S2.2: According to the point cloud data, restore and recover the 3D position of each feature point of each pair of matching points in the 2D feature matching point set, thereby obtaining a 3D feature matching point set;

S2.3: establishing a motion constraint condition equation according to the 3D feature matching point set and the positioning data, and identifying a dynamic or static point for each 3D feature point in the 3D feature point set according to the size of the residual value;

S2.4: Construct dynamic and static binary labels based on the identification results of dynamic and static points;

The restoration and recovery of the 3D position of each feature point in S2.2 includes the following steps:

S2.2.1: Project the LiDAR 3D points onto the frame image based on the point cloud data;

S2.2.2: Define a region of interest on the frame image with each feature point as the center, and collect the lidar 3D points projected into the region of interest;

S2.2.3: Determine whether the number of LiDAR 3D points in the region of interest is less than a specified threshold. If so, the region is discarded directly, otherwise proceed to S2.2.4;

S2.2.4: Fit a plane using the least squares method based on the lidar 3D points in the region of interest, evaluate the quality of the plane based on the mean of the plane residuals, and retain the plane with good quality for S2.2.5;

S2.2.5: Recover the 3D position of the point based on the equation of intersection of the ray and the local plane;

The S2.4 comprises the following steps:

S2.4.1: Traverse all dynamic feature points and vote for the pixels in their area of interest;

S2.4.2: Traverse all static feature points and set the votes of the pixels in their area of interest to 0;

S2.4.3: Count the total number of votes for each pixel in each frame of the image, and determine whether the total number of votes is greater than a set threshold. If so, the pixel is considered to be a dynamic point, otherwise, the pixel is considered to be a static point;

S2.4.4: Binarize the total number of votes for each pixel and construct a dynamic or static binary label.

2. According to a dynamic obstacle detection method for autonomous driving according to claim 1, it is characterized in that the vehicle-mounted sensor includes a camera for collecting environmental image streams, a laser radar for collecting point cloud data, and an RTK positioning device for collecting positioning data.

3. A dynamic obstacle detection method for autonomous driving according to claim 1, characterized in that, in S2.2, if the 3D position recovery of any one feature point in a pair of matching points fails, the pair of matching points is not retained.

4. A dynamic obstacle detection method for autonomous driving according to claim 1, characterized in that the formula for projecting the laser radar 3D point onto the frame image is:

Where: is the external parameter matrix from laser to camera, is the camera's extrinsic rotation matrix, is the camera’s extrinsic translation matrix, K is the camera’s intrinsic matrix, _Pj∈pcl0 _/1 , _pcl0 and _pcl1 are both point cloud data, _pj is the 2D homogeneous coordinate of the lidar 3D point projected onto the environment image, and d is the Z coordinate of the lidar 3D point in the camera coordinate system.

5. A dynamic obstacle detection method for autonomous driving according to claim 4, characterized in that the set of laser radar 3D points projected into the region of interest is:

In the formula, and The feature points , _px and _py are the horizontal and vertical coordinates of point p, respectively. Point p is a point in the region of interest. _wx and _wy are the horizontal and vertical sizes of the ROI area, respectively. _Ωroi is the set of all points in the region of interest. _Ωpcl is the set of all laser points projected into the region of interest.

6. A dynamic obstacle detection method for autonomous driving according to claim 5, characterized in that the fitted plane equation is specifically:

In the formula, is the camera's extrinsic rotation matrix, is the camera's extrinsic translation matrix, n is the normal vector, and P is the three-dimensional coordinate matrix of the lidar 3D point.

7. The dynamic obstacle detection method for autonomous driving according to claim 1, wherein the motion constraint condition equation is:

In the formula, and are the j-th pair of matching points of the k-th frame and the k+1-th frame image respectively, Ω _3D is the 3D feature matching point set, Res _j is the residual, if Res _j is less than the threshold, it is considered to be a static feature point, otherwise it is considered to be a dynamic feature point.