Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a dynamic obstacle detection method for automatic driving, which can realize dynamic obstacle detection by dividing a dynamic target and a static target and improve the accuracy of obstacle detection.
The aim of the invention can be achieved by the following technical scheme:
the invention provides a dynamic obstacle detection method for automatic driving, which comprises the following steps:
s1, acquiring an environment image stream, point cloud data and positioning data corresponding to the environment image stream in the running process of a vehicle through vehicle-mounted sensor equipment;
S2, dividing the environment image stream into continuous frame images, sequentially detecting and matching 2D characteristic points, restoring and recovering 3D positions of the 2D characteristic points, discriminating dynamic and static points of the 3D characteristic points and constructing dynamic and static binary labels for each frame image according to point cloud data and positioning data;
s3, taking dynamic and static binary labels constructed by all continuous frame images as the input of a neural network, and training the neural network by combining a cross entropy loss function;
And S4, acquiring an environment image stream, point cloud data and positioning data corresponding to the environment image stream in the running process of the vehicle in real time, inputting a trained neural network, acquiring a dynamic and static binary segmentation image, and further realizing the detection of dynamic obstacles.
Preferably, the vehicle-mounted sensor comprises a camera for acquiring an ambient image stream, a lidar for acquiring point cloud data, and an RTK positioning device for acquiring positioning data.
Preferably, the step S2 includes the steps of:
s2.1, detecting and matching 2D feature points of each pair of continuous frame images, and acquiring a 2D feature matching point set;
S2.2, according to the point cloud data, restoring and recovering the 3D position of each feature point of each pair of matching points in the 2D feature matching point set, and further obtaining a 3D feature matching point set;
s2.3, establishing a motion constraint condition equation according to the 3D feature matching point set and the positioning data, and screening dynamic and static points of each 3D feature point in the 3D feature point set according to the residual value;
And S2.4, constructing a dynamic and static binary label according to the discrimination result of the dynamic and static points.
Preferably, the step of restoring and recovering the 3D position of each feature point in S2.2 includes the following steps:
s2.2.1 projecting the laser radar 3D point onto the frame image according to the point cloud data;
s2.2.2, defining a region of interest on the frame image by taking each characteristic point as a center, and collecting laser radar 3D points projected into the region of interest;
S2.2.3, judging whether the number of the laser radar 3D points in the region of interest is smaller than a specified threshold, if so, directly discarding the region, otherwise, carrying out S2.2.4;
s2.2.4 fitting a plane according to a laser radar 3D point in the region of interest by adopting a least square method, evaluating the quality of the plane according to the average value of plane residual errors, and reserving a plane with good quality for S2.2.5;
s2.2.5 recovering the 3D position of the point according to the intersection equation of the ray and the local plane.
Preferably, in S2.2, if the 3D position recovery of any one of the pair of matching points fails, the pair of matching points is not reserved.
Preferably, the formula for projecting the lidar 3D point onto the frame image is:
Wherein: For the laser to camera extrinsic matrix, Is an extrinsic rotation matrix of the camera,The camera is an external reference translation matrix, K is a camera internal reference matrix, P j∈pcl0/1,pcl0 and pcl 1 are both point cloud data, P i is 2D homogeneous coordinates of a laser radar 3D point projected onto an environment image, and D is Z coordinates of the laser radar 3D point under a camera coordinate system.
Preferably, the set of lidar 3D points projected into the region of interest is:
in the formula, AndRespectively as characteristic pointsP x and p y are the abscissa and ordinate, respectively, of point p, which is the point in the region of interest, w x and w y are the lateral and longitudinal dimensions, respectively, of the ROI region, Ω roi is the set of all points in the region of interest, Ω pcl is the set of all laser points projected into the region of interest.
Preferably, the fitted plane equation is specifically:
in the formula, Is an extrinsic rotation matrix of the camera,Is an external parameter translation matrix of the camera, n is a normal vector, and P is a three-dimensional coordinate matrix of the laser radar 3D point.
Preferably, the motion constraint equation is specifically:
in the formula, AndThe j pairs of matching points of the k frame and the k+1 frame images respectively, wherein Ω 3D is a 3D feature matching point set, res j is a residual, if Res j is smaller than a threshold value, the static feature points are considered, and otherwise, the dynamic feature points are considered.
Preferably, the step S2.4 includes the steps of:
S2.4.1 traversing all dynamic feature points and voting on pixel points in the interested area;
s2.4.2 traversing all static feature points, and setting 0 for the number of tickets of pixel points in the interested area;
s2.4.3, counting the total ticket number of each pixel point in each frame of image, judging whether the total ticket number is larger than a set threshold value, if so, considering the pixel point as a dynamic point, otherwise, considering the pixel point as a static point;
s2.4.4, binarizing the total number of tickets of each pixel point to construct a dynamic and static binarization label.
Compared with the prior art, the invention has the following advantages:
1. according to the invention, plane fitting is performed by adopting a least square method, and the intersection point is calculated by using the rays of the 2D points and the plane to finish estimation, so that 3D position recovery is accurately performed on the detected and matched 2D characteristic points, and a foundation is laid for accurately discriminating dynamic and static points.
2. According to the invention, dynamic points and static points of the recovered 3D characteristic points are discriminated, binary labels are constructed to perform neural network training, and the dynamic and static objects can be segmented more efficiently without depending on labeling data, so that dynamic obstacle detection is realized.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples.
Referring to fig. 1, the present invention provides a dynamic obstacle detection method for automatic driving, comprising the steps of:
s1, acquiring an environment image stream in the running process of a vehicle and short-time accurate point cloud data and positioning data corresponding to the environment image stream through vehicle-mounted sensor equipment;
The vehicle-mounted sensor device comprises a camera for acquiring an environment image, a laser radar for acquiring point cloud data and an RTK positioning device for acquiring positioning data.
S2, dividing the environment image stream into continuous frame images, sequentially detecting and matching 2D characteristic points, restoring and recovering 3D positions of the 2D characteristic points, discriminating dynamic and static points of the 3D characteristic points and constructing dynamic and static binary labels for each frame image according to point cloud data and positioning data;
s2.1, detecting and matching 2D feature points of each pair of continuous frame images, and acquiring a 2D feature matching point set;
Taking a pair of continuous frame images I 0,I1 as an example, detecting and matching 2D feature points, and acquiring a 2D feature point matching point set omega 2D;
Where nums i is the number of matching points for the I-th pair of previous frame image I 0 and the subsequent frame image I 1, Ω 2D is the 2D feature point matching point set, AndThe j-th pair of matching points of the frame image I 0 and the frame image I 1 are respectively, and j is more than or equal to 0 and less than or equal to nums i.
As an alternative embodiment, the feature point detection method Superpoint based on deep learning is used to perform feature point detection on the 2D feature point.
As an alternative embodiment, the RGB three-channel image data is normalized, i.e. each pixel value of the frame image divided by 255, before the detection and matching of the 2D feature points is performed.
S2.2, according to the point cloud data, restoring and recovering the 3D position of each feature point of each pair of matching points in the 2D feature matching point set, and further obtaining a 3D feature matching point set;
s2.2.1, according to the point cloud data, projecting the 3D point of the laser radar onto a frame image, wherein the projection formula is as follows:
Wherein: For the laser to camera extrinsic matrix, Is an extrinsic rotation matrix of the camera,The camera is an external reference translation matrix, K is a camera internal reference matrix, P j∈pcl0/1,pcl0 and pcl 1 are both point cloud data, P j is 2D homogeneous coordinates of a laser radar 3D point projected onto an environment image, and D is Z coordinates of the laser radar 3D point under a camera coordinate system. The projected lidar 3D points are shown as gray points inside the circle in fig. 2.
S2.2.2, defining a region of interest on the frame image by taking each characteristic point as a center, and collecting laser radar 3D points projected into the region of interest;
A certain pair of matching points for the kth frame and the (k+1) th frame Their 3D positions are restored separately, and if any one of the 3D positions fails to be restored, the pair of matching points is not reserved.
By characteristic pointsTaking the recovery procedure of (a) as an exampleFor the center, a region of interest (ROI) is defined on the frame image, resulting in a set:
in the formula, AndRespectively isP x and p y are the horizontal and vertical coordinates, respectively, of a point p on the image, point p being a point within the region of interest, w x and w y being the horizontal and vertical dimensions, respectively, of the ROI region, Ω roi being the set of all points within the region of interest.
Collecting laser 3D points projected into the region of interest:
Where Ω pcl is the set of all lidar 3D points projected into the region of interest.
S2.2.3, judging whether the number of the laser radar 3D points in the region of interest is smaller than a specified threshold, if so, directly discarding the region, otherwise, carrying out S2.2.4;
A first threshold thre n is set, and if the number of laser points in the region of interest is less than thre n, i.e., |Ω pcl|<thren, the region is directly discarded.
The first threshold thre n is a super parameter, and related to the internal parameters of the camera, the manual adaptation needs to be performed according to different sensors. As an alternative embodiment, the first threshold thre n is set to 20 pixels.
S2.2.4, fitting a plane by adopting a least square method, evaluating the quality of the plane according to the average value of plane residual errors, and reserving the plane with good quality for S2.2.5;
If the number of the 3D points of the laser radar in the region of interest is greater than or equal to thre n, a plane is fitted by a least square method using the points in the region of interest, as shown in the lower right corner of fig. 2. The fitted plane equation is:
Wherein P is a three-dimensional coordinate matrix of the laser radar 3D point, wherein an ith column P col[i] in the matrix is a three-dimensional coordinate of the ith laser 3D point in the region of interest, namely P col[i]∈Ωpcl, and n is a normal vector of a plane.
The quality of the plane can be evaluated by calculating the plane residual:
in the formula, And r is a plane residual error which is an estimated value of the normal vector n, the average value of the normal vector n is used as an evaluation index, and if the quality is too poor, namely the average value of the normal vector n exceeds a second threshold value, the plane is directly abandoned.
The second threshold is set according to different sensors, as an alternative embodiment the second threshold is 0.05 meters in size.
S2.2.5 recovering the 3D position of the point according to the intersection equation of the ray and the local plane.
According to the plane equation andThe 3D position of the recovery point, as shown in fig. 2, is available from the ray intersecting the local plane:
Wherein, the pill is a coefficient for solving, and the 3D position of the intersection point is
Thus, the 2D feature point matching point set omega 2D is converted into the 3D feature point matching point set omega 3D
S2.3, establishing a motion constraint condition equation according to the 3D feature matching point set and the positioning data, and screening dynamic and static points of each 3D feature point in the 3D feature point set according to the residual value;
In training the segmentation model, continuous image streams are required, which correspond to short-time accurate positioning data and point cloud data. And for images of the front frame and the rear frame, acquiring a 3D characteristic point matching point set omega 3D by adopting the method described in S2. Due to the existence of short-time accurate positioning data, a position transformation matrix of each frame is obtained While matching points on the static object conform to the motion constraint that is used to construct the moving object tag.
Specifically, for a pair of 3D feature matching points
In the formula,AndThe j-th pair of matching points of the k-th frame and the k+1-th frame images respectively.
According to the motion constraint, there is the following relationship:
in the formula, AndThe position transformation matrices of the k-th frame and the k+1-th frame images, respectively.
However, due to errors in the actual motion, residuals will appear at both ends of the above formula, namely:
Where Res j is a residual, if Res j is smaller than the third threshold, that is, res j<thresholddynamic, then the point is considered to be a static feature point and is classified as set D s, otherwise, it is considered to be a dynamic feature point and is classified as set D d.
The third threshold is a manually adjustable parameter according to the sensor, as an alternative embodiment, set to 0.07 meters.
And S2.4, constructing a dynamic and static binary label according to the discrimination result of the dynamic and static points.
S2.4.1 traversing all dynamic feature points and voting on pixel points in the interested area;
s2.4.2 traversing all the static feature points and setting 0 for the number of tickets of the pixel points in the region of interest, wherein the purpose of traversing all the static points is to suppress noise.
S2.4.3, counting the total ticket number of each pixel point in each frame of image, judging whether the total ticket number is larger than a fourth threshold value, if so, considering the pixel point as a dynamic point, otherwise, considering the pixel point as a static point;
as an alternative embodiment, the fourth threshold is set to 5.
S2.4.4, binarizing the total number of tickets of each pixel point to construct a dynamic and static binarization label.
The pseudo code for this process is shown below, where (h, w) is the height and width of the image, and p [0] and p [1] are the x and y coordinates of the p-point, respectively:
Vote=np.zeros(h,w)
for p in Ds:
Vote[max(0,p[0]-ws):min(w,p[0]+ws),max(0,p[1]-hs):min(h,p[1]+hs)]+=1
for p in Dd:
Vote[max(0,p[0]-wS):min(w,p[0]+ws),max(0,p[1]-hs):min(h,p[1]+hs)]=0
Vote[Vote>thre]=1
Vote[Vote<=thre]=0
s3, taking dynamic and static binary labels constructed by all continuous frame images as the input of a neural network, and training the neural network by combining a cross entropy loss function;
as an alternative embodiment, the neural network is selected from FCN network or Unet network.
The loss function (loss) of the network is set to cross entropy loss as follows:
loss=CrossEntropy(Vote,output)
where output is the output of the network and is a tensor with channel 2. By the above way, a dynamic object segmentation module can be trained unsupervised. Meanwhile, the unsupervised training mode is also suitable for the lightweight network.
S4, inputting real-time collected continuous image data, point cloud data and positioning data, and dividing a dynamic target and a static target by adopting a trained neural network to obtain a dynamic and static binary divided image so as to realize dynamic obstacle detection.
According to the method for detecting the dynamic obstacle for automatic driving, provided by the invention, plane fitting is carried out by adopting a least square method, and the intersection point is obtained by using the ray of the 2D point and the plane to finish estimation, so that 3D position recovery is accurately carried out on the detected and matched 2D characteristic point, and a foundation is laid for accurately discriminating the dynamic and static point. And then, the dynamic points and the static points of the recovered 3D characteristic points are discriminated, binary labels are constructed for neural network training, and the dynamic and static objects can be segmented more efficiently without depending on labeling data, so that the detection of dynamic obstacles is realized.
The previous description of the embodiments is provided to facilitate a person of ordinary skill in the art in order to make and use the present invention. It will be apparent to those skilled in the art that various modifications can be readily made to these embodiments and the generic principles described herein may be applied to other embodiments without the use of the inventive faculty. Therefore, the present invention is not limited to the above-described embodiments, and those skilled in the art, based on the present disclosure, should make improvements and modifications without departing from the scope of the present invention.