Disclosure of Invention
In order to at least partially overcome the above problems in the prior art, the present invention provides a binocular vision system and a three-dimensional reconstruction method and apparatus thereof.
According to an aspect of the present invention, there is provided a three-dimensional reconstruction method of a binocular vision system, including: acquiring two images by using a binocular camera with a pre-calibrated parameter model; based on the two images, acquiring identification information and a detection window of a target object image by using a YOLO target detection algorithm, and performing stereo matching on the detection windows of the two images as feature points to obtain a space discrete point cloud of the target object image; and acquiring the position coordinates of the target object in the actual space based on the space discrete point cloud and the parameter model by a triangulation principle so as to complete three-dimensional reconstruction.
The parameter model of the binocular camera is calibrated by an OpenCV-based planar camera calibration method.
Wherein the parametric model comprises internal parameters, external parameters, and distortion parameters; the internal parameters are the internal structural parameters of the binocular camera; the external parameters comprise a rotation matrix and a translation matrix of the binocular camera; the distortion parameters include radial distortion and tangential distortion.
Wherein the neural network architecture model used in the YOLO target detection algorithm is Darknet-19; wherein the Darknet-19 comprises 19 convolutional layers and 5 max-pooling layers, and the convolutional layers use two kernel sizes, 1 × 1 and 3 × 3.
Wherein the training set used by the Darknet19 is the VOC2012 data set.
The obtaining of the identification information and the detection window of the target object image based on the two images by using a YOLO target detection algorithm specifically includes:
dividing each of the two images into an S × S grid, each grid cell predicting B bounding boxes and their corresponding confidence scores, together with C conditional class probabilities;
the dimension of the detection information output by the YOLO target detection algorithm is S × S × (B × 5 + C).
According to another aspect of the present invention, there is provided a three-dimensional reconstruction apparatus of a binocular vision system, including: the image acquisition module is used for acquiring two images by using a binocular camera with a pre-calibrated parameter model; the processing module is used for acquiring identification information and a detection window of a target object image by using a YOLO target detection algorithm based on the two images, and performing stereo matching on the detection windows of the two images as feature points to obtain a space discrete point cloud of the target object image; and the three-dimensional model generation module is used for acquiring the position coordinates of the target object in the actual space based on the space discrete point cloud and the parameter model by a triangulation principle so as to complete three-dimensional reconstruction.
The apparatus further includes: a parameter model calibration module for calibrating the parameter model of the binocular camera by an OpenCV-based planar camera calibration method.
The apparatus further includes: a target detection module for dividing each of the two images into an S × S grid, each grid cell predicting B bounding boxes and corresponding confidence scores, together with C conditional class probabilities; the dimension of the detection information output by the YOLO target detection algorithm is S × S × (B × 5 + C).
According to still another aspect of the present invention, there is provided a binocular vision system including the above three-dimensional reconstruction apparatus of the robot binocular vision system.
In summary, the invention provides a binocular vision system and a three-dimensional reconstruction method and apparatus thereof, wherein a binocular camera with a pre-calibrated parameter model is used to acquire two images; based on the two images, identification information and a detection window of the target object image are obtained by a YOLO target detection algorithm, and stereo matching is performed using the detection windows of the two images as feature points to obtain a spatial discrete point cloud of the target object image; the position coordinates of the target object in actual space are then acquired from the spatial discrete point cloud and the parameter model by the triangulation principle, completing the three-dimensional reconstruction. The three-dimensional reconstruction method of the binocular vision system can identify the target object, perform feature point matching in combination with the identification result, and complete stereoscopic vision, thereby providing the parameters necessary for a service robot to complete grasping and manipulation tasks.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a three-dimensional reconstruction method of a binocular vision system according to an embodiment of the present invention, as shown in fig. 1, including:
s1, acquiring two images by using a binocular camera with a pre-calibrated parameter model;
the acquisition of the stereo image pair uses a group of binocular vision cameras and adopts a parallel stereo vision model. The internal parameters of the two cameras are kept consistent and work independently, and the two cameras are set to be parallel structures when images are collected. And simultaneously shooting to obtain an image A and an image B, wherein the shooting pictures of the two cameras are positioned on the same horizontal line, so that the transformation model between the two images can be regarded as a translation model.
S2, based on the two images, obtaining identification information and a detection window of the target object image by using a YOLO target detection algorithm, and performing stereo matching on the detection windows of the two images as feature points to obtain a space discrete point cloud of the target object image;
and S3, acquiring the position coordinates of the target object in the actual space based on the space discrete point cloud and the parameter model through the principle of triangular mapping so as to complete three-dimensional reconstruction.
Specifically, the position coordinates of the target object in the actual space can be obtained based on the space discrete point cloud and the parameter model by the principle of triangulation to complete three-dimensional reconstruction.
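As a sketch of the triangulation step, the following Python code implements linear (DLT) triangulation from two camera projection matrices using NumPy; the intrinsic matrix, the 0.12 m baseline, and the test point are illustrative assumptions:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: each image point contributes two rows
    of a homogeneous system A X = 0, solved by SVD."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Illustrative parallel rig: shared intrinsics K, 0.12 m baseline along x.
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])         # left camera at origin
P2 = K @ np.hstack([np.eye(3), [[-0.12], [0.0], [0.0]]])  # right camera

X_true = np.array([0.3, -0.1, 2.0])  # a point 2 m in front of the rig
h = np.append(X_true, 1.0)
x1 = (P1 @ h)[:2] / (P1 @ h)[2]
x2 = (P2 @ h)[:2] / (P2 @ h)[2]
print(triangulate(P1, P2, x1, x2))  # recovers approximately [0.3, -0.1, 2.0]
```

Applied to every matched feature point, this yields the spatial discrete point cloud described above.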
In the embodiment of the invention, a binocular camera with a pre-calibrated parameter model is used for acquiring two images; based on the two images, acquiring identification information and a detection window of a target object image by using a YOLO target detection algorithm, and performing stereo matching on the detection windows of the two images as feature points to obtain a space discrete point cloud of the target object image; and acquiring the position coordinates of the target object in the actual space based on the space discrete point cloud and the parameter model by a triangulation principle so as to complete three-dimensional reconstruction. The three-dimensional reconstruction method of the binocular vision system provided by the embodiment of the invention can be used for identifying the target object, matching the characteristic points by combining the identification result and completing the stereoscopic vision, thereby providing necessary parameters for the service robot to complete the grabbing and operating tasks.
On the basis of the above embodiment, parameter model calibration is carried out on the binocular camera by an OpenCV-based planar camera calibration method.
In the embodiment of the invention, the parameter model of the binocular camera is calibrated by the OpenCV-based planar camera calibration method, establishing the relationship between the pixel positions in the images acquired by the binocular camera and the positions of the corresponding scene points.
On the basis of the above embodiment, the parametric model includes internal parameters, external parameters, and distortion parameters; wherein,
the internal parameters are internal structural parameters of the binocular camera;
the external parameters comprise a rotation matrix and a translation matrix of the binocular camera;
the distortion parameters include radial distortion and tangential distortion.
Specifically, the internal parameters are the basic imaging parameters of the binocular camera and represent its internal structural parameters; the external parameters comprise a rotation matrix and a translation matrix of the binocular camera pair and are used to determine the three-dimensional position and orientation of the binocular camera coordinate system relative to the world coordinate system.
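To make the distortion parameters concrete, the following Python sketch applies the radial (k1, k2) and tangential (p1, p2) terms of the distortion model used in OpenCV-style planar calibration to a normalized image coordinate; the coefficient values are illustrative assumptions:

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to a
    normalized image coordinate (x, y) in the standard pinhole model."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

# With all coefficients zero the mapping is the identity.
print(distort(0.1, 0.2, 0.0, 0.0, 0.0, 0.0))   # (0.1, 0.2)
# A negative k1 pulls points toward the image center (barrel direction).
print(distort(0.1, 0.2, -0.2, 0.0, 0.0, 0.0))  # approximately (0.099, 0.198)
```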
On the basis of the above embodiment, the neural network architecture model used in the YOLO target detection algorithm is Darknet-19; wherein,
the Darknet-19 comprises 19 convolutional layers and 5 max-pooling layers, and the convolutional layers use two kernel sizes, 1 × 1 and 3 × 3.
The Darknet-19 network consists of 19 convolutional layers and 5 max-pooling layers; it uses a large number of 3 × 3 filters, doubles the number of channels after each pooling step, interleaves 1 × 1 kernels among the 3 × 3 ones, and uses batch normalization to stabilize training, accelerate convergence, and regularize the model. The network resizes the input image to 416 × 416, downsamples it by a factor of 32, and outputs a feature map of size 13 × 13.
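The 32× downsampling can be checked with a short arithmetic sketch: five stride-2 pooling steps reduce a 416 × 416 input to 13 × 13. The helper function below is generic; the 2 × 2, stride-2 pooling configuration is the standard one assumed here:

```python
def pool_out(size, kernel=2, stride=2, pad=0):
    """Spatial size after one pooling (or convolution) step."""
    return (size + 2 * pad - kernel) // stride + 1

size = 416
for _ in range(5):      # five 2x2 max-pooling layers, stride 2
    size = pool_out(size)
print(size)  # 13, i.e. 416 / 2**5
```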
On the basis of the above embodiment, the training set used by the Darknet-19 is the VOC2012 data set.
Specifically, each image in the VOC2012 data set has a corresponding annotation file, which gives the bounding box and class label of each object appearing in the image; a single picture may contain multiple objects belonging to multiple categories.
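As an illustration of how these annotation files are read, the following Python sketch parses a VOC-style XML fragment with the standard library; the field names follow the actual VOC schema, but the sample object names and coordinates are made up:

```python
import xml.etree.ElementTree as ET

# Illustrative annotation in the VOC XML format (values are invented).
SAMPLE = """<annotation>
  <object><name>person</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return a list of (class_label, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((name, tuple(int(bb.findtext(t))
                                  for t in ("xmin", "ymin", "xmax", "ymax"))))
    return boxes

print(parse_voc(SAMPLE))
# [('person', (48, 240, 195, 371)), ('dog', (8, 12, 352, 498))]
```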
On the basis of the above embodiment, the obtaining of the identification information and the detection window of the target object image based on the two images by using the YOLO target detection algorithm specifically includes:
dividing each of the two images into an S × S grid, each grid cell predicting B bounding boxes and their corresponding confidence scores, together with C conditional class probabilities;
the dimension of the detection information output by the YOLO target detection algorithm is S × S × (B × 5 + C).
Specifically, an input image is divided into an S × S grid; each grid cell predicts B bounding boxes and the confidence scores of those boxes, together with C conditional class probabilities. The information for each predicted box is 5-dimensional, comprising 4-dimensional coordinate information (center point coordinates plus target width and height) and the confidence of the target, so the final output dimension is S × S × (B × 5 + C), and the box information and target categories required for detection are regressed on each grid cell. In this embodiment, the YOLO target detection algorithm is applied separately to the two input images of the binocular camera, and it returns the recognition window of the detected object, the confidence of the target classification, and the 4-dimensional coordinate information.
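The output dimension can be sketched as a one-line computation; the S = 7, B = 2, C = 20 setting below is the configuration from the original YOLO paper, used here only as an illustrative assumption:

```python
def yolo_output_dim(S, B, C):
    """Each of the S*S grid cells predicts B boxes (4 coordinates plus a
    confidence) together with C conditional class probabilities."""
    return S * S * (B * 5 + C)

print(yolo_output_dim(S=7, B=2, C=20))  # 1470
```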
Fig. 2 is a schematic structural diagram of a three-dimensional reconstruction apparatus of a binocular vision system according to an embodiment of the present invention, as shown in fig. 2, including: the system comprises an image acquisition module 101, a processing module 102 and a three-dimensional model generation module 103; wherein,
the image acquisition module 101 is configured to acquire two images by using a binocular camera with a pre-calibrated parameter model;
the acquisition of the stereo image pair uses a group of binocular vision cameras and adopts a parallel stereo vision model. The internal parameters of the two cameras are kept consistent and work independently, and the two cameras are set to be parallel structures when images are collected. And simultaneously shooting to obtain an image A and an image B, wherein the shooting pictures of the two cameras are positioned on the same horizontal line, so that the transformation model between the two images can be regarded as a translation model.
The processing module 102 is configured to obtain identification information and a detection window of a target object image based on the two images by using a YOLO target detection algorithm, perform stereo matching on the detection windows of the two images as feature points, and obtain a spatial discrete point cloud of the target object image;
preferably, the neural network architecture model used by the YOLO target detection algorithm is Darknet 19; wherein,
the Darknet19 includes 19 connected layers and 5 maxporoling layers, and the conv layer includes two Kernels of 1 × 1 and 3 × 3.
The Darknet19 network consisted of 19 confluent layers and 5 maxporoling layers, used a large number of 3 x 3 filters and doubled the number of channels after each pooling step, conv layers comprised of two Kernels 1 x 1 and 3 x 3, and used batch normalization to stabilize training, accelerate convergence, and regularize the model. The input of the network needs to cut the size of the input image to 416 × 416, and perform 32 times of dimensionality reduction sampling on the image, and output a feature map with the size of 13 × 13.
Preferably, the training set used by the Darknet19 is the VOC2012 data set.
Specifically, each image in the VOC2012 data set has a corresponding annotation file, which gives the bounding box and class label of each object appearing in the image; a single picture may contain multiple objects belonging to multiple categories.
The three-dimensional model generation module 103 is configured to obtain a position coordinate of the target object in an actual space based on the spatial discrete point cloud and the parameter model according to a principle of triangulation, so as to complete three-dimensional reconstruction.
Specifically, the position coordinates of the target object in the actual space can be obtained based on the space discrete point cloud and the parameter model by the principle of triangulation to complete three-dimensional reconstruction.
In the embodiment of the invention, the image acquisition module is used for acquiring two images by using a binocular camera with a pre-calibrated parameter model; the processing module is used for acquiring identification information and a detection window of a target object image by using a YOLO target detection algorithm based on the two images, and performing stereo matching on the detection windows of the two images as feature points to acquire a space discrete point cloud of the target object image; and the three-dimensional model generation module is used for acquiring the position coordinates of the target object in the actual space based on the space discrete point cloud and the parameter model by a triangulation principle so as to complete three-dimensional reconstruction. The three-dimensional reconstruction device of the binocular vision system provided by the embodiment of the invention can identify the target object, and performs characteristic point matching by combining the identification result to complete stereoscopic vision, thereby providing necessary parameters for the service robot to complete the grabbing and operating tasks.
On the basis of the above embodiment, the apparatus further includes: a parameter model calibration module for calibrating the parameter model of the binocular camera by an OpenCV-based planar camera calibration method.
In the embodiment of the invention, the parameter model calibration module calibrates the parameter model of the binocular camera by the OpenCV-based planar camera calibration method, establishing the relationship between the pixel positions in the images acquired by the binocular camera and the positions of the corresponding scene points.
Wherein the parametric model comprises internal parameters, external parameters, and distortion parameters; wherein,
the internal parameters are internal structural parameters of the binocular camera;
the external parameters comprise a rotation matrix and a translation matrix of the binocular camera;
the distortion parameters include radial distortion and tangential distortion.
Specifically, the internal parameters are the basic imaging parameters of the binocular camera and represent its internal structural parameters; the external parameters comprise a rotation matrix and a translation matrix of the binocular camera pair and are used to determine the three-dimensional position and orientation of the binocular camera coordinate system relative to the world coordinate system.
On the basis of the above embodiment, the apparatus further includes: a target detection module for dividing each of the two images into an S × S grid, each grid cell predicting B bounding boxes and corresponding confidence scores, together with C conditional class probabilities;
the dimension of the detection information output by the YOLO target detection algorithm is S × S × (B × 5 + C).
Specifically, the target detection module is configured to divide an input image into an S × S grid; each grid cell predicts B bounding boxes and the confidence scores of those boxes, together with C conditional class probabilities. The information for each predicted box is 5-dimensional, comprising 4-dimensional coordinate information (center point coordinates plus target width and height) and the confidence of the target; the final output dimension is S × S × (B × 5 + C), and the box information and target categories required for detection are regressed on each grid cell.
On the basis of the above embodiments, a binocular vision system is provided, characterized in that the binocular vision system comprises the three-dimensional reconstruction apparatus of the robot binocular vision system according to any one of claims 7 to 9.
According to the binocular vision system provided by the embodiment of the invention, the three-dimensional reconstruction device in the binocular vision system can identify the target object, and the characteristic point matching is carried out by combining the identification result, so that the stereoscopic vision is completed, and therefore, necessary parameters are provided for the service robot to complete the grabbing and operating tasks.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.