Disclosure of Invention
In order to at least partially overcome the above problems in the prior art, the present invention provides a binocular vision system and a three-dimensional reconstruction method and apparatus thereof.
According to an aspect of the present invention, there is provided a three-dimensional reconstruction method of a binocular vision system, including: acquiring two images by using a binocular camera with a pre-calibrated parameter model; based on the two images, acquiring identification information and a detection window of a target object image by using a YOLO target detection algorithm, and performing stereo matching on the detection windows of the two images as feature points to obtain a space discrete point cloud of the target object image; and acquiring the position coordinates of the target object in the actual space based on the space discrete point cloud and the parameter model by a triangulation principle so as to complete three-dimensional reconstruction.
The parameter model of the binocular camera is calibrated by an OpenCV-based planar camera calibration method.
Wherein the parametric model comprises internal parameters, external parameters, and distortion parameters; the internal parameters are the internal structural parameters of the binocular camera; the external parameters comprise a rotation matrix and a translation matrix of the binocular camera; the distortion parameters include radial distortion and tangential distortion.
Wherein the neural network architecture model used in the YOLO target detection algorithm is Darknet-19; wherein the Darknet-19 comprises 19 convolutional layers and 5 max-pooling layers, and the convolutional layers use two kernel sizes, 1 × 1 and 3 × 3.
Wherein the training set used by the Darknet19 is the VOC2012 data set.
The obtaining of the identification information and the detection window of the target object image based on the two images by using a YOLO target detection algorithm specifically includes:
dividing each of the two images into an S × S grid, each grid cell predicting B bounding boxes and their corresponding confidence scores, together with C conditional class probabilities;
the dimension of the detection information output by the YOLO target detection algorithm is S × S × (B × 5 + C).
According to another aspect of the present invention, there is provided a three-dimensional reconstruction apparatus of a binocular vision system, including: the image acquisition module is used for acquiring two images by using a binocular camera with a pre-calibrated parameter model; the processing module is used for acquiring identification information and a detection window of a target object image by using a YOLO target detection algorithm based on the two images, and performing stereo matching on the detection windows of the two images as feature points to obtain a space discrete point cloud of the target object image; and the three-dimensional model generation module is used for acquiring the position coordinates of the target object in the actual space based on the space discrete point cloud and the parameter model by a triangulation principle so as to complete three-dimensional reconstruction.
The apparatus further includes: a parameter model calibration module for calibrating the parameter model of the binocular camera by an OpenCV-based planar camera calibration method.
The apparatus further includes: a target detection module for dividing each of the two images into an S × S grid, each grid cell predicting B bounding boxes and corresponding confidence scores, together with C conditional class probabilities; the dimension of the detection information output by the YOLO target detection algorithm is S × S × (B × 5 + C).
According to still another aspect of the present invention, there is provided a binocular vision system including the above three-dimensional reconstruction apparatus of the robot binocular vision system.
In summary, the invention provides a binocular vision system and a three-dimensional reconstruction method and apparatus thereof, wherein a binocular camera with a pre-calibrated parameter model is used to acquire two images; based on the two images, identification information and a detection window of the target object image are obtained by a YOLO target detection algorithm, and stereo matching is performed using the detection windows of the two images as feature points to obtain a spatial discrete point cloud of the target object image; the position coordinates of the target object in actual space are then acquired from the spatial discrete point cloud and the parameter model by the triangulation principle, completing the three-dimensional reconstruction. The three-dimensional reconstruction method of the binocular vision system can identify the target object, perform feature point matching in combination with the identification result, and complete stereoscopic vision, thereby providing the parameters necessary for a service robot to complete grasping and manipulation tasks.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a three-dimensional reconstruction method of a binocular vision system according to an embodiment of the present invention, as shown in fig. 1, including:
s1, acquiring two images by using a binocular camera with a pre-calibrated parameter model;
the acquisition of the stereo image pair uses a group of binocular vision cameras and adopts a parallel stereo vision model. The internal parameters of the two cameras are kept consistent and work independently, and the two cameras are set to be parallel structures when images are collected. And simultaneously shooting to obtain an image A and an image B, wherein the shooting pictures of the two cameras are positioned on the same horizontal line, so that the transformation model between the two images can be regarded as a translation model.
S2, based on the two images, obtaining identification information and a detection window of the target object image by using a YOLO target detection algorithm, and performing stereo matching on the detection windows of the two images as feature points to obtain a space discrete point cloud of the target object image;
and S3, acquiring the position coordinates of the target object in the actual space based on the space discrete point cloud and the parameter model through the principle of triangular mapping so as to complete three-dimensional reconstruction.
Specifically, the position coordinates of the target object in the actual space can be obtained based on the space discrete point cloud and the parameter model by the principle of triangulation to complete three-dimensional reconstruction.
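As a sketch of the triangulation step, the following Python code implements linear (DLT) triangulation from two camera projection matrices using NumPy; the intrinsic matrix, the 0.12 m baseline, and the test point are illustrative assumptions:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: each image point contributes two rows
    of a homogeneous system A X = 0, solved by SVD."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Illustrative parallel rig: shared intrinsics K, 0.12 m baseline along x.
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])         # left camera at origin
P2 = K @ np.hstack([np.eye(3), [[-0.12], [0.0], [0.0]]])  # right camera

X_true = np.array([0.3, -0.1, 2.0])  # a point 2 m in front of the rig
h = np.append(X_true, 1.0)
x1 = (P1 @ h)[:2] / (P1 @ h)[2]
x2 = (P2 @ h)[:2] / (P2 @ h)[2]
print(triangulate(P1, P2, x1, x2))  # recovers approximately [0.3, -0.1, 2.0]
```

Applied to every matched feature point, this yields the spatial discrete point cloud described above.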
In the embodiment of the invention, a binocular camera with a pre-calibrated parameter model is used for acquiring two images; based on the two images, acquiring identification information and a detection window of a target object image by using a YOLO target detection algorithm, and performing stereo matching on the detection windows of the two images as feature points to obtain a space discrete point cloud of the target object image; and acquiring the position coordinates of the target object in the actual space based on the space discrete point cloud and the parameter model by a triangulation principle so as to complete three-dimensional reconstruction. The three-dimensional reconstruction method of the binocular vision system provided by the embodiment of the invention can be used for identifying the target object, matching the characteristic points by combining the identification result and completing the stereoscopic vision, thereby providing necessary parameters for the service robot to complete the grabbing and operating tasks.
On the basis of the above embodiment, parameter model calibration is carried out on the binocular camera by an OpenCV-based planar camera calibration method.
In the embodiment of the invention, the parameter model of the binocular camera is calibrated by the OpenCV-based planar camera calibration method, establishing the relationship between the pixel positions in the images acquired by the binocular camera and the positions of the corresponding scene points.
On the basis of the above embodiment, the parametric model includes internal parameters, external parameters, and distortion parameters; wherein,
the internal parameters are internal structural parameters of the binocular camera;
the external parameters comprise a rotation matrix and a translation matrix of the binocular camera;
the distortion parameters include radial distortion and tangential distortion.
Specifically, the internal parameters are the basic imaging parameters of the binocular camera and represent its internal structural parameters; the external parameters comprise a rotation matrix and a translation matrix of the binocular camera pair and are used to determine the three-dimensional position and orientation of the binocular camera coordinate system relative to the world coordinate system.
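To make the distortion parameters concrete, the following Python sketch applies the radial (k1, k2) and tangential (p1, p2) terms of the distortion model used in OpenCV-style planar calibration to a normalized image coordinate; the coefficient values are illustrative assumptions:

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to a
    normalized image coordinate (x, y) in the standard pinhole model."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

# With all coefficients zero the mapping is the identity.
print(distort(0.1, 0.2, 0.0, 0.0, 0.0, 0.0))   # (0.1, 0.2)
# A negative k1 pulls points toward the image center (barrel direction).
print(distort(0.1, 0.2, -0.2, 0.0, 0.0, 0.0))  # approximately (0.099, 0.198)
```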
On the basis of the above embodiment, the neural network architecture model used in the YOLO target detection algorithm is Darknet-19; wherein,
the Darknet-19 comprises 19 convolutional layers and 5 max-pooling layers, and the convolutional layers use two kernel sizes, 1 × 1 and 3 × 3.
The Darknet-19 network consists of 19 convolutional layers and 5 max-pooling layers; it uses a large number of 3 × 3 filters, doubles the number of channels after each pooling step, interleaves 1 × 1 kernels among the 3 × 3 ones, and uses batch normalization to stabilize training, accelerate convergence, and regularize the model. The network resizes the input image to 416 × 416, downsamples it by a factor of 32, and outputs a feature map of size 13 × 13.
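The 32× downsampling can be checked with a short arithmetic sketch: five stride-2 pooling steps reduce a 416 × 416 input to 13 × 13. The helper function below is generic; the 2 × 2, stride-2 pooling configuration is the standard one assumed here:

```python
def pool_out(size, kernel=2, stride=2, pad=0):
    """Spatial size after one pooling (or convolution) step."""
    return (size + 2 * pad - kernel) // stride + 1

size = 416
for _ in range(5):      # five 2x2 max-pooling layers, stride 2
    size = pool_out(size)
print(size)  # 13, i.e. 416 / 2**5
```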
On the basis of the above embodiment, the training set used by the Darknet-19 is the VOC2012 data set.
Specifically, each image in the VOC2012 data set has a corresponding annotation file, which gives the bounding box and class label of each object appearing in the image; a single picture may contain multiple objects belonging to multiple categories.
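As an illustration of how these annotation files are read, the following Python sketch parses a VOC-style XML fragment with the standard library; the field names follow the actual VOC schema, but the sample object names and coordinates are made up:

```python
import xml.etree.ElementTree as ET

# Illustrative annotation in the VOC XML format (values are invented).
SAMPLE = """<annotation>
  <object><name>person</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return a list of (class_label, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((name, tuple(int(bb.findtext(t))
                                  for t in ("xmin", "ymin", "xmax", "ymax"))))
    return boxes

print(parse_voc(SAMPLE))
# [('person', (48, 240, 195, 371)), ('dog', (8, 12, 352, 498))]
```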
On the basis of the above embodiment, the obtaining of the identification information and the detection window of the target object image based on the two images by using the YOLO target detection algorithm specifically includes:
dividing each of the two images into an S × S grid, each grid cell predicting B bounding boxes and their corresponding confidence scores, together with C conditional class probabilities;
the dimension of the detection information output by the YOLO target detection algorithm is S × S × (B × 5 + C).
Specifically, an input image is divided into an S × S grid; each grid cell predicts B bounding boxes and the confidence scores of those boxes, together with C conditional class probabilities. The information for each predicted box is 5-dimensional, comprising 4-dimensional coordinate information (center point coordinates plus target width and height) and the confidence of the target, so the final output dimension is S × S × (B × 5 + C), and the box information and target categories required for detection are regressed on each grid cell. In this embodiment, the YOLO target detection algorithm is applied separately to the two input images of the binocular camera, and it returns the recognition window of the detected object, the confidence of the target classification, and the 4-dimensional coordinate information.
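The output dimension can be sketched as a one-line computation; the S = 7, B = 2, C = 20 setting below is the configuration from the original YOLO paper, used here only as an illustrative assumption:

```python
def yolo_output_dim(S, B, C):
    """Each of the S*S grid cells predicts B boxes (4 coordinates plus a
    confidence) together with C conditional class probabilities."""
    return S * S * (B * 5 + C)

print(yolo_output_dim(S=7, B=2, C=20))  # 1470
```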
Fig. 2 is a schematic structural diagram of a three-dimensional reconstruction apparatus of a binocular vision system according to an embodiment of the present invention, as shown in fig. 2, including: the system comprises an image acquisition module 101, a processing module 102 and a three-dimensional model generation module 103; wherein,
the image acquisition module 101 is configured to acquire two images by using a binocular camera with a pre-calibrated parameter model;
the acquisition of the stereo image pair uses a group of binocular vision cameras and adopts a parallel stereo vision model. The internal parameters of the two cameras are kept consistent and work independently, and the two cameras are set to be parallel structures when images are collected. And simultaneously shooting to obtain an image A and an image B, wherein the shooting pictures of the two cameras are positioned on the same horizontal line, so that the transformation model between the two images can be regarded as a translation model.
The processing module 102 is configured to obtain identification information and a detection window of a target object image based on the two images by using a YOLO target detection algorithm, perform stereo matching on the detection windows of the two images as feature points, and obtain a spatial discrete point cloud of the target object image;
preferably, the neural network architecture model used by the YOLO target detection algorithm is Darknet 19; wherein,
the Darknet19 includes 19 connected layers and 5 maxporoling layers, and the conv layer includes two Kernels of 1 × 1 and 3 × 3.
The Darknet19 network consisted of 19 confluent layers and 5 maxporoling layers, used a large number of 3 x 3 filters and doubled the number of channels after each pooling step, conv layers comprised of two Kernels 1 x 1 and 3 x 3, and used batch normalization to stabilize training, accelerate convergence, and regularize the model. The input of the network needs to cut the size of the input image to 416 × 416, and perform 32 times of dimensionality reduction sampling on the image, and output a feature map with the size of 13 × 13.
Preferably, the training set used by the Darknet19 is the VOC2012 data set.
Specifically, each image in the VOC2012 data set has a corresponding annotation file, which gives the bounding box and class label of each object appearing in the image; a single picture may contain multiple objects belonging to multiple categories.
The three-dimensional model generation module 103 is configured to obtain a position coordinate of the target object in an actual space based on the spatial discrete point cloud and the parameter model according to a principle of triangulation, so as to complete three-dimensional reconstruction.
Specifically, the position coordinates of the target object in the actual space can be obtained based on the space discrete point cloud and the parameter model by the principle of triangulation to complete three-dimensional reconstruction.
In the embodiment of the invention, the image acquisition module is used for acquiring two images by using a binocular camera with a pre-calibrated parameter model; the processing module is used for acquiring identification information and a detection window of a target object image by using a YOLO target detection algorithm based on the two images, and performing stereo matching on the detection windows of the two images as feature points to acquire a space discrete point cloud of the target object image; and the three-dimensional model generation module is used for acquiring the position coordinates of the target object in the actual space based on the space discrete point cloud and the parameter model by a triangulation principle so as to complete three-dimensional reconstruction. The three-dimensional reconstruction device of the binocular vision system provided by the embodiment of the invention can identify the target object, and performs characteristic point matching by combining the identification result to complete stereoscopic vision, thereby providing necessary parameters for the service robot to complete the grabbing and operating tasks.
On the basis of the above embodiment, the apparatus further includes: a parameter model calibration module for calibrating the parameter model of the binocular camera by an OpenCV-based planar camera calibration method.
In the embodiment of the invention, the parameter model calibration module calibrates the parameter model of the binocular camera by the OpenCV-based planar camera calibration method, establishing the relationship between the pixel positions in the images acquired by the binocular camera and the positions of the corresponding scene points.
Wherein the parametric model comprises internal parameters, external parameters, and distortion parameters; wherein,
the internal parameters are internal structural parameters of the binocular camera;
the external parameters comprise a rotation matrix and a translation matrix of the binocular camera;
the distortion parameters include radial distortion and tangential distortion.
Specifically, the internal parameters are the basic imaging parameters of the binocular camera and represent its internal structural parameters; the external parameters comprise a rotation matrix and a translation matrix of the binocular camera pair and are used to determine the three-dimensional position and orientation of the binocular camera coordinate system relative to the world coordinate system.
On the basis of the above embodiment, the apparatus further includes: a target detection module for dividing each of the two images into an S × S grid, each grid cell predicting B bounding boxes and corresponding confidence scores, together with C conditional class probabilities;
the dimension of the detection information output by the YOLO target detection algorithm is S × S × (B × 5 + C).
Specifically, the target detection module is configured to divide an input image into an S × S grid; each grid cell predicts B bounding boxes and the confidence scores of those boxes, together with C conditional class probabilities. The information for each predicted box is 5-dimensional, comprising 4-dimensional coordinate information (center point coordinates plus target width and height) and the confidence of the target; the final output dimension is S × S × (B × 5 + C), and the box information and target categories required for detection are regressed on each grid cell.
On the basis of the above embodiments, a binocular vision system is provided, characterized in that the binocular vision system comprises the three-dimensional reconstruction apparatus of the robot binocular vision system according to any one of claims 7 to 9.
According to the binocular vision system provided by the embodiment of the invention, the three-dimensional reconstruction device in the binocular vision system can identify the target object, and the characteristic point matching is carried out by combining the identification result, so that the stereoscopic vision is completed, and therefore, necessary parameters are provided for the service robot to complete the grabbing and operating tasks.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.