Summary of the invention
In view of the above deficiencies, the present invention provides a method for generating a stereoscopic image from a single image. The method requires only one color image as input and generates new images under different viewing angles, which can be used for virtual reality glasses, naked-eye 3D displays, and the like. It helps the user understand the three-dimensional scene and solves the problem that stereoscopic vision is difficult to construct from a single image.
To achieve the above object, the technical solution of the present invention is a method for generating a stereoscopic image from a single image, which mainly comprises the following steps:
(1) training a depth estimation model on an RGBD image data set;
(2) estimating the depth information of a single input color image;
(3) correcting erroneous parts of the estimated depth through user interaction and fine-tuning of the model parameters;
(4) applying a foreground protection operation to the estimated depth map so that depth edges are better aligned with the edges of the color image;
(5) computing the parallax from the image and the depth information to obtain the image under a new viewing angle;
(6) generating, from the depth maps in the data set, hole regions similar to those in images under the new viewing angle;
(7) training a generative adversarial network model for image inpainting with the newly generated data;
(8) fine-tuning the parameters of the inpainting model for the test picture;
(9) inpainting the hole regions of the image under the new viewing angle to obtain the generated stereoscopic image.
Further, step (1) comprises the following sub-steps:
(1.1) data processing: sampling the data set at equal intervals, applying data augmentation by random cropping, horizontal flipping, and color jitter, and labeling relative depth at points sampled from the depth map;
(1.2) the model has an encoder-decoder structure; three convolutional layers extract features at three different scales from the color image, and these features serve as side inputs to the model to restore detail information;
(1.3) the loss function of the model consists of an L1 loss, an L2 loss, and a rank loss;
(1.4) optimization uses stochastic gradient descent; hyperparameters such as the learning rate, batch size, and weight decay are adjusted, and training starts once the parameters are set.
Further, step (3) comprises the following sub-steps:
(3.1) for an erroneous region in the estimated depth map, picking on the image the color whose gray value is closest to the target color of the region to be corrected, and painting it over the erroneous region to obtain an interaction image;
(3.2) applying data augmentation by random cropping, random flipping, and color jitter to the input image;
(3.3) further fine-tuning the model of step (1), with the data from step (3.1) as the ground-truth depth and the data from step (3.2) as the input image;
(3.4) predicting the depth of the input image again with the model fine-tuned in step (3.3) to produce the corrected depth prediction.
Further, step (6) comprises the following sub-steps:
(6.1) initializing a mask matrix of the same size as the image with all values set to 0, and scanning the pixels of the image row by row; if the difference between a pixel and its adjacent pixel exceeds a set threshold, the corresponding position in the mask matrix is set to 1;
(6.2) for each pixel whose value in the mask matrix is 1, computing the difference between its parallax and that of the adjacent pixel, setting all image pixels from that pixel up to the pixel offset by this difference to 0, and setting the corresponding positions of the mask matrix to 1;
(6.3) the pixels whose value in the mask matrix is 1 form the hole region.
Further, step (7) trains the image inpainting model with the Adadelta algorithm.
Further, step (8) comprises the following sub-steps:
(8.1) for an image outside the data set, generating the hole regions and the mask matrix by the method of step (6);
(8.2) applying data augmentation by random cropping, random flipping, and color jitter to the input image;
(8.3) further fine-tuning the model of step (7), with the data from step (8.2) as the input image and the hole-free data as the ground-truth image.
The beneficial effects of the present invention are as follows: only a single color image is needed to generate images under viewing angles within a certain range; regions with prediction errors can be corrected by editing the estimated depth map; through fine-tuning of the model parameters, the method also performs well on images outside the data set; and the generated stereoscopic images give a noticeable stereoscopic effect, helping the user better understand the three-dimensional scene.
Specific embodiment
The present invention will be further explained below with reference to the attached drawings:
As shown in Fig. 1, a method for generating a stereoscopic image from a single image mainly comprises the following steps: training a depth estimation model on an RGBD image data set; estimating the depth information of a single input color image; correcting erroneous parts of the estimated depth through user interaction and fine-tuning of the model parameters; applying a foreground protection operation to the estimated depth map so that depth edges are better aligned with the edges of the color image; computing the parallax from the image and the depth information to obtain the image under a new viewing angle; generating, from the depth maps in the data set, hole regions similar to those in images under the new viewing angle; training a generative adversarial network model for image inpainting with the newly generated data; fine-tuning the parameters of the inpainting model for the test picture; and inpainting the hole regions of the image under the new viewing angle to generate the final stereoscopic image.
Each step is described in detail below:
(1) Training a depth estimation model on an RGBD image data set: this step proposes a depth estimation model with an encoder-decoder structure based on ResNet50. The input of the model is a color image and the output is a single-channel depth map. To supplement the detail information in the predicted depth map, the model includes a side-input structure that adds features at three different scales, extracted from the input color image, in three stages. A side-output structure is also incorporated to aid optimization. The loss function of the model combines an L1 loss, an L2 loss, and a rank loss, where the rank loss helps capture non-local information. The specific steps are as follows:
Model structure design: the present invention builds an encoder-decoder structure, shown in Fig. 2, on the basis of the ResNet50 model proposed in (Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. IEEE Computer Society, 770-778.). To restore the detail information in the predicted depth map, the present invention constructs a side-input structure: features are extracted from the input color image by a new convolutional layer and concatenated with the feature map of the decoder part to serve as the input of the next convolutional layer. Features at three different scales are added to the model in three stages. In addition, the side-output structure proposed in (Saining Xie and Zhuowen Tu. 2017. Holistically-Nested Edge Detection. International Journal of Computer Vision 125, 1-3 (2017), 3-18.) is incorporated to aid model optimization.
Loss function design: the loss function of this model combines an L1 loss, an L2 loss, and a rank loss, where the rank loss helps capture non-local information. Specifically, for an RGBD image I in the data set, the predicted depth image is denoted Z. Each pixel i has RGB information, a ground-truth depth value, and a predicted depth value Z_i. The difference D(I, i) between the predicted value and the true value is defined by:
[equation missing from the source text]
Similarly, the differences of the depth gradients, G_x and G_y, are defined by:
[equation missing from the source text]
The L1 loss is defined as:
[equation missing from the source text]
The L2 loss is defined as:
[equation missing from the source text]
In addition, following (Weifeng Chen, Zhao Fu, Dawei Yang, and Jia Deng. 2016. Single-Image Depth Perception in the Wild. In NIPS. 730-738.), for a randomly selected point pair (i, j), a label r_ij ∈ {+1, -1, 0} is defined according to the depth values, where +1 indicates Z_i > Z_j, -1 indicates Z_i < Z_j, and 0 indicates Z_i = Z_j. The rank loss of this point pair is:
[equation missing from the source text]
Finally, the loss function of the depth estimation model is defined as:
[equation missing from the source text]
where (k, l) denotes a point pair on the j-th training picture and K is the total number of point pairs.
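Because the formulas themselves are not available here, the following is only a minimal sketch of how a loss of this kind could be combined. It assumes the pairwise ranking loss takes the general form used in the cited work of Chen et al. (2016), leaves out the gradient terms G_x and G_y, and uses hypothetical weights `w_l1`, `w_l2`, and `w_rank`; none of these choices is confirmed by the text above.

```python
import math

def rank_loss(z_i, z_j, r):
    """Ranking loss for one point pair (assumed form, after Chen et al. 2016).

    r = +1 expects z_i > z_j, r = -1 expects z_i < z_j, r = 0 expects equality.
    """
    diff = z_i - z_j
    if r == 0:
        return diff * diff  # equal depths: penalise any difference
    return math.log(1.0 + math.exp(-r * diff))  # small when the order is right

def depth_loss(pred, truth, pairs, w_l1=1.0, w_l2=1.0, w_rank=1.0):
    """Combine L1, L2 and rank terms (gradient terms omitted in this sketch).

    pred, truth: flat lists of predicted and ground-truth depths.
    pairs: list of (k, l, r) tuples, with k and l indices into pred.
    """
    n = len(pred)
    l1 = sum(abs(p - t) for p, t in zip(pred, truth)) / n
    l2 = sum((p - t) ** 2 for p, t in zip(pred, truth)) / n
    rank = sum(rank_loss(pred[k], pred[l], r) for k, l, r in pairs) / len(pairs)
    return w_l1 * l1 + w_l2 * l2 + w_rank * rank
```

The rank term depends only on the order of the two predicted depths, which is how it can carry information between pixels that are far apart in the image.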
Model training process: the present invention is trained on the indoor data set NYU Depth v2 (Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. 2012. Indoor Segmentation and Support Inference from RGBD Images. In ECCV.) and the outdoor data set SYNTHIA (German Ros, Laura Sellart, Joanna Materzynska, David Vazquez, and Antonio M. Lopez. 2016. The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)).
For the NYU data set, the original Kinect frame sequences are sampled at equal intervals to obtain 40,000 RGBD images, which are expanded to 120,000 training images by random horizontal flipping and rotation and uniformly downscaled by bilinear interpolation to a resolution of 320*240. During training, the images are randomly cropped to 304*228. In addition, 3000 point pairs are randomly sampled on each depth image and labeled with their relative depth relationship.
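The point-pair labeling can be sketched as follows. This is an illustrative sketch only: the function name, the `tol` tolerance for treating two depths as equal, and the `seed` argument are assumptions, and the labeling convention used is +1 when the first point is deeper, -1 when it is shallower, and 0 when the depths are equal.

```python
import random

def sample_ordinal_pairs(depth, num_pairs=3000, tol=0.0, seed=None):
    """Randomly sample point pairs on a depth map and label their relation.

    depth: 2-D list of depth values (rows of equal length).
    Returns (i, j, r) tuples, where i and j are (row, col) positions and
    r is +1 if depth[i] > depth[j], -1 if smaller, 0 if equal within tol.
    """
    rng = random.Random(seed)
    h, w = len(depth), len(depth[0])
    pairs = []
    for _ in range(num_pairs):
        i = (rng.randrange(h), rng.randrange(w))
        j = (rng.randrange(h), rng.randrange(w))
        z_i, z_j = depth[i[0]][i[1]], depth[j[0]][j[1]]
        if abs(z_i - z_j) <= tol:
            r = 0
        elif z_i > z_j:
            r = 1
        else:
            r = -1
        pairs.append((i, j, r))
    return pairs
```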
For the SYNTHIA data set, the two scenes SEQS-04 and SEQS-06 are used and split at a training-to-test ratio of 9:1, giving 54000 training images and 6000 test images, which are finally downscaled to a resolution of 320*190 for training.
Model training uses stochastic gradient descent with a batch size of 8, and the whole training process lasts 13 epochs. Training begins with a warm-up of several thousand iterations at a learning rate of 0.001; formal training then starts at a learning rate of 0.01, and the learning rate is reduced to 1/10 of its current value every 6 epochs.
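A minimal sketch of such a schedule is given below. The warm-up length `warmup_iters=2000` is a hypothetical value, since only "several thousand" iterations are stated, and counting epochs from the end of the warm-up is likewise an assumption.

```python
def learning_rate(iteration, iters_per_epoch, warmup_iters=2000,
                  warmup_lr=0.001, base_lr=0.01, step_epochs=6, gamma=0.1):
    """Return the learning rate to use at a given training iteration."""
    if iteration < warmup_iters:
        return warmup_lr  # constant warm-up phase
    # Epochs are counted from the end of the warm-up (an assumption).
    epoch = (iteration - warmup_iters) // iters_per_epoch
    # Multiply by gamma once for every completed block of step_epochs epochs.
    return base_lr * gamma ** (epoch // step_epochs)
```

With 13 epochs of formal training, this schedule uses 0.01 for epochs 0 to 5, 0.001 for epochs 6 to 11, and 0.0001 for epoch 12.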
(2) Estimating the depth information of a single input color image: at test time, the picture is center-cropped to 304*228, and the output value of the model is mapped back to a range of 10 meters by an exponential function. Examples of prediction results are shown in Fig. 3.
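The center crop can be sketched as below, with the image represented as a plain 2-D list for illustration. The exponential mapping is not sketched because its exact form is not given in the text.

```python
def center_crop(image, out_w=304, out_h=228):
    """Crop the central out_w x out_h window from a 2-D list of pixels."""
    h, w = len(image), len(image[0])
    if out_h > h or out_w > w:
        raise ValueError("crop size exceeds image size")
    top = (h - out_h) // 2
    left = (w - out_w) // 2
    return [row[left:left + out_w] for row in image[top:top + out_h]]
```

For a 320*240 input this removes 8 columns on each side and 6 rows at the top and bottom.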
(3) Correcting erroneous parts of the estimated depth through user interaction and fine-tuning of the model parameters: when the model is used directly to estimate image depth, the result may contain local errors, as shown in Fig. 4c. The present invention provides an interactive tool to help correct the erroneous regions. The user picks on the image a color close to the target color of the region to be corrected and paints it over the region, obtaining the interaction image of Fig. 4d. The interaction image then takes the place of the ground-truth depth map (Fig. 4b), and the model trained in step (1) is fine-tuned with the color image of Fig. 4a as input; during training, random cropping, random flipping, and color jitter are applied to the input image for data augmentation. Finally, the image is predicted again with the fine-tuned model, giving the corrected prediction of Fig. 4e. Here, model parameter fine-tuning means taking a single input picture, expanding it into a data set through random cropping, random mirroring, random color jitter, and the like, and further training the previously trained model on this data set to improve its handling of the current picture.
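Expanding one picture into a small fine-tuning data set might look like the sketch below. The function names, the per-channel gain model of color jitter, and the `jitter` range are assumptions made for illustration; images are plain 2-D lists of (r, g, b) tuples with values in [0, 1]. The same crop and flip are applied to the depth target so that the pair stays aligned, while the color jitter affects only the image.

```python
import random

def augment_pair(image, depth, crop_w, crop_h, rng, jitter=0.1):
    """Apply one random crop, flip and color jitter to an image/depth pair."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    flip = rng.random() < 0.5

    def geometric(grid):
        # Identical crop window and flip for the image and the depth map.
        rows = [row[left:left + crop_w] for row in grid[top:top + crop_h]]
        return [row[::-1] for row in rows] if flip else rows

    # Hypothetical color jitter: one random gain per channel, clamped to [0, 1].
    gains = [1.0 + rng.uniform(-jitter, jitter) for _ in range(3)]
    jittered = [[tuple(min(1.0, max(0.0, c * g)) for c, g in zip(px, gains))
                 for px in row] for row in geometric(image)]
    return jittered, geometric(depth)

def make_finetune_set(image, depth, n, crop_w, crop_h, seed=0):
    """Expand a single image/depth pair into n augmented training samples."""
    rng = random.Random(seed)
    return [augment_pair(image, depth, crop_w, crop_h, rng) for _ in range(n)]
```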
(4) Applying a foreground protection operation to the estimated depth map so that depth edges are better aligned with the edges of the color image: the edges of the depth map predicted by the model may not be fully aligned with the edges of the color image. The foreground object protection method proposed in (Xiaohan Lu, Fang Wei, and Fangmin Chen. 2012. Foreground-Object-Protected Depth Map Smoothing for DIBR. In 2012 IEEE International Conference on Multimedia and Expo. 339-343.) can be used to better align the depth edges with the edges of the color image.
(5) Computing the parallax from the image and the depth information to obtain the image under a new viewing angle: the parallax of the stereoscopic images generated in the present invention lies along the X direction and is calculated by the following formula:
[equation missing from the source text]
where x' denotes the pixel position in the input image and x denotes the new pixel position; B denotes the distance of the viewpoint shift, i.e. the baseline; f denotes the focal length of the camera, which is determined by the data set; and Z denotes the depth at x'. With this formula the image can be transformed to the new viewing angle. During the viewpoint transformation, hole regions form where previously occluded regions become visible. Holes 1 pixel wide can be filled directly by interpolation, while larger holes are filled in the subsequent steps by image inpainting with a generative adversarial network.
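As an illustration of the viewpoint transformation, the sketch below warps a single image row. It assumes the standard disparity relation d = B * f / Z and a shift of x = x' - d; both the relation and the shift direction are assumptions, since the formula is not available here. Pixels are grayscale numbers, and `None` marks a hole.

```python
def warp_row(colors, depths, baseline, focal):
    """Forward-warp one image row to a new viewpoint (assumed d = B * f / Z)."""
    w = len(colors)
    out = [None] * w                     # None marks a hole
    out_depth = [float("inf")] * w
    for x_src in range(w):
        z = depths[x_src]
        x_new = x_src - int(round(baseline * focal / z))
        # When two source pixels land on the same target, keep the nearer one.
        if 0 <= x_new < w and z < out_depth[x_new]:
            out[x_new] = colors[x_src]
            out_depth[x_new] = z
    return out

def fill_single_pixel_holes(row):
    """Fill holes exactly one pixel wide by averaging the two neighbors."""
    filled = list(row)
    for x in range(1, len(row) - 1):
        if row[x] is None and row[x - 1] is not None and row[x + 1] is not None:
            filled[x] = (row[x - 1] + row[x + 1]) / 2.0
    return filled
```

In this sketch a near object shifts further than the background behind it, so the hole opens on the side of the object that the shift uncovers; holes wider than one pixel, and holes at the image border, are left as `None` for the inpainting stage.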
(6) Generating, from the depth maps in the data set, hole regions similar to those in images under the new viewing angle: the hole regions produced by the viewpoint transformation usually lie close to object edges and are elongated in shape. To improve the inpainting results of the generative adversarial network, the present invention proposes an algorithm that generates images with hole regions from a depth map. The algorithm scans the depth difference between each pixel and its adjacent pixel to determine whether a hole region may arise at that position, and computes the parallax between the pixel and its adjacent pixel to determine the size of the hole region there. The hole regions generated by this algorithm are similar in position and shape to those in stereoscopic images. The specific steps are as follows:
(6.1) Simulating the positions where hole regions arise: assuming that the right view of the input image is generated, hole regions appear on the right side of foreground objects. A mask image is initialized by the following formula:
[equation missing from the source text]
where Z(x, y) denotes the depth value at point (x, y) and θ is a threshold set by the system. M(x, y) = 1 indicates that a hole may appear at this point when the viewing angle changes.
(6.2) Simulating the size of hole regions: along the X direction, the size of the hole near point (x, y) is determined by the parallax at the two points (x, y) and (x+1, y), according to the following formula:
[equation missing from the source text]
All pixels from pixel (x+1, y) up to pixel (x+1+s(x, y), y) are regarded as the hole region, and the corresponding positions of the mask image are set to 1. The generated images with holes are shown in Fig. 5a to Fig. 5f, where the white parts indicate the hole regions.
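The two sub-steps can be sketched together as follows. Several details are assumptions rather than the method as specified: an edge is detected where the depth increases by more than `theta` from (x, y) to (x+1, y), the hole width s is taken as the rounded difference of the disparities B * f / Z at the two points, the end of the hole span is treated as exclusive, and only the hole pixels (not the edge pixel itself) are written into the returned mask.

```python
def simulate_hole_mask(depth, baseline, focal, theta):
    """Simulate the hole mask a right-view warp would produce from a depth map.

    depth: 2-D list of depth values. Returns a 2-D list, 1 = hole pixel.
    """
    h, w = len(depth), len(depth[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w - 1):
            z_near, z_far = depth[y][x], depth[y][x + 1]
            # Assumed edge test: foreground at x, background starting at x + 1.
            if z_far - z_near > theta:
                # Assumed hole width: disparity difference across the edge.
                s = int(round(baseline * focal / z_near
                              - baseline * focal / z_far))
                for hole_x in range(x + 1, min(w, x + 1 + s)):
                    mask[y][hole_x] = 1
    return mask
```

Setting the image pixels under the mask to 0 then gives a training input for the inpainting model, with the untouched image as its target.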
(7) Training a generative adversarial network model for image inpainting with the newly generated data: the model adopts the structure in (Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. 2017. Globally and Locally Consistent Image Completion. ACM Trans. Graph. 36, 4 (2017), 107:1-107:14.) and is trained separately on the NYU data set at a resolution of 560*426 and on the SYNTHIA data set at 640*380. The input consists of the image with the hole regions generated in step (6) together with the mask image, and the ground-truth image is the original complete image. Training uses the Adadelta algorithm and lasts 10 epochs with a batch size of 24.
(8) Fine-tuning the parameters of the inpainting model for the test picture: for a picture that is not in the data set, the present invention generates hole regions from the predicted depth map processed in step (4), applies data augmentation by random cropping and horizontal flipping to the input picture, and uses the augmented data to fine-tune the parameters of the model from step (7). Examples of results before and after fine-tuning are shown in Fig. 6.
(9) Inpainting the hole regions of the image under the new viewing angle: the generated result is shown in Fig. 7, where the white boxes mark regions in which the stereoscopic effect is evident.