CN120976449A

CN120976449A - A method and system for cross-source data 3D reconstruction based on improved Gaussian splatting

Info

Publication number: CN120976449A
Application number: CN202511501579.4A
Authority: CN
Inventors: 何欣; 普成宇; 亓洪兴; 杨杭; 曹易农; 吴开乐; 王柄涛; 张晨阳; 周子杰; 陈育伟; 陈弘毅; 刘世界
Original assignee: Hangzhou Institute of Advanced Studies of UCAS
Current assignee: Hangzhou Institute of Advanced Studies of UCAS
Priority date: 2025-10-21
Filing date: 2025-10-21
Publication date: 2025-11-18
Anticipated expiration: 2045-10-21
Also published as: CN120976449B

Abstract

The invention discloses a cross-source data three-dimensional reconstruction method and system based on improved Gaussian splatting. The method comprises: collecting unmanned aerial vehicle (UAV) oblique images and ground panoramic images of a target area and constructing a spatio-temporally associated data set; establishing space-time-coded matching point pairs through an adaptive feature pyramid based on multi-view geometric constraints; performing intelligent incremental cross-source sparse reconstruction and outputting a point cloud and camera parameters; compressing three-dimensional Gaussian kernels into two-dimensional Gaussian primitives through a bi-tangent vector constraint and attaching the primitives to the surface geometry to realize multi-scale reconstruction; and optimizing the primitive parameters through a differentiable renderer, completing multi-scale fine reconstruction through gradient back-propagation, so that a high-precision three-dimensional model is generated. The invention provides accurate multi-scale geometric priors and accurate camera poses for three-dimensional reconstruction, greatly reduces the dependence of traditional three-dimensional reconstruction methods on professional manual operation, and remarkably improves the geometric precision and visual fidelity of the reconstruction result.

Description

Cross-source data three-dimensional reconstruction method and system based on improved Gaussian splatting

Technical Field

The invention relates to the technical field of three-dimensional reconstruction, in particular to a cross-source data three-dimensional reconstruction method and system based on improved Gaussian splatting.

Background

The goal of image-based three-dimensional scene reconstruction is to convert a series of photographs or videos of a scene into a digitized three-dimensional model that can be computationally processed, analyzed and manipulated. It is a difficult, long-standing problem and underpins machine understanding of real-world complexity. Three-dimensional reconstruction technology is important for a wide range of applications such as robot navigation, medical imaging and diagnosis, digital-twin preservation of historical heritage, augmented/virtual reality, and autonomous driving.

Three-dimensional reconstruction technology has developed from the geometric theoretical foundations laid in the 1960s, through the maturation of multi-view geometry methods in the early 21st century, to the revolutionary breakthroughs of deep learning and differentiable rendering in recent years. Traditional geometric methods are built on the framework of Structure from Motion (SfM) and Multi-View Stereo (MVS), with COLMAP as a representative system. Their performance depends heavily on hand-crafted feature extraction algorithms, they perform poorly in weakly textured regions and in scenes with repetitive texture, and their dense reconstruction is inefficient.

With the growth of computing power and the development of deep learning, machine-learning-based three-dimensional reconstruction has opened new directions. Neural Radiance Fields (NeRF) achieve unprecedented rendering quality by representing a scene implicitly and have redefined the technical route of scene representation and rendering, but their computational cost is high, which makes them ill-suited to latency-sensitive applications. 3D Gaussian Splatting (3DGS) combines the advantages of an explicit representation and differentiable rendering: it represents the scene with learnable three-dimensional Gaussian kernels and uses a highly parallel processing pipeline, preserving high reconstruction quality while markedly increasing rendering speed, and it has become a new paradigm in the field. However, its explicit representation also causes problems: outlier Gaussians degrade the reconstruction, and the surface to be reconstructed is difficult to determine from three-dimensional Gaussian kernels.

Therefore, how to eliminate the outlier Gaussian artifacts caused by the insufficient geometric accuracy of image-based three-dimensional Gaussian splatting is a technical problem that researchers in the field need to solve.

Disclosure of Invention

The invention aims to provide a cross-source data three-dimensional reconstruction method and system based on improved Gaussian splatting, in order to solve the problems described in the Background section above.

In order to solve the technical problems, the technical scheme of the invention is as follows:

A cross-source data three-dimensional reconstruction method based on improved Gaussian splatting comprises the following steps:

acquiring an unmanned aerial vehicle oblique photography sequence and a ground panoramic image sequence of a reconstruction target area, and constructing a space-time associated cross-source target scene data set;

cross-source data joint sparse reconstruction: based on the multi-view geometric constraints of the constructed cross-source target scene data set, establishing cross-source matching point pairs with space-time coding through an adaptive-scale feature pyramid, performing incremental sparse reconstruction based on the space-time coding, and outputting a sparse three-dimensional point cloud and camera extrinsic matrices;

adopting an improved Gaussian splatting explicit radiance field: compressing the original three-dimensional Gaussian kernels into two-dimensional Gaussian primitives through a bi-tangent vector constraint, forcing the primitives to conform to the geometric manifold of the reconstruction surface, and performing multi-scale reconstruction on the cross-source sparse reconstruction result;

performing geometric regularization optimization on the two-dimensional Gaussian primitives based on a differentiable renderer, adjusting the primitive parameters through gradient back-propagation, and carrying out progressive reconstruction from the sparse point cloud to dense geometry, thereby generating a high-precision three-dimensional model of the target scene.

Further, acquiring the unmanned aerial vehicle oblique photography sequence and the ground panoramic image sequence of the reconstruction target area and constructing the spatio-temporally associated target scene data set comprises:

using an unmanned aerial vehicle carrying a camera sensor to capture time-sequence-encoded oblique images of the reconstruction target area along a planned path, obtaining the unmanned aerial vehicle oblique photography sequence;

shooting a ground panoramic video of the reconstruction target area with a panoramic camera to supplement cross-source viewing angles and scene detail, obtaining the ground panoramic image sequence;

performing viewpoint unfolding and solving on the ground panoramic image sequence with a dynamic equiangular segmentation strategy: taking each panoramic acquisition point as the center, the video frames are geometrically projected from the sphere to planes to generate an undistorted image sequence covering 360 degrees horizontally, with the azimuth divided into N equal intervals, $\theta_k = 2\pi k / N$, $k = 0, 1, \dots, N-1$, where N is a natural number.

Further, the establishing a cross-source matching point pair with space-time coding through the adaptive scale feature pyramid comprises the following steps:

constructing multi-modal feature pyramids: an adaptive-scale feature pyramid is constructed separately for the unmanned aerial vehicle oblique image sequence and for the unfolded ground panoramic image sequence, and a space-time coding vector $\mathbf{c} = (t, \theta, \varphi)$ is attached to each feature point, where $t$ is the temporal code (unmanned aerial vehicle images are encoded in flight-strip order and ground images in video-frame order) and $(\theta, \varphi)$ is the spherical coordinate code, obtained by back-projecting the pixel coordinates into the initial coordinate system of the acquisition device;

extracting cross-source feature points with the scale-invariant feature transform (SIFT), and matching the space-time-coded cross-source image sequences with a temporally adjacent frame matching method, obtaining cross-source matching point pairs with space-time coding.

Further, performing the incremental sparse reconstruction based on space-time coding and outputting the sparse three-dimensional point cloud and the camera extrinsic matrices comprises:

based on the space-time-coded cross-source matching point pairs, constructing a nonlinear optimization problem under multi-view geometric constraints, screening reliable matching pairs through geometric verification, selecting image pairs with a high degree of matching for initial reconstruction, decomposing the essential matrix to obtain the initial camera poses, and triangulating to generate a seed point cloud;

incrementally registering the image sequences of the cross-source target scene data set: the pose of each new camera is computed with the Perspective-n-Point (PnP) method and the point cloud is expanded, and this process is repeated in the order given by the space-time coding of the cross-source matching point pairs;

when the number of newly added points or registered images exceeds a threshold, optimizing the point cloud and the camera extrinsic matrices by bundle adjustment; this continues until all images are registered, and the final sparse point cloud and camera poses are output.

Further, compressing the original three-dimensional gaussian kernel into a two-dimensional gaussian primitive by bi-tangent vector constraint, comprising:

a compressed two-dimensional Gaussian primitive is represented by a point $\mathbf{p}_k$ in three-dimensional space and a pair of mutually perpendicular unit tangent vectors $\mathbf{t}_u$ and $\mathbf{t}_v$; its normal vector is defined as $\mathbf{t}_w = \mathbf{t}_u \times \mathbf{t}_v$;

scaling factors $s_u$ and $s_v$ control the variance of the two-dimensional Gaussian on its plane, representing the scaling along the tangent directions $\mathbf{t}_u$ and $\mathbf{t}_v$ respectively, so that the three-dimensional Gaussian kernel is reduced to a two-dimensional Gaussian primitive while the surface geometry is preserved;

the final two-dimensional Gaussian primitive is fully defined by its center point $\mathbf{p}_k$ and covariance matrix $\Sigma = \mathbf{R}\mathbf{S}\mathbf{S}^{\mathsf{T}}\mathbf{R}^{\mathsf{T}}$, where the superscript T denotes transposition, $\mathbf{R} = [\mathbf{t}_u, \mathbf{t}_v, \mathbf{t}_w]$ is the rotation matrix defining the spatial orientation of the primitive, and $\mathbf{S} = \operatorname{diag}(s_u, s_v, 0)$ is the scaling matrix defining its shape, with $\operatorname{diag}(\cdot)$ denoting the diagonal matrix with the given diagonal elements.

Further, forcing the primitives to conform to the geometric manifold of the reconstruction surface comprises:

initializing the two-dimensional Gaussian primitives from the sparse point cloud, with each center point $\mathbf{p}_k$ located on a feature point of the reconstruction target geometry, to obtain a preliminary fit of the reconstruction surface;

searching nearest neighbors with the K-nearest-neighbor method, performing principal component analysis on the local point cloud by computing the covariance matrix of the neighbors, and performing eigenvalue decomposition on this covariance matrix to obtain eigenvalues $\lambda_1 \ge \lambda_2 \ge \lambda_3$ and eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$;

based on the local geometry, the eigenvector $\mathbf{v}_3$ corresponding to the smallest eigenvalue is taken as the plane normal vector of the two-dimensional Gaussian primitive, the bi-tangent vectors are initialized as $\mathbf{t}_u = \mathbf{v}_1$ and $\mathbf{t}_v = \mathbf{v}_2$, and the scaling factors $s_u$ and $s_v$ are set so that the two-dimensional Gaussian primitive lies close to the reconstruction surface at initialization.

Further, performing geometric regularization optimization on the two-dimensional Gaussian primitives based on the differentiable renderer comprises:

using the depth regularization loss $\mathcal{L}_d = \sum_{i,j} \omega_i \omega_j \lvert z_i - z_j \rvert$ as the depth optimization loss, where $\omega_i$ and $\omega_j$ are the blending weights of the i-th and j-th intersection points between the ray and the primitive planes, and $z_i$ and $z_j$ are the depths of the i-th and j-th two-dimensional Gaussian primitives from the imaging plane; minimizing this loss reduces the depth differences between the intersection points;

using the normal consistency loss $\mathcal{L}_n = \sum_i \omega_i (1 - \mathbf{n}_i^{\mathsf{T}} \mathbf{N})$ as the normal optimization loss, where $\mathbf{n}_i$ is the plane normal vector of the i-th two-dimensional Gaussian primitive and $\mathbf{N}$ is the normal of the actual surface; aligning the normals of the two-dimensional Gaussians with the actual surface normal keeps the reconstructed surface smooth and the local geometry accurate.

Further, adjusting the primitive parameters through gradient back-propagation comprises:

using an end-to-end differentiable rendering pipeline in which the position coordinates, bi-tangent vector directions, scaling factors, color, and opacity of all two-dimensional Gaussian primitives form the set of trainable variables; in each rendering iteration a synthesized view is generated through differentiable rasterization and a pixel-level loss against the real observed image is computed;

using an adaptive momentum optimizer to back-propagate the error signal along the gradient of the rendering loss; for the geometric parameters, the primitive center coordinates and tangent vector directions are optimized to improve geometric accuracy;

splitting two-dimensional Gaussian primitives whose loss gradient exceeds a threshold so as to fit complex geometry, and removing outlier two-dimensional Gaussian primitives according to their opacity and tangential scale.
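The splitting and pruning rule above can be sketched as follows. This is a minimal illustration, not the patented procedure: the dictionary layout, the threshold values, and the deterministic split into two half-size children along the longer tangent are all hypothetical choices made for readability (the published 3DGS code instead samples child positions from the parent Gaussian).

```python
def densify_and_prune(prims, grad_thresh=2e-4, min_opacity=0.005, max_scale=1.0):
    """Split high-gradient primitives and drop likely outliers (hypothetical rule).

    prims: list of dicts with keys 'center', 't_u', 't_v', 's_u', 's_v',
    'opacity', and 'grad' (accumulated positional-gradient magnitude).
    All thresholds are illustrative values, not taken from the patent.
    """
    out = []
    for p in prims:
        # Prune: nearly transparent, or tangential extent larger than allowed.
        if p["opacity"] < min_opacity or max(p["s_u"], p["s_v"]) > max_scale:
            continue
        if p["grad"] > grad_thresh:
            # Split along the longer tangent into two half-size children.
            if p["s_u"] >= p["s_v"]:
                axis, s = p["t_u"], p["s_u"]
            else:
                axis, s = p["t_v"], p["s_v"]
            for sign in (1.0, -1.0):
                child = dict(p)
                child["center"] = [c + sign * 0.5 * s * a
                                   for c, a in zip(p["center"], axis)]
                child["s_u"], child["s_v"] = p["s_u"] / 2, p["s_v"] / 2
                child["grad"] = 0.0  # reset gradient statistics for the child
                out.append(child)
        else:
            out.append(p)
    return out
```

In a real trainer this would run periodically on GPU tensors, and the optimizer state of split or pruned primitives would have to be updated to match.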

Further, performing multi-scale reconstruction on the cross-source sparse reconstruction result to generate a high-precision model of the target scene comprises:

performing coarse-to-fine cross-source multi-scale reconstruction of the whole target scene based on the space-time coding, using the wide-area coverage of the unmanned aerial vehicle oblique photography data to build the overall geometric framework and structural features of the target scene;

progressively fusing the high-resolution detail of the ground panoramic images through the space-time coding, supplementing fine geometric features such as wall textures and decorative components at the fine-grained level, thereby forming a hierarchical reconstruction process;

using a high-performance GPU for efficient modeling and optimization, and finally generating a high-precision model of the target scene.

The invention also provides a cross-source data three-dimensional reconstruction system based on improved Gaussian splatting, which is used to implement the above cross-source data three-dimensional reconstruction method and comprises the following units:

The data acquisition unit is used to acquire the unmanned aerial vehicle oblique photography sequence and the ground panoramic image sequence of the reconstruction target area, construct the spatio-temporally associated target scene data set, and provide multi-view geometric constraints;

The space-time coding unit is used for carrying out data preprocessing on the target scene data set, and comprises scale matching of a cross-source image sequence, viewpoint unfolding and resolving of the panoramic image sequence and extraction of cross-source characteristic points to obtain a cross-source matching point pair with space-time coding;

The sparse reconstruction unit is used to perform intelligent incremental registration of the image sequences of the cross-source target scene data set, perform incremental sparse reconstruction based on space-time coding, and output the sparse three-dimensional point cloud and camera extrinsic matrices as the initial input of the three-dimensional reconstruction unit;

The three-dimensional reconstruction unit is used to adopt the improved Gaussian splatting explicit radiance field, introduce the bi-tangent vector constraint and geometric regularization optimization, and then perform machine-learning-based multi-scale reconstruction of the target scene with a differentiable renderer, generating a high-precision model of the target scene and completing the three-dimensional reconstruction task.

Compared with the prior art, the invention has the following beneficial effects:

The invention uses cross-source joint sparse reconstruction: based on the multi-view geometric constraints of the spatio-temporally associated target scene data set, it establishes cross-source matching point pairs with space-time coding through an adaptive-scale feature pyramid and performs incremental sparse reconstruction based on space-time coding, providing accurate multi-scale geometric priors and accurate camera poses for three-dimensional reconstruction.

By introducing the bi-tangent vector constraint and geometric regularization optimization, the invention compresses the original three-dimensional Gaussian kernels into two-dimensional Gaussian primitives and forces the primitives to conform to the geometric manifold of the reconstruction surface. This yields a more accurate geometric reconstruction and greatly reduces the reconstruction artifacts caused by the outlier Gaussians present in original three-dimensional Gaussian splatting.

By using an end-to-end differentiable rendering pipeline with an adaptive momentum optimizer, the invention turns the traditional manual modeling workflow into end-to-end, high-fidelity three-dimensional reconstruction driven by machine learning. The model learns the optimal reconstruction strategy automatically and obtains the geometric and visual information of the target scene directly from the cross-source data set images, so that reconstruction runs automatically. This greatly reduces the dependence of traditional three-dimensional reconstruction methods on professional manual operation and remarkably improves the geometric precision and visual fidelity of the reconstruction result.

Drawings

FIG. 1 is a flow chart of the cross-source data three-dimensional reconstruction method based on improved Gaussian splatting.

FIG. 2 is a reconstruction flow chart of an embodiment of the cross-source data three-dimensional reconstruction method based on improved Gaussian splatting according to the present invention.

FIG. 3 is a schematic diagram of the framework of the cross-source data three-dimensional reconstruction system based on improved Gaussian splatting.

Detailed Description

In order to make the technical solution and advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings. It should be noted that these embodiments merely illustrate the core idea of the present invention and are not intended to limit it; technical solutions that those skilled in the art can easily obtain by adjustment or substitution without departing from the principle of the present invention shall be considered to fall within the protection scope of the present invention.

As shown in FIG. 1, to achieve the above objective, the present invention proposes a cross-source data three-dimensional reconstruction method based on improved Gaussian splatting, comprising the following steps:

Step 1, acquiring an unmanned aerial vehicle oblique photography sequence and a ground panoramic image sequence of the reconstruction target area and constructing a spatio-temporally associated target scene data set, which specifically comprises:

Step 1.1, using an unmanned aerial vehicle carrying an imaging sensor to perform time-sequence-encoded multi-angle oblique photography of the reconstruction target area along a planned path, obtaining the unmanned aerial vehicle oblique photography sequence.

Step 1.2, shooting a ground panoramic video of the reconstruction target area with a panoramic camera to supplement cross-source viewing angles and scene detail, obtaining the ground panoramic image sequence. FIG. 2(a) shows the data acquisition mode of a specific embodiment and the resulting target scene data set sequence.

Step 1.3, performing viewpoint unfolding and solving on the ground panoramic image sequence with a dynamic equiangular segmentation strategy: taking the panoramic acquisition point as the center, each video frame is geometrically projected from the sphere to planes to generate an undistorted image sequence covering 360 degrees horizontally. The segmentation angle is determined dynamically by scene complexity, and the azimuths are equally spaced, $\theta_k = 2\pi k / N$, $k = 0, 1, \dots, N-1$, where N is a natural number.

The dynamic equiangular segmentation strategy is known in the prior art. It uses projection at fixed angular intervals: the panoramic video frame is divided equally into N parts (N being a preset constant) by horizontal azimuth, and each part corresponds to the viewing angle of a virtual pinhole camera.
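The equiangular unfolding above can be sketched as a mapping from each pixel of the k-th virtual pinhole view back to the equirectangular panorama frame, which would then be sampled (for example bilinearly) to build the undistorted image. The sketch assumes an equirectangular input, a horizontally looking virtual camera with a chosen horizontal field of view, and the axis and longitude conventions stated in the docstring; the function name and parameters are illustrative, not from the patent.

```python
import math


def pinhole_to_equirect(u, v, k, n_views, fov_deg, width, height, pano_w, pano_h):
    """Map pixel (u, v) of virtual pinhole view k to equirectangular pixel coords.

    Hypothetical conventions (not specified in the source text):
    - view k looks horizontally at azimuth theta_k = 2*pi*k/n_views;
    - longitude 0 is the horizontal centre of the panorama, increasing to the
      right; latitude +pi/2 is the top row;
    - camera axes: x right, y down, z forward.
    """
    # Focal length (pixels) from the horizontal field of view.
    f = (width / 2) / math.tan(math.radians(fov_deg) / 2)
    # Ray through the pixel in the virtual camera frame.
    x = (u - width / 2) / f
    y = (v - height / 2) / f
    z = 1.0
    # Rotate about the vertical (y) axis by the view's azimuth.
    theta_k = 2 * math.pi * k / n_views
    xr = math.cos(theta_k) * x + math.sin(theta_k) * z
    zr = -math.sin(theta_k) * x + math.cos(theta_k) * z
    # Spherical angles of the rotated ray (y points down, so negate it).
    lon = math.atan2(xr, zr)
    lat = math.atan2(-y, math.hypot(xr, zr))
    # Convert to equirectangular pixel coordinates.
    pu = (lon / (2 * math.pi) + 0.5) * pano_w
    pv = (0.5 - lat / math.pi) * pano_h
    return pu, pv
```

For example, with `n_views = 4` the centre pixel of view 1 lands a quarter of the panorama width to the right of the panorama centre, i.e. at azimuth 90 degrees.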

Step 1.4, fusing the unmanned aerial vehicle oblique photography sequence with the unfolded, undistorted panoramic image sequence to construct the spatio-temporally associated cross-source target scene data set.

Step 2, cross-source data joint sparse reconstruction: based on the multi-view geometric constraints of the constructed cross-source target scene data set, cross-source matching point pairs with space-time coding are established through an adaptive-scale feature pyramid, incremental sparse reconstruction is performed based on the space-time coding, and a sparse three-dimensional point cloud and camera extrinsic matrices are output. The key idea of the adaptive-scale feature pyramid is to let the network dynamically select or blend the most suitable feature scale according to the image content (especially its geometric information), instead of applying one fixed scale to all regions. Step 2 specifically comprises:

Step 2.1, constructing multi-modal feature pyramids: an adaptive-scale feature pyramid is constructed separately for the unmanned aerial vehicle oblique image sequence and for the unfolded ground panoramic image sequence, and a space-time coding vector $\mathbf{c} = (t, \theta, \varphi)$ is attached to each feature point, where $t$ is the temporal code (unmanned aerial vehicle images are encoded in flight-strip order and ground images in video-frame order) and $(\theta, \varphi)$ is the spherical coordinate code, obtained by back-projecting the pixel coordinates into the initial coordinate system of the acquisition device.
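A minimal sketch of how a space-time code might be computed for one feature point follows. It assumes a pinhole model with known intrinsics `fx, fy, cx, cy` and a known rotation `R` from the camera frame to the acquisition device's initial frame, and it uses azimuth and elevation as the two spherical angles; the patent text does not fix these conventions, so the function name and definitions here are hypothetical.

```python
import math


def spacetime_code(t, u, v, fx, fy, cx, cy, R):
    """Return a hypothetical space-time code (t, azimuth, elevation) for pixel (u, v).

    t: temporal index (flight-strip order for UAV images, frame order for video).
    fx, fy, cx, cy: pinhole intrinsics (assumed known).
    R: 3x3 rotation (list of rows) from the camera frame to the acquisition
       device's initial coordinate frame (assumed known).
    """
    # Back-project the pixel to a unit ray (camera axes: x right, y down, z forward).
    d = [(u - cx) / fx, (v - cy) / fy, 1.0]
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]
    # Express the ray in the device's initial frame.
    w = [sum(R[i][j] * d[j] for j in range(3)) for i in range(3)]
    # Spherical angles: azimuth about the vertical axis, elevation above horizon.
    azimuth = math.atan2(w[0], w[2])
    elevation = math.asin(max(-1.0, min(1.0, -w[1])))
    return (t, azimuth, elevation)
```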

The multi-modal feature pyramid is a known three-dimensional reconstruction technique that fuses multi-scale information from different sensors, exploiting their complementary strengths to overcome the missing texture, illumination changes, and geometric ambiguity of a single data source. The multi-scale feature pyramid is a known technique that extracts and fuses features at different resolution levels of the same image, capturing fine detail and high-level semantic information at the same time and thereby improving reconstruction accuracy and robustness.

Step 2.2, extracting cross-source feature points with the scale-invariant feature transform, and matching the space-time-coded cross-source image sequences with the temporally adjacent frame matching method, obtaining cross-source matching point pairs with space-time coding.

The scale-invariant feature transform (SIFT) is a conventional algorithm that extracts keypoint features invariant to changes in scale, rotation, and illumination, providing stable and reliable image matches for three-dimensional reconstruction. The temporally adjacent frame matching method is a known basic technique of dynamic three-dimensional reconstruction that estimates camera motion and scene structure by analyzing the feature points shared between the current image and its temporally preceding and following frames in a video sequence.
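The temporally adjacent matching above can be illustrated with two small helpers: one pairs each image with the next few images in temporal-code order (the idea behind sequential matchers in SfM tools), and one matches descriptors with Lowe's ratio test, the usual filter for SIFT matches. Descriptor extraction is omitted, the window size `overlap` and the ratio 0.8 are assumed values, and the rule for pairing images across the two sources is not shown.

```python
import math


def sequential_pairs(codes, overlap):
    """Pair each image with the next `overlap` images in temporal order.

    codes: dict image_id -> temporal code t (within one source sequence).
    """
    order = sorted(codes, key=codes.get)
    pairs = []
    for i, a in enumerate(order):
        for b in order[i + 1:i + 1 + overlap]:
            pairs.append((a, b))
    return pairs


def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.

    Keeps (i, j) only if the best distance is clearly smaller than the
    second best, which rejects ambiguous matches such as repeated texture.
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```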

Step 2.3, based on the space-time-coded cross-source matching point pairs, constructing a nonlinear optimization problem under multi-view geometric constraints, screening reliable matching pairs through geometric verification, performing initial reconstruction from image pairs with a high degree of matching, decomposing the essential matrix to obtain the initial camera poses, and triangulating to generate a seed point cloud. Triangulation here is the geometric computation that generates an initial sparse three-dimensional point cloud by finding, in three-dimensional space, the optimal intersection of the lines of sight through two-dimensional image points successfully matched across several views.
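The "optimal intersection of lines of sight" can be illustrated, for two views, with the simple midpoint method: find the closest points on the two viewing rays and take their midpoint. This is a swapped-in textbook technique chosen for brevity; production SfM systems usually triangulate with linear (DLT) or reprojection-error-based methods, and the patent does not say which variant it uses. Each ray is given by a camera center `c` and a direction `d` in world coordinates.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint triangulation of two viewing rays c + s*d.

    Returns the midpoint of the shortest segment between the rays, or None
    if the rays are (nearly) parallel.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # parallel rays: no unique closest points
    # Parameters of the closest points on each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]
```

With noise-free rays that truly intersect, the midpoint equals the intersection; with noisy matches it gives a reasonable seed point that bundle adjustment can later refine.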

Step 2.4, incrementally registering the image sequences of the cross-source target scene data set: the pose of each new camera is computed with the Perspective-n-Point (PnP) method and the point cloud is expanded, and this process is repeated in the order given by the space-time coding of the cross-source matching point pairs.

Incremental registration is a process that gradually expands and refines the overall three-dimensional reconstruction by aligning and fusing newly acquired three-dimensional scan data (or images) with the existing global scene model, one at a time.

Step 2.5, when the number of newly added points or registered images exceeds a preset threshold, optimizing the point cloud and the camera extrinsic matrices by bundle adjustment; this continues until all images are registered, and the final sparse point cloud and camera poses are output. The sparse reconstruction result is shown in FIG. 2(b), which displays the sparse feature point cloud of the specific embodiment and the poses of the cross-source images in three-dimensional space; this provides a geometric prior for subsequent dense reconstruction.

Bundle adjustment (BA) is a refinement algorithm that obtains a globally optimal three-dimensional structure and motion parameters by jointly optimizing all camera parameters and three-dimensional point coordinates so as to minimize the reprojection error of the points projected onto the two-dimensional images.
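The objective that bundle adjustment minimizes can be written down directly. The sketch below assumes a simple pinhole model without lens distortion, with intrinsics packed as `(fx, fy, cx, cy)` and extrinsics `[R | t]` mapping world to camera coordinates; these data layouts are illustrative. A real BA would minimize this sum over all cameras and points with a nonlinear least-squares solver such as Levenberg-Marquardt.

```python
def project(K, R, t, X):
    """Project world point X with intrinsics K = (fx, fy, cx, cy) and extrinsics [R | t]."""
    # World -> camera coordinates.
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division, then intrinsics.
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]
    fx, fy, cx, cy = K
    return fx * x + cx, fy * y + cy


def total_reprojection_error(cameras, points, observations):
    """Sum of squared reprojection errors, the quantity bundle adjustment minimizes.

    cameras: dict cam_id -> (K, R, t); points: dict pt_id -> [X, Y, Z];
    observations: list of (cam_id, pt_id, (u, v)) measured image points.
    """
    total = 0.0
    for cam_id, pt_id, (u, v) in observations:
        K, R, t = cameras[cam_id]
        pu, pv = project(K, R, t, points[pt_id])
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total
```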

Step 3, adopting an improved Gaussian splatting explicit radiance field: the original three-dimensional Gaussian kernels are compressed into two-dimensional Gaussian primitives through a bi-tangent vector constraint, the primitives are forced to conform to the geometric manifold of the reconstruction surface, and multi-scale reconstruction is performed on the sparse reconstruction result of the cross-source data set. This specifically comprises:

Step 3.1, a compressed two-dimensional Gaussian primitive is represented by a point $\mathbf{p}_k$ in three-dimensional space and a pair of mutually perpendicular unit tangent vectors $\mathbf{t}_u$ and $\mathbf{t}_v$; its normal vector is defined as $\mathbf{t}_w = \mathbf{t}_u \times \mathbf{t}_v$.

Step 3.2, scaling factors $s_u$ and $s_v$ control the variance of the two-dimensional Gaussian on its plane, representing the scaling along the tangent directions $\mathbf{t}_u$ and $\mathbf{t}_v$ respectively; this reduces the three-dimensional Gaussian kernel to a two-dimensional Gaussian primitive while preserving the surface geometry.

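Step 3.3, the final two-dimensional Gaussian primitive is fully defined by its center point $\mathbf{p}_k$ and covariance matrix $\Sigma = \mathbf{R}\mathbf{S}\mathbf{S}^{\mathsf{T}}\mathbf{R}^{\mathsf{T}}$, where $\mathbf{R} = [\mathbf{t}_u, \mathbf{t}_v, \mathbf{t}_w]$ is the rotation matrix defining the spatial orientation of the Gaussian primitive, $\mathbf{S} = \operatorname{diag}(s_u, s_v, 0)$ is the scaling matrix defining its shape, and $\operatorname{diag}(\cdot)$ denotes the diagonal matrix with the given diagonal elements.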
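The covariance of a two-dimensional Gaussian primitive can be assembled directly from its tangent vectors and scales. In this sketch (vectors as plain Python lists, an illustrative choice) the product $\mathbf{R}\mathbf{S}\mathbf{S}^{\mathsf{T}}\mathbf{R}^{\mathsf{T}}$ is expanded as $\sum_k s_k^2\, \mathbf{c}_k \mathbf{c}_k^{\mathsf{T}}$ over the columns $\mathbf{c}_k$ of $\mathbf{R}$; because the third scale is zero, the matrix has rank 2 and the normal $\mathbf{t}_w$ lies in its null space, which is exactly what makes the primitive flat.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def surfel_covariance(t_u, t_v, s_u, s_v):
    """Covariance Sigma = R S S^T R^T of a 2D Gaussian primitive.

    R = [t_u, t_v, t_w] has the tangents and the normal t_w = t_u x t_v as
    columns; S = diag(s_u, s_v, 0), so the primitive has no extent along t_w.
    """
    t_w = cross(t_u, t_v)
    cols = [t_u, t_v, t_w]
    scales = [s_u, s_v, 0.0]
    # (R S)(R S)^T = sum_k s_k^2 * c_k c_k^T, with c_k the k-th column of R.
    return [[sum(scales[k] ** 2 * cols[k][i] * cols[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]
```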

Step 3.4, initializing the two-dimensional Gaussian primitives from the sparse point cloud obtained in step 2.5, with each center point $\mathbf{p}_k$ located on a feature point of the reconstructed object's geometry, to obtain a preliminary fit of the reconstruction surface.

Step 3.5, searching nearest neighbors with the K-nearest-neighbor method, performing principal component analysis on the local point cloud by computing the covariance matrix of the neighbors, and performing eigenvalue decomposition on it to obtain eigenvalues $\lambda_1 \ge \lambda_2 \ge \lambda_3$ and eigenvectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$. The eigenvalues and eigenvectors correspond one to one, each eigenvalue measuring the spread (variance) of the neighborhood along its eigenvector, and the eigenvector with the smallest eigenvalue is selected as the plane normal vector.

The K-nearest-neighbor method (KNN) is a known basic algorithm that quickly finds, for each point in a point cloud, its several nearest neighbors by spatial distance, supporting outlier filtering, surface reconstruction, and feature description.

Step 3.6, based on the local geometry, the eigenvector $\mathbf{v}_3$ corresponding to the smallest eigenvalue is taken as the plane normal vector of the two-dimensional Gaussian primitive, the bi-tangent vectors are initialized as $\mathbf{t}_u = \mathbf{v}_1$ and $\mathbf{t}_v = \mathbf{v}_2$, and the scaling factors $s_u$ and $s_v$ are set so that the two-dimensional Gaussian primitive lies close to the reconstruction surface at initialization.

Specifically, the two-dimensional Gaussian primitives are initialized from the sparse reconstruction point cloud so that they lie closely on the surface of the target building. This makes full use of the geometric prior contained in the sparse point cloud produced by Structure from Motion: the principal tangent-plane direction of each primitive is determined from the eigenvectors of the covariance matrix of the local point-cloud neighborhood. The spherical (isotropically initialized) Gaussian kernels of conventional 3DGS are thereby converted into two-dimensional planar primitives with a definite orientation, which yields a more accurate geometric reconstruction and greatly reduces the reconstruction artifacts caused by the outlier Gaussians present in original three-dimensional Gaussian splatting.
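Steps 3.4 to 3.6 can be sketched end to end in pure Python: brute-force KNN, the neighborhood covariance, a Jacobi eigen-decomposition, and assignment of the two largest principal directions to the tangents. The normal is taken as $\mathbf{t}_u \times \mathbf{t}_v$ so that it agrees with step 3.1 (it equals the smallest-eigenvalue eigenvector up to sign). Setting $s_u$ and $s_v$ to the standard deviations $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$ is an assumption, since the text does not state the initial scale; `k = 8` and all function names are likewise illustrative. A real pipeline would use a KD-tree and a library eigen-solver.

```python
import math


def knn(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] (brute force, excludes idx)."""
    p = points[idx]
    ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != idx)
    return [j for _, j in ranked[:k]]


def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(3)) for j in range(3)] for i in range(3)]


def transpose(X):
    return [list(r) for r in zip(*X)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def jacobi_eigen(A, sweeps=50):
    """Symmetric 3x3 eigen-decomposition by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by descending eigenvalue.
    """
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(3) for j in range(3) if i != j) < 1e-24:
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if abs(A[p][q]) < 1e-15:
                continue
            # Rotation angle that zeroes A[p][q].
            theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
            t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
            c = 1 / math.sqrt(t * t + 1)
            s = t * c
            P = [[float(i == j) for j in range(3)] for i in range(3)]
            P[p][p] = P[q][q] = c
            P[p][q], P[q][p] = s, -s
            A = matmul(transpose(P), matmul(A, P))
            V = matmul(V, P)  # accumulate eigenvectors as columns of V
    pairs = sorted(((A[j][j], [V[i][j] for i in range(3)]) for j in range(3)),
                   key=lambda pr: -pr[0])
    return [val for val, _ in pairs], [vec for _, vec in pairs]


def init_primitive(points, idx, k=8):
    """Initialise a 2D Gaussian primitive at points[idx] from local PCA."""
    nbrs = [points[j] for j in knn(points, idx, k)] + [points[idx]]
    n = len(nbrs)
    mean = [sum(p[i] for p in nbrs) / n for i in range(3)]
    C = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in nbrs) / n
          for j in range(3)] for i in range(3)]
    vals, vecs = jacobi_eigen(C)
    t_u, t_v = vecs[0], vecs[1]   # two largest principal directions
    normal = cross(t_u, t_v)      # +/- eigenvector of the smallest eigenvalue
    # Assumption: initial scales are the standard deviations along the tangents.
    s_u, s_v = math.sqrt(max(vals[0], 0.0)), math.sqrt(max(vals[1], 0.0))
    return {"center": list(points[idx]), "t_u": t_u, "t_v": t_v,
            "normal": normal, "s_u": s_u, "s_v": s_v}
```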

Step 4, performing geometric regularization optimization on the two-dimensional Gaussian primitives based on a differentiable renderer and adjusting the primitive parameters through gradient back-propagation, realizing progressive reconstruction from the sparse point cloud to dense geometry and generating a high-precision three-dimensional model of the target scene. This specifically comprises:

Step 4.1, using the depth regularization penalty $\mathcal{L}_d = \sum_{i,j} \omega_i \omega_j \left| z_i - z_j \right|$ as the depth optimization loss function, where $\omega_i$ is the blending weight of the $i$-th intersection point between the ray and a primitive plane, and $z_i$ and $z_j$ are the depths, relative to the imaging plane, of the $i$-th and $j$-th intersection points with the two-dimensional Gaussian primitives. Minimizing the depth gap between intersection points avoids the depth blur caused by primitives spreading along the ray, reduces Gaussian artifacts, and makes the reconstructed geometry sharper and more accurate.
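The depth penalty can be evaluated for a single ray as below. `depth_distortion` follows the double sum directly; `depth_distortion_linear` is an algebraically equivalent O(n) form that assumes the intersections are sorted by depth, as they are after front-to-back sorting. Both function names are hypothetical.

```python
# Hypothetical sketch of the per-ray depth penalty; names are illustrative.
import numpy as np


def depth_distortion(weights, depths) -> float:
    """L_d = sum_{i,j} w_i * w_j * |z_i - z_j| for one ray (quadratic reference form)."""
    w = np.asarray(weights, dtype=float)
    z = np.asarray(depths, dtype=float)
    return float(np.sum(np.outer(w, w) * np.abs(z[:, None] - z[None, :])))


def depth_distortion_linear(weights, depths) -> float:
    """Same value in O(n) using prefix sums; depths must be sorted ascending."""
    w = np.asarray(weights, dtype=float)
    z = np.asarray(depths, dtype=float)
    cum_w = np.cumsum(w) - w            # sum of w_j over j < i
    cum_wz = np.cumsum(w * z) - w * z   # sum of w_j * z_j over j < i
    # Each unordered pair (j < i) appears twice in the double sum, hence the factor 2.
    return float(2.0 * np.sum(w * (z * cum_w - cum_wz)))
```

The penalty is zero when all weight sits at one depth and grows as weight spreads along the ray, which is exactly the blur the step aims to suppress.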

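Step 4.2, using the normal consistency penalty $\mathcal{L}_n = \sum_i \omega_i \left(1 - \mathbf{n}_i^{\top} \mathbf{N}\right)$ as the normal optimization loss function, where $\omega_i$ is the blending weight of the $i$-th ray-primitive intersection, $\mathbf{n}_i$ is the plane normal vector of the $i$-th two-dimensional Gaussian primitive, and $\mathbf{N}$ is the normal of the actual surface. Aligning the normals of the two-dimensional Gaussian primitives with the actual surface normal keeps the reconstructed surface smooth and the local geometry accurate.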
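A per-ray sketch of the normal penalty follows. Where $\mathbf{N}$ comes from is an assumption here: as in 2D Gaussian splatting, it is estimated from the rendered depth by crossing finite differences of the back-projected point map. The function names are hypothetical, and the sign of the estimated normal depends on the camera's axis convention.

```python
# Hypothetical sketch; the source of N and all names are assumptions.
import numpy as np


def normal_consistency(weights, primitive_normals, surface_normal) -> float:
    """L_n = sum_i w_i * (1 - n_i . N) along one ray, with all normals unit-normalized."""
    w = np.asarray(weights, dtype=float)
    n = np.asarray(primitive_normals, dtype=float)
    N = np.asarray(surface_normal, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    N = N / np.linalg.norm(N)
    return float(np.sum(w * (1.0 - n @ N)))


def normals_from_point_map(P: np.ndarray) -> np.ndarray:
    """Estimate interior per-pixel normals from an (H, W, 3) map of back-projected points."""
    dx = P[1:-1, 2:] - P[1:-1, :-2]   # central difference along image x
    dy = P[2:, 1:-1] - P[:-2, 1:-1]   # central difference along image y
    n = np.cross(dx, dy)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

A primitive whose normal matches the surface contributes nothing; one lying perpendicular to the surface contributes its full weight.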

Step 4.3, using an end-to-end differentiable rendering pipeline, including all parameters of the two-dimensional Gaussian primitives, such as position coordinates, tangent vector directions, scaling factors, color, and opacity, in the set of trainable variables; in each rendering iteration, generating a synthesized view through differentiable rasterization and computing a pixel-level loss against the real observed image.

Differentiable rasterization is the rendering mechanism of the original Gaussian splatting algorithm: it projects three-dimensional Gaussian primitives into two-dimensional screen space and supports gradient back-propagation. Because every step of the rendering process is differentiable, the final image loss can be back-propagated directly to optimize the parameters of the three-dimensional model itself. This is a known technique.
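The blending that makes the rasterizer differentiable can be illustrated for one pixel: primitives sorted front to back are alpha-composited, and the rendered value has a closed-form gradient with respect to each opacity. The sketch uses scalar (grayscale) colors and hypothetical function names; the weights it returns play the role of the blending weights $\omega_i$ of Step 4.1, and the analytic gradient can be checked against finite differences.

```python
# Hypothetical single-pixel sketch of differentiable alpha compositing.
import numpy as np


def composite(alphas, colors):
    """Front-to-back alpha blending of primitives sorted near to far along one ray."""
    a = np.asarray(alphas, dtype=float)
    c = np.asarray(colors, dtype=float)
    # Transmittance T_i = prod_{j<i} (1 - a_j): light that survives to primitive i.
    T = np.concatenate([[1.0], np.cumprod(1.0 - a)[:-1]])
    w = a * T                                  # blending weights w_i = a_i * T_i
    return float(np.sum(w * c)), w


def composite_grad_alpha(alphas, colors):
    """Analytic dC/d(alpha_k), the quantity a rasterizer's backward pass needs (alpha_k < 1)."""
    a = np.asarray(alphas, dtype=float)
    c = np.asarray(colors, dtype=float)
    _, w = composite(a, c)
    T = np.concatenate([[1.0], np.cumprod(1.0 - a)[:-1]])
    # Contribution of everything behind primitive k: sum over i > k of w_i * c_i.
    behind = np.cumsum((w * c)[::-1])[::-1] - w * c
    return c * T - behind / (1.0 - a)
```

Raising one opacity brightens the pixel by that primitive's own color but dims every primitive behind it; the second term of the gradient accounts for that occlusion.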

Step 4.4, using the adaptive moment estimation optimizer to back-propagate the error signal along the gradient of the rendering loss: for the geometric parameters, optimizing the primitive center coordinates and tangent vector directions to improve geometric accuracy; for the appearance parameters, optimizing the primitive opacity and color function to improve rendering fidelity.

The adaptive moment estimation optimizer (Adam) is a known deep learning optimization algorithm that combines momentum with per-parameter adaptive learning rates; in three-dimensional reconstruction it is used to optimize large numbers of parameters of nonlinear models efficiently and to achieve fast, stable convergence.
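A minimal Adam update in NumPy is sketched below, with a separate learning rate per parameter group so that geometric parameters (such as centers) and appearance parameters (such as opacity) can be tuned independently. Per-group rates and the parameter names are assumptions for illustration; the default `beta1`, `beta2`, and `eps` are the usual Adam defaults.

```python
# Hypothetical sketch: per-group learning rates and names are illustrative.
import numpy as np


class Adam:
    """Minimal Adam optimizer with one learning rate per named parameter group."""

    def __init__(self, params, lrs, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = params            # dict: name -> float np.ndarray, updated in place
        self.lrs = lrs                  # dict: name -> learning rate
        self.b1, self.b2, self.eps = beta1, beta2, eps
        self.m = {k: np.zeros_like(v) for k, v in params.items()}  # first-moment EMA
        self.v = {k: np.zeros_like(v) for k, v in params.items()}  # second-moment EMA
        self.t = 0

    def step(self, grads):
        self.t += 1
        for k, g in grads.items():
            self.m[k] = self.b1 * self.m[k] + (1 - self.b1) * g
            self.v[k] = self.b2 * self.v[k] + (1 - self.b2) * g * g
            # Bias correction compensates for the zero-initialized moment estimates.
            m_hat = self.m[k] / (1 - self.b1 ** self.t)
            v_hat = self.v[k] / (1 - self.b2 ** self.t)
            self.params[k] -= self.lrs[k] * m_hat / (np.sqrt(v_hat) + self.eps)
```

Calling `step` once per rendering iteration with the gradients of the rendering loss applies one update to every group.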

Step 4.5, splitting Gaussian primitives whose loss gradient exceeds a threshold so that they fit complex geometric structures, and removing abnormal Gaussian primitives according to their opacity and tangential scale, thereby achieving high-precision modeling.
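One pass of this split-and-prune rule might look like the following. All thresholds are hypothetical placeholders, and splitting is simplified to two children offset along the major tangent axis (the original 3DGS instead samples child positions from the parent Gaussian); the scale divisor of 1.6 follows the original 3DGS paper. The function name and the dictionary layout are illustrative.

```python
# Hypothetical sketch: thresholds, layout, and the split rule are assumptions.
import numpy as np


def densify_and_prune(prims, grad_norms, grad_threshold=2e-4, min_opacity=5e-3,
                      max_scale=1.0, shrink=1.6):
    """prims: dict of per-primitive arrays with leading dimension N, containing at least
    'center' (N, 3), 't_u' (N, 3), 'scale' (N, 2) as (s_u, s_v), and 'opacity' (N,)."""
    # Prune nearly transparent primitives and ones with an abnormally large tangential scale.
    keep = (prims["opacity"] >= min_opacity) & (prims["scale"].max(axis=1) <= max_scale)
    split = keep & (grad_norms > grad_threshold)
    stay = keep & ~split
    out = {}
    for name, arr in prims.items():
        parents = arr[split]
        if name == "center":
            # Two children, offset by half the major scale along the major tangent axis.
            offset = 0.5 * prims["scale"][split, :1] * prims["t_u"][split]
            children = np.concatenate([parents + offset, parents - offset])
        elif name == "scale":
            children = np.concatenate([parents / shrink, parents / shrink])
        else:
            children = np.concatenate([parents, parents])  # copy other attributes
        out[name] = np.concatenate([arr[stay], children])
    return out
```

Unchanged primitives come first in the output, followed by the children of every split primitive.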

Step 4.6, performing coarse-to-fine, multi-scale cross-source reconstruction of the whole target scene based on space-time coding, first using the wide-area coverage of the unmanned aerial vehicle oblique photography data to build the overall geometric framework and main structural features of the scene.

Step 4.7, progressively fusing the high-resolution detail of the ground panoramic images through space-time coding to supplement fine geometric features, such as wall textures and decorative components, at the fine-grained level, forming a hierarchical reconstruction process.
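A coarse-to-fine schedule of this kind could be realized by controlling which source each training iteration samples from. The sketch is hypothetical in every detail: the iteration counts, the linear ramp, the maximum ground-view probability, and the function names are illustrative choices not given in the source, and the space-time coding that links the two sources is not modeled.

```python
# Hypothetical sketch: every number and name here is an illustrative placeholder.
import random


def ground_view_probability(iteration: int, coarse_iters: int = 3000,
                            ramp_iters: int = 7000, max_prob: float = 0.7) -> float:
    """Probability of training on a ground panorama view at a given iteration.

    Coarse stage: UAV oblique views only. Afterwards the share of ground views
    ramps up linearly to max_prob.
    """
    if iteration < coarse_iters:
        return 0.0
    progress = min(1.0, (iteration - coarse_iters) / ramp_iters)
    return max_prob * progress


def sample_training_view(iteration, uav_views, ground_views, rng: random.Random):
    """Draw one training view according to the coarse-to-fine schedule."""
    if ground_views and rng.random() < ground_view_probability(iteration):
        return rng.choice(ground_views)
    return rng.choice(uav_views)
```

Keeping some UAV views in the mix even late in training is one way to stop fine ground detail from drifting away from the wide-area geometry.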

Step 4.8, using a high-performance GPU for efficient modeling and optimization, and finally generating a high-precision model file of the target scene, as shown in (d) of Fig. 2.

As shown in Fig. 3, the present invention further provides a cross-source data three-dimensional reconstruction system based on improved Gaussian splatting, configured to implement the above cross-source data three-dimensional reconstruction method based on improved Gaussian splatting. The system comprises a data acquisition unit 201, a space-time coding unit 202, a sparse reconstruction unit 203, and a three-dimensional reconstruction unit 204, all of which are computer programs.

The data acquisition unit 201 is used to acquire the unmanned aerial vehicle oblique photography sequence and the ground panoramic image sequence of the target reconstruction area, construct a spatio-temporally associated target scene data set, and provide multi-view geometric constraints.

The space-time coding unit 202 is used to preprocess the target scene data set, including scale matching of the cross-source image sequences, viewpoint expansion and computation of the panoramic image sequence, and extraction of cross-source feature points, so as to obtain cross-source matching point pairs with space-time coding.

The sparse reconstruction unit 203 is used to perform intelligent incremental registration of the image sequences in the cross-source target scene data set and incremental sparse reconstruction based on space-time coding, outputting a sparse three-dimensional point cloud and camera extrinsic matrices that serve as an accurate geometric prior for the subsequent three-dimensional reconstruction.

The three-dimensional reconstruction unit 204 is used to adopt an explicit radiance field based on improved Gaussian splatting, introduce the double tangent vector constraint and geometric regularization optimization, and then perform machine-learning-based multi-scale reconstruction of the target scene with a differentiable renderer, generating a high-precision model of the target scene and completing the three-dimensional reconstruction task.

In a specific implementation, the data acquisition unit 201 uses an unmanned aerial vehicle system equipped with a camera sensor to capture multi-angle oblique images of the target reconstruction area along a preset flight path, adding a time-sequence code to each image during shooting, which yields an unmanned aerial vehicle oblique photography sequence with timing information. It also uses panoramic camera equipment to capture ground panoramic video of the target reconstruction area, obtaining complete panoramic image data of the scene through continuous multi-angle shooting. This supplements cross-source information from different viewing angles and captures richer scene detail, forming a complete ground panoramic image sequence.

The space-time coding unit 202 preprocesses the acquired data. It performs viewpoint expansion on the acquired panoramic video, converting the original video frames into a sequence of undistorted planar images that fully cover the horizontal direction with uniformly distributed viewing angles. It then builds a multi-scale feature pyramid for the unmanned aerial vehicle oblique image sequence and for the expanded ground panoramic image sequence respectively, extracts cross-source feature points with the scale-invariant feature transform (SIFT), attaches a space-time coding vector to each feature point, and matches the feature points across sources to obtain cross-source matching point pairs with space-time coding.
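The viewpoint expansion can be sketched as a mapping from each pixel of a virtual pinhole view to coordinates in an equirectangular panorama. The sketch assumes the panorama frames are equirectangular and that the virtual cameras share one center and differ only in yaw; the function names are hypothetical. The returned maps would then be used to resample the panorama with bilinear interpolation (for example with OpenCV's `cv2.remap`). Choosing `n_views` yaw angles with a field of view of at least 360/`n_views` degrees covers the full horizontal direction.

```python
# Hypothetical sketch: equirectangular input and yaw-only views are assumptions.
import numpy as np


def perspective_to_equirect_map(width, height, fov_deg, yaw_deg, pano_w, pano_h):
    """Return (u, v) maps giving, for each pixel of a virtual pinhole view, the pixel
    position to sample in an equirectangular panorama of size pano_w x pano_h."""
    f = 0.5 * width / np.tan(np.radians(fov_deg) / 2)   # focal length in pixels
    xs, ys = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    # Ray directions in the camera frame: x right, y down, z forward.
    dx, dy, dz = xs - width / 2, ys - height / 2, np.full(xs.shape, f)
    yaw = np.radians(yaw_deg)
    # Rotate each ray about the vertical (y) axis by the view's yaw angle.
    rx = np.cos(yaw) * dx + np.sin(yaw) * dz
    rz = -np.sin(yaw) * dx + np.cos(yaw) * dz
    lon = np.arctan2(rx, rz)                  # longitude in [-pi, pi]
    lat = np.arctan2(-dy, np.hypot(rx, rz))   # latitude, positive upward
    u = np.mod((lon / (2 * np.pi) + 0.5) * pano_w, pano_w)
    v = (0.5 - lat / np.pi) * pano_h
    return u, v


def uniform_yaws(n_views):
    """Evenly spaced yaw angles in degrees covering the full horizon."""
    return [360.0 * k / n_views for k in range(n_views)]
```

Because every view is an ideal pinhole projection, the resulting planar images are free of the panorama's spherical distortion and can be fed directly to a SIFT feature extractor.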

The sparse reconstruction unit 203 uses incremental reconstruction guided by the space-time coding information in the preprocessed data set to progressively recover the three-dimensional structure of the scene, generating sparse point cloud data and optimizing the camera extrinsic parameters through iterative computation, which provides accurate geometric constraints and initial conditions for the subsequent fine three-dimensional reconstruction.
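A core operation inside any incremental reconstruction loop is triangulating new points once their images are registered. The sketch below shows linear (DLT) two-view triangulation only; image registration, outlier rejection, and bundle adjustment, which an incremental pipeline such as COLMAP also performs, are omitted, and the function names are hypothetical.

```python
# Hypothetical sketch of one step of incremental SfM: linear two-view triangulation.
import numpy as np


def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate one 3D point from two 3x4 projection matrices and its pixel
    observations x1 = (u1, v1), x2 = (u2, v2)."""
    # Each observation gives two rows of x cross (P X) = 0: u*P[2] - P[0] and v*P[2] - P[1].
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point with a 3x4 camera matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]
```

With noise-free observations the triangulated point reproduces the original exactly (up to rounding); with real matches, the result is normally refined afterwards by bundle adjustment.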

Finally, the three-dimensional reconstruction unit 204 adopts an explicit radiance field based on improved Gaussian splatting and performs learnable reconstruction of the target scene with machine learning methods on a differentiable rendering pipeline, using a high-performance GPU for efficient modeling and optimization. It finally generates a high-precision model file of the target scene which, once exported, can be applied to tasks such as robot navigation, medical imaging and diagnosis, digital-twin preservation of historical sites, augmented/virtual reality, and autonomous driving.

The cross-source data three-dimensional reconstruction method and system based on improved Gaussian splatting provided by the invention have been described in detail through specific embodiments, which serve only to help understand the method and core idea of the invention. It should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit its scope of protection. Those skilled in the art will appreciate that equivalent alterations or modifications may be made to the embodiments described herein without departing from the central spirit of the invention. Accordingly, this description should not be construed as limiting the invention.

Claims

1. A cross-source data three-dimensional reconstruction method based on improved Gaussian splatting, characterized by comprising the following steps:

2. The cross-source data three-dimensional reconstruction method based on improved Gaussian splatting according to claim 1, wherein acquiring the unmanned aerial vehicle oblique photography sequence and the ground panoramic image sequence of the target reconstruction area and constructing a spatio-temporally associated target scene data set comprises:

3. The cross-source data three-dimensional reconstruction method based on improved Gaussian splatting according to claim 1, wherein establishing cross-source matching point pairs with space-time coding through the adaptive-scale feature pyramid comprises:

4. The cross-source data three-dimensional reconstruction method based on improved Gaussian splatting according to claim 1, wherein performing incremental sparse reconstruction based on space-time coding to output a sparse three-dimensional point cloud and camera extrinsic matrices comprises:

5. The cross-source data three-dimensional reconstruction method based on improved Gaussian splatting according to claim 1, wherein compressing the original three-dimensional Gaussian kernels into two-dimensional Gaussian primitives through the double tangent vector constraint comprises:

6. The cross-source data three-dimensional reconstruction method based on improved Gaussian splatting according to claim 1, wherein forcing the primitives to conform to the reconstructed surface geometry comprises:

7. The cross-source data three-dimensional reconstruction method based on improved Gaussian splatting according to claim 1, wherein performing geometric regularization optimization on the two-dimensional Gaussian primitives based on the differentiable renderer comprises:

8. The cross-source data three-dimensional reconstruction method based on improved Gaussian splatting according to claim 1, wherein adjusting the primitive parameters through gradient back-propagation comprises:

9. The cross-source data three-dimensional reconstruction method based on improved Gaussian splatting according to claim 1, wherein performing multi-scale reconstruction on the cross-source sparse reconstruction result to generate a high-precision model of the target scene comprises:

10. A cross-source data three-dimensional reconstruction system based on improved Gaussian splatting, for implementing the cross-source data three-dimensional reconstruction method based on improved Gaussian splatting according to any one of claims 1 to 9, comprising: