CN119206027B

CN119206027B - Implicit Reconstruction Method for Boundless Scenes in Autonomous Driving Considering Geometric Information Augmentation

Info

Publication number: CN119206027B
Application number: CN202411290327.7A
Authority: CN
Inventors: 王亚飞; 汪博文; 李泽星; 李若尧; 信超尹; 章翼辰
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2024-09-14
Filing date: 2024-09-14
Publication date: 2025-12-16
Anticipated expiration: 2044-09-14
Also published as: CN119206027A

Abstract

This invention relates to the field of autonomous driving scene reconstruction and discloses an implicit reconstruction method and system for boundless autonomous driving scenes considering geometric information enhancement. It employs a geometry-aware, grid-based neural rendering system for autonomous driving, which uses a perspective-deformed hash grid and a signed distance function (SDF) to reconstruct and render an accurate driving environment from sparse sensor data. The system includes a hash pool with 16 levels, each holding 2^19 two-dimensional feature vectors. These feature vectors are spatially interpolated and fed to an MLP network to extract scene features and the SDF. High-quality scene reconstruction is then achieved by jointly optimizing RGB, depth, and normal terms, with the RGB input and monocular depth and normal observations used as supervision. The system improves scene reconstruction quality, handles boundless scenes effectively, performs better in viewpoint-sparse environments, and is more robust and accurate on complex scenes in real-world driving environments.

Description

Implicit reconstruction method for boundless autonomous driving scenes considering geometric information enhancement

Technical Field

The invention relates to the field of autonomous driving scene reconstruction, and in particular to an implicit reconstruction method and system for boundless autonomous driving scenes considering geometric information enhancement.

Background

Neural radiance fields (NeRF) are a novel three-dimensional reconstruction technique, and three-dimensional reconstruction is an indispensable technology for autonomous driving. Autonomous driving systems require accurate environment perception, such as recognition and tracking of roadways, static scenes, and dynamic objects, as well as path planning and high-quality three-dimensional modeling of scenes. Three-dimensional reconstruction supports these tasks and thereby improves the safety and reliability of autonomous driving. First, NeRF can reconstruct a 3D scene from 2D images, which enables the production of high-precision maps, high-precision vehicle localization and map matching, and supports the development of downstream autonomous driving tasks. Second, NeRF can synthesize complex autonomous driving scenes, enriching the training data and enabling efficient data augmentation. Third, NeRF can simulate adverse scenes such as extreme weather and serious traffic accidents, so that simulated data can reproduce real adverse scenes and improve the safety of autonomous driving. In short, NeRF has wide application in autonomous driving, and combining it with autonomous driving scenes helps advance the development and deployment of autonomous driving technology.

Several challenges remain. First, because viewing angles in autonomous driving scenes are limited, NeRF reconstruction quality degrades, and synthesizing high-quality views from limited viewpoints is an open problem. Second, because autonomous driving scenes involve large volumes of data while the storage capacity of a NeRF model is limited, reconstructing large-scale scenes within limited model storage is a challenge. Third, autonomous driving scenes exhibit changes in illumination and appearance as well as moving objects, which violate the assumptions of the original NeRF model, so the handling of dynamic objects in the scene urgently needs to be solved. Finally, the training and rendering speed of the model must be improved to accelerate the rendering of autonomous driving scenes. High-quality reconstruction of autonomous driving scenes is a prerequisite for practical deployment, such as the large-scale commercialization of autonomous driving.

Disclosure of Invention

The invention aims to overcome the deficiencies of the prior art and provides an implicit reconstruction method and system for boundless autonomous driving scenes considering geometric information enhancement. It uses a geometry-aware, grid-based neural rendering system for autonomous driving, which reconstructs and renders an accurate driving environment from sparse sensor data using a perspective-deformed hash grid and a signed distance function (SDF). The system improves scene reconstruction quality, handles boundless scenes effectively, and improves performance in viewpoint-sparse environments.

In one aspect, the implicit reconstruction method for boundless autonomous driving scenes considering geometric information enhancement provided by the invention comprises the following steps:

S1, spatially dividing a boundless open scene through an octree structure to generate a plurality of local small scenes; on each local small scene, mapping the space at infinity to a finite distance using a perspective deformation function; spatially encoding the local small scenes and storing the encoded features in a multi-resolution hash grid; indexing, through the hash encoding, the local grid corresponding to any point in space; and performing trilinear interpolation over the vertices of that local grid to obtain the features of the spatial point;

S2, inputting the features of the spatial points into a neural implicit rendering network, outputting a signed distance function (SDF) field and a color field, and rendering with the differentiable geometry-enhanced features to obtain predicted values of scene color, scene depth, and scene normal vector;

S3, extracting scene color, scene depth, and scene normal vector through a monocular depth model, and applying additional constraints to the SDF field by introducing geometric priors generated by a pre-trained monocular estimation model, to obtain true values of scene color, scene depth, and scene normal vector;

S4, calculating a loss function from the predicted values of step S2 and the true values of step S3, and performing joint optimization according to the loss function to achieve scene reconstruction.

In step S1, spatially dividing the boundless open scene through the octree structure to generate a plurality of local small scenes further includes:

initializing the root node size of the octree to 32 times the size of the bounding box that encloses all input camera trajectories;

for each node of the octree, determining whether to subdivide it further according to camera visibility and the distance from the node center, thereby forming a set of leaf nodes.
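As a rough illustration of the subdivision rule above, the following Python sketch builds a leaf set. The text does not give the exact visibility test or distance threshold, so the criterion used here (split a node while some camera lies within `ratio` node sizes of its center) and all names (`Node`, `root_node`, `subdivide`, `ratio`, `min_size`) are assumptions made for illustration only. The root is taken as a cube whose edge is 32 times the largest extent of the camera bounding box, which is one possible reading of the 32-times rule.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Node:
    center: tuple  # (x, y, z) of the cube center
    size: float    # edge length of the cube


def root_node(cameras, scale=32.0):
    """Root cube: `scale` times the largest extent of the camera bounding box."""
    lo = [min(c[i] for c in cameras) for i in range(3)]
    hi = [max(c[i] for c in cameras) for i in range(3)]
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    extent = max(b - a for a, b in zip(lo, hi)) or 1.0  # avoid a zero-size box
    return Node(center, scale * extent)


def subdivide(node, cameras, ratio=1.0, min_size=1.0):
    """Return the leaf nodes below `node`.

    Assumed criterion: split if any camera lies within ratio * node.size
    of the node center and the node is still larger than min_size.
    """
    near = any(dist(c, node.center) <= ratio * node.size for c in cameras)
    if not near or node.size <= min_size:
        return [node]
    leaves = []
    h = node.size / 4  # offset from the parent center to each child center
    cx, cy, cz = node.center
    for dx in (-h, h):
        for dy in (-h, h):
            for dz in (-h, h):
                child = Node((cx + dx, cy + dy, cz + dz), node.size / 2)
                leaves.extend(subdivide(child, cameras, ratio, min_size))
    return leaves


# Three hypothetical camera positions along a straight road segment.
cameras = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
root = root_node(cameras)
leaves = subdivide(root, cameras, min_size=8.0)
```

With this rule, cells near the camera trajectory end up small while distant regions stay coarse, which is the behavior the subdivision step is meant to produce.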

Further, in step S1, the perspective deformation function is constructed by principal component analysis (PCA), and mapping the space at infinity to a finite distance using the perspective deformation function on each local small scene specifically includes:

First, the point cloud data or mesh vertices of a three-dimensional model are taken as the data set, with the three-dimensional coordinates of each point forming an original feature vector. The principal directions of variation of the data set are found by PCA, and a new two-dimensional coordinate system is defined from them. The perspective deformation function is F(x) = M W(x), where W(x) is the vector of two-dimensional coordinates obtained by projecting the point x onto all visible cameras, and M is a projection matrix constructed by PCA.
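The following Python sketch illustrates why a function of the form F(x) = M W(x) maps points at infinity to a finite location. It assumes ideal pinhole cameras with identity rotation and unit focal length, and it uses a fixed placeholder matrix `M` rather than one derived by PCA. The camera positions, the shape of `M`, and the function names are all hypothetical.

```python
def project(point, cam_center):
    """Pinhole projection onto a camera at cam_center looking along +z.

    Assumes identity rotation and unit focal length for simplicity.
    """
    x, y, z = (p - c for p, c in zip(point, cam_center))
    return (x / z, y / z)


def W(point, cam_centers):
    """Stack the 2D projections of `point` in every visible camera."""
    coords = []
    for c in cam_centers:
        coords.extend(project(point, c))
    return coords


def F(point, cam_centers, M):
    """Perspective deformation F(x) = M W(x)."""
    w = W(point, cam_centers)
    return [sum(m * v for m, v in zip(row, w)) for row in M]


# Two hypothetical cameras and a placeholder 3x4 matrix standing in for the
# PCA-derived M (a real M would come from an eigen-decomposition).
cams = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
M = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, -0.5, 0.0]]

direction = (0.2, 0.1, 1.0)
near = F(tuple(10 * d for d in direction), cams, M)   # point 10 units away
far = F(tuple(1e9 * d for d in direction), cams, M)   # point almost at infinity
```

As the point recedes along `direction`, every camera sees it at nearly the same image coordinates, so `far` converges to a finite limit instead of growing without bound.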

Further, in step S1, storing the encoded features using the multi-resolution hash grid further includes:

The feature corresponding to each leaf node is mapped into a hash pool through a hash function. The hash pool comprises 16 levels, each holding 2^19 two-dimensional feature vectors, and the feature vectors are obtained through the following spatial interpolation calculation:

where o and d represent the camera center and the ray direction, respectively, J_i is the Jacobian matrix of the perspective deformation function at x_i, and l is a hyperparameter controlling the sampling interval.
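A minimal Python sketch of a multi-level hash pool with trilinear interpolation is given below. The text specifies 16 levels with 2^19 two-dimensional features each; the demo uses a much smaller pool so that it runs quickly in pure Python. The XOR-with-primes hash, the per-level resolution schedule (`base_res`, `growth`), and all function names are assumptions borrowed from common hash-grid encoders, not details confirmed by the text.

```python
import random
from math import floor

# Spatial-hash primes commonly used in hash-grid encoders (an assumption here).
PRIMES = (1, 2654435761, 805459861)


def hash_index(ix, iy, iz, table_size):
    """XOR-based spatial hash of an integer grid vertex."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size


def make_pool(levels, table_size, dim=2, seed=0):
    """One table of `table_size` small random `dim`-vectors per level."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1e-4, 1e-4) for _ in range(dim)]
             for _ in range(table_size)] for _ in range(levels)]


def encode(point, pool, base_res=16, growth=2.0):
    """Concatenate trilinearly interpolated features from every level."""
    out = []
    for level, table in enumerate(pool):
        res = base_res * growth ** level
        gx, gy, gz = (p * res for p in point)
        x0, y0, z0 = floor(gx), floor(gy), floor(gz)
        fx, fy, fz = gx - x0, gy - y0, gz - z0
        feat = [0.0] * len(table[0])
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    # Trilinear weight of this corner of the enclosing cell.
                    w = ((fx if dx else 1 - fx) *
                         (fy if dy else 1 - fy) *
                         (fz if dz else 1 - fz))
                    v = table[hash_index(x0 + dx, y0 + dy, z0 + dz, len(table))]
                    feat = [f + w * c for f, c in zip(feat, v)]
        out.extend(feat)
    return out


# Small demo pool: 4 levels of 2**10 entries instead of 16 levels of 2**19.
pool = make_pool(levels=4, table_size=2 ** 10)
features = encode((0.3, 0.7, 0.2), pool)
```

The resulting `features` list has `levels * dim` entries and would be the input to the MLP in step S2.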

Further, in step S2, inputting the features of the spatial points into a neural implicit rendering network, and outputting the signed distance function SDF field and the color field further includes:

A signed distance function (SDF) is learned by a multi-layer perceptron (MLP) network and used as an intermediate variable for density modeling. The object surface is defined as the zero level set of the SDF, and the SDF is converted into volume density through the cumulative distribution function of the Laplace distribution. The density is expressed by the following formula:

σ(x) = α Ψ_β(-d_Ω(x)),

where α and β are learnable parameters, Ψ_β is the cumulative distribution function of the Laplace distribution, and d_Ω(x) is the value of the SDF field at point x.
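The density formula above can be sketched in a few lines of Python. The sketch assumes a zero-mean Laplace distribution with scale β and an SDF that is negative inside objects and positive outside; the values of `alpha` and `beta` in the demo are arbitrary, since in the method they are learned.

```python
from math import exp


def laplace_cdf(s, beta):
    """CDF of a zero-mean Laplace distribution with scale beta."""
    if s <= 0:
        return 0.5 * exp(s / beta)
    return 1.0 - 0.5 * exp(-s / beta)


def density(sdf, alpha, beta):
    """sigma(x) = alpha * Psi_beta(-d(x)) for an SDF value d(x)."""
    return alpha * laplace_cdf(-sdf, beta)


# Arbitrary demo parameters (learned in the actual method).
alpha, beta = 10.0, 0.1
on_surface = density(0.0, alpha, beta)   # half of alpha at the surface
inside = density(-1.0, alpha, beta)      # approaches alpha inside the object
outside = density(1.0, alpha, beta)      # approaches zero in free space
```

A smaller β makes the transition from empty to occupied space sharper around the zero level set.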

Preferably, in step S2, rendering with the differentiable geometry-enhanced features to obtain the predicted values of scene color, scene depth, and scene normal vector further includes:

calculating the gradient of the SDF by numerical differentiation to obtain the surface normal, wherein the initial step size is the size of a leaf node and is gradually reduced to capture local details, the gradient of the SDF being expressed as:

where ε is a small vector used to perturb x_i;

and finally calculating the predicted values of scene color, scene depth, and scene normal vector by volume rendering.
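The volume rendering step can be sketched as follows, assuming the standard discrete compositing rule in which each sample is weighted by its opacity times the accumulated transmittance. The text does not spell out its rendering equations, so this is a generic illustration, and the function name `render_ray` and the sample values are hypothetical.

```python
from math import exp


def render_ray(sigmas, deltas, colors, ts, normals):
    """Composite per-sample values along one ray.

    Weights: w_i = T_i * (1 - exp(-sigma_i * delta_i)),
    with T_i the product of exp(-sigma_j * delta_j) over earlier samples.
    """
    transmittance = 1.0
    color = [0.0, 0.0, 0.0]
    normal = [0.0, 0.0, 0.0]
    depth = 0.0
    for sigma, delta, c, t, n in zip(sigmas, deltas, colors, ts, normals):
        alpha = 1.0 - exp(-sigma * delta)
        w = transmittance * alpha
        color = [a + w * b for a, b in zip(color, c)]
        normal = [a + w * b for a, b in zip(normal, n)]
        depth += w * t
        transmittance *= 1.0 - alpha
    return color, depth, normal


# One empty sample followed by two nearly opaque samples.
sigmas = [0.0, 1e3, 1e3]
deltas = [1.0, 1.0, 1.0]
ts = [1.0, 2.0, 3.0]
colors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
normals = [(0.0, 0.0, 0.0), (0.0, 0.0, -1.0), (0.0, 0.0, -1.0)]
color, depth, normal = render_ray(sigmas, deltas, colors, ts, normals)
```

Because the first opaque sample absorbs all remaining transmittance, the ray takes its color, depth, and normal from that sample alone.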

Further, in step S3, applying additional constraints to the SDF field by introducing geometric priors generated by the pre-trained monocular estimation model, to obtain true values of scene color, scene depth, and scene normal vector, further includes:

generating relative depth values using the pre-trained monocular estimation model, learning the scale and shift of each batch by a least squares criterion, and optimizing with a depth consistency loss function to align the relative depth values with the actual depth values, the depth consistency loss function being expressed as:

ensuring that the rendered normals and the predicted monocular normals are consistent in the same coordinate space, and optimizing with a normal consistency loss function composed of an L1 norm loss and an angle loss, the normal consistency loss function being as follows:

where k and b are learnable parameters for aligning the depth values, R is the set of rays in the training batch, r is an element of R, and the remaining symbols denote, for ray r, the predicted scene depth, the true scene depth, the predicted scene normal vector, and the true scene normal vector, respectively.
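The two consistency terms described above can be sketched in Python as follows. The exact formulas are not reproduced in this text, so the sketch makes several assumptions: k and b are obtained in closed form by least squares and applied to the monocular (relative) depth, the depth term is a mean squared error, and the normal term is the sum of an L1 difference and a `1 - cosine` angular penalty. All function names are hypothetical.

```python
from math import sqrt


def fit_scale_shift(src, dst):
    """Closed-form least squares for k, b minimizing sum((k*src + b - dst)^2).

    Requires that the values in `src` are not all identical.
    """
    n = len(src)
    mean_s = sum(src) / n
    mean_d = sum(dst) / n
    var = sum((s - mean_s) ** 2 for s in src)
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    k = cov / var
    b = mean_d - k * mean_s
    return k, b


def depth_loss(rendered, mono):
    """Mean squared error after aligning monocular depth to rendered depth."""
    k, b = fit_scale_shift(mono, rendered)
    return sum((k * m + b - r) ** 2 for m, r in zip(mono, rendered)) / len(mono)


def normalize(v):
    n = sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def normal_loss(rendered, mono):
    """Mean over rays of the L1 difference plus the angular term |1 - cos|."""
    total = 0.0
    for nr, nm in zip(rendered, mono):
        nr, nm = normalize(nr), normalize(nm)
        l1 = sum(abs(a - b) for a, b in zip(nr, nm))
        cos = sum(a * b for a, b in zip(nr, nm))
        total += l1 + abs(1.0 - cos)
    return total / len(rendered)


# Hypothetical batch: rendered depth is an exact affine function of mono depth.
mono_depth = [0.1, 0.2, 0.4, 0.8]
rendered_depth = [2.0 * m + 1.0 for m in mono_depth]
k, b = fit_scale_shift(mono_depth, rendered_depth)
d_loss = depth_loss(rendered_depth, mono_depth)
n_loss_same = normal_loss([(0.0, 0.0, 1.0)], [(0.0, 0.0, 2.0)])
n_loss_flip = normal_loss([(0.0, 0.0, 1.0)], [(0.0, 0.0, -1.0)])
```

The scale and shift step matters because monocular depth is only defined up to an affine ambiguity, so comparing it to rendered depth without alignment would penalize a correct reconstruction.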

Preferably, step S4 further comprises:

setting a scene color reconstruction loss function which, by minimizing the difference between the input image and the rendered image through a color loss, links the 3D environment to its corresponding 2D observations, the scene color reconstruction loss function being defined as:

where R is the set of rays in the training batch, r is an element of R, and the remaining symbols denote the predicted and true scene color for ray r;

setting a regularization loss function, which introduces an Eikonal term to regularize the SDF field and constrains disparity with a disparity loss to reduce floating artifacts, the regularization loss function being defined as:

where the remaining symbol denotes the gradient of the SDF at point x.
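A hedged Python sketch of the color and Eikonal terms follows. The formulas themselves are not reproduced in this text, so the sketch assumes an L1 color loss averaged over rays and the usual Eikonal penalty, which pushes the SDF gradient norm toward one. The disparity term is omitted because its form is not given. Function names and sample values are hypothetical.

```python
from math import sqrt


def color_loss(pred, target):
    """Mean over rays of the L1 difference between rendered and observed RGB."""
    total = sum(abs(p - t)
                for cp, ct in zip(pred, target)
                for p, t in zip(cp, ct))
    return total / len(pred)


def eikonal_loss(gradients):
    """Mean of (||grad f(x)|| - 1)^2 over the sampled points."""
    total = 0.0
    for g in gradients:
        norm = sqrt(sum(c * c for c in g))
        total += (norm - 1.0) ** 2
    return total / len(gradients)


# Two hypothetical rays and two hypothetical SDF gradients.
pred_rgb = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
true_rgb = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0)]
l_rgb = color_loss(pred_rgb, true_rgb)
l_eik = eikonal_loss([(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)])
```

A true signed distance function has unit gradient norm everywhere, which is why the Eikonal term keeps the learned field geometrically meaningful.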

More preferably, in step S4, performing joint optimization according to the loss function to achieve scene reconstruction further includes:

All loss functions are jointly optimized by an Adam optimizer until a predetermined reconstruction quality and accuracy are reached, and the final loss function is expressed as:

L = L_rgb + λ_depth L_depth + λ_normal L_normal + λ_eikonal L_eikonal + λ_disp L_disp,

where λ_depth is the depth consistency loss weight, λ_normal is the normal consistency loss weight, λ_eikonal is the regularization loss weight, L_disp is the disparity map loss, and λ_disp is the disparity map loss weight.
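The weighted sum above is simple enough to state directly in Python. The loss values and weights in the demo are hypothetical placeholders chosen for easy arithmetic; the text does not disclose the actual weights.

```python
def total_loss(terms, weights):
    """L = L_rgb + weighted depth, normal, eikonal and disparity terms."""
    loss = terms["rgb"]
    for name in ("depth", "normal", "eikonal", "disp"):
        loss += weights[name] * terms[name]
    return loss


# Hypothetical per-batch loss values and weights, for illustration only.
terms = {"rgb": 0.25, "depth": 0.5, "normal": 1.0, "eikonal": 0.5, "disp": 2.0}
weights = {"depth": 0.5, "normal": 0.25, "eikonal": 0.5, "disp": 0.125}
loss = total_loss(terms, weights)
```

In training, this scalar would be the quantity whose gradient is passed to the Adam optimizer at each step.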

In another aspect, the invention provides an implicit reconstruction system for boundless autonomous driving scenes considering geometric information enhancement, comprising:

a space division and characterization module, configured to spatially divide a boundless open scene through an octree structure to generate a plurality of local small scenes, map the space at infinity to a finite distance using a perspective deformation function on each local small scene, spatially encode the local small scenes, store the encoded features in a multi-resolution hash grid, index through the hash encoding the local grid corresponding to any point in space, and perform trilinear interpolation over the vertices of that local grid to obtain the features of the spatial point;

a spatial rendering module, configured to input the features of the spatial points into a neural implicit rendering network, output a signed distance function (SDF) field and a color field, and render with the differentiable geometry-enhanced features to obtain predicted values of scene color, scene depth, and scene normal vector;

a multi-consistency loss function scene reconstruction module, configured to extract scene color, scene depth, and scene normal vector through a monocular depth model, apply additional constraints to the SDF field by introducing geometric priors generated by a pre-trained monocular estimation model to obtain true values of scene color, scene depth, and scene normal vector, calculate a loss function from the predicted values and the true values, and perform joint optimization according to the loss function to achieve scene reconstruction.

Compared with the prior art, the invention has the following beneficial effects:

(1) By introducing geometry-enhanced features and a perspective-deformed hash grid, the invention markedly improves the geometric detail and global consistency of the reconstructed driving scene, and its reconstruction quality on the KITTI, Free-dataset, and self-acquired urban road data sets is superior to that of existing state-of-the-art methods;

(2) By using the SDF field and features enhanced with geometric priors, the system maintains high precision and stability when processing complex open scenes with sparse viewpoints; in particular, in low-texture regions and under sparse viewpoints the reconstruction remains excellent, and mismatches and blurring in the reconstruction are markedly reduced;

(3) Through spatial division and feature storage in the perspective-deformed hash grid, the invention achieves efficient sampling and computation in data processing and feature extraction and markedly reduces the computational complexity and time cost of the system; for the same number of training epochs, it obtains a higher-quality reconstruction in less time than traditional multi-view three-dimensional reconstruction methods;

(4) By using pre-trained monocular depth and normal estimation models, the system can extract geometric priors from relatively simple, low-cost image data, which improves the feasibility and economy of reconstruction.

Drawings

The accompanying drawings provide a further understanding of the invention and constitute a part of this specification. Together with the embodiments of the invention, they serve to explain the invention.

In the drawings:

FIG. 1 is a flow chart of the implicit reconstruction method for boundless autonomous driving scenes considering geometric information enhancement according to the present invention;

FIG. 2 is a schematic view of the scene color, normal vector, and depth rendering results according to the present invention;

FIG. 3 is a block diagram of the implicit reconstruction system for boundless autonomous driving scenes considering geometric information enhancement according to the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments that those skilled in the art can obtain from the embodiments of the application without inventive effort fall within the scope of the application.

As those skilled in the art will understand, the singular forms "a", "an", and "the" as used herein are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The method and system of the present invention include a hash pool comprising 16 levels, each level holding 2^19 two-dimensional feature vectors. The feature vectors are spatially interpolated and input to the MLP network, which extracts scene features and the SDF. Finally, high-quality scene reconstruction is achieved by optimizing over RGB images, depth, and normals.

The following describes specific embodiments of the present invention with reference to the drawings and examples.

Example 1

As shown in FIG. 1, the implicit reconstruction method for boundless autonomous driving scenes considering geometric information enhancement provided in this embodiment includes the following steps:

In step S1, the sparse sensor data is spatially transformed through the perspective deformation function, and the hash grid is used to store the encoded features, which ensures efficient data sampling and processing. For the spatial division, generating a plurality of local small scenes by spatially dividing the boundless open scene through the octree structure further comprises:

In addition, for the perspective deformation, the perspective deformation function is constructed by principal component analysis (PCA) to preserve information fidelity after dimensionality reduction. Specifically, mapping the space at infinity to a finite distance using the perspective deformation function on each local small scene comprises the following:

First, the point cloud data or mesh vertices of a three-dimensional model are taken as the data set, with the three-dimensional coordinates of each point forming an original feature vector. The principal directions of variation of the data set are found by PCA, and a new two-dimensional coordinate system is defined from them. The perspective deformation function is F(x) = M W(x), where W(x) is the vector of two-dimensional coordinates obtained by projecting the point x onto all visible cameras, and M is a projection matrix constructed by PCA. This deformation ensures uniform sampling in the deformed space and improves the accuracy and efficiency of sampling.

σ(x)=αΨ_β(-d_Ω(x)),

In addition, in step S2, rendering the predicted values of the scene color, the scene depth, and the scene normal vector by using the differentiable geometric enhancement features further includes:

where ε is a small vector used to perturb x_i;
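As an illustration of obtaining surface normals by numerical differentiation with a shrinking step, the Python sketch below uses central differences on a toy sphere SDF that stands in for the learned MLP. The central-difference form, the geometric step schedule, and the names `sphere_sdf`, `numerical_normal`, and `step_schedule` are assumptions, since the gradient formula itself is not reproduced in this text.

```python
from math import sqrt


def sphere_sdf(p, radius=1.0):
    """Toy SDF standing in for the learned MLP: distance to a sphere surface."""
    return sqrt(sum(c * c for c in p)) - radius


def numerical_normal(sdf, p, eps):
    """Central-difference gradient of `sdf` at p, normalized to unit length."""
    grad = []
    for axis in range(3):
        hi = list(p)
        lo = list(p)
        hi[axis] += eps
        lo[axis] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    norm = sqrt(sum(g * g for g in grad))
    return [g / norm for g in grad]


def step_schedule(leaf_size, iteration, total, final_fraction=1.0 / 64):
    """Shrink the step geometrically from leaf_size to leaf_size * final_fraction."""
    return leaf_size * final_fraction ** (iteration / total)


# Normal of the unit sphere at a point on the +z axis.
normal = numerical_normal(sphere_sdf, (0.0, 0.0, 2.0), eps=0.5)
first_step = step_schedule(8.0, 0, 100)
last_step = step_schedule(8.0, 100, 100)
```

A large initial step smooths the gradient over a whole cell, which stabilizes early training, while the smaller later steps let the normals follow fine surface detail.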

Further, in step S3, by introducing a geometric prior generated by the pre-trained monocular estimation model, the geometric prior including depth and normal estimation, applying additional constraints to the signed distance function SDF field, obtaining true values of scene colors, scene depths, and scene normal vectors further includes:

In step S4, the joint optimization combines the RGB input with the monocular depth and normal observations, and further includes:

Preferably, in step S4, performing joint optimization according to the loss function to implement scene reconstruction further includes:

L=L_rgb+λ_depthL_depth+λ_normalL_normal+λ_eikonalL_eikonal+λ_dispL_disp,

where λ_depth is the depth consistency loss weight, λ_normal is the normal consistency loss weight, λ_eikonal is the regularization loss weight, L_disp is the disparity map loss, and λ_disp is the disparity map loss weight.

Through the above implementation, the invention achieves efficient and accurate reconstruction of autonomous driving scenes and provides reliable technical support for the efficient training and testing of autonomous driving systems.

The performance of the invention was compared experimentally with existing methods on multiple data sets. The results show that in the "sky" and "stair" scenes of Free-dataset, the invention achieves better PSNR, SSIM, and LPIPS scores. On KITTI and the self-acquired FMAD dataset, the invention is more robust and accurate when processing complex scenes in real driving environments.

The specific experimental data are as follows:

In the "sky" scene of Free-dataset, the method reaches a PSNR of 25.93, an SSIM of 0.827, and an LPIPS of 0.336, which are better than those of existing methods.

In the "highway" scene of the self-acquired FMAD dataset, the method reaches a PSNR of 24.13, an SSIM of 0.825, and an LPIPS of 0.347, significantly better than other methods.
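For readers unfamiliar with the PSNR metric reported above, the sketch below computes it under its common definition, 10 log10(MAX^2 / MSE), for images stored as flat lists of intensities in [0, 1]. It is a generic illustration with made-up pixel values and is not the evaluation code used for the reported results; SSIM and LPIPS are not sketched.

```python
from math import log10


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * log10(max_value ** 2 / mse)


# Made-up four-pixel images for illustration.
reference = [0.0, 0.5, 1.0, 0.5]
rendered = [0.1, 0.5, 0.9, 0.5]
score = psnr(rendered, reference)
```

Higher PSNR means a smaller mean squared error between the rendered and reference images.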

In conclusion, theoretical analysis and experimental data show that, compared with the prior art, the invention markedly improves the reconstruction quality and robustness of autonomous driving scenes while reducing cost and improving efficiency, and therefore has important practical value.

Example 2

The implicit reconstruction system for boundless autonomous driving scenes considering geometric information enhancement provided by this embodiment comprises:

Finally, it should be noted that the above description covers only preferred embodiments of the present invention, and the scope of the invention is not limited to these examples; all technical solutions that fall under the concept of the invention belong to its scope of protection. Modifications and adaptations that those skilled in the art may make without departing from the principles of the invention are likewise intended to be within its scope.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this description.

Claims

1. An implicit reconstruction method for boundless scenes in autonomous driving considering geometric information enhancement, characterized by the following steps:

S1: The boundless open scene is spatially divided into several local small scenes through an octree structure. A perspective deformation function is applied to each local small scene to map the space at infinity to a finite distance. The local small scene is spatially encoded, and the encoded features are stored using a multi-resolution hash grid. The local grid corresponding to any point feature in the space is indexed by the hash encoding. The features of the spatial point are obtained by trilinear interpolation using the vertices of the local grid.

S2: Input the features of the spatial points into the neural implicit rendering network, output a signed distance function SDF field and a color field, and use differentiable geometric enhancement features to render the predicted values of scene color, scene depth and scene normal vector.

S3: Extract scene color, scene depth, and scene normal vector through a monocular depth model. By introducing geometric priors generated by a pre-trained monocular estimation model, additional constraints are applied to the signed distance function SDF field to obtain the true values of scene color, scene depth, and scene normal vector.

S4: Calculate the loss function based on the predicted value in step S2 and the true value in step S3, and perform joint optimization based on the loss function to achieve scene reconstruction.

2. The implicit reconstruction method for boundless scenes in autonomous driving considering geometric information enhancement according to claim 1, characterized in that, in step S1, the spatial division of the boundless open scene using an octree structure to generate several local small scenes further includes:

The root node size of the octree is initialized to 32 times the size of the bounding box that includes all input camera trajectories;

For each node of the octree, whether to subdivide it further is determined based on camera visibility and the distance from the node center, forming a set of leaf nodes.

3. The implicit reconstruction method for boundless scenes in autonomous driving considering geometric information enhancement according to claim 1, characterized in that, in step S1, the perspective deformation function is constructed by principal component analysis (PCA), and the perspective deformation function is used to map the space at infinity to a finite distance on each local small scene, specifically including:

First, the point cloud data or mesh vertices of the 3D model are used as the dataset. The 3D coordinates of each point constitute the original feature vector. The principal directions of variation of the dataset are found by the principal component analysis (PCA) method, and a new 2D coordinate system is defined accordingly. The perspective deformation function is F(x) = M W(x), where W(x) is the vector of 2D coordinates of the point x projected onto all visible cameras, and M is the projection matrix constructed by the principal component analysis (PCA) method.

4. The implicit reconstruction method for boundless scenes in autonomous driving considering geometric information enhancement according to claim 2 or 3, characterized in that, in step S1, storing the encoded features using a multi-resolution hash grid further includes:

The feature corresponding to each leaf node is mapped to a hash pool through a hash function. The hash pool includes 16 levels, and each level holds 2^19 two-dimensional feature vectors. The feature vectors are calculated through the following spatial interpolation:

Where o and d represent the center of the camera and the direction of the ray, respectively, J_i is the Jacobian matrix of the perspective deformation function at x_i, and l is the hyperparameter controlling the sampling interval.

5. The implicit reconstruction method for boundless scenes in autonomous driving considering geometric information enhancement according to claim 1, characterized in that, in step S2, inputting the features of the spatial points into a neural implicit rendering network and outputting a signed distance function SDF field and a color field further includes:

A signed distance function SDF is learned through a multilayer perceptron (MLP) network and used as an intermediate variable for density modeling. The object surface is defined as the zero level set of the signed distance function SDF. The signed distance function SDF is transformed into volume density through the cumulative distribution function of the Laplace distribution, and the density is expressed by the following formula:

σ(x) = α Ψ_β(-d_Ω(x)),

Where α and β are learnable parameters, Ψ_β represents the cumulative distribution function of the Laplace distribution, and d_Ω(x) is the value of the signed distance function SDF field at point x.

6. The implicit reconstruction method for boundless scenes in autonomous driving considering geometric information enhancement according to claim 5, characterized in that, in step S2, using differentiable geometric enhancement features to render the predicted values of scene color, scene depth, and scene normal vector further comprises:

The gradient of the signed distance function SDF is calculated using numerical differentiation to obtain the surface normal. The initial step size is the size of the leaf node, which is gradually reduced to capture local details. The formula for calculating the gradient of the signed distance function SDF is expressed as follows:

Where ε is a small vector used to perturb x_i;

Then, volume rendering is used to calculate the predicted values of scene color, scene depth, and scene normal vector.

7. The implicit reconstruction method for boundless scenes in autonomous driving considering geometric information enhancement according to claim 1, characterized in that, in step S3, applying additional constraints to the signed distance function SDF field by introducing geometric priors generated by a pre-trained monocular estimation model, to obtain the ground truth values of scene color, scene depth, and scene normal vector, further comprises:

Relative depth values are generated using a pre-trained monocular estimation model. The scale and shift of each batch are learned using the least squares criterion, and optimized using a depth consistency loss function to align the relative depth values with the actual depth values. The depth consistency loss function is expressed as:

To ensure consistency between the rendered normal and the predicted monocular normal in the same coordinate space, optimization is performed using a normal consistency loss function. This loss function is composed of an L1 norm loss and an angle loss, and is expressed as follows:

Where k and b are learnable parameters for aligning the depth values, R is the set of rays in the training batch, r is an element of R, and the remaining symbols denote, for ray r, the predicted value of the scene depth, the ground truth value of the scene depth, the predicted value of the scene normal vector, and the ground truth value of the scene normal vector.

8. The implicit reconstruction method for boundless scenes in autonomous driving considering geometric information enhancement according to claim 6 or 7, characterized in that step S4 further includes:

A scene color reconstruction loss function is set up to minimize the difference between the input image and the rendered image through a color loss, connecting the 3D environment with its corresponding 2D observation. The scene color reconstruction loss function is defined as follows:

Where R is the set of rays in the training batch, r is an element of R, and the remaining symbols denote the predicted value and the ground truth value of the scene color for ray r;

A regularization loss function is set up, introducing an Eikonal term to regularize the signed distance function SDF field, and a disparity loss constrains disparity to reduce floating artifacts. The regularization loss function is defined as follows:

Where the remaining symbol denotes the gradient of the signed distance function SDF at the point x.

9. The implicit reconstruction method for boundless scenes in autonomous driving considering geometric information enhancement according to claim 8, characterized in that, in step S4, the joint optimization based on the loss function to achieve scene reconstruction further includes:

All loss functions are jointly optimized by the Adam optimizer until the predetermined reconstruction quality and accuracy are achieved. The final loss function is expressed as follows:

L = L_rgb + λ_depth L_depth + λ_normal L_normal + λ_eikonal L_eikonal + λ_disp L_disp,

Where λ_depth is the depth consistency loss weight, λ_normal is the normal consistency loss weight, λ_eikonal is the regularization loss weight, L_disp is the disparity map loss, and λ_disp is the disparity map loss weight.

10. An implicit reconstruction system for boundless scenes in autonomous driving, considering geometric information enhancement, characterized in that it includes:

The spatial partitioning and representation module is used to divide the boundless open scene into several local small scenes using an octree structure. A perspective deformation function is applied to each local small scene to map the space at infinity to a finite distance. The local small scene is spatially encoded, and the encoded features are stored using a multi-resolution hash grid. The hash encoding indexes the local grid corresponding to any point feature in the space. The features of the spatial point are obtained by trilinear interpolation using the vertices of the local grid.

The spatial rendering module is used to input the features of the spatial points into the neural implicit rendering network, output a signed distance function (SDF) field and a color field, and use differentiable geometric enhancement features to render predicted values of scene color, scene depth, and scene normal vector.

The multi-consistent loss function scene reconstruction module is used to extract scene color, scene depth, and scene normal vector through a monocular depth model. By introducing geometric priors generated by a pre-trained monocular estimation model, additional constraints are applied to the signed distance function SDF field to obtain the ground truth values of scene color, scene depth, and scene normal vector. At the same time, the loss function is calculated based on the predicted values and the ground truth values, and joint optimization is performed based on the loss function to achieve scene reconstruction.