CN120672942B - A method, system, device, and medium for 3D reconstruction based on monocular video

A method, system, device, and medium for 3D reconstruction based on monocular video.

Info

Publication number
CN120672942B
Authority
CN
China
Prior art keywords
point cloud
processing
scene
video
monocular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510722081.4A
Other languages
Chinese (zh)
Other versions
CN120672942A (en
Inventor
陈天戈
吴卉
黄志青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhongyiyong Intelligent Technology Co ltd
Original Assignee
Guangzhou Zhongyiyong Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Zhongyiyong Intelligent Technology Co ltd
Priority to CN202510722081.4A
Publication of CN120672942A
Application granted
Publication of CN120672942B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three-dimensional [3D] modelling for computer graphics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a three-dimensional reconstruction method, system, device, and medium based on monocular video. The method comprises: capturing an indoor space with a monocular camera to obtain a monocular video; performing sliding-window segmentation on the monocular video according to the spatial volume of the indoor space to obtain video segments; performing local reconstruction on the video segments to obtain local reconstruction point clouds; performing key-frame joint registration on the local reconstruction point clouds to obtain registered scene frames; and performing global scene optimization on the registered scene frames according to spatial constraints to obtain a three-dimensional reconstruction result. Embodiments of the application can improve the accuracy of three-dimensional reconstruction and are widely applicable in the technical field of computer vision.

Description

Three-dimensional reconstruction method, system, device, and medium based on monocular video
Technical Field
The application relates to the technical field of computer vision, and in particular to a three-dimensional reconstruction method, system, device, and medium based on monocular video.
Background
In the related art, three-dimensional reconstruction methods generally collect a large amount of three-dimensional point cloud or image data using multiple stereo cameras and similar devices, and convert the data into a three-dimensional model through a three-dimensional reconstruction algorithm. In practice, however, these methods have been found to struggle in small-space environments: the narrow field of view increases the number of regions with repeated features, which causes registration ambiguity, and the limited range of movement reduces the effectiveness of motion parallax, which aggravates monocular drift. Both effects impair the efficiency of three-dimensional reconstruction. The related art therefore leaves room for improvement.
Disclosure of Invention
The main purpose of the embodiments of the application is to provide a three-dimensional reconstruction method, system, device, and medium based on monocular video that can improve the accuracy of three-dimensional reconstruction.
To achieve the above object, one aspect of the embodiments of the present application provides a three-dimensional reconstruction method based on monocular video, the method including:
performing data acquisition on an indoor space through a monocular camera to obtain a monocular video;
performing sliding-window segmentation on the monocular video according to the spatial volume of the indoor space to obtain a video segment;
performing local reconstruction on the video segment to obtain a local reconstruction point cloud;
performing key-frame joint registration on the local reconstruction point cloud to obtain a registered scene frame; and
performing global scene optimization on the registered scene frame according to spatial constraints to obtain a three-dimensional reconstruction result.
In some embodiments, performing sliding-window segmentation on the monocular video according to the spatial volume of the indoor space to obtain a video segment includes the following steps:
calculating the volume of the indoor space from the monocular video to obtain the spatial volume;
initializing a sliding window according to the spatial volume, and adjusting the length of the sliding window based on motion blur detection to obtain a target sliding window; and
segmenting the monocular video according to the target sliding window to obtain the video segment.
In some embodiments, performing local reconstruction on the video segment to obtain a local reconstruction point cloud includes the following steps:
extracting features from the video segment through an image encoder, and performing temporal feature fusion on the extracted features using a gated recurrent unit (GRU) to obtain spatial scene features;
performing multi-view information fusion on the spatial scene features through a key frame decoder to obtain multi-view information;
performing key frame information supplementation on the spatial scene features through a support frame decoder to obtain key frame information;
performing bidirectional cross-attention on the multi-view information and the key frame information under a spatial locality constraint to obtain fused features; and
performing regression prediction on the fused features through a point cloud regression module based on deformable convolution to obtain the local reconstruction point cloud.
In some embodiments, performing key-frame joint registration on the local reconstruction point cloud to obtain a registered scene frame includes the following steps:
acquiring a scene frame buffer pool, wherein the scene frame buffer pool comprises historical scene frames;
performing coordinate transformation on the local reconstruction point cloud to obtain global point cloud data; and
performing registration retrieval on the global point cloud data according to the scene frame buffer pool to obtain the registered scene frame.
In some embodiments, performing registration retrieval on the global point cloud data according to the scene frame buffer pool to obtain the registered scene frame includes the following steps:
performing cosine-similarity retrieval over the historical scene frames in the scene frame buffer pool according to the global point cloud data to generate a key frame set;
performing spatio-temporal feature alignment on the key frame set to obtain cross-key-frame spatio-temporal features;
performing three-dimensional point cloud registration on the key frame set according to the cross-key-frame spatio-temporal features to obtain a registered point cloud; and
performing point cloud fusion on the registered point cloud to obtain the registered scene frame.
In some embodiments, performing global scene optimization on the registered scene frame according to spatial constraints to obtain a three-dimensional reconstruction result includes the following steps:
performing point cloud optimization on the registered scene frame to obtain point cloud optimization data;
performing plane constraint optimization on the point cloud optimization data to obtain plane optimization data; and
performing spatial topology optimization on the plane optimization data to obtain the three-dimensional reconstruction result.
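The plane constraint step above is not spelled out at this point in the text. As a minimal sketch of one common way to impose such a constraint, the following fits a least-squares plane to a set of points (for example, points believed to lie on a wall or floor) and snaps points that are already close to the plane onto it. The function names, the SVD-based fit, and the tolerance `tol` are assumptions for illustration, not the method claimed here.

```python
import numpy as np


def fit_plane(points):
    """Least-squares plane through a point set: returns (centroid, unit normal)."""
    pts = np.asarray(points, dtype=np.float64)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]


def snap_to_plane(points, centroid, normal, tol):
    """Project points within `tol` of the plane onto it; leave the rest untouched."""
    pts = np.asarray(points, dtype=np.float64)
    dist = (pts - centroid) @ normal          # signed distance to the plane
    near = np.abs(dist) <= tol
    out = pts.copy()
    out[near] -= np.outer(dist[near], normal)
    return out
```

Points farther than `tol` from the plane are left unchanged, so furniture or clutter in front of a wall would not be flattened onto it.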
In some embodiments, performing spatial topology optimization on the plane optimization data to obtain the three-dimensional reconstruction result includes the following steps:
constructing a topology from the plane optimization data to obtain a scene topology graph; and
iteratively optimizing the scene topology graph with a graph convolutional network to obtain the three-dimensional reconstruction result.
To achieve the above object, another aspect of the embodiments of the present application provides a three-dimensional reconstruction system based on monocular video, the system including:
a first module, configured to perform data acquisition on an indoor space through a monocular camera to obtain a monocular video;
a second module, configured to perform sliding-window segmentation on the monocular video according to the spatial volume of the indoor space to obtain a video segment;
a third module, configured to perform local reconstruction on the video segment to obtain a local reconstruction point cloud;
a fourth module, configured to perform key-frame joint registration on the local reconstruction point cloud to obtain a registered scene frame; and
a fifth module, configured to perform global scene optimization on the registered scene frame according to spatial constraints to obtain a three-dimensional reconstruction result.
To achieve the above object, another aspect of the embodiments of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the method described above when executing the computer program.
To achieve the above object, another aspect of the embodiments of the present application proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
The embodiments of the application provide at least the following beneficial effects. The application provides a three-dimensional reconstruction method, system, device, and medium based on monocular video. A monocular video is obtained by capturing an indoor space with a monocular camera, and video segments are obtained by sliding-window segmentation of the monocular video according to the spatial volume of the indoor space. Because the window length is adapted to the spatial volume, the overlap rate between local windows can be increased and monocular scale drift reduced, which provides a data basis for the subsequent local reconstruction. In addition, the scheme obtains local reconstruction point clouds by local reconstruction of the video segments, obtains registered scene frames by key-frame joint registration of the local reconstruction point clouds, and obtains a three-dimensional reconstruction result by global scene optimization of the registered scene frames under spatial constraints. The spatial constraints make it possible to check reconstruction completeness, reduce registration errors, and improve the accuracy of three-dimensional reconstruction.
Drawings
Fig. 1 is a flowchart of a three-dimensional reconstruction method based on monocular video according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a three-dimensional reconstruction system based on monocular video according to an embodiment of the present application;
Fig. 3 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with embodiments of the application, but are merely examples of systems and methods consistent with aspects of embodiments of the application as detailed in the accompanying claims.
It is to be understood that the terms "first," "second," and the like, as used herein, may be used to describe various concepts, but the concepts are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of embodiments of the present application. The word "if", as used herein, may be interpreted as "when" or "in response to a determination", depending on the context.
As used herein, "at least one" includes one, two, or more; "a plurality" includes two or more; "each" refers to every one of the corresponding plurality; and "any" refers to any one of the plurality.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
In the related art, three-dimensional reconstruction methods generally collect a large amount of three-dimensional point cloud or image data using multiple stereo cameras and similar devices, and convert the data into a three-dimensional model through a three-dimensional reconstruction algorithm. In practice, however, these methods have been found to struggle in small-space environments: the narrow field of view increases the number of regions with repeated features, which causes registration ambiguity, and the limited range of movement reduces the effectiveness of motion parallax, which aggravates monocular drift. Both effects impair the efficiency of three-dimensional reconstruction. The related art therefore leaves room for improvement. For example, the related art uses simultaneous localization and mapping (SLAM) for three-dimensional reconstruction, but such methods require offline processing and therefore cannot meet real-time requirements, while real-time dense SLAM systems fall short in reconstruction accuracy and completeness. Solutions based on depth sensors, in turn, are costly and limited by the environment.
In view of the above, embodiments of the present application provide a three-dimensional reconstruction method, system, device, and medium based on monocular video. A monocular video is obtained by capturing an indoor space with a monocular camera, and video segments are obtained by sliding-window segmentation of the monocular video according to the spatial volume of the indoor space. Because the window length is adapted to the spatial volume, the overlap rate between local windows can be increased and monocular scale drift reduced, which provides a data basis for the subsequent local reconstruction. In addition, the scheme obtains local reconstruction point clouds by local reconstruction of the video segments, obtains registered scene frames by key-frame joint registration of the local reconstruction point clouds, and obtains a three-dimensional reconstruction result by global scene optimization of the registered scene frames under spatial constraints. The spatial constraints make it possible to check reconstruction completeness, reduce registration errors, and improve the accuracy of three-dimensional reconstruction.
The embodiments of the application provide a three-dimensional reconstruction method based on monocular video, and relate to the technical field of computer vision. The method can be applied to a terminal, to a server, or to software running in a terminal or server. In some embodiments, the terminal may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, or a vehicle-mounted terminal. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the server may also be a node server in a blockchain network. The software may be an application that implements the three-dimensional reconstruction method based on monocular video, but is not limited to this form.
The application is operational with numerous general-purpose or special-purpose computer system environments or configurations, such as personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
Fig. 1 is an optional flowchart of a three-dimensional reconstruction method based on monocular video according to an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S101 to S105.
Step S101, performing data acquisition on an indoor space through a monocular camera to obtain a monocular video;
Step S102, performing sliding-window segmentation on the monocular video according to the spatial volume of the indoor space to obtain a video segment;
Step S103, performing local reconstruction on the video segment to obtain a local reconstruction point cloud;
Step S104, performing key-frame joint registration on the local reconstruction point cloud to obtain a registered scene frame;
Step S105, performing global scene optimization on the registered scene frame according to spatial constraints to obtain a three-dimensional reconstruction result.
Steps S101 to S105 achieve real-time, high-precision, high-completeness three-dimensional reconstruction in a small-space environment using only an ordinary monocular camera. In the embodiments of the application, a small space refers to an enclosed indoor space with a floor area of 10 to 50 square meters and a height of no more than 5 meters. Specifically, a monocular video is first obtained by capturing the indoor space with the monocular camera. The length of the sliding window is then adapted to the spatial volume, and the sliding window segments the input video stream into overlapping short segments. Next, local reconstruction is performed on each video segment: an improved multi-branch neural network model directly predicts dense 3D point maps for the frames in the window, yielding the local reconstruction point cloud, and a local coordinate system is established with the middle frame as the key frame. Key-frame joint registration is then performed on the local reconstruction point cloud, which is incrementally registered into a global coordinate system; historical scene frames related to the current frame are retrieved from the buffer pool based on visual similarity and a baseline suitability score, and the registered scene frame is obtained through joint registration. Finally, global scene optimization is performed on the registered scene frame according to spatial constraints to obtain the three-dimensional reconstruction result.
For the characteristics of a small-space environment, the embodiments of the application specifically tune the following parameters: the window length is set to 11 frames to balance reconstruction quality and efficiency; the scene frame buffer pool size is set to 30 to 50 frames; the 5 to 10 most relevant scene frames are retrieved for each registration; and a multi-key-frame joint registration strategy is used for registration.
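The segmentation into overlapping short segments with the middle frame as key frame can be sketched as follows. The 11-frame window length comes from the text; the stride of 5 frames, the function name `split_into_windows`, and the policy of dropping a trailing partial window are assumptions made only for illustration.

```python
from typing import List, Tuple


def split_into_windows(num_frames: int, window_len: int = 11,
                       stride: int = 5) -> List[Tuple[range, int]]:
    """Return (frame_indices, keyframe_index) for each overlapping window.

    The key frame is the middle frame of the window. A trailing partial
    window is dropped (an assumed policy, not taken from the text).
    """
    if window_len <= 0 or stride <= 0:
        raise ValueError("window_len and stride must be positive")
    windows = []
    start = 0
    while start + window_len <= num_frames:
        frames = range(start, start + window_len)
        windows.append((frames, start + window_len // 2))
        start += stride
    return windows


if __name__ == "__main__":
    for frames, key in split_into_windows(21):
        print(f"frames {frames.start}..{frames.stop - 1}, key frame {key}")
```

With a stride smaller than the window length, consecutive windows share `window_len - stride` frames, which is what gives neighbouring local reconstructions common content to register against.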
The advantage of this technical scheme is that the dynamic window segmentation strategy and the spatial topology constraints reduce registration ambiguity and monocular scale drift, which improves the accuracy of three-dimensional reconstruction.
In some embodiments, performing sliding-window segmentation on the monocular video according to the spatial volume of the indoor space to obtain a video segment includes the following steps:
calculating the volume of the indoor space from the monocular video to obtain the spatial volume;
initializing a sliding window according to the spatial volume, and adjusting the length of the sliding window based on motion blur detection to obtain a target sliding window; and
segmenting the monocular video according to the target sliding window to obtain the video segment.
In the embodiments of the application, the volume of the indoor space can be calculated from the acquired monocular video, or the spatial volume can be obtained by physically measuring the indoor space. For example, a deep learning model can perform space recognition on the monocular video to detect the walls or floor of the space, and a depth prediction model can predict the depth of the space, so that the spatial volume is calculated from the detected area and depth. The sliding window is then initialized based on the spatial volume; for example, the number of frames in the window is set as a function of V, where V denotes the spatial volume. Next, the length of the sliding window is adjusted based on motion blur detection: the sharpness or edge information of the image can be used to measure the degree of blur, and the window length is adapted accordingly. The monocular video is then segmented with the resulting target sliding window to obtain video segments.
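The blur-based adjustment can be sketched as follows, using the variance of the image Laplacian as the sharpness measure. The text does not state the adjustment rule or the mapping from volume to the initial length, so the initial length `base_len` is taken as an input, and the threshold, the length limits, and the "lengthen the window when more frames are blurry" policy are all assumptions for illustration.

```python
import numpy as np


def laplacian_variance(gray) -> float:
    """Sharpness score: variance of the 4-neighbour Laplacian (higher = sharper).

    Expects a 2-D grayscale image of at least 3x3 pixels.
    """
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def adjust_window_length(frames, base_len, blur_threshold=100.0,
                         min_len=5, max_len=21):
    """Scale the window length by the fraction of blurry frames (assumed policy)."""
    if len(frames) == 0:
        return base_len
    blurry = sum(laplacian_variance(f) < blur_threshold for f in frames)
    ratio = blurry / len(frames)
    # More blurry frames -> longer window, so that enough sharp frames remain.
    new_len = int(round(base_len * (1.0 + ratio)))
    return max(min_len, min(max_len, new_len))
```

In practice the threshold depends on resolution and image content, so it would need to be calibrated on footage from the target camera.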
The advantage of this technical scheme is that dynamically adjusting the sliding window adapts its size to the needs of the scene, which balances resource use against performance and provides a data basis for the subsequent three-dimensional reconstruction.
In some embodiments, performing local reconstruction on the video segment to obtain a local reconstruction point cloud includes the following steps:
extracting features from the video segment through an image encoder, and performing temporal feature fusion on the extracted features using a gated recurrent unit (GRU) to obtain spatial scene features;
performing multi-view information fusion on the spatial scene features through a key frame decoder to obtain multi-view information;
performing key frame information supplementation on the spatial scene features through a support frame decoder to obtain key frame information;
performing bidirectional cross-attention on the multi-view information and the key frame information under a spatial locality constraint to obtain fused features; and
performing regression prediction on the fused features through a point cloud regression module based on deformable convolution to obtain the local reconstruction point cloud.
In the embodiments of the application, features are extracted from the video segment through an image encoder. Because features in small-space scenes are highly repetitive, a cross-window feature transfer mechanism is introduced, and a gated recurrent unit (GRU) performs temporal feature fusion on the extracted features. The update gate of the GRU is given by:
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right)$$
where $z_t$ denotes the update gate output vector at the current time step, $\sigma$ denotes the sigmoid activation function, $W_z$ denotes the learnable weight matrix of the update gate, $h_{t-1}$ denotes the hidden state vector of the previous time step, and $x_t$ denotes the input feature vector at the current time step. Multi-view information fusion is then performed on the spatial scene features through a key frame decoder to obtain multi-view information, and key frame information supplementation is performed on the spatial scene features through a support frame decoder to obtain key frame information. The embodiments of the application also add a spatial locality constraint to the bidirectional cross-attention mechanism and define an attention radius according to the characteristics of a small-space environment. The attention radius is calculated as follows:
where $r$ is the attention radius and $W$ and $H$ are the image width and height, respectively. Through the attention radius, the embodiments of the application force attention onto local geometric associations in the small-space scene. Note that the embodiments of the application use the middle frame as the key frame to establish the local coordinate system. Finally, regression prediction is performed on the fused features through the point cloud regression module based on deformable convolution to obtain the local reconstruction point cloud. When a refinement module based on deformable convolution is added to the point cloud regression module, the embodiments of the application bound the convolution kernel deformation offset according to the small-space characteristics. The convolution kernel deformation offset is calculated as follows:
where $\Delta p$ is the convolution kernel deformation offset, $D$ is the estimated scene depth, and $f$ is the focal length parameter.
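The update gate given earlier in this passage can be written out directly. The sketch below also completes one recurrent step using the standard GRU reset gate and candidate state, which are not stated in the text; biases are omitted to match the quoted formula, and the final blending convention (a larger $z_t$ moves the state toward the candidate) is one of the two common conventions and is likewise an assumption.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def update_gate(W_z, h_prev, x_t):
    """z_t = sigma(W_z . [h_{t-1}, x_t]), as in the formula in the text."""
    return sigmoid(W_z @ np.concatenate([h_prev, x_t]))


def gru_step(W_z, W_r, W_h, h_prev, x_t):
    """One GRU step; reset gate and candidate follow the standard formulation."""
    z = update_gate(W_z, h_prev, x_t)
    r = sigmoid(W_r @ np.concatenate([h_prev, x_t]))          # reset gate (assumed)
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))  # candidate (assumed)
    return (1.0 - z) * h_prev + z * h_cand


if __name__ == "__main__":
    hidden, feat = 4, 3
    rng = np.random.default_rng(0)
    W = [rng.normal(scale=0.1, size=(hidden, hidden + feat)) for _ in range(3)]
    h = np.zeros(hidden)
    for x in rng.normal(size=(11, feat)):   # e.g. per-frame features of one window
        h = gru_step(*W, h, x)
    print(h.shape)
```

Each weight matrix has shape `(hidden, hidden + feat)` because it acts on the concatenation of the previous hidden state and the current input.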
The advantage of this technical scheme is that the video segment is locally reconstructed and spatial scale constraints matched to the small-space environment are added, so that features in a narrow field of view are captured better and the accuracy of feature extraction and reconstruction is improved.
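The update gate described above can be illustrated with a minimal sketch in plain Python. It evaluates only the single formula given in the text; the function names, the toy dimensions, and the weight values are assumptions chosen for illustration and are not taken from the application.

```python
import math
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """Logistic function, the sigma in the update-gate formula."""
    return 1.0 / (1.0 + math.exp(-v))


def update_gate(w_z: Sequence[Sequence[float]],
                h_prev: Sequence[float],
                x_t: Sequence[float]) -> List[float]:
    """Compute z_t = sigma(W_z . [h_{t-1}, x_t]) for one time step."""
    concat = list(h_prev) + list(x_t)  # the concatenation [h_{t-1}, x_t]
    if any(len(row) != len(concat) for row in w_z):
        raise ValueError("each row of W_z must have len(h_prev) + len(x_t) entries")
    return [sigmoid(sum(w * c for w, c in zip(row, concat))) for row in w_z]


# Toy example (assumed sizes): hidden size 2, input size 1.
# All-zero weights give a gate value of exactly 0.5 in every component.
z_t = update_gate([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.3, -0.2], [1.0])
print(z_t)
```

A gate value close to 1 in a component means that the new candidate state dominates that component, while a value close to 0 keeps more of the previous hidden state.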
In some embodiments, performing key frame joint registration processing on the local reconstruction point cloud to obtain a registered scene frame includes the following steps:
acquiring a scene frame buffer pool, wherein the scene frame buffer pool comprises historical scene frames;
performing coordinate transformation processing on the local reconstruction point cloud to obtain global point cloud data;
performing registration retrieval processing on the global point cloud data according to the scene frame buffer pool to obtain the registered scene frame.
In the embodiment of the application, the scene frame buffer pool is a pre-constructed buffer pool containing multiple historical scene frames, which can be obtained from a database or by acquiring and processing the space in advance. The embodiment of the application incrementally registers the local reconstruction point cloud to the global coordinate system to obtain global point cloud data, and then performs registration retrieval on the global point cloud data according to the scene frame buffer pool. The pre-constructed scene frame buffer pool enables cross-window feature reuse, and several similar historical scene frames retrieved by cosine similarity are used as key frames for registration, so that the registered scene frame is obtained.
The advantage of this technical scheme is that, by performing registration retrieval on the global point cloud data, the embodiment of the application can jointly register multiple key frames in a batch, which improves registration efficiency.
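The cosine-similarity retrieval from the scene frame buffer pool can be sketched as follows. The sketch assumes that each historical scene frame is summarized by a fixed-length descriptor vector; the descriptor itself, the names `retrieve_key_frames`, `threshold`, and `top_k`, and the example values are hypothetical and are not specified in the text.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two descriptor vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_key_frames(query: Sequence[float],
                        pool: Dict[str, Sequence[float]],
                        threshold: float = 0.8,
                        top_k: int = 3) -> List[str]:
    """Return ids of the most similar historical frames, best match first."""
    scored = [(cosine_similarity(query, feat), frame_id) for frame_id, feat in pool.items()]
    kept = [pair for pair in scored if pair[0] >= threshold]  # threshold screening
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [frame_id for _, frame_id in kept[:top_k]]


# Hypothetical buffer pool with three historical scene frames.
buffer_pool = {
    "frame_a": [1.0, 0.0, 0.0],
    "frame_b": [0.9, 0.1, 0.0],
    "frame_c": [0.0, 1.0, 0.0],
}
print(retrieve_key_frames([1.0, 0.0, 0.0], buffer_pool))
```

Returning several frames rather than only the best match is what allows the subsequent registration to be carried out jointly over multiple key frames.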
In some embodiments, performing registration retrieval processing on the scene frame buffer pool according to the global point cloud data to obtain the registered scene frame includes the following steps:
performing cosine similarity retrieval processing on each historical scene frame in the scene frame buffer pool according to the global point cloud data to generate a key frame set;
performing spatio-temporal feature alignment processing on the key frame set to obtain cross-key-frame spatio-temporal features;
performing three-dimensional point cloud registration processing on the key frame set according to the cross-key-frame spatio-temporal features to obtain a registered point cloud;
performing point cloud fusion processing on the registered point cloud to obtain the registered scene frame.
In the embodiment of the application, the cosine similarity between the global point cloud data and each historical scene frame in the scene frame buffer pool is calculated, and a key frame set is generated by screening with a similarity threshold. Spatio-temporal feature alignment is then performed on the key frame set: a spatio-temporal feature cube is constructed, and a three-dimensional convolution kernel is applied to it to extract the cross-key-frame spatio-temporal features. Three-dimensional point cloud registration is then performed on the key frame set according to the cross-key-frame spatio-temporal features, using an improved iterative closest point (ICP) point cloud registration algorithm for multi-key-frame joint registration. The objective function is as follows:
Where the weights $w_k$ are dynamically calculated from the key frame confidence, $R$ denotes the rotation matrix, $t$ denotes the translation vector, and the two point symbols denote the $i$-th three-dimensional point of the $k$-th key frame in the source point cloud and the corresponding three-dimensional point in the target point cloud. Finally, point cloud fusion is performed on the registered point cloud to obtain the registered scene frame. The fusion may be performed by establishing a probabilistic fusion model, expressed as follows:
Where $p(x)$ denotes the fused probability density function at a three-dimensional spatial point $x$, $\alpha_k$ denotes the mixture weight coefficient of the $k$-th Gaussian component, each component density is a three-dimensional Gaussian probability density function, $\mu_k$ denotes the mean vector of the $k$-th Gaussian component, and $\Sigma_k$ denotes the covariance matrix of the $k$-th Gaussian component. The embodiment of the application can solve for the optimal fusion parameters with the expectation-maximization algorithm, set the point cloud confidence threshold to 3, and filter out low-quality reconstructed point cloud data.
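The expressions for the registration objective and the fusion model are not reproduced in this text. Forms that are consistent with the symbol definitions given, offered only as a plausible reading and not as the formulas of the application, are a confidence-weighted point-to-point registration objective and a Gaussian mixture density. The symbols $p_i^{k}$, $q_i^{k}$, $K$, and $N_k$ are assumed names introduced here for the source point, its corresponding target point, the number of key frames or Gaussian components, and the number of points in key frame $k$.

```latex
E(R, t) = \sum_{k=1}^{K} w_k \sum_{i=1}^{N_k} \left\| R\,p_i^{k} + t - q_i^{k} \right\|^{2},
\qquad
p(x) = \sum_{k=1}^{K} \alpha_k \, \mathcal{N}\!\left(x \mid \mu_k, \Sigma_k\right),
\quad \sum_{k=1}^{K} \alpha_k = 1 .
```

Under this reading, key frames with higher confidence contribute more to the estimated rigid transformation, and the mixture weights are the parameters that the expectation-maximization algorithm would estimate together with the means and covariances.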
The advantage of this technical scheme is that, by adopting a multi-key-frame joint registration strategy, the embodiment of the application can register multiple key frames simultaneously, which improves registration efficiency and accuracy.
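As a minimal sketch of multi-key-frame joint registration, the code below evaluates a joint cost for a given rigid transformation, assuming the objective is a confidence-weighted sum of squared point-to-point residuals over all key frames. It does not search for correspondences or minimize the cost, and all names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# (weight w_k, source points, matched target points) for one key frame
KeyFrame = Tuple[float, List[Point], List[Point]]


def apply_rigid(rotation: Sequence[Sequence[float]],
                translation: Sequence[float],
                p: Point) -> Point:
    """Return R p + t for a 3x3 rotation matrix R and a translation vector t."""
    x, y, z = (
        sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    return (x, y, z)


def joint_registration_cost(rotation: Sequence[Sequence[float]],
                            translation: Sequence[float],
                            key_frames: Sequence[KeyFrame]) -> float:
    """Weighted sum of squared residuals over all key frames (assumed objective form)."""
    total = 0.0
    for weight, sources, targets in key_frames:
        if len(sources) != len(targets):
            raise ValueError("source and target points must be paired one-to-one")
        for p, q in zip(sources, targets):
            moved = apply_rigid(rotation, translation, p)
            total += weight * sum((m - v) ** 2 for m, v in zip(moved, q))
    return total


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frames = [
    (0.75, [(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]),  # high-confidence key frame, fits exactly
    (0.25, [(0.0, 0.0, 0.0)], [(1.0, 2.0, 0.0)]),  # low-confidence key frame, 2 units off
]
print(joint_registration_cost(identity, (1.0, 0.0, 0.0), frames))  # 1.0
```

Because a single rotation and translation are scored against every key frame at once, a poorly matched low-confidence key frame shifts the solution less than a well-matched high-confidence one.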
In some embodiments, performing global scene optimization processing on the registered scene frame according to spatial constraints to obtain a three-dimensional reconstruction result includes the following steps:
performing point cloud optimization processing on the registered scene frame to obtain point cloud optimization data;
performing plane constraint optimization processing on the point cloud optimization data to obtain plane optimization data;
performing spatial topology optimization processing on the plane optimization data to obtain the three-dimensional reconstruction result.
In the embodiment of the application, a small-space optimization strategy is set according to the characteristics of the small-space scene. Point cloud distribution optimization is applied to the registered scene frame to obtain point cloud optimization data; plane constraint optimization is applied to the point cloud optimization data to obtain plane optimization data, for example by adding constraints for large planes such as walls; and spatial topology optimization is applied to the plane optimization data to obtain the three-dimensional reconstruction result. Specifically, the point cloud distribution optimization uses a density-aware clustering algorithm that optimizes the point cloud data by defining a density measure, expressed as follows:
Where $\rho(x)$ denotes the density measure, $x$ denotes the coordinates of the target three-dimensional point whose density is to be calculated, and $x_i$ denotes the coordinates of the $i$-th neighboring point within the neighborhood $N(x)$. An adaptive neighborhood radius $r = \mu_d + \alpha\sigma_d$ is then set, where $\mu_d$ is the average neighbor distance; the definitions of $\alpha$ and $\sigma_d$ are not given in this text. The distribution of the point cloud data is optimized according to the density measure and the adaptive neighborhood radius. The embodiment of the application also uses a multi-plane detection algorithm for plane constraint optimization, expressed as follows:
Where $n$ denotes the normal vector of the plane, $d$ denotes the distance from the plane to the origin, the gradient term describes the variation of the normal vector in space, and $\lambda$ denotes the regularization coefficient. The embodiment of the application builds a plane relationship graph and enforces orthogonality constraints, for example requiring the angle between wall surfaces to be 90° ± 5°, so that plane constraints are applied to the point cloud data and the plane optimization data is obtained.
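The orthogonality condition between wall surfaces (90° ± 5°) can be checked from the plane normal vectors, as in the following minimal sketch. The function names and the example normals are hypothetical; only the 5° tolerance comes from the text.

```python
import math
from typing import Sequence


def angle_between_deg(n1: Sequence[float], n2: Sequence[float]) -> float:
    """Angle in degrees between two plane normal vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm1 = math.sqrt(sum(a * a for a in n1))
    norm2 = math.sqrt(sum(b * b for b in n2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("normal vectors must be non-zero")
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_theta))


def is_orthogonal(n1: Sequence[float], n2: Sequence[float],
                  tolerance_deg: float = 5.0) -> bool:
    """True if the two planes meet at 90 degrees within the given tolerance."""
    return abs(angle_between_deg(n1, n2) - 90.0) <= tolerance_deg


wall_a = (1.0, 0.0, 0.0)
wall_b = (math.cos(math.radians(87.0)), math.sin(math.radians(87.0)), 0.0)
print(is_orthogonal(wall_a, wall_b))            # True: 87 degrees is within 90 +/- 5
print(is_orthogonal(wall_a, (1.0, 1.0, 0.0)))   # False: 45 degrees
```

Plane pairs that pass this check are the candidates for which an orthogonality constraint can be enforced in the plane relationship graph.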
The advantage of this technical scheme is that introducing plane orthogonality constraint optimization and density-adaptive topology optimization improves the accuracy of three-dimensional reconstruction.
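The adaptive neighborhood radius $r = \mu_d + \alpha\sigma_d$ used in the point cloud distribution optimization can be sketched as follows. The sketch assumes that $\mu_d$ is the mean nearest-neighbor distance, that $\sigma_d$ is the standard deviation of those distances, and that $\alpha$ is a scale factor chosen by the user; only the meaning of $\mu_d$ is stated in the text, so the other two interpretations are assumptions.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def nearest_neighbor_distances(points: Sequence[Point]) -> List[float]:
    """Distance from each point to its closest other point (brute force)."""
    if len(points) < 2:
        raise ValueError("at least two points are required")
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def adaptive_radius(points: Sequence[Point], alpha: float = 1.0) -> float:
    """r = mu_d + alpha * sigma_d, with sigma_d assumed to be a standard deviation."""
    distances = nearest_neighbor_distances(points)
    return mean(distances) + alpha * pstdev(distances)


# Three evenly spaced points plus one isolated point.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(adaptive_radius(cloud, alpha=1.0))
```

With this choice the radius grows when point spacing is uneven, so sparse regions are still assigned neighbors while dense regions are not over-smoothed. The brute-force search is quadratic in the number of points and is meant only for illustration.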
In some embodiments, performing spatial topology optimization processing on the plane optimization data to obtain the three-dimensional reconstruction result includes the following steps:
performing topology construction processing on the plane optimization data to obtain a scene topology graph;
performing iterative optimization processing on the scene topology graph according to a graph convolutional network to obtain the three-dimensional reconstruction result.
In the embodiment of the application, a scene topology graph G = (V, E, W) is constructed from the plane optimization data, in which the vertices represent spatial units. The distribution of the point cloud is then optimized through a graph convolutional network (GCN), with the following update formula:
Where $H^{(l+1)}$ denotes the node feature matrix of layer $l+1$, $\sigma$ denotes the nonlinear activation function (the embodiment of the application uses the ReLU activation function), the two matrix symbols denote the normalized form of the degree matrix and the normalized form of the adjacency matrix, $H^{(l)}$ denotes the input node features of layer $l$, and $W^{(l)}$ denotes the trainable weight matrix of layer $l$.
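The update formula itself is not reproduced in this text. The symbol definitions match the widely used graph convolution propagation rule, so, assuming that rule is what is intended, it can be written as below. The symbols $\tilde{A}$ and $\tilde{D}$, and the use of self-loops in $\tilde{A} = A + I$, are assumptions taken from that common formulation rather than from the application.

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-\frac{1}{2}} \, \tilde{A} \, \tilde{D}^{-\frac{1}{2}} \, H^{(l)} W^{(l)} \right),
\qquad \tilde{A} = A + I,
\quad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Here $A$ is the adjacency matrix of the scene topology graph, and the layer weight matrix $W^{(l)}$ is distinct from the edge weight set $W$ in $G = (V, E, W)$.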
The advantage of this technical scheme is that performing spatial topology optimization on the point cloud data through the graph convolutional network makes the reconstructed data better match the small-space environment and improves the accuracy of three-dimensional reconstruction.
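A single graph convolution layer over the scene topology graph can be sketched in plain Python as follows. The sketch assumes the common formulation with self-loops and symmetric degree normalization followed by ReLU; the tiny two-node graph and its weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adjacency: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One layer: ReLU(D^-1/2 (A + I) D^-1/2 H W), with D the degree of A + I."""
    n = len(adjacency)
    # Add self-loops so each node keeps its own features (assumed convention).
    a_tilde = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]
    # Symmetric normalization of the adjacency matrix.
    a_hat = [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
             for i in range(n)]
    out = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU activation


# Two connected spatial units with one feature each.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
print(gcn_layer(adjacency, features, [[1.0]]))  # [[2.0], [2.0]]
```

In the example, each node ends up with the average of its own feature and that of its neighbor, which illustrates how a layer smooths features across connected spatial units.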
The following describes and illustrates the embodiments of the present application in detail with reference to specific application examples:
The embodiment of the application receives monocular RGB video input and initializes the first window: every frame is tried as a key frame candidate, and the reconstruction result with the highest total confidence is selected to initialize the global scene. Optimization with small-space prior constraints then yields the three-dimensional reconstruction result. Video segments are obtained through a sliding window; the features of each frame are extracted by the image encoder, multi-view information is fused by the key frame decoder, key frame information is supplemented by the support frame decoder, and the 3D point cloud and its confidence are predicted by the regression head. The reconstructed point cloud data is registered to the global coordinate system: related historical scene frames are retrieved from the scene frame buffer pool, the registration decoder converts the coordinate system using the encoded image and geometric features, the scene decoder optimizes the global scene, and the scene frame buffer pool is updated. The result is then optimized with the small-space optimization strategy, and a plane orthogonality constraint is introduced to eliminate wall registration errors. Specifically, the embodiment of the application adopts a two-stage neural network framework: the video is divided into short segments by a sliding window mechanism, a first-stage network directly predicts local 3D point clouds, and a second-stage network incrementally registers them to the global coordinate system. The window size, scene frame management strategy, and spatial constraints are optimized for the small-space environment, achieving high-quality real-time reconstruction without explicit camera parameter estimation.
By introducing spatial-volume-aware window segmentation, plane orthogonality constraint optimization, and density-adaptive topology optimization, the reconstruction completeness is improved by 42.7% in a standard 5 m × 5 m test scene compared with related three-dimensional reconstruction methods, the registration error is reduced to 0.11 m, and real-time performance of 23 FPS is maintained.
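The sliding window mechanism in this application example can be illustrated with a minimal sketch that splits a frame sequence into overlapping segments and picks the intermediate frame of each segment as its key frame. The rule that derives the window length from the spatial volume and from motion blur detection is not given in this passage, so `window` and `overlap` are plain parameters here, and all names and values are hypothetical.

```python
from typing import List


def sliding_windows(num_frames: int, window: int, overlap: float) -> List[range]:
    """Split frame indices 0..num_frames-1 into overlapping segments."""
    if window <= 0 or not 0.0 <= overlap < 1.0:
        raise ValueError("window must be positive and overlap must be in [0, 1)")
    stride = max(1, int(round(window * (1.0 - overlap))))
    segments: List[range] = []
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        segments.append(range(start, end))
        if end == num_frames:  # the last frame is covered, so stop
            break
        start += stride
    return segments


def middle_key_frame(segment: range) -> int:
    """Index of the intermediate frame, used as the key frame of the segment."""
    return segment[len(segment) // 2]


for seg in sliding_windows(num_frames=10, window=4, overlap=0.5):
    print(seg.start, seg.stop, middle_key_frame(seg))
```

A higher overlap yields more shared frames between neighboring segments, which gives the later registration stage more common content at the cost of processing more segments.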
Referring to fig. 2, the embodiment of the present application further provides a three-dimensional reconstruction system based on a monocular video, which can implement the three-dimensional reconstruction method based on a monocular video, where the system includes:
a first module 201, configured to perform video acquisition processing on an indoor space through a monocular camera to obtain a monocular video;
a second module 202, configured to perform sliding window segmentation processing on the monocular video according to the spatial volume of the indoor space, so as to obtain a video segment;
a third module 203, configured to perform local reconstruction processing on the video segment to obtain a local reconstruction point cloud;
a fourth module 204, configured to perform key frame joint registration processing on the local reconstruction point cloud to obtain a registered scene frame;
a fifth module 205, configured to perform global scene optimization processing on the registered scene frame according to spatial constraints, so as to obtain a three-dimensional reconstruction result.
It can be understood that the content in the above method embodiment is applicable to the system embodiment, and the functions specifically implemented by the system embodiment are the same as those of the above method embodiment, and the achieved beneficial effects are the same as those of the above method embodiment.
The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the three-dimensional reconstruction method based on the monocular video when executing the computer program. The electronic equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
It can be understood that the content in the above method embodiment is applicable to the embodiment of the present apparatus, and the specific functions implemented by the embodiment of the present apparatus are the same as those of the embodiment of the above method, and the achieved beneficial effects are the same as those of the embodiment of the above method.
Referring to fig. 3, fig. 3 illustrates a hardware structure of an electronic device according to another embodiment, where the electronic device includes:
the processor 301 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs so as to implement the technical solutions provided by the embodiments of the present application;
the memory 302 may be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 302 may store an operating system and other application programs. When the technical solutions provided in the embodiments of the present disclosure are implemented by software or firmware, the relevant program code is stored in the memory 302 and invoked by the processor 301 to execute the three-dimensional reconstruction method based on monocular video of the embodiments of the present disclosure;
an input/output interface 303 for implementing information input and output;
the communication interface 304 is configured to implement communication between this device and other devices, either in a wired manner (e.g., USB, network cable) or in a wireless manner (e.g., mobile network, Wi-Fi, Bluetooth);
a bus 305 for transferring information between various components of the device (e.g., processor 301, memory 302, input/output interface 303, and communication interface 304);
Wherein the processor 301, the memory 302, the input/output interface 303 and the communication interface 304 are communicatively coupled to each other within the device via a bus 305.
The embodiment of the application also provides a computer readable storage medium, which stores a computer program, and the computer program realizes the three-dimensional reconstruction method based on monocular video when being executed by a processor.
It can be understood that the content of the above method embodiment is applicable to the present storage medium embodiment, and the functions of the present storage medium embodiment are the same as those of the above method embodiment, and the achieved beneficial effects are the same as those of the above method embodiment.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
According to the three-dimensional reconstruction method, system, device, and medium based on monocular video, a monocular video is obtained by performing data acquisition on an indoor space through a monocular camera, and video segments are obtained by performing sliding window segmentation on the monocular video according to the spatial volume of the indoor space. The window length can be adjusted adaptively according to the spatial volume and the overlap rate of local windows can be increased, which reduces monocular scale drift and provides a data basis for the subsequent local reconstruction. In addition, the scheme obtains a local reconstruction point cloud by performing local reconstruction on the video segments, obtains a registered scene frame by performing key frame joint registration on the local reconstruction point cloud, and obtains a three-dimensional reconstruction result by performing global scene optimization on the registered scene frame according to spatial constraints. Based on the spatial constraints, reconstruction completeness can be assessed, registration errors are reduced, and the accuracy of three-dimensional reconstruction is improved.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by persons skilled in the art that the embodiments of the application are not limited by the illustrations, and that more or fewer steps than those shown may be included, or certain steps may be combined, or different steps may be included.
The system embodiments described above are merely illustrative, in that the units illustrated as separate components may or may not be physically separate, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of" or a similar expression means any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.
In the several embodiments provided by the present application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative, e.g., the division of the above elements is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interface, system or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present application. The storage medium includes various media capable of storing programs, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and are not thereby limiting the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (9)

1.一种基于单目视频的三维重建方法,其特征在于,所述方法包括以下步骤:1. A three-dimensional reconstruction method based on monocular video, characterized in that the method includes the following steps: 通过单目摄像头对室内空间进行数据采集处理,得到单目视频;The data of the indoor space is collected and processed by a monocular camera to obtain monocular video. 根据所述室内空间的空间体积对所述单目视频进行滑动窗口分割处理,得到视频片段;The monocular video is segmented using a sliding window based on the spatial volume of the indoor space to obtain video segments. 对所述视频片段进行局部重建处理,得到局部重建点云;The video segment is subjected to local reconstruction processing to obtain a locally reconstructed point cloud; 对所述局部重建点云进行关键帧联合配准处理,得到配准场景帧;The locally reconstructed point cloud is subjected to keyframe joint registration processing to obtain the registered scene frame; 根据空间约束对所述配准场景帧进行全局场景优化处理,得到三维重建结果;Global scene optimization processing is performed on the registered scene frames based on spatial constraints to obtain the 3D reconstruction results; 所述根据所述室内空间的空间体积对所述单目视频进行滑动窗口分割处理,得到视频片段,包括以下步骤:The process of performing sliding window segmentation on the monocular video based on the spatial volume of the indoor space to obtain video segments includes the following steps: 根据所述单目视频对所述室内空间进行体积计算处理,得到所述空间体积;The volume of the indoor space is calculated based on the monocular video. 根据所述空间体积初始化滑动窗口,并基于运动模糊检测对所述滑动窗口的长度进行调整处理,得到目标滑动窗口;The sliding window is initialized based on the spatial volume, and the length of the sliding window is adjusted based on motion blur detection to obtain the target sliding window; 根据所述目标滑动窗口对所述单目视频进行分割处理,得到所述视频片段。The monocular video is segmented according to the target sliding window to obtain the video segment. 2.根据权利要求1所述的方法,其特征在于,所述对所述视频片段进行局部重建处理,得到局部重建点云,包括以下步骤:2. 
The method according to claim 1, characterized in that, the step of performing local reconstruction processing on the video segment to obtain a locally reconstructed point cloud includes the following steps: 通过图像编码器对所述视频片段进行特征提取处理,并采用门控循环单元对提取得到的特征进行时序特征融合处理,得到空间场景特征;The video segment is processed by an image encoder to extract features, and the extracted features are fused by a gated loop unit to obtain spatial scene features. 通过关键帧解码器对所述空间场景特征进行多视图信息融合处理,得到多视图信息;The spatial scene features are fused using a keyframe decoder to obtain multi-view information. 通过支持帧解码器对所述空间场景特征进行关键帧信息补充处理,得到关键帧信息;By supporting frame decoder to supplement keyframe information of the spatial scene features, keyframe information is obtained; 根据空间局部性约束对所述多视图信息和所述关键帧信息进行双向交叉注意力计算处理,得到融合特征;Based on spatial locality constraints, bidirectional cross-attention calculation is performed on the multi-view information and the keyframe information to obtain fused features; 基于可变形卷积的点云回归模块对所述融合特征进行回归预测处理,得到所述局部重建点云。The point cloud regression module based on deformable convolution performs regression prediction processing on the fused features to obtain the local reconstructed point cloud. 3.根据权利要求1所述的方法,其特征在于,所述对所述局部重建点云进行关键帧联合配准处理,得到配准场景帧,包括以下步骤:3. The method according to claim 1, characterized in that, the step of performing keyframe joint registration processing on the locally reconstructed point cloud to obtain a registered scene frame includes the following steps: 获取场景帧缓冲池,所述场景帧缓冲池包括历史场景帧;Obtain the scene frame buffer pool, which includes historical scene frames; 对所述局部重建点云进行坐标变换处理,得到全局点云数据;The local reconstructed point cloud is subjected to coordinate transformation to obtain global point cloud data; 根据所述场景帧缓冲池对所述全局点云数据进行配准检索处理,得到所述配准场景帧。The global point cloud data is registered and retrieved based on the scene frame buffer pool to obtain the registered scene frame. 4.根据权利要求3所述的方法,其特征在于,所述根据所述全局点云数据对所述场景帧缓冲池进行配准检索处理,得到所述配准场景帧,包括以下步骤:4. 
The method according to claim 3, characterized in that performing registration retrieval processing on the scene frame buffer pool based on the global point cloud data to obtain the registered scene frame comprises the following steps:
performing cosine similarity retrieval processing on each historical scene frame in the scene frame buffer pool based on the global point cloud data to generate a keyframe set;
performing spatiotemporal feature alignment processing on the keyframe set to obtain cross-keyframe spatiotemporal features;
performing three-dimensional point cloud registration processing on the keyframe set based on the cross-keyframe spatiotemporal features to obtain a registered point cloud;
performing point cloud fusion processing on the registered point cloud to obtain the registered scene frame.
5. The method according to claim 1, characterized in that performing global scene optimization processing on the registered scene frame according to spatial constraints to obtain the three-dimensional reconstruction result comprises the following steps:
performing point cloud optimization processing on the registered scene frame to obtain point cloud optimization data;
performing planar constraint optimization processing on the point cloud optimization data to obtain planar optimization data;
performing spatial topology optimization processing on the planar optimization data to obtain the three-dimensional reconstruction result.
6. The method according to claim 5, characterized in that performing spatial topology optimization processing on the planar optimization data to obtain the three-dimensional reconstruction result comprises the following steps:
performing topology construction processing on the planar optimization data to obtain a scene topology graph;
performing iterative optimization processing on the scene topology graph using a graph convolutional network to obtain the three-dimensional reconstruction result.
7. A three-dimensional reconstruction system based on monocular video, characterized in that the system comprises:
a first module, configured to perform video acquisition processing on an indoor space through a monocular camera to obtain a monocular video;
a second module, configured to perform sliding window segmentation processing on the monocular video according to the spatial volume of the indoor space to obtain video segments;
a third module, configured to perform local reconstruction processing on the video segments to obtain a local reconstructed point cloud;
a fourth module, configured to perform keyframe joint registration processing on the local reconstructed point cloud to obtain a registered scene frame;
a fifth module, configured to perform global scene optimization processing on the registered scene frame according to spatial constraints to obtain a three-dimensional reconstruction result;
wherein the second module performing sliding window segmentation processing on the monocular video according to the spatial volume of the indoor space to obtain the video segments comprises:
performing volume calculation processing on the indoor space according to the monocular video to obtain the spatial volume;
initializing a sliding window according to the spatial volume, and adjusting the length of the sliding window based on motion blur detection to obtain a target sliding window;
segmenting the monocular video according to the target sliding window to obtain the video segments.
8. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the method according to any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 6.
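By way of non-limiting illustration, the cosine similarity retrieval step recited in claim 4 could be sketched as follows. The claims do not specify how the global point cloud data or the historical scene frames are encoded, so this sketch assumes each is already summarized as a fixed-length descriptor vector. The function names, the `threshold` value of 0.8, and the optional `top_k` limit are hypothetical choices made for the example and are not taken from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector has zero norm (assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_keyframes(global_descriptor, frame_pool, threshold=0.8, top_k=None):
    """Select historical scene frames whose descriptor is similar to the global one.

    global_descriptor: assumed vector summary of the global point cloud data.
    frame_pool: dict mapping a frame id to that frame's descriptor vector
                (stands in for the scene frame buffer pool).
    threshold: hypothetical minimum similarity for a frame to count as a keyframe.
    top_k: optional cap on the number of keyframes returned.

    Returns a list of (frame_id, similarity) pairs, most similar first.
    """
    scored = [
        (frame_id, cosine_similarity(global_descriptor, descriptor))
        for frame_id, descriptor in frame_pool.items()
    ]
    selected = [pair for pair in scored if pair[1] >= threshold]
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return selected if top_k is None else selected[:top_k]


# Example usage with made-up three-dimensional descriptors.
pool = {
    "f0": [1.0, 0.0, 0.0],
    "f1": [0.9, 0.1, 0.0],
    "f2": [0.0, 1.0, 0.0],
}
keyframes = retrieve_keyframes([1.0, 0.0, 0.0], pool)
```

In this sketch a fixed threshold decides membership in the keyframe set; an implementation could equally keep only the top-ranked frames, which is what the `top_k` argument allows.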
CN202510722081.4A 2025-05-30 2025-05-30 A method, system, device, and medium for 3D reconstruction based on monocular video. Active CN120672942B (en)

Priority Applications (1)

Application Number Publication Number Priority Date Filing Date Title
CN202510722081.4A CN120672942B (en) 2025-05-30 2025-05-30 A method, system, device, and medium for 3D reconstruction based on monocular video.

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
CN202510722081.4A CN120672942B (en) 2025-05-30 2025-05-30 A method, system, device, and medium for 3D reconstruction based on monocular video.

Publications (2)

Publication Number Publication Date
CN120672942A CN120672942A (en) 2025-09-19
CN120672942B CN120672942B (en) 2026-04-03

Family

ID=97055982

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202510722081.4A Active CN120672942B (en) 2025-05-30 2025-05-30 A method, system, device, and medium for 3D reconstruction based on monocular video.

Country Status (1)

Country Link
CN (1) CN120672942B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119206106A (en) * 2024-09-04 2024-12-27 西安电子科技大学 A three-dimensional mapping method and system
CN119229018A (en) * 2024-09-29 2024-12-31 浙江大学 A method, system and device for 3D reconstruction of monocular dynamic video based on scene flow prediction and neural implicit expression

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104103062A (en) * 2013-04-08 2014-10-15 富士通株式会社 Image processing device and image processing method
US9648303B1 (en) * 2015-12-15 2017-05-09 Disney Enterprises, Inc. Systems and methods for facilitating three-dimensional reconstruction of scenes from videos
CN110310285B (en) * 2019-05-14 2022-12-20 武汉泓毅智云信息有限公司 Accurate burn area calculation method based on three-dimensional human body reconstruction
CN112435325B (en) * 2020-09-29 2022-06-07 北京航空航天大学 VI-SLAM and depth estimation network-based unmanned aerial vehicle dense scene reconstruction method
US12586293B2 (en) * 2023-01-19 2026-03-24 Nvidia Corporation Scene reconstruction from monocular video
CN118470203B (en) * 2024-05-17 2025-03-04 广州极点三维信息科技有限公司 Indoor 3D reconstruction and whole-home design method and system based on big data

Also Published As

Publication number Publication date
CN120672942A (en) 2025-09-19

Similar Documents

Publication Publication Date Title
CN109544677B (en) Indoor scene main structure reconstruction method and system based on depth image key frame
Whelan et al. Real-time large-scale dense RGB-D SLAM with volumetric fusion
CN112435338B (en) Method and device for acquiring position of interest point of electronic map and electronic equipment
CN109658445A (en) Network training method, incremental mapping method, localization method, device and equipment
US10554957B2 (en) Learning-based matching for active stereo systems
WO2009023044A2 (en) Method and system for fast dense stereoscopic ranging
Pascoe et al. Robust direct visual localisation using normalised information distance.
CN120236003A (en) Three-dimensional modeling method and device
CN114359377B (en) A real-time 6D pose estimation method and computer-readable storage medium
KR102615412B1 (en) Apparatus and method for performing visual localization
GB2566443A (en) Cross-source point cloud registration
CN115984093A (en) Infrared image-based depth estimation method, electronic device and storage medium
CN120088514A (en) Image feature matching model, estimation method and system based on spatial geometric constraints
CN121236166A (en) Method, device, equipment and storage medium for detecting three-dimensional space change
CN104463962B (en) Three-dimensional scene reconstruction method based on GPS information video
CN117392228A (en) Visual odometry calculation method, device, electronic equipment and storage medium
Delmerico et al. Building facade detection, segmentation, and parameter estimation for mobile robot stereo vision
Qin et al. Depth estimation by parameter transfer with a lightweight model for single still images
CN117576363A (en) A positioning technology method based on monocular vision/inertia in complex indoor environments
CN121353500A (en) A method and related equipment for 3D reconstruction of underwater scenes based on 3D Gaussian splatting
Zhang et al. A robust RGB‐D visual odometry with moving object detection in dynamic indoor scenes
CN118470203B (en) Indoor 3D reconstruction and whole-home design method and system based on big data
CN120672942B (en) A method, system, device, and medium for 3D reconstruction based on monocular video.
CN120070733A (en) Binocular data generation method, binocular data generation system, electronic equipment and storage medium
CN120178265A (en) A road pothole detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant