CN119206106A

CN119206106A - A three-dimensional mapping method and system

Info

Publication number: CN119206106A
Application number: CN202411236157.4A
Authority: CN
Inventors: 程曦娜; 王嘉良; 王搏; 焦健斌; 陈维婕; 焦李成
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2024-09-04
Filing date: 2024-09-04
Publication date: 2024-12-27

Abstract

The invention relates to the technical field of synchronous positioning and mapping, in particular to a three-dimensional mapping method and system. The method comprises the steps of extracting feature points of an obtained target map image by using an ORB-SLAM system, carrying out three-dimensional visual reconstruction based on the feature points to generate an initial sparse point cloud map, fusing sparse point cloud data of the target map image obtained by a plurality of sensors to obtain dense point cloud data, processing the obtained dense point cloud data by using a point cloud segmentation algorithm and a target detection algorithm to obtain target information in the dense point cloud data, and carrying out space segmentation and organization on the dense point cloud data by using a tree data structure algorithm according to the target information in the dense point cloud data to obtain a point cloud map with a hierarchical structure. The map data management method is accurate in map data and can efficiently manage the map data.

Description

Three-dimensional mapping method and system

Technical Field

The invention relates to the technical field of synchronous positioning and mapping, in particular to a three-dimensional mapping method and system.

Background

In recent years, a visual SLAM system has been a focus of attention in various fields as an important autonomous navigation technique. ORB-SLAM is used as a visual SLAM system based on characteristic points, and the design and operation principle of the ORB-SLAM system become milestone achievement in the fields of visual navigation and environment map construction. The ORB-SLAM describes feature points in a video sequence captured by a camera sensor by analyzing and extracting the feature points and using ORB feature descriptors. The uniqueness of the feature descriptor is that the feature descriptor combines a FAST key point detector and a BRIEF descriptor, so that the feature descriptor has the advantages of quick calculation, low memory consumption, strong robustness and the like. In addition, the directivity design of the ORB descriptor enables the ORB descriptor to be effectively adapted to image rotation and affine transformation, and accuracy and stability of matching are improved.

In the aspect of real-time positioning and map construction, the ORB-SLAM system realizes the efficient real-time positioning and map construction function by utilizing the video sequence collected by the camera and extracting and matching the characteristic points. The algorithm design of the system enables the system to quickly respond in a continuously changing environment, and can effectively track the position of the camera and construct an accurate environment map. The functions have important significance for application in the fields of unmanned systems, autonomous navigation robots, virtual reality and the like, and powerful support is provided for development of the fields.

However, the current ORB-SLAM system has the defects of insufficient processing of large-scale environment, incapability of improving positioning accuracy, insufficient effective management of point cloud map data and the like in the aspect of map construction.

Aiming at the challenges of large-scale environment, the conventional ORB-SLAM system and the like can face the problems of high consumption of computing resources, low processing speed and the like when processing a large amount of point cloud data. This makes it difficult for the system to achieve fast and accurate localization and mapping in a complex, large-scale environment. In addition, in a dynamic environment, real-time updating and management of point cloud data is also a challenge, because changes in the environment may cause inaccuracy or obsolete of map data, affecting the positioning accuracy and stability of the system.

Another challenge is the improvement of positioning accuracy. The existing vision SLAM system may be affected by factors such as illumination change, shielding object or sensor error in some cases, so that the positioning accuracy is reduced. Especially in complex indoor or urban environments, positioning errors can accumulate and affect the stability and accuracy of the system.

Furthermore, it is also an important issue to efficiently manage point cloud map data. Large-scale point cloud data requires efficient storage, indexing, and retrieval mechanisms in order for the system to be able to quickly access and process such data. However, conventional data structures and algorithms may not be adequate to handle such scale data, which may lead to problems of wasted storage, inefficient queries, etc.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a three-dimensional mapping method and a system for solving at least one problem that the accuracy of map data is not high and the management of the map data cannot be realized efficiently aiming at the defects in the prior art.

The invention aims at realizing the following technical scheme:

A three-dimensional mapping method, comprising:

Extracting feature points of the obtained target map image by using an ORB-SLAM system, and carrying out three-dimensional visual reconstruction based on the feature points to generate an initial sparse point cloud map;

fusing sparse point cloud data of the target map image acquired by the sensor equipment to obtain dense point cloud data, wherein the sparse point cloud data is obtained based on a sparse point cloud map;

Processing the obtained dense point cloud data by utilizing a point cloud segmentation algorithm and a target detection algorithm to obtain target information in the dense point cloud data;

and carrying out space segmentation and organization on the dense point cloud data according to the target information in the dense point cloud data by utilizing a tree data structure algorithm to obtain a point cloud map with a hierarchical structure.

As a further improvement of the invention, the ORB-SLAM system is utilized to extract the characteristic points of the acquired target map image, and the three-dimensional visual reconstruction is carried out based on the characteristic points, so as to generate an initial sparse point cloud map, which specifically comprises the following steps:

aligning and matching target map images with different scales and different visual angles by using an image pyramid;

Splitting each layer of image into grids with different sizes, extracting FAST corner points in each grid, and continuously iterating and filtering the corner points;

Calculating the rotation angle of the corner points by using a gray centroid method, and calculating BRIEF descriptors for each filtered FAST corner point by combining the rotation angle to finish the extraction of ORB features and obtain feature points;

And performing visual reconstruction based on the feature points to generate a sparse point cloud map.

As a further improvement of the present invention, the visual reconstruction is performed based on the feature points, and a sparse point cloud map is generated, which specifically includes:

Initializing the position of the sensor assembly by utilizing the matching information of the characteristic points, and estimating the motion trail of the sensor assembly;

By matching the characteristic points and estimating the motion trail of the sensor equipment, the ORB-SLAM system starts to perform three-dimensional reconstruction, and deduces the three-dimensional position of point clouds in the scene according to the motion of the sensor equipment and the geometric information of the scene, wherein the point clouds are assembled to form a sparse point cloud map.

As a further improvement of the present invention, the fusing of sparse point cloud data of target map images acquired by a plurality of sensors to obtain dense point cloud data specifically includes:

Based on the set distance constraint, utilizing the Hamming distance between the characteristic points to construct precision constraint matching quality score, and based on the quality score, eliminating a large number of characteristic point pairs which are mismatched between the point cloud map frames;

Acquiring an RGB image and a depth map of a sparse point cloud map by using a sensor device, matching the RGB image with the depth map, estimating the pose of the sensor device according to the front end, transmitting pose information and depth information into a dense point cloud construction thread when a key frame is newly added, calculating the space coordinates of all pixels on a target map image, and generating a dense point cloud map;

And coloring the point cloud information by utilizing RGB image information, continuously performing point cloud splicing and global optimization with the generated point cloud map along with the continuous increase of key frames, and finally obtaining a color point cloud map consisting of discrete points to realize sparse-to-dense transition.

The invention further improves the method for obtaining dense point cloud data by utilizing a point cloud segmentation algorithm and a target detection algorithm to obtain target information in the dense point cloud data, and specifically comprises the steps that the point cloud segmentation algorithm and the target detection algorithm adopt a DeeplabV3 +neural network algorithm, the DeeplabV3 +neural network algorithm comprises a downsampling module and an upsampling module, and the specific steps are as follows:

Performing region segmentation on an indoor scene image in the dense point cloud map image by using DeeplabV & lt3+ & gt neural network algorithm to obtain a scene segmentation model;

The downsampling module is used for extracting features of the dense point cloud map to obtain a feature map; the up-sampling module is used for restoring the size of the feature map;

Dividing the scene image into image frames by adopting the scene division model to obtain a semantic division result image with the same size as the original scene image;

Filtering, segmenting and clustering a dense point cloud map, decomposing the dense point cloud map into independent point clouds, back projecting the point clouds to an RGB image plane, calculating 2D rectangular frames of projection points corresponding to the point clouds, calculating similarity with the 2D rectangular frames of target detection, matching each detection frame with one point cloud, further calculating 3D information of the point clouds, and constructing a 3D target database by combining a 2D target detection result and the 3D information of the point clouds.

The invention further improves the method, the downsampling module is used for extracting the characteristics of the dense point cloud map to obtain a characteristic map, and the upsampling module is used for recovering the size of the characteristic map, and the method specifically comprises the following steps:

the downsampling module performs four downsampling on the dense point cloud map image, and the obtained preliminary effective feature layer is input into the parallel cavity convolution layer;

in each branch of ASPP, adopting hole convolution with different sampling rates to further extract features;

combining the feature extraction results of all the branches of the ASPP to obtain a combined feature layer;

Carrying out 1x1 point-by-point convolution on the combined feature layers to obtain a green feature map, and completing the primary extraction of high-resolution features of the scene map;

The up-sampling module stacks the initial effective feature layer compressed twice by using 1x1 point-by-point convolution to adjust the channel number, and stacks the adjusted effective feature layer and the green feature map in the down-sampling module;

carrying out two-time depth separable convolution on the stacked feature layers to obtain a final effective feature layer, and carrying out channel adjustment on the final effective feature layer by utilizing point-by-point convolution with a set size to adjust the final effective feature layer into the total number of categories;

And outputting a predicted feature image according to the final effective feature layer, and then carrying out picture size adjustment on the output predicted feature image to enable the size of the output predicted feature image to be aligned with the size of the acquired target map image.

As a further development of the invention, the tree data structure algorithm comprises an octree data structure, an R-tree data structure.

As a further improvement of the present invention, the spatial segmentation and organization of the dense point cloud data using a tree data structure algorithm, to obtain a point cloud map having a hierarchical structure, specifically includes:

after a 3D target database and a dense point cloud Map are obtained, a Octo-Map three-dimensional occupancy grid probability Map construction task is added into a local Map thread, and the point cloud data are subjected to space segmentation and organization by adopting an octree data structure algorithm;

The three-dimensional space is divided into eight equal-sized cube nodes, the eight equal-sized cube nodes are distributed into corresponding cube nodes according to the distribution of the point cloud data, an octree map with a hierarchical structure is formed, and nodes of leaves in the octree map are determined by resolution parameters.

As a further improvement of the invention, after the octree map with the hierarchical structure is formed, the method also comprises the step of adjusting the depth and the precision of the nodes in the octree map, wherein the adjustment mode comprises static adjustment and dynamic adjustment, and specifically comprises the following steps:

The static adjustment mode is that for the point cloud dense area, the current node is further subdivided into smaller sub-nodes, and the depth of the node is increased;

the dynamic adjustment mode is that according to the statistical characteristics of the point cloud and the number or density of points in the nodes, the adjacent node information determines a dynamic adjustment strategy, and the adjustment strategy comprises setting threshold adjustment, self-adaptive strategy based on map distribution or a method based on machine learning to feed back and adjust the node depth.

In a second aspect, the present invention further provides a three-dimensional mapping system, configured to implement the three-dimensional mapping method, including:

The sparse point cloud map generation module is used for extracting characteristic points of the obtained target map image by using the ORB-SLAM system, and carrying out three-dimensional visual reconstruction based on the characteristic points to generate an initial sparse point cloud map;

the dense point cloud generation module is used for fusing sparse point cloud data of the target map images acquired by the plurality of sensors to obtain dense point cloud data, and the sparse point cloud data is obtained based on a sparse point cloud map;

The point cloud data processing module is used for processing the obtained dense point cloud data by utilizing a point cloud segmentation algorithm and a target detection algorithm to obtain target information in the dense point cloud data;

And the three-dimensional map generation module is used for carrying out space segmentation and organization on the dense point cloud data according to the target information in the dense point cloud data by utilizing a tree data structure algorithm to obtain a point cloud map with a three-dimensional hierarchical structure.

The three-dimensional map construction method has the advantages that by fusing multi-sensor data and an optimization algorithm, a more accurate and complete point cloud map can be generated, the ORB-SLAM system can be used for estimating the initial pose of a camera and initializing the positions of map points, so that a sparse point cloud map is constructed, the ORB-SLAM system has real-time performance, a dense point cloud map can be generated in a short time, and the requirement of real-time application is met. The method realizes the deep analysis, target extraction and map construction of the point cloud data, so that the robot can realize efficient navigation and positioning in a complex environment, and a reliable basis is provided for autonomous movement of the robot. The ORB-SLAM system is utilized to reconstruct three-dimensional vision and generate a point cloud map, which is a complex and efficient process, combines the advantages of various technologies and algorithms, and can generate a precise, complete and easy-to-analyze three-dimensional map.

Further, by setting the precision constraint of feature point matching, unstable matching feature point pairs between different frames are screened out under the condition that the angle of a camera is changed greatly. The method can be better suitable for feature point matching under different angles, reduces the risk of mismatching, and improves the robustness of the algorithm.

Further, the newly added dense point cloud construction thread provided by the invention utilizes the RGB-D camera to carry out dense reconstruction, thereby enhancing the environment sensing capability of the system, improving the positioning and mapping precision, enriching the visual information, enhancing the instantaneity and the efficiency, better integrating the multi-sensor information and comprehensively optimizing the performance of the SLAM system.

Further, the point cloud segmentation and 3D target information acquisition method provided by the invention utilizes a point cloud segmentation algorithm and a target detection technology to accurately identify and extract information of different targets in dense point cloud, and the extracted 3D target information is helpful for understanding three-dimensional space positions and characteristics of various objects in a scene by a system, and is helpful for understanding and explaining the scene more accurately by the system, so that positions and characteristics of different targets in the environment can be known more accurately, and decisions and paths can be made more effectively.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of an implementation of three-dimensional mapping in the present invention;

FIG. 2 is a flow chart of dense point cloud map construction in the present invention;

FIG. 3 (a) is an original image in a sparse point cloud map constructed in the present invention;

FIG. 3 (b) is a point cloud image in a sparse point cloud map constructed in the present invention;

FIG. 4 is a graph of a matching accuracy model in the present invention;

FIG. 5 (a) is an exemplary diagram of a dense point cloud map employed in the present invention;

FIG. 5 (b) is a dense point cloud map of an example map of a dense point cloud map constructed in the present invention;

FIG. 6 is a diagram of a Mask RCNN target detection network in accordance with the present invention;

FIG. 7 is a point cloud segmentation diagram in accordance with the present invention;

FIG. 8 is a point cloud 3D object information acquisition diagram in the present invention;

FIG. 9 is an octree map constructed in the present invention;

FIG. 10 (a) is an exemplary diagram of DeeplabV3+ neural network model inputs in the present invention;

FIG. 10 (b) is a semantic segmentation result of an exemplary graph of DeeplabV3+ neural network model outputs in the present invention;

fig. 11 is a schematic structural diagram of DeeplabV3+ network in the present invention.

Detailed Description

In order to make the purpose and technical scheme of the invention clearer and easier to understand. The present invention will now be described in further detail with reference to the drawings and examples, which are given for the purpose of illustration only and are not intended to limit the invention thereto.

Interpretation of related terms:

SLAM (Simultaneous Localization AND MAPPING) is a synchronous positioning and map building algorithm;

ORB-SLAM, a three-dimensional positioning and map construction algorithm based on ORB characteristics;

ORB (Oriented FAST and Rotated BRIEF) an algorithm for extracting and describing FAST feature points, combining a FAST (corner detection) feature point detection algorithm and a BRIEF (feature description operator) feature description algorithm;

ASPP: spatial pyramid pooling structure;

atrous Convolution, cavity convolution;

The invention provides a three-dimensional graph construction method and a system, wherein the graph construction method mainly comprises the following steps:

Fusing sparse point cloud data of target map images acquired by a plurality of sensors to obtain dense point cloud data, wherein the sparse point cloud data is obtained based on a sparse point cloud map;

And carrying out space segmentation and organization on the dense point cloud data according to the target information in the dense point cloud data by utilizing a tree data structure algorithm to obtain a point cloud map with a three-dimensional hierarchical structure.

Based on the mapping method, the accuracy of map data is improved, and the point cloud map data can be effectively managed.

The following description of the embodiments of the present invention will be made more apparent and fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown.

Example 1

As shown in fig. 1 to 11, the present embodiment provides a three-dimensional mapping method, which mainly comprises the following principles:

and (3) extracting characteristic points, performing visual reconstruction and other technologies by using an ORB-SLAM system, and generating an initial sparse point cloud map. Then, on the basis of generating a sparse point cloud map, a multi-sensor fusion mode is adopted. By combining depth information from the lidar, RGB-D camera, etc. sensors, more accurate and rich point cloud data is obtained by fusing the data of these different sensors. And then, improving a characteristic point matching and three-dimensional reconstruction algorithm in the ORB-SLAM system to expand original sparse characteristic points and optimize a three-dimensional reconstruction process, so as to generate richer and denser point cloud data, and the finally generated point cloud map has higher density and accuracy.

After the dense point cloud map is obtained, point cloud segmentation and 3D target information acquisition are performed first. This step involves processing the dense point cloud data, and using point cloud segmentation algorithms and object detection techniques, identifying and extracting different target information in the point cloud, such as objects, dynamic elements in the scene, etc. Through the algorithm and the technology, the point cloud data can be segmented, and target information in each area, such as movable objects of chairs, cups, people and the like, can be obtained. Then, after the target information is obtained, the octree data structure is adopted to carry out space division and organization on the point cloud data. This process divides the three-dimensional space into eight equal-sized cube nodes and distributes the same into the corresponding cube nodes according to the distribution of the point cloud data, forming an octree map with a hierarchical structure. The octree map has the capability of dynamically adjusting the node depth and the node precision, and can adaptively adjust the hierarchical structure and the fineness of the nodes according to the change of the point cloud density and the spatial distribution so as to meet the requirements of different resolutions and spatial scales. The real-time performance and the accuracy of the map are guaranteed, and efficient environment sensing and positioning capability is provided for autonomous navigation of the robot.

The specific implementation manner of the three-dimensional mapping method in this embodiment is as follows:

And S1, extracting characteristic points of the acquired target map image by using an ORB-SLAM system, and carrying out three-dimensional visual reconstruction based on the characteristic points to generate an initial sparse point cloud map. In the S1 step, the ORB-SLAM system is utilized to process the input image data, firstly FAST corner points affected by each layer of the image are extracted and filtered continuously, then BRIEF descriptors corresponding to the corner points are calculated, the ORB feature descriptors are used to extract key points in the image, and map construction and positioning are carried out according to the feature points. The specific implementation is as follows:

And performing visual reconstruction based on the feature points to generate a sparse point cloud map. The ORB-SLAM system starts to reconstruct three-dimensionally by matching the characteristic points and estimating the motion track of the camera, deduces the three-dimensional position of point clouds in the scene according to the motion of the camera and the geometric information of the scene, and the set of the point clouds forms a sparse point cloud map.

Specifically, firstly, images with different scales and visual angles are aligned and matched by utilizing an image pyramid, then, images of each layer are split into grids with different sizes, FAST corner points are extracted in each grid, and finally, the FAST corner points are filtered continuously and iteratively by using a non-maximum suppression method.

The non-maximum value suppression method firstly calculates the sum of the gray scale difference absolute values of the central pixel and the surrounding pixels, and then utilizes the self-set threshold value to limit the sum of the absolute values so as to achieve the purpose of screening corner points.

Calculating the sum of gray scale difference absolute values of a center point pixel p and surrounding 16 point pixels, and c:

Wherein I _p is the gray level of the center point pixel p, and I _i is the gray level of the pixels around p.

Secondly, calculating the rotation angle of the FAST corner point by using a gray centroid method, and obtaining a BRIEF descriptor by using the angle and the filtered corner point:

Firstly, selecting a gray centroid of a pixel in a circular area, and connecting the centroid with a circle center to form a vector, wherein the angle of the vector is the angle of a corner point. The radius of the circle is taken to be 15 because the entire patch is typically taken to be 31×31. The gray centroid (cx, cy) is then:

where I (x, y) is the gray value of the pixel with coordinates (x, y).

Then the angle of the corner point is:

θ=arctan2(c_y,c_x)

So far, not only the FAST corner point is extracted, but also the angle of the corner point is found out. This angle can be used to guide the extraction of descriptors, ensuring that descriptors are calculated in the same direction each time, achieving angle invariance, and thus completing the extraction of ORB features.

Based on the feature points and the camera view angles extracted by the ORB-SLAM system, three-dimensional visual reconstruction is carried out by utilizing the acquired spatial position information of the environmental key points, so that an initial sparse point cloud map is generated. The specific implementation is as follows:

initializing the camera position and scene geometry using the matching information of the feature points, the ORB-SLAM system attempts to initialize the camera position and estimate the pose of the camera.

And solving pose transformation of the camera in motion through the relation between the two frames of images. Matching corresponding points among frames is carried out through ORB features, the obtained matching points have depth information, and pose changes in motion are calculated through the depth information of the matching points. Assume that there is a set of well-matched 3D points:

P=p₁,p₂,...,p_n

P'=p₁',p'₂,...,p'_n

for any matching point, there is the same Euclidean transformation:

Where R is the rotational transform matrix between frames and t is the translational transform matrix between frames. And solving the above formula by utilizing an ICP (ITERATIVE CLOTEST POINT) algorithm, and iterating to obtain the motion transformation pose of the camera.

And (3) three-dimensional reconstruction and point cloud generation, namely, by matching the characteristic points and estimating the camera motion, the ORB-SLAM system starts to build a map, and deduces the three-dimensional position of the point in the scene according to the camera motion and the geometric information of the scene. The set of points constitutes an initial sparse point cloud map, as shown in fig. 3 (a), 3 (b).

SLAM mapping is divided into keyframe insertion, map point cloud screening, new map point cloud creation, local constraint adjustment, and local keyframe screening. Specifically:

Key frame insertion, namely detecting whether a key frame exists in the sequence, if so, calculating a word bag descriptor of a new key frame, and inserting the word bag descriptor into a map.

And screening the map point cloud, namely when the map points appear in enough key frames, considering the map points as stable points, and otherwise, eliminating the map points.

And creating a new map point cloud, namely calculating the position of the new point cloud in the map according to the triangulated ORB feature vector, and updating the map.

Local constraint adjustment Bundle Adjustment the pose of the current key frame is optimized using the local constraint adjustment.

Local keyframe screening to reduce the number of keyframes, if the current keyframe can be seen by the other three keyframes and 90% of the point clouds are the same, then this keyframe is deleted.

After generating the initial sparse point cloud map, map optimization is typically required, and this embodiment uses the incremental update strategy of SLAM to gradually update the map and the robot's position by continually processing new sensor data and based on these data. This strategy allows the SLAM system to continuously improve map accuracy and positioning accuracy during operation without having to reprocess all of the previous data. Incremental updating improves the consistency and accuracy of the map.

And S2, fusing sparse point cloud data of the target map image acquired by the sensor equipment to obtain dense point cloud data. Aiming at the sparse point cloud map, the feature point matching and three-dimensional reconstruction algorithm in the ORB-SLAM system is improved and optimized, the original sparse feature points are expanded, and the reconstruction process of the point cloud data is optimized, so that richer and denser point cloud data are generated. Based on the set distance constraint, utilizing the Hamming distance between the feature points to construct precision constraint matching quality score, and based on the quality score, eliminating a large number of feature point pairs which are mismatched between the point cloud map frames;

Acquiring an RGB image and a depth map of a sparse point cloud map by using sensor equipment, matching the RGB image with the depth map, estimating the pose of a camera according to the front end, transmitting pose information and depth information into a dense point cloud construction thread when a key frame is newly added, calculating the space coordinates of all pixels on a target map image, and generating a dense point cloud map;

The sensor combination device comprises an RGB-D camera, a monocular camera, an IMU, a laser radar, a camera and the like, wherein a monocular camera and an Inertial Measurement Unit (IMU) are combined to generate a dense point cloud and position by a Visual Inertial Odometer (VIO) method. The laser radar and the camera are combined with the advantages of the laser radar and the camera, the laser radar is used for providing high-precision distance information, the camera is used for providing rich texture information, and more accurate three-dimensional point cloud data are generated.

The present embodiment employs an RGB-D camera. The specific implementation of the steps is as follows:

S2.1, the problem that the corresponding characteristics of two adjacent frames are not obvious due to large camera angle change is solved by utilizing the matching precision constraint (shown in fig. 4). Based on the distance constraint, the Hamming distance between the feature points is utilized to construct an accuracy constraint matching quality score, and based on the quality score, a large number of feature point pairs which are mismatched between frames are eliminated. The method specifically comprises the following steps:

S2.1.1 in the image feature point matching stage, the matching point pair with the maximum hamming distance far greater than the next-nearest hamming distance is set as reliable matching. In this embodiment, the nearest hamming distance d ₁ and the next nearest hamming distance d ₂ between each pair of feature points and other feature points are calculated, and the ratio between them is recorded as λ, namely:

λ=d₁/d₂

the smaller λ represents that the nearest hamming distance of the pair of matching points is much smaller than the next nearest hamming distance, the more reliable the matching;

S2.1.2, judging the reliability of the feature point matching points by setting up a precision constraint factor alpha, and if the ratio of the Hamming distances meets lambda < alpha, considering the currently judged feature point matching point pairs as reliable matching. Introducing the concept of matching quality, if the matching quality of a pair of feature points is higher, the probability of representing the correct matching is larger, so that an accuracy constraint score q _a can be defined to quantify the matching quality:

q_a＝λd₁

Wherein q _a represents the accuracy constraint matching quality score, the smaller the accuracy constraint score is, the higher the representative matching quality is, and the feature matching conceptual diagram based on the distance constraint is shown in fig. 3. Compared with the point pairs with low matching quality, the high-quality matching point pairs can provide more robust inter-frame relation for the SLAM system, and are more beneficial to pose estimation of a camera.

S2.2, ORB-SLAM can only generate sparse point cloud map, so in this embodiment, add dense point cloud to construct the thread on the original algorithm frame, utilize RGB-D camera to carry on dense reconstruction. Namely, adding 1 dense point cloud construction thread, firstly matching an RGB image with a depth map, then estimating the pose of a camera according to the front end, transmitting pose information and depth information into the dense point cloud construction thread when a key frame is newly added, calculating the space coordinates of all pixels on the image, and generating a point cloud map.

S2.3, coloring the point cloud by utilizing RGB image information, and continuously performing point cloud splicing and global optimization with the generated map along with the continuous increase of key frames to finally obtain 1 color point cloud map (shown in fig. 5 (a) and 5 (b)) composed of discrete points, thereby realizing sparse-to-dense transition.

And S3, after the dense point cloud map is obtained, point cloud segmentation and 3D target information acquisition are performed first. And processing the obtained dense point cloud data by using a point cloud segmentation algorithm and a target detection algorithm to obtain target information in the dense point cloud data. The method comprises the steps of adopting a DeeplabV3 +neural network algorithm for a point cloud segmentation algorithm and a target detection algorithm, adopting a DeeplabV3 +neural network algorithm comprising a downsampling module and an upsampling module, using the DeeplabV3 +neural network algorithm to segment an indoor scene image in a dense point cloud map image to obtain a scene segmentation model, adopting the downsampling module to extract features of the dense point cloud map to obtain a feature map, adopting the upsampling module to restore the size of the feature map, and adopting the scene segmentation model to segment an image frame of the scene image to obtain a semantic segmentation result image consistent with the size of an original scene image. The goal of this step is to identify and extract different target information in the point cloud, such as objects, dynamic elements in the scene, etc. These algorithms and techniques enable segmentation of point cloud data to obtain target information within various regions, such as movable objects of chairs, cups, people, etc.

Filtering, segmenting and clustering a dense point cloud map, decomposing the dense point cloud map into independent point clouds, back projecting the point clouds to an RGB image plane, calculating 2D rectangular frames of projection points corresponding to the point clouds, calculating similarity with the 2D rectangular frames of target detection, matching each detection frame with one point cloud, further calculating 3D information of the point clouds, and constructing a 3D target database by combining a 2D target detection result and the 3D information of the point clouds. The specific steps are as follows:

Firstly, using DeeplabV3 +neural network to segment the indoor scene image to obtain a scene segmentation model, and specifically realizing the following steps:

As shown in fig. 11, the DeeplabV3+ neural network includes a downsampling module Encoder and an upsampling module Decoder, wherein Encoder is responsible for feature extraction, the Decoder is responsible for restoring the feature map size by interpolation and transposed convolution, and the two modules, encoder and Decoder, are structurally symmetrical, and obtain high-resolution features by Encoder-Decoder structure, which is specifically implemented as follows:

s3.1, a down-sampling module Encoder performs feature extraction;

specifically, the downsampling module Encoder uses parallel hole convolutions Atrous Convolution on the preliminary valid feature layers compressed four times to expand the receptive field and capture the environmental information of the multi-scale picture:

Performing feature extraction by using Atrous spatial pyramid pooling structures ASPP with different sampling rates respectively, and performing concat merging on feature extraction results to obtain a merged feature layer;

and carrying out 1x1 point-by-point convolution on the combined feature layers to obtain a green feature map, and completing the primary extraction of high-resolution features of the scene map.

S3.2, the up-sampling module Decoder recovers the size of the feature map:

The up-sampling module Decoder adjusts the channel number of the primary effective feature layer compressed twice by utilizing 1x1 point-by-point convolution, and stacks the adjusted effective feature layer and the effective feature layer processed by ASPP in Encoder;

the stacked feature layers are subjected to two-time depth separable convolution to obtain a final effective feature layer, and the final effective feature layer is subjected to channel adjustment by using a 1x1 point-by-point convolution and is adjusted to be the total number of categories.

Among these, the depth separable convolution is largely divided into two processes, a channel-by-channel convolution DEPTHWISE CONVOLUTION and a point-by-point convolution Pointwise Convolution. The common neural network uses the channel-by-channel convolution, but the number of the feature images after the channel-by-channel convolution is the same as the number of channels of the input layer, the subsequent expansion of the feature images cannot be performed by expansion, and the operation independently performs convolution operation on each channel of the input layer, so that the feature information of different channels on the same spatial position is not effectively utilized, and therefore, the feature images need to be recombined by the point-by-point convolution to obtain more feature information. Fig. 6 shows three convolution forms of the depth separable convolution, where (a) in fig. 6 is an existing channel-by-channel convolution, (b) is an existing point-by-point convolution, and (c) is a channel-by-channel convolution of the fusion hole convolution of the present invention.

And after the adjusted effective feature layer outputs the predicted feature image, adjusting the size of the output predicted image by using the image to restore the original image size.

S3.3, dividing a picture frame based on a scene division model of DeeplabV & lt3+ & gt neural network:

And performing model iterative training on the scene picture data set by using the network until a scene segmentation model is obtained, and then performing segmentation prediction on the scene picture by using the segmentation model to output a semantic segmentation result graph with the same size as the original scene graph, as shown in fig. 10 (a) and 10 (b).

S3.4, filtering, segmenting and clustering point clouds, decomposing the point clouds into independent point clouds, back projecting the point clouds to an RGB image plane, calculating 2D rectangular frames of projection points corresponding to the point clouds, calculating similarity (IOU cross-correlation) with the 2D rectangular frames of target detection, matching each detection frame with one point cloud, and further calculating 3D information of the point cloud, wherein the specific method is shown in FIG. 7. Finally, a 3D target database is constructed by combining the 2D target detection result and the 3D information of the point cloud cluster, as shown in fig. 8.

And S4, performing space segmentation and organization on the dense point cloud data according to the target information in the dense point cloud data by utilizing a tree data structure algorithm to obtain a point cloud map with a three-dimensional hierarchical structure. The tree data structure algorithm can adopt any one of an octree data structure, an R tree data structure and a KD tree data structure.

KD-trees (k-dimensional trees) are employed to manage and organize three-dimensional point cloud data. The KD-tree enables efficient nearest neighbor searches and point cloud data queries by recursively partitioning k-dimensional space.

The R tree is divided by the layering rectangular area, so that three-dimensional point cloud data can be efficiently managed and queried.

In the embodiment, the octree data structure is adopted to carry out space division and management on the dense point cloud, so that efficient storage and query are realized.

The three-dimensional space is divided into eight equal-sized cube nodes, and the eight equal-sized cube nodes are distributed into the corresponding cube nodes according to the distribution of the point cloud data to form an octree map with a hierarchical structure, as shown in fig. 9. Wherein the nodes of the leaves in the octree map are determined by the resolution parameters. The specific implementation is as follows:

Storage of Octree map: octree is a hierarchical data structure. Each Node in Octree represents a cube, also called a voxel, in space. This cube will be recursively divided into eight blocks until each block reaches a minimum size.

In addition, the update strategy of the octree map is:

Octo-Map stores the Map in an octree structure, the minimum leaf node size is determined by the resolution parameter, and because of the existence of dynamic objects in the environment, the same space is occupied sometimes, and the space becomes idle after a while, so the occupancy probability of one leaf should be determined by the joint probability of multiple observations, and the following logic transformation is performed on the joint probability in this embodiment, so that a simplified mode of updating the probability p can be obtained:

α=log(p/1-p)

the inverse of the above-described logic transformation is in the form of a sigmoid function, as follows:

p=1/(1+exp(-α))

a simplified way of updating the probability is as follows:

L(n|Z_1:T)＝L(n|Z_1:T-1)+L(n|Z_T)

wherein L (n|Z1: T-1) is the alpha value of the previous lattice, the current comprehensive alpha value can be obtained by adding the last alpha value L (n|ZT), and the occupancy probability value of the lattice can be obtained by the inverse transformation of the joint probability, wherein L (n|ZT) is determined by the following formula:

Wherein locc and lfree are predefined values, locc =0.85, lfree= -0.4 corresponds to occupancy probability values of 0.7 and 0.4, respectively, and 0 corresponds to probability value of 0.5, if node n is frequently observed, the occupancy probability will increase, and the occupancy probability of points (point cloud segmentation to acquire ground and dynamic points) that are not observed and are not under consideration will be relatively reduced.

After forming the octree map with the hierarchical structure, the method further comprises the step of adjusting the depth and the precision of the nodes in the octree map, wherein the adjustment mode comprises static adjustment and dynamic adjustment, and specifically comprises the following steps:

the static adjustment mode is that for a point cloud dense area, the current node is further subdivided into smaller sub-nodes, the depth of the node is increased, and for a point cloud sparse area, the current node and the sub-nodes are combined, and the depth of the node is reduced.

In particular, for dense areas, the system may observe that more point cloud data falls within the same area. In this case, it may be considered to increase the depth of the node. Increasing the depth of a node means that the current node is further subdivided into smaller child nodes to allow more detailed information to be stored in this area. For sparse regions, there may be little point cloud data present in some regions. In this case, in order to reduce the complexity of the map, the system may consider reducing the depth of the nodes. Reducing the depth of a node means merging the current node with its children to reduce the details and complexity of the map.

Specifically, the dynamic adjustment policy is defined according to the statistical characteristics of the point cloud, the number or density of points in the node, the adjacent node information, and the like. The dynamic adjustment policy may include threshold settings, adaptive policies based on map distribution, or a machine learning based approach to automatically adjust node depth. The following is specific to each dynamic adjustment strategy:

Threshold setting the system adjusts based on a preset threshold. These thresholds are based on the indices of point cloud density, the number of points within a node, the change in spatial region, etc. For example, the depth of a node may be selected to be increased when the number of points within a node exceeds a certain threshold, whereas the depth of a node may be considered to be decreased if the number of points within a node is below a threshold.

The system can adaptively adjust the node depth by analyzing and monitoring the distribution of the map in real time, wherein the system comprises statistics and analysis of the point cloud density of the whole map or a specific area, and then dynamically adjusts the node depth according to the analysis result. For example, the division and merging of nodes is decided based on a certain density index.

Feedback control and real-time adjustment, another method is to realize a feedback control system and adjust the depth and the precision of the nodes in real time according to the change of the map. The performance of the actual map is monitored, information of node distribution and density is collected in real time, and then the depth and the precision of the nodes are adjusted according to the information, so that the optimal state of the map is maintained.

In addition, the update mechanism in this embodiment may further use an incremental update algorithm to gradually update the point cloud data in the map. By detecting the environmental change, only the changed part is updated, thereby improving the updating efficiency.

Or the update mechanism in the embodiment can also adopt a sliding window technology, and only the point cloud data in the current perception range is managed and updated. The sliding window can be dynamically adjusted according to environmental changes, so that the real-time updating is ensured, and meanwhile, the calculation complexity is reduced.

The final objective of the overall implementation is to provide efficient context awareness and localization capability for autonomous navigation of a robot. Through the steps, the depth analysis, target extraction and map construction of the point cloud data are realized, so that the robot can realize efficient navigation and positioning in a complex environment, and a reliable basis is provided for autonomous movement of the robot.

Example 2

The present embodiment provides a three-dimensional mapping system for implementing the three-dimensional mapping method described in embodiment 1, where the system includes:

Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.) or an embodiment combining hardware and software aspects that may be referred to herein collectively as a "circuit," module, "or" platform.

The above is only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited by this, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims

1. A three-dimensional mapping method, comprising:

2. The method for three-dimensional mapping according to claim 1, wherein the steps of extracting feature points of the obtained target map image by using the ORB-SLAM system, and performing three-dimensional visual reconstruction based on the feature points, and generating an initial sparse point cloud map include:

3. The three-dimensional mapping method according to claim 2, wherein the generating a sparse point cloud map based on the visual reconstruction of the feature points specifically comprises:

4. The three-dimensional mapping method according to claim 2, wherein the fusing sparse point cloud data of the target map images acquired by the plurality of sensors to obtain dense point cloud data specifically includes:

5. The three-dimensional mapping method according to claim 1, wherein the processing of the obtained dense point cloud data by using the point cloud segmentation algorithm and the target detection algorithm to obtain the target information in the dense point cloud data specifically comprises the steps that the point cloud segmentation algorithm and the target detection algorithm adopt DeeplabV < 3+ > neural network algorithm, the DeeplabV < 3+ > neural network algorithm comprises a downsampling module and an upsampling module, and the specific steps are as follows:

6. The method for three-dimensional mapping according to claim 5, wherein the downsampling module is configured to perform feature extraction on the dense point cloud map to obtain a feature map, and the upsampling module is configured to restore the size of the feature map, and specifically comprises:

7. The method of claim 1, wherein the tree data structure algorithm comprises an octree data structure, an R-tree data structure, and a KD-tree data structure.

8. The method for three-dimensional mapping according to claim 7, wherein the spatial segmentation and organization of the dense point cloud data using a tree data structure algorithm, to obtain a point cloud map with a hierarchical structure, specifically comprises:

9. The three-dimensional mapping method according to claim 8, wherein after the octree map with the hierarchical structure is formed, further comprising depth and precision adjustment of nodes in the octree map, wherein the adjustment mode comprises static adjustment and dynamic adjustment, and specifically comprises:

10. A three-dimensional mapping system for implementing the three-dimensional mapping method according to any one of claims 1 to 9, comprising: