CN120707642B

CN120707642B - Cloud-edge cooperative visual slam method and system based on feature point prediction

Info

Publication number: CN120707642B
Application number: CN202511194614.2A
Authority: CN
Inventors: 卢旭; 余富健; 刘军; 黄翰媛; 刘海浚
Original assignee: Guangdong Polytechnic Normal University
Current assignee: Guangdong Polytechnic Normal University
Priority date: 2025-08-26
Filing date: 2025-08-26
Publication date: 2025-11-14
Anticipated expiration: 2045-08-26
Also published as: CN120707642A

Abstract

This invention discloses a cloud-edge collaborative visual SLAM method and system based on feature point prediction. The method includes: adding several consecutive frames of images into a sliding window; extracting feature points from all pixels; establishing a quadtree as prior information for the corresponding frame images and merging them to establish global prior information; using the global prior information to predict the feature point distribution of the next frame image using a multi-step Markov probability model; adjusting the branches in the new quadtree; adding the next frame image into the sliding window, removing the earliest frame image and its corresponding prior information, and using the images within the sliding window and their corresponding prior information to predict the feature point distribution of subsequent frames. This invention considers the continuity of camera motion in space and the spatial relationship between feature points in consecutive frames. By constructing a multi-step Markov probability model, it reduces computational load while ensuring effective localization and mapping, significantly saving CPU computing resources.

Description

Cloud-edge cooperative visual slam method and system based on feature point prediction

Technical Field

The invention belongs to the technical field of robot vision, and particularly relates to a cloud edge cooperative vision slam method and system based on feature point prediction.

Background

With the popularization of intelligent terminal equipment and the rapid development of technologies such as mobile robots, unmanned aerial vehicles, augmented reality and the like, synchronous positioning and mapping (slam) technologies face new challenges and opportunities. Conventional slam systems typically rely on a single device to complete all computing tasks, which places extremely high demands on resource-constrained mobile devices. In order to realize stable and efficient slam functions under the conditions of limited computing power, storage space and battery endurance, a cloud edge cooperative slam architecture is generated, and the method becomes one of hot spot directions of current research.

Cloud edge cooperation slam can be used for unloading computation-intensive work to the cloud end through reasonable distribution of computation tasks, and meanwhile necessary real-time processing capacity is reserved on edge equipment, so that balance between system performance and resource consumption is achieved. The architecture is particularly suitable for miniaturized devices such as smart phones, unmanned aerial vehicles, service robots and the like which are provided with low-power-consumption processors. Although the computing capacity of the devices is limited, the limit of local hardware can be broken through by cloud edge cooperation, and the slam experience approaching to a high-performance computing platform is obtained.

In practical applications, low-power devices put various demands on cloud-edge collaborative slam architecture, including reducing the amount of computation and CPU burden as much as possible.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art, and provides a cloud edge collaborative visual slam method and a cloud edge collaborative visual slam system based on characteristic point prediction, which extract characteristic points based on quadtree and Markov properties, so that the calculated amount is reduced on the premise of ensuring the positioning and mapping effects.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

the invention provides a cloud edge collaborative visual slam method based on feature point prediction, which comprises the following steps of:

adding a plurality of initial continuous frame images into a sliding window;

extracting feature points of all pixels of a plurality of initial continuous frames of images;

establishing a quadtree for the feature points extracted from each frame;

The quadtree is saved as prior information of the corresponding frame image;

Combining prior information corresponding to a plurality of initial continuous frame images, and establishing a new quadtree as global prior information;

predicting the feature point distribution of the next frame of image by using the global priori information and adopting a multi-step Markov probability model;

the branches in the new quadtree are adjusted according to the actual characteristic point distribution of the next frame of image;

And adding the next frame image into the sliding window, removing the earliest frame image and the corresponding prior information, and predicting the characteristic point distribution of the next frame image by utilizing the image in the sliding window and the corresponding prior information.

As a preferred technical solution, the building a quadtree for the feature points extracted from each frame includes the following steps:

creating a root node representing the entire image area;

Starting from a root node, checking whether the current node is divided, and if the current node is not divided and the spatial density (N/Area) of feature points in the node is more than ρ1, dividing the current region into four sub-regions and distributing the feature points to corresponding sub-regions, wherein N is the number of the feature points of the current node, ρ1 is a set spatial density threshold, area= (x _max-x_min)×(y_max-y_min) is the Area of the current node, and x _max、x_min、y_max and y _min are the maximum and minimum values of the abscissa and the ordinate of the current region;

dividing by the node center (x _c,y_c):

;

Stopping dividing when the area reaches the minimum size limit s=W/(2 ^n')×H/(2^n'), n' E [6,8], wherein W and H are the width and height of the image frame, and Q ₁-Q₄ is four sub-areas obtained by dividing;

The process is recursively performed until feature points are assigned to the appropriate leaf nodes.

As a preferred technical scheme, the method establishes a new quadtree as global prior information, wherein the number of node characteristic pointsWherein w ₁,w₂,…,w_n is the weight of the images of different time frames, the subscript represents the sequence number, the weight value is determined by the time sequence of the image frames, the new image frames have larger weight, and vice versa, V ₁,V₂,…,V_n is the number of the characteristic points of the images of different time frames, and the subscript represents the sequence number;

and removing leaf nodes of the space density (N/Area) < ρ2 of the feature points, wherein N is the number of the feature points of the current node, ρ2 is a preset space density threshold, area= (x _max-x_min)×(y_max-y_min) is the Area of the current node, and x _max、x_min、y_max and y _min are the maximum and minimum values of the abscissa and the ordinate of the current region.

As an optimal technical scheme, the method for predicting the characteristic point distribution of the next frame of image by using the global prior information and adopting a multi-step Markov probability model specifically comprises the following steps:

Defining a multi-step Markov probability model P(X_n+1|X_n,X_n-1,...,X₀)=P(X_n+1|X_n,X_n-1,...,X_n-k+1),P(X_n+1|X_n,X_n-1,...,X₀) to represent the feature point distribution state X _n+1 of the current image, related to all distribution states X _n,X_n-1,...,X₀ preceding X _n+1, P (X _n+1|X_n,X_n-1,...,X_n-k+1) to limit X _n+1 to only a limited k states X _n,X_n-1,...,X_n-k+1 preceding X _n+1;

predicting the distribution state of image characteristic points of n+1 through X _n,X_n-1,...,X_n-k+1 :

;

Wherein E ∙ is the maximum likelihood.

As a preferred technical scheme, X _n+1 is related to the limited k=3 states X _n,X_n-1,...,X_n-2 before X _n+1 to obtain a three-order Markov probability model P(X_n+1|X_n,X_n-1,...,X₀)=P(X_n+1|X_n,X_n-1,...,X_n-2);

The state transition of the characteristic point distribution between the front image frame and the rear image frame in a short time satisfies the linear relation that X _n+1=aX_n+bX_n-1+cX_n-2+...+X₁ +epsilon, epsilon is Gaussian noise, and then:

;

wherein a, b and c are linear coefficients.

As a preferable technical solution, the following actual feature point distribution of the next frame image adjusts branches in the new quadtree, and specifically includes the following steps:

extracting feature points, namely extracting feature points from the corresponding images of the leaf nodes, counting the number of the feature points, and updating the leaf node data;

Dividing a current region into four sub-regions, distributing the feature points to the corresponding sub-regions, and performing recursion until the feature points are distributed to proper leaf nodes, wherein N is the number of the feature points of the current node, ρ1 is a set space density threshold, area= (x _max-x_min)×(y_max-y_min) is the Area of the current node, x _max、x_min、y_max and y _min are the maximum and minimum values of the abscissa and the ordinate of the current region, and when the region reaches the minimum size limit s=W/(2 ^n')×H/(2^n'), N' epsilon [6,8], dividing is stopped, and W and H are the width and height of an image frame;

If one father node P has only leaf nodes and the space density (N '/Area') < ρ3 of each leaf node characteristic point, then completing four leaf nodes of the father node P, wherein N 'is the number of leaf node characteristic points, area' is the Area of the leaf node, ρ3 is the set space density threshold of the leaf node;

Extracting characteristic points from the complemented leaf node area, if the number of the characteristic points of each leaf node of the complemented father node P does not reach a set threshold T2, enabling the father node P to be a leaf node and reconstructing the leaf node locally upwards;

The upward local reconstruction specifically comprises the step of judging whether a father node P1 of the P has only leaf nodes, if so, executing the local reconstruction on the P1.

As a preferable technical solution, the number of layers of the quadtree is set at the beginning of the step of adjusting branches in the new quadtree, and the corresponding leaf node whose number of leaf node characteristic points is smaller than the set threshold T2 is assigned to-1, which indicates the abandoned region, and the abandoned nodes do not play a role in merging.

The invention also provides a cloud edge collaborative visual slam system based on feature point prediction, which comprises an edge end, an edge server and a cloud server which are communicated with each other through a communication module;

the edge end adopts the cloud edge collaborative visual slam method based on the feature point prediction to obtain image feature point distribution, and constructs and stores a sub-map;

The edge server is used for restoring the data uploaded by the edge end according to the slam visual odometer to obtain a real-time path and roadside points, and carrying out optimization processing on the real-time path and roadside points through the slam rear end to obtain an edge map;

the cloud server is used for merging the edge maps uploaded by the edge servers as global maps, and carrying out optimization processing and storage on the global maps.

The invention further provides cloud edge cooperative visual slam equipment based on the characteristic point prediction, which comprises a memory and at least one processor, wherein the memory is stored with instructions, the memory is connected with the at least one processor through a line, and the at least one processor calls the instructions in the memory so that the cloud edge cooperative visual slam equipment based on the characteristic point prediction executes the cloud edge cooperative visual slam method based on the characteristic point prediction.

In another aspect of the present invention, a storage medium is further provided, where a program is stored, where the program, when executed by a processor, implements a cloud edge collaborative visual slam method based on feature point prediction as described above.

Compared with the prior art, the invention has the following advantages and beneficial effects:

(1) According to the invention, the continuity of the movement of the camera in space is considered, the space relation is provided for the characteristic points between the front frame and the back frame, and the calculation resource consumption of the CPU is greatly saved by constructing a multi-step Markov probability model and then an algorithm.

(2) The invention adapts to the requirements of low-power equipment in a cloud-edge cooperative mode, extracts and manages the characteristic points of the visual odometer on the premise of ensuring the robustness of the system, and solves the problem that the extraction of the characteristic points (exemplified by orb characteristic points) of the whole image in the prior art consumes resources and time very much.

Drawings

FIG. 1 is a flow chart of a cloud edge collaborative visual slam method based on feature point prediction according to an embodiment of the present invention;

FIG. 2 is a flow chart of the set-up of a quadtree according to an embodiment of the invention;

FIG. 3 is a schematic diagram of a corresponding relationship between a quadtree and an image according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of quadtree weighted merge of an embodiment of the present invention;

FIG. 5 is a schematic diagram of a priori quadtree dynamic adjustment of an embodiment of the present invention;

FIG. 6 is a schematic view of window sliding in accordance with an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a cloud edge cooperative visual slam system based on feature point prediction according to an embodiment of the present invention.

Detailed Description

In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present application with reference to the accompanying drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Example 1:

as shown in fig. 1, the embodiment provides a cloud edge cooperative visual slam method based on feature point prediction, which includes the following steps:

S1, inputting images, namely adding a plurality of initial continuous frame images into a sliding window.

S2, extracting global feature points of the image, namely extracting feature points of all pixels of a plurality of initial continuous frames of images.

And S3, building a quadtree, namely building quadtree management image feature points for the feature points extracted from each frame. The flow chart is shown in fig. 2, and the correspondence between the image and the quadtree is shown in fig. 3.

Quadtrees are a tree-like data structure for efficiently managing two-dimensional space. It recursively divides the region into four equal-sized sub-regions, with each node having a maximum of four sub-nodes. The quadtree is widely applied to image processing and can rapidly process space inquiry and collision detection. The hierarchical structure can optimize the storage and retrieval efficiency, and is particularly suitable for processing data with uneven distribution.

Further, step S3 includes the steps of:

s301, initializing a root node, namely creating the root node representing the whole image area.

S302, inserting feature points, namely checking whether the current node is divided from a root node, and dividing the current region into four sub-regions and distributing the feature points to corresponding sub-regions if the current node is not divided and the space density (unit Area point number) (N/Area) > ρ1 of the feature points in the node is not divided, wherein N is the number of the feature points of the current node, ρ1 is a set space density threshold value, area= (x _max-x_min)×(y_max-y_min) is the Area of the current node, and x _max、x_min、y_max and y _min are the maximum and minimum values of the abscissa and the ordinate of the current region.

Division rules (division bounded by node center (x _c,y_c):

;

The region reaches a minimum size limit s=w/(2 ^n')×H/(2^n'), the division is stopped when n' e [6,8], W and H are the width and height of the image frame, and Q ₁-Q₄ is the four sub-regions obtained by the division.

S303, recursion processing is performed until the feature points are allocated to the proper leaf nodes.

S4, storing the prior information, namely storing the image frames in the sliding window corresponding to the quadtree as the prior information.

S5, establishing a quadtree by using prior information, namely establishing a new quadtree corresponding to the quadtree by combining images in the sliding window as global prior information, wherein the node characteristic point number is calculated by the new quadtreeWherein w ₁,w₂,…,w_n is the weight of the images of different time frames, the subscript indicates the sequence number, the weight value is determined by the time sequence of the image frames, the new image frames have larger weight, and vice versa, V ₁,V₂,…,V_n is the number of the characteristic points of the images of different time frames, and the subscript indicates the sequence number. Removing leaf nodes of the space density (N/Area) < ρ2 of the feature points, wherein ρ2 is a preset space density threshold value, and is smaller than or equal to ρ1, and the specific value is obtained by adjusting according to scenes and experience. The merging process of the quadtree a and B into C is as shown in fig. 4, and assuming that the spatial density (N/Area) < ρ2 of the characteristic points of the leaf node a ₂ of the quadtree a, the nodes of the same position of the quadtree a and the quadtree B are weighted and merged to the corresponding position of the quadtree C, and the node with the zero number of characteristic points does not play a role in the merging process, for example, the leaf node a _l、a₂、a₃、a₄ of the quadtree B does not exist, so the number of the corresponding characteristic points is 0, the weight w.v =0, and only the leaf node a ₁、a₃、a₄ of the quadtree a, the leaf node B ₁ of the quadtree B and the leaf node B ₂ of the quadtree B remain in the quadtree C in the third layer.

S6, predicting the feature point distribution of the next frame of image by using the global priori information and adopting a multi-step Markov probability model.

Markov properties (Markov properties) refer to the Property that a random process, given a current state, relies on the current state alone for future states, and not on past states. Mathematical expression:

for the discrete-time random process { X _n }, if:

P(X_n+1=x|X_n=x_n,X_n-1=x_n-1,...,X₀=x₀)=P(X_n+1=x|X_n=x_n);

the process is said to have markov properties.

When the state at a certain instant depends on the state at the first k instants (not just the current instant), it is called a k-order markov process.

The mathematical expression is as follows:

P(X_n+1|X_n,X_n-1,...,X₀)=P(X_n+1|X_n,X_n-1,...,X_n-k+1);

I.e. the conditional probability of the future state depends on the last k historical states.

;

Wherein E ∙ is the maximum likelihood.

Further, in this embodiment, let X _n+1 relate only to the limited k=3 states X _n,X_n-1,...,X_n-2 before X _n+1, and obtain a third order markov probability model P(X_n+1|X_n,X_n-1,...,X₀)=P(X_n+1|X_n,X_n-1,...,X_n-2);

;

wherein a, b and c are linear coefficients.

More specifically, a, b, c can be expressed as:

;

。

S7, dynamically adjusting the prior quadtree, namely adjusting branches in the new quadtree in the step S5 by the actual distribution of the image characteristic points of the next frame of image. The flow chart is shown in fig. 5.

Further, step S7 includes the steps of:

And S701, extracting characteristic points, namely extracting the characteristic points from the corresponding images of the leaf nodes, counting the number of the characteristic points, and updating the leaf node data.

And S702, splitting the nodes, namely dividing the current region into four sub-regions if the space density (number of unit Area) of the characteristic points in the nodes (N/Area) > ρ1, distributing the characteristic points to the corresponding sub-regions, and carrying out recursion until the characteristic points are distributed to proper leaf nodes.

The region reaches a minimum size limit s=w/(2 ^n')×H/(2^n'), and the division is stopped when n' ∈ [6,8 ].

S703, locally reconstructing, namely if one father node P has only leaf nodes, and the spatial density (N '/Area') of each leaf node characteristic point is less than ρ3, then completing four leaf nodes of the father node P, wherein N 'is the number of leaf node characteristic points, area' is the Area of the leaf node, ρ3 is a set leaf node spatial density threshold value which is less than or equal to ρ1, and the specific value is obtained by adjusting according to scenes and experience. ;

Extracting characteristic points from the complemented leaf node area, if the quantity of the characteristic points of each leaf node of the father node P after the complementation does not reach a set threshold T2, enabling the father node P to be a leaf node and reconstructing the local part upwards;

The upward local reconstruction specifically includes determining whether the parent node P1 of P has only leaf nodes, if so, executing step S703 on P1.

Further, since the increase and decrease branches of the quadtree are bulky and complicated in actual operation, in this embodiment, by setting the number of quadtree layers at the beginning of the step of adjusting the branches in the new quadtree, the corresponding leaf node whose leaf node feature point number is smaller than the set threshold T2 is assigned to-1, which indicates the discarded area, and the discarded node does not play a role in merging.

S8, sliding the window, namely removing the image frame at the oldest moment in the sliding window, and adding a new image frame, as shown in FIG. 6.

The method processes the sequence data by dynamically maintaining a local data sliding window with a fixed size, and is characterized in that the sliding window discards old data and brings new data into the same way as a sliding viewport along with the advancement of the sequence, thereby realizing real-time analysis and calculation of continuous data. The method skillfully balances time efficiency and space efficiency, avoids the cost of processing the whole data set, can capture the local characteristics of the data, and is particularly suitable for scenes such as time sequence analysis, real-time signal processing, stream data mining and the like.

Example 2:

As shown in fig. 7, in this embodiment, a cloud-edge collaborative visual slam system based on feature point prediction is provided, where the system includes an edge end, an edge server and a cloud server that communicate with each other through a communication module;

The edge end applies the cloud edge collaborative visual slam method based on the feature point prediction to obtain image feature point distribution, and builds and stores a sub-map;

It should be noted that, the system provided in the foregoing embodiment is only exemplified by the division of the foregoing functional modules, in practical application, the foregoing functional allocation may be performed by different functional modules according to needs, that is, the internal structure is divided into different functional modules to complete all or part of the functions described above, and the system is a cloud edge collaborative visual slam method based on feature point prediction applied to the foregoing embodiment.

Example 3:

In the embodiment, the cloud edge cooperative visual slam device based on the feature point prediction comprises a memory and at least one processor, wherein instructions are stored in the memory, the memory is connected with the at least one processor through a line, and the at least one processor calls the instructions in the memory so that the cloud edge cooperative visual slam device based on the feature point prediction can execute the cloud edge cooperative visual slam method based on the feature point prediction.

Example 4:

In this embodiment, a storage medium is provided, where a program is stored, and when the program is executed by a processor, a cloud edge cooperative visual slam method based on feature point prediction is implemented, specifically:

adding images of a plurality of initial continuous frames into a sliding window;

extracting feature points from all pixels of the image of a plurality of frames which are initially continuous;

establishing a quadtree for the feature points extracted from each frame;

The quadtree is saved as prior information of the corresponding frame image;

Combining prior information corresponding to the images of a plurality of initial continuous frames, and establishing a new quadtree as global prior information;

And adding the image of the next frame into the sliding window, removing the image of the earliest frame and the corresponding prior information, and predicting the characteristic point distribution of the image of the next frame by utilizing the image in the sliding window and the corresponding prior information.

It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of techniques known in the art, discrete logic circuits with logic gates for implementing logic functions on data signals, application specific integrated circuits with appropriate combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.

The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention should be made in the equivalent manner, and the embodiments are included in the protection scope of the present invention.

Claims

1. The cloud edge cooperative visual slam method based on feature point prediction is characterized by comprising the following steps of:

adding a plurality of initial continuous frame images into a sliding window;

establishing a quadtree for the feature points extracted from each frame;

The quadtree is saved as prior information of the corresponding frame image;

the branches in the new quadtree are adjusted with the actual feature point distribution of the next frame image, the method specifically comprises the following steps:

Judging whether the father node P1 of the P has only leaf nodes, if so, executing the partial reconstruction on the P1;

2. The cloud edge collaborative visual slam method based on feature point prediction according to claim 1, wherein the feature point extracted from each frame is set up into a quadtree, and the method comprises the following steps:

creating a root node representing the entire image area;

Dividing by the node center (xc, yc):

;

3. The cloud edge collaborative visual slam method based on feature point prediction according to claim 1, wherein the new quadtree is built as global prior information, wherein the number of node feature pointsWherein w ₁,w₂,…,w_n is the weight of the images of different time frames, the subscript represents the sequence number, the weight value is determined by the time sequence of the image frames, the new image frames have larger weight, and vice versa, V ₁,V₂,…,V_n is the number of the characteristic points of the images of different time frames, and the subscript represents the sequence number;

4. The cloud edge collaborative visual slam method based on feature point prediction according to claim 1, wherein the feature point distribution of the next frame of image is predicted by using a multi-step markov probability model by using global prior information specifically comprises:

;

Wherein E ∙ is the maximum likelihood.

5. The cloud edge collaborative visual slam method based on feature point prediction according to claim 4, wherein X _n+1 is related to the finite k=3 states X _n,X_n-1,...,X_n-2 before X _n+1 to obtain a third order Markov probability model P(X_n+1|X_n,X_n-1,...,X₀)=P(X_n+1| X_n,X_n-1,...,X_n-2);

The state transition of the characteristic point distribution between the front image frame and the rear image frame in a short time satisfies the linear relation that X _n+1=aX_n +bX_n-1+cX_n-2+...+ X₁ +epsilon, epsilon is Gaussian noise, and then:

;

wherein a, b and c are linear coefficients.

6. The cloud edge collaborative visual slam method based on feature point prediction according to claim 1, wherein the number of quadtree layers is set at the beginning of the step of adjusting branches in the new quadtree, and corresponding leaf nodes with leaf node feature point numbers smaller than a set threshold T2 are assigned to-1, indicating discarded regions, and discarded nodes do not play a role in merging.

7. The cloud edge collaborative visual slam system based on the feature point prediction is characterized by comprising an edge end, an edge server and a cloud server which are communicated with each other through a communication module;

The edge end applies the cloud edge collaborative visual slam method based on the characteristic point prediction according to any one of claims 1-6 to obtain image characteristic point distribution, and builds and stores a sub-map;

8. The cloud edge cooperative visual slam device based on the feature point prediction is characterized by comprising a memory and at least one processor, wherein the memory is stored with instructions, the memory is connected with the at least one processor through a line, and the at least one processor calls the instructions in the memory to enable the cloud edge cooperative visual slam device based on the feature point prediction to execute the cloud edge cooperative visual slam method based on the feature point prediction according to any one of claims 1-6.

9. A storage medium storing a program, wherein the program when executed by a processor implements a cloud edge collaborative visual slam method based on feature point prediction as set forth in any one of claims 1-6.