Disclosure of Invention
The invention provides a method and a device for tracking image feature points of a target object, which are used for realizing more accurate tracking of image feature points and improving the accuracy of the tracking result. The specific technical solution is as follows:
In a first aspect, an embodiment of the present invention provides a method for tracking an image feature point of a target object, including:
obtaining a current frame acquired by an image acquisition device and a reference frame corresponding to the current frame, wherein the reference frame is: an image frame that precedes the current frame and meets a preset reference frame screening condition;
obtaining pose conversion information between the pose when the image acquisition device acquires the current frame and the pose when it acquires the reference frame;
obtaining target depth information corresponding to a target feature point in the reference frame, and obtaining first position information of the target feature point in the reference frame;
determining, in the current frame, first projection area information corresponding to a spatial point corresponding to the target feature point based on the pose conversion information, the target depth information and the first position information;
determining, from the reference frame, second projection area information corresponding to the first projection area information based on the pose conversion information, the target depth information, the first position information and the first projection area information;
and determining the position information corresponding, in the current frame, to the spatial point corresponding to the target feature point based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information.
Optionally, the step of obtaining pose conversion information between the pose when the image acquisition device acquires the current frame and the pose when the image acquisition device acquires the reference frame includes:
acquiring current pose information when the image acquisition device acquires the current frame;
acquiring reference pose information when the image acquisition device acquires the reference frame;
and determining, according to the current pose information and the reference pose information, the pose conversion information between the pose when the image acquisition device acquires the current frame and the pose when it acquires the reference frame.
Optionally, before the step of determining the position information corresponding to the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information, the method further includes:
constructing an image pyramid for the current frame to obtain a preset number of first subframes corresponding to the current frame;
constructing an image pyramid for the reference frame to obtain the preset number of second subframes corresponding to the reference frame;
determining, from each first subframe, third projection area information corresponding to the first projection area information based on the first projection area information;
determining, from each second subframe, fourth projection area information corresponding to the first projection area information based on the first projection area information;
determining, from each second subframe, second position information corresponding to the target feature point based on the first position information;
The step of determining the position information corresponding to the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information includes:
and determining the position information of the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information and the gray value corresponding to each fourth projection area information.
Optionally, before the step of determining the position information corresponding to the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, and the gray value corresponding to each fourth projection area information, the method further includes:
obtaining the previous M frames of the current frame, wherein M is a positive integer;
constructing an image pyramid for each of the previous M frames to obtain the preset number of third subframes corresponding to each of the previous M frames;
determining, from each of the previous M frames, fifth projection area information corresponding to the first projection area information based on the first projection area information;
determining, from each third subframe, sixth projection area information corresponding to the first projection area information based on the first projection area information;
obtaining third position information corresponding, in each of the previous M frames, to the spatial point corresponding to the target feature point;
determining, from each third subframe, fourth position information corresponding to the spatial point corresponding to the target feature point based on the third position information;
the step of determining the position information corresponding to the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, and the gray value corresponding to each fourth projection area information includes:
and determining the position information corresponding, in the current frame, to the spatial point corresponding to the target feature point based on the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, the gray value corresponding to each fourth projection area information, the gray value corresponding to the fifth projection area information and the gray value corresponding to each sixth projection area information.
Optionally, the step of determining the position information of the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, the gray value corresponding to each fourth projection area information, the gray value corresponding to the fifth projection area information, and the gray value corresponding to each sixth projection area information includes:
constructing a least-squares equation for the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, the gray value corresponding to each fourth projection area information, the gray value corresponding to the fifth projection area information and the gray value corresponding to each sixth projection area information;
and solving the least-squares equation, and taking the solution for which the result of the least-squares equation meets a preset constraint condition as the position information corresponding, in the current frame, to the spatial point corresponding to the target feature point.
Optionally, the step of obtaining the target depth information corresponding to the target feature point in the reference frame includes:
obtaining, through a depth filter, the target depth information corresponding to the target feature point in the reference frame.
Optionally, the step of obtaining, by a depth filter, target depth information corresponding to a target feature point in the reference frame includes:
acquiring current pose change information between the pose when the image acquisition device acquires the reference frame and the pose when it acquires the frame preceding the current frame;
acquiring current observed depth information corresponding to the target feature point by using a triangulation algorithm, the first position information, the position information of the imaging point, in the frame preceding the current frame, of the spatial point corresponding to the target feature point, and the current pose change information;
determining the current depth uncertainty corresponding to the target feature point by using the current observed depth information and a preset pixel error;
and updating the parameters of the current depth filter by using the current observed depth information and the current depth uncertainty, obtaining the output when the current depth filter re-converges, and obtaining the target depth information corresponding to the target feature point based on that output.
Optionally, the step of determining second projection area information corresponding to the first projection area information from the reference frame based on the pose conversion information, the target depth information, the first position information and the first projection area information includes:
calculating a first affine transformation matrix from the current frame to the reference frame based on the pose conversion information, the target depth information and the first position information;
and determining, from the reference frame, the second projection area information corresponding to the first projection area information based on the first projection area information and the first affine transformation matrix.
In a second aspect, an embodiment of the present invention provides a tracking device for an image feature point of a target object, including:
a first obtaining module configured to obtain a current frame acquired by an image acquisition device and a reference frame corresponding to the current frame, wherein the reference frame is: an image frame that precedes the current frame and meets a preset reference frame screening condition;
a second obtaining module configured to obtain pose conversion information between the pose when the image acquisition device acquires the current frame and the pose when it acquires the reference frame;
a third obtaining module configured to obtain target depth information corresponding to a target feature point in the reference frame and obtain first position information of the target feature point in the reference frame;
a first determining module configured to determine, in the current frame, first projection area information corresponding to a spatial point corresponding to the target feature point based on the pose conversion information, the target depth information and the first position information;
a second determining module configured to determine second projection area information corresponding to the first projection area information from the reference frame based on the pose conversion information, the target depth information, the first position information, and the first projection area information;
and a third determining module configured to determine position information corresponding to the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information.
As can be seen from the above, the feature tracking method and apparatus provided by the embodiments of the present invention may obtain a current frame acquired by an image acquisition device and a reference frame corresponding to the current frame, where the reference frame is: an image frame that precedes the current frame and meets a preset reference frame screening condition; obtain pose conversion information between the pose when the image acquisition device acquires the current frame and the pose when it acquires the reference frame; obtain target depth information corresponding to a target feature point in the reference frame, and obtain first position information of the target feature point in the reference frame; determine, in the current frame, first projection area information corresponding to a spatial point corresponding to the target feature point based on the pose conversion information, the target depth information and the first position information; determine, from the reference frame, second projection area information corresponding to the first projection area information based on the pose conversion information, the target depth information, the first position information and the first projection area information; and determine the position information corresponding, in the current frame, to the spatial point corresponding to the target feature point based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information.
Through the embodiments of the present invention, the pose conversion information of the pose between the current frame and the reference frame, and the target depth information corresponding to the target feature point in the reference frame, can be obtained in combination with the image acquisition device, and the first projection area information of the spatial point corresponding to the target feature point is determined from the current frame, that is, the projection range in which the imaging point of that spatial point most probably lies is determined from the current frame. The position information corresponding, in the current frame, to the spatial point corresponding to the target feature point is then determined based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information. Thus, loss of tracking of the spatial point corresponding to the target feature point is avoided to a certain extent, more accurate tracking of the feature is realized, and the accuracy of the tracking result is improved. Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.
The innovation points of the embodiment of the invention include:
1. Pose conversion information of the pose between the current frame and the reference frame, and target depth information corresponding to the target feature point in the reference frame, are obtained in combination with the image acquisition device, and first projection area information of the spatial point corresponding to the target feature point is pre-determined from the current frame, that is, the projection range in which the pixel point corresponding to that spatial point most probably lies is pre-determined from the current frame. Based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information, the position information corresponding, in the current frame, to the spatial point corresponding to the target feature point, namely the position information of the imaging point of that spatial point in the current frame, is then determined. Tracking of the spatial point corresponding to the target feature point is thereby improved to a certain extent, more accurate tracking of the image feature point is realized, and the accuracy of the tracking result is improved.
2. An image pyramid is constructed for each of the current frame and the reference frame, and the position information corresponding, in the current frame, to the spatial point corresponding to the target feature point is then determined in combination with the gray values corresponding to the projection area information corresponding to the target feature point in the subframes of the current frame and the reference frame. This avoids, to a certain extent, inaccurate tracking results caused by excessively fast movement, further improving the accuracy of the tracking result.
3. Gray values corresponding to the position information of the imaging points, in the previous frames, of the spatial point corresponding to the target feature point are combined to construct multiple constraint conditions for determining the position information corresponding, in the current frame, to that spatial point, realizing multi-constraint tracking of the spatial point and thus further improving the accuracy of the tracking result.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments of the present invention and the accompanying drawings are intended to cover non-exclusive inclusions. A process, method, system, article or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may optionally include other steps or elements not listed or inherent to such process, method, article or apparatus.
The invention provides a method and a device for tracking image feature points of a target object, which are used for realizing more accurate tracking of the image feature points and improving the accuracy of a tracking result. The following describes embodiments of the present invention in detail.
Fig. 1 is a schematic flow chart of a tracking method of image feature points of a target object according to an embodiment of the present invention. The method may comprise the steps of:
s101: and obtaining the current frame acquired by the image acquisition equipment and the corresponding reference frame.
Wherein, the reference frame is: image frames satisfying preset reference frame screening conditions before the current frame.
In the embodiment of the present invention, the method may be applied to any type of electronic device, and the electronic device may be a server or a terminal device. The electronic device may itself be an image acquisition device, or a non-image-acquisition device; in the latter case it may be communicatively connected with the image acquisition device. The image acquisition device may be arranged on a movable object, for example a vehicle or a robot. When the image acquisition device is arranged on a vehicle, the electronic device may be an in-vehicle device arranged on the vehicle, or a device not arranged on the vehicle; either is acceptable.
The electronic device may obtain the current frame acquired by the image acquisition device and the reference frame corresponding to the current frame. The current frame is: the image frame acquired by the image acquisition device at the current moment; the reference frame is: an image frame acquired by the image acquisition device before the current moment that meets the preset reference frame screening condition. Meeting the preset reference frame screening condition may mean that the frame differs greatly from the previously acquired reference frames, where a large difference may mean that the similarity between the frame and each previous reference frame is smaller than a preset similarity threshold. Alternatively, meeting the preset reference frame screening condition may mean that the pose of the image acquisition device corresponding to the frame differs greatly from the poses of the image acquisition device corresponding to the other previous reference frames, where the pose of the image acquisition device corresponding to another reference frame is: the pose of the image acquisition device when it acquired that reference frame. After the electronic device obtains the current frame, the reference frame whose acquisition time is closest to the current moment may be selected from a plurality of pre-stored reference frames as the reference frame corresponding to the current frame. The reference frame corresponding to the current frame and the current frame have an overlapping region.
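For illustration, a minimal sketch of such screening and selection follows; the similarity function, the threshold name and the `image` and `timestamp` attributes are assumptions of the sketch, not part of the embodiment:

```python
def is_reference_frame(frame, previous_references, similarity, sim_threshold):
    """A frame may qualify as a reference frame when its similarity to every
    earlier reference frame is below the preset similarity threshold (one of
    the two screening conditions described above)."""
    return all(similarity(frame.image, ref.image) < sim_threshold
               for ref in previous_references)

def reference_for(current_frame, stored_references):
    """Select, from the pre-stored reference frames, the one whose
    acquisition time is closest to the current moment."""
    return min(stored_references,
               key=lambda ref: abs(current_frame.timestamp - ref.timestamp))
```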
S102: and acquiring pose conversion information between the pose when the image acquisition equipment acquires the current frame and the pose when the image acquisition equipment acquires the reference frame.
In this step, the image acquisition device has one pose when acquiring the reference frame and another when acquiring the current frame, and the two poses may differ. After the current frame is acquired, the electronic device may directly obtain the pose conversion information between the pose of the image acquisition device when acquiring the current frame and its pose when acquiring the reference frame. The pose conversion information may characterize the pose conversion relationship between the pose of the image acquisition device when acquiring the reference frame and its pose when acquiring the current frame, and may include: the conversion relation by which the pose of the image acquisition device when acquiring the reference frame is converted into its pose when acquiring the current frame; and/or the conversion relation by which the pose when acquiring the current frame is converted into the pose when acquiring the reference frame. In one implementation, S102 may include:
acquiring current pose information when the image acquisition device acquires the current frame;
acquiring reference pose information when the image acquisition device acquires the reference frame;
and determining, according to the current pose information and the reference pose information, the pose conversion information between the pose when the image acquisition device acquires the current frame and the pose when it acquires the reference frame.
In one implementation, the electronic device may first obtain inertial sensor data corresponding to the current frame, where the inertial sensor data are: sensor data acquired by an inertial sensor arranged on the same movable object as the image acquisition device. The inertial sensor data corresponding to the current frame may refer to: the inertial sensor data acquired by the inertial sensor during the acquisition period in which the image acquisition device acquires the current frame.
After obtaining the inertial sensor data corresponding to the current frame, the electronic device may obtain the reference pose information when the image acquisition device acquired the reference frame, i.e. information that can represent the position and attitude of the image acquisition device at that acquisition. The electronic device may then estimate, based on the inertial sensor data, the pose information when the image acquisition device acquires the current frame, as the current pose information, and further determine the pose conversion information based on the current pose information and the reference pose information. The process of estimating the current pose information based on the inertial sensor data may be: taking the pose information when the image acquisition device acquired the previous frame as the previous pose information, and obtaining the time interval between the acquisition of the previous frame and the acquisition of the current frame; determining, based on the inertial sensor data corresponding to the current frame and that time interval, the pose change of the image acquisition device when acquiring the current frame relative to when acquiring the previous frame, as the first pose change; and then estimating, based on the first pose change and the previous pose information, the pose information when the image acquisition device acquires the current frame, as the current pose information.
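As a hedged sketch of the pose computation just described: poses are represented here as 4 x 4 homogeneous matrices relative to an assumed world frame, and `delta_T_imu` stands for the pose increment integrated from the inertial sensor data over the inter-frame interval; these names are assumptions of the sketch, not terms of the embodiment:

```python
import numpy as np

def predict_current_pose(T_world_prev, delta_T_imu):
    """Estimate the current pose information from the previous frame's pose
    and the first pose change integrated from inertial sensor data."""
    return T_world_prev @ delta_T_imu

def pose_conversion(T_world_ref, T_world_cur):
    """Pose conversion information as a single rigid transform taking points
    expressed in reference-frame camera coordinates into current-frame
    camera coordinates."""
    return np.linalg.inv(T_world_cur) @ T_world_ref
```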
S103: target depth information corresponding to target feature points in the reference frame is obtained, and first position information of the target feature points in the reference frame is obtained.
It may be appreciated that after the image acquisition device acquires an image, each pixel point in the image may correspond to depth information, where the depth information may represent: the distance between the spatial point corresponding to the pixel point and the image acquisition device. In this step, the electronic device obtains target depth information corresponding to the target feature point in the reference frame, where the target depth information may represent: the distance between the spatial point corresponding to the target feature point and the position of the image acquisition device when it acquired the reference frame. In one case, this distance may be obtained by measurement with a laser sensor; alternatively, the target depth information corresponding to the target feature point in the reference frame may be obtained through a triangulation algorithm and a depth filter. For layout reasons, the specific calculation process is described later.
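Although the full depth filter calculation is described later, the triangulation component can already be sketched, assuming unit bearing vectors of the matched point and a 4 x 4 relative pose; this is a standard linear two-view depth solve, not necessarily the exact formulation used by the embodiment:

```python
import numpy as np

def triangulate_depth(f_ref, f_cur, T_cur_ref):
    """Observed depth of the reference-frame feature: with P = d_ref * f_ref
    in the reference camera frame, d_cur * f_cur = R @ P + t must hold in
    the other frame, giving a 3x2 least-squares system in (d_cur, d_ref)."""
    R, t = T_cur_ref[:3, :3], T_cur_ref[:3, 3]
    A = np.stack([f_cur, -(R @ f_ref)], axis=1)  # columns: f_cur, -R f_ref
    d_cur, d_ref = np.linalg.lstsq(A, t, rcond=None)[0]
    return d_ref  # depth along the reference bearing vector
```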
The electronic device may also obtain the first position information of the target feature point in the reference frame, where the target feature point is a pixel point to be tracked in the reference frame. It may be understood that the reference frame may include one or more pixel points to be tracked; when it includes a plurality of them, each pixel point to be tracked may be taken as a target feature point, and S103 and the subsequent steps are executed for each target feature point.
S104: and determining first projection area information corresponding to the space point corresponding to the target feature point in the current frame based on the pose conversion information, the target depth information and the first position information.
In this step, the electronic device may first determine, based on the pose conversion information, the target depth information and the first position information, the projection position information of the projection point, in the current frame, of the spatial point corresponding to the target feature point, as the first projection point position information. It may then extend around that point in the current frame according to a preset extension rule to obtain the first projection area information corresponding, in the current frame, to the spatial point corresponding to the target feature point, that is, determine from the current frame the projection range in which the imaging point of that spatial point most probably lies.
For example, the preset extension rule may be: extend from the projection point, in the current frame, of the spatial point corresponding to the target feature point as the center point, with a preset size n as the extension size. For example, if the preset size n is 10, extending 10 pixel points upward, downward, leftward and rightward from the center point yields a 21 x 21 pixel block, and this pixel block is taken as the first projection area corresponding, in the current frame, to the spatial point corresponding to the target feature point; the first projection area information is then the information describing the position of this 21 x 21 pixel block in the current frame. Alternatively, the rule may be a random extension rule, for example: extend randomly from the projection point to obtain an area of preset size M1 x M2, where M1 may or may not be equal to M2.
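A minimal sketch of the centred extension rule (boundary handling at the image border is omitted):

```python
def extract_patch(image, center_u, center_v, n=10):
    """Gray values of the (2n+1) x (2n+1) pixel block centred on the
    projected point; n = 10 yields the 21 x 21 block of the example."""
    return image[center_v - n:center_v + n + 1,
                 center_u - n:center_u + n + 1]
```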
Determining the projection position information of the projection point, in the current frame, of the spatial point corresponding to the target feature point based on the pose conversion information, the target depth information and the first position information, as the first projection point position information, may proceed as follows. Based on the first position information and a preset projection matrix, the position information of the spatial point corresponding to the target feature point in the device coordinate system of the image acquisition device is determined as the first device position information, where the preset projection matrix is: the projection matrix from the device coordinate system of the image acquisition device to the image coordinate system, and the first device position information is: the device position information of the spatial point corresponding to the target feature point when the image acquisition device is at the position where it acquired the reference frame. Then, based on the pose conversion information, the target depth information and the first device position information, the three-dimensional position information of the spatial point corresponding to the target feature point when the image acquisition device acquires the current frame may be determined as the second device position information, which includes: the distance between the spatial point corresponding to the target feature point and the position of the image acquisition device when it acquires the current frame. Finally, based on the second device position information and the preset projection matrix, the position information corresponding, in the current frame, to the spatial point corresponding to the target feature point, i.e. the projection position information of its projection point in the current frame, is determined as the first projection point position information, that is, the initial value of the imaging point, in the current frame, of the spatial point corresponding to the target feature point. This process may be expressed by formulas (1) and (2) as follows:
$$P_k = T_{kr}\left(d_r \cdot \pi^{-1}\left(D^{-1}(q_r)\right)\right) \tag{1}$$

$$q_k = D\left(\pi(P_k)\right) \tag{2}$$

wherein $\pi(\cdot)$ represents the preset projection matrix, i.e. the projection matrix from the device coordinate system of the image acquisition device to the image coordinate system, determined by the image acquisition device, and $\pi^{-1}(\cdot)$ represents its inverse; $q_r$ represents the position information of the target feature point in the reference frame, i.e. the first position information; $T_{kr}$ represents the pose conversion information, specifically the conversion relation by which the pose of the image acquisition device when acquiring the reference frame is converted into its pose when acquiring the current frame, which can be obtained through integration of IMU (Inertial Measurement Unit) data; $d_r$ represents the target depth information corresponding to the target feature point in the reference frame; $D(\cdot)$ represents the preset distortion matrix of the image acquisition device and $D^{-1}(\cdot)$ its inverse; $q_k$ represents the first projection point position information of the projection point, in the current frame, of the spatial point corresponding to the target feature point; and $P_k$ represents the second device position information of that spatial point when the image acquisition device acquires the current frame.
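Formulas (1) and (2) can be sketched in code as follows; `pi`, `pi_inv`, `D` and `D_inv` stand for the preset projection and distortion mappings, whose concrete form depends on the camera model and is assumed here:

```python
import numpy as np

def project_to_current(q_r, d_r, T_kr, pi, pi_inv, D, D_inv):
    """Back-project the reference-frame feature q_r with depth d_r,
    transform it with the pose conversion T_kr (4x4 homogeneous matrix),
    and re-project it into the current frame."""
    P_r = d_r * pi_inv(D_inv(q_r))           # 3-D point in the reference camera frame
    P_k = T_kr[:3, :3] @ P_r + T_kr[:3, 3]   # formula (1): second device position information
    q_k = D(pi(P_k))                         # formula (2): first projection point position
    return q_k, P_k
```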
In this step, the first projection area information corresponding, in the current frame, to the spatial point corresponding to the target feature point can be determined through the pose conversion information, the target depth information and the first position information; the first projection area information is the position information of the first projection area, and the imaging point, in the current frame, of the spatial point corresponding to the target feature point falls within the first projection area with the greatest probability. In the embodiment of the present invention, determining the first projection area information first reduces the amount of calculation to a certain extent and improves the accuracy of the position information corresponding, in the current frame, to the spatial point corresponding to the target feature point.
S105: and determining second projection area information corresponding to the first projection area information in the reference frame based on the pose conversion information, the target depth information, the first position information and the first projection area information.
After determining the first projection area information corresponding to the space point corresponding to the target feature point in the current frame, the electronic device may continue to determine the second projection area information corresponding to the first projection area information from the reference frame. In one implementation, S105 may include:
calculating a first affine transformation matrix from the current frame to the reference frame based on the pose conversion information, the target depth information and the first position information, wherein the first affine transformation matrix from the current frame to the reference frame may refer to: the affine transformation matrix by which the current frame is transformed to the reference frame;
and determining, from the reference frame, the second projection area information corresponding to the first projection area information based on the first projection area information and the first affine transformation matrix.
With reference to the above process, the projection position information of the projection point, in the current frame, of the spatial point corresponding to the target feature point, i.e. the first projection point position information, can be determined through the pose conversion information, the target depth information and the first position information; then, based on the first projection point position information and the first position information, the first affine transformation matrix from the current frame to the reference frame can be calculated. Subsequently, the first projection area information may be transformed based on the first affine transformation matrix to determine, from the reference frame, the second projection area information corresponding to the first projection area information, which may be expressed by the following formula (3):
$$\mathrm{Patch}_r = A^{-1} \cdot \mathrm{Patch}_k \tag{3}$$

where $\mathrm{Patch}_r$ represents the second projection area information, $A^{-1}$ represents the inverse of the first affine transformation matrix, and $\mathrm{Patch}_k$ represents the first projection area information. It can be understood that the position information of the pixel points in the first projection area information has a one-to-one correspondence with the position information of the pixel points in the second projection area information.
The first affine transformation matrix can be expressed as:

$$A = \begin{bmatrix} \dfrac{\Delta u_k}{\Delta u_r} & \dfrac{\Delta u_k}{\Delta v_r} \\[1ex] \dfrac{\Delta v_k}{\Delta u_r} & \dfrac{\Delta v_k}{\Delta v_r} \end{bmatrix}$$

where $(u_r, v_r)$ represents the first position information, i.e. $q_r$, and $(u_k, v_k)$ represents the projection position information of the projection point, in the current frame, of the spatial point corresponding to the target feature point.
In one case, the values of the four elements in the first affine transformation matrix may be calculated by:
Determine the position of a feature point in the reference frame, e.g. the position $(u_r, v_r)$ of the target feature point, and determine from the current frame, based on the pose conversion information, the target depth information and the first position information, the position $(u_k, v_k)$ of the projection point, in the current frame, of the spatial point corresponding to the target feature point. Then add a horizontal change amount $\Delta u_r$ to the position of the target feature point, so that the changed position is expressed as $(u_r + \Delta u_r, v_r)$, and determine the resulting change of the position of the projection point in the current frame. If the changed projection position is $(u_k + \Delta u_k, v_k + \Delta v_k)$, the two elements of the first column of the 2 x 2 first affine transformation matrix are calculated as $\Delta u_k / \Delta u_r$ and $\Delta v_k / \Delta u_r$ respectively.
Similarly, a vertical change amount $\Delta v_r$ is added to the position of the target feature point, the resulting change of the position of the projection point in the current frame is determined, and the values of the two elements in the second column of the first affine transformation matrix are obtained. The specific determination process is the same as that for the two elements in the first column, and is not repeated here.
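A sketch of this finite-difference estimation, assuming `project` maps a reference-frame position to its current-frame projection at the fixed target depth (i.e. the mapping of formulas (1) and (2)):

```python
import numpy as np

def first_affine_matrix(project, u_r, v_r, du=1.0, dv=1.0):
    """Build the 2x2 first affine transformation matrix column by column:
    perturb the reference position horizontally by du and vertically by dv
    and take the induced current-frame displacements as the columns."""
    q_k = np.asarray(project(u_r, v_r))
    col_u = (np.asarray(project(u_r + du, v_r)) - q_k) / du  # first column
    col_v = (np.asarray(project(u_r, v_r + dv)) - q_k) / dv  # second column
    return np.column_stack([col_u, col_v])
```

A patch in the current frame can then be mapped into the reference frame with `np.linalg.inv(A)`, matching formula (3).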
S106: and determining the position information corresponding to the space point corresponding to the target characteristic point in the current frame based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information.
In this step, after determining the first projection area information and the second projection area information, the electronic device may construct, based on the least-squares principle, a least-squares equation for the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information, as the first least-squares equation; it may then solve the first least-squares equation and take the solution for which the result of the first least-squares equation meets a preset constraint condition as the position information corresponding, in the current frame, to the spatial point corresponding to the target feature point. The preset constraint condition may be: the result of the first least-squares equation is not greater than a first preset threshold, or is at a minimum.
The gray value corresponding to the first projection area information refers to: the gray value of each pixel point in the first projection area represented by the first projection area information can also be called the gray value of each pixel point in the first projection area information; the gray value corresponding to the second projection area information refers to: the gray value of each pixel in the second projection area represented by the second projection area information may also be referred to as the gray value of each pixel in the second projection area information.
The gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information can be used to establish the gray-level difference between the pixels of the two projection areas: the lower this difference, the greater the gray-level similarity between the pixels of the two projection areas, and the more accurate the determined position information corresponding, in the current frame, to the spatial point corresponding to the target feature point.
Accordingly, the first least-squares equation can be expressed by formula (4):

$$Q_1 = \sum_{j=1}^{(2n+1)^2} \left\| I(p_k + \Delta p_{kj}) - I(p_r + \Delta p_{rj}) \right\|^2 \tag{4}$$

wherein $Q_1$ represents the result of the first least-squares equation; $I(p_k + \Delta p_{kj})$ represents the gray value of the $j$-th pixel point in the first projection area information corresponding to the target feature point in the current frame, where $j$ may be an integer in $[1, (2n+1)^2]$; $p_k$ represents the projection position information corresponding, in the current frame, to the spatial point corresponding to the target feature point, which is to be solved, and whose iteration initial value is the projection position information of the projection point of that spatial point within the first projection area information, i.e. the first projection point position information; $\Delta p_{kj}$ represents the distance from the $j$-th pixel point in the first projection area information to the projection point, in the current frame, of the spatial point corresponding to the target feature point, and may be 0 when the $j$-th pixel point is that projection point; $I(p_r + \Delta p_{rj})$ represents the gray value of the $j$-th pixel point in the second projection area information; $p_r$ represents the position of the first pixel point, i.e. the pixel point in the second projection area information corresponding to the projection point, in the current frame, of the spatial point corresponding to the target feature point; $\Delta p_{rj}$ represents the distance from the $j$-th pixel point in the second projection area information to the first pixel point; and $\|\cdot\|^2$ represents the square of the modulus of a vector or value.
In solving the first least-squares equation, a Gauss-Newton or Levenberg-Marquardt iterative algorithm may be used.
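For instance, a translation-only Gauss-Newton solve of formula (4) might look like the following sketch; bilinear interpolation and robust weighting are omitted, and `ref_patch_values` is assumed to hold the gray values of the reference patch already warped by formula (3):

```python
import numpy as np

def gauss_newton_align(I_k, ref_patch_values, offsets, p0, iters=30):
    """Minimise the sum of squared gray differences between the current
    frame sampled around p and the reference patch, iterating on the
    projection position p (initialised with the first projection point
    position information p0)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        J, r = [], []
        for (du, dv), g_ref in zip(offsets, ref_patch_values):
            u, v = int(round(p[0] + du)), int(round(p[1] + dv))
            gx = 0.5 * (float(I_k[v, u + 1]) - float(I_k[v, u - 1]))  # gradient in u
            gy = 0.5 * (float(I_k[v + 1, u]) - float(I_k[v - 1, u]))  # gradient in v
            J.append([gx, gy])
            r.append(float(I_k[v, u]) - g_ref)                        # gray residual
        J, r = np.asarray(J), np.asarray(r)
        step = np.linalg.lstsq(J, -r, rcond=None)[0]  # Gauss-Newton update
        p += step
        if np.linalg.norm(step) < 1e-3:               # converged
            break
    return p
```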
Through the embodiment of the present invention, the pose conversion information of the pose between the current frame and the reference frame, and the target depth information corresponding to the target feature point in the reference frame, can be obtained in combination with the image acquisition device; the first projection area information of the spatial point corresponding to the target feature point is determined from the current frame, that is, the projection range in which the pixel point corresponding to that spatial point most probably lies is determined from the current frame; and the position information corresponding, in the current frame, to the spatial point corresponding to the target feature point is then determined based on the first projection area information, the second projection area information and the first position information. Thus, loss of tracking of the spatial point corresponding to the target feature point is avoided to a certain extent, more accurate tracking of the feature is realized, and the accuracy of the tracking result is improved.
In another embodiment of the present invention, as shown in fig. 2, the method may include the steps of:
s201: and obtaining the current frame acquired by the image acquisition equipment and the corresponding reference frame.
Wherein, the reference frame is: image frames satisfying preset reference frame screening conditions before the current frame.
S202: and acquiring pose conversion information between the pose when the image acquisition equipment acquires the current frame and the pose when the image acquisition equipment acquires the reference frame.
S203: target depth information corresponding to target feature points in the reference frame is obtained, and first position information of the target feature points in the reference frame is obtained.
S204: and determining first projection area information corresponding to the space point corresponding to the target feature point in the current frame based on the pose conversion information, the target depth information and the first position information.
S205: and determining second projection area information corresponding to the first projection area information in the reference frame based on the pose conversion information, the target depth information, the first position information and the first projection area information.
S206: and constructing an image pyramid aiming at the current frame to obtain a preset number of first subframes corresponding to the current frame.
S207: and constructing an image pyramid aiming at the reference frame to obtain a preset number of second subframes corresponding to the reference frame.
S208: and determining third projection area information corresponding to the first projection area information from each first subframe based on the first projection area information.
S209: and determining fourth projection area information corresponding to the first projection area information from each second subframe based on the first projection area information.
S210: and determining second position information corresponding to the target feature point from each second subframe based on the first position information.
S211: and determining the position information of the space point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information and the gray value corresponding to each fourth projection area information.
Wherein, S201 is the same as S101 shown in fig. 1, S202 is the same as S102 shown in fig. 1, S203 is the same as S103 shown in fig. 1, S204 is the same as S104 shown in fig. 1, and S205 is the same as S105 shown in fig. 1, and the description thereof is omitted. The S211 is one implementation of S106 shown in fig. 1.
Considering that when the moving speed of the image acquisition device is too high, for example when it exceeds a preset speed threshold, the tracking result of the image feature point may be affected and become erroneous, in this embodiment an image pyramid may be constructed for the current frame and for the reference frame respectively: images of different resolutions are constructed for the current frame to obtain a predetermined number of first subframes, and images of different resolutions are constructed for the reference frame to obtain the predetermined number of second subframes.
Taking the image pyramid construction for the current frame as an example: the current frame is taken as the bottom layer of the image pyramid, and then, for each of the other layers of the preset number of pyramid layers, a first subframe corresponding to that layer is obtained based on the preset resolution corresponding to the layer; the resolution of the first subframe corresponding to each layer is the resolution corresponding to that layer, and going upward from the bottom layer of the constructed image pyramid, the resolution of the image corresponding to each layer becomes smaller and smaller. For the specific image pyramid construction, reference may be made to the Lucas-Kanade optical flow method.
When the image pyramid is constructed for an image, subframes of different resolutions corresponding to the image frame are constructed from the image frame, and there is no rotation or translation between the image frame and its corresponding subframes; the affine transformation matrix from the image frame to a corresponding subframe can therefore be expressed by the square root of the ratio of the resolution of the subframe to the resolution of the image frame. That is, for each first subframe, the second affine transformation matrix between the current frame and the first subframe can be expressed by the square root of the ratio of the resolution of the first subframe to the resolution of the current frame. For example, if the resolution of the first subframe is 500 x 500 and the resolution of the current frame is 1000 x 1000, the second affine transformation matrix from the current frame to the first subframe can be expressed as 1/2, i.e. the position information, in the first subframe, of the point corresponding to a projection point in the current frame is 1/2 of the position information of that projection point in the current frame. This is only one example of constructing an image pyramid for an image frame; the resolutions of the subframes may be adjusted according to user requirements, and the second affine transformation matrix from the image frame to a subframe is then adjusted accordingly. Similarly, the affine transformation matrix from the reference frame to a second subframe corresponding to the reference frame can be expressed by the square root of the ratio of the resolution of that second subframe to the resolution of the reference frame.
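A sketch of the pyramid construction and of the resulting scalar mapping between layers (2 x 2 block averaging is an assumption of the sketch; the embodiment only requires each layer to have its preset, smaller resolution):

```python
import numpy as np

def build_pyramid(frame, num_levels):
    """Image pyramid with the original frame as the bottom layer and each
    higher layer at half the resolution of the layer below."""
    levels = [np.asarray(frame, dtype=np.float32)]
    for _ in range(num_levels):
        prev = levels[-1]
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2
        down = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(down)
    return levels

def to_level(u, v, level):
    """Map a bottom-layer position into layer `level`: with halving
    resolutions the frame-to-subframe affine matrix reduces to the scalar
    (1/2)**level, as in the 1000 x 1000 -> 500 x 500 example."""
    s = 0.5 ** level
    return u * s, v * s
```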
Third projection area information corresponding to the first projection area information is determined from each first subframe based on the second affine transformation matrix between that first subframe and the current frame and the first projection area information; fourth projection area information corresponding to the first projection area information is determined from each second subframe based on a third affine transformation matrix between that second subframe and the current frame and the first projection area information. The third affine transformation matrix may be determined based on the first affine transformation matrix from the current frame to the reference frame and the affine transformation matrix from the reference frame to the corresponding second subframe; specifically, it may be: the affine transformation matrix from the reference frame to the corresponding second subframe multiplied by the first affine transformation matrix from the current frame to the reference frame.
Furthermore, the second position information corresponding to the target feature point is determined from each second subframe based on the first position information. It can be understood that each second subframe is obtained by constructing the image pyramid for the reference frame, so the mapping relationship between each second subframe and the reference frame is determined; based on this mapping relationship and the position information of the target feature point in the reference frame, i.e. the first position information, the second position information corresponding to the target feature point can be determined in each second subframe.
Subsequently, based on the least-squares principle, a least-squares equation is constructed for the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information and the gray value corresponding to each fourth projection area information, as the second least-squares equation; the second least-squares equation is then solved, and the solution for which the result of the second least-squares equation meets a preset constraint condition is taken as the position information of the spatial point corresponding to the target feature point in the current frame. The preset constraint condition may be: the result of the second least-squares equation is not greater than a second preset threshold.
The second least squares equation can be expressed by formula (5):

$$Q_2=\sum_{i=0}^{N}\sum_{j=1}^{(2n+1)^2}\left\|I\left(p_k+\Delta p_{kj}\right)_i-I\left(p_r+\Delta p_{rj}\right)_i\right\|^{2}\qquad(5)$$

where Q_2 denotes the result of the second least squares equation, ‖·‖² denotes the square of the modulus of a vector, and N denotes the predetermined number; i denotes the layer index of the image pyramid, the number of layers of the pyramid equals the predetermined number plus 1, and i may be an integer in [0, N]; I(p_k+Δp_kj)_i denotes the gray value of the j-th pixel point in the projection area information corresponding to the i-th layer of the image pyramid corresponding to the current frame, that is, the gray value of the j-th pixel point in the first projection area information corresponding to the spatial point corresponding to the target feature point in the current frame, or in the third projection area information corresponding to that spatial point in each first subframe; for example, j may be an integer in [1, (2n+1)²].

When i takes 0, I(p_k+Δp_kj)_i denotes the gray value of the j-th pixel point in the first projection area information; p_k denotes the projection position information, to be solved, of the spatial point corresponding to the target feature point in the current frame, and its iteration initial value is the position information of the projection point of that spatial point in the first projection area information in the current frame, that is, the first projection point position information; Δp_kj denotes the distance from the j-th pixel point in the first projection area information to the projection point of the spatial point in the current frame, and may be taken as 0 when the j-th pixel point is that projection point.

When i takes 1 to N, I(p_k+Δp_kj)_i denotes the gray value of the j-th pixel point in the third projection area information corresponding to the first subframe of the i-th layer; p_k denotes the projection position information, to be solved, of the spatial point corresponding to the target feature point in the first subframe of the i-th layer, and its iteration initial value is the position information of the projection point of that spatial point in the projection area information corresponding to the first subframe of the i-th layer; Δp_kj denotes the distance from the j-th pixel point in that third projection area information to the projection point of the spatial point in the first subframe of the i-th layer; accordingly, Δp_kj equals the second affine transformation matrix from the current frame to the first subframe of the i-th layer multiplied by the distance from the j-th pixel point in the first projection area information to the projection point of the spatial point in the current frame.

I(p_r+Δp_rj)_i denotes the gray value of the j-th pixel point in the projection area information corresponding to the i-th layer of the image pyramid corresponding to the reference frame, that is, the gray value of the j-th pixel point in the second projection area information, or in the fourth projection area information corresponding to each second subframe. When i takes 0, I(p_r+Δp_rj)_i denotes the gray value of the j-th pixel point in the second projection area information; correspondingly, p_r denotes the position of the first pixel point, in the second projection area information, corresponding to the projection point of the spatial point corresponding to the target feature point in the current frame, and Δp_rj denotes the distance from the j-th pixel point in the second projection area information to the first pixel point. When i takes 1 to N, I(p_r+Δp_rj)_i denotes the gray value of the j-th pixel point in the fourth projection area information corresponding to the second subframe of the i-th layer; correspondingly, p_r denotes the position of the second pixel point, in that fourth projection area information, corresponding to the projection point of the spatial point, obtained via the affine transformation matrix between the second subframe of the i-th layer and the current frame, and Δp_rj denotes the distance from the j-th pixel point in that fourth projection area information to the second pixel point.
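For illustration only, the following is a minimal sketch of how an objective of the form of formula (5) can be minimised coarse-to-fine over an image pyramid; the function names, the pure-translation update for p_k, and the plain Gauss-Newton solver are assumptions introduced here, not part of the claimed method.

```python
import numpy as np

def bilinear(img, x, y):
    """Sample a gray value at sub-pixel (x, y) with bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    p = img[y0:y0 + 2, x0:x0 + 2].astype(np.float64)
    return (p[0, 0] * (1 - dx) * (1 - dy) + p[0, 1] * dx * (1 - dy)
            + p[1, 0] * (1 - dx) * dy + p[1, 1] * dx * dy)

def align_patch(cur_pyr, ref_pyr, p_r, p_k_init, n=3, iters=20):
    """Coarse-to-fine minimisation of sum_i sum_j ||I_cur(p_k+dp)_i - I_ref(p_r+dp)_i||^2
    over a (2n+1)^2 patch, solving only for a translational p_k (a simplification:
    the affine patch warps of formula (5) are omitted here)."""
    offs = [(dx, dy) for dy in range(-n, n + 1) for dx in range(-n, n + 1)]
    N = len(cur_pyr) - 1
    p_k = np.array(p_k_init, dtype=np.float64) / 2.0 ** N  # start at the coarsest layer
    for i in range(N, -1, -1):                             # layer N down to layer 0
        pr = np.array(p_r, dtype=np.float64) / 2.0 ** i
        ref_vals = np.array([bilinear(ref_pyr[i], pr[0] + dx, pr[1] + dy)
                             for dx, dy in offs])
        for _ in range(iters):                             # Gauss-Newton iterations
            res, J = [], []
            for (dx, dy), t in zip(offs, ref_vals):
                x, y = p_k[0] + dx, p_k[1] + dy
                res.append(bilinear(cur_pyr[i], x, y) - t)
                gx = (bilinear(cur_pyr[i], x + 1, y) - bilinear(cur_pyr[i], x - 1, y)) / 2
                gy = (bilinear(cur_pyr[i], x, y + 1) - bilinear(cur_pyr[i], x, y - 1)) / 2
                J.append([gx, gy])
            step, *_ = np.linalg.lstsq(np.array(J), -np.array(res), rcond=None)
            p_k += step
            if np.linalg.norm(step) < 1e-3:
                break
        if i > 0:
            p_k *= 2.0                                     # seed the next finer layer
    return p_k
```

Each coarser-layer solution seeds the iteration at the next finer layer, which is the practical role of the pyramid terms in formula (5).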
In this embodiment, S206 may be performed before S202, as long as S206 is performed after S201. Likewise, S208 may be performed before S207 or in parallel with S207, as long as S208 is performed after S206; the execution order of the steps is not limited in this embodiment.
In another embodiment of the present invention, as shown in fig. 3, the method may include the steps of:
s301: and obtaining the current frame acquired by the image acquisition equipment and the corresponding reference frame.
Wherein, the reference frame is: image frames satisfying preset reference frame screening conditions before the current frame.
S302: and acquiring pose conversion information between the pose when the image acquisition equipment acquires the current frame and the pose when the image acquisition equipment acquires the reference frame.
S303: target depth information corresponding to target feature points in the reference frame is obtained, and first position information of the target feature points in the reference frame is obtained.
S304: and determining first projection area information corresponding to the space point corresponding to the target feature point in the current frame based on the pose conversion information, the target depth information and the first position information.
S305: and determining second projection area information corresponding to the first projection area information in the reference frame based on the pose conversion information, the target depth information, the first position information and the first projection area information.
S306: and constructing an image pyramid aiming at the current frame to obtain a preset number of first subframes corresponding to the current frame.
S307: and constructing an image pyramid aiming at the reference frame to obtain a preset number of second subframes corresponding to the reference frame.
S308: and determining third projection area information corresponding to the first projection area information from each first subframe based on the first projection area information.
S309: and determining fourth projection area information corresponding to the first projection area information from each second subframe based on the first projection area information.
S310: and determining second position information corresponding to the target feature from each second subframe based on the first position information.
S311: and obtaining the first M frames of the current frame, wherein M is a positive integer.
S312: and constructing an image pyramid for each frame of the previous M frames to obtain a preset number of third subframes corresponding to each frame of the previous M frames.
S313: and determining fifth projection area information corresponding to the first projection area information from each frame of the previous M frames based on the first projection area information.
S314: and determining sixth projection area information corresponding to the first projection area information from each third subframe based on the first projection area information.
S315: and obtaining third position information corresponding to the space point corresponding to the target feature point in each frame of the previous M frames.
S316: and determining fourth position information corresponding to the space point corresponding to the target feature point from each third subframe based on the third position information.
S317: and determining the position information of the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, the gray value corresponding to each fourth projection area information, the gray value corresponding to the fifth projection area information and the gray value corresponding to each sixth projection area information.
Here, S301 is the same as S101 shown in fig. 1, S302 is the same as S102 shown in fig. 1, S303 is the same as S103 shown in fig. 1, S304 is the same as S104 shown in fig. 1, S305 is the same as S105 shown in fig. 1, S306 is the same as S206 shown in fig. 2, S307 is the same as S207 shown in fig. 2, S308 is the same as S208 shown in fig. 2, S309 is the same as S209 shown in fig. 2, and S310 is the same as S210 shown in fig. 2, which will not be repeated herein. S317 is one implementation of S106 shown in fig. 1. A sketch of the image pyramid construction used by S306, S307 and S312 is given below.
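The following is a minimal sketch of the image pyramid construction performed in S306, S307 and S312, assuming OpenCV is available; the function name and the choice of cv2.pyrDown are illustrative assumptions.

```python
import cv2

def build_pyramid(frame, preset_number):
    """Return `preset_number` subframes: each layer halves the previous one,
    so layer i corresponds to pyramid level i (the original frame is level 0)."""
    subframes = []
    layer = frame
    for _ in range(preset_number):
        layer = cv2.pyrDown(layer)  # Gaussian blur + 2x downsample (illustrative choice)
        subframes.append(layer)
    return subframes
```

Calling build_pyramid(frame, N) on the current frame, the reference frame and each of the previous M frames would yield the first, second and third subframes respectively.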
In order to further improve the accuracy of the tracking result of the image feature points, the position information corresponding to the spatial point corresponding to the target feature point in the current frame can be determined by combining the reference frame, the current frame and the previous M frames of the current frame, where M may be 1 or any positive integer greater than 1.
Before the step of obtaining the position information corresponding to the spatial point corresponding to the target feature point in the current frame, the electronic device may continue to obtain the previous M frames of the current frame, and perform image pyramid construction for each frame of the previous M frames, to obtain a predetermined number of third subframes corresponding to each frame of the previous M frames. The process of performing image pyramid construction for each frame of the previous M frames may refer to the process of performing image pyramid construction for the current frame, which is not described herein.
Further, the affine transformation matrix from the current frame to each frame of the previous M frames is obtained through calculation, and fifth projection area information corresponding to the first projection area information is determined from each of those frames based on that affine transformation matrix and the first projection area information; sixth projection area information corresponding to the first projection area information is likewise determined from each third subframe. Calculating the affine transformation matrix from the current frame to each of the previous M frames may refer to the process of calculating the affine transformation matrix from the current frame to the reference frame, and determining the sixth projection area information from each third subframe may refer to the above process of determining the third projection area information corresponding to the first projection area information from each first subframe, which is not repeated herein.
Subsequently, the electronic device stores, locally or in a connected storage device, the third position information corresponding to the spatial point corresponding to the target feature point in each frame of the previous M frames, that is, the position information of the imaging point of that spatial point in each of those frames. The electronic device obtains this third position information; determines, from each third subframe and based on the third position information, the fourth position information corresponding to the spatial point in that subframe; and further combines the current frame, the reference frame, each frame of the previous M frames and their corresponding subframes to determine the position information corresponding to the spatial point in the current frame. Determining the fourth position information from each third subframe based on the third position information may refer to determining the second position information from each second subframe based on the first position information, which is not repeated herein.
In another embodiment of the present invention, S317 may include:
Constructing a least square equation aiming at the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, the gray value corresponding to each fourth projection area information, the gray value corresponding to the fifth projection area information and the gray value corresponding to each sixth projection area information;
and solving a least square equation, and taking a solution when the result of the least square equation meets a preset constraint condition as the corresponding position information of the space point corresponding to the target feature point in the current frame.
In this embodiment, for clarity of presentation, the least squares equation constructed over the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, the gray value corresponding to each fourth projection area information, the gray value corresponding to the fifth projection area information and the gray value corresponding to each sixth projection area information is referred to as the third least squares equation. After the third least squares equation is constructed, it is solved, and the solution for which its result meets the preset constraint condition is taken as the position information corresponding to the spatial point corresponding to the target feature point in the current frame. The preset constraint condition may be that the result of the third least squares equation is not greater than a third preset threshold, or that it reaches a minimum.
In one implementation, the current frame, the reference frame and the frame previous to the current frame may be combined to determine the position information corresponding to the spatial point corresponding to the target feature point in the current frame. Specifically, the resulting third least squares equation can be expressed by formula (6):

$$Q_3=\sum_{i=0}^{N}\sum_{j=1}^{(2n+1)^2}\left(\left\|I\left(p_k+\Delta p_{kj}\right)_i-I\left(p_r+\Delta p_{rj}\right)_i\right\|^{2}+\left\|I\left(p_k+\Delta p_{kj}\right)_i-I\left(p_l+\Delta p_{lj}\right)_i\right\|^{2}\right)\qquad(6)$$

where Q_3 denotes the result of the third least squares equation; I(p_l+Δp_lj)_i denotes the gray value of the j-th pixel point in the projection area information corresponding to the i-th layer of the image pyramid corresponding to the frame previous to the current frame, that is, the gray value of the j-th pixel point in the fifth projection area information or in the sixth projection area information corresponding to each third subframe. When i takes 0, I(p_l+Δp_lj)_i denotes the gray value of the j-th pixel point in the fifth projection area information; p_l denotes the position of the third pixel point, in the fifth projection area information, corresponding to the projection point of the spatial point corresponding to the target feature point in the current frame, and Δp_lj denotes the distance from the j-th pixel point in the fifth projection area information to the third pixel point. When i takes 1 to N, I(p_l+Δp_lj)_i denotes the gray value of the j-th pixel point in the sixth projection area information corresponding to the third subframe of the i-th layer; p_l denotes the position of the fourth pixel point, in that sixth projection area information, corresponding to the projection point of the spatial point, obtained via the affine transformation matrix between the third subframe of the i-th layer and the current frame; correspondingly, Δp_lj denotes the distance from the j-th pixel point in that sixth projection area information to the fourth pixel point. The meanings of the remaining symbols are the same as in formula (5).
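As a sketch of how formula (6) extends formula (5), the residual terms contributed by the previous frame (and, for M greater than 1, by each further previous frame) can simply be appended before solving; the helper below and its argument layout are hypothetical.

```python
import numpy as np

def residuals_q3(patch_cur, patch_ref, patches_prev):
    """Stack the residuals of formula (6): current-vs-reference terms plus
    current-vs-previous-frame terms, for all pyramid layers already sampled.
    Each argument is a list (one entry per layer) of flattened gray-value patches."""
    res = []
    for cur_i, ref_i in zip(patch_cur, patch_ref):
        res.append(cur_i - ref_i)            # ||I(p_k+dp)_i - I(p_r+dp)_i||^2 terms
    for patch_l in patches_prev:             # one list of layers per previous frame
        for cur_i, prev_i in zip(patch_cur, patch_l):
            res.append(cur_i - prev_i)       # ||I(p_k+dp)_i - I(p_l+dp)_i||^2 terms
    return np.concatenate(res)

# Q3 for a candidate p_k would then be float(np.sum(residuals_q3(...) ** 2)).
```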
In another embodiment of the present invention, the step of obtaining target depth information corresponding to a target feature point in a reference frame may include:
And obtaining target depth information corresponding to the target feature points in the reference frame through a depth filter.
In order to ensure the accuracy of the tracking result of the image feature points, a depth filter may be pre-established in this embodiment; the depth filter is used to determine more accurate depth information corresponding to image feature points. In this embodiment, the target depth information corresponding to the target feature point in the reference frame may be obtained through the depth filter.
In another embodiment of the present invention, the step of obtaining, by a depth filter, target depth information corresponding to a target feature point in a reference frame may include:
Acquiring current pose change information between the pose of the reference frame acquired by the image acquisition equipment and the pose of the frame before the current frame acquired by the image acquisition equipment;
Acquiring current observation depth information corresponding to the target feature point by using a triangulation algorithm, first position information, position information of an imaging point of a space point corresponding to the target feature point in a frame before the current frame and current pose change information;
determining the current depth uncertainty corresponding to the target feature point by using the current observation depth information and the preset pixel error;
And updating parameters of the current depth filter by using the current observation depth information and the current depth uncertainty to obtain output when the current depth filter is re-converged, and obtaining target depth information corresponding to the target feature point based on the output.
The current depth filter is: the depth filter determined based on the position information of the target feature point in the reference frame, that is, the first position information, and the position information of the imaging points of the spatial point corresponding to the target feature point in each of the target frames. The target frames may include: each image frame between the reference frame and the frame previous to the current frame.
An initial depth filter can be established in advance for the spatial point corresponding to the target feature point, that is, an initial depth filter model is established for that spatial point. After the position information of the imaging point of the target feature point in a target frame is determined by the tracking process for image feature points of a target object provided by the embodiment of the present invention, observation depth information is determined using the triangulation algorithm, the first position information, the position information of the imaging point of the target feature point in the target frame, and the pose change information between the pose when the image acquisition device acquires the reference frame and the pose when it acquires the target frame. The depth uncertainty corresponding to the target feature point is then determined based on the observation depth information and the preset pixel error. The parameters of the depth filter are updated using the observation depth information and the depth uncertainty to obtain the output when the depth filter re-converges, and more accurate depth information is determined based on that output. The determined more accurate depth information is output as the depth information required for determining the position information of the imaging point of the target feature point in the frame after the target frame.
The process of determining the observation depth information by using the triangulation algorithm, the first position information, the position information of the imaging point of the spatial point corresponding to the target feature point in the target frame, and the pose change information between the pose when the image acquisition device acquires the reference frame and the pose when the image acquisition device acquires the target frame may be referred to as a triangulation process.
For example, in one case, the target feature point in the reference frame may be observed for the first time. When the next frame after the reference frame is obtained and the position information of the target feature point at the imaging point of that frame is to be determined, the target depth information corresponding to the target feature point may be calculated as follows: determining the target depth information based on the depth information of other feature points in the reference frame. The other feature points may be feature points in the reference frame that do not appear for the first time, whose depth information can be determined through the triangulation process. In this case, the process of determining the target depth information based on the depth information of other feature points in the reference frame may be: taking the average value of the depth information of the other feature points in the reference frame as the target depth information corresponding to the target feature point; or: determining, from the other feature points in the reference frame, those within a preset distance range from the target feature point as feature points to be utilized, and taking the average value of the depth information of the feature points to be utilized as the depth information corresponding to the target feature point. A sketch of this initialisation follows.
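A minimal sketch of this fallback initialisation, assuming hypothetical arrays of feature positions and already-triangulated depths:

```python
import numpy as np

def init_depth_from_neighbors(target_xy, other_xy, other_depths, radius=None):
    """Initialise the depth of a first-observed feature from other features whose
    depth is already known via triangulation. If `radius` is given, average only
    the features within that pixel distance; otherwise average all of them."""
    other_xy = np.asarray(other_xy, dtype=np.float64)
    other_depths = np.asarray(other_depths, dtype=np.float64)
    if radius is not None:
        d = np.linalg.norm(other_xy - np.asarray(target_xy, dtype=np.float64), axis=1)
        mask = d <= radius
        if mask.any():
            return float(other_depths[mask].mean())
    return float(other_depths.mean())
```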
The depth uncertainty corresponding to the target feature point for the frame after the reference frame is then determined based on the observed depth information and the preset pixel error; the depth uncertainty and the observed depth information are input into the initial depth filter model to update the parameters of the initial depth filter, the output when the initial depth filter re-converges is obtained, more accurate depth information is determined based on that output, and this depth information is taken as the depth information required for determining the position information of the imaging point of the target feature point in the subsequent frame.
This continues until the current frame is obtained: the current observation depth information corresponding to the target feature point is determined based on the current pose change information between the pose of the image acquisition device when acquiring the reference frame and its pose when acquiring the frame previous to the current frame, the triangulation algorithm, the first position information, and the position information of the imaging point of the spatial point corresponding to the target feature point in the frame previous to the current frame; the current depth uncertainty corresponding to the target feature point is then determined using the current observation depth information and the preset pixel error; and the parameters of the current depth filter are updated using the current observation depth information and the current depth uncertainty, the output when the current depth filter re-converges is obtained, and the target depth information corresponding to the target feature point is obtained based on that output.
In another case, the target feature point in the reference frame may not be observed for the first time. In this case, the triangulation process may be used directly to determine the observed depth information of the target feature point; the observed depth information and the preset pixel error are then used to determine the current depth uncertainty corresponding to the target feature point, the parameters of the current depth filter are updated using the current observation depth information and the current depth uncertainty, the output when the current depth filter re-converges is obtained, and the target depth information corresponding to the target feature point is obtained based on that output.
Whether the target feature point is observed for the first time may be determined based on the frame previous to the reference frame. Specifically: after the electronic device obtains the reference frame, each feature point contained in the reference frame may be detected through a preset feature point detection algorithm; for each feature point in the reference frame, the feature point is then matched against the feature points in the frame previous to the reference frame. When a feature point matching it exists among the feature points of the frame previous to the reference frame, it is determined that the feature point in the reference frame does not appear for the first time; otherwise, when no matching feature point exists, it is determined that the feature point in the reference frame appears for the first time.
As shown in fig. 4, which is a scene diagram of performing the triangulation process and uncertainty analysis on the reference frame and the frame previous to the current frame, assume that there is a spatial point P, that is, the spatial point corresponding to the target feature point. C1 denotes the first pose of the image acquisition device when acquiring the reference frame, including its position and attitude at that time; C2 denotes the second pose of the device when acquiring the frame previous to the current frame, including its position and attitude at that time; the current pose change information is the change between the pose of the device when acquiring the reference frame and its pose when acquiring the frame previous to the current frame. I_1 denotes the reference frame and I_2 the frame previous to the current frame; u_1 denotes the position information of the target feature point in the reference frame, that is, the first position information, and u_2 denotes the position information of the imaging point, in the frame previous to the current frame, of the spatial point corresponding to the target feature point. In theory, the spatial point corresponding to the target feature point, the position of the device when acquiring the reference frame, and the position of the device when acquiring the frame previous to the current frame form a triangle.
Based on this triangle relation, the distance between the spatial point and the position of the image acquisition device when acquiring the reference frame, that is, the current observation depth information d_k, and the spatial position of the spatial point can be determined. As shown in fig. 4, r denotes, without considering the observation error, the vector from the position of the device when acquiring the reference frame to the spatial point, that is, the difference between the two positions: r = f_P − f_C1, where f_P denotes the spatial position of the spatial point without considering the observation error and f_C1 denotes the position of the device when acquiring the reference frame, both being coordinates in the world coordinate system. r⁺ and r⁻ denote the corresponding vectors when the observation error is considered, calculated in the same way as in the error-free case. The Z-axis in fig. 4 denotes the direction of the optical axis of the device when acquiring the reference frame, that is, the longitudinal axis of the device at that time.
The observation depth information obtained by the triangulation process inevitably carries errors, which mainly come from the position errors of the two observed pixel points, that is, the error values of u_1 and u_2 obtained by identification in the reference frame and in the frame previous to the current frame, respectively.
The process of determining the depth uncertainty of the depth information is illustrated below by the error existing in the observation of the frame previous to the current frame:
Without considering the observation error, the angle α between the line connecting the spatial point and the position of the image acquisition device when acquiring the reference frame, and the line connecting the positions of the device when acquiring the reference frame and when acquiring the frame previous to the current frame, is calculated as shown in formula (7); the angle β between the line connecting the spatial point and the position of the device when acquiring the frame previous to the current frame, and the line connecting those two device positions, is calculated as shown in formula (8):

$$\alpha=\arccos\frac{r\cdot t}{\left\|r\right\|\left\|t\right\|}\qquad(7)$$

$$\beta=\arccos\frac{\delta\cdot t}{\left\|\delta\right\|\left\|t\right\|}\qquad(8)$$

where ‖r‖ denotes the length of the line connecting the spatial point and the position of the image acquisition device when acquiring the reference frame, and ‖·‖ denotes taking the modulus of a vector; t = f_C2 − f_C1 is the vector between the position f_C1 of the device when acquiring the reference frame and its position f_C2 when acquiring the frame previous to the current frame; δ denotes the change information from the position of the spatial point to the position of the device when acquiring the frame previous to the current frame, as a vector, that is, the difference between the position of the device when acquiring the frame previous to the current frame and the position of the spatial point: δ = f_C2 − f_P, where f_C2 is a coordinate in the world coordinate system; ‖δ‖ denotes the length of the line connecting the position of the device when acquiring the frame previous to the current frame and the spatial point.
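A sketch of formulas (7) and (8) under the definitions above (r = f_P − f_C1, δ = f_C2 − f_P, t = f_C2 − f_C1); the variable names are illustrative:

```python
import numpy as np

def triangulation_angles(f_P, f_C1, f_C2):
    """Angles of the triangle formed by the spatial point and the two camera positions:
    alpha at the reference-frame position C1, beta at the previous-frame position C2."""
    r = f_P - f_C1                     # C1 -> spatial point
    delta = f_C2 - f_P                 # spatial point -> C2
    t = f_C2 - f_C1                    # C1 -> C2 (the baseline)
    alpha = np.arccos(np.dot(r, t) / (np.linalg.norm(r) * np.linalg.norm(t)))
    beta = np.arccos(np.dot(delta, t) / (np.linalg.norm(delta) * np.linalg.norm(t)))
    return alpha, beta
```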
Assume that an observation error of +0.5 pixel exists in the observation of the frame previous to the current frame, that is, the observed position deviates by +0.5 pixel. Correspondingly, the angle β⁺ between the line connecting the recalculated spatial point and the position of the image acquisition device when acquiring the frame previous to the current frame, and the line connecting the positions of the device when acquiring the reference frame and when acquiring the frame previous to the current frame, can be expressed as formula (9):

$$\beta^{+}=\beta+\arctan\frac{0.5}{f}\qquad(9)$$

where f denotes the focal length of the image acquisition device, so that arctan(0.5/f) is the angle subtended by a 0.5-pixel displacement.
Correspondingly, under the +0.5 pixel observation error, the angle γ⁺ at the spatial point, between its lines to the position of the image acquisition device when acquiring the reference frame and to the position of the device when acquiring the frame previous to the current frame, can be expressed as formula (10):

$$\gamma^{+}=\pi-\alpha-\beta^{+}\qquad(10)$$
Then the new first depth information d⁺ can be obtained as shown in formula (11):

$$d^{+}=\left\|t\right\|\frac{\sin\beta^{+}}{\sin\gamma^{+}}\qquad(11)$$

where the corresponding new spatial position lies along n_r = r/‖r‖, the unit vector pointing from the optical center of the image acquisition device to the spatial point corresponding to the target feature point.
Correspondingly, under the condition that an observation error of −0.5 pixel exists in the observation of the frame previous to the current frame, the angle β⁻ between the line connecting the recalculated spatial point and the position of the device when acquiring the frame previous to the current frame, and the line connecting the two device positions, is as shown in formula (12):

$$\beta^{-}=\beta-\arctan\frac{0.5}{f}\qquad(12)$$
Under the −0.5 pixel observation error, the angle γ⁻ at the spatial point, between its lines to the position of the device when acquiring the reference frame and to the position of the device when acquiring the frame previous to the current frame, can be expressed as formula (13):

$$\gamma^{-}=\pi-\alpha-\beta^{-}\qquad(13)$$
The new second depth information d⁻ is as shown in formula (14):

$$d^{-}=\left\|t\right\|\frac{\sin\beta^{-}}{\sin\gamma^{-}}\qquad(14)$$
Using the first depth information d⁺ and the second depth information d⁻, the depth uncertainty of the depth information is calculated as shown in formula (15):

$$\sigma_{d}=\frac{d^{+}-d^{-}}{2}\qquad(15)$$

where σ_d denotes the depth uncertainty.
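Putting formulas (7) through (15) together, the following is a sketch of the uncertainty computation; the 0.5-pixel error angle arctan(0.5/f) and the symmetric spread in the last line follow the reconstruction above and should be read as assumptions:

```python
import numpy as np

def depth_uncertainty(f_P, f_C1, f_C2, focal_length):
    """Depth uncertainty of d_k = ||f_P - f_C1||, obtained by perturbing the
    previous-frame observation by +/-0.5 pixel (formulas (9)-(15))."""
    r = f_P - f_C1
    delta = f_C2 - f_P
    t = f_C2 - f_C1
    alpha = np.arccos(np.dot(r, t) / (np.linalg.norm(r) * np.linalg.norm(t)))        # (7)
    beta = np.arccos(np.dot(delta, t) / (np.linalg.norm(delta) * np.linalg.norm(t))) # (8)
    err_angle = np.arctan(0.5 / focal_length)   # angle subtended by 0.5 px (assumption)
    d = {}
    for sign in (+1.0, -1.0):
        b = beta + sign * err_angle             # formulas (9) and (12)
        g = np.pi - alpha - b                   # formulas (10) and (13)
        d[sign] = np.linalg.norm(t) * np.sin(b) / np.sin(g)   # formulas (11) and (14)
    return (d[+1.0] - d[-1.0]) / 2.0            # formula (15) as reconstructed above
```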
In the above process, the current observation depth information and the current depth uncertainty are determined.
And inputting the current observation depth information and the current depth uncertainty into a current depth filter, updating parameters of the current depth filter to obtain output when the current depth filter is reconverged, and obtaining target depth information corresponding to the target feature point based on the output.
The derivation of the depth filter formula corresponding to the depth filter is described below; the purpose of the depth filter is to continuously reduce the depth uncertainty, so that the depth uncertainty of the depth information eventually converges to an acceptable value:
An initial depth filter corresponding to an initial depth filter model may be predefined. The initial depth filter defines, for the spatial point P corresponding to the target feature point, correct depth information d̂ and an interior point probability ρ, where the depth information represents the distance between the spatial point P and the position of the image acquisition device when acquiring the reference frame. Assume that the probability distribution of an observed depth d_e of the spatial point P is represented by the mixture of a Gaussian distribution and a uniform distribution, as shown in formula (16):

$$p\left(d_e\mid\hat{d},\rho\right)=\rho\,N\!\left(d_e\mid\hat{d},\tau_e^{2}\right)+\left(1-\rho\right)U\!\left(d_e\mid d_{\min},d_{\max}\right)\qquad(16)$$

where N(·) denotes the Gaussian distribution, whose variance τ_e² is given by the depth uncertainty of the e-th observation, and U(d|d_min, d_max) denotes the uniform distribution; d_min denotes the minimum depth information and d_max the maximum depth information among the currently observed depth information, where the currently observed depth information is: the depth information of the spatial point P determined through the triangulation process, or the depth information of P calculated based on the depth information of other feature points when P is observed for the first time.
Based on formula (16), as more and more observed depth information is obtained, the probability distribution of the correct depth information d̂ and the interior point probability ρ, that is, the depth filter model, can be represented by formula (17):

$$p\left(\hat{d},\rho\mid d_1,\dots,d_e\right)\propto p\left(\hat{d},\rho\right)\prod_{e}p\left(d_e\mid\hat{d},\rho\right)\qquad(17)$$

where d_e denotes the currently observed depth information. d_1 may denote the depth information of the spatial point P calculated based on the depth information of other feature points when P is observed for the first time, that is, the first depth information of P; or d_1 may denote the depth information of P calculated in the first triangulation process, that is, the observed depth information determined based on the triangulation algorithm, the first position information, the position information of the imaging point of P in the frame after the reference frame, and the pose change information between the pose when the image acquisition device acquires the reference frame and the pose when it acquires the frame after the reference frame. By analogy, d_e is the depth information of P calculated in the current, most recent triangulation process, determined in the same way from the triangulation algorithm, the first position information, the position information of the imaging point of P in the corresponding frame, and the corresponding pose change information.
Subsequently, formula (17) can be rewritten in the form of the product of a Beta distribution and a Gaussian distribution; that is, the probability distribution of the correct depth information d̂ and the interior point probability ρ, namely the depth filter formula corresponding to the depth filter, can be expressed as formula (18):

$$q\left(\hat{d},\rho\mid a_e,b_e,\mu_e,\sigma_e^{2}\right)=\mathrm{Beta}\left(\rho\mid a_e,b_e\right)N\!\left(\hat{d}\mid\mu_e,\sigma_e^{2}\right)\qquad(18)$$

where a_e and b_e are the parameters of the Beta distribution after the parameters of the depth filter model are updated with the depth information of the spatial point P and the corresponding depth uncertainty calculated in the current, most recent triangulation process; μ_e and σ_e² are the parameters of the Gaussian distribution: μ_e is the mean, and σ_e² the variance, of the correct depth information after the same update.
The position information of the imaging point of the spatial point P in the e-th frame after the reference frame, used to calculate d_e, is: the position information of the imaging point in the frame previous to the current frame. The above formula (18) represents the current depth filter model corresponding to the current depth filter.
Subsequently, after the current observation depth information d_k and the corresponding current depth uncertainty τ_k² are obtained through the triangulation process, substituting d_k and τ_k² into the current depth filter, that is, the current depth filter model, can be expressed as formula (19):

$$q\left(\hat{d},\rho\mid a_k,b_k,\mu_k,\sigma_k^{2}\right)\propto p\left(d_k\mid\hat{d},\rho\right)\mathrm{Beta}\left(\rho\mid a_e,b_e\right)N\!\left(\hat{d}\mid\mu_e,\sigma_e^{2}\right)\qquad(19)$$

where τ_k² denotes the current depth uncertainty, and p(d_k | d̂, ρ) denotes the probability of the current observation, that is, the probability calculated by substituting the current observation depth information d_k and its corresponding current depth uncertainty τ_k² into formula (16).
Expanding the two sides of the equation, calculating the first moment and the second moment of each side, and making the first moments of the two sides equal and the second moments of the two sides equal respectively, the updated parameters a_k, b_k, μ_k and σ_k² are obtained, that is, the output result of the updated depth filter; the target depth information corresponding to the target feature point is obtained based on this output result, which may include the target depth information.
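The following is a sketch of the moment-matching update of formulas (16) through (19), in the widely used parametric Beta×Gaussian form introduced by Vogiatzis and Hernández; the parameterisation below is that standard form, offered as an illustration of the update rather than as the patented formulas themselves:

```python
import numpy as np
from scipy.stats import norm

def update_depth_filter(a, b, mu, sigma2, d_k, tau2_k, d_min, d_max):
    """One update of the Beta(rho|a,b) * N(d|mu,sigma2) approximation with the
    current observation d_k and its variance tau2_k (current depth uncertainty)."""
    s2 = 1.0 / (1.0 / sigma2 + 1.0 / tau2_k)            # fused Gaussian variance
    m = s2 * (mu / sigma2 + d_k / tau2_k)               # fused Gaussian mean
    c1 = a / (a + b) * norm.pdf(d_k, mu, np.sqrt(sigma2 + tau2_k))  # inlier weight
    c2 = b / (a + b) * (1.0 / (d_max - d_min))          # outlier (uniform) weight
    c1, c2 = c1 / (c1 + c2), c2 / (c1 + c2)
    # first and second moments of the inlier indicator, for moment matching
    f = c1 * (a + 1) / (a + b + 1) + c2 * a / (a + b + 1)
    e = (c1 * (a + 1) * (a + 2) / ((a + b + 1) * (a + b + 2))
         + c2 * a * (a + 1) / ((a + b + 1) * (a + b + 2)))
    mu_new = c1 * m + c2 * mu
    sigma2_new = c1 * (s2 + m * m) + c2 * (sigma2 + mu * mu) - mu_new * mu_new
    a_new = (e - f) / (f - e / f)                       # Beta parameters from moments
    b_new = a_new * (1.0 - f) / f
    return a_new, b_new, mu_new, sigma2_new
```

In this form, re-convergence can be declared when the Gaussian variance falls below a threshold; its mean then serves as the target depth information.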
Corresponding to the above method embodiment, the embodiment of the present invention provides a tracking device for image feature points of a target object, as shown in fig. 5, which may include:
the first obtaining module 510 is configured to obtain a current frame acquired by an image acquisition device and a reference frame corresponding to the current frame, where the reference frame is: an image frame meeting preset reference frame screening conditions before the current frame;
a second obtaining module 520 configured to obtain pose conversion information between a pose when the image acquisition device acquires the current frame and a pose when the image acquisition device acquires the reference frame;
a third obtaining module 530, configured to obtain target depth information corresponding to a target feature point in the reference frame, and obtain first position information of the target feature point in the reference frame;
a first determining module 540, configured to determine, based on the pose conversion information, the target depth information, and the first position information, first projection area information corresponding to a spatial point corresponding to the target feature point in the current frame;
A second determining module 550 configured to determine second projection area information corresponding to the first projection area information from the reference frame based on the pose conversion information, the target depth information, the first position information, and the first projection area information;
And a third determining module 560 configured to determine, based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information, position information corresponding to the spatial point corresponding to the target feature point in the current frame.
According to this embodiment of the invention, the pose conversion information between the pose of the image acquisition device when acquiring the current frame and its pose when acquiring the reference frame can be combined with the target depth information corresponding to the target feature point in the reference frame to determine, from the current frame, the first projection area information of the spatial point corresponding to the target feature point, that is, the projection range in which the pixel point corresponding to that spatial point exists with higher probability; further, the position information corresponding to the spatial point in the current frame is determined based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information. This narrows the tracking range of the spatial point corresponding to the target feature point to a certain extent, avoids mistracking, realizes more accurate tracking of the feature point, and improves the accuracy of the tracking result.
In another embodiment of the present invention, the second obtaining module 520 is specifically configured to: acquiring current pose information when the image acquisition equipment acquires the current frame; acquiring reference pose information when the image acquisition equipment acquires the reference frame; and determining pose conversion information between the pose when the image acquisition equipment acquires the current frame and the pose when the image acquisition equipment acquires the reference frame according to the current pose information and the reference pose information.
In another embodiment of the present invention, the apparatus further comprises: a first construction module configured to perform image pyramid construction on the current frame before the position information corresponding to the spatial point corresponding to the target feature point in the current frame is determined based on the gray value corresponding to the first projection area information and the gray value corresponding to the second projection area information, so as to obtain a preset number of first subframes corresponding to the current frame; a second construction module configured to construct an image pyramid for the reference frame to obtain the preset number of second subframes corresponding to the reference frame; a fourth determining module configured to determine third projection area information corresponding to the first projection area information from each first subframe based on the first projection area information; a fifth determining module configured to determine fourth projection area information corresponding to the first projection area information from each second subframe based on the first projection area information; and a sixth determining module configured to determine, based on the first position information, second position information corresponding to the target feature point from each second subframe. The third determining module 560 is specifically configured to: determine the position information of the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information and the gray value corresponding to each fourth projection area information.
In another embodiment of the present invention, the apparatus further comprises: a fourth obtaining module configured to obtain a first M frame of the current frame before determining position information corresponding to the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, and the gray value corresponding to each fourth projection area information, where M is a positive integer; the third construction module is configured to construct an image pyramid for each frame of the previous M frames to obtain the predetermined number of third subframes corresponding to each frame of the previous M frames; a seventh determining module configured to determine fifth projection area information corresponding to the first projection area information from each frame of the previous M frames based on the first projection area information; an eighth determining module configured to determine, from each of the third subframes, sixth projection area information corresponding to the first projection area information based on the first projection area information; a fifth obtaining module configured to obtain third position information corresponding to a spatial point corresponding to the target feature point in each of the previous M frames; a ninth determining module, configured to determine fourth location information corresponding to a spatial point corresponding to the target feature point from each third subframe based on the third location information; the third determining module 560 is specifically configured to: and determining position information corresponding to the spatial point corresponding to the target feature point in the current frame based on the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, the gray value corresponding to each fourth projection area information, the gray value corresponding to the fifth projection area information and the gray value corresponding to each sixth projection area information.
In another embodiment of the present invention, the third determining module 560 is specifically configured to: constructing a least square equation aiming at the gray value corresponding to the first projection area information, the gray value corresponding to the second projection area information, the gray value corresponding to each third projection area information, the gray value corresponding to each fourth projection area information, the gray value corresponding to the fifth projection area information and the gray value corresponding to each sixth projection area information; and solving the least square equation, and taking the solution when the result of the least square equation meets the preset constraint condition as the corresponding position information of the space point corresponding to the target feature point in the current frame.
In another embodiment of the present invention, the third obtaining module 530 is specifically configured to: and obtaining target depth information corresponding to the target feature points in the reference frame through a depth filter.
In another embodiment of the present invention, the third obtaining module 530 is specifically configured to: acquiring current pose change information between the pose of the reference frame acquired by the image acquisition equipment and the pose of the frame before the current frame acquired by the image acquisition equipment; acquiring current observation depth information corresponding to the target feature point by using a triangulation algorithm, the first position information, the position information of an imaging point of a space point corresponding to the target feature point in a frame before the current frame and the current pose change information; determining the current depth uncertainty corresponding to the target feature point by using the current observation depth information and a preset pixel error; and updating parameters of the current depth filter by using the current observation depth information and the current depth uncertainty to obtain output when the current depth filter is re-converged, and obtaining target depth information corresponding to the target feature point based on the output.
In another embodiment of the present invention, the second determining module 550 is specifically configured to: calculate a first affine transformation matrix from the current frame to the reference frame based on the pose conversion information, the target depth information and the first position information; and determine second projection area information corresponding to the first projection area information from the reference frame based on the first projection area information and the first affine transformation matrix. A sketch of this mapping follows.
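A minimal sketch of mapping the first projection area into the reference frame with a first affine transformation matrix, as the module above describes; the 2×3 matrix convention is an assumption:

```python
import numpy as np

def map_projection_area(area_points, affine_2x3):
    """Map pixel positions of the first projection area (in the current frame)
    into the reference frame with an affine matrix A = [R | t] of shape (2, 3)."""
    pts = np.asarray(area_points, dtype=np.float64)            # shape (K, 2)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])       # shape (K, 3)
    return homog @ np.asarray(affine_2x3, dtype=np.float64).T  # shape (K, 2)
```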
The device embodiments correspond to the method embodiments and have the same technical effects; for specific descriptions, refer to the method embodiment section, which is not repeated herein.
Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the invention.
Those of ordinary skill in the art will appreciate that: the modules in the apparatus of the embodiments may be distributed in the apparatus of the embodiments according to the description of the embodiments, or may be located in one or more apparatuses different from the present embodiments with corresponding changes. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.