Disclosure of Invention
In view of the foregoing, it is desirable to provide a remote sensing image target detection method, apparatus, computer device, computer readable storage medium, and computer program product.
In a first aspect, the present application provides a remote sensing image target detection method. The method comprises the following steps:
performing pixel segmentation on an image to be processed to obtain a super-pixel group corresponding to the image to be processed, wherein the super-pixel group comprises a plurality of super-pixel blocks, and the super-pixel blocks comprise a plurality of similar pixel points;
determining a feature matrix corresponding to the super pixel group, wherein the feature matrix is determined based on RGB values of all pixel points in the super pixel block;
determining an edge matrix corresponding to the super pixel group based on the local edge set and the global edge set corresponding to the super pixel group; the local edge set is determined according to a distance between the super pixel block and an adjacent super pixel block of the super pixel block; the global edge set is determined according to membership degrees between the super pixel blocks and other super pixel blocks in the image to be processed;
and performing target detection processing on the image to be processed based on the feature matrix, the edge matrix and the trained target detection model to obtain the type of each super pixel block in the image to be processed.
In one embodiment, the determining the feature matrix corresponding to the superpixel group includes:
traversing each super pixel block in the super pixel group, and determining a characteristic value of the current super pixel block based on an RGB average value corresponding to the current super pixel block; the RGB average value is determined based on the average value of R value, G value and B value of a plurality of pixel points;
and after traversing each super pixel block in the super pixel group, obtaining a feature matrix containing a plurality of feature values.
In one embodiment, the determining the edge matrix corresponding to the super pixel group based on the local edge set and the global edge set corresponding to the super pixel group includes:
determining, for each super-pixel block, a distance value between the current super-pixel block and an adjacent super-pixel block of the current super-pixel block;
if the distance value is not greater than a first threshold value, determining a first connecting edge between the current super-pixel block and the adjacent super-pixel block, and merging a plurality of the first connecting edges to obtain a local edge set corresponding to the super-pixel group;
determining a plurality of membership degrees corresponding to the current super-pixel block according to a fuzzy C-means clustering strategy; the membership degree is used for indicating the degree to which the current super-pixel block belongs to a certain class cluster;
if the membership degree is greater than a second threshold value, determining a second connecting edge between the current super-pixel block and the clustering center of the class cluster corresponding to the membership degree, and merging a plurality of the second connecting edges to obtain a global edge set corresponding to the super-pixel group;
and merging the local edge set and the global edge set to obtain an edge matrix corresponding to the super pixel group.
In one embodiment, the performing object detection processing on the image to be processed based on the feature matrix, the edge matrix and the trained object detection model to obtain types of the super pixel blocks in the image to be processed includes:
inputting the feature matrix and the edge matrix into a trained target detection model to obtain a type value of each super pixel block in the image to be processed; the target detection model is used for determining, based on a graph attention layer, graph structure features obtained by combining the feature matrix and the edge matrix, inputting the graph structure features into at least one graph convolution layer for convolution, and outputting the type value of each super pixel block in the graph structure features; the graph structure features comprise the feature values and connecting edges corresponding to the super pixel blocks.
In one embodiment, before the determining the feature matrix corresponding to the superpixel group, the method further includes:
determining a plurality of similar windows corresponding to the super pixel blocks, wherein the similar windows are similar pixel blocks determined based on the positions, the texture change trends and the texture direction angles of the super pixel blocks;
based on a Gaussian weighted Euclidean distance algorithm, determining the similarity between the super pixel block and each similar window, and based on a plurality of the similarities, determining the weight values of a plurality of the similar windows corresponding to the super pixel block;
and carrying out weighted average calculation on the similar windows based on the weight values to obtain the denoised super-pixel block.
In one embodiment, the method further comprises:
performing pixel segmentation on a training image to obtain a super-pixel group corresponding to the training image, wherein the super-pixel group comprises a plurality of super-pixel blocks, and the super-pixel blocks comprise a plurality of similar pixel points;
acquiring label values of all pixel points in the current super-pixel block aiming at all super-pixel blocks of the training image; the label value is used for representing the classification type of each pixel point;
determining the label value corresponding to the current super-pixel block according to the label value of each pixel point and the majority voting strategy, so as to obtain the label values of a plurality of super-pixel blocks in the training image, and taking the training image and a plurality of label values corresponding to the training image as a training sample set; the label values of the superpixel blocks are used to train the target detection model.
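By way of non-limiting illustration, the majority voting strategy described above may be sketched in Python as follows. The function name `superpixel_labels`, the nested-list inputs and the tie-breaking behavior are assumptions introduced for this sketch and are not specified by the application.

```python
from collections import Counter

def superpixel_labels(segments, pixel_labels):
    """Assign each super-pixel block the most common label among its pixels."""
    votes = {}
    for seg_row, lab_row in zip(segments, pixel_labels):
        for seg_id, label in zip(seg_row, lab_row):
            votes.setdefault(seg_id, Counter())[label] += 1
    # most_common(1) returns the label with the highest vote count; ties fall
    # back to first-seen order, which is an arbitrary choice in this sketch.
    return {seg_id: c.most_common(1)[0][0] for seg_id, c in votes.items()}

# Hypothetical 2 x 3 training image: block ids and per-pixel label values.
segments = [[0, 0, 1],
            [0, 1, 1]]
pixel_labels = [[2, 2, 3],
                [5, 3, 3]]
block_labels = superpixel_labels(segments, pixel_labels)
```

In this example, block 0 receives two votes for label 2 and one for label 5, and block 1 receives three votes for label 3.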
In one embodiment, the method further comprises:
determining a feature matrix and an edge matrix of the training image in the training sample set;
inputting the feature matrix and the edge matrix of the training image into a graph convolutional neural network in a target detection model, and training the graph convolutional neural network based on the label value of each super pixel block in the training image and a cross entropy loss function to obtain a trained target detection model; wherein the graph convolutional neural network comprises a graph attention layer and a graph convolution layer.
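By way of non-limiting illustration, the cross entropy loss between the per-block outputs of the network and the label values of the super pixel blocks may be sketched with NumPy as follows. The function name `cross_entropy`, the use of raw logits as network outputs and the example values are assumptions of this sketch; the optimizer and training loop are not shown.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross entropy between logits of shape (N, K) and integer labels of shape (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Hypothetical outputs for 2 super pixel blocks and 3 types.
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 3.0, 0.0]])
labels = np.array([0, 1])
loss = cross_entropy(logits, labels)
uniform_loss = cross_entropy(np.zeros((2, 3)), labels)        # equals ln(3)
```

A network that favors the correct type for each block yields a lower loss than one that outputs uniform scores.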
In a second aspect, the application further provides a remote sensing image target detection device. The device comprises:
a pixel segmentation module, configured to perform pixel segmentation on an image to be processed to obtain a super pixel group corresponding to the image to be processed, wherein the super pixel group comprises a plurality of super pixel blocks, and the super pixel blocks comprise a plurality of similar pixel points;
a feature matrix determining module, configured to determine a feature matrix corresponding to the super pixel group, wherein the feature matrix is determined based on RGB values of all pixel points in the super pixel block;
an edge matrix determining module, configured to determine an edge matrix corresponding to the super pixel group based on the local edge set and the global edge set corresponding to the super pixel group; the local edge set is determined according to a distance between the super pixel block and an adjacent super pixel block of the super pixel block; the global edge set is determined according to membership degrees between the super pixel blocks and other super pixel blocks in the image to be processed;
and a target detection module, configured to perform target detection processing on the image to be processed based on the feature matrix, the edge matrix and the trained target detection model to obtain the type of each super pixel block in the image to be processed.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method according to the first aspect when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to the first aspect.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to the first aspect.
According to the remote sensing image target detection method, apparatus, computer device, storage medium and computer program product described above, pixel segmentation is performed on the remote sensing image to be processed to obtain a super-pixel group formed by a plurality of super-pixel blocks, each of which comprises a plurality of similar pixel points. For the super-pixel group, the RGB values of the pixel points in each super-pixel block are determined, and the feature matrix corresponding to the plurality of super-pixel blocks is determined based on the RGB values, so that the internal feature data contained in each super-pixel block is obtained. A local edge set containing short-range semantic information is determined based on the distance between each super-pixel block and its adjacent super-pixel blocks, and a global edge set containing global semantic information is determined based on the membership degrees between the super-pixel blocks. The local edge set and the global edge set are merged to obtain an edge matrix containing the association relationships among the super-pixel blocks. Finally, the feature matrix and the edge matrix are input as feature data into a trained target detection model to obtain a target detection result. Because the feature matrix contains the internal feature data of the super-pixel blocks and the edge matrix contains the association relationships among the super-pixel blocks, the trained target detection model can take the associations among the super-pixel blocks into account and output a more accurate target detection result, thereby improving the accuracy with which the server performs target detection on remote sensing images.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The remote sensing image target detection method provided by the embodiments of the application can be applied to the application environment shown in Fig. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 uploads the acquired remote sensing image to the server 104, and the server 104 can either perform target detection processing on the remote sensing image directly, or temporarily store the remote sensing image in a data storage system for subsequent centralized target detection processing. The data storage system may store the remote sensing images that the server 104 needs to process, i.e., the images to be processed. The data storage system may be integrated on the server 104, or may be located on a cloud or another network server. The server 104 performs pixel segmentation on the image to be processed to obtain a super-pixel group formed by a plurality of super-pixel blocks, determines the feature matrix and the edge matrix corresponding to the super-pixel group, and inputs the feature matrix and the edge matrix into a target detection model to obtain a remote sensing image marked with the type of each super-pixel block, that is, the target detection result. The terminal 102 may be, but is not limited to, any of various remote sensing image capturing devices, personal computers, notebook computers, smart phones, tablet computers, and other devices for capturing images. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In an exemplary embodiment, as shown in Fig. 2, a remote sensing image target detection method is provided. The method is described by taking its application to the server in Fig. 1 as an example, and includes the following steps S202 to S208. Wherein:
Step S202, performing pixel segmentation on the image to be processed to obtain a super pixel group corresponding to the image to be processed.
The image to be processed can be a remote sensing image shot by a terminal, and the terminal can be a satellite, an unmanned aerial vehicle, a radar and other equipment. The super pixel group comprises a plurality of super pixel blocks, and the super pixel blocks comprise a plurality of similar pixel points.
Specifically, the server acquires a remote sensing image shot by the terminal and takes it as the image to be processed. The server can segment the pixels of the image to be processed using a super-pixel algorithm to obtain a plurality of super-pixel blocks, and takes the super-pixel blocks corresponding to the image to be processed as the super-pixel group. It should be appreciated that a super-pixel algorithm aggregates a plurality of pixels having similar characteristics into a more representative element, which can then serve as the basic unit of subsequent image processing algorithms. Super-pixel algorithms are commonly used as an image preprocessing step to reduce the size of high-resolution images without degrading the accuracy of the images.
In one example, the server may use the simple linear iterative clustering (SLIC) algorithm to perform pixel segmentation on the image to be processed, resulting in a plurality of super-pixel blocks. SLIC is a k-means-based super-pixel segmentation algorithm that groups pixels into super-pixels according to similarity in color and spatial location. The super-pixel algorithm may also be the Felzenszwalb algorithm, the watershed algorithm, or the like, and is not particularly limited in the present application. The Felzenszwalb algorithm is a super-pixel segmentation algorithm based on the edge and color information of an image, and produces super-pixel blocks with good visual quality while maintaining connectivity within the image. The watershed algorithm is a super-pixel segmentation algorithm based on image gradients, and uses the gradient information of an image to determine the boundary of each super-pixel block.
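By way of non-limiting illustration, the idea behind SLIC (k-means clustering on joint color and position features, with centers initialized on a regular grid) may be sketched with NumPy as follows. This is a simplified stand-in and not the full SLIC algorithm: every pixel is compared with every center rather than only with nearby centers, and connectivity of the blocks is not enforced. The function name `slic_like` and the parameters `grid`, `compactness` and `n_iter` are assumptions of this sketch.

```python
import numpy as np

def slic_like(image, grid=2, compactness=10.0, n_iter=5):
    """Simplified SLIC-style clustering on (R, G, B, y, x) features."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    step = max(h, w) / grid                       # approximate block spacing
    # Scale positions so that color and spatial distances are comparable.
    spatial = np.stack([ys, xs], axis=-1).reshape(-1, 2) * (compactness / step)
    feats = np.hstack([image.reshape(-1, 3).astype(float), spatial])
    # Initialize the cluster centers on a regular grid of pixels.
    cy = ((np.arange(grid) + 0.5) * h / grid).astype(int)
    cx = ((np.arange(grid) + 0.5) * w / grid).astype(int)
    centers = np.array([feats[y * w + x] for y in cy for x in cx])
    for _ in range(n_iter):
        dist = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)              # nearest center per pixel
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = feats[labels == k].mean(axis=0)
    return labels.reshape(h, w)

# Hypothetical 4 x 4 image: left half red, right half blue.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :2] = (255, 0, 0)
image[:, 2:] = (0, 0, 255)
segments = slic_like(image)
```

Because the color difference between the two halves dominates the spatial term, no resulting block mixes red and blue pixels.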
Step S204, determining a feature matrix corresponding to the super pixel group.
Wherein the feature matrix is determined based on RGB values of each pixel point in the super pixel block. The feature matrix is determined by the feature values of the plurality of super pixel blocks contained in the super pixel group.
Specifically, the server obtains RGB values of each pixel point in the super pixel block, and determines a feature value of the super pixel block according to the RGB values of the plurality of pixel points. And taking the characteristic value of each super pixel block as one element in the matrix, and combining to obtain the characteristic matrix corresponding to the super pixel group.
Step S206, determining an edge matrix corresponding to the super pixel group based on the local edge set and the global edge set corresponding to the super pixel group.
Wherein the local edge set is determined according to a distance between the super pixel block and an adjacent super pixel block of the super pixel block; the global edge set is determined according to membership degrees between the super pixel block and the rest of the super pixel blocks in the image to be processed.
Specifically, the server traverses a plurality of super-pixel blocks contained in the image to be processed, determines the distance between the current super-pixel block and the adjacent super-pixel block of the current super-pixel block, determines a plurality of super-pixel blocks associated with the current super-pixel block according to the distance, and serves as a local edge set corresponding to the current super-pixel block. And after the traversal is finished, obtaining a local edge set corresponding to the super pixel group.
The server also traverses the plurality of super-pixel blocks contained in the image to be processed and determines the membership degree between the current super-pixel block and the other, non-adjacent super-pixel blocks, where the non-adjacent super-pixel blocks can be represented by clustering centers. On this basis, the server may calculate the membership degrees between the current super-pixel block and a plurality of clustering centers according to a clustering algorithm. The global edge set corresponding to the current super-pixel block is determined according to the membership degrees, and the global edge set corresponding to the super-pixel group is obtained after the traversal is finished. The server merges the local edge set and the global edge set to obtain an edge matrix that records, for each super-pixel block, the other super-pixel blocks to which it is connected.
Step S208, performing target detection processing on the image to be processed based on the feature matrix, the edge matrix and the trained target detection model to obtain the type of each super pixel block in the image to be processed.
The target detection model is an artificial intelligence model used for identifying each super-pixel block of the remote sensing image and obtaining the classification of each super-pixel block. The target detection model may be a graph neural network model. The trained target detection model is obtained by training on remote sensing images and the training labels corresponding to the super pixel blocks in those remote sensing images.
Specifically, the server takes a feature matrix and an edge matrix corresponding to the image to be processed as feature data, inputs the feature data into a trained target detection model, processes the feature data by the target detection model, and finally outputs a type value corresponding to each super-pixel block.
In the above remote sensing image target detection method, pixel segmentation is performed on the remote sensing image to be processed to obtain a super-pixel group formed by a plurality of super-pixel blocks, each of which comprises a plurality of similar pixel points. For the super-pixel group, the RGB values of the pixel points in each super-pixel block are determined, and the feature matrix corresponding to the plurality of super-pixel blocks is determined based on the RGB values, so that the internal feature data contained in each super-pixel block is obtained. A local edge set containing short-range semantic information is determined based on the distance between each super-pixel block and its adjacent super-pixel blocks, and a global edge set containing global semantic information is determined based on the membership degrees between the super-pixel blocks. The local edge set and the global edge set are merged to obtain an edge matrix containing the association relationships among the super-pixel blocks. Finally, the feature matrix and the edge matrix are input as feature data into a trained target detection model to obtain a target detection result. Because the feature matrix contains the internal feature data of the super-pixel blocks and the edge matrix contains the association relationships among the super-pixel blocks, the trained target detection model can take the associations among the super-pixel blocks into account and output a more accurate target detection result, thereby improving the accuracy with which the server performs target detection on remote sensing images.
In an exemplary embodiment, the specific implementation process of the step of determining the feature matrix corresponding to the superpixel group includes:
traversing each super-pixel block in the super-pixel group, and determining the characteristic value of the current super-pixel block based on the RGB average value corresponding to the current super-pixel block. After the traversal of each super pixel block in the super pixel group is completed, a feature matrix containing a plurality of feature values is obtained.
Wherein the RGB average value is determined based on an average value of R values, G values, and B values of the plurality of pixel points.
Specifically, for each super pixel block in the super pixel group, the server determines the RGB average value of the pixel points in that super pixel block. That is, the R value, the G value and the B value of each pixel point in the current super pixel block are determined, each ranging from 0 to 255. An R average value is determined from the R values of the pixel points, a G average value from the G values, and a B average value from the B values. The R average value, the G average value and the B average value are summed, and the sum is divided by a preset value to obtain the feature value corresponding to the current super pixel block. In this way, the server determines the feature value of each super pixel block and stores the feature values in a matrix in traversal order to obtain the feature matrix corresponding to the super pixel group.
In this embodiment, the feature value of each super pixel block is obtained from the R average value, the G average value and the B average value of its pixel points. The feature value represents the internal feature of the super pixel block in the color dimension, so the feature matrix corresponding to the image to be processed can be determined quickly and accurately.
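By way of non-limiting illustration, the computation of the feature matrix may be sketched with NumPy as follows. The function name `feature_matrix`, the integer label map `segments` and the default preset value of 3.0 are assumptions of this sketch; the application does not state the preset value.

```python
import numpy as np

def feature_matrix(image, segments, preset=3.0):
    """One feature value per super pixel block: (mean R + mean G + mean B) / preset."""
    ids = np.unique(segments)                     # traversal order of the blocks
    feats = np.empty((len(ids), 1))
    for row, seg_id in enumerate(ids):
        pixels = image[segments == seg_id].astype(float)   # shape (n_pixels, 3)
        r_mean, g_mean, b_mean = pixels.mean(axis=0)
        feats[row, 0] = (r_mean + g_mean + b_mean) / preset
    return feats

# Hypothetical 2 x 2 image split into two blocks (one per row).
segments = np.array([[0, 0],
                     [1, 1]])
image = np.array([[[30, 60, 90], [90, 60, 30]],
                  [[0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
X = feature_matrix(image, segments)
```

For block 0 the R, G and B averages are each 60, so the feature value is 180 / 3 = 60; for block 1 each average is 127.5, giving 127.5.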
In an exemplary embodiment, as shown in fig. 3, the step of determining the edge matrix corresponding to the superpixel group based on the local edge set and the global edge set corresponding to the superpixel group includes steps S302 to S310. Wherein:
step S302, for each super-pixel block, determining a distance value between the current super-pixel block and the adjacent super-pixel block of the current super-pixel block.
Specifically, the server traverses the super-pixel blocks and determines the adjacent super-pixel blocks of the current super-pixel block. For each adjacent super-pixel block, the server calculates the distance between the center point of the current super-pixel block and the center point of the adjacent super-pixel block: a Euclidean distance algorithm is used to determine the Euclidean distance between the two center points on the RGB channels, which gives the color distance between the two center points.
Step S304, if the distance value is not greater than the first threshold value, determining a first connecting edge between the current super-pixel block and the adjacent super-pixel block, and merging the plurality of first connecting edges to obtain a local edge set corresponding to the super-pixel group.
Specifically, a first threshold value is determined and each distance value is compared with it. When a distance value is less than or equal to the first threshold value, a connecting edge with a value of 1 is established between the current super-pixel block and the adjacent super-pixel block corresponding to that distance value. This yields a first connecting edge whose two vertices are the current super-pixel block and the adjacent super-pixel block and whose edge value is 1. The vertices and first connecting edges obtained for the current super-pixel block are taken as the local edge set corresponding to the current super-pixel block. After all super-pixel blocks have been traversed, the local edge sets of the individual super-pixel blocks are collected to obtain the local edge set corresponding to the super-pixel group.
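By way of non-limiting illustration, steps S302 and S304 may be sketched with NumPy as follows. The function name `local_edges` is an assumption of this sketch, and the mean RGB value of each block is used as a stand-in for the RGB value at the block's center point, which is a simplification of the description above.

```python
import numpy as np

def local_edges(image, segments, first_threshold):
    """Edges between adjacent blocks whose RGB distance is <= first_threshold."""
    ids = np.unique(segments)
    # Assumption: the block's mean RGB represents the color of its center point.
    mean_rgb = {i: image[segments == i].astype(float).mean(axis=0) for i in ids}
    # Two blocks are adjacent if they meet across a horizontal or vertical pixel boundary.
    pairs = set()
    for a, b in [(segments[:, :-1], segments[:, 1:]),
                 (segments[:-1, :], segments[1:, :])]:
        diff = a != b
        for u, v in zip(a[diff], b[diff]):
            pairs.add((min(u, v), max(u, v)))
    return {(int(u), int(v)) for u, v in pairs
            if np.linalg.norm(mean_rgb[u] - mean_rgb[v]) <= first_threshold}

# Hypothetical image with three vertical blocks of flat color.
segments = np.array([[0, 1, 2],
                     [0, 1, 2]])
colors = np.array([[10, 10, 10], [12, 10, 10], [200, 10, 10]], dtype=np.uint8)
image = colors[segments]                          # shape (2, 3, 3)
edges = local_edges(image, segments, first_threshold=5.0)
```

Blocks 0 and 1 differ by a distance of 2 and are connected, whereas blocks 1 and 2 differ by 188 and are not.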
Step S306, determining a plurality of membership degrees corresponding to the current super-pixel block according to the fuzzy C-means clustering strategy.
The membership degree is used for indicating the degree that the current super pixel block belongs to a certain class cluster.
Specifically, the server divides the plurality of super pixel blocks into a preset number of class clusters through fuzzy C-means clustering and determines the clustering center corresponding to each class cluster, such that the objective function of the fuzzy C-means clustering is minimized. Based on the fuzzy partition produced by the clustering, the server traverses each super-pixel block of the image to be processed and determines the membership degree between the current super-pixel block and each class cluster. Each membership degree lies in the range 0 to 1, and the membership degrees between the current super-pixel block and the plurality of class clusters sum to 1.
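By way of non-limiting illustration, fuzzy C-means clustering may be sketched with NumPy as follows, using the standard alternating updates of clustering centers and membership degrees. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the fixed iteration count and the random initialization are assumptions of this sketch; the application does not specify these details.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters, m=2.0, n_iter=100, seed=0):
    """Return (centers, memberships); memberships has shape (n_samples, n_clusters)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), n_clusters))
    u /= u.sum(axis=1, keepdims=True)             # memberships of each sample sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = um.T @ x / um.sum(axis=0)[:, None]          # weighted cluster centers
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)            # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)  # standard membership update
    return centers, u

# Hypothetical RGB features of four super pixel blocks: two dark, two bright.
x = np.array([[0, 0, 0], [1, 1, 1], [200, 200, 200], [201, 201, 201]], dtype=float)
centers, u = fuzzy_c_means(x, n_clusters=2)
```

Each row of `u` holds the membership degrees of one super pixel block; every value lies between 0 and 1 and each row sums to 1.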
Step S308, if the membership degree is greater than a second threshold, determining a second connecting edge between the current super-pixel block and the clustering center of the class cluster corresponding to the membership degree, and merging a plurality of second connecting edges to obtain a global edge set corresponding to the super-pixel group.
Specifically, the server determines a second threshold and compares each of the membership degrees of the current super-pixel block with the second threshold. When a membership degree is greater than the second threshold, a second connecting edge is determined between the current super-pixel block and the clustering center of the class cluster corresponding to that membership degree; that is, the current super-pixel block and the clustering center are taken as two vertices, and a connecting edge with a value of 1 is established between them. The two vertices and the connecting edge together form a second connecting edge. The second connecting edges obtained for the current super-pixel block are taken as the global edge set corresponding to the current super-pixel block. After all super-pixel blocks have been traversed, the global edge sets of the individual super-pixel blocks are collected to obtain the global edge set corresponding to the super-pixel group.
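By way of non-limiting illustration, step S308 may be sketched in Python as follows. The function name `global_edges`, the representation of an edge as a `(block index, cluster index)` pair and the example membership values are assumptions of this sketch.

```python
def global_edges(memberships, second_threshold):
    """Connect block i to clustering center c whenever memberships[i][c] > threshold."""
    edges = set()
    for block, row in enumerate(memberships):
        for cluster, degree in enumerate(row):
            if degree > second_threshold:         # strict comparison, as in step S308
                edges.add((block, cluster))
    return edges

# Hypothetical membership degrees of three blocks to two class clusters.
memberships = [[0.9, 0.1],
               [0.5, 0.5],
               [0.2, 0.8]]
edges = global_edges(memberships, second_threshold=0.4)
```

Block 1 exceeds the threshold for both class clusters and is therefore connected to both clustering centers, which is how blocks that are not spatially adjacent can become linked through a shared center.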
Step S310, merging the local edge set and the global edge set to obtain an edge matrix corresponding to the super pixel group.
Specifically, both the local edge set and the global edge set can be regarded as undirected graphs. An undirected graph model corresponding to the super-pixel group is constructed from the local edge set and the global edge set, and this undirected graph model is used as the edge matrix corresponding to the super-pixel group. In one example, the undirected graph model of a super-pixel group may be G(V, E), where V and E represent the sets of all vertices and all edges, respectively, and each super-pixel block or clustering center corresponds to one vertex.
In this embodiment, the distance between a super pixel block and each adjacent super pixel block is determined from the distance value on the RGB channels, which yields the local edge set. The membership degrees between the super pixel blocks and each clustering center are obtained through the fuzzy C-means clustering strategy, which yields the global edge set, and the edge matrix corresponding to the super pixel group is obtained from the two sets. Because the edge matrix contains the association relationships between the super pixel blocks and the clustering centers, the implicit relationships between the super pixel blocks can be represented accurately.
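By way of non-limiting illustration, merging the two edge sets into an edge matrix may be sketched with NumPy as a symmetric adjacency matrix of the undirected graph G(V, E). The function name `edge_matrix` and the vertex ordering (super pixel blocks first, followed by clustering centers) are assumptions of this sketch.

```python
import numpy as np

def edge_matrix(n_blocks, n_clusters, local_edges, global_edges):
    """Symmetric 0/1 adjacency over block vertices followed by clustering-center vertices."""
    n = n_blocks + n_clusters
    adj = np.zeros((n, n), dtype=int)
    for u, v in local_edges:                      # block <-> adjacent block
        adj[u, v] = adj[v, u] = 1
    for block, cluster in global_edges:           # block <-> clustering center
        c = n_blocks + cluster                    # center vertices follow the blocks
        adj[block, c] = adj[c, block] = 1
    return adj

# Hypothetical edge sets for 3 blocks and 2 clustering centers.
A = edge_matrix(3, 2, local_edges={(0, 1)}, global_edges={(0, 0), (2, 1)})
```

Here `A[0, 1]` records the local edge, while `A[0, 3]` and `A[2, 4]` record the global edges to the first and second clustering centers.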
In an exemplary embodiment, the specific implementation process of the step of performing object detection processing on an image to be processed based on the feature matrix, the edge matrix and the trained object detection model to obtain the type of each super pixel block in the image to be processed includes:
and inputting the feature matrix and the edge matrix into a trained target detection model to obtain the type value of each super pixel block in the image to be processed.
The target detection model is used for determining, based on the graph attention layer, the graph structure features obtained by combining the feature matrix and the edge matrix, inputting the graph structure features into at least one graph convolution layer for convolution, and outputting the type value of each super-pixel block in the graph structure features; the graph structure features comprise the feature value and connecting edges corresponding to each super pixel block.
Specifically, the server combines the feature matrix and the edge matrix, that is, the feature value of each super pixel block in the feature matrix is used as the attribute value of that super pixel block in the edge matrix, giving an updated edge matrix. The trained target detection model receives the updated edge matrix, determines the connecting-edge weights between the super pixel blocks in the updated edge matrix through the graph attention layer, and writes these weights into the edge matrix to obtain the combined graph structure features. The target detection model then inputs the graph structure features into multiple graph convolution layers for calculation to obtain the type value corresponding to each super pixel block in the graph structure features.
In this embodiment, the feature matrix and the edge matrix are combined, the graph structural feature corresponding to the image to be processed is obtained based on the graph attention layer, and finally the graph structural feature is processed through the multi-layer graph convolution layer to obtain the type value corresponding to each super pixel block in the image to be processed, so that a more accurate association relationship between the super pixel blocks can be obtained through the graph structural feature with weight, and the accuracy of target detection of the target detection model is further improved.
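By way of non-limiting illustration, a forward pass consisting of one attention step that weights the connecting edges and one graph convolution step that outputs a type per vertex may be sketched with NumPy as follows. The attention score follows the commonly used graph attention form (a LeakyReLU applied to a learned vector acting on concatenated vertex features); the function names, layer sizes, the added self-loops and the random, untrained weights are all assumptions of this sketch and do not represent the trained model of the application.

```python
import numpy as np

def softmax_rows(scores, mask):
    """Row-wise softmax restricted to the positions where mask is True."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(x, adj, w_att, a, w_gcn):
    mask = (adj + np.eye(len(adj))) > 0           # connecting edges plus self-loops
    h = x @ w_att                                 # projected features, shape (N, F)
    f = h.shape[1]
    # Attention score e_ij = LeakyReLU(a^T [h_i || h_j]) for every vertex pair.
    e = (h @ a[:f])[:, None] + (h @ a[f:])[None, :]
    e = np.where(e > 0, e, 0.2 * e)
    alpha = softmax_rows(e, mask)                 # connecting-edge weights
    logits = alpha @ x @ w_gcn                    # one weighted graph convolution
    return alpha, logits.argmax(axis=1)

rng = np.random.default_rng(0)
x = rng.random((4, 1))                            # 4 vertices, 1 feature value each
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
alpha, types = forward(x, adj, rng.normal(size=(1, 8)),
                       rng.normal(size=16), rng.normal(size=(1, 3)))
```

The matrix `alpha` plays the role of the edge matrix updated with connecting-edge weights: it is zero wherever two vertices are not connected, and each row sums to 1.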
In an exemplary embodiment, before the step of determining the feature matrix corresponding to the superpixel group, the method further includes steps S402 to S406, as shown in fig. 4. Wherein:
step S402, a plurality of similar windows corresponding to each super pixel block are determined.
Wherein the similar window is a similar pixel block determined based on the position of the super pixel block, the texture change trend, the texture direction angle.
Specifically, the server determines a texture contour map corresponding to the image to be processed based on the edge information of the segmented super pixel blocks. The server selects a super-pixel block of the image to be processed, determines the center point of that super-pixel block, and determines, according to the texture distribution of the texture contour map, whether the center point lies inside the current super-pixel block or on an edge line of the current super-pixel block. If the center point lies inside the current super-pixel block, similar windows are selected by translation based on the position of the current super-pixel block. If the center point lies on an edge line of the current super-pixel block, similar windows are determined according to the texture change trend and the texture direction angle near the center point.
Step S404, based on Gaussian weighted Euclidean distance algorithm, the similarity between the super pixel block and each similar window is determined, and based on a plurality of similarities, the weight values of a plurality of similar windows corresponding to the super pixel block are determined.
Specifically, the server adopts a Gaussian weighted Euclidean distance algorithm to determine the distance value between the center point of the current super pixel block and each pixel point in the similar window on the RGB channel. Thereby obtaining the distance between the center point of the current super pixel block and the similar window. Based on this, the server may determine distances corresponding to the center points of the current super pixel blocks and the plurality of similar windows, and take the distances as the similarity.
The server determines the weight value of the similar window corresponding to each similarity by the following formula:
$$w(i,j) = \frac{1}{Z(i)} \exp\left(-\frac{\lVert u(N_i) - u(N_j) \rVert_{2,a}^{2}}{h^{2}}\right), \qquad Z(i) = \sum_{j \in I} \exp\left(-\frac{\lVert u(N_i) - u(N_j) \rVert_{2,a}^{2}}{h^{2}}\right)$$
wherein $N_k$ represents the window with subscript k among the similar windows; $Z(i)$ is the normalization factor; h is the decay parameter (smoothing parameter) of the weight function, the value of which depends on the noise level involved. If the value of h is too small, the noise is not sufficiently removed, and if the value is too large, the image is excessively smoothed; I is the set of similar blocks of the central pixel point i; $w(i,j)$ is the weight function for measuring the similarity between pixel points i and j, with $0 \le w(i,j) \le 1$ and $\sum_{j \in I} w(i,j) = 1$. The value of the weight function is determined by the Gaussian weighted Euclidean distance between the windows around u(i) and u(j).
Step S406, performing weighted average calculation on the plurality of similar windows based on the weight values to obtain the denoised super pixel block.
Specifically, the RGB value of each pixel point in each similar window is multiplied by the weight value corresponding to that similar window to obtain the weighted similar windows. Since the weight values of the similar windows sum to 1, the RGB values of the weighted similar windows are summed to obtain the denoised RGB value of each pixel point, and the result is taken as the denoised super pixel block.
In this embodiment, by traversing each super pixel block, a plurality of similar windows corresponding to each super pixel block are obtained, and weight values of the plurality of similar windows are determined, so that weighted average is performed on the plurality of similar windows according to the weight values, and a denoised super pixel block is obtained, so that noise of the super pixel block can be accurately reduced, and accuracy of image processing in subsequent steps is improved.
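Steps S404 to S406 can be illustrated with a minimal sketch. It assumes the usual non-local means form of the weight, exp(-d / h^2) divided by a normalization factor, and a precomputed Gaussian kernel; the function names, the toy patches and the value of h are hypothetical and are not taken from the embodiment.

```python
import math


def gaussian_weighted_sq_distance(patch_a, patch_b, kernel):
    """Gaussian-weighted squared Euclidean distance between two RGB patches.

    patch_a, patch_b: equal-length lists of (r, g, b) tuples.
    kernel: per-pixel Gaussian weights (assumed precomputed, summing to 1).
    """
    return sum(
        k * sum((ca - cb) ** 2 for ca, cb in zip(pa, pb))
        for pa, pb, k in zip(patch_a, patch_b, kernel)
    )


def window_weights(center_patch, windows, kernel, h):
    """Normalized weight exp(-d / h^2) / Z for every similar window."""
    raw = [
        math.exp(-gaussian_weighted_sq_distance(center_patch, w, kernel) / h ** 2)
        for w in windows
    ]
    z = sum(raw)  # normalization factor, so that the weights sum to 1
    return [r / z for r in raw]


def denoise_patch(windows, weights):
    """Weighted average of the similar windows, per pixel and per channel."""
    return [
        tuple(sum(w * win[p][c] for w, win in zip(weights, windows)) for c in range(3))
        for p in range(len(windows[0]))
    ]


# Hypothetical 4-pixel patches used only for illustration.
center = [(100, 100, 100)] * 4
windows = [
    [(100, 100, 100)] * 4,  # identical to the center patch
    [(104, 100, 100)] * 4,  # slightly different
    [(160, 100, 100)] * 4,  # very different, receives almost no weight
]
kernel = [0.25] * 4
weights = window_weights(center, windows, kernel, h=10.0)
denoised = denoise_patch(windows, weights)
```

In this toy case the window that differs strongly from the center patch contributes almost nothing to the average, which is the behaviour that protects edges and details during denoising.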
In an exemplary embodiment, as shown in fig. 5, the method further includes steps S502 to S506. Wherein:
Step S502, performing pixel segmentation on the training image to obtain a super-pixel group corresponding to the training image.
The super pixel group comprises a plurality of super pixel blocks, and the super pixel blocks comprise a plurality of similar pixel points.
Specifically, the server acquires a training image, divides the training image through a superpixel algorithm to obtain a plurality of superpixel blocks corresponding to the training image, and merges the superpixel blocks into a superpixel group.
Step S504, for each super-pixel block of the training image, obtaining the label value of each pixel point in the current super-pixel block.
The label value is used for representing the classification type of each pixel point.
Specifically, the super-pixel blocks of each training image are traversed, and for each super-pixel block, the annotated label value of each pixel point in the current super-pixel block is acquired to obtain the classification type of each pixel point, for example farmland, river, or city.
Step S506, determining the label value corresponding to the current super-pixel block according to the label value of each pixel point and the majority voting strategy, so as to obtain the label values of a plurality of super-pixel blocks in the training image, and taking the training image and a plurality of label values corresponding to the training image as a training sample set.
Wherein the label values of the super pixel blocks are used to train the target detection model.
Specifically, for the current super-pixel block, the label value of each pixel point in the current super-pixel block is input into an algorithm corresponding to the majority voting strategy, and the number of pixel points corresponding to each label value is counted, so that the label value of the current super-pixel block is determined. It should be understood that the label values of the pixel points within one super-pixel block may differ, and the server selects the label value with the highest occurrence frequency as the label value of the current super-pixel block. Based on this, the server may determine the label value of each super-pixel block in the training image. The server labels each super-pixel block of the training image according to its label value to obtain the training sample set.
In this embodiment, the tag value of each super-pixel block can be accurately determined by acquiring the tag value corresponding to each pixel point in the training image, and determining the tag value with the largest number from the tag values of each pixel point as the tag value corresponding to the super-pixel block, so as to obtain the labeled training sample set.
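The majority voting strategy of step S506 can be sketched as follows. The sketch assumes the image is given as two flat, parallel lists (a super-pixel index and a label per pixel point); the function name and the example labels are hypothetical. Ties are resolved in favour of the label seen first, which is a choice of this sketch and not something the embodiment specifies.

```python
from collections import Counter


def superpixel_labels(segment_ids, pixel_labels):
    """Return {super-pixel id: most frequent pixel label inside that block}."""
    votes = {}
    for seg, label in zip(segment_ids, pixel_labels):
        votes.setdefault(seg, Counter())[label] += 1
    # most_common(1) yields the label with the highest count for each block
    return {seg: counter.most_common(1)[0][0] for seg, counter in votes.items()}


# Hypothetical example: 7 pixel points belonging to 2 super-pixel blocks.
segment_ids = [0, 0, 0, 1, 1, 1, 1]
pixel_labels = ["farmland", "farmland", "river", "city", "city", "river", "city"]
block_labels = superpixel_labels(segment_ids, pixel_labels)
```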
In an exemplary embodiment, the method further comprises:
Determining a feature matrix and an edge matrix of a training image in a training sample set; inputting the feature matrix and the edge matrix of the training image into a graph convolution neural network in the target detection model, and training the graph convolution neural network based on the label value of each super pixel block in the training image and the cross entropy loss function to obtain a trained target detection model.
Wherein the graph convolution neural network comprises a graph attention layer and a graph convolution layer.
Specifically, the server processes the training sample set according to the method in the above embodiment to obtain the feature matrix and the edge matrix corresponding to the training image. The feature matrix and the edge matrix corresponding to the training image are input into the target detection model, and the parameters of the target detection model are adjusted according to the label value of each super pixel block in the training image and the loss function until a preset convergence condition is met.
In one example, the server may train the graph convolution neural network according to the cross entropy loss function until the loss value of the cross entropy loss function is below a preset value, resulting in a trained target detection model.
In this embodiment, the cross entropy loss function is used to train the graph convolution neural network, so that training efficiency can be improved, and a trained target detection model can be obtained more quickly.
As shown in fig. 6, the following describes in detail a specific implementation procedure of the remote sensing image target detection method in combination with a specific embodiment.
For training images, in order to obtain ideal homogeneous regions and good edges, a SLIC super-pixel method based on region merging and denoising can be adopted for preprocessing, so that a plurality of super-pixel regions are obtained; fuzzy C-means clustering is then adopted to mine the relations among the nodes and construct the edge matrix of the graph, and the feature matrix of the graph is constructed from the RGB features of the super-pixel nodes. The target detection model is trained with the edge matrix and the feature matrix.
In the application process, the server applies the same super-pixel segmentation method and the same edge matrix construction to the image to be processed, and obtains the classified nodes through the trained target detection model based on the graph attention convolutional network. The server then finds the class corresponding to each node and, using the correspondence between the class labels and the super-pixel regions, obtains the detection results of the plurality of super-pixel blocks in the remote sensing image. An embodiment of the present application will be described in detail with reference to fig. 6.
Step 1, the remote sensing image input into the system is an image that is periodically captured by satellite and uploaded by a professional satellite company, and is characterized by a large memory footprint, rich spectral information, high resolution and the like.
Step 2, after the remote sensing image information is read, the real-time requirement of processing a large remote sensing image must be met, and the problems of large memory occupation, low training speed and high demands on GPU computing capacity during training must be solved. To this end, the large-scale remote sensing image is first over-segmented with the simple linear iterative clustering (SLIC) algorithm to obtain a super-pixel group. Compared with the Euclidean structure commonly used in deep learning, super pixels retain more scene edge information and effectively increase the speed of the segmentation algorithm.
Step 3, edge information of the pixel segmentation blocks is extracted from the image after super-pixel segmentation. If windows with low similarity are directly weighted and summed, information such as edges and details may be partially lost. In order to effectively retain more detail information in edge areas with rich variation, a non-local mean denoising algorithm based on the super-pixel segmentation result is provided, with the following specific steps:
Step 3.1: dividing the image to be processed into a plurality of super pixel blocks with internal similarity by using the SLIC super pixel algorithm, and obtaining the texture contour map of the image to be processed.
Step 3.2: selecting the center pixel point corresponding to each super pixel block, judging according to the texture contour map whether the center pixel point is located inside the super pixel block or on an edge line of the super pixel block, and adopting different similar window selection strategies and window sizes according to the structural characteristics at the position of the center pixel point.
Step 3.3: the similarity between each similar window and the center image block is calculated by formula (1), and the weight of each similar block is measured by formula (2). The similar windows are weighted and averaged by formula (3) to obtain the denoised pixel value $\hat{u}(i)$.
Formula (1):
$$d(i,j) = \lVert u(N_i) - u(N_j) \rVert_{2,a}^{2}$$
Formula (2):
$$w(i,j) = \frac{1}{Z(i)} \exp\left(-\frac{d(i,j)}{h^{2}}\right), \qquad Z(i) = \sum_{j \in I} \exp\left(-\frac{d(i,j)}{h^{2}}\right)$$
Formula (3):
$$\hat{u}(i) = \sum_{j \in I} w(i,j)\, u(j)$$
Wherein: $d(i,j)$ is the Gaussian weighted Euclidean distance, a is the standard deviation of the Gaussian kernel function, and a > 0; $N_k$ represents the window with subscript k among the similar windows; $Z(i)$ is the normalization factor; h is the decay parameter (smoothing parameter) of the weight function, the value of which depends on the noise level involved. If the value of h is too small, the noise is not sufficiently removed, and if the value is too large, the image is excessively smoothed; I is the set of similar blocks of the central pixel point i; $w(i,j)$ is the weight function for measuring the similarity between pixel points i and j, with $0 \le w(i,j) \le 1$ and $\sum_{j \in I} w(i,j) = 1$. The value of the weight function is determined by the Gaussian weighted Euclidean distance between the windows around u(i) and u(j).
Step 4, the ground truth in the training set only provides labels at the pixel level. Although the simple linear iterative clustering algorithm considers various texture features and boundary information during over-segmentation, one super pixel may still contain pixels of different classes. Therefore, a majority voting strategy is used to obtain the labels of the super pixels.
Equation (4):
$$y_s = \arg\max_{r \in \{1, \dots, K\}} \sum_{m=1}^{M} \operatorname{sign}(y_m = r)$$
where $y_s$ is the label value of the super pixel, M represents the number of pixels in the super pixel, $y_m$ represents the label value of the m-th pixel, sign(true) = 1, sign(false) = 0, and r refers to a label type in the data set, which contains K label types in total.
Step 5, in order to convert the super pixel group into a graph structure, fuzzy C-means clustering is introduced to mine the local and global relations of the super pixel blocks. At the local level, the super pixel blocks capture the details and textures of the image and provide local information with richer semantics; at the global level, the super pixel blocks provide higher-level semantic information; the relation between the local and the global lies in the smoothness and consistency among the super pixel blocks. Adjacent super pixel blocks typically have similar characteristics, which preserves the consistency and continuity of the image. The RGB feature values of the super pixels are used as the feature vectors of the graph nodes to form the feature matrix X.
Equation (5):
$$X = [x_1, x_2, \dots, x_n]^{T}, \qquad x_i = (r_i, g_i, b_i)$$
Wherein n is the number of super pixel blocks, and $r_i$, $g_i$, $b_i$ respectively represent the average R, G and B values over all pixel points of the i-th super pixel block.
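The feature matrix construction of step 5 can be sketched as below. The flat-list input layout and the function name are assumptions of this sketch; rows are ordered by super-pixel index.

```python
def feature_matrix(segment_ids, pixels):
    """Build one row (mean R, mean G, mean B) per super pixel block.

    segment_ids: super-pixel index of every pixel point (flat list).
    pixels: (r, g, b) tuple of every pixel point, in the same order.
    """
    sums, counts = {}, {}
    for seg, (r, g, b) in zip(segment_ids, pixels):
        acc = sums.setdefault(seg, [0.0, 0.0, 0.0])
        acc[0] += r
        acc[1] += g
        acc[2] += b
        counts[seg] = counts.get(seg, 0) + 1
    # One feature vector per block, ordered by block index
    return [[s / counts[seg] for s in sums[seg]] for seg in sorted(sums)]


# Hypothetical example: 3 pixel points, 2 super pixel blocks.
X = feature_matrix([0, 0, 1], [(10, 20, 30), (30, 40, 50), (200, 100, 0)])
```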
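Step 6, modeling the super pixel blocks as an undirected graph model G(V, E), wherein each super pixel block corresponds to a vertex, and V and E represent the sets of all vertices and all edges, respectively.
Step 6.1: all the super pixel blocks adjacent to the target super pixel block are found.
Step 6.2: the Euclidean distance on the RGB channels between the target super pixel block and each adjacent super pixel block is calculated. The formula is as follows:
Equation (6):
$$\operatorname{dist}(x_i, x_j) = \sqrt{(r_i - r_j)^{2} + (g_i - g_j)^{2} + (b_i - b_j)^{2}}$$
Step 6.3: a threshold $T_1$ is set, and the local edge set E_1 is constructed from the connecting edges between the target super pixel block and the adjacent super pixel blocks whose distance does not exceed the threshold.
Equation (7):
$$E_1 = \{(v_i, v_j) \mid v_j \in \mathcal{N}(v_i),\ \operatorname{dist}(x_i, x_j) \le T_1\}$$
wherein $\mathcal{N}(v_i)$ denotes the set of super pixel blocks adjacent to the super pixel block $v_i$.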
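Steps 6.1 to 6.3 can be sketched as follows, assuming the spatial adjacency of the super pixel blocks has already been extracted into a dictionary. The names `local_edges` and `adjacency`, and the threshold value, are hypothetical.

```python
import math


def rgb_distance(x_i, x_j):
    """Euclidean distance between two mean-RGB feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def local_edges(features, adjacency, threshold):
    """Keep an undirected edge (i, j) when j is adjacent to i and close in RGB."""
    edges = set()
    for i, neighbours in adjacency.items():
        for j in neighbours:
            if rgb_distance(features[i], features[j]) <= threshold:
                edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return edges


# Hypothetical example: three mutually adjacent blocks, two of them similar.
features = [[20, 30, 40], [25, 30, 40], [200, 100, 0]]
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
E1 = local_edges(features, adjacency, threshold=10.0)
```

Only blocks 0 and 1 are joined here, since block 2 is adjacent to both but far from them in RGB space.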
Step 6.4: global edge set E 2 is constructed by fuzzy C-means clustering (FCM). FCM divides M super pixel blocks x i (i=1, 2,..n) into c class clusters and calculates the cluster center of each class to minimize the objective function. FCM uses fuzzy partitioning to make each given data point determine its degree of membership to each cluster class with a value between 0, 1. The membership matrix U allows elements between values of [0,1] to be taken in adaptation to the introduction of fuzzy partitioning. Let u ij denote the membership degree of the j-th super-pixel block belonging to the i-th class, then u ij satisfies the following condition:
Equation (8):
Similarly, a threshold is set, and the global edge set E_2 is constructed from the connecting edges between the target points and the super pixel blocks whose membership degree is higher than the threshold.
Equation (9):
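Step 6.4 can be illustrated with a minimal fuzzy C-means sketch. It assumes the common fuzzifier m = 2, a fixed number of iterations and caller-supplied initial centers, none of which are specified in the embodiment. Following the device description, an edge is recorded as a (block index, cluster index) pair when the membership exceeds the threshold; how a cluster center is mapped onto a graph node is left open here.

```python
import math


def memberships(features, centers, m=2.0):
    """u[i][j]: membership of block j in cluster i; every column sums to 1."""
    c, n = len(centers), len(features)
    u = [[0.0] * n for _ in range(c)]
    for j, x in enumerate(features):
        dists = [math.dist(x, v) for v in centers]
        if 0.0 in dists:
            # Block coincides with a center: full membership in that cluster
            u[dists.index(0.0)][j] = 1.0
            continue
        for i in range(c):
            u[i][j] = 1.0 / sum(
                (dists[i] / dists[k]) ** (2.0 / (m - 1.0)) for k in range(c)
            )
    return u


def update_centers(features, u, m=2.0):
    """Each center is the mean of all blocks weighted by u^m."""
    dim = len(features[0])
    centers = []
    for row in u:
        w = [val ** m for val in row]
        total = sum(w)
        centers.append(
            [sum(wj * x[d] for wj, x in zip(w, features)) / total for d in range(dim)]
        )
    return centers


def fcm(features, centers, m=2.0, iterations=20):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(iterations):
        centers = update_centers(features, memberships(features, centers, m), m)
    return memberships(features, centers, m), centers


def global_edges(u, threshold):
    """(block j, cluster i) pairs whose membership exceeds the threshold."""
    return {(j, i) for i, row in enumerate(u) for j, val in enumerate(row) if val > threshold}


# Hypothetical example: four blocks forming two clearly separated groups.
features = [[20, 30, 40], [25, 30, 40], [200, 100, 0], [205, 95, 5]]
U, centers = fcm(features, centers=[list(features[0]), list(features[2])])
E2 = global_edges(U, threshold=0.8)
```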
Step 7, the input is processed by the graph convolution neural network, which can be expressed as:
Equation (10):
wherein the operator appearing in Equation (10) is known as the generalized graph Laplacian operator.
Equation (11):
The elements of the generalized graph Laplacian represent the structural information of the graph model (which points are interconnected).
The embodiment of the application uses two graph convolution layers, and the input graph structural features are propagated layer by layer according to the following rule:
Equation (12):
$$H^{(l+1)} = \sigma\left(\hat{L}\, H^{(l)}\, W^{(l)}\right)$$
wherein $\hat{L}$ denotes the generalized graph Laplacian described above; $W^{(l)}$ is the trainable weight matrix of the l-th graph convolution layer; $\sigma(\cdot)$ represents an activation function; $H^{(l)}$ refers to the input signal of the l-th graph convolution layer. The input of the first convolution layer is X.
The output of the second graph convolution layer is classified with the SoftMax function.
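The two-layer propagation can be sketched as a forward pass in plain Python. The sketch assumes the widely used renormalized adjacency (self-loops added, then symmetric degree normalization) as the graph operator and ReLU as the activation of the first layer; both are assumptions, as are the weights, which are arbitrary illustrative numbers and not trained values.

```python
import math


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj):
    """D^-1/2 (A + I) D^-1/2, the renormalized operator assumed in this sketch."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)  # subtract the maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(adj, x, w0, w1):
    """Two graph convolution layers followed by a row-wise SoftMax."""
    a_hat = normalized_adjacency(adj)
    h1 = relu(matmul(a_hat, matmul(x, w0)))
    return softmax_rows(matmul(a_hat, matmul(h1, w1)))


# Hypothetical 3-node graph with scaled mean-RGB features and 2 output classes.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.25], [0.8, 0.4, 0.0]]
w0 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
w1 = [[0.7, -0.5], [-0.3, 0.9]]
probs = gcn_forward(adj, x, w0, w1)
```

Each row of `probs` is a class probability distribution for one super pixel block; the predicted type is the index of the largest entry in the row.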
Step 8, an attention mechanism is introduced, and a graph attention layer is added before the stacked graph convolution layers.
Equation (13):
Wherein:
equation (14):
equation (15):
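As an illustration of how a graph attention layer can weight the connecting edges, the following sketch computes attention coefficients in the standard graph attention form: a shared linear map, a LeakyReLU score on the concatenated pair of transformed features, and a SoftMax over each node's neighbourhood. This standard form is an assumption and may differ from Equations (13) to (15); the matrix `w`, the vector `a` and the neighbourhoods are hypothetical values.

```python
import math


def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v


def attention_coefficients(h, w, a, neighbours):
    """Return alpha[i][j], the normalized attention of node i on neighbour j."""
    # Shared linear transformation W h_i for every node
    wh = [
        [sum(h_i[k] * w[k][d] for k in range(len(w))) for d in range(len(w[0]))]
        for h_i in h
    ]
    alpha = {}
    for i, nbrs in neighbours.items():
        # Raw score on the concatenation [W h_i || W h_j]
        scores = [
            leaky_relu(sum(av * v for av, v in zip(a, wh[i] + wh[j]))) for j in nbrs
        ]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        alpha[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alpha


# Hypothetical 3-node example; each neighbourhood includes the node itself.
h = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.25], [0.8, 0.4, 0.0]]
w = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
a = [0.3, -0.1, 0.2, 0.4]
neighbours = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
alpha = attention_coefficients(h, w, a, neighbours)
```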
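Step 9, after the two graph convolution layers, the classification result of each node is obtained; the loss is calculated with the multi-class cross entropy, and the training parameters are updated by stochastic gradient descent:
Equation (16):
$$\mathcal{L} = -\sum_{i \in y} \sum_{f=1}^{F} Y_{if} \ln Z_{if}$$
where i denotes the i-th node, y denotes the set of ground-truth labeled nodes, F denotes the number of classes, $Y_{if}$ is 1 if node i belongs to class f and 0 otherwise, and $Z_{if}$ is the SoftMax output of node i for class f.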
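The multi-class cross entropy of step 9 can be sketched as below, assuming one-hot ground truth given as class indices and summing over the labeled nodes. The small `eps` clamp is an addition of this sketch to avoid taking the logarithm of zero.

```python
import math


def cross_entropy(probs, labels, eps=1e-12):
    """Sum over labeled nodes of -ln(probability assigned to the true class).

    probs: per-node class probabilities (e.g. SoftMax outputs).
    labels: {node index: true class index} for the labeled nodes only.
    """
    return -sum(math.log(max(probs[i][f], eps)) for i, f in labels.items())


# Hypothetical example: two labeled nodes out of three, two classes.
probs = [[0.5, 0.5], [1.0, 0.0], [0.2, 0.8]]
labels = {0: 0, 1: 0}
loss = cross_entropy(probs, labels)
```

With a one-hot target, the inner sum over classes reduces to the single term for the true class, which is why only `probs[i][f]` appears.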
Step 10, as shown in fig. 7, according to the classification result y of each node, the corresponding super pixel block is attributed to the corresponding remote sensing image segmentation category. For example, if y_1 represents a city and the predicted result y is equal to y_1, the target detection result of the super pixel block is the city.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to that order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a remote sensing image target detection device for realizing the remote sensing image target detection method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the device for detecting a target of a remote sensing image provided below may be referred to the limitation of the method for detecting a target of a remote sensing image hereinabove, and will not be repeated here.
In an exemplary embodiment, as shown in fig. 8, there is provided a remote sensing image object detection apparatus, including: a pixel segmentation module 801, a feature matrix determination module 802, an edge matrix determination module 803, and an object detection module 804, wherein:
The pixel segmentation module 801 is configured to perform pixel segmentation on an image to be processed to obtain a superpixel group corresponding to the image to be processed, where the superpixel group includes a plurality of superpixel blocks, and the superpixel blocks include a plurality of similar pixel points;
a feature matrix determining module 802, configured to determine a feature matrix corresponding to the superpixel group, where the feature matrix is determined based on RGB values of each pixel point in the superpixel block;
An edge matrix determining module 803, configured to determine an edge matrix corresponding to the superpixel group based on the local edge set and the global edge set corresponding to the superpixel group; the local edge set is determined according to a distance between the super pixel block and an adjacent super pixel block of the super pixel block; the global edge set is determined according to membership degrees between the super pixel blocks and other super pixel blocks in the image to be processed;
The target detection module 804 is configured to perform target detection processing on the image to be processed based on the feature matrix, the edge matrix, and the trained target detection model, so as to obtain types of the super pixel blocks in the image to be processed.
Further, the feature matrix determining module 802 is specifically configured to: traversing each super pixel block in the super pixel group, and determining a characteristic value of the current super pixel block based on an RGB average value corresponding to the current super pixel block; the RGB average value is determined based on the average value of R value, G value and B value of a plurality of pixel points; and after traversing each super pixel block in the super pixel group, obtaining a feature matrix containing a plurality of feature values.
Further, the edge matrix determining module 803 is specifically configured to: determining a distance value between a current super-pixel block and an adjacent super-pixel block of the current super-pixel block for each super-pixel block; if the distance value is not greater than a first threshold value, determining a first connection edge between the current super-pixel block and the adjacent super-pixel block, and merging a plurality of first connection edges to obtain a local edge set corresponding to the super-pixel group; determining a plurality of membership degrees corresponding to the current super-pixel block according to a fuzzy C-means clustering strategy; the membership degree is used for indicating the degree that the current super pixel block belongs to a certain class cluster; if the membership degree is larger than a second threshold value, determining a second connection edge between the current super-pixel block and a clustering center of a class cluster corresponding to the membership degree, and combining a plurality of the second connection edges to obtain a global edge set corresponding to the super-pixel group; and merging the local edge set and the global edge set to obtain an edge matrix corresponding to the super pixel group.
Further, the object detection module 804 is specifically configured to: inputting the feature matrix and the edge matrix into a trained target detection model to obtain a type value of each super pixel block in the image to be processed; the target detection model is used for determining the graph structural characteristics of the feature matrix and the edge matrix after being combined based on the graph attention layer, inputting the graph structural characteristics into at least one graph convolution layer for convolution, and outputting the type value of each super pixel block in the graph structural characteristics; the graph structure features comprise feature values and connection edges corresponding to the super pixel blocks.
Further, the device further comprises a denoising module, which is specifically configured to: determining a plurality of similar windows corresponding to the super pixel blocks, wherein the similar windows are similar pixel blocks determined based on the positions, the texture change trends and the texture direction angles of the super pixel blocks; based on a Gaussian weighted Euclidean distance algorithm, determining the similarity between the super pixel block and each similar window, and based on a plurality of the similarities, determining the weight values of a plurality of the similar windows corresponding to the super pixel block; and carrying out weighted average calculation on the similar windows based on the weight values to obtain the denoised super-pixel block.
Further, the device further comprises a label determining module, specifically configured to: performing pixel segmentation on a training image to obtain a super-pixel group corresponding to the training image, wherein the super-pixel group comprises a plurality of super-pixel blocks, and the super-pixel blocks comprise a plurality of similar pixel points; acquiring label values of all pixel points in the current super-pixel block aiming at all super-pixel blocks of the training image; the label value is used for representing the classification type of each pixel point; determining the label value corresponding to the current super-pixel block according to the label value of each pixel point and the majority voting strategy, so as to obtain the label values of a plurality of super-pixel blocks in the training image, and taking the training image and a plurality of label values corresponding to the training image as a training sample set; the label values of the superpixel blocks are used to train the target detection model.
Further, the device also comprises a training module, which is specifically used for: determining a feature matrix and an edge matrix of the training image in the training sample set; inputting the feature matrix and the edge matrix of the training image into a graph convolution neural network in a target detection model, and training the graph convolution neural network based on the label value of each super pixel block in the training image and a cross entropy loss function to obtain a trained target detection model; wherein the graph convolution neural network comprises a graph attention layer and a graph convolution layer.
The modules in the remote sensing image target detection device can be realized in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in the computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
In one exemplary embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 9. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing original remote sensing image data and the remote sensing image after target detection. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by a processor is configured to implement a remote sensing image target detection method.
It will be appreciated by persons skilled in the art that the architecture shown in fig. 9 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements are applicable, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an exemplary embodiment, a computer device is also provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are both information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to meet the related regulations.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded nonvolatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.