CN119649135A

CN119649135A - A classification method for multi-source remote sensing data

Info

Publication number: CN119649135A
Application number: CN202411798370.4A
Authority: CN
Inventors: 涂兵; 陈卓宇; 刘博�; 李军; 方乐缘; 陈云云; 曹兆楼; 贺燕; 刘立成
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2024-12-09
Filing date: 2024-12-09
Publication date: 2025-03-18

Abstract

The present invention provides a multi-source remote sensing data classification method in the field of hyperspectral image processing for remote sensing. The method comprises: acquiring multiple types of remote sensing data of a target object; extracting shallow features of each type of remote sensing data; obtaining multiple frequency features of each type of remote sensing data; fusing the same frequency feature across all remote sensing data to obtain multiple same-frequency fusion features; concatenating the same-frequency fusion features to obtain a multi-source fusion feature; passing the multi-source fusion feature through stacked frequency modulation layers and attention layers in sequence to obtain fused global and local features; and weighting the global and local features along the spectral dimension to obtain the predicted classification result of the target object. The invention achieves directional frequency feature decomposition and fusion of multi-source data and uses the multi-source fusion features extracted from multi-source remote sensing data as the basis for classification. This provides more comprehensive information for classifying the target object, allows complex ground features to be captured fully, and makes the classification results more accurate.

Description

Multi-source remote sensing data classification method

Technical Field

The invention relates to a multi-source remote sensing data classification method, and belongs to the field of hyperspectral image processing in the remote sensing field.

Background

At present, remote sensing data from many sources coexist, including high-, medium- and low-resolution imagery, multispectral and hyperspectral imagery, synthetic aperture radar (SAR), street-view imagery and LiDAR point clouds, providing basic data for remote sensing monitoring and applications in many other fields.

Existing remote sensing images mainly include visible-light RGB images, panchromatic images, multispectral/hyperspectral images, infrared images, LiDAR images and synthetic aperture radar (SAR) images.

The visible-light RGB remote sensing image is a special case of the multispectral image that combines three channels: red, green and blue. RGB images are the most commonly used remote sensing images in everyday life and are usually used to distinguish terrain and ground objects.

The panchromatic remote sensing image differs from the RGB image: it is a black-and-white image of the whole visible-light band acquired by a remote sensor and is displayed as a grayscale picture. Panchromatic images usually have high spatial resolution but cannot show the colors of ground objects.

Multispectral remote sensing combines tens to hundreds of spectral bands to provide more color information and helps determine the properties of surface materials, but its spatial resolution is lower.

Hyperspectral imaging captures reflectance spectra in many bands, producing rich spectral data, and can detect unique spectral features at different spatial locations of a single object. It can therefore finely analyze and distinguish different substances, such as plant types, soil types and water quality. Because each band provides distinct spectral characteristics, substances that are visually indistinguishable can be detected, which gives hyperspectral images unique advantages in material identification and change detection.

The infrared remote sensing image is obtained by sensing the infrared radiation reflected and emitted by ground objects. Because infrared radiation has a long wavelength, penetrates the atmosphere well and is unaffected by darkness and haze, infrared imaging can operate at night and through smog; however, infrared images suffer from low resolution, low contrast, a low signal-to-noise ratio and a blurred visual appearance.

A LiDAR remote sensing image is obtained by computing the ground coordinates of each laser spot from the angle of the laser beam emitted from an aircraft or spacecraft and the measured range. LiDAR (laser radar) obtains three-dimensional spatial information about a target by emitting laser pulses and measuring their return time, generating high-precision point cloud data. LiDAR data provide not only ground elevation but also the shape and structure of ground objects, which is valuable for forest monitoring, urban modeling, terrain analysis and other applications.

Synthetic aperture radar (SAR) is a high-resolution imaging radar system that achieves the measurement effect of a large-aperture radar with a small-aperture antenna through motion and mathematical processing: as the antenna moves on a platform such as a satellite or aircraft, it synthesizes an aperture larger than the physical antenna, which improves imaging resolution. SAR can image the ground accurately under all weather and illumination conditions. Each pixel of a SAR image contains both the intensity of the microwave signal reflected by the surface (the gray value) and a phase value related to the radar slant range; the phase is highly random, is generally treated as noise, and complicates interferometric analysis.

In summary, each single-source remote sensing image has its own advantages, but because a single source carries limited information, complex ground features may not be captured fully. It is therefore necessary to combine different remote sensing data to determine the nature of surface materials and to use the information provided by multi-source data for forest monitoring, land-use analysis and above-ground biomass estimation.

Disclosure of Invention

The invention aims to overcome these shortcomings of the prior art by providing a multi-source remote sensing data classification method that solves the problem that complex features cannot be captured fully with single-source remote sensing data.

In order to achieve the above purpose, the invention is realized by adopting the following technical scheme:

The invention provides a multi-source remote sensing data classification method comprising: acquiring multiple types of remote sensing data of a target object; extracting shallow features of each type of remote sensing data; inputting the shallow features into a pre-built frequency feature decomposition module to obtain multiple preset frequency features of each type of remote sensing data; inputting the frequency features of all remote sensing data into a pre-built same-frequency feature fusion module, which fuses features of the same frequency to obtain multiple corresponding same-frequency fusion features; concatenating and fusing the same-frequency fusion features to obtain a multi-source fusion feature; passing the multi-source fusion feature through stacked frequency modulation layers and attention layers in sequence to obtain fused global and local features; and weighting the fused global and local features along the spectral dimension to extract depth information and obtain the predicted classification result of the target object.

Further, extracting the shallow features of each type of remote sensing data comprises: dividing the remote sensing data into patch blocks; performing a multi-scale convolution operation on the patch blocks to obtain multiple scale features; in any preset channel, stacking the scale features along the channel dimension to obtain a channel fusion feature; performing element summation, averaging and max pooling on the channel fusion feature along the channel dimension to obtain a summation feature map, an average feature map and a max-pooling feature map; concatenating these three maps and fusing them further with a convolution to obtain the low-dimensional channel feature of that channel; and concatenating the low-dimensional channel features of all channels to obtain the shallow features of the remote sensing data.

Further, the multi-scale convolution operation comprises applying a 3×3 convolution, a 5×5 convolution and a 7×7 convolution to the patch blocks separately, each followed by batch normalization and a ReLU, to obtain three corresponding scale features.

Further, the frequency feature decomposition module is built on a frequency-domain Transformer, and obtaining the multiple preset frequency features of each shallow feature comprises: applying convolution and normalization to the input shallow features to obtain processed shallow features; dividing the processed shallow features evenly along the spectral dimension into h heads; and dividing the h heads into n equal parts, where n is the number of frequency features, so that each part is used to compute one frequency feature; h is a preset value.

Each frequency feature corresponds to a preset window form. For any part, each head is divided into non-overlapping windows using the window form of the corresponding frequency feature, the attention of each window is computed to obtain the frequency feature of each head, and the frequency features of all heads in the part are concatenated to obtain the frequency feature of that part.

Further, the frequency features comprise a low-frequency feature, a high-frequency feature, a vertical feature and a horizontal feature. The window form of the low-frequency feature is determined by a window size index that takes a positive integer value; in a head used to compute low-frequency features, each window contains the corresponding number of tokens, where a token is the smallest unit in a window.

The high-frequency, vertical and horizontal features each have their own window form as well, and in a head used to compute one of these features, each window contains the number of tokens fixed by that window form.

Further, the vertical features are obtained as follows. The j-th head is divided uniformly into $M$ non-overlapping windows, where $M$ is the number of windows and $M = s$, $s$ being the length (equal to the width) of the processed shallow features; the per-head channel size is derived from the spectral dimension $C$ of the processed shallow features.

First, the query tensor $Q^{j}$, key tensor $K^{j}$ and value tensor $V^{j}$ of the j-th head are obtained; they share the same dimension. The attention of each window in the j-th head is then computed, the attention of the i-th window being:

$$A_i^{j} = \mathrm{Attention}(Q_i^{j}, K_i^{j}, V_i^{j})$$

where $A_i^{j}$ is the attention result of the i-th window, $\mathrm{Attention}(\cdot)$ is the attention operation, and $Q_i^{j}$, $K_i^{j}$ and $V_i^{j}$ are the query, key and value matrices of the j-th head restricted to the i-th window.

From the attention results of all windows in the j-th head, the vertical feature of the j-th head is:

$$F_v^{j} = \mathrm{Concat}(A_1^{j}, A_2^{j}, \ldots, A_M^{j})$$

where $F_v^{j}$ is the vertical feature of the j-th head and $A_1^{j}$, $A_2^{j}$ and $A_M^{j}$ are the attention results of the 1st, 2nd and M-th windows.

The vertical features of all heads used to compute vertical features are concatenated to obtain the vertical feature of the shallow features:

$$F_v = \mathrm{Concat}(F_v^{1}, F_v^{2}, \ldots, F_v^{m})$$

where $F_v$ is the vertical feature of the shallow features, $\mathrm{Concat}(\cdot)$ is the concatenation operation, $m$ is the number of heads used to compute vertical features, and $F_v^{1}$, $F_v^{2}$ and $F_v^{m}$ are the vertical features of the 1st, 2nd and m-th heads.
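As an illustration only, the vertical-feature computation can be sketched with NumPy. The sketch assumes processed features of shape (H, W, C) in which each vertical window is one column of H tokens, uses scaled dot-product softmax attention, and draws random matrices in place of the learned query, key and value projections; `vertical_features`, `window_attention` and the `seed` parameter are hypothetical names, not part of the invention.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(q, k, v):
    # Scaled dot-product attention among the n tokens of one window: (n, d) -> (n, d)
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def vertical_features(x, num_heads, seed=0):
    """Hypothetical sketch of the vertical-feature branch.

    x: processed shallow feature of shape (H, W, C). The spectral dimension C is
    split into num_heads heads; each head is cut into W column windows of H tokens.
    """
    H, W, C = x.shape
    if C % num_heads:
        raise ValueError("C must be divisible by num_heads")
    d = C // num_heads
    rng = np.random.default_rng(seed)  # random stand-ins for learned Q/K/V weights
    heads = []
    for j in range(num_heads):
        head = x[:, :, j * d:(j + 1) * d]                      # j-th head, (H, W, d)
        wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
        cols = []
        for i in range(W):                                     # i-th vertical window
            tokens = head[:, i, :]                             # (H, d)
            cols.append(window_attention(tokens @ wq, tokens @ wk, tokens @ wv))
        heads.append(np.stack(cols, axis=1))                   # concat windows -> (H, W, d)
    return np.concatenate(heads, axis=-1)                      # concat heads -> (H, W, C)
```

Because every window is one column, changing the input in one column changes the output only in that column; iterating over rows instead of columns would give the analogous horizontal features.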

Further, the same-frequency feature fusion module fuses features of the same frequency to obtain the corresponding same-frequency fusion features as follows. For any frequency, the frequency components of that frequency from all remote sensing data are added element-wise to obtain the same-frequency component $F_s$. $F_s$ is globally average-pooled in the channel dimension and passed through a channel attention module to obtain the channel weight $W_c$:

$$W_c = \mathrm{Conv}_{1\times1}\big(\max(0, \mathrm{Conv}_{1\times1}(F_{gap}^{c}))\big)$$

where $W_c$ is the channel weight, $\mathrm{Conv}_{1\times1}(\cdot)$ is a 1×1 convolution layer, $\max(0,\cdot)$ takes the maximum with zero, and $F_{gap}^{c}$ is the output of global average pooling of $F_s$ in the channel dimension.

$F_s$ is also globally average-pooled and globally max-pooled in the spatial dimension, and the results pass through a spatial attention module to obtain the spatial weight $W_s$:

$$W_s = \mathrm{Conv}_{7\times7}\big([F_{gap}^{s}, F_{gmp}^{s}]\big)$$

where $W_s$ is the spatial weight, $\mathrm{Conv}_{7\times7}(\cdot)$ is a 7×7 convolution layer, $[\cdot,\cdot]$ denotes concatenation, $F_{gap}^{s}$ is the output of global average pooling of $F_s$ in the spatial dimension, and $F_{gmp}^{s}$ is the output of global max pooling of $F_s$ in the spatial dimension.

Following the broadcasting rule, the channel weight $W_c$ and the spatial weight $W_s$ are fused by addition to obtain the coarse weight $W_{coa} = W_c + W_s$. $F_s$ and $W_{coa}$ are then rearranged by a channel rearrangement operation, and the fine weight is obtained as:

$$W = \sigma\big(\mathrm{GC}(\mathrm{CS}([F_s, W_{coa}]))\big)$$

where $W$ is the fine weight, $\sigma(\cdot)$ is the sigmoid function, $\mathrm{GC}(\cdot)$ is a group convolution whose number of groups equals the number of channels, $\mathrm{CS}(\cdot)$ is the channel rearrangement operation, and $W_{coa}$ is the coarse weight.

Finally, using the frequency components of all remote sensing data and the fine weight, the same-frequency fusion feature of that frequency is obtained by weighted summation combined with a residual connection.

Further, the frequency modulation layers and attention layers use a staged architecture: the frequency modulation layers are stacked, the attention layers are stacked, and the two stacks are connected in series. A ratio factor, defined as the proportion of frequency modulation layers in the total number of layers, controls how many of the layers are frequency modulation layers and how many are attention layers.
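A minimal sketch of how the ratio factor could fix the layer sequence, assuming the number of frequency modulation (FM) layers is the total depth multiplied by the ratio and rounded to the nearest integer (the rounding rule is an assumption); `staged_layout` is a hypothetical helper.

```python
def staged_layout(total_layers, fm_ratio):
    """Hypothetical helper: list the layer types of a staged FM/attention stack.

    fm_ratio is the share of frequency modulation layers in the total depth;
    rounding to the nearest integer is an assumption of this sketch.
    """
    if not 0.0 <= fm_ratio <= 1.0:
        raise ValueError("fm_ratio must lie in [0, 1]")
    num_fm = round(total_layers * fm_ratio)
    # Frequency modulation layers are stacked first, then attention layers, in series
    return ["fm"] * num_fm + ["attention"] * (total_layers - num_fm)
```

For example, `staged_layout(8, 0.75)` yields six `"fm"` entries followed by two `"attention"` entries; changing the ratio shifts capacity between local (FM) and global (attention) modeling without altering the total depth.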

Further, the frequency modulation layer is used to capture local features. A block-based fast Fourier transform $\mathrm{FFT}(\cdot)$ first transforms the input features $X$ into the frequency domain; a learnable matrix $W_f$ is then introduced, and every frequency component is suppressed or amplified by element-wise multiplication in the frequency domain to obtain the frequency-modulated features $X_f$; finally, the inverse fast Fourier transform $\mathrm{IFFT}(\cdot)$ reconstructs the refined output features $Y$. The modulation step is:

$$X_f = W_f \odot \mathrm{FFT}(\mathrm{Partition}(X))$$

where $X$ is the input feature of the frequency modulation layer, $X_f$ is the frequency-modulated feature, $\mathrm{Partition}(\cdot)$ is the block partition operation, $\odot$ is element-wise multiplication, $W_f$ is the learnable matrix, and $\mathrm{FFT}(\cdot)$ is the fast Fourier transform. $X_f$ is returned to the spatial domain by the inverse fast Fourier transform $\mathrm{IFFT}(\cdot)$ and the block merging operation $\mathrm{Merge}(\cdot)$; together with layer normalization $\mathrm{LN}(\cdot)$, a 1×1 convolution layer, an activation function and a feed-forward network implemented as a multi-layer perceptron $\mathrm{MLP}(\cdot)$, this produces the output feature $Y$ of the frequency modulation layer.
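The core modulation step can be sketched with NumPy's real FFT. This is an illustration under assumptions: the block size `b`, the use of `rfft2`/`irfft2`, and a single learnable matrix `w_f` shared by all blocks are choices made for the sketch, and the layer normalization, 1×1 convolution, activation and MLP around the core are omitted. The helper names are hypothetical.

```python
import numpy as np

def partition_blocks(x, b):
    # (H, W, C) -> (num_blocks, b, b, C); H and W must be multiples of b
    H, W, C = x.shape
    t = x.reshape(H // b, b, W // b, b, C).transpose(0, 2, 1, 3, 4)
    return t.reshape(-1, b, b, C)

def merge_blocks(blocks, H, W):
    # Inverse of partition_blocks: (num_blocks, b, b, C) -> (H, W, C)
    b, C = blocks.shape[1], blocks.shape[-1]
    t = blocks.reshape(H // b, W // b, b, b, C).transpose(0, 2, 1, 3, 4)
    return t.reshape(H, W, C)

def frequency_modulate(x, w_f, b):
    """Block-wise real FFT, element-wise modulation by w_f, inverse FFT, merge.

    x: (H, W, C) features; w_f: (b, b // 2 + 1, C) matrix shared by all blocks.
    """
    H, W, _ = x.shape
    blocks = partition_blocks(x, b)
    spec = np.fft.rfft2(blocks, axes=(1, 2))           # per-block spectrum over space
    spec = spec * w_f                                  # suppress or amplify each frequency
    out = np.fft.irfft2(spec, s=(b, b), axes=(1, 2))   # back to the spatial domain
    return merge_blocks(out, H, W)
```

With `w_f` set to all ones the operation reduces to the identity, and keeping only the zero-frequency entry replaces every block by its mean, which shows how the learnable matrix suppresses or amplifies individual frequencies.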

The attention layer is used to capture global attributes or semantic features: its input features undergo layer normalization followed by multi-head attention, then layer normalization followed by a multi-layer perceptron, and the result is output; the multi-layer perceptron mixes the channels within the attention layer.
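For illustration, the attention layer can be sketched as a pre-normalization Transformer block in NumPy. The residual connections, the ReLU inside the MLP, layer normalization without affine parameters and the layout of `params` are assumptions of this sketch rather than details stated above.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token over its channel dimension (no affine parameters here)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    # x: (N, C) tokens; every weight matrix is (C, C)
    n, c = x.shape
    d = c // num_heads
    def split(t):  # (N, C) -> (heads, N, d)
        return t.reshape(n, num_heads, d).transpose(1, 0, 2)
    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))     # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)
    return out @ wo

def attention_layer(x, params, num_heads):
    # LN -> multi-head attention (+ residual), then LN -> MLP channel mixing (+ residual)
    x = x + multi_head_attention(layer_norm(x), *params["attn"], num_heads)
    w1, w2 = params["mlp"]
    hidden = np.maximum(layer_norm(x) @ w1, 0.0)   # ReLU as a stand-in activation
    return x + hidden @ w2
```

Without positional information the layer is permutation-equivariant: reordering the input tokens reorders the output tokens in the same way, which is why it models global relations rather than local structure.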

Further, the fused global and local features are weighted along the spectral dimension to obtain the depth feature and, from it, the classification result of the target object. Key information is first learned through a one-dimensional convolution, and salient features are then highlighted through the first activation function $\delta_1(\cdot)$ to give the depth feature $F_d$; the predicted classification result $\hat{y}$ is finally obtained by applying the fully connected operation $\mathrm{FC}(\cdot)$ to $F_d$ followed by the second activation function $\delta_2(\cdot)$.

Here $F_d$ is the depth feature, $\hat{y}$ is the predicted classification result, $\mathrm{FC}(\cdot)$ is the fully connected operation, and $\delta_1(\cdot)$ and $\delta_2(\cdot)$ are the two activation functions.
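One possible reading of this classification head, sketched in NumPy under stated assumptions: the spectral weighting is modeled in the style of efficient channel attention (per-channel average over tokens, 1-D convolution across channels, sigmoid as the first activation, channel rescaling), and the second activation is taken to be softmax over classes. These choices and the helper names are hypothetical.

```python
import numpy as np

def spectral_weighting(x, kernel):
    """x: fused features (N tokens, C channels); kernel: odd-length 1-D kernel.

    Assumed weighting: average each channel over tokens, convolve across the
    channel (spectral) axis, squash with a sigmoid, and rescale the channels.
    """
    desc = x.mean(axis=0)                               # (C,) per-channel descriptor
    logits = np.convolve(desc, kernel, mode="same")     # local cross-channel interaction
    weights = 1.0 / (1.0 + np.exp(-logits))             # first activation: sigmoid
    return x * weights                                  # depth feature, (N, C)

def classify(depth, w_fc, b_fc):
    # Pool tokens, apply the fully connected layer, then softmax (second activation)
    logits = depth.mean(axis=0) @ w_fc + b_fc
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

Because the sigmoid weights lie in (0, 1), the depth feature can only attenuate channels, so the 1-D convolution effectively decides which spectral channels to keep prominent.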

Compared with the prior art, the invention has the beneficial effects that:

(1) In the multi-source remote sensing data classification method provided by the invention, multi-scale convolution and channel-level fusion are applied to the cube blocks of each single-source remote sensing image to extract and fuse shallow features. A frequency feature decomposition module built on multi-head self-attention then uses different window shapes to capture different frequency features of each single-source image, and the same-frequency feature fusion module fuses the multi-source data, achieving directional frequency feature decomposition and fusion of multi-source data. The resulting multi-source fusion features serve as the basis for classifying the target object. In addition, combining the frequency modulation layers with the attention layers achieves local and global feature learning through frequency modulation, yields features of different frequency components, and extracts depth features from them to classify the target object accurately;

(2) The method uses a deep learning classification approach with strong representation and generalization ability to extract deeper image features and to learn the spatial and spectral features of remote sensing images from few training samples, improving classification accuracy by obtaining more discriminative features and thus producing good classification results;

(3) The invention introduces a ratio factor that controls the numbers of frequency modulation layers and attention layers; flexibly changing these numbers helps capture global and local features accurately.

Drawings

FIG. 1 is a flow chart of a multi-source remote sensing data classification method provided in embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of shallow features of remote sensing data extraction in embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of obtaining frequency characteristics by using a frequency characteristic decomposition module in embodiment 2 of the present invention;

FIG. 4 is a schematic diagram of obtaining co-frequency fusion features by using a co-frequency feature fusion module in embodiment 1 of the present invention;

FIG. 5 is a schematic diagram of a frequency modulation layer according to embodiment 1 of the present invention;

FIG. 6 is a schematic diagram of the attention layer provided in embodiment 1 of the present invention;

FIG. 7 shows the classification results of different methods on the Houston2013 dataset in embodiment 3 of the present invention.

Detailed Description

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments.

Remote sensing is a technology that detects and recognizes objects at a distance by sensing the electromagnetic waves, including visible light and infrared radiation, that they reflect or emit. A remote sensing satellite carries remote sensing sensors that collect and record the electromagnetic wave information radiated or reflected by the earth or atmospheric targets; the information is sent back to the ground by a signal transmission device, and visible images, commonly known as satellite images, are obtained through electromagnetic wave conversion and recognition.

In remote sensing digital image processing, the spatial characteristics of ground objects are mainly represented by changes in their spectral characteristics. Multispectral technology is a spectral detection technique that acquires several optical bands simultaneously, extending from visible light toward infrared and ultraviolet light.

Example 1

The invention provides a multi-source remote sensing data classification method, as shown in figure 1, which comprises the following steps:

Step 1. Acquire multiple types of remote sensing data of the target object.

In a specific embodiment, two or three types of remote sensing images of the target object are acquired, depending on the actual requirements of the study.

It should be noted that, after the hyperspectral image is obtained, PCA dimension reduction processing is required to be performed on the hyperspectral image so as to extract main features in the image and reduce redundant information.
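A minimal sketch of the PCA step, assuming the hyperspectral image is stored as an (H, W, B) NumPy array and the number of retained components is chosen by the user; it computes the principal axes with `numpy.linalg.svd` on the mean-centered pixel spectra. `pca_reduce` is a hypothetical helper name.

```python
import numpy as np

def pca_reduce(cube, num_components):
    """Project an (H, W, B) hyperspectral cube onto its leading principal components."""
    H, W, B = cube.shape
    pixels = cube.reshape(-1, B).astype(np.float64)   # one spectrum per row (a copy)
    pixels -= pixels.mean(axis=0)                     # center every band
    # Rows of vt are the principal axes, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    return (pixels @ vt[:num_components].T).reshape(H, W, num_components)
```

The retained components are mutually uncorrelated and ordered by decreasing variance, so truncating them discards the least informative (most redundant) spectral directions first.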

Step 2. Extract the shallow features of each type of remote sensing data, which comprises:

Step 21. Divide the remote sensing data into patch blocks.

Specifically, patch division splits the image, one unit at a time, into small units (patches).

In embodiments where hyperspectral and LiDAR images are acquired, patches are taken one by one from the dimension-reduced hyperspectral image and from the LiDAR image to obtain hyperspectral image patch blocks and LiDAR image patch blocks, where the patch size is the length and width, in pixels, of each patch.
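The text does not state how patches are positioned. A common choice in pixel-wise hyperspectral classification, assumed here, is one odd-sized patch centered on every pixel, with reflect padding at the border; `extract_patches` is a hypothetical helper.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Return one patch per pixel, shape (H*W, s, s, C), for an (H, W, C) image.

    Assumption for this sketch: each patch is centered on its pixel, borders use
    reflect padding, and patch_size is odd.
    """
    if image.ndim == 2:                      # e.g. a single-band LiDAR raster
        image = image[:, :, None]
    H, W, C = image.shape
    r = patch_size // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    patches = np.empty((H * W, patch_size, patch_size, C), dtype=image.dtype)
    for idx, (i, j) in enumerate(np.ndindex(H, W)):   # row-major pixel order
        patches[idx] = padded[i:i + patch_size, j:j + patch_size]
    return patches
```

The same helper serves both sources, so the hyperspectral patch and the LiDAR patch at a given index cover the same ground location.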

Step 22. Perform a multi-scale convolution operation on the patch blocks to obtain the corresponding multi-scale features.

The multi-scale convolution operation applies several different convolutions to the patch blocks separately, each followed by batch normalization and a ReLU, to obtain features at the corresponding scales.

As shown in FIG. 2, in an embodiment in which hyperspectral and LiDAR images are acquired, the hyperspectral image patch block $P_H$ and the LiDAR image patch block $P_L$ each undergo 3×3 convolution + batch normalization + ReLU, 5×5 convolution + batch normalization + ReLU, and 7×7 convolution + batch normalization + ReLU. The computation is:

$$F_H^{k} = \delta\big(\mathrm{BN}_k(\mathrm{Conv}_{k\times k}(P_H))\big),\qquad F_L^{k} = \delta\big(\mathrm{BN}_k(\mathrm{Conv}_{k\times k}(P_L))\big),\qquad k \in \{3, 5, 7\}$$

where $k$ is the convolution kernel size, set to 3, 5 and 7; $\mathrm{BN}_k(\cdot)$ is the batch normalization function at the k-th scale; $\delta(\cdot)$ is the commonly used ReLU activation function; and $F_H^{k}$ and $F_L^{k}$ are the output features of the hyperspectral image and the LiDAR image at the k-th scale.
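The three branches of the expression above can be sketched in NumPy. Assumptions: patches are stored as (N, s, s, C) arrays, convolutions use zero "same" padding so every scale keeps the s×s size, batch normalization is shown in training mode without learnable scale and shift, and random kernels stand in for learned weights. The helper names are hypothetical.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2-D convolution (cross-correlation, as in CNN layers).

    x: (N, s, s, C_in); kernel: (k, k, C_in, C_out) with odd k.
    """
    k = kernel.shape[0]
    r = k // 2
    n, h, w, _ = x.shape
    xp = np.pad(x, ((0, 0), (r, r), (r, r), (0, 0)))      # zero padding keeps s x s
    out = np.zeros((n, h, w, kernel.shape[-1]))
    for di in range(k):
        for dj in range(k):
            # Contribution of kernel tap (di, dj) at every output position
            out += xp[:, di:di + h, dj:dj + w, :] @ kernel[di, dj]
    return out

def batch_norm(x, eps=1e-5):
    # Training-mode batch normalization over batch and spatial axes (no affine terms)
    mu = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def multi_scale_features(patches, out_channels, rng):
    # One Conv(k x k) -> BN -> ReLU branch per kernel size k in {3, 5, 7}
    c_in = patches.shape[-1]
    feats = {}
    for k in (3, 5, 7):
        kernel = rng.standard_normal((k, k, c_in, out_channels)) * 0.1  # stand-in weights
        feats[k] = np.maximum(batch_norm(conv2d_same(patches, kernel)), 0.0)
    return feats
```

Keeping the spatial size fixed across the three kernel sizes is what allows the scale features to be stacked channel by channel in the next step.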

Step 23. In any preset channel, first stack the multiple scale features along the channel dimension to obtain a channel fusion feature.

Specifically, the channel fusion feature is expressed as:

$$F_c^{i} = \mathrm{Stack}(F_3^{i}, F_5^{i}, F_7^{i}),\qquad i = 1, 2, \ldots, N$$

where $i$ is the channel index, $N$ is the number of channels, $F_3^{i}$, $F_5^{i}$ and $F_7^{i}$ are the three output features of different scales in the i-th channel, $F_c^{i}$ is the channel fusion feature of the i-th channel, and $\mathrm{Stack}(\cdot)$ is the channel stacking operation.

In the embodiment in which hyperspectral and LiDAR images are acquired, the three different-scale output features of the hyperspectral and LiDAR images are computed separately, and this expression is applied in each preset channel to compute the channel fusion features of the hyperspectral image and those of the LiDAR image.

Then, element summation, averaging and max pooling are applied to each channel fusion feature along the channel dimension to extract channel-level attributes, producing a summation feature map, an average feature map and a max-pooling feature map.

Specifically, the expressions are:

$$F_{sum}^{i} = \mathrm{Sum}(F_c^{i}),\qquad F_{avg}^{i} = \mathrm{Avg}(F_c^{i}),\qquad F_{max}^{i} = \mathrm{MaxPool}(F_c^{i})$$

where $F_c^{i}$ is the channel fusion feature of the i-th channel, $F_{sum}^{i}$ is the feature map obtained by the element summation operation $\mathrm{Sum}(\cdot)$, $F_{avg}^{i}$ is the feature map obtained by the averaging operation $\mathrm{Avg}(\cdot)$, and $F_{max}^{i}$ is the feature map obtained by the max pooling operation $\mathrm{MaxPool}(\cdot)$.

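Then, the three feature maps are concatenated and fused across channels with a 3×3 convolution to obtain the low-dimensional channel feature:

$$Z^{i} = \mathrm{Conv}_{3\times3}\big(\mathrm{Concat}(F_{sum}^{i}, F_{avg}^{i}, F_{max}^{i})\big)$$

where $Z^{i}$ is the low-dimensional channel feature of the i-th channel, $\mathrm{Concat}(\cdot)$ is the concatenation operation, $\mathrm{Conv}_{3\times3}(\cdot)$ is a 3×3 convolution, and $F_{sum}^{i}$, $F_{avg}^{i}$ and $F_{max}^{i}$ are the summation, average and max-pooling feature maps of the i-th channel.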

In embodiments that acquire hyperspectral and LiDAR images, the low-dimensional channel feature of every channel is obtained for the hyperspectral image and likewise for the LiDAR image.

Step 24. Concatenate the low-dimensional channel features of all channels to obtain the shallow features of the remote sensing data.

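Specifically, the expression is:

$$S = \mathrm{Concat}(Z^{1}, Z^{2}, \ldots, Z^{N})$$

where $S$ is the shallow feature of the remote sensing data, $\mathrm{Concat}(\cdot)$ is the concatenation operation, and $Z^{1}$, $Z^{2}$ and $Z^{N}$ are the low-dimensional channel features of the 1st, 2nd and N-th channels.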

In the embodiment in which hyperspectral and LiDAR images are acquired, this expression concatenates the low-dimensional channel features of all channels of the hyperspectral image to obtain the hyperspectral shallow features $S_H$, and likewise concatenates those of the LiDAR image to obtain the LiDAR shallow features $S_L$.
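Steps 23 and 24 can be sketched for one image as follows. The sketch assumes that the i-th channel of each scale feature is a single s×s map, that the 3×3 fusion convolution produces one output map per channel, and that one kernel is shared by all channels; these are assumptions, as are the helper names.

```python
import numpy as np

def conv3x3_same(x, kernel):
    # x: (s, s, C_in); kernel: (3, 3, C_in) -> one output map (s, s), zero padding
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    h, w = x.shape[:2]
    out = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            out += xp[di:di + h, dj:dj + w, :] @ kernel[di, dj]
    return out

def low_dim_channel_feature(f3, f5, f7, kernel):
    """f3, f5, f7: the i-th channel of the 3x3, 5x5 and 7x7 scale features, each (s, s)."""
    fused = np.stack([f3, f5, f7], axis=-1)           # channel fusion feature, (s, s, 3)
    maps = np.stack([fused.sum(axis=-1),              # element summation map
                     fused.mean(axis=-1),             # average map
                     fused.max(axis=-1)], axis=-1)    # max-pooling map
    return conv3x3_same(maps, kernel)                 # low-dimensional channel feature

def shallow_features(feats, kernel):
    # feats maps kernel size -> (s, s, N); Step 24 concatenates the N channel results
    n = feats[3].shape[-1]
    return np.stack([low_dim_channel_feature(feats[3][..., i], feats[5][..., i],
                                             feats[7][..., i], kernel)
                     for i in range(n)], axis=-1)
```

Each channel is compressed from three scale maps to one, so the shallow feature keeps the channel count of a single scale while summarizing all three receptive fields.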

The shallow features of each type of remote sensing data also carry the characteristics of the sensing technology that produced them, since each remote sensing technology has its own strengths and weaknesses. Next, the different types of remote sensing data are fused.

Step 3. Input the shallow features into the frequency feature decomposition module to obtain multiple preset frequency features of each type of remote sensing data.

It should be noted that the frequency feature decomposition module is built on a frequency-domain Transformer and uses different windows to extract features of different frequencies from the remote sensing data, so that the data can be analyzed more comprehensively.

Specifically, before frequency division, the shallow features input into the frequency feature decomposition module undergo convolution and normalization to obtain processed shallow features.

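In an embodiment that acquires hyperspectral and LiDAR images, the expressions are:

$$\hat{S}_H = \mathrm{LN}(\mathrm{Conv}(S_H)),\qquad \hat{S}_L = \mathrm{LN}(\mathrm{Conv}(S_L))$$

where $S_H$ and $S_L$ are the shallow features of the hyperspectral and LiDAR images, $\hat{S}_H$ and $\hat{S}_L$ are the processed shallow features of the hyperspectral and LiDAR images, $\mathrm{Conv}(\cdot)$ is the convolution, and $\mathrm{LN}(\cdot)$ is layer normalization.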

Obtaining multiple frequency features from the processed shallow features comprises dividing the processed shallow features evenly along the spectral dimension into h heads and dividing the h heads into n equal parts, where n is the number of frequency features, so that each part is used to compute one frequency feature; h is a preset value.

Each frequency feature corresponds to a preset window form. For any part, each head is divided evenly into non-overlapping windows using the window form of the corresponding frequency feature, the attention of each window is computed to obtain the frequency feature of each head, and the frequency features of all heads are concatenated to obtain the frequency feature of that part.
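The head splitting and window division described above can be sketched with NumPy reshapes. The window shapes used for the low- and high-frequency features are not reproduced here; the sketch only shows generic wh×ww windows, with column windows (H×1) and row windows (1×W) as examples. All names are hypothetical.

```python
import numpy as np

def split_heads(x, num_heads, num_parts):
    """Split (H, W, C) features into num_heads heads along C, then group the heads
    into num_parts equal parts (one part per frequency feature)."""
    _, _, C = x.shape
    if C % num_heads or num_heads % num_parts:
        raise ValueError("C must divide into heads and heads into parts evenly")
    heads = np.split(x, num_heads, axis=-1)            # list of (H, W, C // num_heads)
    per_part = num_heads // num_parts
    return [heads[p * per_part:(p + 1) * per_part] for p in range(num_parts)]

def window_partition(head, wh, ww):
    # (H, W, d) -> (num_windows, wh * ww, d): non-overlapping wh x ww windows of tokens
    H, W, d = head.shape
    t = head.reshape(H // wh, wh, W // ww, ww, d).transpose(0, 2, 1, 3, 4)
    return t.reshape(-1, wh * ww, d)

def window_reverse(windows, wh, ww, H, W):
    # Inverse of window_partition: (num_windows, wh * ww, d) -> (H, W, d)
    d = windows.shape[-1]
    t = windows.reshape(H // wh, W // ww, wh, ww, d).transpose(0, 2, 1, 3, 4)
    return t.reshape(H, W, d)
```

Attention is then computed independently inside each window (the token axis of `window_partition`'s output), and `window_reverse` places the results back before the heads are concatenated.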

In the embodiment where hyperspectral and LiDAR images are acquired, the preset frequency features include a low frequency feature, a high frequency feature, a vertical feature, and a horizontal feature. In these embodiments, the low frequency, high frequency, vertical, and horizontal features of the hyperspectral image, as well as the low frequency, high frequency, vertical, and horizontal features of the LiDAR image, are obtained by the methods described above.

Step 4. Input the frequency features of all remote sensing data into the same-frequency feature fusion module to obtain the corresponding same-frequency fusion features.

The same-frequency characteristic fusion module performs characteristic fusion on the same frequency characteristics, and comprises the following steps:

Step 41. For any frequency, add the frequency components of that frequency from all remote sensing data element-wise to obtain the same-frequency component $F_s$. Apply global average pooling to $F_s$ in the channel dimension, then pass the result through a channel attention module to obtain the channel weight $W_c$:

$$W_c = \mathrm{Conv}_{1\times1}\big(\max(0, \mathrm{Conv}_{1\times1}(F_{gap}^{c}))\big)$$

where $W_c$ is the channel weight, $\mathrm{Conv}_{1\times1}(\cdot)$ is a 1×1 convolution layer, $\max(0,\cdot)$ takes the maximum with zero, and $F_{gap}^{c}$ is the output of global average pooling of $F_s$ in the channel dimension.

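Step 42. Apply global average pooling and global max pooling to the same-frequency component $F_s$ in the spatial dimension, and pass the results through a spatial attention module to obtain the spatial weight $W_s$:

$$W_s = \mathrm{Conv}_{7\times7}\big([F_{gap}^{s}, F_{gmp}^{s}]\big)$$

where $W_s$ is the spatial weight, $\mathrm{Conv}_{7\times7}(\cdot)$ is a 7×7 convolution layer, $[\cdot,\cdot]$ denotes concatenation, and $F_{gap}^{s}$ and $F_{gmp}^{s}$ are the outputs of global average pooling and global max pooling of $F_s$ in the spatial dimension.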
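Steps 41 and 42 can be sketched in NumPy as follows. Assumptions: features are (H, W, C) arrays; the two 1×1 convolutions of the channel attention module act on the pooled C-vector as matrix products with a hidden width chosen by the caller; and, as in common spatial-attention designs, the 7×7 convolution takes the channel-wise average and maximum maps as its two input channels. Random weights stand in for learned ones, and the helper names are hypothetical.

```python
import numpy as np

def same_frequency_sum(*features):
    # Element-wise sum of the same frequency feature taken from every source
    return np.sum(features, axis=0)

def channel_weight(fs, w1, w2):
    """fs: (H, W, C); w1: (C, C_hidden); w2: (C_hidden, C).

    On a 1x1 pooled map each 1x1 convolution is a matrix product, so this computes
    Conv1x1(max(0, Conv1x1(GAP(fs)))) and returns a length-C vector.
    """
    gap = fs.mean(axis=(0, 1))                    # global average pooling -> (C,)
    return np.maximum(gap @ w1, 0.0) @ w2

def spatial_weight(fs, kernel):
    """kernel: (7, 7, 2) weights of a 7x7 convolution; returns an (H, W) map."""
    pooled = np.stack([fs.mean(axis=-1), fs.max(axis=-1)], axis=-1)   # (H, W, 2)
    H, W, _ = pooled.shape
    padded = np.pad(pooled, ((3, 3), (3, 3), (0, 0)))
    out = np.zeros((H, W))
    for di in range(7):
        for dj in range(7):
            out += padded[di:di + H, dj:dj + W, :] @ kernel[di, dj]     # (H, W)
    return out
```

The channel weight says which channels of the summed feature matter, the spatial weight says where in the patch they matter; the next step combines the two.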

Step 43. Following the broadcasting rule, fuse the channel weight $W_c$ and the spatial weight $W_s$ by addition to obtain the coarse weight $W_{coa} = W_c + W_s$; then rearrange the same-frequency component $F_s$ and $W_{coa}$ with a channel rearrangement operation to obtain the fine weight $W$:

$$W = \sigma\big(\mathrm{GC}(\mathrm{CS}([F_s, W_{coa}]))\big)$$

where $W$ is the fine weight, $\sigma(\cdot)$ is the sigmoid function, $\mathrm{GC}(\cdot)$ is a group convolution whose number of groups equals the number of channels, $\mathrm{CS}(\cdot)$ is the channel rearrangement operation, $[\cdot,\cdot]$ denotes concatenation, and $W_{coa}$ is the coarse weight.

Step 44: From the frequency components of all remote sensing data and the fine weight, obtain the same-frequency fusion feature for this frequency by weighted summation combined with a residual connection.
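The following NumPy sketch links Steps 41 to 44 for two sources. It rests on several assumptions. The weights are random rather than learned. The channel reduction ratio and the 7×7 kernel of the group convolution are guesses. The Step 44 formula `f_hsi + f_lidar + W*f_hsi + (1 - W)*f_lidar` (residual terms plus complementary weighting) is also an assumption, because the text only says "weighted summation with a residual connection".

```python
import numpy as np

rng = np.random.default_rng(0)  # random weights stand in for learned ones


def conv2d_same(x, k):
    """Zero-padded 'same' 2-D cross-correlation of x (Cin, H, W) with one
    kernel k (Cin, kh, kw); returns a single (H, W) map."""
    _, H, W = x.shape
    _, kh, kw = k.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((H, W))
    for i in range(kh):
        for j in range(kw):
            out += np.einsum("c,chw->hw", k[:, i, j], xp[:, i:i + H, j:j + W])
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def same_frequency_fusion(f_hsi, f_lidar, reduction=2, ks=7):
    C, H, W = f_hsi.shape
    f = f_hsi + f_lidar                                   # Step 41: element-wise sum
    # Channel weight: one pooled value per channel, then 1x1 conv -> ReLU -> 1x1 conv
    # (a 1x1 conv on a (C, 1, 1) tensor is a matrix product)
    w1 = rng.standard_normal((C // reduction, C)) * 0.1
    w2 = rng.standard_normal((C, C // reduction)) * 0.1
    w_c = (w2 @ np.maximum(0.0, w1 @ f.mean(axis=(1, 2))))[:, None, None]
    # Step 42: average and max over channels at each position, then a 7x7 conv
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])   # (2, H, W)
    w_s = conv2d_same(pooled, rng.standard_normal((2, ks, ks)) * 0.1)[None]
    # Step 43: coarse weight by broadcast addition -> (C, H, W)
    w_coarse = w_c + w_s
    # Channel shuffle interleaves f and w_coarse: f[0], w[0], f[1], w[1], ...
    shuffled = np.stack([f, w_coarse], axis=1).reshape(2 * C, H, W)
    # Group conv with C groups: each group maps its 2 channels to 1 channel
    k_group = rng.standard_normal((C, 2, ks, ks)) * 0.1
    w_fine = sigmoid(np.stack([conv2d_same(shuffled[2 * g:2 * g + 2], k_group[g])
                               for g in range(C)]))
    # Step 44 (assumed form): residual terms plus complementary weighting
    return f_hsi + f_lidar + w_fine * f_hsi + (1.0 - w_fine) * f_lidar, w_fine


f_h = np.random.default_rng(1).standard_normal((4, 9, 9))
f_l = np.random.default_rng(2).standard_normal((4, 9, 9))
fused, w_fine = same_frequency_fusion(f_h, f_l)
```

Under this assumed form, the fine weight decides per channel and per pixel how much each source contributes beyond the plain residual sum.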

In the embodiment in which hyperspectral and LiDAR images are acquired, as shown in FIG. 4, each same-frequency feature from the hyperspectral image (feature I) is fused with the corresponding feature from the LiDAR image (feature II) by the above method. This yields four same-frequency fusion features: a high-frequency, a low-frequency, a vertical and a horizontal fusion feature.

Step 5: Concatenate the same-frequency fusion features to obtain the multi-source fusion feature.

In the embodiment in which hyperspectral and LiDAR images are acquired, the low-frequency, high-frequency, vertical and horizontal fusion features are concatenated to obtain the multi-source fusion feature.

Step 6: Pass the multi-source fusion feature sequentially through the stacked frequency modulation layers and attention layers to obtain the fused global and local features.

Specifically, the layers follow a staged architecture. Several frequency modulation layers are stacked in series to form one stage, several attention layers are stacked in series to form another, and the two stages are then connected in series.

The frequency modulation layer captures local features. As shown in FIG. 5, it first applies a block-based fast Fourier transform to move the input feature into the frequency domain. It then introduces a learnable matrix that suppresses or amplifies each frequency component through element-wise multiplication in the frequency domain, giving the frequency modulation feature. Finally, it applies the inverse fast Fourier transform to reconstruct the refined output feature. The expression is:

;

In the formula, the symbols denote: the input feature of the frequency modulation layer; the feature obtained through the feed-forward network; the frequency modulation feature; the output feature of the frequency modulation layer; a 1×1 convolution layer; the activation function; the block partition operation; element-wise multiplication; the learnable matrix; the block merge operation; the fast Fourier transform; the inverse fast Fourier transform; and the multi-layer perceptron operation.
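The following minimal sketch covers only the spectral core of the frequency modulation layer: block partition, per-block FFT, element-wise multiplication by a learnable matrix, inverse FFT and block merge. The surrounding 1×1 convolutions, activation, normalization and MLP are omitted, because their exact arrangement cannot be recovered here. The block size `p`, the per-channel real-valued weight and keeping only the real part after the inverse FFT are all assumptions.

```python
import numpy as np


def block_frequency_modulation(x, weight, p):
    """x: (C, H, W) feature; weight: (C, p, p) real matrix applied to the 2-D
    spectrum of every p x p block of the matching channel."""
    C, H, W = x.shape
    if H % p or W % p:
        raise ValueError("block size must divide the feature map evenly")
    # Block partition -> (C, H/p, W/p, p, p)
    blocks = x.reshape(C, H // p, p, W // p, p).transpose(0, 1, 3, 2, 4)
    spec = np.fft.fft2(blocks, axes=(-2, -1))              # per-block 2-D FFT
    spec = spec * weight[:, None, None, :, :]              # suppress / amplify each frequency
    blocks = np.fft.ifft2(spec, axes=(-2, -1)).real        # back to space (real part kept)
    # Block merge -> (C, H, W)
    return blocks.transpose(0, 1, 3, 2, 4).reshape(C, H, W)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 8))
y_id = block_frequency_modulation(x, np.ones((2, 4, 4)), p=4)   # all-pass: returns x
dc_only = np.zeros((2, 4, 4))
dc_only[:, 0, 0] = 1.0
y_dc = block_frequency_modulation(x, dc_only, p=4)              # each 4x4 block -> its mean
```

The two usage cases show the limits of the learnable matrix. An all-ones matrix leaves the feature unchanged, while keeping only the zero frequency reduces every block to its mean. Because the FFT is taken per block, the modulation acts locally, which matches the layer's role of capturing local features.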

The attention layer captures global attributes or semantic features. As shown in FIG. 6, it is a standard attention layer. The input feature $X$ passes through layer normalization and multi-head attention, and then through layer normalization and a multi-layer perceptron, before being output:

$$Z = \mathrm{MHA}\left(\mathrm{LN}(X)\right) + X,\qquad Y = \mathrm{MLP}\left(\mathrm{LN}(Z)\right) + Z$$

where $X$ is the input feature of the attention layer, $Z$ is the feature obtained through layer normalization and multi-head attention, $Y$ is the output feature of the attention layer, $\mathrm{LN}$ is layer normalization, $\mathrm{MHA}$ is the multi-head attention operation, and $\mathrm{MLP}$ is the multi-layer perceptron operation, which performs channel mixing in the attention layer.
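For orientation, here is a NumPy sketch of a standard pre-norm attention layer of this kind. The residual connections, the ReLU in the MLP and the absence of learnable LayerNorm parameters are assumptions consistent with a "standard" Transformer block. They are not details taken from the text.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token over its channels (no learnable scale/shift here)
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)


def multi_head_attention(x, wq, wk, wv, wo, heads):
    n, d = x.shape
    dh = d // heads

    def split(t):  # (n, d) -> (heads, n, dh)
        return t.reshape(n, heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))   # (heads, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)        # re-join heads
    return out @ wo


def attention_layer(x, params, heads=4):
    z = x + multi_head_attention(layer_norm(x), *params["attn"], heads)  # LN -> MHA (+ residual)
    w1, w2 = params["mlp"]
    return z + np.maximum(0.0, layer_norm(z) @ w1) @ w2                  # LN -> MLP channel mixing


rng = np.random.default_rng(0)
d, n = 8, 5
params = {
    "attn": [rng.standard_normal((d, d)) * 0.1 for _ in range(4)],  # Wq, Wk, Wv, Wo
    "mlp": (rng.standard_normal((d, 16)) * 0.1, rng.standard_normal((16, d)) * 0.1),
}
tokens = rng.standard_normal((n, d))
out = attention_layer(tokens, params, heads=4)
```

Every token attends to every other token, which is what gives this layer its global receptive field, in contrast to the block-local frequency modulation layer.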

In some embodiments, a ratio factor is introduced to control how the total number of layers is split between frequency modulation layers and attention layers. The ratio factor is the proportion of frequency modulation layers in the total number of layers.

Frequency modulation layers cannot handle global attributes or semantic features precisely, while attention layers cannot capture local features precisely. Combining the two, and using the ratio factor to adjust how many layers of each type are used, helps capture both global and local features accurately.
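The ratio factor only fixes how many of the stacked layers are frequency modulation layers. The sketch below lays out the staged sequence with the frequency modulation stage first, matching the order in Step 6. The parameter name `alpha` and the rounding rule are hypothetical choices made for this illustration.

```python
def staged_layout(total_layers, alpha):
    """Return the layer sequence: frequency modulation ("fm") stage first,
    then the attention ("attn") stage. alpha is the hypothetical name for
    the proportion of FM layers among all layers."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n_fm = round(alpha * total_layers)      # rounding rule is an assumption
    return ["fm"] * n_fm + ["attn"] * (total_layers - n_fm)


print(staged_layout(8, 0.75))  # ['fm', 'fm', 'fm', 'fm', 'fm', 'fm', 'attn', 'attn']
```

Setting `alpha` to 0 or 1 degenerates to a pure attention or pure frequency modulation network. Intermediate values trade local detail against global context.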

Step 7: Weight the fused global and local features in the spectral dimension to obtain the deep feature, and from it the predicted classification result of the target object. Key information is first learned through a one-dimensional convolution, salient features are then emphasized through an activation function, and the prediction result is finally obtained through the output function. The expression is:

;
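Because the expression above is not legible, the sketch below is only one plausible reading of Step 7 and should be treated as an assumption throughout. Tokens are averaged into a spectral vector. A 1-D convolution along the spectral axis, followed by a sigmoid, produces spectral weights. A fully connected layer with softmax then gives class probabilities. The kernel, the token averaging and the choice of sigmoid and softmax as the two activation functions are not confirmed by the source.

```python
import numpy as np


def spectral_weighting_classifier(feat, conv_kernel, fc_w, fc_b):
    """feat: (N, C) fused tokens; conv_kernel: (k,) with odd k;
    fc_w: (num_classes, C); fc_b: (num_classes,)."""
    s = feat.mean(axis=0)                                   # (C,) spectral descriptor
    key = np.convolve(s, conv_kernel, mode="same")          # 1-D conv along the spectral axis
    gate = 1.0 / (1.0 + np.exp(-key))                        # activation 1 (assumed sigmoid)
    deep = s * gate                                          # spectrally weighted deep feature
    logits = fc_w @ deep + fc_b                              # fully connected layer
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()                               # activation 2 (assumed softmax)


rng = np.random.default_rng(0)
feat = rng.standard_normal((10, 16))                         # 10 tokens, 16 spectral channels
probs = spectral_weighting_classifier(
    feat, np.array([0.25, 0.5, 0.25]), rng.standard_normal((15, 16)), np.zeros(15))
print(probs.argmax())  # index of the predicted class among 15
```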

Example 2

Building on Example 1, this example describes how the frequency feature decomposition module obtains the frequency features of the two types of remote sensing data in the embodiment in which hyperspectral and LiDAR images are acquired.

In this example, as shown in FIG. 3, the frequency features comprise a low-frequency feature, a high-frequency feature, a vertical feature and a horizontal feature.

Specifically, the low-frequency, high-frequency, vertical and horizontal features each correspond to a preset window shape, expressed in terms of a window size index that takes a positive integer value.

Note that the window size index characterizes the size of the windows, so its value determines the number of windows obtained by the division.

First, the processed shallow features of the hyperspectral image and of the LiDAR image are each divided evenly along the spectral dimension into a number of heads, and these heads are divided into 4 equal parts.

The heads of the first equal part are used to compute the low-frequency feature. Each head is divided into windows of the low-frequency window shape, and each window contains the number of tokens set by that shape, where a token is the smallest unit in a window.

The heads of the second equal part are used to compute the high-frequency feature. Each head is divided into windows of the high-frequency window shape, each containing the number of tokens set by that shape.

The heads of the third equal part are used to compute the vertical feature. Each head is divided into windows of the vertical window shape, each containing the number of tokens set by that shape.

The heads of the fourth equal part are used to compute the horizontal feature. Each head is divided into windows of the horizontal window shape, each containing the number of tokens set by that shape.
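The following small sketch shows how channels map to heads and heads to the four equal parts. It assumes that each head takes a contiguous slice of channels and that the parts follow the order given above (low, high, vertical, horizontal). Heads are numbered from 0 in the code, whereas the text counts from 1.

```python
FREQUENCY_PARTS = ("low", "high", "vertical", "horizontal")


def assign_heads(num_channels, num_heads):
    """Map each frequency feature to its heads, and each head to its
    (start, stop) channel slice along the spectral dimension."""
    if num_channels % num_heads or num_heads % len(FREQUENCY_PARTS):
        raise ValueError("channels must split into heads, heads into 4 equal parts")
    per_head = num_channels // num_heads
    per_part = num_heads // len(FREQUENCY_PARTS)
    plan = {}
    for p, name in enumerate(FREQUENCY_PARTS):
        heads = range(p * per_part, (p + 1) * per_part)
        plan[name] = [(h, (h * per_head, (h + 1) * per_head)) for h in heads]
    return plan


plan = assign_heads(num_channels=64, num_heads=8)
print(plan["vertical"])  # [(4, (32, 40)), (5, (40, 48))]
```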

Taking the processed shallow feature of the LiDAR image as an example, the vertical feature is computed from the heads of the third equal part as follows. Let these heads be indexed $i = 1, \ldots, n$.

S1: Using the vertical window shape, divide each head $i$ evenly into $N$ non-overlapping windows. The number of windows $N$ is determined by the length and width of the processed LiDAR shallow feature and by the window size index.

The query, key and value tensors $Q_i$, $K_i$ and $V_i$ of head $i$ are partitioned into the same windows.

S2: Compute the attention within each window of head $i$. For window $j$:

$$A_{i,j} = \mathrm{Attention}\left(Q_{i,j},\ K_{i,j},\ V_{i,j}\right)$$

where $A_{i,j}$ is the attention result of window $j$, $\mathrm{Attention}(\cdot)$ is the attention operation, and $Q_{i,j}$, $K_{i,j}$ and $V_{i,j}$ are the query, key and value matrices of head $i$ within window $j$.

S3: Obtain the vertical feature of head $i$ from the attention results of its windows:

$$F^{v}_{i} = \left[A_{i,1},\ A_{i,2},\ \ldots,\ A_{i,N}\right]$$

S4: Concatenate the vertical features of all heads in the third equal part to obtain the vertical feature of the processed LiDAR shallow feature:

$$F^{v} = \mathrm{Concat}\left(F^{v}_{1},\ F^{v}_{2},\ \ldots,\ F^{v}_{n}\right)$$

where $F^{v}$ is the vertical feature of the processed LiDAR shallow feature, $\mathrm{Concat}$ is the concatenation operation, and $F^{v}_{1}$, $F^{v}_{2}$ and $F^{v}_{n}$ are the vertical features of the first, second and last heads of the third equal part.
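The following NumPy sketch implements S1 to S4 for one equal part. It makes three assumptions. The query, key and value of each head are taken directly from the head's features, with no learned projections. Attention is scaled dot-product softmax attention. The 8×2 window is a hypothetical vertical shape. `head_window_attention` and `part_feature` are illustrative names.

```python
import numpy as np


def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)


def head_window_attention(q, k, v, win_h, win_w):
    """q, k, v: (H, W, d) for one head. Attention is computed only inside
    each (win_h, win_w) window; returns the head's feature (H, W, d)."""
    H, W, d = q.shape

    def part(t):  # S1: (H, W, d) -> (num_windows, tokens per window, d)
        t = t.reshape(H // win_h, win_h, W // win_w, win_w, d).transpose(0, 2, 1, 3, 4)
        return t.reshape(-1, win_h * win_w, d)

    qw, kw, vw = part(q), part(k), part(v)
    attn = softmax(qw @ kw.transpose(0, 2, 1) / np.sqrt(d))   # S2: per-window attention
    out = attn @ vw
    # S3: place each window result back at its position to form the head feature
    out = out.reshape(H // win_h, W // win_w, win_h, win_w, d).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, d)


def part_feature(x, head_ids, num_heads, win_h, win_w):
    """x: (H, W, C). S4: concatenate the features of the heads in one part."""
    dh = x.shape[-1] // num_heads
    outs = []
    for i in head_ids:
        xi = x[:, :, i * dh:(i + 1) * dh]
        outs.append(head_window_attention(xi, xi, xi, win_h, win_w))  # no projections (assumed)
    return np.concatenate(outs, axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 8))            # H = W = 8, C = 8 -> 4 heads of 2 channels
vertical = part_feature(x, head_ids=[2], num_heads=4, win_h=8, win_w=2)
```

Because attention never crosses a window boundary, a change to one token affects only its own column strip. This locality is what lets each window shape emphasize a particular direction.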

The low-frequency, high-frequency and horizontal features are computed in the same way as the vertical feature, each with its own window shape.

In this way, the frequency feature decomposition module extracts each frequency feature with a different window shape. Approaching the problem from the perspective of frequency-domain decomposition, it uses the multi-head attention mechanism to exploit directional frequency features and thereby improve the accuracy of the classification results.

Example 3

This example describes experiments that verify the classification performance of the proposed method.

The experimental hardware platform is a high-performance computer with an eight-core Intel Core i9-11900K CPU running at 3.50 GHz, 32 GB of memory and an NVIDIA GeForce RTX 3070 Ti graphics card. The software platform is Python 3.8 under Windows 11, and the proposed method is implemented in the PyTorch framework.

1. Experimental data and sample partitioning

The Houston2013 dataset was used to evaluate the classification performance of the proposed method. It was acquired by the National Center for Airborne Laser Mapping over the University of Houston campus and its surroundings, and comprises an HSI and LiDAR data (a DSM) of the same region. Both have a spatial resolution of 2.5 m and contain 349×1905 pixels. The HSI has 144 bands covering 0.38 µm to 1.05 µm.

The dataset contains 15,029 labeled ground-truth samples in 15 classes. Table 1 gives the number of samples in each class and how they are split into training and test samples.

Classification accuracy is measured with three common metrics: overall accuracy (OA), average accuracy (AA) and the Kappa coefficient.
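For reference, the three metrics can be computed from a confusion matrix using their standard definitions. OA is the fraction of correctly classified samples, AA is the mean of the per-class accuracies, and Kappa compares OA with the agreement expected by chance. The sketch below is plain Python; the 2-class matrix in the usage line is made up for illustration.

```python
def classification_metrics(conf):
    """conf[i][j] = number of test samples of true class i predicted as class j.
    Returns (OA, AA, Kappa) as fractions; multiply by 100 for percentages."""
    k = len(conf)
    n = sum(sum(row) for row in conf)
    oa = sum(conf[i][i] for i in range(k)) / n                    # overall accuracy
    aa = sum(conf[i][i] / sum(conf[i]) for i in range(k)) / k     # mean per-class accuracy
    row_tot = [sum(conf[i]) for i in range(k)]
    col_tot = [sum(conf[i][j] for i in range(k)) for j in range(k)]
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa


oa, aa, kappa = classification_metrics([[50, 10], [5, 35]])
print(round(oa, 3), round(aa, 3), round(kappa, 3))  # 0.85 0.854 0.694
```

AA weights every class equally regardless of its size, so it exposes poor performance on small classes that OA can hide. Kappa discounts the accuracy a random classifier with the same marginals would reach.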

TABLE 1 Numbers of training and test samples in the Houston2013 dataset

2. Parameter setting

Three parameters significantly affect the results: the learning rate, the spatial size and the dropout rate. Taking the Houston2013 dataset as an example, these parameters were evaluated in detail.

1) Learning rate: A higher learning rate may speed up convergence but can make training unstable and even cause the loss to diverge. A lower learning rate usually gives more stable convergence but slows training. The learning rate also affects the model's ability to escape local optima: if it is too large, training may oscillate and fail to converge; if it is too small, training may stall in a local optimum. The experiment therefore tested the learning rates 0.01, 0.005, 0.001, 0.0007, 0.0005, 0.0003, 0.0001, 0.00007, 0.00005, 0.00003 and 0.00001. The results show that classification performance is best at a learning rate of 0.0001.

2) Spatial size: The extraction of spatial features depends heavily on the size of the spatial neighborhood, and a larger spatial input offers more opportunities to learn spatial features. A smaller spatial size captures more detailed local features and makes the model more sensitive to small objects and subtle changes, which suits complex land-cover types. If it is too small, however, the information becomes sparse and noise interference increases, which reduces accuracy. Conversely, a larger spatial size provides richer contextual information and strengthens global feature learning, which suits larger regions, but it may ignore details and cause class confusion. Choosing an appropriate spatial size is therefore important for classification performance. Table 2 shows the classification accuracy for different spatial sizes, with the number of spectral channels fixed, the optimal learning rate, a batch size of 64 and 100 training iterations.

3) Dropout rate: As Table 3 shows, with an input spatial size of 8×8, classification performance is best at a dropout rate of 0.5, so this value was chosen for the experiments.

TABLE 2 Classification accuracy for different spatial sizes

TABLE 3 Classification accuracy for different dropout rates

3. Experimental results

To ensure reliable results, each experiment was repeated 10 times and the results were averaged.

To verify the effectiveness of the proposed method, it was compared experimentally with several traditional and mainstream deep learning methods.

The comparison methods are the linear self-attention fusion algorithm LSAF, the coupled CNN fusion algorithm CoupledCNN, the coupled adversarial learning based classification algorithm CALC, the hierarchical CNN and Transformer algorithm HCT, and the global and local feature fusion network g2.

Table 4 compares the classification performance of the different methods on the Houston2013 dataset.

Table 4 shows that, on the Houston2013 dataset, the proposed method achieves higher OA, AA and Kappa values than the other mainstream deep learning classification methods.

Specifically, the OA of the proposed method is 4.26% higher than that of LSAF, 2.07% higher than CoupledCNN, 2.02% higher than CALC, 1.89% higher than HCT, and 0.30% higher than g2.

Its AA is 3.45% higher than LSAF, 1.48% higher than CoupledCNN, 1.62% higher than CALC, 1.63% higher than HCT, and 0.42% higher than g2.

Its Kappa coefficient is 4.61% higher than LSAF, 2.24% higher than CoupledCNN, 2.18% higher than CALC, 2.06% higher than HCT, and 0.37% higher than g2.

All three indexes show that the method provided by the invention is superior to other methods in classification performance.

TABLE 4 Classification performance of different methods on the Houston2013 dataset

FIG. 7 shows the classification maps produced by the different methods on the Houston2013 dataset. Compared with the ground truth (GT), the maps of the linear self-attention fusion algorithm (LSAF), the coupled CNN fusion algorithm (CoupledCNN), the coupled adversarial learning based classification algorithm (CALC) and the hierarchical CNN and Transformer algorithm (HCT) contain many scattered noisy spots, and some regions are misclassified. The global and local feature fusion network (g2) performs well, but a small amount of clutter remains in the lower-middle part of its map. The map predicted by the proposed method is almost entirely correctly classified, with hardly any visible spots, and is relatively smooth in homogeneous regions.

In summary, the proposed method does not require a large, complex multi-stage network. It exploits directional frequency features through a multi-head attention mechanism and performs adaptive multi-source same-frequency fusion. It then extracts both local and global features and weights the spectral information to extract deep features. As a result, it achieves better classification, effectively improves the accuracy of joint classification of multi-source remote sensing data, and outperforms several advanced classification methods.

The foregoing embodiments are merely for illustrating the technical solution of the present application, but not for limiting the same, and although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that modifications may be made to the technical solution described in the foregoing embodiments or equivalents may be substituted for parts of the technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solution of the embodiments of the present application in essence.

Claims

1. A multi-source remote sensing data classification method, characterized by comprising:

Obtain various types of remote sensing data of the target object;

Extract shallow features of each type of remote sensing data respectively;

Input the shallow features into a pre-built frequency feature decomposition module to obtain a plurality of preset frequency features for each type of remote sensing data;

The frequency features of all remote sensing data are input into a pre-built same-frequency feature fusion module, and the same-frequency feature fusion module performs feature fusion on the same frequency features to obtain corresponding multiple same-frequency fusion features;

The multiple same-frequency fusion features are concatenated and fused to obtain multi-source fusion features;

Pass the multi-source fusion features sequentially through the stacked frequency modulation layers and attention layers to obtain the fused global features and local features;

The fused global and local features are weighted in the spectral dimension to extract depth information and obtain the predicted classification results of the target object.

2. The multi-source remote sensing data classification method according to claim 1, characterized in that the extraction of shallow features of each type of remote sensing data comprises:

Divide remote sensing data into image patches;

Perform multi-scale convolution operations on the divided patch blocks to obtain corresponding multi-scale features;

For each preset channel:

Stack the multiple scale features along the channel dimension to obtain the channel fusion feature;

Perform element-wise summation, averaging and maximum pooling on the channel fusion feature along the channel dimension to obtain the corresponding sum feature map, average feature map and maximum pooling feature map;

Concatenate the sum feature map, the average feature map and the maximum pooling feature map, and apply a convolution to further fuse them across channels, obtaining the low-dimensional channel feature of that channel;

The low-dimensional channel features of all channels are concatenated to obtain the shallow features of the remote sensing data.

3. The multi-source remote sensing data classification method according to claim 2 is characterized in that the multi-scale convolution operation includes: performing 3×3 convolution, 5×5 convolution and 7×7 convolution on the divided patch blocks, and performing batch normalization operation and ReLU operation in sequence after each convolution operation to obtain corresponding three scale features.
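The following NumPy sketch illustrates the shallow feature extraction of claims 2 and 3. It makes several assumptions. Batch normalization is replaced by per-channel standardization of a single patch. The channel-fusion convolution is taken as 1×1, with its own three weights per channel. All weights are supplied by the caller rather than learned. The division into patches is not shown.

```python
import numpy as np


def conv2d(x, w):
    """'Same' zero-padded conv: x (Cin, H, W), w (Cout, Cin, k, k) -> (Cout, H, W)."""
    cout, _, k, _ = w.shape
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (k // 2, k // 2), (k // 2, k // 2)))
    out = np.zeros((cout, H, W))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out


def bn_relu(x, eps=1e-5):
    # Per-channel standardization stands in for batch norm (single sample, no affine)
    mu = x.mean(axis=(1, 2), keepdims=True)
    sd = x.std(axis=(1, 2), keepdims=True)
    return np.maximum(0.0, (x - mu) / (sd + eps))


def shallow_features(patch, branch_weights, fuse_w):
    """branch_weights: three (Cout, Cin, k, k) kernels for k = 3, 5, 7;
    fuse_w: (Cout, 3) weights mixing the sum/mean/max maps of each channel."""
    scales = np.stack([bn_relu(conv2d(patch, w)) for w in branch_weights])  # (3, Cout, H, W)
    # For each channel: reduce its three scale maps by sum, mean and max
    stacked = np.stack([scales.sum(0), scales.mean(0), scales.max(0)], axis=1)  # (Cout, 3, H, W)
    return np.einsum("ct,cthw->chw", fuse_w, stacked)  # per-channel 1x1 fusion -> (Cout, H, W)


rng = np.random.default_rng(0)
patch = rng.standard_normal((4, 11, 11))                     # 4 input bands, 11 x 11 patch
weights = [rng.standard_normal((6, 4, k, k)) * 0.1 for k in (3, 5, 7)]
fuse_w = np.abs(rng.standard_normal((6, 3)))
out = shallow_features(patch, weights, fuse_w)
```

The three kernel sizes see neighborhoods of different extent. Reducing them per channel with sum, mean and max keeps a single map per channel, which is why the result is low-dimensional.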

4. The multi-source remote sensing data classification method according to claim 1, characterized in that the frequency feature decomposition module is built on a frequency-domain Transformer, and obtaining the plurality of preset frequency features of each shallow feature comprises:

Passing the input shallow features through a convolution layer and a normalization layer to obtain the processed shallow features;

Dividing the processed shallow features evenly along the spectral dimension into a preset number of heads, and dividing the heads into as many equal parts as there are frequency features, so that each equal part is used to calculate one frequency feature;

Each frequency feature corresponds to a preset window shape;

For any equal part, each head is first evenly divided into non-overlapping windows using the window shape corresponding to the frequency characteristics, and then the attention of each window is calculated; then the frequency characteristics of each head are obtained; finally, the frequency characteristics of each head are spliced to obtain the frequency characteristics corresponding to the equal part.

5. The multi-source remote sensing data classification method according to claim 4, characterized in that the frequency features include: low-frequency features, high-frequency features, vertical features and horizontal features;

The window shape corresponding to the low-frequency feature is defined by a window size index, which takes a positive integer value;

In each head used to calculate low-frequency features, each window contains the number of tokens set by this window shape, a token being the smallest unit in a window;

The window shape corresponding to the high-frequency feature is likewise defined by the window size index; in each head used to calculate high-frequency features, each window contains the number of tokens set by this window shape;

The window shape corresponding to the vertical feature is likewise defined by the window size index; in each head used to calculate vertical features, each window contains the number of tokens set by this window shape;

The window shape corresponding to the horizontal feature is likewise defined by the window size index; in each head used to calculate horizontal features, each window contains the number of tokens set by this window shape.

6. The multi-source remote sensing data classification method according to claim 5, characterized in that the process of obtaining the vertical features comprises:

Let the heads used to calculate vertical features be indexed $i = 1, \ldots, n$. Using the window shape corresponding to the vertical feature, divide each head $i$ evenly into $N$ non-overlapping windows, where $N$ is the number of windows and is determined by the length or width of the processed shallow feature and by the window size index; the query tensor $Q_i$, key tensor $K_i$ and value tensor $V_i$ of head $i$ are partitioned into the same windows. Calculate the attention of each window in head $i$; the attention calculation result of window $j$ is:

$$A_{i,j} = \mathrm{Attention}\left(Q_{i,j},\ K_{i,j},\ V_{i,j}\right)$$

In the formula, $A_{i,j}$ is the attention calculation result of window $j$, $\mathrm{Attention}(\cdot)$ is the attention operation, and $Q_{i,j}$, $K_{i,j}$ and $V_{i,j}$ are respectively the query matrix, key matrix and value matrix of head $i$ within window $j$;

Obtain the vertical feature of head $i$ from the attention of its windows, expressed as:

$$F^{v}_{i} = \left[A_{i,1},\ A_{i,2},\ \ldots,\ A_{i,N}\right]$$

In the formula, $F^{v}_{i}$ is the vertical feature of head $i$, and $A_{i,1}$, $A_{i,2}$ and $A_{i,N}$ are the attention calculation results of the first, second and $N$-th windows;

Concatenate the vertical features of all heads used to calculate vertical features to obtain the vertical feature of the shallow feature, expressed as:

$$F^{v} = \mathrm{Concat}\left(F^{v}_{1},\ F^{v}_{2},\ \ldots,\ F^{v}_{n}\right)$$

In the formula, $F^{v}$ is the vertical feature of the shallow feature, $\mathrm{Concat}$ is the concatenation operation, and $F^{v}_{1}$, $F^{v}_{2}$ and $F^{v}_{n}$ are respectively the vertical features of the first, second and $n$-th heads.

7. The multi-source remote sensing data classification method according to claim 1, characterized in that the same-frequency feature fusion module performs feature fusion on the same frequency components to obtain the corresponding plurality of same-frequency fusion features, comprising:

For any frequency component:

Add the frequency components from all remote sensing data element by element to obtain the same-frequency component $F$;

Perform global average pooling on $F$ in the channel dimension, and then obtain the channel weight through the channel attention module, expressed as:

$$W_c = C_{1\times1}\left(\max\left(0,\ C_{1\times1}\left(F^{c}_{gap}\right)\right)\right)$$

In the formula, $W_c$ is the channel weight, $C_{1\times1}$ is a 1×1 convolutional layer, $\max(0,\cdot)$ takes the element-wise maximum with zero, and $F^{c}_{gap}$ is the output of global average pooling in the channel dimension;

Perform global average pooling and global maximum pooling on $F$ in the spatial dimension, and then obtain the spatial weight through the spatial attention module, expressed as:

$$W_s = C_{7\times7}\left(\left[F^{s}_{gap},\ F^{s}_{gmp}\right]\right)$$

In the formula, $W_s$ is the spatial weight, $C_{7\times7}$ is a 7×7 convolutional layer, $[\cdot,\cdot]$ denotes concatenation, $F^{s}_{gap}$ is the output of global average pooling of $F$ in the spatial dimension, and $F^{s}_{gmp}$ is the output of global maximum pooling of $F$ in the spatial dimension;

According to the broadcast rule, fuse the channel weight and the spatial weight through an addition operation to obtain the coarse weight $W_{coa} = W_c + W_s$;

Rearrange the coarse weight and each channel of the same-frequency component through the rearrangement operation to obtain the fine weight, expressed as:

$$W = \sigma\left(GC\left(CS\left(\left[F,\ W_{coa}\right]\right)\right)\right)$$

In the formula, $W$ is the fine weight, $\sigma$ is the sigmoid function, $GC$ is a group convolution whose number of groups is set to the number of channels, $CS$ is the channel rearrangement operation, and $W_{coa}$ is the coarse weight;

According to the frequency components and fine weights of all remote sensing data, combined with a residual connection, obtain the same-frequency fusion feature of this frequency feature by weighted summation.

8. The multi-source remote sensing data classification method according to claim 1, characterized in that the frequency modulation layers and attention layers adopt a staged architecture in which the frequency modulation layers are stacked, the attention layers are stacked, and the two stacks are then connected in series; a ratio factor is introduced to control the numbers of frequency modulation layers and attention layers within the total number of layers, the ratio factor being the proportion of frequency modulation layers in the total number of layers.

9. The multi-source remote sensing data classification method according to claim 8, characterized in that the frequency modulation layer is used to capture local features, comprising: first applying a block-based fast Fourier transform to transform the input features into the frequency domain; then introducing a learnable matrix that suppresses or amplifies every frequency component by element-wise multiplication in the frequency domain, to obtain the frequency modulation feature;

Then applying the inverse fast Fourier transform to reconstruct the refined output features; the expressions are:

;

In the formula, the symbols denote: the input feature; the feature obtained through the feed-forward network; the frequency modulation feature; the refined output feature; layer normalization; a 1×1 convolutional layer; the activation function; the block partition operation; element-wise multiplication; the learnable matrix; the block merging operation; the fast Fourier transform; the inverse fast Fourier transform; and the multi-layer perceptron operation;

The attention layer is used to capture global attributes or semantic features, comprising: first performing layer normalization and multi-head attention on the input features of the attention layer, then performing layer normalization and the multi-layer perceptron operation in sequence, and finally outputting the result; wherein the multi-layer perceptron operation is used for channel mixing in the attention layer.

10. The multi-source remote sensing data classification method according to claim 1, characterized in that weighting the fused global features and local features in the spectral dimension to obtain deep features, and then obtaining the classification result of the target object, comprises: first learning key information through a one-dimensional convolution, then highlighting salient features through an activation function, and finally obtaining the prediction result through the output function; the expression is:

;

In the formula, the symbols denote the deep feature, the predicted classification result, the fully connected operation, and two activation functions.