Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a multi-source remote sensing data classification method, which solves the problem that complex features cannot be fully captured using single-source remote sensing data.
In order to achieve the above purpose, the invention is realized by adopting the following technical scheme:
The invention provides a multi-source remote sensing data classification method which comprises the steps of obtaining multiple types of remote sensing data of a target object, respectively extracting shallow features of each remote sensing data, inputting the shallow features into a pre-built frequency feature decomposition module to obtain multiple preset frequency features of each remote sensing data, inputting the frequency features of all remote sensing data into a pre-built same-frequency feature fusion module, carrying out feature fusion on the same frequency features by the same-frequency feature fusion module to obtain multiple corresponding same-frequency fusion features, splicing and fusing the multiple same-frequency fusion features to obtain multi-source fusion features, sequentially enabling the multi-source fusion features to pass through a superposed frequency modulation layer and an attention layer to obtain fused global features and local features, weighting the fused global features and local features in a spectrum dimension, extracting depth information, and further obtaining a prediction classification result of the target object.
Further, extracting the shallow features of each remote sensing data comprises: dividing the remote sensing data into patch blocks; performing a multi-scale convolution operation on the divided patch blocks to obtain a plurality of scale features; in any preset channel, stacking the plurality of scale features along the channel dimension to obtain a channel fusion feature; performing element summation, averaging and max pooling operations on the channel fusion feature along the channel dimension to obtain a corresponding summation feature map, average feature map and max pooling feature map; splicing the summation feature map, the average feature map and the max pooling feature map and performing further channel fusion by convolution to obtain the low-dimensional channel feature of the channel; and splicing the low-dimensional channel features of all channels to obtain the shallow features of the remote sensing data.
Further, the multi-scale convolution operation comprises the steps of respectively carrying out 3×3 convolution, 5×5 convolution and 7×7 convolution on the partitioned patch blocks, and carrying out batch normalization operation and ReLU operation after each convolution operation in sequence to obtain corresponding 3 scale features.
Further, the frequency feature decomposition module is constructed based on a frequency-domain Transformer, and obtaining the plurality of preset frequency features of each shallow feature comprises: convolving and normalizing the input shallow features to obtain processed shallow features; dividing the processed shallow features evenly along the spectral dimension into a preset number of heads; and dividing the heads into as many equal parts as there are frequency features, so that each equal part is used to calculate one frequency feature.
Each frequency feature corresponds to a preset window form. For any equal part, each head is divided into non-overlapping windows using the window form corresponding to that frequency feature; the attention of each window is then calculated to obtain the frequency feature of each head; finally, the frequency features of the heads are spliced to obtain the frequency feature corresponding to the equal part.
Further, the frequency characteristics comprise low frequency characteristics, high frequency characteristics, vertical characteristics and horizontal characteristics, and window forms corresponding to the low frequency characteristics are as follows:; in the form of a window size index, Taking a positive integer; in a header for computing low frequency features, each window containsA token, which refers to a minimum unit in the window.
The window shape corresponding to the high-frequency characteristic is as follows: in the header for computing the high frequency characteristics, each window contains The window forms corresponding to the vertical features are as follows: in the header for computing the vertical features, each window contains The window forms corresponding to the horizontal features are as follows: in the header for computing the horizontal features, each window contains And a token.
Further, the vertical feature obtaining process comprises the steps ofThe individual heads being uniformly divided into non-overlapping partsIndividual windows,For the number of windows to be the number of windows,,For the processed shallow featuresIs longer or wider than the above;, For the processed shallow features Is defined by the spectral dimensions of (a);
First, the Query tensor for individual headerTensor of keySum tensorIs of the dimension ofWherein, the method comprises the steps of, wherein,;Calculate the firstAttention of each window in the head, whereinThe attention calculations for the individual windows are:
;
In the formula, Is the firstThe result of the attention calculations for the individual windows,In order to calculate the attention operation,,,Respectively the firstA query matrix, a key matrix, and a value matrix for each header;
According to the first Attention of each window in the head gets the firstThe vertical features of the individual head are expressed as:
;
In the formula, Is the firstThe vertical nature of the individual head is such that,,,1 St, 2 nd and 2 nd, respectivelyThe attention calculation results of the windows;
The vertical features of all heads for calculating the vertical features are spliced to obtain the vertical features of the shallow features, and the expression is:
;
In the formula, Is a vertical feature of the shallow features,In order for the splicing operation to be performed,、AndRespectively the firstFirst, secondAnd (b)Vertical features of the individual head.
Further, the same-frequency feature fusion module performs feature fusion on the same frequency components to obtain a plurality of corresponding same-frequency fusion features, which comprises: for any frequency component, adding the frequency components from all remote sensing data element-wise to obtain the same-frequency component $X$; performing global average pooling on $X$ in the channel dimension, and then obtaining the channel weight $W_c$ through a channel attention module. The expression is:
$W_c = \mathrm{Conv}_{1 \times 1}(\max(0, \mathrm{Conv}_{1 \times 1}(X_{\mathrm{GAP}}^{c})))$;
In the formula, $W_c$ is the channel weight, $\mathrm{Conv}_{1 \times 1}$ is a 1×1 convolution layer, $\max$ takes the maximum value, and $X_{\mathrm{GAP}}^{c}$ is the output of global average pooling of $X$ in the channel dimension.
$X$ undergoes global average pooling and global max pooling in the spatial dimension, and the spatial weight $W_s$ is then obtained through a spatial attention module. The expression is:
$W_s = \mathrm{Conv}_{7 \times 7}([X_{\mathrm{GAP}}^{s}, X_{\mathrm{GMP}}^{s}])$;
In the formula, $W_s$ is the spatial weight, $\mathrm{Conv}_{7 \times 7}$ is a 7×7 convolution layer, $[\cdot,\cdot]$ denotes splicing, $X_{\mathrm{GAP}}^{s}$ is the output of global average pooling of $X$ in the spatial dimension, and $X_{\mathrm{GMP}}^{s}$ is the output of global max pooling of $X$ in the spatial dimension.
According to the broadcasting rule, the channel weight $W_c$ and the spatial weight $W_s$ are fused by an addition operation to obtain the coarse weight $W_{coa}$; $X$ and $W_{coa}$ are then rearranged by a channel rearrangement operation, expressed as:
$W = \sigma(\mathrm{GC}(\mathrm{CS}([X, W_{coa}])))$;
In the formula, $W$ is the fine weight, $\sigma$ is the sigmoid function, $\mathrm{GC}$ is the group convolution with the number of groups set to the number of channels, $\mathrm{CS}$ is the channel rearrangement operation, and $W_{coa} = W_c + W_s$ is the coarse weight.
And according to the frequency components and the fine weights of all the remote sensing data, combining residual connection, and adopting a weighted summation mode to obtain the same-frequency fusion characteristic of the frequency characteristic.
Further, the frequency modulation layers and the attention layers adopt a staged architecture in which a plurality of frequency modulation layers and a plurality of attention layers are each stacked and the two stacks are then connected in series. A factor is introduced to control the numbers of frequency modulation layers and attention layers within the total number of layers, the factor being the ratio of the number of frequency modulation layers to the total number of layers.
Further, the frequency modulation layer is used for capturing local features, which comprises: first applying a block-based fast Fourier transform to transform the input features to the frequency domain; then introducing a learnable matrix that suppresses or amplifies the frequency components by element-wise multiplication in the frequency domain, giving the frequency modulation features; and then applying the inverse Fourier transform and reconstructing to obtain the refined output features. The expression is:
;
;
;
In the formula, As an input feature of the frequency modulation layer,For features obtained through the forward propagation network,In order to be a frequency modulation feature,For the output characteristics of the frequency modulation layer,For the layer normalization,Is a convolution layer of 1 x1,In order to activate the function,For the block-partitioning operation of the block,For the multiplication of the elements,In order for the matrix to be a matrix to be learnable,For the block-merging operation,In the case of a fast fourier transform,In the case of an inverse fast fourier transform,Is a multi-layer perceptron operation.
The attention layer is used for capturing global attributes or semantic features, and comprises the steps of sequentially carrying out layer normalization and multi-head attention operation on input features of the attention layer, sequentially carrying out layer normalization and multi-layer perceptron operation, and finally outputting, wherein the multi-layer perceptron operation is used for mixing channels in the attention layer.
Further, weighting the fused global features and local features in the spectral dimension to obtain depth features, and thereby the classification result of the target object, comprises: first learning key information through one-dimensional convolution, then highlighting the salient features through an activation function, and finally obtaining the prediction result through a second activation function. The expression is:
;
;
In the formula, As a feature of the depth,In order to predict the outcome of the classification,In order to operate the full-connection type of the device,AndIs two activation functions.
Compared with the prior art, the invention has the beneficial effects that:
(1) The multi-source remote sensing data classification method provided by the invention is characterized in that multi-scale convolution and channel level fusion operations are respectively used on the cube blocks of each single-source remote sensing image to realize shallow feature extraction and fusion, and then different window shapes are used for capturing different frequency features of each single-source remote sensing image by utilizing a frequency feature decomposition module established based on multi-head self-attention, and then the same-frequency feature fusion module is used for carrying out feature fusion on multi-source data to realize directional frequency feature decomposition and fusion of the multi-source data, so that multi-source fusion features extracted based on the multi-source remote sensing data are obtained as the basis for classifying target objects. In addition, the frequency modulation layer and the attention layer are combined, local and global feature learning is realized through frequency modulation, different frequency component features are obtained, depth features are extracted according to the different frequency component features, and accurate classification of the target object is completed;
(2) The multi-source remote sensing data classification method provided by the invention adopts a deep learning classification method with stronger representation and generalization capability to extract deeper image features. It learns the spatial and spectral features of the remote sensing image from few training samples and obtains more discriminative features, thereby improving the classification accuracy and giving a good classification result;
(3) The invention introduces a factor that controls the numbers of frequency modulation layers and attention layers; flexibly changing these numbers helps to accurately capture both global features and local features.
Detailed Description
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments.
Remote sensing technology detects and identifies an object by sensing, from a long distance, the electromagnetic waves, visible light and infrared rays reflected or radiated by the object. A remote sensing satellite carries remote sensing sensors that collect and record the electromagnetic wave information radiated or reflected by the earth or by atmospheric targets. The information is sent back to the ground by a signal transmission device, and a visible image, namely the commonly known satellite image, is obtained through electromagnetic wave conversion and recognition.
In the process of remote sensing digital image processing, the spatial characteristics of the ground object are mainly represented by the change of spectral characteristics. The multispectral technology is a spectrum detection technology capable of simultaneously acquiring a plurality of optical spectrum bands and expanding towards infrared light and ultraviolet light on the basis of visible light.
Example 1
The invention provides a multi-source remote sensing data classification method, as shown in figure 1, which comprises the following steps:
Step 1, acquiring various types of remote sensing data of the target object.
In a specific embodiment, two or three remote sensing images of the target object are selectively acquired according to the actual requirements of the image study.
It should be noted that, after the hyperspectral image is obtained, PCA dimension reduction processing is required to be performed on the hyperspectral image so as to extract main features in the image and reduce redundant information.
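The PCA dimension reduction mentioned above can be sketched as follows. This is a minimal illustration, assuming a hyperspectral cube stored as a height × width × bands array; the function name `pca_reduce`, the toy cube size and the number of retained components are hypothetical and are not specified in the source.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Project an (H, W, B) hyperspectral cube onto its top principal components."""
    h, w, b = cube.shape
    flat = cube.reshape(-1, b).astype(np.float64)
    flat -= flat.mean(axis=0)  # centre each spectral band
    # Right singular vectors of the centred data are the principal axes.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[:n_components].T).reshape(h, w, n_components)

rng = np.random.default_rng(0)
hsi = rng.normal(size=(8, 8, 30))  # toy cube: 8x8 pixels, 30 bands (assumed sizes)
reduced = pca_reduce(hsi, n_components=5)
print(reduced.shape)
```

The retained components are ordered by explained variance, so the first output band carries the most information.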
Step 2, respectively extracting shallow features of each remote sensing data, which comprises:
Step 21, performing patch block division on the remote sensing data.
Specifically, patch block division splits an image into its smallest units, one patch block at a time.
In embodiments where hyperspectral and LiDAR images are acquired, patch blocks are taken pixel by pixel from the dimension-reduced hyperspectral image and from the LiDAR image, giving hyperspectral image patch blocks and LiDAR image patch blocks whose length and width both equal the patch size in pixels.
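Pixel-by-pixel patch extraction can be sketched as below. The reflect padding at the image border, the odd patch size and the function name `extract_patches` are assumptions made for illustration; the source does not state how border pixels are handled.

```python
import numpy as np

def extract_patches(image, s):
    """Return an (H*W, s, s, C) array holding one s x s patch centred on each pixel."""
    assert s % 2 == 1, "an odd patch size keeps the pixel at the centre"
    r = s // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    h, w, _ = image.shape
    patches = [padded[i:i + s, j:j + s] for i in range(h) for j in range(w)]
    return np.stack(patches)

# Toy inputs: a 4-band "hyperspectral" image and a 1-band "LiDAR" image.
hsi = np.arange(6 * 6 * 4, dtype=float).reshape(6, 6, 4)
lidar = np.arange(6 * 6, dtype=float).reshape(6, 6, 1)
hsi_patches = extract_patches(hsi, s=3)
lidar_patches = extract_patches(lidar, s=3)
print(hsi_patches.shape, lidar_patches.shape)
```

Both sources are cut with the same patch size, so patch `k` of one source covers the same ground location as patch `k` of the other.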
Step 22, performing multi-scale convolution operation on the partitioned patch blocks to obtain corresponding multi-scale features.
The multi-scale convolution operation comprises the steps of respectively carrying out different convolution operations on the divided images for a plurality of times, and sequentially carrying out batch normalization operation and ReLU operation after each convolution operation to obtain corresponding various scale features.
As shown in FIG. 2, in an embodiment in which hyperspectral and LiDAR images are acquired, the hyperspectral image patch block $P_H$ and the LiDAR image patch block $P_L$ each undergo three operations: 3×3 convolution + batch normalization + ReLU, 5×5 convolution + batch normalization + ReLU, and 7×7 convolution + batch normalization + ReLU. The calculation expressions are:
$F_H^{k} = \mathrm{ReLU}(\mathrm{BN}_k(\mathrm{Conv}_{k \times k}(P_H)))$;
$F_L^{k} = \mathrm{ReLU}(\mathrm{BN}_k(\mathrm{Conv}_{k \times k}(P_L)))$;
Wherein, $k$ is the convolution kernel size, set to 3, 5 and 7; $\mathrm{Conv}_{k \times k}$ is the $k \times k$ convolution; $\mathrm{BN}_k$ is the batch normalization function at the $k$-th scale; $\mathrm{ReLU}$ is a common activation function; and $F_H^{k}$ and $F_L^{k}$ are the $k$-th scale output features of the hyperspectral image and the LiDAR image, respectively.
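The convolution + batch normalization + ReLU sequence at the three scales can be sketched as follows. This is a simplified single-channel illustration: the kernels are random rather than learned, and batch normalization is reduced to standardisation without the learned scale and shift, both of which are assumptions.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Single-channel 'same' cross-correlation with zero padding."""
    k = kernel.shape[0]
    r = k // 2
    padded = np.pad(x, r)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

def conv_bn_relu(x, kernel, eps=1e-5):
    y = conv2d_same(x, kernel)
    y = (y - y.mean()) / np.sqrt(y.var() + eps)  # standardisation in place of batch norm
    return np.maximum(y, 0.0)                    # ReLU

rng = np.random.default_rng(0)
patch = rng.normal(size=(9, 9))  # toy single-channel patch block
features = {k: conv_bn_relu(patch, rng.normal(size=(k, k))) for k in (3, 5, 7)}
print({k: f.shape for k, f in features.items()})
```

Zero padding keeps every scale at the input resolution, which is what allows the three scale features to be stacked in the next step.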
Step 23, in any preset channel, firstly, stacking the multiple scale features along the channel dimension to obtain a channel fusion feature.
Specifically, the expression of the channel fusion feature is:
$F_c = \mathrm{Stack}(F_c^{1}, F_c^{2}, F_c^{3})$;
Wherein, $c$ denotes the $c$-th channel; $F_c^{1}$, $F_c^{2}$ and $F_c^{3}$ are the three different scale output features of the $c$-th channel; $F_c$ is the channel fusion feature of the $c$-th channel; and $\mathrm{Stack}$ is the channel stacking operation.
In the embodiment of acquiring hyperspectral and LiDAR images, three different scale output features of the hyperspectral and LiDAR images are calculated respectively, and by using the expression, a plurality of channel fusion features corresponding to the hyperspectral images and a plurality of channel fusion features corresponding to the LiDAR images are calculated respectively in each preset channel dimension.
Then, element summation, averaging and max pooling operations are performed on each channel fusion feature along the channel dimension to extract channel-level attributes, giving the corresponding summation feature map, average feature map and max pooling feature map.
Specifically, the expressions are:
$F_c^{\mathrm{sum}} = \mathrm{Sum}(F_c)$;
$F_c^{\mathrm{avg}} = \mathrm{Avg}(F_c)$;
$F_c^{\mathrm{max}} = \mathrm{Max}(F_c)$;
Wherein, $F_c$ is the channel fusion feature of the $c$-th channel; $F_c^{\mathrm{sum}}$ is the feature map obtained by the element summation operation $\mathrm{Sum}$; $F_c^{\mathrm{avg}}$ is the feature map obtained by the averaging operation $\mathrm{Avg}$; and $F_c^{\mathrm{max}}$ is the feature map obtained by the max pooling operation $\mathrm{Max}$.
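The three channel-level reductions can be sketched as below, assuming the channel fusion feature is stored with the stacked scales on axis 0. The array sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical channel fusion feature: 3 scale features stacked on axis 0, each 9x9.
fused = rng.normal(size=(3, 9, 9))

sum_map = fused.sum(axis=0)   # element summation along the stacked dimension
avg_map = fused.mean(axis=0)  # averaging along the stacked dimension
max_map = fused.max(axis=0)   # max pooling along the stacked dimension

# The three maps are spliced before the subsequent convolutional channel fusion.
stacked = np.stack([sum_map, avg_map, max_map])
print(stacked.shape)
```

Each reduction collapses the scale dimension to a single map, so the spliced result again has three planes regardless of how many scales were stacked.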
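Then, the three feature maps are spliced, and channel fusion is carried out using a 3×3 convolution to obtain the low-dimensional channel feature.
Specifically, the expression is:
$L_c = \mathrm{Conv}_{3 \times 3}(\mathrm{Concat}(F_c^{\mathrm{sum}}, F_c^{\mathrm{avg}}, F_c^{\mathrm{max}}))$;
Wherein, $L_c$ is the low-dimensional channel feature of the $c$-th channel; $F_c^{\mathrm{sum}}$, $F_c^{\mathrm{avg}}$ and $F_c^{\mathrm{max}}$ are the summation, average and max pooling feature maps of that channel; $\mathrm{Concat}$ is the splicing operation; and $\mathrm{Conv}_{3 \times 3}$ is the 3×3 convolution.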
In embodiments that acquire hyperspectral and LiDAR images, the hyperspectral image acquires the low-dimensional channel characteristics of each channel, and the LiDAR image also acquires the low-dimensional channel characteristics of each channel.
Step 24, splicing the low-dimensional channel features of all channels to obtain the shallow features of the remote sensing data.
Specifically, the expression is:
$S = \mathrm{Concat}(L_1, L_2, \ldots, L_C)$;
Wherein, $S$ is the shallow feature of the remote sensing data, and $L_1, L_2, \ldots, L_C$ are the low-dimensional channel features of the 1st, 2nd, ..., $C$-th channels, respectively.
In the embodiment of acquiring hyperspectral and LiDAR images, this method is used to splice the low-dimensional channel features of all channels of the hyperspectral image to obtain the hyperspectral image shallow features $S_H$, and likewise to splice the low-dimensional channel features of all channels of the LiDAR image to obtain the LiDAR image shallow features $S_L$.
Because each remote sensing technology has its own advantages and disadvantages, the shallow features of each type of remote sensing data still carry the data characteristics of the technology that produced them. Next, feature fusion is performed on the different types of remote sensing data.
Step 3, inputting the shallow features into a frequency feature decomposition module to obtain a plurality of preset frequency features of each remote sensing data.
It should be noted that the frequency feature decomposition module is constructed based on a frequency-domain Transformer and uses different windows to extract features of different frequencies from the remote sensing data, so that the remote sensing data can be analyzed more comprehensively.
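Specifically, before frequency division, the shallow features input into the frequency feature decomposition module are convolved and normalized to obtain the processed shallow features.
In an embodiment that acquires hyperspectral and LiDAR images, the expressions are:
$\hat{S}_H = \mathrm{LN}(\mathrm{Conv}(S_H))$;
$\hat{S}_L = \mathrm{LN}(\mathrm{Conv}(S_L))$;
Wherein, $S_H$ and $S_L$ are the shallow features of the hyperspectral image and the LiDAR image, $\hat{S}_H$ is the processed hyperspectral image shallow feature, $\hat{S}_L$ is the processed LiDAR image shallow feature, $\mathrm{Conv}$ is the convolution operation, and $\mathrm{LN}$ is layer normalization.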
The process of obtaining multiple frequency features from the processed shallow features comprises: dividing the processed shallow features evenly along the spectral dimension into a preset number of heads, and dividing the heads into as many equal parts as there are frequency features, so that each equal part is used to calculate one frequency feature.
Each frequency characteristic corresponds to a preset window form, for any equal part, each head is uniformly divided into non-overlapping windows by adopting the window form corresponding to the frequency characteristic, then the attention of each window is calculated, the frequency characteristic of each head is further obtained, and finally the frequency characteristic of each head is spliced to obtain the frequency characteristic corresponding to the equal part.
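The head split and window partition described above can be sketched as follows. The head count, the number of equal parts and in particular the window shape used here (full height, two columns wide) are illustrative assumptions; the concrete window forms are not recoverable from this text.

```python
import numpy as np

def split_heads(x, num_heads, num_groups):
    """Split (H, W, C) along C into heads, then group the heads into equal parts."""
    heads = np.split(x, num_heads, axis=-1)
    per_group = num_heads // num_groups
    return [heads[g * per_group:(g + 1) * per_group] for g in range(num_groups)]

def window_partition(head, win_h, win_w):
    """Cut an (H, W, d) head into non-overlapping windows of win_h x win_w tokens."""
    h, w, d = head.shape
    x = head.reshape(h // win_h, win_h, w // win_w, win_w, d)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win_h * win_w, d)

x = np.arange(8 * 8 * 16, dtype=float).reshape(8, 8, 16)  # toy processed shallow feature
groups = split_heads(x, num_heads=8, num_groups=4)         # 4 equal parts of 2 heads each
# Assumed tall, narrow window for the part that computes vertical features.
vertical_windows = window_partition(groups[2][0], win_h=8, win_w=2)
print(vertical_windows.shape)
```

Changing only `win_h` and `win_w` per equal part is enough to give each frequency feature its own window form while reusing the same partition routine.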
In the embodiment where hyperspectral and LiDAR images are acquired, the preset frequency features include a low frequency feature, a high frequency feature, a vertical feature, and a horizontal feature. In these embodiments, the low frequency, high frequency, vertical, and horizontal features of the hyperspectral image, as well as the low frequency, high frequency, vertical, and horizontal features of the LiDAR image, are obtained by the methods described above.
Step 4, inputting the frequency characteristics of all the remote sensing data into a same-frequency characteristic fusion module to obtain a plurality of corresponding same-frequency fusion characteristics.
The same-frequency characteristic fusion module performs characteristic fusion on the same frequency characteristics, and comprises the following steps:
Step 41, for any frequency component, the frequency components from all remote sensing data are added element-wise to obtain the same-frequency component $X$; $X$ undergoes global average pooling in the channel dimension, and the channel weight $W_c$ is then obtained through a channel attention module. The expression is:
$W_c = \mathrm{Conv}_{1 \times 1}(\max(0, \mathrm{Conv}_{1 \times 1}(X_{\mathrm{GAP}}^{c})))$;
In the formula, $W_c$ is the channel weight, $\mathrm{Conv}_{1 \times 1}$ is a 1×1 convolution layer, $\max$ takes the maximum value, and $X_{\mathrm{GAP}}^{c}$ is the output of global average pooling of $X$ in the channel dimension.
Step 42, $X$ undergoes global average pooling and global max pooling in the spatial dimension, and the spatial weight $W_s$ is then obtained through a spatial attention module. The expression is:
$W_s = \mathrm{Conv}_{7 \times 7}([X_{\mathrm{GAP}}^{s}, X_{\mathrm{GMP}}^{s}])$;
In the formula, $W_s$ is the spatial weight, $\mathrm{Conv}_{7 \times 7}$ is a 7×7 convolution layer, $[\cdot,\cdot]$ denotes splicing, $X_{\mathrm{GAP}}^{s}$ is the output of global average pooling of $X$ in the spatial dimension, and $X_{\mathrm{GMP}}^{s}$ is the output of global max pooling of $X$ in the spatial dimension.
Step 43, according to the broadcasting rule, the channel weight $W_c$ and the spatial weight $W_s$ are fused by an addition operation to obtain the coarse weight $W_{coa}$; $X$ and $W_{coa}$ are then rearranged by a channel rearrangement operation to obtain the fine weight $W$. The expression is:
$W = \sigma(\mathrm{GC}(\mathrm{CS}([X, W_{coa}])))$;
In the formula, $W$ is the fine weight, $\sigma$ is the sigmoid function, $\mathrm{GC}$ is the group convolution with the number of groups set to the number of channels, $\mathrm{CS}$ is the channel rearrangement operation, and $W_{coa} = W_c + W_s$ is the coarse weight.
Step 44, according to the frequency components and the fine weights of all the remote sensing data, combining residual connection, and adopting a weighted summation mode to obtain the same-frequency fusion characteristic of the frequency characteristic.
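Steps 41 to 44 can be sketched in simplified form as follows. All learned convolutions, the channel rearrangement and the group convolution are omitted and replaced by plain pooling and a sigmoid, and the complementary weighting `fine * f1 + (1 - fine) * f2` is an assumed form of the weighted summation; the sketch only illustrates the broadcast addition of a per-channel weight and a per-pixel weight, followed by a residual weighted sum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def same_frequency_fusion(f1, f2):
    """Fuse two (C, H, W) components of the same frequency; learned layers are omitted."""
    x = f1 + f2                                             # element-wise sum of both sources
    channel_w = x.mean(axis=(1, 2), keepdims=True)          # per-channel descriptor, (C, 1, 1)
    spatial_w = 0.5 * (x.mean(axis=0, keepdims=True)
                       + x.max(axis=0, keepdims=True))      # per-pixel descriptor, (1, H, W)
    coarse = channel_w + spatial_w                          # broadcast addition to (C, H, W)
    fine = sigmoid(coarse)                                  # stand-in for rearrangement + group conv
    fused = x + fine * f1 + (1.0 - fine) * f2               # residual plus weighted sum (assumed form)
    return fused, fine

rng = np.random.default_rng(0)
hsi_low = rng.normal(size=(4, 5, 5))    # toy low-frequency feature from the hyperspectral image
lidar_low = rng.normal(size=(4, 5, 5))  # toy low-frequency feature from the LiDAR image
fused, fine = same_frequency_fusion(hsi_low, lidar_low)
print(fused.shape, fine.shape)
```

Broadcasting is what lets a (C, 1, 1) channel weight and a (1, H, W) spatial weight combine into a full (C, H, W) weight without any explicit tiling.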
In the embodiment of acquiring hyperspectral and LiDAR images, as shown in FIG. 4, the same-frequency feature I from the hyperspectral image and the same-frequency feature II from the LiDAR image are fused by the method to obtain the same-frequency fusion feature, wherein the same-frequency fusion feature comprises a high-frequency fusion feature, a low-frequency fusion feature, a vertical fusion feature and a horizontal fusion feature.
Step 5, splicing and fusing the same-frequency fusion characteristics to obtain multi-source fusion characteristics.
In the embodiment of acquiring hyperspectral and LiDAR images, the acquired low-frequency fusion features, high-frequency fusion features, vertical fusion features and horizontal fusion features are spliced to obtain multi-source fusion features.
Step 6, sequentially passing the multi-source fusion features through the overlapped frequency modulation layer and the attention layer to obtain the fused global features and local features.
Specifically, the frequency modulation layer and the attention layer are laid out in a staged architecture, i.e. a plurality of frequency modulation layers and a plurality of attention layers are respectively stacked in series, and then the two parts are connected in series.
The frequency modulation layer is used to capture local features. As shown in FIG. 5, a block-based fast Fourier transform is first applied to transform the input features to the frequency domain; a learnable matrix is then introduced to suppress or amplify the frequency components by element-wise multiplication in the frequency domain, giving the frequency modulation features; the inverse Fourier transform is then applied and the result is reconstructed to obtain the refined output features. The expression is:
;
;
;
In the formula, As an input feature of the frequency modulation layer,For features obtained through the forward propagation network,In order to be a frequency modulation feature,For the output characteristics of the frequency modulation layer,Is a convolution layer of 1 x1,In order to activate the function,For the block-partitioning operation of the block,For the multiplication of the elements,In order for the matrix to be a matrix to be learnable,For the block-merging operation,In the case of a fast fourier transform,In the case of an inverse fast fourier transform,Is a multi-layer perceptron operation.
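The core of the frequency modulation layer (block partition, FFT, element-wise multiplication by a learnable matrix, inverse FFT, block merging) can be sketched as below. The layer normalization, 1×1 convolutions and multi-layer perceptron are left out, and the filter is a fixed real-valued array rather than a learned one; both simplifications are assumptions made for illustration.

```python
import numpy as np

def frequency_modulation(x, weight, block):
    """Block-wise FFT -> element-wise filter -> inverse FFT for an (H, W) feature map."""
    h, w = x.shape
    out = np.empty_like(x, dtype=float)
    for i in range(0, h, block):          # block partition
        for j in range(0, w, block):
            spec = np.fft.fft2(x[i:i + block, j:j + block])    # to the frequency domain
            spec = spec * weight                               # suppress or amplify each frequency
            out[i:i + block, j:j + block] = np.fft.ifft2(spec).real  # back, merged in place
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
identity = np.ones((4, 4))   # leaves every frequency untouched
low_pass = np.zeros((4, 4))
low_pass[0, 0] = 1.0         # keeps only the zero-frequency term of each block

unchanged = frequency_modulation(x, identity, block=4)
smoothed = frequency_modulation(x, low_pass, block=4)
print(np.allclose(unchanged, x))
```

With the low-pass filter every block collapses to its mean value, which shows how individual entries of the matrix control individual frequency components.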
The attention layer is used to capture global attributes or semantic features. As shown in FIG. 6, it is a standard attention layer: layer normalization and a multi-head attention operation are applied in sequence to the input features of the attention layer, followed in sequence by layer normalization and a multi-layer perceptron operation, after which the result is output. The expression is:
;
;
Wherein, For the input features of the attention layer,For features obtained through layer normalization and multi-head attention,For the output characteristics of the attention layer,For the operation of the multi-head attention,For multi-layer perceptron operation, for channel mixing in the attention layer.
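A layer of this kind can be sketched as follows. The residual connections around the attention and perceptron sub-blocks, the ReLU inside the perceptron and all parameter names are assumptions based on the common pre-normalization Transformer layout; the source only states the order of the operations.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, heads):
    n, d = x.shape
    dh = d // heads
    q, k, v = (np.reshape(x @ w, (n, heads, dh)).transpose(1, 0, 2) for w in (wq, wk, wv))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))   # (heads, n, n)
    return (attn @ v).transpose(1, 0, 2).reshape(n, d) @ wo

def attention_layer(x, p, heads=2):
    # Layer normalization, then multi-head attention (residual connection assumed).
    x = x + multi_head_attention(layer_norm(x), p["wq"], p["wk"], p["wv"], p["wo"], heads)
    # Layer normalization, then the multi-layer perceptron that mixes channels.
    hidden = np.maximum(layer_norm(x) @ p["w1"], 0.0)
    return x + hidden @ p["w2"]

rng = np.random.default_rng(0)
d = 4
params = {name: rng.normal(scale=0.1, size=(d, d)) for name in ("wq", "wk", "wv", "wo")}
params["w1"] = rng.normal(scale=0.1, size=(d, 2 * d))
params["w2"] = rng.normal(scale=0.1, size=(2 * d, d))
tokens = rng.normal(size=(6, d))   # 6 tokens with 4 channels each
out = attention_layer(tokens, params)
print(out.shape)
```

Every token attends to every other token, which is why this layer complements the block-local frequency modulation layer.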
In some specific embodiments, a factor is introduced to control the numbers of frequency modulation layers and attention layers within the total number of layers, the factor being the ratio of the number of frequency modulation layers to the total number of layers.
The frequency modulation layer has the disadvantage that it cannot accurately handle global attributes or semantic features, while the attention layer has the disadvantage that it cannot accurately capture local features. Here the two are combined, and the factor flexibly changes the numbers of frequency modulation layers and attention layers, which helps to accurately capture both global features and local features.
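How such a ratio factor could set the layer counts is sketched below. The name `alpha`, the rounding rule and the example values are hypothetical; the source gives only the definition of the factor as the share of frequency modulation layers in the total.

```python
def split_layers(total_layers, alpha):
    """Return (num_fm_layers, num_attention_layers) for a frequency modulation ratio alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    num_fm = round(alpha * total_layers)  # rounding rule is an assumption
    return num_fm, total_layers - num_fm

def build_stage(total_layers, alpha):
    """Staged layout: all frequency modulation layers first, then all attention layers."""
    num_fm, num_attn = split_layers(total_layers, alpha)
    return ["fm"] * num_fm + ["attention"] * num_attn

stage = build_stage(total_layers=8, alpha=0.25)
print(stage)
```

Setting the factor to 0 or 1 degenerates the stage into a pure attention stack or a pure frequency modulation stack, which makes the trade-off between global and local modelling easy to explore.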
Step 7, weighting the fused global features and local features in the spectral dimension to obtain depth features, and thereby the prediction classification result of the target object, which comprises: first learning key information through one-dimensional convolution, then highlighting the salient features through an activation function, and finally obtaining the prediction result through a second activation function. The expression is:
;
;
In the formula, As a feature of the depth,In order to predict the outcome of the classification,In order to operate the full-connection type of the device,AndIs two activation functions.
Example 2
On the basis of embodiment 1, this embodiment provides a process of obtaining frequency characteristics of two remote sensing data by using a frequency characteristic decomposition module in the embodiment of obtaining hyperspectral and LiDAR images.
In this embodiment, as shown in FIG. 3, the frequency characteristics include a low frequency characteristic, a high frequency characteristic, a vertical characteristic, and a horizontal characteristic.
Specifically, the window shape corresponding to the low frequency characteristic is: the window shape corresponding to the high-frequency characteristic is as follows: the window shape corresponding to the vertical feature is as follows: The window forms corresponding to the horizontal features are as follows: Wherein, the method comprises the steps of, wherein, In the form of a window size index,Taking a positive integer.
It should be noted that the window size index characterizes the size of the window, and its value determines the number of windows obtained by the division.
First, the processed hyperspectral image shallow features and the processed LiDAR image shallow features are each divided evenly along the spectral dimension into a preset number of heads, and the heads are divided into 4 equal parts.
The first equal part is used to calculate the low-frequency features: each head in the first equal part is divided into windows using the window form corresponding to the low-frequency features, and each window contains the number of tokens given by that window form, where a token is the minimum unit in a window.
The second equal part is used to calculate the high-frequency features: each head in the second equal part is divided into windows using the window form corresponding to the high-frequency features, and each window contains the number of tokens given by that window form.
The third equal part is used to calculate the vertical features: each head in the third equal part is divided into windows using the window form corresponding to the vertical features, and each window contains the number of tokens given by that window form.
The fourth equal part is used to calculate the horizontal features: each head in the fourth equal part is divided into windows using the window form corresponding to the horizontal features, and each window contains the number of tokens given by that window form.
Taking the calculation of the vertical features of the third equal part of the processed LiDAR image shallow features as an example, the specific calculation process is as follows:
S1: Using the window shape of the vertical feature, each head is uniformly divided into a number of non-overlapping windows; the related sizes are defined with respect to the spectral dimension of the processed LiDAR image shallow features.
The query tensor, key tensor, and value tensor of each head have the same dimensions.
S2: Calculate the attention of each window in the head. The attention calculation result of each window is obtained by applying the attention operation to the query matrix, key matrix, and value matrix of that head, restricted to that window.
S3: Obtain the vertical feature of the head from the attention of each window in the head. The vertical feature of the head is formed from the attention calculation results of the 1st window, the 2nd window, and so on up to the last window.
S4: Splice the vertical features of all heads in the third equal part to obtain the vertical features of the processed LiDAR image shallow features. That is, the vertical features of the first head through the last head of the third equal part are joined by the splicing (concatenation) operation.
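Steps S1 to S4 can be sketched as follows. This is an illustration only, and several details are assumptions because they are not recoverable from the text: a vertical window is taken to span the full height and `stripe_width` columns, the attention operation is taken to be standard scaled dot-product attention, the query, key, and value are all set to the window tokens (no learned projections), and the window results are written back to their spatial positions. All function names are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention over the tokens of one window (assumed form)."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def vertical_head_feature(head, stripe_width):
    """S1-S3: attention inside non-overlapping vertical stripes of one head.

    head: (H, W, d) array. ASSUMPTION: Q = K = V = the window tokens.
    """
    H, W, d = head.shape
    if W % stripe_width != 0:
        raise ValueError("W must be divisible by stripe_width")
    out = np.empty_like(head)
    for start in range(0, W, stripe_width):
        window = head[:, start:start + stripe_width, :]   # S1: one window
        tokens = window.reshape(-1, d)                    # (H * stripe_width, d)
        result = attention(tokens, tokens, tokens)        # S2: window attention
        # S3: place the window result back at its spatial position.
        out[:, start:start + stripe_width, :] = result.reshape(H, stripe_width, d)
    return out

def vertical_feature(heads, stripe_width):
    """S4: concatenate the vertical features of all heads along the channel axis."""
    return np.concatenate(
        [vertical_head_feature(h, stripe_width) for h in heads], axis=-1
    )

rng = np.random.default_rng(0)
heads = [rng.standard_normal((8, 8, 4)) for _ in range(2)]
feat = vertical_feature(heads, stripe_width=2)
print(feat.shape)
```

Because attention is computed independently per window, changing the tokens in one stripe leaves the outputs of all other stripes unchanged, which is what restricts each head to its own directional context.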
It should be noted that the process of calculating the low frequency feature, the high frequency feature, and the horizontal feature is similar to the above method of calculating the vertical feature.
The frequency feature decomposition module of the invention uses the above method to extract the different frequency features with different window shapes. Approaching the problem from the perspective of frequency-domain decomposition, it exploits multiple directional frequency features through a multi-head attention mechanism to improve the accuracy of the classification results.
Example 3
The present example provides an experimental procedure to verify the classification effect of the proposed method.
The experimental hardware platform is a high-performance computer configured with an eight-core Intel Core i9-11900K CPU running at 3.50 GHz, 32 GB of memory, and an NVIDIA RTX 3070 Ti graphics card. The software platform is Python 3.8 under Windows 11, and the proposed method is implemented in the PyTorch framework.
1. Experimental data and sample partitioning
To evaluate the classification effect of the proposed method, the Houston2013 dataset was selected. The Houston2013 dataset was collected by the National Center for Airborne Laser Mapping (NCALM) over the University of Houston campus and its surrounding area, and comprises a hyperspectral image (HSI) and a LiDAR-derived digital surface model (DSM). The spatial resolution of both the HSI and the LiDAR data is 2.5 m, and each image contains 349×1905 pixels. The HSI has 144 spectral bands covering 0.38 μm to 1.05 μm.
The dataset contains 15029 labeled ground-truth samples from 15 categories. The number of samples in each category and the split into training samples and test samples are shown in Table 1.
Classification accuracy is measured with three common evaluation indexes: overall classification accuracy (OA), average classification accuracy (AA), and the Kappa coefficient.
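For reference, the three indexes can be computed from a confusion matrix as in the following sketch. It uses the standard definitions of OA, AA, and Cohen's Kappa; the function name `classification_metrics` and the toy labels are hypothetical and are not experimental data from this document.

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Return (OA, AA, Kappa) computed from integer class labels."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    total = cm.sum()
    oa = np.trace(cm) / total                      # fraction of correct samples
    per_class = np.diag(cm) / cm.sum(axis=1)       # assumes every class occurs in y_true
    aa = per_class.mean()                          # mean of per-class accuracies
    # Chance agreement expected from the row and column marginals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# Toy labels for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 0])
oa, aa, kappa = classification_metrics(y_true, y_pred, 3)
print(f"OA={oa:.3f} AA={aa:.3f} Kappa={kappa:.3f}")  # OA=0.750 AA=0.750 Kappa=0.600
```

OA weights every sample equally, AA weights every class equally, and Kappa discounts the agreement expected by chance, so the three indexes can diverge when class sizes are imbalanced.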
TABLE 1 Training set and test set sample numbers for the Houston2013 dataset
2. Parameter setting
In the experiment, three parameters have a significant influence on the results: the learning rate, the spatial size, and the dropout (discarding) rate. Taking the Houston2013 dataset as an example, these parameters were evaluated in detail.
1) Learning rate: A higher learning rate may make the model converge quickly, but it can also make training unstable or even cause the loss function to diverge, while a lower learning rate generally makes convergence more stable but slows training. The learning rate also affects the ability of the model to escape from a locally optimal solution: too large a learning rate may cause oscillation and failure to converge, while too small a learning rate may leave the model stuck in a locally optimal solution. The experiment therefore tests the following learning rates: 0.01, 0.005, 0.001, 0.0007, 0.0005, 0.0003, 0.0001, 0.00007, 0.00005, 0.00003, and 0.00001. The experimental results show that the classification effect is best when the learning rate is 0.0001.
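The trade-off described above can be seen on a toy problem. The sketch below runs plain gradient descent on f(x) = x², which is an assumption chosen purely for illustration; it is not the network or the optimizer used in the experiment, and because the toy function is convex it only shows divergence and slow convergence, not local optima.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x**2 (gradient 2x) and return the final |x|."""
    x = x0
    for _ in range(steps):
        x -= lr * 2.0 * x   # each step multiplies x by (1 - 2 * lr)
    return abs(x)

# Too large (diverges), moderate (converges fast), too small (barely moves).
for lr in (1.1, 0.3, 0.001):
    print(f"lr={lr}: |x| after 50 steps = {gradient_descent(lr):.3e}")
```

With a learning rate of 1.1 each step overshoots the minimum and the iterate grows, with 0.3 it shrinks rapidly toward zero, and with 0.001 it is still close to its starting value after 50 steps.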
2) Spatial size: The extraction of image spatial features depends heavily on the size of the spatial neighborhood. A smaller spatial size captures more detailed local features and makes the model more sensitive to fine objects and changes, which suits the identification of complex ground object types; however, too small a spatial size leads to sparse information and increased noise interference, which degrades classification accuracy. Conversely, a larger spatial size provides richer context information and strengthens the learning of global features, which suits the classification of larger areas, but it may ignore details and cause class confusion. Selecting a proper spatial size is therefore very important for classification performance. With the number of spectral channels fixed, the optimal learning rate applied, a batch size of 64, and 100 training iterations, the classification accuracy under different spatial sizes is shown in Table 2.
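The spatial size is the side length of the square neighborhood cut around each labeled pixel. A minimal sketch of such a patch extraction is given below; the function name `extract_patch`, the reflect padding at image borders, and the position of the center pixel for even sizes are assumptions, since the text does not specify them.

```python
import numpy as np

def extract_patch(image, row, col, size):
    """Return a (size, size, C) patch around pixel (row, col) of an (H, W, C) image.

    ASSUMPTION: borders are handled by reflect padding, and for even sizes
    the labeled pixel sits at index size // 2 of the patch.
    """
    before = size // 2
    after = size - before - 1
    padded = np.pad(image, ((before, after), (before, after), (0, 0)), mode="reflect")
    # In padded coordinates the original pixel (row, col) is at (row + before, col + before).
    return padded[row:row + size, col:col + size, :]

# Toy image for illustration only.
img = np.arange(10 * 12 * 3).reshape(10, 12, 3)
patch = extract_patch(img, 0, 0, 8)
print(patch.shape)
```

Increasing `size` enlarges the context seen by the network for every sample, which is the quantity varied in Table 2.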
3) Dropout rate: As can be seen from Table 3, when the spatial size of the input data is 8×8, the classification effect is best at a dropout rate of 0.5. This dropout rate was therefore chosen in the experiment to optimize the classification performance.
TABLE 2 Classification accuracy at different spatial sizes
TABLE 3 Classification accuracy at different dropout rates
3. Experimental results
To ensure reliable experimental results, each experiment was repeated 10 times and the results were averaged.
In order to verify the effectiveness and superiority of the proposed method, the invention is experimentally compared with some traditional methods and mainstream deep learning methods.
The comparison methods are the linear self-attention fusion algorithm (LSAF), the coupled CNN fusion algorithm (CoupledCNN), the classification algorithm based on coupled adversarial learning (CALC), the hierarchical CNN and Transformer algorithm (HCT), and the global and local feature fusion network method (g2).
The comparative classification results of the different methods on the Houston2013 dataset are shown in Table 4.
As can be seen from Table 4, the OA, AA, and Kappa coefficient of the method provided by the invention are all higher than those of the other mainstream deep learning classification methods on the Houston2013 dataset.
The OA is 4.26% higher than that of LSAF, 2.07% higher than that of CoupledCNN, 2.02% higher than that of CALC, 1.89% higher than that of HCT, and 0.30% higher than that of g2.
The AA is 3.45% higher than that of LSAF, 1.48% higher than that of CoupledCNN, 1.62% higher than that of CALC, 1.63% higher than that of HCT, and 0.42% higher than that of g2.
The Kappa coefficient is 4.61% higher than that of LSAF, 2.24% higher than that of CoupledCNN, 2.18% higher than that of CALC, 2.06% higher than that of HCT, and 0.37% higher than that of g2.
All three indexes show that the method provided by the invention is superior to other methods in classification performance.
TABLE 4 Classification performance of different methods on the Houston2013 dataset
In addition, the classification maps of the different methods on the Houston2013 dataset are shown in FIG. 7. It can be seen that the final classification results of the linear self-attention fusion algorithm (LSAF), the coupled CNN fusion algorithm (CoupledCNN), the classification algorithm based on coupled adversarial learning (CALC), the hierarchical CNN and Transformer algorithm (HCT), and the global and local feature fusion network method (g2) all contain cluttered spots, and some regions are misclassified compared with the ground truth (GT). Among them, the g2 method achieves a comparatively good classification effect, but a small amount of clutter remains in the middle and lower parts of the map. The classification map predicted by the method provided by the invention is almost entirely correct, shows almost no spots, and is relatively smooth in homogeneous regions.
Therefore, the method provided by the invention does not require a complex and large multi-stage network. It exploits multiple directional frequency features through a multi-head attention mechanism, performs multi-source same-frequency fusion adaptively, then comprehensively extracts local and global features, and further weights the spectral information to extract deep feature information. It thereby achieves a better classification effect, effectively improves the accuracy of joint classification of multi-source remote sensing data, and outperforms several advanced classification methods.
The foregoing embodiments are merely for illustrating the technical solution of the present application, but not for limiting the same, and although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that modifications may be made to the technical solution described in the foregoing embodiments or equivalents may be substituted for parts of the technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solution of the embodiments of the present application in essence.