Disclosure of Invention
Technical problem to be solved
Aiming at the problem of insufficient precision of the existing semantic change detection result, the invention provides a high-resolution remote sensing image semantic change detection method based on binary change detection contrast learning.
Technical solution
A high-resolution remote sensing image semantic change detection method based on binary change detection contrast learning is characterized by comprising the following steps:
Step 1: constructing a simple and extensible direct semantic change detection model based on a high-resolution network to detect semantic change;
The remote sensing images I_1 and I_2 of phases T_1 and T_2 are input into two weight-sharing high-resolution network encoders to obtain a pair of semantic context features; the pair is input into a change feature extraction module, which takes the absolute difference of the two features to obtain d_cd and applies one 1×1 convolution to d_cd to obtain the change feature f_cd; for phase T_1, the semantic context feature of T_1 and f_cd are input into a feature fusion module to obtain the semantic change feature of T_1; for phase T_2, the semantic context feature of T_2 and f_cd are input into the feature fusion module to obtain the semantic change feature of T_2; finally, the two semantic change features are input into two decoders to obtain the semantic change detection results of phases T_1 and T_2;
Step 2: supervising change detection using the contrast learning loss;
A contrast learning feature representation head is added after the change feature extraction module; the change feature f_cd is input into this head to obtain the dense change feature representation Z_rep; the head comprises 2 convolutional layers; the first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of channels of the input change feature f_cd to 1/4 while keeping the resolution unchanged, i.e. H_0×W_0×150, followed by one batch normalization operation and one rectified linear unit; the second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to 256 while keeping the resolution unchanged, i.e. H_0×W_0×256; since the contrast loss provides supervision only during training, the contrast learning feature representation head is removed at inference; using a semi-difficult and semi-easy sampling strategy, feature vectors corresponding to samples of the changed class and the unchanged class are sampled from the change feature representation Z_rep to compute the contrast learning loss function L_c:
where C is the set of classes; since the classes are the changed class and the unchanged class, |C| = 2; z_ca is the feature vector of the a-th anchor of class c; the positive sample of class c is the mean of the feature vectors of all class-c samples; the negative term is the feature vector of the b-th negative sample, belonging to another class, of the a-th anchor of class c; for each class in C, the feature vectors of the anchors and of their negative samples are collected from the change feature representation Z_rep, each class has A anchors, and each anchor has one positive sample and B negative samples; the cosine similarity between two feature vectors is used to measure the distance between them and ranges from -1 to 1, and τ = 0.5 is the temperature coefficient; for each class in the current training batch, optimizing the network parameters to minimize L_c pulls each anchor of the class closer to its positive sample and pushes it away from its negative samples;
Step 3: optimizing the parameters of the model by minimizing the overall loss function L, which consists of the semantic change detection losses of T_1 and T_2 and the contrast learning loss function L_c on change detection; the semantic change detection losses are cross-entropy losses, described as:
where T is the number of pixels, and the remaining symbols denote, for phases T_1 and T_2 respectively, the ground-truth label and the prediction probability of the corresponding decoder at the t-th pixel;
The overall loss function L is described as:
After the overall loss function is obtained, back propagation is performed and the AdamW optimizer is used for optimization; iteration is repeated until the number of iterations reaches the preset value, at which point training is deemed complete.
The invention further adopts the technical scheme that: the step 1 is specifically as follows:
Step 1-1: for phase T_1, the remote sensing image I_1 passes through a convolution module M_0 to obtain a feature map of dimension H_0×W_0×64, where H_0 = H_input/4 and W_0 = W_input/4, and H_input and W_input are the height and width of the input image I_1;
Step 1-2: this feature map is input into the high-resolution network encoder; the encoder comprises 4 layers, the i-th layer comprising i convolution modules M_{i,j}, where i ≥ 1 and 0 ≤ j < i; the feature map output by convolution module M_{i,j} has resolution H_{i,j}×W_{i,j} and C_{i,j} channels, where H_{i,j} = H_0/2^j, W_{i,j} = W_0/2^j and C_{i,j} = 40×2^j;
when i ≥ 2, the output of each convolution module M_{i-1,y} of layer i-1, y ∈ [0, i-2], is processed as follows so that resolution and channel number are unified, and the results are fused by addition and input into convolution module M_{i,j} of layer i:
when y < j, j-y strided convolutions with stride 2 are applied to the feature map, each being a 3×3 convolution that doubles the number of channels and halves the resolution;
when y = j, a 3×3 convolution is applied to the feature map to extract features, with the number of channels and the resolution unchanged;
when y > j, a 3×3 convolution is applied to the feature map to change the number of channels to 40×2^j, followed by bilinear interpolation upsampling so that the resolution becomes H_0/2^j × W_0/2^j;
after passing through all convolution modules of the high-resolution encoder, feature maps at 4 different scales k = 0, 1, 2, 3 are obtained, with resolution H_0/2^k × W_0/2^k and 40×2^k channels respectively; these feature maps are brought to a common resolution by upsampling and then concatenated along the channel dimension to obtain the semantic context feature of each phase, giving the pair of semantic context features;
Step 1-3: the pair of semantic context features is input into the change feature extraction module, which takes their absolute difference to obtain d_cd and applies one 1×1 convolution to d_cd to obtain the change feature f_cd of dimension H_0×W_0×600;
Step 1-4: for phase T_1, the semantic context feature of T_1 and f_cd are input into the feature fusion module, which comprises one concatenation along the channel dimension and one 1×1 convolution that changes the number of channels; the two inputs are first concatenated along the channel dimension, and the number of channels is then compressed to 1/2 by one 1×1 convolution, giving the semantic change feature of dimension H_0×W_0×600;
Step 1-5: the semantic change feature is input into the decoder; the decoder comprises 2 convolutional layers, a bilinear interpolation upsampling operation and a softmax operation; the first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of channels of the input feature to 1/4 while keeping the resolution unchanged, i.e. H_0×W_0×150, followed by one batch normalization operation and one rectified linear unit; the second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to the number of semantic change detection classes while keeping the resolution unchanged, i.e. H_0×W_0×7; the bilinear interpolation upsampling operation restores the resolution of the feature to that of the input image, i.e. H_input×W_input×7, and the softmax operation finally normalizes it to obtain the semantic change detection result of the phase;
the semantic change detection result of phase T_2 is obtained in the same way.
The invention further adopts the technical scheme that: all convolution modules consist of a 3×3 convolution layer, a batch normalization layer and a rectified linear unit (ReLU).
The invention further adopts the technical scheme that: the semi-difficult and semi-easy sampling strategy in step 2 samples, for the anchors of each class, half difficult-to-classify samples and half easy-to-classify samples, and likewise half difficult-to-classify and half easy-to-classify samples for the negative samples of each anchor; difficult-to-classify and easy-to-classify samples are separated by a threshold δ; a difficult-to-classify sample has a binary change detection prediction probability on its class smaller than δ, and an easy-to-classify sample has one larger than δ, where the binary change detection prediction probability comes from the softmax-normalized result of the encoder.
A computer system, comprising: one or more processors, a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods described above.
A computer readable storage medium, characterized by storing computer executable instructions that when executed are configured to implement the method described above.
Advantageous effects
The invention provides a semantic change detection method for a high-resolution remote sensing image based on binary change detection contrast learning; a simple and extensible direct semantic change detection model based on a high-resolution network is constructed to carry out semantic change detection, and contrast learning loss is applied to the change detection as supervision, so that a difficult-to-classify sample caused by unbalanced classification is effectively mined, and the classification performance of the network to the difficult-to-classify sample is improved; the model provided by the invention has more accurate details on the change area, more complete shape and more accurate classification of semantic change categories.
The invention designs and constructs SFSCDNet, a simple and extensible direct semantic change detection model based on a high-resolution network, and supervises change detection with a contrast learning loss. In contrast learning, a semi-difficult and semi-easy sampling strategy is adopted so that the network converges easily while still focusing on difficult-to-classify samples. With this strategy, the contrast loss provides appropriate supervision for change detection and in particular pays more attention to samples that are hard to assign correctly to the changed or unchanged class, which improves the overall semantic change detection performance of the model.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The invention provides a semantic change detection method for high-resolution remote sensing images based on binary change detection contrast learning. It addresses the low accuracy of existing semantic change detection methods by constructing a simple and extensible direct semantic change detection model based on a high-resolution network and by using contrast learning to supervise binary change detection. The model uses two weight-sharing high-resolution networks as encoders to extract semantic context features from the bi-temporal remote sensing images. The extracted bi-temporal semantic context features are input into a change feature extraction module to obtain the change feature; for each phase, the semantic context feature and the change feature are input into a feature fusion module to obtain the semantic change feature; finally, the bi-temporal semantic change features are input into two decoders to obtain the bi-temporal semantic change detection results. To supervise change detection with contrast learning, a contrast learning feature representation head is added after the change feature extraction module to provide the change feature representation, which is then sampled with a semi-difficult and semi-easy sampling strategy to compute the contrast learning loss function. The bi-temporal semantic change detection losses and the contrast learning loss on change detection together form the overall loss function used to optimize the whole model.
The method comprises the following steps:
step 1: constructing a simple and extensible direct semantic change detection model based on a high-resolution network to detect semantic change;
The remote sensing images I_1 and I_2 of phases T_1 and T_2 are input into two weight-sharing high-resolution network encoders to obtain a pair of semantic context features. The pair is then input into the change feature extraction module, which takes the absolute difference of the two features to obtain d_cd and applies one 1×1 convolution to d_cd to obtain the change feature f_cd. For phase T_1, the semantic context feature of T_1 and f_cd are input into the feature fusion module to obtain the semantic change feature of T_1; for phase T_2, the semantic context feature of T_2 and f_cd are input into the feature fusion module to obtain the semantic change feature of T_2. Finally, the two semantic change features are input into two decoders to obtain the semantic change detection results of phases T_1 and T_2. Since the two phases are processed identically in the encoding and decoding stages, only phase T_1 is described below as an example; T_2 is handled in the same way;
Step 1-1: for phase T_1, the remote sensing image I_1 passes through a convolution module M_0 to obtain a feature map of dimension H_0×W_0×64, where H_input and W_input are the height and width of the input image I_1, H_0 = H_input/4 and W_0 = W_input/4;
Step 1-2: this feature map is input into the high-resolution network encoder; the encoder comprises 4 layers, the i-th layer comprising i convolution modules M_{i,j}, where i ≥ 1 and 0 ≤ j < i; the feature map output by convolution module M_{i,j} has resolution H_{i,j}×W_{i,j} and C_{i,j} channels, where H_{i,j} = H_0/2^j, W_{i,j} = W_0/2^j and C_{i,j} = 40×2^j;
when i ≥ 2, the output of each convolution module M_{i-1,y} of layer i-1, y ∈ [0, i-2], is processed as follows so that resolution and channel number are unified, and the results are fused by addition and input into convolution module M_{i,j} of layer i:
when y < j, j-y strided convolutions with stride 2 are applied to the feature map, each being a 3×3 convolution that doubles the number of channels and halves the resolution;
when y = j, a 3×3 convolution is applied to the feature map to extract features, with the number of channels and the resolution unchanged;
when y > j, a 3×3 convolution is applied to the feature map to change the number of channels to 40×2^j, followed by bilinear interpolation upsampling so that the resolution becomes H_0/2^j × W_0/2^j;
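The three cases above can be illustrated with a small bookkeeping sketch. It only lists which operations would align the output of M_{i-1,y} with branch j and the channel count after each operation; the function names and the string labels are hypothetical and are not taken from the patent.

```python
def alignment_ops(y, j, base_channels=40):
    """List (operation, output_channels) pairs that map branch y onto branch j."""
    if y < j:
        # j - y strided 3x3 convolutions: each doubles channels and halves resolution
        return [("conv3x3_stride2", base_channels * 2 ** (y + s + 1))
                for s in range(j - y)]
    if y == j:
        # same branch: 3x3 convolution, channels and resolution unchanged
        return [("conv3x3", base_channels * 2 ** j)]
    # y > j: 3x3 convolution to 40 * 2**j channels, then bilinear upsampling
    scale = 2 ** (y - j)
    return [("conv3x3", base_channels * 2 ** j),
            ("bilinear_upsample_x%d" % scale, base_channels * 2 ** j)]


def inputs_for(i, j):
    """Operations applied to every module of layer i-1 before addition into M_{i,j}."""
    return {y: alignment_ops(y, j) for y in range(i - 1)}  # y in [0, i-2]


for y, ops in inputs_for(3, 2).items():
    print(y, ops)
```

In this sketch, every aligned input to M_{i,j} ends with 40×2^j channels, which is what makes the element-wise addition well defined.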
after passing through all convolution modules of the high-resolution encoder, feature maps at 4 different scales k = 0, 1, 2, 3 are obtained, with resolution H_0/2^k × W_0/2^k and 40×2^k channels respectively; these feature maps are brought to a common resolution by upsampling and then concatenated along the channel dimension to obtain the semantic context feature of each phase, giving the pair of semantic context features;
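As a check on the dimensions stated above, the following sketch computes the shapes of the four encoder outputs and of their concatenation. It assumes a 512×512 input (the image size of the SECOND dataset used in the embodiment); the function names are illustrative only.

```python
def encoder_output_shapes(h_input, w_input, base_channels=40, num_scales=4):
    """Return (height, width, channels) for each scale k = 0 .. num_scales - 1."""
    h0, w0 = h_input // 4, w_input // 4  # the stem M_0 reduces resolution by 4
    return [(h0 // 2 ** k, w0 // 2 ** k, base_channels * 2 ** k)
            for k in range(num_scales)]


def concatenated_shape(shapes):
    """Shape after upsampling every scale to the finest one and concatenating channels."""
    h0, w0, _ = shapes[0]
    return (h0, w0, sum(channels for _, _, channels in shapes))


shapes = encoder_output_shapes(512, 512)
context_shape = concatenated_shape(shapes)
print(shapes)
print(context_shape)
```

The channel counts 40 + 80 + 160 + 320 sum to 600, which matches the 600-channel features used in the following steps.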
Step 1-3: the pair of semantic context features is input into the change feature extraction module, which takes their absolute difference to obtain d_cd and applies one 1×1 convolution to d_cd to obtain the change feature f_cd of dimension H_0×W_0×600;
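Step 1-3 can be sketched on toy data as follows. A 1×1 convolution is a per-pixel linear map over channels, so it is written here with plain nested lists; the tiny 1×2×2 feature maps and the identity weight are made-up values for illustration, and no bias term is modeled.

```python
def conv1x1(feature, weight):
    """Apply a 1x1 convolution: a per-pixel linear map over channels.

    feature: H x W x C_in nested lists; weight: C_out x C_in nested lists.
    """
    return [[[sum(w * x for w, x in zip(w_row, pixel)) for w_row in weight]
             for pixel in line] for line in feature]


def change_feature(f1, f2, weight):
    """Absolute difference d_cd of two feature maps followed by a 1x1 convolution."""
    d_cd = [[[abs(a - b) for a, b in zip(p1, p2)] for p1, p2 in zip(l1, l2)]
            for l1, l2 in zip(f1, f2)]
    return conv1x1(d_cd, weight)


# Toy semantic context features of the two phases (H=1, W=2, C=2).
f_t1 = [[[1.0, 2.0], [0.0, 1.0]]]
f_t2 = [[[3.0, 2.0], [1.0, -1.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
f_cd = change_feature(f_t1, f_t2, identity)
print(f_cd)
```

Because the absolute difference is symmetric in its two arguments, the same f_cd is shared by both phases.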
Step 1-4: for phase T_1, the semantic context feature of T_1 and f_cd are input into the feature fusion module, which comprises one concatenation along the channel dimension and one 1×1 convolution that changes the number of channels. The two inputs are first concatenated along the channel dimension, and the number of channels is then compressed to 1/2 by one 1×1 convolution, giving the semantic change feature of dimension H_0×W_0×600;
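The feature fusion module of step 1-4 can be sketched in the same toy setting: concatenation along the channel dimension doubles the channel count (600 + 600 = 1200 in the patent's dimensions), and the 1×1 convolution halves it again. The weight matrix below is an arbitrary averaging example, not a learned parameter.

```python
def fuse(context, change, weight):
    """Concatenate two H x W x C maps along channels, then apply a 1x1 convolution.

    weight has shape C_out x 2C; with C_out = C the channel count is halved.
    """
    fused = []
    for ctx_row, chg_row in zip(context, change):
        out_row = []
        for ctx_px, chg_px in zip(ctx_row, chg_row):
            stacked = ctx_px + chg_px  # channel-wise concatenation: C + C = 2C
            out_row.append([sum(w * x for w, x in zip(w_row, stacked))
                            for w_row in weight])
        fused.append(out_row)
    return fused


# Toy example with C = 2: four concatenated channels are compressed back to two.
context_t1 = [[[1.0, 2.0]]]
f_cd = [[[3.0, 4.0]]]
averaging = [[0.5, 0.0, 0.5, 0.0],
             [0.0, 0.5, 0.0, 0.5]]
semantic_change_t1 = fuse(context_t1, f_cd, averaging)
print(semantic_change_t1)
```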
Step 1-5: the semantic change feature is input into the decoder. The decoder contains 2 convolutional layers, a bilinear interpolation upsampling operation and a softmax operation. The first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of channels of the input feature to 1/4 while keeping the resolution unchanged, i.e. H_0×W_0×150, followed by one batch normalization operation and one rectified linear unit; the second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to the number of semantic change detection classes while keeping the resolution unchanged, i.e. H_0×W_0×7; the bilinear interpolation upsampling operation restores the resolution of the feature to that of the input image, i.e. H_input×W_input×7, and the softmax operation finally normalizes it to obtain the semantic change detection result of the phase;
the result of phase T_2 is obtained in the same way;
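The decoder's dimension changes can be traced with a short sketch. The stage labels are illustrative, the 512×512 input size is taken from the dataset used in the embodiment, and the softmax helper shows only the final per-pixel normalization over the 7 classes.

```python
import math


def decoder_shapes(h_input, w_input, in_channels=600, num_classes=7):
    """Trace (height, width, channels) through the decoder stages of step 1-5."""
    h0, w0 = h_input // 4, w_input // 4
    return [
        ("input", (h0, w0, in_channels)),
        ("conv3x3 + BN + ReLU", (h0, w0, in_channels // 4)),
        ("conv1x1", (h0, w0, num_classes)),
        ("bilinear upsample", (h_input, w_input, num_classes)),
        ("softmax", (h_input, w_input, num_classes)),
    ]


def softmax(logits):
    """Numerically stable softmax over the class logits of one pixel."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


for stage, shape in decoder_shapes(512, 512):
    print(stage, shape)
print(softmax([2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
```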
Preferably, all convolution modules consist of a 3×3 convolution layer, a batch normalization layer and a rectified linear unit (ReLU).
Step 2: supervising change detection using the contrast learning loss.
A contrast learning feature representation head is added after the change feature extraction module, and the change feature f_cd is input into this head to obtain the dense change feature representation Z_rep. The head comprises 2 convolutional layers. The first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of channels of the input change feature f_cd to 1/4 while keeping the resolution unchanged, i.e. H_0×W_0×150, followed by one batch normalization operation and one rectified linear unit; the second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to 256 while keeping the resolution unchanged, i.e. H_0×W_0×256; since the contrast loss provides supervision only during training, the contrast learning feature representation head is removed at inference; using a semi-difficult and semi-easy sampling strategy, feature vectors corresponding to samples of the changed class and the unchanged class are sampled from the change feature representation Z_rep to compute the contrast learning loss function L_c:
where C is the set of classes; since the classes are the changed class and the unchanged class, |C| = 2. z_ca is the feature vector of the a-th anchor of class c; the positive sample of class c is the mean of the feature vectors of all class-c samples; and the negative term is the feature vector of the b-th negative sample, belonging to another class, of the a-th anchor of class c. For each class in C, the feature vectors of the anchors and of their negative samples are collected from the change feature representation Z_rep; each class has A anchors, and each anchor has one positive sample and B negative samples. Here A = 512 and B = 512. ⟨·,·⟩ denotes the cosine similarity between two feature vectors, which measures the distance between them and ranges from -1 to 1; τ = 0.5 is the temperature coefficient. For each class in the current training batch, optimizing the network parameters to minimize L_c pulls each anchor of the class closer to its positive sample and pushes it away from its negative samples.
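The formula for L_c did not survive in this text, so the sketch below is only an assumption about its form: a standard InfoNCE-style loss built from the quantities defined above (cosine similarity, temperature τ = 0.5, one positive and B negatives per anchor), averaged over anchors and classes. The averaging and all function names are hypothetical; feature vectors must be non-zero for the cosine similarity to be defined.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anchor_loss(anchor, positive, negatives, tau=0.5):
    """Assumed InfoNCE-style loss for one anchor with one positive and B negatives."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


def contrast_loss(samples, tau=0.5):
    """Average the anchor losses within each class, then across classes.

    samples maps a class name to (anchors, positive, negatives_per_anchor).
    """
    per_class = []
    for anchors, positive, negatives in samples.values():
        losses = [anchor_loss(a, positive, negs, tau)
                  for a, negs in zip(anchors, negatives)]
        per_class.append(sum(losses) / len(losses))
    return sum(per_class) / len(per_class)


well_separated = anchor_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
confused = anchor_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
print(well_separated, confused)
```

Under this assumed form, an anchor aligned with its positive and opposed to its negatives yields a small loss, while the reverse arrangement yields a large one, which is the pull-closer and push-away behavior described above.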
The semi-difficult and semi-easy sampling strategy samples, for the anchors of each class, half difficult-to-classify samples and half easy-to-classify samples, and likewise half difficult-to-classify and half easy-to-classify samples for the negative samples of each anchor. Difficult-to-classify and easy-to-classify samples are separated by a threshold δ: a difficult-to-classify sample has a binary change detection prediction probability on its class smaller than δ, and an easy-to-classify sample has one larger than δ, where the binary change detection prediction probability comes from the softmax-normalized result of the encoder. Difficult-to-classify samples direct the network's attention to hard cases, but too many of them make the network hard to converge. Selecting half difficult and half easy samples therefore lets the network converge easily while still focusing on difficult samples. With this strategy, the contrast loss provides appropriate supervision for change detection and in particular pays more attention to samples that are hard to assign correctly to the changed or unchanged class, which improves the overall semantic change detection performance of the model.
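A minimal sketch of the half-difficult, half-easy selection follows. The text does not give a value for δ, does not say how a probability exactly equal to δ is treated, and does not say what happens when one group is too small; the value δ = 0.5, the assignment of ties to the easy group, and the fallback of filling up with easy samples are all assumptions made for this example.

```python
import random


def half_hard_half_easy(probabilities, num_samples, delta, rng=None):
    """Select pixel indices: about half with probability < delta, the rest >= delta.

    probabilities holds, for each candidate pixel of one class, the predicted
    probability of that class. Returns (hard_indices, easy_indices).
    """
    rng = rng or random.Random(0)  # fixed seed only to keep the example repeatable
    hard = [i for i, p in enumerate(probabilities) if p < delta]
    easy = [i for i, p in enumerate(probabilities) if p >= delta]
    n_hard = min(len(hard), num_samples // 2)
    # Assumption: if hard samples run short, the remainder is drawn from easy ones.
    n_easy = min(len(easy), num_samples - n_hard)
    return rng.sample(hard, n_hard), rng.sample(easy, n_easy)


probs = [0.1, 0.2, 0.3, 0.9, 0.8, 0.95, 0.7, 0.4]
hard_idx, easy_idx = half_hard_half_easy(probs, num_samples=4, delta=0.5)
print(hard_idx, easy_idx)
```

In the method itself the same selection would be applied with A = 512 anchors per class and B = 512 negatives per anchor.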
Step 3: optimizing the parameters of the model by minimizing the overall loss function L, which consists of the semantic change detection losses of T_1 and T_2 and the contrast learning loss function L_c on change detection. The semantic change detection losses are cross-entropy losses, described as:
where T is the number of pixels, and the remaining symbols denote, for phases T_1 and T_2 respectively, the ground-truth label and the prediction probability of the corresponding decoder at the t-th pixel.
The overall loss function L is described as:
After the overall loss function is obtained, back propagation is performed and the AdamW optimizer is used for optimization; iteration is repeated until the number of iterations reaches the preset value, at which point training is deemed complete.
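The loss formulas are missing from this text, so the following is a sketch under stated assumptions: the cross-entropy is taken as the mean over the T pixels of the negative log-probability of the ground-truth class, and the overall loss is taken as the sum of the two semantic change detection losses and L_c. The weight on L_c (default 1.0) is a hypothetical parameter; the patent may combine the terms differently.

```python
import math


def cross_entropy(labels, probabilities, eps=1e-12):
    """Mean pixel-wise cross entropy.

    labels: ground-truth class index per pixel (length T);
    probabilities: per-pixel class distributions from a decoder's softmax.
    """
    total = sum(math.log(p[y] + eps) for y, p in zip(labels, probabilities))
    return -total / len(labels)


def overall_loss(loss_t1, loss_t2, loss_c, weight_c=1.0):
    """Assumed combination: both semantic change detection losses plus L_c."""
    return loss_t1 + loss_t2 + weight_c * loss_c


# Two pixels, two classes: one confident correct prediction, one uniform guess.
loss_t1 = cross_entropy([0, 1], [[1.0, 0.0], [0.5, 0.5]])
loss_t2 = cross_entropy([1], [[0.0, 1.0]])
print(loss_t1, loss_t2, overall_loss(loss_t1, loss_t2, loss_c=0.25))
```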
Examples:
Fig. 1 shows the model of the semantic change detection method for high-resolution remote sensing images based on binary change detection contrast learning. The method comprises the high-resolution remote sensing image semantic change detection model SFSCDNet and a contrast learning algorithm on change detection. The structure of the high-resolution encoder is shown in Fig. 2, and the structure of the contrast learning feature representation head is shown in Fig. 3. The specific method comprises the following steps:
S1: the remote sensing images I_1 and I_2 of phases T_1 and T_2 are input into two weight-sharing high-resolution network encoders to obtain a pair of semantic context features. The pair is then input into the change feature extraction module, which takes the absolute difference of the two features to obtain d_cd and applies one 1×1 convolution to d_cd to obtain the change feature f_cd. For phase T_1, the semantic context feature of T_1 and f_cd are input into the feature fusion module to obtain the semantic change feature of T_1; for phase T_2, the semantic context feature of T_2 and f_cd are input into the feature fusion module to obtain the semantic change feature of T_2. Finally, the two semantic change features are input into two decoders to obtain the semantic change detection results of phases T_1 and T_2;
S2: a contrast learning feature representation head is added after the change feature extraction module, the change feature f_cd is input into this head to obtain the dense change feature representation Z_rep, and change detection is supervised using the contrast learning loss;
S3: the model parameters are optimized with the AdamW optimizer by minimizing the overall loss function L, which consists of the semantic change detection losses of T_1 and T_2 and the contrast learning loss function L_c on change detection.
In this embodiment, the network that executes step S1 is referred to as SFSCDNet. The execution of steps S1-S3 is described in further detail below in connection with the structure of SFSCDNet.
In this embodiment, referring to Fig. 1 and Fig. 2, step S1 inputs the remote sensing images I_1 and I_2 of phases T_1 and T_2, each of dimension H_input×W_input×3, into weight-sharing high-resolution network encoders pre-trained on ImageNet to obtain a pair of semantic context features of dimension H_input/4×W_input/4×600. The change feature extraction module takes the absolute difference of the pair to obtain d_cd and applies one 1×1 convolution to d_cd to obtain the change feature f_cd of dimension H_input/4×W_input/4×600. For phase T_1, the semantic context feature of T_1 and f_cd are input into the feature fusion module, where they are first concatenated along the channel dimension and the number of channels is then compressed to 1/2 by one 1×1 convolution, giving the semantic change feature of dimension H_input/4×W_input/4×600. In the same way, the semantic change feature of T_2, of dimension H_input/4×W_input/4×600, is obtained through the feature fusion module. The two semantic change features are input into two decoders that have the same structure but do not share weights. Each decoder contains 2 convolutional layers, a bilinear interpolation upsampling operation and a softmax operation. The first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of channels of the input feature to 1/4 while keeping the resolution unchanged, i.e. H_input/4×W_input/4×150, followed by one batch normalization operation and one rectified linear unit; the second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to the number of semantic change detection classes while keeping the resolution unchanged, i.e. H_input/4×W_input/4×7; the bilinear interpolation upsampling operation restores the resolution of the feature to that of the input image, and the softmax operation finally normalizes it to obtain the semantic change detection results of phases T_1 and T_2.
To supervise change detection within semantic change detection, step S2 adds a contrast learning feature representation head after the change feature extraction module and supervises change detection with the contrast learning loss function. The change feature f_cd is input into the head to obtain the dense change feature representation Z_rep. The structure of the head is shown in Fig. 3. It comprises 2 convolutional layers: the first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of channels of the change feature f_cd to 1/4 while keeping the resolution unchanged, i.e. H_0×W_0×150, followed by one batch normalization operation and one rectified linear unit; the second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to 256 while keeping the resolution unchanged, i.e. H_0×W_0×256. Since the contrast loss provides supervision only during training, the contrast learning feature representation head is removed at inference.
Using a semi-difficult and semi-easy sampling strategy, step S2 samples from the change feature representation Z_rep the feature vectors corresponding to samples of the changed class and the unchanged class to compute the contrast learning loss function L_c:
where C is the set of classes; since the classes are the changed class and the unchanged class, |C| = 2. z_ca is the feature vector of the a-th anchor of class c; the positive sample of class c is the mean of the feature vectors of all class-c samples; and the negative term is the feature vector of the b-th negative sample, belonging to another class, of the a-th anchor of class c. For each class in C, the feature vectors of the anchors and of their negative samples are collected from the change feature representation Z_rep; each class has A anchors, and each anchor has one positive sample and B negative samples. Here A = 512 and B = 512. ⟨·,·⟩ denotes the cosine similarity between two feature vectors, which measures the distance between them and ranges from -1 to 1; τ = 0.5 is the temperature coefficient. For each class in the current training batch, optimizing the network parameters to minimize L_c pulls each anchor of the class closer to its positive sample and pushes it away from its negative samples.
The semi-difficult and semi-easy sampling strategy samples, for the anchors of each class, half difficult-to-classify samples and half easy-to-classify samples, and likewise half difficult-to-classify and half easy-to-classify samples for the negative samples of each anchor. Difficult-to-classify and easy-to-classify samples are separated by a threshold δ: a difficult-to-classify sample has a binary change detection prediction probability on its class smaller than δ, and an easy-to-classify sample has one larger than δ, where the binary change detection prediction probability comes from the softmax-normalized result of the encoder. Difficult-to-classify samples direct the network's attention to hard cases, but too many of them make the network hard to converge. Selecting half difficult and half easy samples therefore lets the network converge easily while still focusing on difficult samples. With this strategy, the contrast loss provides appropriate supervision for change detection and in particular pays more attention to samples that are hard to assign correctly to the changed or unchanged class, which improves the overall semantic change detection performance of the model.
The overall loss function L of step S3 is described as:
where the semantic change detection losses of T_1 and T_2 are cross-entropy losses, described as:
where T is the number of pixels, and the remaining symbols denote, for phases T_1 and T_2 respectively, the ground-truth label and the prediction probability of the corresponding decoder at the t-th pixel.
After the overall loss function is obtained, back propagation is performed and the AdamW optimizer is used for optimization; iteration is repeated until the number of iterations reaches the preset value, at which point training is deemed complete.
To verify the effectiveness of the method, this embodiment uses the public dataset SECOND to train and test the network and compares the result with other methods. The SECOND dataset contains 2968 groups of data; each group contains two images of different phases, each image is 512×512 pixels, and all 2968 groups contain changed regions. The data are split into a training set and a test set at a ratio of 9:1.
The algorithm proposed in this embodiment is compared with 7 recent change detection methods, namely DSCD, SCDS, ICDS, ChangeMask, HBSCD, Bi-SRNet and SCDNet; the results are shown in Table 1. Three evaluation metrics are used: mIoU, SeK and Score. As Table 1 shows, this embodiment achieves 73.83%, 26.37% and 40.61% on mIoU, SeK and Score respectively, the best result on all three metrics. Compared with SCDNet, which has the second-best SeK, the method improves mIoU by 0.77, SeK by 2.71 and Score by 2.02 percentage points. Fig. 4 compares three groups of semantic change detection results of the method of this embodiment with those of other existing methods. The first, second and third rows of Fig. 4 show that the results of this method on water surfaces and trees are very close to the ground truth, with the semantic change regions completely predicted and their outlines clear, whereas the other methods produce false alarms or missed detections.
Table 1. Comparison of test results of the method of this embodiment and other existing methods

| Methods | mIoU(%) | SeK(%) | Score(%) |
| --- | --- | --- | --- |
| DSCD | 62.45 | 10.20 | 25.88 |
| SCDS | 69.18 | 14.96 | 31.22 |
| ICDS | 71.95 | 21.83 | 36.86 |
| ChangeMask | - | 17.89 | - |
| HBSDC | 72.40 | 21.46 | 36.74 |
| Bi-SRNet | 73.41 | 23.22 | 38.59 |
| SCDNet | 73.06 | 23.66 | 38.59 |
| SFSCDNet + contrast learning | 73.83 | 26.37 | 40.61 |
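The improvements quoted for this embodiment can be re-derived from the values in Table 1. The sketch below transcribes the table (method names as written in the table, with None for entries not reported) and computes the gains over SCDNet as well as the best method per metric; the variable and function names are illustrative.

```python
# Values transcribed from Table 1 as (mIoU, SeK, Score); None marks unreported entries.
TABLE_1 = {
    "DSCD": (62.45, 10.20, 25.88),
    "SCDS": (69.18, 14.96, 31.22),
    "ICDS": (71.95, 21.83, 36.86),
    "ChangeMask": (None, 17.89, None),
    "HBSDC": (72.40, 21.46, 36.74),
    "Bi-SRNet": (73.41, 23.22, 38.59),
    "SCDNet": (73.06, 23.66, 38.59),
    "SFSCDNet + contrast learning": (73.83, 26.37, 40.61),
}


def improvement(method, baseline, table=TABLE_1):
    """Per-metric difference in percentage points, rounded to two decimals."""
    return tuple(round(m - b, 2) for m, b in zip(table[method], table[baseline]))


def best_per_metric(table=TABLE_1):
    """Name of the best-scoring method for each metric, ignoring unreported entries."""
    best = []
    for k in range(3):
        reported = {name: row[k] for name, row in table.items() if row[k] is not None}
        best.append(max(reported, key=reported.get))
    return best


gains = improvement("SFSCDNet + contrast learning", "SCDNet")
print(gains)
print(best_per_metric())
```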
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.