CN116524346B

CN116524346B - Semantic change detection method for high-resolution remote sensing images based on contrastive learning for binary change detection

Info

Publication number: CN116524346B
Application number: CN202310203834.1A
Authority: CN
Inventors: 张艳宁; 张秀伟; 杨一哲; 于雷; 田牧; 安博远; 邢颖慧
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2023-03-06
Filing date: 2023-03-06
Publication date: 2024-08-09
Anticipated expiration: 2043-03-06
Also published as: CN116524346A

Abstract

The present invention relates to a semantic change detection method for high-resolution remote sensing images based on contrastive learning for binary change detection. A simple and scalable direct semantic change detection model based on a high-resolution network is constructed to perform semantic change detection, and a contrastive learning loss is used as supervision on change detection to effectively mine difficult-to-classify samples caused by class imbalance, thereby improving the classification performance of the network on such samples. In the contrastive learning, a semi-difficult and semi-easy sampling strategy is adopted so that the network converges easily while still attending to difficult-to-classify samples. With this sampling strategy, the contrastive loss provides appropriate supervision for change detection, in particular by attending more to samples that are difficult to classify correctly as changed or unchanged, thereby improving the overall semantic change detection performance of the model. The change regions produced by the present invention have more precise details and more complete shapes, and the classification of semantic change categories is more accurate.

Description

High-resolution remote sensing image semantic change detection method based on contrastive learning for binary change detection

Technical Field

The invention belongs to the technical field of remote sensing image processing, and particularly relates to a high-resolution remote sensing image semantic change detection method based on contrastive learning for binary change detection.

Background

Semantic change detection is a key and challenging task in remote sensing image interpretation. It uses multi-temporal remote sensing images of the same geographic location to detect, locate, and analyze 'from-to' semantic changes in land cover type, and it plays an important role in fields such as urban planning, environmental monitoring, and disaster assessment.

In recent years, with the rapid development of deep learning and the availability of large numbers of multi-temporal high-resolution remote sensing images, deep-learning-based semantic change detection methods have made great progress and clearly outperform traditional methods. To address semantic segmentation and change detection at the same time, most existing deep-learning-based methods focus on designing a suitable network structure that effectively encodes and integrates semantic context information and change information. For example, Daudt et al., in "Multitask learning for large-scale semantic change detection", explored and compared four common deep-learning-based network structures for semantic change detection: comparing the semantic segmentation results of the two phases to obtain the semantic change detection result; performing semantic change detection directly; performing semantic segmentation and change detection separately; and providing semantic segmentation information to change detection in the decoder stage. Peng et al., in "SCDNet: A novel convolutional network for semantic change detection in high resolution optical remote sensing imagery", proposed SCDNet, which uses weight-sharing ResNet encoders to extract semantic context information and change information in the encoder stage and integrates them in the decoder stage to perform semantic change detection directly. Zheng et al., in "ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection", decouple SCD into two semantic segmentation tasks and one change detection task; they use EfficientNet as the encoder, UNet++ as the decoder for semantic segmentation, a transformer to extract change information, and UNet++ as the decoder for change detection, and finally generate the semantic change detection result from the semantic segmentation and change detection results.

However, existing deep-learning-based semantic change detection methods have several problems:

1. Some existing methods split the semantic change detection task into a semantic segmentation task and a change detection task and obtain the final result by integrating the two outputs, which consumes a large amount of memory and computing resources. Other methods perform semantic change detection directly, but cannot integrate semantic segmentation and change detection information while keeping the model simple, efficient, and extensible. It is therefore necessary to design a simple and extensible network that performs semantic change detection directly.

2. When semantic change detection is performed directly, the quality of binary change detection strongly affects the quality of the overall semantic change detection, so it is necessary to add extra supervision for change detection.

3. High-resolution semantic change detection datasets suffer from serious class imbalance between the unchanged and changed classes: samples of the unchanged class account for more than 90% of all samples, far more than samples of the changed class. During training, the network is dominated by the high-proportion unchanged class and performs poorly on the low-proportion changed class, so a large number of samples that are difficult to classify correctly appear. How to attend to and exploit these difficult samples so that the network classifies them correctly is both important and difficult.

Disclosure of Invention

Technical problem to be solved

To address the insufficient accuracy of existing semantic change detection results, the invention provides a high-resolution remote sensing image semantic change detection method based on contrastive learning for binary change detection.

Technical solution

A high-resolution remote sensing image semantic change detection method based on contrastive learning for binary change detection, characterized by comprising the following steps:

Step 1: constructing a simple and extensible direct semantic change detection model based on a high-resolution network to detect semantic change;

The remote sensing images I¹ and I² of the T¹ and T² phases are input into two weight-sharing high-resolution network encoders to obtain a pair of semantic context features. The pair is input into the change feature extraction module, which takes the absolute difference of the two features to obtain d_cd and applies one 1×1 convolution to d_cd to obtain the change feature f_cd. For the T¹ phase, the T¹ semantic context feature and f_cd are input into the feature fusion module to obtain the T¹ semantic change feature; for the T² phase, the T² semantic context feature and f_cd are input into the feature fusion module to obtain the T² semantic change feature. Finally, the two semantic change features are input into two decoders to obtain the semantic change detection results of the T¹ and T² phases;

Step 2: supervising change detection with the contrastive learning loss;

A contrastive learning feature representation head is added after the change feature extraction module. The change feature f_cd is input into this head to obtain the dense change feature representation Z_rep. The head comprises 2 convolution layers. The first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of channels of the input change feature f_cd to 1/4 while keeping the resolution unchanged, giving H₀×W₀×150, followed by one batch normalization operation and one rectified linear unit (ReLU). The second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to 256 while keeping the resolution unchanged, giving H₀×W₀×256. Since the contrastive loss provides supervision only during training, the contrastive learning feature representation head is removed during inference. Using the semi-difficult and semi-easy sampling strategy, feature vectors corresponding to samples of the changed and unchanged categories are sampled from the change feature representation Z_rep to calculate the contrastive learning loss function L_c:

where, since the categories are the changed category and the unchanged category, ‖C‖ = 2; z_ca is the feature vector of the a-th anchor of class c; the positive sample of class c is the mean of the feature vectors of all class-c samples; and the negative samples are the feature vectors of samples from other classes associated with the a-th anchor of class c, indexed by b. For each category in C, the feature vectors of the anchors and of their negative samples are collected from the change feature representation Z_rep; each category has A anchors, and each anchor has one positive sample and B negative samples. The cosine similarity between two feature vectors is used to measure their distance and ranges from -1 to 1, and τ = 0.5 is the temperature coefficient. For each category in the current training batch, the network parameters are optimized to minimize L_c, which pulls each anchor closer to its positive sample and pushes it away from its negative samples;
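The equation for L_c is not reproduced in this text. A form consistent with the definitions above, assuming the loss follows the common InfoNCE pattern (one positive and B negatives per anchor, averaged over anchors and categories), can be sketched as follows. The symbols z_c^+ for the positive sample and z_cab^- for the b-th negative sample are notation introduced here for illustration only.

```latex
L_c = \frac{1}{\lVert C \rVert} \sum_{c \in C} \frac{1}{A} \sum_{a=1}^{A}
  -\log \frac{\exp\left(\langle z_{ca}, z_{c}^{+} \rangle / \tau\right)}
             {\exp\left(\langle z_{ca}, z_{c}^{+} \rangle / \tau\right)
              + \sum_{b=1}^{B} \exp\left(\langle z_{ca}, z_{cab}^{-} \rangle / \tau\right)}
```

Under this assumed form, a larger cosine similarity to the positive sample and a smaller one to the negatives both reduce the loss.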

Step 3: optimizing the parameters of the model by minimizing the overall loss function L, which consists of the semantic change detection losses of the T¹ and T² phases and the contrastive learning loss function L_c on change detection; the two semantic change detection losses are cross-entropy losses, described as:

where T is the number of pixels, and the remaining symbols denote, for each of the two phases, the ground-truth label and the prediction probability of the corresponding decoder at the t-th pixel;
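The cross-entropy equation is not reproduced in this text. Assuming the standard per-pixel cross-entropy averaged over the T pixels, each semantic change detection loss could be written as below. The symbols are introduced here for illustration: i indexes the phase, K is the number of semantic change detection categories, y is the one-hot ground-truth label, and p is the softmax output of the corresponding decoder.

```latex
L_{scd}^{i} = -\frac{1}{T} \sum_{t=1}^{T} \sum_{k=1}^{K} y_{t,k}^{i} \log p_{t,k}^{i},
\qquad i \in \{1, 2\}
```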

The overall loss function L is described as:

After the overall loss function is obtained, back propagation is performed and the parameters are optimized with the AdamW optimizer; iteration is repeated until the number of iterations reaches a preset value, at which point training is judged complete.

In a further technical solution of the invention, Step 1 is specifically as follows:

Step 1-1: for the T¹ phase, the remote sensing image I¹ is passed through a convolution module M₀ to obtain a feature map of dimension H₀×W₀×64, where H₀ = H_input/4 and W₀ = W_input/4, and H_input and W_input are the height and width of the input image I¹;

Step 1-2: the feature map from Step 1-1 is input into the high-resolution network encoder. The encoder comprises 4 layers, where the i-th layer comprises i convolution modules M_i,j, with i ≥ 1 and 0 ≤ j < i. The feature map output by convolution module M_i,j has resolution H_i,j×W_i,j and C_i,j channels, where H_i,j = H₀/2^j, W_i,j = W₀/2^j, and C_i,j = 40×2^j;

When i ≥ 2, the output of each convolution module M_i-1,y of layer i-1, y ∈ [0, i-2], is processed as follows so that resolution and channel number are unified, and the results are fused by addition and input into convolution module M_i,j of layer i:

when y < j, the feature map undergoes j-y strided convolutions with stride 2, each of which is a 3×3 convolution that doubles the number of channels and halves the resolution;

when y = j, the feature map undergoes a 3×3 convolution that extracts features while keeping the number of channels and the resolution unchanged;

when y > j, the feature map undergoes a 3×3 convolution that changes the number of channels to 40×2^j, and is then up-sampled by bilinear interpolation so that the resolution becomes H₀/2^j×W₀/2^j;
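The three cases above can be summarized as a small lookup. The following is a minimal sketch, not the patented implementation: the function name `unify_ops` and the operation labels are hypothetical, and the code only lists which operations would be applied and the channel count after each one.

```python
def unify_ops(y, j, base_channels=40):
    """List the operations that map branch y of layer i-1 onto branch j of layer i.

    Each entry is (operation label, value): the output channel count for a
    convolution, or the scale factor for bilinear up-sampling.
    """
    if y < j:
        # j - y strided 3x3 convolutions; each doubles channels and halves resolution.
        return [("conv3x3_stride2", base_channels * 2 ** (s + 1)) for s in range(y, j)]
    if y == j:
        # Same scale: one 3x3 convolution, channels and resolution unchanged.
        return [("conv3x3", base_channels * 2 ** j)]
    # y > j: set channels to 40 * 2^j, then up-sample by a factor of 2^(y - j).
    return [("conv3x3", base_channels * 2 ** j), ("bilinear_upsample", 2 ** (y - j))]


down = unify_ops(0, 2)   # high-resolution branch feeding a lower-resolution one
same = unify_ops(1, 1)
up = unify_ops(2, 0)     # low-resolution branch feeding the highest-resolution one
print(down, same, up)
```

After these operations every incoming feature map has the resolution and channel count of branch j, so the maps can be added element-wise.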

After passing through all convolution modules of the high-resolution encoder, feature maps at 4 different scales are obtained, indexed by k = 0, 1, 2, 3; the resolution and channel number of the k-th feature map are H₀/2^k×W₀/2^k and 40×2^k, respectively. The four feature maps are brought to a common resolution by up-sampling and then concatenated along the channel dimension to obtain the semantic context feature of the phase; doing this for both phases gives the semantic context feature pair;
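The 600-channel figure used in the following steps can be checked from the branch definitions: 40 + 80 + 160 + 320 = 600. The sketch below does this bookkeeping; the 512×512 input size and the function names are assumptions chosen for illustration.

```python
def branch_shape(h0, w0, k, base_channels=40):
    """Shape of the k-th encoder output: H0/2^k x W0/2^k with 40 * 2^k channels."""
    return h0 // 2 ** k, w0 // 2 ** k, base_channels * 2 ** k


def encoder_output_shape(h_input, w_input, num_branches=4):
    # The stem module M0 reduces the input resolution by a factor of 4.
    h0, w0 = h_input // 4, w_input // 4
    shapes = [branch_shape(h0, w0, k) for k in range(num_branches)]
    # After up-sampling every branch to H0 x W0, concatenation sums the channels.
    total_channels = sum(c for _, _, c in shapes)
    return shapes, (h0, w0, total_channels)


shapes, fused = encoder_output_shape(512, 512)  # 512 x 512 is an assumed input size
print(shapes)
print(fused)
```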

Step 1-3: the semantic context feature pair is input into the change feature extraction module, which takes the absolute difference of the two features to obtain d_cd and applies one 1×1 convolution to d_cd to obtain the change feature f_cd of dimension H₀×W₀×600;

Step 1-4: for the T¹ phase, the T¹ semantic context feature and f_cd are input into the feature fusion module, which comprises one concatenation along the channel dimension and one 1×1 convolution that changes the number of channels; the two inputs are first concatenated along the channel dimension, and the number of channels is then compressed to 1/2 by the 1×1 convolution, giving the T¹ semantic change feature of dimension H₀×W₀×600;
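Steps 1-3 and 1-4 can be illustrated on a toy feature map. The following is a minimal sketch in plain Python, assuming a feature map is stored as a list of per-pixel channel vectors and a 1×1 convolution is a per-pixel linear map without bias. The 2-channel sizes and all weights are hypothetical stand-ins for the 600-channel features and learned parameters.

```python
def conv1x1(feat, weight):
    """1x1 convolution: feat is [pixels][C_in], weight is [C_out][C_in]."""
    return [[sum(w * x for w, x in zip(row, px)) for row in weight] for px in feat]


def change_feature(f1, f2, weight):
    # Step 1-3: absolute difference d_cd, then one 1x1 convolution gives f_cd.
    d_cd = [[abs(a - b) for a, b in zip(p1, p2)] for p1, p2 in zip(f1, f2)]
    return conv1x1(d_cd, weight)


def fuse(f_sem, f_cd, weight):
    # Step 1-4: concatenate along channels, then halve the channels with a 1x1 convolution.
    concat = [p + q for p, q in zip(f_sem, f_cd)]
    return conv1x1(concat, weight)


# Toy example: 2 pixels, 2 channels per phase (the text uses 600 channels).
f1 = [[1.0, 2.0], [0.0, 3.0]]
f2 = [[1.0, 0.0], [2.0, 3.0]]
f_cd = change_feature(f1, f2, weight=[[1, 0], [0, 1]])        # identity weights
fused = fuse(f1, f_cd, weight=[[1, 0, 1, 0], [0, 1, 0, 1]])   # 4 channels -> 2
print(f_cd, fused)
```

The fusion weight here maps 4 concatenated channels to 2, mirroring the 1200-to-600 compression described in the text.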

Step 1-5: the T¹ semantic change feature is input into the T¹ decoder. The decoder comprises 2 convolution layers, a bilinear interpolation up-sampling operation, and a softmax operation. The first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of input channels to 1/4 while keeping the resolution unchanged, giving H₀×W₀×150, followed by one batch normalization operation and one rectified linear unit (ReLU). The second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to the number of semantic change detection categories while keeping the resolution unchanged, giving H₀×W₀×7. The bilinear interpolation up-sampling operation restores the resolution to that of the input image, giving H_input×W_input×7, and the softmax operation finally normalizes the output to give the semantic change detection result of the T¹ phase;
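The final softmax step turns the 7 per-pixel scores into a probability distribution over the semantic change categories. The sketch below shows this for a single pixel; the logit values are made up for illustration, and taking the arg-max as the predicted category is an assumption about how the result is read out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's category scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical decoder output at one pixel: one score per category (7 in the text).
logits = [0.2, 1.5, -0.3, 0.0, 2.1, -1.0, 0.4]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```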

The semantic change detection result of the T² phase is obtained in the same way.

In a further technical solution of the invention, all convolution modules consist of a 3×3 convolution layer, a batch normalization layer, and a rectified linear unit (ReLU).

In a further technical solution of the invention, the semi-difficult and semi-easy sampling strategy in Step 2 samples, for the anchors of each category, half difficult-to-classify samples and half easy-to-classify samples, and likewise, for the negative samples of each anchor, half difficult-to-classify samples and half easy-to-classify samples. Difficult-to-classify and easy-to-classify samples are separated by a threshold δ: the binary change detection prediction probability of a difficult-to-classify sample on its corresponding category is smaller than δ, and that of an easy-to-classify sample is larger than δ, where the binary change detection prediction probability comes from the softmax-normalized result of the encoder.
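The sampling rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the value δ = 0.5, uniform random selection within each half, treating a probability equal to δ as easy, and topping up with easy samples when there are too few hard ones are all choices made here for the example.

```python
import random


def semi_hard_semi_easy(indices, probs, num_samples, delta=0.5, rng=random):
    """Pick pixel indices of one category: half hard, half easy.

    probs[i] is the predicted probability of pixel i for its own category;
    a pixel counts as hard when that probability is below delta.
    """
    hard = [i for i in indices if probs[i] < delta]
    easy = [i for i in indices if probs[i] >= delta]
    n_hard = min(len(hard), num_samples // 2)
    # Assumed fallback: fill any shortfall of hard samples with easy ones.
    n_easy = min(len(easy), num_samples - n_hard)
    return rng.sample(hard, n_hard) + rng.sample(easy, n_easy)


indices = list(range(10))
probs = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9, 0.95]
picked = semi_hard_semi_easy(indices, probs, num_samples=4, rng=random.Random(0))
print(picked)
```

The same function could be applied once to choose anchors and once per anchor to choose negatives from the other category.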

A computer system, comprising: one or more processors, a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods described above.

A computer readable storage medium, characterized by storing computer executable instructions that when executed are configured to implement the method described above.

Advantageous effects

The invention provides a semantic change detection method for high-resolution remote sensing images based on contrastive learning for binary change detection. A simple and extensible direct semantic change detection model based on a high-resolution network is constructed to perform semantic change detection, and a contrastive learning loss is applied to change detection as supervision, so that difficult-to-classify samples caused by class imbalance are effectively mined and the classification performance of the network on these samples is improved. The model provided by the invention yields more accurate details and more complete shapes in the change regions, and classifies semantic change categories more accurately.

The invention designs and constructs SFSCDNet, a simple and extensible direct semantic change detection model based on a high-resolution network, and supervises change detection with a contrastive learning loss. In the contrastive learning, a semi-difficult and semi-easy sampling strategy is adopted so that the network converges easily while still attending to difficult-to-classify samples. With this sampling strategy, the contrastive loss provides appropriate supervision for change detection, in particular by attending more to samples that are difficult to classify correctly as changed or unchanged, thereby improving the overall semantic change detection performance of the model.

Drawings

The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, like reference numerals being used to refer to like parts throughout the several views.

Fig. 1 is a network configuration diagram of a method according to an embodiment of the present invention.

Fig. 2 is a network configuration diagram of the high resolution encoder of the present invention.

FIG. 3 is a structure diagram of the contrastive learning feature representation head in the network model according to an embodiment of the present invention.

FIG. 4 is a graph comparing semantic change detection results of the method of the embodiment of the present invention with other prior methods.

Detailed Description

To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and examples. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.

The invention provides a semantic change detection method for high-resolution remote sensing images based on contrastive learning for binary change detection. It addresses the low accuracy of existing semantic change detection methods by constructing a simple and extensible direct semantic change detection model based on a high-resolution network and by using contrastive learning to supervise binary change detection. The model uses two weight-sharing high-resolution networks as encoders to extract semantic context features from the bi-temporal remote sensing images. The extracted bi-temporal semantic context features are input into a change feature extraction module to obtain a change feature. For each phase, the semantic context feature and the change feature are input into a feature fusion module to obtain a semantic change feature. Finally, the bi-temporal semantic change features are input into two decoders to obtain the bi-temporal semantic change detection results. To supervise change detection with contrastive learning, a contrastive learning feature representation head is added after the change feature extraction module to provide a change feature representation, which is then sampled with a semi-difficult and semi-easy sampling strategy to calculate the contrastive learning loss function. The bi-temporal semantic change detection losses and the contrastive learning loss on change detection together form the overall loss function used to optimize the whole model.

The method comprises the following steps:

Step 1: the remote sensing images I¹ and I² of the T¹ and T² phases are input into two weight-sharing high-resolution network encoders to obtain a pair of semantic context features. The pair is then input into the change feature extraction module, which takes the absolute difference of the two features to obtain d_cd and applies one 1×1 convolution to d_cd to obtain the change feature f_cd. For the T¹ phase, the T¹ semantic context feature and f_cd are input into the feature fusion module to obtain the T¹ semantic change feature; for the T² phase, the T² semantic context feature and f_cd are input into the feature fusion module to obtain the T² semantic change feature. Finally, the two semantic change features are input into two decoders to obtain the semantic change detection results of the T¹ and T² phases. Since the two phases are handled identically in the encoding and decoding stages, only the T¹ phase is described below as an example; the same applies to T²;

Step 1-1: for the T¹ phase, the remote sensing image I¹ is passed through a convolution module M₀ to obtain a feature map of dimension H₀×W₀×64, where H_input and W_input are the height and width of the input image I¹, H₀ = H_input/4, and W₀ = W_input/4;

Step 1-4: for the T¹ phase, the T¹ semantic context feature and f_cd are input into the feature fusion module, which comprises one concatenation along the channel dimension and one 1×1 convolution that changes the number of channels. The two inputs are first concatenated along the channel dimension, and the number of channels is then compressed to 1/2 by the 1×1 convolution, giving the T¹ semantic change feature of dimension H₀×W₀×600;

Step 1-5: the T¹ semantic change feature is input into the T¹ decoder. The decoder contains 2 convolution layers, a bilinear interpolation up-sampling operation, and a softmax operation. The first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of input channels to 1/4 while keeping the resolution unchanged, giving H₀×W₀×150, followed by one batch normalization operation and one rectified linear unit (ReLU). The second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to the number of semantic change detection categories while keeping the resolution unchanged, giving H₀×W₀×7. The bilinear interpolation up-sampling operation restores the resolution to that of the input image, giving H_input×W_input×7, and the softmax operation finally normalizes the output to give the semantic change detection result of the T¹ phase;

The same applies to T²;

Preferably, all convolution modules consist of a 3×3 convolution layer, a batch normalization layer, and a rectified linear unit (ReLU).

Step 2: supervising change detection with the contrastive learning loss.

A contrastive learning feature representation head is added after the change feature extraction module. The change feature f_cd is input into this head to obtain the dense change feature representation Z_rep. The head comprises 2 convolution layers. The first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of channels of the input change feature f_cd to 1/4 while keeping the resolution unchanged, giving H₀×W₀×150, followed by one batch normalization operation and one rectified linear unit (ReLU). The second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to 256 while keeping the resolution unchanged, giving H₀×W₀×256. Since the contrastive loss provides supervision only during training, the contrastive learning feature representation head is removed during inference. Using the semi-difficult and semi-easy sampling strategy, feature vectors corresponding to samples of the changed and unchanged categories are sampled from the change feature representation Z_rep to calculate the contrastive learning loss function L_c:

where, since the categories are the changed category and the unchanged category, ‖C‖ = 2; z_ca is the feature vector of the a-th anchor of class c; the positive sample of class c is the mean of the feature vectors of all class-c samples; and the negative samples are the feature vectors of samples from other classes associated with the a-th anchor of class c, indexed by b. For each class in C, the feature vectors of the anchors and of their negative samples are collected from the change feature representation Z_rep; each class has A anchors, and each anchor has one positive sample and B negative samples. Here, A = 512 and B = 512. The operator ⟨·,·⟩ is the cosine similarity between two feature vectors, used to measure their distance and ranging from -1 to 1, and τ = 0.5 is the temperature coefficient. For each class in the current training batch, the network parameters are optimized to minimize L_c, which pulls each anchor closer to its positive sample and pushes it away from its negative samples.
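The definitions above can be turned into a small numerical sketch. The code below assumes the loss has the common InfoNCE form (the exact equation is not legible in this text), uses the cosine similarity and τ = 0.5 stated above, and replaces A = B = 512 with a single anchor and a single negative per class to keep the example readable. All function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anchor_term(anchor, positive, negatives, tau=0.5):
    """Assumed InfoNCE-style term for one anchor, one positive and B negatives."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


def contrastive_loss(per_class, tau=0.5):
    """per_class maps a class to (positive, [(anchor, negatives), ...])."""
    class_losses = []
    for positive, anchors in per_class.values():
        terms = [anchor_term(a, positive, negs, tau) for a, negs in anchors]
        class_losses.append(sum(terms) / len(terms))   # average over the A anchors
    return sum(class_losses) / len(class_losses)       # average over the classes


well_separated = {
    "changed": ([1.0, 0.0], [([1.0, 0.0], [[-1.0, 0.0]])]),
    "unchanged": ([-1.0, 0.0], [([-1.0, 0.0], [[1.0, 0.0]])]),
}
confused = {
    "changed": ([1.0, 0.0], [([-1.0, 0.0], [[-1.0, 0.0]])]),
    "unchanged": ([-1.0, 0.0], [([1.0, 0.0], [[1.0, 0.0]])]),
}
loss_good = contrastive_loss(well_separated)
loss_bad = contrastive_loss(confused)
print(loss_good, loss_bad)
```

In the well-separated case each anchor coincides with its class mean and opposes its negative, so the loss is close to zero; when anchors sit on the wrong side, the loss is much larger, which is the behavior that minimizing L_c is meant to penalize.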

The semi-difficult and semi-easy sampling strategy samples, for the anchors of each category, half difficult-to-classify samples and half easy-to-classify samples, and likewise, for the negative samples of each anchor, half difficult-to-classify samples and half easy-to-classify samples. Difficult-to-classify and easy-to-classify samples are separated by a threshold δ: the binary change detection prediction probability of a difficult-to-classify sample on its corresponding category is smaller than δ, and that of an easy-to-classify sample is larger than δ, where the binary change detection prediction probability comes from the softmax-normalized result of the encoder. Difficult-to-classify samples direct the network's attention to the cases it gets wrong, but too many of them make the network hard to converge. By selecting half difficult-to-classify and half easy-to-classify samples, the network converges easily while still attending to difficult-to-classify samples. With this sampling strategy, the contrastive loss provides appropriate supervision for change detection, in particular by attending more to samples that are difficult to classify correctly as changed or unchanged, thereby improving the overall semantic change detection performance of the model.

Step 3: optimizing the parameters of the model by minimizing the overall loss function L, which consists of the semantic change detection losses of the T¹ and T² phases and the contrastive learning loss function L_c on change detection. The two semantic change detection losses are cross-entropy losses, described as:

where T is the number of pixels, and the remaining symbols denote, for each of the two phases, the ground-truth label and the prediction probability of the corresponding decoder at the t-th pixel.
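A per-pixel cross-entropy of this kind can be computed as follows. This is a minimal sketch assuming the standard definition averaged over the T pixels; the two-pixel, two-category example values are made up, whereas the text uses 7 categories.

```python
import math


def cross_entropy(labels, probs):
    """Mean over T pixels of -log(probability assigned to the true category).

    labels[t] is the ground-truth category index at pixel t;
    probs[t] is the decoder's softmax output at pixel t.
    """
    return -sum(math.log(p[y]) for y, p in zip(labels, probs)) / len(labels)


labels = [0, 1]
probs = [[0.5, 0.5], [0.25, 0.75]]
loss = cross_entropy(labels, probs)
print(loss)
```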

The overall loss function L is described as:

Examples:

As shown in Fig. 1, the model of the high-resolution remote sensing image semantic change detection method based on contrastive learning for binary change detection comprises the semantic change detection model SFSCDNet and a contrastive learning algorithm on change detection. The structure of the high-resolution encoder is shown in Fig. 2, and the structure of the contrastive learning feature representation head is shown in Fig. 3. The specific method comprises the following steps:

S1: the remote sensing images I¹ and I² of the T¹ and T² phases are input into two weight-sharing high-resolution network encoders to obtain a pair of semantic context features. The pair is then input into the change feature extraction module, which takes the absolute difference of the two features to obtain d_cd and applies one 1×1 convolution to d_cd to obtain the change feature f_cd. For the T¹ phase, the T¹ semantic context feature and f_cd are input into the feature fusion module to obtain the T¹ semantic change feature; for the T² phase, the T² semantic context feature and f_cd are input into the feature fusion module to obtain the T² semantic change feature. Finally, the two semantic change features are input into two decoders to obtain the semantic change detection results of the T¹ and T² phases;

S2: a contrastive learning feature representation head is added after the change feature extraction module; the change feature f_cd is input into this head to obtain the dense change feature representation Z_rep, and change detection is supervised with the contrastive learning loss;

S3: the model parameters are optimized with the AdamW optimizer by minimizing the overall loss function L, which consists of the semantic change detection losses of the T¹ and T² phases and the contrastive learning loss function L_c on change detection.

In this embodiment, the network that executes step S1 is referred to as SFSCDNet. The execution of steps S1-S3 is described in further detail below in connection with the structure of SFSCDNet.

In this embodiment, referring to Figs. 1 and 2, step S1 inputs the remote sensing images I¹ and I² of the T¹ and T² phases, each of size H_input×W_input×3, into the weight-sharing high-resolution network encoders pre-trained on ImageNet, obtaining a semantic context feature pair of size H_input/4×W_input/4×600. The change feature extraction module takes the absolute difference of the pair to obtain d_cd and applies one 1×1 convolution to d_cd, giving the change feature f_cd of size H_input/4×W_input/4×600. For the T¹ phase, the T¹ semantic context feature and f_cd are input into the feature fusion module, where they are first concatenated along the channel dimension and the number of channels is then compressed to 1/2 by one 1×1 convolution, giving the T¹ semantic change feature of size H_input/4×W_input/4×600. In the same way, the T² semantic change feature of size H_input/4×W_input/4×600 is obtained through the feature fusion module. The two semantic change features are input into two decoders that have the same structure but do not share weights. Each decoder contains 2 convolution layers, a bilinear interpolation up-sampling operation, and a softmax operation. The first convolution layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of input channels to 1/4 while keeping the resolution unchanged, giving H_input/4×W_input/4×150, followed by one batch normalization operation and one rectified linear unit (ReLU). The second convolution layer contains one 1×1 convolution with stride 1, which changes the number of channels to the number of semantic change detection categories while keeping the resolution unchanged, giving H_input/4×W_input/4×7. The bilinear interpolation up-sampling operation restores the resolution to that of the input image, and the softmax operation finally normalizes the output to give the semantic change detection results of the T¹ and T² phases.
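The tensor sizes stated in this embodiment can be traced end to end for one phase. The following sketch only does shape bookkeeping; the 512×512 input size, the stage labels, and the function name are assumptions made for illustration.

```python
def trace_shapes(h_input, w_input, num_classes=7):
    """Return (stage, (height, width, channels)) for one phase of the pipeline."""
    h0, w0 = h_input // 4, w_input // 4
    return [
        ("input image", (h_input, w_input, 3)),
        ("semantic context feature", (h0, w0, 600)),
        ("change feature f_cd", (h0, w0, 600)),
        ("concatenation in fusion module", (h0, w0, 600 + 600)),
        ("semantic change feature", (h0, w0, 600)),          # channels halved by 1x1 conv
        ("decoder convolution layer 1", (h0, w0, 600 // 4)),
        ("decoder convolution layer 2", (h0, w0, num_classes)),
        ("up-sampled and softmax-normalized result", (h_input, w_input, num_classes)),
    ]


trace = trace_shapes(512, 512)  # 512 x 512 is an assumed input size
for stage, shape in trace:
    print(f"{stage}: {shape}")
```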

To supervise the change detection within semantic change detection, step S2 adds a contrastive learning feature representation head after the change feature extraction module and supervises change detection with a contrastive learning loss function. The change feature f_cd is input into the representation head to obtain a dense semantic feature representation Z_rep. The structure of the representation head is shown in Fig. 3. It comprises 2 convolutional layers. The first contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of channels of the change feature f_cd to 1/4 while keeping the resolution unchanged, i.e. H₀ × W₀ × 150, followed by one batch normalization operation and one rectified linear unit; the second contains one 1×1 convolution with stride 1, which changes the number of channels to 256 while keeping the resolution unchanged, i.e. H₀ × W₀ × 256. Because the contrastive loss provides supervision only during training, the contrastive learning feature representation head is removed during inference.
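To illustrate how the 256-dimensional vectors of Z_rep could enter a contrastive loss, the sketch below implements a generic InfoNCE-style loss for a single anchor. It is an assumption about the general form only and is not the loss formula of this patent: the function names, the L2 normalization, and the temperature value of 0.1 are all hypothetical choices.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)


def info_nce(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for one anchor, averaged over its positives.

    Positives share the anchor's class (changed or unchanged); negatives
    come from the other class. temperature=0.1 is a hypothetical value.
    """
    a = l2_normalize(anchor)
    neg_sum = sum(math.exp(dot(a, l2_normalize(n)) / temperature) for n in negatives)
    losses = []
    for p in positives:
        pos = math.exp(dot(a, l2_normalize(p)) / temperature)
        losses.append(-math.log(pos / (pos + neg_sum)))
    return sum(losses) / len(losses)


# A positive aligned with the anchor gives a small loss; an opposed one a large loss.
print(info_nce([1.0, 0.0], [[1.0, 0.0]], [[-1.0, 0.0]]))
print(info_nce([1.0, 0.0], [[-1.0, 0.0]], [[1.0, 0.0]]))
```

The loss falls as an anchor moves closer to vectors of its own class and away from vectors of the other class, which is the behavior the supervision relies on.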

Step S2 uses a semi-difficult and semi-easy sampling strategy to sample, from the change feature representation Z_rep, the feature vectors corresponding to samples of the changed and unchanged categories, and uses them to compute the contrastive learning loss function L_c:

Under the semi-difficult and semi-easy sampling strategy, half of the anchors of each category are drawn from difficult-to-classify samples and half from easy-to-classify samples; likewise, half of the negative samples of each anchor are drawn from difficult-to-classify samples and half from easy-to-classify samples. Difficult-to-classify and easy-to-classify samples are separated by a threshold δ: the binary change detection prediction probability of a difficult-to-classify sample on its corresponding category is smaller than δ, and that of an easy-to-classify sample is larger than δ, where the binary change detection prediction probability comes from the normalized result of the encoder softmax. Difficult-to-classify samples direct the network's attention to the cases it gets wrong, but too many of them make the network hard to converge. Selecting half difficult-to-classify and half easy-to-classify samples therefore lets the network converge easily while still focusing on difficult-to-classify samples. With this strategy, the contrastive loss provides appropriate supervision for change detection and, in particular, pays more attention to samples that are hard to assign correctly to the changed or unchanged category, thereby improving the overall semantic change detection performance of the model.
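The sampling rule above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not state: the function names are hypothetical, δ = 0.7 is an arbitrary example value, a probability exactly equal to δ is treated as easy, a shortage of one kind is topped up from the other, and one negative set is drawn per category rather than per anchor. Here `probs[i]` stands for the predicted probability of pixel `i` on its own ground-truth binary category.

```python
import random


def semi_difficult_semi_easy(indices, probs, n, delta, rng):
    """Draw up to n indices, half difficult (prob < delta) and half easy."""
    difficult = [i for i in indices if probs[i] < delta]
    easy = [i for i in indices if probs[i] >= delta]  # tie -> easy (assumption)
    n_difficult = min(n // 2, len(difficult))
    n_easy = min(n - n_difficult, len(easy))
    # If easy samples ran short, top up with remaining difficult ones (assumption).
    n_difficult = min(n - n_easy, len(difficult))
    return rng.sample(difficult, n_difficult) + rng.sample(easy, n_easy)


def sample_for_contrast(labels, probs, n_anchor, n_neg, delta=0.7, seed=0):
    """Pick anchors and negatives for the unchanged (0) and changed (1) categories."""
    rng = random.Random(seed)
    picked = {}
    for cls in (0, 1):
        same = [i for i, y in enumerate(labels) if y == cls]
        other = [i for i, y in enumerate(labels) if y != cls]
        picked[cls] = {
            "anchors": semi_difficult_semi_easy(same, probs, n_anchor, delta, rng),
            "negatives": semi_difficult_semi_easy(other, probs, n_neg, delta, rng),
        }
    return picked


# Toy example: 8 unchanged and 8 changed pixels with mixed confidence.
labels = [0] * 8 + [1] * 8
probs = [0.2, 0.3, 0.4, 0.5, 0.8, 0.9, 0.95, 0.99] * 2
print(sample_for_contrast(labels, probs, n_anchor=4, n_neg=4))
```

The indices returned would then be used to gather the corresponding feature vectors from Z_rep before the contrastive loss is evaluated.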

The overall loss function L of step S3 is described as:

where the semantic change detection losses for the two phases are cross-entropy losses, described as:
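For reference, the standard pixel-wise cross-entropy can be sketched as below. It follows the textbook definition and may differ in detail (for example in class weighting or reduction) from the formula used in this patent; the function names are hypothetical, and how the per-phase losses and L_c are combined into L is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(pixel_logits, pixel_labels):
    """Mean cross-entropy over pixels.

    pixel_logits: one list of class scores per pixel (7 scores per pixel for
    the 7 semantic change detection categories of this embodiment).
    pixel_labels: ground-truth class index for each pixel.
    """
    total = 0.0
    for logits, label in zip(pixel_logits, pixel_labels):
        total -= math.log(softmax(logits)[label])
    return total / len(pixel_labels)


# A uniform prediction over 7 classes costs log(7); a confident correct one costs less.
print(cross_entropy([[0.0] * 7], [3]))
print(cross_entropy([[0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]], [3]))
```

In this setting the function would be evaluated once per phase, on the decoder output for T¹ and on the decoder output for T².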

To verify the effectiveness of the method, this embodiment uses the public SECOND dataset to train and test the network framework and compares the results with other methods. The SECOND dataset contains 2968 image pairs; each pair consists of two images of different phases, each image is 512 × 512 in size, and all 2968 pairs contain changed regions. The data are divided into a training set and a test set at a ratio of 9:1.
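A 9:1 split of the 2968 pairs could be produced as in the sketch below. The shuffling, the fixed seed, and the rounding down of the training-set size are assumptions; the text does not say how the split was actually drawn, and `split_pairs` is a hypothetical helper name.

```python
import random


def split_pairs(n_pairs=2968, train_ratio=0.9, seed=0):
    """Shuffle pair indices and split them into training and test subsets."""
    ids = list(range(n_pairs))
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    n_train = int(n_pairs * train_ratio)  # rounds down (assumption)
    return ids[:n_train], ids[n_train:]


train_ids, test_ids = split_pairs()
print(len(train_ids), len(test_ids))
```

Under these assumptions the split yields 2671 training pairs and 297 test pairs.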

The algorithm proposed in this embodiment is compared with seven recent change detection methods: DSCD, SCDS, ICDS, ChangeMask, HBSCD, Bi-SRNet, and SCDNet. The results are shown in Table 1. Three evaluation metrics are used: mIoU, SeK, and Score. As Table 1 shows, this embodiment reaches 73.83%, 26.37%, and 40.61% on mIoU, SeK, and Score respectively, the best result on all three. Compared with the second-best method, SCDNet, it improves mIoU by 0.77, SeK by 2.71, and Score by 2.02 percentage points. Fig. 4 compares three sets of semantic change detection results from the method of this embodiment and from other existing methods. The first, second, and third rows of Fig. 4 show that, for water surfaces and trees, the results of this embodiment are very close to the ground truth: the semantic change regions are predicted completely and with clear outlines, whereas the other methods produce false alarms or missed detections.

Table 1. Comparison of test results between the method of this embodiment and other existing methods

Method	mIoU (%)	SeK (%)	Score (%)
DSCD	62.45	10.20	25.88
SCDS	69.18	14.96	31.22
ICDS	71.95	21.83	36.86
ChangeMask	-	17.89	-
HBSCD	72.40	21.46	36.74
Bi-SRNet	73.41	23.22	38.59
SCDNet	73.06	23.66	38.59
SFSCDNet + contrast learning	73.83	26.37	40.61
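The margins quoted in the text can be rechecked directly from the rows of Table 1. The sketch below only repeats the table's own numbers; the variable names are hypothetical, and `None` stands for the entries marked "-".

```python
# Rows copied from Table 1 as (mIoU, SeK, Score) in percent; None marks "-".
table1 = {
    "DSCD": (62.45, 10.20, 25.88),
    "SCDS": (69.18, 14.96, 31.22),
    "ICDS": (71.95, 21.83, 36.86),
    "ChangeMask": (None, 17.89, None),
    "HBSCD": (72.40, 21.46, 36.74),
    "Bi-SRNet": (73.41, 23.22, 38.59),
    "SCDNet": (73.06, 23.66, 38.59),
    "SFSCDNet + contrast learning": (73.83, 26.37, 40.61),
}

ours = table1["SFSCDNet + contrast learning"]

# Improvement over SCDNet in percentage points, rounded to two decimals.
margins = tuple(round(o - b, 2) for o, b in zip(ours, table1["SCDNet"]))

# Check that the proposed method is at least as good as every listed entry.
is_best = all(
    ours[k] >= row[k]
    for row in table1.values()
    for k in range(3)
    if row[k] is not None
)

print(margins, is_best)
```

The computed margins are 0.77, 2.71, and 2.02 percentage points, matching the figures quoted in the text, and the proposed method is the highest entry in every column.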

While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made without departing from the spirit and scope of the invention.

Claims

1. A high-resolution remote sensing image semantic change detection method based on binary change detection contrast learning is characterized by comprising the following steps:

Step 1: inputting the remote sensing images I¹ and I² of phases T¹ and T² into two shared-weight high-resolution network encoders to obtain a pair of semantic context features; inputting the pair of semantic context features into a change feature extraction module to obtain the change feature f_cd; for phase T¹, inputting its semantic context feature and f_cd into a feature fusion module, wherein d_cd is first obtained as the absolute difference of the two semantic context features and one 1×1 convolution is applied to d_cd, to obtain the semantic change feature of phase T¹; for phase T², inputting its semantic context feature and f_cd into the feature fusion module to obtain the semantic change feature of phase T²; finally, inputting the two semantic change features into two decoders to obtain the semantic change detection results of phases T¹ and T²;

Step 2: supervising the change detection using a contrastive learning loss;

The overall loss function L is described as:

2. The high-resolution remote sensing image semantic change detection method based on binary change detection contrast learning according to claim 1, characterized in that step 1 is specifically as follows:

Step 1-5: inputting the semantic change feature of phase T¹ into its decoder, wherein the decoder comprises 2 convolutional layers, a bilinear interpolation up-sampling operation, and a softmax operation; the first convolutional layer contains one 3×3 convolution with stride 1 and padding 2, which reduces the number of input feature channels to 1/4 while keeping the resolution unchanged, i.e. H₀ × W₀ × 150, followed by one batch normalization operation and one rectified linear unit; the second convolutional layer contains one 1×1 convolution with stride 1, which changes the number of input feature channels to the number of semantic change detection categories while keeping the resolution unchanged, i.e. H₀ × W₀ × 7; the bilinear interpolation up-sampling operation restores the feature map to the resolution of the input image, i.e. H_input × W_input × 7; finally, the softmax operation normalizes the result to obtain the semantic change detection result of phase T¹; the semantic change detection result of phase T² is obtained in the same way.

3. The high-resolution remote sensing image semantic change detection method based on binary change detection contrast learning according to claim 2, characterized in that all convolution modules consist of a 3×3 convolution layer, a batch normalization layer, and a rectified linear unit.

4. The high-resolution remote sensing image semantic change detection method based on binary change detection contrast learning according to claim 1, characterized in that, under the semi-difficult and semi-easy sampling strategy in step 2, half of the anchors of each category are sampled from difficult-to-classify samples and half from easy-to-classify samples, and half of the negative samples of each anchor are sampled from difficult-to-classify samples and half from easy-to-classify samples; the difficult-to-classify samples and the easy-to-classify samples are separated by a threshold δ; the binary change detection prediction probability of a difficult-to-classify sample on its corresponding category is smaller than the threshold δ, and the binary change detection prediction probability of an easy-to-classify sample on its corresponding category is larger than the threshold δ, wherein the binary change detection prediction probability comes from the normalized result of the encoder softmax.

5. A computer system, comprising: one or more processors, a computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of claim 1.

6. A computer readable storage medium, characterized by storing computer executable instructions that, when executed, are adapted to implement the method of claim 1.