Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
In the prior art, schemes that combine a CNN with a Transformer (such as MDTNet and BDTNet) are used for urban road segmentation, and they improve the segmentation effect through multi-scale feature fusion and long-distance dependency modeling. However, for strip mine scenes with complex backgrounds, fuzzy boundaries, interference from crushed stone and trees, and degraded texture features, the prior art still has two core technical problems: insufficient modeling of long-distance spatial dependencies, and feature confusion caused by multi-source interference. As a result, the road segmentation results are poor in edge smoothness, continuity and hole repair, and the trunk roads, abandoned roads and temporary roads of a strip mine cannot be accurately segmented.
In order to solve the above problems, an embodiment of the present invention provides a strip mine transportation road segmentation method. As shown in fig. 1, the method includes:
Step 110, performing strip mine transportation road segmentation on a strip mine remote sensing image by using a trained road segmentation model to obtain a road segmentation result.
The strip mine remote sensing image is a surface image of the strip mine acquired by remote sensing equipment such as satellites and unmanned aerial vehicles. It contains complex ground objects such as roads, crushed stone, trees and piles, and is the raw data for road segmentation. Before the strip mine remote sensing image is input into the trained road segmentation model, image preprocessing (such as denoising, radiometric correction and size normalization) may be performed on it. This reduces interference factors in the original image, provides more stable input for the model, reduces noise interference in subsequent feature extraction, and improves the efficiency with which the model captures effective features. The road segmentation model is a deep learning model that automatically identifies and segments transportation roads from the remote sensing image; its core is to extract features through a multi-layer neural network and output a road region mask. The road segmentation result is the binary image (or mask) that is output after the strip mine remote sensing image is processed by the trained road segmentation model and that distinguishes transportation roads from non-road regions. In this result, the transportation road regions of the strip mine, including trunk roads, abandoned roads and temporary roads, are usually marked with one specific pixel value (e.g., 1), and the non-road regions, covering the crushed stone, trees, piles, bare ground and the like in the background, are marked with another pixel value (e.g., 0).
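The binary marking described above can be illustrated with a minimal sketch. It assumes the model emits a per-pixel road probability and that a threshold of 0.5 is used; both the helper name `to_binary_mask` and the threshold are illustrative assumptions, not details stated in the embodiment.

```python
def to_binary_mask(prob_map, threshold=0.5):
    """Map per-pixel road probabilities to 1 (road) or 0 (non-road).

    The threshold value is a hypothetical choice for illustration.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


# Toy 3x3 probability map: a road running diagonally through the tile.
prob_map = [
    [0.92, 0.81, 0.10],
    [0.15, 0.77, 0.05],
    [0.03, 0.64, 0.58],
]
mask = to_binary_mask(prob_map)
```

Pixels marked 1 correspond to trunk, abandoned or temporary roads, and pixels marked 0 to background such as crushed stone or trees.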
As shown in fig. 2, the road segmentation model includes a dual-branch encoder 21, a Transformer encoding module 22, a road multi-scale feature extraction module (composed of a first road multi-scale feature extraction module 231 and a second road multi-scale feature extraction module 232) and a progressive upsampling module 24. The dual-branch encoder includes a basic convolution layer (conv3×3+BN+ReLU), a plurality of ResNetV2 modules (ResNetV2 blocks) with increasing channel dimensions, a plurality of KANConv modules (KANConv blocks) with increasing channel dimensions, and a first feature channel splicing module (concat). In the following steps of the embodiment of the present application, the dual-branch encoder 21 is described as including 3 ResNetV2 modules and 3 KANConv modules by way of example, but the present application is not limited thereto.
Correspondingly, training the road segmentation model includes the following steps: obtaining a remote sensing image training set, where the training set includes sample strip mine remote sensing images and corresponding road pixel-level annotation masks; inputting the sample strip mine remote sensing images in the training set into the road segmentation model whose network parameters have been initialized, and sequentially extracting a fused feature map through the dual-branch encoder, modeling long-distance dependencies through the Transformer encoding module, enhancing features through the road multi-scale feature extraction module, and decoding through the progressive upsampling module to output a predicted segmentation mask; calculating a difference value between the predicted segmentation mask and the road pixel-level annotation mask based on a preset loss function, where the preset loss function includes a Dice loss function, a BCE loss function and a Hausdorff distance loss function, the Dice loss function is used to optimize the class intersection-over-union, the BCE loss function is used to penalize pixel-level classification errors, and the Hausdorff distance loss function is used to reduce the boundary distance between the prediction and the annotation; updating the network parameters of the road segmentation model through a back propagation algorithm based on the difference value; and repeating the above steps until the preset loss function converges or a preset number of training rounds is reached, thereby obtaining the trained road segmentation model.
Specifically, the intersection-over-union between the predicted segmentation mask and the road pixel-level annotation mask is calculated by the Dice loss function to obtain the Dice loss; the difference between the predicted segmentation mask and the road pixel-level annotation mask is compared pixel by pixel by the BCE loss function to obtain the BCE loss; the maximum distance between corresponding edge pixels of the predicted segmentation mask and the road pixel-level annotation mask is calculated by the Hausdorff distance loss function to obtain the Hausdorff distance loss; and the Dice loss, the BCE loss and the Hausdorff distance loss are weighted and summed to obtain the difference value between the predicted segmentation mask and the road pixel-level annotation mask.
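The weighted combination of the three losses can be sketched as follows. This is a dependency-free illustration under stated assumptions: the weights `w_dice`, `w_bce` and `w_hd`, the 0.5 binarization threshold and the 4-neighbour edge definition are hypothetical, and the Hausdorff term is computed as the plain (non-differentiable) symmetric Hausdorff distance between edge pixels, whereas an actual training loss would normally use a differentiable approximation.

```python
import math


def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient between predicted probabilities and a 0/1 mask."""
    p = [v for row in pred for v in row]
    g = [v for row in target for v in row]
    inter = sum(a * b for a, b in zip(p, g))
    return 1.0 - (2.0 * inter + eps) / (sum(p) + sum(g) + eps)


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy, compared pixel by pixel."""
    total, n = 0.0, 0
    for prow, grow in zip(pred, target):
        for p, g in zip(prow, grow):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= g * math.log(p) + (1 - g) * math.log(1.0 - p)
            n += 1
    return total / n


def edge_pixels(mask):
    """Road pixels with at least one 4-neighbour that is background or off-image."""
    h, w = len(mask), len(mask[0])
    edges = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1:
                continue
            nbrs = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if any(not (0 <= j < h and 0 <= i < w) or mask[j][i] == 0
                   for j, i in nbrs):
                edges.append((y, x))
    return edges


def hausdorff_distance(pred, target, threshold=0.5):
    """Symmetric Hausdorff distance between the edge pixels of both masks."""
    pm = [[1 if v >= threshold else 0 for v in row] for row in pred]
    a, b = edge_pixels(pm), edge_pixels(target)
    if not a or not b:
        # Assumed fallback: zero if both are empty, else the image extent.
        return 0.0 if a == b else float(max(len(pred), len(pred[0])))

    def directed(src, dst):
        return max(min(math.dist(s, d) for d in dst) for s in src)

    return max(directed(a, b), directed(b, a))


def combined_loss(pred, target, w_dice=1.0, w_bce=1.0, w_hd=0.1):
    """Weighted sum of the three terms; the weights are illustrative."""
    return (w_dice * dice_loss(pred, target)
            + w_bce * bce_loss(pred, target)
            + w_hd * hausdorff_distance(pred, target))


target = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]                    # vertical road
perfect = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
shifted = [[0.9, 0.1, 0.1], [0.9, 0.1, 0.1], [0.9, 0.1, 0.1]]  # road off by one column
```

In this toy case the shifted prediction is penalized by all three terms, while the perfect prediction yields a combined loss close to zero.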
By combining multiple loss functions, the training process optimizes the model from different dimensions: the Dice loss improves the class intersection-over-union and copes with class imbalance, the BCE loss strengthens pixel-level classification accuracy, and the Hausdorff distance loss optimizes the boundary so that the predicted edges fit the annotation more closely. The three act together to improve model performance comprehensively. As the parameters are continuously updated by the back propagation algorithm, the road segmentation model gradually learns the feature patterns of strip mine roads, which addresses the problems of insufficient long-distance dependency modeling and feature confusion caused by multi-source interference. The resulting model can accurately segment the various roads of a strip mine, and the edge smoothness, continuity and integrity of the segmentation results are improved.
The core of the deep learning model architecture provided by the application is to achieve high-precision road extraction through multi-module cooperation. The model uses the dual-branch encoder to process the remote sensing image in parallel: the right branch uses a plurality of ResNetV2 modules to extract global semantic features and local spatial features layer by layer, and the left branch uses a plurality of KANConv modules to capture nonlinear complex features and detail differences (such as texture differences between roads and the surrounding crushed stone and trees). The channel dimensions of the features of the two branches increase stage by stage (such as 256-512-1024), and the features are fused by the first feature channel splicing module to form a multi-dimensional feature representation. The Transformer encoding module models long-distance dependencies through a self-attention mechanism to enhance the continuity and integrity of roads; the road multi-scale feature extraction module further fuses features of different scales to improve the segmentation accuracy of regions with blurred edges; and finally the progressive upsampling module restores the spatial resolution through layer-by-layer decoding to generate the final road segmentation mask.
Accordingly, for the embodiment of the present disclosure, performing strip mine transportation road segmentation on the strip mine remote sensing image by using the trained road segmentation model in step 110 to obtain the road segmentation result may include the following steps:
Step 110-1, extracting an initial feature map of the strip mine remote sensing image by using the basic convolution layer in the dual-branch encoder; performing multi-stage feature extraction based on the initial feature map by using the plurality of ResNetV2 modules and the plurality of KANConv modules respectively, to obtain a first feature map that is output by each ResNetV2 module and contains global semantic features and local spatial features, and a second feature map that is output by each KANConv module and contains nonlinear complex features and detail difference features; and performing channel fusion on the first feature map output by the last ResNetV2 module and the second feature map output by the last KANConv module by using the first feature channel splicing module, to obtain a first fused feature map containing multi-dimensional features.
The dual-branch encoder is a feature extraction structure composed of two parallel feature extraction branches (a ResNetV2 branch and a KANConv branch) and auxiliary modules. It is used to extract complementary types of features from the remote sensing image, covering global and local features as well as linear and nonlinear features. The basic convolution layer is a basic feature extraction unit generally composed of a 3×3 convolution, batch normalization (BN) and an activation function (such as ReLU); it extracts initial low-level features (such as edges and textures) from the original remote sensing image and provides a basis for subsequent deep feature extraction. The ResNetV2 module is a feature extraction unit based on an improved residual network; through the residual connection of a main branch (multiple convolution layers) and a parallel branch (a 1×1 convolution that adjusts the dimension), it extracts a feature map containing global semantic features (such as the overall road layout) and local spatial features (such as road edges and local shapes), and its channel dimension increases with the module level so that it gradually focuses on more abstract semantic information. The KANConv module may be a kernel adaptive convolution module; it introduces a nonlinear transformation through operations such as a 1×1 convolution and spline calculation, and specifically extracts the nonlinear complex associations between roads and the background and detail difference features (such as texture differences between different roads), and its channel dimension increases with the module level to enhance its ability to capture fine features. The first feature map is the feature map output by a ResNetV2 module; it contains global semantic features (such as the overall road structure and the spatial relationship between roads and surrounding ground objects) and local spatial features (such as local road details), and describes the key features of the macroscopic and microscopic structure of roads. The second feature map is the feature map output by a KANConv module; its core content is nonlinear complex features (the complex background association of roads with interfering objects such as crushed stone and trees) and detail difference features (differences in texture and degree of damage between different types of roads), which are used to distinguish slight differences between roads and the background. The first feature channel splicing module is an operation module (usually a concat function) that combines the first feature map and the second feature map in the channel dimension; by splicing the output features of the last ResNetV2 module and the last KANConv module, it integrates multiple types of features into a richer feature representation. The first fused feature map is the feature map obtained through fusion by the first feature channel splicing module; it contains both the global and local features of the ResNetV2 modules and the nonlinear and detail features of the KANConv modules, integrates multi-dimensional information, and provides comprehensive feature input for subsequent modules.
For the embodiment of the disclosure, when the dual-branch encoder extracts the first fused feature map containing multi-dimensional features from the strip mine remote sensing image, an initial feature map containing low-level features (such as edges and textures) is first extracted from the strip mine remote sensing image by the basic convolution layer to serve as the basis for subsequent processing. Then a plurality of ResNetV2 modules with increasing channel dimensions and a plurality of KANConv modules with increasing channel dimensions work in parallel and perform multi-stage feature extraction based on the initial feature map. Specifically, the ResNetV2 modules extract, stage by stage, the first feature map containing global semantic features (such as the overall layout of a road and its spatial relationship with surrounding ground objects) and local spatial features (such as the road edge and the shape of a local road segment), and the KANConv modules extract, stage by stage, the second feature map containing nonlinear complex features (such as the complex background association between roads and crushed stone) and detail difference features (such as texture differences between different types of roads). Finally, the first feature channel splicing module fuses, in the channel dimension, the feature maps output by the last ResNetV2 module and the last KANConv module to form the first fused feature map containing multi-dimensional information, which provides a comprehensive feature basis for subsequent modeling.
This feature extraction process achieves feature complementation through the parallel dual-branch design. The basic convolution layer provides uniform initial features to ensure the consistency of subsequent processing. The ResNetV2 modules effectively capture global semantic features and local spatial features by means of residual connections and the increasing-channel design, which addresses the insufficient modeling of the overall structure of strip mine roads. The KANConv modules accurately distinguish the subtle differences between roads and background interfering objects (such as crushed stone and trees) through the nonlinear transformation and the detail-sensitive design, which relieves the feature confusion caused by multi-source interference. After the features of the two branches are spliced and fused along the channel dimension, the first fused feature map integrates multi-dimensional information such as global and local, linear and nonlinear, and semantic and detail information, so that richer feature input can be provided for the subsequent Transformer encoding module and the road multi-scale feature extraction module, and the adaptability of the model to complex strip mine scenes is improved at the source.
Step 110-2, performing long-distance dependency modeling on the first fused feature map through the Transformer encoding module, and outputting an encoded global feature map.
The Transformer encoding module is an encoding component based on the Transformer architecture. Its core function is to encode the input features and capture the correlations among features through mechanisms such as self-attention; in particular, it is the module used for long-distance dependency modeling. Long-distance dependency modeling calculates, through the self-attention mechanism, the association weights between pixels at different positions in the feature sequence, and captures the dependencies between distant pixels (such as the continuity of two road sections that are far apart in a strip mine), which overcomes the limitation that conventional convolution can only capture local associations. The global feature map is the feature map output after processing by the Transformer encoding module; it contains long-distance dependency information (such as the overall continuity of roads and the spatial relationship between different road sections), retains the core information of the multi-dimensional features, and matches the feature map input format of the subsequent modules.
For the embodiment of the disclosure, after the dual-branch encoder outputs the first fused feature map containing multi-dimensional features, the self-attention mechanism of the Transformer encoding module can establish links between distant features in the feature map, so as to generate an encoded feature map containing global associations. The long-distance dependency modeling of the Transformer encoding module effectively overcomes the weakness of a conventional CNN in capturing the long-distance continuity of strip mine roads.
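How self-attention links distant positions can be illustrated with a minimal single-head sketch. It is an assumption-laden simplification: the query, key and value projections are taken to be identity mappings (no learned weights), and the helper names `flatten_to_tokens` and `self_attention` are hypothetical; the embodiment does not specify the internal configuration of the Transformer encoding module.

```python
import math


def flatten_to_tokens(feature_map):
    """Reshape a [C][H][W] feature map into H*W tokens of length C."""
    c = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    return [[feature_map[k][y][x] for k in range(c)]
            for y in range(h) for x in range(w)]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, tokens))
                        for j in range(d)])
    return outputs, weights


# Toy 2-channel, 1x4 map: positions 0 and 3 look like road, 1 and 2 like background.
feature_map = [
    [[1.0, 0.0, 0.0, 1.0]],  # "road-like" channel
    [[0.0, 1.0, 1.0, 0.0]],  # "background-like" channel
]
tokens = flatten_to_tokens(feature_map)
outputs, weights = self_attention(tokens)
```

Here the road-like token at position 0 attends more strongly to the distant road-like token at position 3 than to its immediate background neighbour, which is the kind of long-distance association a local convolution kernel cannot capture in one step.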
Step 110-3, performing road multi-scale feature extraction and fusion on the first feature map output by a target ResNetV2 module and the second feature map output by a target KANConv module respectively by using the road multi-scale feature extraction module, to obtain a corresponding first enhanced feature map and second enhanced feature map.
The target ResNetV2 module is a specific ResNetV2 module (usually an intermediate-layer module) selected to output the features to be enhanced, and the first feature map it outputs is one input of the multi-scale enhancement. The target KANConv module is a specific KANConv module (usually an intermediate-layer module) selected to output the features to be enhanced, and the second feature map it outputs is the other input of the multi-scale enhancement. The first enhanced feature map is the feature map obtained by multi-scale extraction and fusion of the first feature map output by the target ResNetV2 module; it integrates multi-scale global and local features and strengthens the feature expression of roads of different scales. The second enhanced feature map is the feature map obtained by multi-scale extraction and fusion of the second feature map output by the target KANConv module; it integrates multi-scale nonlinear and detail features and strengthens the expression of detail differences.
For the embodiment of the disclosure, for the first feature map containing global semantic and local spatial features output by the target ResNetV2 module in the dual-branch encoder, and the second feature map containing nonlinear complex and detail difference features output by the target KANConv module, the road multi-scale feature extraction module performs multi-scale feature extraction (covering local details, contexts at different spatial scales, and global semantics) on the two types of feature maps respectively through its internal multi-branch structure. The extracted multi-scale sub-features are then spliced in the channel dimension and fused by weighting, to finally obtain the first enhanced feature map and the second enhanced feature map that integrate multi-scale information, which strengthens the discrimination capability of the features.
Multi-scale feature extraction and fusion specifically addresses the insufficient feature expression caused by the variable scale of strip mine roads (such as wide trunk roads and narrow temporary roads), fuzzy boundaries and background interference. Enhancing the first feature map further improves the multi-scale adaptability of the global semantic and local spatial features of roads, and improves the overall capture of roads of different widths and lengths. Enhancing the second feature map highlights detail differences in nonlinear complex scenes (such as the texture differences between roads and crushed stone), and strengthens feature discrimination in regions with fuzzy boundaries.
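The multi-branch extraction followed by splicing and weighted fusion can be sketched on a single-channel map as follows. The branch design is an assumption for illustration only: each branch is average pooling with a window of 1, 2 or 4 followed by nearest-neighbour upsampling, the fusion weights are fixed rather than learned, and the map size must be divisible by every window size. The embodiment does not disclose the actual branch operators.

```python
def avg_pool(fmap, k):
    """Non-overlapping k x k average pooling (map size must be divisible by k)."""
    h, w = len(fmap), len(fmap[0])
    return [[sum(fmap[y + dy][x + dx] for dy in range(k) for dx in range(k)) / (k * k)
             for x in range(0, w, k)]
            for y in range(0, h, k)]


def upsample_nearest(fmap, k):
    """Repeat every value k times along both axes."""
    return [[v for v in row for _ in range(k)] for row in fmap for _ in range(k)]


def multi_scale_fuse(fmap, scales=(1, 2, 4), weights=(0.5, 0.3, 0.2)):
    """Return the per-scale branches (the spliced channels) and their weighted fusion."""
    branches = [upsample_nearest(avg_pool(fmap, k), k) for k in scales]
    h, w = len(fmap), len(fmap[0])
    fused = [[sum(wt * br[y][x] for wt, br in zip(weights, branches))
              for x in range(w)]
             for y in range(h)]
    return branches, fused


# Toy 4x4 map with a narrow vertical road in column 1.
road_map = [[0.0, 1.0, 0.0, 0.0] for _ in range(4)]
branches, fused = multi_scale_fuse(road_map)
```

The fine branch keeps the narrow road sharp, while the coarser branches contribute context from the surrounding area, so the fused response stays highest on the road column but is no longer zero in its neighbourhood.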
Step 110-4, performing layer-by-layer upsampling, channel merging and compression decoding on the first feature map, the second feature map, the global feature map, the initial feature map, the first enhanced feature map and the second enhanced feature map by using the progressive upsampling module, to obtain the road segmentation result of the strip mine remote sensing image.
The progressive upsampling module is a decoding module composed of a plurality of decoding layers with decreasing channel dimensions and a segmentation result generation layer. It restores the spatial resolution through layer-by-layer upsampling and fuses multi-stage, multi-type features (such as original features, enhanced features and global features), so that an accurate mapping from a low-resolution feature map to a high-resolution segmentation result can be achieved. Layer-by-layer upsampling means that, in the decoding process, each layer raises the spatial resolution (height and width) of the feature map to 2 times (or a designated multiple of) that of the previous layer through interpolation (such as bilinear interpolation) or deconvolution, so that the resolution is gradually restored to the input image size and the loss of detail caused by direct upsampling is avoided. Channel merging means splicing feature maps from different sources in the channel dimension (such as combining decoded features with original features of the same resolution), which integrates multi-dimensional information and enriches the feature expression. Compression decoding means reducing the channel dimension (such as 1024-512-256) through a convolution layer during upsampling, which reduces the amount of computation while retaining key features and balances decoding efficiency against feature effectiveness. The road segmentation result is the final output: a binary image that distinguishes the predicted road regions (trunk roads, abandoned roads and temporary roads) from the non-road regions.
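The interplay of layer-by-layer upsampling, channel merging and compression decoding can be traced with simple shape bookkeeping. This sketch only tracks (channels, height, width); the starting size of 16×16, the skip-connection channel counts and the helper name `decode_schedule` are assumptions chosen to match the 1024-512-256 example, not values given in the embodiment.

```python
def decode_schedule(start_shape, skip_channels, out_channels, scale=2):
    """Trace (channels, height, width) through the decoding layers.

    Each layer: upsample by `scale`, concatenate a skip feature map with
    `skip_c` channels, then compress to `out_c` channels with a convolution.
    """
    c, h, w = start_shape
    trace = []
    for skip_c, out_c in zip(skip_channels, out_channels):
        h, w = h * scale, w * scale   # layer-by-layer upsampling
        merged_c = c + skip_c         # channel merging (concat)
        c = out_c                     # compression decoding
        trace.append({"merged": (merged_c, h, w), "output": (c, h, w)})
    return trace


# Hypothetical example: a 1024-channel 16x16 global feature map decoded twice.
trace = decode_schedule((1024, 16, 16),
                        skip_channels=(512, 256),
                        out_channels=(512, 256))
for layer in trace:
    print(layer)
```

Each decoding layer doubles the height and width while the compression step keeps the channel count from growing after every merge.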
For the embodiment of the disclosure, while the spatial resolution is restored, the initial feature map (retaining edge details), the first and second feature maps (supplementing middle- and high-level semantics), the global feature map (strengthening long-distance association) and the enhanced feature maps (optimizing multi-scale adaptation) are fused, so that detail, semantic, global association and multi-scale features complement one another throughout the process. Channel merging and compression decoding balance feature richness against computational efficiency and avoid feature redundancy. The final segmentation result combines edge smoothness (the contribution of the initial feature map and the enhanced feature maps), road continuity (the contribution of the global feature map) and interference resistance (the contribution of the second feature map and the enhanced feature maps). The trunk roads, abandoned roads and temporary roads of a strip mine are thereby accurately segmented, and the holes, breaks and blurred edges produced by conventional methods in complex scenes are markedly reduced.
In conclusion, the strip mine transportation road segmentation method provided by the invention can effectively solve the core problems in strip mine road segmentation through multi-module cooperation and targeted design. In the dual-branch encoder, the ResNetV2 modules with increasing channels extract global semantic features and local spatial features, the KANConv modules with increasing channels extract nonlinear complex features and detail difference features, and the multi-dimensional features are spliced and fused along the channel dimension to relieve the feature confusion caused by multi-source interference. The Transformer encoding module models long-distance dependencies on the fused features, overcomes the weakness of a conventional CNN in modeling long-distance spatial correlation in strip mine scenes, and improves the continuity and hole repair capability of road segmentation. The road multi-scale feature extraction module strengthens edge features through multi-scale fusion to improve the segmentation accuracy of regions with fuzzy boundaries. The progressive upsampling module integrates multi-stage, multi-type features and gradually restores spatial details during decoding to ensure the smoothness of road edges. In the training stage, the model uses the Dice loss to optimize the class intersection-over-union, the BCE loss to strengthen pixel-level classification, and the Hausdorff distance loss to constrain boundary accuracy, which further improves the edge quality and overall accuracy of the segmentation results, so that the various roads of a strip mine can finally be segmented accurately.
Based on the embodiment shown in fig. 1, as a refinement and extension of the above embodiment, in order to fully describe the specific implementation procedure of the method of this embodiment, this embodiment provides a specific method as shown in fig. 3. Fig. 3 further defines the steps of the embodiment based on the embodiment shown in fig. 1. As shown in fig. 3, the method comprises the steps of:
Step 310, extracting an initial feature map of the strip mine remote sensing image by using the basic convolution layer in the dual-branch encoder; performing multi-stage feature extraction based on the initial feature map by using the plurality of ResNetV2 modules and the plurality of KANConv modules respectively, to obtain a first feature map that is output by each ResNetV2 module and contains global semantic features and local spatial features, and a second feature map that is output by each KANConv module and contains nonlinear complex features and detail difference features; and performing channel fusion on the first feature map output by the last ResNetV2 module and the second feature map output by the last KANConv module by using the first feature channel splicing module, to obtain a first fused feature map containing multi-dimensional features.
As shown in fig. 4, the ResNetV2 module (ResNetV2 block) includes a main branch sequence layer, a parallel branch adjustment layer connected to the main branch sequence layer by a residual connection, and a second feature channel splicing module (concat). The main branch sequence layer includes, in order, a group normalization and rectified linear unit activation combination layer (GroupNorm+ReLU), a 1×1 convolution layer (conv1×1), a group normalization and rectified linear unit activation combination layer (GroupNorm+ReLU), a 3×3 convolution layer (conv3×3), a group normalization and rectified linear unit activation combination layer (GroupNorm+ReLU), and a 1×1 convolution layer (conv1×1). The parallel branch adjustment layer includes a 1×1 convolution and group normalization combination layer (conv1×1+GroupNorm).
In the ResNetV2 module, the three-layer convolution structure of the main branch sequence layer extracts features step by step in the order of normalization, activation and convolution. The first GroupNorm+ReLU layer normalizes and nonlinearly activates the input features, after which a 1×1 convolution reduces the dimension to reduce the amount of computation. The second GroupNorm+ReLU layer normalizes and activates again, after which a 3×3 convolution captures local spatial features. The third GroupNorm+ReLU layer normalizes and activates once more, after which a 1×1 convolution increases the dimension to integrate local and global semantic features. The 1×1 convolution of the parallel branch adjustment layer adjusts the number of channels of the input features to match the main branch output, and GroupNorm stabilizes the feature distribution; together they provide an adaptation path for the residual connection. The second feature channel splicing module splices, in the channel dimension, the complex features extracted by the main branch and the basic features of the parallel branch. This achieves multi-level feature fusion, retains the basic information of the original input, increases feature diversity, and effectively improves the model's ability to express features in complex scenes such as strip mine roads.
Correspondingly, for the embodiment of the disclosure, when the plurality of ResNetV2 modules perform multi-stage feature extraction based on the initial feature map to obtain the first feature map containing global semantic features and local spatial features output by each ResNetV2 module, each ResNetV2 module processes its input features through a dual-path structure. The main branch sequence layer (composed of multiple convolution layers and group normalization layers) performs deep processing on the first input feature and extracts a main branch feature map containing global semantic features (such as the overall road layout) and local spatial features (such as road edge details). The parallel branch adjustment layer (1×1 convolution) performs lightweight processing on the same input feature and extracts basic global semantic features (such as the preliminary relationship between roads and surrounding ground objects) as an auxiliary feature map. The second feature channel splicing module then merges the two feature maps in the channel dimension to form the first feature map output by the current module. For the first ResNetV2 module, the first input feature is the initial feature map output by the basic convolution layer; each subsequent module takes the first feature map output by the previous module as its input. This cascade gradually strengthens the feature expression, so that subsequent modules can extract higher-level semantic and spatial features.
Correspondingly, the steps of the embodiment may include: extracting global semantic features and local spatial features of the first input feature by using the main branch sequence layer to obtain a main branch feature map; extracting basic global semantic features of the first input feature by using the parallel branch adjustment layer to obtain an auxiliary feature map; and merging the main branch feature map and the auxiliary feature map in the channel dimension by using the second feature channel splicing module to obtain the first feature map output by the current ResNetV2 module. When the current ResNetV2 module is the first of the plurality of ResNetV2 modules, the first input feature is the initial feature map; when the current ResNetV2 module is any ResNetV2 module other than the first one, the first input feature is the first feature map output by the ResNetV2 module preceding the current ResNetV2 module.
The ResNetV2 module uses the main branch and the parallel branch to extract and fuse features of different levels, so that the first feature map has both rich global semantics and fine local spatial features, which strengthens the discrimination capability of the features. The cascade allows each subsequent module to extract more abstract, higher-level features on the basis of the preceding features, so that the capture of strip mine road features deepens step by step. This effectively addresses problems such as complex backgrounds and texture degradation, provides a high-quality feature basis for subsequent feature fusion and segmentation, and improves the accuracy of road segmentation.
As shown in fig. 5, the KANConv module (KANConv block) includes a 1×1 convolution and padding layer (conv1×1+padding), a feature map flattening layer (Flatting), a spline calculation layer (Spline Calculation), a batch normalization layer (Normalization), and a hidden layer (Hidden Features) connected in sequence. In the KANConv module, the 1×1 convolution and padding layer integrates the channel information of the input features through a 1×1 convolution, while the padding operation keeps the feature map size and provides a stable spatial dimension for subsequent processing. The feature map flattening layer converts the two-dimensional feature map into a one-dimensional feature vector to match the input format of the spline calculation layer. The spline calculation layer applies a nonlinear transformation to the flattened features through a spline interpolation function and captures complex nonlinear relationships among the features (such as the slight differences between a strip mine road and its background). The batch normalization layer normalizes the features after spline calculation, which stabilizes the training process and accelerates convergence. The hidden layer further refines and enhances the feature expression through the connections among its neurons and outputs the second feature map containing nonlinear complex features and detail difference features. As a whole, the module improves the ability to capture road features in complex scenes.
Correspondingly, for the embodiment of the disclosure, when the plurality of KANConv modules perform multi-stage feature extraction based on the initial feature map to obtain the second feature map, containing nonlinear complex features and detail difference features, output by each KANConv module, the steps may include: performing channel compression and boundary padding on the second input feature by using the 1×1 convolution and padding layer to obtain a first adaptive feature map; flattening the spatial dimensions of the first adaptive feature map by using the feature map flattening layer to obtain one-dimensional sequence features; performing spline interpolation calculation on the one-dimensional sequence features by using the spline calculation layer, which introduces a nonlinear transformation that enhances the detail difference expression of the features, to obtain transformed sequence features; performing batch normalization on the transformed sequence features by using the batch normalization layer to obtain normalized features; and restoring the normalized features to a two-dimensional feature map through full-connection mapping by using the hidden layer, the two-dimensional feature map serving as the second feature map, containing nonlinear complex features and detail difference features, output by the current KANConv module.
When the current KANConv module is the first KANConv module of the plurality of KANConv modules, the second input feature is the initial feature map; when the current KANConv module is any KANConv module other than the first KANConv module, the second input feature is the second feature map output by the KANConv module preceding the current KANConv module.
Through the multi-stage cascade and targeted processing of the KANConv modules, the ability to capture features in the complex strip mine scene can be improved. Specifically, the 1×1 convolution and padding layer retains boundary information while compressing the channels, which lays a foundation for subsequent processing; the combination of feature map flattening and spline calculation allows the module to capture nonlinear relationships among features (such as the complex association between roads and background) and enhances the expression of detail differences; the batch normalization layer stabilizes the training distribution and reduces gradient fluctuation; and the hidden layer further refines key features while restoring the two-dimensional feature map. The multi-module cascade gradually upgrades the features from initial edge textures to high-level features containing nonlinear associations and fine differences, which mitigates the road texture degradation and heavy background interference found in strip mines and provides more discriminative feature support for subsequent segmentation.
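As a non-limiting illustration of the flattening and spline calculation steps, the following Python sketch evaluates a learnable spline activation on a flattened feature map. The description does not fix the spline family; B-splines evaluated with the Cox-de Boor recursion are assumed here, and the names `flatten`, `bspline_basis`, `spline_activation` and `spline_layer`, as well as the knot vector and coefficients, are hypothetical.

```python
from typing import List


def flatten(feature_map: List[List[float]]) -> List[float]:
    """Flatten a 2-D feature map (rows x columns) into a 1-D sequence, row by row."""
    return [v for row in feature_map for v in row]


def bspline_basis(x: float, knots: List[float], i: int, degree: int) -> float:
    """Cox-de Boor recursion: value at x of the i-th B-spline basis function."""
    if degree == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    span = knots[i + degree] - knots[i]
    if span > 0:
        left = (x - knots[i]) / span * bspline_basis(x, knots, i, degree - 1)
    span = knots[i + degree + 1] - knots[i + 1]
    if span > 0:
        right = ((knots[i + degree + 1] - x) / span
                 * bspline_basis(x, knots, i + 1, degree - 1))
    return left + right


def spline_activation(x: float, coeffs: List[float],
                      knots: List[float], degree: int = 3) -> float:
    """Nonlinear map phi(x) = sum_i coeffs[i] * B_i(x); coeffs would be learned in training."""
    return sum(c * bspline_basis(x, knots, i, degree) for i, c in enumerate(coeffs))


def spline_layer(sequence: List[float], coeffs: List[float],
                 knots: List[float], degree: int = 3) -> List[float]:
    """Apply the spline activation to every element of the flattened sequence."""
    return [spline_activation(v, coeffs, knots, degree) for v in sequence]


# Uniform knots; with degree 3 and 8 knots there are 8 - 3 - 1 = 4 basis functions,
# and the spline is fully supported on the interval [0, 1).
knots = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
sequence = flatten([[0.5, 0.25], [0.75, 0.0]])
transformed = spline_layer(sequence, [0.0, 1.0, 2.0, 3.0], knots)
```

Because the coefficients are free parameters, the shape of the nonlinearity itself can be adjusted during training, which is the property that the spline calculation layer relies on to express fine differences between road and background features.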
Step 320, performing long-distance dependency modeling on the first fusion feature map through the Transformer encoding module, and outputting an encoded global feature map.
As shown in fig. 6, the Transformer encoding module 22 (Transformer block) includes a feature preprocessing layer, a Transformer stack layer (Transformer Layer ×12), and a feature reduction layer (Hidden Features) connected in sequence. For the embodiment of the disclosure, when long-distance dependency modeling is performed on the first fusion feature map through the Transformer encoding module and the encoded global feature map is output, the first fusion feature map (which integrates the multi-dimensional features of the ResNetV2 and KANConv branches) may first be partitioned into blocks, embedded with position information and adjusted in dimension by the feature preprocessing layer, so as to be converted into a feature sequence carrying spatial position. The Transformer stack layer consists of a plurality of Transformer layers (for example, 12 Transformer layers); it calculates the association weights between different positions in the feature sequence through a self-attention mechanism, captures long-distance dependencies (such as the continuity of road sections that are far apart in a strip mine), and strengthens global semantic association. The feature reduction layer converts the encoded feature sequence back into a two-dimensional global feature map, which retains the long-distance dependency information and matches the input format of the subsequent modules. Finally, the encoded global feature map containing global association information is output.
Accordingly, step 320 of the embodiment may specifically include the following steps:
Step 320-1, partitioning the first fusion feature map into blocks by using the feature preprocessing layer, and embedding position information to obtain a feature sequence with spatial position information.
As shown in fig. 6, the feature preprocessing layer includes a 1×1 convolution layer (conv1×1), a hidden feature layer (Hidden Feature), and a linear projection layer (Linear Projection) connected in sequence. The 1×1 convolution layer performs channel dimension transformation and feature fusion on the first fusion feature map; the hidden feature layer adjusts the number of channels of the first fusion feature map processed by the 1×1 convolution layer to obtain a second adaptive feature map; and the linear projection layer partitions the second adaptive feature map into blocks and applies full-connection mapping to obtain the feature sequence with spatial position information.
Through the cooperative processing of these three layers, the feature map can be adapted to the input format of the Transformer. Specifically, the channel transformation and fusion of the 1×1 convolution layer reduces redundant features and improves feature compactness; the channel number adjustment of the hidden feature layer ensures consistency for the subsequent block partitioning and avoids information loss caused by dimension mismatch; and the block partitioning and spatial position embedding of the linear projection layer both retains the spatial distribution information of the features (such as the positional relationship of strip mine roads) and converts the features into a sequence form that the Transformer can process. This provides a spatial coordinate reference for the subsequent self-attention mechanism when it captures long-distance dependencies (such as the continuity of a road across regions), and thereby improves the effectiveness and suitability of the feature sequence as a whole.
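As a non-limiting illustration, block partitioning, linear projection and position embedding may be sketched in plain Python as follows. The names `to_patch_sequence`, `linear` and `embed` are hypothetical, and the patch size, the projection weights and the additive form of the position embedding are assumptions made only for this sketch.

```python
from typing import List


def to_patch_sequence(fmap: List[List[List[float]]], patch: int) -> List[List[float]]:
    """Split a [channel][row][column] map into non-overlapping patch x patch blocks.

    Blocks are visited in row-major order; each block becomes one flat token.
    """
    channels, rows, cols = len(fmap), len(fmap[0]), len(fmap[0][0])
    assert rows % patch == 0 and cols % patch == 0, "map size must be divisible by patch"
    tokens = []
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            tokens.append([fmap[ch][top + dr][left + dc]
                           for ch in range(channels)
                           for dr in range(patch)
                           for dc in range(patch)])
    return tokens


def linear(token: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Full-connection mapping of one token: weight @ token + bias."""
    return [sum(w * t for w, t in zip(row, token)) + b for row, b in zip(weight, bias)]


def embed(fmap, patch, weight, bias, pos_embed):
    """Project every block and add its position embedding (additive form is an assumption)."""
    return [[v + p for v, p in zip(linear(tok, weight, bias), pos)]
            for tok, pos in zip(to_patch_sequence(fmap, patch), pos_embed)]


# One-channel 4x4 toy map with values 0..15, split into four 2x2 blocks.
fmap = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
pos_embed = [[float(n)] * 4 for n in range(4)]  # toy embedding: block index n
sequence = embed(fmap, 2, identity, [0.0] * 4, pos_embed)
```

With the identity projection used here, the second token differs from its raw block values only by the position term, which shows how two blocks with identical content remain distinguishable by location.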
Step 320-2, performing feature transformation on the feature sequence by using the plurality of Transformer layers through a layer-by-layer iterative self-attention mechanism, and taking the high-dimensional feature vector output by the last Transformer layer as a global associated feature vector.
As shown in fig. 6, in the Transformer stack layer, each of the plurality of Transformer layers (Transformer Layer) includes a first normalization layer (Norm), a QKV linear transformation layer (QKV), a multi-head attention layer (Multi-Head Attention), a second normalization layer (Norm), a feedforward layer (Linear+GeLU), a random inactivation layer (Dropout), and an output layer (Linear+GeLU) connected in sequence. The first normalization layer normalizes the third input feature of the current Transformer layer and inputs the resulting first normalized feature to the QKV linear transformation layer. When the current Transformer layer is the first Transformer layer of the plurality of Transformer layers, the third input feature is the feature sequence; when the current Transformer layer is any Transformer layer other than the first Transformer layer, the third input feature is the high-dimensional feature vector output by the Transformer layer preceding the current Transformer layer. The QKV linear transformation layer projects the first normalized feature into query, key and value vectors through three independent linear layers. The multi-head attention layer jointly extracts information from multiple feature subspaces based on the query, key and value vectors to obtain attention-weighted features. The second normalization layer normalizes the attention-weighted features again to obtain a second normalized feature. The feedforward layer performs channel dimension expansion on the second normalized feature. The random inactivation layer performs random inactivation on the output features of the feedforward layer. The output layer performs a linear transformation on the features after random inactivation to obtain the high-dimensional feature vector output by the current Transformer layer.
Through the iterative processing of the stacked Transformer layers, the ability to capture long-distance dependencies can be enhanced. Specifically, the first and second normalization layers stabilize the feature distribution and reduce training fluctuation; the QKV linear transformation layer cooperates with the multi-head attention layer to extract association information (such as the spatial relationship between different road sections in a strip mine) from multiple feature subspaces, which overcomes the limitation of a single attention view; the feedforward layer enhances feature expression through nonlinear transformation and is suited to modeling the association between roads and complex backgrounds; and the random inactivation layer improves model generalization and reduces the risk of overfitting. The layer-by-layer iteration gradually upgrades the features from the initial sequence to a high-dimensional vector containing global association, captures the long-distance continuity of the strip mine road, provides the core association information for the subsequent generation of the global feature map, and improves the integrity and interference resistance of the segmentation result.
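As a non-limiting illustration of how self-attention relates positions that are far apart in the feature sequence, the following Python sketch implements single-head scaled dot-product attention. The multi-head splitting, normalization, feedforward, random inactivation and output layers are omitted for brevity; the names `softmax`, `matmul` and `self_attention` and the toy projection weights are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(seq: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head attention: softmax(Q K^T / sqrt(d)) V, returning output and weights."""
    q, k, v = matmul(seq, wq), matmul(seq, wk), matmul(seq, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


# Tokens 0 and 3 share the same content but sit at opposite ends of the sequence,
# loosely mimicking two road sections separated by background.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(tokens, identity, identity, identity)
```

In this toy case token 0 attends to the distant token 3 as strongly as to itself and more strongly than to its immediate neighbour, which is the behaviour that the stacked layers exploit to link separated road sections.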
Step 320-3, converting the global associated feature vector into the encoded global feature map by using the feature reduction layer.
For the global associated feature vector (a high-dimensional sequence feature containing long-distance dependencies) output by the Transformer stack layer, the feature reduction layer may convert the vector from its one-dimensional sequence form into a two-dimensional feature map format through dimension reshaping and a convolution operation. Specifically, the vector may first be regrouped into blocks according to the spatial dimensions (such as height and width) of the original feature map so as to recover the spatial dimensions; the number of channels is then adjusted through a convolution layer, which also strengthens spatial association; and the encoded global feature map is finally obtained. This process converts sequence features into image features and matches the input requirements of the subsequent road multi-scale feature extraction module while preserving the global association information.
According to the embodiment of the disclosure, the feature format adaptation problem can be solved while the global association information is retained. The global associated feature vector (in sequence form) is converted into the global feature map (in image form), which ensures that the subsequent modules can process the feature information directly. The dimension reshaping and convolution operations are intended to preserve the long-distance dependency information (such as the overall continuity of a strip mine road) during the conversion, so that the global feature map contains both the global association encoded by the Transformer and the spatial structure of image features. The output global feature map can provide global semantic support for the subsequent fusion of multi-scale features, which helps to avoid road breakage or misjudgment caused by the lack of a global view and improves the integrity and consistency of the segmentation.
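As a non-limiting illustration of the dimension reshaping performed by the feature reduction layer, the following Python sketch regroups a token sequence into a channel-first two-dimensional map. The name `sequence_to_map` is hypothetical, a row-major token order is assumed, and the subsequent convolution that adjusts the channel number is omitted.

```python
from typing import List


def sequence_to_map(tokens: List[List[float]], height: int, width: int) -> List[List[List[float]]]:
    """Reshape N = height * width tokens of C values each into a [C][height][width] map.

    Token n is assumed to correspond to spatial position (n // width, n % width).
    """
    assert len(tokens) == height * width, "token count must match the spatial size"
    channels = len(tokens[0])
    return [[[tokens[r * width + c][ch] for c in range(width)]
             for r in range(height)]
            for ch in range(channels)]


# Six tokens with two values each become a 2-channel map of size 2 x 3.
tokens = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0],
          [3.0, 13.0], [4.0, 14.0], [5.0, 15.0]]
global_map = sequence_to_map(tokens, height=2, width=3)
```

The reshaping only rearranges values, so whatever association information each token carries after encoding is kept at the spatial position the token came from.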
Step 330, performing road multi-scale feature extraction and fusion, by using the road multi-scale feature extraction module, on the first feature map output by the target ResNetV2 module and on the second feature map output by the target KANConv module respectively, to obtain a corresponding first enhancement feature map and second enhancement feature map.
As shown in fig. 2, the road multi-scale feature extraction module (RMFF block) includes a first road multi-scale feature extraction module 231 (right RMFF block) and a second road multi-scale feature extraction module 232 (left RMFF block). The first road multi-scale feature extraction module 231 is connected to the target ResNetV2 module (e.g., the second ResNetV2 block), and the second road multi-scale feature extraction module 232 is connected to the target KANConv module (e.g., the second KANConv block). The first and second road multi-scale feature extraction modules have the same network structure. As shown in fig. 7, each of them includes a 1×1 convolution branch (conv1×1 ReLU), a multi-scale dilated convolution branch (conv3×3 rate=6 ReLU, conv3×3 rate=9 ReLU, conv3×3 rate=12 ReLU), a global average pooling branch (Pooling 1×1, conv1×1, UpSample), and an adaptive fusion layer (Adaptive Feature Fusion, conv1×1 ReLU).
The multi-branch, multi-scale design of the road multi-scale feature extraction module addresses the scale diversity encountered in strip mine road segmentation. Specifically, the multi-scale dilated convolution branch covers roads of different sizes (from narrow temporary roads to wide trunk roads) through different dilation rates, which avoids the missed details or blurred edges of single-scale convolution; the global average pooling branch introduces global context to suppress local noise interference (such as ore piles and vegetation); and the adaptive fusion layer dynamically adjusts the weight of each branch according to the input features (for example, increasing the weight of the small dilation rate branch for narrow roads), which improves feature adaptability. By processing the global-local features of the ResNetV2 branch and the nonlinear-detail features of the KANConv branch separately, the multi-scale expression capability of the ResNetV2 features can be enhanced and the global relevance of the KANConv features can be improved, so that the resulting enhancement feature maps describe the complex forms of strip mine roads more accurately and provide a high-quality feature basis for subsequent segmentation.
Accordingly, for embodiments of the present disclosure, step 330 of the embodiment may include the following steps:
Step 330-1, performing local detail feature extraction, multi-scale spatial feature extraction, and global context information compression and restoration on the first feature map output by the target ResNetV2 module by using, respectively, the 1×1 convolution branch, the multi-scale dilated convolution branch and the global average pooling branch configured in the first road multi-scale feature extraction module, to obtain a first sub-feature map, a second sub-feature map and a third sub-feature map; and performing channel splicing and weighted fusion on the first sub-feature map, the second sub-feature map and the third sub-feature map by using the adaptive fusion layer configured in the first road multi-scale feature extraction module, to obtain the first enhancement feature map integrating multi-scale information.
The first sub-feature map is the feature map output by the 1×1 convolution branch of the first road multi-scale feature extraction module; it contains local detail features and retains the fine texture and edge information of the first feature map. The second sub-feature map is the feature map output by the multi-scale dilated convolution branch of the first road multi-scale feature extraction module; it integrates the multi-scale spatial features extracted by convolutions with different dilation rates and covers road structures ranging from narrow to wide. The third sub-feature map is the feature map output by the global average pooling branch of the first road multi-scale feature extraction module; it contains global context information and reflects the spatial relationship between a road and the overall mining area environment. The first enhancement feature map is the feature map output by the adaptive fusion layer; it integrates the local details of the first sub-feature map, the multi-scale spatial features of the second sub-feature map and the global context of the third sub-feature map, and enhances the multi-scale expression of strip mine roads.
For the embodiment of the disclosure, the first road multi-scale feature extraction module processes the first feature map output by the target ResNetV2 module as follows. The 1×1 convolution branch applies a 1×1 convolution and ReLU activation to extract cross-channel local detail features (such as road edge textures) and generate the first sub-feature map. The multi-scale dilated convolution branch applies 3×3 dilated convolutions with different dilation rates (such as 6, 9 and 12) in parallel to capture spatial features at different scales (a small dilation rate focuses on narrow road details, and a large dilation rate covers the overall contour of wide roads) and generate the second sub-feature map. The global average pooling branch first compresses the spatial dimensions of the first feature map through global average pooling, then restores the size through a 1×1 convolution and upsampling, and extracts global context information (such as the relationship between roads and the overall layout of the mining area) to generate the third sub-feature map. The adaptive fusion layer then splices the three sub-feature maps in the channel dimension, generates attention weights through a 1×1 convolution, performs weighted fusion on the sub-feature maps to highlight the multi-scale features related to roads, and finally obtains the first enhancement feature map integrating multi-scale information.
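As a non-limiting illustration of the branches described above, the following single-channel Python sketch implements a 3×3 dilated convolution, a global average pooling branch restored to the input size, and a weighted fusion. The names `dilated_conv3x3`, `effective_extent`, `global_avg_branch` and `fuse` are hypothetical; the fusion weights are obtained here from fixed logits through a softmax, as a stand-in for the attention weights that the 1×1 convolution of the adaptive fusion layer would generate from the spliced features.

```python
import math
from typing import List

Image = List[List[float]]


def dilated_conv3x3(img: Image, kernel: Image, rate: int) -> Image:
    """3x3 dilated convolution with zero padding equal to rate, so the size is preserved."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for i in range(3):
                for j in range(3):
                    rr, cc = r + (i - 1) * rate, c + (j - 1) * rate
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[r][c] += kernel[i][j] * img[rr][cc]
    return out


def effective_extent(rate: int) -> int:
    """Side length, in pixels, spanned by a 3x3 kernel with the given dilation rate."""
    return 2 * rate + 1


def global_avg_branch(img: Image) -> Image:
    """Global average pooling to one value, then broadcast back to the input size."""
    rows, cols = len(img), len(img[0])
    mean = sum(map(sum, img)) / (rows * cols)
    return [[mean] * cols for _ in range(rows)]


def fuse(branches: List[Image], logits: List[float]) -> Image:
    """Weighted sum of branch outputs; softmax over logits is an assumption of this sketch."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    weights = [e / sum(exps) for e in exps]
    rows, cols = len(branches[0]), len(branches[0][0])
    return [[sum(w * b[r][c] for w, b in zip(weights, branches)) for c in range(cols)]
            for r in range(rows)]


# 5x5 toy map with a single bright pixel at the centre.
img = [[1.0 if (r, c) == (2, 2) else 0.0 for c in range(5)] for r in range(5)]
ones = [[1.0] * 3 for _ in range(3)]
narrow = dilated_conv3x3(img, ones, rate=1)  # responds only next to the centre
wide = dilated_conv3x3(img, ones, rate=2)    # reaches the corners of the map
context = global_avg_branch(img)
enhanced = fuse([wide, context], [0.0, 0.0])
```

With dilation rates of 6, 9 and 12, `effective_extent` gives spans of 13, 19 and 25 pixels respectively, which indicates why the parallel branches can respond to roads of different widths without adding kernel parameters.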
Step 330-2, performing local detail feature extraction, multi-scale spatial feature extraction, and global context information compression and restoration on the second feature map output by the target KANConv module by using, respectively, the 1×1 convolution branch, the multi-scale dilated convolution branch and the global average pooling branch configured in the second road multi-scale feature extraction module, to obtain a fourth sub-feature map, a fifth sub-feature map and a sixth sub-feature map; and performing channel splicing and weighted fusion on the fourth sub-feature map, the fifth sub-feature map and the sixth sub-feature map by using the adaptive fusion layer configured in the second road multi-scale feature extraction module, to obtain the second enhancement feature map integrating multi-scale information.
The fourth sub-feature map is the output of the 1×1 convolution branch of the second road multi-scale feature extraction module and contains the local detail features of the second feature map. The fifth sub-feature map is the output of the multi-scale dilated convolution branch of the second road multi-scale feature extraction module and integrates the multi-scale spatial features of the second feature map. The sixth sub-feature map is the output of the global average pooling branch of the second road multi-scale feature extraction module and contains the global context information of the second feature map. The second enhancement feature map is the output of the second road multi-scale feature extraction module; it integrates multi-scale information and enhances the representation of road features.
For the embodiment of the disclosure, the second road multi-scale feature extraction module processes the second feature map output by the target KANConv module as follows. The 1×1 convolution branch extracts local detail features from the second feature map to obtain the fourth sub-feature map. The multi-scale dilated convolution branch extracts multi-scale spatial features through dilated convolutions with different dilation rates to obtain the fifth sub-feature map. The global average pooling branch first compresses and then restores the global context information of the second feature map to obtain the sixth sub-feature map. The adaptive fusion layer then splices the three sub-feature maps in the channel dimension and highlights key features through weighted fusion, finally obtaining the second enhancement feature map integrating multi-scale information.
Step 340, performing layer-by-layer upsampling, channel merging and compression decoding on the first feature map, the second feature map, the global feature map, the initial feature map, the first enhancement feature map and the second enhancement feature map by using the progressive upsampling module, to obtain the road segmentation result of the strip mine remote sensing image.
As shown in fig. 2, the progressive upsampling module 24 includes a plurality of decoding layers with decreasing channel dimensions and a segmentation result generation layer connected to the plurality of decoding layers. Each decoding layer includes a third feature channel splicing module (concat) and an upsampling module (Decode block), and the segmentation result generation layer includes a conv1×1 layer and a Sigmoid layer. As shown in fig. 8, the upsampling module (Decode block) includes a conv1×1+ReLU layer, a batch normalization layer (BatchNorm layer), a conv3×3+ReLU layer, and an upsampling layer (UpSample). In the conv1×1+ReLU layer, the 1×1 convolution performs cross-channel information integration and dimension adjustment (such as compressing or expanding the number of channels), and the ReLU activation function introduces nonlinearity, enhances feature expression, and performs a preliminary extraction and transformation of the features. The BatchNorm layer performs batch normalization on the features output by the conv1×1+ReLU layer, which stabilizes the feature distribution (reducing fluctuation of the mean and variance), accelerates training convergence, and alleviates gradient vanishing and explosion. In the conv3×3+ReLU layer, the 3×3 convolution captures local spatial correlation and extracts richer spatial features, and the ReLU further strengthens nonlinearity and improves feature discrimination. The UpSample layer performs an upsampling operation (such as interpolation or transposed convolution) to enlarge the feature map and restore it toward the target resolution in preparation for subsequent feature fusion or output.
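As a non-limiting illustration of the upsampling operation, the following Python sketch enlarges a single-channel feature map by nearest-neighbour interpolation. This is only one of the options mentioned above (interpolation or transposed convolution); the name `upsample_nearest` and the scale factor of 2 are assumptions of the sketch.

```python
from typing import List


def upsample_nearest(channel: List[List[float]], factor: int = 2) -> List[List[float]]:
    """Nearest-neighbour upsampling: repeat every pixel factor times along both axes."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(factor)]   # repeat along the width
        out.extend(list(wide) for _ in range(factor))    # repeat along the height
    return out


decoded = upsample_nearest([[1.0, 2.0], [3.0, 4.0]])
```

Each application doubles the height and width, so a chain of decoding layers restores the resolution step by step rather than in one jump.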
Correspondingly, for the embodiment of the disclosure, the steps of the embodiment may include: sequentially determining each decoding layer of the plurality of decoding layers as the current decoding layer; performing channel fusion on a fourth input feature by using the third feature channel splicing module configured in the current decoding layer to obtain a second fusion feature map of the current decoding layer; performing upsampling and compression decoding on the second fusion feature map by using the upsampling module configured in the current decoding layer to obtain the decoding feature map output by the current decoding layer; and, based on the decoding feature map output by the last decoding layer of the plurality of decoding layers, performing channel dimension compression, probability mapping and binarization by using the segmentation result generation layer to obtain the road segmentation result of the strip mine remote sensing image.
When the current decoding layer is the first decoding layer of the plurality of decoding layers, the fourth input feature comprises the first feature map output by the last ResNetV2 module, the second feature map output by the last KANConv module, and the global feature map. When the current decoding layer is the target decoding layer, that is, the decoding layer whose channel dimension is the same as that of the target ResNetV2 module and the target KANConv module, the fourth input feature comprises the first enhancement feature map, the second enhancement feature map, and the decoding feature map output by the decoding layer preceding the current decoding layer. When the current decoding layer is the last decoding layer of the plurality of decoding layers, the fourth input feature comprises the decoding feature map output by the decoding layer preceding the current decoding layer and the initial feature map. When the current decoding layer is any decoding layer other than the first decoding layer, the target decoding layer and the last decoding layer, the fourth input feature comprises the decoding feature map output by the decoding layer preceding the current decoding layer, together with the first feature map and the second feature map whose channel dimension is the same as that of the current decoding layer.
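As a non-limiting illustration, the selection of the fourth input feature for each decoding layer may be sketched in Python as follows. The function name `decoder_inputs`, the string labels, and the example of four decoding layers with the target decoding layer at index 2 are hypothetical; the number of decoding layers is not fixed by this sketch.

```python
from typing import List


def decoder_inputs(index: int, num_layers: int, target_index: int) -> List[str]:
    """Return labels of the feature maps spliced by the decoding layer at position index.

    Layers are numbered from 0 (first decoding layer) to num_layers - 1 (last).
    """
    if index == 0:
        # First decoding layer: deepest encoder outputs plus the encoded global map.
        return ["first_feature_map[last]", "second_feature_map[last]", "global_feature_map"]
    previous = f"decoding_feature_map[{index - 1}]"
    if index == target_index:
        # Target decoding layer: add the two enhancement feature maps.
        return [previous, "first_enhancement_feature_map", "second_enhancement_feature_map"]
    if index == num_layers - 1:
        # Last decoding layer: add the initial feature map to supplement detail.
        return [previous, "initial_feature_map"]
    # Any other layer: encoder maps whose channel dimension matches this layer.
    return [previous, "first_feature_map[same_dim]", "second_feature_map[same_dim]"]


plan = [decoder_inputs(i, num_layers=4, target_index=2) for i in range(4)]
```

Every layer after the first receives the previous layer's decoding feature map, so the routing differs only in which encoder-side maps are spliced in alongside it.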
By fusing the multi-source features in stages and gradually up-sampling, the accuracy and the integrity of road segmentation can be effectively improved. Different decoding layers are integrated with various features in a targeted mode, a first decoding layer fuses high-level features with global association to lay a semantic foundation for segmentation, a target decoding layer adds enhanced features to strengthen multi-scale adaptability, and a last decoding layer introduces initial features to supplement detail information. The layers of the up-sampling module cooperate to compress the channel while recovering the resolution, and balance the feature richness and the computing efficiency. The processing of the segmentation result generation layer converts the characteristics into a binary segmentation map, so that the road area can be clearly presented, the limitation of single characteristic segmentation is solved by the whole flow, and the continuity and the accuracy of road segmentation in a complex strip mine scene can be improved.
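As a non-limiting illustration of the probability mapping and binarization performed by the segmentation result generation layer, the following Python sketch applies a sigmoid to single-channel scores and thresholds the result. The names `sigmoid` and `binarize` and the threshold of 0.5 are assumptions of the sketch; the preceding 1×1 convolution that compresses the channels to one is omitted.

```python
import math
from typing import List


def sigmoid(value: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def binarize(scores: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Mark a pixel as road (1) when its probability reaches the threshold, else 0."""
    return [[1 if sigmoid(v) >= threshold else 0 for v in row] for row in scores]


mask = binarize([[2.0, -1.0], [0.0, -3.0]])
```

The resulting mask uses 1 for road pixels and 0 for non-road pixels, matching the binary form of the road segmentation result.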
In summary, the technical scheme of the application addresses the core problems of strip mine road segmentation through a cooperative multi-module design, with the following technical effects. In the dual-branch encoder, the ResNetV2 module extracts global semantic and local spatial features by means of the residual connection and channel splicing of the main branch and the parallel branch, and the KANConv module captures detail difference features under complex backgrounds (such as the textures that distinguish roads, broken stones and trees) through spline calculation and nonlinear transformation; the two are fused to form multi-dimensional features, which alleviates the feature discrimination confusion caused by multi-source interference. The Transformer encoding module strengthens the long-distance spatial dependency modeling of strip mine roads through block partitioning with embedded position information, a multi-layer self-attention mechanism and channel expansion, which improves the continuity and hole repair capability of road segmentation. The road multi-scale feature extraction module fuses features of different scales through multiple branches (1×1 convolution, multi-scale dilated convolution and global pooling), which enhances edge expression in scenes with fuzzy boundaries. The progressive upsampling module, with its decoding layers of decreasing channel dimension, fuses multi-source features (including the initial features, the enhancement features and the like) during decoding and restores the resolution layer by layer. As a result, the overall quality, accuracy and stability of strip mine road segmentation are improved.
Further, as a specific implementation of the method shown in fig. 1 and 3, the embodiment provides a strip mine transportation road segmentation device which, as shown in fig. 9, includes a segmentation module 91.
The segmentation module 91 is configured to perform strip mine transportation road segmentation on the strip mine remote sensing image by using a trained road segmentation model to obtain a road segmentation result, wherein the road segmentation model comprises a dual-branch encoder, a Transformer encoding module, a road multi-scale feature extraction module and a progressive upsampling module, and the dual-branch encoder comprises a basic convolution layer, a plurality of ResNetV2 modules with increasing channel dimension, a plurality of KANConv modules with increasing channel dimension, and a first feature channel splicing module;
the segmentation module 91 is specifically configured to:
Extracting an initial feature map of the strip mine remote sensing image by using the basic convolution layer in the dual-branch encoder; performing multi-stage feature extraction based on the initial feature map by the plurality of ResNetV2 modules and the plurality of KANConv modules respectively, to obtain a first feature map, containing global semantic features and local spatial features, output by each ResNetV2 module and a second feature map, containing nonlinear complex features and detail difference features, output by each KANConv module; and performing channel fusion on the first feature map output by the last ResNetV2 module and the second feature map output by the last KANConv module by using the first feature channel splicing module, to obtain a first fusion feature map containing multi-dimensional features;
Performing long-distance dependency modeling on the first fusion feature map through the Transformer encoding module, and outputting an encoded global feature map;
Performing road multi-scale feature extraction and fusion, by using the road multi-scale feature extraction module, on the first feature map output by the target ResNetV2 module and on the second feature map output by the target KANConv module respectively, to obtain a corresponding first enhancement feature map and second enhancement feature map;
And carrying out layer-by-layer upsampling, channel merging and compression decoding processing on the first feature map, the second feature map, the global feature map, the initial feature map, the first enhancement feature map and the second enhancement feature map by utilizing a progressive upsampling module to obtain a road segmentation result of the remote sensing image of the strip mine.
In some embodiments of the present application, the ResNetV2 module includes a main branch sequence layer, a parallel branch adjustment layer residually connected to the main branch sequence layer, and a second feature channel splicing module, where the main branch sequence layer includes a normalization and rectified linear unit (ReLU) activation combination layer, a 1×1 convolution layer, a group normalization and ReLU activation combination layer, a 3×3 convolution layer, a group normalization and ReLU activation combination layer, and a 1×1 convolution layer that are sequentially connected, and the parallel branch adjustment layer includes a 1×1 convolution and group normalization combination layer;
When the plurality of ResNetV2 modules perform multi-stage feature extraction based on the initial feature map to obtain the first feature map that is output by each ResNetV2 module and contains global semantic features and local spatial features, the segmentation module 91 is specifically configured to:
Extracting global semantic features and local spatial features of the first input feature by using the main branch sequence layer to obtain a main branch feature map, extracting basic global semantic features of the first input feature by using the parallel branch adjustment layer to obtain an auxiliary feature map, and merging the main branch feature map and the auxiliary feature map in the channel dimension by using the second feature channel splicing module to obtain the first feature map output by the current ResNetV2 module;
When the current ResNetV2 module is the first ResNetV2 module of the plurality of ResNetV2 modules, the first input feature is the initial feature map; when the current ResNetV2 module is any ResNetV2 module other than the first one, the first input feature is the first feature map output by the ResNetV2 module immediately preceding the current ResNetV2 module.
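As a minimal sketch of the module just described, the following NumPy code chains the pre-activation main branch and the 1×1 parallel branch and concatenates the two outputs along the channel axis. The function and parameter names (`resnet_block`, `w1`, `w2`, `w3`, `ws`), the group count, the use of group normalization for the first normalization layer, and the omission of stride and bias terms are illustrative assumptions rather than details taken from the embodiment.

```python
import numpy as np

def group_norm(x, groups, eps=1e-5):
    # x: (C, H, W); each group of channels is normalized to zero mean, unit variance
    c, h, w = x.shape
    g = x.reshape(groups, c // groups, h, w)
    mean = g.mean(axis=(1, 2, 3), keepdims=True)
    var = g.var(axis=(1, 2, 3), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(c, h, w)

def relu(x):
    return np.maximum(x, 0.0)

def conv1x1(x, weight):
    # weight: (C_out, C_in); a 1x1 convolution is a per-pixel linear map
    return np.einsum("oc,chw->ohw", weight, x)

def conv3x3(x, weight):
    # weight: (C_out, C_in, 3, 3); zero padding of 1 keeps H and W unchanged
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j],
                             xp[:, i:i + h, j:j + wd])
    return out

def resnet_block(x, params, groups=2):
    # Main branch: (norm + ReLU) -> 1x1 -> (GN + ReLU) -> 3x3 -> (GN + ReLU) -> 1x1
    m = conv1x1(relu(group_norm(x, groups)), params["w1"])
    m = conv3x3(relu(group_norm(m, groups)), params["w2"])
    m = conv1x1(relu(group_norm(m, groups)), params["w3"])
    # Parallel branch: 1x1 convolution followed by group normalization
    a = group_norm(conv1x1(x, params["ws"]), groups)
    # Channel splicing of the main-branch and auxiliary feature maps
    return np.concatenate([m, a], axis=0)
```

With a 4-channel input and 8 output channels per branch, the spliced output has 16 channels at the same spatial size, which is how the channel dimension can grow from one module to the next in this sketch.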
In some embodiments of the present application, the KANConv module includes a 1×1 convolution and padding layer, a feature map flattening layer, a spline calculation layer, a batch normalization layer, and a hidden layer, connected in sequence;
When the plurality of KANConv modules perform multi-stage feature extraction based on the initial feature map to obtain the second feature map that is output by each KANConv module and contains nonlinear complex features and detail difference features, the segmentation module 91 is specifically configured to:
performing channel compression and boundary padding on the second input feature by using the 1×1 convolution and padding layer to obtain a first adaptive feature map;
flattening the spatial dimensions of the first adaptive feature map by using the feature map flattening layer to obtain one-dimensional sequence features;
performing spline interpolation calculation on the one-dimensional sequence features by using the spline calculation layer, thereby introducing a nonlinear transformation that enhances the detail difference expression of the features, to obtain transformed sequence features;
performing batch normalization in the channel dimension on the transformed sequence features by using the batch normalization layer to obtain normalized features;
restoring the normalized features into a two-dimensional feature map through a fully connected mapping by using the hidden layer, and taking the two-dimensional feature map as the second feature map which is output by the current KANConv module and contains nonlinear complex features and detail difference features;
When the current KANConv module is the first KANConv module of the plurality of KANConv modules, the second input feature is the initial feature map; when the current KANConv module is any KANConv module other than the first one, the second input feature is the second feature map output by the KANConv module immediately preceding the current KANConv module.
In some embodiments of the application, the Transformer encoding module comprises a feature preprocessing layer, a Transformer stacking layer and a feature restoration layer which are connected in sequence, wherein the feature preprocessing layer comprises a 1×1 convolution layer, a hidden feature layer and a linear projection layer which are connected in sequence, the Transformer stacking layer comprises a plurality of Transformer layers, and each Transformer layer comprises a first normalization layer, a QKV linear transformation layer, a multi-head attention layer, a second normalization layer, a feedforward layer, a dropout layer and an output layer which are connected in sequence;
When long-distance dependency modeling is performed on the first fusion feature map through the Transformer encoding module and the encoded global feature map is output, the segmentation module 91 is specifically configured to:
Performing, by using the feature preprocessing layer, block partitioning on the first fusion feature map and embedding position information to obtain a feature sequence carrying spatial position information, wherein the 1×1 convolution layer is used to perform channel dimension transformation and feature fusion on the first fusion feature map, the hidden feature layer is used to adjust the number of channels of the first fusion feature map processed by the 1×1 convolution layer to obtain a second adaptive feature map, and the linear projection layer is used to partition the second adaptive feature map into blocks and map them through a fully connected mapping to obtain the feature sequence carrying spatial position information;
Performing feature transformation on the feature sequence by using the plurality of Transformer layers through a layer-by-layer iterated self-attention mechanism, and taking the high-dimensional feature vector output by the last Transformer layer as a global associated feature vector, wherein the first normalization layer is used to normalize a third input feature of the current Transformer layer to obtain a first normalized feature, and the first normalized feature is input into the QKV linear transformation layer; when the current Transformer layer is the first Transformer layer of the plurality of Transformer layers, the third input feature is the feature sequence, and when the current Transformer layer is any Transformer layer other than the first one, the third input feature is the high-dimensional feature vector output by the Transformer layer immediately preceding the current Transformer layer;
And converting the global associated feature vector into the encoded global feature map by using the feature restoration layer.
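The encoding path above can be sketched in NumPy as follows, starting from the second adaptive feature map (the 1×1 convolution and hidden feature layer are left out for brevity). The sketch partitions the map into blocks, projects them linearly, adds a position embedding, applies pre-normalized multi-head self-attention layers, and reshapes the sequence back into a feature map. The residual connections, the ReLU feedforward activation, the absence of learnable normalization parameters, and all names (`to_patches`, `transformer_layer`, `encode`, the `w_*` keys) are assumptions made for illustration; dropout is omitted because it acts as the identity at inference time.

```python
import numpy as np

def to_patches(fmap, patch):
    # fmap: (C, H, W) -> (N, C * patch * patch), with N = (H / patch) * (W / patch)
    c, h, w = fmap.shape
    gh, gw = h // patch, w // patch
    x = fmap.reshape(c, gh, patch, gw, patch)
    return x.transpose(1, 3, 0, 2, 4).reshape(gh * gw, c * patch * patch)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transformer_layer(seq, p, heads):
    n, d = seq.shape
    dh = d // heads
    # First normalization, then the QKV linear transformation
    qkv = layer_norm(seq) @ p["w_qkv"]                          # (N, 3D)
    q, k, v = (t.reshape(n, heads, dh).transpose(1, 0, 2)
               for t in np.split(qkv, 3, axis=-1))              # (heads, N, dh) each
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))      # (heads, N, N)
    ctx = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    seq = seq + ctx @ p["w_o"]                                  # assumed residual connection
    # Second normalization, then the feedforward layer
    ff = np.maximum(layer_norm(seq) @ p["w_ff1"], 0.0) @ p["w_ff2"]
    return seq + ff                                             # assumed residual connection

def encode(fmap, patch, p, layers, heads):
    # Block partitioning, linear projection and position embedding
    seq = to_patches(fmap, patch) @ p["w_proj"] + p["pos"]
    for layer_params in layers:
        seq = transformer_layer(seq, layer_params, heads)
    # Feature restoration: (N, D) sequence -> (D, H / patch, W / patch) map
    gh, gw = fmap.shape[1] // patch, fmap.shape[2] // patch
    return seq.T.reshape(-1, gh, gw)
```

Because every block attends to every other block in each layer, a road pixel at one edge of the image can influence the representation at the opposite edge, which is the long-distance dependency the module is meant to capture.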
In some embodiments of the present application, the road multi-scale feature extraction module includes a first road multi-scale feature extraction module connected to the target ResNetV2 module and a second road multi-scale feature extraction module connected to the target KANConv module, each of the first road multi-scale feature extraction module and the second road multi-scale feature extraction module including a 1×1 convolution branch, a multi-scale dilated convolution branch, a global average pooling branch, and an adaptive fusion layer;
When the road multi-scale feature extraction module is utilized to extract and fuse the road multi-scale features of the first feature map output by the target ResNetV2 module and the second feature map output by the target KANConv module respectively, the segmentation module 91 is specifically configured to:
Respectively carrying out local detail feature extraction, multi-scale spatial feature extraction, and global context information compression and restoration on the first feature map output by the target ResNetV2 module by using the 1×1 convolution branch, the multi-scale dilated convolution branch and the global average pooling branch which are configured in the first road multi-scale feature extraction module to obtain a first sub-feature map, a second sub-feature map and a third sub-feature map, and carrying out channel splicing and weighted fusion on the first sub-feature map, the second sub-feature map and the third sub-feature map by using the adaptive fusion layer configured in the first road multi-scale feature extraction module to obtain a first enhancement feature map integrating multi-scale information;
And respectively carrying out local detail feature extraction, multi-scale spatial feature extraction, and global context information compression and restoration on the second feature map output by the target KANConv module by using the 1×1 convolution branch, the multi-scale dilated convolution branch and the global average pooling branch which are configured in the second road multi-scale feature extraction module to obtain a fourth sub-feature map, a fifth sub-feature map and a sixth sub-feature map, and carrying out channel splicing and weighted fusion on the fourth sub-feature map, the fifth sub-feature map and the sixth sub-feature map by using the adaptive fusion layer configured in the second road multi-scale feature extraction module to obtain a second enhancement feature map integrating multi-scale information.
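A minimal NumPy sketch of one such three-branch module is given below. The dilation rates, the summation of the dilated convolutions inside the multi-scale branch, the softmax-normalized branch weights used for the weighted fusion, and all names (`multi_scale_block`, `w_local`, `w_dilated`, `w_global`, `alpha`, `w_fuse`) are assumptions chosen for illustration; the embodiment does not fix them in this passage.

```python
import numpy as np

def conv1x1(x, weight):
    # weight: (C_out, C_in)
    return np.einsum("oc,chw->ohw", weight, x)

def dilated_conv3x3(x, weight, rate):
    # weight: (C_out, C_in, 3, 3); padding equal to the rate keeps H and W unchanged
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (rate, rate), (rate, rate)))
    out = np.zeros((weight.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j],
                             xp[:, i * rate:i * rate + h, j * rate:j * rate + wd])
    return out

def multi_scale_block(x, p, rates=(1, 2, 4)):
    # Branch 1: local detail features from a 1x1 convolution
    local = conv1x1(x, p["w_local"])
    # Branch 2: multi-scale spatial features from dilated convolutions (summed here)
    multi = sum(dilated_conv3x3(x, wk, r) for wk, r in zip(p["w_dilated"], rates))
    # Branch 3: global context compressed by average pooling, then restored to H x W
    pooled = x.mean(axis=(1, 2), keepdims=True)
    glob = np.broadcast_to(conv1x1(pooled, p["w_global"]), local.shape)
    # Adaptive fusion: channel splicing, per-branch weighting, 1x1 projection
    stacked = np.concatenate([local, multi, glob], axis=0)
    alpha = np.exp(p["alpha"] - p["alpha"].max())
    alpha = alpha / alpha.sum()
    k = local.shape[0]
    weighted = stacked * np.repeat(alpha, k)[:, None, None]
    return conv1x1(weighted, p["w_fuse"])
```

Larger dilation rates widen the receptive field without reducing resolution, which is why this kind of branch is commonly used when thin structures such as roads must be related to their wider surroundings.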
In some embodiments of the present application, the progressive upsampling module includes a plurality of decoding layers having decreasing channel dimensions and a segmentation result generation layer connected to the plurality of decoding layers, each of the plurality of decoding layers including a third feature channel splicing module and an upsampling module;
When the progressive upsampling module is used for performing layer-by-layer upsampling, channel merging and compression decoding processing on the first feature map, the second feature map, the global feature map, the initial feature map, the first enhancement feature map and the second enhancement feature map to obtain the road segmentation result of the strip mine remote sensing image, the segmentation module 91 is specifically configured to:
Sequentially determining each of the plurality of decoding layers as the current decoding layer, performing channel fusion on a fourth input feature by using the third feature channel splicing module configured in the current decoding layer to obtain a second fusion feature map of the current decoding layer, and performing upsampling and compression decoding processing on the second fusion feature map by using the upsampling module configured in the current decoding layer to obtain a decoding feature map output by the current decoding layer;
Carrying out channel dimension compression, probability mapping and binarization processing by utilizing the segmentation result generation layer based on the decoding feature map output by the last decoding layer of the plurality of decoding layers to obtain the road segmentation result of the strip mine remote sensing image;
When the current decoding layer is the first decoding layer of the plurality of decoding layers, the fourth input feature comprises the first feature map output by the last ResNetV2 module, the second feature map output by the last KANConv module and the global feature map; when the current decoding layer is the target decoding layer, whose channel dimension is the same as that of the target ResNetV2 module and the target KANConv module, the fourth input feature comprises the first enhancement feature map, the second enhancement feature map and the decoding feature map output by the decoding layer immediately preceding the current decoding layer; when the current decoding layer is the last decoding layer of the plurality of decoding layers, the fourth input feature comprises the decoding feature map output by the decoding layer immediately preceding the current decoding layer and the initial feature map; and when the current decoding layer is any decoding layer other than the first decoding layer, the target decoding layer and the last decoding layer, the fourth input feature comprises the decoding feature map output by the decoding layer immediately preceding the current decoding layer, together with the first feature map and the second feature map whose channel dimension is the same as that of the current decoding layer.
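The four cases for the fourth input feature amount to a small selection rule, sketched below in plain Python. The function name, the argument names, and the choice to index encoder outputs by their channel dimension are hypothetical conveniences; the feature maps themselves are treated as opaque objects.

```python
def select_decoder_inputs(i, channels, target_channels, first_maps, second_maps,
                          global_map, initial_map, first_enh, second_enh,
                          prev_decoded=None):
    """Return the feature maps spliced together at decoding layer i.

    channels        -- channel dimension of each decoding layer, in decreasing order
    target_channels -- channel dimension shared by the target encoder modules
    first_maps      -- {channel dimension: first feature map} from the ResNetV2 branch
    second_maps     -- {channel dimension: second feature map} from the KANConv branch
    prev_decoded    -- decoding feature map output by the preceding decoding layer
    """
    if i == 0:
        # First decoding layer: outputs of the last (widest) encoder modules plus
        # the encoded global feature map
        return [first_maps[max(first_maps)], second_maps[max(second_maps)], global_map]
    if i == len(channels) - 1:
        # Last decoding layer: previous decoding output plus the initial feature map
        return [prev_decoded, initial_map]
    if channels[i] == target_channels:
        # Target decoding layer: previous decoding output plus both enhancement maps
        return [prev_decoded, first_enh, second_enh]
    # Any other layer: previous decoding output plus the encoder maps whose channel
    # dimension matches the current decoding layer
    return [prev_decoded, first_maps[channels[i]], second_maps[channels[i]]]


# Usage with string placeholders standing in for feature maps
channels = [512, 256, 128, 64]
first_maps = {128: "r128", 256: "r256", 512: "r512"}
second_maps = {128: "k128", 256: "k256", 512: "k512"}
prev = None
for i in range(len(channels)):
    inputs = select_decoder_inputs(i, channels, 256, first_maps, second_maps,
                                   "global", "initial", "r_enh", "k_enh", prev)
    prev = "d%d" % i  # stands in for the decoding feature map produced by layer i
```

Indexing the skip connections by channel dimension mirrors the wording of the embodiment, which pairs each intermediate decoding layer with the encoder outputs of the same channel dimension.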
As shown in FIG. 10, the apparatus further includes a training module 92;
The training module 92 is configured to train the road segmentation model, and is specifically configured to: acquire a remote sensing image training set, wherein the remote sensing image training set comprises sample strip mine remote sensing images and corresponding road pixel-level labeling masks; input the sample strip mine remote sensing images in the remote sensing image training set into the road segmentation model whose network parameters have been initialized, sequentially extract the fusion feature map through the double-branch encoder, model long-distance dependence through the Transformer encoding module, enhance features through the road multi-scale feature extraction module, and decode and output a predicted segmentation mask through the progressive upsampling module; calculate a difference value between the predicted segmentation mask and the road pixel-level labeling mask based on a preset loss function, wherein the preset loss function comprises a Dice loss function, a BCE loss function and a Hausdorff distance loss function, the Dice loss function is used for optimizing the category intersection-over-union, and the Hausdorff distance loss function is used for reducing the boundary distance between the prediction and the labeling; update the network parameters of the road segmentation model through a back propagation algorithm based on the difference value; and repeat the above steps until the preset loss function converges or a preset number of training rounds is reached, so as to obtain the trained road segmentation model.
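The three loss terms can be illustrated with the following NumPy sketch. The equal default weights, the 0.5 threshold, and the function names are assumptions, since the embodiment does not state how the terms are combined. The Hausdorff term here is the plain, non-differentiable distance between the foreground pixel sets of the thresholded prediction and the label; a trainable model would need a differentiable surrogate, which this sketch does not attempt.

```python
import numpy as np

def dice_loss(prob, mask, eps=1e-6):
    # 1 - Dice coefficient between predicted probabilities and the binary label
    inter = (prob * mask).sum()
    return float(1.0 - (2.0 * inter + eps) / (prob.sum() + mask.sum() + eps))

def bce_loss(prob, mask, eps=1e-7):
    # Pixel-wise binary cross-entropy, averaged over the image
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p)).mean())

def hausdorff_distance(pred_bin, mask):
    # Symmetric Hausdorff distance between two foreground pixel sets (brute force)
    a = np.argwhere(np.asarray(pred_bin) > 0)
    b = np.argwhere(np.asarray(mask) > 0)
    if len(a) == 0 and len(b) == 0:
        return 0.0
    if len(a) == 0 or len(b) == 0:
        # Assumed penalty when one set is empty: the image diagonal
        return float(np.hypot(*np.asarray(mask).shape))
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def segmentation_loss(prob, mask, w_dice=1.0, w_bce=1.0, w_hd=1.0, threshold=0.5):
    # Weighted sum of the three terms; the weights are illustrative assumptions
    hd = hausdorff_distance(prob >= threshold, mask)
    return w_dice * dice_loss(prob, mask) + w_bce * bce_loss(prob, mask) + w_hd * hd
```

The Dice and BCE terms respond to region overlap and per-pixel confidence, while the Hausdorff term grows with the largest gap between predicted and labeled road pixels, so it penalizes stray fragments and missing stretches that barely change the overlap.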
It should be noted that, for other descriptions of the functional units of the strip mine transportation road segmentation device provided in this embodiment, reference may be made to the corresponding descriptions of fig. 1 and fig. 3, which are not repeated here.
Based on the above-described method shown in fig. 1 and 3, accordingly, the present embodiment also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described strip mine haul road segmentation method shown in fig. 1 and 3.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for causing an electronic device (such as a personal computer, a server, or a network device) to execute the method of each implementation scenario of the present application.
Based on the methods shown in fig. 1 and 3 and the virtual device embodiments shown in fig. 9 and 10, and in order to achieve the above objective, an embodiment of the present application further provides an electronic device, which may be a personal computer, a tablet computer, a server, or another network device. The electronic device includes a storage medium and a processor, where the storage medium is used to store a computer program, and the processor is used to execute the computer program to implement the strip mine transportation road segmentation method shown in fig. 1 and 3.
Optionally, the physical device may further include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a Wi-Fi module, and so on. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Wi-Fi interface), and the like.
It will be appreciated by those skilled in the art that the physical device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
The storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the physical device described above and supports the execution of information processing programs and other software and/or programs. The network communication module is used for realizing communication among the components in the storage medium, as well as communication with other hardware and software in the physical device.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general hardware platforms, or may be implemented by hardware.
Through multi-module collaborative design, the embodiment of the invention addresses the core problems of open-pit mine road segmentation in a targeted manner, with the following technical effects. In the double-branch encoder, the ResNetV2 module accurately extracts global semantic features and local spatial features by means of residual connection and channel splicing of the main branch and the parallel branch, while the KANConv module captures detail difference features under a complex background (such as textures that distinguish roads, broken stones and trees) through spline calculation and nonlinear transformation; fusing the two forms multi-dimensional features, which effectively relieves the feature discrimination confusion caused by multi-source interference. The Transformer encoding module strengthens long-distance spatial dependence modeling of open-pit mine roads through block partitioning with embedded position information, a multi-layer self-attention mechanism and channel expansion, improving the continuity and hole-repair capability of road segmentation. The road multi-scale feature extraction module uses multiple branches (1×1 convolution, multi-scale dilated convolution and global pooling) to fuse features of different scales, which enhances edge expression in scenes with blurred boundaries and improves edge smoothness. The progressive upsampling module fuses multi-stage features (including the initial features, the enhancement features and the like) step by step during decoding and matches channel dimensions so that the features are restored accurately and the loss of accuracy is reduced. Together, these modules improve the overall accuracy and quality of strip mine transportation road segmentation.
Those skilled in the art will appreciate that the drawings are merely schematic illustrations of a preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required to practice the application. Those skilled in the art will also appreciate that the modules of an apparatus in an implementation scenario may be distributed in the apparatus as described for that implementation scenario, or may be located, with corresponding changes, in one or more apparatuses different from that of the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above serial numbers of the application are merely for description and do not represent the merits of the implementation scenarios. The foregoing disclosure is merely illustrative of some embodiments of the application, and the application is not limited thereto, as modifications may be made by those skilled in the art without departing from the scope of the application.