CN119598825B - Multi-unmanned aerial vehicle collaborative countermeasure decision-making method and system - Google Patents

Multi-unmanned aerial vehicle collaborative countermeasure decision-making method and system

Info

Publication number
CN119598825B
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
red
countermeasure
decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410276466.8A
Other languages
Chinese (zh)
Other versions
CN119598825A (en
Inventor
池沛
安乐
赵江
王英勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202410276466.8A
Publication of CN119598825A
Application granted
Publication of CN119598825B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/60 Intended control result
    • G05D1/69 Coordinated control of the position or course of two or more vehicles
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/094 Adversarial learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to a multi-unmanned aerial vehicle collaborative countermeasure decision-making method and system. The method includes the following steps: constructing a multi-unmanned aerial vehicle collaborative air combat countermeasure decision-making environment by establishing a multi-unmanned aerial vehicle air combat countermeasure motion model and an air combat situation assessment model; establishing a distributed partially observable Markov decision process model of the multi-unmanned aerial vehicle collaborative countermeasure decision problem according to the action space, local observation and state of each unmanned aerial vehicle in the countermeasure decision environment; designing a multi-machine collaborative countermeasure reward function and a HASAC algorithm network space; and training and generating a multi-machine collaborative countermeasure policy model through interaction between the HASAC algorithm network space and the multi-unmanned aerial vehicle collaborative countermeasure decision-making environment. Addressing the multi-machine collaborative air combat countermeasure problem, the invention designs a specific global state for the unmanned aerial vehicles that make multi-machine collaborative countermeasure decisions. Compared with directly concatenating the observation vectors of the individual unmanned aerial vehicles, this reduces the dimension of the global state and improves training efficiency.

Description

Multi-unmanned aerial vehicle collaborative countermeasure decision-making method and system
Technical Field
The invention belongs to the technical field of multi-unmanned aerial vehicle decision making, and particularly relates to a multi-unmanned aerial vehicle collaborative countermeasure decision making method and system.
Background
The purpose of unmanned aerial vehicle countermeasure decision-making is to make real-time decisions and fly maneuvers in an air combat environment whose situation changes rapidly, so as to gain an advantageous air combat situation: occupying a position from which the enemy can be struck while denying the enemy the conditions for using its weapons. A single unmanned aerial vehicle has limited combat capability, whereas multi-machine cooperative countermeasure decision-making lets unmanned aerial vehicles cooperate with one another to complete the combat mission, and it is becoming a main scenario of future combat.
Unmanned aerial vehicle countermeasure decision algorithms can be divided into those based on expert knowledge, game theory, optimization theory, deep reinforcement learning, and so on. Algorithms based on game theory and optimization theory are widely used for one-to-one unmanned aerial vehicle countermeasure decision problems, but their solving time grows as the number of unmanned aerial vehicles increases, so the problem becomes hard to solve and real-time performance is poor. At present the multi-machine collaborative countermeasure decision problem is usually decomposed into target allocation and single-machine countermeasure decision-making, and the combat scene is usually a simple two-dimensional one, so applicability is poor.
Deep reinforcement learning algorithms can generate real-time, accurate decision methods for unmanned aerial vehicle countermeasure without expert experience, but they are currently used for single-machine countermeasure. Compared with single-machine countermeasure, multi-machine cooperative countermeasure is a multi-agent system: the policy learning of each agent depends not only on the agent itself but also on the policies of its teammates and opponents, which brings difficulties such as non-stationarity, high state and action dimensions, partial observability, and cooperative exploration. A centralized-training, distributed-execution framework can alleviate the non-stationarity problem. Multi-agent reinforcement learning algorithms based on parameter sharing can only be used for homogeneous agents; they reduce the computational burden of solving, but they hinder the learning of joint policies among agents, explore the solution space insufficiently, and may converge to a suboptimal Nash equilibrium. Heterogeneous multi-agent reinforcement learning algorithms that introduce maximum entropy can be applied to heterogeneous unmanned aerial vehicles, explore strongly, avoid becoming trapped in a suboptimal Nash equilibrium, and generate a real-time, accurate multi-machine collaborative countermeasure policy model.
Disclosure of Invention
To overcome the problems in the prior art, the invention provides a multi-unmanned aerial vehicle collaborative countermeasure decision-making method and system based on the HASAC (Heterogeneous-Agent Soft Actor-Critic) algorithm, which uses multi-agent reinforcement learning to realize real-time dynamic countermeasure decisions for heterogeneous multi-unmanned aerial vehicles.
A multi-unmanned aerial vehicle collaborative countermeasure decision-making method, the method comprising the steps of:
Step 1, constructing a multi-unmanned aerial vehicle collaborative air combat countermeasure decision-making environment by building a multi-unmanned aerial vehicle air combat countermeasure motion model and an air combat situation assessment model;
Step 2, establishing a distributed partially observable Markov decision process model of the multi-unmanned aerial vehicle collaborative countermeasure decision problem according to the action space, local observation and state of each unmanned aerial vehicle in the countermeasure decision environment;
Step 3, designing a multi-machine collaborative countermeasure reward function and a HASAC algorithm network space;
Step 4, training and generating a multi-machine collaborative countermeasure policy model through interaction between the HASAC algorithm network space and the multi-unmanned aerial vehicle collaborative countermeasure decision-making environment, wherein the multiple unmanned aerial vehicles comprise a plurality of unmanned aerial vehicles on the friendly side and a plurality of unmanned aerial vehicles on the enemy side, the friendly side being the red side and the enemy side being the blue side.
In the aspects and any possible implementation manner described above, there is further provided an implementation manner, where the step 1 specifically includes:
Step 11, analyzing the forces acting on each unmanned aerial vehicle and establishing a particle motion model;
Step 12, analyzing the interrelation among the multiple unmanned aerial vehicles and establishing a relative motion model of the multiple unmanned aerial vehicles;
Step 13, establishing an unmanned aerial vehicle air combat situation assessment model according to the angle, speed, height and distance of the unmanned aerial vehicles.
In the aspects and any possible implementation manner described above, there is further provided an implementation manner in which the particle motion model includes kinematic and dynamic equations [formula not reproduced in this text],
where the quantities in the formula are, respectively, the coordinates of the unmanned aerial vehicle along the axes of the inertial coordinate system, its speed, its track inclination angle, its track deflection angle, and the gravitational acceleration; the tangential overload of the unmanned aerial vehicle acts along its flight velocity direction; the normal overload acts perpendicular to the flight velocity vector; and the roll angle is the angle by which the unmanned aerial vehicle rolls about the velocity axis.
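The equations themselves are not reproduced in this text. A commonly used three-degree-of-freedom point-mass model that matches the quantities described above is sketched below. The symbols ($x, y, z$ for position, $v$ for speed, $\gamma$ for track inclination angle, $\psi$ for track deflection angle, $g$ for gravitational acceleration, $n_x$ and $n_z$ for tangential and normal overload, $\mu$ for roll angle) and the sign conventions are assumptions and may differ from the patent's own formula.

```latex
\begin{aligned}
\dot{x} &= v\cos\gamma\cos\psi, &\qquad \dot{v} &= g\,(n_x - \sin\gamma),\\
\dot{y} &= v\cos\gamma\sin\psi, &\qquad \dot{\gamma} &= \frac{g}{v}\,(n_z\cos\mu - \cos\gamma),\\
\dot{z} &= v\sin\gamma, &\qquad \dot{\psi} &= \frac{g\,n_z\sin\mu}{v\cos\gamma}.
\end{aligned}
```

Under these assumed conventions, $n_x = 0$, $n_z = 1$, $\mu = 0$ and $\gamma = 0$ give steady level flight, which is a quick sanity check on the signs.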
In the aspects and any possible implementation manner described above, there is further provided an implementation manner in which the relative motion model of the unmanned aerial vehicles is established as follows [formula not reproduced in this text],
where the symbols, which are not reproduced in this text, denote the following quantities:
the index set of the red unmanned aerial vehicles and the index set of the blue unmanned aerial vehicles;
the velocity vector of a given red unmanned aerial vehicle, the velocity vector of a given blue unmanned aerial vehicle, and the velocity vector of a teammate of the red unmanned aerial vehicle, identified by a teammate index;
the distances along each coordinate axis between the red unmanned aerial vehicle and the blue unmanned aerial vehicle, and between the red unmanned aerial vehicle and its teammate;
the relative position vector between the red unmanned aerial vehicle and the blue unmanned aerial vehicle, and the relative position vector between the red unmanned aerial vehicle and its teammate;
the off angle between the red and blue unmanned aerial vehicles, which is the angle between the velocity vector of the red unmanned aerial vehicle and their relative position vector, and the release angle between them, which is the angle between the velocity vector of the blue unmanned aerial vehicle and the same relative position vector;
the off angle between the red unmanned aerial vehicle and its teammate, which is the angle between the velocity vector of the red unmanned aerial vehicle and their relative position vector, and the release angle between them, which is the angle between the velocity vector of the teammate and the same relative position vector;
the magnitude of the relative distance from the red unmanned aerial vehicle to the blue unmanned aerial vehicle, and from the red unmanned aerial vehicle to its teammate;
the speed, track inclination angle and track deflection angle of the red unmanned aerial vehicle, of the blue unmanned aerial vehicle, and of the teammate, respectively;
the position in the inertial coordinate system of the red unmanned aerial vehicle, of the blue unmanned aerial vehicle, and of the teammate, respectively.
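The relative-motion quantities described above can be illustrated with a minimal Python sketch. The function name `relative_geometry`, the tuple representation of vectors, and the choice to point the relative position vector from the unmanned aerial vehicle itself to the other one are assumptions for illustration; the patent's own formula is not reproduced in this text.

```python
import math


def _norm(a):
    """Euclidean length of a 3-D vector given as a tuple."""
    return math.sqrt(sum(ai * ai for ai in a))


def _angle(u, w):
    """Angle in radians between two non-zero 3-D vectors."""
    cos_val = sum(ui * wi for ui, wi in zip(u, w)) / (_norm(u) * _norm(w))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_val)))


def relative_geometry(p_own, v_own, p_other, v_other):
    """Return (distance, off_angle, release_angle) for a pair of UAVs.

    Assumed convention: the relative position vector points from the own
    UAV to the other UAV. The off angle is measured from the own velocity
    vector, the release angle from the other UAV's velocity vector.
    """
    d = tuple(po - pw for po, pw in zip(p_other, p_own))
    return _norm(d), _angle(v_own, d), _angle(v_other, d)


if __name__ == "__main__":
    # Head-on geometry: the other UAV is 10 m ahead and flying toward us.
    print(relative_geometry((0, 0, 0), (1, 0, 0), (10, 0, 0), (-1, 0, 0)))
```

The same function serves for red-versus-blue pairs and for red-versus-teammate pairs, since the text defines both sets of angles in the same way.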
In the aspects and any possible implementation manner described above, there is further provided an implementation manner in which step 13 specifically includes converting the multi-machine cooperative countermeasure into target allocation and single-machine countermeasure, wherein the target allocation is based on situation assessment: when the red side and the blue side threaten each other, the target allocated to an unmanned aerial vehicle poses the least threat to that unmanned aerial vehicle, while the unmanned aerial vehicle poses the greater threat to the target. The expression of the situation assessment is as follows [formula not reproduced in this text],
where the situation evaluation value is that of a given red unmanned aerial vehicle relative to a given blue unmanned aerial vehicle; the four component terms are the situational advantages of the red unmanned aerial vehicle over the blue unmanned aerial vehicle in angle, height, speed and distance, respectively; and each term has a corresponding weight, the weights satisfying a constraint whose formula is likewise not reproduced in this text.
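One plausible reading of the situation assessment is a weighted sum of the four advantage terms. The sketch below assumes that form and further assumes that the weights sum to 1; neither the expression nor the weight constraint is reproduced in this text, and the function name `situation_value` and the dictionary keys are hypothetical.

```python
ADVANTAGE_KEYS = ("angle", "height", "speed", "distance")


def situation_value(advantages, weights, tol=1e-9):
    """Weighted sum of the four situational advantages.

    Assumptions (not confirmed by the source): the assessment is a linear
    combination, and the weights are required to sum to 1.
    """
    total_weight = sum(weights[k] for k in ADVANTAGE_KEYS)
    if abs(total_weight - 1.0) > tol:
        raise ValueError("weights are assumed to sum to 1")
    return sum(weights[k] * advantages[k] for k in ADVANTAGE_KEYS)


if __name__ == "__main__":
    w = {"angle": 0.4, "height": 0.2, "speed": 0.2, "distance": 0.2}
    adv = {"angle": 1.0, "height": 0.5, "speed": 0.5, "distance": 0.0}
    print(situation_value(adv, w))
```

Evaluating this value for every red and blue pair yields a matrix on which a target allocation rule of the kind described above could operate.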
In the aspects and any possible implementation manner described above, there is further provided an implementation manner in which step 2 specifically includes: the multi-unmanned aerial vehicle collaborative countermeasure decision model is built on a distributed partially observable Markov decision process, which is described by a tuple [symbols not reproduced in this text] whose elements are, in order: the set of red unmanned aerial vehicles; the state space of the red unmanned aerial vehicles; the joint action space of all red unmanned aerial vehicles, composed of the action space of each red unmanned aerial vehicle; the local observation of each red unmanned aerial vehicle under the global state; the joint reward function of all red unmanned aerial vehicles cooperating against the blue side; the state transition function; and the discount factor. The local observation of a red unmanned aerial vehicle comprises its own information, the information of the blue unmanned aerial vehicles, and the information of its teammates.
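For orientation, one conventional way to write such a tuple is shown below. The symbols are assumptions chosen to follow the order of the elements listed above; the patent's own notation is not reproduced in this text.

```latex
\langle \mathcal{N},\ \mathcal{S},\ \mathcal{A},\ \mathcal{O},\ R,\ P,\ \gamma \rangle,
\qquad
\mathcal{N} = \{1, \dots, n\},
\qquad
\mathcal{A} = \mathcal{A}^{1} \times \cdots \times \mathcal{A}^{n},
\qquad
\gamma \in [0, 1)
```

Here $\mathcal{A}^{i}$ stands for the action space of the $i$-th red unmanned aerial vehicle, $\mathcal{O}$ collects the local observations, $R$ is the joint reward function, and $P$ is the state transition function.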
In the aspects and any possible implementation manner described above, there is further provided an implementation manner in which the multi-machine collaborative countermeasure reward function is designed as the sum of the rewards obtained by each red unmanned aerial vehicle against the blue side [formula not reproduced in this text], where each term is the reward obtained by one red unmanned aerial vehicle against all blue unmanned aerial vehicles.
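Written out, the sum described above takes the following form, where $r$ for the joint reward, $r_i$ for the reward of the $i$-th red unmanned aerial vehicle, and $n$ for the number of red unmanned aerial vehicles are assumed symbols:

```latex
r = \sum_{i=1}^{n} r_i
```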
Establishing the HASAC algorithm network space specifically comprises adopting a centralized-training, distributed-execution framework that includes n policy networks, two value networks and two target value networks, wherein each red unmanned aerial vehicle corresponds to one policy network; the policy networks have the same structure, are independent of one another, and are used to approximate the unmanned aerial vehicle decision model; and the value networks are used to evaluate how well the policy networks act under given observations.
In the aspects and any possible implementation manner described above, there is further provided an implementation manner in which step 4 specifically includes: taking the observation of each red unmanned aerial vehicle at the current moment as the input of that unmanned aerial vehicle's policy network, which outputs the action of the unmanned aerial vehicle under the current observation; the interaction environment then returns the observation of each red unmanned aerial vehicle, the global state and the joint reward at the next moment; and the observations and global state at the current moment, the joint action of the red unmanned aerial vehicles, the observations and global state at the next moment, and the joint reward are stored in an experience pool connected to the HASAC algorithm network space.
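The experience pool described above can be sketched as a fixed-capacity buffer of transitions. The class name `ExperiencePool`, the field names, the capacity-based eviction of the oldest entries, and uniform random sampling are assumptions for illustration; the text only states which quantities are stored.

```python
import random
from collections import deque, namedtuple

# One stored interaction step; the field names are hypothetical.
Transition = namedtuple(
    "Transition",
    ["obs", "state", "joint_action", "next_obs", "next_state", "joint_reward"],
)


class ExperiencePool:
    """Fixed-capacity pool; the oldest transitions are dropped when full."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, obs, state, joint_action, next_obs, next_state, joint_reward):
        self._buffer.append(
            Transition(obs, state, joint_action, next_obs, next_state, joint_reward)
        )

    def sample(self, batch_size):
        """Draw a batch uniformly at random without replacement."""
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


if __name__ == "__main__":
    pool = ExperiencePool(capacity=1000, seed=0)
    # Two red UAVs: per-UAV observations and actions are stored as lists.
    pool.store([[0.1], [0.2]], [0.0], [[1.0], [0.5]], [[0.2], [0.3]], [0.1], 0.7)
    print(len(pool), pool.sample(1)[0].joint_reward)
```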
The invention also provides a multi-unmanned aerial vehicle cooperative countermeasure decision-making system for implementing the method, the system comprising:
a construction module, used for constructing a multi-unmanned aerial vehicle collaborative air combat countermeasure decision-making environment by building a multi-unmanned aerial vehicle air combat countermeasure motion model and an air combat situation assessment model;
a first establishing module, used for establishing a distributed partially observable Markov decision process model of the multi-unmanned aerial vehicle collaborative countermeasure decision problem according to the action space, local observation and state of each unmanned aerial vehicle in the countermeasure decision environment;
a second establishing module, used for designing a multi-machine collaborative countermeasure reward function and a HASAC algorithm network space;
a generation module, used for training and generating a multi-unmanned aerial vehicle collaborative countermeasure policy model through interaction between the HASAC algorithm network space and the multi-unmanned aerial vehicle collaborative countermeasure decision-making environment, wherein the multiple unmanned aerial vehicles comprise a plurality of unmanned aerial vehicles on the friendly side and a plurality of unmanned aerial vehicles on the enemy side, the friendly side being the red side and the enemy side being the blue side.
Compared with the prior art, the invention has the following beneficial effects:
Based on unmanned aerial vehicle motion modeling and air combat situation assessment, the invention provides a multi-machine collaborative countermeasure decision-making method based on heterogeneous multi-agent reinforcement learning. First, a three-degree-of-freedom unmanned aerial vehicle particle model and a situation assessment model are established. Second, a multi-machine collaborative countermeasure decision model is established on the basis of the distributed partially observable Markov decision process model, and the actions, states, observations, reward function and networks for multi-machine collaborative countermeasure are designed. Finally, the networks are trained with the HASAC algorithm as the heterogeneous multi-agent reinforcement learning algorithm to generate the multi-machine collaborative countermeasure decision model. The invention has the following beneficial effects:
(1) Addressing the multi-machine collaborative air combat countermeasure problem, the invention designs a specific global state for the multi-machine collaborative countermeasure decision-making agents; compared with directly concatenating the observation vectors of all agents, this reduces the dimension of the global state and improves training efficiency.
(2) The invention adopts the HASAC algorithm as the heterogeneous multi-agent reinforcement learning algorithm. Introducing maximum entropy increases the randomness of action exploration and avoids becoming trapped in a suboptimal Nash equilibrium; the policy networks of the agents are updated sequentially; and the trained policy networks together form a joint policy, so that real-time multi-machine collaborative countermeasure decisions can be realized.
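The role of the entropy term can be seen in a generic maximum-entropy objective for cooperative agents, sketched below. This form is an assumption based on common maximum-entropy reinforcement learning practice and is not given in the text; $\pi^{i}$ denotes the policy of the $i$-th agent, $o^{i}_{t}$ its observation, $\mathcal{H}$ the entropy, and $\alpha > 0$ a temperature coefficient.

```latex
J(\boldsymbol{\pi}) =
\mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}
\left( r(s_t, \boldsymbol{a}_t)
+ \alpha \sum_{i=1}^{n} \mathcal{H}\!\left(\pi^{i}(\cdot \mid o^{i}_{t})\right)
\right)\right]
```

A larger $\alpha$ rewards more random action selection, which corresponds to the increased exploration described in effect (2).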
Drawings
FIG. 1 is a schematic diagram of a multi-machine collaborative countermeasure decision framework according to the present invention;
FIG. 2 is a schematic diagram of the relative motion relationship of multiple unmanned aerial vehicles according to the present invention;
FIG. 3 is a diagram of a multi-machine collaborative countermeasure decision heterogeneous multi-agent reinforcement learning framework of the present invention;
FIG. 4 is a schematic diagram of a policy network architecture according to the present invention;
FIG. 5 is a schematic diagram of a value network architecture of the present invention;
FIG. 6 is a schematic diagram of a multi-machine collaborative countermeasure simulation graph (average situation) according to the present invention;
FIG. 7 is a schematic diagram of a multi-machine collaborative countermeasure simulation graph (dominant situation) according to the present invention;
FIG. 8 is a schematic diagram of a multi-machine collaborative countermeasure simulation graph (inferior situation) of the present invention.
Detailed Description
For a better understanding of the invention, the disclosure includes, but is not limited to, the following detailed description; similar techniques and methods should be considered to fall within the scope of protection. To make the technical problems to be solved, the technical solutions and the advantages of the invention more apparent, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
It should be understood that the described embodiments of the invention are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The invention provides a multi-unmanned aerial vehicle collaborative countermeasure decision-making method, which comprises the following steps:
Step 1, constructing a multi-unmanned aerial vehicle collaborative air combat countermeasure decision-making environment by building a multi-unmanned aerial vehicle air combat countermeasure motion model and an air combat situation assessment model;
Step 2, establishing a distributed partially observable Markov decision process model of the multi-unmanned aerial vehicle collaborative countermeasure decision problem according to the action space and the local observation and the state of each unmanned aerial vehicle in the countermeasure decision environment;
Step 3, designing a multi-machine collaborative countermeasure reward function and a HASAC algorithm network space;
Step 4, training and generating a multi-machine collaborative countermeasure policy model through interaction between the HASAC algorithm network space and the multi-unmanned aerial vehicle collaborative countermeasure decision-making environment, wherein the multiple unmanned aerial vehicles comprise a plurality of unmanned aerial vehicles on the friendly side and a plurality of unmanned aerial vehicles on the enemy side, the friendly side being the red side and the enemy side being the blue side.
Preferably, the step 1 specifically includes:
Step 11, analyzing the forces acting on each unmanned aerial vehicle and establishing a particle motion model;
Step 12, analyzing the interrelation among the multiple unmanned aerial vehicles and establishing a relative motion model of the multiple unmanned aerial vehicles;
Step 13, establishing an unmanned aerial vehicle air combat situation assessment model according to the angle, speed, height and distance of the unmanned aerial vehicles.
Preferably, the particle motion model comprises kinematic and dynamic equations [formula not reproduced in this text],
where the quantities in the formula are, respectively, the coordinates of the unmanned aerial vehicle along the axes of the inertial coordinate system, its speed, its track inclination angle, its track deflection angle, and the gravitational acceleration; the tangential overload of the unmanned aerial vehicle acts along its flight velocity direction; the normal overload acts perpendicular to the flight velocity vector; and the roll angle is the angle by which the unmanned aerial vehicle rolls about the velocity axis.
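To show how such a particle model drives the simulation environment, the sketch below advances an assumed three-degree-of-freedom point-mass model by one explicit Euler step. The specific equations, the field and function names, the value of `G`, and the Euler integration scheme are all assumptions; the patent's own equations are not reproduced in this text.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


@dataclass
class UavState:
    x: float
    y: float
    z: float
    v: float      # speed, m/s (must be non-zero)
    gamma: float  # track inclination angle, rad
    psi: float    # track deflection angle, rad


def step(s: UavState, n_x: float, n_z: float, mu: float, dt: float) -> UavState:
    """One explicit Euler step of the assumed point-mass model.

    n_x: tangential overload, n_z: normal overload, mu: roll angle (rad).
    """
    dx = s.v * math.cos(s.gamma) * math.cos(s.psi)
    dy = s.v * math.cos(s.gamma) * math.sin(s.psi)
    dz = s.v * math.sin(s.gamma)
    dv = G * (n_x - math.sin(s.gamma))
    dgamma = G / s.v * (n_z * math.cos(mu) - math.cos(s.gamma))
    dpsi = G * n_z * math.sin(mu) / (s.v * math.cos(s.gamma))
    return UavState(
        s.x + dx * dt,
        s.y + dy * dt,
        s.z + dz * dt,
        s.v + dv * dt,
        s.gamma + dgamma * dt,
        s.psi + dpsi * dt,
    )


if __name__ == "__main__":
    # Level flight: no tangential overload, 1 g normal overload, wings level.
    print(step(UavState(0.0, 0.0, 1000.0, 100.0, 0.0, 0.0), 0.0, 1.0, 0.0, 0.1))
```

In this formulation the three control inputs (tangential overload, normal overload, roll angle) are natural candidates for the action of each unmanned aerial vehicle, although the text does not state the action space explicitly here.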
Preferably, the relative motion model of the unmanned aerial vehicles is established as follows [formula not reproduced in this text],
where the symbols, which are not reproduced in this text, denote the following quantities:
the index set of the red unmanned aerial vehicles and the index set of the blue unmanned aerial vehicles;
the velocity vector of a given red unmanned aerial vehicle, the velocity vector of a given blue unmanned aerial vehicle, and the velocity vector of a teammate of the red unmanned aerial vehicle, identified by a teammate index;
the distances along each coordinate axis between the red unmanned aerial vehicle and the blue unmanned aerial vehicle, and between the red unmanned aerial vehicle and its teammate;
the relative position vector between the red unmanned aerial vehicle and the blue unmanned aerial vehicle, and the relative position vector between the red unmanned aerial vehicle and its teammate;
the off angle between the red and blue unmanned aerial vehicles, which is the angle between the velocity vector of the red unmanned aerial vehicle and their relative position vector, and the release angle between them, which is the angle between the velocity vector of the blue unmanned aerial vehicle and the same relative position vector;
the off angle between the red unmanned aerial vehicle and its teammate, which is the angle between the velocity vector of the red unmanned aerial vehicle and their relative position vector, and the release angle between them, which is the angle between the velocity vector of the teammate and the same relative position vector;
the magnitude of the relative distance from the red unmanned aerial vehicle to the blue unmanned aerial vehicle, and from the red unmanned aerial vehicle to its teammate;
the speed, track inclination angle and track deflection angle of the red unmanned aerial vehicle, of the blue unmanned aerial vehicle, and of the teammate, respectively;
the position in the inertial coordinate system of the red unmanned aerial vehicle, of the blue unmanned aerial vehicle, and of the teammate, respectively.
Preferably, step 13 specifically comprises converting the multi-machine cooperative countermeasure into target allocation and single-machine countermeasure, wherein the target allocation is based on situation assessment: when the red side and the blue side threaten each other, the target allocated to an unmanned aerial vehicle poses the least threat to that unmanned aerial vehicle, while the unmanned aerial vehicle poses the greater threat to the target. The expression of the situation assessment is as follows [formula not reproduced in this text],
where the situation evaluation value is that of a given red unmanned aerial vehicle relative to a given blue unmanned aerial vehicle; the four component terms are the situational advantages of the red unmanned aerial vehicle over the blue unmanned aerial vehicle in angle, height, speed and distance, respectively; and each term has a corresponding weight, the weights satisfying a constraint whose formula is likewise not reproduced in this text.
Preferably, step 2 specifically comprises: the multi-unmanned aerial vehicle collaborative countermeasure decision model is built on a distributed partially observable Markov decision process, which is described by a tuple [symbols not reproduced in this text] whose elements are, in order: the set of red unmanned aerial vehicles; the state space of the red unmanned aerial vehicles; the joint action space of all red unmanned aerial vehicles, composed of the action space of each red unmanned aerial vehicle; the local observation of each red unmanned aerial vehicle under the global state; the joint reward function of all red unmanned aerial vehicles cooperating against the blue side; the state transition function; and the discount factor. The local observation of a red unmanned aerial vehicle comprises its own information, the information of the blue unmanned aerial vehicles, and the information of its teammates.
Preferably, the multi-machine collaborative countermeasure reward function is designed as the sum of the rewards obtained by each red unmanned aerial vehicle against the blue side, where each summand is the reward obtained by the i-th red unmanned aerial vehicle against all blue unmanned aerial vehicles.
Preferably, establishing the HASAC algorithm network space specifically comprises adopting a centralized-training, decentralized-execution framework that includes n policy networks, two value networks and two target value networks. Each red unmanned aerial vehicle corresponds to one policy network; the policy networks have the same structure but are mutually independent, and each approximates the decision model of its unmanned aerial vehicle. The value networks evaluate the performance of the policy networks under given observations.
Preferably, step 4 specifically comprises: taking the observation of each red unmanned aerial vehicle at the current moment as the input of its policy network, which outputs the action of that unmanned aerial vehicle under the current observation; the interactive environment then returns the observation of each red unmanned aerial vehicle and the global state at the next moment, together with the joint reward; finally, the current observations and global state, the joint action of the red unmanned aerial vehicles, the next-moment observations and global state, and the joint reward are stored in an experience pool connected to the HASAC algorithm network space.
As shown in fig. 1 to fig. 5, the invention takes the red and blue unmanned aerial vehicles as the two opposing sides, with our side as red and the opponent as blue. From perception information, both sides can acquire the position, speed and attitude information of themselves, their opponents and their teammates. Only the decision-making link of the confrontation is considered, which comprises situation assessment and maneuver decision; the overall air combat decision framework is shown in fig. 1. Air combat decision-making is a key link of air combat countermeasure, and the invention is implemented as follows:
S1, multi-machine collaborative air combat countermeasure decision-making environment
With our side as the red unmanned aerial vehicles and the opponent as the blue unmanned aerial vehicles, a single-unmanned aerial vehicle point-mass motion model and a multi-unmanned aerial vehicle relative motion model are established in three-dimensional space. An air combat situation assessment model is then established from the air combat situation elements.
S1-1 building a multi-unmanned aerial vehicle air combat countermeasure motion model
A. Single unmanned aerial vehicle motion model
By simplifying the force analysis of the unmanned aerial vehicle, a three-degree-of-freedom point-mass model is established, whose kinematic and dynamic equations are as follows:
(1)
(2)
In the formulas, the variables are, respectively, the coordinates of the unmanned aerial vehicle along the three axes of the inertial coordinate system, its speed, its track inclination angle, its track deflection angle, and the gravitational acceleration. The tangential overload acts along the flight velocity direction of the unmanned aerial vehicle, the normal overload acts perpendicular to the flight velocity vector, and the roll angle is measured about the velocity axis.
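The equations (1) and (2) themselves are not legible in this text. As an illustration only, the following sketch uses the textbook three-degree-of-freedom point-mass model, which matches the variables listed above; the symbol names (`n_x`, `n_z`, `mu`), the explicit Euler integration and the step size `dt` are assumptions, not taken from the patent.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def step_point_mass(state, action, dt=0.1):
    """Advance a 3-DOF point-mass model by one explicit Euler step.

    state  = (x, y, z, v, gamma, psi): position, speed, track inclination
             angle and track deflection angle (angles in radians).
    action = (n_x, n_z, mu): tangential overload, normal overload and roll
             angle about the velocity axis (assumed symbol names).
    """
    x, y, z, v, gamma, psi = state
    n_x, n_z, mu = action

    # Kinematics: velocity resolved along the inertial axes.
    dx = v * math.cos(gamma) * math.cos(psi)
    dy = v * math.cos(gamma) * math.sin(psi)
    dz = v * math.sin(gamma)

    # Dynamics: overloads drive speed and the two track angles.
    # Assumes v > 0 and |gamma| < pi/2 so the divisions are well defined.
    dv = G * (n_x - math.sin(gamma))
    dgamma = G / v * (n_z * math.cos(mu) - math.cos(gamma))
    dpsi = G * n_z * math.sin(mu) / (v * math.cos(gamma))

    return (x + dx * dt, y + dy * dt, z + dz * dt,
            v + dv * dt, gamma + dgamma * dt, psi + dpsi * dt)
```

With zero tangential overload, unit normal overload and zero roll, the model holds straight and level flight, which is a quick sanity check on the signs.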
B. Multi-unmanned aerial vehicle relative motion model
In the multi-machine countermeasure, the red side and the blue side each have several unmanned aerial vehicles, and each side has its own numbering set. The relative motion relationship between the i-th red unmanned aerial vehicle, the blue unmanned aerial vehicles and the red teammate unmanned aerial vehicles is shown in fig. 2. The quantities involved are the velocity vector of the i-th red unmanned aerial vehicle, the velocity vector of the j-th blue unmanned aerial vehicle, and the velocity vector of teammate unmanned aerial vehicle k of the i-th red unmanned aerial vehicle, where the subscript k is the number of the teammate unmanned aerial vehicle. The axis-wise distances between the i-th red unmanned aerial vehicle and the j-th blue unmanned aerial vehicle, and between the i-th red unmanned aerial vehicle and teammate unmanned aerial vehicle k, form the corresponding relative position vectors. The deviation angle of the i-th red unmanned aerial vehicle with respect to the j-th blue unmanned aerial vehicle is the angle between the red velocity vector and the relative position vector between the two; the disengagement angle is the angle between the blue velocity vector and the same relative position vector. The deviation angle and disengagement angle of the i-th red unmanned aerial vehicle with respect to teammate unmanned aerial vehicle k are defined in the same way, using the red velocity vector and the teammate velocity vector, respectively, together with the relative position vector between the two. Finally, the relative distance between the i-th red unmanned aerial vehicle and the j-th blue unmanned aerial vehicle and the relative distance between the i-th red unmanned aerial vehicle and teammate unmanned aerial vehicle k are defined. These quantities are calculated by the following formulas:
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17)
(18)
(19)
In the formulas, the red-side symbols denote the speed, track inclination angle and track deflection angle of the i-th red unmanned aerial vehicle; the blue-side symbols denote the speed, track inclination angle and track deflection angle of the j-th blue unmanned aerial vehicle; and the teammate symbols denote the speed, track inclination angle and track deflection angle of teammate unmanned aerial vehicle k of the i-th red unmanned aerial vehicle. Likewise, the position symbols denote the positions, in the inertial coordinate system, of the i-th red unmanned aerial vehicle, of the j-th blue unmanned aerial vehicle, and of teammate unmanned aerial vehicle k of the i-th red unmanned aerial vehicle.
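The relative-motion formulas (3) to (19) are not legible here, so the following is only a sketch of how the described quantities can be computed from the definitions in the text: the relative position vector, the relative distance, the deviation angle (own velocity versus relative position) and the disengagement angle (other velocity versus relative position). The function names and the choice of pointing the relative position vector from the own unmanned aerial vehicle to the other one are assumptions.

```python
import math


def velocity_vector(v, gamma, psi):
    """Inertial velocity components from speed and the two track angles."""
    return (v * math.cos(gamma) * math.cos(psi),
            v * math.cos(gamma) * math.sin(psi),
            v * math.sin(gamma))


def angle_between(a, b):
    """Angle in [0, pi] between two non-zero 3-D vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def relative_geometry(own_pos, own_vel, other_pos, other_vel):
    """Return (relative position, distance, deviation angle, disengagement angle).

    The relative position vector points from own to other (assumed direction).
    The two positions must differ, otherwise the angles are undefined.
    """
    rel = tuple(o - p for p, o in zip(own_pos, other_pos))
    distance = math.sqrt(sum(c * c for c in rel))
    deviation = angle_between(own_vel, rel)        # own velocity vs. relative position
    disengagement = angle_between(other_vel, rel)  # other velocity vs. relative position
    return rel, distance, deviation, disengagement
```

The same function covers both the red-versus-blue case and the red-versus-teammate case, since only the second pair of arguments changes.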
S1-2 unmanned aerial vehicle air combat situation assessment model
In the multi-machine cooperative countermeasure process, multi-target allocation is realized on the basis of situation assessment, which converts the multi-machine cooperative countermeasure problem into a target allocation problem plus single-machine countermeasure problems. The target allocation considers both the threat of the red side to the blue side and the threat of the blue side to the red side, so that the allocated target poses the smallest threat to the unmanned aerial vehicle itself while the unmanned aerial vehicle poses the greatest threat to that target. The situation assessment considers the angle, speed, height and distance elements of the red and blue unmanned aerial vehicles and is calculated by the following formula:
(20)
Wherein the left-hand side is the situation evaluation value of the i-th red unmanned aerial vehicle relative to the j-th blue unmanned aerial vehicle; the four advantage terms respectively represent the angle, height, speed and distance situation advantages of the i-th red unmanned aerial vehicle relative to the j-th blue unmanned aerial vehicle; and the coefficients are the corresponding weights, which are subject to the constraint stated with the formula.
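Formula (20) is not legible in this text. Read together with the definitions above, it appears to be a weighted combination of the four advantage terms; the sketch below illustrates that reading. The weighted-sum form and the requirement that the weights sum to 1 are assumptions, and the advantage values are taken as already computed inputs.

```python
def situation_value(advantages, weights, tol=1e-9):
    """Weighted situation evaluation value of one UAV relative to another.

    advantages = (angle, height, speed, distance) advantage values.
    weights    = the four corresponding weights; assumed here to sum to 1.
    """
    if len(advantages) != 4 or len(weights) != 4:
        raise ValueError("expected four advantage terms and four weights")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights are assumed to sum to 1")
    return sum(w * a for w, a in zip(weights, advantages))
```

For example, `situation_value((1.0, 0.5, 0.5, 0.0), (0.4, 0.2, 0.2, 0.2))` weights the angle advantage most heavily.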
The angle dominance function is designed as follows:
(21)
the height dominance function is designed as follows:
(22)
In the formula, the height term is the z-axis coordinate difference of the i-th red unmanned aerial vehicle relative to the j-th blue unmanned aerial vehicle, and the reference term is the optimal height advantage difference between the red and blue unmanned aerial vehicles.
The speed dominance function is designed as follows:
(23)
In the formula, the speed term is the speed difference of the i-th red unmanned aerial vehicle relative to the j-th blue unmanned aerial vehicle, and the bounds are the maximum speed and minimum speed of the unmanned aerial vehicle.
The distance dominance function is designed as follows:
(24)
In the formula, the distance term is the relative distance between the i-th red unmanned aerial vehicle and the j-th blue unmanned aerial vehicle, and the bounds are the maximum and minimum attack distances of the airborne weapon of the unmanned aerial vehicle.
Replacing the blue index with the teammate index in formulas (20) to (24) yields the corresponding situation assessment of the i-th red unmanned aerial vehicle relative to its teammate unmanned aerial vehicle k.
S2, establishing a decentralized partially observable Markov decision process model of the multi-machine collaborative countermeasure decision problem. Step S1 corresponds to the unmanned aerial vehicle model and situation assessment parts of the environment in fig. 3; step S2 models the whole multi-machine countermeasure decision problem as a decentralized partially observable Markov decision process, which is described by a tuple.
In the multi-machine collaborative countermeasure decision problem, the elements of the tuple are: the set of n red unmanned aerial vehicles; the state space of the red unmanned aerial vehicles; the joint action space of all red unmanned aerial vehicles, formed from the action space of each red unmanned aerial vehicle; the local observation of the i-th red unmanned aerial vehicle in the global state; the state transition function; and the discount factor. At each time step, every red unmanned aerial vehicle executes an action drawn from its own policy under its current observation, and the red side then obtains the joint reward, the state at the next moment and the observations at the next moment. The joint reward function is designed in S3 and is referred to simply as R. The joint goal is to learn a red-side joint policy that maximizes the expected total return; a maximum-entropy term is introduced into the joint objective, which gives the following formula:
(25)
Wherein the temperature constant trades off reward maximization against entropy maximization, the expectation operator denotes the mathematical expectation, T is the time at which the round ends, and the entropy term is the entropy of the policy.
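Formula (25) is not legible here. The sketch below shows one common form of a maximum-entropy objective evaluated on a single recorded round, in which each step contributes its reward plus the temperature times the policy entropy, discounted over time. Placing the entropy inside the discounted sum, the default temperature `alpha=0.2`, and the function name are assumptions; the expectation in the objective would be approximated by averaging this quantity over many rounds.

```python
def soft_return(rewards, entropies, gamma=0.99, alpha=0.2):
    """Discounted entropy-regularized return of one round.

    rewards[t]   = joint reward R at step t.
    entropies[t] = entropy of the joint policy at step t.
    gamma        = discount factor; alpha = temperature (assumed value).
    """
    if len(rewards) != len(entropies):
        raise ValueError("rewards and entropies must have equal length")
    return sum((gamma ** t) * (r + alpha * h)
               for t, (r, h) in enumerate(zip(rewards, entropies)))
```

A larger `alpha` rewards more exploratory (higher-entropy) policies, while `alpha = 0` recovers the ordinary discounted return.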
Each red unmanned aerial vehicle has its own local observation. The observation of the i-th red unmanned aerial vehicle comprises its own information, the blue unmanned aerial vehicle information and the teammate information. Its own information comprises its speed, track inclination angle and track deflection angle. The blue unmanned aerial vehicle information comprises the relative angle and distance elements of the i-th red unmanned aerial vehicle with respect to each blue unmanned aerial vehicle, together with the speed, track inclination angle and track deflection angle of each blue unmanned aerial vehicle. The teammate information comprises the relative angle and distance elements of the i-th red unmanned aerial vehicle with respect to each teammate, together with the speed, track inclination angle and track deflection angle of that teammate, where the subscript k denotes the teammate number. The observation is expressed by the following formula:
Here the symbols denote, respectively: the speed, track inclination angle and track deflection angle of the i-th red unmanned aerial vehicle; the speed, track inclination angle and track deflection angle of the j-th blue unmanned aerial vehicle; the speed, track inclination angle and track deflection angle of teammate unmanned aerial vehicle k of the i-th red unmanned aerial vehicle; the deviation angle, disengagement angle and relative position deviation between the i-th red unmanned aerial vehicle and the j-th blue unmanned aerial vehicle; and the deviation angle, disengagement angle and relative position deviation between the i-th red unmanned aerial vehicle and red teammate unmanned aerial vehicle k.
The global state is formed by combining the observation information of all red unmanned aerial vehicles and removing the redundant information. Compared with directly concatenating the observations, this reduces the state dimension and helps training converge faster.
Wherein the reduced teammate term denotes the observation information of red teammate unmanned aerial vehicle k with the elements it shares with the observation of the i-th red unmanned aerial vehicle removed.
The action space is designed as a continuous action space, and the actions of the red unmanned aerial vehicles together form the joint action, where the action of each unmanned aerial vehicle consists of the tangential overload, the normal overload and the roll angle about the velocity axis.
S3, designing a multi-machine collaborative countermeasure rewarding function
The multi-machine collaborative air combat joint reward function in step S2 is the sum of the rewards obtained by each red unmanned aerial vehicle against the blue side, where each summand is the reward of the i-th red unmanned aerial vehicle against all blue unmanned aerial vehicles and is constructed as follows.
Wherein the pairwise term is the reward of the i-th red unmanned aerial vehicle relative to the j-th blue unmanned aerial vehicle, which comprises a short-term reward and a long-term reward.
The short-term reward is a dense reward designed on the basis of situation assessment, and its expression is:
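The reward formulas are not legible in this text, so the following only sketches the aggregation described in words: each red-versus-blue pair has a short-term and a long-term reward, each red unmanned aerial vehicle collects its rewards against the blue unmanned aerial vehicles, and the joint reward is the sum over the red side. Summing over all blue unmanned aerial vehicles (rather than, say, only an allocated target) and the nested-list layout are assumptions.

```python
def joint_reward(short_term, long_term):
    """Aggregate pairwise rewards into per-UAV and joint rewards.

    short_term[i][j] = dense reward of red UAV i relative to blue UAV j.
    long_term[i][j]  = sparse reward of red UAV i relative to blue UAV j.
    Returns (joint reward R, list of per-red-UAV rewards).
    """
    per_red = []
    for s_row, l_row in zip(short_term, long_term):
        if len(s_row) != len(l_row):
            raise ValueError("short-term and long-term rows must match")
        # Reward of one red UAV: short-term plus long-term, over all blue UAVs.
        per_red.append(sum(s + l for s, l in zip(s_row, l_row)))
    return sum(per_red), per_red
```

Because every policy network is trained on the same joint reward R, each red unmanned aerial vehicle is credited for the team outcome rather than only for its own engagement.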
(26)
Wherein the four reward terms are the angle, height, speed and distance rewards of the i-th red unmanned aerial vehicle relative to the j-th blue unmanned aerial vehicle, and the coefficients are the weights corresponding to the angle, height, speed and distance rewards.
(27)
(28)
In the formula, the height term is the z-axis coordinate difference of the i-th red unmanned aerial vehicle relative to the j-th blue unmanned aerial vehicle.
(29)
(30)
The long-term reward is a sparse reward: it is the reward value given when the engagement between the i-th red unmanned aerial vehicle and the j-th blue unmanned aerial vehicle ends. The engagement between the two unmanned aerial vehicles ends when the i-th red unmanned aerial vehicle hits the j-th blue unmanned aerial vehicle or is hit by it, when the i-th red unmanned aerial vehicle leaves the simulation boundary, when the j-th blue unmanned aerial vehicle leaves the simulation boundary, or when the maximum number of simulation steps per round is exceeded. It is calculated by the following formula:
(31)
For the i-th red unmanned aerial vehicle to hit the j-th blue unmanned aerial vehicle, the following constraints must be satisfied:
(32)
In the formula, the distance term is the relative distance between the i-th red unmanned aerial vehicle and the j-th blue unmanned aerial vehicle, and the bounds are the maximum and minimum attack distances of the airborne weapon of the unmanned aerial vehicle.
An unmanned aerial vehicle inside the simulation boundary must satisfy the following formula:
(33)
The simulation step count of each round used in equation (31) must satisfy the following formula:
(34)
In the formula, the bound is the maximum number of simulation steps per round.
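Formulas (31) to (34) are not legible here. The sketch below illustrates the engagement-ending conditions described in words: a hit, leaving the simulation boundary, or exceeding the maximum step count. The default values come from the embodiment described later (weapon range 150 m to 900 m, 500 steps per round); the box dimensions are an assumed reading of the "20km x 10km" boundary, and the angle constraints of the hit condition are not recoverable, so they are passed in as precomputed booleans. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndConfig:
    x_max: float = 20_000.0  # assumed reading of the stated boundary, metres
    y_max: float = 20_000.0  # assumed
    z_max: float = 10_000.0  # assumed
    d_min: float = 150.0     # minimum weapon attack distance, metres
    d_max: float = 900.0     # maximum weapon attack distance, metres
    max_steps: int = 500     # maximum simulation steps per round


def in_boundary(pos, cfg):
    x, y, z = pos
    return 0.0 <= x <= cfg.x_max and 0.0 <= y <= cfg.y_max and 0.0 <= z <= cfg.z_max


def can_hit(distance, angles_ok, cfg):
    # angles_ok stands in for the angle constraints that are not legible here.
    return angles_ok and cfg.d_min <= distance <= cfg.d_max


def engagement_end(red_pos, blue_pos, distance, red_angles_ok,
                   blue_angles_ok, step, cfg=EndConfig()):
    """Return the reason the red-vs-blue engagement ends, or None to continue."""
    if can_hit(distance, red_angles_ok, cfg):
        return "red_hits_blue"
    if can_hit(distance, blue_angles_ok, cfg):
        return "blue_hits_red"
    if not in_boundary(red_pos, cfg):
        return "red_out_of_bounds"
    if not in_boundary(blue_pos, cfg):
        return "blue_out_of_bounds"
    if step > cfg.max_steps:
        return "max_steps"
    return None
```

The returned reason string can then be mapped to the sparse reward value for that outcome.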
S4 design HASAC algorithm network space
The HASAC algorithm adopts a centralized-training, decentralized-execution framework and consists of n policy networks, two value networks and two target value networks, where n is the number of red unmanned aerial vehicles and each policy network corresponds to one red unmanned aerial vehicle. Each policy network approximates the decision model of its unmanned aerial vehicle and generates the decision action under that unmanned aerial vehicle's local observation. The policy networks have the same structure, but they learn independently and do not share network parameters. For example, the policy network of the i-th red unmanned aerial vehicle takes the observation of that unmanned aerial vehicle at the current time as input and outputs the distribution of its decision action under the current observation. The value networks approximate the Q value of all red unmanned aerial vehicles executing a joint action in a given global state; that is, the input is the state and the joint action, and the output is Q. The two value networks have the same structure and are trained independently, and the smaller of their Q values is used for the policy network update to mitigate Q-value overestimation. Overestimation is an inherent characteristic of reinforcement learning algorithms: when a value network is used to estimate Q, estimation errors and other factors during training cause the learned Q function to overestimate the true Q value. The target value networks stabilize the training process; they have the same structure as the value networks, and their parameters are updated as a weighted average based on the value network parameters.
Each policy network takes an observation as input and outputs the decision action under that observation. The policy networks interact with the environment, and the collected training data are stored in the experience pool: the action commands of the red unmanned aerial vehicles are applied to the red unmanned aerial vehicle models, the whole environment then updates its state, and the environment returns o', s' and R, which are stored in the experience pool together with o, s and a. During training, the loss function requires the estimated value Q(s, a) of the current state and the estimated value Q(s', a') of the next state. Here s' is sampled from the experience pool, whereas a' is generated by feeding o' into the policy network. The details are described in step S5.
S5 training based on HASAC and generating multi-machine collaborative countermeasure decision model
When countermeasure training is performed with the HASAC algorithm, a centralized-training, decentralized-execution framework is adopted. After the networks, the experience pool and other parameters are initialized, data are collected through interaction between the policy networks and the environment, and the network parameters are updated. As shown in fig. 3, the specific training process is as follows:
1) The observations at the current time are obtained from the interaction environment. The policy network of each red unmanned aerial vehicle takes that unmanned aerial vehicle's observation as input and outputs its action, and the actions of all red unmanned aerial vehicles form the joint action;
2) The joint action is input to the interaction environment, which returns the observations and global state at the next time together with the joint reward for executing the action. Specifically, the actions are applied to the n red unmanned aerial vehicle models to update the states of the red unmanned aerial vehicles. The blue-side decision model, based on target allocation, converts the many-to-many countermeasure decision problem into the allocation of a red target to each blue unmanned aerial vehicle plus one-to-one countermeasure decision problems: the target allocation model outputs, on the basis of situation assessment, the red target attacked by each blue unmanned aerial vehicle, and the minimax algorithm outputs the decision action of each blue unmanned aerial vehicle against its red target, after which the states of the blue unmanned aerial vehicles are updated. The target allocation uses the situation assessment model to evaluate the mutual threat between red and blue unmanned aerial vehicles, so that the allocated red target poses a small threat to the blue unmanned aerial vehicle while the blue unmanned aerial vehicle poses a large threat to that red target. The observation and global state processor extracts the observations of the red unmanned aerial vehicles and the global state, and the joint reward is calculated from the reward function;
3) The observations of each red unmanned aerial vehicle and the global state at the current time, the joint action of all red unmanned aerial vehicles, the observations of each red unmanned aerial vehicle and the global state at the next time, and the joint reward are assembled into a tuple and stored in the experience pool;
4) Data are then randomly sampled from the experience pool as training data for the value networks, the policy networks and the target value networks, and the value networks, the target value networks and the policy network of each red unmanned aerial vehicle are updated with the Adam optimizer;
The value networks are updated as follows. The two value networks take the sampled global state and joint action as input and output the estimated Q values of executing that action in that state, and the smaller value is used as the predicted Q value in the loss function. The two target value networks take the sampled global state at the next time, together with the action that the policy networks output for the observations at the next time, and predict the estimated Q values at the next time. The target Q value is calculated from the reward, the entropy regularization term and the smaller of these two estimates. The mean square error between the target Q value and the predicted Q value is used as the loss function to update the value network parameters, and the parameters of the two value networks are updated independently;
The policy networks are updated as follows. A random permutation of the n red unmanned aerial vehicle numbers is generated, and the policy network of each red unmanned aerial vehicle is updated in that order; when the policy network parameters of the current red unmanned aerial vehicle are updated, the previously updated red unmanned aerial vehicle policy networks are taken into account. The loss function guiding the policy network update takes as input the smaller of the Q values estimated by the two value networks and the entropy regularization term.
The target value networks are updated as follows: the target value network parameters are optimized by soft update at intervals of a certain number of steps.
The above steps are repeated until all networks gradually converge, that is, until over a period of time the reward per round no longer increases significantly and the engagement duration per round no longer shortens significantly. Each policy network is then taken as the multi-unmanned aerial vehicle cooperative countermeasure decision model of the red side: its input is the local observation of the corresponding red unmanned aerial vehicle at the current time, and its output is the decision action executed by that unmanned aerial vehicle under the current observation.
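The three update rules described above can be illustrated with scalar stand-ins for the network outputs. This is a minimal sketch in the style of soft actor-critic, not the patent's exact loss: the form of the target, the default temperature `alpha=0.2`, the handling of a terminal step, and the function names are assumptions, while the discount factor 0.99 and soft update factor 0.005 are the values given in the embodiment.

```python
import random


def soft_q_target(reward, q1_next, q2_next, log_prob_next,
                  gamma=0.99, alpha=0.2, done=False):
    """Target Q value from the smaller target-network estimate.

    log_prob_next is the log-probability of the next joint action under the
    policy networks; -alpha * log_prob_next is the entropy regularization term.
    """
    if done:  # assumed: no bootstrapping after the round has ended
        return reward
    return reward + gamma * (min(q1_next, q2_next) - alpha * log_prob_next)


def soft_update(target_params, online_params, tau=0.005):
    """Move target parameters a fraction tau toward the value network parameters."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


def update_order(n_agents, rng=random):
    """Random permutation of red UAV indices for the sequential policy update."""
    return rng.sample(range(n_agents), n_agents)
```

In a full implementation the parameters would be tensors and the policy update for each agent in `update_order` would use the already-updated policies of the agents earlier in the permutation.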
In order to facilitate understanding of the above technical solutions of the present invention, the following specific examples are used to describe the above technical solutions of the present invention in detail.
Taking two-versus-two air combat as an example, the red side is trained with the HASAC algorithm, while the blue side uses a traditional decision algorithm comprising target allocation and the minimax algorithm. The target allocation is designed from the situation assessment values between the blue and red unmanned aerial vehicles: a threat matrix of the red side to the blue side and a threat matrix of the blue side to the red side are constructed, the threat to enemy targets is maximized while the risk to oneself is minimized, and the problem is solved by linear programming. The target allocation converts the multi-machine cooperative countermeasure into one-to-one countermeasures, and the blue-side one-to-one countermeasure decision algorithm is the minimax algorithm. The multi-machine collaborative countermeasure decision framework based on heterogeneous multi-agent reinforcement learning is shown in fig. 3.
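The allocation step can be illustrated on small threat matrices. The text states that the problem is solved by linear programming; the sketch below instead uses exhaustive search over one-to-one assignments, which is a plainly different and simpler technique that is adequate only for small teams such as two versus two. The objective (threat posed minus threat received, summed over pairs), equal team sizes, and all names are assumptions.

```python
from itertools import permutations


def allocate_targets(threat_to_enemy, threat_from_enemy):
    """Pick the one-to-one assignment with the best net threat.

    threat_to_enemy[i][j]   = threat that own UAV i poses to enemy UAV j.
    threat_from_enemy[i][j] = threat that enemy UAV j poses to own UAV i.
    Returns (assignment, score), where assignment[i] is the target of UAV i.
    """
    n = len(threat_to_enemy)
    best_assignment, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        # Maximize threat to the enemy while minimizing own risk.
        score = sum(threat_to_enemy[i][perm[i]] - threat_from_enemy[i][perm[i]]
                    for i in range(n))
        if score > best_score:
            best_assignment, best_score = perm, score
    return best_assignment, best_score
```

The search visits n! assignments, so a linear-programming or Hungarian-style solver would be the practical choice for larger formations.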
The simulation environment parameters are set as follows. The simulation boundary is 20km x 10km. The red and blue unmanned aerial vehicles share the following parameters: the speed ranges from 80 m/s to 400 m/s, the weapon attack range is 150 m to 900 m, and the track inclination angle and the track deflection angle are each limited to a fixed range. For the action quantities, the tangential overload range is -3 g, the normal overload range is -5 g, and the roll angle about the velocity axis is likewise limited to a fixed range.
The training algorithm parameters are set as follows: experience pool size 1,000,000; experience replay batch size 1000; discount factor 0.99; soft update factor 0.005; maximum number of steps per round 500; learning rate of each network 0.0005.
In the first experiment the red side starts in an even situation, in the second experiment the red side starts in an advantageous situation, and in the third experiment the red side starts in a disadvantageous situation. The specific initial states of the red and blue sides are shown in table 1.
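The training settings above, together with the experience pool they parameterize, can be sketched as follows. The numeric values are those stated in the text; the class names, field names and the use of a bounded deque that evicts the oldest tuples are assumptions made for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    pool_size: int = 1_000_000
    batch_size: int = 1000
    discount: float = 0.99
    soft_update_factor: float = 0.005
    max_steps_per_round: int = 500
    learning_rate: float = 0.0005


class ExperiencePool:
    """Fixed-capacity store of (o, s, a, R, o', s') transition tuples."""

    def __init__(self, capacity, seed=None):
        self._data = deque(maxlen=capacity)  # oldest tuples are dropped first
        self._rng = random.Random(seed)

    def store(self, obs, state, joint_action, reward, next_obs, next_state):
        self._data.append((obs, state, joint_action, reward, next_obs, next_state))

    def sample(self, batch_size):
        """Uniform random batch; smaller than batch_size while the pool fills."""
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)

    def __len__(self):
        return len(self._data)
```

A pool would be created as `ExperiencePool(TrainConfig().pool_size)` and sampled with `TrainConfig().batch_size` once enough transitions have been collected.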
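The observation of each red unmanned aerial vehicle consists of its own information, the blue unmanned aerial vehicle information and the teammate information, as defined in step S2.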
The global state is obtained by concatenating the local observations of the two red unmanned aerial vehicles and removing the elements duplicated between them, which reduces the global state dimension; the reduced form of each observation denotes that observation with the elements repeated in the other observation removed.
The observation space of each red unmanned aerial vehicle is 27-dimensional and the state space is 42-dimensional. The symbols denote, respectively: the speeds, track inclination angles and track deflection angles of the two red unmanned aerial vehicles; the speeds, track inclination angles and track deflection angles of the two blue unmanned aerial vehicles; the relative angle and relative position information between each red unmanned aerial vehicle and each blue unmanned aerial vehicle; and the relative angle and relative position information between each red unmanned aerial vehicle and its red teammate unmanned aerial vehicle, where the subscript k denotes the teammate number.
The speed, track inclination angle and track deflection angle parameters are shown in table 1. The relative angles are calculated using the foregoing formulas (14), (15), (16) and (17), and the relative position information is calculated using the foregoing formulas (6), (7), (8), (9), (10) and (11).
The action space of each red unmanned aerial vehicle is three-dimensional, and the actions of the two red unmanned aerial vehicles form the joint action.
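The stated dimensions can be checked by counting labelled entries. The sketch below assumes that each relative block holds five elements (deviation angle, disengagement angle and three axis-wise position differences) and each kinematic block holds three (speed, track inclination angle, track deflection angle); this layout is an inference from the text, and the label scheme is hypothetical. Under these assumptions the counts reproduce the 27-dimensional observation and the 42-dimensional global state.

```python
KIN = ("v", "gamma", "psi")
REL = ("deviation", "disengagement", "dx", "dy", "dz")


def local_observation(i, n_red, n_blue):
    """Labels of the entries in the local observation of red UAV i."""
    labels = [("red", i, q) for q in KIN]                 # own information
    for j in range(n_blue):                               # blue information
        labels += [("rel_blue", i, j, q) for q in REL]
        labels += [("blue", j, q) for q in KIN]
    for k in range(n_red):                                # teammate information
        if k != i:
            labels += [("rel_mate", i, k, q) for q in REL]
            labels += [("red", k, q) for q in KIN]
    return labels


def global_state(n_red, n_blue):
    """Concatenate all local observations, keeping each entry only once."""
    seen, state = set(), []
    for i in range(n_red):
        for label in local_observation(i, n_red, n_blue):
            if label not in seen:
                seen.add(label)
                state.append(label)
    return state
```

For two red and two blue unmanned aerial vehicles, the second observation contributes only its 15 relative entries, because all 12 of its kinematic entries already appear in the first observation.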
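The joint reward function is the sum of the rewards of the two red unmanned aerial vehicles, each computed against the blue unmanned aerial vehicles as in step S3.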
When countermeasure training is performed with the HASAC algorithm, a centralized-training, decentralized-execution framework is adopted: each red unmanned aerial vehicle has its own policy network, and all red unmanned aerial vehicles share the centralized value networks and target value networks. The structures of the policy network and the value network are shown in fig. 4 and fig. 5, respectively. To mitigate Q-value overestimation, two independent value networks are designed, and two target value networks are introduced to stabilize the training process.
After the networks, the experience pool and other parameters are initialized, data are collected through interaction between the policy networks and the environment, and the network parameters are updated. The specific training process, shown in fig. 3, is as follows:
1) The observations at the current time are obtained from the interaction environment. Policy network 1 takes the observation of the first red unmanned aerial vehicle as input and outputs the action of the first unmanned aerial vehicle; policy network 2 takes the observation of the second red unmanned aerial vehicle as input and outputs the action of the second unmanned aerial vehicle; the two actions form the joint action;
2) The actions are input to the interaction environment, which returns the observations and global state at the next time together with the joint reward for executing the action. Specifically, the actions are applied to the red 1 and red 2 unmanned aerial vehicle models to update the states of the red unmanned aerial vehicles. The blue-side decision model, based on target allocation, converts the two-versus-two countermeasure decision problem into the allocation of a red target to each blue unmanned aerial vehicle plus one-to-one countermeasure decision problems: the target allocation model outputs, on the basis of situation assessment, the red target attacked by each blue unmanned aerial vehicle, and the minimax algorithm outputs the decision action of each blue unmanned aerial vehicle against its red target, after which the states of the blue unmanned aerial vehicles are updated. The target allocation uses the situation assessment model to evaluate the mutual threat between red and blue unmanned aerial vehicles, so that the allocated red target poses a small threat to the blue unmanned aerial vehicle while the blue unmanned aerial vehicle poses a large threat to that red target. The observation and global state processor extracts the observations of the red unmanned aerial vehicles and the global state, and the joint reward is calculated from the reward function;
3) The observations of the two red unmanned aerial vehicles and the global state at the current time, the joint action of the two red unmanned aerial vehicles, the observations of the two red unmanned aerial vehicles and the global state at the next time, and the joint reward are assembled into a tuple and stored in the experience pool;
4) Data are then randomly sampled from the experience pool as training data for the value networks, the policy networks and the target value networks, and the value networks, the target value networks and the policy networks of the two red unmanned aerial vehicles are updated with the Adam optimizer;
The value networks are updated as follows. The two value networks take the sampled global state and joint action as input and output the estimated Q values of executing that action in that state, and the smaller value is used as the predicted Q value in the loss function. The two target value networks take the sampled global state at the next time, together with the action that the policy networks output for the observations at the next time, and predict the estimated Q values at the next time. The target Q value is calculated from the reward, the entropy regularization term and the smaller of these two estimates. The mean square error between the target Q value and the predicted Q value is used as the loss function to update the value network parameters, and the parameters of the two value networks are updated independently;
The policy networks are updated as follows. A random permutation of the red unmanned aerial vehicle numbers is generated, and the policy networks of the two red unmanned aerial vehicles are updated in that order; when the policy network parameters of the current red unmanned aerial vehicle are updated, the previously updated red unmanned aerial vehicle policy network is taken into account. The loss function guiding the policy network update takes as input the smaller of the Q values estimated by the two value networks and the entropy regularization term.
The target value networks are updated as follows: the target value network parameters are optimized by soft update at intervals of a certain number of steps.
The above steps are repeated until all networks gradually converge. Each policy network is then taken as the cooperative countermeasure decision model of the corresponding red unmanned aerial vehicle: its input is the local observation of that red unmanned aerial vehicle in the current environment, and its output is the action executed by that unmanned aerial vehicle under the current observation.
Table 1. States of the red and blue unmanned aerial vehicles under the three initial situations
The simulation results are shown in Figs. 6, 7 and 8. Under all three basic initial situations, the strategy models generated for the red side shoot down the enemy aircraft within a short time, with the two red unmanned aerial vehicles cooperating so that each shoots down one blue unmanned aerial vehicle. When the initial situation is advantageous, the red unmanned aerial vehicles accelerate to close quickly on the blue unmanned aerial vehicles, and, owing to the high turning speed, the two red unmanned aerial vehicles each shoot down one blue unmanned aerial vehicle. When the initial situation is disadvantageous, the red unmanned aerial vehicles first perform a turning maneuver to reverse the disadvantage while the blue unmanned aerial vehicles accelerate toward their opponents; because the blue unmanned aerial vehicles are then flying fast, they overshoot to the front of the red unmanned aerial vehicles, and the two red unmanned aerial vehicles, after a further slight turning maneuver, each shoot down one blue unmanned aerial vehicle.
As a disclosed embodiment, the present invention further provides a multi-unmanned aerial vehicle collaborative countermeasure decision-making system for implementing the method, the system comprising:
The construction module is used for constructing a multi-unmanned aerial vehicle collaborative air combat countermeasure decision-making environment by constructing a multi-unmanned aerial vehicle air combat countermeasure motion model and an air combat situation assessment model;
The first establishing module is used for establishing a distributed partially observable Markov decision process model of the multi-unmanned aerial vehicle collaborative countermeasure decision problem according to the action space and the local observation and the state of each unmanned aerial vehicle in the countermeasure decision environment;
The second establishing module is used for designing a multi-unmanned aerial vehicle collaborative countermeasure reward function and the HASAC algorithm network space;
The generating module is used for training, based on HASAC, on the observations of each unmanned aerial vehicle obtained from the multi-unmanned aerial vehicle collaborative countermeasure decision-making model, and for generating a multi-unmanned aerial vehicle collaborative countermeasure strategy model, wherein the multi-unmanned aerial vehicle comprises a plurality of own-side unmanned aerial vehicles and a plurality of enemy unmanned aerial vehicles, the own side being the red side and the enemy side being the blue side.
While the foregoing description illustrates and describes the preferred embodiments of the present invention, it is to be understood that the invention is not limited to the forms disclosed herein, and that these are not to be construed as excluding other embodiments. The invention is capable of use in numerous other combinations, modifications and environments, and is capable of changes or modifications within the scope of the inventive concept as expressed herein, whether through the foregoing teachings or through the knowledge or technology of the relevant art. Modifications and variations that do not depart from the spirit and scope of the invention are intended to be within the scope of the appended claims.

Claims (6)

1. A multi-unmanned aerial vehicle collaborative countermeasure decision-making method, characterized in that the multi-unmanned aerial vehicle comprises a plurality of own-side unmanned aerial vehicles and a plurality of enemy unmanned aerial vehicles, the own side being the red side and the enemy side being the blue side, the method comprising the following steps:
Step 1, constructing a multi-unmanned aerial vehicle collaborative air combat countermeasure decision-making environment by building a multi-unmanned aerial vehicle air combat countermeasure motion model and an air combat situation assessment model, wherein the method specifically comprises the following steps of:
step 11, analyzing the forces acting on each unmanned aerial vehicle and establishing a particle motion model;
step 12, analyzing the interrelation among the unmanned aerial vehicles and establishing a relative motion model of the unmanned aerial vehicles, the relative motion model being as follows:
in the formula, the symbols denote, in order: the index set of the red unmanned aerial vehicles; the index set of the blue unmanned aerial vehicles; the velocity vector of a given red unmanned aerial vehicle; the velocity vector of a given blue unmanned aerial vehicle; the velocity vector of a teammate of that red unmanned aerial vehicle, together with the teammate's index; the distances along the coordinate axes between the red unmanned aerial vehicle and the blue unmanned aerial vehicle; the distances along the coordinate axes between the red unmanned aerial vehicle and its red teammate; the relative position vector of the red unmanned aerial vehicle and the blue unmanned aerial vehicle; the relative position vector of the red unmanned aerial vehicle and its red teammate; the off angle of the red unmanned aerial vehicle with respect to the blue unmanned aerial vehicle, which is the angle between the velocity vector of the red unmanned aerial vehicle and their relative position vector; the departure angle of the red unmanned aerial vehicle with respect to the blue unmanned aerial vehicle, which is the angle between the velocity vector of the blue unmanned aerial vehicle and their relative position vector; the off angle of the red unmanned aerial vehicle with respect to its teammate, which is the angle between the velocity vector of the red unmanned aerial vehicle and the relative position vector to the teammate; the departure angle of the red unmanned aerial vehicle with respect to its teammate, which is the angle between the velocity vector of the teammate and that relative position vector; the magnitude of the relative distance between the red unmanned aerial vehicle and the blue unmanned aerial vehicle; the magnitude of the relative distance between the red unmanned aerial vehicle and its teammate; the speed, track inclination angle and track deflection angle of the red unmanned aerial vehicle; the speed, track inclination angle and track deflection angle of the blue unmanned aerial vehicle; the speed, track inclination angle and track deflection angle of the teammate; the position of the red unmanned aerial vehicle in the inertial coordinate system; the position of the blue unmanned aerial vehicle in the inertial coordinate system; and the position of the teammate in the inertial coordinate system;
step 13, establishing an unmanned aerial vehicle air combat situation assessment model according to the angle, the speed, the height and the distance of the unmanned aerial vehicle;
Step 2, establishing a distributed partially observable Markov decision process model of the multi-unmanned aerial vehicle collaborative countermeasure decision problem according to the action space and the local observation and the state of each unmanned aerial vehicle in the countermeasure decision environment;
Step 3, designing a multi-unmanned aerial vehicle collaborative countermeasure reward function and the HASAC algorithm network space, wherein the HASAC algorithm adopts a centralized-training, distributed-execution framework and consists of n strategy networks, two value networks and two target value networks, n being the number of red unmanned aerial vehicles; each red unmanned aerial vehicle corresponds to one strategy network, which approximates the decision model of that unmanned aerial vehicle and generates the action decided under its local observation; the strategy networks have the same structure, but each strategy network learns independently and network parameters are not shared; the value networks are used for evaluating the quality of the actions that the strategy networks execute under given observations; the target value networks are used for stabilizing the training process and have the same structure as the value networks, and the target value network parameters are updated based on a weighted average of the value network parameters;
Step 4, training, through interaction between the HASAC algorithm network space and the multi-unmanned aerial vehicle collaborative countermeasure decision-making environment, and generating a multi-unmanned aerial vehicle collaborative countermeasure strategy model, specifically comprising the following steps:
1) Acquiring from the interaction environment the observations at the current moment; each strategy network takes the observation of its red unmanned aerial vehicle as input and outputs the action of that red unmanned aerial vehicle, and the actions of all the red unmanned aerial vehicles form a joint action;
2) Inputting the joint action into the interaction environment, which returns the observations and the global state at the next moment and the joint reward for executing the action, wherein the actions are applied respectively to the n red unmanned aerial vehicle models to update the states of the red unmanned aerial vehicles; the many-to-many countermeasure decision problem is converted into the assignment of a red target to each blue unmanned aerial vehicle and a one-to-one countermeasure decision problem based on the target assignment; the target assignment model outputs, based on situation assessment, the red target to be attacked by each blue unmanned aerial vehicle; the decision action of the blue unmanned aerial vehicle against its red target is output by the minimax method, and the state of the blue unmanned aerial vehicle is updated; and the observation and global state processor extracts the observation values of the red unmanned aerial vehicles and the global state value, and the joint reward is calculated based on the reward function;
3) Forming a tuple from the observation of each red unmanned aerial vehicle and the global state at the current moment, the joint action of all the red unmanned aerial vehicles, the observation of each red unmanned aerial vehicle and the global state at the next moment, and the joint reward, and storing the tuple in the experience pool;
4) Then randomly sampling data from the experience pool as training data for the value networks, the strategy networks and the target value networks, and updating the value networks, the target value networks and the strategy network of each red unmanned aerial vehicle using the Adam optimizer;
And repeating the above steps until all the networks gradually converge, that is, until over a period of time the reward per round no longer increases noticeably and the countermeasure duration per round no longer shortens noticeably; each strategy network is then taken as the multi-unmanned aerial vehicle cooperative countermeasure decision model of the red side, whose input is the local observation of each red unmanned aerial vehicle at the current moment and whose output is the decision action executed by each red unmanned aerial vehicle under the current observation.
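Sub-steps 3) and 4) of step 4 rely on an experience pool that stores transition tuples and returns random batches. The following sketch shows one simple way to build such a pool. The field names, the fixed capacity and the class name `ExperiencePool` are assumptions made for illustration; the source only lists what the tuple contains.

```python
import random
from collections import deque, namedtuple

# Field names are illustrative; the source only lists the tuple contents.
Transition = namedtuple(
    "Transition",
    ["obs", "state", "joint_action", "next_obs", "next_state", "joint_reward"],
)


class ExperiencePool:
    """Fixed-capacity pool that stores transitions and samples them
    uniformly at random (the capacity limit is an assumption)."""

    def __init__(self, capacity, seed=None):
        # When the pool is full, the oldest transition is dropped.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, *fields):
        """Append one transition tuple to the pool."""
        self._buffer.append(Transition(*fields))

    def sample(self, batch_size):
        """Return up to batch_size transitions, sampled without replacement."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)
```

A training loop would call `store(...)` once per environment step and `sample(batch_size)` before each network update.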
2. The multi-unmanned aerial vehicle collaborative countermeasure decision-making method of claim 1, wherein the particle motion model includes kinematic and kinetic equations, specifically as follows:
In the formula, the symbols respectively denote the coordinates of the unmanned aerial vehicle along the axes of the inertial coordinate system, its speed, track inclination angle and track deflection angle, and the gravitational acceleration; the tangential overload of the unmanned aerial vehicle, which acts along its flight velocity direction; the normal overload of the unmanned aerial vehicle, which is perpendicular to the flight velocity vector; and the roll angle of the unmanned aerial vehicle about the velocity axis.
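The formula of claim 2 is not reproduced in this text. As an illustration only, the sketch below assumes the three-degree-of-freedom point-mass form commonly used in the air-combat literature, with the z axis pointing up. The symbol names (`nx` for tangential overload, `nz` for normal overload, `mu` for roll angle), the value of `G` and the Euler integration step are assumptions, not details confirmed by the source.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


@dataclass
class PointMassState:
    x: float      # position along the inertial x axis, m
    y: float      # position along the inertial y axis, m
    z: float      # altitude, m (z axis assumed to point up)
    v: float      # speed, m/s
    gamma: float  # track inclination angle, rad
    psi: float    # track deflection angle, rad


def derivatives(s, nx, nz, mu):
    """Assumed kinematic and kinetic equations of the point-mass model.

    Singular at v = 0 and at gamma = +/- 90 degrees.
    """
    dx = s.v * math.cos(s.gamma) * math.cos(s.psi)
    dy = s.v * math.cos(s.gamma) * math.sin(s.psi)
    dz = s.v * math.sin(s.gamma)
    dv = G * (nx - math.sin(s.gamma))
    dgamma = (G / s.v) * (nz * math.cos(mu) - math.cos(s.gamma))
    dpsi = G * nz * math.sin(mu) / (s.v * math.cos(s.gamma))
    return dx, dy, dz, dv, dgamma, dpsi


def euler_step(s, nx, nz, mu, dt):
    """Advance the state by one explicit Euler step of length dt seconds."""
    dx, dy, dz, dv, dgamma, dpsi = derivatives(s, nx, nz, mu)
    return PointMassState(
        x=s.x + dx * dt,
        y=s.y + dy * dt,
        z=s.z + dz * dt,
        v=s.v + dv * dt,
        gamma=s.gamma + dgamma * dt,
        psi=s.psi + dpsi * dt,
    )
```

Under these assumptions, `nx = 0`, `nz = 1`, `mu = 0` from level flight keeps speed and altitude constant, while a non-zero roll angle with `nz > 1` produces a turn.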
3. The multi-unmanned aerial vehicle collaborative countermeasure decision-making method according to claim 1, wherein the step 13 specifically comprises converting the multi-unmanned aerial vehicle collaborative countermeasure into target assignment and one-to-one countermeasure, wherein the target assignment is based on situation assessment and, in the case where the red side and the blue side threaten each other, the assigned target is the one that poses the least threat to the unmanned aerial vehicle itself and to which the unmanned aerial vehicle itself poses the greater threat, and the expression of the situation assessment is as follows:
Wherein the symbols denote the situation assessment value of a given red unmanned aerial vehicle with respect to a given blue unmanned aerial vehicle; the situation advantages in angle, height, speed and distance, respectively, of the red unmanned aerial vehicle with respect to the blue unmanned aerial vehicle; and the corresponding weights, which satisfy the constraint given in the expression.
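The situation assessment of claim 3 combines four advantage terms with weights. A minimal sketch of such a weighted combination follows. The function name, the argument order and the requirement that the weights be non-negative and sum to 1 are assumptions, since the weight constraint itself is not reproduced in this text.

```python
import math


def situation_value(angle_adv, height_adv, speed_adv, distance_adv, weights):
    """Weighted sum of the angle, height, speed and distance advantages.

    `weights` holds four values in the same order as the advantage terms.
    They are assumed to be non-negative and to sum to 1.
    """
    if len(weights) != 4:
        raise ValueError("expected four weights")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    terms = (angle_adv, height_adv, speed_adv, distance_adv)
    return sum(w * t for w, t in zip(weights, terms))
```

Evaluating this value for each red and blue pair gives the scores on which a target assignment step could be based.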
4. The method according to claim 1, wherein the step 2 comprises implementing the multi-unmanned aerial vehicle cooperative countermeasure decision model based on a distributed partially observable Markov decision process, the process being described by a tuple whose elements are: the set of red unmanned aerial vehicles; the state space of the red unmanned aerial vehicles; the joint action space of all the red unmanned aerial vehicles, composed of the action space of each red unmanned aerial vehicle; the local observation of each red unmanned aerial vehicle under the global state; the joint reward function of all the red unmanned aerial vehicles cooperating against the blue side; the state transition function; and the discount factor; wherein the local observation of a red unmanned aerial vehicle comprises information on that unmanned aerial vehicle itself, information on the blue unmanned aerial vehicles and information on its teammates.
5. The multi-unmanned aerial vehicle collaborative countermeasure decision-making method of claim 1, wherein the designed collaborative countermeasure reward function is the sum of the rewards obtained by each red unmanned aerial vehicle in countering the blue side, each term of the sum being the reward obtained by one red unmanned aerial vehicle against all the blue unmanned aerial vehicles.
6. A multi-unmanned aerial vehicle collaborative countermeasure decision-making system for implementing the method of any one of claims 1-5, the system comprising:
The construction module is used for constructing a multi-unmanned aerial vehicle collaborative air combat countermeasure decision-making environment by constructing a multi-unmanned aerial vehicle air combat countermeasure motion model and an air combat situation assessment model;
The first establishing module is used for establishing a distributed partially observable Markov decision process model of the multi-unmanned aerial vehicle collaborative countermeasure decision problem according to the action space and the local observation and the state of each unmanned aerial vehicle in the countermeasure decision environment;
The second establishing module is used for designing a multi-unmanned aerial vehicle collaborative countermeasure reward function and the HASAC algorithm network space;
The generation module is used for training, through interaction between the HASAC algorithm network space and the multi-unmanned aerial vehicle collaborative countermeasure decision-making environment, and generating a multi-unmanned aerial vehicle collaborative countermeasure strategy model, wherein the multi-unmanned aerial vehicle comprises a plurality of own-side unmanned aerial vehicles and a plurality of enemy unmanned aerial vehicles, the own side being the red side and the enemy side being the blue side.
CN202410276466.8A 2024-03-11 2024-03-11 Multi-unmanned aerial vehicle collaborative countermeasure decision-making method and system Active CN119598825B (en)

Publications (2)

Publication Number Publication Date
CN119598825A (en) 2025-03-11
CN119598825B (en) 2026-04-07

Family

ID=94837572

Country Status (1)

Country Link
CN (1) CN119598825B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121277229B (en) * 2025-12-09 2026-02-03 中国科学技术大学 A UAV adversarial scheduling method under incomplete observation information

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116974299A (en) * 2023-08-10 2023-10-31 北京理工大学 Reinforcement learning UAV trajectory planning method based on delayed experience priority playback mechanism
CN117270400A (en) * 2023-10-27 2023-12-22 北京航空航天大学 A course-based learning-based decision-making optimization method for drone confrontation

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095481B (en) * 2021-04-03 2024-02-02 西北工业大学 An air combat maneuver method based on parallel self-game
CN115951709A (en) * 2023-01-09 2023-04-11 中国人民解放军国防科技大学 Multi-UAV air combat strategy generation method based on TD3
CN116362471A (en) * 2023-01-10 2023-06-30 国网湖北省电力有限公司营销服务中心(计量中心) A Flexible Deep Reinforcement Learning Approach to Building Load Demand Response Considering Energy Storage Participation
CN116430904B (en) * 2023-05-15 2025-08-01 西安电子科技大学 Unmanned aerial vehicle autonomous path planning method based on lightweight continuous SAC algorithm
CN117156272A (en) * 2023-08-28 2023-12-01 上海大学 Electro-hydraulic adjustable-focus lens automatic focusing method and system and electronic equipment
CN117394461B (en) * 2023-12-11 2024-03-15 中国电建集团西北勘测设计研究院有限公司 Supply and demand coordinated control system and method for integrated energy system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Maximum Entropy Heterogeneous-Agent Mirror Learning; Jiarong Liu et al.; arXiv; 2023-01-19; see Section 4 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant