CN115309555B

CN115309555B - Parallel computing method and system for satellite, storage medium and equipment

Info

Publication number: CN115309555B
Application number: CN202210946245.8A
Authority: CN
Inventors: 李昭男; 曾伟刚; 李琮; 董卫华; 张治国
Original assignee: Xi'an Zhongke Tianta Technology Co ltd
Current assignee: Xi'an Zhongke Tianta Technology Co ltd
Priority date: 2022-08-08
Filing date: 2022-08-08
Publication date: 2024-03-15
Anticipated expiration: 2042-08-08
Also published as: CN115309555A

Abstract

The invention belongs to the field of parallel computing methods and addresses two technical problems in current satellite data computation. When an integrated resolving method is adopted, each single computing operation is complex and occupies a large amount of video memory, so that when the number of satellites computed in parallel reaches a certain number, an overflow error occurs due to insufficient cache and the resolving fails. When a modularized resolving method is adopted, a great amount of time is consumed in parameter interaction between the central processing unit and the graphics processor, and the acceleration effect is not ideal.

Description

Parallel computing method and system for satellite, storage medium and equipment

Technical Field

The invention belongs to the field of parallel computing methods, and particularly relates to a parallel computing method and system for satellites, a computer readable storage medium and terminal equipment.

Background

The construction of a low-orbit satellite constellation communication system capable of covering the whole world is a necessary path for the development of the aerospace industry. By 2017, there were as many as 18,000 artificial objects in near-Earth orbit. Meanwhile, with the development of small-satellite technology, the number of satellites contained in low-orbit constellation networks is continuously increasing, and the low-orbit broadband constellation planned by SpaceX comprises more than 4,000 satellites.

At present, satellite data is often calculated with the SGP4/SDP4 algorithm running serially on a central processing unit. Although the performance of central processing units has improved rapidly in recent decades, such calculation is still very time-consuming in the face of the increasing number of satellites. For example, calculating the position data of 200 satellites for each second of a 24-hour period takes approximately four minutes. Thus, as the number of satellites grows, calculation on a central processing unit alone is no longer practical.

To solve this problem, Jitianyang et al., in "Graphic processor acceleration-based low-orbit satellite constellation coverage performance calculation and analysis", adopted a modularized acceleration method and the CUDA library provided by NVIDIA, and carried out an acceleration test of the SGP4/SDP4 orbit model on a matching graphics processor. Kong Fanze et al., in "Satellite orbit recursive graphics processor integrated parallel acceleration method", proposed two graphics processor acceleration methods: an integrated solution and a modular solution.

In the integrated resolving method, the SGP4 resolving model (including the definition of initial constants) is placed as a whole into the kernel function. The computer memory then needs only one data transfer with the graphics processor memory, one kernel function call and one video memory allocation. Compared with the modularized method, the time for multiple parameter interactions and function calls is saved. However, each single computing operation in the integrated method is complex and occupies a large video memory space, and when the number of satellites computed in parallel reaches a certain number, overflow errors occur due to insufficient cache, so that the resolving fails.

The modularized resolving method divides the resolving process into 4 modules: a gravity perturbation constant initialization module, a two-line element to orbit determination parameter conversion module, an SGP4 orbit model initialization module and an SGP4 orbit forecasting module. Except for the gravity perturbation constant initialization module, the other 3 modules perform parallel computation in the graphics processor. Because the calculation is split into modules, the demand on video memory is small and large-scale parallel computation of satellite orbits can be realized. However, the method consumes a great amount of time in parameter interaction between the central processing unit and the graphics processor, and the acceleration effect is unsatisfactory.

Disclosure of Invention

The invention provides a parallel computing method and system for satellites, a computer readable storage medium and terminal equipment, which are used for solving the technical problems that, when satellite data is computed at present with an integrated resolving method, a single computing operation is complex and occupies a large video memory space, so that when the number of satellites computed in parallel reaches a certain number, overflow errors occur due to insufficient cache and the resolving fails.

In order to achieve the purpose, the invention is realized by adopting the following technical scheme:

The parallel computing method for satellites is characterized by comprising the following steps:

S1, performing disturbance data initialization processing in a central processing unit to obtain a disturbance data initialization result;

S2, reading TLE data through the central processing unit, and extracting the time period to be calculated, the step value and the TLE data column;

S3, taking single satellite orbit parameters from the TLE data column through the central processing unit and initializing them;

S4, copying the disturbance data initialization result, the single satellite orbit parameters initialized in step S3, the information on the time period to be calculated and the step value information to a graphics processor;

S5, performing ephemeris computation and conversion under different time nodes through a plurality of computing units in the graphics processor respectively to obtain a calculation result;

S6, reading and storing the calculation result of step S5 through the central processing unit, and judging whether the ephemeris computation and conversion of all satellites are finished; if yes, finishing the parallel computation; otherwise, returning to step S3 until the ephemeris computation and conversion of all satellites are finished.

Further, in step S5, the computing units are in one-to-one correspondence with the time nodes.

Further, the storing in step S6 and step S3 are performed in two threads of the central processor, respectively.

Further, the video memory of the graphics processor comprises a constant memory and a global memory;

in step S4, the disturbance data initialization result is stored in a constant memory; other data in the graphics processor is stored in global memory.

Further, the central processing unit calls a class method in the running process;

when the graphics processor calls a function in the running process, the function table pointer is deleted, and the data of the class object used by the class method is transferred separately as a structure.

Further, in step S5, the data types of the time information and the coordinate information in the calculation result are structure-of-arrays (SoA) types.

Further, in step S5, when the plurality of computing units in the graphics processor respectively perform ephemeris computation and conversion under different time nodes, the trigonometric function calculations in the ephemeris computation and conversion are each downgraded from double-precision floating point form to single-precision floating point form.

The invention provides a parallel computing system for satellites aiming at the parallel computing method, which is used for realizing the parallel computing method for satellites and is characterized by comprising a central processing unit and a graphic processor;

the central processing unit is used for carrying out disturbance data initialization processing to obtain a disturbance data initialization result, reading TLE data, initializing single satellite orbit parameters in the TLE data, copying the disturbance data initialization result and the initialized single satellite orbit parameters to the graphic processor, reading and storing calculation results of all calculation units, and judging whether ephemeris calculation and conversion of all satellites are completed or not;

the graphics processor is used for respectively executing ephemeris computation and conversion under different time nodes through a plurality of computing units in the graphics processor.

The invention also provides a computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor realizes the steps of the above method.

In addition, the invention also provides a terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, and is characterized in that the steps of the method are realized when the processor executes the computer program.

Compared with the prior art, the invention has the following beneficial effects:

1. Compared with the existing CPU serial double-layer loop calculation method, in the parallel computing method for satellites provided by the invention the ephemeris calculation and conversion that would otherwise be computed in a loop are completed by a plurality of computing units of the GPU. Logically, the ephemeris calculation and conversion of a single satellite, which originally required many loop iterations on the CPU, can be completed in a single calculation, thereby achieving the aim of accelerating the calculation.

2. According to the invention, the disturbance data initialization processing and the TLE data reading are performed in the CPU, so that a plurality of computing units of the GPU are prevented from performing the same computation on the same data, and the memory space and computing resources of the GPU are saved.

3. According to the invention, the storage of the GPU calculation result and the initialization processing of the single satellite orbit parameters in the CPU are executed in two separate threads, so that the CPU is prevented from stalling while storing, and the execution speed is further improved.

4. According to the invention, the GPU is provided with a constant memory for storing the disturbance data initialization result; the function table pointer is deleted when the GPU runs; the time information and coordinate information in the GPU calculation result are of a structure-of-arrays type; and the GPU downgrades the double-precision floating point form to the single-precision floating point form when executing trigonometric function calculations. Together these measures can greatly improve the calculation speed.

5. The invention also provides a computer readable storage medium and terminal equipment capable of executing the steps of the method, and the method can be popularized and applied to realize fusion on corresponding hardware equipment.

Drawings

FIG. 1 is a schematic flow chart of a serial double-layer cyclic calculation method of a CPU;

FIG. 2 is a flow chart of an embodiment of a parallel computing method for satellites according to the present invention;

FIG. 3 is a schematic diagram of an AoS data structure according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an SoA data structure according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

As shown in FIG. 1, the serial double-layer loop calculation method proceeds as follows. After calculation starts, the disturbance data is initialized and the TLE data is read. A single satellite is taken and its orbit parameters are initialized, and the SGP4/SDP4 orbit model is initialized. Coordinates are calculated through the SGP4/SDP4 orbit model, and ephemeris conversion is then performed. If the current step count is smaller than the total step count, the coordinates are calculated again through the SGP4/SDP4 orbit model for the next step. If the current step count is greater than or equal to the total step count, it is judged whether the satellite count is smaller than the total number of satellites: if so, the next single satellite is taken, its orbit parameters are initialized and the calculation is repeated; if the satellite count equals the total number of satellites, the calculation ends.

The step count above refers to the number of steps, where a step is the time span between successive calculations: for example, a step of 1 means calculating the position data for every second, and a step of 2 means calculating the position data for every two seconds. The total number of steps is the number of all time nodes to be calculated.
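The serial double-layer loop and the step count described above can be sketched as follows. This is only an illustrative sketch: `propagate` is a hypothetical stand-in for the SGP4/SDP4 coordinate calculation and ephemeris conversion, and the time period is assumed to be a half-open interval starting at zero.

```python
import math

def total_steps(duration_s: int, step_s: int) -> int:
    """Number of time nodes in [0, duration_s) sampled every step_s seconds."""
    return math.ceil(duration_s / step_s)

def propagate(sat_id: int, t: int) -> tuple:
    """Hypothetical stand-in for SGP4/SDP4 propagation plus ephemeris conversion."""
    return (sat_id, t)

def serial_double_loop(n_sats: int, duration_s: int, step_s: int) -> list:
    """Outer loop over satellites, inner loop over time nodes, all on the CPU."""
    results = []
    for sat in range(n_sats):                               # outer loop: satellites
        row = []
        for k in range(total_steps(duration_s, step_s)):    # inner loop: time nodes
            row.append(propagate(sat, k * step_s))
        results.append(row)
    return results
```

With a step of 1 over 24 hours, `total_steps(86400, 1)` gives 86400 time nodes per satellite, and every one of them is visited sequentially for every satellite.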

Compared with the serial double-layer loop computation of a CPU, the core idea of the GPU parallel computing method for satellites designed by the invention is as follows: the coordinate data of the multiple time nodes, which originally had to be calculated in a loop, is distributed to a plurality of computing units of the GPU, and each computing unit performs the ephemeris calculation and conversion for different time nodes. Logically, therefore, the ephemeris calculation and conversion of a single satellite, which on the CPU can only be finished by repeated loop iterations, can be finished in a single calculation, which accelerates the operation.

As shown in fig. 2, the specific steps of a parallel computing method for satellites of the present invention are as follows:

(1) The CPU (central processing unit) reads the orbit disturbance data and parses and initializes it. The specific process is as follows: a text file containing the orbit disturbance data is read, and the text is converted into a data structure that can be used for later calculation and stored in memory;

(2) The TLE (two-line element set) data file is read by the CPU, and the time period to be calculated, the step value and the TLE data column are extracted. The method is then ready to enter the loop portion of the calculation;

(3) The TLE data of one satellite is taken from the extracted TLE data column, parsed and initialized. The initialization is needed because the TLE data of a single satellite taken from the extracted TLE data column holds orbital elements in a storage format, which must be corrected and calculated to obtain usable orbit parameters. The initialization itself can be accomplished using existing algorithms;

(4) The disturbance data initialization result, the initialized single satellite orbit parameters, and the time period to be calculated and the step value extracted from the TLE data file are copied to the GPU;

(5) Ephemeris computation and conversion are done by the GPU (graphics processor), in which multiple computing units perform ephemeris computation and conversion at different time nodes. A computing unit may perform ephemeris computation and conversion for one time node or for a plurality of time nodes. In a specific implementation, the time nodes are allocated through a video memory driver in the GPU and are distributed equally among the computing units;

(6) After the GPU finishes the ephemeris computation and conversion, the CPU reads and stores the calculation results and judges whether the ephemeris computation and conversion of all satellites are finished. If yes, the parallel computation ends. Otherwise, the method returns to step (3), takes out the TLE data of the next satellite, parses it, completes initialization and continues the calculation, until the ephemeris computation and conversion of all satellites are finished and the data is completely saved, whereupon the calculation exits.
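The per-satellite loop of steps (3) to (6) and the equal allocation of time nodes in step (5) can be sketched as below. This is a CPU-only illustration under stated assumptions: `init_orbit` and `compute_unit` are hypothetical stand-ins for the real initialization and for the GPU work, the units are run one after another here rather than in parallel, and a round-robin split is only one possible way to distribute time nodes evenly.

```python
def split_time_nodes(n_nodes: int, n_units: int) -> list:
    """Assign time-node indices to computing units round-robin (counts differ by at most 1)."""
    return [list(range(u, n_nodes, n_units)) for u in range(n_units)]

def init_orbit(tle_entry: str) -> dict:
    """Hypothetical stand-in for step (3): parse one satellite's TLE and initialize it."""
    return {"name": tle_entry}

def compute_unit(orbit: dict, node_indices: list, step_s: int) -> list:
    """Hypothetical stand-in for step (5) on one computing unit."""
    return [(k, orbit["name"], k * step_s) for k in node_indices]

def run(tle_column: list, n_nodes: int, step_s: int, n_units: int) -> dict:
    saved = {}
    for tle_entry in tle_column:                          # steps (3)-(6) repeat per satellite
        orbit = init_orbit(tle_entry)                     # step (3)
        chunks = split_time_nodes(n_nodes, n_units)       # allocation used in step (5)
        partial = [compute_unit(orbit, c, step_s) for c in chunks]  # sequential here, parallel on a GPU
        merged = sorted(r for p in partial for r in p)    # step (6): read back in time order
        saved[tle_entry] = [t for _, _, t in merged]      # step (6): store
    return saved
```

For example, `split_time_nodes(5, 2)` yields `[[0, 2, 4], [1, 3]]`, so each unit handles an almost equal share of the time nodes of the current satellite.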

The storing performed by the CPU and the operation in step (3) of taking out the TLE data of one satellite, parsing it and completing initialization can be executed asynchronously in two threads. Specifically, the calculation result in the GPU is first placed in a buffer in CPU memory; while the CPU stores that result in a second thread, it simultaneously takes out the TLE data of the next satellite in a first thread, parses the data and completes initialization.
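The two-thread arrangement can be sketched with a queue acting as the CPU-side buffer. This is a minimal sketch under assumptions: `compute_on_gpu` is a hypothetical stand-in for steps (4) and (5), stripping whitespace stands in for TLE parsing and initialization, and appending to a list stands in for writing results to storage.

```python
import queue
import threading

def compute_on_gpu(orbit: str) -> str:
    """Hypothetical stand-in for copying data to the GPU and running the computation."""
    return f"ephemeris:{orbit}"

def run_pipeline(tle_column: list) -> list:
    buffer = queue.Queue()          # buffer in CPU memory holding finished results
    saved = []

    def saver():                    # second thread: stores results as they arrive
        while True:
            item = buffer.get()
            if item is None:        # sentinel: no more results
                break
            saved.append(item)      # stand-in for writing to disk

    t = threading.Thread(target=saver)
    t.start()
    for tle_entry in tle_column:    # first thread: parse, initialize, launch
        orbit = tle_entry.strip()   # stand-in for TLE parsing and initialization
        buffer.put(compute_on_gpu(orbit))
    buffer.put(None)
    t.join()
    return saved
```

Because the queue is first-in first-out, the results are stored in the order in which the satellites were processed, while the first thread is free to initialize the next satellite without waiting for the store to finish.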

In the method, firstly, the parsing and initialization of the disturbance data, the orbit parameters and the like are completed by the CPU, which prevents the computing units of the GPU from performing the same computation on the same data and saves a certain amount of GPU video memory and computing resources. Secondly, the result of each completed calculation is returned to the CPU and stored asynchronously: the previous calculation result is stored while waiting for the GPU to complete the current calculation. This avoids the process stalling and sitting idle while the CPU waits for the calculation result and, at the same time, allows multi-satellite calculation to be completed with very little video memory.

However, the data of the calculation result is returned to the CPU after each calculation is completed, which results in frequent data exchange between the CPU and the GPU. In most cases, the memory space of the CPU is connected to the video memory of the GPU through a PCIe (Peripheral Component Interconnect Express) bus, and frequent and massive data exchange requires a certain time and bus bandwidth. In addition, through performance analysis, the data read-write performance of the calculation method is not high, and particularly, more conflicts occur during access. In addition, a large amount of trigonometric function calculation is carried out in the ephemeris calculation and conversion process, and the trigonometric function operation of the double-precision floating point type is very time-consuming.

In order to solve the problems, the invention also designs a plurality of optimization methods so as to accelerate the operation speed of the parallel computing method.

(1) Setting constant memory

The constant memory is characterized in that only one copy of the data exists in the video memory space. When multiple threads access the same address of the constant memory, only one chip read-write cycle is needed, and the data is then sent to all threads by broadcast. The read-write cycle refers to the minimum time interval required between two consecutive read-write operations on the memory chip. Because some memories require a certain recovery time after an access operation, the read-write cycle is generally greater than or equal to the access time, and when multiple consecutive read-write operations are required, a whole read-write cycle must elapse between them.

The global memory is characterized in that when multiple threads access different addresses of the global memory that are not within the same granularity, no conflict occurs and performance is best. Granularity refers to the data segment accessed by the memory chip in one read-write operation; each data segment is like a cell storing data of the same size, and two threads accessing different addresses in the same cell need two read-write cycles. When multiple threads access the same address of the global memory, conflicts occur and the threads must queue to read. The disturbance data initialization result is only read as a parameter and is not modified, and copying the same data for each thread would consume an extremely large amount of memory. Therefore, in the invention, when the disturbance data initialization result, the initialized single satellite orbit parameters, the information on the time period to be calculated and the step value information are copied to the graphics processor, the disturbance data initialization result is stored in the constant memory and the other data is stored in the global memory, which effectively saves memory resources.

(2) Deleting inefficient data parameters

In existing CPU algorithms, the computation is done inside the class object, so class methods can be invoked without passing pointers. However, in an acceleration algorithm moved onto the GPU, using the class method requires a pointer to be transferred to the GPU and, at the same time, the whole class object to be copied to the video memory of the GPU, including the function table pointer of the class object, which wastes memory resources. Therefore, the function table pointer of the class object is deleted, the required data is transferred separately as a structure, and the class method is extracted as an ordinary function call, which saves memory.

(3) Conversion of AoS to SoA

AoS refers to an array of structures, the form of which is shown in FIG. 3. SoA refers to a structure of arrays, the form of which is shown in FIG. 4. On the GPU, SoA has better read-write performance than AoS because the number of read cycles the memory chip spends per SoA access is much smaller than the number required for an AoS access. When structures are stored as an array, i.e. AoS, the variables of one structure (for example AoS[2]) are stored in adjacent memory locations. When adjacent data lies within the same read-write granularity, read-write conflicts between adjacent variables can occur, threads must wait in line, and the number of chip read-write cycles increases. When arrays form a structure, i.e. SoA, data that was adjacent in AoS lies far apart in SoA, so conflicts are avoided when data with the same subscript is accessed, and the access can be completed in only a few read-write cycles. This can greatly improve the read-write performance of the algorithm.

The calculation result obtained by performing ephemeris calculation and conversion in the GPU contains time and coordinate information, and SoA optimization can be performed on the data structure of time and coordinates.
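The difference between the two layouts can be illustrated by counting how many granularity-sized memory segments are touched when consecutive threads each read the same field. This is a simplified model rather than a description of any particular GPU: the field names, the 8-byte item size and the 128-byte segment size are assumptions made for the sketch.

```python
FIELDS = ("t", "x", "y", "z")   # assumed record: one time value and three coordinates
ITEM = 8                        # bytes per double-precision value
SEGMENT = 128                   # assumed access granularity in bytes

def aos_to_soa(records: list) -> dict:
    """Convert a list of records (AoS) into one list per field (SoA)."""
    return {f: [r[f] for r in records] for f in FIELDS}

def aos_address(i: int, field: str) -> int:
    """Byte offset of records[i].field when whole records are stored back to back."""
    return (i * len(FIELDS) + FIELDS.index(field)) * ITEM

def soa_address(i: int, field: str, n: int) -> int:
    """Byte offset of field[i] when each field is stored as its own array of length n."""
    return (FIELDS.index(field) * n + i) * ITEM

def segments_touched(addresses: list) -> int:
    """Number of distinct granularity-sized segments covered by the given addresses."""
    return len({a // SEGMENT for a in addresses})
```

Under these assumptions, 32 threads reading `x` at indices 0 to 31 touch 8 segments in the AoS layout but only 2 segments in the SoA layout, which is the effect the conversion is intended to exploit.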

(4) Conversion of double-form operations to float-form operations

When the GPU executes ephemeris computation and conversion, a large number of complex operations such as trigonometric functions are involved and the operation speed is low; the speed can be improved by downgrading double (double-precision floating point) operations to float (single-precision floating point) operations. Because some local high-precision calculations are unavoidable, only the trigonometric function calculations in specific functions are downgraded from double precision to single precision. The results computed with this optimization show that the error appears only in the last digit after the decimal point, that the last digit differs by only 1 from the value obtained without reduced precision, and that the probability of such an error is less than 1 percent, which is within an acceptable range.
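The size of the error introduced by the downgrade can be explored with a small emulation. The sketch below is not GPU code: it assumes that single-precision behaviour can be approximated by rounding both the argument and the result of a double-precision sine to the nearest 32-bit float, and the helper names are chosen for illustration only.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sin_single(x: float) -> float:
    """Emulate a single-precision sine by rounding the input and the output to float32."""
    return to_float32(math.sin(to_float32(x)))

def downgrade_error(x: float) -> float:
    """Absolute difference between the emulated single-precision and double-precision sine."""
    return abs(sin_single(x) - math.sin(x))
```

Calling `downgrade_error` over a range of angles gives a rough feel for how far the single-precision result can drift from the double-precision one before the remaining double-precision steps are applied.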

Based on the above calculation method, the invention also provides a parallel calculation system for satellites, which can realize the above parallel calculation method for satellites and comprises a central processing unit and a graphic processor. The central processing unit is used for carrying out disturbance data initialization processing to obtain disturbance data initialization results, reading TLE data, initializing single satellite orbit parameters in the TLE data, copying the disturbance data initialization results and the initialized single satellite orbit parameters to the graphic processor, reading and storing calculation results of all calculation units, and judging whether ephemeris calculation and conversion of all satellites are completed or not; a graphics processor for performing ephemeris computation and conversion at different time nodes, respectively, by a plurality of computation units in the graphics processor.

In the parallel computing system, the central processing unit and the graphics processor can likewise be optimized as described above, so that better computing speed and computing effect can be achieved.

An actual test was carried out using the parallel computing method combined with the above optimization methods. In the test, an RTX3070 graphics card was used to compute the ephemerides of 8000 satellites over 24 hours with a step of 1. The computation time fell from the original 110 seconds to 20 seconds after optimization, which is a remarkable effect.
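The scale of this test can be made concrete with a short calculation. The sketch below assumes that a step of 1 means one time node per second, as defined earlier for the step count; the function names are illustrative.

```python
def time_nodes(n_sats: int, hours: int, step_s: int) -> int:
    """Total number of (satellite, time node) pairs to be computed."""
    return n_sats * (hours * 3600 // step_s)

def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the run time before optimization to the run time after optimization."""
    return before_s / after_s
```

Under that assumption, `time_nodes(8000, 24, 1)` gives 691,200,000 position evaluations, and `speedup(110, 20)` gives a factor of 5.5 from the optimization methods alone.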

The parallel computing method of the present invention can be applied to a computer-readable storage medium in which a computer program is stored, and the parallel computing method can be stored as a computer program in the computer-readable storage medium, which when executed by a processor, implements the steps of the parallel computing method.

In addition, the parallel computing method of the present invention may also be applied to a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the parallel computing method of the present invention when executing the computer program. The terminal device may be a computer, a notebook computer, a palm computer, various cloud servers, and other computing devices, and the processor may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, or other programmable logic devices.

In summary, the present invention improves the SGP4/SDP4 algorithm computed on a CPU and optimizes it into an acceleration algorithm computed on a GPU. Compared with existing CPU calculation, the algorithm of the invention can still reach an acceleration ratio of 40-70 on common PC equipment, even when the CPU uses multi-core optimization. The whole algorithm process is completely controllable and the calculation result is reliable, which is of important significance for orbit calculation and the aerospace industry.

The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A parallel computing method for a satellite, comprising the steps of:

S5, performing ephemeris computation and conversion under different time nodes through a plurality of computing units in the graphics processor respectively to obtain a calculation result; wherein, when the plurality of computing units in the graphics processor respectively perform ephemeris computation and conversion under different time nodes, the trigonometric function calculations in the ephemeris computation and conversion are each downgraded from double-precision floating point form to single-precision floating point form;

S6, reading and storing the calculation result of step S5 through a central processing unit; judging whether the ephemeris computation and conversion of all satellites are finished; if yes, finishing the parallel computation; otherwise, returning to step S3 until the ephemeris computation and conversion of all satellites are finished;

the storing in step S6 and step S3 are performed in two threads of the central processing unit, respectively.

2. A parallel computing method for satellites according to claim 1, characterized in that: in step S5, the computing units are in one-to-one correspondence with the time nodes.

3. A parallel computing method for satellites according to claim 1, characterized in that: the video memory of the graphics processor comprises a constant memory and a global memory;

4. A parallel computing method for satellites according to claim 1, characterized in that:

the central processing unit calls a class method in the running process;

5. A parallel computing method for satellites according to claim 1, characterized in that: in step S5, the time information and the coordinate information in the calculation result are of a structure-of-arrays type.

6. A parallel computing system for a satellite for implementing a parallel computing method for a satellite according to any one of claims 1 to 5, characterized in that: the system comprises a central processing unit and a graphic processor;

7. A computer-readable storage medium having stored thereon a computer program, characterized by: the program when executed by a processor performs the steps of the method according to any one of claims 1 to 5.

8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that: the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.