CN111580974B - GPU instance allocation method, device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN111580974B (application CN202010383919.9A)
Authority
CN
China
Prior art keywords
service, gpu, policy information, information, computing power
Prior art date
Legal status
Active
Application number
CN202010383919.9A
Other languages
Chinese (zh)
Other versions
CN111580974A (en)
Inventor
杨启凡
罗建勋
王长虎
Current Assignee
Douyin Vision Co Ltd
Original Assignee
Douyin Vision Co Ltd
Priority date
Filing date
Publication date
Application filed by Douyin Vision Co Ltd
Priority to CN202010383919.9A
Publication of CN111580974A
Application granted
Publication of CN111580974B

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5044: Allocation of resources to service a request, the resource being a machine, considering hardware capabilities
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00: General purpose image data processing
    • G06T 1/20: Processor architectures; Processor configuration, e.g. pipelining
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Power Sources (AREA)

Abstract

Embodiments of the present disclosure provide a GPU instance allocation method and apparatus. One embodiment of the method comprises the following steps: for each service in a service set requiring graphics processing unit (GPU) operations, acquiring service information of the service; determining the GPU computing power required by each service based on its service information; grouping the service set based on the service priority of each service to generate at least one service group; determining the number of GPU instances required by each service group based on the determined GPU computing power; and allocating GPU instances to the service groups based on the available GPU resources and the number of GPU instances required by each service group. This embodiment allocates GPU instances in units of groups, providing an efficient mode of resource allocation.

Description

GPU instance allocation method, device, electronic equipment and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a GPU instance allocation method, apparatus, electronic device, and computer readable medium.
Background
In recent years, the understanding of big data and video has continued to develop. Whether the GPU (Graphics Processing Unit) instances of many services are allocated reasonably affects not only the quality of service but also resource utilization: unreasonable allocation leaves resources underused.
However, related methods typically schedule resources for a single service only, and lack a reasonable resource allocation scheme when facing multiple services at the same time.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of one application scenario of a GPU instance allocation method according to some embodiments of the present disclosure;
FIG. 2 is a flowchart of one embodiment of a GPU instance allocation method according to the present disclosure;
FIG. 3 is a flowchart of further embodiments of a GPU instance allocation method according to the present disclosure;
FIG. 4 is an exemplary flowchart of the step of determining GPU computing power in further embodiments of the GPU instance allocation method of the present disclosure;
FIG. 5 is a schematic diagram of some embodiments of a GPU instance allocation apparatus according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a", "an", and "a plurality" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Fig. 1 is a schematic diagram 100 of one application scenario of a GPU instance allocation method according to some embodiments of the present disclosure.
As shown in fig. 1, as an example, the electronic device 101 first acquires, for service 1, service 2, service 3, and service 4 in the service set, the corresponding service information 102, 104, 106, and 108. Then, according to the acquired service information, it determines the GPU computing power 103 required by service 1, the GPU computing power 105 required by service 2, the GPU computing power 107 required by service 3, and the GPU computing power 109 required by service 4. Next, services 1 to 4 in the service set are grouped according to the service priority of each service, generating, for example, service group 1 and service group 2. Service group 1 may include service 1 and service 3; service group 2 may include service 2 and service 4. The number of GPU instances needed by service group 1 is determined by the GPU computing power 103 required by service 1 and the GPU computing power 107 required by service 3; the number of GPU instances needed by service group 2 is determined by the GPU computing power 105 required by service 2 and the GPU computing power 109 required by service 4. Finally, GPU instances corresponding to the existing GPU resources are allocated to service group 1 and service group 2 in turn, according to the number of GPU instances each group requires.
It is understood that the GPU instance allocation method may be performed by the electronic device 101 described above. The electronic device 101 may be hardware or software. When it is hardware, it may be any of a variety of electronic devices with information processing capabilities, including but not limited to smartphones, tablets, e-book readers, laptop computers, desktop computers, servers, and the like. When the electronic device 101 is software, it can be installed in any of the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the number of electronic devices in fig. 1 is merely illustrative. There may be any number of electronic devices as desired for an implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a GPU instance allocation method according to the present disclosure is shown. The GPU instance allocation method comprises the following steps:
Step 201, for each service in a service set requiring graphics processing unit (GPU) operations, acquire service information of the service.
In some embodiments, the execution body of the GPU instance allocation method (e.g., the electronic device 101 shown in fig. 1) may obtain the service information of each service in the service set through a wired or wireless connection. Here, the services may be characterized by large computing power demand and stateless computation. The services may include, but are not limited to, at least one of: an online synchronous service or a batch service. The service information may include, but is not limited to, at least one of: category information of the service, log information of the service, and CPU utilization information of the service.
Step 202, determining the GPU computing power required by each service based on the service information of the service.
In some embodiments, the executing entity may determine the GPU computing power required by each service from the service information of each service obtained in step 201. Here, GPU computing power is generally used to characterize the amount of computation required for each service. By way of example, from the service information for each service, the GPU computation required for the service may be determined using various approaches.
Step 203, grouping the service set based on the service priority of each service to obtain at least one service group.
In some embodiments, the executing body groups the services in the service set in order of service priority, obtaining at least one service group. As an example, when the service set contains 100 services and each group is defined to hold 10 services, the services are divided into 10 groups of 10 each, in order of service priority from high to low.
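The grouping of step 203 can be sketched as follows. This is a minimal illustration rather than the patent's implementation; the `Service` type, its fields, and the group size of 10 are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    priority: int  # larger value = higher priority (hypothetical convention)

def group_by_priority(services, group_size):
    # Sort from highest to lowest priority, then cut into consecutive
    # groups of at most `group_size` services each.
    ordered = sorted(services, key=lambda s: s.priority, reverse=True)
    return [ordered[i:i + group_size]
            for i in range(0, len(ordered), group_size)]

# 100 services, as in the example above, split into 10 groups of 10
services = [Service(f"svc{i}", priority=i) for i in range(100)]
groups = group_by_priority(services, group_size=10)
```

The highest-priority services land in the first group, so a later priority-ordered allocation pass can simply walk the groups in order.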
Step 204, determining the number of GPU instances required for each service group based on the determined GPU computing power.
In some embodiments, the executing entity may determine the number of GPU instances required for each service group from the GPU computing power obtained in step 202 and the service groups determined in step 203. Here, a GPU instance corresponds to computing power provided to the service in a certain specification. For example, "1 GPU with a 6-core CPU and 12 GB of memory" may be regarded as one instance specification.
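One plausible way to turn a group's computing power demands into an instance count is sketched below. It assumes, which the text does not state explicitly, that every GPU instance of a given specification provides a fixed amount of computing power, so the group's total demand is divided by that capacity and rounded up.

```python
import math

def instances_needed(power_demands, power_per_instance):
    # Total demand of the group divided by one instance's capacity,
    # rounded up so the group is never under-provisioned.
    return math.ceil(sum(power_demands) / power_per_instance)

# e.g. three services needing 3.2, 1.5, and 0.8 computing power units,
# with 2.0 units per GPU instance
n = instances_needed([3.2, 1.5, 0.8], power_per_instance=2.0)
```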
Step 205, assigning GPU instances to respective service groups based on the GPU resources and the number of GPU instances required for each service group.
In some embodiments, the executing entity may traverse each service group in turn based on the current GPU resources and the required GPU instance for each service group resulting from step 204. When traversing to the current service group, the GPU instances required by the current service group may be assigned to each service in the current service group in turn. Here, the GPU resources may be resources characterized by available GPUs installed in different regions and different models.
In some alternative implementations of some embodiments, the step of assigning GPU instances to respective service groups may be as follows:
first, for each service group in the at least one service group, the executing body generates a priority of each service group according to a service priority of a service included in the service group. For example, if the priority of the service in the first service group is higher than the priority of the service in the second service group, the priority of the first service group is higher than the priority of the service in the second service group.
In the second step, the executing body may further allocate GPU instances to each service group according to the order of the service group priorities from high to low.
This implementation meets the resource requirements of the high priority service, thereby ensuring the quality of service of the high priority service.
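The priority-ordered allocation above can be sketched as follows; the `ordered_groups` representation (name, instances-needed pairs, highest priority first) and all identifiers are illustrative assumptions, not names from the patent.

```python
def allocate_by_group_priority(ordered_groups, available_instances):
    # ordered_groups: (group_name, instances_needed) pairs, highest
    # priority first. Each group receives its full request while the
    # pool lasts; a later group may get only what remains.
    allocation, remaining = {}, available_instances
    for name, needed in ordered_groups:
        granted = min(needed, remaining)
        allocation[name] = granted
        remaining -= granted
    return allocation

# three groups each wanting 4 instances, but only 10 available
alloc = allocate_by_group_priority([("g1", 4), ("g2", 4), ("g3", 4)], 10)
```

Here the two higher-priority groups are fully served and the last group absorbs the shortfall, which is exactly the guarantee the implementation aims for.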
In some optional implementations of some embodiments, when the executing body detects that the number of GPU instances required by the target service group is greater than the number of GPU instances corresponding to the remaining GPU resources, the executing body may split the GPU instances corresponding to the remaining GPU resources into as many sub-GPU instances as the number of GPU instances required by the target service group, and then distribute these sub-GPU instances to the target service group.
As an example, assume that the services are divided into 10 groups by priority, each containing 5 services. When the executing body allocates resources to the 10th group, the 5 services in that group require 10 GPU instances in total, but the remaining GPU resources can only provide 9 GPU instances. Here, the 9 remaining GPU instances may be split evenly into 10 sub-GPU instances, which are then assigned to the 5 services of the 10th group. This implementation lets each service in a service group be allocated a GPU instance to the extent possible, thereby keeping the services running.
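The even split in the example can be sketched as follows; exact fractions stand in for whatever sub-instance accounting a real scheduler would use, and the function name is hypothetical.

```python
from fractions import Fraction

def split_into_sub_instances(remaining, needed):
    # Each of the `needed` sub-instances carries remaining/needed of a
    # full GPU instance's computing power, so every service in the
    # group still receives a (reduced) share.
    return [Fraction(remaining, needed)] * needed

# 9 remaining instances split into the 10 sub-instances the group needs
subs = split_into_sub_instances(9, 10)
```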
One of the above embodiments of the present disclosure has the following advantageous effects: first, by acquiring service information of each service in a service set, category information of each service can be obtained. Then, according to the service information of each service, GPU computing power required by the service can be generated. By grouping services in a set of services, the services in the set of services may be distinguished by service priority. Finally, the GPU instances are distributed to each service group through the GPU instances required by the service group and the existing GPU resources, so that the GPU instances are distributed in units of groups, and an effective mode of resource distribution is provided.
With continued reference to fig. 3, a flow 300 of some embodiments of GPU instance allocation methods according to the present disclosure is shown. The GPU instance allocation method comprises the following steps:
In step 301, for each service in the service set requiring graphics processing unit (GPU) operations, service information of the service is obtained.
The specific implementation of step 301 and the technical effects thereof may refer to step 201 in those embodiments corresponding to fig. 2, which are not described herein.
Step 302, stress-test a plurality of hardware configurations with the service to obtain a stress test result for the service.
In some embodiments, the service information includes service meta information and service state information. The service meta information includes the policy information bound to the service, which may be either single policy information or combined policy information. A single policy means the service binds only one policy; in contrast, a combined policy means the service binds multiple (e.g., 3) policies. This implementation has the advantage that policies can be flexibly combined to adapt to services of various natures.
As an example, the service state information may include, but is not limited to, at least one of: real-time state information of the service, for example, the service's current GPU utilization; and historical state information over a period of time, for example, the service's GPU utilization at historical times.
In some embodiments, the executing body may first stress-test a plurality of hardware configurations (for example, GPUs of different models) with the service to obtain a stress test result. The stress test result may be a set of throughput rates of the service on the plurality of hardware configurations.
Step 303, determining the GPU computing power corresponding to the upper-bound, mid-bound, and lower-bound policy information respectively.
In some embodiments, the executing body may determine the GPU computing power corresponding to the mid-bound policy according to the mid-bound policy bound to the service, with reference to the service's state information and the stress test result obtained in step 302. For example, the mid-bound policy of the service may be "scale the service according to the real-time utilization of its computing power". First, the executing body obtains the computing power utilization target preset for the service. Then, the executing body extracts the GPU computing power currently used by the service from the service state information. Next, the executing body divides the currently used GPU computing power by the preset utilization target. The result is the GPU computing power required by the service according to the mid-bound policy.
Similarly, the executing body may determine the GPU computing power corresponding to the upper-bound policy according to the upper-bound policy bound to the service, with reference to the service's state information and the stress test result obtained in step 302. For example, the upper-bound policy of the service may be "scale the service according to the historical utilization of its computing power". First, the executing body obtains the preset computing power utilization target of the service. Then, the GPU computing power historically used by the service is extracted from the service state information. Dividing the historically used GPU computing power by the preset utilization target yields the GPU computing power required by the service according to the upper-bound policy.
Likewise, the executing body may determine the GPU computing power corresponding to the lower-bound policy according to the lower-bound policy bound to the service, with reference to the service's state information and the stress test result obtained in step 302. For example, the lower-bound policy of the service may be "scale the service according to a timed configuration". At predetermined intervals (for example, every 5 minutes), the executing body extracts from the service state information the computing power required by the service at the current time point, and takes that value as the GPU computing power required by the service according to the lower-bound policy.
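The three boundary policies reduce to simple arithmetic, which can be sketched as below. The function names, the utilization target, and the schedule representation are all hypothetical; the division by a preset utilization target follows the description above.

```python
def power_mid_bound(realtime_power, target_utilization):
    # Mid-bound policy: divide the service's currently used GPU
    # computing power by the preset utilization target.
    return realtime_power / target_utilization

def power_upper_bound(historical_power, target_utilization):
    # Upper-bound policy: the same division, applied to the
    # historically used GPU computing power.
    return historical_power / target_utilization

def power_lower_bound(schedule, time_point):
    # Lower-bound policy: computing power configured per time point,
    # looked up on a timed schedule.
    return schedule[time_point]
```

For example, a service currently using 6 computing power units with a 75% utilization target would be sized at 8 units under the mid-bound policy.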
Here, GPU computing power may be measured in a common unit, the normalized computing unit (NCU), which is expressed as a ratio. To address the problem of freely mixing multiple services on GPUs of multiple models, the executing body defines this common computing power unit for the services. The executing body determines sample throughput rates for the different combinations of GPU model and service according to the stress test results, and then derives a computing power weight for each service-and-GPU-model combination from the sample throughput rates. This achieves computing power matching across multiple models and multiple services, so that GPU instances of the same service can mix resources from different GPU models.
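The NCU idea of weighting GPU models by measured throughput can be sketched as follows; the choice of a reference model, the dictionary representation, and the model names are assumptions made for the example.

```python
def ncu_weights(throughput_by_model, reference_model):
    # Express each GPU model's measured throughput for a service as a
    # ratio against a reference model, giving per-model computing
    # power weights in normalized computing units (NCU).
    ref = throughput_by_model[reference_model]
    return {model: tput / ref for model, tput in throughput_by_model.items()}

# hypothetical stress test throughputs for one service on two models
weights = ncu_weights({"modelA": 100.0, "modelB": 150.0}, "modelA")
```

With such weights, an instance on "modelB" counts as 1.5 NCU of capacity for this service, so instances of different models can be pooled interchangeably.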
Here, the policy information may include, but is not limited to, at least one of: GPU real-time utilization information, GPU historical utilization information, timed instance-count information, waiting queue length information, and waiting queue duration information. Accordingly, the policies corresponding to the policy information may include, but are not limited to, at least one of: scaling the service according to the real-time utilization of the service's GPU; scaling the service according to the historical utilization of the service's GPU; scaling the service according to the length of the service's waiting queue; and scaling the service according to the waiting time in the service's queue. This implementation provides a variety of policies to meet the needs of services of different natures, so that resources can be better allocated.
Step 304, determining the GPU computing power required by the service.
In some embodiments, reference may further be made to fig. 4, which illustrates an exemplary flow 400 of the determining step 304 of the GPU instance allocation method according to some embodiments of the present disclosure. As shown in fig. 4, the determining step 304 may proceed as follows.
Step 401: constrain the computing power corresponding to the mid-bound policy to be no greater than that corresponding to the upper-bound policy.
When the computing power required by the mid-bound policy is greater than that required by the upper-bound policy, the upper-bound value is taken as the mid-bound value.
Step 402: constrain the computing power corresponding to the mid-bound policy to be no less than that corresponding to the lower-bound policy.
When the computing power required by the mid-bound policy is smaller than that required by the lower-bound policy, the lower-bound value is taken as the mid-bound value.
Step 403: take the computing power required by the mid-bound policy as the GPU computing power required by the service.
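Steps 401-403 together clamp the mid-bound value into the interval defined by the lower and upper bounds, which can be sketched as a one-line helper (the function name is illustrative):

```python
def clamp_mid_bound(mid, lower, upper):
    # Steps 401-403: keep the mid-bound computing power within
    # [lower, upper]; the clamped value becomes the service's
    # required GPU computing power.
    return max(lower, min(mid, upper))
```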
At step 305, at least one service group is obtained.
Step 306 determines the number of GPU instances required for each service group.
In step 307, GPU instances are assigned to respective service groups.
The specific implementation of steps 305-307 and the technical effects thereof may refer to steps 203-205 in the embodiments corresponding to fig. 2, and will not be described herein.
In some optional implementations of some embodiments, the number of GPU models corresponding to the GPU instances is at least one, which enables the GPU instances of the same service to mix resources from different GPU models.
As can be seen from fig. 3, compared with the embodiments corresponding to fig. 2, the flow 300 of the GPU instance allocation method in the embodiments corresponding to fig. 3 highlights that each service in the service set can flexibly bind policies, and a policy can be reused across multiple services, so that a reasonable scaling policy can be provided for each service and GPU resources are allocated reasonably.
With further reference to fig. 5, as an implementation of the methods described in the figures above, the present disclosure provides some embodiments of a GPU instance allocation apparatus. These apparatus embodiments correspond to the method embodiments described above for fig. 2, and the apparatus is particularly applicable to various electronic devices.
As shown in fig. 5, GPU instance allocation device 500 of some embodiments includes: an acquisition unit 501, a first determination unit 502, a generation unit 503, a second determination unit 504, and an allocation unit 505. An obtaining unit 501 configured to obtain, for each service in a service set requiring a GPU operation, service information of the service; a first determining unit 502 configured to determine GPU computing power required for each service based on service information of the service; a generating unit 503 configured to group the service sets based on a service priority of each service, and generate at least one service group; a second determining unit 504 configured to determine the number of GPU instances required for each service group based on the determined GPU computing power; an allocation unit 505 configured to allocate GPU instances to respective service groups based on the GPU resources and the number of GPU instances required for each service group.
In some optional implementations of some embodiments, the allocation unit 505 may be further configured to: for each of the at least one service group, determining a service group priority for the service group based on a service priority for a service included in the service group; and distributing the GPU examples to the service groups according to the order of the service group priorities from high to low.
In some optional implementations of some embodiments, the service information includes service meta information and service state information; the service meta information includes the policy information bound to the service, which is one of: single policy information or combined policy information.
In some optional implementations of some embodiments, the combined policy information includes upper-bound policy information, mid-bound policy information, and lower-bound policy information; and the first determining unit may be further configured to: stress-test a plurality of hardware configurations with the service to obtain a stress test result for the service; determine the GPU computing power corresponding to each of the upper-bound, mid-bound, and lower-bound policy information based on the stress test result and the three pieces of policy information; and determine the GPU computing power required by the service based on the GPU computing power corresponding to the upper-bound, mid-bound, and lower-bound policies.
In some optional implementations of some embodiments, the allocation unit 505 may be further configured to: in response to the number of GPU instances required by the target service group being greater than the number of GPU instances corresponding to the remaining GPU resources, split the GPU instances corresponding to the remaining GPU resources into as many sub-GPU instances as the number of GPU instances required by the target service group; and distribute these sub-GPU instances to the target service group.
In some optional implementations of some embodiments, the number of types of GPUs corresponding to the GPU instance is at least one.
It will be appreciated that the elements described in the apparatus 500 correspond to the various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting benefits described above with respect to the method are equally applicable to the apparatus 500 and the units contained therein, and are not described in detail herein.
Referring now to fig. 6, a schematic diagram of an electronic device 600 (e.g., the electronic device of fig. 1) suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 6 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 609, or from storage device 608, or from ROM 602. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.
It should be noted that, in some embodiments of the present disclosure, the computer readable medium may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
The computer readable medium may be embodied in the electronic device, or may exist separately without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: for each service in a set of services requiring graphics processor (GPU) operations, acquire service information of the service; determine the GPU computing power required by each service based on the service information of that service; group the service set based on the service priority of each service to generate at least one service group; determine the number of GPU instances required for each service group based on the determined GPU computing power; and allocate GPU instances to each service group based on the GPU resources and the number of GPU instances required by each service group.
Computer program code for carrying out operations for some embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: a processor includes an acquisition unit, a first determination unit, a generation unit, a second determination unit, and an allocation unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit that acquires service information of each service in a set of services requiring graphics processor GPU operations".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
According to one or more embodiments of the present disclosure, there is provided a GPU instance allocation method, including: for each service in a set of services requiring graphics processor (GPU) operations, acquiring service information of the service; determining the GPU computing power required by each service based on the service information of that service; grouping the service set based on the service priority of each service to generate at least one service group; determining the number of GPU instances required for each service group based on the determined GPU computing power; and allocating GPU instances to the service groups based on the GPU resources and the number of GPU instances required by each service group.
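The five steps of the method above can be sketched end to end. The dictionary shapes, the power-per-instance figure, and the rule of draining the instance pool from the highest-priority group down are assumptions for illustration; the disclosure does not prescribe these data structures.

```python
import math
from collections import defaultdict

def allocate_gpu_instances(services, total_instances, power_per_instance):
    """Group services by priority, size each group in GPU instances,
    then grant instances from the highest-priority group down."""
    groups = defaultdict(list)
    for svc in services:                      # group the service set
        groups[svc["priority"]].append(svc)

    allocation = {}
    remaining = total_instances
    for priority in sorted(groups, reverse=True):   # high priority first
        needed_power = sum(s["required_power"] for s in groups[priority])
        needed = math.ceil(needed_power / power_per_instance)
        granted = min(needed, remaining)
        allocation[priority] = granted
        remaining -= granted
    return allocation

services = [
    {"name": "ocr",   "priority": 2, "required_power": 90.0},
    {"name": "asr",   "priority": 2, "required_power": 30.0},
    {"name": "batch", "priority": 1, "required_power": 100.0},
]
allocate_gpu_instances(services, total_instances=3, power_per_instance=50.0)
# → {2: 3, 1: 0}: the higher-priority group consumes the pool first
```

Serving groups in descending priority order means a scarce pool starves only the lowest-priority work.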
According to one or more embodiments of the present disclosure, the allocating GPU instances to respective service groups includes: for each of the at least one service group, determining a service group priority for the service group based on the service priorities of the services included in that service group; and allocating GPU instances to the service groups in order of service group priority from high to low.
According to one or more embodiments of the present disclosure, the service information includes service meta information and service status information, the service meta information includes policy information to which the service is bound, and the policy information to which the service is bound is one of: single policy information, combined policy information.
According to one or more embodiments of the present disclosure, the combined policy information includes: upper bound policy information, middle bound policy information, and lower bound policy information; and the determining the GPU computing power required by each service based on the service information of the service includes: stress-testing the service on a plurality of hardware configurations to obtain a stress test result for the service; determining the GPU computing power corresponding respectively to the upper bound policy information, the middle bound policy information, and the lower bound policy information based on the stress test result and the three pieces of policy information; and determining the GPU computing power required by the service based on the GPU computing power corresponding to the upper bound policy, the GPU computing power corresponding to the middle bound policy, and the GPU computing power corresponding to the lower bound policy.
According to one or more embodiments of the present disclosure, the allocating GPU instances to the service groups includes: in response to the number of GPU instances required by the target service group being greater than the number of GPU instances corresponding to the remaining GPU resources, splitting the remaining GPU resources into as many sub-GPU instances as the number of GPU instances required by the target service group; and allocating those sub-GPU instances to the target service group.
According to one or more embodiments of the present disclosure, the number of GPU models corresponding to the GPU instances is at least one.
According to one or more embodiments of the present disclosure, the apparatus for GPU instance allocation includes: an acquisition unit configured to acquire, for each service in a set of services requiring graphics processor GPU operations, service information of the service; a first determining unit configured to determine the GPU computing power required by each service based on the service information of that service; a generation unit configured to group the service set based on the service priority of each service to generate at least one service group; a second determining unit configured to determine the number of GPU instances required for each service group based on the determined GPU computing power; and an allocation unit configured to allocate GPU instances to the service groups based on the GPU resources and the number of GPU instances required by each service group.
In accordance with one or more embodiments of the present disclosure, the allocation unit 505 may be further configured to: for each of the at least one service group, determine a service group priority for the service group based on the service priorities of the services included in that service group; and allocate GPU instances to the service groups in order of service group priority from high to low.
According to one or more embodiments of the present disclosure, the service information includes service meta information and service status information, the service meta information includes policy information to which the service is bound, and the policy information to which the service is bound is one of: single policy information, combined policy information.
According to one or more embodiments of the present disclosure, the combined policy information includes: upper bound policy information, middle bound policy information, and lower bound policy information; and the first determination unit may be further configured to: stress-test the service on a plurality of hardware configurations to obtain a stress test result for the service; determine the GPU computing power corresponding respectively to the upper bound policy information, the middle bound policy information, and the lower bound policy information based on the stress test result and the three pieces of policy information; and determine the GPU computing power required by the service based on the GPU computing power corresponding to the upper bound policy, the GPU computing power corresponding to the middle bound policy, and the GPU computing power corresponding to the lower bound policy.
In accordance with one or more embodiments of the present disclosure, the allocation unit 505 may be further configured to: in response to the number of GPU instances required by the target service group being greater than the number of GPU instances corresponding to the remaining GPU resources, split the remaining GPU resources into as many sub-GPU instances as the number of GPU instances required by the target service group; and allocate those sub-GPU instances to the target service group.
According to one or more embodiments of the present disclosure, the number of GPU models corresponding to the GPU instances is at least one.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the embodiments above.
According to one or more embodiments of the present disclosure, there is provided a computer readable medium having stored thereon a computer program, wherein the program, when executed by a processor, implements a method as described in any of the embodiments above.
The foregoing description covers only the preferred embodiments of the present disclosure and the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example, solutions in which the above features are interchanged with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.

Claims (7)

1. A GPU instance allocation method, comprising:
for each service in a set of services requiring graphics processor (GPU) operations, acquiring service information of the service;
determining the GPU computing power required by each service based on the service information of the service; wherein the service information includes service meta information and service state information, the service meta information includes policy information to which the service is bound, and the policy information to which the service is bound is one of: single policy information, combined policy information; wherein the combined policy information includes: upper bound policy information, middle bound policy information, and lower bound policy information; and the determining the GPU computing power required by each service based on the service information of the service includes: stress-testing the service on a plurality of hardware configurations to obtain a stress test result for the service; determining the GPU computing power corresponding respectively to the upper bound policy information, the middle bound policy information, and the lower bound policy information based on the stress test result and the three pieces of policy information; and determining the GPU computing power required by the service based on the GPU computing power corresponding to the upper bound policy, the GPU computing power corresponding to the middle bound policy, and the GPU computing power corresponding to the lower bound policy;
grouping the service set based on the service priority of each service to generate at least one service group;
determining the number of GPU instances required for each service group based on the determined GPU computing power;
and allocating GPU instances to the service groups based on the GPU resources and the number of GPU instances required by each service group.
2. The method of claim 1, wherein the allocating GPU instances to respective service groups comprises:
for each of the at least one service group, determining a service group priority for the service group based on the service priorities of the services included in the service group;
and allocating GPU instances to the service groups in order of service group priority from high to low.
3. The method of claim 1, wherein the allocating GPU instances to the service groups comprises:
in response to the number of GPU instances required by the target service group being greater than the number of GPU instances corresponding to the remaining GPU resources, splitting the remaining GPU resources into as many sub-GPU instances as the number of GPU instances required by the target service group;
and allocating those sub-GPU instances to the target service group.
4. The method of claim 1, wherein the number of GPU models to which the GPU instance corresponds is at least one.
5. A GPU instance allocation device, comprising:
an acquisition unit configured to acquire, for each service in a set of services requiring graphics processor GPU operations, service information of the service;
a first determining unit configured to determine the GPU computing power required by each service based on the service information of the service; wherein the service information includes service meta information and service state information, the service meta information includes policy information to which the service is bound, and the policy information to which the service is bound is one of: single policy information, combined policy information; wherein the combined policy information includes: upper bound policy information, middle bound policy information, and lower bound policy information; and specifically configured to: stress-test the service on a plurality of hardware configurations to obtain a stress test result for the service; determine the GPU computing power corresponding respectively to the upper bound policy information, the middle bound policy information, and the lower bound policy information based on the stress test result and the three pieces of policy information; and determine the GPU computing power required by the service based on the GPU computing power corresponding to the upper bound policy, the GPU computing power corresponding to the middle bound policy, and the GPU computing power corresponding to the lower bound policy;
a generation unit configured to group the service set based on the service priority of each service to generate at least one service group;
a second determining unit configured to determine the number of GPU instances required for each service group based on the determined GPU computing power;
and an allocation unit configured to allocate GPU instances to the service groups based on the GPU resources and the number of GPU instances required by each service group.
6. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.
7. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-4.
CN202010383919.9A 2020-05-08 2020-05-08 GPU instance allocation method, device, electronic equipment and computer readable medium Active CN111580974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010383919.9A CN111580974B (en) 2020-05-08 2020-05-08 GPU instance allocation method, device, electronic equipment and computer readable medium


Publications (2)

Publication Number Publication Date
CN111580974A CN111580974A (en) 2020-08-25
CN111580974B true CN111580974B (en) 2023-06-27

Family

ID=72113442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010383919.9A Active CN111580974B (en) 2020-05-08 2020-05-08 GPU instance allocation method, device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN111580974B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114629805B (en) * 2020-11-27 2023-07-21 中国移动通信有限公司研究院 SLA policy processing method, device, server and service node
CN113521753B (en) * 2021-07-21 2023-08-15 咪咕互动娱乐有限公司 System resource adjustment method, device, server and storage medium
CN113791908B (en) 2021-09-16 2024-03-29 脸萌有限公司 Service running method and device and electronic equipment
CN114020470B (en) 2021-11-09 2024-04-26 抖音视界有限公司 Resource allocation method and device, readable medium and electronic equipment
CN114897580B (en) * 2022-05-20 2026-03-13 杭州福柜科技有限公司 A method based on item similarity analysis

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9142057B2 (en) * 2009-09-03 2015-09-22 Advanced Micro Devices, Inc. Processing unit with a plurality of shader engines
US8310492B2 (en) * 2009-09-03 2012-11-13 Ati Technologies Ulc Hardware-based scheduling of GPU work
CN102243598B (en) * 2010-05-14 2015-09-16 深圳市腾讯计算机系统有限公司 Method for scheduling task in Distributed Data Warehouse and system
US20170220365A1 (en) * 2016-01-29 2017-08-03 International Business Machines Corporation Virtual machine allocation to hosts for data centers
CN107247629A (en) * 2017-07-04 2017-10-13 北京百度网讯科技有限公司 Cloud computing system and cloud computing method and device for controlling server
US10423429B2 (en) * 2018-01-02 2019-09-24 International Business Machines Corporation Reconfiguring processing groups for cascading data workloads
CN110868760B (en) * 2018-08-28 2022-03-04 维沃移动通信有限公司 Transmission method and terminal equipment
CN109783224B (en) * 2018-12-10 2022-10-14 平安科技(深圳)有限公司 Task allocation method and device based on load allocation and terminal equipment

Also Published As

Publication number Publication date
CN111580974A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
CN111580974B (en) GPU instance allocation method, device, electronic equipment and computer readable medium
CN114020470B (en) Resource allocation method and device, readable medium and electronic equipment
CN109033001B (en) Method and apparatus for allocating GPUs
CN109408205B (en) Task scheduling method and device based on hadoop cluster
JP2022515302A (en) Methods and equipment for training deep learning models, electronic devices, computer-readable storage media and computer programs
JP2017050001A (en) System and method for effective neural network deployment
CN110391938B (en) Method and apparatus for deploying services
CN111694672B (en) Resource allocation method, task submission method, device, electronic equipment and medium
CN114721829B (en) A method, device, equipment and storage medium for configuring coroutine stack resources
CN112561301A (en) Work order distribution method, device, equipment and computer readable medium
CN111694670B (en) Resource allocation method, apparatus, device and computer readable medium
CN111776896A (en) Elevator scheduling method and device
CN111343046B (en) Method and device for generating pressure flow, electronic equipment and computer readable storage medium
CN114237902B (en) A service deployment method, apparatus, electronic device, and computer-readable medium
CN114968569A (en) Intelligent task processing method based on distributed heterogeneous system
CN113792869B (en) Video processing method and device based on neural network chip and electronic equipment
CN111898061A (en) Method, device, electronic equipment and computer readable medium for searching network
CN109842665B (en) Task processing method and device for task allocation server
CN112148448B (en) Resource allocation method, apparatus, device and computer readable medium
CN117061451A (en) Traffic distribution methods, devices, electronic devices and computer-readable media
CN111756833B (en) Node processing method, node processing device, electronic equipment and computer readable medium
CN115756874A (en) Multi-computer room job scheduling method, device, medium and electronic equipment
CN114143325A (en) DNS scheduling method, apparatus, device, storage medium and program product
CN120378384B (en) Resource allocation methods, devices, media, equipment and products based on edge computing
CN112527454A (en) Container group scheduling method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Applicant before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

GR01 Patent grant